23.3. Character Set Support

PostgreSQL 17.5 Documentation


The character set support in PostgreSQL allows you to store text in a variety of character sets (also called encodings), including single-byte character sets such as the ISO 8859 series and multiple-byte character sets such as EUC (Extended Unix Code), UTF-8, and Mule internal code. All supported character sets can be used transparently by clients, but a few are not supported for use within the server (that is, as a server-side encoding). The default character set is selected while initializing your PostgreSQL database cluster using `initdb`. It can be overridden when you create a database, so you can have multiple databases each with a different character set.

An important restriction, however, is that each database's character set must be compatible with the database's `LC_CTYPE` (character classification) and `LC_COLLATE` (string sort order) locale settings. For `C` or `POSIX` locale, any character set is allowed, but for other libc-provided locales there is only one character set that will work correctly. (On Windows, however, UTF-8 encoding can be used with any locale.) If you have ICU support configured, ICU-provided locales can be used with most but not all server-side encodings.

23.3.1. Supported Character Sets

Table 23.3 shows the character sets available for use in PostgreSQL.

Table 23.3. PostgreSQL Character Sets

Name	Description	Language	Server?	ICU?	Bytes/Char	Aliases
`BIG5`	Big Five	Traditional Chinese	No	No	1-2	`WIN950`, `Windows950`
`EUC_CN`	Extended UNIX Code-CN	Simplified Chinese	Yes	Yes	1-3
`EUC_JP`	Extended UNIX Code-JP	Japanese	Yes	Yes	1-3
`EUC_JIS_2004`	Extended UNIX Code-JP, JIS X 0213	Japanese	Yes	No	1-3
`EUC_KR`	Extended UNIX Code-KR	Korean	Yes	Yes	1-3
`EUC_TW`	Extended UNIX Code-TW	Traditional Chinese, Taiwanese	Yes	Yes	1-4
`GB18030`	National Standard	Chinese	No	No	1-4
`GBK`	Extended National Standard	Simplified Chinese	No	No	1-2	`WIN936`, `Windows936`
`ISO_8859_5`	ISO 8859-5, ECMA 113	Latin/Cyrillic	Yes	Yes	1
`ISO_8859_6`	ISO 8859-6, ECMA 114	Latin/Arabic	Yes	Yes	1
`ISO_8859_7`	ISO 8859-7, ECMA 118	Latin/Greek	Yes	Yes	1
`ISO_8859_8`	ISO 8859-8, ECMA 121	Latin/Hebrew	Yes	Yes	1
`JOHAB`	JOHAB	Korean (Hangul)	No	No	1-3
`KOI8R`	KOI8-R	Cyrillic (Russian)	Yes	Yes	1	`KOI8`
`KOI8U`	KOI8-U	Cyrillic (Ukrainian)	Yes	Yes	1
`LATIN1`	ISO 8859-1, ECMA 94	Western European	Yes	Yes	1	`ISO88591`
`LATIN2`	ISO 8859-2, ECMA 94	Central European	Yes	Yes	1	`ISO88592`
`LATIN3`	ISO 8859-3, ECMA 94	South European	Yes	Yes	1	`ISO88593`
`LATIN4`	ISO 8859-4, ECMA 94	North European	Yes	Yes	1	`ISO88594`
`LATIN5`	ISO 8859-9, ECMA 128	Turkish	Yes	Yes	1	`ISO88599`
`LATIN6`	ISO 8859-10, ECMA 144	Nordic	Yes	Yes	1	`ISO885910`
`LATIN7`	ISO 8859-13	Baltic	Yes	Yes	1	`ISO885913`
`LATIN8`	ISO 8859-14	Celtic	Yes	Yes	1	`ISO885914`
`LATIN9`	ISO 8859-15	LATIN1 with Euro and accents	Yes	Yes	1	`ISO885915`
`LATIN10`	ISO 8859-16, ASRO SR 14111	Romanian	Yes	No	1	`ISO885916`
`MULE_INTERNAL`	Mule internal code	Multilingual Emacs	Yes	No	1-4
`SJIS`	Shift JIS	Japanese	No	No	1-2	`Mskanji`, `ShiftJIS`, `WIN932`, `Windows932`
`SHIFT_JIS_2004`	Shift JIS, JIS X 0213	Japanese	No	No	1-2
`SQL_ASCII`	unspecified (see text)	any	Yes	No	1
`UHC`	Unified Hangul Code	Korean	No	No	1-2	`WIN949`, `Windows949`
`UTF8`	Unicode, 8-bit	all	Yes	Yes	1-4	`Unicode`
`WIN866`	Windows CP866	Cyrillic	Yes	Yes	1	`ALT`
`WIN874`	Windows CP874	Thai	Yes	No	1
`WIN1250`	Windows CP1250	Central European	Yes	Yes	1
`WIN1251`	Windows CP1251	Cyrillic	Yes	Yes	1	`WIN`
`WIN1252`	Windows CP1252	Western European	Yes	Yes	1
`WIN1253`	Windows CP1253	Greek	Yes	Yes	1
`WIN1254`	Windows CP1254	Turkish	Yes	Yes	1
`WIN1255`	Windows CP1255	Hebrew	Yes	Yes	1
`WIN1256`	Windows CP1256	Arabic	Yes	Yes	1
`WIN1257`	Windows CP1257	Baltic	Yes	Yes	1
`WIN1258`	Windows CP1258	Vietnamese	Yes	Yes	1	`ABC`, `TCVN`, `TCVN5712`, `VSCII`
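
The Bytes/Char column can be illustrated outside the server. The following is a minimal sketch in Python that uses Python's own codecs (`latin_1`, `euc_jp`, `shift_jis`, `utf_8`) as stand-ins for the PostgreSQL encodings of the corresponding names. It is not PostgreSQL code, and the helper name `encoded_length` is made up for this illustration.

```python
# Illustration only: Python codecs stand in for PostgreSQL encodings.
SAMPLES = [
    ("A", "latin_1"),     # ASCII letter in a single-byte encoding
    ("é", "latin_1"),     # accented letter, still one byte in LATIN1
    ("é", "utf_8"),
    ("あ", "euc_jp"),
    ("あ", "shift_jis"),
    ("あ", "utf_8"),
]


def encoded_length(text: str, codec: str) -> int:
    """Return the number of bytes needed to store text in the given codec."""
    return len(text.encode(codec))


if __name__ == "__main__":
    for text, codec in SAMPLES:
        print(f"{text!r} in {codec}: {encoded_length(text, codec)} byte(s)")
```

The same character can need a different number of bytes depending on the encoding, which is why the table gives a range for the multibyte encodings.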

Not all client APIs support all the listed character sets. For example, the PostgreSQL JDBC driver does not support `MULE_INTERNAL`, `LATIN6`, `LATIN8`, and `LATIN10`.

The `SQL_ASCII` setting behaves considerably differently from the other settings. When the server character set is `SQL_ASCII`, the server interprets byte values 0–127 according to the ASCII standard, while byte values 128–255 are taken as uninterpreted characters. No encoding conversion will be done when the setting is `SQL_ASCII`. Thus, this setting is not so much a declaration that a specific encoding is in use, as a declaration of ignorance about the encoding. In most cases, if you are working with any non-ASCII data, it is unwise to use the `SQL_ASCII` setting because PostgreSQL will be unable to help you by converting or validating non-ASCII characters.
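
The practical consequence of this "declaration of ignorance" can be sketched in Python. The `store` helper and the `PY_CODECS` mapping below are hypothetical and only mimic the behavior described above: with `SQL_ASCII` the bytes are kept without any check, while for any other setting they are validated, with Python's codecs standing in for the server's own validation.

```python
# Assumed mapping from PostgreSQL encoding names to Python codec names.
PY_CODECS = {"UTF8": "utf_8", "LATIN1": "latin_1", "KOI8R": "koi8_r"}


def store(data: bytes, server_encoding: str) -> bytes:
    """Mimic what happens to incoming bytes for a given server encoding."""
    if server_encoding == "SQL_ASCII":
        # Bytes 128-255 are not interpreted, so nothing is checked.
        return data
    # Raises UnicodeDecodeError if the bytes are not valid in the encoding.
    data.decode(PY_CODECS[server_encoding])
    return data


raw = b"caf\xe9"  # the bytes a LATIN1 client would send for "café"

print(store(raw, "SQL_ASCII"))      # accepted unchanged
try:
    store(raw, "UTF8")
except UnicodeDecodeError:
    print("rejected: not valid UTF-8")

# Without a declared encoding, the meaning of byte 0xE9 depends on the reader.
print(raw.decode("latin_1"))
print(raw.decode("koi8_r"))
```

The last two lines print different text for the same stored bytes, which is the kind of ambiguity that a declared server encoding is meant to rule out.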

23.3.2. Setting the Character Set

`initdb` defines the default character set (encoding) for a PostgreSQL cluster. For example,

initdb -E EUC_JP

sets the default character set to `EUC_JP` (Extended Unix Code for Japanese). You can use `--encoding` instead of `-E` if you prefer longer option strings. If no `-E` or `--encoding` option is given, `initdb` attempts to determine the appropriate encoding to use based on the specified or default locale.

You can specify a non-default encoding at database creation time, provided that the encoding is compatible with the selected locale:

createdb -E EUC_KR -T template0 --lc-collate=ko_KR.euckr --lc-ctype=ko_KR.euckr korean

This will create a database named `korean` that uses the character set `EUC_KR`, and locale `ko_KR`. Another way to accomplish this is to use this SQL command:

CREATE DATABASE korean WITH ENCODING 'EUC_KR' LC_COLLATE='ko_KR.euckr' LC_CTYPE='ko_KR.euckr' TEMPLATE=template0;

Notice that the above commands specify copying the `template0` database. When copying any other database, the encoding and locale settings cannot be changed from those of the source database, because that might result in corrupt data. For more information see Section 22.3.

The encoding for a database is stored in the system catalog `pg_database`. You can see it by using the `psql` `-l` option or the `\l` command.

$ psql -l
                                         List of databases
   Name    |  Owner   | Encoding  |  Collation  |    Ctype    |          Access Privileges
-----------+----------+-----------+-------------+-------------+-------------------------------------
 clocaledb | hlinnaka | SQL_ASCII | C           | C           |
 englishdb | hlinnaka | UTF8      | en_GB.UTF8  | en_GB.UTF8  |
 japanese  | hlinnaka | UTF8      | ja_JP.UTF8  | ja_JP.UTF8  |
 korean    | hlinnaka | EUC_KR    | ko_KR.euckr | ko_KR.euckr |
 postgres  | hlinnaka | UTF8      | fi_FI.UTF8  | fi_FI.UTF8  |
 template0 | hlinnaka | UTF8      | fi_FI.UTF8  | fi_FI.UTF8  | {=c/hlinnaka,hlinnaka=CTc/hlinnaka}
 template1 | hlinnaka | UTF8      | fi_FI.UTF8  | fi_FI.UTF8  | {=c/hlinnaka,hlinnaka=CTc/hlinnaka}
(7 rows)

Important

On most modern operating systems, PostgreSQL can determine which character set is implied by the `LC_CTYPE` setting, and it will enforce that only the matching database encoding is used. On older systems it is your responsibility to ensure that you use the encoding expected by the locale you have selected. A mistake in this area is likely to lead to strange behavior of locale-dependent operations such as sorting.

PostgreSQL will allow superusers to create databases with `SQL_ASCII` encoding even when `LC_CTYPE` is not `C` or `POSIX`. As noted above, `SQL_ASCII` does not enforce that the data stored in the database has any particular encoding, and so this choice poses risks of locale-dependent misbehavior. Using this combination of settings is deprecated and may someday be forbidden altogether.

23.3.3. Automatic Character Set Conversion Between Server and Client

PostgreSQL supports automatic character set conversion between server and client for many combinations of character sets (Section 23.3.4 shows which ones).

To enable automatic character set conversion, you have to tell PostgreSQL the character set (encoding) you would like to use in the client. There are several ways to accomplish this:

Using the `\encoding` command in psql. `\encoding` allows you to change client encoding on the fly. For example, to change the encoding to `SJIS`, type:
```
\encoding SJIS
```
libpq (Section 32.11) has functions to control the client encoding.
Using `SET client_encoding TO`. Setting the client encoding can be done with this SQL command:
```
SET CLIENT_ENCODING TO 'value';
```
Also you can use the standard SQL syntax `SET NAMES` for this purpose:
```
SET NAMES 'value';
```
To query the current client encoding:
```
SHOW client_encoding;
```
To return to the default encoding:
```
RESET client_encoding;
```
Using `PGCLIENTENCODING`. If the environment variable `PGCLIENTENCODING` is defined in the client's environment, that client encoding is automatically selected when a connection to the server is made. (This can subsequently be overridden using any of the other methods mentioned above.)
Using the configuration variable `client_encoding`. If the `client_encoding` variable is set, that client encoding is automatically selected when a connection to the server is made. (This can subsequently be overridden using any of the other methods mentioned above.)
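
As a rough sketch of the environment-variable method, a client program can be started with `PGCLIENTENCODING` set in its environment. The Python helper below only builds such an environment. The helper name `env_with_client_encoding` is hypothetical, and the commented-out call assumes a `psql` binary on the PATH and a reachable server, neither of which is verified here.

```python
import os


def env_with_client_encoding(encoding: str, base=None) -> dict:
    """Return a copy of the environment with PGCLIENTENCODING set.

    The base mapping (os.environ by default) is left unmodified.
    """
    env = dict(os.environ if base is None else base)
    env["PGCLIENTENCODING"] = encoding
    return env


env = env_with_client_encoding("SJIS")
print(env["PGCLIENTENCODING"])

# Assumed usage, not run here: the client reads the variable when it connects.
# import subprocess
# subprocess.run(["psql", "-c", "SHOW client_encoding;"], env=env, check=True)
```

Copying the environment rather than changing `os.environ` keeps the setting local to the child process, so other connections made by the same program are unaffected.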

If the conversion of a particular character is not possible, an error is reported. For example, suppose you chose `EUC_JP` for the server and `LATIN1` for the client: if some Japanese characters are returned that do not have a representation in `LATIN1`, the conversion fails with an error.
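
This failure mode can be reproduced in miniature with Python's codecs, which here stand in for the server's EUC_JP to LATIN1 conversion. It is an analogy under that assumption, not the server's actual conversion routine, and `convert_for_client` is a hypothetical helper.

```python
def convert_for_client(server_bytes: bytes, server_codec: str, client_codec: str) -> bytes:
    """Decode bytes held in the server encoding and re-encode them for the client.

    Raises UnicodeEncodeError when a character has no representation
    in the client encoding.
    """
    text = server_bytes.decode(server_codec)
    return text.encode(client_codec)


stored = "日本語".encode("euc_jp")  # data as held by an EUC_JP server

try:
    convert_for_client(stored, "euc_jp", "latin_1")
except UnicodeEncodeError as exc:
    print("conversion failed:", exc.reason)

# ASCII-only data converts without trouble.
print(convert_for_client(b"abc", "euc_jp", "latin_1"))
```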

If the client character set is defined as `SQL_ASCII`, encoding conversion is disabled, regardless of the server's character set. (However, if the server's character set is not `SQL_ASCII`, the server will still check that incoming data is valid for that encoding; so the net effect is as though the client character set were the same as the server's.) Just as for the server, use of `SQL_ASCII` is unwise unless you are working with all-ASCII data.

23.3.4. Available Character Set Conversions

PostgreSQL allows conversion between any two character sets for which a conversion function is listed in the `pg_conversion` system catalog. PostgreSQL comes with some predefined conversions, as summarized in Table 23.4 and shown in more detail in Table 23.5. You can create a new conversion using the SQL command CREATE CONVERSION. (To be used for automatic client/server conversions, a conversion must be marked as "default" for its character set pair.)

Table 23.4. Built-in Client/Server Character Set Conversions

Server Character Set	Available Client Character Sets
`BIG5`	not supported as a server encoding
`EUC_CN`	EUC_CN, `MULE_INTERNAL`, `UTF8`
`EUC_JP`	EUC_JP, `MULE_INTERNAL`, `SJIS`, `UTF8`
`EUC_JIS_2004`	EUC_JIS_2004, `SHIFT_JIS_2004`, `UTF8`
`EUC_KR`	EUC_KR, `MULE_INTERNAL`, `UTF8`
`EUC_TW`	EUC_TW, `BIG5`, `MULE_INTERNAL`, `UTF8`
`GB18030`	not supported as a server encoding
`GBK`	not supported as a server encoding
`ISO_8859_5`	ISO_8859_5, `KOI8R`, `MULE_INTERNAL`, `UTF8`, `WIN866`, `WIN1251`
`ISO_8859_6`	ISO_8859_6, `UTF8`
`ISO_8859_7`	ISO_8859_7, `UTF8`
`ISO_8859_8`	ISO_8859_8, `UTF8`
`JOHAB`	not supported as a server encoding
`KOI8R`	KOI8R, `ISO_8859_5`, `MULE_INTERNAL`, `UTF8`, `WIN866`, `WIN1251`
`KOI8U`	KOI8U, `UTF8`
`LATIN1`	LATIN1, `MULE_INTERNAL`, `UTF8`
`LATIN2`	LATIN2, `MULE_INTERNAL`, `UTF8`, `WIN1250`
`LATIN3`	LATIN3, `MULE_INTERNAL`, `UTF8`
`LATIN4`	LATIN4, `MULE_INTERNAL`, `UTF8`
`LATIN5`	LATIN5, `UTF8`
`LATIN6`	LATIN6, `UTF8`
`LATIN7`	LATIN7, `UTF8`
`LATIN8`	LATIN8, `UTF8`
`LATIN9`	LATIN9, `UTF8`
`LATIN10`	LATIN10, `UTF8`
`MULE_INTERNAL`	MULE_INTERNAL, `BIG5`, `EUC_CN`, `EUC_JP`, `EUC_KR`, `EUC_TW`, `ISO_8859_5`, `KOI8R`, `LATIN1` to `LATIN4`, `SJIS`, `WIN866`, `WIN1250`, `WIN1251`
`SJIS`	not supported as a server encoding
`SHIFT_JIS_2004`	not supported as a server encoding
`SQL_ASCII`	any (no conversion will be performed)
`UHC`	not supported as a server encoding
`UTF8`	all supported encodings
`WIN866`	WIN866, `ISO_8859_5`, `KOI8R`, `MULE_INTERNAL`, `UTF8`, `WIN1251`
`WIN874`	WIN874, `UTF8`
`WIN1250`	WIN1250, `LATIN2`, `MULE_INTERNAL`, `UTF8`
`WIN1251`	WIN1251, `ISO_8859_5`, `KOI8R`, `MULE_INTERNAL`, `UTF8`, `WIN866`
`WIN1252`	WIN1252, `UTF8`
`WIN1253`	WIN1253, `UTF8`
`WIN1254`	WIN1254, `UTF8`
`WIN1255`	WIN1255, `UTF8`
`WIN1256`	WIN1256, `UTF8`
`WIN1257`	WIN1257, `UTF8`
`WIN1258`	WIN1258, `UTF8`
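
For readers who want to check a server/client pair programmatically, part of Table 23.4 can be transcribed into a lookup structure. The Python sketch below copies only a handful of rows from the table, and the names `CLIENT_SETS`, `CLIENT_ONLY`, and `conversion_available` are invented for this example. On a running server, the `pg_conversion` catalog is the authoritative list.

```python
# A few rows transcribed from Table 23.4 (not the complete table).
CLIENT_SETS = {
    "EUC_JP": {"EUC_JP", "MULE_INTERNAL", "SJIS", "UTF8"},
    "EUC_KR": {"EUC_KR", "MULE_INTERNAL", "UTF8"},
    "LATIN1": {"LATIN1", "MULE_INTERNAL", "UTF8"},
    "LATIN2": {"LATIN2", "MULE_INTERNAL", "UTF8", "WIN1250"},
    "KOI8U": {"KOI8U", "UTF8"},
}

# Encodings the table marks as not supported on the server side.
CLIENT_ONLY = {"BIG5", "GB18030", "GBK", "JOHAB", "SJIS", "SHIFT_JIS_2004", "UHC"}


def conversion_available(server: str, client: str) -> bool:
    """Report whether the table lists client as usable with server.

    Raises ValueError for encodings that cannot be server encodings and
    KeyError for rows that were not transcribed into CLIENT_SETS.
    """
    if server in CLIENT_ONLY:
        raise ValueError(f"{server} is not supported as a server encoding")
    if server in ("SQL_ASCII", "UTF8"):
        # SQL_ASCII: any client setting, no conversion. UTF8: all encodings.
        return True
    return client in CLIENT_SETS[server]


print(conversion_available("EUC_JP", "SJIS"))
print(conversion_available("EUC_JP", "LATIN1"))
```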

Table 23.5. All Built-in Character Set Conversions

Conversion Name ^[a]	Source Encoding	Destination Encoding
`big5_to_euc_tw`	`BIG5`	`EUC_TW`
`big5_to_mic`	`BIG5`	`MULE_INTERNAL`
`big5_to_utf8`	`BIG5`	`UTF8`
`euc_cn_to_mic`	`EUC_CN`	`MULE_INTERNAL`
`euc_cn_to_utf8`	`EUC_CN`	`UTF8`
`euc_jp_to_mic`	`EUC_JP`	`MULE_INTERNAL`
`euc_jp_to_sjis`	`EUC_JP`	`SJIS`
`euc_jp_to_utf8`	`EUC_JP`	`UTF8`
`euc_kr_to_mic`	`EUC_KR`	`MULE_INTERNAL`
`euc_kr_to_utf8`	`EUC_KR`	`UTF8`
`euc_tw_to_big5`	`EUC_TW`	`BIG5`
`euc_tw_to_mic`	`EUC_TW`	`MULE_INTERNAL`
`euc_tw_to_utf8`	`EUC_TW`	`UTF8`
`gb18030_to_utf8`	`GB18030`	`UTF8`
`gbk_to_utf8`	`GBK`	`UTF8`
`iso_8859_10_to_utf8`	`LATIN6`	`UTF8`
`iso_8859_13_to_utf8`	`LATIN7`	`UTF8`
`iso_8859_14_to_utf8`	`LATIN8`	`UTF8`
`iso_8859_15_to_utf8`	`LATIN9`	`UTF8`
`iso_8859_16_to_utf8`	`LATIN10`	`UTF8`
`iso_8859_1_to_mic`	`LATIN1`	`MULE_INTERNAL`
`iso_8859_1_to_utf8`	`LATIN1`	`UTF8`
`iso_8859_2_to_mic`	`LATIN2`	`MULE_INTERNAL`
`iso_8859_2_to_utf8`	`LATIN2`	`UTF8`
`iso_8859_2_to_windows_1250`	`LATIN2`	`WIN1250`
`iso_8859_3_to_mic`	`LATIN3`	`MULE_INTERNAL`
`iso_8859_3_to_utf8`	`LATIN3`	`UTF8`
`iso_8859_4_to_mic`	`LATIN4`	`MULE_INTERNAL`
`iso_8859_4_to_utf8`	`LATIN4`	`UTF8`
`iso_8859_5_to_koi8_r`	`ISO_8859_5`	`KOI8R`
`iso_8859_5_to_mic`	`ISO_8859_5`	`MULE_INTERNAL`
`iso_8859_5_to_utf8`	`ISO_8859_5`	`UTF8`
`iso_8859_5_to_windows_1251`	`ISO_8859_5`	`WIN1251`
`iso_8859_5_to_windows_866`	`ISO_8859_5`	`WIN866`
`iso_8859_6_to_utf8`	`ISO_8859_6`	`UTF8`
`iso_8859_7_to_utf8`	`ISO_8859_7`	`UTF8`
`iso_8859_8_to_utf8`	`ISO_8859_8`	`UTF8`
`iso_8859_9_to_utf8`	`LATIN5`	`UTF8`
`johab_to_utf8`	`JOHAB`	`UTF8`
`koi8_r_to_iso_8859_5`	`KOI8R`	`ISO_8859_5`
`koi8_r_to_mic`	`KOI8R`	`MULE_INTERNAL`
`koi8_r_to_utf8`	`KOI8R`	`UTF8`
`koi8_r_to_windows_1251`	`KOI8R`	`WIN1251`
`koi8_r_to_windows_866`	`KOI8R`	`WIN866`
`koi8_u_to_utf8`	`KOI8U`	`UTF8`
`mic_to_big5`	`MULE_INTERNAL`	`BIG5`
`mic_to_euc_cn`	`MULE_INTERNAL`	`EUC_CN`
`mic_to_euc_jp`	`MULE_INTERNAL`	`EUC_JP`
`mic_to_euc_kr`	`MULE_INTERNAL`	`EUC_KR`
`mic_to_euc_tw`	`MULE_INTERNAL`	`EUC_TW`
`mic_to_iso_8859_1`	`MULE_INTERNAL`	`LATIN1`
`mic_to_iso_8859_2`	`MULE_INTERNAL`	`LATIN2`
`mic_to_iso_8859_3`	`MULE_INTERNAL`	`LATIN3`
`mic_to_iso_8859_4`	`MULE_INTERNAL`	`LATIN4`
`mic_to_iso_8859_5`	`MULE_INTERNAL`	`ISO_8859_5`
`mic_to_koi8_r`	`MULE_INTERNAL`	`KOI8R`
`mic_to_sjis`	`MULE_INTERNAL`	`SJIS`
`mic_to_windows_1250`	`MULE_INTERNAL`	`WIN1250`
`mic_to_windows_1251`	`MULE_INTERNAL`	`WIN1251`
`mic_to_windows_866`	`MULE_INTERNAL`	`WIN866`
`sjis_to_euc_jp`	`SJIS`	`EUC_JP`
`sjis_to_mic`	`SJIS`	`MULE_INTERNAL`
`sjis_to_utf8`	`SJIS`	`UTF8`
`windows_1258_to_utf8`	`WIN1258`	`UTF8`
`uhc_to_utf8`	`UHC`	`UTF8`
`utf8_to_big5`	`UTF8`	`BIG5`
`utf8_to_euc_cn`	`UTF8`	`EUC_CN`
`utf8_to_euc_jp`	`UTF8`	`EUC_JP`
`utf8_to_euc_kr`	`UTF8`	`EUC_KR`
`utf8_to_euc_tw`	`UTF8`	`EUC_TW`
`utf8_to_gb18030`	`UTF8`	`GB18030`
`utf8_to_gbk`	`UTF8`	`GBK`
`utf8_to_iso_8859_1`	`UTF8`	`LATIN1`
`utf8_to_iso_8859_10`	`UTF8`	`LATIN6`
`utf8_to_iso_8859_13`	`UTF8`	`LATIN7`
`utf8_to_iso_8859_14`	`UTF8`	`LATIN8`
`utf8_to_iso_8859_15`	`UTF8`	`LATIN9`
`utf8_to_iso_8859_16`	`UTF8`	`LATIN10`
`utf8_to_iso_8859_2`	`UTF8`	`LATIN2`
`utf8_to_iso_8859_3`	`UTF8`	`LATIN3`
`utf8_to_iso_8859_4`	`UTF8`	`LATIN4`
`utf8_to_iso_8859_5`	`UTF8`	`ISO_8859_5`
`utf8_to_iso_8859_6`	`UTF8`	`ISO_8859_6`
`utf8_to_iso_8859_7`	`UTF8`	`ISO_8859_7`
`utf8_to_iso_8859_8`	`UTF8`	`ISO_8859_8`
`utf8_to_iso_8859_9`	`UTF8`	`LATIN5`
`utf8_to_johab`	`UTF8`	`JOHAB`
`utf8_to_koi8_r`	`UTF8`	`KOI8R`
`utf8_to_koi8_u`	`UTF8`	`KOI8U`
`utf8_to_sjis`	`UTF8`	`SJIS`
`utf8_to_windows_1258`	`UTF8`	`WIN1258`
`utf8_to_uhc`	`UTF8`	`UHC`
`utf8_to_windows_1250`	`UTF8`	`WIN1250`
`utf8_to_windows_1251`	`UTF8`	`WIN1251`
`utf8_to_windows_1252`	`UTF8`	`WIN1252`
`utf8_to_windows_1253`	`UTF8`	`WIN1253`
`utf8_to_windows_1254`	`UTF8`	`WIN1254`
`utf8_to_windows_1255`	`UTF8`	`WIN1255`
`utf8_to_windows_1256`	`UTF8`	`WIN1256`
`utf8_to_windows_1257`	`UTF8`	`WIN1257`
`utf8_to_windows_866`	`UTF8`	`WIN866`
`utf8_to_windows_874`	`UTF8`	`WIN874`
`windows_1250_to_iso_8859_2`	`WIN1250`	`LATIN2`
`windows_1250_to_mic`	`WIN1250`	`MULE_INTERNAL`
`windows_1250_to_utf8`	`WIN1250`	`UTF8`
`windows_1251_to_iso_8859_5`	`WIN1251`	`ISO_8859_5`
`windows_1251_to_koi8_r`	`WIN1251`	`KOI8R`
`windows_1251_to_mic`	`WIN1251`	`MULE_INTERNAL`
`windows_1251_to_utf8`	`WIN1251`	`UTF8`
`windows_1251_to_windows_866`	`WIN1251`	`WIN866`
`windows_1252_to_utf8`	`WIN1252`	`UTF8`
`windows_1256_to_utf8`	`WIN1256`	`UTF8`
`windows_866_to_iso_8859_5`	`WIN866`	`ISO_8859_5`
`windows_866_to_koi8_r`	`WIN866`	`KOI8R`
`windows_866_to_mic`	`WIN866`	`MULE_INTERNAL`
`windows_866_to_utf8`	`WIN866`	`UTF8`
`windows_866_to_windows_1251`	`WIN866`	`WIN1251`
`windows_874_to_utf8`	`WIN874`	`UTF8`
`euc_jis_2004_to_utf8`	`EUC_JIS_2004`	`UTF8`
`utf8_to_euc_jis_2004`	`UTF8`	`EUC_JIS_2004`
`shift_jis_2004_to_utf8`	`SHIFT_JIS_2004`	`UTF8`
`utf8_to_shift_jis_2004`	`UTF8`	`SHIFT_JIS_2004`
`euc_jis_2004_to_shift_jis_2004`	`EUC_JIS_2004`	`SHIFT_JIS_2004`
`shift_jis_2004_to_euc_jis_2004`	`SHIFT_JIS_2004`	`EUC_JIS_2004`
^[a] The conversion names follow a standard naming scheme: The official name of the source encoding with all non-alphanumeric characters replaced by underscores, followed by `_to_`, followed by the similarly processed destination encoding name. Therefore, these names sometimes deviate from the customary encoding names shown in Table 23.3.
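
The naming scheme in the footnote can be written as a small function. This Python sketch assumes the official names are spelled as in the examples (`KOI8-R`, `ISO-8859-5`, `Windows-1251`), which the table does not state, and it lowercases the result because the conversion names in the table are lowercase. `conversion_name` is a hypothetical helper, not a PostgreSQL function.

```python
import re


def conversion_name(source: str, destination: str) -> str:
    """Build a conversion name from two official encoding names."""

    def normalize(name: str) -> str:
        # Replace every non-alphanumeric character with an underscore.
        return re.sub(r"[^0-9A-Za-z]", "_", name).lower()

    return f"{normalize(source)}_to_{normalize(destination)}"


print(conversion_name("KOI8-R", "ISO-8859-5"))    # koi8_r_to_iso_8859_5
print(conversion_name("Windows-1251", "KOI8-R"))  # windows_1251_to_koi8_r
```

Both results match entries in the table, while the customary names for the same encodings are `KOI8R`, `ISO_8859_5`, and `WIN1251`.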

23.3.5. Further Reading

These are good sources to start learning about various kinds of encoding systems.

CJKV Information Processing: Chinese, Japanese, Korean & Vietnamese Computing: Contains detailed explanations of `EUC_JP`, `EUC_CN`, `EUC_KR`, `EUC_TW`.
https://www.unicode.org/: The web site of the Unicode Consortium.
RFC 3629: UTF-8 (8-bit UCS/Unicode Transformation Format) is defined here.
