8.13. XML Type

PostgreSQL 17.5 Documentation


The xml data type can be used to store XML data. Its advantage over storing XML data in a text field is that it checks input values for well-formedness, and there are support functions to perform type-safe operations on it; see Section 9.15. Use of this data type requires the installation to have been built with configure --with-libxml.

The xml type can store well-formed "documents", as defined by the XML standard, as well as "content" fragments, which are defined by reference to the more permissive "document node" (https://www.w3.org/TR/2010/REC-xpath-datamodel-20101214/#DocumentNode) of the XQuery and XPath data model. Roughly, this means that content fragments can have more than one top-level element or character node. The expression xmlvalue IS DOCUMENT can be used to evaluate whether a particular xml value is a full document or only a content fragment.
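As a minimal sketch of the document/content distinction, the following queries apply IS DOCUMENT to one value of each kind. The literal values are illustrative only.

```sql
-- A value with a single root element is a full document.
SELECT XMLPARSE(DOCUMENT '<book><title>Manual</title></book>') IS DOCUMENT;  -- true

-- Leading character data plus an element is only a content fragment.
SELECT XMLPARSE(CONTENT 'abc<foo>bar</foo>') IS DOCUMENT;                    -- false
```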

Limits and compatibility notes for the xml data type can be found in Section D.3.

8.13.1. Creating XML Values

To produce a value of type xml from character data, use the function xmlparse:

XMLPARSE ( { DOCUMENT | CONTENT } value)

Examples:

XMLPARSE (DOCUMENT '<?xml version="1.0"?><book><title>Manual</title><chapter>...</chapter></book>')
XMLPARSE (CONTENT 'abc<foo>bar</foo><bar>foo</bar>')

While this is the only way to convert character strings into XML values according to the SQL standard, the PostgreSQL-specific syntaxes:

xml '<foo>bar</foo>'
'<foo>bar</foo>'::xml

can also be used.
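Because input is checked for well-formedness, a malformed string is rejected rather than stored. The sketch below probes this without raising an error, using the xml_is_well_formed family of functions described with the other XML functions; it assumes the default xmloption setting of CONTENT, and the input strings are illustrative.

```sql
-- Properly nested markup is well-formed.
SELECT xml_is_well_formed('<foo>bar</foo>');          -- true

-- An unclosed element is not well-formed.
SELECT xml_is_well_formed('<foo>bar');                -- false

-- The _document variant always applies the stricter document rules.
SELECT xml_is_well_formed_document('abc<foo/>');      -- false
```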

The xml type does not validate input values against a document type declaration (DTD), even when the input value specifies a DTD. There is also currently no built-in support for validating against other XML schema languages such as XML Schema.

The inverse operation, producing a character string value from xml, uses the function xmlserialize:

XMLSERIALIZE ( { DOCUMENT | CONTENT } value AS type [ [ NO ] INDENT ] )

type can be character, character varying, or text (or an alias for one of those). Again, according to the SQL standard, this is the only way to convert between type xml and character types, but PostgreSQL also allows you to simply cast the value.

The INDENT option causes the result to be pretty-printed, while NO INDENT (which is the default) just emits the original input string. Casting to a character type likewise produces the original string.
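The serialization options can be compared side by side with a short sketch. The input value is illustrative, and the exact whitespace of the indented result is not shown here because it is best checked on the target installation.

```sql
-- Default (NO INDENT): the original input string is emitted.
SELECT XMLSERIALIZE(DOCUMENT '<a><b>1</b><c>2</c></a>'::xml AS text);

-- INDENT: the same value, pretty-printed.
SELECT XMLSERIALIZE(DOCUMENT '<a><b>1</b><c>2</c></a>'::xml AS text INDENT);

-- PostgreSQL-specific shortcut: a plain cast also yields the original string.
SELECT '<a><b>1</b><c>2</c></a>'::xml::text;
```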

When a character string value is cast to or from type xml without going through XMLPARSE or XMLSERIALIZE, respectively, the choice of DOCUMENT versus CONTENT is determined by the "XML option" session configuration parameter, which can be set using the standard command:

SET XML OPTION { DOCUMENT | CONTENT };

or the more PostgreSQL-like syntax

SET xmloption TO { DOCUMENT | CONTENT };

The default is CONTENT, so all forms of XML data are allowed.
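A brief sketch of how this setting changes the behavior of plain casts follows. The literals are illustrative, and the statement expected to fail is left commented out so the script runs to completion.

```sql
SET xmloption TO CONTENT;             -- the default
SELECT 'abc<foo>bar</foo>'::xml;      -- accepted: a content fragment

SET xmloption TO DOCUMENT;
SELECT '<foo>bar</foo>'::xml;         -- accepted: a single root element
-- SELECT 'abc<foo>bar</foo>'::xml;   -- expected to be rejected: not a document

RESET xmloption;                      -- return to the configured default
```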

8.13.2. Encoding Handling

Care must be taken when dealing with multiple character encodings on the client, on the server, and in the XML data passed through them. When using the text mode to pass queries to the server and query results to the client (which is the normal mode), PostgreSQL converts all character data passed between the client and the server, in either direction, to the character encoding of the receiving end; see Section 23.3. This includes string representations of XML values, such as in the above examples. This would ordinarily mean that encoding declarations contained in XML data can become invalid as the character data is converted to other encodings while traveling between client and server, because the embedded encoding declaration is not changed. To cope with this behavior, encoding declarations contained in character strings presented for input to the xml type are ignored, and content is assumed to be in the current server encoding. Consequently, for correct processing, character strings of XML data must be sent from the client in the current client encoding. It is the responsibility of the client to either convert documents to the current client encoding before sending them to the server, or to adjust the client encoding appropriately. On output, values of type xml will not have an encoding declaration, and clients should assume all data is in the current client encoding.

When using binary mode to pass query parameters to the server and query results back to the client, no encoding conversion is performed, so the situation is different. In this case, an encoding declaration in the XML data will be observed, and if it is absent, the data will be assumed to be in UTF-8 (as required by the XML standard; note that PostgreSQL does not support UTF-16). On output, data will have an encoding declaration specifying the client encoding, unless the client encoding is UTF-8, in which case it will be omitted.

Needless to say, processing XML data with PostgreSQL will be less error-prone and more efficient if the XML data encoding, client encoding, and server encoding are the same. Since XML data is internally processed in UTF-8, computations will be most efficient if the server encoding is also UTF-8.
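To see whether the encodings in a session line up, the relevant settings can be inspected and, for the client side, adjusted. This is only a sketch; the choice of UTF8 assumes the XML data being sent is actually encoded that way.

```sql
-- Inspect the encodings in effect for this session.
SHOW server_encoding;
SHOW client_encoding;

-- Declare that the client sends and expects UTF-8 (assumption: the data really is UTF-8).
SET client_encoding TO 'UTF8';
```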

Note

Some XML-related functions may not work at all on non-ASCII data when the server encoding is not UTF-8. This is known to be an issue for xmltable() and xpath() in particular.

8.13.3. Accessing XML Values

The xml data type is unusual in that it does not provide any comparison operators. This is because there is no well-defined and universally useful comparison algorithm for XML data. One consequence of this is that you cannot retrieve rows by comparing an xml column against a search value. XML values should therefore typically be accompanied by a separate key field such as an ID. An alternative solution for comparing XML values is to convert them to character strings first, but note that character string comparison has little to do with a useful XML comparison method.
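The limitation of string comparison can be illustrated with two values that describe the same empty element but differ in spelling. Since a cast to text reproduces the original input string, the comparison sees two different strings.

```sql
-- Same XML meaning, different text: the string comparison reports inequality.
SELECT '<a/>'::xml::text = '<a />'::xml::text AS same_string;  -- false
```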

Since there are no comparison operators for the xml data type, it is not possible to create an index directly on a column of this type. If speedy searches in XML data are desired, possible workarounds include casting the expression to a character string type and indexing that, or indexing an XPath expression. Of course, the actual query would have to be adjusted to search by the indexed expression.
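One way the XPath workaround might look is sketched below. The table docs, its columns, the index name, and the XPath expression are all hypothetical, chosen only for illustration; the sketch also assumes a UTF-8 server encoding, given the caveat about xpath() above.

```sql
-- Hypothetical table: a separate key column accompanies the xml column.
CREATE TABLE docs (
    id   integer PRIMARY KEY,
    body xml
);

-- Index the text of the first /book/title node, cast to a character type.
CREATE INDEX docs_title_idx
    ON docs (((xpath('/book/title/text()', body))[1]::text));

-- The query must repeat the indexed expression for the index to be considered.
SELECT id
FROM docs
WHERE (xpath('/book/title/text()', body))[1]::text = 'Manual';
```

Whether the planner actually uses the index depends on table size and statistics, so EXPLAIN is worth checking on real data.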

The text-search functionality in PostgreSQL can also be used to speed up full-document searches of XML data. The necessary preprocessing support is, however, not yet available in the PostgreSQL distribution.
