12.2. Tables and Indexes

PostgreSQL 17.5 Documentation


The examples in the previous section illustrated full text matching using simple constant strings. This section shows how to search table data, optionally using indexes.

12.2.1. Searching a Table

It is possible to do a full text search without an index. A simple query to print the title of each row that contains the word friend in its body field is:

SELECT title
FROM pgweb
WHERE to_tsvector('english', body) @@ to_tsquery('english', 'friend');

This will also find related words such as friends and friendly, since all of these are reduced to the same normalized lexeme.

The query above specifies that the english configuration is to be used to parse and normalize the strings. Alternatively, we could omit the configuration parameters:

SELECT title
FROM pgweb
WHERE to_tsvector(body) @@ to_tsquery('friend');

This query will use the configuration set by default_text_search_config.
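
As a small sketch of how the one-argument form picks up its configuration, the session setting can be inspected and changed before running the query. The choice of pg_catalog.english here is only an assumption for illustration; any installed configuration could be used.

```sql
-- Show which configuration the one-argument functions will use.
SHOW default_text_search_config;

-- Assumption for this sketch: switch the current session to the built-in English configuration.
SET default_text_search_config = 'pg_catalog.english';

-- The one-argument calls now parse and normalize using that configuration.
SELECT title
FROM pgweb
WHERE to_tsvector(body) @@ to_tsquery('friend');
```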

A more complex example is to select the ten most recent documents that contain create and table in the title or body:

SELECT title
FROM pgweb
WHERE to_tsvector(title || ' ' || body) @@ to_tsquery('create & table')
ORDER BY last_mod_date DESC
LIMIT 10;

For clarity we omitted the coalesce function calls that would be needed to find rows in which one of the two fields is NULL. Concatenating NULL with || yields NULL, so without coalesce such rows would never match.
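
For reference, a version of the same query with the coalesce calls written out might look like the following. It mirrors the expression used for the generated column later in this section and relies on default_text_search_config, as the query above does.

```sql
-- Replace NULL fields with empty strings so the concatenation never becomes NULL.
SELECT title
FROM pgweb
WHERE to_tsvector(coalesce(title, '') || ' ' || coalesce(body, ''))
      @@ to_tsquery('create & table')
ORDER BY last_mod_date DESC
LIMIT 10;
```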

Although these queries will work without an index, most applications will find this approach too slow, except perhaps for occasional ad-hoc searches. Practical use of text searching usually requires creating an index.
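
To try the queries in this section, a pgweb table is needed. The text does not define one, so the following is only a minimal sketch: the column types and the sample rows are assumptions chosen to fit the column names used in the examples. The final statement shows how related words are normalized; per the description above, they are expected to reduce to the same lexeme.

```sql
-- Hypothetical table definition; only the column names come from the examples.
CREATE TABLE pgweb (
    title         text,
    body          text,
    last_mod_date timestamp
);

-- Hypothetical sample rows.
INSERT INTO pgweb (title, body, last_mod_date) VALUES
    ('Greetings', 'A friendly note to all my friends', '2024-01-01 10:00'),
    ('Create table syntax', 'How to create a table in SQL', '2024-02-01 10:00');

-- Inspect the lexemes produced for a few related words.
SELECT to_tsvector('english', 'friend friends friendly');
```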

12.2.2. Creating Indexes

We can create a GIN index (Section 12.9) to speed up text searches:

CREATE INDEX pgweb_idx ON pgweb USING GIN (to_tsvector('english', body));

Notice that the two-argument version of to_tsvector is used. Only text search functions that specify a configuration name can be used in expression indexes (Section 11.7). This is because the index contents must be unaffected by default_text_search_config. If they were affected, the index contents might be inconsistent, because different entries could contain tsvectors that were created with different text search configurations, and there would be no way to tell which was which. It would be impossible to dump and restore such an index correctly.

Because the two-argument version of to_tsvector was used in the index above, only a query reference that uses the two-argument version of to_tsvector with the same configuration name will use that index. That is, WHERE to_tsvector('english', body) @@ 'a & b' can use the index, but WHERE to_tsvector(body) @@ 'a & b' cannot. This ensures that an index will be used only with the same configuration that was used to create the index entries.
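
One way to observe this difference is to compare query plans with EXPLAIN. This is a sketch only: on a small table the planner may prefer a sequential scan in both cases, so the session-level enable_seqscan setting is turned off here purely for experimentation.

```sql
-- For experimentation only: discourage sequential scans in this session.
SET enable_seqscan = off;

-- The expression matches the index definition, so pgweb_idx can be considered.
EXPLAIN
SELECT title
FROM pgweb
WHERE to_tsvector('english', body) @@ to_tsquery('english', 'friend');

-- The one-argument form does not match the index expression, so pgweb_idx is not usable.
EXPLAIN
SELECT title
FROM pgweb
WHERE to_tsvector(body) @@ to_tsquery('friend');

-- Restore the default planner behavior.
RESET enable_seqscan;
```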

It is possible to set up more complex expression indexes in which the configuration name is specified by another column, for example:

CREATE INDEX pgweb_idx ON pgweb USING GIN (to_tsvector(config_name, body));

where config_name is a column in the pgweb table. This allows mixed configurations in the same index while recording which configuration was used for each index entry. This would be useful, for example, if the document collection contained documents in different languages. Again, queries that are meant to use the index must be phrased to match, e.g., WHERE to_tsvector(config_name, body) @@ 'a & b'.
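
As a sketch of how such a column might be set up and queried: the text does not give the type of config_name, so declaring it as regconfig (the type accepted by the two-argument to_tsvector) with a default of english is an assumption. The extra filter on config_name is also an illustrative choice, restricting an English query to documents stored with the English configuration.

```sql
-- Assumption: config_name is a regconfig column; existing rows get 'english'.
ALTER TABLE pgweb
    ADD COLUMN config_name regconfig NOT NULL DEFAULT 'english';

-- The left-hand expression is written exactly as in the index definition.
SELECT title
FROM pgweb
WHERE to_tsvector(config_name, body) @@ to_tsquery('english', 'friend')
  AND config_name = 'english'::regconfig;
```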

Indexes can even concatenate columns:

CREATE INDEX pgweb_idx ON pgweb USING GIN (to_tsvector('english', title || ' ' || body));
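
A query intended to use this index would repeat the same concatenation expression, for instance as in the sketch below. Note that this expression has no coalesce calls, so a row whose title or body is NULL produces a NULL tsvector and will not be found.

```sql
-- The expression must match the index definition, including the configuration name.
SELECT title
FROM pgweb
WHERE to_tsvector('english', title || ' ' || body)
      @@ to_tsquery('english', 'create & table')
ORDER BY last_mod_date DESC
LIMIT 10;
```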

Another approach is to create a separate tsvector column to hold the output of to_tsvector. To keep this column automatically up to date with its source data, use a stored generated column. This example concatenates title and body, using coalesce to ensure that one field will still be indexed when the other is NULL:

ALTER TABLE pgweb
    ADD COLUMN textsearchable_index_col tsvector
               GENERATED ALWAYS AS (to_tsvector('english', coalesce(title, '') || ' ' || coalesce(body, ''))) STORED;

Then we create a GIN index to speed up the search:

CREATE INDEX textsearch_idx ON pgweb USING GIN (textsearchable_index_col);
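
Before searching, it can be useful to confirm that the generated column behaves as described. The row inserted below is hypothetical, and the sketch assumes that any other columns of pgweb are nullable or have defaults. The body is deliberately left NULL to show that the title is still indexed.

```sql
-- Hypothetical row with a NULL body; the generated column is filled in automatically.
INSERT INTO pgweb (title, body)
VALUES ('Create table tips', NULL);

-- The tsvector is derived from the title alone, thanks to coalesce.
SELECT title, textsearchable_index_col
FROM pgweb
WHERE title = 'Create table tips';
```

A generated column cannot be written to directly; it changes only when title or body changes.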

Now we are ready to perform a fast full text search:

SELECT title
FROM pgweb
WHERE textsearchable_index_col @@ to_tsquery('create & table')
ORDER BY last_mod_date DESC
LIMIT 10;

One advantage of the separate-column approach over an expression index is that it is not necessary to explicitly specify the text search configuration in queries in order to make use of the index. As shown in the example above, the query can depend on default_text_search_config. Another advantage is that searches will be faster, since it will not be necessary to redo the to_tsvector calls to verify index matches. (This is more important when using a GiST index than a GIN index; see Section 12.9.) The expression-index approach is simpler to set up, however, and it requires less disk space since the tsvector representation is not stored explicitly.
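
To get a rough idea of the disk space trade-off on a particular data set, the standard size functions can be used. This is a sketch only; the numbers depend entirely on the data, and pg_column_size reports the stored (possibly compressed) size of each value.

```sql
-- Size of the GIN index on the separate tsvector column.
SELECT pg_size_pretty(pg_relation_size('textsearch_idx')) AS index_size;

-- Approximate space taken by the stored tsvector values themselves.
SELECT pg_size_pretty(sum(pg_column_size(textsearchable_index_col))) AS tsvector_column_size
FROM pgweb;
```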
