11.8. 部分インデックス

PostgreSQL 18.4文書
		第11章インデックス	誤訳等の報告
前へ	上へ	11.8. 部分インデックス	次へ

11.8. 部分インデックス #

<title>Partial Indexes</title>

A <firstterm>partial index</firstterm> is an index built over a subset of a table; the subset is defined by a conditional expression (called the <firstterm>predicate</firstterm> of the partial index). The index contains entries only for those table rows that satisfy the predicate. Partial indexes are a specialized feature, but there are several situations in which they are useful. 部分インデックスとは、テーブルの部分集合に構築されるインデックスです。部分集合は、（部分インデックスの述語と呼ばれる）条件式で定義されます。部分インデックスには、その述語を満たすテーブル行のみに対するエントリが含まれます。部分インデックスは特別な機能です。しかし、これらが有用となる状況が複数あります。

One major reason for using a partial index is to avoid indexing common values. Since a query searching for a common value (one that accounts for more than a few percent of all the table rows) will not use the index anyway, there is no point in keeping those rows in the index at all. This reduces the size of the index, which will speed up those queries that do use the index. It will also speed up many table update operations because the index does not need to be updated in all cases. <xref linkend="indexes-partial-ex1"/> shows a possible application of this idea. 部分インデックスを利用する主な目的は、頻出値に対してインデックスを作成しないようにすることです。（テーブル全体の行のうち、数パーセント以上を占める）頻出値を検索する問い合わせでは、いかなる場合でもインデックスを使用しないため、インデックスにそれらの行を持ち続けることは全く意味がありません。これによりインデックスのサイズが小さくなりますので、インデックスを使用する問い合わせが速くなります。また、インデックスを更新する必要のないケースも生じるため、テーブルを更新する作業の多くも速くなります。例 11.1にこの概念に基づいた用例を示します。

例11.1 頻出値を除外するための部分インデックスの作成

<title>Setting up a Partial Index to Exclude Common Values</title>

Suppose you are storing web server access logs in a database. Most accesses originate from the IP address range of your organization but some are from elsewhere (say, employees on dial-up connections). If your searches by IP are primarily for outside accesses, you probably do not need to index the IP range that corresponds to your organization's subnet. ウェブサーバのアクセスログをデータベースに格納しているとします。多くのアクセスは、社内のIPアドレスの範囲内から発信されています。しかし、範囲外のアドレス（例えば、社員がダイアルアップ接続している場所）からの発信もあります。主に範囲外からのアクセスをIPアドレスで検索する場合、社内のサブネットに該当するIPアドレスの範囲にインデックスを作成する必要はないでしょう。

Assume a table like this: 以下のようなテーブルがあると想定します。

CREATE TABLE access_log (
    url varchar,
    client_ip inet,
    ...
);

To create a partial index that suits our example, use a command such as this: この例に適する部分インデックスを作成するには、以下のようなコマンドを使用します。

CREATE INDEX access_log_client_ip_ix ON access_log (client_ip)
WHERE NOT (client_ip > inet '192.168.100.0' AND
           client_ip < inet '192.168.100.255');

A typical query that can use this index would be: このインデックスを使用できる問い合わせの典型的な例は、以下のようなものです。

SELECT *
FROM access_log
WHERE url = '/index.html' AND client_ip = inet '212.78.10.32';

Here the query's IP address is covered by the partial index. The following query cannot use the partial index, as it uses an IP address that is excluded from the index: この問い合わせのIPアドレスは部分インデックスでカバーされています。以下の問い合わせは、インデックスから除外されているIPアドレスを使用しているので、部分インデックスを使用できません。

SELECT *
FROM access_log
WHERE url = '/index.html' AND client_ip = inet '192.168.100.23';

Observe that this kind of partial index requires that the common values be predetermined, so such partial indexes are best used for data distributions that do not change. Such indexes can be recreated occasionally to adjust for new data distributions, but this adds maintenance effort. このような部分インデックスを使用するには、あらかじめ頻出値が何であるかを知っている必要があることに注意してください。値の分布が変わらない場合に、このような部分インデックスが最善です。データの分布が新しくなった場合はインデックスの再作成によって調整できますが、これはメンテナンス作業を増やしてしまいます。

Another possible use for a partial index is to exclude values from the index that the typical query workload is not interested in; this is shown in <xref linkend="indexes-partial-ex2"/>. This results in the same advantages as listed above, but it prevents the <quote>uninteresting</quote> values from being accessed via that index, even if an index scan might be profitable in that case. Obviously, setting up partial indexes for this kind of scenario will require a lot of care and experimentation. 部分インデックスを使用する有効な他の方法としては、一般的な問い合わせに必要のない値をインデックスから取り除くことです。例 11.2を参照してください。この方法の利点は上で示したものと同じです。ただ、この方法を使用すると、インデックススキャンが適している場合でも、「必要のない」値へのインデックスを介したアクセスが防止されてしまいます。以上のことから明白なように、このようなケースで部分インデックスを作成する際は、細心の注意を払い、十分な検証を行う必要があります。

例11.2 必要のない値を除外するための部分インデックスの作成

<title>Setting up a Partial Index to Exclude Uninteresting Values</title>

If you have a table that contains both billed and unbilled orders, where the unbilled orders take up a small fraction of the total table and yet those are the most-accessed rows, you can improve performance by creating an index on just the unbilled rows. The command to create the index would look like this: 請求済み注文書および未請求注文書からなる、１つのテーブルがあるとします。そして、未請求注文書の方がテーブル全体に対する割合が小さく、かつその部分へのアクセス数が最も多かったとします。このような場合、未請求の行のみにインデックスを作成することにより、性能を向上させることができます。インデックスの作成には、以下のようなコマンドを使用します。

CREATE INDEX orders_unbilled_index ON orders (order_nr)
    WHERE billed is not true;

A possible query to use this index would be: このインデックスを使用する問い合わせの例としては、次のものが考えられます。

SELECT * FROM orders WHERE billed is not true AND order_nr < 10000;

However, the index can also be used in queries that do not involve <structfield>order_nr</structfield> at all, e.g.: しかし、このインデックスは、order_nrをまったく使用しない問い合わせでも使用することができます。以下は、その例です。

SELECT * FROM orders WHERE billed is not true AND amount > 5000.00;

This is not as efficient as a partial index on the <structfield>amount</structfield> column would be, since the system has to scan the entire index. Yet, if there are relatively few unbilled orders, using this partial index just to find the unbilled orders could be a win. この問い合わせでは、システムがインデックス全体を検索する必要があるため、amount列に部分インデックスを作成した場合ほど効率は良くありません。しかし、未請求注文書データが比較的少ない場合は、この部分インデックスを未請求注文書を検出するためだけに使用した方が効率が良い可能性があります。

Note that this query cannot use this index: 以下の問い合わせでは、このインデックスを使用できないことに注意してください。

SELECT * FROM orders WHERE order_nr = 3501;

The order 3501 might be among the billed or unbilled orders. 注文番号3501は請求済みかもしれませんし、未請求かもしれないからです。

<xref linkend="indexes-partial-ex2"/> also illustrates that the indexed column and the column used in the predicate do not need to match. <productname>PostgreSQL</productname> supports partial indexes with arbitrary predicates, so long as only columns of the table being indexed are involved. However, keep in mind that the predicate must match the conditions used in the queries that are supposed to benefit from the index. To be precise, a partial index can be used in a query only if the system can recognize that the <literal>WHERE</literal> condition of the query mathematically implies the predicate of the index. <productname>PostgreSQL</productname> does not have a sophisticated theorem prover that can recognize mathematically equivalent expressions that are written in different forms. (Not only is such a general theorem prover extremely difficult to create, it would probably be too slow to be of any real use.) The system can recognize simple inequality implications, for example <quote>x < 1</quote> implies <quote>x < 2</quote>; otherwise the predicate condition must exactly match part of the query's <literal>WHERE</literal> condition or the index will not be recognized as usable. Matching takes place at query planning time, not at run time. As a result, parameterized query clauses do not work with a partial index. For example a prepared query with a parameter might specify <quote>x < ?</quote> which will never imply <quote>x < 2</quote> for all possible values of the parameter. 例 11.2でもわかるように、インデックスが付けられた列名と、述語で使用されている列名は、一致している必要はありません。 PostgreSQLでは、インデックス付けされるテーブルの列だけが含まれているのなら、任意の述語で部分インデックスを使用できます。しかし、この述語は、インデックスを使用させたい問い合わせの条件と一致する必要があることに留意してください。正確に言うと、部分インデックスを問い合わせで使用できるのは、インデックスの述語が問い合わせのWHERE条件に数学的に当てはまるとシステムが判断できる場合のみです。 PostgreSQLには、異なった形式で記述された述語が数学的に同等のものであると判断できるような、洗練された定理証明機能はありません。（そのような汎用的な定理証明機能の作成は、非常に困難であるだけではなく、おそらく実際の利用にはあまりにも実行速度が遅過ぎるでしょう。）システムでは、例えば「x < 1」は「x < 2」を意味するというような、単純な比較演算子の意味は認識可能です。しかし、それ以外の場合は、述語条件は問い合わせのWHERE条件と完全に一致している必要があります。一致していない場合は、インデックスは使用可能と認識されません。一致するかどうかは、実行時ではなく、問い合わせ計画作成時に判定されます。したがって、パラメータ付きの問い合わせでは部分インデックスは動作しません。たとえば、「x < ?」と指定されたパラメータを持つ、プリペアド問い合わせでは、どのようなパラメータ値であっても「x < 2」を表しません。

A third possible use for partial indexes does not require the index to be used in queries at all. The idea here is to create a unique index over a subset of a table, as in <xref linkend="indexes-partial-ex3"/>. This enforces uniqueness among the rows that satisfy the index predicate, without constraining those that do not. 部分インデックスの考えられる３つ目の用法では、問い合わせでインデックスをまったく使用しません。この考え方は、テーブルの部分集合に一意インデックスを作成するというものです。例 11.3を参照してください。これにより、インデックスの述語を満たさない行を制約することなく、その述語を満たす行での一意性を強制します。

例11.3 一意な部分インデックスの作成

<title>Setting up a Partial Unique Index</title>

Suppose that we have a table describing test outcomes. We wish to ensure that there is only one <quote>successful</quote> entry for a given subject and target combination, but there might be any number of <quote>unsuccessful</quote> entries. Here is one way to do it: テストの結果が格納されているテーブルがあるとします。与えられた件名（subject）および対象（target）の組み合わせに対して、「成功」のエントリが確実に1つしかないようにします。「失敗」のエントリは、複数あっても構いません。以下に、これを実行する一例を示します。

CREATE TABLE tests (
    subject text,
    target text,
    success boolean,
    ...
);

CREATE UNIQUE INDEX tests_success_constraint ON tests (subject, target)
    WHERE success;

This is a particularly efficient approach when there are few successful tests and many unsuccessful ones. It is also possible to allow only one null in a column by creating a unique partial index with an <literal>IS NULL</literal> restriction. これは、成功するテストが少なく、失敗するテストが多い場合に特に有効な手法です。また、IS NULL制限を使用して一意の部分インデックスを作成することにより、ひとつの列にNULL値をひとつのみ許可できます。

Finally, a partial index can also be used to override the system's query plan choices. Also, data sets with peculiar distributions might cause the system to use an index when it really should not. In that case the index can be set up so that it is not available for the offending query. Normally, <productname>PostgreSQL</productname> makes reasonable choices about index usage (e.g., it avoids them when retrieving common values, so the earlier example really only saves index size, it is not required to avoid index usage), and grossly incorrect plan choices are cause for a bug report. 最後に、部分インデックスは、システムの問い合わせ計画の選択を変更するためにも使用できます。特殊なデータ分布を持つデータ集合では、システムが実際には使用すべきでないインデックスを使用してしまうことがあります。このような場合、特定の問い合わせでは使用することができないインデックスを設定することができます。通常、PostgreSQLはインデックスの使用について適切な選択を行います（例えば、頻出値の検索にはインデックスを使用しませんので、前述の例はインデックスのサイズを実際に小さくするだけのもので、インデックスの使用を制限するためには必要はありません）。まったく不適切な計画を選択するようであれば、バグとして報告してください。

Keep in mind that setting up a partial index indicates that you know at least as much as the query planner knows, in particular you know when an index might be profitable. Forming this knowledge requires experience and understanding of how indexes in <productname>PostgreSQL</productname> work. In most cases, the advantage of a partial index over a regular index will be minimal. There are cases where they are quite counterproductive, as in <xref linkend="indexes-partial-ex4"/>. 部分インデックスを作成するには、少なくとも問い合わせプランナと同等の知識を持っていること、特に、インデックスが有益となる状況を理解している必要があることに留意してください。このような知識を得るためには、PostgreSQLでインデックスがどのように機能するかを理解し、経験を積むことが必要です。ほとんどの場合、通常のインデックスと比べて、部分インデックスを使用する利点は微細です。例 11.4のように、かなり逆効果な場合があります。

例11.4 パーティショニングの代わりに部分インデックスを使用しない

<title>Do Not Use Partial Indexes as a Substitute for Partitioning</title>

You might be tempted to create a large set of non-overlapping partial indexes, for example 例えば、重複しない部分インデックスの大きなセットを作りたいと思うかもしれません。

CREATE INDEX mytable_cat_1 ON mytable (data) WHERE category = 1;
CREATE INDEX mytable_cat_2 ON mytable (data) WHERE category = 2;
CREATE INDEX mytable_cat_3 ON mytable (data) WHERE category = 3;
...
CREATE INDEX mytable_cat_N ON mytable (data) WHERE category = N;

This is a bad idea! Almost certainly, you'll be better off with a single non-partial index, declared like これは良くないアイディアです！ほとんどの場合、以下のように宣言された、部分的でない単一のインデックスを使用する方が良いでしょう。

CREATE INDEX mytable_cat_data ON mytable (category, data);

(Put the category column first, for the reasons described in <xref linkend="indexes-multicolumn"/>.) While a search in this larger index might have to descend through a couple more tree levels than a search in a smaller index, that's almost certainly going to be cheaper than the planner effort needed to select the appropriate one of the partial indexes. The core of the problem is that the system does not understand the relationship among the partial indexes, and will laboriously test each one to see if it's applicable to the current query. （11.3で説明されている理由から、最初にcategory列を指定します。）この大きなインデックスでの検索は、小さなインデックスでの検索よりも2,3ツリーレベルを下に移動する必要がありますが、部分インデックスの適切な1つを選択するためにプランナがおこなう作業よりも、ほぼ確実にコストが削減できます。この問題の核心は、システムが部分インデックス間の関係を理解していないことと、現在の問い合わせに適用出来るかどうかそれぞれ苦労してテストすることです。

If your table is large enough that a single index really is a bad idea, you should look into using partitioning instead (see <xref linkend="ddl-partitioning"/>). With that mechanism, the system does understand that the tables and indexes are non-overlapping, so far better performance is possible. テーブルが非常に大きくて、単一のインデックスが本当に悪いアイデアである場合は、代わりにパーティショニングを使用する必要があります（5.12を参照してください）。このメカニズムにより、テーブルとインデックスが重複していないことが、システムで認識されるため、パフォーマンスが大幅に向上します。

More information about partial indexes can be found in <xref linkend="ston89b"/>, <xref linkend="olson93"/>, and <xref linkend="seshadri95"/>. 部分インデックスの詳細については、[ston89b]、[olson93]、および[seshadri95]を参照してください。

前へ	上へ	次へ
11.7. 式に対するインデックス	ホーム	11.9. インデックスオンリースキャンとカバリングインデックス