F.6. bloom — ブルームフィルタインデックスアクセスメソッド

PostgreSQL 17.5文書
		付録F 追加で提供されるモジュールと拡張	誤訳等の報告
前へ	上へ	F.6. bloom — ブルームフィルタインデックスアクセスメソッド	次へ

F.6. bloom — ブルームフィルタインデックスアクセスメソッド #

<title>bloom — bloom filter index access method</title>

<literal>bloom</literal> provides an index access method based on <ulink url="https://en.wikipedia.org/wiki/Bloom_filter">Bloom filters</ulink>. bloomは、ブルームフィルタによるインデックスアクセスメソッドを提供します。

A Bloom filter is a space-efficient data structure that is used to test whether an element is a member of a set. In the case of an index access method, it allows fast exclusion of non-matching tuples via signatures whose size is determined at index creation. ブルームフィルタは、空間効率の良いデータ構造で、ある要素が集合のメンバかどうかをテストするのに用いられます。インデックスアクセスメソッドとして使用する場合、インデックス作成時に大きさが決まるシグネチャを使って、条件を満たさないタプルを高速に除外することができます。

A signature is a lossy representation of the indexed attribute(s), and as such is prone to reporting false positives; that is, it may be reported that an element is in the set, when it is not. So index search results must always be rechecked using the actual attribute values from the heap entry. Larger signatures reduce the odds of a false positive and thus reduce the number of useless heap visits, but of course also make the index larger and hence slower to scan. シグネチャはインデックス化された属性を非可逆的に表現するもので、その性質上、偽陽性の結果を出すことがあります。すなわち、集合の中にない要素が、集合の中にあると報告するかもしれません。ですから、インデックスの検索結果は、ヒープエントリ中の実際の属性値を使って、必ず再検査しなければなりません。シグネチャが大きくなれば偽陽性の可能性が下がるので不必要なヒープの検索は減りますが、もちろんそうなるとインデックスが大きくなるので、スキャンが遅くなります。

This type of index is most useful when a table has many attributes and queries test arbitrary combinations of them. A traditional btree index is faster than a bloom index, but it can require many btree indexes to support all possible queries where one needs only a single bloom index. Note however that bloom indexes only support equality queries, whereas btree indexes can also perform inequality and range searches. この種のインデックスは、テーブルに多数の属性があり、その任意の組み合わせを検索する問い合わせを実行するときにもっとも有効です。伝統的なbtreeインデックスはブルームインデックスよりも高速ですが、可能なすべての問い合わせをサポートするためには多数のbtreeインデックスが必要なのに対し、ブルームインデックスでは、たった一つのブルームインデックスだけで事足ります。しかし、ブルームインデックスでは等価検索だけをサポートすることに注意してください。 btreeインデックスでは、等価だけでなく、範囲検索も実行できます。

F.6.1. パラメータ #

<title>Parameters</title>

A <literal>bloom</literal> index accepts the following parameters in its <literal>WITH</literal> clause: bloomインデックスは、WITH句中の以下のパラメータを受け付けます。

length: Length of each signature (index entry) in bits. It is rounded up to the nearest multiple of <literal>16</literal>. The default is <literal>80</literal> bits and the maximum is <literal>4096</literal>. ビット単位の個々のシグネチャ（インデックスエントリ）の長さ。 16の倍数に近い値に丸められます。デフォルトは80ビットで、最大値は4096です。

col1 — col32: Number of bits generated for each index column. Each parameter's name refers to the number of the index column that it controls. The default is <literal>2</literal> bits and the maximum is <literal>4095</literal>. Parameters for index columns not actually used are ignored. 各インデックスカラムに対して生成するビット数。各々のパラメータ名は、管理対象のインデックスカラムの番号です。デフォルトは2ビットで、最大値は4095です。実際には使用されないインデックスカラムについてのパラメータは無視されます。

F.6.2. 例 #

<title>Examples</title>

This is an example of creating a bloom index: ブルームインデックスの作成例です。

CREATE INDEX bloomidx ON tbloom USING bloom (i1,i2,i3)
       WITH (length=80, col1=2, col2=2, col3=4);

The index is created with a signature length of 80 bits, with attributes i1 and i2 mapped to 2 bits, and attribute i3 mapped to 4 bits. We could have omitted the <literal>length</literal>, <literal>col1</literal>, and <literal>col2</literal> specifications since those have the default values. このインデックスは80ビット長のシグネチャで作成され、属性i1とi2は2ビットに、i3は4ビットにマップされます。 length、col1、col2指定はデフォルト値を使っているので、省略しても構いません。

Here is a more complete example of bloom index definition and usage, as well as a comparison with equivalent btree indexes. The bloom index is considerably smaller than the btree index, and can perform better. より完全なブルームインデックスの定義と使用法を示します。比較のために、これと同等のbtreeインデックスも併せて示します。ブルームインデックスはbtreeインデックスよりもかなり小さく、また、より良い性能を発揮できるかもしれません。

=# CREATE TABLE tbloom AS
   SELECT
     (random() * 1000000)::int as i1,
     (random() * 1000000)::int as i2,
     (random() * 1000000)::int as i3,
     (random() * 1000000)::int as i4,
     (random() * 1000000)::int as i5,
     (random() * 1000000)::int as i6
   FROM
  generate_series(1,10000000);
SELECT 10000000

A sequential scan over this large table takes a long time: これだけ大きなテーブルに対するシーケンシャルスキャンは長い時間がかかります。

=# EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 898732 AND i5 = 123451;
                                              QUERY PLAN
------------------------------------------------------------------------------------------------------
 Seq Scan on tbloom  (cost=0.00..213744.00 rows=250 width=24) (actual time=357.059..357.059 rows=0 loops=1)
   Filter: ((i2 = 898732) AND (i5 = 123451))
   Rows Removed by Filter: 10000000
 Planning Time: 0.346 ms
 Execution Time: 357.076 ms
(5 rows)

Even with the btree index defined the result will still be a sequential scan: たとえbtreeインデックスが定義されていたとしても、結果はまだシーケンシャルスキャンです。

=# CREATE INDEX btreeidx ON tbloom (i1, i2, i3, i4, i5, i6);
CREATE INDEX
=# SELECT pg_size_pretty(pg_relation_size('btreeidx'));
 pg_size_pretty
----------------
 386 MB
(1 row)
=# EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 898732 AND i5 = 123451;
                                              QUERY PLAN
------------------------------------------------------------------------------------------------------
 Seq Scan on tbloom  (cost=0.00..213744.00 rows=2 width=24) (actual time=351.016..351.017 rows=0 loops=1)
   Filter: ((i2 = 898732) AND (i5 = 123451))
   Rows Removed by Filter: 10000000
 Planning Time: 0.138 ms
 Execution Time: 351.035 ms
(5 rows)

Having the bloom index defined on the table is better than btree in handling this type of search: そのテーブルにブルームインデックスが定義されていれば、btreeよりもこの種の検索をうまく扱います。

=# CREATE INDEX bloomidx ON tbloom USING bloom (i1, i2, i3, i4, i5, i6);
CREATE INDEX
=# SELECT pg_size_pretty(pg_relation_size('bloomidx'));
 pg_size_pretty
----------------
 153 MB
(1 row)
=# EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 898732 AND i5 = 123451;
                                                     QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on tbloom  (cost=1792.00..1799.69 rows=2 width=24) (actual time=22.605..22.606 rows=0 loops=1)
   Recheck Cond: ((i2 = 898732) AND (i5 = 123451))
   Rows Removed by Index Recheck: 2300
   Heap Blocks: exact=2256
   ->  Bitmap Index Scan on bloomidx  (cost=0.00..178436.00 rows=1 width=0) (actual time=20.005..20.005 rows=2300 loops=1)
         Index Cond: ((i2 = 898732) AND (i5 = 123451))
 Planning Time: 0.099 ms
 Execution Time: 22.632 ms
(8 rows)

Now, the main problem with the btree search is that btree is inefficient when the search conditions do not constrain the leading index column(s). A better strategy for btree is to create a separate index on each column. Then the planner will choose something like this: btree検索の主要な問題は、検索条件が、先頭（そしてそれに続く）インデックスカラムを使用しないときに、効率が悪くなってしまうことです。 btreeでは各々のカラムに対して別々のインデックスを作るのが良い戦略です。するとプランはこのような選択をします。

=# CREATE INDEX btreeidx1 ON tbloom (i1);
CREATE INDEX
=# CREATE INDEX btreeidx2 ON tbloom (i2);
CREATE INDEX
=# CREATE INDEX btreeidx3 ON tbloom (i3);
CREATE INDEX
=# CREATE INDEX btreeidx4 ON tbloom (i4);
CREATE INDEX
=# CREATE INDEX btreeidx5 ON tbloom (i5);
CREATE INDEX
=# CREATE INDEX btreeidx6 ON tbloom (i6);
CREATE INDEX
=# EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 898732 AND i5 = 123451;
                                                        QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on tbloom  (cost=9.29..13.30 rows=1 width=24) (actual time=0.032..0.033 rows=0 loops=1)
   Recheck Cond: ((i5 = 123451) AND (i2 = 898732))
   ->  BitmapAnd  (cost=9.29..9.29 rows=1 width=0) (actual time=0.047..0.047 rows=0 loops=1)
         ->  Bitmap Index Scan on btreeidx5  (cost=0.00..4.52 rows=11 width=0) (actual time=0.026..0.026 rows=7 loops=1)
               Index Cond: (i5 = 123451)
         ->  Bitmap Index Scan on btreeidx2  (cost=0.00..4.52 rows=11 width=0) (actual time=0.007..0.007 rows=8 loops=1)
               Index Cond: (i2 = 898732)
 Planning Time: 0.264 ms
 Execution Time: 0.047 ms
(9 rows)

Although this query runs much faster than with either of the single indexes, we pay a penalty in index size. Each of the single-column btree indexes occupies 88.5 MB, so the total space needed is 531 MB, over three times the space used by the bloom index. 個別のインデックスのどれかを使うよりもこの問い合わせはずっと高速に実行できますが、インデックスのサイズにペナルティを払わなければなりません。各々の単一カラムのbtreeインデックスは、88.5MBになります。ですから、全体で必要なスペースは531MBです。ブルームインデックスで使用するスペースの3倍以上です。

F.6.3. 演算子クラスインタフェース #

<title>Operator Class Interface</title>

An operator class for bloom indexes requires only a hash function for the indexed data type and an equality operator for searching. This example shows the operator class definition for the <type>text</type> data type: ブルームインデックスの演算子クラスには、インデックス対象のデータ型に対するハッシュ関数と、検索のための等価演算子だけが必要です。この例では、textデータ型に対する演算子クラスの定義を示します。

CREATE OPERATOR CLASS text_ops
DEFAULT FOR TYPE text USING bloom AS
    OPERATOR    1   =(text, text),
    FUNCTION    1   hashtext(text);

F.6.4. 制限事項 #

<title>Limitations</title>

Only operator classes for <type>int4</type> and <type>text</type> are included with the module. このモジュールには、int4とtextに対する演算子クラスだけが含まれています。
Only the <literal>=</literal> operator is supported for search. But it is possible to add support for arrays with union and intersection operations in the future. =演算子だけが検索ではサポートされています。しかし、配列の和、積演算のサポートを将来追加することは可能です。
<literal>bloom</literal> access method doesn't support <literal>UNIQUE</literal> indexes. bloomアクセスメソッドはUNIQUEインデックスをサポートしていません。
<literal>bloom</literal> access method doesn't support searching for <literal>NULL</literal> values. bloomアクセスメソッドはNULL値の検索をサポートしていません。

F.6.5. 作者 #

<title>Authors</title>

Teodor Sigaev <teodor@postgrespro.ru>, Postgres Professional, Moscow, Russia

Alexander Korotkov <a.korotkov@postgrespro.ru>, Postgres Professional, Moscow, Russia

Oleg Bartunov <obartunov@postgrespro.ru>, Postgres Professional, Moscow, Russia

前へ	上へ	次へ
F.5. basic_archive — WALアーカイブモジュールの例	ホーム	F.7. btree_gin — GIN演算子クラスとB-tree動作