64.6. ハッシュインデックス

PostgreSQL 17.6文書
		第64章組み込みインデックスアクセスメソッド	誤訳等の報告
前へ	上へ	64.6. ハッシュインデックス	次へ

64.6. ハッシュインデックス #

<title>Hash Indexes</title>

64.6.1. 概要 #

<title>Overview</title>

<productname>PostgreSQL</productname> includes an implementation of persistent on-disk hash indexes, which are fully crash recoverable. Any data type can be indexed by a hash index, including data types that do not have a well-defined linear ordering. Hash indexes store only the hash value of the data being indexed, thus there are no restrictions on the size of the data column being indexed. PostgreSQLには、クラッシュから完全に回復可能なディスク上の永続的なハッシュインデックスの実装が含まれています。明確な線形順序付けを持たないものも含め、すべてのデータ型がハッシュインデックスでインデックス可能です。ハッシュインデックスは、インデックスされる値のハッシュ値のみを保存するので、インデックス対象の列のサイズは制限となりません。

Hash indexes support only single-column indexes and do not allow uniqueness checking. ハッシュインデックスは単一列のインデックスのみをサポートし、唯一性のチェックはできません。

Hash indexes support only the <literal>=</literal> operator, so WHERE clauses that specify range operations will not be able to take advantage of hash indexes. ハッシュインデックスは=演算子のみをサポートしており、範囲演算を指定するWHERE句はハッシュインデックスの恩恵をこうむることができません。

Each hash index tuple stores just the 4-byte hash value, not the actual column value. As a result, hash indexes may be much smaller than B-trees when indexing longer data items such as UUIDs, URLs, etc. The absence of the column value also makes all hash index scans lossy. Hash indexes may take part in bitmap index scans and backward scans. 各インデックスタプルは単なる4バイトのハッシュ値で、実際の列の値ではありません。そのため、UUIDやURLのような大きなデータをインデックスすると、ハッシュインデックスはB-treeよりもずっと小さくなるかも知れません。また、列値が含まれていないため、すべてのハッシュスキャンは損失がある(lossy)ものになります。ハッシュインデックスはビットマップインデックススキャンおよび後方スキャンに参加できます。

Hash indexes are best optimized for SELECT and UPDATE-heavy workloads that use equality scans on larger tables. In a B-tree index, searches must descend through the tree until the leaf page is found. In tables with millions of rows, this descent can increase access time to data. The equivalent of a leaf page in a hash index is referred to as a bucket page. In contrast, a hash index allows accessing the bucket pages directly, thereby potentially reducing index access time in larger tables. This reduction in "logical I/O" becomes even more pronounced on indexes/data larger than shared_buffers/RAM. ハッシュインデックスは、大きなテーブルに対して同値スキャンを使用するSELECTとUPDATEを多用するワークロードに対して最適です。 B-treeインデックスでは、スキャンはリーフページが見つかるまで木を降下しなければなりません。何百万行のテーブルではこの降下スキャンによりデータをアクセスする時間がかかることがあります。ハッシュインデックスにおけるリーフページに相当するものはバケットページと呼ばれます。対照的に、ハッシュインデックスはバケットページを直接アクセスすることが可能で、大きなテーブルでのインデックスアクセスの時間を短縮できる可能性があります。「論理的なI/O」における時間短縮は、共有バッファ/RAMよりもインデックス/データが大きな時にはより顕著になります。

Hash indexes have been designed to cope with uneven distributions of hash values. Direct access to the bucket pages works well if the hash values are evenly distributed. When inserts mean that the bucket page becomes full, additional overflow pages are chained to that specific bucket page, locally expanding the storage for index tuples that match that hash value. When scanning a hash bucket during queries, we need to scan through all of the overflow pages. Thus an unbalanced hash index might actually be worse than a B-tree in terms of number of block accesses required, for some data. ハッシュインデックスはハッシュ値の均等ではない分布を想定して設計されています。バケットページへのアクセスはハッシュ値が均一に分布している時にうまく働きます。挿入によりバケットページが満杯になると、追加のオーバーフローページが特定のバケットページに連結され、そのハッシュ値に適合するインデックスタプル用の領域を局所的に拡張します。問い合わせ中にハッシュバケットをスキャンする際は、すべてのオーバーフローページをスキャンする必要があります。そのため、不均等なハッシュインデックスは、いくつかのデータに対してアクセスが必要なブロックの数の点で、Bツリーよりも悪いかも知れません。

As a result of the overflow cases, we can say that hash indexes are most suitable for unique, nearly unique data or data with a low number of rows per hash bucket. One possible way to avoid problems is to exclude highly non-unique values from the index using a partial index condition, but this may not be suitable in many cases. オーバーフローが発生する場合を考慮すると、ハッシュインデックスは一意か、ほぼ一意に近いデータあるいは、それぞれのハッシュバケットに少数の行があるデータがもっとも適していると言えます。問題を回避できる方法の１つは部分インデックス条件を使って極端に一意ではない値を排除することですが、多くの場合にこれが適しているとは言えないかも知れません。

Like B-Trees, hash indexes perform simple index tuple deletion. This is a deferred maintenance operation that deletes index tuples that are known to be safe to delete (those whose item identifier's LP_DEAD bit is already set). If an insert finds no space is available on a page we try to avoid creating a new overflow page by attempting to remove dead index tuples. Removal cannot occur if the page is pinned at that time. Deletion of dead index pointers also occurs during VACUUM. Bツリーのように、ハッシュインデックスは単純なインデックスタプルの削除を行います。これは削除しても安全であると分かる（アイテム識別子のLP_DEADビットがすでにセットされている）インデックスタプルを削除する遅延メンテナンス操作です。挿入の際にページに領域が確保できない場合は、不要インデックスタプルを削除することによって、新しいオーバーフローページの作成を回避しようとします。その時点でそのページにピンがある場合は削除することはできません。不要インデックスポインタの削除もVACUUM中に発生します。

If it can, VACUUM will also try to squeeze the index tuples onto as few overflow pages as possible, minimizing the overflow chain. If an overflow page becomes empty, overflow pages can be recycled for reuse in other buckets, though we never return them to the operating system. There is currently no provision to shrink a hash index, other than by rebuilding it with REINDEX. There is no provision for reducing the number of buckets, either. 可能ならば、VACUUMはインデックスタプルをできるだけ少ないオーバーフローページに押し込むことも試み、オーバーフローの連結を最小限に抑えます。あるオーバーフローページが空になったらそのオーバーフローページは再利用できますが、オペレーティングシステムに返却することはありません。今の所、REINDEXで再構築する以外にハッシュインデックスを縮小するようにする予定はありません。バケット数を少なくする予定もありません。

Hash indexes may expand the number of bucket pages as the number of rows indexed grows. The hash key-to-bucket-number mapping is chosen so that the index can be incrementally expanded. When a new bucket is to be added to the index, exactly one existing bucket will need to be "split", with some of its tuples being transferred to the new bucket according to the updated key-to-bucket-number mapping. ハッシュインデックスはインデックスされた行数が増えるとバケットページ数も拡張します。ハッシュキーからバケット番号へのマッピングは、インデックスが徐々に拡張できるように選択されます。新しいバケットがインデックスに追加されることになったら、存在しているバケットの厳密に一つが「分割」される必要があります。更新されたハッシュキーからバケット番号へのマッピングにしたがって、タプルが新しいバケットに転送されます。

The expansion occurs in the foreground, which could increase execution time for user inserts. Thus, hash indexes may not be suitable for tables with rapidly increasing number of rows. その拡張はフォアグラウンドで行われるので、ユーザが挿入を実行するのにかかる時間を増加させるでしょう。ですから、ハッシュインデックスは行数が急激に拡張するテーブルには適していないかもしれません。

64.6.2. 実装 #

<title>Implementation</title>

There are four kinds of pages in a hash index: the meta page (page zero), which contains statically allocated control information; primary bucket pages; overflow pages; and bitmap pages, which keep track of overflow pages that have been freed and are available for re-use. For addressing purposes, bitmap pages are regarded as a subset of the overflow pages. ハッシュインデックスには4種類のページがあります。静的に確保された制御情報を持つメタページ(ページ0)、主バケットページ、オーバーフローページ、解放されて再利用が可能なオーバーフローページを追跡するビットマップページ、です。アドレッシング目的という点では、ビットマップページはオーバーフローページのサブセットと見なされます。

Both scanning the index and inserting tuples require locating the bucket where a given tuple ought to be located. To do this, we need the bucket count, highmask, and lowmask from the metapage; however, it's undesirable for performance reasons to have to have to lock and pin the metapage for every such operation. Instead, we retain a cached copy of the metapage in each backend's relcache entry. This will produce the correct bucket mapping as long as the target bucket hasn't been split since the last cache refresh. インデックスのスキャンおよびタプルの挿入は両者とも、与えられたタプルが置かれるべきバケットを特定する必要があります。これを実施するためには、メタページ内のバケット数、上位マスク、下位マスクが必要です。しかし、性能上の理由で、そのような操作を行うたびにメタページをロックしてピンを立てるのは好ましいことではありません。そうする代わりに、それぞれのバックエンドのリレーションキャッシュ(relcache)のエントリにキャッシュされたメタページの複製を保持します。最後にキャッシュが更新されてから目的のバケットが分割されていない限り、これは正しいバケットのマッピングを生成します。

Primary bucket pages and overflow pages are allocated independently since any given index might need more or fewer overflow pages relative to its number of buckets. The hash code uses an interesting set of addressing rules to support a variable number of overflow pages while not having to move primary bucket pages around after they are created. 特定のインデックスにおいて、バケット数に対して必要なオーバーフローページは増減する可能性があるため、主バケットページとオーバーフローページは独立して確保されます。ハッシュコードは可変数のオーバーフローページをサポートするために一連の興味深いアドレッシング規則を使用しており、主バケットページを作成後に移動する必要もありません。

Each row in the table indexed is represented by a single index tuple in the hash index. Hash index tuples are stored in bucket pages, and if they exist, overflow pages. We speed up searches by keeping the index entries in any one index page sorted by hash code, thus allowing binary search to be used within an index page. Note however that there is *no* assumption about the relative ordering of hash codes across different index pages of a bucket. インデックス付されたテーブル内の各行はハッシュインデックスにおいては単一のインデックスタプルで表現されています。ハッシュインデックスタプルはバケットページに格納され、オーバーフローページが存在するならそこにも存在します。一つのインデックスページにインデックスエントリをハッシュコードによりソートされた状態で保持することで、インデックスページ内での二分探索を可能とし、探索を高速化しています。しかし、バケット内の異なるインデックスページ間においては、ハッシュコードの相対的な順序付けに何も前提はないことに留意してください。

The bucket splitting algorithms to expand the hash index are too complex to be worthy of mention here, though are described in more detail in <filename>src/backend/access/hash/README</filename>. The split algorithm is crash safe and can be restarted if not completed successfully. ハッシュインデックスを拡張するためにバケットを分割するアルゴリズムは複雑過ぎてここで言及するには及びませんが、より詳細がsrc/backend/access/hash/READMEに記載されています。分割アルゴリズムはクラッシュ耐性があり、正常に完了していなくても再スタートできます。

前へ	上へ	次へ
64.5. BRINインデックス	ホーム	第65章データベースの物理的な格納