68.3. 拡張性

PostgreSQL 16.0文書
		第68章 GiSTインデックス	誤訳等の報告
前へ	上へ	68.3. 拡張性	次へ

68.3. 拡張性 #

<title>Extensibility</title>

Traditionally, implementing a new index access method meant a lot of difficult work. It was necessary to understand the inner workings of the database, such as the lock manager and Write-Ahead Log. The <acronym>GiST</acronym> interface has a high level of abstraction, requiring the access method implementer only to implement the semantics of the data type being accessed. The <acronym>GiST</acronym> layer itself takes care of concurrency, logging and searching the tree structure. 伝統的に、新しいインデックスメソッドの実装は、非常に難しい作業を意味していました。ロックマネージャやログ先行書き込みなどデータベースの内部動作を理解する必要がありました。 GiSTインタフェースは高度に抽象化されており、アクセスメソッドの実装者には、アクセスするデータ型のセマンティクスのみの実装を要求します。 GiST層自身が同時実行性、ログ処理、ツリー構造の検索処理に関する注意を行います。

This extensibility should not be confused with the extensibility of the other standard search trees in terms of the data they can handle. For example, <productname>PostgreSQL</productname> supports extensible B-trees and hash indexes. That means that you can use <productname>PostgreSQL</productname> to build a B-tree or hash over any data type you want. But B-trees only support range predicates (<literal><</literal>, <literal>=</literal>, <literal>></literal>), and hash indexes only support equality queries. この拡張性と、他の、扱うことができるデータを対象とした標準検索ツリーの拡張性とを混同すべきではありません。例えば、PostgreSQLは拡張可能なB-treeとハッシュインデックスをサポートしています。これは、PostgreSQLを使用して、任意のデータ型に対するB-treeやハッシュを構築することができることを意味します。しかし、B-treeは範囲述語（<、=、>）のみをサポートし、ハッシュインデックスは等価性問い合わせのみをサポートします。

So if you index, say, an image collection with a <productname>PostgreSQL</productname> B-tree, you can only issue queries such as <quote>is imagex equal to imagey</quote>, <quote>is imagex less than imagey</quote> and <quote>is imagex greater than imagey</quote>. Depending on how you define <quote>equals</quote>, <quote>less than</quote> and <quote>greater than</quote> in this context, this could be useful. However, by using a <acronym>GiST</acronym> based index, you could create ways to ask domain-specific questions, perhaps <quote>find all images of horses</quote> or <quote>find all over-exposed images</quote>. ですから、PostgreSQLのB-treeで例えば画像群をインデックス付けする場合、「画像xは画像yと同じか」、「画像xは画像yより小さいか」、「画像xは画像yより大きいか」といった問い合わせのみ発行することができます。この文脈でどのように「同じか」や「より小さいか」、「より大きいか」を定義するかに依存して、これが有意なこともあるでしょう。しかし、GiSTを基にしたインデックスを使用すれば、問題分野に特化した、おそらくは、「馬の画像を全て見つけたい」、「露出オーバーの写真をすべて見つけたい」といった質問に答えられる手段を作成することができます。

All it takes to get a <acronym>GiST</acronym> access method up and running is to implement several user-defined methods, which define the behavior of keys in the tree. Of course these methods have to be pretty fancy to support fancy queries, but for all the standard queries (B-trees, R-trees, etc.) they're relatively straightforward. In short, <acronym>GiST</acronym> combines extensibility along with generality, code reuse, and a clean interface. GiSTアクセスメソッドを有効にし、実行するために行なわなければならないことは、ツリーのキーの動作を定義する、複数のユーザ定義のメソッドを実装することです。当然ながら、これらのメソッドは手の込んだ問い合わせをサポートするためかなり意匠を凝らす必要があります。しかし、すべての標準的な問い合わせ（B-treeやR-treeなど）ではこれらは、相対的に見てごく簡単です。まとめると、GiSTは汎用性、コード再利用、整理されたインタフェースと拡張性を兼ね備えたものです。

There are five methods that an index operator class for <acronym>GiST</acronym> must provide, and six that are optional. Correctness of the index is ensured by proper implementation of the <function>same</function>, <function>consistent</function> and <function>union</function> methods, while efficiency (size and speed) of the index will depend on the <function>penalty</function> and <function>picksplit</function> methods. Two optional methods are <function>compress</function> and <function>decompress</function>, which allow an index to have internal tree data of a different type than the data it indexes. The leaves are to be of the indexed data type, while the other tree nodes can be of any C struct (but you still have to follow <productname>PostgreSQL</productname> data type rules here, see about <literal>varlena</literal> for variable sized data). If the tree's internal data type exists at the SQL level, the <literal>STORAGE</literal> option of the <command>CREATE OPERATOR CLASS</command> command can be used. The optional eighth method is <function>distance</function>, which is needed if the operator class wishes to support ordered scans (nearest-neighbor searches). The optional ninth method <function>fetch</function> is needed if the operator class wishes to support index-only scans, except when the <function>compress</function> method is omitted. The optional tenth method <function>options</function> is needed if the operator class has user-specified parameters. The optional eleventh method <function>sortsupport</function> is used to speed up building a <acronym>GiST</acronym> index. GiST用の演算子クラスが提供しなければならないメソッドが5つ、オプションで提供可能なメソッドが6つあります。インデックスの正確性は、same、consistent、unionメソッドを適切に実装することで保証されます。一方、インデックスの効率（容量と速度）はpenaltyとpicksplitメソッドに依存します。オプションのメソッドの２つは、compressとdecompressです。これによりインデックスはインデックス付けするデータと異なるデータ型のツリーデータを内部で持つことができるようになります。リーフはインデックス付けするデータ型となりますが、他のツリーノードは何らかのC構造体を取ることができます。（しかしここでもPostgreSQLのデータ型規約に従わなければなりません。容量が可変のデータに関してはvarlenaを参照してください。）ツリーの内部データ型がSQLレベルで存在する場合、CREATE OPERATOR CLASSコマンドのSTORAGEオプションを使用することができます。オプションの8番目のメソッドはdistanceです。これは演算子クラスに順序付けスキャン（最近傍検索）をサポートさせたい場合に必要です。オプションの9番目のメソッドfetchは、compressメソッドが省略されている場合を除き、演算子クラスがインデックスオンリースキャンをサポートしたい場合に必要になります。オプションの10番目のメソッドoptionsは、演算子クラスがユーザに固有のパラメータを持つ場合に必要です。オプションの11番目のメソッドsortsupportは、GiSTインデックスの構築を高速にするのに使われます。

consistent

Given an index entry <literal>p</literal> and a query value <literal>q</literal>, this function determines whether the index entry is <quote>consistent</quote> with the query; that is, could the predicate <quote><replaceable>indexed_column</replaceable> <replaceable>indexable_operator</replaceable> <literal>q</literal></quote> be true for any row represented by the index entry? For a leaf index entry this is equivalent to testing the indexable condition, while for an internal tree node this determines whether it is necessary to scan the subtree of the index represented by the tree node. When the result is <literal>true</literal>, a <literal>recheck</literal> flag must also be returned. This indicates whether the predicate is certainly true or only possibly true. If <literal>recheck</literal> = <literal>false</literal> then the index has tested the predicate condition exactly, whereas if <literal>recheck</literal> = <literal>true</literal> the row is only a candidate match. In that case the system will automatically evaluate the <replaceable>indexable_operator</replaceable> against the actual row value to see if it is really a match. This convention allows <acronym>GiST</acronym> to support both lossless and lossy index structures. インデックス項目pと問い合わせ値qが与えられると、この関数はインデックス項目が問い合わせと「一貫性」があるかどうか、つまり、述語「indexed_columnindexable_operator q」が、インデックス項目で表現される行に対して真かどうかを決定します。リーフインデックス項目では、これはインデックス付条件の試験と等価です。一方で内部ツリーノードでは、これはツリーノードで表現されるインデックスの副ツリーを走査する必要があるかどうかを決定します。結果がtrueならば、recheckフラグも返されなければなりません。これは、述語が確実に真なのか一部のみ真なのかを示します。 recheck = falseならば、インデックスは述語条件を正確に試験されたことを示し、recheck= trueならば行が単に一致候補であることを示します。この場合、システムは自動的にindexable_operatorを実際の行値に対して評価し、本当に一致するかどうか確認します。この規則により、GiSTはインデックス構造が非可逆な場合でも可逆な場合でもサポートすることができます。

The <acronym>SQL</acronym> declaration of the function must look like this: この関数のSQL宣言は以下のようになります。

CREATE OR REPLACE FUNCTION my_consistent(internal, data_type, smallint, oid, internal)
RETURNS bool
AS 'MODULE_PATHNAME'
LANGUAGE C STRICT;

And the matching code in the C module could then follow this skeleton: そして、Cモジュール内の対応するコードは以下のような骨格に従うことになります。

PG_FUNCTION_INFO_V1(my_consistent);

Datum
my_consistent(PG_FUNCTION_ARGS)
{
    GISTENTRY  *entry = (GISTENTRY *) PG_GETARG_POINTER(0);
    data_type  *query = PG_GETARG_DATA_TYPE_P(1);
    StrategyNumber strategy = (StrategyNumber) PG_GETARG_UINT16(2);
    /* Oid subtype = PG_GETARG_OID(3); */
    bool       *recheck = (bool *) PG_GETARG_POINTER(4);
    data_type  *key = DatumGetDataType(entry->key);
    bool        retval;

    /*

     * determine return value as a function of strategy, key and query.
     *
     * Use GIST_LEAF(entry) to know where you're called in the index tree,
     * which comes handy when supporting the = operator for example (you could
     * check for non empty union() in non-leaf nodes and equality in leaf
     * nodes).

     * strategy、keyおよびqueryの関数として戻り値を決定してください。
     *
     * インデックスツリー内のどこで呼びだされているかを知るためGIST_LEAF(entry)を使用してください。
     * それは、例えば = 演算子をサポートする場合重宝です
     *（非リーフノードにおける空でないunion()とリーフノードにおける等価性を検査することができます）。
     */


    *recheck = true;        /* or false if check is exact */

    *recheck = true;        /* もしくは検査が正確であれば偽 */

    PG_RETURN_BOOL(retval);
}

Here, <varname>key</varname> is an element in the index and <varname>query</varname> the value being looked up in the index. The <literal>StrategyNumber</literal> parameter indicates which operator of your operator class is being applied — it matches one of the operator numbers in the <command>CREATE OPERATOR CLASS</command> command. ここで、keyはインデックス要素であり、queryはインデックスに対して検索される値です。 StrategyNumberパラメータは、演算子クラスのどの演算子が適用されるかを示します。これはCREATE OPERATOR CLASSコマンドの演算子番号の1つに一致します。

Depending on which operators you have included in the class, the data type of <varname>query</varname> could vary with the operator, since it will be whatever type is on the right-hand side of the operator, which might be different from the indexed data type appearing on the left-hand side. (The above code skeleton assumes that only one type is possible; if not, fetching the <varname>query</varname> argument value would have to depend on the operator.) It is recommended that the SQL declaration of the <function>consistent</function> function use the opclass's indexed data type for the <varname>query</varname> argument, even though the actual type might be something else depending on the operator. 演算子の右辺にはいかなる型も来ることがあり、それは左辺に現れるインデックス付けされたデータ型とは違うものかもしれませんので、このクラスにどの演算子を含めたかに依存して、queryのデータ型は演算子に応じて変動することがあります。 (上のコードの骨格は型が1つだけ可能であることを仮定しています。そうでなければ、query引数の値を取得するのは演算子に依存しないといけないでしょう。) consistent関数のSQL宣言では、実際の型は演算子に依存して何か他のものであるとしても、query引数の演算子クラスのインデックス付けされたデータ型を使うことをお勧めします。

union

This method consolidates information in the tree. Given a set of entries, this function generates a new index entry that represents all the given entries. このメソッドはツリー内の情報を統合します。項目の集合が与えられると、この関数は与えられた項目すべてを表現するインデックス項目を新しく生成します。

The <acronym>SQL</acronym> declaration of the function must look like this: この関数のSQL宣言は以下のようになります。

CREATE OR REPLACE FUNCTION my_union(internal, internal)
RETURNS storage_type
AS 'MODULE_PATHNAME'
LANGUAGE C STRICT;

And the matching code in the C module could then follow this skeleton: そして、Cモジュール内の対応するコードは以下のような骨格に従うことになります。

PG_FUNCTION_INFO_V1(my_union);

Datum
my_union(PG_FUNCTION_ARGS)
{
    GistEntryVector *entryvec = (GistEntryVector *) PG_GETARG_POINTER(0);
    GISTENTRY  *ent = entryvec->vector;
    data_type  *out,
               *tmp,
               *old;
    int         numranges,
                i = 0;

    numranges = entryvec->n;
    tmp = DatumGetDataType(ent[0].key);
    out = tmp;

    if (numranges == 1)
    {
        out = data_type_deep_copy(tmp);

        PG_RETURN_DATA_TYPE_P(out);
    }

    for (i = 1; i < numranges; i++)
    {
        old = out;
        tmp = DatumGetDataType(ent[i].key);
        out = my_union_implementation(out, tmp);
    }

    PG_RETURN_DATA_TYPE_P(out);
}

As you can see, in this skeleton we're dealing with a data type where <literal>union(X, Y, Z) = union(union(X, Y), Z)</literal>. It's easy enough to support data types where this is not the case, by implementing the proper union algorithm in this <acronym>GiST</acronym> support method. ご覧になったように、この骨格でunion(X, Y, Z) = union(union(X, Y), Z)であるようなデータ型を処理しています。このGiSTサポートメソッドに適切なunionアルゴリズムを実装することで、このような場合以外のデータ型をサポートすることは非常に容易です。

The result of the <function>union</function> function must be a value of the index's storage type, whatever that is (it might or might not be different from the indexed column's type). The <function>union</function> function should return a pointer to newly <function>palloc()</function>ed memory. You can't just return the input value as-is, even if there is no type change. union関数の結果は、(インデックス付けされた列の型とは異なるかもしれないし、異ならないかもしれませんが)それが何であれインデックスの格納型の値でなければなりません。 union関数は新たにpalloc()されたメモリへのポインタを返さなければなりません。型の変更がなかったとしても、入力値をそのまま返すことはできません。

As shown above, the <function>union</function> function's first <type>internal</type> argument is actually a <structname>GistEntryVector</structname> pointer. The second argument is a pointer to an integer variable, which can be ignored. (It used to be required that the <function>union</function> function store the size of its result value into that variable, but this is no longer necessary.) 上に示したように、union関数の1番目のinternal引数は実際はGistEntryVectorのポインタです。 2番目の引数は整数の変数へのポインタであり、無視できます。 (union関数がその結果値の大きさをその変数に保存するのに必要だったのですが、これはもはや必要ではありません。)

compress

Converts a data item into a format suitable for physical storage in an index page. If the <function>compress</function> method is omitted, data items are stored in the index without modification. データ項目をインデックスページ内の物理的な格納に適した形式に変換します。 compressメソッドが省略されている場合、データ項目は変更されずにインデックスに格納されます。

The <acronym>SQL</acronym> declaration of the function must look like this: この関数のSQL宣言は以下のようになります。

CREATE OR REPLACE FUNCTION my_compress(internal)
RETURNS internal
AS 'MODULE_PATHNAME'
LANGUAGE C STRICT;

And the matching code in the C module could then follow this skeleton: そして、Cモジュール内の対応するコードは以下のような骨格に従うことになります。

PG_FUNCTION_INFO_V1(my_compress);

Datum
my_compress(PG_FUNCTION_ARGS)
{
    GISTENTRY  *entry = (GISTENTRY *) PG_GETARG_POINTER(0);
    GISTENTRY  *retval;

    if (entry->leafkey)
    {

        /* replace entry-&gt;key with a compressed version */

        /* 圧縮バージョンで entry->key を差し替え */
        compressed_data_type *compressed_data = palloc(sizeof(compressed_data_type));


        /* fill *compressed_data from entry-&gt;key ... */

        /* entry->key ... から *compressed_data を補填 */

        retval = palloc(sizeof(GISTENTRY));
        gistentryinit(*retval, PointerGetDatum(compressed_data),
                      entry->rel, entry->page, entry->offset, FALSE);
    }
    else
    {

        /* typically we needn't do anything with non-leaf entries */

        /* 通常非リーフ項目に対して行うことはない */
        retval = entry;
    }

    PG_RETURN_POINTER(retval);
}

You have to adapt <replaceable>compressed_data_type</replaceable> to the specific type you're converting to in order to compress your leaf nodes, of course. 当然ながらcompressed_data_typeを、リーフノードを圧縮するために変換する特定の型に適合させなければなりません。

decompress

Converts the stored representation of a data item into a format that can be manipulated by the other GiST methods in the operator class. If the <function>decompress</function> method is omitted, it is assumed that the other GiST methods can work directly on the stored data format. (<function>decompress</function> is not necessarily the reverse of the <function>compress</function> method; in particular, if <function>compress</function> is lossy then it's impossible for <function>decompress</function> to exactly reconstruct the original data. <function>decompress</function> is not necessarily equivalent to <function>fetch</function>, either, since the other GiST methods might not require full reconstruction of the data.) データ項目の格納された表現を、演算子クラスの他のGiSTメソッドで操作できる形式に変換します。 decompressメソッドが省略された場合、他のGiSTメソッドが直接操作出来るデータ形式で格納されると想定されます。（decompressは、必ずしもcompressメソッドの逆になるわけではありません。特に、compressが不可逆な場合、decompressで元のデータを正確に再構築するのが不可能になります。他のGiSTメソッドはすべてのデータを再構築することは必要としないかもしれないので、decompressはfetchと等価であるとは限りません。）

The <acronym>SQL</acronym> declaration of the function must look like this: この関数のSQL宣言は以下のようになります。

CREATE OR REPLACE FUNCTION my_decompress(internal)
RETURNS internal
AS 'MODULE_PATHNAME'
LANGUAGE C STRICT;

And the matching code in the C module could then follow this skeleton: そして、Cモジュール内の対応するコードは以下のような骨格に従うことになります。

PG_FUNCTION_INFO_V1(my_decompress);

Datum
my_decompress(PG_FUNCTION_ARGS)
{
    PG_RETURN_POINTER(PG_GETARG_POINTER(0));
}

The above skeleton is suitable for the case where no decompression is needed. (But, of course, omitting the method altogether is even easier, and is recommended in such cases.) 上記骨格は、伸長を必要としない場合に適したものです。（ただし、もちろん、このメソッドを完全に省略する方がさらに簡単なので、このような場合は省略することをお勧めします。）

penalty

Returns a value indicating the <quote>cost</quote> of inserting the new entry into a particular branch of the tree. Items will be inserted down the path of least <function>penalty</function> in the tree. Values returned by <function>penalty</function> should be non-negative. If a negative value is returned, it will be treated as zero. 新しい項目をツリーの特定の分岐点に挿入するための「コスト」を示す値を返します。項目は、ツリー内でpenaltyが最小の経路に挿入されます。 penaltyから返される値は非負でなければなりません。負の値が返された場合、ゼロとして扱われます。

The <acronym>SQL</acronym> declaration of the function must look like this: この関数のSQL宣言は以下のようになります。

CREATE OR REPLACE FUNCTION my_penalty(internal, internal, internal)
RETURNS internal
AS 'MODULE_PATHNAME'

LANGUAGE C STRICT;  &#45;- in some cases penalty functions need not be strict

LANGUAGE C STRICT;  -- penalty関数は厳密である必要がない場合もあります

And the matching code in the C module could then follow this skeleton: そして、Cモジュール内の対応するコードは以下のような骨格に従うことになります。

PG_FUNCTION_INFO_V1(my_penalty);

Datum
my_penalty(PG_FUNCTION_ARGS)
{
    GISTENTRY  *origentry = (GISTENTRY *) PG_GETARG_POINTER(0);
    GISTENTRY  *newentry = (GISTENTRY *) PG_GETARG_POINTER(1);
    float      *penalty = (float *) PG_GETARG_POINTER(2);
    data_type  *orig = DatumGetDataType(origentry->key);
    data_type  *new = DatumGetDataType(newentry->key);

    *penalty = my_penalty_implementation(orig, new);
    PG_RETURN_POINTER(penalty);
}

For historical reasons, the <function>penalty</function> function doesn't just return a <type>float</type> result; instead it has to store the value at the location indicated by the third argument. The return value per se is ignored, though it's conventional to pass back the address of that argument. 歴史的な理由により、penalty関数は単純にfloatの結果を返しません。その代わり、3番目の引数で指定された場所に値を格納しなければなりません。その引数のアドレスを戻すのが慣例ですが、戻り値それ自体は無視されます。

The <function>penalty</function> function is crucial to good performance of the index. It'll get used at insertion time to determine which branch to follow when choosing where to add the new entry in the tree. At query time, the more balanced the index, the quicker the lookup. penalty関数は優れた性能のインデックスではきわめて重要です。これは、挿入の段階で新しい項目をツリーに追加する場所を決定する際にどの分岐に従うかを決定するために使用されます。問い合わせの際、インデックスのバランスが良ければ、検索が速くなります。

picksplit

When an index page split is necessary, this function decides which entries on the page are to stay on the old page, and which are to move to the new page. インデックスページ分割が必要になった時、この関数は、ページ内のどの項目を古いページに残すか、および、どれを新しいページに移動するかを決定します。

The <acronym>SQL</acronym> declaration of the function must look like this: この関数のSQL宣言は以下のようになります。

CREATE OR REPLACE FUNCTION my_picksplit(internal, internal)
RETURNS internal
AS 'MODULE_PATHNAME'
LANGUAGE C STRICT;

And the matching code in the C module could then follow this skeleton: そして、Cモジュール内の対応するコードは以下のような骨格に従うことになります。

PG_FUNCTION_INFO_V1(my_picksplit);

Datum
my_picksplit(PG_FUNCTION_ARGS)
{
    GistEntryVector *entryvec = (GistEntryVector *) PG_GETARG_POINTER(0);
    GIST_SPLITVEC *v = (GIST_SPLITVEC *) PG_GETARG_POINTER(1);
    OffsetNumber maxoff = entryvec->n - 1;
    GISTENTRY  *ent = entryvec->vector;
    int         i,
                nbytes;
    OffsetNumber *left,
               *right;
    data_type  *tmp_union;
    data_type  *unionL;
    data_type  *unionR;
    GISTENTRY **raw_entryvec;

    maxoff = entryvec->n - 1;
    nbytes = (maxoff + 1) * sizeof(OffsetNumber);

    v->spl_left = (OffsetNumber *) palloc(nbytes);
    left = v->spl_left;
    v->spl_nleft = 0;

    v->spl_right = (OffsetNumber *) palloc(nbytes);
    right = v->spl_right;
    v->spl_nright = 0;

    unionL = NULL;
    unionR = NULL;


    /* Initialize the raw entry vector. */

    /* 項目自体のベクタの初期化 */
    raw_entryvec = (GISTENTRY **) malloc(entryvec->n * sizeof(void *));
    for (i = FirstOffsetNumber; i <= maxoff; i = OffsetNumberNext(i))
        raw_entryvec[i] = &(entryvec->vector[i]);

    for (i = FirstOffsetNumber; i <= maxoff; i = OffsetNumberNext(i))
    {
        int         real_index = raw_entryvec[i] - entryvec->vector;

        tmp_union = DatumGetDataType(entryvec->vector[real_index].key);
        Assert(tmp_union != NULL);

        /*

         * Choose where to put the index entries and update unionL and unionR
         * accordingly. Append the entries to either v-&gt;spl_left or
         * v-&gt;spl_right, and care about the counters.

         * インデックス項目の格納場所を決定し、それに合わせてunionLとunionRを更新
         * します。v->spl_left もしくは v->spl_right のどちらかに項目を追加します。
         * カウンタに留意してください。
         */

        if (my_choice_is_left(unionL, curl, unionR, curr))
        {
            if (unionL == NULL)
                unionL = tmp_union;
            else
                unionL = my_union_implementation(unionL, tmp_union);

            *left = real_index;
            ++left;
            ++(v->spl_nleft);
        }
        else
        {
            /*

             * Same on the right

             * 右と同じ
             */
        }
    }

    v->spl_ldatum = DataTypeGetDatum(unionL);
    v->spl_rdatum = DataTypeGetDatum(unionR);
    PG_RETURN_POINTER(v);
}

Notice that the <function>picksplit</function> function's result is delivered by modifying the passed-in <structname>v</structname> structure. The return value per se is ignored, though it's conventional to pass back the address of <structname>v</structname>. picksplit関数の結果は渡されたv構造体を修正することで返されることに注意してください。 vのアドレスを戻すのが慣例ですが、戻り値それ自体は無視されます。

Like <function>penalty</function>, the <function>picksplit</function> function is crucial to good performance of the index. Designing suitable <function>penalty</function> and <function>picksplit</function> implementations is where the challenge of implementing well-performing <acronym>GiST</acronym> indexes lies. penalty同様、picksplit関数も優れた性能のインデックスのためにきわめて重要です。 penaltyとpicksplitの実装を適切に設計することが、性能が良いGiSTインデックスの実装を行うことにつながります。

same

Returns true if two index entries are identical, false otherwise. (An <quote>index entry</quote> is a value of the index's storage type, not necessarily the original indexed column's type.) ２つのインデックス項目が同一の場合に真、さもなくば偽を返します。 (「インデックス項目」はインデックスの格納型の値であり、必ずしも元のインデックス付けされた列の型という訳ではありません。)

The <acronym>SQL</acronym> declaration of the function must look like this: この関数のSQL宣言は以下のようになります。

CREATE OR REPLACE FUNCTION my_same(storage_type, storage_type, internal)
RETURNS internal
AS 'MODULE_PATHNAME'
LANGUAGE C STRICT;

And the matching code in the C module could then follow this skeleton: そして、Cモジュール内の対応するコードは以下のような骨格に従うことになります。

PG_FUNCTION_INFO_V1(my_same);

Datum
my_same(PG_FUNCTION_ARGS)
{
    prefix_range *v1 = PG_GETARG_PREFIX_RANGE_P(0);
    prefix_range *v2 = PG_GETARG_PREFIX_RANGE_P(1);
    bool       *result = (bool *) PG_GETARG_POINTER(2);

    *result = my_eq(v1, v2);
    PG_RETURN_POINTER(result);
}

For historical reasons, the <function>same</function> function doesn't just return a Boolean result; instead it has to store the flag at the location indicated by the third argument. The return value per se is ignored, though it's conventional to pass back the address of that argument. 歴史的な理由により、same関数は単純に論理値の結果を返しません。その代わり、3番目の引数で指定された場所にフラグを格納しなければなりません。その引数のアドレスを戻すのが慣例ですが、戻り値それ自体は無視されます。

distance

Given an index entry <literal>p</literal> and a query value <literal>q</literal>, this function determines the index entry's <quote>distance</quote> from the query value. This function must be supplied if the operator class contains any ordering operators. A query using the ordering operator will be implemented by returning index entries with the smallest <quote>distance</quote> values first, so the results must be consistent with the operator's semantics. For a leaf index entry the result just represents the distance to the index entry; for an internal tree node, the result must be the smallest distance that any child entry could have. インデックス項目pと問い合わせ値qを与えると、この関数は問い合わせ値からのインデックス項目の「距離」を決定します。この関数は、演算子クラスが何らかの順序付け演算子を含む場合には提供しなければなりません。順序付け演算子を使用する問い合わせは、まず最小の「距離」を持つインデックス項目を返すことで実装されます。このためこの結果は演算子の意味と一貫性を持たなければなりません。リーフインデックスノード項目では、結果は単にインデックス項目との距離を表します。内部ツリーノードでは、結果はすべての子項目が持つ中から最も最小の距離でなければなりません。

The <acronym>SQL</acronym> declaration of the function must look like this: この関数のSQL宣言は以下のようにならなければなりません。

CREATE OR REPLACE FUNCTION my_distance(internal, data_type, smallint, oid, internal)
RETURNS float8
AS 'MODULE_PATHNAME'
LANGUAGE C STRICT;

And the matching code in the C module could then follow this skeleton: Cモジュールにおける対応するコードは次の骨格に従うことになります。

PG_FUNCTION_INFO_V1(my_distance);

Datum
my_distance(PG_FUNCTION_ARGS)
{
    GISTENTRY  *entry = (GISTENTRY *) PG_GETARG_POINTER(0);
    data_type  *query = PG_GETARG_DATA_TYPE_P(1);
    StrategyNumber strategy = (StrategyNumber) PG_GETARG_UINT16(2);
    /* Oid subtype = PG_GETARG_OID(3); */
    /* bool *recheck = (bool *) PG_GETARG_POINTER(4); */
    data_type  *key = DatumGetDataType(entry->key);
    double      retval;

    /*

     * determine return value as a function of strategy, key and query.

     * strategy、keyおよびqueryの関数として戻り値を決定してください。
     */

    PG_RETURN_FLOAT8(retval);
}

The arguments to the <function>distance</function> function are identical to the arguments of the <function>consistent</function> function. distance関数の引数はconsistent関数の引数と同一です。

Some approximation is allowed when determining the distance, so long as the result is never greater than the entry's actual distance. Thus, for example, distance to a bounding box is usually sufficient in geometric applications. For an internal tree node, the distance returned must not be greater than the distance to any of the child nodes. If the returned distance is not exact, the function must set <literal>*recheck</literal> to true. (This is not necessary for internal tree nodes; for them, the calculation is always assumed to be inexact.) In this case the executor will calculate the accurate distance after fetching the tuple from the heap, and reorder the tuples if necessary. 距離の決定において、その結果がエントリの実際の距離よりも大きくならない限り、多少の概算は許されます。したがって、例えば、幾何学に関するアプリケーションでは、通常は外接矩形への距離で十分です。内部ツリーノードについては、返される距離はどの子ノードへの距離よりも大きくなることは許されません。返される距離が正確でない場合、関数は*recheckを真にセットする必要があります。（内部ツリーノードについては、計算はいつでも不正確であると見なされるため、これは必要ありません。）この場合、エグゼキュータはヒープからタプルを取得した後で正確な距離を計算し、必要ならタプルを並べ替えます。

If the distance function returns <literal>*recheck = true</literal> for any leaf node, the original ordering operator's return type must be <type>float8</type> or <type>float4</type>, and the distance function's result values must be comparable to those of the original ordering operator, since the executor will sort using both distance function results and recalculated ordering-operator results. Otherwise, the distance function's result values can be any finite <type>float8</type> values, so long as the relative order of the result values matches the order returned by the ordering operator. (Infinity and minus infinity are used internally to handle cases such as nulls, so it is not recommended that <function>distance</function> functions return these values.) 距離関数がリーフノードについて*recheck = trueを返す場合、元の順序づけ演算子の戻り型はfloat8またはfloat4でなければならず、また距離関数の結果の値は元の順序づけ演算子の戻り型と比較可能でなければなりません。なぜならエグゼキュータは距離関数の結果および再計算された順序づけ演算子の結果の両方を利用してソート処理を行うからです。その他の場合は、結果値の相対的な順序が順序づけ演算子が返す順序と一致する限り、距離関数の戻り値は任意の有限のfloat8の値とすることができます。（無限大とマイナス無限大は内部的にNULLなどの場合を処理するために利用するので、distance関数がこれらの値を戻すことは薦められません。）

fetch

Converts the compressed index representation of a data item into the original data type, for index-only scans. The returned data must be an exact, non-lossy copy of the originally indexed value. インデックスオンリースキャンで使用するため、データ項目の圧縮されたインデックス表現を元のデータ型に変換します。返されたデータは元のインデックス値の正確で、何も失われていない複製でなければなりません。

The <acronym>SQL</acronym> declaration of the function must look like this: この関数のSQL宣言は以下のようにならなければなりません。

CREATE OR REPLACE FUNCTION my_fetch(internal)
RETURNS internal
AS 'MODULE_PATHNAME'
LANGUAGE C STRICT;

The argument is a pointer to a <structname>GISTENTRY</structname> struct. On entry, its <structfield>key</structfield> field contains a non-NULL leaf datum in compressed form. The return value is another <structname>GISTENTRY</structname> struct, whose <structfield>key</structfield> field contains the same datum in its original, uncompressed form. If the opclass's compress function does nothing for leaf entries, the <function>fetch</function> method can return the argument as-is. Or, if the opclass does not have a compress function, the <function>fetch</function> method can be omitted as well, since it would necessarily be a no-op. 引数はGISTENTRY構造体へのポインタです。関数が呼び出された時点では、そのkeyフィールドには、NULLでないリーフデータが圧縮形式で入っています。戻り値は別のGISTENTRY構造体で、そのkeyフィールドには、同じデータが元の非圧縮形式で入っています。 opclassの圧縮関数がリーフのエントリに対して何もしないなら、fetchメソッドは引数をそのまま返すことができます。また、opclassに圧縮関数がない場合、fetchメソッドも省略できます。これは、必然的にno-opになるからです。

The matching code in the C module could then follow this skeleton: Cモジュールにおける対応するコードは次の骨格に従うことになります。

PG_FUNCTION_INFO_V1(my_fetch);

Datum
my_fetch(PG_FUNCTION_ARGS)
{
    GISTENTRY  *entry = (GISTENTRY *) PG_GETARG_POINTER(0);
    input_data_type *in = DatumGetPointer(entry->key);
    fetched_data_type *fetched_data;
    GISTENTRY  *retval;

    retval = palloc(sizeof(GISTENTRY));
    fetched_data = palloc(sizeof(fetched_data_type));

    /*

     * Convert 'fetched_data' into the a Datum of the original datatype.

     * fetched_dataを元のデータ型のデータに変換する。
     */


    /* fill *retval from fetched_data. */

    /* fetched_dataを使って*retvalに値を入れる。 */
    gistentryinit(*retval, PointerGetDatum(converted_datum),
                  entry->rel, entry->page, entry->offset, FALSE);

    PG_RETURN_POINTER(retval);
}

If the compress method is lossy for leaf entries, the operator class cannot support index-only scans, and must not define a <function>fetch</function> function. compressメソッドがリーフエントリに対してデータ損失がある場合、演算子クラスはインデックスオンリースキャンをサポートすることができず、fetch関数を定義してはいけません。

options

Allows definition of user-visible parameters that control operator class behavior. 演算子クラスの振舞いを制御するユーザに可視のパラメータの集合を定義します。

The <acronym>SQL</acronym> declaration of the function must look like this: 関数のSQL宣言は以下のようになります。

CREATE OR REPLACE FUNCTION my_options(internal)
RETURNS void
AS 'MODULE_PATHNAME'
LANGUAGE C STRICT;

The function is passed a pointer to a <structname>local_relopts</structname> struct, which needs to be filled with a set of operator class specific options. The options can be accessed from other support functions using the <literal>PG_HAS_OPCLASS_OPTIONS()</literal> and <literal>PG_GET_OPCLASS_OPTIONS()</literal> macros. 関数にはlocal_relopts構造体へのポインタが渡されますが、構造体を演算子クラスに固有のオプションの集合で満たすことが必要です。オプションはマクロPG_HAS_OPCLASS_OPTIONS()とPG_GET_OPCLASS_OPTIONS()を使って他のサポート関数からアクセスできます。

An example implementation of my_options() and parameters use from other support functions are given below: my_options()の実装と他のサポート関数からのパラメータの使用の例は以下の通りです。

typedef enum MyEnumType
{
    MY_ENUM_ON,
    MY_ENUM_OFF,
    MY_ENUM_AUTO
} MyEnumType;

typedef struct
{

    int32   vl_len_;    /* varlena header (do not touch directly!) */
    int     int_param;  /* integer parameter */
    double  real_param; /* real parameter */
    MyEnumType enum_param; /* enum parameter */
    int     str_param;  /* string parameter */

    int32   vl_len_;    /* varlenaヘッダ(直接触らないこと!) */
    int     int_param;  /* 整数パラメータ */
    double  real_param; /* 実数パラメータ */
    MyEnumType enum_param; /* enumパラメータ */
    int     str_param;  /* 文字列パラメータ */
} MyOptionsStruct;


/* String representation of enum values */

/* enum値の文字列表現 */
static relopt_enum_elt_def myEnumValues[] =
{
    {"on", MY_ENUM_ON},
    {"off", MY_ENUM_OFF},
    {"auto", MY_ENUM_AUTO},

    {(const char *) NULL}   /* list terminator */

    {(const char *) NULL}   /* リストの終端 */
};

static char *str_param_default = "default";

/*

 * Sample validator: checks that string is not longer than 8 bytes.

 * 有効性検査の例: 文字列が8バイトより長くないことを検査します。
 */
static void
validate_my_string_relopt(const char *value)
{
    if (strlen(value) > 8)
        ereport(ERROR,
                (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
                 errmsg("str_param must be at most 8 bytes")));
}

/*

 * Sample filler: switches characters to lower case.

 * 充填の例: 文字を小文字に交換します。
 */
static Size
fill_my_string_relopt(const char *value, void *ptr)
{
    char   *tmp = str_tolower(value, strlen(value), DEFAULT_COLLATION_OID);
    int     len = strlen(tmp);

    if (ptr)
        strcpy((char *) ptr, tmp);

    pfree(tmp);
    return len + 1;
}

PG_FUNCTION_INFO_V1(my_options);

Datum
my_options(PG_FUNCTION_ARGS)
{
    local_relopts *relopts = (local_relopts *) PG_GETARG_POINTER(0);

    init_local_reloptions(relopts, sizeof(MyOptionsStruct));
    add_local_int_reloption(relopts, "int_param", "integer parameter",
                            100, 0, 1000000,
                            offsetof(MyOptionsStruct, int_param));
    add_local_real_reloption(relopts, "real_param", "real parameter",
                             1.0, 0.0, 1000000.0,
                             offsetof(MyOptionsStruct, real_param));
    add_local_enum_reloption(relopts, "enum_param", "enum parameter",
                             myEnumValues, MY_ENUM_ON,
                             "Valid values are: \"on\", \"off\" and \"auto\".",
                             offsetof(MyOptionsStruct, enum_param));
    add_local_string_reloption(relopts, "str_param", "string parameter",
                               str_param_default,
                               &validate_my_string_relopt,
                               &fill_my_string_relopt,
                               offsetof(MyOptionsStruct, str_param));

    PG_RETURN_VOID();
}

PG_FUNCTION_INFO_V1(my_compress);

Datum
my_compress(PG_FUNCTION_ARGS)
{
    int     int_param = 100;
    double  real_param = 1.0;
    MyEnumType enum_param = MY_ENUM_ON;
    char   *str_param = str_param_default;

    /*

     * Normally, when opclass contains 'options' method, then options are always
     * passed to support functions.  However, if you add 'options' method to
     * existing opclass, previously defined indexes have no options, so the
     * check is required.

     * 通常は、演算子クラスが'options'メソッドを含む場合、optionsは常にサポート関数に
     * 渡されます。しかしながら、'options'メソッドを既存の演算子クラスに追加した場合、
     * 前に定義されたインデックスにoptionsがない場合、検査が必要です。
     */
    if (PG_HAS_OPCLASS_OPTIONS())
    {
        MyOptionsStruct *options = (MyOptionsStruct *) PG_GET_OPCLASS_OPTIONS();

        int_param = options->int_param;
        real_param = options->real_param;
        enum_param = options->enum_param;
        str_param = GET_STRING_RELOPTION(options, str_param);
    }


    /* the rest implementation of support function */

    /* サポート関数の残りの実装 */
}

Since the representation of the key in <acronym>GiST</acronym> is flexible, it may depend on user-specified parameters. For instance, the length of key signature may be specified. See <literal>gtsvector_options()</literal> for example. GiSTでのキーの表現には柔軟性がありますので、ユーザに固有のパラメータに依存するかもしれません。例えば、キーの署名の長さが指定されるかもしれません。例についてはgtsvector_options()を参照してください。

sortsupport

Returns a comparator function to sort data in a way that preserves locality. It is used by <command>CREATE INDEX</command> and <command>REINDEX</command> commands. The quality of the created index depends on how well the sort order determined by the comparator function preserves locality of the inputs. ある程度局所性を保つ方法でデータをソートする比較関数を返します。 CREATE INDEXとREINDEXコマンドで使われます。作成されるインデックスの質は、比較関数により決定された順序が入力の局所性をどれだけよく保っているかに依存します。

The <function>sortsupport</function> method is optional. If it is not provided, <command>CREATE INDEX</command> builds the index by inserting each tuple to the tree using the <function>penalty</function> and <function>picksplit</function> functions, which is much slower. sortsupportメソッドは省略可能です。提供されなければ、CREATE INDEXは各タプルをpenalty関数とpicksplit関数を使ってツリーに挿入することでインデックスを構築します。これはずっと遅いです。

The <acronym>SQL</acronym> declaration of the function must look like this: この関数のSQL宣言は以下のようになります。

CREATE OR REPLACE FUNCTION my_sortsupport(internal)
RETURNS void
AS 'MODULE_PATHNAME'
LANGUAGE C STRICT;

The argument is a pointer to a <structname>SortSupport</structname> struct. At a minimum, the function must fill in its comparator field. The comparator takes three arguments: two Datums to compare, and a pointer to the <structname>SortSupport</structname> struct. The Datums are the two indexed values in the format that they are stored in the index; that is, in the format returned by the <function>compress</function> method. The full API is defined in <filename>src/include/utils/sortsupport.h</filename>. 引数はSortSupport構造体へのポインタです。最低でも、関数は構造体のcomparatorフィールドを埋めなければなりません。比較関数は引数を3つ取ります。比較するDatumを2つとSortSupport構造体へのポインタです。 Datumはインデックス付けされる値2つで、インデックスに格納される形式です。すなわち、compressメソッドにより返される形式です。完全なAPIはsrc/include/utils/sortsupport.hで定義されています。

The matching code in the C module could then follow this skeleton: そして、Cモジュール内の対応するコードは以下のような骨格に従うことになります。

PG_FUNCTION_INFO_V1(my_sortsupport);

static int
my_fastcmp(Datum x, Datum y, SortSupport ssup)
{

  /* establish order between x and y by computing some sorting value z */

  /* ソートのための値zを計算することでxとyの間に順序を確立する */

  int z1 = ComputeSpatialCode(x);
  int z2 = ComputeSpatialCode(y);

  return z1 == z2 ? 0 : z1 > z2 ? 1 : -1;
}

Datum
my_sortsupport(PG_FUNCTION_ARGS)
{
  SortSupport ssup = (SortSupport) PG_GETARG_POINTER(0);

  ssup->comparator = my_fastcmp;
  PG_RETURN_VOID();
}

All the GiST support methods are normally called in short-lived memory contexts; that is, <varname>CurrentMemoryContext</varname> will get reset after each tuple is processed. It is therefore not very important to worry about pfree'ing everything you palloc. However, in some cases it's useful for a support method to cache data across repeated calls. To do that, allocate the longer-lived data in <literal>fcinfo->flinfo->fn_mcxt</literal>, and keep a pointer to it in <literal>fcinfo->flinfo->fn_extra</literal>. Such data will survive for the life of the index operation (e.g., a single GiST index scan, index build, or index tuple insertion). Be careful to pfree the previous value when replacing a <literal>fn_extra</literal> value, or the leak will accumulate for the duration of the operation. すべてのGiSTサポートメソッドは通常短期間有効なメモリコンテキストで呼び出されます。つまりCurrentMemoryContextは各タプルが処理された後にリセットされます。そのためpallocしたすべてをpfreeすることに注意するのはあまり重要ではありません。しかし、サポートメソッドで、繰り返される呼び出しを跨がってデータをキャッシュすることが有用な場合があります。このためには、fcinfo->flinfo->fn_mcxtの中で長期間有効なデータを割り当て、そこへのポインタをfcinfo->flinfo->fn_extraの中に保持してください。こうしたデータはインデックス操作（例えば１つのGiSTインデックススキャン、インデックス構築、インデックスタプルの挿入）の間有効です。 fn_extra値を置き換える時に以前の値をpfreeすることに注意してください。さもないと操作の間リークが蓄積されます。

前へ	上へ	次へ
68.2. 組み込み演算子クラス	ホーム	68.4. 実装