F.9. citext — 大文字小文字の区別がない文字列型

PostgreSQL 17.6文書
		付録F 追加で提供されるモジュールと拡張	誤訳等の報告
前へ	上へ	F.9. citext — 大文字小文字の区別がない文字列型	次へ

F.9. citext — 大文字小文字の区別がない文字列型 #

<title>citext — a case-insensitive character string type</title>

The <filename>citext</filename> module provides a case-insensitive character string type, <type>citext</type>. Essentially, it internally calls <function>lower</function> when comparing values. Otherwise, it behaves almost exactly like <type>text</type>. citextモジュールは、大文字小文字の区別がないcitext文字列型を提供します。これは値の比較の際、基本的に内部でlowerを呼び出します。この他はほぼtextと同様に動作します。

ヒント

Consider using <firstterm>nondeterministic collations</firstterm> (see <xref linkend="collation-nondeterministic"/>) instead of this module. They can be used for case-insensitive comparisons, accent-insensitive comparisons, and other combinations, and they handle more Unicode special cases correctly. このモジュールの代わりに非決定論的照合順序(23.2.2.4参照)を使うことを検討してください。大文字小文字を区別しない比較、アクセントを区別しない比較、その他の組み合わせに対して使えますし、より多くのユニコードの特別な場合を正しく扱います。

This module is considered <quote>trusted</quote>, that is, it can be installed by non-superusers who have <literal>CREATE</literal> privilege on the current database. このモジュールは「trusted」と見なされます。つまり、現在のデータベースに対してCREATE権限を持つ非スーパーユーザがインストールできます。

F.9.1. 原理 #

<title>Rationale</title>

The standard approach to doing case-insensitive matches in <productname>PostgreSQL</productname> has been to use the <function>lower</function> function when comparing values, for example PostgreSQLにおいて大文字小文字の区別のない比較を行う標準的な手法は、値を比べる際に以下のようにlower関数を使用することでした。例です。

SELECT * FROM tab WHERE lower(col) = LOWER(?);

This works reasonably well, but has a number of drawbacks: これはまあまあ動作しますが、数多くの欠点があります。

It makes your SQL statements verbose, and you always have to remember to use <function>lower</function> on both the column and the query value. 作成するSQL文を冗長にします。また常に列と問い合わせの値両方にlowerを使用することを忘れないようにしなければなりません。
It won't use an index, unless you create a functional index using <function>lower</function>. lowerを使用して関数インデックスを作成していない限り、インデックスを使用しません。
If you declare a column as <literal>UNIQUE</literal> or <literal>PRIMARY KEY</literal>, the implicitly generated index is case-sensitive. So it's useless for case-insensitive searches, and it won't enforce uniqueness case-insensitively. UNIQUEまたはPRIMARY KEYとして列を宣言するのであれば、暗黙的に生成されるインデックスは大文字小文字を区別します。このため、大文字小文字を区別しない検索では使えず、また、大文字小文字を区別しない一意性を強制させられません。

The <type>citext</type> data type allows you to eliminate calls to <function>lower</function> in SQL queries, and allows a primary key to be case-insensitive. <type>citext</type> is locale-aware, just like <type>text</type>, which means that the matching of upper case and lower case characters is dependent on the rules of the database's <literal>LC_CTYPE</literal> setting. Again, this behavior is identical to the use of <function>lower</function> in queries. But because it's done transparently by the data type, you don't have to remember to do anything special in your queries. citextデータ型によりSQL問い合わせ内のlower呼び出しを省くことができます。さらに、大文字小文字の区別がない主キーを実現できます。 citextはtextと同様にロケールも考慮します。つまり大文字と小文字のマッチングは、LC_CTYPEデータベース設定の規則に依存します。ここでも、この動作はlowerを使用した問い合わせと同一です。しかしこのデータ型により、ロケールの考慮は透過的に行われますので、問い合わせで特殊なことを行うことを覚えておく必要はありません。

F.9.2. 使用方法 #

Here's a simple example of usage: 簡単な例を示します。

CREATE TABLE users (
    nick CITEXT PRIMARY KEY,
    pass TEXT   NOT NULL
);

INSERT INTO users VALUES ( 'larry',  sha256(random()::text::bytea) );
INSERT INTO users VALUES ( 'Tom',    sha256(random()::text::bytea) );
INSERT INTO users VALUES ( 'Damian', sha256(random()::text::bytea) );
INSERT INTO users VALUES ( 'NEAL',   sha256(random()::text::bytea) );
INSERT INTO users VALUES ( 'Bjørn',  sha256(random()::text::bytea) );

SELECT * FROM users WHERE nick = 'Larry';

The <command>SELECT</command> statement will return one tuple, even though the <structfield>nick</structfield> column was set to <literal>larry</literal> and the query was for <literal>Larry</literal>. SELECT文は、nick列がlarryに設定され、問い合わせがLarryに対してであっても、１つのタプルを返します。

F.9.3. 文字列比較の動作 #

<title>String Comparison Behavior</title>

<type>citext</type> performs comparisons by converting each string to lower case (as though <function>lower</function> were called) and then comparing the results normally. Thus, for example, two strings are considered equal if <function>lower</function> would produce identical results for them. citextはそれぞれの文字列を（lowerが呼ばれますが）小文字に変換して結果を普通に比較します。よって、例えばlowerで小文字にした場合に同じ結果となるような２つの文字列が等しいとみなされます。

In order to emulate a case-insensitive collation as closely as possible, there are <type>citext</type>-specific versions of a number of string-processing operators and functions. So, for example, the regular expression operators <literal>~</literal> and <literal>~*</literal> exhibit the same behavior when applied to <type>citext</type>: they both match case-insensitively. The same is true for <literal>!~</literal> and <literal>!~*</literal>, as well as for the <literal>LIKE</literal> operators <literal>~~</literal> and <literal>~~*</literal>, and <literal>!~~</literal> and <literal>!~~*</literal>. If you'd like to match case-sensitively, you can cast the operator's arguments to <type>text</type>. 大文字小文字の区別のない照合をできる限り正確にエミュレートするために、数多くのcitext独自版の各種文字列処理演算子と関数があります。したがって、例えば正規表現演算子~および~*は、citextに適用する時に同じ動作を提供します。これら両方は大文字小文字を区別することなくマッチします。 !~や!~*だけではなくLIKE演算子、~~、~~*、!~~、!~~*でも同じことが言えます。もし大文字小文字を区別して比較したい場合は、演算子の引数をtextにキャストすることができます。

Similarly, all of the following functions perform matching case-insensitively if their arguments are <type>citext</type>: 引数がcitextであれば、同様にして以下の関数は大文字小文字を区別しない一致を実行します。

regexp_match()
regexp_matches()
regexp_replace()
regexp_split_to_array()
regexp_split_to_table()
replace()
split_part()
strpos()
translate()

For the regexp functions, if you want to match case-sensitively, you can specify the <quote>c</quote> flag to force a case-sensitive match. Otherwise, you must cast to <type>text</type> before using one of these functions if you want case-sensitive behavior. 正規表現関数（RegExp関数）では、大文字小文字を区別して一致させたい場合「c」フラグを付けて、強制的に大文字小文字を区別して一致させることができます。そうしないと、大文字小文字を区別させたい場合にはこれらの関数のいずれかを使用する前段階でtextにキャストしなければなりません。

F.9.4. 制限事項 #

<title>Limitations</title>

<type>citext</type>'s case-folding behavior depends on the <literal>LC_CTYPE</literal> setting of your database. How it compares values is therefore determined when the database is created. It is not truly case-insensitive in the terms defined by the Unicode standard. Effectively, what this means is that, as long as you're happy with your collation, you should be happy with <type>citext</type>'s comparisons. But if you have data in different languages stored in your database, users of one language may find their query results are not as expected if the collation is for another language. citextの大文字小文字を区別しない動作は使用するデータベースのLC_CTYPEに依存します。どのように値を比較するかは、データベースが作成されたときに決定されます。 Unicode標準の定義という観点では、真に大文字小文字の区別がないわけではありません。実質的に何を意味しているかというと、使用している照合が十分なものであれば、citextによる比較も十分なものになるはずです。しかしデータベースに様々な言語でデータを格納している場合は、ある言語のユーザは照合が他の言語用のものであった場合想定外の問い合わせ結果を得るかもしれません。
As of <productname>PostgreSQL</productname> 9.1, you can attach a <literal>COLLATE</literal> specification to <type>citext</type> columns or data values. Currently, <type>citext</type> operators will honor a non-default <literal>COLLATE</literal> specification while comparing case-folded strings, but the initial folding to lower case is always done according to the database's <literal>LC_CTYPE</literal> setting (that is, as though <literal>COLLATE "default"</literal> were given). This may be changed in a future release so that both steps follow the input <literal>COLLATE</literal> specification. PostgreSQL 9.1では、COLLATE指定をcitext列もしくはデータ値に付け加えることができます。現状では、citext演算子は大文字小文字を含んだ文字列を比較する際に、デフォルトではないCOLLATE指定を重んじます。しかし、最初の小文字変換はデータベースのLC_CTYPE設定にしたがって、常に実行されます（つまり、COLLATE "default"が指定されたようになります）これは、両方のステップが入力されたCOLLATE指定に従うように、将来のリリースにおいて変更されるでしょう。
<type>citext</type> is not as efficient as <type>text</type> because the operator functions and the B-tree comparison functions must make copies of the data and convert it to lower case for comparisons. Also, only <type>text</type> can support B-Tree deduplication. However, <type>citext</type> is slightly more efficient than using <function>lower</function> to get case-insensitive matching. 演算子関数およびB-tree比較関数でデータの複製を作成しそれを比較のために小文字に変換しなければなりませんので、citextはtextほど効率的ではありません。また、textだけがB-Tree重複排除をサポートできます。しかしcitextは、大文字小文字の区別をしない一致をさせるためにlowerを使用する場合よりかなり効率的です。
<type>citext</type> doesn't help much if you need data to compare case-sensitively in some contexts and case-insensitively in other contexts. The standard answer is to use the <type>text</type> type and manually use the <function>lower</function> function when you need to compare case-insensitively; this works all right if case-insensitive comparison is needed only infrequently. If you need case-insensitive behavior most of the time and case-sensitive infrequently, consider storing the data as <type>citext</type> and explicitly casting the column to <type>text</type> when you want case-sensitive comparison. In either situation, you will need two indexes if you want both types of searches to be fast. citextは、ある文脈では大文字小文字の区別を行い、またある文脈では大文字小文字の区別を行わない比較をする必要がある場合、あまり役に立ちません。標準的な解法はtext型を使用し、大文字小文字を区別する比較が必要であれば手作業でlower関数を使用することです。これは大文字小文字を区別しない比較の必要性がまれであれば、問題なく動作します。大文字小文字を区別しない比較がほとんどで、大文字小文字を区別する比較の必要性がまれである場合は、データをcitextとして格納し、大文字小文字を区別する比較の際にその列を明示的にtextにキャストすることを検討してください。どちらの場合でも、2種類の検索の両方を高速にするために２つのインデックスを作成しなければならないでしょう。
The schema containing the <type>citext</type> operators must be in the current <varname>search_path</varname> (typically <literal>public</literal>); if it is not, the normal case-sensitive <type>text</type> operators will be invoked instead. citext演算子を含んだスキーマは、現在のsearch_path(典型的にはpublic)に存在しなければいけません。もし無い場合は通常の大文字小文字が区別されるtext比較が代わりに呼び出されます。
The approach of lower-casing strings for comparison does not handle some Unicode special cases correctly, for example when one upper-case letter has two lower-case letter equivalents. Unicode distinguishes between <firstterm>case mapping</firstterm> and <firstterm>case folding</firstterm> for this reason. Use nondeterministic collations instead of <type>citext</type> to handle that correctly. 比較のために文字列を小文字にする方法は、例えば、大文字1つに対応する小文字が2つある場合等、ユニコードの特別な場合を正しく扱えないことがあります。ユニコードはこの理由で大文字小文字の対応関係と大文字小文字の畳み込みを区別します。正しくその場合を扱うには、citextの代わりに非決定論的照合順序を使ってください。

F.9.5. 作者 #

<title>Author</title>

David E. Wheeler <david@kineticode.com>

Inspired by the original <type>citext</type> module by Donald Fraser. Donald Fraserによるcitextモジュール原本からのヒント

前へ	上へ	次へ
F.8. btree_gist — GiST演算子クラスとB-tree動作	ホーム	F.10. cube — 多次元立方体データ型