14.3. 明示的なJOIN句でプランナを制御する

PostgreSQL 17.5文書
		第14章性能に関するヒント	誤訳等の報告
前へ	上へ	14.3. 明示的な`JOIN`句でプランナを制御する	次へ

14.3. 明示的な`JOIN`句でプランナを制御する #

<title>Controlling the Planner with Explicit <literal>JOIN</literal> Clauses</title>

It is possible to control the query planner to some extent by using the explicit <literal>JOIN</literal> syntax. To see why this matters, we first need some background. 明示的なJOIN構文を使って問い合わせプランナをある程度制御できます。どうしてこういうことが問題になるのか、まずその背景を見る必要があります。

In a simple join query, such as: 単純な問い合わせ、例えば

SELECT * FROM a, b, c WHERE a.id = b.id AND b.ref = c.id;

the planner is free to join the given tables in any order. For example, it could generate a query plan that joins A to B, using the <literal>WHERE</literal> condition <literal>a.id = b.id</literal>, and then joins C to this joined table, using the other <literal>WHERE</literal> condition. Or it could join B to C and then join A to that result. Or it could join A to C and then join them with B — but that would be inefficient, since the full Cartesian product of A and C would have to be formed, there being no applicable condition in the <literal>WHERE</literal> clause to allow optimization of the join. (All joins in the <productname>PostgreSQL</productname> executor happen between two input tables, so it's necessary to build up the result in one or another of these fashions.) The important point is that these different join possibilities give semantically equivalent results but might have hugely different execution costs. Therefore, the planner will explore all of them to try to find the most efficient query plan. では、プランナは自由に与えられたテーブルを任意の順で結合することができます。例えば、WHERE条件のa.id = b.idを使ってまずAとBを結合し、他のWHERE条件を使ってその結合テーブルにCを結合するといった計画を立てることができます。あるいは、BとCを結合し、その結果にAを結合することもできます。あるいは、AとCを結合し、その結果にBを結合することもできるでしょう。しかし、それでは効率が良くありません。なぜなら、結合の最適化を行うために適用できる条件がWHERE句にないので、AとCの全デカルト積が作られるからです。（PostgreSQLのエグゼキュータでは、結合はすべて2つのテーブルの間で行われるため、このようにして1つひとつ結果を作っていかなければなりません。）重要なのは、これらの違った結合の方法は意味的には同じ結果なのですが、実行コストは大きく異なる可能性があるということです。ですから、プランナは最も効率の良い計画を探すために可能な計画をすべて検査します。

When a query only involves two or three tables, there aren't many join orders to worry about. But the number of possible join orders grows exponentially as the number of tables expands. Beyond ten or so input tables it's no longer practical to do an exhaustive search of all the possibilities, and even for six or seven tables planning might take an annoyingly long time. When there are too many input tables, the <productname>PostgreSQL</productname> planner will switch from exhaustive search to a <firstterm>genetic</firstterm> probabilistic search through a limited number of possibilities. (The switch-over threshold is set by the <xref linkend="guc-geqo-threshold"/> run-time parameter.) The genetic search takes less time, but it won't necessarily find the best possible plan. 結合の対象がせいぜい2、3個のテーブルなら心配するほど結合の種類は多くありません。しかし、テーブル数が増えると可能な結合の数は指数関数的に増えていきます。 10程度以上にテーブルが増えると、すべての可能性をしらみつぶしに探索することはもはや実用的ではなくなります。 6や7個のテーブルでさえも、計画を作成する時間が無視できなくなります。テーブルの数が多過ぎる時は、PostgreSQLのプランナはしらみつぶしの探索から、限られた可能性だけを探索する遺伝的確率的な探索へと切り替わります。（切り替えの閾値はgeqo_threshold実行時パラメータで設定されます。）遺伝的探索は短い時間で探索を行いますが、必ずしも最適な計画を見つけるとは限りません。

When the query involves outer joins, the planner has less freedom than it does for plain (inner) joins. For example, consider: 外部結合が含まれるような問い合わせでは、通常の（内部）結合よりプランナの選択の余地が小さくなります。例えば、次のような問い合わせを考えます。

SELECT * FROM a LEFT JOIN (b JOIN c ON (b.ref = c.id)) ON (a.id = b.id);

Although this query's restrictions are superficially similar to the previous example, the semantics are different because a row must be emitted for each row of A that has no matching row in the join of B and C. Therefore the planner has no choice of join order here: it must join B to C and then join A to that result. Accordingly, this query takes less time to plan than the previous query. In other cases, the planner might be able to determine that more than one join order is safe. For example, given: この問い合わせの検索条件は前述の例と表面的には似ているように思えますが、BとCの結合結果の行に適合しないAの各行が出力されなければならないため、意味的には異なります。したがって、ここではプランナには結合順に関して選択の余地がありません。まずBとCを結合し、その結果にAを結合しなければならないのです。そういうわけで、この問い合わせでは計画を立てるのに要する時間は前の例よりも短くなります。その他の場合、プランナが安全な結合順を複数決定できる可能性があります。例えば、以下を考えてみます。

SELECT * FROM a LEFT JOIN b ON (a.bid = b.id) LEFT JOIN c ON (a.cid = c.id);

it is valid to join A to either B or C first. Currently, only <literal>FULL JOIN</literal> completely constrains the join order. Most practical cases involving <literal>LEFT JOIN</literal> or <literal>RIGHT JOIN</literal> can be rearranged to some extent. この場合、Aを先にBと結合してもCと結合しても有効です。現時点では、FULL JOINのみが完全に結合順を制限します。 LEFT JOINやRIGHT JOINを含む、ほとんどの実環境では、何らかの拡張に再調整することができます。

Explicit inner join syntax (<literal>INNER JOIN</literal>, <literal>CROSS JOIN</literal>, or unadorned <literal>JOIN</literal>) is semantically the same as listing the input relations in <literal>FROM</literal>, so it does not constrain the join order. 明示的な内部結合構文（INNER JOIN、CROSS JOIN、装飾のないJOIN）は、意味的にはFROM内の入力リレーションの列挙と同じです。したがって、結合順を制約しません。

Even though most kinds of <literal>JOIN</literal> don't completely constrain the join order, it is possible to instruct the <productname>PostgreSQL</productname> query planner to treat all <literal>JOIN</literal> clauses as constraining the join order anyway. For example, these three queries are logically equivalent: ほとんどの種類のJOINは完全に結合順を制約しませんが、PostgreSQL問い合わせプランナに、すべてのJOIN句に対してとりあえず結合順を制限させることができます。例えば、以下の3つの問い合わせは論理的には同一です。

SELECT * FROM a, b, c WHERE a.id = b.id AND b.ref = c.id;
SELECT * FROM a CROSS JOIN b CROSS JOIN c WHERE a.id = b.id AND b.ref = c.id;
SELECT * FROM a JOIN (b JOIN c ON (b.ref = c.id)) ON (a.id = b.id);

But if we tell the planner to honor the <literal>JOIN</literal> order, the second and third take less time to plan than the first. This effect is not worth worrying about for only three tables, but it can be a lifesaver with many tables. しかし、プランナにJOINの順番を守るように伝えた場合、2番目と3番目の問い合わせは最初のものよりも短い時間で計画を立てることができます。この効果はたった3つのテーブルでは気にするほどのものではありませんが、多くのテーブルを結合する際には最後の頼みの綱になるかもしれません。

To force the planner to follow the join order laid out by explicit <literal>JOIN</literal>s, set the <xref linkend="guc-join-collapse-limit"/> run-time parameter to 1. (Other possible values are discussed below.) プランナを強制的に明示的なJOINに潜在する結合順に従わせるには、join_collapse_limit実行時パラメータを1に設定してください。（以下で他の取り得る値について説明します。）

You do not need to constrain the join order completely in order to cut search time, because it's OK to use <literal>JOIN</literal> operators within items of a plain <literal>FROM</literal> list. For example, consider: 検索時間を節約するために、結合順を完全に束縛する必要はありません。なぜなら、単純なFROMリストの項目内にJOIN演算子を使っても構わないからです。例えば、次の例です。

SELECT * FROM a CROSS JOIN b, c, d, e WHERE ...;

With <varname>join_collapse_limit</varname> = 1, this forces the planner to join A to B before joining them to other tables, but doesn't constrain its choices otherwise. In this example, the number of possible join orders is reduced by a factor of 5. join_collapse_limit = 1とした場合、プランナは強制的に他のテーブルと結合する前にAとBを結合しますが、それ以外については特に拘束はありません。この例では、結合順の候補は5の階乗分の1に減ります。

Constraining the planner's search in this way is a useful technique both for reducing planning time and for directing the planner to a good query plan. If the planner chooses a bad join order by default, you can force it to choose a better order via <literal>JOIN</literal> syntax — assuming that you know of a better order, that is. Experimentation is recommended. こうした方法でプランナの検索に制約を加えることは、計画作成時間の短縮とプランナに対する優れた問い合わせ計画への方向付けの両方のために有用な技法です。プランナが劣った結合順をデフォルトで選択するのであれば、JOIN構文経由でより良い順番を選択するように強制することができます。ただし、より良い順番を理解しているという前提があります。これには実験することを勧めます。

A closely related issue that affects planning time is collapsing of subqueries into their parent query. For example, consider: 計画作成時間に影響する密接に関連した問題として、副問い合わせをその親問い合わせに折り畳むことがあります。例えば、以下を考えてみます。

SELECT *
FROM x, y,
    (SELECT * FROM a, b, c WHERE something) AS ss
WHERE somethingelse;

This situation might arise from use of a view that contains a join; the view's <literal>SELECT</literal> rule will be inserted in place of the view reference, yielding a query much like the above. Normally, the planner will try to collapse the subquery into the parent, yielding: こうした状況は、結合を含むビューを使用する際に現れます。そのビューのSELECTルールはビューを参照するところに挿入され、上のような問い合わせを生成します。通常、プランナは副問い合わせを親問い合わせに折り畳み、以下を生成します。

SELECT * FROM x, y, a, b, c WHERE something AND somethingelse;

This usually results in a better plan than planning the subquery separately. (For example, the outer <literal>WHERE</literal> conditions might be such that joining X to A first eliminates many rows of A, thus avoiding the need to form the full logical output of the subquery.) But at the same time, we have increased the planning time; here, we have a five-way join problem replacing two separate three-way join problems. Because of the exponential growth of the number of possibilities, this makes a big difference. The planner tries to avoid getting stuck in huge join search problems by not collapsing a subquery if more than <varname>from_collapse_limit</varname> <literal>FROM</literal> items would result in the parent query. You can trade off planning time against quality of plan by adjusting this run-time parameter up or down. これは通常、副問い合わせの計画を別途作成するより優れた計画を作成します。（例えば、外部のWHERE条件はXをAに結合するようになり、まずAの多くの行が取り除かれます。これにより、副問い合わせの完全な論理的出力が不要になります。）しかし、同時に計画作成時間が増加します。この場合、2つの3通りの結合問題から5通りの結合問題になります。候補数は指数関数的に増加するため、これは大きな違いになります。プランナは大規模な結合検索問題で行き詰まらないように、もしfrom_collapse_limit個のFROM項目が親問い合わせで発生してしまう場合は副問い合わせの折り畳みを抑制します。この実行時パラメータの値を上下に調整することで計画作成時間と計画の質をトレードオフすることができます。

<xref linkend="guc-from-collapse-limit"/> and <xref linkend="guc-join-collapse-limit"/> are similarly named because they do almost the same thing: one controls when the planner will <quote>flatten out</quote> subqueries, and the other controls when it will flatten out explicit joins. Typically you would either set <varname>join_collapse_limit</varname> equal to <varname>from_collapse_limit</varname> (so that explicit joins and subqueries act similarly) or set <varname>join_collapse_limit</varname> to 1 (if you want to control join order with explicit joins). But you might set them differently if you are trying to fine-tune the trade-off between planning time and run time. 両者はほとんど同じことを行うため、from_collapse_limitとjoin_collapse_limitは似たような名前になっています。片方は副問い合わせの「平坦化」をプランナがいつ行うかを制御し、もう片方は明示的な結合の平坦化をいつ行うかを制御します。通常、join_collapse_limitをfrom_collapse_limitと同じ値に設定する（明示的な結合と副問い合わせの動作を同じにする）か、join_collapse_limitを1に設定する（明示的な結合で結合順を制御したい場合）かのどちらかを行います。しかし、計画作成時間と実行時間の間のトレードオフを細かく調整するつもりであれば、これらを別の値に設定しても構いません。

前へ	上へ	次へ
14.2. プランナで使用される統計情報	ホーム	14.4. データベースへのデータ投入

14.3. 明示的なJOIN句でプランナを制御する #

14.3. 明示的な`JOIN`句でプランナを制御する #