15.1. How Parallel Query Works

PostgreSQL 17.5 Documentation, Chapter 15. Parallel Query

When the optimizer determines that parallel query is the fastest execution strategy for a particular query, it creates a query plan that includes a Gather or Gather Merge node. Here is a simple example:

EXPLAIN SELECT * FROM pgbench_accounts WHERE filler LIKE '%x%';
                                     QUERY PLAN
-------------------------------------------------------------------------------------
 Gather  (cost=1000.00..217018.43 rows=1 width=97)
   Workers Planned: 2
   ->  Parallel Seq Scan on pgbench_accounts  (cost=0.00..216018.33 rows=1 width=97)
         Filter: (filler ~~ '%x%'::text)
(4 rows)

In all cases, the Gather or Gather Merge node has exactly one child plan, which is the portion of the plan that will be executed in parallel. If the Gather or Gather Merge node is at the very top of the plan tree, the entire query executes in parallel. If it is somewhere else in the plan tree, only the portion of the plan below it runs in parallel. In the example above, the query accesses only one table, so there is only one plan node other than the Gather node itself; since that plan node is a child of the Gather node, it runs in parallel.
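The rule that only the subtree below a Gather or Gather Merge node runs in parallel can be sketched as a walk over a plan tree. This is a minimal illustration, not PostgreSQL internals: `PlanNode` and `parallel_nodes` are hypothetical names, the trees are built by hand, and the node names merely mimic the labels shown by EXPLAIN.

```python
from __future__ import annotations

from dataclasses import dataclass, field

GATHER_NODES = {"Gather", "Gather Merge"}


@dataclass
class PlanNode:
    """Hypothetical plan-tree node; `name` mimics the node label in EXPLAIN output."""
    name: str
    children: list[PlanNode] = field(default_factory=list)


def parallel_nodes(node: PlanNode, below_gather: bool = False) -> list[str]:
    """Return, in depth-first order, the names of nodes that lie below a
    Gather or Gather Merge node and therefore run in parallel."""
    is_gather = node.name in GATHER_NODES
    if is_gather and len(node.children) != 1:
        raise ValueError(f"{node.name} must have exactly one child plan")
    # The Gather node itself runs only in the leader, so it is not listed.
    names = [node.name] if below_gather else []
    for child in node.children:
        names.extend(parallel_nodes(child, below_gather or is_gather))
    return names


# Shape of the EXPLAIN example: Gather sits at the top of the tree.
scan_plan = PlanNode("Gather", [PlanNode("Parallel Seq Scan")])

# Hand-built plan where Gather is not at the top: only the subtree
# below it (Partial Aggregate and the scan) runs in parallel.
agg_plan = PlanNode("Finalize Aggregate", [
    PlanNode("Gather", [
        PlanNode("Partial Aggregate", [PlanNode("Parallel Seq Scan")]),
    ]),
])

print(parallel_nodes(scan_plan))  # ['Parallel Seq Scan']
print(parallel_nodes(agg_plan))   # ['Partial Aggregate', 'Parallel Seq Scan']
```

The check for exactly one child encodes the invariant stated above; the Finalize Aggregate node in the second tree stays outside the parallel portion because it sits above the Gather.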

Every background worker process that is successfully started for a given parallel query executes the parallel portion of the plan. The leader also executes that portion of the plan, but it has an additional responsibility: it must read all of the tuples generated by the workers. When the parallel portion of the plan generates only a small number of tuples, the leader often behaves very much like an additional worker, speeding up query execution. Conversely, when the parallel portion of the plan generates a large number of tuples, the leader may be almost entirely occupied with reading the tuples generated by the workers and performing any further processing steps required by plan nodes above the Gather or Gather Merge node. In such cases, the leader does very little of the work of executing the parallel portion of the plan.
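The trade-off described above can be illustrated with a toy discrete-time simulation. It is a sketch under simplifying assumptions, not PostgreSQL's executor: every participant scans one row per tick from a shared scan position, a row "passes the filter" when its index is a multiple of the hypothetical parameter `pass_every`, and the leader is assumed to read a waiting worker tuple before scanning any row of its own. The function name `simulate_gather` is likewise hypothetical.

```python
from collections import deque


def simulate_gather(total_rows: int, n_workers: int, pass_every: int) -> dict:
    """Toy discrete-time model of a Gather leader that also runs the parallel
    portion of the plan. Not PostgreSQL's executor; see assumptions above."""
    next_row = 0          # shared scan position: the next row anyone may claim
    queue = deque()       # tuples the workers have sent to the leader
    leader_scanned = 0
    worker_scanned = 0
    returned = 0          # tuples the leader has passed up to the node above Gather

    while next_row < total_rows or queue:
        # Each worker claims and scans one row per tick while rows remain;
        # a row "passes the filter" if its index is a multiple of pass_every.
        for _ in range(n_workers):
            if next_row < total_rows:
                if next_row % pass_every == 0:
                    queue.append(next_row)
                next_row += 1
                worker_scanned += 1
        # Assumed priority: the leader reads a waiting worker tuple first...
        if queue:
            queue.popleft()
            returned += 1
        # ...and only scans a row itself when no worker tuple is waiting.
        elif next_row < total_rows:
            if next_row % pass_every == 0:
                returned += 1
            next_row += 1
            leader_scanned += 1

    return {"leader_scanned": leader_scanned,
            "worker_scanned": worker_scanned,
            "returned": returned}


# Unselective filter: every row yields a tuple, so the queue never empties.
busy = simulate_gather(9000, n_workers=2, pass_every=1)
# Very selective filter: few tuples, so the leader mostly scans rows itself.
light = simulate_gather(9000, n_workers=2, pass_every=1000)
print(busy["leader_scanned"])   # 0
print(light["leader_scanned"])  # roughly a third of the 9000 rows
```

With two workers, the unselective case leaves the leader busy reading tuples for the whole scan, while the selective case lets it act almost like a third worker, matching the two extremes the paragraph describes.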

When the node at the top of the parallel portion of the plan is Gather Merge rather than Gather, each process executing the parallel portion of the plan is producing tuples in sorted order, and the leader is performing an order-preserving merge. In contrast, Gather reads tuples from the workers in whatever order is convenient, destroying any sort order that may have existed.
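The difference between the two nodes can be sketched in Python, assuming each process's output is already available as a sorted list. `gather` and `gather_merge` are hypothetical helpers: round-robin interleaving stands in for Gather's "whatever order is convenient" (in practice the arrival order depends on timing), while the standard library's `heapq.merge` performs the order-preserving k-way merge that Gather Merge relies on.

```python
import heapq
from itertools import zip_longest

_MISSING = object()  # fill value for exhausted streams in zip_longest


def gather(streams):
    """Model of Gather: interleave process output round-robin, standing in for
    'whatever order is convenient'. No ordering guarantee is kept."""
    out = []
    for batch in zip_longest(*streams, fillvalue=_MISSING):
        out.extend(t for t in batch if t is not _MISSING)
    return out


def gather_merge(streams):
    """Model of Gather Merge: k-way merge of streams that are each already
    sorted, so the combined output stays sorted."""
    return list(heapq.merge(*streams))


# Each inner list is one participating process's output, already sorted.
streams = [[1, 5, 9], [2, 3, 4]]
print(gather(streams))        # [1, 2, 5, 3, 9, 4]
print(gather_merge(streams))  # [1, 2, 3, 4, 5, 9]
```

Both functions return the same tuples; only the merge preserves the sort order, which is why a plan that needs sorted output above this point uses Gather Merge.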