65.1. Database File Layout

PostgreSQL 17.5 Documentation
Chapter 65. Database Physical Storage

This section describes the storage format at the level of files and directories.

Traditionally, the configuration and data files used by a database cluster are stored together within the cluster's data directory, commonly referred to as `PGDATA` (after the name of the environment variable that can be used to define it). A common location for `PGDATA` is `/var/lib/pgsql/data`. Multiple clusters, each managed by a different server instance, can exist on the same machine.

The `PGDATA` directory contains several subdirectories and control files, as shown in Table 65.1. In addition to these required items, the cluster configuration files `postgresql.conf`, `pg_hba.conf`, and `pg_ident.conf` are traditionally stored in `PGDATA`, although it is possible to place them elsewhere.

Table 65.1. Contents of PGDATA

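Item	Description
`PG_VERSION`	A file containing the major version number of PostgreSQL
`base`	Subdirectory containing per-database subdirectories
`current_logfiles`	File recording the log file(s) currently written to by the logging collector
`global`	Subdirectory containing cluster-wide tables, such as `pg_database`
`pg_commit_ts`	Subdirectory containing transaction commit timestamp data
`pg_dynshmem`	Subdirectory containing files used by the dynamic shared memory subsystem
`pg_logical`	Subdirectory containing status data for logical decoding
`pg_multixact`	Subdirectory containing multitransaction status data (used for shared row locks)
`pg_notify`	Subdirectory containing LISTEN/NOTIFY status data
`pg_replslot`	Subdirectory containing replication slot data
`pg_serial`	Subdirectory containing information about committed serializable transactions
`pg_snapshots`	Subdirectory containing exported snapshots
`pg_stat`	Subdirectory containing permanent files for the statistics subsystem
`pg_stat_tmp`	Subdirectory containing temporary files for the statistics subsystem
`pg_subtrans`	Subdirectory containing subtransaction status data
`pg_tblspc`	Subdirectory containing symbolic links to tablespaces
`pg_twophase`	Subdirectory containing state files for prepared transactions
`pg_wal`	Subdirectory containing WAL (Write-Ahead Log) files
`pg_xact`	Subdirectory containing transaction commit status data
`postgresql.auto.conf`	A file used for storing configuration parameters that are set by `ALTER SYSTEM`
`postmaster.opts`	A file recording the command-line options the server was last started with
`postmaster.pid`	A lock file recording the current postmaster process ID (PID), the cluster data directory path, the postmaster start timestamp, the port number, the Unix-domain socket directory path (may be empty), the first valid listen address (an IP address or `*`, or empty if not listening on TCP), and the shared memory segment ID (this file is not present after server shutdown)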
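To illustrate the `postmaster.pid` entry above, the following Python sketch splits the file's contents into the fields the table lists. It assumes, as a simplification, that each value sits on its own line in the order given in the table. Any further lines the server may write are ignored. `parse_postmaster_pid` and `PIDFILE_FIELDS` are hypothetical names, not part of PostgreSQL, and the sample contents are invented.

```python
# Hypothetical field names, in the order listed for postmaster.pid in Table 65.1.
PIDFILE_FIELDS = [
    "pid",               # current postmaster process ID
    "data_directory",    # cluster data directory path
    "start_time",        # postmaster start timestamp
    "port",              # port number
    "socket_directory",  # Unix-domain socket directory (may be empty)
    "listen_address",    # first valid listen address, "*", or empty
    "shmem_ids",         # shared memory segment ID(s)
]


def parse_postmaster_pid(text: str) -> dict:
    """Map each line of a postmaster.pid file to a field name (sketch).

    Assumes one value per line in the order of PIDFILE_FIELDS; zip() stops
    after the last named field, so any additional lines are ignored.
    """
    lines = text.splitlines()
    return dict(zip(PIDFILE_FIELDS, lines))


if __name__ == "__main__":
    # Invented sample: empty socket directory, listening on all addresses.
    sample = "12345\n/var/lib/pgsql/data\n1700000000\n5432\n\n*\n  5432001  32768\n"
    info = parse_postmaster_pid(sample)
    print(info["pid"], info["port"], repr(info["socket_directory"]))
```

Values are kept as strings because empty lines are meaningful here (for example, an empty socket directory or listen address). A caller that needs numbers can convert `pid` and `port` itself.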

For each database in the cluster there is a subdirectory within `PGDATA/base`, named after the database's OID in `pg_database`. This subdirectory is the default location for the database's files; in particular, its system catalogs are stored there.
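Because these subdirectories are named by OID, a directory listing of `PGDATA/base` can be turned into a list of database OIDs. The Python sketch below does this. `database_oids` is a hypothetical helper. It keeps only all-digit directory names, so other entries such as `pgsql_tmp` are skipped. A database whose files live in a different tablespace may have no directory here, so the result should be cross-checked against `pg_database`.

```python
import os


def database_oids(pgdata: str) -> list:
    """Return OIDs of databases that have a subdirectory under PGDATA/base (sketch).

    Per-database subdirectories are named after the database OID, so only
    all-digit directory names are kept; other entries such as pgsql_tmp are skipped.
    """
    base_dir = os.path.join(pgdata, "base")
    oids = []
    for name in os.listdir(base_dir):
        if name.isdigit() and os.path.isdir(os.path.join(base_dir, name)):
            oids.append(int(name))
    return sorted(oids)
```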

Note that the following sections describe the behavior of the built-in `heap` table access method and the built-in index access methods. Due to the extensible nature of PostgreSQL, other access methods might work differently.

Each table and index is stored in a separate file. For ordinary relations, these files are named after the table or index's filenode number, which can be found in `pg_class.relfilenode`. For temporary relations, however, the file name has the form `tBBB_FFF`, where BBB is the process number of the backend that created the file and FFF is the filenode number. In either case, in addition to the main file (also called the main fork), each table and index has a free space map (see Section 65.3), which stores information about free space available in the relation. The free space map is stored in a file named with the filenode number plus the suffix `_fsm`. Tables also have a visibility map, stored in a fork with the suffix `_vm`, which tracks which pages are known to have no dead tuples. The visibility map is described further in Section 65.4. Unlogged tables and indexes have a third fork, known as the initialization fork, which is stored in a fork with the suffix `_init` (see Section 65.5).
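The naming rules above can be summarized in a small Python sketch. `relation_fork_files` is a hypothetical helper. It produces only first-segment names, and it lists the possible fork files rather than the files guaranteed to be on disk. For example, the free space map and visibility map files might not have been created yet for a new relation.

```python
from typing import List, Optional


def relation_fork_files(filenode: int, *, is_table: bool, unlogged: bool = False,
                        temp_proc_number: Optional[int] = None) -> List[str]:
    """List the fork file names described above for one relation (sketch).

    Only first-segment names are produced; the FSM and VM files are not
    guaranteed to exist yet on disk.
    """
    # Ordinary relations use the bare filenode; temporary ones use tBBB_FFF.
    if temp_proc_number is None:
        base = str(filenode)
    else:
        base = f"t{temp_proc_number}_{filenode}"

    names = [base, base + "_fsm"]       # main fork + free space map
    if is_table:
        names.append(base + "_vm")      # visibility map (tables only)
    if unlogged:
        names.append(base + "_init")    # initialization fork (unlogged relations)
    return names
```

For instance, `relation_fork_files(16400, is_table=False, unlogged=True)` describes an unlogged index: main fork, free space map, and initialization fork, with no visibility map.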

Note

While a table's filenode often matches its OID, this is not necessarily the case. Some operations, such as `TRUNCATE`, `REINDEX`, `CLUSTER`, and some forms of `ALTER TABLE`, can change the filenode while preserving the OID. Do not assume that the filenode and the table OID are the same. Also, for certain system catalogs, including `pg_class` itself, `pg_class.relfilenode` contains zero. The actual filenode number of these catalogs is stored in a lower-level data structure and can be obtained using the `pg_relation_filenode()` function.

When a table or index exceeds 1 GB, it is divided into gigabyte-sized segments. The first segment's file name is the same as the filenode; subsequent segments are named filenode.1, filenode.2, etc. This arrangement avoids problems on platforms that have file size limitations. (Actually, 1 GB is just the default segment size. The segment size can be adjusted using the configuration option `--with-segsize` when building PostgreSQL.) In principle, the free space map and visibility map forks could also require multiple segments, though this is unlikely to happen in practice.
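The segment naming rule can be expressed as a short Python sketch. `segment_files` is a hypothetical helper, and its `segment_bytes` parameter stands in for whatever size the build was configured with. It computes the expected names from a fork's size in bytes. The real set of files on disk can differ at segment boundaries, so treat the result as an estimate.

```python
from typing import List

# Default segment size (1 GB); a build configured with --with-segsize may differ.
DEFAULT_SEGMENT_BYTES = 1024 ** 3


def segment_files(filenode: int, relation_bytes: int,
                  segment_bytes: int = DEFAULT_SEGMENT_BYTES) -> List[str]:
    """Expected segment file names for a fork of the given size (sketch)."""
    # Ceiling division without floats; a relation always has at least one file.
    count = max(1, -(-relation_bytes // segment_bytes))
    # The first segment has no suffix; later ones are filenode.1, filenode.2, ...
    return [str(filenode)] + [f"{filenode}.{i}" for i in range(1, count)]
```

With the default segment size, a 2.5 GB main fork for filenode 16385 yields `16385`, `16385.1`, and `16385.2`.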

A table that has columns with potentially large entries will have an associated TOAST table, which is used for out-of-line storage of field values that are too large to keep in the table rows proper. `pg_class.reltoastrelid` links from a table to its TOAST table, if any. See Section 65.2 for more information.

The contents of tables and indexes are discussed further in Section 65.6.

Tablespaces make the scenario more complicated. Each user-defined tablespace has a symbolic link inside the `PGDATA/pg_tblspc` directory, which points to the physical tablespace directory (i.e., the location specified in the tablespace's `CREATE TABLESPACE` command). This symbolic link is named after the tablespace's OID. Inside the physical tablespace directory there is a subdirectory whose name depends on the PostgreSQL server version, such as `PG_9.0_201008051`. (This subdirectory lets successive versions of PostgreSQL use the same `CREATE TABLESPACE` location without conflicts.) Within the version-specific subdirectory, there is a subdirectory for each database that has elements in the tablespace, named after the database's OID. Tables and indexes are stored within that directory, using the filenode naming scheme. The `pg_default` tablespace is not accessed through `pg_tblspc`, but corresponds to `PGDATA/base`. Similarly, the `pg_global` tablespace is not accessed through `pg_tblspc`, but corresponds to `PGDATA/global`.
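The three cases above (`pg_global`, `pg_default`, and user-defined tablespaces) can be put together in a Python sketch that builds a relation's path relative to `PGDATA`. `relation_path` is a hypothetical helper. The version-specific directory name is passed in rather than computed, since it depends on the server build. The tablespace name `fastspace` used in the example is invented. Temporary relation prefixes are not handled.

```python
import posixpath


def relation_path(spcname: str, spc_oid: int, db_oid: int, filenode: int,
                  version_dir: str) -> str:
    """Path, relative to PGDATA, of a relation's first main-fork segment (sketch).

    version_dir is the server-version-specific subdirectory name, e.g.
    "PG_9.0_201008051"; it is a parameter here rather than computed.
    """
    if spcname == "pg_global":
        # Shared catalogs: PGDATA/global/<filenode>
        return posixpath.join("global", str(filenode))
    if spcname == "pg_default":
        # Default tablespace: PGDATA/base/<database OID>/<filenode>
        return posixpath.join("base", str(db_oid), str(filenode))
    # User-defined tablespace, reached via the symlink named after its OID.
    return posixpath.join("pg_tblspc", str(spc_oid), version_dir,
                          str(db_oid), str(filenode))
```

For example, `relation_path("fastspace", 16400, 16384, 16410, "PG_9.0_201008051")` points through the `pg_tblspc/16400` symbolic link into the version-specific and per-database subdirectories.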

The `pg_relation_filepath()` function shows the entire path (relative to `PGDATA`) of any relation. It is often useful as a substitute for remembering many of the above rules. Keep in mind, however, that this function gives only the name of the first segment of the main fork of the relation; you may need to append a segment number and/or `_fsm`, `_vm`, or `_init` to find all the files associated with the relation.
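Starting from the path that `pg_relation_filepath()` returns, the remaining files can be found by matching the fork suffixes and segment numbers against a directory listing. The Python sketch below does this. `associated_files` is a hypothetical helper. It assumes later segments of a fork are named as the fork's file name plus `.N` (for example `16385_fsm.1`). The listing would typically come from something like `os.listdir()` on the directory containing that path.

```python
import posixpath
import re
from typing import Iterable, List


def associated_files(relation_filepath: str, dir_listing: Iterable[str]) -> List[str]:
    """Find every fork and segment file of one relation (sketch).

    relation_filepath is the value returned by pg_relation_filepath(), i.e. the
    first segment of the main fork; dir_listing holds the names found in its
    directory. Matches <stem>, optionally followed by _fsm/_vm/_init, optionally
    followed by .N for later segments.
    """
    stem = posixpath.basename(relation_filepath)
    pattern = re.compile(re.escape(stem) + r"(?:_fsm|_vm|_init)?(?:\.\d+)?")
    return sorted(name for name in dir_listing if pattern.fullmatch(name))
```

Using `fullmatch` rather than a prefix test keeps unrelated relations whose filenode merely starts with the same digits (such as `163850` versus `16385`) out of the result.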

Temporary files (for operations such as sorting more data than can fit in memory) are created within `PGDATA/base/pgsql_tmp`, or within a `pgsql_tmp` subdirectory of a tablespace directory if a tablespace other than `pg_default` is specified for them. The name of a temporary file has the form `pgsql_tmpPPP.NNN`, where PPP is the PID of the owning backend and NNN distinguishes different temporary files of that backend.
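To attribute temporary files to backends, the `pgsql_tmpPPP.NNN` form can be parsed as in the following Python sketch. `parse_temp_file_name` is a hypothetical helper. It accepts only the exact form described above and returns `None` for anything else. Some temporary files, such as those shared between parallel workers, may use longer names.

```python
import re
from typing import Optional, Tuple

# pgsql_tmpPPP.NNN: PPP = owning backend PID, NNN = per-backend file counter.
_TEMP_FILE_RE = re.compile(r"pgsql_tmp(\d+)\.(\d+)")


def parse_temp_file_name(name: str) -> Optional[Tuple[int, int]]:
    """Return (backend_pid, file_number) for a name of the form pgsql_tmpPPP.NNN (sketch).

    Names that do not match exactly this form yield None.
    """
    m = _TEMP_FILE_RE.fullmatch(name)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2))
```

Grouping the parsed results by PID gives a rough view of which backends are currently spilling data to disk.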