Commit d0ee56b

authored
Update docs (#48)
1 parent 386cc87 commit d0ee56b

File tree

22 files changed

+653
-359
lines changed

docs/architecture/comparison.md

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -28,7 +28,7 @@ PgDog aims to be the de facto PostgreSQL proxy and pooler. Below is a feature co
2828
| [Manual routing](../features/sharding/manual-routing.md) | Only using comments (regex), doesn't work with prepared statements | :material-check-circle-outline: |
2929
| [Automatic routing](../features/sharding/query-routing.md) | No | :material-check-circle-outline: |
3030
| [Primary key generation](../features/sharding/schema_management/primary_keys.md) | No | :material-check-circle-outline: |
31-
| [Cross-shard queries](../features/sharding/cross-shard.md) | No | Partial support |
32-
| [COPY](../features/sharding/copy.md) | No | :material-check-circle-outline: |
31+
| [Cross-shard queries](../features/sharding/cross-shard-queries/index.md) | No | Partial support |
32+
| [COPY](../features/sharding/cross-shard-queries/copy.md) | No | :material-check-circle-outline: |
3333
| [Postgres-compatible sharding functions](../features/sharding/sharding-functions.md) | No | Same functions as declarative partitioning |
3434
| Two-Phase Commit | No | :material-check-circle-outline: |

docs/configuration/pgdog.toml/general.md

Lines changed: 15 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -393,8 +393,23 @@ Default: **`50_000`**
393393

394394
### `query_parser_enabled`
395395

396+
!!! warning "Deprecated setting"
397+
This setting is deprecated. Use [`query_parser`](#query_parser) instead.
398+
396399
Force-enable query parsing to take advantage of its features in non-sharded databases, like [advisory locks](../../features/transaction-mode.md#advisory-locks) or managing [session state](../../features/transaction-mode.md#session-state).
397400

401+
### `query_parser`
402+
403+
Enables or disables the query parser and the features that depend on it. By default (`auto`), PgDog decides whether to enable the parser based on the database configuration, so only turn it off if you know what you're doing.
404+
405+
Available options:
406+
407+
- `on` (enabled)
408+
- `off` (disabled)
409+
- `auto` (automatically enabled or disabled, depending on database configuration)
410+
411+
Default: **`auto`**
412+
398413
## Logging
399414

400415
### `log_connections`

docs/configuration/pgdog.toml/rewrite.md

Lines changed: 8 additions & 39 deletions
Original file line numberDiff line numberDiff line change
@@ -2,11 +2,9 @@
22
icon: material/alpha-r-box-outline
33
---
44

5-
# Rewrite
5+
# Rewrite engine
66

7-
The `rewrite` section controls PgDog's automatic SQL rewrites for sharded clusters. It affects shard-key updates and multi-row INSERT statements, and can be toggled globally or per-policy.
8-
9-
## Options
7+
The `rewrite` section controls PgDog's automatic SQL rewrites for sharded databases. It affects sharding key updates and multi-tuple inserts. Either one can be toggled separately:
108

119
```toml
1210
[rewrite]
@@ -15,58 +13,29 @@ shard_key = "error"
1513
split_inserts = "error"
1614
```
1715

18-
| Field | Description | Default |
16+
| Setting | Description | Default |
1917
| --- | --- | --- |
20-
| `enabled` | Master toggle: when `false`, PgDog parses but never applies rewrite plans. | `false` |
18+
| `enabled` | Enables/disables the query rewrite engine. | `false` |
2119
| `shard_key` | Behavior when an `UPDATE` changes a sharding key: `error` rejects the statement,<br>`rewrite` migrates the row between shards,<br>`ignore` forwards it unchanged. | `"error"` |
2220
| `split_inserts` | Behavior when a sharded table receives a multi-row `INSERT`: `error` rejects the statement, `rewrite` fans the rows out to their shards, `ignore` forwards it unchanged. | `"error"` |
2321

2422
!!! note "Two-phase commit"
25-
PgDog recommends enabling [two-phase commit](../../features/sharding/2pc.md) when either policy is set to `rewrite`. Without it, rewrites are committed shard-by-shard and can leave partial changes if a shard fails.
23+
Consider enabling [two-phase commit](../../features/sharding/2pc.md) when either feature is set to `rewrite`. Without it, rewrites are committed shard-by-shard and can leave partial changes if a transaction fails.
2624

2725
## Runtime overrides
2826

2927
The admin database exposes these toggles via the `SET` command:
3028

3129
```postgresql
32-
SET rewrite_enabled TO true; -- mirrors [rewrite].enabled
30+
SET rewrite_enabled TO true; -- enable/disable rewrite engine
3331
SET rewrite_shard_key_updates TO rewrite; -- error | rewrite | ignore
3432
SET rewrite_split_inserts TO rewrite; -- error | rewrite | ignore
3533
```
3634

3735
The setting changes are applied immediately. These overrides allow canary testing before persisting them in `pgdog.toml`.
3836

39-
## Limitations
40-
41-
### Sharding key updates
42-
43-
Sharding key rewrites in an `UPDATE` clause have to resolve to a single row. If the sharding key isn't unique or the `WHERE` clause has an incorrect `OR` condition, for example, PgDog will rollback the transaction and raise an error.
44-
45-
For example:
46-
47-
```postgresql
48-
UPDATE users SET id = 5 WHERE admin = true;
49-
```
50-
51-
On a single-shard deployment, this would raise a unique index violation error. On a cross-shard deployment, the PgDog rewrite engine will block cross-shard updates that could potentially affect multiple rows.
52-
53-
### Multi-tuple inserts
54-
55-
`INSERT` statements with multiple tuples have to be executed outside of an explicit transaction. PgDog needs to start a cross-shard transaction to safely commit the rows to multiple shards, and an existing transaction will interfere with its internal state.
56-
57-
For example:
58-
59-
```postgresql
60-
BEGIN;
61-
INSERT INTO users VALUES ($1, $2), ($3, $4);
62-
```
63-
64-
This scenario will raise an error (code `25001`).
65-
66-
### Default behavior
67-
68-
Both split inserts and sharding key updates fallback to raising an error if `enabled` is set to `false`.
6937

7038
### Read more
7139

72-
- [Rewrite behavior](../../features/sharding/sharding-functions.md#rewrite-behavior)
40+
- [Cross-shard INSERT](../../features/sharding/cross-shard-queries/insert.md#multiple-tuples)
41+
- [Cross-shard UPDATE](../../features/sharding/cross-shard-queries/update.md#sharding-key-updates)

docs/features/sharding/.pages

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1,11 +1,11 @@
11
nav:
22
- 'index.md'
33
- 'basics.md'
4+
- 'supported-queries.md'
45
- 'query-routing.md'
56
- 'manual-routing.md'
6-
- 'cross-shard.md'
7+
- 'cross-shard-queries'
78
- 'sharding-functions.md'
8-
- 'copy.md'
99
- '...'
1010
- 'resharding'
1111
- 'internals'

docs/features/sharding/2pc.md

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -31,7 +31,7 @@ ALTER SYSTEM SET max_prepared_transactions TO 1000;
3131
Alternatively, if you're running on managed Postgres (e.g., AWS RDS), this parameter can usually be set through your cloud admin panel.
3232

3333
!!! note
34-
This parameter can only be enabled on server start. Once you change it, make sure to restart your Postgres servers.
34+
Changes to this parameter require a server restart to take effect.
3535

3636
Once prepared transactions are enabled in Postgres, two-phase commit can be enabled in [`pgdog.toml`](../../configuration/pgdog.toml/general.md):
3737

@@ -104,4 +104,4 @@ Two-phase commit is used for writes only. Read transactions are finished using n
104104
## Read more
105105

106106
- [Omnisharded tables](omnishards.md)
107-
- [Cross-shard queries](cross-shard.md)
107+
- [Cross-shard queries](cross-shard-queries/index.md)
Lines changed: 8 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,8 @@
1+
title: Cross-shard queries
2+
nav:
3+
- 'index.md'
4+
- 'select.md'
5+
- 'insert.md'
6+
- 'update.md'
7+
- 'ddl.md'
8+
- '...'

docs/features/sharding/copy.md renamed to docs/features/sharding/cross-shard-queries/copy.md

Lines changed: 17 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -1,9 +1,9 @@
11
---
22
icon: material/upload
33
---
4-
# COPY command
4+
# COPY
55

6-
`COPY` is a special PostgreSQL command that ingests a file directly into a specified database table. This allows for writing data faster than by using individual `INSERT` queries.
6+
`COPY` is a special PostgreSQL command that can ingest a file directly into a specified database table. This allows for writing data faster than by using individual `INSERT` queries.
77

88
PgDog supports parsing the `COPY` command, splitting the input data stream automatically, and sending the rows to each shard in parallel.
99

@@ -22,7 +22,7 @@ PgDog supports sharding data sent via `COPY`, using any one of the following for
2222
| Text | PostgreSQL version of CSV, with `<tab>` (`\t`) as the delimiter. | `hello\tworld\t1\t2\t3` |
2323
| Binary | PostgreSQL-specific format that encodes data using the format used to store it on disk. | |
2424

25-
Each row is extracted from the data stream, inspected for the sharding key, and sent to a data node. The sharding key should be specified in the [configuration](../../configuration/pgdog.toml/sharded_tables.md) and provided in the command statement, for example:
25+
Each row is extracted from the data stream, inspected for the sharding key, and sent to a data node. The sharding key should be specified in the [configuration](../../../configuration/pgdog.toml/sharded_tables.md) and provided in the command statement, for example:
2626

2727
```postgresql
2828
COPY users (id, email) FROM STDIN;
@@ -36,7 +36,19 @@ By using _N_ nodes in a sharded database cluster, the performance of `COPY` incr
3636

3737
The cost of parsing and sharding the CSV stream in PgDog is negligibly small.
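The row-splitting step described above can be sketched in a few lines of Python. This is only an illustration: `shard_for` is a hypothetical placeholder that uses a plain modulo, not PgDog's sharding function, and `split_copy_rows` is an assumed name that does not appear in PgDog.

```python
import csv
import io
from collections import defaultdict


def shard_for(key: int, shards: int) -> int:
    # Hypothetical placeholder: a plain modulo, NOT PgDog's real sharding function.
    return key % shards


def split_copy_rows(data: str, key_column: int, shards: int) -> dict:
    """Group CSV rows by destination shard, using one column as the sharding key."""
    buckets = defaultdict(list)
    for row in csv.reader(io.StringIO(data)):
        # Inspect the sharding key of each row and pick its destination shard.
        buckets[shard_for(int(row[key_column]), shards)].append(row)
    return dict(buckets)


rows = "1,a@example.com\n2,b@example.com\n3,c@example.com\n"
per_shard = split_copy_rows(rows, key_column=0, shards=2)
```

Each bucket could then be streamed to its shard independently, which is what makes parallel ingestion possible.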
3838

39+
## COPY out
40+
41+
All `COPY [...] TO STDOUT` statements are treated as cross-shard and are executed on all shards concurrently. The rows are streamed directly, without buffering or sorting, which allows reading large amounts of data from all shards quickly.
42+
43+
PgDog doesn't currently support routing `COPY` statements based on their queries. For example, the following statement will be sent to all shards even though it contains a sharding key:
44+
45+
```postgresql
46+
COPY (SELECT * FROM users WHERE id IN ($1, $2, $3)) TO STDOUT;
47+
```
48+
49+
If the query fetches rows from more than one shard, PgDog will also ignore any `ORDER BY` clauses and return rows in whatever order they arrive from the shards.
50+
3951
## Read more
4052

41-
- [Two-phase commit](2pc.md)
42-
- [Omnisharded tables](omnishards.md)
53+
- [Two-phase commit](../2pc.md)
54+
- [Omnisharded tables](../omnishards.md)
Lines changed: 30 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,30 @@
1+
---
2+
icon: material/table-cog
3+
---
4+
5+
# CREATE, ALTER, DROP
6+
7+
`CREATE`, `ALTER` and `DROP`, also known as **D**ata **D**efinition **L**anguage (DDL), are, by design, cross-shard statements. When a client sends over a DDL command, PgDog will send it to all shards in parallel, ensuring the table, index, view and sequence definitions are identical across the database cluster.
8+
9+
## Atomicity
10+
11+
DDL statements should be atomic across all shards. This protects against a single shard failing to create a table or index, which could result in an inconsistent schema. PgDog can use [two-phase commit](../2pc.md) to ensure this is the case. However, this requires all DDL statements to be executed inside a transaction, for example:
12+
13+
```postgresql
14+
BEGIN;
15+
CREATE TABLE users (
16+
id BIGSERIAL PRIMARY KEY,
17+
email VARCHAR NOT NULL,
18+
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
19+
);
20+
COMMIT;
21+
```
22+
23+
## Idempotency
24+
25+
Some statements, like `CREATE INDEX CONCURRENTLY`, cannot run inside transactions. To make sure these are safely executed, you have two options: use [manual routing](../manual-routing.md) and execute it on each shard individually, or write idempotent schema migrations, for example:
26+
27+
```postgresql
28+
DROP INDEX IF EXISTS user_id_idx;
29+
CREATE INDEX CONCURRENTLY user_id_idx ON users USING btree(user_id);
30+
```
Lines changed: 74 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,74 @@
1+
---
2+
icon: material/multicast
3+
---
4+
5+
# Cross-shard queries overview
6+
7+
If a client can't or doesn't specify a sharding key in the query, PgDog will send that query to all shards in parallel, and combine the results automatically. To the client, this looks like the query was executed by a single database.
8+
9+
<center style="margin-top: 2rem;">
10+
<img src="/images/cross-shard.png" width="95%" alt="Cross-shard queries" />
11+
</center>
12+
13+
## How it works
14+
15+
PgDog understands the Postgres protocol and query language. It can connect to multiple database servers, send the query to all of them, and collect [`DataRow`](#under-the-hood) messages as they are returned by each connection.
16+
17+
Once all servers finish executing the request, PgDog processes the result, performs any requested sorting, aggregation or row disambiguation, and sends the complete result back to the client, as if all rows came from one database server.
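As a rough illustration of this combine step, the Python sketch below buffers the rows returned by each shard, then sorts and aggregates them once every shard has finished. The function names and the sample rows are assumptions made for the example; they do not reflect PgDog's internal code.

```python
from itertools import chain


def combine_sorted(shard_results, key):
    """Concatenate the rows from every shard, then sort them as ORDER BY would."""
    return sorted(chain.from_iterable(shard_results), key=key)


def combine_counts(shard_counts):
    """A cross-shard COUNT(*) is the sum of the per-shard counts."""
    return sum(shard_counts)


# Hypothetical rows (id, email) as returned by two shards, in arrival order.
shard_0 = [(4, "d@example.com"), (1, "a@example.com")]
shard_1 = [(3, "c@example.com"), (2, "b@example.com")]

merged = combine_sorted([shard_0, shard_1], key=lambda row: row[0])
total = combine_counts([len(shard_0), len(shard_1)])
```

The client receives `merged` as a single result set, without knowing which shard produced each row.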
18+
19+
Just like with [direct-to-shard](../query-routing.md) queries, each SQL command is handled differently, as documented below:
20+
21+
- [`SELECT`](select.md)
22+
- [`INSERT`](insert.md)
23+
- [`UPDATE`, `DELETE`](update.md)
24+
- [`CREATE`, `ALTER`, `DROP`](ddl.md) (and other DDL statements)
25+
- [`COPY`](copy.md)
26+
27+
28+
## Under the hood
29+
30+
PgDog implements the PostgreSQL wire protocol, which is well documented and stable. The messages sent by Postgres clients and servers contain all the necessary information about data types, column names and executed statements, which PgDog can use to present multi-database results as a single stream of data.
31+
32+
The following protocol messages are especially relevant:
33+
34+
| Message | Description |
35+
|-|-|
36+
| `DataRow` | Contains one tuple. One `DataRow` message is sent for each row returned by the query. |
37+
| `RowDescription` | This message has the column names and data types returned by the query. |
38+
| `CommandComplete` | Indicates that the query has finished returning results. PgDog uses it to start sorting and aggregation. |
39+
40+
The protocol has two formats for encoding tuples: text and binary. Text format is equivalent to calling the `to_string()` method on native types, while binary encoding sends them in network byte order (big-endian). For example:
41+
42+
=== "Data"
43+
```postgresql
44+
SELECT 1::bigint, 2::integer, 'three'::VARCHAR;
45+
```
46+
=== "Encoding"
47+
| Data type | Text | Binary |
48+
|-|-|-|
49+
| `BIGINT` | `"1"` | `00 00 00 00 00 00 00 01` |
50+
| `INTEGER` | `"2"` | `00 00 00 02` |
51+
| `VARCHAR` | `"three"` | `three` |
52+
53+
Since PgDog needs to process rows before sending them to the client, we implemented parsing both formats for [most data types](select.md#supported-data-types).
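The encodings in the table above can be reproduced with Python's standard `struct` module. This is a minimal sketch of the two formats for integer and text values only; the helper names are assumptions made for illustration.

```python
import struct


def encode_text(value) -> bytes:
    """Text format: the value's string representation."""
    return str(value).encode("utf-8")


def encode_bigint(value: int) -> bytes:
    """Binary format for BIGINT: 8 bytes, big-endian (network byte order)."""
    return struct.pack(">q", value)


def encode_integer(value: int) -> bytes:
    """Binary format for INTEGER: 4 bytes, big-endian (network byte order)."""
    return struct.pack(">i", value)


bigint_hex = encode_bigint(1).hex(" ")    # "00 00 00 00 00 00 00 01"
integer_hex = encode_integer(2).hex(" ")  # "00 00 00 02"
varchar_text = encode_text("three")       # b"three"
```

A proxy that sorts or aggregates rows has to decode whichever format the client requested, which is why both parsers are needed.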
54+
55+
56+
## Disabling cross-shard queries
57+
58+
If you don't want PgDog to route cross-shard queries, e.g., because you have a [multitenant](../../multi-tenancy.md) system with no interdependencies, cross-shard queries can be disabled with a configuration setting:
59+
60+
```toml
61+
[general]
62+
cross_shard_disabled = true
63+
```
64+
65+
When this setting is enabled and a query doesn't have a sharding key, instead of executing the query, PgDog will return an error and abort the transaction.
66+
67+
## Read more
68+
69+
- [Sharding functions](../sharding-functions.md)
70+
- [Cross-shard `SELECT`](select.md)
71+
- [Cross-shard `INSERT`](insert.md)
72+
- [Cross-shard `UPDATE` and `DELETE`](update.md)
73+
- [DDL, e.g., `CREATE TABLE`](ddl.md)
74+
- [`COPY` command](copy.md)
