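Table Partitioning

Table partitioning is a technique that lets you split a large table into smaller, more manageable pieces called "partitions".

Each partition holds a subset of the data selected by a rule on the partition key, such as a range of values or a list of specific values. Partitioning can significantly improve query performance and simplifies data management for large datasets.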

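Benefits of table partitioning

Improved query performance: queries can target specific partitions, which reduces the amount of data scanned and speeds up execution.
Scalability: you can add or remove partitions as your data grows or changes, which gives you more flexibility.
Efficient data management: tasks such as loading, archiving, and deleting data operate on small partitions instead of the whole table.
Optimized maintenance: vacuuming and indexing run per partition, so maintenance tasks finish faster.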

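Partitioning methods

Postgres supports several partitioning methods, which differ in how they divide the data. The commonly used methods are:

Range partitioning: data is divided into partitions by ranges of values. For example, a sales table can be partitioned by date so that each partition covers a specific time span (such as one partition per month).
List partitioning: data is divided into partitions by explicit lists of values. For example, a customers table can be partitioned by region so that each partition holds the customers from particular regions (such as one partition for US customers and another for European customers).
Hash partitioning: a hash function spreads the data evenly across partitions. This gives a balanced distribution, which helps with load balancing. The drawback is that rows are not grouped by meaningful values, so only equality conditions on the partition key can be narrowed to a single partition; range conditions have to scan every partition.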

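Creating a partitioned table

Let's take a sales table as an example, range-partition it on the order date, and create partitions that store one month of data each: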

create table sales (
    id bigint generated by default as identity,
    order_date date not null,
    customer_id bigint,
    amount bigint,

    -- The primary key must include every partition key column:
    primary key (order_date, id)
)
partition by range (order_date);

create table sales_2000_01
    partition of sales
    for values from ('2000-01-01') to ('2000-02-01');

create table sales_2000_02
    partition of sales
    for values from ('2000-02-01') to ('2000-03-01');

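To create a partitioned table, append partition by range (<column_name>) to the create table statement. The partition key columns must be included in every primary key and unique constraint on the table, which is why the example declares a composite primary key (primary key (order_date, id)).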
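To see how rows land in partitions, the following minimal sketch inserts two rows through the parent table and then asks Postgres which partition stores each one. It assumes the sales table and the two monthly partitions defined above; the customer IDs and amounts are made-up sample values.

```sql
-- Insert through the parent table; Postgres routes each row
-- to the partition whose range contains its order_date.
insert into sales (order_date, customer_id, amount)
values
    ('2000-01-15', 1, 100),
    ('2000-02-03', 2, 250);

-- tableoid::regclass reports the table that physically stores each row.
select tableoid::regclass as partition_name, id, order_date, amount
from sales
order by order_date;
```

Note that the upper bound of each range is exclusive, so a row dated 2000-02-01 belongs to sales_2000_02. A row whose order_date falls outside every defined range is rejected with an error unless the table has a default partition.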

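Querying a partitioned table

There are two ways to query a partitioned table:

Query the parent table
Query a specific partition

Querying the parent table

When you query the parent table, Postgres uses the conditions in the query to route it to the relevant partitions automatically and skips partitions that cannot contain matching rows. This lets you retrieve data from all partitions at once.

Example: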

select *
from sales
where order_date >= '2000-01-01' and order_date < '2000-03-01';

This query retrieves data from both the sales_2000_01 and sales_2000_02 partitions.
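One way to check which partitions a query actually touches is to look at its plan. The sketch below narrows the date filter to February only; on recent Postgres versions, where partition pruning is enabled by default, the plan should mention sales_2000_02 and leave out sales_2000_01. The exact plan text varies by version and data, so no output is shown here.

```sql
-- Show the plan without running the query.
-- Only partitions that can hold matching rows should appear in it.
explain
select *
from sales
where order_date >= '2000-02-01' and order_date < '2000-03-01';
```

Pruning only helps when the filter is on the partition key (order_date here). A filter on customer_id alone would still have to scan every partition.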

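Querying a specific partition

If you only need data from one partition, you can query that partition directly instead of the parent table. This is useful when you want to target a particular range or condition that a single partition covers.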

select *
from sales_2000_02;

This query retrieves data from the sales_2000_02 partition only.

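When to partition a table

There is no clear threshold for when to use partitioning. Partitioning introduces complexity, and complexity should be avoided until it is truly needed. Some guidelines:

If performance is your concern, do not partition until the unpartitioned table actually shows degraded performance
If you use partitioning as a management tool, you can create partitions at any time
If you do not know how you should partition your data, it is probably too early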
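As an illustration of partitioning used as a management tool, the sketch below retires a whole month of data by detaching its partition and then dropping it. It assumes the sales table and the sales_2000_01 partition from the earlier example. The drop is destructive, so in practice you would archive the detached table first.

```sql
-- Detach the January 2000 partition. It becomes an ordinary standalone
-- table and is no longer visible through the parent table.
alter table sales detach partition sales_2000_01;

-- After archiving the data elsewhere (not shown), remove the table.
drop table sales_2000_01;
```

Removing a month this way avoids a large delete on the parent table and the vacuum work that such a delete would leave behind.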

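Examples

Here are simple examples of each partitioning type in Postgres.

Range partitioning

Consider a range partitioning example that stores sales data by order date. We create monthly partitions, each holding one month of sales.

In this example, the sales table has two partitions: sales_2000_01 and sales_2000_02. Rows are placed in these partitions according to the order date ranges specified for each: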

create table sales (
    id bigint generated by default as identity,
    order_date date not null,
    customer_id bigint,
    amount bigint,

    -- The primary key must include
    -- every partition key column:
    primary key (order_date, id)
)
partition by range (order_date);

create table sales_2000_01
    partition of sales
    for values from ('2000-01-01') to ('2000-02-01');

create table sales_2000_02
    partition of sales
    for values from ('2000-02-01') to ('2000-03-01');
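Extending the range example, the following sketch adds a partition for the next month and a catch-all default partition. The names sales_2000_03 and sales_default are hypothetical and chosen only to match the naming used above. Default partitions require Postgres 11 or later.

```sql
-- Add the next monthly partition before rows for March arrive.
create table sales_2000_03
    partition of sales
    for values from ('2000-03-01') to ('2000-04-01');

-- Optional catch-all: rows that match no other partition land here
-- instead of causing the insert to fail.
create table sales_default
    partition of sales default;
```

A default partition is a trade-off. It prevents insert errors, but when you later add a partition Postgres has to check the default partition for rows that belong in the new range, and the new partition cannot be created while such rows are still there.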

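List partitioning

Consider a list partitioning example that stores customer data by the customer's region. We create a separate partition for each group of countries.

In this example, the customers table has two partitions: customers_americas and customers_asia. Rows are placed in these partitions according to the country lists specified for each: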

-- Create the partitioned table
create table customers (
    id bigint generated by default as identity,
    name text,
    country text,

    -- The primary key must include
    -- every partition key column:
    primary key (country, id)
)
partition by list (country);

create table customers_americas
    partition of customers
    for values in ('US', 'CANADA');

create table customers_asia
    partition of customers
    for values in ('INDIA', 'CHINA', 'JAPAN');
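A short usage sketch for the list-partitioned table follows. It assumes the customers table and partitions defined above; the customer names are made-up sample values.

```sql
-- Each row is routed by an exact match of country against the lists.
insert into customers (name, country)
values
    ('Alice', 'US'),
    ('Kenji', 'JAPAN');

-- Filtering on the partition key lets Postgres read only customers_asia.
select tableoid::regclass as partition_name, name, country
from customers
where country = 'JAPAN';
```

The match is case-sensitive, so 'us' would not land in customers_americas. A country that appears in no list, such as 'FRANCE' here, is rejected with an error unless a default partition exists.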

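Hash partitioning

You can use hash partitioning to distribute data evenly.

In this example, the products table has two partitions: products_one and products_two. A hash function on the id column decides which partition each row goes to: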

create table products (
    id bigint generated by default as identity,
    name text,
    category text,
    price bigint
)
partition by hash (id);

create table products_one
    partition of products
    for values with (modulus 2, remainder 1);

create table products_two
    partition of products
    for values with (modulus 2, remainder 0);
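To get a feel for how evenly the hash spreads rows, the sketch below loads generated sample rows into the products table defined above and counts the rows per partition. The product names, category, and prices are placeholder values. The split should be roughly even but is generally not exactly half and half, so no counts are shown.

```sql
-- Load 1000 sample rows; ids come from the identity column.
insert into products (name, category, price)
select 'product ' || n, 'demo', n * 100
from generate_series(1, 1000) as n;

-- Count rows per partition.
select tableoid::regclass as partition_name, count(*) as row_count
from products
group by 1;
```

The remainder is computed from a hash of id, not from id itself, so products_one does not simply hold the odd ids.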

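Other tools

Several other tools are available for Postgres partitioning, most notably pg_partman. Native partitioning was introduced in Postgres 10 and is generally considered to perform better than the older inheritance- and trigger-based approaches.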