PostgreSQL 高可用集群部署指南

Ne0inhk

22 Mar 2026 — 11 min read

PostgreSQL 高可用集群部署指南

概述

本指南基于 Patroni + etcd 方案，提供完整的 PostgreSQL 高可用集群部署流程。该方案支持自动故障转移、读写分离、负载均衡等功能，确保数据库的高可用性和可扩展性。

架构组件

PostgreSQL 12.17: 核心数据库
Patroni: 集群管理和自动故障转移
etcd: 分布式键值存储，作为集群状态存储
HAProxy: 负载均衡器和读写分离
Keepalived: 虚拟IP漂移和高可用

集群规划

本部署使用 3 节点架构：

node1: 192.168.95.139
node2: 192.168.95.140
node3: 192.168.95.141

环境准备

1. 系统要求

CentOS 7.x
Python 3.6+
3 台物理机或虚拟机
网络互通，防火墙开放必要端口

2. 升级 Python 环境

# 参考 "Centos7设置python3.6 环境"# 安装 Python 3.6 yum install -y python3 python3-pip # 设置默认 Python 版本 alternatives --install /usr/bin/python python /usr/bin/python3 60

3. 创建必要目录

# 在所有节点执行mkdir -p /u1/pgsql/data # PostgreSQL 数据目录mkdir -p /etc/patroni/ # Patroni 配置目录mkdir -p /tmp/patroni02 # Patroni 日志目录

组件安装和配置

1. 安装和配置 etcd 集群

1.1 安装 etcd

# 在所有节点执行 yum install etcd -y

1.2 配置 etcd 集群

节点 1 配置 (/etc/etcd/etcd.conf)：

ETCD_DATA_DIR="/var/lib/etcd/codis.etcd"ETCD_LISTEN_PEER_URLS="http://0.0.0.0:2380"ETCD_LISTEN_CLIENT_URLS="http://0.0.0.0:2379"ETCD_NAME="node1"ETCD_INITIAL_ADVERTISE_PEER_URLS="http://192.168.95.139:2380"ETCD_ADVERTISE_CLIENT_URLS="http://192.168.95.139:2379"ETCD_INITIAL_CLUSTER="node1=http://192.168.95.139:2380,node2=http://192.168.95.140:2380,node3=http://192.168.95.141:2380"ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster"ETCD_INITIAL_CLUSTER_STATE="new"

节点 2 配置 (/etc/etcd/etcd.conf)：

ETCD_DATA_DIR="/var/lib/etcd/codis.etcd"ETCD_LISTEN_PEER_URLS="http://0.0.0.0:2380"ETCD_LISTEN_CLIENT_URLS="http://0.0.0.0:2379"ETCD_NAME="node2"ETCD_INITIAL_ADVERTISE_PEER_URLS="http://192.168.95.140:2380"ETCD_ADVERTISE_CLIENT_URLS="http://192.168.95.140:2379"ETCD_INITIAL_CLUSTER="node1=http://192.168.95.139:2380,node2=http://192.168.95.140:2380,node3=http://192.168.95.141:2380"ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster"ETCD_INITIAL_CLUSTER_STATE="new"

节点 3 配置 (/etc/etcd/etcd.conf)：

ETCD_DATA_DIR="/var/lib/etcd/codis.etcd"ETCD_LISTEN_PEER_URLS="http://0.0.0.0:2380"ETCD_LISTEN_CLIENT_URLS="http://0.0.0.0:2379"ETCD_NAME="node3"ETCD_INITIAL_ADVERTISE_PEER_URLS="http://192.168.95.141:2380"ETCD_ADVERTISE_CLIENT_URLS="http://192.168.95.141:2379"ETCD_INITIAL_CLUSTER="node1=http://192.168.95.139:2380,node2=http://192.168.95.140:2380,node3=http://192.168.95.141:2380"ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster"ETCD_INITIAL_CLUSTER_STATE="new"

1.3 启动 etcd 服务

# 在所有节点执行 systemctl daemon-reload systemctl enable etcd systemctl start etcd # 验证 etcd 成员列表 etcdctl member list # 检查集群健康状态 etcdctl --endpoints http://192.168.95.141:2379 cluster-health

1.4 安装 etcd-viewer（可选）

docker pull dontpanic57/etcd-viewer docker run -d -p 8080:8080 dontpanic57/etcd-viewer

2. 安装 PostgreSQL 12.17

2.1 离线安装 PostgreSQL

下载安装包（注意版本号一致性）：

postgresql12-libs-12.17-1PGDG.rhel7.x86_64.rpm
postgresql12-12.17-1PGDG.rhel7.x86_64.rpm
postgresql12-server-12.17-1PGDG.rhel7.x86_64.rpm
postgresql12-contrib-12.17-1PGDG.rhel7.x86_64.rpm

按顺序安装：

# 在所有节点执行，按依赖顺序安装rpm -ivh postgresql12-libs-12.17-1PGDG.rhel7.x86_64.rpm rpm -ivh postgresql12-12.17-1PGDG.rhel7.x86_64.rpm rpm -ivh postgresql12-server-12.17-1PGDG.rhel7.x86_64.rpm rpm -ivh postgresql12-contrib-12.17-1PGDG.rhel7.x86_64.rpm

解决依赖问题：

# 如遇依赖问题，先在外网下载依赖包 yum install --downloadonly --downloaddir=/pg12_rpm libicu # 上传至内网服务器后安装 yum install -y libicu-50.2-4.el7_7.x86_64.rpm

2.2 配置 PostgreSQL 数据目录

# 在所有节点执行chown postgres:postgres /u1/pgsql chmod -R 700 /u1/pgsql

3. 安装和配置 Patroni

3.1 安装 Patroni

# 安装 Python 依赖 pip install psycopg2-binary -i https://mirrors.aliyun.com/pypi/simple/ --trusted-host=mirrors.aliyun.com pip3 install patroni[etcd] -i https://mirrors.aliyun.com/pypi/simple/ --trusted-host=mirrors.aliyun.com # 验证安装which patroni

常见问题解决：

# 如遇 "python setup.py egg_info" 错误 yum remove python-pip # 到官网重新下载安装 python-pip# 创建 pip 软链接ln -sf /usr/local/python3/bin/pip /usr/bin/pip # 安装缺失组件 yum install python3-pip python3 python2 python2-pip

3.2 创建 .pgpass 文件

# 在所有节点执行，切换到 postgres 用户su - postgres echo"192.168.95.139:5432:postgres:postgres:postgres123"> ~/.pgpass echo"192.168.95.140:5432:postgres:postgres:postgres123">> ~/.pgpass echo"192.168.95.141:5432:postgres:postgres:postgres123">> ~/.pgpass chmod600 ~/.pgpass exit

3.3 创建 Patroni 配置文件

节点 1 配置 (/etc/patroni/patroni_postgresql.yml)：

scope: postgresql12 namespace: /postgresql/ name: node1 # 日志配置log:level: INFO traceback_level: ERROR dir: /tmp/patroni02 file_num:10file_size:104857600# REST API 配置restapi:listen: 192.168.95.139:8008connect_address: 192.168.95.139:8008# etcd 配置etcd:host: 192.168.95.139:2379host: 192.168.95.140:2379host: 192.168.95.141:2379# 集群初始化配置bootstrap:dcs:ttl:30loop_wait:10retry_timeout:10maximum_lag_on_failover:1048576master_start_timeout:300synchronous_mode:falsepostgresql:use_pg_rewind:trueuse_slots:trueparameters:listen_addresses:"*"wal_level: logical hot_standby:"on"wal_keep_segments:10max_wal_senders:10max_replication_slots:10wal_log_hints:"on"initdb:-encoding: UTF8 -locale: C -lc-ctype: zh_CN.UTF-8- data-checksums pg_hba:- host replication postgres 0.0.0.0/0 md5 - host all all 0.0.0.0/0 md5 # PostgreSQL 配置postgresql:listen: 0.0.0.0:5432connect_address: 192.168.95.139:5432data_dir: /u1/pgsql/data bin_dir: /usr/pgsql-12/bin config_dir: /u1/pgsql/data pgpass: /var/lib/pgsql/.pgpass authentication:replication:username: postgres password: postgres123 superuser:username: postgres password: postgres123 rewind:username: postgres password: postgres123 # 标签配置tags:nofailover:falsenoloadbalance:falseclonefrom:falsenosync:false

节点 2 配置（修改对应IP为192.168.95.140）和节点 3 配置（修改对应IP为192.168.95.141）同理，只需更改 name、listen、connect_address 和 pgpass 配置。

3.4 设置权限和目录

# 在所有节点执行chown -R postgres.postgres /etc/patroni/ chmod600 /etc/patroni/patroni_postgresql.yml chown -R postgres.postgres /tmp/patroni02

3.5 创建 Patroni 系统服务

创建服务文件 (/usr/lib/systemd/system/patroni.service)：

[Unit] Description=High availability PostgreSQL Cluster After=syslog.target network.target [Service] Type=simple User=postgres Group=postgres ExecStart=/usr/local/bin/patroni /etc/patroni/patroni_postgresql.yml KillMode=process TimeoutSec=30 Restart=no [Install] WantedBy=multi-user.target

启动 Patroni 服务：

# 在所有节点执行 systemctl daemon-reload systemctl enable patroni systemctl start patroni

3.6 验证集群状态

# 查看集群状态 patronictl -c /etc/patroni/patroni_postgresql.yml list # 如遇 Python 依赖错误 yum install python3-psycopg2

成功状态示例：

+ Cluster: postgresql12 (6965741508141313244) ---------+----+-----------+ | Member | Host | Role | State | TL | Lag in MB | +-----------+------------------+---------+---------+----+-----------+ | node1 | 192.168.95.139 | Leader | running | 1 | | | node2 | 192.168.95.140 | Replica | running | 1 | 0.0 | | node3 | 192.168.95.141 | Replica | running | 1 | 0.0 | +-----------+------------------+---------+---------+----+-----------+

3.7 手动故障转移测试

# 手动切换 Leader（测试用） patronictl -c /etc/patroni/patroni_postgresql.yml switchover

高可用组件配置

1. 安装和配置 HAProxy

1.1 安装 HAProxy

# 在所有节点执行 yum install -y haproxy

1.2 配置 HAProxy

配置文件 (/etc/haproxy/haproxy.cfg)：

global maxconn 100 log 127.0.0.1 local2 defaults log global mode tcp retries 2timeout client 30m timeout connect 4s timeout server 30m timeout check 5s # 统计页面 listen stats mode http bind *:7000 stats enable stats uri / # 主库读写服务 listen pgsql mode tcp bind *:5000 option httpchk http-check expect status 200 default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions server postgresql_192.168.95.139_5432 192.168.95.139:5432 maxconn 100 check port 8008 server postgresql_192.168.95.140_5432 192.168.95.140:5432 maxconn 100 check port 8008 server postgresql_192.168.95.141_5432 192.168.95.141:5432 maxconn 100 check port 8008# 备库只读服务 listen pgsql_read mode tcp bind *:6000 option httpchk GET /replica http-check expect status 200 default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions server postgresql_192.168.95.139_5432 192.168.95.139:5432 maxconn 100 check port 8008 server postgresql_192.168.95.140_5432 192.168.95.140:5432 maxconn 100 check port 8008 server postgresql_192.168.95.141_5432 192.168.95.141:5432 maxconn 100 check port 8008

1.3 启动 HAProxy

systemctl enable haproxy systemctl start haproxy # 解决端口绑定问题（SELinux） setsebool -P haproxy_connect_any=1

2. 安装和配置 Keepalived

2.1 安装 Keepalived

# 在 node1 和 node2 上安装 yum install -y keepalived

2.2 配置 Keepalived

node1 配置 (/etc/keepalived/keepalived.conf)：

global_defs { router_id LVS_DEVEL } vrrp_script check_haproxy { script "pgrep -x haproxy" interval 2 weight 10} vrrp_instance VI_1 { state BACKUP # 备服务器状态 interface ens33 # 网络接口，根据实际修改 virtual_router_id 210 priority 90# 优先级，node1 设置为 90 advert_int 1 track_script { check_haproxy } virtual_ipaddress {192.168.95.142 # 虚拟 IP}}

node2 配置（将 priority 改为 80，其他相同）

2.3 启动 Keepalived

systemctl enable keepalived systemctl start keepalived # 验证虚拟 IP 配置ip addr show dev ens33

数据库连接配置

1. 应用数据库配置

根据您的项目 application-dev.yml，当前使用 Oracle 数据库。要切换到 PostgreSQL 高可用集群，需要修改配置：

spring:datasource:# HAProxy 代理的读写分离配置url: jdbc:postgresql://192.168.95.142:5000/postgres username: postgres password: postgres123 # 多主机 URL 配置（备选方案）# 主库连接# url: jdbc:postgresql://192.168.95.139:5432,192.168.95.140:5432,192.168.95.141:5432/postgres?targetServerType=primary# 读写分离配置# url: jdbc:postgresql://192.168.95.139:5432,192.168.95.140:5432,192.168.95.141:5432/postgres?targetServerType=preferSecondary&loadBalanceHosts=true

2. 连接字符串配置选项

2.1 JDBC 多主机 URL

连接主库（可写节点）：

jdbc:postgresql://192.168.95.139:5432,192.168.95.140:5432,192.168.95.141:5432/postgres?targetServerType=primary

优先连接备库：

jdbc:postgresql://192.168.95.139:5432,192.168.95.140:5432,192.168.95.141:5432/postgres?targetServerType=preferSecondary&loadBalanceHosts=true

随机连接任意节点：

jdbc:postgresql://192.168.95.139:5432,192.168.95.140:5432,192.168.95.141:5432/postgres?targetServerType=any&loadBalanceHosts=true

2.2 HAProxy 代理配置

读写分离访问：

写操作: 192.168.95.142:5000
读操作: 192.168.95.142:6000
状态监控: http://192.168.95.142:7000

高级配置

1. 级联复制配置

1.1 集群内部级联复制

在备库节点添加以下配置：

tags:replicatefrom: node2 # 指定从 node2 复制数据

1.2 集群间级联复制

创建灾备集群配置：

bootstrap:dcs:standby_cluster:host: 192.168.234.210 # 上游集群地址port:5432primary_slot_name: slot1 create_replica_methods:- basebackup

2. 复制槽配置

在主集群配置中添加复制槽：

bootstrap:dcs:slots:slot1:type: physical

监控和维护

1. 集群状态监控

1.1 Patroni 状态查询

# 查看集群详细信息 patronictl -c /etc/patroni/patroni_postgresql.yml list # 查看成员详情 patronictl -c /etc/patroni/patroni_postgresql.yml query # 查看配置 patronictl -c /etc/patroni/patroni_postgresql.yml show

1.2 PostgreSQL 状态监控

# 连接数据库查看复制状态 psql -h 192.168.95.139 -U postgres -c "SELECT * FROM pg_stat_replication;"

2. 日志查看

# Patroni 日志tail -f /tmp/patroni02/patroni.log # PostgreSQL 日志tail -f /u1/pgsql/data/log/postgresql-*.log # etcd 日志 journalctl -u etcd -f

3. 性能优化配置

在 Patroni 配置的 parameters 部分添加优化参数：

postgresql:parameters:# 内存优化shared_buffers: 256MB effective_cache_size: 1GB work_mem: 4MB maintenance_work_mem: 64MB # 连接优化max_connections:100# WAL 配置wal_buffers: 16MB checkpoint_completion_target:0.9# 查询优化random_page_cost:1.1effective_io_concurrency:200

故障处理

1. 常见问题解决

1.1 Patroni 启动失败

问题: .pgpass 文件路径错误

# 检查配置文件中的路径设置 pgpass: /var/lib/pgsql/.pgpass # 正确路径

1.2 集群初始化失败

# 清理数据目录重新初始化rm -rf /u1/pgsql/data/* systemctl restart patroni

1.3 HAProxy 启动失败

问题: 端口绑定失败

# 检查 SELinux 设置 setsebool -P haproxy_connect_any=1# 或关闭 SELinux setenforce 0

1.4 Keepalived 虚拟IP不生效

# 检查网络接口配置ip addr show dev ens33 # 检查防火墙设置 firewall-cmd --add-rich-rule='rule family="ipv4" destination address="192.168.95.142" accept'

2. 应急处理流程

2.1 主库故障处理

等待自动故障转移: Patroni 会自动处理

手动故障转移（如需要）:

patronictl -c /etc/patroni/patroni_postgresql.yml switchover

确认故障: 检查集群状态

patronictl -c /etc/patroni/patroni_postgresql.yml list

2.2 节点恢复流程

修复节点: 解决硬件或系统问题
验证同步状态: 确认数据同步完成

重启 Patroni:

systemctl restart patroni

2.3 数据恢复流程

# 从备份恢复数据sudo -u postgres pg_ctl -D /u1/pgsql/data stop rm -rf /u1/pgsql/data/* # 恢复备份文件 systemctl start patroni

备份和恢复

1. 自动备份配置

创建备份脚本 (/usr/local/bin/pg_backup.sh)：

#!/bin/bash# PostgreSQL 高可用集群备份脚本# 备份配置BACKUP_DIR="/backup/pg"DATE=$(date +%Y%m%d_%H%M%S)RETENTION_DAYS=7# 创建备份目录mkdir -p $BACKUP_DIR# 连接到主库进行备份 psql -h 192.168.95.139 -U postgres -c "SELECT pg_start_backup('backup_$DATE');"# 备份数据文件tar -czf $BACKUP_DIR/pg_data_$DATE.tar.gz -C /u1/pgsql/data .# 结束备份 psql -h 192.168.95.139 -U postgres -c "SELECT pg_stop_backup();"# 清理旧备份find$BACKUP_DIR -name "pg_data_*.tar.gz" -mtime +$RETENTION_DAYS -delete echo"Backup completed: pg_data_$DATE.tar.gz"

设置定时任务：

chmod +x /usr/local/bin/pg_backup.sh # 每天凌晨2点执行备份echo"0 2 * * * /usr/local/bin/pg_backup.sh"|crontab -

2. 恢复操作

# 停止 Patroni systemctl stop patroni # 备份当前数据mv /u1/pgsql/data /u1/pgsql/data.backup # 恢复数据mkdir -p /u1/pgsql/data tar -xzf /backup/pg/pg_data_YYYYMMDD_HHMMSS.tar.gz -C /u1/pgsql/data # 设置权限chown -R postgres:postgres /u1/pgsql/data chmod700 /u1/pgsql/data # 启动 Patroni systemctl start patroni

总结

本部署指南提供了完整的 PostgreSQL 高可用集群部署方案，包含以下核心功能：

✅ 高可用性: 基于 Patroni 的自动故障转移
✅ 读写分离: HAProxy 代理的读写流量分离
✅ 负载均衡: 多节点负载分发
✅ 虚拟IP: Keepalived 提供的透明高可用
✅ 监控界面: HAProxy 统计页面
✅ 扩展性: 支持动态添加节点

生产环境注意事项

安全配置:
- 启用 SSL/TLS 加密
- 配置防火墙规则
- 设置强密码策略
性能优化:
- 根据硬件配置调整参数
- 监控系统性能指标
- 定期进行性能评估
容灾备份:
- 配置异地备份
- 定期演练恢复流程
- 监控备份完整性
维护计划:
- 定期更新系统补丁
- 监控磁盘空间使用
- 检查系统日志异常

通过以上配置，您可以获得一个稳定、高可用的 PostgreSQL 集群环境。