postgresql的hot standby(replication stream)

PG在9.*版本后热备提供了新的一个功能，那就是Stream Replication的读写分离,是PG高可用性的一个典型应用，也就是我们传统意义上说的Hot-Standby，比如Oracle的DG，mssql的mirror以及Mysql的读写分离等，与其他数据库相比较，有相同点，也有不同点，这些后述。下面是PG的流复制的安装步骤以及测试。

环境：
Vmware Workstation 8.0
操作系统:CentOS 6.2
数据库：PostgreSQL 9.1.3

虚拟主机2台
MASTER: 192.168.2.130
SLAVE: 192.168.2.129

环境参数
[postgres@localhost ~]$ echo $PGHOME
/home/postgres

[postgres@localhost ~]$ echo $PGDATA
/database/pgdata

Step1：安装PG数据库
略，slave端可以只装数据库，不初始化数据库

Step2：创建流复制用户

master端执行: CREATE USER repuserreplication LOGIN CONNECTION LIMIT3 ENCRYPTED PASSWORD 'repuser';

Step3：配置Master端的访问文件pg_hba.conf

增加一行：
host replication repuser 192.168.2.129/16 md5

Step4：配置MASTER端配置文件

max_wal_senders = 1
wal_level = hot_standby
archive_mode = on
archive_command = 'cd ./'

hot_standby = on
wal_keep_segments = 64

注：max_wal_senders是Slave库的节点数，有多少个slave库就设多少,
wal_level是write ahead log参数值,设置流复制务必将此值更新成hot_standby
wal_keep_segments默认值是16,是PG_XLOG下的日志文件数相关参数
archive也可以选择关闭，归档是定时恢复用的，流复制不是必须的

Step5：主库备份(Master端)
5.1：开启文件备份,前提是wal_level参数值必须是archive或者hot_standby,前面已做
select pg_start_backup('Replition work');

5.2：拷贝$PGDATA文件，并复制到Slave服务器上,排除pg_xlog内容,因为在slave还是要被清掉的，另外注意TAR打包时的权限问题，本次打包时遇到一个问题:

tar报错

tar (child): Cannot open: Permission denied
tar: Error is not recoverable: exiting now,后来将/database的权限授予了chown postgres解决.

打包
tar czvf pgdata.tar.gz pgdata --exclude=pgdata/pg_xlog

备机端如果已经安装了postgres数据库，数据文件夹名称一样的话，可以先停掉备机数据库，改名数据文件夹名称
mv pgdata pgdata.old

远程拷贝至slave端并在备机端解压
scp pgdata.tar.gz root@192.168.2.129:/database/
tar xzvf pgdata.tar.gz

5.3:上述步骤完成后,结束master端的备份

select pg_stop_backup(),current_timestamp;

Step6：修改Slave端配置信息

6.1：postgresql.conf文件
hot_standby = on

6.2：recovery.conf文件
$ cp $PGHOME/share/recovery.conf.sample $PGDATA/recovery.conf

$ vi recovery.conf --新增以下三行

standby_mode = 'on'

trigger_file = '/database/pgdata/postgresql.trigger.1949'

primary_conninfo = 'host=192.168.2.130 port=1949 user=repuser password=repuser keepalives_idle=60'

6.3：配置.pgpass文件(slave端)
新增slave访问master的密码文件，可以不用输密码
192.168.2.130:1949:postgres:repuser:repuser

6.4：删除slave端(从master端拷过来的)的pid文件和pg_xlog
$ rm -rf $PGDATA/pg_xlog
$ rm -f $PGDATA/postmaster.pid
$ mkdir $PGDATA/pg_xlog

Step7：启动Slave库
正常启动备库(pg_ctl -D $PGDATA -l pg.log start),有异常可以看log
复制完成后，可以通过CSV日志去查看，本处未设，直接查看进程

7.1 查看master进程：

[postgres@localhost ~]$ ps -ef|grep postgres
root 2454 2438 0 20:25 pts/0 00:00:00 su - postgres
postgres 2461 2454 0 20:25 pts/0 00:00:00 -bash
postgres 2535 1 0 20:26 pts/1 00:00:00 /home/postgres/bin/postgres -D /database/pgdata
postgres 2537 2535 0 20:26 ? 00:00:00 postgres: writer process
postgres 2538 2535 0 20:26 ? 00:00:00 postgres: wal writer process
postgres 2539 2535 0 20:26 ? 00:00:00 postgres: autovacuum launcher process
postgres 2540 2535 0 20:26 ? 00:00:00 postgres: archiver process
postgres 2541 2535 0 20:26 ? 00:00:00 postgres: stats collector process
postgres 3079 2535 0 21:56 ? 00:00:00 postgres: wal sender process repuser 192.168.2.129(45446) streaming 0/C01EDB8
postgres 3116 2535 0 22:02 ? 00:00:00 postgres: postgres postgres 192.168.2.1(52648) idle
postgres 3118 2535 0 22:02 ? 00:00:00 postgres: postgres test 192.168.2.1(52649) idle
postgres 3120 2535 0 22:02 ? 00:00:00 postgres: postgres test 192.168.2.1(52654) idle
root 3126 2490 0 22:04 pts/1 00:00:00 su - postgres
postgres 3214 3128 0 22:16 pts/1 00:00:00 grep postgres
postgres 3128 3126 0 22:04 pts/1 00:00:00 -bash
postgres 3213 3128 2 22:16 pts/1 00:00:00 ps -ef

7.2 查看slave进程：

[postgres@localhost ~]$ ps -ef|grep postgres
postgres 2856 1 0 21:54 pts/2 00:00:00 /home/postgres/bin/postgres -D /database/pgdata
postgres 2858 2856 0 21:54 ? 00:00:00 postgres: startup process recovering 000000010000000000000003
postgres 2859 2856 0 21:54 ? 00:00:00 postgres: writer process
postgres 2860 2856 0 21:54 ? 00:00:00 postgres: stats collector process
postgres 2899 2856 0 21:56 ? 00:00:00 postgres: wal receiver process streaming 0/C01ED28
postgres 3007 2856 0 22:02 ? 00:00:00 postgres: postgres postgres 192.168.2.1(52652) idle
postgres 3013 2856 0 22:03 ? 00:00:00 postgres: postgres test 192.168.2.1(52657) idle
postgres 3014 2856 0 22:03 ? 00:00:00 postgres: postgres test 192.168.2.1(52658) idle
root 3020 2756 0 22:04 pts/2 00:00:00 su - postgres
postgres 3022 3020 0 22:04 pts/2 00:00:00 -bash
postgres 3091 3022 4 22:15 pts/2 00:00:00 ps -ef
postgres 3092 3022 0 22:15 pts/2 00:00:00 grep postgres

此时在slave端的pg_xlog下面也产生了日志文件,并且之前pg_start_backup生成的文件名也变成了old的了.

查看日志内容：

[postgres@localhost ~]$ more pgsql.log

LOG: database system was shut down in recovery at 2012-04-23 18:33:25 PDT

LOG: entering standby mode

LOG: streaming replication successfully connected to primary

LOG: redo starts at 0/8000020

LOG: consistent recovery state reached at 0/C000000

LOG: database system is ready to accept read only connections

Step8：测试

On Master:
test=# select * from kenyon;

id | name

----+--------

2 | kenyon

(1 row)

test=# insert into kenyon values (2,'kenyon testing data');

INSERT 0 1

test=#

On Slave：

test=# select * from kenyon;

id | name

----+---------------------

2 | kenyon

2 | kenyon testing data

(2 rows)

test=# delete from kenyon where id = 2;

ERROR: cannot execute DELETE in a read-only transaction

新增的数据已经传输过去，并且slave端的会话是只读的。

postgresql的hot standby(replication stream)

相关文章