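redis-trib.rb Command Reference

redis-trib.rb is the official management tool for Redis Cluster. It needs no separate download, since it ships in the src directory of the source distribution, but because it is written in Ruby, its runtime dependencies have to be prepared first.

Preparing the runtime environment for redis-trib.rb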

wget https://cache.ruby-lang.org/pub/ruby/2.5/ruby-2.5.1.tar.gz

yum -y install zlib-devel

tar xvf ruby-2.5.1.tar.gz

cd ruby-2.5.1/

./configure --prefix=/usr/local/ruby

make

make install

cd /usr/local/ruby/

cp bin/ruby /usr/local/bin

cp bin/gem /usr/local/bin

Installing the redis gem (RubyGems dependency)

wget http://rubygems.org/downloads/redis-3.3.0.gem

gem install -l redis-3.3.0.gem

Operations supported by redis-trib.rb

# redis-trib.rb help
Usage: redis-trib <command> <options> <arguments ...>

  create          host1:port1 ... hostN:portN
                  --replicas <arg>
  check           host:port
  info            host:port
  fix             host:port
                  --timeout <arg>
  reshard         host:port
                  --from <arg>
                  --to <arg>
                  --slots <arg>
                  --yes
                  --timeout <arg>
                  --pipeline <arg>
  rebalance       host:port
                  --weight <arg>
                  --auto-weights
                  --use-empty-masters
                  --timeout <arg>
                  --simulate
                  --pipeline <arg>
                  --threshold <arg>
  add-node        new_host:new_port existing_host:existing_port
                  --slave
                  --master-id <arg>
  del-node        host:port node_id
  set-timeout     host:port milliseconds
  call            host:port command arg arg .. arg
  import          host:port
                  --from <arg>
                  --copy
                  --replace
  help            (show this help)

For check,fix,reshard,del-node,set-timeout you can specify the host and port of any working node in the cluster.

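The supported operations are:

1. create: create a cluster

2. check: check the cluster

3. info: show cluster information

4. fix: repair the cluster

5. reshard: migrate slots online

6. rebalance: balance the number of slots across cluster nodes

7. add-node: add a new node

8. del-node: remove a node

9. set-timeout: set the node timeout

10. call: run a command on every node in the cluster

11. import: import data from an external Redis instance into the cluster

Creating a cluster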

redis-trib.rb create --replicas 1 127.0.0.1:6379 127.0.0.1:6380 127.0.0.1:6381 127.0.0.1:6382 127.0.0.1:6383 127.0.0.1:6384

The --replicas option specifies how many replicas each master in the cluster gets; here it is set to 1.

>>> Creating cluster
/usr/local/ruby/lib/ruby/gems/2.5.0/gems/redis-3.3.0/lib/redis/client.rb:459: warning: constant ::Fixnum is deprecated
>>> Performing hash slots allocation on 6 nodes...
Using 3 masters:
127.0.0.1:6379
127.0.0.1:6380
127.0.0.1:6381
Adding replica 127.0.0.1:6383 to 127.0.0.1:6379
Adding replica 127.0.0.1:6384 to 127.0.0.1:6380
Adding replica 127.0.0.1:6382 to 127.0.0.1:6381
>>> Trying to optimize slaves allocation for anti-affinity
[WARNING] Some slaves are in the same host as their master
M: bc775f9c4dea40820b82c9451778b1fcd42f92bc 127.0.0.1:6379
   slots:0-5460 (5461 slots) master
M: 3b27d00d13706a032a92ff6b0a914af272dcaaf2 127.0.0.1:6380
   slots:5461-10922 (5462 slots) master
M: d874f003257f1fb036bbd856ca605172a1741232 127.0.0.1:6381
   slots:10923-16383 (5461 slots) master
S: 648eb314863b82aaa676380be7db2ec307f5547d 127.0.0.1:6382
   replicates bc775f9c4dea40820b82c9451778b1fcd42f92bc
S: 65a6efb441ac44c348f7da8c62e26b888cda7c48 127.0.0.1:6383
   replicates 3b27d00d13706a032a92ff6b0a914af272dcaaf2
S: 57bda956485109552547aef6c77fba43d2124abf 127.0.0.1:6384
   replicates d874f003257f1fb036bbd856ca605172a1741232
Can I set the above configuration? (type 'yes' to accept): yes
>>> Nodes configuration updated
>>> Assign a different config epoch to each node
>>> Sending CLUSTER MEET messages to join the cluster
Waiting for the cluster to join...
>>> Performing Cluster Check (using node 127.0.0.1:6379)
M: bc775f9c4dea40820b82c9451778b1fcd42f92bc 127.0.0.1:6379
   slots:0-5460 (5461 slots) master
   1 additional replica(s)
S: 648eb314863b82aaa676380be7db2ec307f5547d 127.0.0.1:6382
   slots: (0 slots) slave
   replicates bc775f9c4dea40820b82c9451778b1fcd42f92bc
M: 3b27d00d13706a032a92ff6b0a914af272dcaaf2 127.0.0.1:6380
   slots:5461-10922 (5462 slots) master
   1 additional replica(s)
S: 57bda956485109552547aef6c77fba43d2124abf 127.0.0.1:6384
   slots: (0 slots) slave
   replicates d874f003257f1fb036bbd856ca605172a1741232
S: 65a6efb441ac44c348f7da8c62e26b888cda7c48 127.0.0.1:6383
   slots: (0 slots) slave
   replicates 3b27d00d13706a032a92ff6b0a914af272dcaaf2
M: d874f003257f1fb036bbd856ca605172a1741232 127.0.0.1:6381
   slots:10923-16383 (5461 slots) master
   1 additional replica(s)
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.

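All 16384 slots have been assigned, so the cluster was created successfully. Note that every node address passed to redis-trib.rb must belong to a node that holds no slots and no data; otherwise the tool refuses to create the cluster.

>>> Creating cluster
...
[ERR] Node 127.0.0.1:6379 is not empty. Either the node already knows other nodes (check with CLUSTER NODES) or contains some key in database 0.

The algorithm for choosing masters and replicas and for assigning slots is as follows: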

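1> Group the nodes by host, so that masters can be spread across as many hosts as possible.

2> Loop over the host list, popping one node from each host's list into the interleaved array, until every node has been popped.

3> Store the first M nodes of the interleaved array, where M is the number of masters, in the masters array.

4> Compute the number of slots each master is responsible for: 16384 divided by the number of masters, written N here. Judging by the 5461-slot and 5462-slot ranges in the output above, N appears to be kept as a fraction with each range boundary rounded, rather than being truncated to an integer once.

5> Loop over the masters array and give each master N slots; the last master receives whatever slots remain.

6> Next, assign replicas to the masters. The algorithm tries to keep a master and its replicas on different hosts. If nodes are still left over after every master has its requested number of replicas, masters are found for those nodes as well. The algorithm makes two passes over the masters array.

7> First pass: for each master, pick as many replicas as --replicas asks for from the list of remaining nodes. Each replica is the first remaining node whose host differs from the master's; if there is no such node, the first node in the remaining list is taken.

8> Second pass: assign the surplus nodes that are left over when the number of nodes does not divide evenly by replicas.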
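The steps above can be sketched in Python. This is only an illustration under assumptions, not the tool's actual code: the function names are made up for this example, the master count is assumed to be the node count divided by (replicas + 1), and only the first replica pass is shown. With the six local nodes used in this article, the slot ranges it prints match the create output (0-5460, 5461-10922, 10923-16383).

```python
from collections import defaultdict

CLUSTER_HASH_SLOTS = 16384


def host_of(node):
    """Return the host part of a "host:port" string."""
    return node.rsplit(":", 1)[0]


def interleave_by_host(nodes):
    """Steps 1-2: group nodes by host, then pop one node per host in turn."""
    by_host = defaultdict(list)
    for node in nodes:
        by_host[host_of(node)].append(node)
    queues = list(by_host.values())
    interleaved = []
    while any(queues):
        for queue in queues:
            if queue:
                interleaved.append(queue.pop(0))
    return interleaved


def allocate_slots(masters):
    """Steps 4-5: fractional N, rounded boundaries, last master takes the rest."""
    slots_per_node = CLUSTER_HASH_SLOTS / len(masters)
    ranges = {}
    first, cursor = 0, 0.0
    for i, master in enumerate(masters):
        last = round(cursor + slots_per_node - 1)
        if last > CLUSTER_HASH_SLOTS - 1 or i == len(masters) - 1:
            last = CLUSTER_HASH_SLOTS - 1
        ranges[master] = (first, last)
        first = last + 1
        cursor += slots_per_node
    return ranges


def assign_replicas_first_pass(masters, spare, replicas):
    """Steps 6-7: prefer a spare node on another host, else take the first one."""
    spare = list(spare)
    assignment = {master: [] for master in masters}
    for master in masters:
        while len(assignment[master]) < replicas and spare:
            other_host = [n for n in spare if host_of(n) != host_of(master)]
            chosen = other_host[0] if other_host else spare[0]
            spare.remove(chosen)
            assignment[master].append(chosen)
    return assignment, spare  # leftover spare nodes go to the second pass


nodes = [f"127.0.0.1:{port}" for port in range(6379, 6385)]
replicas = 1
interleaved = interleave_by_host(nodes)
masters_count = len(nodes) // (replicas + 1)  # assumption, see lead-in
masters, spare = interleaved[:masters_count], interleaved[masters_count:]
print(allocate_slots(masters))
print(assign_replicas_first_pass(masters, spare, replicas))
```

Because every node here is on the same host, the replica pairing from this sketch will not necessarily match the pairing the tool chose after its anti-affinity optimization.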

Checking cluster status

redis-trib.rb check 127.0.0.1:6379

Any node in the cluster can be specified.

/usr/local/ruby/lib/ruby/gems/... deprecated
>>> Performing Cluster Check (using node 127.0.0.1:6379)
...
[OK] All 16384 slots covered.

Viewing cluster information

redis-trib.rb info 127.0.0.1:6383

/usr/local/ruby/lib/ruby/gems/... deprecated
127.0.0.1:6380 (3b27d00d...) -> 0 keys | 5462 slots | 1 slaves.
127.0.0.1:6381 (d874f003...) -> 1 keys | 5461 slots | 1 slaves.
127.0.0.1:6379 (bc775f9c...) -> 0 keys | 5461 slots | 1 slaves.
[OK] 1 keys in 3 masters.
0.00 keys per slot on average.

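Repairing the cluster

The fix command can currently repair two kinds of anomalies:

1. A node has slots that are in the middle of a migration (importing or migrating state).

2. Some slots are not assigned to any node.

Other anomalies cannot be repaired with fix.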
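To make the two fixable conditions concrete, here is a small Python sketch that spots them in the text returned by CLUSTER NODES. It relies on the reply layout in which slot information starts at the ninth field and open slots appear as [slot->-node_id] (migrating) or [slot-<-node_id] (importing). The function name and the shortened node IDs in the sample are made up for illustration, and this is not how redis-trib.rb itself is implemented.

```python
CLUSTER_HASH_SLOTS = 16384


def find_fixable_problems(cluster_nodes_text):
    """Return (open_slots, uncovered_slots) found in CLUSTER NODES output.

    open_slots maps slot -> list of (node_id, state), where state is
    "migrating" or "importing"; uncovered_slots is a sorted list of slots
    that no node claims.
    """
    open_slots = {}
    covered = set()
    for line in cluster_nodes_text.strip().splitlines():
        fields = line.split()
        if len(fields) < 8:
            continue  # not a node line
        node_id, slot_fields = fields[0], fields[8:]
        for field in slot_fields:
            if field.startswith("["):
                body = field.strip("[]")
                if "->-" in body:
                    slot, state = body.split("->-")[0], "migrating"
                else:
                    slot, state = body.split("-<-")[0], "importing"
                open_slots.setdefault(int(slot), []).append((node_id, state))
            elif "-" in field:
                start, end = field.split("-")
                covered.update(range(int(start), int(end) + 1))
            else:
                covered.add(int(field))
    uncovered = sorted(set(range(CLUSTER_HASH_SLOTS)) - covered)
    return open_slots, uncovered


# Hypothetical sample: slot 100 is mid-migration and slot 10923 is unassigned.
sample = """
bc77 127.0.0.1:6379 myself,master - 0 0 1 connected 0-5460 [100->-3b27]
3b27 127.0.0.1:6380 master - 0 0 2 connected 5461-10922 [100-<-bc77]
d874 127.0.0.1:6381 master - 0 0 3 connected 10924-16383
"""
open_slots, uncovered = find_fixable_problems(sample)
print(open_slots)
print(uncovered)
```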

[root@slowtech conf]# redis-trib.rb fix 127.0.0.1:6379
/usr/local/ruby/lib/ruby/gems/...
...
S: d826c5fd98efa8a17a880e9a90a25f06c88e6ae9 ...
   slots: (0 slots) slave
   replicates a8b3d0f9b12d63dab3b7337d602245d96dd55844
S: 55c05d5b0dfea0d52f88548717ddf24975268de6 ...
   slots: (0 slots) slave
   replicates a8b3d0f9b12d63dab3b7337d602245d96dd55844
M: f413fb7e6460308b17cdb71442798e1341b56cbc ...
   slots:50-16383 (16334 slots) master
   2 additional replica(s)
S: beba753c5a63607fa66d9ec7427ed9a511ea136e ...
   slots: (0 slots) slave
   replicates f413fb7e6460308b17cdb71442798e1341b56cbc
S: 83797d518e56c235272402611477f576973e9d34 ...
   slots: (0 slots) slave
   replicates f413fb7e6460308b17cdb71442798e1341b56cbc
M: a8b3d0f9b12d63dab3b7337d602245d96dd55844 ...
   slots:0-49 (50 slots) master
   2 additional replica(s)
...

Migrating slots online

Interactive use

For example:

redis-trib.rb reshard 127.0.0.1:6379

Any node in the cluster can be specified.

/usr/local/ruby/lib/ruby/gems/...
...
[OK] All 16384 slots covered.
How many slots do you want to move (from 1 to 16384)? 200
What is the receiving node ID? 3b27d00d13706a032a92ff6b0a914af272dcaaf2
Please enter all the source node IDs.
  Type 'all' to use all the nodes as source nodes for the hash slots.
  Type 'done' once you entered all the source nodes IDs.
Source node #1:

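It first asks how many slots to migrate; I entered 200 here.

Next it asks which node the slots should be migrated to; this must be given as a node ID.

Then it asks which nodes the slots should be taken from.

If all is given, the slots to migrate are drawn evenly from the remaining nodes; in this case 127.0.0.1:6379 and 127.0.0.1:6381 each give up 100 slots.

Slots can also be taken from specific nodes. In that case the node IDs of the source nodes must be given, ending with done, as shown here: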

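Source node #1:bc775f9c4dea40820b82c9451778b1fcd42f92bc
Source node #2:done

Ready to move 200 slots.
  Source nodes:
    M: bc775f9c4dea40820b82c9451778b1fcd42f92bc 127.0.0.1:6379
   ...
   1 additional replica(s)
  Destination node:
    M: 3b27d00d13706a032a92ff6b0a914af272dcaaf2 127.0.0.1:6380
   ...
   1 additional replica(s)
  Resharding plan:
    Moving slot 3225 from bc775f9c4dea40820b82c9451778b1fcd42f92bc
    Moving slot 3226 from bc775f9c4dea40820b82c9451778b1fcd42f92bc
    Moving slot 3227 from bc775f9c4dea40820b82c9451778b1fcd42f92bc
    ...
Do you want to proceed with the proposed reshard plan (yes/no)? yes
Moving slot 3225 from 127.0.0.1:6379 to 127.0.0.1:6380: .
Moving slot 3226 from 127.0.0.1:6379 to 127.0.0.1:6380: 
Moving slot 3227 from 127.0.0.1:6379 to 127.0.0.1:6380: ..
Moving slot 3228 from 127.0.0.1:6379 to 127.0.0.1:6380: 
...

Once the plan has been printed, it asks for confirmation before proceeding.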
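How the requested slot count might be split across several source nodes can be sketched as follows. The assumption here, which goes slightly beyond the "evenly" wording above, is that each source contributes in proportion to the number of slots it owns, with the largest source rounding up and the rest rounding down. The function name is made up, and integer arithmetic is used to avoid floating-point surprises.

```python
def plan_source_counts(source_slot_counts, numslots):
    """Split numslots across sources in proportion to the slots each owns.

    source_slot_counts maps node -> number of slots it currently owns.
    Proportional splitting with ceil for the largest source and floor for
    the others is an assumption of this sketch.
    """
    total = sum(source_slot_counts.values())
    ordered = sorted(source_slot_counts.items(),
                     key=lambda item: item[1], reverse=True)
    plan = {}
    remaining = numslots
    for i, (node, owned) in enumerate(ordered):
        numerator = numslots * owned
        if i == 0:
            share = -(-numerator // total)  # ceiling division
        else:
            share = numerator // total      # floor division
        share = min(share, owned, remaining)
        plan[node] = share
        remaining -= share
    return plan


# Two sources owning 5461 slots each, 200 slots requested.
print(plan_source_counts({"127.0.0.1:6379": 5461, "127.0.0.1:6381": 5461}, 200))
```

With two equally sized sources, the split comes out as 100 slots each, which agrees with the example described above.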

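Command-line use

redis-trib.rb reshard --from <arg> --to <arg> --slots <arg> --yes --timeout <arg> --pipeline <arg> host:port

where:

host:port: required; the address of any node in the cluster, used to obtain information about the whole cluster. It is written after the options.

--from: source node ID(s). Separate multiple source nodes with commas; if set to all, the sources are all masters in the cluster other than the target node.

--to: target node ID; only one may be given.

--slots: total number of slots to migrate.

--yes: migrate without asking the user for confirmation.

--timeout: timeout for each migrate operation; the default is 60000 milliseconds.

--pipeline: number of keys migrated per batch; the default is 10.

For example:

redis-trib.rb reshard --from a8b3d0f9b12d63dab3b7337d602245d96dd55844 --to f413fb7e6460308b17cdb71442798e1341b56cbc --slots 10923 --yes --pipeline 20 127.0.0.1:6383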
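For scripted use, the command line above can be assembled programmatically. The helper below is a hypothetical convenience wrapper (its name and parameters are not part of redis-trib.rb); it only builds the argument list, using the options described in this section and placing host:port last as in the example above.

```python
import shlex


def build_reshard_argv(node, source_ids, target_id, slots,
                       assume_yes=True, timeout_ms=None, pipeline=None):
    """Build an argv list for a non-interactive redis-trib.rb reshard call."""
    argv = ["redis-trib.rb", "reshard",
            "--from", ",".join(source_ids),
            "--to", target_id,
            "--slots", str(slots)]
    if assume_yes:
        argv.append("--yes")
    if timeout_ms is not None:
        argv += ["--timeout", str(timeout_ms)]
    if pipeline is not None:
        argv += ["--pipeline", str(pipeline)]
    argv.append(node)  # host:port goes after the options
    return argv


argv = build_reshard_argv(
    "127.0.0.1:6383",
    ["a8b3d0f9b12d63dab3b7337d602245d96dd55844"],
    "f413fb7e6460308b17cdb71442798e1341b56cbc",
    10923,
    pipeline=20,
)
print(shlex.join(argv))
```

The resulting list could then be handed to subprocess.run(argv, check=True) on a machine where redis-trib.rb is on the PATH.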

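Balancing the number of slots across cluster nodes

rebalance       host:port
                  --weight <arg>
                  --auto-weights
                  --use-empty-masters
                  --timeout <arg>
                  --simulate
                  --pipeline <arg>
                  --threshold <arg>

where:

--weight <arg>: the weight of a node, in the form node_id=weight. To assign weights to several nodes, repeat the option, e.g. --weight b31e3a2e=5 --weight 60b8e3a1=5. node_id may be a prefix of the node ID, as long as the prefix is long enough to identify the node uniquely. Nodes for which no --weight is given have a default weight of 1.

--auto-weights: automatically sets every node's weight to the default of 1. If --weight and --auto-weights are both given, --auto-weights overrides --weight.

--threshold <arg>: the rebalance is only carried out when the amount of slots a node needs to migrate exceeds threshold.

--use-empty-masters: by default, masters with no slots assigned do not take part in the rebalance. Add this option to include them.

--timeout <arg>: sets the timeout of the migrate command.

--simulate: with this option, the tool only reports which slots would be migrated and does not perform the actual migration.

--pipeline <arg>: the number of keys fetched by each cluster getkeysinslot call; the default is 10 when not given.

For example: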

# redis-trib.rb rebalance --weight a8b3d0f9b12d63dab3b7337d602245d96dd55844=3 --weight f413fb7e6460308b17cdb71442798e1341b56cbc=2 --use-empty-masters ...
...
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
>>> Rebalancing across 2 nodes. Total weight = 5.0
Moving 3824 slots from 127.0.0.1:6380 to ...
#########################################...
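The effect of the weights can be illustrated with a little arithmetic. In the sketch below, each node's expected slot count is its share of the 16384 slots according to its weight, and its balance is the difference between what it owns and what it should own. The function name, the sample slot counts, and the reading of threshold as a percentage with a default of 2 are assumptions for illustration.

```python
CLUSTER_HASH_SLOTS = 16384


def rebalance_targets(current_slots, weights=None, threshold_pct=2.0):
    """Compute expected slot counts and balances from node weights.

    current_slots maps node -> slots owned now; weights maps node -> weight
    (nodes without an entry default to 1). Treating the threshold as a
    percentage of the expected count is an assumption of this sketch.
    """
    weights = weights or {}
    total_weight = float(sum(weights.get(node, 1) for node in current_slots))
    report = {}
    needs_rebalance = False
    for node, owned in current_slots.items():
        expected = int(CLUSTER_HASH_SLOTS / total_weight * weights.get(node, 1))
        balance = owned - expected  # > 0 gives slots away, < 0 receives slots
        report[node] = {"expected": expected, "balance": balance}
        if expected and abs(balance) / expected * 100 > threshold_pct:
            needs_rebalance = True
    return total_weight, report, needs_rebalance


# Hypothetical starting point: one master owns 50 slots, the other 16334.
total_weight, report, needs_rebalance = rebalance_targets(
    {"a8b3d0f9": 50, "f413fb7e": 16334},
    weights={"a8b3d0f9": 3, "f413fb7e": 2},
)
print(total_weight, report, needs_rebalance)
```

With weights 3 and 2 the total weight is 5.0, the same figure that appears in the "Total weight" line of the output above.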

Removing a node

redis-trib.rb del-node host:port node_id

Before a node can be removed, it must own no slots, so its slots have to be migrated away with redis-trib.rb reshard first.

Note that once all of a node's slots have been migrated away, its replicas are updated as well and switch to replicating the node that received the slots.

# redis-trib.rb del-node ... 8f7836a9a14fb6638530b42e04f5e58e28de0a6c
>>> Removing node 8f7836a9a14fb6638530b42e04f5e58e28de0a6c from cluster ...
...
>>> Sending CLUSTER FORGET messages to the cluster...
>>> SHUTDOWN the node.

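Adding a new node

redis-trib.rb add-node --slave --master-id <arg> new_host:new_port existing_host:existing_port

where:

new_host:new_port: the node to add. It must be empty and must not belong to another cluster; otherwise the following error is reported.

[ERR] Node ... is not empty. Either the node already knows other nodes (check with CLUSTER NODES) or contains some key in database 0.

For this reason, redis-trib.rb is the recommended way to add nodes in production, because it checks the state of the new node. If a node that already belongs to another cluster is joined manually with the cluster meet command, that node's cluster gets merged into the current one, which leads to data loss and corruption. The consequences are severe, so be careful in production.

existing_host:existing_port: the address of any node in the cluster.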
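The emptiness check mentioned above follows the two conditions named in the error message: the node must not know any other node, and database 0 must hold no keys. A minimal Python sketch of that test, working on already-fetched replies rather than a live connection, could look like this; the function name and the shape of the keyspace dictionary are assumptions.

```python
def node_is_empty(cluster_info_text, keyspace_info):
    """True if the node knows only itself and database 0 holds no keys.

    cluster_info_text is the raw CLUSTER INFO reply; keyspace_info is a dict
    parsed from INFO keyspace, for example {"db0": {"keys": 3}}.
    """
    fields = dict(
        line.split(":", 1)
        for line in cluster_info_text.splitlines()
        if ":" in line
    )
    knows_only_itself = fields.get("cluster_known_nodes", "").strip() == "1"
    db0_has_keys = bool(keyspace_info.get("db0"))
    return knows_only_itself and not db0_has_keys


fresh = "cluster_state:fail\r\ncluster_known_nodes:1\r\ncluster_size:0\r\n"
joined = "cluster_state:ok\r\ncluster_known_nodes:6\r\ncluster_size:3\r\n"
print(node_is_empty(fresh, {}))
print(node_is_empty(joined, {}))
print(node_is_empty(fresh, {"db0": {"keys": 1}}))
```

A script that joins nodes with cluster meet directly could run a check like this first to avoid merging two clusters by accident.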

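To add a master, only the addresses of the source node and the target node need to be given.

redis-trib.rb add-node 127.0.0.1:6379 127.0.0.1:6384

To add a replica, the syntax is as follows:

redis-trib.rb add-node --slave --master-id f413fb7e6460308b17cdb71442798e1341b56cbc ... 127.0.0.1:6384

Note: --slave and --master-id must come before the addresses. With the same arguments written in the following order, an error is reported:

# redis-trib.rb add-node ... 127.0.0.1:6384 --slave --master-id f413fb7e6460308b17cdb71442798e1341b56cbc
[ERR] Wrong number of arguments for specified sub command

When adding a replica, --master-id may be omitted, in which case redis-trib.rb selects a master itself (the one that currently has the fewest replicas).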

Setting the node timeout

redis-trib.rb set-timeout host:port milliseconds

This simply changes the cluster-node-timeout parameter on every node in the cluster in one go.

# redis-trib.rb set-timeout ... 20000
/usr/local/ruby/lib/ruby/gems/... deprecated
>>> Reconfiguring node timeout in every cluster node...
*** New timeout set for 127.0.0.1:6379
*** New timeout set for 127.0.0.1:6383
*** New timeout set for 127.0.0.1:6381
*** New timeout set for 127.0.0.1:6382
*** New timeout set for 127.0.0.1:6384
*** New timeout set for 127.0.0.1:6380
>>> New node timeout set. 6 OK, 0 ERR.

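Running a command on every node in the cluster

redis-trib.rb call host:port command arg arg .. arg

For example:

[root@slowtech conf]# redis-trib.rb call ... set hello world
/usr/local/ruby/lib/ruby/gems/...
>>> Calling SET hello world
127.0.0.1:6379: MOVED 866 127.0.0.1:6381
127.0.0.1:6383: MOVED 866 127.0.0.1:6381
127.0.0.1:6381: OK
127.0.0.1:6382: MOVED 866 127.0.0.1:6381
127.0.0.1:6384: MOVED 866 127.0.0.1:6381
127.0.0.1:6380: MOVED 866 127.0.0.1:6381

[root@slowtech conf]# redis-trib.rb call ... get hello
/usr/local/ruby/lib/ruby/gems/...
>>> Calling GET hello
127.0.0.1:6379: MOVED 866 127.0.0.1:6381
127.0.0.1:6383: MOVED 866 127.0.0.1:6381
127.0.0.1:6381: world
127.0.0.1:6382: MOVED 866 127.0.0.1:6381
127.0.0.1:6384: MOVED 866 127.0.0.1:6381
127.0.0.1:6380: MOVED 866 127.0.0.1:6381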
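The slot number 866 in the MOVED replies above comes from the key itself: Redis Cluster maps a key to a slot by taking its CRC16 (the XMODEM variant) modulo 16384, hashing only the part between { and } when the key contains a non-empty hash tag. A small Python sketch of that mapping, using binascii.crc_hqx from the standard library for the CRC and a function name chosen for this example:

```python
import binascii

CLUSTER_HASH_SLOTS = 16384


def key_slot(key: bytes) -> int:
    """Map a key to its hash slot: CRC16 (XMODEM) modulo 16384."""
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        if end != -1 and end != start + 1:
            key = key[start + 1:end]  # hash tag: only this part is hashed
    return binascii.crc_hqx(key, 0) % CLUSTER_HASH_SLOTS


print(key_slot(b"hello"))
print(key_slot(b"{hello}.counter"))
```

Since call sends the command to every node without following redirects, only the node that owns the key's slot executes it; every other node answers with MOVED.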

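Importing data from an external Redis instance into the cluster

redis-trib.rb import --from 127.0.0.1:6378 127.0.0.1:6379

Internally, it proceeds as follows:

1> Load the cluster information with the load_cluster_info_from_node method and verify that the cluster is healthy with the check_cluster method.

2> Connect to the external Redis node. If cluster_enabled is on for that node, an error is reported ([ERR] The source node should not be a cluster node.).

3> Iterate over the external node with the scan command, fetching 1000 entries at a time.

4> Loop over those keys and compute the slot each key maps to.

5> Run the migrate command with the external node as the source and the cluster node that owns the slot as the destination. If --copy is set, the copy argument is passed, which keeps the key on the source node. If --replace is set, the replace argument is passed, so a key with the same name on the destination node has its value overwritten. The two options can be combined.

6> Keep running scan until all keys have been visited.

7> The migration is complete.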
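Steps 3 to 6 above amount to a loop that routes every scanned key to the node owning its slot. The Python sketch below models only that routing decision, without talking to Redis: the scan batches are plain lists, the slot ranges are copied from the create output earlier in this article, and the function names are invented for the example. A real import would issue SCAN and MIGRATE against live servers.

```python
import binascii

# Slot ranges taken from the create output earlier in this article.
SLOT_OWNERS = [
    (0, 5460, "127.0.0.1:6379"),
    (5461, 10922, "127.0.0.1:6380"),
    (10923, 16383, "127.0.0.1:6381"),
]


def key_slot(key: bytes) -> int:
    """CRC16 (XMODEM) modulo 16384, honouring non-empty {hash tags}."""
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        if end != -1 and end != start + 1:
            key = key[start + 1:end]
    return binascii.crc_hqx(key, 0) % 16384


def owner_of(slot):
    """Return the node that owns the given slot."""
    for first, last, node in SLOT_OWNERS:
        if first <= slot <= last:
            return node
    raise LookupError(f"slot {slot} is not covered")


def plan_import(scan_batches, copy=False, replace=False):
    """Yield (key, target_node, migrate_flags) for every scanned key."""
    flags = [name for name, on in (("COPY", copy), ("REPLACE", replace)) if on]
    for batch in scan_batches:  # one batch per SCAN call
        for key in batch:
            yield key, owner_of(key_slot(key)), flags


for step in plan_import([[b"hello"], [b"key5"]], replace=True):
    print(step)
```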

[root@slowtech conf]# redis-trib.rb import --from 127.0.0.1:6378 --replace 127.0.0.1:6379
>>> Importing data from 127.0.0.1:6378 to cluster 127.0.0.1:6379
/usr/local/ruby/lib/ruby/gems/...
...
[OK] All 16384 slots covered.
>>> Connecting to the source Redis instance
*** Importing from DB ...
Migrating key5 to 127.0.0.1:6381: OK
