Redis Sentinel reports the slave as down in Swarm mode

Posted by lkaoscv7 on 2021-06-09 in Redis

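I am trying to set up a simple Redis Sentinel demo with Docker and Swarm.
There are two nodes: node1 (the swarm manager) and node2. node1 runs the Redis master and Sentinel, and node2 runs the Redis slave.
Here is my docker compose file (node labels control which node each container is placed on):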

version: "3.3"
services:
        master:
           image: "redis:5.0.7"
           deploy:
                mode: global
                placement:
                        constraints: [node.labels.redismaster == true]
           networks:
                myredisnet:
           command: redis-server /etc/redis.conf
           volumes:
                - "~/redis.conf:/etc/redis.conf"
        salve:
           image: "redis:5.0.7"
           deploy:
                mode: global
                placement:
                        constraints: [node.labels.redisslave1 == true]
           networks:
                myredisnet:
           command: redis-server /etc/redis-slave.conf
           volumes:
                - "~/redis-slave.conf:/etc/redis-slave.conf"
        sentinel:
           image: "redis:5.0.7"
           ports:
                - "26379:26379"
           volumes:
                - "~/sentinel.conf:/usr/local/bin/sentinel.conf"
           deploy:
                mode: global
                placement:
                        constraints: [node.labels.redismaster == true]
           networks:
                myredisnet:
           command: redis-sentinel /usr/local/bin/sentinel.conf
networks:
        myredisnet:
                driver: overlay

My redis.conf and redis-slave.conf files are identical, except that redis-slave.conf also contains slaveof master 6379 (master is the service name from the docker compose file):

bind 0.0.0.0
protected-mode yes
masterauth redispass
requirepass redispass

Here is my sentinel.conf file:

port 26379
logfile "/var/log/sentinel.log"
protected-mode no
dir "/root"
sentinel deny-scripts-reconfig yes
sentinel monitor mymaster master 6379 1
sentinel auth-pass mymaster redispass

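After I run docker stack deploy -c docker-compose.yml redis to deploy these services, everything looks fine, and Redis master-slave replication is set up correctly.
But Sentinel seems to have a problem. When I enter the Sentinel container (docker exec -it) and look at the Sentinel log: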

root@d2fe4dc7ffa4:/data# cat /var/log/sentinel.log 
1:X 23 Feb 2020 11:22:25.114 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1:X 23 Feb 2020 11:22:25.114 # Redis version=5.0.7, bits=64, commit=00000000, modified=0, pid=1, just started
1:X 23 Feb 2020 11:22:25.114 # Configuration loaded
1:X 23 Feb 2020 11:22:25.115 * Running mode=sentinel, port=26379.
1:X 23 Feb 2020 11:22:25.115 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
1:X 23 Feb 2020 11:22:25.116 # Sentinel ID is 1f9c8c8f688f0a9925dad749fea86c196781f6bf
1:X 23 Feb 2020 11:22:25.116 # +monitor master mymaster 10.0.9.2 6379 quorum 1
1:X 23 Feb 2020 11:22:25.118 * +slave slave 10.0.9.7:6379 10.0.9.7 6379 @ mymaster 10.0.9.2 6379
1:X 23 Feb 2020 11:22:55.168 # +sdown slave 10.0.9.7:6379 10.0.9.7 6379 @ mymaster 10.0.9.2 6379

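As you can see, Sentinel considers the slave to be down (+sdown). What confuses me is that the slave IP Sentinel detected is 10.0.9.7. On node2, docker inspect shows that the slave container's IP should be 10.0.9.6: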

on node2:
[root@node02 ~]# docker inspect 4ba57e6fd395
...
"Networks": {
                "redis_myredisnet": {
                    "IPAMConfig": {
                        "IPv4Address": "10.0.9.6"
                    },
                    "Links": null,
                    "Aliases": [
                        "4ba57e6fd395"
                    ],
                    "NetworkID": "ziry6mb6fkz5ido2cg9j86t6a",
                    "EndpointID": "771da42d9d7dc03ecb3892d2c3cdf83be97268625b0ee24f0fa3ffb6c2377b6d",
                    "Gateway": "",
                    "IPAddress": "10.0.9.6",
                    "IPPrefixLen": 24,
                    "IPv6Gateway": "",
                    "GlobalIPv6Address": "",
                    "GlobalIPv6PrefixLen": 0,
                    "MacAddress": "02:42:0a:00:09:06",
                    "DriverOpts": null
                }
            }
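The mismatch between the two addresses can be checked mechanically. The following is a minimal Python sketch, not part of my setup: it assumes the Sentinel event lines have the same layout as the log excerpt above, and the names SLAVE_EVENT and slave_events are made up for illustration. It pulls the slave address out of each event and compares it with the IPAddress reported by docker inspect.

```python
import re

# Layout assumed from the log excerpt above:
#   <event> slave <ip:port> <ip> <port> @ <master-name> <master-ip> <master-port>
SLAVE_EVENT = re.compile(
    r"(?P<event>[+-]\w+) slave \S+ (?P<ip>\d+\.\d+\.\d+\.\d+) (?P<port>\d+) @ (?P<master>\S+)"
)

def slave_events(log_text):
    """Return (event, ip, port) tuples for every slave-related Sentinel event."""
    events = []
    for line in log_text.splitlines():
        match = SLAVE_EVENT.search(line)
        if match:
            events.append((match["event"], match["ip"], int(match["port"])))
    return events

# Two lines copied from the Sentinel log shown earlier.
LOG = """\
1:X 23 Feb 2020 11:22:25.118 * +slave slave 10.0.9.7:6379 10.0.9.7 6379 @ mymaster 10.0.9.2 6379
1:X 23 Feb 2020 11:22:55.168 # +sdown slave 10.0.9.7:6379 10.0.9.7 6379 @ mymaster 10.0.9.2 6379
"""

container_ip = "10.0.9.6"  # IPAddress from docker inspect on node2

for event, ip, port in slave_events(LOG):
    relation = "matches" if ip == container_ip else "differs from"
    print(f"{event}: Sentinel sees {ip}:{port}, which {relation} container IP {container_ip}")
```

Both events carry 10.0.9.7, so the address Sentinel tracks is not the one docker inspect reports for the container.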

When I enter the Redis master container on node1 (docker exec -it) and run redis-cli, auth redispass, and info to check the replication information:


# Replication

role:master
connected_slaves:1
slave0:ip=10.0.9.7,port=6379,state=online,offset=224628,lag=1
master_replid:f4bf3ba64df96919b6e9cd4e0935ace6d31b0ba6
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:224759
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:224759

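As you can see, the master also reports the slave as slave0:ip=10.0.9.7. So I ran a small experiment: I installed telnet with apt-get update; apt-get install telnet and tried telnet 10.0.9.7 6379 from inside my Redis master container: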

root@d2fe4dc7ffa4:/data# telnet 10.0.9.7 6379
Trying 10.0.9.7...
telnet: Unable to connect to remote host: Connection refused

I also tested telnet 10.0.9.6 6379:

root@d2fe4dc7ffa4:/data# telnet 10.0.9.6 6379
Trying 10.0.9.6...
Connected to 10.0.9.6.
Escape character is '^]'.
auth redispass
+OK
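The same reachability test can be repeated without installing telnet. This is a rough sketch that assumes a Python 3 interpreter is available on the overlay network (the redis image does not ship one, so it would have to run from a helper container, which is an assumption here); can_connect is a hypothetical helper name.

```python
import socket

def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False

# Example calls for the two addresses tested above:
#   can_connect("10.0.9.7", 6379)
#   can_connect("10.0.9.6", 6379)
```

This only checks that the TCP port accepts connections; it does not send AUTH or any other Redis command.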

In addition, I ran docker inspect (slave service name), and this is the slave service VIP:

[root@node03 ~]# docker inspect redis_salve
...
 "Endpoint": {
            "Spec": {
                "Mode": "vip"
            },
            "VirtualIPs": [
                {
                    "NetworkID": "ziry6mb6fkz5ido2cg9j86t6a",
                    "Addr": "10.0.9.5/24"
                }
            ]
        }

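So where does the IP 10.0.9.7 come from? It matches neither the container IP (10.0.9.6) nor the service VIP (10.0.9.5). My Sentinel service also seems to be affected: when I pause the Redis master container, Sentinel fails to fail over to the slave.
Also, this is my sentinel.conf file after the Sentinel service has been running: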

[root@node03 ~]# cat sentinel.conf 
port 26379
logfile "/var/log/sentinel.log"
protected-mode no
dir "/root"
sentinel myid 1f9c8c8f688f0a9925dad749fea86c196781f6bf
sentinel deny-scripts-reconfig yes

# Generated by CONFIG REWRITE

sentinel monitor mymaster 10.0.9.2 6379 1
sentinel auth-pass mymaster redispass
sentinel config-epoch mymaster 0
sentinel leader-epoch mymaster 0
sentinel known-replica mymaster 10.0.9.7 6379
sentinel current-epoch 0

Any help would be greatly appreciated!
