How do I prevent three Cassandra nodes from splitting into different rings when I create them on the same network with Docker Compose?

Posted by sg3maiej on 2023-10-18 in Cassandra

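I'm on Xubuntu with all the latest updates applied.
I have set up the Docker repo and installed the latest Docker tool suite.
I have installed docker-compose using apt-get.
I created the following docker-compose.yaml file: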

version: "3.3"
 
networks:
  cassandra-net:
    driver: bridge
    
services:
 
  cassandra-1:
    image: "cassandra:latest"
    container_name: "cassandra-1"
    ports:
      - "7000:7000"
      - "9042:9042"
    networks:
      - "cassandra-net"
    volumes:
      - ./volumes/cassandra-1:/var/lib/cassandra:rw      

  cassandra-2:
    image: "cassandra:latest"
    container_name: "cassandra-2"
    environment:
      - "CASSANDRA_SEEDS=cassandra-1"
    networks:
      - "cassandra-net"
    depends_on:
      - "cassandra-1"
    volumes:
      - ./volumes/cassandra-2:/var/lib/cassandra:rw      
 
  cassandra-3:
    image: "cassandra:latest"
    container_name: "cassandra-3"
    networks:
      - "cassandra-net"
    environment:
      - "CASSANDRA_SEEDS=cassandra-1"
    depends_on:
      - "cassandra-1"
    volumes:
      - ./volumes/cassandra-3:/var/lib/cassandra:rw

When I check the status of the three nodes (cassandra-1, cassandra-2, cassandra-3) with this bash script:

docker exec -it cassandra-1 nodetool status
docker exec -it cassandra-2 nodetool status
docker exec -it cassandra-3 nodetool status

I get the following output:

Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load        Tokens  Owns (effective)  Host ID                               Rack 
UN  172.18.0.3  252.53 KiB  16      100.0%            640dbf8a-13bb-46e5-8b1e-4542ee3352c4  rack1
UN  172.18.0.2  189.34 KiB  16      100.0%            7807001a-1885-41d1-b661-bd6c7e0db239  rack1

Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load        Tokens  Owns (effective)  Host ID                               Rack 
UN  172.18.0.4  178.24 KiB  16      100.0%            fe8fb5fe-7342-4eeb-92bb-01a755ecd8ad  rack1
UN  172.18.0.2  263.65 KiB  16      100.0%            7807001a-1885-41d1-b661-bd6c7e0db239  rack1

Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load        Tokens  Owns (effective)  Host ID                               Rack 
UN  172.18.0.3  178.2 KiB   16      100.0%            640dbf8a-13bb-46e5-8b1e-4542ee3352c4  rack1
UN  172.18.0.2  263.65 KiB  16      100.0%            7807001a-1885-41d1-b661-bd6c7e0db239  rack1

I expected every node to see all three addresses (172.18.0.2, 172.18.0.3, 172.18.0.4), but each node only "sees" one other node instead of the other two.

Answer #1, by v6ylcynt

I ran into a similar problem when starting nodes in a local environment with Docker Compose.
I solved it with a combination of the following:

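  • Controlling the startup order of the nodes, so that Compose first makes sure the seed node cassandra-1 is up and healthy before starting cassandra-2, then makes sure cassandra-2 is up and healthy before starting cassandra-3, and so on. Basically, this prevents all the nodes from starting at the same time, especially the nodes after the seed node. When Compose starts the nodes simultaneously, it can lead to errors such as token range conflicts that keep some nodes from joining the cluster.
  • Using a snitch configuration closer to your production environment, which is usually what you want when you need multiple nodes, clusters, or datacenters. For example, you can use GossipingPropertyFileSnitch, the same snitch type the Cassandra tutorial uses in "Initializing a multiple node cluster (multiple datacenters)".
  • Explicitly setting the CASSANDRA_CLUSTER_NAME and CASSANDRA_DC environment variables, which set cluster_name in the cassandra.yaml config and the dc option in the cassandra-rackdc.properties file, respectively. This lets you explicitly tell the nodes to join the same cluster and datacenter. According to the image documentation, CASSANDRA_DC has no effect unless the snitch is GossipingPropertyFileSnitch.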
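One way to confirm that the environment variables actually reached the node's configuration is to read the value back out of the properties file. The sketch below is only an illustration: `prop_value` is a hypothetical helper name, and the file location `/etc/cassandra/cassandra-rackdc.properties` is an assumption about the official image's layout that you should verify in your own container.

```shell
#!/bin/sh
# prop_value KEY: reads a Java-style .properties file on stdin and prints
# the value of KEY with surrounding whitespace trimmed.
# (Hypothetical helper, not part of Cassandra or Docker.)
prop_value() {
  awk -v key="$1" '
    /^[ \t]*#/ { next }                 # skip comment lines
    {
      eq = index($0, "=")
      if (eq == 0) next                 # skip lines without key=value
      k = substr($0, 1, eq - 1)
      v = substr($0, eq + 1)
      gsub(/^[ \t]+|[ \t]+$/, "", k)    # trim the key
      gsub(/^[ \t]+|[ \t]+$/, "", v)    # trim the value
      if (k == key) { print v; exit }
    }
  '
}
```

Assuming the path above is right for your image, it could be used as `docker exec cassandra-1 cat /etc/cassandra/cassandra-rackdc.properties | prop_value dc`, which should print the datacenter name you passed in CASSANDRA_DC.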

With all that, here is a modified version of the Compose file:

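version: "3.3"  # ignored by Docker Compose v2; the legacy docker-compose v1 rejects the depends_on condition form under 3.x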

networks:
  cassandra-net:
    driver: bridge

services:

  cassandra-1:
    image: "cassandra:latest"  # cassandra:4.1.3
    container_name: "cassandra-1"
    ports:
      - 7000:7000
      - 9042:9042
    networks:
      - cassandra-net
    environment:
      - CASSANDRA_START_RPC=true       # default
      - CASSANDRA_RPC_ADDRESS=0.0.0.0  # default
      - CASSANDRA_LISTEN_ADDRESS=auto  # default, use IP addr of container # = CASSANDRA_BROADCAST_ADDRESS
      - CASSANDRA_CLUSTER_NAME=my-cluster
      - CASSANDRA_ENDPOINT_SNITCH=GossipingPropertyFileSnitch
      - CASSANDRA_DC=my-datacenter-1
    volumes:
      - cassandra-node-1:/var/lib/cassandra:rw
    restart:
      on-failure
    healthcheck:
      test: ["CMD-SHELL", "nodetool status"]
      interval: 2m
      start_period: 2m
      timeout: 10s
      retries: 3

  cassandra-2:
    image: "cassandra:latest"  # cassandra:4.1.3
    container_name: "cassandra-2"
    ports:
      - 9043:9042
    networks:
      - cassandra-net
    environment:
      - CASSANDRA_START_RPC=true       # default
      - CASSANDRA_RPC_ADDRESS=0.0.0.0  # default
      - CASSANDRA_LISTEN_ADDRESS=auto  # default, use IP addr of container # = CASSANDRA_BROADCAST_ADDRESS
      - CASSANDRA_CLUSTER_NAME=my-cluster
      - CASSANDRA_ENDPOINT_SNITCH=GossipingPropertyFileSnitch
      - CASSANDRA_DC=my-datacenter-1
      - CASSANDRA_SEEDS=cassandra-1
    depends_on:
      cassandra-1:
        condition: service_healthy
    volumes:
      - cassandra-node-2:/var/lib/cassandra:rw
    restart:
      on-failure
    healthcheck:
      test: ["CMD-SHELL", "nodetool status"]
      interval: 2m
      start_period: 2m
      timeout: 10s
      retries: 3

  cassandra-3:
    image: "cassandra:latest"  # cassandra:4.1.3
    container_name: "cassandra-3"
    ports:
      - 9044:9042
    networks:
      - cassandra-net
    environment:
      - CASSANDRA_START_RPC=true       # default
      - CASSANDRA_RPC_ADDRESS=0.0.0.0  # default
      - CASSANDRA_LISTEN_ADDRESS=auto  # default, use IP addr of container # = CASSANDRA_BROADCAST_ADDRESS
      - CASSANDRA_CLUSTER_NAME=my-cluster
      - CASSANDRA_ENDPOINT_SNITCH=GossipingPropertyFileSnitch
      - CASSANDRA_DC=my-datacenter-1
      - CASSANDRA_SEEDS=cassandra-1
    depends_on:
      cassandra-2:
        condition: service_healthy
    volumes:
      - cassandra-node-3:/var/lib/cassandra:rw
    restart:
      on-failure
    healthcheck:
      test: ["CMD-SHELL", "nodetool status"]
      interval: 2m
      start_period: 2m
      timeout: 10s
      retries: 3

volumes:
  cassandra-node-1:
  cassandra-node-2:
  cassandra-node-3:

The key parts here are the healthcheck:

healthcheck:
      test: ["CMD-SHELL", "nodetool status"]
      interval: 2m
      start_period: 2m
      timeout: 10s
      retries: 3

...and the updated depends_on on each node:

depends_on:
      cassandra-2:
        condition: service_healthy

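The modified Compose file starts cassandra-3 only when cassandra-2 is healthy, and starts cassandra-2 only when cassandra-1 is healthy. In this Compose file:

  • nodetool status is first called after 2 minutes (to give the node time to start up and bootstrap)
  • If it responds in under 10 seconds with exit code 0, the node is considered healthy
  • The check is repeated every 2m, and the node is marked unhealthy after 3 consecutive failures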

If you poll docker container ls, you will see something like this:

CONTAINER ID   IMAGE              ...   STATUS                                  PORTS       NAMES
bce16c1b0de4   cassandra:latest   ...   Up About a minute (health: starting)    ...         cassandra-2
697fb8559c3c   cassandra:latest   ...   Up 3 minutes (healthy)                  ...         cassandra-1

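...while the nodes start up one by one. In the example above, cassandra-3 is waiting for cassandra-2 to become "(healthy)" before it starts, which is why you don't see it yet.
Using nodetool status is not the best health check, but it at least waits for the node to finish bootstrapping. You can improve it by parsing the output and making sure the node is listed as UN. The 2m interval/period is also arbitrary; set whatever suits your system based on your own tests.
There are also some extra env variables in there: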
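A stricter check along those lines could look like the sketch below. It is only an illustration: `node_is_un` is a hypothetical helper name, and it assumes the column layout shown in the nodetool status output in this post (state in the first column, address in the second).

```shell
#!/bin/sh
# node_is_un ADDRESS: reads `nodetool status` output on stdin and succeeds
# only if ADDRESS appears on a row whose state column is UN (Up/Normal).
# (Hypothetical helper; assumes state is column 1 and address is column 2.)
node_is_un() {
  awk -v addr="$1" '
    $1 == "UN" && $2 == addr { found = 1 }
    END { exit !found }                 # exit 0 only when a matching row was seen
  '
}
```

Inside a container this might be called as `nodetool status | node_is_un "$(hostname -i)"`, on the assumption that `hostname -i` prints the same address the node gossips under. Unlike a bare nodetool status, it fails while the node is still joining (UJ) or is reported down (DN).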

environment:
      - CASSANDRA_START_RPC=true       # default
      - CASSANDRA_RPC_ADDRESS=0.0.0.0  # default
      - CASSANDRA_LISTEN_ADDRESS=auto  # default, use IP addr of container # = CASSANDRA_BROADCAST_ADDRESS

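...which are probably not needed, since they are already the defaults of the cassandra Docker image (see the "Configuring Cassandra" section on the Docker Hub page). Basically, they explicitly set the container's IP address as the listen and broadcast address. I'm just noting them here in case the defaults change.
Finally, if you run all the nodes on the same machine, as we do in our environment, you need to publish a different host port for each node: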

cassandra-1:
    ...
    ports:
      - 7000:7000
      - 9042:9042

  cassandra-2:
    ...
    ports:
      - 9043:9042

  cassandra-3:
    ...
    ports:
      - 9044:9042

...otherwise, the containers might not start properly.
If all goes well:

$ docker container ls 
CONTAINER ID   IMAGE              COMMAND                  CREATED         STATUS                                 PORTS                                                                          NAMES
4f9f7459f8d5   cassandra:latest   "docker-entrypoint.s…"   5 minutes ago   Up About a minute (health: starting)   7000-7001/tcp, 7199/tcp, 9160/tcp, 0.0.0.0:9044->9042/tcp                      cassandra-3
05225ba91e5d   cassandra:latest   "docker-entrypoint.s…"   5 minutes ago   Up 3 minutes (healthy)                 7000-7001/tcp, 7199/tcp, 9160/tcp, 0.0.0.0:9043->9042/tcp                      cassandra-2
ca2882224274   cassandra:latest   "docker-entrypoint.s…"   5 minutes ago   Up 5 minutes (healthy)                 7001/tcp, 0.0.0.0:7000->7000/tcp, 7199/tcp, 0.0.0.0:9042->9042/tcp, 9160/tcp   cassandra-1
$ docker exec cassandra-1 nodetool status
Datacenter: my-datacenter-1
===========================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load        Tokens  Owns (effective)  Host ID                               Rack 
UN  172.27.0.4  70.22 KiB   16      76.0%             5a4908f1-6e6f-42b1-88f2-8d5c6290b361  rack1
UN  172.27.0.3  75.19 KiB   16      59.3%             7060719b-d1db-4177-a2c3-1897320e6e33  rack1
UN  172.27.0.2  109.41 KiB  16      64.7%             94345229-fd00-424d-b16c-e1556fae7849  rack1
$ docker exec cassandra-2 nodetool status
Datacenter: my-datacenter-1
===========================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load        Tokens  Owns (effective)  Host ID                               Rack 
UN  172.27.0.4  70.22 KiB   16      76.0%             5a4908f1-6e6f-42b1-88f2-8d5c6290b361  rack1
UN  172.27.0.3  75.19 KiB   16      59.3%             7060719b-d1db-4177-a2c3-1897320e6e33  rack1
UN  172.27.0.2  109.41 KiB  16      64.7%             94345229-fd00-424d-b16c-e1556fae7849  rack1
$ docker exec cassandra-3 nodetool status
Datacenter: my-datacenter-1
===========================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load        Tokens  Owns (effective)  Host ID                               Rack 
UN  172.27.0.4  70.22 KiB   16      76.0%             5a4908f1-6e6f-42b1-88f2-8d5c6290b361  rack1
UN  172.27.0.3  75.19 KiB   16      59.3%             7060719b-d1db-4177-a2c3-1897320e6e33  rack1
UN  172.27.0.2  109.41 KiB  16      64.7%             94345229-fd00-424d-b16c-e1556fae7849  rack1
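
Comparing the three listings by eye works for three nodes, but the same check can be scripted. The following is a minimal sketch, assuming the container names from the Compose file above; `count_un` and `ring_complete` are hypothetical helper names.

```shell
#!/bin/sh
# count_un: counts the rows of `nodetool status` output (stdin) whose
# state column is UN (Up/Normal).
count_un() {
  awk '$1 == "UN" { n++ } END { print n + 0 }'
}

# ring_complete EXPECTED NODE...: succeeds only if every listed container
# reports exactly EXPECTED nodes as UN, i.e. all nodes see the same full ring.
ring_complete() {
  expected=$1
  shift
  for node in "$@"; do
    seen=$(docker exec "$node" nodetool status | count_un)
    if [ "$seen" -ne "$expected" ]; then
      echo "$node sees $seen of $expected nodes" >&2
      return 1
    fi
  done
}
```

For the cluster in this answer that would be `ring_complete 3 cassandra-1 cassandra-2 cassandra-3`. Run against the setup from the question, where each node lists only two rows, it would fail on the first node.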

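The main drawback is that it takes a long time to start the nodes. With healthcheck.interval set to 2m in the sample Compose file, all 3 nodes take about 5 minutes to come up properly.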
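
Because startup takes several minutes, a script that depends on the cluster may want to block until Docker reports each container as healthy. Here is a rough sketch of such a wait loop; `get_health` and `wait_healthy` are hypothetical helper names, and the default retry count and delay are arbitrary.

```shell
#!/bin/sh
# get_health NAME: prints the health status Docker reports for a container
# (starting, healthy, or unhealthy). Requires a healthcheck on the container.
get_health() {
  docker inspect --format '{{.State.Health.Status}}' "$1"
}

# wait_healthy NAME [TRIES] [DELAY]: polls until NAME is healthy.
# Fails after TRIES attempts, sleeping DELAY seconds between attempts.
wait_healthy() {
  name=$1
  tries=${2:-30}
  delay=${3:-10}
  i=0
  while [ "$i" -lt "$tries" ]; do
    # A container that does not exist yet counts as "not healthy".
    if [ "$(get_health "$name" 2>/dev/null)" = "healthy" ]; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}
```

A typical use would be `for n in cassandra-1 cassandra-2 cassandra-3; do wait_healthy "$n" || exit 1; done` after bringing the stack up. Treating a missing container as unhealthy matters here because cassandra-3 is not started until cassandra-2 is healthy.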
