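I'm having a problem with a Cassandra cluster that spans several datacenters. Each datacenter has 3 nodes, and 2 nodes per datacenter act as seeds.
I have a keyspace X with replication factor 3, so it has 3 copies in datacenter DC1 and 3 copies in datacenter DC2 ( KEYSPACE X WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': '3', 'DC2': '3'} AND durable_writes = true; )
Now, what I do (and maybe I'm missing something here) is cqlsh into each node of datacenter DC2 (say node2A, node2B and node2C) and run the following: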
cqlsh node2N
CONSISTENCY ALL
SELECT * FROM x.table;
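With consistency set to ALL, I understand that every replica has to respond: 3 in DC1 and 3 in DC2, 6 responses in total. But that is not what happens. I get 3 different results, one per node: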
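node2A: the query fails and returns Cannot achieve consistency level ALL info: {'required_replicas': 6, 'alive_replicas': 5, 'consistency': ALL}
node2B: the query succeeds and returns the table data
node2C: the query takes 1-2 minutes and then returns Coordinator node timed out waiting for replica nodes' responses. Operation timed out - received only 5 responses. info: {'received_responses': 5, 'required_responses': 6, 'consistency': ALL}
I'm running these queries in cqlsh because one of our applications behaves erratically when querying Cassandra (for example, it reports that there are not enough replicas for QUORUM), and I suspect a communication problem between the nodes. Either gossip is telling different things to different nodes, or something along those lines. Connectivity works from every node to every other node (we can cqlsh, ssh and so on).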
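The 6 in 'required_replicas' follows directly from the replication settings of the keyspace. As a rough sketch of that arithmetic (the function name required_replicas is made up for illustration and is not part of any Cassandra API; only a few consistency levels are covered):

```python
def required_replicas(replication, consistency, local_dc=None):
    """Number of replica responses a coordinator needs for one request.

    replication: datacenter name -> replication factor, as declared in the
                 keyspace's NetworkTopologyStrategy settings.
    consistency: one of "ONE", "LOCAL_QUORUM", "QUORUM", "ALL".
    local_dc:    the coordinator's datacenter (only used for LOCAL_QUORUM).
    """
    total = sum(replication.values())
    if consistency == "ONE":
        return 1
    if consistency == "LOCAL_QUORUM":
        # Majority of the replicas in the coordinator's own datacenter.
        return replication[local_dc] // 2 + 1
    if consistency == "QUORUM":
        # Majority of all replicas across every datacenter.
        return total // 2 + 1
    if consistency == "ALL":
        # Every replica in every datacenter must answer.
        return total
    raise ValueError(f"consistency level not covered by this sketch: {consistency}")


# Keyspace X from the question: 3 replicas in DC1 and 3 in DC2.
keyspace_x = {"DC1": 3, "DC2": 3}
```

With these settings ALL needs 6 responses, QUORUM needs 4 and LOCAL_QUORUM needs 2, so a single replica that a coordinator considers down is enough to fail ALL while leaving QUORUM intact.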
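Is my theory right? Is our cluster in an inconsistent state? If so, how can I debug these failures? Is there a way to find out which node is considered dead or is not responding, so that I can take a closer look at its communication? I tried "tracing on", but it only works for successful queries, so I only get a trace on node2B (by the way, the behavior on a given node is not always the same; it seems to be random).
If not, is my cqlsh test even valid? Or am I missing a key piece of the Cassandra puzzle?
Thanks a lot, this is driving me crazy...
EDIT: as requested, here is the output of nodetool describecluster. I ran it on all 3 nodes of DC2: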
node2A:
Cluster Information:
    Name: Cassandra Cluster
    Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch
    Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
    Schema versions:
        19ada8a5-4688-3fa8-9479-e612388f67ee: [node2A, node2B, node1A, node1B, node1C, other IPs from other nodes (from other datacenters and keyspaces)]

node2B:
Cluster Information:
    Name: Cassandra Cluster
    Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch
    Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
    Schema versions:
        19ada8a5-4688-3fa8-9479-e612388f67ee: [node2A, node2B, node2C, node1A, node1B, node1C, other IPs from other nodes (from other datacenters and keyspaces)]
        UNREACHABLE: [couple of IPs from other datacenter/keyspaces]

node2C:
Cluster Information:
    Name: Cassandra Cluster
    Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch
    Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
    Schema versions:
        19ada8a5-4688-3fa8-9479-e612388f67ee: [node2B, node2C, node1A, node1B, node1C, other IPs from other nodes (from other datacenters and keyspaces)]
        UNREACHABLE: [node2A and other IPs]

Note that node2C is missing from node2A's output, all 3 nodes appear in node2B's output, and node2C reports node2A as unreachable...
This looks very wrong to me, somehow...
I have just run "nodetool status keyspacex", and these are the results:
node2A:
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address  Load      Tokens  Owns (effective)  Host ID  Rack
UN  node2A   67,78 MB  256     100,0%            -        RAC1
UN  node2B   67,18 MB  256     100,0%            -        RAC1
?N  node2C   67,11 MB  256     100,0%            -        RAC1

node2B:
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address  Load      Tokens  Owns (effective)  Host ID  Rack
UN  node2A   67,78 MB  256     100,0%            -        RAC1
UN  node2B   67,18 MB  256     100,0%            -        RAC1
UN  node2C   67,11 MB  256     100,0%            -        RAC1

node2C:
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address  Load      Tokens  Owns (effective)  Host ID  Rack
UN  node2A   67,78 MB  256     100,0%            -        RAC1
UN  node2B   67,18 MB  256     100,0%            -        RAC1
UN  node2C   67,11 MB  256     100,0%            -        RAC1

Now, why doesn't node2A know the status of node2C (it shows up as "?", and it is missing from node2A's describecluster output)? And why does node2C report node2A as unreachable in describecluster while, according to status, it knows that node2A is up?
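To make the asymmetry between the three describecluster outputs easier to see, the membership lists can be compared mechanically. This is only a sketch: the node names are the placeholders used in this question, the sets were typed in by hand from the outputs above (nothing here parses nodetool output), and missing_peers is a hypothetical helper name.

```python
def missing_peers(views):
    """Report, per observer, the nodes it does not list as reachable.

    views: observer name -> set of node names that observer lists under
           its schema version in `nodetool describecluster`.
    Returns a dict observer -> sorted list of nodes missing from its view;
    observers with a complete view are left out.
    """
    # Every node mentioned anywhere, as an observer or as a listed peer.
    all_nodes = set(views)
    for seen in views.values():
        all_nodes |= seen

    report = {}
    for observer, seen in views.items():
        missing = all_nodes - seen
        if missing:
            report[observer] = sorted(missing)
    return report


# Views copied by hand from the describecluster outputs in this question
# (nodes from other datacenters/keyspaces omitted).
views = {
    "node2A": {"node2A", "node2B", "node1A", "node1B", "node1C"},
    "node2B": {"node2A", "node2B", "node2C", "node1A", "node1B", "node1C"},
    "node2C": {"node2B", "node2C", "node1A", "node1B", "node1C"},
}

print(missing_peers(views))
```

For the data above, node2A is missing node2C and node2C is missing node2A, while node2B has a complete view. That matches the three query outcomes: the two coordinators with an incomplete view fail at ALL, and the one with a complete view succeeds.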

2 answers


1wnzp6jl1#

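First, to check whether every node is reachable, run nodetool describecluster on each node and analyze the output.
Communication between nodes happens over port 7000, through gossip and internode messaging, not through ssh or cqlsh. Being able to ssh or cqlsh from one node to another therefore does not prove that the nodes can talk to each other.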
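A quick way to test that port from each node is a plain TCP connect. The sketch below assumes the default unencrypted storage port 7000 mentioned above (a cluster with a different storage_port, or with internode encryption, would use another port), and can_connect is just an illustrative helper name. A successful connect only shows that the port is open; it does not prove that gossip itself is healthy.

```python
import socket


def can_connect(host, port=7000, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened.

    This checks network reachability of the internode port only;
    it says nothing about whether gossip is actually working.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution errors.
        return False


# Example usage, to be run on each node in turn (hostnames are the
# placeholders from the question):
#
#   for peer in ["node2A", "node2B", "node2C"]:
#       print(peer, can_connect(peer))
```

Running this from node2A against node2C, and from node2C against node2A, would show whether the missing entries in describecluster come from a blocked port or from something inside Cassandra.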
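About the 3 results above:
1. When you ran the query, one of the replicas was probably not reachable from that coordinator at that moment, so consistency level ALL could not be achieved.
2. At that moment all replicas were alive, the consistency level was achieved, and you got the data.
3. In this case the coordinator node did not receive data from all replicas in time, which raised the timeout exception. The timeout can be set in cassandra.yaml.
Hope this answers your question.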

2021-06-15

qhhrdooz2#

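This turned out to be related to an internal Cassandra issue. The gossiper was shutting down because of some corrupted hint files, while the rest of the Cassandra process stayed up and running. As a result, the node could see everyone else, but the other nodes reported it as down because its gossiper had stopped (port 9160 was in fact closed after the exception).
(Screenshot of the exception)
The actual Cassandra issue is https://issues.apache.org/jira/browse/cassandra-12728
Hope it helps.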

2021-06-14

Consistency: Cassandra nodes in the same datacenter give different query results/errors
