Cassandra: suddenly high write latency for one table in metrics

Posted by lqfhib0f on 2021-06-14 in Cassandra

We are suddenly seeing very high write latency in the metrics for one table (devices).

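It is a small table with fewer than 100 entries, in which we regularly update fields.
The cluster has 3 nodes with RF=3. Each node has 8 GB of RAM. We are running Cassandra 3.11.4 in Docker.
There is nothing unusual in the logs, and the application is running fine.
nodetool tablehistograms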

Percentile  SSTables     Write Latency      Read Latency    Partition Size        Cell Count
                              (micros)          (micros)           (bytes)                  
50%             0.00            263.21              0.00               258                17
75%             0.00           1131.75              0.00               372                20
95%             0.00          12108.97              0.00               642                29
98%             0.00          25109.16              0.00               642                35
99%             0.00          43388.63              0.00               642                35
Min             0.00              8.24              0.00                51                 0
Max             0.00         155469.30              0.00               770                35
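
To make the histogram easier to read, here is a minimal Python sketch that converts the write-latency column from microseconds to milliseconds and compares the tail with the median. The values are copied from the output above; the names `WRITE_LATENCY_MICROS`, `micros_to_millis` and `tail_ratio` are made up for illustration and are not part of any Cassandra tooling.

```python
# Write-latency column copied from the nodetool tablehistograms output above.
# Keys are the percentile labels, values are microseconds.
WRITE_LATENCY_MICROS = {
    "50%": 263.21,
    "75%": 1131.75,
    "95%": 12108.97,
    "98%": 25109.16,
    "99%": 43388.63,
    "Max": 155469.30,
}


def micros_to_millis(micros: float) -> float:
    """Convert microseconds to milliseconds."""
    return micros / 1000.0


def tail_ratio(latencies: dict, tail: str = "99%", base: str = "50%") -> float:
    """How many times slower the tail percentile is than the base percentile."""
    return latencies[tail] / latencies[base]


# Print each percentile in milliseconds, then the p99/p50 ratio.
for label, micros in WRITE_LATENCY_MICROS.items():
    print(f"{label:>4}: {micros_to_millis(micros):8.2f} ms")
print(f"p99 is about {tail_ratio(WRITE_LATENCY_MICROS):.0f}x the median")
```

Read this way, the median write takes well under a millisecond while the 99th percentile is around 43 ms, so the problem is in the tail rather than in every write.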

nodetool status

Datacenter: datacenter-prod
===========================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address      Load       Tokens       Owns (effective)  Host ID                               Rack
UN  10.164.0.23  2.62 GiB   256          100.0%            e7e2a38a-d4f3-4758-a345-73fcffe26035  rack1
UN  10.164.0.24  2.61 GiB   256          100.0%            0c18b8e4-5ca2-4fb5-9e8c-663b74909fbb  rack1
UN  10.164.0.58  2.62 GiB   256          100.0%            547c0746-72a8-4fec-812a-8b926d2426ae  rack1

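What is going on? Are the statistics lying, or is there a real problem?
EDIT: I was able to narrow the problem down to one node. The exporter on node2 shows: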

cassandra_stats{cluster="Prod Cluster 2",datacenter="datacenter-prod",keyspace="iot_data",table="devices",name="org:apache:cassandra:metrics:table:iot_data:devices:writelatency:99thpercentile",} 268650.95

whereas node1 and node3 look like this:

cassandra_stats{cluster="Prod Cluster 2",datacenter="datacenter-prod",keyspace="iot_data",table="devices",name="org:apache:cassandra:metrics:table:iot_data:devices:writelatency:99thpercentile",} 10090.808

But I still have no idea what is causing this on node2. It is not under load, and its memory usage looks fine. Any ideas?
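
For reference, this is roughly how the per-node comparison can be scripted. It is only a sketch: it parses the value at the end of each exporter line and flags nodes whose 99th percentile is far above the median across nodes. The helper names (`sample_value`, `outlier_nodes`) and the threshold factor of 5 are arbitrary choices for illustration, and it assumes node1 and node3 both report roughly the value shown above.

```python
import re
import statistics

# Label part shared by the exporter lines quoted above.
LABELS = (
    'cassandra_stats{cluster="Prod Cluster 2",datacenter="datacenter-prod",'
    'keyspace="iot_data",table="devices",'
    'name="org:apache:cassandra:metrics:table:iot_data:devices:'
    'writelatency:99thpercentile",}'
)

# Assumption: node1 and node3 both show approximately the same sample.
EXPORTER_LINES = {
    "node1": LABELS + " 10090.808",
    "node2": LABELS + " 268650.95",
    "node3": LABELS + " 10090.808",
}

# The sample value is whatever follows the closing brace of the label set.
SAMPLE_RE = re.compile(r"\}\s+(\S+)\s*$")


def sample_value(line: str) -> float:
    """Extract the numeric sample from one exporter line."""
    match = SAMPLE_RE.search(line)
    if match is None:
        raise ValueError(f"no sample value found in: {line!r}")
    return float(match.group(1))


def outlier_nodes(lines_by_node: dict, factor: float = 5.0) -> list:
    """Return nodes whose value exceeds factor times the median of all nodes."""
    values = {node: sample_value(line) for node, line in lines_by_node.items()}
    median = statistics.median(values.values())
    return [node for node, value in values.items() if value > factor * median]


print(outlier_nodes(EXPORTER_LINES))
```

With the numbers above, node2's 99th percentile is roughly 26 times that of the other two nodes, so it is the only one flagged.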

Source: https://stackoverflow.com/questions/57579366/suddenly-high-write-latency-for-one-table-in-metrics