Cassandra - avoiding nodetool cleanup

ctehm74n · posted 2022-12-12 in Cassandra

If we have added new nodes to a C* ring, do we need to run "nodetool cleanup" to get rid of the data that has now been assigned elsewhere? Or will this happen anyway during normal compactions? During normal compactions, does C* remove data that no longer belongs on this node, or do we need to run "nodetool cleanup" for that? Asking because "cleanup" takes forever and crashes the node before finishing.
If we do need to run "nodetool cleanup", is there a way to find out which nodes now hold data they should no longer own? (i.e., data that now belongs on the new nodes but is still present on the old nodes because nothing removed it. This is the data that "nodetool cleanup" would remove.) We have RF=3 and two data centers, each of which has a complete copy of the data. I assume we need to run cleanup on all nodes in the data center where we added nodes, because each row on a new node used to live on another node (as the primary replica), plus as two copies (replicas) on two other nodes.
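As a rough way to sanity-check where a given row currently lives, one option is nodetool getendpoints, which prints the replicas responsible for a partition key under the current ring, together with nodetool status for per-node ownership. The keyspace, table, and key below are placeholders, not from the question:

    # list the nodes that currently own this partition key
    nodetool getendpoints my_keyspace my_table some_partition_key

    # effective token ownership per node in each data center
    nodetool status my_keyspace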

z31licg01#

If you are using Apache Cassandra 1.2 or later, cleanup checks the metadata on the SSTable files, so it only does work when it actually needs to. That means you can run it on every node, and only the nodes that hold extra data will do anything. The data is not removed during normal compactions; you have to run cleanup to remove it.
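Since cleanup is effectively a no-op on nodes with no out-of-range data, one low-risk approach is to run it across the whole data center one node at a time. A minimal sketch, assuming SSH access and a hypothetical host list; on newer Cassandra versions the -j flag limits how many cleanup jobs run in parallel, which can reduce the load that seems to be crashing your nodes:

    # run cleanup sequentially on each node in the DC where nodes were added
    for host in node1 node2 node3; do
        ssh "$host" nodetool cleanup -j 1
    done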

b4lqfgs42#

I have found it useful to compare how much space each node's data folder takes up (for me that is /var/lib/cassandra/data). Some things, such as snapshots, can differ between nodes, but when the new nodes use noticeably less disk space than the old ones, it is likely because the old ones were never cleaned up after the new nodes were added. While you are at it, you can also check the largest .db files there and verify that your storage has enough free space for another file of the same size. Cleanup appears to copy the data of a .db file into a new file, minus the data that now lives on other nodes, so you may need that extra headroom while it runs.
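A rough way to do those checks from the shell, assuming the default data directory mentioned above:

    # total size of the Cassandra data directory on this node
    du -sh /var/lib/cassandra/data

    # the ten largest .db files (SSTable components)
    find /var/lib/cassandra/data -name '*.db' -exec du -h {} + | sort -rh | head -n 10

    # free space on the volume that holds the data directory
    df -h /var/lib/cassandra/data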
