We are using Cassandra in our production environment. We have a single cross-colo cluster of 24 nodes: 12 nodes in the PHX colo and 12 nodes in the SLC colo. We use a replication factor of 4, which means there are 2 copies in each data center.
Below is how our production DBAs created the keyspace and column family:
with placement_strategy = 'org.apache.cassandra.locator.NetworkTopologyStrategy' and strategy_options = {slc:2,phx:2};

create column family PROFILE_USER
with key_validation_class = 'UTF8Type'
and comparator = 'UTF8Type'
and default_validation_class = 'UTF8Type'
and gc_grace = 86400;
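With strategy_options={slc:2,phx:2}, the total replication factor is 2 + 2 = 4, and each data center spreads its two copies over 12 nodes. If vnodes spread token ownership evenly, each node holds roughly 2/12 = 1/6 of the data. That is about 8.3 million of the 50 million records mentioned below. The small Java sketch that follows only does this arithmetic. `ReplicaMath` and its methods are illustrative names, not part of any Cassandra API.

```java
import java.util.Map;

/**
 * Back-of-the-envelope replica math for a NetworkTopologyStrategy keyspace.
 * Assumes tokens (vnodes) are spread evenly, so every node in a data center
 * owns roughly the same share of that data center's replicas.
 */
public class ReplicaMath {

    /** Fraction of all rows stored on one node, given the DC's replica count and node count. */
    static double fractionPerNode(int replicasInDc, int nodesInDc) {
        return (double) replicasInDc / nodesInDc;
    }

    /** The total replication factor is the sum of the per-DC strategy_options. */
    static int totalReplicationFactor(Map<String, Integer> strategyOptions) {
        int total = 0;
        for (int rf : strategyOptions.values()) {
            total += rf;
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, Integer> options = Map.of("slc", 2, "phx", 2);
        long records = 50_000_000L; // record count from the question
        int nodesPerDc = 12;

        double share = fractionPerNode(options.get("phx"), nodesPerDc);
        System.out.println("Total RF: " + totalReplicationFactor(options));
        System.out.printf("Share per node: %.4f (~%d rows)%n",
                share, Math.round(records * share));
    }
}
```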

We are running Cassandra 1.2.2 with org.apache.cassandra.dht.Murmur3Partitioner. KeyCaching, SizeTieredCompactionStrategy and Virtual Nodes are also enabled. The Cassandra nodes are deployed on HDDs instead of SSDs. I am using the Astyanax client to read data from Cassandra at consistency level ONE. I inserted 50 million records using the Astyanax client. After compaction finished, I started reading against the Cassandra production database. Below is the code I use to configure the Astyanax connection:

/**
 * Creating Cassandra connection using Astyanax client
 *
 */
private CassandraAstyanaxConnection() {
    context = new AstyanaxContext.Builder()
    .forCluster(ModelConstants.CLUSTER)
    .forKeyspace(ModelConstants.KEYSPACE)
    .withConnectionPoolConfiguration(new ConnectionPoolConfigurationImpl("MyConnectionPool")
        .setPort(9160)
        .setMaxConnsPerHost(100)
        .setSeeds("cdb03.vip.phx.host.com:9160,cdb04.vip.phx.host.com:9160")
        .setLocalDatacenter("phx") // filter out nodes based on data center
    )
    .withAstyanaxConfiguration(new AstyanaxConfigurationImpl()
        .setCqlVersion("3.0.0")
        .setTargetCassandraVersion("1.2")
        .setConnectionPoolType(ConnectionPoolType.ROUND_ROBIN)
        .setDiscoveryType(NodeDiscoveryType.RING_DESCRIBE))
    .withConnectionPoolMonitor(new CountingConnectionPoolMonitor())
    .buildKeyspace(ThriftFamilyFactory.getInstance());
    context.start();
    keyspace = context.getEntity();
    emp_cf = ColumnFamily.newColumnFamily(
        ModelConstants.COLUMN_FAMILY, 
        StringSerializer.get(), 
        StringSerializer.get());
}

Most of the time I get a 95th percentile read latency of around 8/9/10 ms.
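I am trying to find out whether there is any way to get better read performance out of Cassandra. I expected the 95th percentile to be 1 or 2 ms. After running some tests against the production cluster, that assumption turned out to be wrong. The average ping time from the machine running the client program to the Cassandra production nodes is 0.3 ms.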
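Reported percentiles depend on how they are computed. One common definition is the nearest-rank method: sort the per-request latencies and take the value at rank ⌈p/100 × n⌉. This is an illustration, not necessarily what the question's test program does. A 95th percentile therefore describes the slowest 5% of reads. On HDD-backed nodes, that is where reads that miss the OS page cache and go to disk would tend to show up.

```java
import java.util.Arrays;

/** Nearest-rank percentile over recorded per-request latencies (in ms). */
public class LatencyPercentile {

    static long percentile(long[] latenciesMs, double p) {
        if (latenciesMs.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        long[] sorted = latenciesMs.clone();
        Arrays.sort(sorted);
        // Nearest rank: smallest value such that at least p% of samples are <= it.
        // Multiply before dividing so integer p gives an exact rank.
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        long[] samples = {1, 2, 2, 3, 3, 3, 4, 5, 8, 10};
        System.out.println("p50 = " + percentile(samples, 50) + " ms");
        System.out.println("p95 = " + percentile(samples, 95) + " ms");
    }
}
```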
Below are the results I got.

Read latency (95th percentile) | Threads | Duration (minutes) | Throughput (requests/s) | IDs requested | Columns requested
8 ms                           | 10      | 30                 | 1584                    | 2851481       | 52764072
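The row is internally consistent. 1584 requests/s over 30 minutes is about 2.85 million requests, which matches the number of IDs requested. Each ID returned about 18.5 columns on average.

Assume each of the 10 threads waits for one read to finish before issuing the next. That is an assumption about the test client. Under it, Little's law gives a mean time per request of about 10 / 1584 s ≈ 6.3 ms, including client-side overhead. That figure is consistent with an 8 ms 95th percentile. The helper names below are illustrative.

```java
/**
 * Sanity checks on the benchmark row: throughput x duration vs. IDs requested,
 * and Little's law (concurrency = throughput x latency) for a closed-loop test.
 */
public class BenchmarkMath {

    static double expectedRequests(double requestsPerSecond, double minutes) {
        return requestsPerSecond * minutes * 60.0;
    }

    /** Mean time per request implied by N threads each issuing requests back-to-back. */
    static double impliedMeanLatencyMs(int threads, double requestsPerSecond) {
        return threads / requestsPerSecond * 1000.0;
    }

    public static void main(String[] args) {
        double throughput = 1584;   // requests/second, from the table
        int threads = 10;
        double minutes = 30;
        long idsRequested = 2_851_481L;
        long columnsRequested = 52_764_072L;

        System.out.printf("requests from throughput x duration: %.0f%n",
                expectedRequests(throughput, minutes));
        System.out.printf("columns per ID: %.1f%n",
                (double) columnsRequested / idsRequested);
        System.out.printf("implied mean latency: %.2f ms%n",
                impliedMeanLatencyMs(threads, throughput));
    }
}
```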

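Can anyone suggest what else I can try to get better read latency? There are probably others in the same situation who use Cassandra in production, so any help would be greatly appreciated.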
Thanks for the help.


1 Answer


I would try the following:

Astyanax

Set the ConnectionPoolType to TOKEN_AWARE instead of ROUND_ROBIN.
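In Astyanax, that change is `.setConnectionPoolType(ConnectionPoolType.TOKEN_AWARE)` in the `AstyanaxConfigurationImpl` shown in the question. The idea behind it can be sketched with a toy token ring in Java. Each node owns the range ending at its token. A key goes to the first node whose token is greater than or equal to the key's token, wrapping around. A token-aware client does this lookup itself and sends the read straight to a replica, instead of to an arbitrary coordinator that must forward it.

The sketch has two simplifications. The hash is a hypothetical stand-in (`String.hashCode`), not Murmur3. It also finds only the primary owner and ignores where NetworkTopologyStrategy places the other replicas.

```java
import java.util.Map;
import java.util.TreeMap;

/**
 * Toy token ring: a key is routed to the first node whose token is >= the
 * key's token, wrapping around to the lowest token at the end of the ring.
 */
public class TokenRing {
    private final TreeMap<Long, String> ring = new TreeMap<>();

    void addNode(long token, String host) {
        ring.put(token, host);
    }

    /** Hypothetical stand-in hash; Murmur3Partitioner uses Murmur3, not this. */
    static long tokenFor(String key) {
        return key.hashCode();
    }

    String ownerOf(long token) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("ring has no nodes");
        }
        Map.Entry<Long, String> e = ring.ceilingEntry(token);
        return (e != null ? e : ring.firstEntry()).getValue(); // wrap around
    }

    public static void main(String[] args) {
        TokenRing r = new TokenRing();
        r.addNode(-1_000_000_000L, "node-a");
        r.addNode(0L, "node-b");
        r.addNode(1_000_000_000L, "node-c");
        String key = "user-42";
        System.out.println(key + " -> " + r.ownerOf(tokenFor(key)));
    }
}
```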
In addition, I would use some of Astyanax's latency-aware connection pool features. For example:

.withConnectionPoolConfiguration(new ConnectionPoolConfigurationImpl("MyConnectionPool")
        .setPort(9160)
        .setMaxConnsPerHost(100)
        .setSeeds("cdb03.vip.phx.host.com:9160,cdb04.vip.phx.host.com:9160")
        .setLocalDatacenter("phx") //filtering out the nodes basis on data center
        .setLatencyScoreStrategy(new SmaLatencyScoreStrategyImpl(10000,10000,100,0.50))
    )

The latency settings are supplied through the constructor of the score strategy, e.g. SmaLatencyScoreStrategyImpl.
I am also looking into this, so I will post here if I learn anything more.
See: Latency and Token Aware configuration
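My reading of the Astyanax source, worth double-checking, is that the four SmaLatencyScoreStrategyImpl arguments are:

- the update interval (ms)
- the reset interval (ms)
- the window size
- the badness threshold

Under that reading, the example averages the last 100 samples per host and deprioritizes hosts whose average is more than 50% worse than the best host. The Java class below is a hypothetical, simplified scorer written only to illustrate that simple-moving-average idea. It is not Astyanax's implementation.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical simplified latency scorer: each host keeps its last windowSize
 * latencies, its score is their mean, and a host is "bad" when its score exceeds
 * the best host's score * (1 + badnessThreshold). Not Astyanax's class.
 */
public class SmaLatencyScorer {
    private final int windowSize;
    private final double badnessThreshold;
    private final Map<String, Deque<Long>> samples = new HashMap<>();

    SmaLatencyScorer(int windowSize, double badnessThreshold) {
        this.windowSize = windowSize;
        this.badnessThreshold = badnessThreshold;
    }

    void record(String host, long latencyMs) {
        Deque<Long> window = samples.computeIfAbsent(host, h -> new ArrayDeque<>());
        window.addLast(latencyMs);
        if (window.size() > windowSize) {
            window.removeFirst(); // drop the oldest sample
        }
    }

    /** Simple moving average of the host's recent latencies (0 if no data yet). */
    double score(String host) {
        Deque<Long> window = samples.get(host);
        if (window == null || window.isEmpty()) {
            return 0.0;
        }
        double sum = 0;
        for (long v : window) {
            sum += v;
        }
        return sum / window.size();
    }

    boolean isBad(String host) {
        double best = Double.MAX_VALUE;
        for (String h : samples.keySet()) {
            best = Math.min(best, score(h));
        }
        return score(host) > best * (1.0 + badnessThreshold);
    }

    public static void main(String[] args) {
        SmaLatencyScorer scorer = new SmaLatencyScorer(3, 0.5);
        for (long ms : new long[] {2, 2, 2}) scorer.record("host-a", ms);
        for (long ms : new long[] {100, 5, 5, 5}) scorer.record("host-b", ms);
        System.out.println("host-a " + scorer.score("host-a") + " bad=" + scorer.isBad("host-a"));
        System.out.println("host-b " + scorer.score("host-b") + " bad=" + scorer.isBad("host-b"));
    }
}
```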

Cassandra

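There are a few things you can do to optimize reads. Note: I have not tried these myself, but they are on my list of things to investigate, so I thought I would share them.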
Caching
Enable the key cache and the row cache.
Key cache

bin/nodetool --host 127.0.0.1 --port 8080 setcachecapacity MyKeyspace MyColumnFam 200001 0

Row cache

bin/nodetool --host 127.0.0.1 --port 8080 setcachecapacity MyKeyspace MyColumnFam 0 200005

Then, after hammering that node with your application's workload for a while, check the hit rates:

bin/nodetool --host 127.0.0.1  --port 8080 cfstats
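The hit rate is simply hits divided by lookups. A minimal Java LRU cache built on `LinkedHashMap` in access order shows how capacity, eviction and hit rate relate. It illustrates the concept only; it is not how Cassandra's key or row cache is implemented.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Minimal LRU cache with hit/miss counters, to show what a hit rate measures. */
public class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;
    private long hits;
    private long requests;

    LruCache(int capacity) {
        super(16, 0.75f, true); // accessOrder = true gives least-recently-used ordering
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity; // evict the least recently used entry
    }

    /** Counted lookup: every call is a request, non-null results are hits. */
    V lookup(K key) {
        requests++;
        V v = get(key);
        if (v != null) {
            hits++;
        }
        return v;
    }

    double hitRate() {
        return requests == 0 ? 0.0 : (double) hits / requests;
    }

    public static void main(String[] args) {
        LruCache<String, String> cache = new LruCache<>(2);
        cache.put("a", "row-a");
        cache.put("b", "row-b");
        cache.lookup("a");        // hit
        cache.put("c", "row-c");  // evicts "b", the least recently used
        cache.lookup("b");        // miss
        cache.lookup("c");        // hit
        System.out.printf("hit rate: %.2f%n", cache.hitRate());
    }
}
```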

Consistency
Consider the read consistency level; see the documentation on data consistency (it is DataStax documentation, but still relevant).
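With {slc:2, phx:2}, the number of replicas that must answer differs sharply by consistency level:

- ONE needs 1.
- LOCAL_QUORUM in phx needs both local replicas.
- QUORUM needs 3 of all 4, so at least one response must come from the other colo.

The question already reads at ONE, the cheapest of these. The Java snippet below computes those counts using quorum = ⌊RF/2⌋ + 1. The class name is illustrative.

```java
import java.util.Map;

/** Replicas that must respond for a few consistency levels, given NTS options. */
public class ConsistencyMath {

    static int quorum(int replicas) {
        return replicas / 2 + 1; // integer division gives floor(RF / 2) + 1
    }

    public static void main(String[] args) {
        Map<String, Integer> rf = Map.of("slc", 2, "phx", 2);
        int total = rf.values().stream().mapToInt(Integer::intValue).sum();
        String localDc = "phx";

        System.out.println("ONE:          1");
        System.out.println("LOCAL_QUORUM: " + quorum(rf.get(localDc)));
        System.out.println("QUORUM:       " + quorum(total));
        System.out.println("ALL:          " + total);
    }
}
```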
Consider lowering the read repair chance:

update column family MyColumnFam with read_repair_chance=.5
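Here is a rough model, based on my understanding rather than anything measured here. When a global read repair fires (probability p = read_repair_chance), a read at ONE also contacts all the other replicas, including the two in the remote colo. dclocal_read_repair_chance restricts that extra traffic to the local data center. The expected number of replicas touched per read is then (1 - p) × 1 + p × RF. The Java sketch below evaluates that for RF = 4. It ignores the difference between data and digest requests.

```java
/**
 * Simplified model: expected replicas touched by a read at consistency ONE when
 * a global read repair (all replicas) fires with probability readRepairChance.
 */
public class ReadRepairMath {

    static double expectedReplicasTouched(double readRepairChance, int replicationFactor) {
        return (1 - readRepairChance) * 1 + readRepairChance * replicationFactor;
    }

    public static void main(String[] args) {
        int rf = 4; // slc:2 + phx:2
        for (double chance : new double[] {1.0, 0.5, 0.1, 0.0}) {
            System.out.printf("read_repair_chance=%.1f -> %.2f replicas per read%n",
                    chance, expectedReplicasTouched(chance, rf));
        }
    }
}
```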

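After lowering read_repair_chance, consider adjusting the replication factor to help read performance (but this will kill writes, since we will be writing to more nodes).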

create keyspace cache with replication_factor=XX;

Disk
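Not sure whether there is anything to do here, but I thought I should include it. Make sure you use an optimal file system (e.g. ext4). With a high replication factor, you could optimize the disks around that, knowing that durability comes from Cassandra's replication. For example, choose the RAID level that best fits the setup.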


2021-06-15
