How does a cassandra-stress YAML file work?

I'm looking at a YAML profile for cassandra-stress:

# Keyspace name and create CQL
#
keyspace: stressexample
keyspace_definition: |
  CREATE KEYSPACE stressexample WITH replication = {'class': 'NetworkTopologyStrategy', 'AWS_VPC_US_WEST_2': '2'};
#
# Table name and create CQL
#
table: eventsrawtest
table_definition: |
  CREATE TABLE eventsrawtest (
        host text,
        bucket_time text,
        service text,
        time timestamp,
        metric double,
        state text,
        PRIMARY KEY ((host, bucket_time, service), time)
  ) WITH CLUSTERING ORDER BY (time DESC)
 
#
# Meta information for generating data
#
columnspec:
  - name: host
    size: fixed(32) #In chars, no. of chars of UUID
    population: uniform(1..600)  # We have about 600 hosts with equal events per host
  - name: bucket_time
    size: fixed(18)
    population: uniform(1..288) # 288 potential buckets
  - name: service
    size: uniform(10..100)
    population: uniform(1000..2000) # 1000 - 2000 metrics per host
  - name: time
    cluster: fixed(15) 
  - name: state
    size: fixed(4)
 
#
# Specs for insert queries
#
insert:
  partitions: fixed(1)      # 1 partition per batch
  batchtype: UNLOGGED       # use unlogged batches
  select: fixed(10)/10      # no chance of skipping a row when generating inserts
 
#
# Read queries to run against the schema
#
queries:
   pull-for-rollup:
      cql: select * from eventsrawtest where host = ? and service = ? and bucket_time = ?
      fields: samerow             # pick selection values from same row in partition
   get-a-value:
      cql: select * from eventsrawtest where host = ? and service = ? and bucket_time = ? and time = ?
      fields: samerow             # pick selection values from same row in partition

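I found this file on the internet and I don't quite understand how it works.
First, I don't understand columnspec. For the partition key columns host, bucket_time, and service, it says: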

population: uniform(1..600)  # We have about 600 hosts with equal events per host
population: uniform(1..288) # 288 potential buckets
population: uniform(1000..2000) # 1000 - 2000 metrics per host

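Does this mean I will have at most 600 x 288 x 2000 partitions? Is that the total number of partitions I will have when running cassandra-stress? In other words, when the stress test finishes, will the maximum number of partitions I see be 600 x 288 x 2000? And if I run "select count(*) from table", will the maximum number of columns I see be 600 x 288 x 2000 x 15?
Next, I don't understand the insert section: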

partitions: fixed(1)      # 1 partition per batch

Does this mean that one insert operation will update only 1 partition?

select: fixed(10)/10      # no chance of skipping a row when generating inserts

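What is this select? I don't understand how it works. My table is empty at the start, so if there is nothing in the table, how can it select and insert anything? My understanding is that it selects 100% of the data in each batch for insertion (because it says fixed(10)/10) and then inserts it. Is that right?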

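The cassandra-stress YAML you posted contains 4 sections:

the schema of the keyspace: and table: to be stress tested,
the columnspec: section, which holds the meta information that defines how synthetic data is generated,
the insert: section, which defines how data is written, and
the queries: section, which defines how data is read.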

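For the partition key:

The host column will contain a fixed size of 32 characters, with the population uniformly distributed across 1 to 600 hosts.
The bucket_time column will contain a fixed size of 18 characters, with the population uniformly distributed across 1 to 288 "buckets".
The service column will contain 10 to 100 characters, with a population of 1000 to 2000 services.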

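Since the number of possible values for the service column is uniformly distributed between 1000 and 2000, we can assume an average of 1500 services. The total number of partitions (Tp) is then calculated as: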

Tp = hosts x buckets x services
       = 600 x 288 x 1500

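The table has time as its clustering key. Since each partition contains a fixed 15 rows (cluster: fixed(15) in columnspec), the maximum number of rows (not columns) in the table is: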

max_rows = Tp x time_rows
             = (600 x 288 x 1500) x 15
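
To make the arithmetic above concrete, here is a minimal Python sketch that evaluates both the average-case estimate (1500 services) and the upper bound from the question (2000 services). The constant and function names are made up for illustration; the figures come from the columnspec comments in the posted profile, and this is only an estimate under the assumptions stated above, not output from cassandra-stress itself.

```python
# Figures taken from the columnspec comments in the posted profile.
HOSTS = 600               # population: uniform(1..600)
BUCKETS = 288             # population: uniform(1..288)
SERVICES_MIN = 1000       # population: uniform(1000..2000)
SERVICES_MAX = 2000
ROWS_PER_PARTITION = 15   # cluster: fixed(15)


def partition_count(services):
    """Estimated number of partitions for a given number of services."""
    return HOSTS * BUCKETS * services


def row_count(services):
    """Estimated number of rows: partitions times rows per partition."""
    return partition_count(services) * ROWS_PER_PARTITION


# Average case assumed in the answer: midpoint of the service range.
services_avg = (SERVICES_MIN + SERVICES_MAX) // 2

print("average partitions:", partition_count(services_avg))
print("average rows:      ", row_count(services_avg))
print("upper-bound partitions:", partition_count(SERVICES_MAX))
print("upper-bound rows:      ", row_count(SERVICES_MAX))
```

With these inputs the average case works out to 259,200,000 partitions and 3,888,000,000 rows, and the upper bound to 345,600,000 partitions and 5,184,000,000 rows.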

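For the insert: section, the spec partitions: fixed(1) means each write operation inserts into only 1 partition. The spec select: fixed(10)/10 is a ratio: 10/10 = 100% of the rows generated for that partition (the 15 possible time values from columnspec) are "selected" and written, so no row is skipped. The selection is made from the synthetic rows cassandra-stress generates, not from data already in the table, which is why it works on an empty table.
For more on population distributions and statistical functions, see the cassandra-stress documentation on the Apache Cassandra website.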
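
One way to picture the select ratio is as a fraction applied to the rows generated for a partition. The sketch below is a simplified model of that reading, not the actual cassandra-stress implementation: the function name rows_selected is hypothetical, and how the tool rounds or samples rows internally is not modeled.

```python
from fractions import Fraction


def rows_selected(numerator, denominator, rows_in_partition):
    """Expected rows written per batch under a simplified reading of select.

    select: fixed(N)/D is treated as the ratio N/D applied to the rows
    generated for the partition. This is an illustrative assumption only.
    """
    ratio = Fraction(numerator, denominator)
    return ratio * rows_in_partition


# select: fixed(10)/10 with cluster: fixed(15) -> every generated row is written.
print(rows_selected(10, 10, 15))

# A hypothetical select: fixed(5)/10 would write about half of them.
print(rows_selected(5, 10, 15))
```

Under this reading, fixed(10)/10 yields all 15 generated rows, which matches the "no chance of skipping a row" comment in the profile.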
