Kafka Connect: what is the relationship between the partition.duration.ms and flush.size properties?

Posted by bweufnob on 2021-06-06 in Kafka


Can someone explain the significance of partition.duration.ms and flush.size in the configuration below? What should the thinking be behind setting these properties?

"connector.class": "io.confluent.connect.s3.S3SinkConnector",
  "s3.region": "eu-central-1",
  "partition.duration.ms": "1000",
  "topics.dir": "root_bucket",
  "flush.size": "10",
  "topics": "TEST_SRV",
  "tasks.max": "1",
  "s3.part.size": "5242880",
  "timezone": "UTC",
  "locale": "US",
  "key.converter.schemas.enable": "true",
  "format.class": "io.confluent.connect.s3.format.json.JsonFormat",
  "partitioner.class": "io.confluent.connect.storage.partitioner.TimeBasedPartitioner",
  "schema.generator.class": "io.confluent.connect.storage.hive.schema.DefaultSchemaGenerator",
  "value.converter.schemas.enable": "false",
  "value.converter": "org.apache.kafka.connect.json.JsonConverter",
  "storage.class": "io.confluent.connect.s3.storage.S3Storage",
  "s3.bucket.name": "events-dev-s3",
  "key.converter": "org.apache.kafka.connect.storage.StringConverter",
  "path.format": "'year'-YYYY/'month'-MM/'day'-dd/'hour'-HH",
  "timestamp.extractor": "RecordField",
  "timestamp.field": "event_data.created_at"

apache-kafka apache-kafka-connect

Source: https://stackoverflow.com/questions/52760883/kafka-connect-property-relation-between-partition-duration-ms-and-flush-size

2 Answers


yhuiod9q1#

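"A partition duration of 1 second doesn't make sense, because you have set the partitioner to only make hourly partitions."
The partitioner is not set to make only hourly partitions.
"path.format": "'year'-YYYY/'month'-MM/'day'-dd/'hour'-HH" sets the granularity of the directory structure to one hour.
"partition.duration.ms": "1000" configures the connector to output a file for every second's worth of data (per input partition).
These files are written to the hourly directory that covers the second for which each file was generated.
That is, the hourly directory will contain all the data for that hour (in this case, all of the per-second files).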
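
To make the directory part of this concrete, here is a rough Python sketch of how a time-based partitioner can derive a directory name from a record timestamp: floor the timestamp to partition.duration.ms, then format it with the path pattern. It is only an illustration, not the connector's own code. The function name encoded_partition is hypothetical, the strftime pattern is a hand-translated stand-in for the Joda-style path.format in the question, and UTC is assumed as in the question's "timezone" setting. It covers directory naming only and says nothing about when a file is closed.

```python
from datetime import datetime, timezone

PARTITION_DURATION_MS = 1000  # "partition.duration.ms" from the question


def encoded_partition(timestamp_ms: int, duration_ms: int = PARTITION_DURATION_MS) -> str:
    """Return the directory a record with this timestamp would fall into.

    Hypothetical helper: floors the timestamp to the partition duration,
    then formats it with an hourly pattern (assumed equivalent to
    "'year'-YYYY/'month'-MM/'day'-dd/'hour'-HH", in UTC).
    """
    # Floor to the start of the duration bucket (here: the start of the second).
    floored_ms = (timestamp_ms // duration_ms) * duration_ms
    bucket = datetime.fromtimestamp(floored_ms / 1000, tz=timezone.utc)
    # The pattern only goes down to the hour, so the seconds do not show up
    # in the directory name.
    return bucket.strftime("year-%Y/month-%m/day-%d/hour-%H")
```

With this pattern, two records from different seconds of the same hour map to the same directory, and the directory changes only when the hour changes.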

Answered 2021-06-07

f1tvaqid2#

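The partition duration determines how often a time-based partitioner will create a new "path.format". In your case, a partition duration of 1 second doesn't make sense, because you have set the partitioner to only make hourly partitions.
The flush size is then an upper bound on how many Kafka records will exist in any given file.
The thinking behind these values depends on the throughput of your topics and on how much latency you are willing to tolerate before you can read the records from S3 rather than directly from Kafka.
Note that you pay for each S3 scan, so a larger flush size and fewer files overall will help keep costs down.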
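
As a back-of-the-envelope illustration of that trade-off, the Python sketch below estimates how many files one hour of data produces and how long a file takes to fill. It rests on simplifying assumptions that are mine, not the connector's: a single topic partition, a steady record rate, and files that are closed only when flush.size is reached. The function names and the 100 records per second figure are hypothetical.

```python
import math


def files_per_hour(records_per_second: float, flush_size: int) -> int:
    """Estimate files written per hour if files close only on flush.size."""
    records_per_hour = records_per_second * 3600
    return math.ceil(records_per_hour / flush_size)


def seconds_to_fill(records_per_second: float, flush_size: int) -> float:
    """Estimate how long records wait before a full file can be written."""
    return flush_size / records_per_second


# Hypothetical throughput of 100 records per second.
small = files_per_hour(100, 10)        # flush.size from the question
large = files_per_hour(100, 10_000)    # a much larger flush.size
```

Under these assumptions, "flush.size": "10" yields 36000 small files per hour, each filled in 0.1 seconds, while a flush size of 10000 yields 36 files per hour, each taking 100 seconds to fill. Fewer, larger files mean fewer objects to scan, at the price of records arriving in S3 later.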

Answered 2021-06-06
