为什么spark流在以前的主题记录上运行？

jc3wubiy 于 2021-06-06 发布在 Kafka

关注(0)|答案(1)|浏览(342)

我经营Zookeeper和Kafka经纪人，但我没有经营Kafka制片人。我运行了spark流代码并在这里打印了未过滤的流。我的问题是，为什么我要接收这些数据流
{“vehicleid”：“0”，“lon”：“0”，“lat”：“0”，“ts”：“0”}
{“vehicleid”：“0”，“lon”：“0”，“lat”：“0”，“ts”：“0”}
虽然我不是制片人？这些信息是什么意思？

19/06/24 20:20:00 INFO JobScheduler: Finished job streaming job 1561378800000 ms.0 from job set of time 1561378800000 ms
19/06/24 20:20:00 INFO JobScheduler: Total delay: 0.028 s for time 1561378800000 ms (execution: 0.021 s)
19/06/24 20:20:00 INFO MapPartitionsRDD: Removing RDD 161 from persistence list
19/06/24 20:20:00 INFO ContextCleaner: Cleaned accumulator 1716
19/06/24 20:20:00 INFO ContextCleaner: Cleaned accumulator 1893
19/06/24 20:20:00 INFO ContextCleaner: Cleaned accumulator 1944
{"vehicleId":"0","lon":"0","lat":"0","ts":"0"}
{"vehicleId":"0","lon":"0","lat":"0","ts":"0"}
{"vehicleId":"0","lon":"0","lat":"0","ts":"0"}
{"vehicleId":"0","lon":"0","lat":"0","ts":"0"}
{"vehicleId":"0","lon":"0","lat":"0","ts":"0"}
{"vehicleId":"0","lon":"0","lat":"0","ts":"0"}
{"vehicleId":"0","lon":"0","lat":"0","ts":"0"}
{"vehicleId":"0","lon":"0","lat":"0","ts":"0"}
{"vehicleId":"0","lon":"0","lat":"0","ts":"0"}
{"vehicleId":"0","lon":"0","lat":"0","ts":"0"}
...

19/06/24 20:20:00 INFO KafkaRDD: Removing RDD 160 from persistence list
19/06/24 20:20:00 INFO ContextCleaner: Cleaned accumulator 1628
19/06/24 20:20:00 INFO ContextCleaner: Cleaned accumulator 1781
19/06/24 20:20:00 INFO ContextCleaner: Cleaned accumulator 1570
19/06/24 20:20:00 INFO BlockManager: Removing RDD 161
19/06/24 20:20:00 INFO ContextCleaner: Cleaned accumulator 1808
19/06/24 20:20:00 INFO ContextCleaner: Cleaned accumulator 2020
19/06/24 20:20:00 INFO ContextCleaner: Cleaned accumulator 1624
19/06/24 20:20:00 INFO ContextCleaner: Cleaned accumulator 1918
19/06/24 20:20:00 INFO ContextCleaner: Cleaned

apache-kafka spark-streaming

来源：https://stackoverflow.com/questions/56736615/why-spark-streaming-is-running-on-previous-topics-records

1条答案

按热度按时间

wfsdck301#

您可能需要检查在“auto.offset.reset”中设置的内容。
从spark流媒体指南：

val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> "localhost:9092,anotherhost:9092",
  "key.deserializer" -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id" -> "use_a_separate_group_id_for_each_stream",
  "auto.offset.reset" -> "latest",
  "enable.auto.commit" -> (false: java.lang.Boolean)
)

他们将偏移重置设置为“最新”。你的似乎是最早的。

赞(0）回复(0）举报 2021-06-06

我来回答

为什么spark流在以前的主题记录上运行？

1条答案

相关问题

热门标签

最新问答