How to write all rows of a streaming DataFrame to Kafka as a JSON array?

nzkunb0c · posted 2021-06-06 · in Kafka

I am looking for a way to write Spark streaming data to Kafka. I am writing the data to Kafka using the following:

df.selectExpr("to_json(struct(*)) AS value").writeStream.format("kafka")

But my problem is that when the data is written to Kafka, it ends up looking like this:

{"country":"US","plan":postpaid,"value":300}
{"country":"CAN","plan":0.0,"value":30}

My expected output is:

[
    {"country":"US","plan":postpaid,"value":300}
    {"country":"CAN","plan":0.0,"value":30}
   ]

I want to wrap the rows inside an array. How can I achieve the same in Spark Streaming? Can anyone suggest something?


thigvfpy1#

I'm really not sure whether this is doable, but I'll post my suggestion here anyway; you could transform the DataFrame afterwards like this:

//Input  
 inputDF.show(false)
 +---+-------+
 |int|string |
 +---+-------+
 |1  |string1|
 |2  |string2|
 +---+-------+

 //convert that to json
 inputDF.toJSON.show(false)
 +----------------------------+
 |value                       |
 +----------------------------+
 |{"int":1,"string":"string1"}|
 |{"int":2,"string":"string2"}|
 +----------------------------+

 //then use collect and mkString
 println(inputDF.toJSON.collect().mkString("[", ",", "]"))
 [{"int":1,"string":"string1"},{"int":2,"string":"string2"}]

vlju58qv2#

I assume the schema of the streaming DataFrame (df) is as follows:

root
 |-- country: string (nullable = true)
 |-- plan: string (nullable = true)
 |-- value: string (nullable = true)

I also assume you want to write (produce) all rows of the streaming DataFrame (df) to a Kafka topic as a single record in which the rows form a JSON array.
If so, you should groupBy the rows and use collect_list to combine them all into a single row, which you can then write out to Kafka.

// df is a batch DataFrame so I could show for demo purposes
scala> df.show
+-------+--------+-----+
|country|    plan|value|
+-------+--------+-----+
|     US|postpaid|  300|
|    CAN|     0.0|   30|
+-------+--------+-----+

val jsons = df.selectExpr("to_json(struct(*)) AS value")
scala> jsons.show(truncate = false)
+------------------------------------------------+
|value                                           |
+------------------------------------------------+
|{"country":"US","plan":"postpaid","value":"300"}|
|{"country":"CAN","plan":"0.0","value":"30"}     |
+------------------------------------------------+

val grouped = jsons.groupBy().agg(collect_list("value") as "value")
scala> grouped.show(truncate = false)
+-----------------------------------------------------------------------------------------------+
|value                                                                                          |
+-----------------------------------------------------------------------------------------------+
|[{"country":"US","plan":"postpaid","value":"300"}, {"country":"CAN","plan":"0.0","value":"30"}]|
+-----------------------------------------------------------------------------------------------+

I would do all of the above inside DataStreamWriter.foreachBatch to get a (batch) DataFrame to work with.
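
A minimal sketch of that, assuming the df from above (the servers, topic, and checkpoint location are placeholders, not part of your setup):

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions._

df.writeStream
  .foreachBatch { (batchDF: DataFrame, batchId: Long) =>
    batchDF
      .selectExpr("to_json(struct(*)) AS value")
      .groupBy()
      // join the collected JSON strings and wrap them in [ ... ] so the record is a JSON array
      .agg(concat(lit("["), concat_ws(",", collect_list("value")), lit("]")) as "value")
      .write
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092") // placeholder
      .option("topic", "output-topic")                     // placeholder
      .save()
  }
  .option("checkpointLocation", "/tmp/checkpoint")         // placeholder
  .start()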
