Key-value pairs for a NoSQL database

vlf7wbxs · asked 2021-05-19 · tagged Spark

I am trying to load a DataFrame into a NoSQL database. The input is a CSV file with the following data.
Input:

+--------+---+---+---+---+---+----+----+
|DATE    |VAL|100|200|300|400|101 |201 |
+--------+---+---+---+---+---+----+----+
|20200701|A  |1  |2  |3  |4  |1.1 |2.1 |
|20201001|B  |10 |20 |30 |40 |10.1|20.1|
+--------+---+---+---+---+---+----+----+

val1 = [100, 200, 300, 400]
The columns listed in val1 need to be dumped into a JSON struct "val_1", and the remaining columns into "val_2". Expected output:

{
  "DATE": "20200701",
  "VAL": "A",
  "val_1": {
    "100": "1",
    "200": "2",
    "300": "3",
    "400": "4"
  },
  "val_2": {
    "101": "1.1",
    "201": "2.1"
  }
}
{
  "DATE": "20201001",
  "VAL": "B",
  "val_1": {
    "100": "10",
    "200": "20",
    "300": "30",
    "400": "40"
  },
  "val_2": {
    "101": "10.1",
    "201": "20.1"
  }
}
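For reference, a minimal sketch of loading a CSV like the one above into a Spark DataFrame; the file path "input.csv" and the all-string schema are assumptions for illustration, not part of the original post:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("csv_to_nosql").getOrCreate()

# Read the CSV with a header row; leaving every column as a string
# matches the quoted values in the expected JSON output above.
df = spark.read.csv("input.csv", header=True, inferSchema=False)

df.show(truncate=False)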

xoshrz7s · answer 1

That sounds like a fair question. You may have a better way to do this task using an existing connector, but here is a solution for your current post:

from pyspark.sql import functions as F

# Group the val1 columns into a nested struct, the remaining value
# columns into a second struct, then serialize each row to JSON.
df.withColumn("val_1", F.struct(["100", "200", "300", "400"])).withColumn(
    "val_2", F.struct(["101", "201"])
).select("DATE", "VAL", "val_1", "val_2").toJSON().collect()

['{"DATE":"20200701","VAL":"A","val_1":{"100":1,"200":2,"300":3,"400":4},"val_2":{"101":1.1,"201":2.1}}',
 '{"DATE":"20201001","VAL":"B","val_1":{"100":10,"200":20,"300":30,"400":40},"val_2":{"101":10.1,"201":20.1}}']
