我几乎已经准备好了我想要做的事情,但是转换为JSON对象的方法并不能帮助我解决缺失的问题。我想得到相同的东西,但是在“添加”和“第一”中会有更多的内容,所以我需要它们是对象的数组。
我的代码是:
case class FirstIdentity(docType: String, docNumber: String, pId: String)
case class SecondIdentity(firm: String, code: String, orgType: String,
orgNumber: String, typee: String, perms: Seq[String])
case class General(id: Int, pName: String, description: String, add: Seq[SecondIdentity],
delete: Seq[String], act: String, firsts: Seq[FirstIdentity])
val someDF = Seq(
("0010XR_TYPE_6","0010XR", "222222", "6", "TYPE", "77444478", "6", 123, 1, "PF 1", "name", "description",
Seq("PERM1", "PERM2"))
).toDF("firm", "code", "org_number", "org_type", "type", "doc_number",
"doc_type", "id", "p_id", "p_name", "name", "description", "perms")
someDF.createOrReplaceTempView("vw_test")
val filter = spark.sql("""
select
firm, code, org_number, org_type, type, doc_number,
doc_type, id, p_id, p_name, name, description, perms
from vw_test
""")
val group =
filter.rdd.map(x => {
(
x.getInt(x.fieldIndex("id")),
x.getString(x.fieldIndex("p_name")),
x.getString(x.fieldIndex("description")),
SecondIdentity(
x.getString(x.fieldIndex("firm")),
x.getString(x.fieldIndex("code")),
x.getString(x.fieldIndex("org_type")),
x.getString(x.fieldIndex("org_number")),
x.getString(x.fieldIndex("type")),
x.getSeq(x.fieldIndex("perms"))
),
"act",
FirstIdentity(
x.getString(x.fieldIndex("doc_number")),
x.getString(x.fieldIndex("doc_type")),
x.getInt(x.fieldIndex("p_id")).toString
)
)
})
.toDF("id", "name", "desc", "add", "actKey", "firsts")
.groupBy("id", "name", "desc", "add", "actKey", "firsts")
.agg(collect_list("add").as("null"))
.drop("null")
group.toJSON.show(false)
结果:
{
"id": 123,
"name": "PF 1",
"desc": "description",
"add": {
"firm": "0010XR_TYPE_6",
"code": "0010XR",
"orgType": "6",
"orgNumber": "222222",
"typee": "TYPE",
"perms": [
"PERM1",
"PERM2"
]
},
"actKey": "act",
"firsts": {
"docType": "77444478",
"docNumber": "6",
"pId": "1"
}
}
我想要一个包含“ADD”和“FIRST”的数组
这一点:
编辑
{
"id": 123,
"name": "PF 1",
"desc": "description",
"add": [ <----
{
"firm": "0010XR_TYPE_6",
"code": "0010XR",
"orgType": "6",
"orgNumber": "222222",
"typee": "TYPE",
"perms": [
"PERM1",
"PERM2"
]
},
{
"firm": "0010XR_TYPE_6",
"code": "0010XR",
"orgType": "5",
"orgNumber": "11111",
"typee": "TYPE2",
"perms": [
"PERM1",
"PERM2"
]
}
],
"actKey": "act",
"firsts": [ <----
{
"docType": "77444478",
"docNumber": "6",
"pId": "1"
},
{
"docType": "411133",
"docNumber": "6",
"pId": "2"
}
]
}
1条答案
按热度按时间pbwdgjma1#
根据您的评论,您希望根据某个分组聚合Add。请勾选要分组的所有列。要自动生成的列不能是分组的一部分。这是行不通的,而且总是会给你不同的记录。
这将按照您的期望工作(我想):
结果:
在您的map函数中,您没有将FirstEntity和Second EntityMap为Seq,因此不会将Add转换为数组。
将您的Map函数更改为:
将导致以下结果: