Scala: creating JSON from case class objects

jpfvwuh4 posted on 2022-11-09 in Scala

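I almost have what I need, but the conversion to JSON does not give me the missing piece. I want the same output, except that "add" and "firsts" will hold more than one entry each, so I need them to be arrays of objects.
My code is: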

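// Needed outside spark-shell; `spark` is the active SparkSession.
import org.apache.spark.sql.functions.collect_list
import spark.implicits._

case class FirstIdentity(docType: String, docNumber: String, pId: String)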
case class SecondIdentity(firm: String, code: String, orgType: String,
                              orgNumber: String, typee: String, perms: Seq[String])
case class General(id: Int, pName: String, description: String, add: Seq[SecondIdentity],
                       delete: Seq[String], act: String, firsts: Seq[FirstIdentity])

val someDF = Seq(
      ("0010XR_TYPE_6","0010XR", "222222", "6", "TYPE", "77444478", "6", 123, 1, "PF 1", "name", "description",
      Seq("PERM1", "PERM2"))
    ).toDF("firm", "code", "org_number", "org_type", "type", "doc_number",
           "doc_type", "id", "p_id", "p_name", "name", "description", "perms")

someDF.createOrReplaceTempView("vw_test")

val filter = spark.sql("""
                        select
                            firm, code, org_number, org_type, type, doc_number,
                             doc_type, id, p_id, p_name, name, description, perms
                         from vw_test
                    """)

val group =
      filter.rdd.map(x => {
          (
            x.getInt(x.fieldIndex("id")),
            x.getString(x.fieldIndex("p_name")),
            x.getString(x.fieldIndex("description")),
            SecondIdentity(
              x.getString(x.fieldIndex("firm")),
              x.getString(x.fieldIndex("code")),
              x.getString(x.fieldIndex("org_type")),
              x.getString(x.fieldIndex("org_number")),
              x.getString(x.fieldIndex("type")),
              x.getSeq(x.fieldIndex("perms"))
            ),
            "act",
            FirstIdentity(
              x.getString(x.fieldIndex("doc_number")),
              x.getString(x.fieldIndex("doc_type")),
              x.getInt(x.fieldIndex("p_id")).toString
            )
          )
        })
        .toDF("id", "name", "desc", "add", "actKey", "firsts")
        .groupBy("id", "name", "desc", "add", "actKey", "firsts")
        .agg(collect_list("add").as("null"))
        .drop("null")

group.toJSON.show(false)

Result:

{
  "id": 123,
  "name": "PF 1",
  "desc": "description",
  "add": {
    "firm": "0010XR_TYPE_6",
    "code": "0010XR",
    "orgType": "6",
    "orgNumber": "222222",
    "typee": "TYPE",
    "perms": [
      "PERM1",
      "PERM2"
    ]
  },
  "actKey": "act",
  "firsts": {
    "docType": "77444478",
    "docNumber": "6",
    "pId": "1"
  }
}

I want "add" and "firsts" to be arrays.

Edit: this is the output I am after:

{
  "id": 123,
  "name": "PF 1",
  "desc": "description",
  "add": [   <----
    {
      "firm": "0010XR_TYPE_6",
      "code": "0010XR",
      "orgType": "6",
      "orgNumber": "222222",
      "typee": "TYPE",
      "perms": [
        "PERM1",
        "PERM2"
      ]
    },
    {
      "firm": "0010XR_TYPE_6",
      "code": "0010XR",
      "orgType": "5",
      "orgNumber": "11111",
      "typee": "TYPE2",
      "perms": [
        "PERM1",
        "PERM2"
      ]
    }
  ],
  "actKey": "act",
  "firsts": [  <----
    {
      "docType": "77444478",
      "docNumber": "6",
      "pId": "1"
    },
    {
      "docType": "411133",
      "docNumber": "6",
      "pId": "2"
    }
  ]
}
pbwdgjma #1

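Based on your comment, you want to aggregate add over some grouping. Check which columns you actually want to group by. The column you are aggregating cannot be part of the grouping key: that does not work and will always give you separate records.
This should work as you expect (I think):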

val group =
    filter.rdd.map(x => {
      (
        x.getInt(x.fieldIndex("id")),
        x.getString(x.fieldIndex("p_name")),
        x.getString(x.fieldIndex("description")),
        SecondIdentity(
          x.getString(x.fieldIndex("firm")),
          x.getString(x.fieldIndex("code")),
          x.getString(x.fieldIndex("org_type")),
          x.getString(x.fieldIndex("org_number")),
          x.getString(x.fieldIndex("type")),
          x.getSeq(x.fieldIndex("perms"))
        ),
        "act",
        FirstIdentity(
          x.getString(x.fieldIndex("doc_number")),
          x.getString(x.fieldIndex("doc_type")),
          x.getInt(x.fieldIndex("p_id")).toString
        )
      )
    })
      .toDF("id", "name", "desc", "add", "actKey", "firsts")
      .groupBy("id", "name", "desc", "actKey")
      // Keep the aggregated column: aliasing it to "null" and then dropping it would discard the array.
      .agg(collect_list("add"))

Result:

+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|value                                                                                                                                                                                                                                                                                                                       |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|{"id":123,"name":"PF 1","desc":"description","actKey":"act","collect_list(add)":[{"firm":"0010XR_TYPE_6","code":"0010XR","orgType":"6","orgNumber":"222222","typee":"TYPE","perms":["PERM1","PERM2"]},{"firm":"0010XR_TYPE_5","code":"0010XR","orgType":"5","orgNumber":"222223","typee":"TYPE","perms":["PERM1","PERM2"]}]}|
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
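The output above carries the array under the generated name collect_list(add) and has no firsts field, because firsts was neither grouped nor aggregated. Below is a minimal sketch of one way to get both arrays under their intended names, assuming Spark is on the classpath and a local session is acceptable. The object name GroupIntoArrays, the helper groupedJson, and the two sample rows (taken from the expected output in the question) are illustrative only:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.collect_list

// Same shapes as the case classes in the question.
case class FirstIdentity(docType: String, docNumber: String, pId: String)
case class SecondIdentity(firm: String, code: String, orgType: String,
                          orgNumber: String, typee: String, perms: Seq[String])

object GroupIntoArrays {
  // Hypothetical helper: returns one JSON string per (id, name, desc, actKey) group.
  def groupedJson(spark: SparkSession): Array[String] = {
    import spark.implicits._

    // Two sample rows that share id, name, desc and actKey (illustrative data).
    val rows = Seq(
      (123, "PF 1", "description",
        SecondIdentity("0010XR_TYPE_6", "0010XR", "6", "222222", "TYPE", Seq("PERM1", "PERM2")),
        "act", FirstIdentity("77444478", "6", "1")),
      (123, "PF 1", "description",
        SecondIdentity("0010XR_TYPE_6", "0010XR", "5", "11111", "TYPE2", Seq("PERM1", "PERM2")),
        "act", FirstIdentity("411133", "6", "2"))
    ).toDF("id", "name", "desc", "add", "actKey", "firsts")

    rows
      .groupBy("id", "name", "desc", "actKey")              // key excludes add and firsts
      .agg(collect_list("add").as("add"),                   // alias so the JSON key is "add"
           collect_list("firsts").as("firsts"))             // alias so the JSON key is "firsts"
      .select("id", "name", "desc", "add", "actKey", "firsts") // restore the original column order
      .toJSON
      .collect()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("group-into-arrays")
      .getOrCreate()
    try groupedJson(spark).foreach(println)
    finally spark.stop()
  }
}
```

Spark does not guarantee the order of elements inside a collect_list result after a shuffle, so downstream code should not rely on it.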

In your map function you are not wrapping FirstIdentity and SecondIdentity in a Seq, so add is not converted to an array.
Change your map function to:

filter.rdd.map(x => {
      (
        x.getInt(x.fieldIndex("id")),
        x.getString(x.fieldIndex("p_name")),
        x.getString(x.fieldIndex("description")),
        Seq(SecondIdentity(
          x.getString(x.fieldIndex("firm")),
          x.getString(x.fieldIndex("code")),
          x.getString(x.fieldIndex("org_type")),
          x.getString(x.fieldIndex("org_number")),
          x.getString(x.fieldIndex("type")),
          x.getSeq(x.fieldIndex("perms"))
        )),
        "act",
        Seq(FirstIdentity(
          x.getString(x.fieldIndex("doc_number")),
          x.getString(x.fieldIndex("doc_type")),
          x.getInt(x.fieldIndex("p_id")).toString
        ))
      )
    })

This gives the following result:

+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|value                                                                                                                                                                                                                                                |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|{"id":123,"name":"PF 1","desc":"description","add":[{"firm":"0010XR_TYPE_6","code":"0010XR","orgType":"6","orgNumber":"222222","typee":"TYPE","perms":["PERM1","PERM2"]}],"actKey":"act","firsts":[{"docType":"77444478","docNumber":"6","pId":"1"}]}|
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
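Wrapping each value in Seq, as above, yields a one-element array per input row. Rows are only merged into longer arrays when they are grouped on a key that leaves out the collected column. The following plain-Scala analogy (no Spark; the Rec type and the sample values are hypothetical) sketches why the grouping key matters:

```scala
// Plain collections only; this mirrors the idea, not Spark's API.
object GroupingKeyDemo {
  // Hypothetical, simplified row: `add` stands in for the SecondIdentity value.
  final case class Rec(id: Int, name: String, add: String)

  val rows: Seq[Rec] = Seq(Rec(123, "PF 1", "add-A"), Rec(123, "PF 1", "add-B"))

  // Key includes the collected value: every distinct `add` forms its own group.
  val keyedWithAdd: Map[(Int, String, String), Seq[String]] =
    rows.groupBy(r => (r.id, r.name, r.add)).map { case (k, rs) => k -> rs.map(_.add) }

  // Key excludes it: both rows share one key and `add` becomes a two-element list.
  val keyedWithoutAdd: Map[(Int, String), Seq[String]] =
    rows.groupBy(r => (r.id, r.name)).map { case (k, rs) => k -> rs.map(_.add) }

  def main(args: Array[String]): Unit = {
    println(keyedWithAdd.size)    // prints 2: one group per row
    println(keyedWithoutAdd.size) // prints 1: a single merged group
  }
}
```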
