Creating JSON from a DataFrame

ruarlubt  posted on 2023-02-06  in Other

We are using the DataFrame below to create a JSON file.
Input file:

import pandas as pd
import numpy as np
a1=["DA_STinf","DA_Stinf_NA","DA_Stinf_city","DA_Stinf_NA_ID","DA_Stinf_NA_ID_GRANT","DA_country"]
a2=["data.studentinfo","data.studentinfo.name","data.studentinfo.city","data.studentinfo.name.id","data.studentinfo.name.id.grant","data.country"]
a3=[np.nan,np.nan,"StringType",np.nan,"BoolType","StringType"]
d1=pd.DataFrame(list(zip(a1,a2,a3)),columns=['data','action','datatype'])

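We need to build the following two structures dynamically from the DataFrame above. We have already adapted the data into the format shown below.
For the schema, e.g.: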

StructType([StructField(Column_name,Datatype,True)])

For the data, e.g.:

F.struct(F.col(column_name)).alias(json_expected_name)

Expected output structure for the schema:

StructType(
    [
        StructField(
            "data",
            StructType(
                [
                    StructField(
                        "studentinfo",
                        StructType(
                            [
                                StructField("city", StringType(), True),
                                StructField(
                                    "name",
                                    StructType(
                                        [
                                            StructField(
                                                "id",
                                                StructType(
                                                    [
                                                        StructField("grant", BoolType(), True)
                                                    ]
                                                ),
                                            )
                                        ]
                                    ),
                                ),
                            ]
                        ),
                    ),
                    StructField("country", StringType(), True),
                ]
            ),
        )
    ]
)
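One possible way to derive such a nested schema from the dotted paths is sketched below. It is only a sketch under assumptions: the helper names `build_tree` and `to_spark_json` are hypothetical, the `ROWS` list stands in for the `action` and `datatype` columns of `d1`, and the mapping from the labels `StringType`/`BoolType` to the Spark JSON type names `string`/`boolean` is assumed. The result is a plain dict in the struct layout that Spark uses for JSON schemas.

```python
# Hypothetical stand-in for the 'action' and 'datatype' columns of d1 (NaN shown as None).
ROWS = [
    ("data.studentinfo", None),
    ("data.studentinfo.name", None),
    ("data.studentinfo.city", "StringType"),
    ("data.studentinfo.name.id", None),
    ("data.studentinfo.name.id.grant", "BoolType"),
    ("data.country", "StringType"),
]

# Assumed mapping from the DataFrame's datatype labels to Spark JSON type names.
TYPE_NAMES = {"StringType": "string", "BoolType": "boolean"}


def build_tree(rows):
    """Turn dotted paths into nested dicts; leaves hold the datatype label."""
    tree = {}
    for path, datatype in rows:
        if datatype is None:
            continue  # container rows are implied by the leaf paths below them
        *parents, leaf = path.split(".")
        node = tree
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = datatype
    return tree


def to_spark_json(tree):
    """Render the nested dict in the struct layout used by Spark JSON schemas."""
    fields = []
    for name, value in tree.items():
        if isinstance(value, dict):
            field_type = to_spark_json(value)  # nested struct
        else:
            field_type = TYPE_NAMES[value]     # leaf type
        fields.append(
            {"name": name, "type": field_type, "nullable": True, "metadata": {}}
        )
    return {"type": "struct", "fields": fields}


schema_json = to_spark_json(build_tree(ROWS))
```

With PySpark available, passing `schema_json` to `StructType.fromJson` should give the equivalent `StructType`; that step is left out here so the sketch runs without Spark.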

2) Expected data extraction:

df.select(      
    F.struct(
        F.struct(
                F.struct(F.col("DA_Stinf_city")).alias("city"),
                F.struct(
                    F.struct(F.col("DA_Stinf_NA_ID_GRANT")).alias("id")
                    ).alias("name"),
        ).alias("studentinfo"),
        F.struct(F.col("DA_country")).alias("country")
    ).alias("data")
)
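The same recursion can produce the nested `select` expression. The sketch below only renders the expression as text, so it runs without Spark; the helper names `build_tree` and `render` are hypothetical. Note one assumption that differs from the snippet above: leaves are rendered as `F.col(...).alias(leaf)` rather than wrapped in `F.struct`, so that the field names line up with the schema (`grant` sits inside `id`).

```python
# Hypothetical stand-in for the 'data' (source column) and 'action' (target path)
# columns of d1, restricted to rows that carry a datatype, i.e. real leaf columns.
LEAVES = [
    ("DA_Stinf_city", "data.studentinfo.city"),
    ("DA_Stinf_NA_ID_GRANT", "data.studentinfo.name.id.grant"),
    ("DA_country", "data.country"),
]


def build_tree(leaves):
    """Nest the dotted target paths; each leaf stores its source column name."""
    tree = {}
    for column, path in leaves:
        *parents, leaf = path.split(".")
        node = tree
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = column
    return tree


def render(name, node):
    """Render one node as PySpark source text."""
    if isinstance(node, str):
        # Leaf: rename the flat source column to the nested field name.
        return f'F.col("{node}").alias("{name}")'
    children = ", ".join(render(child, sub) for child, sub in node.items())
    return f'F.struct({children}).alias("{name}")'


tree = build_tree(LEAVES)
expression = ", ".join(render(name, node) for name, node in tree.items())
print(f"df.select({expression})")
```

To build real columns instead of text, the two f-strings could be replaced with actual `F.col(...).alias(...)` and `F.struct(*children).alias(...)` calls.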

We have to use a for loop and add these entries under data -> studentinfo -> name -> id (data.studentinfo.name.id), which I have already added to the expected output structure.

uxh89sit #1

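Here is the resulting JSON. You need to work out how to restructure it into the new hierarchical JSON structure you want. `action` holds the hierarchy elements of your tree and `datatype` holds the data types. I think you can assume that a null datatype is numeric. The null datatype for name is wrong; it should be StringType.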

import pandas as pd
import numpy as np
import json

a1=["DA_STinf","DA_Stinf_NA","DA_Stinf_city","DA_Stinf_NA_ID","DA_Stinf_NA_ID_GRANT","DA_country"]
a2=["data.studentinfo","data.studentinfo.name","data.studentinfo.city","data.studentinfo.name.id","data.studentinfo.name.id.grant","data.country"]
# Datatype labels, with the nulls from the question filled in
a3=["StructType","StructType","StringType","NumericType","BoolType","StringType"]
df=pd.DataFrame(list(zip(a1,a2,a3)),columns=['data','action','datatype'])

# The default orient is column-oriented: {column: {row_index: value}}
json_tree=df.to_json()

{
   "data":{
      "0":"DA_STinf",
      "1":"DA_Stinf_NA",
      "2":"DA_Stinf_city",
      "3":"DA_Stinf_NA_ID",
      "4":"DA_Stinf_NA_ID_GRANT",
      "5":"DA_country"
   },
   "action":{
      "0":"data.studentinfo",
      "1":"data.studentinfo.name",
      "2":"data.studentinfo.city",
      "3":"data.studentinfo.name.id",
      "4":"data.studentinfo.name.id.grant",
      "5":"data.country"
   },
   "datatype":{
      "0":"StructType",
      "1":"StructType",
      "2":"StringType",
      "3":"NumericType",
      "4":"BoolType",
      "5":"StringType"
   }
}
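The column-oriented layout above keys every column by row index, which is why later code has to look values up with `str(i)`. As a small sketch (the helper name `to_records` is hypothetical, and the JSON here is a shortened two-row copy of the dump above), the same data can be regrouped into one dict per row using only the standard library:

```python
import json

# Shortened, two-row copy of the column-oriented JSON shown above.
json_tree = json.dumps(
    {
        "data": {"0": "DA_STinf", "1": "DA_Stinf_NA"},
        "action": {"0": "data.studentinfo", "1": "data.studentinfo.name"},
        "datatype": {"0": "StructType", "1": "StructType"},
    }
)


def to_records(json_text):
    """Regroup {column: {row_index: value}} into a list of per-row dicts."""
    columns = json.loads(json_text)
    # Row keys are strings such as "0", "1"; sort them numerically.
    row_keys = sorted(next(iter(columns.values())), key=int)
    return [
        {name: values[key] for name, values in columns.items()}
        for key in row_keys
    ]


records = to_records(json_tree)
for record in records:
    print(record["action"], record["datatype"])
```

pandas can also emit this row-wise shape directly via `df.to_json(orient="records")`, which avoids the regrouping step.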

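def convert_action_to_hierarchy(data):
    # Parse the column-oriented JSON string produced by df.to_json()
    data = json.loads(data)
    action = data['action']
    datatype_list = data['datatype']
    result = {}
    for i in range(len(action)):
        action_list = action[str(i)].split('.')
        for j, name in enumerate(action_list):
            if j == len(action_list) - 1:
                # Last path element: row i describes it, so use row i's datatype
                result[name] = (j, datatype_list[str(i)])
            else:
                # Intermediate element: keep an existing entry, else assume a struct
                result.setdefault(name, (j, 'StructType'))
    # Note: the result is keyed by element name only, so names must be unique
    return result

print(convert_action_to_hierarchy(json_tree))

Output:

{'data': (0, 'StructType'), 'studentinfo': (1, 'StructType'), 'name': (2, 'StructType'), 'city': (2, 'StringType'), 'id': (3, 'NumericType'), 'grant': (4, 'BoolType'), 'country': (1, 'StringType')}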

The number is the element's level in the hierarchy.
