python-3.x: Building a structure from a pandas DataFrame

t3irkdon  posted on 2023-02-10  in  Python

Input data

import pandas as pd

# Dotted field paths and the Spark type of each leaf field
a1 = ["data.country", "data.studentinfo.city", "data.studentinfo.name.id.grant"]
a2 = ["StringType()", "StringType()", "StringType()"]
d1 = pd.DataFrame(list(zip(a1, a2)), columns=['action', 'type'])

We need to build the following structure using the DataFrame and a for loop:

StructType([StructField("data", 
    StructType([StructField("country",StringType(),True),
                StructField("studentinfo",
                StructType([StructField("city",StringType(),True),
                    StructField("name",StructType([
                        StructField("id",StructType([
                        StructField("grant",StringType(),True)])
                        )]))    
                ])
            )])
    )])

46qrfjad (answer #1)

The first stage builds a nested structure from the dotted paths; a function then converts it into the required format:

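# Stage 1: fold the dotted paths into a nested dict; leaves hold the type string
s = dict()
for _, r in d1.iterrows():
  d = s
  fields = r['action'].split('.')
  for name in fields[:-1]:
    if name not in d:
      d[name] = dict()
    d = d[name]
  d[fields[-1]] = r['type']

# Stage 2: render the nested dict recursively as a StructType(...) string
def sprint(n):
  children = list()
  for k, v in n.items():
    entry = f'StructField("{k}",'
    if isinstance(v, dict):
      # Nested struct: close the StructField after the inner StructType
      entry += sprint(v) + ')'
    else:
      entry += f'{v},True)'
    children.append(entry)
  return f'StructType([{",".join(children)}])'

print(sprint(s))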
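
To make the first stage concrete, the sketch below folds the same dotted paths into a nested dict and prints it as indented JSON. It uses plain lists instead of the DataFrame so that it runs without pandas. The helper name `nest_paths` is hypothetical, and the sketch assumes no path is both a leaf and a prefix of another path.

```python
import json


def nest_paths(paths, types):
    """Fold dotted paths into a nested dict; leaves hold the type strings."""
    root = {}
    for path, type_name in zip(paths, types):
        # Everything before the last dot is a chain of parent structs
        *parents, leaf = path.split(".")
        node = root
        for name in parents:
            # Create the parent dict on first sight, then descend into it
            node = node.setdefault(name, {})
        node[leaf] = type_name
    return root


paths = ["data.country", "data.studentinfo.city", "data.studentinfo.name.id.grant"]
types = ["StringType()"] * 3

nested = nest_paths(paths, types)
print(json.dumps(nested, indent=2))
```

Each dict level corresponds to one `StructType`, and each key to one `StructField`, which is why a recursive function over this dict is enough for the second stage.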
