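I want to create a hierarchical "children" JSON for a sunburst chart. I wrote the code in Python, but the totalyearlycompensation column needs to show the sum whenever the other 4 columns are identical. For example: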
Product Manager,Master's Degree,Asian,Male,4500000
Product Manager,Master's Degree,Asian,Male,4980000
These two rows should be merged into one entry with a total yearly compensation of 9480000 (4500000 + 4980000).
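The rule above merges rows whose four text columns (title, Education, Race, gender) match and adds up their totalyearlycompensation values. A minimal sketch of that grouping for just the two example rows, keying a dictionary on the four-column tuple (the `rows` list is typed in by hand here, not read from the CSV):

```python
from collections import defaultdict

# The two example rows: four key columns followed by the compensation value.
rows = [
    ["Product Manager", "Master's Degree", "Asian", "Male", "4500000"],
    ["Product Manager", "Master's Degree", "Asian", "Male", "4980000"],
]

# Sum the compensation for every distinct (title, Education, Race, gender) tuple.
totals = defaultdict(int)
for *key, comp in rows:
    totals[tuple(key)] += int(comp)

print(dict(totals))
# {('Product Manager', "Master's Degree", 'Asian', 'Male'): 9480000}
```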
title,Education,Race,gender,totalyearlycompensation
Mechanical Engineer,Bachelor's Degree,Asian,Male,5000
Mechanical Engineer,Bachelor's Degree,Asian,Male,5000
Mechanical Engineer,Bachelor's Degree,Asian,Female,180000
Data Scientist,Bachelor's Degree,Asian,Female,180000
Software Engineer,Bachelor's Degree,Asian,Male,10000
Software Engineer,Bachelor's Degree,Asian,Male,10000
Sales,Master's Degree,Asian,Male,10000
Software Engineer,Master's Degree,Asian,Male,10000
Software Engineer,Bachelor's Degree,Asian,Male,10000
Software Engineer,Bachelor's Degree,Asian,Male,10000
Software Engineer,Bachelor's Degree,Asian,Male,10000
Software Engineer,Bachelor's Degree,Asian,Female,10000
Software Engineer,Bachelor's Degree,Asian,Male,11000
Business Analyst,Master's Degree,Asian,Male,11000
Software Engineer,Bachelor's Degree,Asian,Male,11000
Software Engineer,Master's Degree,Asian,Male,11000
Software Engineer,Bachelor's Degree,Asian,Male,11000
Software Engineer,Bachelor's Degree,Asian,Male,11000
Software Engineer,Bachelor's Degree,Asian,Male,11000
Software Engineer,Bachelor's Degree,Asian,Male,11000
Software Engineer,Bachelor's Degree,Asian,Female,11000
Software Engineer,Bachelor's Degree,Asian,Male,11000
Software Engineer,Bachelor's Degree,Asian,Male,11000
Human Resources,Master's Degree,Asian,Female,11000
Product Designer,Bachelor's Degree,Asian,Male,13000
Software Engineer,Bachelor's Degree,Asian,Male,13000
Software Engineering Manager,Bachelor's Degree,Asian,Male,13000
Software Engineering Manager,Master's Degree,White,Female,1605000
Software Engineering Manager,Bachelor's Degree,White,Male,1733000
Software Engineering Manager,Master's Degree,Black,Male,2372000
Product Manager,Master's Degree,Asian,Male,4500000
Product Manager,Master's Degree,Asian,Male,4980000
... (continues with more similar data)
The output should look similar to this:
{
    "name": "Flare",
    "children": [
        {
            "name": "Mechanical Engineer",
            "children": [
                {
                    "name": "Bachelor's Degree",
                    "children": [
                        {
                            "name": "Asian",
                            "children": [
                                {
                                    "name": "Male",
                                    "totalyearlycompensation": 10000
                                },
                                {
                                    "name": "Female",
                                    "totalyearlycompensation": 180000
                                }
                            ]
                        },
                        {
                            "name": "White",
                            "children": [
                                {
                                    "name": "Male",
                                    "totalyearlycompensation": 550000
                                }
                            ]
                        }
                    ]
                },
                {
                    "name": "Master's Degree",
                    "children": [
                        {
                            "name": "Asian",
                            "children": [
                                {
                                    "name": "Male",
                                    "totalyearlycompensation": 222000
                                }
                            ]
                        }
                    ]
                }
            ]
        }
    ]
}
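The target JSON is a tree with one level per column: at each level the children are unique by name, and the gender leaves carry the summed compensation. One way to build such a tree is to nest dictionaries keyed by name and convert them to "children" lists at the end. The sketch below is illustrative only, not code from the original thread: it assumes exactly the five columns shown, uses a small inline `SAMPLE` string in place of DataLevels.csv, and the helpers `tree` and `to_children` are hypothetical names.

```python
import csv
import io
import json
from collections import defaultdict

def tree():
    # Each level maps a name to the next level, created on first access.
    return defaultdict(tree)

def to_children(name, node):
    # Leaves hold an int total; inner levels hold a dict of named subtrees.
    if isinstance(node, int):
        return {"name": name, "totalyearlycompensation": node}
    return {"name": name,
            "children": [to_children(k, v) for k, v in node.items()]}

# Hypothetical inline stand-in for DataLevels.csv.
SAMPLE = """title,Education,Race,gender,totalyearlycompensation
Mechanical Engineer,Bachelor's Degree,Asian,Male,5000
Mechanical Engineer,Bachelor's Degree,Asian,Male,5000
Mechanical Engineer,Bachelor's Degree,Asian,Female,180000
"""

root = tree()
reader = csv.reader(io.StringIO(SAMPLE))
next(reader)  # skip the header row
for title, education, race, gender, comp in reader:
    level = root[title][education][race]
    # The gender level stores a running total instead of another subtree.
    level[gender] = level.get(gender, 0) + int(comp)

result = to_children("Flare", root)
print(json.dumps(result, indent=4))
```

Because `dict.get` does not trigger the `defaultdict` factory, the gender level can hold plain integers while the upper levels stay auto-creating dictionaries.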
My code:
import csv
import json

class Node(object):
    def __init__(self, name, size=None):
        self.name = name
        self.children = []
        self.size = size

    def child(self, cname, size=None):
        child_found = [c for c in self.children if c.name == cname]
        if not child_found:
            _child = Node(cname, size)
            self.children.append(_child)
        else:
            _child = child_found[0]
        return _child

    def as_dict(self):
        res = {'name': self.name}
        if self.size is None:
            res['children'] = [c.as_dict() for c in self.children]
        else:
            res['size'] = self.size
        return res

root = Node('Flare')

with open('DataLevels.csv', 'r') as f:
    reader = csv.reader(f)
    next(reader)
    for row in reader:
        grp1, grp2, grp3, grp4, size = row
        root.child(grp1).child(grp2).child(grp3).child(grp4, int(size))

print(json.dumps(root.as_dict(), indent=5))

with open('output.json', 'w') as f2:
    json.dump(root.as_dict(), f2, indent=4, ensure_ascii=False)
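In this code, child() only uses size when it creates a new node. When a leaf with the same name already exists, the existing node is returned and the new row's value is ignored, so each leaf keeps only the first value it saw. A minimal sketch of one fix, assuming (as the code already does) that only leaves ever receive a size: add the value to the existing leaf, and emit it under the totalyearlycompensation key the target JSON uses. An inline `SAMPLE` string holding the two Product Manager rows stands in for DataLevels.csv so the sketch runs on its own.

```python
import csv
import io
import json

class Node(object):
    def __init__(self, name, size=None):
        self.name = name
        self.children = []
        self.size = size

    def child(self, cname, size=None):
        child_found = [c for c in self.children if c.name == cname]
        if not child_found:
            _child = Node(cname, size)
            self.children.append(_child)
        else:
            _child = child_found[0]
            # Leaf seen before: add this row's value to its running total.
            if size is not None:
                _child.size = (_child.size or 0) + size
        return _child

    def as_dict(self):
        res = {'name': self.name}
        if self.size is None:
            res['children'] = [c.as_dict() for c in self.children]
        else:
            # Key renamed to match the desired output format.
            res['totalyearlycompensation'] = self.size
        return res

# Inline stand-in for DataLevels.csv, using the two Product Manager rows.
SAMPLE = """title,Education,Race,gender,totalyearlycompensation
Product Manager,Master's Degree,Asian,Male,4500000
Product Manager,Master's Degree,Asian,Male,4980000
"""

root = Node('Flare')
reader = csv.reader(io.StringIO(SAMPLE))
next(reader)  # skip the header row
for grp1, grp2, grp3, grp4, size in reader:
    root.child(grp1).child(grp2).child(grp3).child(grp4, int(size))

print(json.dumps(root.as_dict(), indent=4))
```

With these two rows, the single Male leaf ends up with a totalyearlycompensation of 9480000.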
1 Answer
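The data is more naturally stored as a dictionary keyed by name than as a list. I think this does what you want. Note that Node is now essentially a fancy defaultdict that creates all of its children as defaultdicts. Output: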