我想使用Python将下面的嵌套JSON文件转换为CSV文件。
{
"page": {
"page": 1,
"pageSize": 250
},
"dataRows": [
{
"entityId": 349255,
"Id": "41432-95P",
"disabled": false,
"followed": false,
"suggestion": false,
"inactive": false,
"pinned": false,
"highlighted": false,
"columnValues": {
"lastName": [
{
"columnValueType": "ENTITY",
"accessStatus": "OK",
"columnValueType": "ENTITY",
"name": "McBrady",
"Id": "41432-95P",
"unpublished": false
}
],
"gender": [
{
"columnValueType": "STRING",
"accessStatus": "OK",
"columnValueType": "STRING",
"value": "Male"
}
],
"hqCity": [
{
"columnValueType": "STRING",
"accessStatus": "OK",
"columnValueType": "STRING",
"value": "Seattle"
}
],
"prefix": [
{
"columnValueType": "STRING",
"accessStatus": "OK",
"columnValueType": "STRING",
"value": "Dr."
}
],
"lastUpdateDate": [
{
"columnValueType": "DATE",
"accessStatus": "OK",
"columnValueType": "DATE",
"expected": false,
"asOfdate": "2023-06-26"
}
],
"companyName": [
{
"columnValueType": "BUSINESS_ENTITY",
"accessStatus": "OK",
"columnValueType": "BUSINESS_ENTITY",
"name": "Global Partnerships",
"Id": "56347-39",
"unpublished": false,
"profileType": "INVESTOR"
}
],
"roles": [
{
"columnValueType": "INT_COLUMN_VALUE",
"accessStatus": "OK",
"columnValueType": "INT_COLUMN_VALUE",
"marked": false,
"value": 3
}
],
"dailyUpdates": [],
"assetClass": [],
"hqCountry": [
{
"columnValueType": "STRING",
"accessStatus": "OK",
"columnValueType": "STRING",
"value": "United States"
}
],
"latestNoteAuthor": [],
"primaryPosition": [
{
"columnValueType": "STRING",
"accessStatus": "OK",
"columnValueType": "STRING",
"value": "Chair, Enterprise Risk, Compliance and Audit Committee and Member of the Board of Directors"
}
],
"boardSeats": [
{
"columnValueType": "INT_COLUMN_VALUE",
"accessStatus": "OK",
"columnValueType": "INT_COLUMN_VALUE",
"marked": false,
"value": 2
}
],
"fundRoles": [],
"institution": [
{
"columnValueType": "STRING",
"accessStatus": "OK",
"columnValueType": "STRING",
"value": "Harvard University"
},
{
"columnValueType": "STRING",
"accessStatus": "OK",
"columnValueType": "STRING",
"value": "University of Oxford"
}
],
"latestNote": [],
"Id": [
{
"columnValueType": "STRING",
"accessStatus": "OK",
"columnValueType": "STRING",
"value": "41432-95P"
}
],
"hqRegion": [
{
"columnValueType": "STRING",
"accessStatus": "OK",
"columnValueType": "STRING",
"value": "Americas"
}
],
"email": [],
"dealRoles": [],
"PrimaryCompanyType": [
{
"columnValueType": "STRING",
"accessStatus": "OK",
"columnValueType": "STRING",
"value": "Not-For-Profit Venture Capital"
}
],
"mgtRoles": [],
"hqStateProvince": [
{
"columnValueType": "STRING",
"accessStatus": "OK",
"columnValueType": "STRING",
"value": "Washington"
}
],
"fullName": [
{
"columnValueType": "ENTITY_WITH_NOTE",
"accessStatus": "OK",
"columnValueType": "ENTITY_WITH_NOTE",
"name": "Matthew McBrady Ph.D",
"Id": "41432-95P",
"unpublished": false
}
],
"hqLocation": [
{
"columnValueType": "STRING",
"accessStatus": "OK",
"columnValueType": "STRING",
"value": "Seattle, WA"
}
],
"biography": [
{
"columnValueType": "DESCRIPTION_WITH_SOURCE",
"accessStatus": "OK",
"columnValueType": "DESCRIPTION_WITH_SOURCE",
"value": "Dr. Matthew McBrady serves as Chair, of the Enterprise Risk, Compliance, and Audit Committee.",
"morningstarSource": true
}
],
"firstName": [
{
"columnValueType": "ENTITY",
"accessStatus": "OK",
"columnValueType": "ENTITY",
"name": "Matthew",
"Id": "41432-95P",
"unpublished": false
}
],
"phone": [
{
"columnValueType": "STRING",
"accessStatus": "OK",
"columnValueType": "STRING",
"value": "+1 (206) 652-8773"
}
],
"hqSubRegion": [
{
"columnValueType": "STRING",
"accessStatus": "OK",
"columnValueType": "STRING",
"value": "North America"
}
],
"hqAddressLine2": [
{
"columnValueType": "STRING",
"accessStatus": "OK",
"columnValueType": "STRING",
"value": "Suite 410"
}
],
"hqAddressLine1": [
{
"columnValueType": "STRING",
"accessStatus": "OK",
"columnValueType": "STRING",
"value": "1201 Western Avenue"
}
],
"hqFax": [
{
"columnValueType": "STRING",
"accessStatus": "OK",
"columnValueType": "STRING",
"value": "+1 (206) 456-7877"
}
],
"middleName": [
{
"columnValueType": "STRING",
"accessStatus": "OK",
"columnValueType": "STRING",
"value": "R."
}
],
"companyWebsite": [
{
"columnValueType": "STRING",
"accessStatus": "OK",
"columnValueType": "STRING",
"value": "www.globalpartnerships.org"
}
],
"hqZipCode": [
{
"columnValueType": "STRING",
"accessStatus": "OK",
"columnValueType": "STRING",
"value": "98101"
}
],
"weeklyUpdates": []
}
}
]
}
字符串
在使用chatgpt迭代后,我得到了以下代码。但是,我无法让它捕获包含entityId的第一个嵌套。而且,在随后的每一个巢中,我都想捕获所有的字段;例如,在“lastName”中,我希望同时捕获“name”和“Id”,类似地,在“companyName”中,我希望“name”、“Id”和“profileType”位于单独的列中。正如我在chatgpt中提到的,我不关心“columnValueType”、“accessStatus”或“unpublished”。
下面是Python代码:
import csv
import json
def extract_field_value(data):
if isinstance(data, dict):
if 'value' in data:
return str(data['value'])
elif 'columnValueType' in data and data['columnValueType'] == 'ENTITY':
return str(data['name'])
else:
values = []
for key, value in data.items():
if key not in ['columnValueType', 'accessStatus', 'unpublished']:
values.append(extract_field_value(value))
return ', '.join(values) if values else ''
elif isinstance(data, list):
values = []
for item in data:
value = extract_field_value(item)
if value:
values.append(value)
return ', '.join(values) if values else ''
else:
return str(data) if data is not None else ''
# Read the JSON data
with open('data.json') as file:
data = json.load(file)
# Extract the nested data rows
data_rows = data['dataRows']
# Extract the column headers from the first data row
column_headers = list(data_rows[0]['columnValues'].keys())
# Create a CSV file
with open('data.csv', 'w', newline='') as file:
writer = csv.writer(file)
# Write the column headers as the first row
writer.writerow(column_headers)
# Write each data row as a separate row in the CSV file
for row in data_rows:
column_values = row['columnValues']
csv_row = []
for column_header in column_headers:
values = column_values.get(column_header, [])
value = extract_field_value(values)
csv_row.append(value)
writer.writerow(csv_row)
print("CSV file created successfully.")
型
2条答案
按热度按时间tnkciper1#
我看到你试图写一个更通用的程序来弄清楚JSON的结构。我认为这会更容易,至少在一开始,因为你知道这个结构,只是让你的代码知道它。非常明确,在这种情况下,它看起来简单得多。
这种方法还利用了csv模块中的DictWriter类,因此最后一行中有标题...不需要单独跟踪。我也喜欢打字,所以我为我看到的JSON结构添加了类型提示(特别是columnValues)。
字符串
当我在你的示例JSON上运行它时,第一行最终看起来像这样:
| 最终值| Final value |
| --| ------------ |
| 41432-95P| 41432-95P |
| 假的| False |
| 假的| False |
| 假的| False |
| 假的| False |
| 假的| False |
| 假的| False |
| 麦克布雷迪| McBrady |
| 41432-95P| 41432-95P |
| 男性| Male |
| 西雅图| Seattle |
| 博士| Dr. |
| 假的| False |
| 2023-06-26 2023-06-26| 2023-06-26 |
| 全球伙伴关系| Global Partnerships |
| 56347-39| 56347-39 |
| 投资者关系| INVESTOR |
| 假的| False |
| 三个| 3 |
| 美利坚合众国Name| United States |
| 企业风险、合规和审计委员会主席兼董事会成员| Chair, Enterprise Risk, Compliance and Audit Committee and Member of the Board of Directors |
| 假的| False |
| 二个| 2 |
| 哈佛大学| Harvard University |
| 41432-95P| 41432-95P |
| 美洲国家| Americas |
| 非营利风险投资| Not-For-Profit Venture Capital |
| 华盛顿| Washington |
| Matthew McBrady博士| Matthew McBrady Ph.D |
| 41432-95P| 41432-95P |
| 西雅图,WA| Seattle, WA |
| Matthew McBrady博士担任企业风险、合规和审计委员会主席。||
| 真的| True |
| 马修| Matthew |
| 41432-95P| 41432-95P |
| 电话:+1(206)652-8773传真:+1(206)652-8773| +1 (206) 652-8773 |
| 北美洲| North America |
| 410套房| Suite 410 |
| 西大街1201号| 1201 Western Avenue |
| 电话:+1(206)456-7877传真:+1(206)456-7877| +1 (206) 456-7877 |
| n. R| R. |
| |<www.globalpartnerships.org>|
| 九八一零一| 98101 |
我知道这与您的代码所显示的内容有所不同,但我只想给予您一个粗略的概念(方法)。我相信你可以调整代码,特别是extract_columns,以满足你的需要。
nzrxty8p2#
您可以尝试以下示例如何将Json解析为DataFrame:
字符串
图纸:
型