How do I convert a large JSON file to CSV in Python?

ufj5ltwl, posted on 2023-06-19 in Python

I have a 4 GB JSON file that I need to convert to CSV. I tried the following code:

import json
import csv

csv.field_size_limit(10**9)

with open('name.json') as json_file:
    jsondata = json.load(json_file)
 
data_file = open('name.csv', 'w', newline='')
csv_writer = csv.writer(data_file)
 
count = 0
for data in jsondata:
    if count == 0:
        header = data.keys()
        csv_writer.writerow(header)
        count += 1
    csv_writer.writerow(data.values())
 
data_file.close()

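I have tried many variations of the same code and always get the error json.decoder.JSONDecodeError: Extra data: line 2 column 1 (char 3756)
I need code that reads the JSON line by line and writes the data to CSV.
UPD: a three-line sample of this file is on pastebin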


q9yhzks0 #1

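As Michael Butscher mentioned in the comments, you probably have a JSON Lines file: multiple valid JSON objects, one per line. I say "probably" because your description of the problem and the error message both point to JSON Lines, but the (formatted) JSON in the pastebin link is indented and so is no longer one object per line.
Still, as Michael said, you can open the file as usual, iterate over its lines, and load each line (as a string):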

import json

f_in = open("input.json")
for line in f_in:
    line_data = json.loads(line)  # each line is one complete JSON object

From there, you can decide how to put it into the CSV, perhaps like this:

    # still inside the for loop; writer is a csv.writer opened beforehand
    data = line_data["data"]
    writer.writerow([data["id"], data["discount"]])

Here is a complete suggestion:

import csv
import json

# newline="" stops the csv module from writing blank rows on Windows
with open("input.json") as f_in, open("output.csv", "w", newline="") as f_out:
    writer = csv.writer(f_out)
    for i, line in enumerate(f_in):
        line_data = json.loads(line)
        meta = line_data["meta"]
        data = line_data["data"]

        # Write the header once, using the keys of the first record
        if i == 0:
            writer.writerow(list(meta.keys()) + list(data.keys()))
        writer.writerow(list(meta.values()) + list(data.values()))

Given this JSON Lines input:

{"meta": {"type": "order"}, "data": {"id":  107042415, "discount":  330, "personCount":  3}}
{"meta": {"type": "order"}, "data": {"id":  107042785, "discount":  0, "personCount":  2}}
{"meta": {"type": "order"}, "data": {"id":  107042866, "discount":  0, "personCount":  1}}

I get this CSV:

type,id,discount,personCount
order,107042415,330,3
order,107042785,0,2
order,107042866,0,1
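
If the real file is indented like the pastebin sample rather than holding one object per line, iterating over lines will not work, because a single object then spans several lines. One possible workaround, sketched here on the assumption that the file is a sequence of top-level JSON objects (or arrays) separated only by whitespace, is to decode the objects one after another with json.JSONDecoder.raw_decode while reading the file in chunks. The names iter_json_objects and chunk_size are mine, for illustration only.

```python
import json


def iter_json_objects(fp, chunk_size=1 << 20):
    """Yield each top-level JSON value from a text stream of concatenated values.

    Assumes the top-level values are objects or arrays; a bare number that
    is split across two chunks could be misread as two numbers.
    """
    decoder = json.JSONDecoder()
    buffer = ""
    while True:
        chunk = fp.read(chunk_size)  # "" signals end of file
        buffer += chunk
        pos = 0
        while True:
            # raw_decode does not skip leading whitespace, so do it here
            while pos < len(buffer) and buffer[pos].isspace():
                pos += 1
            if pos == len(buffer):
                break  # buffer fully consumed
            try:
                obj, pos = decoder.raw_decode(buffer, pos)
            except json.JSONDecodeError:
                if not chunk:
                    raise  # end of file and the leftover text is not valid JSON
                break  # value is probably cut off at the chunk boundary; read more
            yield obj
        buffer = buffer[pos:]  # keep only the unparsed tail
        if not chunk:
            break
```

Each yielded object can then be written to the CSV exactly as in the loop above, for example `for line_data in iter_json_objects(f_in): ...`. Only the current chunk and any unfinished object are held in memory, although a file that is genuinely malformed would be buffered up to the end before the error is raised.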

bvpmtnay #2

You can read the JSON into a Pandas DataFrame and save the loaded DataFrame as CSV.

import pandas
df = pandas.read_json(path_or_buffer)
df.to_csv(output_path)

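See https://pandas.pydata.org/docs/reference/api/pandas.read_json.html for the full documentation.
This works as long as the JSON has no syntax errors. For a JSON Lines file like the one the question appears to describe, read_json also needs lines=True.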
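
For a 4 GB file, loading everything into one DataFrame may not fit in memory. A rough sketch of a chunked variant follows, assuming the input is JSON Lines with flat records (nested objects such as "meta" and "data" would end up as dictionaries inside single columns). The function name jsonl_to_csv_chunked and the default chunk size are my own choices and not part of pandas.

```python
import pandas as pd


def jsonl_to_csv_chunked(json_path, csv_path, chunksize=100_000):
    """Convert a JSON Lines file to CSV a fixed number of rows at a time."""
    # lines=True treats every line as a separate JSON object;
    # chunksize makes read_json return an iterator of DataFrames.
    reader = pd.read_json(json_path, lines=True, chunksize=chunksize)
    for i, chunk in enumerate(reader):
        chunk.to_csv(
            csv_path,
            mode="w" if i == 0 else "a",  # overwrite on the first chunk, then append
            header=(i == 0),              # write the header row only once
            index=False,
        )
```

Column types are inferred per chunk, so a column may be formatted differently from one chunk to the next (for example integers becoming floats once a value is missing). If that matters, the dtype argument of read_json can pin the types.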


j0pj023g #3

import json
import csv

csv.field_size_limit(10**9)

# Open the CSV file for writing.
data_file = open('name.csv', 'w', newline='')
csv_writer = csv.writer(data_file)

# Flag to indicate whether the header row has been written to the CSV file.
header_written = False

# Open the JSON file for reading.
with open('name.json', 'r') as json_file:
    for line in json_file:
        # Parse the JSON data for the current line.
        data = json.loads(line)

        # Write the header row if it hasn't been written yet.
        if not header_written:
            csv_writer.writerow(data.keys())
            header_written = True

        # Write the data row.
        csv_writer.writerow(data.values())

data_file.close()

Could you try this?
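
The code above assumes every line holds an object with the same keys in the same order, and that the file has no blank lines. If that is not guaranteed, a variation based on csv.DictWriter may be more robust. This is only a sketch: the function name jsonl_to_csv_dictwriter is hypothetical, and the caller is assumed to know the column names in advance and pass them as fieldnames.

```python
import csv
import json


def jsonl_to_csv_dictwriter(json_path, csv_path, fieldnames):
    """Stream a JSON Lines file into a CSV with a fixed set of columns."""
    with open(json_path, encoding="utf-8") as f_in, \
         open(csv_path, "w", newline="", encoding="utf-8") as f_out:
        writer = csv.DictWriter(
            f_out,
            fieldnames=fieldnames,
            restval="",             # value used when a record lacks a key
            extrasaction="ignore",  # silently drop keys not in fieldnames
        )
        writer.writeheader()
        for line in f_in:
            line = line.strip()
            if not line:  # skip blank lines, which json.loads would reject
                continue
            writer.writerow(json.loads(line))
```

A call might look like `jsonl_to_csv_dictwriter('name.json', 'name.csv', ['id', 'discount'])`. Because DictWriter matches values to columns by key, the order of keys within each JSON object no longer matters.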
