How to export a DataFrame to JSON in append mode (Python pandas)?

Posted by mmvthczy on 2022-12-15 in Python


I have an existing JSON file that contains a list of dictionaries.

$ cat output.json
[{"a":1, "b":2}, {"a":2, "b":3}]

I have a DataFrame:

df = pd.DataFrame({'a': pd.Series([1, 2], index=list('CD')),
                   'b': pd.Series([3, 4], index=list('CD'))})

I want to save "df" with to_json so that it is appended to the file output.json:

df.to_json('output.json', orient='records')  #  mode='a' not available for to_json

to_csv has an append mode (mode='a'), but to_json does not.

The expected resulting output.json file is:

[{"a":1, "b":2}, {"a":2, "b":3}, {"a":1, "b":3}, {"a":2, "b":4}]

The existing output.json file may be huge (say, terabytes). Is it possible to append the new DataFrame result without loading the file?


Source: https://stackoverflow.com/questions/30227872/how-to-export-dataframe-to-json-in-append-mode-python-pandas

5 Answers


blmhpbnm1#

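No, you cannot append to a JSON file without rewriting the whole file, whether you use pandas or the json module. You could modify the file "manually" by opening it in a mode, finding the right position, and inserting your data, but I would not recommend that. If your file is going to be larger than your RAM, you are better off using a file format other than JSON.
A related answer may also help. It does not create a valid JSON file (instead, each line is a JSON string), but its goal is very similar to yours.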
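To make the "manual" approach concrete, here is a minimal sketch. It assumes the file holds a single top-level JSON array whose last non-whitespace byte is `]`, and the helper name `append_records` is hypothetical. It only touches the tail of the file, so the existing content is never loaded into memory.

```python
import json
import os


def append_records(path, records):
    """Append a list of dicts to a JSON array file by rewriting only its tail.

    Assumption: the file contains one top-level JSON array that ends with ']'
    (optionally followed by whitespace).
    """
    if not records:
        return
    # Serialize the new records and drop the surrounding brackets.
    payload = json.dumps(records)[1:-1].encode("utf-8")

    with open(path, "rb+") as f:
        # Walk backwards from the end of the file to the closing bracket.
        f.seek(0, os.SEEK_END)
        pos = f.tell()
        while pos > 0:
            pos -= 1
            f.seek(pos)
            ch = f.read(1)
            if ch == b"]":
                break
            if not ch.isspace():
                raise ValueError("file does not end with a JSON array")
        else:
            raise ValueError("no closing bracket found")

        # Look at the previous non-whitespace byte to detect an empty array.
        prev = pos
        is_empty = False
        while prev > 0:
            prev -= 1
            f.seek(prev)
            ch = f.read(1)
            if not ch.isspace():
                is_empty = ch == b"["
                break

        # Cut off the closing bracket, then write the new records and close again.
        f.seek(pos)
        f.truncate()
        f.write((b"" if is_empty else b", ") + payload + b"]")
```

With a DataFrame, one way to call it is `append_records("output.json", json.loads(df.to_json(orient="records")))`. As the answer warns, this is fragile: if the process is interrupted between the truncate and the write, the file is left without its closing bracket.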


3gtaxfhh2#

Maybe you need to think about it in terms of orient='records':

import pandas as pd

def to_json_append(df, file):
    '''
    Append df to file as JSON Lines. Load the file with
    pd.read_json(file, orient='records', lines=True)
    '''
    df.to_json('tmp.json', orient='records', lines=True)
    # Append the contents of the temporary file to the target file
    with open('tmp.json', 'r') as f:
        k = f.read()
    with open(file, 'a') as f:
        f.write('\n')  # Prepare next data entry
        f.write(k)

df = pd.read_json('output.json')
# Save again as lines
df.to_json('output.json', orient='records', lines=True)
# New data
df = pd.DataFrame({'a': pd.Series([1, 2], index=list('CD')),
                   'b': pd.Series([3, 4], index=list('CD'))})
# Append:
to_json_append(df, 'output.json')

Load the full data:

pd.read_json('output.json',orient='records',lines=True)
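Because the question mentions files too large for memory, it may be worth noting that a line-delimited file does not have to be read back in one go either. The sketch below relies on the `chunksize` argument of `pd.read_json`, which is available when `lines=True`; the helper name `iter_json_lines` and the default chunk size are assumptions made for illustration.

```python
import pandas as pd


def iter_json_lines(path, chunksize=100_000):
    """Yield DataFrames of at most `chunksize` rows from a JSON Lines file."""
    # With lines=True and a chunksize, read_json returns an iterator of
    # DataFrames instead of a single DataFrame.
    reader = pd.read_json(path, orient="records", lines=True, chunksize=chunksize)
    for chunk in reader:
        yield chunk
```

Each chunk can then be processed and discarded, for example `total = sum(chunk["a"].sum() for chunk in iter_json_lines("output.json"))`.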


bwleehnv3#

I solved it using only built-in pandas.DataFrame methods. Keep performance in mind when the DataFrame is huge (there are ways to deal with that). Code:

import os
import pandas as pd

# dir_to_json_file (a path) and df_to_append (a DataFrame) are defined elsewhere
if os.path.isfile(dir_to_json_file):
    # If the file exists, read it
    df_read = pd.read_json(dir_to_json_file, orient='index')
    # Add the data that you want to save
    df_read = pd.concat([df_read, df_to_append], ignore_index=True)
    # Drop duplicates in case too much unnecessary data was added (if you need)
    df_read.drop_duplicates(inplace=True)

    # Save it to the JSON file in AppData.bin
    df_read.to_json(dir_to_json_file, orient='index')
else:
    df_to_append.to_json(dir_to_json_file, orient='index')


ctrmrzij4#

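For example, writing a large amount of data to a JSON file with little memory:
Suppose we have 1,000 DataFrames, each equivalent to 1,000,000 lines of JSON and taking 100 MB, so the total file size is 1,000 * 100 MB = 100 GB.
Solution:
1. Use a buffer to hold the content of each DataFrame
2. Use pandas to convert it to text
3. Write the text to the end of the file in append mode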

import io
import pandas as pd
from pathlib_mate import Path

n_lines_per_df = 10
n_df = 3
columns = ["id", "value"]
value = "alice@example.com"
f = Path(__file__).change(new_basename="big-json-file.json")
if not f.exists():
    for nth_df in range(n_df):
        data = list()
        for nth_line in range(nth_df * n_lines_per_df, (nth_df + 1) * n_lines_per_df):
            data.append((nth_line, value))
        df = pd.DataFrame(data, columns=columns)
        buffer = io.StringIO()
        df.to_json(
            buffer,
            orient="records",
            lines=True,
        )
        with open(f.abspath, "a") as file:
            file.write(buffer.getvalue())


nhhxz33t5#

You can do it like this. It writes each record/row as JSON on its own line.

with open(outfile_path, mode="a") as f:
    for chunk_df in data:
        f.write(chunk_df.to_json(orient="records", lines=True))
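A self-contained variant of the same idea is sketched below. The helper name `append_json_lines` is hypothetical. It also normalizes the trailing newline, on the assumption that the text returned by `to_json(lines=True)` may or may not end with one depending on the pandas version; without that guard, the last record of one chunk could end up on the same line as the first record of the next.

```python
import pandas as pd


def append_json_lines(df, path):
    """Append the rows of df to path as JSON Lines, one record per line."""
    # With no path argument, to_json returns the serialized text.
    text = df.to_json(orient="records", lines=True).strip()
    if not text:
        return  # nothing to write for an empty DataFrame
    with open(path, "a", encoding="utf-8") as f:
        # Always end with exactly one newline so the next append starts cleanly.
        f.write(text + "\n")
```

Calling `append_json_lines(df, "output.json")` repeatedly grows the file by one line per row, and the result can be read back with `pd.read_json("output.json", orient="records", lines=True)`.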

