python-3.x: Why does this key-value pair reduction run at only 14k it/s?

qlvxas9a  posted on 2023-01-27 in Python

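I have a function that reads a file of JSON records, one JSON object per line. Each line has about 78 key-value pairs, and I use

data={key:data.get(key,"") for key in keys}

to reduce each record to the key-value pairs I need, then upload the result to DuckDB.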

import json

import pandas as pd
from tqdm import tqdm

def upload(file, conn):
    temp = []
    # The 35 keys to keep ('x1' .. 'x35') out of the ~78 on each line.
    keys = [f'x{i}' for i in range(1, 36)]
    for line in tqdm(file):
        data = json.loads(line)
        # Keep only the wanted keys; missing keys default to "".
        data = {key: data.get(key, "") for key in keys}
        temp.append(data)
    # DuckDB resolves the local DataFrame `df` by name in the query.
    df = pd.DataFrame(temp)
    conn.execute('INSERT INTO Main SELECT * FROM df')

The DuckDB part of the code runs very fast, but the loop only reaches about 14,000 it/s. How can I speed it up?

lymnna71


Your file content appears to look like this:

{"x1": "value11", "x2": "value21", ...}
{"x1": "value12", "x2": "value21", ...}

You can convert it to a single JSON array:

[
  {"x1": "value11", "x2": "value21", ...},
  {"x1": "value12", "x2": "value21", ...}
]

and then load it with pd.read_json(file_path).
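As a possible shortcut, pandas can also read the one-object-per-line layout directly with `lines=True`, so the conversion to an array may not be needed. The sketch below is only an illustration under some assumptions: the helper name `load_selected` and the sample records are hypothetical, and `fillna("")` also replaces explicit JSON `null` values, which differs slightly from `data.get(key, "")`. Note that `pd.read_json` infers dtypes by default, so the resulting column types may differ from those produced by the original loop.

```python
import io

import pandas as pd


def load_selected(source, keys):
    """Read JSON Lines from a path or file-like object and keep only `keys`.

    Hypothetical helper, not part of the original question or answer.
    """
    # lines=True tells pandas to parse one JSON object per line.
    df = pd.read_json(source, lines=True)
    # reindex drops unwanted columns and adds absent ones as NaN;
    # fillna("") then approximates the data.get(key, "") default.
    return df.reindex(columns=keys).fillna("")


# Made-up sample data standing in for the real file.
sample = io.StringIO(
    '{"x1": "a", "x2": "b", "extra": 1}\n'
    '{"x1": "c", "extra": 2}\n'
)
df = load_selected(sample, ["x1", "x2", "x3"])
print(df)
```

The resulting `df` could then be passed to the existing `conn.execute('INSERT INTO Main SELECT * FROM df')` call. Whether this is actually faster than the loop depends on the data, so it is worth timing both on a sample of the real file.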
