Running Python to extract XML data and map it to a MySQL table: DataFrame problem

Posted by 2fjabf4q on 2023-02-26 in Python

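I started from a code snippet in another answer. My problem is converting the DataFrame to a MySQL table so that all rows show up in the database. See the details below...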

#reading mapping file and converting mapping to dictionary
import os
import pandas as pd
from sqlalchemy import create_engine
map_path = r'D:\StackExchange\Mapping.xlsx'  # raw string so backslashes are not treated as escapes
if os.path.isfile(map_path):
    #map_df = pd.read_excel(map_path,worksheet='Mapping')
    map_df = pd.read_excel(map_path)
    mapping_dict = pd.Series(map_df['XML Columns'].values,index=map_df['SQL Columns']).to_dict()

#Reading XML file

import xml.etree.ElementTree as ET
xml_path = r'D:\StackExchange\PostLinks.xml'  # raw string so backslashes are not treated as escapes
if os.path.isfile(xml_path):
        root = ET.parse(xml_path).getroot()

#Reading xml elements one by one and storing attributes in a dictionary.
#line 23 "if k in ['', '']:" <----- 'COLUMNname','COLUMNname','COLUMNname'
missing_col = set()  # a set, because .add() is called on it below
xmldf_dict = {"df_dicts":[]}
for elem in root:
    df_dict = {}
    for k,v in mapping_dict.items():
        if k in ['Body']:
            continue
        try:
            df_dict[k] =  elem.attrib[v]
        except KeyError:
            missing_col.add(k)

    xmldf_dict["df_dicts"].append(df_dict)

#Merging missing columns dataframe with xml dataframe

missing_col_df = pd.DataFrame(columns=missing_col)
xml_df = pd.DataFrame(xmldf_dict["df_dicts"])
final_df = pd.concat([xml_df,missing_col_df],axis=1)
#print(final_df)

my_conn=create_engine("mysql+mysqldb://sqluser:password@localhost/stackexchange_project")
df = pd.DataFrame(xmldf_dict["df_dicts"],index=0)
df.to_sql(con=my_conn,name='postlinks',if_exists='append',index=False)

The second-to-last line:

df = pd.DataFrame(xmldf_dict["df_dicts"],index=0)

I am trying to load all of the XML data into a newly created table in MySQL. I get three results:

  • MySQL error: Error Code: 2013. Lost connection to MySQL server during query

I have tried adjusting the code; sometimes it also returns the header and one row, or just the header. Thank you for your time!

mqkwyuun #1

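"Lost connection to MySQL server during query" means the connection to the MySQL server was dropped while the to_sql() method was executing.
I suggest writing to the server in chunks: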

import os  # needed for os.path.isfile below
import xml.etree.ElementTree as ET
import pandas as pd
from sqlalchemy import create_engine

xml_path = r'D:\StackExchange\PostLinks.xml'  # raw string so backslashes are not treated as escapes
if os.path.isfile(xml_path):
    root = ET.parse(xml_path).getroot()

# Define column mapping
mapping_dict = {
    "Id": "Id",
    "CreationDate": "CreationDate",
    "PostId": "PostId",
    "RelatedPostId": "RelatedPostId",
    "LinkTypeId": "LinkTypeId"
}

# Read xml elements one by one and store attributes in a dictionary
missing_col = set()
xmldf_dict = {"df_dicts":[]}
for elem in root:
    df_dict = {}
    for k, v in mapping_dict.items():
        if k == 'Body':
            continue
        try:
            df_dict[k] = elem.attrib[v]
        except KeyError:
            missing_col.add(k)

    xmldf_dict["df_dicts"].append(df_dict)

# Merge missing columns dataframe with xml dataframe
missing_col_df = pd.DataFrame(columns=sorted(missing_col))  # recent pandas versions reject a set for columns
xml_df = pd.DataFrame(xmldf_dict["df_dicts"])
final_df = pd.concat([xml_df, missing_col_df], axis=1)

# Write data to MySQL database
my_conn = create_engine("mysql+mysqldb://sqluser:password@localhost/stackexchange_project")

# Set chunksize to insert the data in smaller chunks
chunksize = 1000

for i in range(0, len(xmldf_dict["df_dicts"]), chunksize):
    chunk = xmldf_dict["df_dicts"][i:i+chunksize]
    df = pd.DataFrame(chunk)
    df.to_sql(con=my_conn, name='postlinks', if_exists='append', index=False)

print("Data written to MySQL database successfully!")
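
As an alternative to slicing the list by hand, DataFrame.to_sql also accepts a chunksize argument that writes the rows in batches. The following is only a minimal sketch: the sample rows are made up, and an in-memory SQLite connection stands in for the MySQL engine so the snippet runs without a server. With MySQL you would pass my_conn instead.

```python
import sqlite3
import pandas as pd

# Hypothetical sample rows standing in for xmldf_dict["df_dicts"]
rows = [
    {"Id": str(i), "PostId": str(i * 10), "RelatedPostId": str(i * 10 + 1), "LinkTypeId": "1"}
    for i in range(1, 2501)
]
df = pd.DataFrame(rows)

# Assumption: in-memory SQLite is used only so the sketch is self-contained;
# for MySQL, pass the SQLAlchemy engine (my_conn) as con instead.
conn = sqlite3.connect(":memory:")

# chunksize tells pandas to write the rows in batches of 1000
df.to_sql(name="postlinks", con=conn, if_exists="append", index=False, chunksize=1000)

row_count = conn.execute("SELECT COUNT(*) FROM postlinks").fetchone()[0]
print(row_count)
```

Whether this is enough to avoid error 2013 on a given server depends on its timeout and packet-size settings, so the batch size may need tuning.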

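The only problem with appending chunk by chunk like this is that if you run the code more than once, it keeps appending new rows to the existing table, which can produce duplicate rows.
You can drop the table and recreate it before running the code again, or use another approach such as if_exists='replace' to replace the whole table with the new data. Note that inside the chunk loop, if_exists='replace' would recreate the table on every chunk and leave only the last one, so it only makes sense for a single to_sql call or for the first chunk.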
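
One possible way to make the load safe to re-run is to replace the table with the first chunk and append the remaining chunks. This is a sketch under assumptions, not the answerer's code: write_chunks is a hypothetical helper name, the records are made up, and in-memory SQLite stands in for the MySQL engine.

```python
import sqlite3
import pandas as pd

def write_chunks(records, con, table, chunksize=1000):
    """Hypothetical helper: replace the table with the first chunk, append the rest."""
    for start in range(0, len(records), chunksize):
        chunk_df = pd.DataFrame(records[start:start + chunksize])
        # Only the first chunk drops and recreates the table
        mode = "replace" if start == 0 else "append"
        chunk_df.to_sql(name=table, con=con, if_exists=mode, index=False)

# Made-up records standing in for xmldf_dict["df_dicts"]
records = [{"Id": str(i), "PostId": str(i + 100)} for i in range(2500)]

# Assumption: SQLite in memory instead of MySQL, so the sketch runs anywhere
conn = sqlite3.connect(":memory:")

# Running the load twice should not duplicate rows
write_chunks(records, conn, "postlinks")
write_chunks(records, conn, "postlinks")

row_count = conn.execute("SELECT COUNT(*) FROM postlinks").fetchone()[0]
print(row_count)
```

Keep in mind that if_exists='replace' drops the existing table, so any column types, keys, or indexes defined on it by hand are lost and pandas infers the schema again.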
