我有超过1000个CSV文件(我们称之为辅助文件),具有相同的结构(压缩大小从400 KB到3531 KB不等)
SECONDARY文件的名称对应于将它们连接到具有另一种信息类型的主CSV文档的键。
我需要在SECONDARY文件的几列上执行一些功能(有些值在整个文件中重复,需要进行分组)。
后来,我将需要得到一些信息,从次要文件考虑到一些信息,从主文件。
我试过了:
import pandas as pd
import sqlite3
import os
import re
# Set the directory where your CSV files are located
csv_directory = r"path"
# Connect to SQLite database (or create a new one if it doesn't exist)
conn = sqlite3.connect("db.db")
cursor = conn.cursor()
# Get a list of all CSV files in the directory
csv_files = [file for file in os.listdir(csv_directory) if file.endswith(".csv")]
# Define the master table name
master_bus = "master_table"
# Replace or remove invalid characters from the master table name
master_bus = re.sub(r'\W+', '', master_bus)
# loop through each CSV file and create a table in the database
for csv_file in csv_files:
table_name = os.path.splitext(csv_file)[0]
csv_path = os.path.join(csv_directory, csv_file)
# Read CSV file into a pandas DataFrame
df = pd.read_csv(csv_path)
# Add a new column with the filename
df['filename'] = os.path.basename(csv_file)
# Write the DataFrame to SQLite
df.to_sql(table_name, conn, index=False)
# Concatenate all tables into the master table
tables = cursor.execute("SELECT name FROM sqlite_master WHERE type='table';").fetchall()
tables = [table[0] for table in tables]
# Create the master table
cursor.execute(f"CREATE TABLE '{master_bus}' AS SELECT * FROM {tables[0]} WHERE 0;")
# Concatenate data from all tables into the master table
for table in tables:
cursor.execute(f"INSERT INTO {master_bus} SELECT * FROM {table};")
# Commit changes and close connection
conn.commit()
conn.close()
字符串
它有点工作。我可以用另一个代码为每个csv文件创建一个带有表的DB。但是当我试图将文件的名称添加到一个新列中时,并创建一个主表时,会发生这种情况:
OperationalError Traceback (most recent call last)
c:\Users\acuna\Desktop\Computer Science\Project_Bus.ipynb Cell 17 line 4
37 tables = [table[0] for table in tables]
39 # Create the master table
---> 40 cursor.execute(f"CREATE TABLE '{master_bus}' AS SELECT * FROM {tables[0]} WHERE 0;")
42 # Concatenate data from all tables into the master table
43 for table in tables:
OperationalError: near "-": syntax error
型
1条答案
按热度按时间mwkjh3gx1#
您的代码中的错误是您试图创建的表名中的无效字符。在SQLite中,表名不能包含特殊字符(如-),因此您需要在创建表之前删除或替换它们。
字符串