pandas: How to run an existing Python script on multiple similar files and produce new, separate DataFrames

oknwwptz  posted on 2023-08-01 in Python

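I have an existing Python script that currently ingests one Excel file. I want to adapt the script so that it ingests four Excel files and performs the same operations as the original script, but creates four separate DataFrames at the end, each named so that it bears some resemblance to its original input file.
So far it isn't working. I get an error at the part with the file names: file_names = ["file1.xlsx", "file2.xlsx", "file3.xlsx", "file4.xlsx"].
I'm sure it's wrong, but I don't know what to change.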

import numpy as np
import pandas as pd
import os
import re
from itertools import product
import glob
import xlwings as xw
import openpyxl as op

data1 = {'Column A':[3, 17, 12, 142],
        'Column B':[20, 21, 19, 18],
        'Column C':['Year1','Year1','Year1','Year1']}
data2 = {'Column A':[192, 14, 11, 984],
        'Column B':[200, 221, 119, 158],
       'Column C':['Year2','Year2','Year2','Year2'] }
data3 = {'Column A':[2, 99, 41, 67],
        'Column B':[20, 25, 190, 187],
        'Column C':['Year3','Year3','Year3','Year3']}
data4 = {'Column 4A':[25, 919, 441, 672],
        'Column 4B':[21, 45, 100, 107],
        'Column 4C':['Year4','Year4','Year4','Year4']}

df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)
df3 = pd.DataFrame(data3)
df4 = pd.DataFrame(data4)

df4.rename(columns={'Column 4A': 'Column A', 'Column 4B': 'Column B', 'Column 4C': 'Column C'}, inplace=True)

file_names = [df1, df2, df3, df4]

def ingest_excel_files(file_names):
    """Ingests four Excel files and returns a dictionary of data frames."""
    data_frames = {}
    for file_name in file_names:
        data_frames[file_name] = pd.read_excel(file_name)
    return data_frames

def perform_operation(data_frames):
    for file_name, data_frame in data_frames.items():
        numbers = data_frame
        numbers_copy_new= numbers.copy()
        
        numbers["New Column"] = numbers["Column A"] + numbers["Column B"]

def create_dataframes(data_frames):
    """Creates data frames for each of the original files."""
    for file_name, data_frame in data_frames.items():
        df = pd.DataFrame(data_frame)
        df.to_excel(f"{file_name}.xlsx")      

if __name__ == "__main__":
    file_names = ["file1.xlsx", "file2.xlsx", "file3.xlsx", "file4.xlsx"]
    data_frames = ingest_excel_files(file_names)
    perform_operation(data_frames)
    create_dataframes(data_frames)

I'm expecting/looking for four new DataFrames, each with the name of the original DataFrame plus "_new" (I know my script doesn't have any reference to this yet):

df1["New Column"] = df1["Column A"] + df1["Column B"]
print(df1)
df1_new = df1

df2["New Column"] = df2["Column A"] + df2["Column B"]
print(df2)
df2_new = df2

df3["New Column"] = df3["Column A"] + df3["Column B"]
print(df3)
df3_new = df3

df4["New Column"] = df4["Column A"] + df4["Column B"]
print(df4)
df4_new = df4

i86rm4rw


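Explanation

  • Moved the code that does the work into a separate function called perform_operation. This function takes a single DataFrame, adds the new column, and returns the modified DataFrame.
  • create_dataframes iterates over the original DataFrames, applies the operation to each one, and stores the new DataFrames in a dictionary whose keys are the original names plus "_new".
  • We create a dictionary, data_frames, that holds the original DataFrames under appropriate keys.
  • We call create_dataframes with data_frames, and it returns a dictionary containing the new DataFrames.
  • Finally, we loop over the new DataFrames and save each one as a separate Excel file named after its key.
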
def perform_operation(data_frame):
    data_frame["New Column"] = data_frame["Column A"] + data_frame["Column B"]
    return data_frame

def create_dataframes(data_frames):
    new_data_frames = {}
    for file_name, data_frame in data_frames.items():
        new_df = perform_operation(data_frame)
        new_data_frames[f"{file_name}_new"] = new_df
    return new_data_frames

if __name__ == "__main__":
    data_frames = {"df1": df1, "df2": df2, "df3": df3, "df4": df4}
    new_data_frames = create_dataframes(data_frames)
    
    # Save each new data frame as an Excel file
    for file_name, data_frame in new_data_frames.items():
        data_frame.to_excel(f"{file_name}.xlsx", index=False)
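One thing to be aware of: perform_operation above adds the column to the DataFrame it receives, so each "_new" entry is the same object as the original (which matches `df1_new = df1` in the question). If the originals should stay untouched instead, a minimal variant could work on a copy. This is only a sketch under that assumption; the name perform_operation_on_copy and the two-row sample frame are made up for illustration.

```python
import pandas as pd


def perform_operation_on_copy(data_frame):
    # Copy first so the caller's DataFrame is not modified
    new_df = data_frame.copy()
    new_df["New Column"] = new_df["Column A"] + new_df["Column B"]
    return new_df


# Small made-up frame, only for illustration
df1 = pd.DataFrame({"Column A": [3, 17], "Column B": [20, 21]})
df1_new = perform_operation_on_copy(df1)

print("New Column" in df1.columns)     # False: the original is unchanged
print(df1_new["New Column"].tolist())  # [23, 38]
```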

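Assumptions:

  • Each DataFrame has three columns named "Column A", "Column B", and "Column C". The operation you want to perform is to create a new column, "New Column", containing the sum of "Column A" and "Column B".
  • The output Excel files have the same data structure as the original DataFrames.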
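
Since the question starts from Excel files rather than in-memory DataFrames, the same idea can probably be applied with file names as input. The following is only a sketch under a few assumptions: the files have the columns listed above, the part of the file name before the extension (pathlib's `stem`) is used as the dictionary key, and the helper names output_path and process_files are made up here. perform_operation is the same as in the code above.

```python
from pathlib import Path

import pandas as pd


def perform_operation(data_frame):
    # Same operation as above: adds the column to the frame it receives
    data_frame["New Column"] = data_frame["Column A"] + data_frame["Column B"]
    return data_frame


def output_path(file_name):
    """Build the output path, e.g. "file1.xlsx" -> "file1_new.xlsx"."""
    path = Path(file_name)
    return path.with_name(f"{path.stem}_new{path.suffix}")


def process_files(file_names):
    """Read each Excel file, add the column, and write "<stem>_new.xlsx"."""
    new_data_frames = {}
    for file_name in file_names:
        # Reading and writing .xlsx needs an engine such as openpyxl
        data_frame = pd.read_excel(file_name)
        new_df = perform_operation(data_frame)
        new_df.to_excel(output_path(file_name), index=False)
        # Key by the stem so results can be looked up as "file1_new", ...
        new_data_frames[f"{Path(file_name).stem}_new"] = new_df
    return new_data_frames
```

It would be called as `new_data_frames = process_files(["file1.xlsx", "file2.xlsx", "file3.xlsx", "file4.xlsx"])`, after which `new_data_frames["file1_new"]` holds the first result. A file whose headers differ (like the "Column 4A" columns of df4 in the question) would need a rename before perform_operation is applied.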
