尝试将多个Excel文件中的特定列(包括动态最后一行)合并到一个 Dataframe 中

but5z9lq  于 2023-03-04  发布在  其他
关注(0)|答案(2)|浏览(106)

我尝试将14个Excel文件中的数据合并到一个数据框中,并将数据框保存为CSV文件。我正在遍历Excel文件,但没有任何数据被合并到单个数据框中。我认为问题出在代码动态查找每个Excel文件中的最后一行。所有我想要合并的数据都在CB:DL列中,从第6行开始,向下大约100k行,所有Excel文件都在不同的行号结束。
下面是我正在测试的代码。

#import modules
import pandas as pd
import glob
from openpyxl import Workbook
from openpyxl import load_workbook as xw
from openpyxl.utils import get_column_letter

# path of the folder
path = r'C:\\All Raw Data\\'
  
# reading all the excel files
filenames = glob.glob(path + "\\*.xlsx")

 
# to iterate excel file one by one 
# inside the folder
for file in filenames:
    print(file)
    
    #print('File names:', filenames)
      
    # initializing empty data frame
    finalexcelsheet = pd.DataFrame()

    wb = Workbook(file)
    print(wb)
    for sheet in wb:
        ws = wb.sheet["Speech"]
        print(ws)
    
        for col in range(1, ws.max_column + 1):
            col_letter = get_column_letter(col)
            max_col_row = len([cell for cell in ws[col_letter] if cell.value])
            print("Column: {}, Row numbers: {}".format(col_letter, max_col_row))
                
            
            # combining multiple excel worksheets into single data frames
            df = pd.concat(pd.read_excel(file, sheet_name=None, header=6, usecols='CB'+max_col_row+':DL'+max_col_row), ignore_index=True, sort=False)
            print(df.shape)
            
            # appending excel files one by one
            merged= finalexcelsheet.append(df, ignore_index=True)
  
# to print the combined data
print(merged.shape)

merged.to_csv('C:\\All Raw Data\\merged.csv')
n3schb8v

n3schb8v1#

我对你的代码做了一点修改,但我想你可以试试这个:

import pandas as pd
import glob
from openpyxl import load_workbook
from openpyxl.utils import get_column_letter

path = r'C:\\All Raw Data\\'
filenames = glob.glob(path + "\\*.xlsx")
dfs = []

for file in filenames:
    print(file)
    wb = load_workbook(file)
    for sheetname in wb.sheetnames:
        ws = wb[sheetname]
        print(ws)
        max_row = ws.max_row
        cols = 'CB:DL'
        df = pd.read_excel(file, sheet_name=sheetname, header=6, usecols=cols, nrows=max_row-5)
        dfs.append(df)

merged = pd.concat(dfs, ignore_index=True)
print(merged.shape)
merged.to_csv('C:\\All Raw Data\\merged.csv')
emeijp43

emeijp432#

下面是最终为我工作的代码。感谢巨大的帮助!!没有你的指导和支持,arngrim 280!!

import pandas as pd
import glob
from openpyxl import load_workbook
from openpyxl.utils import get_column_letter

path = r'C:\\All Raw Data\\'
filenames = glob.glob(path + "*.xlsx")
dfs = []

for file in filenames:
    print(file)
    wb = load_workbook(file)
    for sheetname in wb.sheetnames:
        ws = wb[sheetname]
        #print(ws)
        max_row = ws.max_row
        print(max_row)
        cols = 'CB:DL'
        df = pd.read_excel(file, sheet_name=sheetname, header=5, usecols=cols, nrows=max_row)
        df['filename'] = file
        print(df.head())
        dfs.append(df)

merged = pd.concat(dfs, ignore_index=True)
print(merged.shape)
merged.to_csv('C:\\All Raw Data\\merged.csv')

相关问题