我有麻烦循环通过我的. zip文件(其中5个)在我的目录,其中有多个工作表在他们。
1.我想计算每个. zip文件中的记录数(附加每张表的记录数)
现在我刚刚得到了下面输出的第一行。我确实认为我在df = df[:1]
这一行有一个问题,因为我认为它只限于一行。有什么想法吗?
输出应该类似于-
代码:
# Import File Function
def read_excel_sheets(xls_path):
"""Read all sheets of an Excel workbook and return a single DataFrame"""
xl = pd.ExcelFile(xls_path)
df = pd.DataFrame()
for idx, name in enumerate(xl.sheet_names):
sheet = xl.parse(name, header=None, dtype=str, ignore_index=True)
# Drop Empty Columns
sheet.dropna(axis=1, how='all', inplace=True)
# Add sheet name as column
sheet['sheet'] = name.split(" ")[-1]
# Appending Data
if len(df.columns) >= len(sheet.columns):
df = df.append(sheet, ignore_index=True, sort=True)
else:
df = sheet.append(df, ignore_index=True, sort=True)
del sheet
return df
# Process File Function
def process_files(list_of_files):
df = pd.DataFrame()
for file in list_of_files:
# zip file handler
zip = zipfile.ZipFile(file)
# list available files in the container
zfiles = zip.namelist()
#extensions to process
extensions = (".xls",".xlsx")
# Importing to Dataframe
for zfile in zfiles:
if zfile.endswith(extensions):
# For Row Count Check
df_temp = read_excel_sheets(zip.open(zfile))
df = df.append(df_temp, ignore_index=True, sort=True)
df['Length'] = len(df)
df['FileName'] = file
df = df[['Length','FileName']]
df = df[:1]
return df
input_location = r'O:\Stack\Over\Flow'
month_to_process = glob.glob(input_location + "\\2022 - 10\\*.zip")
df = process_files(month_to_process)
print(df)
1条答案
按热度按时间nwsw7zdq1#
全局化路径中所有文件名包含“Test”的zip文件。循环文件和工作表创建一个字典。将字典转换为Pandasdf。
您必须更改文件路径和文件名:
输出: