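I need to create a subroutine that loops through a collection of Outlook messages, opens the attachments, and extracts any tabular data inside a zip folder into pandas DataFrames. To get the tabular data, I created a function called zip_to_dfs that takes an Outlook MailItem attachment as its argument.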
# function to extract tabular data within a zip file to pandas DataFrames.
# returns a dictionary object (key=filename; value=pandas df)
import pandas as pd, zipfile, tempfile, os

def zip_to_dfs(attachment, extract_fn=None):
    # returns a dictionary with filenames as keys and DataFrames from the attached files as values
    df_objects = {}
    tmp = tempfile.TemporaryFile().name
    attachment.SaveAsFile(tmp)
    if zipfile.is_zipfile(tmp):
        zf = zipfile.ZipFile(tmp)
        # the loop below could be moved to a separate function (read tabular to df) to make this more readable
        for file in zf.infolist():
            extension = os.path.splitext(file.filename)[1]
            if extension in ['.xls', '.xlsx', '.xlsm']:
                temp_df = pd.read_excel(zf.open(file.filename), header=None)
                df_objects.update({file.filename: temp_df})
            elif file.filename.endswith(".csv"):
                temp_df = pd.read_csv(zf.open(file.filename), header=None)
                df_objects.update({file.filename: temp_df})
            else:
                raise NotImplementedError('Unexpected filetype: ' + str(file.filename))
    else:
        raise NotImplementedError('Expected zip file')
    return df_objects
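The function works as expected, but it is probably not very efficient. Has anyone used the tempfile or zipfile libraries? If so, do you know whether ZipFile and TemporaryFile clean up after themselves automatically, or are these files left open on disk? Do you see any other obvious problems with this approach?
Edited version of the code: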
import pandas as pd
from tempfile import NamedTemporaryFile
from zipfile import ZipFile

# isexcel is a helper that is used here but not shown in this post
def zipattach_to_dfs(attachment, extract_fn=None):
    # evaluates zip file attachments and returns a dictionary with file names as keys and DataFrames as values
    df_objects = {}
    with NamedTemporaryFile(suffix='.tmp', delete=False) as tmp:
        attachment.SaveAsFile(tmp.name)
        zf = ZipFile(tmp)
        for file in zf.infolist():
            datetime = file.date_time
            # key combines the file name with the member's (year-month-day) timestamp
            key = f'{file.filename}({datetime[0]}-{datetime[1]}-{datetime[2]})'
            if isexcel(file):
                temp_df = pd.read_excel(zf.open(file.filename), header=None)
                df_objects.update({key: temp_df})
            elif file.filename.endswith(".csv"):
                temp_df = pd.read_csv(zf.open(file.filename), header=None)
                df_objects.update({key: temp_df})
            else:
                raise NotImplementedError('Unexpected filetype: ' + str(file.filename))
    return df_objects
1 Answer
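ZipFile also supports the with statement. My suggestion, based on your code, is to open the archive in a with block as well, so that it is closed as soon as its members have been read. The function is then called once per attachment.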
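As a rough illustration of the cleanup question, here is a minimal sketch that uses only the standard library. It assumes the attachment exposes SaveAsFile(path), as in the question. The names read_zip_attachment and FakeAttachment are hypothetical and exist only for this example; FakeAttachment stands in for a real Outlook attachment so the sketch can be tried without Outlook. It swaps the temporary file for tempfile.TemporaryDirectory, which deletes the saved file on exit, and it returns raw bytes rather than DataFrames.

```python
import os
import tempfile
import zipfile


def read_zip_attachment(attachment):
    """Return {member name: raw bytes} for each file in a zipped attachment.

    Assumption: `attachment` has a SaveAsFile(path) method.
    """
    members = {}
    # TemporaryDirectory removes the directory and its contents on exit,
    # so the saved attachment does not linger on disk.
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "attachment.zip")
        attachment.SaveAsFile(path)
        if not zipfile.is_zipfile(path):
            raise ValueError("Expected zip file")
        # ZipFile is a context manager: the archive is closed on exit.
        with zipfile.ZipFile(path) as zf:
            for info in zf.infolist():
                if info.is_dir():
                    continue  # skip folder entries inside the archive
                members[info.filename] = zf.read(info)
    return members


class FakeAttachment:
    """Hypothetical stand-in for an Outlook attachment (illustration only)."""

    def __init__(self, files):
        self.files = files      # {member name: text content}
        self.saved_path = None  # remembers where SaveAsFile wrote

    def SaveAsFile(self, path):
        self.saved_path = path
        with zipfile.ZipFile(path, "w") as zf:
            for name, text in self.files.items():
                zf.writestr(name, text)
```

Calling it might look like members = read_zip_attachment(FakeAttachment({"a.csv": "1,2\n3,4\n"})). Each value can then be handed to pandas through an in-memory buffer, for example pd.read_csv(io.BytesIO(members["a.csv"]), header=None), so nothing needs to stay open on disk once the function returns.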