pandas: Empty gaps between data pulled via loop from multiple Excel files

Posted by 7fyelxc5 on 2022-12-16 in Other


import os
import pandas as pd

path = "C:\\Users\\Adam\\Desktop\\Stock Trackers\\"

excel_file_list = os.listdir(path)

finalDf = pd.DataFrame()
for file in excel_file_list:
 #if excel_files.startswith("Stock"): 
    df = pd.read_excel(path+file,sheet_name="Main",usecols="A:D,R")
    df['Qty Received']=df['Total Received']
    df = df.drop('Total Received', axis=1)
    df['InvoicedValue'] = df['Price']*df['Qty Invoiced']
    df['ReceivedValue'] = df['Price']*df['Qty Received']
    df['DeltaQty']= df['Qty Received']-df['Qty Invoiced']
    df['DeltaValue']= df['ReceivedValue']-df['InvoicedValue']
    finalDf = pd.concat([finalDf, df])

finalDf.to_excel("finalfile4.xlsx")

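The script above produces "finalfile4", but the final output does not stack the data from each file directly on top of the next.
After each file (that is, after each iteration of the for loop) there are empty rows down to row 10982, and only after that row does the next file's data begin:

(screenshot of the output, not reproduced here)

Between the data of two files the rows are empty, except for the "Qty Received" column, which contains zeros.
How can I fix this so that each iteration of the loop writes its data directly below the previous file's data?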

pandas

Source: https://stackoverflow.com/questions/74813953/empty-gaps-between-data-pulled-via-loop-from-multiple-excel-files

1 Answer


gopyfrb31#

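You probably filled entire columns with formulas in the original Excel sheets. As the screenshot shows, columns A (no name) and F ("Qty Received") have values even where the rest of the row is empty. You can drop those rows from each DataFrame you read before concatenating it onto the main DataFrame.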

...
for file in excel_file_list:
 #if excel_files.startswith("Stock"): 
    df = pd.read_excel(path+file,sheet_name="Main",usecols="A:D,R")
    df = df.dropna(subset=["EAN"]) # Keep only rows where EAN is not blank (read_excel loads empty cells as NaN by default)
    ...
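
To illustrate the idea outside the loop, here is a minimal sketch that drops rows with a blank key column and then stacks the frames in one go. The helper name `stack_frames` and the two small in-memory frames are hypothetical stand-ins for the sheets returned by `pd.read_excel`. It also assumes, as the answer does, that the key column is called "EAN".

```python
import pandas as pd


def stack_frames(frames, key_column="EAN"):
    """Drop rows whose key column is missing, then stack the frames vertically.

    `stack_frames` and `key_column` are illustrative names, not part of pandas.
    """
    # Rows that only exist because a formula was filled down the sheet
    # have no value in the key column, so dropna removes them.
    cleaned = [df.dropna(subset=[key_column]) for df in frames]
    # A single concat at the end; ignore_index renumbers the rows 0..n-1
    # so the index does not restart at 0 for every file.
    return pd.concat(cleaned, ignore_index=True)


# Hypothetical sample data standing in for two Excel sheets.
file_a = pd.DataFrame({"EAN": ["111", "222", None], "Qty Received": [5, 3, 0]})
file_b = pd.DataFrame({"EAN": ["333", None, None], "Qty Received": [7, 0, 0]})

combined = stack_frames([file_a, file_b])
print(combined)
```

In the loop from the question, each `pd.read_excel(...)` result could be appended to a list and the list passed to such a helper once after the loop, which also avoids calling `pd.concat` on every iteration. Passing `index=False` to `to_excel` leaves the index column out of the output file.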

2022-12-16
