pandas: Concatenate CSVs into a DataFrame based on timestamp

Posted by o4hqfura on 2023-02-27 in Other


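I have a directory whose subdirectories contain CSVs, and I want to concatenate them into a single DataFrame. However, I only want to include the "most recent" export of each file, as determined by the timestamp in the filename.
For example, here is a list of the files contained in the various subdirectories: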

FileA_20230208_ExportedOn_20230202T0215Z
FileA_20230208_ExportedOn_20230208T0015Z
FileB_20230208_ExportedOn_20230205T0215Z
FileB_20230208_ExportedOn_20230208T2218Z
FileC_20230208_ExportedOn_20210208T0215Z
FileC_20230208_ExportedOn_20230201T0215Z
FileC_20230208_ExportedOn_20230208T2208Z
FileC_20230208_ExportedOn_20200207T0215Z
FileA_20230209_ExportedOn_20230202T1915Z
FileA_20230209_ExportedOn_20230202T0215Z

So the final DataFrame should be the concatenation of the following 4 files:

FileA_20230208_ExportedOn_20230208T0015Z
FileB_20230208_ExportedOn_20230208T2218Z
FileC_20230208_ExportedOn_20230208T2208Z
FileA_20230209_ExportedOn_20230202T1915Z

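I can concatenate all of them with the following:

import glob
from pathlib import Path

import pandas as pd

# Collect every CSV under the root directory, including subdirectories
files = glob.glob('/**/*.csv', recursive=True)

# Combine all CSVs into a single DataFrame, recording each row's source file
df = pd.concat([pd.read_csv(fp).assign(file_name=Path(fp).name)
                for fp in files], ignore_index=True)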

But how do I select only the files with the most recent timestamp?

pandas

Source: https://stackoverflow.com/questions/75575730/concatenate-csvs-into-dataframe-based-on-timestamp

1 Answer


a8jjtwal1#

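Call .sort() on the file list with reverse=True so the newest exports come first (the timestamps can be compared lexicographically, so there is no need to parse them further), then slice the list to get the first 4 files: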

files = [  # e.g. glob.glob('/**/*.csv', recursive=True)
    "FileA_20230208_ExportedOn_20230202T0215Z.csv",
    "FileA_20230208_ExportedOn_20230208T0015Z.csv",
    "FileB_20230208_ExportedOn_20230205T0215Z.csv",
    "FileB_20230208_ExportedOn_20230208T2218Z.csv",
    "FileC_20230208_ExportedOn_20210208T0215Z.csv",
    "FileC_20230208_ExportedOn_20230201T0215Z.csv",
    "FileC_20230208_ExportedOn_20230208T2208Z.csv",
    "FileC_20230208_ExportedOn_20200207T0215Z.csv",
    "FileA_20230209_ExportedOn_20230202T1915Z.csv",
    "FileA_20230209_ExportedOn_20230202T0215Z.csv",
]

files.sort(key=lambda name: name.partition("ExportedOn_")[2], reverse=True)
print(files[:4])

This prints:

[
 'FileB_20230208_ExportedOn_20230208T2218Z.csv', 
 'FileC_20230208_ExportedOn_20230208T2208Z.csv', 
 'FileA_20230208_ExportedOn_20230208T0015Z.csv', 
 'FileB_20230208_ExportedOn_20230205T0215Z.csv',
]

(i.e., I think your example missed the export from 20230205 :))
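
Note that the four files listed in the question are the newest export for each distinct name prefix (FileA_20230208, FileB_20230208, FileC_20230208, FileA_20230209), which is not the same as the four newest exports overall. If that per-prefix reading is what is wanted, one possible sketch is to group on the part of the name before "_ExportedOn_" and keep the largest timestamp in each group. The helper name latest_exports is hypothetical, and the code assumes every filename follows the pattern shown in the question:

```python
from pathlib import Path


def latest_exports(paths):
    """Return the most recently exported path for each filename prefix.

    Hypothetical helper: assumes names look like
    '<prefix>_ExportedOn_<YYYYMMDD>T<HHMM>Z.csv'.
    """
    latest = {}
    for path in paths:
        # Split the name (without extension) into the grouping prefix
        # and the export timestamp that follows the marker.
        prefix, _, stamp = Path(path).stem.partition("_ExportedOn_")
        # Fixed-width timestamps sort chronologically as plain strings.
        if prefix not in latest or stamp > latest[prefix][0]:
            latest[prefix] = (stamp, path)
    # Dicts keep insertion order, so prefixes appear in first-seen order.
    return [path for _, path in latest.values()]


files = [  # e.g. glob.glob('/**/*.csv', recursive=True)
    "FileA_20230208_ExportedOn_20230202T0215Z.csv",
    "FileA_20230208_ExportedOn_20230208T0015Z.csv",
    "FileB_20230208_ExportedOn_20230205T0215Z.csv",
    "FileB_20230208_ExportedOn_20230208T2218Z.csv",
    "FileC_20230208_ExportedOn_20210208T0215Z.csv",
    "FileC_20230208_ExportedOn_20230201T0215Z.csv",
    "FileC_20230208_ExportedOn_20230208T2208Z.csv",
    "FileC_20230208_ExportedOn_20200207T0215Z.csv",
    "FileA_20230209_ExportedOn_20230202T1915Z.csv",
    "FileA_20230209_ExportedOn_20230202T0215Z.csv",
]

selected = latest_exports(files)
print(selected)
```

The resulting list can then be passed to the pd.concat expression from the question in place of files. Because Path(path).stem is used for grouping, full paths returned by glob should work as well, as long as the same prefix does not need to be kept separate per subdirectory.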

Answered 2023-02-27
