regex: How to select specific CSV files for a specified date range from a folder in Python?

Posted by fhg3lkii on 2022-11-18 in Python


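I have a folder (in the same directory as the Python script) with many CSV files covering January 1 to December 31. I want to read only the CSV files that fall within a specific date range into Python and append those files to a list.
The files are named as follows, with one file for each day across multiple months: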
BANK_NIFTY_5MINs_2020-02-01.csv, BANK_NIFTY_5MINs_2020-02-02.csv, ..., BANK_NIFTY_5MINs_2020-03-31.csv
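Currently, my code uses 'startswith' and 'endswith' to pick up the CSV files for the whole of March. However, this only lets me target one month at a time. I want to be able to read CSV files spanning several months for a given date range, for example October, November and December, or February and March (basically starting and ending in any month).
The code below only picks up the files for March, then takes the files from the list and merges them into a DataFrame.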

# Accessing csv files from directory
import os
from datetime import datetime

startdate = datetime.strptime("2022-05-01", "%Y-%m-%d")
enddate = datetime.strptime("2022-06-30", "%Y-%m-%d")
all_files = []
# Directory that contains this script
path = os.path.realpath(os.path.join(os.getcwd(), os.path.dirname(__file__)))
for root, dirs, files in os.walk(path):
    for file in files:
        # os.walk yields bare file names, so the prefix must not start with "/"
        if file.startswith("BANK_NIFTY_5MINs_") and file.endswith(".csv"):
            file_date = datetime.strptime(file, "BANK_NIFTY_5MINs_%Y-%m-%d.csv")
            if startdate <= file_date <= enddate:
                all_files.append(os.path.join(root, file))

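The output of the above looks like 'BANK_NIFTY_5MINs_2020-03-01.csv' and so on, but it needs to be the full path, for example 'c:\Users\User123\Desktop\Myfolder\2020\BANK\BANK_NIFTY_5MINs_2020-03-01.csv'. The merge function needs the full paths in the list in that format for further processing.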

regex

Source: https://stackoverflow.com/questions/74386583/how-to-select-specific-csv-files-for-specified-date-range-from-a-folder-in-pytho

2 Answers


insrf1ej1#

I would take a different approach to get more flexibility:

import os
from datetime import datetime
from pprint import pprint
from typing import List, Union

def quick_str_to_date(s: str) -> datetime:
    return datetime.strptime(s, "%Y-%m-%d")

def get_file_by_date_range(path: str, startdate: Union[datetime, str], enddate: Union[datetime, str]) -> List[str]:
    # Accept either datetime objects or "YYYY-MM-DD" strings
    if isinstance(startdate, str):
        startdate = quick_str_to_date(startdate)
    if isinstance(enddate, str):
        enddate = quick_str_to_date(enddate)
    result = []
    for root, dirs, files in os.walk(path):
        for filename in files:
            if filename.startswith("BANK_NIFTY_5MINs_") and filename.lower().endswith(".csv"):
                # Slice out the date so an upper-case ".CSV" extension still parses
                file_date = quick_str_to_date(filename[len("BANK_NIFTY_5MINs_"):-len(".csv")])
                if startdate <= file_date <= enddate:
                    # Store the full path, not just the file name
                    result.append(os.path.join(root, filename))
    return result

print("all")
pprint(get_file_by_date_range("/full/path/to/files", "2000-01-01", "2100-12-31"))

print("\nFebruary")
pprint(get_file_by_date_range("/full/path/to/files", "2020-02-01", "2020-02-28"))

print("\none day")
pprint(get_file_by_date_range("/full/path/to/files", "2020-02-01", "2020-02-01"))

Output

all
['/full/path/to/files/BANK_NIFTY_5MINs_2020-02-02.csv',
 '/full/path/to/files/BANK_NIFTY_5MINs_2020-02-28.csv',
 '/full/path/to/files/BANK_NIFTY_5MINs_2020-03-01.csv',
 '/full/path/to/files/BANK_NIFTY_5MINs_2020-03-31.csv',
 '/full/path/to/files/BANK_NIFTY_5MINs_2020-02-01.csv']

February
['/full/path/to/files/BANK_NIFTY_5MINs_2020-02-02.csv',
 '/full/path/to/files/BANK_NIFTY_5MINs_2020-02-28.csv',
 '/full/path/to/files/BANK_NIFTY_5MINs_2020-02-01.csv']

one day
['/full/path/to/files/BANK_NIFTY_5MINs_2020-02-01.csv']
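
For comparison, here is a minimal sketch of the same idea written with pathlib. It is not part of the answer above: the function name find_csvs_in_range and the PREFIX constant are made up for illustration, and it assumes every file follows the BANK_NIFTY_5MINs_YYYY-MM-DD.csv naming from the question. Files whose date part does not parse are skipped, and the result is a list of absolute paths in chronological order.

```python
from datetime import date, datetime
from pathlib import Path

# Assumed file-name prefix, taken from the question
PREFIX = "BANK_NIFTY_5MINs_"


def find_csvs_in_range(folder, start, end):
    """Return absolute paths of PREFIX<YYYY-MM-DD>.csv files dated within [start, end].

    `start` and `end` are "YYYY-MM-DD" strings; both ends are inclusive.
    """
    start_d = date.fromisoformat(start)
    end_d = date.fromisoformat(end)
    matches = []
    # rglob searches the folder and all of its subfolders
    for p in Path(folder).rglob(PREFIX + "*.csv"):
        try:
            file_date = datetime.strptime(p.stem[len(PREFIX):], "%Y-%m-%d").date()
        except ValueError:
            continue  # name matched the glob but the date part is malformed
        if start_d <= file_date <= end_d:
            matches.append((file_date, str(p.resolve())))
    # Sort by date, then drop the date and keep only the path
    return [path for _, path in sorted(matches)]
```

The returned list can be handed straight to whatever merge step expects full paths, for example find_csvs_in_range(".", "2020-02-01", "2020-03-31").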

2022-11-18

q8l4jmvw2#

If you want to do this with regex, see the following:

# replace `file.startswith(...) and file.endswith(...)`
re.match('BANK_NIFTY_5MINs_2020-(02|03|10|11|12)-[0-9]+', file)
###                              ^^^^^^^^^^^^^^ Feb, Mar, Oct-Dec

This is just a basic pattern to get you going; it could probably be improved.
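
One possible refinement, sketched here under assumptions: rather than listing month numbers in the pattern, capture the date from the file name with the regex and compare it against the range. The names NAME_RE and in_range are hypothetical, and the pattern assumes the BANK_NIFTY_5MINs_YYYY-MM-DD.csv naming from the question.

```python
import re
from datetime import date

# Assumed naming scheme from the question: BANK_NIFTY_5MINs_YYYY-MM-DD.csv
NAME_RE = re.compile(r"BANK_NIFTY_5MINs_(\d{4})-(\d{2})-(\d{2})\.csv")


def in_range(filename, start, end):
    """Return True if the date embedded in `filename` lies within [start, end]."""
    m = NAME_RE.fullmatch(filename)
    if m is None:
        return False  # not one of the expected files
    try:
        file_date = date(*map(int, m.groups()))
    except ValueError:
        return False  # digits matched but do not form a real date, e.g. 2020-02-30
    return start <= file_date <= end
```

Inside an os.walk loop this would stand in for the startswith/endswith check, for example `if in_range(file, date(2020, 2, 1), date(2020, 3, 31)):`.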
But for your case, I would use a simple glob:

all_files = glob.glob('./BANK_NIFTY_5MINs_2020-0[2-3]-*.csv')
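
The character class 0[2-3] only covers February and March. As a rough sketch for an arbitrary set of months, one glob per month can be combined. The helper name glob_months is hypothetical; it assumes the naming scheme from the question, looks in a single folder only (no subfolders), and filters by whole months rather than exact days.

```python
import glob
import os


def glob_months(folder, year, months):
    """Return sorted absolute paths of the CSV files for the given months of `year`."""
    paths = []
    for month in months:
        # glob.escape protects any special characters in the folder name
        pattern = os.path.join(
            glob.escape(folder), f"BANK_NIFTY_5MINs_{year}-{month:02d}-*.csv"
        )
        paths.extend(glob.glob(pattern))
    # Absolute paths, so a later merge step gets full paths
    return sorted(os.path.abspath(p) for p in paths)
```

For example, glob_months(".", 2020, [10, 11, 12]) would collect the October to December files from the current directory.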

2022-11-18
