I am trying to filter an Excel dataset with pandas so that it only shows the current week's data

nszi6y05  posted on 2023-01-24  in Other

This is what my dataset currently looks like:

Num Name Type Status Current %  Next Milestone\nDate 
   1   sam  -    Open     54%     2023-01-16 00:00:00
   2   Dave -    Open     54%     2023-01-20 00:00:00
   3   Jake -    Open     45%     2023-01-13 00:00:00
   4   Oli  -    Open     30%     2023-01-31 00:00:00

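I want to filter this data to show only the rows whose next milestone date is due this week. I would also like the format to exclude the time. Today's date is 20/01/2023, so I only want the data from the 16th to the 20th. I currently think I should perhaps filter on an index of up to 5, but I am really not sure.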

Num Name Type Status Current %  Next Milestone\nDate 
   1   sam   -    Open     54%     2023-01-16 
   2   Dave  -    Open     54%     2023-01-10 
   3   Jack  -    Open     45%     2023-01-17 
   4   Dean  -    Open     30%     2023-01-19

Here is what I tried:

A for loop builds the list "days", which contains the weekday numbers starting from the current day, i.e. Wednesday will give [2,3,4,0,1]

days = []
for i in range(7):
    day = datetime.weekdays(datetime.today())+i
    if day%7<5:
        days.append(day%7)

## the for loop iterates until the day reaches Friday (stops increasing),
## then the else branch appends the rest by going backwards
filters = [0]
for i in range(4):
    if days[i+1]>days[i]:                         
        filters.append(1+i)
    else:
        for j in range(4-i):
            filters.append(-(j+1))
        break
## 'filters' contains day offsets relative to 'today', i.e. on Tuesday,
## filters = [0,1,2,3,-1] <==> [Tu,We,Th,Fr,Mo]

## 'weekdays' will contain all the dates needed for the filters to be
## used on the dataset

weekdays = []  

## the for loop appends the dates needed for the filter, by adding the
## relative offsets stored in 'filters' using timedelta

for i in range(5):
    other_days = datetime.today()+timedelta(days=filters[i])
    o_days = pd.to_datetime(other_days).date()      ## convert to a date (drops the time)
    weekdays.append(o_days)

An if statement adds the following week's dates when today is Thursday or Friday:

if datetime.weekday(datetime.today()) in (3, 4):   ## Thursday or Friday
    ## same method as before: this adds the days 7 days after each day in the current week
    for i in range(5):
        other_days = datetime.today()+timedelta(days=filters[i]+7)
        o_days = pd.to_datetime(other_days).date()
        weekdays.append(o_days)

## converts weekdays to datetime64 data type so it can be filtered
weekdays = np.array(weekdays, dtype='datetime64')

## applies filter
dataset_modified = dataset_modified[(dataset_modified['Next Milestone\nDate'] > min(weekdays)) & (dataset_modified['Next Milestone\nDate']< max(weekdays))]

It gives the error AttributeError: type object 'datetime.datetime' has no attribute 'weekdays'
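On the AttributeError quoted above: the standard-library method is named `weekday()` (singular, Monday is 0), and no `weekdays` attribute exists on `datetime`. The following is a minimal sketch of the "days" list described in the question with that call corrected. The helper name `upcoming_weekdays` is hypothetical and is only used here for illustration.

```python
from datetime import datetime


def upcoming_weekdays(today):
    """Weekday numbers (Mon=0 .. Fri=4) for the 7 days starting at `today`,
    skipping Saturday and Sunday."""
    start = today.weekday()  # note: weekday(), not weekdays()
    return [(start + i) % 7 for i in range(7) if (start + i) % 7 < 5]


# 2023-01-18 was a Wednesday, matching the example in the question
print(upcoming_weekdays(datetime(2023, 1, 18)))  # [2, 3, 4, 0, 1]
```

Passing the date in as an argument, rather than calling `datetime.today()` inside the loop, keeps the result reproducible when testing.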


bvuwiixz  #1

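# convert the column to plain dates (this drops the time part)
df['Next Milestone\nDate'] = pd.to_datetime(df['Next Milestone\nDate']).dt.date
# compare dates with dates (mixing Timestamp and datetime.date is unreliable across pandas versions)
start = pd.to_datetime('16/01/2023', dayfirst=True).date()
end = pd.to_datetime('20/01/2023', dayfirst=True).date()
df = df[(df['Next Milestone\nDate'] >= start) & (df['Next Milestone\nDate'] <= end)]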

vqlkdk9b  #2

Use Series.dt.normalize to set the time part of the datetimes to 00:00:00, then filter with Series.between in boolean indexing:

df['Next Milestone\nDate'] = pd.to_datetime(df['Next Milestone\nDate']).dt.normalize()

out = df[df['Next Milestone\nDate'].between('2023-01-16','2023-01-20')]
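The dates above are hard-coded, while the question asks for "the current week". As a rough sketch, the Monday-to-Friday bounds can be derived from today's date and passed to `between`. The helper `current_week` and its `today` parameter are assumptions introduced for this example, and the sample rows are copied from the question's first table.

```python
import pandas as pd

col = "Next Milestone\nDate"  # column name as it appears in the question

# Sample rows copied from the question's first table
df = pd.DataFrame({
    "Num": [1, 2, 3, 4],
    "Name": ["sam", "Dave", "Jake", "Oli"],
    col: ["2023-01-16 00:00:00", "2023-01-20 00:00:00",
          "2023-01-13 00:00:00", "2023-01-31 00:00:00"],
})
df[col] = pd.to_datetime(df[col]).dt.normalize()


def current_week(frame, column, today=None):
    """Keep rows dated Monday..Friday of the week containing `today`."""
    today = pd.Timestamp.today() if today is None else pd.Timestamp(today)
    monday = today.normalize() - pd.Timedelta(days=today.weekday())
    friday = monday + pd.Timedelta(days=4)
    return frame[frame[column].between(monday, friday)]  # inclusive on both ends


# Fix "today" to the date given in the question so the result is reproducible
out = current_week(df, col, today="2023-01-20")
print(out["Name"].tolist())  # ['sam', 'Dave']

# For display without the time, format the column as strings
print(out[col].dt.strftime("%Y-%m-%d").tolist())  # ['2023-01-16', '2023-01-20']
```

Formatting with `strftime` only at display time keeps the column as datetimes for filtering, which is the point of the `normalize` approach.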

If you convert to dates and filter as in imburningbabe's solution, it is slow (about 5.5 times slower here):

#200k rows
df = pd.concat([df] * 100000, ignore_index=True)

In [189]: %%timeit
     ...: df['Next Milestone Date'] = pd.to_datetime(df['Next Milestone Date']).dt.date
     ...: df1 = df[(df['Next Milestone Date'] <= pd.to_datetime('20/01/2023'))&(df['Next Milestone Date'] >= pd.to_datetime('16/01/2023'))]
     ...: 
282 ms ± 3.89 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [190]: %%timeit
     ...: df['Next Milestone Date'] = pd.to_datetime(df['Next Milestone Date']).dt.normalize()
     ...: 
     ...: out = df[df['Next Milestone Date'].between('2023-01-16','2023-01-20')]
     ...: 
51.2 ms ± 377 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
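One plausible explanation for the gap, offered as an assumption rather than something measured in this thread: `.dt.date` produces an object-dtype column of Python `datetime.date` values, whereas `.dt.normalize()` keeps the native datetime64 dtype, so comparisons can stay vectorized. A small sketch that only inspects the resulting dtypes:

```python
import pandas as pd

s = pd.to_datetime(pd.Series(["2023-01-16 00:00:00", "2023-01-20 00:00:00"]))

as_dates = s.dt.date           # Python datetime.date objects
normalized = s.dt.normalize()  # still a datetime64 column, time set to midnight

print(as_dates.dtype)  # object
print(pd.api.types.is_datetime64_any_dtype(normalized))  # True
```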
