A simpler Pandas filter-and-select that keeps datetime columns readable

vkc1a9a2  posted on 2023-02-14 in Other

I have an employee schedule that I filter to get a DataFrame of name, timein, timeout, like this:

import pandas as pd
from datetime import datetime

employees = [('BOB', datetime(2022,12,1,6,0,0), datetime(2022,12,1,14,0,0)),
('BOB', datetime(2022,12,2,6,0,0), datetime(2022,12,2,14,0,0)),
('GILL', datetime(2022,12,1,6,0,0), datetime(2022,12,1,14,0,0)),
('GILL', datetime(2022,12,3,6,0,0), datetime(2022,12,3,14,0,0)),
('TOBY', datetime(2022,12,1,14,0,0), datetime(2022,12,1,20,30,0))]
labels = ['name', 'timein', 'timeout']
df = pd.DataFrame.from_records(employees, columns=labels)

**I need to compare the time delta between the current timeout and the next timein value.** My idea was to filter, select, and update into a dict:

{'BOB' : [(datetime(2022,12,1,6,0,0), datetime(2022,12,1,14,0,0)), (datetime(2022,12,2,6,0,0), datetime(2022,12,2,14,0,0)), etc...}
Then it should be a simple test (for a common error pattern): dict['BOB'][i+1][0] - dict['BOB'][i][1] < fixed_duration
But Pandas puts the values through some NumPy wringer and produces god knows what:

results = {}
names = df['name'].unique().tolist()
for name in names:
    times = df.loc[df['name'] == 'BOB', ['schedulein', 'scheduleout']].values.tolist()
    results.update({name: times})
    
results

{'BOB': [[1669874400000000000, 1669903200000000000],
  [1669960800000000000, 1669989600000000000]],
 'GILL': [[1669874400000000000, 1669903200000000000],
  [1669960800000000000, 1669989600000000000]],
 'TOBY': [[1669874400000000000, 1669903200000000000],
  [1669960800000000000, 1669989600000000000]]}

Why can't I get the datetimes back out?
Bonus points if you know a more Pandas-like way of doing what I call "filter, select".

8fq7wneg  #1

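Here is what you are trying to do. Two things in the question's loop need to change: it filters on the literal 'BOB' instead of the loop variable name, which is why every key ends up with the same rows, and it selects schedulein/scheduleout, which are not columns of the df shown. The integers themselves come from .values: it returns a NumPy datetime64[ns] array, and NumPy's tolist() turns nanosecond-precision values into plain integers (nanoseconds since the epoch). Casting with .astype(object) first keeps each value as a pandas Timestamp: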

import pandas as pd
from datetime import datetime

employees = [('BOB', datetime(2022,12,1,6,0,0), datetime(2022,12,1,14,0,0)),
             ('BOB', datetime(2022,12,2,6,0,0), datetime(2022,12,2,14,0,0)),
             ('GILL', datetime(2022,12,1,6,0,0), datetime(2022,12,1,14,0,0)),
             ('GILL', datetime(2022,12,3,6,0,0), datetime(2022,12,3,14,0,0)),
             ('TOBY', datetime(2022,12,1,14,0,0), datetime(2022,12,1,20,30,0))]
labels = ['name', 'timein', 'timeout']
df = pd.DataFrame.from_records(employees, columns=labels)

results = {}
names = df['name'].unique().tolist()
for name in names:
    # Filter on the loop variable; astype(object) keeps Timestamp objects instead of raw integers
    times = df.loc[df['name'] == name, ['timein', 'timeout']].astype(object).values.tolist()
    results.update({name: times})

print(results)

This gives you:

{'BOB': [[Timestamp('2022-12-01 06:00:00'), Timestamp('2022-12-01 14:00:00')], [Timestamp('2022-12-02 06:00:00'), Timestamp('2022-12-02 14:00:00')]], 'GILL': [[Timestamp('2022-12-01 06:00:00'), Timestamp('2022-12-01 14:00:00')], [Timestamp('2022-12-03 06:00:00'), Timestamp('2022-12-03 14:00:00')]], 'TOBY': [[Timestamp('2022-12-01 14:00:00'), Timestamp('2022-12-01 20:30:00')]]}
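
As for the bonus question, a more Pandas-like route is probably to skip the dict entirely and compute the gap inside the DataFrame with groupby and shift. This is only a rough sketch, not the one true way: the 20-hour value for fixed_duration is my assumption (the question never gives one), and next_timein, gap and too_short are column names I made up for illustration.

```python
import pandas as pd
from datetime import datetime

employees = [
    ("BOB", datetime(2022, 12, 1, 6), datetime(2022, 12, 1, 14)),
    ("BOB", datetime(2022, 12, 2, 6), datetime(2022, 12, 2, 14)),
    ("GILL", datetime(2022, 12, 1, 6), datetime(2022, 12, 1, 14)),
    ("GILL", datetime(2022, 12, 3, 6), datetime(2022, 12, 3, 14)),
    ("TOBY", datetime(2022, 12, 1, 14), datetime(2022, 12, 1, 20, 30)),
]
df = pd.DataFrame.from_records(employees, columns=["name", "timein", "timeout"])

# Assumed threshold (not from the question): flag rest periods shorter than this.
fixed_duration = pd.Timedelta(hours=20)

# Sort so that "next" means the next shift in time for each employee.
df = df.sort_values(["name", "timein"]).reset_index(drop=True)

# Next shift's timein within each employee; NaT on each employee's last shift.
df["next_timein"] = df.groupby("name")["timein"].shift(-1)

# Gap between this timeout and the next timein; NaT where there is no next shift.
df["gap"] = df["next_timein"] - df["timeout"]

# Comparisons against NaT evaluate to False, so last shifts are never flagged.
df["too_short"] = df["gap"] < fixed_duration

print(df[["name", "timeout", "next_timein", "gap", "too_short"]])
```

With the sample data, only BOB's first shift is flagged: he clocks out at 14:00 on 2022-12-01 and back in at 06:00 the next day, a 16-hour gap. Everything stays in datetime64/timedelta64 columns, so no conversion to Python lists is needed.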
