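I have two dataframes, shown below. For each entry in to_check, I want to count how many successes (Yes) that particular person has in history during the one year before his/her specific date, i.e. within the range that the entry defines.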
For example, for Mike 20200602 in to_check, I want to know how many successes (Yes) appear in Mike's history from one year before up to 20200602.
Treating to_check as a list, I came up with a clumsy approach:
import pandas as pd
import numpy as np
from io import StringIO
from datetime import datetime, timedelta
csvfile = StringIO("""
Name Check
Mike 20200602
David 20210415
Kate 20201109""")
csvfile_1 = StringIO("""
Name History Success
David 20180312 Yes
David 20180811 Yes
David 20191223 Yes
David 20210311 Yes
Kate 20180906 Yes
Kate 20180912 Yes
Kate 20191204 Yes
Kate 20200505 Yes
Mike 20180912 Yes
Mike 20190312 Yes
Mike 20190806 Yes
Mike 20191204 Yes""")
df_check = pd.read_csv(csvfile, sep = ' ', engine='python')
df_history = pd.read_csv(csvfile_1, sep = ' ', engine='python')
df_history['Date'] = pd.to_datetime(df_history['History'], format='%Y%m%d')
to_check = ["Mike 20200602","David 20210415","Kate 20201109"]
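for t in to_check:
    name, d = t.split(" ")
    date_obj = datetime.strptime(d, '%Y%m%d')
    day_before = date_obj - timedelta(days=365)
    # Same person, Success == 'Yes', and Date within [day_before, date_obj]
    m1 = df_history['Name'] == name
    m2 = df_history['Date'] >= day_before
    m3 = df_history['Date'] <= date_obj
    m4 = df_history['Success'] == 'Yes'
    df_history['OP'] = np.where(m1 & m2 & m3 & m4, "fit", '')
    # Count the "fit" rows directly; value_counts().tolist()[1] only works
    # when "fit" happens to be the less frequent value
    how_many = (df_history['OP'] == 'fit').sum()
    print(t, how_many)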
Output:
Mike 20200602 2
David 20210415 1
Kate 20201109 2
What is a better, smarter way to do this?
1 answer
Use merge and query, but I suggest keeping the dates as numbers (YYYYMMDD integers) so that the one-year offset is easy.
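The answer's own code did not survive. Below is a minimal sketch of the merge + query idea it describes, assuming the dates stay as YYYYMMDD integers (so "one year earlier" is Check - 10000) and that the window is inclusive at both ends. The names check, history, hits, counts, result and the column how_many are illustrative, not from the original answer.

```python
import pandas as pd
from io import StringIO

check = pd.read_csv(StringIO("""Name Check
Mike 20200602
David 20210415
Kate 20201109"""), sep=' ')

history = pd.read_csv(StringIO("""Name History Success
David 20180312 Yes
David 20180811 Yes
David 20191223 Yes
David 20210311 Yes
Kate 20180906 Yes
Kate 20180912 Yes
Kate 20191204 Yes
Kate 20200505 Yes
Mike 20180912 Yes
Mike 20190312 Yes
Mike 20190806 Yes
Mike 20191204 Yes"""), sep=' ')

# Pair every check row with every history row of the same person.
merged = check.merge(history, on='Name')

# Assumption: dates are YYYYMMDD integers, so Check - 10000 is the same
# calendar day one year earlier; the window is inclusive on both ends.
hits = merged.query(
    "Success == 'Yes' and History >= Check - 10000 and History <= Check"
)

# Count hits per (Name, Check), then left-merge back so that check rows
# with no hits still appear (with 0) and the original order is kept.
counts = hits.groupby(['Name', 'Check']).size().rename('how_many').reset_index()
result = check.merge(counts, on=['Name', 'Check'], how='left')
result['how_many'] = result['how_many'].fillna(0).astype(int)
print(result)
```

For this data the counts match the loop version (Mike 2, David 1, Kate 2). Note that Check - 10000 steps back one calendar year, while timedelta(days=365) lands one day later when the year contains a leap day (2020-06-02 minus 365 days is 2019-06-03), so the two windows can differ by a day. A check date such as 20200229 becomes 20190229, which is not a real date, but the integer comparison still behaves correctly.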