pandas 从df_a中的特定行开始,计算其在df_B中过去一年中的出现次数

nfeuvbwi  于 2023-04-04  发布在  其他
关注(0)|答案(1)|浏览(73)

我有两个dataframe如下,我想返回多少成功(是)在一年(对于一个特定的人)1年前,他/她的具体日期,即每个条目在to check中定义的范围在history
例如,在to_check,Mike 20200602中,我想知道Mike的历史记录(1年前,直到20200602)中有多少成功(是)。

通过使用“to_check”作为列表,我想出了一个笨拙的方法:

import pandas as pd
import datetime
import numpy as np
from io import StringIO
import time
from datetime import datetime, date, time, timedelta

csvfile = StringIO("""
Name Check
Mike 20200602
David 20210415
Kate 20201109""")

csvfile_1 = StringIO("""
Name History Success
David 20180312 Yes
David 20180811 Yes
David 20191223 Yes
David 20210311 Yes
Kate 20180906 Yes
Kate 20180912 Yes
Kate 20191204 Yes
Kate 20200505 Yes
Mike 20180912 Yes
Mike 20190312 Yes
Mike 20190806 Yes
Mike 20191204 Yes""")

df_check = pd.read_csv(csvfile, sep = ' ', engine='python')
df_history = pd.read_csv(csvfile_1, sep = ' ', engine='python')

df_history['Date'] = pd.to_datetime(df_history['History'], format='%Y%m%d')

to_check = ["Mike 20200602","David 20210415","Kate 20201109"]

for t in to_check:
    name, d = t.split(" ")
    date_obj = datetime.strptime(d, '%Y%m%d')
    delta = timedelta(days = 365)
    day_before = date_obj - delta
    m1 = df_history['Name'] == name
    m2 = df_history['Date'] >= day_before

    df_history['OP'] = np.where(m1 & m2, "fit", '')

    how_many = df_history['OP'].value_counts().tolist()[1]

    print (t, how_many)

输出:

Mike 20200602 2
David 20210415 1
Kate 20201109 2

什么是更好更聪明的方法来实现它?

vmdwslir

vmdwslir1#

mergequery,但我建议将日期保留为数字,以便于偏移:

# both `Check` and `History` are numbers, not dates
(df_check.merge(df_history, on='Name', how='left')
    .query('History<=Check<History+10000')
    .groupby('Name').agg({'History':'first', 'Success':'size'})
)

输出:

History  Success
Name                    
David  20210311        1
Kate   20191204        2
Mike   20190806        2

相关问题