Pandas DataFrame: get the last row of each year

Posted by esbemjvw on 2023-03-28 in Other

I need to filter the DataFrame below so that only the row with the last date of each year remains:

                    A          B          C          D          E        F          G
Date
2018-05-03  20.700001  20.840000  20.549999  20.549999  12.248506  1355100   0.462484
2018-05-15  19.400000  19.400000  18.620001  19.000000  11.748817  5060500   0.736852
2018-08-10  19.709999  19.840000  19.209999  19.400000  12.286919  2369800   0.477069
2018-11-12  22.190001  22.500000  22.030001  22.379999  14.629877  1727100   0.711570
2018-12-18  23.450001  23.450001  22.730000  22.799999  15.160932  1576600   0.399275
2019-05-06  25.990000  26.000000  25.570000  25.950001  17.365524  1500700   0.165950
2019-05-20  23.990000  24.940001  23.990000  24.889999  16.851423  1449000   0.280836
2019-08-09  28.299999  28.450001  27.900000  28.250000  19.616335  1868600   0.722007
2019-11-21  27.750000  27.959999  27.709999  27.820000  19.695263   884000   0.541491
2019-12-17  30.299999  30.440001  29.910000  30.030001  21.386765  1019300   0.180416

Expected result:

                    A          B          C          D          E        F          G
Date
2018-12-18  23.450001  23.450001  22.730000  22.799999  15.160932  1576600   0.399275
2019-12-17  30.299999  30.440001  29.910000  30.030001  21.386765  1019300   0.180416

Answer #1 by s4n0splo

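If your index is a DatetimeIndex, either of the following two solutions should work. Note that `tail(1)` returns the last row of each group by position, so sort the index first if it is not already in ascending date order: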

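import pandas as pd
df.groupby(pd.Grouper(level=0, freq='YE')).tail(1)  # 'YE' needs pandas >= 2.2; older versions use 'A'

df.groupby(lambda x: x.year).tail(1)  # groups by the year of each index label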
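A quick check on the question's own dates (only column A is reproduced, to keep it short). Because `tail(1)` takes the positionally last row of each group, the sketch sorts the index first; it then groups by `df.index.year`, which sidesteps frequency aliases entirely. The variable names are illustrative, not from the original answer.

```python
import pandas as pd

# Dates and column A from the question's sample (other columns omitted)
dates = pd.to_datetime([
    "2018-05-03", "2018-05-15", "2018-08-10", "2018-11-12", "2018-12-18",
    "2019-05-06", "2019-05-20", "2019-08-09", "2019-11-21", "2019-12-17",
])
values = [20.700001, 19.400000, 19.709999, 22.190001, 23.450001,
          25.990000, 23.990000, 28.299999, 27.750000, 30.299999]
df = pd.DataFrame({"A": values}, index=pd.DatetimeIndex(dates, name="Date"))

# tail(1) keeps the positionally last row of each group, so sort by date first
df = df.sort_index()

# Group rows by the calendar year of the index and keep the last one per year
last_per_year = df.groupby(df.index.year).tail(1)
print(last_per_year)
```

This keeps the 2018-12-18 and 2019-12-17 rows, matching the expected result; the `lambda x: x.year` variant selects the same rows.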

Answer #2 by zc0qhyus

You only need a groupby to find the maximum date in each year; after that, filtering is easy. See the code below.

import pandas as pd
import numpy as np

# Generate n random timestamps between start and end (second resolution)
def random_dates(start, end, n=10):
    start_u = start.value // 10**9  # Timestamp.value is in nanoseconds
    end_u = end.value // 10**9
    return pd.to_datetime(np.random.randint(start_u, end_u, n), unit='s')

start = pd.to_datetime('2015-01-01')
end = pd.to_datetime('2017-01-31')
dates = random_dates(start, end, 20)
df = pd.DataFrame({'date': dates,
                   'value': list(range(20))})

# Actually solve the problem on the generated df:
# find the latest date in each year...
s = df.loc[:, 'date']
s = s.groupby(s.dt.strftime('%Y')).max().tolist()

# ...then keep only the rows whose date is one of those per-year maxima
print(df[df['date'].isin(s)])
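The same max-then-filter idea carries over to the question's layout, where the dates live in the index rather than in a `date` column. A minimal sketch, using only column A of the sample and reversing the rows on purpose to show that, unlike `tail(1)`, this approach does not depend on row order (names such as `max_dates` are illustrative):

```python
import pandas as pd

# Dates and column A from the question's sample (other columns omitted)
dates = pd.to_datetime([
    "2018-05-03", "2018-05-15", "2018-08-10", "2018-11-12", "2018-12-18",
    "2019-05-06", "2019-05-20", "2019-08-09", "2019-11-21", "2019-12-17",
])
values = [20.700001, 19.400000, 19.709999, 22.190001, 23.450001,
          25.990000, 23.990000, 28.299999, 27.750000, 30.299999]
df = pd.DataFrame({"A": values}, index=pd.DatetimeIndex(dates, name="Date"))

# Reverse the rows on purpose: this approach must not depend on row order
df = df.iloc[::-1]

# Latest date in each calendar year
max_dates = df.index.to_series().groupby(df.index.year).max()

# Keep only the rows whose date equals one of those per-year maxima
result = df[df.index.isin(max_dates)]
print(result)
```

If several rows share a year's latest timestamp, `isin` keeps all of them rather than exactly one per year.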
