Pandas DataFrame: indexing without integers

cwdobuhd, posted on 2023-01-28 in Other

Suppose I have a DataFrame:

date
01.01.2003
02.01.2003
03.01.2003
05.01.2003
06.01.2003

I use this code:

for i in (df['date']):
    if df['date'].iloc[i+1]-df['date'].iloc[i] == 1 :
        df['Max'] = df['date'].iloc[i+1]
    else :
        df['Max'] = ''

It raises this error:

Addition/subtraction of integers and integer-arrays with Timestamp is no longer supported.  Instead of adding/subtracting `n`, use `n * obj.freq`

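But if I change i to timedelta(days=1), I get a different error saying that indexing requires an integer.
So how should the code be written? I want to define a "max" for each run of consecutive days.
This is the output I would like: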

date         max
01.01.2003   
02.01.2003
03.01.2003   03.01.2003
05.01.2003
06.01.2003   06.01.2003
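  • Note that only the maximum of each consecutive run is written, and the other values are left blank. 03.01.2003 to 05.01.2003 is not consecutive, so a new run starts there.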
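A minimal sketch of why the loop above fails, assuming the column has already been parsed with pd.to_datetime: iterating over df['date'] yields the Timestamp values themselves rather than integer positions, so i+1 attempts Timestamp + int. Looping over positions and comparing the gap with a Timedelta avoids both errors. The names one_day and consecutive are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"date": ["01.01.2003", "02.01.2003", "03.01.2003",
                            "05.01.2003", "06.01.2003"]})
df["date"] = pd.to_datetime(df["date"], dayfirst=True)

# Iterating a Series yields its values, so each item is a Timestamp, not a position.
first = next(iter(df["date"]))
print(type(first).__name__)

# Loop over integer positions instead, and compare each gap with a Timedelta.
one_day = pd.Timedelta(days=1)
consecutive = [
    df["date"].iloc[i + 1] - df["date"].iloc[i] == one_day
    for i in range(len(df) - 1)
]
print(consecutive)
```

This only shows where the two errors come from; it does not yet produce the max column.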
nimxete2 #1

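Convert the column to datetimes, then use Series.diff to compare each gap between consecutive dates against 1 day. The cumulative sum of the breaks labels each consecutive run, and GroupBy.transform with max writes the maximum of each run into a new column: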

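import pandas as pd

# Parse day-first strings such as 01.01.2003 into datetimes
df['date'] = pd.to_datetime(df['date'], dayfirst=True)

# A gap other than 1 day starts a new group; cumsum turns the breaks into group labels
df['Max'] = df.groupby(df['date'].diff().dt.days.ne(1).cumsum())['date'].transform('max')

# Alternative, thanks to Corralien: compare the timedelta with '1D' directly
df['Max'] = df.groupby(df['date'].diff().ne('1D').cumsum())['date'].transform('max')
print(df)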
        date        Max
0 2003-01-01 2003-01-03
1 2003-01-02 2003-01-03
2 2003-01-03 2003-01-03
3 2003-01-05 2003-01-06
4 2003-01-06 2003-01-06
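To make the grouping key easier to follow, here is a sketch that splits the one-liner into its intermediate steps on the sample dates from the question. The helper frame steps and the column names gap_days, new_run and run_id are illustrative names, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({"date": ["01.01.2003", "02.01.2003", "03.01.2003",
                            "05.01.2003", "06.01.2003"]})
df["date"] = pd.to_datetime(df["date"], dayfirst=True)

steps = pd.DataFrame({
    "date": df["date"],
    # Gap to the previous row in days; the first row has no predecessor, so NaN.
    "gap_days": df["date"].diff().dt.days,
})
# True wherever the gap is not exactly 1 day (NaN counts as "not 1").
steps["new_run"] = steps["gap_days"].ne(1)
# Each True bumps the counter, so rows in the same run share one label.
steps["run_id"] = steps["new_run"].cumsum()
print(steps)
```

For the sample dates, run_id is 1, 1, 1, 2, 2, so the first three rows form one group and the last two another.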

If the maximum should appear only on the last row of each consecutive run, with the other rows left empty, use:

df['date'] = pd.to_datetime(df['date'], dayfirst=True)

# s labels each consecutive run; duplicated(keep='last') is True for every row except the last of its run
s = df['date'].diff().dt.days.ne(1).cumsum()
df['Max'] = df.groupby(s)['date'].transform('max').mask(s.duplicated(keep='last'))

# Alternative, thanks to Corralien: compare the timedelta with '1D' directly
s = df['date'].diff().ne('1D').cumsum()
df['Max'] = df.groupby(s)['date'].transform('max').mask(s.duplicated(keep='last'))
print(df)
        date        Max
0 2003-01-01        NaT
1 2003-01-02        NaT
2 2003-01-03 2003-01-03
3 2003-01-05        NaT
4 2003-01-06 2003-01-06
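The result above holds NaT in the masked rows. If the goal is the exact layout from the question (dd.mm.yyyy strings and blank cells), one possible follow-up is to format the dates as strings and put an empty string where the row is not the last of its run. This is a sketch under that assumption; run_id, last_in_run, run_max and out are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({"date": ["01.01.2003", "02.01.2003", "03.01.2003",
                            "05.01.2003", "06.01.2003"]})
df["date"] = pd.to_datetime(df["date"], dayfirst=True)

# Label each consecutive run, as in the answer above.
run_id = df["date"].diff().dt.days.ne(1).cumsum()
# True only on the last row of each run.
last_in_run = ~run_id.duplicated(keep="last")
# Maximum date of each run, broadcast to every row of that run.
run_max = df.groupby(run_id)["date"].transform("max")

out = pd.DataFrame({
    "date": df["date"].dt.strftime("%d.%m.%Y"),
    # Keep the formatted maximum on the last row of a run, blank elsewhere.
    "max": run_max.dt.strftime("%d.%m.%Y").where(last_in_run, ""),
})
print(out.to_string(index=False))
```

Note that the max column then holds strings rather than datetimes, so this step is best applied last, only for display or export.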
