Add a suffix to duplicated values and add new rows with this value to a specific group in pandas

Posted by uinbv5nw on 2023-09-29 in Other

I have a df such as:

Type   Species     Value
Dog    Species2    100
Dog    Species1    200
Dog    Species3    300
Dog    Species3    900
ALL_   Species1    400
ALL_   Species2    500
ALL_   Species3    600

How can I add a row to ALL_ for each duplicated Species, while also adding a _number suffix to the duplicated Species in both ALL_ and Dog?
I should get:

Type   Species     Value
Dog    Species2    100
Dog    Species1    200
Dog    Species3_1  300
Dog    Species3_2  900
ALL_   Species1    400
ALL_   Species2    500
ALL_   Species3_1  600
ALL_   Species3_2  600

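This is not a duplicate of "Pandas dataframe - add suffix to column value only if it is repeated", because I also need to add new rows for the duplicated Species in the group ALL_.
Here is a simple example, in case it helps: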

import pandas as pd

data = {'Type': ['Mammuthus', 'Mammuthus', 'Mammuthus', 'Mammuthus', 'ALL_', 'ALL_', 'ALL_'],
        'Species': ['Species2', 'Species1', 'Species3', 'Species3', 'Species1', 'Species2', 'Species3'],
        'Value': [100, 200, 300, 900, 400, 500, 600]}
df = pd.DataFrame(data)

xmd2e60i  #1

Filling in the missing combinations

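out = (df
   # number each repeat of a (Type, Species) pair, starting at 1
   .assign(idx=df.groupby(['Type', 'Species']).cumcount().add(1))
   # one row per (Species, idx), one column per Type; missing pairs become NaN
   .pivot(index=['Species', 'idx'], columns='Type', values='Value')
   # fill a missing pair with the previous value of the same Species
   .groupby(level='Species').ffill()
   # back to long format with columns Species, idx, Type, value
   .reset_index().melt(['Species', 'idx'])
   # append _idx only where the (Species, Type) pair is repeated
   .assign(Species=lambda d: d['Species'] + ('_'+d.pop('idx').astype(str))
                   .where(d.duplicated(['Species', 'Type'], keep=False), ''))
 )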

Output:

      Species       Type  value
0    Species1       ALL_  400.0
1    Species2       ALL_  500.0
2  Species3_1       ALL_  600.0
3  Species3_2       ALL_  600.0
4    Species1  Mammuthus  200.0
5    Species2  Mammuthus  100.0
6  Species3_1  Mammuthus  300.0
7  Species3_2  Mammuthus  900.0
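
To see where the extra ALL_ row comes from, the first half of the chain can be run one step at a time. The following is only an illustrative sketch on the question's sample data; the intermediate names `numbered`, `wide` and `filled` are made up here and do not appear in the answer.

```python
import pandas as pd

data = {'Type': ['Mammuthus', 'Mammuthus', 'Mammuthus', 'Mammuthus', 'ALL_', 'ALL_', 'ALL_'],
        'Species': ['Species2', 'Species1', 'Species3', 'Species3', 'Species1', 'Species2', 'Species3'],
        'Value': [100, 200, 300, 900, 400, 500, 600]}
df = pd.DataFrame(data)

# Step 1: number each repeat of a (Type, Species) pair, starting at 1.
numbered = df.assign(idx=df.groupby(['Type', 'Species']).cumcount().add(1))

# Step 2: one row per (Species, idx), one column per Type.
# (Species3, 2) exists only for Mammuthus, so its ALL_ cell is NaN.
wide = numbered.pivot(index=['Species', 'idx'], columns='Type', values='Value')

# Step 3: within each Species, carry the previous value forward,
# which copies the ALL_ value 600 into the (Species3, 2) row.
filled = wide.groupby(level='Species').ffill()

print(wide)
print(filled)
```

The forward fill is what creates the new ALL_ row: after `melt`, every (Species, idx) pair exists once per Type. The NaN introduced by the pivot is also why the final `value` column is float rather than int.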

Original answer

You can use a custom groupby.cumcount for each group:

df['Species'] += (df.groupby('Type')['Species']
                    .transform(lambda s: s.groupby(s).cumcount().add(1)
                                          .astype(str).radd('_')
                                          .where(s.duplicated(keep=False), '')
                              )
                 )
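
The lambda above is easier to follow on a single group. Here is a minimal sketch on a standalone Series holding the Species values of one Type; the names `s`, `counter`, `suffix` and `result` are chosen for illustration only.

```python
import pandas as pd

# Species values of one Type, taken from the question's example.
s = pd.Series(['Species2', 'Species1', 'Species3', 'Species3'])

# Position of each value among its equals, starting at 1.
counter = s.groupby(s).cumcount().add(1)

# Build '_1', '_2', ... and blank the suffix where the value is unique.
suffix = counter.astype(str).radd('_').where(s.duplicated(keep=False), '')

result = s + suffix
print(result.tolist())
# ['Species2', 'Species1', 'Species3_1', 'Species3_2']
```

`duplicated(keep=False)` flags every member of a repeated value, not only the later ones, so the first Species3 also gets a suffix.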

Alternative syntax:

m = df.duplicated(subset=['Type', 'Species'], keep=False)
df.loc[m, 'Species'] += (m[m].groupby([df['Type'], df['Species']])
                         .cumcount().add(1).astype(str).radd('_')
                        )

Output:

   Type     Species  Value
0   Dog    Species2    100
1   Dog    Species1    200
2   Dog  Species3_1    300
3   Dog  Species3_2    900
4  ALL_    Species1    400
5  ALL_    Species2    500
6  ALL_    Species3    600
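
As a quick sanity check, the two variants of the original answer can be run side by side on copies of the question's Dog data. This is a sketch only; the copies `a` and `b` are hypothetical names introduced here.

```python
import pandas as pd

df = pd.DataFrame({
    'Type': ['Dog', 'Dog', 'Dog', 'Dog', 'ALL_', 'ALL_', 'ALL_'],
    'Species': ['Species2', 'Species1', 'Species3', 'Species3',
                'Species1', 'Species2', 'Species3'],
    'Value': [100, 200, 300, 900, 400, 500, 600],
})

# Variant 1: per-Type transform with a nested cumcount.
a = df.copy()
a['Species'] += (a.groupby('Type')['Species']
                  .transform(lambda s: s.groupby(s).cumcount().add(1)
                                        .astype(str).radd('_')
                                        .where(s.duplicated(keep=False), '')))

# Variant 2: boolean mask of repeated (Type, Species) pairs.
b = df.copy()
m = b.duplicated(subset=['Type', 'Species'], keep=False)
b.loc[m, 'Species'] += (m[m].groupby([b['Type'], b['Species']])
                         .cumcount().add(1).astype(str).radd('_'))

# Both variants should produce the same Species column.
print(a['Species'].tolist() == b['Species'].tolist())
```

Neither variant adds rows, so Species3 in ALL_ keeps its name, as in the output above.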
