Pandas: adding an incrementing index

hc8w905p  posted on 2023-08-01  in Other

I have the following pandas DataFrame:

Key Value 
A    10
A    20
B    30
B    40
C    50
A    60
A    70
A    70
B    80
A    90

I need to create an index that increments only when a key appears again after a run of different keys. So I need output like this:

Key Value Index
A    10     1
A    20     1
B    30     1
B    40     1
C    50     1
A    60     2
A    70     2
A    70     2
B    80     2
A    90     3


I tried using groupby together with cumcount() + 1, but it does not give this result.
Thank you.
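
For reference, here is a minimal sketch of what that attempt returns on the sample data. It assumes the call was written as `df.groupby('Key').cumcount() + 1`; the exact expression is not shown in the question.

```python
import pandas as pd

df = pd.DataFrame({
    'Key': ['A', 'A', 'B', 'B', 'C', 'A', 'A', 'A', 'B', 'A'],
    'Value': [10, 20, 30, 40, 50, 60, 70, 70, 80, 90],
})

# Assumed form of the attempt: cumcount() numbers the rows inside each Key
# group (0, 1, 2, ...), so every row of a key gets a new number, even when
# it belongs to the same consecutive run as the row before it.
attempt = df.groupby('Key').cumcount() + 1
print(attempt.tolist())  # [1, 2, 1, 2, 1, 3, 4, 5, 3, 6]
```

The counter advances per row rather than per run of a key, which is why it does not match the desired Index column.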

zpqajqem  #1

import pandas as pd

df = pd.DataFrame({
    'Key': ['A', 'A', 'B', 'B', 'C', 'A', 'A', 'A', 'B', 'A'],
    'Value': [10, 20, 30, 40, 50, 60, 70, 70, 80, 90]
})

df['Index'] = (df.Key != df.Key.shift()).cumsum()
df['Index'] = df.groupby('Key')['Index'].rank(method='dense').astype(int)

print(df)  # display(df) also works inside a Jupyter notebook


  Key  Value  Index
0   A     10      1
1   A     20      1
2   B     30      1
3   B     40      1
4   C     50      1
5   A     60      2
6   A     70      2
7   A     70      2
8   B     80      2
9   A     90      3

A quick breakdown of what is happening:

# df.Key != df.Key.shift() checks whether each Key differs from the Key on the previous row, returning a boolean Series.
# cumsum() then takes the running total of that Series, which gives every consecutive run of identical keys its own number.

(df.Key != df.Key.shift()).cumsum()

# The line below dense-ranks those run numbers within each Key group, so the first run of a key gets 1, its second run gets 2, and so on.

df.groupby('Key')['Index'].rank(method='dense')
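
To make the two steps above concrete, here is a small sketch that keeps the intermediate values in separate variables. The names `starts`, `run_id` and `index` are illustrative and do not appear in the answer.

```python
import pandas as pd

keys = pd.Series(['A', 'A', 'B', 'B', 'C', 'A', 'A', 'A', 'B', 'A'], name='Key')

# True wherever a new consecutive run of the same key starts.
starts = keys != keys.shift()

# Running total of run starts: one id per consecutive run.
run_id = starts.cumsum().rename('run_id')

# Dense rank of the run ids inside each key: 1st run of 'A', 2nd run of 'A', ...
index = run_id.groupby(keys).rank(method='dense').astype(int)

print(run_id.tolist())  # [1, 1, 2, 2, 3, 4, 4, 4, 5, 6]
print(index.tolist())   # [1, 1, 1, 1, 1, 2, 2, 2, 2, 3]
```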

ttp71kqs  #2

Use an ordered Categorical and numpy.cumsum:

import numpy as np

s = pd.Categorical(df['Key'], ordered=True)
df['Index'] = np.cumsum(s<s.shift())+1


If you want a custom order, pass categories=['X', 'Z', 'Y'] to pd.Categorical.

Alternatively, as @SimonT commented, if your categories are already in lexicographic order:

df['Index'] = np.cumsum(df['Key']<df['Key'].shift())+1


Output:

Key  Value  Index
0   A     10      1
1   A     20      1
2   B     30      1
3   B     40      1
4   C     50      1
5   A     60      2
6   A     70      2
7   A     70      2
8   B     80      2
9   A     90      3
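
The custom-order note in this answer can be sketched as follows. The keys and their intended order (X, then Z, then Y) are hypothetical and chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical keys whose intended cycle order is X, then Z, then Y.
keys = pd.Series(['X', 'X', 'Z', 'Y', 'X', 'Z', 'Y', 'Y', 'X'])

# The explicit categories list defines the order used by the comparison.
s = pd.Categorical(keys, categories=['X', 'Z', 'Y'], ordered=True)

# A row starts a new cycle when its key sorts before the previous row's key.
index = np.cumsum(s < s.shift()) + 1
print(index.tolist())  # [1, 1, 1, 1, 2, 2, 2, 2, 3]
```

Note that this approach increments whenever the key moves backwards in the chosen order, so it relies on the keys in each cycle appearing in that order.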

fgw7neuy  #3

Another approach is to compute the dense rank with pd.factorize:

df['Index'] = (df['Key'] != df['Key'].shift()).cumsum()
df['Index'] = df.groupby('Key')['Index'].transform(lambda x: pd.factorize(x)[0] + 1)


Output:

Key  Value  Index
0   A     10      1
1   A     20      1
2   B     30      1
3   B     40      1
4   C     50      1
5   A     60      2
6   A     70      2
7   A     70      2
8   B     80      2
9   A     90      3

a1o7rhls  #4

Try this:

df['Key'].ne(df['Key'].shift()).groupby(df['Key']).cumsum()

Or:

df.loc[df['Key'].ne(df['Key'].shift())].groupby('Key').cumcount().add(1).reindex(df.index,method = 'ffill')


Output:

0    1
1    1
2    1
3    1
4    1
5    2
6    2
7    2
8    2
9    3
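
As a rough cross-check, the sketch below runs the run-based approaches from the answers on the sample data and compares each result with the expected Index column. The dictionary labels are illustrative, and the boolean Series is cast to int before the grouped cumsum as a precaution that is not part of the original one-liner.

```python
import pandas as pd

df = pd.DataFrame({
    'Key': ['A', 'A', 'B', 'B', 'C', 'A', 'A', 'A', 'B', 'A'],
    'Value': [10, 20, 30, 40, 50, 60, 70, 70, 80, 90],
})

starts = df['Key'].ne(df['Key'].shift())  # True at the start of each run
run_id = starts.cumsum()                  # one id per consecutive run

results = {
    'rank': run_id.groupby(df['Key']).rank(method='dense').astype(int),
    'factorize': run_id.groupby(df['Key']).transform(
        lambda x: pd.factorize(x)[0] + 1),
    'grouped_cumsum': starts.astype(int).groupby(df['Key']).cumsum(),
    'cumcount': (df.loc[starts].groupby('Key').cumcount().add(1)
                 .reindex(df.index, method='ffill')),
}

expected = [1, 1, 1, 1, 1, 2, 2, 2, 2, 3]
for name, result in results.items():
    print(name, result.tolist() == expected)
```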
