pandas: create a column that indicates, per ID, when a value changes

Posted by 8gsdolmq on 2023-05-12 in Other

I know this is simple, but I'm having a little trouble working out when someone switches roles. My df:

ID     Date         Job 
101     05/2022      Sales 
101     06/2022      Sales 
102     12/2021      Tech    
102     1/2022       Tech 
102     2/2022       Finance 
103     4/2022       HR 
103     5/2022       Sales 
103     6/2022       Tech

Desired output:

ID     Date       Job      Switch 
101    05/2022    Sales      No 
101    06/2022    Sales      No 
102    12/2021    Tech       No 
102    01/2022     Tech       No 
102    02/2022     Finance    Yes 
103    04/2022     HR         No 
103    05/2022     Sales      Yes 
103    06/2022     Tech       Yes

I think the right approach starts with df.groupby('ID')['Job'].

kiz8lqtg #1

You're right: use shift() on the groupby.

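import numpy as np

# Previous job within each ID; the first row of a group has none, so fill it
# with the row's own job and it never counts as a switch
mask = df.groupby(['ID'])['Job'].shift().fillna(df['Job']).ne(df['Job'])
df['Switch'] = np.where(mask, 'Yes', 'No')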

Output:

    ID     Date      Job Switch
0  101  05/2022    Sales     No
1  101  06/2022    Sales     No
2  102  12/2021     Tech     No
3  102   1/2022     Tech     No
4  102   2/2022  Finance    Yes
5  103   4/2022       HR     No
6  103   5/2022    Sales    Yes
7  103   6/2022     Tech    Yes
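
To make the intermediate steps of this answer visible, here is a minimal, self-contained sketch that rebuilds the question's data and splits the one-liner into named pieces. The variable names prev_job and prev_filled are illustrative and do not appear in the answer.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "ID": [101, 101, 102, 102, 102, 103, 103, 103],
    "Date": ["05/2022", "06/2022", "12/2021", "1/2022",
             "2/2022", "4/2022", "5/2022", "6/2022"],
    "Job": ["Sales", "Sales", "Tech", "Tech", "Finance", "HR", "Sales", "Tech"],
})

# Previous job within each ID; the first row of every ID gets NaN
prev_job = df.groupby("ID")["Job"].shift()

# Replace that NaN with the row's own job so first rows compare equal to themselves
prev_filled = prev_job.fillna(df["Job"])

# A switch is any row whose job differs from the (filled) previous job
df["Switch"] = np.where(prev_filled.ne(df["Job"]), "Yes", "No")
print(df)
```

Note that this relies on the rows already being in chronological order within each ID, as they are in the question.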

8yoxcaq7 #2

You can group by ID and then use shift, like this:

import pandas as pd

data = {
    'ID': [101, 101, 102, 102, 102, 103, 103, 103],
    'Date': ['05/2022', '06/2022', '12/2021', '01/2022', '02/2022', '04/2022', '05/2022', '06/2022'],
    'Job': ['Sales', 'Sales', 'Tech', 'Tech', 'Finance', 'HR', 'Sales', 'Tech']
}

df = pd.DataFrame(data)

# Group by 'ID' and compare each row's job with the previous row's job.
# The first row of each group has no previous row, so it must not count as a switch.
prev_job = df.groupby('ID')['Job'].shift()
df['Switch'] = (prev_job.notna() & prev_job.ne(df['Job'])).map({True: 'Yes', False: 'No'})
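
One thing to watch when comparing a column with its shifted copy: the first row of a group has no predecessor, so shift() yields NaN there and a plain != reports True. The sketch below shows this on a single group; the names naive and guarded are illustrative only.

```python
import pandas as pd

# One group's worth of jobs, in chronological order
jobs = pd.Series(["Tech", "Tech", "Finance"])

# Naive comparison: the first row is compared with NaN, which is never equal
naive = jobs != jobs.shift()

# Guarded comparison: only count a change when a previous row actually exists
prev = jobs.shift()
guarded = prev.notna() & prev.ne(jobs)

print(naive.tolist())    # [True, False, True]
print(guarded.tolist())  # [False, False, True]
```

With the guarded form, the first row of each ID maps to 'No', which is what the desired output in the question asks for.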

vaqhlq81 #3

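Use groupby.transform to compare each row's job with the first job recorded for its ID. This reproduces the desired output for the sample data, but it flags every row that differs from the first job rather than every change from the previous row: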

df['Switch'] = df['Job'].ne(df.groupby('ID')['Job'].transform('first'))

Output:

    ID     Date      Job  Switch
0  101  05/2022    Sales   False
1  101  06/2022    Sales   False
2  102  12/2021     Tech   False
3  102   1/2022     Tech   False
4  102   2/2022  Finance    True
5  103   4/2022       HR   False
6  103   5/2022    Sales    True
7  103   6/2022     Tech    True

If you prefer Yes/No:

import numpy as np

df['Switch'] = np.where(df['Job'].ne(df.groupby('ID')['Job'].transform('first')),
                        'Yes', 'No')

Output:

    ID     Date      Job Switch
0  101  05/2022    Sales     No
1  101  06/2022    Sales     No
2  102  12/2021     Tech     No
3  102   1/2022     Tech     No
4  102   2/2022  Finance    Yes
5  103   4/2022       HR     No
6  103   5/2022    Sales    Yes
7  103   6/2022     Tech    Yes
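
Comparing with the group's first job and comparing with the previous row agree on the question's data, but they can diverge. The sketch below uses a made-up ID (201) whose holder returns to the original job; the column names DiffersFromFirst and ChangedFromPrev are illustrative only.

```python
import pandas as pd

# Hypothetical history: Tech -> Finance -> Finance -> Tech
df = pd.DataFrame({
    "ID": [201, 201, 201, 201],
    "Job": ["Tech", "Finance", "Finance", "Tech"],
})

first_job = df.groupby("ID")["Job"].transform("first")  # first job per ID, broadcast to every row
prev_job = df.groupby("ID")["Job"].shift()              # previous job per ID, NaN on first rows

df["DiffersFromFirst"] = df["Job"].ne(first_job)
df["ChangedFromPrev"] = prev_job.notna() & df["Job"].ne(prev_job)

print(df["DiffersFromFirst"].tolist())  # [False, True, True, False]
print(df["ChangedFromPrev"].tolist())   # [False, True, False, True]
```

Which of the two is right depends on whether "switch" means "is no longer in the original role" or "changed role since the previous record".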

shyt4zoc #4

You can use transform with a custom function:

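# Shift within the group, filling the first row with the group's first value
# so that it never counts as a switch
has_switched = lambda x: x.shift(fill_value=x.iloc[0]).ne(x)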

df['Switch'] = (df.groupby('ID')['Job']
                  .transform(has_switched)
                  .replace({True: 'Yes', False: 'No'}))

Output:

>>> df
    ID     Date      Job Switch
0  101  05/2022    Sales     No
1  101  06/2022    Sales     No
2  102  12/2021     Tech     No
3  102   1/2022     Tech     No
4  102   2/2022  Finance    Yes
5  103   4/2022       HR     No
6  103   5/2022    Sales    Yes
7  103   6/2022     Tech    Yes
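
All of the approaches above compare neighbouring rows, so they assume the rows are already in chronological order within each ID. If that is not guaranteed, one option is to parse Date and sort first. The following is a sketch under that assumption, using deliberately shuffled rows for a single ID; it is not taken from any of the answers.

```python
import pandas as pd

# Rows for one ID, deliberately out of chronological order
df = pd.DataFrame({
    "ID": [102, 102, 102],
    "Date": ["2/2022", "12/2021", "1/2022"],
    "Job": ["Finance", "Tech", "Tech"],
})

# Parse the month/year strings into timestamps so that sorting is chronological
df["Date"] = pd.to_datetime(df["Date"], format="%m/%Y")
df = df.sort_values(["ID", "Date"]).reset_index(drop=True)

# Compare each row with the previous row of the same ID
prev_job = df.groupby("ID")["Job"].shift()
df["Switch"] = (prev_job.notna() & prev_job.ne(df["Job"])).map({True: "Yes", False: "No"})
print(df)
```

Sorting the raw strings instead would put '12/2021' after '1/2022', which is why the dates are parsed before sorting.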
