numpy: Populate a column based on group and condition

Posted by wvmv3b1j on 2023-05-07 in Other

Consider the following pandas DataFrame:

import pandas as pd

df = pd.DataFrame({'Make': ['Tesla','Tesla','Tesla','Toyota','Ford','Ford','Ford','BMW','BMW','BMW','Mercedes','Mercedes','Mercedes'],
                   'Type': ['Model X','Model X','Model X','Corolla','Bronco','Bronco','Mustang','3 Series','3 Series','7 Series','C-Class','C-Class','S-Class'],
                   'Year': [2015, 2015, 2015, 2017, 2018, 2018, 2020, 2015, 2015, 2017, 2018, 2018, 2020],
                   'Price': [85000, 90000, 95000, 20000, 35000, 35000, 45000, 40000, 40000, 65000, 50000, 50000, 75000],
                   'Color': ['White','White','White','Red','Blue','Blue','Yellow','Silver','Silver','Black','White','White','Black'],
                   'Code'  : ['TSLABG','TSLA',None,'TYTA','FRD','_BG',None,None,'BMW','BMW','MercedesBG','Mercedes_BG','MercedesBG']
                  })
df
     Make   Type       Year Price   Color   Code
0   Tesla   Model X    2015 85000   White   TSLABG
1   Tesla   Model X    2015 90000   White   TSLA
2   Tesla   Model X    2015 95000   White   None
3   Toyota  Corolla    2017 20000   Red     TYTA
4    Ford   Bronco     2018 35000   Blue    FRD 
5    Ford   Bronco     2018 35000   Blue    _BG
6    Ford   Mustang    2020 45000   Yellow  None
7     BMW   3 Series   2015 40000   Silver  None
8     BMW   3 Series   2015 40000   Silver  BMW
9     BMW   7 Series   2017 65000   Black   BMW
10 Mercedes C-Class    2018 50000   White   MercedesBG
11 Mercedes C-Class    2018 50000   White   Mercedes_BG
12 Mercedes S-Class    2020 75000   Black   MercedesBG

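I am trying to update the Code column based on the Make column. If Code is None, it has to be filled in from the other Code values of the same Make. In other words, if a Code is defined for any row of a Make, that value should be used to fill the None values of that Make. In addition, if BG or _BG is appended to any Code value of a Make, every Code value of that Make should get BG or _BG appended, respectively.
BMW has no BG or _BG on its existing Code values, so nothing is appended when its None is replaced. For Ford, one value is _BG and another is FRD, so every Ford Code should be FRD_BG.
Mercedes already has Code values with BG appended, so if any of its Code values has a _ in front of BG, that _ should be removed.
The code I tried: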
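The rules described above can be restated as a small pure-Python sketch, which may help when checking any pandas solution against them. The function name `fill_codes` and the tie-break (prefer `BG` over `_BG` when a Make has both) are assumptions read off the Mercedes example, not something the post states in general.

```python
import re

# Matches a trailing "BG" or "_BG" suffix on a code.
SUFFIX = re.compile(r"(_?BG)$")

def fill_codes(makes, codes):
    """Return one code per row, filled and normalised per make (hypothetical helper)."""
    stems, suffixes = {}, {}
    for make, code in zip(makes, codes):
        if code is None:
            continue
        match = SUFFIX.search(code)
        stem = code[:match.start()] if match else code
        if stem:
            # Keep the first non-empty stem seen for this make.
            stems.setdefault(make, stem)
        if match:
            suffixes.setdefault(make, set()).add(match.group(1))

    result = []
    for make in makes:
        found = suffixes.get(make, set())
        # Assumption: "BG" wins over "_BG" when both occur (Mercedes case).
        suffix = "BG" if "BG" in found else "_BG" if "_BG" in found else ""
        result.append(stems.get(make, "") + suffix)
    return result

makes = ['Tesla', 'Tesla', 'Tesla', 'Toyota', 'Ford', 'Ford', 'Ford',
         'BMW', 'BMW', 'BMW', 'Mercedes', 'Mercedes', 'Mercedes']
codes = ['TSLABG', 'TSLA', None, 'TYTA', 'FRD', '_BG', None,
         None, 'BMW', 'BMW', 'MercedesBG', 'Mercedes_BG', 'MercedesBG']
filled = fill_codes(makes, codes)
```

With the sample data, `filled` contains `TSLABG` for the Tesla rows, `FRD_BG` for Ford, `BMW` for BMW, and `MercedesBG` for Mercedes.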

code = (df['Code'].str.split('(_)', expand=True).add_prefix('part').replace('-', None).groupby(df['Make']).transform('first').fillna('').agg(''.join, axis=1))
df['Code'] = code
df

The output I get contains MercedesBG_BG, but I expect MercedesBG:

Make   Type       Year Price   Color   Code
0   Tesla   Model X    2015 85000   White   TSLABG
1   Tesla   Model X    2015 90000   White   TSLABG
2   Tesla   Model X    2015 95000   White   TSLABG
3   Toyota  Corolla    2017 20000   Red     TYTA
4    Ford   Bronco     2018 35000   Blue    FRD_BG
5    Ford   Bronco     2018 35000   Blue    FRD_BG
6    Ford   Mustang    2020 45000   Yellow  FRD_BG
7     BMW   3 Series   2015 40000   Silver  BMW
8     BMW   3 Series   2015 40000   Silver  BMW
9     BMW   7 Series   2017 65000   Black   BMW
10 Mercedes C-Class    2018 50000   White   MercedesBG_BG
11 Mercedes C-Class    2018 50000   White   MercedesBG_BG
12 Mercedes S-Class    2020 75000   Black   MercedesBG_BG

The expected output is:

Make   Type       Year Price   Color   Code
0   Tesla   Model X    2015 85000   White   TSLABG
1   Tesla   Model X    2015 90000   White   TSLABG
2   Tesla   Model X    2015 95000   White   TSLABG
3   Toyota  Corolla    2017 20000   Red     TYTA
4    Ford   Bronco     2018 35000   Blue    FRD_BG
5    Ford   Bronco     2018 35000   Blue    FRD_BG
6    Ford   Mustang    2020 45000   Yellow  FRD_BG
7     BMW   3 Series   2015 40000   Silver  BMW
8     BMW   3 Series   2015 40000   Silver  BMW
9     BMW   7 Series   2017 65000   Black   BMW
10 Mercedes C-Class    2018 50000   White   MercedesBG
11 Mercedes C-Class    2018 50000   White   MercedesBG
12 Mercedes S-Class    2020 75000   Black   MercedesBG

Answer #1 by gg58donl

You can split on (BG|_BG) instead of (_) to get the desired result:

code = (df['Code'].str.split('(BG|_BG)', expand=True).add_prefix('part').replace(
    '-', None).groupby(df['Make']).transform('first').fillna('').agg(''.join, axis=1))
df['Code'] = code

df

Make      Type  Year  Price   Color        Code
0      Tesla   Model X  2015  85000   White      TSLABG
1      Tesla   Model X  2015  90000   White      TSLABG
2      Tesla   Model X  2015  95000   White      TSLABG
3     Toyota   Corolla  2017  20000     Red        TYTA
4       Ford    Bronco  2018  35000    Blue      FRD_BG
5       Ford    Bronco  2018  35000    Blue      FRD_BG
6       Ford   Mustang  2020  45000  Yellow      FRD_BG
7        BMW  3 Series  2015  40000  Silver         BMW
8        BMW  3 Series  2015  40000  Silver         BMW
9        BMW  7 Series  2017  65000   Black         BMW
10  Mercedes   C-Class  2018  50000   White  MercedesBG
11  Mercedes   C-Class  2018  50000   White  MercedesBG
12  Mercedes   S-Class  2020  75000   Black  MercedesBG
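
To see why the pattern change matters, the sketch below reproduces the split with Python's `re.split`, on the assumption that pandas' regex-based `str.split` keeps captured separators in the same way. The helper `first_per_position` is hypothetical: it only imitates, for a single group, what `groupby(...).transform('first')` followed by `''.join` does (take the first non-null piece in each column, then concatenate).

```python
import re
from itertools import zip_longest

def first_per_position(rows):
    """Join the first non-missing piece found at each position across rows."""
    pieces = []
    for column in zip_longest(*rows):  # short rows are padded with None
        pieces.append(next((v for v in column if v is not None), ""))
    return "".join(pieces)

mercedes = ["MercedesBG", "Mercedes_BG", "MercedesBG"]

# Splitting on "(_)" leaves "MercedesBG" whole, so the "_" and "BG"
# pieces of the second row get glued onto it afterwards.
old_rows = [re.split(r"(_)", code) for code in mercedes]
# old_rows[1] is ['Mercedes', '_', 'BG']
old_code = first_per_position(old_rows)   # 'MercedesBG_BG'

# Splitting on "(BG|_BG)" separates stem and suffix in every row,
# so the first row supplies both 'Mercedes' and 'BG'.
new_rows = [re.split(r"(BG|_BG)", code) for code in mercedes]
# new_rows[0] is ['Mercedes', 'BG', ''], new_rows[1] is ['Mercedes', '_BG', '']
new_code = first_per_position(new_rows)   # 'MercedesBG'

# Ford: the stem comes from 'FRD' and the suffix from '_BG'.
ford_code = first_per_position([re.split(r"(BG|_BG)", c) for c in ["FRD", "_BG"]])
```

Note that this pattern splits on the first BG anywhere in the code, so a stem that itself contains BG would be cut in the wrong place. That does not happen in the sample data.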
