numpy: Fill a pandas DataFrame column with values based on earlier values from the same group

yfwxisqw, posted on 2023-05-07 in Other

Consider the following DataFrame:

import pandas as pd

df = pd.DataFrame({'Make': ['Tesla','Tesla','Tesla','Toyota','Ford','Ford','Ford','BMW','BMW','BMW','Mercedes','Mercedes','Mercedes'],
                   'Type': ['Model X','Model X','Model X','Corolla','Bronco','Bronco','Mustang','3 Series','3 Series','7 Series','C-Class','C-Class','S-Class'],
                   'Year': [2015, 2015, 2015, 2017, 2018, 2018, 2020, 2015, 2015, 2017, 2018, 2018, 2020],
                   'Price': [85000, 90000, 95000, 20000, 35000, 35000, 45000, 40000, 40000, 65000, 50000, 50000, 75000],
                   'Color': ['White','White','White','Red','Blue','Blue','Yellow','Silver','Silver','Black','White','White','Black'],
                   'Code'  : ['TSLA_BG','TSLA','-','TYTA','FRD','-_BG','-','-','BMW','BMW','Mercedes_BG','Mercedes_BG','Mercedes_BG']
                  })
df
     Make   Type       Year Price   Color   Code
0   Tesla   Model X    2015 85000   White   TSLA_BG
1   Tesla   Model X    2015 90000   White   TSLA
2   Tesla   Model X    2015 95000   White   -
3   Toyota  Corolla    2017 20000   Red     TYTA
4    Ford   Bronco     2018 35000   Blue    FRD
5    Ford   Bronco     2018 35000   Blue    -_BG
6    Ford   Mustang    2020 45000   Yellow  -
7     BMW   3 Series   2015 40000   Silver  -
8     BMW   3 Series   2015 40000   Silver  BMW
9     BMW   7 Series   2017 65000   Black   BMW
10 Mercedes C-Class    2018 50000   White   Mercedes_BG
11 Mercedes C-Class    2018 50000   White   Mercedes_BG
12 Mercedes S-Class    2020 75000   Black   Mercedes_BG

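I am trying to update the Code column based on the Make column. If Code contains -, it must be filled in from the other Code values of the same Make. In other words, if any row of a Make has a code defined in the Code column, that value should be used to fill the - values in Code. In addition, if _BG is appended to any code value of a Make, then _BG should be appended to every Code value for that Make. Since none of the existing BMW code values has _BG, no _BG is appended when replacing - for BMW.
The expected output is: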

     Make   Type       Year Price   Color   Code
0   Tesla   Model X    2015 85000   White   TSLA_BG
1   Tesla   Model X    2015 90000   White   TSLA_BG
2   Tesla   Model X    2015 95000   White   TSLA_BG
3   Toyota  Corolla    2017 20000   Red     TYTA
4    Ford   Bronco     2018 35000   Blue    FRD_BG
5    Ford   Bronco     2018 35000   Blue    FRD_BG
6    Ford   Mustang    2020 45000   Yellow  FRD_BG
7     BMW   3 Series   2015 40000   Silver  BMW
8     BMW   3 Series   2015 40000   Silver  BMW
9     BMW   7 Series   2017 65000   Black   BMW
10 Mercedes C-Class    2018 50000   White   Mercedes_BG
11 Mercedes C-Class    2018 50000   White   Mercedes_BG
12 Mercedes S-Class    2020 75000   Black   Mercedes_BG
ix0qys7i #1

It's a bit tricky, but this should work:

code = (df['Code'].str.split('(_)', expand=True).add_prefix('part').replace('-', None)
                  .groupby(df['Make']).transform('first').fillna('').agg(''.join, axis=1))
df['Code'] = code
print(df)

# Output
        Make      Type  Year  Price   Color         Code
0      Tesla   Model X  2015  85000   White      TSLA_BG
1      Tesla   Model X  2015  90000   White      TSLA_BG
2      Tesla   Model X  2015  95000   White      TSLA_BG
3     Toyota   Corolla  2017  20000     Red         TYTA
4       Ford    Bronco  2018  35000    Blue       FRD_BG
5       Ford    Bronco  2018  35000    Blue       FRD_BG
6       Ford   Mustang  2020  45000  Yellow       FRD_BG
7        BMW  3 Series  2015  40000  Silver          BMW
8        BMW  3 Series  2015  40000  Silver          BMW
9        BMW  7 Series  2017  65000   Black          BMW
10  Mercedes   C-Class  2018  50000   White  Mercedes_BG
11  Mercedes   C-Class  2018  50000   White  Mercedes_BG
12  Mercedes   S-Class  2020  75000   Black  Mercedes_BG

Step by step:

# Explode Code column but keep the separator
>>> out = df['Code'].str.split('(_)', expand=True).add_prefix('part')
       part0 part1 part2
0       TSLA     _    BG
1       TSLA  None  None
2          -  None  None
3       TYTA  None  None
4        FRD  None  None
5          -     _    BG
6          -  None  None
7          -  None  None
8        BMW  None  None
9        BMW  None  None
10  Mercedes     _    BG
11  Mercedes     _    BG
12  Mercedes     _    BG
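
The split above depends on a regular-expression detail: when the pattern contains a capturing group, the matched separator is kept in the result instead of being dropped. A minimal sketch with Python's standard `re` module, using sample strings taken from the `Code` column (the variable names are only for illustration):

```python
import re

# A capturing group in the pattern makes re.split keep the separator.
with_group = re.split('(_)', 'TSLA_BG')
# Without the group, the separator is discarded.
without_group = re.split('_', 'TSLA_BG')
# No separator present: the string comes back as a single piece.
no_separator = re.split('(_)', 'TSLA')
# The '-' placeholder ends up in the first piece, so it can be blanked later.
leading_dash = re.split('(_)', '-_BG')

print(with_group)     # ['TSLA', '_', 'BG']
print(without_group)  # ['TSLA', 'BG']
print(no_separator)   # ['TSLA']
print(leading_dash)   # ['-', '_', 'BG']
```

This is why rows such as `TSLA` only fill `part0`, while `-_BG` still contributes `_` and `BG` to `part1` and `part2`.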

# Replace - by None
>>> out = out.replace('-', None)
       part0 part1 part2
0       TSLA     _    BG
1       TSLA  None  None
2       None  None  None
3       TYTA  None  None
4        FRD  None  None
5       None     _    BG
6       None  None  None
7       None  None  None
8        BMW  None  None
9        BMW  None  None
10  Mercedes     _    BG
11  Mercedes     _    BG
12  Mercedes     _    BG

# Broadcast the first non-null value of each group (None is skipped)
>>> out = out.groupby(df['Make']).transform('first')
       part0 part1 part2
0       TSLA     _    BG
1       TSLA     _    BG
2       TSLA     _    BG
3       TYTA  None  None
4        FRD     _    BG
5        FRD     _    BG
6        FRD     _    BG
7        BMW  None  None
8        BMW  None  None
9        BMW  None  None
10  Mercedes     _    BG
11  Mercedes     _    BG
12  Mercedes     _    BG
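
The broadcast step works because the `'first'` aggregation skips missing values. A small sketch on made-up data (the `values` and `keys` Series are hypothetical, not part of the question) shows the behaviour in isolation:

```python
import pandas as pd

# Hypothetical mini example: the Ford group starts with a missing value.
values = pd.Series([None, 'FRD', None, 'BMW'], dtype=object)
keys = pd.Series(['Ford', 'Ford', 'Ford', 'BMW'])

# 'first' picks the first non-null value per group; transform broadcasts it
# back to every row of that group, keeping the original row order.
filled = values.groupby(keys).transform('first')
print(filled.tolist())  # ['FRD', 'FRD', 'FRD', 'BMW']
```

Because each `part` column is handled independently, a group can take its base code from one row and its `_BG` suffix from another, which is what happens for Ford.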

# Merge strings
>>> out = out.fillna('').agg(''.join, axis=1)
0         TSLA_BG
1         TSLA_BG
2         TSLA_BG
3            TYTA
4          FRD_BG
5          FRD_BG
6          FRD_BG
7             BMW
8             BMW
9             BMW
10    Mercedes_BG
11    Mercedes_BG
12    Mercedes_BG
dtype: object
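
One caveat about the `replace('-', None)` step: as far as I know, some older pandas releases treated a `None` replacement value as a request to pad-fill from the previous row rather than to insert a missing value, so the result can depend on the installed version. A sketch of the same pipeline that avoids the question by using `mask` instead, on a shortened, hypothetical subset of the data:

```python
import pandas as pd

# Shortened sample with the same patterns as the question's data.
df = pd.DataFrame({
    'Make': ['Tesla', 'Tesla', 'Ford', 'Ford', 'BMW', 'BMW'],
    'Code': ['TSLA_BG', '-', 'FRD', '-_BG', '-', 'BMW'],
})

# Split on '_' but keep the separator as its own column.
parts = df['Code'].str.split('(_)', expand=True)
# Turn the '-' placeholder into a missing value without calling replace().
parts = parts.mask(parts == '-')
# Take the first non-null piece per Make, then glue the pieces back together.
parts = parts.groupby(df['Make']).transform('first')
result = parts.fillna('').agg(''.join, axis=1)

print(result.tolist())
# ['TSLA_BG', 'TSLA_BG', 'FRD_BG', 'FRD_BG', 'BMW', 'BMW']
```

`mask` sets every cell where the condition holds to a missing value, so it should give the same intermediate table as the `replace` call above on recent pandas versions.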

Another way:

lhs = df['Code'].str.extract('^([^-_]+)', expand=False).groupby(df['Make']).transform('first')
rhs = df['Code'].str.extract('(_.*)$', expand=False).groupby(df['Make']).transform('first')
df['Code'] = lhs + rhs.fillna('')
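
To make the two patterns easier to follow, here is a small sketch of what each `str.extract` call returns before the `groupby` step. The `codes` Series is a hypothetical sample built from values in the question:

```python
import pandas as pd

codes = pd.Series(['TSLA_BG', 'TSLA', '-', '-_BG'])

# Leading run of characters that are neither '-' nor '_': the base code.
base = codes.str.extract('^([^-_]+)', expand=False)
# Everything from the first '_' to the end of the string: the suffix.
suffix = codes.str.extract('(_.*)$', expand=False)

# Missing matches are NaN; show them as empty strings for readability.
print(base.fillna('').tolist())    # ['TSLA', 'TSLA', '', '']
print(suffix.fillna('').tolist())  # ['_BG', '', '', '_BG']
```

Only `rhs` is filled with `''` in the snippet above, so a Make without any usable base code would presumably end up with NaN in `Code`, since adding a string to NaN stays NaN. That case does not occur in the sample data.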
