pandas: implode rows and create a new column

n53p2ov0 posted on 2022-11-27 in Other

How can I create a column of unique details, conditional on whether fruit in the type column is followed by fruit -2? detail1 or detail2 may be NaN.

df type       detail1   detail2        name  
0  fruit                               apple
1  fruit -2   best      best           apple
2             yellow    yellowish      apple
3             green                    apple
4  fruit                               banana
5  sub
6  fruit -2   best      best           banana
7             yellow    orange         banana
8             green     brown          banana
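For a reproducible setup, the input above can be built roughly like this. The column names come from the question; treating every blank cell (including the missing name on row 5) as NaN/None is an assumption, since the posted table does not show how blanks are stored.

```python
import numpy as np
import pandas as pd

# Sample data from the question. Blank cells are assumed to be missing
# values (NaN / None), not empty strings.
df = pd.DataFrame({
    "type": ["fruit", "fruit -2", np.nan, np.nan,
             "fruit", "sub", "fruit -2", np.nan, np.nan],
    "detail1": [np.nan, "best", "yellow", "green",
                np.nan, np.nan, "best", "yellow", "green"],
    "detail2": [np.nan, "best", "yellowish", np.nan,
                np.nan, np.nan, "best", "orange", "brown"],
    "name": ["apple", "apple", "apple", "apple",
             "banana", None, "banana", "banana", "banana"],
})

print(df)
```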

Expected output

df type       detail1   detail2        name     unique_detail
0  fruit                               apple    [best, yellow, yellowish, green ]
1  fruit -2   best      best           apple    [best, yellow, yellowish, green ]
2             yellow    yellowish      apple    [best, yellow, yellowish, green ]
3             green                    apple    [best, yellow, yellowish, green ]
4  fruit                               banana   sub: [yellow, orange, green, brown]
5  sub
6  fruit -2                            banana   sub:[yellow, orange, green, brown]
7             yellow    orange         banana   sub:[yellow, orange, green, brown]
8             green     brown          banana   sub:[yellow, orange, green, brown]

What I tried:

m = df.type.eq("fruit") & df.type.shift(-1).ne("fruit -2")
df["detail"] = df.detail1 + df.detail2
df["detail"] = df.groupby("type").transform("unique")
df["detail"] = df["detail"].mask(m, "sub:"+df.detail)

Answer #1, by avwztpqn

The exact logic is not fully clear, but you should use a custom function with groupby.apply:

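def process(df):
    # row directly after a 'fruit' row...
    m1 = df['type'].shift().eq('fruit')
    # ...whose type is not 'fruit -2' (e.g. 'sub')
    m2 = df['type'].ne('fruit -2')
    # detail rows have no type
    m3 = df['type'].isnull()
    
    # use that row's type (e.g. 'sub') as the prefix, if there is one
    prefix = next(iter(df.loc[m1&m2, 'type']), '')
    if prefix:
        prefix += ': '
    
    # unique non-null detail1/detail2 values of the detail rows
    return prefix + str(df[m3].filter(regex='^detail').stack().dropna().unique())

# forward-fill the name so the 'sub' row joins the banana group
group = df['name'].ffill()

s = df.groupby(group).apply(process)

df['unique_detail'] = group.map(s)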

You could also use this as the grouper:

group = (df['type'].eq('fruit')
         &df['type'].shift(-1).ne('fruit -2')
         ).cumsum()
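A quick way to see what this grouper does is to run it on a small hypothetical type column with three blocks, where the third fruit is directly followed by fruit -2. The grouper only starts a new group at a fruit row that is not followed by fruit -2, so that third block is merged into the second one. On the question's own data both variants below separate apple from banana; t.eq("fruit").cumsum() is shown only as an alternative, under the assumption that every block starts with a fruit row.

```python
import pandas as pd

# Hypothetical 'type' column: three blocks, each starting with 'fruit'.
t = pd.Series(["fruit", "fruit -2", None,
               "fruit", "sub", None,
               "fruit", "fruit -2", None])

# Grouper from the answer: a new group starts only at a 'fruit' row
# whose next row is not 'fruit -2'.
g_answer = (t.eq("fruit") & t.shift(-1).ne("fruit -2")).cumsum()

# Alternative: a new group starts at every 'fruit' row.
g_every_fruit = t.eq("fruit").cumsum()

print(g_answer.tolist())       # [0, 0, 0, 1, 1, 1, 1, 1, 1]
print(g_every_fruit.tolist())  # [1, 1, 1, 2, 2, 2, 3, 3, 3]
```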

Output (of the groupby.apply solution above):

type detail1    detail2    name                             unique_detail
0     fruit     NaN        NaN   apple            ['yellow' 'yellowish' 'green']
1  fruit -2    best       best   apple            ['yellow' 'yellowish' 'green']
2       NaN  yellow  yellowish   apple            ['yellow' 'yellowish' 'green']
3       NaN   green        NaN   apple            ['yellow' 'yellowish' 'green']
4     fruit     NaN        NaN  banana  sub: ['yellow' 'orange' 'green' 'brown']
5       sub     NaN        NaN    None  sub: ['yellow' 'orange' 'green' 'brown']
6  fruit -2    best       best  banana  sub: ['yellow' 'orange' 'green' 'brown']
7       NaN  yellow     orange  banana  sub: ['yellow' 'orange' 'green' 'brown']
8       NaN   green      brown  banana  sub: ['yellow' 'orange' 'green' 'brown']
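The unique_detail values above are the string form of a NumPy array (space-separated, no commas). Below is a self-contained sketch of the same groupby.apply idea that formats a Python list instead, closer to the expected output in the question. The helper name unique_details and the NaN/None encoding of blank cells are assumptions; like the answer, it only takes details from rows whose type is empty, so "best" is left out.

```python
import numpy as np
import pandas as pd

# Sample data from the question; blanks assumed to be NaN / None.
df = pd.DataFrame({
    "type": ["fruit", "fruit -2", np.nan, np.nan,
             "fruit", "sub", "fruit -2", np.nan, np.nan],
    "detail1": [np.nan, "best", "yellow", "green",
                np.nan, np.nan, "best", "yellow", "green"],
    "detail2": [np.nan, "best", "yellowish", np.nan,
                np.nan, np.nan, "best", "orange", "brown"],
    "name": ["apple", "apple", "apple", "apple",
             "banana", None, "banana", "banana", "banana"],
})


def unique_details(g: pd.DataFrame) -> str:
    """Hypothetical helper: build the unique_detail text for one group."""
    # The row right after 'fruit' that is not 'fruit -2' (here: 'sub') gives the prefix.
    after_fruit = g["type"].shift().eq("fruit") & g["type"].ne("fruit -2")
    prefix = next(iter(g.loc[after_fruit, "type"]), "")
    # Detail rows have no type: flatten detail1/detail2 row by row,
    # drop missing values and keep first occurrences in order.
    details = g.loc[g["type"].isna(), ["detail1", "detail2"]].to_numpy().ravel()
    values = pd.Series(details).dropna().drop_duplicates().tolist()
    return f"{prefix}: {values}" if prefix else str(values)


# Forward-fill 'name' so the 'sub' row joins the banana group.
group = df["name"].ffill()
s = df.groupby(group).apply(unique_details)
df["unique_detail"] = group.map(s)
print(df["unique_detail"].tolist())
```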
