Replace values in a pandas DataFrame using multiple conditions

Posted by oknrviil on 2023-06-20 in Other

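I want to create a new column that maps the contents of col1 to a label. If col1 contains only Ted values, the label should be Ted; if it contains both Ted and Not Ted, the label should be Both; if it contains only Not Ted values, the label should be Not Ted.
I know how to do this for two conditions with np.where, but this case is trickier because there are three. How can I do this in Python or pandas?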

Input:
col1
Ted, Ted
Ted, Not Ted
Not Ted, Not Ted
Not Ted, Ted
Ted, Ted
Expected output:
col1                    new_col    
Ted, Ted                Ted
Ted, Not Ted            Both
Not Ted, Not Ted        Not Ted
Not Ted, Ted            Both
Ted, Ted                Ted

j7dteeu8 #1

Use split together with set operations:

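# Split each cell on commas; one distinct name keeps that name, several become "Both"
df["new_col"] = [
    set(lst).pop() if len(set(lst)) < 2 else "Both"
    for lst in df["col1"].str.split(r",\s*")  # raw string avoids an invalid-escape warning
]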

Output:

print(df)

               col1  new_col
0          Ted, Ted      Ted
1      Ted, Not Ted     Both
2  Not Ted, Not Ted  Not Ted
3      Not Ted, Ted     Both
4          Ted, Ted      Ted
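
For readers who want to run the set-based idea end to end, here is a minimal self-contained sketch. The sample frame is rebuilt from the question's input, and `label_row` is a hypothetical helper name introduced only for illustration; it is not part of the original answer.

```python
import re

import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({"col1": ["Ted, Ted", "Ted, Not Ted", "Not Ted, Not Ted",
                            "Not Ted, Ted", "Ted, Ted"]})


def label_row(cell):
    """Return the single distinct name in a cell, or "Both" if there are several."""
    names = set(re.split(r",\s*", cell))
    return next(iter(names)) if len(names) == 1 else "Both"


df["new_col"] = df["col1"].map(label_row)
print(df)
```

Wrapping the logic in a named function makes it easier to test on single strings before applying it to the whole column.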

ijxebb2r #2

Use explode and groupby:

import numpy as np

g = df['col1'].str.split(', *').explode().eq('Ted').groupby(level=0)

# all 'Ted' -> 'Ted'; no 'Ted' at all -> 'Not Ted'; anything else -> 'Both'
df['new_col'] = np.select([g.all(), ~g.any()], ['Ted', 'Not Ted'], 'Both')

Alternatively:

s = (df['col1'].str.split(', *').explode()
     .groupby(level=0).agg(lambda x: list(set(x)))
     )
df['new_col'] = s.str[0].where(s.str.len()==1, 'Both')

Or, using (frozen) set operations:

d = {frozenset(['Ted']): 'Ted', frozenset(['Not Ted']): 'Not Ted'}
df['new_col'] = [d.get(frozenset(x.split(', ')), 'Both') for x in df['col1']]

Output:

               col1  new_col
0          Ted, Ted      Ted
1      Ted, Not Ted     Both
2  Not Ted, Not Ted  Not Ted
3      Not Ted, Ted     Both
4          Ted, Ted      Ted
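
The explode/groupby variant is easier to follow when the intermediate steps are named. The sketch below is one way to unpack it, assuming the question's sample data and a plain `', '` separator; the variable names `exploded`, `all_ted`, and `any_ted` are illustrative only.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({"col1": ["Ted, Ted", "Ted, Not Ted", "Not Ted, Not Ted",
                            "Not Ted, Ted", "Ted, Ted"]})

# One row per name; the original row label is repeated in the index.
exploded = df["col1"].str.split(", ").explode()

# Group the per-name booleans back by original row label.
grouped = exploded.eq("Ted").groupby(level=0)
all_ted = grouped.all()   # every name in the row is 'Ted'
any_ted = grouped.any()   # at least one name in the row is 'Ted'

# First matching condition wins; rows matching neither fall through to 'Both'.
df["new_col"] = np.select([all_ted, ~any_ted], ["Ted", "Not Ted"], default="Both")
print(df)
```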

blpfk2vs #3

You can use .apply to build the new column.

import pandas as pd

df = pd.DataFrame({'col1': ['Ted, Ted', 'Ted, Not Ted', 'Not Ted, Not Ted', 'Not Ted, Ted', 'Ted, Ted']})

def map_col1_to_col2(value):
    names = value.split(', ')
    if names[0] == 'Ted' and names[1] == 'Ted':
        return 'Ted'
    elif names[0] == 'Not Ted' and names[1] == 'Not Ted':
        return 'Not Ted'
    else:
        return 'Both'

df['col2'] = df['col1'].apply(map_col1_to_col2)
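
The function above indexes `names[0]` and `names[1]`, so it assumes every cell holds exactly two names. A hedged variant that accepts any number of comma-separated names might look like the sketch below; `map_any_length` is a hypothetical name, and, as in the original, any cell that is not uniformly 'Ted' or uniformly 'Not Ted' is labelled 'Both'.

```python
import pandas as pd

# Sample data copied from the answer above.
df = pd.DataFrame({'col1': ['Ted, Ted', 'Ted, Not Ted', 'Not Ted, Not Ted',
                            'Not Ted, Ted', 'Ted, Ted']})


def map_any_length(value):
    """Label a cell by checking every name in it, however many there are."""
    names = [name.strip() for name in value.split(',')]
    if all(name == 'Ted' for name in names):
        return 'Ted'
    if all(name == 'Not Ted' for name in names):
        return 'Not Ted'
    return 'Both'


df['col2'] = df['col1'].apply(map_any_length)
print(df)
```

With this version a single-name cell such as `'Ted'` no longer raises an IndexError, and a three-name cell is handled the same way as a two-name one.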

jdgnovmf #4

You can write a function that checks the values and call it through apply with a lambda, as follows:

import numpy as np

def check_values(s):
    if 'Ted' in s and 'Not Ted' in s:
        return 'Both'
    elif 'Ted' in s:
        return 'Ted'
    elif 'Not Ted' in s:
        return 'Not Ted'
    else:
        return np.nan  # for cases where none of the above apply

df['new_col'] = df['col1'].apply(lambda x: check_values(x.split(', ')))

You can also use np.where:

# A plain 'Ted' pattern also matches inside 'Not Ted', so exclude that case with a lookbehind
has_ted = df['col1'].str.contains(r'(?<!Not )Ted')
has_not_ted = df['col1'].str.contains('Not Ted')
df['new_col'] = np.where(has_ted & has_not_ted, 'Both',
                         np.where(has_ted, 'Ted',
                                  np.where(has_not_ted, 'Not Ted', None)))
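
Nested np.where calls become hard to read once there are more than two conditions, and np.select expresses the same cascade as a flat list. The sketch below is one possible version on the question's sample data. Because the substring 'Ted' also occurs inside 'Not Ted', it uses a negative lookbehind for the standalone name; the 'Neither' default label is an assumption made for illustration.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({"col1": ["Ted, Ted", "Ted, Not Ted", "Not Ted, Not Ted",
                            "Not Ted, Ted", "Ted, Ted"]})

# 'Ted' that is not immediately preceded by 'Not '.
has_ted = df["col1"].str.contains(r"(?<!Not )Ted")
# Literal substring match for 'Not Ted'.
has_not_ted = df["col1"].str.contains("Not Ted", regex=False)

# Conditions are checked in order; the first one that is True picks the label.
df["new_col"] = np.select(
    [has_ted & has_not_ted, has_ted, has_not_ted],
    ["Both", "Ted", "Not Ted"],
    default="Neither",  # assumed label for rows with neither name
)
print(df)
```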

wgx48brx #5

Here is an approach using str.get_dummies():

d = df['col1'].str.get_dummies(sep=', ')

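# Dummy columns come out sorted as ['Not Ted', 'Ted'], so the weighted sum is 1 = Not Ted, 2 = Ted, 3 = Both
d.mul([1,2]).sum(axis=1).map(dict(enumerate(['Neither','Not Ted','Ted','Both'])))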

Output:

0        Ted
1       Both
2    Not Ted
3       Both
4        Ted
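
The get_dummies one-liner relies on both indicator columns being present and in alphabetical order. A slightly more defensive sketch, assuming the question's sample data, pins the columns with reindex so the weighting still works if one of the names never appears; the variable names `code` and `labels` are illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({"col1": ["Ted, Ted", "Ted, Not Ted", "Not Ted, Not Ted",
                            "Not Ted, Ted", "Ted, Ted"]})

# One 0/1 indicator column per distinct name found in the cells.
d = df["col1"].str.get_dummies(sep=", ")

# Fix the column set and order; a name that never occurs becomes an all-zero column.
d = d.reindex(columns=["Not Ted", "Ted"], fill_value=0)

# Encode each row as a 2-bit code: 1 = Not Ted only, 2 = Ted only, 3 = both.
code = d["Not Ted"] * 1 + d["Ted"] * 2
labels = {0: "Neither", 1: "Not Ted", 2: "Ted", 3: "Both"}
df["new_col"] = code.map(labels)
print(df)
```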

gudnpqoy #6

import pandas as pd

# using dataframe already created by Seojin Kim
df = pd.DataFrame({'col1': ['Ted, Ted', 'Ted, Not Ted', 'Not Ted, Not Ted', 'Not Ted, Ted', 'Ted, Ted']})
print(df)

               col1
0          Ted, Ted
1      Ted, Not Ted
2  Not Ted, Not Ted
3      Not Ted, Ted
4          Ted, Ted

df['col2']=df['col1'].apply(lambda x:','.join(set(i.strip() for i in x.split(','))))

print(df)

                col1                   col2
0           Ted, Ted                    Ted
1       Ted, Not Ted            Ted,Not Ted
2   Not Ted, Not Ted                Not Ted
3       Not Ted, Ted            Ted,Not Ted
4           Ted, Ted                    Ted
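
The joined-set column above lists the distinct names rather than the 'Both' label the question asks for, and the order of names coming out of a set is not guaranteed. One possible follow-up, sketched here under the assumption that only 'Ted' and 'Not Ted' occur, sorts the names for a stable string and then replaces the two-name case with 'Both'.

```python
import pandas as pd

# Sample data copied from the answer above.
df = pd.DataFrame({'col1': ['Ted, Ted', 'Ted, Not Ted', 'Not Ted, Not Ted',
                            'Not Ted, Ted', 'Ted, Ted']})

# Distinct names per cell, sorted so the joined string does not depend on set order.
joined = df['col1'].apply(
    lambda x: ','.join(sorted({i.strip() for i in x.split(',')}))
)

# With two possible names, the only multi-name result is 'Not Ted,Ted'.
df['col2'] = joined.replace({'Not Ted,Ted': 'Both'})
print(df)
```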
