Add dictionary keys as labels if a Pandas column contains any of the dictionary's values

Asked by k5ifujac on 2023-01-01 (category: Other)

I have a DataFrame like this:

import pandas as pd

df = pd.DataFrame({'products': ['a,b,c', 'a,c', 'b,d', 'a,b,c']})

    products
0   a,b,c
1   a,c
2   b,d
3   a,b,c

I also created a dictionary that maps each category to the products that belong to it:

mydict = {'good':['a'],'bad':['d'],'neutral':['b','c','a']}

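I'm trying to create a new column, say df['quality'], that gets a dictionary key (the product category) whenever any product in df['products'] is contained in that key's values. The final output should look like this: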

    products  quality
0   a,b,c     good, neutral
1   a,c       good, neutral
2   b,d       neutral, bad
3   a,b,c     good, neutral

Any ideas? Am I overcomplicating this?
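The rule described above (a category applies when at least one product in the row appears in that category's list) can also be checked directly against the original mydict with a set test, without inverting the dictionary first. A minimal sketch; the helper name categories_for is hypothetical, and labels come out in mydict's key order, so row 2 gives "bad, neutral" rather than "neutral, bad":

```python
import pandas as pd

df = pd.DataFrame({'products': ['a,b,c', 'a,c', 'b,d', 'a,b,c']})
mydict = {'good': ['a'], 'bad': ['d'], 'neutral': ['b', 'c', 'a']}

def categories_for(products: str) -> str:
    """Return every key of mydict whose product list shares at least one item with the row (hypothetical helper)."""
    row_items = set(products.split(','))
    # isdisjoint() is False exactly when the two collections overlap
    return ', '.join(cat for cat, items in mydict.items()
                     if not row_items.isdisjoint(items))

df['quality'] = df['products'].map(categories_for)
print(df)
```

This scans every category for every row, which is fine for a handful of categories; the answers below invert the dictionary instead so each product is looked up once.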

Answer 1 (tquggr8v)

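You can first build a reversed dictionary that maps each product to its categories, e.g. a -> [good, neutral]. Then split the values in df on ",", explode them, and map each product through the reversed dictionary. Next, use groupby and set to gather the categories back onto the rows the exploded products came from, and finally join them with ", ":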

from collections import defaultdict
from itertools import chain

# build the reversed dictionary: product -> list of categories
reversed_dict = defaultdict(list)
for category, products in mydict.items():
    for prod in products:
        reversed_dict[prod].append(category)

# apply over the df; sorted() makes the label order deterministic
df["quality"] = (df.products
                   .str.split(",")
                   .explode()
                   .map(reversed_dict)
                   .groupby(level=0)
                   .agg(lambda s: ", ".join(sorted(set(chain.from_iterable(s))))))

which gives

>>> df

  products        quality
0    a,b,c  good, neutral
1      a,c  good, neutral
2      b,d   bad, neutral
3    a,b,c  good, neutral
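To see why groupby(level=0) puts the categories back on the right rows, it helps to look at the intermediate Series: explode repeats the original row label once per product, and map then replaces each product with its category list. A small sketch, with the reversed dictionary written out by hand (assumed to match what the loop above builds from mydict):

```python
import pandas as pd

df = pd.DataFrame({'products': ['a,b,c', 'a,c', 'b,d', 'a,b,c']})
# written by hand here; equivalent to the reversed dictionary built from mydict
reversed_dict = {'a': ['good', 'neutral'], 'b': ['neutral'],
                 'c': ['neutral'], 'd': ['bad']}

# one entry per product; the index still holds the original row label
exploded = df['products'].str.split(',').explode()

# each product becomes its list of categories, index unchanged
mapped = exploded.map(reversed_dict)
print(mapped)
```

Because the labels 0, 0, 0, 1, 1, ... survive both steps, grouping on index level 0 collects exactly the lists that came from the same original row.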

Answer 2 (wtzytmuj)

Let's try:

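# product -> categories, e.g. 'a' -> 'good,neutral' (renamed from help, which shadows a built-in)
cat_map = pd.Series(mydict).explode().reset_index().groupby(0)['index'].agg(','.join)

df['quality'] = (df.products.replace(cat_map, regex=True).str.split(',')
                   .map(lambda cats: sorted(set(cats))).str.join(','))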
Out[150]: 
0    good,neutral
1    good,neutral
2     bad,neutral
3    good,neutral
Name: products, dtype: object
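One caveat, hedged because the exact result depends on how replace applies several regex patterns in turn: replace(..., regex=True) matches substrings, not whole products, so it relies on product codes not occurring inside other codes or inside the category names that get substituted in (here 'd' occurs in 'good'). A sketch that keeps this answer's product -> categories lookup (named cat_map here) but applies it to whole comma-separated tokens; the helper label and the extra row 'a,d' are hypothetical, added only to exercise the overlapping case:

```python
import pandas as pd

# the extra row 'a,d' is a hypothetical input that mixes 'a' (-> 'good') with 'd'
df = pd.DataFrame({'products': ['a,b,c', 'a,c', 'b,d', 'a,b,c', 'a,d']})
mydict = {'good': ['a'], 'bad': ['d'], 'neutral': ['b', 'c', 'a']}

# product -> comma-joined categories, built the same way as in the answer
cat_map = pd.Series(mydict).explode().reset_index().groupby(0)['index'].agg(','.join)

def label(products: str) -> str:
    """Look up each whole product token, so 'd' is never matched inside 'good'."""
    cats = set()
    for token in products.split(','):
        joined = cat_map.get(token)  # None for products missing from mydict
        if joined is not None:
            cats.update(joined.split(','))
    return ','.join(sorted(cats))

df['quality'] = df['products'].map(label)
print(df)
```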

Answer 3 (6ie5vjzr)

You should define mydict the other way round, mapping each product to its categories:

mydict = {'a': ['good', 'neutral'],
          'b': ['neutral'],
          'c': ['neutral'],
          'd': ['bad']}

Then:

def func(row):
    categories = []
    for item in row['products'].split(','):
        categories += mydict[item]
    # deduplicate, then sort so the output order is stable
    return ','.join(sorted(set(categories)))


df['quality'] = df.apply(func, axis=1)

Returns:

    products    quality
0   a,b,c       good,neutral
1   a,c         good,neutral
2   b,d         bad,neutral
3   a,b,c       good,neutral
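Because func only reads the products column, the same logic can also be run through Series.map on that one column instead of DataFrame.apply with axis=1, which skips building a row object for every row. A sketch under that assumption; quality_of is a hypothetical name:

```python
import pandas as pd

df = pd.DataFrame({'products': ['a,b,c', 'a,c', 'b,d', 'a,b,c']})
mydict = {'a': ['good', 'neutral'], 'b': ['neutral'],
          'c': ['neutral'], 'd': ['bad']}

def quality_of(products: str) -> str:
    # gather the categories of every product in the cell, drop duplicates, sort
    categories = set()
    for item in products.split(','):
        categories.update(mydict[item])
    return ','.join(sorted(categories))

# Series.map passes one cell (a string) at a time instead of a whole row
df['quality'] = df['products'].map(quality_of)
print(df)
```

As in the answer, a product missing from mydict raises KeyError; mydict.get(item, []) would skip it instead.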

Answer 4 (ht4b089n)

Here is another approach:

d = {'a': ['good', 'neutral'],
     'b': ['neutral'],
     'c': ['neutral'],
     'd': ['bad']}

df['quality'] = (df['products'].str.split(',')
                   .explode()
                   .map(d)
                   .explode()
                   .groupby(level=0)
                   .unique()
                   .str.join(','))

Or (the first part converts your current dictionary into the new format):

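s = df['products'].str.split(',').explode()

# start with an empty category list for every product that occurs in df
d = {i: [] for i in set(s)}

# invert mydict (category -> products) into product -> categories
for k, v in mydict.items():
    for i in v:
        d.setdefault(i, []).append(k)  # unlike d.get(i), also safe for products not in df

df['quality'] = (s.map(d).map(set)
                   .groupby(level=0)
                   .agg(lambda x: ','.join(sorted(set.union(*x)))))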
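The variants in this answer can order the labels differently: .unique() keeps the order in which categories first appear within each row, which reproduces the "neutral, bad" order from the question, while anything built from a set has no reliable order unless it is sorted. A sketch contrasting the two, assuming the product -> categories dictionary d from the first variant; dict.fromkeys is used here as an order-preserving dedupe:

```python
import pandas as pd

df = pd.DataFrame({'products': ['a,b,c', 'a,c', 'b,d', 'a,b,c']})
d = {'a': ['good', 'neutral'], 'b': ['neutral'], 'c': ['neutral'], 'd': ['bad']}

# one entry per (original row, category); the original row label is kept
cats = df['products'].str.split(',').explode().map(d).explode()

# first-appearance order within each row, like .unique()
first_seen = cats.groupby(level=0).agg(lambda s: ','.join(dict.fromkeys(s)))

# alphabetical order, like the sorted(set(...)) variants
alphabetical = cats.groupby(level=0).agg(lambda s: ','.join(sorted(set(s))))

print(pd.DataFrame({'first_seen': first_seen, 'alphabetical': alphabetical}))
```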
