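I have the DataFrame below and two lists. I want to check whether any item in list1 appears in the Description column and, if so, add the label "weather" in a new column. For items in list2, the label should be "equipment".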
list1 = ['wind','air']
list2 = ['crane','machine']
df
Description
There was a heavy wind due to cyclone.
Pollution hamper the air quality.
The machine failure was due to short circuit.
The game was called off due to wind.
Players played the game very well.
the crane operator took the crane to wrong side
Expected output
Description Label
There was a heavy wind due to cyclone. weather
Pollution hamper the air quality. weather
The machine failure was due to short circuit. equipment
The game was called off due to wind. weather
Players played the game very well. Other
the crane operator took the crane to wrong side. equipment
I tried the code below, but in the final data it gives the label "Other" for every description.
df['Description'] = np.where(df['Description'].str.contains('|'.join(list1)),'weather','Other')
df['Description'] = np.where(df['Description'].str.contains('|'.join(list2)),'equipment','Other')
3 Answers
zfycwa2u1#
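The quick and simple fix for your code is to use numpy.select, which takes a list of conditions and a matching list of labels. However, you can do better.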
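The numpy.select fix mentioned above could look roughly like the sketch below. It assumes the result goes into a new column named Label (taken from the expected output) instead of overwriting Description, and it rebuilds the sample DataFrame from the question so that it runs on its own.

```python
import numpy as np
import pandas as pd

list1 = ['wind', 'air']
list2 = ['crane', 'machine']

df = pd.DataFrame({'Description': [
    'There was a heavy wind due to cyclone.',
    'Pollution hamper the air quality.',
    'The machine failure was due to short circuit.',
    'The game was called off due to wind.',
    'Players played the game very well.',
    'the crane operator took the crane to wrong side',
]})

# One boolean mask per label; '|'.join(...) builds a regex alternation
conditions = [
    df['Description'].str.contains('|'.join(list1)),
    df['Description'].str.contains('|'.join(list2)),
]
labels = ['weather', 'equipment']

# The first condition that is True wins; rows matching nothing get the default.
# Writing to a new column keeps Description intact.
df['Label'] = np.select(conditions, labels, default='Other')
print(df)
```

Note that str.contains matches substrings, so a keyword such as 'air' would also match inside a longer word like 'chair'.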
You can use an automatically built regular expression together with a dictionary that maps each keyword to its label.
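A minimal sketch of the regex-plus-dictionary idea, under the assumption that only the first keyword found in each sentence should decide the label. The names mapping and pattern are illustrative and not taken from the original answer.

```python
import re
import pandas as pd

list1 = ['wind', 'air']
list2 = ['crane', 'machine']

df = pd.DataFrame({'Description': [
    'There was a heavy wind due to cyclone.',
    'Pollution hamper the air quality.',
    'The machine failure was due to short circuit.',
    'The game was called off due to wind.',
    'Players played the game very well.',
    'the crane operator took the crane to wrong side',
]})

# Map every keyword to its label (illustrative name)
mapping = {**dict.fromkeys(list1, 'weather'),
           **dict.fromkeys(list2, 'equipment')}

# Build the alternation automatically, with a single capture group
pattern = '(' + '|'.join(map(re.escape, mapping)) + ')'

df['Label'] = (
    df['Description']
    .str.extract(pattern, expand=False)  # first matched keyword, NaN if none
    .map(mapping)                        # keyword -> label
    .fillna('Other')                     # rows without any keyword
)
print(df)
```

Adding another category then only requires extending the dictionary; the pattern is rebuilt from its keys.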
If a sentence can contain more than one match, every matching label can be collected instead of only the first one.
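One possible way to handle several matches per sentence is str.extractall followed by a group-wise join, sketched below. The last row of the sample data is a hypothetical sentence added only to show a multi-label result, and joining the labels with a comma is an assumption about the desired format.

```python
import re
import pandas as pd

list1 = ['wind', 'air']
list2 = ['crane', 'machine']

df = pd.DataFrame({'Description': [
    'There was a heavy wind due to cyclone.',
    'Pollution hamper the air quality.',
    'The machine failure was due to short circuit.',
    'The game was called off due to wind.',
    'Players played the game very well.',
    'the crane operator took the crane to wrong side',
    'The wind damaged the crane.',  # hypothetical row, not in the question
]})

mapping = {**dict.fromkeys(list1, 'weather'),
           **dict.fromkeys(list2, 'equipment')}
pattern = '(' + '|'.join(map(re.escape, mapping)) + ')'

# extractall returns one row per match, indexed by (original row, match number)
matches = df['Description'].str.extractall(pattern)[0]

labels = (
    matches.map(mapping)   # keyword -> label
    .groupby(level=0)      # regroup by original row
    .agg(lambda s: ','.join(dict.fromkeys(s)))  # unique labels, order kept
)

# Rows with no match are absent from `labels`, so fill them on reindex
df['Label'] = labels.reindex(df.index, fill_value='Other')
print(df)
```

The dict.fromkeys call removes duplicates while preserving order, so a sentence that mentions "crane" twice is still labelled "equipment" only once.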
cetgtptt2#
It seems to work like this:
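One variant that does work, sketched here as an assumption and not as this answer's original code, is to nest the two np.where calls and write to a new Label column. In the question's attempt, the first line overwrites Description with "weather"/"Other", so the second line searches those labels for "crane" or "machine", finds nothing, and sets every row to "Other".

```python
import numpy as np
import pandas as pd

list1 = ['wind', 'air']
list2 = ['crane', 'machine']

df = pd.DataFrame({'Description': [
    'There was a heavy wind due to cyclone.',
    'Pollution hamper the air quality.',
    'The machine failure was due to short circuit.',
    'The game was called off due to wind.',
    'Players played the game very well.',
    'the crane operator took the crane to wrong side',
]})

# Compute both masks from the untouched Description column
is_weather = df['Description'].str.contains('|'.join(list1))
is_equipment = df['Description'].str.contains('|'.join(list2))

# The inner np.where is only consulted where the weather mask is False
df['Label'] = np.where(
    is_weather, 'weather',
    np.where(is_equipment, 'equipment', 'Other'),
)
print(df)
```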
ccrfmcuu3#
I would use .apply() for this, which gives you more control over the logic used to populate the label column.
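A sketch of the .apply() approach, assuming a helper function (here called label_description, a hypothetical name) that checks list1 before list2. The column is named Label to match the expected output, and lowercasing the text before matching is an added assumption that makes the check case-insensitive.

```python
import pandas as pd

list1 = ['wind', 'air']
list2 = ['crane', 'machine']

df = pd.DataFrame({'Description': [
    'There was a heavy wind due to cyclone.',
    'Pollution hamper the air quality.',
    'The machine failure was due to short circuit.',
    'The game was called off due to wind.',
    'Players played the game very well.',
    'the crane operator took the crane to wrong side',
]})

def label_description(text):
    """Return the label for one description (hypothetical helper)."""
    lowered = text.lower()  # assumption: matching should ignore case
    if any(word in lowered for word in list1):
        return 'weather'
    if any(word in lowered for word in list2):
        return 'equipment'
    return 'Other'

# apply calls the function once per row of the Description column
df['Label'] = df['Description'].apply(label_description)
print(df)
```

Because the function is plain Python, extra rules such as whole-word matching or a different priority order can be added without touching the rest of the pipeline. The trade-off is that it runs row by row, which is typically slower than the vectorized string methods on large frames.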