python-3.x Keyword matching between dictionary values (lists) and a Pandas column

0md85ypi  posted on 2023-03-09  in Python

Suppose I have a DataFrame df with a column named news_text:

news_text
lebron james is the great basketball player.
leonardo di caprio has won the oscar for best actor
avatar was directed by steven speilberg.
ronaldo has resigned from manchester united.
argentina beats france in fifa world cup 2022.
joe biden has won the president elections.
2026 fifa WC will be host by canada,mexico and usa combined.

I also have a large dictionary with hundreds of keys:

{'category_1': ['lebron james', 'oscar', 'leonardo dicaprio'], 'category_2': ['basketball', 'steven speilberg','manchester united'], 
'category_3': ['ronaldo', 'argentina','world cup']...so on}

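I want to perform exact keyword matching between the dictionary values **(each of which is a list of keywords)** and df['news_text']. Whenever a keyword matches, the corresponding dictionary key should be added to a list in a new column, mapped_category. If none of the keywords from any list is found, the column value should be NA.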

news_text                                                    mapped_category
lebron james is the great basketball player.               ['category_1', 'category_2']
leonardo di caprio has won the oscar for best actor        ['category_1','category_1']
avatar was directed by steven speilberg.                   ['category_2']
ronaldo has resigned from manchester united.               ['category_2','category_3']
argentina beats france in fifa world cup 2022.             ['category_3','category_3']
joe biden has won the president elections.                        NA
2026 fifa WC will be host by canada,mexico and usa combined.      NA
h9vpoimq1#

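The simplest (not necessarily the fastest or fanciest) approach is to write a function that produces the desired list of categories for *one* news document, and then apply that function to the Series of documents: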

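categories = {
    'category_1': ['lebron james', 'oscar', 'leonardo dicaprio'],
    'category_2': ['basketball', 'steven speilberg','manchester united'],
    'category_3': ['ronaldo', 'argentina','world cup'],
}

def find_categories(document):
    # Collect every category that has at least one keyword in the document.
    found = []
    for category, keywords in categories.items():
        for keyword in keywords:
            # Plain substring test: 'oscar' would also match inside 'oscars'.
            if keyword in document:
                found.append(category)
                break  # one hit is enough; record each category only once
    return found  # an empty list (not NA) when nothing matches

# df is the DataFrame from the question; apply the function row by row.
df['news_categories'] = df['news_text'].apply(find_categories)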
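
The question asks for exact keyword matches and for NA when nothing is found, so here is a rough variation on the same idea, offered as a sketch rather than the definitive way to do it. It assumes that "exact" means whole-word matching, that the keywords start and end with word characters (so `\b` behaves as expected), and that matching should stay case-sensitive. The helper name `find_categories_or_na` and the three-row sample frame are made up for illustration.

```python
import re

import pandas as pd

# Sample rows copied from the question (shortened to three).
df = pd.DataFrame({
    "news_text": [
        "lebron james is the great basketball player.",
        "ronaldo has resigned from manchester united.",
        "joe biden has won the president elections.",
    ]
})

categories = {
    "category_1": ["lebron james", "oscar", "leonardo dicaprio"],
    "category_2": ["basketball", "steven speilberg", "manchester united"],
    "category_3": ["ronaldo", "argentina", "world cup"],
}

# One compiled pattern per category. \b keeps 'oscar' from matching 'oscars';
# re.escape protects keywords that contain regex metacharacters.
patterns = {
    category: re.compile(r"\b(?:" + "|".join(map(re.escape, keywords)) + r")\b")
    for category, keywords in categories.items()
}

def find_categories_or_na(document):
    """Hypothetical helper: matching categories as a list, or pd.NA if none."""
    found = [cat for cat, pattern in patterns.items() if pattern.search(document)]
    return found if found else pd.NA

df["mapped_category"] = df["news_text"].apply(find_categories_or_na)
print(df)
```

Each category appears at most once per row, in dictionary order. With hundreds of keys this still loops over every category for every row, so it is no faster than the plain version; it only changes what counts as a match and what an empty result looks like.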
