I have a DataFrame that holds, for each website, its URL name, its categories, and its keywords.
Url | categories | keywords
Espn | [sport, nba, nfl] | [half, touchdown, referee, player, goal]
Tmz | [entertainment, sport] | [gossip, celebrity, player]
Goal | [sport, premier_league, champions_league] | [football, goal, stadium, player, referee]
It can be created with the following code:
import pandas as pd

data = [
    {'Url': 'ESPN', 'categories': ['sport', 'nba', 'nfl'],
     'keywords': ['half', 'touchdown', 'referee', 'player', 'goal']},
    {'Url': 'TMZ', 'categories': ['entertainment', 'sport'],
     'keywords': ['gossip', 'celebrity', 'player']},
    {'Url': 'Goal', 'categories': ['sport', 'premier_league', 'champions_league'],
     'keywords': ['football', 'goal', 'stadium', 'player', 'referee']},
]
df = pd.DataFrame(data)
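For every word in the keywords column, I want to get the frequency of the categories associated with it. The result might look like this:
{half: {sport: 1, nba: 1, nfl: 1},
 touchdown: {sport: 1, nba: 1, nfl: 1},
 referee: {sport: 2, nba: 1, nfl: 1, premier_league: 1, champions_league: 1},
 player: {sport: 3, nba: 1, nfl: 1, entertainment: 1, premier_league: 1, champions_league: 1},
 gossip: {sport: 1, entertainment: 1},
 celebrity: {sport: 1, entertainment: 1},
 goal: {sport: 2, premier_league: 1, champions_league: 1, nba: 1, nfl: 1},
 stadium: {sport: 1, premier_league: 1, champions_league: 1}}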
1 Answer
Because the columns contain lists, you can explode them so that each element of each list gets its own row:
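A minimal sketch of that idea, assuming pandas 0.25 or later (where `DataFrame.explode` is available). Counting with `pd.crosstab` and the names `pairs`, `counts`, and `freq` are my own assumptions for illustration, not necessarily what the answer originally used:

```python
import pandas as pd

data = [
    {'Url': 'ESPN', 'categories': ['sport', 'nba', 'nfl'],
     'keywords': ['half', 'touchdown', 'referee', 'player', 'goal']},
    {'Url': 'TMZ', 'categories': ['entertainment', 'sport'],
     'keywords': ['gossip', 'celebrity', 'player']},
    {'Url': 'Goal', 'categories': ['sport', 'premier_league', 'champions_league'],
     'keywords': ['football', 'goal', 'stadium', 'player', 'referee']},
]
df = pd.DataFrame(data)

# Explode one list column at a time so that every row ends up holding
# a single (keyword, category) pair. The index is reset in between so
# the second explode runs on a unique index.
pairs = (
    df.explode('keywords')
      .reset_index(drop=True)
      .explode('categories')
)

# Cross-tabulate: rows are keywords, columns are categories,
# cells are the number of sites where the two occur together.
counts = pd.crosstab(pairs['keywords'], pairs['categories'])

# Convert to a nested dict, keeping only non-zero counts.
freq = {
    keyword: {cat: int(n) for cat, n in row.items() if n > 0}
    for keyword, row in counts.iterrows()
}

print(freq['player'])
```

If a long table is enough, `pairs.groupby(['keywords', 'categories']).size()` gives the same counts without building the nested dictionary.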