Python DataFrame column with string values: direct and indirect match

Posted by rn0zuynd on 2022-12-17 in Python

DF1

Place                  Location
Delhi,Punjab,Jaipur         Delhi,Punjab,Noida,Lucknow
Delhi,Punjab,Jaipur         Delhi,Bhopal,Jaipur,Rajkot  
Delhi,Punjab,Kerala         Delhi,Jaipur,Madras

DF2

Target1   Target2    Strength
Jaipur    Rajkot     0.94
Jaipur    Punjab     0.84
Jaipur    Noida      0.62 
Jaipur    Jodhpur    0.59
Punjab    Amritsar   0.97
Punjab    Delhi      0.85
Punjab    Bhopal     0.91
Punjab    Jodhpur    0.75
Kerala    Varkala    0.85
Kerala    Kochi      0.88

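The task is to match the 'Place' values against the 'Location' values, assigning a score of 1 for a direct match and, for an indirect match, looking up the strength score in df2. For example, in row 1, Delhi and Punjab are direct matches because both appear in 'Place' and 'Location', whereas Jaipur appears in 'Place' but not in 'Location'. So Jaipur is looked up in df2's Target1, and the row 1 'Location' values are searched for in the corresponding Target2. In df2, Jaipur is related to Punjab and Noida, both of which are among the row 1 Location values. For Jaipur, Punjab's strength of 0.84 is assigned because it is higher than Noida's 0.62. The final score is (1+1+0.84)/3, i.e. the sum of the direct and indirect match scores divided by the number of 'Place' items (three in each row).
Expected output: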

Place                              Location                   Score
Delhi,Punjab,Jaipur         Delhi,Punjab,Noida,Lucknow       (1+1+0.84)/3 = 0.95
Delhi,Punjab,Jaipur         Delhi,Bhopal,Jaipur,Rajkot       (1+0.91+1)/3 = 0.97 
Delhi,Punjab,Kerala         Delhi,Jaipur,Madras              (1+1+0)/3 = 0.67

My attempt:

data1 = df1['Place'].to_list()
data2 = df1['Location'].to_list()

dict3 = {}
exac_match = []
for el in data1:
    #print(el)
    el=[x.strip() for x in el.split(',')]
   
    for ell in data2:
        ell=[x.strip() for x in ell.split(',')]
        dict1 = {}
        dict2 = {}
        for elll in el:
            if elll in ell:
                #print("Exact match:::", elll)
                dict1[elll]=1
                dict2[elll]=elll

Source: https://stackoverflow.com/questions/74824825/python-dataframe-column-with-string-values-direct-and-indirect-match

1 Answer

ukdjmx9f1#

Because the lists in the two columns have uneven lengths, a loop is needed:

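# Look up Strength by (Target1, Target2) pair
s = df2.set_index(['Target1', 'Target2'])['Strength']

# zip(*x) pairs the i-th Place item with the i-th Location item
# (truncating to the shorter list); pairs absent from df2 become NaN,
# which mean() skips
df1['Score'] = [s.reindex(list(zip(*x))).mean()
                for x in zip(df1['Place'].str.split(','),
                             df1['Location'].str.split(','))
               ]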

Output:

Place                    Location  Score
0  Delhi,Punjab,Jaipur  Delhi,Punjab,Noida,Lucknow   0.62
1  Delhi,Punjab,Jaipur  Delhi,Bhopal,Jaipur,Rajkot   0.91
2  Delhi,Punjab,Kerala         Delhi,Jaipur,Madras    NaN
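
The scores above come from pairing Place and Location items by position, so they differ from the expected output in the question. As a rough sketch only, and not the answerer's method, the rule described in the question (1 for a direct match, otherwise the highest df2 strength among the row's Location items, otherwise 0, averaged over the Place items) could be written as below. The names `strength` and `row_score` are hypothetical helpers introduced for illustration, and the sketch assumes df2 is directional (Target1 is the Place item, Target2 the Location item).

```python
import pandas as pd

# Sample data copied from the question
df1 = pd.DataFrame({
    "Place": ["Delhi,Punjab,Jaipur", "Delhi,Punjab,Jaipur", "Delhi,Punjab,Kerala"],
    "Location": ["Delhi,Punjab,Noida,Lucknow", "Delhi,Bhopal,Jaipur,Rajkot",
                 "Delhi,Jaipur,Madras"],
})
df2 = pd.DataFrame({
    "Target1": ["Jaipur"] * 4 + ["Punjab"] * 4 + ["Kerala"] * 2,
    "Target2": ["Rajkot", "Punjab", "Noida", "Jodhpur", "Amritsar",
                "Delhi", "Bhopal", "Jodhpur", "Varkala", "Kochi"],
    "Strength": [0.94, 0.84, 0.62, 0.59, 0.97, 0.85, 0.91, 0.75, 0.85, 0.88],
})

# Hypothetical helper: (Target1, Target2) -> Strength lookup table
strength = {
    (t1, t2): s
    for t1, t2, s in zip(df2["Target1"], df2["Target2"], df2["Strength"])
}


def row_score(place, location):
    """Score one row following the rule stated in the question (assumed reading)."""
    places = [p.strip() for p in place.split(",")]
    locations = [loc.strip() for loc in location.split(",")]
    total = 0.0
    for p in places:
        if p in locations:
            # Direct match
            total += 1.0
        else:
            # Indirect match: best strength among this row's Location items, 0 if none
            total += max((strength.get((p, loc), 0.0) for loc in locations),
                         default=0.0)
    # Average over the number of Place items
    return total / len(places)


df1["Score"] = [row_score(p, loc) for p, loc in zip(df1["Place"], df1["Location"])]
print(df1["Score"].round(2).tolist())
```

On the sample data this gives roughly 0.95 and 0.97 for the first two rows, matching the expected output. For the third row it gives about 0.62 rather than the 0.67 listed in the question, because Punjab is not among that row's Location values and the rule as written falls back to the Punjab/Delhi strength of 0.85.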

Answered 2022-12-17
