I have two DataFrames, and I want to find the rows of df2 that match more than 2 of the numbers in df1.
import pandas as pd
cols = ['Num1','Num2','Num3','Num4','Num5','Num6']
df1 = pd.DataFrame([[2,4,6,8,9,10]], columns=cols)
df2 = pd.DataFrame([[1,1,2,4,5,6,8],
                    [2,5,6,20,22,23,34],
                    [3,8,12,13,34,45,46],
                    [4,9,10,14,29,32,33],
                    [5,1,22,13,23,33,35],
                    [6,1,6,7,8,9,10],
                    [7,0,2,3,5,6,8]],
                   columns = ['Id','Num1','Num2','Num3','Num4','Num5','Num6'])
I have this matching code, but I want to improve it so that it picks out the rows that match more than 2 of the numbers.
# convert the values in the first dataframe to a list
vals_to_find = df1.iloc[0].tolist()
# Print the values to find
print("Vals to find:", vals_to_find)
# Create an empty list to hold the matching IDs
matching_ids = []
# iterate through the big dataframe
for index, row in df2.iterrows():
    rowlist = row.tolist()  # convert the row to a list
    # keep the id for later, and extract the other values for evaluation
    row_id = rowlist[0]  # named row_id to avoid shadowing the built-in id()
    vals = rowlist[1:]
    # count the number of values in one list against another list
    counter = sum(elem in vals_to_find for elem in vals)
    # If the number of matches is greater than 2, then grab the ID
    if counter > 2:
        matching_ids.append({'ID': row_id})
# Print the matching IDs
print('Matching IDS:', matching_ids)
I would like my result to look like this.
df3 = pd.DataFrame([[6,1,6,7,8,9,10],
                    [7,0,2,3,5,6,8]],
                   columns = ['Id','Num1','Num2','Num3','Num4','Num5','Num6'])
1 Answer
I hope I've understood your question correctly. You can construct a boolean mask (using set.intersection) and then use this mask to filter df2.
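The answer's original code is not preserved here, so the following is only a minimal sketch of how such a mask might be built, assuming the df1 and df2 from the question. The names mask and result are illustrative and do not come from the answer.

```python
import pandas as pd

cols = ['Num1', 'Num2', 'Num3', 'Num4', 'Num5', 'Num6']
df1 = pd.DataFrame([[2, 4, 6, 8, 9, 10]], columns=cols)
df2 = pd.DataFrame([[1, 1, 2, 4, 5, 6, 8],
                    [2, 5, 6, 20, 22, 23, 34],
                    [3, 8, 12, 13, 34, 45, 46],
                    [4, 9, 10, 14, 29, 32, 33],
                    [5, 1, 22, 13, 23, 33, 35],
                    [6, 1, 6, 7, 8, 9, 10],
                    [7, 0, 2, 3, 5, 6, 8]],
                   columns=['Id'] + cols)

# The values to look for, as a set so that set.intersection can be used
vals_to_find = set(df1.iloc[0].tolist())

# Boolean mask (illustrative name): True where a row of df2 shares
# more than 2 distinct values with df1; the Id column is excluded
mask = df2[cols].apply(
    lambda row: len(vals_to_find.intersection(row.tolist())) > 2,
    axis=1,
)

# Keep only the rows where the mask is True
result = df2[mask]
print(result)
```

With the sample data from the question, this mask selects Ids 1, 6 and 7: Id 1 also qualifies, because it shares four values (2, 4, 6, 8) with df1. Note that a set intersection counts distinct shared values, whereas the loop in the question counts every matching element, so the two can differ if a row contains repeated numbers.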