pandas: Compare a column value against a list in a pandas DataFrame

Posted by ttcibm8c on 2023-03-11 in Other

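My DataFrame has two columns: one contains numbers, the other contains lists of numbers. I want to create a third column that is True or False depending on whether the number in the first column appears among the first two elements of the corresponding list in the second column.
Here is my attempt: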

import pandas as pd
import numpy as np
df = pd.DataFrame({'number': [1, 2, 3], 'list_of_numbers': [[1, 3 ,2 ,5], [6 ,7 ,8 ,2 ,10], [13 ,12 ,13 ,14 ,3]]})
df['check'] = np.isin(df['number'], [x[0:2] for x in df['list_of_numbers']])

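I expected the output [True, False, False], but I get [True, False, True]. My guess is that every number is being compared against the first value in list_of_numbers ([1, 3, 2, 5]), which would produce this output.
What am I doing wrong? Thanks in advance.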

Answer #1 (ars1skjm)

You need to use a loop here:

df['check'] = [a in b[0:2] for a,b in zip(df['number'], df['list_of_numbers'])]

Output:

   number      list_of_numbers  check
0       1         [1, 3, 2, 5]   True
1       2     [6, 7, 8, 2, 10]  False
2       3  [13, 12, 13, 14, 3]  False
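
The row-wise check above can also be wrapped in a small helper so that the number of leading elements is configurable. This is only a minimal sketch: the function name `in_first_n` and the parameter `n` are hypothetical and do not come from the original answer.

```python
import pandas as pd

def in_first_n(numbers, lists, n=2):
    """For each (number, list) pair, test whether the number is among the first n items of its own list."""
    return [num in lst[:n] for num, lst in zip(numbers, lists)]

df = pd.DataFrame({'number': [1, 2, 3],
                   'list_of_numbers': [[1, 3, 2, 5],
                                       [6, 7, 8, 2, 10],
                                       [13, 12, 13, 14, 3]]})

# Each number is compared only with the slice of the list in the same row
df['check'] = in_first_n(df['number'], df['list_of_numbers'])
print(df['check'].tolist())  # [True, False, False]
```

Because the comparison is done pair by pair with `zip`, a value in one row's list can never affect the result of another row.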

Why your approach fails

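np.isin flattens the test_elements array before using it, so you are not testing each number against its own list. Instead, every number is tested against the concatenation of all the lists.
Demonstration: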

import pandas as pd
import numpy as np
df = pd.DataFrame({'number': [1, 1, 3],  # the 2 was changed to a 1
                   'list_of_numbers': [[1, 7, 2, 5],  # the 3 was removed
                                       [6, 7, 8, 2, 10],
                                       [13, 12, 13, 14, 3]]})

df['check'] = np.isin(df['number'], [x[0:2] for x in df['list_of_numbers']])
print(df)

Output:

   number      list_of_numbers  check
0       1         [1, 7, 2, 5]   True
1       1     [6, 7, 8, 2, 10]   True  # True because the first list contains a 1
2       3  [13, 12, 13, 14, 3]  False  # now False because the 3 is gone from the first list
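
To make the flattening concrete, the following sketch (an illustration, not part of the original answer) compares the result of np.isin with an explicit membership test against a single flat list built from the first two elements of every row. It uses the data from the question and assumes every list has at least two elements, so the slices form a regular 2-D structure.

```python
import numpy as np

numbers = np.array([1, 2, 3])
first_two = [[1, 3], [6, 7], [13, 12]]  # x[0:2] for each row of the question's data

# np.isin treats test_elements as one flat pool of values
via_isin = np.isin(numbers, first_two)

# Explicit equivalent: flatten first, then test membership in the pool
flat = [v for pair in first_two for v in pair]  # [1, 3, 6, 7, 13, 12]
via_flat = np.array([n in flat for n in numbers])

print(via_isin.tolist())  # [True, False, True]
print(via_flat.tolist())  # [True, False, True]
```

Both versions mark the 3 as True because it appears in the first row's slice, which is exactly the unexpected result reported in the question.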
