How to find the first common data in two columns of lists in pandas

Posted by w46czmvw on 2023-04-04 in Other


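I have a dataset with two columns. Each column holds lists of integers. For each row, I want to find the first value the two lists have in common.
Here is an example: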

import pandas as pd
d = {'col1': [[1,2,3,5], [3,4,5]], 'col2': [[2,3,4], [3,5]]}
df = pd.DataFrame(data=d)

           col1       col2
0  [1, 2, 3, 5]  [2, 3, 4]
1     [3, 4, 5]     [3, 5]

The expected df contains the first common value for each row:

d = {'result': [2,3]}
expect_df = pd.DataFrame(data=d)

   result
0       2
1       3

pandas

Source: https://stackoverflow.com/questions/75859587/how-to-find-first-common-data-in-two-columns-of-lists-in-pandas

4 Answers


yiytaume1#

You can use iterrows and numpy.intersect1d:

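import numpy as np

for index, row in df.iterrows():
    # intersect1d returns the sorted unique values present in both lists
    intersect = np.intersect1d(row.col1, row.col2)
    if len(intersect) > 0:
        # smallest common value; equals the first one only for ascending lists
        first = np.min(intersect)
    else:
        first = None
    df.at[index, "col"] = first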
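Because np.intersect1d returns sorted unique values, the loop above picks the smallest common value rather than the first one in list order. The two agree on the question's data because those lists are ascending. A minimal sketch of the difference, using a hypothetical unsorted row that is not part of the question:

```python
import numpy as np
import pandas as pd

# Hypothetical unsorted data, chosen only to show the difference.
df_unsorted = pd.DataFrame({"col1": [[5, 3, 1]], "col2": [[1, 3]]})
row = df_unsorted.iloc[0]

# Sorted unique values found in both lists.
common = np.intersect1d(row["col1"], row["col2"])
smallest_common = int(common.min())

# First element of col1 that also appears in col2, keeping col1's order.
lookup = set(row["col2"])
first_in_order = next((x for x in row["col1"] if x in lookup), None)

print(smallest_common)  # 1
print(first_in_order)   # 3
```

If "first" is meant in the order of col1, an order-preserving scan like the second variant is the safer choice.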

Posted 2023-04-04

wb1gzix02#

Use the numpy.intersect1d routine:

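import numpy as np

# Raises IndexError on a row with no common value, since [0] indexes an empty array.
res = df.apply(lambda x: np.intersect1d(x['col1'], x['col2'],
                                        return_indices=False)[0], axis=1)\
      .to_frame('result')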

   result
0       2
1       3
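Indexing the intersection with [0] fails when a row has no common value. A minimal sketch of a guarded variant follows; the helper name first_sorted_common and the second row of df_demo are illustrative assumptions, not part of the original answer:

```python
import numpy as np
import pandas as pd

def first_sorted_common(row):
    """Return the smallest common value of the two lists, or None if there is none."""
    common = np.intersect1d(row["col1"], row["col2"])
    return common[0] if common.size else None

# Second row is a made-up case with no overlap between the lists.
df_demo = pd.DataFrame({"col1": [[1, 2, 3, 5], [7, 8]],
                        "col2": [[2, 3, 4], [9]]})

res_safe = df_demo.apply(first_sorted_common, axis=1).to_frame("result")
print(res_safe)
```

The row without overlap ends up as a missing value in the result instead of raising an exception.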

Posted 2023-04-04

envsm3lx3#

You can use apply and define a function that returns the first common item of the two lists:

def first_common(x):
    for i in x['col1']:
        if i in x['col2']:
            return i

expected_df = df.apply(first_common, axis=1) # axis=1 applies function by rows
expected_df = expected_df.to_frame('results') # Make series to dataframe

expected_df is now:

   results
0        2
1        3

If you want to add the result back to the original DataFrame, you can do the following:

df['common'] = expected_df

df is now:

           col1       col2  common
0  [1, 2, 3, 5]  [2, 3, 4]       2
1     [3, 4, 5]     [3, 5]       3

Posted 2023-04-04

8cdiaqws4#

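You need to use a loop here. One option is a list comprehension, with a set as the reference for membership tests inside a generator expression passed to next: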

out = pd.DataFrame([next((x for x in a if x in b), None)
                    for a, b in zip(df['col1'],
                                    df['col2'].apply(set))
                    ], columns=['result'], index=df.index)

Output:

   result
0       2
1       3

As a new column:

df['common'] = [next((x for x in a if x in b), None)
                for a, b in zip(df['col1'],
                              df['col2'].apply(set))]

Output:

           col1       col2  common
0  [1, 2, 3, 5]  [2, 3, 4]       2
1     [3, 4, 5]     [3, 5]       3

Timing of the suggested solutions
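The measurements themselves are not reproduced here. As a rough sketch of how the approaches from this thread could be compared, the snippet below times three of them with timeit. The synthetic data, the row count, and the wrapper function names are all assumptions made for illustration, and results will vary with machine and data shape:

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic benchmark data (an assumption, not the data behind the original timings).
rng = np.random.default_rng(0)
n_rows = 1_000
bench = pd.DataFrame({
    "col1": [rng.integers(0, 100, size=20).tolist() for _ in range(n_rows)],
    "col2": [rng.integers(0, 100, size=20).tolist() for _ in range(n_rows)],
})

def with_intersect1d(df):
    # Smallest common value per row (sorted intersection), None if no overlap.
    def pick(row):
        common = np.intersect1d(row["col1"], row["col2"])
        return common[0] if common.size else None
    return df.apply(pick, axis=1)

def with_apply_loop(df):
    # First item of col1 found in col2, using list membership tests.
    def first_common(row):
        for i in row["col1"]:
            if i in row["col2"]:
                return i
        return None
    return df.apply(first_common, axis=1)

def with_set_and_next(df):
    # First item of col1 found in col2, using a set for membership tests.
    return [next((x for x in a if x in b), None)
            for a, b in zip(df["col1"], df["col2"].apply(set))]

repeats = 3
for fn in (with_intersect1d, with_apply_loop, with_set_and_next):
    seconds = timeit.timeit(lambda: fn(bench), number=repeats)
    print(f"{fn.__name__}: {seconds / repeats:.4f} s per run")
```

Note that the intersect1d variant answers a slightly different question (smallest common value), so it only matches the other two when the lists in col1 are ascending.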

Posted 2023-04-04
