python: How to create the same index if a string from a list in one row is present in another row

Posted by ldioqlga on 2023-02-28 in Python


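I am trying to create an index column for each row of a pandas DataFrame. If any string in a row's list of strings also appears in another row, both rows should get the same index.
For example, if a string from row[0] also appears in row[1], then row[1] should take the index of row[0].
I tried writing the code below, but it raises an error: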

import pandas as pd

# create a sample dataframe
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emily'],
       'Hobbies': [['reading', 'hiking', 'painting'], ['swimming', 'biking'], 
        ['cooking', 'dancing'], ['reading', 'swimming'], ['hiking', 'painting']]}
df = pd.DataFrame(data)

# loop over each row of the dataframe
for index, row in df.iterrows():
    # define a boolean mask for all rows except the current one
    mask = df.index != index
    # check if any string in the current row is present in the other rows
    if df.loc[mask, 'Hobbies'].apply(lambda x: any(item for item in row['Hobbies'] if item in x)).any():
        # set the index of the current row to the index of the other row
        df.at[index, 'Index'] = df.loc[mask, 'Index'].iloc[0]
    else:
        # assign a new index if no match is found
        df.at[index, 'Index'] = index

print(df)


Source: https://stackoverflow.com/questions/75582267/how-to-create-a-same-index-if-string-of-list-in-one-row-present-in-another-rows

1 Answer

vlf7wbxs #1

Assuming this input (note the non-default index):

      Name                      Hobbies
A    Alice  [reading, hiking, painting]
B      Bob           [swimming, biking]
C  Charlie           [cooking, dancing]
D    David          [reading, swimming]
E    Emily           [hiking, painting]

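You can treat the data as a networkx graph, with hobbies as nodes and an edge between every pair of hobbies that share a row, and then compute its connected_components: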

from itertools import combinations
import networkx as nx

G = nx.from_edgelist(c for l in df['Hobbies'] for c in combinations(l, 2))

groups = {n: i for i, c in enumerate(nx.connected_components(G)) for n in c}

df.index = (df.index.to_series()
              .groupby(df['Hobbies'].str[0].map(groups))
              .transform('first')
           )

Output:

      Name                      Hobbies
A    Alice  [reading, hiking, painting]
A      Bob           [swimming, biking]
C  Charlie           [cooking, dancing]
A    David          [reading, swimming]
A    Emily           [hiking, painting]

If you do not care about the original index, replace the last statement (the df.index assignment) with:

df.index = df['Hobbies'].str[0].map(groups).to_numpy()

Output:

      Name                      Hobbies
0    Alice  [reading, hiking, painting]
0      Bob           [swimming, biking]
1  Charlie           [cooking, dancing]
0    David          [reading, swimming]
0    Emily           [hiking, painting]
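
One thing to watch in the approach above: combinations(l, 2) yields nothing for a one-element list, so a row with a single hobby never becomes a node in G and its first hobby has no entry in groups. Below is a minimal sketch of one possible workaround, not part of the original answer: add every hobby as a node explicitly, then chain consecutive hobbies with edges (a chain is enough to put all hobbies of a row into one component). The sample rows, including the name Frank, are made up for illustration, and empty lists are still not handled.

```python
import networkx as nx
import pandas as pd

# Hypothetical sample data: the last row has a single hobby.
df = pd.DataFrame(
    {
        "Name": ["Alice", "Bob", "Frank"],
        "Hobbies": [["reading", "hiking"], ["hiking", "biking"], ["chess"]],
    },
    index=list("ABC"),
)

G = nx.Graph()
for hobbies in df["Hobbies"]:
    # Register every hobby, so single-hobby rows also appear in the graph.
    G.add_nodes_from(hobbies)
    # Link consecutive hobbies; this connects all hobbies of the row.
    G.add_edges_from(zip(hobbies, hobbies[1:]))

# Map each hobby to the number of the connected component it belongs to.
groups = {n: i for i, c in enumerate(nx.connected_components(G)) for n in c}

# Label each row by the component of its first hobby.
labels = df["Hobbies"].str[0].map(groups)
print(labels.tolist())
```

Here Alice and Bob share hiking and end up with the same label, while Frank gets a label of his own instead of a missing value.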

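Connected components: for the sample data there are two, {reading, hiking, painting, swimming, biking} and {cooking, dancing}.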
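
The grouping itself does not depend on networkx. As an illustration of what connected components compute here, the following is a dependency-free sketch that uses a union-find (disjoint-set) structure instead of a graph library. This is a swapped-in technique, not the answer's method, and the function name group_labels is hypothetical.

```python
def group_labels(rows):
    """Return one group label per row. Rows that share an item, directly
    or through a chain of other rows, receive the same label."""
    parent = {}

    def find(x):
        # Find the representative of x, registering x on first sight.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a, b):
        # Merge the sets containing a and b.
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for items in rows:
        if items:
            find(items[0])  # make sure single-item rows are registered
        for item in items[1:]:
            union(items[0], item)

    labels, seen = [], {}
    for items in rows:
        if not items:
            labels.append(None)  # an empty row belongs to no group
            continue
        root = find(items[0])
        # Number the groups in order of first appearance.
        labels.append(seen.setdefault(root, len(seen)))
    return labels


hobbies = [
    ["reading", "hiking", "painting"],
    ["swimming", "biking"],
    ["cooking", "dancing"],
    ["reading", "swimming"],
    ["hiking", "painting"],
]
labels = group_labels(hobbies)
print(labels)  # [0, 0, 1, 0, 0]
```

With the DataFrame from the question, the result could be assigned directly, for example df.index = group_labels(df['Hobbies']), which gives the same 0/0/1/0/0 labelling as the numeric variant of the answer.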

Answered on 2023-02-28
