python: how to assign the same index when a string from one row's list appears in another row

ldioqlga  posted on 2023-02-28  in  Python

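I am trying to create an index column for each row of a pandas DataFrame: if a string from one row's list of strings also appears in another row, the later row should get the same index as the earlier one.
For example, if a string from row[0] is in row[1], then row[1] should have the index of row[0].
I tried to write the code below, but it raises an error.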

import pandas as pd

# create a sample dataframe
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emily'],
       'Hobbies': [['reading', 'hiking', 'painting'], ['swimming', 'biking'], 
        ['cooking', 'dancing'], ['reading', 'swimming'], ['hiking', 'painting']]}
df = pd.DataFrame(data)

# loop over each row of the dataframe
for index, row in df.iterrows():
    # define a boolean mask for all rows except the current one
    mask = df.index != index
    # check if any string in the current row is present in the other rows
    if df.loc[mask, 'Hobbies'].apply(lambda x: any(item for item in row['Hobbies'] if item in x)).any():
        # set the index of the current row to the index of the other row
        df.at[index, 'Index'] = df.loc[mask, 'Index'].iloc[0]
    else:
        # assign a new index if no match is found
        df.at[index, 'Index'] = index

print(df)

vlf7wbxs1#

Assuming this input (note the different index):

      Name                      Hobbies
A    Alice  [reading, hiking, painting]
B      Bob           [swimming, biking]
C  Charlie           [cooking, dancing]
D    David          [reading, swimming]
E    Emily           [hiking, painting]

You can treat the data as a networkx graph and find its connected_components:

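from itertools import combinations
import networkx as nx

# each pair of hobbies within a row becomes an edge of the graph
G = nx.from_edgelist(c for l in df['Hobbies'] for c in combinations(l, 2))

# map each hobby to the number of its connected component
groups = {n: i for i, c in enumerate(nx.connected_components(G)) for n in c}

# group rows by the component of their first hobby and reuse
# the index label of the first row in each group
df.index = (df.index.to_series()
              .groupby(df['Hobbies'].str[0].map(groups))
              .transform('first')
           )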

Output:

      Name                      Hobbies
A    Alice  [reading, hiking, painting]
A      Bob           [swimming, biking]
C  Charlie           [cooking, dancing]
A    David          [reading, swimming]
A    Emily           [hiking, painting]

If you don't care about the original index, replace the last statement with:

df.index = df['Hobbies'].str[0].map(groups).to_numpy()

Output:

      Name                      Hobbies
0    Alice  [reading, hiking, painting]
0      Bob           [swimming, biking]
1  Charlie           [cooking, dancing]
0    David          [reading, swimming]
0    Emily           [hiking, painting]
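
The approach above can also be put together as one self-contained script. This is only a sketch under a few assumptions: `group_labels` is a hypothetical helper name, an extra row (`Frank`, with a single hobby) is added to the sample data, and `nx.add_path` is swapped in for `combinations` so that a row with only one hobby still becomes a node in the graph. With `combinations(l, 2)` such a row produces no edge and would be mapped to NaN.

```python
import networkx as nx
import pandas as pd


def group_labels(hobby_lists):
    """Return one component number per row (hypothetical helper)."""
    G = nx.Graph()
    for hobbies in hobby_lists:
        if hobbies:
            # add_path links consecutive items, which is enough to connect
            # all items of a row; a lone item becomes an isolated node
            nx.add_path(G, hobbies)
    node_to_group = {node: i
                     for i, component in enumerate(nx.connected_components(G))
                     for node in component}
    # all items of a row share a component, so the first item is enough;
    # an empty list gets no label (None)
    return [node_to_group[h[0]] if h else None for h in hobby_lists]


df = pd.DataFrame(
    {'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emily', 'Frank'],
     'Hobbies': [['reading', 'hiking', 'painting'], ['swimming', 'biking'],
                 ['cooking', 'dancing'], ['reading', 'swimming'],
                 ['hiking', 'painting'], ['chess']]},
    index=list('ABCDEF'))

labels = pd.Series(group_labels(df['Hobbies']), index=df.index)
# reuse the index label of the first row in each group
df.index = df.index.to_series().groupby(labels).transform('first')
print(df.index.tolist())
# ['A', 'A', 'C', 'A', 'A', 'F']
```

Frank shares no hobby with anyone, so he keeps his own label, while Bob joins Alice's group through David (who shares `reading` with Alice and `swimming` with Bob).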

[Figure: connected components of the hobbies graph]
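
Because the grouping comes entirely from the graph's connected components, it can help to list them and check which hobbies end up together. A minimal sketch, rebuilding the graph from the sample hobbies; the variable names `hobbies` and `components` are only illustrative:

```python
from itertools import combinations
import networkx as nx

hobbies = [['reading', 'hiking', 'painting'], ['swimming', 'biking'],
           ['cooking', 'dancing'], ['reading', 'swimming'],
           ['hiking', 'painting']]

# same graph as in the answer: every pair of hobbies in a row is an edge
G = nx.from_edgelist(c for l in hobbies for c in combinations(l, 2))

# sort the members of each component so the printout is easy to read
components = [sorted(c) for c in nx.connected_components(G)]
for i, c in enumerate(components):
    print(i, c)
# 0 ['biking', 'hiking', 'painting', 'reading', 'swimming']
# 1 ['cooking', 'dancing']
```

David's row (`reading`, `swimming`) is what merges Alice's and Bob's hobbies into a single component; Charlie's hobbies stay separate.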