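I am trying to create an index column for each row of a pandas DataFrame. If any string in a row's list of strings also appears in another row, both rows should get the same index.
For example, if a string from row[0] also appears in row[1], then row[1] should get the index of row[0].
I tried writing the code below, but it raises an error: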
import pandas as pd

# create a sample dataframe
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emily'],
        'Hobbies': [['reading', 'hiking', 'painting'], ['swimming', 'biking'],
                    ['cooking', 'dancing'], ['reading', 'swimming'], ['hiking', 'painting']]}
df = pd.DataFrame(data)

# loop over each row of the dataframe
for index, row in df.iterrows():
    # define a boolean mask for all rows except the current one
    mask = df.index != index
    # check if any string in the current row is present in the other rows
    if df.loc[mask, 'Hobbies'].apply(lambda x: any(item for item in row['Hobbies'] if item in x)).any():
        # set the index of the current row to the index of the other row
        df.at[index, 'Index'] = df.loc[mask, 'Index'].iloc[0]
    else:
        # assign a new index if no match is found
        df.at[index, 'Index'] = index

print(df)
1 Answer
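Assuming this input (note the non-default index):
You can treat the data as a networkx graph and find its connected_components:
Output:
If you don't care about the original index, replace the last line with:
Output:
Connected components: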
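The graph approach described above can be sketched as follows. This is a minimal sketch, not necessarily the answerer's exact code: the index labels 10 to 14, the tagged-tuple node scheme, and the `Group` column name are assumptions made for illustration. Each row is linked to its hobbies in a networkx graph, and every row in a connected component receives the label of the earliest row in that component.

```python
import networkx as nx
import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emily'],
        'Hobbies': [['reading', 'hiking', 'painting'], ['swimming', 'biking'],
                    ['cooking', 'dancing'], ['reading', 'swimming'], ['hiking', 'painting']]}
# Assumption: hypothetical non-default index labels, chosen only for illustration.
df = pd.DataFrame(data, index=[10, 11, 12, 13, 14])

# Build a graph that links each row label to every hobby in that row.
# Nodes are tagged tuples so a row label can never collide with a hobby string.
G = nx.Graph()
for label, hobbies in df['Hobbies'].items():
    G.add_node(('row', label))  # keeps rows with an empty list in the graph
    G.add_edges_from((('row', label), ('hobby', hobby)) for hobby in hobbies)

# Rows in the same connected component share a hobby, directly or through a chain.
group = {}
for component in nx.connected_components(G):
    rows = [label for kind, label in component if kind == 'row']
    # Use the label of the earliest row in the component (assumes a unique index).
    first = min(rows, key=df.index.get_loc)
    for label in rows:
        group[label] = first

df['Index'] = [group[label] for label in df.index]

# Variant for when the original labels do not matter: consecutive group numbers.
df['Group'] = pd.factorize(df['Index'])[0]

print(df)
```

With the sample data, Alice, Bob, David and Emily fall into one component (linked through reading, swimming, hiking and painting), while Charlie forms a component of his own. Note that the grouping is transitive: Bob and Emily share no hobby directly but are still grouped together through David and Alice.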