我有3个 Dataframe
DF1
CAT1 CAT2 CAT3 ID_X
A1 B C X1
A1 B C X2
A2 B C X3
A2 B C X4
A2 B C X5
A3 B C X6
A4 B C X7
DF2
CAT1 CAT2 CAT3 ID_Y
A1 B C Y1
A1 B C Y2
A1 B C Y3
A2 B C Y4
A2 B C Y5
A3 B C Y6
A5 B C Y7
DF3
ID_X ID_Y ID_XY
X1 Y1 X1Y1
X2 Y3 X2Y3
X3 Y4 X3Y4
X4 Y5 X4Y5
X6 Y6 X6Y6
有三个步骤可以得到最终结果
Step1:在CAT 1、CAT 2和CAT 3上内连接df 1和df 2,根据ID_X和ID_Y创建ID_XY列,得到新的 Dataframe df_merge
脚本
df_merge = pd.merge(df1, df2, how="inner", on=["CAT1", "CAT2","CAT3"])
df_merge['ID_XY'] = df_merge['ID_X']+merge1['ID_Y']
Step2:使用df 3从df_merge中删除具有公共ID_XY的行
脚本
df_merge1 = df_merge[~df_merge.ID_XY.isin(df3.ID_XY)]
df_merge1
CAT1 CAT2 CAT3 ID_X ID_Y ID_XY
A1 B C X1 Y2 X1Y2
A1 B C X1 Y3 X1Y3
A1 B C X2 Y1 X2Y1
A1 B C X2 Y2 X2Y2
A2 B C X3 Y5 X3Y5
A2 B C X4 Y4 X4Y4
A2 B C X5 Y4 X5Y4
A2 B C X5 Y5 X5Y5
Step 3:根据CAT 1、CAT 2、CAT 3、ID_X、ID_Y选择 Dataframe 第一行,如果ID_Y中的列值之前出现过,则删除行
最终输出将是步骤3的最终结果:输出如下所示
df_final
CAT1 CAT2 CAT3 ID_X ID_Y ID_XY
A1 B C X1 Y2 X1Y2
A1 B C X2 Y1 X2Y1
A2 B C X3 Y5 X3Y5
A2 B C X4 Y4 X4Y4
编辑1
**注意:**将ID_X视为JOB,ID_Y视为候选项!在df_merge1中,如果我为X1选择Y2(给定其他列常量),则无法为相同的X1选择Y3。类似地,如果X4分配给Y 4(给定相同的CAT 1,CAT 2和CAT 3),则Y 4无法分配给X5。
编辑二:
我试试
merge3 = df_merge1.copy()
df_X1 = merge3[merge3['ID_X']=='X1']
df_X2 = merge3[merge3['ID_X']=='X2']
df_X3 = merge3[merge3['ID_X']=='X3']
df_X4 = merge3[merge3['ID_X']=='X4']
df_X5 = merge3[merge3['ID_X']=='X5']
selected_list = []
df_X1 = df_X1.iloc[:1]
selected_list.append(df_X1['ID_Y'].values[0])
df_X2 = df_X2[~df_X2.ID_Y.isin(selected_list)].iloc[:1]
selected_list.append(df_X2['ID_Y'].values[0])
df_X3 = df_X3[~df_X3.ID_Y.isin(selected_list)].iloc[:1]
selected_list.append(df_X3['ID_Y'].values[0])
df_X4 = df_X4[~df_X4.ID_Y.isin(selected_list)].iloc[:1]
selected_list.append(df_X4['ID_Y'].values[0])
df_X5 = df_X5[~df_X5.ID_Y.isin(selected_list)].iloc[:1]
df_output = pd.concat([df_X1,df_X2,df_X3,df_X4,df_X5])
- 任何帮助都将不胜感激 *
1条答案
按热度按时间gmxoilav1#
如果需要通过
CAT1/CAT2/CAT3/ID_X
删除重复项,然后通过ID_Y
用途:或者: