Python pandas: update/merge 2 DataFrames based on multiple matching column values

68bkxrlz  posted 12 months ago in Python

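I have two DataFrames, left_df and right_df. Both have 20 columns with the same names and dtypes. right_df also has two extra columns, each holding a unique value per row.
I want to update the rows in right_df with all the values from left_df wherever the values in every column of a subset list, matching_cols = ['col_1', 'col_3', 'col_10', 'col_12'], are the same in both DataFrames. The values in the 2 extra unique columns of right_df should be kept.
Ideally I would also remove those rows from left_df in the same command, or in the next one. I need to run this process several times, matching on several different column lists, with the matched rows removed from left_df on each pass, until no more matches are found.
An acceptable alternative is to create a new DataFrame, new_df, containing the set of rows where all the columns listed in matching_cols match, with the values in the first 20 columns taken from left_df and the values in the remaining 2 columns taken from right_df.
I don't care about preserving the index at any point. Afterwards I import the data into SQL and finally re-index it on one of the 2 unique right_df values.
I'm new to pandas and not sure which method to use. I have tried variations of .merge, .join, .update and so on, but I can't work out how to make the update happen only when all the columns I care about match, or how to remove those rows / export them to a new df.
Update: added sample code below.
With left_df as: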

import pandas as pd

left_df = pd.DataFrame({
   'col_0': ['0', '1', '2', '3', '4', '5'],
   'col_1': ['A', 'B', 'C', 'D', 'E', 'F'],
   'col_2': ['new', 'new', 'new', 'new', 'new', 'new'],
   'col_3': ['new', 'new', 'new', 'new', 'new', 'new'],
   'col_4': ['new', 'new', 'new', 'new', 'new', 'new'],
   'col_5': ['new', 'new', 'new', 'new', 'new', 'new'],
   'col_6': ['new', 'new', 'new', 'new', 'new', 'new'],
   'col_7': ['new', 'new', 'new', 'new', 'new', 'new'],
  })

and right_df as:

right_df = pd.DataFrame({
   'col_0': ['0', '1', '2', '3', '4', '5'],
   'col_1': ['A', 'B', 'C', 'X', 'E', 'F'],
   'col_2': ['old', 'old', 'old', 'old', 'old', 'old'],
   'col_3': ['old', 'old', 'old', 'old', 'old', 'old'],
   'col_4': ['old', 'old', 'old', 'old', 'old', 'old'],
   'col_5': ['old', 'old', 'old', 'old', 'old', 'old'],
   'col_6': ['old', 'old', 'old', 'old', 'old', 'old'],
   'col_7': ['old', 'old', 'old', 'old', 'old', 'old'],
   'col_8': ['uid_0', 'uid_1', 'uid_2', 'uid_3', 'uid_4', 'uid_5'],
   'col_9': ['uid_a', 'uid_b', 'uid_c', 'uid_d', 'uid_e', 'uid_f'],                
  })


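where matching_cols = ['col_0', 'col_1']
I want to get the following result, either as a new DataFrame or in place on right_df (note that col_1 does not match in row 3, so that row is left unchanged):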

  col_0 col_1 col_2 col_3 col_4 col_5 col_6 col_7  col_8  col_9
0     0     A   new   new   new   new   new   new  uid_0  uid_a
1     1     B   new   new   new   new   new   new  uid_1  uid_b
2     2     C   new   new   new   new   new   new  uid_2  uid_c
3     3     X   old   old   old   old   old   old  uid_3  uid_d
4     4     E   new   new   new   new   new   new  uid_4  uid_e
5     5     F   new   new   new   new   new   new  uid_5  uid_f

Answer #1 (3zwtqj6y)

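Thanks to this post and the pandas documentation:
First, I needed a .merge in which I specify a suffix of '_r' only for the columns copied from right_df, i.e. the old values that are being replaced: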

merged_df = pd.merge(left_df, right_df, on=['col_0', 'col_1'], suffixes=(None, '_r'))

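This produces a new DataFrame whose rows contain both the new and the old columns, and only for the rows where the values in the on=['col_0', 'col_1'] columns match in both DataFrames. I then drop the "old" columns by filtering on the text '_r' with a regular expression: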

merged_df.drop(list(merged_df.filter(regex = '_r')), axis=1, inplace=True)


This yields a DataFrame containing only the "modified" rows and none of the unmodified ones, which is close enough to what I need.

  col_0 col_1 col_2 col_3 col_4 col_5 col_6 col_7  col_8  col_9
0     0     A   new   new   new   new   new   new  uid_0  uid_a
1     1     B   new   new   new   new   new   new  uid_1  uid_b
2     2     C   new   new   new   new   new   new  uid_2  uid_c
3     4     E   new   new   new   new   new   new  uid_4  uid_e
4     5     F   new   new   new   new   new   new  uid_5  uid_f
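
The question also asked for the matched rows to be removed from left_df and for the process to be repeated over several column lists, which the merge above does not cover. A minimal sketch of one way to do that, using the indicator=True option of merge: the helper name take_matches, the trimmed sample frames and the second key list ['col_0'] are all made up for illustration, and the sketch assumes each key combination appears at most once in right_df.

```python
import pandas as pd

# Trimmed copies of the question's sample data (fewer columns for brevity).
left_df = pd.DataFrame({
    'col_0': ['0', '1', '2', '3', '4', '5'],
    'col_1': ['A', 'B', 'C', 'D', 'E', 'F'],
    'col_2': ['new'] * 6,
})
right_df = pd.DataFrame({
    'col_0': ['0', '1', '2', '3', '4', '5'],
    'col_1': ['A', 'B', 'C', 'X', 'E', 'F'],
    'col_2': ['old'] * 6,
    'col_8': ['uid_0', 'uid_1', 'uid_2', 'uid_3', 'uid_4', 'uid_5'],
    'col_9': ['uid_a', 'uid_b', 'uid_c', 'uid_d', 'uid_e', 'uid_f'],
})


def take_matches(left, right, keys):
    """Hypothetical helper: return (updated_rows, remaining_left) for one key list.

    Assumes every key combination is unique in `right`; duplicates there
    would multiply the matching rows of `left`.
    """
    # Columns that exist only in right (the unique id columns).
    extra_cols = [c for c in right.columns if c not in left.columns]
    # Bring over only keys + right-only columns, so no suffixes are needed.
    merged = left.merge(right[keys + extra_cols], on=keys,
                        how='left', indicator=True)
    matched = merged['_merge'] == 'both'
    updated = merged.loc[matched].drop(columns='_merge').reset_index(drop=True)
    remaining = merged.loc[~matched, left.columns].reset_index(drop=True)
    return updated, remaining


# Repeat over several key lists; the second list is an invented example.
key_lists = [['col_0', 'col_1'], ['col_0']]
parts = []
remaining = left_df
for keys in key_lists:
    updated, remaining = take_matches(remaining, right_df, keys)
    parts.append(updated)

new_df = pd.concat(parts, ignore_index=True)
```

Each pass shrinks remaining, so the loop can stop early once it is empty. Note that this sketch does not remove rows from right_df, so a right_df row matched in one pass can match again in a later pass with a looser key list.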

Answer #2 (zsohkypk)

Try this:

# Positional concat: rows are paired by index, not by the matching columns.
new_df = pd.concat([left_df, right_df.iloc[:, -2:]], axis=1)


Answer #3 (yb3bgrhw)

Using your own solution:

merged_df = pd.merge(left_df, right_df, on=['col_0', 'col_1'], suffixes=(None, '_r'))
merged_df.drop(list(merged_df.filter(regex = '_r')), axis=1, inplace=True)

you can combine the merged rows back into right_df to reach the final goal:

mix_df = right_df.set_index(['col_0','col_1'])
mix_df[['col_2','col_3','col_4','col_5','col_6','col_7','col_8','col_9']] = merged_df.set_index(['col_0','col_1'])
final_df = mix_df.fillna(right_df.set_index(['col_0','col_1'])).reset_index()


Result (final_df):

  col_0 col_1 col_2 col_3 col_4 col_5 col_6 col_7  col_8  col_9
0     0     A   new   new   new   new   new   new  uid_0  uid_a
1     1     B   new   new   new   new   new   new  uid_1  uid_b
2     2     C   new   new   new   new   new   new  uid_2  uid_c
3     3     X   old   old   old   old   old   old  uid_3  uid_d
4     4     E   new   new   new   new   new   new  uid_4  uid_e
5     5     F   new   new   new   new   new   new  uid_5  uid_f
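
For comparison, the same result can probably be reached without the intermediate merge by indexing both frames on the matching columns and calling DataFrame.update. This is only a sketch on a trimmed copy of the question's sample data, and it assumes the key combinations are unique in both frames.

```python
import pandas as pd

# Trimmed copies of the question's sample data (fewer columns for brevity).
left_df = pd.DataFrame({
    'col_0': ['0', '1', '2', '3', '4', '5'],
    'col_1': ['A', 'B', 'C', 'D', 'E', 'F'],
    'col_2': ['new'] * 6,
})
right_df = pd.DataFrame({
    'col_0': ['0', '1', '2', '3', '4', '5'],
    'col_1': ['A', 'B', 'C', 'X', 'E', 'F'],
    'col_2': ['old'] * 6,
    'col_8': ['uid_0', 'uid_1', 'uid_2', 'uid_3', 'uid_4', 'uid_5'],
})

matching_cols = ['col_0', 'col_1']

# Index both frames on the key columns so that update() aligns rows by key.
right_idx = right_df.set_index(matching_cols)
# update() works in place: it overwrites values only where the key exists in
# both frames, and leaves columns missing from left_df (col_8) untouched.
right_idx.update(left_df.set_index(matching_cols))

final_df = right_idx.reset_index()
```

One caveat: update skips NaN values in left_df by default, so a missing value on the left will not blank out an existing value on the right.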
