pandas: merging DataFrames based on an if-else condition

Posted by ukqbszuj on 2023-03-11 in Other

I have the following two DataFrames, df1 and df2:

 type num1  dem
   t1   m1   32
   t1   m2   33
   t2   p1   34
   t2   p2   35

and

  type  num1  num2
   t1    -     23
   t1    -     24
   t2    n1    25
   t2    n2    26

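I want to merge these two DataFrames according to two different rules. Where df2['num1'] is empty, I merge on type only. Where df2['num1'] is non-empty, I concatenate the type and num1 columns in both DataFrames and merge on that new column.
My initial idea was to add an extra column named flag to both DataFrames, split them on it, and handle each part separately. Finally, pd.concat would join the two results.
This is what I tried: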

df3 = pd.merge(df1, df2[df2['num1'].isna()], on = 'type', how = 'left')

But it does not give me the right answer. I think I am missing something.
Please help.

7rtdyuoh #1

If I understand your rules correctly, you could do it like this:

import pandas as pd
import numpy as np

# Create the first data frame
df1 = pd.DataFrame({
    'type': ['t1', 't1', 't2', 't2'],
    'num1': ['m1', 'm2', 'p1', 'p2'],
    'dem': [32, 33, 34, 35]
})

df2 = pd.DataFrame({
    'type': ['t1', 't1', 't2', 't2'],
    'num1': ['', '', 'n1', 'n2'],
    'num2': [23, 24, 25, 26]
})

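def merge_key(row):
    # An empty num1 (NaN or '') means "match on type only".
    if pd.isnull(row['num1']) or row['num1'] == '':
        return row['type']
    # Otherwise the key is type and num1 concatenated, e.g. 't2n1'.
    return row['type'] + row['num1']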

df1['merge_key'] = df1.apply(merge_key, axis=1)
df2['merge_key'] = df2.apply(merge_key, axis=1)

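# Split df2 by whether num1 is empty.
df2_empty = df2[df2['num1'] == '']
df2_nonempty = df2[df2['num1'] != '']
# Empty num1: match on type only. Non-empty num1: match on the combined key.
merged_empty = pd.merge(df1, df2_empty[['type', 'num2']], on='type', how='left')
merged_nonempty = pd.merge(df1, df2_nonempty[['merge_key', 'num2']], on='merge_key', how='left')

# Stack both results. The left merges keep unmatched df1 rows; their num2 becomes 0.
merged = pd.concat([merged_empty, merged_nonempty], ignore_index=True)
merged['num2'] = merged['num2'].fillna(0)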

print(merged)

This returns:

  type num1  dem merge_key  num2
0   t1   m1   32      t1m1  23.0
1   t1   m1   32      t1m1  24.0
2   t1   m2   33      t1m2  23.0
3   t1   m2   33      t1m2  24.0
4   t2   p1   34      t2p1   0.0
5   t2   p2   35      t2p2   0.0
6   t1   m1   32      t1m1   0.0
7   t1   m2   33      t1m2   0.0
8   t2   p1   34      t2p1   0.0
9   t2   p2   35      t2p2   0.0
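
Rows 4 to 9 in the output above are df1 rows that found no partner in a left merge and then had num2 filled with 0. If only matched rows are wanted, a minimal sketch under that assumption could use inner merges and build the key without a row-wise apply. Treating '' as the empty marker follows this answer's sample data; the names is_empty, by_type and by_key are illustrative only.

```python
import pandas as pd

df1 = pd.DataFrame({
    "type": ["t1", "t1", "t2", "t2"],
    "num1": ["m1", "m2", "p1", "p2"],
    "dem": [32, 33, 34, 35],
})
df2 = pd.DataFrame({
    "type": ["t1", "t1", "t2", "t2"],
    "num1": ["", "", "n1", "n2"],
    "num2": [23, 24, 25, 26],
})

# Assumption: an empty string marks "no num1" in df2.
is_empty = df2["num1"].eq("")

# Key is `type` alone where num1 is empty, otherwise type + num1.
df1["merge_key"] = df1["type"] + df1["num1"]
df2["merge_key"] = df2["type"].where(is_empty, df2["type"] + df2["num1"])

# Inner merges (the default) keep only rows that actually match.
by_type = df1.merge(df2.loc[is_empty, ["type", "num2"]], on="type")
by_key = df1.merge(df2.loc[~is_empty, ["merge_key", "num2"]], on="merge_key")

merged = pd.concat([by_type, by_key], ignore_index=True)
print(merged)
```

With this sample data only the four t1 combinations remain, since t2n1 and t2n2 have no counterpart among df1's keys.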

9jyewag0 #2

I think your concat idea is right. Use a mask to split df2 into two parts and handle the two cases separately:

# condition to split df2
# this could be df2['num1'].eq('-')
# if dashes are used
m = df2['num1'].isna()

# merge separately and combine
out = pd.concat([
    df1.merge(df2[~m], on=['type', 'num1']),
    df1.merge(df2[m].drop(columns=['num1']), on=['type']),
], ignore_index=True)

print(out)

Output (using a slightly different df2 as input):

  type num1  dem  num2
0   t2   p1   34    25
1   t1   m1   32    23
2   t1   m1   32    24
3   t1   m2   33    23
4   t1   m2   33    24

Input used:

import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    'type': ['t1', 't1', 't2', 't2'],
    'num1': ['m1', 'm2', 'p1', 'p2'],
    'dem': [32, 33, 34, 35]
})

df2 = pd.DataFrame({
    'type': ['t1', 't1', 't2', 't2'],
    'num1': [np.nan, np.nan, 'p1', 'n2'],
    'num2': [23, 24, 25, 26]
})
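
If the question's df2 really holds literal '-' strings, isna() selects no rows at all. A hedged sketch, assuming '-' is the only placeholder, turns the dashes into missing values first and then applies the same split-and-concat approach to the question's original frames.

```python
import pandas as pd

df1 = pd.DataFrame({
    "type": ["t1", "t1", "t2", "t2"],
    "num1": ["m1", "m2", "p1", "p2"],
    "dem": [32, 33, 34, 35],
})
df2 = pd.DataFrame({
    "type": ["t1", "t1", "t2", "t2"],
    "num1": ["-", "-", "n1", "n2"],
    "num2": [23, 24, 25, 26],
})

# Assumption: '-' is a literal placeholder string, so blank it out to NaN.
df2["num1"] = df2["num1"].mask(df2["num1"].eq("-"))

# True where num1 is missing, i.e. where only `type` should be matched.
m = df2["num1"].isna()

out = pd.concat([
    df1.merge(df2[~m], on=["type", "num1"]),                # match on type and num1
    df1.merge(df2[m].drop(columns=["num1"]), on=["type"]),  # match on type only
], ignore_index=True)
print(out)
```

With the question's data only the t1 rows survive, because n1 and n2 in df2 have no counterpart in df1.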
