Pandas在两个 Dataframe 之间合并相同的多个列

brvekthn  于 2022-11-27  发布在  其他
关注(0)|答案(3)|浏览(236)

我试图找到一种方法来合并多列在同一时间与Pandas。我有我想要的输出做五个单独的合并,但感觉应该有一个更Python的方式来做。
实际上,我有一个名为df_striking的 Dataframe ,其中包含五个关键字列,我试图将另一个 Dataframe (名为df_keyword_vol)中的搜索卷数据合并到相邻的行中。
最小重现性示例:

import pandas as pd

striking_data = {
    "KW1": ["nectarine", "apricot", "plum"],
    "KW1 Vol": ["", "", ""],
    "KW2": ["apple", "orange", "pear"],
    "KW2 Vol": ["", "", ""],
    "KW3": ["banana", "grapefruit", "cherry"],
    "KW3 Vol": ["", "", ""],
    "KW4": ["kiwi", "lemon", "peach"],
    "KW4 Vol": ["", "", ""],
    "KW5": ["raspberry", "blueberry", "berries"],
    "KW5 Vol": ["", "", ""],
}

df_striking = pd.DataFrame(striking_data)

keyword_vol_data = {
    "Keyword": [
        "nectarine",
        "apricot",
        "plum",
        "apple",
        "orange",
        "pear",
        "banana",
        "grapefruit",
        "cherry",
        "kiwi",
        "lemon",
        "peach",
        "raspberry",
        "blueberry",
        "berries",
    ],
    "Volume": [
        1000,
        500,
        200,
        600,
        800,
        1000,
        450,
        10,
        900,
        1200,
        150,
        700,
        400,
        850,
        1000,
    ],
}

df_keyword_vol = pd.DataFrame(keyword_vol_data)

所需输出

我已经尝试过了。我已经做了两个函数来合并关键字数据一次一行,但它只是不是很Python!

# two functions to merge in the keyword volume data for KWs 1 - 5
def merger(col1, col2):
    dx = df_striking.merge(df_keyword_vol, how='left', left_on=col1, right_on=col2)
    return dx

def volume(vol1, vol2):
    vol = df_striking[vol1] = df_striking[vol2]
    df_striking.drop(['Keyword', 'Volume'], axis=1, inplace=True)
    return vol

df_striking = merger("KW1", "Keyword")
volume("KW1 Vol", "Volume")
df_striking = merger("KW2", "Keyword")
volume("KW2 Vol", "Volume")
df_striking = merger("KW3", "Keyword")
volume("KW3 Vol", "Volume")
df_striking = merger("KW4", "Keyword")
volume("KW4 Vol", "Volume")
df_striking = merger("KW5", "Keyword")
volume("KW5 Vol", "Volume")
ojsjcaue

ojsjcaue1#

如果已经有空列,则可以用途:

mapping = df_keyword_vol.set_index('Keyword')['Volume']

df_striking.iloc[:, 1::2] = df_striking.iloc[:, ::2].replace(mapping)

否则,如果只有KWx列:

df2 = (pd.concat([df, df.replace(mapping)], axis=1)
         .sort_index(axis=1)
       )

输出:

KW1   KW1     KW2   KW2         KW3  KW3    KW4   KW4        KW5   KW5
0  nectarine  1000   apple   600      banana  450   kiwi  1200  raspberry   400
1    apricot   500  orange   800  grapefruit   10  lemon   150  blueberry   850
2       plum   200    pear  1000      cherry  900  peach   700    berries  1000
6gpjuf90

6gpjuf902#

如果您将其全部转换为长格式,则会更容易:

>>> striking = df_striking.filter(regex='KW[0-9]*$').stack().rename('Keyword').reset_index()
>>> joined = striking.merge(df_keyword_vol)
>>> joined
  level_0 level_1     Keyword  Volume
0       0     KW1   nectarine    1000
1       0     KW2       apple     600
2       0     KW3      banana     450
3       0     KW4        kiwi    1200
4       0     KW5   raspberry     400
5       1     KW1     apricot     500
6       1     KW2      orange     800
7       1     KW3  grapefruit      10
8       1     KW4       lemon     150
9       1     KW5   blueberry     850
10      2     KW1        plum     200
11      2     KW2        pear    1000
12      2     KW3      cherry     900
13      2     KW4       peach     700
14      2     KW5     berries    1000

然后,您可以使用.pivot获得原始格式,但使用多索引作为列:

>>> joined.pivot('index', 'level_1', ['Keyword', 'Volume'])
           Keyword                                       Volume                       
level_1        KW1     KW2         KW3    KW4        KW5    KW1   KW2  KW3   KW4   KW5
index                                                                                 
0        nectarine   apple      banana   kiwi  raspberry   1000   600  450  1200   400
1          apricot  orange  grapefruit  lemon  blueberry    500   800   10   150   850
2             plum    pear      cherry  peach    berries    200  1000  900   700  1000

我们可以使用pd.concat来解决这种奇怪的格式:

>>> pd.concat([
...     joined.pivot('index', 'level_1', 'Keyword'),
...     joined.pivot('index', 'level_1', 'Volume').add_suffix(' Vol')
... ], axis='columns').sort_index(axis='columns')
level_1        KW1  KW1 Vol     KW2  KW2 Vol         KW3  KW3 Vol    KW4  KW4 Vol        KW5  KW5 Vol
index                                                                                                
0        nectarine     1000   apple      600      banana      450   kiwi     1200  raspberry      400
1          apricot      500  orange      800  grapefruit       10  lemon      150  blueberry      850
2             plum      200    pear     1000      cherry      900  peach      700    berries     1000
r3i60tvu

r3i60tvu3#

pd.concat([v.reset_index(drop=True).drop('col1',axis=1)
           for k,v in
           df_keyword_vol.assign(col1=df_keyword_vol.index//3)
          .groupby('col1')]
          ,axis=1)\
    .set_axis(df_striking.columns,axis=1)

    KW1   KW1     KW2   KW2         KW3  KW3    KW4   KW4        KW5   KW5
0  nectarine  1000   apple   600      banana  450   kiwi  1200  raspberry   400
1    apricot   500  orange   800  grapefruit   10  lemon   150  blueberry   850
2       plum   200    pear  1000      cherry  900  peach   700    berries  1000

相关问题