Merging two pandas DataFrames without changing the index order

Posted by 1wnzp6jl on 2023-01-15 in Other

Basic question: I am trying to merge two DataFrames on a shared column without changing the index order. For example:

import pandas as pd

df1 = pd.DataFrame({'kabat_number': ['H1','H2','H2A','H3','H4','H20','H20A','H30','H31'],
                    'AA': ['A','C','S','Y','R','C','Y','V','I']})
df2 = pd.DataFrame({'kabat_number': ['H1','H2','H3','H4','H20A','H20B','H20C','H30','H31'],
                    'AA': ['A','C','Y','R','C','Y','L','G','V']})
dfs = pd.merge(df1, df2, on='kabat_number', how='outer')
print(dfs)

   kabat_number AA_x AA_y
0            H1    A    A
1            H2    C    C
2           H2A    S  NaN
3            H3    Y    Y
4            H4    R    R
5           H20    C  NaN
6          H20A    Y    C
7           H30    V    G
8           H31    I    V
9          H20B  NaN    Y
10         H20C  NaN    L

The order changes in the merged result: H20B and H20C, which only exist in df2, are placed at the end.
What I want instead is:

   kabat_number AA_x AA_y
0            H1    A    A
1            H2    C    C
2           H2A    S  NaN
3            H3    Y    Y
4            H4    R    R
5           H20    C  NaN
6          H20A    Y    C
7          H20B  NaN    Y
8          H20C  NaN    L
9           H30    V    G
10          H31    I    V

I also tried sort=False, but the order still changed. How can I get the result I want? Thanks!


Answer #1 (mbzjlibv)

After merging, sort with natsort_key so that the numeric part of each label is compared as a number (H4 before H20):

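# pip install natsort
import pandas as pd
from natsort import natsort_key

# df1 and df2 are the frames from the question.
# Outer-merge first, then restore natural order on the key column;
# ignore_index=True renumbers the rows 0..n-1 after sorting.
dfs = (pd.merge(df1, df2, on='kabat_number', how='outer')
         .sort_values(by='kabat_number', key=natsort_key, ignore_index=True)
      )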

Output:

   kabat_number AA_x AA_y
0            H1    A    A
1            H2    C    C
2           H2A    S  NaN
3            H3    Y    Y
4            H4    R    R
5           H20    C  NaN
6          H20A    Y    C
7          H20B  NaN    Y
8          H20C  NaN    L
9           H30    V    G
10          H31    I    V
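
If installing natsort is not an option, a similar ordering can be sketched with the standard library alone. The helper below is hypothetical: the name kabat_key and the regular expression are my own assumptions, not part of natsort or pandas. It assumes every label has the shape "letters, integer, optional insertion letters", as in the question's data, and raises on anything else.

```python
import re

# Hypothetical helper, not a natsort or pandas API.
# Assumed label shape: <letters><digits><optional letters>, e.g. 'H20A'.
_LABEL = re.compile(r"([A-Za-z]+)(\d+)([A-Za-z]*)")

def kabat_key(label):
    """Turn 'H20A' into ('H', 20, 'A') so the number compares as an int."""
    match = _LABEL.fullmatch(label)
    if match is None:
        raise ValueError(f"unexpected label: {label!r}")
    prefix, number, insertion = match.groups()
    return (prefix, int(number), insertion)

# Key column in the order the outer merge in the question produced it.
merged = ['H1', 'H2', 'H2A', 'H3', 'H4', 'H20', 'H20A',
          'H30', 'H31', 'H20B', 'H20C']

ordered = sorted(merged, key=kabat_key)
print(ordered)
```

The empty insertion string sorts before any letter, so H20 comes before H20A. To use this with sort_values, the function would have to be applied element-wise inside the key callable (for example by mapping it over the column); I have not verified that against every pandas version, so treat it as an assumption.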

Answer #2 (1tuwyuhd)

Try this:

import pandas as pd
from natsort import index_natsorted
import numpy as np

df1 = pd.DataFrame({'kabat_number':['H1','H2','H2A','H3','H4','H20','H20A','H30','H31'], 'AA':['A','C','S','Y','R','C','Y','V','I']})
df2 = pd.DataFrame({'kabat_number':['H1','H2','H3','H4','H20A','H20B','H20C','H30','H31'],'AA':['A','C','Y','R','C','Y','L','G','V']})
dfs = pd.merge(df1,df2,on='kabat_number',how='outer')
# The key must return one value per row, aligned with the input.
# index_natsorted gives the sorting permutation; argsort of that
# permutation is each row's rank in natural order.
dfs = dfs.sort_values(
    by='kabat_number',
    key=lambda x: np.argsort(index_natsorted(x))
    ).reset_index(drop=True)
print(dfs)
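
The key callable passed to sort_values receives the whole column and has to return one sortable value per row, in the original row order; returning the labels already sorted would not line up with the rows. The plain-Python sketch below shows how a sorting permutation is turned into per-row ranks, which is the role np.argsort plays when applied to an index array. The names natural_key, order and ranks are hypothetical, and the four labels are made up for illustration.

```python
import re

def natural_key(text):
    # Hypothetical helper: split 'H20A' into ['H', 20, 'A'] so that
    # digit runs compare as integers rather than as text.
    return [int(part) if part.isdigit() else part
            for part in re.split(r"(\d+)", text)]

labels = ['H10', 'H2', 'H1', 'H2A']

# Sorting permutation: order[k] is the position of the k-th smallest label.
order = sorted(range(len(labels)), key=lambda i: natural_key(labels[i]))

# Inverse permutation: ranks[i] is where labels[i] lands after sorting.
ranks = [0] * len(labels)
for rank, position in enumerate(order):
    ranks[position] = rank

print(order)   # positions to read from, in sorted order
print(ranks)   # one rank per original row, usable as a sort key
```

Here order and ranks differ ([2, 1, 3, 0] versus [3, 1, 0, 2]), which is why the permutation itself cannot be used directly as the per-row key.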
