numpy: if a column is a substring of another DataFrame's column, set the value

wgeznvg7  posted on 2023-02-08  in Other
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'Key':['OK340820.1','OK340821.1'],'Length':[50000,67000]})
df2 = pd.DataFrame({'Key':['OK340820','OK340821'],'Length':[np.nan,np.nan]})

If df2.Key is a substring of df1.Key, I want to set df2's Length to the corresponding Length value from df1.
I tried this:

df2['Length']=np.where(df2.Key.isin(df1.Key.str.extract(r'(.+?(?=\.))')), df1.Length, '')

But it does not return the matches.
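A note on the attempt above: `Series.str.extract` returns a one-column DataFrame by default (`expand=True`), and `isin` then appears to compare against that frame's column labels rather than the extracted strings, so nothing matches. The sketch below is a minimal variant that passes `expand=False` to get a Series and uses `np.nan` instead of `''` so the column stays numeric. It assumes, like the original attempt, that both frames are in the same row order; the names `stems` and `mask` are introduced here for illustration only.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'Key': ['OK340820.1', 'OK340821.1'], 'Length': [50000, 67000]})
df2 = pd.DataFrame({'Key': ['OK340820', 'OK340821'], 'Length': [np.nan, np.nan]})

# expand=False makes str.extract return a Series instead of a one-column DataFrame
stems = df1.Key.str.extract(r'(.+?(?=\.))', expand=False)

# True where a df2 key equals one of the stripped df1 keys
mask = df2.Key.isin(stems)

# np.where works positionally, so this relies on df1 and df2 sharing row order
df2['Length'] = np.where(mask, df1.Length, np.nan)
```

Because `np.where` pairs rows by position, a key-based lookup is usually the safer choice when the two frames may be ordered differently.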

qlfbtfca #1

Map df2.Key onto the "prepared" Key values of df1:

df2['Length'] = df2.Key.map(dict(zip(df1.Key.str.replace(r'\..+', '', regex=True), df1.Length)))
In [45]: df2
Out[45]: 
        Key  Length
0  OK340820   50000
1  OK340821   67000
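As a self-contained sketch of the same lookup, the example below reorders df2 and adds a made-up key (`'XX000000'`, not in the original data) to show what happens when a key has no counterpart in df1. The `lookup` name is introduced here for illustration only.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'Key': ['OK340820.1', 'OK340821.1'], 'Length': [50000, 67000]})
# 'XX000000' is a hypothetical key with no match in df1
df2 = pd.DataFrame({'Key': ['OK340821', 'XX000000', 'OK340820'], 'Length': np.nan})

# Strip everything from the first dot onward, then build {stripped key: Length}
lookup = dict(zip(df1.Key.str.replace(r'\..+', '', regex=True), df1.Length))

# map looks each df2 key up by value, so row order does not matter;
# keys missing from the dict come back as NaN
df2['Length'] = df2.Key.map(lookup)
```

Since unmatched keys yield NaN, the resulting column is float rather than int in that case.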

h9a6wy2h #2

You can use a regular expression to extract the string, then map the values:

import re

pattern = '|'.join(map(re.escape, df2['Key']))

s = pd.Series(df1['Length'].values, index=df1['Key'].str.extract(f'({pattern})', expand=False))

df2['Length'] = df2['Key'].map(s)

Updated df2:

        Key  Length
0  OK340820   50000
1  OK340821   67000
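To illustrate why building the pattern from df2's keys can be useful, here is a sketch with hypothetical df1 keys whose version suffixes vary (`.2`, `.10`); these values are assumptions, not part of the question. The `matched` name is also introduced for illustration.

```python
import re

import numpy as np
import pandas as pd

# Hypothetical data: the version suffixes are not all '.1'
df1 = pd.DataFrame({'Key': ['OK340820.2', 'OK340821.10'], 'Length': [50000, 67000]})
df2 = pd.DataFrame({'Key': ['OK340820', 'OK340821'], 'Length': np.nan})

# Alternation of the escaped df2 keys: 'OK340820|OK340821'
pattern = '|'.join(map(re.escape, df2['Key']))

# For each df1 key, capture whichever df2 key occurs inside it
matched = df1['Key'].str.extract(f'({pattern})', expand=False)

# Series of df1 lengths indexed by the matched df2 key, used as a lookup table
s = pd.Series(df1['Length'].values, index=matched)
df2['Length'] = df2['Key'].map(s)
```

One caveat: regex alternation takes the first alternative that matches, so if one df2 key is a prefix of another, sorting the keys by length (longest first) before joining may be needed.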

Or use merge:

import re

pattern = '|'.join(map(re.escape, df2['Key']))

(df2.drop(columns='Length')
    .merge(df1, how='left', left_on='Key', suffixes=(None, '_'),
           right_on=df1['Key'].str.extract(f'({pattern})', expand=False))
    .drop(columns='Key_')
)
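A possibly easier-to-read variant of the merge above, sketched under the assumption that a temporary helper column is acceptable. The column name `BaseKey` and the variables `right` and `out` are hypothetical and not part of the original answer.

```python
import re

import numpy as np
import pandas as pd

df1 = pd.DataFrame({'Key': ['OK340820.1', 'OK340821.1'], 'Length': [50000, 67000]})
df2 = pd.DataFrame({'Key': ['OK340820', 'OK340821'], 'Length': [np.nan, np.nan]})

pattern = '|'.join(map(re.escape, df2['Key']))

# Store the matched df2 key in a helper column and drop df1's original Key
right = (df1.assign(BaseKey=df1['Key'].str.extract(f'({pattern})', expand=False))
            .drop(columns='Key'))

# Left merge keeps every df2 row; the helper column is removed afterwards
out = (df2.drop(columns='Length')
          .merge(right, how='left', left_on='Key', right_on='BaseKey')
          .drop(columns='BaseKey'))
```

Like the original, this returns a new DataFrame rather than modifying df2 in place.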

If the keys in df1 always have the form XXX.1 and stripping the .1 is sufficient, an alternative is:

df2['Length'] = df2['Key'].map(df1.set_index(df1['Key'].str.extract('([^.]+)', expand=False))['Length'])

wfveoks0 #3

Another possible solution, based on pandas.DataFrame.update:

df2.update(df1.assign(Key = df1['Key'].str.extract(r'(.*)\.')))

Output:

        Key   Length
0  OK340820  50000.0
1  OK340821  67000.0
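Note that `DataFrame.update` aligns on the index, so the one-liner above relies on df1 and df2 having their rows in the same order. A minimal sketch of a key-aligned variant follows, assuming the stripped keys are unique; the reversed row order in df2 and the names `stems` and `by_key` are illustrative assumptions.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'Key': ['OK340820.1', 'OK340821.1'], 'Length': [50000, 67000]})
# df2 rows are deliberately in the opposite order to df1
df2 = pd.DataFrame({'Key': ['OK340821', 'OK340820'], 'Length': [np.nan, np.nan]})

# Everything before the last dot, returned as a Series
stems = df1['Key'].str.extract(r'(.*)\.', expand=False)

# Index both frames by key so update aligns on the key, not on row position
by_key = df2.set_index('Key')
by_key.update(df1.assign(Key=stems).set_index('Key'))

# Restore Key as an ordinary column, keeping df2's original row order
df2 = by_key.reset_index()
```

The Length column stays float here because it started out as NaN.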
