pandas: replace part of a string with values from a dictionary

gk7wooem posted on 2023-05-21 in Other

I have a DataFrame with two columns:

import pandas as pd

data = {'features': ['2q07_female', '2q06_male'], 'importance': [0.25, 0.30]}
df = pd.DataFrame(data)

I want to use the dictionary

dictnames = {'2q07': 'color of hair', 's2q06': 'color of eyes'}

to replace those values so that I end up with this DataFrame:

df_new = pd.DataFrame({'features': ['color of hair_female', 'color of eyes_male'], 
                       'importance': [0.25, 0.30]})

Can anyone show me a simple way to do this?

Answer #1 (nsc4cvqm)

Given

import pandas as pd

data = {'features': ['2q07_female', '2q06_male'], 'importance': [0.25, 0.30]}
df = pd.DataFrame(data)
dictnames = {'2q07': 'color of hair', '2q06': 'color of eyes'}  # fixed key: 's2q06' -> '2q06'

running

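import re

# One alternation of all (escaped) keys, e.g. 2q07|2q06
pat = re.compile('|'.join(re.escape(k) for k in dictnames))
# The callable receives each match and returns its dictionary value; regex=True is required
# for a compiled pattern (pandas >= 2.0 defaults to regex=False and raises a ValueError otherwise)
df['features'] = df['features'].str.replace(pat, lambda m: dictnames[m.group(0)], regex=True)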

updates df to

               features  importance
0  color of hair_female        0.25
1    color of eyes_male        0.30
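As a hedged sketch, the single-pass regex idea above can be wrapped in a self-contained helper. The function name `replace_keys` is hypothetical. The one addition over the answer is sorting the keys longest-first, on the assumption that some keys might be prefixes of others: Python's `re` alternation takes the first alternative that matches, not the longest one.

```python
import re

import pandas as pd


def replace_keys(series: pd.Series, mapping: dict) -> pd.Series:
    """Hypothetical helper: replace every occurrence of a mapping key in one regex pass."""
    if not mapping:
        # An empty alternation would match the empty string everywhere.
        return series.copy()
    # Longest keys first, so a key that is a prefix of another key
    # (e.g. 'ab' vs 'a') cannot win the alternation too early.
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    # A compiled pattern and a callable replacement both need regex=True.
    return series.str.replace(pattern, lambda m: mapping[m.group(0)], regex=True)


df = pd.DataFrame(
    {"features": ["2q07_female", "2q06_male"], "importance": [0.25, 0.30]}
)
dictnames = {"2q07": "color of hair", "2q06": "color of eyes"}
df["features"] = replace_keys(df["features"], dictnames)
print(df)
```

Values that contain no key (e.g. a hypothetical `'2q99_x'`) pass through unchanged, because the pattern simply never matches them.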

You could also do

# Replace each key literally; this makes one pass over the column per key
for k, v in dictnames.items():
    df['features'] = df['features'].str.replace(k, v, regex=False)

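but it gets very slow with large dictionaries, because it scans the whole column once for every key. Panda Kim's answer should be the fastest, as long as your data is structured like "KeyToReplace_rest of string".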
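Besides speed, the loop can also give a different result from the single-pass regex when a replacement value itself contains another key, because later iterations also rewrite text produced by earlier ones. The toy mapping `{'a': 'b', 'b': 'c'}` below is made up purely to illustrate this; it does not occur in the question's data.

```python
import re

import pandas as pd

# Hypothetical mapping where one replacement value ('b') is also a key.
mapping = {"a": "b", "b": "c"}
s = pd.Series(["a_1", "b_2"])

# Sequential loop: 'a' -> 'b' first, then every 'b' (including the new one) -> 'c'.
looped = s.copy()
for k, v in mapping.items():
    looped = looped.str.replace(k, v, regex=False)

# Single pass: each match in the ORIGINAL text is looked up exactly once.
keys = sorted(mapping, key=len, reverse=True)
pattern = re.compile("|".join(re.escape(k) for k in keys))
single = s.str.replace(pattern, lambda m: mapping[m.group(0)], regex=True)

print(looped.tolist())  # ['c_1', 'c_2']
print(single.tolist())  # ['b_1', 'c_2']
```

With the question's dictionary this cannot happen (no value contains a key), so both approaches agree there.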

Answer #2 (yizd12fk)

I think 's2q06' in dictnames should be '2q06'; with that fixed, the problem can be solved as follows.

Example

import pandas as pd

data = {'features': ['2q07_female', '2q06_male'], 'importance': [0.25, 0.30]}
df = pd.DataFrame(data)
dictnames = {'2q07': 'color of hair', '2q06': 'color of eyes'}

Code

s = df['features'].str.split('_')

s

0    [2q07, female]
1      [2q06, male]

Edit the features column

df.assign(features=s.str[0].replace(dictnames).str.cat(s.str[1], sep='_'))

Output:

               features  importance
0  color of hair_female        0.25
1    color of eyes_male        0.30

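It's not clear how you want to handle values in the features column whose prefix has no entry in dictnames, so I used replace: it leaves such values unchanged, whereas map would turn them into NaN.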
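A minimal sketch of the same split-then-replace idea, assuming you want to keep everything after the first underscore (even if it contains more underscores) and leave rows without a recognised prefix untouched. It swaps `str.split` for `Series.str.partition`, which always yields three columns (prefix, separator, remainder); the two extra rows are hypothetical test data, not from the question. It also shows the `replace` vs `map` difference described above.

```python
import pandas as pd

# The last two rows are hypothetical edge cases added for illustration.
df = pd.DataFrame(
    {"features": ["2q07_female", "2q06_male", "9x99_other_part", "nounderscore"],
     "importance": [0.25, 0.30, 0.10, 0.05]}
)
dictnames = {"2q07": "color of hair", "2q06": "color of eyes"}

# Split only at the FIRST underscore: column 0 = prefix, 1 = '_' ('' if absent),
# 2 = everything after it (any further underscores are kept).
parts = df["features"].str.partition("_")

# Series.replace leaves prefixes that are not dict keys unchanged...
replaced = parts[0].replace(dictnames)
# ...whereas Series.map turns them into NaN.
mapped = parts[0].map(dictnames)

# Glue prefix, separator and remainder back together.
df_new = df.assign(features=replaced + parts[1] + parts[2])
print(df_new)
```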
