Adding a string prefix to a pandas DataFrame column that contains multiple comma-separated values

Posted by u3r8eeie on 2023-04-19 in Other

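I am trying to create a new column in a pandas DataFrame that consists of a string prefix plus the values from another column. Some cells in the source column hold several comma-separated values. For example: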

MIMNumber      
102610
114080,601079

I would like the DataFrame to look like this:

MIMNumber       OMIM_Link
102610  https://www.omim.org/entry/102610
114080,601079   https://www.omim.org/entry/114080,https://www.omim.org/entry/601079

I tried this:

df['OMIM_Link'] = df['MIMNumber'].map('https://www.omim.org/entry/{}'.format)

But this does not add the prefix to every value in cells that hold several comma-separated values:

MIMNumber       OMIM_Link
102610  https://www.omim.org/entry/102610
114080,601079   https://www.omim.org/entry/114080,601079

I also tried this:

url = 'https://www.omim.org/entry/'
df['OMIM_Link'] = df['MIMNumber'].apply(url.join)

But then the prefix is inserted between every character of each cell:

MIMNumber       OMIM_Link
102610  1https://www.omim.org/entry/0https://www.omim.org/entry/2https://www.omim.org/entry/6https://www.omim.org/entry/1https://www.omim.org/entry/0
114080,601079   1https://www.omim.org/entry/1https://www.omim.org/entry/4https://www.omim.org/entry/0https://www.omim.org/entry/8https://www.omim.org/entry/0https://www.omim.org/entry/,https://www.omim.org/entry/6https://www.omim.org/entry/0https://www.omim.org/entry/1https://www.omim.org/entry/0https://www.omim.org/entry/7https://www.omim.org/entry/9

Any suggestions?
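
For readers wondering why both attempts behave this way, here is a minimal sketch in plain Python (no pandas), assuming each cell is an ordinary string. The variable names are illustrative only. `str.format` drops the whole cell into the single placeholder, and `str.join` treats the cell as a sequence of characters, so neither one sees the individual comma-separated numbers unless the cell is split first.

```python
url = 'https://www.omim.org/entry/'
cell = '114080,601079'

# str.format substitutes the entire cell for the one {} placeholder,
# so only the first number ends up with a prefix.
formatted = (url + '{}').format(cell)

# str.join iterates over the characters of its argument and
# places url between each pair of characters.
joined = url.join('102610')

# Splitting on the comma first yields one link per number.
fixed = ','.join(url + part for part in cell.split(','))

print(formatted)
print(joined)
print(fixed)
```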

#1 oymdgrw7

You can try a regex replace:

df['out'] = df['MIMNumber'].replace(r'(\d+)', r'https://www.omim.org/entry/\1', regex=True)
print(df)

       MIMNumber  \
0         102610
1  114080,601079

                                                                   out
0                                    https://www.omim.org/entry/102610
1  https://www.omim.org/entry/114080,https://www.omim.org/entry/601079
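
As a self-contained variation on the same idea, the sketch below uses `Series.str.replace` with a callable replacement, which sidesteps backreference escaping in the replacement string. It assumes the column holds strings; the third row, with a space after the comma, is a hypothetical value added only to show that matching on digits leaves the separators untouched.

```python
import pandas as pd

PREFIX = 'https://www.omim.org/entry/'  # URL prefix from the question

# The third row is a made-up example with a space after the comma.
df = pd.DataFrame({'MIMNumber': ['102610', '114080,601079', '114080, 601079']})

# Prefix every run of digits; commas and spaces are left as they are.
df['OMIM_Link'] = df['MIMNumber'].str.replace(
    r'\d+', lambda m: PREFIX + m.group(0), regex=True)

print(df['OMIM_Link'].tolist())
```

Because only the digits are matched, this approach does not depend on how the values are separated, but it would also prefix any other digits that happen to appear in a cell.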

#2 cwtwac6a

Replace each comma with ,https://www.omim.org/entry/ and prepend https://www.omim.org/entry/ to the whole string:

df['OMIM_Link'] = 'https://www.omim.org/entry/' + df['MIMNumber'].str.replace(',', ',https://www.omim.org/entry/')
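
The one-liner above assumes commas with no surrounding spaces. A sketch of the same result written as split-then-join may be easier to adapt, for example to strip whitespace around each value. The helper name `add_prefix` is hypothetical and not part of pandas.

```python
import pandas as pd

PREFIX = 'https://www.omim.org/entry/'  # URL prefix from the question

def add_prefix(cell, prefix=PREFIX):
    """Hypothetical helper: prefix each comma-separated item in one cell."""
    return ','.join(prefix + item.strip() for item in cell.split(','))

df = pd.DataFrame({'MIMNumber': ['102610', '114080,601079']})

# map applies the helper to each cell of the column
df['OMIM_Link'] = df['MIMNumber'].map(add_prefix)

print(df['OMIM_Link'].tolist())
```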

#3 5m1hhzi4

If you have several different domains or paths, put them in the OMIM_Link column and combine them row by row:

import pandas as pd

# OMIM_Link starts out holding one URL prefix per comma-separated MIM number
df = pd.DataFrame({'MIMNumber': ['102610', '114080,601079'],
                   'OMIM_Link': ['https://www.omim.org/entry/',
                                 'https://www.omim.org/entry/,https://www.omim.org/entry/']})

for i in df.index:
    mims = df.loc[i, 'MIMNumber'].split(',')
    prefixes = df.loc[i, 'OMIM_Link'].split(',')
    # Pair each prefix with its number; writing through .loc avoids
    # chained assignment (df['OMIM_Link'][i] = ...), which may not update df
    df.loc[i, 'OMIM_Link'] = ','.join(p + m for p, m in zip(prefixes, mims))

print(df)

This produces the output you want:

       MIMNumber                                          OMIM_Link
0         102610                  https://www.omim.org/entry/102610
1  114080,601079  https://www.omim.org/entry/114080,https://www....
