pandas: get a substring using a regular expression

Posted by nwo49xxi on 2023-02-11 in Other

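I'm new to pandas. I'm trying to extract one of several possible substrings from a string, but I only want to look for them between a specific start and end position.
If one is present, I need its position and which substring it was.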

pandas

Source: https://stackoverflow.com/questions/75287064/get-a-substring-using-regex

1 answer

bfnvny8b1#

Use str.replace:

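target = 'hi|love'

# Boolean mask of the rows whose sequence contains one of the targets
m = df['sequence'].str.contains(target)

# For matching rows, replace the whole string with "<1-based start>,<match>"
df.loc[m, 'output'] = (df.loc[m, 'sequence']
                         .str.replace(fr'.*({target}).*',
                                      lambda x: f'{x.start(1)+1},{x.group(1)}',
                                      regex=True)
                       )

# Rows without a match
df.loc[~m, 'output'] = 'NA'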

Output:

      sequence  output
0   HelloWorld      NA
1    worldofhi    8,hi
2  worldoflove  8,love
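
To see what the replacement callable does for a single string, here is a minimal sketch using only the standard `re` module. The helper name `describe` is made up for illustration. It also shows why the `str.contains` mask is needed: a string without a match passes through unchanged instead of becoming 'NA'.

```python
import re

target = 'hi|love'
# Same pattern that is passed to str.replace: the leading and trailing '.*'
# make the match span the whole string, so the whole value is replaced.
pattern = re.compile(fr'.*({target}).*')

def describe(match):
    # start(1) is the 0-based index of the captured group; +1 makes it 1-based.
    return f'{match.start(1) + 1},{match.group(1)}'

print(pattern.sub(describe, 'worldoflove'))  # 8,love
print(pattern.sub(describe, 'HelloWorld'))   # HelloWorld (no match, unchanged)
```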

Input used:

      sequence
0   HelloWorld
1    worldofhi
2  worldoflove
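
For anyone who wants to run this end to end, the following is a self-contained sketch that rebuilds the input above and applies the same str.replace approach. The function name `annotate` and its `column` parameter are assumptions made for this example, not part of the original answer.

```python
import pandas as pd

def annotate(df, target, column='sequence'):
    """Return a copy of df with an 'output' column: '<1-based start>,<match>' or 'NA'."""
    out = df.copy()
    found = out[column].str.contains(target)
    out['output'] = 'NA'  # default for rows without a match
    out.loc[found, 'output'] = out.loc[found, column].str.replace(
        fr'.*({target}).*',
        lambda match: f'{match.start(1) + 1},{match.group(1)}',
        regex=True,
    )
    return out

# Sample data copied from the input table above.
df = pd.DataFrame({'sequence': ['HelloWorld', 'worldofhi', 'worldoflove']})
result = annotate(df, 'hi|love')
print(result)
```

Note that the leading `.*` is greedy, so when a string contains several matches this pattern reports the last one rather than the first.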

To check only within positions 7 through 10 (0-based, inclusive):

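target = 'hi|love'

# Keep only characters 7..10 (the slice end is exclusive, hence 10+1)
s = df['sequence'].str[7:10+1]

m = s.str.contains(target)

# start(1) is relative to the slice, so add the 7-character offset back
df.loc[m, 'output'] = (s[m]
                         .str.replace(fr'.*({target}).*',
                                      lambda x: f'{x.start(1)+7+1},{x.group(1)}',
                                      regex=True)
                       )

# Rows without a match inside the window
df.loc[~m, 'output'] = 'NA'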
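
As an alternative sketch, a compiled pattern's `search` method accepts `pos` and `endpos` arguments, which restrict the search to a window while still reporting positions relative to the full string, so no manual offset is needed. The helper name `find_in_window` and its default bounds are assumptions for this example. Unlike the greedy pattern above, it returns the first match inside the window.

```python
import re

def find_in_window(text, target, start=7, stop=11):
    """Search text[start:stop] only; return '<1-based start>,<match>' or 'NA'."""
    match = re.compile(target).search(text, start, stop)
    if match is None:
        return 'NA'
    # match.start() is an index into the full string, not into the window.
    return f'{match.start() + 1},{match.group()}'

for text in ['HelloWorld', 'worldofhi', 'worldoflove']:
    print(text, find_in_window(text, 'hi|love'))
```

With pandas this could be applied per row with something like `df['sequence'].map(lambda t: find_in_window(t, target))`.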

Answered 2023-02-11
