regex: Extracting fixed-length digits anywhere in a given string using regex

Posted by 2fjabf4q on 2022-11-18 in Other


I have a pandas column with sample text like the following, and I need to extract fixed-length identifiers from the text.

import pandas as pd

df1 = pd.DataFrame({'Incident_details': ['324657_Sample text1 about the incident',
' 316678_sample text2 with details of incident',
'*DEPARTMENT LIST 316878-Sample text3 with information, ph: 01314522345',
'327787_34587621 (sample text4 with incident details)',
'Sample text5 with details',
'327997_1000587621 (sample text6 with incident info',
' 314489_incident text7 details',
'DEPARTMENT_LIST_325489_Text8 details',
'DEPARTMENT3_316489 text9 details',
'DEPARTMENT_LIST_326499',
'324512_1000257218',
'314656_text10(01345782345)',
'324757_03456789',
'DEPARTMENT_CDES_324903_35678910 (details text11)',
'326512_34500257218 - text12 details',
'Incident 325621_ 316512_ sample text 13']})

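The identifier I need to extract always starts with 3 and has a fixed length of 6 digits.
It can appear at the start of the string, after whitespace (one, two, or three spaces), or after an underscore.
A given string can contain more than one ID, and I need all of them in the output.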

Currently I am using

df1['Incident_id'] = df1['Incident_details'].str \
   .findall(r'(?:^|\s|[^_])(\d{6})').str.join(", ")

This expression does not give the correct output for my requirements.
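To make the failure concrete, here is a minimal sketch that runs the question's pattern, unchanged, on two of the sample strings with plain `re` (an assumption made for brevity; `Series.str.findall` applies `re.findall` to each element). The likely culprits are that `[^_]` consumes any non-underscore character, including a digit, and that nothing requires a leading 3 or a non-digit after the sixth digit.

```python
import re

# The pattern from the question, unchanged.
pattern = r'(?:^|\s|[^_])(\d{6})'

samples = [
    '327787_34587621 (sample text4 with incident details)',
    '324757_03456789',
]

for text in samples:
    # [^_] can swallow a leading digit, so a six-digit slice from the
    # middle of a longer number is captured as if it were an ID.
    print(text, '->', re.findall(pattern, text))

# -> ['327787', '458762']
# -> ['324757', '345678']
```

In both cases the second hit is a fragment of a longer number rather than a standalone six-digit identifier.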

regex

来源：https://stackoverflow.com/questions/74263968/extracting-fixed-length-digits-anywhere-in-given-string-using-regex

1 answer


k10s72fa1#

Something like this should work:

(?:^|(?<=\D))3\d{5}(?=\D|$)

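(?:^|(?<=\D)) - what precedes the match is the start of the line or a non-digit character
Python's re module does not support variable-width lookbehind, so this variant cannot be used: (?<=^|\D)
3\d{5} - the digit 3 followed by five digits
(?=\D|$) - what follows the match is a non-digit character or the end of the line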
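The pattern explained above can be tried on a few rows from the question. This is a sketch, not the answerer's own code: `ID_PATTERN` and `extract_ids` are hypothetical names introduced for illustration, and plain `re` is used so the snippet runs without pandas.

```python
import re

# The answer's pattern, compiled once for reuse.
ID_PATTERN = re.compile(r'(?:^|(?<=\D))3\d{5}(?=\D|$)')


def extract_ids(text: str) -> str:
    """Return every match in `text`, joined with ', ' (hypothetical helper)."""
    # The pattern has no capture groups, so findall returns whole matches.
    return ', '.join(ID_PATTERN.findall(text))


samples = [
    '324657_Sample text1 about the incident',
    '*DEPARTMENT LIST 316878-Sample text3 with information, ph: 01314522345',
    '327787_34587621 (sample text4 with incident details)',
    'Sample text5 with details',
    'DEPARTMENT3_316489 text9 details',
    'Incident 325621_ 316512_ sample text 13',
]

for text in samples:
    print(repr(extract_ids(text)))

# '324657'
# '316878'
# '327787'
# ''
# '316489'
# '325621, 316512'
```

On the question's DataFrame, the same pattern could be passed to `Series.str.findall`, for example `df1['Incident_details'].str.findall(pattern).str.join(', ')`, mirroring the structure of the original attempt.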

https://regex101.com/r/8AoWeK/1
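As a side note on the lookbehind restriction, the sketch below shows how Python's `re` rejects the variable-width variant, and tries a negative-lookaround spelling that is not part of the original answer. The name `alt` is hypothetical, and the alternative is only expected to match the same IDs on data like the samples in the question.

```python
import re

# The alternatives ^ (width 0) and \D (width 1) differ in width,
# so `re` refuses to compile them inside a single lookbehind.
try:
    re.compile(r'(?<=^|\D)3\d{5}')
    compiled = True
except re.error as exc:
    compiled = False
    print('re.error:', exc)

# Alternative spelling (an assumption, not from the answer):
# "not preceded by a digit" and "not followed by a digit".
alt = re.compile(r'(?<!\d)3\d{5}(?!\d)')
print(alt.findall('Incident 325621_ 316512_ sample text 13'))
# ['325621', '316512']
```

Negative lookarounds succeed at the start and end of the string without a separate `^` or `$` branch, which is why this form needs no alternation.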

Posted 2022-11-18
