I wrote a Python script to extract information from cardiology reports (stored as txt files) that look like this:
Cardiac report: Complete
Hospital Ultrasound Laboratory
----------
Name Test Test Birthdate Patient 123 Gender Height Weight BSA BP
Date 01/01/0001 Tape Sonographer Ref. Doc. Physician
----------
2D
AVC 351 ms Peak SL Dispersion Full 67 ms G peak SL Full(APLAX) -16 % G
peak SL Full(A4C) -13 % G peak SL Full(A2C) -16 % G peak SL
Full(Avg) -15 % BA PSSL Full -8 % BI PSSL Full -9 % MA PSSL Full -10 %
MI PSSL Full -20 % AA PSSL Full -21 % AI PSSL Full -31 % BAS PSSL
Full -11 % BP PSSL Full -6 % MAS PSSL Full -22 % MP PSSL Full -6 %
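As you can see below, I explicitly search for specific "keys" from a list, use a regular expression to find each key followed by its numeric value, and store the matches in a dictionary as key-value pairs: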
import re

keys = ['AVC', 'Peak SL Dispersion Full', 'G peak SL Full(A4C)', 'G peak SL Full(Avg)', 'G peak SL Full(APLAX)', 'BA PSSL Full', 'BI PSSL Full', 'MA PSSL Full', 'MP PSSL Full']
results = {}

with open('report1.txt') as f:
    content = f.read()

for key in keys:
    match = re.search(key + r'\s+([\d.-]+)', content)
    if match:
        results[key] = float(match.group(1))

print(results)
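For some unknown reason, re.search fails for the keys that contain parentheses, such as 'G peak SL Full(A4C)', 'G peak SL Full(Avg)', and 'G peak SL Full(APLAX)'.
When I remove the () from the keys in the txt file, so that they read 'G peak SL FullA4C', 'G peak SL FullAvg', and 'G peak SL FullAPLAX', they are magically found and inserted into the final dictionary.
What is the reason for this?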
1 Answer
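Because ( and ) have a special meaning in regular expressions (they delimit a capturing group), they need to be escaped to match literal parentheses. Unescaped, the pattern 'G peak SL Full(A4C)' matches the text G peak SL FullA4C rather than G peak SL Full(A4C), which is why the search succeeded once you removed the parentheses from the file. Fortunately, there is a function that does the escaping for you: re.escape. You can then replace one line with: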
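A minimal sketch of that change, assuming the line in question is the re.search call inside the loop. The content string here is a shortened, single-line excerpt of the report used for illustration, not the real file:

```python
import re

# Illustrative excerpt of the report, joined onto one line (an assumption
# about the layout; the real script reads this from report1.txt).
content = (
    "AVC 351 ms Peak SL Dispersion Full 67 ms G peak SL Full(APLAX) -16 % "
    "G peak SL Full(A4C) -13 % BA PSSL Full -8 %"
)
keys = ['AVC', 'G peak SL Full(APLAX)', 'G peak SL Full(A4C)', 'BA PSSL Full']

results = {}
for key in keys:
    # re.escape turns "(" and ")" into "\(" and "\)", so the key is
    # matched literally instead of being read as a capturing group.
    match = re.search(re.escape(key) + r'\s+([\d.-]+)', content)
    if match:
        results[key] = float(match.group(1))

# For comparison: without escaping, "(A4C)" is a group that matches the
# bare text "A4C", so the pattern looks for "G peak SL FullA4C".
unescaped = re.search('G peak SL Full(A4C)' + r'\s+([\d.-]+)', content)

print(results)
print(unescaped)
```

Because the key is escaped, its parentheses no longer create a group, so group(1) still refers to the numeric value.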