Regular expression to identify text between semicolons that contains a comma and whitespace

Posted by czfnxgou on 2023-03-15 in Other

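I am trying to identify text that contains a comma (,) followed by whitespace (\s+) in a CSV file that is semicolon (;) delimited. Sample CSV entries are shown below: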

09/03/2023;13;P;1210/2003 (OJ L169);2003-07-08;http://eur-lex.europa.eu/LexUriServ/LexUriServ.do?uri=OJ:L:2003:169:0006:0023:EN:PDF;IRQ;(UNSC RESOLUTION 1483);;;;;;;;;;;;;;;;;;;;;;;;;;;14;13;1210/2003 (OJ L169);2003-07-08;http://eur-lex.europa.eu/LexUriServ/LexUriServ.do?uri=OJ:L:2003:169:0006:0023:EN:PDF;IRQ;1937-04-28;al-Awja, near Tikrit;IRQ;;;;;;;;;;;;;;;;EU.27.28
09/03/2023;20;P;1210/2003 (OJ L169);2003-07-08;http://eur-lex.europa.eu/LexUriServ/LexUriServ.do?uri=OJ:L:2003:169:0006:0023:EN:PDF;IRQ;(Saddam's second son);26;20;1210/2003 (OJ L169);2003-07-08;http://eur-lex.europa.eu/LexUriServ/LexUriServ.do?uri=OJ:L:2003:169:0006:0023:EN:PDF;IRQ;Hussein Al-Tikriti;Qusay;Saddam;Qusay Saddam Hussein Al-Tikriti;M;;Oversaw Special Republican Guard, Special Security Organisation, and Republican Guard;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;EU.39.56

From the sample data, I am trying to extract the following text:

al-Awja, near Tikrit
Oversaw Special Republican Guard, Special Security Organisation, and Republican Guard

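Both examples of the target text contain commas (,). This causes a problem when converting the semicolon (;) delimited file to a comma (,) delimited file, because each comma already present in a string creates an extra column.
So far I have the regex below, which gets me to the desired text. However, I cannot retrieve the whole string with it.
Regex: ([A-Za-z0-9-]+)([,])(\s+)([A-Za-z0-9-]+)
Please help.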
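
To illustrate why the pattern above stops short, here is a minimal Python sketch. It shows that the question's regex captures only the single token on each side of the comma, and compares it with one possible whole-field pattern. The names `field_pattern` and `fields_with_comma` are hypothetical, and the sketch assumes it is applied to one line at a time and that fields never contain a semicolon.

```python
import re

# The pattern from the question: one token, a comma, whitespace, one token.
question_pattern = re.compile(r"([A-Za-z0-9-]+)([,])(\s+)([A-Za-z0-9-]+)")

# Hypothetical alternative: any run of non-semicolon characters that
# contains a comma followed by whitespace, i.e. one whole field.
field_pattern = re.compile(r"[^;]*,\s+[^;]*")


def fields_with_comma(line: str) -> list[str]:
    """Return every semicolon-delimited field in `line` that contains ', '."""
    return field_pattern.findall(line)


# Shortened versions of the two sample rows (assumption: trimmed for brevity).
line1 = "IRQ;1937-04-28;al-Awja, near Tikrit;IRQ;;EU.27.28"
line2 = ("M;;Oversaw Special Republican Guard, Special Security Organisation, "
         "and Republican Guard;;EU.39.56")

# The question's pattern matches only the tokens adjacent to the comma.
print(question_pattern.search(line1).group(0))
# The field-level pattern returns the complete field.
print(fields_with_comma(line1))
print(fields_with_comma(line2))
```

The difference is that `[^;]*` is allowed to span spaces and further commas, so it extends to the surrounding semicolons instead of stopping at the first character outside `[A-Za-z0-9-]`.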

csv

Source: https://stackoverflow.com/questions/75745421/regular-expression-to-identify-text-between-semi-colons-that-contains-comma-and

1 Answer

lkaoscv71#

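If it does not have to be a regex, you can read the CSV with pandas, for example. Given that you are always looking for the same columns, your code could look like this: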

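import pandas as pd

# Read the semicolon-delimited file; header=None treats the first line as data
df = pd.read_csv('yourFile.csv', sep=';', header=None)
# Select the two columns (0-based positions) that hold the free-text fields
print(df[[20, 41]])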

For the sample data, this returns:
| | 20 | 41 |
|---|---|---|
| 0 | NaN | al-Awja, near Tikrit |
| 1 | Oversaw Special Republican Guard, Special Security... | NaN |
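
If the end goal is the comma-delimited file mentioned in the question, extracting the fields may not be necessary at all, because a CSV writer can quote any field that contains a comma. The following is a minimal sketch using Python's standard `csv` module rather than pandas. The function name `semicolon_to_comma` is hypothetical, and the sample row is a shortened version of the data in the question.

```python
import csv
import io


def semicolon_to_comma(text: str) -> str:
    """Rewrite semicolon-delimited text as comma-delimited CSV text."""
    reader = csv.reader(io.StringIO(text), delimiter=";")
    out = io.StringIO()
    # QUOTE_MINIMAL (the default) quotes only fields that contain the
    # delimiter, the quote character, or a line break.
    writer = csv.writer(out, delimiter=",", quoting=csv.QUOTE_MINIMAL,
                        lineterminator="\n")
    writer.writerows(reader)
    return out.getvalue()


# Shortened sample row (assumption: trimmed from the question's data).
sample = "IRQ;1937-04-28;al-Awja, near Tikrit;IRQ;;EU.27.28\n"
converted = semicolon_to_comma(sample)
print(converted)

# Reading the result back yields the same six fields, comma intact.
row = next(csv.reader(io.StringIO(converted)))
print(len(row), row[2])
```

Because the field with the embedded comma is written inside double quotes, a CSV-aware reader treats it as a single column, so no extra columns appear.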

Answered on 2023-03-15
