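I have the following Pandas DataFrame. I want to replace some characters and extract substrings (the original DataFrame contains more rows).
I am using the regex below, but it fails to remove the '?' from some rows, such as rows 6, 7, and 8.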
df[['label','id']] = df['name'].str.extract(r'\{?\??\|?[\[{]?(.*?)[\]}]?(?:,\s+(\d{3,100}))?\s+\(\d+\)')
You-Hoover-Fong syndrome, 616954 (3)
Yuan-Harel-Lupski syndrome (4)
Zaki syndrome, 619648 (3)
Zimmermann-Laband syndrome 2, 616455 (3)
Zimmermann-Laband syndrome 3, 618658 (3)
[?Birbeck granule deficiency], 613393 (3)
[?Homosexuality, male] (2)
[?Phosphohydroxylysinuria], 615011 (3)
[Acetylation, slow], 243400 (3)
Expected output:
You-Hoover-Fong syndrome 616954
Yuan-Harel-Lupski syndrome
Zaki syndrome 619648
Zimmermann-Laband syndrome 2 616455
Zimmermann-Laband syndrome 3 618658
Birbeck granule deficiency 613393
Homosexuality, male
Phosphohydroxylysinuria 615011
Acetylation, slow 243400
How can I modify the current regex so that the '?' is also removed from the rows above?
1 Answer
Try:
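One possible pattern is sketched below. It assumes the column is named `name`, as in the question, and it rebuilds the sample rows by hand. The leading `[`, `{` and `?` are consumed by a single character class placed before the capture group, so they can appear in any order and never end up in `label`. The variable name `pattern` is only illustrative.

```python
import pandas as pd

# Sample rows copied from the question; the column name "name" is assumed.
df = pd.DataFrame({"name": [
    "You-Hoover-Fong syndrome, 616954 (3)",
    "Yuan-Harel-Lupski syndrome (4)",
    "Zaki syndrome, 619648 (3)",
    "Zimmermann-Laband syndrome 2, 616455 (3)",
    "Zimmermann-Laband syndrome 3, 618658 (3)",
    "[?Birbeck granule deficiency], 613393 (3)",
    "[?Homosexuality, male] (2)",
    "[?Phosphohydroxylysinuria], 615011 (3)",
    "[Acetylation, slow], 243400 (3)",
]})

pattern = (
    r"^[\[{?]*"               # skip any leading '[', '{' or '?', in any order
    r"(.*?)"                  # group 1: the label (lazy)
    r"[\]}]*"                 # skip closing ']' or '}'
    r"(?:,\s+(\d{3,100}))?"   # group 2: optional id of three or more digits
    r"\s+\(\d+\)$"            # trailing "(n)", which is not captured
)

# Two capture groups give two columns; rows without an id get NaN.
df[["label", "id"]] = df["name"].str.extract(pattern)
print(df[["label", "id"]])
```

The question's pattern made `?` optional before the opening bracket, so in `[?Birbeck ...` the `?` was left inside the captured label. Putting `?` in the same class as the brackets avoids depending on their order. Note that `id` is extracted as strings, not integers.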
Prints:
Initial DataFrame: