regex: How can I use re.sub() in Python to replace multiple leading and trailing spaces/tabs inside double quotes? [closed]

yduiuuwa, posted on 2023-06-07 in Python

Closed. This question needs details or clarity. It is not currently accepting answers.
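How can I use a regular expression with re.sub() to remove multiple leading and trailing spaces/tabs inside double quotes?
The regex should be applied only to the first line of the file.
Input: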

" Column 1  ","Column 2  ","    Column 3 ","        Column 4        "
" Record 11  ","    Record 12  ","  Record 13       ","         Record 14 "
" Record 21  ","    Record 22  ","Record 23     ","         Record 24 "
" Record 31  ","    Record 32  ","  Record 33","            Record 34"
" Record 41  ","  Record 42  "," Record 43      ","     Record 44   "

Expected output:

"Column 1","Column 2","Column 3","Column 4"
" Record 11  ","    Record 12  ","  Record 13       ","         Record 14 "
" Record 21  ","    Record 22  ","Record 23     ","         Record 24 "
" Record 31  ","    Record 32  ","  Record 33","            Record 34"
" Record 41  ","  Record 42  "," Record 43      ","     Record 44   "

I am using the following regex, but it fails to capture single spaces:

[^\n(\w)\"]\s+\"|\"\s+[^\n(\w)\"]

Note: the number of columns and rows will vary.


yrwegjxp1#

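To replace multiple leading and trailing spaces/tabs inside double quotes with a regular expression in Python, you can use the re module. For example: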

import re

# Input string: the sample from the question, one record per line
input_string = '''" Column 1  ","Column 2  ","    Column 3 ","        Column 4        "
" Record 11  ","    Record 12  ","  Record 13       ","         Record 14 "
" Record 21  ","    Record 22  ","Record 23     ","         Record 24 "
" Record 31  ","    Record 32  ","  Record 33","            Record 34"
" Record 41  ","  Record 42  "," Record 43      ","     Record 44   "'''

# Regex pattern to match double-quoted substrings
pattern = r'"([^"]*)"'

# Function to strip leading/trailing spaces and tabs within a matched substring
def strip_spaces(match):
    # Get the matched substring without quotes
    substring = match.group(1)
    # Remove spaces/tabs at both ends; inner spaces are kept
    stripped_substring = substring.strip(' \t')
    # Reconstruct the quoted substring
    return f'"{stripped_substring}"'

# Apply the substitution to the first line only, as the question requires
first_line, newline, rest = input_string.partition('\n')
output_string = re.sub(pattern, strip_spaces, first_line) + newline + rest

print(output_string)
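As a variant without a callback function, the trimming can also be expressed in a single pattern with a backreference. This is only a sketch: it assumes the header is the first line of the text and that no field contains an escaped double quote. The names `strip_header_fields` and `sample` are made up for illustration.

```python
import re

# Hypothetical helper name; not taken from the question or the answers.
def strip_header_fields(text: str) -> str:
    """Strip spaces/tabs just inside the quotes, on the first line only."""
    header, newline, rest = text.partition("\n")
    # Opening quote, optional padding, lazily captured content,
    # optional padding, closing quote.
    header = re.sub(r'"[ \t]*([^"]*?)[ \t]*"', r'"\1"', header)
    return header + newline + rest

sample = (
    '" Column 1  ","Column 2  ","    Column 3 ","        Column 4        "\n'
    '" Record 11  ","    Record 12  ","  Record 13       ","         Record 14 "\n'
)
print(strip_header_fields(sample))
```

The lazy group `([^"]*?)` stops as early as possible, so the trailing `[ \t]*` absorbs all padding before the closing quote, including a single space.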

1cosmwyk2#

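Do not use regular expressions to parse structured data such as CSV, where a double quote inside a quoted field can be escaped by doubling it. That makes simple regex patterns prone to failure and robust ones needlessly complex.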
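A small illustration of that failure mode, using a made-up row (`line`) whose first field contains a doubled quote. The regex is the simple quoted-substring pattern; this is a sketch of the problem, not a full comparison of the two approaches.

```python
import csv
import re
from io import StringIO

# Made-up sample row: the first field contains an escaped (doubled) quote.
line = '"  say ""hi""  ","  plain  "'

# Naive approach: treat everything between two quotes as one field.
naive = re.findall(r'"([^"]*)"', line)

# csv.reader applies the doubling rule and yields the two real fields.
parsed = next(csv.reader(StringIO(line)))

print(naive)                                 # four fragments instead of two fields
print([field.strip() for field in parsed])   # two cleanly stripped fields
```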
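Instead, use csv.reader to read the CSV properly as sequences of columns, map each column to str.strip to remove leading and trailing whitespace, and use csv.writer with the quoting=csv.QUOTE_ALL option to produce output in which every column is enclosed in double quotes: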

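import csv
from io import StringIO

# Collect the rewritten CSV in memory
output = StringIO()
# QUOTE_ALL wraps every field in double quotes
writer = csv.writer(output, quoting=csv.QUOTE_ALL)
# newline='' is what the csv module recommends for files passed to csv.reader
with open('input.csv', newline='') as file:
    # Strip leading/trailing whitespace from every field of every row
    writer.writerows(map(str.strip, row) for row in csv.reader(file))
print(output.getvalue())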

Given the sample input, the code above outputs:

"Column 1","Column 2","Column 3","Column 4"
"Record 11","Record 12","Record 13","Record 14"
"Record 21","Record 22","Record 23","Record 24"
"Record 31","Record 32","Record 33","Record 34"
"Record 41","Record 42","Record 43","Record 44"

Demo: https://replit.com/@blhsing/QuestionableAfraidPublisher
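The question asks for only the first line to be cleaned, while the csv-based code strips every row. A minimal sketch of a header-only variant follows. It assumes the input is already in memory as a string; `strip_header_only` and `sample` are hypothetical names, and `lineterminator="\n"` is chosen here only to keep the output easy to compare.

```python
import csv
from io import StringIO

# Hypothetical helper; the name and the in-memory input are for illustration only.
def strip_header_only(text: str) -> str:
    rows = csv.reader(StringIO(text, newline=""))
    output = StringIO()
    writer = csv.writer(output, quoting=csv.QUOTE_ALL, lineterminator="\n")
    for index, row in enumerate(rows):
        if index == 0:
            # Only the header row is stripped; data rows pass through unchanged.
            row = [field.strip() for field in row]
        writer.writerow(row)
    return output.getvalue()

sample = (
    '" Column 1  ","Column 2  ","    Column 3 ","        Column 4        "\n'
    '" Record 11  ","    Record 12  ","  Record 13       ","         Record 14 "\n'
)
print(strip_header_only(sample))
```

Because the fields go through csv.reader and csv.writer, a header containing doubled quotes would be handled correctly as well, which the regex approaches do not guarantee.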
