How do I replace multiple leading and trailing spaces/tabs within double quotes using the re.sub() method in Python? [closed]

Posted by yduiuuwa on 2023-06-07 in Python


Closed. This question needs details or clarity. It is not currently accepting answers.
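How can I use a regular expression with the re.sub() method to remove multiple leading and trailing spaces/tabs inside double quotes?
The regex should be applied only to the first line of the file.
Input: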

" Column 1  ","Column 2  ","    Column 3 ","        Column 4        "
" Record 11  ","    Record 12  ","  Record 13       ","         Record 14 "
" Record 21  ","    Record 22  ","Record 23     ","         Record 24 "
" Record 31  ","    Record 32  ","  Record 33","            Record 34"
" Record 41  ","  Record 42  "," Record 43      ","     Record 44   "

Expected output:

"Column 1","Column 2","Column 3","Column 4"
" Record 11  ","    Record 12  ","  Record 13       ","         Record 14 "
" Record 21  ","    Record 22  ","Record 23     ","         Record 24 "
" Record 31  ","    Record 32  ","  Record 33","            Record 34"
" Record 41  ","  Record 42  "," Record 43      ","     Record 44   "

I am using the following regex, but it fails to capture a single space:

[^\n(\w)\"]\s+\"|\"\s+[^\n(\w)\"]

Note: the number of columns and rows will vary.
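
One way to see why the pattern above misses single spaces: each alternative needs one character from the negated class plus at least one more whitespace character from `\s+`, so it can only match runs of two or more spaces next to a quote. The short check below is only a sketch of that reasoning; the names `asker_pattern` and `header` are introduced here for illustration and are not part of the question.

```python
import re

# The pattern from the question, unchanged.
asker_pattern = r'[^\n(\w)\"]\s+\"|\"\s+[^\n(\w)\"]'

# The first line of the sample input.
header = '" Column 1  ","Column 2  ","    Column 3 ","        Column 4        "'

# Runs of two or more spaces next to a quote are found...
matches = re.findall(asker_pattern, header)
print(matches)

# ...but a single space next to a quote is not, because the negated class
# and \s+ each have to consume at least one character.
print(re.search(asker_pattern, '"x "'))   # single trailing space
print(re.search(asker_pattern, '" x"'))   # single leading space
```

Both single-space checks find no match, which is consistent with the behavior described in the question.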


Source: https://stackoverflow.com/questions/76404117/how-do-i-replace-multiple-leading-and-trailing-spaces-tab-within-double-quotes-u

2 Answers


yrwegjxp1#

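To replace multiple leading and trailing spaces/tabs inside double quotes with a regular expression in Python, you can use the re module. For example: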

import re

# Input string: a header line followed by data lines
input_string = '''" Column 1  ","Column 2  ","    Column 3 ","        Column 4        "
" Record 11  ","    Record 12  ","  Record 13       ","         Record 14 "
" Record 21  ","    Record 22  ","Record 23     ","         Record 24 "
" Record 31  ","    Record 32  ","  Record 33","            Record 34"
" Record 41  ","  Record 42  "," Record 43      ","     Record 44   "'''

# Regex pattern to match double-quoted substrings
# (assumes no escaped quotes inside a field)
pattern = r'"([^"]*)"'

# Function to strip spaces/tabs from both ends of a matched substring
def replace_spaces(match):
    # Get the matched substring without quotes
    substring = match.group(1)
    # Remove leading and trailing spaces/tabs
    stripped_substring = substring.strip(' \t')
    # Reconstruct the substring with its quotes
    return f'"{stripped_substring}"'

# Apply the replacement to the first line only; keep the other lines unchanged
first_line, newline, rest = input_string.partition('\n')
output_string = re.sub(pattern, replace_spaces, first_line) + newline + rest

print(output_string)
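
If a callback feels heavy, the trimming can probably be folded into the pattern itself with a capture group and a backreference. The sketch below assumes fields never contain escaped quotes and that only spaces and tabs need trimming. `FIELD`, `strip_header_fields` and `sample` are hypothetical names chosen for this example, not part of the answer above.

```python
import re

# Optional spaces/tabs, a lazily captured body, optional spaces/tabs,
# all between a pair of double quotes. Assumes no escaped quotes in fields.
FIELD = re.compile(r'"[ \t]*([^"]*?)[ \t]*"')

def strip_header_fields(text):
    """Trim spaces/tabs inside each quoted field of the first line only."""
    first_line, newline, rest = text.partition("\n")
    # Rewrite only the first line; the remaining lines are passed through.
    return FIELD.sub(r'"\1"', first_line) + newline + rest

sample = (
    '" Column 1  ","Column 2  ","    Column 3 ","        Column 4        "\n'
    '" Record 11  ","    Record 12  ","  Record 13       ","         Record 14 "\n'
)
print(strip_header_fields(sample))
```

Splitting with `str.partition` keeps the "first line only" rule out of the regex, which seems simpler than trying to encode line position in the pattern.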

2023-06-07

1cosmwyk2#

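Don't use regex to parse structured data such as CSV, where double quotes inside a quoted field can be escaped by doubling them. That makes simple regex patterns prone to failure and robust ones needlessly complex.
Instead, use csv.reader to read the CSV properly as sequences of columns, map the columns to the str.strip method to remove leading and trailing whitespace, and use csv.writer with the quoting=csv.QUOTE_ALL option to produce output in which every column is enclosed in double quotes: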

import csv
from io import StringIO

output = StringIO()
writer = csv.writer(output, quoting=csv.QUOTE_ALL)
with open('input.csv', newline='') as file:
    writer.writerows(map(str.strip, row) for row in csv.reader(file))
print(output.getvalue())

Given the sample input, the code above strips the fields of every row, not only the first line, and outputs:

"Column 1","Column 2","Column 3","Column 4"
"Record 11","Record 12","Record 13","Record 14"
"Record 21","Record 22","Record 23","Record 24"
"Record 31","Record 32","Record 33","Record 34"
"Record 41","Record 42","Record 43","Record 44"

Demo: https://replit.com/@blhsing/QuestionableAfraidPublisher
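
Since the question asks for only the first line to be cleaned, the same csv-based approach could be limited to the header row. This is a minimal sketch under that reading of the question; `strip_header_only` and `sample` are hypothetical names, and the input is taken from a string rather than a file to keep the example self-contained.

```python
import csv
from io import StringIO

def strip_header_only(csv_text):
    """Strip whitespace from the fields of the first row; keep other rows as-is."""
    reader = csv.reader(StringIO(csv_text))
    output = StringIO()
    # lineterminator="\n" is chosen here so the result is easy to compare;
    # csv.writer defaults to "\r\n".
    writer = csv.writer(output, quoting=csv.QUOTE_ALL, lineterminator="\n")
    for index, row in enumerate(reader):
        if index == 0:
            # Only the header row is stripped.
            row = [field.strip() for field in row]
        writer.writerow(row)
    return output.getvalue()

sample = (
    '" Column 1  ","Column 2  "\n'
    '" Record 11  ","    Record 12  "\n'
)
print(strip_header_only(sample))
```

Because the csv module handles the quoting, this variant should also cope with doubled quotes inside fields, which the regex-based approaches do not attempt.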

2023-06-07
