pandas ParserError: '' expected after '"' in Python pandas/Dask

Posted by vcudknz3 on 2022-11-27 in Python


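Hi, I am working with a 3 GB txt file and want to convert it to CSV, but it gives an error (error_bad_lines): ParserError: '' expected after '"'
The code I am using: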

df1 = df.read_csv("path\\logs.txt", delimiter = "\t", encoding = 'cp437',engine="python")
df1.to_csv("C:\\Data\\log1.csv",quotechar='"',error_bad_lines=False, header=None, on_bad_lines='skip')

pandas

Source: https://stackoverflow.com/questions/74580371/parsererror-expected-after-in-python-pandas-dask

2 Answers


6ojccjat1#

Add on_bad_lines='warn' to read_csv. It looks like the file contains some malformed lines.
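To illustrate the suggestion, here is a minimal sketch on a made-up three-line sample (the `sample` string is hypothetical, not the asker's data). It assumes pandas 1.3 or later, where `on_bad_lines` was introduced as the replacement for `error_bad_lines`. Note that `on_bad_lines` is a `read_csv` parameter; `to_csv` does not accept it.

```python
import io

import pandas as pd

# Hypothetical tab-separated sample; the second line has a stray quote
sample = (
    '"a"\t"b"\n'
    '"c"\t"d" e "f"\n'
    '"g"\t"h"\n'
)

df = pd.read_csv(
    io.StringIO(sample),   # stand-in for the path to logs.txt
    delimiter="\t",
    engine="python",
    header=None,
    on_bad_lines="warn",   # assumption: pandas >= 1.3
)

print(df)
```

With 'warn', the malformed row should be reported and skipped instead of aborting the whole read; 'skip' drops such rows silently.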

Answered 2022-11-27

qxsslcnc2#

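The code below looks for unwanted quotation marks (' and ") inside each record, that is, inside each tab-separated field, and replaces them with nothing. It then replaces the tabs (\t) with commas (,).
The script uses a regex to find the unwanted quotation marks.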

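import re

# Match any quote that is not the first character of a field
# and is not a closing double quote at the end of the field
pattern = re.compile(r"(?!^|\"$)[\"\']")

# Iterate over the file lazily so the 3 GB input is never loaded at once.
# cp437 is the encoding used in the question; "w" (not "a") avoids
# appending duplicate rows when the script is run again.
with open("path\\logs.txt", "r", encoding="cp437") as src, \
     open("C:\\Data\\log1.csv", "w", encoding="cp437") as dst:
    for line in src:
        # Remove the unwanted quotation marks from each field
        fields = [pattern.sub("", field) for field in line.split("\t")]

        # Join the fields with commas and write the line to the new file
        dst.write(",".join(fields))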
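One case the script above does not cover is a field that contains a comma but is not wrapped in double quotes: after the tabs become commas, that field would be split in two. As a possible variation (not the answerer's method), the sketch below lets the standard `csv` module handle the quoting. It reads with `quoting=csv.QUOTE_NONE`, so quotes are treated as ordinary characters, strips every quote character (a simplification; the script above keeps the outer double quotes), and writes with `csv.writer`, which quotes fields only where needed. The function name `convert` and the sample text are made up for illustration.

```python
import csv
import io
import re

QUOTES = re.compile(r"[\"']")


def convert(src, dst):
    """Copy tab-separated lines from src to dst as comma-separated CSV."""
    # QUOTE_NONE: quotes are plain characters, so stray ones cannot
    # cause a parse error while reading
    reader = csv.reader(src, delimiter="\t", quoting=csv.QUOTE_NONE)
    writer = csv.writer(dst, lineterminator="\n")
    for row in reader:
        # Drop all quote characters; the writer re-quotes where required
        writer.writerow([QUOTES.sub("", field) for field in row])


# Hypothetical sample with a comma inside the second field
src = io.StringIO('"The"\t"quick, brown"" fox "jumps"\t"lazy dog"\n')
dst = io.StringIO()
convert(src, dst)
result = dst.getvalue()
print(result)
```

For real files, `src` and `dst` would be file objects opened with `newline=""`, as the `csv` module documentation recommends.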

The problem occurs because a record contains an unwanted quotation mark. For example:

"The"\t"quick brown"" fox "jumps over the"\t"lazy dog"
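The failure on this sample line can be reproduced without pandas. The sketch below feeds the line to the standard library's `csv.reader` in strict mode, on the assumption that this is close to what pandas' python engine does internally, and then checks that the answer's regex turns it into a line that parses cleanly. The helper name `parse_strict` is made up for illustration.

```python
import csv
import re

# The sample record from the answer, with real tab characters
line = '"The"\t"quick brown"" fox "jumps over the"\t"lazy dog"'


def parse_strict(text, delimiter):
    """Parse a single line with the stdlib csv reader in strict mode."""
    return next(csv.reader([text], delimiter=delimiter, strict=True))


# 1. The raw line fails: the quote after ' fox ' looks like a closing
#    quote, so the reader expects a tab next but finds the letter 'j'
try:
    parse_strict(line, "\t")
    error = None
except csv.Error as exc:
    error = str(exc)

# 2. Apply the answer's regex field by field, then join with commas
pattern = re.compile(r"(?!^|\"$)[\"\']")
cleaned = ",".join(pattern.sub("", field) for field in line.split("\t"))

# 3. The cleaned line now parses into three fields
fields = parse_strict(cleaned, ",")

print(error)
print(cleaned)
print(fields)
```

The doubled quote after "quick brown" is accepted as an escaped quote; it is the single quote before "jumps" that triggers the error.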

Answered 2022-11-27
