Fixing the format of a CSV file

brc7rcf0  posted on 2023-06-03  in  Other

I'm opening a csv file to parse it, but some of its rows are not formatted correctly. The csv normally looks like this:

'ipAddress','associatedTix''\n'
'ipAddress','associatedTix''\n'
'ipAddress','associatedTix''\n'
'ipAddress','associatedTix''\n'
'ipAddress','associatedTix''\n'

But at some points in the csv, where a single ipAddress has several associatedTix, the rows look like this instead:

'ipAddress','associatedTix''\n'
'ipAddress','associatedTix''\n'
'ipAddress','associatedTix''\n'
'associatedTix','associatedTix''\n'
'associatedTix''\n'
'ipAddress','associatedTix''\n'
'ipAddress','associatedTix''\n'

So this is what I'm trying to do to get a correctly formatted csv:

for line in inputCsvFile:
    chunks = line.split(",")
    if associatedTix in chunks[0]:
        # go through the following lines after that line until you find an ip address
        # go one line above the line with the ip address
        # push that column to the above row, and repeat until you get to the original line 3 row with the ip address

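Those 3 comment lines are the part I can't work out how to write in code, so any help with the syntax would be greatly appreciated. I'd also appreciate confirmation that my logic will actually produce a correctly formatted csv.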

oxf4rvwz  #1

The csv module handles fields that contain newlines correctly, as long as they are quoted:

$ cat t.csv
136.107.169.150,
165.246.197.229,"ESCCB ID#: 90Z-009204,
ESCCB ID#: 90Z-003262,
ESCCB ID#: 90Z-003011                   ESCCB ID#: 90Z-001047"
155.89.77.11,
91.195.188.160,
154.176.191.130,

...

>>> import csv
>>> with open('t.csv', newline='') as fp:
...   read = csv.reader(fp)
...   for line in read:
...     print(line)
... 
['136.107.169.150', '']
['165.246.197.229', 'ESCCB ID#: 90Z-009204,\nESCCB ID#: 90Z-003262,\nESCCB ID#: 90Z-003011                   ESCCB ID#: 90Z-001047']
['155.89.77.11', '']
['91.195.188.160', '']
['154.176.191.130', '']

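So the problem you think you have is not actually a problem. All you need to do is post-process the second field and then write the rows back out.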
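As a rough sketch of that post-processing step, assuming Python 3: read each row with csv.reader, collapse the newlines, commas and runs of spaces inside the second field into single spaces, and write the row back with csv.writer. The names `clean_tix` and `fix_csv` are made up for this example and are not part of the csv module.

```python
import csv
import re


def clean_tix(field):
    """Collapse newlines, commas and repeated spaces into single spaces."""
    return re.sub(r"[\s,]+", " ", field).strip()


def fix_csv(src, dst):
    """Copy rows from the open file src to the open file dst,
    cleaning the second column (the associated tickets) on the way."""
    writer = csv.writer(dst)
    for row in csv.reader(src):
        if len(row) > 1:
            row[1] = clean_tix(row[1])
        writer.writerow(row)
```

You would call it roughly like `with open('t.csv', newline='') as src, open('fixed.csv', 'w', newline='') as dst: fix_csv(src, dst)`, where `fixed.csv` is just a placeholder output name. Passing `newline=''` lets the csv module deal with the line endings itself.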

wljmcqd8  #2

As Ignacio said, you won't have any problems if you use the csv module. If you'd rather not use it, try this:

with open("inCSV.txt", "r") as f:
    # Buffer for a quoted field that spans several lines
    b = ""
    keep_reading = False
    # Iterate over the file line by line instead of reading it all at once
    for line in f:
        line = line.rstrip("\n")
        if "\"" in line:
            # A bunch of tixs are going to appear!
            if b == "":
                # Opening quote: there are more tixs to read
                b += line
                # More tixs to come
                keep_reading = True
            else:
                # Closing quote: this is the last tix to read
                b += line.replace(",", "")
                # Remove spaces, commas, newlines and quotes
                b = b.translate(str.maketrans("", "", " ,\n\""))
                # Add nice looking whitespace
                b = b.replace("E", " E")
                b = b.replace(":", ": ")
                b = b.replace("I", " I")
                b = b.strip()
                # Add comma after IP address
                ip_index = b.find(" ")
                b = b.replace(b[:ip_index + 1], b[:ip_index] + ",")
                # No more tixs to read
                keep_reading = False

                print(b)
                # reset buffer
                b = ""
        elif keep_reading:
            b += line
        else:
            print(line)

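The advantage of this approach, as martineau pointed out, is that you don't need to hold the whole file in memory.
If you do use the csv module, you still need to clean up the second field yourself: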

import csv
with open("inCSV.txt", "r", newline="") as f:
    text = csv.reader(f)
    for line in text:
        # Get associated tix
        tix = line[1]
        # Remove newlines, extra whitespace and commas
        tix = tix.translate(str.maketrans("", "", " ,\r\n"))
        # Add nice looking whitespace
        tix = tix.replace("E", " E")
        tix = tix.replace(":", ": ")
        tix = tix.replace("I", " I")
        tix = tix.strip()

        line[1] = tix
        print(line)

Both will give you the following (shown here in the list form printed by the csv version):

['248.53.88.234-24', '']
['61.15.168.199-24', '']
['181.140.27.200', '']
['192.128.254.150', '']
['8.160.137.156', 'ESCCB ID#: 90Z-007463']
['136.107.169.150', '']
['165.246.197.229', 'ESCCB ID#: 90Z-009204 ESCCB ID#: 90Z-003262 ESCCB ID#: 90Z-003011 ESCCB ID#: 90Z-001047']
['155.89.77.11', '']
['91.195.188.160', '']
['154.176.191.130', '']
['105.98.164.205', '']
['245.6.16.92', '']
['207.108.19.66', 'ESCCB ID#: 90Z-002345']
['84.71.75.211', 'ESCCB ID#: 90Z-008567 ESCCB ID#: 90Z-006765 ESCCB ID#: 90Z-009384ESCCB ID#: 90Z-001234ESCCB ID#: 90Z-007465']
['33.236.5.19', '']
['127.42.160.158', 'ESCCB ID#: 90Z-002939']
['94.34.104.184', '']
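
Note that in the row for 84.71.75.211 above, some ticket IDs run together without a space. If the IDs always look like `ESCCB ID#: 90Z-` followed by digits (an assumption based only on the sample data), a regular expression could pull them out one by one instead of relying on character replacement. This is only a sketch, and `TIX_RE` and `split_tix` are names invented for it:

```python
import re

# Assumed ticket shape, inferred from the sample rows only
TIX_RE = re.compile(r"ESCCB\s*ID#:\s*(\w+-\d+)")


def split_tix(field):
    """Return each ticket found in the field, normalised to 'ESCCB ID#: <id>'."""
    return ["ESCCB ID#: " + tix_id for tix_id in TIX_RE.findall(field)]
```

Joining the result with `" ".join(split_tix(line[1]))` would give one evenly spaced string per row, and an empty field simply yields an empty list.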
