python-3.x: How can I take a text file and roll every 5 words (or blanks) into one line?

kuhbmx9i posted on 2023-05-02 in Python

I have a large text file that looks similar to this:

Model
            
Series
            
Type
            
MY
            
ProdCode
          
          
            
              
C 600 LEATHER X
              
K96
              
              
2065
              
0132
            
          
            
              
C 600 SUEDE US
              
K64
              
              
2085
              
0132
            
          
            
              
C 650 FAKE US
              
K91
              
              
2055
              
0134

52I Swift White
              
F56
              
X20
              
2235
              
3C93

How can I convert it into a CSV file that looks like this?

Model,Series,Type,MY,ProdCode
C 600 LEATHER X,K96, ,2065,0132
C 600 SUEDE US,K64, ,2085,0132
C 650 FAKE US,K91, ,2055,0134
52I Swift White,F56,X20,2235,3C93

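I'm struggling to deal with the blank fields. The code below leaves the fields offset by one relative to the headers and throws entire rows out of alignment.
I tried the code below and can't seem to find a solution. Any help in fixing the alignment correctly is appreciated.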

with open('output.txt') as file:
    lines = [line.strip() for line in file]

item_count = 0
current_line = ""

for item in lines:
    if item != "":
        current_line = current_line + ", " + item
        item_count = item_count + 1

        if item_count == 5:
            item_count = 0
            print(current_line)
            current_line = ""

ql3eal8s1#

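I don't know your data format, but from what I can see, each group has five fields, separated by blank lines.
Within each group, every other line is a data line, so a group has at most 9 lines. If a field is empty, its line is simply absent, which is why you get consecutive blank lines.
The code below should work: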

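from pathlib import Path

# Read the file and strip surrounding whitespace from every line.
lines = [line.strip() for line in Path('output.txt').read_text().splitlines()]

n = len(lines)
i = 0
data = []

while i < n:
    # Skip the blank lines that separate groups.
    if not lines[i]:
        i += 1
        continue
    # A group spans at most 9 lines: 5 fields plus 4 blank separators.
    d = min(9, n - i)
    group = lines[i:i + d]
    entry = []
    while group:
        field = group.pop(0)
        entry.append(field)
        # A non-empty field is followed by a blank separator line; drop it.
        if field and group:
            group.pop(0)
    data.append(entry)
    i += d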
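
To illustrate why skipping every blank line, as the code in the question does, cannot keep the columns aligned: an empty field leaves no line of its own, only an extra blank, so once the blanks are filtered out the next record's values slide into the gap. The sketch below reproduces that on two records from the sample. `naive_rows` is a name made up for this illustration, and the chunking is a simplified stand-in for the question's counter loop, not the exact code.

```python
# Two records from the sample. The first has an empty Type field:
# the two consecutive blank lines after "K96" are its only trace.
lines = [
    "C 600 LEATHER X", "", "K96", "", "", "2065", "", "0132", "",
    "52I Swift White", "", "F56", "", "X20", "", "2235", "", "3C93",
]

# Drop every blank line, then cut the remainder into chunks of five.
non_blank = [line for line in lines if line]
naive_rows = [non_blank[i:i + 5] for i in range(0, len(non_blank), 5)]

for row in naive_rows:
    print(row)
```

The first row ends with 'C 600 LEATHER X', 'K96', '2065', '0132' followed by the next record's model name, and every later row is shifted as well. Separately, prepending ", " to every item appears to give each printed line a leading comma, which may account for the one-column offset against the headers.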

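Update

I fixed a small bug that caused IndexError: pop from empty list. It now works as expected.
I couldn't test the code properly when I wrote the answer, and I didn't get a chance to test it until now.
The output is as follows: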

[['Model', 'Series', 'Type', 'MY', 'ProdCode'],
 ['C 600 LEATHER X', 'K96', '', '2065', '0132'],
 ['C 600 SUEDE US', 'K64', '', '2085', '0132'],
 ['C 650 FAKE US', 'K91', '', '2055', '0134'],
 ['52I Swift White', 'F56', 'X20', '2235', '3C93']]
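
For comparison, the same grouping can be sketched as a single pass that does not depend on a fixed 9-line window. This is only a sketch under assumptions that are not confirmed by the question: exactly one blank line follows every non-empty field, any further blank inside a row stands for one empty field, and the first field of a row is never empty. `group_fields` and its `width` parameter are hypothetical names.

```python
def group_fields(lines, width=5):
    """Group lines into rows of `width` fields (see assumptions above)."""
    rows, row = [], []
    expect_separator = False
    for raw in lines:
        line = raw.strip()
        if line:
            row.append(line)
            expect_separator = True
        elif not row:
            continue  # blank lines between rows
        elif expect_separator:
            expect_separator = False  # the blank that follows a field
        else:
            row.append("")  # an extra blank marks an empty field
        if len(row) == width:
            rows.append(row)
            row = []
            expect_separator = False
    return rows


# A shortened copy of the sample, already split into lines.
sample = [
    "Model", "", "Series", "", "Type", "", "MY", "", "ProdCode",
    "", "", "", "",
    "C 600 LEATHER X", "", "K96", "", "", "2065", "", "0132",
    "", "", "", "",
    "52I Swift White", "", "F56", "", "X20", "", "2235", "", "3C93",
]
rows = group_fields(sample)
for row in rows:
    print(row)
```

A trailing row with fewer than `width` fields is silently dropped in this sketch, so it is worth checking the row count against what you expect from the file.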

If you absolutely want CSV, which I strongly advise against, you can do this:

'\n'.join(', '.join(f'"{s}"' for s in e) for e in data)

Output:

In [10]: print('\n'.join(', '.join(f'"{s}"' for s in e) for e in data))
"Model", "Series", "Type", "MY", "ProdCode"
"C 600 LEATHER X", "K96", "", "2065", "0132"
"C 600 SUEDE US", "K64", "", "2085", "0132"
"C 650 FAKE US", "K91", "", "2055", "0134"
"52I Swift White", "F56", "X20", "2235", "3C93"
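
If CSV output is needed anyway, Python's standard `csv` module can handle the delimiters and quoting. A minimal sketch, assuming `data` has the list-of-lists shape shown earlier; the literal below repeats part of it so the snippet runs on its own, and `rows_to_csv` is a hypothetical helper name.

```python
import csv
import io

# Part of the parsed result, repeated here so the example is self-contained.
data = [
    ["Model", "Series", "Type", "MY", "ProdCode"],
    ["C 600 LEATHER X", "K96", "", "2065", "0132"],
    ["52I Swift White", "F56", "X20", "2235", "3C93"],
]


def rows_to_csv(rows):
    """Serialize a list of rows to CSV text using the csv module."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = rows_to_csv(data)
print(csv_text)
```

To write a file instead, pass a file object opened with `newline=''` to `csv.writer`, as the module documentation recommends. Note that the hand-built version above puts a space after each comma, which some CSV readers treat as part of the field.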

But I guess you actually want something like this:

from collections import namedtuple

car = namedtuple('car', data[0])
cars = [car(*i) for i in data[1:]]

Output:

In [16]: cars
Out[16]:
[car(Model='C 600 LEATHER X', Series='K96', Type='', MY='2065', ProdCode='0132'),
 car(Model='C 600 SUEDE US', Series='K64', Type='', MY='2085', ProdCode='0132'),
 car(Model='C 650 FAKE US', Series='K91', Type='', MY='2055', ProdCode='0134'),
 car(Model='52I Swift White', Series='F56', Type='X20', MY='2235', ProdCode='3C93')]

But really, it is best to do this:

import pandas as pd

cars_df = pd.DataFrame(data[1:], columns=data[0])

Output:

In [21]: print(cars_df)
             Model Series Type    MY ProdCode
0  C 600 LEATHER X    K96       2065     0132
1   C 600 SUEDE US    K64       2085     0132
2    C 650 FAKE US    K91       2055     0134
3  52I Swift White    F56  X20  2235     3C93
