Pandas does not separate the columns of an imported CSV file

a0x5cqrl  posted on 2023-04-18  in  Other

I want to import a CSV file as a DataFrame using pandas. The file's structure is shown in this screenshot: (https://i.stack.imgur.com/N91d7.png). However, for some reason, using

df = pd.read_csv("Test.csv", delimiter = ',')

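does not work. The resulting DataFrame contains everything in a single column.
How can I separate the columns correctly? Thanks in advance.
I have tried various read_csv options, but haven't found a solution yet.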

zhte4eai1#

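I'm not sure how you ended up with a CSV file whose commas are ambiguous (the values are unquoted and contain commas, and the delimiter is also a comma), but the good news is…
Assuming the whole file follows the format in your question (in particular, that every other column is empty), this should do what you want: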

import pandas as pd
from io import StringIO

# Build a sample that mimics the file: a header row, then rows whose values contain unquoted commas.
states = ['California', 'Arkansas', 'Arizona', 'Alaska', 'Alabama', 'United States']
text = 'Fact,Fact Note,' + ','.join(f'{x},Value Note for {x}' for x in states)
text += '''
Population Estimates, July 1, 2022, (V2022),,39,029,342,,3,045,637,,7,359,197,,733,583,,5,074,296,,333,287,557,
Population Estimates, July 1, 2021, (V2021),,39,142,991,,3,028,122,,7,264,877,,734,182,,5,049,846,,332,031,554,'''
print(text, '\n')

rows = []
with StringIO(text) as f:
    for i, line in enumerate(f):
        line = line.rstrip('\n')  # drop only the newline, never a data character
        if i == 0:
            columns = line.split(',')  # the header has no embedded commas
            print(columns)
        else:
            # Every other column is empty, so ',,' marks the real field boundaries.
            vals = line.split(',,')
            row = [vals[0], '']  # Fact, followed by its empty Fact Note
            for val in vals[1:]:
                row += [int(val.replace(',', '')), '']  # value, followed by its empty Value Note
            rows.append(row)
print(rows)
df = pd.DataFrame(rows, columns=columns)
print('', '', df, sep='\n')

Sample input:

Fact,Fact Note,California,Value Note for California,Arkansas,Value Note for Arkansas,Arizona,Value Note for Arizona,Alaska,Value Note for Alaska,Alabama,Value Note for Alabama,United States,Value Note for United States
Population Estimates, July 1, 2022, (V2022),,39,029,342,,3,045,637,,7,359,197,,733,583,,5,074,296,,333,287,557,
Population Estimates, July 1, 2021, (V2021),,39,142,991,,3,028,122,,7,264,877,,734,182,,5,049,846,,332,031,554

Output:

Fact Fact Note  California Value Note for California  ...  Alabama Value Note for Alabama  United States Value Note for United States
0  Population Estimates, July 1, 2022, (V2022)              39029342                            ...  5074296                             333287557
1  Population Estimates, July 1, 2021, (V2021)              39142991                            ...  5049846                             332031554

[2 rows x 14 columns]

Transposed output (easier to read):

0                                            1
Fact                          Population Estimates, July 1, 2022, (V2022)  Population Estimates, July 1, 2021, (V2021)
Fact Note
California                                                       39029342                                     39142991
Value Note for California
Arkansas                                                          3045637                                      3028122
Value Note for Arkansas
Arizona                                                           7359197                                      7264877
Value Note for Arizona
Alaska                                                             733583                                       734182
Value Note for Alaska
Alabama                                                           5074296                                      5049846
Value Note for Alabama
United States                                                   333287557                                    332031554
Value Note for United States
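
The transposed view above is presumably just the DataFrame's transpose. A minimal sketch, using a small stand-in frame rather than the full data (the three-column frame here is an assumption for illustration only):

```python
import pandas as pd

# Small stand-in for the parsed result; only a few of the columns are reproduced.
df = pd.DataFrame(
    {
        "Fact": [
            "Population Estimates, July 1, 2022, (V2022)",
            "Population Estimates, July 1, 2021, (V2021)",
        ],
        "California": [39029342, 39142991],
        "Alaska": [733583, 734182],
    }
)

# .T swaps rows and columns, so each original column becomes a row.
transposed = df.T
print(transposed)
```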

Note that, to make the example easy to create, I used a string and the StringIO class instead of a text file.
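
To apply the same idea to an actual file, something along these lines might work. This is only a sketch under the same assumptions as above (the header contains no embedded commas, every other column is empty, and all values are integers with comma thousands separators); the function name read_ambiguous_csv is made up for this example and is not part of pandas.

```python
import pandas as pd


def read_ambiguous_csv(path):
    """Parse a file laid out like the sample: a header row, then rows in which
    every other column is empty and the numbers contain unquoted commas.
    (Hypothetical helper, not a pandas function.)"""
    rows = []
    with open(path, encoding="utf-8") as f:
        # The header has no embedded commas, so a plain split is safe.
        columns = f.readline().rstrip("\n").split(",")
        for line in f:
            line = line.rstrip("\n")
            if not line:
                continue  # skip blank lines
            # ',,' marks the real field boundaries (value, empty note, next value).
            vals = line.split(",,")
            row = [vals[0], ""]  # Fact, followed by its empty Fact Note
            for val in vals[1:]:
                # Strip the thousands separators (and any trailing comma) before converting.
                row += [int(val.replace(",", "")), ""]
            rows.append(row)
    return pd.DataFrame(rows, columns=columns)
```

Usage would then be df = read_ambiguous_csv("Test.csv"). If any Fact Note or Value Note is ever non-empty, the ',,' split no longer lines up and the file would need a different approach.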

jaql4c8m2#

Use:

df = pd.read_csv("Test.csv", sep = ',', header=0)
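
For what it's worth, sep and delimiter are two names for the same read_csv parameter, and header=0 matches the default behavior when no column names are passed, so this call would be expected to parse a file the same way as the one in the question. A small check on a made-up, well-formed CSV (the sample text is an assumption for illustration, not the asker's file):

```python
from io import StringIO

import pandas as pd

# Tiny well-formed CSV used only to compare the two calls.
sample = "a,b\n1,2\n3,4\n"

# The call from the question (delimiter=',').
df_question = pd.read_csv(StringIO(sample), delimiter=",")

# The call from this answer (sep=',', header=0).
df_answer = pd.read_csv(StringIO(sample), sep=",", header=0)

# Reports whether both calls produced identical frames.
print(df_question.equals(df_answer))
```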
