How do I read and concatenate many CSV files into one large DataFrame in pandas?

Posted by b4lqfgs4 on 2022-12-15 in Other

I have 100 CSV files in one folder, and I want to concatenate them into a single DataFrame.
I used the following code:

import os
import pandas as pd 

data_suntracker = [f for f in os.listdir(".") if f.endswith('.csv')]
df = pd.concat(map(pd.read_csv, data_suntracker))

Output:

runfile('C:/Users/vasil/.spyder-py3/autosave/dokimastiko_sun4.py', wdir='C:/Users/vasil/.spyder-py3/autosave')
Traceback (most recent call last):

  File "C:\Program Files\Spyder\pkgs\spyder_kernels\py3compat.py", line 356, in compat_exec
    exec(code, globals, locals)

  File "c:\users\vasil\.spyder-py3\autosave\dokimastiko_sun4.py", line 5, in <module>
    df = pd.concat(map(pd.read_csv, data_suntracker))

  File "C:\Program Files\Spyder\pkgs\pandas\util\_decorators.py", line 331, in wrapper
    return func(*args, **kwargs)

  File "C:\Program Files\Spyder\pkgs\pandas\core\reshape\concat.py", line 368, in concat
    op = _Concatenator(

  File "C:\Program Files\Spyder\pkgs\pandas\core\reshape\concat.py", line 422, in __init__
    objs = list(objs)

  File "C:\Program Files\Spyder\pkgs\pandas\util\_decorators.py", line 211, in wrapper
    return func(*args, **kwargs)

  File "C:\Program Files\Spyder\pkgs\pandas\util\_decorators.py", line 331, in wrapper
    return func(*args, **kwargs)

  File "C:\Program Files\Spyder\pkgs\pandas\io\parsers\readers.py", line 950, in read_csv
    return _read(filepath_or_buffer, kwds)

  File "C:\Program Files\Spyder\pkgs\pandas\io\parsers\readers.py", line 611, in _read
    return parser.read(nrows)

  File "C:\Program Files\Spyder\pkgs\pandas\io\parsers\readers.py", line 1778, in read
    ) = self._engine.read(  # type: ignore[attr-defined]

  File "C:\Program Files\Spyder\pkgs\pandas\io\parsers\c_parser_wrapper.py", line 230, in read
    chunks = self._reader.read_low_memory(nrows)

  File "pandas\_libs\parsers.pyx", line 808, in pandas._libs.parsers.TextReader.read_low_memory

  File "pandas\_libs\parsers.pyx", line 866, in pandas._libs.parsers.TextReader._read_rows

  File "pandas\_libs\parsers.pyx", line 852, in pandas._libs.parsers.TextReader._tokenize_rows

  File "pandas\_libs\parsers.pyx", line 1973, in pandas._libs.parsers.raise_parser_error

ParserError: Error tokenizing data. C error: Expected 1 fields in line 5, saw 3

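I am using Spyder, and the variable pane in the upper right shows that what I end up with is a list of the 100 CSV file names (strings), not a DataFrame. How can I fix my code so that it creates a DataFrame containing all the data from these files? All the files have the same columns.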


rbpvctlc1#

Try this code; it should work for you:

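import glob
import os

import pandas as pd

# Folder that contains the CSV files
path = 'your path to files'
# No leading slash in the pattern: os.path.join would otherwise discard `path`
all_files = glob.glob(os.path.join(path, "*.csv"))

temp_list = []

# Read each file into its own DataFrame
for filename in all_files:
    temp_df = pd.read_csv(filename, index_col=None, header=0)
    temp_list.append(temp_df)

# Stack all frames row-wise and renumber the index from 0
single_df = pd.concat(temp_list, axis=0, ignore_index=True)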
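
One detail worth checking when building the glob pattern: `os.path.join` drops every earlier component when a later one starts with a path separator, so the pattern should be `"*.csv"` rather than `"/*.csv"`. The short sketch below illustrates this with `posixpath` (the POSIX flavour of `os.path`) so the result is the same on any operating system; the folder name `data` is only a placeholder.

```python
import posixpath  # POSIX flavour of os.path, used so the result is OS-independent

folder = "data"  # hypothetical folder name, for illustration only

# A component that starts with "/" is treated as absolute, so `folder` is dropped.
bad_pattern = posixpath.join(folder, "/*.csv")

# A relative component is appended to `folder` as intended.
good_pattern = posixpath.join(folder, "*.csv")

print(bad_pattern)   # /*.csv
print(good_pattern)  # data/*.csv
```

With the first pattern, `glob.glob` would search the filesystem root instead of the intended folder and would most likely return an empty list, which in turn makes `pd.concat` raise `ValueError: No objects to concatenate`.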

ktca8awb2#

Using pathlib:

import pandas as pd
from pathlib import Path

path = "path/to/files/"
df = pd.concat((pd.read_csv(x) for x in Path(path).glob("*.csv")), ignore_index=True)
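
Note that the `ParserError` in the question ("Expected 1 fields in line 5, saw 3") is raised by `read_csv` itself, so changing how the files are collected may not be enough. An error of this kind often means the first lines of a file have fewer fields than later ones, for example because a few lines of free text precede the real header, or because the delimiter is not a comma. The sketch below is only an illustration under the first assumption: the sample content, the column names, and the value `PREAMBLE_LINES = 4` are all made up, so open one of your files in a text editor to see what it really looks like.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical file layout: 4 free-text lines, then a comma-separated header and data.
SAMPLE = (
    "Sun tracker log\n"
    "site A\n"
    "exported by logger\n"
    "units: degrees\n"
    "time,azimuth,elevation\n"
    "0,10,20\n"
    "1,11,21\n"
)
PREAMBLE_LINES = 4  # assumption; count the lines before the header in your own files


def read_all(folder, skiprows=PREAMBLE_LINES):
    """Read every CSV in `folder`, skipping the preamble, and stack the rows."""
    files = sorted(Path(folder).glob("*.csv"))  # sorted for a reproducible row order
    frames = [pd.read_csv(f, skiprows=skiprows) for f in files]
    return pd.concat(frames, ignore_index=True)


# Demo on two throwaway files so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.csv", "b.csv"):
        (Path(tmp) / name).write_text(SAMPLE)
    combined = read_all(tmp)

print(combined.shape)
```

If the files turn out to use a different separator instead (a semicolon or a tab, say), passing `sep=";"` or `sep="\t"` to `read_csv` would be the thing to try rather than `skiprows`.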
