Python: load a CSV file into a Pandas DataFrame and add a new column whose value is taken from the "keyword" rows

ymzxtsji  posted on 2023-04-28  in  Python

I have a .csv file in the following format:

Cash
Serial,Date,Balance
1,2021-03-05,34
2,2021-05-04,54
Credit
Serial,Date,Balance
18,2021-03-05,898
21,2021-04-01,654
Savings
Serial,Date,Balance
3,2021-03-18,19384
34,2021-12-04,472

I want to load it into a pandas DataFrame with the following structure:

Serial,Asset,Date,Balance
1,Cash,2021-03-05,34
2,Cash,2021-05-04,54
18,Credit,2021-03-05,898
21,Credit,2021-04-01,654
3,Savings,2021-03-18,19384
34,Savings,2021-12-04,472

I can already load the file into a DataFrame with the following code:

import numpy as np
import pandas as pd

FILE = r"/myfile.csv"

with open(FILE, 'r') as temp_f:
    col_count = [ len(l.split(",")) for l in temp_f.readlines() ]

column_names = [i for i in range(0, max(col_count))]
df = pd.read_csv(FILE, header=None, delimiter=",", names=column_names)
df['Asset'] = np.nan
print(df)

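But I am now stuck on how to drop the rows containing "Serial,Date,Balance" and fill the Asset column with the corresponding entry ("Cash", "Credit", etc.). Any suggestions are welcome.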

dddzy1tm1#

A CSV file should have a single header, but this will read the file as it is:

import pandas as pd
import csv

df = pd.DataFrame(columns='Serial Asset Date Balance'.split())

with open('myfile.csv', 'r', newline='') as temp_f:
    reader = csv.reader(temp_f)
    for line in reader:
        if len(line) == 1:   # Only one thing in the line?
            asset = line[0]  # remember it as the asset type
            next(reader)     # and skip the header line below it
        else: # add to the end of the dataframe
            df.loc[len(df.index)] = line[0], asset, line[1], line[2]

print(df)
df.to_csv('output.csv', index=False)

Output:

  Serial    Asset        Date Balance
0      1     Cash  2021-03-05      34
1      2     Cash  2021-05-04      54
2     18   Credit  2021-03-05     898
3     21   Credit  2021-04-01     654
4      3  Savings  2021-03-18   19384
5     34  Savings  2021-12-04     472

output.csv:

Serial,Asset,Date,Balance
1,Cash,2021-03-05,34
2,Cash,2021-05-04,54
18,Credit,2021-03-05,898
21,Credit,2021-04-01,654
3,Savings,2021-03-18,19384
34,Savings,2021-12-04,472
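
The same loop can also collect the rows in a plain list and build the DataFrame once at the end, which avoids growing the frame row by row. The following is only a sketch under a few assumptions: `read_sectioned_csv` is a hypothetical helper name, the sample text is a shortened copy of the question's data, and the type conversions assume that Serial and Balance are always integers.

```python
import csv
import io

import pandas as pd


def read_sectioned_csv(lines):
    """Parse sectioned CSV lines into one DataFrame (hypothetical helper)."""
    rows = []
    asset = None
    reader = csv.reader(lines)
    for line in reader:
        if not line:              # skip blank lines
            continue
        if len(line) == 1:        # a lone field is the asset name
            asset = line[0]
            next(reader, None)    # skip the repeated header row below it
        else:
            rows.append((line[0], asset, line[1], line[2]))
    df = pd.DataFrame(rows, columns=["Serial", "Asset", "Date", "Balance"])
    # csv.reader yields strings, so convert the numeric and date columns
    # (assumption: Serial and Balance are always integers)
    df["Serial"] = df["Serial"].astype(int)
    df["Balance"] = df["Balance"].astype(int)
    df["Date"] = pd.to_datetime(df["Date"])
    return df


# Shortened in-memory copy of the question's data, used instead of myfile.csv
sample = """Cash
Serial,Date,Balance
1,2021-03-05,34
2,2021-05-04,54
Credit
Serial,Date,Balance
18,2021-03-05,898
"""
df = read_sectioned_csv(io.StringIO(sample))
print(df)
```

With a real file, an open file handle (opened with newline='') can be passed in place of the io.StringIO object.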

balp4ylt2#

"I have a .csv file in the following format"
That is definitely not one CSV file. It is three such files:

  • cash.csv
  • credit.csv
  • savings.csv

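Store them in the file system that way.
Read them into three separate DataFrames. Then combine them in the usual way to produce a single combined DataFrame. Hint: adding a constant text column of "cash", "credit" or "savings" to each of the small DataFrames will simplify your task.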
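
A minimal sketch of that suggestion follows. It assumes the three files each carry the Serial,Date,Balance header; `combine_assets` is a hypothetical helper name, and in-memory strings stand in for cash.csv and credit.csv so that the example runs without any files on disk.

```python
import io

import pandas as pd


def combine_assets(sources):
    """Read one CSV per asset and stack them (hypothetical helper).

    `sources` maps an asset label to a path or file-like object.
    """
    frames = []
    for asset, src in sources.items():
        part = pd.read_csv(src)
        part.insert(1, "Asset", asset)   # constant text column per file
        frames.append(part)
    return pd.concat(frames, ignore_index=True)


# In-memory stand-ins for cash.csv and credit.csv
cash = io.StringIO("Serial,Date,Balance\n1,2021-03-05,34\n2,2021-05-04,54\n")
credit = io.StringIO("Serial,Date,Balance\n18,2021-03-05,898\n21,2021-04-01,654\n")

combined = combine_assets({"Cash": cash, "Credit": credit})
print(combined)
```

With real files the mapping would hold paths instead, for example {"Cash": "cash.csv", "Credit": "credit.csv", "Savings": "savings.csv"}.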

fcg9iug33#

You can use:

import io
import pandas as pd

# Separate sections
data = {}
with open('data.csv') as fp:
    for row in fp:
        if ',' not in row:
            k = row.strip()
            data[k] = []
        else:
            data[k].append(row.strip())

# Build individual dataframes
dfs = []
for asset, values in data.items():
    df = pd.read_csv(io.StringIO('\n'.join(values)))
    df.insert(1, 'Asset', asset)
    dfs.append(df)

# Merge them
df = pd.concat(dfs, ignore_index=True)

Output:

>>> df
   Serial    Asset        Date  Balance
0       1     Cash  2021-03-05       34
1       2     Cash  2021-05-04       54
2      18   Credit  2021-03-05      898
3      21   Credit  2021-04-01      654
4       3  Savings  2021-03-18    19384
5      34  Savings  2021-12-04      472

tuwxkamq4#

I would use a regular expression with re.finditer to iterate over the blocks, io.StringIO + pandas.read_csv to load each block, and concat to combine them into a single DataFrame:

import re, io
import pandas as pd

with open('myfile.csv') as f:
    out = pd.concat(
        {m.group(1): pd.read_csv(io.StringIO(m.group(2)))
        for m in re.finditer(r'(\w+)\n(.*?)\n(?=\w+\n|$)',
                                f.read(), flags=re.DOTALL)
        }, names=['Asset']).reset_index('Asset')

Output:

     Asset  Serial        Date  Balance
0     Cash       1  2021-03-05       34
1     Cash       2  2021-05-04       54
0   Credit      18  2021-03-05      898
1   Credit      21  2021-04-01      654
0  Savings       3  2021-03-18    19384
1  Savings      34  2021-12-04      472
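
The output above has Asset as the first column and an index that restarts for each block. If the layout requested in the question is needed (Serial, Asset, Date, Balance with a continuous index), one possible follow-up is sketched here. It uses a shortened in-memory copy of the data instead of myfile.csv, and the sample text ends with a newline, which the pattern appears to rely on in order to match the final block.

```python
import io
import re

import pandas as pd

# Shortened in-memory copy of the question's data (assumption: trailing newline)
text = """Cash
Serial,Date,Balance
1,2021-03-05,34
2,2021-05-04,54
Credit
Serial,Date,Balance
18,2021-03-05,898
21,2021-04-01,654
"""

# Same pattern as above: group 1 is the asset name, group 2 the CSV block
pattern = re.compile(r"(\w+)\n(.*?)\n(?=\w+\n|$)", flags=re.DOTALL)
blocks = {m.group(1): pd.read_csv(io.StringIO(m.group(2)))
          for m in pattern.finditer(text)}

out = (pd.concat(blocks, names=["Asset"])
         .reset_index("Asset")        # move the asset key into a column
         .reset_index(drop=True)      # replace the per-block index with 0..n-1
         [["Serial", "Asset", "Date", "Balance"]])  # reorder the columns
print(out)
```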
