Read in CSV files that share a naming convention and insert two new columns from the split filename

7fyelxc5  posted on 2023-03-21  in Other

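I used this solution (reading csv file with specific name in Python) to read all the CSV files in my Google Drive into a DataFrame in a Colab notebook. Every file follows the same naming convention, and I would like to split the filename into two new columns and add them to the DataFrame.
The filenames are structured as Platform_Company.csv (for example, Instagram_Microsoft.csv), and I would like the two new columns to appear at the beginning of the DataFrame.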
| platform | company | employee id | employee email |
| ------ | ------ | ------ | ------ |
| Instagram | Microsoft | person 1 | humanperson@microsoft.com |
So far I have used the following to read the files. I am not sure what the layer number is, or whether I need it.

from pathlib import Path
import pandas as pd

ls_data = []

csv_directory = '/content/drive/MyDrive/Colab Notebooks/'

for idx, filename in enumerate(Path(csv_directory).glob('*Instagram_*.csv')):
    df_temp = pd.read_csv(filename)
    df_temp.insert(0, 'layer_number', idx)
    ls_data.append(df_temp) 

df = pd.concat(ls_data, axis=0)

I tried to incorporate the following script (Read multiple csv files and Add filename as new column in pandas), but it does not work, and I do not know how to add it to my current version.

import glob
import os
import pandas as pd

path = r'\OUTPUT'
all_files = glob.glob(os.path.join(path, "*.csv"))     

df_from_each_file = (pd.read_csv(f, delimiter='|') for f in all_files)
concatenated_df   = pd.concat(df_from_each_file, ignore_index=True)
concatenated_df['filename'] =(all_files[f] for f in all_files)

Thanks for any guidance and/or suggestions!

roejwanj1#

You can use (Platform, Company) as the keys of a dict, then use pd.concat to get the expected output:

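import pandas as pd
import pathlib

csv_directory = pathlib.Path('/content/drive/MyDrive/Colab Notebooks/')

# Map (platform, company) -> DataFrame for each matching file
data = {}
for filename in csv_directory.glob('*Instagram_*.csv'):
    df = pd.read_csv(filename)
    # 'Instagram_Microsoft' -> ('Instagram', 'Microsoft'); maxsplit=1 tolerates
    # extra underscores in the company part
    platform, company = filename.stem.split('_', 1)
    data[platform, company] = df

# The dict keys become index levels; drop the per-file row index, name the
# key levels, and turn them into leading columns
df = (pd.concat(data, axis=0).droplevel(-1)
        .rename_axis(['platform', 'company']).reset_index())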

Output:

>>> df
    platform    company employee id             employee email
0  Instagram  Microsoft    person 1  humanperson@microsoft.com
1  Instagram   Facebook    person 2   humanperson@facebook.com
2  Instagram   Facebook    person 3   humanperson@facebook.com
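
To see why the `pd.concat` chain above puts the two names in front, it may help to run it on small in-memory frames instead of files. The sketch below is only an illustration: the two stand-in frames and the variable names `stacked` and `result` are assumptions, not part of the original answer.

```python
import pandas as pd

# Two tiny stand-in frames, keyed by (platform, company) tuples.
data = {
    ("Instagram", "Microsoft"): pd.DataFrame({"employee id": ["person 1"]}),
    ("Instagram", "Facebook"): pd.DataFrame(
        {"employee id": ["person 2", "person 3"]}
    ),
}

# The dict keys become the two outer index levels; each frame's own
# row index becomes the innermost (third) level.
stacked = pd.concat(data, axis=0)

# Drop the innermost per-frame row level, name the two key levels,
# then move them into ordinary columns at the front of the frame.
result = (
    stacked.droplevel(-1)
    .rename_axis(["platform", "company"])
    .reset_index()
)
print(result)
```

Because `reset_index` places the former index levels before the existing columns, no explicit column reordering is needed.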

Input files

# Instagram_Microsoft.csv
employee id,employee email
person 1,humanperson@microsoft.com

# Instagram_Facebook.csv
employee id,employee email
person 2,humanperson@facebook.com
person 3,humanperson@facebook.com
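
An alternative that stays closer to the loop in the question is to attach the two names to each file's frame before concatenating. In that loop, `layer_number` is simply the `enumerate` index of each file, so it can be dropped if the file order is not needed. The second script in the question indexes the list `all_files` with a path string, which is not a valid list index, and by the time the frames are concatenated the link between rows and files is already gone. The sketch below is a minimal version under stated assumptions: the helper name `read_with_name_columns` and its `pattern` parameter are hypothetical, and every filename is assumed to contain at least one underscore.

```python
from pathlib import Path

import pandas as pd


def read_with_name_columns(csv_directory, pattern="*_*.csv"):
    """Read matching CSVs and prepend platform/company columns.

    Hypothetical helper: assumes filenames look like Platform_Company.csv.
    """
    frames = []
    # sorted() makes the row order deterministic across runs.
    for csv_path in sorted(Path(csv_directory).glob(pattern)):
        # "Instagram_Microsoft.csv" has the stem "Instagram_Microsoft".
        # maxsplit=1 keeps any further underscores inside the company part.
        platform, company = csv_path.stem.split("_", 1)
        frame = pd.read_csv(csv_path)
        # Insert in reverse order so the layout is platform, company, ...
        frame.insert(0, "company", company)
        frame.insert(0, "platform", platform)
        frames.append(frame)
    # ignore_index=True yields one continuous 0..n-1 row index.
    # pd.concat raises ValueError if no file matched the pattern.
    return pd.concat(frames, ignore_index=True)
```

With the directory from the question, a call might look like `df = read_with_name_columns('/content/drive/MyDrive/Colab Notebooks/', 'Instagram_*.csv')`.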
