pandas 如何合并多个csv并创建一个 Dataframe ?

jmo0nnb3  于 2023-02-11  发布在  其他
关注(0)|答案(1)|浏览(154)

我想执行以下步骤:1.合并同一目录中的所有csv 2.创建为Dataframe 3.分配列并删除一列,然后将其中一列(“类型”)设置为索引4.对于所有文件,我希望将列D融化,以将列作为行结束

file_list = glob.glob("*.csv")

for file in file_list:
    merged_file = pd.read_csv(file)
    print(merged_file)

merged_file = pd.DataFrame()
df.columns = ['Type', 'Country', 'Source']
df = df.drop('Source', axis=1)
df = df.set_index('Type',drop=False).stack().reset_index()

agg_df = pd.melt(df, id_vars = [df, 'Source', 'Country', 'Type'])

df = df.sort_values('Type').reset_index(drop=True)
print(df)

清除预期输出(左对齐):

Mineral name - Type - Country - Prod_t_2021
Mineral name - Type - Country - Prod_t_2022
Mineral name - Type - Country - Reserves_t
Mineral name - Type - Country - Reserves_notes
  • 矿物名称可从类型中提取为字符串 *

来源是World.zip,URL为:https://www.sciencebase.gov/catalog/item/63b5f411d34e92aad3caa57f

ql3eal8s

ql3eal8s1#

IIUC,使用这个:

df = (
        pd.concat([pd.read_csv(f) for f in file_list], ignore_index=True)
            .melt(id_vars = ["Source", "Country", "Type"])
            .set_index("Type")
            .sort_index()
      )
​

输出:

Source          Country         variable                   value
Type                                                                                          
 Mine production: Palladium  MCS2023    United States  Prod_t_est_2022                     NaN
 Mine production: Palladium  MCS2023     South Africa  Reserves_ore_kt                     NaN
 Mine production: Palladium  MCS2023           Russia  Reserves_ore_kt                     NaN
 Mine production: Palladium  MCS2023           Canada  Reserves_ore_kt                     NaN
 Mine production: Palladium  MCS2023    United States  Reserves_ore_kt                     NaN
 Mine production: Palladium  MCS2023    United States      Reserves_kt                     NaN
 Mine production: Palladium  MCS2023           Canada      Reserves_kt                     NaN
 Mine production: Palladium  MCS2023           Russia      Reserves_kt                     NaN
 Mine production: Palladium  MCS2023     South Africa      Reserves_kt                     NaN
 Mine production: Palladium  MCS2023  Other countries   Reserves_notes  Included with platinum

相关问题