How can I read and manipulate multiple CSV files using pandas and a for loop?

Posted by kx1ctssn on 2023-01-10 in Other

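I want to read a list of CSV files, e.g. exon_kipan.00001.csv, exon_kipan.00002.csv, exon_kipan.00003.csv and exon_kipan.00004.csv (24 files in total), and then perform a series of operations with pandas before concatenating the DataFrames.
For a single file, I would do the following: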

df= pd.read_csv("exon_kipan.csv", sep="\t", index_col=0, low_memory=False)
df= df[df.columns[::3]]
df= df.T 
del df[df.columns[0]]
df.index = df.index.str.upper()
df= df.sort_index()
df.index = ['-'.join( s.split('-')[:4]) for s in df.index.tolist() ]
df.rename_axis(None, axis=1, inplace=True)

However, I now want to read, manipulate, and concatenate multiple files.

filename = '/work/exon_kipan.{}.csv'
df_dict = {}
exon_clin_list = []
for i in range(1, 25):
    df_dict[i] = pd.read_csv(filename, sep="\t", index_col=0, low_memory=False)
    df_dict[i] = df_dict[i][df_dict[i].columns[::3]]
    df_dict[i] = df_dict[i].T
    del df_dict[i][df_dict[i].columns[0]]
    df_dict[i].index = df_dict[i].index.str.upper()
    df_dict[i] = df_dict[i].sort_index()
    df_dict[i].index = ['-'.join( s.split('-')[:4]) for s in df_dict[i].index.tolist() ]
    df_dict[i].rename_axis(None, axis=1, inplace=True)

    exon_clin_list.append(df_dict[i])

exon_clin = pd.concat(df_list)

My code raises:

FileNotFoundError: [Errno 2] No such file or directory: '/work/exon_kipan.{}.csv'

a7qyws3x  #1

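You have to use the str.format method, otherwise the literal '{}' placeholder is passed to read_csv as part of the path: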

filename = '/work/exon_kipan.{:05}.csv'  # <- don't forget to modify here
...
for i in range(1, 25):
    df_dict[i] = pd.read_csv(filename.format(i), ...)

Test:

filename = '/work/exon_kipan.{:05}.csv'
for i in range(1, 25):
    print(filename.format(i))

# Output
/work/exon_kipan.00001.csv
/work/exon_kipan.00002.csv
/work/exon_kipan.00003.csv
/work/exon_kipan.00004.csv
/work/exon_kipan.00005.csv
/work/exon_kipan.00006.csv
/work/exon_kipan.00007.csv
/work/exon_kipan.00008.csv
/work/exon_kipan.00009.csv
/work/exon_kipan.00010.csv
/work/exon_kipan.00011.csv
/work/exon_kipan.00012.csv
/work/exon_kipan.00013.csv
/work/exon_kipan.00014.csv
/work/exon_kipan.00015.csv
/work/exon_kipan.00016.csv
/work/exon_kipan.00017.csv
/work/exon_kipan.00018.csv
/work/exon_kipan.00019.csv
/work/exon_kipan.00020.csv
/work/exon_kipan.00021.csv
/work/exon_kipan.00022.csv
/work/exon_kipan.00023.csv
/work/exon_kipan.00024.csv
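
Combining this fix with the per-file steps from the question, a minimal sketch might look like the following. The helper names `process_exon_file` and `load_all` are hypothetical, and the sketch assumes every numbered file exists and shares the tab-separated layout shown in the question. Note that the question's loop fills `exon_clin_list` but then concatenates an undefined `df_list`; the sketch concatenates the list it actually builds.

```python
import pandas as pd


def process_exon_file(path):
    """Apply the question's per-file steps to one tab-separated file."""
    df = pd.read_csv(path, sep="\t", index_col=0, low_memory=False)
    df = df[df.columns[::3]]               # keep every third column
    df = df.T                              # former columns become the row index
    df = df.drop(columns=df.columns[0])    # drop the first column after transposing
    df.index = df.index.str.upper()
    df = df.sort_index()
    # Keep only the first four hyphen-separated fields of each row label
    df.index = ["-".join(s.split("-")[:4]) for s in df.index]
    return df.rename_axis(None, axis=1)


def load_all(template, first=1, last=24):
    """Fill the template with each file number and concatenate the results."""
    frames = [process_exon_file(template.format(i)) for i in range(first, last + 1)]
    return pd.concat(frames)
```

With the paths from the question, this would be called as `exon_clin = load_all('/work/exon_kipan.{:05}.csv')`. Moving the steps into a function avoids repeating `df_dict[i]` on every line and keeps only the processed frames in memory.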

50pmv0ei  #2

Maybe something like this would work:

import glob
import os

import pandas as pd

# Read one file, do some processing, and return a DataFrame
def read_file_and_do_some_actions(filename):
    # Adjust the read_csv arguments to your files (the question uses sep="\t", index_col=0)
    df = pd.read_csv(filename, index_col=None, header=0)
    #############################
    # do some processing
    #############################
    return df

path = r'/work'
# Join the directory with a relative pattern; os.path.join ignores `path` if the pattern is absolute
all_files = sorted(glob.glob(os.path.join(path, "exon_kipan.*.csv")))

# Call read_file_and_do_some_actions on each file and concatenate the results into one DataFrame
df = pd.concat((read_file_and_do_some_actions(f) for f in all_files), ignore_index=True)
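
One detail worth keeping in mind with the glob approach: glob does not promise any particular order, so sorting the matches is what makes the result reproducible, and because the file numbers are zero-padded, sorting by name also sorts them numerically. The sketch below is one possible variant using pathlib; `list_numbered_files` and `concat_with_source` are hypothetical helper names, and passing `keys=` to pd.concat is an optional extra (not part of the original answer) that records which file each row came from.

```python
from pathlib import Path

import pandas as pd


def list_numbered_files(folder, pattern="exon_kipan.*.csv"):
    """Return the files in `folder` matching `pattern`, sorted by name."""
    return sorted(Path(folder).glob(pattern))


def concat_with_source(files, reader):
    """Concatenate reader(f) for each file, labelling rows with the file name."""
    frames = [reader(f) for f in files]
    # keys= adds an outer index level holding each source file's name
    return pd.concat(frames, keys=[f.name for f in files])
```

Here `reader` can be any function that takes a path and returns a DataFrame, such as the `read_file_and_do_some_actions` function in this answer. If the source label is not needed, drop `keys=` and use a plain `pd.concat(frames)`.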
