pandas 如何正确使用pd.excelwriter在循环中写入文件

5kgi1eie  于 2023-04-04  发布在  其他
关注(0)|答案(1)|浏览(290)

目录文件:
D:\Development\AllDataSet\Data\2020\ER ['50K_PreProcessed.csv ']
D:\Development\AllDataSet\Data\2020\PMV ['50K_PreProcessed.csv ']
D:\Development\AllDataSet\Data\2020\PNY ['50K_PreProcessed.csv ']
D:\Development\AllDataSet\Data\2021\ER ['50K_PreProcessed.csv ']
D:\Development\AllDataSet\Data\2021\PMV ['50K_PreProcessed.csv ']
D:\Development\AllDataSet\Data\2021\PNY ['50K_PreProcessed.csv ']
D:\Development\AllDataSet\Data\2022_2023\ER ['50K_PreProcessed.csv']
D:\Development\AllDataSet\Data\2022_2023\PMV ['50K_PreProcessed.csv ']
D:\Development\AllDataSet\Data\2022_2023\PNY ['50K_PreProcessed.csv']
我有下面的代码

writer = pd.ExcelWriter('Allpros.xlsx', engine='xlsxwriter')
# num = 1

classifiers = [
    ['ExtraTreesClassifier :', ExtraTreeClassifier(min_samples_split=2, random_state = 2)],
    ['LGBMClassifier : ', LGBMClassifier(n_estimators = 400, max_depth=15,learning_rate=1)],
    ['XGB :', XGBClassifier(tree_method = "hist", random_state= 2, learning_rate= 1)],
]

for root,dirs,files in os.walk(filepath):
    for i in files:
        if i.endswith('PreProcessed.csv'):
            uio = f'{root}/{i}'
            year = root.rsplit('\\',2, )[1]
            strategy = root.rsplit('\\',2, )[2]
            print(root,files)
            df = pd.read_csv(uio,usecols = col, nrows = 500)

            for i in df.columns:
                df[i] = df[i].astype('float64')

            X = df.drop(['WinTrade'], axis=1)
            Y = df.WinTrade

            X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.3, random_state=32, stratify= Y)
            
            predictions_df = []

            for name,classifier in classifiers:
                classifier.fit(X_train, y_train)
                predictions = classifier.predict(X_test)

                kappa = round(cohen_kappa_score(y_test, predictions) * 100, 1)
                predictions_df.append([name,kappa])

            predictions_df = pd.DataFrame(predictions_df, columns=['Algo','Kappa',])
            display(predictions_df.sort_values('Kappa', ascending=False))
            
            num = 1

            for name ,classifier in classifiers:
                if name in predictions_df.sort_values('Kappa', ascending=False)['Algo'][:2].to_list():
                    print(name,year,strategy)
                    Model = classifier
                    Model.fit(X_train, y_train)

                    importances = Model.feature_importances_
                    weights = pd.Series(importances, index=X.columns.values).sort_values(ascending=False)
                    print(weights.index[:5].to_list())
                    weights.reset_index().rename(columns= {'index': str(strategy)+ '_'+ str(year)}).iloc[:3,:1].to_excel(writer,startcol = num, startrow = 1, index = False,)
                    num = num+2
            writer.save()
writer.close()

该代码采取每个文件,并运行了提到的模型,并写在excel文件的top2模型每个文件的前3个功能
我期望的输出是一个excel文件中的yearwise和filewise特性:但我得到的第一个文件只输出,我知道我做循环不正确的地方,但不能弄清楚如何。请帮助。

xmq68pz9

xmq68pz91#

我认为为什么你没有得到预期的输出的答案是你得到的警告:
未来警告:保存不是公共API的一部分,使用可能会给予意外结果,将在未来版本中删除
我会使用pd.ExcelWriterpathlib的上下文管理器来使你的代码更简洁。这里有一个例子,所以你可以得到一般的逻辑。如果需要,请确保调整它,或者让我知道如果它没有帮助;):

classifiers = [
    ['ExtraTreesClassifier:', ExtraTreeClassifier(min_samples_split=2, random_state=2)],
    ['LGBMClassifier:', LGBMClassifier(n_estimators=400, max_depth=15, learning_rate=1)],
    ['XGB:', XGBClassifier(tree_method="hist", random_state=2, learning_rate=1)]
]

cols_to_use = ['column1', 'column2', 'column3', 'column4', 'column5', 'WinTrade']

with pd.ExcelWriter('Allpros.xlsx') as writer:
    start_col = 1

    for filepath in Path('tmp/').rglob('*.csv'):
        year, strategy = filepath.parts[-3:-1]

        df = pd.read_csv(filepath, usecols=cols_to_use, nrows=500, dtype="float64")

        X, Y = df.drop(['WinTrade'], axis=1), df["WinTrade"]
        X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.3, random_state=32, stratify=Y)
        
        predictions_df = []
        for name, classifier in classifiers:
            classifier.fit(X_train, y_train)
            predictions = classifier.predict(X_test)
            kappa = round(cohen_kappa_score(y_test, predictions) * 100, 1)
            predictions_df.append([name, kappa])

        predictions_df = (pd.DataFrame(predictions_df, columns=['Algo', 'Kappa'])
                                .sort_values(by=['Kappa'], ascending=False))

        for name, classifier in classifiers:
            if name in predictions_df['Algo'].tolist()[:2]:
                classifier.fit(X_train, y_train)
                importances = classifier.feature_importances_
                
                col_name = f'{strategy}_{year}'
                weights = pd.Series(importances, index=X.columns).sort_values(ascending=False)
                
                (weights.reset_index().rename(columns= {'index': col_name})
                         .iloc[:3,:1].to_excel(writer, startrow=1, startcol=start_col, index=False))
                
                start_col += 2

输出(* 电子表格 *):

下面是使用的目录/data/samples:

.. # <- the script above was run from here
┗━━ tmp
    ┗━━ Development
        ┗━━ AllDataSet
            ┗━━ Data
                ┣━━ 2020
                ┃   ┣━━ ER
                ┃   ┃   ┗━━ 50K_PreProcessed.csv
                ┃   ┣━━ PMV
                ┃   ┃   ┗━━ 50K_PreProcessed.csv
                ┃   ┗━━ PNY
                ┃       ┗━━ 50K_PreProcessed.csv
                ┣━━ 2021
                ┃   ┣━━ ER
                ┃   ┃   ┗━━ 50K_PreProcessed.csv
                ┃   ┣━━ PMV
                ┃   ┃   ┗━━ 50K_PreProcessed.csv
                ┃   ┗━━ PNY
                ┃       ┗━━ 50K_PreProcessed.csv
                ┗━━ 2022_2023
                    ┣━━ ER
                    ┃   ┗━━ 50K_PreProcessed.csv
                    ┣━━ PMV
                    ┃   ┗━━ 50K_PreProcessed.csv
                    ┗━━ PNY
                        ┗━━ 50K_PreProcessed.csv

相关问题