目录文件:
D:\Development\AllDataSet\Data\2020\ER ['50K_PreProcessed.csv ']
D:\Development\AllDataSet\Data\2020\PMV ['50K_PreProcessed.csv ']
D:\Development\AllDataSet\Data\2020\PNY ['50K_PreProcessed.csv ']
D:\Development\AllDataSet\Data\2021\ER ['50K_PreProcessed.csv ']
D:\Development\AllDataSet\Data\2021\PMV ['50K_PreProcessed.csv ']
D:\Development\AllDataSet\Data\2021\PNY ['50K_PreProcessed.csv ']
D:\Development\AllDataSet\Data\2022_2023\ER ['50K_PreProcessed.csv']
D:\Development\AllDataSet\Data\2022_2023\PMV ['50K_PreProcessed.csv ']
D:\Development\AllDataSet\Data\2022_2023\PNY ['50K_PreProcessed.csv']
我有下面的代码
writer = pd.ExcelWriter('Allpros.xlsx', engine='xlsxwriter')
# num = 1
classifiers = [
['ExtraTreesClassifier :', ExtraTreeClassifier(min_samples_split=2, random_state = 2)],
['LGBMClassifier : ', LGBMClassifier(n_estimators = 400, max_depth=15,learning_rate=1)],
['XGB :', XGBClassifier(tree_method = "hist", random_state= 2, learning_rate= 1)],
]
for root,dirs,files in os.walk(filepath):
for i in files:
if i.endswith('PreProcessed.csv'):
uio = f'{root}/{i}'
year = root.rsplit('\\',2, )[1]
strategy = root.rsplit('\\',2, )[2]
print(root,files)
df = pd.read_csv(uio,usecols = col, nrows = 500)
for i in df.columns:
df[i] = df[i].astype('float64')
X = df.drop(['WinTrade'], axis=1)
Y = df.WinTrade
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.3, random_state=32, stratify= Y)
predictions_df = []
for name,classifier in classifiers:
classifier.fit(X_train, y_train)
predictions = classifier.predict(X_test)
kappa = round(cohen_kappa_score(y_test, predictions) * 100, 1)
predictions_df.append([name,kappa])
predictions_df = pd.DataFrame(predictions_df, columns=['Algo','Kappa',])
display(predictions_df.sort_values('Kappa', ascending=False))
num = 1
for name ,classifier in classifiers:
if name in predictions_df.sort_values('Kappa', ascending=False)['Algo'][:2].to_list():
print(name,year,strategy)
Model = classifier
Model.fit(X_train, y_train)
importances = Model.feature_importances_
weights = pd.Series(importances, index=X.columns.values).sort_values(ascending=False)
print(weights.index[:5].to_list())
weights.reset_index().rename(columns= {'index': str(strategy)+ '_'+ str(year)}).iloc[:3,:1].to_excel(writer,startcol = num, startrow = 1, index = False,)
num = num+2
writer.save()
writer.close()
该代码采取每个文件,并运行了提到的模型,并写在excel文件的top2模型每个文件的前3个功能
我期望的输出是一个excel文件中的yearwise和filewise特性:但我得到的第一个文件只输出,我知道我做循环不正确的地方,但不能弄清楚如何。请帮助。
1条答案
按热度按时间xmq68pz91#
我认为为什么你没有得到预期的输出的答案是你得到的警告:
未来警告:保存不是公共API的一部分,使用可能会给予意外结果,将在未来版本中删除
我会使用
pd.ExcelWriter
和pathlib的上下文管理器来使你的代码更简洁。这里有一个例子,所以你可以得到一般的逻辑。如果需要,请确保调整它,或者让我知道如果它没有帮助;):输出(* 电子表格 *):
下面是使用的目录/data/samples: