Best practice in TensorFlow for loading test data in a separate Python file for model training and evaluation: should I run train_test_split again or load the saved test data?

hs1ihplo posted on 2023-10-23 in Python

Suppose I have a train.py file that contains the logic for training a model and then saves its parameters to a directory named weights/:

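from sklearn.model_selection import train_test_split

# x, y: the full dataset, loaded elsewhere
x_train, x_test, y_train, y_test = train_test_split(x, y)
model = compile()  # pseudocode: build and compile the model here
model.fit(x_train, y_train)
model.save_weights("weights/")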

Another file, evaluate.py, contains the logic for evaluating the model's performance, with the parameters loaded from the weights/ directory:

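from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(x, y)
model = compile()  # pseudocode: rebuild the same architecture as in train.py
model.load_weights("weights/")
model.evaluate(x_test, y_test)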

My question is: in evaluate.py, is the statement x_train, x_test, y_train, y_test = train_test_split(x, y) correct, or should I load the same test set that was split off in train.py? In that case, train.py would be:

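import numpy as np
from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(x, y)
np.save("x_test", x_test)  # np.save appends ".npy", so this writes x_test.npy
np.save("y_test", y_test)  # writes y_test.npy
model = compile()  # pseudocode: build and compile the model here
model.fit(x_train, y_train)
model.save_weights("weights/")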

and evaluate.py would be:

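import numpy as np

# np.load does not append ".npy", so the full file name is required
x_test = np.load("x_test.npy")
y_test = np.load("y_test.npy")
model = compile()  # pseudocode: rebuild the same architecture as in train.py
model.load_weights("weights/")
model.evaluate(x_test, y_test)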

Answer #1 (7d7tgy0s)

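I think the simplest way to handle evaluation is to keep the training data and the test data separate: the model learns its weights on the training set, and the evaluation phase then measures its metrics on the test set. You don't need to split the data again in evaluate.py. Calling train_test_split there without a fixed random_state shuffles differently on every run, so the new test set would generally overlap with samples the model was trained on. I also recommend specifying random_state when splitting the dataset to get reproducible results.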

train_test_split(X, y, random_state=42)
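
To make the two options concrete, here is a minimal sketch, not a definitive implementation. It uses toy arrays in place of the real x and y, and the file name test_split.npz, the SEED constant, and the split helper are made up for illustration. It checks that the test set seen by the evaluation side matches the one held out at training time, both when re-splitting with the same random_state and when loading a saved copy.

```python
import tempfile
from pathlib import Path

import numpy as np
from sklearn.model_selection import train_test_split

# Toy data standing in for the real x and y (assumption for illustration).
x = np.arange(40, dtype=np.float32).reshape(20, 2)
y = np.arange(20) % 2

SEED = 42  # hypothetical constant shared by train.py and evaluate.py


def split(x, y):
    """Split with fixed settings so every caller gets the same partition."""
    return train_test_split(x, y, test_size=0.25, random_state=SEED)


# --- what train.py would do ---
x_train, x_test, y_train, y_test = split(x, y)

# Persist the held-out set in a single archive (hypothetical file name).
out_dir = Path(tempfile.mkdtemp())
test_path = out_dir / "test_split.npz"
np.savez(test_path, x_test=x_test, y_test=y_test)

# --- what evaluate.py could do ---
# Option 1: re-split the same data with the same seed and test_size.
_, x_test_again, _, y_test_again = split(x, y)

# Option 2: load the arrays that train.py saved.
with np.load(test_path) as data:
    x_test_loaded = data["x_test"]
    y_test_loaded = data["y_test"]

same_by_seed = (np.array_equal(x_test, x_test_again)
                and np.array_equal(y_test, y_test_again))
same_by_file = (np.array_equal(x_test, x_test_loaded)
                and np.array_equal(y_test, y_test_loaded))
print(same_by_seed, same_by_file)
```

Re-splitting only reproduces the partition if x, y, test_size, and random_state are all identical in both scripts. Saving the test set avoids that coupling, at the cost of an extra file to keep alongside the weights.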
