I am trying to use ModelCheckpoint to save the best-performing models, ranked by the validation loss logged at each epoch.
import pytorch_lightning as pl
from pytorch_lightning.callbacks import ModelCheckpoint

class model(pl.LightningModule):
    :
    :
    :
    def validation_step(self, batch, batch_idx):
        # Reset the running totals on the first validation batch.
        if batch_idx == 0:
            self.totalValLoss = 0
            self.totalValToken = 0
        batch = Batch(batch[0], batch[1])
        out = self(batch.src, batch.trg, batch.src_mask, batch.trg_mask)
        out = self.generator(out)
        criterion = LabelSmoothing(size=V, padding_idx=0, smoothing=0)
        loss = criterion(out.contiguous().view(-1, out.size(-1)), batch.trg_y.contiguous().view(-1)) / batch.ntokens
        # Accumulate the token-weighted loss and the token count.
        self.totalValLoss += loss * batch.ntokens
        self.totalValToken += batch.ntokens
        # "val_loss" is only logged on the batch with index 99.
        if batch_idx == 99:
            self.totalValLoss = self.totalValLoss / self.totalValToken
            print(f"valLoss: {self.totalValLoss}")
            self.log("val_loss", self.totalValLoss)
            return {"val_loss": self.totalValLoss}

if __name__ == '__main__':
    if True:
        model = model(...)
        checkpoint_callback = ModelCheckpoint(dirpath="D:/PycharmProjects/Transformer/Models",
                                              save_top_k=2, monitor="val_loss")
        trainer = pl.Trainer(max_epochs=10, callbacks=[checkpoint_callback])
        trainer.fit(model)
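After running the code, I expected the two best-performing models to be saved to the directory "D:/PycharmProjects/Transformer/Models", but that did not happen. No error was shown at runtime either.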
1 Answer
hjqgdpho #1
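Check your Trainer arguments check_val_every_n_epoch and max_epochs: if check_val_every_n_epoch > max_epochs, validation never runs, so val_loss is never logged and your code will not save a model.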
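To make the interaction between these two arguments concrete, here is a minimal sketch in plain Python. It does not call Lightning at all. The helper name `validation_epochs` is hypothetical, and the rule it encodes (validation runs after a zero-indexed epoch `e` when `(e + 1)` is a multiple of `check_val_every_n_epoch`) is a simplified model of the Trainer's scheduling, assumed here for illustration.

```python
def validation_epochs(max_epochs, check_val_every_n_epoch):
    """Return the zero-indexed epochs after which validation would run.

    Simplified model (an assumption, not Lightning's actual code):
    validation runs when (epoch + 1) is a multiple of
    check_val_every_n_epoch.
    """
    if check_val_every_n_epoch < 1:
        raise ValueError("check_val_every_n_epoch must be >= 1")
    return [
        epoch
        for epoch in range(max_epochs)
        if (epoch + 1) % check_val_every_n_epoch == 0
    ]


# Settings from the question, with validation after every epoch:
# validation runs after each of the 10 epochs.
print(validation_epochs(max_epochs=10, check_val_every_n_epoch=1))

# Interval larger than max_epochs: validation never runs, so a
# checkpoint callback monitoring "val_loss" would have nothing to rank.
print(validation_epochs(max_epochs=10, check_val_every_n_epoch=20))
```

If the second call mirrors your configuration, lowering check_val_every_n_epoch or raising max_epochs should let validation run at least once. If the first call mirrors it and still nothing is saved, it may be worth confirming that self.log("val_loss", ...) is actually reached during validation.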