During the second iteration of fine-tuning a language model, the following error occurs:
import torch
from torch.optim import AdamW
from transformers import get_linear_schedule_with_warmup

# Pretraining phase: one warmup epoch, then linear decay.
optimizer = AdamW(model.parameters(), lr=1e-4)
scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=steps_per_epoch * 1,
    num_training_steps=steps_per_epoch * NUM_EPOCHS)
print("======================= Start pretraining ==============================")
pretrain(model=model,
         train_iter=train_iter,
         valid_iter=valid_iter,
         optimizer=optimizer,
         scheduler=scheduler,
         num_epochs=NUM_EPOCHS)

# Fine-tuning phase: fresh optimizer and scheduler with a much smaller learning rate.
NUM_EPOCHS = 12
print("======================= Start training =================================")
optimizer = AdamW(model.parameters(), lr=2e-6)
scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=steps_per_epoch * 2,
    num_training_steps=steps_per_epoch * NUM_EPOCHS)
train(model=model,
      train_iter=train_iter,
      valid_iter=valid_iter,
      optimizer=optimizer,
      scheduler=scheduler,
      num_epochs=NUM_EPOCHS)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 16.00 MiB. GPU 0 has a total capacty of 2.00 GiB of which 0 bytes is free.
How can I solve this problem?
Changing the batch size, gradient clipping, limiting the sequence length, calling torch.cuda.empty_cache(), and changing the optimizer parameters and the number of epochs all fail to change the situation.
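Roughly, those attempts looked like the following sketch (BATCH_SIZE, MAX_LEN, and train_dataset are illustrative names, not the exact code):

import torch
from torch.utils.data import DataLoader

BATCH_SIZE = 2    # already reduced step by step
MAX_LEN = 128     # sequences truncated before batching
train_iter = DataLoader(train_dataset, batch_size=BATCH_SIZE, shuffle=True)

# Applied inside the training step, after loss.backward():
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)

# Called between the pretraining and training phases:
torch.cuda.empty_cache()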
1 Answer
I usually do the following:
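A minimal sketch of such a cleanup (assuming only the standard library and torch; gc is Python's garbage-collector module):

import gc
import torch

# Force collection of unreferenced Python objects so their CUDA
# tensors are freed, then return PyTorch's cached blocks to the driver.
gc.collect()
torch.cuda.empty_cache()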
You can add the above before pretrain and train. If the error still occurs, reduce the batch size further.
You can also try some of the techniques mentioned here.
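For example, mixed-precision training cuts activation memory roughly in half. A sketch of a generic training step (the model(**batch).loss call assumes a Hugging Face-style model; this is not the internals of the pretrain or train functions above):

import torch

scaler = torch.cuda.amp.GradScaler()

for batch in train_iter:
    optimizer.zero_grad(set_to_none=True)
    # Run the forward pass in float16 to reduce activation memory.
    with torch.cuda.amp.autocast():
        loss = model(**batch).loss
    # Scale the loss so float16 gradients do not underflow, then step.
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
    scheduler.step()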