As suggested in the last comment, we can use the class introduced by https://nlp.seas.harvard.edu/2018/04/03/attention.html#optimizer. But that answer will give an error unless we define a function to update the state_dict. The full scheduler is below:
class NoamOpt:
    "Optim wrapper that implements rate."
    def __init__(self, model_size, warmup, optimizer):
        self.optimizer = optimizer
        self._step = 0
        self.warmup = warmup
        self.model_size = model_size
        self._rate = 0

    def state_dict(self):
        """Returns the state of the warmup scheduler as a :class:`dict`.
        It contains an entry for every variable in self.__dict__ which
        is not the optimizer.
        """
        return {key: value for key, value in self.__dict__.items() if key != 'optimizer'}

    def load_state_dict(self, state_dict):
        """Loads the warmup scheduler's state.
        Arguments:
            state_dict (dict): warmup scheduler state. Should be an object returned
                from a call to :meth:`state_dict`.
        """
        self.__dict__.update(state_dict)

    def step(self):
        "Update parameters and rate"
        self._step += 1
        rate = self.rate()
        for p in self.optimizer.param_groups:
            p['lr'] = rate
        self._rate = rate
        self.optimizer.step()

    def rate(self, step=None):
        "Implement `lrate` above"
        if step is None:
            step = self._step
        return (self.model_size ** (-0.5) *
                min(step ** (-0.5), step * self.warmup ** (-1.5)))
4 Answers
l3zydbqr1#
PyTorch provides learning-rate schedulers that implement various strategies for adjusting the learning rate during training. Some simple LR schedulers are already implemented and can be found here: https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
In your particular case, you can subclass _LRScheduler, just like the other LR schedulers do, to implement a variable schedule based on the epoch count. For the basic approach you only need to implement the __init__() and get_lr() methods; see the sketch below.
Note that many schedulers expect you to call .step() once per epoch, but you can also update it more often, or even pass a custom argument, as the cosine-annealing LR scheduler does: https://pytorch.org/docs/stable/_modules/torch/optim/lr_scheduler.html#CosineAnnealingLR
mftmpeh82#
As suggested in the last comment, we can use the class introduced by https://nlp.seas.harvard.edu/2018/04/03/attention.html#optimizer. But that answer will give an error unless we define a function to update the state_dict. The full scheduler is shown above.
Later, to use it in the training loop:
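The snippet for this step was not preserved on this page. Below is a sketch of how the NoamOpt wrapper above is typically driven; the model, loss computation, data loader, number of epochs, and the hyperparameter values (model_size=512, warmup=4000) are placeholders, not values from the original answer:

import torch

# lr=0 here because NoamOpt overwrites the learning rate on every step
optimizer = NoamOpt(
    model_size=512, warmup=4000,
    optimizer=torch.optim.Adam(model.parameters(), lr=0, betas=(0.9, 0.98), eps=1e-9))

for epoch in range(num_epochs):
    for batch in train_loader:
        loss = compute_loss(model, batch)   # placeholder loss computation
        optimizer.optimizer.zero_grad()     # the inner torch optimizer holds the gradients
        loss.backward()
        optimizer.step()                    # sets the new lr, then calls the inner optimizer's step()

# The state_dict/load_state_dict methods defined above make checkpointing possible:
torch.save({'warmup_scheduler': optimizer.state_dict(),
            'optimizer': optimizer.optimizer.state_dict()}, 'checkpoint.pt')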
4dbbbstv3#
For example: https://nlp.seas.harvard.edu/2018/04/03/attention.html#optimizer
oxiaedzo4#
NoamOpt does, of course, provide a way to implement a warmup learning rate, as shown in https://nlp.seas.harvard.edu/2018/04/03/attention.html#optimizer, but it is somewhat dated and inconvenient. A smarter way to achieve this is to use the lambda learning-rate scheduler that PyTorch supports directly.
That is, first define a warmup function that adjusts the learning rate automatically, as follows:
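The original code block was not preserved here. One common formulation is the same Noam rule used by the class above; the default values for model_size, factor, and warmup are illustrative:

def rate(step, model_size=512, factor=1.0, warmup=4000):
    """Noam rule: linear warmup for `warmup` steps, then inverse-square-root decay."""
    step = max(step, 1)  # LambdaLR calls this with step 0 first; avoid 0 ** (-0.5)
    return factor * model_size ** (-0.5) * min(step ** (-0.5), step * warmup ** (-1.5))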
Then build the learning-rate scheduler and use it during training:
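Again, the original snippet is missing; below is a sketch using torch.optim.lr_scheduler.LambdaLR together with the rate() function above. The model, loss computation, data loader, and number of epochs are placeholders:

import torch
from torch.optim.lr_scheduler import LambdaLR

# base lr is 1.0 so that the value returned by rate() becomes the effective learning rate
optimizer = torch.optim.Adam(model.parameters(), lr=1.0, betas=(0.9, 0.98), eps=1e-9)
scheduler = LambdaLR(optimizer, lr_lambda=lambda step: rate(step))

for epoch in range(num_epochs):
    for batch in train_loader:
        loss = compute_loss(model, batch)   # placeholder loss computation
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        scheduler.step()                    # advance the warmup/decay schedule once per batch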