Label smoothing in PyTorch

Posted by rdlzhqv9 on 2023-04-21 in Other


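I'm using transfer learning to build a ResNet-18 classification model for the Stanford Cars dataset. I would like to implement label smoothing to penalize overconfident predictions and improve generalization.
TensorFlow has a simple keyword argument for this in CrossEntropyLoss. Has anyone built a similar function for PyTorch?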

pytorch

Source: https://stackoverflow.com/questions/55681502/label-smoothing-in-pytorch

8 Answers


zyfwsgd6 #1

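The generalization and learning speed of a multi-class neural network can often be significantly improved by using soft targets, which are a weighted average of the hard targets and the uniform distribution over labels. Smoothing the labels in this way prevents the network from becoming over-confident, and label smoothing has been used in many state-of-the-art models, including image classification, language translation, and speech recognition.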

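Label smoothing is already implemented in TensorFlow within the cross-entropy loss functions BinaryCrossentropy and CategoricalCrossentropy. When this answer was first written, PyTorch had no official implementation of label smoothing (see the update at the end of this answer); it was under active discussion, in the hope that an official version would be provided. The discussion thread is Issue #7455.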

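Here we collect some of the best available implementations of label smoothing (LS) from PyTorch practitioners. There are many ways to implement LS; see the linked discussions on the topic. We present two distinct approaches with two versions of each, so four implementations in total.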

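Option 1: CrossEntropyLossWithProbs

With this approach, the loss accepts a one-hot target vector, and the user must smooth the target vector manually. This can be done inside a with torch.no_grad() scope, which disables gradient tracking for the operations inside it, so the smoothed targets are treated as constants.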
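As a minimal sketch of what "manually smoothing" the targets means here, the snippet below builds smoothed targets under torch.no_grad() and computes cross-entropy against them. The helper names smooth_one_hot and cross_entropy_with_probs are hypothetical, introduced only for this illustration, and the smoothing / (n_classes - 1) off-target mass is an assumption that mirrors the implementations in this answer rather than any official API.

```python
import torch
import torch.nn.functional as F


def smooth_one_hot(target, n_classes, smoothing=0.1):
    """Turn class indices of shape (N,) into smoothed targets of shape (N, n_classes)."""
    assert 0 <= smoothing < 1
    with torch.no_grad():  # targets are constants, so no gradient tracking is needed
        dist = torch.full((target.size(0), n_classes),
                          smoothing / (n_classes - 1),
                          device=target.device)
        # Put the remaining probability mass on the true class.
        dist.scatter_(1, target.unsqueeze(1), 1.0 - smoothing)
    return dist


def cross_entropy_with_probs(logits, target_probs):
    """Mean cross-entropy between logits and a full target distribution."""
    log_probs = F.log_softmax(logits, dim=-1)
    return -(target_probs * log_probs).sum(dim=-1).mean()


logits = torch.randn(3, 5, requires_grad=True)
target = torch.tensor([2, 1, 0])

soft = smooth_one_hot(target, n_classes=5, smoothing=0.1)
loss = cross_entropy_with_probs(logits, soft)
loss.backward()  # gradients flow to the logits, not to the targets
```

Each row of soft sums to 1, with 0.9 on the true class and 0.025 on each of the other four classes.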

Devin Yang: Source

import torch
import numpy as np
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable
from torch.nn.modules.loss import _WeightedLoss

class LabelSmoothingLoss(nn.Module):
    def __init__(self, classes, smoothing=0.0, dim=-1, weight = None):
        """if smoothing == 0, it's one-hot method
           if 0 < smoothing < 1, it's smooth method
        """
        super(LabelSmoothingLoss, self).__init__()
        self.confidence = 1.0 - smoothing
        self.smoothing = smoothing
        self.weight = weight
        self.cls = classes
        self.dim = dim

    def forward(self, pred, target):
        assert 0 <= self.smoothing < 1
        pred = pred.log_softmax(dim=self.dim)

        if self.weight is not None:
            pred = pred * self.weight.unsqueeze(0)   

        with torch.no_grad():
            true_dist = torch.zeros_like(pred)
            true_dist.fill_(self.smoothing / (self.cls - 1))
            true_dist.scatter_(1, target.data.unsqueeze(1), self.confidence)
        return torch.mean(torch.sum(-true_dist * pred, dim=self.dim))

In addition, we added an assertion check on self.smoothing and loss-weighting support to this implementation.

Shital Shah: Source
Shital has already posted this answer here. The implementation is similar to Devin Yang's above; the version shown here is his code with slightly more compact syntax.

class SmoothCrossEntropyLoss(_WeightedLoss):
    def __init__(self, weight=None, reduction='mean', smoothing=0.0):
        super().__init__(weight=weight, reduction=reduction)
        self.smoothing = smoothing
        self.weight = weight
        self.reduction = reduction

    def k_one_hot(self, targets:torch.Tensor, n_classes:int, smoothing=0.0):
        with torch.no_grad():
            targets = torch.empty(size=(targets.size(0), n_classes),
                                  device=targets.device) \
                                  .fill_(smoothing /(n_classes-1)) \
                                  .scatter_(1, targets.data.unsqueeze(1), 1.-smoothing)
        return targets

    def reduce_loss(self, loss):
        return loss.mean() if self.reduction == 'mean' else loss.sum() \
        if self.reduction == 'sum' else loss

    def forward(self, inputs, targets):
        assert 0 <= self.smoothing < 1

        targets = self.k_one_hot(targets, inputs.size(-1), self.smoothing)
        log_preds = F.log_softmax(inputs, -1)

        if self.weight is not None:
            log_preds = log_preds * self.weight.unsqueeze(0)

        return self.reduce_loss(-(targets * log_preds).sum(dim=-1))

Check

import torch
import numpy as np
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable
from torch.nn.modules.loss import _WeightedLoss

if __name__=="__main__":
    # 1. Devin Yang
    crit = LabelSmoothingLoss(classes=5, smoothing=0.5)
    predict = torch.FloatTensor([[0, 0.2, 0.7, 0.1, 0],
                                 [0, 0.9, 0.2, 0.2, 1], 
                                 [1, 0.2, 0.7, 0.9, 1]])
    v = crit(Variable(predict),
             Variable(torch.LongTensor([2, 1, 0])))
    print(v)

    # 2. Shital Shah
    crit = SmoothCrossEntropyLoss(smoothing=0.5)
    predict = torch.FloatTensor([[0, 0.2, 0.7, 0.1, 0],
                                 [0, 0.9, 0.2, 0.2, 1], 
                                 [1, 0.2, 0.7, 0.9, 1]])
    v = crit(Variable(predict),
             Variable(torch.LongTensor([2, 1, 0])))
    print(v)

tensor(1.4178)
tensor(1.4178)

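Option 2: LabelSmoothingCrossEntropyLoss

With this approach, the loss accepts the target vector of class indices and the user does not smooth it manually; the module itself takes care of the label smoothing. This lets us implement label smoothing in terms of F.nll_loss.
(a). Wangleiofficial: Source (AFAIK, the original poster)
(b). Datasaurus: Source (added weighting support)
In addition, we trimmed the code slightly to make it more concise.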

class LabelSmoothingLoss(torch.nn.Module):
    def __init__(self, smoothing: float = 0.1, 
                 reduction="mean", weight=None):
        super(LabelSmoothingLoss, self).__init__()
        self.smoothing   = smoothing
        self.reduction = reduction
        self.weight    = weight

    def reduce_loss(self, loss):
        return loss.mean() if self.reduction == 'mean' else loss.sum() \
         if self.reduction == 'sum' else loss

    def linear_combination(self, x, y):
        return self.smoothing * x + (1 - self.smoothing) * y

    def forward(self, preds, target):
        assert 0 <= self.smoothing < 1

        if self.weight is not None:
            self.weight = self.weight.to(preds.device)

        n = preds.size(-1)
        log_preds = F.log_softmax(preds, dim=-1)
        loss = self.reduce_loss(-log_preds.sum(dim=-1))
        nll = F.nll_loss(
            log_preds, target, reduction=self.reduction, weight=self.weight
        )
        return self.linear_combination(loss / n, nll)

NVIDIA/DeepLearningExamples: Source

class LabelSmoothing(nn.Module):
    """NLL loss with label smoothing.
    """
    def __init__(self, smoothing=0.0):
        """Constructor for the LabelSmoothing module.
        :param smoothing: label smoothing factor
        """
        super(LabelSmoothing, self).__init__()
        self.confidence = 1.0 - smoothing
        self.smoothing = smoothing

    def forward(self, x, target):
        logprobs = torch.nn.functional.log_softmax(x, dim=-1)
        nll_loss = -logprobs.gather(dim=-1, index=target.unsqueeze(1))
        nll_loss = nll_loss.squeeze(1)
        smooth_loss = -logprobs.mean(dim=-1)
        loss = self.confidence * nll_loss + self.smoothing * smooth_loss
        return loss.mean()
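
Both classes in this option appear to rely on the same identity. It is sketched below under the assumption that the smoothed target mixes the one-hot vector with the uniform distribution over all K classes; the symbols ε (smoothing), K (number of classes), y (true class) and p_k (softmax probability of class k) are notation introduced for this note, not taken from either source.

```latex
\begin{aligned}
q_k &= (1-\varepsilon)\,\mathbb{1}[k=y] + \frac{\varepsilon}{K},\\
-\sum_{k=1}^{K} q_k \log p_k
  &= (1-\varepsilon)\,(-\log p_y)
   + \varepsilon\cdot\frac{1}{K}\sum_{k=1}^{K}\left(-\log p_k\right).
\end{aligned}
```

The first term is the usual NLL (nll in the first class, nll_loss in the second), and the second term is the mean negative log-probability over classes (loss / n and smooth_loss, respectively), so no smoothed target tensor has to be built explicitly.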

Check

if __name__=="__main__":
    # Wangleiofficial
    crit = LabelSmoothingLoss(smoothing=0.3, reduction="mean")
    predict = torch.FloatTensor([[0, 0.2, 0.7, 0.1, 0],
                                 [0, 0.9, 0.2, 0.2, 1], 
                                 [1, 0.2, 0.7, 0.9, 1]])

    v = crit(Variable(predict),
             Variable(torch.LongTensor([2, 1, 0])))
    print(v)

    # NVIDIA
    crit = LabelSmoothing(smoothing=0.3)
    predict = torch.FloatTensor([[0, 0.2, 0.7, 0.1, 0],
                                 [0, 0.9, 0.2, 0.2, 1], 
                                 [1, 0.2, 0.7, 0.9, 1]])
    v = crit(Variable(predict),
             Variable(torch.LongTensor([2, 1, 0])))
    print(v)

tensor(1.3883)
tensor(1.3883)

Update: Officially Added

torch.nn.CrossEntropyLoss(weight=None, size_average=None, 
                          ignore_index=-100, reduce=None, 
                          reduction='mean', label_smoothing=0.0)
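
A short usage sketch of the built-in argument, assuming a PyTorch version that provides label_smoothing (another answer in this thread gives 1.10.0). The logits and targets reuse the toy values from the checks above; the variable names are illustrative only.

```python
import torch
import torch.nn as nn

# Built-in label smoothing; the argument is not available in older PyTorch releases.
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)

logits = torch.tensor([[0.0, 0.2, 0.7, 0.1, 0.0],
                       [0.0, 0.9, 0.2, 0.2, 1.0],
                       [1.0, 0.2, 0.7, 0.9, 1.0]])
target = torch.tensor([2, 1, 0])  # class indices, no manual smoothing needed

loss = criterion(logits, target)
baseline = nn.CrossEntropyLoss()(logits, target)  # same data, no smoothing
print(loss.item(), baseline.item())
```

With this change, a custom loss class is only needed when a different smoothing formulation or a non-standard reduction is required.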


mf98qq94 #2

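I was looking for options that derive from _Loss, like the other loss classes in PyTorch, and that respect basic parameters such as reduction. Unfortunately, I couldn't find a drop-in replacement, so I ended up writing my own. I haven't fully tested it yet, though: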

import torch
from torch.nn.modules.loss import _WeightedLoss
import torch.nn.functional as F

class SmoothCrossEntropyLoss(_WeightedLoss):
    def __init__(self, weight=None, reduction='mean', smoothing=0.0):
        super().__init__(weight=weight, reduction=reduction)
        self.smoothing = smoothing
        self.weight = weight
        self.reduction = reduction

    @staticmethod
    def _smooth_one_hot(targets:torch.Tensor, n_classes:int, smoothing=0.0):
        assert 0 <= smoothing < 1
        with torch.no_grad():
            targets = torch.empty(size=(targets.size(0), n_classes),
                    device=targets.device) \
                .fill_(smoothing /(n_classes-1)) \
                .scatter_(1, targets.data.unsqueeze(1), 1.-smoothing)
        return targets

    def forward(self, inputs, targets):
        targets = SmoothCrossEntropyLoss._smooth_one_hot(targets, inputs.size(-1),
            self.smoothing)
        lsm = F.log_softmax(inputs, -1)

        if self.weight is not None:
            lsm = lsm * self.weight.unsqueeze(0)

        loss = -(targets * lsm).sum(-1)

        if  self.reduction == 'sum':
            loss = loss.sum()
        elif  self.reduction == 'mean':
            loss = loss.mean()

        return loss

Other options:

utils.pytorch implementation
DeepMatch implementation


lo8azlld #3

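None that I am aware of.
Here are two examples of PyTorch implementations:

The LabelSmoothingLoss module in the OpenNMT framework for machine translation
attention-is-all-you-need-pytorch, a re-implementation of Google's "Attention is all you need" paper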


cwxwcias #4

Since version 1.10.0, PyTorch officially supports label smoothing and soft targets in torch.nn.CrossEntropyLoss.
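
As a hedged illustration of the soft-target support mentioned here, the sketch below passes class probabilities instead of class indices as the target and compares the result with a manual computation. The tensor values and variable names are made up for the example.

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()

logits = torch.randn(2, 3)

# Soft targets: each row is a probability distribution over the 3 classes.
soft_target = torch.tensor([[0.90, 0.05, 0.05],
                            [0.10, 0.80, 0.10]])

loss_soft = criterion(logits, soft_target)

# Manual reference: mean over the batch of -sum_k q_k * log p_k.
manual = -(soft_target * torch.log_softmax(logits, dim=-1)).sum(dim=-1).mean()
print(loss_soft.item(), manual.item())
```

Because the target may be any distribution, a hand-smoothed target can be fed to the built-in loss directly when a non-default smoothing scheme is wanted.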


zzzyeukh #5

Label smoothing PyTorch implementation, for reference: https://github.com/wangleiofficial/label-smoothing-pytorch

import torch.nn as nn  # needed for nn.Module below
import torch.nn.functional as F

def linear_combination(x, y, epsilon):
    return epsilon * x + (1 - epsilon) * y

def reduce_loss(loss, reduction='mean'):
    return loss.mean() if reduction == 'mean' else loss.sum() if reduction == 'sum' else loss

class LabelSmoothingCrossEntropy(nn.Module):
    def __init__(self, epsilon: float = 0.1, reduction='mean'):
        super().__init__()
        self.epsilon = epsilon
        self.reduction = reduction

    def forward(self, preds, target):
        n = preds.size()[-1]
        log_preds = F.log_softmax(preds, dim=-1)
        loss = reduce_loss(-log_preds.sum(dim=-1), self.reduction)
        nll = F.nll_loss(log_preds, target, reduction=self.reduction)
        return linear_combination(loss / n, nll, self.epsilon)


snvhrwxg #6

At the time of this answer there was no official implementation in PyTorch, but it had been proposed as high-priority Feature Request #7455, and separately in TorchVision Issue #2980.
There are many implementations in other libraries:

NJUNMT-pytorch: NMTCritierion()._smooth_label()
Snorkel: snorkel.classification.cross_entropy_with_probs()
OpenNMT: LabelSmoothingLoss()

as well as a number of unofficial implementations and code snippets:

NVIDIA ResNet50 v1.5
https://discuss.pytorch.org/t/cross-entropy-with-one-hot-targets/13580/5

TensorFlow / Keras implementation: tf.keras.losses.CategoricalCrossentropy(label_smoothing)


hivapdat #7

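The current top post is not entirely correct when weights are passed. Note that the PyTorch documentation specifies that when weights are passed with reduction='mean', the normalization takes the weights into account: the summed loss is divided by the sum of the applied weights rather than by the batch size.

So the reduction should be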

if self.weight is not None:
    redw = 1.0/(true_dist*self.weight.unsqueeze(0)).sum()
else:
    redw = 1.0/pred.shape[0] # 1/n
        
return torch.sum(redw*torch.sum(-true_dist * pred, dim=self.dim))

instead of

return torch.mean(torch.sum(-true_dist * pred, dim=self.dim))

In full, with this fix:

import torch
import numpy as np
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable
from torch.nn.modules.loss import _WeightedLoss

class LabelSmoothingLoss(nn.Module):
    def __init__(self, classes, smoothing=0.0, dim=-1, weight = None):
        """if smoothing == 0, it's one-hot method
           if 0 < smoothing < 1, it's smooth method
        """
        super(LabelSmoothingLoss, self).__init__()
        self.confidence = 1.0 - smoothing
        self.smoothing = smoothing
        self.weight = weight
        self.cls = classes
        self.dim = dim

    def forward(self, pred, target):
        assert 0 <= self.smoothing < 1
        pred = pred.log_softmax(dim=self.dim)

        if self.weight is not None:
            pred = pred * self.weight.unsqueeze(0)   

        with torch.no_grad():
            true_dist = torch.zeros_like(pred)
            true_dist.fill_(self.smoothing / (self.cls - 1))
            true_dist.scatter_(1, target.data.unsqueeze(1), self.confidence)
        if self.weight is not None:
            redw = 1.0/(true_dist*self.weight.unsqueeze(0)).sum()
        else:
            redw = 1.0/pred.shape[0] # 1/n
        
        return torch.sum(redw*torch.sum(-true_dist * pred, dim=self.dim))
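
The weighted-mean normalization this answer refers to can be checked against F.cross_entropy for plain index targets, without any label smoothing. The snippet below is only a sketch with made-up logits and class weights: it shows that the built-in weighted 'mean' divides by the sum of the per-sample weights, not by the batch size.

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([[0.0, 0.2, 0.7],
                       [0.9, 0.2, 0.2],
                       [1.0, 0.2, 0.7]])
target = torch.tensor([2, 0, 1])
weight = torch.tensor([1.0, 2.0, 3.0])  # illustrative per-class weights

per_sample = F.cross_entropy(logits, target, reduction="none")  # unweighted losses
w = weight[target]                                              # weight of each sample's class

weighted_mean = (w * per_sample).sum() / w.sum()  # normalize by the sum of weights
plain_mean = (w * per_sample).mean()              # normalize by the batch size

builtin = F.cross_entropy(logits, target, weight=weight, reduction="mean")
print(builtin.item(), weighted_mean.item(), plain_mean.item())
```

Here builtin agrees with weighted_mean and differs from plain_mean, which is the discrepancy the fix above is meant to address.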


oymdgrw7 #8

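Note that there are different formulations of label smoothing.
1. The canonical formulation from the original paper https://arxiv.org/pdf/1512.00567.pdf, in which the smoothed true distribution is a linear combination of the one-hot vector and the uniform distribution: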

true_dist = torch.zeros_like(pred)
true_dist.scatter_(1, target.data.unsqueeze(1), 1 - self.smoothing)
true_dist += self.smoothing / num_classes

2. Another popular formulation, in which the smoothed true distribution takes a different form:

true_dist = torch.zeros_like(pred)
true_dist.fill_(self.smoothing / (num_classes - 1))
true_dist.scatter_(1, target.data.unsqueeze(1), 1 - self.smoothing)

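The canonical formulation is the one used in PyTorch's implementation. It also corresponds to Option 2 in Innat's answer. The second formulation appears in many open-source implementations, including Option 1 in that same answer.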
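
The difference between the two formulations can be made concrete with a small comparison. This is a sketch rather than a reference implementation: smoothed_ce is a hypothetical helper written for this note, and it assumes a PyTorch version in which F.cross_entropy accepts label_smoothing.

```python
import torch
import torch.nn.functional as F


def smoothed_ce(logits, target, smoothing, canonical):
    """Cross-entropy against a smoothed target built with either formulation."""
    n_classes = logits.size(-1)
    log_probs = F.log_softmax(logits, dim=-1)
    with torch.no_grad():
        true_dist = torch.zeros_like(log_probs)
        if canonical:
            # (1 - eps) * one_hot + eps / K, spread over all K classes
            true_dist.scatter_(1, target.unsqueeze(1), 1.0 - smoothing)
            true_dist += smoothing / n_classes
        else:
            # eps / (K - 1) on the other classes, 1 - eps on the true class
            true_dist.fill_(smoothing / (n_classes - 1))
            true_dist.scatter_(1, target.unsqueeze(1), 1.0 - smoothing)
    return -(true_dist * log_probs).sum(dim=-1).mean()


logits = torch.tensor([[0.0, 0.2, 0.7, 0.1, 0.0],
                       [0.0, 0.9, 0.2, 0.2, 1.0],
                       [1.0, 0.2, 0.7, 0.9, 1.0]])
target = torch.tensor([2, 1, 0])
eps = 0.3

canonical = smoothed_ce(logits, target, eps, canonical=True)
alternative = smoothed_ce(logits, target, eps, canonical=False)
builtin = F.cross_entropy(logits, target, label_smoothing=eps)
print(canonical.item(), alternative.item(), builtin.item())
```

The canonical value matches the built-in loss, while the alternative differs slightly because it places less mass on the true class for the same smoothing value.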

