PyTorch: variational autoencoder gives the same output image for every input MNIST image when using KL divergence

Posted by chhqkbe1 on 2022-12-04 in Other

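Without the KL divergence term, the VAE reconstructs the original images almost perfectly, but it fails to generate good new images when given random noise.
With the KL divergence term, the VAE gives the same weird output both when reconstructing and when generating images.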

Here is the PyTorch code for the loss function:

def loss_function(recon_x, x, mu, logvar):
    BCE = F.binary_cross_entropy(recon_x, x.view(-1, 784), size_average=True)
    KLD = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())    
    return (BCE+KLD)

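recon_x is the reconstructed image, x is the original image, mu is the mean vector, and logvar is the vector containing the log of the variance.
What is going wrong here? Thanks in advance :)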

pytorch

Source: https://stackoverflow.com/questions/50607516/variational-autoencoder-gives-same-output-image-for-every-input-mnist-image-when

4 Answers

py49o6xq1#

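One possible cause is a numerical imbalance between the two losses: your BCE loss is computed as an average over the whole batch (cf. size_average=True), while the KLD loss is a sum.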
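To make the imbalance concrete, here is a minimal sketch (one possible fix, not necessarily what the asker ended up using) that puts both terms on the same scale: each is summed over the features of a sample and then divided by the batch size. The name `vae_loss`, the latent size of 20, and the dummy tensors are assumptions for illustration only.

```python
import torch
import torch.nn.functional as F


def vae_loss(recon_x, x, mu, logvar):
    """BCE and KLD with the same reduction: sum per sample, mean over the batch."""
    batch_size = x.size(0)
    # Sum the reconstruction error over all 784 pixels of every sample.
    bce = F.binary_cross_entropy(recon_x, x.view(-1, 784), reduction="sum")
    # Closed-form KL divergence between N(mu, sigma^2) and N(0, I), also summed.
    kld = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    # Divide both terms by the batch size so neither dominates by construction.
    return (bce + kld) / batch_size


# Dummy data standing in for one MNIST batch (hypothetical shapes and values).
torch.manual_seed(0)
x = torch.rand(8, 1, 28, 28)
recon_x = torch.sigmoid(torch.randn(8, 784))
mu = torch.zeros(8, 20)
logvar = torch.zeros(8, 20)
loss = vae_loss(recon_x, x, mu, logvar)
print(loss.item())
```

With `mu` and `logvar` both zero the KLD term vanishes, so the printed value is just the per-sample BCE; any non-zero `mu` or `logvar` adds a positive penalty on the same per-sample scale.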

xcitsw882#

Multiplying the KLD by 0.0001 did it. The generated images are a bit distorted, but the problem of identical outputs is solved.
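One way to see why such a small factor can help: with `size_average=True` the BCE is divided by the number of elements (batch size times 784), while the KLD is not divided by anything. Relative to a loss in which both terms are summed, the KLD is therefore over-weighted by that element count. The arithmetic below is only a sketch; the batch size of 128 is an assumption, since the question does not state it.

```python
# Assumed value: the question does not give the batch size.
batch_size = 128
pixels_per_image = 28 * 28  # 784 for MNIST

# Mean-reduced BCE equals the summed BCE divided by this many elements.
elements = batch_size * pixels_per_image

# Weight on the summed KLD that would restore the balance of a sum/sum loss.
balancing_weight = 1.0 / elements

print(elements)          # 100352
print(balancing_weight)  # roughly 1e-05
```

Under this assumed batch size, the factor 0.0001 from the answer is within about one order of magnitude of the balancing weight, which is consistent with the imbalance explanation.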

kuhbmx9i3#

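Yes, try different weighting factors for the KLD loss term. Reducing the weight of the KLD loss term solved the same identical-reconstruction problem on the CelebA dataset (http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html).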
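A weighting factor of this kind can be exposed as a parameter. The sketch below is one possible form; `weighted_vae_loss`, `kld_weight` and the latent size of 20 are hypothetical names and values, not something taken from the answers, and a suitable weight has to be found by experiment.

```python
import torch
import torch.nn.functional as F


def weighted_vae_loss(recon_x, x, mu, logvar, kld_weight=1.0):
    """Return the total loss plus its two parts so each can be monitored."""
    flat_x = x.view(x.size(0), -1)
    # Reconstruction term, summed per sample and averaged over the batch.
    recon = F.binary_cross_entropy(recon_x, flat_x, reduction="sum") / x.size(0)
    # KL term, reduced in the same way.
    kld = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp()) / x.size(0)
    return recon + kld_weight * kld, recon, kld


# Dummy tensors standing in for a batch and the model outputs.
torch.manual_seed(0)
x = torch.rand(4, 1, 28, 28)
recon_x = torch.sigmoid(torch.randn(4, 784))
mu = torch.randn(4, 20)
logvar = torch.randn(4, 20)

for w in (0.0, 0.5, 1.0):
    total, recon, kld = weighted_vae_loss(recon_x, x, mu, logvar, kld_weight=w)
    print(f"weight={w}: total={total.item():.2f} "
          f"recon={recon.item():.2f} kld={kld.item():.2f}")
```

Returning the two parts separately makes it easier to see, while sweeping the weight, whether the KLD term or the reconstruction term is driving the total.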

bakd9h0s4#

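There are many possible reasons for this. As benjaminplanche said, you need to use the .mean reduction instead of .sum. In addition, the right KLD term weight can differ between architectures and datasets, so try different weights and look at the reconstruction loss and the latent space to decide. There is a trade-off between the reconstruction loss (output quality) and the KLD term, which pushes the model toward a Gaussian-shaped latent space.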
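For reference, the KLD term in the question's loss is the closed-form KL divergence between the encoder's diagonal Gaussian and a standard normal prior. The latent dimension d and the per-dimension indices j are symbols introduced here for illustration:

```latex
D_{\mathrm{KL}}\big(\mathcal{N}(\mu, \operatorname{diag}(\sigma^2)) \,\|\, \mathcal{N}(0, I)\big)
  = -\frac{1}{2} \sum_{j=1}^{d} \left(1 + \log \sigma_j^2 - \mu_j^2 - \sigma_j^2\right)
```

With logvar standing for log σ², this matches `-0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())`. The expression is zero only when every μ_j is 0 and every σ_j² is 1, which is why a heavily weighted KLD pulls all encodings toward the same prior.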
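To evaluate different aspects of VAEs, I trained a vanilla autoencoder and VAEs with different KLD term weights. Note that I used the MNIST handwritten digits dataset to train networks with an input size of 784 = 28*28 and a latent size of 30 dimensions. Although a Sigmoid activation is the usual choice for data samples in the [0, 1] range, I used Tanh for experimental reasons.
Vanilla autoencoder: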

Autoencoder(
  (encoder): Encoder(
    (nn): Sequential(
      (0): Linear(in_features=784, out_features=30, bias=True)
    )
  )
  (decoder): Decoder(
    (nn): Sequential(
      (0): Linear(in_features=30, out_features=784, bias=True)
      (1): Tanh()
    )
  )
)

I then implemented the VAE model as shown in the code block below. I trained this model with different KLD weights from the set {0.5, 1, 5}.

import torch
import torch.nn as nn


class VAE(nn.Module):

    def __init__(self, dim_latent_representation=2):
        super(VAE, self).__init__()

        class Encoder(nn.Module):
            def __init__(self, output_size=2):
                super(Encoder, self).__init__()
                # Single linear layer: flattened 28x28 image -> latent features
                self.nn = nn.Sequential(
                    nn.Linear(28 * 28, output_size),
                )

            def forward(self, x):
                return self.nn(x)

        class Decoder(nn.Module):
            def __init__(self, input_size=2):
                super(Decoder, self).__init__()
                # Single linear layer followed by Tanh: latent vector -> 784 pixels
                self.nn = nn.Sequential(
                    nn.Linear(input_size, 28 * 28),
                    nn.Tanh(),
                )

            def forward(self, z):
                return self.nn(z)

        self.dim_latent_representation = dim_latent_representation
        self.encoder = Encoder(output_size=dim_latent_representation)
        # Separate heads for the mean and the log-variance of q(z|x)
        self.mu_layer = nn.Linear(self.dim_latent_representation, self.dim_latent_representation)
        self.logvar_layer = nn.Linear(self.dim_latent_representation, self.dim_latent_representation)
        self.decoder = Decoder(input_size=dim_latent_representation)

    def reparameterise(self, mu, logvar):
        if self.training:
            # Sample z = mu + std * eps with eps ~ N(0, I)
            std = torch.exp(0.5 * logvar)
            eps = torch.randn_like(std)
            return mu + eps * std
        else:
            # At evaluation time, use the mean directly
            return mu

    def forward(self, x):
        x = self.encoder(x)
        mu, logvar = self.mu_layer(x), self.logvar_layer(x)
        z = self.reparameterise(mu, logvar)
        return self.decoder(z), mu, logvar
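The `reparameterise` method implements the reparameterization trick: rather than sampling z directly, it samples noise and then scales and shifts it, so gradients can still reach `mu` and `logvar`. The standalone sketch below checks that behaviour numerically; the values chosen for `mu` and `logvar` are made up for the demonstration.

```python
import torch


def reparameterise(mu, logvar):
    """Draw z = mu + sigma * eps with eps ~ N(0, I)."""
    std = torch.exp(0.5 * logvar)  # sigma recovered from the log-variance
    eps = torch.randn_like(std)    # noise with the same shape, dtype and device
    return mu + eps * std


torch.manual_seed(0)
mu = torch.full((100_000,), 2.0)
logvar = torch.full((100_000,), -2.0)  # sigma = exp(-1), about 0.368
z = reparameterise(mu, logvar)

# The sample statistics should be close to mu and sigma.
print(z.mean().item(), z.std().item())
```

The sample mean and standard deviation should land close to 2.0 and 0.368 respectively, up to sampling noise.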

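Vanilla autoencoder
Training loss: 0.4089
Validation loss (reconstruction error): 0.4171

VAE loss = MSE + 0.5 * KLD
Training loss: 0.6420
Validation loss (reconstruction error): 0.6060

VAE loss = MSE + 1 * KLD
Training loss: 0.6821
Validation loss (reconstruction error): 0.6550

VAE loss = MSE + 5 * KLD
Training loss: 0.7122
Validation loss (reconstruction error): 0.7154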

Here you can see the outputs of the different models. I also used sklearn.manifold.TSNE to visualize the 30-dimensional latent space in 2D.
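The answer does not include its visualization code, so the following is only a sketch of how such a projection could be produced with `sklearn.manifold.TSNE`. Random clustered vectors stand in for the encoder outputs, and the number of samples, the perplexity and the `init` setting are assumptions.

```python
import numpy as np
from sklearn.manifold import TSNE

# Stand-in for encoder outputs: 300 random 30-D vectors in three loose groups.
rng = np.random.default_rng(0)
centers = rng.normal(scale=5.0, size=(3, 30))
labels = np.repeat(np.arange(3), 100)
latents = centers[labels] + rng.normal(size=(300, 30))

# Project to 2-D; perplexity must be smaller than the number of samples.
tsne = TSNE(n_components=2, perplexity=30, init="pca", random_state=0)
embedding = tsne.fit_transform(latents)

print(embedding.shape)  # (300, 2)
```

In practice `latents` would be the `mu` vectors produced by the trained encoder on a validation set, and the two columns of `embedding` could be drawn as a scatter plot coloured by digit label.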

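We observe a low loss value for the vanilla autoencoder with the 30-D bottleneck, which leads to high-quality reconstructed images. Although the loss values increase for the VAEs, the VAE arranges the latent space so that the gaps between the latent representations of different classes shrink. This means the latent representations can be manipulated more readily. Because the VAE follows an isotropic multivariate normal distribution in the latent space, we can generate new, unseen images by sampling from the latent space, with higher quality than with the vanilla autoencoder. However, the reconstruction quality decreases (the loss values increase), because the loss function being optimized is a weighted combination of the MSE and KLD terms. As we increase the KLD weight, we obtain a more compact latent space that is closer to the prior distribution, at the cost of reconstruction quality.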
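Generating new images from such a latent space amounts to drawing z from the prior N(0, I) and passing it through the decoder. The sketch below uses an untrained stand-in with the same layer shapes as the decoder in this answer, so its outputs are noise; with a trained decoder the same steps would give digit-like images. The sample count of 16 is arbitrary.

```python
import torch
import torch.nn as nn

latent_dim = 30  # matches the 30-D latent size used in the experiments

# Untrained stand-in with the same layer shapes as the answer's decoder.
decoder = nn.Sequential(nn.Linear(latent_dim, 28 * 28), nn.Tanh())
decoder.eval()

with torch.no_grad():
    z = torch.randn(16, latent_dim)          # samples from the prior N(0, I)
    images = decoder(z).view(-1, 1, 28, 28)  # 16 generated 28x28 images

print(images.shape)  # torch.Size([16, 1, 28, 28])
```

Because of the Tanh output, the generated pixel values lie in [-1, 1] and would need rescaling before being displayed as [0, 1] images.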
