I am trying to train a model with the following architecture:
self.lstm1 = nn.LSTM(in_channels, hidden_channels, num_layers, batch_first=True, dropout=dropout_prob)
self.batchnorm1 = nn.BatchNorm1d(hidden_channels)
self.lstm2 = nn.LSTM(hidden_channels, hidden_channels, num_layers, batch_first=True, dropout=dropout_prob)
self.batchnorm2 = nn.BatchNorm1d(hidden_channels)
self.lstm3 = nn.LSTM(hidden_channels, hidden_channels, num_layers, batch_first=True, dropout=dropout_prob)
self.batchnorm3 = nn.BatchNorm1d(hidden_channels)
self.fc1 = nn.Linear(hidden_channels, out_channels)
When I try to train the network, I get this error as soon as execution reaches the batchnorm1 layer:
RuntimeError: running_mean should contain 770 elements not 128
Can you tell me where the mistake is?
I tried using permute to change the output dimensions from (32, 770, 128) to (32, 128, 770), but I still get a different error.
1 Answer
The batch normalization is applied before the LSTM layer (regardless of the order in which you add them to the network), so you should set:
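As a minimal sketch of one way to resolve the shape mismatch (not necessarily the exact code this answer proposed): `nn.BatchNorm1d(hidden_channels)` normalizes dimension 1 of a `(batch, channels, length)` input, whereas an LSTM with `batch_first=True` returns `(batch, seq_len, hidden)`, so 770 lands where 128 is expected. Permuting before each batch norm and permuting back afterwards keeps the layout every layer expects. Forgetting to permute back would hand the next LSTM a `(32, 128, 770)` tensor, which may be the source of the second error mentioned in the question. The class name, the `forward` method, the `_norm` helper, the layer sizes, and the use of the last time step are all assumptions for illustration; only the layer definitions come from the question.

```python
import torch
import torch.nn as nn


class LSTMWithBatchNorm(nn.Module):
    """Hypothetical wrapper around the layers listed in the question."""

    def __init__(self, in_channels, hidden_channels, out_channels,
                 num_layers=2, dropout_prob=0.1):
        super().__init__()
        self.lstm1 = nn.LSTM(in_channels, hidden_channels, num_layers,
                             batch_first=True, dropout=dropout_prob)
        self.batchnorm1 = nn.BatchNorm1d(hidden_channels)
        self.lstm2 = nn.LSTM(hidden_channels, hidden_channels, num_layers,
                             batch_first=True, dropout=dropout_prob)
        self.batchnorm2 = nn.BatchNorm1d(hidden_channels)
        self.lstm3 = nn.LSTM(hidden_channels, hidden_channels, num_layers,
                             batch_first=True, dropout=dropout_prob)
        self.batchnorm3 = nn.BatchNorm1d(hidden_channels)
        self.fc1 = nn.Linear(hidden_channels, out_channels)

    @staticmethod
    def _norm(bn, x):
        # x arrives as (batch, seq_len, hidden); BatchNorm1d expects
        # (batch, hidden, seq_len), so permute in and permute back out.
        return bn(x.permute(0, 2, 1)).permute(0, 2, 1)

    def forward(self, x):
        x, _ = self.lstm1(x)
        x = self._norm(self.batchnorm1, x)
        x, _ = self.lstm2(x)
        x = self._norm(self.batchnorm2, x)
        x, _ = self.lstm3(x)
        x = self._norm(self.batchnorm3, x)
        # Assumption: the prediction is made from the last time step only.
        return self.fc1(x[:, -1, :])


# Assumed sizes: 16 input features, 4 outputs; 32 x 770 x 128 as in the question.
model = LSTMWithBatchNorm(in_channels=16, hidden_channels=128, out_channels=4)
out = model(torch.randn(32, 770, 16))
print(out.shape)  # torch.Size([32, 4])
```

With this layout, each `BatchNorm1d` computes statistics per hidden feature across both the batch and the time steps.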