I have the following code.
import keras

x = keras.layers.Input(batch_shape=(None, 4096))
hidden = keras.layers.Dense(512, activation='relu')(x)
hidden = keras.layers.BatchNormalization()(hidden)
hidden = keras.layers.Dropout(0.5)(hidden)
predictions = keras.layers.Dense(80, activation='sigmoid')(hidden)
# Model() takes the keyword arguments `inputs`/`outputs`
# (`input`/`output` only worked in very old Keras versions)
mlp_model = keras.models.Model(inputs=[x], outputs=[predictions])
mlp_model.summary()
Here is the model summary:
____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to
====================================================================================================
input_3 (InputLayer)             (None, 4096)          0
____________________________________________________________________________________________________
dense_1 (Dense)                  (None, 512)           2097664     input_3[0][0]
____________________________________________________________________________________________________
batchnormalization_1 (BatchNorma (None, 512)           2048        dense_1[0][0]
____________________________________________________________________________________________________
dropout_1 (Dropout)              (None, 512)           0           batchnormalization_1[0][0]
____________________________________________________________________________________________________
dense_2 (Dense)                  (None, 80)            41040       dropout_1[0][0]
====================================================================================================
Total params: 2,140,752
Trainable params: 2,139,728
Non-trainable params: 1,024
____________________________________________________________________________________________________
The input to the BatchNormalization (BN) layer is of size 512. According to the Keras documentation, the output shape of the BN layer is the same as its input: 512. So how does the BN layer end up with 2048 parameters?
2 Answers

Answer 1:
These 2048 parameters are in fact [gamma weights, beta weights, moving_mean (non-trainable), moving_variance (non-trainable)], each holding 512 elements (the size of the input layer).
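As a quick check, here is a minimal sketch (assuming the `mlp_model` built above) that lists the BN layer's weight arrays and confirms the four 512-element vectors. At inference time these are combined as y = gamma * (x - moving_mean) / sqrt(moving_variance + epsilon) + beta.

import numpy as np

bn_layer = mlp_model.layers[2]  # in this model, index 2 is the BatchNormalization layer

# Trainable weights: gamma and beta (2 * 512 = 1,024 parameters)
for w in bn_layer.trainable_weights:
    print(w.name, w.shape)

# Non-trainable weights: moving_mean and moving_variance (2 * 512 = 1,024)
for w in bn_layer.non_trainable_weights:
    print(w.name, w.shape)

# All four arrays together: 4 * 512 = 2,048, matching the summary
print(sum(int(np.prod(w.shape)) for w in bn_layer.weights))  # 2048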
Answer 2:

Batch normalization in Keras implements this paper (Ioffe & Szegedy, 2015). As you can see there, for batch normalization to work during training, they need to keep track of the distribution of each normalized dimension. To do so, since you are in mode=0 by default, they compute 4 parameters per feature of the previous layer. Those parameters make sure you propagate and back-propagate the information correctly. 4 * 512 = 2048, which should answer your question.
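To tie this back to the full summary, here is a minimal sketch that reproduces every parameter count by hand (plain arithmetic; the layer names refer to the summary above):

dense_1 = 4096 * 512 + 512  # weights + biases = 2,097,664
bn = 4 * 512                # gamma, beta, moving_mean, moving_variance = 2,048
dense_2 = 512 * 80 + 80     # weights + biases = 41,040

total = dense_1 + bn + dense_2     # 2,140,752
non_trainable = 2 * 512            # the moving statistics are not trained: 1,024
trainable = total - non_trainable  # 2,139,728
print(total, trainable, non_trainable)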