I am trying to build an encoder-decoder model for text generation, using LSTM layers together with an Embedding layer. I am running into a problem with the output going from the Embedding layer into the LSTM encoder layer. The error I get is:
ValueError: Input 0 of layer lstm is incompatible with the layer: expected ndim=3, found ndim=4. Full shape received: (None, 13, 128, 512)
My encoder data has the shape (40, 13, 128) = (num_observations, max_encoder_seq_length, vocab_size), and my embedding size / latent dim is 512.
My question is: how can I "get rid of" the fourth dimension going from the Embedding layer into the LSTM encoder layer, or, in other words, how should I pass these 4 dimensions into the LSTM layer of the encoder model? Since I am new to this topic, what would I eventually also have to correct in the decoder LSTM layer?
I have read several posts, including this, this one and many others, but could not find a solution. It seems to me that the problem is not in the model but in the shape of the data. Any hint or comment on what might be wrong would be greatly appreciated. Many thanks.
My model, taken from this tutorial, looks as follows:
from tensorflow.keras.layers import Input, Embedding, LSTM, Dense
from tensorflow.keras.models import Model

encoder_inputs = Input(shape=(max_encoder_seq_length,))
x = Embedding(num_encoder_tokens, latent_dim)(encoder_inputs)
x, state_h, state_c = LSTM(latent_dim, return_state=True)(x)
encoder_states = [state_h, state_c]

# Set up the decoder, using `encoder_states` as initial state.
decoder_inputs = Input(shape=(max_decoder_seq_length,))
x = Embedding(num_decoder_tokens, latent_dim)(decoder_inputs)
x = LSTM(latent_dim, return_sequences=True)(x, initial_state=encoder_states)
decoder_outputs = Dense(num_decoder_tokens, activation='softmax')(x)

# Define the model that will turn
# `encoder_input_data` & `decoder_input_data` into `decoder_target_data`
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
model.summary()

# Compile & run training
model.compile(optimizer='rmsprop', loss='categorical_crossentropy')
# Note that `decoder_target_data` needs to be one-hot encoded,
# rather than sequences of integers like `decoder_input_data`!
model.fit([encoder_input_data, decoder_input_data],
          decoder_target_data,
          batch_size=batch_size,
          epochs=epochs,
          shuffle=True,
          validation_split=0.05)
My model summary looks as follows:
Model: "functional_1"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to
==================================================================================================
input_1 (InputLayer)            [(None, 13)]         0
__________________________________________________________________________________________________
input_2 (InputLayer)            [(None, 15)]         0
__________________________________________________________________________________________________
embedding (Embedding)           (None, 13, 512)      65536       input_1[0][0]
__________________________________________________________________________________________________
embedding_1 (Embedding)         (None, 15, 512)      65536       input_2[0][0]
__________________________________________________________________________________________________
lstm (LSTM)                     [(None, 512), (None, 2099200     embedding[0][0]
                                 512), (None, 512)]
__________________________________________________________________________________________________
lstm_1 (LSTM)                   (None, 15, 512)      2099200     embedding_1[0][0]
                                                                 lstm[0][1]
                                                                 lstm[0][2]
__________________________________________________________________________________________________
dense (Dense)                   (None, 15, 128)      65664       lstm_1[0][0]
==================================================================================================
Total params: 4,395,136
Trainable params: 4,395,136
Non-trainable params: 0
__________________________________________________________________________________________________
**EDIT**
I am formatting the data in the following way:
for i, text in enumerate(input_texts):
    words = text.split()  # text is a sentence
    for t, word in enumerate(words):
        encoder_input_data[i, t, input_dict[word]] = 1.
so that a command like decoder_input_data[:2] gives:
array([[[0., 1., 0., ..., 0., 0., 0.],
[0., 0., 1., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
...,
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.]],
[[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 1., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
...,
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.]]], dtype=float32)
1 Answer
I am not sure what inputs and outputs you are passing to the model, but this works; note the shapes of the encoder and decoder inputs I am passing in. Your inputs need to have these shapes for the model to run. Sequence data (text) needs to be passed to the inputs as label-encoded sequences. This can be done with the TextVectorization layer from Keras. Read up here on how to prepare text data for Embedding and LSTM layers.
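For example, a minimal sketch of that kind of preprocessing with TextVectorization could look like the following; the toy corpus, the startseq/endseq tokens and the variable names are only illustrative assumptions, not the question's actual data:

import numpy as np
from tensorflow.keras.layers import TextVectorization
# (on TF < 2.6 the layer lives under tf.keras.layers.experimental.preprocessing)

# Toy corpus, purely illustrative; substitute your own input_texts / target_texts.
input_texts = ["the cat sat on the mat", "dogs chase cats"]
target_texts = ["startseq le chat est sur le tapis endseq",
                "startseq les chiens chassent les chats endseq"]

max_encoder_seq_length = 13
max_decoder_seq_length = 15

# TextVectorization turns raw strings into padded integer (label-encoded)
# sequences of shape (num_samples, max_seq_length), which is exactly the
# 2-D input the Embedding layers expect.
encoder_vectorizer = TextVectorization(output_mode="int",
                                       output_sequence_length=max_encoder_seq_length)
decoder_vectorizer = TextVectorization(output_mode="int",
                                       output_sequence_length=max_decoder_seq_length)
encoder_vectorizer.adapt(np.array(input_texts))
decoder_vectorizer.adapt(np.array(target_texts))

encoder_input_data = encoder_vectorizer(np.array(input_texts)).numpy()
decoder_input_data = decoder_vectorizer(np.array(target_texts)).numpy()

# Vocabulary sizes for the Embedding layers (index 0 = padding, index 1 = OOV).
num_encoder_tokens = len(encoder_vectorizer.get_vocabulary())
num_decoder_tokens = len(decoder_vectorizer.get_vocabulary())

print(encoder_input_data.shape)  # (num_samples, 13) -- integer indices, not one-hot
print(decoder_input_data.shape)  # (num_samples, 15)

With 2-D integer inputs like these, each Embedding layer outputs a 3-D tensor of shape (batch, seq_length, latent_dim), which is what the LSTM layers expect. Only decoder_target_data still needs to be one-hot encoded for the categorical_crossentropy loss, or you can keep integer targets and switch to sparse_categorical_crossentropy.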