I am trying to implement an attention mechanism in Keras for machine translation with an LSTM network.
However, I am getting a TypeError from my code:
TypeError: Exception encountered when calling layer "tf.keras.backend.rnn_1" (type TFOpLambda).
You are passing KerasTensor(type_spec=TensorSpec(shape=(None, 35), dtype=tf.float32, name=None), name='tf.compat.v1.nn.softmax_3/Softmax:0', description="created by layer 'tf.compat.v1.nn.softmax_3'"), an intermediate Keras symbolic input/output, to a TF API that does not allow registering custom dispatchers, such as `tf.cond`, `tf.function`, gradient tapes, or `tf.map_fn`. Keras Functional model construction only supports TF API calls that *do* support dispatching, such as `tf.math.add` or `tf.reshape`. Other APIs cannot be called directly on symbolic Keras inputs/outputs. You can work around this limitation by putting the operation in a custom Keras layer `call` and calling that layer on this symbolic input/output.
Does anyone know what this means?
The main code is below; it fails at the line attention_result, attention_weights = attention_layer([encoder_outputs1, decoder_outputs]):
# (Imports assumed for completeness; the original post does not show them.)
from tensorflow.keras.layers import Input, Embedding, LSTM, Bidirectional, Concatenate, Dense
from tensorflow.keras.models import Model
from attention import AttentionLayer  # the custom attention layer shown below

# Encoder
encoder_inputs = Input(shape=(max_length_english,))
enc_emb = Embedding(vocab_size_source, 1024, trainable=True)(encoder_inputs)

# Bidirectional LSTM layer
enc_lstm1 = Bidirectional(LSTM(256, return_sequences=True, return_state=True))
encoder_outputs1, forw_state_h, forw_state_c, back_state_h, back_state_c = enc_lstm1(enc_emb)

final_enc_h = Concatenate()([forw_state_h, back_state_h])
final_enc_c = Concatenate()([forw_state_c, back_state_c])
encoder_states = [final_enc_h, final_enc_c]

# Set up the decoder.
decoder_inputs = Input(shape=(None,))
dec_emb_layer = Embedding(vocab_size_target, 1024, trainable=True)
dec_emb = dec_emb_layer(decoder_inputs)

# Decoder LSTM using encoder_states as its initial state
decoder_lstm = LSTM(512, return_sequences=True, return_state=True)
decoder_outputs, _, _ = decoder_lstm(dec_emb, initial_state=encoder_states)

# Attention layer (this is where it fails)
attention_layer = AttentionLayer()
attention_result, attention_weights = attention_layer([encoder_outputs1, decoder_outputs])

# Concatenate attention output and decoder LSTM output
decoder_concat_input = Concatenate(axis=-1, name='concat_layer')([decoder_outputs, attention_result])

# Dense output layer
decoder_dense = Dense(vocab_size_target, activation='softmax')
decoder_outputs = decoder_dense(decoder_concat_input)

# Define the model
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
The relevant code in attention.py is:
def call(self, inputs, verbose=False):
    """
    inputs: [encoder_output_sequence, decoder_output_sequence]
    """
    assert type(inputs) == list
    encoder_out_seq, decoder_out_seq = inputs
    if verbose:
        print('encoder_out_seq>', encoder_out_seq.shape)
        print('decoder_out_seq>', decoder_out_seq.shape)

    def energy_step(inputs, states):
        """ Step function for computing energy for a single decoder state """
        assert_msg = "States must be a list. However states {} is of type {}".format(states, type(states))
        assert isinstance(states, list) or isinstance(states, tuple), assert_msg

        """ Some parameters required for shaping tensors """
        en_seq_len, en_hidden = encoder_out_seq.shape[1], encoder_out_seq.shape[2]
        de_hidden = inputs.shape[-1]

        """ Computing S.Wa where S=[s0, s1, ..., si] """
        # <= batch_size*en_seq_len, latent_dim
        reshaped_enc_outputs = K.reshape(encoder_out_seq, (-1, en_hidden))
        # <= batch_size, en_seq_len, latent_dim
        W_a_dot_s = K.reshape(K.dot(reshaped_enc_outputs, self.W_a), (-1, en_seq_len, en_hidden))
        if verbose:
            print('wa.s>', W_a_dot_s.shape)

        """ Computing hj.Ua """
        U_a_dot_h = K.expand_dims(K.dot(inputs, self.U_a), 1)  # <= batch_size, 1, latent_dim
        if verbose:
            print('Ua.h>', U_a_dot_h.shape)

        """ tanh(S.Wa + hj.Ua) """
        # <= batch_size*en_seq_len, latent_dim
        reshaped_Ws_plus_Uh = K.tanh(K.reshape(W_a_dot_s + U_a_dot_h, (-1, en_hidden)))
        if verbose:
            print('Ws+Uh>', reshaped_Ws_plus_Uh.shape)

        """ softmax(va.tanh(S.Wa + hj.Ua)) """
        # <= batch_size, en_seq_len
        e_i = K.reshape(K.dot(reshaped_Ws_plus_Uh, self.V_a), (-1, en_seq_len))
        # <= batch_size, en_seq_len
        e_i = K.softmax(e_i)
        if verbose:
            print('ei>', e_i.shape)
        return e_i, [e_i]

    def context_step(inputs, states):
        """ Step function for computing ci using ei """
        # <= batch_size, hidden_size
        c_i = K.sum(encoder_out_seq * K.expand_dims(inputs, -1), axis=1)
        if verbose:
            print('ci>', c_i.shape)
        return c_i, [c_i]

    def create_inital_state(inputs, hidden_size):
        # We are not using initial states, but need to pass something to the K.rnn function
        fake_state = K.zeros_like(inputs)  # <= (batch_size, enc_seq_len, latent_dim)
        fake_state = K.sum(fake_state, axis=[1, 2])  # <= (batch_size,)
        fake_state = K.expand_dims(fake_state)  # <= (batch_size, 1)
        fake_state = K.tile(fake_state, [1, hidden_size])  # <= (batch_size, hidden_size)
        return fake_state

    fake_state_c = create_inital_state(encoder_out_seq, encoder_out_seq.shape[-1])
    fake_state_e = create_inital_state(encoder_out_seq, encoder_out_seq.shape[1])  # <= (batch_size, enc_seq_len)

    """ Computing energy outputs """
    # e_outputs => (batch_size, de_seq_len, en_seq_len)
    last_out, e_outputs, _ = K.rnn(
        energy_step, decoder_out_seq, [fake_state_e],
    )

    """ Computing context vectors """
    last_out, c_outputs, _ = K.rnn(
        context_step, e_outputs, [fake_state_c],
    )

    return c_outputs, e_outputs
But it fails here:
""" Computing energy outputs """
# e_outputs => (batch_size, de_seq_len, en_seq_len)
last_out, e_outputs, _ = K.rnn(
energy_step, decoder_out_seq, [fake_state_e],
)
If anyone knows how to fix this and work around the limitation, please advise. Many thanks.
3 Answers

Answer 1:
It looks like you are using the attention layer provided in this repository: https://github.com/thushv89/attention_keras/blob/master/src/layers/attention.py
If so, it is clear from the issues section that the author was not able to resolve the problem: https://github.com/thushv89/attention_keras/issues/59
I faced a similar issue for a while, and then decided to switch to the attention layers that Keras itself provides.
If you are looking for additive (Bahdanau-style) attention, change the attention layer code to use the built-in layer:
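A minimal sketch of that swap, assuming the encoder_outputs1 and decoder_outputs tensors from the model above and a TensorFlow version that supports return_attention_scores (>= 2.4); the exact snippet from the original answer is not reproduced here:

from tensorflow.keras.layers import AdditiveAttention

# Bahdanau-style attention from Keras.
# Note the argument order: the built-in layer expects [query, value],
# i.e. [decoder_outputs, encoder_outputs1], the reverse of the custom layer.
attention_layer = AdditiveAttention(name='attention_layer')
attention_result, attention_weights = attention_layer(
    [decoder_outputs, encoder_outputs1], return_attention_scores=True)

# The rest of the model (the Concatenate and Dense layers) stays unchanged.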
For any further help, check the documentation here: https://keras.io/api/layers/attention_layers/additive_attention/
If you want to use any other attention variant, you can also look at the other options Keras provides (a sketch of one follows below): https://keras.io/api/layers/attention_layers/
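For instance, the dot-product (Luong-style) variant is a similar one-line swap; a hypothetical sketch using the same tensors as above:

from tensorflow.keras.layers import Attention

# Dot-product (Luong-style) attention; same [query, value] calling convention
attention_result, attention_weights = Attention(name='attention_layer')(
    [decoder_outputs, encoder_outputs1], return_attention_scores=True)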
Answer 2:
I found a workaround while using the attention layer provided in the example repository: https://github.com/thushv89/attention_keras/blob/master/src/layers/attention.py
The approach I found is to remove the tensorflow.python.keras backend module that you import as K, as in the example below.
The imports before and after the change are shown below.
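One plausible reading of that change, sketched under the assumption that the layer file imported the private tensorflow.python.keras modules (as the repository does); the original before/after snippets are not shown in this copy:

# Before: imports from the private tensorflow.python.keras package
from tensorflow.python.keras.layers import Layer
from tensorflow.python.keras import backend as K

# After: the same names imported from the public tf.keras API
from tensorflow.keras.layers import Layer
from tensorflow.keras import backend as K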
This seems to resolve the compatibility problem between the attention layer used in the repository and TensorFlow versions above 2.5.
It is best to also replace any other tensorflow.python.keras dependencies in your code.

Answer 3:
Use this:
[snippet not preserved]
instead of:
[snippet not preserved]
This worked for me during the modeling step above.