Bypassing the Keras TypeError limitation when calling layer "tf.keras.backend.rnn_1"

sxissh06  asked 11 months ago  in: Other
Follow (0) | Answers (3) | Views (86)

I am trying to use an attention mechanism with Keras for machine translation with an LSTM network.
However, I am getting a TypeError in my code.

TypeError: Exception encountered when calling layer "tf.keras.backend.rnn_1" (type TFOpLambda).

You are passing KerasTensor(type_spec=TensorSpec(shape=(None, 35), dtype=tf.float32, name=None), name='tf.compat.v1.nn.softmax_3/Softmax:0', description="created by layer 'tf.compat.v1.nn.softmax_3'"), an intermediate Keras symbolic input/output, to a TF API that does not allow registering custom dispatchers, such as `tf.cond`, `tf.function`, gradient tapes, or `tf.map_fn`. Keras Functional model construction only supports TF API calls that *do* support dispatching, such as `tf.math.add` or `tf.reshape`. Other APIs cannot be called directly on symbolic Kerasinputs/outputs. You can work around this limitation by putting the operation in a custom Keras layer `call` and calling that layer on this symbolic input/output.

Does anyone know what this means: "You can work around this limitation by putting the operation in a custom Keras layer call and calling that layer on this symbolic input/output."?
The main code is below; it fails at attention_result, attention_weights = attention_layer([encoder_outputs1, decoder_outputs]):

from tensorflow.keras.layers import Input, Embedding, Bidirectional, LSTM, Concatenate, Dense
from tensorflow.keras.models import Model
from attention import AttentionLayer  # AttentionLayer comes from attention.py (thushv89/attention_keras)

# Encoder
encoder_inputs = Input(shape=(max_length_english,))
enc_emb = Embedding(vocab_size_source, 1024, trainable=True)(encoder_inputs)

# Bidirectional lstm layer
enc_lstm1 = Bidirectional(LSTM(256,return_sequences=True,return_state=True))
encoder_outputs1, forw_state_h, forw_state_c, back_state_h, back_state_c = enc_lstm1(enc_emb)

final_enc_h = Concatenate()([forw_state_h,back_state_h])
final_enc_c = Concatenate()([forw_state_c,back_state_c])

encoder_states =[final_enc_h, final_enc_c]

# Set up the decoder. 
decoder_inputs = Input(shape=(None,)) 
dec_emb_layer = Embedding(vocab_size_target, 1024,trainable=True) 
dec_emb = dec_emb_layer(decoder_inputs)

#LSTM using encoder_states as initial state
decoder_lstm = LSTM(512, return_sequences=True, return_state=True) 
decoder_outputs, _, _ = decoder_lstm(dec_emb, initial_state=encoder_states)

#Attention Layer
attention_layer = AttentionLayer()
attention_result, attention_weights = attention_layer([encoder_outputs1, decoder_outputs])

# Concat attention output and decoder LSTM output 
decoder_concat_input = Concatenate(axis=-1, name='concat_layer')([decoder_outputs, attention_result])

#Dense layer
decoder_dense = Dense(vocab_size_target, activation='softmax')
decoder_outputs = decoder_dense(decoder_concat_input)

# Define the model
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)


The relevant code from attention.py is:

def call(self, inputs, verbose=False):
        """
        inputs: [encoder_output_sequence, decoder_output_sequence]
        """
        assert type(inputs) == list
        encoder_out_seq, decoder_out_seq = inputs
        if verbose:
            print('encoder_out_seq>', encoder_out_seq.shape)
            print('decoder_out_seq>', decoder_out_seq.shape)

        def energy_step(inputs, states):
            """ Step function for computing energy for a single decoder state """

            assert_msg = "States must be a list. However states {} is of type {}".format(states, type(states))
            assert isinstance(states, list) or isinstance(states, tuple), assert_msg

            """ Some parameters required for shaping tensors"""
            en_seq_len, en_hidden = encoder_out_seq.shape[1], encoder_out_seq.shape[2]
            de_hidden = inputs.shape[-1]

            """ Computing S.Wa where S=[s0, s1, ..., si]"""
            # <= batch_size*en_seq_len, latent_dim
            reshaped_enc_outputs = K.reshape(encoder_out_seq, (-1, en_hidden))
            # <= batch_size*en_seq_len, latent_dim
            W_a_dot_s = K.reshape(K.dot(reshaped_enc_outputs, self.W_a), (-1, en_seq_len, en_hidden))
            if verbose:
                print('wa.s>',W_a_dot_s.shape)

            """ Computing hj.Ua """
            U_a_dot_h = K.expand_dims(K.dot(inputs, self.U_a), 1)  # <= batch_size, 1, latent_dim
            if verbose:
                print('Ua.h>',U_a_dot_h.shape)

            """ tanh(S.Wa + hj.Ua) """
            # <= batch_size*en_seq_len, latent_dim
            reshaped_Ws_plus_Uh = K.tanh(K.reshape(W_a_dot_s + U_a_dot_h, (-1, en_hidden)))
            if verbose:
                print('Ws+Uh>', reshaped_Ws_plus_Uh.shape)

            """ softmax(va.tanh(S.Wa + hj.Ua)) """
            # <= batch_size, en_seq_len
            e_i = K.reshape(K.dot(reshaped_Ws_plus_Uh, self.V_a), (-1, en_seq_len))
            # <= batch_size, en_seq_len
            e_i = K.softmax(e_i)

            if verbose:
                print('ei>', e_i.shape)

            return e_i, [e_i]

        def context_step(inputs, states):
            """ Step function for computing ci using ei """
            # <= batch_size, hidden_size
            c_i = K.sum(encoder_out_seq * K.expand_dims(inputs, -1), axis=1)
            if verbose:
                print('ci>', c_i.shape)
            return c_i, [c_i]

        def create_inital_state(inputs, hidden_size):
            # We are not using initial states, but need to pass something to the K.rnn function
            fake_state = K.zeros_like(inputs)  # <= (batch_size, enc_seq_len, latent_dim)
            fake_state = K.sum(fake_state, axis=[1, 2])  # <= (batch_size,)
            fake_state = K.expand_dims(fake_state)  # <= (batch_size, 1)
            fake_state = K.tile(fake_state, [1, hidden_size])  # <= (batch_size, hidden_size)
            return fake_state

        fake_state_c = create_inital_state(encoder_out_seq, encoder_out_seq.shape[-1])  # <= (batch_size, latent_dim)
        fake_state_e = create_inital_state(encoder_out_seq, encoder_out_seq.shape[1])  # <= (batch_size, en_seq_len)

        """ Computing energy outputs """
        # e_outputs => (batch_size, de_seq_len, en_seq_len)
        last_out, e_outputs, _ = K.rnn(
            energy_step, decoder_out_seq, [fake_state_e],
        )

        """ Computing context vectors """
        last_out, c_outputs, _ = K.rnn(
            context_step, e_outputs, [fake_state_c],
        )

        return c_outputs, e_outputs


But it fails at:

""" Computing energy outputs """
        # e_outputs => (batch_size, de_seq_len, en_seq_len)
        last_out, e_outputs, _ = K.rnn(
            energy_step, decoder_out_seq, [fake_state_e],
        )


If anyone knows how to fix this or work around the limitation, please advise. Many thanks.

wbrvyc0a  1#

It looks like you are using the attention layer provided in this repository: https://github.com/thushv89/attention_keras/blob/master/src/layers/attention.py
If so, it is clear from the issues section that the author was not able to resolve this problem: https://github.com/thushv89/attention_keras/issues/59
I faced a similar issue for a while, and then decided to switch to the attention layers provided by Keras.
If you are looking for additive, Bahdanau-style attention, change the attention code to:

# Call attention using:
from tensorflow.keras.layers import AdditiveAttention

# Modify your code and pass decoder_outputs first and encoder_outputs1 second as parameters.
attention_result = AdditiveAttention(use_scale=True)([decoder_outputs, encoder_outputs1])
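
Wired into your model, and assuming you also want the attention weights, that would look roughly like the sketch below (return_attention_scores was added to the built-in attention layers in recent TF 2.x releases, so treat that flag as an assumption about your TF version):

from tensorflow.keras.layers import AdditiveAttention, Concatenate

# query = decoder hidden states, value = encoder hidden states
attention_result, attention_weights = AdditiveAttention(use_scale=True)(
    [decoder_outputs, encoder_outputs1], return_attention_scores=True
)

# The rest of the model stays as in your code.
decoder_concat_input = Concatenate(axis=-1, name='concat_layer')([decoder_outputs, attention_result])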

For any additional help, see the documentation here: https://keras.io/api/layers/attention_layers/additive_attention/
If you want to use any other attention variant, you can also check the other options Keras provides: https://keras.io/api/layers/attention_layers/
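
As for what the error text itself means: Keras is asking you to move calls to non-dispatchable TF APIs (such as tf.map_fn, tf.cond, or anything built on tf.function) out of model-construction code and into the call method of a subclassed layer, where the inputs are ordinary tensors rather than symbolic KerasTensors. A minimal, generic sketch of that pattern (illustrative only; MapFnLayer is a made-up name, not part of Keras):

import tensorflow as tf
from tensorflow.keras.layers import Layer

class MapFnLayer(Layer):
    """Wraps tf.map_fn so it can be applied to symbolic Keras inputs."""
    def __init__(self, fn, **kwargs):
        super().__init__(**kwargs)
        self.fn = fn

    def call(self, inputs):
        # Inside call(), inputs are regular tensors, so APIs that cannot
        # be dispatched on KerasTensors are allowed here.
        return tf.map_fn(self.fn, inputs)

# usage: y = MapFnLayer(lambda row: tf.nn.softmax(row))(some_keras_tensor)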

2izufjch  2#

I found a workaround while using the attention layer provided in the same repository: https://github.com/thushv89/attention_keras/blob/master/src/layers/attention.py
The fix is to swap out the tensorflow.python.keras backend module that you import as K, as shown in the example below.
Before:

from tensorflow.python.keras import backend as K

After:

from tensorflow.keras import backend as K


This seems to resolve the compatibility issue with the attention layer used in that repository when running TensorFlow > 2.5.
It is best to replace the other tensorflow.python.keras dependencies in your code as well.
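
As a quick smoke test of the change, you can build the layer on its own with shapes matching the model above (a sketch; the stand-in shapes and the "from attention import AttentionLayer" import path are assumptions about your project layout):

from tensorflow.keras.layers import Input
from attention import AttentionLayer  # the attention.py with the fixed imports

enc = Input(shape=(35, 512))    # stand-in for encoder_outputs1 (en_seq_len=35, 2*256 units)
dec = Input(shape=(None, 512))  # stand-in for decoder_outputs (512 units)
context, weights = AttentionLayer()([enc, dec])
print(context.shape, weights.shape)  # should now build without the TFOpLambda TypeError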

p5fdfcr1  3#

Use this:

from tensorflow.keras.layers import Layer 
from tensorflow.keras import backend as K

instead of:

from tensorflow.python.keras.layers import Layer  
from tensorflow.python.keras import backend as K


This worked for me in the modeling step above.
