tensorflow: using Attention in a CNN attention model

Asked by vmdwslir · 12 months ago

I am very new to Python and ML. I have a working CNN-LSTM model, shown below:

from tensorflow.keras.layers import (Input, Conv1D, BatchNormalization, Activation,
                                     Dropout, LSTM, TimeDistributed, Dense)
from tensorflow.keras.models import Model
from tensorflow.keras.regularizers import l1

dropout_rate_cnn = 0.3  # assumed value; not given in the question

def model_demo():
    inp = Input(shape=(6000, 3), name='input')

    # CNN block
    e = Conv1D(16, 9, strides=1, padding='same', activation='relu')(inp)
    e = BatchNormalization()(e)
    e = Activation('relu')(e)
    e = Dropout(dropout_rate_cnn)(e)

    # LSTM block
    e = LSTM(32, return_sequences=True, unroll=True)(e)
    e = Dropout(0.7)(e)
    e = BatchNormalization()(e)

    # Time-distributed classification head
    e = TimeDistributed(Dense(64, kernel_regularizer=l1(0.01), activation='relu'))(e)
    e = BatchNormalization()(e)
    e = Dropout(0.7)(e)

    e = TimeDistributed(Dense(1, kernel_regularizer=l1(0.01), activation='sigmoid'))(e)

    out_model = Model(inputs=inp, outputs=e)  # per-timestep output of shape (batch, 6000, 1)
    return out_model
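To sanity-check the definition, the model can be instantiated and inspected like this (a usage sketch):

model = model_demo()
model.summary()   # ends with a (batch, 6000, 1) sigmoid output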

I want to add the Attention layer provided by TensorFlow. I am not sure what the inputs to this attention layer should be, and whether I should build a CNN-LSTM-Attention model or replace the LSTM with Attention.
I tried adding Attention after the LSTM layer with the input [e[0], e[1]], but it contributed 0 parameters to the model and produced the following model summary (not sure whether this is the right approach):

 tf.__operators__.getitem (SlicingOpLambda)    (6000, 32)  0  ['lstm[0][0]']
 tf.__operators__.getitem_1 (SlicingOpLambda)  (6000, 32)  0  ['lstm[0][0]']
 attention (Attention)                         (6000, 32)  0  ['tf.__operators__.getitem[0][0]',
                                                                'tf.__operators__.getitem_1[0][0]']
 dropout_1 (Dropout)                           (6000, 32)  0  ['attention[0][0]']
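For reference, the wiring that produces a summary fragment like the one above looks roughly like this (a reconstruction of the attempt, assuming `import tensorflow as tf` and that `e` is the LSTM output of shape (batch, 6000, 32)):

# Indexing the tensor as e[0] and e[1] slices away the batch dimension,
# which is what creates the SlicingOpLambda layers in the summary; the usual
# self-attention call is Attention()([e, e]) on the full tensor.
attn = tf.keras.layers.Attention()([e[0], e[1]])
e = tf.keras.layers.Dropout(0.7)(attn)
# Attention() with the default use_scale=False has no trainable weights,
# which is why the layer contributes 0 parameters.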

Answer 1 (wqsoz72f):

As you noted, the arguments are described correctly in the documentation you linked.
In general, either an attention mechanism or an LSTM is used together with a CNN; I have not found anything about using both at the same time.
Attention tends to perform better than an LSTM, so you can replace the LSTM with attention.
Below is the example from the official Keras documentation for using a CNN together with Attention:

# Variable-length int sequences.
query_input = tf.keras.Input(shape=(None,), dtype='int32')
value_input = tf.keras.Input(shape=(None,), dtype='int32')

# Embedding lookup.
token_embedding = tf.keras.layers.Embedding(input_dim=1000, output_dim=64)
# Query embeddings of shape [batch_size, Tq, dimension].
query_embeddings = token_embedding(query_input)
# Value embeddings of shape [batch_size, Tv, dimension].
value_embeddings = token_embedding(value_input)

# CNN layer.
cnn_layer = tf.keras.layers.Conv1D(
    filters=100,
    kernel_size=4,
    # Use 'same' padding so outputs have the same shape as inputs.
    padding='same')
# Query encoding of shape [batch_size, Tq, filters].
query_seq_encoding = cnn_layer(query_embeddings)
# Value encoding of shape [batch_size, Tv, filters].
value_seq_encoding = cnn_layer(value_embeddings)

# Query-value attention of shape [batch_size, Tq, filters].
query_value_attention_seq = tf.keras.layers.Attention()(
    [query_seq_encoding, value_seq_encoding])

# Reduce over the sequence axis to produce encodings of shape
# [batch_size, filters].
query_encoding = tf.keras.layers.GlobalAveragePooling1D()(
    query_seq_encoding)
query_value_attention = tf.keras.layers.GlobalAveragePooling1D()(
    query_value_attention_seq)

# Concatenate query and document encodings to produce a DNN input layer.
input_layer = tf.keras.layers.Concatenate()(
    [query_encoding, query_value_attention])

# Add DNN layers, and create Model.
# ...
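Adapting this pattern to the model in the question, a minimal sketch that replaces the LSTM with a self-attention layer over the Conv1D features could look like the following (layer sizes are kept from model_demo(); the dropout rate after the CNN and the use_scale setting are assumptions, not part of the original question):

from tensorflow.keras.layers import (Input, Conv1D, BatchNormalization, Activation,
                                     Dropout, Attention, TimeDistributed, Dense)
from tensorflow.keras.models import Model
from tensorflow.keras.regularizers import l1

def model_cnn_attention():
    inp = Input(shape=(6000, 3), name='input')

    # CNN block (unchanged from model_demo)
    e = Conv1D(16, 9, strides=1, padding='same', activation='relu')(inp)
    e = BatchNormalization()(e)
    e = Activation('relu')(e)
    e = Dropout(0.3)(e)  # assumed dropout rate

    # Self-attention over the CNN feature sequence: query = value = e.
    # use_scale=True adds a single trainable scale weight, so the layer
    # no longer shows 0 parameters in the summary.
    e = Attention(use_scale=True)([e, e])

    # Time-distributed head (unchanged from model_demo)
    e = TimeDistributed(Dense(64, kernel_regularizer=l1(0.01), activation='relu'))(e)
    e = BatchNormalization()(e)
    e = Dropout(0.7)(e)
    e = TimeDistributed(Dense(1, kernel_regularizer=l1(0.01), activation='sigmoid'))(e)

    return Model(inputs=inp, outputs=e)  # per-timestep output of shape (batch, 6000, 1)

If you prefer to keep the LSTM, the same Attention()([e, e]) self-attention call can be placed after it instead; which variant performs better is something to check empirically on your data.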
