Seq2Seq Encoder-Decoder with Attention in Keras

Problem Description

I am trying to implement an attention mechanism on the In-Short dataset from Kaggle, but I am stuck on the input tensor for the decoder module. I am using GloVe for the word embeddings and have created two embedding matrices, one for the headlines and another for the summaries.

The link to the dataset is: Click Here
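For reference, an embedding matrix of this kind is usually built by copying each word's GloVe vector into the row given by the tokenizer index. Below is a minimal sketch; `news_tokenizer` and `glove_vectors` are assumed names that do not appear in the original post:

import numpy as np

# Sketch of building the news embedding matrix from GloVe (assumptions:
# `glove_vectors` maps word -> 300-d numpy vector, and `news_tokenizer` is a
# fitted keras Tokenizer; neither object is shown in the post).
embedding_dim = 300
news_vocab = len(news_tokenizer.word_index) + 1  # +1 for the padding index 0

embedding_matrix = np.zeros((news_vocab, embedding_dim))
for word, idx in news_tokenizer.word_index.items():
    vec = glove_vectors.get(word)
    if vec is not None:  # words missing from GloVe keep an all-zero row
        embedding_matrix[idx] = vec

The headline matrix (embedding_matrix1) would be built the same way from the headline tokenizer.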

The code is as follows:

# Assumed imports, not shown in the original snippet; AttentionLayer is a
# custom layer (e.g. the Bahdanau AttentionLayer from the attention_keras repo).
from tensorflow.keras.layers import (Input, Embedding, LSTM, Bidirectional,
                                     Concatenate, Dense, TimeDistributed)
from tensorflow.keras.models import Model
from tensorflow.keras import backend as K

print(max_len_news, max_len_headline)
K.clear_session()

embedding_dim = 300  # Size of the word embeddings.
latent_dim = 500

encoder_input = Input(shape=(max_len_news,))
encoder_emb = Embedding(news_vocab, embedding_dim, weights=[embedding_matrix], trainable=True)(encoder_input)  # Embedding layer

# Three stacked bidirectional LSTM layers for the encoder. return_state returns
# the final state vectors a(t) and c(t); return_sequences returns the
# per-timestep outputs y(t). With layers stacked one above the other, y(t) of
# the previous layer becomes x(t) of the next layer.
encoder_lstm1 = Bidirectional(LSTM(latent_dim, return_sequences=True, return_state=True, dropout=0.3, recurrent_dropout=0.2))
y_1, a_1, c_1, a_b1, c_b1 = encoder_lstm1(encoder_emb)

encoder_lstm2 = Bidirectional(LSTM(latent_dim, return_sequences=True, return_state=True, recurrent_dropout=0.2))
y_2, a_2, c_2, a_b2, c_b2 = encoder_lstm2(y_1)

encoder_lstm3 = Bidirectional(LSTM(latent_dim, return_sequences=True, return_state=True, recurrent_dropout=0.2))
encoder_output, a_enc, c_enc, a_b3, c_b3 = encoder_lstm3(y_2)

# Concatenate the forward and backward final states of the last encoder layer.
states_a = Concatenate(axis=1)([a_enc, a_b3])
states_c = Concatenate(axis=1)([c_enc, c_b3])

print(states_f.shape)

# Single LSTM layer for the decoder, followed by a Dense softmax layer to
# predict the next word of the summary.
decoder_input = Input(shape=(None,))
decoder_emb = Embedding(headline_vocab, embedding_dim, weights=[embedding_matrix1], trainable=True)(decoder_input)

decoder_lstm = LSTM(latent_dim, recurrent_dropout=0.2)
decoder_output, decoder_fwd, decoder_back = decoder_lstm(decoder_emb, initial_state=[states_a, states_c])  # Final output states of the encoder's last layer are fed into the decoder.

# Attention layer
attn_layer = AttentionLayer(name='attention_layer')
attn_out, attn_states = attn_layer([encoder_output, decoder_output])

decoder_concat_input = Concatenate(axis=-1, name='concat_layer')([decoder_output, attn_out])

decoder_dense = TimeDistributed(Dense(headline_vocab, activation='softmax'))
decoder_output = decoder_dense(decoder_concat_input)

model = Model([encoder_input, decoder_input], decoder_output)

model.summary()

The error message I get is as follows:

53 14
(None,2000)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-30-0e600101b74e> in <module>()
     30 
     31 decoder_lstm =(LSTM(latent_dim,recurrent_dropout=0.2))
---> 32 decoder_output,decoder_fwd,decoder_back = decoder_lstm(decoder_emb,initial_state=([states_a,states_c])) #Final output states of encoder last layer are fed into decoder.
     33 
     34 #Attention Layer

7 frames
/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/layers/recurrent.py in _validate_state_spec(cell_state_sizes,init_state_specs)
    633           cell_state_spec.shape[1:]).is_compatible_with(
    634               tensor_shape.TensorShape(cell_state_size)):
--> 635         raise validation_error
    636 
    637   @doc_controls.do_not_doc_inheritable

ValueError: An `initial_state` was passed that is not compatible with `cell.state_size`. Received `state_spec`=ListWrapper([InputSpec(shape=(None,1000),ndim=2),InputSpec(shape=(None,1000),ndim=2)]); however `cell.state_size` is [500,500]

Can anyone please help me figure out how to correct this?

Solution

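The ValueError states the mismatch directly: the decoder's `cell.state_size` is [500,500], but the initial states passed to it have shape (None,1000). Each encoder layer is a Bidirectional wrapper around LSTM(latent_dim) with latent_dim = 500, so after concatenating the forward and backward states, states_a and states_c are 1000-dimensional, which a plain LSTM(500) cannot accept as initial_state. One way to resolve this is to size the decoder to match the concatenated encoder states. A minimal sketch of the changed decoder block follows (the rest of the model stays as above); it also adds return_sequences/return_state, which the three-way unpacking, the attention layer, and the TimeDistributed output all require:

# Decoder sized to 2*latent_dim = 1000 so its state_size matches the
# concatenated forward+backward encoder states (each of shape (None, 1000)).
decoder_lstm = LSTM(latent_dim * 2, return_sequences=True, return_state=True, recurrent_dropout=0.2)
decoder_output, decoder_fwd, decoder_back = decoder_lstm(decoder_emb, initial_state=[states_a, states_c])

An alternative, if the decoder should stay at latent_dim units, is to project the concatenated states down first, e.g. states_a = Dense(latent_dim)(states_a) and likewise for states_c. Either way, the last encoder layer must be built with return_sequences=True (as in the listing above), so that encoder_output is the full per-timestep sequence the attention layer expects rather than a single vector.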