TensorFlow - machine translation decoder

Problem description

I am going through TensorFlow's tutorial on neural machine translation with an attention mechanism.

The decoder's code is as follows:

import tensorflow as tf

# BahdanauAttention is the attention layer defined earlier in the same tutorial.

class Decoder(tf.keras.Model):
  def __init__(self, vocab_size, embedding_dim, dec_units, batch_sz):
    super(Decoder, self).__init__()
    self.batch_sz = batch_sz
    self.dec_units = dec_units
    self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)
    self.gru = tf.keras.layers.GRU(self.dec_units,
                                   return_sequences=True,
                                   return_state=True,
                                   recurrent_initializer='glorot_uniform')
    self.fc = tf.keras.layers.Dense(vocab_size)

    # used for attention
    self.attention = BahdanauAttention(self.dec_units)

  def call(self, x, hidden, enc_output):
    # enc_output shape == (batch_size, max_length, hidden_size)
    context_vector, attention_weights = self.attention(hidden, enc_output)

    # x shape after passing through embedding == (batch_size, 1, embedding_dim)
    x = self.embedding(x)

    # x shape after concatenation == (batch_size, 1, embedding_dim + hidden_size)
    x = tf.concat([tf.expand_dims(context_vector, 1), x], axis=-1)

    # passing the concatenated vector to the GRU
    output, state = self.gru(x)

    # output shape == (batch_size * 1, hidden_size)
    output = tf.reshape(output, (-1, output.shape[2]))

    # output shape == (batch_size, vocab)
    x = self.fc(output)

    return x, state, attention_weights

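What I don't understand here is that the decoder's GRU cell is not connected to the encoder: it is never initialized with the encoder's last hidden state.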

output,state = self.gru(x)  

# Why is it not initialized with the hidden state of the encoder?

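As I understand it, the encoder and the decoder are connected only when the decoder is initialized with the "thought vector", that is, the encoder's last hidden state.

Why is this missing from TensorFlow's official tutorial? Is it a bug, or am I missing something here?

Could someone help me understand?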

Solution

This detailed NMT guide summarizes the point well. It compares the classic seq2seq NMT architecture with the attention-based encoder-decoder NMT architecture.

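Vanilla seq2seq: the decoder also needs to have access to the source information, and one simple way to achieve that is to initialize it with the last hidden state of the encoder, encoder_state.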
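To make the vanilla seq2seq wiring concrete, here is a minimal sketch, assuming a Keras GRU like the one in the question. The sizes and the random tensors standing in for the embedded token and the encoder state are illustrative assumptions, not values from the tutorial. Passing `initial_state` is what ties the decoder's recurrence to the encoder; omitting it, as `self.gru(x)` does, starts the GRU from zeros.

```python
import numpy as np
import tensorflow as tf

batch_size, dec_units, embedding_dim = 2, 8, 4  # illustrative sizes

gru = tf.keras.layers.GRU(dec_units, return_sequences=True, return_state=True)

# Stand-ins for real data: one embedded target token per example, and the
# encoder's final hidden state (it needs dec_units features to be reused).
x = tf.random.normal((batch_size, 1, embedding_dim))
enc_hidden = tf.random.normal((batch_size, dec_units))

# Vanilla seq2seq: the decoder GRU starts from the encoder's last state.
out_init, state_init = gru(x, initial_state=enc_hidden)

# As in the question's Decoder: no initial_state, so the GRU starts from zeros.
out_zero, state_zero = gru(x)
out_explicit, _ = gru(x, initial_state=tf.zeros((batch_size, dec_units)))

print(out_init.shape, state_init.shape)
print(np.allclose(out_zero.numpy(), out_explicit.numpy(), atol=1e-6))
```

The second print compares the default call with an explicit all-zero initial state, which is one way to check what the decoder's recurrence starts from when nothing is passed in.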

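Attention-based encoder-decoder: remember that in the vanilla seq2seq model, we pass the last source state from the encoder to the decoder when starting the decoding process. This works well for short and medium-length sentences; however, for long sentences, the single fixed-size hidden state becomes an information bottleneck. Instead of discarding all of the hidden states computed in the source RNN, the attention mechanism provides an approach that allows the decoder to peek at them (treating them as a dynamic memory of the source information). By doing so, the attention mechanism improves the translation of longer sentences.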
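The "peek" can be sketched in a few lines of NumPy. This is an additive (Bahdanau-style) attention step under stated assumptions: the sizes, the random weights `W1`, `W2`, `v`, and the function name `additive_attention` are all illustrative, and in a real model the weights are learned. The point is that the context vector is computed from every encoder hidden state, so the decoder depends on the encoder even when its GRU is not initialized with the encoder's last state.

```python
import numpy as np

rng = np.random.default_rng(0)
batch_size, max_length, hidden_size, units = 2, 5, 8, 6  # illustrative sizes

# Stand-ins: all encoder hidden states and the decoder's current hidden state.
enc_output = rng.normal(size=(batch_size, max_length, hidden_size))
dec_hidden = rng.normal(size=(batch_size, hidden_size))

# Randomly initialised projection weights (learned in a real model).
W1 = rng.normal(size=(hidden_size, units))
W2 = rng.normal(size=(hidden_size, units))
v = rng.normal(size=(units, 1))

def additive_attention(query, values):
    # score shape == (batch_size, max_length, 1)
    score = np.tanh(values @ W1 + (query @ W2)[:, None, :]) @ v
    # softmax over the source positions (axis 1), shifted for stability
    score = score - score.max(axis=1, keepdims=True)
    weights = np.exp(score) / np.exp(score).sum(axis=1, keepdims=True)
    # weighted sum of encoder states, shape == (batch_size, hidden_size)
    context = (weights * values).sum(axis=1)
    return context, weights

context_vector, attention_weights = additive_attention(dec_hidden, enc_output)
print(context_vector.shape, attention_weights.shape)
```

In the question's Decoder, a context vector of this kind is concatenated with the embedded input token before the GRU call, which is where the source information enters the decoder.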

In both cases, you can use teacher forcing to train the model better.
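As a rough illustration of teacher forcing, the toy loop below feeds the ground-truth token as the next decoder input instead of the model's own prediction. `decode_step` is a hypothetical stand-in for a real decoder step, and the token ids are made up; only the input-selection logic is the point.

```python
def decode_step(prev_token):
    # Hypothetical stand-in for a real decoder step: a fixed, poor "prediction".
    return (prev_token * 7 + 3) % 10

def run_decoder(target, teacher_forcing):
    dec_input = target[0]  # start token
    inputs_seen, predictions = [], []
    for t in range(1, len(target)):
        inputs_seen.append(dec_input)
        pred = decode_step(dec_input)
        predictions.append(pred)
        # Teacher forcing: the next input is the true target token,
        # otherwise the model's own prediction is fed back in.
        dec_input = target[t] if teacher_forcing else pred
    return inputs_seen, predictions

target = [1, 4, 2, 9]  # made-up token ids, starting with the start token
forced_inputs, forced_preds = run_decoder(target, teacher_forcing=True)
free_inputs, free_preds = run_decoder(target, teacher_forcing=False)
print(forced_inputs, free_inputs)
```

With teacher forcing the decoder always sees the correct previous token, so an early mistake does not compound through the rest of the sequence during training.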

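TL;DR: the attention mechanism is what lets the decoder "peek" into the encoder, rather than you explicitly passing what the encoder is doing to the decoder.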
