Question about the attention computation in the PyTorch seq2seq tutorial: discrepancy with the original Bahdanau or Luong paper

I have been studying attention recently, and I have some doubts about how attention is computed in the PyTorch NLP attention tutorial: https://pytorch.org/tutorials/intermediate/seq2seq_translation_tutorial.html

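In the tutorial, the attention scores (weights) are computed from the decoder's input and the decoder's hidden state. I cannot find a justification for this in either Luong's or Bahdanau's paper. Both of them compute the weights from the decoder hidden state and the encoder outputs instead. Why does the PyTorch tutorial do it this way?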

The attention in the PyTorch tutorial seems to differ from both Luong's and Bahdanau's.
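
To make the difference concrete, here is a minimal sketch of the three ways of computing the weights as I understand them. The function names, layer names, and toy sizes are my own assumptions for illustration and are not copied from the tutorial or the papers; the Luong variant shown is only the simple dot-product score.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

hidden_size = 8   # assumed toy hidden size
max_length = 5    # assumed number of encoder time steps

embedded = torch.randn(1, hidden_size)                  # decoder input embedding
hidden = torch.randn(1, hidden_size)                    # decoder hidden state
encoder_outputs = torch.randn(max_length, hidden_size)  # one vector per source position

# Hypothetical layers, randomly initialised just for this sketch.
attn = nn.Linear(hidden_size * 2, max_length)
W_a = nn.Linear(hidden_size, hidden_size, bias=False)
U_a = nn.Linear(hidden_size, hidden_size, bias=False)
v_a = nn.Linear(hidden_size, 1, bias=False)


def tutorial_style_weights(embedded, hidden):
    # Scores come only from the decoder input and decoder hidden state,
    # as described in the question; encoder outputs are not involved.
    return F.softmax(attn(torch.cat((embedded, hidden), dim=1)), dim=1)


def luong_dot_weights(hidden, encoder_outputs):
    # Dot-product score between the decoder state and every encoder output.
    return F.softmax(hidden @ encoder_outputs.t(), dim=1)


def bahdanau_weights(hidden, encoder_outputs):
    # Additive score: v^T tanh(W s + U h_j) for every encoder output h_j.
    energy = torch.tanh(W_a(hidden) + U_a(encoder_outputs))  # (max_length, hidden_size)
    return F.softmax(v_a(energy).t(), dim=1)                 # (1, max_length)


w_tutorial = tutorial_style_weights(embedded, hidden)
w_luong = luong_dot_weights(hidden, encoder_outputs)
w_bahdanau = bahdanau_weights(hidden, encoder_outputs)

# In all three cases the weights are then applied to the encoder outputs.
context = w_tutorial @ encoder_outputs                       # (1, hidden_size)
print(w_tutorial.shape, w_luong.shape, w_bahdanau.shape, context.shape)
```

In this sketch, replacing `encoder_outputs` with a different tensor changes `w_luong` and `w_bahdanau` but leaves `w_tutorial` untouched, which is exactly the discrepancy I am asking about.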