# RNN encoder-decoder
The RNN encoder-decoder framework (Cho et al., 2014) consists of two recurrent networks: an encoder that compresses the input sequence into a context vector, and a decoder that generates the output sequence conditioned on that vector.
## Encoder
The encoder reads an input $\mathbf{x} = (x_1, \dots, x_{T_x})$ and yields a context vector $\mathbf{c}$.
Typically, an RNN computes hidden states $\mathbf{h}_t = f(x_t, \mathbf{h}_{t-1})$ and the context is obtained as $\mathbf{c} = q([\mathbf{h}_1, \dots, \mathbf{h}_{T_x}])$, where $f$ and $q$ are non-linear functions. Sutskever et al.[^1] used an LSTM as $f$ and $q([\mathbf{h}_1, \dots, \mathbf{h}_{T_x}]) = \mathbf{h}_{T_x}$, i.e. the context is simply the last hidden state.
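
As a rough illustration (not code from either paper), here is a minimal NumPy sketch of such an encoder, assuming a vanilla tanh RNN and the Sutskever-style choice of $q$ that keeps only the last hidden state; the parameter names (`W_xh`, `W_hh`, `b_h`) are illustrative.

```python
import numpy as np

def encode(x_seq, W_xh, W_hh, b_h):
    """Vanilla tanh-RNN encoder: h_t = tanh(W_xh @ x_t + W_hh @ h_{t-1} + b_h)."""
    h = np.zeros(b_h.shape[0])           # h_0 = 0
    states = []
    for x_t in x_seq:                     # x_seq holds the vectors x_1 ... x_{T_x}
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
        states.append(h)
    c = states[-1]                        # q picks the last hidden state
    return states, c

# Toy usage with random parameters (input dim 4, hidden dim 8, sequence length 5).
rng = np.random.default_rng(0)
W_xh, W_hh, b_h = rng.normal(size=(8, 4)), rng.normal(size=(8, 8)), np.zeros(8)
x_seq = [rng.normal(size=4) for _ in range(5)]
states, c = encode(x_seq, W_xh, W_hh, b_h)
print(c.shape)  # (8,)
```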
## Decoder
The decoder is trained to predict the next word $y_t$ given all previously predicted words $y_1, \dots, y_{t-1}$ and the context vector $\mathbf{c}$.
It therefore defines a probability over the output sequence $\mathbf{y}$ by factorizing the joint probability into ordered conditionals:
$$
p(\mathbf{y}) = \prod_{t=1}^{T_y} p(y_t \mid y_1, \dots, y_{t-1}, \mathbf{c})
$$
where $\mathbf{y} = (y_1, \dots, y_{T_y})$ and, if using an RNN,
$$
p(y_t \mid y_1, \dots, y_{t-1}, \mathbf{c}) = g(y_{t-1}, \mathbf{s}_t, \mathbf{c})
$$
where $g$ is a non-linear function that outputs the probability of $y_t$ and $\mathbf{s}_t$ is the hidden state of the decoder RNN.
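
To make the recursion concrete, here is a minimal NumPy sketch of one decoder step under the same vanilla-RNN assumption as above; the state update and the softmax readout playing the role of $g$ are illustrative choices, and all parameter names are hypothetical rather than taken from the papers.

```python
import numpy as np

def decode_step(y_prev, s_prev, c, params):
    """One decoder step: update s_t from (y_{t-1}, s_{t-1}, c), then apply a
    softmax readout as the non-linear g(y_{t-1}, s_t, c)."""
    W_ys, W_ss, W_cs, b_s, W_out, b_out = params
    s_t = np.tanh(W_ys @ y_prev + W_ss @ s_prev + W_cs @ c + b_s)
    logits = W_out @ np.concatenate([y_prev, s_t, c]) + b_out
    probs = np.exp(logits - logits.max())     # softmax over the output vocabulary
    return s_t, probs / probs.sum()

# Toy greedy generation loop (embedding dim 4, hidden dim 8, vocab size 10).
rng = np.random.default_rng(1)
E, H, V = 4, 8, 10
params = (rng.normal(size=(H, E)), rng.normal(size=(H, H)), rng.normal(size=(H, H)),
          np.zeros(H), rng.normal(size=(V, E + H + H)), np.zeros(V))
embed = rng.normal(size=(V, E))               # toy embedding table for y_{t-1}
c = rng.normal(size=H)                        # context vector from the encoder
s, y_prev = np.zeros(H), embed[0]             # start from a <bos>-like token
for _ in range(5):
    s, p = decode_step(y_prev, s, c, params)
    y_prev = embed[p.argmax()]                # greedy pick of the next word
```

Note that $\mathbf{c}$ enters every step of the loop: the single fixed-length context vector is the only channel through which the decoder sees the input sentence.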
---
## 📚 References
- Cho, Kyunghyun, et al. "Learning Phrase Representations Using RNN Encoder-Decoder for Statistical Machine Translation." _ArXiv:1406.1078 [Cs, Stat]_, Sept. 2014. _arXiv.org_, [http://arxiv.org/abs/1406.1078](http://arxiv.org/abs/1406.1078).
[^1]: Sutskever, Ilya, et al. "Sequence to Sequence Learning with Neural Networks." _ArXiv:1409.3215 [Cs]_, Dec. 2014. _arXiv.org_, [http://arxiv.org/abs/1409.3215](http://arxiv.org/abs/1409.3215).