# RNN encoder-decoder
The RNN encoder-decoder framework (Cho et al., 2014) consists of two recurrent networks: an encoder that compresses the input sequence into a context vector, and a decoder that generates the output sequence conditioned on that vector.
## Encoder
The encoder reads an input $\mathbf{x} = (x_1, \dots, x_{T_x})$ and yields a context vector $\mathbf{c}$.
Typically, an RNN computes hidden states $\mathbf{h}_t = f(x_t, \mathbf{h}_{t-1})$ and the context is obtained as $\mathbf{c} = q([\mathbf{h}_1, \dots, \mathbf{h}_{T_x}])$, where $f$ and $q$ are non-linear functions. Sutskever et al.[^1] used an LSTM as $f$ and $q([\mathbf{h}_1, \dots, \mathbf{h}_{T_x}]) = \mathbf{h}_{T_x}$, i.e. the context is simply the last hidden state.
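
As a rough illustration (not code from either paper), here is a minimal NumPy sketch of such an encoder, assuming a vanilla tanh RNN and the Sutskever-style choice of $q$ that keeps only the last hidden state; the parameter names (`W_xh`, `W_hh`, `b_h`) are illustrative.

```python
import numpy as np

def encode(x_seq, W_xh, W_hh, b_h):
    """Vanilla tanh-RNN encoder: h_t = tanh(W_xh @ x_t + W_hh @ h_{t-1} + b_h)."""
    h = np.zeros(b_h.shape[0])           # h_0 = 0
    states = []
    for x_t in x_seq:                     # x_seq holds the vectors x_1 ... x_{T_x}
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
        states.append(h)
    c = states[-1]                        # q picks the last hidden state
    return states, c

# Toy usage with random parameters (input dim 4, hidden dim 8, sequence length 5).
rng = np.random.default_rng(0)
W_xh, W_hh, b_h = rng.normal(size=(8, 4)), rng.normal(size=(8, 8)), np.zeros(8)
x_seq = [rng.normal(size=4) for _ in range(5)]
states, c = encode(x_seq, W_xh, W_hh, b_h)
print(c.shape)  # (8,)
```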
## Decoder
The decoder is trained to predict the next word $y_t$ given all previously predicted words $y_1, \dots, y_{t-1}$ and the context vector $\mathbf{c}$.
It therefore defines a probability over the output sequence $\mathbf{y}$ by factorizing the joint probability into ordered conditionals:
$$
p(\mathbf{y}) = \prod_{t=1}^{T_y} p(y_t \mid y_1, \dots, y_{t-1}, \mathbf{c})
$$
where $\mathbf{y} = (y_1, \dots, y_{T_y})$ and, if using an RNN,
$$
p(y_t \mid y_1, \dots, y_{t-1}, \mathbf{c}) = g(y_{t-1}, \mathbf{s}_t, \mathbf{c})
$$
where $g$ is a non-linear function that outputs the probability of $y_t$ and $\mathbf{s}_t$ is the hidden state of the decoder RNN.
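
To make the recursion concrete, here is a minimal NumPy sketch of one decoder step under the same vanilla-RNN assumption as above; the state update and the softmax readout playing the role of $g$ are illustrative choices, and all parameter names are hypothetical rather than taken from the papers.

```python
import numpy as np

def decode_step(y_prev, s_prev, c, params):
    """One decoder step: update s_t from (y_{t-1}, s_{t-1}, c), then apply a
    softmax readout as the non-linear g(y_{t-1}, s_t, c)."""
    W_ys, W_ss, W_cs, b_s, W_out, b_out = params
    s_t = np.tanh(W_ys @ y_prev + W_ss @ s_prev + W_cs @ c + b_s)
    logits = W_out @ np.concatenate([y_prev, s_t, c]) + b_out
    probs = np.exp(logits - logits.max())     # softmax over the output vocabulary
    return s_t, probs / probs.sum()

# Toy greedy generation loop (embedding dim 4, hidden dim 8, vocab size 10).
rng = np.random.default_rng(1)
E, H, V = 4, 8, 10
params = (rng.normal(size=(H, E)), rng.normal(size=(H, H)), rng.normal(size=(H, H)),
          np.zeros(H), rng.normal(size=(V, E + H + H)), np.zeros(V))
embed = rng.normal(size=(V, E))               # toy embedding table for y_{t-1}
c = rng.normal(size=H)                        # context vector from the encoder
s, y_prev = np.zeros(H), embed[0]             # start from a <bos>-like token
for _ in range(5):
    s, p = decode_step(y_prev, s, c, params)
    y_prev = embed[p.argmax()]                # greedy pick of the next word
```

Note that $\mathbf{c}$ enters every step of the loop: the single fixed-length context vector is the only channel through which the decoder sees the input sentence.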
---
## 📚 References
- Cho, Kyunghyun, et al. "Learning Phrase Representations Using RNN Encoder-Decoder for Statistical Machine Translation." _ArXiv:1406.1078 [Cs, Stat]_, Sept. 2014. _arXiv.org_, [http://arxiv.org/abs/1406.1078](http://arxiv.org/abs/1406.1078).
[^1]: Sutskever, Ilya, et al. "Sequence to Sequence Learning with Neural Networks." _ArXiv:1409.3215 [Cs]_, Dec. 2014. _arXiv.org_, [http://arxiv.org/abs/1409.3215](http://arxiv.org/abs/1409.3215).