# Long short-term memory
The long short-term memory (LSTM) unit is composed of a cell, an input gate, an output gate and a forget gate. LSTMs are well suited to processing time series data because they were designed to mitigate the vanishing gradient problem, which makes them relatively insensitive to gap length, unlike vanilla RNNs.

- $f_t = \sigma (W_f \cdot [h_{t-1}, x_t] + b_f)$ is the forget gate (first sigmoid)
- $i_t = \sigma (W_i \cdot [h_{t-1}, x_t] + b_i)$ is the input gate (second sigmoid)
- $\tilde{C}_t = \tanh (W_C \cdot [h_{t-1}, x_t] + b_C)$ is the cell input activation
- $C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$ is the new cell state
- $o_t = \sigma (W_o \cdot [h_{t-1}, x_t] + b_o)$ is the output gate (third sigmoid)
- $h_t = o_t \odot \tanh (C_t)$ is the new hidden state
In these equations, $[h_{t-1}, x_t]$ denotes the concatenation of the previous hidden state and the current input, $\sigma$ is the logistic sigmoid, and $\odot$ is the element-wise product. All this terminology looks quite impressive, but the intuition behind it is straightforward. The core idea of the LSTM is to propagate a cell state along the time dimension, updating it along the way. The previous cell state $C_{t-1}$ is multiplied by the forget gate $f_t \in [0,1]$, the cell input activation $\tilde{C}_t$ is multiplied by the input gate $i_t \in [0,1]$, and the new cell state $C_t$ is simply the sum of both. Finally, the new hidden state is $\tanh(C_t)$ multiplied by the output gate $o_t \in [0,1]$.
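To make the data flow concrete, here is a minimal NumPy sketch of a single LSTM step mirroring the equations above. The function name `lstm_step`, the parameter-dictionary layout, and the toy dimensions are illustrative assumptions, not taken from any particular library.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM step (sketch): shapes are x_t (n_x,), h_prev/c_prev (n_h,),
    W_* (n_h, n_h + n_x), b_* (n_h,)."""
    z = np.concatenate([h_prev, x_t])                      # [h_{t-1}, x_t]

    f_t = sigmoid(params["W_f"] @ z + params["b_f"])       # forget gate
    i_t = sigmoid(params["W_i"] @ z + params["b_i"])       # input gate
    c_tilde = np.tanh(params["W_C"] @ z + params["b_C"])   # cell input activation
    c_t = f_t * c_prev + i_t * c_tilde                     # new cell state
    o_t = sigmoid(params["W_o"] @ z + params["b_o"])       # output gate
    h_t = o_t * np.tanh(c_t)                               # new hidden state
    return h_t, c_t

# Toy usage: random parameters, unrolled over 5 time steps.
rng = np.random.default_rng(0)
n_x, n_h = 4, 3
params = {w: rng.normal(scale=0.1, size=(n_h, n_h + n_x)) for w in ("W_f", "W_i", "W_C", "W_o")}
params.update({b: np.zeros(n_h) for b in ("b_f", "b_i", "b_C", "b_o")})

h, c = np.zeros(n_h), np.zeros(n_h)
for x_t in rng.normal(size=(5, n_x)):
    h, c = lstm_step(x_t, h, c, params)
print(h, c)
```

The hidden state $h_t$ is both the unit's output at step $t$ and part of the recurrent input at step $t+1$, while the cell state $C_t$ is carried forward largely unchanged unless the gates decide to modify it.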
## Variants
### Peephole LSTM

Here, the gate layers can also look at the cell state through so-called peephole connections. Many formulations add peepholes to only some of the gates.
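One common formulation (following colah's blog, where the forget and input gates peek at $C_{t-1}$ and the output gate at the new state $C_t$) is:

- $f_t = \sigma (W_f \cdot [C_{t-1}, h_{t-1}, x_t] + b_f)$
- $i_t = \sigma (W_i \cdot [C_{t-1}, h_{t-1}, x_t] + b_i)$
- $o_t = \sigma (W_o \cdot [C_t, h_{t-1}, x_t] + b_o)$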
### Coupled forget and input gates

Here, the forget and input decisions are coupled: new information is written into the cell state only where old information is forgotten, so we only forget when we are about to write something in its place.
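Concretely, the input gate is replaced by the complement of the forget gate, and the cell-state update becomes:

- $C_t = f_t \odot C_{t-1} + (1 - f_t) \odot \tilde{C}_t$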
### Gated recurrent unit
See [[Gated recurrent unit]].
---
## 📚 References
- Hochreiter, Sepp and Jürgen Schmidhuber. ["Long Short-term Memory."](https://doi.org/10.1162/neco.1997.9.8.1735) Neural Comput., vol. 9, no. 8, 1 Dec. 1997, pp. 1735-80.
- ["Understanding LSTM Networks -- colah's blog."](https://colah.github.io/posts/2015-08-Understanding-LSTMs) 27 August 2015.
- ["Long short-term memory | Wikipedia."](https://en.wikipedia.org/wiki/Long_short-term_memory) Wikipedia, 21 Sept. 2021.