Long Short-Term Memory
- 1 November 1997
- journal article
- Published by MIT Press in Neural Computation
- Vol. 9 (8), 1735-1780
- https://doi.org/10.1162/neco.1997.9.8.1735
Abstract
Learning to store information over extended time intervals by recurrent backpropagation takes a very long time, mostly because of insufficient, decaying error backflow. We briefly review Hochreiter's (1991) analysis of this problem, then address it by introducing a novel, efficient, gradient-based method called long short-term memory (LSTM). Truncating the gradient where this does not do harm, LSTM can learn to bridge minimal time lags in excess of 1000 discrete time steps by enforcing constant error flow through constant error carousels within special units. Multiplicative gate units learn to open and close access to the constant error flow. LSTM is local in space and time; its computational complexity per time step and weight is O(1). Our experiments with artificial data involve local, distributed, real-valued, and noisy pattern representations. In comparisons with real-time recurrent learning, backpropagation through time, recurrent cascade-correlation, Elman nets, and neural sequence chunking, LSTM leads to many more successful runs and learns much faster. LSTM also solves complex, artificial long-time-lag tasks that have never been solved by previous recurrent network algorithms.
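The abstract only sketches the mechanism, so a minimal forward-pass sketch may help fix the terms. It follows the 1997 formulation (input and output gates, no forget gate): the cell state's unit-weight self-connection acts as the constant error carousel, and the multiplicative gates open and close write and read access to it, so each step costs a constant amount of work per weight. This is an illustrative reconstruction, not the authors' code; all names, shapes, and initialisation choices are assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class LSTMCell:
    """Minimal 1997-style LSTM cell (no forget gate).

    The cell state c has an identity self-connection (the constant error
    carousel); multiplicative input/output gates control access to it.
    """

    def __init__(self, input_size, hidden_size, rng=None):
        rng = np.random.default_rng() if rng is None else rng
        concat = input_size + hidden_size
        # Weight matrices act on the concatenated [input, previous output].
        self.W_i = rng.normal(scale=0.1, size=(concat, hidden_size))  # input gate
        self.W_o = rng.normal(scale=0.1, size=(concat, hidden_size))  # output gate
        self.W_c = rng.normal(scale=0.1, size=(concat, hidden_size))  # cell input
        self.b_i = np.zeros(hidden_size)
        self.b_o = np.zeros(hidden_size)
        self.b_c = np.zeros(hidden_size)

    def step(self, x, h_prev, c_prev):
        z = np.concatenate([x, h_prev])
        i = sigmoid(z @ self.W_i + self.b_i)   # input gate: opens write access
        o = sigmoid(z @ self.W_o + self.b_o)   # output gate: opens read access
        g = np.tanh(z @ self.W_c + self.b_c)   # candidate cell input
        c = c_prev + i * g                     # constant error carousel: unit self-loop
        h = o * np.tanh(c)                     # gated cell output
        return h, c

# Illustrative usage: carry the cell state across 1000 time steps.
cell = LSTMCell(input_size=4, hidden_size=8)
h, c = np.zeros(8), np.zeros(8)
for x in np.random.default_rng(0).normal(size=(1000, 4)):
    h, c = cell.step(x, h, c)
```

Because the self-loop weight on c is fixed at 1, the gradient carried along the cell state neither explodes nor decays over the 1000 steps above, which is the property the abstract's "constant error flow" refers to.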