Long Short-Term Memory
- 1 November 1997
- journal article
- Published by MIT Press in Neural Computation
- Vol. 9 (8), 1735-1780
- https://doi.org/10.1162/neco.1997.9.8.1735
Abstract
Learning to store information over extended time intervals by recurrent backpropagation takes a very long time, mostly because of insufficient, decaying error backflow. We briefly review Hochreiter's (1991) analysis of this problem, then address it by introducing a novel, efficient, gradient-based method called long short-term memory (LSTM). Truncating the gradient where this does not do harm, LSTM can learn to bridge minimal time lags in excess of 1000 discrete time steps by enforcing constant error flow through constant error carousels within special units. Multiplicative gate units learn to open and close access to the constant error flow. LSTM is local in space and time; its computational complexity per time step and weight is O(1). Our experiments with artificial data involve local, distributed, real-valued, and noisy pattern representations. In comparisons with real-time recurrent learning, backpropagation through time, recurrent cascade-correlation, Elman nets, and neural sequence chunking, LSTM leads to many more successful runs and learns much faster. LSTM also solves complex, artificial long-time-lag tasks that have never been solved by previous recurrent network algorithms.
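The abstract only sketches the mechanism, so a minimal forward-pass sketch may help fix the terms. It follows the 1997 formulation (input and output gates, no forget gate): the cell state's unit-weight self-connection acts as the constant error carousel, and the multiplicative gates open and close write and read access to it, so each step costs a constant amount of work per weight. This is an illustrative reconstruction, not the authors' code; all names, shapes, and initialisation choices are assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class LSTMCell:
    """Minimal 1997-style LSTM cell (no forget gate).

    The cell state c has an identity self-connection (the constant error
    carousel); multiplicative input/output gates control access to it.
    """

    def __init__(self, input_size, hidden_size, rng=None):
        rng = np.random.default_rng() if rng is None else rng
        concat = input_size + hidden_size
        # Weight matrices act on the concatenated [input, previous output].
        self.W_i = rng.normal(scale=0.1, size=(concat, hidden_size))  # input gate
        self.W_o = rng.normal(scale=0.1, size=(concat, hidden_size))  # output gate
        self.W_c = rng.normal(scale=0.1, size=(concat, hidden_size))  # cell input
        self.b_i = np.zeros(hidden_size)
        self.b_o = np.zeros(hidden_size)
        self.b_c = np.zeros(hidden_size)

    def step(self, x, h_prev, c_prev):
        z = np.concatenate([x, h_prev])
        i = sigmoid(z @ self.W_i + self.b_i)   # input gate: opens write access
        o = sigmoid(z @ self.W_o + self.b_o)   # output gate: opens read access
        g = np.tanh(z @ self.W_c + self.b_c)   # candidate cell input
        c = c_prev + i * g                     # constant error carousel: unit self-loop
        h = o * np.tanh(c)                     # gated cell output
        return h, c

# Illustrative usage: carry the cell state across 1000 time steps.
cell = LSTMCell(input_size=4, hidden_size=8)
h, c = np.zeros(8), np.zeros(8)
for x in np.random.default_rng(0).normal(size=(1000, 4)):
    h, c = cell.step(x, h, c)
```

Because the self-loop weight on c is fixed at 1, the gradient carried along the cell state neither explodes nor decays over the 1000 steps above, which is the property the abstract's "constant error flow" refers to.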