Learning long-term dependencies in NARX recurrent neural networks

1 November 1996

journal article
Published by Institute of Electrical and Electronics Engineers (IEEE) in IEEE Transactions on Neural Networks

Vol. 7 (6), 1329-1338
https://doi.org/10.1109/72.548162

Abstract

It has previously been shown that gradient-descent learning algorithms for recurrent neural networks can perform poorly on tasks that involve long-term dependencies, i.e., problems for which the desired output depends on inputs presented at times far in the past. We show that the long-term dependencies problem is lessened for a class of architectures called nonlinear autoregressive models with exogenous inputs (NARX) recurrent neural networks, which have powerful representational capabilities. We have previously reported that gradient-descent learning can be more effective in NARX networks than in recurrent neural network architectures that have "hidden states" on problems including grammatical inference and nonlinear system identification. Typically, the NARX network converges much faster and generalizes better than other networks. The results in this paper are consistent with this phenomenon. We present experimental results showing that NARX networks can often retain information for two to three times as long as conventional recurrent neural networks. We show that although NARX networks do not circumvent the problem of long-term dependencies, they can greatly improve performance on long-term dependency problems. We also describe in detail some of the assumptions regarding what it means to latch information robustly and suggest possible ways to loosen these assumptions.
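
The abstract names the NARX architecture without spelling out its recurrence. As a rough illustration only, the sketch below assumes the commonly used formulation y(t) = f(u(t), ..., u(t - n_u), y(t-1), ..., y(t - n_y)), in which the only feedback is the network's own delayed outputs, with f taken to be a one-hidden-layer tanh network. The helper names (make_narx, run_narx), the random weights, and the layer sizes are hypothetical and are not taken from the paper; no training is shown.

```python
import math
import random
from collections import deque


def make_narx(n_u, n_y, hidden, seed=0):
    """Build a tiny NARX model with random weights (hypothetical helper)."""
    rng = random.Random(seed)
    # Inputs to f: u(t), ..., u(t - n_u) and y(t - 1), ..., y(t - n_y).
    n_in = (n_u + 1) + n_y
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(hidden)]
    b1 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    return {"n_u": n_u, "n_y": n_y, "w1": w1, "b1": b1, "w2": w2}


def run_narx(model, inputs):
    """Run the model over a scalar input sequence and return its outputs."""
    # Tapped delay lines, initialised to zero (an assumption of this sketch).
    u_taps = deque([0.0] * (model["n_u"] + 1), maxlen=model["n_u"] + 1)
    y_taps = deque([0.0] * model["n_y"], maxlen=model["n_y"])
    outputs = []
    for u in inputs:
        u_taps.appendleft(u)  # newest input first; the oldest tap drops off
        x = list(u_taps) + list(y_taps)
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(model["w1"], model["b1"])]
        y = math.tanh(sum(w * hi for w, hi in zip(model["w2"], h)))
        y_taps.appendleft(y)  # feed the output back through its delay line
        outputs.append(y)
    return outputs


if __name__ == "__main__":
    model = make_narx(n_u=2, n_y=3, hidden=4)
    print(run_narx(model, [1.0, 0.0, 0.0, 0.0, 0.0]))
```

In this sketch the state carried between time steps is exactly the contents of the two delay lines, which is what distinguishes it from architectures with separate "hidden states".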


Cited by 498 articles