Recurrent Neural Network
A Recurrent Neural Network (RNN) is a type of neural network designed for sequential data. It maintains an internal memory that allows information from previous steps to influence current predictions. This makes it well-suited for tasks where order and context matter.

Background
First introduced in the 1980s, RNNs became practical in the 2010s with better hardware and large-scale datasets. They represented a breakthrough in modeling temporal dependencies, especially before the rise of attention-based architectures like Transformers.

Applications

  • Language modeling and text generation
  • Speech recognition and other audio processing
  • Machine translation (in pre-Transformer systems)
  • Time-series forecasting, e.g. sensor or financial data

Strengths and weaknesses

  • ✅ Captures sequential dependencies effectively.
  • ✅ Foundation for advanced variants like LSTMs and GRUs.
  • ❌ Struggles with long-term dependencies due to vanishing/exploding gradients.
  • ❌ Training can be slow and less parallelizable compared to modern architectures.

RNNs embody the idea of sequential memory in neural networks, making them well-suited for tasks where context evolves over time. Unlike feedforward networks that process inputs independently, RNNs create a feedback loop where the hidden state carries past information forward. This enables them to model sequential dependencies, whether in sentences, audio signals, or time series.
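The recurrence described above can be sketched in a few lines of numpy. This is a minimal illustration, not a production implementation: the weight shapes, the 3-dimensional inputs, and the 4-dimensional hidden state are arbitrary toy choices.

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    """One step of a vanilla RNN: the new hidden state mixes the
    current input with the previous hidden state (the feedback loop)."""
    return np.tanh(x_t @ W_xh + h_prev @ W_hh + b_h)

# Toy dimensions (illustrative only): 3-dim inputs, 4-dim hidden state.
rng = np.random.default_rng(0)
W_xh = rng.normal(size=(3, 4)) * 0.1
W_hh = rng.normal(size=(4, 4)) * 0.1
b_h = np.zeros(4)

h = np.zeros(4)                      # initial hidden state
for x_t in rng.normal(size=(5, 3)):  # a sequence of 5 input vectors
    h = rnn_step(x_t, h, W_xh, W_hh, b_h)  # h carries context forward
```

The key point is that `h` is fed back into the next step, so each output depends on the entire prefix of the sequence, not just the current input.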

A critical limitation of classical RNNs is their struggle with long-term dependencies due to vanishing or exploding gradients. This problem inspired the development of advanced variants like Long Short-Term Memory (LSTM) and Gated Recurrent Units (GRU), which introduced gating mechanisms to preserve or discard information selectively. These extensions remain influential today, even as transformers dominate many sequence-processing tasks.
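The gating idea can be sketched with a single GRU step. Biases are omitted for brevity, and the toy dimensions and random weights are illustrative assumptions; conventions for which side of the update gate keeps the old state also vary between references.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, Wz, Wr, Wh):
    """One GRU step (biases omitted). The gates output values in (0, 1)
    that decide how much of the previous state to keep or overwrite."""
    xh = np.concatenate([x_t, h_prev])
    z = sigmoid(Wz @ xh)                        # update gate
    r = sigmoid(Wr @ xh)                        # reset gate
    h_tilde = np.tanh(Wh @ np.concatenate([x_t, r * h_prev]))
    return (1 - z) * h_prev + z * h_tilde       # gated blend of old and new

# Toy setup: 3-dim input, 4-dim hidden state, random weights.
rng = np.random.default_rng(1)
d_x, d_h = 3, 4
Wz = rng.normal(size=(d_h, d_x + d_h)) * 0.1
Wr = rng.normal(size=(d_h, d_x + d_h)) * 0.1
Wh = rng.normal(size=(d_h, d_x + d_h)) * 0.1

h = gru_step(rng.normal(size=d_x), np.zeros(d_h), Wz, Wr, Wh)
```

Because the update gate can stay close to 0, the previous state can pass through many steps almost unchanged, which is what mitigates the vanishing-gradient problem of the plain recurrence.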

Despite being partly superseded, RNNs still play a role in resource-constrained environments. Their relatively simple architecture and smaller parameter count make them attractive for embedded systems and streaming applications where low latency matters more than state-of-the-art accuracy.

📚 Further Reading

  • Elman, J. L. (1990). Finding Structure in Time. Cognitive Science, 14(2), 179–211.
  • Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning. MIT Press.