Recurrent Neural Network
A Recurrent Neural Network (RNN) is a type of neural network designed for sequential data. It maintains an internal memory that allows information from previous steps to influence current predictions. This makes it well-suited for tasks where order and context matter.
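The recurrence described above can be sketched in a few lines of NumPy. This is a minimal illustration, not a production implementation: the function name, weight names, and dimensions (3 input features, 4 hidden units) are all chosen here for the example.

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    # One step of a vanilla (Elman) RNN: the new hidden state mixes
    # the current input with the previous hidden state, so earlier
    # inputs can influence later predictions.
    return np.tanh(x_t @ W_xh + h_prev @ W_hh + b_h)

# Illustrative sizes: 3 input features, 4 hidden units.
rng = np.random.default_rng(0)
W_xh = rng.normal(scale=0.1, size=(3, 4))   # input-to-hidden weights
W_hh = rng.normal(scale=0.1, size=(4, 4))   # hidden-to-hidden (recurrent) weights
b_h = np.zeros(4)

h = np.zeros(4)                             # initial hidden state ("empty memory")
for x_t in rng.normal(size=(5, 3)):         # a sequence of 5 time steps
    h = rnn_step(x_t, h, W_xh, W_hh, b_h)   # memory carried forward each step
```

Note that the same weights are reused at every time step; only the hidden state changes as the sequence is consumed.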
Background
First introduced in the 1980s, RNNs became practical in the 2010s with better hardware and large-scale datasets. They represented a breakthrough in modeling temporal dependencies, especially before the rise of attention-based architectures like Transformers.
Applications
- Natural language processing: machine translation, text classification.
- Speech recognition: mapping audio to text.
- Time series forecasting: predicting financial trends or sensor readings.
- Generative tasks: producing music, poems, or sequences of text.
Strengths and weaknesses
- ✅ Captures sequential dependencies effectively.
- ✅ Foundation for advanced variants like LSTMs and GRUs.
- ❌ Struggles with long-term dependencies due to vanishing/exploding gradients.
- ❌ Training can be slow and less parallelizable compared to modern architectures.
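The vanishing/exploding gradient issue noted above comes from backpropagation through time multiplying the gradient by the recurrent weight once per unrolled step. A scalar recurrent weight (a deliberate simplification for illustration) makes the effect easy to see:

```python
def grad_factor(steps, w):
    # Repeatedly multiply by the recurrent weight, as backpropagation
    # through time does once per unrolled step.
    g = 1.0
    for _ in range(steps):
        g *= w
    return g

vanished = grad_factor(50, 0.5)  # 0.5**50 ≈ 8.9e-16: the signal from 50 steps back is gone
exploded = grad_factor(50, 1.5)  # 1.5**50 ≈ 6.4e8: the gradient blows up instead
```

With matrices rather than scalars, the same behavior is governed by the spectral radius of the recurrent weight matrix, which is why gradient clipping and careful initialization are common in RNN training.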
RNNs embody the idea of sequential memory in neural networks. Unlike feedforward networks, which process each input independently, an RNN maintains a feedback loop in which the hidden state carries information from earlier steps forward. This lets it model sequential dependencies, whether in sentences, audio signals, or time series.
A critical limitation of classical RNNs is their struggle with long-term dependencies due to vanishing or exploding gradients. This problem inspired the development of advanced variants such as Long Short-Term Memory (LSTM) networks and Gated Recurrent Units (GRUs), which introduced gating mechanisms to preserve or discard information selectively. These extensions remain influential today, even as Transformers dominate many sequence-processing tasks.
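The gating idea can be sketched for a single LSTM step. The parameter layout below (stacking all four gates into one matrix) and the sizes are illustrative assumptions, but the gate arithmetic follows the standard LSTM formulation: a forget gate decides what to discard from the cell memory, an input gate decides what to add, and an output gate decides what to expose.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    # One LSTM time step. W, U, b stack the parameters of all four
    # gates; the shapes used below are illustrative, not prescribed.
    z = x_t @ W + h_prev @ U + b
    f, i, o, g = np.split(z, 4)
    f, i, o = sigmoid(f), sigmoid(i), sigmoid(o)  # forget, input, output gates
    c = f * c_prev + i * np.tanh(g)               # selectively keep/discard cell memory
    h = o * np.tanh(c)                            # expose a gated view of the memory
    return h, c

# Hypothetical sizes: 3 input features, 4 hidden units (so 16 stacked gate columns).
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(3, 16))
U = rng.normal(scale=0.1, size=(4, 16))
b = np.zeros(16)

h, c = np.zeros(4), np.zeros(4)
for x_t in rng.normal(size=(5, 3)):
    h, c = lstm_step(x_t, h, c, W, U, b)
```

Because the cell state `c` is updated additively rather than being squashed through a nonlinearity at every step, gradients can flow across many more time steps than in a vanilla RNN.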
Despite being partly superseded, RNNs still play a role in resource-constrained environments. Their relatively simple architecture and smaller parameter count make them attractive for embedded systems and streaming applications where low latency matters more than state-of-the-art accuracy.
📚 Further Reading
- Elman, J. L. (1990). Finding Structure in Time. Cognitive Science, 14(2), 179–211.
- Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning. MIT Press.