Hyperparameters
Hyperparameters are values set before training an artificial intelligence model. They govern how the model learns, unlike parameters (weights and biases), which are learned automatically during training.
Examples of hyperparameters
- Learning rate: the step size used for each weight update.
- Batch size: number of samples processed per iteration.
- Network depth and width: number of layers and neurons.
- Regularization factors: dropout, L1/L2 penalties (collected in the sketch after this list).
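To make these settings concrete, here is a minimal sketch, with purely hypothetical names and default values, that gathers the hyperparameters above into a single configuration fixed before training begins:

```python
# Minimal sketch (hypothetical names/values): hyperparameters are set
# up front, before the model sees any data.
from dataclasses import dataclass

@dataclass
class Hyperparameters:
    learning_rate: float = 1e-3   # step size for each weight update
    batch_size: int = 32          # samples processed per iteration
    num_layers: int = 3           # network depth
    hidden_units: int = 128       # network width
    dropout: float = 0.1          # regularization: fraction of units dropped
    weight_decay: float = 1e-4    # regularization: L2 penalty strength

config = Hyperparameters()
print(config)
```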
Background
Choosing the right hyperparameters is essential. Poorly tuned hyperparameters can lead to underfitting (model too simple) or overfitting (model memorizes training data). Automated approaches such as grid search, random search, and Bayesian optimization are widely used to explore the hyperparameter space.
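As a simple illustration of how such a search works, the sketch below runs a random search over a tiny search space; the validation_score function is a hypothetical stand-in for actually training a model and evaluating it:

```python
# Random search sketch: sample hyperparameter combinations at random
# and keep the best-scoring one. validation_score is a placeholder.
import random

def validation_score(learning_rate, batch_size):
    # In practice: train a model with these settings and return its
    # validation accuracy. A dummy score stands in so the script runs.
    return -abs(learning_rate - 0.01) - 0.001 * abs(batch_size - 64)

search_space = {
    "learning_rate": [1e-4, 1e-3, 1e-2, 1e-1],
    "batch_size": [16, 32, 64, 128],
}

best_score, best_trial = float("-inf"), None
for _ in range(20):
    trial = {name: random.choice(values) for name, values in search_space.items()}
    score = validation_score(**trial)
    if score > best_score:
        best_score, best_trial = score, trial

print("best hyperparameters:", best_trial)
```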
Strengths and challenges
- ✅ Allow fine control of learning behavior.
- ✅ Directly influence performance and generalization.
- ❌ Optimization is often computationally expensive.
- ❌ Hyperparameters are task- and data-specific.
Hyperparameters can be thought of as the “settings” of a learning algorithm, defining how it will approach training before it even sees the data. While they don’t directly encode knowledge about the dataset, they shape the model’s ability to learn patterns effectively. A learning rate that is too high may cause the model to diverge, while one that is too low may result in painfully slow training or getting stuck in local minima.
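This effect is visible even on a one-dimensional toy problem. The sketch below runs plain gradient descent on f(w) = w² with three learning rates chosen only for illustration:

```python
# Gradient descent on f(w) = w**2, whose gradient is 2*w. The learning rate
# alone decides the outcome: too large diverges, too small barely moves.
def gradient_descent(learning_rate, steps=50, w=1.0):
    for _ in range(steps):
        w -= learning_rate * 2 * w   # w <- w - lr * f'(w)
    return w

for lr in (1.5, 0.1, 1e-4):
    print(f"lr={lr}: final w = {gradient_descent(lr):.6f}")
# lr=1.5 overshoots and |w| grows every step (divergence);
# lr=0.1 converges toward the minimum at w=0;
# lr=0.0001 barely moves from the starting point after 50 steps.
```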
In practice, hyperparameters are not independent. Choices often interact: for example, batch size influences the optimal learning rate, and the number of layers interacts with the choice of regularization. This makes hyperparameter selection a highly multidimensional challenge.
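One frequently cited example of such an interaction is the linear scaling heuristic, which adjusts the learning rate in proportion to the batch size relative to a tuned baseline; the helper below is a hypothetical sketch of that rule of thumb, not a universal prescription:

```python
# Linear scaling heuristic (an illustration, not a universal rule):
# scale the learning rate with batch size relative to a tuned baseline.
def scaled_learning_rate(base_lr, base_batch_size, batch_size):
    return base_lr * batch_size / base_batch_size

print(scaled_learning_rate(0.1, 256, 1024))  # 4x larger batch -> 0.4
```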
Today, automated tools (like Optuna, Hyperopt, or Ray Tune) assist in tuning, but expert intuition still plays a role. For production systems, it is not enough to focus solely on accuracy: hyperparameters must also be chosen with training cost, inference latency, and energy consumption in mind.
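To show how such a tool is typically driven, here is a sketch using Optuna's study/trial interface (assuming the optuna package is installed); the objective body is a placeholder for real training and evaluation:

```python
# Hyperparameter search sketch with Optuna. The returned value is a dummy
# loss standing in for a real validation loss.
import optuna

def objective(trial):
    lr = trial.suggest_float("learning_rate", 1e-5, 1e-1, log=True)
    batch_size = trial.suggest_categorical("batch_size", [16, 32, 64, 128])
    # In practice: train a model with (lr, batch_size) and return its
    # validation loss.
    return (lr - 0.01) ** 2 + 0.0001 * batch_size

study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=30)
print(study.best_params)
```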
📚 Further Reading
- Bergstra, J., & Bengio, Y. (2012). Random Search for Hyper-Parameter Optimization. Journal of Machine Learning Research, 13, 281–305.