Generalization, hyperparameters, and practical deep learning
NBIS
08-May-2026
The goal of supervised machine learning is to minimize the (unknown) generalization error.
Training set
Use the training set to develop learning algorithms
Test set
At the very end, use the test set to estimate generalization error
Underfitting
Just right
Overfitting




Few free parameters — low capacity

Lots of free parameters — high capacity
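The link between the number of free parameters and under/overfitting can be sketched with polynomial regression. This is only an illustrative sketch: the noisy sine data, the sample sizes, and the chosen degrees are assumptions, not taken from the slides.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)

def make_data(n):
    """Noisy samples of sin(2*pi*x) on [0, 1] (a toy problem assumed for illustration)."""
    x = rng.uniform(0.0, 1.0, size=n)
    y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=n)
    return x, y

x_train, y_train = make_data(15)
x_test, y_test = make_data(200)

def mse(model, x, y):
    """Mean squared error of a fitted polynomial on the points (x, y)."""
    return float(np.mean((model(x) - y) ** 2))

train_mse, test_mse = {}, {}
for degree in (1, 3, 9):
    # A degree-d polynomial has d + 1 free parameters, so capacity grows with d.
    model = Polynomial.fit(x_train, y_train, deg=degree)
    train_mse[degree] = mse(model, x_train, y_train)
    test_mse[degree] = mse(model, x_test, y_test)
    print(f"degree={degree}  train MSE={train_mse[degree]:.3f}  test MSE={test_mse[degree]:.3f}")
```

Training error cannot increase as the degree grows, because each lower-degree model is a special case of the higher-degree one. The error on the held-out points is what reveals whether the extra capacity helped or hurt.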
Training set
Use the training set to adjust the function to minimize training error
Development set
Use the development set to adjust hyperparameters to minimize validation error
Test set
At the very end, use the test set to estimate generalization error
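A three-way split like the one above could be produced by shuffling the sample indices once and cutting them into three disjoint parts. This is a minimal sketch; the function name `train_dev_test_split` and the 70/15/15 proportions are assumptions chosen for illustration.

```python
import numpy as np

def train_dev_test_split(n_samples, dev_fraction=0.15, test_fraction=0.15, seed=0):
    """Return three disjoint index arrays: train, development, and test."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(n_samples)  # shuffle once, then cut
    n_test = int(round(n_samples * test_fraction))
    n_dev = int(round(n_samples * dev_fraction))
    test_idx = indices[:n_test]
    dev_idx = indices[n_test:n_test + n_dev]
    train_idx = indices[n_test + n_dev:]
    return train_idx, dev_idx, test_idx

train_idx, dev_idx, test_idx = train_dev_test_split(1000)
print(len(train_idx), len(dev_idx), len(test_idx))  # 700 150 150
```

Fixing the seed keeps the test set identical across experiments, so it stays untouched until the final evaluation.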


A useful technique is to run multiple trainings, each with a different test set. This is called cross-validation.
Source: https://mlfromscratch.com/nested-cross-validation
The average error over all splits is a better estimate of the generalization error than using a single split.
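The averaging over splits can be sketched as k-fold cross-validation. The helper `k_fold_splits`, the toy linear-regression data, and the choice of k = 5 are assumptions made for this example.

```python
import numpy as np

def k_fold_splits(n_samples, k=5, seed=0):
    """Yield (train_idx, held_out_idx) pairs; every sample is held out exactly once."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), k)
    for i in range(k):
        held_out = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, held_out

# Toy regression data (an assumption for illustration).
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

held_out_seen = []
fold_errors = []
for train_idx, held_out_idx in k_fold_splits(len(X), k=5):
    # Fit ordinary least squares on the training part only.
    w, *_ = np.linalg.lstsq(X[train_idx], y[train_idx], rcond=None)
    residual = X[held_out_idx] @ w - y[held_out_idx]
    fold_errors.append(float(np.mean(residual ** 2)))
    held_out_seen.append(held_out_idx)

cv_error = float(np.mean(fold_errors))
print(f"per-fold MSE: {np.round(fold_errors, 4)}  mean: {cv_error:.4f}")
```

Each per-fold error depends on which samples happened to be held out; the mean over folds smooths out that dependence.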
Source: https://mlfromscratch.com/nested-cross-validation-python-code
Source: https://stanford.edu/~shervine/teaching/cs-229/cheatsheet-machine-learning-tips-and-tricks
Grad Student Descent: someone manually tries out a bunch of hyperparameter settings.
Parallel coordinates chart of hyperparameter sweeps over learning rate, decay, momentum, and batch size, coloured by accuracy
Source: https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-tune-hyperparameters

Bergstra, James, and Yoshua Bengio. “Random search for hyper-parameter optimization.” Journal of Machine Learning Research 13.Feb (2012): 281-305.
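Random search, as studied in the paper cited above, draws each hyperparameter configuration independently from a chosen distribution instead of walking a fixed grid. The sketch below is an assumption-laden toy: `validation_error` is a hypothetical stand-in for "train a model and measure its development-set error", and the search ranges are made up.

```python
import numpy as np

def validation_error(learning_rate, momentum):
    """Hypothetical stand-in for training a model and returning dev-set error."""
    return (np.log10(learning_rate) + 2.0) ** 2 + 0.1 * (momentum - 0.9) ** 2

def random_search(n_trials=50, seed=0):
    """Sample configurations independently and keep the one with the lowest error."""
    rng = np.random.default_rng(seed)
    trials = []
    for _ in range(n_trials):
        config = {
            # Learning rate sampled log-uniformly between 1e-5 and 1e-1.
            "learning_rate": 10 ** rng.uniform(-5, -1),
            # Momentum sampled uniformly between 0 and 0.99.
            "momentum": rng.uniform(0.0, 0.99),
        }
        trials.append((validation_error(**config), config))
    best = min(trials, key=lambda trial: trial[0])
    return best, trials

(best_error, best_config), trials = random_search()
print(best_error, best_config)
```

Because every trial uses a fresh value for every hyperparameter, 50 trials explore 50 distinct learning rates, whereas a grid with the same budget would revisit only a handful of values per axis.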

{red, orange, yellow, green, blue, indigo, violet}
In supervised machine learning, we are interested in modelling P(Y|X): the probability distribution of our target variable Y conditioned on our specific input variables X.
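For a categorical target, one common way to represent P(Y|X) is a softmax over one score per class. The sketch below treats the seven colours listed above as the possible values of Y, which is an assumption about what the set is meant to illustrate; the linear model and its random, untrained weights are also hypothetical.

```python
import numpy as np

COLOURS = ["red", "orange", "yellow", "green", "blue", "indigo", "violet"]

def softmax(logits):
    """Turn real-valued scores into probabilities that sum to 1 along the last axis."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)  # for numerical stability
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)

# Hypothetical linear model: logits = x @ W + b, with random (untrained) weights.
rng = np.random.default_rng(0)
n_features = 4
W = rng.normal(size=(n_features, len(COLOURS)))
b = np.zeros(len(COLOURS))

x = rng.normal(size=(1, n_features))  # one input example
p_y_given_x = softmax(x @ W + b)[0]   # one probability per colour
for colour, p in zip(COLOURS, p_y_given_x):
    print(f"P(Y={colour} | x) = {p:.3f}")
```

The output is a full distribution over the classes for this particular x, not just a single predicted label.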

Bishop, Christopher M. “Mixture density networks.” (1994).
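A mixture density network models p(y|x) as a mixture of Gaussians whose mixing weights, means, and standard deviations are produced by a network from x. The sketch below computes only the resulting negative log-likelihood for a one-dimensional target. The function name `mixture_nll` and the hand-written parameter arrays are assumptions; in a real model those arrays would be network outputs.

```python
import numpy as np

def mixture_nll(y, log_weights, means, log_sigmas):
    """Per-sample -log p(y|x) for a 1-D Gaussian mixture.

    y has shape (N,); the other arguments have shape (N, K) for K components.
    log_weights must be normalized log mixing coefficients (e.g. a log-softmax).
    """
    y = y[:, None]
    sigmas = np.exp(log_sigmas)
    log_normal = -0.5 * np.log(2 * np.pi) - log_sigmas - 0.5 * ((y - means) / sigmas) ** 2
    log_terms = log_weights + log_normal
    # Log-sum-exp over components, shifted by the maximum for numerical stability.
    m = np.max(log_terms, axis=1, keepdims=True)
    log_p = m[:, 0] + np.log(np.sum(np.exp(log_terms - m), axis=1))
    return -log_p

# Bimodal conditional density: two equally weighted components at -1 and +1.
y = np.array([-1.0, 0.0, 1.0])
log_weights = np.log(np.full((3, 2), 0.5))
means = np.tile(np.array([-1.0, 1.0]), (3, 1))
log_sigmas = np.log(np.full((3, 2), 0.3))
nll = mixture_nll(y, log_weights, means, log_sigmas)
print(nll)
```

Targets near either mode receive a low negative log-likelihood, while the point halfway between them is penalized. A single Gaussian output could not express this.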

Gaussian Mixture Models (GMM) Explained, https://youtu.be/wT2yLNUfyoM?si=aYHnmxvH2wVhDjX6

Petrov, Tatjana, and Denis Repin. “Automated deep abstractions for stochastic chemical reaction networks.” arXiv preprint arXiv:2002.01889 (2020).

Oord, Aaron van den, Nal Kalchbrenner, and Koray Kavukcuoglu. “Pixel recurrent neural networks.” arXiv preprint arXiv:1601.06759 (2016).

erik.ylipaa@scilifelab.se