Bibliography (47):

  1. https://github.com/gregorbachmann/scaling_mlps

  2. Pay Attention to MLPs

  3. MLP Architectures for Vision-and-Language Modeling: An Empirical Study

  4. https://www.kaggle.com/c/tiny-imagenet

  5. ‘MLP NN’ directory

  6. ImageNet Large Scale Visual Recognition Challenge

  7. Attention Is All You Need

  8. MLP-Mixer: An all-MLP Architecture for Vision

  9. scaling-hypothesis#blessings-of-scale

  10. Layer Normalization

  11. How far can we go without convolution: Improving fully-connected networks

  12. A Downsampled Variant of ImageNet as an Alternative to the CIFAR datasets

  13. Symbolic Discovery of Optimization Algorithms

  14. https://arxiv.org/pdf/2306.13575.pdf#page=5

  15. mixup: Beyond Empirical Risk Minimization

  16. Deep Residual Learning for Image Recognition

  17. https://arxiv.org/pdf/2306.13575.pdf#page=16

  18. https://arxiv.org/pdf/2306.13575.pdf#page=6

  19. Scaling Laws for Neural Language Models

  20. LLaMA-1: Open and Efficient Foundation Language Models

  21. https://arxiv.org/pdf/2306.13575.pdf#page=17

  22. index#convolution-learning

  23. https://arxiv.org/pdf/2306.13575.pdf#page=7

  24. 2023-bachmann-figure1-mlpcomputescalingoncifar100.jpg

  25. 2023-bachmann-figure5-scalingofmlpsoncifar10andimagenet1k.png

  26. https://arxiv.org/pdf/2306.13575.pdf#page=8

  27. Chinchilla: Training Compute-Optimal Large Language Models

  28. https://arxiv.org/abs/2306.12517

  29. Vision Transformer: An Image is Worth 16×16 Words: Transformers for Image Recognition at Scale

  30. Faster SGD training by minibatch persistency

  31. ConvNeXt: A ConvNet for the 2020s

  32. ImageNet: A Large-Scale Hierarchical Image Database