Bibliography (6):
Universal Transformers
https://arxiv.org/abs/1701.06538
https://www.statmt.org/wmt14/translation-task.html
Measuring Compositional Generalization: A Comprehensive Method on Realistic Data
Wikipedia Bibliography:
Transformer (deep learning architecture)
https://en.wikipedia.org/wiki/Dirichlet_process#The_stick-breaking_process