Bibliography:
google-research/meliad, Block-Recurrent Transformer (https://github.com/google-research/meliad#block-recurrent-transformer)
Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context (https://arxiv.org/abs/1901.02860)
Compressive Transformers for Long-Range Sequence Modelling (https://arxiv.org/abs/1911.05507)
Attention Is All You Need (https://arxiv.org/abs/1706.03762)
Wikipedia Bibliography:
Long short-term memory (https://en.wikipedia.org/wiki/Long_short-term_memory)
Cross-entropy (https://en.wikipedia.org/wiki/Cross-entropy#Cross-entropy_loss_function_and_logistic_regression)