Bibliography (3):
https://github.com/databricks/megablocks
‘end-to-end’ directory
MegatronLM: Training Billion+ Parameter Language Models Using GPU Model Parallelism