Google LaMDA LLM
LaMDA is one of Google's standard testbeds for scaling research and for examining the many surprising capabilities that scaled-up models turn out to have, and many papers have been published about it. Mysteriously, Googlers were not allowed to name LaMDA in those papers, or even to confirm or deny whether a given model is LaMDA when asked; instead, the early papers vaguely alluded to a series of large Transformers (eg. “we used pre-trained dense decoder-only Transformer language models, ranging in size from 2 million to 137 billion parameters. These models were pre-trained on web documents and dialog data”), leading to confusion.
So, this index collates LaMDA papers. As a rule of thumb for Google papers during 2021–2022: a ~137b-parameter dense decoder-only model is probably LaMDA; a model <20b parameters is probably a T5 bidirectional Transformer; a model >200b parameters is probably a (sparse) mixture-of-experts model (eg. Switch) rather than a dense one; and if a >150b-parameter model is specified to be dense, it may be a different model like DeepMind’s 280b-parameter Gopher.
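For concreteness, that rule of thumb can be written as a small decision procedure. This is only an illustrative sketch of the heuristic above (the function name, thresholds, and the ±10b tolerance around 137b are my own choices), not an actual attribution tool, and it is no substitute for reading a paper’s model details.

```python
def guess_google_model(params_b: float, dense: bool, year: int) -> str:
    """Sketch of the 2021-2022 rule of thumb for guessing which model an
    unnamed Google scaling paper used, from its reported parameter count
    (in billions) and whether the paper describes the model as dense."""
    if not (2021 <= year <= 2022):
        return "heuristic does not apply outside 2021-2022"
    if params_b < 20:
        return "probably T5 (bidirectional Transformer)"
    if dense and abs(params_b - 137) <= 10:
        return "probably LaMDA (137b dense decoder-only)"
    if dense and params_b > 150:
        return "possibly another dense model, eg. DeepMind's 280b Gopher"
    if params_b > 200:
        return "probably a mixture-of-experts model, eg. Switch"
    return "unclear -- check the paper's details"


# Example: the quoted "137 billion parameters" dense decoder points to LaMDA.
print(guess_google_model(137, dense=True, year=2022))
```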