“Stack More Layers Is Fine As GPT-4 Is about to Show but There Are Superior Routes…”, 2022-08-27 ():
With smaller models and more data, the Chinchilla authors got much better results for a fixed compute budget. Moreover, as one well-known commentary pointed out, the Chinchilla scaling formula suggests that there’s a hard limit for model accuracy that no amount of model size will ever overcome without more data. […].
This is a good paper and why our approach is general artificial intelligence for maximum impact, getting it to as many folk as possible and having diverse highly structured data sets
Stack more layers is fine as GPT-4 is about to show but there are superior routes…
Distributed is a better word. We have communities around specific modalities with cooperation at core and are setting up joint ventures and local communities in every country to build national datasets and models. Complex hierarchical system in how they interact and operate.