distillation is going fine now after i stopped hand-tweaking values and literally asked chatgpt to suggest them, because the values present in most papers were doing very poorly

Oct 13, 2023 · 5:30 PM UTC

distilling zephyr into my 44m textbooks model for a full epoch on textbooks as continued pretraining; the textbook model only did 1 epoch, and i raised dropout to 30% for this
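the thread never shows the actual objective, so here's a minimal sketch of the standard soft-label distillation loss this kind of setup usually uses — the temperature value and the T² scaling are assumptions from the usual Hinton-style recipe, not details from this thread:

```python
import math

def softmax(logits, temperature=1.0):
    # temperature-scaled softmax over a list of raw logits
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distill_loss(student_logits, teacher_logits, temperature=2.0):
    # KL(teacher || student) on temperature-softened distributions,
    # scaled by T^2 so gradient magnitudes stay comparable across temperatures
    # (temperature=2.0 is an illustrative choice, not from the thread)
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return (temperature ** 2) * kl
```

when the student matches the teacher exactly the loss is zero, and it grows as the softened distributions diverge; in a real run this term is typically mixed with the ordinary cross-entropy on the training labels.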
Replying to @aicrumb
Is the repo open source?
it's just a scratchy notebook right now
Replying to @aicrumb
we need more ML training ML