A tiny #RWKV with 2.9M (!) params can solve 18239.715*9.728263 or 4.2379*564.778-1209.01 etc. with CoT, while being 100% #RNN (L6-D192) 🤯 The trick: generate lots of data with reversed numbers (denoted by "f" here) to train the model 🚀 Try it now: github.com/BlinkDL/RWKV-LM/t…
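A minimal sketch of the reversed-number data generation the tweet describes, in Python. The exact "f" marker placement and sample layout are assumptions here; the real generator lives in the linked repo. The idea: write the answer with its least significant digit first, so an autoregressive model can produce digits and carries in reading order.

    import random

    def rev(s: str) -> str:
        # Reversed-number encoding: an "f" marker followed by the characters
        # in reverse, i.e. least significant digit first (the marker and
        # layout are assumptions; see the repo for the actual format).
        return "f" + s[::-1]

    def make_sample() -> str:
        # Two random decimal operands, similar in shape to the examples above.
        a = round(random.uniform(1, 100000), random.randint(1, 6))
        b = round(random.uniform(1, 1000), random.randint(1, 6))
        ans = a * b
        # Question in normal order, answer reversed, so the model emits the
        # least significant digit (and each carry) first.
        return f"{a}*{b}={rev(f'{ans:.10g}')}"

    random.seed(0)
    for _ in range(3):
        print(make_sample())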

Jul 8, 2023 · 8:21 AM UTC

Replying to @BlinkDL_AI
This is ridiculous. How long to fine-tune one of these on a MacBook?
Replying to @BlinkDL_AI
Thanks for a great project! I've run a quick experiment to see whether it can generalize to longer digits (compositional generalization). Multiplication is still difficult, as in other work (arxiv.org/abs/2305.18654, arxiv.org/abs/2307.03381). (The eval code is at gist.github.com/syncdoth/8ae…)
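For context, a rough sketch of what such a length-generalization eval looks like (the actual code is in the linked gist; model_answer here is a hypothetical wrapper around the trained model):

    import random

    def eval_by_digits(model_answer, max_digits=12, n=100):
        # Accuracy bucketed by operand length; models trained on short
        # operands typically degrade sharply past the training length.
        acc = {}
        for d in range(1, max_digits + 1):
            correct = 0
            for _ in range(n):
                a = random.randint(10 ** (d - 1), 10 ** d - 1)
                b = random.randint(10 ** (d - 1), 10 ** d - 1)
                if model_answer(f"{a}*{b}") == str(a * b):
                    correct += 1
            acc[d] = correct / n
        return acc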
Replying to @BlinkDL_AI
So we keep finding new things that kind of work, and we don't know why, or even whether they will work in other architectures. At this point LLM papers look really close to pseudoscience... which is cool, since pseudoscience is underrated haha