A tiny #RWKV with 2.9M (!) params can solve 18239.715*9.728263 or 4.2379*564.778-1209.01 etc. with CoT, while being 100% #RNN (L6-D192) 🤯 The trick: generate lots of data with reversed numbers (denoted by "f" here) to train the model 🚀 Try it now: github.com/BlinkDL/RWKV-LM/t…
Jul 8, 2023 · 8:21 AM UTC
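
The reversed-number trick can be sketched roughly as below. This is a minimal illustration, not the data generator from the linked repo: the sample layout (`a*b=f<reversed result>`), the helper names `reverse_number`, `restore_number`, `make_sample` and `generate_samples`, and the operand ranges are all assumptions made for the example. The likely motivation is that writing the result least-significant digit first lets the model emit digits in the same order that carries propagate.

```python
from decimal import Decimal
import random


def reverse_number(text: str) -> str:
    """Reverse a number string and prefix it with the marker 'f'.

    Hypothetical encoding: the post only says reversed numbers are
    denoted by "f"; the exact format here is an assumption.
    A leading minus sign is kept in front of the marker.
    """
    if text.startswith("-"):
        return "-f" + text[1:][::-1]
    return "f" + text[::-1]


def restore_number(token: str) -> str:
    """Invert reverse_number: drop the 'f' marker and un-reverse the digits."""
    sign = ""
    if token.startswith("-"):
        sign, token = "-", token[1:]
    if not token.startswith("f"):
        raise ValueError("expected a token marked with 'f'")
    return sign + token[1:][::-1]


def make_sample(a: str, b: str) -> str:
    """Build one training line 'a*b=f<reversed product>' with exact arithmetic."""
    product = Decimal(a) * Decimal(b)   # exact decimal product, no float error
    result = format(product, "f")       # plain notation, never scientific
    return f"{a}*{b}={reverse_number(result)}"


def generate_samples(count: int, seed: int = 0) -> list[str]:
    """Generate `count` multiplication samples with random decimal operands."""
    rng = random.Random(seed)
    samples = []
    for _ in range(count):
        a = f"{rng.uniform(0, 20000):.3f}"  # operand ranges are arbitrary choices
        b = f"{rng.uniform(0, 10):.6f}"
        samples.append(make_sample(a, b))
    return samples


if __name__ == "__main__":
    for line in generate_samples(3):
        print(line)
```

At inference time the model's reversed answer would be passed through something like `restore_number` to recover the usual digit order.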