BlinkDL · Jul 8, 2023 · 8:21 AM UTC

A tiny #RWKV with 2.9M (!) params can solve 18239.715*9.728263 or 4.2379*564.778-1209.01 etc. with CoT (chain-of-thought), while being 100% #RNN (L6-D192, i.e. 6 layers, 192-dim) 🤯 The trick: generate lots of data with reversed numbers (denoted by "f" here) to train the model 🚀 Try it now: github.com/BlinkDL/RWKV-LM/t…
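The tweet doesn't spell out the data format, so here is a minimal sketch, under stated assumptions, of why reversed numbers help a left-to-right model. When digits are written least-significant first, each output digit of a sum depends only on the current input digits plus a single carry, which is the kind of small fixed state an RNN can keep while generating. It is illustrated with addition, the simplest case, and does not reproduce the multiplication CoT used in RWKV-LM. The line format in the hypothetical `make_sample` (an "f" prefix marking a reversed number) is an assumption for illustration, not the actual training data.

```python
import random


def rev(n: int) -> str:
    """Write a non-negative integer with its digits reversed (least significant first)."""
    return str(n)[::-1]


def add_reversed(a: str, b: str) -> str:
    """Add two reversed-digit strings, emitting result digits left to right.

    Only one carry is kept between steps: a fixed-size state, unlike
    normal-order addition, where the first output digit depends on
    carries coming from the far end of the numbers.
    """
    out = []
    carry = 0
    for i in range(max(len(a), len(b))):
        da = int(a[i]) if i < len(a) else 0
        db = int(b[i]) if i < len(b) else 0
        s = da + db + carry
        out.append(str(s % 10))
        carry = s // 10
    if carry:
        out.append(str(carry))
    return "".join(out)


def make_sample(rng: random.Random, max_digits: int = 6) -> str:
    """Build one HYPOTHETICAL training line; 'f' marks a reversed number (assumed format)."""
    a = rng.randrange(10 ** rng.randint(1, max_digits))
    b = rng.randrange(10 ** rng.randint(1, max_digits))
    return f"{a}+{b}=f{rev(a)}+f{rev(b)}=f{add_reversed(rev(a), rev(b))}={a + b}"
```

For example, `add_reversed(rev(987), rev(45))` returns `"2301"`, which reads as 1032 in normal order. Generating many such lines with a seeded `random.Random` is a cheap way to get "lots of data"; the reversed middle step acts as a scratchpad that the model writes in carry-friendly order before the final answer.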

Lachlan Gray · Jul 10, 2023 · 2:54 PM UTC

Lachlan Gray @lfegray

Replying to @BlinkDL_AI

This is ridiculous. How long would it take to fine-tune one of these on a MacBook?

Choi Sehyun · Jul 11, 2023 · 3:05 PM UTC

Choi Sehyun @schoiaj

Replying to @BlinkDL_AI

Thanks for a great project! I ran a quick experiment to see whether it can generalize to numbers with more digits than it was trained on (compositional generalization). Multiplication is still difficult, as in other works (arxiv.org/abs/2305.18654, arxiv.org/abs/2307.03381). (The eval code is at gist.github.com/syncdoth/8ae…)

Tiago Freitas · Jul 9, 2023 · 8:35 AM UTC

Tiago Freitas @tiagoefreitas

Replying to @BlinkDL_AI

So we keep finding new things that kind of work, and we don’t know why or even whether they will work in other architectures. At this point LLM papers look really close to pseudoscience... which is cool, as pseudoscience is underrated haha