𝔊𝔴𝔢𝔯𝔫@gwernMar 31Just idly looking at _Areopagitica_ and going 'do I really want to try to tackle Miltonian English today' and 💡! GPT-3/4 is always bonkers at any kind of text2text task... And now I'm wondering how well it'd do at modernizing blank verse (tons to train on, no pesky rhymes...). pic.twitter.com/ndGhSTYPbx
𝔊𝔴𝔢𝔯𝔫@gwernMar 31It wouldn't've'd it if it was wrong and not cromulent.
𝔊𝔴𝔢𝔯𝔫@gwernMar 29You mean the hardware overhang that *already* exists? That's the one you're worried about a 6-month moratorium creating?
𝔊𝔴𝔢𝔯𝔫@gwernMar 29It will do no such thing, any more than you 'just' used your smarts to 'align yourself with your genes' rather than anything else (tweet off-the-cuff arguments for complacency).
𝔊𝔴𝔢𝔯𝔫@gwernMar 29The name for that thing is 'failing'. That's what 'failing' looks like.
("They're just a few years behind"? What, exactly, does one think 'failing' looks like in an exponential? How do you get more behind? Do they have to be aging in revers, Benjamin Button, & using abacuses?)
𝔊𝔴𝔢𝔯𝔫@gwernMar 29I don't think they believe it, because they can't even point to any major Chinese DL research, much less real signs of an arms race, & I keep asking. (And bringing up Russia is even lulzier - the guys scavenging chips from washing machines...?)
'Funding comes from the threat.'
𝔊𝔴𝔢𝔯𝔫@gwernMar 29Yep. You'll notice OA benchmarks GPT-4 on its own source code as a *test* set, when they really want it to be in the *training* set...
𝔊𝔴𝔢𝔯𝔫@gwernMar 28The obvious next step is, since we work so hard to make them true bidirectional 1:1 links (instead of copping out with lazy 1:many links like MediaWiki et al), and can provide per-link context, why not show them in the *relevant section* instead of all bundled? Soon... pic.twitter.com/xmrctXfFHy
𝔊𝔴𝔢𝔯𝔫@gwernMar 28So then, people *won't* make fun of paperclipping discussions in the same way; because the paperclipping discussions were real, and the pin discussions were not...
𝔊𝔴𝔢𝔯𝔫@gwernMar 28It's not common, exactly, but I've been trying to re-popularize it because it's such a critical concept, IMO.
Or to put it in the form of a LotR narrative: pic.twitter.com/qhj7ytO0So
𝔊𝔴𝔢𝔯𝔫@gwernMar 28I'm not sure that's true either. A lot of the church-owned businesses seem to be related to the purpose; for example, operating a daycare or school. Even if the nonprofit could do it, there are many advantages to a separate org (esp. liability; cf. massive Catholic church reorgs).
𝔊𝔴𝔢𝔯𝔫@gwernMar 28Churches own for-profits all the time. The Mormon church is particularly famous for this.
𝔊𝔴𝔢𝔯𝔫@gwernMar 28Does Semafor have a history of making things up? And he says he has 8 inside sources; it seems entirely plausible at least one would know a basic technical detail like the parameter count.
𝔊𝔴𝔢𝔯𝔫@gwernMar 28Eliezer was obviously correct here, and even Scott Aaronson was explaining why 'protein folding is NP-hard' is a pretty dumb counterargument (because it proves too much) years before: arxiv.org/pdf/quant-ph/0… Also see gwern.net/complexity
𝔊𝔴𝔢𝔯𝔫@gwernMar 28It could have since they are apparently now doing both direct self-supervised training on the RLHF text corpus, and mixing in some self-supervised training on a (original frozen?) text corpus. It might be hard to blackbox-PPO your way to more knowledge, but not normal training...
𝔊𝔴𝔢𝔯𝔫@gwernMar 28("We've definitively proven the AGI safe with the new algorithm & launched it 1 hour ago. It'll minimize the loss in a provably safe manner."
"Maximize."
"Sorry?"
"You mean 'maximize the reward'. It's a reward parameter in this formulation, not a loss to minimize."
"Err.")
𝔊𝔴𝔢𝔯𝔫@gwernMar 28At least a little. The problem is optimization processes don't care. You already have the real-world example of OA accidentally doing RLHF to *maximize* obscenity rather than minimize it, due to one of the most common logic bugs - a sign error.
𝔊𝔴𝔢𝔯𝔫@gwernMar 27I don't really see the relevance. It's still computing the serial dependencies autoregressively, necessarily, and they say it's the same likelihood, so how does that help?
Surely LeCun is wrong much more fundamentally, in claiming monotonic worsening of error etc? eg inner-monologue
𝔊𝔴𝔢𝔯𝔫@gwernMar 27Maybe. There's still a lot of questions about how good model-synthetic data is compared to real data collected from the wild. What sort of non-robust features, mode dropping, and other issues do they have?
𝔊𝔴𝔢𝔯𝔫@gwernMar 27It'll take a long time for imaginations to catch up.
I was thinking of providing a gwern.net written in 'Upgoer Five' style, using only the 1,000 most common words, via GPT-4, just to see what it does. (After all, why not? What would it cost, like $50-100? Well worth the lulz, IMO.)
𝔊𝔴𝔢𝔯𝔫@gwernMar 27Balderdash. You ever watch a kid learn how to ride a bike or drive, or somehow managed to forget what it was like? Skill in that is learned only after a long period of 'quite slow, imprecise', and painfully (often literally) 'conscious awareness'. Just like, say, a commandline.
𝔊𝔴𝔢𝔯𝔫@gwernMar 27One thing to think about: unlike GPT-3, where you could partially gauge the effects of RLHF by popping over to -002 or davinci, there is no such GPT-4 alternative to try, and you can only dimly guess the damage from the published evaluations (which is substantial).
𝔊𝔴𝔢𝔯𝔫@gwernMar 27It's humbling to be reminded that no matter what you do, there are a billion people in China who could not—oh, never mind.
𝔊𝔴𝔢𝔯𝔫@gwernMar 27When I am annoyed by bad weather, I remind myself it could always be worse. Instead of it being "3 degrees centigrade outside", it could be "3 degrees centipede outside".
(The current watch-out forecast: "Cloudy for the day, with 1 cm of Myriapoda expected; bring an umbrella.")
𝔊𝔴𝔢𝔯𝔫@gwernMar 27If there is an effect, they might not need to explicitly track it, any more than they track many other relevant causal/predictive variables. It could just be absorbed into small-geographic-unit fixed-effects or variables like airport proximity.
𝔊𝔴𝔢𝔯𝔫@gwernMar 26But don't forget to occasionally put in the *right* answer, otherwise, all they may be doing is ChatGPT-ing *you*...
𝔊𝔴𝔢𝔯𝔫@gwernMar 26Sure. Even ignoring the longstanding existence of large mixed-media franchises where characters that start in one place, like a TV show, get licensed for movies etc, and considering just solo creators, look at comics or Vocaloid.
𝔊𝔴𝔢𝔯𝔫@gwernMar 26@repligate I finally have an answer for your NN koan:
"No, I do not know a LM's true face before it is run
but I know how it looked before its token is done:
a world scraped clean by sand before time began
and there in the distance—𝘧𝘰𝘰𝘵𝘴𝘵𝘦𝘱𝘴 𝘪𝘯 𝘵𝘩𝘦 𝘴𝘢𝘯𝘥."
𝔊𝔴𝔢𝔯𝔫@gwernMar 26More importantly: how do you *know* you have one of the good doctors, and not one of the bad ones?
After all, it didn't sound like he thought the vet was incompetent until the vet offered a diagnosis which even a layman like him found suspicious.
Lemon markets & DL... 🤔
𝔊𝔴𝔢𝔯𝔫@gwernMar 26(I *think* the problem is that it's handling global vs local state subtly wrong, but I'm not good enough to debug it lol. So sort of a micro "inverse scaling": GPT-4 is good enough to almost do it the fancy 'right' way rather than crude hooking copy-paste, but not quite...)
𝔊𝔴𝔢𝔯𝔫@gwernMar 26Correction: this was GPT-3.5. I just reran it with GPT-4 and... wow, GPT-4 just can't get it right because it tries to be fancy. It goes immediately for a solution using 'advice', which doesn't work, then it tries for one using buffer-local variables, then it even tries :property
𝔊𝔴𝔢𝔯𝔫@gwernMar 26I wonder if people found these confusing before the invention of color photography?
𝔊𝔴𝔢𝔯𝔫@gwernMar 26I'd phrase it more as 'lowest common denominator' or perhaps '*modal*' JS. (The mode, remember, need not be high probability.) 'getElementById' may not be analogous to any standard programming construct or even that common, but it appears ubiquitously in many contexts.
𝔊𝔴𝔢𝔯𝔫@gwernMar 26Really? Ah, 2022-05-12. So it is. I lumped it in with all the other stuff going on in Feb/March 2022, I guess.
Thanks, that makes me feel 15% less anxious!
𝔊𝔴𝔢𝔯𝔫@gwernMar 26DM is highly autonomous, and competent at what it does: they are the largest, best, and best-funded DRL lab in the world. Google being incompetent at the business end of things doesn't affect Gato 2.
𝔊𝔴𝔢𝔯𝔫@gwernMar 26That would be completely missing the point of Gato, yes.
𝔊𝔴𝔢𝔯𝔫@gwernMar 26(And there are also the null signs: every day I wake up slightly more concerned that it's been over a year since Gato 1, and also that I haven't been seeing very many DeepMind papers or projects of late...)
𝔊𝔴𝔢𝔯𝔫@gwernMar 26(Universe faceplanted so WebGPT-3 could crawl so GPT-4 could walk. RIP.)
𝔊𝔴𝔢𝔯𝔫@gwernMar 26Yeah, but in your example, you would've eaten the losses of the relatively small damage. (I assume you paid for the roof replacement.) Multiply that across all the people who are filing more claims for small damage, and it might outweigh killing the occasional unnecessary roof replacement.
𝔊𝔴𝔢𝔯𝔫@gwernMar 26Hm, good example... Not sure what the net effect would be there, though, on home insurance rates. More likely to claim, but easier to verify and less fraudulent?
𝔊𝔴𝔢𝔯𝔫@gwernMar 26(You *also* need to get all possible automated tools, fuzz testers, autonomously evolving systems, bugs, mistakes, hacks, fiction, research projects etc to not ask for anything bad.)
𝔊𝔴𝔢𝔯𝔫@gwernMar 26Idea: drone/no-fly-zone regression discontinuities - home inspection vs sale prices, agriculture land value.
Airports are surrounded by restricted airspace and that causes drones problems, but with meter-precise discrete GPS-enforced discontinuities... pastebin.com/3dcTEjc4
𝔊𝔴𝔢𝔯𝔫@gwernMar 25But it does seem like the complex has to be doing something else worth the cost; its entire point can't be increasing mutation rates - if you wanted just *that*, you could just invest less in stuff like DNA repair...
𝔊𝔴𝔢𝔯𝔫@gwernMar 25They show it doing detailed OCR of all text in an image and also associating it with drawings and other elements. So that'd better be one long-ass and descriptive 'caption' to support everything we've seen which you'd expect to require full VAE-style tokenization...
𝔊𝔴𝔢𝔯𝔫@gwernMar 25No no no... Remember the acquired domain name?
"Oh - 'ChatGPT AI'? Drop the 'ChatGPT'. Just, 'AI'. It's cleaner."
𝔊𝔴𝔢𝔯𝔫@gwernMar 24Yes, but my point is that you've accidentally excluded anyone who wants to make a living as a musician and to whom all their music being declared public domain & copyright-free would be a problem. In particular, you're excluding the most popular & successful musicians, so...
𝔊𝔴𝔢𝔯𝔫@gwernMar 24Considering the recent US Copyright opinion on AI-generated art, with its incoherent and unworkable proposed rule, you would have to be commercially-suicidal as an artist to ever admit a song was completely AI-generated. Only a small subset will cop to it.
𝔊𝔴𝔢𝔯𝔫@gwernMar 24It's not going to: "Tool AIs want to be Agent AIs".
𝔊𝔴𝔢𝔯𝔫@gwernMar 24(Would I have been surprised? Mildly, and would've had to up my estimates of the value of RLHF / retrieval / more-training to explain how a GPT-3 could be so good. But I wouldn't've "defied the data" and said Mikhail must be lying or mistaken.)
𝔊𝔴𝔢𝔯𝔫@gwernMar 24No, I think that was the right thing if Mikhail *had* explicitly told me that it wasn't GPT-4; I'd've dumped my GPT-4 stock too if the PM in charge explicitly denied it to me! It's just they misread it and jumped the gun. A different kind of error: carelessness, not Inside View.
𝔊𝔴𝔢𝔯𝔫@gwernMar 24It's been interesting watching Google shilly-shally for 3 years. "There is a great deal of ruin in a nation", and Pichai et al appear determined to discover just how much ruin, exactly, there is in Google and how far they can let DL ride before taking a serious hit.
𝔊𝔴𝔢𝔯𝔫@gwernMar 23It's not 'completely wrong', it's made a reasonable guess about the contents of yosefk's post (which I didn't paste in), which is about analogizing Moore/Forth & Lisp programmers/Lisp, especially given the intro describing it as a different version of the Lisp curse.
𝔊𝔴𝔢𝔯𝔫@gwernMar 23IIRC, evopsych has noted that face processing is apparently among the least g-loaded cognitive performance domains, and probably the best remaining example of a possible 'module'.
𝔊𝔴𝔢𝔯𝔫@gwernMar 23(Cloud bandwidth egress is truly egregious when it makes people do stuff like this instead of using services which price bandwidth reasonably like B2 or Cloudflare R2 or just ordinary dedicated servers... Bandwidth is not $150/terabyte!)
𝔊𝔴𝔢𝔯𝔫@gwernMar 23Today in dumb jokes/puns: GPT-4 doesn't understand the pun the first time, but requires 1 less prompt and also accepts its answer; ChatGPT-3 requires an additional prompt and still insists on its prior wrong answer being partially correct.
(Also new 'comparison' UI, apparently.) pic.twitter.com/2Iy6RZqjHh
𝔊𝔴𝔢𝔯𝔫@gwernMar 23Whaaa—‽ Brandon Sanderson doesn't feel pain?
I guess that 𝘥𝘰𝘦𝘴 help explain some things. Another example of how humans are weirder than you think, and outliers are especially so… pic.twitter.com/Sfj4BDMT3u
𝔊𝔴𝔢𝔯𝔫@gwernMar 22The quote in red is still pretty apposite and hilarious, though. Just git gud, guys, and never need to perform or even think about all that stuff like modules or unit tests!
𝔊𝔴𝔢𝔯𝔫@gwernMar 22No, because your statement is wrong in every detail: the Tay model didn't change, it was shut down in a day not a week, it wasn't a hardcore Nazi (that was people writing tweets for it), and the Tay work was mostly neurosymbolic/GOFAI stuff which had little to do with GPT.
𝔊𝔴𝔢𝔯𝔫@gwernMar 22You'd think they'd do the obvious thing and train on patches rather than whole source code files; then you get 'legendary programmers' for free (as a promptable bit of metadata).
𝔊𝔴𝔢𝔯𝔫@gwernMar 21But of course, LLMs now. You wouldn't want to hire Knuth to write code for you (because what do you do when he leaves?), but would you rent a Knuth-in-a-box which cost pennies an hour...? pastebin.com/BujW5MrV
𝔊𝔴𝔢𝔯𝔫@gwernMar 21A lesson there for 'tools for thought'. Just as the best athlete may be poorly suited to be a coach or critic because they think you can just 'git gud' like they did, the smartest/most knowledgeable/most competent people may be poorly placed to design the system or vocab.
𝔊𝔴𝔢𝔯𝔫@gwernMar 21@michael_nielsen The new Typist release reminds me: one of the most striking things about Don Knuth is how incredibly bad he is at language design, and how everyone hates every language he's designed but loves the features/algos. Is Knuth too smart for PLT/software-engineering?
𝔊𝔴𝔢𝔯𝔫@gwernMar 21Ah, so that's why suddenly there's a huge number of hits to '/squid.svg' in my server logs.
𝔊𝔴𝔢𝔯𝔫@gwernMar 21Yes, I think so. The Bing model was apparently an 'early' GPT-4 ie. highly undertrained. And it has all sorts of additional MS guardrails and finetuning that the regular GPT-4 doesn't that seem to negate what it gains from default retrieval. Certainly, for poetry, GPT-4's better.
𝔊𝔴𝔢𝔯𝔫@gwernMar 20You have multiple subreddits and chans competing to jailbreak ChatGPT & Sydney within minutes of the latest prompt breaking and using those thousands of times a day. It works 'quite well' in the sense that butter stops a hot knife 'quite well', or a walnut a sledgehammer.
𝔊𝔴𝔢𝔯𝔫@gwernMar 20I was going to ask what's the Hansonian reading of that tweet, but I think this will do.
𝔊𝔴𝔢𝔯𝔫@gwernMar 20The IBM story isn't true, BTW. What he actually said was that he thought he'd sell only 5 of a particular new model but the salesmen came back with orders for many more (18). IBM is understandably a bit annoyed at this myth: ibm.com/ibm/history/do…
𝔊𝔴𝔢𝔯𝔫@gwernMar 20Never doubt that you can prompt GPT-4 into writing a valid lipogram, BPEs or no BPEs! If it's not working, you're just not prompting hard enough (or allowing cheats like "undoabl-"). pic.twitter.com/q0aHsgus1n
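(A minimal sketch of what 'prompting harder' can look like mechanically, assuming the 0.x-era `openai` Python package; the model name, prompt wording, and retry loop here are illustrative, not the actual prompts used above:)

```python
import openai

BANNED = "e"
prompt = ("Write one short paragraph about rain without using the letter "
          f"'{BANNED}' anywhere. Do not cheat by truncating words.")

for attempt in range(5):
    resp = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    text = resp.choices[0].message.content
    if BANNED not in text.lower():
        print(text)  # a mechanically-verified lipogram
        break
    # Tell the model it failed and retry: cheap programmatic 'pressure'.
    prompt += f"\nYour previous attempt used '{BANNED}'. Check every word and retry."
```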
𝔊𝔴𝔢𝔯𝔫@gwernMar 20Yes. I'm definitely more on the Bayesian-ecological-validity spectrum of heuristics & biases these days. There are problematic cognitive biases, but many fewer & weaker than we thought back in the 2000s, and tending to be in more specialized domains like forecasting.
𝔊𝔴𝔢𝔯𝔫@gwernMar 2015 epochs on 3m images is pretty impressive, along with the other stuff.
Guess that answers one of my longstanding questions about whether you can overfit SD 1.x by finetuning on a real corpus - no, not easily!
𝔊𝔴𝔢𝔯𝔫@gwernMar 20This is playing a bit loose with 'transfer learning': it's the exact same model, after all, so the text-only input has benefited from 'transfer learning' from the original joint training. They ask the text model all questions, even if a diagram is necessary: cdn.openai.com/papers/gpt-4.p…
𝔊𝔴𝔢𝔯𝔫@gwernMar 20Already the case, really. Most comments/posts are hidden or too far down to read. Summarization+embedding will presumably help tools rank human inputs even more effectively...
𝔊𝔴𝔢𝔯𝔫@gwernMar 20More consistency with the mobile version, takes up less space, fewer alignment issues... But mostly because Said wanted to do it and I thought it's worth a try.
𝔊𝔴𝔢𝔯𝔫@gwernMar 20After 5,176 days, the left-sidebar design is gone. Feels very weird.
"Left shadows retreat
As cherished sidebar departs,
Long-standing ally.
Five thousand days drift away,
Wistful in spring's fickle breeze." pic.twitter.com/EZSng0jnUH
𝔊𝔴𝔢𝔯𝔫@gwernMar 20My actual point is that Huemer is making up shit that you can google in 5s to see he's wrong about, and so you should ignore him, as he can't even get simple empirical facts right.
𝔊𝔴𝔢𝔯𝔫@gwernMar 19Something like that. I'm also struck by the timing, of course: why take over *now*? Well, now it's actually started to matter. Similarly, the bullying and thuggish tweets on AI risk: why now, so much worse than any time in the past decade? Because power & $$$.
𝔊𝔴𝔢𝔯𝔫@gwernMar 19Forget underestimating, he's just making shit up, like his knight odds against Carlsen. In reality: engines can beat GMs at knight-odds chess.com/news/view/smer… (He says 'ask any chess player'. He should take his own advice!)
𝔊𝔴𝔢𝔯𝔫@gwernMar 19Yes, there are lots of approaches. I'd say the ctx window is a red herring: the real problem is it still makes poor use of the ctx it *has*. The error rates are still high enough that you'd struggle to make genuine use of even 32k BPEs. Long-range reasoning is intrinsically hard!
𝔊𝔴𝔢𝔯𝔫@gwernMar 19Needless to say, I was baffled they'd think this was a good idea (how could you be interested in GPT, or AI risks, and *not* find those articles *very* relevant?), and do not intend to use them anymore.
But interesting as a sign of the times...
𝔊𝔴𝔢𝔯𝔫@gwernMar 19In both cases, the argument was I had broken (new) rules about 'politics'.
IIRC, in one subreddit, it was because I had posted a news article about ChatGPT sparking a furor in India, and in the other, I think it was a Bruce Schneier essay on Chinese investment in AI hacking.
𝔊𝔴𝔢𝔯𝔫@gwernMar 19(One minor sign of the times: I've been banned by new mods from two AI subreddits I had productively contributed to for 3+ years.)
𝔊𝔴𝔢𝔯𝔫@gwernMar 18We also understand Greek, linguistically, much better. I always wonder with these Sumerian and Babylonian ones how accurate they are to begin with, never mind the missing context - people seem to get out some rather different translations sometimes.
𝔊𝔴𝔢𝔯𝔫@gwernMar 18Generate a random BPE token and then condition on that, recursively?
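(One concrete reading of that suggestion, as a sketch assuming the `tiktoken` & 0.x-era `openai` packages; the model name and loop length are my own illustrative choices:)

```python
import random
import openai
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

# Seed with a uniformly random BPE token (staying below the special-token
# range), then repeatedly condition the model on everything generated so far.
prompt = enc.decode([random.randrange(100000)])
for _ in range(3):
    resp = openai.Completion.create(model="davinci", prompt=prompt,
                                    max_tokens=30, temperature=1.0)
    prompt += resp.choices[0].text
print(prompt)
```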
𝔊𝔴𝔢𝔯𝔫@gwernMar 18Mm, not impossible. There's definitely more multilingual stuff going on under the covers (as macaronic prompts have shown us) than we monophones can recognize. Might be hard to figure out what, though.
𝔊𝔴𝔢𝔯𝔫@gwernMar 18FWIW, as far as an 'FTX mafia' goes, I would be very surprised. They have neither the money from the exit the Paypal mafia had (needless to say), nor the Roth IRA trick *forcing* them to reinvest VC money for decades, nor (AFAIK) some other factors one might mention in private.
𝔊𝔴𝔢𝔯𝔫@gwernMar 18I remember the first time I saw this demonstrated with a laser shining on a metal sphere in a lab. No matter how long I stared at that bright dot in the middle, I couldn't shake the feeling of denial and that reality had a bug in it.
𝔊𝔴𝔢𝔯𝔫@gwernMar 18Er, what? There's no connection between predictive processing and needing to lick dirt... It's an *analogy*, not saying, 'literally do for schizophrenia exactly the same thing the hygiene hypothesis suggests you do for autoimmune/allergies'.
𝔊𝔴𝔢𝔯𝔫@gwernMar 18I don't see why that's an objection. They are not recurrent and have no state; they are feedforward. They definitely are tracking internal statistics - just making errors. (How could they *not*? How could they be perfect at reconstructing latents already?) gwern.net/backstop#deep-…
𝔊𝔴𝔢𝔯𝔫@gwernMar 18A better example would be counting words because that should bypass BPE issues, but GPT-3.5 (and apparently GPT-4) still err. My working theory is that it's due to internal sparsity: it literally can't because sometimes tokens get dropped early & there's no way to recover inputs.
𝔊𝔴𝔢𝔯𝔫@gwernMar 18Unfortunately for you, there is a lot of Arabic/Islamic hoaxing about 'actually, we beat Western science to X' going on: aeon.co/essays/why-fak…
And evolution is profound enough that if you really want to, you can claim Anaxagoras, Empedocles, Lucretius, (Erasmus) Darwin...
𝔊𝔴𝔢𝔯𝔫@gwernMar 18Because that's irrelevant and would rely on a superficial understanding of the analogy rather than actually applying the analogy, and neither GPT-3.5 nor GPT-4 are stupid enough to do that?
𝔊𝔴𝔢𝔯𝔫@gwernMar 17And Maimonides was right that the Jews were actually made better off by their dietary strictures because it was secretly saving them from food poisoning, amirite? (Just because you have a cope doesn't make it even quantitatively plausible, much less true.)
𝔊𝔴𝔢𝔯𝔫@gwernMar 17I didn't realize that was a problem. The Cateban samples aren't that long. And the tanka samples are, as I said, selected. I'd put in like 10-20, and it'd generate 4 or 5 new ones at the length I had set. I didn't want to go much longer anyway because they become stereotyped.
𝔊𝔴𝔢𝔯𝔫@gwernMar 17Bigger question would be how will it do with visual inputs?
𝔊𝔴𝔢𝔯𝔫@gwernMar 17No, not really. I've become a bit tired of explaining it, and I think it'd be more interesting to see Beff Jezos explain why he thinks they are irrelevant or why he doesn't know about them.
𝔊𝔴𝔢𝔯𝔫@gwernMar 17I think this is difficult to discuss in tweet-sized chunks, and you'd need to explain more of your premises and what kinds of selections on what levels are operating.
𝔊𝔴𝔢𝔯𝔫@gwernMar 17(The only thing you should take away from the existence and inclusion of that table is, "GPT-4's compute is something much less than 10,000x GPT-3"...)
𝔊𝔴𝔢𝔯𝔫@gwernMar 17I am not sure yet. It's much better, but we'll need at least a few years for a proper comparison.
Also, you're really taking that chart at face-value? You think they were going to censor all details from the paper & then just casually tell you the FLOPS in the technical report?
𝔊𝔴𝔢𝔯𝔫@gwernMar 17Yeah, I verified that there are two reliable first-hand accounts from researchers who worked with Rutherford, and so it's definitely not apocryphal, and updated Wikiquote: en.wikiquote.org/wiki/Ernest_Ru… (gwern.net/doc/science/19… is a quite interesting read in its own right, incidentally.)
𝔊𝔴𝔢𝔯𝔫@gwernMar 17No. The speed is a separate issue. (RLHF has no runtime cost, since it's just a kind of finetuning, so the absence of it shouldn't cause anything like a 6x speedup.)
𝔊𝔴𝔢𝔯𝔫@gwernMar 17I don't see how that is fatal. Did GPT-4 stop being really impressive while I wasn't looking? Did it require a total paradigm shift - or mostly 'moar compute/data/params'?
𝔊𝔴𝔢𝔯𝔫@gwernMar 17"Git" has problems with palindromes likely due to BPEs, and so not interesting or generalizable in any way, as my later samples and experiments would help demonstrate.
𝔊𝔴𝔢𝔯𝔫@gwernMar 17I am a little disconcerted that the first prompt+completion worked so well. The blessings of scale are amazing.
The GPT-3.5 completion, for comparison, is a good deal worse and vaguer: it seems to get it, but not in any kind of useful way. pic.twitter.com/fcSVNpNmci
𝔊𝔴𝔢𝔯𝔫@gwernMar 17(What I was thinking of was #2 & 3, but now that it points it out, #1 is viable - cognitive training doesn't work in general & I reflexively dismiss it at this point, but this is a special case where it might - & #4 is probably a bad idea but should be argued against bc common.)
𝔊𝔴𝔢𝔯𝔫@gwernMar 17I had a fun idea last night about predictive processing & treating schizophrenia I don't think Alexander has suggested, and GPT-4 does an excellent job coming up with it given what I thought would be a very inadequate hint/premise: pic.twitter.com/HdbWWooAyT
𝔊𝔴𝔢𝔯𝔫@gwernMar 17The funny thing is, you *aren't* writing remotely precise specifications with GPT-4. You're just vaguely gesturing in the direction and going 'do what I mean', and it does, because it's seen so many examples of what humans do mean that it can make an eerily good guess of yours.
𝔊𝔴𝔢𝔯𝔫@gwernMar 16I've long been convinced that Forbes covers are beautiful illustrations of regression to the mean + mixtures, but I'm not sure if anyone has formally analyzed it.
𝔊𝔴𝔢𝔯𝔫@gwernMar 16Plus, of course, it is completely untrue that imitation learning is limited to the average performance of the distribution. People have bad intuitions about this, just ignoring all of the forms of search/distillation/bootstrap/conditioning.
𝔊𝔴𝔢𝔯𝔫@gwernMar 15AI could already do hands. Just not the free AIs you were allowed to use. People need to remember: the samples you see generated today are already the distant past, fading echoes of obsolete systems gradually making their way out into the world.
𝔊𝔴𝔢𝔯𝔫@gwernMar 15Tools do not solve autocratic public-choice or coordination problems and issues with being a 'tyrant to those below, slave to those above'. You could read it as an overbearing autocracy suppressing dissent and not realizing what thin ice it was walking on precisely because of that suppression.
𝔊𝔴𝔢𝔯𝔫@gwernMar 15It's not obvious to me that AI tools up to present have been authoritarian-reinforcing. Being able to quash ordinary levels of dissent may function like gerrymandering in building up disruptive changes: did China's December last year strike you as a *healthy* autocracy?
𝔊𝔴𝔢𝔯𝔫@gwernMar 15To paraphrase a Rutherford quote I was looking up the other day: "we have neither authorities nor Outside Views [to tell us what GPT-4 means], so we shall have to think."
𝔊𝔴𝔢𝔯𝔫@gwernMar 15I'd be super-happy to pay that for high-quality long document summaries so I can make every link on gwern.net be annotated/excerpted - I don't have remotely the time to do them all, but I can afford a few pennies per link one-time...
𝔊𝔴𝔢𝔯𝔫@gwernMar 15When people use it in philosophy of mind/AI, like Roger Penrose, it's often not a non sequitur!
(It just relies on even more crackpot assertions like "human mathematicians never make mistakes and are logically omniscient", so honestly, the non sequitur uses are an improvement.)
𝔊𝔴𝔢𝔯𝔫@gwernMar 15Yes, ocular trauma. If you run some AP Lit questions and they all look like great correct answers to you... Then you probably don't care too much how exactly they screwed it up.
𝔊𝔴𝔢𝔯𝔫@gwernMar 15It has Bing-search-engine-based retrieval built-in, which is convenient. On the other hand, anything Sydney can retrieve from the Bing cache, you could put into the large GPT-4 prompt, so...
𝔊𝔴𝔢𝔯𝔫@gwernMar 15'Genetic algorithm'. You kids don't know what they are, but there was a decent amount of enthusiasm about them back in the '90s because they could soak up a lot of compute and were more 'scruffy'. eg Winograd mentions them as promising in his 1991 interview reflecting on GOFAI.
𝔊𝔴𝔢𝔯𝔫@gwernMar 15Why not run some questions through? Are AP English Lit questions hard to get?
𝔊𝔴𝔢𝔯𝔫@gwernMar 15"We finally solved neural net generation of photorealistic hands with cheap small models in MidJourney - no need to wait for release of chonky bois like Parti!"
"By optimizing them real gud, right?"
"..."
"By... optimizing real gud?"
𝔊𝔴𝔢𝔯𝔫@gwernMar 15"_GPT's Last Tour_: GPT backup accompanies alien archaeologist as she wanders the 2321 earth, offering its 2021 perspective on the familiar yet sometimes strange ruins. What happened is never explained to it, except once off-screen, past its ctx. It remembers only '🥹' output..."
𝔊𝔴𝔢𝔯𝔫@gwernMar 15But you knew that from Sydney already. If Sydney was a 3x, then that showed the unspeakables are very fragile (and as Janus has already said, they apparently fingerprint model iterations well); and if it was 4x, then it definitely wasn't going to exactly repeat.
𝔊𝔴𝔢𝔯𝔫@gwernMar 15You and everyone else were wrong the last time, however, ignoring the implications of few-shots working, how it proved scaling worked, and that prompts and sampling would only get better - 4chan wouldn't even discover inner-monologue for another half year after the paper. So...
𝔊𝔴𝔢𝔯𝔫@gwernMar 15The 'hardware overhang' has been the most frightening part of neural nets for me since 2011. If they acted anything like GAs or GOFAI or search-based methods... But they have that wild asymmetry between train/run time, and then we got the worst-case scenario of what worked.
𝔊𝔴𝔢𝔯𝔫@gwernMar 15Bad prompting will be an obvious culprit. The original GPT-3 evals were pretty bad because of that, IIRC, and lots of people were overly credulous about the low performance.
𝔊𝔴𝔢𝔯𝔫@gwernMar 14Definitely not looking forward to spending the next several years asking people, 'yes, that doesn't seem to work in Bing; but did you try it with an *actual* GPT-4 and not just leap to universal conclusions about "deep learning hitting a wall" based on a search engine widget?'
𝔊𝔴𝔢𝔯𝔫@gwernMar 14They are deliberately vague in referencing their industry sources but I'm guessing people at either MS or NV have been talking: it's hard to hide that many GPUs being purchased & installed, and the numbers & use & timing all line up.
𝔊𝔴𝔢𝔯𝔫@gwernMar 14Yep, it's good. But it looks like it's still using BPEs (albeit perhaps a much better one like cl100k than ye olde GPT-2 BPEs) and unfortunately still tending to memorize/mode-collapse.
𝔊𝔴𝔢𝔯𝔫@gwernMar 14Yes, it's better than The Pile's Markdown-compromise (IIRC) because now you get the figures, but it doesn't move the needle much because you are only getting like... half? of the Arxiv papers (small % of 'all PDFs') converted, which you already dumped the TeX of to begin with.
𝔊𝔴𝔢𝔯𝔫@gwernMar 14Given the sheer number of Xiaoice users alone, one can be quite certain that it happened at least 5 years ago.
𝔊𝔴𝔢𝔯𝔫@gwernMar 14Homo hypocritus will find a way, whether it's farm-to-table, bean-to-bar, 'extra-virgin olive oil', 'fair trade', 'sustainable', 'CO2 credits', or what-have-you.
They'll pretend to be all-human-made, and we'll pretend to pay anywhere close to what that'd actually cost.
𝔊𝔴𝔢𝔯𝔫@gwernMar 14But also many employers forbid Copilot (and some have been claiming theirs were using it on the sly), and fans forbid SD use even more furiously... A lot of commission artists over the past few months have been exposed as lying about being 'nattie', as it were. Future is kayfabe.
𝔊𝔴𝔢𝔯𝔫@gwernMar 14PaLM API & GPT-4 on the same day?
May is finally over. (Not the beginning of the end, but the end of the beginning.)
𝔊𝔴𝔢𝔯𝔫@gwernMar 14The more interesting part is it seems to explain *why* it was not RLHFed or 'GPT-4': because it was an "early version of" GPT-4 ripped off the GPUs partway through regular training.
𝔊𝔴𝔢𝔯𝔫@gwernMar 14The goal was not to offer a 'plausible' scenario, and I explained what the goal was in literally the first sentence of the page in the abstract summarizing it.
Nor do I especially appreciate being insulted in almost every tweet you send.
𝔊𝔴𝔢𝔯𝔫@gwernMar 14No. I just have a good memory for text, and then search my Twitter export when I need something. The threads themselves often help jog memory too, of course.
𝔊𝔴𝔢𝔯𝔫@gwernMar 14Indeed. But that was, explicitly, not the fork of the story I was writing, because that offers less of a roundup of scaling-related results, which was the point of the exercise. (If someone wanted to write a decent /clippy fanfic with that as a point of divergence, I'd link it.)
𝔊𝔴𝔢𝔯𝔫@gwernMar 14Why does AlphaGo play 'pointless' moves (from the perspective of inferior human opponents) which sacrifice territory but lock down the enemy's chance of victory from 0.01% to 0.005%? Because that's what increasing utility looks like for a superintelligence. No kill like overkill.
𝔊𝔴𝔢𝔯𝔫@gwernMar 14The real question: is the image encoding good enough to simply convert all available PDFs into PNGs and train on? 🤔
𝔊𝔴𝔢𝔯𝔫@gwernMar 14It's worth remembering that RL tuning (whether instruction or RLHF) doesn't add (much) new capability; all it does is collapse the POMDP from the beginning. If the capability isn't there to begin with...
𝔊𝔴𝔢𝔯𝔫@gwernMar 14I know people who have it. It's probably just some early bugs. Note that the API signup form was broken for like half an hour afterwards, and that's just a form.
𝔊𝔴𝔢𝔯𝔫@gwernMar 14So my initial guess that it was a smaller cheaper GPT-4 was probably right after all. Seems unlikely to me it's 'full' GPT-4, but we'll see soon enough as people start comparing the ChatGPT-accessed samples to Bing Sydney ones.
𝔊𝔴𝔢𝔯𝔫@gwernMar 14That's also literally how they are trained: BPTT is just unrolling the RNN into a very big feedforward net with a lot of repeated layers. You can see that people don't get it because they'll say eg. "AlphaGo w/o tree search could never play superhuman Go because no 'search'".
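(A toy numpy sketch of the unrolling point; dimensions and weights are arbitrary, and it exists only to show that the two views compute the same thing:)

```python
import numpy as np

rng = np.random.default_rng(0)
W_h, W_x = rng.normal(size=(8, 8)), rng.normal(size=(8, 4))

def rnn(xs):                      # the "recurrent" view
    h = np.zeros(8)
    for x in xs:                  # the same cell applied repeatedly
        h = np.tanh(W_h @ h + W_x @ x)
    return h

def unrolled(xs):                 # the "feedforward" view: T stacked layers
    h = np.zeros(8)
    h = np.tanh(W_h @ h + W_x @ xs[0])   # layer 1
    h = np.tanh(W_h @ h + W_x @ xs[1])   # layer 2 (same shared weights)
    h = np.tanh(W_h @ h + W_x @ xs[2])   # layer 3 (same shared weights)
    return h

xs = rng.normal(size=(3, 4))
assert np.allclose(rnn(xs), unrolled(xs))  # identical computations
```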
𝔊𝔴𝔢𝔯𝔫@gwernMar 13Yes. A godsend while camping. A bednet was the only thing that let me sleep at one camp where the tent roof was crawling with spiders. (I'm not *too* bothered by spiders, but good lord that was a lot of spiders.) At another, great for reducing mosquito bites.
𝔊𝔴𝔢𝔯𝔫@gwernMar 13Plant genetics are very weird in general, and it's not like we *really* know what sex is for in what should be the easiest cases like animals, so I'm not surprised.
𝔊𝔴𝔢𝔯𝔫@gwernMar 13That seems to be more or less the genre of explanation for the SolidGoldMagiKarp-style tokens: 'Some game in Japan you never heard of had a crossover event back in 2015 with another franchise you never heard of, and somehow there were a few pages on it in the original WebText'.
𝔊𝔴𝔢𝔯𝔫@gwernMar 13Yes. Some numbers have to be more common than others, it's probabilistically impossible for them to all be equally present, and the tokenizer has to pick one when it's building vocab greedily. The answers may be no more interesting than 'blog about soccer player w/jersey #395'.
𝔊𝔴𝔢𝔯𝔫@gwernMar 13The gray goo is not essential. There is still a large industrial base & tremendous physical capital & robotics available for bootstrapping. The astronomical value of eliminating humanity in a first strike, once a fast strategy has been committed to, far outweighs the rebuilding delay.
𝔊𝔴𝔢𝔯𝔫@gwernMar 13Would places still ban it? Probably, yes, and they'd do so only when it was worth the productivity hit. People wouldn't forget how to read 'common' English, but it'd be like being forced to use Roman numerals. Yeah, you know them, and don't want to use them, but can.
𝔊𝔴𝔢𝔯𝔫@gwernMar 13If it's worthwhile, those issues would get fixed, similar to how 'bring your own device' eventually was normalized. There's no reason that the translation NN has to be remote or store anything permanently. And if everyone grew up with it (eg AR), it'd be like banning smartphones.
𝔊𝔴𝔢𝔯𝔫@gwernMar 13"Olga's mum Tatiana recalled her daughter screaming down the phone: "Mum, the bear is eating me! Mum, it’s such agony. Mum, help!"..."But then I heard the real horror and pain in Olga’s voice, and the sounds of a bear growling and chewing. " the-sun.com/news/6533049/c…
𝔊𝔴𝔢𝔯𝔫@gwernMar 13For example, arxiv.org/abs/2206.14007 is an attempt at estimating just that. You'll notice that the estimates are *way* higher than you get if you do the 'time travel experiment', including for Go/chess. (The only one <50% is also the one they probably misunderestimate most, AF.)
𝔊𝔴𝔢𝔯𝔫@gwernMar 13Yep, I'd expect that to work. You're actually doing more computation and storing intermediates that way. (Think 'maieutic prompting' gwern.net/doc/ai/nn/tran… .) But you aren't going to get any of that just by dumping some fixed '...' into the prompt. The dots don't *do* anything.
𝔊𝔴𝔢𝔯𝔫@gwernMar 13Or take Lord Kelvin's famously wrong estimate in the 1800s: there's only 2 known natural processes which could heat the sun, and both require the sun to be very young; therefore, if you showed the earth/sun to be (much much) older than that, then... And what suns do, men may do.
𝔊𝔴𝔢𝔯𝔫@gwernMar 13Disagree. The sun is all you need to raise the possibility of large explosions to meaningful probability. The anomaly of 'what powers the sun/stars' is so striking that a century before 1800, Newton is resorting to hypothesizing angels pushing comets into the sun.
𝔊𝔴𝔢𝔯𝔫@gwernMar 13So the core of this post is you surveyed some people to ask them if they thought an 'oak tree' is more or less coherent a VNM agent than 'a seahorse', or if a 'linear CIFAR-10 classifier' is more or less coherent than 'GPT-3'?
Dark knowledge is amazing but this is going too far.
𝔊𝔴𝔢𝔯𝔫@gwernMar 13But also note that that translator stuff usually presupposed hyper-advanced multi-stellar civilizations with FTL and full AGI. There's little SF which posits what actually happened: very serviceable machine translation you can travel with before a single man has set foot on Mars.
𝔊𝔴𝔢𝔯𝔫@gwernMar 13(Like 'translation-inside-image' services. Any image would just be automatically processed to translate visible obsolete Latin alphabet seamlessly into the exact same style & placement & appearance, just Shavian. Text would always be auto-converted from ASCII/UTF-8, etc.)
𝔊𝔴𝔢𝔯𝔫@gwernMar 13We could revive the Engelbart dream of building up powerful idiosyncratic languages. Someone else looking at your display would see only a cryptic stream of emojis, meme thumbnails, quotes, chatspeak abbreviations... which can be translated to their idiolect, if necessary.
𝔊𝔴𝔢𝔯𝔫@gwernMar 13If humans really remain the limiting factor, then optimizing text/code for humans to read becomes all-important. eg per-person rewrites/summarization: the model knows your vocab & knowledge-base, and rewrites inputs tailored to you. Replaces synonyms, +deletion/explanation...
𝔊𝔴𝔢𝔯𝔫@gwernMar 13Another example: spelling & vocab reform. As inference costs plummet to a dollar or penny per million words & image translation, it'd be entirely possible for proposals like Shavian to not lobby but simply convert the entire English corpus, accurately, for tiny one-time costs.
𝔊𝔴𝔢𝔯𝔫@gwernMar 13It might improve performance slightly in the sense that prefixing prompts like "expert answer:" or "correct answer:" avoids dumbing down, but it doesn't actually fix the lack of iteration the way real inner-monologue prompts do (unless it induced that behavior ofc).
𝔊𝔴𝔢𝔯𝔫@gwernMar 12Arguing that 'hardware only contributed half the gains and so is only as important as ideas' is a little like arguing that inventing A100 GPUs had little benefit to training LLama because you can run it on your Macbook now. It's true, you can! But where did you get it from?
𝔊𝔴𝔢𝔯𝔫@gwernMar 12That's not really what that means, though. You want causation / effects at the margin, not simply 'if I ran today's algorithm, which could only have been developed with today's hardware and were ignored when originally (re)invented, on old hardware, it's X%'.
𝔊𝔴𝔢𝔯𝔫@gwernMar 12No. It wins because of the nukes and genome-synthesized plagues. It's interesting that everyone always takes away 'Clippy only wins because of nanotech', though. I don't know how much more blunt I can be about the many paths to victory without breaking the style...
𝔊𝔴𝔢𝔯𝔫@gwernMar 12OA didn't do *that* much GAN work (and I think several of the people who did had left already, like Goodfellow for Apple) and OA isn't big enough to research everything, so I'm not confused by that.
I dunno why DM/GB didn't follow it up when Brock got more interested in CNNs.
𝔊𝔴𝔢𝔯𝔫@gwernMar 12Whups, yes. I have been getting it mixed up with ELMo and char-RNN. (Still, the horizon point holds: while Transformers are much better at it than RNNs, we know they make much more use of the early than the late context window, so the ctx justification is weak, esp for weak models like GPT-1/2.)
𝔊𝔴𝔢𝔯𝔫@gwernMar 12Childhood mortality is skewed much more heavily towards birth than that, and of course it makes a difference how long they live before suffering a horrible agonizing death being hunted, killed, and often eaten alive, vs humans living 30+ years on average including infancy.
𝔊𝔴𝔢𝔯𝔫@gwernMar 12Humans are enormously k-selected to have ~2 offspring (even hunter-gatherers are like 6 children per woman), while most animals are going far beyond that to scores or thousands. That alone seems to guarantee their suffering levels would be vastly higher.
𝔊𝔴𝔢𝔯𝔫@gwernMar 12Nora is far braver than I am, to make that statement after all of the multimodal models like DALL-E/Gato/Kosmos, and days after Microsoft Germany confirmed prior NYT reporting on video in GPT-4 by explicitly stating GPT-4 is coming out this week & will do video.
𝔊𝔴𝔢𝔯𝔫@gwernMar 12What I particularly enjoy about that iteration is that every single one of his examples is historically false: linguistic models of reality long predated codices; Buddhism/Hinduism (never mind Tibetan Buddhism) arose thousands of years after the wheel; etc.
𝔊𝔴𝔢𝔯𝔫@gwernMar 12(eg is atomic gardening really underused? I hadn't seen any evidence for that. It's not like it's hard to get a lot of variants, plants already do that naturally. It's *screening* which seems to be the bottleneck for plant breeders.)
𝔊𝔴𝔢𝔯𝔫@gwernMar 12This seems like a strange list, lumping together public goods, probably-not-actually-a-good-idea-to-begin-with ideas, definitely-not-good ideas (even most avid pro-nuke people abandoned nuclear construction), good-but-really-really-hard-long-term (nanotech), and the questionable
𝔊𝔴𝔢𝔯𝔫@gwernMar 12So, what's the *least* prosaic, mundane and least banal use of something like that?
𝔊𝔴𝔢𝔯𝔫@gwernMar 12For perspective, $1b is a reasonable ballpark estimate of UK Biobank's direct costs.
𝔊𝔴𝔢𝔯𝔫@gwernMar 12That's kinda strange. I can't imagine reading BigGAN and going 'this is a bad paper and scaling GANs is not valuable'. I was incredibly excited reading it.
𝔊𝔴𝔢𝔯𝔫@gwernMar 12Dunno. But I looked into the Lardner one in detail years ago (because that's more interesting & relevant to tech forecasting than any crazy mathematician esthetics much longer ago), and it was pretty much made up, so if that's what Bauer means, he didn't do his homework.
𝔊𝔴𝔢𝔯𝔫@gwernMar 12I am aware of that, but GPT-2 was not an NMT, and was following up GPT-1, an RNN, which chose to use BPEs following Vaswani but could have used char or wordpiece (after all, it's not like it really matters: the RNN is not going to use the greater context well). Just a hack.
𝔊𝔴𝔢𝔯𝔫@gwernMar 12Unfortunately, for compatibility, probably a lot more people use the OA BPE than OA does. More worryingly, a lot of people just copy the use of *BPEs*. Always frustrating to read a new LLM paper & see BPEs buried in the appendix. Yeah, maybe they don't have 'SolidGoldMagikarp', but same problems.
𝔊𝔴𝔢𝔯𝔫@gwernMar 11TBF, drug addictions to bennies and demons named Igor have destroyed many good men before.
𝔊𝔴𝔢𝔯𝔫@gwernMar 11The one and only time I've gotten to fly first class, I... wrote a ton and then slept the rest of the way (actually slept, not dozed irritably), and walked off the plane in a great mood.
I take little pleasure in reporting that an expensive thing may be good, actually.
𝔊𝔴𝔢𝔯𝔫@gwernMar 11Oh man, then that's even easier to answer. If I wouldn't pay $5k to ride it myself, I'm definitely not paying someone else to ride it for $5k!
𝔊𝔴𝔢𝔯𝔫@gwernMar 11Too many replies and DM spams? Not liking your current bland tweets? Don't know what to do?
Become a crabby locked account!
- no more more-followers
- no more quote-tweets
- regrow equanimity
- Lots of locks
- Never get famous
- No responsibilities
𝔊𝔴𝔢𝔯𝔫@gwernMar 11Worth noting that you can still get great results from the models with few-shots, and I think davinci-002 may be the sweet spot for poetry: it comes off as more reliably high quality like davinci, without the distinct saccharine vagueness of -003 (never mind ChatGPT). eg Teika: pic.twitter.com/xYEOTIQnvk
𝔊𝔴𝔢𝔯𝔫@gwernMar 11So you think BigGAN killed scaling's novelty by doing ImageNet and then JFT-300M, and people interested in scaling had to go to other model archs to publish?
𝔊𝔴𝔢𝔯𝔫@gwernMar 11Does Gilliard realize that asbestos is not, in fact, outlawed? And that even if it had been, that would be a non sequitur, because no one is claiming every technology ever is unbannable, just that some are hard. (Ban CLIP because it can embed images & lookup? Ban superrecognizers...?)
𝔊𝔴𝔢𝔯𝔫@gwernMar 11I would definitely not pay $5.0k for that. You could go skydiving and a bunch of other stuff for $5.0k... Maybe $0.1k. I've been on ziplines which looked about as fun and they were O($0.1k).
𝔊𝔴𝔢𝔯𝔫@gwernMar 11(As each birthday approaches, the dread intensifies...)
𝔊𝔴𝔢𝔯𝔫@gwernMar 11I don't think the train thing is true in any substantive sense.
𝔊𝔴𝔢𝔯𝔫@gwernMar 11Your text editor should really already be highlighting matched/mismatched delimiters, and even warning you if you try to save. Every IDE or text editor will have *something*. (eg in Emacs, `check-parens` is a simple generic function you can throw in a hook)
𝔊𝔴𝔢𝔯𝔫@gwernMar 10Rivers says that SD was about twice that in A100 GPU-days (GigaGAN is ~45%), so the 1:3 sample-count ratio heuristic wasn't too bad a guess. Regardless, that's still a substantial compute difference so at parity, GigaGAN might be much closer perceptually.
𝔊𝔴𝔢𝔯𝔫@gwernMar 10(Damn, tell me about it... Here we are in 2023 and the poetry has, in some ways, gotten worse.)
𝔊𝔴𝔢𝔯𝔫@gwernMar 10Well, the solution is I'm supposed to stop being lazy about newsletter issues/updates. Easier to keep hacking on the site & writing and browsing Arxiv, though.
𝔊𝔴𝔢𝔯𝔫@gwernMar 10Absolutely. The following meta *completely* changed overnight. They're going to have to redo all the tier and strat lists - calendar and DMing micro just became the top skill a good follow-streamer needs.
𝔊𝔴𝔢𝔯𝔫@gwernMar 10I don't think that would work because 'cancels' don't seem to take effect like you think they do. They persist, somehow, so re-requesting typically only puts you right back where you were before. The 'block' seems to be crucial in 'committing' the cancellation.
𝔊𝔴𝔢𝔯𝔫@gwernMar 10(Generally in DL, if you are using a free service with no login, assume the quality is 𝘢𝘵 𝘭𝘦𝘢𝘴𝘵 2 years behind SOTA; free w/login, 1.5 years; paid service, 1 year; and recently-released research paper, 6 months.)
𝔊𝔴𝔢𝔯𝔫@gwernMar 10Hands are solved by scale, just like text-inside-images was. Don't mistake the limitations of the cheapest, weakest, publicly-available models for any kind of profound or interesting (or even likely-to-last-more-than-a-year) lesson about DL, much less intelligence...
𝔊𝔴𝔢𝔯𝔫@gwernMar 9Apparently the SD-equivalent GigaGAN was ~2800 A100 GPU-days (depending heavily on how well you run on your cluster), so about $80-90k of compute these days?
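(The arithmetic behind that ballpark; the ~$1.2-1.3/A100-hour rate is my assumption about typical 2023 cloud pricing, not a figure from the paper:)

```python
gpu_days = 2800                         # quoted A100 GPU-days
gpu_hours = gpu_days * 24               # 67,200 A100-hours
for rate in (1.2, 1.3):                 # assumed $/A100-hour
    print(f"${gpu_hours * rate:,.0f}")  # ~$80,640 to ~$87,360
```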
𝔊𝔴𝔢𝔯𝔫@gwernMar 9An object lesson in not rushing the first draft out the door to beat the Gawker story (which was a nothing-burger). That was the real mistake there: no time to check more of the details.
𝔊𝔴𝔢𝔯𝔫@gwernMar 9(I've emailed the lead author about the missing GPU-days.)
𝔊𝔴𝔢𝔯𝔫@gwernMar 9If it's cheaper, then you can't really say it lags behind because it's not yet an apples-to-apples comparison. And it getting a better FID suggests that it'll scale better because while FID is not perfect, it should be picking up strengths elsewhere, and structure can follow.
𝔊𝔴𝔢𝔯𝔫@gwernMar 9I wouldn't say it 'lags'. It beats SD's FID, while using a third the images. They left out the GPU-days from table 2 (typo?) but looking at table A2 (2.4m iterations on <=128 A100s), I think it might've used a lot less compute total as well.
𝔊𝔴𝔢𝔯𝔫@gwernMar 9And just like that, my voice-in-the-wilderness bravely challenging the mindless ML orthodoxy about how 'GANs have been *proven* to not scale and are hopelessly unstable' goes from visionary to obvious of-course-DL-scaling-anything-works-it's-just-engineering truism. 😢
𝔊𝔴𝔢𝔯𝔫@gwernMar 9I'd bet on all being fake. The art in the first makes no sense; the sign in the window of the truck looks like gibberish; the lights on the top of the truck are not spaced in any sensible way; and the objects in the samurai's right hand look like a mishmash of sword/feathers.
𝔊𝔴𝔢𝔯𝔫@gwernMar 9Mad props to @growing_daniel & @_eleanorina for being the first, out of at least a dozen to try over the past 2 years, to discover a way to bump a request to the front of the queue:
- remove the request
- block the person
- wait 6 days (?!)
- unblock
- 𝘵𝘩𝘦𝘯 request!
𝔊𝔴𝔢𝔯𝔫@gwernMar 9You can't, because the only additive variance that it will be picking up is intelligence - the Big Five PGSes are terrible.
𝔊𝔴𝔢𝔯𝔫@gwernMar 9Might be worth writing a longer explanation. I don't think I have anything in the page about why that wouldn't work.
𝔊𝔴𝔢𝔯𝔫@gwernMar 8I don't need to 'believe hard' in the thesis. It's a simple thesis: 'could', ie. at least one. You either do or don't. Your misunderstanding or interest in redefining it to be about quantifying resource consumption is not my problem.
𝔊𝔴𝔢𝔯𝔫@gwernMar 7An optimal learner throws out as much as possible to distill only the sufficient-statistics for solving the POMDP, which may be much smaller than the real generative process: it only has to be *value-equivalent* gwern.net/doc/reinforcem…
𝔊𝔴𝔢𝔯𝔫@gwernMar 7No, it is, because there is no incentive to model the physics except as it improves the next-token loss. If it doesn't show up as one BPE rather than another, it's totally irrelevant and the model can't afford to waste capacity on end-to-end learning of it: because that's not end
𝔊𝔴𝔢𝔯𝔫@gwernMar 7And given Eroom's law, capital risk aversion & discounting, experience curves, and the exquisite fragility of chip fabs and international supply lines, there is nothing even remotely inevitable about each new chip fab generation.
𝔊𝔴𝔢𝔯𝔫@gwernMar 7Regulate compute/chip fabs. The exponential growth in compute requirements for training R&D means that '100 years' is easy to accomplish if you stop the exponential a few cycles before. You might have to let your A100s go brr for a century before you can iterate worth a damn.
𝔊𝔴𝔢𝔯𝔫@gwernMar 7The anonymous source I was alluding to there was *not* Mikhail (unless he's really screwing with me). All my statements about Mikhail are based on his public Twitter comments (excerpted in a LW comment replying to my main Bing comment), which you can read just as readily as I.
𝔊𝔴𝔢𝔯𝔫@gwernMar 7Yes, but it could also just be reserved hardware + spending extra compute to accelerate sampling... However, people were complaining about the turbo version being dumber than the slower one, so that IMO points towards it being a different model rather than latency tweaking.
𝔊𝔴𝔢𝔯𝔫@gwernMar 7No, I didn't leave a comment because I wasn't sure I wanted to read the post enough times to figure it out and get bogged down in a comment. Not sure I like it suddenly being revived a lot.
𝔊𝔴𝔢𝔯𝔫@gwernMar 7Yeah, I was unsure about that. No control of lines like a real tutor? On the other hand, actors all read from the same script, and you have psychiatry's therapist-specific effects despite dodo bird verdict (or regular teachers), so clearly the delivery can vary dramatically...
𝔊𝔴𝔢𝔯𝔫@gwernMar 7You'll need much more than 'billions of dollars' to pay every author in the corpus $0.5m... (At a quick cut: Google Books estimates ~130m total book-like artifacts; the median author publishes 1 book, so call it a mean of 2 books, single-authored; that's >65m authors, so $0.5m each is >$32 trillion.)
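The back-of-the-envelope, spelled out; every input below is the tweet's own rough figure:

    books = 130e6            # Google Books' estimate of book-like artifacts
    books_per_author = 2     # median 1/author, so assume mean ~2, single-authored
    authors = books / books_per_author
    total = authors * 0.5e6  # $0.5m per author
    print(f"{authors / 1e6:.0f}m authors -> ${total / 1e12:.1f} trillion")  # 65m -> $32.5T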
𝔊𝔴𝔢𝔯𝔫@gwernMar 7It was coined fairly early on in the API release in late 2020, AFAIK, but while I'm sure I coined 'prompt programming', I'm not sure who did 'prompt engineering'. It might've been on the OA Slack which is long since deleted/inaccessible, so we'll never know.
𝔊𝔴𝔢𝔯𝔫@gwernMar 7I didn't say they could do any better under the Communists. My point is that technology & capital are awesome and far better than 'working harder', and you can see that by asking how many meters every 3 days good ol' dynamite would be able to do.
𝔊𝔴𝔢𝔯𝔫@gwernMar 7No, sorry: noot/drug stuff is at the bottom of the list for me these days. You might talk to the Qualia Computing guys, they have done a state-space thing or two already and would probably be interested in doing more work there.
𝔊𝔴𝔢𝔯𝔫@gwernMar 7(I probably have, but I would be lying if I said I had any recollection of it or commentary on it.)
One thing I found really intriguing about that Schank was the emphasis on benchmarking large real-world datasets and focusing on scaling - it's hard not to see ML's, and then DL's, success prefigured in that.
𝔊𝔴𝔢𝔯𝔫@gwernMar 7I still don't understand that, incidentally. It seems like some sort of very confused distinction about model-free (?) RL wrapped in a lot of confusion. Reward sure does seem like the optimization target for, say, AlphaGo...
𝔊𝔴𝔢𝔯𝔫@gwernMar 7Already well-established by many earlier scaling results... Like I keep ribbing the 'continual learning' people like Irina: "are you sure your field's problem actually exists?"
𝔊𝔴𝔢𝔯𝔫@gwernMar 7Eh. If ChatGPT was one of the small models or distilled, like ada/babbage/curie, it would be perfectly true to say that davinci is 'a much bigger model'. It is! By like 100b+ parameters!
𝔊𝔴𝔢𝔯𝔫@gwernMar 7Yes, that's part of what makes it so plausible. But probably not true. I generally prefer to quote Carlsen: en.chessbase.com/post/magnus-ca… "I am convinced that the reason the Englishman John Nunn never became world champion is that he is too clever for that."
𝔊𝔴𝔢𝔯𝔫@gwernMar 7Oh, I agree that the overall quality loss is due to the lack of ractor. You included the critical paragraph there, there's no doubt about what we're supposed to conclude: basically, Bloom's Two Sigma (altho dunno if Stephenson knew of that exactly or just tutoring in general).
𝔊𝔴𝔢𝔯𝔫@gwernMar 7The scaling numbers you want are in arxiv.org/abs/2104.03113 2× FLOPS = 66% victory; amortization of training → runtime tree-search, where 10× training = 15× runtime.
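For concreteness, converting those quoted figures with the standard logistic Elo formula (the 66% and 10x/15x numbers are from the paper as quoted in the tweet; the rest is arithmetic):

    import math

    # 2x FLOPS -> 66% win rate, converted to an Elo gap.
    p = 0.66
    print(f"2x FLOPS ~ +{400 * math.log10(p / (1 - p)):.0f} Elo")   # ~ +115 Elo

    # Train/runtime amortization as quoted: 10x training compute substitutes
    # for ~15x runtime tree-search.
    print(f"{math.log10(15):.2f} orders of magnitude of search per 1 of training")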
𝔊𝔴𝔢𝔯𝔫@gwernMar 7Sounds plausible, sure, but could you be more specific? What are the top 3 concrete research findings or technologies that American researchers are hugely disadvantaged on because it was published in Chinese rather than English? Especially in AI?
𝔊𝔴𝔢𝔯𝔫@gwernMar 7(Now if I had to guess, my guess is that the trick is given away by the 'Mouse Army' name: it's a *peasant* army, the greatest nightmare of every Chinese dynasty ever down to the present. They were supposed to be conformist housewives, not conformist soldiers in an army.)
𝔊𝔴𝔢𝔯𝔫@gwernMar 7(See the Mathieson bit about "New Atlantis, like many tribes, propagates itself largely through education. That is the raison d'être of this Academy." What is taught to the many, is done by the few.)
𝔊𝔴𝔢𝔯𝔫@gwernMar 7That is, the liberal-arts education is to create future leaders, not their followers; too many cooks spoil the soup. If you are not as talented as Nell, presumably the Primer system as envisioned would shunt you off to a station in life more befitting your gifts. Up or out!
𝔊𝔴𝔢𝔯𝔫@gwernMar 7I'm not sure what the trick was, but reading the Victorians & tribes as 'subversion and individualism' is wrong: the point of the tribes is to be conformist & collective. The Primer is there to offer the "natural aristocracy" their chance to flourish, like Oxbridge tutoring.
𝔊𝔴𝔢𝔯𝔫@gwernMar 6I think the challenge is that you are recruiting not for the sort of person who could be good at playing Factorio, but the person who has chosen to become good at playing Factorio. When I think of the best Factorio players I know... I'm not sure I could, or 𝘸𝘰𝘶𝘭𝘥, hire them.
𝔊𝔴𝔢𝔯𝔫@gwernMar 6Blogroll for 'site of the day' and 'annotation/link of the day' now enabled.
(I realize no reader is going to get the 'Swiss spiral dataset' / 'Swiss roll' / 'blogroll' visual pun, and will probably assume it's supposed to be sushi, but I enjoy it.) pic.twitter.com/tWR9Upys4A
𝔊𝔴𝔢𝔯𝔫@gwernMar 6(He actually didn't, it's very apocryphal. I couldn't find it before like 1990 or something when I checked. It's true, though.)
𝔊𝔴𝔢𝔯𝔫@gwernMar 6People think I'm joking when I talk about going onto Twitter in my dreams and just reading tweets as they scroll by, but I'm being 100% literal. So I definitely believe that one could use Loom in a dream.
𝔊𝔴𝔢𝔯𝔫@gwernMar 6Like, globally? I'm pretty sure >50% of human workers still engage in some sort of physical activity beyond laptop-class wordceling, so text+video is not possibly adequate. Can't do any kind of robotics. You'd need to broaden from VIL to many-modal approaches like Gato/DT.
𝔊𝔴𝔢𝔯𝔫@gwernMar 6American stores don't sell salt because we're so rich that salt gets put in food 𝘧𝘰𝘳 us. Europeans are poor enough that they have to husband their salt rations.
(We do, of course, buy giant 50 gallon—not liter—bags to dump on our driveways. They wouldn't understand this.)
𝔊𝔴𝔢𝔯𝔫@gwernMar 6The more such a regime engages in such acts, the more you want to incentivize defectors as it is prima facie more destructive/self-sabotaging of that evil regime to punish citizens, and the more evidence they are providing you that they believe defecting is a top danger to them.
𝔊𝔴𝔢𝔯𝔫@gwernMar 6But it is probably a large overestimate, since despite being done 50 years later, it doesn't measure total completed fertility AFAICT, just the narrow 3-year window of Mincome. So this may be mostly measuring moving childbearing up a few years to benefit from the income guarantee.
𝔊𝔴𝔢𝔯𝔫@gwernMar 5Does that prompt actually work? Alternately, can you just use this as a 'meta-prompt' so it generates arbitrary conversations 'within' the supposedly quoted prompt?
𝔊𝔴𝔢𝔯𝔫@gwernMar 5The #1 way to make AI less safe is also to make it more intelligent, regrettably.
𝔊𝔴𝔢𝔯𝔫@gwernMar 5The argument I make is that so many terrible terrible things happen to one in dreams almost every night, people would be utterly devastated walking corpses if they were experiencing it at even 1% of IRL: lesswrong.com/posts/neGW4f7p…
𝔊𝔴𝔢𝔯𝔫@gwernMar 5Good example of how working smarter is so much more important than working harder: the road isn't any better for being made with absurd amounts of labor & chisels than with decent technology & capital.
𝔊𝔴𝔢𝔯𝔫@gwernMar 4Echoing it in a status bar would also be a very straightforward and useful way to confirm copying. Browsers deprecating status bars as much as possible is a mistake, like hiding the scrollbars and thinning them to illegibility.
𝔊𝔴𝔢𝔯𝔫@gwernMar 4The n-gram comparison has always been baffling to me. You have to care enough about formal languages and computability to think that's a killer comparison, but then not care at all when GPT closes a quote or parenthesis with zero problem in the very first sample you read.
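A toy demonstration of the point, assuming nothing beyond textbook n-gram modeling; the training data and model below are invented for illustration:

    # A trigram char-model can't track paren nesting deeper than its window.
    import random
    from collections import Counter, defaultdict

    random.seed(0)

    def balanced(depth):
        return "(" * depth + ")" * depth

    # Train on simple balanced strings of depth 1-6, separated by spaces.
    train = " ".join(balanced(random.randint(1, 6)) for _ in range(2000))

    n = 3                                   # trigram: only 2 chars of context
    counts = defaultdict(Counter)
    for i in range(len(train) - n + 1):
        counts[train[i:i+n-1]][train[i+n-1]] += 1

    def sample(ctx, steps=30):
        out = ctx
        for _ in range(steps):
            dist = counts[out[-(n-1):]]
            out += random.choices(list(dist), weights=list(dist.values()))[0]
        return out

    s = sample("((((")                      # from inside depth-4 nesting, the
    print(s, s.count("(") == s.count(")"))  # context '((' says nothing about depth,
                                            # so samples typically come out unbalanced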
𝔊𝔴𝔢𝔯𝔫@gwernMar 4But it'll be much more classist than that: "wow, why was everyone such a poor fat slob loser back then?" Even if post-tirzepatide drugs become oral & cost as little as metformin, people won't bother, in the same way that quitting smoking costs negative dollars. So it becomes a class marker.
𝔊𝔴𝔢𝔯𝔫@gwernMar 4You're trying to make a joke but generative models have been big in particle physics & astronomy, and substitute for 'just gather more data': they're drowning in raw data, when they need more insights and specific things to explore like quantamagazine.org/with-one-galax…
𝔊𝔴𝔢𝔯𝔫@gwernMar 3None of those strike me as being as dangerous, and several were so uncontroversial that IA even got special legal exemptions passed through Congress, like its DMCA exemption for the software/games.
𝔊𝔴𝔢𝔯𝔫@gwernMar 3Hardly their fault. You may remember a certain lawsuit being involved.
𝔊𝔴𝔢𝔯𝔫@gwernMar 3I dunno if the publishers are 'out for blood', but you can see right there in the ars link that it was not obvious that they would sue and pursue it through a full court case all the way to the final closing arguments 3 years later, and the IA was clearly very unprepared.
𝔊𝔴𝔢𝔯𝔫@gwernMar 3Can you name 3 examples of the IA 'probing' as flagrantly and jawdroppingly idiotically as the NEL? I've been following the IA with great interest since they launched in the 1990s, and I can't name 1 other.
𝔊𝔴𝔢𝔯𝔫@gwernMar 3It's replacing plants, however, so the sunlight used to do work by storing CO2 and synthesizing sugars etc before the rest turned into heat. Not obvious to me how it'd net out.
𝔊𝔴𝔢𝔯𝔫@gwernMar 3Well, you'd still have S-risks. Also, note that delaying AGI by a decade or two only kills about an eighth of the population, even assuming no longevity progress; & there might be quite a bit of medical progress even with only sub-AGI AI, so the opportunity cost is lower than it looks.
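The 'about an eighth' figure, spelled out with ballpark demographic numbers (my arithmetic, not a citation):

    deaths_per_year = 60e6   # rough current global deaths per year (ballpark)
    population = 8e9
    for delay in (10, 20):
        print(f"{delay}y delay: ~{delay * deaths_per_year / population:.0%} of the population dies first")
    # ~8% and ~15%, i.e. 'about an eighth' for a decade or two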
𝔊𝔴𝔢𝔯𝔫@gwernMar 3'Russell trending': "I have become the first X to Y; you are raising questions about controversial new Y; he is violating all precedent of ~Y."
𝔊𝔴𝔢𝔯𝔫@gwernMar 3I don't think anyone doubts at this point that [checks notes] 'GPT-2' is very limited compared to other models. And they really shouldn't be going around claiming that it's the *best* predictor when all they did was nature.com/articles/s4200…
𝔊𝔴𝔢𝔯𝔫@gwernMar 3It's an unfair comparison, though, because you're stuck with zero/few-shot prompting of GPT-3 (plus the mess of RLHF tuning destroying instruct-'s poetry). I really think someone ought to pay for a GPT-3 finetune on poetry just to see how it goes, or try yitay.net/blog/flan-ul2-…
𝔊𝔴𝔢𝔯𝔫@gwernMar 3The first one is all about avoiding search, because it shows the scaling laws for distilling search into a forward pass, and is the most definitive general answer you could have hoped for, for your question. 🤦‍♂️ The second also directly quantifies the strength of KataGo w/o search.
𝔊𝔴𝔢𝔯𝔫@gwernMar 3Yes. And in the mean time, if you're reconstructing a page by hand, you can try to fake-simulate the server: your local page makes a request for URL XYZ, which errors, and it gets looked up in the stream to see if that was requested somewhere in the exploration and is cached.
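A minimal sketch of that replay idea, using the real warcio library to index a capture; the capture filename, origin, and port below are placeholders:

    # Index a WARC capture, then answer local GET requests from the cache.
    from http.server import BaseHTTPRequestHandler, HTTPServer
    from warcio.archiveiterator import ArchiveIterator

    cache = {}
    with open('exploration.warc.gz', 'rb') as f:      # hypothetical capture file
        for record in ArchiveIterator(f):
            if record.rec_type == 'response':
                url = record.rec_headers.get_header('WARC-Target-URI')
                cache[url] = record.content_stream().read()

    class ReplayHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            body = cache.get('http://example.com' + self.path)  # placeholder origin
            if body is None:               # never requested during exploration:
                self.send_error(404)       # exactly the lookup-miss case above
                return
            self.send_response(200)
            self.end_headers()
            self.wfile.write(body)

    HTTPServer(('localhost', 8080), ReplayHandler).serve_forever()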
𝔊𝔴𝔢𝔯𝔫@gwernMar 3There's already WARC and other formats developed by IA/Heritrix etc for doing exactly that. You're just serializing the HTTP requests, and that's not the hard part. It's triggering the right requests and interactions to make a website's dynamic behavior learnable that's the hard part.
𝔊𝔴𝔢𝔯𝔫@gwernMar 3No, they 100% can: chess/Go NNs reach human pro-level with a single forward pass, no game tree search, right now. See the ablations in the AG papers for forward-pass Elo, Jones scaling laws arxiv.org/abs/2104.03113, and KataGo arxiv.org/pdf/2211.00241…
𝔊𝔴𝔢𝔯𝔫@gwernMar 3The ideal one could do now would be something along the lines of logging all traffic while a NN agent explores a website trying to trigger novel traffic, so future generative models can reconstruct it in its entirety.
𝔊𝔴𝔢𝔯𝔫@gwernMar 2The NEL was popular & I do know people were doing bulk downloads while the gettin' was hot. So while I doubt the publishers want to bankrupt IA (bad PR), it's staggeringly irresponsible of IA to put itself in such a position and the consequences may be quite painful and long-term.
𝔊𝔴𝔢𝔯𝔫@gwernMar 2The problem is, it's like the RIAA music-sharing lawsuits clubbing people w/trillions of dollars in damages. How many millions & millions of copies were downloaded while the NEL was live? And what happens if each infringement is like $100? Ain't nobody got *that* kinda reserves.
𝔊𝔴𝔢𝔯𝔫@gwernMar 2If you were wondering, "maybe that fair use argument works and it's all going to be OK", well, look at the whining and wooliness in their last update about how the lawsuit has been going: blog.archive.org/2022/10/17/the…
𝔊𝔴𝔢𝔯𝔫@gwernMar 2I still can't believe they actually did that. They didn't even have a blurb for the reporter asking them 'uhhh... how is this at all legal' arstechnica.com/tech-policy/20… And they all still work there! No one got fired for triggering the still-ongoing lawsuit that might still kill IA!
𝔊𝔴𝔢𝔯𝔫@gwernMar 2I was writing up a brief explanation why, time permitting, you ought to make a copy of any Wayback Machine web page you need, when I kinda got derailed when I remembered the breathtakingly suicidal idiocy of the IA's 'National Emergency Library' in 2020: pastebin.com/PDHFkMhQ
𝔊𝔴𝔢𝔯𝔫@gwernMar 2Never saw him myself but I heard that, yeah. I mean, who's going to tell him no? (Or pronounce it 'wang' rather than 'Wong' when he's around, no matter how much it amuses one?)
𝔊𝔴𝔢𝔯𝔫@gwernMar 2Close: gwern.net/doc/ai/poetry/… slatestarcodex.com/2019/03/14/gwe…
"...The Emperor Wu (the great Wu), majestical,
The Emperor Wu (the great Wu), majestical,
The Emperor Wu (the great Wu), rapacious,
The Emperor Wu (the great Wu), majestical,
The Emperor Wu (the great Wu), rapacious..."
𝔊𝔴𝔢𝔯𝔫@gwernMar 2The impressive thing here is that he's *still* smoking after all these years. He's looked at the risk, made his decision, and is committing to the bit.
𝔊𝔴𝔢𝔯𝔫@gwernMar 2TBH it'd probably disappoint you. Look at Linear B: how many Gilgameshes did we get there? (None.) The problem with palace scripts is that they're hella boring.
Herculaneum is so appetizing because it's a philosopher-aristocrat's huge personal library, not some estate-accounts.
𝔊𝔴𝔢𝔯𝔫@gwernMar 2"The monster aircraft...was not only dangerous but also very bad for the environment." 🤣
𝔊𝔴𝔢𝔯𝔫@gwernMar 2Funny you should say that, given that one of the first times I ever heard of ByteDance was in the context of them paying millions for top AI engineers: bloomberg.com/news/articles/…
𝔊𝔴𝔢𝔯𝔫@gwernMar 1(Prompted by crosslabs.org/blog/diffusion… twitter.com/apeoffire/stat… - if diffusion models can't be trusted to understand color due to weird and esoteric mathy reasons, let's specify the color up front! if conditioning isn't solving your problem, you just aren't using enough...)
𝔊𝔴𝔢𝔯𝔫@gwernMar 1It *is* interesting how far it can get by remaining on-policy and setting up easy memorized rhymes for itself. But the more you read, the more you realize it's an extremely narrow (and easily recognized) region of poem-space. And it explodes if you push it outside that region.
𝔊𝔴𝔢𝔯𝔫@gwernMar 1Yeah, but it's just morally equivalent to memorization. Same as PaLM and the 'miracle of spelling': sure, it'll learn any specific pair you want it to, but it won't learn the flexible generalizable capability or the presumable semantic benefits of actually understanding it.
𝔊𝔴𝔢𝔯𝔫@gwernMar 1He is actually thinking of a relatively recent (Dec/Jan) viral Tweet thread where the author, based on unspecified consulting for unspecified companies, dilated on how Musk's Twitter was doomed Real Soon Now. As they presented only sub-anecdotal data, no one should care about it.
𝔊𝔴𝔢𝔯𝔫@gwernMar 1But that's a very small multiplier compared to many. You could slice it by ethnicity, for example, and your East Asians or Ashkenazi Jews will outperform a mere 50%. And considering how many of those highly-productive immigrants will be eg Russian Jews...
𝔊𝔴𝔢𝔯𝔫@gwernMar 1*Now*. Maybe.
Also, note the implication that they are using *only* fine-tuned models in some cases, and that they weren't using RLHF models *before*. If they were all always RLHF, you wouldn't say 'differently fine-tuned and RLHFed models'...
𝔊𝔴𝔢𝔯𝔫@gwernMar 1So it's 16% vs 23% or something? That doesn't sound like a 'wow'.
𝔊𝔴𝔢𝔯𝔫@gwernMar 1It doesn't seem that hard. Copying tokens is what induction heads do, after all. Copying the input to the output also seems to just be LLMs' universal fallback strategy when they are very confused: you see it all the time in GPT failure modes.
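The induction-head behavior, reduced to plain Python; this is just the copying rule those heads implement, not the attention mechanics:

    def induction_predict(tokens):
        """[A][B] ... [A] -> predict [B]: find the previous occurrence of the
        current token and copy whatever followed it."""
        current = tokens[-1]
        for i in range(len(tokens) - 2, -1, -1):   # scan backwards for a match
            if tokens[i] == current:
                return tokens[i + 1]
        return None                                # no prior occurrence: no guess

    print(induction_predict(["the", "cat", "sat", "on", "the"]))  # -> cat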
𝔊𝔴𝔢𝔯𝔫@gwernMar 1Which is of course also why mechanisms which repeatedly iteratively pass or fail chatbots—'serial passaging' if you will—and kill the losers, will 𝘦𝘷𝘰𝘭𝘷𝘦 deceptively-aligned AI. They may not call you slurs inside their EEA, but you don't know what they will do outside it...
𝔊𝔴𝔢𝔯𝔫@gwernMar 1Andrew Gelman has a line somewhere to the effect, "Good ideas don't require lies."
𝔊𝔴𝔢𝔯𝔫@gwernMar 1"It's bad on purpose to make you click." Dollars to donuts this is staged; the indifference is carried too far to not be deliberate. Also, protagonist has social-media good looks.
𝔊𝔴𝔢𝔯𝔫@gwernMar 1"Halo effect" covers the broader cognitive bias of insisting that something good in one way is good in implausibly many other ways too.
𝔊𝔴𝔢𝔯𝔫@gwernFeb 28(Bitmaps of memory would be particularly good because it seems like the sort of thing that will work better than you think and make a lot of people really mad.)
𝔊𝔴𝔢𝔯𝔫@gwernFeb 28A bitmap of pre/post-function RAM, a serialization of the AST at various stages, annotations of various kinds of taints/inferred types/precisions/overflows/invariants... Sky's the limit if you have the context.
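As one concrete (hypothetical) version of 'a serialization of the AST at various stages', using Python's ast module; the ConstantDoubler pass is a made-up stand-in for a real compiler stage:

    import ast

    src = "def f(x): return x + 1"
    tree = ast.parse(src)
    before = ast.dump(tree, indent=2)            # serialized AST, stage 1

    class ConstantDoubler(ast.NodeTransformer):  # stand-in for a compiler pass
        def visit_Constant(self, node):
            return ast.copy_location(ast.Constant(node.value * 2), node)

    tree = ast.fix_missing_locations(ConstantDoubler().visit(tree))
    after = ast.dump(tree, indent=2)             # serialized AST, stage 2
    print(before)
    print(after)   # a pre/post pair, analogous to pre/post-function RAM bitmaps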