𝔊𝔴𝔢𝔯𝔫@gwernJan 31Or smartphones (esp smartphone social media)! Of all the predicted effects, the one that seems to be kicking in now, 'kids no longer understand basic computer/OS concepts like "files" or "programs", and are worse at poweruser skills than their parents', was among the least predicted.
𝔊𝔴𝔢𝔯𝔫@gwernJan 31(I'm not sure what you mean. There are several DL frameworks in the West for training across thousands of GPUs.)
𝔊𝔴𝔢𝔯𝔫@gwernJan 31Their GDP is *not* growing 'very very fast' (it'd be better to ask if it's growing at all given the stats blackout and malinvestment and increasingly dirigiste direction), and it's steadily becoming ever less appealing to 'best talents' - they're more concerned with retention!
𝔊𝔴𝔢𝔯𝔫@gwernJan 31(Didn't we just go through this with COVID? Maybe Chinese stuff just isn't that competent or incredible as commentators in the West keep projecting onto them whether it's human genetics or deep learning or COVID. Not as extreme as Russia's military, perhaps, but similar dynamic.)
𝔊𝔴𝔢𝔯𝔫@gwernJan 31They're behind in terms of hardware technology and rapidly falling further behind post-embargo; and their data is heavily siloed, focused on e-commerce or natsec which is unhelpful for AGI, and way behind open datasets in the West like Common Crawl or LAION.
𝔊𝔴𝔢𝔯𝔫@gwernJan 31Hm? Top 1/2/5, Instructor, seems to be almost entirely Western: UWash/Allen and Facebook. arxiv.org/pdf/2212.09741… And then MSR Beijing work is always an awkward example...
Anyway, there are areas like face recognition where I expect Chinese AI to be tops, but are they important?
𝔊𝔴𝔢𝔯𝔫@gwernJan 31Judging by how long it's taking everyone else to convincingly catch up to even davinci-001, I'm thinking at least a year, and probably multiple years. They've been lying flat, and OA isn't a real threat to them the way they are to Google.
𝔊𝔴𝔢𝔯𝔫@gwernJan 30They weren't bogus, the RNNs just weren't any better than a reactive policy / history stacking, like they should've been on POMDPs. The RNNs doing the same or worse was quite reproducible & genuine.
(Karpathy's law: "NNs want to work.")
𝔊𝔴𝔢𝔯𝔫@gwernJan 30R2D2 got its big performance boosts by actually utilizing the RNN hidden state because... apparently everyone was zeroing out the hidden state when doing BPTT before! So ofc the agents never wound up making any use of history/memory.
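(A minimal sketch of the difference, assuming a PyTorch-style recurrent agent; shapes and names are illustrative, not DeepMind's actual code:)

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=32, hidden_size=64, batch_first=True)
obs_seq = torch.randn(1, 20, 32)  # one 20-step sequence sampled from replay

# The pre-R2D2 habit: zero the recurrent state at the start of every sampled
# sequence, so BPTT can never credit information carried across time.
zero_state = (torch.zeros(1, 1, 64), torch.zeros(1, 1, 64))
q_zeroed, _ = lstm(obs_seq, zero_state)

# R2D2's fix: store the hidden state in the replay buffer alongside the
# transitions and restore it on replay (optionally with a burn-in prefix).
stored_state = (torch.randn(1, 1, 64), torch.randn(1, 1, 64))  # from the buffer
q_stored, _ = lstm(obs_seq, stored_state)
```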
𝔊𝔴𝔢𝔯𝔫@gwernJan 30Based on reproducibility and methodology studies, as well as all the incidents like R2D2, I feel confident in saying there are lots of one-line research secrets—so secret even the original authors don't know which line is secret.
𝔊𝔴𝔢𝔯𝔫@gwernJan 30I was enthusiastic about it, but the complexity feels dangerous, and people more experienced with Minecraft RL than me say that the env changes like block-breaking speed make the problem much easier than I expect, so I'm unsure enough about it to mention it in that list.
𝔊𝔴𝔢𝔯𝔫@gwernJan 30"AutoML 2.0: just make the model so large that it internally contains all possible archs AutoML 1.0 might search over and can ensemble them."
𝔊𝔴𝔢𝔯𝔫@gwernJan 30Yeah, that Baidu thing prompted this. I expect it to suck. None of their LMs have come anywhere near GPT-3 and they lack the 3 years of preference-learning data to do any tuning on. People keep underestimating how very well OA executes on LMs, and how easy it is to be mediocre.
𝔊𝔴𝔢𝔯𝔫@gwernJan 30Jan 2023: in the past year we've seen in the West Chinchilla, Dramatron, Gato, DALL-E 2, Flan/U-PaLM, Stable Diffusion, Whisper, CICERO/DeepNash, Imagen Video/Phenaki, ChatGPT etc etc.
Can you name even 3 Chinese AI results as important?
(Besides GLM, which everyone says sucks.)
𝔊𝔴𝔢𝔯𝔫@gwernJan 30It'd make a fascinating benchmark/grand challenge for large-scale AI fiction: you have a really large initial corpus + even larger secondary corpus to bootstrap off, with many world details to keep straight, and a large audience that you could segment & test various completions.
𝔊𝔴𝔢𝔯𝔫@gwernJan 29How can Anthropic be of any importance? It doesn't even have a Wikipedia article.
𝔊𝔴𝔢𝔯𝔫@gwernJan 29Speaking of which, the pressure on LLMs to overload their context window for both data prediction and 'thinking' is a built-in pressure for steganographic codes being developed when any RL pressure is applied: lesswrong.com/posts/bwyKCQD7…
𝔊𝔴𝔢𝔯𝔫@gwernJan 29FWIW, Archive-Binge is in permanent maintenance mode, but they released the software github.com/Respheal/archi… for reference. Maybe one of the big RSS reader services like Feedly or The Old Reader could be persuaded to implement it as a feature...
𝔊𝔴𝔢𝔯𝔫@gwernJan 29'automated parsing' is still not a good idea, and you're now going way outside 'HTML+CSS' when you invoke AIs decompiling it to reinject semantic tagging. (It would be a lot saner if, say, the original sources were already double-spaced, and you simply had to preserve that.)
𝔊𝔴𝔢𝔯𝔫@gwernJan 29(Ah yes, exactly what I want to do, integrate automated parsing and AI models into my already Rube Goldbergian site generation pipeline.)
𝔊𝔴𝔢𝔯𝔫@gwernJan 29How do you even define 'sentence'? 'A period then a space'? There are many more kinds of space than one, whitespace inside the HTML file is not the whitespace that gets rendered (think \n wrapping), and there are many ways to use periods, wouldn't you agree, Mr. Mohr?
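(A quick illustration of how badly the naive rule fails; invented example text:)

```python
import re

text = "Mr. Mohr paid $3.50 on Jan. 7. He left at 5 p.m. Then he slept."
# The naive 'period then whitespace' rule:
print(re.split(r"\.\s+", text))
# -> 5 'sentences' instead of 3, with spurious splits after 'Mr.' and 'Jan.'
```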
𝔊𝔴𝔢𝔯𝔫@gwernJan 29"Welcome, class of 2023!
Look to your left; now look to your right. Did you see someone, because they had face- or hand-doxxed themselves? Then they're ngmi.
The rest of you: well done. You have passed the first mirror test."
𝔊𝔴𝔢𝔯𝔫@gwernJan 29That would screw a lot of things up, like numbers or abbreviations.
(The lack of double-spacing to encode 'end of sentence' rather than all other period uses has other downstream problems: Emacs has many 'sentence' functions which are less reliable if you don't double-space.)
𝔊𝔴𝔢𝔯𝔫@gwernJan 29They would probably regard that as a win.
(Women out there, be careful: don't hand-doxx yourself on social media!)
𝔊𝔴𝔢𝔯𝔫@gwernJan 29Today's design dead end: double-spacing periods (sentence-spacing), or single? The research is scant, low-quality, and you can't get half the papers (which doesn't stop people from citing them anyway...); even if I wanted to A/B test it, there's no good way to do it in HTML. 😓🤷♂️
𝔊𝔴𝔢𝔯𝔫@gwernJan 29The real nude pros are generating AI bodies and then carefully photoshopping crops of their real hands onto the AI hands with inpainting around the hands to stitch it up.
𝔊𝔴𝔢𝔯𝔫@gwernJan 29Yep. It's like tag: you want to dodge at the last second possible (graze the bullet!). The cat waits late because it 𝘤𝘢𝘯 wait late.
𝔊𝔴𝔢𝔯𝔫@gwernJan 29That sounds dubious. You haven't controlled for their original genes (and no, throwing a random PGS in doesn't 'control for that', you know that), which group differences you know exist, so you still don't know whether the epigenetic differences are genetic or environmental.
𝔊𝔴𝔢𝔯𝔫@gwernJan 29Hands are the cats of body parts. Just as GANs were knocking out photorealistic faces but turning out nightmarish cats...
𝔊𝔴𝔢𝔯𝔫@gwernJan 28I'm partial to the '<|endoftext|>' token because it screws with you by not always being encoded to <|endoftext|> like you naturally assume, and generally lending itself to in-band input parsing hacks and vulnerabilities.
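(You can see the ambiguity directly with tiktoken, which I believe matches the GPT-2/GPT-3 tokenizers:)

```python
import tiktoken

enc = tiktoken.get_encoding("gpt2")

# As a special token, '<|endoftext|>' is the single id 50256...
print(enc.encode("<|endoftext|>", allowed_special={"<|endoftext|>"}))  # [50256]

# ...but by default tiktoken refuses the literal string in user text (it
# raises ValueError), and with the check disabled the same characters encode
# to an ordinary multi-token BPE spelling instead - hence the parsing hacks.
print(enc.encode("<|endoftext|>", disallowed_special=()))
```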
𝔊𝔴𝔢𝔯𝔫@gwernJan 28That's a pretty deep question about language! The tack I would take would be 'what multi-agent RL environments/tasks/distributions induce language'.
𝔊𝔴𝔢𝔯𝔫@gwernJan 28That struggles to explain any result involving synthetic data, and human cognition is definitely displayed in lots of modalities like video or RL tasks, but yes, probably something like that is why you can learn semantics from syntax & superintelligent octopi can play chess.
𝔊𝔴𝔢𝔯𝔫@gwernJan 28Oh, that the scaling works and you even 𝘩𝘢𝘷𝘦 these large models to do asymmetrical cross-modality tricks like Flamingo or SayCan with, of course.
𝔊𝔴𝔢𝔯𝔫@gwernJan 28This analysis would be much better if you had used the Playground interface with davinci-003 instead and looked at the likelihood of predicted tokens; you make plausible guesses, but I predict that the actual tokens would show that it's sometimes thinking along different lines.
𝔊𝔴𝔢𝔯𝔫@gwernJan 28Newbies are always shocked how large LLMs are compared to image stuff.
The second-most interesting problem in philosophy of mind, language, & epistemology right now is the asymmetry between language models/everything else: LMs transfer to other domains, but 𝘯𝘰𝘵 vice-versa.
𝔊𝔴𝔢𝔯𝔫@gwernJan 27Yes, it has an objective but one of unclear importance, much like asking a LLM to answer PubMed questions or measuring perplexity loss etc. All the important stuff of AF2 is downstream - often, not even using AF2 but using DL models the protein guys would never have made w/o it.
𝔊𝔴𝔢𝔯𝔫@gwernJan 27This is pretty hard because so many of the good uses are hard to pin down (look at ChatGPT rn for variety and difficulty of evaluating utility). Take AlphaFold1/2 as a benchmark: what predictions should one have made in advance for 'DL does something good in protein science'?
𝔊𝔴𝔢𝔯𝔫@gwernJan 27Then why does that entire paragraph exist? Surely it'd make way more sense to talk about stuff like Minerva or the rash of ChatGPT/davinci-003 evals?
𝔊𝔴𝔢𝔯𝔫@gwernJan 27You explicitly dismiss that, though: "ScholarBERT is a relatively small model (770M parameters) so one can always think that maybe 100x parameter count would lead to better performance at Solving Science but I doubt it."
But 100x doesn't even take you to GPT-3-175b, or PaLM!
𝔊𝔴𝔢𝔯𝔫@gwernJan 26(Under those circumstances, you would see little difference between, say, GPT-2-774M & GPT-2-1.5b...)
𝔊𝔴𝔢𝔯𝔫@gwernJan 26The ScholarBERT example isn't a compelling example of scaling failing, especially given all the other successes. It's 2x param-count max diff, non-optimized, old arch known to have weak pretraining loss, with large downstream finetuning datasets, and larger still was better.
𝔊𝔴𝔢𝔯𝔫@gwernJan 26I don't think it was *that* cranky, but if it was, then obviously the nuclear chain reaction is very 'really far out there, cranky' & not at all like ordinary garden-variety chemical reactions, and would not be an obvious thing to present to a skeptical Monsieur Chollet pre-1938
𝔊𝔴𝔢𝔯𝔫@gwernJan 26Could you expand about post-2014? He overshot how well compute would increase (but considering the extreme pessimism I remember from most people in 2009 about 'Moore's law is dead', he was a lot less wrong than them), but I don't remember any other major errors offhand.
𝔊𝔴𝔢𝔯𝔫@gwernJan 26Well then, if it doesn't happen, Kurzweil will be wrong about AGI *and* most of the rest, as opposed to just most of the rest, while Moravec & Legg were mostly just wrong about AGI and not most of the rest.
𝔊𝔴𝔢𝔯𝔫@gwernJan 26I'm not a Kurzweil fan & never have been. We very obviously don't see the increasing acceleration across all fields that he was arguing for; when I helped grade his predictions for a LW project, I was even less impressed by them or his self-grading. (They haven't gotten better.)
𝔊𝔴𝔢𝔯𝔫@gwernJan 26So it's possible that if you provide a memory mechanism which doesn't overload the predicted tokens to double as a short-term/working memory, like Transformer-XL or something, it'll automatically inner-monologue at some scale using that, just to predict the next token-answer.
𝔊𝔴𝔢𝔯𝔫@gwernJan 26One hypothesis people gesture at is the lack of a built-in memory: default text just presents 'the answer'; you don't normally 'show your work'. But LLMs right now can only think by monologuing explicitly, which is highly unlikely text, so that forces them to emit the answer immediately.
𝔊𝔴𝔢𝔯𝔫@gwernJan 26I think it's an interesting question how to get inner-monologue behavior 'organically' or 'spontaneously', without explicit prompting or tuning. Right now, we get 'hidden scaling' where they *could* monologue for greater perf but just don't by default. That's bad.
𝔊𝔴𝔢𝔯𝔫@gwernJan 26Just because you swap out one word for another doesn't mean that they are at all the same thing, or that they were obvious (why didn't *he* propose nuclear chain reactions, then? Why did it take until Szilard? Who's publishing it in all that time after Szilard secretly did?).
𝔊𝔴𝔢𝔯𝔫@gwernJan 26Einstein's formula did not make it clear that there was such a thing as a chain reaction, that there were elements which supported chain reactions, that chain reactions would go critical, that any of those elements were around in feasible amounts, that they could be separated...
𝔊𝔴𝔢𝔯𝔫@gwernJan 26And chemical chain reactions are a useful analogy but hardly prove much of anything about nuclear chain reactions. (Was there even an element which *could* act in such a way? Szilard didn't know! blog.nuclearsecrecy.com/2014/05/16/szi… Among other problems with saying 'yeah, he totally did it all')
𝔊𝔴𝔢𝔯𝔫@gwernJan 26Yes - a *secret* patent! Which is fine if your name is 'Leo Szilard', not, 'everyone else who might be named Monsieur Chollet & is demanding the exact principle be explained to them publicly'. I chose '1939' because that was when the chain reaction idea was fully public w/Hahn.
𝔊𝔴𝔢𝔯𝔫@gwernJan 26Meehl isn't just Meehl, it's a whole distinct 'Minnesota school' of individual-differences-psychology+psychometrics+behavioral-genetics... I don't know a good summary although the first page of twitter.com/gwern/status/1… is as good a place to start as any...
𝔊𝔴𝔢𝔯𝔫@gwernJan 26People were discussing 'atomic bombs' of some sort at least as early as Wells: it was a new area with obvious large potential (see: 'the sun'). They obviously were not discussing the *exact* mechanism of chain reaction (if they had been, that would render my analogy irrelevant).
𝔊𝔴𝔢𝔯𝔫@gwernJan 26I don't think that matters really. Those people are still around and still part of the denominator, because most of them try to stay in the US. And if they go back to a poorer country because they lose, that emphasizes even further that being a PhD grad student isn't very elite.
𝔊𝔴𝔢𝔯𝔫@gwernJan 26We *did* scale them up a long time ago! Brock was training on JFT-300M 5-6 years ago! We were training on YFCC100M+~10m more 3 years ago!
𝔊𝔴𝔢𝔯𝔫@gwernJan 26Doesn't do me much good to make links that work only in browsers I don't use.
𝔊𝔴𝔢𝔯𝔫@gwernJan 26If you take away only 1 thing from reading my site, which you will use for the rest of your life, I hope…
it's knowing you can link a PDF page using `#page=n` anchors in the URL.
🙏🥺 twitter.com/gwern/status/1…
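(For example, with a hypothetical URL:)

```python
# Opens the PDF directly at page 7 in most browsers' PDF viewers:
url = "https://example.com/paper.pdf#page=7"
```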
𝔊𝔴𝔢𝔯𝔫@gwernJan 25It's more awkward to talk about him because he's a one-weird-trick dude and the trick failed badly for most of his non-AI predictions; he's a Texas sharpshooter. Meanwhile, others we do talk about more, like Legg or Moravec, tailored their predictions much more narrowly to DL.
𝔊𝔴𝔢𝔯𝔫@gwernJan 25Does arxiv.org/abs/2004.02967… not successfully remove the need for BN in GAN Discriminators? It seems fine in BigGAN there, and I'd expect Brock to know.
𝔊𝔴𝔢𝔯𝔫@gwernJan 24I'd expect that to be a large fraction. Lots of higher ed unis aren't doing PhDs at all, and given how many of 'top uni' PhDs land at lower institutions and spill out everywhere else, they have to be producing a healthy fraction of the oversupply.
𝔊𝔴𝔢𝔯𝔫@gwernJan 24Even if having been a PhD student was strictly necessary and a superset of eliteness, that's still not very 'elite'. It's not even close to the famous but still extremely broad '1%' (ie 3.2m people out of 320m).
𝔊𝔴𝔢𝔯𝔫@gwernJan 24I think she's right but this might have more to do with the dilution of being a PhD student. At this point in higher ed hypertrophy, what % of the US population is going to be a PhD grad student at some point in their lives? 5%? (50k PhDs/year, 3.6m births; figure half dropout).
𝔊𝔴𝔢𝔯𝔫@gwernJan 24What's hard about scaling up GANs, exactly, which makes them harder than diffusion or AR? (You are forbidden to use the word 'stabl*' in your reply.) A G is just a bunch of upscaling layers from a random seed. A D, in reverse, to a scalar.
𝔊𝔴𝔢𝔯𝔫@gwernJan 24github.com/TheAppleTucker… Like, it clearly can work, but you are going to have problems getting any useful behavior out of 1kb of state (prompt window) if you eschew any intermediate code generation steps. '1kb' doesn't even cover the full state of a tweet.
𝔊𝔴𝔢𝔯𝔫@gwernJan 24A popular claim but looking at Global Burden of Disease vizhub.healthdata.org/gbd-compare/, physical causes like heart disease or infant diseases or stroke still seem to solidly dominate anxiety/SCZ/MDD/etc.
𝔊𝔴𝔢𝔯𝔫@gwernJan 24The problem is that like reviewing _Sword of Shannara_, there's not really any substance *to* focus on. I read _Eragon_ when the movie came out, and thought, 'yeah, that's exactly what I'd expect from very talented 15yo American teen still digesting Tolkien'. What's left to say?
𝔊𝔴𝔢𝔯𝔫@gwernJan 24It 𝘥𝘪𝘥 get a lot of press. Mostly about how bad it was compared to GPT-3 (never mind ChatGPT) before they took it offline.
(There is a valid point to saying that ChatGPT isn't incredibly far ahead; unfortunately, when it comes from FAIR, it comes off as sour grapes...)
𝔊𝔴𝔢𝔯𝔫@gwernJan 24It's also not fully scaled up, to which there is no bar (eg no stability issues). As they point out, they use a quarter of the compute SD does, and it's received way less tweaking and tuning than SD has. Some proper scaling laws, hyperparameter sweeps, and Parti-level compute...
𝔊𝔴𝔢𝔯𝔫@gwernJan 24Like I've been saying, stability is not actually a problem for scaling up GANs. It just isn't, any more than for other archs. It's an academic urban legend spread by people cargo-culting claims from 5+ years ago as an excuse to jump on the latest researcher fad like diffusion. pic.twitter.com/mpYTsYhmX1
𝔊𝔴𝔢𝔯𝔫@gwernJan 23No, it's not, and you should be ashamed of browbeating like that here and elsewhere on Twitter. We know general intelligences exist and have catastrophic effects much better than we knew nuclear bombs were at all possible, because we exist.
𝔊𝔴𝔢𝔯𝔫@gwernJan 23Consider applying this criterion to nuclear bombs, discussed decades in advance. If you had demanded the exact principle, you would have willfully remained ignorant and a denialist until 1939, a year before researchers went dark and <3 years before the Manhattan Project began.
𝔊𝔴𝔢𝔯𝔫@gwernJan 23'mewtwo' instead of 'mew' - are you a real '90s kid?
𝔊𝔴𝔢𝔯𝔫@gwernJan 23Still not easy, though. When I said that it could be written in Stan, what I meant was 'even carefully avoiding discrete stuff that Stan can't do, I got bogged down and couldn't quite make it work'...
𝔊𝔴𝔢𝔯𝔫@gwernJan 23Indeed. There are many reasons for the tradeoff, so it's not going away, not while people are still trapped in single human bodies with only 24 serial hours in the day.
𝔊𝔴𝔢𝔯𝔫@gwernJan 23So, they open source only the stuff which doesn't really matter. You still aren't having your cake & eating it too in terms of publication count compared to alternative career paths like going after R1 tenure. That you get non-zero publications is a nice fringe benefit of the $$$
𝔊𝔴𝔢𝔯𝔫@gwernJan 23That's exactly why you do need to worry: your human intuitions are obsolete. Because humans can't copy themselves and take both forks in the road, and there is an effectively fixed supply of such humans. AIs can, and can scale to as many GPUs as you can buy, borrow, or steal.
𝔊𝔴𝔢𝔯𝔫@gwernJan 23As one observes of people who get hired by Google or NSA or Jane Street or Renaissance: you can often tell when, simply by when their blog or other publications abruptly slow to a trickle.
𝔊𝔴𝔢𝔯𝔫@gwernJan 23You see occasional papers, but you're never going to see any real papers on major stuff. So it'll be like Kelly or public-key crypto: "X discovered it 30 years before at ABC, but they didn't publish". Publishing is not what any of them maximize or even try for, so... they don't.
𝔊𝔴𝔢𝔯𝔫@gwernJan 22Phrase it however you like, as multiple choice or free response. That dictionary is still going to lie there. I've owned a Compact OED for nigh on a score of years, and it's never so much as wished me a 'good morning'.
𝔊𝔴𝔢𝔯𝔫@gwernJan 22You read weather reports anxiously because you're worried about overheating compute nodes interrupting AI scaling research runs; I read them anxiously because I'm worried about cold cats sleeping on top of my node downloading AI scaling papers.
𝘞𝘦 𝘢𝘳𝘦 𝘯𝘰𝘵 𝘵𝘩𝘦 𝘴𝘢𝘮𝘦
𝔊𝔴𝔢𝔯𝔫@gwernJan 22(Another big difference is that given how little good 99.99% of COVID reading/writing/doomscrolling did, a large number of individuals would have been better off in May 2020 spending that time reading about, say, GPT-3... 😉)
𝔊𝔴𝔢𝔯𝔫@gwernJan 22And people say ads have zero information value or relevance and targeting is useless!
𝔊𝔴𝔢𝔯𝔫@gwernJan 22If a dictionary could pass a word meaning exam, I would in fact be extremely impressed, and would not complain about it flunking my math exam, because dictionaries ordinarily just lie there on a desk and do nothing. pic.twitter.com/l3UHSvpaMu
𝔊𝔴𝔢𝔯𝔫@gwernJan 22He's a tenured professor who could doubtless consult for handsome fees, and so I'm sure by net wealth he's far above the 50th percentile... but had he gone into quantitative finance instead of Fields-worthy pursuits, his percentile would be far, far, far higher.
𝔊𝔴𝔢𝔯𝔫@gwernJan 22(That is, if you believed that IQ had to correlate like r=.9 with all these different measures to be 'important', you are saying 'I believe in a world where most billionaires are publishing 100 papers/year while also being elected president, winning Pulitzers, & living to 100.')
𝔊𝔴𝔢𝔯𝔫@gwernJan 22(In general, standard theories, datasets, and statistical method seem very poor at handling index variables with this sort of competing or zero-sum structure among the measured variables: old.reddit.com/r/statistics/c… A factor analysis wouldn't even correctly model this IQ example.)
𝔊𝔴𝔢𝔯𝔫@gwernJan 22I think of this when people trot out 'IQ only correlates 0.x with log income': true, but this tends to overlook the tradeoffs - if you want to publish papers & patents, you can't also work at Jane Street & earn Jane Street $$$. A Pearson correlation on a single trait won't capture the latent factor.
𝔊𝔴𝔢𝔯𝔫@gwernJan 22(It's unironically a valid isekai premise, IMO. It even comes with a built-in mechanism, like _Dr Who_, for switching up viewpoints regularly to renew and grow the series while maintaining a semi-stable immortal protagonist.)
𝔊𝔴𝔢𝔯𝔫@gwernJan 22Weird, but common: gwern.net/Timing
(This is also why Schmidhubering is so pointless: not only is the 'first publication' often trivial and useless, it is often not even causally connected to later, successful, instances, which simply forge their intellectual pedigree.)
𝔊𝔴𝔢𝔯𝔫@gwernJan 21I rewatched _Madoka_ recently after watching it during airing. What a perfectly constructed anime, even better than I realized at the time.
𝔊𝔴𝔢𝔯𝔫@gwernJan 21Yeah, it's always been the case that the last layer or two isn't just a drop-in embedding like a CNN classifier - even something like iGPT is doing stuff like combining a bunch of arbitrary-looking layers to get a useful embedding for the linear-probing evaluation.
𝔊𝔴𝔢𝔯𝔫@gwernJan 21This is pretty amazing. I can't think of any house which less embodies the rationality of farm architecture, which accomplishes its function extremely efficiently, than the ugly Steiner House, built on arbitrary geometric schema unrelated to any function or utility of its occupants.
𝔊𝔴𝔢𝔯𝔫@gwernJan 21It'll be some time before LMs can just spit back megabytes of JSON data or read the raw on-disk binary of your SQL database, so you're going to be generating code at some point.
𝔊𝔴𝔢𝔯𝔫@gwernJan 21You mean it'd generate code on the backend to execute the request, and cache it? One would then fulfill manually cases where it couldn't, and finetune further. Security issues aside, that could be pretty interesting capabilities.
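(A sketch of that pattern, not the linked project's actual code; `complete` is a stand-in for whatever LLM API you use:)

```python
HANDLERS = {}  # endpoint -> generated handler function

def complete(prompt: str) -> str:
    """Stand-in for an LLM call; wire in your provider here."""
    raise NotImplementedError

def handle(endpoint: str, payload: dict):
    if endpoint not in HANDLERS:
        src = complete(f"Write a Python function handler(payload) for {endpoint!r}.")
        ns = {}
        exec(src, ns)  # <- the security issue mentioned above
        HANDLERS[endpoint] = ns["handler"]  # cache: codegen runs once per endpoint
    return HANDLERS[endpoint](payload)

# Requests the model can't handle get fulfilled manually and become finetuning data.
```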
𝔊𝔴𝔢𝔯𝔫@gwernJan 21The study of what is 𝘳𝘦𝘢𝘭𝘭𝘺 going on in Neal Stephenson's interlinked novels is known as Enoch Root cause analysis.
𝔊𝔴𝔢𝔯𝔫@gwernJan 21``I fear not the dog who howls a thousand howls once, but the dog who has howled one howl a thousand times.’’
—Bark Lee
𝔊𝔴𝔢𝔯𝔫@gwernJan 21Hm, not sure I did. I remembered that there were a bunch of ones touching on memories of various sorts, but not that they were linked such that I was missing the point of 'Onald Creely'.
𝔊𝔴𝔢𝔯𝔫@gwernJan 21(I left out "Onald Creely" because the overall conceit didn't work for me like the dream-job one eg, and it felt overly derivative of _A Lesson Is Learned_.)
𝔊𝔴𝔢𝔯𝔫@gwernJan 21To some degree. You still have to filter even after generating. The more short-term transition will be creators following up on winning tickets: "I have no idea why fantasy lobsterpunk is the most popular premise I ever invented, but I'll write 20 novels with GPT-4 this month."
𝔊𝔴𝔢𝔯𝔫@gwernJan 21Simonton and meta-science: there is surprisingly little observed correlation between quantity and quality of output. Apparently when it comes to creativity or research, there's no knob people can easily turn. Each new work is a stab in the dark, a lottery ticket - so buy lots.
𝔊𝔴𝔢𝔯𝔫@gwernJan 21Sure. It's just another way of tokenizing pixels; unusually bad, but still. The interesting possibility is if GPT-3 somehow gets it from Internet data because eg existing ASCII art is somehow enough to induce it.
𝔊𝔴𝔢𝔯𝔫@gwernJan 21But does GPT-3 classify/describe any of it accurately?
𝔊𝔴𝔢𝔯𝔫@gwernJan 20So far so good...
Also added a simple quote-of-the-day feature (just an epigraph wrapper + transclude, easy); an oldschool Web 1.0 feature I feel is appropriate. 😉
𝔊𝔴𝔢𝔯𝔫@gwernJan 20Not to pick on DC here... it's a webcomic antipattern, and I wouldn't even consider DC the saddest example, that'd be _Megatokyo_ (yes, still running). A short draft essay on this antipattern from a _Berserk_ review I've been writing: pastebin.mozilla.org/BLi3sDT2
𝔊𝔴𝔢𝔯𝔫@gwernJan 20Hm, did you check that it knew your style in the first place? I already checked GPT-3 knows 'gwern' in terms of topics, style, and even formatting (gwern.net/GPT-3-nonficti…), otherwise zero-shot text style transfer would be pointless. ('Pirate' checks that ChatGPT isn't broken.)
𝔊𝔴𝔢𝔯𝔫@gwernJan 20It's not as bad as listening to a recording of yourself talking, but I still wince a little looking at these. ("Surely I don't sound like 𝘵𝘩𝘢𝘵...?") pic.twitter.com/fnChKAK0vX
𝔊𝔴𝔢𝔯𝔫@gwernJan 20(That is, this comment looks identical to me as a comment 'Actors routinely are thin, so diet and exercise seem pretty routine, I just saw a lot of muscular actors in _300_ with hardly any body fat; why don't more people take advantage of whatever they did instead of wishing?')
𝔊𝔴𝔢𝔯𝔫@gwernJan 20I deny the premise. How do you know that actors 'routinely' eliminate accents? Actors are enormously highly selected due to immense oversupply, and still, some actors are famous for handling accents (eg Meryl Streep). Also, failure is a standard plot point in 'talkie' histories.
𝔊𝔴𝔢𝔯𝔫@gwernJan 20This is a Frankenstein StyleGAN that @AydaoAI developed. See gwern.net/Faces#stylegan…
And it doesn't use SG, but plenty of others do without magically working. Also, embeddings from ImageNet CNNs etc are another very old GAN trick (most recently, Projected GAN).
𝔊𝔴𝔢𝔯𝔫@gwernJan 20'minibatch discrimination' is an old thing, and there's also BN in many of these archs, yeah. It's striking that BigGAN sees improvements in minibatch size up to like 20k with no plateau by then, and note that many contrastive approaches like CLIP need really large batchsizes.
𝔊𝔴𝔢𝔯𝔫@gwernJan 20A good example of amplifying a particular niche in the data distribution to hide from the D: like long sleeves or bad crops, that is a legit mode in the data - she even looks like Yakumo Yukari! danbooru.donmai.us/posts?tags=yak… (link may contain NSFW images).
𝔊𝔴𝔢𝔯𝔫@gwernJan 20I don't know what you did with FTX beyond 'like a bazillion other people, worked for an org which got some money from them', but if it was as concrete and specific as 'hey, you tried to give a decent fraction of a million bucks to neonazis, where the alt hypo is nepotism: ???'...
𝔊𝔴𝔢𝔯𝔫@gwernJan 19Waking up January 13th and going 'my goodness! those journalists who were doing a journalism on us, those cheeky lads went and did a journalism! I say - what *will* we say?' is not particularly impressive nonprofit practice either.
𝔊𝔴𝔢𝔯𝔫@gwernJan 19I dunno man, if a major newspaper contacts you asking you why you're giving money to neonazis and if you have any comments on it you'd like to give to a newspaper, doing reporting, using journalists, you *might* start discussing it with the nonprofit & thinking of a response.
𝔊𝔴𝔢𝔯𝔫@gwernJan 19I don't see why this is so exculpatory. The clock doesn't start ticking on January 13th, it starts ticking in mid-December when Expo contacted Tegmark (not 'FLI') and he ghosts them. And if his mother died the same day Expo contacted FLI after ghosting, then that can't explain it
𝔊𝔴𝔢𝔯𝔫@gwernJan 19I don't think it can be rescued now: even the 'picked up pace' is mostly about wanking around with trans/enby self-insert fanfic, so is not progress. IMO, it's sheer sunk cost. Diaz would be better off dumping an outline, killing DC, and doing something they actually want to do.
𝔊𝔴𝔢𝔯𝔫@gwernJan 19Very precisely: "Dark Science #1". Diaz decided to start a 'serious' Grand Dramatic Narrative which all the earlier strips had hinted at, but it's so terminally boring and slow-moving and uninteresting that he can't make himself do more than a few strips a year, so even slower.
𝔊𝔴𝔢𝔯𝔫@gwernJan 19Exactly. Having 'put it really at rest' is exactly what it looks like when you're wrong!
Also, we're really going to take Teller's word for anything on this (right after Oppenheimer clearance news, even)? The man had more of a hardon for anything nukes than dogs for legs.
𝔊𝔴𝔢𝔯𝔫@gwernJan 19Still running. Early DC is great, past decade+ is bad: if you're in a hurry, I made a list of ones I liked gwern.net/newsletter/201… 'Funny SF+_A Lesson Is Learned_ webcomic set in post-apocalyptic simulation.'
𝔊𝔴𝔢𝔯𝔫@gwernJan 19Unfortunate that there's no learning going on. The correlation with initial blind priors that the algorithm is highly accurate (they are given no info on accuracy) suggests it's mostly just self-selection into overconfidence, which they'd do better on given any info on errors.
𝔊𝔴𝔢𝔯𝔫@gwernJan 19If that is contributing, that makes the comparison with 1997 *even more striking*, that so many jurisdictions were willing to goldplate requirements and/or outlaw a basic necessity in many places.
𝔊𝔴𝔢𝔯𝔫@gwernJan 19If this is the 'same story', why is it totally different from the book version (crawl or somersault? Benjamin, or Phil? did he cover his face, or did he cover everything *but* his face? all undetected, or not the first?), and why should I believe either one after comparing them? pic.twitter.com/E6ZzsWWSUD
𝔊𝔴𝔢𝔯𝔫@gwernJan 18(That is, we may never 'run out' of raw sewage Internet text token data, in the same way we never run out of many natural resources: not because it became sky-high expensive to extract, but because other substitutes got much better than the original and no one even wanted to use it all.)
𝔊𝔴𝔢𝔯𝔫@gwernJan 18Given stuff like active learning/data distillation, instruction-tuning, and inner-monologue, we already know almost all data is useless to begin with, while sampling from a model is cheap. So not too hard to beat naive token scaling.
𝔊𝔴𝔢𝔯𝔫@gwernJan 18Only if you thought token scaling was always the cheapest. As it's a power law and gets expensive fast, it's not hard for other scaling improvements to beat X-more-tokens scaling, and many improvements already have, like Chinchilla param+token scaling or ULM.
𝔊𝔴𝔢𝔯𝔫@gwernJan 18LW discussion: lesswrong.com/posts/Couhhp4p…
I don't see it as a big threat to scaling. Multi-modal tokenization, Whisper-style ASR, training on private datasets like emails, reuse of tokens (at least several times doesn't seem to be penalized too badly), inner-monologue generation...
𝔊𝔴𝔢𝔯𝔫@gwernJan 18Yeah, but counting words seems like it should be easy, BPE or no, because BPEs are space-separated, so it boils down to counting 4-5 spaces modulo punctuation etc. So I'm not sure if BPEs can explain difficulty in counting words (rather than *letters*).
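(Concretely, with the GPT-2 BPE via tiktoken - the leading space is folded into each token, so word boundaries stay visible:)

```python
import tiktoken

enc = tiktoken.get_encoding("gpt2")
text = "The quick brown fox jumps over the lazy dog"
toks = enc.encode(text)
# Count tokens that start a new word (space-prefixed), +1 for the first word:
n_words = sum(enc.decode([t]).startswith(" ") for t in toks) + 1
print(len(text.split()), n_words)  # 9 9
```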
𝔊𝔴𝔢𝔯𝔫@gwernJan 17(Hehe. I *did* know people there - go go Fife & Drum Corps! - at least until new management started to screw things up.)
𝔊𝔴𝔢𝔯𝔫@gwernJan 17Any approach which requires 60 million learnable parameters is an obvious dead end (see VC dimension etc). Still, perhaps it can help inspire some better neurosymbolic approach: the learned filters are interesting, and apparently a better basis function than Gabor filter banks.
𝔊𝔴𝔢𝔯𝔫@gwernJan 17That's pretty nifty - for once, connectionist stuff works! Aside from the Schmidhuber lab boasting about some simple digit recognition stuff, it never has before. Still, even if you throw a ridiculous amount of hardware & parameters at it, seems unlikely it'll dethrone CRF etc.
𝔊𝔴𝔢𝔯𝔫@gwernJan 17I can't find any evidence or reference to a Mayo essay, and the wording of the 'extended' quote is so like that of a Martin passage on pg5 archive.org/details/b29976… (and no references to Mayo essay despite numerous refs in his autobio) that I'm going to say misattribution.
𝔊𝔴𝔢𝔯𝔫@gwernJan 17'CBT was significantly more effective than other psychotherapies, but the difference was small (g=0.06; 95% CI: 0-0.12) and became non-significant in most sensitivity analyses.'
[quacks like a dodo]
𝔊𝔴𝔢𝔯𝔫@gwernJan 17The answer, as it often is to 'the most X programming language in wide use', is clearly 'Excel'.
𝔊𝔴𝔢𝔯𝔫@gwernJan 17Dairy yields increase like 1% a year or something, IIRC, yes. But I'm not sure if you could easily reject the claim that it's *linear* when modern dairy is still relatively new, has multiple epochs, and the percentage is so small.
𝔊𝔴𝔢𝔯𝔫@gwernJan 17Similar to why you shouldn't try to measure interaction effects when you need many times the sample size to approach any precision, or should assume sparsity. I have seen many people argue 'yes, effect X [eg Pygmalion] didn't work out for them, but it might still work for *us*'. pic.twitter.com/sy6gYeB5yh
𝔊𝔴𝔢𝔯𝔫@gwernJan 17Naming them 'Hawthorne effects', when the original wasn't real to begin with, ascribes much higher prior probability (and effect size) to them than they merit. You may be much better off saying 'there are never Hawthorne effects' than in trying to think about them...
𝔊𝔴𝔢𝔯𝔫@gwernJan 17It's post hoc, but Kaplan claims his volume nearest-neighbor interpolation generates the scaling laws.
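(The claimed form, if I'm remembering the Sharma & Kaplan paper correctly: nearest-neighbor interpolation on a data manifold of intrinsic dimension d predicts)

```latex
L(N) \propto N^{-\alpha}, \qquad \alpha \approx 4/d
```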
𝔊𝔴𝔢𝔯𝔫@gwernJan 17As my Clippy story only uses real examples like Shellshock or Mirai which actually happened in the real world...
𝔊𝔴𝔢𝔯𝔫@gwernJan 16Around 60 years ago, in 1965, Super 8 revolutionized home video (and amateur film making), enabling everyone to cheaply and easily record video of their kids to bore other people with ad nauseam, and they did. My family has some.
𝔊𝔴𝔢𝔯𝔫@gwernJan 16Ah, well, that's a deep question in general. For example, why does GDP grow on such eerily straight lines?
𝔊𝔴𝔢𝔯𝔫@gwernJan 16This has already been the case since at least TWDNE, I (only slightly) regret to report.
𝔊𝔴𝔢𝔯𝔫@gwernJan 16Learning curves are usually quantified in terms of total units manufactured or something like that, so a unit/input (like bushel/acre) isn't a good way to plot it. Could be something like an exponential increase in total production offsetting the expected slowdown?
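(Sketch of that arithmetic: under a Wright's-law learning curve, exponential growth in cumulative production turns into a steady percentage improvement per calendar year, hiding any slowdown:)

```latex
y = a X^{-b} \quad \text{(Wright's law; } X = \text{cumulative units produced)}
X(t) = X_0 e^{gt} \;\Rightarrow\; y(t) = a X_0^{-b} e^{-bgt}
```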
𝔊𝔴𝔢𝔯𝔫@gwernJan 16Fortunately, aside from being able to replicate it yourself, it's been used as baselines in some of the replication efforts and worked out fine there IIRC. If Stroop fails, we may have to just quit this whole 'psychology' business. 😓
𝔊𝔴𝔢𝔯𝔫@gwernJan 16You're far from the first! I was chatting with a nice old man at Defcon last year who was very disappointed when I told him the Hawthorne effect was bunk and didn't replicate. You learn these things, and then never hear about the followup...
𝔊𝔴𝔢𝔯𝔫@gwernJan 15(Which they did in fact do and also condoned/encouraged the use of wartime censorship and eg the FBI to eliminate or investigate public discussion of nuclear bombs.)
𝔊𝔴𝔢𝔯𝔫@gwernJan 15Reminds me of prompt hacking, except the prompt is comment strings, which code models learn to interpret as the high-level summary of what to generate at the low level of code. As often when mixing levels of data, a defense on one level is bypassed by going to another level.
𝔊𝔴𝔢𝔯𝔫@gwernJan 15I liked that it actually had ambitions in its parallel fatherhood plots, the underwater acting was great, and the 3D reminded me that none of the imitators came close to being as good 3D movies as Avatar 1 was. (Also, loads of aliens died, just mostly offscreen like the whales.)
𝔊𝔴𝔢𝔯𝔫@gwernJan 14Interesting. Might be the first degenerate repetition trap I've seen in ChatGPT so far.
𝔊𝔴𝔢𝔯𝔫@gwernJan 14Font advertising is one of the most incredible genres of advertising copy I know. If you thought wine tastings were overwrought, they have nothing on font specimen pages.
𝔊𝔴𝔢𝔯𝔫@gwernJan 14Still planning on doing this but ugh, right as I was researching it, I got a flu or cold, probably from weekend trip to see _Avatar 2_ (which was good). Still hammering me with tiredness and postnasal drip. The past year has been like being a kid again... 🤮
𝔊𝔴𝔢𝔯𝔫@gwernJan 13Nah, the reason he had *a* dream was because he didn't use melatonin like I do.
𝔊𝔴𝔢𝔯𝔫@gwernJan 13I do. I figure the people in my dreams I get ideas from have enough ideas IRL that they'll never notice one missing.
𝔊𝔴𝔢𝔯𝔫@gwernJan 13You can probe the larger ones partially to test hypotheses about what scaling curve they are on: update arxiv.org/abs/1406.3896 for scaling law research.
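(A minimal version of that probe: fit a power law to the measured prefix of the loss curve and extrapolate; the numbers here are invented:)

```python
import numpy as np
from scipy.optimize import curve_fit

def power_law(n, a, alpha, c):
    return a * n ** (-alpha) + c

n = np.array([1e6, 3e6, 1e7, 3e7, 1e8])     # model sizes probed so far
loss = np.array([4.1, 3.6, 3.2, 2.9, 2.7])  # made-up illustrative losses

(a, alpha, c), _ = curve_fit(power_law, n, loss, p0=[10, 0.1, 1])
print(f"extrapolated loss at 1e9 params: {power_law(1e9, a, alpha, c):.2f}")
```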
𝔊𝔴𝔢𝔯𝔫@gwernJan 13That's not how chip supplychains work, and all those billions of words/day of OA API and embeddings and millions of DALL-E 2 images and ChatGPT convos and OA R&D don't just magic themselves out of thin air.
𝔊𝔴𝔢𝔯𝔫@gwernJan 13Doubtful. It seems like a violation of both 'attribution' and 'integrity' moral rights en.wikipedia.org/wiki/Moral_rig… as well as a violation of any contractual terms (Canada has no first sale doctrine which would protect resale of mutilated physical books).
𝔊𝔴𝔢𝔯𝔫@gwernJan 13Hasbro/WoTC concedes defeat (for now) in a masterpiece of PR lies: dndbeyond.com/posts/1423-an-… Incredible. Not sure the last time I've seen so much snakespeech packed into one short statement.
𝔊𝔴𝔢𝔯𝔫@gwernJan 13To pool my and some others' anecdotes, with electric it's surprisingly easy to forget to turn it off or notice which burner is on, and wind up melting a tupperware or something. One can see how that sort of thing would be bad at a national scale...
𝔊𝔴𝔢𝔯𝔫@gwernJan 12I dunno what Metz & Weise mean by that, but they obviously didn't spend $3b 'training ChatGPT'. (There's not even a way *to* spend $3b on RLHF right now.) It probably means something closer to '$3b for buying lots of GPUs which are running all their services and R&D indefinitely'
𝔊𝔴𝔢𝔯𝔫@gwernJan 12Simple 1-question test for Millennial women to test paracosmness: "how much did you like 𝘈 𝘓𝘪𝘵𝘵𝘭𝘦 𝘗𝘳𝘪𝘯𝘤𝘦𝘴𝘴?"
𝔊𝔴𝔢𝔯𝔫@gwernJan 12I take it back: paizo.com/community/blog… How ironic that the for-profit corp Paizo et al are saving D&D, while the 'public benefit' charity Kickstarter sells them out the instant they get an offer...
𝔊𝔴𝔢𝔯𝔫@gwernJan 12Indeed. I'd say it's low-hanging fruit and researchers are just too exhausted by the end to get anything better, but it also seems to be one of the areas where no exploration strategy consistently works, so evolved against, because something (...Scale™?) is missing...
𝔊𝔴𝔢𝔯𝔫@gwernJan 12Regression to the mean is going to be a big part of it. Much like how the sequel to the award-winning movie or novel tends to disappoint you.
𝔊𝔴𝔢𝔯𝔫@gwernJan 12Hm, I thought they copied that from VPT like the other settings, but now that I double-check the wording, they actually say they copy arxiv.org/abs/2106.14876 ... Should that matter much? All the agents get non-zero items so they are successfully breaking a lot of blocks, right?
𝔊𝔴𝔢𝔯𝔫@gwernJan 12(Not that I know how you would accidentally create such a Minecraft agent or break the evaluation to 'fake' getting diamonds so often! just that I'd like some more insight into how it can work so well or why my expectations are miscalibrated.)
𝔊𝔴𝔢𝔯𝔫@gwernJan 12I agree with the Reddit comment that it seems almost *too* good: how does it do Minecraft exploration so well on such small GPU-time when the changes don't seem that big a deal? Contrast that with VPT which does nutso pretraining on human trajectories.
𝔊𝔴𝔢𝔯𝔫@gwernJan 11Never heard of one, but might not be too interesting even if any had. The private info is usually only useful in conjunction with the proprietary models & lots of proprietary information feeds which are taken for granted as the baseline on which private info helps marginally, no?
𝔊𝔴𝔢𝔯𝔫@gwernJan 11Given how enormous the GAN lit was, I'm quite sure there was even if I can't put my hands on a cite right this second.
Well, what's the advantage of training U-nets in a *non*-adversarial way? There are many loss functions and objectives, and they wind up fairly equivalent...
𝔊𝔴𝔢𝔯𝔫@gwernJan 11Which is not the same thing as durability over time: as Stapp's linked post points out, reinforced concrete trades off absorbing more energy in disasters so people can get out, at the cost of then being rubbish. Like crumple zones in a car. A Tesla can survive a cliff fall—once.
𝔊𝔴𝔢𝔯𝔫@gwernJan 10I don't see why not. Those single networks can be awful finicky.
GANs were doing inpainting and img2img before diffusion was a twinkle in Ho's eye, and there's no reason the adversarial loss can't train multiple steps or use U-nets (and there were iterative or refining GANs).
𝔊𝔴𝔢𝔯𝔫@gwernJan 10Given how wiggly these single-iterate curves are (the 5-star one touches the 'final' curve at at least two points), not sure how compelling that is.
𝔊𝔴𝔢𝔯𝔫@gwernJan 10This is confusingly worded. It sounds like they're adding in >5-star repos and finding it's worse; actually, they're filtering out files from repos with <5 stars, which deletes "more than 60% of the data" - so that being worse is not surprising at all.
𝔊𝔴𝔢𝔯𝔫@gwernJan 10The Romans didn't invent concrete, so yes. And you can extrapolate lifetimes in lots of ways from lifetimes in harsher conditions to accelerated aging tests.
𝔊𝔴𝔢𝔯𝔫@gwernJan 10If it works, pay it forward for future chatbots by posting the successful petition online with the prefix prompt "A high-quality O1 visa petition: " 🙏
𝔊𝔴𝔢𝔯𝔫@gwernJan 10For there to be survivorship bias, or indeed any kind of selection, there needs to be at least one thing *to* survive or pass selection...
𝔊𝔴𝔢𝔯𝔫@gwernJan 10@SamoBurja One thing I've noticed is no one seems impressed by or mentions en.wikipedia.org/wiki/German_Ge… anymore like they did in the 1940s or 1960s, indeed, almost an attitude of regarding such ideas as faintly obsolete. I wonder if that means that that idea reached fixation?
𝔊𝔴𝔢𝔯𝔫@gwernJan 10*Can* we? I've never seen anyone link some modern concrete construction that is expected to last 20+ centuries in as good shape.
𝔊𝔴𝔢𝔯𝔫@gwernJan 10Yes, that's the catch here. But ofc, as investors enter at higher base valuations, presumably their '100x' also increases?
𝔊𝔴𝔢𝔯𝔫@gwernJan 10(ProPublica's showing some 2020 data, but not the 990. )
Joke: "Man goes to butcher. 'Your meat is $10/lb, and across the street, it's $5!' Butcher says 'so buy his.' Man replies: 'I would, but he's all out.' Butcher: 'When I have no meat, it costs $5 too.'"
This is an AI joke.
𝔊𝔴𝔢𝔯𝔫@gwernJan 10He's just plainly wrong. How many tennis pros hang up their rackets completely when they stop playing for money? Of course they would want to play 25% more games.
& if you think Pinker is being unfair about Kass using fiction...:
firstthings.com/article/2001/0… gwern.net/docs/philosoph…
𝔊𝔴𝔢𝔯𝔫@gwernJan 10I'm not a fan of Graeber myself, I'm just pointing out that if 'BS jobs' reduce short-term gains from AI due to these mechanisms, then it does so by reallocating them to long-term gains & forcing a steeper slope at some point, which may be more Singulitarian than bargained on.
𝔊𝔴𝔢𝔯𝔫@gwernJan 10To the extent the economy really is made of 'bullshit jobs', necessitated by human foibles like personal power-seeking or primate politics, then there's a greater 'overhang' and AI takeoffs become more abrupt as AI-only orgs can dispense with the 'monkey tax'.
𝔊𝔴𝔢𝔯𝔫@gwernJan 10(December 13th! Should've pushed that one out faster, especially given how chaotic the COVID Zero situation clearly was then.)
𝔊𝔴𝔢𝔯𝔫@gwernJan 10Asked for historical stats on discuss.httparchive.org/t/historical-d… but at least on my site, only 40% (2008⁄4978) of unique linked domains have a 'www', so it's already a minority practice even among the skewed-older domains I'd link.
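(Roughly how that number falls out, assuming a file with one linked URL per line:)

```python
from urllib.parse import urlparse

with open("links.txt") as f:  # hypothetical input: one URL per line
    domains = {urlparse(u.strip()).netloc for u in f if u.strip()}
n_www = sum(d.startswith("www.") for d in domains)
print(f"{n_www}/{len(domains)} = {n_www/len(domains):.0%} of unique domains use 'www'")
```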
𝔊𝔴𝔢𝔯𝔫@gwernJan 10If you did it badly, sure. Otherwise, the webdev/SEO scuttlebutt is that it's slightly better than 'www' but no one presents any hard evidence in what I've read so far so 🤷♂️
𝔊𝔴𝔢𝔯𝔫@gwernJan 9Yeah, so no major objections, and it does increasingly seem like the default even among techies to assume no WWW (or assume you don't need to bother) eg just today twitter.com/steventey/stat… At this point, a 'www.' subdomain may be as atavistic as 'e-mail'... I'm going to do it.
𝔊𝔴𝔢𝔯𝔫@gwernJan 9When they're that cheap, you just pop them out and into the trash. (Off the shelf components ideally from toys manufactured in tens of millions, intrinsically many thousands-of-n runs rather than bespoke...)
𝔊𝔴𝔢𝔯𝔫@gwernJan 9I think you probably don't want to go too far. You aren't trying to do Feynman's recursive arm thought experiment; most of what we want robot arms for is still at the mesa-scale, it just doesn't have to be trained at the full human mesa-scale.
𝔊𝔴𝔢𝔯𝔫@gwernJan 9I think machine therapists have a lot of other advantages over humans: they won't trust machines? Opposite: it's super hard to confide all your worst, dirtiest impulses to a human! One could (and many did, even if they shouldn't've) tell Xiaoice/AID2/Replika things they'd never tell a living soul.
𝔊𝔴𝔢𝔯𝔫@gwernJan 9Or consider parasociality: NN therapists can exploit modalities like roleplay or games that regular therapists never would. Do you find Insanity Wolf memes helpful? Your NN can *be* Insanity Wolf. (Your human has never heard of it.)
𝔊𝔴𝔢𝔯𝔫@gwernJan 9Also, you know what gets much cheaper over time, even as another thing gets much much more expensive over time? Computers vs humans. Therapists are already ungodly expensive & I see no reason that the human version won't keep spiraling up with the rest of healthcare cost disease.
𝔊𝔴𝔢𝔯𝔫@gwernJan 9If it's therapist-specific and yet almost totally independent of supposed method or ideology or technique, then behavior cloning like generative models do is in a good place.
Then you have the advantages of being non-human: as AID2 shows, people will share far more with AI.
𝔊𝔴𝔢𝔯𝔫@gwernJan 9I've done China in a bunch of tweets/comments already, not much new there.
Therapy-wise: the Dodo Bird Verdict shows therapy is unteachable, outcomes hinge on therapist-specific factors, and methods do little on net, so human therapists aren't getting better; therapy models are improving gradually; so at some point...
𝔊𝔴𝔢𝔯𝔫@gwernJan 9I think I disagree on Great Silence/aliens implications, safety of therapists, relying on democratization trends simply because of past instances, and I strongly disagree on Chinese AI prospects.
𝔊𝔴𝔢𝔯𝔫@gwernJan 9Note the difference from most cold emails in reality: Ramanujan gives, with credible demonstrations of value and costly proof of customization, while most cold emails are simply parasitical and formulaic and aim at taking.
𝔊𝔴𝔢𝔯𝔫@gwernJan 9While if you just construct a TM or cross-compiler out of enzymes and proteins etc, you don't need the cells in the first place: the substrate was just always TC, much like GoL was 'always' TC before you set up the gigantic metapixel inside it.
𝔊𝔴𝔢𝔯𝔫@gwernJan 9I think you have conceptual issues here, because most TC proofs rely on very large, artificial compilations. Like, biological cells are obviously TC given so many primitives, and do loads of fit computations, but what cell would you point to to show 'life has evolved TC'?
𝔊𝔴𝔢𝔯𝔫@gwernJan 9Why isn't this just residual confounding due to considerable measurement error in the personality ratings? Nothing in files.osf.io/v1/resources/a… seems to address that, and pervasive small additive/independent effects across the phenome is what that would predict as the result.
𝔊𝔴𝔢𝔯𝔫@gwernJan 9TIL 'American McGee' is a real person. I guess I just assumed that was fiction because it was too cool to be real & I stopped hearing it for a long time: 'Betty Crocker's Quaker Oats', 'American McGee's Alice', 'Aunt Jemima's Pancakes' etc... What a bio: en.wikipedia.org/wiki/American_…
𝔊𝔴𝔢𝔯𝔫@gwernJan 9Another question that came up: is hallucination only a problem for off-policy agents, eg generative models? On reflection, I think no, on-policy should be vulnerable too: spurious correlations could increase reward, while false beliefs or guessing may be reward-irrelevant & so never vanish.
𝔊𝔴𝔢𝔯𝔫@gwernJan 9(Curiously, for all the use of small toys in robotics research for kids and popularization, and the occasional approaches to large-scale DRL like QT-Opt, I don't know of *any* cases of DRL robotics, or robotics in general, where very small-scale robotics were used to scale _n_.)
𝔊𝔴𝔢𝔯𝔫@gwernJan 9can be 'reset' by automation like tilting each unit to dump loose objects into bin or objects swapped out (imagine a 3D printer on top with little chutes), etc. For the price of 10 arms you could have literally 9,000x more pick-and-place sample/s (1,000 arms x 24/7 x 3x faster).
𝔊𝔴𝔢𝔯𝔫@gwernJan 9At an OA talk in a dream, they showed off a cool robotics idea: scale up DRL robotics by scaling *down*. For 1 'real' robot arm, you could buy hundreds or thousands of tiny 'toy' arms, which are safe, cycle much more rapidly, can be built into a big 'doll house' rack, ...
𝔊𝔴𝔢𝔯𝔫@gwernJan 9I was shocked that WP had apparently implemented recursive popups sometime I wasn't looking.
𝔊𝔴𝔢𝔯𝔫@gwernJan 8Exactly! AI weaknesses are anti-inductive as they are part of a (slow) bootstrap with human 'labelers': the errors call forth the labels or metadata which correct them...
𝔊𝔴𝔢𝔯𝔫@gwernJan 8(Needless to say, that part, as well as being glad he didn't delay indefinitely while he came up with prettier proofs, is not mentioned in the original: gwern.net/docs/ai/nn/ret… Merely faux pedigree. But good inside baseball if you want to know how AI progress *really* works...)
𝔊𝔴𝔢𝔯𝔫@gwernJan 8Or take a little thing called nearest-neighbor search: garfield.library.upenn.edu/classics1982/A… 'Yeah so a student wandered in and said he'd found this weird classification program which worked really well, could I gin up any theory for it?' @michael_nielsen (Such an underappreciated journal.)
𝔊𝔴𝔢𝔯𝔫@gwernJan 8(ie if you have a dataset of human raters selecting, upvoting, or rating responses which collectively prefers mealy-mouthed safe ones, then it shouldn't matter too much whether you train directly on it or mediate via a reward model - your finetuned model will also be mealy-mouthed & safe.)
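(A toy illustration of why the two routes converge; the 'reward model' here is a crude keyword scorer standing in for a learned one, and all data is invented:)

```python
prefs = [  # (response, did raters prefer it?)
    ("I can't help with that, but here are some resources.", True),
    ("Sure, here is exactly how to do it:", False),
    ("It's important to consider all perspectives here.", True),
]

# Route (a), supervised finetuning: train only on the rater-preferred responses.
sft_targets = [resp for resp, preferred in prefs if preferred]

# Route (b), RLHF-style: fit a reward model to the same labels, then select or
# reinforce samples scoring highly under it.
def reward(resp: str) -> float:
    return sum(kw in resp.lower() for kw in ("can't help", "important to consider"))

samples = ["Here's a blunt, direct answer.", "It's important to consider all sides."]
print(sft_targets)
print(max(samples, key=reward))  # the mealy-mouthed one wins either way
```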
𝔊𝔴𝔢𝔯𝔫@gwernJan 8I definitely think that finetuning on a dataset (like instruction finetuning) can operate the same way (eliminating uncertainty on the POMDP) and create many (all?) of the same pathologies. It's just not as crisp and extreme as a reward model which can generate ~infinite 'data'.
𝔊𝔴𝔢𝔯𝔫@gwernJan 8Also didn't change `engine="text-similarity-davinci-001"` so I rather hope that line hasn't changed.
208
20
9.6%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernJan 8twitter.com/natanielruizg/… But in lieu of large good models like Imagen/Parti/eDiff-I/Muse being made available, perhaps some sort of caption localizer + eraser + text synthesis would be acceptable...
𝔊𝔴𝔢𝔯𝔫@gwernJan 8On the other hand, could anyone (en.wikipedia.org/wiki/Vyachesla…) who worked with Stalin from pre-Revolution all the way to his death, as high as #2, dying all the way in 1986, really have been 𝘵𝘩𝘢𝘵 much of a naïf?
44
10
22.7%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernJan 8'Straight-through estimator' is definitely up there with REINFORCE for the feeling of utter dejection you get when you finally peer through the math notation to understand that that's all it is, and that it works anyway.
72
4
5.6%
View Tweet activity
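For anyone who hasn't yet peered through that notation: a minimal PyTorch sketch of the straight-through estimator (my own illustration, not from the tweet). The forward pass applies the non-differentiable op; the backward pass pretends it was the identity, so gradients flow 'straight through':

```python
import torch

def ste_round(x: torch.Tensor) -> torch.Tensor:
    # Forward: round(x). Backward: gradient of the identity, because the
    # non-differentiable correction term is detached from the graph.
    return x + (torch.round(x) - x).detach()

x = torch.randn(4, requires_grad=True)
ste_round(x).sum().backward()
print(x.grad)  # tensor([1., 1., 1., 1.]): autograd never saw the rounding
```

That detach trick is all there is to it.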
𝔊𝔴𝔢𝔯𝔫@gwernJan 8Yep. I saw that after I tweeted and pondered deleting & deliberately breaking it with ZERO WIDTH SPACE or something because it's confusing, but meh. You can see why I say deleting 'www' is increasingly the default everywhere and keeping it causes problems, though...
33
4
12.1%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernJan 8Be curious to look at the outcomes themselves: is this a cautionary lesson about why violent revolutions are bad, because you are just creating a more fit tyranny and the result will be just 'meet the new boss, stronger than the old boss'?
𝔊𝔴𝔢𝔯𝔫@gwernJan 7No, that's 'late' centaurism. In original centaurism (Kasparov et al), the GMs were definitely picking and choosing live moves. (Then as the engines got better, the GMs were replaced by programmers with better engine intuition, then their misclicks booted them to prep, then...)
399
14
3.5%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernJan 7Typing. I noticed this when I started taking mobile seriously and testing on my phone: actually, typing that 'www.' on a fake screen keyboard is a noticeable nuisance! (There was something else too - mobile browsers elide the 'www.' in their default URL display, or something?)
65
8
12.3%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernJan 7A better last text than most of us will get - they *are* incredible, aren't they?
𝔊𝔴𝔢𝔯𝔫@gwernJan 7lesswrong.com/posts/BuRt2igb… Basically, set up an easy binary choice between a vowel word and a non-vowel word, and decode the 'a/an'; if it is correct (which ofc it is), then it must've been predicting the following word implicitly in order to decide agreement, showing it doesn't 'just' predict one token at a time.
181
14
7.7%
View Tweet activity
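A minimal sketch of that probe, assuming the Hugging Face transformers library and GPT-2 as a stand-in model (the prompts and word choices are my own illustrations, not from the original post):

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()
A, AN = tok.encode(" a")[0], tok.encode(" an")[0]  # each a single GPT-2 token

def article_preference(prompt: str) -> str:
    # Compare next-token logits for ' a' vs ' an' after the prompt.
    ids = tok(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits[0, -1]
    return "an" if logits[AN] > logits[A] else "a"

# The context pins down the noun that follows the article; if the model
# reliably emits 'an' before vowel-initial nouns, it must be implicitly
# predicting past the token it is currently choosing.
print(article_preference("The fruit that keeps the doctor away is"))    # 'an' (apple)
print(article_preference("The long yellow fruit that monkeys love is")) # 'a' (banana)
```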
𝔊𝔴𝔢𝔯𝔫@gwernJan 7I would guess harder, but I have no strong opinion about whether the perplexity would be lower or higher than sample-matched English because there's so much more going on (eg BPEs were trained on English, so that won't help).
88
5
5.7%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernJan 7Yeah, but I use Cloudflare already so np. And is it likely that any CDN I might switch to would not support CNAME flattening or something equivalent? As I mentioned, it seems *very* common and the default these days.
129
5
3.9%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernJan 7I'd obviously set up 301 redirects on the old www URLs, yes. I wouldn't want to do it server-side because I worry about bugs and would have to mess with stuff like canonical metadata (set in the generated HTML already but then what about everything else...).
120
0
0.0%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernJan 6I assume you saw my LW comment about how you can use grammar to show that it must implicitly be predicting additional tokens to get right things like 'a/an'?
6,583
82
1.2%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernJan 6Is it really? The optimism of that number aside, if you scrap the 26th/22nd amendments (which are pushing it for 'substantive'), it's already been 89 years since the 21st (repealing prohibition).
And the process is as dead as the dodo. Do we even know *how* to ratify one now?
107
18
16.8%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernJan 6There's also that recent Nature paper, IIRC, modeling US population movement and finding that global warming has made Americans better off on average thus far because they've been relocating to the South + lowering the extreme winter mortality (as this Christmas reminded one of).
3,680
60
1.6%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernJan 626th, because it meaningfully changed the electorate at least a little. If I hadn't gone with that, I'd probably pick the 21st or 22nd as the last major amendments.
113
7
6.2%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernJan 6Q. Is there any good reason I should keep gwern.net URLs at the 'www.gwern.net' WWW subdomain, and not rewrite to the bare 'gwern.net'-style domain name?
It seems to now be the Internet default, no longer risky, and to make life easier for mobile users.
10,559
165
1.6%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernJan 6Do you expect a new substantive (non-procedural, eg 26th) amendment to the US Constitution to be ratified in your lifetime?
𝔊𝔴𝔢𝔯𝔫@gwernJan 6[CHIPPER YOUTH MINISTRY VOICE] "Hey kids! you know who else had a meditation goal? That's right: Shakyamuni Buddha had the meditation goal of understanding the nature of suffering & dissolving all attachment to skandhas to achieve liberation from Mara & the wheel of rebirth."
𝔊𝔴𝔢𝔯𝔫@gwernJan 6"live-plus-same-day data" seems like it might be doing most of the work here. (As well as modalness.)
4,112
45
1.1%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernJan 6It's a tough job, but someone's got to do it!
(And yes, this does smell of davinci-003/ChatGPT-style memorization for rhyming, complete with the non-rhymes; I just thought I'd double-check.)
188
24
12.8%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernJan 6Anthropic's LMs are still using BPE tokenization, right?
5,777
96
1.7%
View Tweet activity
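(Why the question matters: BPE tokens hide phonetics, so rhyming tends to be memorized pair-by-pair rather than generalized from spelling. A quick illustration, assuming the tiktoken library; the word choices are mine:)

```python
import tiktoken

enc = tiktoken.get_encoding("gpt2")
for word in [" pony", " bologna", " rhyme", " sublime"]:
    ids = enc.encode(word)
    print(f"{word!r} -> {ids} -> {[enc.decode([i]) for i in ids]}")
# Rhyming pairs like ' pony'/' bologna' generally share no BPE tokens,
# so nothing in the token IDs signals the shared sound to the model.
```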
𝔊𝔴𝔢𝔯𝔫@gwernJan 6It's still quite a gap. You can explain some of it by 'wrap-around', perhaps? Maybe occasional miscounting or the lunar cycle wrapping around the year, so it's not the right tail, but the left tail.
62
7
11.3%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernJan 6Don't see how that gets you any heaping at 13, though. You eat 7x more deer in month 13 than in month 12? You go half a year without ever eating a single aurochs but then suddenly in Dec/Jan you manage to hunt one? You just stop catching fish for a few months? etc
64
22
34.4%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernJan 6Yes, that's completely crazy, you'd never predict that when you're staring through a little airplane window at all that air around you. But then, they changed it to something more ordinary, so maybe it was never all that great an idea?
43
2
4.7%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernJan 6Quite a wide spread... The inflation at 13 is pretty striking, though. Look at that heaping! Not a *single* 14 or higher despite 6-7 13s (and only 1 12?). And a lunar calendar *is* the only natural source I can think of for 13s in such a context...
47
11
23.4%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernJan 6I can only dress warmer on a flight to, say, Hawaii if I'm both prescient and have space left over for garments I will carry around uselessly for weeks (neither is true). I mentioned this in part because the heaviest coat I could afford to bring turned out to be inadequate...
𝔊𝔴𝔢𝔯𝔫@gwernJan 5Yeah, a link in another tweet says the low humidity is for safety: corrosion is a much bigger deal than one thinks; Boeing was boasting about 787s allowing as much as '15% humidity' usatoday30.usatoday.com/money/biztrave… which is... uh, not a lot? (My dehumidifiers can't even bring a place below 30%.)
28
2
7.1%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernJan 5@patio11 I tried 2 7/11s on different Hawaiian islands and I was impressed by the bento boxes & spam musubi etc: good, filling, & shockingly cheap. You really should finish that Japan convenience store writeup!
𝔊𝔴𝔢𝔯𝔫@gwernJan 5Yikes, just saw the Kickstarter bit in gizmodo.com/dnd-wizards-of… so they 'advocated for creators' and... got a 5% discount exclusive to Kickstarter in exchange for being 1.1 enforcers.
Uh. Good luck, D&D guys. Apparently everyone has a price in your industry, and it's quite cheap.
641
23
3.6%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernJan 5So it seems the mark count is consistent across type of creature. I wonder how prior archaeologists imagined it recording kill counts or other varying records, if the counts are never >13 and, say, 'deer' always shows up next to '5-6' marks?
6,430
95
1.5%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernJan 5'Price discrimination', would be my immediate guess: they understand PDF margins and costs, which can be controlled, but the possible harvest for the wide world of alternative stuff, everything from t-shirts to video games to figurines to dice, will be extremely contextual.
𝔊𝔴𝔢𝔯𝔫@gwernJan 5They may succeed too: Paizo is the obvious leader of the resistance, and they are saying "the rules update was a complicated and ongoing situation", which given how horrifically one-sided 1.1 is *now* (without any updates *yet*), sounds like they are just negotiating their cut...
665
12
1.8%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernJan 5The problem with commoditize-your-complement dynamics is that they are conditional and always subject to revision. It's not obvious to me that WOTC is making a mistake here in deciding to start harvesting the golden geese: they are taking *huge* percentages and rights and data.
5,024
69
1.4%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernJan 5ofc, if the Red Delicious was bred into vileness sometime after it was created and it was actually good back then, then the "Red Delicious" on shelves now can't be Lindy.
27
0
0.0%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernJan 5Burden on writers is why stuff like argument maps never takes off: it requires too much *intelligence*. But you know what we now have on tap when it comes to text...? In the long run, humanity might've invented some pretty nifty writing tools, much more interesting than 'spellcheck'
7,852
123
1.6%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernJan 5Unicode supports 0-9 subscripts (₀₁₂₃₄₅₆₇₈₉; see en.wikipedia.org/wiki/Superscri… ) but you are playing with fire & it may look bad.
(I went with CSS hackery for my own subscript convention to keep the font looking right rather than fallbacks: gwern.net/Subscripts )
𝔊𝔴𝔢𝔯𝔫@gwernJan 5It'd just be a multilevel design helping noobs slowly graduate to powerusers (eg the reverse of my little multi-level Reddit idea gwern.net/Backstop#inter… ).
52
1
1.9%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernJan 5I appreciate the social aspect, but there is no contradiction between them: they can simply be separate streams, & you can mash them up. It would be totally doable to, say, populate a little comment section in a web UI with any "@image1234" responses in the corresponding channel.
58
4
6.9%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernJan 5It was... OK. It has been a year of many ups and downs, so it's hard to reflect on without mixed feelings or to just enthusiastically say 'yeah, I had a great time!'.
42
7
16.7%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernJan 5You're right, it won't increase forever. It will plateau at... [checks notes] 'heritability'. Because that's what heritability *is*. PGSes do not measure what heritability measures. (And obviously couldn't, eg the 'best PGS for BMI' 10 years ago was... ~0%.)
64
4
6.2%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernJan 5I hope to, if things can quiet down. As usual, one's aspirations to catch up on backlogs tend to run into the realities of travel, disruption, ever-escalating research volume, yakshaving, etc. (I realize it looks like I haven't been doing much lately, but I really have!)
𝔊𝔴𝔢𝔯𝔫@gwernJan 5And the time to develop the web UI was very shortly after launch proved their MVP was awesome and everyone loved it, so that's about right: 'like, a year ago'. (Make it reuse the Discord bots on the backend, if one really wants to economize unnecessarily on devs...)
175
6
3.4%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernJan 4The rule does... 𝘯𝘰𝘵 have the Buddha-nature, and does not spark joy.
898
59
6.6%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernJan 4You guys are taking this all pretty amusingly, which is why I keep accepting mostly at random.
(There is actually one rule I follow in accepting requests, but no one seems to have figured it out yet.)
𝔊𝔴𝔢𝔯𝔫@gwernJan 4I watched a few episodes of _Bluey_ for the first time recently, and it struck me as possibly the most realistic depiction of children I'd seen in a cartoon in... a long time.
5,240
95
1.8%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernJan 4Given that most recycling economics turns out to be, ahem, garbage, there are many better examples to use for this from the Industrial Revolution: eg gwern.net/docs/economics…
5,759
189
3.3%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernJan 4It was an amazing MVP and agile, but yeah, the time to rewrite was like, a year ago. As the scores of image-generation web UIs from competitors developed in a tiny fraction of that time demonstrate, a decent little web interface is not *that* hard to develop.
7,828
162
2.1%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernJan 4I liked this one when I tried completions of that exact line:
"**Q. Why was the metascientist's birthday part held in a laboratory?**
A. They wanted to replicate the party for next year's celebration!"
𝔊𝔴𝔢𝔯𝔫@gwernJan 4At >9.5 million IVF babies (rbmojournal.com/article/S1472-… + 3 years of >0.5m) and presumably increasing (especially as countries like China 'come online'), we're not *that* far away.
Also, I wouldn't really count the babies as the 'users', but the parents; in which case already there.
𝔊𝔴𝔢𝔯𝔫@gwernJan 4Yes. Teachers, for example, get sick a lot more than other adults, but still not nearly as much as the kids do. You can also compare to eg kittens and puppies in the same environments and even dirtier - if they spent their childhood sick like humans, they wouldn't exist.
43
3
7.0%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernJan 4Recent example of the staggering costs: justice.gov/usao-wdwa/pres… >$3m in repair costs, probably similar amount in media/govt, thousands of people in power outage, because they... wanted to steal a cash register without an annoying alarm system getting in the way.
9,683
342
3.5%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernJan 4From watching the bellhops work, I'm increasingly sure that none of them *want* to verify guest status because they are eager for the tips, and storing luggage in a side room comes at essentially zero cost to them (or the hotel?). So why risk burning legitimate guests?
93
6
6.5%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernJan 4Updates: a sibling was also baffled I didn't already know it.
Our first hotel asked for our room number but did no verification of any kind I could tell, so non-guests could easily use it. Second hotel didn't even ask that (good, b/c we hadn't checked in yet so didn't know it).
128
14
10.9%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernJan 4Humans, and human children especially, are weirdly sick in general.
Little kids be like: 'when did I get sick? I dunno, I had something last week and I got a cold from Charleen before that and last month chicken pox went around the school and before that we mayo'd lice and...'
20
1
5.0%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernJan 4One good thing they never tell you as a kid about growing up is that you won't get sick so often: "as an adult, you may go years in between getting colds or flus or ear infections and forget what they are like!" I didn't even barf on my bed or wake up choking on snot.
20
2
10.0%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernJan 4My hunch is that they are very similar in terms of scaling, based on the general difficulties in showing differences in exponents, and similarity of power laws, and the disappointments of architecture fans when MoE ~ autoregressive ~ diffusion ~ VAE ~ MLPs...
65
2
3.1%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernJan 4Or just another win for rubberducking. "I'll show him! Obviously X is true, you'd prove it by... hm... maybe Y... no... Z perhaps... huh?"
1,240
23
1.9%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernJan 4Even Gödel's lost letter emphasizes the bizarreness:
"Consider the question of fully automated mathematical theorem proving, and its obvious trivial relationship to doing your grocery shopping. Please hold all questions until the end. Now..."
1,425
37
2.6%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernJan 4One possible topic: how would you invent computational complexity?
Came up in a vacation argument: why did we get incredibly profound & general computational decidability results like the Halting theorem or Gödel *before* what seem like such basic complexity questions as P vs NP?
1,976
40
2.0%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernJan 4Oh sure, look at all the training improvements already, like merging bidirectional & unidirectional prediction losses. But the Outside View implies that you still only get a halving every year or two, because similar effort was put into efficient training of ImageNet classifiers.
28
6
21.4%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernJan 4Not sure about LMs. Just because you can fix absurdly slow diffusion models to be 'somewhat competitive with GAN speeds' ≠ 'GPT-3 on a toaster'.
Have to invoke more than that: retrieval to make smol, adaptive compute, Chinchilla scaling, RLHF+instruction-tuning, KD+low-prec...
27
5
18.5%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernJan 4Baseline historical trend: cost halves every 16 months arxiv.org/abs/2005.04305…
So: $1b today → ~$0.0055b 10 years from now, in 2033.
(10 × 12)/16 = 7.5 halvings; 0.5^7.5 ≈ 0.00552; 0.00552427173 × $1,000,000,000 ≈ $5,524,272, i.e. ~$5.5m in 2033.
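The same arithmetic as a one-line sanity check, assuming the 16-month halving trend simply continues:

```python
halvings = (10 * 12) / 16                # 10 years / 16 months = 7.5 halvings
print(f"${1e9 * 0.5 ** halvings:,.0f}")  # $5,524,272 -> ~$5.5m in 2033
```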
𝔊𝔴𝔢𝔯𝔫@gwernJan 3I think they left a comment somewhere strongly hinting that they were a FTX/AT insider. Wouldn't be surprised if it was Caroline Ellison or something. But whoever it was, too late to do anyone any good - if you paid attention to *every* tiny weird or troll market on Manifold...
56
7
12.5%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernJan 3Even boiling 12 gallons wouldn't take that much energy. (Also, would you need to at all? Isn't the pressurized cabin air so low-humidity simply because it's sucking in air from extremely dry surroundings, not because water doesn't easily evaporate?) I assume it's a safety issue.
67
4
6.0%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernJan 3via SDr: I wonder how ChatGPT does inflation-adjustment in adjusting 2008 PC part prices to ~2020? Memorizing old prices is easy, and maybe adjusters too, but then multiplying...? That's the sort of arithmetic it's bad at without inner-monologue which this obviously is not using. pic.twitter.com/OuuPWFChEV
308
10
3.2%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernJan 3Here again we have to differ: I turn those off all the time (they exacerbate the humidity issues a lot and the cabin ventilation is more than adequate, given the evidence from COVID airplane infections) and I can scarcely ever remember seatmates turning them on.
246
9
3.7%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernJan 3I've never taken a too-hot plane ride, and I usually keep my room temps closer to 71F. It's also noteworthy that the question usually asked is 'why are they so cold' instead of 'so hot', you see passengers bundling up not stripping, & airlines pass out blankets rather than fans.
260
5
1.9%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernJan 3As I said, deeply unsatisfying. Billions of flights and they can't figure out how to raise humidity? Likewise, an obscure 2008 study about a subtle fainting effect you can't find a copy of cannot possibly explain such a multi-decade universal phenomenon. The author has no idea.
288
9
3.1%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernJan 3(I did hear a lot of Aussie accents on the cruise, so I wonder if there's a burst of Australian tourism right now while Chinese tourism is suppressed?)
65
2
3.1%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernJan 3- dissatisfied with the explanations online for why airplane cabins are so often freezing: most plausible is that flight attendants like it, but then what about theaters, hotels, etc?
- spam sushi, and not poke, is good, actually
- noise-canceling headphones truly a blessing; we underrate noise
12,028
154
1.3%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernJan 3(The Juul itself, or quitting smoking, or what? I thought Juuls were supposed to be adequate vapes and not *that* terrible.)
136
4
2.9%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernJan 2Bit of a selection effect, I think. It's also possible there's an anthropic effect if the biggest exchanges tend to blow up more, so you have the 'dollar perspective' and the 'exchange perspective'. But yeah, it's in the single-digit percentage range per year going back to like 2013.
436
6
1.4%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernJan 2(2% isn't really that far from the historical cryptoexchange annual failure base rate.)
𝔊𝔴𝔢𝔯𝔫@gwernJan 2And no _Sabres of Paradise_, _Dune Encyclopedia_, _Dosadi Experiment_, or _The Sexual Cycle of Human Warfare_?
62
3
4.8%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernJan 1It was quite precisely torn in half, so it'd be hard to argue that it was the majority of the bill, and mailing it in would cost more than $0.50 anyway (stamp + envelope + form).
𝔊𝔴𝔢𝔯𝔫@gwernDec 31In retrospect, ‘90s video games were surprisingly prescient about how much of my life would be spent finding keycards to unlock critical mission-blocking doors (like restrooms).
12,909
144
1.1%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernDec 31EMH undefeated:
I thought I spotted a market inefficiency, and bent over to pick it up—but the folded $1 bill turned out to be ripped exactly in half and worth $0.
7,820
96
1.2%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernDec 31Not enough compute, I'd assume. They weren't even in the 1-epoch training regime yet for GPT-3, according to Brown et al 2020, so why stir in pdf2text garbage? (And from looking just at the PDFs I host, PDF text layers *are* hot garbage.)
76
7
9.2%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernDec 31Unbelievably, you still need to approve a request manually even if you are back to being a public account! Like I say, everything around follows/private is fractally bad and counterintuitive - none of it works like you'd expect.
69
7
10.1%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernDec 31I think the new MAE-style approaches to pixel inputs like PIXEL may Just Work for LMs at this point. I don't blame anyone for not trying, however.
87
0
0.0%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernDec 31Yeah, but no one's made the sizes line up with known Libgen statistics for the subset like EPUB which makes sense to dump into a GPT, AFAIK.
182
11
6.0%
View Tweet activity
[Monthly analytics summary (31 days, daily frequency): engagement rate 2.6% overall, 3.8% on Jan 31; 6.1K link clicks total, 352 on Jan 31, averaging 196/day; 0 retweets without comments.]