𝔊𝔴𝔢𝔯𝔫@gwernNov 30(The price of causal inference: the fewer feet you restrict your sample to, the more credible the local treatment effect is... but the looser the lower bound on the global effect becomes, as you more precisely estimate a causal effect you don't really care about.)
𝔊𝔴𝔢𝔯𝔫@gwernNov 30Sounds 'conservative' to me. Even conservatives will agree that things like suppositories or enemas or other such things (including doubtless many classes of medical interventions we would both prefer to remain ignorant of) designed to be put in anuses should be put into anuses.
223
7
3.1%
𝔊𝔴𝔢𝔯𝔫@gwernNov 30It's clearly Markdown. What would the first 2 evaluate to (and where is that evaluation result?), and how would any calculator know the 3rd one evaluates to 'square inches'?
84
3
3.6%
𝔊𝔴𝔢𝔯𝔫@gwernNov 30In 2016, surveyed AI experts predicted that "Perform as well as the best human entrants in the Putnam competition" would take 34 years to achieve, in 2050: arxiv.org/pdf/1705.08807… 🤔
6,710
179
2.7%
𝔊𝔴𝔢𝔯𝔫@gwernNov 29"The news is that the hackers dumped the information not that the hack happened >:|"
Again, the Dailydot article contains nothing about the *new* hack being dumped. So you apparently are referring to the first one, which is not news and cannot be warned about.
295
12
4.1%
𝔊𝔴𝔢𝔯𝔫@gwernNov 29Now you're equivocating. dailydot.com/debug/twitter-… says that the first known hack is the one being released for free (again).
It doesn't say the *second* new hack is being dumped, or for free. Make up your mind which you are talking about.
202
13
6.4%
𝔊𝔴𝔢𝔯𝔫@gwernNov 29'Dumped'... as opposed to selling it to all comers, which is how Twitter found out about and announced that hack back in August, which was covered back then, & which this new article mentions was also *already* released for free before in September? Like, you can't lose your virginity twice!
𝔊𝔴𝔢𝔯𝔫@gwernNov 29To avoid being forced to eat the steak, or to be sure of getting one?
5,510
52
0.9%
𝔊𝔴𝔢𝔯𝔫@gwernNov 29(Painful how loud that looks. Can you hear anyone talk in between all the echoes?)
4,223
35
0.8%
𝔊𝔴𝔢𝔯𝔫@gwernNov 29('Let post-posterity be the judge' was right there.)
6,537
72
1.1%
𝔊𝔴𝔢𝔯𝔫@gwernNov 29The downside is that when it knows you, it can roast you. Someone was trying out the humor with -003:
'write a joke about gwern branwen: | Q: What did Gwern Branwen do when he was asked to tell a joke? A: He said, "I'm sorry, I don't do jokes. I only do meta-analyses!"'
🪦
𝔊𝔴𝔢𝔯𝔫@gwernNov 28I'm confused. Leike seems to think the tokenization hasn't changed, and doesn't know why their RL alignment secret sauce seems to help. It also is progressive, apparently, with 003>002. Maybe... they dumped in some IPA encoded text or rhyming dictionaries?
𝔊𝔴𝔢𝔯𝔫@gwernNov 28(So tokenization *did* change between 001 and 002, then?)
918
39
4.2%
𝔊𝔴𝔢𝔯𝔫@gwernNov 28If so, probing the other BPE-affected capabilities should yield larger gains on 002: phonetics is a lot of knowledge you have to learn the hard way, but something like shuffling letters or arithmetic ought to be learned much quicker once tokenization is fixed.
201
8
4.0%
𝔊𝔴𝔢𝔯𝔫@gwernNov 28Yeah, I saw reports of that too but the person I asked didn't provide any prompt and no one seemed to do it reliably and the ones they did show all looked common & memorized.
Maybe 002 switched tokenization but hadn't trained adequately to unlearn its blindness to phonetics yet?
𝔊𝔴𝔢𝔯𝔫@gwernNov 28Also good for lightweight testing. Curious whether negative prompts add anything over best-of or vice-versa? Just generate pairs to upload simultaneously with 2 tags denoting control or experiment, and later on look at them or user ratings, or use the DB for explicit A/B testing.
5,934
22
0.4%
𝔊𝔴𝔢𝔯𝔫@gwernNov 27They claim to have some baselines from the longitudinal part showing that the Toxo wolves look fairly similar to the non-infected ones, so I'm willing to give them a pass on that for now.
78
3
3.8%
𝔊𝔴𝔢𝔯𝔫@gwernNov 26Should be pretty easy, overall, if you already have a working booru and working automatic-generation. From there, just working your way through old Danbooru IDs & using the API to fetch tags shouldn't be too much work. CLIP ranking's more work but an independent benefit/feature.
215
4
1.9%
𝔊𝔴𝔢𝔯𝔫@gwernNov 26We were very pleased when @arfafax came up with it back in Tensorfork. 'How extremely stupid to have not thought of that!'
Yes.
You could, but that might lose the value of the side-by-side comparison if you use *too* much from a human original?
76
2
2.6%
𝔊𝔴𝔢𝔯𝔫@gwernNov 26Nice thing about this is that it can be fully automated, can be ratcheted upward as compute allows (just generate more to rank & select from), and you'll get inherently more variety than the regular users seem able to summon and provide a showcase for very diverse prompts.
10,253
68
0.7%
𝔊𝔴𝔢𝔯𝔫@gwernNov 26So, why not have a 'Ganbooru' which just generates an image for each live D upload (delay a week for tags to be finished)? Copy over the tags, and use the spare time to do CLIP-based best-of-𝘯 ranking. (Quality over quantity.)
7,701
40
0.5%
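The best-of-𝘯 ranking step can be sketched roughly as follows. This is only an illustration: `score` is a hypothetical stand-in for a CLIP image-text similarity function, `toy_score` is a made-up scorer over text descriptions, and nothing here calls a real CLIP or Danbooru API.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def best_of_n(
    candidates: Sequence[T],
    tags: Sequence[str],
    score: Callable[[T, Sequence[str]], float],
) -> T:
    """Return the candidate that the scorer rates highest for the tag set."""
    if not candidates:
        raise ValueError("need at least one candidate")
    return max(candidates, key=lambda c: score(c, tags))


def toy_score(candidate: str, tags: Sequence[str]) -> float:
    """Hypothetical scorer: counts how many tags appear in a text description.

    A real system would compare image and text embeddings instead.
    """
    return float(sum(tag in candidate for tag in tags))


# Pretend each string is one generated sample for the same Danbooru tag set.
samples = ["1girl solo", "1girl solo smile outdoors", "landscape"]
best = best_of_n(samples, ["1girl", "smile", "outdoors"], toy_score)
```

Generating more samples per upload only lengthens `samples`, which is why the quality can be ratcheted up as compute allows.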
𝔊𝔴𝔢𝔯𝔫@gwernNov 26Thought a little about whether I want to use my 'Ganbooru' domain to stand up a generative project of some sort. An idea I like: no one is doing side by side comparisons of Danbooru tag sets with generated images of said tags, for fairest comparison. Danbooru is ~1 image/minute.
5,254
35
0.7%
𝔊𝔴𝔢𝔯𝔫@gwernNov 26+27 days, and now +86k to 162k hits. So, slowed down (would be ~quadruple, not 2.1x). Around 3.5k/day now? Looks like steady-state might be much lower and pending on some big upgrades. (SD2 probably isn't going to move the needle much.)
𝔊𝔴𝔢𝔯𝔫@gwernNov 26Lots of OGs and commentators like Matt Levine have made precisely that point before: the success of centralized exchanges undermined the entire point. After all, why do you need PoW or PoS etc when an Excel spreadsheet at Coinbase or FTX is so much faster & more efficient?
𝔊𝔴𝔢𝔯𝔫@gwernNov 26Oh, it's just the obvious generalization of arrays to _n_ dimensions. (If you find it hard to visualize a 512-dimensional tensed flyswatter, just visualize an ordinary 3D-tensed flyswatter and then say '512!' loudly to yourself. That's what everyone else does.)
233
8
3.4%
𝔊𝔴𝔢𝔯𝔫@gwernNov 25Underselling the improvement there: single-headshot samples are probably TADNE's best area, while Anything V3 can do far more complex multi-object scenes that TADNE utterly falls apart on.
4,128
33
0.8%
𝔊𝔴𝔢𝔯𝔫@gwernNov 25Yes, that's where it came from.
(I agree that it would be a very fittingly Germanic thing to generalize from what Germans liked to do in the late 1800s to almost all possible alien species, but it is probably not a good basis for universal xenopsychological principles.)
115
5
4.3%
𝔊𝔴𝔢𝔯𝔫@gwernNov 25Damn, endorsing unreplicable candidate-gene hits - they made a mistake letting him out.
206
12
5.8%
𝔊𝔴𝔢𝔯𝔫@gwernNov 25His arithmetic is fine, he's just wrong because he's using the wrong numbers twice over: his '$39b' is Facebook's capex reported number, which includes stuff like building underseas fibreoptics or datacenters; and because Metaverse R&D would be accounted for under *opex* anyway!
51
1
2.0%
𝔊𝔴𝔢𝔯𝔫@gwernNov 25And to post-WWII Anglo culture, to boot. If you're going to claim PhDs are universally convergent, that's pretty strange considering most times and places doing research had no need for them.
𝔊𝔴𝔢𝔯𝔫@gwernNov 25My guess is that it's either only fitness-boosting early on while they were repopulating Yellowstone (which is when all the samples are from) or they tend to form dead-end packs: lesswrong.com/posts/snQGEAK8…
5,939
183
3.1%
𝔊𝔴𝔢𝔯𝔫@gwernNov 25How old were you when you realized wire fly-swatters are supposed to be tensed for more rapid swatting (and that's why they're twisted metal wire instead of plastic)?
14,229
566
4.0%
𝔊𝔴𝔢𝔯𝔫@gwernNov 25Wait, that's an option? I think I'd like to start bugging people to throw away Alexas right now, because they're so annoying. (I'll argue that one should throw them away because they're useless & *not* AGIs. No one will remember my hypocrisy when I eventually flip the reason.)
220
10
4.5%
𝔊𝔴𝔢𝔯𝔫@gwernNov 25You really think 'self-identifying as vegetarian' is *more* socially ostracizing, controversial, and harder than believing in AI x-risk with short timelines? Among which groups, exactly?
2,101
72
3.4%
𝔊𝔴𝔢𝔯𝔫@gwernNov 24I wouldn't be surprised if it was something like 1.6t. But 100t+ would surprise me; on the gripping hand, it'd be a fun surprise because that would imply some major multimodal and/or MoE breakthroughs.
𝔊𝔴𝔢𝔯𝔫@gwernNov 24It's still trying to maximize your probability of recall and would produce an overload when applied to a realistic number of notes or excerpts. The spaced repetition objective it maximizes is fundamentally the opposite of what is desired for serendipity/exploration.
183
20
10.9%
𝔊𝔴𝔢𝔯𝔫@gwernNov 23Huh, is this what happened to the Sberbank AI guys?
𝔊𝔴𝔢𝔯𝔫@gwernNov 23Yes, that goes without saying: the master breaks the rule to follow a deeper rule. You may be able to bluff occasional random errors but if you don't have a track record, it becomes a thing you can't countersignal lesswrong.com/posts/ThhNvdBx… and you fall to the floor of the test.
166
35
21.1%
𝔊𝔴𝔢𝔯𝔫@gwernNov 23Startup Idea: the first group to train a character-level LM can offer a premium product to the elite until everyone else catches on. 🤑
210
6
2.9%
𝔊𝔴𝔢𝔯𝔫@gwernNov 23So, if standard usage 'moves down' & merely shows you are top decile rather than top ventile, then that leaves a top percentile gap. That will be signaled. Probably it will be replaced by 'rule breaking' style: showing mastery by breaking rules. You see that on Twitter already.
2,027
49
2.4%
𝔊𝔴𝔢𝔯𝔫@gwernNov 23An analogy would be spellcheck: it made correctly-spelled texts less impressive (because that may just mean you took 10s to use a free spellcheck on it) but conversely, incorrectly-spelled text far more negative (so, you couldn't even take 10s to spellcheck it...?).
3,543
56
1.6%
𝔊𝔴𝔢𝔯𝔫@gwernNov 23It'll do both. Think of it statistically like ROC: a writing is a 'test' of competence, and every test has different ceilings, floors, and entropy in each range. Standard English will have a lower ceiling ('that's just run through a NN') but a lower floor ('didn't even bother').
3,220
71
2.2%
𝔊𝔴𝔢𝔯𝔫@gwernNov 23They are so nekulturniy they don't even understand that her flatness is a charm point.
1,574
20
1.3%
𝔊𝔴𝔢𝔯𝔫@gwernNov 23$15k a year for treason on a videotape? You can see the logic: no one really thinks the CCP will invade, so it's 'free' money... as long as you don't get caught, of course. Picking up pennies in front of a steamroller.
𝔊𝔴𝔢𝔯𝔫@gwernNov 22Oh, I don't really do important problems - I'm not really a researcher, I'm a writer. But I do definitely agree with him on the attack part - I'm not looking for important things to write about, simply things I have a good attack on.
𝔊𝔴𝔢𝔯𝔫@gwernNov 22As soon as you see 'Tarjan' in the author list, you know you're in for something.
60
0
0.0%
𝔊𝔴𝔢𝔯𝔫@gwernNov 22Hm. Useful for data augmentation too, perhaps? Find similar captions to an image, then crop it according to each caption; now you have valid sub-images+caption to train on.
3,906
47
1.2%
𝔊𝔴𝔢𝔯𝔫@gwernNov 22I've always treated my tweets as drafts/notes, so it would be foolish of me to throw them away having incurred all the costs but not all the benefits.
57
4
7.0%
𝔊𝔴𝔢𝔯𝔫@gwernNov 22Or the Time Warner/AT&T merger (do you even remember that? I didn't) lost ~$50b archive.ph/R3KlI and the original TW/AOL like $100b, in addition to your link about >$100b lost in pandemic fraud...
Scale of the global economy is incredible compared to 'all AI R&D ever'.
𝔊𝔴𝔢𝔯𝔫@gwernNov 21You joke, but there are still pretty much no decent scaling law papers for *anything* diffusion. All the arch settings, param counts, dataset sizes for anything image diffusion (never mind discrete or text diffusion, or audio etc) are basically just guesswork.
51
1
2.0%
𝔊𝔴𝔢𝔯𝔫@gwernNov 21(Sorry, if I have to explain the allusion to 𝘩𝘰𝘪 𝘱𝘰𝘭𝘭𝘰𝘪, then it wouldn't be funny anymore.)
613
40
6.5%
𝔊𝔴𝔢𝔯𝔫@gwernNov 21Cool, yes, but also explains what happened with the statues.
917
59
6.4%
𝔊𝔴𝔢𝔯𝔫@gwernNov 21Mine worked 2 days ago but it's only 348MB so not as chonky as yours.
𝔊𝔴𝔢𝔯𝔫@gwernNov 20A 'commitment mechanism'? Gosh, that sounds like some sort of *cult*. And only some sincere tryhard would care about such a thing in the first place or uphold any values non-ironically.
5,348
52
1.0%
𝔊𝔴𝔢𝔯𝔫@gwernNov 20I have! Regrettably, it took me exactly 29m:59s to do so and my 'session timed out'. 😵‍💫
90
3
3.3%
𝔊𝔴𝔢𝔯𝔫@gwernNov 20The awesomeness of Arxiv is obvious as soon as you start reading any other field's research.
"Wow, I just lost the evening going through APA PsycNet open tabs!" - no one ever
937
35
3.7%
𝔊𝔴𝔢𝔯𝔫@gwernNov 20And, to belabor the obvious, Tunguz's implied critique is of Musk/Twitter, but it shows the opposite: if G+ could be such a superior product *and* do so well despite fatal network effects and an equally fatal owner, then that bodes well for a rebuilt Twitter, not ill!
414
15
3.6%
𝔊𝔴𝔢𝔯𝔫@gwernNov 20No, it also had better communities, not just better code. It's just that it wasn't 10x better than established ones, and then, of course, Google orphaned it and eventually axed it.
677
14
2.1%
𝔊𝔴𝔢𝔯𝔫@gwernNov 20Doesn't land very well because if you actually used Google+, you knew it was (like Google Reader) pretty good. I only started using Twitter after G+, and it was a definite downgrade. And competitors keep reinventing parts of it like circles.
10,742
307
2.9%
𝔊𝔴𝔢𝔯𝔫@gwernNov 19Because they need to be tuned for each problem, defeating the point. Real exploration is a convergent instrumental drive, just 'solving POMDPs' but relabeled; scaling (across many problems, not samples) solves this. 😉
271
12
4.4%
𝔊𝔴𝔢𝔯𝔫@gwernNov 19My view was that on-chain fees/slowness killed Augur. When I used it, it cost several bucks to do any trade & would take like 10 minutes to execute, ugh. No wonder Polymarket/Manifold/Metaculus/Kalshi have done so much better!
But with the Merge done (!) and off-chain scaling...
111
7
6.3%
𝔊𝔴𝔢𝔯𝔫@gwernNov 19A demonstration of Stockholm/Ikea effect: the Ford Model T was the worst car ever sold to that many people (because it set the bar such that a car even worse would never get sold); but spend that much time using it & mastering/repairing it, and people will come to love it anyway.
51
3
5.9%
𝔊𝔴𝔢𝔯𝔫@gwernNov 19Not to mention the many nonlinearities. Would one argue nuclear bombs aren't dangerous because they scale from 'less than conventional explosives like MOABs' to megatons? No, of course not, because you can't put millions of tons of TNT into a bomber or the nose of an ICBM.
94
1
1.1%
𝔊𝔴𝔢𝔯𝔫@gwernNov 19Yeah, that'd probably be the simplification: popin cards until they tile available space, pushing each one down left-to-right, and then when you run out of space, start collapsing the older ones into a little tab of some sort at the lower-right corner/edge (like yours but bottom).
1,228
19
1.5%
𝔊𝔴𝔢𝔯𝔫@gwernNov 19Yes, that was the ref I found: AFAICT that was the first real translation, and the preface only cites English secondary literature starting in the 1970s. Meanwhile, you have Ettinger and everyone going back before 1990 or 1970s. So temporally, Stross's claim doesn't work.
199
7
3.5%
𝔊𝔴𝔢𝔯𝔫@gwernNov 18Yeah, I think part of why it's crazy is that AFAICT Cosmist materials like Fyodorov's collection hadn't even been translated into English. Very eyeroll. But at least Stross influenced Hannu into using it for cool SF worldbuilding, so I forgive him for that.
199
15
7.5%
𝔊𝔴𝔢𝔯𝔫@gwernNov 18Yep. I grew up with German shepherds and even though they are not the smartest dogs, I definitely thought to myself, 'you guys are almost too smart to keep cooped up in suburbia and still be happy...' Our smaller dogs always seemed happier.
74
3
4.1%
𝔊𝔴𝔢𝔯𝔫@gwernNov 18Maybe it already exists but where Whisper has clear commoditize-your-complement dynamics for OA, a *text* Whisper is a huge secret weapon... 🤔
391
13
3.3%
𝔊𝔴𝔢𝔯𝔫@gwernNov 18Stross wouldn't! I think he's still occasionally pushing his "transhumanism is actually just Russian Cosmism with the serial numbers filed off" thing.
But more seriously: if so, you might enjoy _The Quantum Thief_ trilogy.
191
12
6.3%
𝔊𝔴𝔢𝔯𝔫@gwernNov 18As cool as summoning popups & recursing through them is, I don't know to what extent people actually read & interact with them extensively as opposed to, say, read the title & first few lines before deciding to close or queue in a new tab for later...
987
8
0.8%
𝔊𝔴𝔢𝔯𝔫@gwernNov 18Yes, it addresses one of the fundamental problems with any kind of popup: it's not very 'calm'. Kanbans are; if they popin permanently and stably, and you spend more time reading them with a saccade rather than mouse-interacting with them, it may be better on net.
1,083
15
1.4%
𝔊𝔴𝔢𝔯𝔫@gwernNov 18Note that Dr Sinister says that only of the *20th* century.
159
3
1.9%
𝔊𝔴𝔢𝔯𝔫@gwernNov 18I should add an org section to gwern.net/notes/Faster .
Hm. Craigslist, 50; Instagram, 13; WhatsApp, 50; Reddit, 700; Wikipedia... probably like 10-50 for the actual hosting rather than rest of WMF parasite.
5,713
403
7.1%
𝔊𝔴𝔢𝔯𝔫@gwernNov 18(Does it involve more or less suffering when wild animals are simply killed by other wild animals instead of humans?)
7,482
181
2.4%
𝔊𝔴𝔢𝔯𝔫@gwernNov 17Yeah, don't you need to 'burn' the donations in some way? Otherwise there's nothing at stake. I send in my $5k contrib or whatever to the escrow service, locking down $5k from some sucker on the opposite side, and then send my actual $5k contrib to my candidate - for $10k net.
60
6
10.0%
𝔊𝔴𝔢𝔯𝔫@gwernNov 17Still working out the bugs on this, sadly. As the overall arch gets more baroque, tech debt piles up. But next cool change should be rendering popups offscreen as soon as you hover, rather than on timeout, and only showing them once the timeout fires. Ought to eliminate ~100% of user-visible popup slowness or delay.
8,769
46
0.5%
𝔊𝔴𝔢𝔯𝔫@gwernNov 17Yeah, I saw that, but the balance sheet is crack & AIDS, so believing any implication that the $500m equity is fully paid for, simply because any sane accountant would expect a liability for the tranches to be included if it hasn't been, is... dubious.
340
21
6.2%
𝔊𝔴𝔢𝔯𝔫@gwernNov 17(That is, if you aren't going to count the full nominal amount of all the sports deals etc because FTX hadn't paid out 100% of it, then you shouldn't count the full nominal amounts of all Anthropic and charity grants either for the exact same reason.)
620
23
3.7%
𝔊𝔴𝔢𝔯𝔫@gwernNov 17Why is that a lower bound? Isn't a big part of the problem that they didn't (and now can't) pay out a lot of those grants? Clawbacks aside, presumably the real number is lower.
𝔊𝔴𝔢𝔯𝔫@gwernNov 16Twin experiments are awesome. You could send up 20 pairs of astronauts and still not have the statistical precision a single identical-twin pair gives you. And because the sample size is small, you can do intensive measuring - imagine trying to do all that with 40 people, not 2!
1,298
37
2.9%
𝔊𝔴𝔢𝔯𝔫@gwernNov 16Nah, it's not like the Semantic Scholar corpus is any more legal than SciHub. Just no one wants to mess with PDFs that don't have clean born-digital text to extract. (I keep highlighting stuff like PIXEL which shows that it probably isn't *too* hard to handle raw PDF. We'll see.)
369
24
6.5%
𝔊𝔴𝔢𝔯𝔫@gwernNov 16Oh sure, all that is there too, but the basic setting seems more Vingean to me, and then you have sequences like the trojan -> ship takeover/suicide: that's just the opening of _A Fire Upon the Deep_.
300
23
7.7%
𝔊𝔴𝔢𝔯𝔫@gwernNov 16Incredible "you miss 100% of the shots you don't take" example.
But I think I liked it better when SBF might've just been a misguided EA who lost some +EV gambles or something, rather than a sociopath who never gave a toss about EA or its ideals.
10,081
699
6.9%
𝔊𝔴𝔢𝔯𝔫@gwernNov 16You can see that sketching out an imitation of this sort of popin rather than popup doesn't work with the single-col gwern.net layout - just not enough margin space to make it sane anywhere with left-to-right reading... But a more newspapery layout might work. pic.twitter.com/RNFv4UUZKk
1,763
38
2.2%
𝔊𝔴𝔢𝔯𝔫@gwernNov 16As I said, it's a mockup, using his site not mine bc I don't have multiple-col layout leaving vertical space. The idea is that instead of popping up when you hover on a link, it pops *in* to the lower row, where they are organized like cards in a kanban (but horizontally).
1,309
27
2.1%
𝔊𝔴𝔢𝔯𝔫@gwernNov 16(That is, each annotation is processed for a list of links in it; these are then put into a list with transclusion of their own annotation enabled. This list is transcluded at the bottom of each popup. Then for tag-directory, it just transcludes a list of those lists. Tada!)
8,601
16
0.2%
𝔊𝔴𝔢𝔯𝔫@gwernNov 16Because transcludes are lazy, can keep defining large lists w/recursive transcludes which'd bring a browser to its knees if strictly evaluated.
So can define appended link bibliographies for *annotations* & the tag-directory linkbib is just the union! eg gwern.net/docs/ai/nn/tra… pic.twitter.com/MZzCFjYWRS
5,117
36
0.7%
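A toy model of the lazy link-bibliography idea, using Python generators in place of browser-side transclusion. The annotation graph, the names `link_bibliography` and `tag_directory_bibliography`, and the `expanded` bookkeeping list are all assumptions made for illustration, not the site's actual implementation.

```python
from itertools import islice
from typing import Dict, Iterator, List, Set

# Hypothetical annotation graph: each annotation lists the links it contains.
ANNOTATIONS: Dict[str, List[str]] = {
    "paper-1": ["paper-2", "paper-3"],
    "paper-2": ["paper-4"],
    "paper-3": [],
    "paper-4": [],
}

# Records which annotations were actually expanded, to make the laziness visible.
expanded: List[str] = []


def link_bibliography(annotation: str) -> Iterator[str]:
    """Lazily yield an annotation's links, each followed by its own bibliography."""
    expanded.append(annotation)
    for link in ANNOTATIONS.get(annotation, []):
        yield link
        yield from link_bibliography(link)


def tag_directory_bibliography(annotations: List[str]) -> Iterator[str]:
    """Lazily yield the de-duplicated union of the per-annotation bibliographies."""
    seen: Set[str] = set()
    for annotation in annotations:
        for link in link_bibliography(annotation):
            if link not in seen:
                seen.add(link)
                yield link


# Asking for one entry expands only the first annotation, not the whole tree.
first = list(islice(tag_directory_bibliography(["paper-1", "paper-2"]), 1))
```

Because nothing is expanded until a consumer asks for it, the nested lists can be defined freely; the cost is only paid for the parts a reader actually opens.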
𝔊𝔴𝔢𝔯𝔫@gwernNov 15Yes but sheepdogs even more so. Just start with the 'dog of breed X can learn Y words', and compare with lists of problem behavior when not given heavy work.
336
23
6.8%
𝔊𝔴𝔢𝔯𝔫@gwernNov 15Maybe a bit derivative of Vinge, but I enjoyed it and the references.
4,031
48
1.2%
𝔊𝔴𝔢𝔯𝔫@gwernNov 15I'll give the same reason I gave the last dozen people to suggest this: because those very intelligent dog breeds are *also* the ones which are most destructive and neurotic when living in typical urban/suburban conditions. If anything, we should be breeding (even) dumber ones.
7,094
227
3.2%
𝔊𝔴𝔢𝔯𝔫@gwernNov 15Interesting how many of the 'wacko libertarian' positions in that are now fairly mainstream, especially on the left wing: decriminalizing crack cocaine + 'pro-LSD lobby', no mandatory conscription, defunding ICE, ending ag subsidies...
2,731
41
1.5%
𝔊𝔴𝔢𝔯𝔫@gwernNov 15(And ofc MAE pretraining is efficient because most of the pixels are dropped out and don't get processed.)
𝔊𝔴𝔢𝔯𝔫@gwernNov 14As I said, you are expecting a lot from someone who is in one of the worse positions to be in, because the things that need to be concealed can easily be concealed from him, as opposed to from regulators, auditors, and investors.
224
9
4.0%
𝔊𝔴𝔢𝔯𝔫@gwernNov 14That also doesn't seem the case, either career or character: MacAskill got SBF into Jane Street? MacAskill set up FTX? MacAskill knew about the transfers to Alameda, which supposedly only SBF and Caroline knew about? His involvement seems to be mostly PR and the charity end.
561
93
16.6%
𝔊𝔴𝔢𝔯𝔫@gwernNov 14OK, so, you have some better source of billions? Because I don't. That a plan is not ideal doesn't mean that you have any better alternatives.
572
32
5.6%
𝔊𝔴𝔢𝔯𝔫@gwernNov 14Arguing that the response to bad risk management is to stop even trying is truly the nirvana fallacy at its purest.
638
34
5.3%
𝔊𝔴𝔢𝔯𝔫@gwernNov 14[meme] "We should invest more in preventing tail risks because of human error, fraud, chance, and a myriad of other risks affecting human response to novel situations and problems."
"And yet, you have been hurt by a tail risk. Curious! I am very intelligent and on Twitter."
253
19
7.5%
𝔊𝔴𝔢𝔯𝔫@gwernNov 14Second, the failure to detect close frauds is consistent with the EA worldview emphasizing error, while it's inconsistent with the non-EA worldview you are espousing here: "everything is fine just fine we can trust institutions, if you don't detect fraud you're just incompetent".
266
17
6.4%
𝔊𝔴𝔢𝔯𝔫@gwernNov 14First, *why* should they have been particularly in the know? The only people well-placed to 'be in the know', aside from the fraudsters, are the financial regulators, auditors, and investors who have things the fraudsters want, not who want things from the fraudsters.
838
57
6.8%
𝔊𝔴𝔢𝔯𝔫@gwernNov 14This point makes little sense to me. I should stop worrying about asteroids, AGI, global pandemic, or a few decisionmakers blowing up (the world, not a financial firm) because... I didn't see an Excel spreadsheet backdoor? If it's not apples/oranges, shouldn't that be opposite?
10,001
387
3.9%
𝔊𝔴𝔢𝔯𝔫@gwernNov 13(This is an amusing use of 'MSM'. I have to ask, was that deliberate or just a happy accident?)
1,866
35
1.9%
𝔊𝔴𝔢𝔯𝔫@gwernNov 13[flashes back to several nasty bugs, particularly in the shell, where changing the name of the function changes its behavior, for a variety of reasons such as bypassing a *different* function in the call path or namespace]
7,651
125
1.6%
𝔊𝔴𝔢𝔯𝔫@gwernNov 13But the norm historically is that you wouldn't see *most* discussions that revolved around you: you wouldn't see most book reviews or most letters to the editors or more than a vanishing fraction of gossip or... And it's still true if you get discussed on a different site.
𝔊𝔴𝔢𝔯𝔫@gwernNov 13STOP divisively COMPARING programming type systems!
- HASKELL has statically checked H-M type
- JAVASCRIPT has prototype inheritance with ducktyping
- TCL
- CLOJURE has strong dynamic typing with rich hierarchy
- PYTHON has type hints & reflection
- C++ has multiple inheritance
𝔊𝔴𝔢𝔯𝔫@gwernNov 12What makes you think there's any money *to* return? As twitter.com/deliprao/statu… points out, it's likely SBF has not paid most of that. I'd assume that the agreement claws back his shares if he fails to pay all tranches, but Anthropic may be in serious trouble now that its runway has vanished.
𝔊𝔴𝔢𝔯𝔫@gwernNov 12(Each pane/row is a deeper level of recursion for the popins, and they just slide right as new ones are invoked at that nesting level. So they don't obstruct the main text, but use the vertical space left by the newspaper-esque horizontal columning.)
𝔊𝔴𝔢𝔯𝔫@gwernNov 12I was being sarcastic. Oh, 1000 or 2000 kids, sooo absurd? Hell, run a highschool poll and at least that many will tell you that they are paraplegic pirates with a parrot perched on their pauldrons and thus, disabled minorities.
100
2
2.0%
𝔊𝔴𝔢𝔯𝔫@gwernNov 12So you need, like, 10 really bright ambitious kids like a Warren to do the obvious easy highly incentivized lie? That doesn't sound 'huge'.
87
8
9.2%
𝔊𝔴𝔢𝔯𝔫@gwernNov 12'Sampling can show the presence of knowledge but not the absence.'
The problem with badly designed websites and tools is not that they can't, but often that they *won't*, I find. No tool can save you from yourself, and that is why you can't buy good taste either.
74
2
2.7%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernNov 12I can continue to believe that the ineptitude of FLOSS for good UI/UX design will continue as long as it is relevant and also believe in scaling. There are many ways for Mem to fail, but 'FLOSS will suddenly be able to herd cats' is one of the less likely ones.
41
3
7.3%
𝔊𝔴𝔢𝔯𝔫@gwernNov 12You know perfectly well that most people seeing this, and the others, don't know that it's fake, any more than they are 'in on the joke' for Tay or CA.
59
2
3.4%
𝔊𝔴𝔢𝔯𝔫@gwernNov 11The phrase I've started to use is 'good design is invisible'. I'm also trying 'it's easier to invent a bad design than understand a good design'.
eg dsalo.info/pinboard-vs-ra… what could be easier than to clone Pinboard? What could be simpler than Zendo koryheath.com/zendo/design-h… ?
5,548
191
3.4%
𝔊𝔴𝔢𝔯𝔫@gwernNov 11They didn't 'elect Trump' because their claimed effect sizes on such impoverished data are impossible and all of the PR/reporting was basically kayfabe: CA lied about it working to drum up its business, and its enemies believed its lies because that was their business.
268
26
9.7%
𝔊𝔴𝔢𝔯𝔫@gwernNov 11Oh, mine can have multiple parents too but yeah, that part is a lot less elegant because you have to store that outside the filepath. In my case, I have to build a Markdown 'foo/index.page' index, so to give 'foo' a cross-reference 'bar', I tag 'foo/index' w/'bar'. Works, but 🫤
120
3
2.5%
𝔊𝔴𝔢𝔯𝔫@gwernNov 11Yeah, I base my tags on filesystem paths as a common hierarchical tag system, so I get that for free. (Refactor 'foo' into 'foo/1'...'foo/5' and searching 'foo' brings up the union.) A user could also decide to promote, say, 'foo/3' to '3' if that makes sense to them.
139
4
2.9%
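A small sketch of why path-style tags give the union "for free": searching a parent tag is just a prefix match on the path. The `items` mapping and the `search_tag` helper are hypothetical, invented for illustration.

```python
from typing import Dict, List, Set


def search_tag(items: Dict[str, Set[str]], query: str) -> List[str]:
    """Return items tagged with `query` or with any tag nested under it."""
    parent = query.rstrip("/")
    prefix = parent + "/"
    return sorted(
        name
        for name, tags in items.items()
        if any(tag == parent or tag.startswith(prefix) for tag in tags)
    )


# Hypothetical notes after refactoring 'foo' into sub-tags.
items = {
    "note-a": {"foo/1"},
    "note-b": {"foo/3", "bar"},
    "note-c": {"foobar"},  # shares a string prefix, but is not under 'foo/'
    "note-d": {"foo"},
}

under_foo = search_tag(items, "foo")
```

Matching on `foo/` rather than the bare string `foo` keeps unrelated tags like `foobar` out of the results.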
𝔊𝔴𝔢𝔯𝔫@gwernNov 11It is enough of a PITA to integrate a bunch of things like the database, embeddings, a DBscan library, interactive UI etc that I don't know offhand of anything that does this. But that's exactly the sort of integrated approach a really good PIM could create.
108
2
1.9%
𝔊𝔴𝔢𝔯𝔫@gwernNov 11And then there are bootstrap aspects: when you embed items, you presumably include their metadata like tags. So as you get better tags, the updated embeddings also get better and more relevant for similarity in general, and then help you expose additional sub-sub-tag clusters.
102
5
4.9%
𝔊𝔴𝔢𝔯𝔫@gwernNov 11You could also do it automatically. (Tags don't have to be perfect to be useful. Some misclassified entries in 5 sub-tags is still better than an opaque mega-tag.) This can help expose topics or ideas to the writer, too, when they see how their tag ontology evolves.
66
1
1.5%
𝔊𝔴𝔢𝔯𝔫@gwernNov 11However, with embeddings, you could simply use a clustering algorithm to extract 5 centroids/clusters, assign the 1000 automatically to each, and have the human scroll through the list to approve. Time & effort drops from 'hours of exhausting tedium' to more like 10 or 20 minutes.
70
4
5.7%
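The cluster-then-approve step might look something like this. It is a sketch under loud assumptions: plain Lloyd's k-means is swapped in as the clustering algorithm (the tweets do not say which one), the 2-D "embeddings" are made up, and the example uses 2 clusters over 4 entries rather than 5 over 1000 for brevity.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def _dist2(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _mean(points: List[Vector]) -> List[float]:
    """Coordinate-wise mean of a non-empty list of points."""
    return [sum(col) / len(points) for col in zip(*points)]


def kmeans(points: List[Vector], k: int, iterations: int = 20) -> List[int]:
    """Plain Lloyd's k-means; returns one cluster index per point."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Deterministic farthest-point initialisation, so runs are repeatable.
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(_dist2(p, c) for c in centroids))
        centroids.append(list(far))
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [
            min(range(k), key=lambda i: _dist2(p, centroids[i])) for p in points
        ]
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:  # keep the old centroid if a cluster empties out
                centroids[i] = _mean(members)
    return labels


# Made-up embeddings for entries sharing one mega-tag.
entries = {
    "gpt-3 paper": (0.9, 0.1),
    "scaling laws": (1.0, 0.0),
    "cat photo": (0.0, 1.0),
    "dog breeds": (0.1, 0.9),
}
labels = kmeans(list(entries.values()), k=2)

# Proposed sub-tags for a human to skim and approve.
proposed: Dict[int, List[str]] = {}
for name, label in zip(entries, labels):
    proposed.setdefault(label, []).append(name)
```

The human's job shrinks to reviewing `proposed` and naming each cluster, rather than assigning every entry one by one.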
𝔊𝔴𝔢𝔯𝔫@gwernNov 11Any such tag can usually be refactored into more like 5 useful tags with 200 entries. Often quite obvious what 5, too. But who's going to do it? A human? Defining the 5, & reading 1000 entries and deciding 1 by 1 which 1-5, is a huge PITA! Even with tooling, I *hate* doing it.
73
8
11.0%
𝔊𝔴𝔢𝔯𝔫@gwernNov 11Sure. When you maintain tags like gwern.net, the worst part is that you wind up with mega-tags: tags with eg 1000 entries. This is useless for both reader & writer. Like blog posts with tags that take you to 'basically the entire blog'. So you don't use them.
81
5
6.2%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernNov 11I think the quality of a FLOSS snapshot is going to matter a lot less than things like reliability of backend, regular upgrades, ease of integration, and putting in the elbow grease to go from FLOSS-typical 90%-of-the-job to useful polished 99% of job for non-programmers.
296
5
1.7%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernNov 11The status of logical tautologies as evidence is a long-debated one... but I've seen economists model a lot of things I definitely don't believe (eg Leeson), so let's say that I'm not going to change my newsletter linking habits for fear of undermining reader scoop inferences.
183
10
5.5%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernNov 11What makes WP categories useful, while most sites' tags are completely useless (because tags either have 1 entry or 1 million)? An army of editors+bot automation constantly updating it. A slick UX will bring that to your notes, and even make it like popping bubblewrap: enjoyable.
3,075
35
1.1%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernNov 11For example, the whole 'tag' experience could be *so* much better compared to any other tool I've seen with good design exploiting embeddings: automatically cluster and suggest edits. Refactoring tag ontologies seems to be the biggest UX killer of non-ML-powered tag use.
665
27
4.1%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernNov 11The design came first, before all the ruby screens and fancy AI tensor cores and M1s. No one bought the iPhone for the CPU. And I would point out that OA is very good at the supply chain and competitors there still struggle to catch up years later (eg OPT, BLOOM).
373
10
2.7%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernNov 11(Worth noting this is merely a modeling exercise. They have no evidence that it is true.)
2,710
26
1.0%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernNov 11(Ah yes, the famously copyable traits 'good taste' and 'quality', which is why Apple went bankrupt a decade ago after everyone had plenty of time to copy all of its copyable UI/UX.)
1,007
55
5.5%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernNov 11No, not nearly. Most of it was just an 'echo' built-in function they forgot to disable, spread virally by carefully cropped screenshots, and it's unclear there was *any* 'learning' involved anywhere.
252
26
10.3%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernNov 11This is fake, you know, I don't see it anywhere in @Chiquita's tweets.
But given the scholarly rigor applied to, say, Tay and Cambridge Analytica, I look forward to seeing this and other screenshots for the rest of my life as conventional wisdom in papers about Musk/Twitter...
5,907
342
5.8%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernNov 11Mm. Yeah, 'use embeddings for search' is not technically impressive, but these things live or die on their UI/UX, and *that* is something FLOSS has always been *terrible* at at any timescale. 'I can do that in org-mode' is roughly a 'Dropbox is a weekend project' level response.
4,644
166
3.6%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernNov 11('No bed frame' is too easy, should obviously be, 'Signed up for bed frame'.)
3,993
33
0.8%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernNov 11Mm, the need for prompt engineering doesn't really go away. You still need to come up with interesting prompts, even if you no longer need to lard on dozens of adjectives and corrections. Mechanical sympathy will remain a useful skill.
92
6
6.5%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernNov 11"We are devastated to say that it looks likely that there are many committed grants that the Future Fund will be unable to honor." forum.effectivealtruism.org/posts/xafpj3on…
😱 So looks like they 𝘸𝘦𝘳𝘦 granting based on SBF promises of future transfers, and not cash-on-hand. Bad.
𝔊𝔴𝔢𝔯𝔫@gwernNov 10If you had measured it by total tokens, presumably the learning curve would look worse because the LR was now too high, rather than much better as plotted by gradient step?
284
15
5.3%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernNov 10At least in ye olde Turing Test or Loebner Prize, the chats weren't very long. Judges just don't have the time or patience to type away for eons. How many tokens could an entire Turing Test with a single judge be? (20k BPEs is a *long* piece of text.)
125
9
7.2%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernNov 10(It is, however, true in general that people think I do way more drugs than I do do. I understand the impression from my writings, but remember, those are fairly comprehensive and it was all spread over 15+ years!)
𝔊𝔴𝔢𝔯𝔫@gwernNov 10Oh wow. I saw the HN discussion asking just that and thought, 'yeah, sus, but surely a very srs mathematician doing very srs big-if-true work picked the very best constant he could prove and it's just an amusing coincidence.'
397
25
6.3%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernNov 10Er... I'm pretty sure VC funding has nothing to do with it. I can definitely say that not a single one of the lads here I have spoken to has ever once brought up startup-y stuff, and this has been observed on many college campuses and overall statistics.
56
0
0.0%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernNov 10I don't know who needs to hear this, but if your beloved has 'breath like wine' and 'eyes like the sun', you should probably not kiss her lips but rush her to the emergency room, as her chronic alcoholism has led to end-stage liver disease.
8,809
89
1.0%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernNov 9forum.effectivealtruism.org/posts/2mx6xrDr… They also apparently have been using DAFs. DAFs should be legally safe (their whole point is being irrevocable and no longer owned by the donor), but again, might be now holding worthless token assets. Complicated.
261
46
17.6%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernNov 9I'm not sure. It's part of "FTX Philanthropy Inc." which claims to be a "nonprofit" so... it *should* be separate from FTX.int's bankruptcy. They'd have to zero out any stuff like Solana/FTT that they still held of course but they should've sold any they received.
554
33
6.0%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernNov 9Is there really anyone? I would have assumed that @ftxfuturefund had been only promising grants for money they had in the bank from SBF, and that even SBF going to zero would simply mean that they'd spend down their existing endowment (whatever that is) and wind down.
𝔊𝔴𝔢𝔯𝔫@gwernNov 9This is the most elaborate way to get a 'like' from me I've ever seen.
6,116
275
4.5%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernNov 9(I felt that made it just a little too obvious.)
97
3
3.1%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernNov 9twitter.com/gambrinous/sta… Another possible example: does '22yo' jump out at you as a dangerous date-range of '22 years since born in X'? If not, you may not have internalized that $CURRENT_YEAR is 2022 and the implied birth date of X = '2000' is a very special salient round number.
5,359
49
0.9%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernNov 9The massively increased female-tilt on campuses has been observed in the US too. I recall in 2021 looking around one day on my way to the college gym and going 'where are the guys?' & starting to count pedestrians. They still haven't recovered to pre-pandemic baseline of ~40%.
5,511
95
1.7%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernNov 9I was reading about the Three Arrows guys and wondering what a 'Toyota Century' was he wanted so badly even in Singapore's punitive car regime. Reading WP was an epiphany. "Oh, so 𝘵𝘩𝘢𝘵 is what those cars in anime all were‽‽" I'd just assumed they were idk BMWs or something.
𝔊𝔴𝔢𝔯𝔫@gwernNov 8Can't tell if you deliberately averted 'All your bits are belong to U.S.' or not... 🤔
Anyway my contribution: "Every coin in the world belongs to America!" reddit.com/r/funny/commen…
2,722
113
4.2%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernNov 8The animal spirits were lower that day than the NASDAQ.
5,140
77
1.5%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernNov 8Another idea - for more diverse sampling of n diffusion model samples: initialize 1 random noise, shuffle (permute) n/2 times, then mirror-sample/negate.
Guaranteed exact mean=0, reducing any clumping/spreading out samples, and little harder than sampling n random noises.
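One possible reading of the sampling idea above, as a hedged sketch: draw a single Gaussian noise vector, make n/2 shuffled copies of it, then append the negation of each copy. The function name `antithetic_noises` and the flat-list representation of a noise tensor are assumptions made for illustration; the tweet does not specify an implementation.

```python
import random


def antithetic_noises(n, dim, seed=0):
    """Return n noise vectors built from one base draw: n/2 permutations plus their negations."""
    if n % 2:
        raise ValueError("n must be even so every sample has a mirrored partner")
    rng = random.Random(seed)
    # One base draw of standard Gaussian noise.
    base = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    half = []
    for _ in range(n // 2):
        perm = base[:]
        rng.shuffle(perm)  # same values, different positions
        half.append(perm)
    # Mirror each permuted sample; sample i and sample i + n/2 cancel elementwise.
    mirrored = [[-x for x in z] for z in half]
    return half + mirrored
```

Because each sample is paired with its exact negation, the per-element mean over the n samples is zero by construction, and only one call to the Gaussian sampler is needed regardless of n.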
𝔊𝔴𝔢𝔯𝔫@gwernNov 8I dunno man, I've been reading a lot about how $8/month is an unreasonable burden and I should be paying *them* for the tweets I 'like' and I wouldn't want to be single-handedly responsible for the doom of the bird site.
90
8
8.9%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernNov 8The important thing is you know all about his death, and the details of the eruption recorded in that letter, because of it; and you don't know all the stuff Pliny the Elder said like 'the Indian insect the ant mines for gold and is the size of a wolf, devouring would-be thieves'
251
7
2.8%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernNov 8I agree. The meta-learning perspective makes sense of this, and also explains why it retains generative modeling of things that don't look agenty, like the Python REPL.
Taking that perspective suggests including more conditioning and a more Decision-Transformer-like approach?
212
4
1.9%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernNov 8Yes. That covers most of the dot-com bubble and then bust, so it exercises way more potential skill than this paper's 2011-2014 or whatever: buying in, selling gains, tax harvesting, and avoiding locking in losses from panic selling or daytrading or penny stonks etc.
110
5
4.5%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernNov 8Yes, although my point here is more that Grinblatt has up/downs since they cover the dotcom bubble+bust, while this is just a bull market slice.
(It would be perfectly legitimate for IQ to have ~0 correlation b/c of autocorrelation/sample-size etc; it just doesn't actually.)
157
4
2.5%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernNov 8I doubt that, because that would erase their positive point-estimate and Grinblatt as well. It's just an uninformative result: compared to Grinblatt, it's a very small sample measured noisily in both variables, and so not being the same size nor statistically-significant is meh.
174
8
4.6%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernNov 8Meh: 66 times fewer self-selected people rather than population registry, a weaker IQ test (a quick survey Raven's rather than full-scale draft IQ test), and trading records in a single bull market (everything↑) rather than across a bubble where choices matter most (↑↓↑).
𝔊𝔴𝔢𝔯𝔫@gwernNov 7(Ah, the 'centaur era' for language models arrives. Hopefully it lasts longer than the chess one did.)
887
74
8.3%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernNov 7$2.23/hr for H100s next year, nice. A100s are so tired anyway.
3,075
14
0.5%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernNov 7(I'm still convinced that there's a 'right' generic initialization for Transformers and warmup is merely a hack to overcome the fact that stuff like He initialization is just plain wrong, by letting the model quickly travel to a semi-working initialization.)
1,137
50
4.4%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernNov 7It *was*, at least on the consumer end of things! That's because security is invisible by default: the testing-can-prove-the-presence-but-not-absence-of-bugs sort of thing. You can see the happy-path works for you when you test it out, but not all the pathological alternatives.
2,416
16
0.7%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernNov 7...Which means that if ByT5 *does* do well on it, it shows even more so that it's an uninteresting artifact of tokenization of interest, as you say, only to 'LM engineers', and thus ByT5 is informative. & if it does really terribly, then that would then be evidence it's not BPEs.
175
9
5.1%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernNov 7No, you don't. Aside from rhyming and maybe the categorization tasks, no human in history has ever genuinely cared about asking a LM 'which word has more letters, "cat" or "holiday"'. No one. Ever. It's of interest solely as a microbenchmark demonstrating the BPE problem.
296
15
5.1%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernNov 7Entire university departments are formed around individual PDFs; students stare in despair at bibliographies listing hundreds of journals no longer extant, & try to fill in the gaps: "The Spice Girls as a Consequence of the Columbian Exchange: Reexamining Zooblax 2301's Thesis".
3,217
133
4.1%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernNov 7Like... why? Why create a mostly-but-not-entirely BPE-problem benchmark which ignores BPEs (even though you, specifically, ought to know all about them) and also doesn't offer any interesting ablations or comparisons like ByT5 or other character-based models?
195
18
9.2%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernNov 7You didn't design with 'minimal assumptions'. You took a laundry list of already-known BPE problems like rhyming, spelling, word length (examples of all of these already linked for years now in my writeup gwern.net/GPT-3#bpes ) and put them into a paper ignoring all BPE stuff.
168
17
10.1%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernNov 7Of course they are *somewhat* aware, or else they would always perform at random baseline. And they wouldn't because these can be memorized, even your example, BTW: pic.twitter.com/EiETlhcqDc
176
7
4.0%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernNov 7Contrary to the description of 'a wide variety of problems', at least 11 of these 25 tasks are nothing but different versions of the BPE problem, and several of the remainder are mostly the BPE problem. (Rhyming! Seriously - you are claiming novelty for GPT-3 not rhyming?!)
195
7
3.6%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernNov 7"Which word has more letters, 'cat' or 'holiday'"? / fewer letters / alphabetical order... Hm... 🤔
C-f "BPE" [not found]
C-f "byte-pair encoding" [not found]
Are you serious? 🤦♂️
4,632
86
1.9%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernNov 7(Also worth pointing out that, even if you didn't have various prospective/retrospective survey data, the anecdotes alone would be fairly powerful evidence against an *inverse* correlation, because you are far out on the tails, and anti-correlated extreme traits are extremely rare.)
701
21
3.0%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernNov 6How could we know for certain that 57 wasn't prime before, and God did not by an act of pure volition will it composite and rewrite all memories accordingly?
𝔊𝔴𝔢𝔯𝔫@gwernNov 6Wonder if it's real? China has a huge problem with fake antiques, and 'private collection' doesn't foster confidence.
7,296
102
1.4%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernNov 6Weakly. But the null hypothesis is just that 'they have more of everything' - openness and intelligence and energy, in particular. Little of Root-Bernstein's work can really address the doubtless extensive confounding.
238
10
4.2%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernNov 6Where is her sample from? How does she know the anecdotes are wrong and she is right because historians and admirers are just making stuff up? If it's selection effects, how is *her* evidence immune to this? And so on.
539
18
3.3%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernNov 6I think it's hard to say because Boyle is basically just making stuff up here and not defining terms. What's 'most domains'? How does she know there's an *inverse* correlation? (The only domains I know any real evidence on show the opposite.)
818
27
3.3%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernNov 6I'm not sure I believe 'business'. At least, not at all levels that might be relevant to VC discussions. (Seems like overfixation and hard work can mask the need for a startup to pivot, for example, rather than chase a doomed plan like a greyhound on Adderall at the racetrack.)
583
15
2.6%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernNov 6Sports definitely has extraordinarily steeply diminishing returns in absolute perf until you crack the top _n_.
I think it's more closed vs open domains. Similarly for chess: being too good at other things is a liability.
𝔊𝔴𝔢𝔯𝔫@gwernNov 6(This is also quite handy for getting lines or essay ideas. The downside is you sometimes get a "Heartbreaking: The Worst Persona You Know Just Made A Great Point" moment.)
67
13
19.4%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernNov 6(Sometimes when unable to sleep (eg post-vaxx last night) I visualize the Twitter timeline in a near-lucid state, to watch tweets scroll by. It is completely effortless as the GPT in my brain autonomously generates tweet after tweet by persona after persona. None sound like me.)
90
13
14.4%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernNov 6@SamoBurja You were right, this has been remarkably educational about the mechanics of soft power in the US. I didn't even know these events existed, much less were major levers of power for the establishment.
𝔊𝔴𝔢𝔯𝔫@gwernNov 5(Makes for hysterical scenes in anime, though, when they set something in Manhattan.)
227
11
4.8%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernNov 5I always wonder how much foreigners, especially ones in highly autocratic & censored countries like China/Russia, are misled by media coverage of DC/SF to massively underestimate US wealth & capability because they don't understand it is driven by internal politics + openness.
𝔊𝔴𝔢𝔯𝔫@gwernNov 4Yes: if few-shot learning is meta-learning, then the labels don't need to be correct as long as they help narrow down the posterior of possible tasks. A totally wrong demonstration how to write C++ code still tells you that you aren't supposed to be writing Shakespearean sonnets.
84
2
2.4%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernNov 4Can you find anyone who thinks 'fiducial inference' wasn't cringe?
3,242
36
1.1%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernNov 4This seems rather circular, because intelligence is absorbed by Openness and then Big Five are, almost by definition, uncorrelated with each other? (And then HEXACO is not much different from OCEAN in the first place.)
4,730
79
1.7%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernNov 4'Overreacting'? Marry, surely this making merry mars Marys trying to marry?
80
0
0.0%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernNov 4(This is also a good approach to scaling up models: you aren't limited to images you can scrape a meaningful alt text from out of CC. Any set of images will work for unconditional training; then you do conditional training on what good text+image pairs you have to finetune it.)
𝔊𝔴𝔢𝔯𝔫@gwernNov 4One important point about unconditional SD: you can finetune-train in it—you don't 𝘯𝘦𝘦𝘥 paired text captions! If you have raw images, go nuts. Writing captions w/BLIP or by hand is unnecessary, for the most part. If there's a name SD doesn't know, you just need a few labeled examples.
𝔊𝔴𝔢𝔯𝔫@gwernNov 2eg arxiv.org/abs/2205.12393 is still using a bit of old data mixed in... but the percentage is so small you really have to ask how important that really could be and whether online continual learning on real-world data may just 'rehearse' incidentally enough in big parameter NNs.
131
12
9.2%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernNov 2Oh, I thought you were going for the 'slave of some dead economist/metaphysician' point about priors: "why do I believe X? I can't quite remember... Oh my god - maybe I only believe X because some social scientists said so 30 years ago with p<0.05!"
278
12
4.3%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernNov 2I heard that when it writes pitches for goat soap, that it's so compelling people ask if real/where they can buy it.
I also heard that it still uses BPEs—not because they couldn't use sparsity but because my wailing has become an OA injoke & no one has the heart to end the gag.
5,030
98
1.9%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernNov 2Another good use of the BLIP captioning trick. Also enjoy BG as Akatsuki.
𝔊𝔴𝔢𝔯𝔫@gwernNov 2I don't see much Russian work on Arxiv, so that checks out.
52
0
0.0%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernNov 2They're just adult kids trying to enforce kid candy norms against the parental cartel exacting horrendous levels of confiscatory candy taxation. Increase the consumer surplus!
8,103
137
1.7%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernNov 2Are you asking why "SINHALA LETTER KANTAJA NAASIKYAYA" or "DESERET CAPITAL LETTER GAY" don't show up much in Greek alphabet use in science/math?
62
7
11.3%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernNov 2I tend to reflect more on the insanity of Bay Area landuse policies when I look at the presidiogolf.com (top right, the parallel tree rows) not far away from the world-famous homeless encampments.
1,984
71
3.6%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernNov 2It's a fun one. "Is that a... 9? A G? Maybe... an italic 'g'? ..not a cursive 'e', surely? What?"
𝔊𝔴𝔢𝔯𝔫@gwernNov 2Considering Twitter's track record on security, they can have my driver's license or any other comparable documentation for verification when Hell freezes over.
208
3
1.4%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernNov 2If you have ever found yourself, Anon, late at night, wondering, 'what 𝘪𝘴 the least used Greek letter anyway?', know that you are not alone, and someone, somewhere, is fighting for you. pic.twitter.com/pvAman2oZw
3,660
183
5.0%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernNov 1Indeed. The wheel of fortune spins to curation, bringing high the low and low the high.
248
12
4.8%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernNov 1I think you are. The context here is for the human, not the AI, because it's pointing out the stripes are shadows cast by the bars, not that the bars look slightly like a zoo cage. (Which hadn't even occurred to me.)
𝔊𝔴𝔢𝔯𝔫@gwernNov 1Oh, definitely. Without the muzzle in the second image or any hint elsewhere in the first image of the shadows even existing, you would have to think fairly hard to realize it's way too small and the stripes don't quite curve right and colors are missing.
𝔊𝔴𝔢𝔯𝔫@gwernNov 1Yeah, the problem is I think a lot of people are motte-and-baileying with these. "Oh, it's just a joke, it's not some sort of rigorous benchmark, doesn't matter if it's real as long as it's funny; don't be such a killjoy." [TO SELF] "See, I knew DL was a stupid fad."
293
16
5.5%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernNov 1(Is that actually true? I feel like every time someone posts a chihuahua/muffin-type image, it always turns out to only fool a 2014 model or not even real in the first place.)
7,458
278
3.7%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernNov 1At least as a start. Need to graduate from flashcards initially to autonomously writing flashcards and the parent prioritizing real tutoring at some point...
60
1
1.7%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernOct 31(Obviously, it can't just keep doubling every 16 days. I figure it probably isn't more than a doubling or three away from a steady-state where everyone who wants to generate & upload to Pixiv is doing so & novelty wears off, so you get like 50k/day steady-state until boosts.)
7,656
62
0.8%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernOct 31Considering the heavy tail where AI 'work' entries look like they have many more discrete images than regular 'works', and ballpark 100m 'works' on Pixiv now, and further accelerations, there may be more AI-generated than human images on Pixiv within 2 years or so.
5,297
53
1.0%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernOct 31Some more drama around SD1.5, but what else... Checking back in: over the past 15 days, AIbooru has doubled & Pixiv has doubled to >56k 'works' (improving search to add 'waifu-diffusion OR hentai-diffusion' adds a half to >76k), so >4k 'works'/day (!), or >1.6m/year (!!!).
3,125
36
1.2%
View Tweet activity
𝔊𝔴𝔢𝔯𝔫@gwernOct 31"Whether it's the 1980s with 𝘙𝘢𝘯𝘮𝘢 ½ or the 1990s with 𝘛𝘩𝘦 𝘔𝘢𝘵𝘳𝘪𝘹 or the 2010s with 𝘔𝘺 𝘓𝘪𝘵𝘵𝘭𝘦 𝘗𝘰𝘯𝘺: 𝘍𝘳𝘪𝘦𝘯𝘥𝘴𝘩𝘪𝘱 𝘐𝘴 𝘔𝘢𝘨𝘪𝘤, shy nerdy boys want only one thing—and it's 𝗱𝗶𝘀𝗴𝘂𝘀𝘁𝗶𝗻𝗴."
5,693
102
1.8%
View Tweet activity
Engagements
Showing 30 days with daily frequency
Engagement rate
2.4%
Nov 30
2.0% engagement rate
Link clicks
3.3K
Nov 30
110 link clicks
On average, you earned 112 link clicks per day
Retweets without comments
0
Nov 30
0 Retweets without comments
On average, you earned 0 Retweets without comments per day