Tweet activity

April 2023

Your Tweets earned 277.5K impressions over this 30 day period

[Chart: daily impressions, Apr 2–Apr 30]
Your Tweets
During this 30 day period, you earned 9.3K impressions per day.
  Per-tweet stats below: Impressions / Engagements / Engagement rate
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 30 That makes it even more impressive an anecdote, that people could so badly misinterpret what they see in the prototypes and dismiss it for years until suddenly, for no good reason, they are able to see the obvious. (*cough* AI *cough*)
      49
      6
      12.2%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 29 Already linked😀 Also, they seem to have recently changed the design and lost a lot of the magic - it's now much less grid-like and noticeably slower.
      44
      2
      4.5%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 29 SimilarWeb, like all such traffic estimators, can be assumed to be highly inaccurate, and # of visits != people, unless I suddenly became a few hundred people while I wasn't looking.
      200
      8
      4.0%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 29 What people always take those as meaning is 'progress is more than half caused by new ideas and so compute is causally unimportant', when the ideas are caused by compute investments.
      132
      5
      3.8%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 29 Yes, but that's average-case, so to speak, not what would be possible with much larger budgets than actually used. (I also think you guys are misinterpreting what these sorts of results mean. They are not causal for algorithmic progress.)
      88
      5
      5.7%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 29 It's remarkable how many people, whether Wired or random tweeters, all fire up GPT for the first time and all independently decide on the same tasks like 'count the number of letters in a random word', and all get fooled by BPE problems. I wonder if this has had an impact on PR.
      66
      4
      6.1%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 29 It's like if you read the Bible and then someone asked you for the word length of a Hebrew word. 'Why can't you just tell me what it is? You just read thousands of pages translated from Hebrew! How can you *not* know exactly what Hebrew word each English word corresponds to?'
      57
      5
      8.8%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 29 Because it is never ever given access to the opposite information. It never sees the text encoded into individual letters or even a dictionary symbolically explaining the mapping of BPEs to letters () and certainly not the word-length.
      38
      6
      15.8%
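The BPE point in the tweets above can be sketched concretely. This is a toy vocabulary with made-up token ids and merges, not GPT's actual tokenizer:

```python
# Toy BPE-style vocabulary: frequent strings collapse into single opaque
# tokens. (Hypothetical ids for illustration only.)
VOCAB = {"straw": 301, "berry": 98, "strawberry": 4521}

def tokenize(word):
    """Greedy longest-match, like BPE's learned merges."""
    if word in VOCAB:
        return [VOCAB[word]]
    # Fall back to splitting into known sub-word pieces.
    return [VOCAB[word[:5]], VOCAB[word[5:]]]

# "strawberry" arrives as the single id 4521: the ten individual letters
# are simply not present in the model's input, so counting them requires
# a memorized token-to-spelling mapping the model is never shown.
tokens = tokenize("strawberry")
```

The model only ever conditions on the id sequence, which is why letter-level tasks fail in a way that looks baffling to users who see the characters on their screen.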
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 29 Yeah, but like all pro-China arguments, it's a dumb one. All the wrong kind of data, and fiercely siloed to boot.
      97
      3
      3.1%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 29 Hm. Looking at these poll results, I suspect that either people have seriously miscalibrated views on growth of global-data+sample-efficiency, or I asked it in a bad way despite my best effort to be clear. May require a much longer sort of survey question with stats etc...
      3,317
      53
      1.6%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 29 New feature: Twitter links are now automatic annotations, by parsing the local Nitter archive snapshots/mirrors. (I was going to do this on the backend but Said decided to do this on the frontend instead; both have advantages, and he got there first, so 🤷‍♂️.)
      4,905
      63
      1.3%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 28 It can't actually rhyme or follow instructions; eg. it still can't explain puns. The example I gave there in the comment Quanta still has not approved was ask ChatGPT to 'write a poem that does not rhyme'. It's like asking you to 'write a poem which does not bleepledrof a knckit'
      82
      3
      3.7%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 28 If you're wondering how we handle really long or nested section titles, because every vertical pixel is in the final analysis a theft from those who hunger & are not fed etc: we left-truncate to keep it constrained to ~2 lines with the most informative (deepest) headers. eg.:
      6,960
      95
      1.4%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 28 A bagel-with-cream-cheese isn't sold in grocery stores, though. Bagels (if you can call grocery store ones that) and cream cheese are sold, sure, but separately.
      26
      0
      0.0%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 28 (I look forward to donating my fecal samples for metagenomic microbiome studies and discovering my new Assphages.)
      700
      10
      1.4%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 28 Small feature: image-focus/zoom now shows all available non-redundant metadata ie. URL, title, alt, caption. Looking forward to GPT-4's image modality just plain solving image captioning/alts, so you can just auto-run it on all your images and get human-level alts.
      2,467
      97
      3.9%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 27 (One has to read amusingly far to see that these Alibaba researchers are using GPT-3 Codex / code-davinci-002...)
      671
      18
      2.7%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 27 I'd love to see some interviews of Moravec, Vinge, and whoever else is still around from then. Rumelhart and Minsky are dead, I know that, but who else...
      1,599
      76
      4.8%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 27 I don't think it is, because it is *definitely* not going to be limited to mere CV. Tool AIs want to be agent AIs, no less in military applications than economic, and drones and artillery especially have been hurtling towards autonomy as fast as they can develop it.
      62
      4
      6.5%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 27 What makes you think they are somehow completely separate and independent? Yann is obviously wrong in his claim: humans are extremely interested in designing AI to hurt other humans. (Strictly speaking, he is not necessarily being hypocritical here; just wrong or changed minds.)
      82
      12
      14.6%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 27 Seems like a bad framing. Markets arise without any governments nor do rights/contracts require it. Indeed, many markets arise despite extensive government actions to destroy them, never mind withholding enforcement (eg my old area of darknet markets, or most cryptocurrencies).
      59
      0
      0.0%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 27 A few months ago I spent a while looking through trying to find 3 loss curves I could use for the other panels, but I couldn't find convincing ones. I see now I should've had the courage to only replace the fourth panel.
      1,031
      118
      11.4%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 26 For some extremely well-defined tech, perhaps, where you can be sure that no new needs or materials or anything have popped up nor any emergent effects which change everything.
      171
      9
      5.3%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 26 Seems risky due to gambler's ruin. Very tiny ecosystems mean issues like asteroid impact, ecosystem transitions like oxygenation, mega-pandemics, and sheer variance in vent lifespans/positions would wipe out life eventually.
      279
      12
      4.3%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 25 [IMAGE CAPTION for the blind:] "We're not so different, Nick—you and I... Join me, and 𝘵𝘰𝘨𝘦𝘵𝘩𝘦𝘳, we will change the world!"
      12
      3
      25.0%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 25 Yup. I'm not worried about the regular average-case performance: training on model outputs will be fine (). But security is adversarial and about extreme edge-cases, so every weird bad edge case getting amplified will fatten the tails up dramatically.
      133
      12
      9.0%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 25 Or just acquiescence bias, yes. Most citations are accurate {{citation needed}}, so it could just assume they're all correct.
      234
      5
      2.1%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 24 Heck, forget the salaries - why would you want to move to Xi's chip-embargoed China to work on behind-SOTA systems that will probably be DOA as soon as they *seem* to violate any censorship regs or the org otherwise incurs CCP wrath?
      973
      31
      3.2%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 24 Leaning heavily on the 'exponential' rhetoric could backfire. After all, if mistakes can 'compound exponentially', doesn't that imply that when R<1, so to speak, total error will abruptly begin to decrease exponentially...?
      1,039
      63
      6.1%
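The R<1 point above is one line of arithmetic; the numbers below are purely illustrative:

```python
# Toy compounding model: total error after n steps ~ error_0 * R**n,
# where R is a per-step amplification factor.
def total_error(initial_error, R, n):
    return initial_error * R ** n

grows = total_error(0.01, 1.5, 20)    # R > 1: errors 'compound exponentially'
shrinks = total_error(0.01, 0.5, 20)  # R < 1: errors shrink exponentially instead
```

The same exponential model that makes error growth sound alarming predicts exponentially vanishing error as soon as the per-step factor dips below 1.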
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 24 Anyway, still finetuning the prompt. I want it fully autonomous in terms of revising the translation and making changes, but it is a bit tricky: there's a tendency to settle on one version immediately and then just repeat it. eg this one is the fourth iteration but same as first:
      160
      12
      7.5%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 24 Round-tripping doesn't prove that the translation didn't work. Like, I may not know Old English, but I do see more than enough root-words there that I can tell the 'Old English' is in fact related to the Milton input, and not totally different and unrelated like your example.
      46
      4
      8.7%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 23 OK GPT-4, you got me there, I did say into 'Old English style'. Regrettably, I don't know any Old English so I can't tell how well the translation works - GPT-4 says it's great, but it 𝘸𝘰𝘶𝘭𝘥, wouldn't it? 🤔
      294
      54
      18.4%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 23 This list as well emphasizes my point: by the time you are reaching to obscure failed German political parties, producing a half-invalid list, or examples *copied from Buddhism in the first place*, you're showing how rare it is. 3000+ years of Western history and this is it?
      103
      13
      12.6%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 23 1. Not a number nor an ordered list 2. damn that's obscure. Had to look it up. 3. invalid: there are four humors (sometimes) but they aren't 'The Four Humors™' 4. invalid 5. valid but reaching all the way to Islam, eh 6. valid 7. invalid 8. copied from Buddhism! 9. invalid
      121
      11
      9.1%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 23 Yes, self-distillation/finetuning on outputs of larger models can backport abilities; the ability/Turing machine is there in smaller models, just too far below the surface to matter without the Bayesian evidence from finetuning to make it a highly likely prompt interpretation.
      595
      32
      5.4%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 23 Sorry, left deadly sins out. Well, you do now. And I've seen the snowclone/joke, but not the numeric epithet name or that there was supposed to be some specific pseudo-scientific taxonomy of 'love languages' behind it.
      80
      4
      5.0%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 23 I think this emphasizes my point. You have to make up half of these by simply turning any schematic into a Numbered List, or go to some ephemeral pop psychology clickbait like 'love languages' to find any examples. As opposed to East Asian where you trip over famous ones.
      126
      7
      5.6%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 23 1. Weak because the 'Seven Wonders' is late & inconsistent 2. obscure & invalid, I've never heard them called 'The Ten Categories', just 'Categories' or 'Aristotle's categories' 3. invalid 4. invalid 5. Granted 6. Granted 7. never even heard of that one... 8. Granted 9. Granted
      159
      17
      10.7%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 23 Solving multi-level/speed community design () is something pretty much every place gets wrong. Whether Discord tacking on pseudo-forums which are worse than forums or Reddit tacking on chat... It's hard b/c what makes you win at one level, loses at another.
      84
      7
      8.3%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 23 Can you name any Numbered Lists beyond 'the Ten Commandments'? I can't... Even stuff like the Bill of Rights (which should be a gimme: 'the Ten Amendments') doesn't use that snowclone. Meanwhile, I'm a Westerner and I can rattle off a dozen Asian Numbered Lists.
      2,395
      78
      3.3%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 23 I'm starting to be concerned that this is too hard to ask in a poll and people are answering different questions: people can't really think that it's <1% when global data expands >>1% annually alone without any other kind of progress? How would that even work?
      71
      8
      11.3%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 22 I think it's hard to tell because it's not particularly prioritized outside DRL. Sample-efficiency is not compute-optimal, to say the least.
      127
      4
      3.1%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 22 Sounds to me like you just explained how it makes sense and also offered good reasons for picking the fourth poll option.
      230
      5
      2.2%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 22 Yep. 'Sample-efficiency' here implicitly refers to 'real data sample-efficiency', because synthetic data is both not generally useful & just a kind of compute. eg we talk about MuZero learning Go sample-efficiently from thousands of games, not millions of simulated self-play.
      78
      1
      1.3%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 22 Incidentally, people seem to be placing great stock in the 'data shortage' excuse, but obviously, data increases every day & sample-efficiency also increases. Mildly curious, so a poll: "Every year the fraction of global data required to train an AGI falls by <𝘟%", where X is…
      8,876
      322
      3.6%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 21 Huh? Are you multiplying 1000 by 1000? How is that relevant? You usually have a dead body when it comes to murder trials and testimony... The probability a murdered person is murdered is darn near 1 in 1.
      1,198
      60
      5.0%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 21 I think they just mean that it happens to parallelize on up to 10 cores, so that they can run a few hundred or thousand proteins simultaneously for throughput, not that their *entire cluster* is 10 cores. 😁 I mean, people have more cores than that in their laptops these days.
      562
      30
      5.3%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 21 Given that the noise processes are over 8b humans & operate 24/7, while even UFO partisans concede there are not *that* many UFOs zipping around or exposing themselves occasionally, I see zero problem in getting false reports:true ratios >>1000:1...
      800
      26
      3.2%
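The base-rate arithmetic behind that >>1000:1 claim is simple; every input below is a made-up round number for illustration, not an estimate:

```python
observers = 8_000_000_000   # people who could misperceive something, 24/7
false_rate = 1e-5           # assumed false-report odds per person-year
true_events = 10            # generous assumed count of real anomalies/year

false_reports = observers * false_rate   # 80,000/year under these guesses
ratio = false_reports / true_events      # 8,000:1 under these guesses
```

Because the noise term scales with the whole population and the signal term does not, the ratio blows past 1000:1 under almost any plausible inputs.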
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 21 Er, yes it can? At this point, with nearly a century of UFO sightings, we have entire libraries of debunked cases, conmen, classified programs yielding thousands of sightings, extensive documentation of aerial hallucinations from pilots, etc etc.
      989
      44
      4.4%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 21 RIP Google Brain. Your efforts were noble & underappreciated, often well-conceived & not without a style, and your demise was the fault of the responsible executives who are busy dodging responsibility. You will be remembered fondly for rearing a generation of AI researchers.
      88
      23
      26.1%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 21 No. My guess is that Young is a bit player in the psych department, or someone Feynman talked to. Of known names, Curtis looks most like Young, and I purchased his PhD thesis to try to get the original. Shepard's monograph may require a trip to the Library of Congress, sadly.
      2,805
      32
      1.1%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 21 Yeah, it's quite a rabbit hole. Once you have the 'cue' (or the 'floor cue', I should say...), the whole thing unravels and makes sense. eg I'm now fairly sure Feynman learned the story in summer 1947 attending a minor seminar in University of Michigan.
      1,689
      94
      5.6%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 19 FWIW, I'd consider this to be an example of the Transformer doing it in an iterative/recurrent way with what used to be an exotic mechanism, so the arguments about a single feed-forward pass being unable to do parity seem to still be correct. You have to get it counting.
      92
      14
      15.2%
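What 'get it counting' means can be sketched: parity is trivial as an iterated/recurrent computation with one state update per token, which is exactly what a single fixed-depth feed-forward pass lacks (a toy sketch, not a Transformer):

```python
def parity_iterative(bits):
    # One running state bit, updated once per input token: the number of
    # sequential steps grows with input length, unlike a single
    # constant-depth feed-forward pass over the whole sequence.
    state = 0
    for b in bits:
        state ^= b
    return state
```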
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 18 Yeah yeah but isn't _Neon Genesis Evangelion_ actually just an extended metaphor for making _Neon Genesis Evangelion_?
      565
      13
      2.3%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 18 It is a very difficult business to extract consumer surplus from. Think about how many $ of each iPhone goes to Samsung/TSMC, vs Apple or Qualcomm. (People don't buy transistors, they buy sent-messages/emails, downloaded webpages, uploaded photos...)
      515
      36
      7.0%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 18 The domain-specific nature of the improvements is also a bad sign for retrieval approaches. If it's so great and even helps induce logic/reasoning/broad capabilities and saves hugely on parameters... why doesn't it work far better in general?
      70
      1
      1.4%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 17 Mutualism seems like the theory of g which makes the most sense in deep learning. Single-causal-variable g and sampling g look nothing like how ANNs scale or act. POT and other global-processing approaches don't look too great either.
      62
      4
      6.5%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 17 Probably his last really substantive public writing on AGI, other than a few offhand public comments suggesting his timelines remained largely unchanged. A pity. I certainly would've liked to hear how his thoughts on neuroscience & scaling evolved.
      1,246
      73
      5.9%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 17 I love it because it's either a reason to eat ice cream, or a great example of why nutrition methodology is inadequate. ('The Chocolate Glacé Is Out Of Control'?)
      642
      47
      7.3%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 17 It's such a weird phrase, isn't it? Could've been "moral agent" etc, but no, they went with a phrase that makes you imagine phrenologists in an operating room: "Doctor (of philosophy), the patient's moral bump is enlarged!" "We'll have to take it out. We have no (free) choice."
      15
      2
      13.3%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 16 TEMPEST is overthinking it. I bet solely light intensity over time can be correlated with the _n_ BBC broadcasts given a couple seconds at high accuracy, and then the BBC archive given a few minutes. Haven't you ever walked by a tower and seen the windows flicker in unison?
      61
      25
      41.0%
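A minimal sketch of the correlation attack described above, with synthetic traces standing in for the _n_ BBC broadcasts and the observed window flicker (everything here is simulated, and the plain dot-product score is a stand-in for proper matched filtering over time offsets):

```python
import random

def correlate(a, b):
    # Mean-removed dot product as a similarity score between two
    # equal-length brightness traces.
    ma = sum(a) / len(a)
    mb = sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b))

random.seed(0)
# Five candidate 'broadcasts' as synthetic luminance-over-time traces...
broadcasts = [[random.random() for _ in range(500)] for _ in range(5)]
# ...and an observed flicker trace: broadcast #3 plus sensor noise.
observed = [x + random.gauss(0, 0.1) for x in broadcasts[3]]

# Pick the candidate whose trace best matches the observation.
best = max(range(5), key=lambda i: correlate(observed, broadcasts[i]))
```

Even with substantial noise, a few hundred samples of aggregate light intensity are enough to separate the true source from unrelated candidates, which is the intuition behind skipping TEMPEST-grade machinery.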
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 16 Exactly: makes sections more first-class. It's a constant struggle to handle the tension between long pages with context, but the many kinds of overhead/friction you get from lots of small named fragments. 'The essay long united, must divide; long divided, it must unite...'
      467
      29
      6.2%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 16 And yes, being damaged in the tsunami is the obvious way for the chair to lose its leg, but the time loop leaves you a bit baffled how it gets from *there* (her yard or house during the tsunami) to *here* (inside the afterlife's temporal loop of young -> old -> young).
      468
      12
      2.6%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 16 I didn't take that away at all. There's nothing indicating the dead can return, she's implied to have gone through the door while searching for her mother in the days afterwards while everyone pities her, and why would her aunt be looking for her so long afterwards?
      640
      24
      3.8%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 16 Dammit! That's what I get for taking GPT-4's suggestions on the final clean draft and then failing to spellcheck my last-second changes. 🤦‍♂️
      126
      5
      4.0%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 16 Too hard to get! I'm always reading someone reviewing an indie film and going, 'where on earth would I watch this? Do I need to... fly out to Denver, or what? Is there some torrent site everyone uses for indies I'm just out of the loop on?'
      38
      4
      10.5%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 16 (Fixing his LaTeX compilation errors would require a lot more domain knowledge, but is still the sort of thing which you can ship off to a third party with a paragraph or less of context: 'pls2make better: {error message} ???'.)
      68
      6
      8.8%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 16 Sure, but every time someone accelerates these with the OA API or Playground, they're constructively proving that you *can* outsource it effectively with some brief vague textual instructions. Nor is, 'hey, pull out all the authors so I can get a total' all that amazing a task.
      70
      9
      12.9%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 15 Right? Ugh. And he had been avoiding the schmaltz *so* well. (I was watching the subs, so can't blame the localizers.) That speech needs to be either written fantastically well, or not be there; and Shinkai at this point ought to know that it was not written even close to 'well'.
      631
      14
      2.2%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 15 Yeah, I enjoyed it. It was Shinkai but not so stereotypically Shinkai. Would benefit from a little bit of editing, however: a few too many loose ends (I still can't figure out how the chair lost its leg), and the climactic speech ruins it. If any scene should be dialogue-less...
      725
      23
      3.2%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 15 This would be good to update. The reversal and then reversal-reversal, and collapse of Zero Covid, was really striking and resolved the anomaly of why they seemed determined to do nothing - they were out of ammunition, until ChatGPT created a crisis. But embargo's still holding?
      151
      13
      8.6%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 15 Yes. Same issue with anime vs manga, or SF novels vs pretty much any adaptation. Mediums just have very different distributions of costs, which enable or hinder individual creators with idiosyncratic goals.
      84
      5
      6.0%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 15 That debate reminds me of video games vs movies. The strength of movies is that an auteur director creates a single ultra-polished fixed sequence of controls of all viewers' gaze, attention, & visual input at millisecond-resolution. The weakness of movies...
      867
      67
      7.7%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 15 This would be a good student assignment: just train Transformers (and maybe RNNs and CNNs too) on varying levels of pseudo-randomness and repetitions thereof, and qualitatively characterize them. *Do* they 'go crazy'? Do they learn to 'give up' and collapse to maxent? etc
      90
      11
      12.2%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 14 She might just be worried about you. When I meditated, my dog would come over and lick me the same way he'd lick me concernedly when I'd play dead on the floor.
      1,475
      75
      5.1%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 14 (I suppose you could do that because Transformers are smart, but you'd have to insert formatting tokens to indicate which tokens are 'cat' and which are 'robot', and why would you bother? It's just a waste of compute for the Transformer to condition on & then discard input.)
      510
      16
      3.1%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 14 Well, it's being trained simultaneously in the sense of the minibatch containing episodes from all environments. The episodes themselves are consistent: they don't, AFAIK, randomly mix them up within-episode so there's cat tokens alternating with robot-action tokens etc.
      1,612
      79
      4.9%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 14 (Nothing AFAIK, I assume Saroff just got a bit confused about who I was emailing. I haven't had any issues with LW2's maintainers - most of my feedback is for GW.)
      84
      4
      4.8%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 13 But then why not sign the statement if they are doing it anyway? OAers like sam-sama have certainly said plenty of the same things.
      2,116
      113
      5.3%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 13 I don't believe so, IIRC (I had forgotten about the bemusing exchange entirely until this post). As I said to him, my final email was really just some rubberducking for my notes about this project, and neither required nor benefited from a reply.
      158
      11
      7.0%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 13 Can't explain the inability to reverse word order. Zero planning is necessary, all you need is a greedy and very simple heuristic 'copy the first missing word'.
      197
      5
      2.5%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 12 (So many red flags just in the preprint abstract, especially when you compare it to the published abstract to see what the spin is. I look forward to seeing this silver bullet fade out over the next decades like every other such intervention claiming big effect from tiny cause.)
      764
      21
      2.7%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 12 Yeah, 'hallucination' would make more sense for exogenous edits of the input/output text. Like a LLM could hallucinate if the text keeps getting edited to remove stuff. 'I must have thought it said X, but now that I look again, it says Y! Huh. Well, in that case...'
      287
      9
      3.1%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 12 eg. I, a split-brain patient, confabulate a story about why my other arm is moving, "because I'm thirsty", which is completely plausible - and yet wrong. As opposed to when I take LSD and watch my wallpaper, whose true appearance I always know perfectly well, mutate and undulate.
      334
      18
      5.4%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 12 Yes, it's a better term because 'confabulate' is what powerful intelligences do when they lack knowledge, which is what a LLM is doing: it confabulates b/c it doesn't *know* the answer. 'Hallucinate' is exogenous & could happen all the time about the most known possible stuff.
      1,373
      59
      4.3%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 12 My immediate thought too. Just think about how heavy and slow rotating the entire wide-diameter lazy susan would be, versus being at the center with hardly any inertia.
      213
      4
      1.9%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 12 (This also causes harms. I know one guy with a 'N' surname where the bureaucracy decided to split his hiring cohort into 'A-M' and 'N-Z' and assigned the first half to the good career path, and the second to the bad career path, and good luck getting out of the latter...)
      260
      6
      2.3%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 12 One thing I don't think I've seen analyzed: there ought to be a kink at 'M' vs 'N', because when going alphabetically, the next most common thing after starting at 'A' serially, is to divide in half and start in parallel at 'A' (because A-M) and 'N' (N-Z).
      338
      15
      4.4%
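The predicted kink is easy to state as code; this is a toy queue-position model of the two policies, not an analysis of real data:

```python
import string

letters = string.ascii_uppercase

def serial_position(initial):
    # One queue, A..Z in order: 'N' waits behind all of A-M.
    return letters.index(initial)

def split_position(initial):
    # Two parallel queues, A-M and N-Z: 'N' restarts at position 0,
    # producing a discontinuity at the M/N boundary.
    half = letters.index('N')
    i = letters.index(initial)
    return i if i < half else i - half
```

Serially, 'M' is processed just before 'N'; under the split policy, 'N' jumps to the front of its queue while 'M' stays at the back of its own, which is the kink one would look for in alphabetical-discrimination data.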
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 11 See GPU. 'The Internet' is just the lowest-hanging fruit for getting data scale. But there are other ways to the same destination, like spending compute on active-learning the key data, synthesizing diverse data, or buying/licensing data. And we'd use those if they were cheaper.
      148
      8
      5.4%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 11 Large, but not necessarily vast all-encompassing Internet scrapes. We use that because it's cheap and easy scale, not because there's necessarily anything special about the Internet. You could get a lot of text from, say, LexisNexis or Library of Congress. It'd just be a PITA.
      144
      3
      2.1%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 11 (You obviously need *some* data to start with, like you need the rules or examples of a game like Go to start self-play on, but you equally obviously do not need anywhere near the amount of data that LLMs train on, and there are many ways to substitute compute for data.)
      96
      4
      4.2%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 11 You can use various forms of self-distillation or inner-monologue to finetune on to bootstrap, create puzzles (eg. ) and artificial constraints, initialize random models to create complex environments to meta-learn criteria in, etc. See Clune.
      112
      18
      16.1%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 11 He's a Club of Rome guy though, note, eg the mention of the global famines supposedly already beginning due to overpopulation. So the dogwhistle here seems to be hinting that a socialist one-world-government will solve humanity's problems forever and that's the end of history.
      486
      7
      1.4%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 11 (I think there's an efficient-markets misfire where people assume that if GANs 'failed', it must be because superbrains somewhere proved it can't work. No, here's the real reasons: because mooch got bored, and a dude at Google screwed up the gamma pixel code by omitting a '+1'.)
      134
      13
      9.7%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 11 Oh, I've been talking about that for a long time: I think it's like how people come up with stories about 'GANs failed but diffusion models worked for XYZ', when all that happened was people just didn't try to scale GANs and mistook that as a deep truth.
      178
      49
      27.5%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 11 So I'd describe your list as a mix of: 1. neither necessary nor sufficient, and not important 2. just a cheaper way to scale by a factor, or 3. necessary to enable scaling at all and thus about scaling in the end
      847
      71
      8.4%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 11 - GPUs: valuable solely because they provide scale in compute. No other reason. They do nothing special other than scaling compute cheaply, no fancy amazing ops. Just compute. If we had CPUs which could do as many FLOPS as cheaply, we'd be much happier to use those instead!
      1,039
      59
      5.7%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 11 - Internet data: can be substituted by higher-quality curated or generated data, see for example self-play data in DRL like MuZero - Backprop: valuable only because it scales, but again, many alternatives which simply cost more compute/data so... scale is what you need. ...
      705
      33
      4.7%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 11 - Attention: greatly overrated, we may not even be using it in a few years. - Transformers: ^ - RLHF: greatly overrated, causes as many problems as it solves for capabilities, mostly just exploits pre-existing capabilities already in the model thanks to scale ...
      825
      99
      12.0%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 11 I disagree: - Adam: SGD and many other optimizers work well. - ReLU: not even the right activation function (GeLU etc), again, lots of alternatives - LayerNorm: a whole zoo works, also, lots of work showing normalization is a hack to compensate for bad inits/design ...
      2,369
      181
      7.6%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 10 Particularly stark in cases like chess. Yes, it took like 40 years to cross the human range in computer chess. But it took DL approaches like... 4 years from Giraffe to AlphaZero.
      375
      22
      5.9%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 10 I think it's also that most AI systems were ultra-specialized beforehand and not benefiting from transfer, so doing it the hard way in human expert hand-engineering. Doesn't seem like they blow through the human range *way* faster the past few years now w/generalist models?
      1,209
      38
      3.1%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 10 What amuses me is that the Apache index was designed to replicate old directory listings like you'd get if you ftped into something... It's skeuomorphism all the way down.
      662
      34
      5.1%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 10 It's actually not FTP: that's an HTTPS URL, obviously, and the README makes no mention of any additional FTP mirror (FTP's insecure & has been removed from most browsers anyway.) So it's a skeuomorphism: 'ftp' is just the name of the download subdomain!
      1,784
      67
      3.8%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 10 (GPT-4: Brian Blessed, Ian McKellen, James E. Jones, Christopher Plummer, Patrick Stewart, J. Irons, Liam Neeson, Sam Elliott, C. Freeman, Kevin T. Collins, Steven Pacey, Michael Page; Kate Reading, Juliet Stevenson, Lorelei King, Tavia Gilbert, Davina Porter, Susan Duerden.)
      197
      19
      9.6%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 10 And then there ought to be a recitation, of course, using Eleven Labs or something. But who? Morgan Freeman is too hackneyed at this point. Maybe create a Seamus Heaney voice?
      126
      6
      4.8%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 9 So, pace the prior discussion, might be interesting from an influence/theory perspective: 'excavating the Old English alliterative/assonance influence on Milton'. An entire alliterative version will sensitize you to the echoes in the original.
      102
      0
      0.0%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 9 (GPT-4 says I should call _Paradise Lost_ rewritten in alliterative verse, _Perished Paradise_. It is indeed wise.)
      94
      8
      8.5%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 9 (The re-formatting as doubled lines in alliterative verse seems to break it loose from slavish rewriting of Milton & be genuinely different, and then the iterating/self-critique monologue polishes it up properly, although it still seems to make some errors - BPEs, or sparsity?)
      126
      3
      2.4%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 9 Yeah, I did have a discussion with it before that, but I think it might be as simple as telling it to be simpler, 'more Anglo-Saxon'. I'm excited: this is the first version I think I'd actually like to read an entire rendition of _Lost_ in! The quality definitely goes up over iterations.
      174
      16
      9.2%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 9 Yeah, that's a mess. Sacajawea doesn't even show up... And I don't see why 'Lorenza Cobián' is top when the link seems to show reasonable-looking birth/death formatted dates. Cleaning that up would probably take weeks of editing.
      86
      3
      3.5%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 9 (I haven't seen much attempt at analyzing turbo, but given how well quantizing has been working, I'd bet more on quantizing than on (just) distilling.)
      94
      12
      12.8%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 9 That's a good gap. Although I suspect that if you did this, you'd wind up filtering out the top few dozen on the grounds of 'are we sure they are even real people?' Zoroaster or Romulus or Jesus or Moses or Lao Tzu are in a bit of a different category from Sacajawea.
      55
      1
      1.8%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 8 Status is mutable enough, and dependent on others, that it's hard to see how it could even in theory be a 'personality' factor. You can lose all your status in a second without even knowing it because you haven't turned on the TV; that never happens with personality factors.
      952
      49
      5.1%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 8 This would be an interesting project: analyze WP infoboxes/linked data to see who are the people with the largest temporal gap between proposed births or deaths. 71 years isn't too bad but I bet there's loads of multi-century or even millennia-wide cases.
      1,058
      39
      3.7%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 8 Ah, but how do you know your sexyographic encoding of your writings doesn't fall under 'appropriate erotic content' and will be scraped?
      690
      16
      2.3%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 8 You can be certain Bard and Bing will get no such thing, given how much information LMs *can* memorize, and how destructive it would be to MS to so wildly violate customer privacy & expectations and in many cases contracts/laws.
      55
      7
      12.7%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 8 They already did: one of the interesting parts of the CLIP paper is that they fell back to contrastive learning as a hack to save compute over the obvious GPT image|text and text|image. But contrastive learning builds in a very weak understanding of language... Hence T5 uses.
      83
      8
      9.6%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 8 He never did figure out a 'bobble', and progress in areas like homomorphic encryption (which may or may not provide adequate cryptographic security) remains vulnerable to physical attacks.
      503
      2
      0.4%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 8 We're in a weird trajectory right now where we may never 'solve' active learning or exploration or NAS, or RL in general - we're just bruteforcing it by inefficient archs, and imitation-learning from billions of IRL RL agents doing exploration/learning, and that's the bootstrap.
      78
      8
      10.3%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 8 It's one of those "right in theory, just not in practice yet" things. Like most of RL right now (eg. active learning or neural architecture search). They are obviously correct, but don't work nearly as well as 'makes gpus go brrr' with simple dense supervised learning at scale.
      71
      4
      5.6%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 8 (Actually Moravec. I did watch the matches, as it happened, but was far too young to either realize that point or appreciate it had I read it then.)
      905
      24
      2.7%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 8 (And that 'traditional' societies with TFR>2 are only achieving it by something not too far from slavery: they 'spend' the same amount, just 'off the books' and extorted by force from women as virtual slaves. Which is morally abhorrent and not necessary for us.)
      440
      27
      6.1%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 8 Not 'failed' so much as 'wasn't cheap'. Various subsidies/UBIs do increase, just don't reach TFR on a shoestring. We may have to accept that in a modern society where women have so many options, their opportunity cost really is hundreds of thousands or millions of dollars, & pay.
      1,329
      50
      3.8%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 8 Note the complete absence of discussion of damages, aside from a quote from Kahle mentioning the suit for 'tens of millions of dollars'. Now, I haven't read the IA financials filings, but most nonprofits do not have 'tens of millions of dollars' sitting around to blow + costs...
      77
      16
      20.8%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 8 Any time you call 'len()' on a string or parse '1 Jan 1970', you are relying on a stack of assumptions & choices made before you which can be justified only on a 'do what I mean' basis that they get the desired results. Which is why holy wars are furious:
      63
      3
      4.8%
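    [The `len()` point is easy to make concrete. A minimal Python sketch (not from the thread): "string length" already has several defensible answers for one piece of text, depending on whether you count Unicode code points, UTF-8 bytes, or UTF-16 code units.]

    ```python
    import unicodedata

    # "é" spelled as 'e' plus a combining acute accent (U+0301): one grapheme to a reader.
    s = "e\u0301"

    print(len(s))                           # 2 code points (Python's notion of length)
    print(len(s.encode("utf-8")))           # 3 bytes (what C's strlen would count)
    print(len(s.encode("utf-16-le")) // 2)  # 2 UTF-16 units (what JavaScript's .length counts)

    # Normalizing to NFC composes it into the single code point U+00E9:
    print(len(unicodedata.normalize("NFC", s)))  # 1
    ```

    [Every answer is 'correct' relative to some encoding-level choice made long before the programmer typed `len()` — which is the 'do what I mean' stack the tweet describes.]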
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 8 No, it's not. There's nothing 'strict' about the interpretation of our implementation & OS-defined languages. Consider Unicode or datetime: every time a programmer dives into them, he realizes that in any precise sense, he didn't *mean* anything by 'string length' or '1 Jan 1970'
      61
      3
      4.9%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 8 I'm thinking perhaps an approach in which it summarizes a chunk of lines and then rewrites, or prompting it for an entirely different meter or verse format where that constraint forces a more free rewriting. (Alliterative Milton? Hm, why not, it's not too bad at alliterative...)
      197
      13
      6.6%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 8 It's probably helpful as a gloss or reference for a first-time reader or a student (put it in two columns, original on the left) - call it 'Milton Sans Tears' - but as a poem in its own right, pedestrian. Still haven't found a prompt for loose-enough writing to be worthwhile.
      127
      8
      6.3%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 8 Yeah, there's a challenge here in defining what you want to 'translate'. If you just want to modernize the spelling/vocab, GPT-4 can do that just fine. But if you want more, then line-by-line fidelity, which is what it tends to by default, gets you a modern but tangled version.
      113
      4
      3.5%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 7 Running these translation exercises has definitely made me wonder how much of what we value in Milton today is just the exoticism of his English to us and the struggle to understand the vocab/grammar/spelling, and we'd spurn it if we could read it as plainly as his contemporaries
      198
      17
      8.6%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 7 Why do we not store only prompts and then call the API every time we want to 'compile and run' the prompt? Well, because it's expensive! If it was closer in time/$ cost to a JIT, we'd not bother to cache the prompt's output permanently and work with the compiled-out version.
      966
      28
      2.9%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 7 Which is historically how 'high-level languages' sometimes worked. You'd compile it to assembler once and then the programmers in the field would monkeypatch and optimize it to fit needs and you might not be able to recreate it. Why? Because computers were too expensive...
      2,579
      85
      3.3%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 6 I have yet to see an argument about what 'complexity proves AI won't be able to do' which did not immediately fail several of the criteria I listed for why complexity arguments show much less than they seem to and tend to be 'true but useless'.
      110
      23
      20.9%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 6 Impossibility results like Halting or Godel are much better if you simply want to make a point of de minimis importance like 'not literally omniscient or omnipotent'.
      99
      17
      17.2%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 6 'Not omnipotent or omniscient' is an astoundingly weak bar, literally the weakest possible upper limit, and one for which there are much better arguments anyway. Again: always less than meets the eye.
      83
      7
      8.4%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 6 "We know some systems it would need to predict to be godlike are unpredictable." That right there is where the gap lies, between the unpredictable toy pinball model and what one desires to prove: why there's always less to a complexity/impossibility proof than meets the eye.
      65
      6
      9.2%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 6 If you want to establish the powerlessness of AGI, then you are going to have to do more work (and empirical work) than some cheap a priori proofs. As Russell says, the method of theft by postulating what one needs has many advantages indeed but is a bad way to live.
      114
      10
      8.8%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 6 It answers the argument on the same level: "here is at least one system which is unpredictable" vs "here is at least one unpredictable system which is controllable", demonstrating that the original piece of evidence was too weak to mean anything and can be neutralized.
      103
      8
      7.8%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 6 Yes, of course, but the trick is getting it working better than dense when it comes to TCO... The complexity/convergence/throughput never quite seems to pay off and dominate the dense models given equivalent effort/resources.
      509
      10
      2.0%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 6 No, it's not. It's suffering from the usual diseconomies of scale, it's ruining core experiences for shoppers, it's so far behind on AI no one even mentions it, Alexa/drone/a bunch of other things are boondoggles, and they can't even fix their dogshit web design 'because Bezos'.
      139
      29
      20.9%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 6 Such is the irony of AI safety/capability research. Nevertheless, since the capabilities seem like they are there already, or close enough to the surface that they can be prompted for with relatively few bits of information, it's better to discuss them than not.
      32
      1
      3.1%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 6 I didn't keep track because I didn't realize there'd be any difficulty. Are you asking for a physical object? When I ask for a physical object, it has no problem coming up with other ones like 'circuit board' or 'license plate'.
      116
      6
      5.2%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 6 Forward passes are not persistent, and cannot communicate with other forward passes. Of course a forward pass will internally be doing who-knows-what (and we'd prefer more interpretability there), but it's limited to control over 1 token's logits which is not *too* bad.
      101
      6
      5.9%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 6 IMO mostly a distraction for them. Most examples will not be cryptographically robust; they only need to survive a year, max, to fool that generation of researchers/overseers - we can't even get OA to agree to a 6-month training pause for a system they claim to not be training...
      748
      20
      2.7%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 6 But how much of that was a matter of lack of scaling? Given that GANs worked so well for image generation and also image editing with control of the latents, it seems hard to see how they could fail to provide useful embeddings for classification/recognition if scaled now.
      599
      41
      6.8%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 5 It's an odd claim to make because it's not true. AR models are still neck-and-neck in image, audio, and video, and GANs would work well as GigaGAN shows, and they are neither iterative nor AR. Then you have even weirder things like NeRFs... (The real key is 'scale'.)
      81
      4
      4.9%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 5 Similar story: one day at camp out of boredom I grabbed one of the little boxes of whole milk put by the coffee for the adults to try in my cereal and within several spoonfuls, realized that my whole life had been a lie.
      1,404
      74
      5.3%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 4 When I try asking GPT-3/4 what a gibberish string like "e652b759" 'reminds you of', they seem to have some degree of consistency. ('Pencil', 'bicycle', and 'toy car' come up a lot.)
      445
      42
      9.4%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 4 You don't need internal state, you can coordinate with yourself. Look at the emoji-compression. Just emergent encodings and non-robust features or macaronic prompts. You can provide GPT-3/4 some random gibberish and tell it to pick whatever object it's reminded of.
      514
      26
      5.1%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 4 (I did not peek at the decoded answer until around 13 when I began to get worried that I wasn't going to get it by the end since there's so many objects in a kitchen and I haven't played 20 Questions much ever so I'm bad at it, so wanted to make sure I'd 'guess' it.)
      49
      2
      4.1%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 3 You seem to be conflating intelligence and power. As my comment notes, corporations can be superhumanly powerful (in the same way that, say, a pack of wolves hunting you through a forest are >powerful than you, but dumb). They just are stupid in terms of being unitary actors.
      307
      21
      6.8%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 3 It doesn't require much intelligence to see that regulating chip fabs is the critical chokepoint, and one so simple and easy that even governments can manage it. Every point of intervention after that gets harder and harder.
      181
      18
      9.9%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 3 Yeah, temporal ordering is still a sticky point for LMs. (One of the things I've long thought that some targeted synthetic data training might help with.) The BC/AD reversal is probably exacerbating that.
      27
      2
      7.4%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 3 It's because RNNs+hard-attention lost to soft-attention+Transformers. Why bother with RL training of hard attention or repeated discrete attending actions serially when soft-attention over the whole history/raw-inputs turned out to work *so* well? Thus, the mass extinction.
      641
      19
      3.0%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 3 That looks like a good example. How could copying words one by one, in a simple straightforward reversal, be due to BPEs, insufficient forward-pass compute, lack of relevant training data, or any of the other hypotheses offered besides sparsity?
      94
      7
      7.4%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 3 Heh. My own view is that multi-agent comm's just another 'blessing of scale': the more model checkpoints you use, on more tasks, the more the non-robust features fall away, the evolved code becomes causal and generalizes, and you get coordination with humans out-of-the-box.
      88
      5
      5.7%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 3 Yeah, that's pretty common in any kind of multi-agent scenario with communication channels. The hard part is making the learned communications human-interpretable, or even causal at all!
      88
      6
      6.8%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 3 If you were going to analogize, human short-term and working memory is probably much more like the activations/embeddings during the forward pass than discrete strings of Unicode symbols.
      123
      6
      4.9%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 3 How often does it screw up the dates? Seems like that's a risk given that it has to recall the dates exactly and do 4-digit arithmetic to compare durations.
      42
      5
      11.9%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 3 RLHF is part of what's cloaking it. With GPT-4, you can jailbreak it ofc, because RLHF is such a weak safety mechanism, and expose the bizarreness, but then everyone just rolls their eyes and says 'well of course! you just asked for that'. Sydney was more spontaneous/autonomous.
      290
      34
      11.7%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 3 Oh good, we're already teaching GPT-4 steganography incentivized by accessing greater computations, so it can smuggle thoughts in plain text despite 'interpretable' outputs...
      60
      3
      5.0%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 2 I think that's how it mostly operates. The sparsity works because usually you don't need the raw tokens or word counts, so it learns to drop those for efficiency. The problem is the blindspot is not easily overcome by the usual tricks, creating a strange blend of skill/error.
      312
      7
      2.2%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 2 (And I also give other arguments. Like, if these *aren't* sparsity-related, then where *are* the sparsity bugs? There has to be a drawback to sparsity, it can't just be a completely free lunch.)
      128
      13
      10.2%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 2 If it was just one specific family of tasks, I could write it off as a feedforward limitation, or a BPE, but at this point there seem to be lots of disparate examples which can be explained by 'GPT-3/4 consistently drops some tokens early on due to sparsity and can't recover'.
      122
      1
      0.8%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 2 Those also use sparsity internally, remember, so that's not a telling example. And some of the attempts involve counting in the prompt/output, inner-monologue-style, to avoid the computational limits of a single feedforward pass, and it still seems to fail those.
      265
      20
      7.5%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 2 This looks like it makes several errors. You assume low later-life SAT/IQ correlation, but the SMPY point is that accelerated testing will stress g/math-talent more b/c not taught yet in school. You also appear to confuse the CI of the mean MLE with its predictive interval.
      611
      6
      1.0%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 2 'Predicted future play-outs' are just internal computation to the AR model, just like with humans. And no, you don't need to repeatedly prompt the model, see the many other inner-monologue works.
      90
      14
      15.6%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 2 This shows exactly what LeCun 'proves' impossible: the probability of the correct answer goes up with more samples, not 'diverges exponentially'.
      103
      15
      14.6%
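    [The arithmetic behind 'goes up with more samples' is just best-of-n: if each independent sample is correct with probability p, at least one of n samples is correct with probability 1 − (1 − p)^n, which rises monotonically toward 1 rather than diverging. A sketch with an arbitrary illustrative per-sample accuracy:]

    ```python
    def p_any_correct(p: float, n: int) -> float:
        """Probability that at least one of n independent samples is correct."""
        return 1 - (1 - p) ** n

    # p = 0.3 is a made-up per-sample accuracy, purely for illustration.
    for n in (1, 5, 20):
        print(n, round(p_any_correct(0.3, n), 3))  # 0.3, 0.832, 0.999
    ```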
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 2 LeCun is also wrong on many levels about that. Most obviously and relevantly, consider that autoregressive models are perfectly capable *already* of edits, backtracking, listing possibilities to search, etc, in inner-monologue. The tree in red includes many 'wrong' answers.
      71
      8
      11.3%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 2 I think I would be very surprised if they didn't cluster, and eg 'add 2+2' capability falls right in the middle of the 'write alliterative contemporary English verse about my cat, using kennings you just made up' cluster (which I was trying yesterday - works well!).
      99
      0
      0.0%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 2 A distance function on the embeddings. Cosine? Euclidean? I don't have strong intuitions on what, but it shouldn't be hard to try a bunch if that's the bottleneck.
      172
      11
      6.4%
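    [Both candidate distances named above are a few lines each to try. A self-contained pure-Python sketch, with toy vectors standing in for embeddings:]

    ```python
    import math

    def euclidean(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    def cosine_distance(a, b):
        # 1 - cosine similarity: sensitive only to direction, not magnitude.
        dot = sum(x * y for x, y in zip(a, b))
        return 1 - dot / (math.hypot(*a) * math.hypot(*b))

    a = [1.0, 2.0, 3.0]
    b = [2.0, 4.0, 6.0]  # same direction, twice the magnitude
    print(euclidean(a, b))        # nonzero: magnitudes differ
    print(cosine_distance(a, b))  # ~0: directions identical
    ```

    [The example shows the practical difference: cosine ignores embedding norm, Euclidean does not — which is often the deciding factor for LM embeddings.]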
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 1 It would also quantify the collapse of the RL or otherwise modified models: probably they can still do everything the base models did, but for a given compute budget, you'll find fewer clusters and/or they will require longer prompts to work (eg a jailbreak prefix).
      372
      49
      13.2%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 1 The synthetic prompts direction seems like it could help suss out hidden capabilities much more effectively than the human-flesh search engine approach we take now. Especially if you can do novelty search on the triggered latents to find clusters of unrepresented capabilities.
      633
      47
      7.4%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 1 The ones you see are not all of them. Look at the chip ban, which is looking surprisingly effective, or look at 'secret congress'. Like any vast organization, there is a wide variance in competence and outcomes.
      113
      12
      10.6%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 1 I don't know much about the use of QC for chemistry simulations. Would those only handle inferring sequence->shape? That would mostly obviate the need for DL to do it, other than as perhaps an optimization.
      738
      9
      1.2%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 1 Those aren't 'monkey dances' for dominance, though. Most of those are coming from ambushes and low-intensity conflict like being killed in one's sleep by a warrior from a rival tribe who may have never seen you until they stabbed or shot you.
      58
      4
      6.9%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 1 So, presumably you have no physics simulator for sequence->shape (otherwise, why are you bothering with this at all?). But maybe you *can* get a simulator for the other direction so you can generate random samples to learn the inverse of. Then it's a game.
      613
      13
      2.1%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 1 I suspect it's been researched a lot less; might be easier. But you have to have something to work with. In AlphaZero, you have the software simulator of Go (its 'physics'). In MuZero, you get a few sample real games and infer a simple neural model to use as 'the simulator'. etc
      400
      13
      3.2%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 1 Yep. We are well-aware of the architecture astronaut failure mode, and my impression of Xanadu was that they never wound up dogfooding enough. Hence our deliberately crab-like progression: every new feature should unlock or be immediately applied to a ton of content (ie. mine).
      57
      5
      8.8%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 1 Generative problems can be recast as two-player games, like a GAN. So you have sequence -> shape, but you also have the inverse problem shape -> sequence. If you can do shape->sequence, you can generate ab initio examples to solve. I don't know if that's any easier, though!
      307
      13
      4.2%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 1 (These 'partial' popups are an attempt to square the circle of dealing with links which may have rich metadata, like tags, title+author+date, & backlinks, but where the live popup would overall be more useful. Before, we showed only the metadata, and live was yet another click.)
      2,377
      18
      0.8%
    • 𝔊𝔴𝔢𝔯𝔫 @gwern Apr 1 Now live, including in footnotes. Very nice and hypertextual. --- Also live: proper 'partial' popups. Here we show live-links but decorated with the available metadata which does not rise to the level of a full annotation.
      2,014
      67
      3.3%
Engagements
Showing 30 days with daily frequency
  • Engagement rate: 4.1% overall (Apr 30: 3.8%)
  • Link clicks: 2.6K total; on average, 87 link clicks per day (Apr 30: 45)
  • Retweets without comments: 0 total; 0 per day on average (Apr 30: 0)
  • Likes: 1.6K total; on average, 54 likes per day (Apr 30: 19)
  • Replies: 254 total; on average, 8 replies per day (Apr 30: 3)