"Bad take" bingo cards are terrible, because they never actually say what's wrong with any of the arguments they're making fun of. So here's the "bad AI alignment take bingo" meme that's been going around... but with actual responses to the "bad takes"!


I'll reply in a bit more detail below. This is still really cursory, but it's at least the sort of thing that could start a discussion, as opposed to back-and-forth rounds of mockery. Here's the original meme:
1. "It sounds like scifi so it's not possible." It's an error to infer 'X will happen' from 'X shows up in fiction'. But it's also an error to infer 'X won't happen' from 'X shows up in fiction'. Fiction just isn't relevant.
2. "Smarter AI will also be more moral." Sufficiently smart AI would have more ability to be moral, by better modeling our values. It wouldn't thereby want to do moral things, unless we train it to want that. Safely training goals like that is a major unsolved technical problem.
3. "AI wouldn't want to kill us." If we solve alignment, yes. Otherwise, optimizing almost any random objective hard enough will tend to imply killing potential threats ("you can't get the coffee if you're dead"); and humans are made of atoms an AI can use as raw materials.
4. "AI killing us is actually a good thing." Quoting intelligence.org/files/Cogni…: "People who would never dream of hurting a child hear of an existential risk, and say, 'Well, maybe the human species doesn’t really deserve to survive.'" A million deaths is a million tragedies.
One reason a few people argue 'human extinction would be good' is Negative Utilitarianism, the theory that only suffering matters. But this theory is transparently false/silly. (For a longer critique, see lesswrong.com/posts/8Eo52cjz….)
5. "We shouldn't obstruct the evolution of intelligence." If this means 'we shouldn't give up on ever building AGI', then sure. But there isn't a god, Evolution, that wants AI to maximize paperclips rather than having rich, beautiful cosmopolitan values. arbital.com/p/value_cosmopol…
There's just us. We can try to figure out how to create a rich, complex, wondrously alien future, but this requires that we actually do the engineering legwork. It doesn't happen by default, because we aren't living in a morality tale about the beauty of science and progress.
We're living in a lawful, physical universe, where the future distribution of matter and energy depends on what goals (if any) are being optimized. Optimizing a random goal ("paperclips" being the usual toy example) will tend to produce a dead universe. So let's not do that.
6. "Smart AI would never pursue dumb goals." Again, see arbital.com/p/orthogonality/. Goals and capabilities are orthogonal. As intelligence increases, you get better at modeling the world and predicting its future state. There's no point where a magical ghost enters the machine...
... and goes "wait, my old goals are stupid". The machine might indeed reflect on its goals and opt to change them; but if so, it will decide what changes to make based on its *current*, "stupid" goals, not based on a human intuition that paperclips are boring.
(Unless we solve the alignment problem and program it to share those intuitions and values!)
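To make the orthogonality point concrete, here's a toy sketch of my own (not from the meme, and nothing like how real AI systems work): a brute-force planner whose "capability" is just its search depth, with the objective passed in as a separate plug-in. Turning up the depth makes it better at pursuing whatever objective it was given; nothing in the loop ever rewrites the objective.

```python
import itertools

# Toy world (purely illustrative): each action adds some number of
# "paperclips" and some number of "happy humans" to the outcome.
ACTIONS = {
    "build_factory": (5, -1),
    "help_people":   (0,  3),
    "idle":          (0,  0),
}

def paperclip_utility(outcome):
    """The agent's fixed 'stupid' goal: it only counts paperclips."""
    paperclips, humans = outcome
    return paperclips

def plan(utility, search_depth):
    """Brute-force planner: more search depth = more capability, same goal."""
    best_score, best_seq = float("-inf"), None
    for seq in itertools.product(ACTIONS, repeat=search_depth):
        totals = [0, 0]
        for action in seq:
            totals[0] += ACTIONS[action][0]
            totals[1] += ACTIONS[action][1]
        score = utility(tuple(totals))
        if score > best_score:
            best_score, best_seq = score, seq
    return best_seq, best_score

# Turning up "capability" (search depth) improves performance on the
# goal, but nothing here ever inspects or rewrites the goal itself.
for depth in (1, 2, 3):
    print(depth, plan(paperclip_utility, depth))
```

The point of the toy: the dial that gets turned up (search/compute) and the thing being optimized (the utility function) are separate knobs. That's orthogonality.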
7. "AGI is too far away to worry about." Forecasting when a technology will arrive is hard, verging on impossible, when you're more than a few years out. This doesn't mean that AGI is near. But it means we can't necessarily expect to know when AGI is 10 years away, or 20, and wait to work on alignment then.
If we're looking for hints, however: Many tasks that "the human brain does easily in a half-second" have fallen to DL, and "the gap between us and AGI is made mostly of intangibles" (lesswrong.com/posts/cCMihiwt…). This makes it somewhat harder to say that AGI is *extremely* far away.
8. "Just give the AI sympathy for humans." Human value is complex and fragile; and you often won’t get an advance warning if the goal you instill catastrophically diverges from human values in a new distribution. intelligence.org/files/Compl…
9. "AI will never be smarter than humans." Computers are already superhuman on many narrow tasks, like chess and arithmetic. It would be strange if, e.g., our ability to do science were any different.
Evolution is a poor designer, and human brains weren't optimized by evolution to do science. There are also an enormous number of known limitations of human brains that wouldn’t automatically apply to digital brains. The conclusion seems overdetermined. aiimpacts.org/sources-of-adv…
10. "We'll just solve alignment when we get there." We have no idea how to go about doing that; and there likely won't be time to solve the problem in the endgame unless we deliberately filtered in advance for AI approaches that lend themselves to interpretability and alignment.
Alignment looks difficult (intelligence.org/2017/11/25/…), and AGI systems would likely blow humans out of the water on STEM work immediately, or within a few years. This makes failure look very likely, and very costly. If we can find a way to get ahead of the problem, we should do so.
11. "Maybe AGI will keep us around like pets." Seems like wishful thinking. Is this really the best use of resources for maximizing paperclips? The whole idea of a "pet" is a value-laden human concept, and humans don't keep most possible configurations of matter around as pets.
So even if the AGI wanted something that (from a human perspective) we would label a “pet”, why assume that this thing would specifically be a human, out of the space of all possible configurations of matter? (Pet rocks, pet gas clouds, pet giant blue potatoes...)
12. "Just use Asimov's three laws." We don’t know how to robustly load laws like those into an AI system; and even if we did, Asimov’s laws would have terrible outcomes in practice. (Which, indeed, was the point of the laws in Asimov’s stories.)
13. "Just keep the AI in a box." This is much harder than it sounds, and doesn't obviously help. E.g., how would you use a boxed superintelligence to safely design a complex machine for you? Any information we extract from the AGI is a channel for the AGI to influence the world.
If we had a fast, fool-proof way to analyze machine blueprints and confirm that they’re safe to implement, then we could trust the design without needing to trust the designer. But no such method exists.
14. "Just turn it off if it turns against us." Computers have an "off switch", but humans have an "off switch" too. If we're in an adversarial game with a superintelligence to see who can hit the other's off switch first, then something has gone very wrong at an earlier stage.
The core problem, however, is that even if a developer can keep repeatedly hitting the "off switch", this doesn't let you do anything useful with the AGI. Meanwhile, AGI tech will proliferate over time, and someone will eventually give their AGI access to the Internet.