"Bad take" bingo cards are terrible, because they never actually say what's wrong with any of the arguments they're making fun of. So here's the "bad AI alignment take bingo" meme that's been going around... but with actual responses to the "bad takes"!


I'll reply in a bit more detail below. This is still really cursory, but it's at least the sort of thing that could start a discussion, as opposed to back-and-forth rounds of mockery. Here's the original meme:
1. "It sounds like scifi so it's not possible." It's an error to infer 'X will happen' from 'X shows up in fiction'. But it's also an error to infer 'X won't happen' from 'X shows up in fiction'. Fiction just isn't relevant.
2. "Smarter AI will also be more moral." Sufficiently smart AI would have more ability to be moral, by better modeling our values. It wouldn't thereby want to do moral things, unless we train it to want that. Safely training goals like that is a major unsolved technical problem.
3. "AI wouldn't want to kill us." If we solve alignment, yes. Otherwise, optimizing almost any random objective hard enough will tend to imply killing potential threats ("you can't get the coffee if you're dead"); and humans are made of atoms an AI can use as raw materials.
4. "AI killing us is actually a good thing." Quoting intelligence.org/files/Cogni…: "People who would never dream of hurting a child hear of an existential risk, and say, 'Well, maybe the human species doesn’t really deserve to survive.'" A million deaths is a million tragedies.
One reason a few people argue 'human extinction would be good' is Negative Utilitarianism, the theory that only suffering matters. But this theory is transparently false/silly. (For a longer critique, see lesswrong.com/posts/8Eo52cjz….)
5. "We shouldn't obstruct the evolution of intelligence." If this means 'we shouldn't give up on ever building AGI', then sure. But there isn't a god, Evolution, that wants AI to maximize paperclips rather than having rich, beautiful cosmopolitan values. arbital.com/p/value_cosmopol…
There's just us. We can try to figure out how to create a rich, complex, wondrously alien future, but this requires that we actually do the engineering legwork. It doesn't happen by default, because we aren't living in a morality tale about the beauty of science and progress.
We're living in a lawful, physical universe, where the future distribution of matter and energy depends on what goals (if any) are being optimized. Optimizing a random goal ("paperclips" being the usual toy example) will tend to produce a dead universe. So let's not do that.
6. "Smart AI would never pursue dumb goals." Again, see arbital.com/p/orthogonality/. Goals and capabilities are orthogonal. As intelligence increases, you get better at modeling the world and predicting its future state. There's no point where a magical ghost enters the machine...
... and goes "wait, my old goals are stupid". The machine might indeed reflect on its goals and opt to change them; but if so, it will decide what changes to make based on its *current*, "stupid" goals, not based on a human intuition that paperclips are boring.
(Unless we solve the alignment problem and program it to share those intuitions and values!)
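To make the orthogonality point concrete, here's a toy sketch of my own (not from the meme, and nothing like how real AI systems work): a brute-force planner whose "capability" is just its search depth, with the objective passed in as a separate plug-in. Turning up the depth makes it better at pursuing whatever objective it was given; nothing in the loop ever rewrites the objective.

```python
import itertools

# Toy world (purely illustrative): each action adds some number of
# "paperclips" and some number of "happy humans" to the outcome.
ACTIONS = {
    "build_factory": (5, -1),
    "help_people":   (0,  3),
    "idle":          (0,  0),
}

def paperclip_utility(outcome):
    """The agent's fixed 'stupid' goal: it only counts paperclips."""
    paperclips, humans = outcome
    return paperclips

def plan(utility, search_depth):
    """Brute-force planner: more search depth = more capability, same goal."""
    best_score, best_seq = float("-inf"), None
    for seq in itertools.product(ACTIONS, repeat=search_depth):
        totals = [0, 0]
        for action in seq:
            totals[0] += ACTIONS[action][0]
            totals[1] += ACTIONS[action][1]
        score = utility(tuple(totals))
        if score > best_score:
            best_score, best_seq = score, seq
    return best_seq, best_score

# Turning up "capability" (search depth) improves performance on the
# goal, but nothing here ever inspects or rewrites the goal itself.
for depth in (1, 2, 3):
    print(depth, plan(paperclip_utility, depth))
```

The point of the toy: the dial that gets turned up (search/compute) and the thing being optimized (the utility function) are separate knobs. That's orthogonality.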
7. "AGI is too far away to worry about." Forecasting when a technology will arrive is hard, verging on impossible, when you're more than a few years out. This doesn't mean that AGI is near. But it means we can't necessarily expect to know when AGI is 10 years away, or 20, and wait to work on alignment then.
If we're looking for hints, however: Many tasks that "the human brain does easily in a half-second" have fallen to DL, and "the gap between us and AGI is made mostly of intangibles" (lesswrong.com/posts/cCMihiwt…). This makes it somewhat harder to say that AGI is *extremely* far away.
8. "Just give the AI sympathy for humans." Human value is complex and fragile; and you often won’t get an advance warning if the goal you instill catastrophically diverges from human values in a new distribution. intelligence.org/files/Compl…
9. "AI will never be smarter than humans." Computers are already superhuman on many narrow tasks, like chess and arithmetic. It would be strange if, e.g., our ability to do science were any different.
Evolution is a poor designer, and human brains weren't optimized by evolution to do science. There are also an enormous number of known limitations of human brains that wouldn’t automatically apply to digital brains. The conclusion seems overdetermined. aiimpacts.org/sources-of-adv…
10. "We'll just solve alignment when we get there." We have no idea how to go about doing that; and there likely won't be time to solve the problem in the endgame unless we deliberately filtered in advance for AI approaches that lend themselves to interpretability and alignment.
Alignment looks difficult (intelligence.org/2017/11/25/…), and AGI systems would likely blow humans out of the water on STEM work immediately, or within a few years. This makes failure look very likely, and very costly. If we can find a way to get ahead of the problem, we should do so.
11. "Maybe AGI will keep us around like pets." Seems like wishful thinking. Is this really the best use of resources for maximizing paperclips? The whole idea of a "pet" is a value-laden human concept, and humans don't keep most possible configurations of matter around as pets.
So even if the AGI wanted something that (from a human perspective) we would label a “pet”, why assume that this thing would specifically be a human, out of the space of all possible configurations of matter? (Pet rocks, pet gas clouds, pet giant blue potatoes...)
12. "Just use Asimov's three laws." We don’t know how to robustly load laws like those into an AI system; and even if we did, Asimov’s laws would have terrible outcomes in practice. (Which, indeed, was the point of the laws in Asimov’s stories.)
13. "Just keep the AI in a box." This is much harder than it sounds, and doesn't obviously help. E.g., how would you use a boxed superintelligence to safely design a complex machine for you? Any information we extract from the AGI is a channel for the AGI to influence the world.
If we had a fast, fool-proof way to analyze machine blueprints and confirm that they’re safe to implement, then we could trust the design without needing to trust the designer. But no such method exists.
14. "Just turn it off if it turns against us." Computers have an "off switch", but humans have an "off switch" too. If we're in an adversarial game with a superintelligence to see who can hit the other's off switch first, then something has gone very wrong at an earlier stage.
The core problem, however, is that even if a developer can keep repeatedly hitting the "off switch", this doesn't let you do anything useful with the AGI. Meanwhile, AGI tech will proliferate over time, and someone will eventually give their AGI access to the Internet.