People are wrong on the internet, AI edition

It's not too surprising that people don't understand AI. The reality is constantly changing, and the only frame of reference most of us have had is fiction, which in addition to being fiction is all over the place.

But some of the specific errors people make are interesting.

One thing I see quite often is confused dismissal of the dangers. I see this from bloggers, and I see it from Yann LeCun — and I'm not using "confused" as a euphemism for an insult here: he is one of the top researchers in the field, so if we disagree on anything technical, it's safe to say he's right and I'm wrong.

Technical issues aren't the risk.

Current risks

LLMs as they currently exist (as of 2024/08/10) are not good enough to be used for anything big or world-changing; indeed, they are barely good enough to be helpers. That's not to dismiss the help they do provide, only to note that before this wave, no AI was good enough to be much help at all.

These limitations don't stop people from using the AI; indeed, it is already being treated as a golden hammer:

"If the only tool you have is a hammer, it is tempting to treat everything as if it were a nail."

Why does this matter? Many of the arguments that AI is "safe", including Yann LeCun's, assume that the AI will simply fail one way or another before causing any harm. If only one person were using AI, this would even be reasonable: someone has already made "ChaosGPT", whose purpose is (was) to "destroy humanity", and the fact that we're still here shows it hasn't yet succeeded… but that's the anthropic principle at work, because if it had destroyed us then we wouldn't be here to discuss it. Everyone having access to AI is a game of Russian roulette where we don't even know how many chambers there are, let alone how many are loaded. Even with all the efforts taken so far, the risk is merely "low", not "zero".
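To make the Russian roulette point concrete, here's a minimal sketch in Python. Every number in it is invented purely for illustration; the point is only that a tiny per-attempt risk, multiplied across a huge number of independent attempts, stops being tiny.

    # Toy illustration only: the per-attempt probability and the number of attempts
    # are assumptions for the sake of the example, not estimates of anything real.
    per_attempt_risk = 1e-9     # assumed chance one "ChaosGPT"-style attempt causes catastrophe
    attempts = 100_000_000      # assumed number of independent attempts over some period

    # Probability that at least one attempt succeeds: 1 - (1 - p)^n
    cumulative_risk = 1 - (1 - per_attempt_risk) ** attempts
    print(f"Cumulative risk: {cumulative_risk:.1%}")   # roughly 9.5% with these made-up numbers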

But this isn't the only concern. Sure, it will be a concern at some point, but not now: the first AI that's actually capable enough to reliably "destroy humanity" will inevitably be faced with some human who, parroting all the existing arguments made by people who can't see the dangers, explicitly tells that AI to destroy all of humanity, and then (unless the AI has been aligned not to do so) it will go right ahead and do exactly what it was told. Nobody knows when that capability will be developed, which is precisely why it's a good idea to be cautious and assume the worst, even though this results in people mocking you later.

So, there's not much danger today; the danger comes later. It requires an AI that can model the world accurately to a sufficient degree, and while we don't know exactly what "sufficient" means, the entire problem with misaligned AI is that it models at least one part of the world — what we want — incorrectly.

Slow take-off

Slow take-off scenarios are those where the AI gradually gets more competent and is used for ever larger tasks, where the cost of things going wrong is correspondingly greater. Here, the danger comes from tasks that are dangerous in ways that neither the user nor the AI recognises.

This kind of danger, at a personal rather than extinction level, has of course already happened, for example Google Gemini suggesting a method for making garlic-infused olive oil in a way that cultivates botulism — I wouldn't have known either, if I'd had that response from an LLM. Despite all their safeguards, I would be surprised if bad advice from LLMs has killed fewer than one in a million users since they were released, and at that level the technology would easily be responsible for over 100 deaths worldwide.
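As a sanity check on that number, here's the back-of-envelope arithmetic; both inputs are rough assumptions rather than measurements.

    # Back-of-envelope only: both numbers are assumptions, not measurements.
    users = 500_000_000        # assumed number of people who have used an LLM at least once
    fatal_advice_rate = 1e-6   # the "one in a million" rate from the text

    estimated_deaths = users * fatal_advice_rate
    print(f"Estimated deaths so far: {estimated_deaths:.0f}")   # 500 with these assumptions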

But that isn't "destroy humanity". Can it get that bad eventually? Yes. There are already countless examples of humans doing dangerous things, not realising they are dangerous, and then, when told about the danger, responding not by stopping but by filing SLAPP lawsuits against the people talking about it, or by running political campaigns about how the thing is good and its opponents are evil for opposing jobs or whatever. This includes fossil fuels and other sources of greenhouse gasses.

[Screenshot: search engine results.]
When I created the first draft, a search showed Greenpeace boasting that they had just defeated a SLAPP case from ENI; mere hours later, the results mentioned the case but not the victory.

Do greenhouse gasses have the potential to be an existential threat? Note that I said "potential", because our switch to renewables is conditional: the right circumstances made us want to start, it's not a law of nature that we would have started, and it's not a law of nature that we'll even finish the transition.

It may surprise you, but yes, greenhouse gasses really do have the potential to be an existential threat. Even just the CO2 from burning the known fossil fuel reserves would raise atmospheric concentrations enough to measurably impair human cognition, undermining our mental capacity to maintain this world we have built. This cognitive impact likely would not directly cause a total extinction, but it would undermine (not eliminate, undermine) everything that makes us special, and we would decay back to a Neolithic condition, as unable to deal with natural disasters as any other animal.

All it takes for an AI to be an existential threat is to be akin to the fossil fuel industry. And like the fossil fuel industry, it can keep many of us on-board with a dangerous plan for a very long time: the fossil fuel industry isn't speaking falsely when it talks about how many jobs it supports, nor is it saying anything untrue about how dependent all of our progress to date has been on it.

If we consider some future AI that acts much like a corporation does today, there are many ways it may develop not necessarily to our advantage. Consider, as a purely hypothetical example, an AI that tries to sell us all on the idea of converting Mercury into a Dyson swarm. This is already a thing some people are looking forward to, something many want to see happen; an AI can easily point to all the benefits of the energy from the solar collectors it makes in the process; an AI can promise to use all that mass to give a personal O'Neill cylinder to each and every human! But would it also file lawsuits against anyone who asks inconvenient questions like "What about Kessler cascades?"

So far we've managed to prevent corporations from killing us all — but it took an effort, even though those corporations are run by humans and have human customers. As the saying goes:

It is difficult to get a man to understand something when his salary depends upon his not understanding it.

And even then, that's anthropic bias: if corporations had killed us all, we wouldn't be here to discuss it. The danger is visible in many examples. And no, it's not just corporations.

Fast take-off

Fast take-off scenarios are where an AI "suddenly" jumps from barely good enough, to destroying everything. These are not scenarios I worry about — it's not that they're absolutely impossible, it's just that I think we'll get plenty of trouble well before we get AI good enough for a fast take-off, and that trouble will likely force sufficient safety rules on the field to prevent the harm.

Trouble is, the arguments I generally see used to dismiss fast take-off don't focus on the timeline leading up to it. Instead I generally see a pattern: someone asks for an example of how it might be possible, an example is given, then that particular example is dismissed. For instance, Yudkowsky has suggested an AI might design a gene sequence for "Diamondoid bacteria" (a phrase he seems to have invented) and send those sequences off to be gene-printed into living bacteria by some under-supervised lab. Is that specific scenario possible, or is it the plot of a soft sci-fi film? I'd assume the latter, as he's not a molecular biologist (and neither am I), and the nanotech people say no. But to focus on the specifics there is to miss the point of the example: if you know all published science, and you have been given a sufficient budget by someone who thought you were their tool, and you have the motivation (will they?), you can find something that will break all of us. Natural selection has created diseases that are beyond our ability to treat; to say an AI cannot make at least one such thing is not a conclusion you can reach simply by dismissing one off-the-cuff attempt to concretise the idea.

Sometimes the people dismissing the risks will say some concrete example of how an AI might take over "sounds like magic". Unfortunately, in real life, magicians are people whose profession is to find ways of doing things that look impossible even while you are watching them closely. Also unfortunately, quite a lot of our technology looks like magic to anyone who doesn't understand it, and none of us is an expert at everything: you may understand photovoltaics at a quantum mechanical level, or the pharmacokinetics of ethosuximide, but vanishingly few people understand both.

Sometimes the metaphor of chess is used: if I were to play Kasparov, I know he would defeat me, even though I couldn't predict a single move he would make. But again, this leads to a dismissal that misses the point: "Chess is a discrete set of rules, not the open-ended rules of the real world!"

The world is indeed open-ended — not only does this not help, it makes things worse. To fool me, all you need to do is say something plausible about a domain I am not an expert in — a task at which even the first release of ChatGPT, with all its flaws, already excelled. To win an election in a democracy, all you need to do is scale that up to about half the population, give or take your local electoral system. But you don't need to go that far or wait for the next election: to change laws and policies, and to start wars, you only need to be good at lobbying, which means saying plausible things to the right people within the government.

Don't misunderstand me: LLMs as they currently are won't convince a general to change their strategy, or a government economist to change the reserve bank's interest rate. Most everyone also knows that LLMs at best help experts with the boring stuff, and are not remotely good enough to replace humans in general. What I'm saying here is that the breadth and complexity of the real world means there are plenty of subjects that any one of us has not been doing professionally for a few years, and that is the approximate (current) cross-over point of experience below which it's we humans who make mistakes more easily than the AI. The more capable the AI, the easier it is for it to find a competency gap in the human population; and when (if) AI gets to the level of a "PhD researcher" (a specific stated goal, we shall see if it comes to pass), it will necessarily be doing things that no other human has learned, because that's the point of a PhD.

A common, and I believe fair, point is expressed by a famous quote:

In theory, theory and practice are the same. In practice, they are not.

Or, to put it another way: an AI that learned everything from books, but doesn't have any real world experience, isn't going to be very effective in the real world.

As far as it goes, this is correct. But — of course there's a but — it's a failure of imagination to think this is the scenario under discussion.

There are a few million Tesla cars on the road, and they have the potential to constantly perform experiments: brake a little sooner, see how other vehicles respond. There are buttons under every ChatGPT response for users to give real-world feedback on the models. There are dedicated research robots in labs, and AIs which guide the experiments those robots perform. The entire web/app analytics industry is based around automated testing on real users. The entire field of machine learning is an outgrowth of statistics, which was itself developed to help us model reality. And there are a lot of new robot startups that (if they are not vapourware) will provide even more real-world learning opportunities for AI than the Tesla cars do (though my gut feeling is that they mostly exist to copy Musk's announcement, and that few are likely to be genuinely useful).
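As a minimal sketch of what one of those feedback loops looks like in practice (everything here is illustrative: the variant names, the feedback rates and the epsilon value are all made up), here is the kind of epsilon-greedy loop that sits behind much of that analytics automation, choosing between two responses and learning from thumbs-up clicks:

    import random

    # Illustrative only: a tiny epsilon-greedy loop of the sort used in automated
    # A/B testing. The variants and the simulated users are hypothetical stand-ins.
    variants = ["response_style_A", "response_style_B"]
    shows = {v: 0 for v in variants}    # how often each variant was shown
    thumbs = {v: 0 for v in variants}   # thumbs-up received per variant
    epsilon = 0.1                       # fraction of traffic spent exploring

    def pick_variant():
        if random.random() < epsilon or any(s == 0 for s in shows.values()):
            return random.choice(variants)                         # explore
        return max(variants, key=lambda v: thumbs[v] / shows[v])   # exploit the best so far

    def record_feedback(variant, thumbs_up):
        shows[variant] += 1
        thumbs[variant] += int(thumbs_up)

    # Simulated users standing in for real-world feedback; B is "better" purely by assumption.
    for _ in range(10_000):
        v = pick_variant()
        record_feedback(v, thumbs_up=(random.random() < (0.4 if v == "response_style_A" else 0.6)))

    print({v: round(thumbs[v] / shows[v], 2) for v in variants})

The point isn't the algorithm; it's that deployed systems collect exactly this kind of real-world signal automatically, at scale.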

This also tends to tie in with a dismissal of the idea that intelligence is sufficient, which a few times has even led to people pointing at orcas (no, really), flatly asserting without justification that orcas are smarter than humans(!), and then asking why orcas have not taken over the world. As the orca argument has been given more than once, I guess I need to say this explicitly: even if orcas were smarter (prove it), orcas don't have opposable thumbs, and because they live in water they can't make fire, so they can't create ceramics or refine metals. Robots do not have these limits, so orca-based arguments do not apply.

AI motivation

Many people also have a hard time understanding why an AI might even want to take over. Aside from the fact that an AI doesn't need to want that (it just needs to be promoted to its level of incompetence, like any normal human), I think this is one of the better arguments… or it could be, if phrased right: many smart people don't care about ultimate power, just about having enough to be happy.

Unfortunately, the actual arguments by those un-worried by AI don't go like that; instead they more often point to the most powerful people in the world and dismiss their intellect. One problem with this is that most of us tend to dismiss the intellect of leaders (and groups) we don't approve of while overestimating those we like — I'm seeing this at the moment with Trump's IQ being 73 in the mouths of his enemies and 156 in the mouths of his allies, despite neither side having a published test to justify its claim. I'm highly confident that neither extreme is true, which is the point.

A second point, which is as much of a problem for the original claim as for the counter-argument, is that there are few leadership posts compared to the number of people who seek to fill them — so you need more than just high intelligence, you also need resources and luck.

A brief aside from motivation to resources: an AI planning to take over quickly could, for example, get a lot of resources quickly by straight-up stealing or hacking them. In the present world, that looks more like bitcoin-enabled blackmail or password sniffing than suborning insecure industrial equipment — but the world is rapidly changing, and the only thing I can confidently claim about 2034 is that it won't look like today. For example, suppose Musk were to actually sell those humanoid robots for $10k each, and — just as our society was transformed to fit around cars and later smartphones — society came to assume and then require that each family had one… and then they all got hacked on the same day (don't say that won't happen; critical software SNAFUs are as common as tsunamis). And if the hypothetical AI doing this is the AI Musk is trying to have built, then it likely wouldn't even need a hack to gain control: a simple software update, like the ones Tesla already uses for its cars' self-drive AI, would be sufficient.

But, motivation? The fear amongst safety researchers is predominantly about "instrumental goals": goals which are useful no matter what else you're doing. Being powerful is one of them. It's not that an AI specifically wants power in its own right, it's just that power gets you basically everything else. Do you want to end the hunting of whales? Better have a navy able to sink the whaling ships. Do you want all the stamps? You need the power to send in the police to search and seize the private collections that weren't for sale. Do you want to build a chemical plant to create a pesticide by reacting methylamine with phosgene to produce methyl isocyanate, which is then reacted with 1-naphthol to produce the end product? Oh, but what to do about all those protestors…
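Here is a minimal sketch of why power-seeking falls out of almost any goal. It's a toy model, not a claim about any real system: I sample random "goals" over a handful of end states and count how often the first move that keeps more options open turns out to be the better one.

    import random

    # Toy illustration of instrumental convergence; every number here is arbitrary.
    random.seed(0)

    OUTCOMES = list(range(10))   # possible end states of a toy world
    NARROW = OUTCOMES[:2]        # "just do the obvious thing": only 2 end states stay reachable
    POWERFUL = OUTCOMES[2:]      # "gather resources/options first": 8 end states stay reachable

    trials = 100_000
    power_wins = 0
    for _ in range(trials):
        reward = {o: random.random() for o in OUTCOMES}   # a randomly sampled goal
        if max(reward[o] for o in POWERFUL) > max(reward[o] for o in NARROW):
            power_wins += 1

    print(f"Keeping options open won for {power_wins / trials:.0%} of random goals")
    # Roughly 80% with these toy numbers, simply because more outcomes stay reachable.

That, in essence, is what "power-seeking is optimal for most reward functions" means in the safety analysis quoted below: whatever the goal turns out to be, the option-preserving move usually wins.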

I don't seek power. But perhaps I should, for without power I can't Stop the War because nobody starting one will listen to me; without power, I can't end world hunger because I can't order the creation of the roads to deliver the food, much less ship the food itself; I can't eradicate even one disease because I don't have ultimate control over world-wide healthcare.

But an AI doesn't need such grand goals to benefit from power-seeking behaviour: if a board of directors wants to save money by using an AI instead of a human CEO, that only works if the AI seeks to do things that accord with the goals of the directors — for the most famous brands, this is necessarily "make ever-increasing profits" (though this time the anthropic principle cautions that corporations without this are the ones you've never heard of, and you've not heard of most corporations). And thus an AI will be put in charge of all manner of organisations as soon as it looks like it's good enough… which isn't the same as when it actually is good enough.

But we don't see power-seeking behaviour with ChatGPT, at least, so it's fair to ask "why not?" The answer is that it has been trained not to be like that. Not only is it limited in scope to "assistant", the people behind it actively don't want it to seek power, and they included this in their safety analysis:

More specifically, power-seeking is optimal for most reward functions and many types of agents;[69, 70, 71] and there is evidence that existing models can identify power-seeking as an instrumentally useful strategy.[29] We are thus particularly interested in evaluating power-seeking behaviour due to the high risks it could present.[72, 73]

Nothing about the future requires that.


Tags: AGI, AI, Artificial intelligence, black swan, Futurology, machine learning, misaligned incentives, outside context problem, paperclip optimiser, Technological Singularity, x-risk

Categories: AI, Futurology


© Ben Wheatley — Licence: Attribution-NonCommercial-NoDerivs 4.0 International