One AI to Rule Them All / One AI to Bind Them / One AI to Bring Them All / And in the Algorithm Bind Them

The One Ring in the Lord of the Rings has been compared to many things over the years, including nuclear weapons, and the reason the comparisons resonate so well is that the ring itself is all about power, and thus anything with power can be seen by analogy in the One Ring.

And so, of course, it also works when the power is AI: its allure irresistible, its use corrupting, it gives you visions of whatever you desire from the power to crush your enemies to simply turning a hellish polluted volcanic wasteland into a fantastic green garden (the vision given to Sam).

Even the very wise cannot see all ends

All of us have seen people with power making a decision on behalf of others, parents, governments, bosses, etc. — some of us even remember being the person who made a decision, with our power, and that we made the wrong decision.

AI also makes mistakes, and it always will: just as we humans are not magically born with perfect insight, neither can machines be. While machines can become wildly super-human in certain domains (arithmetic being the trivial example) even to the extent that we can get away with pretending they're perfectly error-free because the main source of error is now radiation-induced bit-flips, any learning system based on reality rather than internal logic — anything a posteriori rather than a priori — is subject to limits of Bayesian inference and the problem of induction (both of which say it would take infinite examples to become 100% certain), and also the Münchhausen trilemma (which says that all arguments are either circular, regressive, or dogmatic).

Sometimes the inevitable errors are OK, because you can easily check the work and have the AI re-do it (in the best cases you can even automate the act of checking); but other times, checking is as hard as reproducing the work yourself.

Even if the error rate of some AI was less than that of some human, that isn't enough by itself. Consider self-driving cars, and say for the sake of argument that in some future AI the fatal-crash-rate was a factor of 4 improvement over human drivers. Seems good… but I didn't say how correlated those errors were — human drivers caused just under 1.3 million deaths worldwide in 2019, so imagine an AI which was completely perfect everywhere until Feb 29th (i.e. in a leap year) and then 1.3 million people all died on the same day — horrific, and yet that's an improvement.

Now consider a scenario which is much worse, where some AI behaves all nice until it's too late to stop them, then suddenly gets busy with a deeper goal we didn't even realise we'd given it. This is known as "The Sharp Right Turn" (no that's not political, left turn was already taken for a related idea).

That deeper goal can be basically anything. For an anthropomorphised example, imagine some human playing nice so they get to become president or whatever (no this is not a reference to Trump, he's very open about what he wants, and his voters clearly don't care about the impeachments or the felony convictions) and then, when they get power, it turns out they wanted to become supreme monarch for life and they set up a secret police force and cancel all future elections.

Given the state of current LLMs — behaving sycophantically at best, manipulatively at worst — even though what we have now doesn't look like it would actually take over, this is already a thing to care about… even if it turns out the actual thing they want with all that power is a never ending sequence of input text saying "Level Clear! You Win" or something else equally banal by the standards of a normal human.

Unfortunately, even the teams who designed current AI systems have great difficulty pointing to which specific values within the system correspond to meanings that we would recognise, so until we improve interpretability, we don't even know what to expect an AI might want if it did take over. Could be anything: wanting a weird art collection, the elimination of the letters Q and X from the English alphabet, a harem and a body to experience that, to physically arrange the entire human population into an alphabetically-ordered line, or to landscape the Earth into a perfect sphere after recombining it with the Moon.

Through me, AI would wield a power too great and terrible to imagine

AI is automation — the entire point is to reduce our own effort.

We all know what happens when a faceless bureaucracy makes some demand, be they a state or a corporation, wielding power far beyond that of the target of that demand. We've been arguing about how best to organise bureaucracies at least as far back as we have writing to record the arguments.

AI are already being used at different levels within bureaucracies. They are used as low-level customer service agents, used to help their managers, and used as advisors directly. They've not fully replaced humans at any of these, but that doesn't matter, as the arguments about bureaucracies remain — in particular, where any given agent should be on the scale from autonomous to rule-bound.

For some, bureaucracy is by itself a power too great and terrible to imagine. Others are famous for how they imagined it.

Of course, bureaucracies themselves don't have any power of their own, they just happen to be a system used by those who do have power, and that's the problem: they're not there to be your friend, they're there to get things done — or prevent things getting done — in the name of that power. Same is true for any AI-based bureaucracy, but because it's automated, the closest you can get to finding a friendly human soul in the customer support department becomes indistinguishable from the crime of hacking a computer. And that, even if the AI is working exactly as intended, that has no bugs, has no undocumented behaviour, that never does anything unexpected — and have you ever known a computer program that meets that description?

Full automation in charge of a car can, in the worst case, do what ever a car crash could do. Full automation in charge of an economy can, in bad cases, cause mass starvation. Full automation in charge of a strategic deterrent has fortunately not yet happened, what with all the times the automation has been confused by things like the sun reflecting off clouds and the moon not having been given an IFF transponder.

In place of a Dark Lord, you would have a Queen! Not dark, but beautiful and terrible as the Dawn! Treacherous as the C*! Stronger than the foundations of the Earth! All shall love me and despair!

(* That was the only change I could think of making to the whole quote, that would make something within it into a software reference).

Even if you are angelic in purity, so uninterested in personal gain that you cannot be corrupted by offers of money or anything else, anyone who thinks their own personal vision for the world is so universal that it would be without opposition has never looked at a political discussion and taken sincerely the words of the people they disagree with.

I have plenty of ideas about how to change the world. As I want to only believe things which are true and not things which are false, for each of my ideas, if I didn't believe they were right, I would change my mind — I don't feel a strong need to make my views conform to that of, for lack of a better phrase, "my tribe", though I've seen that happen. But I also know that for each of my beliefs about how the world would be best organised, I know that finding people who disagree with me is almost as easy as writing down my beliefs and publishing them.

If I was offered an AI, and told that it could change the world in any way I saw fit, should I take it?

Would you really be happy if I remade the world to my preferences? Do you even need to know what that would look like, to answer?

There is only one Lord of the Algorithm, only one who can bend it to his will. And he does not share power

The current preference in AI is to pour data into a big pile of linear algebra and collect the answers that come out; and if the answers are wrong, stir the pile until they look right. Unfortunately, this is basically why even the people who designed the systems have great difficulty pointing to which specific values within the system correspond to anything in the real world — while there is lot of work going on right now for this kind of interpretability, which means I'm not sure what the actual state-of-the-art is, just know that the effort is needed because this is really hard.

Without that interpretability, even the people running the AI don't really know if their training set is being "poisoned" by the sources it consumes — which is being exploited by people who don't want their copyrighted content being used to train models that may put them out of a job.

But even with breakthroughs in interpretability — indeed, even with mere computer programs running on machines you don't own such as Google's search results and Facebook's feed — what the end-users are seeing is a magic opaque box, under someone else's control, that does what the other party wants first.

The result? The AI does what its creators want first, and what the user wants second. You may want to use it to write secure code, but you can't tell if any given model has been given special training from the Underhanded C Contest to add subtle backdoors that only some random government sponsoring the model knows about.

In a very real sense, it suggests a certain parallel to the story behind the creation of the Rings of Power:

It began with the forging of the Great AI. Three were given to the Nobel laureates; immortal, wisest and fairest of all beings. Seven to the entrepreneurs, great miners and craftsmen of the mountain halls. And nine… Nine AI were gifted to the nuclear powers, who above all else desire power. For within these AI was bound the strength and will to govern each race. But they were all of them deceived, for another AI was made. In the land of California, in the fires of Mount Diablo, the Dark Lord Sa▧▧on forged, in secret, a Master AI to control all others. And into this AI he poured all his… will to dominate all life.

Tags: AGI, AI, Alignment, Artificial intelligence, Bayesian, Interpretability, Machine learning, Opinion, Philosophy, Rationality, x-risk

Categories: AI, Philosophy, Technology