Unthinkable thoughts

The idea of unthinkable thoughts — thoughts that some mind, for example my own, is literally incapable of thinking — has intrigued me for a while.

I'm not the first to wonder about this: it is the core of the Sapir–Whorf hypothesis, and it is (as I understand it without having read the book) how Newspeak is supposed to function in Nineteen Eighty-Four.

Large language models probably aren't exactly how human minds work, but they do have the advantage that we can study them and mess with them in ways that are impossible in humans (and also unethical; at the time of writing, most people think LLMs have no inner experience, no qualia, so it's fine to do whatever we want with them. I hope this belief is correct). As I understand it, ever since word2vec, all the interesting AI language models represent meaning as points in a high-dimensional space, e.g. 300 dimensions, where (loosely speaking) one dimension represents gender, one represents weight, another represents redness, one more temporal ordering, etc.
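To make that picture concrete, here is a minimal sketch of the "meaning as a point in space" idea. Everything in it is an assumption for illustration: the four axis labels, the words, and the numbers in `TOY_VECTORS` are invented by hand, whereas a real model learns hundreds of dimensions that don't come with tidy labels. It only shows why arithmetic on such vectors can land on a related word.

```python
import math

# Hand-made toy "embeddings". Axes (invented for illustration):
# (gender, royalty, weight, redness)
TOY_VECTORS = {
    "king":  (1.0, 1.0, 0.5, 0.0),
    "queen": (-1.0, 1.0, 0.4, 0.0),
    "man":   (1.0, 0.0, 0.5, 0.0),
    "woman": (-1.0, 0.0, 0.4, 0.0),
    "apple": (0.0, 0.0, 0.1, 0.9),
}


def add(a, b):
    """Component-wise sum of two vectors."""
    return tuple(x + y for x, y in zip(a, b))


def sub(a, b):
    """Component-wise difference of two vectors."""
    return tuple(x - y for x, y in zip(a, b))


def nearest(vec, exclude=()):
    """Return the toy word whose vector is closest (Euclidean) to vec."""
    candidates = (w for w in TOY_VECTORS if w not in exclude)
    return min(candidates, key=lambda w: math.dist(TOY_VECTORS[w], vec))


# "king" minus "man" plus "woman": remove one gender offset, add the other.
analogy = add(sub(TOY_VECTORS["king"], TOY_VECTORS["man"]), TOY_VECTORS["woman"])
print(nearest(analogy, exclude=("king", "man", "woman")))
```

With these made-up numbers the closest remaining word is "queen", because the only thing the arithmetic changed was the position along the axis I chose to call gender.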

If the number of dimensions is too small, you can't distinguish concepts. You can literally see this with a map: Google Maps' street view will sometimes get confused when you go under a bridge, the image from the top road and the image from the bottom road having the same latitude and longitude. You can see it in black-and-white images (each pixel is one dimension, whereas full color images have three dimensions per pixel), where a cloudy sky and green grass can be the same brightness. You can see it in basically all statistics, where someone tells you the "average" but doesn't tell you the distribution (the average human is a Mandarin-speaking 28-year-old male with one testicle and one ovary, and who lives within 3,300 kilometers of Mong Khet in Myanmar).
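A small sketch of that squashing, under stated assumptions: the two colours are values I picked by hand to make the point, and the brightness formula is the common Rec. 601 luma weighting (0.299, 0.587, 0.114), done in integer arithmetic. The second half does the same thing with an average, using two invented lists of numbers.

```python
import statistics


def to_gray(rgb):
    """Collapse three colour dimensions into one brightness value (0-255).

    Uses the Rec. 601 luma weights, scaled by 1000 and rounded to nearest.
    """
    r, g, b = rgb
    return (299 * r + 587 * g + 114 * b + 500) // 1000


# Hand-picked example colours (assumptions, not measurements).
cloudy_sky = (107, 107, 107)  # flat grey
grass = (60, 140, 60)         # mid green

print(cloudy_sky == grass)                    # different in three dimensions
print(to_gray(cloudy_sky) == to_gray(grass))  # identical in one dimension

# Same effect with an "average": two different distributions, one summary.
everyone_alike = [5, 5, 5, 5]
two_extremes = [0, 0, 10, 10]
print(statistics.mean(everyone_alike) == statistics.mean(two_extremes))
```

Once the extra dimensions are gone, nothing in the grey value or the mean tells you which of the two originals you started with.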

Even though I expect my mind to function very differently than an LLM, I do still expect to have some limit on the number of different dimensions my mind can represent. Although I know I must have some limit, I have no idea what this limit is (or "these limits" plural, as I suspect the limits of our minds vary depending on the modality and which level of abstraction you're considering, which makes it hard to turn this vague statement into a precisely testable claim), and therefore have to allow for both (1) the possibility that the limit is purely theoretical as humans don't live long enough to reach it, and (2) the possibility that the limit is really small, and we encounter this "cognitive blind spot" in much the same way we experience our literal blind spot (or indeed change blindness and the even more terrifying choice blindness): we ignore it and don't even notice we're ignoring it 99.99% of the time.

But even with that, there is something worth exploring in this. When we encounter something, or someone, new, it takes a while before we develop a nuanced model of them, and those nuances are the extra dimensions in our inner representation of that thing. It's also hard to imagine ourselves in a state of ignorance, without that nuance. Both of these effects can be seen in the discussion about "is AI capable of art?", where artists can go red in the face pointing out the flaws in the result (separately from any claim about copyright or putting people out of work), yet the models remain popular because most people aren't even looking for those details.

One example in particular comes to my mind with this dimension-squishing, that of gender. I'm sure you've seen (perhaps you've even made) the argument that gender is "just" the chromosomes, XX or XY, that it's "basic biology", or that it is which genitals you're born with. I've tried pointing out that the medical profession both accepts transgender as a status and doesn't think there's anything "basic" about biology, and I've also tried pointing out the existence of other chromosomal combinations (of which there are many in humans). Regardless of your thoughts on this, I suspect you can recognise that what I've said never seems to convince anyone, and the responses have always been to double down on the original claim.

But we also don't realise how bad this kind of dimension reduction is. One example of this is horoscopes: dividing the whole world into 12 groups and trying to make any kind of meaningful statement about how the day will go for each entire group, in 50 words or less, should obviously be a non-starter… and yet, horoscopes are still very popular.

Another is politics. I don't know how many parties there are where you live, but I grew up in (old) Hampshire, on the south coast of the UK, where the political options were mostly the Blue party, with a small hint of Red and Yellow followed by all the rest. Sure, those parties have real names, but do those names matter? The "conservative" party doesn't keep things the same, the "labour" party isn't all that focused on labour. Are the parties in your area also like this, or do yours actually represent the values in their own names? Do you even know the policies of whichever party you support, or do you support them because their people make you feel good? I'm fairly sure "feeling good" is how the UK got into such a mess, and why the US is struggling to answer what should be a very easy question, "do we want to send someone to the Oval Office when they're facing 88 felony charges in four separate cases, one of which was mishandling of classified documents, two of which were about unlawfully attempting to overthrow an election, and the fourth of which was falsifying business records about hush money during his first election campaign?", which is only even a question because he's more charismatic than the other guy, and keeps promising simple solutions to complex realities.

(I could also say something here about Israel/Palestine, but I'm not masochistic enough to wade into that particular discussion, except to say that one could do a degree in this and still not be able to fully grasp it — if it was a topic easy enough to fit into a degree, there would have been a lasting peaceful solution decades ago, just like there is now peace between Israel and Egypt).

Back to AI, and the comparison with the human mind: I wrote earlier that, based on what I know at the moment, it seems equally possible for the human limit to be too big to matter, or extremely small and we just delude ourselves (see all the examples, here and elsewhere, of the H. L. Mencken quote: "For every complex problem there is an answer that is clear, simple and wrong."). Now, if the maximum dimensionality of human thought is huge, then there's probably a long time before AI can take over. If the maximum human dimensionality is small, then AI can very easily confuse (or work around) any single human, by having the space to represent two different concepts which seem the same to us, in the same way that dogs can't see red balls on green grass because they have two kinds of color sensor in their retina, rather than the three in a human retina.

One may point out that a human is different from all humans: we clearly do (and therefore can) learn different models of the world, and therefore it may seem reasonable to suggest that any given AI may outfox any single one of us without necessarily being able to outfox all of us. But even then, AlphaGo demonstrated that it could find things in the game-space of Go that no human had ever thought of.

I know how to spot my literal blind-spot, but it takes a deliberate effort. How might I learn to find a cognitive blind-spot? How do I learn the shape of the thoughts that I cannot think?

I also want our future AI to be literally incapable of wanting to harm humanity, akin to the software maxim "make invalid states unrepresentable". But an AI which cannot distinguish good from evil is definitionally Machiavellian; and if a model can distinguish them, and if the model is distributed, then it's almost trivial to find and flip that specific axis. But I did say "and if the model is distributed": if it isn't distributed, then the "switch" can be kept out of sight and away from fiddling hands.
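For what "flipping an axis" could mean in the simplest geometric sense, here is a toy sketch. It is not how any real model is edited: `good_evil_axis` and `state` are hypothetical three-number vectors I made up, and the operation is just a reflection, v' = v - 2(v·d)d, which negates the component along one direction d and leaves everything at right angles to it alone.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def flip_along(vec, direction):
    """Reflect vec so its component along direction changes sign.

    Components orthogonal to direction are left unchanged.
    """
    norm = math.sqrt(dot(direction, direction))
    unit = [d / norm for d in direction]
    score = dot(vec, unit)  # how far vec points along the axis
    return [v - 2 * score * u for v, u in zip(vec, unit)]


# Hypothetical values, chosen only so the arithmetic is easy to follow.
good_evil_axis = [3.0, 4.0, 0.0]
state = [1.0, 2.0, 3.0]

flipped = flip_along(state, good_evil_axis)
print(dot(state, good_evil_axis), dot(flipped, good_evil_axis))
```

The two printed numbers have the same size and opposite sign, while the third coordinate (which the made-up axis doesn't touch) is the same before and after. The hard part in practice would be finding such an axis, which this sketch simply assumes.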

There is another possible safety mechanism, which is at least worth considering: if all possible moral issues are squashed down into a single dimension (something I've previously noticed humans did, but only by going past it) then anyone with a copy of a distributed model will only have one single good-evil axis to modify. They would still be able to make a demonically evil moustache-twirling villain, but they wouldn't be able to accidentally destroy the world by turning off all the dials except the one saying "maximise shareholder value by maximising the number of paperclips", nor would they be able to run a dictatorship by saying "I am the only person who matters, everyone else is my slave".

The downside, and yes there is a downside, is that this will be the morality of whoever makes the model, and basically everyone disagrees on what "good" is. For example, even limiting the world to those who follow the Ten Commandments, "thou shalt not kill" is often regarded as so incredibly obvious that it shouldn't need to be stated, and yet that specifically has sometimes included and sometimes excluded each of: meat, abortion, war, the death penalty, and suicide — can any of us truly imagine how much broader the range and depth of disagreement is in aggregate, or is this, too, an unthinkable thought?
