Yes, AIs ‘understand’ things
At least, the classic argument against AI understanding—John Searle’s ‘Chinese Room’ thought experiment—is fatally undermined by large language models.
By far the most famous philosophical argument deployed by skeptics of artificial intelligence is the “Chinese Room” thought experiment, put forth in 1980 by John Searle. The argument holds that, no matter how successful an AI is in emulating human behavior—even if it answers questions the way a person would answer them, thus passing the Turing Test—it can’t be said to have “understanding.”
The Stanford Encyclopedia of Philosophy says that “the Chinese Room argument has probably been the most widely discussed philosophical argument in cognitive science to appear since the Turing Test. By 1991 computer scientist Pat Hayes had defined Cognitive Science as the ongoing research project of refuting Searle’s argument.”
Well, thanks to ChatGPT and other large language models, that project can now be declared a success. The way LLMs achieve their remarkable conversational fluency fatally undermines Searle’s argument.
By itself, this doesn’t mean that Searle’s belief that AIs are incapable of “understanding” is wrong. But it does mean that if he, or his intellectual heirs, want to defend this belief, they’ll have to come up with a new argument. And I don’t think they’ll succeed.
Before I explain why, here’s a brief plot spoiler about my own take on the AI understanding question:
Whether AIs can be said to have “understanding” depends largely, I think, on whether you define understanding as entailing the subjective experience of understanding. If you do define it that way (something Searle didn’t do in his classic 1980 paper), then the jury is out and may be out forever; we have no way of knowing whether AIs have subjective experience—whether it’s “like something” to be them, as Thomas Nagel famously put the consciousness question.
If you want a more useful definition of understanding—a definition that lets us ponder the question of AI understanding without knowing the answer to the AI consciousness question—the most natural candidate, I think, is something like this:
“Understanding” means processing information in broadly the way human brains are processing information when humans are having the experience of understanding.
Or, to put a finer point on it:
“Understanding” means employing structures of information processing that are functionally comparable to the structures of information processing that, in the human brain, are critical to understanding.
If that’s what you mean by understanding, then LLMs, I submit, have key elements of understanding. I’m not saying they have all the elements of understanding. But they seem to have about as many of the elements as you’d expect, given their current level of evolution. And there’s good reason to think that more evolution will bring more elements.
My full critique of Searle’s argument—and my discussion of the elements of understanding that LLMs already possess—takes a while to lay out. But the essence of the critique is pretty easy to summarize:
Central to Searle’s argument is his contention that AIs, in contrast to biological brains, don’t connect words to meaning. The kind of information processing that AIs do has “only a syntax but no semantics,” he wrote in his original presentation of the Chinese Room argument, the 1980 paper “Minds, Brains, and Programs.”
This was arguably true of the kind of AI he was assessing back then, but it’s not true of today’s large language models. LLMs do have a way of mapping words onto meaning, and so can’t be dismissed as mere “stochastic parrots” that are just doing “fancy auto-complete”—phrases that Searle’s intellectual heirs use to describe LLMs. (Obscure footnote for any philosophers who are intimately familiar with Searle’s argument: No, as I explain below, it can’t be saved via the “intentionality” aspect of the semantics issue.)
OK, now I’ll back up and explain Searle’s argument and why LLMs are fatal to it, and the sense in which they have elements of understanding.
Searle, in his thought experiment, imagines himself locked in a room. Periodically, someone gives him (slips them under the door, presumably) pieces of paper with questions written in Chinese. Searle is expected to respond to these questions in written Chinese.
Any hopes that his responses might be coherent and appropriate are dimmed by the fact that he doesn’t speak Chinese. He has no clue as to the meaning of the Chinese symbols. In fact, he doesn’t even know that these questions are questions.
However, he has been given written rules that will help him formulate a reply. “These rules instruct me how to give back certain Chinese symbols with certain sorts of shapes in response to certain sorts of shapes given me.” These rules for transforming input symbols into output symbols are so good that Searle’s replies convince Chinese speakers outside the room that he understands Chinese.
Searle says that nowhere in the Chinese Room—in his brain or anywhere else within the four walls—is there a place where something called “understanding” can be said to reside. (And for that matter, it’s fair to say that nowhere in his realm of subjective awareness is there a sense of understanding. But, again, Searle’s 1980 argument doesn’t depend on this question of subjectively experienced understanding, and I’ll treat it separately, below.) Yet something in the Chinese Room is generating responses that seem to people outside the room to be products of understanding.
That, Searle says, is what AI is like. At the heart of AI is a computer program—the functional equivalent of the written rules that guide the man in the Chinese Room. The computer, in executing the program, transforms input symbols into output symbols, just like that man does. These output symbols may seem to naive observers to reflect understanding, but they don’t; nowhere in the computer, or in the Chinese Room, does understanding exist.
One odd thing about Searle’s rendering of the Chinese Room argument is that it changed significantly over time—apparently without his realizing it. In 2010 he put it this way: “I demonstrated years ago with the so-called Chinese Room Argument that the implementation of the computer program is not by itself sufficient for consciousness or intentionality (Searle 1980).”
But, again, the 1980 paper makes no mention of consciousness—or of anything that’s tantamount to consciousness (like “sentience” or “subjective experience”). It would be easy to miss this, because the 1980 paper does talk about “intentionality”—a word that, in ordinary usage, implies subjective experience. So when Searle writes in that paper about “such intentionality as computers appear to have” (by way of denying that they have it), it sounds as if he’s talking straightforwardly about a subjective state.
But in philosophy, “intentionality” is a technical term that means, basically, “the property of being about something.” This property, to be sure, is most commonly attributed to conscious mental states; if I think about my dog, that thought has intentionality. But many philosophers would also say that if I write a sentence about my dog, or make a documentary about my dog, the sentence, or the documentary, has a kind of intentionality. And some philosophers say that an unconscious state of mind can have intentionality. If, for example, I’m unconsciously monitoring a sound or some other feature of my environment, there must be some kind of representation of that feature somewhere in my mind, and that representation is about that feature.
I’ll get back to the question of what Searle himself seems to have meant by “intentionality.” But first, let’s take a closer look at large language models against the backdrop of the framing of the “understanding” question that I suggested above: Is the information processing that’s going on inside a large language model when it’s answering questions in important respects like the information processing that goes on inside the human brain when humans answer questions?
Since a large language model is structurally pretty different from a human brain, there will inevitably be senses in which the answer is no. But it turns out there are senses in which the answer is yes—and one of those senses is the sense in which Searle, in 1980, insisted the answer was no: his assertion that AIs have “no semantics.”
Large language models depend critically on mapping sequences of symbols—in particular, words—onto what can be called “semantic space.” Here’s a crude but, for present purposes, adequate example:
Suppose you make a graph with an x and y axis on which to map the words for various animals. The x axis is labeled “degree of lethality” and the y axis is labeled “speed.” Both a rattlesnake and a tiger would have pretty high x values, but the tiger would have a much higher y value. So you might plot these as: rattlesnake (8,3); tiger (10,10). And a scorpion might be, say, (5,2). You could keep adding dimensions to your map; the z axis could represent size, for example.
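If you want to see the arithmetic behind that picture, here is a tiny Python sketch, using the same made-up coordinates, that treats each animal as a point on the map and measures how far apart any two of them are. It is purely illustrative; nothing about it comes from an actual language model.

```python
import math

# Toy two-dimensional "semantic space": (degree of lethality, speed),
# using the made-up coordinates from the example above.
animals = {
    "rattlesnake": (8, 3),
    "tiger": (10, 10),
    "scorpion": (5, 2),
}

def distance(a, b):
    """Euclidean distance between two words in this toy space."""
    return math.dist(animals[a], animals[b])

print(distance("rattlesnake", "tiger"))     # ~7.3
print(distance("rattlesnake", "scorpion"))  # ~3.2 (closer than to "tiger")
```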
Large language models, during their training phase, construct a map like this, except way more complicated. For starters, it has thousands and thousands of dimensions. (Also, not all the dimensions are semantic, but we can ignore that fact for present purposes.) In this semantic space, a word’s location is specified by a very long sequence of numbers. So instead of two coordinates—like (8,3) for “rattlesnake”—you’d have thousands of numbers separated by commas (8,3,12,1,14…).
Of course, it’s hard to envision a semantic “space” with thousands of dimensions. Still, just as it’s possible to say exactly how close “rattlesnake” is to “tiger” in two-dimensional semantic space (by measuring the distance between those two points), it’s possible to say exactly how close two words are in 10,000-dimensional space. And, as you might expect, it turns out that, in an LLM’s vast semantic space, words that are close together tend to have a lot of overlap in meaning. Thus, “shoe” is closer to “boot” than either is to “tree.” And “tree” is closer to “bush” than to “shoe” or “boot.”
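The same arithmetic works no matter how many dimensions you add. Here is a schematic sketch with invented five-dimensional vectors standing in for real embeddings (which, again, have thousands of dimensions). Closeness is measured here with cosine similarity, a standard choice for this kind of comparison; plain distance would tell a similar story.

```python
import numpy as np

# Invented stand-in vectors. A real LLM embedding would have thousands of
# dimensions, but the "how close are two words?" arithmetic is identical.
words = {
    "shoe": np.array([0.9, 0.1, 0.8, 0.0, 0.2]),
    "boot": np.array([0.8, 0.2, 0.9, 0.1, 0.3]),
    "tree": np.array([0.1, 0.9, 0.0, 0.8, 0.7]),
}

def cosine_similarity(a, b):
    """Close to 1.0 means the words point the same way in semantic space."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_similarity(words["shoe"], words["boot"]))  # high, roughly 0.98
print(cosine_similarity(words["shoe"], words["tree"]))  # much lower, roughly 0.19
```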
The string of numbers representing a word (8,3,12,1,14…) is called a vector, and it amounts to a way of representing the meaning of a word. If I asked you what a tiger is, you might say it’s an animal with four legs and fur that has stripes and runs fast and can kill pretty big animals and eat them and so on. That list of characteristics would reflect your understanding of the meaning of the word tiger. And each of these characteristics (including the binary ones) can in principle be represented along a dimension in what I’m calling “semantic space.” In that sense, at least, the vector that an LLM assigns to a word is functionally comparable to structures of information in your brain that represent your understanding of the word’s meaning.
This description of what goes on inside a large language model isn’t nearly complete, but it’s already to a point where it poses a threat to Searle’s argument. The “formal symbol manipulations” in an AI, he wrote, are “quite meaningless… they have only a syntax but no semantics.” Clearly, large language models have, in some sense, semantics.
Now, the presence of “semantics” in LLMs doesn’t by itself mean we can pronounce Searle’s argument dead. There is a reply he could make that seems at first glance to breathe life into it. The reply involves that term “intentionality.” Here is a fuller rendering of the passage from his 1980 paper that I quote in the paragraph above:
AIs can be said to lack understanding, Searle writes, “because the formal symbol manipulations by themselves don't have any intentionality; they are quite meaningless; they aren't even symbol manipulations, since the symbols don't symbolize anything. In the linguistic jargon, they have only a syntax but no semantics. Such intentionality as computers appear to have is solely in the minds of those who program them and those who use them, those who send in the input and those who interpret the output.”
Again, “intentionality” means something like “aboutness.” For the word “apple” to involve intentionality, the word has to be about an actual apple; it has to represent, to refer to, something in the real world—in this case, a particular kind of fruit. And Searle is saying that within an AI—at least, within his Chinese Room version of an AI—no such connection between word and real-world object is made; only in the minds of computer programmers and of people who read the computer’s output does such a connection exist. The computer itself wouldn’t know an apple if it met one on the street.
Now, as we’ve just seen, in a large language model there is a kind of representation of an apple; the word “apple” occupies a position along thousands of semantic dimensions, and together those positions capture the properties of real-world apples. Doesn’t this alone qualify as “intentionality”?
Beats me! Philosophers consider intentionality a pretty slippery concept, and I agree. But let’s assume that Searle has an especially strict conception of intentionality and answers that question with a firm no. He might elaborate on his answer by saying something along these lines:
Where in this large language model is the word “apple” really connected to a real apple in the real world? Granted, there is a multidimensional map of words that is based on meaning, not on superficial patterns in the symbols that constitute the words—a map on which “apple” is closer to “banana” than either word is to “Appalachia” or “bandana.” But that’s just a semantic mapping of words relative to one another. There’s still no mapping of words directly onto the things in the world that they represent—no connection between symbol and symbolized comparable to the connection that exists in the mind of a human who has encountered real apples in the real world. And that’s the kind of connection that thoroughgoing intentionality demands!
This might have seemed like a powerful response a few years ago, at the dawn of the age of LLMs. But some LLM makers—including OpenAI and Google—have since connected their LLMs to image recognition software. So you can point your smartphone camera at an apple and say, “What is this?” and the AI will say “an apple.” And if you ask it what an apple is, it will give you a good answer, probably involving the word “fruit.” And if you ask it what desserts you can make with apples, it will handle that question capably as well. This kind of “multimodal” LLM, unlike the AI in Searle’s thought experiment, and unlike an LLM without a vision mode, would know an apple if it met one on the street.
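For the curious, here is roughly what that looks like in practice. This is a minimal sketch using OpenAI’s Python SDK as it exists at the time of writing; the model name and image URL are placeholders, and other providers’ multimodal APIs differ in the details.

```python
from openai import OpenAI

client = OpenAI()  # assumes an OPENAI_API_KEY is set in the environment

# Ask a vision-capable model to identify the object in a photo.
# "gpt-4o" and the image URL are placeholders; substitute whatever
# multimodal model and picture you actually have access to.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is this, and what desserts could I make with it?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/apple.jpg"}},
        ],
    }],
)
print(response.choices[0].message.content)  # e.g. an answer about apples and apple pie
```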
If a human being looked at an apple and said, “That’s an apple, and it’s an edible fruit, and you can use it to make an apple pie,” Searle would say the human is exhibiting understanding. And, to judge by his 1980 paper, that wouldn’t be because the human has the subjective experience of understanding (a phenomenon, again, that Searle doesn’t mention in that paper). Rather, it would be because the human's brain is performing the functions that the 1980 paper associates with understanding—assigning meaning to strings of symbols and doing so not just by mapping their relative semantic relationships but by mapping them onto things in the real world. And that’s what these multimodal AIs are doing: employing not just “syntax” but “semantics,” complete with “intentionality.”
Is it possible that, even though Searle didn’t make consciousness a prerequisite for understanding in his 1980 paper, he actually thought he was doing that, as his 2010 recollection suggests? Maybe, for example, he thinks of intentionality as entailing consciousness?
Hard to say. On the one hand, he writes in his 1980 paper that “certain brain processes are sufficient for intentionality”—which would seem to mean that we can in principle determine that an information processing system has intentionality just by inspecting its dynamics, without reference to any subjective experience that may or may not accompany them.
What’s more, Searle says in other writings that unconscious mental states can have intentionality. (Or so I gather by poking around the internet—his writings on the subject are pretty technical.) On the other hand, Searle also seems to believe that these unconscious states can be said to have intentionality only if they could “potentially” become part of consciousness.
Whether all this, taken together, suggests that in 1980 Searle did mean to say that consciousness is a prerequisite for understanding is a question you could organize a graduate seminar around. But what I want to emphasize is that, for present purposes, the answer to that question doesn’t matter. Because one premise of the argument I’m making is that if we want to usefully operationalize the question of whether computers are capable of understanding, we should jettison the consciousness question and focus instead on questions like this: Do computers process physical information in broadly the way human brains are processing it when humans are doing what we call understanding?
So even if Searle had in 1980 explicitly linked intentionality to consciousness—and even if every other philosopher did that, and every other philosopher also asserted that true understanding must involve intentionality—my response would still be: Look, if you want to assert that computers can’t have understanding because they’re not conscious, fine—you’ve successfully made your argument impervious to counterargument, since we have no way of knowing whether computers have subjective experience. But if you want to turn your argument into something we can actually argue about, you’re going to have to make it about the processing of physical information within a large language model as it compares to the processing of physical information, within a human brain, that underpins human understanding.
And I’m arguing that such a comparison yields, at a minimum, this parallel: an LLM, like the brain, connects words to meaning. And it does that in two senses:
(1) It depicts the semantic relationship among words, and this involves associating each word with properties of the thing the word represents. (This doesn’t mean the brain uses a system precisely comparable to the multidimensional semantic map employed by LLMs—though there have long been scientists who do posit somewhat comparable structures in the brain.)
(2) If it’s a multimodal LLM, it connects words more directly to the things in the world that they represent. It thus does a kind of information processing that could underpin “intentionality” even in a strict sense of the term.
How many other elements of understanding do LLMs have? Any attempt to answer this question will run into two problems: (1) There’s a lot we don’t know about how the human brain processes information. (2) There’s a lot we don’t know about how LLMs process information.
But researchers who do “interpretability” work on LLMs—which is to say, try to figure out exactly what’s going on inside them—are making progress. (They say, for example, that they’re learning how to reach into an LLM, find the representations of “concepts” and fiddle with them in ways that alter their meaning.) So new elements of understanding may come to light.
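To give a flavor of what “fiddling” with a concept can mean, here is a schematic sketch of one such technique, often called activation steering. Everything in it is invented for illustration; real interpretability work extracts these vectors from an actual model’s internal activations rather than generating them at random.

```python
import numpy as np

# Schematic illustration of "activation steering." All values here are
# invented; in real work the vectors come from a model's internal activations.

hidden_state = np.random.randn(4096)   # a layer's activation for some token

# A "concept direction": for example, the average difference between
# activations on sentences about dogs and on matched sentences that aren't.
dog_direction = np.random.randn(4096)
dog_direction /= np.linalg.norm(dog_direction)

# "Fiddling" with the representation: push the activation toward (or away
# from) the concept, then let the rest of the forward pass continue from here.
steering_strength = 5.0
steered_state = hidden_state + steering_strength * dog_direction
```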
In any event, as progress in AI continues, I suspect that new elements of understanding will evolve—though there may be a lag between the evolution of these features and our discernment of them through interpretability research.
If you’re wondering why the inner workings of LLMs are so mysterious that there’s a whole research program devoted to fathoming them, the answer lies in how these workings took shape in the first place. The LLM’s semantic map, for example, wasn’t programmed into it by human beings. Rather, the machine itself built—and to some extent, I gather, invented—the map in the course of its “training.”
As it happens, the Chinese Room offers a good metaphor for how that happens. At the beginning of the training session, the unformed LLM, like the man in the Chinese Room, is given an incoming string of symbols and told to generate a corresponding outgoing string of symbols. And, like the man, it has no idea what the symbols mean. But, unlike the man, it doesn’t have a reference manual that compensates for this lack of understanding by giving it the correct answer. So, the machine has to develop something like understanding—at least, something that’s closer to an understanding of the relationship between the incoming and outgoing strings of symbols than anything that exists in the man’s brain.
The machine—a “neural network”—does this through a trial-and-error process that involves a kind of continual rewiring of its “brain.” Every time it gets an answer wrong, it changes the strength of the connections among some of its “neurons.” (These modifications aren’t entirely random; they are guided by feedback about how far its last answer was from being correct.) Over time, the machine’s output gets better and better, and the tweaks of its “brain” get subtler and subtler, until finally the LLM is, like the man in the Chinese Room, able to consistently give observers the impression that it “understands” the language.
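Here is a cartoon of that trial-and-error loop. It uses plain gradient descent on a toy linear model rather than an actual neural network, so it is a sketch of the shape of the process (guess, measure the error, nudge the connection strengths), not of an LLM’s real training.

```python
import numpy as np

# A drastically simplified picture of the training loop described above:
# a single layer of "connection strengths" (weights) is nudged after every
# wrong answer, in the direction that would have made the answer less wrong.
rng = np.random.default_rng(0)
inputs = rng.standard_normal((500, 8))   # stand-ins for encoded input symbols
true_weights = rng.standard_normal(8)
targets = inputs @ true_weights          # the "correct answers"

weights = np.zeros(8)                    # the untrained "brain"
learning_rate = 0.01

for step in range(1000):
    predictions = inputs @ weights               # the model's current answers
    errors = predictions - targets               # how far off each answer is
    gradient = inputs.T @ errors / len(inputs)   # which tweaks would help most
    weights -= learning_rate * gradient          # rewire the connections slightly

print(np.max(np.abs(weights - true_weights)))    # small: the "rewiring" converged
```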
But at this point there’s a big difference between what’s going on in the LLM and what’s going on in Searle’s version of the Chinese Room: The LLM’s brain now has structures of information that relate the incoming symbols to meaning; and this map of meaning is used to generate outgoing symbols that make sense.
There’s no such semantic intermediation in Searle’s version of the Chinese Room—no mapping of symbols onto meaning in the man’s brain, and no such mapping in the reference manuals the man is using. All those manuals have is rules for converting one string of inscrutable symbols into another string of inscrutable symbols. Searle seems to have thought this lack of semantic intermediation was an intrinsic property of AI algorithms—perhaps understandably, given the approach to AI that was dominant back when he wrote his classic paper. But that approach has since yielded center stage to the “neural network” approach, which in 1980 was a fringe enterprise championed by such researchers as the then-obscure and now-famous Geoffrey Hinton.
It’s early days. We still don’t know all the things going on inside these large language models. And they definitely haven’t replicated all the elements of human understanding, even if they’ve replicated some of the most basic ones. For example, they are bad at some forms of reasoning that humans consider elementary. In one famous case, an LLM was asked who Tom Cruise’s mother is and correctly answered “Mary Lee Pfeiffer”—but when asked who Mary Lee Pfeiffer’s son is, the LLM was baffled.
But computer scientists are working on these problems, and the more robust their solutions are, the more willing I’d be to bet that the solutions involve structures of information processing that are, at some level of abstraction, comparable to structures of information processing in the human brain.
Meanwhile, these AIs already have the hallmarks of understanding that Searle laid out in 1980: semantics, not just syntax—complete with “intentionality.” If Searle or his intellectual heirs want to raise the bar, and name other things as prerequisites for understanding, they can do that. But the ball is in their court. Searle’s Chinese Room argument, as classically formulated, is dead.
Like all dead things, it has a legacy. And it has a rich one—lots of illuminating discussion and debate, which has led people on all sides to sharpen the articulation of their arguments and clarify their own thinking. Searle’s Chinese Room thought experiment has enhanced our understanding of understanding, even if we still don’t agree on what that means.
Image by Clark McGillis using DALL-E