Dr Jumbly Grindrod, Lecturer in Philosophy
Me: Write a captivating opening sentence of a blog piece about philosophy of language and large language models.
ChatGPT: “Imagine a machine that can understand and produce language as well as a human, if not better: this is the reality of large language models, and with it comes a new frontier in the philosophy of language, raising questions about the nature of meaning, understanding and the relationship between language and thought.”
ChatGPT did a better job than I could of writing the opening sentence for this blog post.
It is results like this that justify the considerable buzz around large language models. These models seem to possess incredible abilities to produce extended text in response to brief prompts – they can write code, limericks, stories, competent essays, and even engage in conversation. (If you haven’t yet tried playing with one, ChatGPT is probably the best place to start.)
I work in the philosophy of language, and the success of these models has made me reflect on whether digging into the way they work might tell us something about the nature of language and linguistic meaning. So how do these models work?
At their heart, these large language models draw on a way of thinking about meaning known as distributional semantics. Distributional semantics treats the meaning of a word as its distribution across a large body of text or set of texts. To capture what “dog” means, for instance, you record which sentences it turns up in and which it doesn’t. Typically, this is done by representing each word’s distribution as an ordered list of numbers. The simplest method is to list, for every word in the vocabulary, how often “dog” appears next to that word; the resulting ordered list then represents the distribution of “dog”. (If you would like to see this approach in action, I have a Google Colab notebook where you can run some code yourself that produces these lists.)

An ordered list of numbers can also be understood as a vector (i.e. an arrow in a space, defined by its magnitude and direction), with each number in the list giving a coordinate along one dimension. Every word then has a vector that occupies some part of a high-dimensional space. It is for this reason that this approach is often referred to as vector space semantics. You can see one of these spaces represented here. A clear benefit of understanding meaning in this way is that the distribution of a word can be automatically computed from very large bodies of text, even when the text comes with absolutely no accompanying information. This makes the approach highly scalable, which partly accounts for its success.
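To make the counting idea concrete, here is a minimal sketch of how such co-occurrence vectors can be built from a toy corpus. It is in the spirit of the Colab notebook but is not its actual code; the corpus, window size, and variable names are purely illustrative.

```python
from collections import Counter

# A toy corpus; real distributional models are built from millions of sentences.
corpus = [
    "the dog chased the cat",
    "the dog barked at the postman",
    "the cat chased the mouse",
]

window = 2  # count words appearing within two positions of the target word

# Build the vocabulary and a co-occurrence count for every word.
vocab = sorted({w for sentence in corpus for w in sentence.split()})
counts = {w: Counter() for w in vocab}

for sentence in corpus:
    words = sentence.split()
    for i, target in enumerate(words):
        for j in range(max(0, i - window), min(len(words), i + window + 1)):
            if j != i:
                counts[target][words[j]] += 1

# The distribution of "dog" is its ordered list of co-occurrence counts,
# one entry per vocabulary word -- i.e. a vector in a space with one
# dimension per word in the vocabulary.
dog_vector = [counts["dog"][w] for w in vocab]
print(vocab)
print(dog_vector)
```

Even at this toy scale you can see the basic idea: the vector for “dog” simply records which other words it tends to keep company with.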
The way each word is represented is thus highly dependent on its relationships with all other words. If we take that vector for “dog” mentioned earlier, the exact vector it ends up with depends on its distributional relations to every other word in the text. For this reason, in philosophy we would describe this as a holistic approach to capturing meaning: the meaning of each word is determined by its relationships with all the others.
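Continuing the toy sketch above (again purely as an illustration), the holism is easy to see in code: adding a single new sentence containing a previously unseen word changes the dimensionality of every word’s vector, and shifts the counts for the words around it, not just the representation of the new word.

```python
# Continuing from the sketch above: add one sentence with a new word, "kennel".
corpus.append("the dog slept in the kennel")

# Rebuilding the vocabulary gives every existing word's vector new dimensions
# (entries for "slept", "in" and "kennel"), and the counts for the words in
# the new sentence -- "the", "dog" -- shift as well.
new_vocab = sorted({w for sentence in corpus for w in sentence.split()})
print(len(vocab), "->", len(new_vocab))
```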
There has been very little discussion of distributional semantics in philosophy. But there has been a great deal of discussion about holism, and it is fair to say that it doesn’t have the best reputation. A number of arguments have been given as to why holistic accounts of meaning don’t work. Chief among them is the so-called instability objection. The idea is that if the meaning of a word depended on its relationships with the meanings of all other words, its meaning would change whenever the meaning of any one of those other words changed. So a change in the meaning of one word would change the meaning of every other word. But this just doesn’t seem to be the case for words in a language. The meaning of one word can change while others remain the same, and words can be introduced or lost without affecting the meaning of others. The fact that we gained the word “Brexit” around 2014 didn’t lead to a wholesale change in meaning across our vocabulary – maybe it affected the meaning of “Europe”, but I haven’t noticed any change in the meaning of “kettle” or “University” or “the”.
In our recent work, Nat Hansen (Reading), J.D. Porter (Stanford) and I have investigated whether this instability objection applies to the distributional view behind large language models. We provide a defence of the distributional view on two fronts. First, we clarify the way that meaning is represented in these large language models. We argue that meaning there is best understood differentially, and that this can help disarm some of the instability objections that have been raised in the philosophical literature. Briefly put, the idea is that word meanings are defined directly in terms of their relations to other words, rather than in terms of the specific vector space that we construct. Second, we created our own language models in order to explore how word vectors change as we expand the corpus that the models are built from. In this way, we can explore just how unstable these models of linguistic meaning actually are. We argue that there are in fact impressive levels of stability in these models, and that the changes you do see can be made good sense of. What instability these models display is really better thought of as sensitivity to subtle shifts in meaning – it is a feature rather than a bug!
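The experiments themselves are set out in the paper, but the basic shape of a stability check can be sketched in a few lines. The functions below are one illustrative way of doing it, not a report of our actual code: build vectors from a smaller and a larger corpus, then ask how many of a word’s nearest neighbours survive the expansion.

```python
import numpy as np

def nearest_neighbours(vectors, word, k=10):
    """Return the k words whose vectors are closest (by cosine similarity) to `word`."""
    target = vectors[word]
    sims = {}
    for other, vec in vectors.items():
        if other == word:
            continue
        denom = np.linalg.norm(target) * np.linalg.norm(vec)
        sims[other] = float(target @ vec / denom) if denom else 0.0
    return sorted(sims, key=sims.get, reverse=True)[:k]

def neighbour_overlap(vectors_small, vectors_large, word, k=10):
    """A rough stability score: the proportion of a word's nearest neighbours
    that remain among its nearest neighbours once the corpus is expanded."""
    before = set(nearest_neighbours(vectors_small, word, k))
    after = set(nearest_neighbours(vectors_large, word, k))
    return len(before & after) / k

# Example use, where vectors_small and vectors_large are dictionaries mapping
# words to numpy arrays built from a smaller and a larger corpus respectively:
# neighbour_overlap(vectors_small, vectors_large, "dog", k=10)
```

A high overlap score for most words would indicate the kind of stability we report; where the score drops, the interesting question is whether the change tracks a genuine shift in how the word is used.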
The holistic nature of these models is no barrier to taking them seriously as capturing meaning in a language. We hope to show that, beyond their astonishing ability to produce fluent text in a huge variety of genres, they can also function as philosophical tools for cracking open classic questions about the nature of meaning in new ways.
Jumbly Grindrod, J.D. Porter, and Nat Hansen’s paper on distributional semantics and holism is currently at a draft stage for a forthcoming volume titled “Communication with AI – Philosophical Perspectives” edited by Rachel Sterken and Herman Cappelen.
Jumbly Grindrod’s paper on “Distributional theories of meaning” is currently in preparation for a forthcoming volume titled “Experimental Philosophy of Language”, edited by David Bordonaba-Plou.