
The problem with overestimating AI

  • Hope Reese

An interview with the UW computational linguist Emily M. Bender, who was quoted in, then rebutted, science journalist Steven Johnson’s big New York Times story on OpenAI.

Emily M. Bender (Corinne Thrash | University of Washington College of Arts & Sciences)

In April, the New York Times Magazine ran a feature story by Steven Johnson, one of the nation’s most prominent and best-selling science writers, provocatively titled “AI is Mastering Language. Should We Trust What It Says?” The piece was a lengthy exploration of OpenAI’s GPT-3, a large language model that takes a small amount of input text and autocompletes it — much like the suggestions we might see in, say, Gmail, but generating far longer strings of text. For instance, if you type a few lines about a historical event, GPT-3 might spew out additional paragraphs that fill in the rest.

Johnson writes that GPT-3 is “approaching a fluency that resembles creations from science fiction like HAL 9000 from ‘2001’” and makes it clear that he is impressed. However, one of his sources for the story, Emily M. Bender, the faculty director of the Professional MS Program in Computational Linguistics at the University of Washington, was not. She believes that the article about GPT-3 is just one more piece of “AI hype” that “fawns over what Silicon Valley tech bros have done.”

Bender, who is an expert in computational linguistics and a critic of how LLMs are used, was so bothered by the story’s depiction of AI that she was moved to write a 5,600-word rebuttal on her Medium blog just days later. That she was so passionate speaks to the divide emerging today between what technologists and founders promise AI will soon be able to do, and what critics and subject matter experts say are its limitations.

In her view, Johnson was not sufficiently critical of OpenAI, an AI research company founded in 2015 by Elon Musk, Sam Altman, Ilya Sutskever, Greg Brockman, Wojciech Zaremba and John Schulman. And his framing of the story, which positions her as a skeptic of the technology, takes it as a given that these kinds of programs do what the developers say they will do — without putting the onus on them to prove it.

I spoke to Bender about why, when it comes to AI, we should be careful not to “imagine all kinds of emergent properties that aren’t there — like, maybe, consciousness,” and other issues.

The conversation has been edited and condensed for clarity.

Why were you so moved to write this really long rebuttal over a piece about AI?

I was interviewed by Steven Johnson over email, and in the back and forth I felt like something was really off in his framing. When I read his piece, those suspicions were confirmed. I was bothered both by the way he seemed to take much of what OpenAI was saying at face value and by the way my stance was framed. He presented me as a “skeptic” but in a way that made it seem like the only question that matters is: “Has OpenAI made progress towards AGI [Artificial General Intelligence] or not?” This keeps the framing rooted in OpenAI’s view of the world.

My usual reaction to things like this is to take to Twitter and write a short or not so short tweet thread with screen caps of the article so I can put an alternate framing out into the world and maybe educate folks on how to spot AI hype when they encounter it. But Johnson’s piece is 10,000 words long and there’s a lot of AI hype there! That would have been a very long tweet thread, and honestly it was the thought of how much work it would take to do it properly, and put in alt text for all of the screen caps et cetera, that made me realize a blog post would be a better format. And once I’d made that decision, I had some space to dig in and spread out and make it more structured.

To write the blog post, I gathered all of the quotes from Johnson’s article that I wanted to take issue with and sorted them into categories based on the kind of problem they illustrate. And then I started writing … it got really long really quickly, and in fact I didn’t even use all of the quotes I’d noted.

Can you try to sum up the main aim of Steven Johnson’s article, as you see it?

OpenAI claims that one of the big threats to humanity is AI systems not aligned with what they see to be human values. And they see their work as addressing that threat by making AGI that they believe can align with human values.

The headline is: “AI is Mastering Language. Should We Trust What It Says?” That’s taking it as a given that it’s mastering language, and asking if it’s trustworthy.

In keeping with OpenAI’s framing, the piece left out a lot of discussion of where the real risks are, and also didn’t take OpenAI’s claims with a sufficiently critical view.

What kinds of questions was he asking, and not asking?

The questions seemed to be around the idea of: Should we teach AI to be ethical? Or, should we teach AI our values? That already situates this technology as a very anthropomorphized entity. That it can be taught things. That it can teach us things. It fails to situate it as: technology is made by people and controlled by people, and fits in a larger economic system. Instead, this is just humans versus AI. The framing was off in two ways. It was treating a large language model not as a sentient entity — he wasn’t going that far — but with metaphors that would be appropriate to a sentient entity. And leaving out all of these other questions about the amassing of data and financial capital.

Why is this kind of framing dangerous?

There are certainly use cases for automation and in particular pattern recognition. But it’s really, really important that the pattern recognition (“machine learning”) be fully evaluated in the actual use context. And that the people using the system have a clear understanding of what it can and can’t do. When companies oversell their “AI” technology, that muddies the waters. It makes it harder for everyone involved — people procuring systems, people regulating them, people who might have to advocate for themselves in the face of these systems — to understand what’s really going on and react appropriately. If we’re constantly told “This is AI!” or “It can understand what you say!” or “It can reason using common sense!”, it makes it harder to effectively evaluate claims of all systems called “AI”. And when journalists don’t hold companies to account, when they instead repeat the companies’ claims uncritically or faux-critically, we are all the worse off for it.

Does this type of article fall into a pattern of reporting on AI that you’ve observed?

Absolutely. There is some great reporting on AI that I cite on my blog, such as work by Khari Johnson, Edward Ongweso Jr. and Karen Hao. But there’s a tendency toward this “Gee whiz! Look at this, should we be scared?” kind of story, which takes as a given the claims by the technologists that need to be looked at with a critical eye. I think this is happening now because the technology of LLMs, what it’s particularly good at, is creating coherent-seeming text. We see that text, and as humans, apply our ability to communicate with other people to it. That includes imagining a mind behind the text. It takes a lot of effort and skepticism to keep focus on the fact that that is not where this text is coming from.

What is this fixation on the ‘gee whiz’ factor in AI actually obscuring?

There are so many things that the public should be aware of in the domain of so-called “AI” or “AI-infused” technology! What data are being collected about ordinary people in the name of training “AI”? What kinds of work are people being made to do, to maintain the illusion of “AI”? In what ways are these “AI” (really, just pattern recognition) systems being used for surveillance, and how is that hurting already over-surveilled populations? What corners are being cut, in life-critical domains like healthcare or education, or government systems interacting with people in vulnerable positions, because it’s supposedly cheaper to do things with “AI”?

Are OpenAI and other organizations wanting to create “ethical AI” operating from a good place? And if journalists are overhyping this, is the impulse behind it, “hey, we should be careful about what this technology is doing,” coming from the same place?

At OpenAI and the new company Anthropic — I believe that the people working there intend to solve a problem that they think is an urgent problem. I don’t think it’s window dressing, in the minds of the people doing it. But there’s an awful lot of resources being poured into what is a hypothetical future-problem — about sentient artificial general intelligence that will be misaligned with humanity. That takes away from the problems we are currently facing with the application of pattern-recognition technology.

The phrase “ethical AI” is interesting. The term “AI” can refer to the technology: things that we set up to function autonomously, and that people frequently want to pretend have the ability to reason. But “AI” can also refer to a research and development field. So the term has this ambiguity — are we trying to create an autonomous agent that is ethical? Or are we doing research in a more ethical fashion? I think those things get conflated and confused in these discussions.

Yes — in your post you refer to “so-called AI.” Especially for a more general audience, how should these terms be used and framed?

There’s a wonderful phrase called “wishful mnemonics,” coined by Drew McDermott. Melanie Mitchell [an AI researcher at the Santa Fe Institute] talks about it too. When we use these cognition terms — believe, understand, know, reason — as names for what the computer program is doing, we are over-inflating, and using terms like that for computer operations makes it harder for us to remember that that’s a machine. Even if it’s useful to us, it’s not doing it how we do it.

There’s an Italian researcher, Stefano Quintarelli, who said we should call AI “SALAMI,” meaning “systematic approaches to learning algorithms and machine inferences.” Then it becomes ridiculous to say, “Will SALAMI have emotions? Can SALAMI acquire a personality similar to humans? Will SALAMI ultimately overcome human limitations and develop a self similar to humans?” AI, as a phrase, immediately evokes things from science fiction: C-3PO from “Star Wars,” Data from “Star Trek,” HAL from “2001: A Space Odyssey.” And these are characters who behave similarly to humans in the stories. But that’s not what happens with what’s called “AI” in the real world. All of a sudden, in the last 10 years, it became fashionable to brand a certain kind of pattern recognition as AI. “This software is infused with AI.” What it is, is big data manipulation. That’s why I’m always putting scare quotes around “AI.”

When I read Johnson’s piece, before reading your rebuttal, I also noticed phrases like “unleashed,” “destroying humanity,” “shiver down my spine” and “propensities.” “Propensities,” especially, signals to me that the machine wants to do something — which is not how it works.

Yes. And there’s also a sense of beholding a naturally occurring phenomenon — this natural thing in the world that we’re learning about. A colony of insects has a propensity to do something. Let’s study it and learn about it! That’s entirely appropriate in entomology. But natural language processing programs are a human artifact. It’s true that we’ve built them at a scale that we can’t directly understand. We have built something with emergent properties. But it’s somehow enticing to imagine all kinds of emergent properties that aren’t there — like, maybe, consciousness. The OpenAI CEO mused on Twitter that today’s neural models “might be slightly conscious.”

You also think that Johnson wasn’t following the money behind organizations like OpenAI. Can you explain?

In terms of the big data economy, I think that the work of Shoshana Zuboff is really interesting. She documents in her book, The Age of Surveillance Capitalism, how the so-called AI — or she calls it machine intelligence, without as much distance as I would like — started off with this notion of, you could collect information about yourself and use it in a closed-loop sort of a way to benefit your own self. Right? So, there’s this notion of the smart home that would help you with information. But we’ve ended up in an economy where it’s not closed-loop.

Every time we use something to collect information about ourselves, that information is shared far and wide. The incentives behind getting to bigger and bigger data sets have to do with what for-profit interests can do with that information. There’s some super frightening reporting out right now about one of these data brokers that’s dealing with location data, selling information about people who have visited abortion providers. Put that together with the Texas abortion bounty law, and that’s really frightening. It’s very easy to find places where big data is being used in ways that people don’t realize and that are just directly overtly harmful.

Then the other part of follow-the-money, when you think about organizations like OpenAI, is: where is the funding for these coming from? There are deep connections between OpenAI and Anthropic and the Effective Altruism movement. I understand there are people within Effective Altruism who are really interested in conversations like: I have more money than I need, I would like to put it out into the world to do good, help me think about that. But there are also people in that movement who have overtly stated that it is better for humanity to save the life of a person in a rich country than the life of a person in a poor country. These are organizations whose stated purpose is to align AI with human values, but being connected to those funding sources, it becomes very clear to me that they don’t mean all of humanity.

You are also critical of being positioned, in the article, as a “skeptic.” Can you explain this?

It felt like a kind of both-sidesing that elevated, effectively, the outlandish claims that, “No, this really is AI,” as the more plausible or at least equally plausible stance to people saying, “No, you haven’t made AI.” So, at best it says, “We really don’t know who’s right, here.” And at worst, it shifts the burden of proof to the people who are not the ones making the outlandish claims. The folks building LLMs say, “Hey, we’ve made a big step towards AI.” And you’ve got some people saying, “No, you haven’t.” But those are the skeptics. And so they have to prove their point.

Also, that framing keeps it to this question of, is this or is this not AI? And that is a very small piece of what’s actually the problem here. Which is, when you take this pattern recognition at scale, it is doing harm in the world right now, both by getting the right answer in some cases and by getting the wrong answer in other cases. And if my voice is, “Oh, I’m skeptical about whether or not this is AI,” that misses the more important message that I have, which is part of this larger conversation with scholars like Timnit Gebru and Joy Buolamwini and Deb Raji and Safiya Noble, and Ruha Benjamin and Abeba Birhane. I’m starting you off with the Black women who are leading this conversation, who are talking about the harms that are being done now. And it’s not a question of, “Has OpenAI built AGI or not?”

Should media organizations shift away from this future-thinking, and be more critical of the present? What do you think about the whole perspective?

From the point of view of a researcher, when we ask questions, the questions we ask have a big influence on what answers we can find. And I was speculating in my blog post that something very similar happens in journalism. When the journalist asks a question, that question shapes the possible answers they can find and therefore the public’s understanding of what’s going on. And I feel like the questions being asked now are frequently very much coming from the viewpoint of the people selling the technology and maybe of the people who would use the technology as first person users of it. What’s missing is more questions taking the point of view of people who are affected by the use of that technology, even if they’re not the users or they’re not users by choice.

Another example is a piece in the New York Times by Ingrid K. Williams (prior to Steven Johnson’s piece) that was talking about mental health apps. Again, very credulous about the claims of the people making the technology. So when someone’s asking from the point of view of the user, what could this do for me? I think they also need to be asking: How do I know if this actually works? And that could be, what can I do as a consumer to be a critical consumer of this? But also, what should I expect of my government in terms of regulation? What’s the current state of that? And I think it’s really under-regulated. I think there’s an open question as to whether we are under-applying existing laws that already apply, versus whether we need new laws.

We often think of this technology in terms of something in our homes, something that will affect us, and how it will affect us. But we don’t always think about who else is affected by it — people who haven’t even elected to use it. Right?

There is some reporting on that. For example, David Sherfinski and Avi Asher-Schapiro’s reporting on companies that are applying speech recognition software to phone calls by incarcerated people. The people on the other end of those calls are not incarcerated themselves, but if they want to talk to their loved ones and maintain those connections, they have no choice but to undergo surveillance of their conversations. Another example is Google Maps making shortcuts through residential neighborhoods, so that traffic jams move from the throughways to the side streets. People in positions like these are called “indirect stakeholders” in value sensitive design.

What do you hope people will take away from this discussion?

I’ve been hearing from journalists who are saying “hey, this is useful!” and I’m really glad of that. I’m also hoping that people who are neither technologists nor journalists will learn from this exchange — to take the reporting with a grain of salt, read it with a critical eye and find the places where the journalists seem to be taking the perspective of the technologists, or the makers of the software.