
This new process can help researchers perfect AI text generation

  • Alexander Gary

Washington researchers have unveiled MAUVE, a new method for comparing AI-generated text with human writing.

Text generation is one of the most visible uses of modern AI, but it’s not the easiest tech to perfect. Text generators, which rely on natural language processing, or NLP, aim to create continuous strings of words, sentences, and paragraphs that look and feel like they were written by flesh-and-blood humans.

There are currently a number of such language generation AIs attempting the feat, but too often, the text they produce doesn’t measure up to natural-looking human writing. Now, a team of researchers at the University of Washington, Stanford, and the Allen Institute for AI may have found a way to begin correcting the issue — with a system called MAUVE.

In December 2021, the researchers received an Outstanding Paper Award at NeurIPS, the annual machine learning conference, for MAUVE, presented in their paper “Measuring the Gap Between Neural Text and Human Text.” MAUVE is a new tool for comparing an AI model’s generated text against the human text it’s attempting to recreate.

MAUVE primarily seeks to identify two different types of errors — when a machine creates text that looks too repetitive or garbled to have been written by a person (in the paper, they call this a “Type I Error”), and when it can’t credibly reproduce the diverse vocabulary used by humans (a “Type II Error”).

Unlike computers, we humans fill our speech with idiosyncrasies: we employ and understand slang, sarcasm, homonyms, and even mistakes in grammar or pronunciation. This isn’t to say that AI can’t be trained to learn our ways of speaking and writing, but that we are typically far more varied and open-ended when we speak than AI can predict. Getting it right, though, is of great interest to tech companies, investors, and consumers.

We find NLP in much of our everyday tech — everything from virtual assistants to chatbots to digitally targeted ads uses language processing to make sense of our communication styles. NLP also powers web search; sites like Google and Bing use predictive text to help users refine their queries in a sea of billions of results. As routine functions come to rely on ever more AI, it’s important that we understand NLP and text generation as a resource, and that we build it to understand us as well as we understand it. This is where MAUVE comes in.

MAUVE measures the mistakes in generated text by capturing both Type I and Type II errors with divergence frontiers, curves built from information-theoretic divergences between the two text distributions. It then summarizes the results in a single scalar that reflects the gap between the two texts, however large or small it is. As the team found, MAUVE was able to locate patterns in the mistakes made by generated text, and its scores correlated more closely with human judgments than other available evaluation metrics.
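The idea can be sketched in a few lines of code. The toy function below (a simplification for illustration, not the authors’ implementation, which works over embeddings of real text) traces a divergence frontier between two small discrete distributions: it mixes them at a range of weights, penalizes each side by its KL divergence from the mixture, and reports the area under the resulting curve as a single score between 0 and 1. The function name, the scaling constant `c`, and the grid size are assumptions chosen for the sketch.

```python
import math

def kl(p, q):
    """KL divergence between two discrete distributions (equal-length lists)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def mauve_sketch(p, q, c=5.0, grid=99):
    """Toy MAUVE-style score: area under the divergence frontier of p and q.

    For mixture weights lam in (0, 1), form r = lam*p + (1-lam)*q and plot
    the point (exp(-c*KL(q||r)), exp(-c*KL(p||r))). One axis shrinks when
    the model's text is implausible under the mixture (Type I), the other
    when human text is (Type II).
    """
    pts = [(0.0, 1.0)]  # endpoint of the frontier
    for i in range(grid, 0, -1):  # lam descending, so x ascends monotonically
        lam = i / (grid + 1)
        r = [lam * pi + (1 - lam) * qi for pi, qi in zip(p, q)]
        pts.append((math.exp(-c * kl(q, r)),    # Type I axis
                    math.exp(-c * kl(p, r))))   # Type II axis
    pts.append((1.0, 0.0))  # other endpoint
    # Trapezoidal area under the frontier: 1.0 means the distributions match.
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(pts, pts[1:]))
```

Identical distributions score exactly 1.0, while a model that over-concentrates on a few tokens — the repetitive failure mode described above — drags the score toward 0.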

The researchers set out to create MAUVE to get as close as possible to “open-ended text generation [that is] coherent, creative, and fluent” — that is, generated text that passes muster in the eyes of average readers.

They are optimistic that they have helped bridge the gap between human and machine capabilities, and that MAUVE can help us wherever we find AI: from distinguishing real human writing from machine-generated text, to assisting people in drafting longform documents and emails, to offering better customer service the next time we’re on an automated call to make a doctor’s appointment.