How Minecraft-playing AI is pushing the frontiers of machine learning
- Adam Bluestein
Researchers have taught an AI to complete complex tasks in the most popular video game of all time. This is why.
The ability of artificial intelligence to play games at the level of human experts has become a popular proxy for gauging overall progress in the field. It’s been 25 years since IBM’s Deep Blue beat the world chess champion, Garry Kasparov, in a best-of-six-game match in 1997. In May 2017, DeepMind’s AlphaGo software defeated the world’s best player of the Chinese strategy game Go, which boasts 300 times as many possible moves as chess. And last month, an international consortium of AI researchers announced significant milestones in a game that is, in many ways, more challenging still: Minecraft.
In a preprint, “Video PreTraining (VPT): Learning to Act by Watching Unlabeled Online Videos,” posted to arXiv.org on June 22 of this year, the authors, who hail from OpenAI and the University of British Columbia, describe how they developed AI agents that learned a range of complex tasks in Minecraft, achieving human-level performance in many of them, all while using a “native human interface,” i.e. a mouse and keyboard. They also claim to be the first group to develop computer agents that can craft “diamond tools” within the game — a difficult task that takes proficient human players 20 minutes or so and requires some 24,000 gameplay actions to accomplish.
But why bother?
“Minecraft has many attributes that make it difficult, significant and interesting as an AI challenge,” says Jeff Clune, an associate professor of computer science at UBC and one of the paper’s nine authors. Games like chess and Go have clear objectives — and progress toward those objectives can be measured. Minecraft, on the other hand, “has no real goal,” Clune says.
It’s hard to specify, in advance, formal rules that cover every possible situation one may encounter. “Like life, the possibilities of what one can do are endless,” Clune says. “It’s up to the player to choose goals, try to achieve them, continuously learn new skills and do interesting new things. It also requires some of the physical common sense that humans have about navigating and accomplishing goals in a 3D space. It is in many ways a simplified microcosm of the natural world.”
Developed by Swedish game designer Markus ‘Notch’ Persson and launched by his company Mojang in 2009, Minecraft is the best-selling video game ever, with over 238 million copies sold and nearly 140 million monthly users as of 2021. (Mojang was acquired by Microsoft in 2014.) Players all over the world are familiar with the blocky, 3D world of the “sandbox” game, which allows them to explore virtually infinite terrain, extract raw materials, “craft” tools, and build structures and “machines.”
While AI systems can be taught a game such as Go by using reinforcement learning — where an algorithm is given a goal and rewarded for progress toward it — so-called “hard exploration problems,” where rewards are sparse and random exploration rarely finds successful solutions, demand a different approach. (Other hard-exploration problems include navigating websites, using software programs and booking flights online.) The Minecraft researchers used a variation on imitation learning, in which AI agents learn optimal “policies” for what to do in given situations by imitating “demonstrations” performed by an expert (often, a human). Imitation learning is simplest when demonstrations are labeled with corresponding actions — and it has been used successfully in aerial vehicles, self-driving cars, board games and video games. There wasn’t much labeled demonstration data available for Minecraft, but there were hundreds of thousands of hours of unlabeled video publicly available on platforms like YouTube and Twitch.
While pretraining foundational AI models on “noisy,” internet-scale datasets has been effective in natural language processing and computer vision, for example, it has proven more difficult to employ this approach when sequential decision making is required—as in game playing, robotics, and computer use. So, the Minecraft researchers developed a hybrid method. First, they labeled a small amount of data the “hard way”— by tracking human players playing the game and recording their keystrokes and mouse movements as they completed game tasks, annotating about 2,000 hours of video.
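To make that two-stage labeling idea concrete, here is a toy sketch in Python. Everything in it is hypothetical and drastically simplified — the real system trains a neural network on video frames, not a nearest-centroid rule over feature vectors — but the shape is the same: fit a labeler on a small hand-labeled set, then use it to pseudo-label a much larger unlabeled corpus.

```python
# Toy sketch of two-stage data labeling (all names and data are hypothetical):
# train a labeler on a small hand-annotated set of frame features -> actions,
# then use it to pseudo-label a much larger unlabeled corpus.
import numpy as np

def train_labeler(frames, actions):
    """Fit a nearest-centroid classifier: one centroid per action."""
    centroids = {}
    for a in set(actions):
        mask = np.array(actions) == a
        centroids[a] = frames[mask].mean(axis=0)
    return centroids

def pseudo_label(centroids, frames):
    """Tag each unlabeled frame with the action of the nearest centroid."""
    return [min(centroids, key=lambda a: np.linalg.norm(f - centroids[a]))
            for f in frames]

# The ~2,000 hours of hand-labeled data, stood in for by four tiny vectors.
labeled_frames = np.array([[0.0, 1.0], [0.1, 0.9], [1.0, 0.0], [0.9, 0.1]])
labeled_actions = ["jump", "mine", "jump", "mine"][:2] + ["mine", "mine"]

labeled_actions = ["jump", "jump", "mine", "mine"]
labeler = train_labeler(labeled_frames, labeled_actions)

# The ~70,000 hours of unlabeled web video, stood in for by two vectors.
unlabeled = np.array([[0.05, 0.95], [0.95, 0.05]])
print(pseudo_label(labeler, unlabeled))  # ['jump', 'mine']
```

Once the large corpus carries these machine-generated action labels, it can be used for ordinary supervised imitation learning, which is the step described next.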
This human-labeled data was used to train an algorithm that could autonomously label a much larger amount of video data — some 70,000 hours of unlabeled gameplay in all. (It was more than 90% accurate at labeling keyboard and mouse commands.) Using an imitation-learning method called behavioral cloning, the researchers used this large set of newly labeled data to pretrain an AI Minecraft “player” that mastered many basic game skills right out of the gate — chopping down trees, making planks, building crafting tables. The AI was also observed swimming, hunting, cooking and “pillar jumping.” With additional fine-tuning, it performed more reliably and advanced to fabricating wood and stone tools, building shelters, exploring villages and raiding treasure chests. Through further fine-tuning with reinforcement learning, it learned to build a diamond pickaxe — an unprecedented step for AI.
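Behavioral cloning itself is just supervised learning on (observation, action) pairs drawn from demonstrations. A minimal, hypothetical sketch — nothing like the paper’s actual neural-network policy, which maps raw pixels to keyboard and mouse commands — might look like this:

```python
# Minimal behavioral-cloning sketch (hypothetical observations and actions):
# the "policy" is a lookup fit by counting (observation, action) pairs from
# demonstrations and keeping the most frequently demonstrated action.
from collections import Counter, defaultdict

def clone_policy(demonstrations):
    """demonstrations: iterable of (observation, action) pairs."""
    counts = defaultdict(Counter)
    for obs, act in demonstrations:
        counts[obs][act] += 1
    # The cloned policy imitates the most common demonstrated action.
    return {obs: c.most_common(1)[0][0] for obs, c in counts.items()}

demos = [
    ("facing_tree", "chop"), ("facing_tree", "chop"), ("facing_tree", "jump"),
    ("at_crafting_table", "craft_planks"), ("at_crafting_table", "craft_planks"),
]
policy = clone_policy(demos)
print(policy["facing_tree"])        # chop
print(policy["at_crafting_table"])  # craft_planks
```

A cloned policy can only be as good as its demonstrations, which is why the researchers followed it with reinforcement-learning fine-tuning for long-horizon goals like the diamond pickaxe.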
The OpenAI researchers believe their approach — using humans to train a data-labeling algorithm — could unlock other large video data sets, too, allowing intelligent agents to learn a range of more general computing tasks by watching videos on the internet. (There are literally millions of hours of online video tutorials about using Photoshop, building websites and even coding for Minecraft, for a start.) “One can use VPT to train virtual assistants that can help you accomplish tasks in the digital world,” says Clune. “Our agent learned to do complex things on a computer — using computer controls in the form of a keyboard and mouse — that take humans 20 minutes or more on average. What can you accomplish in 20 minutes on a computer? Quite a lot!”
Developing AI for Minecraft is a decidedly collaborative effort — Clune requested that I name all of his paper co-authors.* MineRL, a research project started at Carnegie Mellon University, aims to get even more people involved. With organizers and advisors drawn from the likes of Microsoft Research, OpenAI, AIcrowd and DeepMind, MineRL offers a suite of environments within Minecraft for testing AI agents, along with over 60 million frames of recorded human player data. MineRL also organizes an annual competition.
The 2022 BASALT competition — the name is an acronym for Benchmark for Agents that Solve Almost-Lifelike Tasks — challenges developers to produce agents that human judges deem effective at solving a given task, which calls for human feedback: through demonstrations, training on human preferences, or using humans to correct agents’ actions. Specific challenges include finding a cave, making a waterfall, and building an animal pen and a house. Thanks to sponsors FTX Future Fund Regranting Program, Encultured.ai, and Microsoft, there will be $20,000 in cash prizes for the best solutions, plus conditional milestone prizes of $50,000 to $100,000. Entries are being accepted until October.
Whether or not manipulating blocks in the virtual world of Minecraft yields the building blocks for more versatile and efficient machine learning, our experiences with chess and Go suggest that human players shouldn’t expect to enjoy a competitive edge for much longer.
*Bowen Baker, Ilge Akkaya, Peter Zhokhov, Joost Huizinga, Jie Tang, Adrien Ecoffet, Brandon Houghton, Raul Sampedro, Jeff Clune