How deep learning can help build new and improved proteins

  • Hope Reese

Proteins are among the most critical building blocks in the body. They repair and build tissues, trigger metabolic reactions, store nutrients and coordinate many other bodily functions. They also play a vital role in the body’s immune response. Now, artificial intelligence is being used to design new proteins that go beyond what nature provides.

Researchers at the University of Washington School of Medicine and Harvard University recently published an article in Science, “Scaffolding protein functional sites using deep learning,” showing how artificial intelligence can be used to design proteins with “a wide variety of functions.”

Protein engineering has been going on for decades. But a single protein molecule can be composed of thousands of bonded atoms, and that complexity makes these biological structures difficult to replicate.

Scientists can now use machine learning to generate novel protein designs in response to prompts, a major innovation in how proteins can be understood and designed. Machine learning has recently been applied to many different kinds of biological problems, including designing a protein that can turn “on” certain genes and studying viral pathogenesis, that is, understanding what happens to genes and proteins in the presence of SARS-CoV-2, the virus that causes COVID-19. The findings from this new research could be applied to cancer treatments or vaccines.

“The proteins we find in nature are amazing molecules, but designed proteins can do so much more,” said David Baker, a professor of biochemistry at UW Medicine and senior author of the study, in a press release. The research extends what scientists can do with computer-generated proteins by training AI to detect patterns in data, according to lead author Joseph Watson, a postdoctoral scholar at UW Medicine. “Once trained, you can give it a prompt and see if it can generate an elegant solution,” Watson said. “Often the results are compelling, or even beautiful.”

The research team trained neural networks on a public database called the Protein Data Bank, which includes hundreds of thousands of protein structures. Through a process they call “hallucination,” which is comparable to AI image generation, the networks can generate proteins based on a prompt. A second process, called “inpainting,” has been likened to autocomplete: it fills in missing information. The researchers use the analogy of an AI-written book, in which each section needs to make logical sense as the story unfolds. The AI keeps editing until it does. Here the text is a string of amino acids, which is mutated until the sequence appears to fit the prompt. Researchers can then take a closer look at these sequences in the lab.
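For readers who code, the mutate-and-keep idea behind the book analogy can be sketched in a few lines of Python. This is a toy illustration only, not the researchers' software. The function names, the example motif "HEAGHK" and the scoring rule are all assumptions made up for this sketch. In the published work a neural network judges a predicted protein structure. Here, a simple letter-matching score stands in for that judgment.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # one-letter codes for the 20 standard amino acids


def toy_score(sequence, motif, start):
    """Hypothetical stand-in for a neural network's judgment.

    Counts how many letters of `motif` appear in `sequence` beginning at
    position `start`. A real pipeline would score a predicted 3D structure.
    """
    window = sequence[start:start + len(motif)]
    return sum(a == b for a, b in zip(window, motif))


def hallucinate(length, motif, start, steps=20000, seed=0):
    """Begin with a random sequence and mutate one residue at a time,
    keeping any change that does not lower the score (a greedy search)."""
    rng = random.Random(seed)
    sequence = [rng.choice(AMINO_ACIDS) for _ in range(length)]
    best = toy_score(sequence, motif, start)
    for _ in range(steps):
        if best == len(motif):
            break  # the "prompt" is fully satisfied
        position = rng.randrange(length)
        old = sequence[position]
        sequence[position] = rng.choice(AMINO_ACIDS)
        new = toy_score(sequence, motif, start)
        if new >= best:
            best = new  # keep the edit
        else:
            sequence[position] = old  # undo an edit that made things worse
    return "".join(sequence), best


if __name__ == "__main__":
    designed, score = hallucinate(length=30, motif="HEAGHK", start=10)
    print(designed, score)
```

The loop starts from gibberish and accepts only edits that keep the "story" at least as coherent as before, which is the behavior the book analogy describes. Everything that makes the real method powerful lives in the scoring step, which this sketch deliberately replaces with a trivial rule.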

The work is a milestone, but the designed proteins still require thorough testing before they can be used in medical settings. Even so, the authors believe that the methods for studying proteins are only getting better.

“Deep learning transformed protein structure prediction in the past two years,” said Baker, a recipient of the 2021 Breakthrough Prize in Life Sciences, in a release. “We are now in the midst of a similar transformation of protein design.”