Google turns AlphaFold loose on the entire human genome


Ars Technica 22 July, 2021 - 10:00am

AI breakthrough will 'transform' biology

BBC News 22 July, 2021 - 10:24am

Proteins are essential building blocks of living organisms; every cell we have in us is packed with them.

Understanding protein structures is critical for advancing medicine, but until now, only a fraction of these have been worked out.

Researchers used a program to predict 350,000 protein structures belonging to humans and other organisms.

The instructions for making human proteins are contained in our genomes - the DNA contained in the nuclei of human cells.

There are around 20,000 of these proteins expressed by the human genome. Collectively, biologists refer to this full complement as the "proteome".

The AI program used for the work is called AlphaFold. It was able to make a confident prediction of the structural positions for 58% of the amino acids (the constituents of proteins) in the human proteome.

Of this, the positions of 35.7% were predicted with a very high degree of confidence, which is double the number of structures confirmed by experiment.

"We believe it's the most complete and accurate picture of the human proteome to date," said Demis Hassabis, chief executive and co-founder of DeepMind.

"We believe this work represents the most significant contribution AI has made to advancing the state of scientific knowledge to date.

"And I think it's a great illustration and example of the kind of benefits AI can bring to society."

In the prestigious scientific journal Nature, DeepMind researchers detailed how AlphaFold predicted the structures for 350,000 different proteins, including not only the 20,000 in the human proteome, but those of so-called model organisms used in scientific research, such as E. coli, yeast, the fly and the mouse.

The structural layout of different proteins can be worked out using various techniques, including X-ray crystallography, cryogenic electron microscopy (Cryo-EM) and others. But none of these is easy to do: "It takes a huge amount of money and resources to do structures," Prof John McGeehan, a structural biologist at the University of Portsmouth, told BBC News.

Therefore, structures are often determined as part of targeted scientific investigations, but no successful project until now had set out to systematically determine structures for all the proteins made by the body.

In fact, just 17% of the proteome is covered by a structure confirmed experimentally.

Commenting on the predictions from AlphaFold, Prof McGeehan said: "It's just the speed - the fact that it was taking us six months per structure and now it takes a couple of minutes. We couldn't really have predicted that would happen so fast."

Prof Edith Heard, from the European Molecular Biology Laboratory (EMBL), said: "We at EMBL believe this will be transformative for our understanding of how life works. That's because proteins represent the fundamental building blocks from which living organisms are made."

"The applications are limited only by our understanding."

The applications we can envisage now include developing new drugs and treatments for disease, designing future crops that can resist climate change, and creating enzymes that can break down the plastic that pervades the environment.

DeepMind has teamed up with EMBL to make the AlphaFold code and protein structure predictions openly available to the global scientific community.


A.I. Predicts the Shapes of Molecules to Come

The New York Times 22 July, 2021 - 10:01am

DeepMind has given 3-D structure to 350,000 proteins, including every one made by humans, promising a boon for medicine and drug design.

For some years now John McGeehan, a biologist and the director of the Center for Enzyme Innovation in Portsmouth, England, has been searching for a molecule that could break down the 150 million tons of soda bottles and other plastic waste strewn across the globe.

Working with researchers on both sides of the Atlantic, he has found a few good options. But his task is that of the most demanding locksmith: to pinpoint the chemical compounds that on their own will twist and fold into the microscopic shape that can fit perfectly into the molecules of a plastic bottle and split them apart, like a key opening a door.

Determining the exact chemical contents of any given enzyme is a fairly simple challenge these days. But identifying its three-dimensional shape can involve years of biochemical experimentation. So last fall, after reading that an artificial intelligence lab in London called DeepMind had built a system that automatically predicts the shapes of enzymes and other proteins, Dr. McGeehan asked the lab if it could help with his project.

Toward the end of one workweek, he sent DeepMind a list of seven enzymes. The following Monday, the lab returned shapes for all seven. “This moved us a year ahead of where we were, if not two,” Dr. McGeehan said.

Now, any biochemist can speed their work in much the same way. On Thursday, DeepMind released the predicted shapes of more than 350,000 proteins — the microscopic mechanisms that drive the behavior of bacteria, viruses, the human body and all other living things. This new database includes the three-dimensional structures for all proteins expressed by the human genome, as well as those for proteins that appear in 20 other organisms, including the mouse, the fruit fly and the E. coli bacterium.

This vast and detailed biological map — which provides roughly 250,000 shapes that were previously unknown — may accelerate the ability to understand diseases, develop new medicines and repurpose existing drugs. It may also lead to new kinds of biological tools, like an enzyme that efficiently breaks down plastic bottles and converts them into materials that are easily reused and recycled.

“This can take you ahead in time — influence the way you are thinking about problems and help solve them faster,” said Gira Bhabha, an assistant professor in the department of cell biology at New York University. “Whether you study neuroscience or immunology — whatever your field of biology — this can be useful.”

This new knowledge is its own sort of key: If scientists can determine the shape of a protein, they can determine how other molecules will bind to it. This might reveal, say, how bacteria resist antibiotics — and how to counter that resistance. Bacteria resist antibiotics by expressing certain proteins; if scientists were able to identify the shapes of these proteins, they could develop new antibiotics or new medicines that suppress them.

In the past, pinpointing the shape of a protein required months, years or even decades of trial-and-error experiments involving X-rays, microscopes and other tools on the lab bench. But DeepMind can significantly shrink the timeline with its A.I. technology, known as AlphaFold.

When Dr. McGeehan sent DeepMind his list of seven enzymes, he told the lab that he had already identified shapes for two of them, but he did not say which two. This was a way of testing how well the system worked; AlphaFold passed the test, correctly predicting both shapes.

It was even more remarkable, Dr. McGeehan said, that the predictions arrived within days. He later learned that AlphaFold had in fact completed the task in just a few hours.

AlphaFold predicts protein structures using what is called a neural network, a mathematical system that can learn tasks by analyzing vast amounts of data — in this case, thousands of known proteins and their physical shapes — and extrapolating into the unknown.

This is the same technology that identifies the commands you bark into your smartphone, recognizes faces in the photos you post to Facebook and translates one language into another on Google Translate and other services. But many experts believe AlphaFold is one of the technology’s most powerful applications.

“It shows that A.I. can do useful things amid the complexity of the real world,” said Jack Clark, one of the authors of the A.I. Index, an effort to track the progress of artificial intelligence technology across the globe.

As Dr. McGeehan discovered, it can be remarkably accurate. AlphaFold can predict the shape of a protein with an accuracy that rivals physical experiments about 63 percent of the time, according to independent benchmark tests that compare its predictions to known protein structures. Most experts had assumed that a technology this powerful was still years away.

“I thought it would take another 10 years,” said Randy Read, a professor at the University of Cambridge. “This was a complete change.”

But the system’s accuracy does vary, so some of the predictions in DeepMind’s database will be less useful than others. Each prediction in the database comes with a “confidence score” indicating how accurate it is likely to be. DeepMind researchers estimate that the system provides a “good” prediction about 95 percent of the time.
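A short sketch of how such confidence scores might be summarized. It assumes, beyond what the article states, one score per amino acid on a 0-100 scale, with cut-offs of 70 for "confident" and 90 for "very high"; the function name and the toy scores are invented for illustration.

```python
def confidence_summary(scores, confident=70.0, very_high=90.0):
    """Return the fraction of residues at or above each threshold.

    The thresholds are assumptions made for this sketch, not values
    taken from the article.
    """
    n = len(scores)
    if n == 0:
        raise ValueError("no scores given")
    return {
        "confident": sum(s >= confident for s in scores) / n,
        "very_high": sum(s >= very_high for s in scores) / n,
    }


# Hypothetical per-residue scores for a ten-residue protein
toy_scores = [95.2, 91.0, 88.4, 72.5, 65.0, 40.1, 93.3, 78.9, 55.0, 97.6]
summary = confidence_summary(toy_scores)
print(summary)
```

In this toy case seven of the ten residues clear the lower cut-off and four clear the higher one, so a reader might trust the overall fold while treating the remaining regions with caution.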

As a result, the system cannot completely replace physical experiments. It is used alongside work on the lab bench, helping scientists determine which experiments they should run and filling the gaps when experiments are unsuccessful. Using AlphaFold, researchers at the University of Colorado Boulder recently helped identify a protein structure they had struggled to determine for more than a decade.

The developers of DeepMind have opted to freely share its database of protein structures rather than sell access, with the hope of spurring progress across the biological sciences. “We are interested in maximum impact,” said Demis Hassabis, chief executive and co-founder of DeepMind, which is owned by the same parent company as Google but operates more like a research lab than a commercial business.

Some scientists have compared DeepMind’s new database to the Human Genome Project. Completed in 2003, the Human Genome Project provided a map of all human genes. Now, DeepMind has provided a map of the roughly 20,000 proteins expressed by the human genome — another step toward understanding how our bodies work and how we can respond when things go wrong.

The hope is also that the technology will continue to evolve. A lab at the University of Washington has built a similar system called RoseTTAFold, and like DeepMind, it has openly shared the computer code that drives its system. Anyone can use the technology, and anyone can work to improve it.

Even before DeepMind began openly sharing its technology and data, AlphaFold was feeding a wide range of projects. University of Colorado researchers are using the technology to understand how bacteria like E. coli and salmonella develop a resistance to antibiotics, and to develop ways of combating this resistance. At the University of California, San Francisco, researchers have used the tool to improve their understanding of the coronavirus.

The coronavirus wreaks havoc on the body through 26 different proteins. With help from AlphaFold, the researchers have improved their understanding of one key protein and are hoping the technology can help increase their understanding of the other 25.

If this comes too late to have an impact on the current pandemic, it could help in preparing for the next one. “A better understanding of these proteins will help us not only target this virus but other viruses,” said Kliment Verba, one of the researchers in San Francisco.

The possibilities are myriad. After DeepMind gave Dr. McGeehan shapes for seven enzymes that could potentially rid the world of plastic waste, he sent the lab a list of 93 more. “They’re working on these now,” he said.

DeepMind open-sources protein structure dataset generated by AlphaFold 2

VentureBeat 22 July, 2021 - 10:00am

DeepMind and the European Bioinformatics Institute (EMBL-EBI), a life sciences lab based in Hinxton, England, today announced the launch of what they claim is the most complete and accurate database of structures for proteins expressed by the human genome. In a joint press conference hosted by the journal Nature, the two organizations said that the database, the AlphaFold Protein Structure Database, which was created using DeepMind’s AlphaFold 2 system, will be made available to the scientific community in the coming weeks.

The recipes for proteins (large molecules consisting of amino acids that are the fundamental building blocks of tissues, muscles, hair, enzymes, antibodies, and other essential parts of living organisms) are encoded in DNA. It’s these genetic definitions that circumscribe their three-dimensional structures, which in turn determine their capabilities. But protein “folding,” as it’s called, is notoriously difficult to figure out from a corresponding genetic sequence alone. DNA contains only information about chains of amino acid residues and not those chains’ final form.

Above: A tuberculosis protein structure predicted by AlphaFold 2.

In December 2018, DeepMind attempted to tackle the challenge of protein folding with AlphaFold, the product of two years of work. Its successor, AlphaFold 2, announced in December 2020, improved on this to outgun competing protein-folding-predicting methods. In the results from the 14th Critical Assessment of Structure Prediction (CASP), AlphaFold 2 had average errors comparable to the width of an atom (or 0.1 of a nanometer), competitive with the results from experimental methods.

“The AlphaFold database shows the potential for AI to profoundly accelerate scientific progress. Not only has DeepMind’s machine learning system greatly expanded our accumulated knowledge of protein structures and the human proteome overnight, its deep insights into the building blocks of life hold extraordinary promise for the future of scientific discovery,” Alphabet and Google CEO Sundar Pichai said in a press release.

AlphaFold 2 draws inspiration from the fields of biology, physics, and machine learning, taking advantage of the fact that a folded protein can be thought of as a “spatial graph” where amino acid residues (amino acids contained within a peptide or protein) are nodes, and edges connect the residues in close proximity. AlphaFold 2 leverages an AI algorithm that attempts to interpret the structure of this graph while reasoning over the implicit graph it’s building, using evolutionarily related sequences, multiple sequence alignment, and a representation of amino acid residue pairs.
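The "spatial graph" idea can be illustrated with a minimal sketch. This is not AlphaFold 2's code: it only shows residues as nodes with edges between those in close proximity. The function name, the 8-angstrom cutoff, and the toy coordinates are assumptions chosen for illustration.

```python
import math
from itertools import combinations


def contact_graph(coords, cutoff=8.0):
    """Build adjacency sets linking residues that lie within `cutoff`.

    `coords` holds one (x, y, z) point per residue, in angstroms.
    The cutoff value is an assumption made for this sketch.
    """
    graph = {i: set() for i in range(len(coords))}
    for i, j in combinations(range(len(coords)), 2):
        if math.dist(coords[i], coords[j]) <= cutoff:
            graph[i].add(j)
            graph[j].add(i)
    return graph


# Toy chain: four residues spaced 3.8 angstroms apart along a line
toy_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
graph = contact_graph(toy_coords)
print(graph)
```

In the toy chain the two end residues are 11.4 angstroms apart, so no edge joins them, while every other pair falls within the cutoff. In a folded protein, residues far apart in the sequence can still end up as neighbors in such a graph.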

In an open source codebase published last week, DeepMind significantly streamlined AlphaFold 2. Whereas the closed-source system took days of computing time to generate structures, the open source version is about 16 times faster and can produce structures in minutes to hours, depending on the protein size.

These improvements enabled DeepMind and the EMBL to create more than 350,000 protein structure predictions including the human proteome (which spans 20,000 proteins), more than doubling the number of high-accuracy structures available to researchers. Beyond this, DeepMind and EMBL used AlphaFold 2 to predict the structures of 20 other “biologically significant organisms,” yielding over 350,000 structures in total for E. coli, fruit flies, mice, zebrafish, yeast, malaria parasites, tuberculosis bacteria, and more. The plan is to expand coverage to over 100 million structures as improvements to both AlphaFold 2 and the database come online.

Above: AlphaFold 2’s prediction of a malaria parasite protein.

“This will be one of the most important datasets since the mapping of the Human Genome,” EMBL deputy director general Ewan Birney said in a statement. “Making AlphaFold 2 predictions accessible to the international scientific community opens up so many new research avenues, from neglected diseases to new enzymes for biotechnology and everything in between. This is a great new scientific tool, which complements existing technologies, and will allow us to push the boundaries of our understanding of the world.”

Some scientists caution that AlphaFold 2 isn’t likely the end-all be-all when it comes to protein structure prediction. Steven Finkbeiner, professor of neurology at the University of California, San Francisco, told Wired in an interview that it’s too soon to tell the implications for drug discovery, given the wide variation in structures within the human body. But DeepMind makes the case that AlphaFold 2, if further refined, could be applied to previously intractable problems, including those related to epidemiological efforts. Last year, the company predicted several protein structures of SARS-CoV-2, including ORF3a, whose makeup was formerly a mystery.

Above: A yeast protein, once again predicted by AlphaFold 2.

DeepMind says it’s committed to making AlphaFold 2 available “at scale” and collaborating with partners to explore new frontiers, like how multiple proteins form complexes and interact with DNA, RNA, and small molecules. Earlier this year, the company announced a partnership with the Geneva-based Drugs for Neglected Diseases Initiative, a nonprofit pharmaceutical organization that hopes to use AlphaFold to identify compounds to treat conditions for which medications remain elusive. The Centre for Enzyme Innovation is using the system to help engineer faster enzymes for recycling polluting single-use plastics. And teams at the University of Colorado Boulder and the University of California, San Francisco are studying antibiotic resistance and SARS-CoV-2 biology with AlphaFold 2.

“Proteins are like tiny exquisite biological machines. The same way that the structure of a machine tells you what it does, so the structure of a protein helps us understand its function,” DeepMind CEO Demis Hassabis wrote in a blog post published today. “At DeepMind, our thesis has always been that artificial intelligence can dramatically accelerate breakthroughs in many fields of science, and in turn advance humanity. We built AlphaFold and the AlphaFold Protein Structure Database to support and elevate the efforts of scientists around the world in the important work they do. We believe AI has the potential to revolutionise how science is done in the 21st century, and we eagerly await the discoveries that AlphaFold might help the scientific community to unlock next.”
