Google’s protein-folding AI AlphaFold has nearly cracked them all

The word ‘breakthrough’ is overused in scientific research, but occasionally something is a breakthrough in the true sense of the word. Examples include CRISPR-Cas9, the mapping of the human genome, the first image of a black hole and now, over the last few years, DeepMind’s AlphaFold protein structure database.

For molecules built from only about 20 different amino acids, proteins are incredibly complex. Each fold, twist and position can change how the protein works, so understanding these complicated 3D structures can tell us a lot about what the protein does.

But uncovering these multiple layers of structure is difficult and painstaking work. In the 1960s, two Nobel prizes were awarded for determining the structures of proteins, and in the early 2020s only 17% of the protein structures in the human body had been identified in a lab.

But DeepMind, an AI company owned by Google’s parent company Alphabet, has absolutely changed the game with AlphaFold.

“Determining the 3D structure of a protein used to take many months or years, it now takes seconds. AlphaFold has already accelerated and enabled massive discoveries, including cracking the structure of the nuclear pore complex,” says Eric Topol, Founder and Director of the Scripps Research Translational Institute.

“And with this new addition of structures illuminating nearly the entire protein universe, we can expect more biological mysteries to be solved each day.”

Over the last two years, AlphaFold has been dropping larger and larger numbers of predicted protein structures into its database. In 2021 it was the entire human proteome, followed by hundreds of thousands of new structures, including proteins linked to a large number of human diseases.

Now the researchers have announced the release of 200 million predicted protein structures, covering nearly all catalogued proteins known to science. These include proteins from animals, plants, bacteria and fungi.

“As someone who’s been in genomics and computational biology since the 1990s, I’ve seen many of these moments where you can sense the landscape shifting under you and the provision of new resources, and this has been one of the fastest,” Ewan Birney – director of the European Molecular Biology Laboratory’s European Bioinformatics Institute (EMBL-EBI) – told New Scientist.  

“I mean, two years ago, we just simply did not realise that this was feasible.”

He calls the database “a gift to humanity”.

Knowing the structure of a protein tells researchers more about how it interacts with other molecules and can help determine its function. This means creating more specific drug targets, discovering new drugs and understanding how specific proteins work.

Of course, as exciting as this is, AlphaFold is still guessing what the structures look like. EMBL-EBI suggests that 35% of the structures are as good as experimentally determined structures, and another 45% are good enough to use for many applications in genomics. The vast majority of these structures have not been verified in a lab.

But the sheer scale of the database and how easy it is to access outweigh these issues.

“What took us months and years to do, AlphaFold was able to do in a weekend,” says structural biologist Professor John McGeehan, from the University of Portsmouth.

We’re excited to see what researchers can do with this new treasure trove of data.
