Protein folding is the process by which a chain of amino acids adopts a complex, three-dimensional protein structure. This process is fundamental to biology and plays a crucial role in understanding and developing treatments for diseases such as cancer, Alzheimer’s, and others. However, for decades scientists have struggled to predict how a protein will fold from its amino acid sequence alone. The protein folding problem has been considered one of the greatest challenges in biology and biochemistry for over 50 years. Recently, an artificial intelligence system reached a major milestone on this problem and showed how AI can accelerate scientific discovery.
For more than half a century, the scientific community has been struggling with one of the most complicated puzzles in biology: how proteins fold. The protein folding problem involves predicting the 3D shape, or conformation, into which a protein will fold, given only its amino acid sequence. The correct 3D structure of a protein is essential to its biological function. An algorithm that could predict such structures in a fraction of the time required by experimental methods would be a breakthrough in the field, and this is where artificial intelligence, particularly deep learning, comes in.
Recently, researchers at DeepMind published a paper in the journal Nature describing how their artificial intelligence system, AlphaFold, produced the most accurate prediction for 25 of the 43 free-modelling targets in the 13th Critical Assessment of protein Structure Prediction (CASP13), a blind prediction experiment. That is 25 of 43, or roughly 58 percent, and it placed AlphaFold well ahead of the other entrants. AlphaFold’s accuracy has been described as a milestone in the field of structural biology, and scientists are hopeful that it could significantly accelerate drug discovery and development in various areas of medicine.
AlphaFold uses deep convolutional neural networks, substantial computing power, and a vast amount of available data to predict protein structures. The first step in developing the AlphaFold model was to train it on existing protein structures. The team collected experimentally determined structures from the Protein Data Bank, paired them with the sequences from which they were derived, and trained the networks to predict structural properties of a protein from its sequence of amino acids. In the version entered in CASP13, a network was trained to predict the distances between pairs of amino acid residues, together with the angles of the chemical bonds connecting them. Secondary structure, meaning the local folding of a protein region into, for example, an alpha-helix or a beta-sheet, is reflected in these local angles.
These predictions were then used to build the tertiary structure, the overall three-dimensional arrangement that brings together the different secondary structure elements. The predicted distances and angles were combined into a score describing how plausible a candidate structure is, and the structure was adjusted by gradient descent to optimize that score. The researchers trained the model using a deep learning approach, which entailed feeding the networks vast amounts of data so they could adapt and “learn” to predict accurate protein folding from a protein sequence.
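As a rough illustration of the training setup described above, the sketch below pairs a sequence with a known structure and turns them into a model input and a training target. It is not AlphaFold's code: the function names, the four-residue peptide, its coordinates, and the 8 angstrom contact cutoff (a commonly used convention) are all assumptions made for the example.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes


def one_hot(sequence):
    """Encode each residue as a 20-element indicator vector."""
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    return [
        [1 if index[res] == i else 0 for i in range(len(AMINO_ACIDS))]
        for res in sequence
    ]


def distance_matrix(coords):
    """Pairwise Euclidean distances between residue positions, in angstroms."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def contact_map(distances, threshold=8.0):
    """Mark residue pairs closer than the (assumed) threshold as contacts."""
    return [[1 if d < threshold else 0 for d in row] for row in distances]


# Toy data: a made-up 4-residue peptide with invented coordinates.
sequence = "MKVL"
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (7.6, 3.8, 0.0)]

features = one_hot(sequence)        # what a model would take as input
targets = distance_matrix(coords)   # what a model would learn to predict
contacts = contact_map(targets)     # a coarser, binary version of the target
```

A real pipeline would read coordinates from experimentally determined structure files and use far richer input features, but the shape of the problem is the same: sequence-derived features go in, and geometric targets come out.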
The beauty of the AlphaFold system is its ability to interpret and combine multiple data types to arrive at a prediction. These include multiple sequence alignments and evolutionary profiles, which record how the amino acids at each position vary, and vary together, across related proteins over evolutionary history. Positions that change in step with one another are often in contact in the folded structure. These features are fed into the neural networks, which produce a predicted protein structure far faster than experimental methods can. This ability is the reason why the scientific community has been so interested in the project.
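To show what kind of signal a multiple sequence alignment carries, here is a minimal sketch that computes a per-position amino acid profile and the mutual information between two alignment columns, a simple classical measure of how strongly two positions vary together. The alignment is invented, the function names are hypothetical, and AlphaFold's actual input features are considerably more elaborate than this.

```python
import math
from collections import Counter


def column(msa, i):
    """Return the residues found at position i across all aligned sequences."""
    return [seq[i] for seq in msa]


def column_frequencies(msa, i):
    """Fraction of sequences carrying each amino acid at position i."""
    counts = Counter(column(msa, i))
    return {aa: n / len(msa) for aa, n in counts.items()}


def mutual_information(msa, i, j):
    """Mutual information (in bits) between positions i and j.

    High values mean the two positions tend to change together, which is
    one hint that they may be close in the folded structure.
    """
    n = len(msa)
    fi = Counter(column(msa, i))
    fj = Counter(column(msa, j))
    fij = Counter(zip(column(msa, i), column(msa, j)))
    mi = 0.0
    for (a, b), count in fij.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((fi[a] / n) * (fj[b] / n)))
    return mi


# Invented alignment: position 0 is conserved, positions 2 and 3 co-vary
# (A pairs with L, D pairs with I), positions 1 and 2 vary independently.
msa = [
    "MKAL",
    "MRAL",
    "MKDI",
    "MRDI",
]

profile_0 = column_frequencies(msa, 0)
mi_covarying = mutual_information(msa, 2, 3)
mi_independent = mutual_information(msa, 1, 2)
```

In this toy alignment the co-varying pair scores higher than the independent pair, which is the pattern a structure predictor can exploit when it looks for residues likely to be in contact.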
The results of the AlphaFold system could affect various areas of drug discovery and development, which rely on understanding the 3D structure of proteins. With this knowledge, researchers can better predict how a protein will interact with potential drug compounds. The experimental techniques currently used in this field are time-consuming and costly, often taking a year or more to determine the 3D structure of a single protein. AlphaFold can complete the entire prediction process in a matter of days, and with high confidence, which implies a reduction of time, effort, and cost in various fields.
In conclusion, AlphaFold’s breakthrough represents a major milestone in the field of structural biology, and we can’t wait to see it developed further and applied to some of the most complicated challenges in medicine today. Although challenges remain, such as how to apply AlphaFold to previously unknown proteins, it is clear that artificial intelligence offers a promising route to structure discovery that could transform modern medicine.