
Recent work at the University of Basel has demonstrated that while state-of-the-art AI programs could support the development of new drugs, they frequently fail when it comes to new proteins that could be of interest for innovative drugs.
In a publication in the journal Nature Communications, the team show that many of these AI models simply memorize patterns rather than understand the physical relationships at play.
Elucidating protein structures was long a complex endeavor, until machine learning found its way into research. Early models offered simple approaches to calculating how chains of amino acids fold into three-dimensional structures, earning their developers the 2024 Nobel Prize in Chemistry.
More recent iterations of these models go further, offering insights into how target proteins will interact with other molecules. "This possibility of predicting the structure of proteins together with a ligand is invaluable for drug development," said Professor Markus Lill from the University of Basel.
However, Lill and his team were puzzled by the high success rates claimed for these models, especially given that only around 100,000 elucidated protein structures paired with their ligands are available to train them. "We wanted to find out whether these AI models really learn the basics of physical chemistry using the training data and apply them correctly," said Lill.
During their testing, the team modified the amino acid sequences of hundreds of sample proteins so that their ligand binding sites either exhibited entirely different charge distributions or were blocked entirely. Despite these changes, the AI models predicted the same complex structure, as if binding were still possible. The team pursued a similar approach with ligands, finding that the AI models behaved in the same way.
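The idea behind this kind of perturbation test can be illustrated with a toy sketch. It is not the Basel team's pipeline: the sequence, the binding-site positions, the simplified charge table, and the helper names `site_charge`, `flip_site_charges`, and `pose_rmsd` are all assumptions made for illustration. The sketch inverts the charged residues at a hypothetical binding site and provides a simple measure for comparing two predicted ligand poses.

```python
import math

# Simplified side-chain charges at neutral pH (histidine ignored): an assumption.
CHARGE = {"D": -1, "E": -1, "K": 1, "R": 1}
# Hypothetical swap table that inverts the charge of each charged residue.
FLIP = {"D": "K", "E": "R", "K": "D", "R": "E"}


def site_charge(sequence, site):
    """Sum the simplified charges of the residues at the given 0-based positions."""
    return sum(CHARGE.get(sequence[i], 0) for i in site)


def flip_site_charges(sequence, site):
    """Return a copy of the sequence with charged binding-site residues inverted."""
    residues = list(sequence)
    for i in site:
        residues[i] = FLIP.get(residues[i], residues[i])
    return "".join(residues)


def pose_rmsd(pose_a, pose_b):
    """Root-mean-square deviation between two equally ordered lists of (x, y, z) atoms."""
    if len(pose_a) != len(pose_b) or not pose_a:
        raise ValueError("poses must be non-empty and have the same number of atoms")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(pose_a, pose_b))
    return math.sqrt(squared / len(pose_a))


wild_type = "MKTDELARGS"      # toy sequence, not a real protein
binding_site = [3, 4, 7]      # toy positions holding D, E and R
mutant = flip_site_charges(wild_type, binding_site)
```

In a real test, the wild-type and mutant sequences would each be passed to a structure prediction model, and the two predicted ligand poses compared with something like `pose_rmsd`. A deviation close to zero despite the inverted site charge would hint that the model is reproducing a memorized pose rather than responding to the changed chemistry.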
The team also found that the models were particularly challenged when proteins showed no similarity to those in the training data sets. "When they see something completely new, they quickly fall short, but that is precisely where the key to new drugs lies," said Lill.
The team therefore advise viewing AI models with caution when they are used for drug development, pointing out that experiments or computer-aided analysis will still be critical to validate any AI model predictions.
"The better solution would be to integrate the physicochemical laws into future AI models," Lill concluded.