MIT Researchers Improve AI Reaction Predictions Using Decades-Old Method

The FlowER (Flow matching for Electron Redistribution) system allows a researcher to explicitly keep track of all the electrons in a reaction to ensure that none are spuriously added or deleted in the process of predicting the outcome of a chemical reaction. Credit: Massachusetts Institute of Technology

MIT researchers have developed a new approach that builds physical constraints into LLMs for more accurate reaction predictions. Despite growing interest in using LLMs to predict the outcome of chemical reactions, previous attempts have had limited success because the models lack grounding in fundamental physical principles such as the law of conservation of mass.

Described in the journal Nature, the work helps address a serious concern with popular models such as ChatGPT, which currently cannot be constrained to output only physically realistic results. Many popular LLMs use "tokens" to represent individual atoms.

"If you don't conserve the tokens, the LLM model starts to make new atoms, or deletes atoms in the reaction," said recent postdoc Joonyoung Joung. Instead of being grounded in scientific understanding, "this is kind of like alchemy," he added.

To offer a solution, the collaborative team of chemical engineers, electrical engineers, computer scientists, and physicists turned to a method developed in the 1970s in which a bond-electron matrix is used to represent the electrons in a reaction. This bond-electron matrix serves as the basis of their new FlowER (Flow matching for Electron Redistribution) program, allowing them to explicitly track electrons during a reaction.
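To make the bookkeeping idea concrete, here is a minimal sketch of a bond-electron matrix for a toy reaction (H2 + Cl2 → 2 HCl). It is not FlowER's code. The atom ordering, the example reaction, and the function names are assumptions chosen for illustration. The sketch follows the usual convention for such matrices: each diagonal entry holds an atom's non-bonding valence electrons, and each off-diagonal entry holds the bond order between two atoms.

```python
# Illustrative sketch only: a toy bond-electron (BE) matrix, not FlowER's code.
# Hypothetical atom order: H_a, H_b, Cl_a, Cl_b.
# Diagonal [i][i]     = non-bonding valence electrons on atom i.
# Off-diagonal [i][j] = bond order between atoms i and j (symmetric).

def total_electrons(be):
    """Sum every entry. Each bond order appears twice (at [i][j] and [j][i]),
    which accounts for the two electrons shared per unit of bond order."""
    return sum(sum(row) for row in be)

def reaction_matrix(reactants, products):
    """Entry-wise difference (products minus reactants): the electron
    redistribution that turns one BE matrix into the other."""
    return [
        [p - r for r, p in zip(r_row, p_row)]
        for r_row, p_row in zip(reactants, products)
    ]

def conserves_electrons(reactants, products):
    """True when no electrons are created or destroyed overall."""
    return total_electrons(reaction_matrix(reactants, products)) == 0

# Reactants: H-H and Cl-Cl, with 6 non-bonding electrons on each chlorine.
REACTANTS = [
    [0, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 6, 1],
    [0, 0, 1, 6],
]

# Products: two H-Cl molecules (H_a-Cl_a and H_b-Cl_b).
PRODUCTS = [
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [1, 0, 6, 0],
    [0, 1, 0, 6],
]

# A deliberately wrong "prediction": two extra electrons appear on Cl_a.
BAD_PRODUCTS = [row[:] for row in PRODUCTS]
BAD_PRODUCTS[2][2] = 8

if __name__ == "__main__":
    print(total_electrons(REACTANTS))                    # 16
    print(conserves_electrons(REACTANTS, PRODUCTS))      # True
    print(conserves_electrons(REACTANTS, BAD_PRODUCTS))  # False
```

Because every electron has a slot in the matrix, a prediction that invents or loses electrons shows up as a nonzero total in the reaction matrix, which is the kind of error the researchers describe wanting to rule out.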

For now, the program is still at an early stage.

"The system as it stands is a demonstration, a proof of concept that this generative approach of flow matching is very well suited to the task of chemical reaction prediction," added senior author Connor Coley. "We're aware that it does have specific limitations as far as the breadth of different chemistries that it's seen."

Despite this, in testing the FlowER model matched or outperformed existing approaches used to find standard mechanistic pathways.

"Using the architecture choices that we've made, we get this massive increase in validity and conservation, and we get a matching or a little bit better accuracy in terms of performance," said Coley. "What's unique about our approach is that while we are using these textbook understandings of mechanisms to generate this dataset, we're anchoring the reactants and products of the overall reaction in experimentally validated data from the patent literature."

"We are quite interested in expanding the model's understanding of metals and catalytic cycles. We've just scratched the surface in this first paper," Coley added when speaking about the next steps for the project.
