Improving chemical reaction yield prediction using pre-trained graph neural networks

Journal
J. Cheminformatics (Journal of Cheminformatics)
Date
2024.03.01
Abstract

Graph neural networks (GNNs) have proven effective for predicting the yields of chemical reactions. However, their performance tends to deteriorate when the training dataset is insufficient in quantity or diversity. One promising way to alleviate this issue is to pre-train a GNN on a large-scale molecular database. In this study, we present a novel pre-training method for GNNs with the objective of building an improved model for chemical reaction yield prediction. Given a molecular database consisting of a large number of molecules, we calculate molecular descriptors for each molecule and reduce their dimensionality by applying principal component analysis (PCA). We define a pretext task by assigning each molecule in the database a pseudo-label: its vector of principal component scores. A GNN is then pre-trained to predict the pseudo-label of the input molecule. For chemical reaction yield prediction, a prediction model is initialized with the pre-trained GNN and then fine-tuned on a training dataset of chemical reactions and their yields. Through evaluation on benchmark datasets, we demonstrate the effectiveness of the proposed method in improving chemical reaction yield prediction.
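The pseudo-labelling step described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the random matrix stands in for descriptors that would be computed with a cheminformatics toolkit, and standardizing descriptors before PCA, computing PCA through an SVD, the number of components, and the mean-squared-error pretext loss are all assumptions made for illustration.

```python
import numpy as np


def pca_pseudo_labels(descriptors: np.ndarray, n_components: int) -> np.ndarray:
    """Map an (n_molecules, n_descriptors) matrix to principal component scores.

    Assumptions (not taken from the paper): each descriptor is standardized to
    zero mean and unit variance before PCA, and PCA is computed with an SVD.
    """
    X = np.asarray(descriptors, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # constant descriptors carry no information; avoid division by zero
    Z = (X - mean) / std
    # Rows of vt are the principal axes, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(Z, full_matrices=False)
    components = vt[:n_components]
    # Project each molecule onto the leading axes: these scores form its pseudo-label.
    return Z @ components.T


def pretext_loss(predicted: np.ndarray, pseudo_labels: np.ndarray) -> float:
    """Mean squared error between GNN outputs and pseudo-labels (assumed loss)."""
    return float(np.mean((predicted - pseudo_labels) ** 2))


# Usage with a synthetic stand-in for a real descriptor matrix.
rng = np.random.default_rng(0)
descriptors = rng.normal(size=(1000, 50))
labels = pca_pseudo_labels(descriptors, n_components=8)
print(labels.shape)  # → (1000, 8)
```

Under these assumptions, a GNN with an `n_components`-dimensional output would be pre-trained to minimize a loss such as `pretext_loss` against these targets, and its learned weights would then serve as the starting point for the yield-prediction model. Using PCA scores rather than raw descriptors gives a compact, decorrelated target vector.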

Reference
J Cheminform 16, 25 (2024)
DOI
http://dx.doi.org/10.1186/s13321-024-00818-z