Predictive Modeling of NMR Chemical Shifts without Atom-level Annotations
Journal
J. Chem. Inf. Model. (Journal of Chemical Information and Modeling)
Date
2020.07.21
Abstract
Recently, machine learning has been successfully applied to NMR chemical shift prediction. To build a prediction model, existing methods require a training dataset of molecules whose NMR-active atoms are annotated with their chemical shifts. However, this laborious atom-level annotation must be carried out manually by chemists, which makes it difficult to scale up in practice. To address this issue, we propose a weakly supervised learning method that enables predictive modeling of NMR chemical shifts without explicit atom-level annotations in the training dataset.
In the proposed method, the training dataset consists of molecules labeled with chemical shifts at the molecule level. As the prediction model, we build a message passing neural network that predicts the chemical shifts of individual NMR-active atoms in a molecule. By introducing a loss function that is invariant to the permutation of atoms in a molecule, the model is trained in a weakly supervised manner to minimize the molecule-level difference between the set of predicted chemical shifts and the corresponding set of actual chemical shifts across the training dataset. Accordingly, during training, the chemical shifts predicted by the model are approximately aligned with the actual chemical shifts in a data-driven fashion. We demonstrate that the proposed method achieves performance comparable to existing fully supervised methods in predicting chemical shifts of 1H and 13C NMR spectra for small molecules.
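The abstract does not spell out the form of the loss function. As a minimal sketch of what a permutation-invariant, molecule-level loss can look like, the Python below compares two equal-sized sets of scalar chemical shifts after sorting each one. For scalar values with an absolute-difference cost, the sorted pairing is a minimum-cost one-to-one matching, so the result does not depend on the order in which atoms are listed. The function names and the choice of sorted matching are assumptions made for illustration, not the authors' implementation, and a trainable version would need a differentiable framework.

```python
from itertools import permutations


def set_mae(predicted, actual):
    """Permutation-invariant mean absolute error between two sets of shifts.

    Hypothetical helper: sorts both lists and compares them position by
    position, so the atom order of either input does not affect the result.
    """
    if len(predicted) != len(actual):
        raise ValueError("both sets must contain the same number of shifts")
    if not predicted:
        return 0.0
    pairs = zip(sorted(predicted), sorted(actual))
    return sum(abs(p - a) for p, a in pairs) / len(predicted)


def brute_force_set_mae(predicted, actual):
    """Reference check: try every pairing and keep the smallest error.

    Only practical for a handful of atoms; included to confirm that the
    sorted pairing in set_mae gives the minimum over all permutations.
    """
    if len(predicted) != len(actual):
        raise ValueError("both sets must contain the same number of shifts")
    if not predicted:
        return 0.0
    n = len(predicted)
    return min(
        sum(abs(p - a) for p, a in zip(perm, actual)) / n
        for perm in permutations(predicted)
    )
```

With this sketch, calling set_mae on per-atom predictions in graph order and on the unordered list of shifts recorded for the molecule gives the same value no matter how either list is shuffled, which is the property that lets molecule-level labels stand in for atom-level ones.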