Compressed graph representation for scalable molecular graph generation
Journal of Cheminformatics
In recent years, deep learning has been successfully applied to molecular graph generation tasks. A major challenge is that the computational complexity grows with the number of nodes in a graph, which hinders application to large molecules with many heavy atoms. To alleviate this complexity, we present a method for molecular graph compression. We identify six substructural patterns that commonly occur between two atoms in real-world molecules. In a molecular graph, each occurrence of these substructures is converted to an edge, with the substructure treated as an additional edge feature alongside the bond types. This significantly reduces the number of nodes without any loss of information. With the compressed graph representation, a generative model for large molecules can be built in a more efficient and scalable manner. We demonstrate the effectiveness of the proposed method on the GuacaMol benchmark, which comprises molecules with up to 88 heavy atoms.
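The idea of converting a substructure between two atoms into an edge feature can be illustrated with a minimal Python sketch. The abstract does not list the six patterns, so the single pattern used here (a carbon atom joined by single bonds to exactly two non-adjacent heavy atoms) is a hypothetical stand-in, as are the names `compress`, `decompress`, `signature`, and the edge label `VIA_C`. Hydrogens, charges, and aromaticity are ignored; this is a toy under those assumptions, not the authors' implementation.

```python
from collections import Counter

SINGLE = "single"
VIA_C = "single-C-single"  # hypothetical compound edge feature, not from the paper


def neighbors(edges, n):
    """Return {neighbor: bond label} for node n."""
    out = {}
    for pair, label in edges.items():
        if n in pair:
            (other,) = pair - {n}
            out[other] = label
    return out


def compress(nodes, edges):
    """Fold each matching carbon into a labeled edge between its two neighbors."""
    nodes, edges = dict(nodes), dict(edges)
    for n in sorted(nodes):
        if nodes[n] != "C":
            continue
        nbrs = neighbors(edges, n)
        # Assumed pattern: exactly two neighbors, both attached by plain single bonds.
        if len(nbrs) != 2 or set(nbrs.values()) != {SINGLE}:
            continue
        u, v = nbrs
        if frozenset((u, v)) in edges:  # would create a parallel edge, so skip
            continue
        del nodes[n], edges[frozenset((n, u))], edges[frozenset((n, v))]
        edges[frozenset((u, v))] = VIA_C
    return nodes, edges


def decompress(nodes, edges):
    """Expand every compound edge back into a carbon with two single bonds."""
    nodes, edges = dict(nodes), dict(edges)
    next_id = max(nodes, default=-1) + 1  # restored atoms get fresh ids
    for pair, label in list(edges.items()):
        if label != VIA_C:
            continue
        u, v = pair
        del edges[pair]
        nodes[next_id] = "C"
        edges[frozenset((u, next_id))] = SINGLE
        edges[frozenset((next_id, v))] = SINGLE
        next_id += 1
    return nodes, edges


def signature(nodes, edges):
    """Id-independent summary: atom counts and (endpoint symbols, bond) counts."""
    atom_counts = Counter(nodes.values())
    bond_counts = Counter(
        (tuple(sorted(nodes[a] for a in pair)), label) for pair, label in edges.items()
    )
    return atom_counts, bond_counts


# Heavy-atom chain N-C-C-O as a toy input.
nodes = {0: "N", 1: "C", 2: "C", 3: "O"}
edges = {
    frozenset((0, 1)): SINGLE,
    frozenset((1, 2)): SINGLE,
    frozenset((2, 3)): SINGLE,
}

c_nodes, c_edges = compress(nodes, edges)
r_nodes, r_edges = decompress(c_nodes, c_edges)
print(len(nodes), "->", len(c_nodes), "nodes after compression")
```

In this toy, one carbon is absorbed into an edge, and expanding that edge restores a graph with the same atom and bond composition as the input, although the restored atom receives a new node id. Once a carbon has been folded into an edge, its neighbors no longer match the pattern, so a compound edge always expands back to exactly one atom.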