AI Powered Drug Discovery and Manufacturing
CONFERENCE 2020
February 27 - 28, 2020
MIT, Cambridge, MA

Posters
Evaluating the performance of in-house, commercial, and open models for early ADMET properties
Authors
Renee DesJarlais, Kiran Kumar, Vladimir Chupakhin, Hugo Ceulemans, Dmitrii Rassokhin
This poster will describe the performance of models trained with Chemprop and in-house datasets and how they compare to commercial models.
Target Deconvolution by Matching Gene and Drug Perturbations
Authors
Robin Luo, Yuning Bie
We propose a method for determining the mechanism of action, also called target deconvolution, by cross-referencing drug perturbations with gene perturbations.
Using Machine Learning vs Molecular Similarity in Predicting Enzymatic Steps in Biochemical Synthesis Routes
Authors
Gian Marco Visani, Michael C. Hughes, Soha Hassoun
We contrast molecular similarity with ML, in the form of hierarchical classifiers trained on positive, hard-negative, and unlabeled data, for predicting the enzyme commission numbers of enzymes that can act on a given query molecule.
A reaction inspector for identifying impurities in organic reactions
Authors
Yiming Mo, Hanyu Gao, Klavs F. Jensen
A framework for predicting probable impurity formation within an organic reaction.
Deep Learning reveals 3D atherosclerotic plaque distribution and composition
Authors
Vanessa Isabell Jurtz, Grethe Skovbjerg, Casper Gravesen Salinas, Urmas Roostalu, Louise Pedersen, Bidda Charlotte Rolin, Michael Nyberg, Martijn van de Bunt, Camilla Ingvorsen
No description.
Enzymatic link prediction for biochemical route synthesis
Authors
Julie Jiang, Li-Ping Liu, Soha Hassoun
Using graph embedding and neural networks to predict missing enzymatic links and characteristics of putative catalyzing enzymes.
Integrating Deep Neural Networks and Symbolic Inference for Organic Reactivity Prediction
Authors
Wesley Wei Qian, Nathan Russell, Claire L. W. Simons, Yunan Luo, Martin D. Burke, Jian Peng
We present a machine-intelligence approach to predict reaction products by integrating deep neural networks and probabilistic rule-based inference based on chemical knowledge, achieving more than 90% accuracy on the USPTO dataset.
Machine Learning guided peptide design in a drug discovery context
Authors
Simone Fulle, Carsten Stahlhut, Søren Berg Padkjær
No description.
Synthetic Planning and Library Optimization for Automated Cross Coupling Synthesis Platforms
Authors
Nathan Russell, Andrea Palazzolo, Claire Simmons, Martin Burke, Jian Peng
We present algorithms for automation-friendly synthetic planning of linear natural products, optimization of building block libraries, and characterization of the necessary cross couplings.
Learning Neural Retrosynthetic Planning
Authors
Binghong Chen, Hanjun Dai, Chengtao Li, Le Song
We propose a novel tree search algorithm for retrosynthetic planning which exploits neural modules. This enables the learning of generalizable structures from prior experiences to promote planning efficiency and effectiveness in new tasks.
Explicit geometric information in molecular property prediction
Authors
Martin Vögele, Ron Dror
No description.
Applications of the ATOM Modeling PipeLine (AMPL) for Pharmacokinetics Property Prediction
Authors
Benjamin D. Madej
The ATOM Modeling PipeLine (AMPL) is an open source software library for building and sharing machine learning models that predict biological activities or physicochemical and pharmacokinetic properties.
Generative Models for Graph-Based Protein Design
Authors
John Ingraham, Vikas K. Garg, Regina Barzilay, Tommi Jaakkola
We introduce conditional language models for protein sequences that directly condition on a graph specification of a 3D structure.
Transform drug discovery and development with the integration of AI, physics models, and computational resources: an experience from XtalPi
Authors
Sivakumar Sekharan, Lipeng Lai, et al.
Can we integrate several AI models to provide a unified solution to accelerate the R&D process?
Prediction of IR Spectra with Machine Learning
Authors
Michael Forsuelo, Charles McGill, Yanfei Guan, William Green
A method for the prediction of IR spectra from the molecular structure of novel molecules using a machine learning model.
Data-efficient machine learning guided protein engineering
Authors
Surojit Biswas, Grigory Khimulya, Ethan C. Alley, George M. Church
A machine learning approach to rapidly optimize proteins with only small amounts of training data.
Learning a molecular latent space organized by binding affinities
Authors
Jacques Boitreaud, Carlos Oliver, Vincent Mallet, Jerome Waldispühl
We propose to learn a molecular latent space organized by binding affinities, by adding binding affinities multitask prediction to a variational autoencoder.
Rational Design of a Parallel Synthesis Program for the Optimization of Anti-fungal HDAC Inhibitors
Authors
Benjamin Merget, C. Wiebe, A. Koch
No description.
ChemBERT: A pretrained language model for the extraction of chemical reaction information
Authors
Jiang Guo, A.S. Ibanez-Lopez, Victor Quach, Regina Barzilay
We present ChemBERT, a pre-trained language model based on BERT, which enables automated chemical reaction extraction.
Predicting the impact of somatic mutations using cell painting
Authors
Juan C. Caicedo, Shantanu Singh, Alice Berger, Angela Brooks, Jesse Boehm, Anne E. Carpenter
We over-express gene mutations in cultured cells and profile their phenotype using images to identify their impact in the development of cancer.
Predicting Drug Activity using Combined Phenotypic Features and Chemical Structures
Authors
Tim Becker, Kevin Yang, Juan Caicedo, Shantanu Singh, Tommi Jaakkola, Regina Barzilay, Anne E. Carpenter
We use machine learning to predict the biochemical activity of a compound based on its chemical structure combined with gene expression and image-based profiles of cells exposed to it.
Learning mass spectrometry fragmentation of small molecules
Authors
Liu Cao, Alexey Gurevich, Hosein Mohimani
Discovering small molecules using mass spectral database search based on probabilistic modeling.
LigandNet: A machine learning toolbox for predicting ligand activity towards therapeutically important proteins
Authors
Md Mahmudulla Hassan, Danielle Castaneda-Mogollon, Govinda KC, and Suman Sirimulla
A machine learning toolbox for predicting ligand activity towards therapeutically important proteins.
Deep Learning-Generated Potential NMDA Receptor Antagonists Using a Variational Autoencoder
Authors
Katherine J. Schultz, Sean M. Colby, Yasemin Yesiltepe, Jamie R. Nuñez, Monee Y. McGrady, Ryan R. Renslow
This work presents the application and assessment of deep learning software for the generation of de novo candidate antagonists of the NMDA receptor.
CORE: Automatic Molecule Optimization Using Copy and Refine Strategy
Authors
Tianfan Fu, Cao Xiao, Jimeng Sun
A deep generative model to enhance molecules with more desirable properties which utilizes a copy and refine strategy.
Regioselectivity Predictions Using Graph Network and Quantum Descriptors
Authors
Yanfei Guan, Thomas Struble, Oscar Wu, Lagnajit Pattanaik, Connor Coley, William H. Green, Klavs F. Jensen
GNN model to predict major products for general regioselective reactions using quantum descriptors.
Property Prediction for 2-Molecule Mixtures
Authors
Allison Tam, Octavian Ganea, Gary Becigneul, Regina Barzilay
We explore different techniques for property prediction of a molecular pair.
Optimizing Synthesis Plans for Molecular Libraries
Authors
Hanyu Gao, Jean Pauphilet, Thomas J. Struble, Connor W. Coley, William H. Green, Klavs F. Jensen
This poster presents an optimization strategy to plan syntheses for a molecular library.
Analyzing Learned Molecular Representations for Property Prediction
Authors
Kevin Yang, Kyle Swanson, Wengong Jin, Connor Coley, Philipp Eiden, Hua Gao, Angel Guzman-Perez, Timothy Hopper, Brian Kelley, Miriam Mathea, Andrew Palmer, Volker Settels, Tommi Jaakkola, Klavs Jensen, and Regina Barzilay
We carefully benchmark property prediction models on public and proprietary datasets and introduce a new graph convolution model which outperforms previous baselines.
Machine Learning for Ligand Design in Palladium-Catalyzed Cross Coupling Reactions
Authors
Jessica Xu, Thomas Struble, Yanfei Guan, Priscilla Liow, Joseph M. Dennis, Stephen L. Buchwald, and Klavs F. Jensen
Work towards in-silico ligand design for Buchwald-Hartwig cross coupling reactions.
Practical constraints on machine learning in drug discovery - a case study
Authors
Dominique Beaini, Lu Zhu, Daniel Cohen, Sébastien Giguère
We identify several key challenges that need to be overcome for machine learning to be deployed in real-world drug discovery, propose a set of tools to address them and conclude with a case study highlighting the computational design of a potent compound inhibiting uptake of α-Synuclein by neuronal iPS cells with applications in Parkinson’s Disease.
Thermodynamic Properties of Materials: Towards the Prediction of Solubility
Authors
Florence H. Vermeire, William H. Green
Prediction of solubility using directed message passing neural networks.
Wasserstein Graph Representation and Generation
Authors
Gary Bécigneul, Benson Chen, Octavian Ganea, Tommi Jaakkola, Regina Barzilay
We perform graph representation and generation by combining message passing neural networks and the optimal transport geometry.
Drug-BERT: Pre-training Drug Sub-structure Representation for Molecular Property Prediction
Authors
Kexin Huang, Cao Xiao, Lucas M. Glass, Jimeng Sun
In this work, we explore how to learn effective drug representation using BERT by leveraging the massive unlabeled molecular data and the bidirectional self-attention mechanisms for improved molecular property prediction.
3D Molecular Representation and Modeling using Deep Learning
Authors
Tomohide Masuda, Matthew Ragoza, David Ryan Koes
A method for representing and generating 3D molecular structures using a GAN will be presented.
Inductive Transfer Learning for Molecular Activity Prediction
Authors
Xinhao Li, Denis Fourches
We propose the Molecular Prediction Model Fine-Tuning (MolPMoFiT) approach, an effective transfer learning method that can be applied to any QSPR/QSAR problem.
Learning accurate generative models of chemical structures from limited training examples
Authors
Michael A. Skinnider, Leonard J. Foster
We explore strategies for training text-based generative models of chemical structures in low-data situations, and apply these insights to train generative models of naturally occurring small molecules (metabolites) in several species.
Hierarchical Graph-to-Graph Translation for Molecules
Authors
Wengong Jin, Regina Barzilay, Tommi Jaakkola
A graph-to-graph translation model for optimizing molecular properties.
Practical Application of Deep Learning to Drug Discovery Project Data
Authors
Thomas Whitehead, Ben Irwin, Julian Levell, Matthew Segall, Gareth Conduit
We will describe application and validation of Alchemite™, a novel deep learning method for data imputation, to drug discovery data covering two pharma projects and diverse endpoints.
Analyzing uncertainty estimates for deep molecular property prediction
Authors
Gabriele Scalia, Colin Grambow, Barbara Pernici, Drew Wicke, Vladimir Chupakhin, Hugo Ceulemans, William H. Green
This work takes a step towards an accurate characterization of DNN-based uncertainty for molecular property prediction, providing experimental insights based on both public and industry datasets.
A deep learning approach to antibiotic discovery
Authors
Jonathan M. Stokes, Kevin Yang, Kyle Swanson, Wengong Jin, Andres Cubillos-Ruiz, Nina M. Donghia, Craig R. MacNair, Shawn French, Lindsey A. Carfrae, Zohar Bloom-Ackerman, Victoria M. Tran, Anush Chiappino-Pepe, Ahmed H. Badran, Ian W. Andrews, Emma J. Chory, George M. Church, Eric D. Brown, Tommi S. Jaakkola, Regina Barzilay, and James J. Collins
To address the rapid emergence of antibiotic-resistant bacteria, we trained a deep neural network capable of predicting molecules with antibacterial activity.
Application of Multi-Armed Bandit Problems to Ensemble FMO to Efficiently Propose Promising Idea Compounds
Authors
Kenichiro Takaba
Application of multi-armed bandit problems to ensemble FMO to efficiently propose promising idea compounds.
Performance of Reaction Datasets in Machine Learning Approaches to Synthesis Planning and Use in Restricted Domains
Authors
Amol Thakkar, Jean-Louis Reymond, Ola Engkvist, Esben Jannik Bjerrum
The overlap and performance of reaction datasets for the task of retrosynthetic planning are explored, alongside the potential to restrict their domain for specialized applications.
Transfer Learning: A Next Key Driver of Accelerating Materials Discovery with Machine Learning
Authors
Chang Liu, Hironao Yamada, Stephen Wu, Yokinori Koyama, Ryo Yoshida
Transfer learning gives us an opportunity to explore the extrapolation area.
Deep generative workflow for the real-world design of novel lead compounds
Authors
Sang Ok Song, Jae Hong Shin, Sanghyung Jin, Minkyu Ha, Jiho Yoo, Jinhan Kim
Standigm developed an AI de novo drug design workflow from target to lead compound which automates molecule generation, prioritization and optimization for further synthesis and testing.
Increasing the applicability domain of QSAR models with meta-learning
Authors
Prudencio Tossou, Basile Dura
We use episodic meta-learning to learn to extrapolate and increase the applicability domain of QSAR models.
Interpretable graph neural networks for molecular property prediction
Authors
Emmanuel Noutahi, Dominique Beaini, Julien Horwood, Prudencio Tossou
We propose a new hierarchical graph pooling layer that improves graph neural network performance and interpretability on several molecular property prediction benchmarks.
Machine Learning in Quantum Chemistry: Art or Science?
Authors
Anton V. Sinitskiy, Daniil V. Izmodenov, Iosif V. Leibin, Georgiy K. Ozerov, Dmitry S. Bezrukov, Vijay S. Pande
We present a deep neural network (DNN) that not only outperforms DFT in predicting energies and electron densities of organic molecules, but also has a strong theoretical foundation.
Fully Convolutional Neural Network Models for Materials Science Applications
Authors
Abraham Stern, Michelle Gill, Dave Magley, Ellen Du, Ryan Marson, Jonathan Moore, Bart Rijksen, Joey Storer, Sukrit Mukhopadhyay
A new methodology for variational autoencoders.
Data Augmentation and Pre-training for Template-Based Retrosynthetic Prediction
Authors
Michael E Fortunato, Connor W Coley, Brian C Barnes, Klavs F Jensen
This poster presents results of pre-training a reaction template relevance neural network on generalized template applicability.
Efficient Modeling of Reaction Outcomes Using Active Machine Learning
Authors
Natalie S. Eyke, William H. Green, Klavs F. Jensen
Through retrospective analysis of existing reactivity data derived from high-throughput reaction screening, we demonstrate that active machine learning can be used to reduce the experimental burden associated with this type of screening.
Reaction Condition Prediction
Authors
Jiannan Liu, Hanyu Gao, Thomas Struble, Klavs F. Jensen
A Weisfeiler-Lehman network based model was proposed to predict reaction conditions.
Graph dynamical networks for unsupervised learning of atomic scale dynamics in materials
Authors
Tian Xie, Arthur France-Lanord, Yanming Wang, Yang Shao-Horn, Jeffrey Grossman
We developed an unsupervised learning approach to learn the dynamics of atoms or small molecules in materials from time-series molecular dynamics simulation data.
Generative Molecular Design for Lead Optimization: Demonstration by Discovery of Potent, Selective Aurora Kinase B Inhibitors with Favorable Candidate Quality Attributes
Authors
Andrew D Weber, Jason Z Deng, Kevin McLoughlin, Jeffrey Mast, Thomas Sweitzer, Juliet McComas, Margaret Tse, Derek Jones, Jonathan Allen, Stacie Calad-Thomson, Jim Brase, Tom Rush
The ATOM consortium developed a platform for generative molecular design and demonstrated its application in lead optimization through the discovery of potent, selective aurora kinase B inhibitors with favorable candidate quality attributes.
Black Box Recursive Translations for Molecular Optimization
Authors
Farhan Damani, Vishnu Sresht, Stephen Ra
Motivated by molecular optimization as a translation task, we develop Black Box Recursive Translation (BBRT), a drop-in framework where we explore and demonstrate the empirical benefits of recursive inference, vis-à-vis decoding strategies, as applied to unconstrained molecular optimization.
Leveraging non-structural data to predict structures of protein–ligand complexes
Authors
Joseph Paggi, Ron Dror
We present a method that improves the performance of small-molecule-ligand binding pose prediction by considering other ligands known to bind the target protein.
Use of Full 3D Electronic Structure with a Convolutional Neural Network for Prediction of Molecular and Energetic Material Properties
Authors
Brian C. Barnes, Alex D. Casey, Betsy M. Rice, Steven F. Son
We use full, raw 3D electronic structure gas-phase quantum chemistry calculations as input to a convolutional neural network for the prediction of a variety of molecular and condensed phase material properties, including transfer learning to a small experimental dataset.
Deep Learning of Activation Energies
Authors
Colin A. Grambow, Lagnajit Pattanaik, William H. Green
A message passing neural network predicts the activation energies of gas-phase chemical reactions.
Encoding periodic trends for fully transferable neural network potentials
Authors
John E. Herr, Kevin Koh, Kun Yao, John Parkhill
An encoding of periodic trends is employed to develop a high-dimensional neural network potential which is constant in size with respect to the number of unique elements in the training data.
Olympus: A Toolkit for Benchmarking Optimization Algorithms on Experimentally Derived Surfaces
Authors
F. Häse, R.J. Hickman, M. Aldeghi, E. Liles, L.M. Roch, J.E. Hein, A. Aspuru-Guzik
We present Olympus, a framework for the benchmarking of optimization algorithms on experimentally derived surfaces. Olympus includes a collection of datasets from physics, chemistry and materials science, and a suite of optimization algorithms, from random search to Bayesian and evolutionary, that can be easily accessed via a user-friendly Python interface. In addition, Olympus allows users to test custom algorithms and user-defined benchmark datasets. Here, we will present the user interface and a few examples of how Olympus can be used as a benchmark toolkit for optimization, as well as a simulation platform for hypothesis evaluation prior to experimental testing.
Smart Data Analytics in Pharmaceutical Manufacturing
Authors
Weike Sun, Kristen A. Severson, Richard D. Braatz
This poster describes a robust and automated approach for data analytics method selection and model construction that allows the user to focus on goals rather than methods.
Generating 3D transition state structures with deep learning
Authors
Lagnajit Pattanaik, John Ingraham, Colin Grambow, Regina Barzilay, Tommi Jaakkola, Klavs Jensen, William Green
Using a dataset generated from quantum chemistry, we predict 3D transition state structures from reactant and product geometries.
Document embedding centroids: new and versatile descriptors for biomedical entities
Authors
John P. Santa Maria Jr., Eugen Lounkine, Jeremy Jenkins
We show that Doc2Vec-generated embeddings for biomedical entities can be useful for nearest neighbor, clustering, and association learning tasks.
Predicting therapeutic targets of small molecules using an integrated machine learning framework
Authors
Nishanth Merwin, Dr. Victoria Catterson, Dr. Gabriel Musso
Iterative machine learning framework integrates multiple data sources to link small molecules to therapeutic targets.
Finding diverse routes of organic synthesis using surrogate-accelerated Bayesian retrosynthesis
Authors
Zhongliang Guo, Stephen Wu, Ryo Yoshida
We propose a sequential Monte Carlo algorithm to find diverse synthesis routes for a target molecule within a Bayesian framework, and improve the sampling efficiency with a surrogate model.
Augmenting protein network embeddings with sequence information
Authors
Hassan Kané, Mohamed Coulibali, Pelkins Ajanoh, Ali Abdalla
We learn protein representations by integrating data from physical interaction and amino acid sequence.
Learning Representations of DNA Sequences for Low Resource Promoter Characterization
Authors
Peter Morales, Rajmonda Caceres, Matt Walsh, Christina Zook, Catherine Van Praagh, Nicholas Guido, Todd Thorsen
A method for learning effective deep learning representations of promoter sequences with low resource data.
Antibody’s developability prediction with machine learning and a high quality data set
Authors
Lei Jia, Yax Sun, Alex Jacobitz, Vladimir Razinkov, Marissa Mock, Nic Angell, Peter Grandsard, BAT team
Uses of machine learning models and feature engineering for predicting mAb developability will be presented.
Benchmarking the Synthesizability of Molecules Proposed via de novo Generative Models
Authors
Wenhao Gao, Connor Coley, Klavs Jensen
We quantified synthetic feasibility with a data-driven computer-aided synthesis planning program and analyzed approaches for biasing generative models toward synthesizable space.
Using big data and machine learning to understand T cell dysfunction in human tumors
Authors
Simarjot Pabla, Tenzing Khendu, Cailin Joyce, Benjamin Duckless, Andrew Basinski, Matthew Hancock, Jeremy Waight, Mariana Manrique, Jennifer Buell, Alex Duncan, David Savitsky, Lukasz Swiech, Thomas Horn, John Castle
Big data genomics and machine learning can be used to understand T cell dysfunction in human tumors. We created an in vitro system to improve our understanding of how T cells get exhausted in tumors and how we can use these data to identify new targets and biomarkers.
PostEra - The GPS for Chemistry
Authors
Alpha Lee, Matthew Robinson, Aaron Morris
We present a machine learning methodology for lead optimization and scaffold hopping which combines synthesis prediction and uncertainty-calibrated bioactivity prediction.
The Center for Computer-Assisted Synthesis: An Overview
Authors
Olaf Wiest, Nitesh Chawla, Abigail Doyle, Robert Paton, Richmond Sarpong, Matthew Sigman
Overview of the newly established Phase I CCI.
Molecule Structure Elucidation given Mass Spectrum and Chemical Formula
Authors
David Jian Yi Lee, Bingquan Shen, Hai Leong Chieu
Method to build the molecular structure one bond at a time given its chemical formula and observed spectrum.
AI-assisted lead optimization with SynSpace
Authors
Istvan Szabo, Greg Makara, Gabor Pocze, Laszlo Kovacs, Anna Szekely
The poster will introduce the SynSpace de novo design software with built-in synthetic feasibility and retrosynthesis capability that provides a user-friendly environment for drug hunters and a versatile design engine for automated platforms.
Machine learning guided process development for complex systems with multiple objectives
Authors
Perman Jorayev, Danilo Russo, Paul Deutsch, Alexei Lapkin
Enhancing process development via a hybrid approach that combines a priori knowledge and machine learning.
Applying state-of-art deep learning models to design novel JAK inhibitors
Authors
Vykintas Jauniškis, Ole Winther, Daniel R. Greve
Applying state-of-art deep learning models to design novel JAK inhibitors.
Exploring Fragment-based Target-specific Ranking Protocol with Machine Learning on Cathepsin S
Authors
Yuwei Yang
We apply fragmentation and machine learning methods to rank the activities of Cathepsin S inhibitors, which is challenging for conventional methods due to the inhibitors' large size, high flexibility, and similar chemical components.
Optical Graph Recognition of Chemical Compounds by Deep Learning
Authors
Martijn Oldenhof, Adam Arany, Yves Moreau, Jaak Simm
We present a deep neural network model for recognizing graph structures from chemical compound images.
Computational Pipeline to Discover Inhibitors of RPN11 using Machine Learning and Image Processing
Authors
Aayush Gupta, Huan-Xiang Zhou
Our drug discovery pipeline combines molecular docking and molecular dynamics simulations with powerful image processing and machine learning techniques.
Accelerating lead optimization with active learning by exploiting MMPA based ADMET knowledge with regression forest potency models
Authors
Alexander G. Dossetter, Edward J. Griffen, Andrew G. Leach, Phillip de Sousa
Combining rules for ADMET issues derived from matched molecular pair analysis (MMPA) with regression-forest-based potency scoring in an active learning mode can accelerate lead optimization.
AI Driven Iterative Screening for Hit Finding
Authors
Gabriel Dreiman, Magda Bictash, Paul Fish, Lewis Griffin, Fredrik Svensson
Iterative screening can lead to drastically improved efficiency in hit discovery compared to the traditional HTS paradigm.