AI Powered Drug Discovery and Manufacturing


February 27 - 28, 2020

MIT, Cambridge, MA


Evaluating the performance of in-house, commercial, and open models for early ADMET properties


Renee DesJarlais, Kiran Kumar, Vladimir Chupakhin, Hugo Ceulemans, Dmitrii Rassokhin

This poster will describe the performance of models trained with Chemprop and in-house datasets and how they compare to commercial models.

Target Deconvolution by Matching Gene and Drug Perturbations


Robin Luo, Yuning Bie

We propose a method for determining the mechanism of action, also called target deconvolution, by cross-referencing drug perturbations with gene perturbations.

Using Machine Learning vs Molecular Similarity in Predicting Enzymatic Steps in Biochemical Synthesis Routes


Gian Marco Visani, Michael C. Hughes, Soha Hassoun

We contrast molecular similarity with machine learning, in the form of hierarchical classifiers trained on positive, hard-negative, and unlabeled data, for predicting the enzyme commission numbers of enzymes that can act on a given query molecule.

A reaction inspector for identifying impurities in organic reactions


Yiming Mo, Hanyu Gao, Klavs F. Jensen

A framework for predicting probable impurity formation within an organic reaction.

Deep Learning reveals 3D atherosclerotic plaque distribution and composition


Vanessa Isabell Jurtz, Grethe Skovbjerg, Casper Gravesen Salinas, Urmas Roostalu, Louise Pedersen, Bidda Charlotte Rolin, Michael Nyberg, Martijn van de Bunt, Camilla Ingvorsen

No description.

Enzymatic link prediction for biochemical route synthesis


Julie Jiang, Li-Ping Liu, Soha Hassoun

Using graph embedding and neural networks to predict missing enzymatic links and characteristics of putative catalyzing enzymes.

Integrating Deep Neural Networks and Symbolic Inference for Organic Reactivity Prediction


Wesley Wei Qian, Nathan Russell, Claire L. W. Simons, Yunan Luo, Martin D. Burke, Jian Peng

We present a machine-intelligence approach to predict reaction products by integrating deep neural networks and probabilistic rule-based inference based on chemical knowledge, achieving more than 90% accuracy on the USPTO dataset.

Machine Learning guided peptide design in a drug discovery context


Simone Fulle, Carsten Stahlhut, Søren Berg Padkjær

No description.

Synthetic Planning and Library Optimization for Automated Cross Coupling Synthesis Platforms


Nathan Russell, Andrea Palazzolo, Claire Simmons, Martin Burke, Jian Peng

We present algorithms for automation-friendly synthetic planning of linear natural products, optimization of building block libraries, and characterization of the necessary cross couplings.

Learning Neural Retrosynthetic Planning


Binghong Chen, Hanjun Dai, Chengtao Li, Le Song

We propose a novel tree search algorithm for retrosynthetic planning which exploits neural modules. This enables the learning of generalizable structures from prior experiences to promote planning efficiency and effectiveness in new tasks.

Explicit geometric information in molecular property prediction


Martin Vögele, Ron Dror

No description. 

Applications of the ATOM Modeling PipeLine (AMPL) for Pharmacokinetics Property Prediction


Benjamin D. Madej

The ATOM Modeling PipeLine (AMPL) is an open source software library for building and sharing machine learning models that predict biological activities or physicochemical and pharmacokinetic properties.

Generative Models for Graph-Based Protein Design


John Ingraham, Vikas K. Garg, Regina Barzilay, Tommi Jaakkola

We introduce conditional language models for protein sequences that directly condition on a graph specification of a 3D structure.

Transform drug discovery and development with the integration of AI, physics models, and computational resources: an experience from XtalPi


Sivakumar Sekharan, Lipeng Lai, et al.

Can we integrate several AI models to provide a unified solution to accelerate the R&D process?

Prediction of IR Spectra with Machine Learning


Michael Forsuelo, Charles McGill, Yanfei Guan, William Green

A method for the prediction of IR spectra from the molecular structure of novel molecules using a machine learning model.

Data-efficient machine learning guided protein engineering


Surojit Biswas, Grigory Khimulya, Ethan C. Alley, George M. Church

A machine learning approach to rapidly optimize proteins with only small amounts of training data.

Learning a molecular latent space organized by binding affinities


Jacques Boitreaud, Carlos Oliver, Vincent Mallet, Jerome Waldispühl

We propose to learn a molecular latent space organized by binding affinities by adding multitask binding affinity prediction to a variational autoencoder.

Rational Design of a Parallel Synthesis Program for the Optimization of Anti-fungal HDAC Inhibitors


Benjamin Merget, B. Merget, C. Wiebe, A. Koch

No description.

ChemBERT: A pretrained language model for the extraction of chemical reaction information


Jiang Guo, A.S. Ibanez-Lopez, Victor Quach, Regina Barzilay

We present ChemBERT, a pre-trained language model based on BERT, which enables automated chemical reaction extraction. 

Predicting the impact of somatic mutations using cell painting


Juan C. Caicedo, Shantanu Singh, Alice Berger, Angela Brooks, Jesse Boehm, Anne E. Carpenter

We over-express gene mutations in cultured cells and profile their phenotype using images to identify their impact on the development of cancer.

Predicting Drug Activity using Combined Phenotypic Features and Chemical Structures


Tim Becker, Kevin Yang, Juan Caicedo, Shantanu Singh, Tommi Jaakkola, Regina Barzilay, Anne E. Carpenter

We use machine learning to predict the biochemical activity of a compound based on its chemical structure combined with gene expression and image-based profiles of cells exposed to it.

Learning mass spectrometry fragmentation of small molecules


Liu Cao, Alexey Gurevich, Hosein Mohimani

Discovering small molecules using mass spectral database search based on probabilistic modeling.

LigandNet: A machine learning toolbox for predicting ligand activity towards therapeutically important proteins


Md Mahmudulla Hassan, Danielle Castaneda-Mogollon, Govinda KC, and Suman Sirimulla

A machine learning toolbox for predicting ligand activity towards therapeutically important proteins. 

Deep Learning-Generated Potential NMDA Receptor Antagonists Using a Variational Autoencoder


Katherine J. Schultz, Sean M. Colby, Yasemin Yesiltepe, Jamie R. Nuñez, Monee Y. McGrady, Ryan R. Renslow

This work presents the application and assessment of deep learning software for the generation of de novo candidate antagonists of the NMDA receptor.

CORE: Automatic Molecule Optimization Using Copy and Refine Strategy


Tianfan Fu, Cao Xiao, Jimeng Sun

A deep generative model to enhance molecules with more desirable properties which utilizes a copy and refine strategy.

Regioselectivity Predictions Using Graph Network and Quantum Descriptors


Yanfei Guan, Thomas Struble, Oscar Wu, Lagnajit Pattanaik, Connor Coley, William H. Green, Klavs F. Jensen

GNN model to predict major products for general regioselective reactions using quantum descriptors. 

Property Prediction for 2-Molecule Mixtures


Allison Tam, Octavian Ganea, Gary Bécigneul, Regina Barzilay

We explore different techniques for property prediction of a molecular pair.

Optimizing Synthesis Plans for Molecular Libraries


Hanyu Gao, Jean Pauphilet, Thomas J. Struble, Connor W. Coley, William H. Green, Klavs F. Jensen

This poster presents an optimization strategy to plan syntheses for a molecular library.

Analyzing Learned Molecular Representations for Property Prediction


Kevin Yang, Kyle Swanson, Wengong Jin, Connor Coley, Philipp Eiden, Hua Gao, Angel Guzman-Perez, Timothy Hopper, Brian Kelley, Miriam Mathea, Andrew Palmer, Volker Settels, Tommi Jaakkola, Klavs Jensen, and Regina Barzilay

We carefully benchmark property prediction models on public and proprietary datasets and introduce a new graph convolution model which outperforms previous baselines.

Machine Learning for Ligand Design in Palladium-Catalyzed Cross Coupling Reactions


Jessica Xu, Thomas Struble, Yanfei Guan, Priscilla Liow, Joseph M. Dennis, Stephen L. Buchwald, and Klavs F. Jensen

Work towards in silico ligand design for Buchwald-Hartwig cross coupling reactions.

Practical constraints on machine learning in drug discovery - a case study


Dominique Beaini, Lu Zhu, Daniel Cohen, Sébastien Giguère

We identify several key challenges that need to be overcome for machine learning to be deployed in real-world drug discovery, propose a set of tools to address them and conclude with a case study highlighting the computational design of a potent compound inhibiting uptake of α-Synuclein by neuronal iPS cells with applications in Parkinson’s Disease.

Thermodynamic Properties of Materials: Towards the Prediction of Solubility


Florence H. Vermeire, William H. Green

Prediction of solubility using directed message passing neural networks.

Wasserstein Graph Representation and Generation


Gary Bécigneul, Benson Chen, Octavian Ganea, Tommi Jaakkola, Regina Barzilay

We perform graph representation and generation by combining message passing neural networks and the optimal transport geometry.

Drug-BERT: Pre-training Drug Sub-structure Representation for Molecular Property Prediction


Kexin Huang, Cao Xiao, Lucas M. Glass, Jimeng Sun

In this work, we explore how to learn effective drug representation using BERT by leveraging the massive unlabeled molecular data and the bidirectional self-attention mechanisms for improved molecular property prediction.

3D Molecular Representation and Modeling using Deep Learning


Tomohide Masuda, Matthew Ragoza, David Ryan Koes

A method for representing and generating 3D molecular structures using a GAN will be presented.

Inductive Transfer Learning for Molecular Activity Prediction


Xinhao Li, Denis Fourches

We propose the Molecular Prediction Model Fine-Tuning (MolPMoFiT) approach, an effective transfer learning method that can be applied to any QSPR/QSAR problem.

Learning accurate generative models of chemical structures from limited training examples


Michael A. Skinnider, Leonard J. Foster

We explore strategies for training text-based generative models of chemical structures in low-data situations, and apply these insights to train generative models of naturally occurring small molecules (metabolites) in several species.

Hierarchical Graph-to-Graph Translation for Molecules


Wengong Jin, Regina Barzilay, Tommi Jaakkola

A graph-to-graph translation model for optimizing molecular properties.

Practical Application of Deep Learning to Drug Discovery Project Data


Thomas Whitehead, Ben Irwin, Julian Levell, Matthew Segall, Gareth Conduit

We will describe application and validation of Alchemite™, a novel deep learning method for data imputation, to drug discovery data covering two pharma projects and diverse endpoints.

Analyzing uncertainty estimates for deep molecular property prediction


Gabriele Scalia, Colin Grambow, Barbara Pernici, Drew Wicke, Vladimir Chupakhin, Hugo Ceulemans, William H. Green

This work takes a step towards an accurate characterization of DNN-based uncertainty for molecular property prediction, providing experimental insights based on both public and industry datasets.

A deep learning approach to antibiotic discovery


Jonathan M. Stokes, Kevin Yang, Kyle Swanson, Wengong Jin, Andres Cubillos-Ruiz, Nina M. Donghia, Craig R. MacNair, Shawn French, Lindsey A. Carfrae, Zohar Bloom-Ackerman, Victoria M. Tran, Anush Chiappino-Pepe, Ahmed H. Badran, Ian W. Andrews, Emma J. Chory, George M. Church, Eric D. Brown, Tommi S. Jaakkola, Regina Barzilay, and James J. Collins

To address the rapid emergence of antibiotic-resistant bacteria, we trained a deep neural network capable of predicting molecules with antibacterial activity.

Application of Multi-Armed Bandit Problems to Ensemble FMO to Efficiently Propose Promising Idea Compounds


Kenichiro Takaba

Application of multi-armed bandit problems to ensemble FMO to efficiently propose promising idea compounds.

Performance of Reaction Datasets in Machine Learning Approaches to Synthesis Planning and Use in Restricted Domains


Amol Thakkar, Jean-Louis Reymond, Ola Engkvist, Esben Jannik Bjerrum

The overlap and performance of reaction datasets for the task of retrosynthetic planning are explored, alongside the potential to restrict their domain for specialized applications.

Transfer Learning: A Next Key Driver of Accelerating Materials Discovery with Machine Learning


Chang Liu, Hironao Yamada, Stephen Wu, Yokinori Koyama, Ryo Yoshida

Transfer learning offers an opportunity to explore the extrapolation region.

Deep generative workflow for the real-world design of novel lead compounds


Sang Ok Song, Jae Hong Shin, Sanghyung Jin, Minkyu Ha, Jiho Yoo, Jinhan Kim

Standigm developed an AI de novo drug design workflow from target to lead compound which automates molecule generation, prioritization and optimization for further synthesis and testing.

Increasing the applicability domain of QSAR models with meta-learning


Prudencio Tossou, Basile Dura

We use episodic meta-learning to learn to extrapolate and increase applicability domain of QSAR models.

Interpretable graph neural networks for molecular property prediction


Emmanuel Noutahi, Dominique Beaini, Julien Horwood, Prudencio Tossou

We propose a new hierarchical graph pooling layer that improves graph neural network performance and interpretability on several molecular property prediction benchmarks.

Machine Learning in Quantum Chemistry: Art or Science?


Anton V. Sinitskiy, Daniil V. Izmodenov, Iosif V. Leibin, Georgiy K. Ozerov, Dmitry S. Bezrukov, Vijay S. Pande

We present a deep neural network (DNN) that not only outperforms DFT in predicting energies and electron densities of organic molecules, but also has a strong theoretical foundation.

Fully Convolutional Neural Network Models for Materials Science Applications


Abraham Stern, Michelle Gill, Dave Magley, Ellen Du, Ryan Marson, Jonathan Moore, Bart Rijksen, Joey Storer, Sukrit Mukhopadhyay

A new methodology for variational autoencoders.

Data Augmentation and Pre-training for Template-Based Retrosynthetic Prediction


Michael E Fortunato, Connor W Coley, Brian C Barnes, Klavs F Jensen

This poster presents results of pre-training a reaction template relevance neural network on generalized template applicability.

Efficient Modeling of Reaction Outcomes Using Active Machine Learning


Natalie S. Eyke, William H. Green, Klavs F. Jensen

Through retrospective analysis of existing reactivity data derived from high-throughput reaction screening, we demonstrate that active machine learning can be used to reduce the experimental burden associated with this type of screening.

Reaction Condition Prediction


Jiannan Liu, Hanyu Gao, Thomas Struble, Klavs F. Jensen

A Weisfeiler-Lehman network-based model is proposed to predict reaction conditions.

Graph dynamical networks for unsupervised learning of atomic scale dynamics in materials


Tian Xie, Arthur France-Lanord, Yanming Wang, Yang Shao-Horn, Jeffrey Grossman

We developed an unsupervised learning approach to learn the dynamics of atoms or small molecules in materials from time-series molecular dynamics simulation data.

Generative Molecular Design for Lead Optimization: Demonstration by Discovery of Potent, Selective Aurora Kinase B Inhibitors with Favorable Candidate Quality Attributes


Andrew D Weber, Jason Z Deng, Kevin McLoughlin, Jeffrey Mast, Thomas Sweitzer, Juliet McComas, Margaret Tse, Derek Jones, Jonathan Allen, Stacie Calad-Thomson, Jim Brase, Tom Rush

The ATOM consortium developed a platform for generative molecular design and demonstrated its application in lead optimization through the discovery of potent, selective Aurora kinase B inhibitors with favorable candidate quality attributes.

Black Box Recursive Translations for Molecular Optimization


Farhan Damani, Vishnu Sresht, Stephen Ra

Motivated by molecular optimization as a translation task, we develop Black Box Recursive Translation (BBRT), a drop-in framework where we explore and demonstrate the empirical benefits of recursive inference, vis-à-vis decoding strategies, as applied to unconstrained molecular optimization.

Use of Full 3D Electronic Structure with a Convolutional Neural Network for Prediction of Molecular and Energetic Material Properties


Brian C. Barnes, Alex D. Casey, Betsy M. Rice, Steven F. Son

We use full, raw 3D electronic structure gas-phase quantum chemistry calculations as input to a convolutional neural network for the prediction of a variety of molecular and condensed phase material properties, including transfer learning to a small experimental dataset.

Leveraging non-structural data to predict structures of protein–ligand complexes


Joseph M. Paggi, Ron O. Dror

We present a method that improves the accuracy of small-molecule-ligand binding pose prediction by considering other ligands known to bind the target protein.

Deep Learning of Activation Energies


Colin A. Grambow, Lagnajit Pattanaik, William H. Green

A message passing neural network predicts the activation energies of gas-phase chemical reactions.

Encoding periodic trends for fully transferable neural network potentials


John E. Herr, Kevin Koh, Kun Yao, John Parkhill

An encoding of periodic trends is employed to develop a high-dimensional neural network potential which is constant in size with respect to the number of unique elements in the training data.

Olympus: A Toolkit for Benchmarking Optimization Algorithms on Experimentally Derived Surfaces


F. Häse, R.J. Hickman, M. Aldeghi, E. Liles, L.M. Roch, J.E. Hein, A. Aspuru-Guzik

We present Olympus, a framework for benchmarking optimization algorithms on experimentally derived surfaces. Olympus includes a collection of datasets from physics, chemistry, and materials science, and a suite of optimization algorithms, from random search to Bayesian and evolutionary methods, that can be easily accessed via a user-friendly Python interface. In addition, Olympus allows users to test custom algorithms and user-defined benchmark datasets. Here, we will present the user interface and a few examples of how Olympus can be used as a benchmark toolkit for optimization, as well as a simulation platform for evaluating hypotheses prior to experimental testing.

Smart Data Analytics in Pharmaceutical Manufacturing


Weike Sun, Kristen A. Severson, Richard D. Braatz

This poster describes a robust and automated approach for data analytics method selection and model construction that allows the user to focus on goals rather than methods.

Generating 3D transition state structures with deep learning


Lagnajit Pattanaik, John Ingraham, Colin Grambow, Regina Barzilay, Tommi Jaakkola, Klavs Jensen, William Green

Using a dataset generated from quantum chemistry, we predict 3D transition state structures from reactant and product geometries.

Document embedding centroids: new and versatile descriptors for biomedical entities


John P. Santa Maria Jr., Eugen Lounkine, Jeremy Jenkins

We show that Doc2Vec-generated embeddings for biomedical entities can be useful for nearest neighbor, clustering, and association learning tasks.

Predicting therapeutic targets of small molecules using an integrated machine learning framework


Nishanth Merwin, Dr. Victoria Catterson, Dr. Gabriel Musso

Iterative machine learning framework integrates multiple data sources to link small molecules to therapeutic targets.

Finding diverse routes of organic synthesis using surrogate-accelerated Bayesian retrosynthesis


Zhongliang Guo, Stephen Wu, Ryo Yoshida

We propose a sequential Monte Carlo algorithm to find diverse synthesis routes for a target molecule within a Bayesian framework, and improve the sampling efficiency with a surrogate model.

Augmenting protein network embeddings with sequence information


Hassan Kané, Mohamed Coulibali, Pelkins Ajanoh, Ali Abdalla

We learn protein representations by integrating data from physical interaction and amino acid sequence.

Learning Representations of DNA Sequences for Low Resource Promoter Characterization


Peter Morales, Rajmonda Caceres, Matt Walsh, Christina Zook, Catherine Van Praagh, Nicholas Guido, Todd Thorsen

A method for learning effective deep learning representations of promoter sequences with low resource data.

Antibody’s developability prediction with machine learning and a high quality data set


Lei Jia, Yax Sun, Alex Jacobitz, Vladimir Razinkov, Marissa Mock, Nic Angell, Peter Grandsard, BAT team

Uses of machine learning models and feature engineering for predicting mAb developability will be presented.

Benchmarking the Synthesizability of Molecules Proposed via de novo Generative Models


Wenhao Gao, Connor Coley, Klavs Jensen

We quantified synthetic feasibility with a data-driven computer-aided synthesis planning program and analyzed approaches for biasing generative models toward synthesizable space.

Using big data and machine learning to understand T cell dysfunction in human tumors


Simarjot Pabla, Tenzing Khendu, Cailin Joyce, Benjamin Duckless, Andrew Basinski, Matthew Hancock, Jeremy Waight, Mariana Manrique, Jennifer Buell, Alex Duncan, David Savitsky, Lukasz Swiech, Thomas Horn, John Castle

Big data genomics and machine learning can be used to understand T cell dysfunction in human tumors. We created an in vitro system to improve our understanding of how T cells become exhausted in tumors and how we can use these data to identify new targets and biomarkers.

PostEra - The GPS for Chemistry


Alpha Lee, Matthew Robinson, Aaron Morris

We present a machine learning methodology for lead optimization and scaffold hopping which combines synthesis prediction and uncertainty-calibrated bioactivity prediction.

The Center for Computer-Assisted Synthesis: An Overview


Olaf Wiest, Nitesh Chawla, Abigail Doyle, Robert Paton, Richmond Sarpong, Matthew Sigman

Overview of the newly established Phase I CCI. 

Molecule Structure Elucidation given Mass Spectrum and Chemical Formula


David Jian Yi Lee, Bingquan Shen, Hai Leong Chieu

Method to build the molecular structure one bond at a time given its chemical formula and observed spectrum.

AI-assisted lead optimization with SynSpace


Istvan Szabo, Greg Makara, Gabor Pocze, Laszlo Kovacs, Anna Szekely

The poster will introduce the SynSpace de novo design software with built-in synthetic feasibility and retrosynthesis capability that provides a user-friendly environment for drug hunters and a versatile design engine for automated platforms.

Machine learning guided process development for complex systems with multiple objectives


Perman Jorayev, Danilo Russo, Paul Deutsch, Alexei Lapkin

Enhancing process development via a hybrid approach that combines prior knowledge and machine learning.

Applying state-of-art deep learning models to design novel JAK inhibitors


Vykintas Jauniškis, Ole Winther, Daniel R. Greve

Applying state-of-the-art deep learning models to design novel JAK inhibitors.

Exploring Fragment-based Target-specific Ranking Protocol with Machine Learning on Cathepsin S


Yuwei Yang

We apply fragmentation and machine learning methods to rank the activities of Cathepsin S inhibitors, a task that is challenging for conventional methods because of the inhibitors' large size, high flexibility, and similar chemical components.

Optical Graph Recognition of Chemical Compounds by Deep Learning


Martijn Oldenhof, Adam Arany, Yves Moreau, Jaak Simm

We present a deep neural network model for recognizing graph structures from chemical compound images.

Computational Pipeline to Discover Inhibitors of RPN11 using Machine Learning and Image Processing


Aayush Gupta, Huan-Xiang Zhou

Our drug discovery pipeline combines molecular docking and molecular dynamics simulations with powerful image processing and machine learning techniques.

Accelerating lead optimization with active learning by exploiting MMPA based ADMET knowledge with regression forest potency models


Alexander G. Dossetter, Edward J. Griffen, Andrew G. Leach, Phillip de Sousa

Combining rules for ADMET issues derived from matched molecular pair analysis (MMPA) with regression-forest-based potency scoring in an active learning mode can accelerate lead optimization.

AI Driven Iterative Screening for Hit Finding


Gabriel Dreiman, Magda Bictash, Paul Fish, Lewis Griffin, Fredrik Svensson

Iterative screening can lead to drastically improved efficiency in hit discovery compared to the traditional HTS paradigm.

Contact Us

77 Massachusetts Ave.

Cambridge, MA 02139