Contact Us

77 Massachusetts Ave.

Cambridge, MA 02139


AI Powered Drug Discovery and Manufacturing


February 27 - 28, 2020

MIT, Cambridge, MA


Evaluating the performance of in-house, commercial, and open models for early ADMET properties


This poster will describe the performance of models trained with Chemprop and in-house datasets and how they compare to commercial models.

Renee DesJarlais, Kiran Kumar, Vladimir Chupakhin, Hugo Ceulemans, Dmitrii Rassokhin

Target Deconvolution by Matching Gene and Drug Perturbations


We propose a method for determining the mechanism of action, also called target deconvolution, by cross-referencing drug perturbations with gene perturbations.

Robin Luo, Yuning Bie

Using Machine Learning vs Molecular Similarity in Predicting Enzymatic Steps in Biochemical Synthesis Routes


We contrast molecular similarity with ML, in the form of hierarchical classifiers trained on positive, hard-negative, and unlabeled data, for predicting the Enzyme Commission (EC) numbers of enzymes that can act on a given query molecule.

Gian Marco Visani, Michael C. Hughes, Soha Hassoun

A reaction inspector for identifying impurities in organic reactions


A framework for predicting probable impurity formation in an organic reaction.

Yiming Mo, Hanyu Gao, Klavs F. Jensen

Deep Learning reveals 3D atherosclerotic plaque distribution and composition


No description.

Vanessa Isabell Jurtz, Grethe Skovbjerg, Casper Gravesen Salinas, Urmas Roostalu, Louise Pedersen, Bidda Charlotte Rolin, Michael Nyberg, Martijn van de Bunt, Camilla Ingvorsen

Enzymatic link prediction for biochemical route synthesis


Using graph embedding and neural networks to predict missing enzymatic links and characteristics of putative catalyzing enzymes.

Julie Jiang, Li-Ping Liu, Soha Hassoun

Integrating Deep Neural Networks and Symbolic Inference for Organic Reactivity Prediction


We present a machine-intelligence approach that predicts reaction products by integrating deep neural networks with probabilistic rule-based inference grounded in chemical knowledge, achieving more than 90% accuracy on the USPTO dataset.

Wesley Wei Qian, Nathan Russell, Claire L. W. Simons, Yunan Luo, Martin D. Burke, Jian Peng

Machine Learning guided peptide design in a drug discovery context


No description.

Simone Fulle, Carsten Stahlhut, Søren Berg Padkjær

Synthetic Planning and Library Optimization for Automated Cross Coupling Synthesis Platforms


We present algorithms for automation-friendly synthetic planning of linear natural products, optimization of building block libraries, and characterization of the necessary cross couplings.

Nathan Russell, Andrea Palazzolo, Claire Simmons, Martin Burke, Jian Peng

Learning Neural Retrosynthetic Planning


We propose a novel tree search algorithm for retrosynthetic planning which exploits neural modules. This enables the learning of generalizable structures from prior experiences to promote planning efficiency and effectiveness in new tasks.

Binghong Chen, Hanjun Dai, Chengtao Li, Le Song

Explicit geometric information in molecular property prediction


No description. 

Martin Vögele, Ron Dror

Applications of the ATOM Modeling PipeLine (AMPL) for Pharmacokinetics Property Prediction


The ATOM Modeling PipeLine (AMPL) is an open source software library for building and sharing machine learning models that predict biological activities or physicochemical and pharmacokinetic properties.

Benjamin D. Madej

Generative Models for Graph-Based Protein Design


We introduce conditional language models for protein sequences that directly condition on a graph specification of a 3D structure.

John Ingraham, Vikas K. Garg, Regina Barzilay, Tommi Jaakkola

Transform drug discovery and development with the integration of AI, physics models, and computational resources: an experience from XtalPi


Can we integrate several AI models to provide a unified solution to accelerate the R&D process?

Sivakumar Sekharan, Lipeng Lai, et al.

Prediction of IR Spectra with Machine Learning


A method for the prediction of IR spectra from the molecular structure of novel molecules using a machine learning model.

Michael Forsuelo, Charles McGill, Yanfei Guan, William Green

Data-efficient machine learning guided protein engineering


A machine learning approach to rapidly optimize proteins with only small amounts of training data.

Surojit Biswas, Grigory Khimulya, Ethan C. Alley, George M. Church

Learning a molecular latent space organized by binding affinities


We propose to learn a molecular latent space organized by binding affinities by adding multitask binding-affinity prediction to a variational autoencoder.

Jacques Boitreaud, Carlos Oliver, Vincent Mallet, Jerome Waldispühl

Rational Design of a Parallel Synthesis Program for the Optimization of Anti-fungal HDAC Inhibitors


No description.

Benjamin Merget, C. Wiebe, A. Koch

ChemBERT: A pretrained language model for the extraction of chemical reaction information


We present ChemBERT, a pre-trained language model based on BERT, which enables automated chemical reaction extraction. 

Jiang Guo, A.S. Ibanez-Lopez, Victor Quach, Regina Barzilay

Predicting the impact of somatic mutations using cell painting


We overexpress mutated genes in cultured cells and profile the resulting phenotypes with images to identify the mutations' impact on cancer development.

Juan C. Caicedo, Shantanu Singh, Alice Berger, Angela Brooks, Jesse Boehm, Anne E. Carpenter

Predicting Drug Activity using Combined Phenotypic Features and Chemical Structures


We use machine learning to predict the biochemical activity of a compound based on its chemical structure combined with gene expression and image-based profiles of cells exposed to it.

Tim Becker, Kevin Yang, Juan Caicedo, Shantanu Singh, Tommi Jaakkola, Regina Barzilay, Anne E. Carpenter

Learning mass spectrometry fragmentation of small molecules


Discovering small molecules using mass spectral database search based on probabilistic modeling.

Liu Cao, Alexey Gurevich, Hosein Mohimani

LigandNet: A machine learning toolbox for predicting ligand activity towards therapeutically important proteins


A machine learning toolbox for predicting ligand activity towards therapeutically important proteins. 

Md Mahmudulla Hassan, Danielle Castaneda-Mogollon, Govinda KC, and Suman Sirimulla

Deep Learning-Generated Potential NMDA Receptor Antagonists Using a Variational Autoencoder


This work presents the application and assessment of deep learning software for the generation of de novo candidate antagonists of the NMDA receptor.

Katherine J. Schultz, Sean M. Colby, Yasemin Yesiltepe, Jamie R. Nuñez, Monee Y. McGrady, Ryan R. Renslow

CORE: Automatic Molecule Optimization Using Copy and Refine Strategy


A deep generative model that uses a copy-and-refine strategy to modify molecules toward more desirable properties.

Tianfan Fu, Cao Xiao, Jimeng Sun

Regioselectivity Predictions Using Graph Network and Quantum Descriptors


A graph neural network (GNN) model that uses quantum descriptors to predict the major products of general regioselective reactions.

Yanfei Guan, Thomas Struble, Oscar Wu, Lagnajit Pattanaik, Connor Coley, William H. Green, Klavs F. Jensen

Property Prediction for 2-Molecule Mixtures


We explore different techniques for property prediction of a molecular pair.

Allison Tam, Octavian Ganea, Gary Bécigneul, Regina Barzilay

Optimizing Synthesis Plans for Molecular Libraries


This poster presents an optimization strategy to plan syntheses for a molecular library.

Hanyu Gao, Jean Pauphilet, Thomas J. Struble, Connor W. Coley, William H. Green, Klavs F. Jensen

Analyzing Learned Molecular Representations for Property Prediction


We carefully benchmark property prediction models on public and proprietary datasets and introduce a new graph convolution model which outperforms previous baselines.

Kevin Yang, Kyle Swanson, Wengong Jin, Connor Coley, Philipp Eiden, Hua Gao, Angel Guzman-Perez, Timothy Hopper, Brian Kelley, Miriam Mathea, Andrew Palmer, Volker Settels, Tommi Jaakkola, Klavs Jensen, and Regina Barzilay

Machine Learning for Ligand Design in Palladium-Catalyzed Cross Coupling Reactions


Work toward in silico ligand design for Buchwald-Hartwig cross-coupling reactions.

Jessica Xu, Thomas Struble, Yanfei Guan, Priscilla Liow, Joseph M. Dennis, Stephen L. Buchwald, and Klavs F. Jensen

Practical constraints on machine learning in drug discovery - a case study


We identify several key challenges that need to be overcome for machine learning to be deployed in real-world drug discovery, propose a set of tools to address them and conclude with a case study highlighting the computational design of a potent compound inhibiting uptake of α-Synuclein by neuronal iPS cells with applications in Parkinson’s Disease.

Dominique Beaini, Lu Zhu, Daniel Cohen, Sébastien Giguère

Thermodynamic Properties of Materials: Towards the Prediction of Solubility


Prediction of solubility using directed message passing neural networks.

Florence H. Vermeire, William H. Green

Wasserstein Graph Representation and Generation


We perform graph representation and generation by combining message passing neural networks and the optimal transport geometry.

Gary Bécigneul, Benson Chen, Octavian Ganea, Tommi Jaakkola, Regina Barzilay

Drug-BERT: Pre-training Drug Sub-structure Representation for Molecular Property Prediction


In this work, we explore how to learn effective drug representations with BERT by leveraging massive amounts of unlabeled molecular data and bidirectional self-attention for improved molecular property prediction.

Kexin Huang, Cao Xiao, Lucas M. Glass, Jimeng Sun

3D Molecular Representation and Modeling using Deep Learning


We will present a method for representing and generating 3D molecular structures using a generative adversarial network (GAN).

Tomohide Masuda, Matthew Ragoza, David Ryan Koes

Inductive Transfer Learning for Molecular Activity Prediction


We propose the Molecular Prediction Model Fine-Tuning (MolPMoFiT) approach, an effective transfer learning method that can be applied to any QSPR/QSAR problem.

Xinhao Li, Denis Fourches

Learning accurate generative models of chemical structures from limited training examples


We explore strategies for training text-based generative models of chemical structures in low-data situations, and apply these insights to train generative models of naturally occurring small molecules (metabolites) in several species.

Michael A. Skinnider, Leonard J. Foster

Hierarchical Graph-to-Graph Translation for Molecules


A graph-to-graph translation model for optimizing molecular properties.

Wengong Jin, Regina Barzilay, Tommi Jaakkola

Practical Application of Deep Learning to Drug Discovery Project Data


We will describe application and validation of Alchemite™, a novel deep learning method for data imputation, to drug discovery data covering two pharma projects and diverse endpoints.

Thomas Whitehead, Ben Irwin, Julian Levell, Matthew Segall, Gareth Conduit

Analyzing uncertainty estimates for deep molecular property prediction


This work takes a step towards an accurate characterization of DNN-based uncertainty for molecular property prediction, providing experimental insights based on both public and industry datasets.

Gabriele Scalia, Colin Grambow, Barbara Pernici, Drew Wicke, Vladimir Chupakhin, Hugo Ceulemans, William H. Green

A deep learning approach to antibiotic discovery


To address the rapid emergence of antibiotic-resistant bacteria, we trained a deep neural network capable of predicting molecules with antibacterial activity.

Jonathan M. Stokes, Kevin Yang, Kyle Swanson, Wengong Jin, Andres Cubillos-Ruiz, Nina M. Donghia, Craig R. MacNair, Shawn French, Lindsey A. Carfrae, Zohar Bloom-Ackerman, Victoria M. Tran, Anush Chiappino-Pepe, Ahmed H. Badran, Ian W. Andrews, Emma J. Chory, George M. Church, Eric D. Brown, Tommi S. Jaakkola, Regina Barzilay, and James J. Collins

Application of Multi-Armed Bandit Problems to Ensemble FMO to Efficiently Propose Promising Idea Compounds


Application of multi-armed bandit problems to ensemble FMO to efficiently propose promising idea compounds.

Kenichiro Takaba

Performance of Reaction Datasets in Machine Learning Approaches to Synthesis Planning and Use in Restricted Domains


The overlap and performance of reaction datasets for the task of retrosynthetic planning are explored, alongside the potential to restrict their domain for specialized applications.

Amol Thakkar, Jean-Louis Reymond, Ola Engkvist, Esben Jannik Bjerrum

Transfer Learning: A Next Key Driver of Accelerating Materials Discovery with Machine Learning


Transfer learning gives us an opportunity to explore the extrapolation region.

Chang Liu, Hironao Yamada, Stephen Wu, Yokinori Koyama, Ryo Yoshida

Deep generative workflow for the real-world design of novel lead compounds


Standigm developed an AI de novo drug design workflow from target to lead compound which automates molecule generation, prioritization and optimization for further synthesis and testing.

Sang Ok Song, Jae Hong Shin, Sanghyung Jin, Minkyu Ha, Jiho Yoo, Jinhan Kim

Increasing the applicability domain of QSAR models with meta-learning


We use episodic meta-learning to learn to extrapolate and to increase the applicability domain of QSAR models.

Prudencio Tossou, Basile Dura

Interpretable graph neural networks for molecular property prediction


We propose a new hierarchical graph pooling layer that improves graph neural network performance and interpretability on several molecular property prediction benchmarks.

Emmanuel Noutahi, Dominique Beaini, Julien Horwood, Prudencio Tossou

Machine Learning in Quantum Chemistry: Art or Science?


We present a deep neural network (DNN) that not only outperforms DFT in predicting energies and electron densities of organic molecules, but also has a strong theoretical foundation.

Anton V. Sinitskiy, Daniil V. Izmodenov, Iosif V. Leibin, Georgiy K. Ozerov, Dmitry S. Bezrukov, Vijay S. Pande

Fully Convolutional Neural Network Models for Materials Science Applications


A new methodology for variational autoencoders.

Abraham Stern, Michelle Gill, Dave Magley, Ellen Du, Ryan Marson, Jonathan Moore, Bart Rijksen, Joey Storer, Sukrit Mukhopadhyay

Data Augmentation and Pre-training for Template-Based Retrosynthetic Prediction


This poster presents results of pre-training a reaction template relevance neural network on generalized template applicability.

Michael E Fortunato, Connor W Coley, Brian C Barnes, Klavs F Jensen

Efficient Modeling of Reaction Outcomes Using Active Machine Learning


Through retrospective analysis of existing reactivity data derived from high-throughput reaction screening, we demonstrate that active machine learning can be used to reduce the experimental burden associated with this type of screening.

Natalie S. Eyke, William H. Green, Klavs F. Jensen

Reaction Condition Prediction


We propose a model based on a Weisfeiler-Lehman network to predict reaction conditions.

Jiannan Liu, Hanyu Gao, Thomas Struble, Klavs F. Jensen

Graph dynamical networks for unsupervised learning of atomic scale dynamics in materials


We developed an unsupervised learning approach to learn the dynamics of atoms or small molecules in materials from time-series molecular dynamics simulation data.

Tian Xie, Arthur France-Lanord, Yanming Wang, Yang Shao-Horn, Jeffrey Grossman

Generative Molecular Design for Lead Optimization: Demonstration by Discovery of Potent, Selective Aurora Kinase B Inhibitors with Favorable Candidate Quality Attributes


The ATOM consortium developed a platform for generative molecular design and demonstrated its application in lead optimization through the discovery of potent, selective Aurora kinase B inhibitors with favorable candidate quality attributes.

Andrew D Weber, Jason Z Deng, Kevin McLoughlin, Jeffrey Mast, Thomas Sweitzer, Juliet McComas, Margaret Tse, Derek Jones, Jonathan Allen, Stacie Calad-Thomson, Jim Brase, Tom Rush

Black Box Recursive Translations for Molecular Optimization


Motivated by molecular optimization as a translation task, we develop Black Box Recursive Translation (BBRT), a drop-in framework where we explore and demonstrate the empirical benefits of recursive inference, vis-à-vis decoding strategies, as applied to unconstrained molecular optimization.

Farhan Damani, Vishnu Sresht, Stephen Ra

Leveraging non-structural data to predict structures of protein–ligand complexes


We present a method that improves the performance of small-molecule-ligand binding pose prediction by considering other ligands known to bind the target protein.

Joseph Paggi, Ron Dror

Use of Full 3D Electronic Structure with a Convolutional Neural Network for Prediction of Molecular and Energetic Material Properties


We use full, raw 3D electronic structure gas-phase quantum chemistry calculations as input to a convolutional neural network for the prediction of a variety of molecular and condensed phase material properties, including transfer learning to a small experimental dataset.

Brian C. Barnes, Alex D. Casey, Betsy M. Rice, Steven F. Son

Leveraging non-structural data to predict structures of protein–ligand complexes


We present a method that improves the accuracy of small-molecule-ligand binding pose prediction by considering other ligands known to bind the target protein.

Joseph M. Paggi, Ron O. Dror

Deep Learning of Activation Energies


A message passing neural network predicts the activation energies of gas-phase chemical reactions.

Colin A. Grambow, Lagnajit Pattanaik, William H. Green

Encoding periodic trends for fully transferable neural network potentials


An encoding of periodic trends is employed to develop a high-dimensional neural network potential which is constant in size with respect to the number of unique elements in the training data.

John E. Herr, Kevin Koh, Kun Yao, John Parkhill

Olympus: A Toolkit for Benchmarking Optimization Algorithms on Experimentally Derived Surfaces


We present Olympus, a framework for benchmarking optimization algorithms on experimentally derived surfaces. Olympus includes a collection of datasets from physics, chemistry, and materials science, and a suite of optimization algorithms, from random search to Bayesian and evolutionary methods, that can be easily accessed via a user-friendly Python interface. In addition, Olympus allows users to test custom algorithms and user-defined benchmark datasets. Here, we will present the user interface and a few examples of how Olympus can be used as a benchmark toolkit for optimization, as well as a simulation platform for evaluating hypotheses prior to experimental testing.

F. Häse, R.J. Hickman, M. Aldeghi, E. Liles, L.M. Roch, J.E. Hein, A. Aspuru-Guzik

Smart Data Analytics in Pharmaceutical Manufacturing


This poster describes a robust and automated approach for data analytics method selection and model construction that allows the user to focus on goals rather than methods.

Weike Sun, Kristen A. Severson, Richard D. Braatz

Generating 3D transition state structures with deep learning


Using a dataset generated from quantum chemistry, we predict 3D transition state structures from reactant and product geometries.

Lagnajit Pattanaik, John Ingraham, Colin Grambow, Regina Barzilay, Tommi Jaakkola, Klavs Jensen, William Green

Document embedding centroids: new and versatile descriptors for biomedical entities


We show that Doc2Vec-generated embeddings for biomedical entities can be useful for nearest neighbor, clustering, and association learning tasks.

John P. Santa Maria Jr., Eugen Lounkine, Jeremy Jenkins

Predicting therapeutic targets of small molecules using an integrated machine learning framework


An iterative machine learning framework that integrates multiple data sources to link small molecules to therapeutic targets.

Nishanth Merwin, Dr. Victoria Catterson, Dr. Gabriel Musso

Finding diverse routes of organic synthesis using surrogate-accelerated Bayesian retrosynthesis


We propose a sequential Monte Carlo algorithm that finds diverse synthesis routes to a target molecule within a Bayesian framework, and we accelerate its sampling with a surrogate model.

Zhongliang Guo, Stephen Wu, Ryo Yoshida

Augmenting protein network embeddings with sequence information


We learn protein representations by integrating data from physical interaction and amino acid sequence.

Hassan Kané, Mohamed Coulibali, Pelkins Ajanoh, Ali Abdalla

Learning Representations of DNA Sequences for Low Resource Promoter Characterization


A method for learning effective deep learning representations of promoter sequences from low-resource data.

Peter Morales, Rajmonda Caceres, Matt Walsh, Christina Zook, Catherine Van Praagh, Nicholas Guido, Todd Thorsen

Antibody developability prediction with machine learning and a high-quality data set


We will present the use of machine learning models and feature engineering for predicting mAb developability.

Lei Jia, Yax Sun, Alex Jacobitz, Vladimir Razinkov, Marissa Mock, Nic Angell, Peter Grandsard, BAT team

Benchmarking the Synthesizability of Molecules Proposed via de novo Generative Models


We quantified synthetic feasibility with a data-driven computer-aided synthesis planning program and analyzed approaches to bias generative models toward synthesizable space.

Wenhao Gao, Connor Coley, Klavs Jensen

Using big data and machine learning to understand T cell dysfunction in human tumors


Big-data genomics and machine learning can be used to understand T cell dysfunction in human tumors. We created an in vitro system to improve our understanding of how T cells become exhausted in tumors and how these data can be used to identify new targets and biomarkers.

Simarjot Pabla, Tenzing Khendu, Cailin Joyce, Benjamin Duckless, Andrew Basinski, Matthew Hancock, Jeremy Waight, Mariana Manrique, Jennifer Buell, Alex Duncan, David Savitsky, Lukasz Swiech, Thomas Horn, John Castle

PostEra - The GPS for Chemistry


We present a machine learning methodology for lead optimization and scaffold hopping which combines synthesis prediction and uncertainty-calibrated bioactivity prediction.

Alpha Lee, Matthew Robinson, Aaron Morris

The Center for Computer-Assisted Synthesis: An Overview


Overview of the newly established Phase I CCI. 

Olaf Wiest, Nitesh Chawla, Abigail Doyle, Robert Paton, Richmond Sarpong, Matthew Sigman

Molecule Structure Elucidation given Mass Spectrum and Chemical Formula


A method that builds a molecular structure one bond at a time, given its chemical formula and observed mass spectrum.

David Jian Yi Lee, Bingquan Shen, Hai Leong Chieu

AI-assisted lead optimization with SynSpace


The poster will introduce the SynSpace de novo design software with built-in synthetic feasibility and retrosynthesis capability that provides a user-friendly environment for drug hunters and a versatile design engine for automated platforms.

Istvan Szabo, Greg Makara, Gabor Pocze, Laszlo Kovacs, Anna Szekely

Machine learning guided process development for complex systems with multiple objectives


Enhancing process development via a hybrid approach that combines prior knowledge and machine learning.

Perman Jorayev, Danilo Russo, Paul Deutsch, Alexei Lapkin

Applying state-of-the-art deep learning models to design novel JAK inhibitors


Applying state-of-the-art deep learning models to design novel JAK inhibitors.

Vykintas Jauniškis, Ole Winther, Daniel R. Greve

Exploring Fragment-based Target-specific Ranking Protocol with Machine Learning on Cathepsin S


We apply fragmentation and machine learning methods to rank the activities of Cathepsin S inhibitors, a task that is challenging for conventional methods because of the inhibitors' large size, high flexibility, and similar chemical components.

Yuwei Yang

Optical Graph Recognition of Chemical Compounds by Deep Learning


We present a deep neural network model for recognizing graph structures from chemical compound images.

Martijn Oldenhof, Adam Arany, Yves Moreau, Jaak Simm

Computational Pipeline to Discover Inhibitors of RPN11 using Machine Learning and Image Processing


Our drug discovery pipeline combines molecular docking and molecular dynamics simulations with powerful image processing and machine learning techniques.

Aayush Gupta, Huan-Xiang Zhou

Accelerating lead optimization with active learning by exploiting MMPA based ADMET knowledge with regression forest potency models


Combining rules for ADMET issues derived from matched molecular pair analysis (MMPA) with regression-forest-based potency scoring in an active learning mode can accelerate lead optimization.

Alexander G. Dossetter, Edward J. Griffen, Andrew G. Leach, Phillip de Sousa

AI Driven Iterative Screening for Hit Finding


Iterative screening can lead to drastically improved efficiency in hit discovery compared to the traditional HTS paradigm.

Gabriel Dreiman, Magda Bictash, Paul Fish, Lewis Griffin, Fredrik Svensson