ORNL is managed by UT-Battelle, LLC for the US Department of Energy
Artificial Intelligence Enabled Multiscale Molecular Simulations

Arvind Ramanathan, Team Lead, Integrative Systems Biology, Computational Sciences and Engineering Division, Health Data Sciences Institute, Oak Ridge National Laboratory, Oak Ridge
ramanathana@ornl.gov | http://ramanathanlab.org

Dmitry I. Liakh (NCCS), Ramakrishnan Kannan (CSMD), Srikanth Yoginath (CSMD), Christopher B. Stanley (CSED), Debsindhu Bhowmik (CSED), Heng Ma (CSED)
Drive integration of simulation, data analytics, and machine learning. Molecules library: AI-driven multiscale molecular simulations

[Figure: convergence of traditional HPC systems — large-scale numerical simulation, scalable data analytics, and deep learning / artificial intelligence]
Multi-scale phenomena in biological systems pose challenges for modeling/simulation at exascale
• Simulations of physical phenomena take 45-60% of supercomputing time
• Coupled to experimental data from facilities such as the Spallation Neutron Source and X-ray scattering/diffraction facilities
• "Exascale simulations will require some analyses… be performed while data is still resident in memory…"
Towards AI-driven simulations: interleaving data analytics + simulations
• In situ analytics
• Reduced data movement and other overheads
• Online monitoring and feedback

Traditional compute + simulations: job scheduler → simulation(s) on "big iron" → data storage (disks, "Big Store") → analytics and visualization on dedicated analytics clusters. Unsustainable at exascale:
• Data movement bottlenecks
• Parallel analytics bottlenecks

Interleaving analytics + simulations: job scheduler drives simulation(s), analytics, and visualization together on the "big iron" in an iterative forward/backward loop, with data storage (disks) off the critical path.
• High-performance framework to monitor and analyze simulations as they are running, with little/no modification to simulation software
• Demonstrated on molecular dynamics simulations, but the framework generalizes for broad applicability
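The interleaving loop can be sketched as follows; `run_md_chunk` and `analyze` are hypothetical stand-ins for a production MD engine and an in situ analytic (here, radius of gyration), not part of any real framework:

```python
import numpy as np

def run_md_chunk(state, n_steps=100):
    """Stand-in for one chunk of an MD simulation: random-walk the coordinates."""
    rng = np.random.default_rng(0)
    return state + 0.01 * rng.standard_normal(state.shape)

def analyze(frame):
    """In situ analytic: radius of gyration of the current frame."""
    centered = frame - frame.mean(axis=0)
    return float(np.sqrt((centered ** 2).sum(axis=1).mean()))

def interleaved_loop(n_chunks=5):
    state = np.zeros((10, 3))          # 10 particles, 3D coordinates
    metrics = []
    for _ in range(n_chunks):
        state = run_md_chunk(state)    # simulate one chunk
        metrics.append(analyze(state)) # analyze while data is still in memory
        # feedback point: a real framework could steer or respawn simulations here
    return metrics

print(interleaved_loop())
```

The key point the sketch illustrates: analysis runs between simulation chunks on data still resident in memory, so nothing is written to disk before feedback can occur.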
Outline: Can artificial intelligence (AI) techniques be leveraged for accelerating molecular simulations?
• Building effective (low-dimensional) latent representations of simulation datasets:
  – Using deep learning approaches for molecular dynamics (MD) data
  – Scaling convolutional variational autoencoders for MD
• Predicting where we should go next in MD simulations:
  – Building a recurrent autoencoder to predict future steps
• Preliminary work on a reinforcement learning approach for protein folding/docking
A variational approach to encode protein folding with convolutional auto-encoders
[Figure: convolutional VAE architecture — MD simulations → contact matrices (input) → 4 convolution layers → dense → mean/std → sampled latent embedding → dense → 4 deconvolution layers → reconstructed contact matrices (outputs); VAE embedding colored by number of native contacts]

4 convolution layers / 4 deconvolution layers:
1. 64 filters, 3x3 window, 1x1 stride, ReLU
2. 64 filters, 3x3 window, 1x1 stride, ReLU
3. 64 filters, 3x3 window, 2x2 stride, ReLU
4. 64 filters, 3x3 window, 1x1 stride, Sigmoid
Reduced dimension: 3
D. Bhowmik, M. T. Young, S. Gao, A. Ramanathan, BMC Bioinformatics (accepted)
Related work: Hernandez et al. 2017 (arXiv); Doerr et al. 2017 (arXiv)
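The "mean/std → sampled latent embedding" step in the architecture above is the VAE reparameterization trick. A minimal NumPy sketch (the zero-mean, unit-variance inputs are illustrative values, not taken from the trained model):

```python
import numpy as np

def sample_latent(mean, log_var, rng):
    """Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, I).
    Writing the sample this way keeps it differentiable w.r.t. mean and log_var."""
    eps = rng.standard_normal(mean.shape)
    return mean + np.exp(0.5 * log_var) * eps

rng = np.random.default_rng(42)
mean = np.zeros(3)      # reduced dimension: 3, matching the slide
log_var = np.zeros(3)   # log-variance of 0 -> unit variance
z = sample_latent(mean, log_var, rng)
print(z.shape)          # (3,)
```

In the full model, `mean` and `log_var` are produced by the dense layer after the convolutional encoder, and `z` feeds the dense layer before the deconvolutional decoder.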
Deep learning reveals "metastable states" in protein folding…
MSMBuilder datasets, Pande group
Scaling & Performance: Enabling DL approaches to achieve near real-time training/prediction
[Figure: training time per epoch (sec) vs. number of GPUs (6, 60, 120, 384), batch size 128]

• Scaling deep learning on Summit to facilitate online training
• DeepEx: a custom-built deep learning stack for Summit
• Exploiting the low-rank structure of scientific data:
  – Accelerate training
  – Scale to larger datasets
• Performance on ResNet-like convolutional nets

S. Yoginath, M. Alam, K. Perumalla, R. Kannan, D. Bhowmik, A. Ramanathan, ORNL Tech Report
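The low-rank idea can be illustrated generically with a truncated SVD: if the data matrix has rank far below its dimensions, a handful of components reconstructs it almost exactly, so downstream training can work on the much smaller factors. This is a NumPy illustration of the principle, not the DeepEx implementation:

```python
import numpy as np

# Synthetic "scientific data" with low-rank structure: a 200x100 matrix of rank 5
rng = np.random.default_rng(0)
U = rng.standard_normal((200, 5))
V = rng.standard_normal((5, 100))
X = U @ V

# Truncated SVD keeps only the top-k singular components
u, s, vt = np.linalg.svd(X, full_matrices=False)
k = 5
X_k = u[:, :k] * s[:k] @ vt[:k]   # rank-k reconstruction

rel_err = np.linalg.norm(X - X_k) / np.linalg.norm(X)
print(rel_err < 1e-8)  # rank-5 data is recovered from just 5 components
```

Storing `u[:, :k]`, `s[:k]`, and `vt[:k]` needs (200 + 1 + 100) × 5 numbers instead of 200 × 100, which is the kind of saving that accelerates training and lets larger datasets fit.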
[1] N. Srinivas, A. Krause, S. M. Kakade, and M. W. Seeger. Gaussian process bandits without regret: An experimental design approach. CoRR, abs/0912.3995, 2009.
[2] S. Grünewälder, J.-Y. Audibert, M. Opper, and J. Shawe-Taylor. Regret bounds for Gaussian process bandit problems. In AISTATS 2010, Thirteenth International Conference on Artificial Intelligence and Statistics, volume 9, pages 273–280, Chia Laguna Resort, Sardinia, Italy, May 2010.
Current platforms for hyperparameter optimization rely on sequential optimization techniques
• Bayesian optimization, bandit optimization, and related methods are usually sequential search processes
• Exponential scaling:
  – The number of samples required to bound our uncertainty about the optimization procedure scales exponentially with the number of search dimensions, as 2^D, where D is the number of dimensions [1, 2].
  – This is often forgotten amid the recent excitement around Bayesian optimization.
• HyperSpace instead focuses on the structure of the search space:
  – Parallelism to exploit the statistical structure of the search space
  – Reveal partial dependencies across parameter spaces
  – Build many surrogate functions in parallel
  – {Prayer}!
HyperSpace: Distributed Parallel Bayesian Optimization
HyperSpace: parallel exploration of large search spaces
1. Define the bounds of each hyperparameter search space.
2. Divide each search space bound into two nearly equal sub-bounds with overlap ɸ, where {ɸ ∈ ℝ | 0 ≤ ɸ ≤ 1}.
3. Create all possible combinations of hyperparameter sub-bounds to form 2^D search spaces (hyperspaces), where D is the number of model hyperparameters.
4. Run Bayesian optimization over each hyperspace in parallel.
M. Todd Young, J. D. Hinkle, R. Kannan, A. Ramanathan, HyperSpace: Massively Parallel Bayesian Optimization, Workshop on High Performance Machine Learning, 2018, Lyon, France
https://github.com/yngtodd/hyperspace
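Steps 1-3 above can be sketched in pure Python. `split_bound` and `make_hyperspaces` are hypothetical names chosen for this illustration (not the HyperSpace library's API); step 4 would hand each resulting hyperspace to its own Bayesian optimizer running in parallel:

```python
from itertools import product

def split_bound(low, high, phi=0.25):
    """Split [low, high] into two nearly equal sub-bounds that overlap
    around the midpoint; phi in [0, 1] controls the overlap width."""
    mid = (low + high) / 2.0
    pad = phi * (high - low) / 2.0
    return [(low, mid + pad), (mid - pad, high)]

def make_hyperspaces(bounds, phi=0.25):
    """All combinations of per-dimension sub-bounds: 2**D hyperspaces
    for D hyperparameter dimensions."""
    halves = [split_bound(lo, hi, phi) for lo, hi in bounds]
    return list(product(*halves))

# Two hyperparameters (D = 2) -> 2**2 = 4 overlapping hyperspaces
bounds = [(0.0, 1.0), (10.0, 100.0)]
spaces = make_hyperspaces(bounds)
print(len(spaces))  # 4
```

The overlap ɸ is what lets neighboring optimizers share information near sub-bound boundaries instead of leaving blind spots at the seams.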
Parallel exploration of large search spaces works better than random- or sequential-search-based optimization
[Figure: comparison of HyperSpace, random search, and SMBO]
https://hpml2018.github.io/HPML2018_7/index.html#/
• Exploiting statistical dependencies across the hyperparameter dimensions leads to a better set of parameters for ML models
Outline: Can AI techniques be leveraged for biological experimental design?
• Building effective (low-dimensional) latent representations of simulation datasets:
  – Using deep learning approaches for molecular dynamics (MD) data
  – Scaling convolutional variational autoencoders for MD
• Predicting where we should go next in MD simulations:
  – Building a recurrent autoencoder to predict future steps
• Preliminary work on a reinforcement learning approach for protein folding/docking
RL-Fold: a naïve design based on native structure
[Figure: RL-Fold workflow — Start: run MD simulations → output trajectories → extract contact matrices → input to CVAE → output latent data → DBSCAN clustering → "Outliers are present?" If yes, save topologies to spawn new simulations and calculate reward; if no, stop and signal the end of the episode. Start and goal states drawn as a coffee pot and a coffee mug. Reward terms: number of native contacts in the i-th sample, RMSD to the native state vs. an RMSD threshold, number of conformers in the cluster of the i-th sample, number of samples, number of clusters.]
• Baseline: work with a manually set threshold for RMSD from the native state
• Deep RL: design a policy network that auto-detects the RMSD threshold
• Hypothesis: RL will allow sampling of "unseen" states
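The "extract contact matrices" step of the loop can be sketched with NumPy. The 8.0 distance cutoff and the toy coordinates below are illustrative assumptions, not values taken from the slides:

```python
import numpy as np

def contact_matrix(coords, cutoff=8.0):
    """Binary contact map: residues i and j are 'in contact' when their
    pairwise distance is below the cutoff (same units as coords)."""
    diff = coords[:, None, :] - coords[None, :, :]   # all pairwise displacements
    dist = np.sqrt((diff ** 2).sum(axis=-1))         # NxN distance matrix
    return (dist < cutoff).astype(np.uint8)

# Toy 4-residue chain spaced 5 units apart along x
coords = np.array([[0.0, 0, 0], [5.0, 0, 0], [10.0, 0, 0], [15.0, 0, 0]])
C = contact_matrix(coords)
print(C.shape)       # (4, 4)
print(int(C.sum()))  # 10: the diagonal plus the 3 adjacent pairs (symmetric)
```

Each MD trajectory frame yields one such matrix; stacks of these matrices are the CVAE's input in the workflow above.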
A pre-trained deep learning model allows RL to explore possible states in protein folding
How does the folding look?
• Within 3-4 iterations, RL reaches near-native-state RMSD
• Further cycles explore misfolded states:
  – These unfold within a few steps of MD simulation
  – Sampling allows exploration of more intermediate states
• Builds on all-atom simulations + RL in a loop
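The near-native-state check relies on RMSD to the native structure. A minimal sketch follows; note it assumes the two conformations share atom ordering and are already superposed (a production implementation would first align them, e.g. with the Kabsch algorithm):

```python
import numpy as np

def rmsd(a, b):
    """Root-mean-square deviation between two Nx3 coordinate sets with the
    same atom ordering. No superposition/alignment is performed here."""
    return float(np.sqrt(((a - b) ** 2).sum(axis=1).mean()))

native = np.zeros((5, 3))
sampled = native + 0.5        # every atom displaced by (0.5, 0.5, 0.5)
print(rmsd(sampled, native))  # sqrt(3 * 0.25) = sqrt(0.75) ~= 0.866
```

In the RL loop, this value is compared against the RMSD threshold (manually set in the baseline, auto-detected by the policy network in the deep-RL variant).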
Summary
• Deep learning / AI techniques show promise: learning biophysical characteristics that can be used to guide simulations
• Reinforcement learning: preliminary evidence suggests the approach is feasible for speeding up protein folding simulations!
  – How to integrate with physics-based models?
  – How to build scalable approaches beyond RL?
  – How to integrate with sparse experimental observables?
• Enabling iterative, active, and optimal experimental design
• Extensible library: Molecules, enabling analysis of MD simulations at scale, with Deep(µ)scope supporting AI-driven MD simulations
Some emerging challenges in HPC for multi-scale simulations…
• Design of coupled data-analytic and simulation workflows on OLCF Summit and ALCF A21/Theta
  – In situ analytics approaches are required
  – Streaming applications of ML are different from post-processing of data
• Scaling DL/AI approaches for biomolecular simulations
  – Faster and more efficient training for deep learning / AI approaches
  – Tensor-based approaches to build deep learning algorithms
THANK YOU!
• ORNL LDRD – Exascale computing initiative
• DOE-NCI Joint Design of Advanced Computing Solutions for Cancer (JDACS4C)
• DOE Exascale Computing Project – Cancer Deep Learning Environment (CANDLE)
• OLCF Early Science Access (Summit)
Questions/Comments: ramanathana@ornl.gov