GPU-Based Bayesian Phylogenetic Inference Beyond Extreme … · 2014. 4. 18.

BEAST/BEAGLE Phylogenetics Software

Mitch Horton – Keeneland Advanced Application Development Team

GTC 2014 March 24-27, 2014 | San Jose, CA

BEAST/BEAGLE Phylogenetics Software

Enormous code base written in Java, C++, CUDA

BEAST version 1.4.6 consists of 81,000 lines of Java, 779 classes, and 81 packages

Scaled to run on 120 Keeneland nodes (360 GPUs)

Keeneland Project

Keeneland is a project investigating the use of GPU accelerators with commodity microprocessors for high-performance scientific computing. A significant component of the Keeneland project is to reach out to teams developing applications that might map well to this innovative architecture.

Phylogenetics

In biology, phylogenetics is the study of evolutionary relationships among groups of organisms, which are discovered through molecular sequencing data.

Phylogenetics

AGTTCGATCCG

AGTGCGATCCG

AGTCCGAACAG

AGTCCGATGCC

AGTCCGAACCG

GGTCCGATCCG

AGTCAGAGCCG

AGTAAGAGCCG

AGTCCGACCAG

AGTCCGAGCCG

Phylogenetics

Define $T(n)$ as the number of possible unrooted bifurcating topologies for $n$ taxa.

$$T(n) = \prod_{i=3}^{n} (2i - 5)$$

Thus, $T(n)$ grows factorially with increasing $n$, and becomes almost unimaginably large for $n > 50$ (e.g. $T(50) \approx 2.84 \times 10^{76}$). - Derrick Joel Zwickl (Ph.D. Thesis, 2006)

Enumerating and evaluating every possible topology would be computationally foolhardy.
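The growth of the topology count can be checked by evaluating the product formula for T(n) directly. The sketch below is only an illustration; the function name `unrooted_topologies` is hypothetical and not part of BEAST or BEAGLE. Evaluated as written, the product for n = 50 comes out as a 75-digit integer (roughly 2.8 × 10^74), so the exponent in the quoted figure may be worth rechecking against the thesis.

```python
from math import prod


def unrooted_topologies(n: int) -> int:
    """Evaluate T(n) = prod_{i=3}^{n} (2i - 5) with exact integer arithmetic."""
    if n < 3:
        raise ValueError("this sketch defines T(n) only for n >= 3")
    return prod(2 * i - 5 for i in range(3, n + 1))


# Print a few values to show how quickly the count grows.
for n in (4, 5, 10, 50):
    print(n, f"{unrooted_topologies(n):.3e}")
```

Exact integers are used on purpose: Python integers do not overflow, so the digit count for large n can be read off directly.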

Monte Carlo Markov Chain Phylogenetics

Startling, recent advances in sequencing technology are fueling a concomitant increase in the scale and ambition of phylogenetic analyses. Effort skyrockets with number of sequences, complexity of sequence characters, and complexity of sequence evolution model.

Felsenstein’s Algorithm for Likelihood

INITIALIZATION: Set $k = 2n - 1$.

RECURSION: Compute $P(L_k \mid a)$ for all $a$ as follows.

If $k$ is a leaf node: set $P(L_k \mid a) = 1$ if $a = x_k^u$, and $P(L_k \mid a) = 0$ if $a \neq x_k^u$.

If $k$ is not a leaf node: compute $P(L_i \mid a)$, $P(L_j \mid a)$ for all $a$ at the daughter nodes $i, j$, and set

$$P(L_k \mid a) = \sum_{b,c} P(b \mid a, t_i)\, P(L_i \mid b)\, P(c \mid a, t_j)\, P(L_j \mid c)$$

TERMINATION: Likelihood at site $u$:

$$P(x_\bullet^u \mid T, t_\bullet) = \sum_a P(L_{2n-1} \mid a)\, q_a$$

The concluding step in computing the likelihood is to use the assumption of independence at sites to write:

$$P(x^\bullet \mid T, t_\bullet) = \prod_u P(x_\bullet^u \mid T, t_\bullet)$$

[Figure: example tree with leaves A, C, T, G, C]
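The recursion above can be sketched in a few lines of Python. This is a minimal illustration, not the BEAGLE implementation. It assumes a Jukes-Cantor substitution model for $P(b \mid a, t)$ and uniform root frequencies $q_a = 0.25$, neither of which is stated on the slide. The tree encoding (a leaf is its sequence string, an internal node is a pair of `(child, branch_length)` tuples) and all function names are hypothetical.

```python
import math

BASES = "ACGT"
Q_A = 0.25  # assumed uniform root frequencies q_a


def jc69(a, b, t):
    """Assumed Jukes-Cantor probability P(b | a, t) along a branch of length t."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if a == b else 0.25 - 0.25 * e


def conditional(node, u):
    """Return {a: P(L_k | a)} for the subtree rooted at `node`, at site u."""
    if isinstance(node, str):  # leaf: 1 for the observed base, 0 otherwise
        return {a: 1.0 if a == node[u] else 0.0 for a in BASES}
    (child_i, t_i), (child_j, t_j) = node
    L_i = conditional(child_i, u)
    L_j = conditional(child_j, u)
    # The double sum over (b, c) factors into a product of two single sums.
    return {
        a: sum(jc69(a, b, t_i) * L_i[b] for b in BASES)
        * sum(jc69(a, c, t_j) * L_j[c] for c in BASES)
        for a in BASES
    }


def site_likelihood(root, u):
    """Termination step: sum over root states weighted by q_a."""
    L_root = conditional(root, u)
    return sum(Q_A * L_root[a] for a in BASES)


def log_likelihood(root, n_sites):
    """Independence across sites turns the product into a sum of logs."""
    return sum(math.log(site_likelihood(root, u)) for u in range(n_sites))


# Hypothetical three-taxon tree with made-up branch lengths.
inner = (("AGTT", 0.1), ("AGTG", 0.1))
tree = ((inner, 0.05), ("AGTC", 0.2))
print(log_likelihood(tree, 4))
```

Summing logarithms instead of multiplying raw site likelihoods avoids floating-point underflow when the number of sites is large.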

nvvp – 100 MCMC steps

Single GPU – 24 Hour Run

100 GPUs – 2 Hour Run

360 GPUs

GPU-Based Bayesian MCMC Phylogenetic Inference at Scale

[Figure: Performance of BEAST/BEAGLE Phylogenetics Software, comparing different software configurations. Data set: 125 sequences, 2968 sites, 5 rate categories, nucleotide model. X-axis: number of compute nodes (0 to 100). Series: single node, CPU, single core; single node, CPU, multi-core; single node, single GPU; multi-node, single GPU; multi-node, multi-GPU.]

Hardware: 120 compute nodes. 12 CPU cores per compute node (2 x 6-core X5660, 2.8 GHz, 23 GB, 270 Gflop/s peak). 3 GPUs per compute node (Tesla M2090, 1.3 GHz, 5.4 GB, 1.33 Tflop/s peak).

Batch Matrix Multiply

Each site evolves according to a Markov process in which a base $i$ (T, C, A, or G) is replaced by another base $j$ in an infinitesimally short interval of time, $dt$, with a probability $P_{ij}(dt)$ as follows. - Masami Hasegawa

Substitution probability matrix for an infinitesimally short interval of time:

$$P_{ij}(dt) = \alpha \pi_j \, dt \quad \text{(for transition)}$$

$$P_{ij}(dt) = \beta \pi_j \, dt \quad \text{(for transversion)}$$

In genetics, a transition is a point mutation that changes a purine nucleotide (A, G) to another purine or a pyrimidine nucleotide (C, T) to another pyrimidine. Although there are twice as many possible transversions, approximately two out of every three single nucleotide mutations are transitions. - Wikipedia

From the mathematical manifestation of the Markovian nature of the process:

$$P(t) = \exp(tA) = E \times \operatorname{diag}(e^{t\lambda_1}, \ldots, e^{t\lambda_S}) \times E^{-1} = E D_t E^{-1}$$

$P_{sj}(t)$: finite-time transition probabilities that characterize how state $s$ mutates to state $j$ along a branch of length $t$.
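The transition/transversion rule can be turned into a rate matrix $A$ with a short sketch. The values of `alpha`, `beta` and the base frequencies `pi` below are made-up illustration values, and the helper names are hypothetical. Setting each diagonal entry to the negative of its row's off-diagonal sum is the usual convention for a rate matrix and is assumed here rather than taken from the slide.

```python
BASES = "TCAG"
PURINES = {"A", "G"}  # C and T are the pyrimidines


def is_transition(i, j):
    """True when i -> j stays within purines or within pyrimidines."""
    return i != j and (i in PURINES) == (j in PURINES)


def rate_matrix(alpha, beta, pi):
    """Off-diagonal rate i -> j is alpha*pi[j] (transition) or beta*pi[j] (transversion)."""
    A = {}
    for i in BASES:
        row = {j: (alpha if is_transition(i, j) else beta) * pi[j]
               for j in BASES if j != i}
        row[i] = -sum(row.values())  # assumed convention: rows sum to zero
        A[i] = row
    return A


# Count the ordered substitution pairs of each kind.
pairs = [(i, j) for i in BASES for j in BASES if i != j]
n_transitions = sum(is_transition(i, j) for i, j in pairs)
n_transversions = len(pairs) - n_transitions
print(n_transitions, n_transversions)  # 4 8

pi = {b: 0.25 for b in BASES}  # illustrative uniform frequencies
A = rate_matrix(alpha=2.0, beta=1.0, pi=pi)
```

The count confirms the statement that there are twice as many possible transversions as transitions.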

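Matrix exponentiation is defined to be:

$$e^X = \sum_{k=0}^{\infty} \frac{1}{k!} X^k$$

For some simple cases, the above can be computed explicitly; otherwise, use diagonalization.

$$A = EDE^{-1} \Rightarrow A^n = ED^nE^{-1} \Rightarrow I + \frac{1}{1!}A + \frac{1}{2!}A^2 + \ldots = Ee^DE^{-1}$$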
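The series definition can be checked numerically on one of the simple cases that has a closed form. The sketch below sums a truncated Taylor series in plain Python. It does not use diagonalization. The test matrix is a Jukes-Cantor rate matrix (off-diagonal 1/3, diagonal -1), chosen as an assumption because its exponential is known: the diagonal of $P(t)$ is $0.25 + 0.75\,e^{-4t/3}$. The names `matmul` and `expm_series` are hypothetical.

```python
import math


def matmul(X, Y):
    """Plain-Python product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def expm_series(X, terms=30):
    """Approximate e^X by the first `terms` terms of sum_k X^k / k!."""
    n = len(X)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # k = 0: I
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = matmul(term, X)
        term = [[v / k for v in row] for row in term]  # term is now X^k / k!
        result = [[r + s for r, s in zip(res_row, term_row)]
                  for res_row, term_row in zip(result, term)]
    return result


t = 0.5  # illustrative branch length
Q = [[-1.0 if i == j else 1.0 / 3.0 for j in range(4)] for i in range(4)]
P = expm_series([[t * q for q in row] for row in Q])

print(P[0][0], 0.25 + 0.75 * math.exp(-4.0 * t / 3.0))
```

A truncated series is adequate here because the entries of $tQ$ are small. For general rate matrices the diagonalization route on the slide avoids recomputing the series for every branch length.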

[Figure: Performance of cublasSgemmBatched, streams, hand-written CUDA, MKL, 4x4. X-axis: number of matrices (100 to 100000). Series: hand-written CUDA (Kepler, Fermi); cublasSgemmBatched (Kepler, Fermi); MKL; streams (Fermi, Kepler).]

Hardware for the batch matrix multiply figures: CPU cores (2 x 8-core Xeon E5-2670, 2.6 GHz, 32 GB, 332 Gflop/s double precision peak). Fermi (Tesla M2090, 1.3 GHz, 5.4 GB, 665 Gflop/s double precision peak). Kepler (Tesla K20X, 0.732 GHz, 5.4 GB, 1320 Gflop/s double precision peak).

[Figure: Performance of hand-written CUDA, optimized CUDA, 4x4. X-axis: number of matrices (100 to 1e+06). Series shown in the legend: optimized CUDA (Kepler, Fermi).]

[Figure: Performance of hand-written CUDA, optimized CUDA, cublasSgemmBatched, 20x20. X-axis: number of matrices (100 to 100000). Series shown in the legend: cublasSgemmBatched (Kepler, Fermi).]

[Figure: Performance of Lagrange interpolation, Newton interpolation, optimized CUDA, 4x4. X-axis: number of matrices (0 to 60000). Series: Newton interpolation (Kepler, Fermi); Lagrange interpolation (Fermi, Kepler).]

One matrix multiply of size N: N*N*N flops for N*N memory accesses.

1024 matrix multiplies of size N/32: 1024*(N/32)*(N/32)*(N/32) flops for 1024*(N/32)*(N/32) memory accesses, i.e. (N*N*N)/32 flops for N*N memory accesses.
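The arithmetic above can be worked through for a concrete size. The sketch below uses the slide's simplified cost model (N^3 flops and N^2 memory accesses per multiply). N = 1024 is an arbitrary illustration value, and the helper name `flops_and_accesses` is hypothetical.

```python
def flops_and_accesses(size, count=1):
    """Simplified cost model: size^3 flops and size^2 memory accesses per multiply."""
    return count * size**3, count * size**2


N = 1024  # illustrative size, divisible by 32

big_flops, big_mem = flops_and_accesses(N)
small_flops, small_mem = flops_and_accesses(N // 32, count=1024)

# Flops per memory access for each case.
big_ratio = big_flops / big_mem
small_ratio = small_flops / small_mem

print(big_mem == small_mem)      # same memory traffic in both cases
print(big_flops // small_flops)  # the batch does 32x fewer flops
print(big_ratio, small_ratio)
```

Under this model, flops per memory access drop from N to N/32 when one large multiply is split into 1024 small ones. This suggests why batches of tiny matrices are harder to run near peak throughput.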
