PMML for QSAR Model Exchange
Rajarshi Guha, Ph.D., NIH Center for Advancing Translational Sciences
[email protected] / http://rguha.net



Page 2: PMML for QSAR Model Exchange

Background

• Cheminformatics
  – QSAR, diversity analysis, virtual screening, fragments, polypharmacology, networks
• RNAi screening, high content imaging
• Extensive use of machine learning
• All tied together with software development (GUIs, libraries)
• Contributed pmml.lm to the PMML package

Page 3: PMML for QSAR Model Exchange

Quantitative Structure Activity Relationships

Page 4: PMML for QSAR Model Exchange

Why is QSAR Useful?

• Lets us predict whether a chemical is likely to be toxic, avoiding animal testing
• Prioritize molecules from a high throughput screen of 300K molecules
• Predict whether a molecule will be (sufficiently) soluble in water
• Identify molecules with anti-malarial properties
• Accurate, predictive models can save significant time and money (and cute bunnies)

Page 5: PMML for QSAR Model Exchange

Lots and Lots of Models

• Hundreds of such models published in the literature
  – Usually in the form of tables of regression coefficients (if we're lucky)
  – If the paper describes an SVM model, no chance of reproducing the results
• How can we exchange QSAR models?
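When a paper reports only a table of regression coefficients, re-using the model means re-applying the linear equation by hand. A minimal sketch in base R, assuming a hypothetical coefficient table and descriptor values that were computed elsewhere; the numbers and the helper name `predict_from_table` are invented for illustration and come from no published model:

```r
# Hypothetical coefficient table, as it might be transcribed from a paper.
# All values are invented for illustration only.
coefs <- c("(Intercept)" = 10.0, TopoPSA = 0.5, VABC = 2.0)

# Descriptor values for two new molecules (also invented).
new_descs <- data.frame(TopoPSA = c(20, 40), VABC = c(1.5, 3.0))

# Apply y = b0 + sum(b_i * x_i), matching descriptor columns to
# coefficients by name and failing loudly if a descriptor is absent.
predict_from_table <- function(coefs, descs) {
  terms <- setdiff(names(coefs), "(Intercept)")
  absent <- setdiff(terms, names(descs))
  if (length(absent) > 0) {
    stop("missing descriptors: ", paste(absent, collapse = ", "))
  }
  x <- as.matrix(descs[, terms, drop = FALSE])
  as.vector(coefs[["(Intercept)"]] + x %*% coefs[terms])
}

predicted <- predict_from_table(coefs, new_descs)
```

This works only for linear models with a published table. It says nothing about how the descriptor columns were produced, and it has no equivalent for an SVM.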

Page 6: PMML for QSAR Model Exchange

QSAR Model Exchange

• Build models in …
• Save them in PMML
• Distribute
• …
• Profit?
  – Not always

The bottleneck is evaluating descriptors for the new observations to supply to the model

Page 7: PMML for QSAR Model Exchange

Cheminformatics in R

• rcdk provides cheminformatics support in R
  – Load and parse molecular file formats
  – Evaluate numerical descriptors from chemical structures

[Figure: rcdk within the R programming environment, shown with rJava, CDK, Jmol, XML, rpubchem and fingerprint]

Page 8: PMML for QSAR Model Exchange

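Cheminformatics in R

    library(pmml)
    library(rcdk)

    # Boiling point data set shipped with rcdk; column 1 holds SMILES strings
    data(bpdata)
    mols <- parse.smiles(bpdata[, 1])

    # Evaluate all descriptors in the 'topological' category
    descNames <- unique(unlist(sapply('topological', get.desc.names)))
    descs <- eval.desc(mols, descNames)

    # Fit a linear model on four descriptors and export it as PMML
    model <- lm(BP ~ khs.sCH3 + khs.sF + TopoPSA + VABC, data.frame(bpdata, descs))
    pmml(model)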
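The model on this slide accepts only descriptor values, so predicting for new molecules means repeating the descriptor step first. A minimal sketch, assuming the same `bpdata` model and that `saveXML()` from the XML package can write the object returned by `pmml()`. The helper name `predict_bp`, the example SMILES and the output file name are illustrative choices, not part of the original talk:

```r
library(pmml)
library(rcdk)
library(XML)   # provides saveXML()

# Rebuild the model from the slide.
data(bpdata)
mols <- parse.smiles(bpdata[, 1])
descNames <- unique(unlist(sapply("topological", get.desc.names)))
descs <- eval.desc(mols, descNames)
model <- lm(BP ~ khs.sCH3 + khs.sF + TopoPSA + VABC, data.frame(bpdata, descs))

# Write the PMML document to a file so it can be distributed.
# (Assumption: saveXML() accepts the pmml() result directly.)
pmml_file <- file.path(tempdir(), "bp_model.pmml")
saveXML(pmml(model), file = pmml_file)

# Hypothetical helper: the descriptor evaluation that a PMML consumer
# would also have to reproduce before it can call the model.
predict_bp <- function(model, smiles, descNames) {
  new_mols <- parse.smiles(smiles)
  new_descs <- eval.desc(new_mols, descNames)
  predict(model, newdata = new_descs)
}

preds <- predict_bp(model, c("CCO", "CC(F)F"), descNames)
```

The PMML file records the coefficients and field names, but nothing in it says how `TopoPSA` or `VABC` were computed. That gap is the bottleneck referred to earlier.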

Page 9: PMML for QSAR Model Exchange

R, rcdk, PMML

• rcdk provides the means to take in molecules and output a PMML encoded model
• One could record appropriate functions/classes in the document and use that info to evaluate descriptors for new observations
• Since rcdk is based on the Java CDK library, could also use jpmml, a Java API for PMML documents
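One way to picture the second bullet: PMML allows `Extension` elements, and these could carry the name of the descriptor class behind each input field. A minimal sketch using the R XML package on a hand-written PMML fragment. The extension name `cdkDescriptorClass` is an invented convention, not an established one, and the CDK class names are given from memory of the CDK package layout, so they should be checked against the installed CDK version:

```r
library(XML)

# Hand-written fragment standing in for the DataDictionary of a real PMML file.
pmml_text <- '
<PMML version="4.1" xmlns="http://www.dmg.org/PMML-4_1">
  <DataDictionary numberOfFields="3">
    <DataField name="BP" optype="continuous" dataType="double"/>
    <DataField name="TopoPSA" optype="continuous" dataType="double"/>
    <DataField name="VABC" optype="continuous" dataType="double"/>
  </DataDictionary>
</PMML>'
doc <- xmlParse(pmml_text, asText = TRUE)

# Assumed mapping from descriptor column to the CDK class that computes it.
descriptor_class <- c(
  TopoPSA = "org.openscience.cdk.qsar.descriptors.molecular.TPSADescriptor",
  VABC    = "org.openscience.cdk.qsar.descriptors.molecular.VABCDescriptor"
)

# Producer side: annotate each descriptor field with its class name.
# local-name() is used so the XPath does not depend on namespace prefixes.
for (field in getNodeSet(doc, "//*[local-name()='DataField']")) {
  cls <- descriptor_class[xmlGetAttr(field, "name")]
  if (!is.na(cls)) {
    newXMLNode("Extension",
               attrs = c(name = "cdkDescriptorClass", value = unname(cls)),
               parent = field)
  }
}

# Consumer side: recover which class to run for each model input.
ext <- getNodeSet(
  doc, "//*[local-name()='Extension'][@name='cdkDescriptorClass']")
recorded <- sapply(ext, function(e) xmlGetAttr(e, "value"))
names(recorded) <- sapply(ext, function(e) xmlGetAttr(xmlParent(e), "name"))
```

A consumer built on the CDK, whether through rcdk or a Java PMML library, could then instantiate the recorded classes to compute the inputs. Any real exchange would still need agreement on the extension name and on descriptor parameters.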