25
Evaluation of Evaluation of methods in gene methods in gene association studies: association studies: yet yet another case for another case for Bayesian networks Bayesian networks Department of Measurement Department of Measurement and Information Systems and Information Systems Budapest University of Budapest University of Technology and Economics Technology and Economics Gábor Hullám Gábor Hullám , , Péter Antal Péter Antal , , András Falus and Csaba Szalai András Falus and Csaba Szalai Department of Genetics Cell and Immunobiology Semmelweis

Evaluation of methods in gene association studies: yet another case for Bayesian networks

  • Upload
    naasir

  • View
    21

  • Download
    0

Embed Size (px)

DESCRIPTION

Evaluation of methods in gene association studies: yet another case for Bayesian networks. Gábor Hullám , Péter Antal , András Falus and Csaba Szalai. Department of Measurement and Information Systems Budapest University of Technology and Economics. Department of Genetics - PowerPoint PPT Presentation

Citation preview

Page 1: Evaluation of methods in gene association studies: yet another case for Bayesian networks

Evaluation of Evaluation of methods in gene methods in gene

association studies: association studies: yetyet

another case for another case for Bayesian networksBayesian networks

Department of MeasurementDepartment of Measurement and Information Systemsand Information Systems

Budapest University of Budapest University of Technology and EconomicsTechnology and Economics

Gábor HullámGábor Hullám, , Péter AntalPéter Antal,,András Falus and Csaba SzalaiAndrás Falus and Csaba Szalai

Department of GeneticsCell and

ImmunobiologySemmelweis University

Page 2: Evaluation of methods in gene association studies: yet another case for Bayesian networks

Genetic association studies Genetic association studies (GAS)(GAS)

A Bayesian approach to GASA Bayesian approach to GAS Bayesian networks in GASBayesian networks in GAS Evaluation of methodsEvaluation of methods

Page 3: Evaluation of methods in gene association studies: yet another case for Bayesian networks

3

MotivationMotivation:: Exploring the Exploring the vvariomeariome

VariomeVariome Single-Nucleotide Polymorphisms (SNPs)Single-Nucleotide Polymorphisms (SNPs) Copy-Number Variations (CNVs).Copy-Number Variations (CNVs). Genome rearrangGenome rearrangeements ments MethylomeMethylome

SNPsSNPs Number of SNPs (10Number of SNPs (1077 ->10 ->1066)) Correlation structure: the HAPMAP projectCorrelation structure: the HAPMAP project

Page 4: Evaluation of methods in gene association studies: yet another case for Bayesian networks

GWAS Data analysis candidate regions.

genes

PGAS s11 PGAS s12 PGAS s13

Study design SNP datasets

PGAS Data analysis confirmations

refutations

384x48384x48384x48

Genome wide association study (GWAS)

GAS phases

Page 5: Evaluation of methods in gene association studies: yet another case for Bayesian networks

GAS Facts Publications: ~40K SNPs on plate: 100K-2M Sample size: 30K

Confirmed associations: <1000 Small attributable risk

Why? Common disease – common variance

hypothesis - multifactorial diseases, many weak interactions

Rare haplotype hypothesis (Minor allele freq. <1%)

Number of gene association studies GWAS: ~100 PGAS-: ~10K

Page 6: Evaluation of methods in gene association studies: yet another case for Bayesian networks

Current challenge: the discovery of epistasis

Statistical epistasis: non-linear Statistical epistasis: non-linear interaction of genes interaction of genes

The goal is the exploration ofThe goal is the exploration of…… explanatory variables explanatory variables ofof the target the target

variablevariable((ss)) the the interactioninteraction of explanatory variables of explanatory variables

Genetic association concepts can be Genetic association concepts can be formalized (partially) as machine formalized (partially) as machine learning concepts and as learning concepts and as Bayesian Bayesian network conceptsnetwork concepts

Page 7: Evaluation of methods in gene association studies: yet another case for Bayesian networks

The model class: Bayesian The model class: Bayesian networksnetworks

directed acyclic graph (DAG)directed acyclic graph (DAG) nodes – domain entitiesnodes – domain entities edges – direct probabilistic relationsedges – direct probabilistic relations

conditional probability models P(X|Pa(X))conditional probability models P(X|Pa(X)) interpretations:interpretations:

effectiverepresentationof the distribution

N

i iiN XPaXPXXP11 ))(|(),..(

DAG structure:dependency map(d-separation)

edges:direct causalrelations

Page 8: Evaluation of methods in gene association studies: yet another case for Bayesian networks

8

Y

BBayesian network features ayesian network features representing relevancerepresenting relevance

Markov Blanket (Markov Blanket (subsub)Graphs (MBGs))Graphs (MBGs)

(1) parents of the node(1) parents of the node

(2) its children(2) its children

(3) parents of the children(3) parents of the children

Markov BlanketMarkov Blanket Set Sets (MBs (MBSSs)s) the set of nodes which the set of nodes which

probabilisticallyprobabilistically isolate isolate the target from the restthe target from the rest of the model of the model

Markov Blanket Membership (MBM)Markov Blanket Membership (MBM) pairwise relationshippairwise relationship

Page 9: Evaluation of methods in gene association studies: yet another case for Bayesian networks

9

GA-to-BNGA-to-BN

(model-based) pairwise association(model-based) pairwise association Markov Blanket Memberhsips (MBM)Markov Blanket Memberhsips (MBM)

MultivariaMultivariatte analysis e analysis Markov Blanket Markov Blanket sets (MB)sets (MB)

MultivariaMultivariatte analysise analysis with interactions with interactions Markov Blanket Subgraphs (MBG)Markov Blanket Subgraphs (MBG)

CCausal ausal relations/relations/models models Partially Partially directed directed BayesBayesianian networknetwork (PDAG) (PDAG)

HierarchyHierarchy DAG=>DAG=>PPDAG=>DAG=>MBGMBG=>=>MBMB=>MBM=>MBM

Page 10: Evaluation of methods in gene association studies: yet another case for Bayesian networks

AdvantagesAdvantages of of GA-to-BN - GA-to-BN - 11

SStrong relevancetrong relevance - direct association: - direct association: Clear semantics and dedicated goal for Clear semantics and dedicated goal for the explicit. faithful representation of the explicit. faithful representation of strongly relevant (e.g. non-transitive)strongly relevant (e.g. non-transitive) relationrelationss

GGraphical representationraphical representation:: It offers It offers better overview of the dependence-better overview of the dependence-independence structure. e.g. about independence structure. e.g. about interactions and conditionalinteractions and conditional relevance.relevance.

MultiMultiple ple targetstargets:: It inherently works for It inherently works for multiple targets.multiple targets.

Page 11: Evaluation of methods in gene association studies: yet another case for Bayesian networks

AdvantagesAdvantages of of GA-to-BN – GA-to-BN – 22

IIncomplete datancomplete data:: It offers integrated It offers integrated management of incomplete data management of incomplete data within Bayesian inference.within Bayesian inference.

Causality:Causality: Model-based causal Model-based causal interpretation of associationsinterpretation of associations

Haplotype level: Haplotype level: Offers integrated Offers integrated approach to haplotype reconstruction approach to haplotype reconstruction and association analysis (assuming and association analysis (assuming unphased genotype data)unphased genotype data)

Page 12: Evaluation of methods in gene association studies: yet another case for Bayesian networks

Challenges of applying BNs in GAS

High computational complexity High sample complexity Bayesian statistics

Bayesian model averaging Feature posterior

GoalGoal: approximat: approximate e the full-scale the full-scale summationsummation (integral) (integral)

A solutionA solution: Metropolis coupled : Metropolis coupled Markov chain Monte Carlo (MCMCMC)Markov chain Monte Carlo (MCMCMC)

fFG G GPfFP:

)()(

Page 13: Evaluation of methods in gene association studies: yet another case for Bayesian networks

Uncertainty in Uncertainty in multivariate analysismultivariate analysis

Entropy of the MBS posteriors

0

2

4

6

8

10

12

0 1000 2000 3000 4000 5000 6000 7000 8000 9000 10000

20_full

50_full

100_full

116

250

Entropy of the MBS posteriors

0

2

4

6

8

10

12

0 1000 2000 3000 4000 5000 6000 7000 8000 9000 10000

20_full

50_full

100_full

116

250

Page 14: Evaluation of methods in gene association studies: yet another case for Bayesian networks

Advantages of the Bayesian framework

Automated Automated correction for correction for “multiple “multiple testing”testing” The measure of uncertainty at a given level The measure of uncertainty at a given level

automatically indicates its applicabilityautomatically indicates its applicability Prior incorporationPrior incorporation: better prior incorporation : better prior incorporation

both at parameter and structural levels.both at parameter and structural levels. Post fusion:Post fusion: better semantics for the better semantics for the

construction of meta probabilistic knowledge construction of meta probabilistic knowledge basesbases

Normative uncertainty for model properties (cf. bootstrap)

Page 15: Evaluation of methods in gene association studies: yet another case for Bayesian networks

The basis for comparison

Our approach is a model based exploration of the underlying

structure(note: multiple targets, causal and

direct aspects)

≠Prediction of class labels

Page 16: Evaluation of methods in gene association studies: yet another case for Bayesian networks

Comparison of GAS toolsComparison of GAS toolsDedicated GAS toolsDedicated GAS tools

BEAMBEAM BIMBAMBIMBAM SNPAssocSNPAssoc SNPMstatSNPMstat PowermarkerPowermarker

General purpose General purpose FSS toolsFSS tools

MDRMDR Causal ExplorerCausal Explorer

……

Page 17: Evaluation of methods in gene association studies: yet another case for Bayesian networks

moderate number of clinical variables (in the moderate number of clinical variables (in the range of 50) range of 50)

hundreds of genotypic SNP variables for each hundreds of genotypic SNP variables for each patient patient

thousands of gene expression measurements thousands of gene expression measurements

Asthma Asthma Complex disease mechanismComplex disease mechanism Half of the patients do not respond well to current Half of the patients do not respond well to current

treatmentstreatments Unknown pathways in the asthmatic processUnknown pathways in the asthmatic process

Application domain: The Application domain: The genomic background of genomic background of

asthnaasthna

Page 18: Evaluation of methods in gene association studies: yet another case for Bayesian networks

Evaluation on an artificial data set

Artificial model based on a real-world domain: the genomic background of asthma

The real data set consists of: 113 SNPs 1117 samples

Page 19: Evaluation of methods in gene association studies: yet another case for Bayesian networks

The reference model

Page 20: Evaluation of methods in gene association studies: yet another case for Bayesian networks

Reference MBG

Asthma

SNP23

SNP69

SNP110

SNP27

SNP11

SNP64

SNP109

SNP17

SNP95

SNP61

SNP81

Page 21: Evaluation of methods in gene association studies: yet another case for Bayesian networks

Results - 1

Software (Parameters) Sensitivity Specificity Accuracy

BMLA (CH) 1 0.99 0.99115044

BMLA (BD) 0.92307692 1 0.99115044

HITON MB (k=1) 0.76923077 0.98 0.95575221

HITON MB (k=2) 0.76923077 0.99 0.96460177

HITON MB (k=3) 0.69230769 0.99 0.95575221

MDR – TurF 0.61538462 0.97 0.92920354

MDR – Relief 0.53846154 0.96 0.91150442

interIAMB (MI) 0.46153846 0.96 0.90265487

Page 22: Evaluation of methods in gene association studies: yet another case for Bayesian networks

Results – 2.Results – 2.

Page 23: Evaluation of methods in gene association studies: yet another case for Bayesian networks

Results – 3. Results – 3.

Page 24: Evaluation of methods in gene association studies: yet another case for Bayesian networks

SummarySummary General BN repreGeneral BN representation is feasible sentation is feasible

and gives superior performance for and gives superior performance for PGASPGAS

Bayesian statistics allows the Bayesian statistics allows the quantification of applicability of BNsquantification of applicability of BNs

Special extensions are necessary forSpecial extensions are necessary for Multiple targetsMultiple targets Combined discovery of relevance and Combined discovery of relevance and

interactions (MBM, MBS, MBG) interactions (MBM, MBS, MBG) Scalable multivariate analysis (k-MBS concept) Scalable multivariate analysis (k-MBS concept) Feature aggregationFeature aggregationAntal et al.: A Bayesian View of Challenges in Feature Antal et al.: A Bayesian View of Challenges in Feature

Selection: Multilevel Analysis, Feature Aggregation, Selection: Multilevel Analysis, Feature Aggregation, Multiple Targets, Redundancy and InteractionMultiple Targets, Redundancy and Interaction , JMLR , JMLR Workshop and Conference ProceedingsWorkshop and Conference Proceedings

Page 25: Evaluation of methods in gene association studies: yet another case for Bayesian networks

Future work

Specific local models (GA –specific local models)

Integrated missing data management and GA analysis (cf. imputation)

Noisy genotyping probabilistic data (see poster)

Integrated haplotype reconstruction (see poster) Integrated study design and analysis (see

poster) Scaling computation up to ~1000 variables