The Measuring Technology

Learning Causal Structure from Overlapping Manipulations

Causal Discovery from Mass Cytometry Data
Presenters: Ioannis Tsamardinos and Sofia Triantafillou

Institute of Computer Science, Foundation for Research and Technology, Hellas
Computer Science Department, University of Crete
in collaboration with Computational Medicine Unit, Karolinska Institutet

The Measuring Technology

Mass Cytometry
  • Single-cell measurements
  • Sample sizes in the millions, minimal cost
  • Public data available
  • Up to ~30 proteins measured at a time
  • Applications:
      • Cell counting
      • Cell sorting (gating)
      • Identifying signaling responses
      • Drug screening
      • De novo, personalized pathway / causal discovery (?)

Mass Cytometry
[Image by Bendall et al., Science 2011]
Software for some of the applications mentioned is available.

Cell Sorting (Gating)
  • Immune system cells can be distinguished based on specific surface markers.
  • The process resembles a decision tree.
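The decision-tree view of gating can be illustrated with a minimal sketch. The marker names, the single 1.0 cut-off, and the branch order are illustrative assumptions, not the gating scheme used in the cited studies.

```python
def gate(cell):
    """Assign a cell to a sub-population by thresholding surface markers.

    `cell` maps marker names to intensities. Markers, cut-off, and branch
    order are hypothetical and chosen only to show the tree structure.
    """
    def positive(marker):
        return cell.get(marker, 0.0) > 1.0

    if positive("CD3"):          # T-cell branch
        if positive("CD4"):
            return "cd4+"
        if positive("CD8"):
            return "cd8+"
        return "other T"
    if positive("CD14"):         # monocyte-like branch
        return "cd14+"
    return "ungated"


# Three hypothetical cells, one per leaf of interest.
cells = [
    {"CD3": 2.5, "CD4": 3.1, "CD8": 0.2},
    {"CD3": 2.0, "CD4": 0.1, "CD8": 4.0},
    {"CD3": 0.1, "CD14": 2.2},
]
labels = [gate(c) for c in cells]
```

Each nested test plays the role of one gate; in practice the gates are drawn on two-marker scatter plots rather than fixed at a single threshold.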

[Image by Bodenmiller et al., Nat. Biotech. 2012]

Identifying Signaling Responses
  • Immune responses are triggered by specific activators.
  • Signaling responses are sub-population specific.
  • Mass cytometry for identifying signaling effects:
      • Functional (non-surface) proteins are also marked (e.g., pSTAT3 and pSTAT5)
      • Activators are applied to stimulate a response to disease
      • Cells are sorted by sub-population
      • Changes in protein abundance/phosphorylation in each sub-population are quantified

[Image by Bendall et al., Science 2011]
Columns correspond to different sub-populations.
Difference in log2 mean intensity of the stimulated condition compared with the unstimulated control.

Drug Screening
  • Unwanted signaling responses should be suppressed for disease treatment.
  • Mass cytometry for drug screening:
      • After stimulation, cells are treated with potential drugs (inhibitors)
      • Cells are sorted by sub-population
      • Dose-response curves are identified per activator, per sub-population, and per inhibitor
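The response measure from the heatmap caption above (difference in log2 mean intensity between the stimulated condition and the unstimulated control) can be sketched for one marker in one gated sub-population. This sketch assumes "log2 mean intensity" means the log2 of the mean, not the mean of the log2 values. The function name and the numbers are hypothetical.

```python
import math


def log2_mean_difference(stimulated, unstimulated):
    """log2(mean stimulated intensity) minus log2(mean control intensity)."""
    mean_stim = sum(stimulated) / len(stimulated)
    mean_ctrl = sum(unstimulated) / len(unstimulated)
    return math.log2(mean_stim) - math.log2(mean_ctrl)


# Hypothetical intensities of one marker in one gated sub-population.
control = [1.0, 2.0, 3.0]        # mean 2.0
stimulated = [6.0, 8.0, 10.0]    # mean 8.0
response = log2_mean_difference(stimulated, control)  # log2(8) - log2(2) = 2.0
```

Repeating this per activator, per sub-population, and per inhibitor dosage would give the points of a dose-response curve.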

[Image by Bodenmiller et al., Nat. Biotech. 2012]

The Public Data

Bendall Data
  • 13 activators
  • Donor 1: 13 surface and 18 functional variables, several sub-populations
  • Donor 2: 13 surface and 18 functional variables, several sub-populations
  • A no-activator condition for each donor
[Single-Cell Mass Cytometry of Differential Immune and Drug Responses Across a Human Hematopoietic Continuum, Bendall et al., Science 332, 687 (2011)]

Bodenmiller Data: Time Course
  • A plate with 96 wells; each well produces a data set
  • 11 activators, plus no activator
  • Time points: 0, 1, 5, 15, 30, 60, 120, and 240 min
  • 10 surface and 14 functional variables, several sub-populations
[Multiplexed mass cytometry profiling of cellular states perturbed by small-molecule regulators, Bodenmiller et al., Nature Biotechnology 30, 9 (2012)]

Bodenmiller Data: 8 Donors
  • A plate with 96 wells; each well produces a data set
  • 8 donors
  • 11 activators
  • 10 surface and 14 functional variables, several sub-populations
[Bodenmiller et al., Nature Biotechnology 30, 9 (2012)]

Bodenmiller Data: Inhibitors
  • 27 inhibitors (drugs), each in 7 dosages
  • 11 activators
  • 10 surface and 14 functional variables, several sub-populations
[Multiplexed mass cytometry profiling of cellular states perturbed by small-molecule regulators, Bodenmiller et al., Nature Biotechnology 30, 9 (2012)]

Data Summary
  • Bodenmiller data: inhibitor data, 8-donor data, time course data
  • Bendall data
  • Dimensions: activators, time, donors, inhibitors, sub-populations, proteins
  • Collection of data sets with:
      • All activators
      • 1 time point (30 min)
      • 1 donor
      • All inhibitors
      • All sub-populations
      • All 10+14 markers measured

Data Summary
[Diagram: functional proteins and activators, grouped by data set (Bodenmiller, Bendall, both)]
  • Functional proteins: BTK, ERK, HLADR, IgM, NFkB, P38, PLCg2, S6, SHP2, SLP76, STAT3, STAT5, ZAP70, Creb, CrkL, CXCR4, H3, IkBa, Ki67, MAPKAPK2, Src, AKT, LAT, STAT1
  • Activators: BCR, GCSF, IFNa, LPS, PMA, PVO4, Flt3L, IL3, GMCSF, IL7, SCF, TNF, TPO, IFN-g, IL-2, IL-3

Causal Discovery in Mass Cytometry
  • Feedback loops
  • Latent variables
  • Non-linear relations
  • Unfaithfulness

A typical day in the cell
[Image courtesy of Dr. Brad Marsh]

A Basic Approach

Local Causal Discovery
  • Variables X, Y, Z, where nothing causes X
  • Use the stimulus as an instrumental binary variable
  • [Diagrams: candidate causal structures over X, Y, Z]
  • Assumptions:
      • Causal Markov Condition
      • Reichenbach's Common Cause Principle
      • No feedback cycles

Issue #1: Signaling is Sub-Population Specific
  • Gate the data
      • Data were gated by the initial researchers in Cytobank.org
  • Analyze sub-populations independently
  • Gated sub-populations differ between Bodenmiller and Bendall
      • The cd4+, cd8+, and nk sub-populations are in common

Bodenmiller sub-populations: cd14+hladr-, cd14+hladrhigh, cd14+hladrmid, cd14+surf-, cd14-hladr-, cd14-hladrhigh, cd14-hladrmid, cd14-surf-, cd4+, cd8+, dendritic, igm+, igm-, nk

Bendall sub-populations: Pre-B II, Mature CD38lo B, Pre-B I, Mature CD38mid B, Immature B, Plasma cell, nk, Myelocyte, Mature CD4+ T, Naive CD4+ T, CMP, Naive CD8+ T, Mature CD8+ T, CD11b- Monocyte, CD11bmid Monocyte, CD11bhi Monocyte, MPP, HSC, Megakaryocyte, Erythroblast, Platelet, MEP, Plasmacytoid DC, GMP

Issue #2: Dormant Relations
  • Relations may appear only during signaling
      • Pool together unstimulated and stimulated data
  • Different parts of the pathway may be activated by different activators
      • Analyze data from different activators independently

Issue #3: Testing Independence
  • [Diagram: stimulus S and proteins P1, P2]
  • MIC (the test published in Science) has problems: it is not conditional, some parameters have to be selected, and simulations are needed to define p-values for a specific sample size.

Issue #4: Make Reliable Predictions
  • [Diagram: S, P1, P2 with a p-value scale marking the "dependent" and "independent" regions]
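A minimal sketch of how a local rule over the stimulus S and two proteins P1, P2 could be tested with separate thresholds for dependence and independence. The specific rule (S, P1, P2 pairwise dependent and S independent of P2 given P1, read as S causes P1 causes P2), the Fisher z test on partial correlations, and the values of `alpha` and `beta` are assumptions made for illustration. The slides do not state which test or thresholds were used.

```python
import math
import random


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, z):
    """Correlation of x and y after linearly removing z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


def fisher_p(r, n, n_cond=0):
    """Two-sided p-value for a (partial) correlation via Fisher's z-transform."""
    r = max(min(r, 0.999999), -0.999999)
    z = math.atanh(r) * math.sqrt(n - n_cond - 3)
    return math.erfc(abs(z) / math.sqrt(2))


def s_causes_p1_causes_p2(s, p1, p2, alpha=0.01, beta=0.2):
    """Accept only if every dependence has p < alpha and the conditional
    independence has p > beta; anything in between is left undecided."""
    n = len(s)
    dependent = all(
        fisher_p(pearson(a, b), n) < alpha
        for a, b in ((s, p1), (s, p2), (p1, p2))
    )
    independent = fisher_p(partial_corr(s, p2, p1), n, n_cond=1) > beta
    return dependent and independent


# Simulated pooled data: S = 0 (unstimulated) or 1 (stimulated), S -> P1 -> P2.
rng = random.Random(0)
n = 2000
s = [float(i % 2) for i in range(n)]
p1 = [2.0 * si + rng.gauss(0, 1) for si in s]
p2 = [v + rng.gauss(0, 1) for v in p1]

forward = s_causes_p1_causes_p2(s, p1, p2)   # tests S -> P1 -> P2
reverse = s_causes_p1_causes_p2(s, p2, p1)   # tests S -> P2 -> P1
```

Using two thresholds trades coverage for reliability: triplets whose p-values fall between `alpha` and `beta` produce no prediction at all.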

Issue #5: Identify Outlier Experiments
  • Inhibitor data: 27 inhibitors (drugs) in 7 dosages, 11 stimuli
  • Inhibitor data for zero dosage and the 8-donor data should represent the same joint distribution
  • Do they?
[Figure: distance between plates]
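One way to check this is the plate-to-plate distance the next slide defines: 1 minus the minimum, over activators, of the Spearman correlation between the two plates' rankings of marker correlations. The sketch below assumes that "rank correlations (of markers)" means ranking the pairwise Pearson correlations between markers. The data layout `plate[activator][marker] -> list of values` is hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x, y):
    return pearson(ranks(x), ranks(y))


def marker_correlations(markers):
    """Correlation of every marker pair, in a fixed (sorted) pair order."""
    return [pearson(markers[a], markers[b])
            for a, b in combinations(sorted(markers), 2)]


def plate_distance(plate_a, plate_b):
    shared = sorted(set(plate_a) & set(plate_b))
    agreement = [
        spearman(marker_correlations(plate_a[act]),
                 marker_correlations(plate_b[act]))
        for act in shared
    ]
    return 1 - min(agreement)


# Hypothetical toy plates: two activators, three markers, five cells each.
base = {"A": [1, 2, 3, 4, 5], "B": [2, 1, 4, 3, 6], "C": [5, 3, 4, 1, 2]}
swapped = {"A": base["A"], "B": base["C"], "C": base["B"]}
plate_1 = {"PMA": base, "BCR": base}
plate_2 = {"PMA": base, "BCR": swapped}
same = plate_distance(plate_1, plate_1)
different = plate_distance(plate_1, plate_2)
```

Taking the minimum over activators makes the distance sensitive to a single badly matching condition, which suits outlier detection.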

Given a pair of plates:
  • For each activator, rank the correlations (of markers) and compute the Spearman correlation of the rankings
  • Distance = 1 - min correlation over activators

Pipeline for making causal predictions
[Flowchart: from the time course data and the inhibitor data, for every inhibitor, check whether all necessary dependencies and independencies hold (yes/no branches)]

Causal Postulates
A list of predicted causal pairs, each tagged for a specific population and activator, ranked according to a score quantifying the frequency of appearance.

  Protein 1   Protein 2   Score
  pPlcg2      pSTAT3      0.875
  pPlcg2      pZap70      0.8125
  pSlp76      pSHP2       0.8125
  pSHP2       pSTAT3      0.7857
  pPlcg2      pP38        0.75
  pPlcg2      pZap70      0.75
  pSlp76      pZap70      0.75
  pSTAT3      pBtk        0.7059
  pSHP2       pZap70      0.7143
  pS6         pErk        0.7037

Activator: BCR
Sub-population column (slide order, row alignment not preserved): cd14-hladr-, cd14-hladrmid, cd14-hladrmid, dendritic, cd14+hladr-, cd14-hladr-, cd14-hladr-, cd14-hladrmid, cd14-hladr-, igm-
Second numeric column (slide order, row alignment not preserved): 0.5482, 0.5512, 0.7152, 0.6708, 0.8526, 0.6166, 0.5688, 0.4557, 0.5688, 0.4557

288 predictions in 14 sub-populations

Internal Validation
[Diagram: predicted triplets Activator, Protein1, Protein2 and Activator, Protein3, Protein2]
  • 42% of the predicted triplets are also reported
  • Despite strict thresholds and multiple testing

  • Theory + algorithms: [Tillman et al. 2008, Triantafillou et al. 2010, Tsamardinos et al. 2012]
  • Check whether the predicted triplet has also been reported
[Diagram: Activator, Protein1, Protein3 OR Activator, Protein3, Protein1, each shown with Protein2]

Validation on Bendall Data
[Table: predicted triplets checked in the Bendall data for the CD4+, CD8+, and NK sub-populations, with columns Conflicting, Confirming, and Correlation. Triplets shown (activator, protein 1, protein 2): PMA, ERK, STAT3; PMA, P38, STAT3; IFNa, BTK, SHP2; IFNa, STAT5, ZAP70; PMA, S6, NFKb; PMA, S6, STAT3; PMA, ERK, ZAP70; LPS, SHP2, ZAP70]
  ! Measurements in Bendall data are taken 15 minutes after activation

Results
  • Hundreds of predictions to be tested; experiments under way!
  • Internal validation using non-trivial inferences
  • Promising validation on another collection of data sets (Bendall)
  • Evidence of batch effects and/or biological reasons for variability
  • Method based on the most basic causal discovery assumptions

A Not So Basic Approach

Data Summary
  • Bodenmiller data: inhibitor data, 8-donor data, time course data
  • Bendall data
  • Dimensions: activators, time, donors, inhibitors, sub-populations

Our basic approach utilized 12.5% of the available data sets.

Co-analyzing data sets from different experimental conditions with overlapping variable sets
[Figure: data sets from Conditions A, B, C, and D]
  • Different experimental conditions
  • Different variable sets
  • Data cannot be pooled together because they come from different distributions
  • Principles of causality link them to the underlying causal graph

  • Identify a single causal graph that simultaneously fits all data

What type of causal graph?
[Figure: graph over A, B, C, D]

Manipulations in SMCMs
[Figure: graph over A, B, C, D]

Reverse Engineering
[Figure: candidate graphs over A, B, C, D, E and the observed (in)dependencies]

Independencies as constraints
[Figure: graph over A, B, C]
  A-C does not exist
  AND (A-B does not exist OR A-B is into B OR B-C does not exist OR B-C is into B)

Statistical errors
  • What happens with statistical errors?
  • Conflicts make the SAT instance unsatisfiable!
  • Sort constraints!*
  *well, not really
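The idea of sorting constraints so that conflicts do not make the instance unsatisfiable can be sketched as a greedy loop: add constraints in decreasing order of confidence and skip any that would make the set unsatisfiable. This is a simplification (the slide itself notes "well, not really"). The variable names, the confidence scores, the conflicting constraint, and the brute-force check standing in for a SAT solver are all assumptions for illustration.

```python
from itertools import product

# Hypothetical Boolean variables describing a graph over A, B, C.
VARIABLES = ["edge_AB", "edge_BC", "edge_AC", "AB_into_B", "BC_into_B"]


def satisfiable(constraints):
    """Brute force: does any truth assignment satisfy every constraint?"""
    for values in product([False, True], repeat=len(VARIABLES)):
        model = dict(zip(VARIABLES, values))
        if all(check(model) for check in constraints):
            return True
    return False


def independence_constraint(m):
    # The constraint from the slide, written as a Boolean formula.
    return (not m["edge_AC"]) and (
        not m["edge_AB"] or m["AB_into_B"]
        or not m["edge_BC"] or m["BC_into_B"]
    )


def conflicting_constraint(m):
    # Hypothetical constraint produced by a statistical error:
    # it demands the A-C edge that the constraint above forbids.
    return m["edge_AC"]


def greedy_select(scored_constraints):
    """Keep constraints, most confident first, while the set stays satisfiable."""
    kept = []
    for _, name, check in sorted(scored_constraints, key=lambda c: -c[0]):
        if satisfiable([c for _, c in kept] + [check]):
            kept.append((name, check))
    return [name for name, _ in kept]


scored = [
    (0.60, "A-C edge exists", conflicting_constraint),
    (0.95, "independence constraint", independence_constraint),
]
accepted = greedy_select(scored)   # the lower-confidence conflict is dropped
```

The confidence scores would have to come from comparing p-values of dependence and independence tests, which is what makes the ordering non-trivial.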

Comparing p-values

COmbINE Algorithm
  • An algorithm that transforms independence constraints into a SAT instance
  • Output: a summary of the semi-Markov causal models that best fit all data sets simultaneously

Eric Ellis

Similar Algorithms
  • SBCSD [Hyttinen et al., UAI 2013]
      • Inherently less compact representation of path constraints
      • Does not handle conflicts; not applicable to real data
      • In addition, it admits cycles
      • Scales up to 14 variables
  • Lininf [Hyttinen et al., UAI 2012, JMLR 2012]
      • Linear relations only
      • Scales up poorly (6 variables in total with overlapping variables, 10 without)
      • In addition, it admits cycles

Performance on Simulated Data
[Figure: execution time in seconds]

Application on Mass Cytometry Data
[Figure: response to PMA in cd4+ T-cells and cd8+ T-cells]

Summary and Conclusions
  • Mass cytometry data are a good domain for causal discovery
  • Hundreds of robust causal postulates
  • Approach:
      • Conservative: local discovery, performing all tests, independent analysis of populations
      • Opportunistic: using 2 thresholds for (in)dependency
  • New algorithm that can handle different experimental conditions and overlapping variable subsets, and deal with statistical errors
  • Numerous directions open for future work on this collection of data
  • Experiments under way!

Acknowledgements and Credit
  • Ioannis Tsamardinos, Associate Prof, Lab Head
  • Sofia Triantafillou, Ph.D. Candidate
  • Vincenzo Lagani, Research Fellow
  • Jesper Tegnér, Prof, Unit Head
  • Angelika Schmidt, Post-Doc
  • David Gomez-Cabrero, Project Leader

Funded by: STATegra EU project (stategra.eu)