Network inference from repeated observations of node sets Neil Clark, Avi Ma'ayan

Page 1: Network inference from repeated observations of node sets

Network inference from repeated observations of node sets

Neil Clark, Avi Ma'ayan

Page 2: Network inference from repeated observations of node sets

Network Inference

[Figure panels: Protein-protein interaction network; Cell signaling network]

Page 3: Network inference from repeated observations of node sets

Overview

• Network inference - the deduction of an underlying network of interactions from indirect data.

1. A general class of network inference problem
2. Network inference approach
3. Applications:
   1. Inference of physical interactions: PPI
   2. Inference of gene associations: stem cell genes
   3. Inference of statistical interactions: drug/side-effect network

Page 4: Network inference from repeated observations of node sets

GMT files

Page 5: Network inference from repeated observations of node sets

The inference problem

• Input: a set of entities (genes, proteins, ...) in the form of a GMT file - the results of experiments or, more generally, of sampling.

• Assumptions:
  1. An underlying network exists which relates the entities in the GMT file through their interactions.
  2. Each line of the GMT file contains information on the connectivity of the underlying network.

• The problem: given a GMT file, can we extract enough information to resolve the underlying network?
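The input described above can be read with a few lines of Python. This is a minimal sketch, assuming the common tab-separated GMT layout (set name, description, then one entity per column); the slides do not show the file layout, and `parse_gmt` and the example text are illustrative names and data, not part of the original talk.

```python
from typing import Dict, List


def parse_gmt(text: str) -> Dict[str, List[str]]:
    """Parse GMT-formatted text into {set_name: [entity, ...]}."""
    sets: Dict[str, List[str]] = {}
    for raw in text.splitlines():
        fields = [f.strip() for f in raw.split("\t")]
        if len(fields) < 3 or not fields[0]:
            continue  # skip blank or malformed lines
        name, _description, *members = fields
        # Drop empty cells and duplicates while keeping the original order.
        sets[name] = list(dict.fromkeys(m for m in members if m))
    return sets


# Hypothetical two-line GMT file: each line is one observed node set.
example = "exp1\tna\tA\tB\tC\nexp2\tna\tB\tC\tD\tE\n"
gmt = parse_gmt(example)
```

Each value in `gmt` is one repeated observation of a node set, which is the only information the inference step is allowed to use.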

Page 6: Network inference from repeated observations of node sets

A synthetic example

Page 7: Network inference from repeated observations of node sets

Approach...

• Forget for the moment that we know the underlying network and pretend we only have the GMT file.

• Attempt to use the accumulation of our coarse data to infer the fine details of the underlying network.

• Consider the set of all networks that are consistent with our data - there are likely to be many.

• Use an algorithm to sample this ensemble of networks randomly.

• The mean adjacency matrix gives the probability of each link being present within the ensemble.
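The sampling idea can be sketched as follows. The slides do not spell out what "consistent with our data" means, so this sketch assumes that every GMT line must form a connected subnetwork, and it builds each sample from random trees, which does not sample the ensemble uniformly. It is an illustration of the averaging step, not the authors' algorithm; `sample_network` and `mean_adjacency` are hypothetical names.

```python
import random
from collections import Counter


def sample_network(node_sets, rng):
    """Draw one network in which every observed set is connected (assumption)."""
    edges = set()
    for nodes in node_sets:
        order = list(nodes)
        rng.shuffle(order)
        # Random tree: attach each node to a randomly chosen earlier node.
        for idx in range(1, len(order)):
            partner = order[rng.randrange(idx)]
            edges.add(frozenset((order[idx], partner)))
    return edges


def mean_adjacency(node_sets, n_samples=2000, seed=0):
    """Fraction of sampled networks that contain each edge."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_samples):
        counts.update(sample_network(node_sets, rng))
    return {tuple(sorted(e)): c / n_samples for e, c in counts.items()}


# Toy node sets (hypothetical data).
node_sets = [["A", "B", "C"], ["B", "C", "D"], ["B", "C"]]
probs = mean_adjacency(node_sets)
```

Pairs that never share a line, such as A and D here, never receive an edge, while a pair observed alone on a line is linked in every sample.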

Page 8: Network inference from repeated observations of node sets

Inference live!

Page 9: Network inference from repeated observations of node sets

Information content

Page 10: Network inference from repeated observations of node sets

Analytic Approximation

• When applying this approach to real data, there are typically large numbers of nodes.

• The sample space of networks can be very large, which makes sampling computationally demanding.

• Write a simple analytical approximation which mimics the action of the algorithm:

$$p_{ij} = 1 - \prod_{k} \left(1 - \frac{2\alpha}{n_{ij}^{k}}\right)$$
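A small sketch of how such an approximation might be evaluated. It assumes the formula reads p_ij = 1 - prod_k (1 - 2α/n_k), where the product runs over the GMT lines containing both i and j and n_k is the number of entities on line k; the slide does not define these symbols, so that reading, the clamping of each factor at zero, and the function name are assumptions.

```python
from itertools import combinations


def analytic_edge_probabilities(node_sets, alpha=1.0):
    """Approximate edge probabilities from co-occurrence in node sets."""
    survive = {}  # running product of (1 - 2*alpha/n_k) for each pair
    for nodes in node_sets:
        members = sorted(set(nodes))
        n = len(members)
        if n < 2:
            continue  # a single node carries no connectivity information
        factor = max(0.0, 1.0 - 2.0 * alpha / n)  # clamp to stay a probability
        for pair in combinations(members, 2):
            survive[pair] = survive.get(pair, 1.0) * factor
    return {pair: 1.0 - s for pair, s in survive.items()}


# Toy node sets (hypothetical data).
node_sets = [["A", "B", "C"], ["B", "C", "D"], ["B", "C"]]
p = analytic_edge_probabilities(node_sets, alpha=1.0)
```

Small lines contribute large factors 2α/n_k, so a pair seen together in a few small sets scores higher than a pair seen once in a large set.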

Page 11: Network inference from repeated observations of node sets

Compare analytic approximation

Page 12: Network inference from repeated observations of node sets

Correction for sampling bias

• Destroy any information by a random permutation of the GMT file, then compare the actual edge weight to the distribution of edge weights from the randomly permuted GMT files.
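One way to sketch this comparison is an empirical p-value per edge. The permutation scheme (shuffling all memberships across lines while keeping line lengths), the stand-in edge weight, and the add-one correction are assumptions chosen for illustration; the slides do not specify which permutation or statistic was used.

```python
import random
from itertools import combinations


def edge_weights(node_sets, alpha=1.0):
    """Stand-in edge weight: 1 - prod_k (1 - 2*alpha/n_k) over shared lines."""
    survive = {}
    for nodes in node_sets:
        members = sorted(set(nodes))
        if len(members) < 2:
            continue
        factor = max(0.0, 1.0 - 2.0 * alpha / len(members))
        for pair in combinations(members, 2):
            survive[pair] = survive.get(pair, 1.0) * factor
    return {pair: 1.0 - s for pair, s in survive.items()}


def permute_gmt(node_sets, rng):
    """Shuffle all memberships across lines, keeping each line's length."""
    pool = [node for nodes in node_sets for node in nodes]
    rng.shuffle(pool)
    shuffled, start = [], 0
    for nodes in node_sets:
        shuffled.append(pool[start:start + len(nodes)])
        start += len(nodes)
    return shuffled


def empirical_p_values(node_sets, n_perm=500, seed=0, alpha=1.0):
    """Share of permuted files whose weight for a pair reaches the observed one."""
    rng = random.Random(seed)
    observed = edge_weights(node_sets, alpha)
    exceed = dict.fromkeys(observed, 0)
    for _ in range(n_perm):
        null = edge_weights(permute_gmt(node_sets, rng), alpha)
        for pair, weight in observed.items():
            if null.get(pair, 0.0) >= weight:
                exceed[pair] += 1
    # Add-one correction keeps every p-value strictly positive.
    return {pair: (c + 1) / (n_perm + 1) for pair, c in exceed.items()}


# Toy node sets (hypothetical data).
node_sets = [["A", "B"], ["A", "B", "C"], ["A", "B", "D"],
             ["C", "D", "E"], ["E", "F", "G"], ["F", "G", "H"]]
pvals = empirical_p_values(node_sets, n_perm=200)
```

Edges whose observed weight is rarely matched in the permuted files are the ones least likely to be explained by how often each entity is sampled.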

Page 13: Network inference from repeated observations of node sets

Application to Infer PPIs

Malovannaya A et al. Analysis of the human endogenous coregulator complexome. Cell. 2011 May 27;145(5):787-99

Page 14: Network inference from repeated observations of node sets

PPI network

Page 15: Network inference from repeated observations of node sets

Validation

• Compare inferred PPI network to the following databases:
  – BioCarta
  – HPRD PPI
  – InnateDB
  – IntAct
  – KEGG
  – MINT mammalia
  – MIPS
  – BioGrid

Page 16: Network inference from repeated observations of node sets

Comparison

Page 17: Network inference from repeated observations of node sets

Validation

Page 18: Network inference from repeated observations of node sets

Validation

Page 19: Network inference from repeated observations of node sets

Application to stem cells

• We used two types of high-throughput data from the ESCAPE database (www.maayanlab.net/ESCAPE).

• Chip X data: from ChIP-chip and ChIP-seq experiments.
  – 203,190 protein-DNA binding interactions in the proximity of coding regions from 48 ESC-relevant source proteins.

• Logof followed by microarray data: a manually compiled database of protein-mRNA regulatory interactions derived from loss-of-function or gain-of-function perturbations followed by microarray profiling.
  – 154,170 interactions from 16 ESC-relevant regulatory proteins from loss-of-function studies, and 54 from gain-of-function studies.

Page 20: Network inference from repeated observations of node sets

Chip X network

Page 21: Network inference from repeated observations of node sets

Logof network

Page 22: Network inference from repeated observations of node sets

Combining networks

• Each data source gives a different perspective on the associations between the genes.

• New insights may be gained by combining the different perspectives. For example, small but consistent associations across different perspectives are revealed by the enhanced signal-to-noise ratio.

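$$p_{ij} = 1 - \left[\prod_{k_1} \left(1 - \frac{2\alpha}{n_{ij}^{k_1}}\right)\right] \left[\prod_{k_2} \left(1 - \frac{2\beta}{n_{ij}^{k_2}}\right)\right] \cdots$$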
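If the combined expression is read as one bracketed product per data source, each bracket is one minus that source's own edge probability, so the combination can be computed from per-source results as 1 - (1 - p1)(1 - p2)... The sketch below illustrates that reading; the function name, the gene labels, and the numbers are made up for illustration and are not values from the ESCAPE data.

```python
def combine_sources(*per_source_probs):
    """Combine per-source edge probabilities as 1 - prod_s (1 - p_s)."""
    survive = {}
    for probs in per_source_probs:
        for pair, p in probs.items():
            survive[pair] = survive.get(pair, 1.0) * (1.0 - p)
    return {pair: 1.0 - s for pair, s in survive.items()}


# Hypothetical per-source probabilities for illustration only.
source_1 = {("G1", "G2"): 0.2, ("G2", "G3"): 0.1}
source_2 = {("G1", "G2"): 0.25}
combined = combine_sources(source_1, source_2)
```

A pair with modest support in both sources ends up with a higher combined probability than either source gives alone, which is the signal-to-noise gain the slide refers to.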

Page 23: Network inference from repeated observations of node sets

Combination of Chip X and Logof

Page 24: Network inference from repeated observations of node sets

An extension of the approach...

Page 25: Network inference from repeated observations of node sets

Application II: Inference of Network of statistical relationships in AERS database

• Adverse Event Reporting System (AERS) database contains records of ....

AERS Record 1    Drug 1, Drug 2, ...    Side-effect 1, Side-effect 2, ...
AERS Record 2    Drug 3, Drug 4, ...    Side-effect 3, Side-effect 4, ...
…

Page 26: Network inference from repeated observations of node sets

AERS sub network

Page 27: Network inference from repeated observations of node sets

AERS Large-scale Adjacency Matrix

Page 28: Network inference from repeated observations of node sets

And finally…

Page 29: Network inference from repeated observations of node sets

Summary

• We described a general class of problem in network inference.
• A network of physical interactions between proteins is inferred based on high-throughput IP/MS experiments.
• The method has been applied to examine associations between stem-cell genes from multiple perspectives.
• We have begun to apply the approach to the inference of statistical interactions between drugs and side-effects based on the AERS database.
• More details can be found on the website: www.maayanlab.net/S2N