Page 1: Detecting Phenotype-Specific Interactions Between Biological Processes

Detecting Phenotype-Specific Interactions Between Biological Processes

Nadeem A. Ansari

Department of Computer Science
Wayne State University

Detroit, MI 48202

Page 2: Detecting Phenotype-Specific Interactions Between Biological Processes

Outline

• Biological background
• Motivation and problem description
• Challenges and limitations
• Mathematical background
• Detecting changed interactions between biological processes in a phenotype
• Improvements
• Results
• Summary


Page 4: Detecting Phenotype-Specific Interactions Between Biological Processes

Cells, proteins, and DNA

• Cells: fundamental units of life that contain all the working machinery necessary for their functioning

• Proteins: the main contributors of this working machinery

• Deoxyribonucleic acid (DNA): contains the blueprint for making the working machinery

• Gene expression: the process of making the working machinery


Page 5: Detecting Phenotype-Specific Interactions Between Biological Processes

DNA

• Linear molecule of two strands, each composed of subunits called nucleotides
• Nucleotide types:
  – Adenine (A)
  – Cytosine (C)
  – Guanine (G)
  – Thymine (T)

Page 6: Detecting Phenotype-Specific Interactions Between Biological Processes

DNA

• Base pairing (A pairs with T, C pairs with G):

  … A A C G G A T …
  … T T G C C T A …

Page 7: Detecting Phenotype-Specific Interactions Between Biological Processes

Transcription

• Information stored in DNA letters is transcribed into Ribonucleic acid (RNA)

• RNA: a chain of nucleotides - A, C, G, U (uracil)

  DNA: … G T G C A T …
  RNA: … C A C G U A …
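The base-pairing and transcription examples on these slides can be reproduced with a short Python sketch. The function names `complement` and `transcribe` are illustrative only, and strand orientation is ignored, as it is on the slides.

```python
# Watson-Crick pairing for DNA, and template-DNA-to-RNA pairing (T pairs with A, A with U).
DNA_PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}
RNA_PAIR = {"A": "U", "T": "A", "C": "G", "G": "C"}

def complement(dna):
    """Return the complementary DNA strand, base by base."""
    return "".join(DNA_PAIR[base] for base in dna)

def transcribe(template):
    """Return the RNA that pairs with a DNA template strand."""
    return "".join(RNA_PAIR[base] for base in template)

print(complement("AACGGAT"))  # TTGCCTA
print(transcribe("GTGCAT"))   # CACGUA
```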

Page 8: Detecting Phenotype-Specific Interactions Between Biological Processes

Translation


• Information stored in RNA is translated into chains of amino acids - proteins

Page 9: Detecting Phenotype-Specific Interactions Between Biological Processes

Gene expression

• The process of making the working machinery of a cell.


Page 10: Detecting Phenotype-Specific Interactions Between Biological Processes


• Regions of DNA that are synthesized into functional RNA and proteins are known as genes

• An observable characteristic (or trait) of an organism caused by gene expression is known as a phenotype.

Page 11: Detecting Phenotype-Specific Interactions Between Biological Processes

Gene expression measurement – why?

• Various stimuli cause changes in gene expression

• A change in expression level results in under- or overproduction of the working machinery, which can lead to diseases and other phenotypes

• All cells contain the same DNA but express genes selectively

• Measuring gene expression can help us understand the underlying biological phenomena

Page 12: Detecting Phenotype-Specific Interactions Between Biological Processes

Gene expression measurements

• Typically, researchers measure gene expression in two different tissues or cell samples
  – e.g., cells treated with a drug vs. untreated cells

• Genes expressed differently than in a control sample are called differentially expressed (DE) genes

• High-throughput technologies such as DNA microarrays measure the expression levels of thousands of genes

Page 13: Detecting Phenotype-Specific Interactions Between Biological Processes

Genes and annotations

• Functional characteristics of gene products are stored in annotation databases such as the Gene Ontology
• Gene Ontology (GO): a controlled and structured vocabulary
  – Molecular functions, biological processes, and cellular components
• Structured as directed acyclic graphs (DAGs)
  – Nodes represent terms
  – Edges represent relationships
• Parent-child relations (a term can have more than one parent)
  – Is-a, part-of, and regulates (negatively, positively)

Page 14: Detecting Phenotype-Specific Interactions Between Biological Processes

Biological processes – GO subset

• GO is a set of terms and their definitions organized in a structure that reflects their relationships

• GO also provides a set of annotations describing what is known about each gene product


Page 16: Detecting Phenotype-Specific Interactions Between Biological Processes

Motivation and problem description

• Various stimuli cause differential gene expression, which results in the over- and underproduction of proteins

• Over- and underproduction of proteins can result in the expression of a disease and a disease-specific phenotype

• Understanding gene behavior can help us understand diseases in new ways, e.g. by identifying drug targets for curing diseases

Page 17: Detecting Phenotype-Specific Interactions Between Biological Processes

Motivation and problem description

• Current approaches look for the biological functions that are under or over represented in the phenotype-specific gene expression patterns

• However, life is complex and biological functions also interact

• These interactions change in a phenotype

• Understanding changed interactions between biological functions is important in understanding the underlying biological mechanism that resulted in the phenotype


Page 19: Detecting Phenotype-Specific Interactions Between Biological Processes

Goals

• Our goal is to detect the interactions between biological functions that have changed significantly in a given phenotype

• We detect these interactions between the biological processes from GO annotated with differentially expressed genes in a phenotype


Page 20: Detecting Phenotype-Specific Interactions Between Biological Processes

Challenges and limitations

• There is no simple way to establish which biological functions are important
  – No universally accepted statistical model exists
• Finding relationships between biological processes using mathematical models is challenging
• No known statistical model detects changed interactions in a given phenotype
• Using GO annotations presents its own challenges

Page 21: Detecting Phenotype-Specific Interactions Between Biological Processes

Challenges and limitations

• GO is incomplete and is updated on a continuous basis
  – Information regarding gene annotations is missing
• GO contains inconsistencies
  – New research may make previous annotations obsolete
• The GO hierarchy poses the challenge of dependencies
  – Genes annotated with a specific term are assumed to be annotated with all the ancestors of that term
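The dependency described in the last bullet, where a gene annotated with a term is implicitly annotated with every ancestor of that term, can be illustrated with a small Python sketch. The DAG, the term names, and the helper names `ancestors` and `propagate` are hypothetical and are not real GO identifiers or tools.

```python
def ancestors(term, parents):
    """All ancestors of `term` in a DAG given as {child: [parent, ...]}."""
    seen = set()
    stack = list(parents.get(term, []))
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, []))
    return seen

def propagate(annotations, parents):
    """Extend each gene's direct annotations with all ancestor terms."""
    full = {}
    for gene, terms in annotations.items():
        closure = set(terms)
        for t in terms:
            closure |= ancestors(t, parents)
        full[gene] = closure
    return full

# Toy DAG with made-up term names; a term may have more than one parent.
parents = {
    "dna_repair": ["dna_metabolism", "response_to_stress"],
    "dna_metabolism": ["metabolism"],
    "response_to_stress": ["biological_process"],
    "metabolism": ["biological_process"],
}
annotations = {"geneA": {"dna_repair"}}
full = propagate(annotations, parents)
print(sorted(full["geneA"]))
```

One direct annotation expands to five terms here, which is why columns of a gene-function matrix built from GO are not independent.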


Page 23: Detecting Phenotype-Specific Interactions Between Biological Processes

Information retrieval (IR)

• Problem: Given a query, find relevant documents from a collection

• Vector space model (VSM)
  – Represent documents and keywords in a matrix
    • Documents as columns with keywords as components; the columns are document vectors
  – Represent the query as a (column) vector
  – Find document vectors close to the query vector
    • These documents are relevant to the query

Page 24: Detecting Phenotype-Specific Interactions Between Biological Processes

Example – document retrieval

A document collection:

D1  How to bake bread without recipes
D2  The classic art of Viennese pastry
D3  Numerical recipes: the art of scientific computing
D4  Breads, pastries, pies, and cakes: quality baking recipes
D5  Pastry: a book of best French recipes

Example taken from Berry et al., SIAM Review 41, 2 (1999)

Page 25: Detecting Phenotype-Specific Interactions Between Biological Processes

Example – document retrieval

A document collection:

D1  How to bake bread without recipes
D2  The classic art of Viennese pastry
D3  Numerical recipes: the art of scientific computing
D4  Breads, pastries, pies, and cakes: quality baking recipes
D5  Pastry: a book of best French recipes

Terms:

T1    T2      T3     T4    T5      T6
bake  recipe  bread  cake  pastry  pie

Page 26: Detecting Phenotype-Specific Interactions Between Biological Processes

Example – document retrieval

A document collection:

D1  How to bake bread without recipes
D2  The classic art of Viennese pastry
D3  Numerical recipes: the art of scientific computing
D4  Breads, pastries, pies, and cakes: quality baking recipes
D5  Pastry: a book of best French recipes

Terms:

T1    T2      T3     T4    T5      T6
bake  recipe  bread  cake  pastry  pie

Term-by-document matrix:

A    D1  D2  D3  D4  D5
T1    1   0   0   1   0
T2    1   0   1   1   1
T3    1   0   0   1   0
T4    0   0   0   1   0
T5    0   1   0   1   1
T6    0   0   0   1   0

Page 27: Detecting Phenotype-Specific Interactions Between Biological Processes

Example (IR VSM)

A    D1  D2  D3  D4  D5
T1    1   0   0   1   0
T2    1   0   1   1   1
T3    1   0   0   1   0
T4    0   0   0   1   0
T5    0   1   0   1   1
T6    0   0   0   1   0

       T1    T2      T3     T4    T5      T6
Terms  bake  recipe  bread  cake  pastry  pie
Query  1     0       1      0     0       0

• User searching for documents related to “baking bread”

• Query vector: $Q = (1\ 0\ 1\ 0\ 0\ 0)^T$

• Document vector: $D_1 = (1\ 1\ 1\ 0\ 0\ 0)^T$

Page 28: Detecting Phenotype-Specific Interactions Between Biological Processes

Finding relevant (similar) documents

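A    D1   D2   …  Dn
T1   a11  a12  …  a1n
T2   a21  a22  …  a2n
…    …    …    …  …
Tm   am1  am2  …  amn

$$Q = (q_1\ q_2\ \dots\ q_m)^T, \qquad D_j = (a_{1j}\ a_{2j}\ \dots\ a_{mj})^T$$

$$\mathrm{similarity}(Q, D_j) = \cos\theta_j = \frac{Q^T D_j}{\|Q\|\,\|D_j\|}$$

$$Q^T D_j = q_1 a_{1j} + q_2 a_{2j} + \dots + q_m a_{mj}, \qquad \|Q\| = \sqrt{Q^T Q} = \sqrt{q_1^2 + q_2^2 + \dots + q_m^2}$$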
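The ranking step of the vector space model can be sketched in Python with NumPy (an assumption; the slides name no implementation). The matrix and query are the "baking bread" example from the preceding slides, and `cosine_similarities` is an illustrative helper name.

```python
import numpy as np

# Term-by-document matrix A (rows: bake, recipe, bread, cake, pastry, pie).
A = np.array([
    [1, 0, 0, 1, 0],
    [1, 0, 1, 1, 1],
    [1, 0, 0, 1, 0],
    [0, 0, 0, 1, 0],
    [0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0],
], dtype=float)

# Query "baking bread": terms T1 (bake) and T3 (bread).
q = np.array([1, 0, 1, 0, 0, 0], dtype=float)

def cosine_similarities(A, q):
    """Cosine of the angle between q and every column of A.

    Assumes q and every column of A are non-zero.
    """
    return (q @ A) / (np.linalg.norm(q) * np.linalg.norm(A, axis=0))

sims = cosine_similarities(A, q)
for name, s in zip(["D1", "D2", "D3", "D4", "D5"], sims):
    print(name, round(float(s), 4))
```

D1 scores about 0.8165 and D4 about 0.5774, while D2, D3, and D5 score 0 because they share no term with the query.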

Page 29: Detecting Phenotype-Specific Interactions Between Biological Processes

Correlation

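• Determines if two random variables vary together

• Linear correlation between X and Y:
  – Positive correlation: X increases as Y increases
  – Negative correlation: X decreases as Y increases
  – No linear correlation: no linear relationship

• For $X = (x_1, x_2, \dots, x_m)$ and $Y = (y_1, y_2, \dots, y_m)$ with means $\bar{X}$ and $\bar{Y}$, the Pearson correlation coefficient is

$$r_{XY} = \frac{\sum_{i=1}^{m} (x_i - \bar{X})(y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{m} (x_i - \bar{X})^2 \, \sum_{i=1}^{m} (y_i - \bar{Y})^2}}$$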

Page 30: Detecting Phenotype-Specific Interactions Between Biological Processes

Pearson correlation coefficient – geometric interpretation

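$$r_{XY} = \frac{\sum_{i=1}^{m} (x_i - \bar{X})(y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{m} (x_i - \bar{X})^2 \, \sum_{i=1}^{m} (y_i - \bar{Y})^2}}$$

With the mean-centered vectors

$$X_c = (x_1 - \bar{X}, \dots, x_m - \bar{X})^T, \qquad Y_c = (y_1 - \bar{Y}, \dots, y_m - \bar{Y})^T$$

the numerator is an inner product:

$$\sum_{i=1}^{m} (x_i - \bar{X})(y_i - \bar{Y}) = X_c^T Y_c$$

so that

$$r_{XY} = \frac{X_c^T Y_c}{\|X_c\|\,\|Y_c\|} = \mathrm{similarity}(X_c, Y_c)$$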
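The geometric interpretation can be checked numerically. The following is a minimal sketch with made-up data, assuming NumPy; `pearson_as_cosine` is an illustrative name.

```python
import numpy as np

def pearson_as_cosine(x, y):
    """Pearson r computed as the cosine between mean-centered vectors."""
    xc = x - x.mean()
    yc = y - y.mean()
    return float(xc @ yc / (np.linalg.norm(xc) * np.linalg.norm(yc)))

# Hypothetical sample data.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 1.0, 4.0, 3.0, 5.0])

r = pearson_as_cosine(x, y)
r_numpy = float(np.corrcoef(x, y)[0, 1])  # NumPy's own Pearson coefficient
print(r, r_numpy)
```

Both values agree (0.8 for this data), which is the identity stated on the slide.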


Page 32: Detecting Phenotype-Specific Interactions Between Biological Processes

Detecting interactions that have changed significantly in the phenotype

• Represent the differentially expressed genes in a phenotype and their biological functions as a matrix: a vector space model with biological processes as column vectors

• Find associations between pairs of biological processes

• Compare these associations with the corresponding associations in the absence of the phenotype

• Detect associations that are significantly different in the phenotype

Page 33: Detecting Phenotype-Specific Interactions Between Biological Processes

Data inputs - genes and functions

• Reference genes and functions set (R)
  – M genes on a microarray
  – N GO terms annotated with the M genes

• In a biological condition under study (E)
  – m < M differentially expressed (DE) genes
  – n <= N GO terms annotated with the m DE genes

Page 34: Detecting Phenotype-Specific Interactions Between Biological Processes

Gene function matrix – reference data

GF   f1   f2   …  fN
g1   a11  a12  …  a1N
g2   a21  a22  …  a2N
…    …    …    …  …
gM   aM1  aM2  …  aMN

Page 35: Detecting Phenotype-Specific Interactions Between Biological Processes

Gene function matrix – reference data

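$$GF^R_{M \times N} = \{a_{ij}\}, \qquad a_{ij} = \begin{cases} 1 & \text{if gene } g_i \text{ is annotated with GO term } f_j \\ 0 & \text{otherwise} \end{cases}$$

[Figure: example gene-function matrix]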

Page 36: Detecting Phenotype-Specific Interactions Between Biological Processes

Gene function matrix – experiment data

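$$GF^E_{m \times n} = \{a_{ij}\}, \qquad a_{ij} = \begin{cases} 1 & \text{if DE gene } g_i \text{ is annotated with GO term } f_j \\ 0 & \text{otherwise} \end{cases}$$

[Figure: example gene-function matrix]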

Page 37: Detecting Phenotype-Specific Interactions Between Biological Processes

Gene function matrix – reference and experiment data

• The experiment gene-function matrix is a submatrix of the reference gene-function matrix
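A minimal sketch of building the two binary gene-function matrices, assuming annotations are available as a gene-to-terms mapping. The toy genes and terms and the helper `gene_function_matrix` are hypothetical.

```python
import numpy as np

def gene_function_matrix(genes, terms, annotations):
    """Binary matrix: entry (i, j) is 1 if genes[i] is annotated with terms[j]."""
    col = {t: j for j, t in enumerate(terms)}
    gf = np.zeros((len(genes), len(terms)))
    for i, g in enumerate(genes):
        for t in annotations.get(g, ()):
            if t in col:
                gf[i, col[t]] = 1.0
    return gf

# Hypothetical toy data: M = 4 reference genes, N = 3 GO terms.
annotations = {"g1": {"f1", "f2"}, "g2": {"f2"}, "g3": {"f3"}, "g4": {"f1"}}
ref_genes = ["g1", "g2", "g3", "g4"]
ref_terms = ["f1", "f2", "f3"]
GF_R = gene_function_matrix(ref_genes, ref_terms, annotations)

# Experiment: only the DE genes, and only terms annotated with at least one DE gene.
de_genes = ["g1", "g2"]
de_terms = [t for t in ref_terms if any(t in annotations[g] for g in de_genes)]
GF_E = gene_function_matrix(de_genes, de_terms, annotations)

print(GF_R.shape, GF_E.shape)  # (4, 3) (2, 2)
```

In this toy case `GF_E` equals the top-left 2×2 block of `GF_R`, matching the submatrix relationship stated above.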

Page 38: Detecting Phenotype-Specific Interactions Between Biological Processes

Challenges and limitations

• GO is incomplete and is updated on a continuous basis
  – Information regarding gene annotations is missing
• GO contains inconsistencies
  – New research may make previous annotations obsolete
• The GO hierarchy poses the challenge of dependencies
  – Genes annotated with a specific term are assumed to be annotated with all the ancestors of that term

Page 39: Detecting Phenotype-Specific Interactions Between Biological Processes

Our approach to solve challenges

• Use singular value decomposition (SVD)

• SVD can find missing relationships between genes and annotations in the latent semantic space and can also remove noise from the data
  – Noise: multiple words describing the same concept

• SVD is a factorization of a matrix into three matrices consisting of the singular vectors and singular values of the original matrix

Page 40: Detecting Phenotype-Specific Interactions Between Biological Processes

Singular value decomposition (SVD)

• SVD of a GF matrix: $GF = G\,S\,F^T$

• Columns of matrix G (F) are the left (right) singular vectors of GF

• S is a diagonal matrix of singular values $s_i$
  – The values on the main diagonal are in non-increasing order and represent variability in the data

Page 41: Detecting Phenotype-Specific Interactions Between Biological Processes

Matrix approximation – dimensionality reduction

• An approximated matrix can be computed by keeping only the k largest singular values

• We select the k that retains the desired data variance (say x%)
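A minimal sketch of the rank-k approximation using `numpy.linalg.svd`. The selection rule is an assumption: variance retained is taken here as the cumulative sum of squared singular values divided by their total, which may differ from the criterion used in the talk. `truncated_svd` is an illustrative name.

```python
import numpy as np

def truncated_svd(gf, variance=0.9):
    """Rank-k approximation of gf keeping at least `variance` of the energy.

    Assumption: energy is measured by squared singular values.
    """
    G, s, Ft = np.linalg.svd(gf, full_matrices=False)
    energy = np.cumsum(s ** 2) / np.sum(s ** 2)
    # Smallest k whose cumulative energy reaches the target, capped at the rank bound.
    k = min(int(np.searchsorted(energy, variance)) + 1, len(s))
    approx = (G[:, :k] * s[:k]) @ Ft[:k, :]
    return approx, k

# Hypothetical matrix with singular values 3, 2, 1 (energies 9, 4, 1).
gf = np.diag([3.0, 2.0, 1.0])
approx, k = truncated_svd(gf, variance=0.9)
print(k)
print(np.round(approx, 6))
```

For this matrix the first two singular values carry 13/14 of the energy, so k = 2 and the smallest diagonal entry is dropped.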

Page 42: Detecting Phenotype-Specific Interactions Between Biological Processes

Approximated matrix – column view


• We approximate both reference and experiment matrices

• The approximated experiment gene-function matrix is not a sub-part of the approximated reference gene-function matrix

Page 43: Detecting Phenotype-Specific Interactions Between Biological Processes

Correlation Between Functions

• Indicates the strength and direction of a linear relationship between two biological processes

• The Pearson correlation coefficient $r_{f_i,f_j}$ between a pair of functions $f_i$ and $f_j$ is computed as:

$$r_{f_i,f_j} = \frac{f_i^T f_j}{\|f_i\|\,\|f_j\|}$$

• Matrices of correlation coefficients, $R^R_{N \times N}$ and $R^E_{n \times n}$, are computed for the reference and experiment data, respectively

Page 44: Detecting Phenotype-Specific Interactions Between Biological Processes

Pair-wise Correlation Coefficients for Reference and Experiment data

• $R^R_{n \times n}$ contains the pair-wise correlation coefficients between the first n functions in the absence of the phenotype

Page 45: Detecting Phenotype-Specific Interactions Between Biological Processes

Fisher Z Transform – Correlation Coefficient To Z-values

• Correlation coefficients from samples of a large population can be mapped to z-values using the Fisher z-transform; the transformed values are approximately normally distributed

• For a correlation coefficient r, the Fisher z-transform $Z_r$ is computed as:

$$Z_r = \frac{1}{2} \ln\frac{1 + r}{1 - r}$$

• Compute $Z^R_r$ from $R^R_{N \times N}$ and $Z^E_r$ from $R^E_{n \times n}$

Page 46: Detecting Phenotype-Specific Interactions Between Biological Processes

Detecting Changes Between Functional Interactions

• Hypothesis: Correlation between two biological processes in the given phenotype differs from the correlation in the reference data

[Table: hypothesis and test statistic]
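The test statistic itself did not survive on this slide. As a hedged illustration only, the standard two-sample comparison of Fisher-transformed correlations can be sketched as follows. The standard error $\sqrt{1/(n_1 - 3) + 1/(n_2 - 3)}$, the use of gene counts as sample sizes, and the two-sided p-value are assumptions here, not the talk's stated method.

```python
import math

def fisher_z(r):
    """Fisher z-transform of a correlation coefficient (|r| < 1)."""
    return 0.5 * math.log((1 + r) / (1 - r))

def compare_correlations(r_ref, n_ref, r_exp, n_exp):
    """z statistic and two-sided p-value for H0: the two correlations are equal.

    Assumes independent samples of sizes n_ref and n_exp (both > 3).
    """
    se = math.sqrt(1.0 / (n_ref - 3) + 1.0 / (n_exp - 3))
    z = (fisher_z(r_exp) - fisher_z(r_ref)) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p

# Hypothetical correlations for one pair of processes.
z, p = compare_correlations(r_ref=0.10, n_ref=5000, r_exp=0.60, n_exp=90)
print(z, p)
```

A small p-value would flag the pair of processes as having a significantly changed correlation in the phenotype.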


Page 48: Detecting Phenotype-Specific Interactions Between Biological Processes

Improvements

• The dependencies between GO terms can be partially removed by weighting the entries of our gene-function matrices

Page 49: Detecting Phenotype-Specific Interactions Between Biological Processes

Scheme 1-1

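• This is the binary scheme that was discussed while describing our main method

$$GF^E_{m \times n} = \{a_{ij}\}, \qquad a_{ij} = \begin{cases} 1 & \text{if DE gene } g_i \text{ is annotated with GO term } f_j \\ 0 & \text{otherwise} \end{cases}$$

$$GF^R_{M \times N} = \{a_{ij}\}, \qquad a_{ij} = \begin{cases} 1 & \text{if gene } g_i \text{ is annotated with GO term } f_j \\ 0 & \text{otherwise} \end{cases}$$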

Page 50: Detecting Phenotype-Specific Interactions Between Biological Processes

Scheme 1-e

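• $e_i$ is the normalized log-transformed fold-change measured for gene $g_i$ in the given condition

$$GF^E_{m \times n} = \{a_{ij}\}, \qquad a_{ij} = \begin{cases} e_i & \text{if DE gene } g_i \text{ is annotated with GO term } f_j \\ 0 & \text{otherwise} \end{cases}$$

$$GF^R_{M \times N} = \{a_{ij}\}, \qquad a_{ij} = \begin{cases} 1 & \text{if gene } g_i \text{ is annotated with GO term } f_j \\ 0 & \text{otherwise} \end{cases}$$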

Page 51: Detecting Phenotype-Specific Interactions Between Biological Processes

Scheme IR 1-1

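$$GF^E_{m \times n} = \{a_{ij}\}, \qquad a_{ij} = \begin{cases} 1 \cdot w_{ij} & \text{if DE gene } g_i \text{ is annotated with GO term } f_j \\ 0 & \text{otherwise} \end{cases}$$

$$w_{ij} = gb_j \cdot iab_i$$

$$gb_j = \frac{1}{\#\text{ of genes annotated with } f_j}, \qquad iab_i = \ln\frac{\text{Total annotations}}{\#\text{ of annotations for } g_i}$$

• gb: gene (annotation) bias, related to the GO DAG
• iab: inverse annotation bias, related to the experiment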

Page 52: Detecting Phenotype-Specific Interactions Between Biological Processes

Scheme IR 1-e

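$$GF^E_{m \times n} = \{a_{ij}\}, \qquad a_{ij} = \begin{cases} e_i \, w_{ij} & \text{if DE gene } g_i \text{ is annotated with GO term } f_j \\ 0 & \text{otherwise} \end{cases}$$

where $e_i$ is the normalized log-transformed fold-change and $w_{ij} = gb_j \cdot iab_i$

$$GF^R_{M \times N} = \{a_{ij}\}, \qquad a_{ij} = \begin{cases} 1 \cdot w_{ij} & \text{if gene } g_i \text{ is annotated with GO term } f_j \\ 0 & \text{otherwise} \end{cases}$$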
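The four weighting schemes can be sketched from a binary gene-function matrix as follows. This is an illustration under assumptions: "total annotations" is read as the total number of non-zero entries in the matrix, every gene and term is assumed to have at least one annotation, and the fold-change values are made up. `ir_weights` is a hypothetical helper name.

```python
import numpy as np

def ir_weights(gf_binary):
    """Weighted matrix with entries w_ij = gb_j * iab_i where annotated, else 0."""
    genes_per_term = gf_binary.sum(axis=0)   # number of genes annotated with f_j
    terms_per_gene = gf_binary.sum(axis=1)   # number of annotations for g_i
    total = gf_binary.sum()                  # assumed meaning of "total annotations"
    gb = 1.0 / genes_per_term
    iab = np.log(total / terms_per_gene)
    return np.outer(iab, gb) * gf_binary

# Hypothetical experiment data: 2 DE genes, 2 GO terms.
gf = np.array([[1.0, 1.0],
               [0.0, 1.0]])
e = np.array([1.5, -2.0])  # made-up normalized log fold-changes

scheme_1_1 = gf                             # binary
scheme_1_e = e[:, None] * gf                # fold-change weighted
scheme_ir_1_1 = ir_weights(gf)              # IR weights
scheme_ir_1_e = e[:, None] * ir_weights(gf) # IR weights times fold-change
print(scheme_ir_1_1)
```

The IR weights play a role similar to inverse document frequency in text retrieval: terms annotated with many genes and genes with many annotations both contribute less.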


Page 54: Detecting Phenotype-Specific Interactions Between Biological Processes

Breast cancer data set

• van 't Veer et al. (2002) found differentially expressed genes in breast cancer
  – 24,000 reference genes on the microarray
  – 13,201 annotated biological processes from GO
  – 231 genes were found to be differentially expressed
  – 246 annotated biological processes with the DE genes

• Since then no satisfactory prediction has been made in this regard

Page 55: Detecting Phenotype-Specific Interactions Between Biological Processes

Breast Cancer Data Set Results

A subset of predicted biological pairs with significant interaction change

Scheme        GO Term 1                    GO Term 2                                        p-value
1-1, IR 1-e   Proteolysis                  Positive regulation of apoptosis                 .0001
1-1           Transcription                DNA replication initiation                       .026
1-1           DNA repair                   Regulation of transcription, DNA-dependent       .033
IR 1-1        Vesicle-mediated transport   Transcription from RNA polymerase II promoter    .002
IR 1-1        DNA replication initiation   Phosphoinositide-mediated signaling              .00001

Page 56: Detecting Phenotype-Specific Interactions Between Biological Processes

Breast Cancer Data Set Results Summary

Number of predicted biological pairs with significant interaction change

Scheme   Cat. 1   Cat. 2   Cat. 3   Accuracy
1-1      10       5        1        93.7%
1-e      16       6        2        91.6%
IR 1-1   9        7        2        88.8%
IR 1-e   15       9        2        92.3%
Total    50       27       7        91.6%

Cat. 1: Known interactions and trivial
Cat. 2: Known interactions and non-trivial
Cat. 3: Unknown
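The accuracy column appears consistent with counting Cat. 1 and Cat. 2 pairs as correct predictions and truncating the percentage to one decimal place. This reading is inferred from the numbers and is not stated on the slide; the sketch below only checks it against the breast cancer counts.

```python
import math

def accuracy(cat1, cat2, cat3):
    """Percent of predicted pairs with known interactions, truncated to 0.1."""
    frac = (cat1 + cat2) / (cat1 + cat2 + cat3)
    return math.floor(frac * 1000) / 10

# Counts per scheme: (Cat. 1, Cat. 2, Cat. 3), from the breast cancer table.
breast = {
    "1-1": (10, 5, 1),
    "1-e": (16, 6, 2),
    "IR 1-1": (9, 7, 2),
    "IR 1-e": (15, 9, 2),
    "Total": (50, 27, 7),
}
for scheme, counts in breast.items():
    print(scheme, accuracy(*counts))
```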

Page 57: Detecting Phenotype-Specific Interactions Between Biological Processes

Lung cancer data set

• Beer et al. (2002) found differentially expressed genes in lung cancer
  – 5541 reference genes on the microarray
  – 2908 annotated biological processes from GO
  – 87 genes were found to be differentially expressed
  – 248 annotated biological processes with the DE genes

Page 58: Detecting Phenotype-Specific Interactions Between Biological Processes

Lung Cancer Data Set Results Summary

Number of predicted biological pairs with significant interaction change

Scheme   Cat. 1   Cat. 2   Cat. 3   Accuracy
1-1      16       3        2        90.4%
1-e      39       3        2        95.4%
IR 1-1   29       2        0        100.0%
IR 1-e   38       9        3        94.0%
Total    122      17       7        95.21%

Page 59: Detecting Phenotype-Specific Interactions Between Biological Processes

Summary

• Various stimuli cause differential gene expression, which results in the expression of a disease and a disease-specific phenotype

• Biological processes interact, and their interactions change in a given phenotype

• We proposed methods to detect such significantly changed interactions in the observed phenotype

• We used the vector space model, matrix approximation, and statistical hypothesis testing to find changed interactions between biological processes from GO

• Results showed accuracy of 88.8% or higher for our proposed methods on both data sets

Page 60: Detecting Phenotype-Specific Interactions Between Biological Processes

References:

• Ansari, N. A., Bao, R., and Drăghici, S. Detecting phenotype-specific interactions between biological processes from microarray data and annotations. Bioinformatics, under revision.

• Drăghici, S. Data Analysis Tools for DNA Microarrays. Chapman and Hall/CRC Press, 2003 (first print), 2006 (second print).

• Berry, M. W., Drmac, Z., and Jessup, E. R. Matrices, vector spaces, and information retrieval. SIAM Review 41, 2 (1999), 335-362.

• Deerwester, S., Dumais, S. T., Furnas, G. W., Landauer, T. K., and Harshman, R. Indexing by latent semantic analysis. Journal of the American Society for Information Science 41, 6 (1990), 391-407.

• Done, B., Khatri, P., Done, A., and Drăghici, S. Predicting novel human Gene Ontology annotations using semantic analysis. IEEE/ACM Transactions on CBB (2009).

Page 61: Detecting Phenotype-Specific Interactions Between Biological Processes

Special Thanks to

• Dr. Sorin Draghici


Page 62: Detecting Phenotype-Specific Interactions Between Biological Processes

Thank You
