We got differentially expressed genes, now what?

http://www.aitbiotech.com/images/microarray.jpg
http://www.pnas.org/content/104/51/20374/F4.large.jpg

Find function, find enrichment, reduce false positives

From gene-lists to functional annotations


The 3 Gene Ontologies

• Molecular Function = elemental activity/task: the tasks performed by individual gene products; examples are carbohydrate binding and ATPase activity

• Biological Process = biological goal or objective: broad biological goals, such as DNA repair or purine metabolism, that are accomplished by ordered assemblies of molecular functions

• Cellular Component = location or complex: subcellular structures, locations, and macromolecular complexes; examples include nucleus, telomere, and RNA polymerase II holoenzyme

Modified from: http://anil.cchmc.org/Intro_FunGen_Feb2008_Jegga.ppt#287,33,Slide 33


Example: Gene = hammer

Function (what)                  Process (why)
Drive a nail into wood           Carpentry
Drive a stake into soil          Gardening
Smash a bug                      Pest control
A performer's juggling object    Entertainment

http://anil.cchmc.org/Intro_FunGen_Feb2008_Jegga.ppt#284,34,Slide 34


http://www.geneontology.org/




Known Disease Genes

Direct Interactions of Disease Genes

Mining human interactome

Which of these interactants are potential new candidates?

Indirect Interactions of Disease Genes

7

66

778

Prioritize candidate genes in the interacting partners of the disease-related genes

• Training sets: disease-related genes

• Test sets: interacting partners of the training genes

http://anil.cchmc.org/Intro_FunGen_Feb2008_Jegga.ppt#337,47,Slide 47
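Building the test set from the interactome can be sketched in a few lines. This is a minimal illustration, assuming the interactome is available as an adjacency dictionary; the gene names and the `interactants_within` helper are hypothetical and not part of any tool mentioned here.

```python
from collections import deque

def interactants_within(network, seeds, max_distance=1):
    """Return {gene: distance} for non-seed genes within max_distance of any seed.

    network: dict mapping each gene to the set of its interaction partners.
    seeds:   the training set (known disease genes).
    """
    seeds = set(seeds)
    distance = {gene: 0 for gene in seeds}
    queue = deque(seeds)
    # Breadth-first search started from all seeds at once.
    while queue:
        gene = queue.popleft()
        if distance[gene] == max_distance:
            continue  # do not expand beyond the requested distance
        for partner in network.get(gene, ()):
            if partner not in distance:
                distance[partner] = distance[gene] + 1
                queue.append(partner)
    return {g: d for g, d in distance.items() if g not in seeds}

# Hypothetical toy interactome (undirected, stored as adjacency sets).
toy_network = {
    "SEED1": {"A", "B"},
    "SEED2": {"B", "C"},
    "A": {"SEED1", "D"},
    "B": {"SEED1", "SEED2"},
    "C": {"SEED2", "E"},
    "D": {"A"},
    "E": {"C"},
}
training_set = ["SEED1", "SEED2"]

# Direct interactions only (distance 1) form the test set.
test_set = sorted(interactants_within(toy_network, training_set, max_distance=1))
print(test_set)  # ['A', 'B', 'C']
```

Setting `max_distance=2` would also pull in the indirect interactants (here D and E), which is why the candidate list grows so quickly with distance.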


A small example of post-microarray analysis tools:

Database                   URL
Panther                    http://www.pantherdb.org/
ToppGene                   http://toppgene.cchmc.org/
STRING                     http://string-db.org/
GOTM                       http://bioinfo.vanderbilt.edu/gotm
Onto-Tools                 http://vortex.cs.wayne.edu/OntoToolsUsers.htm
TF networks (P.A.I.N.T)    http://www.dbi.tju.edu/dbi/tools/paint/index.php

PANTHER™ Protein Classification System


http://www.pantherdb.org

WHAT CAN I DO ON THE PANTHER SITE?

Protein ANalysis THrough Evolutionary Relationships

Goal: The PANTHER site was designed to facilitate functional analysis of large numbers of genes, proteins or transcripts.

Tools:

• Explore protein families, molecular functions, biological processes and pathways.

• Generate lists of genes, proteins or transcripts that belong to a given protein family or subfamily, have a given molecular function or participate in a given biological process or pathway, e.g. generate a candidate gene list for a disease.

• Analyze lists of genes, proteins or transcripts in batch mode according to categories based on family, molecular function, biological process or pathway, e.g. analyze mRNA microarray data.


http://nar.oxfordjournals.org/cgi/content/full/31/1/334

http://genome.cshlp.org/content/13/9/2129.full

http://nar.oxfordjournals.org/cgi/content/full/33/suppl_1/D284

http://nar.oxfordjournals.org/cgi/content/full/35/suppl_1/D247

http://www.pantherdb.org/sitemap.jsp

Single gene search

Batch gene search


Convert gene list IDs: Affy ID → gene symbol

http://david.abcc.ncifcrf.gov/tools.jsp

Affy IDs: 1788_S_AT, 36651_AT, 41788_I_AT, 35595_AT, 36285_AT, 39586_AT, 35160_AT, 39424_AT

Gene symbols: USP1, DDR1, WNT10B, PRKAR1B, MLL, CD44, GNA13, MMP15, IER3



http://david.abcc.ncifcrf.gov/tools.jsp

Paste the Affy ID list
Select AFFY_ID as ID type
Select List type: Gene List
Submit list
Select HOMO SAPIENS as species, press the select button
Choose the Gene ID Conversion Tool
Select: GENE_SYMBOL, submit and download the results



Perform Panther Batch Search:

Copy the gene symbol list and paste it into the Batch Search in Panther: http://www.pantherdb.org/ => Batch Search
Select upload ID type: Gene Symbol
Select File Type: ID list
Result page: Genes
Select 1 dataset: NCBI: H. sapiens
Press the Search button
Press the [icon] and select: Biological process

Panther Export Options

Click on either Pie slices or Bars to get sub-functions.
Click on links to get gene lists for the chosen function.

http://www.pantherdb.org/genes/


Other Panther Options




http://www.pantherdb.org/panther/ontologies.jsp

Task: find genes in a specific ontology (or in a few ontologies)

Panther vs GO molecular function and biological process

Browse for genes in ontologies





Search PANTHER Pathway

http://www.pantherdb.org/pathway/

Add legend to pathway






Gene expression tools

http://www.pantherdb.org/tools/

Compare classifications of multiple clusters of lists to a reference list to statistically determine over- or under-representation of PANTHER classification categories. Each list is compared to the reference list using the binomial test (Cho & Campbell, Trends Genet. 2000) for each molecular function, biological process, or pathway term in PANTHER.

Map the genes in a gene expression data file to a PANTHER ontology. For pathways, you can then view the gene expression values overlaid on top of a pathway diagram, where genes are colored according to the expression value.
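The binomial comparison of a gene list against a reference list can be sketched as follows. This is an illustration only, assuming the category frequency in the reference list is used as the success probability; the function names and all counts are hypothetical, and PANTHER's own implementation may differ in detail.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def binomial_representation(k, n, ref_in_category, ref_size):
    """Test one category for over- or under-representation.

    k: genes from our list that fall in the category
    n: size of our list
    ref_in_category / ref_size: category frequency in the reference list
    Returns (expected count, direction, one-sided p-value).
    """
    p = ref_in_category / ref_size
    expected = n * p
    if k >= expected:
        direction = "over"
        p_value = sum(binomial_pmf(i, n, p) for i in range(k, n + 1))
    else:
        direction = "under"
        p_value = sum(binomial_pmf(i, n, p) for i in range(0, k + 1))
    return expected, direction, p_value

# Hypothetical numbers: 400 of 20000 reference genes are in the category,
# and 8 of the 100 genes in our list are in the category.
expected, direction, p_value = binomial_representation(8, 100, 400, 20000)
print(f"expected {expected:.1f}, observed 8, {direction}-represented, p = {p_value:.2g}")
```

Because one such test is run per category, the resulting p-values still need a multiple-testing correction before they are interpreted.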



http://www.sciencedirect.com/science?_ob=ArticleURL&_udi=B6TCY-4152YS1-F&_coverDate=09%2F01%2F2000&_alid=250320192&_rdoc=1&_fmt=&_orig=search&_qd=1&_cdi=5183&_sort=d&view=c&_acct=C000053769&_version=1&_urlVersion=0&_userid=1567505&md5=0b6b00d9b59792e4449350f5e42ca771







Other Panther Options - GRAPHIC RESULTS

Play with graphics: default and optional settings

http://toppgene.cchmc.org/
http://toppgene.cchmc.org/help/help.jsp

Portal for:
(i) gene list functional enrichment
(ii) candidate gene prioritization using either functional annotations or network analysis
(iii) identification and prioritization of novel disease candidate genes in the interactome




Hypergeometric distribution with Bonferroni correction

http://nar.oxfordjournals.org/cgi/reprint/gkp427v1

http://stattrek.com/Tables/Hypergeometric.aspx

What is a hypergeometric experiment?

A hypergeometric experiment has the following characteristics:
• The population has size N, of which M items are successes.
• The researcher randomly selects a subset of n items from the population, without replacement.
• Question: what is the probability that exactly k of the selected items are successes?

What is a hypergeometric distribution?

A hypergeometric distribution is a probability distribution. It gives the probabilities associated with the number of successes in a hypergeometric experiment.

Example: We have a pack of 52 cards (26 black; black = success). We randomly select 12 cards out of the 52. What is the probability of getting exactly 7 successes (black cards)? Answer: 0.21
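The card example can be checked with a few lines of code. This is a minimal sketch using only the standard library; the function names are illustrative, and the symbols N, M, n and k follow the definitions above. The second function shows how the same distribution is typically used for gene list enrichment, where the p-value is the probability of seeing k or more successes.

```python
from math import comb

def hypergeom_pmf(k, N, M, n):
    """P(exactly k successes) when drawing n items without replacement
    from a population of N items that contains M successes."""
    return comb(M, k) * comb(N - M, n - k) / comb(N, n)

def hypergeom_enrichment_p(k, N, M, n):
    """P(k or more successes): the usual one-sided enrichment p-value."""
    upper = min(M, n)
    return sum(hypergeom_pmf(i, N, M, n) for i in range(k, upper + 1))

# 52 cards, 26 black (success), draw 12, exactly 7 black.
p_seven_black = hypergeom_pmf(7, N=52, M=26, n=12)
print(round(p_seven_black, 2))  # 0.21
```

For a gene list, N would be the number of genes in the background, M the number annotated with a given term, n the size of the gene list, and k the number of list genes carrying the term.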

[Figure: hypergeometric calculator input and results]

Just 2 clarification slides….



Statistical Corrections

http://cbi.labri.fr/outils/BlastSets/BlastSets_web_manual/principles.html

In many analyses of biological experiments, a great number of false positives are found among the results. When making multiple comparisons, we need to apply a statistical correction to our threshold to remove as many false positives as possible.

Commonly available statistical corrections:

Bonferroni correction
  Complexity: simplest
  Time: fastest
  Results: most conservative; keeps only the most significant results, removing all possible noise and putative results
  Drawback: a lot of significant information is removed along with the noise

False Discovery Rate (FDR)
  Results: less conservative; a good compromise between keeping only really significant hits and having too many false positives
  Drawback: some false positives remain

When detecting differentially expressed genes, we want to detect ONLY the differentially expressed genes, with no false positives!
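The difference between the two corrections is easiest to see on a small example. The sketch below assumes the Benjamini-Hochberg procedure as the FDR method (the slides do not name a specific procedure), and the raw p-values are hypothetical.

```python
def bonferroni(p_values):
    """Multiply each p-value by the number of tests (capped at 1)."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw p-values from five tests.
raw = [0.001, 0.008, 0.02, 0.03, 0.60]
alpha = 0.05

bonf = bonferroni(raw)
fdr = benjamini_hochberg(raw)
print("Bonferroni keeps", sum(p <= alpha for p in bonf), "of", len(raw))
print("FDR (BH) keeps  ", sum(p <= alpha for p in fdr), "of", len(raw))
```

On these numbers Bonferroni keeps 2 results and Benjamini-Hochberg keeps 4, which illustrates the trade-off listed above: fewer false positives at the cost of discarding results that may be real.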


a. Using ToppFun for gene list enrichment analysis

Example: construct a gene list enrichment analysis on obesity-associated genes.

Go to the ToppGene web page: http://toppgene.cchmc.org/

Choose the ToppFun link.

Copy the gene symbol list and paste it into the provided box, make sure that the entry type is HGNC symbol, and press the Submit Query button.

Go to the bottom of the page, choose the FDR correction method for all features, and submit.

Observe the details of the results, one at a time.

b. Using ToppGene for disease gene prioritization based on functional similarity to training set genes

Query: To rank or prioritize a list of genes (test set) by functional annotation similarity to the training set.

Calculates a score and p-value for the genes and functions.
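The idea of ranking test genes by annotation similarity to a training set can be illustrated with a deliberately simple stand-in. The sketch below uses Jaccard overlap between annotation sets, which is not ToppGene's actual scoring method and produces no p-value; the gene names and annotations are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets (0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def rank_by_annotation_similarity(training, test):
    """Rank test genes by overlap with the pooled training annotations.

    training, test: dicts mapping gene name -> set of annotation terms.
    Returns a list of (gene, score), best first.
    """
    profile = set().union(*training.values())
    scores = {gene: jaccard(terms, profile) for gene, terms in test.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical annotations.
training = {
    "T1": {"lipid metabolism", "insulin signaling"},
    "T2": {"insulin signaling", "appetite regulation"},
}
test = {
    "G1": {"insulin signaling", "appetite regulation"},
    "G2": {"DNA repair"},
    "G3": {"lipid metabolism", "DNA repair"},
}

for gene, score in rank_by_annotation_similarity(training, test):
    print(gene, round(score, 2))
```

Here G1 ranks first because it shares two of the three training annotations, and G2 ranks last because it shares none.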

c. Using ToppNet for disease gene prioritization based on topological features in the protein-protein interaction network (PPIN)

Query: To rank or prioritize a list of genes (test set) based on topological features in the PPIN.

d. Using ToppGenet to identify and prioritize the neighboring genes of the "seeds" or training set in the protein-protein interaction network (PPIN)

Query: To rank or prioritize a list of genes in the interactome of training set genes using either functional similarity (ToppGene) or PPIN analysis (ToppNet).

Create a network by functional similarity (ToppGene) or network analysis (ToppNet).
Distance to seeds: 1; the test set comprises all genes that are immediate interactants of the training set genes.
Purple nodes are the training set or seed genes.
Grey nodes are the interactants from the test set.
Green nodes (a subset of the grey ones) are the top-ranked genes from the test set.

STRING (Search Tool for the Retrieval of Interacting Genes/Proteins) (functional connectivity within a proteome)

http://string-db.org/

STRING is a database and web resource dedicated to protein–protein interactions, including both physical and functional interactions. It weights and integrates information from numerous sources, including experimental repositories, computational prediction methods and public text collections, thus acting as a meta-database that maps all interaction evidence onto a common set of genomes and proteins.
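One simple way to integrate several weighted evidence sources into a single confidence value is a noisy-OR combination. The sketch below assumes the evidence channels are independent; the channel names and scores are hypothetical, and this is not claimed to be STRING's exact scoring scheme.

```python
from math import prod

def combine_evidence(channel_scores):
    """Noisy-OR combination of per-channel confidence scores in [0, 1].

    The interaction is considered unsupported only if every channel
    independently fails to support it.
    """
    for name, score in channel_scores.items():
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score for {name!r} must be between 0 and 1")
    return 1.0 - prod(1.0 - s for s in channel_scores.values())

# Hypothetical evidence for one protein pair.
evidence = {"experiments": 0.6, "text_mining": 0.5, "prediction": 0.2}
combined = combine_evidence(evidence)
print(round(combined, 2))  # 0.84
```

The combined value is always at least as high as the strongest single channel, so several weak but independent lines of evidence can add up to a confident interaction.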

Version 8.0 of STRING covers about 2.5 million proteins from 630 organisms.

Databases: MINT, HPRD, BIND, DIP, BioGRID, KEGG and Reactome, IntAct, EcoCyc, NCI-Nature Pathway Interaction Database and Gene Ontology (GO) protein complexes. SGD, OMIM, The Interactive Fly, and all abstracts from PubMed.

A shift of focus to systems biology in the “post-genomic” era

http://bioinfo.vanderbilt.edu/gotm/


http://bioinfo.vanderbilt.edu/gotm/GOTM_Manual.pdf

[Figure panels: bar graph; pathway details; input details; pathway gene details (all genes in pathway)]

The apoptosis pathway as described by KEGG

Legend: underexpressed genes, overexpressed genes

http://www.dbi.tju.edu/dbi/tools/paint/

TF networks (P.A.I.N.T)

SUSPECTS is a server designed to automate the first steps of the candidate gene approach.

http://www.genetics.med.ed.ac.uk/suspects/search.shtml

BRCA1

The 3D boxes represent genes. Higher, brighter boxes represent better (higher scoring) candidates. The width of a box corresponds to the number of different types of evidence that contribute to its score. If a box is blue then a potentially relevant PubMed abstract has been found.


http://www.genetics.med.ed.ac.uk/prospectr/

BRCA1:

PROSPECTR uses sequence features to rank genes in order of their likelihood of involvement in disease.
