Tutorial: Bioinformatics Resources (http://pir.georgetown.edu/pirwww/workshop/bioinfo_resource.html)

Bio-Trac 25 (Proteomics: Principles and Methods)
October 3, 2008

Zhang-Zhi Hu, M.D.
Research Associate Professor
Protein Information Resource, Department of Biochemistry and Molecular & Cellular Biology
Georgetown University Medical Center

What is Bioinformatics?

computer (information) + mouse (biology) = bioinformatics

• NIH Biomedical Information Science and Technology Initiative (BISTI) working definition (2000): research, development, or application of computational tools and approaches for expanding the use of biological, medical, behavioral, or health data, including those to acquire, store, organize, archive, analyze, or visualize such data.

Molecular Biology Database Collection

(http://nar.oxfordjournals.org/cgi/content/full/36/suppl_1/D2)

1078 key databases of 14 categories

Database Collection in Nucleic Acids Res.

Online Access to Database Collection

http://pir.georgetown.edu/pirwww/workshop/2005_database_update.html

http://www.oxfordjournals.org/nar/database/cap/ (2008)

Overview

I. Text search / Information retrieval
II. Sequence & genomics databases
III. Protein family databases
IV. Databases of protein functions
V. Databases of protein structures
VI. Proteomics databases

Database Contents, Search and Retrieval

Lab session

Entrez Text Searches

(http://www.ncbi.nlm.nih.gov/Entrez/)

Lab

Integrated one-stop search

PubMed Literature Database (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?CMD=Search&DB=PubMed)

Lab

Literature mining

PMID:14640721
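
PubMed searches can also be scripted. The sketch below assumes NCBI's current E-utilities ESearch service (which postdates the query.fcgi URL on the slide); the helper name pubmed_search_url is hypothetical, and the code only builds the request URL without fetching anything.

```python
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint (current service, not the 2008 query.fcgi URL).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def pubmed_search_url(term: str, retmax: int = 20) -> str:
    """Build an ESearch URL that returns PubMed IDs (PMIDs) matching `term`."""
    params = {"db": "pubmed", "term": term, "retmax": retmax}
    return f"{ESEARCH}?{urlencode(params)}"


# Look up the cited record by PMID (using the [pmid] field tag),
# then run a free-text Boolean query.
print(pubmed_search_url("14640721[pmid]"))
print(pubmed_search_url("alpha crystallin AND phosphorylation"))
```

urlencode takes care of escaping, so field tags and spaces in the query are encoded correctly.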

iProLINK: Protein Literature Mining Resource

http://pir.georgetown.edu/iprolink/

RLIMS-P: Text mining for protein phosphorylation

BioThesaurus: Gene/protein name thesaurus (synonyms, ambiguous names…)

Lab

BioThesaurus: Gene/protein name searches - synonyms, ambiguous names…

http://pir.georgetown.edu/iprolink/biothesaurus

Synonyms: CRYAA; crystallin, alpha A; CRYA1; HSPB4; …

Lab
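
A name thesaurus is essentially an inverted index from names to proteins: a synonym resolves to one protein, while an ambiguous name resolves to several. This is a minimal sketch of that idea, not BioThesaurus's actual implementation. The CRYAA synonyms come from the slide; the identifiers PROT_1 to PROT_3 and the second and third records are hypothetical.

```python
from collections import defaultdict


def build_name_index(records):
    """Invert {protein_id: [names]} into {lower-cased name: {protein_ids}}."""
    index = defaultdict(set)
    for protein_id, names in records.items():
        for name in names:
            # Case-insensitive lookup only; real thesauri normalize much more.
            index[name.lower()].add(protein_id)
    return index


def lookup(index, name):
    """Return the sorted protein IDs that carry `name`; more than one means ambiguous."""
    return sorted(index.get(name.lower(), set()))


# Hypothetical records (placeholder IDs; only the CRYAA synonyms are from the slide).
records = {
    "PROT_1": ["CRYAA", "crystallin, alpha A", "CRYA1", "HSPB4"],
    "PROT_2": ["ABC1", "hypothetical protein two"],
    "PROT_3": ["ABC1", "hypothetical protein three"],
}
index = build_name_index(records)
print(lookup(index, "hspb4"))  # a synonym resolves to a single protein
print(lookup(index, "ABC1"))   # an ambiguous name maps to two proteins
```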

RLIMS-P: Text mining for protein phosphorylation

http://pir.georgetown.edu/iprolink/rlimsp/

Lab
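
To show what "information extraction" means here, the sketch below pulls residue types and positions out of phosphorylation sentences using one deliberately simple, hypothetical regular expression. RLIMS-P itself uses a much richer rule-based approach, and this pattern will miss many phrasings.

```python
import re

# Hypothetical pattern: "phosphorylat..." + at/of/on + Ser/Thr/Tyr + position.
SITE = re.compile(
    r"phosphorylat\w*\s+(?:at|of|on)\s+(Ser|Thr|Tyr)[-\s]?(\d+)",
    re.IGNORECASE,
)


def extract_sites(sentence):
    """Return (residue, position) pairs mentioned as phosphorylation sites."""
    return [(res.capitalize(), int(pos)) for res, pos in SITE.findall(sentence)]


text = "p53 is phosphorylated at Ser-15, and Akt requires phosphorylation of Thr 308."
print(extract_sites(text))
```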

PIR Text Search (I)

(http://pir.georgetown.edu/pirwww/search/textsearch.html)

Google-type search vs. Boolean searches: AND, OR, NOT

Lab
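
Boolean operators behave like set operations on the entries each term retrieves: AND is intersection, OR is union, and NOT is difference. The result sets below are hypothetical entry IDs, used only to illustrate that logic.

```python
# Hypothetical entry IDs returned by three single-term searches.
crystallin = {"E1", "E2", "E3", "E4"}
enzyme = {"E3", "E4", "E5"}
rat = {"E2", "E4"}

print(sorted(crystallin & enzyme))          # crystallin AND enzyme
print(sorted(crystallin | enzyme))          # crystallin OR enzyme
print(sorted(crystallin - rat))             # crystallin NOT rat
print(sorted((crystallin & enzyme) - rat))  # (crystallin AND enzyme) NOT rat
```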

PIR Text Search (II)

Search for synonyms

Lab

Search: which alpha crystallin A chains are classified in protein families? (Field values: null = absent; not null = present.)

PIR Text Search (III)

Search: which crystallins are enzymes, and which families do they belong to?

Can you find which crystallins have a determined 3D structure?

Lab

Argininosuccinate lyase (EC 4.3.2.1)

UniProt Text Search (http://www.uniprot.org/)

Find proteins that are related to diabetes and have a determined 3D structure.

Lab

Search continues…

Lab

II. Sequence & Genomics Databases

• NCBI Resources
  – GenBank: An annotated collection of all publicly available nucleotide and protein sequences.
  – RefSeq: NCBI non-redundant set of reference sequences, including genomic DNA, transcript (RNA), and protein products.
  – Entrez Gene: Gene-centered information at NCBI.
  – UniGene: Unified clusters of ESTs and full-length mRNA sequences.
  – OMIM: Online Mendelian Inheritance in Man, a catalog of human genetic and genomic disorders.
• UniProt Consortium Database: Universal protein resource, a central repository of protein sequence and function.
• Model Organism Genome Databases: MGD, RGD, SGD, FlyBase…
• GeneCards: Integrated database of human genes, maps, proteins and diseases.
• SNP Consortium Database (dbSNP); International HapMap Project

Genes associated with human diseases (http://www.oxfordjournals.org/nar/database/cap/)

UniProt Consortium Databases

(http://www.uniprot.org) Universal Protein Resource

Since October 2002

New!

UUW6.6 million

Since July 2008

UniProt Report (I)

Sections of the record

http://www.uniprot.org/uniprot/P02493

Lab

Entry View: Sequence & Annotation
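
When an entry is downloaded in FASTA format, the header line carries the database, accession, entry name, protein name, and key=value fields such as organism (OS), taxon ID (OX), gene name (GN), protein existence (PE), and sequence version (SV). A minimal parser sketch follows; the example header is illustrative for P02493, and its PE and SV values are assumptions that may not match the current entry.

```python
import re

FIELD_KEYS = r"(?:OS|OX|GN|PE|SV)"
# A field value runs until the next " KEY=" or the end of the line.
FIELD = re.compile(r"(OS|OX|GN|PE|SV)=(.*?)(?=\s" + FIELD_KEYS + r"=|$)")


def parse_uniprot_header(header):
    """Split a UniProtKB FASTA header into its components."""
    ident, _, description = header.lstrip(">").partition(" ")
    db, accession, entry_name = ident.split("|")
    # The protein name is the text before the first KEY= field.
    name = re.split(r"\s" + FIELD_KEYS + "=", description, maxsplit=1)[0]
    fields = dict(FIELD.findall(description))
    return {"db": db, "accession": accession, "entry_name": entry_name,
            "protein_name": name, **fields}


# Illustrative header (PE/SV values are assumptions).
header = ">sp|P02493|CRYAA_RABIT Alpha-crystallin A chain OS=Oryctolagus cuniculus OX=9986 GN=CRYAA PE=1 SV=2"
print(parse_uniprot_header(header))
```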

UniProt Report (II) – sequence and features

Lab

UniProt Report (III) – UniRef90

http://www.uniprot.org/uniref/?query=member%3aP02493+identity:0.9
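
UniRef90 groups sequences that share at least 90% identity, which is what the identity:0.9 part of the query refers to. The sketch below only illustrates that threshold on two pre-aligned sequences. It is not the actual UniRef clustering procedure, and the sequences and the helper name passes_identity_threshold are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Fraction of identical residues over aligned columns with no gap ('-')."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    return sum(a == b for a, b in pairs) / len(pairs)


def passes_identity_threshold(aligned_a, aligned_b, threshold=0.9):
    """Simplified check: do two aligned sequences share >= 90% identity?"""
    return percent_identity(aligned_a, aligned_b) >= threshold


print(percent_identity("ACDEFGHIKL", "ACDEFGHIKV"))           # 9 of 10 columns match
print(passes_identity_threshold("ACDEFGHIKL", "ACDWFGHYKV"))  # only 7 of 10 match
```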

OMIM: Online Mendelian Inheritance in Man

(http://www.ncbi.nlm.nih.gov/entrez/dispomim.cgi?id=123580)

Juvenile cataract of Down syndrome

Autosomal recessive congenital progressive cataract

III. Protein Family Databases

• Whole Proteins
  – PIRSF: Non-overlapping Classification of Full-Length Proteins Based on Evolutionary Relationships
  – COG (Clusters of Orthologous Groups) of Complete Genomes
  – PANTHER: Proteins Classified into Families/Subfamilies of Shared Function
  – ProtoNet: Automatic Hierarchical Classification of Proteins
• Protein Domains
  – Pfam: Alignments and HMM Models of Protein Domains
  – SMART: Protein Domain Identification and Annotation
  – CDD: Conserved Domain Database
• Protein Motifs
  – PROSITE: Protein Patterns and Profiles
  – BLOCKS: Protein Sequence Motifs and Alignments
  – PRINTS: Compendium of Protein Fingerprints (groups of conserved motifs)
• Integrated Family Databases
  – InterPro: Integrates Pfam, PRINTS, PROSITE, ProDom, SMART, PIRSF, SUPERFAMILY…

Protein Clustering

COGs: (http://www.ncbi.nlm.nih.gov/COG/)

Initial version

New version: Includes Eukaryotic Clusters - KOGs
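
COGs are built from genome-specific best hits between proteins of different genomes. The sketch below shows only the simplest ingredient of that idea: reciprocal best hits between two genomes. The real COG procedure joins best hits across three or more genomes, so treat this as an illustration. The protein names and scores are hypothetical.

```python
def best_hits(scores):
    """For each query, pick the subject with the highest score.

    `scores` maps (query, subject) -> similarity score between proteins
    from two different genomes.
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return pairs (a, b) where a's best hit is b and b's best hit is a."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical similarity scores between genome A (a1, a2) and genome B (b1, b2).
a_vs_b = {("a1", "b1"): 250, ("a1", "b2"): 90, ("a2", "b1"): 120, ("a2", "b2"): 110}
b_vs_a = {("b1", "a1"): 245, ("b1", "a2"): 118, ("b2", "a2"): 108, ("b2", "a1"): 88}
print(reciprocal_best_hits(a_vs_b, b_vs_a))  # a2's best hit (b1) prefers a1
```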

PIRSF: Full Length Classification

iProClass Family Report

(http://pir.georgetown.edu/cgi-bin/ipcSF?id=SF002280)

Lab

Pfam Domain (http://www.sanger.ac.uk/cgi-bin/Pfam/getacc?PF00525)

Protein Motifs: PROSITE – A database of protein families and domains. It consists of biologically significant sites, patterns and profiles.

(http://us.expasy.org/prosite/)
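
PROSITE patterns use a compact syntax: elements are separated by "-", x means any residue, [..] lists allowed residues, {..} lists forbidden ones, (n) or (n,m) gives a repeat count, and < or > anchors the pattern to the N or C terminus. A minimal translator to Python regular expressions is sketched below. It is a simplification that does not handle ">" inside brackets. The example pattern is PROSITE's N-glycosylation site (PS00001).

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE pattern such as 'N-{P}-[ST]-{P}.' into a regex."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):          # N-terminal anchor
            prefix, element = "^", element[1:]
        if element.endswith(">"):            # C-terminal anchor
            suffix, element = "$", element[:-1]
        # Separate an optional repeat count such as (3) or (2,4).
        match = re.fullmatch(r"(.+?)(?:\((\d+(?:,\d+)?)\))?", element)
        core, count = match.group(1), match.group(2)
        if core == "x":
            core = "."
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"   # forbidden residues
        # Single residues and [..] sets are already valid regex syntax.
        if count:
            core += "{" + count + "}"
        parts.append(prefix + core + suffix)
    return "".join(parts)


glyco = prosite_to_regex("N-{P}-[ST]-{P}.")
print(glyco)                                   # N[^P][ST][^P]
print(re.search(glyco, "AANVTAA").group())     # NVTA
```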

Integrated Family Classification

InterPro: An integrated resource unifying PROSITE, PRINTS, ProDom, Pfam, SMART, TIGRFAMs and PIRSF (http://www.ebi.ac.uk/interpro/search.html)

Mapping of families

IV. Databases of Protein Functions

• Metabolic Pathways, Enzymes, and Compounds
  – Enzyme Classification: Classification and Nomenclature of Enzyme-Catalysed Reactions (EC-IUBMB)
  – KEGG (Kyoto Encyclopedia of Genes and Genomes): Metabolic Pathways
  – LIGAND (at KEGG): Chemical Compounds, Reactions and Enzymes
  – EcoCyc: Encyclopedia of E. coli Genes and Metabolism
  – MetaCyc: Metabolic Encyclopedia (Metabolic Pathways)
  – BRENDA: Enzyme Database
  – UM-BBD: Microbial Biocatalytic Reactions and Biodegradation Pathways
• Inter-Molecular Interactions and Regulatory Pathways
  – IntAct: Protein interaction data from literature and user submissions
  – BIND: Descriptions of interactions, molecular complexes and pathways
  – DIP: Catalog of experimentally determined interactions between proteins
  – Reactome: A curated knowledgebase of biological pathways
  – BioCarta: Biological pathways of human and mouse
  – GO: Gene Ontology Consortium Database
• Pathway Resources: Pathguide
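
EC numbers are hierarchical: the four fields give the class, subclass, sub-subclass, and serial number of the enzyme. The sketch below expands an EC number into its levels and names the top-level class. The helper names are hypothetical. Class 7 (Translocases) was added in 2018, after this tutorial was written.

```python
# Top-level EC classes (IUBMB); class 7 was added in 2018.
EC_CLASSES = {
    1: "Oxidoreductases",
    2: "Transferases",
    3: "Hydrolases",
    4: "Lyases",
    5: "Isomerases",
    6: "Ligases",
    7: "Translocases",
}


def ec_levels(ec_number):
    """Return the hierarchy prefixes of an EC number, from class to enzyme."""
    fields = ec_number.split(".")
    if len(fields) != 4:
        raise ValueError(f"expected four fields, got {ec_number!r}")
    return [".".join(fields[: i + 1]) for i in range(4)]


def ec_class(ec_number):
    """Name of the top-level class, e.g. 'Lyases' for 4.x.x.x."""
    return EC_CLASSES[int(ec_number.split(".")[0])]


# Argininosuccinate lyase, used as an example in this tutorial.
print(ec_levels("4.3.2.1"))
print(ec_class("4.3.2.1"))
```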

Biological Pathway Resource Collection (http://www.pathguide.org/)

• Protein-protein interactions
• Metabolic pathways
• Signaling pathways
• Pathway diagrams
• Transcription factors / gene regulatory networks
• Protein-compound interactions
• Genetic interaction networks

Pathway Commons (http://www.pathwaycommons.org/pc/home.do)

Search across multiple pathway databases; common format for global analysis

KEGG Metabolic & Regulatory Pathways

(http://www.genome.ad.jp/dbget-bin/show_pathway?hsa00220+4.3.2.1)

KEGG is a suite of databases and associated software, integrating our current knowledge on molecular interaction networks, the information of genes and proteins, and of chemical compounds and reactions. (http://www.genome.ad.jp/kegg/kegg2.html)

Lab
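
The KEGG link above packs a pathway identifier (an organism code such as hsa followed by a five-digit map number) and the objects to highlight (here EC 4.3.2.1) into one query string separated by "+". The sketch below splits such a query. The parser name is hypothetical, and the identifier format it assumes (2 to 4 lowercase letters plus five digits) is based on the identifiers visible on the slide, not on a KEGG specification.

```python
import re

# Assumed format: 2-4 letter organism/reference code + five-digit map number.
PATHWAY_ID = re.compile(r"([a-z]{2,4})(\d{5})")


def parse_show_pathway_query(query):
    """Split a query such as 'hsa00220+4.3.2.1'.

    Returns (organism code, map number, list of highlighted objects).
    """
    pathway, *highlighted = query.split("+")
    match = PATHWAY_ID.fullmatch(pathway)
    if match is None:
        raise ValueError(f"not a KEGG pathway ID: {pathway!r}")
    organism, map_number = match.groups()
    return organism, map_number, highlighted


print(parse_show_pathway_query("hsa00220+4.3.2.1"))
```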

BioCyc: EcoCyc/MetaCyc Metabolic Pathways

The BioCyc Knowledge Library is a collection of Pathway/Genome Databases (http://biocyc.org/)

BioCarta Cellular Pathways (http://www.biocarta.com/index.asp)

Reactome (http://www.reactome.org/)

• Collaboration of CSHL, EBI and the GO Consortium
• Curated resource of core pathways and reactions in human biology
• Authored by biological researchers who are experts in their fields
• Cross-referenced with NCBI, Ensembl, UniProt, HapMap, KEGG…
• Inferred orthologous events in 22 non-human species (mouse, rat…)

Transforming Growth Factor (TGF) beta signaling [Homo sapiens]

Event -> REACT_6879.1: Activated type I receptor phosphorylates R-SMAD directly [Homo sapiens]
Object -> REACT_7364.1: Phospho-R-SMAD [cytosol]
Event -> REACT_6760.1: Phospho-R-SMAD forms a complex with CO-SMAD [Homo sapiens]
Object -> REACT_7344.1: Phospho-R-SMAD:CO-SMAD complex [cytosol]
Event -> REACT_6726.1: The phospho-R-SMAD:CO-SMAD transfers to the nucleus
Object -> REACT_7382.2: Phospho-R-SMAD:CO-SMAD complex [nucleoplasm]
……

(http://reactome.org/cgi-bin/eventbrowser?DB=gk_current&FOCUS_SPECIES=Homo%20sapiens&ID=170834&)

Reactome: events and objects (including modified forms and complexes)
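
In this model, each event consumes and produces objects, and a pathway is a chain of events linked by shared objects. The sketch below encodes the TGF-beta events and objects listed above and walks the chain forward. The Event class is a simplified, hypothetical data model, not Reactome's schema. The first event's inputs and CO-SMAD's identifier are not shown on the slide, so they are omitted.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    """Simplified event: consumes some objects (inputs), produces others (outputs)."""
    stable_id: str
    name: str
    inputs: list = field(default_factory=list)
    outputs: list = field(default_factory=list)


objects = {
    "REACT_7364.1": "Phospho-R-SMAD [cytosol]",
    "REACT_7344.1": "Phospho-R-SMAD:CO-SMAD complex [cytosol]",
    "REACT_7382.2": "Phospho-R-SMAD:CO-SMAD complex [nucleoplasm]",
}
events = [
    Event("REACT_6879.1", "Activated type I receptor phosphorylates R-SMAD directly",
          outputs=["REACT_7364.1"]),
    Event("REACT_6760.1", "Phospho-R-SMAD forms a complex with CO-SMAD",
          inputs=["REACT_7364.1"], outputs=["REACT_7344.1"]),
    Event("REACT_6726.1", "The phospho-R-SMAD:CO-SMAD transfers to the nucleus",
          inputs=["REACT_7344.1"], outputs=["REACT_7382.2"]),
]


def next_events(event, all_events):
    """Events that consume at least one object produced by `event`."""
    produced = set(event.outputs)
    return [e for e in all_events if produced & set(e.inputs)]


def follow_chain(start, all_events):
    """Walk forward from `start` along the first successor, stopping at a dead end or cycle."""
    chain = [start.stable_id]
    current = start
    while True:
        successors = next_events(current, all_events)
        if not successors or successors[0].stable_id in chain:
            return chain
        current = successors[0]
        chain.append(current.stable_id)


print(follow_chain(events[0], events))
print(objects[events[-1].outputs[0]])
```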

Protein-Protein Interaction Database: IntAct (http://www.ebi.ac.uk/intact/)

Gene Ontology (GO)

- Molecular Function - Biological Process - Cellular Component

(http://www.geneontology.org/)
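
GO terms form a directed acyclic graph under each of the three roots, so a term can have several parents. Annotating a protein with a term implies all of that term's ancestors. The sketch below computes ancestors on a toy fragment. Only GO:0005575 (cellular_component) is a real GO identifier; the "X:" terms are hypothetical placeholders, and their relations are simplified and not taken from GO.

```python
# Toy GO-like DAG: term -> parent terms (relation types collapsed into one).
PARENTS = {
    "GO:0005575": [],                                   # cellular_component (root)
    "X:organelle": ["GO:0005575"],
    "X:membrane": ["GO:0005575"],
    "X:nucleus": ["X:organelle"],
    "X:nuclear_envelope": ["X:nucleus", "X:membrane"],  # two parents: a DAG, not a tree
}


def ancestors(term, parents=PARENTS):
    """All terms reachable by following parent links (excluding `term` itself)."""
    seen = set()
    stack = list(parents.get(term, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def propagate(annotations, parents=PARENTS):
    """Expand a set of direct annotations with every implied ancestor term."""
    result = set(annotations)
    for term in annotations:
        result |= ancestors(term, parents)
    return result


print(sorted(ancestors("X:nuclear_envelope")))
```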

V. Databases of Protein Structures

• Protein Structure
  – PDB: Structures Determined by X-ray Crystallography and NMR
  – PDBsum: Summaries and analyses of PDB structures
  – MMDB: NCBI's database of 3D structures, part of NCBI Entrez
  – SWISS-MODEL Repository: Database of annotated protein 3D models
  – ModBase: Annotated comparative protein structure models
• Structure Classification
  – CATH: Hierarchical Classification of Protein Domain Structures
  – SCOP: Familial and Structural Protein Relationships
  – FSSP: Protein Fold Classification Based on Structure-Structure Alignment

PDB: Experimental 3D Structure Repository

(http://www.rcsb.org/pdb/)

Rat gamma-crystallin (chains A, B). Can you do a text search at PIR to find this (CRGE_RAT)?

Lab
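
PDB coordinate files use fixed-width columns. In an ATOM record, columns 13-16 hold the atom name, 18-20 the residue name, 22 the chain ID, 23-26 the residue number, and 31-54 the x, y, z coordinates. The sketch below writes and reads back minimal ATOM lines using those positions and lists the chains, such as the A and B chains of the crystallin above. The helper names and coordinates are hypothetical, and only columns 1-54 are handled.

```python
def format_atom(serial, name, res_name, chain, res_seq, x, y, z):
    """Format a minimal ATOM record (columns 1-54 only)."""
    # Names shorter than four characters conventionally start at column 14.
    padded = f" {name:<3}" if len(name) < 4 else name
    return (f"{'ATOM':<6}{serial:>5} {padded} {res_name:>3} {chain:1}{res_seq:>4}"
            f"    {x:>8.3f}{y:>8.3f}{z:>8.3f}")


def parse_atom(line):
    """Read fields back out of an ATOM/HETATM line by column slicing."""
    return {
        "record": line[0:6].strip(),
        "serial": int(line[6:11]),
        "name": line[12:16].strip(),
        "res_name": line[17:20].strip(),
        "chain": line[21],
        "res_seq": int(line[22:26]),
        "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
    }


def chains(lines):
    """Distinct chain IDs among ATOM records, in order of first appearance."""
    seen = []
    for line in lines:
        if line.startswith("ATOM") and line[21] not in seen:
            seen.append(line[21])
    return seen


# Hypothetical coordinates for three atoms in two chains.
lines = [
    format_atom(1, "N", "MET", "A", 1, 38.198, 19.582, 28.998),
    format_atom(2, "CA", "MET", "A", 1, 37.5, 20.1, 27.8),
    format_atom(3, "N", "GLY", "B", 1, 1.0, -2.5, 3.25),
]
print(chains(lines))
print(parse_atom(lines[1]))
```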

PDBsum: Pictorial database providing summaries and analyses of PDB entries

Search 3-D structure summary

2-D structure summary

(http://www.ebi.ac.uk/thornton-srv/databases/pdbsum/)

Protein Structural Classification (1)

CATH: Hierarchical domain classification of protein structures (http://www.cathdb.info/)
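
A CATH classification code has four dot-separated levels: Class, Architecture, Topology, and Homologous superfamily. Each prefix of the code identifies one node in the hierarchy. The sketch below expands a code into its levels. The example code only illustrates the format, and the helper name is hypothetical.

```python
# The four main CATH classes.
CATH_CLASSES = {
    "1": "Mainly Alpha",
    "2": "Mainly Beta",
    "3": "Alpha Beta",
    "4": "Few Secondary Structures",
}
LEVELS = ("Class", "Architecture", "Topology", "Homologous superfamily")


def cath_levels(code):
    """Map each CATH level to the prefix of `code` that identifies it."""
    fields = code.split(".")
    if not 1 <= len(fields) <= 4:
        raise ValueError(f"unexpected CATH code: {code!r}")
    return {level: ".".join(fields[: i + 1])
            for i, level in enumerate(LEVELS[: len(fields)])}


levels = cath_levels("2.60.40.10")
print(levels)
print(CATH_CLASSES[levels["Class"]])
```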

Protein Structural Classification (2)

(http://scop.mrc-lmb.cam.ac.uk/scop/data/scop.b.html)

SCOP: comprehensive description of structural and evolutionary relationships between all proteins whose structure is known.

SWISS-MODEL Repository

A database of annotated three-dimensional comparative protein structure models (http://swissmodel.expasy.org/repository/smr.php?sptr_ac=CRBA1_MOUSE&job=2)

http://swissmodel.expasy.org/repository/
http://swissmodel.expasy.org/

VI. Proteomic Resources

• GELBANK (http://gelbank.anl.gov): 2D-gel patterns of species with completed genomes.
• SWISS-2DPAGE (http://www.expasy.org/ch2d/): index of 2D-gels
• PEP (http://cubic.bioc.columbia.edu/pep/): Predictions for Entire Proteomes: summarized analyses of protein sequences
• Integr8 (http://www.ebi.ac.uk/integr8/): A browser for information relating to completed genomes and proteomes, based on data contained in Genome Reviews and the UniProt proteome sets
• PRIDE (http://www.ebi.ac.uk/pride/): PRoteomics IDEntifications database; Expression Profiling databases
• GPMdb (http://gpmdb.thegpm.org/): Mass spec proteomics databases
• PeptideAtlas (http://www.peptideatlas.org/): compendium of peptides identified in a large set of tandem mass spectrometry proteomic experiments
• HUPO (http://www.hupo.org/): Human Proteome Organization, which fosters international proteomics initiatives.

2D-Gel Image Databases

(http://us.expasy.org/swiss-2dpage/ac=P02489) Part of WORLD-2DPAGE: index to 2-D PAGE databases and services

(http://us.expasy.org/ch2d/)

Lab
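
In a 2-D gel, a protein's position depends on its isoelectric point (first dimension) and molecular weight (second dimension). The sketch below estimates the second coordinate: the average molecular weight of an unmodified chain, computed as the sum of average residue masses plus one water. The mass values are approximate, the helper name is hypothetical, and pI estimation, which requires a choice of pKa set, is not covered.

```python
# Approximate average residue masses (Da), as used by common tools.
RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886, "C": 103.1388,
    "E": 129.1155, "Q": 128.1307, "G": 57.0519, "H": 137.1411, "I": 113.1594,
    "L": 113.1594, "K": 128.1741, "M": 131.1926, "F": 147.1766, "P": 97.1167,
    "S": 87.0782, "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01524  # added once for the free N- and C-termini


def average_mass(sequence):
    """Approximate average molecular weight (Da) of an unmodified protein chain."""
    sequence = sequence.upper()
    unknown = set(sequence) - set(RESIDUE_MASS)
    if unknown:
        raise ValueError(f"unsupported residues: {sorted(unknown)}")
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


print(round(average_mass("GG"), 3))  # glycylglycine, about 132.12 Da
```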

GPMdb: MS Data Search (http://gpmdb.thegpm.org/)

Craig, et al., J Proteome Res. 2004, 3:1234-42.
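
MS search databases such as GPMdb match observed spectra against peptides predicted from protein sequences. The usual first step is an in silico trypsin digest. The sketch below applies the classic rule (cleave after K or R, but not before P) and optionally allows missed cleavages. It is an illustration, not GPMdb's code, and the test sequence is made up.

```python
import re


def tryptic_peptides(sequence, missed_cleavages=0):
    """In silico trypsin digest: cut after K/R unless the next residue is P.

    With missed_cleavages > 0, also return peptides spanning up to that
    many uncut sites.
    """
    # Zero-width split after K/R when not followed by P (Python 3.7+).
    pieces = [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]
    peptides = []
    for start in range(len(pieces)):
        for extra in range(missed_cleavages + 1):
            end = start + extra + 1
            if end > len(pieces):
                break
            peptides.append("".join(pieces[start:end]))
    return peptides


print(tryptic_peptides("MKRPAKGLR"))
print(tryptic_peptides("MKRPAKGLR", missed_cleavages=1))
```

Note that the R in "RP" is not cleaved, so "RPAK" stays in one piece.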

PRIDE: centralized, standards compliant, public data repository for proteomics data

http://www.ebi.ac.uk/pride/

HUPO Plasma Proteome Project

Protein Examples

• Rabbit alpha crystallin A (UniProtKB: CRYAA_RABIT/P02493)

• Delta crystallin II (Argininosuccinate lyase) (UniProtKB: ARLY2_ANAPL/P24058)

• Any additional proteins of your interest for search and retrieval

Lab:

I. Text search / Information retrieval
1. Literature search and text mining
   – Finding synonyms (BioThesaurus)
   – Information extraction (e.g., protein phosphorylation sites)
2. Find the sequence for the rabbit alpha crystallin A chain
3. Find all alpha crystallin A chains classified in protein families
4. Search for crystallins that have enzyme activity
5. Find crystallins that have determined 3D structures

II. Database contents (reports)
1. Sequence & genomics databases (UniProt)
2. Protein family databases (PIRSF)
3. Database of protein functions (KEGG)
4. Databases of protein structures (PDB)
5. Proteomics databases (Swiss-2D)