The CMBI: Bioinformatics Content Bioinformatics Bioinformatics@CMBI Bioinformatics tools & databases Hanka Venselaar CMBI UMC Radboud February 2009 [email protected]


The CMBI: Bioinformatics

Content

• Bioinformatics
• Bioinformatics@CMBI
• Bioinformatics tools & databases

Hanka Venselaar
CMBI, UMC Radboud
February 2009

[email protected]


2/37 ©CMBI 2009

What is bioinformatics?

• Bioinformatics is the use of computers in solving information problems in the life sciences

• You are "doing bioinformatics" when you use computers to store, retrieve, analyze or predict the sequence, function and/or structure of biomolecules.

Bioinformatics


Human genome, great expectations

Data ≠ Knowledge, insight !!!


Why do we need Bioinformatics?

Flood of biological data:

– DNA sequences (genomes)
– protein sequences and structures
– gene expression profiles (transcriptomics)
– cellular protein profiles (proteomics)
– cellular metabolite profiles (metabolomics)

We want to:

– collect and store the data
– integrate, analyze, compare and mine the data
– predict genes, protein function and protein structure
– predict physiology (models, mechanisms, pathways)
– understand how a whole cell works


A large fraction of the human genes has an unknown function

(Science, 2001)


What is protein function?

Homology

Genomic context


How can we predict function of proteins?

“New, unknown protein”

Compare with database of proteins (BLAST)

“Similar sequence with known function, e.g. protein kinase”

Extrapolate the function

The importance of sequence similarity and sequence alignment

Similar sequences have:

– A similar evolutionary origin
– A similar function
– A similar 3D structure
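To make "sequence alignment" concrete, here is a minimal sketch of a global alignment score in the style of Needleman-Wunsch. It is not BLAST, which uses a faster local, heuristic search; the function name and the scoring values (match, mismatch, gap) are illustrative assumptions, not values taken from this lecture.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Score of the best global alignment of sequences a and b.

    The scoring values are illustrative defaults, not a published matrix.
    """
    # previous[j] = best score for aligning the processed prefix of a with b[:j]
    previous = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Aligning a[:i] with an empty prefix of b costs i gaps
        current = [i * gap]
        for j in range(1, len(b) + 1):
            pair_score = match if a[i - 1] == b[j - 1] else mismatch
            current.append(max(
                previous[j - 1] + pair_score,  # align a[i-1] with b[j-1]
                previous[j] + gap,             # gap in b
                current[j - 1] + gap,          # gap in a
            ))
        previous = current
    return previous[-1]


if __name__ == "__main__":
    print(global_alignment_score("ACGT", "ACGT"))  # identical sequences
    print(global_alignment_score("ACGT", "AGT"))   # one residue missing
```

A higher score means the two sequences can be lined up with more matching positions and fewer gaps, which is the kind of evidence used to extrapolate function from a known protein to an unknown one.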


CMBI - Centre for Molecular and Biomolecular Informatics

• Dutch national centre for computational molecular sciences research

• Research groups
– Comparative Genomics (Huynen)
– Bacterial Genomics (Siezen)
– Computational Drug Design (De Vlieg)
– Bioinformatics of Macromolecular Structures (Vriend)

• Training & Education
– MSc, PhD and PostDoc programmes
– International workshops
– Hotel Bioinformatica
– High school courses

• Computational facilities, databases, and software packages via (inter-)national service platforms (NBIC, EBI, etc.)

• NBIC: National BioInformatics Centre

Bioinformatics @CMBI


Computational Drug Discovery (CDD) Group

• Head: Prof. Jacob de Vlieg

• Key goal: develop molecular modeling and computer-based simulation techniques for structure-based drug design, translational medicine and protein family based approaches to design and identify drug-like compounds

• Key research fields
– Structural bioinformatics for drug design
– Bioinformatics for genomics (microarray analysis, text mining, etc.)
– Translational medicine informatics

CDD: bridging academic research and applied genomics

– Academic research: new scientific approaches, training & education
– Applications: exciting real-life problems, ‘wet’ validation


Examples of CDD Projects

• Exploiting structural genomics information to incorporate protein flexibility in drug design

• Protein knowledge building through comparative genomics and data integration

• In silico studies on p63 as a new drug-target protein


International Computational Drug Discovery Course

• Course covers the entire research pipeline from genomics and proteomics in target discovery to Structure Based Drug Design and QSAR in drug optimization.

• Lectures and practicals

• 2 week course

• June/July 2009

• www.cmbi.ru.nl/ICDD2008


Bacterial Genomics Group

• Head: Prof. Roland Siezen

• Research interest: biological questions in the interest of the Dutch food industry

• How can we improve:
– fermentation
– safety
– health

• Micro-organisms studied: Gram-positive food bacteria
– lactic acid bacteria (Lactococcus, Lactobacillus)
– spoilage bacteria (Listeria, Clostridium, Bacillus cereus)

[Images: Listeria, Lactococcus]


Bacterial Genomics: from sequence to predicted function

Key research fields:

– Genome sequencing and interpretation
– Network reconstruction and analysis
– Systems biology, dynamic modelling

Raw sequence data: 2 to 5 million nucleotides

AAACACTTAGACAATCAATATAAAGATGAAGTGAACGCTCTTAAAGAGAAGTTGGAAAACTTGCAGGAACAAATCAAAGATCAAAAAAGGATAGAAGAACAAGAAAAACCACAAACACTTAGACAATCAATATAAAGATGAAGTGAACGCTCTTAAAGAGAAGTTGGAAAACTTGCAGGAACAAATCAAAGATCAAAAAAGGATAGAAGAACAAGAAAAACCACAAACACTTAGACAATCAATATAAAGATGAAGTGAACGCTCTTAAAGAGAAGTTGGAAAACTTGCAGGAACAAATCAAAGATCAAAAAAGGATAGAAGAACAAGAAAAACCACAAACACTTAGACAATCAATATAAAGATGAAGTGAACGCTCTTAAAGAGAAGTTGGAAAACTTGCAGGAACAAATCAAAGATCAAAAAAGGATAGAAGAACAAGAAAAACCACAAACACTTAGACAATCAATATAAAGATGAAGTGAACGCTCTTAAAGAGAAGTTGGAAAACTTGCAGGAA
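As a rough illustration of the first step from raw nucleotides towards predicted function, the sketch below translates DNA with the standard genetic code and lists simple open reading frames (ATG to stop) in the three forward frames. The function names and the `min_codons` threshold are assumptions for illustration; real gene prediction also scans the reverse strand and uses statistical models.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate DNA codon by codon; '*' is a stop, 'X' an unknown codon."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def open_reading_frames(dna, min_codons=2):
    """Return (frame, protein) pairs for ATG...stop stretches, forward strand only."""
    orfs = []
    for frame in range(3):
        protein = translate(dna[frame:])
        # Drop the last piece: it is not terminated by a stop codon
        for segment in protein.split("*")[:-1]:
            start = segment.find("M")
            if start != -1 and len(segment) - start >= min_codons:
                orfs.append((frame, segment[start:]))
    return orfs


if __name__ == "__main__":
    print(translate("ATGAAATAA"))
    print(open_reading_frames("ATGAAATAA"))
```

Each candidate protein found this way could then be compared against a protein database to suggest a function.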

A virtual cell: overview of predicted pathways


Bacterial Genomics: Example

Peter van Baarlen et al., Differential NF-κB pathways induction by Lactobacillus plantarum in the duodenum of healthy humans correlating with immune tolerance. PNAS, Feb 3, 2009.


Comparative Genomics Group

• Head: Prof. Martijn Huynen

• Research focus:
– How do the proteins encoded in genomes interact with each other to produce cells and phenotypes?
– Predicting such functional interactions between proteins as they exist, e.g. in metabolic pathways, signalling pathways or protein complexes

A genome is more than the sum of its genes -> use “genomic context” for function prediction

Types of genomic context:

– Gene fusion/fission
– Chromosomal location
– Gene order/neighbourhood
– Co-evolution
– Co-expression


Turning data into knowledge

Research topics:

• Develop computational genomics techniques that exploit the information in sequenced genomes and functional genomics data
• Make testable predictions about pathways and the functions of proteins therein
• Study the evolution of the eukaryotic cell and the origin and evolution of organelles such as mitochondria and peroxisomes

Education:

• Comparative Genomics Course, 3 EC, April 2009

Comparative genomics

Prediction of protein function, pathways


Frataxin Example

• Frataxin is a well-known disease gene (Friedreich's ataxia) whose function has remained elusive despite more than six years of intensive experimental research.

• Using computational genomics we have shown that frataxin has co-evolved with hscA and hscB and is likely involved in iron-sulfur cluster assembly in conjunction with the co-chaperone HscB/JAC1.
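One common way to detect co-evolution of genes is to compare their phylogenetic profiles, i.e. in which genomes each gene is present or absent. The sketch below shows the idea only; the gene names and the presence/absence values are invented for illustration and are not the frataxin data, and the similarity measure is an assumed simple choice.

```python
def profile_similarity(profile_a, profile_b):
    """Fraction of genomes in which two genes are both present or both absent."""
    if len(profile_a) != len(profile_b) or not profile_a:
        raise ValueError("profiles must be non-empty and cover the same genomes")
    agreements = sum(1 for x, y in zip(profile_a, profile_b) if x == y)
    return agreements / len(profile_a)


# Hypothetical presence (1) / absence (0) of three genes across six genomes
PROFILES = {
    "gene_x": [1, 1, 0, 1, 0, 1],
    "gene_y": [1, 1, 0, 1, 0, 1],  # same pattern as gene_x
    "gene_z": [0, 1, 1, 0, 1, 1],  # mostly different pattern
}

if __name__ == "__main__":
    for other in ("gene_y", "gene_z"):
        score = profile_similarity(PROFILES["gene_x"], PROFILES[other])
        print("gene_x vs", other, round(score, 2))
```

Genes that are gained and lost together across many genomes are candidates for working in the same pathway or complex, which is a prediction that can then be tested experimentally.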

Prediction Confirmation


Bioinformatics of macromolecular structures

•Head: Prof. Gert Vriend

•Research Focus: Understanding proteins (and their environment)

•Proteins are the core of life, they do all the work, and they give you feelings, contact with the outside world, etc.

•Proteins, therefore, are the most important molecules on earth.

•We want to understand life; why are we what we are, why do we do what we do, how come you can think what you think?


Bioinformatics of macromolecular structures

Research topics Vriend group

• Homology modeling technology and applications
• Application of bioinformatics in medical research (Hanka Venselaar)
• Structure validation and structure determination improvement
• Molecular class specific information systems (e.g. GPCRDB & NucleaRDB)
• Data mining
• WHAT IF molecular modelling and visualization software


Hearing loss

Unknown structure

MGTPWRKRKGIAGPGLPDLSCALVLQPRAQVGTMSPAIALAFLPLVVTLLVRYRHYFRLLVRTVLLRSLRDCLSGLRIEERAFSYVLTHALPGDPGHILTTLDHWSSRCEYLSHMGPVKGQILMRLVEEKAPACVLELGTYCGYSTLLIARALPPGGRLLTVERDPRTAAVAEKLIRLAGFDEHMVELIVGSSEDVIPCLRTQYQLSRADLVLLAHRPRCYLRDLQLLEAHALLPAGATVLADHVLFPGAPRFLQYAKSCGRYRCRLHHTGLPDFPAIKDGIAQLTYAGPG

DFNB63:

Homology Modeling

Homology modeling: prediction of 3D structure based upon a highly similar structure


Prediction of 3D structure based upon a highly similar structure

Unknown structure: NSDSECPLSHDG

Alignment of model and template sequence:

NSDSECPLSHDG   (unknown structure)
||   || | ||
NSYPGCPSSYDG   (known structure)

Copy backbone and conserved residues from the known structure

Add sidechains, Molecular Dynamics simulation on model

Model!
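The alignment step can be checked with a few lines of code. This sketch takes the two 12-residue sequences from the slide, marks identical positions and computes the percentage identity; the function names are illustrative, and the sequences are assumed to be already aligned without gaps.

```python
def match_line(seq_a, seq_b):
    """Return '|' under identical positions and ' ' elsewhere."""
    return "".join("|" if x == y else " " for x, y in zip(seq_a, seq_b))


def percent_identity(seq_a, seq_b):
    """Percentage of identical positions in two equally long aligned sequences."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(1 for x, y in zip(seq_a, seq_b) if x == y)
    return 100.0 * matches / len(seq_a)


MODEL = "NSDSECPLSHDG"     # sequence with unknown structure
TEMPLATE = "NSYPGCPSSYDG"  # sequence with known structure

if __name__ == "__main__":
    print(MODEL)
    print(match_line(MODEL, TEMPLATE))
    print(TEMPLATE)
    print(round(percent_identity(MODEL, TEMPLATE), 1), "% identity")
```

Seven of the twelve positions are identical here. At those conserved positions the residue can be copied from the template along with the backbone; the other side chains have to be built anew.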


Hearing loss

Structure!

MGTPWRKRKGIAGPGLPDLSCALVLQPRAQVGTMSPAIALAFLPLVVTLLVRYRHYFRLLVRTVLLRSLRDCLSGLRIEERAFSYVLTHALPGDPGHILTTLDHWSSRCEYLSHMGPVKGQILMRLVEEKAPACVLELGTYCGYSTLLIARALPPGGRLLTVERDPRTAAVAEKLIRLAGFDEHMVELIVGSSEDVIPCLRTQYQLSRADLVLLAHRPRCYLRDLQLLEAHALLPAGATVLADHVLFPGAPRFLQYAKSCGRYRCRLHHTGLPDFPAIKDGIAQLTYAGPG

DFNB63:


Mutations:

• Arginine 81 -> Glutamic acid
• Glutamic acid 110 -> Lysine

The salt bridge between arginine and glutamic acid is lost in both cases


Mutation:

• Tryptophan 105 -> Arginine

Hydrophobic contacts from the tryptophan are lost, and a hydrophilic, charged residue is introduced
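The reasoning about these mutations can be sketched with a very coarse charge model. The table below assigns approximate side-chain charges at neutral pH (histidine is treated as neutral, a simplification), and the helper names are assumptions for illustration; a real analysis inspects the 3D model rather than a lookup table.

```python
# Approximate side-chain charge at neutral pH; residues not listed count as 0
SIDE_CHAIN_CHARGE = {"R": 1, "K": 1, "D": -1, "E": -1}


def charge(residue):
    """Charge of a residue given by its one-letter code."""
    return SIDE_CHAIN_CHARGE.get(residue, 0)


def can_form_salt_bridge(residue_a, residue_b):
    """A salt bridge needs two oppositely charged side chains."""
    return charge(residue_a) * charge(residue_b) < 0


if __name__ == "__main__":
    # Wild type: arginine 81 pairs with glutamic acid 110
    print("R81 / E110:", can_form_salt_bridge("R", "E"))
    # Arginine 81 -> glutamic acid: two negative charges
    print("E81 / E110:", can_form_salt_bridge("E", "E"))
    # Glutamic acid 110 -> lysine: two positive charges
    print("R81 / K110:", can_form_salt_bridge("R", "K"))
    # Tryptophan 105 -> arginine: neutral residue replaced by a charged one
    print("W105 -> R105 charge:", charge("W"), "->", charge("R"))
```

In this simplified view both single mutations leave two like charges facing each other, and the tryptophan mutation places a positive charge where an uncharged residue used to be.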


The three mutated residues are all important for the correct positioning of Tyrosine 111

Tyrosine 111 is important for substrate binding

Ahmed et al., Mutations of LRTOMT, a fusion gene with alternative reading frames, cause nonsyndromic deafness in humans. Nat Genet. 2008 Nov;40(11):1335-40.

Interested? Contact Hanka Venselaar ([email protected])


Hotel Bioinformatica

Hotel functions

• Temporary housing, teaching and supervision of experimentalists for data analysis at the CMBI

• Centralization of UMC-wide bioinformaticians

• Shared (weekly) seminars of CMBI with ‘inhouse bioinformaticians’

• Collaboration/advice in acquiring grants with a Bioinformatics aspect

Interested? Contact Martijn Huynen ([email protected])


Bioinformatics data types

mRNA expression profiles

MS data

Large amount of data

Growing very very fast

Heterogeneous data types

Bioinformatics Tools & Databases


Biological Databases

• Information is the core of bioinformatics
• Literally thousands of databases exist that are relevant for biology, medicine, and/or chemistry

Content                          Database
protein sequences                SwissProt, UniProt, TrEMBL
nucleotide sequences             EMBL, GenBank, DDBJ
structures (protein, DNA, RNA)   Protein Data Bank (PDB)
genomes                          Ensembl, UCSC
mutations                        OMIM
patterns, motifs                 PROSITE
protein domains                  InterPro, SMART
pathways                         KEGG


Important records in SwissProt/UniProt (1)


Important records in SwissProt/UniProt (2)

Cross references

Direct hyperlinks to:
• EMBL
• PDB
• OMIM
• InterPro
• etc.

Features

• post-translational modifications
• signal peptides
• binding sites
• enzyme active sites
• domains
• disulfide bridges
• etc.
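Cross references and features can also be read programmatically from the text form of an entry. The sketch below assumes the UniProt flat-file layout, in which each line starts with a two-letter code (DR for database cross references, FT for features) followed by the data. The sample entry is invented and heavily shortened, and all identifiers in it are placeholders rather than real accessions.

```python
# Invented, shortened entry in flat-file style; identifiers are placeholders
SAMPLE_ENTRY = "\n".join([
    "ID   EXAMPLE_HUMAN           Reviewed;         148 AA.",
    "AC   P00000;",
    "DR   EMBL; X00000; CAA00000.1; -; mRNA.",
    "DR   PDB; 1ABC; X-ray; 1.50 A; A=19-148.",
    "DR   InterPro; IPR000000; Example_domain.",
    "FT   SIGNAL          1..18",
    "FT   DISULFID        24..146",
])


def parse_entry(text):
    """Collect cross references (DR lines) and feature keys (FT lines)."""
    cross_refs = {}
    features = []
    for line in text.splitlines():
        code, content = line[:2], line[5:].strip()
        if code == "DR":
            fields = [field.strip() for field in content.split(";")]
            if len(fields) >= 2:
                # First field is the database name, second the identifier
                cross_refs.setdefault(fields[0], []).append(fields[1])
        elif code == "FT" and not line[5:6].isspace():
            # Lines with a feature key in column 6; continuation lines are skipped
            parts = content.split()
            if len(parts) == 2:
                features.append((parts[0], parts[1]))
    return cross_refs, features


if __name__ == "__main__":
    refs, feats = parse_entry(SAMPLE_ENTRY)
    print(refs)
    print(feats)
```

For real work an established parser is a safer choice than hand-written line splitting, since full entries contain multi-line features and many more line types.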


Protein Databank & Structure Visualization

• PDB structures have a unique identifier, the PDB code: 4 characters (a digit followed by 3 letters or digits, e.g. 1CRN).

• Download PDB structures, give correct file extension: 1CRN.pdb
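A small helper can check a PDB code and build the file name with the correct extension. This is a sketch under stated assumptions: the pattern (one digit followed by three letters or digits) describes the classic four-character codes, and the RCSB download address used in `download_url` is an assumption about the current server layout, not something given in this lecture.

```python
import re

# Classic PDB code: one digit followed by three letters or digits
PDB_CODE = re.compile(r"[0-9][A-Za-z0-9]{3}")


def is_pdb_code(text):
    """True if text looks like a four-character PDB code such as 1CRN."""
    return PDB_CODE.fullmatch(text) is not None


def pdb_filename(code):
    """File name with the .pdb extension, e.g. 1CRN.pdb."""
    if not is_pdb_code(code):
        raise ValueError("not a valid PDB code: " + repr(code))
    return code.upper() + ".pdb"


def download_url(code):
    """Assumed RCSB download address for the structure file."""
    return "https://files.rcsb.org/download/" + pdb_filename(code)


if __name__ == "__main__":
    print(is_pdb_code("1CRN"))
    print(pdb_filename("1crn"))
    print(download_url("1crn"))
```

The saved file can then be opened in one of the viewers listed on this slide.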

• Structures from PDB can directly be visualized with:

1. Yasara (www.yasara.org)
2. SwissPDBViewer (http://spdbv.vital-it.ch/)
3. Protein Explorer (http://www.umass.edu/microbio/rasmol/)
4. Cn3D (http://www.ncbi.nlm.nih.gov/Structure/CN3D/cn3d.shtml)


OMIM Database

OMIM - Online Mendelian Inheritance in Man

• a large, searchable, current database of human genes, genetic traits, and hereditary disorders

• contains information on all known mendelian disorders and over 12,000 genes

• focuses on the relationship between phenotype and genotype


Browsing genomes

UCSC: http://genome.ucsc.edu/

Only eukaryotic genomes

NCBI

Ensembl: http://www.ensembl.org/


Sequence Retrieval with MRS (1)

Google = the best generic search and retrieval system

MRS = Maarten’s Retrieval System (http://mrs.cmbi.ru.nl)

MRS is the Google of the biological database world

Search engine (like Google):
– Input/Query = word(s)
– Output = entry/entries from database

Searching is very intuitive:
– Select database(s) of choice
– Formulate your query
– Hit “Search”
– The result is a “query set” or “hitlist”
– Analyze the results
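The query-to-hitlist idea can be illustrated with a toy keyword index. This is not how MRS is implemented; it is a minimal sketch in which the entry identifiers and descriptions are invented, and a query returns the entries that contain every query word.

```python
from collections import defaultdict

# Invented mini "database": entry identifier -> description text
ENTRIES = {
    "entry_1": "lysozyme c human",
    "entry_2": "lysozyme chicken egg white",
    "entry_3": "protein kinase human",
}


def build_index(entries):
    """Map each word to the set of entries whose text contains it."""
    index = defaultdict(set)
    for entry_id, text in entries.items():
        for word in text.lower().split():
            index[word].add(entry_id)
    return index


def search(index, query):
    """Return the sorted hitlist of entries that contain all query words."""
    words = query.lower().split()
    if not words:
        return []
    hits = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(hits)


if __name__ == "__main__":
    index = build_index(ENTRIES)
    print(search(index, "lysozyme human"))
    print(search(index, "human"))
```

Adding a word to the query can only shrink the hitlist, which is why it pays to think about the query before searching.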


Sequence Retrieval with MRS (2)

Formulate query. But think about your query first!!

Select database

MRS hitlist


BLAST and CLUSTAL with MRS

“Blast” brings you to the MRS page from which you can do Blast searches.

“Blast results” brings you to the page where MRS stores your Blast results of the current session.

“Clustal” brings you to the MRS page from which you can do Clustal sequence alignments.


Your Exercise Today

The practicum: FAMILIAL VISCERAL AMYLOIDOSIS

Today for PhD students

Friday (13:00) for MMD students

CMBI, Course room, ground floor NCMLS

You will study Lysozyme:

• Protein
• Gene
• Mutations causing familial visceral amyloidosis
• 3D structure

HAVE FUN!!


The Practicum

You can find the practicum at http://swift.cmbi.ru.nl/teach/lyso/


Work with MRS

Work with Yasara

Read the text carefully

User login = c(your PC number), e.g. c07

User password = t0psp0rt (with zeros)

The program Yasara is on your desktop