
Fatchiyah, M.Kes., Ph.D.

Email: fatchiya@yahoo.co.id

Bioinformatics and Computational Biology in Molecular Biology


Introduction

The Human Genome Project

Challenges of Molecular Biology

The changing role of the Biologist in the Age of Information

Bioinformatics software

Genomics

Impact on medicine


What genes cause the condition?

What is the normal function of the gene?

What mutations have been linked to diseases?

How does the mutation alter gene function?

What laboratories are performing DNA tests?

Are there gene therapies or clinical trials?

What names are used to refer to the genes and the diseases?

What other conditions are linked to these same genes?

2/28/2011 fatchiyah JB UB Bioinformatic

What is “bioinformatics”?

The use of computers to collect, analyze, and interpret biological information at the molecular level.

"The mathematical, statistical and computing methods that aim to solve biological problems using DNA and amino acid sequences and related information."

A set of software tools for molecular sequence analysis


NIH Working Definition

Bioinformatics: Research, development, or application of computational tools and approaches for expanding the use of biological, medical, behavioral or health data, including those to acquire, store, organize, archive, analyze, or visualize such data. http://www.bisti.nih.gov/CompuBioDef.pdf


Bioinformatics

An interdisciplinary approach:

Computer science, mathematics, and statistics.

Molecular biology, biochemistry, and medicine.


Scopes of bioinformatics

Genomics: primarily sequences (DNA and RNA); databanks and search algorithms; supports studies of molecular evolution (“Tree wars”).

Proteomics: sequences (protein) and structures; mass spectrometry and X-ray crystallography; databanks, knowledge bases, and visualization.

Functional genomics (transcriptomics): microarray data; databanks, analysis tools, and controlled terminologies.

Systems biology (metabolomics): metabolites and interacting systems (interactomics); graphs, visualization, modeling, and networks of entities.

DNA → RNA → Protein → Phenotype


What is the function of bioinformatics analysis?

1. Genome sequence: for the first time there is a blueprint of the activity of a cell.

2. Gene expression, in the form of cDNA arrays and proteomic studies: how these genes interact and interface with each other, and how they form networks.

3. At the structural level: the mechanisms by which these molecules work.

Major impact on diagnosis, treatment, drug discovery, regulation and metabolism, and biodegradation.


Analysis development: in silico, in vitro, and in vivo


[Figure: Structural Scales. An example amino-acid sequence is shown alongside the levels of organization: sequence, 3D structure, complexes, cell structures, cell system dynamics, and organism.]


Blueprint of the gene

Genome sequence: for the first time there is a blueprint of the activity of a cell.

Gene expression, in the form of cDNA arrays and proteomic studies: how these genes interact and interface with each other, and how they form networks.

At the structural level: the mechanisms by which these molecules work.

Major impact on diagnosis, treatment, drug discovery, regulation and metabolism, and biodegradation.


Challenges in Computational Biology

February 28, 2011, BBSI Summer School - Iowa State University

1. Obtain the genome of an organism.

2. Identify and annotate genes.

3. Find the sequences, three-dimensional structures, and functions of proteins.

4. Find sequences of proteins that have desired three-dimensional structures.

5. Compare DNA sequences and protein sequences for similarity.

6. Study the evolution of sequences and species.
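
Challenge 5 in the list above, comparing sequences for similarity, can be illustrated with a minimal Python sketch. It assumes the two sequences are already aligned and of equal length. The function name `percent_identity` and the example strings are hypothetical and do not come from the slides. Real search tools such as BLAST use scoring matrices and gap penalties rather than this simple count.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Return the percentage of aligned positions where both sequences agree.

    Assumes seq_a and seq_b are pre-aligned and have equal length.
    Positions holding a gap character ('-') never count as matches.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        return 0.0
    matches = sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a == b and a != "-"
    )
    return 100.0 * matches / len(seq_a)


# Hypothetical aligned DNA fragments, made up for illustration.
print(percent_identity("GACCAGCC", "GACTAGCC"))  # 7 of 8 positions match: 87.5
```

A percent-identity score is only a first approximation of similarity: it ignores how the alignment was produced and treats all mismatches as equally costly.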


http://gila.engr.uic.edu/bioinformatics/

Bioinformatics

Computational analysis of high-throughput biological data:

Whole genome sequencing.

Global genomic expression & profiling.

Functional genomics.

Structural genomics/proteomics

Comparative genomics.


International Networking Collaboration

In the mid-1990s, the GenBank database became part of the International Nucleotide Sequence Database Collaboration:

DDBJ (Mishima, Japan)

GenBank (NCBI, USA): www.ncbi.nlm.nih.gov/

EMBL (Europe): http://www.ebi.ac.uk/

NCBI investigators maintain ongoing collaborations with several institutes within NIH and also with numerous academic and government research laboratories.


Nucleotide

Protein

PubMed

The original version of Entrez had just 3 nodes: nucleotides, proteins, and PubMed abstracts.

Entrez has now grown to nearly 20 nodes


The future of genomics rests on the foundation of the Human Genome Project.


Genome Databases

The Wellcome Trust

Free unrestricted access for all

The door to discovery is wide open

Genome browsers

Ensembl: www.ensembl.org

University of California Santa Cruz: http://genome.cse.ucsc.edu

European Bioinformatics Institute: www.ebi.ac.uk

MGD, the Jackson Laboratory: www.informatics.jax.org

GenBank: www.ncbi.nlm.nih.gov

DNA Data Bank of Japan: www.ddbj.nig.ac.jp


I. The Human Genome Project

The genome sequence is complete - almost!

Approximately 3.2 billion base pairs.


DNA (nucleotide sequence)

Protein

Gene expression = protein production


The Flow of Biotechnology Information


Gene

> DNA sequence
AATTCATGAAAATCGTATACTGGTCTGGTACCGGCAACAC
TGAGAAAATGGCAGAGCTCATCGCTAAAGGTATCATCGAA
TCTGGTAAAGACGTCAACACCATCAACGTGTCTGACGTTA
ACATCGATGAACTGCTGAACGAAGATATCCTGATCCTGGG
TTGCTCTGCCATGGGCGATGAAGTTCTCGAGGAAAGCGAA
TTTGAACCGTTCATCGAAGAGATCTCTACCAAAATCTCTG
GTAAGAAGGTTGCGCTGTTCGGTTCTTACGGTTGGGGCGA
CGGTAAGTGGATGCGTGACTTCGAAGAACGTATGAACGGC
TACGGTTGCGTTGTTGTTGAGACCCCGCTGATCGTTCAGA
ACGAGCCGGACGAAGCTGAGCAGGACTGCATCGAATTTGG
TAAGAAGATCGCGAACATCTAGTAGA

> Protein sequence
MKIVYWSGTGNTEKMAELIAKGIIESGKDVNTINVSDVNIDELLNEDILILGCSAMGDEVLEESEFEPFIEEISTKISGKKVALFGSYGWGDGKWMRDFEERMNGYGCVVVETPLIVQNEPDEAEQDCIEFGKKIANI
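
The step from the DNA sequence to the protein sequence above can be sketched in Python. This is a minimal illustration, assuming the standard genetic code and assuming that translation starts at the first ATG and stops at the first in-frame stop codon. The function name `translate_first_orf` is hypothetical, and only the first line of the DNA sequence from the slide is used as input.

```python
BASES = "TCAG"
# Amino acids for the 64 codons of the standard genetic code,
# enumerated with each codon position cycling through T, C, A, G.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate_first_orf(dna: str) -> str:
    """Translate from the first ATG to the first in-frame stop codon.

    Trailing bases that do not fill a codon are ignored, and codons
    containing characters other than A, C, G, T are reported as 'X'.
    """
    dna = dna.upper()
    start = dna.find("ATG")
    if start == -1:
        return ""
    protein = []
    for pos in range(start, len(dna) - 2, 3):
        amino_acid = CODON_TABLE.get(dna[pos:pos + 3], "X")
        if amino_acid == "*":  # stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


# First line of the DNA sequence shown on the slide.
fragment = "AATTCATGAAAATCGTATACTGGTCTGGTACCGGCAACAC"
print(translate_first_orf(fragment))  # MKIVYWSGTGN
```

The printed residues agree with the first eleven residues of the protein sequence on the slide, which suggests the coding region starts at the ATG five bases into the DNA sequence.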


Genome size

Organism                   Number of base pairs
ΦX174 virus                5,386
Epstein-Barr virus         172,282
Mycoplasma genitalium      580,000
Haemophilus influenzae     1.8 × 10^6
Yeast (S. cerevisiae)      12.1 × 10^6
Human                      3.2 × 10^9
Wheat                      16 × 10^9
Lilium longiflorum         90 × 10^9
Salamander                 100 × 10^9
Amoeba dubia               670 × 10^9


Sanger Technique for Sequencing


A Sequence print-out from a control sample


FASTA Format

>identifier descriptive text
nucleotide or amino-acid sequence, on multiple lines if needed

Example:

>gi|41|emb|X63129.1|BTA1AT B. taurus mRNA for alpha-1-anti-trypsin
GACCAGCCCTGACCTAGGACAGTGAATCGATAATGGCACTCTCCATCACGCGGGGCCTTCTGCTGCTGGC ….

The MOST important data format!
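
Reading the FASTA layout described above takes only a few lines of Python. The sketch below is a minimal reader, not a replacement for an established library. The function name `parse_fasta` is hypothetical, and the input is only the truncated fragment shown on the slide, split across two lines to demonstrate multi-line sequences.

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Map each defline (without the leading '>') to its full sequence.

    Sequence lines following a defline are concatenated; blank lines
    and any text before the first defline are skipped.
    """
    records: Dict[str, str] = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = line[1:]
            records[current] = ""
        elif current is not None:
            records[current] += line
    return records


# Only the fragment shown on the slide; the full record is longer.
example = """>gi|41|emb|X63129.1|BTA1AT B. taurus mRNA for alpha-1-anti-trypsin
GACCAGCCCTGACCTAGGACAGTGAATCGATAATGGCACTCTC
CATCACGCGGGGCCTTCTGCTGCTGGC
"""

for defline, sequence in parse_fasta(example.splitlines()).items():
    identifier, description = defline.split(None, 1)
    print(identifier)
    print(description)
    print(len(sequence))
```

Splitting the defline at the first whitespace separates the identifier from the descriptive text, which matches the ">identifier descriptive text" layout.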


Modified FASTA Format


1) A few tools follow the convention that lower-case sequence is masked (RepeatMasker, some versions of BLAST, MegaBLAST, BLASTZ).

2) A few analysis tools (like CLUSTAL) want a simplified identifier on the defline, so that they have a short string to label the alignment.

>X63129.1
GACCAGCCCTGACCTAGGACAGTGAATCGATAATGGCACTCTC
CATCACGCGGGGCCTTCTGCTGCTGGC ….
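
Both conventions can be handled with small helper functions. The Python sketch below is illustrative only: `simplify_defline` and `masked_fraction` are hypothetical names, and the defline handling assumes the pipe-delimited layout of the example above (gi, number, database, accession, locus). Other defline layouts simply fall back to the first word.

```python
def simplify_defline(defline: str) -> str:
    """Reduce an NCBI-style defline to its accession.version when possible.

    Assumes the 'gi|number|database|accession|locus' layout; any other
    layout returns the first whitespace-delimited word unchanged.
    """
    words = defline.lstrip(">").split()
    if not words:
        return ""
    fields = words[0].split("|")
    if len(fields) >= 4 and fields[0] == "gi":
        return fields[3]
    return words[0]


def masked_fraction(sequence: str) -> float:
    """Fraction of residues in lower case (masked, by the convention above)."""
    if not sequence:
        return 0.0
    return sum(1 for base in sequence if base.islower()) / len(sequence)


print(simplify_defline(
    ">gi|41|emb|X63129.1|BTA1AT B. taurus mRNA for alpha-1-anti-trypsin"
))  # X63129.1

# Hypothetical soft-masked fragment, made up for illustration.
print(masked_fraction("GACCagccCT"))  # 0.4
```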


Example search: H5N1. Write the specific name of the gene, then click Go.


GenBank Record

Header

Locus Name

Sequence Length

Molecule Type

GenBank Division

Modification Date

Accession Number

Version Number
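
Several of the header fields listed above appear together on the LOCUS line of a GenBank flat file, while the accession and version are given on their own ACCESSION and VERSION lines. The Python sketch below splits a LOCUS line into named fields. It assumes the common whitespace-separated nucleotide layout (name, length, "bp", molecule type, topology, division, date); the function name `parse_locus_line` and the example line are made up for illustration and do not describe a real record.

```python
def parse_locus_line(line: str) -> dict:
    """Split a GenBank LOCUS line into the header fields it carries.

    Assumes eight whitespace-separated columns:
    LOCUS name length bp molecule-type topology division date.
    Lines that omit a column (for example, older records without a
    topology) are rejected rather than guessed at.
    """
    tokens = line.split()
    if len(tokens) != 8 or tokens[0] != "LOCUS" or tokens[3] != "bp":
        raise ValueError("unexpected LOCUS line layout")
    return {
        "locus_name": tokens[1],
        "sequence_length": int(tokens[2]),
        "molecule_type": tokens[4],
        "topology": tokens[5],
        "division": tokens[6],
        "modification_date": tokens[7],
    }


# Made-up LOCUS line for illustration; not a real GenBank record.
example = "LOCUS       EXAMPLE01    1200 bp    mRNA    linear   MAM 28-FEB-2011"
print(parse_locus_line(example))
```

For production work an established parser is a safer choice, because real LOCUS lines vary between record types and over time.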


Gene Sequence


Searching Biological Databases


BLAST (Basic Local Alignment Search Tool)

http://www.ncbi.nlm.nih.gov

BLASTN (DNA)

BLASTP (Protein)

BLASTX (DNA against Protein)

PSI-BLAST (Position Specific Iterative BLAST)
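
The choice among the first three programs follows from the type of the query and the type of the database. The Python sketch below encodes that rule as a lookup table. It is a hypothetical helper for illustration, not part of any BLAST distribution, and it covers only the three pairings listed above; PSI-BLAST is left out because it is an iterative protein search rather than a distinct query/database pairing.

```python
# Query type and database type mapped to the BLAST program listed above.
BLAST_PROGRAMS = {
    ("dna", "dna"): "blastn",
    ("protein", "protein"): "blastp",
    ("dna", "protein"): "blastx",  # DNA query translated, searched against protein
}


def choose_blast_program(query_type: str, database_type: str) -> str:
    """Return the program name for a query/database pairing.

    Raises ValueError for pairings not covered by the list above.
    """
    key = (query_type.lower(), database_type.lower())
    if key not in BLAST_PROGRAMS:
        raise ValueError(f"no program listed for {key[0]} query vs {key[1]} database")
    return BLAST_PROGRAMS[key]


print(choose_blast_program("DNA", "protein"))  # blastx
```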


Multiple Alignment Software


ClustalW (http://www.ebi.ac.uk/clustalw)

MSA (http://softlib.rice.edu/softlib/msa.html)

HMMER (http://hmmer.wustl.edu/)

SAM (http://www.cse.ucsc.edu/research/compbio/sam.html)


Biology Information on the Internet

Introduction to Databases

Searching the Internet for Biology Information.

General Search methods

Biology Web sites

Introduction to GenBank file format.

Introduction to Entrez and PubMed.

Ref: Chapters 1, 2, 5, 6 of “Bioinformatics”


RefSeq and LocusLink

An attempt to produce one mRNA, one protein, and one genomic gene record for each frequently occurring allele of a protein-expressing gene.

www.ncbi.nlm.nih.gov/LocusLink

Special non-GenBank accession numbers:

NM_nnnnnn   mRNA RefSeq
NP_nnnnnn   protein RefSeq
NC_nnnnnn   RefSeq genomic contig
NT_nnnnnn   temporary genomic contig
NX_nnnnnn   predicted gene
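
The accession prefixes above lend themselves to a simple lookup. The Python sketch below classifies an accession by its two-letter prefix. The prefix table is copied from the slide as written and may not match the current NCBI list; the function name `classify_accession`, the pattern, and the placeholder accession `NM_000000.1` are assumptions made for illustration.

```python
import re

# Prefix meanings exactly as listed on the slide (not checked against NCBI).
REFSEQ_PREFIXES = {
    "NM": "mRNA RefSeq",
    "NP": "protein RefSeq",
    "NC": "RefSeq genomic contig",
    "NT": "temporary genomic contig",
    "NX": "predicted gene",
}

# Two capital letters, an underscore, digits, and an optional ".version".
ACCESSION_PATTERN = re.compile(r"^([A-Z]{2})_(\d+)(?:\.(\d+))?$")


def classify_accession(accession: str) -> str:
    """Describe an accession according to the prefix table above."""
    match = ACCESSION_PATTERN.match(accession.strip())
    if match is None:
        return "not a RefSeq-style accession"
    return REFSEQ_PREFIXES.get(match.group(1), "unknown RefSeq prefix")


print(classify_accession("NM_000000.1"))  # placeholder number: mRNA RefSeq
print(classify_accession("X63129.1"))     # not a RefSeq-style accession
```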


Ensembl Overview: searching and browsing


Background

HGP draft sequence of human genome

jigsaw puzzle: assembling semi-accurate sequences coming from all over the world


Ensembl

EMBL-EBI and Sanger Institute

An automatic system to:

track sequenced pieces (and their changes)

assemble them into larger stretches

analyze them to find genes


Ensembl Data

Annotation: features identified on each DNA sequence

Examples:

genes: known genes or genes predicted by Ensembl

SNPs (single nucleotide polymorphisms)

repeats (regions of simple repetitive sequence)

homologies (regions highly similar to other sequences in the public databases)


Data Access

A web-based genome browser which can be customized as required

A web-based system for data export and data mining

'Dumps' of sequence and other data sets for you to download

Direct access to the databases

A Perl-based object layer


Gene Prediction

Finding genes: Genscan, a gene-finding software

Comparison with all known genes: matches are considered supporting evidence


Ensembl Services

BLAST

Sequence browsing

Identifier search

Known gene names

OMIM diseases

Free text search of OMIM, SWISS-PROT, InterPro annotation


Contig View (Customization)


Marker View


Ensembl Gene Report