
Introduction to Bioinformatics

Dr. rer. nat. Jing Gong

Cancer Research Center

School of Medicine, Shandong University

2011.9.14


Chapter 1

Introduction


About me
• Dr. rer. nat. Jing Gong
• Bachelor's degree in Marine Biology, Ocean University of China (formerly Qingdao Ocean University)
• Bachelor's, Master's and doctoral degrees in Bioinformatics, Ludwig-Maximilians-Universität München, Germany
• Affiliation: Cancer Research Center of SDU
• Tel: 0531-88380202
• Email: gongjing@sdu.edu.cn
• Office: Dianjing Building, Rm. 106, Baotuquan Campus


About this course
• Schedule: 2011/9/14 - 2011/10/12, Wed. 14:00 - 18:00
• Location: Building No. 8, first floor, west, Computer Pool
• Homepage: http://1.51.212.243/bioinfo.html
• Table of Contents

Chapter 1: Introduction
Chapter 2: Databases
Chapter 3: Alignment
Chapter 4: Structure
Chapter 5: Tree

My name is Lampy.


Literature:
1. Ramsden, J. Bioinformatics: An Introduction, 2nd edition. Springer, 2009.
2. Claverie, J.-M. and Notredame, C. Bioinformatics For Dummies, 2nd edition. Wiley, 2007.


Handouts: Information Page and Vocabulary List

Information Page

Chapter 1, 2011/9/14

Dr. rer. nat. Jing Gong
Affiliation: Cancer Research Center of SDU
Tel: 0531-88380202
Email: gongjing@sdu.edu.cn
Office: Dianjing Building, Rm. 106, Baotuquan Campus

Schedule: 2011/9/14 - 2011/10/12, Wed. 14:00 - 18:00
Place: Building No. 8, first floor, west, Computer Pool

Course Homepage: http://1.51.212.243/bioinfo.html

PubMed: http://www.ncbi.nlm.nih.gov/entrez/

ExPASy: http://expasy.org/

NCBI: http://www.ncbi.nlm.nih.gov/

PIR: http://pir.georgetown.edu

FASTA

FASTA (pronounced FAST-Aye) stands for FAST-ALL, reflecting the fact that it can be used for a fast protein ……

BLAST

Basic Local Alignment Search Tool. A sequence comparison algorithm optimized for speed, used to search sequence databases ……

Alignment

The result of a comparison of two or more gene or protein sequences in order to determine their degree of base or amino acid…….

FASTA

FASTA (读作FAST-Aye) 代表FAST-ALL, 反映的实施是他能够用于快速的蛋白质比对或者快组的核苷比对。该程序……

BLAST

基本局部比对搜索工具。以速度优化算法为核心,搜索序列数

据库得到 佳局部比对结果。用替代矩阵和查新序列……

比对

两个甚至更多的基因或者蛋白质序列进行比较的结果,用以计算他们碱基或者氨基酸的相似度。序列比对用来决定两个甚至…….

Vocabulary

Chapter 1, 2011/9/14


What is Bioinformatics?

biochemistry, biometrics, biophysics, biohazards, biomathematics, bioterrorism, biopotato, bioinformatics


What is Bioinformatics? Interdisciplinary

a biology/medical researcher, just like you

a professional in the pharmaceutical industry

a policeman worrying about DNA testing

a computer scientist developing bio-databases

a consumer concerned about GMOs (Genetically Modified Organisms)

… …


What is Bioinformatics?

Definition:

Bioinformatics – the science of collecting and analyzing complex biological data such as genetic codes. [Oxford Dictionary]

Bioinformatics – the computational branch of molecular biology. [Bioinformatics for Dummies]

Bioinformatics – the application of computer science and information technology to the field of biology and medicine. [Wikipedia]

Bioinformatics – the science of how information is generated, transmitted, received, and interpreted in biological systems, i.e. the application of information science to biology. [Bioinformatics-An Introduction]

A formal definition?


History of Bioinformatics

In 1809, French biologist Jean Baptiste Lamarck published “Philosophie Zoologique”. Lamarck stressed two main themes in his biological work:

1. The environment gives rise to changes in animals, i.e. changes through use and disuse.

2. Life is structured in an orderly manner, and the many different parts of all bodies make the organic movements of animals possible.

Examples: “blind as a mole”, “show your teeth”, “birds have no teeth?”

Jean Baptiste Lamarck (1744-1829)


In 1859, English naturalist Charles Darwin published “On the Origin of Species by Means of Natural Selection, or the Preservation of Favoured Races in the Struggle for Life”.

Charles Darwin (1809-1882)


Gregor J. Mendel (1822-1884)

In 1866, Austrian scientist Gregor Mendel demonstrated that the inheritance of certain traits in pea plants follows particular patterns, now referred to as the laws of “Mendelian Inheritance”.


Friedrich Miescher (1844-1895)

In 1869, Swiss physician and biologist Friedrich Miescher isolated DNA from white blood cells at Felix Hoppe-Seyler's laboratory at the University of Tübingen, Germany.

Nuclei → nuclein → nucleic acid → DNA


Thomas Hunt Morgan, American geneticist, was famous for his experimental research with the fruit fly, through which he established the chromosome theory of heredity. He showed that genes are linked in a series on chromosomes and are responsible for identifiable, hereditary traits. Morgan’s work played a key role in establishing the field of genetics. He received the Nobel Prize in Physiology or Medicine in 1933.

Thomas H. Morgan (1866-1945), Nobel Prize 1933

In 1944, American physician and medical researcher Oswald Avery and his co-workers Colin MacLeod and Maclyn McCarty demonstrated that DNA is the material of which genes and chromosomes are made.

In their experiment, they destroyed the lipids, ribonucleic acids, carbohydrates, and proteins of the transforming extract; transformation still occurred. When they then destroyed the deoxyribonucleic acid, transformation no longer occurred.

Oswald Avery (1877-1955), Colin MacLeod (1909-1972), Maclyn McCarty (1911-2005)

In 1950, American biochemist Erwin Chargaff noticed a pattern in the amounts of the four bases: adenine (A), thymine (T), cytosine (C) and guanine (G). He discovered that the amounts of adenine (A) and thymine (T) in DNA were roughly the same, as were the amounts of cytosine (C) and guanine (G). This later became known as Chargaff's rule.

Erwin Chargaff (1905-2002)

History of Bioinformatics

%A = %T and %G = %C
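Chargaff's rule can be checked with a few lines of Python. The sketch below assumes sequences written with the plain letters A, C, G and T (the example strand is made up) and compares the base percentages of a single strand with those of the double strand it forms with its reverse complement. The rule holds exactly for the double strand, because every A on one strand is paired with a T on the other (and every G with a C); a single strand need not obey it.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the complementary strand, read 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def base_percentages(seq: str) -> dict:
    """Percentage of each base A, C, G, T in seq."""
    counts = Counter(seq)
    return {base: 100.0 * counts[base] / len(seq) for base in "ACGT"}


def double_strand_percentages(strand: str) -> dict:
    """Base percentages over both strands of a double-stranded DNA."""
    return base_percentages(strand + reverse_complement(strand))


strand = "ATGGAAGTATTTAAAGCG"  # hypothetical example strand
single = base_percentages(strand)
double = double_strand_percentages(strand)
print("single strand:", single)  # %A and %T generally differ on one strand
print("double strand:", double)  # %A == %T and %G == %C
```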


In 1953, James D. Watson and Francis Crick proposed the first correct double-helix model of DNA structure in the journal Nature. Their model relied heavily on an X-ray diffraction image of DNA taken in Rosalind Franklin's laboratory in 1952, and on the X-ray work of Franklin and Maurice Wilkins.

Rosalind Franklin (1920-1958)

James Watson (1928-), Nobel Prize 1962

Francis Crick (1916-2004), Nobel Prize 1962

Maurice Wilkins (1916-2004), Nobel Prize 1962

In 1965, the American biochemist Robert W. Holley determined the sequence of the 77 nucleotides of a yeast alanine tRNA. Holley was awarded the 1968 Nobel Prize in Physiology or Medicine for describing the structure of this tRNA, which links DNA to protein synthesis.

Robert W. Holley (1922-1993), Nobel Prize 1968

Frederick Sanger (1918-), Nobel Prize 1980

In 1977, Frederick Sanger and colleagues introduced the “dideoxy” chain-termination method for sequencing DNA molecules, also known as the “Sanger method”. For this work, in 1980 he shared the Nobel Prize in Chemistry with Walter Gilbert.

Walter Gilbert (1932-), Nobel Prize 1980

The key principle of the Sanger method was the use of dideoxynucleotide triphosphates (ddNTPs) as DNA chain terminators.
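The chain-termination idea can be illustrated with a toy simulation. The sketch below is an idealized model, not a description of lab practice: it assumes we already know the newly synthesized strand (written 5'→3'), that each of the four reactions contains one kind of ddNTP, and that chains stop at every possible position. Sorting all fragments by length, as on a sequencing gel, and reading the terminating base of each recovers the sequence.

```python
from typing import Dict, List


def ddntp_fragments(new_strand: str) -> Dict[str, List[int]]:
    """Simulate the four ddNTP reactions (ddA, ddC, ddG, ddT).

    Incorporating a dideoxynucleotide stops the chain, so each reaction
    yields fragments ending at positions that hold its base.
    """
    fragments: Dict[str, List[int]] = {base: [] for base in "ACGT"}
    for position, base in enumerate(new_strand, start=1):
        fragments[base].append(position)  # fragment length = stop position
    return fragments


def read_gel(fragments: Dict[str, List[int]]) -> str:
    """Sort all fragments by length (shortest first) and read off the
    terminating base of each band."""
    bands = sorted(
        (length, base) for base, lengths in fragments.items() for length in lengths
    )
    return "".join(base for _, base in bands)


strand = "ATGGAATC"  # hypothetical newly synthesized strand, 5'->3'
lanes = ddntp_fragments(strand)
print(lanes)            # {'A': [1, 5, 6], 'C': [8], 'G': [3, 4], 'T': [2, 7]}
print(read_gel(lanes))  # ATGGAATC
```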


Read protein sequences directly from the DNA sequence!

The central dogma of molecular biology (DNA → RNA → protein) was first articulated by Francis Crick in 1958 and re-stated in a Nature paper published in 1970.

Francis Crick (1916-2004)

Marshall Warren Nirenberg shared a Nobel Prize in Physiology or Medicine in 1968 with Har Gobind Khorana and Robert W. Holley for "breaking the genetic code" and describing how it operates in protein synthesis.

Marshall Warren Nirenberg (1927-2010), Nobel Prize 1968

Har Gobind Khorana (1922-), Nobel Prize 1968

Robert W. Holley (1922-1993), Nobel Prize 1968

Introduction to Bioinformatics, English Courses for Graduate Students

Amino acids are the building blocks of proteins.

Amino acids are made of carbon, hydrogen, oxygen, nitrogen, and sulfur atoms.

A protein = C₁₂₀₀H₂₄₀₀O₆₀₀N₃₀₀S₁₀₀

Protein is a nutrient needed by the human body for growth and maintenance.

History of Bioinformatics

#    Name             3-letter   1-letter
1    Alanine          Ala        A
2    Arginine         Arg        R
3    Asparagine       Asn        N
4    Aspartic acid    Asp        D
5    Cysteine         Cys        C
6    Glutamine        Gln        Q
7    Glutamic acid    Glu        E
8    Glycine          Gly        G
9    Histidine        His        H
10   Isoleucine       Ile        I
11   Leucine          Leu        L
12   Lysine           Lys        K
13   Methionine       Met        M
14   Phenylalanine    Phe        F
15   Proline          Pro        P
16   Serine           Ser        S
17   Threonine        Thr        T
18   Tryptophan       Trp        W
19   Tyrosine         Tyr        Y
20   Valine           Val        V

A given type of protein always contains the same number of total amino acids in the same proportion.

Amino acids are linked together as a chain. The first amino acid sequence of a protein, insulin, was determined by Dr. Sanger between 1951 and 1955.

insulin (human preproinsulin) = MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQVGQVELGGGPGAGSLQPLALEGSLQKRGIVEQCCTSICSLYQLENYCN

For this 110-residue sequence: insulin = (12 glycines + 10 alanines + 4 tyrosines + 7 glutamines + . . .)

Frederick Sanger (1918-), Nobel Prize 1958
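Counting residues like this is easy to automate. The sketch below (standard-library Python) counts each amino acid in the insulin sequence shown above; the dictionary simply restates the one-letter codes of the 20 amino acids from the table, and any other letter is rejected as an assumption about clean input.

```python
from collections import Counter

# One-letter code -> full name, as in the table of the 20 amino acids.
AMINO_ACIDS = {
    "A": "Alanine", "R": "Arginine", "N": "Asparagine", "D": "Aspartic acid",
    "C": "Cysteine", "Q": "Glutamine", "E": "Glutamic acid", "G": "Glycine",
    "H": "Histidine", "I": "Isoleucine", "L": "Leucine", "K": "Lysine",
    "M": "Methionine", "F": "Phenylalanine", "P": "Proline", "S": "Serine",
    "T": "Threonine", "W": "Tryptophan", "Y": "Tyrosine", "V": "Valine",
}

INSULIN = (
    "MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAED"
    "LQVGQVELGGGPGAGSLQPLALEGSLQKRGIVEQCCTSICSLYQLENYCN"
)


def composition(protein: str) -> Counter:
    """Count how often each amino acid occurs in a one-letter sequence."""
    unknown = set(protein) - set(AMINO_ACIDS)
    if unknown:
        raise ValueError(f"unknown residue codes: {sorted(unknown)}")
    return Counter(protein)


counts = composition(INSULIN)
print(len(INSULIN), "residues")
for code in "GAYQ":
    print(f"{AMINO_ACIDS[code]:<15} {counts[code]}")
```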


Protein Sequence: MAVLD

The first 3D structure of a protein, myoglobin, was determined in 1958 by Dr. Kendrew, working with Dr. Perutz, using the complicated technique of X-ray crystallography.

Max Ferdinand Perutz (1914-2002), Nobel Prize 1962

John Cowdery Kendrew (1917-1997), Nobel Prize 1962

In 1956, the Symposium on Information Theory in Biology was held in Gatlinburg, USA.

In 1979, GenBank was established at Los Alamos National Laboratory (USA).

In 1982, the nucleotide sequence database of the European Molecular Biology Laboratory (EMBL) was created (Europe).

In 1986, the DNA Data Bank of Japan (DDBJ) began data bank activities at the National Institute of Genetics (NIG, Japan).

In the early 1990s, the International Nucleotide Sequence Database Collaboration (INSDC) was founded as a cooperation of GenBank, EMBL and DDBJ.

In 1987, the Chinese-American scientist LIN Hua-an is reported to have coined the word “bioinformatics”. He first created the word “compbio”, then “bioinformatique”, and then “bio-informatics”; because email subject lines at that time did not support the hyphen, “bioinformatics” was born. (The Dutch term “bioinformatica” had, however, already been used by Paulien Hogeweg and Ben Hesper around 1970, in the broader sense of studying information processes in biological systems.)

Since at least the late 1980s, the term “bioinformatics” has been used primarily in genomics and genetics, particularly in those areas of genomics involving large-scale DNA sequencing.


Human Genome Project: publicly funded vs. privately funded

Publicly funded project: James D. Watson & Francis Collins; began in 1990, $3 billion; data freely available

Privately funded project: Craig Venter; began in 1998, $300 million; data patented

President Clinton (2000)

Progress: 2000, 90%; 2001, 99%; 2003, finished

AB SOLiD™ 4.0 System × 27

Illumina HiSeq 2000 × 137

Beijing, Shanghai, Shenzhen

What Bioinformatics Can Do for You?

Analyzing DNAs

Analyzing RNAs

Analyzing Proteins

Others: Pathway, Bioimaging, etc.

What Bioinformatics Can Do for You? Analyzing DNAs

1. Read the DNA sequence: ATGGAAGTATTTAAAGCGCCACCTATTGGGATATAAG

2. Decompose it into successive triplets: ATG GAA GTA TTT AAA GCG CCA CCT ATT GGG ATA TAA G . . .

3. Translate each triplet into the corresponding amino acid: M E V F K A P P I G I STOP
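The three steps above can be sketched in Python. This is a minimal illustration assuming the standard genetic code and reading frame 1 (starting at the first base); it does not handle ambiguous bases such as N.

```python
from itertools import product

# Standard genetic code. Codons are enumerated with the bases in the
# order T, C, A, G at each position (third position varying fastest);
# "*" marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def triplets(dna: str) -> list:
    """Step 2: split into successive codons, dropping an incomplete tail."""
    return [dna[i:i + 3] for i in range(0, len(dna) - 2, 3)]


def translate(dna: str, stop_at_stop: bool = True) -> str:
    """Step 3: translate each codon, stopping at the first stop codon."""
    protein = []
    for codon in triplets(dna.upper()):
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*" and stop_at_stop:
            break
        protein.append(amino_acid)
    return "".join(protein)


dna = "ATGGAAGTATTTAAAGCGCCACCTATTGGGATATAAG"  # step 1
print(" ".join(triplets(dna)))  # ATG GAA GTA TTT AAA GCG CCA CCT ATT GGG ATA TAA
print(translate(dna))           # MEVFKAPPIGI
```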


DNA: ATGGAAGTATTTAA…… → Protein: MEVFKAP… → Database

What Bioinformatics Can Do for You? Analyzing RNAs

In the context of bioinformatics, there are only two important differences between RNA and DNA:

RNA differs from DNA by one nucleotide: uracil (U) replaces thymine (T).

RNA comes as a single strand.


Even though RNA molecules consist of single strands of nucleotides, their natural urge to pair with complementary sequences is still there.

Hairpin shapes are the basic elements of RNA secondary structure; they’re made up of loops (unpaired bases) and stems (the paired regions).

All transfer RNAs (tRNAs) assemble themselves into a shape like a cloverleaf.


What Bioinformatics Can Do for You? Analyzing Proteins

Protein Structure Determination:

Experimental Methods: X-ray Crystallography, Nuclear Magnetic Resonance (NMR)

Computational Methods: De novo method, Homology Modeling, Threading, and ensemble method.

The first 3D structure of a protein was determined in 1958 using X-ray crystallography.

Sequence → Structure → Function

Software: Maestro, VMD, PyMOL

What Bioinformatics Can Do for You? Analyzing Protein Sequences

Drug Design:

• Virtual Screening

• Docking

Virtual screening involves the rapid in silico assessment of large libraries of chemical structures in order to identify those structures which are most likely to bind to a drug target, typically a protein receptor or enzyme.


Molecular dynamics (MD) is a computer simulation of physical movements of atoms and molecules.

Supercomputer

500-aa protein, 1 ns (10⁻⁹ s) of simulated time, 120 cores: 5 hours
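At its core, an MD program repeatedly computes the forces on every atom and advances positions and velocities by a small time step. The sketch below shows one common integration scheme (velocity Verlet) on a one-dimensional toy system, a single particle on a harmonic spring; it illustrates the idea under that simplifying assumption and is not how a production MD package is organized. Real simulations use femtosecond-scale time steps and force fields for thousands of atoms, which is why 1 ns of a 500-aa protein took about 120 cores × 5 h = 600 core-hours.

```python
def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Integrate Newton's equation m*a = F(x) with the velocity Verlet scheme.

    x, v: initial position and velocity; force: function F(x).
    Returns the final position, final velocity and all positions visited.
    """
    a = force(x) / mass
    trajectory = [x]
    for _ in range(n_steps):
        x = x + v * dt + 0.5 * a * dt * dt  # move with current velocity/acceleration
        a_new = force(x) / mass             # force at the new position
        v = v + 0.5 * (a + a_new) * dt      # update velocity with mean acceleration
        a = a_new
        trajectory.append(x)
    return x, v, trajectory


# Toy system: one particle on a harmonic spring, F(x) = -k*x (k = m = 1).
k, m = 1.0, 1.0
x, v, traj = velocity_verlet(1.0, 0.0, lambda pos: -k * pos, m, dt=0.01, n_steps=628)
energy = 0.5 * m * v**2 + 0.5 * k * x**2
print(round(x, 3), round(energy, 4))  # back near x = 1 after ~one period; energy stays ~0.5
```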

Introduction to BioinformaticsEnglish Courses for Graduate Students

39

Introduction to BioinformaticsEnglish Courses for Graduate Students

What Bioinformatics Can Do for You?Analyzing Protein Sequences

Bavaria Supercomputing Centre
• Linux Cluster: 2007, 753 nodes, 5646 cores, 43 TFLOP/s
• HLRB II: 2007, 9728 cores, 62 TFLOP/s
• SuperMUC: 2012, 140000 cores, 3 PFLOP/s

Tianhe-1 (天河一号): 2.5 PFLOP/s, No. 1 in the world (TOP500 list of November 2010)

What Bioinformatics Can Do for You? Others: Pathway, Bioimaging, etc.

CT, magnetic resonance imaging, statistical graphs

How Most People Use Bioinformatics?

Becoming an Instant Expert with PubMed

Retrieving Protein Sequences

Retrieving DNA Sequences

Using BLAST to Compare Your Protein Sequence

Making a Multiple Protein Sequence Alignment with ClustalW

Retrieving a 3D Protein Structure

How Most People Use Bioinformatics?

Gene sequence → specialist in bioinformatics: “Great! It’s dUTPase.”

“But what’s dUTPase?”

Becoming an Instant Expert with PubMed

How Most People Use Bioinformatics? Becoming an Instant Expert with PubMed (http://www.ncbi.nlm.nih.gov/entrez/)

Query: dUTPase

Query by Author Name

Query by Author Name + Topic


How Most People Use Bioinformatics?

Internal structure of a database record:

The information is spread out over separate sections, called fields: PubMed ID, Publication Date, Title, Page, Abstract, Laboratory address, Authors.
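Field-structured records like this are easy to process by program. PubMed can export records in the MEDLINE text format, where each line begins with a short field tag (PMID, DP, TI, AU, AD, …); the sketch below splits such a record into its fields. The record is made up for illustration, and the parser assumes the usual layout: a tag padded to four characters, then "- ", with continuation lines indented by six spaces.

```python
from collections import defaultdict


def parse_medline(text: str) -> dict:
    """Split a MEDLINE-format record into {tag: [values]}.

    Field lines start with a tag padded to four characters followed by
    "- "; lines indented by six spaces continue the previous field.
    Repeated tags (for example several AU authors) are kept in order.
    """
    fields = defaultdict(list)
    tag = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if line.startswith("      ") and tag is not None:
            fields[tag][-1] += " " + line.strip()  # continuation line
        else:
            tag = line[:4].strip()
            fields[tag].append(line[6:].strip())
    return dict(fields)


# A made-up record, for illustration only.
record = """PMID- 00000000
DP  - 2010 Jan
TI  - A hypothetical study of dUTPase
      in Escherichia coli.
AU  - Doe J
AU  - Roe R
AD  - Example Laboratory, Beijing, China."""

parsed = parse_medline(record)
print(parsed["TI"])  # ['A hypothetical study of dUTPase in Escherichia coli.']
print(parsed["AU"])  # ['Doe J', 'Roe R']
```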


Search “Down” in field “Author [AU]”

Search “Down” in field “Title [TI]”

Search “Down” in field “Laboratory address [AD]”

Search “Down” everywhere

Using fields to find experts near you: search “Beijing” in the Laboratory address field, for example:

Tel : 86 - 10 - 6275-5002 Fax : 86 - 10 - 6276-2292 New Life Science Building, Peking University, Summer Palace Road No. 5, Beijing, P. R. China 100871

Searching PubMed using limits (query: dUTPase)

A few more tips about PubMed. How to get the most out of your query:

• quoted queries (for example, “down syndrome”)
• logical connectors: AND, OR, NOT (for example, dUTPase[TI] OR pyrophosphatase[TI] NOT Smith[AU])
• initials with proper names (for example, “Abergel C”)
• the PubMed Identifier (the number in the PMID field)
• deselecting the Limits box when starting a new search
• the Related Articles link

What you may not find in PubMed:
• names ranking beyond the 10th place in the author list for older papers (before 1995)
• papers recorded before 1965
• abstracts for most references recorded before 1976


How Most People Use Bioinformatics?

Acquire some preliminary information about a particular function that you’re interested in: dUTPase.

Find out more about it by retrieving a few examples of protein sequences that perform this function in E. coli.

Retrieving Protein Sequences http://expasy.org/

ExPASy

Prof. Amos Bairoch

Query: dUTPase coli

Download formats: Tab, Excel, FASTA

“Cross-references” point to data collections other than UniProtKB.


The “Sequences” section provides you with the actual amino acid sequence of the protein.

Save this sequence on your Desktop as “P06968.fasta” (right-click).
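Instead of right-clicking, the same file can be fetched by program. The sketch below assumes the current UniProt REST service (rest.uniprot.org), which postdates the ExPASy pages shown on these slides; download_fasta is a hypothetical helper name and needs network access.

```python
from urllib.request import urlopen

UNIPROT_REST = "https://rest.uniprot.org/uniprotkb"


def uniprot_fasta_url(accession: str) -> str:
    """URL of the FASTA-formatted sequence for a UniProtKB accession."""
    return f"{UNIPROT_REST}/{accession}.fasta"


def download_fasta(accession: str, path: str) -> None:
    """Save the sequence to a local file (requires network access)."""
    with urlopen(uniprot_fasta_url(accession)) as response:
        text = response.read().decode("utf-8")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(text)


print(uniprot_fasta_url("P06968"))  # https://rest.uniprot.org/uniprotkb/P06968.fasta
```

For example, download_fasta("P06968", "P06968.fasta") would write the file under the same name used in the exercise.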


What is FASTA? (Does it have anything to do with PASTA?)

FASTA is the name of a popular sequence alignment and database scanning program created by W.R. Pearson and D.J. Lipman in 1988. Its legacy is the FASTA format, which is now ubiquitous in bioinformatics.

The sequence in FASTA format:

>P06968 My_Sequence_Name
ARCGTCRGCKINTANDRGCKINTANDCKINTANDARCGTCRGCKINTANDRGCKINTAND

The line starting with > (the definition line) contains a unique identifier followed by an optional short definition. The lines that follow it contain the DNA or protein sequence (in one-letter code) until the next > symbol indicates the beginning of a new sequence.
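Reading the format by program takes only a few lines. The sketch below is a minimal parser following the rules just described (the first word after > is the identifier, the rest an optional definition, sequence lines run until the next >); duplicate identifiers simply overwrite each other, and established libraries such as Biopython's SeqIO handle many more edge cases.

```python
def parse_fasta(text: str) -> dict:
    """Parse FASTA text into {identifier: (definition, sequence)}."""
    records = {}
    identifier = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Definition line: first word = identifier, rest = optional definition.
            parts = line[1:].split(maxsplit=1) or [""]
            identifier = parts[0]
            definition = parts[1] if len(parts) > 1 else ""
            records[identifier] = (definition, [])
        elif identifier is not None:
            records[identifier][1].append(line)  # sequence line
    return {key: (definition, "".join(chunks)) for key, (definition, chunks) in records.items()}


example = """>P06968 My_Sequence_Name
ARCGTCRGCKINTAND
RGCKINTANDCKINTAND
>SEQ2
MEVFKAP"""
print(parse_fasta(example))
```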


How Most People Use Bioinformatics?

Acquire some preliminary information about a particular function that you’re interested in: dUTPase.

Find out more about it by retrieving a few examples of protein sequences that perform this function in E. coli (ExPASy).

Retrieve the DNA sequence relevant to the dUTPase protein of E. coli.

Retrieving DNA Sequences (http://expasy.org/)

P06968


……

1. Summary Section

2. Reference Section

From UniProtKB: P06968 jump to

……

3. Features Section
• promoter elements
• ribosome binding sites (RBS)
• protein coding segments (CDS)
……

4. Sequence Section

Range of dUTPase ORF (CDS)

ORF translation
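The relation between a CDS range and its ORF translation can be illustrated with a simple ORF finder. The sketch below assumes the standard genetic code and searches only the three forward reading frames for ATG...stop stretches; it reports coordinates in the same 1-based, inclusive "start..end" style as a GenBank CDS range. Real gene annotation also considers the reverse strand, alternative start codons and signals such as the RBS listed in the Features section.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def longest_orf(dna: str):
    """Find the longest ATG...stop ORF on the forward strand.

    Returns (start, end, protein) with 1-based, inclusive coordinates
    (stop codon included), or None if no complete ORF exists.
    """
    dna = dna.upper()
    best = None
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon of a new ORF in this frame
            elif start is not None and CODON_TABLE[codon] == "*":
                protein = "".join(CODON_TABLE[dna[j:j + 3]] for j in range(start, i, 3))
                if best is None or len(protein) > len(best[2]):
                    best = (start + 1, i + 3, protein)
                start = None
    return best


print(longest_orf("CCATGAAATTTTAGCC"))  # (3, 14, 'MKF'), i.e. CDS 3..14
```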


How Most People Use Bioinformatics?

Acquire some preliminary information about a particular function that you’re interested in: dUTPase.

Find out more about it by retrieving a few examples of protein sequences that perform this function in E. coli (ExPASy).

Retrieve the DNA sequence relevant to the dUTPase protein of E. coli.

Perform a BLAST search.

How Most People Use Bioinformatics? Using BLAST to Compare Sequences

What is BLAST?

BLAST (Basic Local Alignment Search Tool) – A sequence comparison algorithm optimized for speed, used to search sequence databases for optimal local alignments to a query.

BLASTn – BLASTn will search a DNA sequence against a DNA databank.

BLASTp – BLASTp will compare a protein sequence against the protein database of your choice.

BLASTx – BLASTx will translate a nucleic acid sequence in all six reading frames and compare all these against the protein database of your choice.

BLAST? – BLAST? ……
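What "all six reading frames" means for BLASTx can be shown directly: three frames start at offsets 0, 1 and 2 of the sequence, and three more at the same offsets on the reverse complement (the other DNA strand). The sketch below assumes the standard genetic code and is only an illustration of the frames, not BLAST's own code; the labels +1 to -3 follow a common convention.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate_frame(dna: str, offset: int) -> str:
    """Translate dna starting at offset 0, 1 or 2 ("*" = stop codon)."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(offset, len(dna) - 2, 3))


def six_frames(dna: str) -> dict:
    """Translations in the three forward and three reverse-strand frames."""
    dna = dna.upper()
    reverse = dna.translate(COMPLEMENT)[::-1]  # reverse complement
    frames = {f"+{k + 1}": translate_frame(dna, k) for k in range(3)}
    frames.update({f"-{k + 1}": translate_frame(reverse, k) for k in range(3)})
    return frames


for name, protein in six_frames("ATGGAAGTATTTAAAGCG").items():
    print(name, protein)
```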

Using BLAST at NCBI (http://www.ncbi.nlm.nih.gov/):

Open “P06968.fasta” from your Desktop and paste the sequence here.

Give a name here.

(The sequence file is also available at http://1.51.212.243/P06968.fasta.)

E-value: an E-value close to 0 indicates a match that is unlikely to have occurred by chance; an E-value close to 1 (or larger, since E-values are not limited to the range 0 to 1) is a warning that the conclusion you might draw from the alignment is NOT reliable.
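For a local alignment score S, BLAST's Karlin-Altschul statistics estimate the E-value as E = K·m·n·e^(-λS), where m and n are the lengths of the query and the database, and K and λ depend on the scoring system. The sketch below evaluates this formula and the equivalent bit-score form. The values λ = 0.267 and K = 0.041 (often quoted for BLOSUM62 with gap costs 11/1) and the query and database lengths are illustrative assumptions, and real BLAST additionally corrects the lengths for edge effects. Because E is the expected number of chance hits, it can exceed 1.

```python
import math


def evalue(score: float, m: int, n: int, lam: float, k: float) -> float:
    """Karlin-Altschul expected number of chance hits with score >= score."""
    return k * m * n * math.exp(-lam * score)


def bit_score(score: float, lam: float, k: float) -> float:
    """Normalized score S' = (lambda*S - ln K) / ln 2."""
    return (lam * score - math.log(k)) / math.log(2)


def evalue_from_bits(bits: float, m: int, n: int) -> float:
    """Same E-value from the bit score: E = m * n * 2**(-S')."""
    return m * n * 2.0 ** (-bits)


LAMBDA, K = 0.267, 0.041   # assumed, illustrative parameters
m, n = 150, 1_000_000_000  # hypothetical query and database lengths

for raw in (40, 60, 80, 100):
    e = evalue(raw, m, n, LAMBDA, K)
    print(raw, round(bit_score(raw, LAMBDA, K), 1), f"{e:.2g}")
```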

Click the score to see the alignment between your query sequence and the matching sequence of the protein that corresponds to this score.

Click the sequence identifier to see the corresponding database entry.


What is Alignment?

Alignment is the result of a comparison of two or more gene or protein sequences in order to determine their degree of base or amino acid similarity.

Pairwise Alignment

Multiple Alignment
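A pairwise local alignment score can be computed with dynamic programming. The sketch below implements the classic Smith-Waterman recurrence, the exact local-alignment method that BLAST approximates heuristically for speed. The scores (match +2, mismatch -1, gap -2) are arbitrary assumptions; protein searches normally use a substitution matrix such as BLOSUM62.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between a and b (linear gap penalty)."""
    rows, cols = len(a) + 1, len(b) + 1
    h = [[0] * cols for _ in range(rows)]  # h[i][j]: best score ending at a[i-1], b[j-1]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = h[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # 0 lets a local alignment start afresh at any position.
            h[i][j] = max(0, diag, h[i - 1][j] + gap, h[i][j - 1] + gap)
            best = max(best, h[i][j])
    return best


print(smith_waterman("XXGATTACAYY", "GATTACA"))  # 14: all of GATTACA matches
print(smith_waterman("AAAA", "TTTT"))            # 0: no similarity at all
```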


How Most People Use Bioinformatics?

Acquire some preliminary information about a particular function that you’re interested in: dUTPase.

Find out more about it by retrieving a few examples of protein sequences that perform this function in E. coli (ExPASy).

Retrieve the DNA sequence relevant to the dUTPase protein of E. coli.

Perform a BLAST search.

Perform a multiple alignment.

How Most People Use Bioinformatics? Making a Multiple Sequence Alignment

Multiple alignments are used to:

• Identify sequence positions where specific amino acids really matter for the structural integrity or the function of a given protein

• Define specific sequence signatures for protein families

• Classify sequences and build evolutionary trees

Making a Multiple Sequence Alignment at PIR (http://pir.georgetown.edu)

Get the sequences from http://1.51.212.243/multi.fasta, then: Select all → Copy → Paste.

Conservation symbols:

* identical

: similar

. related

(blank) different
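The simplest of these symbols, the "*" for fully identical columns, can be reproduced in a few lines. The sketch below uses a made-up toy alignment and only distinguishes identical from non-identical columns; the ":" and "." levels depend on the alignment program's own groups of similar amino acids, which are not reproduced here.

```python
def conservation_line(alignment: list) -> str:
    """Mark alignment columns: '*' if every sequence has the same residue
    (and it is not a gap '-'), ' ' otherwise."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must have equal length")
    marks = []
    for column in zip(*alignment):
        identical = len(set(column)) == 1 and column[0] != "-"
        marks.append("*" if identical else " ")
    return "".join(marks)


def percent_identical_columns(alignment: list) -> float:
    """Share of columns marked '*', in percent."""
    line = conservation_line(alignment)
    return 100.0 * line.count("*") / len(line)


aligned = [  # hypothetical toy alignment
    "MEVFKA-PI",
    "MEIFKAGPI",
    "MEVYKA-PL",
]
print(*aligned, conservation_line(aligned), sep="\n")
```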

Conserved region

How Most People Use Bioinformatics?

Acquire some preliminary information about a particular function that you’re interested in: dUTPase.

Find out more about it by retrieving a few examples of protein sequences that perform this function in E. coli (ExPASy).

Retrieve the DNA sequence relevant to the dUTPase protein of E. coli.

Perform a BLAST search.

Perform a multiple alignment.

Retrieve a protein structure.

How Most People Use Bioinformatics? Retrieve a protein structure

dUTPase: protein sequence, DNA sequence, 3D structure

Retrieve a protein structure: using fields to find experts near you (Beijing)

Query: Su XD dUTPase

Retrieve a protein structure: mouse controls in Jmol (pressing the left button on the structure)

Mouse action                                                             Result
Right-click                                                              Jmol menu
Left click                                                               Select/deselect residue
Shift + left click, then drag mouse up or down (or roll middle button)  Zoom
Left click and drag                                                      Rotate view

Retrieve a protein structure: Backbone by chain
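A "backbone by chain" view can also be derived from the structure file itself. The sketch below reads the C-alpha atoms (one per residue along the backbone) from PDB-format ATOM records and groups them by chain; it assumes the classic fixed-column PDB format (not mmCIF), and atom_line is a hypothetical helper that only builds demo records.

```python
from collections import defaultdict


def backbone_by_chain(pdb_text: str) -> dict:
    """Collect C-alpha atoms per chain from PDB-format ATOM records.

    Fixed PDB columns (1-based): atom name 13-16, residue name 18-20,
    chain ID 22, residue number 23-26, x/y/z 31-38, 39-46, 47-54.
    """
    chains = defaultdict(list)
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM") or line[12:16].strip() != "CA":
            continue
        chains[line[21]].append((
            line[17:20].strip(),  # residue name
            int(line[22:26]),     # residue number
            float(line[30:38]),   # x
            float(line[38:46]),   # y
            float(line[46:54]),   # z
        ))
    return dict(chains)


def atom_line(name, res, chain, num, x, y, z):
    """Hypothetical helper: format one ATOM record in fixed PDB columns."""
    return f"ATOM  {1:>5} {name:<4} {res:>3} {chain}{num:>4}    {x:>8.3f}{y:>8.3f}{z:>8.3f}"


demo = "\n".join([
    atom_line(" N  ", "MET", "A", 1, 11.104, 6.134, -6.504),
    atom_line(" CA ", "MET", "A", 1, 12.000, 6.000, -6.000),
    atom_line(" CA ", "GLY", "B", 5, 1.500, 2.250, 3.000),
])
print(backbone_by_chain(demo))
```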


Recommended