Introduction to Bioinformatics
Dr. rer. nat. Gong Jing
Cancer Research center
Medicine School of Shandong University
2012.10.31
Chapter 1
Introduction
About me
• Dr. rer. nat. Gong Jing
• Bachelor Degree in Marine Biology at the China Ocean University (former Qingdao Ocean University)
• Bachelor, Master & Doctoral Degree in Bioinformatics at the Ludwig-Maximilians-Universität München, Germany
• Affiliation: Cancer Research Center of SDU
• Tel: 0531-88380202
• Email: gongjing@sdu.edu.cn
• Office: Dianjing Building, Rm. 106, Baotuquan Campus
About this course
• Schedule: 2012/10/31 – 2012/11/14, Wed. & Fri. 13:30 - 17:30
• Location: 8#, third floor, east, Computer Pool
• Homepage: http://www.crc.sdu.edu.cn/bioinfo/2012
• Table of Contents
Chapter 1: Introduction
Chapter 2: Databases
Chapter 3: Alignment
Chapter 4: Tree
Chapter 5: Structure
My name is Lampy.
Literature:
1. Bioinformatics - An Introduction, 2nd Edition, Jeremy Ramsden, 2009, Springer.
2. Bioinformatics For Dummies, 2nd Edition, Jean-Michel Claverie, Cedric Notredame, 2007, Wiley.
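Literature (in Chinese):
1. Bioinformatics (生物信息学), edited by Chen Ming (陈铭), 2012-1-1, Science Press (科学出版社).
2. Bioinformatics (生物信息学), edited by Li Xia (李霞), 2010-8-1, People's Medical Publishing House (人民卫生出版社).
3. Bioinformatics (生物信息学), edited by Xu Zhongneng (许忠能), 2008-9-1, Tsinghua University Press (清华大学出版社).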
Information Page
Chapter 1, 2012/10/31
Dr. rer. nat. Jing Gong
Affiliation: Cancer Research Center of SDU
Tel: 0531-88380202
Email: gongjing@sdu.edu.cn
Office: Dianjing Building, Rm. 106, Baotuquan Campus
Schedule: 2012/10/31 - 2012/11/14, Wed. & Fri. 13:30 - 17:30
Place: 8#, third floor, east, Computer Pool
Course Homepage: http://www.crc.sdu.edu.cn/bioinfo/2012
PubMed: http://www.ncbi.nlm.nih.gov/entrez/
ExPASy: http://expasy.org/
NCBI: http://www.ncbi.nlm.nih.gov/
PIR: http://pir.georgetown.edu
Vocabulary List
FASTA
FASTA (pronounced FAST-Aye) stands for FAST-ALL, reflecting the fact that it can be used for a fast protein ……
BLAST
Basic Local Alignment Search Tool. A sequence comparison algorithm optimized for speed, used to search sequence databases ……
Alignment
The result of a comparison of two or more gene or protein sequences in order to determine their degree of base or amino acid ……
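Vocabulary (translated from the Chinese entries):
FASTA
FASTA (pronounced FAST-Aye) stands for FAST-ALL, reflecting the fact that it can be used for fast protein comparison or fast nucleotide comparison. This program ……
BLAST
Basic Local Alignment Search Tool. Built around a speed-optimized algorithm, it searches sequence databases for the best local alignments, using a substitution matrix and the query sequence ……
Alignment
The result of comparing two or more gene or protein sequences in order to calculate the similarity of their bases or amino acids. Sequence alignment is used to determine two or even more ……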
Vocabulary
Chapter 1, 2012/10/31
What is Bioinformatics?
biochemistry
biometrics
biophysics biohazards
biomathematics
bioterrorism
biopotato bioinformatics
a biology/medical researcher, just like you
a professional in the pharmaceutical industry
a policeman worrying about DNA testing
a computer scientist developing bio-databases
a consumer concerned about GMOs (Genetically Modified Organisms)
… …
Interdisciplinary
DEFINITION:
Bioinformatics – the science of collecting and analyzing complex biological data such as genetic codes. [Oxford Dictionary]
Bioinformatics – the computational branch of molecular biology. [Bioinformatics for Dummies]
Bioinformatics – the application of computer science and information technology to the field of biology and medicine. [Wikipedia]
Bioinformatics – the science of how information is generated, transmitted, received, and interpreted in biological systems, i.e. the application of information science to biology. [Bioinformatics-An Introduction]
A formal definition ?
History of Bioinformatics
In 1809, French biologist Jean Baptiste Lamarck published “Philosophie Zoologique”. Lamarck stressed two main themes in his biological work:
1. The environment gives rise to changes in animals, i.e. changes through use and disuse.
2. Life is structured in an orderly manner, and the many different parts of all bodies make possible the organic movements of animals.
“blind as a mole” “show your teeth” “birds have no teeth?”
Jean Baptiste Lamarck (1744-1829)
In 1859, English naturalist Charles Darwin published “On the Origin of Species by Means of Natural Selection, or the Preservation of Favoured Races in the Struggle for Life”.
Charles Darwin (1809-1882)
Gregor J. Mendel (1822-1884)
In 1866, Austrian scientist Gregor Mendel demonstrated that the inheritance of certain traits in pea plants follows particular patterns, now referred to as the laws of “Mendelian Inheritance”.
Friedrich Miescher (1844-1895)
In 1869, Swiss physician and biologist Friedrich Miescher isolated DNA from white blood cells at Felix Hoppe-Seyler's laboratory at the University of Tübingen, Germany.
Nuclei → Nuclein → Nucleic acid → DNA
Thomas Hunt Morgan, an American geneticist, is famous for his experimental research with the fruit fly, by which he established the chromosome theory of heredity. He showed that genes are linked in a series on chromosomes and are responsible for identifiable, hereditary traits. Morgan’s work played a key role in establishing the field of genetics. He received the Nobel Prize for Physiology or Medicine in 1933.
Thomas H. Morgan (1866-1945)
Nobel prize 1933
In 1944, American physician and medical researcher Oswald Avery and his co-workers Colin MacLeod and Maclyn McCarty demonstrated that DNA is the material of which genes and chromosomes are made.
In their experiment, they destroyed the lipids, ribonucleic acids, carbohydrates, and proteins; transformation still occurred. Next they destroyed the deoxyribonucleic acid, and transformation no longer occurred.
Oswald Avery (1877-1955), Colin MacLeod (1909-1972), Maclyn McCarty (1911-2005)
In 1950, American biochemist Erwin Chargaff noticed a pattern in the amounts of the four bases: adenine (A) , thymine (T) , cytosine (C) , guanine (G). He discovered that the amounts of adenine (A) and thymine (T) in DNA were roughly the same, as were the amounts of cytosine (C) and guanine (G). This later became known as Chargaff's rule.
Erwin Chargaff (1905-2002)
In 1953, James D. Watson and Francis Crick proposed the first correct double-helix model of DNA structure in the journal Nature. Their model drew on X-ray diffraction data, including an image obtained in Rosalind Franklin's laboratory in 1952, and on work by Maurice Wilkins.
Rosalind Franklin (1920-1958)
James Watson (1928- ), Nobel prize 1962
Francis Crick (1916-2004), Nobel prize 1962
Maurice Wilkins (1916-2004), Nobel prize 1962
The sequence of the 77 nucleotides of a yeast alanine tRNA was determined by the American biochemist Robert W. Holley in 1965. Holley was awarded the 1968 Nobel Prize in Physiology or Medicine for describing the structure of this tRNA, linking DNA and protein synthesis.
Robert W. Holley (1922-1993)
Nobel prize 1968
Frederick Sanger (1918- )
Nobel prize 1980
In 1977, Frederick Sanger and colleagues introduced the “dideoxy” chain-termination method for sequencing DNA molecules, also known as the “Sanger method”. For this work he shared the 1980 Nobel Prize in Chemistry with Walter Gilbert and Paul Berg.
Walter Gilbert (1932- )
Nobel prize 1980
The key principle of the Sanger method is the use of dideoxynucleotide triphosphates (ddNTPs) as DNA chain terminators.
Read protein sequence directly in the DNA sequence!
The central dogma of molecular biology was first stated by Francis Crick in 1958 and restated in a Nature paper published in 1970.
Francis Crick (1916-2004)
Marshall Warren Nirenberg shared a Nobel Prize in Physiology or Medicine in 1968 with Har Gobind Khorana and Robert W. Holley for "breaking the genetic code" and describing how it operates in protein synthesis.
Marshall Warren Nirenberg (1927-2010), Nobel prize 1968
Har Gobind Khorana (1922-2011), Nobel prize 1968
Robert W. Holley (1922-1993), Nobel prize 1968
All proteins are made up of the same basic building blocks, called amino acids.
Amino acids are made of carbon, hydrogen, oxygen, nitrogen, and sulfur atoms.
A protein = C1200H2400O600N300S100
Protein is a nutrient needed by the human body for growth and maintenance.
Aside from water, protein is the most abundant molecule in the body.

#    3-letter   1-letter   Name
1    Ala        A          Alanine
2    Arg        R          Arginine
3    Asn        N          Asparagine
4    Asp        D          Aspartic acid
5    Cys        C          Cysteine
6    Gln        Q          Glutamine
7    Glu        E          Glutamic acid
8    Gly        G          Glycine
9    His        H          Histidine
10   Ile        I          Isoleucine
11   Leu        L          Leucine
12   Lys        K          Lysine
13   Met        M          Methionine
14   Phe        F          Phenylalanine
15   Pro        P          Proline
16   Ser        S          Serine
17   Thr        T          Threonine
18   Trp        W          Tryptophan
19   Tyr        Y          Tyrosine
20   Val        V          Valine

A given type of protein always contains the same number of total amino acids in the same proportion.
Amino acids are linked together as a chain. The first amino acid sequence of a protein, insulin, was determined in 1951 by Dr. Sanger.
insulin = MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQVGQVELGGGPGAGSLQPLALEGSLQKRGIVEQCCTSICSLYQLENYCN
insulin = (30 glycines + 44 alanines + 5 tyrosines + 14 glutamines + . . .)
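The difference between a protein's ordered sequence and its unordered composition can be checked with a short sketch. It counts the residues in the insulin sequence quoted above; the names `INSULIN` and `composition` are illustrative only, and the numbers in the composition line above look schematic rather than derived from this particular sequence.

```python
from collections import Counter

# Insulin sequence as quoted above, with line-wrap spaces removed.
INSULIN = (
    "MALWMRLLPLLALLALWGPDPAAAFVNQHL"
    "CGSHLVEALYLVCGERGFFYTPKTRREAEDLQVGQVELGG"
    "GPGAGSLQPLALEGSLQKRGIVEQCCTSICSLYQLENYCN"
)


def composition(sequence):
    """Return a Counter mapping each one-letter amino acid code to its count."""
    return Counter(sequence)


counts = composition(INSULIN)
print(len(INSULIN), "residues in total")
for residue, n in counts.most_common():
    print(residue, n)
```

Two proteins can share the same composition and still differ in sequence, which is why the order of the chain carries the information.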
Frederick Sanger (1918- )
Nobel prize 1958
Protein Sequence: MAVLD
The first 3D structure of a protein was determined in 1958 by Drs. Kendrew and Perutz, using the complicated technique of X-ray crystallography.
Max Ferdinand Perutz (1914-2002), Nobel prize 1962
John Cowdery Kendrew (1917-1997), Nobel prize 1962
In 1956, Symposium on Information Theory in Biology (Gatlinburg, USA).
In 1979, GenBank was established at Los Alamos National Laboratory (USA).
In 1982, nucleotide sequence database of European Molecular Biology Laboratory (EMBL) was created (Europe).
In 1986, DNA Data Bank of Japan (DDBJ) began data bank activities at NIG (Japan).
In the early 1990s, the International Nucleotide Sequence Database Collaboration (INSDC) was founded as a cooperation among GenBank, EMBL, and DDBJ.
In 1987, the Chinese-American scientist LIN Hua-an coined the word “bioinformatics”. He first came up with “compbio”, then “bioinformatique”, and then “bio-informatics”. Because email titles at that time did not support the hyphen, “bioinformatics” was born.
Since at least the late 1980s, the term “bioinformatics” has been primarily used in genomics and genetics, particularly in those areas of genomics involving large-scale DNA sequencing.
Human Genome Project
Publicly funded project: James D. Watson & Francis Collins; began 1990; $3 billion; freely available
Privately funded project: Craig Venter; began 1998; $300 million; patented
President Clinton (2000)
Progress: 2000, 90%; 2001, 99%; 2003, finished
English Courses for Graduate Students
AB SOLiD™ 4.0 System × 27
Illumina HiSeq 2000 × 137
Beijing, Shanghai, Shenzhen
What Bioinformatics Can Do for You?
Analyzing DNAs
Analyzing RNAs
Analyzing Proteins
Others: Pathway, Bioimaging, Statistics, etc.
1. Read the DNA sequence:
ATGGAAGTATTTAAAGCGCCACCTATTGGGATATAAG
2. Decompose it into successive triplets:
ATG GAA GTA TTT AAA GCG CCA CCT ATT GGG ATA TAA G . . .
3. Translate each triplet into the corresponding amino acid:
M E V F K A P P I G I STOP
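The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not a replacement for an established library: it assumes the standard genetic code, builds the codon table from a compact 64-letter string, and uses the hypothetical helper names `split_codons` and `translate`.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def split_codons(dna):
    """Step 2: cut the sequence into successive triplets, dropping leftovers."""
    usable = len(dna) - len(dna) % 3
    return [dna[i:i + 3] for i in range(0, usable, 3)]


def translate(dna):
    """Step 3: translate triplet by triplet, stopping at the first stop codon."""
    protein = []
    for codon in split_codons(dna.upper()):
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*":  # "*" marks a stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


dna = "ATGGAAGTATTTAAAGCGCCACCTATTGGGATATAAG"  # step 1: the sequence above
print(" ".join(split_codons(dna)))
print(translate(dna))
```

The trailing `G` of the example sequence does not form a complete triplet, so `split_codons` leaves it out.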
DNA: ATGGAAGTATTTAA……
Protein: MEVFKAP…
Database
Analyzing RNAs
In the context of bioinformatics, there are only two important differences between RNA and DNA:
RNA differs from DNA by one nucleotide: it uses uracil (U) in place of thymine (T).
RNA comes as a single strand.
Even though RNA molecules consist of single strands of nucleotides, their natural urge for pairing with complementary sequences is still there.
Hairpin shapes are the basic elements of RNA secondary structure; they’re made up of loops (the unpaired C-U) and stems (the paired regions).
All transfer RNAs (tRNAs) assemble themselves into a shape like a cloverleaf.
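To make the idea of stems and loops concrete, here is a small sketch that checks whether the two ends of an RNA string can pair into a hairpin stem. It is a toy under stated assumptions: only Watson-Crick pairs (A-U, G-C) are allowed, G-U wobble pairs are ignored, the loop must keep at least `min_loop` unpaired bases, and the function name `hairpin` and the example sequence are invented for illustration. Real secondary-structure prediction tools use energy models instead.

```python
# Allowed base pairs in this toy model (Watson-Crick only).
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def hairpin(rna, min_loop=3):
    """Pair the sequence ends inward and return (stem_length, loop_sequence).

    Pairing stops at the first mismatch, or when pairing one more base pair
    would leave fewer than min_loop unpaired bases in the loop.
    """
    i, j = 0, len(rna) - 1
    paired = 0
    while j - i - 1 >= min_loop and (rna[i], rna[j]) in WATSON_CRICK:
        paired += 1
        i += 1
        j -= 1
    return paired, rna[paired:len(rna) - paired]


stem, loop = hairpin("GGGAAAUCCC")
print("stem length:", stem)
print("loop:", loop)
```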
Analyzing Proteins
Protein Structure Determination:
Experimental Methods: X-ray Crystallography, Nuclear Magnetic Resonance (NMR)
Computational Methods: De novo method, Homology Modeling, Threading, and ensemble method.
The first 3D structure of a protein was determined in 1958 using X-ray crystallography.
Sequence → Structure → Function
Tools: Maestro, VMD, PyMOL
Drug Design:
• Virtual Screening
• Docking
Virtual screening involves the rapid in silico assessment of large libraries of chemical structures in order to identify those structures which are most likely to bind to a drug target, typically a protein receptor or enzyme.
Molecular dynamics (MD) is a computer simulation of physical movements of atoms and molecules.
Supercomputer: 500-aa protein, 1 ns (10⁻⁹ s), 120 cores -> 5 hours
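The core loop of an MD simulation repeatedly computes forces and moves the atoms forward by a tiny time step. The sketch below shows that loop for a single particle on a spring, using the velocity Verlet integration scheme. It is a toy under stated assumptions: one dimension, one particle, a harmonic force with constant `K`, and made-up units; production MD packages apply the same kind of update to many thousands of atoms with a full force field, which is where the hours of supercomputer time go.

```python
import math


def velocity_verlet(x, v, force, mass, dt, steps):
    """Advance one particle in one dimension; return the final (x, v)."""
    a = force(x) / mass
    for _ in range(steps):
        x += v * dt + 0.5 * a * dt * dt      # move using current velocity and force
        a_new = force(x) / mass              # recompute the force at the new position
        v += 0.5 * (a + a_new) * dt          # update velocity with the averaged force
        a = a_new
    return x, v


K = 1.0  # spring constant (arbitrary units, an assumption of this toy model)


def spring_force(x):
    return -K * x


def total_energy(x, v, mass=1.0):
    return 0.5 * mass * v * v + 0.5 * K * x * x


x0, v0 = 1.0, 0.0
x1, v1 = velocity_verlet(x0, v0, spring_force, mass=1.0, dt=0.01, steps=1000)
print("energy before:", total_energy(x0, v0))
print("energy after: ", total_energy(x1, v1))
print("position:", x1, "analytic:", math.cos(10.0))
```

Checking that the total energy stays nearly constant is a common sanity test for an integrator and time step.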
Bavaria Supercomputing Centre
• Linux Cluster: 2007, 753 nodes, 5646 cores, 43 TFLOPS (1 TFLOPS = 10^12 floating-point operations per second)
• HLRB II: 2007, 9728 cores, 62 TFLOPS
• SuperMUC: 2012, 140000 cores, 3 PFLOPS
• Tianhe-1 (天河一号): 2.5 PFLOPS, No.1 in the world (1 PFLOPS = 10^15 floating-point operations per second)
Others: Pathway, Bioimaging, Statistics, etc.
CT
magnetic resonance
statistical graph
How Most People Use Bioinformatics?
Making a Multiple Protein Sequence Alignment with ClustalW
Becoming an Instant Expert with PubMed
Retrieving Protein Sequences
Retrieving DNA Sequences
Using BLAST to Compare Your Protein Sequence
Retrieving a 3D Protein Structure
Becoming an Instant Expert with PubMed
Gene Sequence
Specialist in Bioinformatics
“Great! It’s dUTPase.”
“But what’s dUTPase?”
PubMed: http://www.ncbi.nlm.nih.gov/pubmed
dUTPase
Author Name
Author Name + Topic
Internal structure of a database record:
The information is spread out over separate sections, called fields.
Fields: PubMed ID, Publication Date, Title, Page, Abstract, Laboratory address, Authors
Search “Down” in field “Author [AU]”
Search “Down” in field “Title [TI]”
Search “Down” in field “Laboratory address [AD]”
Search “Down” everywhere
Using fields to find experts near you:
Beijing
Tel: 86-10-6275-5002 Fax: 86-10-6276-2292
New Life Science Building, Peking University, Summer Palace Road No. 5, Beijing, P. R. China 100871
Searching PubMed using Advanced Search
http://www.ncbi.nlm.nih.gov/pubmed
A few more tips about PubMed
How to get the most out of your query:
• quoted queries (for example, “down syndrome”)
• logical connectors: AND, OR, NOT (for example, dUTPase [TI] AND bacteria [TI] NOT Smith [AU])
• initials added to proper names (for example, “Abergel C”)
• PubMed Identifier (the number in the PMID field)
What you cannot find in PubMed:
• Names ranking beyond the 10th place in the author list for older papers (before 1995).
• Papers recorded before 1965.
• Abstracts for most references recorded before 1976.
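The field tags and logical connectors listed above can also be assembled programmatically. The sketch below builds the example query and wraps it in a URL for NCBI's E-utilities ESearch service. Treat it as an assumption-laden illustration: the helper names `tagged`, `pubmed_query`, and `esearch_url` are invented here, only the `db`, `term`, and `retmax` parameters are used, and nothing is sent over the network.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities ESearch service.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def tagged(term, field):
    """Attach a PubMed field tag, e.g. tagged("dUTPase", "TI") -> "dUTPase[TI]"."""
    return f"{term}[{field}]"


def pubmed_query(required, excluded=()):
    """Join required terms with AND and append each excluded term with NOT."""
    query = " AND ".join(required)
    for term in excluded:
        query += f" NOT {term}"
    return query


def esearch_url(query, retmax=20):
    """Build (but do not fetch) an ESearch URL for the given PubMed query."""
    return ESEARCH + "?" + urlencode({"db": "pubmed", "term": query, "retmax": retmax})


query = pubmed_query(
    required=[tagged("dUTPase", "TI"), tagged("bacteria", "TI")],
    excluded=[tagged("Smith", "AU")],
)
print(query)
print(esearch_url(query))
```

The same query string can be pasted directly into the PubMed search box.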
Retrieving Protein Sequences http://expasy.org/
Acquire some preliminary information about a particular function that you’re interested in: dUTPase.
Find out more about it by retrieving a few examples of protein sequences that perform this function in E. coli. (ExPASy)
Prof. Amos Bairoch
dUTPase coli
Tab, Excel, FASTA
“Cross-references” point to data collections other than UniProtKB.
“Sequences” provides you with the actual amino acid sequence of the protein.
Right-click and save this sequence on your Desktop as “P06968.fasta”.
What is FASTA? (Does it have anything to do with PASTA?)
FASTA is the name of a popular sequence alignment and database scanning program created by W.R. Pearson and D.J. Lipman in 1988. Its legacy is the FASTA format, which is now ubiquitous in bioinformatics.
The sequence in FASTA format:
>P06968 My_Sequence_Name
ARCGTCRGCKINTANDRGCKINTANDCKINTANDARCGTCRGCKINTANDRGCKINTAND
The line starting with > (the definition line) contains a unique identifier followed by an optional short definition. The lines that follow it contain the DNA or protein sequence (in one-letter code) until the next > symbol indicates the beginning of a new sequence.
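The format description above translates almost directly into a parser. The following is a minimal sketch, assuming well-formed input; the function name `parse_fasta` is illustrative, and the example record reuses the placeholder sequence shown above, wrapped over two lines to show that sequence lines are concatenated.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (identifier, description, sequence) tuples."""
    records = []
    header = None   # (identifier, description) of the record being read
    chunks = []     # sequence lines collected for that record
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new definition line closes the previous record, if any.
            if header is not None:
                records.append((header[0], header[1], "".join(chunks)))
            identifier, _, description = line[1:].partition(" ")
            header, chunks = (identifier, description), []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header[0], header[1], "".join(chunks)))
    return records


LINE1 = "ARCGTCRGCKINTANDRGCKINTANDCKINTAND"
LINE2 = "ARCGTCRGCKINTANDRGCKINTAND"
example = ">P06968 My_Sequence_Name\n" + LINE1 + "\n" + LINE2 + "\n"

for identifier, description, sequence in parse_fasta(example):
    print(identifier, description, len(sequence))
```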
Acquire some preliminary information about a particular function that you’re interested in: dUTPase.
Find out more about it by retrieving a few examples of protein sequences that perform this function in E. coli. (ExPASy)
Retrieve the DNA sequence relevant to the dUTPase protein of E. coli.
Retrieving DNA Sequences http://expasy.org/
P06968
Retrieving DNA Sequences http://www.ncbi.nlm.nih.gov
From UniProtKB: P06968 jump to ……
1. Summary Section
2. Reference Section
3. Features Section
• promoter elements
• ribosome binding sites (RBS)
• protein coding segments (CDS)
4. Sequence Section
Range of dUTPase ORF (CDS)
ORF translation
Using BLAST to Compare Sequence
Acquire some preliminary information about a particular function that you’re interested in: dUTPase.
Find out more about it by retrieving a few examples of protein sequences that perform this function in E. coli. (ExPASy)
Retrieve the DNA sequence relevant to the dUTPase protein of E. coli.
Perform a BLAST search.
What is BLAST?
BLAST (Basic Local Alignment Search Tool) – A sequence comparison algorithm optimized for speed, used to search sequence databases for optimal local alignments to a query.
BLASTn – BLASTn will search a DNA sequence against a DNA database.
BLASTp – BLASTp will compare a protein sequence against a protein database.
BLASTx – BLASTx will translate a nucleic acid sequence in all six reading frames and compare all these against the protein database of your choice.
BLAST? – BLAST? ……
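The six-frame translation that BLASTx applies to a nucleotide query can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`reverse_complement`, `translate`, `six_frames`) are invented for this example, the standard genetic code is assumed, and the code is not how BLAST itself is implemented.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frames(dna):
    """Translate a DNA sequence in the three forward and three reverse frames."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


if __name__ == "__main__":
    # A short made-up sequence, not taken from the course files.
    for frame, protein in six_frames("ATGAAATAG").items():
        print(frame, protein)
```

Each of the six protein strings would then be compared against the protein database, which is why a BLASTx search can find a coding region without knowing the strand or reading frame in advance.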
93
Using BLAST to Compare Sequences http://www.ncbi.nlm.nih.gov/
94
Using BLAST to Compare Sequences http://www.ncbi.nlm.nih.gov/
95
Open “P06968.fasta” on your Desktop and paste the sequence here.
Give a name here.
http://www.crc.sdu.edu.cn/bioinfo/2012/P06968.fasta
Using BLAST to Compare Sequences http://www.ncbi.nlm.nih.gov/
96
Using BLAST to Compare Sequences http://www.ncbi.nlm.nih.gov/
98
E-value: the number of matches with at least this score that you would expect to find by chance. It starts at 0 and can be larger than 1. An E-value close to 0 indicates a significant match; an E-value close to 1 (or above) is a warning that the conclusion you might draw from the alignment is NOT reliable.
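To make the E-value more concrete, the sketch below uses the textbook relation between a normalized bit score S' and the expected number of chance hits, E = m · n · 2^(−S'), where m is the query length and n the database length. The function name and the numbers are assumptions chosen for illustration; the real BLAST programs apply further corrections (for example effective lengths), so the values printed here will not match a BLAST report exactly.

```python
def expected_chance_hits(bit_score, query_length, database_length):
    """Approximate E-value from a bit score: E = m * n * 2**(-S')."""
    return query_length * database_length * 2.0 ** (-bit_score)


if __name__ == "__main__":
    # Hypothetical query of 150 residues against a database of 10 million residues.
    for bits in (20, 40, 80):
        e_value = expected_chance_hits(bits, 150, 10_000_000)
        print(f"bit score {bits}: E ~ {e_value:.3g}")
```

The higher the bit score, the smaller E becomes; a larger database raises E for the same score, which is why the same alignment can look less convincing in a bigger search.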
99
to see the alignment between your query sequence and the matching sequence of the protein that corresponds to this score.
to see the corresponding database entry.
100
Using BLAST to Compare Sequences http://www.ncbi.nlm.nih.gov/
101
What is Alignment?
Alignment is the result of a comparison of two or more gene or protein sequences in order to determine their degree of base or amino acid similarity.
Pairwise Alignment
Multiple Alignment
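As a small illustration of what a pairwise alignment is, the sketch below aligns two short sequences with a basic global alignment (Needleman-Wunsch style dynamic programming). This is a swapped-in teaching example, not the method BLAST uses (BLAST searches for local alignments with heuristics), and the scoring values (match +1, mismatch −1, gap −1) are assumptions chosen for simplicity.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-1):
    """Return (aligned_a, aligned_b, score) for a simple global alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    def pair(i, j):
        # Score for aligning a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the table: each cell is the best of diagonal, up, or left.
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(i, j),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )

    # Trace back from the bottom-right corner to recover one best alignment.
    top, bottom = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return "".join(reversed(top)), "".join(reversed(bottom)), score[-1][-1]


if __name__ == "__main__":
    aligned_a, aligned_b, total = global_align("ACGT", "AGT")
    print(aligned_a)
    print(aligned_b)
    print("score:", total)
```

A multiple alignment extends the same idea to three or more sequences, so that every column contains residues that are considered equivalent.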
Using BLAST to Compare Sequences http://www.ncbi.nlm.nih.gov/
102
Making a Multiple Sequence Alignment
ExPASy
• Acquire some preliminary information about a particular function that you’re interested in: dUTPase.
• Find out more about it by retrieving a few examples of protein sequences that perform this function in E. coli.
• Retrieve the DNA sequence relevant to the dUTPase protein of E. coli.
• Perform a BLAST search.
• Perform a multiple alignment.
103
Multiple alignments are used to:
• Identify sequence positions where specific amino acids really matter for the structural integrity or the function of a given protein
• Define specific sequence signatures for protein families
• Classify sequences and build evolutionary trees
Making a Multiple Sequence Alignment
104
Making a Multiple Sequence Alignment http://pir.georgetown.edu
105
Making a Multiple Sequence Alignment http://pir.georgetown.edu
106
http://1.51.212.243/multi.fasta
Get the sequences from: http://www.crc.sdu.edu.cn/bioinfo/2012/multi.fasta
Select all
Copy
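A file such as multi.fasta holds several sequences in FASTA format: each record starts with a ">" header line, followed by one or more lines of sequence. The sketch below shows one way to split such text into records in Python. The function name `parse_fasta` and the two example records are made up for illustration; they are not the contents of the course file.

```python
def parse_fasta(text):
    """Parse FASTA text into a dict mapping record name -> sequence string."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Use the first word of the header as the record name.
            header = line[1:].split()
            name = header[0] if header else f"unnamed_{len(records) + 1}"
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


if __name__ == "__main__":
    # Hypothetical two-record example.
    example = """>seq1 first made-up record
MKVL
AGT
>seq2 second made-up record
MRVLA
"""
    for name, sequence in parse_fasta(example).items():
        print(name, len(sequence), sequence)
```

Checking the number of records and their lengths this way is a quick sanity test before pasting the sequences into an alignment form.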
Making a Multiple Sequence Alignment http://pir.georgetown.edu
107
Paste
Making a Multiple Sequence Alignment http://pir.georgetown.edu
108
Making a Multiple Sequence Alignment http://pir.georgetown.edu
109
* identical
: similar
. related
(blank) different
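The idea behind the "*" symbol can be sketched in Python: a column is marked only when every aligned sequence carries the same residue. This is a deliberately simplified sketch; it does not reproduce the ":" and "." symbols, which depend on amino acid similarity groups defined by the alignment program. The function name `identity_line` and the three short sequences are assumptions made for the example.

```python
def identity_line(aligned):
    """Return a marker line with '*' under columns where all residues are identical."""
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    markers = []
    for column in zip(*aligned):
        # A column of gaps only does not count as conserved.
        conserved = len(set(column)) == 1 and column[0] != "-"
        markers.append("*" if conserved else " ")
    return "".join(markers)


if __name__ == "__main__":
    # Hypothetical aligned fragments (gaps shown as '-').
    alignment = ["MKV-L", "MRV-L", "MKVAL"]
    for seq in alignment:
        print(seq)
    print(identity_line(alignment))
```

Runs of consecutive "*" columns are what the later slides call a conserved region.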
Making a Multiple Sequence Alignment http://pir.georgetown.edu
110
Making a Multiple Sequence Alignment http://pir.georgetown.edu
111
Making a Multiple Sequence Alignment http://pir.georgetown.edu
Conserved region
112
Making a Multiple Sequence Alignment http://pir.georgetown.edu
114
Retrieve a protein structure
ExPASy
• Acquire some preliminary information about a particular function that you’re interested in: dUTPase.
• Find out more about it by retrieving a few examples of protein sequences that perform this function in E. coli.
• Retrieve the DNA sequence relevant to the dUTPase protein of E. coli.
• Perform a BLAST search.
• Perform a multiple alignment.
• Retrieve a protein structure.
115
dUTPase
protein sequence
DNA sequence
3D structure
Retrieve a protein structure
116
Using fields to find experts near you:
Beijing
Tel: 86-10-6275-5002 Fax: 86-10-6276-2292
New Life Science Building, Peking University, Summer Palace Road No. 5, Beijing, P. R. China 100871
Retrieve a protein structure
117
Su XD dUTPase
Retrieve a protein structure http://www.pdb.org
118
Retrieve a protein structure http://www.pdb.org
120
Retrieve a protein structure
121
Retrieve a protein structure
Press left button
122
Retrieve a protein structure
Pressing left button
Mouse input – Action
Right-Click – Jmol Menu
Left Click – Select/Deselect Residue
Shift + Left Click and drag mouse up or down, or roll mouse middle button – Zoom
Left Click and Drag – Rotate View
123
Retrieve a protein structure
124
Retrieve a protein structure
Backbone by chain