Jiří Vondrášek
Institute of Organic Chemistry and Biochemistry, Academy of Sciences of the Czech Republic
Bioinformatics
Autumn School of Computational Chemistry, Prague 2006
Applications: theory, biotechnology, pharmacy, medicine, genetic engineering
Informatics applied to biological molecules (data).
Bioinformatics extracts a molecular information system for use in molecular biology.
Bioinformatics is conceptualized molecular biology (in the physico-chemical sense) to which informatics (derived from mathematical informatics and statistics) is applied.
[Diagram: bioinformatics. Labels: experimental data; computer analysis; structured data (databases), hypotheses. Data types shown: sequences, genes, contigs, function, metabolism, structure, (everything).]
Genome sizes
Organism                    Chromosomes   Genome size
Mycoplasma genitalium       -             0.58 Mbp
Escherichia coli            -             4.6 Mbp
Saccharomyces cerevisiae    16            11.2 Mbp
Arabidopsis thaliana        5             115.4 Mbp
Drosophila melanogaster     5             ~137.0 Mbp
Homo sapiens                24            ~3.3 Gbp
Central dogma of molecular genetics
[Diagram: DNA, RNA, protein, with arrows labelled replication, transcription, translation, and reverse transcription. Additional labels: information, function.]
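The flow named in the diagram can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the standard genetic code is assumed, the input is taken to be the coding strand, and translation simply starts at the first base and stops at the first stop codon.

```python
# Toy illustration of transcription, translation and reverse transcription.
# Assumption: standard genetic code, codons enumerated in U, C, A, G order.
BASES = "UCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"  # first base U
    "LLLLPPPPHHQQRRRR"  # first base C
    "IIIMTTTTNNKKSSRR"  # first base A
    "VVVVAAAADDEEGGGG"  # first base G
)
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def transcribe(coding_strand: str) -> str:
    """DNA coding strand -> mRNA (same sequence, T replaced by U)."""
    return coding_strand.upper().replace("T", "U")


def reverse_transcribe(mrna: str) -> str:
    """mRNA -> DNA coding strand (U replaced by T)."""
    return mrna.upper().replace("U", "T")


def translate(mrna: str) -> str:
    """Translate from the first base until a stop codon or the end."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE.get(mrna[i:i + 3], "X")  # X = unknown codon
        if amino_acid == "*":  # stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    mrna = transcribe("ATGGGCAAATAA")
    print(mrna, translate(mrna))
```

Replication is left out because it copies DNA to DNA and would amount to returning the input (or its reverse complement for the other strand).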
[Diagram: DNA, genes, sequence; proteins, structure, function; evolutionary relationships between genes and organisms.]
>jana (4797 nt)GAATTCGCCGCGGGGCTGCGCATCACCGATGCCGCCACCATCGAGATCGTCGAGATGGTACTGGCCGGCTCGATCAACAAGCAGCTCGTCGGCTACATCAACGAAGCGGGCGGCAAGGCCGTCGGCCTGTGCGGCAAGGACGGCAACATGGTGTCCGCCACCAAGGCGACGCGCACCATGGTCGATCCGGATTCGCGGATCGAAGAGGTGATCGACCTCGGTTTCGTCGGCGAGCCGGAGAAGGTCGACCTCACCCTGCTCAACCAGCTGATCGGCCACGAGTTGATCCCGGTGCTGGCGCCGCTGGCGACCTCCGCGTCGGGCCAGACCTTCAACGTCAATGCCGACACCTTTGCAGGTGCGGTTGCCGGTGCGCTGCGGGCCAAGCGCCTGCTGCTGCTGACCGACGTGCCGGGCGTGCTCGACCAGAACAAGAAGCTGATCCCCGAACTGTCGATCAAGGATGCCCGCAAGCTGATCGCAGACGGCACCATCTCGGGCGGCATGATCCCCAAGGTCGAGACCTGCATCTACGCGCTCGAACAGGGCGTCGAAGGCGTCGTCATCCTCGACGGCAAGGTCCCGCACGCAGTGCTGCTCGAATTGTTCACCAACCAGGGCACCGGCACGCTGATCCACAAGTGATGCGAGGCTGCGGCGACAACATCCGTCATGGCCGGGCTCGTCCCGGCCATCCACGTCTTTCCGGCGGTTTTCTCAGCAAGACGTGGATGCCCGGCACAAGGCCGGGCATGACGGGGTGGAGATCGCGCGCCCTCGCCGCCATTGTCACCACCCTCGCCCTCACCTCCGCCGCCCACGCCGACCTCAAGCTCTGCAACCGCATGAGCTACGTGGTCGAGACGGCGATCGGGGTCGATTCCAACGGCACCACCGCCTCGCGCGGATGGCTGCGGATTGATCCGGCGCAATGCCGGGTCGTGGTGCAAGGCGCGCTCAACGCCGACCGCATCATGCTGAATGCCCGCGCGCTGGCGGTGTACGGCGTCTCGCCGCTGCCGCAGAACGGCACTGACCGGCTGTGCATTGCCGAAGACAATTTCGTCATCGCCGCCGCGCGGCAATGCCGCGGCGGCCAAACGCTCGCCGCCTTCACCGAGATCAAGCCCACCGACACCGAGGACGGCAACAAGATCGCTTATCTGGCGGAAGACTCCGGCTACGACGACGAACAGGCCAAACTCGCCGCGATCCAGCGGCTGCTGGTGATCGCCGGTTACGACGCCTCGCCGATCGACGGCGTCGACGGCCCGAAGACGCAGGCCGCGCTGTCCGCCTTCCTCAAGAGCCGAGGCCTGAAGCCCGAGATCGTCGATGCGCCGGATTTCTTCGACGTGATGATCAAGGCAGTGCAGCAGCCGTCCGGCAGCGGGCTGACCTGGTGCAACGACACCAAGTACAAGATCATGGCGGCCGTCGGCGAAGACGACGGCAAGACTGTCACCAGCCGCGGCTGGTACGGTGTTGCGCCCGGCCAATGCCTGCGCCCCGACCTCGGCGCACAGCCGAAGCGGGTGTTCAGCTTCGCCGAAGCGGTCGACGGCAGCGGCAGGCCGGTGACCATCAAGGGCCGTGCGCTGAACTGGGGCGGCGGCGTGACGCTGTGCACGCGTGACAGCAAGTTCGAGATCGGCGAGCAAGGCGATTGCGCGGCGCGCGGCCTCGCCGCCACCGGCTTCGCCGCCGTCGATCTCAGTAGCGGCAAGACATTGAGGTTGTCCGCCCCATGATGCAGCTCGGCAAACGCGGCTTCGATCACGTCGAGACCTGGGTGTTCGATCTCGACAACACGCTGTACCCGCATCACCTCAACCTATGGCAGCAGGTCGATGCGCGGATCCGCGACTTCGTCGCCGACTGGCTGAAGGTTTCGCCGGAAGAAGCCTTCCGTATCCAGAAGGATTACTACAAGCGCTACGGCACCACGATGCGCGGGATGATGACCGAGCACGGCGTTCACGCCG
ACGACTACCTGGCTTATGTCCACGCCATCGACCATTCGCCGCTGCAGCCGAATCCGGCGATGGGCGATGCGATCGAGCGACTGCCGGGCCGCAAGCTGATCCTGACCAACGGCTCGACCGCCCATGCGGGCAAGGTGCTGGAGCGGCTCGGCATCGGCCATCATTTCGAGGCGGTGTTCGACATCATTGCGGCCGACCTCGAGCCGAAGCCGGCGCCGCAGACCTACCGCCGTTTTCTCGATCGCCATGGTGTCGACCCGGCCCGCGCCGCGATGTTCGAAGACCTCGCCCGCAACCTCACCGTGCCGCACCAGCTCGGCATGACCACCGTGCTGGTGGTGCCTGACGATAGCCAGGACGTGGTCCGCGAAGATTGGGAGCTTGAAGGCCGCGACGCCGCCCACGTCGATCACGTGACTGATGATTTGACAGGGTTCTTGGGGAAGCTGAGTTCGCTGTAGGCCGGGGACGCCTCCCAAGCGTCAATCGTCATCGCCGCCGGATGCAAGGCGGCTAGGTATTGCGGAGCGCTCGCGATCTTCCGTCCAATGCCCTGGGATACTGGATCGCCCGGACGAGCCGGGCGACGACGTTGAAGAGAGATGACGTGGCGTCACCACATCCCCCGCCGTCATCGCCCGCGCAGGCGGGCGATGACTTGGCGGACGGGGCGGCGCCTTGACTCCGACCCGGCGAATCCGGACAACACTCCGCAAAACTCTCCCTGAAATCAGCCTCCCAAGGACCCGTCGATGCCGCTCACCGCCCTGGAATCTACCATCAACGCCGCTTTCGACGCGCGCGACACCGTTACCGCGGCGACGCAGGGCGAGATTCGTCAGGCCGTCGAGGATGCGCTCGATCTGCTCGACCAGGGCAAGGTGCGGGTGGCGCGGCGCGACGACTCCGGCGCCTGGACGGTCAATCAGTGGCTGAAGAAAGCAGTGCTGCTGTCGTTCCGGCTCAACGACATGGGCGTGATCGCCGGCGGCCCGGGCGGCGCCAACTGGTGGGACAAGGTGCCGTCGAAGTTCGAGGGCTGGGGTGAGAACCGCTTCCGCGAGGCCGGCTTCCGCGCCGTGCCGGGCCGATCGTCGCGCGTCGGCCTTTATCGCCAAGACGCGGTACTGATCCGTCCTTCGTCAATCTCGGCGCTTACGTCGATGAAAGCACCATGGTCGAACACCTGGGCGACCGTCGGCTCCTGCGCCCAGATCGGCAAGCGCGTGCACATCTCCGGCGGTGCCGGCATCGGCGGCGTGCTCGAGCCGCTGCAGGCCGGCCCGGTGATCATCGAGGACGACTGCTTCATCGGCGCCCGCTCCGAAGTCGCCGAAGGCGTGATCGTGCGCAAGGGTGCGGTGCTGGCGATGGGCGTTTTCCTCGGCGCCTCGACCAAGATCGTCGACCGCGAGACCGGCGAAATCTTCGTCGGCGAAGTGCCGGAATATGCCGTGCTGGTGCCCGGCACCCTGCCCGGCAAGCCGATGAAGAACGGCGCCCCCGGCCCAGCCACCGCCTGCGCGGTGATCGTCAAGCGCGTCGACGAGCGCACCCGTTCCAAGACCTCGATCAACGAATTGCTGCGGGACTGACACCTGTAGGAGGCGCGAATGGACTGGACCACGCTGTTCTTCAGCTTTCGAGGTCGGATCAATCGCGCCAAATACTGGCTGGTCGGACTGATCTACGTCGCCGCCTGGATGG ….
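A sequence in this form (a header line starting with ">" followed by nucleotides) can be read and summarized with a short script. The sketch below assumes FASTA-style input; the helper names and the record name "demo" are hypothetical, and the demo sequence is just the first 20 nucleotides shown above.

```python
def parse_fasta(text: str) -> dict:
    """Parse FASTA-formatted text into {record name: sequence}."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The record name is the first word of the header line.
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line.upper())
    return {key: "".join(parts) for key, parts in records.items()}


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in the sequence (0.0 for an empty one)."""
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


if __name__ == "__main__":
    example = ">demo toy record\nGAATTCGCCG\nCGGGGCTGCG\n"
    for rec_name, seq in parse_fasta(example).items():
        print(rec_name, len(seq), round(gc_content(seq), 2))
```

GC content is only one of many simple statistics; it is used here because it needs nothing beyond the raw sequence.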
Sequence
What can be found in DNA?
structural and organizational elements
evolutionary relationships
genes
promoters and other regulatory elements
"foreign" DNA
general analysis
Genes
How can genes be found?
Leucine codon usage
Codon   Rhodobacter capsulatus (count)   Rhodobacter capsulatus (%)   Escherichia coli (%)
CUA     3                                <1                           4
CUC     119                              16                           9
CUG     458                              60                           52
CUU     157                              20                           10
UUA     0                                0                            11
UUG     27                               3                            13
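Counts like those in the leucine table can be obtained by splitting a coding sequence into codons and tallying them. The sketch below assumes an in-frame coding sequence given as DNA or RNA; the function names are hypothetical and the example sequence is made up. Organism-specific codon preferences of this kind are one of the signals a gene finder can use to judge whether a reading frame looks like a real gene.

```python
from collections import Counter

LEUCINE_CODONS = ("CUA", "CUC", "CUG", "CUU", "UUA", "UUG")


def codon_counts(coding_seq: str) -> Counter:
    """Count codons in an in-frame coding sequence (DNA or RNA)."""
    rna = coding_seq.upper().replace("T", "U")
    usable = len(rna) - len(rna) % 3  # ignore a trailing partial codon
    return Counter(rna[i:i + 3] for i in range(0, usable, 3))


def leucine_usage(coding_seq: str) -> dict:
    """Return {codon: (count, percent of all leucine codons)}."""
    counts = codon_counts(coding_seq)
    total = sum(counts[codon] for codon in LEUCINE_CODONS)
    return {
        codon: (counts[codon], 100.0 * counts[codon] / total if total else 0.0)
        for codon in LEUCINE_CODONS
    }


if __name__ == "__main__":
    # Made-up sequence: AUG CUG CUG CUC UUG UAA
    for codon, (count, percent) in leucine_usage("ATGCTGCTGCTCTTGTAA").items():
        print(codon, count, f"{percent:.0f}%")
```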
alignment
Which proteins do the genes encode?
[Diagram: alignment tools grouped by comparison type (1:1, 1:n, n:n): Dot plot, SSEARCH, BLITZ, FASTA, BLAST, PSI-BLAST, HMMER, ClustalW, MultAlign]
Dot plot
SSEARCH ftp://ftp.virginia.edu/pub/fasta
BLITZ ... http://www.ebi.ac.uk
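A dot plot is the simplest of these comparisons: two sequences are laid along the axes of a grid and a dot is placed wherever they match. The sketch below is a minimal text version; the function name and the `window` parameter are assumptions for illustration, with a dot placed when two windows of that length are identical.

```python
def dot_plot(seq_a: str, seq_b: str, window: int = 1) -> list:
    """Return the rows of a text dot plot.

    Row i, column j holds '*' when the window of seq_a starting at i
    is identical to the window of seq_b starting at j, otherwise '.'.
    """
    rows = []
    for i in range(len(seq_a) - window + 1):
        row = ""
        for j in range(len(seq_b) - window + 1):
            row += "*" if seq_a[i:i + window] == seq_b[j:j + window] else "."
        rows.append(row)
    return rows


if __name__ == "__main__":
    for line in dot_plot("GATTACA", "GATTACA", window=3):
        print(line)
```

Similar regions show up as diagonal runs of dots. A larger window suppresses chance matches at the cost of missing short or imperfect similarities.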
FASTA http://www.ebi.ac.uk
BLAST http://ncbi.nlm.nih.gov/blast
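FASTA and BLAST compare one query against many database sequences, and both begin by looking for short exact word matches before doing any more expensive alignment. The sketch below illustrates only that seeding idea on a toy database. It is not the algorithm of either program: the function names, the word length `k`, and the ranking by raw hit count are all assumptions made for illustration.

```python
from collections import defaultdict


def build_index(database: dict, k: int) -> dict:
    """Map every word of length k to the (name, position) pairs where it occurs."""
    index = defaultdict(list)
    for name, seq in database.items():
        for pos in range(len(seq) - k + 1):
            index[seq[pos:pos + k]].append((name, pos))
    return index


def seed_hits(query: str, index: dict, k: int) -> list:
    """Rank database sequences by the number of word matches with the query."""
    hits = defaultdict(int)
    for qpos in range(len(query) - k + 1):
        for name, _pos in index.get(query[qpos:qpos + k], ()):
            hits[name] += 1
    # Most hits first; ties are broken alphabetically by name.
    return sorted(hits.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    toy_db = {"seq1": "ACGTACGTGG", "seq2": "TTTTTTTTTT", "seq3": "GGACGTAA"}
    index = build_index(toy_db, k=4)
    print(seed_hits("ACGTA", index, k=4))
```

Real programs extend and score the seeds and attach statistical significance to the results; none of that is attempted here.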
PSI-BLAST http://ncbi.nlm.nih.gov
HMMER
ClustalW, MultAlign
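PSI-BLAST and HMMER search with a model built from several aligned sequences rather than with a single sequence. The sketch below shows the simplest form of that idea, a position-specific log-odds profile. It is a toy under stated assumptions, not the method of either program: the function names are hypothetical, the alignment is ungapped, the alphabet is DNA for brevity, the background is uniform, and a fixed pseudocount is used.

```python
import math
from collections import Counter


def build_profile(aligned: list, alphabet: str = "ACGT", pseudocount: float = 1.0) -> list:
    """Build per-column log-odds scores from equal-length, ungapped sequences."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")
    background = 1.0 / len(alphabet)  # assumption: uniform background
    total = len(aligned) + pseudocount * len(alphabet)
    profile = []
    for col in range(length):
        counts = Counter(seq[col] for seq in aligned)
        profile.append({
            letter: math.log2(((counts[letter] + pseudocount) / total) / background)
            for letter in alphabet
        })
    return profile


def score(profile: list, seq: str) -> float:
    """Sum the column scores for a sequence as long as the profile."""
    return sum(column[letter] for column, letter in zip(profile, seq))


if __name__ == "__main__":
    profile = build_profile(["ACGT", "ACGA", "ACGT"])
    for candidate in ("ACGT", "TTTT"):
        print(candidate, round(score(profile, candidate), 2))
```

A sequence that agrees with the conserved columns scores higher than one that does not, which is what lets a profile detect more distant relatives than a single query sequence.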
alignment