Reconeixement de patrons en sequencies. Introduccio

Preview:

DESCRIPTION

Reconeixement de patrons en sequencies. Introduccio. roderic guig ó i serra centre de regulació genòmica universitat pompeu fabra roderic.guigo@crg.cat. - PowerPoint PPT Presentation

Citation preview

Reconeixement de patrons en sequencies. Introduccio

roderic guigó i serracentre de regulació genòmica

universitat pompeu fabraroderic.guigo@crg.cat

ACTCAGCCCCAGCGGAGGTGAAGGACGTCCTTCCCCAGGAGCCGGTGAGAAGCGCAGTCGGGGGCACGGGGATGAGCTCAGGGGCCTCTAGAAAGATGTAGCTGGGACCTCGGGAAGCCCTGGCCTCCAGGTAGTCTCAGGAGAGCTACTCAGGGTCGGGCTTGGGGAGAGGAGGAGCGGGGGTGAGGCCAGCAGCAGGGGACTGGACCTGGGAAGGGCTGGGCAGCAGAGACGACCCGACCCGCTAGAAGGTGGGGTGGGGAGAGCATGTGGACTAGGAGCTAAGCCACAGCAGGACCCCCACGAGTTGTCACTGTCATTTATCGAGCACCTACTGGGTGTCCCCAGTGTCCTCAGATCTCCATAACTGGGAAGCCAGGGGCAGCGACACGGTAGCTAGCCGTCGATTGGAGAACTTTAAAATGAGGACTGAATTAGCTCATAAATGGAAAACGGCGCTTAAATGTGAGGTTAGAGCTTAGAATGTGAAGGGAGAATGAGGAATGCGAGACTGGGACTGAGATGGAACCGGCGGTGGGGAGGGGGAGGGGGTGTGGAATTTGAACCCCGGGAGAGAAAGATGGAATTTTGGCTATGGAGGCCGACCTGGGGATGGGGAAATAAGAGAAGACCAGGAGGGAGTTAAATAGGGAATGGGTTGGGGGCGGCTTGGTAACTGTTTGTGCTGGGATTAGGCTGTTGCAGATAATGGAGCAAGGCTTGGAAGGCTAACCTGGGGTGGGGCCGGGTTGGGGTCGGGCTGGGGGCGGGAGGAGTCCTCACTGGCGGTTGATTGACAGTTTCTCCTTCCCCAGACTGGCCAATCACAGGCAGGAAGATGAAGGTTCTGTGGGCTGCGTTGCTGGTCACATTCCTGGCAGGTATGGGGCGGGGCTTGCTCGGTTTTCCCCGCTTCTCCCCCTCTCATCCTCACCTCAACCTCCTGGCCCCATTCAAGCACACCCTGGGCCCCCTCTTCTTCTGCTGGTCTGTCCCCTGAGGGGAAAGCCCAGGTCTGAGGCTTCTATGCTGCTTTCTGGCTCAGAACAGCGATTTGACGCTCTGTGAGCCTCGGTTCCTCCCCCGCTTTTTTTTTTTCAGCCAGAGTCTCACTCTGTCGCCCAGGCTGGAGTGCAGTGGCGCAATCTCAGCTCACTGCAAGCTCCGCCTCCCGGGTTCACGCTATTCTCCCGCCTCAGCCTCCCGAGTAGCTGGGACTACAGGCGCCCGCCACCATGCCCGGCTAATTTTTTGTACTTTGAGTAGGGAAGGGGTTTCACTGTATTATCCAGGATGGTCTCTATCTCCTGACCTCGTGATCTGCCCGCCTGGCCTCCCAAAGTGCTGGAATTACAGGCGTGAGCCTCCGCGCCCGGCCTCCCCATCCTTAATATAGGAGTTAGAAGTTTTTGTTTGTTTGTTTTGTTTTGTTTTTGTTTTGTTTTGAGATGAAGTCCCTCTGTCGCCCAGGCTGGAGTGCAGTGGCTCCCAGGCTGGAGTTCAGTGGCTGGATCTCGGCTCACTGCAAGCTCCGCCTCCCAGGTTCACGCCATTCTCCTGCCTCAGCCTCCGGAGTAGCTGGGACTACAGGAACATGCCACCACACCCGACTAACTTTTTTTGTATTTTTAGTAGAGACGGGGTTTCACCATGTTGGCCAGGCTGGTCTGGAACTCCTGACCTCAGGTGATCTGCCTGCTTCAACCTCCCAAAGTGCTGGGATTACAGACGTGGGCCACCGCGCCCGGCTGGGAGTTAAGAGGTTTCTAATGCATTGCATTAGAATACCAGACACGGGACAGCTGTGATCTTTATTCTCCATCACCCCACACAGCCCTGCCTGGGGCACACAAGGACACTCAATACACGCTTTTCGGGCGCGGTGGCTCAAGCTGTAATCCCAGCACTTTGGGAGGCTGAGGCGGGTGGTACATGAGGTCAGGAGATCGAGACCATCCTGGCTAACATGGTGAAACCCCGTCTCTACTAAAAATACAAAAAACTAGCCCGGGCGTGGTGGCGGGCGCCTGTAGTCCCAGCTACTCGGAGGCTGAGGCAGGAGAATGGCGTGAACCTGGGAGGCGGAGCTTGCAGTGAGCCGAGATCGCGCCACTGCACTCCAGCCTGGGTGACACAGCGCGAGACTCCGTCTCAAAAAAAAAAAAAAAAAAAAAAAAAAAAAATACACGCTTTTCCGCTAGGCACGGTGGCTCACCCCTGTAATCCCAGCATTTTGGGAGGCCAAGGTGGGAGGATCACTTGAGCCCAGGAGTTCAACACCAGACTCAGCAACATAGTGAGACTCTCTCTACTAAAAATACAAAAATTAGCCAGGCCTGGTGCCACACACCTGTGGTCCCAGCTACTCAGAAGGCTAAGGCAGGAGGATCGCTTAAGCCCAGAAGGTCAAGGTTGCAGTGAACCACGTTCAGGCCACTGCAGTCCAGCCTGGGTGACAGAGCAAGACCCTGTCTGTAAATAAATAACGCTTTTCAAGTGATTAAACAGACTCCCCCCTCACCCTGCCCACCATGGCTCCAAAGCAGCATTTGTGGAGCACCTTCTGTGTGCCCCTAGGTACTAGCTGCCTGGACGGGGTCAGAAGGAACCTGAACCACCTTCAACTTGTTCCACACAGGATGCCAGGCCAAGGTGGAGCAACCGGTGGAGCCAGAGACAGAACCCGACGTTCGCCAGCAGGCTGAGTGGCAGAGCGGCCAGCCCTGGGAGCTGGCACTGGGTCGCTTTTGGGATTACCTGCGCTGGGTGCAGACACTGTCTGAGCAGGTGCAGGAGGAGCTGCTCAGCCCCCAGGTCACCCAGGAACTGACGTGAGTGTCCCCATCCCGGCCCTTGACCCTCCTGGTGGGCGGCTATACCTCCCCAGGTCCAGGTTTCATTCTGCCCCTGCCACTAAGTCTTGGGGGCCTGGGTCTCTGCTGGTTCTAGCTTCCTCTTCCCATTTCTGACTCCTGGCTTTAGCTCTCTGGAATTCTCTCTCTCAGTTCTGTTTCTCCCTCTTCCCTTCTGACTCAGCCTGTCACACTCGTCCTGGCGCTGTCTCTGTCCTTCACTAGCTCTTTTATATAGAGACAGAGAGATGGGGTCTCACTGTGTTGCCCAGGCTGGTCTTGAACTTCTGGGCTCAAGCGATCCTCCCACCTCGCCTCCCAAAGTGCTGGGAATAGAGACATGAGCCACCTTGCTCGGCCTCCTAGCTCTTTCTTCGTCTCTGCCTCTGCTCTCTGCGTCTGTCTTTGTCTCCTCTCTGCCTCTGTCCCGTTCCTTCTCTCTTGGTTCACTGCCCTTCTGTCTCTCCCTGTTCTCCTTAGGAGACTCTCCTCTCTTCCTTCTCGAGTCTCTCTGGCTGATCCCCATCTCACCCACACCTATCC

ACTCAGCCCCAGCGGAGGTGAAGGACGTCCTTCCCCAGGAGCCGGTGAGAAGCGCAGTCGGGGGCACGGGGATGAGCTCAGGGGCCTCTAGAAAGATGTAGCTGGGACCTCGGGAAGCCCTGGCCTCCAGGTAGTCTCAGGAGAGCTACTCAGGGTCGGGCTTGGGGAGAGGAGGAGCGGGGGTGAGGCCAGCAGCAGGGGACTGGACCTGGGAAGGGCTGGGCAGCAGAGACGACCCGACCCGCTAGAAGGTGGGGTGGGGAGAGCATGTGGACTAGGAGCTAAGCCACAGCAGGACCCCCACGAGTTGTCACTGTCATTTATCGAGCACCTACTGGGTGTCCCCAGTGTCCTCAGATCTCCATAACTGGGAAGCCAGGGGCAGCGACACGGTAGCTAGCCGTCGATTGGAGAACTTTAAAATGAGGACTGAATTAGCTCATAAATGGAAAACGGCGCTTAAATGTGAGGTTAGAGCTTAGAATGTGAAGGGAGAATGAGGAATGCGAGACTGGGACTGAGATGGAACCGGCGGTGGGGAGGGGGAGGGGGTGTGGAATTTGAACCCCGGGAGAGAAAGATGGAATTTTGGCTATGGAGGCCGACCTGGGGATGGGGAAATAAGAGAAGACCAGGAGGGAGTTAAATAGGGAATGGGTTGGGGGCGGCTTGGTAACTGTTTGTGCTGGGATTAGGCTGTTGCAGATAATGGAGCAAGGCTTGGAAGGCTAACCTGGGGTGGGGCCGGGTTGGGGTCGGGCTGGGGGCGGGAGGAGTCCTCACTGGCGGTTGATTGACAGTTTCTCCTTCCCCAGACTGGCCAATCACAGGCAGGAAGATGAAGGTTCTGTGGGCTGCGTTGCTGGTCACATTCCTGGCAGGTATGGGGCGGGGCTTGCTCGGTTTTCCCCGCTTCTCCCCCTCTCATCCTCACCTCAACCTCCTGGCCCCATTCAAGCACACCCTGGGCCCCCTCTTCTTCTGCTGGTCTGTCCCCTGAGGGGAAAGCCCAGGTCTGAGGCTTCTATGCTGCTTTCTGGCTCAGAACAGCGATTTGACGCTCTGTGAGCCTCGGTTCCTCCCCCGCTTTTTTTTTTTCAGCCAGAGTCTCACTCTGTCGCCCAGGCTGGAGTGCAGTGGCGCAATCTCAGCTCACTGCAAGCTCCGCCTCCCGGGTTCACGCTATTCTCCCGCCTCAGCCTCCCGAGTAGCTGGGACTACAGGCGCCCGCCACCATGCCCGGCTAATTTTTTGTACTTTGAGTAGGGAAGGGGTTTCACTGTATTATCCAGGATGGTCTCTATCTCCTGACCTCGTGATCTGCCCGCCTGGCCTCCCAAAGTGCTGGAATTACAGGCGTGAGCCTCCGCGCCCGGCCTCCCCATCCTTAATATAGGAGTTAGAAGTTTTTGTTTGTTTGTTTTGTTTTGTTTTTGTTTTGTTTTGAGATGAAGTCCCTCTGTCGCCCAGGCTGGAGTGCAGTGGCTCCCAGGCTGGAGTTCAGTGGCTGGATCTCGGCTCACTGCAAGCTCCGCCTCCCAGGTTCACGCCATTCTCCTGCCTCAGCCTCCGGAGTAGCTGGGACTACAGGAACATGCCACCACACCCGACTAACTTTTTTTGTATTTTTAGTAGAGACGGGGTTTCACCATGTTGGCCAGGCTGGTCTGGAACTCCTGACCTCAGGTGATCTGCCTGCTTCAACCTCCCAAAGTGCTGGGATTACAGACGTGGGCCACCGCGCCCGGCTGGGAGTTAAGAGGTTTCTAATGCATTGCATTAGAATACCAGACACGGGACAGCTGTGATCTTTATTCTCCATCACCCCACACAGCCCTGCCTGGGGCACACAAGGACACTCAATACACGCTTTTCGGGCGCGGTGGCTCAAGCTGTAATCCCAGCACTTTGGGAGGCTGAGGCGGGTGGTACATGAGGTCAGGAGATCGAGACCATCCTGGCTAACATGGTGAAACCCCGTCTCTACTAAAAATACAAAAAACTAGCCCGGGCGTGGTGGCGGGCGCCTGTAGTCCCAGCTACTCGGAGGCTGAGGCAGGAGAATGGCGTGAACCTGGGAGGCGGAGCTTGCAGTGAGCCGAGATCGCGCCACTGCACTCCAGCCTGGGTGACACAGCGCGAGACTCCGTCTCAAAAAAAAAAAAAAAAAAAAAAAAAAAAAATACACGCTTTTCCGCTAGGCACGGTGGCTCACCCCTGTAATCCCAGCATTTTGGGAGGCCAAGGTGGGAGGATCACTTGAGCCCAGGAGTTCAACACCAGACTCAGCAACATAGTGAGACTCTCTCTACTAAAAATACAAAAATTAGCCAGGCCTGGTGCCACACACCTGTGGTCCCAGCTACTCAGAAGGCTAAGGCAGGAGGATCGCTTAAGCCCAGAAGGTCAAGGTTGCAGTGAACCACGTTCAGGCCACTGCAGTCCAGCCTGGGTGACAGAGCAAGACCCTGTCTGTAAATAAATAACGCTTTTCAAGTGATTAAACAGACTCCCCCCTCACCCTGCCCACCATGGCTCCAAAGCAGCATTTGTGGAGCACCTTCTGTGTGCCCCTAGGTACTAGCTGCCTGGACGGGGTCAGAAGGAACCTGAACCACCTTCAACTTGTTCCACACAGGATGCCAGGCCAAGGTGGAGCAACCGGTGGAGCCAGAGACAGAACCCGACGTTCGCCAGCAGGCTGAGTGGCAGAGCGGCCAGCCCTGGGAGCTGGCACTGGGTCGCTTTTGGGATTACCTGCGCTGGGTGCAGACACTGTCTGAGCAGGTGCAGGAGGAGCTGCTCAGCCCCCAGGTCACCCAGGAACTGACGTGAGTGTCCCCATCCCGGCCCTTGACCCTCCTGGTGGGCGGCTATACCTCCCCAGGTCCAGGTTTCATTCTGCCCCTGCCACTAAGTCTTGGGGGCCTGGGTCTCTGCTGGTTCTAGCTTCCTCTTCCCATTTCTGACTCCTGGCTTTAGCTCTCTGGAATTCTCTCTCTCAGTTCTGTTTCTCCCTCTTCCCTTCTGACTCAGCCTGTCACACTCGTCCTGGCGCTGTCTCTGTCCTTCACTAGCTCTTTTATATAGAGACAGAGAGATGGGGTCTCACTGTGTTGCCCAGGCTGGTCTTGAACTTCTGGGCTCAAGCGATCCTCCCACCTCGCCTCCCAAAGTGCTGGGAATAGAGACATGAGCCACCTTGCTCGGCCTCCTAGCTCTTTCTTCGTCTCTGCCTCTGCTCTCTGCGTCTGTCTTTGTCTCCTCTCTGCCTCTGTCCCGTTCCTTCTCTCTTGGTTCACTGCCCTTCTGTCTCTCCCTGTTCTCCTTAGGAGACTCTCCTCTCTTCCTTCTCGAGTCTCTCTGGCTGATCCCCATCTCACCCACACCTATCC

ACTCAGCCCCAGCGGAGGTGAAGGACGTCCTTCCCCAGGAGCCGGTGAGAAGCGCAGTCGGGGGCACGGGGATGAGCTCAGGGGCCTCTAGAAAGATGTAGCTGGGACCTCGGGAAGCCCTGGCCTCCAGGTAGTCTCAGGAGAGCTACTCAGGGTCGGGCTTGGGGAGAGGAGGAGCGGGGGTGAGGCCAGCAGCAGGGGACTGGACCTGGGAAGGGCTGGGCAGCAGAGACGACCCGACCCGCTAGAAGGTGGGGTGGGGAGAGCATGTGGACTAGGAGCTAAGCCACAGCAGGACCCCCACGAGTTGTCACTGTCATTTATCGAGCACCTACTGGGTGTCCCCAGTGTCCTCAGATCTCCATAACTGGGAAGCCAGGGGCAGCGACACGGTAGCTAGCCGTCGATTGGAGAACTTTAAAATGAGGACTGAATTAGCTCATAAATGGAAAACGGCGCTTAAATGTGAGGTTAGAGCTTAGAATGTGAAGGGAGAATGAGGAATGCGAGACTGGGACTGAGATGGAACCGGCGGTGGGGAGGGGGAGGGGGTGTGGAATTTGAACCCCGGGAGAGAAAGATGGAATTTTGGCTATGGAGGCCGACCTGGGGATGGGGAAATAAGAGAAGACCAGGAGGGAGTTAAATAGGGAATGGGTTGGGGGCGGCTTGGTAACTGTTTGTGCTGGGATTAGGCTGTTGCAGATAATGGAGCAAGGCTTGGAAGGCTAACCTGGGGTGGGGCCGGGTTGGGGTCGGGCTGGGGGCGGGAGGAGTCCTCACTGGCGGTTGATTGACAGTTTCTCCTTCCCCAGACTGGCCAATCACAGGCAGGAAGATGAAGGTTCTGTGGGCTGCGTTGCTGGTCACATTCCTGGCAGGTATGGGGCGGGGCTTGCTCGGTTTTCCCCGCTTCTCCCCCTCTCATCCTCACCTCAACCTCCTGGCCCCATTCAAGCACACCCTGGGCCCCCTCTTCTTCTGCTGGTCTGTCCCCTGAGGGGAAAGCCCAGGTCTGAGGCTTCTATGCTGCTTTCTGGCTCAGAACAGCGATTTGACGCTCTGTGAGCCTCGGTTCCTCCCCCGCTTTTTTTTTTTCAGCCAGAGTCTCACTCTGTCGCCCAGGCTGGAGTGCAGTGGCGCAATCTCAGCTCACTGCAAGCTCCGCCTCCCGGGTTCACGCTATTCTCCCGCCTCAGCCTCCCGAGTAGCTGGGACTACAGGCGCCCGCCACCATGCCCGGCTAATTTTTTGTACTTTGAGTAGGGAAGGGGTTTCACTGTATTATCCAGGATGGTCTCTATCTCCTGACCTCGTGATCTGCCCGCCTGGCCTCCCAAAGTGCTGGAATTACAGGCGTGAGCCTCCGCGCCCGGCCTCCCCATCCTTAATATAGGAGTTAGAAGTTTTTGTTTGTTTGTTTTGTTTTGTTTTTGTTTTGTTTTGAGATGAAGTCCCTCTGTCGCCCAGGCTGGAGTGCAGTGGCTCCCAGGCTGGAGTTCAGTGGCTGGATCTCGGCTCACTGCAAGCTCCGCCTCCCAGGTTCACGCCATTCTCCTGCCTCAGCCTCCGGAGTAGCTGGGACTACAGGAACATGCCACCACACCCGACTAACTTTTTTTGTATTTTTAGTAGAGACGGGGTTTCACCATGTTGGCCAGGCTGGTCTGGAACTCCTGACCTCAGGTGATCTGCCTGCTTCAACCTCCCAAAGTGCTGGGATTACAGACGTGGGCCACCGCGCCCGGCTGGGAGTTAAGAGGTTTCTAATGCATTGCATTAGAATACCAGACACGGGACAGCTGTGATCTTTATTCTCCATCACCCCACACAGCCCTGCCTGGGGCACACAAGGACACTCAATACACGCTTTTCGGGCGCGGTGGCTCAAGCTGTAATCCCAGCACTTTGGGAGGCTGAGGCGGGTGGTACATGAGGTCAGGAGATCGAGACCATCCTGGCTAACATGGTGAAACCCCGTCTCTACTAAAAATACAAAAAACTAGCCCGGGCGTGGTGGCGGGCGCCTGTAGTCCCAGCTACTCGGAGGCTGAGGCAGGAGAATGGCGTGAACCTGGGAGGCGGAGCTTGCAGTGAGCCGAGATCGCGCCACTGCACTCCAGCCTGGGTGACACAGCGCGAGACTCCGTCTCAAAAAAAAAAAAAAAAAAAAAAAAAAAAAATACACGCTTTTCCGCTAGGCACGGTGGCTCACCCCTGTAATCCCAGCATTTTGGGAGGCCAAGGTGGGAGGATCACTTGAGCCCAGGAGTTCAACACCAGACTCAGCAACATAGTGAGACTCTCTCTACTAAAAATACAAAAATTAGCCAGGCCTGGTGCCACACACCTGTGGTCCCAGCTACTCAGAAGGCTAAGGCAGGAGGATCGCTTAAGCCCAGAAGGTCAAGGTTGCAGTGAACCACGTTCAGGCCACTGCAGTCCAGCCTGGGTGACAGAGCAAGACCCTGTCTGTAAATAAATAACGCTTTTCAAGTGATTAAACAGACTCCCCCCTCACCCTGCCCACCATGGCTCCAAAGCAGCATTTGTGGAGCACCTTCTGTGTGCCCCTAGGTACTAGCTGCCTGGACGGGGTCAGAAGGAACCTGAACCACCTTCAACTTGTTCCACACAGGATGCCAGGCCAAGGTGGAGCAACCGGTGGAGCCAGAGACAGAACCCGACGTTCGCCAGCAGGCTGAGTGGCAGAGCGGCCAGCCCTGGGAGCTGGCACTGGGTCGCTTTTGGGATTACCTGCGCTGGGTGCAGACACTGTCTGAGCAGGTGCAGGAGGAGCTGCTCAGCCCCCAGGTCACCCAGGAACTGACGTGAGTGTCCCCATCCCGGCCCTTGACCCTCCTGGTGGGCGGCTATACCTCCCCAGGTCCAGGTTTCATTCTGCCCCTGCCACTAAGTCTTGGGGGCCTGGGTCTCTGCTGGTTCTAGCTTCCTCTTCCCATTTCTGACTCCTGGCTTTAGCTCTCTGGAATTCTCTCTCTCAGTTCTGTTTCTCCCTCTTCCCTTCTGACTCAGCCTGTCACACTCGTCCTGGCGCTGTCTCTGTCCTTCACTAGCTCTTTTATATAGAGACAGAGAGATGGGGTCTCACTGTGTTGCCCAGGCTGGTCTTGAACTTCTGGGCTCAAGCGATCCTCCCACCTCGCCTCCCAAAGTGCTGGGAATAGAGACATGAGCCACCTTGCTCGGCCTCCTAGCTCTTTCTTCGTCTCTGCCTCTGCTCTCTGCGTCTGTCTTTGTCTCCTCTCTGCCTCTGTCCCGTTCCTTCTCTCTTGGTTCACTGCCCTTCTGTCTCTCCCTGTTCTCCTTAGGAGACTCTCCTCTCTTCCTTCTCGAGTCTCTCTGGCTGATCCCCATCTCACCCACACCTATCC

http://www.ict-science-to-society.org/

Principis dels anys 60:

desxiframent del codi genètic

GAGTTTTATCGCTTCCATGACGCAGAAGTTAACACTTTCGGATATTTCTGATGAGTCGAAAAATTATCTTGATAAAGCAGGAATTACTACTGCTTGTTTACGAATTAAATCGAAGTGGACTGCTGGCGGAAAATGAGAAAATTCGACCTATCCTTGCGCAGCTCGAGAAGCTCTTACTTTGCGACCTTTCGCCATCAACTAACGATTCTGTCAAAAACTGACGCGTTGGATGAGGAGAAGTGGCTTAATATGCTTGGCACGTTCGTCAAGGACTGGTTTAGATATGAGTCACATTTTGTTCATGGTAGAGATTCTCTTGT

MALWTRLRPLLALLALWPPPPARAFVNQHLCGSHLVEALYLVCGERGFFYTPKARREVEGPQVGALELAGGPGAGGLEGPPQKRGIVEQCCASVCSLYQLENYCN

Trobar una sequencia dins d’una altra

ATCGTCCCGTC

GTC

4ATCGTCCCGTC

4 9ATCGTCCCGTC

BEGIN{ LongSeq="ATCGTCCCGTC"; ShortSeq="GTC"; lenL = length (LongSeq); lenS = length (ShortSeq);

for (i=1;i<=(lenL-LenS+1);i++) {nm=0; for (j=1;j<=lenS;j++) if (LongSeq[i+j-1] == ShortSeq[j])

nm++;if (nm==lenS) print i;

}}

Quantes operacions calen per dur a terme aquest tasca?

• Complexitat LenS x LenL• Velocitat de l’ordinador mes gran del mon

– 34 PetaFlops (3 x 1016 operacions/segon)

• Nombre d’operacions necessaries per comparar una sequencia de 100bp amb tot el genoma huma:– 100 x 3 x 109 = 3 x 1011

• En un segon, podem fer 100,000 comparacions com aquesta.

• Un simple experiment amb NGS genera 200 milions de sequencies de 100bp. El temps requerit per comparar-les al genoma huma es– 2 x 108 x 3 x 1011 = 6 x 1019

– Es a dir, triguem 2,000 segons. Uns 30 minuts.

The ENCODE RNASeq

Carrie Davis, CSHL

Quantes operacions calen per dur a terme aquest tasca?

• En el projecte ENCODE hem generat uns 30,000 milions de reads nomes per l’RNA. – 3 x 1010 x 3 x 1011 = 1022

– Es a dir, triguem 3 milions de segons. Unes 830 hores. Un mes sencer

Quantes operacions calen per dur a terme aquest tasca?

• En el projecte ENCODE hem generat uns 30,000 milions de reads nomes per l’RNA. – 3 x 1010 x 3 x 1011 = 1022

– Es a dir, triguem 3 milions de segons. Unes 830 hores. Un mes sencer

• A finals d’any el projecte GTEx haura generat 8,000 mostres de RNASeq de 50 milions de reads– 4 x 1011 x 3 x 1011 = 1023

– Es a dir un any sencer

• El Mare Nostrum te 64TeraFlops. Podeu calcular quan trigaria a processar les dates del GTEx?

Metodes d’indexacio/compressio

CGATGCTATGAAACAGTAACCG 1GA 2,9AT 3,7TG 4,8GC 5TA 6,16AA 10,11,17AC 12,18CA 13AG 14GT 15