
Page 1: Introduction to Bioinformatics Tuesday, 29 January

Can we push the due date to Friday?

X

Page 2

Page 3

Comments from Questionnaire

I need a textbook to teach me the basics.

Page 4

I need a textbook to teach me the basics.

Page 5

I need a textbook to teach me the basics.

Page 6

Comments from Questionnaire

I don't understand the purpose of mates.

I'm not really sure how to go about figuring out how much 1X of the genome would be. How much what? Is this referring to nucleotides or contigs?

What is the purpose of a DNA library?

What do you think is the better option for everyday sequencing use in the real world: Dideoxy or shotgun genome?

Page 7

Shotgun Sequence of a Genome

Drosophila genome (~100 million nt)

Page 8

Shotgun Sequence of a Genome

Page 9

Shotgun Sequence of a Book

Page 10

Marley was dead, to begin with. There is no doubt whatever about that. The register of his burial was signed by the clergyman, the clerk, the undertaker, and the chief mourner. Scrooge signed it. And Scrooge's name was good upon 'Change for anything he chose to put his hand to.

Old Marley was as dead as a doornail.

Shotgun Sequence of a Book

Page 12

Marley was

begin wit

dead, to be

ad, to begi

There is no doub

Marley was dead

Shotgun Sequence of a Book

…you would have to oversample a sequence for the shotgun approach to work?

Page 13

Sequencing process

Drosophila genome (~100 million nt)

. . .

Suppose it is broken into 500 nt fragments

Page 14

Sequencing process

Drosophila genome (~100 million nt)

. . .

SAMPLE

Page 15

Sequencing process

Drosophila genome (~100 million nt)

SAMPLE

. . .

Page 16

Marley was

begin wit

dead, to be

ad, to begi

There is no doub

Marley was dead

Marley was dead, to begin with. There is no doub

Shotgun Sequence of a Book
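The fragment-merging idea on these slides can be mimicked with a small greedy overlap assembler. This is a minimal sketch, not the assembler Myers et al. describe: it drops fragments contained in longer ones, then repeatedly merges the pair with the longest suffix-prefix overlap until no overlap of at least `min_len` characters remains. The fragments are the ones shown on the slide; the function names and the `min_len = 3` cutoff are illustrative assumptions.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if shorter than min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(fragments: list[str], min_len: int = 3) -> list[str]:
    """Greedily merge fragments into contigs by their longest overlaps."""
    unique = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    # A fragment lying entirely inside a longer one adds no new letters.
    contigs = [f for f in unique if not any(f != g and f in g for g in unique)]
    while True:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs  # no overlaps left: each remaining string is a contig
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for n, c in enumerate(contigs) if n not in (best_i, best_j)]
        contigs.append(merged)


fragments = ["Marley was", "begin wit", "dead, to be",
             "ad, to begi", "There is no doub", "Marley was dead"]
print(greedy_assemble(fragments))
print(greedy_assemble(fragments + ["with. Ther"]))
```

With the slide's six fragments this leaves two contigs, "Marley was dead, to begin wit" and "There is no doub". Adding the gap-spanning snippet "with. Ther" (it appears on a later slide) joins them into "Marley was dead, to begin with. There is no doub".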

Page 18

Marley was

begin wit

dead, to be

ad, to begi

There is no doub

Marley was dead

Marley was dead, to begin with.

Shotgun Sequence of a Book

There is no doub

Contig #47

Contig #29

How to connect contigs? How to get the snippets?

Page 19

Dideoxy sequencing

CGACCATCGCCTTAGTAC

Page 20

DNA replication

Page 21

DNA replication

Page 22

DNA replication

Page 23

DNA replication

Page 24

Dideoxy sequencing

Page 25

Dideoxy sequencing

Page 26

Dideoxy sequencing

Page 27

Dideoxy sequencing

Page 28

Dideoxy sequencing

Page 29

Dideoxy sequencing

Page 30

Dideoxy sequencing
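The chain-termination logic behind these slides can be sketched in code. Assuming, for illustration, that the slide's sequence CGACCATCGCCTTAGTAC is the strand being synthesized from the primer, and that every position is terminated by a ddNTP in at least some molecules, each lane of the gel contains one fragment length for each occurrence of that lane's base. Sorting all bands by length, shortest first, reads the synthesized strand back 5' to 3'. The function names are hypothetical.

```python
def termination_lanes(synthesized: str) -> dict[str, list[int]]:
    """Fragment lengths (nt added after the primer) found in each ddNTP lane."""
    lanes: dict[str, list[int]] = {base: [] for base in "GATC"}
    for length, base in enumerate(synthesized, start=1):
        # A ddNTP incorporated at this position stops extension here,
        # so a fragment of this length ends with this base.
        lanes[base].append(length)
    return lanes


def read_lanes(lanes: dict[str, list[int]]) -> str:
    """Shortest fragments run farthest; reading bands shortest-first gives 5' to 3'."""
    bands = sorted((length, base) for base, lengths in lanes.items() for length in lengths)
    return "".join(base for _, base in bands)


lanes = termination_lanes("CGACCATCGCCTTAGTAC")
for base, lengths in lanes.items():
    print(f"dd{base} lane: {lengths}")
print(read_lanes(lanes))
```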

Page 31

What is the sequence (5' to 3') represented by the gel?

G A T C

Myers et al SQ2

Page 32

What is the sequence (5' to 3') represented by the gel?

G A T C

(Gel figure: five bands labeled ddC)

TCGTGTACATCGTAACACGGTTAAGT

Myers et al SQ2
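Reading a gel is the reverse step: list the bands from the bottom (shortest fragment) to the top, note which lane each band is in, and that spells the newly synthesized strand 5' to 3'. Because that strand is built by pairing with the template, the template written 5' to 3' is its reverse complement. The band positions below are hypothetical, not taken from the SQ2 gel, and the function names are illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def read_gel(bands: dict[str, list[int]]) -> str:
    """bands maps a lane (G, A, T, C) to band positions counted from the bottom
    of the gel (1 = lowest band = shortest fragment). Returns the synthesized
    strand, 5' to 3'."""
    ordered = sorted((rank, lane) for lane, ranks in bands.items() for rank in ranks)
    return "".join(lane for _, lane in ordered)


def template_strand(synthesized: str) -> str:
    """Reverse complement of the synthesized strand: the template, 5' to 3'."""
    return synthesized.translate(COMPLEMENT)[::-1]


bands = {"G": [3, 6], "A": [2], "T": [1, 5], "C": [4]}  # made-up gel
new_strand = read_gel(bands)
print(new_strand, template_strand(new_strand))
```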

Page 33

Dideoxy sequencing

Page 34

Study Question 4: What is high-quality sequence?

G A T C

Page 35

Study Question 4: What is high-quality sequence?

To determine high-quality sequences,… how do you know when a peak stops being high enough?

Could you explain in more detail the fluorescence chart with the waves?

Page 36

Study Question 4: What is high-quality sequence?

Page 37

Dideoxy sequencing: How sure are you?

To determine high-quality sequences,… how do you know when a peak stops being high enough?
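One widely used way to put a number on "how sure are you?" is the Phred quality score, Q = -10·log10(p), where p is the estimated probability that a base call is wrong: Q20 means a 1-in-100 chance of error, Q30 a 1-in-1000 chance. The slides do not name Phred; it is brought in here as a standard convention. The trimming rule (keep the longest run of calls at or above Q20) is a simplified, hypothetical stand-in for cutting a read down to its high-quality part, and the quality values are made up.

```python
import math


def phred(p_error: float) -> float:
    """Phred quality score for an error probability p_error (0 < p_error <= 1)."""
    return -10 * math.log10(p_error)


def error_probability(q: float) -> float:
    """Inverse of phred(): probability that a call with quality q is wrong."""
    return 10 ** (-q / 10)


def high_quality_window(quals: list[int], threshold: int = 20) -> tuple[int, int]:
    """(start, end) of the longest run of calls with quality >= threshold; end is exclusive."""
    best = (0, 0)
    start = None
    for i, q in enumerate(quals + [threshold - 1]):  # sentinel closes a trailing run
        if q >= threshold and start is None:
            start = i
        elif q < threshold and start is not None:
            if i - start > best[1] - best[0]:
                best = (start, i)
            start = None
    return best


quals = [5, 12, 25, 30, 35, 28, 22, 15, 30, 8]  # made-up per-base qualities
print(phred(0.01), error_probability(30))
print(high_quality_window(quals))
```

Low-quality calls tend to cluster at the start and end of a read, which is why a single contiguous window is a reasonable first approximation here.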

Page 38

For SQ3, I have been unable to identify the organism/molecule by using BioBIKE. I have tried the function SEQUENCES-SIMILAR-TO.

Page 39

Page 40

Page 41

Page 42

DNA replication

Page 43

DNA replication

Primer

How to provide a primer to an unknown sequence?

Page 44

G A T C

primer

primer

plasmid

insert

~2000 nt mates

Myers et al SQ6

Why read pairs? Scaffolds?

What is the purpose of a DNA library?

Page 45

primer

plasmid

insert (~40 letters)

Mate pairs

How to connect contigs? Why read pairs? Scaffolds?

dead, to begin with. There is no doub
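Mates are the two reads sequenced from opposite ends of the same insert, whose length is roughly known from the library (~2000 nt on the previous slide). If one mate lands near the right end of one contig and its partner near the left end of another, the pair says which contig comes next and roughly how large the gap between them is. A minimal sketch, assuming both contigs are already in the same orientation and the library's insert length is exact; `MateLink` and its fields are hypothetical names, and the numbers are invented.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class MateLink:
    contig_a_len: int  # length of the contig holding the left mate
    a_start: int       # 0-based start of the left mate's read in contig A
    b_end: int         # 0-based, exclusive end of the right mate's read in contig B
    insert_len: int    # expected insert length for the library


def estimate_gap(link: MateLink) -> int:
    """Insert length minus the parts of the insert already lying inside the two contigs.
    A negative result would suggest the contigs actually overlap."""
    in_a = link.contig_a_len - link.a_start  # from the left read's start to the end of A
    in_b = link.b_end                        # from the start of B to the right read's end
    return link.insert_len - in_a - in_b


links = [MateLink(5000, 4200, 700, 2000),
         MateLink(5000, 4500, 1000, 2000),
         MateLink(5000, 3900, 300, 2000)]
gaps = [estimate_gap(link) for link in links]
print(gaps, median(gaps))
```

Real insert lengths vary around the library's nominal size, so combining several links (here with a median) gives a steadier estimate than any single pair.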

Page 46

. . .

~150,000 nt

Bacterial Artificial Chromosome (BAC)

mates

Myers et al SQ6

Why read pairs? Scaffolds?

What is a BAC used for again?

I don't understand the purpose of mates.

Marley was dead. God bless us every one.

Page 47

Marley was

begin wit

dead, to be

ad, to begi

There is no doub

Marley was dead

Marley was dead, to begin with.

Shotgun Sequence of a Book

There is no doub

Contig #47

Contig #29

How to connect contigs?

Page 48

Marley was

begin wit

dead, to be

ad, to begi

There is no doub

Marley was dead

Marley was dead, to begin with. There is no doub

Shotgun Sequence of a Book

How are gaps between assembled contigs "closed experimentally"?

Page 49

Marley was dead, to begin with. There is no doubt whatever about that. The register of his burial was signed by the clergyman, the clerk, the undertaker, and the chief mourner. Scrooge signed it. And Scrooge's name was good upon 'Change for anything he chose to put his hand to.

Old Marley was as dead as a doornail.

Shotgun Sequence of a Book

Polymerase Chain Reaction (PCR)

Requires known primer sequences, one on each of the two strands.

with. Ther
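One way to read this slide: once the ends of two neighboring contigs are known, primers can be designed from those ends, and PCR copies the stretch between them (the kind of piece shown as "with. Ther"). The sketch below finds what a primer pair would amplify in a DNA string. The forward primer matches the top strand as written; the reverse primer pairs with the bottom strand, so its reverse complement must appear further along the top strand. The template and primers are invented and the function name is hypothetical; real primer design also weighs melting temperature, mismatches and other factors this ignores.

```python
from typing import Optional

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def in_silico_pcr(template: str, fwd: str, rev: str) -> Optional[str]:
    """Product amplified from the top strand of `template`, or None if a primer has no site.

    The forward primer has the same sequence as the top strand; the reverse primer
    pairs with the bottom strand, so its reverse complement appears on the top strand.
    """
    start = template.find(fwd)
    if start == -1:
        return None
    rev_site = reverse_complement(rev)
    end = template.find(rev_site, start + len(fwd))
    if end == -1:
        return None
    return template[start:end + len(rev_site)]


template = "AAGACCGATAGGCTTAGCATCGTTGCAAA"  # made-up top strand
print(in_silico_pcr(template, "GACCGA", "CAACGA"))
```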

Page 50

Sequencing vs Assembly

Dideoxy sequencing

G A T C

GGGATATGTCAGACGGTA

AATACAAGAACCCAAGCACCCAATTAA

GTCCGATAGGCTCTTGTCG

TCTGGAAGCATTTAACCG

TAATTCTCTTTGTTATGGTGTCTGACC

TGCAGCGTCAGCGAAA

TAAATTCTGCTAGTGTCCGGTTTGC

CGGATACGCGCGAGAACTGACGACAACTCAGCGA

Sequence assembly

Contig 1 Contig 2

Finishing

Page 51

What do you think is the better option for everyday sequencing use in the real world: Dideoxy or shotgun genome?

Page 52

SQ10. Why not 1X? How much of the sequence would thereby be determined?

Drosophila genome (~100 million nt)

SAMPLE

. . .

How many 500 nt samples are needed to cover 100 million nt?

100,000,000 nt / 500 nt = 200,000 samples

I'm not really sure how to go about figuring out how much 1X of the genome would be. How much what? Is this referring to nucleotides or contigs?
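"1X" counts nucleotides, not contigs: it means the reads add up to as many nucleotides as the genome contains. For a genome of G nt sampled with reads of L nt, c-fold oversampling needs N = c·G/L reads. A small sketch of that arithmetic; the function names are illustrative.

```python
import math


def reads_needed(genome_len: int, read_len: int, coverage: float) -> int:
    """Reads whose combined length equals `coverage` times the genome length (rounded up)."""
    return math.ceil(coverage * genome_len / read_len)


def oversampling(n_reads: int, read_len: int, genome_len: int) -> float:
    """Coverage (the "X"): total sequenced nt divided by genome nt."""
    return n_reads * read_len / genome_len


print(reads_needed(100_000_000, 500, 1))  # 1X of the slide's ~100 million nt genome
print(reads_needed(100_000_000, 500, 8))
print(oversampling(200_000, 500, 100_000_000))
```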

Page 53

Paint the wall

How long will this take?

SQ10. Why not 1X? How much of the sequence would thereby be determined?

Page 54

Paint the wall

How long will this take?

SQ10. Why not 1X? How much of the sequence would thereby be determined?

Page 55

Paint the wall

How long will this take?

40 "

25 "

1 sq "

SQ10. Why not 1X? How much of the sequence would thereby be determined?

Page 56

Paint the wall

How long will this take?

40 "

25 "

1000paint balls?

SQ10. Why not 1X? How much of the sequence would thereby be determined?
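The paint-ball analogy can be simulated directly. Treat the 40" × 25" wall as 1000 one-square-inch cells and assume, as a simplification, that each paint ball lands on exactly one cell chosen at random. Throwing 1000 balls is then 1X oversampling, yet some cells are hit more than once while others are never hit. A sketch; the helper names are hypothetical.

```python
import random


def fraction_painted(width: int, height: int, n_balls: int, seed: int = 0) -> float:
    """Throw n_balls at random 1-square-inch cells; return the fraction of cells hit."""
    rng = random.Random(seed)
    cells = width * height
    painted = {rng.randrange(cells) for _ in range(n_balls)}
    return len(painted) / cells


def expected_fraction(cells: int, n_balls: int) -> float:
    """Exact expectation: a given cell is missed by every ball with prob (1 - 1/cells)**n_balls."""
    return 1 - (1 - 1 / cells) ** n_balls


for balls in (1000, 2000, 5000):
    print(balls, fraction_painted(40, 25, balls), round(expected_fraction(1000, balls), 3))
```

With 1000 balls only about 63% of the wall is expected to be painted, which is one way to see why 1X is not enough.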

Page 57

Paint the wall

How long will this take?

40 "

25 "

1 sq "

SQ10. Why not 1X? How much of the sequence would thereby be determined?

Do more clones generate a more accurate genome assembly?

Page 58

(Plot: Completeness, 0 to 1, vs. Oversampling, 0 to 10)

How much is painted with 1X oversampling?

SQ10. Why not 1X? How much of the sequence would thereby be determined?
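A curve like the one on this slide has a standard closed form. If reads (or paint balls) land independently at random and the average oversampling is c, the number of times a given position is covered is approximately Poisson with mean c, so the expected fraction covered at least once is 1 - e^(-c). This is the idea behind the Lander-Waterman model; the slide does not name it, so treat the formula as background rather than the lecture's own derivation. The genome size is the slides' rough ~100 million nt.

```python
import math


def fraction_covered(c: float) -> float:
    """Expected fraction of positions covered at least once at mean oversampling c."""
    return 1 - math.exp(-c)


genome_nt = 100_000_000  # the slides' rough Drosophila figure
for c in range(11):
    missed = genome_nt * math.exp(-c)
    print(f"{c:2d}X  covered {fraction_covered(c):.4f}  expected uncovered nt {missed:,.0f}")
```

At 1X about 63% is covered; even at 10X roughly 4,500 positions of a 100-million-nt genome are expected to remain uncovered, and in practice they fall into gaps between contigs.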

Page 59

What are gapped palindromes? ...how we can design an algorithm to do this in normal English

Comments from Questionnaire

GATATCA palindrome?

Page 60

What are gapped palindromes? ...how we can design an algorithm to do this in normal English

Comments from Questionnaire

GATCATCA palindrome?

Page 62

What are gapped palindromes? ...how we can design an algorithm to do this in normal English

Comments from Questionnaire

A palindrome?
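In DNA, "palindrome" usually means a sequence that equals its own reverse complement: reading the other strand in its own 5' to 3' direction gives the same letters (GATATC is an example). A gapped palindrome has two such arms separated by a spacer that does not have to match. In plain English, one way to find them: for every starting position and every allowed spacer length, take an arm of the chosen length, take the equally long stretch after the spacer, and report a hit if one is the reverse complement of the other. The sketch does exactly that; arm length and maximum gap are parameters you choose, and Problem Set 2's exact definition may differ.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def is_palindrome(seq: str) -> bool:
    """True if seq equals its own reverse complement (e.g. GATATC)."""
    return seq == reverse_complement(seq)


def gapped_palindromes(seq: str, arm_len: int, max_gap: int) -> list[tuple[int, int]]:
    """(start, gap) pairs where the arm at `start` and the arm after `gap` spacer
    bases are reverse complements of each other. gap = 0 is an ordinary palindrome."""
    hits = []
    for gap in range(max_gap + 1):
        span = 2 * arm_len + gap
        for start in range(len(seq) - span + 1):
            left = seq[start:start + arm_len]
            right = seq[start + arm_len + gap:start + span]
            if left == reverse_complement(right):
                hits.append((start, gap))
    return hits


for s in ("GATATC", "GATATCA", "GATCATCA"):
    print(s, is_palindrome(s), gapped_palindromes(s, 3, 2))
```

For GATCATCA with 3-letter arms and gaps up to 2, the only hit is GAT, a 1-letter spacer C, then ATC.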

Page 63

Problem Set 1

Page 65

Page 66

2: Firras, Lawangin, Sonia

1: Farah, Kristen, Sandrine, Sue

3: Kavya, Mandi, Supriya

5: Franklin, Sonia, Trevor

4: Grace, Jonathan, Keith, Yordanos

7: Bobby, Kathy, Moshraf, Tayab

6: Abdallah, Celeste, Tori

Me

FRONT

Problem Set 2, #3g: Palindrome

SQ10: Coverage

Problem Set 3, #1: Assembly

Page 67

Page 68

SQ14. From figures given in the text and in Table 1, check the accuracy of each of the following statements:

a. "We produced 3.156 million reads that yielded 1.76 Gbp of sequence. . ."

b. ". . .trillions of overlaps between reads are examined."

c. ". . .to produce 654,000 of the 2-kbp mates and 497,000 of the 10-kbp mates."

Myers et al (2000)

Page 69

How to read and understand the tables and the figures they represent in reference to the sequencing.

Page 70

SQ14. From figures given in the text and in Table 1, check the accuracy of each of the following statements:

a. "We produced 3.156 million reads that yielded 1.76 Gbp of sequence. . ."

b. ". . .trillions of overlaps between reads are examined."

c. ". . .to produce 654,000 of the 2-kbp mates and 497,000 of the 10-kbp mates."

Myers et al (2000)

Is it okay if we answer the questions of the tour using the internet instead of answering them just through the article?

Page 71

SQ14. From figures given in the text and in Table 1, check the accuracy of each of the following statements:

a. "We produced 3.156 million reads that yielded 1.76 Gbp of sequence. . ."

b. ". . .trillions of overlaps between reads are examined."

c. ". . .to produce 654,000 of the 2-kbp mates and 497,000 of the 10-kbp mates."

Myers et al (2000)

I'm having trouble setting up the calculations in a way to help me solve SQ14.
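One way to set up parts a and b is to compute what each statement implies and then compare that with Table 1 (not reproduced in these slides). For a, dividing total sequence by the number of reads gives the average read length the statement implies. For b, comparing every read with every other read means n(n - 1)/2 pairs. Part c needs the per-library counts from Table 1, so it is left out. The genome size in the oversampling line is the slides' rough ~100 million nt, an assumption for this check only.

```python
# Figures quoted in SQ14a; Table 1 itself is not reproduced here.
reads = 3_156_000
total_bp = 1.76e9

# (a) Average read length implied by the statement.
avg_read_len = total_bp / reads

# (b) Comparing every read with every other read: n(n - 1)/2 unordered pairs.
pairs = reads * (reads - 1) // 2

# Extra check: oversampling implied if the genome were ~100 million nt
# (the slides' rough figure; an assumption, not a value from the paper).
genome_nt = 100_000_000
coverage = total_bp / genome_nt

print(f"(a) average read length: {avg_read_len:.0f} bp")
print(f"(b) read pairs to compare: {pairs:,} (about {pairs:.2e})")
print(f"    implied oversampling: {coverage:.1f}X")
```

The statement implies reads of about 558 bp on average and about 5 × 10^12 read pairs. Whether that supports "trillions of overlaps ... are examined" depends on reading "examined" as "pairs considered"; judging that is part of the question.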

Page 72

SQ13. Consider the data types listed in Table 1. Why is each important?

Myers et al (2000)

Page 73

Page 74

TATA boxes before genes?

Page 75

Sequencing process

Drosophila genome (~100 million nt)

. . .

Focus on one nucleotide…

What’s the probability that it’s covered by one read?

What’s the probability that it’s covered by two reads?

What’s the probability that it’s covered by 200,000 reads?
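For a single nucleotide, each of the 200,000 reads (500 nt each, ~100 million nt genome, i.e. 1X) covers it with probability about p = 500 / 100,000,000 = 5 × 10^-6, ignoring positions near the ends of the genome (an assumption). The number of reads covering it is then Binomial(200,000, p), which is very close to Poisson with mean 1. The sketch computes the binomial in log space so that extreme cases such as k = 200,000 do not overflow; the function name is hypothetical.

```python
import math


def prob_covered_by(k: int, n_reads: int, read_len: int, genome_len: int) -> float:
    """Binomial probability that exactly k of n_reads cover one chosen nucleotide."""
    p = read_len / genome_len
    log_pmf = (math.lgamma(n_reads + 1) - math.lgamma(k + 1) - math.lgamma(n_reads - k + 1)
               + k * math.log(p) + (n_reads - k) * math.log1p(-p))
    return math.exp(log_pmf)


N, L, G = 200_000, 500, 100_000_000
for k in (0, 1, 2, N):
    print(k, prob_covered_by(k, N, L, G))
```

The values for k = 0, 1 and 2 come out close to e^-1, e^-1 and e^-1/2 (about 0.368, 0.368 and 0.184). For k = 200,000 the true value is p^200,000, roughly 10^-1,060,206, far below what a float can hold, so the code prints 0.0.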

Page 76

Questometer Report

Jan 17 Jan 22 Jan 24