Page 1: Maize Production Sequencing lfulton@watson.wustl.edu

Maize Production Sequencing

lfulton@watson.wustl.edu

Page 2: Maize Production Sequencing lfulton@watson.wustl.edu

Maize Production Goals

BAC End Sequencing of 220,000 Clones

Fosmid End Sequencing of 500,000 Clones

Shotgun of 16,000 BAC Clones

Page 3: Maize Production Sequencing lfulton@watson.wustl.edu

Maize BAC End Sequences

580,000 reads processed

567 average read length

60% success

Page 4: Maize Production Sequencing lfulton@watson.wustl.edu

Maize Fosmid End Sequences

850,000 processed

79% success

543 average read length

Completed today
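
As a rough illustration of what the figures on the two end-sequence slides imply, the sketch below estimates passing reads and total bases. It assumes that "success" means the fraction of processed reads that passed, and that the average read length applies to those passing reads; neither assumption is stated on the slides, and the function name is hypothetical.

```python
# Figures copied from the BAC and fosmid end-sequence slides.
END_SEQUENCE_STATS = {
    "BAC": {"reads_processed": 580_000, "success_rate": 0.60, "avg_read_length": 567},
    "fosmid": {"reads_processed": 850_000, "success_rate": 0.79, "avg_read_length": 543},
}


def estimate_yield(reads_processed, success_rate, avg_read_length):
    """Return (passing_reads, total_bases).

    Assumption: success_rate is the fraction of processed reads that pass,
    and avg_read_length describes the passing reads only.
    """
    passing_reads = round(reads_processed * success_rate)
    total_bases = passing_reads * avg_read_length
    return passing_reads, total_bases


for library, stats in END_SEQUENCE_STATS.items():
    reads, bases = estimate_yield(**stats)
    print(f"{library}: ~{reads:,} passing reads, ~{bases / 1e6:.0f} Mb")
```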

Page 5: Maize Production Sequencing lfulton@watson.wustl.edu

Library Construction Pipeline

Receipt of sheared DNA from AGI

Size selection of insert DNA

Ligation into pSMART vector

Page 6: Maize Production Sequencing lfulton@watson.wustl.edu

Constructed 17,034 Libraries as of August 31st

[Figure: Maize clones shipped and libraries constructed. X-axis: month (Oct-05 to Aug-07). Y-axis: number of clones shipped and libraries constructed (0 to 2,800). Series: Clones Shipped, Libraries Constructed.]

Page 7: Maize Production Sequencing lfulton@watson.wustl.edu

Library Construction Pass Rate

[Figure: Libraries constructed (pass/fail). X-axis: month (Nov-05 to Aug-07). Y-axis ticks: 0 to 1,800. Series: Libraries Constructed, Libraries Failed.]

Average Fail Rate for Library Construction was less than 5%

Page 8: Maize Production Sequencing lfulton@watson.wustl.edu

Shotgun Criteria

3.5X coverage

Clone size verification

50% paired ends

BES agreement

25% of clones failed

22% need more data

3% BES disagreement
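
The shotgun criteria on this slide can be read as a per-clone check. The sketch below is one possible reading, not the production logic: the record type, field names, and status strings are hypothetical, "50% paired ends" is assumed to mean that at least half of the reads have a mate, and shortfalls other than a BES disagreement are assumed to fall under "need more data".

```python
from dataclasses import dataclass

MIN_COVERAGE = 3.5           # "3.5X coverage"
MIN_PAIRED_FRACTION = 0.50   # "50% paired ends" (interpretation assumed)


@dataclass
class CloneShotgunStats:
    """Hypothetical per-clone summary; field names are illustrative only."""
    coverage: float
    paired_end_fraction: float
    size_verified: bool
    bes_agrees: bool


def shotgun_status(stats: CloneShotgunStats) -> str:
    """Classify a clone against the four criteria listed on the slide."""
    if not stats.bes_agrees:
        return "bes_disagreement"
    if (
        stats.coverage < MIN_COVERAGE
        or stats.paired_end_fraction < MIN_PAIRED_FRACTION
        or not stats.size_verified
    ):
        # Assumption: any other shortfall is handled by collecting more data.
        return "needs_more_data"
    return "shotgun_done"


print(shotgun_status(CloneShotgunStats(4.1, 0.62, True, True)))
print(shotgun_status(CloneShotgunStats(2.8, 0.62, True, True)))
```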

Page 9: Maize Production Sequencing lfulton@watson.wustl.edu

Shotgun Complete for 12,211 Clones as of August 31st

[Figure: Maize clones shotgun completed. X-axis: month (Nov-05 to Aug-07). Y-axis: number of clones (0 to 1,800). Bar labels in extraction order: 95, 61, 113, 119, 226, 484, 459, 436, 360, 418, 357, 279, 681, 774, 577, 1052, 830, 1082, 856, 778, 882, 1197.]

Page 10: Maize Production Sequencing lfulton@watson.wustl.edu

Final Production Work

660 Clones Need Library Construction

2100 Clones In Production Pipeline

Expected Completion Date December 2007

Page 11: Maize Production Sequencing lfulton@watson.wustl.edu

Sequence Improvement

Bob Fulton

Dick McCombie

Rod Wing

Page 12: Maize Production Sequencing lfulton@watson.wustl.edu

Sequence Improvement Pipeline

• Shotgun_done triggers the prefinishing pipeline

• Initial identification of “do finish” regions

• Manual sorting and use of autoedit (Gordon) to break apart misassemblies

• Autofinish (Gordon) used to choose directed reactions for all gaps and regions of low quality in “do finish” regions

• Reassembly and 2nd iteration of prefinishing pipeline

• Final identification of “do finish” regions and handoff to finishing pipeline
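
The order of the steps above can be sketched as a simple plan builder. This is only an illustration of the sequence on the slide: the function name and step wording are hypothetical, and it assumes the second iteration repeats the same three steps after reassembly.

```python
# Steps performed within one prefinishing pass, paraphrased from the slide.
PASS_STEPS = (
    "identify 'do finish' regions",
    "sort manually and break apart misassemblies (autoedit)",
    "choose directed reactions for gaps and low-quality regions (autofinish)",
)


def prefinishing_plan(passes=2):
    """Return the ordered list of steps for a clone that reached shotgun_done.

    Assumption: every pass after the first starts with a reassembly and then
    repeats PASS_STEPS; the slide only says "2nd iteration".
    """
    plan = []
    for n in range(1, passes + 1):
        if n > 1:
            plan.append(f"pass {n}: reassemble")
        plan.extend(f"pass {n}: {step}" for step in PASS_STEPS)
    plan.append("final identification of 'do finish' regions")
    plan.append("hand off to finishing pipeline")
    return plan


for step in prefinishing_plan():
    print(step)
```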

Page 13: Maize Production Sequencing lfulton@watson.wustl.edu

Clone Improvement through the Prefinishing Pipeline

[Figure: clones per contig-count bin, before and after prefinishing. X-axis bins: 1-5, 6-10, 11-15, 16-20, 21-25, 26-30, 31-35, 35-40, 40+ contigs. Y-axis ticks: 0 to 800. Series: before prefinish, after prefinish.]

Page 14: Maize Production Sequencing lfulton@watson.wustl.edu
Page 15: Maize Production Sequencing lfulton@watson.wustl.edu

Assembly View-Entire Clone

[Figure labels: End; Spanning Plasmids; Coverage (green)]

Page 16: Maize Production Sequencing lfulton@watson.wustl.edu

Assembly View-Do Finish Region

[Figure labels: Repeat Tags; Do Finish; GSS sequence; EST sequence]

Page 17: Maize Production Sequencing lfulton@watson.wustl.edu

Alignment with cDNA read pairs

Alignment with End Sequences

Page 18: Maize Production Sequencing lfulton@watson.wustl.edu

Pipeline stats across time

[Figure: Pipeline stats across time. X-axis: weeks (12 to 97). Y-axis: number of clones (0 to 16,000). Series: library_done, shotgun_done, prefin_done, finished.]

Page 19: Maize Production Sequencing lfulton@watson.wustl.edu

Actual Projected

                  Sep-07 Oct-07 Nov-07 Dec-07 Jan-08 Feb-08 Mar-08 Apr-08 May-08 Jun-08 Jul-08 Aug-08 Sep-08 Oct-08
Group                 17     18     19     20     21     22     23     24     25     26     27     28     29     30
GSC                  220    250    300    350    400    400    400    400    400    400    400    400    300     71
GSC Cumulative      2029   2279   2579   2929   3329   3729   4129   4529   4929   5329   5729   6129   6429   6500
AGI                  160    160    160    160    160    160    160    160    160      0      0      0      0      0
AGI Cumulative      1720   1880   2040   2200   2360   2520   2680   2840   3000   3000   3000   3000   3000   3000
CSHL                 300    325    350    375    400    400    400    400    400    400    400    400    250    243
CSHL Cumulative     1757   2082   2432   2807   3207   3607   4007   4407   4807   5207   5607   6007   6257   6500
Total                680    735    810    885    960    960    960    960    960    800    800    800    550    314
Total Cumulative    5506   6241   7051   7936   8896   9856  10816  11776  12736  13536  14336  15136  15686  16000
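
The cumulative rows in this projection follow directly from the monthly rows, which the short sketch below reproduces. The monthly counts and the Sep-07 cumulative values are copied from the slide; the variable and function names are illustrative, and the per-center end targets it prints are simply the last values of each running total.

```python
from itertools import accumulate

# Monthly clone counts per center, Sep-07 through Oct-08, as on the slide.
MONTHLY = {
    "GSC": [220, 250, 300, 350, 400, 400, 400, 400, 400, 400, 400, 400, 300, 71],
    "AGI": [160] * 9 + [0] * 5,
    "CSHL": [300, 325, 350, 375, 400, 400, 400, 400, 400, 400, 400, 400, 250, 243],
}

# Cumulative counts in the first column (Sep-07), also from the slide.
SEP07_CUMULATIVE = {"GSC": 2029, "AGI": 1720, "CSHL": 1757}


def cumulative_totals(monthly, first_cumulative):
    """Rebuild a cumulative row from monthly counts and its first value."""
    baseline = first_cumulative - monthly[0]  # total before Sep-07
    return list(accumulate(monthly, initial=baseline))[1:]


per_center = {
    center: cumulative_totals(counts, SEP07_CUMULATIVE[center])
    for center, counts in MONTHLY.items()
}
combined = [sum(values) for values in zip(*per_center.values())]

for center, totals in per_center.items():
    print(center, totals[-1])
print("Total", combined[-1])
```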

Page 20: Maize Production Sequencing lfulton@watson.wustl.edu

Maize GenBank Submissions

Joanne Nelson

Page 21: Maize Production Sequencing lfulton@watson.wustl.edu

Submission Landmarks

HTGS_FULLTOP

HTGS_PREFIN

HTGS_ACTIVEFIN

HTGS_IMPROVED

Page 22: Maize Production Sequencing lfulton@watson.wustl.edu

Improved Sequence

“Non-repetitive portions of the sequence have had sequence improvement (directed attempts) and have been labeled as ‘improved.’ Improved regions are double stranded, sequenced with an alternate chemistry or covered by high quality data (i.e. phred quality greater than or equal to 30 or approval by an experienced finisher), unless otherwise noted. Regions of low sequence complexity (such as dinucleotide repeats and small unit tandem repeats) in the improved regions have not been resolved to previously established finishing standards. BAC end sequence, cot and methyl filtered genome survey sequence and data from overlapping projects of strain B73 may have been included in this project. Where possible, contigs have been ordered and oriented based on read pairing. These regions are designated as scaffolds. Additional order and orientation will be provided upon completion of detailed analysis of the complete finished tiling path.”

Page 23: Maize Production Sequencing lfulton@watson.wustl.edu

Improved Sequence

FEATURES             Location/Qualifiers
     source          1..184604
                     /organism="Zea mays"
                     /mol_type="genomic DNA"
                     /db_xref="taxon:4577"
                     /chromosome="1"
                     /clone="CH201-132J17; ZMMBBc0132J17"
     misc_feature    1..69252
                     /note="scaffold_name:Scaffold1"
     misc_feature    1..34245
                     /note="assembly_name:Contig28 vector_side:SP6"
     misc_feature    32401..34245
                     /note="Improved sequence."
     unsure          34230..34245
                     /note="Non-repetitive but unresolved region"
     gap             34246..34345
                     /estimated_length=unknown
     misc_feature    34346..68071
                     /note="assembly_name:Contig27"
     misc_feature    34346..36695
                     /note="Improved sequence."
     unsure          34346..34356
                     /note="Non-repetitive but unresolved region"
     misc_feature    38146..46795
                     /note="Improved sequence."
     gap             68072..68171
                     /estimated_length=unknown
     misc_feature    68172..69252
                     /note="assembly_name:Contig14"
     gap             69253..69352
                     /estimated_length=unknown
     misc_feature    69353..132243
                     /note="scaffold_name:Scaffold2"
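
To show how the "Improved sequence." annotations in this feature table might be used, the sketch below pulls those intervals out of a simplified excerpt and totals their length. The excerpt is flattened to one feature per line for brevity, so the regular expression is an assumption about that simplified layout, not a general GenBank parser; the function name is hypothetical.

```python
import re

# Simplified excerpt of the feature table: one feature per line.
EXCERPT = '''\
misc_feature 1..34245 /note="assembly_name:Contig28 vector_side:SP6"
misc_feature 32401..34245 /note="Improved sequence."
unsure 34230..34245 /note="Non-repetitive but unresolved region"
gap 34246..34345 /estimated_length=unknown
misc_feature 34346..68071 /note="assembly_name:Contig27"
misc_feature 34346..36695 /note="Improved sequence."
misc_feature 38146..46795 /note="Improved sequence."
'''

# Matches "<key> <start>..<end> /note="<text>""; lines without /note are skipped.
FEATURE = re.compile(r'^(\w+)\s+(\d+)\.\.(\d+)\s+/note="([^"]*)"$', re.MULTILINE)


def improved_regions(text):
    """Return (start, end) pairs for features noted as improved sequence."""
    return [
        (int(start), int(end))
        for key, start, end, note in FEATURE.findall(text)
        if key == "misc_feature" and note == "Improved sequence."
    ]


regions = improved_regions(EXCERPT)
# Coordinates are 1-based and inclusive, hence the + 1.
improved_bases = sum(end - start + 1 for start, end in regions)
print(regions, improved_bases)
```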

Page 24: Maize Production Sequencing lfulton@watson.wustl.edu

Submission Totals

HTGS_FULLTOP 3342

HTGS_PREFIN 2014

HTGS_ACTIVEFIN 4151

HTGS_IMPROVED 2660

TOTAL 12167