Sequencing the Maize (B73) Genome - Genome Sequencing Center, Maize Genome Sequencing Consortium

Slide 1
Sequencing the Maize (B73) Genome
Genome Sequencing Center
Maize Genome Sequencing Consortium

Slide 2
The Team
  • WU Genome Sequencing Center (R. Wilson, PI): Bob Fulton, Pat Minx, Sandy Clifton
  • Arizona Genomics Institute (R. Wing)
  • Cold Spring Harbor Laboratory: D. Ware, L. Stein; R. McCombie, R. Martienssen
  • Iowa State University (P. Schnable & S. Aluru)
  • The maize research community

Slide 3
The Plan

Slide 4
Progress as of 9/30/06

Slide 5

Slide 6
Agenda
9:00 - 9:15 Introductions and Project Overview (Rick Wilson)
9:15 - 10:15 Plans and Progress: WU/AGI/CSHL/ISU Project
  • Map and Tile Path Selection (Rod Wing)
  • Library Construction and Production (Lucinda Fulton)
  • Sequence Improvement (Bob Fulton, Dick McCombie, Rod Wing)
  • Data Submission (Joanne Nelson)
  • Annotation and Data Display (Doreen Ware)
  • Outreach (Rick Wilson)
10:15 - 10:30 Break
10:30 - 11:00 Plans and Progress: DOE Project (Dan Rokhsar)
11:00 - 11:30 Future Plans and Collaborations
  • Pat Schnable (by phone): retrotransposons
11:30 - Noon Executive Session
Noon - 1:00 Working Lunch and Discussion
1:00 Depart for Airport

Slide 7
BAC-by-BAC Strategy to Sequence the Maize Genome
  • Maize B73 Genome (2300 Mb)
  • BAC library construction (Hind III, EcoR I/MboI; 27X deep; 150 kb avg. insert)
  • BAC End Sequencing: ~800,000
  • Genetic Anchoring: in silico, overgo hybridization
  • Fingerprinting: ~460,000 BACs
  • STC database
  • BAC physical maps (HICF & Agarose)
  • FPC databases (Agarose and HICF)
  • Choose a seed BAC
  • Shotgun sequencing and finishing
  • STC database search, FP comparison
  • Determine minimum overlap BACs
  • Complete maize genome sequence

Slide 8
Map Summary
1. Total Assembled Contigs: 721
  • Equal to 2,150 Mb, 93.5% coverage of the 2300 Mb genome
  • Anchored: 421 ctgs, 86.1% of the genome; average anchored contig size: 4.7 Mb
  • Unanchored: 300 ctgs, 7.4% coverage; average unanchored contig size: 0.56 Mb
  • 189 of the 300 unanchored contigs contain fewer than 10 clones
  • Largest anchored contig: 22.9 Mb in Chr9
  • Largest unanchored contig: 6.7 Mb
2. Total FPC Markers: 25,924
  • STS markers: 9,129
  • Overgo Markers: 14,877
  • Anchored markers: 1918

Slide 9
MTP Selection
  • Seed BACs: 4000, done
  • Mega Contig: 197, done
  • Clone Walking from Seed BACs: 2,800 done; in progress
  • Total clones picked = 6,997
  • On track to deliver 1000 clones/month until the maize MTP is complete

Slide 10
Flowchart for MTP picking and Library Construction
  • Clone selection (combine seed BAC and BAC end sequences with fingerprinting and trace files)
  • Clone picking (Resource Center)
  • MTP sequencing
  • GenBank
  • BAC end sequence database
  • Library DNA production
  • Hfq sequencing
  • Clone verification
  • Clone shipping
  • Continue shotgun library construction at WashU
  • DNA shearing
  • Seed BAC database
  • MTP BAC end database
  • Library DNA production

Slide 11
Seed BAC Walking
  • In the Agarose and HICF maps, select large clones next to the seed BAC
  • Blastn search of BAC end sequences against seed BAC sequences
  • Check blastn alignment for candidate clones
  • Check trace file for dye blobs
  • Check the Sulston score in the HICF map for overlap
  • Check Agarose fingerprints to avoid overlap with large bands
  • Choose walking clone

Slide 12
Minimum Tile Path Pipeline
  • BAC End Sequences of potential BACs are BLASTed against the Seed BACs
  • Results are classified based on location on the FPC
  • A table of filtered BLAST results is created for each BAC, with links to CMap and GBrowse
  • Blast results are imported into CMap and GBrowse with additional information such as trace files and FPCs

Slide 13
Minimum Tile Path Pipeline Usage
  • A table of alignments between the seed BAC and the BAC end sequences contains links to CMap and GBrowse.
  • CMap displays the FPC data for the seed BAC and the potential next BACs.
  • GBrowse provides an alignment of the BES with the seed sequence and displays the trace data.

Slide 14
Blast Results Table

Slide 15
Maize Production Sequencing
  • Shotgun of 19,000 BACs
  • Fosmid End Sequencing of 1 Million Reads
  • BAC End Sequencing of 220,000 clones

Slide 16
Maize BAC shotgun
  • BAC DNA received from AGI or prepared at the GSC
  • Small Scale Library Construction
  • Production Sequencing - 1,536 reads/project
  • Automated Shotgun_done

Slide 17

Slide 18
To date 3,106 BAC clones are shotgun_done

Slide 19
Maize Fosmid Sequencing
  • Fosmid trays 0001 to 0471 were received from the Messing lab
  • Initial QC was fine, but the bulk shipment has failed to grow
  • Stamping results of the original trays show no growth
  • 85 Fosmid ligations, which represent ~250,000 clones, were received from the Messing lab; plating is underway
  • GSC Fosmid library construction has been completed and represents 1M clones
  • Expected completion date is November of this year.

Slide 20

Slide 21
Maize BAC End Sequencing
  • BAC end sequencing will be completed next week
  • Total of 440,000 reads from two different libraries
  • Pass rate of 75% with an average read length of 600 bases
  • Paired end read rate is ~70%

Slide 22
Sequence Improvement Pipeline
  • Shotgun_done triggers the prefinishing pipeline
  • Initial identification of do finish regions
  • Manual sorting and use of autoedit (Gordon) to break apart misassemblies
  • Autofinish (Gordon) used to choose directed reactions for all gaps and regions of low quality in do finish regions
  • Reassembly and 2nd iteration of prefinishing pipeline
  • Final identification of do finish regions and handoff to finishing pipeline

Slide 23
Clone Improvement through the Prefinishing Pipeline

Slide 24

Slide 25
End Spanning Plasmids Coverage (green)

Slide 26
  • Repeat Tags
  • Do Finish
  • GSS sequence
  • EST sequence

Slide 27
  • Alignment with cDNA read pairs
  • Alignment with End Sequences

Slide 28
Future Plans for Improved Throughput
  • Automated Shotgun-done status assigning
  • Overlap Evaluation at Prefinishing
  • Addition of Fosmid End Pairs at Prefinishing
  • Direct Sequencing for Unspanned Gaps
  • Additional Finishing Staff Hired at all 3 Centers

Slide 29
Maize clone submissions

clone status             submission keywords
shotgun complete         HTGS_PHASE1; HTGS_FULLTOP
2 rounds of prefinish    HTGS_PHASE1; HTGS_PREFIN
in finishing             HTGS_PHASE1; HTGS_ACTIVEFIN
finished                 HTGS_PHASE1; HTGS_IMPROVED

Query GenBank by keywords:
zea mays[ORGN] AND HTGS_PREFIN[KYWD] AND WUGSC[CNTR]
zea mays[ORGN] AND HTGS_IMPROVED[KYWD] AND WUGSC[CNTR]

Restrict by date range:
zea mays[ORGN] AND WUGSC[CNTR] AND HTGS_FULLTOP[KYWD] AND 2006/09[PDAT]
zea mays[ORGN] AND WUGSC[CNTR] AND HTGS_FULLTOP[KYWD] AND 2006/09/26:2006/10/03[PDAT]

Slide 30
HTGS_IMPROVED submissions
Pick a clonename, any clonename -
DEFINITION Zea mays chromosome 4 clone CH201-11H16; ZMMBBc0011H16
Center project name: Z_AF-11H16
  • Improved sequence is annotated on submission record
  • Where possible, contigs have been ordered and oriented based on read pairing, and these regions are designated as scaffolds.
  • Small contigs (