Lecture Slides Lecture 2 Slides

  • 7/30/2019 Lecture Slides Lecture 2 Slides

    1/54

    The Human Genome Project

    Harold Riethman, PhD

    [email protected]

  • 7/30/2019 Lecture Slides Lecture 2 Slides

    2/54

    The Human Genome Project

    Human Genome Mapping

    Sanger DNA Sequencing

    Public Consortium Draft Sequence

    Celera Draft Sequence

    Analysis of Draft Human Genome Sequence

  • 7/30/2019 Lecture Slides Lecture 2 Slides

    3/54

    Human Genome Mapping

    Landmarks: DNA probes, STS

    Fragment ends: restriction enzyme, meiosis, radiation, clone ends

  • 7/30/2019 Lecture Slides Lecture 2 Slides

    4/54

    Mapping Resolution

    From Matise et al., Genome Analysis Vol 4, 1999

  • 7/30/2019 Lecture Slides Lecture 2 Slides

    5/54

    Meiotic Recombination Creates the Chromosome Breakpoints for Genetic Linkage Maps

  • 7/30/2019 Lecture Slides Lecture 2 Slides

    6/54

    From Chakravarti and Lynn, Genome Analysis Vol 4, 1999

  • 7/30/2019 Lecture Slides Lecture 2 Slides

    7/54

    Analysis of Linkage Data

    Compare the observed frequency with which 2 markers are cotransmitted to progeny against the frequency of cotransmission expected by random chance.

    For markers that are close together on a chromosome, the ratio of observed/expected is high; such markers are said to be linked.

    Linkage map distances are statistical estimates of marker distance based on recombination frequencies.

    For humans, the total number of breaks, and hence the resolution of human genetic linkage maps, is limited (1 cM, roughly 1 Mb) because the number of meioses is limited.
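The observed-versus-expected comparison above is commonly summarized as a LOD score: the log10 ratio of the likelihood of the data at a candidate recombination fraction to its likelihood under free recombination (0.5). The sketch below is a minimal illustration, assuming phase-known, fully informative meioses; the function names are hypothetical and do not come from the slides.

```python
import math


def recombination_fraction(recombinants: int, meioses: int) -> float:
    """Observed fraction of informative meioses in which two markers recombined."""
    if meioses <= 0 or not 0 <= recombinants <= meioses:
        raise ValueError("need meioses > 0 and 0 <= recombinants <= meioses")
    return recombinants / meioses


def lod_score(recombinants: int, meioses: int, theta: float) -> float:
    """log10 odds of the data at recombination fraction theta versus theta = 0.5."""
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    nonrecombinants = meioses - recombinants
    if theta == 0.0 and recombinants > 0:
        return float("-inf")  # a single recombinant rules out theta = 0
    log_linked = 0.0
    if recombinants:
        log_linked += recombinants * math.log10(theta)
    if nonrecombinants:
        log_linked += nonrecombinants * math.log10(1.0 - theta)
    log_unlinked = meioses * math.log10(0.5)
    return log_linked - log_unlinked


theta_hat = recombination_fraction(2, 20)  # 0.1, i.e. roughly 10 cM for short distances
print(round(lod_score(2, 20, theta_hat), 2))  # 3.2
```

With only 20 meioses the estimate moves in steps of 5% recombination, which illustrates why the number of available meioses caps map resolution.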

  • 7/30/2019 Lecture Slides Lecture 2 Slides

    8/54

    Radiation Hybrid Maps

    Radiation Creates the Chromosome Breaks

    More radiation, higher frequency of breaks

    Tightly linked markers will tend to be retained on the same RH fragments

  • 7/30/2019 Lecture Slides Lecture 2 Slides

    9/54

    Radiation Hybrid Maps

    Like linkage maps, RH maps are based upon probabilities

    Statistical error in marker order and distance

    Retention bias near centromeres

    Mapping panels of DNA available

  • 7/30/2019 Lecture Slides Lecture 2 Slides

    10/54

    Clone-based Physical Maps

    Clone Library Preparation: YACs & BACs

    Clone Overlap Detection and Contig Construction: STS and Fingerprint Methods

    Contig Placement Along Chromosome: Alignment with STS maps & FISH
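Overlap detection by STS content can be pictured as a grouping problem: two clones that test positive for the same STS are assumed to overlap, and chains of overlapping clones form a contig. The following is a minimal sketch under that assumption; the clone and STS names are made up, and real pipelines must also tolerate false-positive hits and chimeric clones, which this sketch ignores.

```python
from collections import defaultdict


def build_contigs(clone_sts: dict[str, set[str]]) -> list[list[str]]:
    """Group clones into contigs; clones sharing at least one STS are taken to overlap."""
    parent = {clone: clone for clone in clone_sts}

    def find(clone: str) -> str:
        # Follow parent links to the representative clone, shortening the path as we go.
        while parent[clone] != clone:
            parent[clone] = parent[parent[clone]]
            clone = parent[clone]
        return clone

    first_clone_with_sts: dict[str, str] = {}
    for clone, markers in clone_sts.items():
        for sts in markers:
            if sts in first_clone_with_sts:
                # Shared STS: merge the two clones' groups.
                parent[find(clone)] = find(first_clone_with_sts[sts])
            else:
                first_clone_with_sts[sts] = clone

    groups = defaultdict(list)
    for clone in clone_sts:
        groups[find(clone)].append(clone)
    return sorted(sorted(members) for members in groups.values())


hits = {
    "BAC1": {"sts1", "sts2"},
    "BAC2": {"sts2", "sts3"},
    "BAC3": {"sts3"},
    "BAC4": {"sts9"},
}
print(build_contigs(hits))  # [['BAC1', 'BAC2', 'BAC3'], ['BAC4']]
```

This only groups clones; ordering them within a contig and placing the contig on a chromosome still relies on the STS maps and FISH mentioned above.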

  • 7/30/2019 Lecture Slides Lecture 2 Slides

    11/54

    Cloning Systems

    A) Yeast Artificial Chromosome (YAC): 200 kb to 2 Mb

    B) Bacterial Artificial Chromosome (BAC): 150 kb to 250 kb

    C) Cosmid/Fosmid: 40 kb

  • 7/30/2019 Lecture Slides Lecture 2 Slides

    12/54

    Clone-based Physical Map

    Clone ends

    From Dunham et al., Genome Analysis Vol 3, 1999

  • 7/30/2019 Lecture Slides Lecture 2 Slides

    13/54

    Using the same set of STS markers permits integration of the different maps.

    YAC-based clone maps were useful for preparing 100 kb resolution STS maps, but BAC-based clone maps became the workhorse for sequencing.

  • 7/30/2019 Lecture Slides Lecture 2 Slides

    14/54

    Human Sequence-ready Physical Map

  • 7/30/2019 Lecture Slides Lecture 2 Slides

    15/54

    BAC clone-based physical map

    From Waterston et al., PNAS 2002

  • 7/30/2019 Lecture Slides Lecture 2 Slides

    16/54

    BAC Cloning

    150 to 250 kb

    From Birren et al., Genome Analysis Vol 3, 1999

  • 7/30/2019 Lecture Slides Lecture 2 Slides

    17/54

  • 7/30/2019 Lecture Slides Lecture 2 Slides

    18/54

    BAC fingerprint mapping

    From Marra et al., Genome Research 1997

  • 7/30/2019 Lecture Slides Lecture 2 Slides

    19/54

    From Marra et al., Genome Research 1997

  • 7/30/2019 Lecture Slides Lecture 2 Slides

    20/54

  • 7/30/2019 Lecture Slides Lecture 2 Slides

    21/54

    Sanger DNA Sequencing

  • 7/30/2019 Lecture Slides Lecture 2 Slides

    22/54

    Sanger DNA sequencing

    From Shendure and Ji, Nat. Biotech 2008

  • 7/30/2019 Lecture Slides Lecture 2 Slides

    23/54

    BAC clone-based physical map

    From Waterston et al., PNAS 2002

    Sanger DNA Sequencing

    Assembly

    Anchoring to Chromosome

  • 7/30/2019 Lecture Slides Lecture 2 Slides

    24/54

    Preparation of Templates and Sequencing Reactions: highly automated

    1.) Growth of clones: colonies picked robotically, transferred to small cultures in 96-well formats

    2.) Purification of DNA: reagents added and removed robotically, procedures carried out in 96-well format

    3.) DNA sequencing reactions: reagents added and removed robotically, procedures carried out in 96-well or 96 x 4 (384-well) format

  • 7/30/2019 Lecture Slides Lecture 2 Slides

    25/54

    Production Sequencing at MIT's Whitehead Institute

  • 7/30/2019 Lecture Slides Lecture 2 Slides

    26/54

    Sequence Assembly

    From Gordon et al., Genome Research 1998.

  • 7/30/2019 Lecture Slides Lecture 2 Slides

    27/54

    Sequencing a BAC

    At each step, sequences are deposited into GenBank with an accession number.

    a) 96 reactions (Phase 0): sample sequences, overlap detection

    b) 3 to 5x coverage (Phase 1): assembled draft

    c) 8 to 10x coverage (Phase 2): high-quality draft assemblies

    d) finished sequence (Phase 3): q values > 40, no gaps in sequence
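Two pieces of arithmetic sit behind these phases. Assuming the q values are Phred-scaled (the slide does not say so explicitly), Q = -10 log10(P), so q > 40 means fewer than 1 expected error per 10,000 bases. And under the idealized Lander-Waterman model of uniformly random reads, which is an assumption added here, the expected fraction of a clone covered at depth c is 1 - e^(-c). A small sketch with hypothetical function names:

```python
import math


def phred_to_error_probability(q: float) -> float:
    """Per-base error probability implied by a Phred-scaled quality: P = 10^(-Q/10)."""
    return 10.0 ** (-q / 10.0)


def error_probability_to_phred(p: float) -> float:
    """Phred-scaled quality for a per-base error probability: Q = -10 * log10(P)."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    return -10.0 * math.log10(p)


def expected_errors(q: float, length_bp: int) -> float:
    """Expected number of miscalled bases in a sequence of uniform quality q."""
    return phred_to_error_probability(q) * length_bp


def fraction_covered(coverage: float) -> float:
    """Expected fraction of bases hit by at least one read (idealized random sampling)."""
    return 1.0 - math.exp(-coverage)


print(phred_to_error_probability(40))        # 0.0001
print(round(expected_errors(40, 200_000)))   # 20 expected errors in a 200 kb BAC at Q40
print(round(fraction_covered(4), 3))         # 0.982
print(round(fraction_covered(9), 4))         # 0.9999
```

Real shotgun libraries are not perfectly random, so the coverage figures are best read as optimistic bounds rather than predictions.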

  • 7/30/2019 Lecture Slides Lecture 2 Slides

    28/54

    Green, 1997; Weber and Myers, 1997

    From Waterston et al., PNAS 2002

  • 7/30/2019 Lecture Slides Lecture 2 Slides

    29/54

    International Human Genome Sequencing Consortium

    Draft Human Genome Sequence: Nature, Feb. 15, 2001

    Goal: Immediate and unrestricted public access to genome sequence

  • 7/30/2019 Lecture Slides Lecture 2 Slides

    30/54

    Public Draft Sequence Strategy

    BAC clone-based physical map

    From Waterston et al., PNAS 2002

    Sanger DNA Sequencing

    Assembly

    Anchoring to Chromosome

  • 7/30/2019 Lecture Slides Lecture 2 Slides

    31/54

  • 7/30/2019 Lecture Slides Lecture 2 Slides

    32/54

    Assembling the Draft Sequence

    Filtering: non-human sequences; contamination with other BAC sequences

    Layout: sequences laid over physical map; lab mix-ups: electronic digest to place in correct contig; BAC end sequences; STS and FISH maps

    Merging: GigAssembler; sequence contig, scaffold

  • 7/30/2019 Lecture Slides Lecture 2 Slides

    33/54

    From IHGSC, Nature 2001

  • 7/30/2019 Lecture Slides Lecture 2 Slides

    34/54

    N50 Length

    Maximum length L such that 50% of all nucleotides lie in contigs (or scaffolds) of length at least L
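Reading the definition as "contigs of length at least L", N50 can be computed by sorting contig lengths from longest to shortest and accumulating until half of all bases are accounted for. A minimal sketch follows; the function name and the example lengths are illustrative only.

```python
def n50(lengths: list[int]) -> int:
    """Largest L such that contigs of length >= L contain at least half of all bases."""
    if not lengths or any(length <= 0 for length in lengths):
        raise ValueError("need a non-empty list of positive contig lengths")
    half_total = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length
    raise AssertionError("unreachable: the running total always reaches half")


contig_lengths = [80, 70, 50, 40, 30, 20]  # total 290, half is 145
print(n50(contig_lengths))  # 70, because 80 + 70 = 150 >= 145
```

Because the statistic weights every nucleotide equally, a few long contigs dominate it, which is why it is preferred over the mean contig length for comparing assemblies.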

  • 7/30/2019 Lecture Slides Lecture 2 Slides

    35/54

    From IHGSC, Nature 2001

  • 7/30/2019 Lecture Slides Lecture 2 Slides

    36/54

    Celera Genomics Draft Human Genome Sequence

    Science, Feb. 16, 2001

    Business model: patent genes for commercial products, sell access to the genome sequence and Celera-generated annotations

  • 7/30/2019 Lecture Slides Lecture 2 Slides

    37/54

  • 7/30/2019 Lecture Slides Lecture 2 Slides

    38/54

    Free public data was used and incorporated by Celera into its assembly

    Summary of Input Sequence:

    1.) 15 Gb of Celera raw sequence, 5x coverage

    2.) 4.4 Gb of Public Draft Sequence (derived from about 23 Gb (7.5x coverage) of raw sequence)

    a.) shredded to a perfect 2x coverage of Bactigs

    b.) 2.96x coverage of genome
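"Shredding to a perfect 2x coverage" can be pictured as cutting each assembled public sequence into fixed-length faux reads whose start positions are evenly spaced, so that every interior base is covered exactly twice. The sketch below illustrates that idea only; it is not Celera's actual procedure, and the function name and the 550 bp default read length are illustrative choices rather than values taken from the slides.

```python
def shred(sequence: str, read_len: int = 550, coverage: int = 2) -> list[str]:
    """Cut a sequence into overlapping faux reads with evenly spaced start positions."""
    if read_len <= 0 or coverage <= 0 or read_len % coverage:
        raise ValueError("read_len must be a positive multiple of coverage")
    if len(sequence) <= read_len:
        return [sequence]
    step = read_len // coverage  # spacing between consecutive read starts
    starts = list(range(0, len(sequence) - read_len + 1, step))
    if starts[-1] + read_len < len(sequence):
        starts.append(len(sequence) - read_len)  # final read flush with the end
    return [sequence[s:s + read_len] for s in starts]


reads = shred("ACGT" * 5, read_len=4, coverage=2)
print(len(reads))   # 9
print(reads[:3])    # ['ACGT', 'GTAC', 'ACGT']
```

Only the first and last half-read of each sequence fall below the target depth, so the faux reads behave like an evenly sampled, error-free shotgun data set.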

  • 7/30/2019 Lecture Slides Lecture 2 Slides

    39/54

    Celera assembly strategy

    From Venter et al., Science 2001

  • 7/30/2019 Lecture Slides Lecture 2 Slides

    40/54

    From Venter et al., Science 2001

  • 7/30/2019 Lecture Slides Lecture 2 Slides

    41/54

    WGA: Whole Genome Assembly

    5x Celera reads + 3x public faux reads + Celera mate pair data

    True WGA (?)

  • 7/30/2019 Lecture Slides Lecture 2 Slides

    42/54

    BACtigs are mapped BAC sequence contigs from the public project

    CSA: Compartmentalized Shotgun Assembly

    (1) Bactigs from region + Celera reads matching Bactigs

    (2) Celera unique scaffolds mapping to region

    (3) Scaffold tiling for compartment checked manually

    (4) Public sequence within compartment shredded, then reassembled with Celera reads from compartment

  • 7/30/2019 Lecture Slides Lecture 2 Slides

    43/54

    From Venter et al., Science 2001

    Comparison, Public Draft View

  • 7/30/2019 Lecture Slides Lecture 2 Slides

    44/54

    Comparison, Public Draft View

  • 7/30/2019 Lecture Slides Lecture 2 Slides

    45/54

    Comparison, Celera View

  • 7/30/2019 Lecture Slides Lecture 2 Slides

    46/54

    Analysis of Draft Genome Sequences

  • 7/30/2019 Lecture Slides Lecture 2 Slides

    47/54

    Public Data

    National Center for Biotechnology Information: www.ncbi.nlm.nih.gov/genome/guide/human/

    UCSC Genome Browser, extensive annotation: genome.ucsc.edu/

    Footnote: Initially, Celera data was available only through licensing agreements, but later it was deposited into the public databases

  • 7/30/2019 Lecture Slides Lecture 2 Slides

    48/54

    Human Genome

    Size: ~3 billion base pairs

    No. of genes: ~40,000

    Average gene density: 12 genes/Mb

    Gene-rich chromosomes: 17, 19, 22

    Gene-poor chromosomes: 4, 13, 18, X, Y

  • 7/30/2019 Lecture Slides Lecture 2 Slides

    49/54

    CpG islands

    Regions with higher frequency of CpG dinucleotides.

    > 200 bp regions with > 50% GC.

    Associated with 5' ends of genes / near transcriptional start sites.

    30,000 to 50,000 in the human genome.

    Chromosome Y: 2.9 islands/Mb; chromosome 19: 43 islands/Mb

    Good correlation of gene density and CpG islands
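The length and GC criteria above translate directly into a simple check on a candidate region. The sketch below also computes the observed/expected CpG ratio to capture "higher frequency of CpG dinucleotides"; the 0.6 cutoff for that ratio is a commonly used convention and an assumption here, since the slide gives no number for it. Function names are hypothetical.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of bases that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def cpg_observed_over_expected(seq: str) -> float:
    """Observed CpG count divided by the count expected from the C and G frequencies."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (c * g)


def looks_like_cpg_island(seq: str, min_len: int = 200, min_gc: float = 0.5,
                          min_obs_exp: float = 0.6) -> bool:
    """Apply the slide's length and GC thresholds plus an assumed obs/exp cutoff."""
    return (len(seq) > min_len
            and gc_fraction(seq) > min_gc
            and cpg_observed_over_expected(seq) > min_obs_exp)


print(looks_like_cpg_island("CG" * 150))  # True: 300 bp, 100% GC, obs/exp = 2.0
print(looks_like_cpg_island("AT" * 150))  # False: no G or C at all
print(looks_like_cpg_island("CG" * 50))   # False: only 100 bp
```

A genome-wide scan would slide this test along each chromosome and merge adjacent passing windows, which is how per-chromosome densities such as islands/Mb are obtained.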

  • 7/30/2019 Lecture Slides Lecture 2 Slides

    50/54

    Recombination Rate

    On average: higher in females than males.

    Highly variable among different genomic regions.

    Higher: telomeric regions.

    Lower: around the centromeres.

  • 7/30/2019 Lecture Slides Lecture 2 Slides

    51/54

    Repeat Content

    About 50% of the genome

    5 classes of repetitive elements:

    transposon-derived repeats (interspersed repeats): 45% of genome; LINEs, SINEs, LTRs, DNA transposons

    inactive retroposed copies of cellular genes (pseudogenes, e.g. intronless inactivated genes)

    simple sequence repeats: micro-, minisatellites

    segmental duplications

    blocks of tandemly repeated sequences (e.g. around centromeres, telomeres, short arms of acrocentric chromosomes)

  • 7/30/2019 Lecture Slides Lecture 2 Slides

    52/54

    Sequence Variation

    Unrelated individuals are 99.9% identical at the DNA sequence level.

    Most common type of variant = single nucleotide polymorphisms (SNPs)

    GGATCTA    GGAGCTA
    CCTAGAT    CCTCGAT

    SNP rate ~1 per 1,200 bp; 1% of them affect protein function.
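The slide's example pair, GGATCTA versus GGAGCTA, differs at a single position. For two already aligned sequences of equal length, such sites can be listed by direct comparison, and the quoted rate can be turned into a rough genome-wide count. This is a sketch with hypothetical function names; it assumes no insertions or deletions, and the genome-wide figure simply takes the slide's rate at face value.

```python
def snp_positions(seq_a: str, seq_b: str) -> list[tuple[int, str, str]]:
    """1-based positions where two aligned, equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [(i + 1, a, b) for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]


def expected_snp_differences(length_bp: int, bp_per_snp: int = 1200) -> float:
    """Expected number of single-base differences at one SNP per bp_per_snp bases."""
    return length_bp / bp_per_snp


print(snp_positions("GGATCTA", "GGAGCTA"))             # [(4, 'T', 'G')]
print(round(expected_snp_differences(3_000_000_000)))  # 2500000
```

About 2.5 million differences in 3 billion bp is a little under 0.1%, which is consistent with the 99.9% identity figure.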

  • 7/30/2019 Lecture Slides Lecture 2 Slides

    53/54

    Gene Prediction Methods

    Direct evidence of transcription provided by ESTs or mRNA.

    Indirect evidence of sequence similarity to known genes and proteins.

    Ab initio recognition of exons using HMMs: Genscan, Genie.

  • 7/30/2019 Lecture Slides Lecture 2 Slides

    54/54

    Estimated gene number from the draft sequences: 35,000 to 40,000