
Methods

(1) Long DNA molecules are extracted and (2) labeled with Bionano reagents by incorporating fluorophore-labeled nucleotides at a specific sequence motif throughout the genome. (3) The labeled genomic DNA is then linearized in the Saphyr Chip using NanoChannel arrays. (4) Single molecules are imaged and digitized by the Saphyr instrument. (5) Molecules and their labels are detected in the images; each molecule's label pattern is a unique signature that makes it identifiable and useful for assembly into genome maps. (6) The resulting Bionano maps can be used in a variety of downstream analyses in Bionano Access software.

Figure: Bionano NGM workflow. (1) Extraction of long DNA molecules from blood, cells, tissue or microbes. (2) Labeling of DNA at specific sequence motifs: a nickase nicks its recognition motif, and a polymerase incorporates labeled nucleotides at the nick site, displacing the downstream strand. (3) The Saphyr Chip linearizes DNA in NanoChannel arrays: free DNA in solution forms a Gaussian coil, is partially elongated in a microchannel and is linearized in a nanochannel. (4) Saphyr automates imaging of single molecules in NanoChannel arrays. (5) Molecules and labels are detected in images by instrument software. (6) Bionano Access software assembles optical maps.

©2017 Bionano Genomics. All rights reserved.
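The labeling step can be mirrored in silico. This is how a reference genome is turned into a map for comparison with the assembled genome maps.

The sketch below is a minimal illustration under stated assumptions:
- The nickase motif is GCTCTTC, the Nt.BspQI site. The poster does not name the enzyme.
- Each label is simplified to the 0-based start of a motif hit on either strand, rather than the exact nick position.
- The function names are hypothetical, not Bionano software APIs.

```python
import re


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string (A/C/G/T/N)."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def in_silico_labels(seq: str, motif: str = "GCTCTTC") -> list:
    """Hypothetical helper: sorted 0-based start positions of `motif` on either strand.

    The default motif is an assumption (Nt.BspQI site). A reverse-strand hit is
    found by scanning the forward sequence for the motif's reverse complement.
    Real in silico maps place each label at the nick site instead.
    """
    seq = seq.upper()
    hits = set()
    for pattern in {motif.upper(), reverse_complement(motif.upper())}:
        # A zero-width lookahead also reports overlapping occurrences.
        for m in re.finditer(f"(?={re.escape(pattern)})", seq):
            hits.add(m.start())
    return sorted(hits)


def label_intervals(positions: list) -> list:
    """Distances (bp) between consecutive labels, i.e. the map's signature pattern."""
    return [b - a for a, b in zip(positions, positions[1:])]


if __name__ == "__main__":
    toy = "AAGCTCTTCAAAAGAAGAGCAA"
    labels = in_silico_labels(toy)
    print(labels, label_intervals(labels))  # [2, 13] [11]
```

For a palindromic motif, the set of patterns holds a single entry, so hits are not double-counted.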

Structural Variation Landscape Across 26 Human Populations Reveals Population-Specific Variation Patterns in Complex Genomic Regions

Abstract

Structural variation (SV) studies of different ethnic groups at the population level lead to greater insight into genomic and trait diversity and into differences in disease etiology. SV call sets based on short-read sequences and statistical phasing have been constructed for the samples comprising the 1000 Genomes Project¹. However, the sensitivity of detection and localization for some classes of SVs remains suboptimal, including long insertions, inversions, copy number variations, and duplications spanning tens of kbp or more. We have constructed genome optical maps² using Bionano next-generation mapping (NGM) for 146 unrelated individuals from 26 human populations. The maps use long DNA molecules (>150 kbp) fluorescently labeled at specific sequence motifs (nickase recognition sites). These samples consist of 6 individuals (3 males and 3 females) from each of 26 human populations of the 1000 Genomes Collection. The data are generated from native DNA without amplification and assembled without use of the human reference genome, so the genome maps are de novo assemblies of the 146 genomes. All SVs >1.5 kbp are visualized and analyzed by algorithms developed by Bionano and the team that participated in this study.

We compared the motif patterns from these genome optical maps against in silico maps digitally derived from the human reference genome, and against each other. This revealed clear, specific SV patterns among different ethnic groups and among individuals in the population. These population SV patterns are most pronounced in complex regions of the genome, where large (>50 kbp) inversions and tandem duplications are mixed together at the same loci. These regions include the loci for microdeletion syndromes (such as 7q11.23, 15q13.3, 16p11.2 and 22q11.2) and subtelomeric regions. In these regions, nearly identical long repeats make the loci hotspots for SV formation and prevent short-read sequences from being assembled into unique contigs.
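Comparing a genome map with the in silico reference map comes down to comparing the distances between aligned labels. The sketch below is a deliberately simplified illustration, not Bionano's algorithm.

It assumes an alignment is already available as hypothetical `(ref_pos, map_pos)` pairs for consecutive aligned labels in the same orientation. It flags an insertion or deletion wherever the two interval lengths differ by more than 1.5 kbp, the size cutoff quoted above.

```python
from dataclasses import dataclass


@dataclass
class IntervalCall:
    ref_start: int  # reference coordinate of the left aligned label (bp)
    ref_end: int    # reference coordinate of the right aligned label (bp)
    size_diff: int  # map interval minus reference interval (bp)
    kind: str       # "insertion" or "deletion"


def call_indels(alignment: list, min_size: int = 1500) -> list:
    """Flag label intervals whose length differs between map and reference.

    `alignment` is a hypothetical list of (ref_pos, map_pos) pairs, ordered by
    ref_pos, for consecutive aligned labels in the same orientation.
    """
    calls = []
    for (r1, m1), (r2, m2) in zip(alignment, alignment[1:]):
        diff = (m2 - m1) - (r2 - r1)
        if abs(diff) > min_size:
            kind = "insertion" if diff > 0 else "deletion"
            calls.append(IntervalCall(r1, r2, diff, kind))
    return calls


if __name__ == "__main__":
    toy = [(10_000, 5_000), (20_000, 15_000), (30_000, 28_000), (40_000, 36_000)]
    for call in call_indels(toy):
        print(call.kind, call.ref_start, call.ref_end, call.size_diff)
    # insertion 20000 30000 3000
    # deletion 30000 40000 -2000
```

Inversions and duplications cannot be seen this way, because they change label order or orientation rather than interval length.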

Background

Generating high-quality finished genomes with accurate identification of structural variation and high completeness (minimal gaps) remains challenging with short-read sequencing technologies alone. Bionano NGM provides direct visualization of long DNA molecules in their native state. This bypasses the statistical inference needed to align paired-end reads with an uncertain insert-size distribution. These long labeled molecules are assembled de novo into physical maps spanning the whole genome. The resulting order and orientation of sequence elements in the maps can be used to anchor NGS contigs and to detect structural variation.
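Anchoring an NGS contig to a genome map uses the order and orientation of shared labels. As a minimal sketch, assume label matches between a contig's in silico map and a genome map have already been found (the hypothetical `matches` input). A least-squares line through the matched positions then gives three things:
- the contig's orientation (sign of the slope);
- its placement (intercept);
- any relative stretch (slope magnitude).

Production tools must also cope with missing and false labels, which this ignores.

```python
def anchor_contig(matches: list) -> tuple:
    """Fit map_pos ~ slope * contig_pos + offset through matched labels.

    `matches` is a hypothetical list of (contig_pos, map_pos) pairs in bp.
    Returns (orientation, offset, slope): '+' if the fitted slope is positive
    (same direction as the map), otherwise '-'; `offset` is where contig
    position 0 falls on the map; |slope| near 1 means no relative stretch.
    """
    n = len(matches)
    if n < 2:
        raise ValueError("need at least two matched labels")
    mean_c = sum(c for c, _ in matches) / n
    mean_m = sum(m for _, m in matches) / n
    cov = sum((c - mean_c) * (m - mean_m) for c, m in matches)
    var = sum((c - mean_c) ** 2 for c, _ in matches)
    if var == 0:
        raise ValueError("matched contig positions must not all be equal")
    slope = cov / var
    offset = mean_m - slope * mean_c
    orientation = "+" if slope > 0 else "-"
    return orientation, offset, slope


if __name__ == "__main__":
    forward = [(1_000, 101_000), (5_000, 105_000), (9_000, 109_000)]
    reverse = [(1_000, 109_000), (5_000, 105_000), (9_000, 101_000)]
    print(anchor_contig(forward))  # ('+', 100000.0, 1.0)
    print(anchor_contig(reverse))  # ('-', 110000.0, -1.0)
```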

HR Cao⁴, C. Chu¹, A. Leung³, L. Li³, C. Lin¹, J. McCaffrey², Y. Mostovoy¹, A. Naguib⁴, E. Lam⁴, A. Poon¹, S. Pastor², R. Rajagopalan², J. Sibert², M. Sakin¹, W. Wang⁴, A. Hastie⁴, E. Young², T. Chan³, K. Yip³, M. Xiao², P. Kwok¹

Conclusions

We have constructed genome optical maps using Bionano NGM for 146 unrelated individuals from 26 human populations, with long DNA molecules (>150 kbp) fluorescently labeled at specific sequence motifs (nickase recognition sites). These samples consist of 6 individuals (3 males and 3 females) from each of 26 human populations of the 1000 Genomes Collection.

Here we demonstrate the ability of long single-molecule mapping to resolve complex long-range SVs in the human genome, sometimes with multiple haplotypes. We also provide new “alternative” population-based human references for these regions, which are associated with important human diseases. The population-specific SV patterns were shown to be present in relatively “well-behaved” regions as well as in variable complex regions. This sheds light on the origins of the complex regions and on the patterns more closely associated with human disease. In conclusion, Bionano NGM may prove to be the one cost-effective, fast and comprehensive platform for population-level study of functionally relevant large structural variants, paving the way for the era of precision genomics and medicine.

References
1. Sudmant PH, et al. An integrated map of structural variation in 2,504 human genomes. Nature. 2015;526:75-81.
2. Mak AC, et al. Genome-Wide Structural Variation Detection by Genome Mapping on NanoChannel Arrays. Genetics. 2016;202:351-62.
3. Cao H, et al. Rapid detection of structural variation in a human genome using NanoChannel-based genome mapping technology. GigaScience. 2014;3(1):34.
4. Lam ET, et al. Genome mapping on NanoChannel arrays for structural variation analysis and sequence assembly. Nat Biotechnol. 2012;30:771-6.

1) University of California, San Francisco, San Francisco, CA; 2) Drexel University, Philadelphia, PA; 3) CUHK, Shatin, Hong Kong; 4) Bionano Genomics, Inc., La Jolla, CA

De Novo Assembled Genome Maps of 146 Unrelated Individuals from 26 Human Populations Analyzed for SVs

Summary of SV Statistics

http://www.1000genomes.org/sites/1000genomes.org/files/documents/1000-genomes-map_11-6-12-2_750.jpg

•  5.6% of the reference genome is not present in the maps
•  ~20 Mbp of new genomic content not found in the reference genome
•  5% of the reference genome is covered in <20% of the assemblies
•  ~70% of the genome is “well-behaved” and covered by most individuals
•  ~1,800 SVs are common to all super-populations (black)
•  ~1,500 SVs are shared by at least 2 of the super-populations (grey)
•  Large proportion of SVs unique to AFR (~2,100; yellow)
•  Large proportion of unique SVs in AFR (42%)
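Two of the summary statistics above reduce to simple counting. The sketch below illustrates the bookkeeping and is not the study's pipeline. It assumes two hypothetical inputs:
- per-bin counts of how many assemblies cover each equal-size reference bin;
- for each SV, the set of super-populations in which it was observed.

AFR, AMR, EAS, EUR and SAS are the five 1000 Genomes super-populations.

```python
from collections import Counter

SUPER_POPS = ("AFR", "AMR", "EAS", "EUR", "SAS")


def low_coverage_fraction(bin_counts: list, n_assemblies: int, threshold: float = 0.2) -> float:
    """Fraction of reference bins covered by fewer than `threshold` of the assemblies."""
    if not bin_counts:
        return 0.0
    low = sum(1 for c in bin_counts if c < threshold * n_assemblies)
    return low / len(bin_counts)


def classify_sharing(sv_pops: dict, super_pops=SUPER_POPS):
    """Count SVs common to all, shared by >=2, or unique to one super-population.

    `sv_pops` is a hypothetical mapping of SV id -> set of super-populations.
    Returns (counts, unique_fraction), where unique_fraction[pop] is the share
    of all population-unique SVs that fall in `pop`.
    """
    all_pops = set(super_pops)
    counts = Counter()
    unique_by_pop = Counter()
    for pops in sv_pops.values():
        seen = set(pops) & all_pops
        if not seen:
            continue  # not observed in any listed super-population
        if seen == all_pops:
            counts["common_all"] += 1
        elif len(seen) >= 2:
            counts["shared_2plus"] += 1
        else:
            counts["unique"] += 1
            unique_by_pop[next(iter(seen))] += 1
    n_unique = counts["unique"]
    unique_fraction = {p: (unique_by_pop[p] / n_unique if n_unique else 0.0) for p in super_pops}
    return counts, unique_fraction


if __name__ == "__main__":
    print(low_coverage_fraction([146, 140, 10, 0, 100], n_assemblies=146))  # 0.4
    toy = {"sv1": set(SUPER_POPS), "sv2": {"AFR", "EUR"}, "sv3": {"AFR"}, "sv4": {"AFR"}, "sv5": {"EAS"}}
    counts, frac = classify_sharing(toy)
    print(dict(counts), round(frac["AFR"], 3))
```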

Variable Complexity Observed in the MHC Region (chr6:28.5-33.5 Mb)

•  The whole region spans a long range (5 Mbp)
•  An overview of contig-to-reference mapping shows different degrees of variation among sub-regions

Figure: contig-to-reference mapping across the MHC region (28 Mb to 33 Mb), divided into sub-regions A to G. One yellow line represents one contig, and each sample may have multiple contigs; unmapped regions are shown in green. One area is labeled “High complexity”. Observed contig patterns compared with the reference order:
Pattern 1: A->B->C->E->F->G
Pattern 2: A->B<-D->G
Pattern 3: A<-C->F->G
Pattern 4: C<-G
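The MHC segment patterns can be compared with the reference order programmatically. The sketch assumes a reading of the notation that the poster does not spell out:
- segments are letters in reference order A to G;
- `->` joins segments in reference orientation;
- `<-` marks an orientation change.

Under that assumption, it lists which reference segments are absent from a pattern and which junctions are not reference adjacencies. The function names are hypothetical.

```python
import re

REFERENCE_ORDER = "ABCDEFG"  # sub-region labels in reference order (assumed)


def parse_pattern(pattern: str) -> tuple:
    """Split e.g. 'A->B<-D->G' into (['A', 'B', 'D', 'G'], ['->', '<-', '->'])."""
    # A capturing group makes re.split keep the connectors in the result.
    tokens = re.split(r"(->|<-)", pattern.replace(" ", ""))
    return tokens[0::2], tokens[1::2]


def compare_to_reference(pattern: str, reference: str = REFERENCE_ORDER) -> tuple:
    """Return (missing segments, non-reference junctions) for one contig pattern.

    Assumed rule: a junction matches the reference only if it is written '->'
    and the second segment immediately follows the first in `reference`.
    """
    order = list(reference)
    segments, connectors = parse_pattern(pattern)
    missing = [s for s in order if s not in segments]
    novel = []
    for a, b, conn in zip(segments, segments[1:], connectors):
        adjacent = a in order and b in order and order.index(b) == order.index(a) + 1
        if not (conn == "->" and adjacent):
            novel.append(f"{a}{conn}{b}")
    return missing, novel


if __name__ == "__main__":
    for p in ["A->B->C->E->F->G", "A->B<-D->G", "A<-C->F->G", "C<-G"]:
        print(p, *compare_to_reference(p))
```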

Segmental Duplication Region: 16p12

Population Structure Study: Phylogenetic Tree (Fst)

AFR represents the deepest split among the populations.
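A phylogenetic tree of this kind is built from pairwise Fst between populations. The poster does not state which estimator was used. As an assumption, the sketch below uses Nei's heterozygosity-based form, Fst = (H_T - H_S) / H_T, summed over loci (ratio of sums). It runs on hypothetical per-population SV allele frequencies.

The resulting pairwise matrix is the input a distance-based tree method such as UPGMA or neighbor joining would take. Tree building itself is not shown.

```python
from itertools import combinations


def fst_two_pops(freqs_a: list, freqs_b: list) -> float:
    """Multi-locus Fst between two populations from per-SV allele frequencies.

    Uses Nei's heterozygosity form (H_T - H_S) / H_T, summing numerator and
    denominator across loci (ratio of sums), with no sample-size correction.
    """
    if len(freqs_a) != len(freqs_b):
        raise ValueError("both populations need a frequency for every SV")
    num = den = 0.0
    for p1, p2 in zip(freqs_a, freqs_b):
        p_bar = (p1 + p2) / 2
        h_t = 2 * p_bar * (1 - p_bar)                       # pooled expected heterozygosity
        h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-population heterozygosity
        num += h_t - h_s
        den += h_t
    return num / den if den > 0 else 0.0


def pairwise_fst(pop_freqs: dict) -> dict:
    """Fst for every unordered pair of populations (names sorted alphabetically)."""
    return {(a, b): fst_two_pops(pop_freqs[a], pop_freqs[b])
            for a, b in combinations(sorted(pop_freqs), 2)}


if __name__ == "__main__":
    toy = {"AFR": [1.0, 0.5, 0.3], "EUR": [0.0, 0.5, 0.3], "EAS": [0.0, 0.4, 0.3]}
    for pair, value in pairwise_fst(toy).items():
        print(pair, round(value, 3))
```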