32
nanoporetech.com nanoporetech.com/publications WHITE PAPER HUMAN GENETICS RESEARCH Advancing human genetics research with nanopore sequencing

Advancing human genetics …...Advancing human genetics research with nanopore sequencing 1 The advantages of nanopore sequencing for human genetics research 2 3 4 Summary References

  • Upload
    others

  • View
    7

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Advancing human genetics …...Advancing human genetics research with nanopore sequencing 1 The advantages of nanopore sequencing for human genetics research 2 3 4 Summary References

nanoporetech.com

nanoporetech.com/publications

WHITE PAPERHUMAN GENETICS RESEARCH

Advancing human genetics research with nanopore sequencing

Page 2: Advancing human genetics …...Advancing human genetics research with nanopore sequencing 1 The advantages of nanopore sequencing for human genetics research 2 3 4 Summary References

1The advantages of nanopore sequencing for human genetics research

234Summary

References

5

Case studies

About Oxford Nanopore Technologies

Contents

OXFORD NANOPORE TECHNOLOGIES | ADVANCING HUMAN GENETICS RESEARCH WITH NANOPORE SEQUENCING

Page 3: Advancing human genetics …...Advancing human genetics research with nanopore sequencing 1 The advantages of nanopore sequencing for human genetics research 2 3 4 Summary References

Introduction

High-throughput sequencing technologies have revolutionised the field of human genetics, allowing researchers to more easily investigate and understand biological processes and their impact. Using these technologies, researchers can analyse entire genomes or specific targeted regions of interest, with further functional insights garnered through the characterisation and quantification of RNA transcripts and isoforms. Together, these capabilities have provided unprecedented insight into human genetic diversity and its implications in health and disease.

Despite offering significant advancement in terms of speed and resolution over the older techniques of Sanger sequencing and microarrays respectively, there are still a number of limitations inherent to traditional short-read, high-throughput sequencing platforms. From closing genome gaps to characterising full-length transcripts, this review will present how real-time, long-read, high-throughput nanopore sequencing technology is being used to address these limitations, resulting in new biological insights. Specific case studies reveal how researchers are applying the benefits of nanopore technology to a variety of sequencing techniques, including whole genome, targeted and RNA sequencing.

1

OXFORD NANOPORE TECHNOLOGIES | ADVANCING HUMAN GENETICS RESEARCH WITH NANOPORE SEQUENCING

Page 4: Advancing human genetics …...Advancing human genetics research with nanopore sequencing 1 The advantages of nanopore sequencing for human genetics research 2 3 4 Summary References

The advantages of nanopore sequencing for human genetics research

Whole genome sequencing without the holes

To date, the majority of large genomes sequenced have utilised short-read technologies, which require the DNA to be fragmented into small (typically 150-300 bp) lengths prior to sequencing. While these technologies allow rapid and relatively cost-effective genome analysis compared to older sequencing methodologies, their inherent read-length limitations preclude the analysis of vast stretches of DNA corresponding to repetitive regions and large structural variations1,2,3. It is for these reasons that approximately 8% of the human genome is intractable to assembly and interpretation4.

The required DNA fragmentation step effectively loses relative positional information, meaning that during genome assembly, these small fragments must be pieced back together through overlap with other short fragments. For repetitive regions, this can be particularly challenging and may result in the collapsing of potentially long regions of repeats down to much shorter lengths, leaving gaps in the assembly (Figure 1)5.

In the same way, the presence of some structural variants such as deletions, insertions, duplications, inversions and translocations may be missed when using short sequencing reads alone. It is well established that some repeat regions and structural variants are associated with human health and disease (e.g. ageing6, triplet expansion diseases7, autism8, epilepsy9 and cancer10,11), making their routine characterisation highly advantageous.

1

Genome sequence

Short reads

Short-read consensus

Long reads

Long-read consensus

Figure 1 Schematic highlighting the advantages of long reads in de novo assembly of repetitive regions. Long read lengths are more likely to incorporate the whole repetitive region (blue boxes) allowing more accurate assembly. Image adapted from Kellog12.

2

OXFORD NANOPORE TECHNOLOGIES | ADVANCING HUMAN GENETICS RESEARCH WITH NANOPORE SEQUENCING

Page 5: Advancing human genetics …...Advancing human genetics research with nanopore sequencing 1 The advantages of nanopore sequencing for human genetics research 2 3 4 Summary References

Unlike short-read sequencing technologies, nanopore-based sequencing processes the entire length of the DNA fragment that is presented to the pore. Complete fragments of thousands of kb are routinely processed and ultra-long read lengths over 2 Mb have been achieved13. Clearly, such long reads are more likely to span entire regions of repetitive DNA and structural variation. As a result, nanopore sequencing provides a more complete view of genetic variation.

Using nanopore sequencing, highly contiguous genome assemblies have been generated for many large and complex organisms that had previously been deemed inaccessible to modern sequencing methods14,15. Researchers recently used nanopore sequencing to deliver the most complete human genome ever assembled with a single technology (Case study 1) and the first complete and accurate sequence of a human centromere (Case study 2). Furthermore, as will be discussed later, the long, direct sequencing reads delivered by nanopore technology also permit the phasing of variants and modified bases.

Rapid analysis of targeted regions

For researchers wishing to study specific genomic loci, a targeted sequencing approach is commonly employed. The facility to focus on the regions most likely to provide relevant data reduces cost, allows a higher depth of coverage and simplifies analysis.

A range of targeted sequencing methodologies are available with nanopore sequencing, from bespoke assays to targeted panels and whole exome approaches. Nanopore sequencing has been successfully utilised with both PCR- and hybrid capture-based targeted enrichment strategies – including whole exome enrichment16,17.

As discussed for whole genome sequencing, the long reads provided by nanopore technology provide a range of advantages for researchers interested in targeted sequencing. For example, it is possible to sequence much larger regions in a single read, which allows improved characterisation of highly repetitive regions and/or structural variants (see Case study 4). Furthermore, long sequencing reads allow the phasing of alleles and variants, which is extremely challenging when using short-read sequencing18.

‘Nanopore sequencing allows same-day detection of structural variants, point mutations, and methylation profiling using a single device with negligible capital cost’10.

3

OXFORD NANOPORE TECHNOLOGIES | ADVANCING HUMAN GENETICS RESEARCH WITH NANOPORE SEQUENCING

Page 6: Advancing human genetics …...Advancing human genetics research with nanopore sequencing 1 The advantages of nanopore sequencing for human genetics research 2 3 4 Summary References

A number of researchers are now also investigating the potential of CRISPR/Cas9 to enrich for long, targeted DNA molecules (see Case study 5). Initial results have shown significant promise for such techniques, which may offer faster, more streamlined workflows and, through direct sequencing of the target molecule, enable the analysis of base modifications.

A unique feature of nanopore technology is the facility to utilise real-time analysis to selectively sequence or reject DNA molecules as they pass through the pore. Rejection of molecules that fall

outside of the chosen criteria is achieved through reversing the current applied to the individual pore, thereby freeing up the pore to sequence an alternative DNA fragment. Although this potential real-time enrichment strategy is in its infancy, researcher-developed tools such as Read Until19 and RUBRIC20 are available.

Oxford Nanopore provides researchers with a cloud-based analysis platform, EPI2ME. Among the workflows available is the Human Alignment analysis tool, which aligns nanopore reads to the human GRCh38 reference genome, generating real-time coverage plots at the chromosome or gene level (Figure 2). This unique tool allows researchers to easily assess the efficiency of their sequence targeting.

‘The challenge of aligning short reads to regions with high homology is often not fully appreciated’18.

Figure 2 The Reference Alignment tool allows easy, real-time visualisation of targeted or whole genome sequencing coverage, allowing researchers easily assess the efficiency of their sequence targeting.

4

OXFORD NANOPORE TECHNOLOGIES | ADVANCING HUMAN GENETICS RESEARCH WITH NANOPORE SEQUENCING

Page 7: Advancing human genetics …...Advancing human genetics research with nanopore sequencing 1 The advantages of nanopore sequencing for human genetics research 2 3 4 Summary References

Full-length RNA transcripts, isoform characterisation and accurate quantification Due to the fragmentation required by most traditional sequencing methods, accurate assembly of complete transcripts is exceptionally challenging, especially in instances where a read maps to more than one location. With nanopore technology, entire RNA molecules are processed regardless of their length, allowing the sequencing of complete transcripts in single reads.

In addition to reducing multiple-locus alignment issues, long, full-length reads provide a significant advantage in the analysis and correct identification of alternative splicing (Figure 3). When using short-reads, different transcript isoforms have to be computationally

Nanopore long-readRNA sequencing

Gene / precursormRNA

Short-read RNA-Seq

Exon 4 Exon 5 Exon 6 Exon 7 Exon 8 Exon 9Exon 1 Exon 2 Exon 3

Figure 3 Alternative splicing can give rise to numerous mRNA isoforms per gene, which in turn can alter protein composition and function. The short reads generated by traditional RNA sequencing techniques lose positional information, making the correct assembly of alternative mRNA isoforms challenging. Long nanopore reads can span full-length transcripts simplifying their identification.

reconstructed; however, a study by Steijger et al.21 revealed that automated transcript assembly methods fail to identify all constituent exons in over half of the transcripts analysed. Furthermore, of those transcripts with all exons identified, over half were incorrectly assembled. Such complications are further compounded where reads from highly similar transcripts, such as those of paralogous genes, are under investigation. Rare isoforms could also remain altogether undetected22.

Describing the additional insight gained from nanopore sequencing, Dr Christopher Vollmers at the University of California, Santa Cruz, comments: ‘It’s rare to find an isoform that perfectly matches the annotation, and that’s in the human genome which is already super-curated. Once in a while, you find an exon that nobody has annotated mostly because

5

OXFORD NANOPORE TECHNOLOGIES | ADVANCING HUMAN GENETICS RESEARCH WITH NANOPORE SEQUENCING

Page 8: Advancing human genetics …...Advancing human genetics research with nanopore sequencing 1 The advantages of nanopore sequencing for human genetics research 2 3 4 Summary References

it may contain an Alu element or other kind of repeat, so short reads won’t align to it. Most splice junctions [identified by short-read technology] are correct but transcription start sites and poly-A sites, that’s the wild west – there is so much going on that is not on any annotations’.

Recently, a team of researchers from the University of Oxford and Earlham Institute utilised nanopore sequencing to investigate expression of the neuropsychiatric disease-associated gene CACNA1C 23. The long nanopore reads enabled the identification of 38 putative novel exons and 90 transcript isoforms, of which only 7 had been previously identified. Interestingly, 9 of the top 10 expressed isoforms were novel, previously unknown transcripts. This and other studies have also demonstrated how nanopore sequencing provides highly accurate measurement of transcript abundance23,24.

The advantages of nanopore technology are now also being applied to single cell transcriptome studies, delivering further, more detailed insight into gene expression and function (see Case study 7).

Direct RNA sequencing

Until recently, sequence-based analysis of RNA required the conversion of RNA to complementary DNA (cDNA), a process that can introduce bias through reverse transcription or amplification25. These issues can be exacerbated by the use of traditional short-read sequencing technologies which are known to exhibit GC bias, where sequences with low or high levels of GC content are underrepresented (Figure 4). The amplification step required to generate cDNA also loses all modified base information. Such base modifications are known to have a role in modulating the activity and stability of RNA and are therefore of increasing interest to researchers. Nanopore sequencing overcomes all of these challenges through the facility for direct RNA sequencing — delivering unbiased, full-length, strand-specific RNA sequences26. The longest transcript processed by direct RNA sequencing currently stands at over 20 kb in length27.

6

OXFORD NANOPORE TECHNOLOGIES | ADVANCING HUMAN GENETICS RESEARCH WITH NANOPORE SEQUENCING

Page 9: Advancing human genetics …...Advancing human genetics research with nanopore sequencing 1 The advantages of nanopore sequencing for human genetics research 2 3 4 Summary References

A further benefit of direct RNA sequencing is the ability to accurately measure poly-A tail length (see Case study 6)27,28. In eukaryotes, messenger RNA (mRNA) is augmented with a series of adenosine bases at the 3’ end known as the poly-A tail.

Pearson r = 0.19; p = 1.6 e-58

0

5

10

15

Log

coun

t

Short-read cDNA

Pearson r = 0.016; p = 0.18

Log

coun

t

Short-read cDNA

Log

coun

t

0

5

10

15

20Pearson r = 0.3; p = 7e-141

Length (kb)0 4 8 12 16

0

2

4

6

8

10

Log

coun

t

Pearson r = 0.013; p = 0.29

Length (kb)0 4 8 12 16

0

2

4

6

8

10

Log

coun

t

Pearson r = 0.13; p = 5.4e-29

0

2

4

6

8

10

Log

coun

t

Pearson r = 0.18; p = 2.8e-52

Length (kb)0 4 8 12 16

GC content0.1 0.2 0.3 0.4 0.5 0.6

GC content0.1 0.2 0.3 0.4 0.5 0.6

GC content0.1 0.2 0.3 0.4 0.5 0.6

GC content0.1 0.2 0.3 0.4 0.5 0.6

0

2

4

6

8

10

12

Log

coun

t

0

2

4

6

8

10

12

Log

coun

t

Pearson r = 0.041; p = 0.00075

Length (kb)0 4 8 12 16

a)

b)

Pearson r = 0.048; p = 8.6 e-0.5

Oxford Nanopore direct RNA

Oxford Nanopore direct cDNA

Oxford Nanopore PCR-cDNA

Oxford Nanopore direct RNA

Oxford Nanopore direct cDNA

Oxford Nanopore PCR-cDNA

Figure 4 Sequencing workflows that incorporate amplification are vulnerable to sequence-specific biases. Yeast transcriptome libraries were prepared using three nanopore sequencing techniques (PCR-DNA, direct cDNA and direct RNA) and a typical short-read cDNA technique. In all cases, GC bias in the nanopore data sets was lower than in the short-read data set26. These tails can vary in size, with the largest

being over 250 nucleotides in length and therefore beyond the typical analysis capabilities of short-read sequencing technologies27,28. Research suggests that poly-A tail length is an important factor in post-transcriptional regulation and further study may provide new insights into gene expression and disease27,29. ‘We believe that direct RNA sequencing

will become a versatile tool for transcriptome analysis in the “complete genome era” of the future’50.

0

2

4

6

8

10

7

OXFORD NANOPORE TECHNOLOGIES | ADVANCING HUMAN GENETICS RESEARCH WITH NANOPORE SEQUENCING

Page 10: Advancing human genetics …...Advancing human genetics research with nanopore sequencing 1 The advantages of nanopore sequencing for human genetics research 2 3 4 Summary References

Analysis of base modifications

The importance of base modifications such as 5-methylcytosine (5mC) and N6-methyladenosine (m6A) on gene expression and function is becoming increasingly apparent. For example, 5mC has been linked to many human diseases, including neurological disorders30 and cancer31, and may offer significant potential as a diagnostic and prognostic indicator.

The requirement for nucleic acid amplification in traditional short-read sequencing technology erases these base modifications, meaning they cannot be detected without additional time-consuming and often inefficient sample processing methods10,32. Nanopore sequencing does not require amplification or strand synthesis, allowing both the base and its modification to be detected in the same sequencing run (Figure 5). To date, researchers have utilised nanopore sequencing to detect a number of modified bases from both DNA and RNA, including pseudouridine27, N6-methyladenosine (m6A)25, 5-methylcytosine (5mC)25, and 7-methylguanosine (m7G)32.

‘Methylation data can directly be obtained from the same WGS data set which makes time-consuming bisulfite conversion and specialized methylation assays (sequencing or hybridization-based) expendable’10.

Figure 5 The Tombo data analysis package allows identification of a range of modified bases from raw nanopore signal.

Sig

nal

Consensus

Reference

Mods

0

2.5

-2.5

A U CC C C C A G A A C C C CG A A A U C C C A G U AA G U

A U U

Y

U

U

C C C C A G A A C G A G G

m7G

A A U U C C C A G U A A G U

Y

Reference

rRNA modifications

U U UUU U UU U U U U UU U U U UU U UUA A A A A A A A A A A A A A A A A A A A A A A ACC C CC C C CC C CC C CC C CC C CGGGGGGGGGGGGGGGGGGGGGG

Am(2’-O-methyl A)

Y(pseudouridine)

Um(2’-O-meU)

Gm Y

[0-0.99]

Proportion of reads modified (Tombo prediction)

Am Gm(2’-O-methyl G)

Y

88 bp

580 bp 590 bp 600 bp 610 bp 620 bp 630 bp 640 bp 650 bp 660 bp

X03205.1

Human SSU rRNA

Fig. 1 Modifications in human rRNA a) Tombo prediction

Fig. 1 Modifications in human rRNA a) raw reads

8

OXFORD NANOPORE TECHNOLOGIES | ADVANCING HUMAN GENETICS RESEARCH WITH NANOPORE SEQUENCING

Page 11: Advancing human genetics …...Advancing human genetics research with nanopore sequencing 1 The advantages of nanopore sequencing for human genetics research 2 3 4 Summary References

Phasing of variantsDetermining the maternal or paternal inheritance of an allele can deliver insights into genome function, health, disease and evolution. It is also becoming an increasingly important avenue of research for the advancement of precision medicine33. However, the nature of short-read sequencing technologies make it extremely challenging to unambiguously assign parental origin for variants separated over large genomic regions.

The long reads delivered by nanopore sequencing simplify the phasing of alleles. In one recent study, researchers were able to phase the entire 4 Mb human major histocompatibility complex (see Case study 1)1. In addition to the phasing of SNVs, nanopore technology also allows the phasing of SVs and base modifications — providing unprecedented depth of genomic characterisation.

Cost-effective, scalable and on-demand analysis in real-time

Oxford Nanopore provides a range of devices that provide cost-effective, fully-scalable and on-demand sequencing to suit all research requirements. Unlike traditional sequencing platforms that require large capital investments (>$50k–1M)34, significant infrastructure and calibration by trained engineers, the MinION™ Starter Pack from Oxford

Nanopore costs just $1000 (including two flow cells and sequencing reagents) and is powered by the USB port on a laptop or the MinIT™ accessory (Figure 6). With current yields of up to 30 Gb per flow cell, the uniquely transportable and affordable MinION provides any researcher with access to the benefits of long-read, real-time sequencing technology.

GridION™ X5 and PromethION™ offer respectively 5 and 300 times the yield of the MinION, providing users with the facility to cost-effectively scale their research to meet the demands of routine whole genome sequencing — for example, the characterisation of cancer cell lines — or high-depth RNA sequencing of a large number of samples (Figures 7 and 8). Both devices are available with no capital expenditure and deliver a comparable cost-per-base

‘We’ve gone from a situation where you can only do genome sequencing for a huge amount of money in well-equipped labs to one where we can have genome sequencing literally in your pocket just like a mobile phone’43.

9

OXFORD NANOPORE TECHNOLOGIES | ADVANCING HUMAN GENETICS RESEARCH WITH NANOPORE SEQUENCING

Page 12: Advancing human genetics …...Advancing human genetics research with nanopore sequencing 1 The advantages of nanopore sequencing for human genetics research 2 3 4 Summary References

to traditional sequencing platforms. In addition, the facility to use flow cells independently allows other projects to be run concurrently with no need for sample batching — delivering rapid access to results.

Oxford Nanopore is also developing Flongle™, a flow cell adapter designed to provide even more cost-effective analysis of smaller, more frequently performed tests and experiments.

A range of dedicated, streamlined library preparation kits are available to suit all experimental requirements and include the facility for sample multiplexing.

‘PromethION generates such a lot of data in such a consistent way that we can more easily access any genome’49.

Figure 6 MinION: a pocked sized, portable device. Each flow cell can run up to 512 channels at a time.

Figure 7 GridION X5: 5 independent flow cells with integrated data processing.

Figure 8 PromethION: 48 independent flow cells, each of which can run up to 3,000 channels at a time. Integrated data processing for high-throughput sequencing.

10

OXFORD NANOPORE TECHNOLOGIES | ADVANCING HUMAN GENETICS RESEARCH WITH NANOPORE SEQUENCING

Page 13: Advancing human genetics …...Advancing human genetics research with nanopore sequencing 1 The advantages of nanopore sequencing for human genetics research 2 3 4 Summary References

Case studies2

11

OXFORD NANOPORE TECHNOLOGIES | ADVANCING HUMAN GENETICS RESEARCH WITH NANOPORE SEQUENCING

Page 14: Advancing human genetics …...Advancing human genetics research with nanopore sequencing 1 The advantages of nanopore sequencing for human genetics research 2 3 4 Summary References

Case study 1The most complete human genome ever assembled with a single technology

Although short-read sequencing technologies have improved considerably over the last decade in terms of yield and turnaround times, according to Jain and coworkers: ‘assembling human genomes with high accuracy and completeness remains challenging’1. At ~3.1 Gb, the human genome is not only large, it also contains regions of uneven nucleotide composition, high levels of repetitive content (up to 69%35) and large segmental duplications. As a result, most human genome assemblies are highly fragmented and contain gaps that both limit their structural integrity and subsequent biological interpretation. Furthermore, short sequencing reads prevent the assignment of alleles or variants to their original chromosome. Such ‘phasing’ information provides significantly more insight into gene expression and function and is of particular importance when studying genetic disease.

To assess the potential of long nanopore sequencing reads to overcome these issues and deliver more contiguous, complete genomes, a team comprising researchers from the UK, USA and Canada, used the MinION to sequence the well-characterised human reference genome NA12878. The team deployed a standard kit-based DNA extraction method together with the Ligation Sequencing Kit to generate long sequence reads. In total, 91.2 Gb of data was generated, which is equivalent to 30x genome coverage.

Using the assembly tool Canu, a highly contiguous assembly was produced, comprising 2,886 contigs with an NG50* contig size of ~3 Mb. The superior genome contiguity offered by nanopore sequencing was exemplified by the inclusion of the highly repetitive — and thereby notoriously difficult to assemble — HLA class I region in a single contig.

At over 2 Mb, the longest sequencing read set a new record for a single contiguous DNA sequence.

To investigate the impact of increasing read length on assembly contiguity, the team further used a modified phenol:chloroform extraction technique together with the streamlined Rapid Sequencing Kit to generate ultra-long reads. Approximately 18 Gb of ultra-long read data was obtained (equivalent to 5x genome coverage), with the longest mapped read being 882 kb. More recently, researchers from the University of Nottingham have obtained a human ultra-long read in excess of 2 Mb — a new record for a single contiguous DNA sequence36.

* The NG50 value represents the longest contig such that contigs of this length or greater sum to at least half of the haploid genome size.

12

OXFORD NANOPORE TECHNOLOGIES | ADVANCING HUMAN GENETICS RESEARCH WITH NANOPORE SEQUENCING

Page 15: Advancing human genetics …...Advancing human genetics research with nanopore sequencing 1 The advantages of nanopore sequencing for human genetics research 2 3 4 Summary References

The additional ultra-long reads not only doubled the assembly contiguity (NG50 ~6.4 Mb) but also significantly improved the facility to phase alleles. For example, it was possible to phase the entire 4 Mb major histocompatibility complex (MHC) that was contained within a single 16 Mb contig (Figure 9). As stated by the researchers: ‘The increased single-molecule read length that we report here, obtained using a MinION nanopore sequencer, enabled us to analyse regions of the human genome that were previously intractable with state-of-the-art sequencing methods’1. This sentiment

was further reflected through the utilisation of the nanopore data to close 12 large (>50 kb) gaps in the GRCh38 reference genome, which corresponded to 83,980 bp of previously unknown euchromatic sequence.

Unlike short-read technology, nanopore sequencing also allows the direct detection of DNA modifications alongside the nucleotide sequence. In this study, the levels of 5-methylcytosine (5mC), detected were highly concordant with results obtained using alternative methylation analysis techniques.

Data from this study is available at: github.com/nanopore-wgs-consortium/NA12878

Long nanopore sequencing reads allowed phasing of the entire 4 Mb MHC region.

Figure 9 Phasing of the entire 4 Mb MHC region. The long read lengths delivered by nanopore sequencing allowed the creation of haplotigs (contigs derived from the same chromosome) enabling phasing of genes and variants – providing potential new insights into gene expression. Blue box signifies the MHC class II region.

13

OXFORD NANOPORE TECHNOLOGIES | ADVANCING HUMAN GENETICS RESEARCH WITH NANOPORE SEQUENCING

Page 16: Advancing human genetics …...Advancing human genetics research with nanopore sequencing 1 The advantages of nanopore sequencing for human genetics research 2 3 4 Summary References

Case study 2Comprehensive and cost-effective characterisation of the Y chromosome

* The N50 value represents the fragment length where half of the data are contained in fragments of this length and greater.

Figure 10 Assembly contiguity comparison between the human HG02982 and gorilla Y chromosomes. The size of each rectangle corresponds to the size of a contig within each assembly. The HG02982 assembly which combined long nanopore reads with short-read sequencing displayed significantly higher contiguity than the recently published gorilla assembly which utilised an alternative ‘long’-read sequencing technology combined with short-read sequencing. Both sequencing data sets were derived from flow-sorted Y chromosomes.

Mammalian Y chromosomes are often neglected from genomic analysis due to their inherent assembly difficulties caused by a high level of repetitive DNA and palindromes. To date, just a single reference-quality human Y chromosome, of European ancestry, is available — thereby increasing the potential for reference bias and overlooking the significant genomic variation present in other populations37.

While isolation of the Y chromosome using flow cytometry can simplify the assembly challenge — reducing the overlap of repetitive DNA with that on other chromosomes — due to their limited read lengths, amplification bias and removal of epigenetic modifications, traditional short-read sequencing technologies offer an imperfect solution.

To overcome the limitations of short-read technology and address the lack of reference-quality Y chromosomes, researchers from Spain developed a novel strategy to sequence native, unamplified flow-sorted DNA of African ancestry using the MinION. Approximately 9 million Y chromosomes were sorted from a lymphoblastoid cell line (HG02982), whose haplogroup (A0) represents one of the earliest known human lineages.

Nanopore sequencing of the flow-sorted chromosomes generated 2.3 Gb of data with an average read N50* of ~18 kb. De novo assembly and sequence polishing was carried out using the Canu38 and nanopolish39 tools respectively, with further sequence polishing using short-read data performed using Pilon40. The final assembly totalled 21.5 Mb in length and comprised 35 contigs with a contig N50 of 1.46 Mb.

The researchers commented that this technique: ‘…constitutes a significant improvement over comparable previous methods, increasing continuity by more than 800%’ (Figure 10)37.

Comparing the assembly against the GRCh38 reference, the team were able identify extensive genic copy number variation with expansions in 5 of the 9 multi-copy genes, four of which are implicated in male infertility. They also identified 347 structural variants of over 50 bp in size between the two assemblies.

Gorilla_chrYHG02982_chrY

14

OXFORD NANOPORE TECHNOLOGIES | ADVANCING HUMAN GENETICS RESEARCH WITH NANOPORE SEQUENCING

Page 17: Advancing human genetics …...Advancing human genetics research with nanopore sequencing 1 The advantages of nanopore sequencing for human genetics research 2 3 4 Summary References

The team were also able to detect the epigenetic modification 5-methylcytosine alongside the nucleotide sequence, demonstrating good correlation with data obtained using whole genome bisulfite sequencing.

This study highlights how long nanopore sequencing reads can be used to deliver new insights into complex genomic regions which have previously proven challenging to analyse using traditional sequencing technology. Commenting on this research the team suggest that: ‘Given the current developments in sequencing throughput, a single MinION flowcell should now be sufficient to assemble a whole human Y chromosome. Furthermore, it is becoming clear that the upper read length boundary is only delimited by the integrity of the DNA, suggesting the possibility that complete Y chromosome assemblies, including full resolution of amplicons, might be possible in the near future’37.

Expanding on this possibility, recent research led by Dr. Karen Miga at the University of California, Santa Cruz demonstrated the use of nanopore technology to deliver the first complete and accurate sequence of a human

centromere41. Human centromeres are composed of long tracts of near identical tandem repeats making them intractable to assembly using short-read sequencing technology. Using the long reads generated by nanopore technology, the team sequenced eight BAC clones that together spanned the Y chromosome centromere. In total the team generated over 3,500 reads that were greater than 150 kb in length. Consensus sequence polishing was performed using the BLASR tool, and variants were validated against short-read sequencing data. These informative markers, together with structural variants, allowed alignment of the BAC consensus sequences, revealing the centromere to be 365 kb in length (Figure 11). According to the researchers, their assembly: ‘enables the precise number of repeats in an array to be robustly measured and resolves the order, orientation, and density of both repeat-length variants across the full extent of the array. This work could potentially advance studies of centromere evolution and function and may aid ongoing efforts to complete the human genome’41. The team are now optimising the methodology to sequence centromeres directly from whole genomic DNA without the requirement for BACs.

Figure 11 Assembly of the human Y chromosome centromere. Eight BAC clones covering the entire centromere were ordered using sequence variants. The centromere is dominated by 5.8 kb higher order repeats (HOR) (light blue boxes) interspersed by HOR variants (purple boxes). Highly divergent monomeric alpha satellite is indicated in dark blue. Figure adapted from Jain et al.41

15

OXFORD NANOPORE TECHNOLOGIES | ADVANCING HUMAN GENETICS RESEARCH WITH NANOPORE SEQUENCING

Page 18: Advancing human genetics …...Advancing human genetics research with nanopore sequencing 1 The advantages of nanopore sequencing for human genetics research 2 3 4 Summary References

Case study 3Mapping and phasing structural variation

Accounting for a far greater number of variable bases than single nucleotide variations (SNVs), structural variation (SV) is an important class of genetic variation that has been implicated in a wide range of genetic disorders. To address the limitations of short-read sequencing technology to accurately and cost-effectively characterise SV, an international research team led by Dr. Wigard Kloosterman of the University Medical Centre Utrecht, assessed the performance of long-read sequencing delivered by nanopore technology3. The team performed whole genome sequencing of two DNA samples using both the MinION and a short-read sequencing technology. The samples were obtained from individuals with congenital disease resulting from complex chromothripsis, which is characterised by dozens of locally clustered genomic rearrangements affecting one or a few chromosome(s) in a cell.

In one sample, nanopore sequencing at 16x depth allowed the detection of all of the previously validated de novo chromothripsis breakpoint junctions. For the second sample, which was sequenced at 11x depth, the nanopore

data allowed 24 of 29 previously validated breakpoints to be detected; however, further investigation revealed that two of the 5 undetected breakpoints represented a complex combination of joined segments which had been incorrectly assigned in the long-insert mate pair validated data set — further highlighting the benefits of long sequencing reads (Figure 12). Detection of the remaining breakpoint junctions was hampered by insufficient depth of coverage. In total, in the second sample, nanopore sequencing at 11x depth allowed the detection of 29 of 31 (91%) breakpoint junctions, which compared favourably to the 22 (69%) detected using short-read sequencing at 30x coverage. Furthermore, four validated breakpoint junctions were only detected using nanopore sequencing and were not found in either the long-mate pair or short-read data set. By subsampling their data, the team were able to identify 14x depth of coverage as the minimum required to detect all breakpoint junctions using nanopore sequencing.

16

OXFORD NANOPORE TECHNOLOGIES | ADVANCING HUMAN GENETICS RESEARCH WITH NANOPORE SEQUENCING

Page 19: Advancing human genetics …...Advancing human genetics research with nanopore sequencing 1 The advantages of nanopore sequencing for human genetics research 2 3 4 Summary References

a)

b)

Short-readNanopore

Long-insert mate-pair

Figure 12 Nanopore sequencing accurately detects more chromothripsis breakpoints than alternative sequencing approaches. a) Circos plot of all breakpoint junctions in a complex chromothripsis sample. b) Comparison of different sequencing approaches to genotype breakpoint junctions. SVs were detected in short-read and nanopore data using the Delly and NanoSV tools respectively. Figure adapted from Cretu Stancu et al.3

An important benefit of long nanopore sequencing reads is the facility for phasing. It had previously been hypothesised that germline chromothripsis originates from paternal chromosomes; however, this was based on only a few breakpoint junction sequences or deletions. Using nanopore sequencing, the team were able to phase all of the chromothripsis breakpoints detected, identifying their paternal origin and thereby providing further weight to the earlier hypothesis.

In the course of this research, the performance of a number of long-read SV calling tools was assessed, with the team demonstrating that their in-house developed tool, NanoSV, provided superior performance over a range of experimental parameters.

Summarising their research, the team suggest that their work: ‘demonstrates the potential of long-read, portable sequencing technology for human genomics research and clinical applications’.*

Phasing of all chromothripsis break-points demonstrated paternal origin.

chr9

chr1

chr5

not detecteddeletionduplicationinversioninterchromosomal

* Nanopore devices are currently for research use only.

17

OXFORD NANOPORE TECHNOLOGIES | ADVANCING HUMAN GENETICS RESEARCH WITH NANOPORE SEQUENCING

Page 20: Advancing human genetics …...Advancing human genetics research with nanopore sequencing 1 The advantages of nanopore sequencing for human genetics research 2 3 4 Summary References

Case study 4Analysing SNVs, SVs and phasing using targeted nanopore sequencing

Long nanopore reads allowed more accurate characterisation of a 55 bp exonic deletion.

Gaucher disease (GD), the most common lysosomal storage disorder, is caused by homozygous or biallelic mutations in the GBA gene. Heterozygous mutations in this gene are also a significant risk factor for Parkinson’s disease and other disorders42. The complex structure of the genomic region incorporating GBA, which includes multiple pseudogenes (with up to 96% homology), complicates analysis using PCR and traditional short-read DNA sequencing techniques. Long-sequencing reads offer an alternative, more streamlined solution for complete analysis of the GBA gene. In order to assess this the validity of this approach, a team of researchers from the UK and USA utilised the MinION in combination with long-range PCR to amplify and sequence the entire ~8 kb gene18.

The team amplified the GBA gene from brain or saliva samples taken from different individuals with GD. All 10 samples were multiplexed and run on a single MinION flow cell, delivering 150-500x coverage. Reads were aligned to a human reference genome (Hg19) using both the Graphmap and NGMLR tools prior to single nucleotide variant

(SNV) calling using nanopolish. All previously characterised coding missense mutations were correctly identified. In addition, a number of non-coding SNVs were detected and any false-positives, while rare, could be easily identified and excluded. The team found that the NGMLR alignment tool provided optimal results and recommended a coverage of >300x for accurate determination of zygosity. Using the Sniffles and nanopolish tools for structural variant analysis, the team detected a single 55 bp exonic deletion in one of the samples.

The long reads provided by nanopore sequencing allowed more accurate characterisation of this SV than was possible using previous short-read based methodology – indicating a different site of recombination than previously thought. A further advantage of the long nanopore reads was the facility to phase the variants, which helps overcome the requirement to analyse relatives in clinical research samples. Using the Whatshap tool, the team were able to confirm compound heterozygosity in all relevant samples.

18

OXFORD NANOPORE TECHNOLOGIES | ADVANCING HUMAN GENETICS RESEARCH WITH NANOPORE SEQUENCING

Page 21: Advancing human genetics …...Advancing human genetics research with nanopore sequencing 1 The advantages of nanopore sequencing for human genetics research 2 3 4 Summary References

Commenting on this research, the team state: ‘The rapid evolution of specific bioinformatic methods, and the improvements in accuracy and data yield, combined with the minimal footprint and capital investment, make the MinION a suitable platform for long-read sequencing of difficult genes such as GBA, both in the diagnostic and research environments’ 18.*

The team now plan to amplify the whole genomic region incorporating the GBA gene and pseudogene (located 20 kb downstream) to fully characterise structural variation across the entire region of interest.

VARIANT CALLING

True variants phased using WhatsHap.

Oxford Nanopore Technologies, the Wheel icon and MinION are registered trademarks of Oxford Nanopore Technologies in various countries. All other brands and names contained are the property of their respective owners. © 2018 Oxford Nanopore Technologies. All rights reserved. The MinION is for research use only.

BASECALLING& DEMULTIPLEXING

Raw data.

Reads base called and demultiplexed (Albacore now recommended platform). FASTQ file output. Only ‘pass’ reads further analysed.

Reads mapped to reference genome (hg19) using NGMLR. Coverage calculated using bedtools.

MAP TO REFERENCE

PHASING

nanoporetech.com nanoporetech.com/publications

SNVs detected using Nanopolish and structural variants detected using Sniffles.

Data analysis

1. Leija-Salazar, M. et al (2018) Detection of GBA missense mutations and other variants using the Oxford Nanopore MinION. bioRxiv 288068.

2. Nacheva, E. et al (2017) DNA isolation protocol effects on nuclear DNA analysis by microarrays, droplet digital PCR, and whole genome sequencing, and on mitochondrial DNA copy number estimation. PLoS One12:e0180467.

3. Sedlazeck, F.J. et al (2017) Accurate detection of complex structural variations using single molecule sequencing. bioRxiv 169557.

4. Quinlan, A.R. and Hall, I.M. (2010) BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26(6):841-2.

5. Loman, N.J., Quick, J. and Simpson, J.T. (2015) A complete bacterial genome assembled de novo using only nanopore sequencing data. Nature methods 12(8):733-735.

6. Martin, M. et al (2016) WhatsHap: fast and accurate read-based phasing. bioRxiv 85050.

References

Find out more about real-time, long-read amplicon sequencing at www.nanoporetech.com.

The analysis workflow below has been updated from that used by Leija-Salazar et al.1 to take advantage of the improved basecalling and functionality offered by the Albacore basecaller. Only downstream analysis tools recommended by the authors are presented; however, other tools were assessed. More information can be found in the full publication.

Figure 13 Data analysis workflow. Only downstream analysis tools recommended by the authors are presented; however, other tools were assessed18.

* Nanopore devices are currently for research use only.

19

OXFORD NANOPORE TECHNOLOGIES | ADVANCING HUMAN GENETICS RESEARCH WITH NANOPORE SEQUENCING

Page 22: Advancing human genetics …...Advancing human genetics research with nanopore sequencing 1 The advantages of nanopore sequencing for human genetics research 2 3 4 Summary References

Case study 5Targeted, amplification-free DNA sequencing using CRISPR/Cas9

Traditional amplification-based targeted sequencing approaches can be limited by a number of factors, including base composition (e.g. GC-rich content) and bias (e.g. allele bias). In addition, while long-range PCR approaches can, with careful optimisation, generate fragments in the region of 20 kb, such fragment sizes may not cover all genes or regions of interest and do not fully exploit the ultra-long read length capabilities provided by nanopore sequencing. Furthermore, the process of amplification removes all information on base modifications, thereby losing a potentially informative source of variation. In order to address these challenges, researchers are now investigating the potential of CRISPR/Cas9 techniques to enrich for specific regions of interest.

At Johns Hopkins University, USA, Timothy Gilpatrick is researching the methylation profiles of known cancer driver genes44. In order to cost-effectively target specific regions of the genome whilst maintaining epigenetic marks, Timothy employed a CRISPR/Cas9 enrichment methodology together with nanopore sequencing. Following a successful pilot study in E. coli, where 20,000-fold coverage of a 5 kb targeted fragment was observed, Timothy moved on to human DNA where he targeted the hTERT gene, which

encodes a core protein component of the telomerase complex. Telomerase, which acts to maintain the telomeric sequence at the ends of chromosomes, is inactive in most somatic cells; however in cancer cells, telomerase activity is commonly turned on. This telomerase activity may be associated with methylation of the hTERT gene promoter.

The repetitive nature and high GC content of the hTERT gene region makes it difficult to analyse using conventional PCR amplicons. Using the CRISPR/Cas9 approach on a thyroid cancer cell line, Timothy was able to demonstrate 50-fold increase in coverage of the targeted 2.4 kb region. Subsequent analysis of the nanopore sequencing data allowed identification of the epigenetic modification 5-methylcytosine with high concordance to alternative bisulfite sequencing approaches. A further benefit of the long nanopore reads is the facility for phasing and the identification of methylation patterns across both parental alleles.

20

OXFORD NANOPORE TECHNOLOGIES | ADVANCING HUMAN GENETICS RESEARCH WITH NANOPORE SEQUENCING

Page 23: Advancing human genetics …...Advancing human genetics research with nanopore sequencing 1 The advantages of nanopore sequencing for human genetics research 2 3 4 Summary References

The team at Johns Hopkins are also examining the potential of CRISPR/Cas9 enrichment to analyse the methylation status of a panel of gene promoters, which, they believe, may lead to diagnostic, prognostic and therapeutic applications.

At Tel Aviv University, Israel, researchers are applying Cas9-assisted targeting of chromosome segments (CATCH) to characterise the entire BRCA1 gene45. Mutations in BRCA1 are associated with significantly increased risk of breast, ovarian and other cancers. Due to the large size of this gene (~80 kb), current sequencing methodologies focus on the coding sequence, neglecting potentially important intronic and regulatory regions. In addition, the BRCA1 genomic region is also highly repetitive (50%), which contributes to genetic instability and genomic rearrangements, and can be difficult to analyse using short-read sequencing.

In brief, the CATCH methodology uses Cas9 to excise the targeted region, which is then separated using pulsed field gel electrophoresis, prior to amplification and sequencing. Cas9 allows enrichment of a target region without prior knowledge of its sequence. Only the sequence of flanking regions needs to be known. Applying this technique, the team enriched a 200 kb region containing the entire 80 kb BRCA1 gene, regulatory elements and non-coding regions. The target was enriched 237-fold and sequenced at up to 70x coverage on a single MinION flow cell. The data revealed a deletion of three di-nucleotide blocks in a 44 bp repeat array which was not detected using short-read sequencing. In addition, the read-lengths provided by nanopore sequencing were sufficient for analysis of structural variation. The team commented that this approach: ‘…may shed light on the mechanisms of disease onset and progression’45. They are now investigating the potential for direct sequencing without the requirement for amplification, in order to retain and study epigenetic marks.

A 200 kb region containing the entire 80 kb BRCA1 gene was enriched.

150bp TERT

Reference Allele

Alternate Allele

mutation

Figure 14 Distal, allele-specific DNA methylation patterns in the TERT locus of a BCPAP thyroid cancer cell line identified using nanopore sequencing. Black lines indicate points of CpG methylation. The alleles are distinguishable by a point mutation (green line).

Reference allele

Alternate allele

21

OXFORD NANOPORE TECHNOLOGIES | ADVANCING HUMAN GENETICS RESEARCH WITH NANOPORE SEQUENCING

Page 24: Advancing human genetics …...Advancing human genetics research with nanopore sequencing 1 The advantages of nanopore sequencing for human genetics research 2 3 4 Summary References

Case study 6Native RNA sequencing of human polyadenylated transcripts

According to Dr. Angela Brooks of the University of California, Santa Cruz: ‘Short-read sequencing has revolutionised our understanding of the transcriptome […] but there is a huge limitation’28. These limitations include the loss of positional information and base modifications through the requirement to fragment and amplify the RNA molecules respectively. In addition, amplification also leads to bias which can negatively impact results. Angela and her colleagues at University of California, Santa Cruz are part of the Nanopore RNA Consortium, which comprises laboratories from six leading universities. The aim for the Consortium is to generate a reference dataset for the human transcriptome that has been sequenced in its native form, sharing methods and data with the scientific community.

The Consortium sequenced mRNA from the GM12878 cell line using both native nanopore RNA sequencing and cDNA sequencing, generating ~13 million and ~24 million reads respectively28.

Initial analysis showed the median native RNA read length to be longer than the median cDNA read length, which may be due to PCR bias in the cDNA preparation.

Good correlation of gene expression levels was observed between the two techniques and also with data obtained from short-read sequencing of the same cell line – confirming the validity of the nanopore data set. The reproducibility of the native RNA sequencing technique was also demonstrated through the delivery of highly concordant data across all consortium laboratories.

The team are now using orthogonal data to build a high-confidence set of full-length isoforms. The longest isoform that was detected by Angela and the Consortium was for Sorl1, a >10 kb read which spanned 48 exons and has been implicated in Alzheimer’s disease.

The Consortium also demonstrated the potential of long-read RNA sequencing to detect allele-specific expression. Examining the coverage data for a number of nucleotide positions across the Xist gene, which is located on the X chromosome, allowed the identification of paternal expression bias (Figure 15).

Another area of interest for the consortium is the detection of poly-A tail length, which has been shown to play a role in post-transcriptional regulation. As the nanopore sequencing adapter sits at the 3’ end of the poly-A tail, it is possible to use specific signals in the raw data, such as dwell times, to estimate the poly-A tail length.

The Nanopore RNA Consortium provides methods and data to the scientific community.

22

OXFORD NANOPORE TECHNOLOGIES | ADVANCING HUMAN GENETICS RESEARCH WITH NANOPORE SEQUENCING

Page 25: Advancing human genetics …...Advancing human genetics research with nanopore sequencing 1 The advantages of nanopore sequencing for human genetics research 2 3 4 Summary References

The accuracy of this technique was confirmed using spike-in controls with known tail lengths.

Direct RNA sequencing also allows the analysis of base modifications which are lost when using alternative sequencing approaches. By synthesising and sequencing RNA transcripts containing only a specific, known modification and comparing these with sequences from

Figure 15 Nanopore sequencing allows identification of allele-specific expression. Figure courtesy of the Nanopore RNA Consortium.

1 @NanoporeConf | #NanoporeConf

Coverage(0–350)

Reads[squished]

Xistvariants 2487834paternal=Cmaternal=T

2487835paternal=Tmaternal=C

2487837paternal=Amaternal=T

molecules without this modification, the team were able to show clear shifts in the nanopore signal. The team are now using these model training datasets to enhance basecalling algorithms, allowing the detection of both the position and type of modification in native RNA molecules.

The Nanopore RNA Consortium data is available at: github.com/nanopore-wgs-consortium/NA12878

23

OXFORD NANOPORE TECHNOLOGIES | ADVANCING HUMAN GENETICS RESEARCH WITH NANOPORE SEQUENCING

Page 26: Advancing human genetics …...Advancing human genetics research with nanopore sequencing 1 The advantages of nanopore sequencing for human genetics research 2 3 4 Summary References

Case study 7High-throughput analysis of single cell transcriptomes

Traditionally, gene expression experiments are undertaken on samples containing millions of cells. While this allows the identification of differentially expressed genes and transcripts in distinct cell populations, many subtle differences between cells in the same sample can be overlooked. Recent advances in high-throughput cell separation and transcriptomic analysis techniques has spawned the burgeoning field of single-cell transcriptomics, which allows much more detailed analysis of gene expression.

At the University of California, Santa Cruz, Dr. Christopher Vollmers and his team are using single-cell transcriptomics to investigate gene expression in B cells. As Dr. Vollmers points out: ‘Each B cell makes a unique antibody transcript, so you really have to go at the single cell level to understand what B cells do’46.

In order to obtain full-length transcripts, which is a significant challenge when using short-read sequencing technology, the team developed a novel amplification strategy, which, when combined with the long reads delivered by nanopore sequencing, provided highly accurate, full-length reads. This Rolling Circle to

Concatermeric Consensus (R2C2) method allows the generation of a consensus sequence for each transcript, thereby increasing base accuracy (Figure 16).

Utilising this technique allowed transcriptome analysis of 96 individual B cells, delivering over 400,000 full-length cDNA reads with a median base accuracy of 94%47. Using an updated version of their Mandalorion data analysis pipeline, these reads could be used to identify high-confidence RNA transcript isoforms. A key finding of their study was that many of the B cells analysed, which were obtained from a healthy individual, express isoforms of the CD19 gene that lack the epitope targeted by CAR T-cell therapy — a discovery which may have significant implications in cancer treatment. This finding would not have been possible using short-read sequencing or without single-cell analysis.

According to the team: ‘The R2C2 method generates a larger number of accurate reads of full-length RNA transcript isoforms than any other available long-read sequencing method’46. They further comment that they: ‘...believe that R2C2 has the potential to replace short-read RNA-seq and its shotgun approach to transcriptome analysis entirely, especially considering the […] wide release of the high-throughput PromethION sequencer’47.

The R2C2 technique delivers highly accurate, full-length transcripts.

24

OXFORD NANOPORE TECHNOLOGIES | ADVANCING HUMAN GENETICS RESEARCH WITH NANOPORE SEQUENCING

Page 27: Advancing human genetics …...Advancing human genetics research with nanopore sequencing 1 The advantages of nanopore sequencing for human genetics research 2 3 4 Summary References

DNA splintGibson assembly

cDNA molecule

Accurate full-lengthcDNA sequence

Consensus calling (racon)

Subread alignment (poaV2)

SW repeat finderInaccurateraw read

Rapid library prep

Nanopore sequencing

RCA

Figure 16 Schematic of the R2C2 method. Following cDNA circularisation, rolling circle amplification creates multiple joined copies of the transcript. After sequencing, each read is split into its constituent subreads which are then aligned to generate an accurate consensus sequence. Figure courtesy of Dr. Christopher Vollmers, University of California, Santa Cruz.

25

OXFORD NANOPORE TECHNOLOGIES | ADVANCING HUMAN GENETICS RESEARCH WITH NANOPORE SEQUENCING

Page 28: Advancing human genetics …...Advancing human genetics research with nanopore sequencing 1 The advantages of nanopore sequencing for human genetics research 2 3 4 Summary References

Summary3Our knowledge of the human genome, its genes and their function has advanced considerably since the publication of the first human genome sequence in 2003. These advances have been supported by the rapid development of genomic analysis technologies, allowing faster, more detailed and more affordable genetic analyses. However, the inherent challenges of traditional short-read sequencing technologies limits their ability to fully characterise the whole spectrum of genetic variation.

The long, direct and real-time sequencing reads offered by nanopore technology delivers a step-change in human genetics research, allowing routine and complete characterisation of highly important genomic events such as structural variation, repetitive regions, phasing, RNA isoforms and base modifications. Using nanopore sequencing, researchers are now unlocking the secrets of the genome — from characterising complete centromeres to discovering new, highly expressed transcript isoforms. As stated by Dr. Winston Timp at Johns Hopkins University, USA, ‘Nanopore sequencing is a tool that can let us look at things that we couldn’t otherwise see’ 48.

26

OXFORD NANOPORE TECHNOLOGIES | ADVANCING HUMAN GENETICS RESEARCH WITH NANOPORE SEQUENCING

Page 29: Advancing human genetics …...Advancing human genetics research with nanopore sequencing 1 The advantages of nanopore sequencing for human genetics research 2 3 4 Summary References

About Oxford Nanopore Technologies4Oxford Nanopore Technologies introduced the world’s first nanopore DNA sequencer, the MinION — a portable, real-time, long-read, low-cost device – followed by the larger GridION X5 and PromethION, and smaller Flongle. The long reads offered by nanopore sequencing deliver a complete understanding of human genetic variation, allowing enhanced characterisation of structural variation, repetitive regions, haplotype phasing, RNA splice variants, isoforms and fusion transcripts.

A range of platforms are available to meet the coverage and throughput requirements for all sequencing applications (Table X).

For the latest information about applying long-read nanopore sequencing to your human genetics research, visit www.nanoporetech.com/applications.

Flongle MinION GridION X5 PromethION (1 flow cell)

PromethION (48 flow cells)

Read length Fragment length = read length. Longest read now >2 Mb

Run time1 1 min - 16 hrs 1 min - 48 hrs 1 min - 48 hrs 1 min - 64 hrs 1 min - 64 hrs

Theoretical maximum 1D Yield

Up to 3.3 Gb Up to 40 Gb Up to 200 Gb Up to 315 Gb Up to 15 Tb

Current yield range 1D

Early access to start ASAP - commercial target 1 Gb

Up to 30 Gb (Rev D Chip)

Up to 150 Gb (Rev D Chip)

Up to 150 Gb

Multiplexing enabled Not in first release

Kits for 96 samples

Kits for 96 samples

Coming soon Coming soon

Number of channels available for sequencing

Up to 126 Up to 512 Up to 2,560 Up to 3,000 Up to 144,000

27

OXFORD NANOPORE TECHNOLOGIES | ADVANCING HUMAN GENETICS RESEARCH WITH NANOPORE SEQUENCING

Page 30: Advancing human genetics …...Advancing human genetics research with nanopore sequencing 1 The advantages of nanopore sequencing for human genetics research 2 3 4 Summary References

References51. Jain, M. et al. Nanopore sequencing and

assembly of a human genome with ultra-long reads. Nat Biotechnol. 36(4):338-345 (2018).

2. Conrad, D. F. et al. Origins and functional impact of copy number variation in the human genome. Nature 464, 704–712 (2010).

3. Cretu Stancu, M. et al. Mapping and phasing of structural variation in patient genomes using nanopore sequencing. Nat Commun. 8(1):1326 (2017).

4. Miga, K.H., Eisenhart, C. and Kent, W.J. Utilizing mapping targets of sequences underrepresented in the reference assembly to reduce false positive alignments. Nucleic Acids Research 43(20): e133-e133 (2015).

5. Cao, M.D. et al. Scaffolding and completing genome assemblies in real-time with nanopore sequencing. Nat Commun. 8:14515 (2017).

6. Cawthon, R.M., Smith, K.R., O’Brien, E., Sivatchenko, A., and Kerber, R.A. Association between telomere length in blood and mortality in people aged 60 years or older. Lancet. 361 (9355): 393–5 (2003).

7. De Roeck, A. Human genome sequencing on PromethION to investigate tandem repeats in dementia. Presentation. Available at: https://nanoporetech.com/resource-centre/human-genome-sequencing-promethion-investigate-tandem-repeats-dementia [Accessed: 18 August 2018].

8. Brandler, W.M. et al. Paternally inherited cis-regulatory structural variants are associated with autism. Science. 360(6386):327-331 (2018).

9. Ishiura, H. Expansions of intronic TTTCA and TTTTA repeats in benign adult familial myoclonic epilepsy. Nat Genet. 50(4):581-590 (2018).

10. Euskirchen, P. et al. Same-day genomic and epigenomic diagnosis of brain tumors using real-time nanopore sequencing. Acta Neuropathol. 134(5):691-703 (2017).

11. Gong, L. et al. Picky comprehensively detects high-resolution structural variants in nanopore long reads. Nat Methods 15(6):455-460 (2018).

12. Kellog, E.A. Genome sequencing: Long reads for a short plant. Nature Plants 1, 15169 (2015).

13. Payne, A., Holmes, N., Rakyan, V. and Loose, M. Whale watching with BulkVis: A graphical viewer for Oxford Nanopore bulk fast5 files. bioRxiv 312256 (2018).

14. Jansen, H. The beauty and the beast. Presentation. Available at: https://nanoporetech.com/resource-centre/talk/beauty-and-beast. [Accessed: 15 June 2018]

15. Salzberg, S. Assembly of large genomes using Oxford Nanopore and Illumina data. Presentation. Available at: https://nanoporetech.com/resource-centre/assembly-large-genomes-using-oxford-nanopore-and-illumina-data [Accessed 30 July 2018]

16. Oxford Nanopore Technologies. Incorporating sequence capture into library preparation for MinION GridION and PromethION. Online. Available at: https://nanoporetech.com/resource-centre/incorporating-sequence-capture-library-preparation-minion-gridion-and-promethion [Accessed: 01 August 2018]

17. Agilent. Use of Agilent SureSelect to perform targeted long-read nanopore sequencing. Online. Available at: https://www.agilent.com/cs/library/applications/5991-8056EN-2%20Sure%20Select%20App%20Note.pdf [Accessed: 1 August 2018]

18. Leija-Salazar, M. et al. Detection of GBA missense mutations and other variants using the Oxford Nanopore MinION. BioRxiv 288068 (2018).

19. Loose, M., Malla, S., and Stout, M. Real-time selective sequencing using nanopore technology. Nat Methods. 13(9):751-4 (2016).

20. Bartsch, M. et al. Real-time selective sequencing with RUBRIC. Presentation. Available at: https://nanoporetech.com/resource-centre/real-time-selective-sequencing-rubric [Accessed 30 July 2018]

21. Steijger, T. et al. Assessment of transcript reconstruction methods for RNA-seq. Nature Methods 10, 1177–1184 (2013).

22. Martin, J. A. and Wang, Z. Next-generation transcriptome assembly. Nature Reviews Genetics 12, 671-682 (2011).

23. Clark, M. et al. Long-read sequencing reveals the splicing profile of the calcium channel gene CACNA1C in human brain. bioRxiv 260562 (2018).

28

OXFORD NANOPORE TECHNOLOGIES | ADVANCING HUMAN GENETICS RESEARCH WITH NANOPORE SEQUENCING

Page 31: Advancing human genetics …...Advancing human genetics research with nanopore sequencing 1 The advantages of nanopore sequencing for human genetics research 2 3 4 Summary References

24. Oikonomopoulos, S., Wang, Y. C., Djambazian, H., Badescu, D. and Ragoussis, J. Benchmarking of the Oxford Nanopore MinION sequencing for quantitative and qualitative assessment of cDNA populations. Scientific reports 6 (2016).

25. Garalde, D. R. et al. Highly parallel direct RNA sequencing on an array of nanopores. Nature Methods 15(3): 201–206 (2018).

26. Oxford Nanopore Technologies. Low bias RNA-seq: PCR-cDNA, PCR-free direct cDNA and direct RNA sequencing. Poster. Available at: https://nanoporetech.com/resource-centre/low-bias-rna-seq-pcr-cdna-pcr-free-direct-cdna-and-direct-rna-sequencing [Accessed: 20 August 2018]

27. Timp, W. and Jain, M. Direct RNA cDNA sequencing of the human transcriptome. Presentation. Available at: https://nanoporetech.com/resource-centre/videos/direct-rna-cdna-sequencing-human-transcriptome [Accessed: 20 August 2018]

28. Brooks, A. Native RNA sequencing of polyadenylated transcripts. Available at: https://nanoporetech.com/resource-centre/native-rna-sequencing-human-polyadenylated-transcripts [Accessed: 1 August 2018]

29. Jalkanen, A.L., Coleman, S.J. and Wilusz, J. Determinants and implications of mRNA poly(A) tail size — Does this protein make my tail look big? Semin Cell Dev Biol. 34: 24–32 (2014).

30. Weng, Y.L., An, R., Shin, J., Song, H., and Ming, G.L. DNA modifications and neurological disorders. Neurotherapeutics. (4):556-67 (2013)

31. Simpson, J.T. et al. Detecting DNA cytosine methylation using nanopore sequencing. Nature Methods 14: 407–410 doi:10.1038/nmeth.4184 (2017).

32. Smith, A.M. et al. Reading canonical and modified nucleotides in 16S ribosomal RNA using nanopore direct RNA sequencing. bioRxiv 132274 (2017).

33. Ammar, R. et al. Long read nanopore sequen-cing for detection of HLA and CYP2D6 variants and haplotypes. F1000Research 4, 17 (2015).

34. Norris, A.L. et al. Nanopore sequencing detects structural variants in cancer. Cancer Biol Ther 17(3): 246-253 (2016).

35. de Koning, A.P., Gu, W., Castoe, T.A., Batzer, M.A., and Pollock, D.D. PLoS Genet. Repetitive elements may comprise over two-thirds of the human genome. 7(12):e1002384 (2011).

36. Payne, A., Holmes, N., Rakyan, V. and Loose, M. Whale watching with BulkVis: A graphical viewer for Oxford Nanopore bulk fast5 files. bioRxiv 312256 (2018).

37. Kuderna, L.F.K. et al. Selective single molecule sequencing and assembly of a human Y chromosome of African origin. bioRxiv 342667 (2018).

38. GitHub. CANU. Available at: <www.github.com/marbl/canu> [Accessed: 20 August 2018]

39. GitHub. Nanopolish. Available at: <www.github.com/jts/nanopolish> [Accessed: 20 August 2018]

40. GitHub. Pilon. Available at: <https://github.com/broadinstitute/pilon> [Accessed: 20 August 2018]

41. Jain, M. Linear assembly of a human centromere on the Y chromosome. Nat Biotechnol. 36(4):321-323 (2018).

42. Proukakis, C. Detection of GBA missense mutations and other variants using Oxford Nanopore MinION. Presentation. Available at: https://nanoporetech.com/resource-centre/detection-gba-missense-mutations-and-other-variants-using-oxford-nanopore-minion [Accessed:1 August 2019]

43. BBC. Handheld device sequences human genome. Online. Available at: https://www.bbc.co.uk/news/health-42838821 [Accessed: 01 August 2018]

44. Gilpatrick, T. Cas9 targeted enrichment for nanopore profiling of methylation at known cancer drivers. Presentation. Available at: https://nanoporetech.com/resource-centre/cas9-targeted-enrichment-nanopore-profiling-methylation-known-cancer-drivers [Accessed: 1 August 2018]

45. Gabrieli, T. et al. Selective nanopore sequencing of human BRCA1 by Cas9-assisted targeting of chromosome segments (CATCH). Nucleic Acids Res. gky411 (2018).

46. Vollmers, C. Improving MinION read accuracy to enable the high-throughput analysis of single cell transcriptomes. Presentation. Available at: https://nanoporetech.com/resource-centre/improving-minion-read-accuracy-enable-high-throughput-analysis-single-cell [Accessed: 1 August 2018]

47. Volden, R. et al. Improving nanopore read accuracy with the R2C2 method enables the sequencing of highly-multiplexed full-length single-cell cDNA. Proc Natl Acad Sci U S A. doi: 10.1073/pnas.1806447115 (2018).

48. Timp, W. Direct RNA-seq project shows nanopore sequencing can reveal new insights into basic biology. Podcast. Available at: https://mendelspod.com/podcasts/direct-rna-seq-project-shows-nanopore-sequencing-can-reveal-new-insights-basic-biology [Accessed: 17 August 2018]

49. Jansen, H. The beauty and the beast. Presentation. Available at: https://nanoporetech.com/resourcecentre/talk/beauty-and-beast [01 August 2018]

50. Jenjaroenpun, P. Complete genomic and transcriptional landscape analysis using third-generation sequencing. Nucleic Acids Res. 46(7):e38 (2018).

29

OXFORD NANOPORE TECHNOLOGIES | ADVANCING HUMAN GENETICS RESEARCH WITH NANOPORE SEQUENCING

Page 32: Advancing human genetics …...Advancing human genetics research with nanopore sequencing 1 The advantages of nanopore sequencing for human genetics research 2 3 4 Summary References

Oxford Nanopore Technologies

phone +44 (0)845 034 7900

email [email protected]

twitter @nanopore

www.nanoporetech.com

Oxford Nanopore Technologies, the Wheel icon, GridION, Flongle, Metrichor, MinION, MinIT, MinKNOW, PromethION, SmidgION and VolTRAX are registered trademarks of Oxford Nanopore Technologies in various countries. All other brands and names contained are the property of their respective owners. © 2018 Oxford Nanopore Technologies. All rights reserved. Flongle, GridION, MinION, PromethION and VolTRAX are currently for research use only.

HG_W1010_v1_revA_03Oct2018