22
Additional File 1 Figure S1 – Quality of PacBio Reads Figure S2 – Comparison Long and Short data sets Figure S3 – Within transcript variation Figure S4 – No increased bias for longer ERCC transcript Figure S5 – Verification of isoforms by Sanger and amplification-free PacBio seuqencing Figure S6 – Major isoform expression Figure S7 – Coding and non-coding variation Figure S8 – Example ideal read Figure S9 – Example concatenated read Figure S10 – List of custom oligonucleotides

Additional File 1 Figure S1 – Quality of PacBio …10.1186...Figure S2 Comparison between annotated and sequenced gene length for the short and long data set as well as the conservative

  • Upload
    others

  • View
    4

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Additional File 1 Figure S1 – Quality of PacBio …10.1186...Figure S2 Comparison between annotated and sequenced gene length for the short and long data set as well as the conservative

Additional File 1 Figure S1 – Quality of PacBio Reads Figure S2 – Comparison Long and Short data sets Figure S3 – Within transcript variation Figure S4 – No increased bias for longer ERCC transcript Figure S5 – Verification of isoforms by Sanger and amplification-free PacBio seuqencing Figure S6 – Major isoform expression Figure S7 – Coding and non-coding variation Figure S8 – Example ideal read Figure S9 – Example concatenated read Figure S10 – List of custom oligonucleotides

Page 2: Additional File 1 Figure S1 – Quality of PacBio …10.1186...Figure S2 Comparison between annotated and sequenced gene length for the short and long data set as well as the conservative

Figure S1 (A) Quality of PacBio reads. Comparison of read length between the first 5 SMRT cells and the last 2 SMRT cells. As mentioned in Methods there was an added purification step and a change in chemistry between the first five SMRT cells and the last two. Note the change of scales on the vertical axis for the last two SMRT cells. (B) GC distribution among all pooled PacBio sequences. The theoretical distribution was calculated from the observed data and used to build a reference distribution. (C) Average quality per read. (D) Sequence nucleotide content across all bases. Note the shift in scale in the X-axis to allow for individual bases at the start of the sequence. Note also that few reads are longer than 5 kb. All subfigures in Figure S1 were made with the software FastQC

01020

4050

30

%T%C%A%G

All PacBio Reads First 5 SMRT cells Last 2 SMRT cells

80’000

40’000

0

60’000

20’000

2000

-249

9

500-

990

3500

-399

9

5000

-549

9

6500

-699

9

8000

-849

9

9500

-999

9

80’000

40’000

0

60’000

20’000

2000

-249

9

500-

990

3500

-399

9

5000

-549

9

6500

-699

9

8000

-849

9

9500

-999

9

4’000

0

6’000

2’000

2200

-239

9

600-

799

3800

-399

9

5400

-559

9

7000

-719

9

8600

-879

9

2’000

0

3’000

1’000

600 10 20 9040 50 70 8030GC distribution

Num

ber o

f rea

ds

GC count per readTheoretical Distribution

Sequence Length Sequence Length Sequence Length

Average quality per read

4’000

0

6’000

2’000

8’000

10’000

10 20 4030Mean Sequence Quality (Phred Score)

Sequence content across all bases

Position in read (bp)

1 95

2200

-239

9

600-

799

3800

-399

9

5400

-559

9

7000

-719

9

8600

-879

9

1020

0-10

399

A

B C

D

Sequence Length (bp)Sequence Length (bp) Sequence Length (bp)

Num

ber o

f rea

ds

Page 3: Additional File 1 Figure S1 – Quality of PacBio …10.1186...Figure S2 Comparison between annotated and sequenced gene length for the short and long data set as well as the conservative

Figure S2 Comparison between annotated and sequenced gene length for the short and long data set as well as the conservative data set (see Methods), which contains reads from both the long and short data set. (A) Scatterplot depicting for each gene the longest sequenced transcript and the longest annotated isoform in bp. (B) Distribution of transcript lengths. The green bars show number of observed transcripts, while the red bars shows actual annotated lengths of the corresponding genes.

Page 4: Additional File 1 Figure S1 – Quality of PacBio …10.1186...Figure S2 Comparison between annotated and sequenced gene length for the short and long data set as well as the conservative

Figure S3 Within-transcript variation based on reads sharing the same UMI, showing technical errors in isoform boundaries. (A) Histograms showing the offset from median alignment position for 5ʹ ends, 3ʹ ends and splice sites. (B) Magnified view of 5ʹ and 3ʹ ends. (C) Magnified view of exon junctions.

24403

0.00%

0.25%

0.50%

0.75%

<−100

−90

−80

−70

−60

−50

−40

−30

−20

−10 −9 −8 −7 −6 −5 −4 −3 −2 −1 0 1 2 3 4 5 6 7 8 9 10 20 30 40 50 60 70 80 90

>100

119793

0.00%

0.25%

0.50%

0.75%

<−100

−90

−80

−70

−60

−50

−40

−30

−20

−10 −9 −8 −7 −6 −5 −4 −3 −2 −1 0 1 2 3 4 5 6 7 8 9 10 20 30 40 50 60 70 80 90

>100

24403

0%10%20%30%

100%

<−100

−90

−80

−70

−60

−50

−40

−30

−20

−10 −9 −8 −7 −6 −5 −4 −3 −2 −1 0 1 2 3 4 5 6 7 8 9 10 20 30 40 50 60 70 80 90

>100

24403

0%10%20%30%

100%<−

100

−90

−80

−70

−60

−50

−40

−30

−20

−10 −9 −8 −7 −6 −5 −4 −3 −2 −1 0 1 2 3 4 5 6 7 8 9 10 20 30 40 50 60 70 80 90

>100

119793

0.00%

0.25%

0.50%

0.75%

<−100

−90

−80

−70

−60

−50

−40

−30

−20

−10 −9 −8 −7 −6 −5 −4 −3 −2 −1 0 1 2 3 4 5 6 7 8 9 10 20 30 40 50 60 70 80 90

>100

24403

0.00%

0.25%

0.50%

0.75%

<−100

−90

−80

−70

−60

−50

−40

−30

−20

−10 −9 −8 −7 −6 −5 −4 −3 −2 −1 0 1 2 3 4 5 6 7 8 9 10 20 30 40 50 60 70 80 90

>100

24403

<−1000

−1000

−100 −10 0 10 100

1000

>1000

119793

<−1000

−1000

−100 −10 0 10 100

1000

>1000

24403

<−1000

−1000

−100 −10 0 10 100

1000

>1000

24403

<−1000

−1000

−100 −10 0 10 100

1000

>1000

e

Transcripts5 prime 3 prime

ERCC5 prime 3 primeA

BZoom in 5 prime Zoom in 3 prime

ERCC

Tran

scrip

ts

C Zoom in exon junctions

a b c d e f

b d

a f

c

119793

<−1000

−1000

−100 −10 0 10 100

1000

>1000

24403

<−1000

−1000

−100 −10 0 10 100

1000

>1000

20%10%

30%11307

<−1000

−1000

−100 −10 0 10 100

1000

>1000

11307<−

1000

−1000

−100 −10 0 10 100

1000

>1000

11307

0%10%20%30%

100%

<−100

−90

−80

−70

−60

−50

−40

−30

−20

−10 −9 −8 −7 −6 −5 −4 −3 −2 −1 0 1 2 3 4 5 6 7 8 9 10 20 30 40 50 60 70 80 90

>100

11307

0%10%20%30%

100%

<−100

−90

−80

−70

−60

−50

−40

−30

−20

−10 −9 −8 −7 −6 −5 −4 −3 −2 −1 0 1 2 3 4 5 6 7 8 9 10 20 30 40 50 60 70 80 90

>100

100%

100% 100%

100%

100%

Page 5: Additional File 1 Figure S1 – Quality of PacBio …10.1186...Figure S2 Comparison between annotated and sequenced gene length for the short and long data set as well as the conservative

Figure S4 Histogram showing the offset from median for 5ʹ , 3ʹ ends and splice sites, in terms of percentage of zero offset for reads mapping to ERCC, split by ERCC length. Red, observed number of transcripts. Note that the Y-axis is truncated and that the X-axis is shown in bins of 10 bp outside the region of ± 10 bp. Note also that there were no ERCC transcripts with length between 1300 and 1900 bp and that the number of transcripts longer than 1900 bp are very few.

1510

0%10%20%30%

100%

<−100

−90

−80

−70

−60

−50

−40

−30

−20

−10 −9 −8 −7 −6 −5 −4 −3 −2 −1 0 1 2 3 4 5 6 7 8 9 10 20 30 40 50 60 70 80 90

>100

5’ end 3’ end

ERCC

500

-900

bp

1510

0%10%20%30%

100%

<−100

−90

−80

−70

−60

−50

−40

−30

−20

−10 −9 −8 −7 −6 −5 −4 −3 −2 −1 0 1 2 3 4 5 6 7 8 9 10 20 30 40 50 60 70 80 90

>100

ERCC

900

-130

0 bp

ERCC

190

0-20

00 b

p

3248

0%10%20%30%

100%

<−100

−90

−80

−70

−60

−50

−40

−30

−20

−10 −9 −8 −7 −6 −5 −4 −3 −2 −1 0 1 2 3 4 5 6 7 8 9 10 20 30 40 50 60 70 80 90

>100

3248

0%10%20%30%

100%

<−100

−90

−80

−70

−60

−50

−40

−30

−20

−10 −9 −8 −7 −6 −5 −4 −3 −2 −1 0 1 2 3 4 5 6 7 8 9 10 20 30 40 50 60 70 80 90

>100

11

0%10%20%30%

100%

<−100

−90

−80

−70

−60

−50

−40

−30

−20

−10 −9 −8 −7 −6 −5 −4 −3 −2 −1 0 1 2 3 4 5 6 7 8 9 10 20 30 40 50 60 70 80 90

>100

11

0%10%20%30%

100%

<−100

−90

−80

−70

−60

−50

−40

−30

−20

−10 −9 −8 −7 −6 −5 −4 −3 −2 −1 0 1 2 3 4 5 6 7 8 9 10 20 30 40 50 60 70 80 90

>100

Page 6: Additional File 1 Figure S1 – Quality of PacBio …10.1186...Figure S2 Comparison between annotated and sequenced gene length for the short and long data set as well as the conservative

Figure S5A Sanger Verification of CNP (2',3'-Cyclic Nucleotide 3' Phosphodiesterase) A| Simplification of gel images. Each band is annotated with a letter and the corresponding alignment of Sanger sequences are shown in C. Each band is also annotated for if the it was Sanger sequenced and if the Sanger sequence mapped to the correct position in the genome. For Oligo 1.1, Oligo 5.1, Oligo 5.2 and VLMC 2 Pacbio single-cell data exists (marked in black). For cell Oligo 1.2 Pacbio and limited Sanger sequencing exists (marked in red). Additionally Sanger sequencing was done on a cell not sequenced by Pacbio (Oligo 1 external cell, marked in blue) B| Gel visualization of PCR products C| Genomic position of the different primerpairs. The number in brackets is the expected PCR product length according to according to NCBI’s Primer-Blast.

1 2 3 4

1 2 3 4

chr11:

5 kb mm10100,576,000 100,578,000 100,580,000 100,582,000

PP1 (210)

PP2 (Na)

PP3 (2069)

PP4 (855)

1a1b 2a

3a3b

3c

Annotated isoforms RefSeq

Sanger sequencing

PacBio sequencing

Primer pairs

Olig

o 1.

1O

ligo

1.2

Olig

o 5.

1 O

ligo

5.2

VLM

C 2

100

500

100

500

PP1

PP2

PP3

PP4

Ladder

abc

a

a

b

a

a

a

b

b

Sanger seq didn’t workSequenced and Correct mapping

Not sequenced but Correct mapping in another cellNot sequenced

b

4a

Olig

o 1.

2 O

ligo

1 in

depe

nden

tO

ligo

1.2

Olig

o 1

inde

pend

ent

Sanger verified exon isoforms

A

B

C

D

E

F

Exons: 1 2 3 45’ 3’

1159

2

333

Verified Not verified

Page 7: Additional File 1 Figure S1 – Quality of PacBio …10.1186...Figure S2 Comparison between annotated and sequenced gene length for the short and long data set as well as the conservative

D| Sanger sequencing results mapped to the mouse genome E| Isoforms verified by Sanger sequencing:

• PP1 verified that exon 1 isn’t an artifact and that when exon 1 is prescent then exon 2 is truncated

• PP2 verified that full length exon 2 exists, but failed to show internal introns in exon 2 that could be seen in the sequencing data

• PP3 verified isoforms with exon 3 exists but failed to show isoforms without exon 3

• PP4 failed to verify exon 4 internal introns. The number to the left of each isoform indicate how many instances of that exon isoform that were sequenced in all single cells combined. Note that most sequences comes from Oligo 1.1, and that sanger sequencing was done on an independent cell. F| Single cell isoforms from PacBio sequencing Na, Not applicable, i.e. no valid primer product in primer blast

Figure S5B Amplification-free PacBio sequencing of CNP (2',3'-Cyclic Nucleotide 3' Phosphodiesterase) The single cell PacBio exon isoforms that are verified by both sanger sequencing and amplification-free bulk sequencing are colored green. If either sanger sequencing or bulk sequencing could verify an isoform it is colored yellow. If no method could verify the isoform it is colored red.

chr11:

2 kb mm10

100,576,000 100,578,000 100,580,000 100,582,000

Amplification-free bulk sequencing

Annotated isoforms RefSeq

Verification of PacBio suggested exon isoforms

1159

2

333

Verified by either sanger and amplification-free sequencingVerified by both sanger and amplification-free sequencing

Neither verified by sanger nor amplification-free sequencing

Page 8: Additional File 1 Figure S1 – Quality of PacBio …10.1186...Figure S2 Comparison between annotated and sequenced gene length for the short and long data set as well as the conservative

Figure S5C Sanger Verification Hmgcs1 (3-Hydroxy-3-Methylglutaryl-CoA Synthase 1) A| Simplification of gel images. Each band is annotated with a letter and the corresponding alignment of Sanger sequences are shown in C. Each band is also annotated for if the it was Sanger sequenced and if the Sanger sequence mapped to the correct position in the genome. For Oligo 1.1 and Oligo 5.2 Pacbio single-cell data exists (marked in black). For cell Oligo 1.2 Pacbio and limited Sanger sequencing exists (marked in red). Additionally Sanger sequencing was done on a cell not sequenced by Pacbio (Oligo 1 external cell, marked in blue) B| Gel visualization of PCR products C| Genomic position of the different primerpairs. The number in brackets is the expected PCR product length according to according to NCBIs Primer-Blast.

1 2 3 4 5 6

a

b

a

a

chr13:5 kb mm10

119,695,000 119,700,000 119,705,000

PP1 (151, 210) PP2 (456)O

ligo

1 in

depe

nden

t O

ligo

1.2

Olig

o 1.

2

a

a

2a1a1a1c1b

1 2

1 2

a

abc

100

500

100

500

Annotated isoforms RefSeq

Olig

o 1.

1O

ligo

1.2

Olig

o 5.

2

Sanger sequencing

PacBio seqeucning

Primer pairs

Olig

o 1

inde

pend

ent

Ladd

er

PP1

PP2

Sanger verified exon isoforms33

21

7

25

Sanger seq didn’t workSequenced and Correct mapping

Not sequenced but Correct mapping in another cellNot sequenced

Verified Not verified

A

B

C

D

E

F

Exons: 1 2 3 4 5 6 7 8 910 11

100

Page 9: Additional File 1 Figure S1 – Quality of PacBio …10.1186...Figure S2 Comparison between annotated and sequenced gene length for the short and long data set as well as the conservative

D| Sanger sequencing results mapped to the mouse genome E| Isoforms verified by Sanger sequencing:

• PP1 verified the existence of three isoforms of exon 2, either it is absent, or it has a short or long version of the exon.

• PP2 failed to verify internal introns in exon 11 The number to the left of each isoform indicate how many instances of that exon isoform that were sequenced in all single cells combined. F| Single cell isoforms from PacBio sequencing

Figure S5D Amplification-free PacBio sequencing of Hmgcs1 (3-Hydroxy-3-Methylglutaryl-CoA Synthase 1) The single cell PacBio exon isoforms that are verified by both sanger sequencing and amplification-free bulk sequencing are colored green. If either sanger sequencing or bulk sequencing could verify an isoform it is colored yellow. If no method could verify the isoform it is colored red.

chr13:

10 kb mm10

119,690,000 119,695,000 119,700,000 119,705,000 119,710,000

Amplification-free bulk sequencing

Annotated isoforms RefSeq

33

21

7

25

Verification of PacBio suggested exon isoforms

Verified by either sanger and amplification-free sequencingVerified by both sanger and amplification-free sequencing

Neither verified by sanger nor amplification-free sequencing

Page 10: Additional File 1 Figure S1 – Quality of PacBio …10.1186...Figure S2 Comparison between annotated and sequenced gene length for the short and long data set as well as the conservative

Figure S5E Sanger Verification Mbp (Myelin Basic Protein) A| Simplification of gel images. Each band is annotated with a letter and the corresponding alignment of Sanger sequences are shown in C. Each band is also annotated for if the it was Sanger sequenced and if the Sanger sequence mapped to the correct position in the genome. For Oligo 1.1 Pacbio sequencing, PCR amplification and limited Sanger sequencing data exists (marked in green). For Oligo 1.2 Pacbio and Sanger sequencing exists (marked in red). For Oligo 5.1 and Oligo 5.2 Pacbio single-cell data exists (marked in black). Additionally Sanger sequencing was done on a cell not sequenced by Pacbio (Oligo 1 external cell, marked in blue) B| Gel visualization of PCR products C| Genomic position of the different primerpairs. The number in brackets is the expected PCR product length according to according to NCBI’s Primer-Blast. Note that PP6, PP7 and PP8 are spanning exon junctions.

chr18:10 kb mm10

82,560,000 82,565,000 82,570,000 82,575,000 82,580,000 82,585,000

100

500La

dder

Ladd

erPP1

PP2

PP3

PP4

PP5

PP6

PP7

PP8

PP1 (173,251)

PP2 (258,291,336,369)

PP3 (250,283,373,406)

PP4 (354,387,465,477,510,555,588)

PP5 (1289)

PP6 (101) PP7 (185)

PP8 (80)

ab

ab

a

baba

ab

ab a

b

a

ab

ab

ab

ab

ab

ab

ab a

b

acd

abc

a

a

a

a

a

5e1b

4a4b

4a

2b2a

4c

1a

4b

6a3a

7b

5a

e

5c5b

Olig

o 1.

1O

ligo

1.2

Olig

o 1

inde

pend

ent

Olig

o 1.

1O

ligo

1.2

Olig

o 5.

1 O

ligo

5.2

Anno

tate

d Re

fSeq

Isof

orm

s

100

500

100

500

1 2 3 4 5 6 7 8

1 2 3 4 5 6 7 8

1 2 3 4 5 6 7 8

Sanger sequencing

PacBio seqeucning

Primer pairsOligo 1.1

Oligo 1.2

Oligo 1 independent

Sanger verified isoforms163516

85*

A

B

C

D

E

FSanger seq didn’t workSequenced and Correct mapping

Not sequenced but Correct mapping in another cellNot sequenced

25

Exons: 1 2 3 4 65 7

Verified Not verifiedNot tested

Page 11: Additional File 1 Figure S1 – Quality of PacBio …10.1186...Figure S2 Comparison between annotated and sequenced gene length for the short and long data set as well as the conservative

D| Sanger sequencing results mapped to the mouse genome E| Isoforms verified by Sanger sequencing:

• PP1 and PP6 verifies both isoforms of exon 2. (Note PP6 upstream primer spans the junction between exon 2 and 3)

• PP2 also verifies both isoforms of exon 2 and that exon 3, 4 and 5 are expressed together.

• PP3 verified one of the isoforms with exon 6, however the isoform without exon 6 could be seen by gel electrophoresis but not be verified by Sanger sequencing

• PP4 verifies both the existence of 2 isoforms of exon 2 and 2 isoforms of exon 6

• PP5 verified the existence of internal introns in exon 7, and that those isoforms varies from cell to cell.

• PP7 verified the connection between exon 5 and 7 (note that the upstream primer spans the exon junction between exon 5 and 7).

• PP8 failed to map. (Note that the downstream primer spans the exon junction between exon 6 and 7).

The number to the left of each isoform indicate how many instances of that exon isoform that were sequenced in all single cells combined. The (*) indicated all isoforms with internal intron in exon 7. Note that not all exon isoforms were attempted to be verified by Sanger sequencing, for example the isoform spanning only exon 1 and 7. F| Single cell isoforms from PacBio sequencing

chr18:10 kb mm10

82,560,000 82,570,000 82,580,000 82,590,000

Amplification-free bulk sequencing

Annotated isoforms RefSeq

Exon isoforms single cell sequencing

8

1616

36

255*

Verified by either sanger and amplification-free sequencingVerified by both sanger and amplification-free sequencing

Neither verified by sanger nor amplification-free sequencing

Page 12: Additional File 1 Figure S1 – Quality of PacBio …10.1186...Figure S2 Comparison between annotated and sequenced gene length for the short and long data set as well as the conservative

Figure S5F Amplification-free PacBio sequencing of Mbp (Myelin Basic Protein) The single cell PacBio exon isoforms that are verified by both sanger sequencing and amplification-free bulk sequencing are colored green. If either sanger sequencing or bulk sequencing could verify an isoform it is colored yellow. If no method could verify the isoform it is colored red. The star denotes that all isoforms with 3ʹ UTR introns are calculated, regardless of other exon structures and the length of the intron.

Page 13: Additional File 1 Figure S1 – Quality of PacBio …10.1186...Figure S2 Comparison between annotated and sequenced gene length for the short and long data set as well as the conservative

Figure S5G Sanger Verification Mobp (Myelin-Associated Oligodendrocyte Basic Protein) A| Simplification of gel images. Each band is annotated with a letter and the corresponding alignment of Sanger sequences are shown in C. Each band is also annotated for if the it was Sanger sequenced and if the Sanger sequence mapped to the correct position in the genome. For Oligo 1.1, Oligo 5.1 and Oligo 5.2 Pacbio single-cell data exists (marked in black). For cell Oligo 1.2 Pacbio and limited Sanger sequencing exists (marked in red). Additionally Sanger sequencing was done on a cell not sequenced by Pacbio (Oligo 1 external cell, marked in blue) B| Gel visualization of PCR products C| Genomic position of the different primerpairs. The number in brackets is the expected PCR product length according to according to NCBI’s Primer-Blast. D| Sanger sequencing results mapped to the mouse genome

1 2 3 4 5 6

1a3b3a

4a3c6a

5b5a

PP1 (Na) PP2 (408)

PP4 (512)

PP5 (2654)

PP6 (Na)

PP3 (349)O

ligo

1.2

Olig

o 1

inde

pend

ent

a a

a

a

a

aa

b

b

c

Annotated isoforms RefSeq

Olig

o 1.

2

Olig

o 5.

1

Olig

o 5.

2

1 2 3 4 5 6

100

500

100

500

Olig

o 1.

2 O

ligo

1 in

depe

nden

t

Primer pairs

Sanger sequencing

chr9:10 kb mm10

120,155,000 120,160,000 120,165,000 120,170,000 120,175,000 120,180,000

Olig

o 1.

1

PacBio seqeucning

Sanger verified isoforms

PP1

PP2

PP3

PP4

PP5

PP6

Ladd

er

Sanger seq didn’t workSequenced and Correct mapping

Not sequenced but Correct mapping in another cellNot sequenced

A

B

C

D

E

F

3

178

74

2 Verified

Not verifiedNot tested

Exons: 1 2 3 4 5

3

Page 14: Additional File 1 Figure S1 – Quality of PacBio …10.1186...Figure S2 Comparison between annotated and sequenced gene length for the short and long data set as well as the conservative

E| Isoforms verified by Sanger sequencing:

• PP1 verifies the existence of an exon between exon 2 and 3. Note that the forward Sanger sequencing primer is on exon 2.

• PP2 didn’t work • PP3 verifies two different lengths of exon 4 • PP4 verifies a long version of exon 5 • PP5 verifies an internal intron in exon 5 • PP6 verifies that if exon 5 is short if exon 6 and 7 exist

The number to the left of each isoform indicate how many instances of that exon isoform that were sequenced in all single cells combined. Due to space limitation only exon isoforms with more than 1 UMI are shown. F| Single cell isoforms from PacBio sequencing Na, Not applicable, i.e. no valid primer product in primer blast

Figure S5H Amplification-free PacBio sequencing of Mobp (Myelin-Associated Oligodendrocyte Basic Protein) The single cell PacBio exon isoforms that are verified by both sanger sequencing and amplification-free bulk sequencing are colored green. If either sanger sequencing or bulk sequencing could verify an isoform it is colored yellow. If no method could verify the isoform it is colored red.

chr9:10 kb mm10

120,150,000 120,160,000 120,170,000 120,180,000

Amplification-free bulk sequencing

Annotated isoforms RefSeq

Exon isoforms single cell sequencing

3

178

74

23

Verified by either sanger and amplification-free sequencingVerified by both sanger and amplification-free sequencing

Neither verified by sanger nor amplification-free sequencing

Page 15: Additional File 1 Figure S1 – Quality of PacBio …10.1186...Figure S2 Comparison between annotated and sequenced gene length for the short and long data set as well as the conservative

Figure S5I Sanger Verification Plp1 (Proteolipid Protein 1) A| Simplification of gel images. Each band is annotated with a letter and the corresponding alignment of Sanger sequences are shown in C. Each band is also annotated for if the it was Sanger sequenced and if the Sanger sequence mapped to the correct position in the genome. For all cells except Oligo 1.2 only Pacbio sequencing data exists (marked in black). For cell Oligo 1.2 Pacbio and limited Sanger sequencing exists (marked in red). Additionally Sanger sequencing was done on a cell not sequenced by Pacbio (Oligo 1 external cell, marked in blue) B| Gel visualization of PCR products C| Genomic position of the different primerpairs. The number in brackets is the expected PCR product length according to according to NCBI’s Primer-Blast. D| Sanger sequencing results mapped to the mouse genome

1 2 3

acd e

a

1 2 3

a

b

fa

chrX:

Oligo 1.1

Oligo 5.1

Oligo 5.2

VLMC 1

VLMC 2

10 kb mm10

136,820,000 136,825,000 136,830,000 136,835,000 136,840,000

PP1 (280, 385)

1a1b

PP2 (1921)

PP3 (1070)

3a

2f3a

2d2c2b2a

b

2e

Annotated isoforms Refseq

Oligo 1.2

Olig

o 1.

2 O

ligo

1 in

depe

nden

tO

ligo

1.2 a

b

Olig

o 1

inde

pend

ent

a

b

fa

100

500

100

500

Primer pairs

Sanger sequencing

Sanger verified isoforms

PacBio seqeucning

133408

2710

Verified

Not verifiedNot tested

Sanger seq didn’t workSequenced and Correct mapping

Not sequenced but Correct mapping in another cellNot sequenced

PP1

PP2

PP3

Ladd

er

A

B

C

D

E

F

Exons: 1 2 3 84 5 6

Page 16: Additional File 1 Figure S1 – Quality of PacBio …10.1186...Figure S2 Comparison between annotated and sequenced gene length for the short and long data set as well as the conservative

E| Isoforms verified by Sanger sequencing:

• PP1 verified two isoforms of exon 3 • PP2 and PP3 verified multiple internal intron of exon 8

The number to the left of each isoform indicate how many instances of that exon isoform that were sequenced in all single cells combined. Due to space limitation only exon isoforms with more than 1 UMI are shown. F| Single-cell isoforms from Pacbio sequencing. Note that for all cells except Oligo 1.2 the Pacbio sequencing isoforms have been collapsed due to space limitations. ¨

Figure S5J Amplification-free PacBio sequencing of Plp1 (Proteolipid Protein 1)

chrX:

10 kb mm10

136,825,000 136,830,000 136,835,000 136,840,000

Amplification-free bulk sequencing

Plp1

Annotated isoforms Refseq

Exon isoforms single cell sequencing

133408

2710

Verified by either sanger and amplification-free sequencingVerified by both sanger and amplification-free sequencing

Neither verified by sanger nor amplification-free sequencing

Page 17: Additional File 1 Figure S1 – Quality of PacBio …10.1186...Figure S2 Comparison between annotated and sequenced gene length for the short and long data set as well as the conservative

The single cell PacBio exon isoforms that are verified by both sanger sequencing and amplification-free bulk sequencing are colored green. If either sanger sequencing or bulk sequencing could verify an isoform it is colored yellow. If no method could verify the isoform it is colored red.

Page 18: Additional File 1 Figure S1 – Quality of PacBio …10.1186...Figure S2 Comparison between annotated and sequenced gene length for the short and long data set as well as the conservative

Figure S6 Major isoform usage. A) Percentage major isoform was calculated across all genes with more than 10 transcripts for each cell. The major isoform was defined per gene and cell and could therefore change between cells. B) Percentage major isoform and number of exons per gene. Each dot is a gene. C) Percentage major isoform and gene expression. Each dot is a gene.

Page 19: Additional File 1 Figure S1 – Quality of PacBio …10.1186...Figure S2 Comparison between annotated and sequenced gene length for the short and long data set as well as the conservative

Figure S7 Zoom-in on coding and non-coding exon splice sites. Histogram showing the offset from median for splice sites, in terms of percentage of zero offset. The data for coding and non-coding region is the same as in figure 3A, but. The data for within transcript variation is the same as in Additional file 1: Figure S3, but zoomed-in. Note that the Y-axis is truncated and that the X-axis is shown in bins of 10 bp outside the region of ± 10 bp.

60558

0.00%

0.25%

0.50%

0.75%

100%

2595

0.00%

0.25%

0.50%

0.75%

100%

Transcripts5 prime 3 prime

a b c d e f

Codi

ngN

on-c

odin

gW

ithin

UM

I var

iatio

n

Position “b”

Non

-cod

ing

With

in U

MI v

aria

tion

Position “c”

Codi

ng

Codi

ngN

on-c

odin

gW

ithin

UM

I var

iatio

n

Position “d”

Codi

ngN

on-c

odin

gW

ithin

UM

I var

iatio

n

Position “e”

119793

0.00%

0.25%

0.50%

0.75%

100%

<−100

−90

−80

−70

−60

−50

−40

−30

−20

−10 −9 −8 −7 −6 −5 −4 −3 −2 −1 0 1 2 3 4 5 6 7 8 9 10 20 30 40 50 60 70 80 90

>100

24403

0.00%

0.25%

0.50%

0.75%

100%

<−100

−90

−80

−70

−60

−50

−40

−30

−20

−10 −9 −8 −7 −6 −5 −4 −3 −2 −1 0 1 2 3 4 5 6 7 8 9 10 20 30 40 50 60 70 80 90

>100

9592

0.00%

0.25%

0.50%

0.75%

100%

58828

0.00%

0.25%

0.50%

0.75%

100%12075

0.00%

0.25%

0.50%

0.75%

100%

3455

0.00%

0.25%

0.50%

0.75%

100%

4325

0.00%

0.25%

0.50%

0.75%

100%972

0.00%

0.25%

0.50%

0.75%

100%

119793

0.00%

0.25%

0.50%

0.75%

100%

<−100

−90

−80

−70

−60

−50

−40

−30

−20

−10 −9 −8 −7 −6 −5 −4 −3 −2 −1 0 1 2 3 4 5 6 7 8 9 10 20 30 40 50 60 70 80 90

>100

24403

0.00%

0.25%

0.50%

0.75%

100%

<−100

−90

−80

−70

−60

−50

−40

−30

−20

−10 −9 −8 −7 −6 −5 −4 −3 −2 −1 0 1 2 3 4 5 6 7 8 9 10 20 30 40 50 60 70 80 90

>100

Page 20: Additional File 1 Figure S1 – Quality of PacBio …10.1186...Figure S2 Comparison between annotated and sequenced gene length for the short and long data set as well as the conservative

Figure S8: Example ideal read >m150922_014300_42134_c100884922550000001823198604021672_s1_p0/3975/ccs GATCTCTACTATATGCGAATGATACGGCGACCACCGATGGTATAGGGATCAGTAGATTTGAACAGAGCCAGAGGTACCATGCTCTTGCCAGTAAAATTCTTTTCCCATGTTTTTGAATACATTTGTATTTAATGTGGCTGAAATGACAAAAACAAAGGTCCTAAATTTTAGAATAGATGTTATCTTTTTATTTAATCTTTTAAAAAAAAAACTGGATGTTTCAATCTTTTAAATTGCAAGACACAAGGCAATTCCAACTGGGCATTTCAATCTGGTTTTAGGTGCTTTAGAGATAATTGCTAGGCAGTGCCAAATGAAGGCATTAGTGCTTTGTACCCAGAAATTGGCCTCTACATGCAGCACTAAGATTAATGCAGATTTCTCTTGAACCTTCCAGTCTTCCTTGTTGGTGGTAATGTGCTTTGTTCTTTCCTTTTTTTCCTCAAGAAGAAATGTTAGTATGTCTTAGAACCTAGGTAACGAAGTGCACTTATGCTCATAAAATTGCTTGCACCTAATCCAAAAGTCGTGGTCTTGCCTTAGTCCATTAACATGGTCTTCATTTCCAGTTTGCATTGACTGCTACAGCACTTACTAATTTGGCTTTTATGTTTAAATGGTTCTGTGATAATCAAAACTTAAATAGTTACTGTTCGTTCATTGTGGTACTTTTGTATTCAGTTCTGTACTCAGAGATGGCACCAATTGTTCATTTACTATGCACCACCTACTGCTTATATAAGAGAACTAATAACTCTTTCAGCCGATACTTAACACATCATGTATTCCTTGCAGCTTCTAGGACATCTTCCATACATGTATGGATGTGTACATACATATACATGTGTACATACTGGTATGGCCCAGCACATATTTCTTAAGCTCTGCAGCCATATATAGAAGTCCTTGTTATTTGGGAGGACAGATTAAATGCCTTTCATTCACAAGAAAATGAAGAGCTCTCTGGGTTCCAAAATGCTGATGTTGAATACATGATATATATGCACACATATAGGCACTTGCATGTTAGCATGTTCATGTCTGTGACTGTCTTTTGAGTTATATAGGAGTGGCAGAGATTGTGTCACTTTAACTTGTTGTCGACTTAGCCTGAAAACCTTACTAGTTCAGAATCATACTCCTGACTGTCTTCTGGAGCAACAGGGAACTGCATGGCCAACTTAAAGTACTTGTTATAAATGAACCAGAAATCTGATTTTAAAATACATTTTTCATTTTAATTGTTTTTGCATTTTGTTTTGTTAATCCTTAAAGGATGGATTGTGCTTTTAAGAATGTAAGTGTTATGTATTTGTCATGAATCAACAAAACAGCTTTTAAAAAAGACTAATTGGAACAAGTGGGTGGCACTGGCTTAATGCTGTTAATGTTTCTAAACGATTTTACATTTAGATTATATATCGGATTCATATTGAGATACCTCCAAAGCACTGTTTTATAGAAAGATCTGACTTCAATAAATATTTTTGTATTTTACATGGGCCAGTTTATGTACCGCTATTTACACTATTATTTCCTATAGACAATACACACTGTCTTTGTAGTTTTTGTATTTTTGTTTTGTTTTACTGGTTTTGCTTCTTGATGGTAATATACTCTGTCTGGTGTGGGATATTTTCCACACTTTAGAATTTGTATAAGAAACTGGTCCATGTAAGTACTTTCCATGTTTTCTCTTCAAATGTTTAAAGTGCTAGCTGATGTACGTACGATATCTCTTCTCAGATATTTGCCTGTCTGTTTGCCCAAAAATTGCTTCTAAATCAATAAAGATTCTTTTATTTCTTAAGGAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACGATCGGTGGTCGCCGTATCATTCGCATATAGTAGAGA Illumina adaptor PacBio barcodes – start PacBio barcodes – end UMI Site of template switching PolyA / PolyT

Page 21: Additional File 1 Figure S1 – Quality of PacBio …10.1186...Figure S2 Comparison between annotated and sequenced gene length for the short and long data set as well as the conservative

Figure S9: Example concatenated read >m150920_000225_42134_c100885002550000001823198604021652_s1_p0/1024/ccs TACTAGAGTAGCACTCGAATGATACGGCGACCACCGATGGAAGAGGGTGTCCTGACTTCCTTTGATGATGAACAGTGATGTGAAAGTGTAAGCCAAACCTTTTCCTCCTCAACTTCCTTTTGGTCATGGTTTTTCGTCACAGCAATAGAAACCCTAATTAGGATATATCCCTGAAACTGGAGTTACAGAAGCTTGTGAGTGGCCATGTGGTGCCAGGAATACAACCCAGCTCCTCCAGAAGGGCAGCCAGTGCACTTAACCACGGAGCCATCTCTCCAGCCCTAGTTAACCTTTAAAAGTAATACATTACTATTTACAAATTAGTATATGTCAAACATTTTCTTCCTCATGTTGAAATGAAATTCTATTCTTCCTGTGATCTTGTTTGTGTATTCTTTGGATGTTATAACTTGCCAAAAATCCTTGATATTATAGCTTGCCAATGTTTGCTGACAAATGTAATTCTTTAGAAAAGTTTGTTTTTCATAAACATATTTAGTTTTATTGTAATATTTATCTGACCAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACGATCGGTGGTCGCCGTATCATTCGAGTGCTACTCTAGTATACTAGAGTAGCACTCGAATGATACGGCGACCACCGATCTTGCAGGGACCGTCGGGGCTTCCTCGACGAGGCCGTTCGGAAGGTCTCCTGCTCCGTCTCGAGAGCTGCTTTCTCCTTCCGCACACGCTACCCGGCTGCTGCGGCCCCAGAACGCCCGGGTGAGGAGTTGGTTGTAGTGAGCAGTTCCGATCCCTTGGGCTACCGGCGGCGAGCGCCCGAGCCGCTCCTCCCAATGGCGAAGAAGACGTACGACCTGCTTTTCAAGCTGCTCCTGATCGGGACTCGGGAGTGGGCAAGACCTGCGTCCTTTTTCGTTTTTCGGACGATGCCTTCAATACCACCTTTATTTCCACCATAGGAATAGACTTTAAGATCAAAACAGTGGAACTACAAGGAAAGAAGATCAAGCTACAGATATGGGACACAGCAGGCCAGGAGCGATTTCACACCATCACAACCTCCTACTACAGAGGAGCAATGGGCATCATGCTAGTGTATGACATCACCAACGGTAAAAGCTTTGAGAACATCAGCAAGTGGCTTAGAAACATAGATGAGCATGCCAATGAAGATGTGGAAAGAATGTTACTAGGGAACAAGTGTGACATGGACGACAAGAGAGTTGTACCGAAAGGCAAAGGAGAACAGATTGCAAGGGAGCATGGTATTAGGTTTTTGAGACTAGTGCAAAAGCAAATATAAACATCGAAAAGGCGTTCTCACATTAGCTGAAGACATCCTCCGAAAGACCCCTGTAAAAGAACCCAACAGTGAAAACGTAGATATCAGCAGTGGAGGAGGCGTGACGGGCTGGAAGAGCAAGTGCTGCTGAGTGCTCTCCTGTCCATCTGCTGCCATCCACCATCCGGTTCTCTTCTTGCTGCAAAATAAAACACTCTGTCCATTTTTAACTCTAAACAGATATTTTTGTTTCTCATCTTAACTATTCAATCCACCTATTTTATTTGTTCTTTCATCTGTGACTGCTTGCGGACTATTATAATTTTCTTCAAACAAACAAACAAAAATGTATAGAGAAATCATGTCTGTGAGTTCATTTTGAGATTTACTTGCTCACTCAGCCCTGCACTTCAGTTGTATTATAGTCCAGTTCTTATCAACATTAAACTAGAGCAATCATTTTCAACTAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACGATCGGTGGTCGCCGTATCATTCGAGTGCTACTCTAGTATCAGACGATGCGTCATGAATGATACGGCGACCACCGATCGTTTTTTTTTTTTTTTTTTTTTTTTTTTCTTCGGCTTCCTACTACCTGTGCCTTTAAGTACTTTTCTGTCTTCTAAAGATGCTTTGTCTTCTATTTTCCTATCTTCTCTAACAGCTTTTTCTTCACCTTCCTGTATGTCTTCTTCGTCTTTCACTTCCATTTTCTAATCTTCTTTCTCCAAATACTAGCCTAACCTTTTTGTTTTTTCAAGACAGGGTTTCTCTGTGTAGCCCTGGCTGTCCTGGAACTCACTCTGTAGACCAGGCTGGCCTCGAACTCAGAAATCCGCCTGCCTCTGCCTCCCAAGTGCTAGGATTAAAGGCGTGCGCCACCATCGCCCGGCTAGCCTAACACTTCCATCTTCCCTGTTTCCAACTCATGCCTTCAAACGTACATGCGCCAGTCGAAGGATTTTTATAACGGCCGTCAACTTAACCTACTTTTCTCAATCTTTTAAAACTACACCTCATTTTCAAACTATTCTCTACAACTTTACTGTTTTAAATGCCACTTAGATTCTATTTAGGTAACCTTCGTTTTAATCTACAAGGCCGACCTTCAAACTAGAACCTTTTAGAACTTCACAAAACCTCCCTTTACAATCTCCTAAACTGCTCTGGTCAGCCTCCATTATACAGTACAAATACTTTTCTCCTCATTTCCTTTTTCAAACTGTTTAAATAAGCCTTCATGTTATCTCTTAAGATCTTCTTAGATTATTAAGACTAAGTAATATTTCGATAAGCTACTCTATTAGCTTAAGTTTAGAGTTCTAATTCTTTTTACTGCTCAATCTTTTTAATTAAAAACTTATCTGCGATTTCCTCGGGCTGAGTCTCCTGCCTCACGAGCTCAGCTGTGCTGCTCTACGCTGCTCTGCTCTCGCTGCCTGAATGCCTCCCAACCACATCGCCCTCCAATATCGGTGGTCGCCGTATCATTCATGACGCATCGTCTGA Illumina adaptor PacBio barcodes – start PacBio barcodes – end UMI Site of template switching PolyA / PolyT

Page 22: Additional File 1 Figure S1 – Quality of PacBio …10.1186...Figure S2 Comparison between annotated and sequenced gene length for the short and long data set as well as the conservative

Figure S10: List of custom oligonucleotides (5ʹ to 3ʹ direction): Idx1: TCAGACGATGCGTCATGAATGATACGGCGACCACCGAT Idx2: CTATACATGACTCTGCGAATGATACGGCGACCACCGAT Idx3: TACTAGAGTAGCACTCGAATGATACGGCGACCACCGAT Idx4: TGTGTATCAGTACATGGAATGATACGGCGACCACCGAT Idx5: ACACGCATGACACACTGAATGATACGGCGACCACCGAT Idx6: GATCTCTACTATATGCGAATGATACGGCGACCACCGAT C1-P1-T31: Bio-AATGATACGGCGACCACCGATCGTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT C1-P1-RNA-TSO: Bio-AAUGAUACGGCGACCACCGAUNNNNNGGG (RNA)