

Assessing the Immunoglobulin Repertoire by Next-Generation Sequencing using HIVE tools

Andrea M. Siegel¹, Sean P. Fitzsimmons¹, Alin Voskanian², Luis Santana-Quintero², Vahan Simonyan², Marjorie A. Shapiro¹

¹ Division of Biotechnology Review and Research I, Center for Drug Evaluation and Research; ² HIVE, Center for Biologics Evaluation and Research, U.S. Food and Drug Administration, Silver Spring, Maryland 20993

• Follicular and marginal zone B cells were isolated from a single mouse spleen. The spleen was homogenized, and the cells were resuspended in Ammonium-Chloride-Potassium (ACK) Lysing Solution to deplete erythrocytes and then stained for flow cytometry.

• Singlet lymphocytes that were negative for LIVE/DEAD Violet (live) were separated into follicular (CD19+ CD23+ CD21/35+/-) and marginal zone (CD19+ CD23- CD21/35+) B cells by flow cytometry on an Aria Fusion sorter using DIVA software (BD Biosciences).

• RNA was isolated using TRIzol-LS reagent and the Direct-zol RNA Miniprep kit (Zymo Research) and reverse transcribed using the SMARTer® RACE 5’/3’ Kit (Clontech). Rapid Amplification of cDNA Ends (RACE) is a method to amplify mRNA transcripts. Because 5’ RACE adds a common primer sequence to the 5’ end of each cDNA, it eliminates 5’ primer bias from the library.

Computational pipeline

ACKNOWLEDGEMENTS

We are grateful to the Division of Veterinary Services for their animal care and guidance. We would like to thank Adovi Akue and Mark A. KuKuruga in the CBER Flow Cytometry Core for their sorting expertise. We are also grateful to RongFong Shen, Rong Wang, Wells Wu, and Je-Nie Phue in the Facility for Biotechnology Resources in CBER for conducting the bioanalyzer and next-generation sequencing as well as for their helpful advice in library preparation techniques. Additionally, we thank members of the Shapiro and Simonyan groups for their input and advice. This project was funded by the Office of Biotechnology Products in the Center for Drug Evaluation and Research at the FDA.

[Figure: HIVE Infrastructure, NGS data loading. Labels: next-gen sequencing (300 bp paired-end reads); web portal, ftp, drop-box, and web browser; parsing and splitting the download job; quality and length filtration; distributed storage cloud; metadata database.]

Humoral immunity depends on the generation of a diverse array of immunoglobulin molecules. Each immunoglobulin consists of two heavy (H) chains and two light (L) chains. Both the heavy and light chain loci encode multiple gene cassettes (V, D, and J for heavy chains; V and J for light chains) that recombine with each other to produce a complete and unique variable region of the immunoglobulin. These recombination events are error-prone: random nucleotides are added, deleted, and mutated in each cell, which generates further diversity. To measure the breadth of the antibody repertoire, we turned to a next-generation sequencing (NGS) based platform. Murine B cell subsets were sorted by flow cytometry, and RNA was purified and reverse transcribed using SMARTer 5’ RACE to add a common primer sequence to the 5’ end of each cDNA. H and L chain variable regions were amplified with the common 5’ primer and a constant region primer, gel purified, and labelled with adapter and index sequences for Illumina MiSeq sequencing. A customized bioinformatics pipeline was developed with the HIVE team to analyze the resultant data. A diverse array of immunoglobulin genes was amplified from different B cell populations. Next-generation sequencing is a rapid method to measure diversity in the highly diverse immunoglobulin locus.
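The way imprecise V-D-J joining multiplies diversity can be illustrated with a toy simulation. This is only a sketch under stated assumptions: the segment names and sequences are hypothetical placeholders, the trimming and insertion limits are arbitrary, and mutation is not modeled. It is not the biology of any real locus and not part of the HIVE pipeline.

```python
import random

# Hypothetical toy segments, far shorter and fewer than real mouse V, D, and J genes.
V_SEGMENTS = {"V_a": "CAGGTGCAGCTG", "V_b": "GAGGTCCAGCTT"}
D_SEGMENTS = {"D_a": "GGTACTAC", "D_b": "TACTATGG"}
J_SEGMENTS = {"J_a": "ACTACTGGGGC", "J_b": "TTTGACTACTGG"}


def trim_end(seq, rng, end, max_trim=3):
    """Delete 0 to max_trim nucleotides from the 5' or 3' end of a segment."""
    n = rng.randint(0, max_trim)
    if end == "3prime":
        return seq[:len(seq) - n]
    return seq[n:]


def random_nucleotides(rng, max_len=4):
    """Return 0 to max_len random nucleotides to insert at a junction."""
    return "".join(rng.choice("ACGT") for _ in range(rng.randint(0, max_len)))


def recombine_heavy_chain(rng):
    """Join one V, one D, and one J segment with imprecise junctions."""
    v_name = rng.choice(sorted(V_SEGMENTS))
    d_name = rng.choice(sorted(D_SEGMENTS))
    j_name = rng.choice(sorted(J_SEGMENTS))
    v = trim_end(V_SEGMENTS[v_name], rng, "3prime")
    d = trim_end(trim_end(D_SEGMENTS[d_name], rng, "5prime"), rng, "3prime")
    j = trim_end(J_SEGMENTS[j_name], rng, "5prime")
    sequence = v + random_nucleotides(rng) + d + random_nucleotides(rng) + j
    return (v_name, d_name, j_name), sequence


rng = random.Random(0)  # fixed seed so the run is repeatable
rearrangements = [recombine_heavy_chain(rng) for _ in range(1000)]
gene_combinations = {genes for genes, _ in rearrangements}
unique_sequences = {seq for _, seq in rearrangements}
print(len(gene_combinations), "gene combinations,", len(unique_sequences), "unique sequences")
```

Even with only two toy segments of each type (at most eight V-D-J combinations), trimming and random insertion at the junctions yield many more distinct sequences than gene combinations, which is the effect the paragraph above describes.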

BD FACSAria™ sorter

Primers/adapters

Algorithm | Function | Number of Sequences
Import into HIVE | | 6,152,791
Paired End Collapser | Matches homologous regions from read 1 and read 2. Creates a single sequence. |
Select for Q score above 30 | Selects only sequences with a high base-pair assignment quality | 491,425
Select for sequences longer than 350 bp | Eliminates incomplete immunoglobulin sequences based on length | 300,519
Align with a constant region primer using HIVE-Hexagon | Selects only reads that contain a specific constant region sequence | 37,238 (IgM)
Assign VDJs using High V-Quest | Assigns genes and determines mutations, CDR3 length, etc. (http://www.imgt.org/IMGTindex/IMGTHighV-QUEST.html) | 30,554 functional; 6,817 unique sequences
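The filtering steps in the table can be sketched in a few lines of Python. This is a minimal illustration, not the HIVE implementation: the primer sequence is a hypothetical placeholder, the Q-score step is assumed to mean a mean Phred score above 30 (the poster does not say whether the threshold is per base or per read), exact suffix-prefix matching stands in for the Paired End Collapser, and a substring search stands in for alignment with HIVE-hexagon.

```python
# Hypothetical constant-region primer; the poster does not give the real sequence.
CONSTANT_REGION_PRIMER = "GATTACA"

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def collapse_pair(read1, qual1, read2, qual2, min_overlap=10):
    """Merge a read pair at the longest exact overlap, or return None."""
    read2_rc = reverse_complement(read2)
    qual2_rev = qual2[::-1]
    for size in range(min(len(read1), len(read2_rc)), min_overlap - 1, -1):
        if read1[-size:] == read2_rc[:size]:
            # Keep read 1 through the overlap, then append the rest of read 2.
            return read1 + read2_rc[size:], qual1 + qual2_rev[size:]
    return None


def mean_phred(quality, offset=33):
    """Mean Phred score of a FASTQ quality string (Phred+33 encoding assumed)."""
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def passes_filters(seq, quality, primer=CONSTANT_REGION_PRIMER, min_q=30, min_len=350):
    """Apply the quality, length, and constant-region steps in table order."""
    if mean_phred(quality) <= min_q:
        return False
    if len(seq) <= min_len:
        return False
    return primer in seq  # stand-in for alignment with HIVE-hexagon


def unique_sequences(seqs):
    """Drop exact duplicates while keeping first-seen order."""
    return list(dict.fromkeys(seqs))
```

A merged read would be passed to `passes_filters`, and the survivors deduplicated with `unique_sequences` after gene assignment. A production collapser would normally tolerate mismatches in the overlap and reconcile the two quality strings, which this sketch does not attempt.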

HIVE-hexagon

V gene family | Number of Sequences | Frequency (%)
VH1 | 2825 | 41.44
VH2 | 755 | 11.08
VH3 | 227 | 3.33
VH4 | 120 | 1.76
VH5 | 753 | 11.05
VH6 | 107 | 1.57
VH7 | 275 | 4.03
VH8 | 99 | 1.45
VH9 | 881 | 12.92
VH10 | 269 | 3.95
VH11 | 22 | 0.32
VH12 | 1 | 0.01
VH13 | 6 | 0.09
VH14 | 477 | 7.00
VH15 | 0 | 0.00
VH16 | 0 | 0.00

D gene | Number of Sequences | Frequency (%)
No D | 1748 | 25.64
D1 | 1672 | 24.53
D2 | 2278 | 33.42
D3 | 442 | 6.48
D4 | 566 | 8.30
D5 | 94 | 1.38
D6 | 17 | 0.25

J gene | Number of Sequences | Frequency (%)
JH1 | 584 | 8.57
JH2 | 2329 | 34.16
JH3 | 1903 | 27.92
JH4 | 2001 | 29.35

IgM Heavy Chains from Marginal Zone B cells
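The frequency columns follow directly from the counts: each count is divided by the 6,817 unique sequences. The short check below recomputes them from the reported counts; the variable and function names are illustrative only.

```python
# Counts copied from the V, D, and J tables for marginal zone IgM heavy chains.
V_COUNTS = {
    "VH1": 2825, "VH2": 755, "VH3": 227, "VH4": 120, "VH5": 753, "VH6": 107,
    "VH7": 275, "VH8": 99, "VH9": 881, "VH10": 269, "VH11": 22, "VH12": 1,
    "VH13": 6, "VH14": 477, "VH15": 0, "VH16": 0,
}
D_COUNTS = {"No D": 1748, "D1": 1672, "D2": 2278, "D3": 442, "D4": 566, "D5": 94, "D6": 17}
J_COUNTS = {"JH1": 584, "JH2": 2329, "JH3": 1903, "JH4": 2001}


def frequencies(counts):
    """Convert raw counts to percentages of the total, rounded to two decimals."""
    total = sum(counts.values())
    return {gene: round(100 * n / total, 2) for gene, n in counts.items()}


for label, counts in (("V", V_COUNTS), ("D", D_COUNTS), ("J", J_COUNTS)):
    print(label, "total:", sum(counts.values()), frequencies(counts))
```

Each of the three tables sums to the same 6,817 unique sequences, so every unique sequence was assigned a V family, a D gene (or "No D"), and a J gene.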

Conclusions

• We have developed an unbiased method of assaying the immunoglobulin repertoire using NGS with a customized bioinformatics platform.

• Our ongoing project is the analysis of the B cell repertoire from subsets of B cells in the spleen (follicular and marginal zone) as well as the peritoneum (B-1a and B-1b cells).

• Marginal zone, B-1a, and B-1b cells are innate-like and have been shown to have a more limited repertoire using low throughput techniques.

• This method will provide a more complete survey of the diversity of these B cell subsets.

FDA Mission Relevance

• Understanding the immunoglobulin repertoire will aid the FDA mission by analyzing the way that regulated biologics and vaccines change the immune system. Basic understanding of immunoglobulin development is key in regulation of immunoglobulin-based therapies.

Future Directions

• Aging and the immune repertoire. As humans age, their immune response weakens, including the response to vaccination. Repertoire analysis will be performed in aged versus young mice.

• Repertoire changes following B cell depletion and reconstitution, a therapy commonly used in cancer treatment. We will deplete the B cells from mice using an anti-CD20 antibody and study how the repertoire changes before and after B cell reconstitution.

• Contribution of the microbiota to the formation of the immunoglobulin repertoire. We will examine the immunoglobulin repertoire in germ-free mice.

Clinical Applications

• Autoimmunity
• Immunodeficiency
• Aging
• Immune depletion and reconstitution
• Vaccine responses
• Tracking effects of cancer treatment

Nat Rev Immunol. 2011 Apr;11(4):251-63. doi: 10.1038/nri2941. PMID: 21394103

B cells express Immunoglobulin

http://www.virology.ws/2009/07/22/adaptive-immune-defenses-antibodies/

Library analysis of marginal zone B cell IgM repertoire from a single mouse. Complexity of IgM Repertoire

Differential Pairing

[Figure: J Gene Usage. Percent of unique sequences (y-axis, 0 to 40) by J gene (JH1, JH2, JH3, JH4).]

[Figure: D Gene Usage. Percent of unique sequences (y-axis, 0 to 40) by D gene (No D, D1, D2, D3, D4, D5, D6).]

[Figure: V Gene Usage. Percent of unique sequences (y-axis, 0 to 45) by V gene family.]

High V-Quest

HIVE Classification