Upload
others
View
2
Download
0
Embed Size (px)
Citation preview
Eltaf Alamyar, Véronique Giudicelli, Patrice Duroux and Marie-Paule LefrancLaboratoire d'ImmunoGénétique Moléculaire (LIGM), Université Montpellier 2,Institut de Génétique Humaine (IGH), UPR CNRS 1142, Montpellier (France){Eltaf.Alamyar,Veronique.Giudicelli,Patrice.Duroux,Marie-Paule.Lefranc}@igh.cnrs.fr
ImMunoGeneTics
Informationsystem®
http://www.imgt.org
Biological ContextThe adaptive immune response is characterized by an extreme diversity of the specific antigen receptors that comprise the immunoglobulins (IG) or antibodies and the T cell receptors (TR) (10¹² different IG and 10¹² different TR per individual, in humans). The complex molecular mechanisms (DNA rearrangements, N-diversity, and for IG, somatic hypermutations) that occur in B cells and T cells are at the origin of that huge diversity.
10¹² IG (antibodies)per individual
AGENCENATIONALEDE LARECHERCHE
©2012 Eltaf Alamyar, Géraldine Folch, Marie-Paule Lefranc
IMGT® founder and director: Marie-Paule Lefranc ([email protected])Bioinformatics manager: Véronique Giudicelli ([email protected])Computer manager: Patrice Duroux ([email protected])Webmaster: Chantal Ginestoux ([email protected])© Copyright 1995-2012 IMGT®, the international ImMunoGeneTics information system®
Users and AnalysesSince the availability of IMGT/HighV-QUEST in October 2010, more than 200 millions of sequences (from external users) have been submitted. They required more than 86,000 hours of computational resources. About 5 terabytes of results were generated.
164 of IMGT/HighV-QUEST users are from USA (45%), 137 are from EU (17 countries) (37%), and 66 from China (16), Japan (13), Taiwan (8), Canada (6), Australia (5), Mexico (5), and 10 other countries (18%).
Statistics in 2012 show an increasing number of IMGT/HighV-QUEST users and a growing analysis demand compared with 2011 (25% increase in the number of submitted sequences and 162 new users were registered and activated during 2012)
Users from USA submitted 74% of the sequences, users from EU submitted 19%, while the remaining sequences were submitted by users from other countries.
OutputsThe outputs are archived in a single file in ZIP format which comprises:
11 CSV files equivalent to the eleven sheets of the 'Excel files' of IMGT/V-QUEST
for each analysed sequence, the 'Detailed view' individual files that allows one to visualize the individual detailed results
IMGT_HighV-QUEST_individual_files_folder
IMGT_HighV-QUEST_main_folder (named according to the analysis title)
Sequence 1 individual file
Sequence 2 individual file
Sequence n individual file
T 1_Summary.txt
2_IMGT-gapped-nt-sequences.txt
3_Nt-sequences.txt
4_IMGT-gapped-AA-sequences.txt
5_AA-sequences.txt
6_Junction.txt
7_V-REGION-mutation-and-AA-change-table.txt
8_V-REGION-nt-mutation-statistics.txt
9_V-REGION-AA-change-statistics.txt
10_V-REGION-mutation-hotspots.txt
11_Parameters.txt
T
T
T
T
T
T
T
T
T
T
T
T
T
Sequence 2 individual fileT
Statistical analysis is performed on 3 CSV files and the results are encapsulated in a single file in PDF format
Statistical analysis output
Analysis output
Statistical Analysis
For each gene, number of sequences, average sequence length, average V-, D-, J-REGION length, and number of sequences with an identity percentage of 100% by comparison with the germline, are provided.
Statistics provide the histogram of different and identical CDR3-IMGT sequences for each CDR3-IMGT length in nucleotides (nt) and amino acids (AA).Results are shown as:
Nb of sequences with different (unique) CDR3-IMGT (nt)Nb of sequences with different (unique) CDR3-IMGT (AA)Nb of sequences (in sets) with identical CDR3-IMGT (nt)Nb of sequences (in sets) with identical CDR3-IMGT (AA)
1000
0
200
400
600
800
15 18 21 24 27 30 33 36 39 42 45 48 51 54 57 60 63 66 69 72 75 78 81 84 87 90 93 96 99 105
Nt
5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 35 AA
Nb of sequences with different (unique) CDR3-IMGT (nt)Nb of sequences with different (unique) CDR3-IMGT (AA)Nb of sequences (in sets) with identical CDR3-IMGT (nt)Nb of sequences (in sets) with identical CDR3-IMGT (AA)
CDR3-IMGTlength
0 200 400 600 8000 200 400 600 800
0 200 400 600 800
IGHV1−18IGHV3−16IGHV3−15IGHV3−13IGHV3−11IGHV2−10IGHV3−9IGHV1−8IGHV3−7IGHV2−5IGHV7−4−1
(647)(0)(2)(1)(339)(0)(472)(413)(199)(0)(378)
V gene and allele table
J gene and allele table
D gene and allele table
V gene histogram
0 500 1000 1500 2000 2500 30000 500 1000 1500 2000 2500 3000
0 500 1000 1500 2000 2500 3000
IGHD3−9IGHD3−10IGHD5−12IGHD6−13
(600)(2757)(329)(1715)
D gene histogram
J gene histogram0 1000 2000 3000 4000 5000 60000 1000 2000 3000 4000 5000 6000
0 1000 2000 3000 4000 5000 6000
IGHJ1PIGHJ1IGHJ2IGHJ3IGHJ4IGHJ5IGHJ3PIGHJ6
(0)(226)(376)(2562)(5066)(984)(0)(1748)
3. CDR3-IMGT length analysis
Statistical analyses are performed on results selected as ‘1 copy’ (redundancies are recorded but not processed), and with quality criteria (identification of a single gene/allele, known functionality, absence of IMGT/V-QUEST warnings regarding the CDR1-IMGT and CDR2-IMGT lengths and the percentage of identity).
1. Selection of results for statistical analysis
Colored lines illustrate results per gene and white lines under each gene illustrate the results per allele, individually. In the histograms, genes are ordered according to their positions from 5’ to 3’ in the locus.
2. Tables and histograms for each gene (V, D and J)
#IMGT gene and allele Total
Averagesequencelength
AverageV-REGIONlength
id=100%nb (%)
1 IGHV1-18 647 243 166 455 (70.32%)IGHV1-18*01 647 243 166 455 (70.32%)
9 IGHV3-11 339 242 166 253 (74.63%)IGHV3-11*01 339 242 166 253 (74.63%)
10 IGHV3-13 1 223 158 1 (100.0%)IGHV3-13*01 1 223 158 1 (100.0%)
11 IGHV3-15 2 266 173 1 (50.0%)IGHV3-15*04 1 283 173 0 (0.0%)IGHV3-15*07 1 248 173 1 (100.0%)
#IMGT gene and allele Total
Averagesequencelength
AverageJ-REGIONlength
id=100%nb (%)
2 IGHJ2 414 243 50 0 (0.0%)IGHJ2*01 414 243 50 0 (0.0%)
3 IGHJ3 2685 244 44 0 (0.0%)IGHJ3*01 36 245 41 0 (0.0%)IGHJ3*02 2649 243 48 0 (0.0%)
4 IGHJ4 5795 240 41 754 (13.01%)IGHJ4*01 5 239 46 3 (60.0%)IGHJ4*02 5708 238 33 751 (13.16%)IGHJ4*03 82 242 43 0 (0.0%)
#IMGT gene and allele Total
Averagesequencelength
AverageD-REGIONlength
10 IGHD3-10 2757 243 17IGHD3-10*01 2693 244 15IGHD3-10*02 64 242 19
14 IGHD3-9 600 246 19IGHD3-9*01 600 246 19
18 IGHD5-12 329 238 14IGHD5-12*01 329 238 14
21 IGHD6-13 1715 239 15IGHD6-13*01 1715 239 15
The analysis of expressed repertoires of antigen receptors - immunoglobulins (IG) or antibodies and T cell receptors (TR) - represents a huge challenge for the study of the adaptive immune response in normal and disease-related situations, such as viral infections. To answer that need, IMGT®, the international ImMunoGeneTics information system® (http://www.imgt.org) [1] has developed IMGT/HighV-QUEST [2]. IMGT/HighV-QUEST is devoted to the analysis of large repertoires of IG and TR sequences that result from Next Generation Sequencing technologies. IMGT/HighV-QUEST, a high throughput version of IMGT/V-QUEST [3], analyses up to 150,000 sequences per run, and provides statistical analysis for up to 450,000 sequences. IMGT/HighV-QUEST identifies the IG and TR variable (V), diversity (D) and joining (J) genes and alleles by alignment with the germline IG and TR gene and allele sequences of the IMGT reference directory. It describes the V-REGION mutations and identifies the hot spot positions in the closest germline V gene. It integrates IMGT/JunctionAnalysis for a detailed analysis of the V-J and V-D-J junctions, and IMGT/Automat for a full V-J- and V-D-J-REGION annotation. The analysis is based on the IMGT-ONTOLOGY concepts of description, classification and numerotation [4-6]. The analysis results of IMGT/HighV-QUEST comprise a set of text files which include 11 files in CSV format equivalent to the eleven sheets of the 'Excel files' of IMGT/V-QUEST and, for each analysed sequence, the 'Detailed view' that allows one to visualize the individual detailed results. These result files are archived in a single ZIP file that is downloaded by the user.
[1] Lefranc, M.P. et al., Nucleic Acids Res., 37:1006-1012, 2009. [3] Brochet, X. et al., Nucleic Acids Res., 36:W503-508, 2008. [5] Duroux, P. et al., Biochimie, 90:570-583, 2008.[2] Alamyar, E. et al., Immunome Res., 8(1):26, 2012. [4] Giudicelli, V. and Lefranc, M.-P., Bioinformatics, 15:1047-1054, 1999. [6] Giudicelli, V. and Lefranc, M.-P., Front. Genet., 3:79, 2012.
IMGT/HighV-QUEST 2012
Acknowledgments: this work was granted access to the HPC resources of CINES under the allocation 2012-036029 made by GENCI (Grand Equipement National de Calcul Intensif).
Country
450,11China 122,33
111,63Canada 109,93France 101,23
69,1156,45
Portugal 710,39 39,34657,31 36,40
Mexico 630,30 34,90313,77 17,38266,09 14,74257,54 14,26176,36 9,77141,58 7,84131,13 7,26128,53 7,1277,92 4,31
Luxembourg 75,03 4,1557,34 3,1850,78 2,81
Venezuela 20,47 1,1315,62 0,863,27 0,181,47 0,08
481,00 0,21 0,0130,00 0,01 0,009,00 0,00 0,00
Total
Nb of submitted sequences
Computational resources
(hours)
Size of generated files
(Gbytes)United States 148 897 288,00 64 106,54 3 550,00Germany 18 878 961,00 8 128,19
5 131 029,00 2 209,12United Kingdom 4 682 074,00 2 015,83
4 610 900,00 1 985,194 246 071,00 1 828,11
Belgium 2 898 827,00 1 248,07Ireland 2 367 729,00 1 019,41
1 650 000,00Israel 1 526 692,00
1 463 961,00Spain 728 785,00Netherlands 618 030,00Austria 598 171,00Denmark 409 616,00Japan 328 842,00Switzerland 304 566,00Taiwan 298 526,00Australia 180 979,00
174 262,00Hong Kong 133 190,00Italy 117 950,00
47 556,00Norway 36 272,00Greece 7 596,00Finland 3 417,00SwedenKoreaArgentina
200 341 810,00 86 255,56 4 776,54
Users USEUOther
45%
137
66164
37%
18%
Sequence origin74%
148 897 288
USEUOther
37 722 808
13 721 714
19%
7%
36
2
EU users
13
8
532111111
11
16
6
5
Other countries’ users
2415
13
87555433321
12
36
China Japan Taiwan Canada Australia Mexico Israel Korea Hong Kong New Zealand Venezuela Pakistan Singapore Uruguay Argentina South Africa
Germany United Kingdom Netherlands France Italy Norway Spain Sweden Switzerland Finland Austria Denmark Belgium Greece Luxembourg Ireland Portugal