

Genomics 85 (2005) 503–511
www.elsevier.com/locate/ygeno

IRIS: A database surveying known human immune system genes

James Kelley a, Bernard de Bono b, John Trowsdale a,*

a Department of Pathology, Immunology Division, University of Cambridge, Cambridge CB2 1QP, UK

b European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, UK

Received 11 December 2004; accepted 20 January 2005

Abstract

We have compiled an online database of known human defense genes: the Immunogenetic Related Information Source (IRIS). As of October 1, 2004, there are 1562 immune genes recorded in IRIS, representing 7% of the human genome. This resource contains searchable information including chromosomal location, sequence data, and a curated functional annotation for each entry. We used IRIS as a basis for analyzing the composition and characteristics of the immune genome, such as gene clustering, polymorphism, and relationship to disease. High protein sequence similarity correlated inversely with distance between immune genes, consistent with clustering of duplicated loci. We also found that, even though some immune genes exhibit high levels of polymorphism, such as MHC class I, the range of levels of polymorphism in immune genes is similar to that of nonimmune genes. Approximately 20% of immune genes have a known disease association. IRIS is available online at http://www.immunegene.org.

© 2005 Elsevier Inc. All rights reserved.

Keywords: Immune system; Human genome; Genetic information databases; Genetic polymorphisms; Gene clusters; Disease

0888-7543/$ - see front matter © 2005 Elsevier Inc. All rights reserved.
doi:10.1016/j.ygeno.2005.01.009

Supplementary data for this article may be found on ScienceDirect.
* Corresponding author. Fax: +44 1223 339768.
E-mail address: [email protected] (J. Trowsdale).

Where comparisons have been made, such as between mouse and human [1] and Anopheles and Drosophila [2,3], the number of immune system genes differs markedly in related species. It has been suggested that other features, such as increased levels of polymorphism and clustering of immune system genes, may also reflect the strong selection required for resistance to infection [4].

A database of all known immune genes would provide an organized subset of functionally related human genes to compare immune system genes, as a set, with the rest of the genome. It could also be used in genetic studies of function and disease, in DNA arrays, and in computer prediction programs of potential therapeutic targets during drug development.

While several databases address the function of the immune system [5–7], none include an exclusive survey of the entire immune genome. To create such a database, we developed the Immunogenetic Related Information Source (IRIS). It currently includes information on chromosomal locations, protein and nucleotide sequences, and manual curations of the proposed function of each gene in immunity. This online resource is available at http://www.immunegene.org.

Using IRIS we set out to explore whether, as a group, immune system genes display any distinctive characteristics that reflect selective pressure for resistance to infection, including clustering, level of polymorphism, and genetic association with disease. This formal organization of immune genes provides a basis for functional interaction mapping, serves as a documented subset of functionally related genes, and facilitates future observations on genomic features of the immune system. Our initial analysis using IRIS suggests that immune defense genes, as a functional subset, exhibit marked gene duplication and association with disease, although these features are by no means restricted to immune defense.


J. Kelley et al. / Genomics 85 (2005) 503–511

Results and discussion

Definition of an immune gene

For the purposes of this database, an immune gene was operationally defined as a complete gene that produces a functional transcript and demonstrates at least one defense characteristic listed in Table 1. Gene segments, such as those encoding antigen receptors, immunoglobulins, and T cell receptors [8,9], were not included. This exclusion prevented an overrepresentation of data when counting the number of immune genes. These immune gene segments are extensively analyzed in an online database, the International ImMunoGeneTics Information System (IMGT) (http://imgt.cines.fr:8104), maintained by LIGM at Montpellier, France [10,11]. (For more details on the definition of an immune gene, please refer to a recent review from our laboratory [4].)

Even though some genes not included under this definition may be involved in defense mechanisms in a broader sense, we were deliberately conservative for the initial compilation. We have not included genes whose products correct or prevent physical damage, such as those involved in tissue repair after wounding or DNA repair after radiation (for example, melanin); genes whose products provide a physical barrier such as epithelia, unless there is a specific interaction with a pathogen (for example, keratin); genes whose products contribute to immunity through a role shared by many systems of the body (for example, actin); genes that share sequence similarity with known immune genes but have no putative defense function (for example, some immunoglobulin superfamily members); and genes whose products interact with nonpathogenic antigens (for example, some components in digestion or reproduction).

Composition and functional survey of the immune genome

As of October 1, 2004, IRIS contained 1562 immune genes, indicating that the immune genome comprises over 7% of the total 21,432 human genes. The latter figure was the number of complete human genes recorded in LocusLink (http://www.ncbi.nlm.nih.gov/LocusLink).

Table 1
Defense characteristics of the immune system

Known or putative function in innate or adaptive immunity
Participates in the development or maturation of immune system components
Induced by immunomodulators (a)
Encodes a protein expressed primarily in immune tissues (b)
Participates in an immune pathway that results in the expression of defense molecules (c)
Produces a protein that interacts directly with pathogens or their products

Any complete, human gene that produces a functional transcript and matches one or more of these criteria is included in IRIS.
(a) Immunomodulators, such as IFN-γ, regulate expression of immune components. Proteins that are direct effectors of immunomodulators are thought to have an immune function.
(b) We assume that if a gene is commonly expressed in immune tissues then it has a role in immune functioning.
(c) NF-κB is an example.

In this analysis, the innate and adaptive immune systems occupied 41 and 27%, respectively, of the immune genome. The larger number of innate immune genes compared to adaptive immune system genes was affected by the absence in IRIS of immunoglobulin [8] and T cell receptor [9] gene segment repertoires, which in base pairs comprise a large portion of the adaptive immune genome. Genes dedicated to the development of immune tissues, for example in immune cell hematopoiesis, represented 8%. Approximately 24% of the immune genes were not easily placed into any of the previously mentioned categories. This "other" category included genes that can be induced by an immunomodulator, genes that participate in a pathway leading to the expression of an immune molecule, and genes that are expressed primarily in immune tissues (see Fig. 1).

Distribution of immune genes

IRIS was used to ask whether immune system genes are randomly distributed in the genome. They were found to be fairly evenly distributed, as expected of an ancient, complex system. However, mapping immune gene distribution, which is shown in Fig. 2, revealed some variation in immune gene densities. Chromosomes 19, 17, 6, and 11 had higher, and chromosomes 18, 13, X, and 14 had lower, immune gene densities, while the Y chromosome was completely lacking in immune genes. There were some pockets of immune genes at very high density. In some cases this was accounted for by adjacent duplicated genes, such as in the leukocyte receptor complex (LRC) of chromosome 19, the natural killer complex of chromosome 12, the MHC on chromosome 6, and the cluster of interleukin genes on chromosome 1. This increased density might be due to relatively recent gene duplications, as in the LRC, but some dense regions, such as the MHC, do not consist solely of related duplicates.
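The density comparison described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name, the input layout, and the toy chromosome lengths are hypothetical and are not taken from IRIS, and the text does not state how density was normalized (genes per megabase is assumed here).

```python
from collections import Counter


def immune_gene_density(gene_chromosomes, chromosome_lengths_mb):
    """Return immune genes per megabase for every chromosome with a known length.

    gene_chromosomes: one chromosome label per immune gene (hypothetical layout).
    chromosome_lengths_mb: chromosome label -> length in megabases.
    """
    counts = Counter(gene_chromosomes)
    return {
        chrom: counts.get(chrom, 0) / length_mb
        for chrom, length_mb in chromosome_lengths_mb.items()
    }


# Toy input for illustration only; these are not IRIS counts or real lengths.
toy_genes = ["19", "19", "19", "6", "18"]
toy_lengths_mb = {"19": 60.0, "6": 170.0, "18": 80.0, "Y": 57.0}

density = immune_gene_density(toy_genes, toy_lengths_mb)
# Chromosomes ordered from most to least immune-gene-dense.
ranked = sorted(density, key=density.get, reverse=True)
```

Normalizing by length rather than comparing raw counts keeps large chromosomes from appearing enriched simply because they carry more genes overall.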

Clustering of immune genes

There are essentially two types of gene clusters. The most

common are clusters of related duplicated genes whose

functions diverge, such as the KIRs and ULBPs. These

presumably result from tandem segmental duplications [12],

caused by slipped-strand mispairing, gene conversion, or

unequal crossover [13]. Tandem duplication explains the

clustered arrangement of the NBS–LRR disease resistance

genes in plants, for example [14].

A second type of gene cluster consists of genes with related functions but unrelated sequences. Clustering of this type is found in prokaryotic operons that encode interacting proteins [15]. With the exception of gene clustering in operons of Caenorhabditis elegans [16], it is generally thought that genes in eukaryotes are randomly distributed in the genome [17]. However, recent observations in human and other eukaryotic genomes reveal significant clustering of genes that function within the same pathway [18]. While there are suggestions of grouping of functionally related immune system genes in the MHC [19,20] and evidence of nonrandom distribution of segmental duplications for immune genes [21], no comprehensive, reported measure of gene clustering exists in the literature.

Fig. 1. Percentage of immune genes dedicated to major functional divisions. Note that immune gene segments, such as those that encode antigen receptors, are not included. These 673 gene segments comprise a substantial portion of the adaptive immune genome and are recorded in the IMGT/GENE-DB database (http://imgt.cines.fr) [10,11].

Immune genes with high protein sequence similarity (≥63%) were significantly clustered together (p = 7.57 × 10^-31), while immune genes with low protein sequence similarity (≤21%) were distributed randomly with respect to distance (p = 0.1698). We arbitrarily defined high and low sequence similarity as the protein sequence similarity values falling above and below the median 95% of all values, as shown in Fig. 3. Using an analysis of variance between the distance values of all immune gene pairs and gene pairs with descending thresholds of sequence similarity, we found that subsets of immune genes with protein sequence similarity equal to or greater than 28% are located significantly closer together than gene pairs in the entire data set (see Fig. 4).

Fig. 2. Immune genome map. Each line shows the genomic location of an immune gene. Dense clusters of genes appear as a larger band. Chromosome lengths and the distances between genes are drawn to scale. Note that immune gene segments, such as those that encode immunoglobulin and T cell receptors, are not included.

Fig. 3. Number of genes by percentage of sequence similarity in the clustering analysis. Genes falling in the lighter shaded area constitute the median 95% of gene pairs in this analysis. The vertical lines and darker areas indicate the gene pairs we considered as having high (≥63%) and low (≤21%) sequence similarity. The mean percentage of protein sequence similarity for immune genes is 32.4%.

The inverse relationship between distance and protein sequence similarity suggested a prevalence of tandem duplications and segmental duplications resulting in close proximity of diverging genes. Further evidence for this could be seen in the periodicity observed when comparing distances between gene pairs on a chromosome. As shown in Figs. 4B and 4C, we observed that genes on the same chromosome frequently occur at set distances from each other, demonstrating both the clustered nature of immune gene superfamilies and their arrangement as tandem, segmental duplications.

Polymorphism in immune genes

Certain immune loci are among the most variable in sequence, consistent with selection related to differences in pathogen load and type. For example, MHC class I exhibits extremely high levels of polymorphism, relating to its role in presenting antigens from infectious organisms, some of which, such as HIV, evolve rapidly [22]. We therefore used the IRIS database to ask whether immune system genes, in general, are disproportionately polymorphic. Using data from Stephens et al. [23], we calculated the mean number of polymorphisms per kilobase, with 95% confidence intervals, for 271 nonimmune genes as 5.36 ± 0.31. The same measure for 125 immune genes, calculated with data extracted from the University of Washington–Fred Hutchinson Cancer Research Center Variation Discovery Resource (UW-FHCRC) (http://pga.gs.washington.edu) and the Innate Immunity Programs for Genetic Applications (IIPGA) (http://innateimmunity.net), was 4.85 ± 0.45. The levels of polymorphism in these two data sets, which are shown in Fig. 5, were not significantly different (p = 0.0671). Therefore, the overall range of levels of polymorphism of immune system genes was, in general, similar to that of nonimmune genes. The nonimmune genes with the highest polymorphism per kilobase ratio included C6orf15, CHRAC1 (chromatin assembly), GUCA1B (visual perception), OR2S2 (smell perception), PDE6H (visual perception), POLR1D (polymerase), PSORS1C2 (function unknown, located in the MHC), and UFM1 (function unknown). The most polymorphic immune genes in this analysis were DEFB1, IL1R2, SFTP1, SFTP2, TLR6, and TLR10. Genes with a level of polymorphism higher than expected by chance included TLR10, DEFB1, POLR1D, PSORS1C2, and C6orf15.
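A per-kilobase mean with a 95% confidence interval, of the kind reported above, could be computed along the following lines. This is a minimal sketch under assumptions: the text does not state how the interval was derived, so a normal approximation (mean ± 1.96 standard errors) is assumed, and the function names and toy values are hypothetical.

```python
import math
import statistics


def polymorphisms_per_kb(n_polymorphisms, bases_sequenced):
    """Polymorphism count normalized to 1000 sequenced base pairs."""
    return n_polymorphisms / (bases_sequenced / 1000.0)


def mean_with_ci95(values):
    """Return (mean, half_width) of a 95% interval, assuming a normal approximation."""
    mean = statistics.mean(values)
    standard_error = statistics.stdev(values) / math.sqrt(len(values))
    return mean, 1.96 * standard_error


# Toy (polymorphism count, bases sequenced) pairs; not data from the study.
toy_genes = [(10, 2000), (12, 3000), (9, 1500), (20, 4000)]
rates = [polymorphisms_per_kb(n, bp) for n, bp in toy_genes]
mean_rate, half_width = mean_with_ci95(rates)
print(f"{mean_rate:.2f} ± {half_width:.2f} polymorphisms per kilobase")
```

With small gene sets a t-based interval would be wider than this normal approximation; the choice here is for brevity only.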

Furthermore, we used data for the 125 immune genes mentioned above to calculate the number of common polymorphisms (minor allele frequency ≥10%) per kilobase for subdivisions of the immune genome. We restricted this calculation to common polymorphisms to prevent bias generated in lower frequency polymorphisms due to different sequencing techniques, different samples, or errors from different laboratories. The average number of common polymorphisms per kilobase for the 125 immune genes was 1.17. There was no significant difference in the levels of polymorphism between innate (n = 89) and adaptive (n = 22) immune genes (p = 0.9483); however, immune receptors (n = 49; 1.62 common polymorphisms per kilobase) were significantly more polymorphic than nonreceptors (n = 76; 1.40 common polymorphisms per kilobase) (p = 0.0272). Note that genes that would be considered "other" as described in Fig. 1 were not included in the comparison between levels of polymorphism in innate and adaptive immune genes. The higher number of innate immune genes was due to using IIPGA as a data source. To compare immune genome averages to the exceptionally high levels of common polymorphisms in the MHC, we calculated the number of common polymorphisms per kilobase for HLA-A, HLA-B, and HLA-C. These ratios are 58.28, 57.80, and 61.76, respectively, which were significantly higher than the immune genome average (p = 6.83 × 10^-68).
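The restriction to common polymorphisms described above amounts to a filter on minor allele frequency before normalizing by sequence length. The sketch below is illustrative only: it assumes biallelic sites described by the frequency of one allele, and the function name and toy frequencies are hypothetical.

```python
def common_polymorphisms_per_kb(allele_frequencies, bases_sequenced, min_maf=0.10):
    """Count sites whose minor allele frequency is at least min_maf, per kilobase.

    allele_frequencies: frequency of one allele at each biallelic site (assumed layout).
    """
    n_common = sum(
        1 for freq in allele_frequencies if min(freq, 1.0 - freq) >= min_maf
    )
    return n_common / (bases_sequenced / 1000.0)


# Toy allele frequencies for one gene; not data from the study.
toy_frequencies = [0.02, 0.10, 0.35, 0.95, 0.50]
common_rate = common_polymorphisms_per_kb(toy_frequencies, bases_sequenced=2000)
```

Taking the smaller of `freq` and `1 - freq` means a site reported at 0.95 is treated as having a 5% minor allele and is excluded, regardless of which allele the source happened to list.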

The extremely high levels of variation observed in a small subset of loci may be a feature of "recognition" genes that sense the environment, a group that includes some immune system genes as well as genes in other categories. Several physiological processes share the basic pattern of recognizing a foreign stimulus and then developing an appropriate response. Examples include pathogen recognition followed by an immune response, olfaction followed by salivation, visual perception followed by movement, and receptor stimulation followed by a molecular cascade [24–26]. The wide range of stimuli, whether pathogens or factors in the environment, is continually changing. For their products to remain successful recognition molecules, the recognition genes must also evolve rapidly and effectively. This might explain the increased gene duplication and rapid evolution observed in some genes directing immune response, membrane-surface interactions, and drug detoxification [21] and the genetic diversity and high gene number found in the olfactory receptor gene family [27]. Of the molecules with known functions that we analyzed, genes with the highest level of polymorphism were involved in recognition, such as the immune receptors, the MHC class I genes, PDE6H, GUCA1B (both involved in visual perception), OR2S2 (an olfactory gene), SFTP1, SFTP2, TLR6, and TLR10.

Fig. 4. Gene clustering, tandem gene duplication, and periodicity in the immune genome. (A) Each point shows the distance (in megabases) between two immune genes on the same chromosome (x axis) vs the percentage of protein sequence similarity (y axis) of the measured gene pair. All data points are combined on this plot to show the data from all chromosomes. (B) The diagram shows how tandem, segmental duplication can result in multiple genes occurring at set distances. The top shows an initial gene sequence that is subsequently duplicated using tandem, segmental duplications and another mechanism. The bottom shows the distances measured between genes in the resulting duplicated sequence and an alignment of those distances. (C) A clustering analysis of chromosome 19, which was selected since it is the most immune-gene-dense chromosome, is provided to show periodicity in the immune genome. Many points appear in vertical lines, similar to the aligned distances frame of (B). This could indicate the presence of tandem, segmental duplication.

Recent work suggests that other parameters, such as extensive, undisrupted linkage disequilibrium and large allele frequency differences between populations, may be useful indicators of recent, or frequent, selection [28–30]. In a recent study, Akey et al. [31] performed a genome-wide search to determine which database-recorded polymorphisms show statistical evidence of selective pressure. When evaluating their lists of polymorphisms under selective pressure, we noticed that 21% of the polymorphisms with a statistically low Wright's Fixation Index (FST), indicating balancing selection, occurred in immune genes, while only 7% of the polymorphisms with statistically high FST values, indicating positive selection, were from immune genes. Since 7% of all human genes are immune related, this initial comparison may offer a preliminary suggestion that genes under balancing selection commonly include immune genes.

Fig. 5. Comparison of level of polymorphism between immune genes and nonimmune genes. Each point on this scatter plot represents the number of polymorphisms per kilobase of a gene. The points are arranged in ascending order to show the range of levels of polymorphism. Nonimmune genes, n = 271; immune genes, n = 125.

Table 2
Functional categories recorded in IRIS

Category | Number
Innate immunity | 656
Inflammation | 320
Cell movement, including chemotaxis and cell adhesion | 196
Coagulation | 114
Phagocytosis | 37
Complement | 66
Innate killing, including natural killer cell function | 85
Adaptive immunity | 432
Cellular response, including immune-related apoptosis | 149
Humoral immunity, including activities involving B cells and immunoglobulins (antibodies) | 99
Antigen processing, including recognition and presentation of antigen | 153
Barrier/mucosal immunity | 45
Cytokines, chemokines, proteins interacting directly with them, and proteins in their pathways | 264
Pathways or signaling that result in the expression of immune molecules | 478
Development of the immune system, including receptor formation, hematopoiesis, leukemia, and the maturation/selection processes | 131
Involved in immunodeficiency | 74
Involved in autoimmunity | 45
Related to disease other than immunodeficiency and autoimmunity | 176
Induced by immunomodulator | 202
Expressed primarily in immune tissues | 341
Other | 108

Number indicates the number of genes recorded in each category. Although there are 1562 genes listed in IRIS, more than that number appear here, as genes could be placed into multiple categories.

Disease associations and immune genes

If defective, immune system genes can result in immunodeficiency [32,33] and can confer susceptibility to infection [34–36]. Genetic associations identified in common autoimmune disorders have been attributed largely to immune system genes [37–40]. Immune genes have also been associated with diseases not related directly to immune functioning, such as mental retardation, atherosclerosis, and myocardial infarction [41–43]. We asked whether genes in IRIS were, objectively, disproportionately disease-associated.

Almost 19% of our immune gene set (n = 293) had known disease associations recorded in the LocusLink database, encompassing autoimmunity, immunodeficiency, or another form of disease. The high number of disease associations was understandable given the intimate relationship of the immune system and disease. However, such a large number of disease-associated genes could be enhanced by overreporting of false-positive disease associations and bias toward conducting disease association studies on immunologically relevant genes.

Methods

Recording and annotating immune genes

Keyword searches of "common immunological terms," defined as bold printed words in Immunobiology [44], on NCBI's LocusLink database provided an initial list of immune genes (http://www.ncbi.nlm.nih.gov/LocusLink/). We supplemented this list with literature searches and recorded over 2500 potential immune-related genes. By using the information available on LocusLink and subsequently linked Web pages, textbooks, and literature sources, we inspected genes from the initial broad sweep one by one, discarding any genes that did not match our definition. For each gene included in IRIS, we recorded a descriptive name, the official or interim HUGO gene symbol, and GenBank accession number. Using the accession number, we extracted the nucleotide sequence and protein sequence. Where multiple accession numbers were available for a gene, we selected the accession number for the longest nucleotide isoform. The human chromosomal location was established using the Ensembl database [45]. Each immune gene was then placed into categories for its real or putative function, listed in Table 2. This annotation is both reproducible and limited because the source for information used was the LocusLink entry for each gene. Personal bias in reading the gene descriptions could affect the annotation.
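The rule of keeping the accession for the longest nucleotide isoform can be expressed compactly. The following is a sketch under assumptions: the mapping from accession to sequence, the function name, and the placeholder accession strings are all hypothetical, and the text does not say how ties were resolved.

```python
def longest_isoform(accession_to_sequence):
    """Return the (accession, sequence) pair with the longest nucleotide sequence.

    If two isoforms have equal length, the first one in the mapping is kept;
    this tie-breaking rule is an assumption, not something stated in the text.
    """
    return max(accession_to_sequence.items(), key=lambda item: len(item[1]))


# Placeholder accessions and sequences for illustration only.
toy_isoforms = {
    "ACC_0001": "ATGGCC",
    "ACC_0002": "ATGGCCAAGTGA",
    "ACC_0003": "ATG",
}
chosen_accession, chosen_sequence = longest_isoform(toy_isoforms)
```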

Clustering analysis

We compared the protein sequences of every immune gene against every other immune gene on each chromosome by BLAST to provide a percentage of protein sequence similarity. We then measured the distance in base pairs between each pair of immune genes on each chromosome. This resulted in a two-axis measure of sequence identity and distance for each pair of immune genes on the same chromosome.
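The pairwise procedure above can be sketched as follows. This is not the authors' pipeline: the BLAST step is replaced by a precomputed similarity lookup, the distance is assumed to be the difference between gene start coordinates (the text does not specify the reference points), and every gene name and value is a hypothetical placeholder. Only the 63% threshold is taken from the text.

```python
from itertools import combinations


def gene_pair_table(genes, similarity):
    """Build (gene_a, gene_b, distance_bp, similarity_pct) rows for same-chromosome pairs.

    genes: list of (symbol, chromosome, start_bp) tuples (assumed layout).
    similarity: dict mapping frozenset({symbol_a, symbol_b}) -> percent similarity,
        standing in for the BLAST comparison.
    """
    rows = []
    for (sym_a, chrom_a, pos_a), (sym_b, chrom_b, pos_b) in combinations(genes, 2):
        if chrom_a != chrom_b:
            continue  # only pairs on the same chromosome are measured
        pair = frozenset((sym_a, sym_b))
        if pair in similarity:
            rows.append((sym_a, sym_b, abs(pos_a - pos_b), similarity[pair]))
    return rows


# Placeholder genes and similarity values for illustration only.
toy_genes = [
    ("GENE_A", "19", 1_000_000),
    ("GENE_B", "19", 1_050_000),
    ("GENE_C", "19", 40_000_000),
    ("GENE_D", "6", 30_000_000),
]
toy_similarity = {
    frozenset(("GENE_A", "GENE_B")): 85.0,
    frozenset(("GENE_A", "GENE_C")): 20.0,
    frozenset(("GENE_B", "GENE_C")): 22.0,
}

pairs = gene_pair_table(toy_genes, toy_similarity)
# Pairs at or above the high-similarity threshold used in the text (63%).
high_similarity_pairs = [row for row in pairs if row[3] >= 63.0]
```

Each row corresponds to one point in a distance-versus-similarity plot; comparing the distances in `high_similarity_pairs` with those of all pairs is the kind of contrast the clustering analysis draws.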

Polymorphism analysis

Data extracted from the supplementary material of Stephens et al. [23] provided the number of polymorphisms, occurring at 1% frequency or higher, and the number of base pairs sequenced for the nonimmune genes in our analysis. These sequences were from 82 unrelated individuals from Caucasian, African-American, Asian, and Hispanic–Latino populations. Evaluating polymorphism levels from multiple populations can introduce additional bias into a study [46–48], since 3 to 5% of genetic differences occur between varying populations [49]. Additionally, different polymorphism detection protocols and the sequencing of different Centre d’Etude Polymorphisme Humain (CEPH) samples in different laboratories can introduce variation in lower frequency polymorphisms, affecting the comparison in our study between the polymorphism data of nonimmune genes and immune genes, which come from different sources. However, in view of the limited amount of validated, publicly available polymorphism data, we were unable to eliminate these biases from our experimental design.

Data extracted from the University of Washington–Fred Hutchinson Cancer Research Center Variation Discovery Resource (http://pga.gs.washington.edu) and the IIPGA (http://innateimmunity.net) were taken between November 2003 and March 2004 and represent the nucleotide sequences of 23 CEPH Caucasian samples of European descent. A list of the immune genes used in our polymorphism analysis is included in the supplementary material as Table 3.

Allele frequencies and sequence alignments describing a population of European descent for HLA-A, HLA-B, and HLA-C were taken from NCBI's dbMHC database (http://www.ncbi.nlm.nih.gov/MHC/).

All p value statistics were calculated using an analysis of variance from Smith's Statistical Package (http://www.economics.pomona.edu/StatSite/SSP.html).
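For readers unfamiliar with the test, the F statistic underlying a one-way analysis of variance can be sketched as below. This is not Smith's Statistical Package and does not reproduce its output; the function name and toy groups are hypothetical. Converting F to a p value requires the F distribution, which the Python standard library does not provide, so that step is left out.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way analysis of variance."""
    all_values = [value for group in groups for value in group]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(group) / len(group) for group in groups]

    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(
        len(group) * (mean - grand_mean) ** 2
        for group, mean in zip(groups, group_means)
    )
    # Variation of individual values around their own group mean.
    ss_within = sum(
        (value - mean) ** 2
        for group, mean in zip(groups, group_means)
        for value in group
    )

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_statistic = (ss_between / df_between) / (ss_within / df_within)
    return f_statistic, df_between, df_within


# Toy groups for illustration only; not data from the study.
f_stat, df_b, df_w = one_way_anova_f(
    [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
)
```

A large F means the group means differ by more than the spread within groups would predict, which is what a small p value in the comparisons above reflects.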

Acknowledgments

We thank Peter Lachmann (University of Cambridge)

and Peter Parham (Stanford University) for commenting on

the completeness of the immune gene list, and we thank

Will Wang (Cambridge Institute for Medical Research) for

useful comments on the polymorphism analysis.

References

[1] R. Waterston, et al., Initial sequencing and comparative analysis of the

mouse genome, Nature 420 (2002) 520–562.

[2] E. Zdobnov, et al., Comparative genome and proteome analysis of

Anopheles gambiae and Drosophila melanogaster, Science 298 (2002)

149–159.

[3] G.K. Christophides, et al., Immunity related genes and gene families

in Anopheles gambiae, Science 298 (2002) 159–165.

[4] J. Trowsdale, P. Parham, Defense strategies and immune-related

genes, Eur. J. Immunol. 34 (2004) 7–17.

[5] V. Brusic, J. Zeleznikow, N. Petrovsky, Molecular immunology

databases and data repositories, J. Immunol. Methods 238 (2000)

17–28.

[6] A.D. Baxevanis, The molecular biology database collection: 2002

update, Nucleic Acids Res. 30 (2002) 1–12.

[7] N. Petrovsky, V. Brusic, Computational immunology: the coming of

age, Immunol. Cell Biol. 80 (2002) 248–254.

[8] M.P. Lefranc, G. Lefranc, The Immunoglobulin Facts Book,

Academic Press, London, 2001.

[9] M.P. Lefranc, G. Lefranc, The T Cell Receptor Facts Book, Academic

Press, London, 2001.

[10] M.P. Lefranc, et al., IMGT, the International Immunogenetics

Information System, Nucleic Acids Res. 33 (2005) D593–D597

(database issue).

[11] V. Giudicelli, D. Chaume, M.P. Lefranc, IMGT/GENE-DB: a

comprehensive database for human and mouse immunoglobulin and

T cell receptor genes, Nucleic Acids Res. 33 (2005) D256–D261

(database issue).

[12] E.E. Eichler, Recent duplication, domain accretion and the dynamic

mutation of the human genome, Trends Genet. 17 (2001) 661–669.

[13] O. Elemento, O. Gascuel, M.-P. Lefranc, Reconstructing the duplication

history of tandemly repeated genes, Mol. Biol. Evol. 19 (2002)

278–288.

[14] D. Leister, Tandem and segmental gene duplication and recombination

in the evolution of plant disease resistance genes, Trends Genet. 20

(2004) 116–122.

[15] T. Dandekar, B. Snel, M. Huynen, P. Bork, Conservation of gene

order: a fingerprint of proteins that physically interact, Trends

Biochem. Sci. 23 (1998) 324–328.

[16] T. Blumenthal, et al., A global analysis of Caenorhabditis elegans

operons, Nature 417 (2002) 851–854.

[17] M.J. Lercher, A.O. Urrutia, L.D. Hurst, Clustering of housekeeping

genes provides a unified model of gene order in the human genome,

Nat. Genet. 31 (2002) 180–183.

[18] J.M. Lee, E.L.L. Sonnhammer, Genomic gene clustering analysis of

pathways in eukaryotes, Genome Res. 13 (2003) 875–882.

[19] MHC Sequencing Consortium, Complete sequence and gene map

of a human major histocompatibility complex, Nature 401 (1999)

921–923.

[20] J. Trowsdale, The gentle art of gene arrangement: the meaning of gene

clusters, Genome Biol. 3 (2002) 2002.1–2002.5.

[21] J.A. Bailey, et al., Recent segmental duplications in the human

genome, Science 297 (2002) 1003–1007.

[22] C.B. Moore, et al., Evidence of HIV-1 adaptation to HLA-restricted

immune responses at a population level, Science 296

(2002) 1439–1443.

[23] J. Stephens, et al., Haplotype variation and linkage disequilibrium in

313 human genes, Science 293 (2001) 489–493.

[24] R. Medzhitov, C.A. Janeway, The Toll receptor family and microbial

recognition, Trends Microbiol. 8 (2000) 452–456.

[25] S.A. Hackley, F. Valle-Inclan, Which stages of processing are speeded

by a warning signal? Biol. Psychol. 64 (2003) 27–45.

[26] R.M. Pangborn, S.A. Witherly, F. Jones, Parotid and whole-mouth

secretion in response to viewing, handling, and sniffing food,

Perception 8 (1979) 339–346.

[27] I. Menashe, O. Man, D. Lancet, Y. Gilad, Different noses for different

people, Nat. Genet. 34 (2003) 143–144.

[28] T. Bersaglieri, et al., Genetic signature of strong recent positive

selection at the lactase gene, Am. J. Hum. Genet. 74 (2004) 1111–1120.

[29] P.C. Sabeti, et al., Detecting recent positive selection in the human

genome from haplotype structure, Nature 419 (2002) 832–837.

[30] M. Nordborg, S. Tavare, Linkage disequilibrium: what history has to

tell us, Trends Genet. 18 (2002) 83–90.

[31] J.M. Akey, G. Zhang, K. Zhang, L. Jin, M.D. Shriver, Interrogating a

high density SNP map for signatures of natural selection, Genome

Res. 12 (2002) 1805–1814.

[32] H.W.J. Schroeder, H.W.R. Schroeder, S.M. Sheikh, The complex

genetics of common variable immunodeficiency, J. Invest. Med. 52

(2004) 90–103.

[33] V. Lemahieu, J.M. Gastier, U. Francke, Novel mutations in the

Wiskott–Aldrich syndrome and their effects on transcriptional,

translational, and clinical phenotypes, Hum. Mutat. 14 (1999) 54–66.

[34] M.P. Martin, et al., Epistatic interaction between KIR3DS1 and HLA-B

delays the progression to AIDS, Nat. Genet. 31 (2002) 429–434.

[35] O. Koch, et al., IFNGR1 gene promoter polymorphisms and

susceptibility to cerebral malaria, J. Infect. Dis. 185 (2002)

1684–1687.

[36] A. Balamurugan, S.K. Sharma, N.K. Mehra, Human leukocyte

antigen class I supertypes influence susceptibility and severity of

tuberculosis, J. Infect. Dis. 189 (2004) 805–811.

[37] J.-P. Hugot, et al., Association of NOD2 leucine-rich repeats

variants with susceptibility to Crohn’s disease, Nature 411 (2001)

599–603.

[38] Y. Ogura, et al., A frameshift mutation in NOD2 associated

with susceptibility to Crohn’s disease, Nature 411 (2001)

603–606.

[39] J.A. Todd, L.S. Wicker, Genetic protection from inflammatory disease

type 1 diabetes in human and animal models, Immunity 15 (2001)

387–395.

[40] D.A. Dyment, G.C. Ebers, A.D. Sadovnick, Genetics of multiple

sclerosis, Lancet Neurol. 3 (2004) 104–110.

[41] A. Carrie, et al., A new member of the IL-1 receptor family highly

expressed in hippocampus and involved in X-linked mental

retardation, Nat. Genet. 23 (1999) 25–31.

[42] K. Ozaki, et al., Functional SNPs in the lymphotoxin-alpha gene that

are associated with susceptibility to myocardial infarction, Nat. Genet.

32 (2002) 650–654.

[43] K. Wenzel, et al., E-selectin polymorphism and atherosclerosis: an

association study, Hum. Mol. Genet. 3 (1994) 1935–1937.

[44] C.A. Janeway, P. Travers, M. Walport, M. Shlomchik,

Immunobiology, Garland, New York, 2001.

[45] E. Birney, et al., An overview of Ensembl, Genome Res. 14 (2004)

925–928.

[46] J.F. Wilson, et al., Population genetic structure of variable drug

response, Nat. Genet. 29 (2001) 265–269.

[47] J.A. Schneider, et al., DNA variability of human genes, Mech. Ageing

Dev. 124 (2003) 17–25.

[48] N. Risch, E. Burchard, E. Ziv, H. Tang, Categorization of humans in

biomedical research: genes, race and disease, Genome Biol. 3 (2002),

2007.

[49] N.A. Rosenberg, et al., Genetic structure of human populations,

Science 298 (2002) 2381–2385.