Genomics 85 (2005) 503–511
www.elsevier.com/locate/ygeno
IRIS: A database surveying known human immune system genes
James Kelley a, Bernard de Bono b, John Trowsdale a,*
a Department of Pathology, Immunology Division, University of Cambridge, Cambridge CB2 1QP, UK
b European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, UK
Received 11 December 2004; accepted 20 January 2005
Abstract
We have compiled an online database of known human defense genes: the Immunogenetic Related Information Source (IRIS).
As of October 1, 2004, there are 1562 immune genes recorded in IRIS, representing 7% of the human genome. This resource
contains searchable information including chromosomal location, sequence data, and a curated functional annotation for each entry.
We used IRIS as a basis for analyzing the composition and characteristics of the immune genome, such as gene clustering,
polymorphism, and relationship to disease. High protein sequence similarity correlated inversely with distance between immune
genes, consistent with clustering of duplicated loci. We also found that, even though some immune genes exhibit high levels of
polymorphism, such as MHC class I, the range of levels of polymorphism in immune genes is similar to that of nonimmune
genes. Approximately 20% of immune genes have a known disease association. IRIS is available online at http://www.immunegene.org.
© 2005 Elsevier Inc. All rights reserved.
Keywords: Immune system; Human genome; Genetic information databases; Genetic polymorphisms; Gene clusters; Disease
Where comparisons have been made, such as between
mouse and human [1] and Anopheles and Drosophila
[2,3], the number of immune system genes differs
markedly in related species. It has been suggested that
other features, such as increased levels of polymorphism
and clustering of immune system genes, may also reflect
the strong selection required for resistance to infection [4].
A database of all known immune genes would provide an
organized subset of functionally related human genes to
compare immune system genes, as a set, with the rest of
the genome. It could also be used in genetic studies of
function and disease, in DNA arrays, and in computer
prediction programs of potential therapeutic targets during
drug development.
0888-7543/$ - see front matter © 2005 Elsevier Inc. All rights reserved.
doi:10.1016/j.ygeno.2005.01.009
Supplementary data for this article may be found on ScienceDirect.
* Corresponding author. Fax: +44 1223 339768.
E-mail address: [email protected] (J. Trowsdale).
While several databases address the function of the immune
system [5–7], none include an exclusive survey of
the entire immune genome. To create such a database,
we developed the Immunogenetic Related Information
Source (IRIS). It currently includes information on
chromosomal locations, protein and nucleotide sequences,
and manual curations of the proposed function of each
gene in immunity. This online resource is available at
http://www.immunegene.org.
Using IRIS we set out to explore whether, as a group,
immune system genes display any distinctive character-
istics that reflect selective pressure for resistance to
infection, including clustering, level of polymorphism,
and genetic association with disease. This formal
organization of immune genes provides a basis for
functional interaction mapping, serves as a documented
subset of functionally related genes, and facilitates future
observations on genomic features of the immune system.
Our initial analysis using IRIS suggests that immune
defense genes, as a functional subset, exhibit marked
gene duplication and association with disease, although
these features are by no means restricted to immune
defense.
J. Kelley et al. / Genomics 85 (2005) 503–511
Results and discussion
Definition of an immune gene
For the purposes of this database, an immune gene was
operationally defined as a complete gene that produces a
functional transcript and demonstrates at least one defense
characteristic listed in Table 1. Gene segments, such as those
encoding antigen receptors, immunoglobulins, and T cell
receptors [8,9], were not included. This exclusion prevented
an overrepresentation of data when counting the number of
immune genes. These immune gene segments are extensively
analyzed in an online database, the International ImMunoGeneTics
Information System (IMGT) (http://imgt.cines.fr:8104), maintained by LIGM at Montpellier, France
[10,11]. (For more details on the definition of an immune
gene, please refer to a recent review from our laboratory [4].)
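The operational definition above can be read as a simple inclusion rule. The following is a minimal sketch of that rule, not code from IRIS itself; the record fields, the short criterion labels, and the example entries are hypothetical names introduced only for illustration, and the criteria are paraphrased from Table 1.

```python
from dataclasses import dataclass, field

# Short labels paraphrasing the Table 1 criteria (labels are hypothetical).
DEFENSE_CHARACTERISTICS = {
    "immune_function",             # known or putative role in innate or adaptive immunity
    "immune_development",          # development or maturation of immune components
    "induced_by_immunomodulator",
    "expressed_in_immune_tissue",
    "immune_pathway",              # pathway leading to expression of defense molecules
    "pathogen_interaction",        # product interacts directly with pathogens
}


@dataclass
class GeneRecord:
    symbol: str
    is_complete_gene: bool           # False for gene segments
    has_functional_transcript: bool
    characteristics: set = field(default_factory=set)


def is_immune_gene(gene: GeneRecord) -> bool:
    """Complete gene, functional transcript, and at least one defense characteristic."""
    if not (gene.is_complete_gene and gene.has_functional_transcript):
        return False
    return bool(gene.characteristics & DEFENSE_CHARACTERISTICS)
```

Under this reading, a gene segment is rejected even when it carries an immune function, which mirrors the exclusion of antigen receptor segments described above.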
Even though some genes not included under this
definition may be involved in defense mechanisms in a
broader sense, we were deliberately conservative for the
initial compilation. We have not included genes whose
products correct or prevent physical disturbances such as
tissue repair from wounds or DNA repair from radiation (for
example, melanin); genes whose products provide a
physical barrier such as epithelia, unless there is a specific
interaction with a pathogen (for example, keratin); genes
whose role in immunity is used in many systems of the body
(for example, actin); genes that share sequence similarity
with known immune genes but have no putative defense
function (for example, some immunoglobulin superfamily
members); and genes whose products interact with non-
pathogenic antigens (for example, some components in
digestion or reproduction).
Composition and functional survey of the immune genome
As of October 1, 2004, IRIS contained 1562 immune
genes, indicating that the immune genome comprises over
7% of the total 21,432 human genes. This was the number
of complete human genes recorded in LocusLink (http://www.ncbi.nlm.nih.gov/LocusLink).
Table 1
Defense characteristics of the immune system
Known or putative function in innate or adaptive immunity
Participates in the development or maturation of immune system components
Induced by immunomodulators (a)
Encodes a protein expressed primarily in immune tissues (b)
Participates in an immune pathway that results in the expression of defense molecules (c)
Produces a protein that interacts directly with pathogens or their products
Any complete, human gene that produces a functional transcript and matches one or more of these criteria is included in IRIS.
(a) Immunomodulators, such as IFN-γ, regulate expression of immune components. Proteins that are direct effectors of immunomodulators are thought to have an immune function.
(b) We assume that if a gene is commonly expressed in immune tissues then it has a role in immune functioning.
(c) NF-κB is an example.
In this analysis, the innate and adaptive immune systems
occupied 41 and 27%, respectively, of the immune genome.
The larger number of innate immune genes compared to
adaptive immune system genes was affected by the absence
in IRIS of immunoglobulin [8] and T cell receptor [9] gene
segment repertoires, which in base pairs comprise a large
portion of the adaptive immune genome. Genes dedicated to
the development of immune tissues, for example in immune
cell hematopoiesis, represented 8%. Approximately 24% of
the immune genes were not easily placed into any of the
previously mentioned categories. This "other" category
included genes that can be induced by an immunomodulator,
can participate in a pathway leading to the expression of an
immune molecule, and are expressed primarily in immune
tissues (see Fig. 1).
Distribution of immune genes
IRIS was used to ask whether immune system genes are
randomly distributed in the genome. They were found to
be fairly evenly distributed, as expected of an ancient,
complex system. However, mapping immune gene distri-
bution, which is shown as Fig. 2, revealed some variation
in immune gene densities. Chromosomes 19, 17, 6, and 11
had higher, and chromosomes 18, 13, X, and 14 had lower,
immune gene densities, while the Y chromosome was
completely lacking in immune genes. There were some
pockets of immune genes at very high density. In some
cases this was accounted for by adjacent duplicated genes,
such as in the leukocyte receptor complex (LRC) of
chromosome 19, the natural killer complex of chromosome
12, the MHC on chromosome 6, and the cluster of
interleukin genes on chromosome 1. This increased density
might be due to relatively recent gene duplications, as in
the LRC, but there was also evidence for examples, such
as the MHC, that do not solely consist of related
duplicates.
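A per-chromosome density comparison of this kind can be sketched as follows. This is an illustrative outline only: it assumes density is measured as immune genes per megabase, which the text does not state explicitly, and the gene counts and chromosome lengths in the toy input are made up rather than taken from IRIS.

```python
from collections import Counter


def immune_gene_density(gene_chromosomes, chromosome_lengths_mb):
    """Return immune genes per megabase for each chromosome.

    gene_chromosomes: iterable of chromosome names, one entry per immune gene.
    chromosome_lengths_mb: dict mapping chromosome name to its length in Mb.
    """
    counts = Counter(gene_chromosomes)
    return {
        chrom: counts.get(chrom, 0) / length
        for chrom, length in chromosome_lengths_mb.items()
    }


# Toy input: invented counts and lengths, not values from IRIS.
toy_genes = ["19"] * 6 + ["18"] * 2
toy_lengths = {"19": 60.0, "18": 80.0, "Y": 50.0}
densities = immune_gene_density(toy_genes, toy_lengths)

# Chromosomes ordered from most to least immune-gene-dense.
ranked = sorted(densities, key=densities.get, reverse=True)
```

Chromosomes with no recorded immune genes, such as the Y chromosome in the analysis above, simply receive a density of zero.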
Clustering of immune genes
There are essentially two types of gene clusters. The most
common are clusters of related duplicated genes whose
functions diverge, such as the KIRs and ULBPs. These
presumably result from tandem segmental duplications [12],
caused by slipped-strand mispairing, gene conversion, or
unequal crossover [13]. Tandem duplication explains the
clustered arrangement of the NBS–LRR disease resistance
genes in plants, for example [14].
A second type of gene cluster is of genes with related
functions but with unrelated sequences. Clustering of this
type is found in prokaryotic operons that encode interact-
ing proteins [15]. With the exception of gene clustering in
operons of Caenorhabditis elegans [16], it is generally
thought that genes in eukaryotes are randomly distributed in
the genome [17]. However, recent observations in human
and other eukaryotic genomes reveal significant clustering of
genes that function within the same pathway [18]. While
there are suggestions of grouping of functionally related
immune system genes in the MHC [19,20] and evidence of
nonrandom distribution of segmental duplications for
immune genes [21], no comprehensive, reported measure
of gene clustering exists in the literature.
Fig. 1. Percentage of immune genes dedicated to major functional divisions. Note that immune gene segments, such as those that encode antigen receptors, are not included. These 673 gene segments comprise a substantial portion of the adaptive immune genome and are recorded in the IMGT/GENE-DB database (http://imgt.cines.fr) [10,11].
Immune genes with high protein sequence similarity
(≥63%) were significantly clustered together (p = 7.57 × 10⁻³¹), while immune genes with low protein sequence
similarity (≤21%) were distributed randomly with respect
to distance (p = 0.1698). We arbitrarily established the
definition of high and low sequence similarity by taking
the protein sequence similarity values falling above and
below the median 95% of all values, as shown in Fig. 3.
Using an analysis of variance between the distance values
of all immune gene pairs and gene pairs with descending
thresholds of sequence similarity, we found that subsets of
immune genes with protein sequence similarity equal to or
greater than 28% are located significantly closer in
distance, compared to the entire data set (see Fig. 4).
The inverse relationship between distance and protein
sequence similarity suggested prevalence of tandem duplications
and segmental duplications resulting in close
proximity of diverging genes. Further evidence for this
could be seen in the periodicity noticed comparing
distances between gene pairs on a chromosome. As shown
in Figs. 4B and 4C, we observed that genes on the same
chromosome frequently occur at set distances from each
other, demonstrating both the clustered nature of immune
gene superfamilies and their arrangement as tandem,
segmental duplications.
Fig. 2. Immune genome map. Each line shows the genomic location of an immune gene. Dense clusters of genes appear as a larger band. Chromosome lengths and the distances between genes are drawn to scale. Note that immune gene segments, such as those that encode immunoglobulin and T cell receptors, are not included.
Fig. 3. Number of genes by percentage of sequence similarity in the clustering analysis. Genes falling in the lighter shaded area constitute the median 95% of gene pairs in this analysis. The vertical lines and darker areas indicate the gene pairs we considered as having high (≥63%) and low (≤21%) sequence similarity. The mean percentage of protein sequence similarity for immune genes is 32.4%.
Polymorphism in immune genes
Certain immune loci are among the most variable in
sequence, consistent with selection related to differences in
pathogen load and type. For example, MHC class I
exhibits extremely high levels of polymorphism, relating to
its role in presenting antigens from infectious organisms,
some of which, such as HIV, evolve rapidly [22]. We
therefore used the IRIS database to ask whether immune
system genes, in general, are disproportionately polymor-
phic. Using data provided from Stephens et al. [23], we
calculated the mean number of polymorphisms per kilo-
base, with 95% confidence intervals, for 271 nonimmune
genes as 5.36 ± 0.31. The same measure for 125 immune
genes, calculated with data extracted from the University
of Washington–Fred Hutchinson Cancer Research Center
Variation Discovery Resource (UW-FHCRC) (http://pga.gs.washington.edu)
and the Innate Immunity Programs for Genetic Applications (IIPGA)
(http://innateimmunity.net), was 4.85 ± 0.45. The levels of polymorphism
between these two data sets, which are shown in Fig. 5,
were not statistically significantly different (p = 0.0671).
Therefore, the overall range of levels of polymorphism of
immune system genes was, in general, similar to that of
nonimmune genes. The nonimmune genes with the highest
polymorphism per kilobase ratio included C6orf15,
CHRAC1 (chromatin assembly), GUCA1B (visual percep-
tion), OR2S2 (smell perception), PDE6H (visual percep-
tion), POLR1D (polymerase), PSORS1C2 (function
unknown, located in the MHC), and UFM1 (function
unknown). The most polymorphic immune genes in this
analysis were DEFB1, IL1R2, SFTP1, SFTP2, TLR6, and
TLR10. Genes with a level of polymorphism higher than
what is expected by chance included TLR10, DEFB1,
POLR1D, PSORS1C2, and C6orf15.
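The summary statistic used in this comparison, a mean number of polymorphisms per kilobase with a 95% confidence interval, can be illustrated with a short sketch. It assumes a normal approximation (1.96 standard errors) for the interval, which may differ from the exact procedure used in the analysis, and the per-gene counts below are hypothetical toy values.

```python
import math
import statistics


def polymorphisms_per_kb(n_polymorphisms, bases_sequenced):
    """Polymorphism density: count divided by sequenced length in kilobases."""
    return n_polymorphisms / (bases_sequenced / 1000.0)


def mean_with_ci95(values):
    """Return (mean, half_width) of a normal-approximation 95% confidence interval."""
    mean = statistics.fmean(values)
    sem = statistics.stdev(values) / math.sqrt(len(values))  # standard error of the mean
    return mean, 1.96 * sem  # 1.96 is the two-sided 95% normal quantile


# Toy data (hypothetical): (polymorphism count, base pairs sequenced) per gene.
toy_genes = [(12, 2400), (9, 1500), (20, 5000), (7, 1000)]
densities = [polymorphisms_per_kb(count, bp) for count, bp in toy_genes]
mean, half_width = mean_with_ci95(densities)
```

Two gene sets summarized this way can then be compared by an analysis of variance, as was done for the immune and nonimmune sets.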
Furthermore, we used data for the 125 immune genes
mentioned above to calculate the number of common
polymorphisms (minor allele frequency ≥10%) per kilobase
for subdivisions of the immune genome. We restricted this
calculation to common polymorphisms to prevent bias
generated in lower frequency polymorphisms due to different
sequencing techniques, different samples, or errors from
different laboratories. The average number of common
polymorphisms per kilobase for the 125 immune genes was
1.17. There was no significant difference in the levels of
polymorphism between innate (n = 89) and adaptive (n = 22)
immune genes (p = 0.9483); however, immune receptors (n =
49; 1.62 common polymorphisms per kilobase) were significantly
more polymorphic than nonreceptors (n = 76; 1.40
common polymorphisms per kilobase) (p = 0.0272). Note
that genes that would be considered "other" as described in
Fig. 1 were not included in the comparison between levels of
polymorphism in innate and adaptive immune genes. The
higher number of innate immune genes was due to using
IIPGA as a data source. To compare immune genome
averages to the exceptionally high levels of common poly-
morphisms in MHC, we calculated the number of common
polymorphisms per kilobase for HLA-A, HLA-B, and HLA-
C. These ratios are 58.28, 57.80, and 61.76, respectively,
which were significantly higher than the immune genome
average (p = 6.83 × 10⁻⁶⁸).
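Restricting the count to common polymorphisms amounts to a frequency filter applied before dividing by the sequenced length. A minimal sketch follows, assuming each polymorphism is represented only by its minor allele frequency; the function name and the toy frequencies are illustrative and not drawn from the data sources above.

```python
def common_polymorphisms_per_kb(minor_allele_freqs, bases_sequenced, maf_cutoff=0.10):
    """Count polymorphisms with minor allele frequency >= cutoff, per kilobase."""
    n_common = sum(1 for maf in minor_allele_freqs if maf >= maf_cutoff)
    return n_common / (bases_sequenced / 1000.0)


# Toy gene: five polymorphisms in 2000 sequenced base pairs (hypothetical values).
toy_mafs = [0.02, 0.10, 0.35, 0.07, 0.48]
rate = common_polymorphisms_per_kb(toy_mafs, bases_sequenced=2000)
```

In the toy gene, the two variants below the 10% cutoff are ignored, which is how the restriction reduces sensitivity to low-frequency variants that differ between sequencing efforts.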
The extremely high levels of variation observed in a small
subset of loci may be a feature of "recognition" genes that sense the environment, which includes some immune
system genes as well as some in other categories. Several
physiological processes have the basic pattern of recognizing
a foreign stimulus and then developing an appropriate
response. Examples include pathogen recognition followed
by an immune response, olfaction followed by salivation,
visual perception followed by movement, and receptor
stimulation followed by a molecular cascade [24–26]. The
wide ranges of stimuli, either pathogens or factors in the
environment, are continually changing. To remain successful
recognition molecules, the recognition genes must also
evolve rapidly and effectively. This might explain the
increased gene duplication and rapid evolution observed in
some genes directing immune response, membrane-surface
interactions, and drug detoxification [21] and the genetic
diversity and high gene number found in the olfactory
receptor gene family [27]. Of the molecules with known
functions that we analyzed, genes with the highest level of
polymorphism were involved in recognition, such as the
immune receptors, the MHC class I genes, PDE6H,
GUCA1B (both involved in visual perception), OR2S2
(an olfactory gene), SFTP1, SFTP2, TLR6, and TLR10.
Fig. 4. Gene clustering, tandem gene duplication, and periodicity in the immune genome. (A) Each point shows the distance (in megabases) between two immune genes on the same chromosome (x axis) vs the percentage of protein sequence similarity (y axis) of the measured gene pair. All data points are combined on this plot to show the data from all chromosomes. (B) The diagram shows how tandem, segmental duplication can result in multiple genes occurring at set distances. The top shows an initial gene sequence that is subsequently duplicated using tandem, segmental duplications and another mechanism. The bottom shows the distances measured between genes in the resulting duplicated sequence and an alignment of those distances. (C) A clustering analysis of chromosome 19, which was selected since it is the most immune-gene-dense chromosome, is provided to show periodicity in the immune genome. Many points appear in vertical lines, similar to the aligned distances frame of (B). This could indicate the presence of tandem, segmental duplication.
Fig. 5. Comparison of level of polymorphism between immune genes and nonimmune genes. Each point on this scatter plot represents the number of polymorphisms per kilobase of a gene. The points are arranged in ascending order to show the range of levels of polymorphism. Nonimmune genes, n = 271; immune genes, n = 125.
Table 2
Functional categories recorded in IRIS
Category    Number
Innate immunity    656
Inflammation    320
Cell movement, including chemotaxis and cell adhesion    196
Coagulation    114
Phagocytosis    37
Complement    66
Innate killing, including natural killer cell function    85
Adaptive immunity    432
Cellular response, including immune-related apoptosis    149
Humoral immunity, including activities involving B cells and immunoglobulins (antibodies)    99
Antigen processing, including recognition and presentation of antigen    153
Barrier/mucosal immunity    45
Cytokines, chemokines, proteins interacting directly with them, and proteins in their pathways    264
Pathways or signaling that result in the expression of immune molecules    478
Development of the immune system, including receptor formation, hematopoiesis, leukemia, and the maturation/selection processes    131
Involved in immunodeficiency    74
Involved in autoimmunity    45
Related to disease other than immunodeficiency and autoimmunity    176
Induced by immunomodulator    202
Expressed primarily in immune tissues    341
Other    108
Number indicates the number of genes recorded in each category. Although
there are 1562 genes listed in IRIS, more than that number appear here, as
genes could be placed into multiple categories.
Recent work suggests that other parameters, such as
extensive, undisrupted linkage disequilibrium and large
allele frequency differences between populations, may be
useful indicators of recent, or frequent, selection [28–30]. In
a recent study, Akey et al. [31] performed a
genome-wide search to determine which database-recorded
polymorphisms statistically show evidence of selective
pressure. When evaluating their lists of polymorphisms
under selective pressure, we noticed that 21% of the
polymorphisms with a statistically low Wright's Fixation
Index (FST), indicating balancing selection, occurred in
immune genes, while only 7% of the polymorphisms with
statistically high FST values, indicating positive selection,
were from immune genes. Since 7% of all human genes are
immune related, this initial comparison may offer a
preliminary suggestion that genes under balancing selection
commonly include immune genes.
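For readers unfamiliar with the fixation index, the textbook form is FST = (HT − HS) / HT, where HT is the expected heterozygosity of the pooled population and HS is the mean expected heterozygosity within subpopulations. The sketch below applies this form to a single biallelic locus, assuming equally sized subpopulations. It is an illustration of the quantity only; the estimator actually used by Akey et al. [31] may differ, and the function name is an assumption.

```python
def fst_biallelic(subpop_freqs):
    """Wright's F_ST for one biallelic locus, assuming equally sized subpopulations.

    subpop_freqs: frequency of the same allele in each subpopulation.
    """
    n = len(subpop_freqs)
    p_bar = sum(subpop_freqs) / n
    h_total = 2.0 * p_bar * (1.0 - p_bar)                       # pooled expected heterozygosity
    h_within = sum(2.0 * p * (1.0 - p) for p in subpop_freqs) / n  # mean within-subpopulation value
    if h_total == 0.0:
        return 0.0  # monomorphic overall; F_ST is undefined and reported as 0 here
    return (h_total - h_within) / h_total
```

Values near 0 correspond to similar allele frequencies across populations, the pattern read above as balancing selection, while values near 1 correspond to strong differentiation, read above as positive selection.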
Disease associations and immune genes
If defective, immune system genes can result in
immunodeficiency [32,33] and can confer susceptibility
to infection [34–36]. Genetic associations identified in
common autoimmune disorders have been attributed
largely to immune system genes [37–40]. Immune genes
have been associated with diseases not related directly to
immune functioning, such as mental retardation, athero-
sclerosis, and myocardial infarction [41–43]. We asked
whether genes in IRIS objectively were disproportionately
disease-associated.
Almost 19% of our immune gene set (n = 293) had
known disease associations recorded in the LocusLink
database, encompassing autoimmunity, immunodeficiency,
or another form of disease. The high number of disease
associations was understandable given the intimate rela-
tionship of the immune system and disease. However, such
a large number of disease-associated genes could be
enhanced by overreporting of false-positive disease asso-
ciations and bias toward conducting disease association
studies on immunologically relevant genes.
Methods
Recording and annotating immune genes
Keyword searches of "common immunological terms," defined as bold printed words in Immunobiology [44], on
NCBI’s LocusLink database provided an initial list of
immune genes (http://www.ncbi.nlm.nih.gov/LocusLink/).
We supplemented this list with literature searches and
recorded over 2500 potential immune-related genes. By
using the information available on LocusLink and sub-
sequently linked Web pages, textbooks, and literature
sources, we inspected genes from the initial broad sweep
one by one, discarding any genes that did not match our
definition. For each gene included in IRIS, we recorded a
descriptive name, the official or interim HUGO gene
symbol, and GenBank accession number. Using the
accession number, we extracted the nucleotide sequence
and protein sequence. Where multiple accession numbers
were available for each gene, we selected the accession
number for the longest nucleotide isoform. The human
chromosomal location was established using the Ensembl
database [45]. Each immune gene was then placed into
categories for its real or putative function, listed in Table 2.
This annotation is both reproducible and limited because the
source for information used was the LocusLink entry for
each gene. Personal bias in reading the gene descriptions
could affect the annotation.
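The rule for choosing among multiple accession numbers, keeping the longest nucleotide isoform, can be expressed compactly. The sketch below is an assumption about how such a selection might be coded, not the procedure used to build IRIS; the tie-breaking rule and the accession labels are hypothetical.

```python
def select_longest_isoform(isoforms):
    """Pick the accession whose nucleotide sequence is longest.

    isoforms: dict mapping accession number -> nucleotide sequence (str).
    Ties are broken by the lexicographically smallest accession, so the
    choice is deterministic (this tie rule is an assumption).
    """
    if not isoforms:
        raise ValueError("no accession numbers supplied for this gene")
    return min(isoforms, key=lambda acc: (-len(isoforms[acc]), acc))


# Hypothetical accession labels and sequences, for illustration only.
toy = {"ACC_A": "ATGGCC", "ACC_B": "ATGGCCTTAAGG", "ACC_C": "ATG"}
chosen = select_longest_isoform(toy)
```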
Clustering analysis
We compared the protein sequences of every immune
gene against every other immune gene on each chromosome
by BLAST to provide a percentage of protein sequence
similarity. We then measured the distance in base pairs between
each pair of immune genes on each chromosome. This resulted
in a two-axis measure of sequence identity and distance
for each pair of immune genes on the same chromosome.
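The pairing step described here can be sketched as follows. The similarity values would come from BLAST comparisons in practice; in this sketch they are supplied by a caller-provided function, and the gene names, coordinates, and scores are hypothetical toy values rather than IRIS data.

```python
from itertools import combinations


def same_chromosome_pairs(genes, similarity):
    """Yield (gene_a, gene_b, distance_bp, percent_similarity) per same-chromosome pair.

    genes: dict mapping gene name -> (chromosome, start position in bp).
    similarity: function (name_a, name_b) -> percent protein sequence similarity.
    """
    by_chrom = {}
    for name, (chrom, start) in genes.items():
        by_chrom.setdefault(chrom, []).append((name, start))
    for members in by_chrom.values():
        for (a, pos_a), (b, pos_b) in combinations(sorted(members), 2):
            yield a, b, abs(pos_a - pos_b), similarity(a, b)


# Toy coordinates and similarity scores (hypothetical).
toy_genes = {
    "G1": ("19", 1_000),
    "G2": ("19", 6_000),
    "G3": ("19", 900_000),
    "G4": ("6", 5_000),
}
toy_scores = {frozenset(("G1", "G2")): 80.0}


def toy_similarity(a, b):
    # Unlisted pairs fall back to an arbitrary low similarity.
    return toy_scores.get(frozenset((a, b)), 25.0)


pairs = list(same_chromosome_pairs(toy_genes, toy_similarity))
# Pairs at or above the "high similarity" threshold reported in the text.
high = [p for p in pairs if p[3] >= 63.0]
```

Genes on different chromosomes never form a pair, so each record carries exactly the two coordinates plotted in the clustering analysis: distance and sequence similarity.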
Polymorphism analysis
Data extracted from the supplementary material of
Stephens et al. [23] provided the number of polymorphisms,
occurring at 1% frequency or higher, and the number of base
pairs sequenced for the nonimmune genes in our analysis.
These sequences were from 82 unrelated individuals from
Caucasian, African-American, Asian, and Hispanic–Latino
populations. Evaluating polymorphism levels from multiple
populations can enter additional bias into a study [46–48],
since 3 to 5% of genetic differences occur between varying
populations [49]. Additionally, different polymorphism
detection protocols and sequencing different Centre d’Etude
Polymorphisme Humain (CEPH) samples used in different
laboratories can introduce variation in lower frequency
polymorphisms, affecting a comparison between the poly-
morphism data of nonimmune genes and immune genes,
which come from different sources, analyzed in our study.
However, in view of the limited amount of validated,
publicly available polymorphism data, we were unable to
eliminate these biases from our experimental design.
Data extracted from the University of Washington–Fred
Hutchinson Cancer Research Center Variation Discovery
Resource (http://pga.gs.washington.edu) and the IIPGA
(http://innateimmunity.net) were taken between November
2003 and March 2004 and represent the nucleotide sequen-
ces of 23 CEPH Caucasian samples of European descent. A
list of the immune genes used in our polymorphism analysis
is included as supplementary material as Table 3.
Allele frequencies and sequence alignments describing a
population of European descent for HLA-A, HLA-B, and
HLA-C were taken from NCBI's dbMHC database (http://www.ncbi.nlm.nih.gov/MHC/).
All p value statistics were calculated using an analysis of
variance from Smith's Statistical Package (http://www.economics.pomona.edu/StatSite/SSP.html).
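For orientation, a one-way analysis of variance reduces to the ratio of between-group to within-group mean squares. The sketch below computes that F statistic with the Python standard library; it is not Smith's Statistical Package and is offered only as an illustration of the calculation, with toy groups in place of the study data.

```python
import statistics


def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for a one-way analysis of variance."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = statistics.fmean([x for g in groups for x in g])
    group_means = [statistics.fmean(g) for g in groups]
    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Variation of observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Toy groups (hypothetical values).
f_stat, df_b, df_w = one_way_anova_f([1, 2, 3], [2, 3, 4], [5, 6, 7])
```

A p value then follows from the upper tail of the F distribution with these degrees of freedom, for example via `scipy.stats.f.sf(f_stat, df_b, df_w)` if SciPy is available. The function assumes at least two groups and nonzero within-group variation.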
Acknowledgments
We thank Peter Lachmann (University of Cambridge)
and Peter Parham (Stanford University) for commenting on
the completeness of the immune gene list, and we thank
Will Wang (Cambridge Institute for Medical Research) for
useful comments on the polymorphism analysis.
References
[1] R. Waterston, et al., Initial sequencing and comparative analysis of the
mouse genome, Nature 420 (2002) 520–562.
[2] E. Zdobnov, et al., Comparative genome and proteome analysis of
Anopheles gambiae and Drosophila melanogaster, Science 298 (2002)
149–159.
[3] G.K. Christophides, et al., Immunity related genes and gene families
in Anopheles gambiae, Science 298 (2002) 159–165.
[4] J. Trowsdale, P. Parham, Defense strategies and immune-related
genes, Eur. J. Immunol. 34 (2004) 7–17.
[5] V. Brusic, J. Zeleznikow, N. Petrovsky, Molecular immunology
databases and data repositories, J. Immunol. Methods 238 (2000)
17–28.
[6] A.D. Baxevanis, The molecular biology database collection: 2002
update, Nucleic Acids Res. 30 (2002) 1–12.
[7] N. Petrovsky, V. Brusic, Computational immunology: the coming of
age, Immunol. Cell Biol. 80 (2002) 248–254.
[8] M.P. Lefranc, G. Lefranc, The Immunoglobulin Facts Book,
Academic Press, London, 2001.
[9] M.P. Lefranc, G. Lefranc, The T Cell Receptor Facts Book, Academic
Press, London, 2001.
[10] M.P. Lefranc, et al., IMGT, the International Immunogenetics
Information System, Nucleic Acids Res. 33 (2005) D593–D597
(database issue).
[11] V. Giudicelli, D. Chaume, M.P. Lefranc, IMGT/GENE-DB: a
comprehensive database for human and mouse immunoglobulin and
T cell receptor genes, Nucleic Acids Res. 33 (2005) D256–D261
(database issue).
[12] E.E. Eichler, Recent duplication, domain accretion and the dynamic
mutation of the human genome, Trends Genet. 17 (2001) 661–669.
[13] O. Elemento, O. Gascuel, M.-P. Lefranc, Reconstructing the duplica-
tion history of tandemly repeated genes, Mol. Biol. Evol. 19 (2002)
278–288.
[14] D. Leister, Tandem and segmental gene duplication and recombination
in the evolution of plant disease resistance genes, Trends Genet. 20
(2004) 116–122.
[15] T. Dandekar, B. Snel, M. Huynen, P. Bork, Conservation of gene
order: a fingerprint of proteins that physically interact, Trends
Biochem. Sci. 23 (1998) 324–328.
[16] T. Blumenthal, et al., A global analysis of Caenorhabditis elegans
operons, Nature 417 (2002) 851–854.
[17] M.J. Lercher, A.O. Urrutia, L.D. Hurst, Clustering of housekeeping
genes provides a unified model of gene order in the human genome,
Nat. Genet. 31 (2002) 180–183.
[18] J.M. Lee, E.L.L. Sonnhammer, Genomic gene clustering analysis of
pathways in eukaryotes, Genome Res. 13 (2003) 875–882.
[19] MHC Sequencing Consortium, Complete sequence and gene map
of a human major histocompatibility complex, Nature 401 (1999)
921–923.
[20] J. Trowsdale, The gentle art of gene arrangement: the meaning of gene
clusters, Genome Biol. 3 (2002) 2002.1–2002.5.
[21] J.A. Bailey, et al., Recent segmental duplications in the human
genome, Science 297 (2002) 1003–1007.
[22] C.B. Moore, et al., Evidence of HIV-1 adaptation to HLA-
restricted immune responses at a population level, Science 296
(2002) 1439–1443.
[23] J. Stephens, et al., Haplotype variation and linkage disequilibrium in
313 human genes, Science 293 (2001) 489–493.
[24] R. Medzhitov, C.A. Janeway, The Toll receptor family and microbial
recognition, Trends Microbiol. 8 (2000) 452–456.
[25] S.A. Hackley, F. Valle-Inclan, Which stages of processing are speeded
by a warning signal? Biol. Psychol. 64 (2003) 27–45.
[26] R.M. Pangborn, S.A. Witherly, F. Jones, Parotid and whole-mouth
secretion in response to viewing, handling, and sniffing food,
Perception 8 (1979) 339–346.
[27] I. Menashe, O. Man, D. Lancet, Y. Gilad, Different noses for different
people, Nat. Genet. 34 (2003) 143–144.
[28] T. Bersaglieri, et al., Genetic signature of strong recent positive
selection at the lactase gene, Am. J. Hum. Genet. 74 (2004) 1111–1120.
[29] P.C. Sabeti, et al., Detecting recent positive selection in the human
genome from haplotype structure, Nature 419 (2002) 832–837.
[30] M. Nordborg, S. Tavare, Linkage disequilibrium: what history has to
tell us, Trends Genet. 18 (2002) 83–90.
[31] J.M. Akey, G. Zhang, K. Zhang, L. Jin, M.D. Shriver, Interrogating a
high density SNP map for signatures of natural selection, Genome
Res. 12 (2002) 1805–1814.
[32] H.W.J. Schroeder, H.W.R. Schroeder, S.M. Sheikh, The complex
genetics of common variable immunodeficiency, J. Invest. Med. 52
(2004) 90–103.
[33] V. Lemahieu, J.M. Gastier, U. Francke, Novel mutations in the
Wiskott–Aldrich syndrome and their effects on transcriptional, trans-
lational, and clinical phenotypes, Hum. Mutat. 14 (1999) 54–66.
[34] M.P. Martin, et al., Epistatic interaction between KIR3DS1 and HLA-
B delays the progression to AIDS, Nat. Genet. 31 (2002) 429–434.
[35] O. Koch, et al., IFNGR1 gene promoter polymorphisms and
susceptibility to cerebral malaria, J. Infect. Dis. 185 (2002)
1684–1687.
[36] A. Balamurugan, S.K. Sharma, N.K. Mehra, Human leukocyte
antigen class I supertypes influence susceptibility and severity of
tuberculosis, J. Infect. Dis. 189 (2004) 805–811.
[37] J.-P. Hugot, et al., Association of NOD2 leucine-rich repeats
variants with susceptibility to Crohn’s disease, Nature 411 (2001)
599–603.
[38] Y. Ogura, et al., A frameshift mutation in NOD2 associated
with susceptibility to Crohn’s disease, Nature 411 (2001)
603–606.
[39] J.A. Todd, L.S. Wicker, Genetic protection from inflammatory disease
type 1 diabetes in human and animal models, Immunity 15 (2001)
387–395.
[40] D.A. Dyment, G.C. Ebers, A.D. Sadovnick, Genetics of multiple
sclerosis, Lancet Neurol. 3 (2004) 104–110.
[41] A. Carrie, et al., A new member of the IL-1 receptor family highly
expressed in hippocampus and involved in X-linked mental retarda-
tion, Nat. Genet. 23 (1999) 25–31.
[42] K. Ozaki, et al., Functional SNPs in the lymphotoxin-alpha gene that
are associated with susceptibility to myocardial infarction, Nat. Genet.
32 (2002) 650–654.
[43] K. Wenzel, et al., E-selectin polymorphism and atherosclerosis: an
association study, Hum. Mol. Genet. 3 (1994) 1935–1937.
[44] C.A. Janeway, P. Travers, M. Walport, M. Shlomchik, Immunobio-
logy, Garland, New York, 2001.
[45] E. Birney, et al., An overview of Ensembl, Genome Res. 14 (2004)
925–928.
[46] J.F. Wilson, et al., Population genetic structure of variable drug
response, Nat. Genet. 29 (2001) 265–269.
[47] J.A. Schneider, et al., DNA variability of human genes, Mech. Ageing
Dev. 124 (2003) 17–25.
[48] N. Risch, E. Burchard, E. Ziv, H. Tang, Categorization of humans in
biomedical research: genes, race and disease, Genome Biol. 3 (2002),
2007.
[49] N.A. Rosenberg, et al., Genetic structure of human populations,
Science 298 (2002) 2381–2385.