Upload
others
View
1
Download
0
Embed Size (px)
Citation preview
Chapter 1
Review of Literature
Review of Literature
1.1 Background All of us exhibit a wide range of phenotypic variation: morphological,
physiological, psychological, behavioral, susceptibility to diseases, as well as
response to drugs. The reason for this phenotypic diversity is thought to be due to
the existence of genetic variations, although gene-environment interactions are
increasingly becoming important. The genetic variations may be of several types,
such as single nucleotide polymorphisms (SNPs), insertion/deletion, block
substitutions, inversions, variable number of tandem repeat sequences (VNTRs),
microsatellites, and copy number variations (CNVs).
Recent astounding progress in sequencing technology has opened a way for
personal genomics, with six genome sequences already published in the last two
years: that of J. Craig Venter (Levy et al. 2007), James D. Watson (Wheeler et al.
2008), as well as four additional genomes from Han Chinese (Asian) (Wang et al.
2008), Nigerian (African) (Bentley et al. 2008), and two Korean (Ahn et al.
2009;Kim et al. 2009) individuals. The availability of these genomes has furthered
our understanding of the various forms of human genetic variations. While some
of these genetic variations have no proven phenotypic influences, others may have
significant consequences. For instance, they may increase our risk of developing a
disease or lower the likelihood of our response to a particular pharmaceutical
treatment. Comprehensive understanding of various forms of genetic variations
may uncover the genetic basis of human phenotypic differences.
SNPs (often pronounced as ‘snips’), the most common source of human genetic
variations, have been implicated in a number of human complex disorders, as well
as in inter-individual variability in drug response. However, the identification of
disease-associated SNPs from the large pool of SNPs is a daunting task, especially
keeping in mind the cost and labour involved in the existing methods, including
whole genome association studies (or WGAS).
Computational approaches, by prioritization of functional SNPs, may facilitate
experimental efforts to identify SNPs that impact biological processes. This thesis
deals with the development of such computational methods capable of identifying
1 A T GC
Review of Literature
functionally relevant coding regions, as well as SNPs likely to be deleterious, and
their applications in population genetics.
1.2 Single Nucleotide Polymorphisms
1.2.1 Introduction The genetic variations resulting in the substitution of one nucleotide for another in
a DNA sequence are called single-nucleotide polymorphisms, or SNPs (Figure
1.1). For a variation to be considered a SNP, it must occur in at least 1 per cent of
the population (Botstein, Risch 2003).
T
Individual 1 Individual 2
Figure 1.1 A single nucleotide difference in individuals 1 and 2, depicting SNP.
These DNA base pair polymorphisms are of two types: transitions, which involve
substitution of a purine by another purine (A G) or a pyrimidine by another
pyrimidine nucleotide (C T); and transversions, which involve substitution of a
purine by a pyrimidine or vice versa.
C G A
2 A T GC
Review of Literature
1.2.2 Classification of SNPs SNPs are classified based on their genomic location into coding and non-coding
SNPs. Non-coding SNPs may occur in the promoter region of the gene, within
introns, 5´- and 3´- untranslated regions, and intergenic regions (Figure 1.2).
Coding SNPs, which occur in the coding (exonic) region of genes, are of two
types; synonymous and non-synonymous. Synonymous SNPs change a codon
specifying an amino acid into another that codes for the same amino acid.
Therefore, there is no change in the amino acid sequence of the protein. Non-
synonymous SNPs are further sub-classified into missense and nonsense SNPs.
While missense SNPs change the codon specifying an amino acid to another
specifying a different amino acid, nonsense SNPs change the codon specifying an
amino acid to a stop codon.
Figure 1.2 Classification of SNPs according to their genomic location.
While majority of the SNPs reside in the intergenic region, genic SNPs comprise
~ 7 million SNPs. Table 1.1 shows the distribution of genic SNPs in the human
genome (dbSNP build 130; May 03, 2009).
3 A T GC
Review of Literature
Table 1.1 Distribution of genic SNPs in the human genome
Genomic Region SNP Count Gene Count
Intron 66,71,226 20,818
UTR 2,00,750 17,830
Missense 1,10,927 20,411
Synonymous 78,310 18,703
Nonsense 3,536 2,431
1.2.3 SNP databases Today, the primary resource of SNPs is dbSNP available at NCBI (Sherry et al.
2001), which currently contains more than 17 million SNPs (dbSNP build 130).
Information on disease-associated SNPs is available from several databases:
Human Gene Mutation Database (HGMD) (Stenson et al. 2009) collates known
gene mutations responsible for human inherited diseases; Genetic Association
Database (GAD) (Becker et al. 2004) is an archive of human genetic association
studies of complex diseases and disorders; Online Mendelian Inheritance in Man
(OMIM) database is a catalog of genetic disorders of inherited diseases mapped to
human genes, and highly penetrant, but rare (MAF < 0.01) mutations (Hamosh et
al. 2005;Rashbass 1995); Swiss-Prot classifies SNPs into disease (disease-
associated SNPs) and polymorphisms (benign SNPs) (Boeckmann et al. 2003);
Human Genome Variation Database (HGVbase) (Fredman et al. 2004) provides a
centralized compilation of summary level findings from genetic association
studies, thus facilitating research into DNA sequence variation and human
phenotypes.
Databases with frequency information of SNPs across different human
populations are also available. The ALelle FREquency Database (ALFRED)
(Cheung et al. 2000) is a database of allele frequencies of DNA variants for
multiple anthropologically defined populations. ALFRED has data on 18068
polymorphisms in 681 populations. Similarly, HapMap (International HapMap
Consortium 2003) characterizes over 3.1 million human SNPs genotyped in 270
individuals from four geographically diverse populations. IGVdb is another major
4 A T GC
Review of Literature
effort that has genotyped individuals from distinct Indian sub-populations (Indian
Genome Variation Consortium 2005;Indian Genome Variation Consortium 2008).
Various resources for visualisation of SNP locations and other genome
annotations have become available in the public domain. One such resource is the
UCSC Genome Browser (Kuhn et al. 2009), wherein a large array of annotations
has been assembled. The other primary genome resource is Ensembl (Birney et al.
2006;Hubbard et al. 2009). Within Ensembl, users can visualise variation in and
around genes, and their data annotations are of high quality and embedded into an
elegant interface. In addition, a number of (>750) locus specific databases
(LSDBs) are also available in the public domain (Appendix 1.1). Collectively,
these efforts may help researchers find genetic basis to phenotypic variability
among individuals.
1.2.4 SNPs as molecular markers The introduction of molecular markers in genetic analysis has transformed
research in genetic medicine. These molecular markers are the genetic variations
associated with phenotypic traits like predisposition to common diseases and
individual variations in drug responses. The most important property of a good
marker is that it should be sufficiently polymorphic such that a randomly chosen
individual will be heterozygous for it. This is quantitatively assessed by the “mean
heterozygosity” or “polymorphism information content” parameters assigned to
each marker (Botstein, Risch 2003). Moreover, the marker alleles should be easily
and inexpensively genotyped; and should be distributed throughout the genome.
Blood group variants, HLA variants, and electrophoretic variants of serum
proteins were used as molecular markers for human genetic analysis till 1970s.
However, all these had several limitations (Strachan, Read 1999). Since the
discovery of restriction fragment length polymorphisms (RFLPs) in the human
genome, variable DNA sequences were used almost exclusively as genetic
markers. But RFLPs suffer from the major handicap of limited polymorphism;
each has only two alleles, allowing a maximum probability of heterozygosity
equal to 0.5. More polymorphic markers such as variable number tandem repeats
5 A T GC
Review of Literature
(VNTRs) whose notable polymorphism is due to variability in the number of
tandem repeats of length < 100 nt, have difficulties in detection and uneven
distribution in the genome (Jeffreys et al. 1985). The polymorphism of
microsatellites is due to variations in the number of tandem repeats of short
sequence units typically ranging from two to four nucleotides in size. Many have
5-10 alleles and heterozygosity levels of 0.75 or greater. However, the mutation
rate of microsatellites is high, which is a concern for association and linkage
disequilibrium (LD) studies.
SNPs came into use in the twentieth century (Venter et al. 2001) (Sachidanandam
et al. 2001). The major advantage of SNPs is their abundance, theoretically
allowing detection of tighter linkages and associations. Millions of SNPs have
already been identified, corresponding to a frequency of about 1/300 bp. In
addition to frequency, SNPs have the benefit of being more stable and easily
amenable to automation for assessment in large scale experiments (Wang, Moult
2001);(Goddard et al. 2000). Moreover, the extraordinary abundance of SNPs
largely offsets the disadvantage of their being biallelic and makes them the most
attractive molecular marker system developed so far.
1.3 SNPs in Disease Susceptibility
1.3.1 Types of genetic diseases Genetic diseases, one of the major classes of human diseases, based on their
genetic etiology, have been broadly classified into monogenic and polygenic
(complex).
In monogenic diseases (also referred to as ‘Mendelian’ or ‘single-gene’ disorders),
mutations in a single gene are both necessary and adequate to produce the clinical
phenotype. For example, single amino acid substitutions of the human β-globin
gene (HBB) are responsible for β-thalassaemia, sickle cell anemia, and other
haemoglobinopathies, which are the most common genetic diseases of blood
(Doss, Sethumadhavan 2009). Sickle cell disease arises from a mutation
substituting thymine for adenine in the sixth codon of the beta-chain gene, GAG
to GTG, resulting in the substitution of glutamine by valine at position 6 of the Hb
6 A T GC
Review of Literature
beta chain.
On the other hand, common human diseases, such as obesity, diabetes,
cardiovascular disease, cancer and asthma, follow a more complicated inheritance
pattern, and are proving much harder to analyze. Difficulties are caused by
incomplete penetrance (a person carrying a predisposing allele may not exhibit the
disease phenotype); genetic heterogeneity (mutations in one or several genes may
result in identical phenotypes); and polygenic inheritance (a trait is controlled by
multiple gene interactions, such that each individual predisposing allele has a low
risk factor and shows weak correlation with the disease trait). In addition,
environmental factors may also play an important role in shaping disease
phenotypes (Yue et al. 2005). Such disorders are referred to as ‘complex’ or
‘multifactorial’ disorders.
The identification of genes contributing to the susceptibility and progression of
complex human diseases has become a major focus of genetics research in the
post-genomics landscape (Glazier et al. 2002).
1.3.2 Hypotheses regarding the role of genetic variants in
complex disorders Two contrasting hypotheses have been discussed in the past regarding the role of
genetic variants in complex disorders.
The ‘common disease-common variant’ (CD-CV) hypothesis posits the role of
common variants, with small to modest penetrance (Risch, Merikangas 1996), in
susceptibility to complex traits. The APOE*4 allele (Corbo, Scacchi 1999), which
confers increased susceptibility to Alzheimer disease (Saunders et al. 1993); the
CCR5∆32 allele, which prevents infection by HIV-1 (Dean et al. 1996) and the F5
1691 G -> A allele (also known as FV Leiden) in deep vein thrombosis (Bertina et
al. 1994) are examples of common variants causing a common phenotype in
human populations.
On the other hand, the ‘common disease-rare variant’ hypothesis proposes that a
7 A T GC
Review of Literature
significant proportion of the inherited susceptibility to common complex diseases
may be due to the summation of the effects of a series of low frequency variants
of a variety of different genes, each conferring a moderate but readily detectable
increase in relative risk (Bodmer 1999;Frayling et al. 1998). An important role for
rare variants in inherited multifactorial susceptibility to colorectal cancer has been
suggested by the effects of rare, highly penetrant variants in the APC gene
(Bodmer 1999;Frayling et al. 1998).
On the whole, concerted efforts are being made by various researchers to
determine the significance of common and rare variants in complex disorders.
1.4 SNPs in Pharmacogenomics There is wide variability in the response of individuals to standard doses of drug
therapy. This is an important problem in applied medicine, where it may lead to
therapeutic failures or adverse drug reactions (Figure 1.3).
Figure 1.3 Different patients respond to drugs differently.
Knowledge of human SNPs promises to uncover the cause of this inter-individual
8 A T GC
Review of Literature
variability in drug response. Pharmacogenomics is a science that explores the
ways by which variations in genes may be used to predict a patient’s response to a
particular drug (Roses 2004).
1.4.1 SNPs associated with variable drug response A number of variants have been shown to be associated with variable drug
response among individuals (Appendix 1.2). Mentioned below are a few examples
of SNPs shown to be associated with differential drug response.
• Inhaled corticosteroids (e.g., budesonide) are one of the most commonly
used therapeutic agents in treatment of asthma (Bruni et al. 2009;Humbert
et al. 2008). Gene encoding TBX21 (transcription factor T-box expressed
in T cells) has been implicated in asthma pathogenesis. C allele of a
missense SNP in this gene (rs2240017, His33Gln) has been shown to be
associated with significant improvement in airway responsiveness to
budesonide in asthmatics (Tantisira et al. 2004).
• The β(2)-adrenergic receptor (ADRB2) is the target for β (2)-agonist drugs
used for bronchodilation in asthma and other respiratory diseases. The
genotype at nucleotide position 46 in β(2)AR gene of asthmatic patients
has been shown to be significantly associated with his responder status to
salbutamol treatment (Bhatnagar et al. 2005).
• Inter-patient variability in blood pressure response to β-blocker
monotherapy is well-known. Two common β(1)-adrenergic receptor
polymorphisms (ADRB1), rs1801252 (A allele) and rs1801253 (C allele),
have been associated with good antihypertensive response to metoprolol
and carvedilol in patients with hypertension (Johnson et al. 2003).
• Catechol-o-methyl transferase (COMT) is a strong candidate for
therapeutic response to antipsychotic medication. G allele encoding valine
at position 158 in this gene (rs4680) was found to be over-represented in
poor responders of risperidone (Gupta et al. 2009).
• Therapy with statins lowers total and low-density lipoprotein (LDL)
cholesterol, and has proven to be highly effective for cardiovascular risk
reduction. However, there is wide variation in inter-individual response to
statin therapy. C allele of a missense SNP (rs20455, Trp719Arg) in
9 A T GC
Review of Literature
kinesin-like protein 6 (KIF6) has been shown to be associated with
improved response to statin drugs, including atorvastatin, pravastatin,
rosuvastatin and simsvastatin (Iakoubova et al. 2008).
• Inter-individual variability in pain relief by morphine has been shown to
be significantly (P<0.0001) associated with the SNP rs1799971 in OPRM1
gene (encoding µ-opioid receptor), which is the primary site of action for
morphine, with A allele showing better response to morphine (Campa et
al. 2008).
All these studies provide us a good insight into the role of SNPs in variable drug
response, and their plausible role in pharmacogenomics.
1.4.2 SNPs associated with adverse drug reactions Some individuals develop adverse effects to a particular drug, while others do not.
This inter-individual variability has been attributed to the presence of SNPs
(Appendix 1.3). Following are some of the examples indicating the role of SNPs
as predictors of adverse drug reactions:
• 5-Fluorouracil (5-FU) is a drug given for the treatment for some types of
cancer, including colon, breast, stomach, and esophagus cancer.
Dihydropyrimidine dehydrogenase (DPYD) plays an important role in the
metabolism of 5-FU. The incidence of chemotherapeutic toxicity (middle-
severe nausea and vomiting) has been found to be significantly higher in
gastric carcinoma and colon carcinoma patients with C allele of SNP
rs1801265, & G allele of SNP rs1801159 in DPYD (Zhang et al. 2007).
• A allele of SNP rs2075252, and T allele of SNP rs4668123 in LRP2 gene
(Low-density lipoprotein receptor-related protein 2) were found to be
associated with higher incidence of ototoxicity (hearing loss) in patients
treated with cisplatin, a platinum-based chemotherapy drug used to treat
various types of cancers (Riedemann et al. 2008).
• Cyclosporine, one of the immunosuppressive drugs usually given in renal
transplant cases, has been found to be associated with gum hyperplasia in
some patients. This toxicity has been shown to be associated with A allele
of SNP rs231775 in CTLA4 (Cytotoxic T-lymphocyte antigen 4) in these
10 A T GC
Review of Literature
cases (Kusztal et al. 2007).
• Methotrexate, an antifolate chemotherapeutic agent, is widely used, alone
or in combination with other drugs, in the treatment of a number of
hematologic malignancies (Robien et al. 2005;Stern, Raizer 2005;Sterba et
al. 2005;Colozza et al. 2006) as well as benign cancers (Grim et al.
2003;Choy et al. 2005), especially non-Hodgkin’s lymphomas (NHL).
However, its effectiveness is limited by toxicity against normal tissues,
particularly towards gastrointestinal epithelium, bone marrow and liver
(Gorlick, Bertino 1999;Ulrich et al. 2002). T allele of SNP rs1801133 in
MTHFR has been associated, and may explain, the toxicity and variable
outcome of methotrexate therapy in patients with NHL (Gemmati et al.
2007).
In view of the above observations, it seems reasonable to believe that SNPs might
play a crucial role in identifying inter-individual variability in drug response, as
well as decreasing the risk for unexpected toxicities.
1.5 SNPs in Population-Based Studies Human subpopulations are known to differ in the susceptibility to the diseases,
response to drugs, and also in the allele frequency distribution of SNPs. For
example, in the United States, asthma prevalence and mortality are the highest
among Puerto Ricans and the lowest among Mexicans (Choudhry et al. 2006).
Similarly, variability in response to salbutamol and albuterol has been observed
among asthmatics in Indian and Puerto Rican populations, respectively (Kukreti et
al. 2005;Choudhry et al. 2005). Thus, when the population under study consists of
a mixture of two or more subpopulations that have different allele frequencies
(population admixture/ stratification) and disease risks, associations between
genotype and outcome may be spurious. Hence, it becomes crucial to uncover the
heterogeneity, if any, existing in the population under study.
Two different views have been proposed relating the distribution of a SNP across
populations with disease-association. The CD-CV hypothesis, as previously
discussed in section 1.3.2, proposes that risk alleles for common complex diseases
11 A T GC
Review of Literature
are common (i.e. ≥ 5%). Thus, they are likely to be found in multiple human
populations, rather than being population specific (Lander 1996;Chakravarti
1999;Reich, Lander 2001;Pritchard, Cox 2002). However, Ioannidis et al, from a
meta-analysis of disease-association studies, proposed that the frequencies of
disease-associated alleles show large heterogeneity between races (Ioannidis et al.
2004). Thus the inquest of whether risk alleles discovered in one population
account for disease prevalence across all human populations, still remains
unanswered.
The International HapMap Project (International HapMap Consortium 2003), till
date, has genotyped over 3.1 million human SNPs in individuals from four
geographically diverse populations (Yoruba in Ibadan, Nigeria; Japanese in
Tokyo; Han Chinese in Beijing; and Utah residents with northern and western
European ancestry). This may help in understanding the patterns of common
genetic diversity in the human genome in order to accelerate the search for the
genetic basis of human diseases.
Moreover, the Indian Genome Variation (IGV) Consortium (Indian Genome
Variation Consortium 2005;Indian Genome Variation Consortium 2008) has
shown the existence of heterogeneity among 55 Indian populations, and also
identified clusters of sub-populations with substantial genetic homogeneity. This
may be instrumental in careful design and analysis of association studies across
Indian populations. As an example, a strong allelic/genotypic association of a
missense SNP (rs1042713; R16G) in the β2-adrenergic receptor (ADRB2) with
response to salbutamol in the Indian population has been observed (Kukreti et al.
2005). However, the SNP shows substantial heterogeneity across Indian
populations, with AA genotype frequencies ranging from 0.048 in AA-C-IP4 to
0.69 in DR-S-LP3 (Figure 1.4), which could be associated with the variable
response to salbutamol among Indian asthmatics (Kukreti et al. 2005).
12 A T GC
Review of Literature
Figure 1.4 Color composite map of genotype frequency of rs1042713. Red:
genotype frequency of AA (poor responder); blue: genotype
frequency of GG (good responder); green: genotype frequency of
AG (Indian Genome Variation Consortium 2008).
These studies provide a framework for designing future epidemiological studies to
identify populations with differential disease susceptibility and variable response
to a given drug or a class of drugs.
1.6 Functional Missense SNPs Missense SNPs, which lead to substitution of an amino acid by another amino
acid, are the most pertinent to human inherited diseases (Stenson et al. 2003).
According to the 2009 release of HGMD (Krawczak et al. 2000), non-
synonymous SNPs account for more than half of all genetic polymorphisms
known to cause inherited diseases (Table 1.2).
13 A T GC
Review of Literature
Table 1.2 Relative frequencies of types of mutations underlying disease
phenotypes.
Change Number % of total
Missense/nonsense 49806 56.4
Splicing 8548 9.7
Regulatory 1459 1.7
Small deletions 14063 15.9
Small insertions 5751 6.5
Small indels 1295 1.5
Repeat variations 267 0.3
Gross insertions/duplications 1053 1.2
Complex rearrangements 772 0.9
Gross deletions 5303 6.0
However, a large number of missense SNPs are ‘benign’ and have minimal impact
on the structure or function of protein; while some are ‘functional’, and may lead
to significant changes in protein properties. These functional missense SNPs may
exert their effect on human physiology through various mechanisms, including
modification of splice sites; inactivation of protein functional sites, such as
catalytic, ligand-binding and post-translational modification sites; alteration of
protein solubility and stability; or affecting the interactions of proteins - thereby
perturbing protein functions, such as, the kinetic parameters of enzymes, signal
transduction activities of transmembrane receptors, and architectural roles of
structural proteins (Rebbeck et al. 2004). For instance, a missense SNP
(Cys260Tyr) associated with hereditary hemochromatosis in HLA-H protein
disrupts a critical disulphide bond (Figure 1.5).
14 A T GC
Review of Literature
Figure 1.5 A missense SNP (C260Y) in HLA-H protein disrupts a critical
disulphide bond, thereby perturbing the structure of this protein.
1.7 Identification of Functional Missense SNPs :
Finding Needles in Haystack? As the amount of genomic information that is available greatly exceeds the
information about the function of variants, it becomes imperative to develop
methods that prioritize the genetic variants to be genotyped in genetic studies. The
increasingly large number of SNPs deposited in the public databases has provided
a platform to perform genome-wide computational analyses of these SNPs and
their relationship with inter-individual variation in susceptibility to complex
disorders and response to drugs.
The initial efforts to understand the patterns of SNPs in the coding regions of
genes were made by Cargill et al (Cargill et al. 1999) and Halushka et al
(Halushka et al. 1999). Since then, progress in comparative genomics, and
evidence that functional elements tend to lie in conserved regions (Carlton et al.
2006), have enabled, to some extent, the prediction of SNPs likely to be
deleterious for the structure or function of proteins, and may therefore lead to
disease. Over the last decade, several methods have utilized the sequence
conservation of a particular amino acid within a family of sequences to predict
whether an amino acid substitution affects protein function. SIFT (Sorts Intolerant
from Tolerant) (Ng, Henikoff 2001;Ng, Henikoff 2003), for example, presumes
15 A T GC
Review of Literature
that critical amino acids will be conserved in the protein family, and so changes at
well-conserved positions are predicted as deleterious. Such evolutionary
information, integrated with other structural and biochemical properties of
functional SNPs, may permit comprehensive prediction of functional SNPs.
The Crescendo method identifies residues that have a higher degree of
conservation than would be expected on the basis of the local structural
environment (Chelliah et al. 2004). Thereafter, known SNPs are mapped onto the
structure of the proteins, and based on the assumption that these additional
restraints may be due to functions mediated by interactions with other molecules,
their effect on function is predicted. For instance, if a residue is a known catalytic
residue or is close to a known binding site, substitutions of this residue are
predicted to affect function.
The LS-SNP database (Karchin et al. 2005) contains predictions of missense SNPs
using features of protein structure, sequence and evolution, specifically SNPs that
interfere with the formation of domain–domain interfaces or have an effect on
protein–ligand binding, based on machine learning techniques.
Predictions of the PolyPhen method are based on empirical rules based on the
sequence, phylogenetic and structural information characterizing the substitution
(Ramensky et al. 2002;Sunyaev et al. 2001).
Several methods have estimated the loss or gain in energy of the protein structure
due to single amino acid substitutions. The Site Directed Mutator (SDM) method
uses a set of conformationally constrained environment-specific substitution tables
(ESSTs) to calculate the difference in the stability scores, analogous to the
difference in free energy, for the folded and unfolded state for the wild-type and
mutant protein structures (Topham et al. 1997). The I-Mutant2.0 method
(Capriotti et al. 2005) uses both sequence and structural information in support
vector machine (SVM) learning to predict protein stability changes upon single
amino acid substitutions, as does the MUpro method (Cheng et al. 2006).
16 A T GC
Review of Literature
A widely used computational technique to predict functional residues, based upon
the evolutionary conservation of sequences, is the "evolutionary trace" (ET)
method (Innis et al. 2000;Lichtarge et al. 1996;Lichtarge, Sowa 2002). In this
method, each residue is ranked by evolutionary importance by comparison with
groups of proteins which originate from a common node in a phylogenetic tree.
The information obtained by the ET method can then be mapped on to known
protein structures, thus allowing us to identify clusters of important amino acids.
All these studies indicate that the need for identification of functional genetic
variants among a vast number of irrelevant ones in which they are immersed has
translated into a need for sophisticated tools to effectively prioritize genetic
variants underlying complex diseases.
1.8 Objectives of the Study In this thesis, we comprehensively study all human missense SNPs reported in
public databases for their functional and demographic significance in complex
disorders and pharmacogenetics. Specifically, following were the objectives of
this work:
1. To identify protein coding exons likely to harbor disease-associated missense
SNPs.
2. To develop an automated classification scheme capable of distinguishing
between functional and benign missense SNPs.
3. To prioritize SNPs for disease-association and pharmacogenetics in Indian
populations.
4. a) To develop a comprehensive database for the analysis of protein coding
exons.
b) To develop a web-server for the identification of functional missense SNPs.
17 A T GC