Chapter 3: Materials and Methods
Materials and Methods
A Genomic Study on the Sub-Structured Chaudhari Tribe of Southern Gujarat 62
MATERIALS AND METHODS
Once the objectives of the study have been identified, a carefully designed research
plan is needed to achieve them coherently and methodically. This chapter outlines,
step by step, the conception, planning and underlying reasoning behind the
methodological execution of the study, so that meaningful and reliable inferences
can be drawn.
3.1. Selection of the Study Population and Field Area
Prior to the pilot survey, information on various tribal populations of India was
collected from different sources such as books, journals and the internet. Population
sub-structuring is a distinctive feature of Indian populations, yet few populations
have been investigated for patterns of micro-differentiation among their subdivisions
(where each subdivision constitutes a Mendelian unit). The search was therefore
directed at identifying populations bearing this characteristic. The literature review
showed that the Chaudhari tribe of Gujarat (Bhatt, 1985) fulfilled this criterion.
Moreover, given the dearth of molecular genetic work on the tribal populations of
Gujarat, the Chaudhari tribe of South Gujarat appeared to be an apt choice. The
Chaudhari tribe constitutes 3.80% of the total Scheduled Tribe population of Gujarat
(Census of India, 2001) and, as per the 2001 census data, is mainly restricted to the
Surat district, which was therefore selected as the study area. However, the talukas
for the collection of blood samples and ethnographic information were finalized after
the pilot survey.
3.2. Pilot Survey
A pilot study, also called a feasibility study, is a small-scale survey designed to test
logistics, gather preliminary information and establish rapport with the people prior
to a larger, full-fledged study, in order to improve the latter’s quality and efficiency. A
pilot survey can reveal loopholes in the proposed research design that fail to meet
actual field conditions, thereby giving the researcher the opportunity to modify the
methodology accordingly and to achieve successful fieldwork.
A pilot study may address a number of logistic issues. As part of the research strategy,
the following information can be gathered prior to the main study.
1. Distribution of study populations
2. Adequacy of sampling frame
3. Climatic conditions of the area
4. Probable estimate of the expenditure and duration of subsequent field work
5. Rapport establishment
6. The response rate and cooperation from the locals including study populations and
officials
7. The suitability and adequacy of the methodology to be followed.
A pilot study is normally small in comparison with the main study and can therefore
provide only limited information on the source and magnitude of the problems that one
might face during the main field work. However, it provides vital information on the
efficiency of the proposed procedure and leads to improvements in the research design
prior to the main survey.
For the present work, a pilot survey was undertaken from 5th September to 22nd
September, 2008 to get an overview of the population distribution, to check field
conditions and to assess the feasibility of research and data collection in the area.
During the pilot survey, sincere attempts were made to collect information on primary
health centers (PHCs) and hospitals, tribal community centers, educational institutes,
tribal welfare centers, missionaries and non-governmental organizations (NGOs),
museums and science centers. Several government offices were also visited to apprise
the officials of the purpose of the visit and to seek the help required during data
collection. The officers contacted were the Collector, the District Development Officer,
the Chief District Health Officer, the Tribal Development Officer, Taluka Development
Officers and the Principals of various schools and colleges.
Valsad Raktdan Kendra, a Voluntary Blood Bank and Haematology Research Centre
working in the health sector in the area, was also approached for help. The blood bank
has all the facilities for the collection, storage and processing of blood, as well as
laboratory facilities for carrying out biochemical analysis of blood.
Consent and permission for the help needed to collect blood samples in the subsequent
field trips were obtained from Dr. Y. Italia, Honorary Secretary of the blood bank and
also the Honorary Director of the Sickle Cell Anemia Control Program in Gujarat.
Dr. Y. Italia, the laboratory in-charge Mr. Bhavesh and other laboratory members were
cooperative and extended their full support to the proposed research.
A few people were also interviewed during the pilot survey to inquire about the existence
of population sub-structuring among the Chaudhari tribe of Surat district. Some
knowledgeable people from the area were contacted and informed about the research
work. Prominent among them were Dr. Bharat Desai, Girish Bhai Chaudhari,
Ashok Bhai Chaudhari, Manoj Bhai Chaudhari and Dr. Arvind Bhatt.
Dr. Arvind Bhatt is a retired professor of Anthropology from Gujarat Vidyapith,
Ahmedabad. Dr. Bhatt has worked on the Chaudhari tribe of Surat and found that
population subdivision exists as a result of the community members' strict adherence
to endogamy. Girish Bhai, Ashok Bhai and Manoj Bhai Chaudhari belong to the
Chaudhari community and are involved in the social welfare of tribal communities in
Gujarat. They also confirmed the existence of divisions within the Chaudhari tribe.
Dr. Bharat Desai, an eminent sociologist and a renowned social worker, also provided
some valuable literature supporting the presence of population sub-structuring among
the Chaudhari tribe of Surat district, Gujarat.
Secondary information on demographic aspects such as population distribution and
population size was also collected during the pilot survey. Census data for the Surat
district were collected to understand the distribution and preponderance of the
Chaudhari tribal population in the region. Ethnographic information on culture, food
habits and social practices, especially the mating system, was collected from various
libraries, museums and local people.
From the pilot survey it was gathered that the Chaudhari population is widespread
throughout the Surat district especially in its Mandvi, Vyara, Umarwada and Mahuva
talukas. It was also established that the Chaudhari tribe has four major subdivisions
namely Pavagadhi Chaudhari, Mota Chaudhari, Valvi Chaudhari and Nana Chaudhari.
Since the pilot survey showed that the Chaudhari tribe is subdivided into four major
subgroups, the present study was designed to apportion genetic diversity and
differentiation among the sub-structured groups of the Chaudhari. This can also be
seen as a model of population structure in which the subgroups are nested within
the total Chaudhari population.
3.3. Collection of the Data
After the initial pilot survey and rapport establishment, the field area was revisited in
the subsequent years to collect both ethnographic details and blood samples from
people of the Chaudhari tribe.
3.3.1. Blood Sample Collection
3.3.1.1. Ethical Clearance
Ethical clearance was obtained from the Ethical Review Committee of the Department
of Anthropology, University of Delhi. The certificate of clearance issued by the
committee is attached as Appendix I.
3.3.1.2. Sampling Procedure
Blood samples were collected from individuals belonging to the four Chaudhari groups
who were unrelated up to at least the second cousin level. Three-generation pedigree
charts were prepared to ascertain unrelatedness among all the samples and to reduce
the chances of any bias in sampling. Prior to blood collection, the purpose of the study
and the procedure involved were explained in detail to the participants, both in groups
and individually. Written informed consent was obtained before collection, and the
blood was collected by a trained medical practitioner. The consent form used for the
study is attached as Appendix II. It was made as informative as possible for the
subject. Sterilized disposable syringes were used for the collection of blood samples
and were properly disposed of after a single use with the help of a syringe crusher.
9 ml vacutainers coated with the anticoagulant ethylenediaminetetraacetic acid (EDTA)
were used to collect 5 ml of blood. The serial number, date, and the person’s name, age
and sex were neatly written on each tube to avoid mixing of samples.
3.3.1.3. Field Visits for Data Collection
Once rapport was established, the collection of data was initiated. Both primary and
secondary ethnographic data were collected during all the field visits to meet the
objectives of the study. All necessary precautions were taken while collecting the
blood samples. 5 ml venous blood samples were collected by a trained medical
practitioner from randomly chosen individuals, unrelated up to the second cousin level,
from the study populations with prior informed written consent.
• First Phase of Field Work (10th February to 10th March, 2009)
During the first field work, 27 samples from the Nana Chaudhari and 22 samples from the
Mota Chaudhari divisions were collected with prior informed written consent. The
collected blood samples were sent to the Biochemical and Molecular Genetics Laboratory
of the Department of Anthropology, University of Delhi for DNA extraction. Along with
the samples, secondary information on social aspects such as mating pattern, cultural
practices, food habits and other rituals was also collected.
• Second Phase of Field Work (21st June to 11th July, 2009)
The second field trip was conducted from 21st June to 11th July, 2009. During this
period attempts were made to achieve the targeted sample size (50 each) for both Nana
and Mota Chaudhari. Apart from random blood sampling, special emphasis was placed on
collecting an ethnographic account of the tribe. By the end of the second field work, 52
Nana Chaudhari and 50 Mota Chaudhari samples had been collected. This time, DNA
isolation was undertaken in the sickle cell unit of Valsad Raktdan Kendra, Gujarat.
• Third Phase of Field Work (6th April to 27th April, 2010)
The third field work was undertaken from 6th to 27th April, 2010. During this field
trip, the Pavagadh region of Panchmahal district was visited to inquire about the
migration of the Pavagadhi Chaudhari. Going by the information obtained, Mahuva taluka
of Surat district was visited to collect blood samples from the Pavagadhi Chaudhari. By
the end of the field work, 41 samples had been collected.
• Fourth Phase of Field Work (8th January to 25th January, 2011)
The fourth field work was undertaken from 8th to 25th January, 2011. During this field
work, 50 blood samples were collected from the Valvi Chaudhari along with their
ethnographic details.
3.3.1.4. Sample Size
A total of 193 samples were collected for the present study: 50 blood samples from
Mota Chaudhari, 52 from Nana Chaudhari, 41 from Pavagadhi Chaudhari and 50 from
Valvi Chaudhari. The samples were collected randomly from 29 villages of 3 talukas,
namely Mahuva, Mandvi and Umarwada, of Surat district, Gujarat, and from 4 villages of
Vyara taluka of Tapi district. Table 3.1 presents the distribution of samples according
to the area of sample collection.
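The totals reported above can be cross-checked with a short tally. The sketch below is illustrative only: the variable names are hypothetical, and the counts are simply those stated in this section.

```python
# Sample counts per Chaudhari subgroup, as reported in Section 3.3.1.4.
samples_per_group = {
    "Mota Chaudhari": 50,
    "Nana Chaudhari": 52,
    "Pavagadhi Chaudhari": 41,
    "Valvi Chaudhari": 50,
}

# Pooled sample size across the four subgroups.
total_samples = sum(samples_per_group.values())

# Share of each subgroup in the pooled sample, as a percentage.
group_share = {
    group: round(100 * count / total_samples, 1)
    for group, count in samples_per_group.items()
}

if __name__ == "__main__":
    print(f"Total samples: {total_samples}")
    for group, share in group_share.items():
        print(f"{group}: {samples_per_group[group]} ({share}%)")
```

The four subgroups contribute roughly a quarter of the pooled sample each, with the Pavagadhi Chaudhari slightly under-represented.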
3.3.2. Ethnographic Details
Deliberate sampling, also called purposive sampling, was used to gather ethnographic
facts from the population under study. Besides basic information such as the name,
gender, age, current residence and ethnicity of the subject, information on Chaudhari
subgroup affiliation was also collected. Neither the Census of India, the State
Gazetteers nor any other local enumeration project has ever collected subdivision-based
data on the Chaudhari population; these surveys have always treated the Chaudhari
population as a single unit. Except for a few studies such as Shah (1964) and Bhatt
(1985), no written record of the existence of internal population subdivisions and
their adherence to group endogamy is available. Along with group affiliation,
information was also collected on mating pattern and other social practices. The
detailed schedule used for the study is attached as Appendix III.
Table 3.1. Distribution of samples according to the area of sample collection

Population            Taluka     District   Villages
Pavagadhi Chaudhari   Mahuva     Surat      Dungari, Naladhara, Karchiliya, Mahuva
Nana Chaudhari        Mandvi     Surat      Muritha, Kharoli, Satvaov, Kamal Kua, Gantholi, Gordha, Andhatari, Damodia, Utewa, Simbariamba, Ghasiyameda, Naren
Nana Chaudhari        Vyara      Tapi       Ghotadav, Gordha, Kada
Mota Chaudhari        Umarwada   Surat      Bilwan
Mota Chaudhari        Mandvi     Surat      Kimdungara, Luharwad, Isher, Balathi, Peeperwad, Regama, Kamalkua, Devgadh, Amba padi, Vadh, Ladkua, Gharbadar
Valvi Chaudhari       Vyara      Tapi       Nanicher
3.4. Laboratory Analysis
The collected blood samples were subjected to molecular analysis. Three different sets
of markers, namely autosomal, mitochondrial and Y-chromosomal, were screened. The
autosomal markers were analyzed at the Biochemical and Molecular Genetics Laboratory
of the Department of Anthropology, University of Delhi, whereas the mtDNA and
Y-chromosomal markers were analyzed at the Anthropological Survey of India, Southern
Regional Centre, Mysore, Karnataka. The techniques used to assess variation at the
above-mentioned genomic sites and the work flow involved are given in Figure 3.1.
Figure 3.1. General overview of laboratory work
3.4.1. Extraction of Genomic DNA
The first step in the molecular analysis of the collected blood samples was the
extraction of DNA. DNA was extracted using the salting-out method (Miller et al., 1988),
a rapid, safe and inexpensive method that gives high yields of good quality DNA. The
detailed procedure followed for the extraction of DNA is given in Appendix IV.
3.4.2. Quantification of DNA
Following extraction, DNA was quantified using a NanoDrop ND-1000 instrument. It works
on the same principle as a conventional spectrophotometer but requires a much smaller
quantity of DNA sample to measure nucleic acid concentration. The NanoDrop can measure
the concentration and purity of nucleic acid samples up to 3700 ng/µl using 2 µl of
sample. In the present study the DNA concentration of the samples was found to be
within the range of 250-350 ng/µl.
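Spectrophotometric quantification rests on absorbance at 260 nm. As a rough sketch, and assuming the widely used conversion of about 50 ng/µl of double-stranded DNA per absorbance unit at a 10 mm path length, concentration and the A260/A280 purity ratio could be computed as follows. The function names and any example readings are hypothetical and are not values from this study.

```python
# Conversion factor for double-stranded DNA: an A260 of 1.0 (10 mm path)
# corresponds to roughly 50 ng/µl. This is a standard approximation, not a
# value taken from the thesis.
DSDNA_NG_PER_UL_PER_A260 = 50.0


def dna_concentration(a260: float, dilution_factor: float = 1.0) -> float:
    """Return the estimated dsDNA concentration in ng/µl."""
    if a260 < 0 or dilution_factor <= 0:
        raise ValueError("absorbance must be >= 0 and dilution factor > 0")
    return a260 * DSDNA_NG_PER_UL_PER_A260 * dilution_factor


def purity_ratio(a260: float, a280: float) -> float:
    """Return the A260/A280 ratio; values near 1.8 are usually read as pure DNA."""
    if a280 <= 0:
        raise ValueError("A280 must be positive")
    return a260 / a280
```

Under this assumption, a normalised A260 reading of 6.0 would correspond to about 300 ng/µl, which falls inside the 250-350 ng/µl range reported above.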
3.4.3. Polymerase Chain Reaction (PCR)
The targeted genomic regions were amplified using the PCR technique. PCR is an in
vitro method for the enzymatic synthesis of a specific DNA sequence using
oligonucleotide primers that hybridize to opposite strands at the regions flanking the
target DNA sequence. It carries out the exponential amplification of the target DNA
sequence through repeated cycles of DNA synthesis. Each newly synthesized molecule
of target DNA acts as a template for the synthesis of new target molecules in the next
cycle. The theoretical number of copies of amplicons (amplified target DNA) generated
from each template molecule is $2^n$, where n = number of cycles. The detailed
technique of Polymerase Chain Reaction is given in Appendix V.
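The doubling relationship can be illustrated numerically. The sketch below assumes an idealised reaction; the `efficiency` parameter (1.0 meaning perfect doubling) and the function name are hypothetical additions, since real reactions plateau and rarely reach the theoretical yield.

```python
def amplicon_copies(initial_copies: int, cycles: int, efficiency: float = 1.0) -> float:
    """Theoretical copy number after `cycles` PCR cycles.

    With efficiency = 1.0 every template is copied in each cycle, giving
    initial_copies * 2**cycles. Lower efficiencies scale the per-cycle
    growth factor down to (1 + efficiency).
    """
    if cycles < 0 or not 0.0 <= efficiency <= 1.0:
        raise ValueError("cycles must be >= 0 and efficiency within [0, 1]")
    return initial_copies * (1.0 + efficiency) ** cycles
```

For a single template molecule and 30 cycles at perfect efficiency this gives 2^30, a little over one billion copies.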
3.4.4. Agarose Gel Electrophoresis
Prior to any further analysis, it was confirmed that amplification had taken place and
that the amplicons generated were indeed the required ones. This was done by comparing
the molecular size of the amplified DNA with a standard DNA ladder using
electrophoresis and visualizing the results under a UV transilluminator (Figure 3.2).
The concentration of the gel was chosen according to the size of the PCR product. In
the case of the Alu markers, the genotypes were scored at this stage, whereas for
RFLPs the PCR products were first digested with restriction enzymes and then
subjected to gel electrophoresis.
Figure 3.2. Agarose gel showing amplified PCR products
3.4.5. Restriction Digestion
In the case of Restriction Fragment Length Polymorphism (RFLP) markers, the amplified
DNA was subjected to restriction digestion. Restriction digestion is the process in
which a class of enzymes known as restriction enzymes (REs) is used to cut DNA into
smaller restriction fragments. These enzymes recognize specific sequences (4-6 base
pairs long) in the DNA molecule known as restriction sequences (i.e. the sites where
the enzyme actually cuts the DNA molecule). Because the presence or absence of the
recognition sequence varies between alleles, digestion of the PCR product yields
fragments of different lengths, which are identified by their distinct banding
patterns. Each enzyme has its own optimal temperature of activity.
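The logic of RFLP genotyping can be sketched as an in-silico digest. The example below uses the PvuII recognition sequence CAGCTG, which the enzyme cuts between CAG and CTG; because the site is palindromic, scanning one strand is sufficient. The toy amplicon sequences, the function names and the genotype labels are hypothetical and do not represent the loci or fragment sizes of this study.

```python
def digest(sequence: str, site: str, cut_offset: int) -> list[int]:
    """Return fragment lengths after cutting `sequence` at every `site`.

    `cut_offset` is the number of bases of the recognition site that stay on
    the left-hand fragment (3 for PvuII, which cuts CAG^CTG).
    """
    sequence = sequence.upper()
    cut_positions = []
    start = sequence.find(site)
    while start != -1:
        cut_positions.append(start + cut_offset)
        start = sequence.find(site, start + 1)
    bounds = [0] + cut_positions + [len(sequence)]
    return [right - left for left, right in zip(bounds, bounds[1:])]


def call_genotype(band_sizes: set[int], uncut: int, cut_fragments: set[int]) -> str:
    """Infer a biallelic RFLP genotype from the band sizes seen on a gel."""
    has_uncut = uncut in band_sizes
    has_cut = cut_fragments <= band_sizes  # all digested fragments are present
    if has_uncut and has_cut:
        return "heterozygous (+/-)"
    if has_cut:
        return "homozygous site present (+/+)"
    if has_uncut:
        return "homozygous site absent (-/-)"
    return "undetermined"


# Toy 38 bp amplicons: one carries the PvuII site, the other a C->T change in it.
ALLELE_WITH_SITE = "ATGC" * 5 + "CAGCTG" + "TTAA" * 3
ALLELE_WITHOUT_SITE = "ATGC" * 5 + "CAGTTG" + "TTAA" * 3
```

Digesting `ALLELE_WITH_SITE` yields two fragments, whereas `ALLELE_WITHOUT_SITE` stays intact; a heterozygote would show all three bands on the gel.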
3.4.6. Genotyping by Agarose Gel Electrophoresis
The genotypes of the digested target regions were determined using agarose gel
electrophoresis. Gel electrophoresis refers to the technique in which macromolecules,
either nucleic acids or proteins, are driven through a gel by an electric current. The
macromolecules are separated on the basis of size and electric charge under the
influence of the electric field. Their rate of migration depends on the strength of the
field, the size and shape of the molecules, the relative hydrophobicity of the samples,
and the ionic strength and temperature of the buffer in which the molecules are moving.
Agarose is a natural colloid extracted from seaweed. Agarose gels are very fragile and
are easily damaged by handling. They have a very large "pore" size and are used
primarily to separate large molecules of up to 10,000 base pairs. The higher the
concentration of the gel, the lower its porosity; high-concentration gels therefore
suit the separation of low molecular weight fragments, and low-concentration gels that
of larger fragments. Depending upon the size of the digested product, agarose gels of
various concentrations were prepared by suspending dry agarose in aqueous
Tris-acetate-EDTA (TAE) buffer (composition given in Appendix VI) and then boiling the
mixture until a homogeneous solution was formed. Ethidium bromide added to the gel
fluoresces under UV light, permitting visualization of the DNA bands, which are spread
across the gel according to their size and are read against a commercially available
ladder containing DNA fragments of known size.
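Band sizes are read against the ladder. One common approximation (an assumption here, not a procedure described in this chapter) is that migration distance is roughly linear in the logarithm of fragment length over the resolving range of the gel, so an unknown band can be sized by interpolating between the two ladder bands that bracket it. The function name, distances and ladder values are hypothetical.

```python
import math


def estimate_size_bp(distance_mm: float, ladder: list[tuple[float, int]]) -> float:
    """Estimate fragment size by log-linear interpolation against a ladder.

    `ladder` holds (migration distance in mm, size in bp) pairs with distinct
    distances; larger fragments migrate shorter distances.
    """
    points = sorted(ladder)  # ascending distance, hence descending size
    for (d1, s1), (d2, s2) in zip(points, points[1:]):
        if d1 <= distance_mm <= d2:
            fraction = (distance_mm - d1) / (d2 - d1)
            log_size = math.log10(s1) + fraction * (math.log10(s2) - math.log10(s1))
            return 10 ** log_size
    raise ValueError("band lies outside the range covered by the ladder")
```

With a two-band ladder of 1000 bp at 10 mm and 100 bp at 20 mm, a band at 15 mm would be estimated at about 316 bp, the geometric midpoint of the two sizes.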
3.4.7. DNA Sequencing
Various methods are available for DNA sequencing, such as chemical degradation, the
chain termination method and sequencing by ligation. Advances in automation have
opened the way to faster and more reliable automated DNA sequencing technologies.
Owing to its greater efficiency and speed, dye-terminator sequencing is now the
mainstay of automated sequencing. Dye-terminator sequencing is a slight modification
of Sanger’s chain termination method. It relies on labeling the chain-terminating
ddNTPs, which permits sequencing in a single reaction. Each of the four
dideoxynucleotide chain terminators is labeled with a different fluorescent dye, each
emitting fluorescence at a different wavelength. The dye-labeled DNA fragments are
then separated by capillary electrophoresis, and a detection system identifies the
labeled bases as they pass a laser that excites the dye.
3.4.7.1. Cycle Sequencing
Cycle sequencing is similar to PCR, with two differences: only one primer is used in
each cycle sequencing reaction, so that the amplification of product is linear rather
than exponential; and dideoxynucleotides are added, which interrupt the extension of
the DNA strands when incorporated.
Following the usual PCR and gel electrophoresis, 0.5 µl of the generated PCR amplicons
was subjected to a cycle sequencing reaction with one primer and fluorescent
dye-labeled ddNTPs using the ABI Prism® BigDye™ Terminator v3.1 Cycle Sequencing Ready
Reaction Kit (Applied Biosystems, USA) following the manufacturer’s guidelines.
3.4.7.2. Sequencing Cleanup
One of the most important factors in automated DNA sequencing is clean and pure
templates. For a sequencing reaction to be successful all excess primers, dNTPs, salts
and residual DNA must be removed from the sample. The detailed procedure for
processing the plates prior to sequencing is given in Appendix VII.
3.4.7.3. Sequencing Run
10 µl of Hi-Di formamide was added to each well of the sample plate. The samples were
heated to 96°C and immediately cooled to 4°C to denature the DNA. Sample information
sheets containing the analysis protocol along with the sample details were prepared and
imported into the data collection software. Prepared samples were analyzed on an ABI 3730
genetic analyzer (Applied Biosystems, USA) to generate DNA sequences.
3.4.7.4. Sequence Quality Check
After completion of the sequencing reaction, quality of the generated sequences was
checked by using Sequencing Analysis version 5.2 software (Applied Biosystems,
USA). The Applied Biosystems Sequencing Analysis Software version 5.2 is designed
to analyze, display, edit, save, and print sample files generated using ABI genetic
analyzers. The program has a basecaller algorithm that performs basecalling for pure
and mixed base calls. It provides quality values (QV) for every single base and sample
scores for the assessment of the average quality value of the bases in the clear range
sequence for the sample. The QV is a per-base estimate of the basecaller accuracy. The
QVs are calibrated on a scale corresponding to:
$QV = -10 \log_{10}(P_e)$
where $P_e$ is the probability of error. For this study, QVs for high-quality pure
bases were set to 20 to 50 and QVs for typical high-quality mixed bases were set to
10 to 50. The samples which did not meet these conditions were re-sequenced after
fresh PCR amplification.
3.4.7.5. Sequence Alignment, Editing and Recording of Variants
The generated sequences were aligned to their respective reference sequences with the
use of SeqScape version 2.5 software (Applied Biosystems, USA). The DNA sequences
of high quality, characterized by sharp peaks and little to no background noise, were
matched against the reference sequence by carefully observing the electropherogram
(Figure 3.3). The electropherogram is a plot of results where each base shows a distinct
peak colour. The sequences having a different base and peak than the reference
sequence at a given position were considered to have a variant at that position. For
mtDNA, variants were reported in terms of the mutations of the revised Cambridge
Reference Sequence (rCRS) (Andrews et al., 1999). Likewise, variants were identified
for diverse Y chromosomal sites by comparing the generated sequence with a reference
sequence obtained from the NCBI GenBank database.
Figure 3.3. Electropherogram showing distinct peak colours for each base
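The quality scale of Section 3.4.7.4 and the comparison against a reference sequence can be sketched together. The code below is a simplified illustration and not the algorithm used by Sequencing Analysis or SeqScape: it assumes the read is already aligned to the reference without gaps, applies the pure-base lower threshold of QV 20 mentioned in the text, and uses hypothetical function names and toy inputs.

```python
import math


def qv_to_error_probability(qv: float) -> float:
    """Invert QV = -10 * log10(Pe) to get the probability of a wrong base call."""
    return 10 ** (-qv / 10)


def error_probability_to_qv(pe: float) -> float:
    """Apply QV = -10 * log10(Pe)."""
    if not 0 < pe <= 1:
        raise ValueError("Pe must lie in (0, 1]")
    return -10 * math.log10(pe)


def call_variants(reference, read, qualities, min_qv=20, offset=1):
    """List substitutions between an aligned read and its reference.

    Returns (variants, low_quality_positions). Variants are written as
    <reference base><position><observed base>, e.g. "G3A". Positions whose
    QV is below `min_qv` are not called but are flagged for re-sequencing.
    """
    if not (len(reference) == len(read) == len(qualities)):
        raise ValueError("reference, read and qualities must have equal length")
    variants, low_quality = [], []
    for index, (ref_base, obs_base, qv) in enumerate(zip(reference, read, qualities)):
        position = index + offset  # `offset` maps index 0 to a reference coordinate
        if qv < min_qv:
            low_quality.append(position)
        elif ref_base != obs_base:
            variants.append(f"{ref_base}{position}{obs_base}")
    return variants, low_quality
```

On this scale a QV of 20 corresponds to a 1% chance of a miscalled base and a QV of 50 to a 0.001% chance, which is why bases below the lower threshold are flagged rather than reported as variants.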
3.5. Genomic Regions Analyzed
3.5.1. Autosomal Markers
14 autosomal markers, which included seven Alu Insertion-Deletion (InDel) markers
and seven Restriction Fragment Length Polymorphisms (RFLPs), were screened. The
details pertaining to the technique of PCR such as PCR components, concentrations and
temperature conditions for autosomal markers are given in Appendix VIII and IX.
3.5.1.1. Alu Insertion-Deletion (InDel) Markers
All the Alu InDels considered in the present study were human-specific, biallelic,
codominant loci. Except for CD4, all the Alu loci were insertion polymorphisms. The
protocols for amplification of the Alu InDel markers were taken from Majumder et
al. (1999).
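Because these loci are biallelic and codominant, all three genotypes can be read directly from the gel, and allele frequencies can be obtained by direct gene counting. The following sketch shows that calculation together with the expected heterozygosity under Hardy-Weinberg proportions; the function names and any genotype counts passed to them are hypothetical and are not data from this study.

```python
def allele_frequencies(n_ins_ins: int, n_ins_del: int, n_del_del: int) -> tuple[float, float]:
    """Return (insertion, deletion) allele frequencies by gene counting.

    Each insertion homozygote contributes two insertion alleles and each
    heterozygote contributes one, out of 2N alleles in total.
    """
    n_individuals = n_ins_ins + n_ins_del + n_del_del
    if n_individuals == 0:
        raise ValueError("at least one genotype is required")
    p_insertion = (2 * n_ins_ins + n_ins_del) / (2 * n_individuals)
    return p_insertion, 1.0 - p_insertion


def expected_heterozygosity(p: float) -> float:
    """Expected heterozygosity 2pq for a biallelic locus."""
    return 2 * p * (1 - p)
```

For example, 10 insertion homozygotes, 20 heterozygotes and 20 deletion homozygotes would give an insertion allele frequency of 0.4 and an expected heterozygosity of 0.48.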
3.5.1.1.1. Predicted Variant 92 (PV92)
The PV92 Alu insertion belongs to the Alu Ya5 subfamily, part of Alu Y, the youngest
subfamily of Alu insertions. It is present on chromosome 16.
3.5.1.1.2. Coagulation Factor XIII B (FXIIIB)
This Alu insertion polymorphism is located in intron 10 of FXIIIB gene on chromosome
1 at position 1q31-q32.1. This gene encodes coagulation factor XIII B subunit. Factor
XIII deficiency can result in a lifelong bleeding tendency, defective wound healing, and
habitual abortion.
3.5.1.1.3. D1
D1 belongs to Sb2 Alu family. This Alu insertion polymorphism is located on
chromosome 3 at 3q26.32.
3.5.1.1.4. Apolipoprotein (APO)
This site is adjacent to the APOA1, APOC3 and APOA4 gene cluster and is located at
11q23.3. It is present 4.3 kb upstream of the APOA1 gene.
3.5.1.1.5. Angiotensin I Converting Enzyme (ACE)
InDel ACE is located in intron 16 of the gene Angiotensin I Converting Enzyme (ACE)
positioned on chromosome 17q23 (Mattei et al., 1989). The gene is also known as
Dipeptidyl carboxypeptidase I (DCPI). The gene plays an important role in the
regulation of blood pressure via the renin-angiotensin-aldosterone system (RAAS).
3.5.1.1.6. Cluster of Differentiation 4 (CD4)
The deletion allele ends 1.665 kb upstream of the CD4 start codon. The Cluster of
Differentiation 4 (CD4) gene encodes the T4/leu3 antigen, expressed on the surface of
helper T lymphocytes. The gene is located on chromosome 12, is composed of 10
exons and spans at least 33 kb (Maddon et al., 1987). CD4 also serves as a receptor for
human immunodeficiency virus (HIV) (Lyerly et al., 1987).
3.5.1.1.7. Plasminogen Activator Tissue (PLAT)
This Alu polymorphism, belonging to the PV subfamily, is present in intron 8 of the
Plasminogen Activator, Tissue (PLAT) gene. The gene is also known as Tissue-type
Plasminogen Activator (TPA). It is located on chromosome 8 at 8p11.2, spans 36,594
bp and comprises 14 exons (Degen et al., 1986). In the adult brain, t-PA is highly
expressed in the hippocampus, amygdala, cerebellum and hypothalamus, regions that
regulate biological functions such as learning and memory, emotions, motor
coordination and endocrine function, among others. It also plays a role in neuronal
degeneration (Alzheimer's disease) and seizure (Tsirka et al., 1995).
3.5.1.2. Restriction Fragment Length Polymorphisms (RFLPs)
Seven unlinked RFLPs were considered in the present investigation. Their expected
band sizes and protocols for amplification were taken from the Eccles Institute of Human
Genetics website (http://www.genetics.utah.edu/~swatkins/pub/RSP_links.html), as
described in Jorde et al. (1995) and Watkins et al. (2001).
3.5.1.2.1. Estrogen Receptor (ESR)
The RFLP under study, a C→T single nucleotide polymorphism (SNP) altering a PvuII
restriction site, is located in intron 1 of the gene ESR1, 0.4 kb upstream of exon 2 on
chromosome 6. The gene encodes the estrogen receptor (ER) which is a member of a
superfamily of transcription factors. Several RFLPs such as XbaI (exon 2), BstUI (exon
1) and PvuII (intron 1) have been used to report an association between the gene and
breast cancer (Andersen et al., 1994).
3.5.1.2.2. N-acetyltransferase 2 (NAT)
The Arylamine N-acetyltransferase 2 (N-acetyltransferase 2, NAT2) gene encodes a
drug-metabolizing enzyme. The enzyme is found in the liver (Jenne, 1965) and intestinal
epithelium (Hickman et al., 1998) and functions to both activate and deactivate
arylamine and hydrazine drugs and carcinogens. The NAT2 locus is located on
chromosome 8p23.1-p21.3 (Hickman et al., 1994). The locus is highly polymorphic and
more than 20 alleles have been reported (Hein et al., 2000). Polymorphisms in this gene
are also associated with higher incidences of cancer and drug toxicity. The
polymorphism under study is a C→T SNP altering a KpnI restriction site.
3.5.1.2.3. PSCR
PSCR or D21S13E locus has been localized to 21q11.1-q21. RFLPs such as TaqI
(Stinissen et al., 1990), EcoRI (Stinissen et al., 1990; Pulst et al., 1990a) and HaeIII
(Pulst et al., 1990b) have been identified in the PSCR locus. The site under
consideration in the present study is the TaqI polymorphism.
3.5.1.2.4. 5-Hydroxytryptamine Receptor 2A (T2)
The RFLP under study, a synonymous C→T SNP altering an MspI restriction site, is
located at nucleotide position 102 in the T2 gene (Warren et al., 1993). T2 or HTR2A
gene encodes one of the seven surface subtype receptors, 5-hydroxytryptamine 2A,
which mediates the functioning of the hormone and neurotransmitter Serotonin or 5-
hydroxytryptamine (Frazer et al., 1990). The neurotransmitter serotonin has been
implicated in a wide range of psychiatric conditions (Lucki, 1998). The HTR2A gene
has been assigned to chromosomal region 13q14-q21 (Sparkes et al., 1991).
3.5.1.2.5. Lipoprotein Lipase (LPL)
The Lipoprotein Lipase (LPL) gene encodes lipoprotein lipase, which plays a major role in
lipoprotein metabolism by hydrolyzing the core triglycerides of circulating chylomicrons
and very low density lipoprotein (VLDL). The human LPL locus is present on chromosome
8, localized to 8p22 (Sparkes et al., 1987), and comprises 10 exons. Eighty-eight
variable sites have been identified across the 9.7 kb region by the 3′ end of the LPL
gene in three different populations: African-American, European and European-American
(Nickerson et al., 1998; Templeton et al., 2000). The most extensively
studied polymorphic sites are HindIII in the 3′ flanking region (Heinzmann et al.,
1987) and PvuII in intron 6 (Oka et al., 1989). The PvuII site is altered by a C→T SNP
and has been considered in the present study.
3.5.1.2.6. Alcohol Dehydrogenase (ADH2)
The Class I Alcohol Dehydrogenase gene ADH1B, previously known as ADH2, is
located in chromosomal region 4q21-q23. The protein encoded by this gene is a
member of the alcohol dehydrogenase family. Members of this enzyme family
metabolize a wide variety of substrates, including ethanol, retinol, other aliphatic
alcohols, hydroxysteroids and lipid peroxidation products. A C→T SNP in intron 3 of
the gene results in a variable RsaI restriction site, which has been taken up in the
present investigation.
3.5.1.2.7. Aminolevulinate, delta-, Dehydratase (ALAD)
The ALAD locus, located in the 9q34 region, encodes the ALAD enzyme that catalyzes the second
step in the porphyrin and heme biosynthetic pathway. Its activity is inhibited by lead, and
a defect in the ALAD structural gene can cause increased sensitivity to lead poisoning and
acute hepatic porphyria. A T→C polymorphism altering an RsaI restriction site is located 3.4
kb upstream of the polyadenylation signal and is the RFLP under study.
3.5.2. Y Chromosomal Markers
The Y chromosome is a powerful tool to study genealogies. It is inherited paternally (Jobling
and Smith, 1995), lacks recombination, and carries a wide range of polymorphisms
(Underhill et al., 2000). It is a large linear molecule (approximately 60 Mb), and
preserves a unique record of mutational events (Hammer, 1994). Combinations of Y-chromosome
haplotypes have been used as a tool to study human migrations (Hammer et
al., 1998). The Y chromosome contains a simple record of the past that helps elaborate the
evolutionary relationships of modern Y chromosomes (Jobling and Smith, 1995). Two types of
markers are studied in the Y-chromosome, microsatellites and biallelic polymorphic
sites. Microsatellite markers demonstrate high levels of heterozygosity due to the high
mutation rate that allows for the inference of phylogenies among populations (Shriver et
al., 1997; White et al., 1999). Binary markers on the other hand, have a lower mutation
rate, which allows reconstruction of the ancestral state and can preserve the population
specific haplotype information that spans the human population history (Underhill et al.,
1997). Binary markers represent unique event polymorphisms (UEPs) in human
evolution; these events could be single nucleotide polymorphisms (SNPs) or insertion
/deletions at specific sites on the Y-chromosome. They allow identification of deep splits
in the Y-chromosome genealogy. The Y-chromosomal classification and nomenclature
system is maintained and updated periodically (Y Chromosome Consortium, 2002;
Karafet et al., 2008).
In the present study, 54 Y UEPs, namely M9, M89, M201, P91, M427, P96, P254, M69,
M52, M82, M36, M97, M39, APT, M145, P143, M216, M356, P92, M45, M207,
M242, P36.2, M346, M173, SRY10831.2, M174, M56, M157, M87, PK5, P98, M124,
M343, M11, M27, P123, M170, M304, M172, M12, M205, M241, M99, M280, M321,
P84, M410, M147, P60, P79, P261, M214, M175 were examined. Their details, such as
primer sequences, PCR conditions, positions on the Y chromosome and the cycle
sequencing protocol, are given in Appendices X and XI. The phylogeny of these
markers is a perfect tree whose hierarchical structure corresponds to the historical
accumulation of mutations. The hierarchical tree, with the standardized nomenclature
of haplogroups and the specific UEP motifs that define them, is given in Figure 3.4.
Currently there are 20 major haplogroups (A-T), divided into a
number of subhaplogroups (Karafet et al., 2008).
Figure 3.4. The global Y Chromosome phylogenetic tree illustrating topology of major
Y chromosome haplogroups (Source: Karafet et al., 2008)
3.5.3. Mitochondrial DNA Regions
Mitochondrial DNA (mtDNA) is maternally inherited and lacks recombination, which
makes it an ideal tool to study populations through the maternal line. The haploid, circular
mitochondrial genome consists of 16,569 base pairs containing 37 densely packed
intronless genes and a short regulatory region, the D-loop. mtDNA accumulates
mutations faster than the nuclear genome. There are no major repetitive elements,
insertions or deletions. The control region of the mitochondrial genome is highly variable,
and it is used to determine the genetic structure and origin of populations (Parsons et al.,
1997). The high rate of base substitution in the control region of the mitochondrial
genome, and the fact that the effective population size of mtDNA is one-fourth that of
nuclear autosomal loci (leading to increased genetic drift), allow maternal genealogies to be
constructed with high specificity (Richards et al., 1996; Macaulay et al., 1999). As with
the Y chromosome, the ease of reconstructing the phylogeny is the main advantage of
mtDNA. Figure 3.5 presents the simplified mtDNA phylogenetic tree, displaying the
major mtDNA haplogroups identified on the basis of mtDNA variation on both coding
and control regions. The root of the tree is indicated by a star representing the most recent
common matrilineal ancestor of all humans. The L haplogroups are the deepest-rooting
lineages and are specific to Africa, indicating the African origin of modern humans.
Haplogroup L3 gave rise to macrohaplogroups M, N and R (the latter itself a subclade of
N), which encompass all variations observed outside Africa. Haplogroup symbols
followed by a star represent all other descendant lineages (besides the ones shown) of a
particular clade, for which no unique alphabetical letters have been reserved.
The mtDNA control region is divided into two regions: Hypervariable Region I and II
(HVR-I and HVR-II). The sequence data were numbered according to the
revised Cambridge Reference Sequence (rCRS) for human mtDNA (Andrews et al.,
1999). In the present study, HVR-I, corresponding to nucleotide positions 15904 to 16540,
and HVR-II, corresponding to nucleotide positions 70 to 300, were screened using the
forward (F) and reverse (R) primers of primer sets 23 and 24 (Rieder et al., 1998). In addition to
HVR-I and HVR-II, nucleotide position 10400, diagnostic for haplogroup “M”, was
also screened using the F and R primers of primer set 15 (Rieder et al., 1998). Samples carrying T
in place of C at position 10400 were classified under haplogroup M while others were
classified under haplogroup N. The detailed protocol followed for the amplification of
mtDNA regions is given in Appendix XII. The cycle sequencing protocol for mtDNA is given,
along with that for the Y SNPs, in Appendix XI.
Figure 3.5. The global mtDNA phylogeny illustrating topology of major mtDNA
haplogroups (Source: van Oven and Kayser, 2008)
3.6. Statistical Analysis
In population genetics, data generated by the various methodologies (classical or
molecular markers) are not meaningful in themselves unless
subjected to appropriate statistical analysis. Statistical methods comprise a battery of
analytical measures required for describing, comparing and interpreting the data and, finally,
for drawing generalizations about the larger population from which the data were sampled. The
parameters used in the present investigation to assess population variability, structure and
affinities, and to examine which evolutionary forces contributed most significantly to
maintaining the observed amount of genetic variation, are described below.
3.6.1. Allele Frequency
An allele is defined as one of the two or more alternative forms of a gene or DNA
sequence at specific chromosomal location. Allele frequency is the frequency of an
allele within a population. Allele frequencies were calculated from the genotype data by
direct gene counting at each locus separately for each population.
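Direct gene counting can be illustrated with a short Python sketch. It is only a minimal illustration, not the procedure of any software package used in the study; the function name and the example genotypes are hypothetical.

```python
from collections import Counter

def allele_frequencies(genotypes):
    """Estimate allele frequencies at one locus by direct gene counting.

    `genotypes` is a list of (allele, allele) tuples, one per diploid individual.
    """
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())  # 2n gene copies for n individuals
    return {allele: count / total for allele, count in counts.items()}

# Hypothetical RFLP genotypes: '+' = restriction site present, '-' = absent.
sample = [("+", "+")] * 4 + [("+", "-")] * 4 + [("-", "-")] * 2
freqs = allele_frequencies(sample)
print(freqs)
```

In this example the 10 individuals carry 20 gene copies, 12 of which are '+', so the estimated frequency of '+' is 0.6.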
3.6.2. Hardy-Weinberg Equilibrium and Bonferroni’s Correction
At each locus, the observed genotype counts were compared with those expected from the
allele frequencies by a Chi-square goodness-of-fit test, to determine whether the population
was in Hardy-Weinberg equilibrium.
Bonferroni’s correction was applied to correct for multiple comparisons. It is the
simplest correction of individual p-values for multiple hypotheses testing in order to
maintain an overall significance level α. It was estimated as:
$$p_{\text{corrected}} = \frac{\alpha}{n}$$
Where n is the number of Chi-square tests performed.
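The goodness-of-fit test and the correction can be sketched in Python for a biallelic locus. This is a minimal sketch under stated assumptions: the function names, genotype counts and number of tests are hypothetical, the test has one degree of freedom (three genotype classes, one estimated allele frequency), and the Bonferroni threshold is taken in its standard form α/n.

```python
import math

def hwe_chi_square(n_AA, n_Aa, n_aa):
    """Chi-square goodness-of-fit test for HWE at a biallelic locus (1 d.f.)."""
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)      # frequency of allele A by gene counting
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_AA, n_Aa, n_aa)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper-tail probability of a chi-square variate with 1 d.f.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that keeps the overall level at alpha."""
    return alpha / n_tests

chi2, p_value = hwe_chi_square(40, 40, 20)          # hypothetical genotype counts
threshold = bonferroni_threshold(0.05, n_tests=12)  # hypothetical number of tests
print(round(chi2, 3), round(p_value, 4), p_value < threshold)
```

A locus is declared out of equilibrium only if its p-value falls below the corrected threshold.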
3.6.3. Heterozygosity
Heterozygosity is a measure of the diversity of a polymorphic locus (Nei, 1973). In other
words it can be defined as the proportion of heterozygotes per locus in a randomly
mating population. The unbiased estimate of heterozygosity for a single locus was
computed using the following formula:
$$H = \frac{2n\left(1 - \sum_i x_i^2\right)}{2n - 1}$$
Where, n is the number of individuals sampled and $x_i$ is the population frequency of the
ith allele at a locus. Average heterozygosity (H) is the average of this quantity over all
loci.
3.6.4. Haplotype Diversity or Gene Diversity
Haplotype diversity is equivalent to the expected heterozygosity for diploid data. It is
defined as the probability that two randomly chosen haplotypes are different in the
sample. It is also denoted as Gene diversity. It was computed using the same formula as
employed for the calculation of heterozygosity (given above). For haploid markers such
as mtDNA or Y chromosomal haplogroups, this measure was calculated by replacing
2n with n. The method for estimating the sampling variance was also the same as that for
heterozygosity.
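The unbiased estimator and its haploid variant can be written as one small Python function. This is a minimal sketch: the function name, the `ploidy` argument and the example frequencies are hypothetical and serve only to show how 2n is replaced by n for haploid markers.

```python
def gene_diversity(freqs, n, ploidy=2):
    """Unbiased gene diversity: N (1 - sum x_i^2) / (N - 1), with N = ploidy * n.

    ploidy=2 gives the heterozygosity estimator for diploid loci (N = 2n);
    ploidy=1 gives haplotype diversity for mtDNA or Y-chromosome data (N = n).
    """
    copies = ploidy * n
    return copies * (1.0 - sum(x * x for x in freqs)) / (copies - 1)

# Hypothetical examples with n = 10 sampled individuals.
h_diploid = gene_diversity([0.6, 0.4], n=10)                  # autosomal locus
h_haploid = gene_diversity([0.5, 0.3, 0.2], n=10, ploidy=1)   # haplogroup frequencies
print(round(h_diploid, 4), round(h_haploid, 4))
```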
3.6.5. Estimates of Gene Diversity (GST and NST)
Nei’s (1973) method was used to apportion the gene diversity of the total population into its
components.
Genetic differentiation is defined as the accumulation of differences in allele
frequencies between completely or partially isolated populations. This differentiation
between populations could be due to the operation of diverse evolutionary forces.
The coefficient of gene differentiation, denoted GST, measures the relative magnitude of
this differentiation. It is defined as the ratio of the inter-population gene diversity
(DST) to the total gene diversity (HT) among the populations, i.e.
GST = DST / HT
Where, DST = HT - HS
HS represents the average expected heterozygosity of subpopulations assuming random
mating within each subpopulation and is calculated as following:
$$H_S = \frac{2n\left(1 - \sum_i x_i^2\right)}{2n - 1}$$
And HT denotes the expected heterozygosity of the total population assuming random
mating within. It is calculated as follows:
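$$H_T = 1 - \sum_i \bar{x}_i^2 + \frac{H_S}{2ns}$$
Where, $\bar{x}_i$ is the frequency of the ith allele averaged over the s subpopulations.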
NST is an analogue to GST at the nucleotide level (Lynch and Crease, 1990). It is
calculated as the ratio of the average genetic distance between genes from different
populations relative to that among genes in the population at large.
$$N_{ST} = \frac{\sum_{ij} \pi_{ij} c_{ij}}{\nu_T}$$
Where, $\pi_{ij}$ is the distance between haplotypes i and j, and $c_{ij}$ denotes the
covariance between them. $\nu_T$ is the total diversity in the population and is defined as:
$$\nu_T = \sum_{ij} \pi_{ij} p_i p_j$$
Where, $p_i$ and $p_j$ are the frequencies of the ith and jth haplotypes in the population.
Extreme NST estimates of 0 and 1 indicate zero and complete population subdivision,
respectively.
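The apportionment of gene diversity into HS, HT and GST can be sketched in Python for a single locus. This is a minimal sketch and not the implementation of any software package used in the study: the function name and the example frequencies are hypothetical, subpopulations are weighted equally, and the sample-size corrections are omitted for brevity.

```python
def gst(subpop_freqs):
    """Return (H_S, H_T, G_ST) for one locus.

    `subpop_freqs` is a list of allele-frequency lists, one per subpopulation,
    with alleles in the same order. Sample-size corrections are omitted.
    """
    s = len(subpop_freqs)
    # Average expected heterozygosity within subpopulations.
    h_s = sum(1.0 - sum(x * x for x in freqs) for freqs in subpop_freqs) / s
    # Expected heterozygosity of the total population from mean allele frequencies.
    mean_freqs = [sum(column) / s for column in zip(*subpop_freqs)]
    h_t = 1.0 - sum(x * x for x in mean_freqs)
    d_st = h_t - h_s
    return h_s, h_t, d_st / h_t

# Two hypothetical subpopulations at a biallelic locus.
h_s, h_t, g_st = gst([[0.8, 0.2], [0.4, 0.6]])
print(round(h_s, 3), round(h_t, 3), round(g_st, 3))
```

With these frequencies HS = 0.40 and HT = 0.48, so one-sixth of the total gene diversity lies between the subpopulations.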
3.6.6. Wright’s F– Statistics
Hartl and Clark (1989) have illustrated Wright’s (1921) method to quantify the
inbreeding effect of population sub-structure. Wright (1921) formulated the fixation
index which is defined as the reduction in heterozygosity expected with random mating
at any one level of a population hierarchy relative to another, more inclusive level of the
hierarchy. At the first level, FIS is estimated which is defined as the average difference
between observed and Hardy–Weinberg expected heterozygosity within each
subpopulation due to non-random mating. It can also be interpreted as the correlation
between the states of two alleles in a genotype sampled at random from any
subpopulation.
$$F_{IS} = \frac{H_S - H_I}{H_S}$$
Where, HS is the average expected heterozygosity of subpopulations assuming random
mating within each subpopulation and HI is the average observed heterozygosity within
each subpopulation.
The next level in the hierarchy is FST which represents the average expected
heterozygosity for subpopulations compared with expected heterozygosity for the total
population.
$$F_{ST} = \frac{H_T - H_S}{H_T}$$
Where, HT is the expected heterozygosity of the total population assuming random mating
within subpopulations and no divergence of allele frequencies among subpopulations.
The final level in the hierarchy is FIT, the comparison of the average observed
heterozygosity for subpopulations with the heterozygosity expected for the total
population. This gives the departure from Hardy–Weinberg expected genotype frequencies
due to the combination of non-random mating within sub populations and divergence of
allele frequencies among subpopulations.
$$F_{IT} = \frac{H_T - H_I}{H_T}$$
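The three fixation indices follow directly from the three heterozygosities. The Python sketch below is a minimal illustration with hypothetical input values; it also checks the standard relationship (1 − FIT) = (1 − FIS)(1 − FST) that links the three levels of the hierarchy.

```python
def f_statistics(h_i, h_s, h_t):
    """Wright's fixation indices from observed (H_I) and expected (H_S, H_T) heterozygosity."""
    f_is = (h_s - h_i) / h_s   # non-random mating within subpopulations
    f_st = (h_t - h_s) / h_t   # divergence among subpopulations
    f_it = (h_t - h_i) / h_t   # combined effect
    return f_is, f_st, f_it

# Hypothetical heterozygosities.
f_is, f_st, f_it = f_statistics(h_i=0.36, h_s=0.40, h_t=0.48)
print(round(f_is, 4), round(f_st, 4), round(f_it, 4))
print(abs((1 - f_it) - (1 - f_is) * (1 - f_st)) < 1e-12)
```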
3.6.7. Genetic Distance
Genetic distance is the extent of gene differences between populations or species that is
measured by some numerical quantity. Thus, the number of nucleotide substitutions per
nucleotide site or the number of gene substitutions per locus is the measure of genetic
distance. Various measures of genetic distances have been proposed by many scholars.
Nei (1972) proposed a genetic distance measure to estimate the number of gene or
codon substitutions per locus between two populations. Using this method, the pairwise
genetic distance among the populations is estimated as,
$$D = -\ln I$$
Where,
$$I = \frac{\sum_i x_i y_i}{\sqrt{\sum_i x_i^2 \sum_i y_i^2}}$$
And $x_i$ and $y_i$ are the frequencies of the ith allele at a locus in the two populations X and Y
respectively. Standard errors of the genetic distances were computed using the method of
Nei and Roychoudhury (1974).
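For a single locus, the distance can be computed as in the following Python sketch. It is a minimal illustration with a hypothetical function name and example frequencies; for several loci, the three sums would be accumulated over all loci before the normalized identity I is formed.

```python
import math

def nei_standard_distance(x, y):
    """Nei's (1972) standard genetic distance D = -ln I for one locus.

    `x` and `y` are allele-frequency lists for populations X and Y,
    with alleles in the same order.
    """
    j_xy = sum(a * b for a, b in zip(x, y))
    j_x = sum(a * a for a in x)
    j_y = sum(b * b for b in y)
    identity = j_xy / math.sqrt(j_x * j_y)   # normalized identity I
    return -math.log(identity)

d_xy = nei_standard_distance([0.8, 0.2], [0.4, 0.6])   # hypothetical frequencies
d_xx = nei_standard_distance([0.8, 0.2], [0.8, 0.2])   # identical populations
print(round(d_xy, 4), abs(d_xx) < 1e-12)
```

Identical allele frequencies give I = 1 and therefore D = 0; the distance grows as the frequencies diverge.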
3.6.8. Pairwise FST and Slatkin’s Linearization
As mentioned earlier, Wright’s FST is a statistic for measuring genetic differentiation of
populations. This statistic can also be applied to quantify the genetic distance for a pair
of populations. Latter (1972) formulated a better estimate of FST for the case of two
populations and multiple alleles at many loci. The measure, denoted by $\phi^*$ and
equivalent to FST, is calculated as follows:
$$\phi^* = \frac{\sum_{k=1}^{r}\sum_{i=1}^{m_k}(x_{ik}-y_{ik})^2}{2\left[r-\sum_{k=1}^{r}\sum_{i=1}^{m_k}x_{ik}y_{ik}\right]}$$
Where $m_k$ is the number of alleles at the kth locus and r is the number of loci examined.
$x_{ik}$ and $y_{ik}$ are the frequencies of the ith allele at the kth locus in populations X and Y.
To relate the pairwise FST to population divergence, a slight
transformation that linearizes the distance with population divergence time was suggested
by Slatkin (1995). Slatkin considered a simple demographic model, where two haploid
populations of size N diverged τ generations ago from a population of identical size.
These two populations have remained isolated ever since, without exchanging any
migrants. Under such conditions, FST can be expressed in terms of the coalescence
times $\bar{t}_1$, the mean coalescence time of two genes drawn from two different
populations, and $\bar{t}_0$, the mean coalescence time of two genes drawn from the same
population.
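A short Python sketch can make the two steps concrete. It is a minimal sketch under stated assumptions: the function names and example values are hypothetical; Latter's measure is written in its equivalent identity form, [(J_X + J_Y)/2 − J_XY]/(1 − J_XY), with each J averaged over loci; and the linearization uses the result of Slatkin's model that FST = τ/(τ + N), so that FST/(1 − FST) = τ/N grows linearly with divergence time.

```python
def latter_fst(x_loci, y_loci):
    """Latter's (1972) pairwise measure for two populations.

    `x_loci` and `y_loci` are lists of allele-frequency lists, one per locus.
    """
    r = len(x_loci)
    j_x = sum(sum(a * a for a in x) for x in x_loci) / r
    j_y = sum(sum(b * b for b in y) for y in y_loci) / r
    j_xy = sum(sum(a * b for a, b in zip(x, y)) for x, y in zip(x_loci, y_loci)) / r
    return ((j_x + j_y) / 2.0 - j_xy) / (1.0 - j_xy)

def slatkin_linearized(fst):
    """Slatkin's (1995) linearized distance F_ST / (1 - F_ST)."""
    return fst / (1.0 - fst)

# One hypothetical biallelic locus in two populations.
fst = latter_fst([[0.8, 0.2]], [[0.4, 0.6]])
print(round(fst, 4), round(slatkin_linearized(fst), 4))

# Under the isolation model with tau = 50 and N = 200, F_ST = tau / (tau + N).
tau, n_size = 50, 200
model_fst = tau / (tau + n_size)
print(abs(slatkin_linearized(model_fst) - tau / n_size) < 1e-12)
```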
3.6.9. Neighbor Joining Tree
Neighbor-joining (Saitou and Nei, 1987) is a method for reconstructing
phylogenetic trees from evolutionary distance data. The principle of this method is to
find pairs of operational taxonomic units (OTUs [= neighbors]) that minimize the total
branch length at each stage of clustering of OTUs, starting with a star-like tree. The raw
data are provided as a distance matrix and the initial tree is a star tree. Then a modified
distance matrix is constructed in which the separation between each pair of nodes is
adjusted on the basis of their average divergence from all other nodes. The tree is
constructed by linking the least-distant pair of nodes in this modified matrix. When two
nodes are linked, their common ancestral node is added to the tree and the terminal
nodes with their respective branches are removed from the tree. This pruning process
converts the newly added common ancestor into a terminal node on a tree of reduced
size. At each stage in the process two terminal nodes are replaced by one new node. The
process is complete when two nodes remain, separated by a single branch.
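The procedure described above can be sketched in Python as follows. This is a minimal sketch of the neighbor-joining steps and not the implementation of any software package used in the study; the function name, node labels and the five-taxon distance matrix are hypothetical.

```python
def neighbor_joining(names, dist):
    """Neighbor-joining on a symmetric distance matrix.

    Returns a list of edges (node, node, branch_length).
    """
    # Dict-of-dicts copy so that nodes can be added and dropped by label.
    d = {a: {b: dist[i][j] for j, b in enumerate(names)} for i, a in enumerate(names)}
    nodes = list(names)
    edges = []
    next_id = 0
    while len(nodes) > 2:
        n = len(nodes)
        total = {a: sum(d[a][b] for b in nodes) for a in nodes}
        # Modified distance: each pairwise distance is adjusted by the
        # divergence of both nodes from all other nodes.
        best = None
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                q = (n - 2) * d[a][b] - total[a] - total[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        u = f"node{next_id}"          # new common ancestral node
        next_id += 1
        len_a = 0.5 * d[a][b] + (total[a] - total[b]) / (2 * (n - 2))
        len_b = d[a][b] - len_a
        edges += [(u, a, len_a), (u, b, len_b)]
        # The ancestor replaces the two joined nodes as a terminal node.
        d[u] = {u: 0.0}
        for c in nodes:
            if c not in (a, b):
                d[u][c] = d[c][u] = 0.5 * (d[a][c] + d[b][c] - d[a][b])
        nodes = [c for c in nodes if c not in (a, b)] + [u]
    a, b = nodes
    edges.append((a, b, d[a][b]))     # the final two nodes share one branch
    return edges

# Hypothetical additive distances among five populations.
taxa = ["a", "b", "c", "d", "e"]
matrix = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
tree = neighbor_joining(taxa, matrix)
for edge in tree:
    print(edge)
```

Ties in the modified distance are broken here by taking the first minimal pair encountered, which is a design choice of this sketch rather than part of the method.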
3.6.10. Y Chromosome Haplogroup Assignment
A haplogroup is a group of related lineages defined by single nucleotide polymorphisms
which have accumulated along different lineages. The Y Chromosome Consortium
(YCC) has assigned hierarchical alphanumeric labels, which can be presented
graphically in the form of a phylogenetic or haplogroup tree (Y Chromosome
Consortium, 2002). In the present study, all the male samples were assigned specific
haplogroups following revised Y chromosomal tree nomenclature system (Karafet et
al., 2008).
3.6.11. Exact Test of Population Differentiation
In order to test the hypothesis of a random distribution of k different haplotypes or
genotypes among r populations, the exact test of population differentiation described by
Raymond and Rousset (1995) was carried out. This test is analogous to Fisher’s exact
test on a 2 × 2 contingency table, extended to an r × k contingency table. Instead of
enumerating all possible contingency tables, a Markov chain is used to explore the space
of all possible tables. This Markov chain consists of a random walk in the space of all
contingency tables, carried out in such a way that the probability of visiting a particular
table corresponds to its actual probability under the null hypothesis of no differentiation
(panmixia). During this random walk between the states of the Markov chain, the
probability of observing a table less or equally likely than the observed sample
configuration under the null hypothesis of panmixia is estimated.
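The logic of the test can be illustrated with a simplified Python sketch. Note that it does not reproduce the Markov chain of Raymond and Rousset (1995): it substitutes a plain Monte Carlo permutation scheme, which samples tables with the same margins by shuffling haplotypes among individuals. The function names, the number of permutations and the example samples are hypothetical.

```python
import math
import random
from collections import Counter

def log_table_weight(labels, haplotypes):
    """Log of the part of the table probability that varies when margins are fixed.

    With fixed row and column totals, the probability of a table is
    proportional to 1 / product(cell count!).
    """
    cells = Counter(zip(labels, haplotypes))
    return -sum(math.lgamma(count + 1) for count in cells.values())

def differentiation_test(samples, n_permutations=2000, seed=1):
    """Monte Carlo approximation to an exact test on an r x k table.

    `samples` maps population name -> list of haplotype labels.
    """
    rng = random.Random(seed)
    labels = [pop for pop, haps in samples.items() for _ in haps]
    haplotypes = [h for haps in samples.values() for h in haps]
    observed = log_table_weight(labels, haplotypes)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(haplotypes)   # random table with the same margins
        if log_table_weight(labels, haplotypes) <= observed + 1e-9:
            hits += 1             # table less or equally likely than observed
    return (hits + 1) / (n_permutations + 1)

p_diff = differentiation_test({"P1": ["A"] * 10, "P2": ["B"] * 10})
p_same = differentiation_test({"P1": ["A"] * 5 + ["B"] * 5,
                               "P2": ["A"] * 5 + ["B"] * 5})
print(p_diff < 0.01, p_same)
```

Two samples with no haplotypes in common yield a very small p-value, while two samples with identical haplotype counts yield a p-value of 1.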
3.6.12. mtDNA Haplogroup Assignment
Based on the variations observed in mitochondrial DNA, putative haplogroups were
assigned to the samples using HaploGrep software (Kloss-Brandstaetter et al., 2011).
HaploGrep is a web application based on Phylotree, a periodically updated phylogenetic
tree of global human mitochondrial DNA variation. The tree is based on both coding
and control region mutations and provides haplogroup nomenclature for designation of
haplogroups (van Oven and Kayser, 2008).
3.6.13. Number of Polymorphic Sites
The number of polymorphic sites, also called the number of segregating sites, is
the number of nucleotide sites, among the n sites compared, at which two or more
nucleotides are present within the population.
3.6.14. Nucleotide Diversity
Nucleotide diversity, also known as the average pairwise differences in a sample of
DNA sequences, is defined as the probability that two randomly chosen homologous
sites are different.
It is the average number of nucleotide differences per site between two sequences (Nei,
1987).
$$\pi = \frac{n}{n-1}\sum_{ij} x_i x_j \pi_{ij}$$
Where, n equals the number of sampled sequences, $x_i$ and $x_j$ are the frequencies of the ith
and jth sequences and $\pi_{ij}$ is the proportion of nucleotide differences between them.
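The number of segregating sites and the nucleotide diversity can both be obtained from an alignment, as in the Python sketch below. It is a minimal illustration with hypothetical function names and toy sequences; it assumes aligned sequences of equal length without gaps, and it computes π as the mean over all pairs of sampled sequences, which is equivalent to the frequency-weighted formula above.

```python
from itertools import combinations

def segregating_sites(seqs):
    """Number of aligned sites at which two or more nucleotides occur."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)

def nucleotide_diversity(seqs):
    """Average proportion of nucleotide differences over all pairs of sequences."""
    length = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return diffs / (len(pairs) * length)

# Four hypothetical aligned sequences of eight sites.
alignment = ["ACGTACGT", "ACGTACGA", "ACGAACGT", "ACGTACGT"]
s_sites = segregating_sites(alignment)
pi = nucleotide_diversity(alignment)
print(s_sites, pi)
```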
3.6.15. Tajima’s Test of Selective Neutrality
Tajima’s D is one of the common measures of selective neutrality. The statistic is
applied for distinguishing population expansion from constant population size. The null
hypothesis of the test is that the sample of DNA sequences was taken from a population
with constant effective population size and selective neutrality of all mutations. Natural
selection operating on DNA sequences as well as changes in effective population size
through time lead to the rejection of this null hypothesis. The test uses the nucleotide
diversity and the number of segregating sites observed in a sample of DNA sequences
to make two estimates of the mutation parameter θ, both of which are expected to be
approximately equal under the standard coalescent model where all mutations are
selectively neutral and the population maintains a constant size through time (Tajima,
1989). The test statistic D is estimated as:
$$D = \frac{\theta_\pi - \theta_S}{\sqrt{\mathrm{Var}(\theta_\pi - \theta_S)}}$$
Where, $\theta_\pi$ is equivalent to the mean number of pairwise differences between sequences
(π) and $\theta_S$ is based on the number of segregating sites (S). A negative and statistically
significant score indicates larger values of $\theta_S$ relative to $\theta_\pi$, that is, an
excess of rare variants, signifying the potential effects of population expansion or of
directional (positive or purifying) selection. A positive and statistically significant score
indicates an excess of intermediate-frequency variants, which may reflect shrinking of the
effective population size or population bottlenecks, population subdivision, or balancing
selection.
Differences in the shape of genealogies are the basis of Tajima’s D test. The coalescent
process with neutral alleles and constant effective population size results in
approximately the same total length along interior and exterior branches in a genealogy.
Population growth lengthens the external branches relative to the internal ones. Longer
internal branches can be caused by population structure if
the DNA sequences compared are sampled from different demes. Shrinking effective
population size or population bottlenecks lead to
increasing probabilities of coalescence towards the present time and shorter external
branches. Substantial changes to the genealogical branching pattern lead to differences
in π and S that cause Tajima’s D to differ from zero.
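The statistic can be computed from the sample size, the mean number of pairwise differences and the number of segregating sites, using the variance constants given by Tajima (1989). The Python sketch below is a minimal illustration with a hypothetical function name and hypothetical input values; it returns only the statistic and does not assess its significance.

```python
import math

def tajimas_d(n, mean_pairwise_diff, seg_sites):
    """Tajima's D from sample size n, mean pairwise differences and S."""
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    theta_s = seg_sites / a1                       # estimate based on S
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (mean_pairwise_diff - theta_s) / math.sqrt(variance)

# Hypothetical sample: 10 sequences, 16 segregating sites, 3.888889 mean differences.
d_stat = tajimas_d(n=10, mean_pairwise_diff=3.888889, seg_sites=16)
print(round(d_stat, 3))
```

Here the estimate based on S (about 5.66) exceeds the mean number of pairwise differences, so D is negative (about −1.45).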
3.6.16. Fu’s FS
Another test of selective neutrality and of changes in population size
over time is Fu’s (1997) FS. Like Tajima’s D, the test is based on the infinite-site
model without recombination, but it utilizes data from the haplotype distribution. The test
statistic is based on the equation:
$$F_S = \ln\left(\frac{S'}{1 - S'}\right)$$
Where, S′ is the probability of observing a random neutral sample with a number of alleles
equal to or greater than the observed value, and is defined as
$$S' = \Pr(K \ge k_{obs} \mid \theta = \theta_\pi)$$
Where, K is the number of alleles in a random neutral sample, $k_{obs}$ is the observed
number of alleles, and FS is the logit of S′. Statistically significant negative scores indicate an
excess of alleles, a signature of population expansion. This test is considered less
conservative than Tajima’s D and is more sensitive to large population expansions,
expressed as large negative numbers, whereas positive numbers indicate populations
impacted by genetic drift.
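The probability S′ can be evaluated with the Ewens sampling distribution for the number of alleles, Pr(K = k) = |s(n, k)| θ^k / [θ(θ + 1)…(θ + n − 1)], where |s(n, k)| are unsigned Stirling numbers of the first kind. The Python sketch below is a minimal illustration under that assumption; the function names and input values are hypothetical, and the floating-point arithmetic is only suitable for modest sample sizes.

```python
import math

def stirling_first_unsigned(n):
    """Row n of unsigned Stirling numbers of the first kind: result[k] = |s(n, k)|."""
    row = [1]                                  # |s(0, 0)| = 1
    for m in range(1, n + 1):
        new = [0] * (m + 1)
        for k in range(1, m + 1):
            prev_same = row[k] if k < len(row) else 0
            new[k] = row[k - 1] + (m - 1) * prev_same
        row = new
    return row

def fu_fs(n, k_obs, theta):
    """Return (S', F_S) with S' = Pr(K >= k_obs | theta) under the Ewens distribution.

    Undefined when S' equals 1 (for example k_obs = 1).
    """
    stirling = stirling_first_unsigned(n)
    rising = 1.0
    for i in range(n):
        rising *= theta + i                    # theta (theta + 1) ... (theta + n - 1)
    s_prime = sum(stirling[k] * theta ** k for k in range(k_obs, n + 1)) / rising
    return s_prime, math.log(s_prime / (1.0 - s_prime))

# Hypothetical sample: 5 sequences, 3 distinct haplotypes, theta_pi = 1.
s_prime, fs = fu_fs(n=5, k_obs=3, theta=1.0)
print(round(s_prime, 4), round(fs, 4))
```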
3.6.17. Mean Number of Mismatch, Mismatch Distribution and Raggedness Statistic
Mean number of mismatch is defined as the average number of nucleotide sites that
differ between unique pairs of DNA sequences (Tajima, 1983). Mismatch distribution,
also known as distribution of pairwise differences, is defined as the frequency
distribution of the number of nucleotide sites that differ between all unique pairs of
DNA sequences in a sample from a single species. The mismatch distribution is
constructed by counting the number of differences between each pair of subjects and
then using histograms or scatter plots to display the frequencies of sites that differ. The
mismatch distribution has distinct patterns depending on the demographic history of the
population (Slatkin and Hudson, 1991; Rogers and Harpending, 1992). A smooth,
unimodal distribution is indicative of population expansion whereas a ragged,
multimodal distribution indicates constant population size over a long time period. To
distinguish between these two types of distributions another measure, raggedness
statistic (r) is used (Harpending, 1994). The raggedness statistic is the sum of the squared
differences between the frequencies of neighboring classes of the mismatch distribution,
and is estimated by the equation below:
$$r = \sum_{i=1}^{d+1} (x_i - x_{i-1})^2$$
Where, d is the greatest number of differences between alleles and $x_i$ is the relative
frequency of i pairwise differences.
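The mismatch distribution, its mean and the raggedness statistic can be computed as in the Python sketch below. It is a minimal illustration with hypothetical function names and toy sequences, assuming aligned sequences of equal length; the frequency of the class d + 1 is zero by definition of d.

```python
from collections import Counter
from itertools import combinations

def mismatch_distribution(seqs):
    """Relative frequency of each number of pairwise differences (index = differences)."""
    counts = Counter(sum(a != b for a, b in zip(s, t)) for s, t in combinations(seqs, 2))
    n_pairs = sum(counts.values())
    d = max(counts)                       # greatest number of differences observed
    return [counts.get(i, 0) / n_pairs for i in range(d + 1)]

def raggedness(freqs):
    """Harpending's r: sum over i = 1..d+1 of (x_i - x_{i-1})^2, with x_{d+1} = 0."""
    padded = list(freqs) + [0.0]
    return sum((padded[i] - padded[i - 1]) ** 2 for i in range(1, len(padded)))

alignment = ["ACGTACGT", "ACGTACGA", "ACGAACGT", "ACGTACGT"]   # hypothetical data
distribution = mismatch_distribution(alignment)
mean_mismatch = sum(i * x for i, x in enumerate(distribution))
r_stat = raggedness(distribution)
print(distribution, mean_mismatch, round(r_stat, 4))
```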
3.6.18. Median Joining Tree
An important method for visualization of haplotype data is the construction of a
phylogenetic network of haplotypes, which allows inspection of their population and
allele frequency distributions. For haplotypes without recombination or recurrent
mutations, the analysis produces a perfect tree. In this study, median-joining networks
of mtDNA based on both HVR I and II, were constructed with the NETWORK
software (fluxus-engineering.com, Bandelt et al., 1999).
3.6.19. Principal Component Analysis (PCA) and Multidimensional Scaling (MDS)
PCA and MDS are methods for displaying complex data sets in fewer dimensions in
order to extract and visualize the most important trends. The first principal component
(PC) is an eigenvector fitted to the correlation or covariance matrix of the data (e.g.
allele frequencies of populations) that explains most of the observed variation. Each
subsequent PC is orthogonal to all preceding components. The eigenvalues
of the PCs express how much of the variation they account for. Another method for
visualizing complex data, classical MDS, takes the data as a matrix of dissimilarities,
such as genetic distances between individuals or populations, and produces an output of
distances in the desired number of dimensions so that the deviations from the original
distances are minimized.
3.6.20. Population Structure and Gene Flow
To determine the population structure, a regression analysis of heterozygosity on
genetic distance was carried out with the method described by Harpending and Ward
(1982). The model estimates the relative roles of genetic drift versus gene flow in
causing population differentiation. The model assumes that the islands exchange genes
among themselves and each receives in addition, a small constant input of genes from a
continent, the same proportion to each island. In this model, genetic heterozygosity is
negatively correlated with genetic distances from the gene frequency centroid (the
overall mean gene frequencies of the population system). If the linearity between
genetic distance of an island from the gene frequency centroid and the relative
homozygosity of the islands holds then the exchange with populations outside the
region is same for each island. If the gene flow from outside the region varies in
amount from island to island, this linear relationship no longer holds. Those populations
that have undergone systematic migrations will show greater heterozygosity than
predicted by the regression line, while those groups that are more isolated will exhibit
lower than predicted heterozygosity.
3.6.21. Analysis of Molecular Variance
Analysis of Molecular Variance (AMOVA; Excoffier et al., 1992) is a method of estimating
population differentiation directly from molecular data and testing hypotheses about such
differentiation. A variety of molecular data, such as restriction fragment length
polymorphism data, direct sequence data, haplotype or haplogroup frequency data can be
analyzed using this method. The analysis is based on analyses of variance of gene
frequencies, but it also takes into account the number of mutations between molecular
haplotypes (which first needs to be evaluated). Populations are first grouped into different
clusters in order to define a particular genetic structure that will be tested. A hierarchical
analysis of variance partitions the total variance into covariance components due to intra-
individual differences, inter-individual differences, and/or inter-population differences.
3.7. List of Software Used
1. DISPAN (Ota, 1993)
2. POPGENE version 1.31 (Yeh et al., 1997)
3. ARLEQUIN version 3.5 (Excoffier et al., 2005)
4. DnaSP version 5.0 (Rozas and Rozas, 1999)
5. MEGA version 4.0 (Tamura et al., 2007)
6. HaploGrep (Kloss-Brandstaetter et al., 2011)
7. NETWORK version 4.1.0. Available at www.fluxus-engineering.com
8. SPSS version 16.0
3.8. Limitations of the Study
As with any investigation, one is bound to encounter some difficulties that cannot be
completely overcome in the short course of the research being conducted. Some of the
problems faced during the study were:
1. One of the problems faced during the field work was that of transportation. Both
private and public transport services were very infrequent, therefore, the time period
for working in the field was restricted.
2. Scattered settlement of the villages visited was another factor that posed a serious
problem during the fieldwork.