Analysis of Meiotic Recombination Hotspots: A Bioinformatics Approach

Shipra Agrawal
Institute of Bioinformatics and Applied Biotechnology
Bangalore, India

Nishant Koodali Thazath
Department of Molecular Biology and Genetics
Cornell University, USA

Manchanahalli Rangaswamy Satyanarayana Rao
Molecular Biology and Genetics Unit
Jawaharlal Nehru Center for Advanced Scientific Research, India

1 Introduction

Meiotic recombination hotspots are localized chromosomal regions of the order of one to two kilobases of DNA, which have recombination rates that are substantially higher than the genome average. There is great interest in determining their distribution in humans because they punctuate the genome into haplotype blocks that can reduce the number of markers needed for disease gene mapping [Goldstein et al., 2001; Gabriel et al., 2002; Kauppi et al., 2004]. The concentration of recombination events within the hotspots also facilitates greater understanding of the recombination process [Nishant and Rao, 2006; Petes, 2001]. Recombination hotspots are highly dynamic and show little conservation between species [Boulton et al., 1997; Ptak et al., 2004; Winckler et al., 2005].

Meiotic recombination hotspots have been extensively studied in yeast (Saccharomyces cerevisiae). They have also been globally mapped in the yeast genome using biochemical approaches, providing invaluable knowledge on the distribution and properties of hotspots [Gerton et al., 2000; Petes, 2001]. In humans, sperm typing approaches have been used to infer the presence of meiotic hotspots, especially in the MHC (major histocompatibility complex) region [Jeffreys et al., 2001; Jeffreys and May, 2004]. Sperm typing methods are too laborious for generating genome-wide hotspot maps; hence, population genetic approaches that infer hotspots from disruptions in the haplotype block structure have been used to identify 25,000 putative human recombination hotspots, although few have been verified experimentally [Myers et al., 2005]. Studies on the distribution of hotspots in the MHC region and the large-scale identification of recombination hotspots from haplotype analysis have estimated that the human genome is likely to contain as many as 50,000 meiotic hotspots.

In other model systems (e.g., mouse), properties of specific hotspots, like the PSMB9 hotspot, have been well studied using a combination of sperm typing and biochemical approaches [Baudat and deMassy, 2007;

134 Chapter 10 – Analysis of Meiotic Recombination Hotspots – A Bioinformatics Approach

Buard et al., 2009; Grey et al., 2009; Guillon et al., 2002]. Genome-wide information on the distribution of meiotic hotspots is, however, lacking in the mouse model system because methods analogous to the population genetic approaches used in humans are harder to apply, given that the inbred nature of laboratory mouse strains creates unusually large haplotype blocks [Kauppi et al., 2007]. These methods are likely to become more useful as more SNP information across mouse strains becomes available, although they have already been used to identify two mouse crossover hotspots, M1 and M2, located at haplotype block boundaries on chromosome 1 [Kauppi et al., 2007].

Comparison of the properties of hotspots identified in yeast, man, and mouse has revealed similar features, such as a narrow 1-2 kb zone for double-strand break (DSB) and crossover formation [Guillon et al., 2002; Jeffreys et al., 2001; Jeffreys et al., 2005; Petes, 2001; Yauk et al., 2003], and the association of several sequence features, such as repetitive elements, gene promoters, and certain DNA motifs, with meiotic hotspots. A systematic cataloging of meiotic hotspots from model systems, such as yeast, man, and mouse, in specialized databases is important for comparing new hotspots with existing ones and for identifying sequence features associated with large-scale hotspot data. A number of bioinformatics algorithms can be used to analyze and identify conserved DNA motifs, repetitive elements, gene promoters, and other sequence-embedded features that have been reported to stimulate recombination. Such motifs, which are shared by a large number of hotspots, may serve as potential signals in determining the distribution, and therefore the predictability, of meiotic recombination hotspots in the genome.

This chapter describes the bioinformatics approaches currently used to analyze and catalog large-scale human hotspot data, using our own case study on human meiotic hotspots. It includes a description of the database design principles that we used in developing a bioinformatics resource for human hotspots (HUMHOT) [Nishant et al., 2006]. We demonstrate the use of bioinformatic tools in finding conserved DNA motifs in a sample case study involving the analysis of 139 experimentally determined and 7,383 putative hotspots. To facilitate such bioinformatic analysis in an efficient and automated mode, the chapter also covers design issues relevant to the development of a bioinformatics analysis platform, which must be integrated into the database [Agrawal S. et al., 2008; Sailakshmi and Agrawal, 2008].

2 Organization of Meiotic Hotspot Information into Online Databases

The literature reports a huge dataset of recombination hotspots in model systems, such as yeast (S. cerevisiae) and man [Gerton et al., 2000; Jeffreys et al., 2001; Jeffreys et al., 2004; Li and Stephens, 2003; Myers et al., 2005]. If the entire dataset is compiled as a systematic resource of hotspots, it should be very useful for analyses of the evolution and distribution of these meiotic hotspots within and across organisms. A manually curated database can provide detailed information on each hotspot. It will also facilitate several types of bioinformatics analysis on the data.

There is no publicly available data resource for S. cerevisiae meiotic hotspots; however, the HapMap website (http://www.hapmap.org/downloads/index.html.en) has the latest information on computationally identified human hotspots. In this section, we describe general principles for the development of biological databases. We also discuss the structure and usefulness of HUMHOT, a database of human meiotic hotspots. We have developed this database for experimentally determined and putative meiotic hotspots. It was conceived to facilitate easy access to the entire human hotspot data and to support the aforementioned bioinformatics analysis of meiotic hotspot sequence data.

S. Agrawal, N.K. Thazath & M.R.S. Rao 135

2.1 Principles of Database Design and Considerations

The primary challenges in any biological database system lie in understanding programming logic and presentation. There are two major challenges in the implementation of biological databases [Birney & Clamp, 2004]. First, biological data is complex and variable; therefore, the actual biological interpretation of data, data size, and other related fields stored in a database change over time. The database should therefore be designed to adapt to all kinds of changes in terms of new information and changes in data size, and to build new relationships between existing and new biological data fields. Second, database developers and bioinformatics experts should understand both the biological problems and the related computational requirements.

Our experience in biological database development suggests that one should primarily consider the following design issues while developing a biological/hotspot database [Agrawal et al., 2008; Nishant et al., 2006; Shipra et al., 2006; Sailakshmi & Agrawal, 2008].

• The entire database design should be based on small, relational data tables, which are ultimately connected to the MASTER TABLE. The MASTER TABLE of a database holds the core information on the main subject of the database. This table also contains keys that connect it to other supplementary data tables related to the main subject.

• The developer should connect all data tables with the MASTER TABLE using a Relational Database Management System (RDBMS). The programming of the entire database should be done using simple programming languages, like C, C++, Java, Perl, Python, or at the most, Lisp. The relational database system can be built using ANSI SQL and Oracle, among others.

• One should focus on developing a multiple query system (Figure 1) for accessing and manipulating stored data, for presenting search results, and for automated updating of the database and of biological inferences from data.

• The database should facilitate access to all the important and updated information on database contents, which may be harder to access by searching the literature.

• One should develop automated scripts/programs that can do some data mining in the database. This facilitates the incorporation of additional relevant information into the database.

• On code writing and designing the schema:

– One must decide upon a single principal data model description. For example, any one of the following could be used: SQL-DDL, XML-DTD, XML schema, or UML. The data model description or data representation model is created to identify all data entities/elements within a data system, including their attributes and the logical relationships across data elements.

– Focus on the core aspects of the data and of the analysis. The relationships across all data tables should be defined as completely as possible. This will allow the biological information from the different data tables to be projected in a composite way when required by users.

– Design appropriate data models and tables.

– While writing the software, one should focus on the simplicity of the code and code re-usability. Usually, the object-oriented programming style is used to accomplish this purpose.

[Figure 1 (schematic, recovered from the figure text). The HUMHOT page (Human Meiotic Recombination Hot Spots, Jawaharlal Nehru Centre for Advanced Scientific Research) offers multiple query forms (search by locus name, chromosome number, or homology search), a Recombination Tool Box with links to recombination and DNA sequence analysis tools (PHASE, DnaSP, RECSLIDER, JLIN, Rockefeller's Page, LDhat), and Useful Links (Gibbs Sampler, Consensus, MEME, WebLogo, Marfinder, Gen Seq Tools, Read Seq). The database output shows a sample search result, "List of Hot Spot Sequences Belonging to the Chromosomal Locus '3'":]

Hot Spot Locus   Chromosome Number   Accession Number   Date of Data Entry   Hot Spot Region
PPARG            3p25                AY157024           08-29-2005           78000-81000
AGTR1            2q21-25             AY436325           08-29-2005           31000-34000
KNG              3q27                AY248697           08-29-2005           25141-289980

Figure 1: Schematic layout of HUMHOT with its search and analysis features

– Software testing is another integral part of database development and programming, wherein one has to test the code intensively, using example data, to make sure that the code works [Birney & Clamp, 2004].

– Like the database of human hotspots (HUMHOT), biological databases are constructed with the possibility of growth in data size. The database should be flexible enough to add fields or columns when required. The output of a database depends on the construction of, and the relationships built between, multiple related data types. This can be achieved with relational database management systems (RDBMS), such as Oracle, Sybase, and PostgreSQL.
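The design principles listed above can be made concrete with a small sketch. The example below uses Python's built-in sqlite3 module rather than any of the RDBMS products named in the text, and every table and column name in it (hotspot, literature, band, and so on) is an assumption made for illustration, not the actual HUMHOT schema. The two hotspot rows reuse values shown in Figure 1, and the citation string is a placeholder.

```python
import sqlite3

# Hypothetical schema: one master table plus one supplementary table that
# points back to it through a key. These names are NOT the HUMHOT schema.
SCHEMA = """
CREATE TABLE hotspot (
    hotspot_id   INTEGER PRIMARY KEY,
    locus        TEXT NOT NULL,
    chromosome   TEXT NOT NULL,
    band         TEXT,
    accession    TEXT,
    region_start INTEGER,
    region_end   INTEGER
);
CREATE TABLE literature (
    literature_id INTEGER PRIMARY KEY,
    hotspot_id    INTEGER NOT NULL REFERENCES hotspot(hotspot_id),
    citation      TEXT NOT NULL
);
"""


def build_demo_db():
    """Create an in-memory database holding two example rows from Figure 1."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO hotspot (locus, chromosome, band, accession, "
        "region_start, region_end) VALUES (?, ?, ?, ?, ?, ?)",
        [
            ("PPARG", "3", "p25", "AY157024", 78000, 81000),
            ("KNG", "3", "q27", "AY248697", 25141, 289980),
        ],
    )
    # Placeholder citation attached to the first inserted hotspot (id 1).
    conn.execute(
        "INSERT INTO literature (hotspot_id, citation) VALUES (?, ?)",
        (1, "placeholder citation"),
    )
    return conn


def search_by_chromosome(conn, chromosome):
    """Return the loci on one chromosome, sorted by name."""
    rows = conn.execute(
        "SELECT locus FROM hotspot WHERE chromosome = ? ORDER BY locus",
        (chromosome,),
    )
    return [locus for (locus,) in rows]


def citations_for(conn, locus):
    """Follow the key from the master table into the supplementary table."""
    rows = conn.execute(
        "SELECT l.citation FROM literature AS l "
        "JOIN hotspot AS h ON h.hotspot_id = l.hotspot_id "
        "WHERE h.locus = ? ORDER BY l.literature_id",
        (locus,),
    )
    return [citation for (citation,) in rows]
```

Keeping the supplementary information in its own keyed table means that new kinds of annotation can be added as further tables, or as new columns through `ALTER TABLE ... ADD COLUMN`, without restructuring the master table.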

2.2 HUMHOT Database

HUMHOT is a web-based, manually curated database of Human Meiotic Recombination Hotspots. The database is composed of DNA sequences corresponding to hotspots from the literature, which have been mapped to high resolution in humans. It was conceived to store all meiotic recombination hotspot sequences identified in humans in a database management system [Nishant et al., 2006]. It currently stores 132 experimentally verified and almost 25,000 putative hotspot sequences. The experimentally determined hotspots were obtained through sperm typing, while the putative hotspots came from population genetics-based approaches. Details for every hotspot, such as locus name, chromosome number, hotspot and flanking sequence information, and hyperlinks to reference papers that describe the hotspots, are stored in the database. For additional analysis, the hotspot and flanking sequences can be downloaded as text files. Figure 1 describes the detailed schema of HUMHOT. A similar strategy could be used to develop databases of recombination hotspots from other model systems. Searches could then be restricted to a specific organism.

2.2.1 Database Features

In the database, a multiple query interface has been provided to facilitate searching for hotspots based on hotspot identity (locus name), chromosome number, or homology to user-submitted sequences. Selecting an option from the “Search database” drop-down box opens a simple form in which these search criteria can be specified. Additional database features include the following:

Hotspot Information

The database provides basic information on meiotic hotspot properties, motifs associated with hotspots, classes, techniques used to map hotspots, and an outline of the meiotic recombination process in different organisms. The database is also updated with recently published literature in this field.

Useful Links

The “Useful Links” button on the menu bar provides access to other websites, which could be helpful for DNA sequence analysis and DNA sequence format conversions. Website links include Gibbs Sampler, Consensus, MEME, WebLogo, Gen Seq Tools, and Read Seq.

Recombination Tool Box

The “Recombination Tool Box” provides access to programs that compute recombination rates and perform genetic analysis. This includes links to various tools and software, such as DnaSP, PHASE, JLIN, LDhat, and RECSLIDER.

3 Bioinformatics Approaches to Analyze Meiotic Hotspots

Hotspot sequences stored in databases can be analyzed by bioinformatics methods to detect known conserved motifs, or to discover new ones, shared by multiple hotspots. In this section, we explain methods that can be used to predict motifs from meiotic hotspots. We also discuss a few reported studies on motif prediction and on the analysis of recombination hotspots from human and yeast systems. A demonstration of how these tools can be used to identify new motifs in human hotspots is also presented.

3.1 Identification of Sequence Motifs

Sequence motifs are short conserved patterns in DNA that repeatedly occur across the genome of an organism or of related organisms. The conserved sequence motifs are called consensus sequences. They are presumed to have significant biological functions. There are proteins that can specifically identify and bind to the consensus sequence to carry out specific biological processes. Such motifs are becoming increasingly important in the discovery and analysis of meiotic hotspots. As described in the reported studies, these motifs contribute to the formation and evolution of meiotic hotspots. Figure 2 describes a schema for identifying representative conserved motifs in meiotic hotspots.

[Figure 2 (flow diagram, recovered from the figure text): evidence of hotspot-permissive regions taken from the literature feeds into motif analysis of the hotspot-permissive regions using motif prediction tools like MEME, which yields hotspot sequence motifs.]

Figure 2: A schema for identifying motifs from genomic regions that are permissive for meiotic hotspots.

3.1.1 Method Design

Identifying conserved sequence motifs/features from hotspot sequences requires the implementation of rigorous and well-tested bioinformatics approaches/algorithms. We suggest the following steps in the detection/discovery of conserved DNA elements from unaligned and aligned hotspot sequence data: (i) filtering/masking of the hotspot sequence data for simple repeats, tandem repeats, and low complexity regions, among others; (ii) generating random sequence data, or taking sequence data that are known not to contain any meiotic hotspots, such as mitochondrial DNA sequences (these data will serve as control data); (iii) performing multiple statistical hypothesis tests to calculate significant occurrences of the motifs; (iv) identifying prevalent motifs based on their higher frequency and statistical significance; and (v) testing highly enriched motifs for their presence in well-known hotspots and for the correlation of their frequency with the corresponding recombination rates.
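Steps (iii) and (iv) can be illustrated with a minimal sketch. The text does not say which statistical test is intended, so the code below assumes a one-sided Fisher's exact test (computed from the hypergeometric distribution) on the number of hotspot versus control sequences that contain a candidate motif, followed by a Bonferroni correction for the number of motifs tested. The function names are hypothetical, and the input sequences are assumed to have been masked already as in step (i).

```python
from math import comb


def count_with_motif(sequences, motif):
    """Number of sequences containing the exact motif at least once."""
    motif = motif.upper()
    return sum(1 for seq in sequences if motif in seq.upper())


def enrichment_p_value(hotspots, controls, motif):
    """One-sided Fisher's exact test for motif enrichment in hotspots.

    Returns P(X >= k), where X is hypergeometric: n hotspot sequences are
    drawn from N sequences in total, K of which contain the motif.
    """
    k = count_with_motif(hotspots, motif)
    n = len(hotspots)
    K = k + count_with_motif(controls, motif)
    N = n + len(controls)
    tail = sum(
        comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1)
    )
    return tail / comb(N, n)


def bonferroni(p_value, n_tests):
    """Correct a p-value for the number of candidate motifs tested."""
    return min(1.0, p_value * n_tests)
```

Exact string matching keeps the sketch short; a degenerate motif would need pattern matching in `count_with_motif` instead.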

3.1.2 Bioinformatics Tools for Motif Identification/Discovery

A large number of motif prediction algorithms/tools have been developed and implemented to identify various DNA motifs, such as transcription factor binding sites and regulatory elements. Motif prediction algorithms can be classified according to the type of DNA sequence information used by the algorithm in finding motifs. Accordingly, there are three classes: those that use promoter sequences from co-regulated genes from a single genome, such as the YMF (Yeast Motif Finder) algorithm; those that use DNA sequences of a single gene from multiple species, that is, phylogenetic footprinting (e.g., CONREAL); and those that make use of any DNA sequence to discover new motifs, which are further used to do phylogenetic footprinting, such as PhyME [Das et al., 2007]. Selecting the best motif prediction tool is difficult, and assessing the performance of motif-finding tools is a very challenging task. Most of the available algorithms have one constraint or another. A reason for this is that motif tools are assessed on one type of data, on which a particular motif model may perform well, although it may do worse on other types of data [Tompa et al., 2005]. Oligo-Analysis is very efficient in picking up yeast motifs found by laboratory experiments [Van Helden et al., 1998]. Others, such as SP-STAR, can perform better than GibbsDNA, Consensus, and MEME for shorter motifs [Pevzner & Sze, 2000]. Some of the motif prediction algorithms are specialized to discover DNA sequence motifs, such as AlignACE, BioProspector, and MEME.

According to the survey of DNA motif prediction algorithms by Das et al. [2007], most motif-finding algorithms work successfully in yeast and other lower organisms, but they perform significantly worse in higher organisms. It is advisable to select prediction tools based on their citations or on the reported use of the original algorithm. Based on these considerations, we suggest the following tools and algorithms for predicting motifs from large-scale hotspot sequence data.

1. MEME: Multiple EM for Motif Elicitation (http://meme.nbcr.net/meme4_3_0/cgi-bin/meme.cgi) MEME is a well-established tool for discovering motifs in a group of related DNA sequences. It identifies motifs from unaligned data. It is one of the most widely used tools for searching for novel motifs in sets of biological sequences. The strategy used by MEME to discover motifs can be viewed as a “needle in a haystack” problem. The algorithm looks for the signal or motif (i.e., the needle) in a longer sequence. It searches for repeated, ungapped sequence patterns occurring in DNA or protein sequences provided by the user. The output of MEME is an HTML page, which shows motifs as local multiple alignments. In addition, the output page contains links to various tools and databases, like the JASPAR database, allowing users to compare their results with databases of known motifs. The identified motifs are also displayed in various formats [Bailey & Elkan, 1995; Bailey et al., 2006].

2. PhyME (http://edsc.rockefeller.edu/cgi-bin/phyme/download.pl) Phylogenetic Motif Elicitation is a probabilistic algorithm that integrates two important aspects (i.e., over-representation and cross-species conservation). The algorithm takes a set of orthologous sequences as input in order to find motifs that occur in evolutionarily conserved and unconserved regions. The algorithm scales better with the number of species, and it allows non-conserved occurrences [Sinha et al., 2004]. In these two aspects, it is more powerful than orthoMEME (a phylogenetic model). However, orthoMEME [Prakash et al., 2004] can work with a greater range of motif variation than PhyME. The algorithm is based on the Expectation Maximization technique. It has been tested with various datasets from different organisms, such as yeast, fly, and man. It was tested on human datasets that corresponded to SP1 transcription factor binding sites, with mouse and rat as orthologs. The motif reported by PhyME was closely similar to the known SP1 weight matrix. Out of the top 27 instances of this motif reported in human promoters, 16 overlapped with known binding sites. MEME, when run on the same dataset, reported 41 motifs in human promoters, only 9 of which overlapped with known sites [Prakash et al., 2004].

3. AlignACE (http://atlas.med.harvard.edu/) This is a Gibbs sampling algorithm that discovers motifs or signals that are over-represented in a set of DNA sequences. Using this algorithm, the authors have identified many functionally related cis-regulatory elements of genes in S. cerevisiae. They have also reported many previously unidentified motifs, a few of which have been experimentally verified [Roth et al., 1998; Hughes et al., 2000].

4. MotifSampler (http://homes.esat.kuleuven.be/~thijs/Work/MotifSampler.html) The principle behind this algorithm is Gibbs sampling. However, it has two major modifications: one is the use of a probability distribution to estimate the number of copies of the motif in a sequence, and the other is the incorporation of a higher-order Markov-chain background model. In Thijs et al. [2002], MotifSampler gave convincing results for datasets from plants containing the G-box motif and for the upstream sequences of bacterial genes regulated by the oxygen-responsive protein FNR. For upstream sequences from four clusters of co-expressed genes, which are expressed in response to wounding in Arabidopsis thaliana, the algorithm identified putative motifs that matched regulatory DNA elements from plant defense pathways [Thijs et al., 2002]. In general, Gibbs sampling-based motif discovery algorithms produce results that are comparable with those of MEME [Defrance & van Helden, 2009].
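The tools above report motifs as local multiple alignments or as weight matrices. As a hedged illustration of how such output can be used downstream, the sketch below turns a set of aligned, equal-length sites into a log-odds position weight matrix and scans a sequence for its best-scoring window. It is not the algorithm of MEME, PhyME, AlignACE, or MotifSampler; the uniform background, the pseudocount, and the function names are assumptions, and the input is assumed to contain only A, C, G, and T.

```python
from math import log2

BASES = "ACGT"


def build_pwm(sites, pseudocount=1.0):
    """Log-odds weight matrix from aligned sites (uniform 0.25 background)."""
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("all aligned sites must have the same length")
    total = len(sites) + 4 * pseudocount
    pwm = []
    for pos in range(width):
        column = [site[pos].upper() for site in sites]
        pwm.append(
            {b: log2((column.count(b) + pseudocount) / total / 0.25)
             for b in BASES}
        )
    return pwm


def best_hit(pwm, sequence):
    """Return (score, start) of the highest-scoring window in the sequence."""
    sequence = sequence.upper()
    width = len(pwm)
    scored = [
        (sum(pwm[i][sequence[start + i]] for i in range(width)), start)
        for start in range(len(sequence) - width + 1)
    ]
    return max(scored)
```

For example, a matrix built from the sites TGACGT, TGACGT, TGACGA, and TGACGT places its best hit in AAATGACGTCC at offset 3.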

3.2 Reported Case Studies on Motif Identification from Yeast and Human Meiotic Hotspots

3.2.1 Identification of Motifs from Yeast Meiotic Hotspots

Meiotic recombination hotspots have been thoroughly analyzed in S. cerevisiae and Schizosaccharomyces pombe. One of the most widely studied eukaryotic hotspots is M26, a meiotic recombination hotspot in S. pombe [Steiner et al., 2005]. Another hotspot is CRE (cAMP response element). M26 and CRE overlap at six of their seven base pairs, differing only at the ends. The nucleotide sequence for the M26 hotspot is ATGACGTN. For the CRE hotspot, the sequence is [C/G/T]TGACGT[C/A] [Fox et al., 2000]. A consensus motif from these two hotspots can be used to predict the position of any related hotspot in a genomic sequence.
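Scanning a sequence for these two motifs is straightforward with regular expressions. The sketch below writes the M26 and CRE sequences quoted above as patterns, together with their shared TGACGT core, and reports every start position on the given strand. The pattern constants and the function name are illustrative choices, and only the forward strand is searched.

```python
import re

# The motifs quoted in the text, written as regular expressions.
M26_PATTERN = "ATGACGT[ACGT]"    # ATGACGTN
CRE_PATTERN = "[CGT]TGACGT[CA]"  # [C/G/T]TGACGT[C/A]
CORE_PATTERN = "TGACGT"          # the six bases shared by both


def find_sites(pattern, sequence):
    """Return 0-based start positions of all (possibly overlapping) matches."""
    # A lookahead match consumes no characters, so overlapping sites are kept.
    scanner = re.compile("(?=(" + pattern + "))")
    return [match.start() for match in scanner.finditer(sequence.upper())]
```

On the toy sequence GGATGACGTAACCTGACGTCAA, the M26 pattern matches at position 2, the CRE pattern at position 12, and the shared core at positions 3 and 13.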

In another study, a genome-wide analysis of the distribution of meiotic DSBs in S. pombe revealed that DSBs primarily occur in intergenic regions (IGRs), particularly in exceptionally large ones. It was observed that 44% of IGRs (>3 kb in size) contain prominent DSBs, but the associations between IGRs and DSBs are not clear [Cromie et al., 2007]. Some IGRs have specific sequence motifs responsible for creating recombination hotspots, while others have none.

Another research group tested the relevance of this hypothesis in S. pombe [Steiner et al., 2009]. They scanned several short random oligonucleotide sequences for hotspot activity in fission yeast and produced a library of approximately 500 unique 15 and 30 bp long sequences that contain hotspots. Further analysis of this motif library demonstrated that motifs ranging from 6 to 10 bp are frequently present in hotspot sequences. Many of these motifs produce hotspots when reconstructed and tested in vivo. They also proposed that several unrelated short sequence motifs, other than the previously characterized CRE elements, are capable of producing recombination hotspots. They used the YMF3.0 and MEME software to discover these novel motifs.

3.2.2 Examples of Motifs Identified from Human Hotspots

Bioinformatics analysis of human recombination hotspots has previously reported a DNA motif (the 7-mer CCTCCCT), which is enriched in hotspot regions [Myers et al., 2005]. Another study by the same research group used the degenerate 13-mer motif CCNCCNTNNCCNC, which extends the previously identified 7-mer motif, to validate the hypothesis that hotspot-promoting motifs operate in diverse genetic backgrounds [Myers et al., 2008].

In line with the hypothesis, Myers et al. [2008] reported that these 13 bp motifs are differentially enriched in repeat families and in non-repeat DNA. They used meiotic hotspots identified from the HapMap Phase 2 data and novel search methods to identify an extended family of motifs based around the degenerate 13-mer motif. The study showed that the 13-mer motif is critical for crossover events in at least 40% of all human hotspots. These motifs are also found in hypervariable minisatellites and are clustered at the breakpoint regions of both disease-causing non-allelic homologous recombination hotspots and common mitochondrial deletion hotspots.

Further examination of motif occurrence outside the repeats revealed degeneracy within the motif sequence at positions 3, 6, and 12. These positions are prone to mismatches, yet they retain their hotspot activities to some extent. At these three positions, all four bases were observed in the motifs, explaining the degeneracy of the motif in determining hotspot activities [Myers et al., 2008].
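A degenerate motif such as CCNCCNTNNCCNC can be located by expanding each ambiguity code into a character class and searching both strands. The sketch below is one minimal way to do this, not the search method used by Myers et al.; the function names and the small IUPAC table (only A, C, G, T, R, Y, and N) are choices made for this example. Minus-strand hits are found by searching for the reverse complement of the motif and are reported in forward-strand coordinates.

```python
import re

# Subset of the IUPAC nucleotide codes, expanded to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTRYN", "TGCAYRN")


def reverse_complement(sequence):
    """Reverse complement of a sequence or a degenerate motif."""
    return sequence.upper().translate(COMPLEMENT)[::-1]


def motif_to_regex(motif):
    """Expand a degenerate motif into a regular expression string."""
    return "".join(IUPAC[code] for code in motif.upper())


def find_motif(sequence, motif="CCNCCNTNNCCNC"):
    """Return sorted (start, strand) hits, in forward-strand coordinates."""
    sequence = sequence.upper()
    hits = []
    for strand, query in (("+", motif), ("-", reverse_complement(motif))):
        scanner = re.compile("(?=(" + motif_to_regex(query) + "))")
        hits.extend((m.start(), strand) for m in scanner.finditer(sequence))
    return sorted(hits)
```

Because positions 3, 6, and 12 (among others) are written as N, the same call matches every base observed at those positions.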

3.2.3 Association of Poly-Purine/Poly-Pyrimidine (poly-pu/py) Tracts (PPTs) and Simple Sequence Repeats with Yeast and Human Meiotic Hotspots

Bagshaw et al. [2006] studied the relationship between poly-purine/poly-pyrimidine (poly-pu/py) tracts (PPTs) and hotspots in the human and yeast (S. cerevisiae) genomes using computational methods. They showed a statistically significant association of PPT frequency with meiotic recombination hotspots and double-strand breaks in the yeast genome and with well-characterized human hotspots. They suggested that these PPTs could be functionally involved in recombination hotspot formation. In addition, there is evidence suggesting the binding of PPTs to transcription factors [Lu et al., 2003; Sandaltzopoulos et al., 1995], and studies from yeast have shown the involvement of transcription factor binding in hotspot determination [Kon et al., 1997; Mieczkowski et al., 2006; Wahls et al., 1994; White et al., 1993]. They also found that the three single nucleotide polymorphisms that were previously shown to be associated with variations in the activity of the human MS32, NID1, and DNA2 hotspots occur within sequence contexts of 14 bp or longer, which are 85% or more poly-pu/py and at least 70% G/C rich. These polymorphisms are all close to hotspot midpoints. MS32 has a sequence context of “(G/c)GTGGGAAGGGTGG”, with the polymorphism, indicated in brackets, at the first position, which is located 151 bp from the hotspot midpoint [Bagshaw et al., 2006; Jeffreys AJ et al., 1998]. NID1 has a sequence context of “CC(C/t)CCCACCCCACCCC”, with the polymorphism at the third position, which is located 64 bp from the hotspot midpoint [Bagshaw et al., 2006; Jeffreys et al., 2005; Jeffreys & Neumann, 2005]. DNA2 has a sequence context of “AGGGGGCAGCAACAGGG(A/g)GG”, with the polymorphism at the third base from the end, which is located 166 bp from the hotspot midpoint [Bagshaw et al., 2006; Jeffreys et al., 2001; Jeffreys & Neumann, 2002].
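The three thresholds quoted above (length of at least 14 bp, at least 85% poly-pu/py, and at least 70% G/C) are easy to check for a given sequence context. The sketch below is an illustrative calculation only, not the procedure of Bagshaw et al.; the function names are hypothetical. A tract counts as poly-pu/py when the given strand is mostly purine (A or G) or mostly pyrimidine (C or T), and integer arithmetic is used so that values lying exactly on a threshold are handled without rounding error.

```python
def tract_stats(sequence):
    """Return (length, poly-pu/py count, G+C count) for one strand."""
    sequence = sequence.upper()
    length = len(sequence)
    purines = sequence.count("A") + sequence.count("G")
    gc = sequence.count("G") + sequence.count("C")
    # The strand is either purine-rich or pyrimidine-rich; take the larger.
    pu_py = max(purines, length - purines)
    return length, pu_py, gc


def is_gc_rich_ppt(sequence, min_len=14, min_pu_py_pct=85, min_gc_pct=70):
    """True if the sequence meets all three thresholds quoted in the text."""
    length, pu_py, gc = tract_stats(sequence)
    return (
        length >= min_len
        and 100 * pu_py >= min_pu_py_pct * length
        and 100 * gc >= min_gc_pct * length
    )
```

With the upper-case allele at each polymorphic site, the three contexts quoted above become GGTGGGAAGGGTGG (MS32), CCCCCCACCCCACCCC (NID1), and AGGGGGCAGCAACAGGGAGG (DNA2), and each of them satisfies `is_gc_rich_ppt`.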

Microsatellites are tandem repeats of 1–6 base pair motifs that occur commonly in the genomes of all eukaryotes. They are known to be associated with human meiotic hotspots in the MHC region [Cullen et al., 2002], as well as with regions of high recombination rate on human chromosome 22 [Majewski and Ott, 2000]. A more recent study in yeast shows a link between microsatellites and meiotic recombination hotspots [Bagshaw et al., 2008]. High-copy, short-motif microsatellites are strongly associated with S. cerevisiae meiotic recombination hotspots, whereas the association is weak or almost absent for low-copy microsatellites. Bagshaw et al. [2008] suggest two factors that may contribute to this association: a mutational bias, related to recombination or to some other property of hotspot regions, that causes microsatellites to form and grow; and the regulation of hotspot locations by simple sequences. However, they note that further experimental work is needed. The foregoing examples are some of the reported case studies in widely studied organisms, such as humans and yeast. These studies have used bioinformatics tools and approaches to identify motifs associated with meiotic crossover points. Similarly, we present a case study on real human hotspot datasets, which demonstrates the use of bioinformatics tools to discover novel motifs associated with meiotic hotspots.
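As an illustration of the definition above (tandem repeats of a 1–6 bp unit), the following sketch locates microsatellites with a regular expression. It is not the detection method of Bagshaw et al. [2008]; the function name `find_microsatellites` and the `min_copies` threshold are assumptions chosen for the example.

```python
import re


def find_microsatellites(seq, min_copies=4, max_unit=6):
    """Return (start, repeat_unit, copy_number) for each tandem repeat found."""
    # The lazy unit match reports the shortest repeating unit; it must be
    # followed by at least (min_copies - 1) further copies of itself.
    pattern = re.compile(
        r"([ACGT]{1,%d}?)\1{%d,}" % (max_unit, min_copies - 1)
    )
    hits = []
    for match in pattern.finditer(seq.upper()):
        unit = match.group(1)
        copies = len(match.group(0)) // len(unit)
        hits.append((match.start(), unit, copies))
    return hits


# Toy sequence: five copies of the dinucleotide AC flanked by other bases.
print(find_microsatellites("TT" + "AC" * 5 + "GG"))
```

Raising `min_copies` restricts the search to high-copy repeats, which is the class reported to show the strongest association with yeast hotspots.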

3.3 Identification of Novel Motifs from Large Scale Human Hotspot Data

To demonstrate the use of motif identification tools and approaches, we analyzed large-scale human hotspot sequences to discover sequence determinants that may give rise to hotspots. We present a case study using the method design we proposed in Section 3.1.1. Our dataset consists of 139 empirically determined and 7,383 computationally predicted human hotspot sequences. Each step described below corresponds to the method design proposed earlier.

1. Filtering/masking of hotspot sequence data for simple repeats, tandem repeats, and low complexity regions. First, we filtered all the sequences with the Censor program for 742 repeats and STR loci, including human and ancestrally shared repeat sequences from primates available in Repbase, a repeat database [Jurka et al., 1996].

2. Generating random sequence data or taking sequence data that are known not to contain any hotspots. We ran the MEME, Gibbs sampler, and DNA pattern programs to identify sequence signatures. This step yielded a set of 9 new non-repeat DNA sequence motifs associated with human hotspots (Tables 1 and 2). Subsequently, we analyzed the specificity of occurrence of these motifs in order to show their relevance to meiotic recombination. We used centromeric regions of approximately 2 kb from all 24 chromosomes as control data. None of the motifs were detected in these least recombining regions. Further, they were not found in human mitochondrial DNA or in the genome sequences of six human DNA viruses.

3. Performing multiple statistical hypothesis tests to calculate significant motif occurrences. We recorded the occurrences of motifs in both actual and permuted (shuffled) sequence data. We used Fisher’s exact test, the paired t-test, and the Wilcoxon matched-pairs signed-ranks test to calculate the statistical significance of motif occurrences in hotspot data compared with the shuffled hotspot sequence data.

4. Identification of prevalent motifs based on their higher frequency and statistical significance. In our analysis, we observed the most significant correlation between recombination rate and frequency of motif occurrence for motif M9, CYHBDDVC (Tables 1 and 2).

5. Testing highly enriched motifs in well-known hotspots and the correlation of their frequency with corresponding recombination rates. In order to examine whether the newly identified motifs can reliably predict hotspot regions with low false positive rates, we looked at the occurrence of these motifs in the 4.6 Mb human MHC locus. The human MHC locus served as a positive control because a recombination profile with experimentally mapped hotspot locations is available. The four major hotspots in the 4.6 Mb human MHC region have been reported to exceed the average recombination activity several-fold [Cullen et al., 2002]. The average rate of recombination in the MHC is 0.49 cM/Mb (1x), while the four individual hotspots Telomere–HLAF, BAT2–LTA, DQB3–DQB1, and DPB1–RING3 have fold increases of 2.4x, 5.2x, 3.8x, and 2.5x, respectively. We examined the frequency of occurrence of the nine motifs (M1 to M9) within these four major hotspot regions. All nine motifs showed enrichment in the four segments. The frequency of motifs in each segment is shown in Figure 3A. Our analysis revealed a high enrichment of motifs in these hotspots, with counts varying from 5 to 280 motifs. Further, linear regression analysis of motif density (motifs/Mb) against the recombination intensity of these hotspots shows a positive and significant Pearson correlation (r²) value for all motifs (data not shown). However, the most significant correlation (r² = 0.816) was observed for motif M9 (CYHBDDVC) (Figure 3B).
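Steps 2 and 3 of the list above rest on counting exact matches of a degenerate motif in real sequences and in shuffled copies of the same sequences. The sketch below shows one way to do this, using the IUPAC equivalencies given in the Table 1 caption. It is an illustration under stated assumptions rather than the authors' pipeline: the function names are hypothetical, only the forward strand is scanned, overlapping occurrences are counted, and the shuffling preserves mononucleotide composition only.

```python
import random
import re

# IUPAC nucleotide codes, as listed in the Table 1 caption.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "W": "[AT]", "S": "[CG]",
    "M": "[AC]", "K": "[GT]", "H": "[ACT]", "B": "[CGT]",
    "V": "[ACG]", "D": "[AGT]", "N": "[ACGT]",
}


def motif_to_regex(motif):
    """Translate an IUPAC consensus into a regex (zero-mismatch matching)."""
    # The lookahead makes every match zero-width, so overlapping hits count.
    return re.compile("(?=" + "".join(IUPAC[c] for c in motif.upper()) + ")")


def count_motif(motif, seq):
    """Count occurrences of the motif on the forward strand of seq."""
    return sum(1 for _ in motif_to_regex(motif).finditer(seq.upper()))


def shuffle_sequence(seq, rng):
    """Permute the bases: composition is preserved, order is destroyed."""
    bases = list(seq)
    rng.shuffle(bases)
    return "".join(bases)


def compare_with_shuffled(motif, sequences, seed=0):
    """Return per-sequence motif counts for real and shuffled sequences."""
    rng = random.Random(seed)
    real = [count_motif(motif, s) for s in sequences]
    shuffled = [count_motif(motif, shuffle_sequence(s, rng)) for s in sequences]
    return real, shuffled


# Toy input only; these are not hotspot sequences from the dataset.
toy_sequences = ["CTAGAAGCTTGGAGG", "ACGTACGTCTAGAAGC"]
print(compare_with_shuffled("CYHBDDVC", toy_sequences))
```

The two lists returned by `compare_with_shuffled` are paired by sequence, which is the layout that paired tests such as the paired t-test or the Wilcoxon matched-pairs signed-ranks test expect.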

4 Design of an Integrated Platform for Analyzing Meiotic Hotspots

With the increasing number of hotspot sequences being identified by both experimental and computational methods, there is a need for efficient tools and databases to store, explore, and analyze the data. Many methods and tools are available for estimating recombination rates, detecting recombination hotspots, determining


| Motif pattern | Ratio 1 (hotspot / shuffled hotspot sequences) | Ratio 2 (freq. of motifs in hotspots / freq. in shuffled hotspots) | P-value (Fisher’s exact test) | P-value (Wilcoxon test) | P-value (paired t-test) | Position-specific information content of the motifs 3 |
|---|---|---|---|---|---|---|
| M1: CYHBDDVCBDGG | 103/5 | 330/11 | < 0.0001 | 1.011e-29 | 1.05e-19 | [logo diagram] |
| M2: YBKGKCCCWGVB | 47/0 | 72/0 | < 0.0001 | 7.004e-17 | 1.486e-18 | [logo diagram] |
| M3: GDDVMADGVYYDRG | 64/0 | 83/0 | < 0.0001 | 7.941e-22 | 7.435e-25 | [logo diagram] |
| M4: CTGDGYHCA | 74/6 | 161/7 | < 0.0001 | 3.708e-23 | 2.6e-24 | [logo diagram] |
| M5: TGGAGG | 61/9 | 125/11 | < 0.0001 | 4.11e-18 | 1.818e-18 | [logo diagram] |
| M6: CYYCAGYYTR | 47/0 | 61/0 | < 0.0001 | 7.00e-17 | 1.015e-26 | [logo diagram] |
| M7: GAMBGGG | 102/24 | 290/29 | < 0.0001 | 7.192e-30 | 5.823e-24 | [logo diagram] |
| M8: KCBGWBVYC | 15/24 | 505/52 | < 0.0001 | 4.848e-30 | 1.441e-24 | [logo diagram] |
| M9: CYHBDDVC | 138/78 | 5782/792 | < 0.0001 | 4.348e-36 | 7.955e-37 | [logo diagram] |

1 – Ratio of the number of hotspot sequences to the number of shuffled hotspot sequences showing the motif.
2 – Ratio of the total frequency of motifs in hotspot sequences to that in shuffled hotspot sequences.
3 – Logo diagram generated using the motif pattern occurrences in all 139 hotspots.

Table 1: Novel motifs associated with human meiotic hotspots along with the statistical significance of their occurrences. The motifs were scored for exact matches (zero mismatches), and the occurrences were counted in both actual and permuted (shuffled) sequence data. The statistical significance of the motifs was examined with Fisher’s exact test, the paired t-test, and the Wilcoxon matched-pairs signed-ranks test. All three tests concurred in showing high statistical significance for all motifs. Statistical tests were conducted by constructing a 2x2 contingency table for each motif, based on motif occurrences in the real hotspot and shuffled hotspot sequences. Final motif consensus selection was accomplished after generating all possible patterns of the motif through sequence permutation (shuffling of nucleotides), and the 200–1000 top scoring motif patterns for individual motifs were analyzed to compare their presence inside and outside the hotspots. The following equivalencies were used in the consensus sequences: R = (A, G), Y = (T, C), W = (A, T), S = (G, C), M = (A, C), K = (G, T), H = (A, T, C), B = (G, C, T), V = (G, A, C), D = (G, A, T), N = (A, G, C, T).
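The 2x2 contingency test mentioned in the caption can be illustrated with the Ratio 1 counts for M1 (103 of 139 hotspot sequences versus 5 of 139 shuffled sequences contain the motif). The sketch below computes a one-sided Fisher's exact test from the hypergeometric distribution using only the standard library. The table layout (real versus shuffled, with versus without motif), the choice of a one-sided test, and the function name `fisher_exact_greater` are assumptions made for the example; the caption does not specify these details.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for the table [[a, b], [c, d]].

    Returns P(X >= a), where X is hypergeometric: the number of
    motif-containing sequences in row 1 given fixed row and column totals.
    """
    row1 = a + b          # sequences in the first group (real hotspots)
    col1 = a + c          # sequences containing the motif, both groups
    n = a + b + c + d     # all sequences
    total = comb(n, row1)
    upper = min(row1, col1)
    tail = sum(
        comb(col1, k) * comb(n - col1, row1 - k) for k in range(a, upper + 1)
    )
    return tail / total


# M1 from Table 1: 103/139 hotspots and 5/139 shuffled sequences show the motif.
with_motif_real, with_motif_shuffled, n_sequences = 103, 5, 139
p_value = fisher_exact_greater(
    with_motif_real,
    n_sequences - with_motif_real,
    with_motif_shuffled,
    n_sequences - with_motif_shuffled,
)
print(p_value)
```

Exact integer binomial coefficients are used so that the very small tail probabilities expected for strongly enriched motifs are not lost to intermediate rounding.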


| Motif | Hotspots: #Motifs 1 | Hotspots: #Hotspots $ | Shuffled hotspots: #Motifs 2 | Shuffled hotspots: #Hotspots $ | P-value ∗ (Wilcoxon sign test) |
|---|---|---|---|---|---|
| M1: CYHBDDVCBDGG | 26,237 | 2,934 | 4,282 | 1,602 | 4.919e-07 |
| M2: YBKGKCCCWGVB | 2,623 | 994 | 310 | 275 | 1.178e-07 |
| M3: GDDVMADGVYYDRG | 3,036 | 1,475 | 945 | 625 | 2.722e-08 |
| M4: CTGDTYHCA | 8,680 | 2,300 | 1,624 | 671 | 7.344e-07 |
| M5: TGGGAGG | 16,676 | 2,400 | 685 | 537 | 4.392e-09 |
| M6: CYYCAGYYTR | 9,172 | 2,264 | 491 | 405 | 5.405e-10 |
| M7: GAMBGGG | 16,549 | 2,850 | 3,890 | 1,485 | 1.311e-08 |
| M8: KCBGWBVYC | 24,491 | 3,081 | 8,331 | 2,161 | 2.753e-07 |
| M9: CYHBDDVC | 291,780 | 5,726 | 171,418 | 2,758 | 1.274e-06 |

1 – Number of motifs observed in the hotspots.
2 – Number of motifs observed in the shuffled hotspots.
$ – Total number of hotspots having the corresponding motifs.
∗ – P-value calculated to test the random occurrence of motifs within the hotspots.

Table 2: Motifs analyzed in 7,383 hotspots and shuffled hotspot sequences.

haplotypes, and mapping linkage disequilibrium, among others. At present, there is no platform that carries out all the aforementioned tasks in an integrated fashion. Integrating the entire hotspot data from multiple organisms with bioinformatics tools to analyze them will facilitate the extraction of important features and information from hotspot data. HUMHOT and the public HapMap are two database resources available for human meiotic hotspots. These databases include some links to tools and software related to recombination studies, but the tools are not integrated into the data system itself. We have previously reported the development of such integrated analysis platforms for other biological data types [Agrawal et al., 2008; Sai Lakshmi and Agrawal, 2008]. There is a need for an integrated bioinformatics analysis platform for meiotic hotspots with which the user can retrieve hotspot sequences reported from different human populations and from other model organisms. The platform should be able to identify novel recombination hotspot motifs, to search for known motifs in new genomic sequences to predict hotspots, to search for homologous hotspots within and across organisms, and to carry out linkage disequilibrium analysis. Figure 4 describes the detailed schema for this proposed platform.
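To make the retrieval requirement concrete, the sketch below stores hotspot records in a relational table and searches them by locus name and by chromosome coordinates. It is only an illustration built on Python's standard `sqlite3` module: the table layout, column names, and function names are hypothetical and are not taken from HUMHOT or from the proposed platform, and the two records are invented placeholders.

```python
import sqlite3


def create_hotspot_db():
    """Create an in-memory database with one illustrative hotspot table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE hotspot (
            id         INTEGER PRIMARY KEY,
            locus      TEXT NOT NULL,
            organism   TEXT NOT NULL,
            chromosome TEXT NOT NULL,
            start_bp   INTEGER NOT NULL,
            end_bp     INTEGER NOT NULL,
            sequence   TEXT NOT NULL
        )
        """
    )
    return conn


def add_hotspot(conn, locus, organism, chromosome, start_bp, end_bp, sequence):
    conn.execute(
        "INSERT INTO hotspot "
        "(locus, organism, chromosome, start_bp, end_bp, sequence) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (locus, organism, chromosome, start_bp, end_bp, sequence),
    )


def search_by_locus(conn, locus):
    return conn.execute(
        "SELECT locus, chromosome, start_bp, end_bp FROM hotspot WHERE locus = ?",
        (locus,),
    ).fetchall()


def search_by_region(conn, chromosome, start_bp, end_bp):
    # Return loci whose coordinates overlap the queried interval.
    rows = conn.execute(
        "SELECT locus FROM hotspot "
        "WHERE chromosome = ? AND start_bp <= ? AND end_bp >= ? "
        "ORDER BY start_bp",
        (chromosome, end_bp, start_bp),
    ).fetchall()
    return [row[0] for row in rows]


conn = create_hotspot_db()
# Placeholder records: names, coordinates and sequences are invented.
add_hotspot(conn, "demo_locus_1", "Homo sapiens", "6", 1000, 2500, "ACGT")
add_hotspot(conn, "demo_locus_2", "Homo sapiens", "6", 9000, 9800, "TGCA")
print(search_by_locus(conn, "demo_locus_1"))
print(search_by_region(conn, "6", 2000, 9500))
```

Keeping organism and coordinates as separate columns is what allows the same table to serve sequences from different populations and model organisms.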

We identified the following major ways by which such a platform can work:

1. Hotspot databases could be developed and integrated with pertinent analysis/search features. The platform can be provided with additional analysis tools/features, such as “Sequence analysis of hotspots” and “Detecting new hotspots using bioinformatics tools”, apart from “Search for existing hotspots.”

2. The category “Sequence analysis of hotspots” will include motif prediction tools, such as MEME, Gibbs Sampler, and YMF, to identify conserved DNA elements/features for selected/all sequences from the database. This category will also facilitate the identification of these elements from user-uploaded hotspot sequences. It will also have integrated tools to perform multiple sequence alignment to detect conserved motifs in hotspot sequences.

3. The newly discovered motifs from the foregoing analysis, as well as known motifs, can be used to detect recombination hotspots through another tool (a motif search algorithm) integrated into the platform. This tool will predict hotspots based on the frequency of motif occurrences and on related association statistics.

4. The category “Detecting new hotspots using bioinformatics tools” will combine many other recombination tools. These will include “HOTSPOTTER,” a recombination hotspot detection tool; recombination rate calculation tools, such as PHASE (http://stephenslab.uchicago.edu/software.html) and RECSLIDER (http://genapps.uchicago.edu/recslider1/index.html); and linkage disequilibrium mapping software like BLADE (Bayesian LinkAge DisEquilibrium mapping) (http://www.people.fas.harvard.edu/~junliu/index1.html). New hotspots identified using these tools or the motif search algorithm can be used to update the database. Nucleotide BLAST can be integrated to allow users to find homologous hotspot sequences for a submitted hotspot sequence or for hotspot sequences selected from the database.
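The motif search tool described in item 3 predicts hotspots from the frequency of motif occurrences. One simple way to realize that idea is a sliding-window scan that flags windows whose motif count reaches a threshold, sketched below. This is an assumption-laden illustration, not the proposed platform's algorithm: the function name `scan_windows`, the window size, the step, and the `min_hits` threshold are all hypothetical, and the motif is passed as a regular expression.

```python
import re


def scan_windows(seq, motif_regex, window=1000, step=500, min_hits=3):
    """Return (start, end, hits) for windows with at least min_hits motif starts."""
    # Lookahead so that overlapping motif occurrences are all recorded.
    pattern = re.compile("(?=" + motif_regex + ")")
    starts = [m.start() for m in pattern.finditer(seq.upper())]
    flagged = []
    last_start = max(len(seq) - window, 0)
    for w_start in range(0, last_start + 1, step):
        w_end = w_start + window
        # A hit belongs to a window if the motif starts inside it.
        hits = sum(w_start <= s < w_end for s in starts)
        if hits >= min_hits:
            flagged.append((w_start, w_end, hits))
    return flagged


# Toy sequence: three adjacent copies of a motif inside a poly-A background.
toy = "A" * 20 + "TGGAGG" * 3 + "A" * 20
print(scan_windows(toy, "TGGAGG", window=30, step=10, min_hits=3))
```

In practice the raw count would be replaced or complemented by the association statistics mentioned in item 3, and overlapping flagged windows would be merged into candidate regions.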

5 Conclusion

In conclusion, this chapter describes possible bioinformatics approaches to facilitate the efficient and accurate analysis and prediction of meiotic hotspots. We hope that the methods described in this chapter will facilitate the development of resources and tools that ease the use of bioinformatics in extracting useful information from hotspot sequence data.

References

[Agrawal et al., 2008] Agrawal, S., Dimitrova, N., Nathan, P., Udayakumar, K., Lakshmi, S. S., Sriram, S., Manjusha, N. & Sengupta, U. (2008). T2D-Db: an integrated platform to study the molecular basis of Type 2 diabetes. BMC Genomics, 9, 320.

[Bagshaw et al., 2006] Bagshaw, A. T., Pitt, J. P. & Gemmell, N. J. (2006). Association of poly-purine/poly-pyrimidine sequences with meiotic recombination hot spots. BMC Genomics, 7, 179.

[Bagshaw et al., 2008] Bagshaw, A. T., Pitt, J. P. & Gemmell, N. J. (2008). High frequency of microsatellites in S. cerevisiae meiotic recombination hotspots. BMC Genomics, 9, 49.

[Bailey & Elkan, 1995] Bailey, T. L. & Elkan, C. (1995). Unsupervised Learning of Multiple Motifs in Biopolymers Using Expectation Maximization. Machine Learning Journal, 21, 51–83.

[Bailey et al., 2006] Bailey, T. L., Williams, N., Misleh, C. & Li, W. W. (2006). MEME: discovering and analyzing DNA and protein sequence motifs. Nucleic Acids Research, 34 (Web Server issue), W369-73.

[Baudat & de Massy, 2007] Baudat, F. & de Massy, B. (2007). Cis- and trans-acting elements regulate the mouse Psmb9 meiotic recombination hotspot. PLoS Genetics, 3(6), e100.

[Buard et al., 2009] Buard, J., Barthes, P., Grey, C. & de Massy, B. (2009). Distinct histone modifications define initiation and repair of meiotic recombination in the mouse. EMBO J., 28(17), 2616-24.

[Birney and Clamp, 2004] Birney, E. & Clamp, M. (2004). Biological database design and implementation. Briefings in Bioinformatics, 5(1), 31-8.

[Boulton et al., 1997] Boulton, A., Myers, R. S. & Redfield, R. J. (1997). The hotspot conversion paradox and the evolution of meiotic recombination. Proceedings of the National Academy of Sciences U S A, 94(15), 8058-63.

[Cullen et al., 2002] Cullen, M., Perfetto, S. P., Klitz, W., Nelson, G. & Carrington, M. (2002). High-resolution patterns of meiotic recombination across the human major histocompatibility complex. Am. J. Hum. Genet., 71, 759-776.

[Cromie et al., 2007] Cromie, G. A., Hyppa, R. W., Cam, H. P., Farah, J. A., Grewal, S. I. & Smith, G. R. (2007). A discrete class of intergenic DNA dictates meiotic DNA break hotspots in fission yeast. PLoS Genetics, 3(8).

[Das et al., 2007] Das, M. K. & Dai, H. K. (2007). A survey of DNA motif finding algorithms. BMC Bioinformatics, 8 Suppl 7, S21.


[Figure 3, panel A: bar chart of motif frequency (y-axis, 0 to 350) for motifs M1–M9 in each of the four major hotspots in the MHC region (x-axis): Telomere–HLAF, BAT2–LTA, DQB3–DQB1, and DPB1–RING3.]

A. Distribution of novel motifs (M1–M9) in four highly recombining segments of the 4.6 Mb human MHC consensus region along with their frequency of occurrence [Telomere–HLAF (992072 bp), BAT2–LTA (64817 bp), DQB3–DQB1 (71458 bp), and DPB1–RING3 (110018 bp)]. Motif details are as follows: M1 = CYHBDDVCBDGG, M2 = YBKGKCCCWGVB, M3 = GDDVMADGVYYDRG, M4 = CTGDGYHCA, M5 = TGGGAGG, M6 = CYYCAGYYTR, M7 = GAMBGGG, M8 = KCBGWBVYC, and M9 = CYHBDDVC.

B. Relationship between the recombination intensity and the motif density of M9 (CYHBDDVC) in four major hotspots of the 4.6 Mb human MHC consensus region. The graph plots the recombination intensity of the following hotspot regions against the motif density of M9 in the linear regression analysis: Telomere–HLAF, BAT2–LTA, DQB3–DQB1, and DPB1–RING3.

Figure 3: Distribution of novel motifs in the human MHC region.
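The regression summarized in panel B relates motif density (motifs/Mb) to recombination intensity. The sketch below shows the arithmetic: counts are normalized by segment length and a Pearson coefficient is computed, with r² as its square. The segment lengths come from the panel A caption and the fold increases from the text (2.4x, 5.2x, 3.8x, 2.5x); the motif counts are hypothetical placeholders, because the per-segment counts are only shown graphically, so the printed value does not reproduce the reported r² of 0.816.

```python
from math import sqrt


def motif_density_per_mb(motif_count, length_bp):
    """Normalize a motif count by segment length (motifs per megabase)."""
    return motif_count / (length_bp / 1_000_000)


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# (segment length in bp, fold increase in recombination) from the chapter.
segments = {
    "Telomere-HLAF": (992072, 2.4),
    "BAT2-LTA": (64817, 5.2),
    "DQB3-DQB1": (71458, 3.8),
    "DPB1-RING3": (110018, 2.5),
}
# HYPOTHETICAL motif counts, for illustration only (not data from the study).
hypothetical_counts = {
    "Telomere-HLAF": 280,
    "BAT2-LTA": 60,
    "DQB3-DQB1": 55,
    "DPB1-RING3": 40,
}

densities = [
    motif_density_per_mb(hypothetical_counts[name], length)
    for name, (length, _) in segments.items()
]
intensities = [fold for _, fold in segments.values()]
r = pearson_r(densities, intensities)
print(r, r ** 2)
```

With only four points, a correlation of this kind has very few degrees of freedom, so it is best read as a descriptive summary rather than as a strong test.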


[Figure 4 schematic. Components shown: a HUMHOT sequence database (RDBMS) behind a web interface (PHP, HTML, JavaScript); “Search for Existing Hotspots” (search by locus name, by chromosome number and sequence coordinates, and by homology); “Sequence Analysis of Hotspots” (motif prediction tool MEME, multiple sequence alignment tools for conserved motifs, and BLAST for identifying homologous sequences from other organisms); “Detecting New Hotspots using Bioinformatics Tools” (recombination hotspot detection tool HOTSPOTTER, recombination rate calculation tools PHASE and RECSLIDER, and linkage disequilibrium mapping with BLADE); and a motif search algorithm that receives new genomic sequences together with known and new motifs and returns new hotspots, which are used to update the database. Arrows are labeled Q (query) and R (result). A sample alignment illustrates a conserved motif.]

Figure 4: Schematic representation of the integrated analysis platform for recombination hotspots.


[Defrance & van Helden, 2009] Defrance, M. & van Helden, J. (2009). info-gibbs: a motif discovery algorithm that directly optimizes information content during sampling. Bioinformatics, 25(20), 2715-22.

[Fox et al., 2000] Fox, M. E., Yamada, T., Ohta, K. & Smith, G. R. (2000). A family of cAMP-response-element-related DNA sequences with meiotic recombination hotspot activity in Schizosaccharomyces pombe. Genetics, 156(1), 59-68.

[Gabriel et al., 2002] Gabriel, S. B., Schaffner, S. F., Nguyen, H., Moore, J. M., Roy, J., Blumenstiel, B., Higgins, J., DeFelice, M., Lochner, A., Faggart, M., Liu-Cordero, S. N., Rotimi, C., Adeyemo, A., Cooper, R., Ward, R., Lander, E. S., Daly, M. J. & Altshuler, D. (2002). The structure of haplotype blocks in the human genome. Science, 296(5576), 2225-9.

[Gerton et al., 2000] Gerton, J. L., DeRisi, J., Shroff, R., Lichten, M., Brown, P. O. & Petes, T. D. (2000). Inaugural article: global mapping of meiotic recombination hotspots and coldspots in the yeast Saccharomyces cerevisiae. Proceedings of the National Academy of Sciences U S A, 97(21), 11383-90.

[Goldstein, 2001] Goldstein, D. B. (2001). Islands of linkage disequilibrium. Nature Genetics, 29(2), 109-11.

[Guillon et al., 2002] Guillon, H. & de Massy, B. (2002). An initiation site for meiotic crossing-over and gene conversion in the mouse. Nature Genetics, 32(2), 296-9.

[Grey et al., 2009] Grey, C., Baudat, F. & de Massy, B. (2009). Genome-wide control of the distribution of meiotic recombination. PLoS Biology, 7(2), e35.

[Hughes et al., 2000] Hughes, J. D., Estep, P. W., Tavazoie, S. & Church, G. M. (2000). Computational identification of cis-regulatory elements associated with groups of functionally related genes in Saccharomyces cerevisiae. Journal of Molecular Biology, 296(5), 1205-14.

[Jeffreys & Neumann, 2002] Jeffreys, A. J. & Neumann, R. (2002). Reciprocal crossover asymmetry and meiotic drive in a human recombination hot spot. Nature Genetics, 31(3), 267-71.

[Jeffreys & Neumann, 2005] Jeffreys, A. J. & Neumann, R. (2005). Factors influencing recombination frequency and distribution in a human meiotic crossover hotspot. Human Molecular Genetics, 14(15), 2277-87.

[Jeffreys et al., 1998] Jeffreys, A. J., Murray, J. & Neumann, R. (1998). High-resolution mapping of crossovers in human sperm defines a minisatellite-associated recombination hotspot. Molecular Cell, 2(2), 267-73.

[Jeffreys et al., 2001] Jeffreys, A. J., Kauppi, L. & Neumann, R. (2001). Intensely punctate meiotic recombination in the class II region of the major histocompatibility complex. Nature Genetics, 29(2), 217-22.

[Jeffreys et al., 2004] Jeffreys, A. J., Holloway, J. K., Kauppi, L., May, C. A., Neumann, R., Slingsby, M. T. & Webb, A. J. (2004). Meiotic recombination hot spots and human DNA diversity. Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences, 359(1441), 141-52.

[Jeffreys & May, 2004] Jeffreys, A. J. & May, C. A. (2004). Intense and highly localized gene conversion activity in human meiotic crossover hot spots. Nature Genetics, 36(2), 151-6.

[Jeffreys et al., 2005] Jeffreys, A. J., Neumann, R., Panayi, M., Myers, S. & Donnelly, P. (2005). Human recombination hot spots hidden in regions of strong marker association. Nature Genetics, 37(6), 601-6.

[Jurka et al., 1996] Jurka, J., Klonowski, P., Dagman, V. & Pelton, P. (1996). CENSOR--a program for identification and elimination of repetitive elements from DNA sequences. Computers & Chemistry, 20(1), 119-21.

[Kauppi et al., 2004] Kauppi, L., Jeffreys, A. J. & Keeney, S. (2004). Where the crossovers are: recombination distributions in mammals. Nature Reviews Genetics, 5(6), 413-24.

[Kauppi et al., 2007] Kauppi, L., Jasin, M. & Keeney, S. (2007). Meiotic crossover hotspots contained in haplotype block boundaries of the mouse genome. Proceedings of the National Academy of Sciences U S A, 104(33), 13396-401.

[Kon et al., 1997] Kon, N., Krawchuk, M. D., Warren, B. G., Smith, G. R. & Wahls, W. P. (1997). Transcription factor Mts1/Mts2 (Atf1/Pcr1, Gad7/Pcr1) activates the M26 meiotic recombination hotspot in Schizosaccharomyces pombe. Proceedings of the National Academy of Sciences U S A, 94(25), 13765-70.

[Li and Stephens, 2003] Li, N. & Stephens, M. (2003). Modeling linkage disequilibrium and identifying recombination hotspots using single-nucleotide polymorphism data. Genetics, 165(4), 2213-2233.

[Lu et al., 2003] Lu, Q., Teare, J. M., Granok, H., Swede, M. J., Xu, J. & Elgin, S. C. (2003). The capacity to form H-DNA cannot substitute for GAGA factor binding to a (CT)n*(GA)n regulatory site. Nucleic Acids Research, 31(10), 2483-94.

[Majewski and Ott, 2000] Majewski, J. & Ott, J. (2000). GT repeats are associated with recombination on human chromosome 22. Genome Research, 10(8), 1108-14.


[Mieczkowski et al., 2006] Mieczkowski, P. A., Dominska, M., Buck, M. J., Gerton, J. L., Lieb, J. D. & Petes, T. D. (2006). Global analysis of the relationship between the binding of the Bas1p transcription factor and meiosis-specific double-strand DNA breaks in Saccharomyces cerevisiae. Molecular and Cellular Biology, 26(3), 1014-27.

[Myers et al., 2005] Myers, S., Bottolo, L., Freeman, C., McVean, G. & Donnelly, P. (2005). A fine scale map of recombination rates and hotspots across the human genome. Science, 310, 321-324.

[Myers et al., 2008] Myers, S., Freeman, C., Auton, A., Donnelly, P. & McVean, G. (2008). A common sequence motif associated with recombination hot spots and genome instability in humans. Nature Genetics, 40(9), 1124-9.

[Nishant et al., 2006] Nishant, K. T., Kumar, C. & Rao, M. R. (2006). HUMHOT: a database of human meiotic recombination hotspots. Nucleic Acids Research, 34(Database issue), D25-8.

[Nishant & Rao 2006] Nishant, K. T. & Rao, M. R. (2006). Molecular features of meiotic recombination hot spots. Bioessays, 28(1), 45-56.

[Ptak et al., 2004] Ptak, S. E., Roeder, A. D., Stephens, M., Gilad, Y., Pääbo, S. & Przeworski, M. (2004). Absence of the TAP2 human recombination hotspot in chimpanzees. PLoS Biology, 2(6), e155.

[Petes, 2001] Petes, T. D. (2001). Meiotic recombination hot spots and cold spots. Nature Reviews Genetics, 2(5), 360-9.

[Pevzner & Sze 2000] Pevzner, P. A. & Sze, S. H. (2000). Combinatorial approaches to finding subtle signals in DNA sequences. Proceedings of the International Conference on Intelligent Systems for Molecular Biology, 8, 269-78.

[Prakash et al., 2004] Prakash, A., Blanchette, M., Sinha, S. & Tompa, M. (2004). Motif discovery in heterogeneous sequence data. In Pacific Symposium on Biocomputing, Hawaii, 348–359.

[Roth et al., 1998] Roth, F. P., Hughes, J. D., Estep, P. W. & Church, G. M. (1998). Finding DNA regulatory motifs within unaligned noncoding sequences clustered by whole-genome mRNA quantitation. Nature Biotechnology, 16(10), 939-45.

[Sai Lakshmi & Agrawal, 2008] Sai Lakshmi, S. & Agrawal, S. (2008). piRNABank: a web resource on classified and clustered Piwi-interacting RNAs. Nucleic Acids Research, 36(Database issue), D173-7.

[Sandaltzopoulos et al., 1995] Sandaltzopoulos, R., Mitchelmore, C., Bonte, E., Wall, G. & Becker, P. B. (1995). Dual regulation of the Drosophila hsp26 promoter in vitro. Nucleic Acids Research, 23(13), 2479-87.

[Shipra et al., 2006] Shipra, A., Chetan, K. & Rao, M. R. (2006). CREMOFAC--a database of chromatin remodeling factors. Bioinformatics, 22(23), 2940-4.

[Sinha et al., 2004] Sinha, S., Blanchette, M. & Tompa, M. (2004). PhyME: a probabilistic algorithm for finding motifs in sets of orthologous sequences. BMC Bioinformatics, 5, 170.

[Steiner et al., 2005] Steiner, W. W. & Smith, G. R. (2005). Natural meiotic recombination hot spots in the Schizosaccharomyces pombe genome successfully predicted from the simple sequence motif M26. Molecular and Cellular Biology, 25(20), 9054-62.

[Steiner et al., 2009] Steiner, W. W., Steiner, E. M., Girvin, A. R. & Plewik, L. E. (2009). Novel nucleotide sequence motifs that produce hotspots of meiotic recombination in Schizosaccharomyces pombe. Genetics, 182(2), 459-69.

[Thijs et al., 2002] Thijs, G., Marchal, K., Lescot, M., Rombauts, S., De Moor, B., Rouze, P. & Moreau, Y. (2002). A Gibbs sampling method to detect overrepresented motifs in the upstream regions of coexpressed genes. Journal of Computational Biology, 9(2), 447-64.

[Tompa et al., 2005] Tompa, M., Li, N., Bailey, T. L., Church, G. M., De Moor, B., Eskin, E., Favorov, A. V., Frith, M. C., Fu, Y., Kent, W. J., Makeev, V. J., Mironov, A. A., Noble, W. S., Pavesi, G., Pesole, G., Régnier, M., Simonis, N., Sinha, S., Thijs, G., van Helden, J., Vandenbogaert, M., Weng, Z., Workman, C., Ye, C. & Zhu, Z. (2005). Assessing computational tools for the discovery of transcription factor binding sites. Nature Biotechnology, 23(1), 137-44.

[Van Helden et al., 1998] Van Helden, J., Andre, B. & Collado-Vides, J. (1998). Extracting regulatory sites from the upstream region of yeast genes by computational analysis of oligonucleotide frequencies. Journal of Molecular Biology, 281(5), 827-42.

[Wahls et al., 1994] Wahls, W. P. & Smith, G. R. (1994). A heteromeric protein that binds to a meiotic homologous recombination hotspot: correlation of binding and hot spot activity. Genes & Development, 8(14), 1693-702.

[White et al., 1993] White, M. A., Dominska, M. & Petes, T. D. (1993). Transcription factors are required for the meiotic recombination hotspot at the HIS4 locus in Saccharomyces cerevisiae. Proceedings of the National Academy of Sciences U S A, 90(14), 6621-5.

[Winckler et al., 2005] Winckler, W., Myers, S. R., Richter, D. J., Onofrio, R. C., McDonald, G. J., Bontrop, R. E., McVean, G. A., Gabriel, S. B., Reich, D., Donnelly, P. & Altshuler, D. (2005). Comparison of fine-scale recombination rates in humans and chimpanzees. Science, 308(5718), 107-11.

[Yauk et al., 2003] Yauk, C. L., Bois, P. R. & Jeffreys, A. J. (2003). High-resolution sperm typing of meiotic recombination in the mouse MHC Ebeta gene. The EMBO Journal, 22(6), 1389-97.
