MultiPoint
An interactive package
for ordering multilocus genetic maps,
and verification of maps
based on re-sampling techniques
MultiQTL Ltd,
Institute of Evolution, Haifa University, Haifa 31905, Israel
Tel: 972-4-8240449, Fax: 972-4-8288788
http://www.multiqtl.com
1_1
MultiPoint structure
MultiPoint is a suite that consists of three software products: MultiPoint-basic,
MultiPoint-consensus and MultiPoint-ultradense.
Each of the products can be purchased and operated separately. They are
presented as a suite because they have many common properties and because other suppliers
offer them as a single product.
The functionalities of the products are described in detail in four sections of the
tutorial. The first section is an introduction and describes elements common to
the products.
The table of contents is likewise divided into four parts, and each part is placed
next to the corresponding text.
1_4
Table of Contents
Introduction ... 1_11
  Short Introduction to the algorithms ... 1_11
  The main steps of multilocus ordering ... 1_15
Input population file ... 1_19
  Input panel ... 1_19
  Control and correction of errors ... 1_22
  Input of anchor markers ... 1_24
Preliminary treatment ... 1_25
  Analyzing markers and genotypes for missing level and segregation ... 1_26
  Defining the threshold recombination level ... 1_28
The window for working with clusters ... 1_31
  Controlling “bound together” markers ... 1_33
  Setup for clustering ... 1_34
  Some additional service options ... 1_35
Analysis and treatment of a separate linkage group ... 1_37
  Defining groups of tightly linked ... 1_37
  Additional example ... 1_40
  The procedure of multilocus ordering ... 1_42
  Options of the table of ordered markers ... 1_44
  Further clustering and treatment of merged clusters ... 1_51
  Representing the map of an LG ... 1_54
  Adding markers ... 1_56
  Deleting markers ... 1_58
  Attaching markers ... 1_58
  Division of LG into sub-groups ... 1_63
MultiPoint Tutorial Part 1 - General
1_9
Table of Contents (continued)
Additional functions of treating LGs (clusters) ... 1_65
  An extended form of the clustering panel ... 1_65
  Saving the results ... 1_66
  Possible operations with clusters ... 1_69
  Searching cluster residence of a marker ... 1_71
Output options ... 1_73
  Saving LGs as text files ... 1_73
  Printing linkage group map ... 1_74
  Printing “graphical genotypes” ... 1_80
Data analysis in special cases ... 1_83
  RIL_Selfing, RIL_Sib_mating, and IRIL populations ... 1_83
  Import of ordered linkage groups ... 1_86
  Adding new data to the data set ... 1_89
References ... 1_94
1_10
Constructing genetic maps (multilocus ordering)
Objectives
♦ Ordering multilocus maps (with ~10^3 markers/chr)
♦ Verification of the orders (and removing the “bad guys”)
♦ Building consensus maps (with verification)
Method and technology
♦ Reduction to the Traveling Salesman Problem (TSP)
♦ Guided Evolutionary Strategy optimization algorithms
Genetic mapping: Some objectives
Constructing genetic maps (multilocus ordering)
Physical mapping (contig assembling)
Mapping simple (Mendelian) traits
Mapping complex (quantitative) traits
genetic mapping of QTL (MultiQTL package)
QTL physical mapping, cloning, and sequencing
QTL and gene expression (eQTL)
Short Introduction to the algorithms employed in MultiPoint
for multilocus map ordering
1_11
Reduction of multilocus ordering to the Traveling Salesman Problem (TSP)
Order 1: a b c d e f g h k l m n l1
Order 2: b a c d e f g h k l m n l2
Order 3: a c b d e f g h k l m n l3
………
Order N: f c m h e a g n k l b d lN
For n = 60 markers, N = 60!/2 ≈ 4.2 × 10^81 orders.
The problem
How to choose the best (true) order, i.e., the
one that gives the map of minimal length?
A B C D E F G H …
a b c d e f g h …
No efficient exact algorithm is known for TSP (the problem is NP-hard). For practical situations, various heuristic methods have been proposed, e.g., Evolutionary Strategy optimization (for more details see: Mester et al. 2003a,b, 2004, 2005).
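The size of the search space and the "map length" criterion can be illustrated with a minimal Python sketch. It is not MultiPoint code: the function names, the toy marker positions, and the use of plain additive distances are assumptions made only for illustration. Exhaustive search is shown merely to make the criterion concrete; it is feasible only for a handful of markers.

```python
import math
from itertools import permutations

def count_orders(n):
    """Number of distinct orders of n markers: n!/2, because an order
    and its mirror image describe the same map."""
    return math.factorial(n) // 2

def map_length(order, dist):
    """Sum of the distances between adjacent markers in the given order."""
    return sum(dist[a][b] for a, b in zip(order, order[1:]))

def best_order_brute_force(markers, dist):
    """Exhaustive search for the order of minimal map length."""
    return min(permutations(markers), key=lambda order: map_length(order, dist))

# Hypothetical toy data: four markers located at 0, 10, 25 and 40 cM.
pos = {"a": 0, "b": 10, "c": 25, "d": 40}
dist = {x: {y: abs(pos[x] - pos[y]) for y in pos} for x in pos}

best = best_order_brute_force(list(pos), dist)
print(count_orders(4), best, map_length(best, dist))
print(count_orders(60))  # an 82-digit number
```

Already for 60 markers the number of orders has 82 digits, which is why heuristic optimization is needed.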
The GES algorithm as a memory-based simulation analogue of evolutionary adaptation models (Mester and Braysy 2005)
Natural elements | Simulated elements
Chromosome | Variable value x_i
Individual, a set of chromosomes | Solution vector x = (x_1, …, x_n)
Mutation, change of the chromosome by a small value | Operator M: x_k → x_{k+1}
Population, set of individuals | Set P of solution vectors {x_k}
Fitness, quantitative characteristic of organism’s performance | Optimization criterion value f(x_k)
Selection, choosing the fittest individual(s) for next generations | Operator S: f(x_k) → min
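The mutation and selection operators in the table can be illustrated by a very simple evolution-strategy loop. The sketch below is not the GES algorithm used in MultiPoint; it is a plain hill-climbing variant in which the mutation (reversal of a random segment), the number of generations, and the toy data are all assumptions chosen for illustration.

```python
import random

def map_length(order, dist):
    """Optimization criterion f(x): total length of the map for this order."""
    return sum(dist[a][b] for a, b in zip(order, order[1:]))

def mutate(order, rng):
    """Operator M (assumed form): reverse a randomly chosen segment."""
    i, j = sorted(rng.sample(range(len(order) + 1), 2))
    return order[:i] + order[i:j][::-1] + order[j:]

def evolve(order, dist, generations=2000, seed=0):
    """Operator S: keep the mutant only if it does not lengthen the map."""
    rng = random.Random(seed)
    best = list(order)
    best_len = map_length(best, dist)
    for _ in range(generations):
        cand = mutate(best, rng)
        cand_len = map_length(cand, dist)
        if cand_len <= best_len:
            best, best_len = cand, cand_len
    return best, best_len

# Hypothetical toy data: eight markers spaced 5 cM apart, shuffled start order.
pos = {m: 5 * i for i, m in enumerate("abcdefgh")}
dist = {x: {y: abs(pos[x] - pos[y]) for y in pos} for x in pos}
start = list("hcafbdge")
order, length = evolve(start, dist)
print(order, length)
```

A loop of this kind can get trapped in local optima, which is one reason why more elaborate strategies and verification by re-sampling are used in practice.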
1_12
How to ensure high quality of the maps despite the complexity caused by:
♦ ½ n! possible orders, while we need the best order (a unique solution)
♦ sampling variation in rij, missing data, data errors
♦ negative interference
The best way to check (verify) the map is to show that the obtained solution does not depend on:
(a) sampling variation in the data, and (b) starting points
Re-sampling for quality control: by taking sub-samples from the initial data, one can build many repeated maps and test whether (and where) the marker ordering remains the same
BOOTSTRAP (with replacement) or JACKKNIFE (without replacement)
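The two re-sampling schemes, and one possible way of summarizing the stability of marker neighborhoods, can be sketched as follows. This is an illustrative sketch only: the function names, the 90% jackknife fraction, and the "neighbour support" statistic are assumptions and are not taken from MultiPoint.

```python
import random

def jackknife_sample(n_individuals, fraction=0.9, rng=random):
    """Indices of a sub-sample drawn WITHOUT replacement."""
    k = int(round(fraction * n_individuals))
    return sorted(rng.sample(range(n_individuals), k))

def bootstrap_sample(n_individuals, rng=random):
    """Indices of a sample of the original size drawn WITH replacement."""
    return sorted(rng.randrange(n_individuals) for _ in range(n_individuals))

def resample_scores(scores, idx):
    """Restrict the score list of every marker to the chosen individuals."""
    return {m: [s[i] for i in idx] for m, s in scores.items()}

def neighbour_support(reference, orders):
    """For each adjacent pair of the reference order, the share of
    re-sampled orders in which the two markers are still adjacent."""
    support = {}
    for a, b in zip(reference, reference[1:]):
        hits = sum(1 for o in orders if abs(o.index(a) - o.index(b)) == 1)
        support[(a, b)] = hits / len(orders)
    return support

rng = random.Random(1)
jk = jackknife_sample(100, rng=rng)
bs = bootstrap_sample(100, rng=rng)
support = neighbour_support(list("abcd"), [list("abcd"), list("acbd")])
print(len(jk), len(bs), support)
```

Pairs with low support point to unstable neighborhoods, i.e., to markers that are candidates for removal or correction.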
1_13
1_14
Example: Maize B73 × Mo17 (IBM) population (chr. 10)
(a) Initial ordering: unstable neighborhoods were detected by using jackknife re-sampling
(b) Resulting ordering: stabilized neighborhoods after removing the detected problematic markers
Detecting and removing (correcting) the markers/scores causing the trouble
Mester, D., Ronin Y.I., Minkov, D., Nevo, E. & Korol, A.B. 2003. Constructing large scale genetic maps using evolutionary strategy algorithm. Genetics 165: 2269-2282.
New high-throughput DNA technologies have resulted in a disproportion between the high number of markers scored for mapping populations and the relatively small population size. Correspondingly, the number of scored markers may exceed by orders of magnitude the number of markers that can practically be resolved by recombination for the given population size. Hence, only a minority of markers can be genuinely mapped. The question is how to choose the most informative markers for building such a reliable “skeleton” map. We believe that MultiPoint provides a solution to this difficult problem due to: (a) its powerful algorithms of discrete optimization for multilocus ordering; (b) a verification procedure (which is also impossible without fast and high-quality optimization); (c) an interactive algorithm of marker clustering in complicated situations caused by “quasi-linkage” (or “pseudo-linkage”), i.e., significant deviation of recombination rates between markers of non-homologous chromosomes from the expected 50%; and (d) an algorithm for removing excessive markers to increase the stability of multilocus ordering.
Two major problems should be solved in multilocus genetic mapping: markers that belong to non-homologous chromosomes should not be assigned to the same linkage group, whereas markers from the same chromosome should be placed on the genetic map in the same order as the corresponding fragments reside in the DNA molecule. With ~200-1000 markers per chromosome, a sample size of ~100, and real deviations of the recombination rates between non-syntenic markers from 50%, the problem of clustering cannot be solved by an arbitrary choice of a certain (constant) threshold value of recombination or LOD, although this is exactly how this problem is treated in many multilocus mapping packages (Lander et al. 1987; Lincoln et al. 1983; Stam 1993). Indeed, in experiments with the foregoing characteristics, the recombination values between groups of markers from different chromosomes may be smaller than between adjacent markers within a chromosome. In the MultiPoint package this problem is treated as follows.
1_15
The main steps of multilocus ordering approach implemented
in MultiPoint software
The first step is calculating pairwise recombination fractions (rf) for all pairs of markers (using a maximum likelihood estimation procedure). Then, the number of clusters (linkage groups, LG) is evaluated and displayed as a function of the threshold (maximal) value rfs that allows a marker to be preliminarily assigned to a certain LG: namely, marker mi may belong to LGj if recombination between mi and at least one marker from LGj is lower than the threshold rfs. The user can obtain a prediction of the number of LGs for a series of threshold rf values defined by setting the min, step, and max values of rfs. Then, based on the obtained information, it is necessary to choose a sufficiently small value of rfs to exclude the possibility of getting markers from non-homologous chromosomes in one LG due to quasi-linkage. But because the chosen rfs is relatively small, you will get a large number of clusters (linkage groups) that considerably exceeds the real haploid number of chromosomes. Therefore, the next steps should be controlled merging of some of the clusters by relaxing the conditions on quasi-linkage (i.e., by increasing rfs). The specific feature of our approach is that building and ordering of the LGs are considered as interacting procedures, in order to reduce the danger of including non-syntenic loci in one LG due to the “quasi-linkage” (“pseudo-linkage”) phenomenon (see Korol et al., 1994, 2009; Peng et al., 2000; Sakamoto et al. 2000; Sivagnanasundaram et al. 2004; Ronin et al., 2010). Namely, if some markers of two LGs appear closer than the relaxed rfs, it is reasonable to permit merging only if the closest markers of the two candidate LGs are terminal, so that merging will be of the “end-to-end” type. If the closest markers reside in the interior part of one or both candidates, then merging should be forbidden.
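The end-to-end merging rule described above can be written down as a small check. This is a sketch under stated assumptions, not the program's implementation: linkage groups are represented as ordered lists of marker names, rf is a nested dictionary of pairwise recombination fractions, and all names and values are hypothetical.

```python
def closest_pair(lg1, lg2, rf):
    """Return (rf value, marker of lg1, marker of lg2) for the closest pair."""
    return min((rf[a][b], a, b) for a in lg1 for b in lg2)

def is_terminal(marker, lg):
    """True if the marker is the first or the last one of the ordered LG."""
    return marker in (lg[0], lg[-1])

def may_merge(lg1, lg2, rf, rfs):
    """Permit merging only if the closest markers are closer than rfs
    and both of them are terminal (end-to-end merging)."""
    r, a, b = closest_pair(lg1, lg2, rf)
    return r < rfs and is_terminal(a, lg1) and is_terminal(b, lg2)

def make_rf(markers, close):
    """Toy rf table: 0.5 everywhere except for the listed pairs."""
    rf = {a: {b: 0.5 for b in markers} for a in markers}
    for (a, b), r in close.items():
        rf[a][b] = rf[b][a] = r
    return rf

lg1, lg2 = ["m1", "m2", "m3"], ["m4", "m5", "m6"]
end_to_end = make_rf(lg1 + lg2, {("m3", "m4"): 0.2})
interior = make_rf(lg1 + lg2, {("m2", "m5"): 0.2})
print(may_merge(lg1, lg2, end_to_end, 0.25))  # closest markers are terminal
print(may_merge(lg1, lg2, interior, 0.25))    # closest markers are interior
```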
To employ the foregoing idea efficiently, we propose a repeated clustering approach that includes (see the scheme below): (i) ordering the LGs obtained with the chosen value of rfs; (ii) replacing groups of tightly linked (non-recombining) markers by their most informative “delegates” (bin markers) that will further comprise the skeleton map; (iii) verifying (evaluating the reliability of) the ordered LGs using the re-sampling procedure (bootstrap or jackknife); (iv) removing the markers causing unstable neighborhoods in the map; (v) relaxing the clustering conditions by increasing rfs, finding the candidate LGs that satisfy the end-to-end condition of merging, and merging such candidates.
1_16
The presented cycle can be repeated several times, until further merging causes the appearance of large gaps in the LGs. It is noteworthy that the procedure can be considerably simplified if anchor markers are available. However, anchors should be chosen and used with caution, because a relatively high level of errors is characteristic of some published maps. We should also make some introductory remarks here on our multilocus ordering procedure. As noted above, the number of scored markers may exceed by orders of magnitude the number of markers that can practically be resolved by recombination for the given population size. Thus, with a population size of n~100, the minimum resolvable distance between markers is about 1 cM; with 1000 markers, the map length of a chromosome would then have to be about 1000 cM, which is unrealistic in the vast majority of organisms. In other words, only a small portion of the markers (delegate markers) can be included in the skeleton map, with the remaining markers being attached to the delegates.
1_17
[Scheme: the repeated clustering cycle. First clustering with the chosen rfs; for each LG: ordering, choosing “delegates”, verification and removing the problematic markers; increasing rfs; candidate LGs for merging; merging the end-to-end pairs; the cycle is repeated until the end of clustering.]
1_18
[Scheme: analysis of a single LG. Ordering; bounding tightly linked markers; choosing “delegates”; verification and removing the problematic markers; attaching removed markers to the skeleton map.]
Besides close linkage combined with small sample size, the necessity of selecting representative markers for the skeleton map derives from the varying information content of markers (co-dominant versus dominant, missing data, distorted segregation, and scoring errors), linkage between repulsion-phase dominant markers, and negative interference (Peng et al. 2000; Esch & Weber 2002; Korol et al., 2009). Using the MultiPoint tools, you start from a linkage group with hundreds of markers and conduct several analytical steps (see the scheme): (a) multilocus ordering; (b) bounding together closely linked markers, followed by selecting “delegates” (bin markers) with the highest information content; (c) replacing the groups of tightly linked markers by their “delegate” markers; (d) repeated ordering and re-sampling verification of the reduced LG; (e) removing the markers causing unstable neighborhoods, and repeated ordering to get the skeleton map; (f) attaching the previously removed markers to their best intervals on the skeleton map. The most difficult problem is in step (e). MultiPoint allows conducting this step automatically, but the user may choose interactive analysis under their own control.
Input population file
To demonstrate the diverse functions of the system, we have prepared examples for different population structures: Backcross, F2, RIL_selfing. The majority of the examples are based on simulated data.
After you have entered the program, you get its main window and main menu.
To start working, choose the option <Open>; to finish, choose the option <Exit>.
The option <Clear saved cluster> will be described later.
The option <Open> includes a few possibilities. We begin with <Population file> and get the window <Input data>.
Format of input data: one row per marker, containing the marker name and the marker scores, separated by space, comma, or tab.
We should first choose the population type in the <Type of population data> window (very limited in the first version of the package).
In the right part of the <Input data> window we will see the genotypes characteristic of the chosen population type (F2 in the presented example).
By pressing the <Select data file> button, we can choose the data file for mapping analysis.
Input panel
For details about IRIL input see p. 3_7.
1_19
The input file should be in text format and have the extension *.txt (default) or *.chr. The system suggests opening or creating a folder for the data and mapping results. To create a new folder, choose the root folder (in our example it was Local Disk D) and press the buttons <Make New Folder> and <OK>. This will create a folder named <New Folder> that can be renamed by the user. Results of analysis of different data sets can be stored in one or in different folders. If such a folder already exists, the user can choose it and press <OK>. In the current example this is the folder MultiPoint_Results.
A new sub-folder named Project_<name of the treated data file> (e.g., Project_chw7_5Name) will then be created in this folder. If you want to include your project in an existing folder, you should choose this folder (marked in blue). It will include the user’s mapping data after their control/correction, and then the intermediate and final results (as described in the corresponding sections of the tutorial). At this stage, the program tests whether data with the same name have already been treated in the chosen folder. The system of storing the data and the treatment results will be described later (p. 1_23, 1_73). If data with such a name have already been treated, you will get the following message:
If you answer <Yes>, the old data are deleted, and if some treatment results were already stored, they will also be deleted. The answer <Yes> makes sense if you want to replace the old data by new data under the same name. If you answer <No>, the program reads the previously treated data and control is not needed.
1_20
During the initial data input, or the input of updated data under an old name (see p. 1_20), the program checks the correspondence of the codes to the population type. We consider the following codes as standard: for Backcross, Ril_Selfing, Ril_Sib-mating, and Double haploid: 1, 2 and 0 for missing data, or a, b (A, B) and «-» for missing data. For F2: 1, 2, 3, 4, 5 and 0, or a, b, h, d, c (A, B, H, D, C) and «-». In the case of standard codes, they will be displayed in the window (here the input file, F2.txt, is for an F2 population).
To input and check the data file, press the <Input Data> button.
If your codes differ from the standard ones, you will get a message with a request to fill in the code table. Even if your codes are standard, if the data include extra symbols, the latter will be considered as wrong codes and you will have to fill in the code table. The extra codes can be treated as missing values. For that, set the check button <Control of data codes> to state <Off>. It may happen that your codes are standard but have a different meaning (see the example).
If the treatment results for these data were already saved, you will get a corresponding message. If, nonetheless, you answer <Yes>, the old treatment results will be discarded. If the answer is <No>, the system will warn you that you cannot continue the analysis. You should leave the system and start again after changing the name of the data file or the folder name for your data.
1_21
Two types of errors can be detected in the file: errors in genotypes and errors in markers in general.
The first type includes the following:
1. The data include symbols that differ from the defined ones. You can correct these or automatically treat them as missing data by setting the button <Control of data codes> to state <Off>.
2. In some markers, the codes are inconsistent, e.g., in a row with 2 and 4, sometimes codes 1 and/or 3 appear.
The second type includes the following:
3. Different population size for different markers.
4. Fully identical markers with identical names and scores appear twice.
5. Markers with identical names but different scores appear.
Messages with detailed information on the number and types of the detected errors are provided. The errors can be fixed independently of the program or using the program as a tool.
In the first case, all errors are saved in a special file error.txt, in the same folder where the user has saved the data and results.
In the second case, the errors are presented in the form of data tables and can be corrected by the program. In fact, errors of types 2 and 3 are difficult to correct by the program: it is not clear which symbol should be inserted, or how to replace a marker value that does not correspond to the chosen coding. Errors of types 4 and 5 can easily be corrected by the program.
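For users who prefer to check a data file outside the program, the marker-level errors (types 3 to 5 above) are easy to detect with a short script. The sketch below is an independent illustration, not MultiPoint's own checker; it assumes that every score is a separate field delimited by spaces, commas, or tabs, and the marker names in the toy rows are hypothetical.

```python
import re
from collections import defaultdict

def parse_rows(lines):
    """Split each non-empty row into (marker name, list of scores)."""
    rows = []
    for line in lines:
        fields = [f for f in re.split(r"[,\s]+", line.strip()) if f]
        if fields:
            rows.append((fields[0], fields[1:]))
    return rows

def check_markers(rows):
    """Report errors of types 3, 4 and 5 found among the parsed rows."""
    errors = []
    sizes = {len(scores) for _, scores in rows}
    if len(sizes) > 1:
        errors.append(("type 3: different population sizes", sorted(sizes)))
    by_name = defaultdict(list)
    for name, scores in rows:
        by_name[name].append(scores)
    for name, versions in by_name.items():
        if len(versions) > 1:
            identical = all(v == versions[0] for v in versions)
            kind = ("type 4: identical duplicate" if identical
                    else "type 5: same name, different scores")
            errors.append((kind, name))
    return errors

lines = ["m1 1 2 1 2", "m2,1,2,2,2", "m1 1 2 1 2", "m3\t1\t2\t1", "m2 2 2 2 2"]
errors = check_markers(parse_rows(lines))
for e in errors:
    print(e)
```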
Control and correction of errors in the data file
1_22
As a result, in the folder chosen (or created) by the user, a sub-folder named Project + name of data file will be created, and within it a sub-folder <Data> that will carry the corrected data file and the file of data codes; for data with anchor markers, this sub-folder will also include the file of anchor markers (see next page). In the future, if you need a repeated data input, it is easy to do that from this sub-folder: the data will be displayed automatically and will already be corrected.
The errors of type 5 can be corrected using the program, but we should provide a new name to replace the repeated one. Thus, suppose we have two markers of an F2 population with identical names but different marker scores:
Using the <Change name> option, clear the name field and enter another name, e.g., Xgwm497b.
1_23
Input of anchor markers
You may have anchor markers in your mapping problem. To allow dealing with anchors, switch the button <Anchor Data Exist> of the <Input data> window to state <On>.
Then, during the input process, after pressing the button <Input Data>, the system will ask you to enter the name of the file with anchor markers.
During input of this file, the program checks the correspondence of its markers to the other markers of the population. In case of inconsistency, the user will get an error report. The file is copied to the sub-folder <Data> of the project. The name of this file, together with the data file name, is displayed on the panel of the main clustering window. We will also use this example to show how to deal with data containing anchor markers.
The structure of the anchor file is as follows: the name of the marker, the number of the chromosome of the marker, and the number defining the order of the anchor marker among the other anchors of this chromosome. If the position of the anchor is not defined, the second number will be -1. The elements of this file are separated from each other by space or tab. The file should be in text format and be named *.txt. Among our example files, one is F2_anchor.txt for F2, for which anchor markers are provided in the file anchor.txt (see below).
After input, the marker name is extended by its sequential number in the input file. Anchor markers are marked by an additional letter <A> on the left.
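The anchor file structure described above (marker name, chromosome number, order among anchors, with -1 for an undefined position) can be read with a few lines of Python. This is a hedged sketch for preparing or checking such files outside the program; the class and function names are hypothetical, and the example rows are invented for illustration.

```python
from typing import NamedTuple, Optional

class Anchor(NamedTuple):
    name: str
    chromosome: int
    position: Optional[int]  # None when the file gives -1 (position undefined)

def parse_anchor_line(line):
    """Parse one row: marker name, chromosome number, order among anchors."""
    name, chrom, pos = line.split()
    pos = int(pos)
    return Anchor(name, int(chrom), None if pos == -1 else pos)

def missing_anchors(anchors, marker_names):
    """Names of anchors that are absent from the population data."""
    return [a.name for a in anchors if a.name not in marker_names]

rows = ["mA 1 1", "mB\t1\t2", "mC 3 -1"]  # hypothetical anchor rows
anchors = [parse_anchor_line(r) for r in rows]
print(anchors)
print(missing_anchors(anchors, {"mA", "mB", "mX"}))
```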
1_24
Preliminary treatment
Analyzing markers and genotypes for missing level and segregation
1_25
The first window displays information about missing data and segregation distortion (χ²) of markers and about missing data of genotypes. Markers can be sorted by missing level or by segregation distortion (as in the example). To delete, select markers or genotypes and press the <Delete> button.
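The two quantities shown in this window can be reproduced for a single marker as follows. The sketch assumes backcross-type codes (1, 2, with 0 or «-» for missing data) and a Pearson χ² test against the expected 1:1 ratio; the exact statistic used by the program is not stated in this tutorial, so the formula here is an assumption, and the scores are invented.

```python
def missing_fraction(scores, missing=("0", "-")):
    """Share of individuals with a missing score for this marker."""
    return sum(s in missing for s in scores) / len(scores)

def chi_square(scores, expected_ratio):
    """Pearson chi-square of the observed genotype counts against
    expected_ratio (dict: code -> expected proportion); missing
    scores are ignored."""
    observed = {code: scores.count(code) for code in expected_ratio}
    n = sum(observed.values())
    return sum((observed[c] - n * p) ** 2 / (n * p)
               for c, p in expected_ratio.items())

# Hypothetical marker: 60 individuals scored 1, 40 scored 2, 25 missing.
scores = ["1"] * 60 + ["2"] * 40 + ["0"] * 25
miss = missing_fraction(scores)
chi2 = chi_square(scores, {"1": 0.5, "2": 0.5})
print(miss, chi2)
```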
1_26
We can go back one step in the deletion of markers or genotypes, according to the selected menu option. The menu option <Global Undo> allows returning to the initial data. After closing the window, you’ll get a question asking for confirmation of the deletion request. If the answer is <No>, the window will be retained and you can use the Undo option. If the answer is <Yes>, the data will be changed as requested by the deletion choice, and a question about saving the deleted markers will appear.
These markers may be moved to the Heap and be used in the future as attached markers, or saved in a special file deletedMarkers.txt in the project folder <Data>.
1_27
Warning: If you plan to input additional portions of data, then do not delete genotypes! Otherwise the population size for the second portion will not be equal to that of the already included data, and such a situation is considered an error.
The second window displays the markers sorted by “informativity” (the maximal value of LODs for linkage of the corresponding marker to all other markers in the data set). You can delete the markers with low informativity.
You can employ the Undo option for the last step or even start the analysis from the beginning by using <Global Undo>. After closing the window, the user will get the same questions as described on the previous page.
The first step is calculating pairwise recombination fractions (rf) for all pairs of markers (using a maximum likelihood estimation procedure). Then, the number of clusters (linkage groups, LG) is evaluated and displayed as a function of the threshold (maximal) value of rf that allows a marker to be preliminarily assigned to a certain LG: namely, marker mi may belong to LGj if recombination between mi and at least one marker from LGj is lower than the threshold rfs. The user can obtain a prediction of the number of LGs for a series of threshold rf values defined by setting the min, step, and max values of rfs. If anchor markers are available, the threshold rfs will be increased until a critical level of rf is reached at which anchors from different chromosomes are “ready” to merge.
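The dependence of the number of linkage groups on the threshold can be sketched as single-linkage clustering repeated over a series of rfs values. This is an illustration under simplifying assumptions, not the program's algorithm: rf is estimated as the share of individuals whose two scores differ (a simplification suitable only for backcross-type codes), anchors are ignored, and the marker scores are invented.

```python
def recombination_fraction(s1, s2, missing="0"):
    """Share of individuals whose two scores differ; pairs with a
    missing score are skipped (backcross-type codes assumed)."""
    pairs = [(a, b) for a, b in zip(s1, s2) if a != missing and b != missing]
    if not pairs:
        return 0.5
    return sum(a != b for a, b in pairs) / len(pairs)

def count_linkage_groups(names, rf, threshold):
    """Single linkage via union-find: a marker joins a group if its rf to
    at least one marker of the group is lower than the threshold."""
    parent = {m: m for m in names}

    def find(m):
        while parent[m] != m:
            parent[m] = parent[parent[m]]
            m = parent[m]
        return m

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if rf[a][b] < threshold:
                parent[find(a)] = find(b)
    return len({find(m) for m in names})

def scan_thresholds(names, rf, rf_min, rf_step, rf_max):
    """Number of linkage groups for rfs = min, min + step, ..., max."""
    result, k = [], 0
    while rf_min + k * rf_step <= rf_max + 1e-12:
        t = round(rf_min + k * rf_step, 10)
        result.append((t, count_linkage_groups(names, rf, t)))
        k += 1
    return result

# Hypothetical scores of four markers in ten individuals.
scores = {"m1": "1111122222", "m2": "1111122221",
          "m3": "1212121212", "m4": "1212121221"}
names = list(scores)
rf = {a: {b: recombination_fraction(scores[a], scores[b]) for b in names}
      for a in names}
table = scan_thresholds(names, rf, 0.05, 0.10, 0.35)
print(table)
```

As the threshold grows, the number of groups can only decrease, which is the pattern shown by the program when it lists the number of clusters per threshold.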
The system suggests conducting stepwise clustering to define a reasonable initial threshold value rfs. In the case of a very large marker set, this procedure takes a lot of time. Thus, the user may skip this step by answering “No” to the system’s question.
Defining the threshold recombination level
1_28
A corresponding message is displayed in this case, and the process is stopped. In fact, the last step performed is the one at which the anchor markers are not yet “ready” to fuse, whereas at the next step they would fuse if this were not forbidden (because they belong to different chromosomes). There might be situations when, already at the first step, LGs anchored by markers from different chromosomes tend to fuse. In such cases, a smaller initial value of threshold rf should be taken (and, possibly, a smaller step of changing rf values).
In the absence of the file of anchor markers (example in file F2.txt), the situation is different. But in both cases, using thresholds rf=0.2 and 0.25 we’ll get 35 and 15 clusters, respectively.
1_29
Using these histograms, we can choose a reasonable threshold value of rfs. We strongly recommend starting with a moderately low rfs value, to prevent fusion of linkage groups that may belong to non-homologous chromosomes displaying quasi-linkage (pseudo-linkage) (see Korol et al., 1994; Peng et al., 2000). With a large number of markers, it would be reasonable to choose an initial rfs such that the size of each cluster does not exceed 150-200 markers. Select the desired rfs in the left column of the list by clicking the left mouse button, and then press the button <Choosing of threshold>. In the clustering window you’ll get the results. We chose rfs = 0.25 and will show the window of clustering results for both our examples.
By double click, we can now choose any level of clustering from the left column of the <Result of clustering> list: it will be displayed as a histogram of the distribution of clusters by number of markers. We can get such histograms for different steps. Thus, for step=0.2 we have clusters of one, two, three, five, and eight markers (two clusters for each of the foregoing sizes), five clusters with 16 markers each, and one cluster for each of the remaining sizes. For step=0.25 we have six clusters with 50 markers, and one cluster for each other size.
1_30
The window for working with clusters
This is how the window looks when anchor markers are not available. The clusters (linkage groups) are denoted as LGi (ni), where i is the number of the cluster and ni is its size (number of markers). Note that the clusters are ordered by decreasing size.
1_31
This is how the window looks when anchor markers are available. In this case, the chromosome number defined by the anchor marker(s) is also indicated. In our example, two clusters had anchor markers that belong to chromosome #1, and two had anchors of chromosome #3. Some clusters have no anchors, and for the remaining clusters the anchors define one cluster per chromosome.
1_32
Some of the markers may be of special importance for the user (“priority markers”). Some priority markers can be marked by special symbols added to their names. These symbols can be defined in the window <Part of name to choose priority markers>. In the case of one combination of symbols, you can set these symbols directly, whereas in the case of several sets you should connect them by “&” (see the example below).
In addition, the user can denote priority markers when treating each linkage group (see p. 1_37). In the current version of the system, we employ the information on priority markers when dealing with the problem of tightly linked markers and choosing among these the so-called “bin” (or “skeleton”) markers. For that, we evaluate for each marker its missing (Miss) and segregation (Segr) levels and sort the markers according to the linear combination (A*Miss + B*Segr). Here the coefficients A and B (A+B = 1) are set equal by default, but the user can define other (unequal) weights by setting A (Missing) in the window <Coefficients of priority>. It is also necessary to set the <Minimum rf> value; markers that are closer than this value are considered as “fused”. By default, this value is set to 0.0.
In our example of a set with anchor markers (file F2_anchor), let us define one marker, r338, as a priority marker, and leave the other parameters of the <Setup for controlling bound together markers> unchanged.
Controlling “bound together” markers
1_33
The left panel of the window includes the name of the data file, population type, population size, and number of markers in the data file. For data with anchors, the name of the file with anchor markers is also provided. In the current version of the package, only the threshold recombination rate is employed as a criterion for clustering. If you have selected the function Defining the threshold recombination level, the threshold value is chosen at the stage of preliminary treatment (see p. 1_30) to get the first step of clustering, under relatively stringent conditions (resulting in a relatively large number of relatively small linkage groups). If this function has not been used at the previous stages of analysis, the default value of the threshold recombination rate is 0.05; hence you should define your own threshold value and press the button <Build Linkage Groups>.
To continue building the linkage groups, change the threshold value of rf and press the button <Build Linkage Groups>. Clearly, the higher the rf, the smaller the number of linkage groups. If you decrease rfs, the clustering starts from the beginning, whereas by increasing rfs you switch on the algorithm of repeated clustering.
Setup for clustering
1_34
Some additional service options
By using the option <Show> <Population data> or the corresponding toolbar button, one can get the information on the entire population: all its markers and genotypes. However, this option is practical only for small problems. Due to technical limitations, no more than 550 markers can be shown on the display.
1_35
By using the option <Show> <Population data> or the corresponding toolbar button, one can get the matrix of pairwise recombination fractions for all markers. However, this option is also practical only for small problems. Due to technical limitations, no more than 550 markers can be shown on the display.
1_36
Analysis and treatment of a separate linkage group
Defining groups of tightly linked (”bound together”) markers
We’ll take one of the clusters with anchor markers and demonstrate how to define and analyze groups of fusing markers. To choose a cluster, we double-click with the left mouse button on the cluster’s name or icon. In the example, we selected cluster LG12, chr 7.
In the list of its markers, symbol A denotes the anchors and symbol P the priority markers. To choose additional priority markers, we use the sub-option <Select of priority> of <Marker list options>.
Now we’ll choose the desired marker with the left mouse button, and then, by pressing the right button, will get the prompt:
We select the option <Create (or Undo) priority marker>. Correspondingly, the marker will become a priority marker or, conversely, its priority status will be cancelled. This procedure can be applied to several markers.
Choosing additional priority markers
1_37
Pressing the <Control of bound together markers> button results in the question shown below. If the answer is <Yes>, the system will define groups of markers with rf less than the <Minimum rf> set by the user for all clusters (see p. 1_32). The algorithm works as follows. For each marker of the cluster, the relative amount of missing values and the χ² score for segregation distortion are scaled by their maximum values within the cluster, and then their linear combination with the <Coefficients priority> is calculated (see p. 1_33). The markers of each class (anchor, priority, codominant, and dominant (for F2)) are sorted by increasing value of this linear combination. Note that anchor markers are considered to have a higher rank than priority markers, but the user can set the opposite.
All markers are then combined into one set. Each marker of the set, starting from the first marker of highest priority, “establishes” a group of markers whose recombination distances to it are less than the <Minimum rf>. Such a marker is referred to as the “delegate” marker of its group. Markers already included in groups established by higher-rank delegates are not considered for the groups of lower-rank delegates. If groups with delegates of equal rank share markers, each shared marker remains in the group whose delegate it is closer to. Thus, only delegate markers are retained in the cluster.
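The grouping procedure described above can be illustrated by a minimal Python sketch. It is not the code of MultiPoint: the names Marker, KIND_RANK and bound_together_groups, the equal default weights (standing in for <Coefficients priority>), and the dictionary layout of the rf values are all assumptions made for illustration. For simplicity, the sketch resolves shared markers purely by delegate order and omits the special rule for delegates of equal rank.

```python
from dataclasses import dataclass


@dataclass
class Marker:
    name: str
    kind: str       # "anchor", "priority", "codominant" or "dominant"
    missing: float  # relative amount of missing genotypes
    chi2: float     # chi-square score for segregation distortion


# Assumed class ranking: anchors first, then priority markers, and so on.
KIND_RANK = {"anchor": 0, "priority": 1, "codominant": 2, "dominant": 3}


def bound_together_groups(markers, rf, min_rf, w_missing=0.5, w_chi2=0.5):
    """Return {delegate name: [names of markers bound to it]}.

    rf maps a pair of marker names to their recombination fraction;
    each pair needs to be stored in one direction only.
    """
    # Scale both quality measures by their maximum within the cluster.
    max_missing = max(m.missing for m in markers) or 1.0
    max_chi2 = max(m.chi2 for m in markers) or 1.0

    def score(m):
        return w_missing * m.missing / max_missing + w_chi2 * m.chi2 / max_chi2

    def dist(a, b):
        return rf[(a, b)] if (a, b) in rf else rf[(b, a)]

    # Higher-rank classes come first; within a class, the best score first.
    ranked = sorted(markers, key=lambda m: (KIND_RANK[m.kind], score(m)))

    groups, assigned = {}, set()
    for m in ranked:
        if m.name in assigned:
            continue  # already bound to a delegate of higher rank
        assigned.add(m.name)
        members = [o.name for o in ranked
                   if o.name not in assigned and dist(m.name, o.name) < min_rf]
        assigned.update(members)
        groups[m.name] = members
    return groups
```

In this sketch the keys of the returned dictionary play the role of the delegates retained in the cluster, and the listed members are the markers that would be moved to the Heap set.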
1_38
For the possibility of using this function for all data immediately after data input
see p. 2_2.
The remaining markers are removed from the cluster to the “Heap” set. These markers do not participate in further map ordering. The system reports the number of groups of bound together markers and the number of markers retained in the cluster.
The names of delegates are marked by the symbol S (or AS for anchors and PS for priority markers). Such a marker can be chosen with the right mouse button, or with the function <Change delegate marker> of the <Marker list options> of the main menu. This displays a table of all markers associated with the chosen group, with their missing data, segregation, and distance to the delegate marker. The red rectangle indicates that the marker is dominant (dominant repulsion-phase markers are marked in blue and codominant markers in green).
You can cancel the established groups within the cluster by pressing the button <Control of bound together markers> and answering <Yes> to the question that appears.
1_39
We recommend treating each LG after the initial clustering and only then continuing the clustering procedure. In other words, to reach reliable results and reduce the danger of combining non-syntenic loci in one LG due to the “quasi-linkage” (or “pseudo-linkage”) phenomenon (see Korol et al., 1994; Peng et al., 2000), building and ordering of the LGs should be considered as interacting procedures.
To demonstrate the treatment of a separate cluster, we will use the example data file BC.txt, which allows us to deal with diverse situations. After the input and primary clustering (no marker or genotype deletion was conducted), let us choose a threshold rfs = 0.25. The following picture will be obtained:
Let us choose cluster LG9 for further illustrations.
Additional example
1_40
We employ the option of controlling bound together markers and find 3 groups of such markers. The “delegates” of these groups are marked by the symbol S. We can see that after removing the bound together markers (3 markers were removed) but retaining the delegates, the cluster includes 24 markers. These three groups can be analyzed as shown before.
Now we can move to the process of multilocus ordering.
1_41
The procedure of multilocus ordering
To start ordering, choose the menu option <Ordering> or use the corresponding toolbar button. The ordering algorithm is based on minimizing the total length of the multilocus map of the linkage group. The problem is solved on the initial data set and on re-sampled sets, in order to test the stability of the obtained order. The number of such sets is defined by the user in the parameter <Number of iteration > (10 by default). Re-sampling can be conducted using the Bootstrap or Jackknife approach (only the second is implemented in the current version of the package). The parameter <Population for Jackknife> defines the part of the total population (in %) sampled at each run (90% by default). The results of the first iteration define the ordering that is used as the “reference” against which all other iterations are compared. The parameter <Time to Es> defines the maximum time allowed for searching for the multilocus order in each iteration. By default, it is defined as a simple function of the number of markers in the cluster. All these parameters can be changed by the user.
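To make the idea of ordering with jackknife verification more concrete, here is a small Python sketch. It rests on several assumptions that are not taken from the package: a backcross population with genotypes coded 0/1, rf estimated as the share of individuals whose genotypes differ between two markers, and an exhaustive search over permutations that stands in for the time-limited optimizer (so it is usable only for a handful of markers). The function names are hypothetical, and the order found on the full data set is taken here as the reference.

```python
import itertools
import random


def rf_matrix(genotypes, individuals):
    """Pairwise rf for a backcross: share of the given individuals whose
    0/1 genotypes differ between two markers."""
    rf = {}
    for a, b in itertools.combinations(list(genotypes), 2):
        diff = sum(genotypes[a][i] != genotypes[b][i] for i in individuals)
        rf[a, b] = rf[b, a] = diff / len(individuals)
    return rf


def map_length(order, rf):
    """Total map length: sum of rf over adjacent markers of the order."""
    return sum(rf[a, b] for a, b in zip(order, order[1:]))


def best_order(names, rf):
    """Exhaustive minimization of map length (stand-in for the optimizer)."""
    best = min(itertools.permutations(names), key=lambda o: map_length(o, rf))
    # An order and its reverse describe the same map; keep one orientation.
    return min(best, best[::-1])


def jackknife_orders(genotypes, n_iter=10, fraction=0.9, seed=0):
    """Return the reference order and the orders found on jackknife samples."""
    rng = random.Random(seed)
    names = sorted(genotypes)
    n = len(next(iter(genotypes.values())))
    everyone = list(range(n))
    reference = best_order(names, rf_matrix(genotypes, everyone))
    runs = []
    for _ in range(n_iter):
        subset = rng.sample(everyone, round(fraction * n))  # e.g. 90%
        runs.append(best_order(names, rf_matrix(genotypes, subset)))
    return reference, runs
```

Here n_iter and fraction correspond loosely to <Number of iteration > and <Population for Jackknife>; comparing the orders in runs with reference shows how stable the map is under re-sampling.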
For data with anchor markers, if the order of anchors is known and indicated in
the input data, a special check box <Taking into account order of anchors>
will appear on the panel <Setup for ordering>. It will be in state <On> to take
into account the preset order of the anchors. If you change the state to <Off>,
the ordering will be conducted ignoring the preset order of anchors.
1_42
After ordering is finished, a grid table and a graphical display of the LG will appear in
the window. The table shows the effect of variation of the recombination estimates
caused by re-sampling on the local stability of the map. It includes also the
information on missing data and segregation ratios. The graph of the LG includes
cluster name, its length, and rf values for adjacent markers. If the rf exceeds the
threshold value, it will be highlighted in red. Likewise, the anchors, priority markers,
and “delegate” markers are indicated by special symbols.
1_43
Options of the table of ordered markers
Some simple service functions are available in this section to facilitate the analysis. For any chosen marker, the user can get a table of its rf values with the other markers of the ordered LG. Based on this information and/or the results of ordering displayed in the grid table, you may want to remove this marker (in fact, deletion can be conducted for a single marker or simultaneously for a set of markers). After the marker(s) are chosen with the left mouse button, you can do that using the menu of the table, or the prompt menu obtained by pressing the right mouse button.
Visualizing the distance table
In the considered option, the distance table is displayed for any one chosen marker. It can show the rf values between the chosen marker and all other markers (using option <All markers>). In this case, you may need to use the scroll bar, which may be time consuming if the number of markers in the LG is relatively high and you employ this option many times. Alternatively, you can display only the rf values to its nearest 8 markers on each side (using option <Nearest marker>).
1_44
For delegate markers there is an additional option: <Change delegate marker>. After choosing the delegate marker, you’ll get this option in addition to the previous ones. In the considered example, the chosen marker is the first one in the ordering; hence, for displaying its pairwise distances only the <Display all marker distance> option is possible.
After <Change delegate marker> is selected, you get a window with a list of all markers of the group “represented” by the delegate marker. By selecting any marker of this group with the left and then the right mouse button, you can obtain the table of its distances (rfs) to all other markers of the group, or replace the delegate. In the latter case, ordering of the LG is initiated to take into account the new marker participating in the multilocus map. Consequently, updated versions of the grid table of ordered markers and the LG graph are displayed.
Options for delegate marker
1_45
To delete markers from an LG you can use the menu function <Delete marker> or the first option of the prompt menu called by pressing the right mouse button. You can choose several markers by using the keyboard buttons <Ctrl> or <Shift>. After the selected markers are deleted, the system automatically moves to re-ordering of the LG, followed by output of a new grid table and LG graph. Simultaneously, the <UnDo> option becomes available.
Such an operation can be conducted several times. At the bottom of the window you’ll get a list of deleted markers numbered according to the order in which they were deleted from the LG.
Using the menu option <UnDo> you can recover the deleted markers. This can be done starting from some step. Namely, by choosing the number of a deleted group in the list, you can recover the markers of this group and those deleted after this group. Thus, pressing <UnDo> after the choice shown in the list below, we can return to the LG all markers starting from marker178(193) and till the end of the list.
Anchor marker(s) can also be deleted, but on such an attempt the system displays a warning message. As indicated earlier, all deleted markers are moved to a group referred to as <Heap>; they do not participate in further clustering (if needed) and ordering, and can appear in the map only as attached markers.
Deleting markers
1_46
After the introduction to the service tools that help in analyzing the clusters (linkage groups), we can describe the algorithm of analysis. The following steps and actions aim to utilize the available information for excluding from the map markers that cause (1) unstable neighborhoods and (2) unreasonable map extension. Clearly, when removing markers that cause map extension, we actually deal with double recombinants. Their appearance over small distances may be caused by both negative interference (e.g., Peng et al., 2000) and errors in marker scoring. We have not yet implemented the “cleaning” process, although some functions are already available. For example, automatic “cleaning” of the map from closely linked markers to get a stable skeleton map is conducted by pressing the button <Control of bound together markers>, which allows deleting markers with minimal ranking. The results are shown on p. 1_39. The verification process based on re-sampling procedures (jackknife or bootstrap) reveals unstable local neighborhoods, and hence potential candidate markers causing such instability. Crude approximate information about unstable neighborhoods can be obtained by using just 10-20 jackknife runs. A formal objective of cleaning is to get a map with minimal deviation of the left-side and right-side neighborhoods from the 1-1 double diagonal in the grid table (expected under perfect ordering). An ideal 1-1 pattern indicates that sampling variation among the jackknife runs does not affect the results of multilocus ordering. One may relax the stability requirement and, instead of an ideal ordering (1-1 along the “double diagonal”), be satisfied with probabilities ≥0.9.
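One possible reading of the “double diagonal” is that, for each marker, it gives the share of jackknife runs in which the marker keeps the same left and the same right neighbor as in the reference order. The Python sketch below computes such shares under that assumption; the function names are hypothetical and the actual grid table of the package may be computed differently.

```python
def orient(order, reference):
    """Flip a run so that it reads in the same direction as the reference."""
    order = list(order)
    if order.index(reference[0]) > order.index(reference[-1]):
        order.reverse()
    return order


def neighbour_stability(reference, runs):
    """For each marker return (left share, right share): the fraction of
    runs that keep its reference left and right neighbour."""
    oriented = [orient(r, reference) for r in runs]
    last = len(reference) - 1
    stats = {}
    for i, m in enumerate(reference):
        left = reference[i - 1] if i > 0 else None
        right = reference[i + 1] if i < last else None
        left_hits = right_hits = 0
        for o in oriented:
            j = o.index(m)
            left_hits += (o[j - 1] if j > 0 else None) == left
            right_hits += (o[j + 1] if j < len(o) - 1 else None) == right
        stats[m] = (left_hits / len(runs), right_hits / len(runs))
    return stats
```

Markers whose shares fall below the chosen level (for example 0.9) would then be the candidates to inspect for missing data and distorted segregation.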
We now describe the steps of the algorithm for cleaning up the LG from problematic markers. First, we should check whether the automatically chosen optimization time is sufficient for convergence. For that, we can repeat the ordering procedure several times with the same parameter <Time to Es>. If the same order is obtained each time, we can conclude that the chosen optimization time is sufficient, and we can start “cleaning”.
1_47
We’ll start by deleting markers that violate the monotonic increase of rfs (i.e., that deviate from the expected increase of rf between a marker and its subsequent neighbors). The algorithm detects such markers automatically. By pressing the button <Control of monotony>, you start the process of detecting and removing such markers. At the end of this process a message appears indicating how many markers were deleted.
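The monotony condition itself is easy to state in code. The Python sketch below only reports the places where rf to a farther neighbor is smaller than rf to a nearer one; how the package decides which marker to remove is not described here, so that step is deliberately left out. The function name, the window of three neighbors, and the tolerance parameter are assumptions.

```python
def monotony_violations(order, rf, window=3, tol=0.0):
    """Return triples (marker, nearer, farther) for which rf(marker, farther)
    is smaller than rf(marker, nearer) although 'farther' lies further away
    from 'marker' on the map. rf is indexed by both (a, b) and (b, a)."""
    violations = []
    for i, m in enumerate(order):
        for step in (1, -1):              # look down and then up the map
            prev = None
            for k in range(1, window + 1):
                j = i + step * k
                if not 0 <= j < len(order):
                    break
                cur = order[j]
                if prev is not None and rf[m, cur] + tol < rf[m, prev]:
                    violations.append((m, prev, cur))
                prev = cur
    return violations
```

An empty result means that, within the chosen window, rf grows with map distance for every marker of the order.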
In many cases, after one cycle of such cleaning the resulting ordering does not satisfy you (e.g., the probabilities on the diagonal are less than 0.9). You can continue cleaning (removing markers) as will be shown on the next page. Alternatively, you may cancel the results of automatic cleaning by using <Undo> and analyze the situation manually, step by step.
1_48
See also additions on p. 2_4–8
Among markers with strong deviation from 1 on the diagonal, we may choose the marker(s) with the highest level of missing data and the most distorted segregation. After deleting this marker we can see a clear improvement, manifested in increased values of probabilities along the double diagonal. The name of the deleted marker is shown in the window below the grid table. In case of an unsuccessful choice, you can cancel the deletion by using the <Undo> option.
After deletion, we recommend repeating the control of markers for deviation from monotony. For that, we should again press the button <Control of monotony>. In the considered example, one marker was deleted; it is displayed in the table of deleted markers. It can be marked there and returned by using <Undo> (and this is what we will do).
To close the table of ordered markers, we can press the corresponding button.
1_49
After closing the table of ordered markers, we
return to the single LG window that includes the
scheme of the ordered LG, list of its markers,
number of markers and number of deleted
markers. Information about
the number of markers moved
to the Heap set from current
LG is also provided together
with total number of markers
in Heap.
To close this window we can use the button
This brings us to the window where all the clusters
are presented. Ordered LGs are presented as
1_50
Further clustering and treatment of merged clusters
After each of the clusters with ≥ 3 markers has been treated, we return to the window “treatment of all clusters”. It makes sense now to increase the threshold value rfs. For our example, let us increase rfs from 0.25 to 0.27.
Now we proceed with a special algorithm that allows testing different pairs of clusters for the possibility of merging. Consider a pair of clusters Cm and Cn. All pairs of markers mki-mnj are tested. If the pair with minimal rf (mki,mnj) consists of markers distal in their clusters and this minimum is less than the relaxed threshold value of rf, the clusters will fuse, provided they do not include anchors from different chromosomes. Clearly, clusters with markers anchoring different chromosomes cannot be merged by definition. If at least one of these two closest markers is interior in its cluster, we analyze the “tentatively” combined cluster after its ordering. If after ordering rf (mki,mnj) is less than or equal to the relaxed threshold value of rf, the clusters will fuse. If rf (mki,mnj) is higher than 1.5 times the relaxed threshold value rfs, fusing is forbidden. And if rf (mki,mnj) is between these two values, the decision is left to the user (visual analysis).
Pressing the button <Build LinkageGroups> initiates the clustering process. If cluster merging depends on the user’s decision (i.e., if rfs < rf (mki,mnj) < 1.5 rfs), the following message will appear:
After pressing <OK>, a new window will appear (see next page) with the names of the two closest markers from the indicated two clusters (#5 and #6) that fit the condition rf (mki,mnj) < rfs.
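The merging rule for a pair of clusters can be summarized as a small decision function. The Python sketch below is only an illustration: the function name, its arguments, and the returned labels are hypothetical, and the outcome “keep separate” for distal markers whose minimal rf is not below the threshold is our assumption, since the text does not spell out that case.

```python
def merge_decision(min_rf, both_distal, rf_after_ordering, rfs,
                   anchors_on_different_chromosomes=False):
    """Decide what to do with a pair of clusters.

    min_rf            -- minimal rf between markers of the two clusters
    both_distal       -- True if both closest markers are distal in their clusters
    rf_after_ordering -- rf between the same two markers after ordering the
                         tentatively combined cluster (may be None if unused)
    rfs               -- relaxed threshold value of rf
    """
    if anchors_on_different_chromosomes:
        return "forbidden"
    if both_distal:
        return "merge" if min_rf < rfs else "keep separate"
    # At least one of the closest markers is interior in its cluster:
    # judge by the tentatively combined cluster after its ordering.
    if rf_after_ordering <= rfs:
        return "merge"
    if rf_after_ordering > 1.5 * rfs:
        return "forbidden"
    return "ask user"
```

With rfs = 0.27, for example, an interior pair whose rf after ordering is 0.30 falls between rfs and 1.5 rfs, so the decision is passed to the user.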
1_51
The distance between the closest markers also appears in the window. If we press the button <Display clusters>, the figures of three LGs will appear: the two old ordered groups and a newly ordered group obtained by merging the initial two. The markers that displayed the minimal rf before merging are highlighted in bold font in the three groups. Near the name of each LG we can see the sum of the recombination fractions taken over all its consecutive intervals.
We recommend refusing to merge the groups (by answering <No>) under either of two conditions: (1) if one (or both) of the bold markers is relatively far from the end of its LG (separated from the end by more than one marker), and (2) if the a posteriori distance between the two merged groups considerably exceeds the aforementioned distance between the bold markers. In the discussed example the reasonable answer is, of course, <No>. After the question on this pair of clusters is answered, the window is closed and the clusters are merged or not (depending on the answer).
1_52
Usually, clusters obtained from merging are ordered very easily. The described clustering process should be continued until the number of LGs coincides with the number of chromosomes, or until rfs has reached a certain user-defined maximum level (say, 0.30 or 0.35).
As a result, the following pattern of clustering
will then be obtained: We will see 8 clusters,
two LGs appeared with a changed (mosaic)
coloration, telling us that they resulted from
fusion of smaller clusters.
We now need to clean these 2 clusters. There is no need here for control of bound together markers (it was already conducted earlier). Let us open one of these two, e.g., LG8, and conduct its ordering accompanied by re-sampling analysis. In this example, two markers were deleted at the control of monotony step, and two more were deleted to achieve neighborhood stability (values of probabilities along the double diagonal).
For returning to the previous step of
clustering the <Undo> option can be
employed.
1_53
To get the LG map length in cM, we should choose <Metric length> in the LG title, choose the needed mapping function in the window that appears (e.g., <Haldane>), and press <OK>.
Then the map length of the LG and the marker distances will be shown in cM. In the table, the distances were shown as recombination fractions. To return to this presentation, we should again enter the selection window in the LG title and select the <Recombination> option.
When needed, the map can be printed and/or the information about the map can be saved as an Excel table. For that, we choose the option <Printing> and the needed options of the described window of map distance options. Note that in the printing regime, an additional option <Summary space> appears. It allows output of the map positions of the markers instead of the interval lengths. For more details about output see p. 1_74.
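For reference, the conversion from recombination fractions to cM can be written out explicitly. The Python sketch below uses the standard Haldane function, d = -50 ln(1 - 2r), and, for comparison, the standard Kosambi function, d = 25 ln((1 + 2r)/(1 - 2r)); the function names are ours, and the list of mapping functions actually offered by the package is not implied. The helper cumulative_positions illustrates the difference between interval lengths and map positions (as in the <Summary space> output).

```python
import math


def haldane_cm(r):
    """Haldane map distance in cM for a recombination fraction 0 <= r < 0.5."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return -50.0 * math.log(1.0 - 2.0 * r)


def kosambi_cm(r):
    """Kosambi map distance in cM for a recombination fraction 0 <= r < 0.5."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))


def cumulative_positions(interval_rfs, mapping=haldane_cm):
    """Turn rf values of consecutive intervals into marker positions in cM,
    with the first marker placed at 0."""
    positions = [0.0]
    for r in interval_rfs:
        positions.append(positions[-1] + mapping(r))
    return positions
```

For example, an interval with rf = 0.1 corresponds to about 11.2 cM under Haldane and about 10.1 cM under Kosambi.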
1_54
Representing the map of an LG
Note: A better-quality visualization of the constructed genetic map can be obtained with the publicly available software MapChart (Voorrips, 2002). https://www.wur.nl/en/show/Mapchart.htm
For population of F2 type, the marker types are denoted by colors: red and blue
for the two types of dominant markers and green for co-dominant markers.
If the function <Control of bound together markers> has not been applied, the map of the LG may include
markers with distance 0.0. Such markers are drawn in one line.
The form of the graph in some specific cases
1_55
During the treatment of the LG, some markers were removed from it to the Heap set, which does not participate in further clustering. The Heap set will also include new markers that may be added to the problem after the main ordering process is finished (see p. 1_88), as well as markers of small clusters removed to Heap (see p. 1_69). Markers from the Heap group can be added to the LG by using one of the sub-options of the menu option <Extending the linkage group>. There are two options for adding markers to the skeleton map: <insert marker(s)> and <attach marker(s)>.
Why are these functions important? Adding markers to the map makes sense despite the fact that these markers were previously removed from the map in order to prevent their disturbing effect on the quality of multilocus ordering. Indeed, in many cases, the user may want to know the positions of these markers (genes, ESTs, SNPs, etc.) relative to the skeleton markers.
After the first of the two sub-options (or the corresponding toolbar button) is chosen, we’ll get a list of markers from the Heap set. This list is prepared as follows: all markers from Heap are subdivided into groups according to their closeness to each cluster, and for the current LG its group is provided for the further adding steps.
This list displays the marker name, its missing data and segregation characteristics, the name of the closest marker, and the distance to it. It also shows the predicted length of the LG after this marker is inserted above or below the closest marker of the LG.
1_56 (In the last versions of MultiPoint, the form of the marker list is slightly modified, see p. 2_11)
Adding markers
From the aforementioned list we can select the desired marker with the left mouse button and then choose one of the four possibilities using the right mouse button or <Options for additional markers> from the main menu window.
In this example, we have chosen <marker117>. Please note that on the LG’s graph the marker locus closest to the selected marker is shown in bold font. If one of the first two menu options is chosen, the selected marker will be placed next to the marked one. If we choose the option <Insert up nearest marker>, the added marker will appear in the LG marker list and in the LG graph, marked by underlining. If one of the last two menu options is chosen, we should indicate with the mouse the marker from the list, e.g., as explained below. We choose <marker133> and put it above <marker135> by using the option <Insert up the marker chosen by user>.
Note that no additional ordering is conducted in this case: the marker is placed at the chosen position on the skeleton map and removed from the list of added markers.
As the result we will get:
1_57
The user can delete an earlier added marker or any other marker of the LG. This function is complementary to the function of adding markers and is activated only if the function <Extending the linkage group> <insert marker(s)> was chosen previously and the list of added markers is displayed on the screen. To delete a marker, select it in the marker list of the LG with the left mouse button and then use the option <Delete chosen marker> from the menu <Marker list options>. There is also another possibility: a right mouse button click on the selected marker opens the prompt menu, from which the delete option can be chosen. The selected marker will be moved to the Heap set without re-ordering the LG, but with updating of the LG graph, the list of its markers, and the list of added markers.
Note that for each marker of the LG it is always possible to get the list of its distances (recombination fractions) to all other markers of the LG.
Deleting markers
1_58
Attaching markers
This is the second possibility of the option <Extending the linkage group>: <attach marker(s)>, also available by pressing the corresponding toolbar button. It allows extending the LG, but not the skeleton map, by markers that are closely linked to markers of the skeleton map. This function may be useful at the final stages of analysis, when the skeleton maps for all LGs are already finished and many “excessive” markers remain in Heap. The user may be interested in placing these remaining markers relative to the skeleton markers. The window for this option looks exactly the same as in the previous option, and the list of attached markers is prepared in the same way as the list of added markers (although it looks a bit different).
In the list of markers to be attached we can see the markers’ characteristics: missing data and segregation. The system reminds the user to choose one of two possible methods of attachment: choosing either the best interval for each marker, or the markers that correspond to a user-selected interval (e.g., if it is a gap on the LG map). In the first case, the user selects the markers he/she needs. After <Options for attaching> of the main menu is selected (or the right mouse button is pressed), a question about the calculation method appears. Currently, only the <Interval-length method> is implemented.
1_59
In the complementary way of attaching markers to the skeleton map, the user chooses the interval for which he/she wants to find all suitable candidates from the list of added markers. The interval is marked in red. The algorithm of the currently available <Interval-length method> first searches for the “optimal” interval of each marker (as described above) and then selects the markers for which the marked interval was the solution (if at least one such marker was found).
The idea of this method is very simple and derives from the main criterion employed in this package for multilocus map ordering. Namely, for each chosen marker, the chosen interval corresponds to the minimum increase in the number of recombination events.
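The criterion can be sketched in Python as follows. As an assumption, the “increase” caused by placing marker m into the interval between skeleton markers a and b is measured here as rf(a, m) + rf(m, b) - rf(a, b); the package may compute it differently. The function names and the indexing of intervals (interval i lies between skeleton markers i and i + 1) are hypothetical.

```python
def best_interval(marker, skeleton, rf):
    """Index of the skeleton interval whose length grows least when the
    marker is placed into it. rf is indexed by both (a, b) and (b, a)."""
    def increase(i):
        a, b = skeleton[i], skeleton[i + 1]
        return rf[a, marker] + rf[marker, b] - rf[a, b]

    return min(range(len(skeleton) - 1), key=increase)


def markers_for_interval(candidates, skeleton, rf, interval):
    """Candidates whose best interval is the one selected by the user."""
    return [m for m in candidates
            if best_interval(m, skeleton, rf) == interval]
```

The first function corresponds to choosing the best interval for each marker, and the second to collecting the markers that fit a user-selected interval, such as a gap on the LG map.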
To indicate the intervals with attached
markers in the main list of markers of
the considered LG, the upper marker for
each such interval is marked by symbol
“G”; if such upper marker is simultaneously
a “delegate” of a group of tightly linked
markers then its symbol will be “SG”.
1_60 (In the new version the list of attached markers is provided in a slightly
modified form - see p. 2_11).
For a marker with the symbol G, we can use <Marker list options> of the main menu (or the prompt menu obtained by pressing the right mouse button) to select one of several options:
The first is to get the table of recombination distances from this marker to the other markers of the LG. The next two options display full information about the markers attached to this interval or to the LG, respectively. The last two options return to the Heap set either the group of attached markers for the current interval (the chosen marker is the first flank of this interval) or all attached markers of the current LG. These options are available if the menu option <Extending the linkage group> <attach marker(s)> is activated and, therefore, the list of markers that are candidates for attachment is shown on the screen (this list will change after these options are applied).
For a delegate marker with the symbol SG, an additional option is available that allows analyzing the delegate and replacing it by another marker (option <Change delegate marker> described on p. 1_39).
1_61
An important note: We have already mentioned that marker attachment to the skeleton map is considered one of the final stages of mapping. If, nevertheless, after the attachment the user is going to conduct ordering again, add markers, or divide the LG into sub-groups, a message will appear about the necessity to return all attached markers to Heap. If the clustering is continued, all the clusters will be checked for the presence of attached markers, and the corresponding message will appear:
If the answer is <Yes>, all attached markers are returned to Heap; with <No>, the chosen function will not be conducted.
For printing the map with attached markers, the option <Printing> on the map of the LG should be chosen, and then the needed options in the window that appears. In the figure we see the skeleton markers on the left side and the attached markers on the right side.
1_62
The user may encounter situations when, after clustering and ordering, the resulting LG includes one or more long intervals (gaps). It may be desirable to cut such an LG into sub-groups. This can be easily done by using the <Division of the linkage group> option or the corresponding toolbar button.
Before the division procedure can be started, the LG should be ordered and then the ordering table should be closed. In the list of markers you can mark a contiguous set of markers using the <Shift> key of the keyboard (if the division option is not activated, choosing several markers simultaneously is impossible). This simultaneous choice can be cancelled by selecting one arbitrary marker. Pressing the right mouse button opens the option of selecting and creating a new cluster. Correspondingly, the number of markers in the remaining LG will decrease, the marker list will be updated, and in the graph of the LG the selected markers will be re-drawn.
As a result, the LG will be dissected into a few parts. If the LG included attached markers before dissection, these attachments are removed to Heap before dissection, and this change is accompanied by a system message.
1_63
Division of LG into sub-groups (continued)
Due to these actions, just a few markers may remain in the list of markers, and several selected groups will be shown on the graph. If the window of this cluster is closed now, a special message will appear:
If the answer is <Yes>, new clusters will be created, whereas all remaining markers (i.e., those not included in any of the selected groups) will be moved to Heap and can later be used for adding or attaching. The names of the clusters (LGs) will be changed, and the new ones will be denoted by a special sign “NEW”. If the answer is <No>, all changes made to dissect the LG will be cancelled and we’ll see the old LGs.
After dissection, the generated clusters can be ordered and treated as any other cluster. If the clustering process is continued with a higher threshold value, these new clusters will again become candidates for merging, unless they are marked as clusters excluded from further clustering.
1_64
Additional functions of treating LGs (clusters)
An extended form of the clustering panel
We now return to the description of the clustering window. So far we have been dealing with this window in the form where the clusters are denoted by rectangular icons indicating the number of markers in each, and the form of the rectangle shows whether the cluster has already been ordered or was created by merging two smaller clusters. We can also get a more detailed description of all clusters, including their mutual “relationships”. For that, we should select the option <View> <Cluster details> from the main menu. This will result in the appearance of the following table:
In this table, for each cluster, the cluster closest to it is indicated together with the minimal distance between them. For an ordered cluster (LG) we will see its length, its maximum interval length (in cM, calculated with the Haldane mapping function), the number of its markers moved to Heap, and the number of attached markers. In the provided example, all clusters were ordered, and for LG8 and LG6 markers from Heap were attached to the LGs. We can open a separate cluster from this table by double-clicking on the LG name in the column of cluster names.
1_65
Saving the results
Careful mapping analysis may be a time-consuming process, and some of its steps are relatively subjective. It
is therefore important to save intermediate results, so that you can go back and check the consequences
of decisions made earlier. We recommend saving the results before each new clustering step. Such
intermediate results are stored in the file “Save.job” in the sub-folder “IntermediateFiles” of the
current project [under name (Project+name of the input data file)]. The results are saved
each time the option <Save All Clusters> of the main menu of the clustering window is chosen.
The first saved result is stored in this file under the name S1. Its first line includes the main
parameters at the current stage: number of clusters, time of recording, existence of
markers in the Heap set, etc. The record also includes the names of the markers of each
cluster and the cluster's characteristics (e.g., whether the cluster is ordered or not). Subsequent
saves are named sequentially: S2, S3, ….
To select one of the saved results, use the option <Open> <Old linkage groups> of
the main menu (see next page). If the user selects the result recorded at the last step,
numbering of the next results continues as expected. But if one of the previous results
is selected instead of the last one, the derived results are numbered in a
different manner. Suppose S3 was selected, although later results were also recorded
(S4, S5, S6). Then the results derived from S3 will be saved, if desired, as S3_4, S3_5,
etc. This allows flexibility in decision making in complicated situations and comparison of
the results obtained at different stages of the study.
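The numbering rule described above can be sketched as follows. This is only an illustration of the naming scheme (S1, S2, … and S3_4, S3_5, … after returning to S3); the class and method names are hypothetical, and the behaviour for deeper branching is our assumption, not a description of how MultiPoint stores its records.

```python
def _split(name):
    """Split a save name into (branch prefix, index): 'S3' -> ('S', 3), 'S3_4' -> ('S3_', 4)."""
    head, sep, tail = name.rpartition("_")
    if sep:
        return head + "_", int(tail)
    return "S", int(name[1:])

class SaveNamer:
    """Hypothetical helper that reproduces the naming of saved results."""

    def __init__(self):
        self.saved = []      # all names, in the order they were saved
        self.prefix = "S"    # prefix of the current line of results
        self.counter = 0     # index of the last result on that line

    def save(self):
        self.counter += 1
        name = f"{self.prefix}{self.counter}"
        self.saved.append(name)
        return name

    def open_saved(self, name):
        if name not in self.saved:
            raise KeyError(name)
        prefix, idx = _split(name)
        # Does a later result already exist on the same line?
        has_later = any(p == prefix and i > idx for p, i in map(_split, self.saved))
        # If so, start a derived line named after the opened result.
        self.prefix = name + "_" if has_later else prefix
        self.counter = idx

namer = SaveNamer()
for _ in range(6):
    namer.save()               # S1 ... S6
namer.open_saved("S3")
print(namer.save(), namer.save())
```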
1_66
The saved results are opened with the input option <Open> <Old linkage groups> of the main menu.
When this option is chosen, we get the
content of the folder from which the program
was started. In particular, we see all its
sub-folders with the names of our projects.
We select the required project folder (in
our example, it is named Project_BC) and
press the button <OK>. The table of saved
results will be displayed.
In this table, alongside the conventional names of the steps with saved results, additional information is provided,
including the time of saving, the <Threshold rf> at the last clustering step, and the number of clusters. The table also
shows the presence (marked by the sign V) or absence (marked by -) of unordered clusters, of the Heap set,
of arrays of bound together markers <Delegates>, and of arrays of markers attached to some
of the clusters <Attached markers>. To select the step of interest, double-click the
name of the step in the first column of the table with the left mouse button.
1_67
In a long-term analysis with many steps and frequent saving of the results, the file <Save.job> may
become very large. You can clean it up by removing some of the old results with the option <Clear saved
clusters> of the main menu. It can be called either during the current analysis (e.g., just before a new
save) or at the beginning of a new round of analysis, i.e. before opening the selected saved result. After
selecting this option, a window appears in which you can choose one or several names of earlier saved results
and press the button <Delete Selected Saving>.
After confirmation of this decision, the chosen saved steps are deleted without affecting the names of the
remaining saved steps. If this function is applied at the beginning of a session, you can choose one of the
remaining saved steps for further analysis immediately after the deletion.
1_68
Possible operations with clusters
Consider first the option <Exclude (Undo exclude) from clustering>. This
option is used when you need to exclude one or several clusters from further
clustering. First, choose a cluster with the left mouse button
(assisted by the key <ctrl> if several clusters should be selected), and then
select this option. This changes the icon of the
cluster: it is marked by a red frame and the sign “exc”. During further
clustering, such clusters will not be merged with any other, even if the distance
between them is smaller than the threshold value. If we select these
clusters again and apply the same option once more, the clusters are returned to
the previous state (i.e., repeated application here means Undo).
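The effect of exclusion on a clustering step can be shown with a small sketch. It assumes, for illustration only, that clusters are merged whenever the minimal rf between them is below the threshold (single-linkage merging); the function name and data layout are hypothetical and do not describe MultiPoint's internal procedure.

```python
def merge_clusters(n, dist, threshold, excluded=()):
    """Merge clusters 0..n-1 whose minimal distance is below the threshold.

    dist maps a pair (i, j) to the minimal rf between clusters i and j.
    Clusters listed in `excluded` are never merged with any other cluster.
    """
    parent = list(range(n))

    def find(x):
        # Follow parent links to the representative, shortening the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    skip = set(excluded)
    for (i, j), d in dist.items():
        if i in skip or j in skip:
            continue              # excluded clusters stay as they are
        if d < threshold:
            parent[find(i)] = find(j)

    groups = {}
    for c in range(n):
        groups.setdefault(find(c), []).append(c)
    return sorted(groups.values())

dist = {(0, 1): 0.10, (1, 2): 0.12, (2, 3): 0.40}
print(merge_clusters(4, dist, threshold=0.15))
print(merge_clusters(4, dist, threshold=0.15, excluded=[1]))
```

In the second call, cluster 1 is excluded, so clusters 0 and 2 also remain separate because their only link ran through cluster 1.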
Option <Moving (Undo moving) to Heap> moves the markers of the selected clusters to “Heap”. In this case,
these clusters are excluded from the list of clusters, and their markers (as any marker of the Heap set) are excluded
from clustering. This option may be helpful for isolated markers or small clusters (with 2 markers) that have not
been fused with others during the clustering steps. As before, these clusters should be chosen with the left mouse
button; then the indicated option can be applied. For this operation <Undo> is not possible. When this operation
is initiated, the option <Save all clusters> is performed automatically, with a corresponding system report.
Option <Edit> of the main menu provides some options for treating clusters. Note that
clusters can be chosen for the treatments described below from cluster lists in either form,
icons or table. Recall that the form of the cluster list can be changed
(option <View> of the main menu). For the <Merging two clusters> operation, the table form
is more useful, as it displays the distances between clusters.
1_69
The user may force some clusters to merge even if the distance between them
exceeds the threshold. This option may be important when
the user knows that these clusters belong to the
same chromosome even though this is not reflected in the anchor
marker information. After choosing two clusters with the left mouse
button (assisted by the key <ctrl>), we select
the menu option <Merging two clusters>.
This results in the message shown below, and
the two clusters are replaced by a combined
one. Note that this combined cluster is not ordered.
The initial situation can be restored with the
<Undo> option or the corresponding toolbar
button.
1_70 See also addition on p. 2_3.
Searching a marker between the clusters
While conducting mapping analysis, the user may need information about some marker: is it in the
Heap set or in some cluster, among the skeleton markers of a linkage group or among the attached markers? Such
information is easily obtained with the <Find marker’s location> option of the main menu of the clustering
window or the corresponding toolbar button.
As a result, we get a new window with a sorted list of
the names of all markers. For each marker, its LG or Heap
and its status are shown. The bottom field shows the
shared initial part of the markers’ names (in our
example it is the word “marker”).
To find the information about a marker with a known
name, type its name below the list. As successive
letters of the marker’s name are typed
in the bottom field, the list is automatically
“positioned”, so that the marker can be easily chosen
from the list with the left mouse button.
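The behaviour of this window can be imitated in a few lines. The sketch below is only an illustration with made-up marker names; it uses the Python standard library to find the shared initial part of the names and the position in the sorted list that corresponds to the letters typed so far.

```python
import bisect
import os

def shared_prefix(names):
    """Longest initial part common to all marker names."""
    return os.path.commonprefix(list(names))

def position_in_list(sorted_names, typed):
    """Index of the first name in the sorted list that is not smaller than the typed text."""
    return bisect.bisect_left(sorted_names, typed)

names = sorted(["marker12", "marker3", "marker45", "marker7"])
print(shared_prefix(names))
i = position_in_list(names, "marker4")
print(names[i])
```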
1_71
After the marker is selected,
we can press the button <Find
the group containing chosen
marker>. This gives us a list
of all markers of the LG
containing the selected marker,
or the marker to which our marker
is attached, or the marker that is the
delegate of its group of bound
together markers. The name of
the LG containing the chosen
marker (in this example it is
LG6) is provided (it may also
be “Heap”).
1_72
The option <Final result> can be employed only when
all clusters have already been ordered.
Its window <Parameters for printing>
is identical to the window of
the <Print> option. Yet, this option
provides additional possibilities for
flexible control of the output information,
listed in the second window <Parameters
for final result>. The user can output the
results for each chromosome in a separate
file or get one file with all chromosomes. The
output may include only marker names and
their chromosomal positions, or names
and genotype calls.
The obtained output files can be used for
visualization of the genetic maps, e.g. with the
publicly available software MapChart
(Voorrips, 2002).
https://www.wur.nl/en/show/Mapchart.htm
Option <Final result>
Depending on the user’s requests, the output will include two or three files for all LGs, or separate files
for each of the LGs. The file named Sk contains skeleton markers only; the file named Sk&Ex contains the
skeleton and bound together (twin) markers; the file named Glob contains all markers.
1_73
Saving LGs as text files
Output options
Voorrips, R.E., 2002. MapChart: Software for the graphical presentation of linkage maps and QTLs.
Journal of Heredity 93 (1): 77-78.
Printing linkage group map
Two printing options are available in the system: printing the LG map and
printing the graphical genotypes for the same LG. In both cases, you
should choose one of the LGs for printing and one of the options of the menu
<Print (output to EXCEL) results>. Consider
printing the LG map (the first item of the menu).
As in the description of printing options
on p. 1_54, we get a panel for
defining the method of re-calculating
recombination fractions to cM. After
choosing the desired parameters and
pressing the button <OK>, we get the
graph of the chosen chromosome.
The size of the figure can be changed by moving its frame. On top of the
figure we can see a menu and toolbar buttons. The buttons
allow changing the Zoom; getting a Preview of the figure,
e.g., to see whether it fits the page; printing the figure;
transforming the picture into a table; and
copying the picture to a file opened in advance in Excel, Word, or any
other format that allows inserting a picture.
1_74
The menu items partially overlap with the toolbar functions. The
menu option <Edit> allows copying, and the option <View>
allows changing the Zoom. By choosing the menu option <Edit> <Options> or
by pressing the right mouse button anywhere on the picture, we get a
special panel for editing. Let us consider its parts.
With the parameters <TOTAL SIZE> we can change the size of the
picture exactly as by changing the frame of the picture. The parameters
<POSITION> change the position of the picture on the page.
The parameters <Width> : <Chromosome> define the width of the
“chromosome column” in the middle of the map graph, and <Slope line> the
length of the lines connecting the column and the markers. By pressing the
button <CHANGE FONT>, we obtain a special panel to set the font
parameters. Note that changing the font may change the overall
view of the picture on the page. After changing any parameter, we should
press the button <Apply> to redraw the picture in accordance with the
changed parameters.
It is also possible to return
to the default parameters
by pressing the button
<Restore default>.
1_75
The part <PRIORITY> of the editing panel lets the user define the priorities in choosing
the font size. By default, <Fit to page> is chosen, which means that the figure should fit
the page, even at the expense of a small font size. If the size of the figure is changed, the
font size is changed correspondingly. In this case, the radio button <Total Size>
automatically switches to <On>.
If we want to increase the font size, the radio button <Font
size> should be set to <On>. Then, by pressing the
<CHANGE FONT> button, we can select the desired font.
It may happen that a part of the figure will not fit
the page.
We may need to place our figure on several pages. For
that, we should change the options on the panel <MAP>.
By default, the value <Single map> is chosen on this panel,
thus we see only one page. If we choose one of the values
<Multiple equivalent> or <Multiple Hierarchic>, the
figure is placed on several pages, but in different
forms. This is illustrated by the example.
1_76
By choosing the variant <Multiple equivalent> we see only a part of the figure, marked on
the top by the letter “A”. To see the other parts of the figure we should employ the menu options
<View> <Next page> and <View> <Prev.page>, or the toolbar buttons.
With such a choice the figure
is divided into two parts that
can fit the page size. The
figure can be divided
into several parts, along its
length or its width.
1_77
By choosing the <Multiple Hierarchic> variant, we see a figure marked on the top by the letter
“A” with an internal part marked by the letter “B”. There may be several such inclusions. The
transition to the next part occurs in the same way as shown before (p. 1_77).
1_78
If the picture is divided into two relatively narrow parts, they can be
placed on one page by using the option <Two columns>. The form of the
figure can be modified using the panels <MARGING> and <WIDTH>. The
second of these two panels was already described. The first one
affects the proportions of the columns for the width (<Horizontal>) or
length (<Vertical>) (if 4 figures were placed).
After changing any of the parameters,
the button <Apply> should be pressed,
whereas for returning the parameters to
the initial state the button <Restore
default> should be pressed.
For saving the picture in an EXCEL file or for
printing, it is necessary to employ the options
of the <File> menu. The option <Print Preview>
shows each page prepared for printing. The option
<Save as> outputs the information to an
EXCEL file with a user-defined name. In this file,
the information is saved in two forms
simultaneously: as a table and as one or several
pages of the figure. The option <Add to file>
allows adding information to the chosen file.
1_79
For some changes made to the last version see p. 2_12.
Note: You can also employ the obtained output files for getting better quality visualization of the constructed genetic
maps, by publicly available software MapChart (Voorrips, 2002). https://www.wur.nl/en/show/Mapchart.htm
The user may generate and print a graphical presentation of the mapping results in the form of a
“graphical genotype”. For each ordered LG, each genotype is shown by its alternating
segments, highlighted to indicate the grand-parental origin of each segment. For that, the option
<Print (output to EXCEL) results> <Graphical genotypes> is employed. To conduct
the analysis, the user should choose the mapping function for transforming the mapping
results into map positions in cM. This representation is saved directly to EXCEL.
1_80
Printing “graphical genotypes”
The window shown here allows choosing a different color for each
allelic content per locus. Setting the check box <Sorting> to
<On> orders the genotypes according to their similarity to the initial
parental lines (with respect to the allelic content of the LG under
consideration).
1_81
If needed, the “graphical genotype” presentation can be
provided in a more compact form, by groups of 10
genotypes each. This is done by setting the check box
<Merge Individual Numbers> to <On>.
On closing the window with the “graphical genotype”
output, the user is prompted to choose a name for the EXCEL file
in which to save this output.
1_82
To demonstrate some specific aspects of analyzing RIL populations, we will use simulated
data (files RIL_observ.txt and RIL_transf.txt). After the data are entered and the
preliminary analysis is conducted, the system asks the user to choose one of two possible ways of dealing
with recombination scores: (1) using the observed rf values in the RIL population (resulting
from the accumulation of recombination events during the few generations of RIL history), and
(2) using transformed rf values, to get a “per meiosis” equivalent.
We recommend using the first of these two options, because
it allows higher map resolution at the stage of multilocus
ordering. This suggestion is confirmed by our tests and
comparisons conducted on various simulations. Clearly,
even if the first option is selected, the final results should be
transformed to get “per meiosis” map distances (Haldane &
Waddington, 1931).
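For orientation, the classical Haldane & Waddington relations between the “per meiosis” recombination fraction r and the fraction R observed in fully inbred lines are R = 2r/(1+2r) for selfing and R = 4r/(1+6r) for sib mating. The sketch below simply inverts these two formulas; the function names are hypothetical, the relations assume complete inbreeding, and we do not claim that MultiPoint uses exactly these expressions (IRIL populations, in particular, need a different formula that is not shown here).

```python
def meiosis_to_ril_selfing(r):
    """Expected rf observed in selfing RILs for a per-meiosis rf r: R = 2r / (1 + 2r)."""
    return 2.0 * r / (1.0 + 2.0 * r)

def ril_selfing_to_meiosis(R):
    """Inverse relation for selfing RILs: r = R / (2 (1 - R))."""
    return R / (2.0 * (1.0 - R))

def meiosis_to_ril_sib(r):
    """Expected rf observed in sib-mating RILs: R = 4r / (1 + 6r)."""
    return 4.0 * r / (1.0 + 6.0 * r)

def ril_sib_to_meiosis(R):
    """Inverse relation for sib-mating RILs: r = R / (4 - 6R)."""
    return R / (4.0 - 6.0 * R)

# For short intervals the transformed value is close to half of the observed one (selfing).
for observed in (0.02, 0.10, 0.25):
    print(observed, round(ril_selfing_to_meiosis(observed), 4))
```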
To illustrate the two options, we have prepared two files with the
same data but saved under different names. Consequently, one
will be connected with the “observable” and the other with the “transformed”
option. Let us start with the first option and answer <Yes>. In this
case, the process of initial clustering is conducted as shown
before for the Backcross population. Threshold rf = 0.25 was chosen,
resulting in 17 clusters. Out of these, LG7 was chosen for further
treatment. We first use “bound together markers”, represent the
groups of non-recombining markers by their “delegate” markers,
and then conduct multilocus ordering.
1_83
Data analysis in special cases
RIL_Selfing, RIL_Sib_mating, and IRIL populations
The direct presentation of the rf values for the ordered LG7 gives an inflated map,
hence the need for transformation. This map is obtained by selecting “Observable”
in the options <Printing> or <Metric length>.
If “Transformed” is selected, the rf values for the
intervals of the same ordered LG7 decrease by
about a half compared to the “observed” values. It is
noteworthy that the usual practice of deleting double
recombinants for adjacent intervals, especially for
small intervals, is absolutely not acceptable for RIL
populations. Indeed, in RIL, "double recombinants"
are not necessarily the result of scoring errors or real
double recombination events. Instead, many of the
“double recombinants” more likely result from
recombination in adjacent intervals that occurred IN
DIFFERENT generations of meiosis in genotypes
that remained heterozygous for those regions (in F2,
F3, etc.).
1_84
Consider now the case “transformed”. We use the file RIL_transf.txt and choose the answer <No> to the system’s question.
In this case, the rf values between
the markers are smaller.
Thus, under the same threshold
rf value as in the previous
example, initial clustering would
give far fewer clusters. We
therefore select a lower threshold level, rf = 0.15, which results in 20 clusters. One of these
clusters, again LG7, coincides with LG7 from the previous example. The same operations
as before, i.e., control of bound together markers and multilocus ordering, result directly
in a linkage map identical to the one obtained with the “transformed” option in the previous
example. Thus, for relatively simple situations, there seems to be no difference
between: (a) ordering based on the rf values “observable” in RIL, followed by transformation
from the RIL to the “per meiosis” scale, and (b) direct ordering based on rf values transformed
from the RIL to the “per meiosis” scale. However, in more complicated situations the first approach gives more reliable
results (Ronin et al., unpublished results).
1_85
1_86
Import of ordered linkage groups
To input one or several earlier ordered LGs you can employ the menu option
<File> <Open> <Ordered linkage groups for analysis>. After choosing this
option you get a new window for data input. Select the population type
and press <Select data file for input>.
The name of the first selected file appears, and in the
column <State> this file is marked as “select”. To
input this file press the button <Input Data>, which results in
the appearance of the window <Initial data analysis>. By
closing this window we input the file, which is reflected
in the column <State> changing from “select” to “input”.
Repeat this process for all LGs. Note that we assume the
same type of mapping population for all LGs.
Press <End of Input> and select, as usual, the folder to save
the project.
1_87
We obtain a window with all
imported LGs. Note that
instead of the project name we
have here “Few ordered
clusters”. The value <Recomb
Rate threshold> is set to 0.35 (in
fact, it is not defined here and
makes sense only during
clustering).
Each cluster (LG) can be opened. You should pay attention to the state of the <Reserve old order> button: by
default it is <On> (see more details about this function on p. 1_88). As usual, we can use the menu option
<Save all clusters> to save all input information and read it in the future with the option <Open> <Old linkage
groups>. To input additional markers we can employ the option <Open> <Append additional markers>. This
function is described on p. 1_90, but in the current situation the window for input of additional markers slightly
differs from the one for the standard situation.
1_88
In the input window that appears after choosing the option <Open> <Append additional markers>,
you should pay attention to the parameter Maximum rf; by default it is 0.35. You can
change it before you press the button <Input Data>. When you change its value you should
press <OK>. This parameter controls whether the additional marker(s) can be appended to the
considered clusters. If the recombination rate of some marker with all markers of a cluster is
higher than this parameter, the marker cannot be added to that cluster.
The appended markers are, as usual, saved in Heap and can be added to the
closest clusters from the considered set of clusters with the menu option
<Extending the linkage group> <insert marker(s)> or <attach marker(s)>. For
more details see p. 1_58.
In the window for treating a single cluster, a special check box
<Reserve old order> appears on the panel <Setup for ordering>.
By default, it is in the state <On>. This means that when
ordering is attempted, the multilocus orders obtained in jackknife runs are
displayed around the diagonal pre-defined by the initial (“preserved”)
order. The degree of deviation from this diagonal displays
the map instability.
1_89
Adding new data to the data set
Consider a situation where, for a created project, the markers have already been clustered, the clusters ordered, and the
removed markers attached. What can you do if you now get a new portion of data for the same population? In
the previous version of MultiPoint we suggested inputting the new data into Heap and then attempting to attach the
new markers to each of the old clusters. A more reasonable approach is presented in the updated version of the
program. Namely, we suggest first testing the new markers for linkage to the old clusters, and after that
clustering the remaining markers into new independent clusters. Consider this approach on an example.
Stage 1
To input the additional data we employ the menu option <Open> <Append
additional markers> <Input the new markers>.
The new dataset is tested for
agreement of the population type and
size with the old data, and for distinction of
the names of the new markers from the
old names.
If there are no errors, the new markers are included in the file <DataaddData.txt> of
the project. You don’t need to remember the name of this file. You can move to the next
stage right away or after a while. In the latter case, after opening the project, you get a
reminder: “Markers were added to this project. For processing this data it is necessary
to use menu option <Edit> <Treatment of added markers> <array of candidates for
new clusters>”
1_90
Stage 2
After opening the project, you get a reminder that a new set of data has been added and should be processed.
To begin the treatment we choose the menu option <Edit> <Treatment of added markers> <array of candidates
for new clusters>
Markers saved earlier in the file <Data addData.txt> are divided
into two groups, while the file is erased. Markers that are closer
to any marker of any of the old clusters than the clustering
threshold are assigned to the first group and are saved in the
project folder in the file <attached.txt>. These markers will be
used at Stage 5.
The remaining markers are used to create a subproject with its own clusters, Heap, and bound together markers. The
situation is reflected in a message like the following one:
If the number of remaining markers is small, the user may decide to put them to Heap. Creation of the subproject, if
needed, is conducted at Stage 3, and in the meantime these markers are saved in the file <dataForNewProject.txt>.
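The division of the new markers into the two groups can be sketched as follows. This is a minimal illustration of the rule stated above, not the program's code; the function name, the data layout (a dictionary of rf values from each new marker to old markers), and the marker names are hypothetical.

```python
def split_new_markers(rf_to_old, threshold):
    """Divide new markers into candidates for attaching and candidates for new clusters.

    rf_to_old maps each new marker name to a dict {old marker name: rf}.
    A new marker goes to the first group if its smallest rf to any old
    marker is below the clustering threshold.
    """
    attached, remaining = [], []
    for name, rfs in rf_to_old.items():
        if rfs and min(rfs.values()) < threshold:
            attached.append(name)    # would go to <attached.txt>
        else:
            remaining.append(name)   # would go to <dataForNewProject.txt>
    return attached, remaining

rf_to_old = {
    "new1": {"m1": 0.05, "m2": 0.30},
    "new2": {"m1": 0.42, "m2": 0.38},
    "new3": {"m1": 0.31, "m2": 0.12},
}
print(split_new_markers(rf_to_old, threshold=0.15))
```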
1-91
Stage 3
Now the user can create a Subproject. The program reminds about this when the user opens the initial project: “It
is possible to create subproject, using for input file dataForNewProject.txt”. The subproject is created in the usual
way, using the file <dataForNewProject.txt> as the source of input data. As usual, the function <Bound together>
should be employed, followed by clustering (with the same threshold recombination value as used in the analysis of
the main project). Then we should order the markers and delete, if necessary, some markers destabilizing the
order. The resulting project is called Subproject and is placed in the same folder as the main project. The file with
the initial data is erased.
In this example, 7 clusters are created. Markers in clusters of size=1 can be
moved to Heap (using the menu <Edit> <Moving to Heap>).
1_92
Stage 4
At this stage, we join the Subproject with the main project. Namely, when we open the main project, the program
reminds us that we can move to the stage of merging the projects: It is possible to add clusters of the subproject to
clusters of the main project using menu option <Open> <Append additional markers> <Addition the new
clusters to the main project>.
The program suggests opening the saved Subproject and then merges
the clusters, Heap and other arrays of the two projects and saves all the
data and the total distance matrix of the two projects.
Clusters from the Subproject are given numbers that follow the numbers of the
main project. The resulting project should be saved, while the Subproject is erased.
1_93
Stage 5
At this stage we should return to the file <attached.txt>, created at
Stage 2. Markers saved in this file should be attached to clusters
of the extended project (resulting from merging the initial project
and the subproject). The minimal distance of each of the markers
from the attached.txt file to markers of the initial project is lower than
the threshold value. However, this does not exclude that some
markers from attached.txt may be closer to clusters originating
from the Subproject. After opening the new project resulting from
the merging, a reminder message appears: "It is possible to add special markers to the clusters using menu option
<Edit> <Treatment of added markers> <classification of the remaining new marker>”.
Then, the file <attached.txt> is erased. Clusters with appended new markers are shown
as non-ordered; thus they should be ordered.
References
Our algorithms are based on theoretical papers of the entire mapping community and on our own publications. A list
of our relevant publications was provided on page 8. Here we provide references to other papers cited in the
Tutorial.
Esch E., Weber W.E. 2002, Investigation of crossover interference in barley (Hordeum vulgare L.) using the
coefficient of coincidence. Theor Appl Genet 104:786–796.
Haldane J.B.S., Waddington C.H. 1931, Inbreeding and linkage. Genetics 16: 357-374.
Lander E.S., Green P., Abrahamson J., Barlow A., Daly M.J., Lincoln S.E., and Newberg L. 1987, Mapmaker:
an interactive computer package for constructing primary genetic linkage maps of experimental and
natural populations. Genomics 1: 174-181.
Lincoln, Stephen E., Mark J. Daly, and Eric S. Lander. 1993, Constructing Genetic Linkage Maps with
MAPMAKER/EXP Version 3.0: A Tutorial and Reference Manual. Whitehead Institute for Biomedical
Research Technical Report Third Edition (Beta Distribution 3B).
Sakamoto T., Danzmann R.G., Gharbi K., Howard P., Ozaki A., Khoo S.K., Woram R.A., Okamoto N.,
Ferguson M.M., Holm L.-E., Guyomard R., Hoyheim B. 2000, Genetics 155: 1331–1345.
Sivagnanasundaram S., Broman K.W., Liu M., Petronis A. 2004, Quasi-linkage: a confounding factor in
linkage analysis of complex diseases? Hum Genet 114: 588-593.
Stam P., 1993. Construction of integrated genetic linkage maps by means of a new computer package: JoinMap.
The Plant Journal 3: 739-744.
Yap I., Schneider D., Kleinberg J., Matthews D., Cartinhour S., McCouch S. (2003) A Graph-Theoretic approach
to comparing and integrating genetic, physical and sequence-based Maps. Genetics 165: 2235–2247.
Jackson B., Schnable P., Aluru S. 2007, Consensus genetic maps as median orders from inconsistent sources.
IEEE-ACM Transactions on Comp. Biol. and Bioinformatics 5: 161-171.
1_94
Table of Contents
Changes and additions to separate chapters of the previous version New window – Creation Global parameters
Introducing user defined name for each linkage group
New variant of the function Control of monotony
Some changes made to allow large-size data sets
Changes in function Extending the linkage group
Changes in function Print (output to EXCEL)
Analysis of F2 data with dominant and codominant markers Displaying clusters (linkage groups) with dominant and codominant markers
Treating a cluster with codominant and two types of dominant markers
Repeated clustering (under relaxed stringency)
Employing Consensus option
Extending the linkage group – insert
Extending the linkage group – attach
Output, final results
Treatment of F2 data with only dominant marker
Population F1 x F1 Data input
Instructions for Recoding
Preliminary treatment
Control of bound together markers
First Clustering
The general view of the obtained clusters
Treatment of each cluster
MultiPoint Tutorial
Part 2 - Basic
2_1
2_2
2_2 2_3 2_4 2_9
2_11 2_12 2_13 2_13 2_15 2_18 2_19 2_25
2_28 2_29 2_30 2_33 2_33 2_34 2_35 2_36 2_37 2_38 2_39 2_40
New window – Creation of Global parameters
After the initial analysis, when you close the window <Initial data analysis>, a new window <Creation global parameters> appears on the screen. It allows you to define the names (or parts of names) of priority markers and to create groups of bound together markers (markers with no recombinants between them). If you do not conduct this operation now, it can be conducted later, separately for each cluster. The parameters “Coefficients of priority” can be changed by the user (by default, the minimum rf is set to zero). Pressing the button <Bound> creates the groups and then reports the number of groups and the number of markers moved to Heap. For F2 data that include dominant markers, this function must be conducted in this window, i.e., for all markers rather than for each LG separately (see p. 1_39). For other types of data, this function can be employed either for all markers or for each cluster separately. By pressing the button <Display> you can see the resulting groups.
By pressing the button <First clustering> we get the window of clusters. After we close the window <Creation of global parameters>, the main window of clusters appears, where we should define the initial threshold value for clustering (see p. 1_28–30).
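The grouping of bound together markers can be pictured with a small Python sketch. It is only an illustration under stated assumptions, not MultiPoint's implementation: the function name `bind_markers`, the rule that the first marker of a group becomes its delegate, and the toy matrix are all hypothetical; only the idea of grouping markers whose pairwise rf does not exceed the minimum rf (zero by default) and moving the non-delegate members to Heap comes from the text.

```python
from typing import Dict, List, Sequence, Tuple


def bind_markers(
    rf: Sequence[Sequence[float]], min_rf: float = 0.0
) -> Tuple[Dict[int, List[int]], List[int]]:
    """Group markers whose rf to a delegate marker is <= min_rf.

    Assumption: the first not-yet-assigned marker becomes the delegate
    of its group; the other members are moved to Heap.
    """
    n = len(rf)
    assigned = [False] * n
    groups: Dict[int, List[int]] = {}
    for delegate in range(n):
        if assigned[delegate]:
            continue
        assigned[delegate] = True
        members = [
            m for m in range(delegate + 1, n)
            if not assigned[m] and rf[delegate][m] <= min_rf
        ]
        for m in members:
            assigned[m] = True
        if members:
            groups[delegate] = members
    heap = sorted(m for members in groups.values() for m in members)
    return groups, heap


# Toy matrix of pairwise recombination rates (hypothetical values):
# markers 0 and 2 show no recombinants between them.
rf = [
    [0.0, 0.1, 0.0, 0.2],
    [0.1, 0.0, 0.1, 0.1],
    [0.0, 0.1, 0.0, 0.2],
    [0.2, 0.1, 0.2, 0.0],
]
groups, heap = bind_markers(rf)
print("groups:", groups, "moved to Heap:", heap)
```

The number of groups is `len(groups)` and the number of markers moved to Heap is `len(heap)`, which corresponds to the information reported after pressing <Bound>.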
2_2
Changes and additions to separate chapters of the previous version
Introducing user defined name for each linkage group
This option is an addition to the chapter “Possible operation with clusters” (see p. 1_69). It is called from the menu <Edit User’s name of cluster>.
This call produces a message asking you to enter your own name for the chosen linkage group. The name should be typed into the window <User’s name of cluster> (bottom, left). This results in an extension of the cluster name.
In the further clustering steps (merging clusters while relaxing the threshold) the names are preserved. If two clusters with different names are merged, the name of the new cluster is a combination of the names of its component clusters.
2_3
New variant of the function Control of monotony
(For the first variant of this function see p. 1_48.) This function is based on the following reasoning. Take any marker on the chromosome. For a correctly ordered map, one would expect the distance (or recombination rate) from this marker to its adjacent neighbor, then to the next neighbor, etc., to grow monotonically. Deviation from monotony can be considered an indicator of the presence of problematic markers. Indeed, when unstable neighborhoods are revealed by jackknife-based re-sampling, one of the major sources of this instability is the markers violating monotony. Moreover, it appears that these markers are among the major contributors to “map expansion”. In such cases, some authors recommend checking for double recombinants and removing the corresponding data points. This suggestion is based on the assumption that the considered multilocus order is correct. But what if we are not sure about the order? Or what if the mapping population is RIL, so that “simultaneous” recombination events in adjacent intervals could have occurred as two single exchanges in different generations? Therefore, to avoid such an artificial correction, we suggest detecting and removing the markers that cause considerable deviation from the natural expectation of monotonic growth of recombination when moving from a chosen marker to its more and more distant neighbor markers (either leftward or rightward).
In the corresponding tests, for each marker $m_i$, the program sequentially calculates the ratios $R = r(m_i, m_{i+1})/r(m_i, m_{i+2})$, $R = r(m_i, m_{i+1})/r(m_i, m_{i+3})$, $R = r(m_i, m_{i+1})/r(m_i, m_{i+4})$, etc., and the same to the left of the marker $m_i$. This series extends until the recombination rate $r(m_i, m_{i+j})$ reaches an arbitrarily chosen level $\min(0.5;\ 1.5\,r_s)$, where $r_s$ (rfs) is the threshold recombination value introduced on p. 1_16. Clearly, due to the sampling nature of the recombination rates, one may get R=1 for the estimated rates of recombination even if R<1 holds for the true rates. Moreover, to be conservative, we may want to tolerate violations of monotony that do not exceed some threshold, i.e., not reject markers that give an R value only slightly exceeding 1. For that, the user may define a “degree of conservatism” by setting a threshold value R*, so that cases with R<R* are considered tolerable.
In the “hard” regime, the algorithm finds the marker with the highest violation of monotony (highest value of R>R*) and moves only this marker to Heap. The resulting set of markers is ordered again, without showing the order on the screen. Then, again, the worst marker is detected, moved to Heap, etc. (markers moved to Heap can later be returned as attached ones, without the privilege of affecting the multilocus order).
In addition to the described “hard” regime of automatic control of monotony, we also suggest a “soft” regime. In this case, the program can delete only one marker from the linkage group per step. Namely, we check which markers violate the condition R<R* on both sides, leftward and rightward. Out of these, we select the one with the highest product $R_{left} \cdot R_{right}$.
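One detection step of the “hard” regime can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the program's code: the function names `monotony_ratios` and `worst_marker`, the treatment of a ratio as a violation when R is at least R*, the inclusion of the last neighbor that reaches the cut-off level, and both toy matrices are hypothetical. The re-ordering of the remaining markers after each removal is not shown.

```python
from typing import List, Optional, Sequence, Tuple


def monotony_ratios(
    r: Sequence[Sequence[float]], i: int, step: int, r_s: float
) -> List[float]:
    """Ratios R = r(m_i, m_{i+-1}) / r(m_i, m_{i+-j}) for j = 2, 3, ...

    step is +1 (rightward) or -1 (leftward). The series stops once
    r(m_i, m_{i+-j}) reaches min(0.5, 1.5 * r_s).
    """
    n = len(r)
    limit = min(0.5, 1.5 * r_s)
    first = i + step
    if not 0 <= first < n:
        return []
    ratios = []
    j = first + step
    while 0 <= j < n:
        if r[i][j] > 0:
            ratios.append(r[i][first] / r[i][j])
        if r[i][j] >= limit:
            break  # assumption: the neighbor reaching the limit is the last one used
        j += step
    return ratios


def worst_marker(
    r: Sequence[Sequence[float]], r_s: float, r_star: float = 1.4
) -> Optional[Tuple[int, float]]:
    """Return (marker index, R) for the largest R >= r_star, or None."""
    worst: Optional[Tuple[int, float]] = None
    for i in range(len(r)):
        for step in (1, -1):
            for ratio in monotony_ratios(r, i, step, r_s):
                if ratio >= r_star and (worst is None or ratio > worst[1]):
                    worst = (i, ratio)
    return worst


# A perfectly additive toy map: no violations are expected.
positions = [0.0, 0.1, 0.2, 0.3]
clean = [[abs(a - b) for b in positions] for a in positions]

# A toy matrix in which the adjacent distance r(m_0, m_1) exceeds r(m_0, m_2).
noisy = [
    [0.0, 0.2, 0.1, 0.2],
    [0.2, 0.0, 0.2, 0.25],
    [0.1, 0.2, 0.0, 0.1],
    [0.2, 0.25, 0.1, 0.0],
]
print(worst_marker(clean, r_s=0.25))
print(worst_marker(noisy, r_s=0.25))
```

In a full “hard” run, the detected marker would be moved to Heap, the remaining markers re-ordered, and the detection repeated until no ratio reaches R*.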
2_4
In this example, we have got a
rather good result, with relatively
small deviations from the diagonal.
The result is accompanied by
appearance of the panel
<Sequence of operations> that
shows the details of the operation
conducted and the number of
removed markers.
After ordering a cluster, the user can apply the function Control of monotony by pressing the button <OK> of the corresponding box. Before that, we should define the parameters of this panel or use the default parameters.
The user can choose the “hard” or “soft” regime and can define the threshold value R*. It is difficult to give a good universal recommendation for such a choice, independent of the map density and population size, the quality of marker scoring, the legitimacy of mechanical deletion of double recombinants, the level of missing data, etc. Clearly, the general intention should be to achieve maximal map stability with minimal loss of markers. If R*=1, all markers with R≥1 will be deleted. By default, R*=1.4.
2_5
The user can try to improve the map by deleting markers that are presumably the troublemakers (causing the deviation from the diagonal), e.g., markers #316 and #312 in our example. These actions will be reflected in the panel <Sequence of operations> as well as in the <History window> if we call it.
If we find the results unsatisfactory, we should choose the step where we turned to the “wrong way” and return to one step before this turn. We mark this step and apply the <UnDo> option. This returns us to the preferred previous ordering step. In this way we can find interactively, by trial and error, the best parameters of the <Control of monotony> function. You may find it reasonable to start with the “Hard” regime, which allows you to detect a set of candidate bad markers. After you have got the list of the markers deleted using the “Hard” option, you may use it as a help. Namely, if after the “Hard” step you apply the “Soft” option, you will get a list of markers that are deleted by the first but not by the second operation. This list can help you in choosing individual markers for removal.
To see the names of the deleted markers, we should choose the menu option <DisplayHistoryMap>. A special window appears with a list of markers.
2_6
Now, if we choose a marker in a region of high map instability and press the right mouse button, we get an additional option <Help>. By choosing this option, we obtain a message with a list of markers recommended for removal. Actually, the recommendation is the name(s) of the marker(s) from this list that belong to the selected instability region. In the presented example, this may be marker #382, because of its higher missing level. Before you delete it, the message with the suggested candidates should be closed by pressing <OK>.
The result is indicated below. We can now proceed with other problematic neighborhoods/markers, or apply <Undo> from any previous step.
2_7
For better analysis of changes in the order quality after any operation that includes re-ordering of markers (i.e., ordering itself, deleting markers, or Control of monotony), the user is provided with an additional parameter. In the list of markers, the parameter <var>, the standard deviation of the neighbor markers of each marker, is displayed. In addition, <Glob.var>, the mean value of the parameter <var> across all markers, is displayed.
2_8
Some changes made to allow large-size data sets
We briefly describe here some changes made to allow working with large numbers of markers per chromosome, e.g., a few thousand. For a large number of input markers, the matrices of pair-wise recombination rates and LOD values are calculated just after the input, and this takes time (e.g., ~2 min for 4000 markers). During the first saving of the results (function Save all markers) these matrices are also saved, which also takes time, but in further applications of the Save all markers function there is no need to save these matrices again. Still, reading these matrices during each reading of the saved results slows down the process. We provide these details to explain why the analysis of big data sets is not as fast as what you have seen when working with small to moderate data sets. Note that the ordering function is also time consuming for large data sets.
The number of markers for which you can see the matrix of distances of all markers, or the marker scores, is limited to 550 (the limitation is caused by the grid tool). To allow working with much larger numbers of markers per chromosome, some changes were made to the grid function. In the provided example we are ordering a cluster with 779 markers. In addition to the previous one-dimensional scrolling (either up-down or left-right), we can now scroll along the diagonal, using the “diagonal scrolling” button. However, keeping in mind the large number of markers, we added one more tool to facilitate the interactive analysis of unstable neighborhoods.
2_9
For that, a new table, Dispersion, is generated. It displays an ordered list of a parameter that quantifies the instability of the neighborhood of each marker:
$\sigma_i^2 = 0.5 \sum_j p_{ij}\,(i-j)^2$,
where $p_{ij}$ is the proportion of jackknife runs in which markers i and j were adjacent neighbors. Obviously, markers with a stable local order will give $\sigma = 1$. The table displays $\sigma_i^2$ for all markers of the analyzed linkage group. Obviously, the user is interested in dealing first with the regions of the map with the highest values of this parameter, in order to detect and remove the markers with the highest disturbing effect on the map quality (i.e., to deal with regions represented by markers from the top of the Dispersion table). Selecting such a marker leads to re-centering of the grid table, so that the grid with its 550 lines/columns covers a part of the total linkage group (that may carry thousands of markers), centered around the chosen marker. After you delete this marker or any other marker from this neighborhood, the table remains in the same position relative to the entire linkage group but without the removed marker, while the list of variances is updated. Note that the deletion and re-calculation cycle takes some time.
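The Dispersion parameter can be illustrated with a short Python sketch. It assumes the formula reads as 0.5 times the sum over j of p_ij (i-j)^2, with i and j taken as marker positions in the current order; the function names and the toy jackknife proportions are hypothetical and serve only as an example.

```python
from typing import List, Sequence


def neighborhood_dispersion(p: Sequence[Sequence[float]]) -> List[float]:
    """sigma_i^2 = 0.5 * sum_j p[i][j] * (i - j)**2 for every marker i.

    p[i][j] is the proportion of jackknife runs in which the markers at
    positions i and j of the current order were adjacent neighbors.
    """
    n = len(p)
    return [
        0.5 * sum(p[i][j] * (i - j) ** 2 for j in range(n))
        for i in range(n)
    ]


def rank_by_dispersion(p: Sequence[Sequence[float]]) -> List[int]:
    """Marker positions sorted from the least to the most stable neighborhood."""
    disp = neighborhood_dispersion(p)
    return sorted(range(len(disp)), key=lambda i: disp[i], reverse=True)


# Stable order of 4 markers: each marker is always adjacent to its map neighbors.
stable = [[1.0 if abs(i - j) == 1 else 0.0 for j in range(4)] for i in range(4)]

# Toy case with 5 markers: in half of the runs markers 2 and 3 swap places.
unstable = [
    [0.0, 1.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 0.5, 0.5, 0.0],
    [0.0, 0.5, 0.0, 1.0, 0.5],
    [0.0, 0.5, 1.0, 0.0, 0.5],
    [0.0, 0.0, 0.5, 0.5, 0.0],
]
print(neighborhood_dispersion(stable))
print(neighborhood_dispersion(unstable))
print(rank_by_dispersion(unstable))
```

Under this literal reading of the formula, interior markers of a stable order give exactly 1, while the two end markers, having a single neighbor, give 0.5.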
2_10
Changes in function Extending the linkage group
In version 2.1, markers that were moved to Heap after application of <Bound together markers> are not displayed in the list of markers that can be inserted into or attached to the skeleton map. Indeed, their distance to their “delegate” markers is zero, thus there is no sense in adding them (see p. 1_56).
The list of candidate markers to attach or insert now looks as shown here. The letter “S” denotes the main (delegate) marker of a group of bound together markers. By choosing a marker denoted by “S” we obtain the option <Display all delegate markers> in the additional menu. It allows us to see all groups of bound together markers whose delegates are presented in this list.
Note that for F2 populations with dominant markers these options have been considerably changed in v.2.1 compared with v.1.2 (see p. 2_25).
2_11
Changes in function Print (output to EXCEL)
In v.2.1, all markers from groups of bound
together markers are shown in prints near
the corresponding delegate markers (i.e.,
those that displayed recombination). Names
of such markers are presented in brackets
(as [xxxxx]) with an indication of the
connection to the delegate marker of the
group. In the EXCEL table, the missing level
and segregation ratio are provided
(for more details see p. 1_74–79).
2_12
Analysis of F2 data with dominant and codominant markers
Displaying clusters (linkage groups) with dominant and codominant markers
Working with such data needs special consideration because estimates of recombination rates between repulsion-phase markers are biased downward (Mester et al. 2003). Therefore, we proposed to subdivide such a data set into two subsets, each carrying coupling-phase dominant markers (amplified on the DNA of only one of the two parents) and shared codominant markers. Then, the ordering of markers in the two subsets is conducted based on the consensus mapping principle: in the two sets the codominant markers should appear in the same order (Mester et al. 2003, 2005). This approach is implemented in the current version of MultiPoint. All steps of mapping are the same as usual, with two exceptions: (i) they are conducted after splitting the data into two subsets, and (ii) the procedure is based on synchronous ordering with the restriction that shared (i.e., codominant) markers should be in the same order.
The linkage groups resulting from clustering of the split data set are displayed in colors: the two alternative types of dominant markers are shown in red and blue, and codominant markers in green. As before, we will have clusters (linkage groups) with dominant markers of both types (red and blue), but the ordering of these clusters will be based on virtual splitting of the markers.
Note: Before initial clustering, the function <Control of bound together markers> must be employed.
2_13
If we now choose for ordering one of the clusters that includes only one type of dominant markers (e.g., with green+blue markers), its analysis will be exactly the same as before. But if the chosen cluster includes green+blue+red markers, it will be displayed differently: you will see two windows, each containing one type of dominant markers and the shared codominant markers. The cluster name represents the type of included dominant markers: <*_r> for red dominants and <*_b> for blue dominants. Codominant markers are denoted by the acronym <CD>, or <CDS> if the marker is a delegate of a group of bound together markers.
In the two windows corresponding to one linkage group we see its two variants (LG_r and LG_b), with green+red and green+blue markers, respectively. Each of these can be treated separately.
2_14
Treating a cluster with codominant and two types of dominant markers
Let us start with ordering the first (*_r) cluster. We can also delete some markers or employ the function <Control of monotony>, but note that it does not automatically delete codominant markers, even if some of them violate monotony. In the same manner we can analyze the second part of the cluster (*_b).
2_15
If we now close any of the windows of the cluster, we’ll get the following icon.
It symbolizes the fact that the cluster has not yet been ordered: the codominant markers of its two parts do not yet appear in identical order. So far, we have only detected and removed markers that strongly violate stability. After re-opening the cluster we will need to order both of its parts again. When the ordering is applied to a cluster carrying only one type of dominant markers, the result is marked as usual:
If necessary, a codominant marker can be deleted manually, e.g., if it causes local map instability. The corresponding information is presented in the table of marker characteristics (column var). If a codominant marker displays high local instability of its relative position and this situation cannot be improved by removing some of its neighbors, you may decide to remove this codominant marker. In the example below, marker Xgwm 181 (#218) is highly unstable in the red variant of the cluster (_r) and, to a lesser extent, in the blue variant (_b), and we may want to remove this marker.
[Figure panels: LG2_r, LG2_b]
2_16
After treatment of the two parts of a cluster, it may happen that their codominant markers appear in identical order. That was the case with cluster LG6. Such a situation is marked as follows:
However, more frequently, to reach such a result we need to apply the operation <Consensus>. For that, we should choose the corresponding option of the submenu for the corresponding part of the cluster. This approach will be described further (p. 2_19).
2_17
Repeated clustering (under relaxed stringency)
In the previous example, we started the analysis from a threshold recombination rate of 0.25. Let us relax it to 0.28. Then, any cluster with two types of dominant markers will be split into two parts. Therefore, two arrays of clusters will appear: the first with codominant markers and one type of dominant markers (r), the second with the same codominant markers and dominant markers of the other type. The re-clustering under relaxed conditions is conducted in two steps, according to the two types of dominant markers. The process of clustering is carried out as before: clusters that are closer to each other (by their ends) than the threshold are merged by default when the informativity score LOD>2.0. Also by default, merging is prevented if the total length of the resulting cluster is 1.1-fold longer than the sum of the lengths of the component clusters. In the remaining cases the decision is made by the user. Merged clusters are marked as shown below:
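The default merging rules described here can be summarized in a small Python sketch. It is a hedged reading of the text, not the program's logic: the function name `merge_decision`, the order in which the conditions are checked, the returned labels, and the assumption that the two lengths and the merged length are expressed in the same map units are all illustrative.

```python
def merge_decision(
    end_distance: float,
    lod: float,
    merged_length: float,
    length_a: float,
    length_b: float,
    threshold: float = 0.28,
    min_lod: float = 2.0,
    max_expansion: float = 1.1,
) -> str:
    """Classify a pair of clusters as 'separate', 'prevent', 'merge' or 'ask user'.

    end_distance: recombination rate between the nearest ends of the clusters.
    Assumption: the length check takes priority over the LOD check.
    """
    if end_distance >= threshold:
        return "separate"  # the ends are not closer than the threshold
    if merged_length > max_expansion * (length_a + length_b):
        return "prevent"  # the merged map expands too much
    if lod > min_lod:
        return "merge"  # merged by default
    return "ask user"  # remaining cases are left to the user


print(merge_decision(0.20, 3.1, 105.0, 50.0, 50.0))
print(merge_decision(0.20, 3.1, 115.0, 50.0, 50.0))
print(merge_decision(0.20, 1.5, 105.0, 50.0, 50.0))
print(merge_decision(0.30, 3.1, 105.0, 50.0, 50.0))
```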
2_18
Employing Consensus option
Our approach to building multilocus maps with dominant repulsion markers using F2 data is based on splitting the linkage groups into two sets, with the requirement that codominant markers should be in the same order. This condition will not necessarily hold if one orders each of the two sets independently. To ensure such a condition we suggest synchronous (consensus) ordering. The <Consensus> menu option should be employed. In the window <Consensus>, the number of codominant markers and the number of conflict markers (denoted by «C») are displayed in the table <Shared (codominant) markers>. Currently, our algorithm provides an exact solution for synchronous ordering of a pair of chromosomes with ≤8 shared markers. In such a case a corresponding message appears on the screen, and by pressing the corresponding button we start consensus ordering. For more details about consensus analysis see p. 3_2.
The criterion of consensus ordering is the minimum of the total length of the two chromosomes under the constraint that shared markers must be in the same order. In the window, the change of this criterion during the process of optimization is shown together with the time of the process. By the end of the process a window with the final results appears. If we now close the window <Consensus>, we will again see the two parts of our cluster, but the codominant markers will now appear in the same order, although the resulting maps may be slightly longer (“the cost of consensus ordering”).
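The criterion can be made concrete with a small Python sketch. It only evaluates given candidate orders and does not perform the discrete optimization itself; the function names, the assumption that both maps are oriented in the same direction, and the toy orders and lengths are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

# (red order, red map length, blue order, blue map length)
Candidate = Tuple[Sequence[str], float, Sequence[str], float]


def shared_in_order(order: Sequence[str], shared: set) -> List[str]:
    """Shared (codominant) markers in the order they appear on one map."""
    return [m for m in order if m in shared]


def consensus_criterion(
    order_r: Sequence[str], len_r: float, order_b: Sequence[str], len_b: float
) -> Optional[float]:
    """Total length of the two maps, or None if shared markers disagree in order."""
    shared = set(order_r) & set(order_b)
    if shared_in_order(order_r, shared) != shared_in_order(order_b, shared):
        return None
    return len_r + len_b


def best_consensus(candidates: Sequence[Candidate]) -> Optional[Candidate]:
    """Pick the feasible candidate pair with the minimal total length."""
    scored = [(consensus_criterion(*c), c) for c in candidates]
    feasible = [(total, c) for total, c in scored if total is not None]
    if not feasible:
        return None
    return min(feasible, key=lambda item: item[0])[1]


red = ["r1", "c1", "c2", "r2"]
candidates = [
    (red, 50.0, ["c2", "b1", "c1"], 40.0),  # shortest, but c1 and c2 conflict
    (red, 52.0, ["c1", "b1", "c2"], 43.0),  # feasible, total 95
    (red, 55.0, ["c1", "b1", "c2"], 45.0),  # feasible, total 100
]
print(best_consensus(candidates))
```

In this toy case the selected pair has a total length of 95, which is longer than the 90 of the independently best but conflicting pair; the difference is the cost of consensus ordering.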
By the end of the computation, a corresponding message appears.
2_19
In cases where the number of codominant markers is >8, we should delimit in both variants (red and blue) parts with ≤8 shared markers for consensus ordering of these parts. For each such round of ordering, the flanking regions of these parts remain unchanged. A corresponding message is displayed on the screen.
To select such parts we should press the button <Display chrom.>. A picture with the codominant markers of both parts of the linkage group will appear. There are two possibilities for reconciling the order of codominant markers between the two parts: to invert a sub-group of markers or to transpose a sub-group to another interval. For that, the options of the menu <Reorganization of the cluster> are available, e.g., <Invert the selected part>, which is suitable for the presented example.
2_20
Let us select the maximum interval including the mixed up (inverted) region by indicating in the list of markers the beginning and the end of this interval in the red variant of the map. For that, it would be reasonable to include the non-shared (i.e., red) markers flanking the inverted region. By pressing the button <OK> we conduct the operation, and the corresponding change in ordering appears on the screen: the marked interval is inverted, the length of the chromosome changes, and instead of the initial mix-up we see isolated conflicts. Before accepting this result, we should also check how the blue variant of the map reacts to such an operation. Using the <UnDo> option we go back and choose the blue chromosome for the test.
After the corresponding
inversion is done, we can
compare the results of inversion
of either the red or blue parts
and select the best, based on
the information on their length
change caused by inversion.
The selected change can be
saved by pressing button
<Save transformed order>.
2_21
Here is another example where option <Move the selected part> is more suitable. In this example we have 10
conflicting markers. Let us apply option <Move the selected part> to LG36_r. Then, we select the sub-group of
markers to be transposed and the interval of their new location (in this case we move the selected sub-group to the
upper end of the chromosome). Then, by pressing button <OK> we obtain the result.
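The two reorganization operations, inverting a selected part and moving a selected part to another interval, amount to simple manipulations of the marker order. A minimal Python sketch follows; the function names, the zero-based inclusive indices, and the toy marker names are illustrative assumptions and not MultiPoint internals. Recalculation of the map length after the change is not shown.

```python
from typing import List, Sequence


def invert_part(order: Sequence[str], start: int, end: int) -> List[str]:
    """Reverse the sub-group order[start..end] (inclusive) in place on the map."""
    order = list(order)
    return order[:start] + order[start:end + 1][::-1] + order[end + 1:]


def move_part(order: Sequence[str], start: int, end: int, new_index: int) -> List[str]:
    """Cut order[start..end] (inclusive) and re-insert it at new_index.

    new_index counts positions in the order that remains after the cut,
    so new_index=0 moves the sub-group to the upper end of the chromosome.
    """
    order = list(order)
    block = order[start:end + 1]
    rest = order[:start] + order[end + 1:]
    return rest[:new_index] + block + rest[new_index:]


markers = ["m1", "m2", "m3", "m4", "m5", "m6"]
print(invert_part(markers, 1, 3))
print(move_part(markers, 3, 4, 0))
```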
2_22
In this example, like in the previous one, we may apply the <UnDo> option and conduct the same operation with the second (blue) part of the linkage group. This will also result in the appearance of the table with information on changes in the map lengths.
To resolve the detected conflicts, we can use the two available options for Reorganization of the cluster sequentially. In the presented examples we have got one or two regions with conflicts. We can now conduct consensus ordering to resolve local conflicts of codominant orders in the red and blue versions of our map. To do that, we employ the option <Display conflict group>, select one of the conflict groups and press the <OK> button. After the window of the consensus process appears on the screen, we press the button <Start>. Due to the small number of markers in the conflict region of the chosen example, the optimization process will be very fast. Its end is marked by a message, and a window with the final result appears.
By closing the window of the optimization process, we will see in the table of shared markers that the number of conflicts has decreased.
Now, by pressing the button <Display chrom> again, we obtain the window displaying the codominant markers of both versions of the chromosome (red and blue). The one remaining conflict can be resolved by conducting consensus analysis for this region.
2_23
As a result, we obtain an identical order of codominant markers in both variants (red and blue) of the map, which can be seen when we close the window <Consensus>. It should be noted that the map of each consensus variant (red and blue) can be longer than the corresponding map obtained without the requirement of identical order for the codominant markers in both variants.
Final comments and reminders:
1. After opening a cluster with two types of dominant markers, each part (with codominant and coupling-phase dominant markers) should first be treated separately, using the <Ordering> option. Only after that should you move to the tools in the <Consensus> option.
2. Consensus analysis should be started only after the stepwise clustering (with gradually relaxed threshold recombination rate) has been finished.
3. We continue to improve our discrete optimization algorithm in order to increase the number of simultaneously treated markers in consensus analysis, thereby reducing the need to treat the conflicts in small regions with ≤8 conflicting markers.
2_24
Extending the linkage group – insert function
The function for adding markers to the skeleton map can be applied only to clusters that have already been treated using the Consensus menu, or to a cluster that from the beginning included only one type of dominant markers (i.e., in coupling phase). After opening a consensually treated cluster, we will see its two parts (red and blue) with codominant markers (green) in identical order. The function of extending the skeleton map by additional markers is applied separately for each of the two variants (red and blue). After calling this function, a question appears on the screen: which markers will be added first, dominant or codominant? The reason is that dominant markers can be added to the chosen part of the cluster, either red or blue, whereas codominant markers should be added to both parts under the requirement of consensus.
If we answer <No>, the list of relevant candidate markers for the chosen map variant (e.g., red) appears. It includes coupling-phase dominant markers from Heap. In addition, the list includes dominant markers from the second variant of the cluster that can also be considered as candidate markers. Using the insert function can be helpful in filling the gaps in the map or in extending the map by markers more distal than those of the skeleton map resulting from consensus treatment. In the example, the lists of additional markers for the red part of a cluster are shown. When a candidate marker is chosen for insertion, the marker on the map closest to this candidate is highlighted in bold font. To move from one variant (e.g., red) to the second (blue) we should use the button:
Red
Similarly, for the second part of the chromosome, we should also choose only dominant markers, because codominant markers are added to both parts simultaneously (see next page).
2_25
To facilitate adding codominant markers (answer <Yes>) from Heap to one or both parts of the linkage group, a special table appears. A switch also appears that allows inserting the marker into one part (red or blue) or into both.
In this table, for each marker to be added, the following information is provided: its nearest neighbor in both (Red and Blue) parts of the cluster, the distance to the nearest markers, and the change in the map length upon adding this marker upward or downward of the nearest marker. The user can employ this information, together with the already ordered maps, to make his/her decisions. For the variants of adding a marker to both (red and blue) maps, the system analyzes and marks symbolically the possible situations of insertion without violating the consensus order:
Can be inserted only downward of the nearest marker
Can be inserted only upward of the nearest marker
Cannot be inserted, otherwise consensus is violated
User’s decision: for one cluster upward and for the other downward insertion is better
User’s decision – no conflicts
2_26
As before, the inserted markers are underlined, both in the list of markers and in the figure representing the chromosome map. As before, the added markers can be removed (in such a case, removing a codominant marker from one part leads to its automatic removal from the second part as well).
When appending a codominant marker to both parts of the linkage group, the possibility of insertion (upward or downward of the closest marker, or prohibition) is defined by the corresponding symbol of the foregoing table. When a marker is inserted in one part only, the result depends on the user’s choice (e.g., consensus may be violated). Thus, this option should be employed only in specific cases, e.g., when a certain marker must be included in the map.
[Figure: left panel, a red dominant marker was inserted; right panel, a codominant marker was inserted; maps labeled Red and Blue]
2_27
Extending the linkage group – attach function
This function is also applicable only to clusters after Consensus treatment. However, for this function, the list of additional markers includes only those dominant markers from Heap that are in coupling with the dominant markers of the considered variant of the cluster. Therefore, for the cluster considered under the function <insert>, the lists of the attached markers for the two parts will have the following form:
It is worth recalling that the attached markers are not displayed in the list, whereas the markers to which they are attached are marked by the letter G. If we choose such a marker and click the right mouse button, we get an additional menu. Its options allow us to see all markers attached to the chosen marker or to the chosen interval.
[Figure: lists of attached markers for the Red and Blue parts]
2_28
Output, final results
In fact, the output of the results remains the same as in the previous version. However, the option <Print (output to EXCEL)> can be applied only to a cluster (linkage group) carrying either one type of dominant markers, or both types of dominant markers that are already in consensus order. In the latter case, the user should print separately two graphs for the maps corresponding to the two types of dominant markers. The choice of the variant is made by answering the question:
The option <Final results> is available only after finishing consensus analysis for all clusters with two types of dominant markers. For each such cluster two types of output file are generated.
2_29
Treatment of F2 data with only dominant markers
Working with such data is very similar to the approach described above. As a result of the initial clustering, we will get two types of clusters:
In this case, each cluster can be treated separately, e.g., as in backcross data. We can alternate ordering of the markers with stepwise clustering, gradually relaxing the threshold recombination rate and merging end-to-end clusters within each class of markers (red or blue). The result of such steps will be reflected in graphical form, as in the following picture:
An important question is how to “combine” the ordered clusters of each type into representatives of linkage groups, having in mind two complications: (a) we have no shared codominant markers here, and (b) there is an increased danger of false linkage between non-syntenic repulsion-phase markers (see Mester et al., 2003). We suggest a simple interactive tool based on analysis of distances between the clusters. Let us press the button <Display table of distances>.
2_30
A new window will appear with a table of distances between clusters with opposite linkage phases and the possible menu options.
2_31
Rows represent clusters with one type of markers (red) and columns those with the other type (blue). A small distance between two clusters is a basis for considering them parts of one chromosome. By analyzing the table of distances, the user can choose certain rows and columns and activate the function <Select LGs that belong to one chromosome>. As a result, the names of the chosen clusters disappear from the table, and a list of chromosomes created by this merging appears in a separate table.
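Reading the table of distances can be supported by listing the red/blue cluster pairs whose distance is small. The Python sketch below is only an aid to the idea: the function name `candidate_pairs`, the cut-off `max_distance` (the tutorial leaves the choice to the user and gives no numeric cut-off), and the toy table are hypothetical.

```python
from typing import List, Sequence, Tuple


def candidate_pairs(
    dist: Sequence[Sequence[float]],
    red_names: Sequence[str],
    blue_names: Sequence[str],
    max_distance: float,
) -> List[Tuple[float, str, str]]:
    """Red/blue cluster pairs with distance <= max_distance, closest first.

    dist[i][j] is the distance between red cluster i (row) and
    blue cluster j (column). max_distance is a hypothetical cut-off.
    """
    pairs = [
        (dist[i][j], red_names[i], blue_names[j])
        for i in range(len(red_names))
        for j in range(len(blue_names))
        if dist[i][j] <= max_distance
    ]
    return sorted(pairs)


red_names = ["LG1_r", "LG2_r"]
blue_names = ["LG1_b", "LG2_b"]
dist = [
    [0.05, 0.45],
    [0.48, 0.08],
]
for d, red, blue in candidate_pairs(dist, red_names, blue_names, max_distance=0.15):
    print(red, blue, d)
```

Such a list is only a starting point; as noted for repulsion-phase markers, a small distance may also reflect false linkage, so the final choice remains with the user.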
If the user suspects that the last choice was wrong, the option <UnDo> can be applied. The option <All chromosomes> erases the list of the created chromosomes and restores the initial table of distances. The option <Selected chromosome> removes the selected chromosome from the list and restores the corresponding clusters to the table.
Analysis of distances between the distal markers of the clusters may allow determination of the relative position and orientation of the clusters in the chromosome. For that, the option <Orientation of LGs> can be employed. The chromosomes ordered in such a way are marked by a special sign. It should be noted that the distances between distal markers of the clusters do not always provide sufficient information to allow unequivocal ordering.
After closing the table of distances and the list of chromosomes, we obtain the list of clusters combined in chromosomes.
2_32
Population F1 x F1
Data input
We use the designation F1xF1 for mapping populations obtained by crossing two heterozygous diploid individuals, although in the literature they are usually referred to as F1. Such crosses are common for outbred species. For a heterozygous locus, the progeny of such a cross may segregate for two to four alleles. Loci heterozygous in both parents and segregating for 4 or 3 alleles will be referred to as F1 (crosses A1A2 x A3A4, A1A2 x A1A3, and A1A2 x A2A3). The presence of 2 segregating alleles may represent a situation when both parents are heterozygous for the same two alleles, A1A2 x A1A2 (referred to as F2), or a situation when only one parent is heterozygous, A1A2 x AA (referred to as testcross or “backcross”).
Data on this population can be prepared in one of two formats: as a tab-delimited table or in the format of the JoinMap package. During input of population data you should set <Recoding data availability> to <on> and press the button <Select data file for recoding>. The program recoding.exe will input the data, test the file, output the detected errors, and create two files with mother and father alleles (moth.txt and fath.txt). The initial part of the names of these files coincides with the name of the initial data file: for example, dataBor-moth.txt and dataBor-fath.txt. These files will appear in the same folder where the initial data file was placed. Some details of the Recoding function are provided in the Instructions on the next page. For F2 markers, the scores of the markers are presented in both the moth.txt and fath.txt files; values of backcross markers appear in only one (the corresponding) file.
Creation of the two files is the first phase of input. Now you should set <Recoding data availability> to <off> and, after pressing the button <Select data file for recoding>, select either of the two created files and the folder where the solution should be saved. After input is done, the program calculates two matrices of recombination rates, for the female and male sides. Please note that this is a rather slow process, but we hope to expedite it in the future.
2_33
Instructions for Recoding.
If the input file extension is .xls, the desired worksheet is opened (assuming it contains data analogous to those in the
example "fam1test.xls"). Otherwise the user should select the format of the text file.
Two formats are currently available:
Tab-delimited table (columns - marker alleles): the same layout as for the Excel worksheet, but the file is tab-delimited text.
"JoinMap" program format (CP population type).
When the program has finished, it creates a file with the name "<input_file_name>_err.txt".
Messages in the "<input_file_name>_err.txt" file:
1. If the user has clicked "Cancel" at some stage of the input, the file contains the message "Cancel".
2. If the opened worksheet was empty, the message in the file is "Empty worksheet".
3. If some special format was selected and the file is not compatible with it, the message in the file is "Erroneous file format".
4. If the file was interpreted successfully, the file contains: the number of individuals; the number of informative
markers for inheritance from father and mother; notes about non-informative markers (list of names) and errors
in data (if any); and the user's selection of how to work with the data. For example:
"Genotypes: 237 Father: 244 markers Mother: 225 markers
Notes: Not informative for father genotype:NYU10 !
Not informative for mother genotype:
222C,210B,224B,208E,206B,205F,211E,109A,130A,44c,idh2,NYU19,NYU6,23a,NYU3,143C,twhh,142c,1c,NYU50,120C !
Mendelian errors found in YU22 !
Number of individuals in the file header is 237 and does not coincide with numbers for markers 211C (236),NYU3 (240)!
Incompatible dominance types in markers 214E!
User selection:
Markers having nonvalid number of individuals were excluded by the user!
Mendelian errors were replaced with missing values!
Marker 214E made codominant by the user."
If there are informative markers for the father, the data file "<input_file_name>_fath.txt" is created. Each line of the file
begins with the marker name followed by a space, then the marker data (1,2,3,4,5,0 – as usual) separated by
tabs. If there are informative markers for the mother, the data file "<input_file_name>_moth.txt" is created, with
the same format.
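The line layout described above (marker name, one space, then tab-separated scores) can be read with a few lines of Python. This is only a sketch under that stated layout: the function names are hypothetical, the scores in the example are made up, and real files may contain details not covered here.

```python
def parse_marker_line(line):
    """Split one line into (marker_name, list_of_integer_scores).

    Assumed layout: name, a single space, then scores separated by tabs.
    """
    name, _, rest = line.rstrip("\n").partition(" ")
    scores = [int(tok) for tok in rest.split("\t") if tok.strip() != ""]
    return name, scores


def parse_marker_text(text):
    """Parse the whole file content into a dict {marker_name: scores}."""
    markers = {}
    for line in text.splitlines():
        if line.strip():  # skip empty lines
            name, scores = parse_marker_line(line)
            markers[name] = scores
    return markers


# Made-up content in the assumed layout (marker names borrowed from the log above).
example = "NYU10 1\t2\t0\t1\nidh2 2\t2\t1\t0\n"
parsed = parse_marker_text(example)
print(parsed)
```

In practice the text would come from reading the _fath.txt or _moth.txt file; the dictionary form makes it easy to compare which markers appear in both files.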
2-34
Preliminary treatment
After input and building of the recombination matrices, the program displays a summary table with some characteristics
of each marker. It includes data on missing scores and Chi^2 for deviation of segregation ratios from the
expected ones for both the female and male sides (for backcross markers these data appear for only one of the sides).
For each marker, the table also includes its max LOD value (for the most significant linkage with any of the
remaining markers). The table can be sorted by the values of any column, which helps to select the
markers that you may want to delete (too many missing scores, high segregation distortion, or too loose linkage
with any of the remaining markers).
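The two per-marker statistics mentioned above can be illustrated as follows. This is a sketch, not the program's own code: it assumes that score 0 denotes a missing value, that scores 1 and 2 are the two genotype classes of a backcross-type marker, and that the expected segregation is 1:1. Other marker types need other expected ratios.

```python
def missing_fraction(scores, missing_code=0):
    """Fraction of individuals with a missing score (assumed code: 0)."""
    return sum(1 for s in scores if s == missing_code) / len(scores)


def chi2_one_to_one(scores, class_a=1, class_b=2):
    """Chi-square (1 d.f.) for deviation from an expected 1:1 segregation."""
    n_a = scores.count(class_a)
    n_b = scores.count(class_b)
    n = n_a + n_b
    if n == 0:
        return 0.0  # no informative scores at all
    expected = n / 2
    return ((n_a - expected) ** 2 + (n_b - expected) ** 2) / expected


# Made-up marker: 60 individuals of class 1, 40 of class 2, 10 missing.
scores = [1] * 60 + [2] * 40 + [0] * 10
print(missing_fraction(scores), chi2_one_to_one(scores))
```

A marker with a large missing fraction or a large Chi^2 would be a candidate for deletion in the summary table.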
2_35
You can mark such problematic markers and press button <Delete markers>. These markers will appear in the
bottom part of the table and the total number of markers will be updated. Two undo options are available here:
<Undo of last step> or <Global Undo>. After closing this window, we can move to the next step.
2_36
Control of bound together markers
This process was already described earlier, but its specific features for the
F1 x F1 population should be noted. Groups of bound together markers
are created separately for the male and female side data. A group is
registered if all its markers are of backcross type, or if it includes only one
marker of F2 or F1 x F1 type and the others are of backcross type. But if
the group includes several markers of F2 or F1 x F1 type, we retain in the
group those markers of this type that are bound together in both the moth.txt
and fath.txt files. As a result, a window will appear with information about the groups of
bound together markers and the number of markers moved to Heap.
Marker clustering is based on a single matrix of pair-wise recombination rates
common to the two sets, built by taking for each marker pair the minimum of the
male-side and female-side recombination values. To start the process, you
should press the button <First clustering>.
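The construction of the common matrix can be sketched as an element-wise minimum. The treatment of pairs that are available on only one side (represented here by None) is an assumption made for illustration; the tutorial does not state how such pairs are handled.

```python
def combine_min(female, male):
    """Element-wise minimum of two square recombination matrices.

    None marks a pair not available on that side (assumption of this
    sketch); the value from the other side is then used, and the result
    stays None if both sides are missing.
    """
    n = len(female)
    combined = [[None] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            f, m = female[i][j], male[i][j]
            if f is None:
                combined[i][j] = m
            elif m is None:
                combined[i][j] = f
            else:
                combined[i][j] = min(f, m)
    return combined


# Made-up 3x3 example.
female = [[0.0, 0.10, 0.30], [0.10, 0.0, None], [0.30, None, 0.0]]
male = [[0.0, 0.05, 0.35], [0.05, 0.0, 0.20], [0.35, 0.20, 0.0]]
print(combine_min(female, male))
```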
2_37
First Clustering
Clustering is conducted as with other populations. We
recommend starting with small threshold values.
In this example we started with a threshold of 0.1.
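The effect of the threshold can be illustrated with a simple single-linkage grouping: two markers end up in one cluster whenever they are connected by a chain of pairs whose recombination rate does not exceed the threshold. This grouping rule is an assumption of the sketch; the actual clustering procedure of the program may differ.

```python
def cluster_markers(names, rec, threshold):
    """Group markers linked by chains of pairs with rec <= threshold."""
    parent = list(range(len(names)))

    def find(i):
        # Find the representative of the group containing marker i.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if rec[i][j] <= threshold:
                parent[find(i)] = find(j)  # merge the two groups

    groups = {}
    for i, name in enumerate(names):
        groups.setdefault(find(i), []).append(name)
    return sorted(groups.values())


# Made-up recombination matrix for four markers.
names = ["m1", "m2", "m3", "m4"]
rec = [
    [0.00, 0.05, 0.40, 0.40],
    [0.05, 0.00, 0.30, 0.40],
    [0.40, 0.30, 0.00, 0.08],
    [0.40, 0.40, 0.08, 0.00],
]
print(cluster_markers(names, rec, 0.1))
print(cluster_markers(names, rec, 0.3))
```

With the small threshold the markers form two tight clusters; raising the threshold merges them, which mirrors the recommended stepwise increase of the threshold.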
2_38
The general view of the obtained clusters
At further steps of clustering, the parameter “LOD threshold” is used in addition to the parameter
“Recomb.Rate threshold”.
2_39
In the figure with the obtained clusters on the previous page, green marks clusters that include only shared markers (F2 or F1 x F1), red denotes mother alleles, and blue denotes father alleles.
By opening any of the clusters (using double click) we obtain a window with the two parts of the cluster, according to female (r) and male (b) alleles. Shared markers (F2 or F1) are denoted by Sh; the remaining markers are of backcross type. Each part, r and b, is ordered separately. Ordering combined with re-sampling (jackknife or bootstrap) is a relatively slow process, and we hope to expedite it in a future version.
Treatment of each cluster
As usual, during analysis a marker can be moved to
Heap. A unique (backcross) marker is then deleted from
its set, whereas a shared marker (F2 or F1 x F1) can be
deleted from both the male and female sets or from only one
of them. To make a proper decision, the user can check
the effect of the marker on the order stability in both the male
and female parts.
During the analysis of each male and
female part, it may be useful to take into
account the matrices of pairwise
recombination rates and LOD values. They
can be displayed on the screen by
using the corresponding menu options.
Note that the LOD matrix is calculated
during the ordering procedure and can
be displayed only after this operation.
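For orientation, the sketch below computes the standard two-point LOD score for a pair of backcross-type markers from the number of recombinant and informative individuals. It is the textbook formula, given here as an assumption about what such a matrix entry means; the program's own calculation may differ in details.

```python
import math


def two_point_lod(recombinants, informative):
    """Standard backcross LOD: log10 of L(r_hat) / L(r = 0.5).

    r_hat = recombinants / informative, capped at 0.5.
    """
    n, k = informative, recombinants
    r = min(k / n, 0.5)

    def xlog10(count, prob):
        # count * log10(prob), with the convention 0 * log10(0) = 0
        return 0.0 if count == 0 else count * math.log10(prob)

    return n * math.log10(2) + xlog10(k, r) + xlog10(n - k, 1 - r)


print(two_point_lod(10, 100))  # tight linkage gives a high LOD
print(two_point_lod(50, 100))  # independent assortment gives LOD 0
```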
After closing the analyzed cluster, we
obtain the window which shows all
clusters. A cluster with both the male and
female parts ordered is denoted as
shown below, with LG48 as an example.
If the shared markers in the two parts are
in the same order (consensus), the cluster
is displayed as shown for LG8.
After re-opening these clusters, we obtain the pictures of their parts, while for the cluster in
consensus a special message will also appear.
If the shared markers of the male and female parts of a treated cluster
are in a conflicting order, they should be re-ordered using the menu option
<Consensus>. It makes sense to conduct this step after the user has
achieved a reasonable size of the clusters during the stepwise increase
of the threshold recombination value. Thus, we demonstrate the
consensus analysis for one of the clusters assembled at threshold
r=0.25.
In the example, the cluster contains 15 markers of F2 or F1 x F1 type
shared by the two parts of the cluster. Each part is already ordered.
Thus, we use the menu option <Consensus> on either part of the screen
(left or right). The resulting window shows that we have here 15 shared
markers, four of them in conflicting order.
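One simple way to count markers in conflicting order between two maps is to find the largest subset of shared markers that appears in the same relative order in both maps; the remaining shared markers are counted as conflicting. This definition is an assumption of the sketch (the tutorial does not say how the program counts conflicts), and the marker names are made up.

```python
from bisect import bisect_left


def lis_length(seq):
    """Length of the longest strictly increasing subsequence."""
    tails = []
    for x in seq:
        pos = bisect_left(tails, x)
        if pos == len(tails):
            tails.append(x)
        else:
            tails[pos] = x
    return len(tails)


def count_conflicting(order_a, order_b):
    """Shared markers outside the longest common order of two maps.

    Both orientations of the second map are tried, because a map may be
    listed from either end.
    """
    position_in_a = {m: i for i, m in enumerate(order_a)}
    ranks = [position_in_a[m] for m in order_b if m in position_in_a]
    best = max(lis_length(ranks), lis_length(ranks[::-1]))
    return len(ranks) - best


female_order = ["m1", "m2", "m3", "m4", "m5", "m6"]
male_order = ["m1", "m3", "m2", "m4", "m6", "m5"]
print(count_conflicting(female_order, male_order))
```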
By pressing the button <Display chrom.> we can see
the markers, their relative positions and the conflicts.
We can also resolve the conflict situations
manually, as in the F2Dom case (see pp. 2-20 – 2-23).
Now we press <Creation of consensus order> to
start the process of consensus analysis.
2_43
After pressing the button <Start>, we will be able to stop the process only after the function <Optimization by Global Criterion> has finished (this will be accompanied by a special message). We press the button <Stop> and can see that the conflicts are eliminated. (For a more detailed description see Part 3.)
If a conflict remains, we should repeat the process by pressing the button <Display chrom> to resolve it. Once all
conflicts are resolved, we can close this window.
Division option for F1_F1
By opening a cluster, we actually get two sets of markers, maternal and paternal. For each of the sets we should
conduct the Ordering and then the Consensus procedures. Consider an example.
This cluster was saved at the Consensus stage,
hence the message.
After pressing the <OK> button, we will see the two
maps. Let us subdivide this cluster (both its
maps, r and b) into two parts.
In the right part (LG2_b) choose the menu option <Division of the linkage group>.
Select markers from the first one to JM042E24r_127(514), using the Shift
key. The selected range of markers will be highlighted in blue. After pressing
the right mouse button, the following message will appear:
Simultaneously, in the left part (LG2_r) markers displayed in a larger bold
font will indicate the borders of the marker set to be used for creating a new
cluster. The map will be shifted to leave room for displaying the highlighted
markers. In the left part we must choose the menu option <Division on the
linkage group> and select the set of markers flanked by the highlighted (bold)
first and last markers. The user may slightly modify the choice, e.g., by
including non-shared markers outside the selected range or excluding those inside it.
However, all shared markers (F2 and F1_F1) must be selected. After pressing
the right mouse button, we will get the following message:
By pressing this button we agree with
this suggestion, and the selected
markers will be removed from the list
and surrounded on the map by a
dashed frame.
We now return to the right part of the window and click on the field near the marker list. This will allow us to
see all the previously selected markers. After pressing the right mouse button we will get the message
We accept this proposal and delete 24 markers: they
will be removed from the list and surrounded on the
map by a dashed frame. Now we select all markers
from the lower part of the map displayed on the right side of the
screen, to create one more cluster, and repeat the
procedure described above. This time we do not need
to select the menu option <Division on the
linkage group> again (neither on the right nor on the left
side of the screen).
As a result of these steps, only one marker remains in the cluster (it will be moved
to Heap). The derived maps will look as shown in the picture:
After the cluster is closed, the following question appears:
By choosing <Yes> you implement the division of the cluster; otherwise
all the steps related to the subdivision of this cluster will be canceled.
2_47
As expected, we now have 26 clusters instead of 25. Cluster LG2 now includes 38 markers and is marked as
New, with the status “ordered and in consensus”. A new cluster, LG26 with 25 markers, has also appeared,
with the status “ordered and in consensus”. The results should be saved using the menu option <Save all clusters>.
Other functions of the system (repeated clustering and output of the results) are very similar to those for
other types of mapping populations.
3_1
MultiPoint Tutorial
Part 3 Consensus mapping analysis of multiple data sets
Table of Contents
Introduction    3_1
    Building multilocus consensus maps    3_2
    Two-phase algorithm for consensus mapping    3_3
    The general scheme of consensus mapping analysis    3_4
    The general scheme of consensus mapping analysis    3_5
    Input data    3_6
    Preliminary analysis    3_8
    Ordering each chromosome separately    3_9
Consensus analysis in case of high proportion of shared markers    3_10
    Creating derivative datasets without unique markers    3_11
    Consensus analysis in the absence of unique markers    3_14
    Results of consensus analysis in the absence of unique markers    3_15
    Continuation of the “consensus” analysis in the absence of unique markers    3_17
Consensus analysis for all markers    3_18
    Beginning the consensus analysis    3_18
    Local analysis    3_19
    Global analysis    3_28
    Reviewing the results of consensus analysis    3_32
Displaying the results    3_34
    Integral map    3_34
    Results for each set    3_37
    Saving the intermediate results and continuing the analysis    3_40
    Removing and adding sets in the process of consensus analysis    3_41
Appendix    3_44
    Reorganization of maps for set pairs with conflicting orders    3_44
References    3_47
Building multilocus consensus maps
The objective: building multilocus genetic maps based on data from different labs and mapping populations, with the requirement that shared markers must be in shared orders. Multilocus consensus genetic mapping (MCGM) is a further complication of genome mapping. Two approaches have been suggested to solve MCGM problems, both looking for shared orders with the maximum number of shared markers. The first approach is based on “giving credit” to the available maps; to obtain the consensus solution, different heuristics are employed, e.g., a graph-analytical method based on voting over partial orders (Yap et al. 2003; Jackson et al. 2007). The second approach is based on searching for the consensus solution by re-analysis of raw data, instead of looking for shared orders in pictures of previously constructed maps (Mester et al. 2005; Korol et al. 2009).
[Figure: Graph-theoretical approach for reconciling orders received from different sources (Yap et al. 2003)]
The algorithm implemented in MultiPoint is based on the second approach and includes two phases (see next page). In Phase I, multilocus ordering is performed for each data set, combined with iterative re-sampling to evaluate the stability of marker orders in the individual maps. In Phase II, we consider consensus mapping as a new variant of the famous Traveling Salesperson Problem (TSP) that can be formulated as synchronized-TSP, and MCGM is solved by minimizing the sum of recombination lengths along all multilocus maps for the considered chromosome (Mester et al., 2010).
We apply as the main criterion the sum of recombination rates (SRR) taken across the participating maps, i.e. $SRR=\sum_i L_i$, where $L_i$ is the recombination length of the $i$th map. Clearly, the amount of information about the multilocus order provided by the $i$th dataset is proportional to its sample size $N_i$. Hence, it is natural to employ as the optimization criterion the weighted SRR, with $w_i=N_i/\sum_j N_j$ taken as weights: $SRR=\sum_i w_i L_i$. Another factor that may affect the between-set differences in information content is the accuracy of marker scoring. Erroneous scoring leads to inflation of the map length, hence an increased impact of low quality data on the final result. To compensate for this effect, we employ weights that reduce the influence of the datasets with long individual maps, e.g., $w_i=L_{0\min}/L_{0i}$, where $L_{0i}$ and $L_{0\min}$ denote the initial (before consensus analysis) length of the $i$th map and of the shortest map, respectively. To account for both effects, we use the weights $w_i=(N_i/L_{0i})/(\sum_j N_j/L_{0\min})$.
3_2
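The three weighting schemes and the weighted criterion can be written out in a few lines. The function names and the numbers are made up for illustration; the formulas follow the definitions of the weights given in the text (sample sizes N_i, initial map lengths L0_i), with the combined weight equal to the product of the two simple ones.

```python
def size_weights(sizes):
    """w_i = N_i / sum_j N_j."""
    total = sum(sizes)
    return [n / total for n in sizes]


def length_weights(initial_lengths):
    """w_i = L0_min / L0_i: long (error-inflated) maps get less weight."""
    shortest = min(initial_lengths)
    return [shortest / length for length in initial_lengths]


def combined_weights(sizes, initial_lengths):
    """w_i = (N_i / L0_i) / (sum_j N_j / L0_min)."""
    return [
        ws * wl
        for ws, wl in zip(size_weights(sizes), length_weights(initial_lengths))
    ]


def weighted_srr(weights, lengths):
    """Weighted sum of recombination lengths across the maps."""
    return sum(w * length for w, length in zip(weights, lengths))


# Two made-up datasets: 100 and 300 individuals, initial lengths 2.0 and 1.0.
sizes = [100, 300]
initial_lengths = [2.0, 1.0]
weights = combined_weights(sizes, initial_lengths)
print(weights, weighted_srr(weights, initial_lengths))
```

In this example the small dataset with the long map receives a weight of 0.125, so it influences the consensus criterion much less than the large dataset with the short map.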
Two-phase algorithm for consensus mapping
[Scheme: Original datasets -> Phase I. Constructing verified multilocus maps -> Phase II. Consensus mapping, using either the FF heuristic algorithm or the SCF exact and heuristic algorithms (heuristic algorithm for n>16; exact algorithm for n≤14-16).]
For the second phase, i.e., for searching for the consensus solution to MCGM, two different algorithms are available
in MultiPoint. The first one is named Full Frame (FF), and it uses special heuristics for global discrete
optimization of synchronized-TSP for all markers (unique, shared conflicting and non-conflicting). Our numerous
tests show that the FF algorithm is effective with up to k=10-15 populations (data sets) with a total number of shared
markers N<50. For larger problems, we developed another algorithm, based on defining regions of local conflicts
in the orders of shared markers (referred to as Specific Conflicted Frames, SCF), followed by “local” multilocus
ordering for each such region. This approach allows solving much larger MCGM problems (e.g., with k>20-30
populations and N>50-100 or more markers) by moving consecutively along the SCFs.
Solving MCGM via dissecting the chromosome into SCFs includes defining sets of conflicting marker regions
obtained in Phase I (based on non-synchronized solutions). Then, SCFs are formed by analysis of all pairs of the
resulting individual maps. Each SCF contains shared conflicting and non-conflicting markers, and some set-specific
(“unique”) markers. The remaining non-conflicting shared markers between the SCF regions are
considered as “frozen” anchors during the solution process for each SCF region (hence, only SCF markers
participate in the optimization process). This version of the algorithm significantly reduces CPU time. Moreover,
for certain sizes of SCF an exact solution can be obtained.
The described algorithms are presented in more detail in the schemes on the next two pages.
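The idea of regions of local conflict separated by non-conflicting regions can be illustrated for a single pair of maps. The sketch below splits two orders of the same shared markers into the smallest consecutive blocks that contain the same markers in both maps; blocks with more than one marker are reported as conflict frames. This block rule, the assumption that both maps have the same orientation, and the marker names are choices made for the illustration, not the program's SCF definition.

```python
def conflict_frames(order_a, order_b):
    """Return blocks of markers (in the order of map A) whose internal
    order differs between two maps of the same shared markers."""
    if len(order_a) != len(order_b) or set(order_a) != set(order_b):
        raise ValueError("both orders must contain the same markers")
    frames = []
    seen_a, seen_b = set(), set()
    start = 0
    for i, (marker_a, marker_b) in enumerate(zip(order_a, order_b)):
        seen_a.add(marker_a)
        seen_b.add(marker_b)
        # A block closes when both maps have visited the same markers.
        if seen_a == seen_b:
            if i > start:  # more than one marker: a local conflict
                frames.append(order_a[start:i + 1])
            start = i + 1
    return frames


map_a = ["m1", "m2", "m3", "m4", "m5", "m6", "m7", "m8"]
map_b = ["m1", "m3", "m2", "m4", "m5", "m7", "m8", "m6"]
print(conflict_frames(map_a, map_b))
```

Markers outside the reported blocks (here m1, m4 and m5) occupy the same positions in both maps and could play the role of the “frozen” anchors mentioned above.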
3_3
The general scheme of consensus mapping analysis
1. Input and separate analysis of each dataset.
2. Possible utilization of bound together markers for all sets simultaneously.
3. Multilocus ordering of each set separately.
4. Consensus ordering based on FF (Global First step), or defining local regions with conflicting orders of shared markers (SCF) and resolving the conflicts (Local step).
5. Continuing joint analysis of all sets using the preliminary results from the Global First or Local steps.
6. Consensus ordering.
7. Results: output of the integral map for the chosen method of consensus analysis; displaying all set maps for the chosen method of consensus analysis; displaying for each set the maps based on different consensus methods.
3_4
Consensus local analysis of several sets with conflicting shared markers
1. Analysis of conflicts in pair-wise combinations of datasets, allowing for heuristic rules of transposition and inversion in order to get a better initial point for consensus ordering.
2. Defining frames (regions) with local conflicts across several sets.
3. Consensus analysis for the defined frame of the chosen group of sets. Steps 2 and 3 are repeated until all pair-wise conflicts are resolved.
4. Control for the presence of remaining conflicts and their resolution.
3_5
Input data
The consensus analysis system of the MultiPoint package is built for comparing multiple maps and conducting joint analysis of multiple data sets in order to build consensus maps that obey the requirement that shared markers in these maps should appear in a shared order. It should be noted that the consensus analysis across multiple data sets is conducted separately for each chromosome. This means that before starting this analysis, the user should have the markers classified into linkage groups, based on the literature, previous analysis with the MultiPoint standard version (2.1), or previous analysis with any other software.
The functions of consensus analysis are provided by the option <Consensus> of the main menu, which includes 3 sub-options. Let us start with the first one, <Input files>. After choosing this option, we should press the button <Select data file for input>.
In the corresponding window we should select a folder with mapping data and a specific data file. The file name will be marked as “select”. By pressing the button <Input Data> we input this file. After all necessary data files are included, and their names, marked as “input”, are seen in the input window, we should press the button <End of input> and choose the folder for saving the intermediate and final results of the analysis. Then the main window of the analysis appears. Note that joint mapping analysis may include datasets from different types of mapping populations, e.g., dihaploid, F2, and RIL, simultaneously.
3_6
In the case of mapping data for populations RIL_Selfing or RIL_Sib_mating, the analysis
of a separate data set is conducted using “observed” recombination rates, but for
consensus mapping analysis we employ “transformed” rates (p. 1_83 – 1_85);
otherwise the comparison and joint analysis with other population data would be
impossible.
For IRIL data, the number of intercross
generations employed in building the
mapping population should be indicated.
Input of F2 data is checked for the presence of dominant markers. If they are present, the dataset is split into two
subsets, each containing codominant markers and dominant markers in coupling phase (see also p. 2_30 – 2_32).
The names of these sets include the name of the initial file and the extension “red” or “blue”.
3_7
Preliminary analysis
After data input, we obtain a window with a table showing the list of all chosen files, population type(s), sample sizes and numbers of markers. These files are also referred to as (numbered) data sets.
During the analysis of shared markers we should take into account the bound together markers among the shared markers. Markers that belong to this class do not actually participate in further analysis and can be moved to Heap until the last phase, when they can be returned to the final consensus maps. Their presence in the input data is reflected in the difference in the number of markers between the <in set> and <in map> columns. The item <Shared markers> for each set represents the number of shared markers, i.e., those that appear at least once among the marker names of other sets. The number of bound together markers for each set is shown in brackets.
As usual, we can employ the function <Control of bound together markers> and indicate names (or parts of names) of priority markers (see the details on p. 2_2, 1_38). After pressing the button <Start of control> the item <Markers in the map> will change for each set.
During the analysis of groups of bound together markers, only groups whose markers are not shared, or that have only one shared marker, are considered (and this shared marker, or “delegate”, will represent the group). In groups with several bound together shared markers and several unique markers, the program chooses the shared marker with the highest priority, which is then used as a “delegate” of the unique markers of the group (these will be moved to Heap), while the other shared markers remain in the set. It is reasonable already at this stage to choose the mode of transformation of recombination rates to map distances (cM) and to save the corresponding choice using the option <Save all change>.
3_8
Ordering each chromosome separately
Each chromosome can be ordered, and certain markers can be moved to Heap. The “consensus” framework imposes some constraints on this stage of analysis (compared to our standard scheme described in the non-consensus chapters of the tutorial). Namely, when improving the quality of the multilocus order upon resampling analysis, shared problematic markers (if the list of detected problematic markers includes shared markers) cannot be deleted using the automatic <Control of monotony> function. Such marker(s) can be deleted only manually. Note that shared markers are marked by the symbol “Sh”. After separate ordering, each set is marked in the table with a special symbol; the weights of each set in the corresponding optimization criteria also appear in the table.
Our consensus mapping is based on joint analysis of raw mapping data from multiple populations. For the chromosome in question, we consider as the best solution the set of maps for the involved mapping populations that minimizes the criterion “weighted sum of map lengths across the involved populations” (for the proposed weights see p. 3_2). We can choose one of the three proposed weighting approaches, according to: (a) the sample sizes, (b) the lengths of individual (before consensus analysis) maps, or (c) (a) and (b) combined. There is also an option to use as weights numerical values (multipliers) proposed by the user. The column <Cur.weight> then shows the values of the chosen weight.
The weight for each set can be changed by the user, by selecting the set and pressing the title <Cur.weight>. Then, in the window that appears, the corresponding value is replaced by the needed value, followed by pressing the <OK> button.
3_9
3_10
Consensus analysis in case of high proportion of shared markers
We consider here a version of consensus analysis applied only to shared markers. It may be especially useful when several mapping populations have been genotyped with an SNP array. In such situations the proportion of shared markers between at least pairs or trios of populations may be very high. Focusing on shared markers simplifies the analysis and makes it possible to perform consensus mapping for a very high number of markers (Mester et al. 2015). Therefore, after separate ordering of all involved datasets, the user should estimate the proportion of unique markers and decide whether to ignore the unique markers and conduct consensus analysis only for the shared markers.
Some peculiarities of the “bound together” function when all markers are shared
During data input, the program determines the set of shared markers. If many of the shared markers are co-segregating, we perform the “bound together” procedure only for shared markers. It is noteworthy that two markers co-segregating in one dataset may recombine in another dataset, while one or both of them may be absent in a third dataset. Thus, we first build bound together groups only for those markers that are present and co-segregate in all data sets. The next step is performing the <Control of bound together markers> function, which defines groups of bound together markers for each data set. During the analysis of each such group, the program takes into account the groups of bound together markers obtained for shared markers. As a result, the maps of separate data sets may include co-segregating shared markers that do not belong to one group of bound together markers. This peculiarity will also be reflected in the corresponding output EXCEL files.
----------------------------------------------------------
Mester D., Y. Ronin, P. Schnable, S. Aluru and A.B. Korol. 2015. Fast and accurate construction of ultra-dense consensus genetic
maps using evolution strategy optimization. PloS One 10(4): e0122485.
3_11
Creating derivative datasets without unique markers
We consider the stage after the input of all datasets and the ordering of each set. The data include a high proportion of shared markers and many co-segregating markers.
The assumption that all markers are shared simplifies the analysis, makes it possible to work with a very high number of markers, and most importantly allows applying our heuristics to global optimization criteria (Mester et al. 2015). After separate ordering of all involved datasets, we should estimate the proportion of unique (population-specific) markers and decide whether it is small enough to ignore them and perform consensus analysis only for the shared markers. If so, by answering YES, the unique markers will be removed from each dataset. The result of this operation is shown in the table on the next page:
3_12
You should apply the function <ordering> to each set. This step will provide us with information on
the map length of each set, with and without unique markers, in cM and as a sum of recombination
rates across all intervals (i.e. for all pairs of adjacent markers).
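The two measures of map length can be illustrated with the classical Haldane and Kosambi mapping functions, which convert the recombination rate of each interval to centimorgans. Which transformation the program applies depends on the mode chosen by the user, so the use of these two functions here is an assumption, and the interval values are made up.

```python
import math


def haldane_cm(r):
    """Haldane mapping function: distance in cM for 0 <= r < 0.5."""
    return -50.0 * math.log(1.0 - 2.0 * r)


def kosambi_cm(r):
    """Kosambi mapping function: distance in cM for 0 <= r < 0.5."""
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))


def map_lengths(interval_rates):
    """Map length as a plain sum of recombination rates and in cM."""
    return {
        "sum_r": sum(interval_rates),
        "haldane_cM": sum(haldane_cm(r) for r in interval_rates),
        "kosambi_cM": sum(kosambi_cm(r) for r in interval_rates),
    }


# Recombination rates for the pairs of adjacent markers of a made-up map.
intervals = [0.02, 0.10, 0.05, 0.15]
print(map_lengths(intervals))
```

For small recombination rates all three measures are close (a rate of 0.01 corresponds to about 1 cM); they diverge as the intervals become longer.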
Using the option <View list of the sets> we can move from the comparative data
on map lengths to details on each set. Here we can choose between
<all markers> and <shared only>.
3_13
At this stage (before the consensus analysis)
we can compare two variants of the map for
each data set: with and without unique
markers. For that, we should activate the
button <Comparing two first order>, select the
menu option <View list of the sets detail>
<All markers> and choose one of the sets.
In the resulting window, we press the button
<Display> and obtain two sets: <First>, with
unique markers, and <First shared only>,
without unique markers. The symbol (Sh) indicates
shared markers.
3_14
Consensus analysis in the absence of unique markers
By activating the button <shared markers> and pressing the buttons <global analysis> and <Start of process> we
obtain the window reflecting the process of consensus analysis. After a certain delay, we will see in the column
<Non-consensus solution> the initial map length of each set, in the column “Criterion” the sum of map lengths of
all sets (multiplied by 100000), and the proxy of map length for each set calculated as a sum of recombination
rates across intervals.
During the consensus analysis, the values in the column <Consensus solution> become smaller. Simultaneously, the values of «Cost of consensus,%» (reflecting the proximity of the consensus map lengths to the initial map lengths) become smaller. When these values no longer change, you can stop the process by pressing the button <Stop>. This table is saved in the project folder as the text file CostofConsensus.txt.
3_15
Results of consensus analysis in the absence of unique markers
The results of consensus analysis are summarized in the following table:
We can compare the
consensus and initial marker
order for each of the
analyzed sets. For that, we
should select the menu
option <Comparing all
result with first>, move to
<detail> and select the
desired set.
In the example we can see
two sets that differ in the
degree of changes of the
consensus order compared
to the initial order.
3_16
A useful function is the comparison of marker positions in the initial and consensus maps. If we select a marker
in the table <First shared only>, this marker will be highlighted in bold blue in the <Global shared> table.
3_17
Continuation of the “consensus” analysis in the absence of unique markers
Two reasons may justify continuing the optimization process in consensus analysis:
(a) The user may want to continue the process assuming that the optimal solution has not yet been obtained.
(b) A break in computations has occurred during the analysis. In this case, the next time you enter the system,
a message will appear: «Previous computation was ended abnormality. Do you want to continue the
computation from the last control point?» By answering “YES” you can continue the process. You should
select and press the same buttons as described on page 3_14, but instead of the button <Start of process>
you press the button <Continue the process>.
Beginning the consensus analysis
We employ a combination of two methods for searching the consensus solution, local and global. The local analysis allows
fast calculation of a good approximation that can be employed as a starting point for the global analysis. In
global analysis we search for the solution by working simultaneously with all markers of the linkage group. In the
local analysis, using the individual (non-consensus) solutions for each mapping population, we first reveal regions of
local conflicts separated by non-conflicting regions. The consensus analysis is then applied separately to each
conflicting region using our heuristic discrete optimization tools. Due to the relatively small size of such regions (with
respect to the number of shared markers), the solution does not take too much CPU time. Moreover, when the region
is really small, an exact solution is also possible. Combining the solutions for the local conflicts, we obtain a good
approximation for the global analysis. However, we can also use the global analysis from the beginning. Still, we
advise starting with local analysis.
To start the analysis, we should choose the method of solution. The
global analysis is conducted with all datasets. It can start from the results
of the initial analysis conducted before consensus with each data set separately (denoted as “first”), or from the
results of local consensus analysis as the initial approximation. After the type of analysis is chosen, the upper part
of the window shows the map length for each set (in cM) as well as total length in cM and total sum of
recombination rates (Criterion), for the initial (before consensus) maps.
Using the options of <View list of the set> menu, we can see the main characteristics of each set.
3_18
Consensus analysis for all markers
When the local analysis is chosen, the user obtains a table of pair-wise conflicts
in the orders of shared markers. For each pair of sets we see the number of shared
markers and, when some are in conflicting order, the number of such conflicts. Namely, the symbol “14(С_6)”
indicates that for the considered pair of sets, the number of shared markers in the targeted
linkage group is 14 and the number of shared markers in conflict(s) is 6. In the considered example, the number of
conflicting markers for each pair is lower than the total number of shared markers per pair. More complex situations
will be shown in the further examples (see Appendix 1).
In many cases, the data may include bound together markers. They can
appear in different orders in some sets but this cannot be considered as
a conflict. In the example, markers m21 and m87 (with zero
recombination) can be considered as staying in the same order. The
same would be true if zero recombination is found only in one of these
two sets. In calculating the table of pair-wise conflicts, we do not consider such cases as conflicting orders, thereby
reducing the total number of conflicts, but the user may prefer not to use this simplification by giving a negative answer
(<No> rather than <Yes>) to the system question.
Obviously, the alternative answers will result in different tables of pair-wise conflicts.
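The counting rule behind the table of pair-wise conflicts is not spelled out here, so the following Python sketch is only one plausible reading and not the program's actual procedure. It assumes that the markers in conflict are those left outside the longest common subsequence of the two orders (allowing for opposite map orientation), and that bound together markers are merged into one unit when the simplification is accepted. The function names and the `group_of` mapping are hypothetical.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two marker lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def collapse(order, group_of):
    """Replace each marker by its bound-together group (assumed mapping), keeping first occurrences."""
    seen, out = set(), []
    for marker in order:
        unit = group_of.get(marker, marker)
        if unit not in seen:
            seen.add(unit)
            out.append(unit)
    return out


def pairwise_conflicts(order_a, order_b, group_of=None):
    """Return (number of shared units, number of units in conflict) for two sets."""
    group_of = group_of or {}
    a = collapse(order_a, group_of)
    b = collapse(order_b, group_of)
    shared = set(a) & set(b)
    a = [m for m in a if m in shared]
    b = [m for m in b if m in shared]
    # A map may be read in either direction, so both orientations are tried.
    agree = max(lcs_length(a, b), lcs_length(a, b[::-1]))
    return len(shared), len(shared) - agree


# Toy data (hypothetical): m21 and m87 are swapped between the two sets.
set_a = ["m1", "m21", "m87", "m3", "m4"]
set_b = ["m1", "m87", "m21", "m4", "m3"]
strict = pairwise_conflicts(set_a, set_b)
relaxed = pairwise_conflicts(set_a, set_b, {"m21": "m21+m87", "m87": "m21+m87"})
```

In this toy case the strict count gives 2 conflicts among 5 shared markers, while merging m21 and m87 leaves 1 conflict among 4 units, which illustrates why the two answers lead to different tables.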
Local analysis
3_19
We consider now how to search the
solution by dividing the datasets into
regions of local conflicts followed by
resolution of the local conflicts. By
double-click on the name of one of the
sets, e.g. Set3, we obtain a new
window that shows the shared markers
of the chosen set.
To confirm the choice we press <OK>.
If we find a few small neighborhoods
with conflicts, we may want to select
these regions simultaneously, resulting in
a combined conflict region.
In the employed example, the differences between the tables are not big because of the small proportion of bound
together markers. If the opposite is the case, the difference will be much more important and will affect the
performance of the analysis.
3_20
The system collects all the sets that contain the selected conflicting markers. Clearly, if the selected markers in such a
set are in conflict with some markers in other sets from the same defined region, these other markers are also
considered as a part of the conflict. Thus, all sets are tested for the thereby extended group of conflicting markers.
Consequently, the extension involves not only the conflicting markers but also the corresponding sets of populations.
Then, for each set and its local group of conflicting markers, the system finds the first and last markers in the group
and analyzes all markers above the first and below the last conflicting markers, until the next conflict is encountered
on one or both sides or until the end of the linkage group is reached. To conduct local consensus analysis, the
group of conflicting markers is surrounded by a minimal number of shared non-conflicting markers. Red color here
highlights the shared markers comprising the conflict region surrounded by non-conflicting border markers (the whole
group is denoted by a left red bracket) for each of the analyzed sets, whereas non-conflicting shared markers (Sh_*) are
shown in the usual color. The list of sets included in the local consensus analysis of the region in question is provided, as
well as the list of shared markers in this region.
3_21
When selecting an interval of conflicting markers, the number of chosen markers can be increased by using the menu
option <Include inside shared markers>.
The importance of this option can be seen from the following example.
Marker <Sh_ mar19>, residing within the group of conflicting markers,
shows no conflicts with these markers in any of the sets (hence it is
presented in black font). Without this option, its order will not be
controlled, resulting in a possibility of conflicts.
If we do use this option, marker <Sh_ mar19> will be included in the
set of conflicting markers (highlighted in red) and, after consensus
ordering, will appear in the shared order.
3_22
We continue to demonstrate the analysis using the example from p. 3_12. As a rule, the regions selected by default
(marked by red brackets) are well suited for searching for a local solution. However, we can extend these regions by
using the menu option <Change the list of chosen set> <Change the selected part of the chosen set>. Then we
choose the set to be changed and, after selecting the upper and lower markers in the list of its markers, press <OK>.
Several conditions should be taken into account. If a shared non-conflict marker is a border marker (included in the
bracketed group), it cannot be an internal marker of the targeted region in any of the sets, and vice versa: if it is
internal for any of such regions, the same should hold for all other sets. The internal part cannot be bordered
by conflict markers from both sides. Thus, the region should be extended in such a way that at least one of the
border conflicts becomes internal. This may cause changes in all involved sets. The extension may not always be
possible, e.g., if in one of the sets a marker that should be internal is on the upper or bottom border. In such a case
a corresponding message appears and the extension is cancelled.
3_23
By closing the window <Consensus> we get the system’s inquiry whether we are ready to move to the process of
consensus analysis. The answer <No> makes sense if in one of the sets the selected part includes a high number
of markers or is bordered from both sides by conflicting markers. In such a case we may want to return to the
<Consensus> window in order to try defining conflicting regions starting from another set. If we answer <Yes>, a
warning message may appear about the long time that will be needed if we employ the exact solution method. For such
cases we recommend using the heuristic method of local search. The user may choose one of the two calculation
methods by answering the special system request. Our experience shows practically identical results; you may
choose the exact method, but the heuristic method works faster. If you get tired of waiting for the result, you may
close the window ProgressBar and return to the stage of selection of conflicting intervals or just change the
calculation method (i.e. move to the heuristic method).
After returning to the <Consensus> window we should press the button <Start of consensus process>. The process of
exact local solution (testing all possible local orders) will take a relatively short time, resulting in a line displaying
the results. In the first position of this line we see the number of the trial (1, 2, etc.). The symbol «_СЕ» indicates that
the method of exact solution was employed, whereas «_Н» indicates that the heuristic method of local search was
employed. For each set, its total map length after applying the consensus analysis is shown (in cM), together with the
number of markers (in brackets) in the region selected for local analysis. The item Time indicates the computation
time in sec (for the heuristic method, the time allocated for the solution), and the item Criterion indicates the reached value
of the optimization criterion (sum of the recombination rates along the treated chromosome multiplied by 100000).
The analysis can be repeated, e.g., for larger parts of the chromosome or for the same part but using the heuristic
method. For that, we should press the button <Return to the work with this part>. After the new round of analysis, we’ll
get a new line of results with a new trial number.
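As an illustration of the exact method and of the Criterion value, the Python sketch below tries every order of the markers inside a small conflict region, keeps the two border markers fixed, and sums the recombination rates between adjacent markers over all sets, multiplied by 100000. It is a minimal sketch under simplifying assumptions: every set is taken to contain all markers of the region, and the data layout (`rec_table`, `rec_by_set`) and the function names are hypothetical, not part of MultiPoint.

```python
from itertools import permutations

SCALE = 100000  # multiplier of the Criterion, as stated in the text


def rec_table(pairs):
    """Build a symmetric lookup of recombination rates from {(a, b): r}."""
    return {frozenset(pair): rate for pair, rate in pairs.items()}


def map_criterion(order, rec):
    """Sum of recombination rates between adjacent markers, scaled and rounded."""
    total = sum(rec[frozenset(pair)] for pair in zip(order, order[1:]))
    return round(total * SCALE)


def exact_local_order(left, region, right, rec_by_set):
    """Test all orders of `region` between fixed borders; return the best order and its criterion."""
    best_order, best_value = None, None
    for perm in permutations(region):
        order = [left, *perm, right]
        value = sum(map_criterion(order, rec) for rec in rec_by_set)
        if best_value is None or value < best_value:
            best_order, best_value = order, value
    return best_order, best_value


# Toy data (hypothetical): two sets, border markers A and D, region {B, C}.
set1 = rec_table({("A", "B"): 0.05, ("B", "C"): 0.05, ("C", "D"): 0.05,
                  ("A", "C"): 0.10, ("B", "D"): 0.10, ("A", "D"): 0.15})
set2 = rec_table({("A", "B"): 0.04, ("B", "C"): 0.06, ("C", "D"): 0.05,
                  ("A", "C"): 0.09, ("B", "D"): 0.11, ("A", "D"): 0.14})
order, value = exact_local_order("A", ["C", "B"], "D", [set1, set2])
```

The number of tested orders grows as the factorial of the region size, which is in line with the warning about long computation time when the exact method is applied to larger regions.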
3_24
After selecting the heuristic method, we obtain a window representing the optimization process, and can press the
button <Start>.
In our example, we show a situation when for each local group of conflicting markers the solution is searched by the
heuristic method (rather than using the exact local solution by testing all possible local orders). This approach is
referred to as “New Full frame algorithm (FF)” (see p. 3_2-3_3). First, the algorithm orders each set separately; the
results are displayed in the column “Non-Synchronized Solution” (the length of each solution and the total length are
shown). Once this stage is over, the corresponding field in the right upper corner of the window is highlighted in green.
The next stage, called “Skeleton”, is to find the best order of shared markers. The last stage, called “Consensus”, is
to find the optimal consensus solution for shared markers upon the inclusion of non-shared markers. The obtained
results are reflected in the column “FF synchronized solution”. The stage “Consensus” is finished after
pressing the button <Stop>.
Figure callout: the order of shared markers.
Figure callout: the length of each solution and the total length are the sums of the rates of recombination
between adjacent markers along the maps, multiplied by 10,000.
3_25
Each such trial can be chosen for the further steps of analysis by <double click> on the corresponding trial name.
After this choice, the first line of the table will change and the table of pair-wise conflicts will be re-calculated for
all sets. The order of markers in all of the sets will change as a result of consensus analysis.
Working consecutively with each set, i.e., resolving the conflicts in the defined regions by local consensus
analysis, we will reach the situation when the table of shared markers is free of conflicts.
3_26
The system will also check for conflicts of triples, in addition to pairwise conflicts. If such triple conflicts are
detected, the user will be asked whether he/she would like to resolve these conflicts. If the answer is <Yes>, the
conflicting sets will be shown. In the employed example, conflicts of shared markers were found in sets Set1,
Set7, and Set8, while Set2 was added because it also includes conflicting markers (but in the same order as in
Set1). Consensus analysis should be conducted for the shown sets, as described above.
After finishing the analysis, the system will again check for conflicts, and if no conflicts are found a message
<All right> appears.
3_27
A drawback of local analysis is that the resulting solution orders are
combined from several well ordered pieces. We will show here two examples of
application of global analysis. By choosing global analysis we get the system’s
question whether we want to use the previously obtained local solution as a
starting point for global analysis. Answer <No> gives us the possibility to start
directly the global analysis by pressing button <Start of process>. Global analysis
employs the heuristic method of optimization. In contrast to the window on p. 3_17,
here we show 47 shared markers. Their order changes at the “Skeleton” stage.
After the user stops the “Consensus” stage, the program conducts one more
iteration of consensus analysis and moves to the stage “Recalculating the sets”.
After this stage, the process is finished and the window is closed automatically.
Global analysis
3_28
Another possibility to utilize global analysis is to use it after local analysis,
with a hope to further improve the solution. This time, we choose again
<global analysis> and answer <Yes> to the question whether we want to
use the local solution as a starting point for global analysis.
A window showing the process of the heuristic method appears, and we can press the <Start> button.
3_29
The obtained result shows that the solution has indeed improved, but only very slightly.
The considered variant of the analysis employs as a starting point the order of shared markers obtained by using
the local analysis. Therefore, the stage “Skeleton” is skipped, and instead of the “FF synchronized solution”
column, the column “SCF synchronized solution” is filled in by the values from the local solution. The process of
searching the solution is started here from the stage “Optimization by Global Criterion”.
The global analysis can be continued (with a hope that the solution can be further improved). For that we should
switch on <Global analysis>, answer <No> to the system’s question and press the button <Continue of the
process>.
The window of the process for the heuristic method here is the same as the one shown on p. 3_21. The obtained
solution will replace the solution shown in the line “First global”.
3_30
Consider the second example. It includes 16 sets, but in the figure we show only 12. The local solution looks as follows:
The obtained global solution is obviously worse than the local one.
But when the global analysis was started with the local solution, it considerably improved the result.
3_31
Now for each set we have the results of initial (set-specific) ordering and of several consensus ordering analyses. If
we employ the menu option <View list of the sets> <in detail>, we will see detailed characteristics of the
solutions for each set. Simultaneously, a window will appear that helps to see the correspondence of marker order
in each set with any of the obtained variants of consensus solutions.
Furthermore, by choosing in this window the radio button <Comparison> and one of the sets, e.g., Set3, we get
a new window, that represents all results of individual and consensus ordering of this set.
In this window we can choose one of the employed variants of solution and display the corresponding results
for each set obtained using the chosen method.
Reviewing the results of consensus analysis
3_32
To conclude the request, the button <Display> in this window should be pressed.
3_33
Displaying the results, integral map
Two types of displaying the results of consensus analysis are currently
available in the package: the compromised map order for each set and integral
map. For the integral presentation we suggest 3 types of outputs: (1) a text file
for all shared markers; (2) a window with graphs of all sets, and (3) a txt file that
serves as input for drawing all ordered shared markers. In any case, we should
indicate which variant of the employed solutions we want to output (Local, First
global, local global).
Menu option <Integral map>: it allows output of shared markers ordered during
consensus analysis, to a text file IntegralMap. The file name also provides info
about the type of the conducted consensus analysis, e.g., IntegralMap ForLocal.
This file contains the names of shared markers with lists of sets where this
marker appears. In {…} brackets we show markers with uncertain order. Thus,
in the employed example, markers *m7 *m59 *m58 can be put in another order
without changing the optimization criterion. Brackets […] include a marker that is
absolutely linked with its next neighbor. In our example, this is marker *m86 that
shows no recombination with marker *m56.
The menu option <Integral picture> allows markers to be output to a special
visualization file IntegralMapForPicture_ForLocal.viz. Using the publicly available
program Graphviz (http://www.graphviz.org), one can get a graphical presentation of the
integral map, as shown on the next page. This graph does not include bound
together (i.e., absolutely linked) markers. Depending on the user’s choice, the file may
include only shared markers or shared plus unique (i.e., set-specific) markers. In
the latter case, the file name includes the sub-name (Unique). All saved files are stored
in a special sub-folder of the folder that includes the data of the project: the folder <ResultFiles>.
3_34
Unfortunately, the program http://www.graphviz.org imposes restrictions on the marker names: the name cannot
begin with a digit (0,1,…,9) and cannot include some special symbols (*, _, /, etc.). In the integral map, shared
and unique markers are highlighted in brown and grey colors, respectively.
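If you prepare or post-process a Graphviz input file yourself, one way around such naming restrictions is to enclose every marker name in double quotes, since the DOT language accepts a double-quoted string as an identifier. The Python sketch below writes a chain of ordered markers in this form. It does not reproduce the file produced by MultiPoint; the function names and the graph layout are assumptions made for illustration only.

```python
def dot_quote(name):
    """Return a DOT identifier in double quotes, escaping embedded double quotes."""
    return '"' + name.replace('"', '\\"') + '"'


def integral_map_to_dot(ordered_markers, graph_name="IntegralMap"):
    """Build DOT text in which consecutive markers of the order are linked by edges."""
    lines = [
        f"digraph {dot_quote(graph_name)} {{",
        "  rankdir=TB;",          # draw the map from top to bottom
        "  node [shape=box];",
    ]
    for upper, lower in zip(ordered_markers, ordered_markers[1:]):
        lines.append(f"  {dot_quote(upper)} -> {dot_quote(lower)};")
    lines.append("}")
    return "\n".join(lines)


# Hypothetical marker names that start with a digit or contain special symbols.
dot_text = integral_map_to_dot(["*m7", "1_m59", "m58/a"])
```

The resulting text can be saved to a file and rendered with the Graphviz `dot` program, e.g. `dot -Tpng map.dot -o map.png`.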
3_35
By choosing the menu option <All maps> we get a window with maps for all sets, for the selected variant of the
solution procedure. To see the maps, we should press <Display> button.
Scrolling allows you to see all sets (if their number is >6).
3_36
To obtain a graphical output of ordered markers for each data set, we need to
shift the list of sets to the state <View list of the sets> <in detail>, and then to
select the desired variant out of the conducted consensus analyses and the data
set of interest.
The menu option <Results for chosen set> has two sub-options for
output: to a text file and to an EXCEL file. In turn, output to EXCEL may be
in two forms: as a map of markers and as a table of graphical genotypes,
exactly as in the previously described options (see p. 1_74-79 and
2_12). The map output is shown in the figure.
Displaying the results for each set
3_37
An output EXCEL file for genotypes for the chosen set looks as in the usual mapping analysis (see also
p. 1_80-82).
3_38
All markers of the chosen set will be saved in text files in the folder containing the initial data of the project, namely,
in its special sub-folder <ResultFiles>.
For using the menu option <Result for every
set><Output to text file>, it is necessary to
choose one of the variants of the employed
consensus analysis and the needed set.
3_39
After saving is finished, this sub-folder may contain one, two, or three text files: the file with
the name *_Sk.txt includes only skeleton markers, the file *_Sk&Ext.txt includes skeleton and bound
together markers, whereas the file *_Glob.txt contains all markers, including attached ones. Choosing
this option always generates the file containing skeleton markers. The two other types of files appear only if
the solution for the chosen set includes bound together and (or) attached markers.
Saving the intermediate results and continuing the analysis
During the analysis, the user may need to save various results, in order to have a flexibility of comparing the
efficiency of different scenarios of consensus mapping. For that, the menu option <Save all change> is employed.
We recommend using this option after the data input, after the initial (individual) ordering of the data sets, after
finishing the <Global analysis> process, and after each of its continuations. You may also want to save the results during
some steps of <Local analysis>.
To continue the work, you should choose the option <Consensus>-><Open saved file> of the main menu and
select the step from which you want to continue the process; for example, S1 as one of the four saved steps (see
also p. 1_68-1_70).
Usually, it makes more sense to continue the
analysis starting from the last saved step. But you
may also have situations in which you want to start the
consensus analysis from the beginning, but with
one or a few of the data sets corrected.
Like in the usual multilocus mapping, repeated
analysis from different saved states will result in a
tree of steps. If needed, some of the saved states
can be deleted using <Clear saved file> option.
3_40
3_41
Removing and adding sets in the process of consensus analysis
It may be necessary to remove or add sets during consensus mapping. The process of
consensus analysis will have to be repeated, but some sets can be kept in the form in
which they were before the analysis. To do this, open the previously saved results of the
treatment; it is necessary to choose the saving step preceding the attachment of
markers. Removal or adding functions are activated by the relevant menu options.
When either of these options is chosen, an automatic additional save of the opened set is conducted. In the folder
carrying the selected set, a new sub-folder is created: PartSet1 in case of removing a set and ExtendSet1 in case of adding
a set. These options can be used several times for the same open set, with corresponding changes in the sub-folder
names. For example, if these functions are used for folder PartSet1, a sub-folder PartSet1_2 is created, etc.
Using any of these functions will change the array of shared markers, so all shared markers previously moved to
<Heap> and <heapDelegate> will be returned to their sets.
Delete set(s)
Before choosing the menu option, you should select from the list of files the sets you want to delete, then answer
<Yes> to the message to confirm the deletion. After the sets are removed, a situation may arise when in one of the
remaining sets the number of shared markers is less than 3. You receive a message and such sets are
deleted automatically. The remaining sets are searched for markers that were shared in the initial data, but after
removal of a data set became unique. For the new combination of sets for consensus analysis, the program
selects the bound together markers and moves these markers to the <Heap> and <heapDelegate>. The saved
sets are displayed in the Consensus window; sets with changed marker content are indicated by the sign !! This
can be a set in which some shared markers became unique, or a set in which some additional markers
became shared (in case these markers were sticky with markers of the deleted set).
3_42
In this example we have 8 initial sets.
Set6 (1B_dataGG_3_x) and Set8 (1B_data_GG_10_x) have been deleted. This caused also a need to delete
Set5 (chr1_MC2_1B_tub) because the number of markers that it shared with any other remaining set has become
less than 3. Thus, the result is 5 sets for new consensus analysis.
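The automatic deletion described above can be pictured with the following Python sketch. It is not the program's code: it assumes that a marker counts as shared when it occurs in at least two of the remaining sets, and that the check is repeated until every remaining set keeps at least 3 shared markers. The function name and the data layout are hypothetical.

```python
from collections import Counter


def prune_sets(sets, removed, min_shared=3):
    """Drop the user-removed sets, then repeatedly drop sets with too few shared markers.

    `sets` maps a set name to its markers; returns (remaining sets, automatically
    dropped set names, markers still shared among the remaining sets).
    """
    remaining = {name: set(markers) for name, markers in sets.items()
                 if name not in removed}
    dropped = []
    while True:
        # Count in how many remaining sets each marker occurs.
        counts = Counter(m for markers in remaining.values() for m in markers)
        weak = [name for name, markers in remaining.items()
                if sum(1 for m in markers if counts[m] >= 2) < min_shared]
        if not weak:
            break
        for name in weak:
            del remaining[name]
            dropped.append(name)
    shared = {m for m, c in counts.items() if c >= 2}
    return remaining, dropped, shared


# Toy data (hypothetical marker names): Set5 shares most of its markers only with Set6.
data = {
    "Set1": ["a", "b", "c", "d"],
    "Set2": ["a", "b", "c", "e"],
    "Set5": ["x", "y", "z", "a"],
    "Set6": ["x", "y", "z", "b"],
}
remaining, dropped, shared = prune_sets(data, removed={"Set6"})
```

In this toy case, removing Set6 leaves Set5 with a single shared marker, so Set5 is dropped as well and only Set1 and Set2 remain for the new consensus analysis.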
The marked sets should be opened and treated using function <Ordering>. The
markers converted from shared to unique ones are also marked with sign !!.
During the analysis, they can be easily seen and, if needed, removed. After the
<Ordering> step, this sign disappears from the markers and from the sets.
3_43
Add set(s)
When you select this option, a window for the input of sets appears in the Consensus window. After the input we should
press the button <End input>. When you add new sets, the number of shared markers may increase, and some
“unique” markers that were earlier moved to Heap may become “shared”. If such a marker is found
in one of the old sets, all markers of this set are returned from <Heap> and <heapDelegate> back to the
analysis. Before the new combination of sets is subjected to consensus analysis, the program selects the
bound together markers and moves these markers to the <Heap> and <heapDelegate>. The new enlarged
combination of sets is displayed; the sets with markers returned from <Heap> and <heapDelegate> are marked
by !!. After opening such sets you should treat them using the functions Control of bound together markers and
Ordering. Obviously, the “cleaning” operation for these sets should be conducted from the beginning.
Reorganization of maps for set pairs with conflicting orders
The bottom part of the window includes a table of shared markers, with the number of shared markers for each pair
and number of conflicts. The record «С_10» means that for the considered pair of sets the number of conflicts is 10.
For problems with a large number of conflict markers, we strongly
suggest first reducing the number of conflicts by reorganizing the sets.
For that, we should analyze each pair of sets having a large number of
conflicts. In order to obtain the info about conflicting markers for a pair of
sets, we should first select a line and then a column of the table. Thus,
for Set2 (line) × Set3 (column), the number of conflicts is 10. We can
display graphically the situation with shared markers for this pair of sets.
In the figure presented on the next page we can see the names of the
corresponding files, the names of shared markers of the chosen
chromosomes and their map positions, as well as the distance between
the markers and the number of set-specific (“unique”) markers in each
map interval. Most important for this stage of analysis is that conflicting
markers are indicated.
Appendix
3_44
To reduce the number of conflicts we can use the menu option
<Reorganization of the chromosome>. Two options are available for that:
inversion of the group of markers in one of the sets and transposition of a
group of markers to a selected interval.
We show now how the reorganization of the sets can be conducted. We
start from the menu option <Move the selected part> and apply it to the
set Set3. As a result, a list of all markers of this set together with its picture
will appear: the picture on the previous page shows that two markers
(mar13, mar14) are good candidates for moving down in the selected part
(highlighted in red) of Set2. Now press <OK>. As a result, these markers
are moved to the chosen region and a part of the conflict is resolved. If
some step in the analysis proves unsuccessful, you can employ the menu
option <UnDo> and try another variant of transposition.
3_45
The transposition resolved only a part of the conflict. The remaining part can be referred to as a “propeller”. If the
propeller includes an entire segment, it can be inverted. In our example (p. 3_33), we select Set2 and try to
employ the menu option <Inversion the selected part>.
We again will obtain the list of all markers and a
picture of the set.
We select now the part of the list that we want to
invert. In this example it is the part from markers
*mar9 to marker *mar2. Based on the info from the
picture about the interval length, we can extend the
target segment by including the adjacent unique
markers. Press button <OK> to start. As a result, we
reduced the number of conflicts to two. If we are not
happy with the result, we can employ menu option
<UnDo> and then try several other inversion
variants. If the result is acceptable, the initial sets
should be replaced by the new ones. For that we
press the button <Save transformed data> and
answer <Yes> to the questions asked by the system
when we try to close the window.
The described treatment of individual pair-wise
conflicts can considerably reduce the total number of
conflicts in the sets before we apply global analysis.
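The two reorganization operations, transposition of a group of markers and inversion of a segment, can be pictured as simple manipulations of an ordered marker list. The Python sketch below is only an illustration under that assumption. The function names are hypothetical, map distances are ignored, and the marker order used in the example is invented.

```python
def segment_bounds(order, first, last):
    """Return the index range covered by the segment between two named markers."""
    i, j = order.index(first), order.index(last)
    return (i, j) if i <= j else (j, i)


def invert_segment(order, first, last):
    """Reverse the order of the markers from `first` to `last` (inclusive)."""
    i, j = segment_bounds(order, first, last)
    return order[:i] + order[i:j + 1][::-1] + order[j + 1:]


def move_segment(order, first, last, after):
    """Cut the segment `first`..`last` and re-insert it right after marker `after`."""
    i, j = segment_bounds(order, first, last)
    segment = order[i:j + 1]
    if after in segment:
        raise ValueError("the target marker lies inside the moved segment")
    rest = order[:i] + order[j + 1:]
    k = rest.index(after) + 1
    return rest[:k] + segment + rest[k:]


# Hypothetical order: mar13 and mar14 are moved down, then a segment is inverted.
start = ["mar1", "mar13", "mar14", "mar2", "mar3", "mar4"]
moved = move_segment(start, "mar13", "mar14", after="mar3")
inverted = invert_segment(moved, "mar2", "mar13")
```

Both functions return a new list and leave the input untouched, so an earlier order can simply be kept aside, which mirrors the role of the <UnDo> option.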
3_46
References
Our algorithms are based on theoretical papers of the entire mapping community, and our own publications. List
of our relevant publications was provided on p. 1_8. Here we provide references to other papers cited in the
Tutorial.
Esch E., Weber W.E. 2002, Investigation of crossover interference in barley (Hordeum vulgare L.) using the
coefficient of coincidence. Theor Appl Genet 104: 786–796.
Haldane J.B.S., Waddington C.H. 1931, Inbreeding and linkage. Genetics 16: 357-374.
Lander E.S., Green P., Abrahamson J., Barlow A., Daly M.J., Lincoln S.E., and Newberg L. 1987, Mapmaker:
an interactive computer package for constructing primary genetic linkage maps of experimental and
natural populations. Genomics 1: 174-181.
Lincoln, Stephen E., Mark J. Daly, and Eric S. Lander. 1993, Constructing Genetic Linkage Maps with
MAPMAKER/EXP Version 3.0: A Tutorial and Reference Manual. Whitehead Institute for Biomedical
Research Technical Report Third Edition (Beta Distribution 3B).
Sakamoto T., Danzmann R.G., Gharbi K., Howard P., Ozaki A., Khoo S.K., Woram R.A., Okamoto N.,
Ferguson M.M., Holm L.-E., Guyomard R., Hoyheim B. 2000, Genetics 155: 1331–1345.
Sivagnanasundaram S., Broman K.W., Liu M., Petronis A. 2004, Quasi-linkage: a confounding factor in
linkage analysis of complex diseases? Hum Genet 114: 588-593.
Stam P., 1993. Construction of integrated genetic linkage maps by means of a new computer package: JoinMap.
The Plant Journal 3: 739-744.
Yap I., Schneider D., Kleinberg J., Matthews D., Cartinhour S., McCouch S. (2003) A Graph-Theoretic approach
to comparing and integrating genetic, physical and sequence-based Maps. Genetics 165: 2235–2247.
Jackson B., Schnable P., Aluru S. 2007, Consensus genetic maps as median orders from inconsistent sources.
IEEE-ACM Transactions on Comp. Biol. and Bioinformatics 5: 161-171.
3_47
Table of Contents
MultiPoint Tutorial
Part 4 - Building ultra-dense genetic maps
in the presence of genotyping errors and missing data
Introduction                                  4_2
Input data                                    4_3
Analysis of missing and segregation           4_4
Window “Creation of global parameters”        4_6
Clustering                                    4_10
Treatment of a separate LG                    4_12
Treatment of a set of LGs                     4_19
Option <Save all clusters>                    4_24
Output of the final results                   4_22
References                                    4_25
4_1
Introduction
4_2
Recent advances in genomic technologies have opened unprecedented possibilities for relatively inexpensive genotyping at
genome-wide scale, generating a large number of SNP markers. It would seem that there is now everything needed to build
high quality ultra-dense genetic maps. This would be the case if genotyping were error free and the number of markers per
chromosome were of the same order of magnitude as the population size. With a very large number of markers available for a
mapping population, most of the markers on a genetic map will remain inseparable by recombination and will represent groups
of tightly linked loci. In such a case, only one representative of each group can be placed on the (skeleton) map; all of the
remaining markers can then be attached to the skeleton. The real situation is significantly complicated by
technology-associated genotyping errors, which “diversify” a certain part of the markers that would be identical in an ideal situation of no
errors. The higher the error rate and the ratio of the number of markers to population size, the more difficult is the problem of
building a reliable map. The situation is further complicated by missing data, which are common in the genotyping-by-sequencing (GBS)
approach and cannot be compensated for by imputation of missing scores, especially for RIL populations.
The sub-package MultiPoint-ultradense suggests a method of addressing these problems that is based on a simple probabilistic
estimation of the proportion of identical markers, as a function of the error level when the errors are rare, and of the radius of
“diversified” markers when the error level is increased (Ronin et al. 2015, 2017). Let, for example, sample size be N = 100 and
the probability of genotyping error p = 0.01 per marker. Then the probability that in all individuals both alleles of the marker m
will be unmistakably identified, is $P = (1-p)^N = (1-0.01)^{100} \approx e^{-1}$. This means that assuming 1% error rate within a group of
absolutely linked markers, about a third will still remain error-free. Thus, for building the skeleton map one can select error-free
markers based on the presence of their “twins” in the sample. However, there is also non-zero probability of an opposite effect,
i.e., when non-identical markers become “twins” because of genotyping errors. Therefore, a certain threshold is introduced in
our algorithm for the selection of markers with a sufficient number of absolutely linked copies (Ronin et al. 2015). With a higher
level of errors, the proportion of twin markers may become negligible: the genotyping errors lead to dissipation of the twin
groups, so that the resulting marker agglomerations are “blurred” around the positions of the (unobservable because of errors)
initial points corresponding to the error-free situation. Therefore, with a higher level of errors we employ an additional marker filtration.
Namely, after the twin groups exceeding a pre-set threshold size ts0 are selected as candidates for the skeletal map, we conduct
clustering of the remaining markers by a procedure similar to the k-means algorithm. Then, representative markers of clusters are
added to the set of selected candidate markers for building the skeletal map (Ronin et al. 2017). The developed approach
allows for mapping big sets of markers (~10^5-10^6), and is thus suitable for mapping data generated by the GBS approach.
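The arithmetic behind the "about a third" estimate can be checked with a few lines of Python. This is only an illustrative sketch; the function name is ours and is not part of MultiPoint.

```python
import math

def error_free_probability(p: float, n: int) -> float:
    """Probability that a marker is scored without error in all n individuals,
    assuming independent errors with probability p per genotype call."""
    return (1.0 - p) ** n

# Values from the text: N = 100 individuals, p = 0.01.
p_ok = error_free_probability(0.01, 100)
print(round(p_ok, 3))          # 0.366
print(round(math.exp(-1), 3))  # 0.368, the e^-1 approximation
```

With a higher error rate the proportion drops quickly, which is why the additional filtration step is introduced for noisier data.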
---------------------------------------------------------------------
Ronin et al. 2015. Building ultra-dense genetic maps in the presence of genotyping errors and missing data, pp. 127-133 in Proc. of the 12th
Intern. Wheat Genetics Symp., edited by Y. Matsuoka and S. Takumi. Springer, Yokohama, Japan.
Ronin et al. 2017. A new approach for building ultra-high density linkage maps based on efficient filtering of trustable markers. Genetics.
Input data
We have two variants of data input: input followed by clustering the markers
into linkage groups (LGs), and input of one LG. In the first case, we use the option
Open->Population file; in the second, we use Open->Input of one LG only. In
both cases, we get the input window. The issues related to input are described
in detail in Part 1. As before, the button <Select data file for input> is used
to select the data file and the button <Input Data> to load the data.
It is noteworthy that mapping data, especially those
generated via genotyping-by-sequencing at a relatively
low coverage level, may have a high level of missing data
and massive segregation distortion (hence high χ² for
deviations from the expected ratio). Thus, during the input
we suggest conducting certain data filtering. If the missing level is
very high, the first step in filtering is for missing level. In
such a situation, the user gets a warning message:
4_3
After this message, a window to set up the
parameters of filtering is opened.
Joint ULD analysis of co-dominant and dominant markers in F2 populations
Upon data input, filtering markers for missing and segregation is performed separately for co-dominant markers and each of the
two types of dominant markers. In the upper part of the window, you should choose the button defining the currently selected
marker type for filtering.
Then, for the defined group, set the threshold values of the filtering parameters for missing and segregation distortion chi2, save
the result and move to the next group.
After finishing filtering for all three groups, close the window. The system then generates 3 sub-projects named
_Cod3, _Dom4, and _Dom5, and for each of these sub-projects the selected markers are saved in its folder ‘Data’.
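The two filtering criteria can be illustrated by a minimal Python sketch. It assumes that genotype calls are coded as a string of A/H/B with "-" for missing, and the threshold values used in the example are hypothetical; MultiPoint's own implementation and defaults may differ.

```python
# Expected F2 segregation for a co-dominant marker (1:2:1).
F2_CODOMINANT = {"A": 0.25, "H": 0.5, "B": 0.25}

def missing_fraction(calls, missing="-"):
    """Share of individuals with no genotype call for this marker."""
    return calls.count(missing) / len(calls)

def chi2_segregation(calls, expected_ratio):
    """Chi-square for deviation of the non-missing calls from the expected ratio."""
    observed = {g: calls.count(g) for g in expected_ratio}
    n = sum(observed.values())
    if n == 0:
        return float("inf")
    return sum((observed[g] - n * r) ** 2 / (n * r)
               for g, r in expected_ratio.items())

def passes_filter(calls, max_missing, max_chi2, expected_ratio=F2_CODOMINANT):
    """True if the marker satisfies both user-defined thresholds."""
    return (missing_fraction(calls) <= max_missing
            and chi2_segregation(calls, expected_ratio) <= max_chi2)

print(passes_filter("AHHBAHHB", max_missing=0.2, max_chi2=5.99))   # True
print(passes_filter("AAAAAAHB-", max_missing=0.2, max_chi2=5.99))  # False
```

For a dominant marker the same function could be called with a two-class expected ratio (3:1); the class coding for that case is left to the reader.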
Analysis of missing and segregation (continued)
4_5
Function “bound together markers”
We first describe the parameters that should be defined for this function. Parameters
<Part of name…> and <Coefficients of priority> are described in detail in Part 1
(page 1_33). In the considered examples, missing data is a more important complicating
factor than segregation distortion, hence we use by default coefficients 0.9 and 0.1. The
markers are selected in accordance with the priority defined by these coefficients, and each
pair of markers is compared for identity across all genotypes (excluding those with
missing data for the considered markers). If the number of identical scores does not
exceed the <Min. number of genotypes for two markers>, the markers are considered
unlinked. A representative marker for a group of bound together markers (twin group) will
be included in further analysis if the number of markers in the group is no less than the
preset parameter <Min. size of bound together group> (or ts0 - Ronin et al. 2017). The
default value of this parameter can be changed by the user. The process of marker
selection is started by pressing the <Bound> button. In each group, the pair of markers
with the maximal number of identical scores is selected; within the pair, the marker with
minimal missing is considered as a skeleton marker, the representative of the whole twin
group (Ronin et al. 2015). With the default parameters in the example, we get 717 twin
groups, hence 717 candidate skeleton markers; in total, the groups included 1660
markers. The remaining markers are saved in Heap, which will serve as a source of markers
that can be tried in order to fill in the gaps of the ordered LG.
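The idea of twin groups can be sketched in Python as follows. This is a simplified greedy illustration, not the MultiPoint procedure: priority is reduced to "fewer missing calls first", the representative is simply the group member with the fewest missing calls, and the names twin_groups, min_shared and the "-" missing code are our assumptions.

```python
def shared_identical(a, b, missing="-"):
    """Return (n_compared, n_identical) over genotypes scored in both markers."""
    pairs = [(x, y) for x, y in zip(a, b) if x != missing and y != missing]
    return len(pairs), sum(x == y for x, y in pairs)

def twin_groups(markers, min_shared, ts0, missing="-"):
    """Greedy grouping of markers that are identical over jointly scored genotypes."""
    # Simplified priority: markers with fewer missing calls are treated first.
    order = sorted(markers, key=lambda m: markers[m].count(missing))
    assigned, groups = set(), []
    for seed in order:
        if seed in assigned:
            continue
        group = [seed]
        assigned.add(seed)
        for other in order:
            if other in assigned:
                continue
            n, same = shared_identical(markers[seed], markers[other], missing)
            # Twins: no mismatch, and more identical scores than the minimum.
            if same == n and same > min_shared:
                group.append(other)
                assigned.add(other)
        groups.append(group)
    skeleton = [g[0] for g in groups if len(g) >= ts0]     # representatives
    heap = [m for g in groups if len(g) < ts0 for m in g]  # left for later insertion
    return skeleton, groups, heap

markers = {
    "m1": "AABBABAB",
    "m2": "AABBABA-",
    "m3": "AABB-BAB",
    "m4": "ABBBABAB",
    "m5": "BBAABABA",
}
skeleton, groups, heap = twin_groups(markers, min_shared=5, ts0=2)
print(skeleton)  # ['m1']
print(heap)      # ['m4', 'm5']
```

Here m1, m2 and m3 form one twin group of size 3, represented by m1; m4 and m5 have no twins and go to Heap.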
Window “Creation of global parameters”
By answering <Yes> you save the
results, but if the number of selected
markers is too small, you answer
<No>. It may even happen that no
twin groups were detected fitting a
chosen stringent threshold ts0.
4_6
The answer <No> implies that skeletal markers will be recruited using
representatives of twin groups with size ≥ ts0 only; by pressing the button
<First clustering> that appears, you start the process of clustering the selected
candidate skeletal markers into LGs. Alternatively, you may choose <Yes> to
increase the number of skeletal markers via clustering of the remaining markers
into kernels of a preset radius (min. rf) by a procedure similar to k-means. You
may change the radius, depending on population size, data quality, etc. (see
Ronin et al. 2017). For a large number of markers the process may take considerable
time. At the end, a table appears that reports the number of clusters for
each cluster size. The user should decide on the minimum size of the clusters
to be used as a source of additional skeletal markers.
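To give a feeling for clustering into kernels of a preset radius, here is a small Python sketch. It uses simple "leader" clustering (each marker joins the first kernel whose center lies within the radius), which is only a stand-in for the k-means-like procedure of Ronin et al. (2017). The mismatch rate among jointly scored genotypes is used as a crude proxy for rf, and all names are ours.

```python
from collections import Counter

def mismatch_rate(a, b, missing="-"):
    """Share of differing calls among genotypes scored in both markers."""
    pairs = [(x, y) for x, y in zip(a, b) if x != missing and y != missing]
    if not pairs:
        return 1.0
    return sum(x != y for x, y in pairs) / len(pairs)

def leader_kernels(markers, radius):
    """Assign each marker to the first kernel whose center is within the radius."""
    kernels = []  # list of (center_name, [member names])
    for name, calls in markers.items():
        for center, members in kernels:
            if mismatch_rate(markers[center], calls) <= radius:
                members.append(name)
                break
        else:
            kernels.append((name, [name]))  # the marker starts a new kernel
    return kernels

def size_table(kernels):
    """Number of kernels for each kernel size, as in the summary table."""
    return Counter(len(members) for _, members in kernels)

markers = {
    "a": "AAAAAAAAAA",
    "b": "AAAAAAAAAB",
    "c": "BBBBBBBBBB",
    "d": "BBBBBBBBBA",
    "e": "ABABABABAB",
}
kernels = leader_kernels(markers, radius=0.1)
print(kernels)              # [('a', ['a', 'b']), ('c', ['c', 'd']), ('e', ['e'])]
print(size_table(kernels))  # two kernels of size 2, one of size 1
```

The size table is what the user would inspect to decide on the minimum kernel size for additional skeletal markers.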
In any case, the window that appears provides 2 radio
buttons: <Reiteration> enables repeating the process with another
parameter ts0 (min. size), while choosing <Continue> leads to the
following question:
Window “Creation of global parameters” (continued)
4_7
Again 2 buttons appear: <Reiteration> enables repeating the process with
a changed value of “min. rf group” while selecting <Continue> starts the
examination of the new candidate skeletal markers obtained by clustering
for co-segregation with candidate skeletal markers representing twin groups
of size ≥ ts0. The results of examination appear in the table:
Pressing <OK> leads to saving the
selected candidate skeletal markers in
a special array, which is reflected in the
appearance of a scroll bar and the
button <First clustering>.
Pressing this button initiates the calculation of pairwise recombination rates needed to cluster markers into linkage
groups (see next page), followed by ordering the skeletal markers within LGs. In total, all markers will be kept in three
arrays: skeletal, bound together (twins), and remaining markers (Heap).
Important note: Mapping data may include a certain proportion of markers in repulsion phase relative to the
majority of markers. If their phase was not defined in advance, this aspect should be taken into account during the
mapping analysis. The estimate of recombination frequency (rf) between two linked markers in repulsion phase relative to each other will
be >50%, hence the program automatically replaces such estimates by 1 - rf. The accuracy of this simple approach
for phase control during map construction was carefully checked and validated in our simulation tests. In the output
tables (function <FinalResult>) the markers proved to be in repulsion phase relative to the majority of markers in
the skeletal map are marked by a special symbol (‘T’). If the user wants to save the ordered genotypes as well, the
genotyping calls of such markers are transformed, e.g., HBBHABHHABA will be transformed to HAAHBAHHBAB.
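Both operations described in this note are easy to illustrate in Python. The sketch below assumes calls coded as A/H/B; the function names are ours.

```python
# Swap parental alleles A <-> B; heterozygotes (H) and other symbols stay as they are.
FLIP = str.maketrans("AB", "BA")

def to_coupling(calls):
    """Transform the calls of a repulsion-phase marker to coupling phase."""
    return calls.translate(FLIP)

def corrected_rf(rf):
    """Replace an rf estimate above 0.5 by 1 - rf, as described for phase control."""
    return 1.0 - rf if rf > 0.5 else rf

print(to_coupling("HBBHABHHABA"))    # HAAHBAHHBAB
print(round(corrected_rf(0.92), 2))  # 0.08
```

The first call reproduces the example given in the text.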
Window “Creation of global parameters” (continued)
4_8
After filtering is complete, the window <Global parameters> opens. By pressing the button <Bound>, you start the
process <Bound together> for all markers remaining after filtering. As usual, the ‘priority’ of markers is taken into
account in this process. Codominant markers have a higher priority rank, hence they are the first to aggregate groups of
twins (Ronin et al. 2017) that also include dominant markers. The remaining dominant markers are also grouped into twin groups,
separately for the two phases, Dom4 and Dom5, according to the dominant allele origin, maternal or paternal (‘red’ and ‘blue’).
By answering “Yes”, you can repeat the process, with or without changing the limits of the
group sizes. The answer “No” leads to the continuation of the usual process of the analysis, but
only for codominant markers. Pressing the button “First clustering” initiates the calculation of
matrices of pairwise recombination rates and opens the clustering window for
co-dominant candidate skeletal markers, as in the general (earlier described) protocol of
ultra-dense analysis of MultiPoint-ULD. For both types of dominant markers, corresponding arrays
are maintained in Heap and include twin groups of dominant markers. These arrays are saved
in the ‘Data’ folders of sub-projects _Dom4 and _Dom5.
The results of grouping appear in a special table (left), representing 3 types of twin groups:
(i) pure codominant (CC), (ii) with more than one codominant and few dominant markers
(C>1D), and (iii) with only one codominant and a few dominant (C_1D). For each of these
groups, we see the distribution of obtained groups under certain gradation of group sizes
(which can be changed by the user, if needed). The table enables choosing the threshold
sizes for the 3 types of twin groups: You select the group name (one of the 3) and the column
defining the minimal allowed size for such type of twin groups. After performing selection for
all 3 types, press button <OK selection>, which results in a message on the number of each
type of groups, hence the initial number of candidate skeletal markers:
The results of the analysis can be saved, as usual, by the option <Save all clusters>. Heap and matrices are saved in the
sub-project “_Cod3”. The main stage of the analysis is to build a skeletal map for co-dominant markers in sub-project Cod3 (by the earlier
described protocol of MultiPoint-ULD). Each use of the option <Save all clusters> in Cod3 automatically updates Dom4 and
Dom5. After finishing the construction of the skeletal map in Cod3, the user can move to the stage of additional saturation of the
map by adding dominant markers; this should be done separately in the Dom4 and Dom5 sub-projects. During this analysis, the user
may decide that some additional changes or revision of the already constructed skeletal map in Cod3 are needed. For that, the user
should open Cod3 again and continue the analysis within Cod3. However, it is noteworthy that saving the changes
in Cod3 (by using the menu option <Save all clusters>) will automatically remove all insertions of dominant markers
made in Dom4 or Dom5.
Window “Creation of global parameters” (continued)
4_9
Clustering
After <First clustering> is over and
threshold parameter is chosen, the system
asks whether the project deals with real or
simulated data. The reason is that in the
latter case we know in advance the
simulated order and, therefore, can
evaluate the correspondence between the
generated and reconstructed order of
markers in each LG. For that, we use here
a simple score, coefficient of recovery
(Mester et al. 2003).
4_10
The obtained subdivision of markers into LGs depends on the chosen threshold
recombination rate. Too liberal a choice (e.g., 0.4) may lead to fusion of LGs. Replacing
it by a smaller value (e.g., 0.25) and pressing the button <Build Linkage Groups>
will give you ‘on the spot’ a solution with higher resolution. On the contrary, too
stringent a threshold (e.g., 0.20) may result in fragmentation of LGs (too many LGs
compared to the species haploid number). After getting such a result, you may want
to fuse some LGs using an increased threshold (see Part 1, page 1_51). In such a
case you may want to avoid too large an increase of the total length of the resulting map
compared to the sum of lengths of the fused maps (e.g., prevent fusions resulting in more
than a 1.1-fold increase). Thus, you can replace the default value of the parameter
<Allowed increase of combined cluster> by a new one.
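The effect of the threshold on the number of LGs can be demonstrated with a toy Python example. The sketch assumes that two markers fall into the same group whenever they are connected by a chain of pairs with rf below the threshold (single-linkage grouping); whether MultiPoint uses exactly this rule is our assumption. The function fusion_allowed illustrates the check on the allowed increase of the combined cluster; both function names are hypothetical.

```python
def linkage_groups(names, rf, threshold):
    """Group markers connected by chains of pairs with rf below the threshold."""
    parent = {n: n for n in names}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for (a, b), r in rf.items():
        if r < threshold:
            parent[find(a)] = find(b)
    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(groups.values())

def fusion_allowed(len_a, len_b, len_merged, allowed_increase=1.1):
    """Accept a fusion only if the merged map is not inflated too much."""
    return len_merged <= allowed_increase * (len_a + len_b)

names = ["m1", "m2", "m3", "m4", "m5"]
# Pairs not listed are treated as unlinked in this toy data set.
rf = {("m1", "m2"): 0.05, ("m2", "m3"): 0.22,
      ("m3", "m4"): 0.35, ("m4", "m5"): 0.08}

for t in (0.40, 0.25, 0.20):
    print(t, linkage_groups(names, rf, t))
# 0.4  -> one fused group of five markers
# 0.25 -> [['m1', 'm2', 'm3'], ['m4', 'm5']]
# 0.2  -> [['m1', 'm2'], ['m3'], ['m4', 'm5']]
```

The three thresholds reproduce, on a small scale, the fusion and fragmentation effects described above.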
Clustering (continued)
4_11
Option <Extending the linkage group> has only one sub-option: <insert marker(s)>. It is used mainly for inserting
markers from Heap into gaps between markers of the LGs. This operation also leads to re-calculation of the
recombination matrices (see next page for details). The function <Division of the
linkage group> also differs a bit from that in the previous versions, namely: (a) formation of new clusters by the
division should be accompanied by a corresponding re-distribution of Heap markers; and (b) markers that have not
been included in new clusters will be moved to Heap, and this also involves re-calculation of the matrices.
Treatment of a separate LG
Most functions of this part are analogous to those in section <Analysis and treatment of a separate linkage
group> described in Tutorial Part 1, pages 1_42 – 1_50. Function <Control of monotony> is described in
Part 2, pages 2_4 – 2_7. Note that markers deleted from the LGs are moved to the Heap, which is accompanied
by re-calculation of the matrices and takes some time. The parameter <Time to ES> is calculated automatically
as a function of the number of markers in the LG.
4_12
Treatment of a separate LG (continued)
In the window displaying all clusters detailed information is provided on the number of
skeleton markers, bound together markers and markers in Heap. In the current
version of MultiPoint, the table with the LG marker list is extended to include the size
of each twin group and mean rank of markers of the group (useful if one is interested
to compare the obtained order of markers in the LG with the original order in the input
dataset).
In the described example, the process of ordering and removing problematic
markers violating monotony and local map stability of the LG12 cluster has reduced
the number of markers to 85. The result was saved as step S2.
4_13
Treatment of a separate LG: Extending the LG
In this example, a lot of markers closely linked to the considered LG were found in Heap. To see the
entire list of such markers we use the menu option <Extending the linkage group->insert
markers>, which leads to the appearance of the “List of additional markers” on the screen. This list is
generated as follows: for each Heap marker, the program finds the closest skeleton marker.
Thus, for each LG we obtain a list of “associated” Heap markers.
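The way such a list can be built is shown in the following Python sketch. The data layout (a dictionary of rf values between each Heap marker and the skeleton markers) and the function name are our assumptions for illustration only.

```python
def associate_heap(heap_rf, skeleton_lg):
    """Attach every Heap marker to the LG of its closest skeleton marker.

    heap_rf:     {heap_marker: {skeleton_marker: rf}}
    skeleton_lg: {skeleton_marker: LG name}
    """
    by_lg = {}
    for heap_marker, row in heap_rf.items():
        closest = min(row, key=row.get)  # skeleton marker with the smallest rf
        by_lg.setdefault(skeleton_lg[closest], []).append(
            (heap_marker, closest, row[closest]))
    return by_lg

heap_rf = {
    "h1": {"s1": 0.02, "s2": 0.30},
    "h2": {"s1": 0.25, "s2": 0.04},
    "h3": {"s1": 0.01, "s2": 0.40},
}
skeleton_lg = {"s1": "LG1", "s2": "LG2"}
print(associate_heap(heap_rf, skeleton_lg))
# {'LG1': [('h1', 's1', 0.02), ('h3', 's1', 0.01)], 'LG2': [('h2', 's2', 0.04)]}
```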
A special window <Insert to interval> is provided, enabling the user to set up a variant of the
insertion strategy (for a selected interval or for the entire LG interval-by-interval). The first step is
setting the parameter <inflation coeff.> to control the allowed inflation of the interval caused by
insertion of a candidate marker from the “List of additional markers” (its default value is 1.2,
but we recommend starting with a value of 1.0 followed by a step-wise increase).
4_14
Then the user should select the mode of insertion: manual, for a certain
interval, or automatic along the LG (using the option <Input additional
markers to the LG>). In the automatic regime, the system checks, for each
interval, whether the list contains suitable candidate markers and inserts the
best one. In this regime, a button <Break> is available in the window <Insert
to interval>. Pressing this button stops the insertion process. After
the process is terminated or interrupted, the <Break> button is replaced with
the <UnDo last step> button. If needed, by pressing this button you can delete
all markers inserted into the LG during this insertion process.
To ensure high quality of the map, we recommend combining the insertion
process with marker ordering. Then, the whole process can be repeated under
the same or a slightly increased inflation coefficient.
In the automatic regime, the choice of the best marker for insertion is controlled
by the following rules. From all potential candidates for the current interval, the
system selects those that upon insertion do not increase the total interval length
more than allowed by the parameter <Inflation coeff>. If priority markers
appear in the list of candidates for the current interval, such markers will be
preferred. Other quality characteristics that are taken into account include: (a)
missing, (b) group size and (c) proximity of the candidate’s calculated position to
the center of the interval. The quality of each candidate for insertion is quantified
using relative weights of the (a)-(c) scores, which sum to 1. The weights
are defined by the user. They can be changed and are then saved until the next
change. In the absence of priority markers, these rules are applied to usual
markers, under both manual and automatic insertion regimes.
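The selection rules can be made more concrete by a Python sketch. Note that the exact scoring formula is not given in this tutorial, so the score below (a weighted sum of three components scaled to 0..1) is purely our assumption, as are the field names, the example weights and the function name.

```python
def best_candidate(candidates, interval_len, inflation=1.2,
                   weights=(0.4, 0.3, 0.3)):
    """Pick a candidate for insertion into one interval (illustrative scoring).

    Each candidate is a dict with keys: name, d_left, d_right (distances to
    the flanking markers), missing (0..1), size (twin group size), priority.
    """
    w_miss, w_size, w_center = weights  # user-defined weights, summing to 1
    # Rule 1: the inflated interval must not exceed inflation * interval_len.
    ok = [c for c in candidates
          if c["d_left"] + c["d_right"] <= inflation * interval_len]
    if not ok:
        return None
    max_size = max(c["size"] for c in ok)

    def score(c):
        total = c["d_left"] + c["d_right"]
        centered = 1.0 - abs(c["d_left"] - c["d_right"]) / total if total > 0 else 1.0
        return (w_miss * (1.0 - c["missing"])
                + w_size * c["size"] / max_size
                + w_center * centered)

    # Rule 2: priority markers are preferred; ties are resolved by the score.
    return max(ok, key=lambda c: (c["priority"], score(c)))

cands = [
    {"name": "x", "d_left": 2.0, "d_right": 2.5, "missing": 0.05, "size": 3, "priority": False},
    {"name": "y", "d_left": 1.0, "d_right": 4.5, "missing": 0.00, "size": 5, "priority": False},
    {"name": "z", "d_left": 2.2, "d_right": 2.2, "missing": 0.30, "size": 1, "priority": False},
]
print(best_candidate(cands, interval_len=4.0)["name"])    # x
print(best_candidate(cands, interval_len=4.0, inflation=1.0))  # None
```

With the inflation coefficient set to 1.0 no candidate fits this interval, which illustrates why a step-wise increase of the coefficient gradually admits more markers.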
4_15
Treatment of a separate LG: Extending the LG (continued)
The user can select a single target interval to add a marker
from the list: the corresponding button <Choose the interval
on the map and click right mouse button> is marked.
On the LG map, we select the interval and, by pressing the
right mouse button, choose the option <Interval-length method>.
The choice made by the user causes a change in the list
of additional markers: now it shows only relevant markers
(suitable for the selected interval) as well as the interval
length and the recombination rate of each additional marker to
the interval’s flanking markers. We choose only markers
close to the interval and obeying the condition that the
distances to the flanks are smaller than the interval length.
4_16
The “Extending the LG” function needs a special comment. Its importance in the current version of
MultiPoint-ultradense derives from the fact that for building the skeleton map we select as initial candidates only markers that
represent twin groups with a size no less than some pre-set threshold ts0. This principle, which gives priority to
more reliable markers, may be less relevant in hotspots of recombination, leading to gaps in the map. Similarly, it
may prevent getting sufficient coverage at sub-telomeric regions known to have a higher recombination rate. Thus, we
complement the set of candidates with markers representing kernels of a certain minimum size resulting from a
clustering procedure similar to the k-means approach (Ronin et al. 2017). By using the function
“Extending the LG” we actually relax the requirement on the size of trustable twin groups or kernels.
The selected marker enters the interval and is
marked by underlining (both in the list and on the map).
After this step, the list of additional markers is
updated and you can continue inserting additional
markers into the same region or target another region
of the map. If you don’t like the result of insertion of a
certain marker, you can delete it. For that, select this
marker in the main list, press the right mouse button and
use the option <Delete chosen marker>.
When you press the right mouse button on any marker of
the list, the program offers 2 insertion options: manual
insertion of the chosen marker by the user, and automatic
insertion. For the conditions of automatic selection see
p. 4_14.
Treatment of a separate LG: Extending the LG (continued)
4_17
In addition to inserting suitable markers from Heap into intervals along the LG, markers from Heap can also be
added to the ends of the LG:
before the first marker of the LG
after the last marker of the LG
Treatment of a separate LG: Extending the LG (continued)
4_18
Treatment of a set of LGs
Function <Find markers location> is described in Part 1 (p. 1_72). Function
<Moving to Heap> is described in Part 1 (p. 1_69), but it takes much more time,
due to the much larger data sets treated by the ultra-dense version of the software.
Function <User’s name of the cluster> enables assigning names to linkage
groups, in addition to LG1, LG2, ... For that, you should first mark the target LG.
Then, using this function, obtain a special window (at the bottom on the left side
of the page), and write in it the additional name of the marked LG. This name
is preserved in all further manipulations.
To merge two LGs presumably representing two parts of the same chromosome
we can use the menu option <Merging two clusters> from the list of clusters
present in the form <Detail>. In this list, for each ordered cluster, its distance
to the closest cluster is shown (actually, the smallest recombination
rate between markers of the clusters is shown). Thus, for the selected pair of clusters
one can check whether their merging would indeed fit the expectation of an
end-to-end order, in which the closest markers of the two clusters are located near their
ends and will appear in the combined cluster as adjacent markers or very close
neighbors (see Part 1, p. 1_70).
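The end-to-end expectation can be expressed as a simple check, sketched below in Python. The sketch is stricter than the text (it requires the closest markers to be exactly the terminal ones, not merely near the ends), and the function names and toy rf values are hypothetical.

```python
def closest_pair(lg_a, lg_b, rf):
    """Pair of markers (one per cluster) with the smallest recombination rate."""
    return min(((a, b) for a in lg_a for b in lg_b), key=lambda p: rf[p])

def merge_end_to_end(lg_a, lg_b, rf):
    """Merge two ordered clusters if their closest markers are terminal ones."""
    a, b = closest_pair(lg_a, lg_b, rf)
    if a not in (lg_a[0], lg_a[-1]) or b not in (lg_b[0], lg_b[-1]):
        return None  # closest markers are internal: not an end-to-end fit
    left = lg_a if a == lg_a[-1] else lg_a[::-1]   # put a at the junction
    right = lg_b if b == lg_b[0] else lg_b[::-1]   # put b at the junction
    return left + right

lg_a = ["a1", "a2", "a3"]
lg_b = ["b1", "b2", "b3"]
rf = {(a, b): 0.45 for a in lg_a for b in lg_b}  # toy values
rf[("a3", "b3")] = 0.12
rf[("a3", "b2")] = 0.20
print(merge_end_to_end(lg_a, lg_b, rf))
# ['a1', 'a2', 'a3', 'b3', 'b2', 'b1']
```

In the merged order the two closest markers (a3 and b3) become adjacent neighbors, which is the pattern to look for when confirming a merge.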
4_19
After choosing the option <Merging two clusters> we obtain a window with information about their closest
markers. By pressing the button <Display clusters> we will see the two clusters and the prediction of the combined
cluster. By selecting <Yes> we confirm the merging decision, and the component clusters are removed from the list.
The new cluster gets the last number; if the clusters were earlier named by the user, the combined cluster will have a
compound name. Obviously, the connections between markers and clusters should be updated, which may be a
time-consuming task for big mapping projects.
The answer <No> cancels merging of the selected cluster pair.
This procedure can be continued with other pairs of clusters. Ideally, we should finally get a
number of clusters equal to the haploid number of chromosomes of our organism.
Please pay attention: bold markers at the ends of LG12 and LG18 appear as adjacent neighbors in
the merged cluster.
Treatment of a set of LGs (continued)
4_20
Option <Save all clusters>
This option is described in Tutorial Part 1, page 1_66. However, in the current version of the software,
the two saved distance matrices reflect the transition of the removed skeleton markers to
Heap during ordering and their backward movement during insertion procedures. If you open the saved
project not from the last step, the program will re-analyze the matrices in
accordance with the distribution of markers at that step (see the example below):
During the analysis, the option <Save all clusters> can be employed many times, enabling you to return to any such
step. At each step, the current version of the program needs not only the details on each cluster (LG), but also the
two matrices of recombination rates: (i) between the skeleton markers, to build the skeleton map; and (ii) between
the skeleton and Heap markers. If you saved the project details sequentially at steps S1, S2, S3, ..., the system
remembers only the recombination matrices saved at the last step. It may happen that you decide to return
to some earlier step, say S2. This will lead to a re-calculation of the matrices in accordance with the sets of skeletal
and Heap markers of this step. The new branch of the analysis can be continued (with steps S2_3, S2_4 …) with
re-calculation of a corresponding step-specific pair of matrices. You may also use the option <Clear saved clusters> to
remove steps that are no longer needed. When you remove some intermediate steps, the matrices remain in
memory, whereas when you remove the last step in a branch, the matrices of this branch are deleted. When you open your
earlier saved project at a step before the last one, the system re-calculates the recombination matrices in
accordance with the subdivision of skeletal markers in the LGs and Heap markers.
Very important comment: During work with a cluster that includes deletion
or insertion of a large number of markers, the system may generate a warning message.
To avoid damage to the distance matrices, you should finish the current step,
save the results and close the program. Upon restart, the system
revises the sizes of the distance matrices, allowing for the continuation of the analysis.
4_21
The option <Print (output to EXCEL)> enables output of not only the skeletal markers of the chosen cluster but
also all bound together markers and all markers from Heap attached to this cluster. Flexibility is provided by
a special window where the user defines which markers and which details should appear in the output.
In particular, the user may want to output all markers attached to the
skeletal map. We do not recommend doing that, because some of the attached
markers, due to genotyping errors, may be at a much larger distance from
the closest interval of the skeleton map than the size of the interval.
Alternatively, the user may request to include only those attached
markers whose distance to the closest interval does not exceed the
length of the interval multiplied by some constant, the “relative distance to the
interval” (which may be <1 or >1). By default, we use a rather liberal
constant of 1.4, but it can be changed by the user.
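The filtering rule for attached markers can be written in a few lines of Python. The record layout and the function name are hypothetical; only the rule itself (distance not exceeding the interval length times the constant) comes from the text.

```python
def select_attached(attached, relative_distance=1.4):
    """Keep attached markers lying close enough to their nearest interval."""
    return [m["name"] for m in attached
            if m["dist_to_interval"] <= relative_distance * m["interval_len"]]

attached = [
    {"name": "a1", "dist_to_interval": 1.0, "interval_len": 1.0},
    {"name": "a2", "dist_to_interval": 3.0, "interval_len": 1.0},
    {"name": "a3", "dist_to_interval": 1.2, "interval_len": 1.0},
]
print(select_attached(attached))                         # ['a1', 'a3']
print(select_attached(attached, relative_distance=1.0))  # ['a1']
```

A smaller constant gives a more conservative output, at the price of leaving out more of the attached markers.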
By pressing <OK> we get
a new window. For a more
detailed description of this
function see Part 1, pages
1_54 and 1_74.
Output of the final results
4_22
Option <Final result>
This option can be employed only when all clusters have already been ordered. The window
<Parameters for printing> of this option is identical to the window of the <Print> option.
Yet, this new option provides additional possibilities for flexible control of the output
information, listed in the second window <Parameters for final result>. The user can output the
results of each chromosome in a separate file or get a file with all chromosomes. The output may
include only marker names and their chromosomal positions, or names and genotype calls.
Depending on user’s requests, the output results will include two or three files for all
LGs or separate files for each of the LGs. File with name Sk contains skeleton
markers only, file with name Sk&Ex contains the skeleton and bound together (twin)
markers; file with name Glob contains all markers.
4_23
For example, the output file Glob.txt may look like the one in the figure below, where “S” denotes skeletal markers,
“B” – bound together markers, “A” – attached markers, and “AB” – markers bound together with the previous
attached marker. If the data include repulsion-phase markers, then the letter “T” on the left of the marker denotes a
marker in repulsion phase relative to the majority of markers. If the output includes genotyping data, the marker
calls for T-markers are transformed (see p. 4_8), hence the genotyping data then include all markers in coupling phase.
Upon selection of option <Marker position> the output will include the marker
coordinate on the chromosome map, while by choosing the option <Interval length>
you will get in the output the distances between adjacent markers.
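The two output modes carry the same information, as the following small Python sketch shows. It assumes that the first marker is placed at position 0; the function names are ours.

```python
from itertools import accumulate

def positions_from_intervals(intervals):
    """Marker coordinates on the map, with the first marker at position 0."""
    return [0.0] + list(accumulate(intervals))

def intervals_from_positions(positions):
    """Distances between adjacent markers."""
    return [b - a for a, b in zip(positions, positions[1:])]

intervals = [1.5, 0.5, 2.0]
positions = positions_from_intervals(intervals)
print(positions)                            # [0.0, 1.5, 2.0, 4.0]
print(intervals_from_positions(positions))  # [1.5, 0.5, 2.0]
```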
4_24
References
Our algorithms are based on theoretical papers of the entire mapping community, and our own publications. A list
of our relevant publications was provided on pages 7-8. Here we provide references to other papers cited in the
Tutorial.
Esch E., Weber W.E. 2002, Investigation of crossover interference in barley (Hordeum vulgare L.) using the
coefficient of coincidence. Theor Appl Genet 104:786–796.
Haldane J.B.S., Waddington C.H. 1931, Inbreeding and linkage. Genetics 16: 357-374.
Lander E.S., Green P., Abrahamson J., Barlow A., Daly M.J., Lincoln S.E., and Newburg L. 1987, Mapmaker:
an interactive computer package for constructing primary genetic linkage maps of experimental and
natural populations. Genomics 1: 174-181.
Lincoln, Stephen E., Mark J. Daly, and Eric S. Lander. 1993, Constructing Genetic Linkage Maps with
MAPMAKER/EXP Version 3.0: A Tutorial and Reference Manual. Whitehead Institute for Biomedical
Research Technical Report Third Edition.
Sakamoto T., Danzmann R.G., Gharbi K., Howard P., Ozaki A., Khoo S.K., Woram R.A., Okamoto N.,
Ferguson M.M., Holm L.-E., Guyomard R., Hoyheim B. 2000, Genetics 155: 1331–1345.
Sivagnanasundaram S., Broman K.W., Liu M., Petronis A. 2004, Quasi-linkage: a confounding factor in
linkage analysis of complex diseases? Hum Genet 114: 588-593.
Stam P., 1993. Construction of integrated genetic linkage maps by means of a new computer package: JoinMap.
The Plant Journal 3: 739-744.
Yap I., Schneider D., Kleinberg J., Matthews D., Cartinhour S., McCouch S. (2003) A Graph-Theoretic approach
to comparing and integrating genetic, physical and sequence-based Maps. Genetics 165: 2235–2247.
Jackson B., Schnable P., Aluru S. 2007, Consensus genetic maps as median orders from inconsistent sources.
IEEE-ACM Transactions on Comp. Biol. and Bioinformatics 5: 161-171.
4_25