
MultiPoint

An interactive package for ordering multilocus genetic maps, and verification of maps based on re-sampling techniques

MultiQTL Ltd,

Institute of Evolution, Haifa University, Haifa 31905, Israel

Tel: 972-4-8240449, Fax: 972-4-8288788

http://www.multiqtl.com

1_1


MultiPoint structure

MultiPoint is a suite that consists of three software products: MultiPoint-basic, MultiPoint-consensus, and MultiPoint-ultradense.

Each of the products can be purchased and operated separately. They are presented as a suite because they have many common properties and because other suppliers offer them as a single product.

The functionality of the products is described in detail in four sections of the tutorial. The first section is an introduction and describes elements common to the products.

The table of contents is likewise divided into four parts, and each part is placed next to the corresponding text.

1_4


MultiPoint Tutorial Part 1 - General

Table of Contents

Introduction  1_11
    Short Introduction to the algorithms  1_11
    The main steps of multilocus ordering  1_15
Input population file  1_19
    Input panel  1_19
    Control and correction of errors  1_22
    Input of anchor markers  1_24
Preliminary treatment  1_25
    Analyzing markers and genotypes for missing level and segregation  1_26
    Defining the threshold recombination level  1_28
The window for working with clusters  1_31
    Controlling “bound together” markers  1_33
    Setup for clustering  1_34
    Some additional service options  1_35
Analysis and treatment of a separate linkage group  1_37
    Defining groups of tightly linked (“bound together”) markers  1_37
    Additional example  1_40
    The procedure of multilocus ordering  1_42
    Options of the table of ordered markers  1_44
    Further clustering and treatment of merged clusters  1_51
    Representing the map of an LG  1_54
    Adding markers  1_56
    Deleting markers  1_58
    Attaching markers  1_58
    Division of LG into sub-groups  1_63

1_9


Table of Contents (continued)

Additional functions of treating LGs (clusters)  1_65
    An extended form of the clustering panel  1_65
    Saving the results  1_66
    Possible operations with clusters  1_69
    Searching cluster residence of a marker  1_71
Output options  1_73
    Saving LGs as text files  1_73
    Printing linkage group map  1_74
    Printing “graphical genotypes”  1_80
Data analysis in special cases  1_83
    RIL_Selfing, RIL_Sib_mating, and IRIL populations  1_83
    Import of ordered linkage groups  1_86
    Adding new data to the data set  1_89
References  1_94

1_10


Short Introduction to the algorithms employed in MultiPoint for multilocus map ordering

Genetic mapping: Some objectives

♦ Constructing genetic maps (multilocus ordering)
♦ Physical mapping (contig assembling)
♦ Mapping simple (Mendelian) traits
♦ Mapping complex (quantitative) traits
♦ Genetic mapping of QTL (MultiQTL package)
♦ QTL physical mapping, cloning, and sequencing
♦ QTL and gene expression (eQTL)

Constructing genetic maps (multilocus ordering)

Objectives

♦ Ordering multilocus maps (with ~10^3 markers/chr)
♦ Verification of the orders (and removing the “bad guys”)
♦ Building consensus maps (with verification)

Method and technology

♦ Reduction to the Traveling Salesman Problem (TSP)
♦ Guided Evolutionary Strategy optimization algorithms

1_11


Reduction of multilocus ordering to the Traveling Salesperson Problem (TSP)

Each possible order of the markers gives a map of a certain length:

Order 1: a b c d e f g h k l m n    length l1
Order 2: b a c d e f g h k l m n    length l2
Order 3: a c b d e f g h k l m n    length l3
………
Order N: f c m h e a g n k l b d    length lN

For n = 60 markers, N = 60!/2 ≈ 4.2×10^81 orders.

The problem

How to choose the best (true) order, i.e., the one that gives the map of minimal length?

No efficient exact algorithm is known for TSP (the problem is computationally challenging). For practical situations, various heuristic methods have been proposed, e.g., Evolutionary Strategy optimization (for more details see Mester et al. 2003a,b, 2004, 2005).
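The size of the search space and the optimization criterion can be illustrated with a few lines of Python. This is only a sketch: the function names and the toy distance table are our own assumptions for illustration and are not part of MultiPoint.

```python
import math

def number_of_orders(n_markers):
    """Number of distinct orders of n markers: n!/2,
    because an order and its reverse describe the same map."""
    return math.factorial(n_markers) // 2

def map_length(order, dist):
    """Total map length of an order: the sum of distances between adjacent markers."""
    return sum(dist[a][b] for a, b in zip(order, order[1:]))

# Hypothetical pairwise distances (cM) for three markers lying in the true order a-b-c.
dist = {
    "a": {"b": 5.0, "c": 12.0},
    "b": {"a": 5.0, "c": 7.0},
    "c": {"a": 12.0, "b": 7.0},
}

print(number_of_orders(3))                # three distinct orders for three markers
print(map_length(["a", "b", "c"], dist))  # the true order gives the shortest map
print(map_length(["a", "c", "b"], dist))  # a wrong order gives a longer map
print(f"{number_of_orders(60):.2e}")      # of the order of 10^81 for 60 markers
```

Exhaustive enumeration is feasible only for a handful of markers, which is why heuristic optimization is needed for real linkage groups.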

GES algorithm as a memory-based simulation analogue of evolutionary adaptation models (Mester and Braysy 2005)

Natural elements | Simulated elements
Chromosome | Variable value x_i
Individual, a set of chromosomes | Solution vector x = (x_1, …, x_n)
Mutation, change of the chromosome for a small value | Operator M: x_k → x_(k+1)
Population, set of individuals | Set P of solution vectors {x_k}
Fitness, quantitative characteristic of organism’s performance | Optimization criterion value f(x_k)
Selection, choosing the fittest individual(s) for next generations | Operator S: f(x_k) → min
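The correspondence in the table can be made concrete with a deliberately simple mutation-selection loop. This is not the GES algorithm of Mester and Braysy; it is a minimal sketch in which the mutation operator (reversal of a random segment), the function names, and the toy marker positions are all assumptions made for illustration.

```python
import random

def map_length(order, dist):
    """Optimization criterion f(x): sum of distances between adjacent markers."""
    return sum(dist[a][b] for a, b in zip(order, order[1:]))

def mutate(order, rng):
    """Operator M: reverse a randomly chosen segment of the current order."""
    i, j = sorted(rng.sample(range(len(order)), 2))
    return order[:i] + order[i:j + 1][::-1] + order[j + 1:]

def evolve_order(markers, dist, generations=2000, seed=0):
    """Keep one solution; accept a mutant only if it is not longer (Operator S)."""
    rng = random.Random(seed)
    best = list(markers)
    rng.shuffle(best)                      # random starting point
    best_length = map_length(best, dist)
    for _ in range(generations):
        candidate = mutate(best, rng)
        length = map_length(candidate, dist)
        if length <= best_length:
            best, best_length = candidate, length
    return best, best_length

# Hypothetical marker positions (cM) used only to build a toy distance table.
positions = {"m1": 0, "m2": 8, "m3": 15, "m4": 21, "m5": 30, "m6": 42}
dist = {a: {b: abs(pa - pb) for b, pb in positions.items()}
        for a, pa in positions.items()}

order, length = evolve_order(list(positions), dist)
print(order, length)
```

Running the sketch with different seeds mimics the check against dependence on starting points that is discussed later in the tutorial.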

1_12


How to ensure high quality of the maps despite the complexity caused by:

♦ ½ n! possible orders, while we need the best order (a unique solution)
♦ sampling variation in r_ij, missing data, data errors
♦ negative interference

The best way to check / verify the map is to show that the obtained solution does not depend on: (a) sampling data variation, and (b) starting points.

Re-sampling for quality control: By taking sub-samples from the initial data, one can build many repeated maps and test whether, and where, the marker ordering remains the same.

BOOTSTRAP (sampling with replacement) or JACKKNIFE (sampling without replacement)
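The two re-sampling schemes, and one possible way to score the stability of marker neighborhoods across repeated maps, can be sketched as follows. The function names, the jackknife fraction, and the stability measure (the share of repeated maps in which two neighbors of the reference map stay adjacent) are assumptions for illustration; the ordering of each re-sampled data set is left out.

```python
import random

def bootstrap_sample(n_individuals, rng):
    """Indices of individuals drawn WITH replacement (same sample size)."""
    return [rng.randrange(n_individuals) for _ in range(n_individuals)]

def jackknife_sample(n_individuals, fraction, rng):
    """Indices of a sub-sample drawn WITHOUT replacement."""
    k = round(n_individuals * fraction)
    return sorted(rng.sample(range(n_individuals), k))

def neighbor_pairs(order):
    """Unordered pairs of adjacent markers; a reversed map has the same pairs."""
    return {frozenset(pair) for pair in zip(order, order[1:])}

def neighborhood_stability(reference_order, resampled_orders):
    """For each adjacent pair of the reference map, the fraction of
    re-sampled maps in which the two markers are still adjacent."""
    counts = {pair: 0 for pair in neighbor_pairs(reference_order)}
    for order in resampled_orders:
        pairs = neighbor_pairs(order)
        for pair in counts:
            if pair in pairs:
                counts[pair] += 1
    n = len(resampled_orders)
    return {tuple(sorted(pair)): count / n for pair, count in counts.items()}

rng = random.Random(1)
print(len(bootstrap_sample(10, rng)), len(jackknife_sample(10, 0.8, rng)))

# Hypothetical orders obtained from three re-sampled data sets.
repeats = [["a", "b", "c", "d"], ["d", "c", "b", "a"], ["a", "c", "b", "d"]]
stability = neighborhood_stability(["a", "b", "c", "d"], repeats)
print(stability)
```

Pairs with low stability point to the neighborhoods, and hence the markers, that deserve a closer look.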

1_13


1_14

Example: Maize B73 × Mo17 (IBM) population (chr. 10)

(a) Initial ordering: unstable neighborhoods were detected by using jackknife re-sampling.

(b) Resulting ordering: neighborhoods stabilized after removing the detected problematic markers.

Detecting and removing (correcting) the markers/scores causing the trouble

Mester, D., Ronin, Y.I., Minkov, D., Nevo, E. & Korol, A.B. 2003. Constructing large scale genetic maps using evolutionary strategy algorithm. Genetics 165: 2269-2282.


New high-throughput DNA technologies have resulted in a disproportion between the large number of markers scored for mapping populations and the relatively small population sizes. Correspondingly, the number of scored markers may exceed, by orders of magnitude, the number of markers that can be resolved by recombination for the given population size. Hence, only a minority of markers can be genuinely mapped. The question is how to choose the most informative markers for building such a reliable “skeleton” map. We believe that MultiPoint provides a solution to this difficult problem due to: (a) its powerful algorithms of discrete optimization for multilocus ordering; (b) a verification procedure (which is also impossible without fast and high-quality optimization); (c) an interactive algorithm of marker clustering in complicated situations caused by “quasi-linkage” (or “pseudo-linkage”), i.e., significant deviation of recombination rates between markers of non-homologous chromosomes from the expected 50%; and (d) an algorithm for removing excessive markers to increase the stability of multilocus ordering.

Two major problems should be solved in multilocus genetic mapping: markers that belong to non-homologous chromosomes should not be assigned to the same linkage group, whereas markers from the same chromosome should be placed on the genetic map in the same order as the corresponding fragments reside in the DNA molecule. With ~200-1000 markers per chromosome, a sample size of ~100, and real deviations of the recombination rates between non-syntenic markers from 50%, the problem of clustering cannot be solved by an arbitrary choice of a certain (constant) threshold value of recombination or LOD, although this is exactly how the problem is treated in many multilocus mapping packages (Lander et al. 1987; Lincoln et al. 1983; Stam 1993). Indeed, in experiments with the foregoing characteristics, the recombination values between groups of markers from different chromosomes may be smaller than those between adjacent markers within a chromosome. In the MultiPoint package this problem is treated as follows.

1_15

The main steps of the multilocus ordering approach implemented in MultiPoint software


The first step is calculating pairwise recombination fractions (rf) for all pairs of markers (using the maximum likelihood estimation procedure). Then, the number of clusters (linkage groups, LGs) is evaluated and displayed as a function of the threshold (maximal) value rfs that allows a preliminary assignment of a marker to a certain LG: namely, marker mi may belong to LGj if recombination between mi and at least one marker from LGj is lower than the threshold rfs. The user can obtain a prediction of the number of LGs for a series of threshold rf values defined by setting the min, step, and max values of rfs. Then, based on the obtained information, it is necessary to choose a value of rfs small enough to exclude the possibility of placing markers from non-homologous chromosomes in one LG due to quasi-linkage. Because the chosen rfs is relatively small, you will get a large number of clusters (linkage groups), considerably exceeding the real haploid number of chromosomes. Therefore, the next steps should be controlled merging of some of the clusters by relaxing the conditions on quasi-linkage (i.e., by increasing rfs). The specific feature of our approach is that building and ordering of the LGs are considered as interacting procedures, in order to reduce the danger of including non-syntenic loci in one LG due to the “quasi-linkage” (“pseudo-linkage”) phenomenon (see Korol et al., 1994, 2009; Peng et al., 2000; Sakamoto et al. 2000; Sivagnanasundaram et al. 2004; Ronin et al., 2010). Namely, if some markers of two LGs appear closer than the relaxed rfs, it is reasonable to permit merging if the closest markers of the two candidate LGs are terminal, so that merging will be of the “end-to-end” type. If the closest markers reside in the interior part of one or both candidates, then merging should be forbidden.
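The clustering rule described above (a marker joins an LG if its rf to at least one marker of that LG is below rfs) is single-linkage clustering. A minimal sketch, assuming the pairwise rf values are already estimated and stored in a dictionary keyed by marker pairs; the function names and the toy rf values are hypothetical.

```python
def cluster_markers(markers, rf, threshold):
    """Single-linkage clustering: two markers end up in the same linkage group
    if they are connected by a chain of pairs with rf below the threshold."""
    parent = {m: m for m in markers}

    def find(m):
        while parent[m] != m:
            parent[m] = parent[parent[m]]   # path compression
            m = parent[m]
        return m

    for pair, value in rf.items():
        if value < threshold:
            a, b = tuple(pair)
            parent[find(a)] = find(b)

    groups = {}
    for m in markers:
        groups.setdefault(find(m), []).append(m)
    return sorted(groups.values(), key=len, reverse=True)

def count_groups_by_threshold(markers, rf, rf_min, rf_step, rf_max):
    """Number of linkage groups for a series of thresholds (min, step, max)."""
    steps = int(round((rf_max - rf_min) / rf_step))
    results = []
    for i in range(steps + 1):
        threshold = round(rf_min + i * rf_step, 10)
        results.append((threshold, len(cluster_markers(markers, rf, threshold))))
    return results

# Hypothetical recombination fractions for four markers.
markers = ["m1", "m2", "m3", "m4"]
rf = {
    frozenset(("m1", "m2")): 0.05, frozenset(("m2", "m3")): 0.18,
    frozenset(("m1", "m3")): 0.22, frozenset(("m3", "m4")): 0.30,
    frozenset(("m1", "m4")): 0.45, frozenset(("m2", "m4")): 0.40,
}
table = count_groups_by_threshold(markers, rf, 0.1, 0.1, 0.3)
print(table)
```

The resulting table of (threshold, number of groups) pairs is the kind of information from which a sufficiently small starting rfs can be chosen.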

To employ the foregoing idea efficiently, we propose a repeated clustering approach that includes (see the scheme below): (i) ordering the LGs obtained with the chosen value of rfs; (ii) replacing groups of tightly linked (non-recombining) markers by their most informative “delegates” (bin markers) that will further comprise the skeleton map; (iii) verifying (evaluating the reliability of) the ordered LGs using the re-sampling procedure (bootstrap or jackknife); (iv) removing the markers causing unstable neighborhoods in the map; (v) relaxing the clustering conditions by increasing rfs, finding candidate pairs of LGs that satisfy the end-to-end condition of merging, and merging such candidates.
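The end-to-end condition of step (v) can be expressed compactly. The sketch below assumes that each LG is already ordered and that pairwise rf values are available; the function names, the default rf of 0.5 for unlisted pairs, and the toy data are assumptions for illustration.

```python
def closest_pair(lg_a, lg_b, rf):
    """Smallest rf between a marker of lg_a and a marker of lg_b.
    Pairs missing from the table are treated as unlinked (rf = 0.5)."""
    return min((rf.get(frozenset((a, b)), 0.5), a, b) for a in lg_a for b in lg_b)

def is_terminal(marker, ordered_lg):
    """True if the marker is the first or the last marker of the ordered LG."""
    return marker in (ordered_lg[0], ordered_lg[-1])

def may_merge_end_to_end(lg_a, lg_b, rf, relaxed_threshold):
    """Permit merging only if the closest markers of the two LGs are closer
    than the relaxed threshold AND both of them are terminal."""
    value, a, b = closest_pair(lg_a, lg_b, rf)
    if value >= relaxed_threshold:
        return False
    return is_terminal(a, lg_a) and is_terminal(b, lg_b)

lg_a = ["a1", "a2", "a3"]
lg_b = ["b1", "b2", "b3"]

# Closest markers are the end of lg_a and the start of lg_b: merging is permitted.
rf_end = {frozenset(("a3", "b1")): 0.20}
# Closest marker of lg_a is interior: merging is forbidden.
rf_interior = {frozenset(("a2", "b1")): 0.15, frozenset(("a3", "b1")): 0.20}

print(may_merge_end_to_end(lg_a, lg_b, rf_end, 0.25))
print(may_merge_end_to_end(lg_a, lg_b, rf_interior, 0.25))
```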



1_16


The presented cycle can be repeated several times until further merging causes large gaps to appear in the LGs. It is noteworthy that the procedure can be considerably simplified if anchor markers are available. However, anchors should be chosen and used with caution, because a relatively high level of errors is characteristic of some published maps. We should also make some introductory remarks here on our multilocus ordering procedure. As noted above, the number of scored markers may exceed, by orders of magnitude, the number of markers that can be resolved by recombination for the given population size. Thus, with a population size of n~100, the minimum resolvable distance between markers is about 1 cM (roughly one recombinant per 100 individuals); for 1000 markers to be resolved, the map length of a chromosome would therefore have to be about 1000 cM, which is unrealistic for the vast majority of organisms. In other words, only a small portion of the markers (delegate markers) can be included in the skeleton map, with the remaining markers being attached to the delegates.

1_17

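Scheme of repeated clustering: first clustering (with the chosen rfs) → for each LG: ordering, choosing “delegates”, verification and removing the problematic markers → increasing rfs → candidate LGs for merging → merging the end-to-end pairs → repeating the treatment for the merged LGs → end of clustering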




1_18

Scheme of building the skeleton map of an LG: ordering → binding together tightly linked markers → choosing “delegates” → verification and removing the problematic markers → attaching removed markers to the skeleton map

Besides close linkage combined with small sample size, the necessity of selecting representative markers for the skeleton map derives from the varying information content of markers (co-dominant versus dominant, missing data, distorted segregation, and scoring errors), linkage between repulsion-phase dominant markers, and negative interference (Peng et al. 2000; Esch & Weber 2002; Korol et al., 2009). Using the MultiPoint tools, you start from a linkage group with hundreds of markers and conduct several analytical steps (see the scheme): (a) multilocus ordering; (b) binding together closely linked markers, followed by selecting “delegates” (bin markers) with the highest information content; (c) replacing the groups of tightly linked markers by their “delegate” markers; (d) repeated ordering and re-sampling verification of the reduced LG; (e) removing the markers causing unstable neighborhoods, and repeated ordering to get the skeleton map; (f) attaching previously removed markers to their best intervals on the skeleton map. The most difficult problem is in step (e). MultiPoint allows conducting this step automatically, but the user may instead choose an interactive analysis under their own control.




Input population file

Input panel

To demonstrate the diverse functions of the system, we have prepared examples for different population structures: Backcross, F2, RIL_selfing. The majority of the examples are based on simulated data.

After you start the program, you get its main window and main menu. To start working, choose the option <Open>; to finish, choose the option <Exit>. The option <Clear saved cluster> will be described later.

The option <Open> includes a few possibilities. We begin with <Population file>, which opens the window <Input data>.

Format of input data: each row describes one marker and includes the marker name and the marker scores, separated by space, comma, or tab.

We should first choose the population type in the <Type of population data> window (very limited in the first version of the package). For details about IRIL input see p. 3_7.

In the right part of the <Input data> window we will see the genotypes characteristic of the chosen population type (F2 in the presented example).

By pressing the <Select data file> button, we can choose the data file for mapping analysis.

1_19
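The row format of the population file can be illustrated with a small reader. This is a sketch under the assumption that fields are separated by spaces, commas, or tabs and that every score is a separate field; the function name and the sample rows are hypothetical, and this is not the reader used by MultiPoint.

```python
import re

def parse_population_text(text):
    """Return a list of (marker_name, scores) tuples, one per non-empty row.
    Fields may be separated by spaces, commas, or tabs."""
    markers = []
    for line in text.splitlines():
        fields = [f for f in re.split(r"[,\s]+", line.strip()) if f]
        if not fields:
            continue                      # skip empty rows
        markers.append((fields[0], fields[1:]))
    return markers

# Hypothetical F2 rows using the letter codes a, b, h and «-» for missing data.
sample = "mk1 a b h -\nmk2,b,b,a,h\nmk3\ta\t-\tb\th\n"
rows = parse_population_text(sample)
for name, scores in rows:
    print(name, scores)
```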


The input file should be in text format and have the extension *.txt (default) or *.chr. The system suggests opening or creating a folder for the data and mapping results. To create a new folder, choose the root folder (in our example it was Local Disk D) and press the buttons <Make New Folder> and <OK>. This will create a folder named <New Folder> that can be renamed by the user. Results of the analysis of different data sets can be stored in one folder or in different folders. If such a folder already exists, the user can choose it and press <OK>. In the current example this is the folder MultiPoint_Results.

A new sub-folder named Project_<name of the treated data file> (e.g., Project_chw7_5Name) will then be created in this folder. If you want to include your project in an existing folder, you should choose this folder (marked in blue). The project sub-folder will include the user’s mapping data after their control/correction, and then the intermediate and final results (as described in the corresponding sections of the tutorial). At this stage, the program tests whether data with the same name have already been treated in the chosen folder. The system of storing the data and the treatment results will be described later (p. 1_23, 1_73). If data with such a name have already been treated, you will get a corresponding message.

If you answer <Yes>, the old data are deleted, and any treatment results that were already stored are deleted as well. The answer <Yes> makes sense if you want to replace the old data by new data under the same name. If you answer <No>, the program reads the previously treated data and no control is needed.

1_20





During the initial data input, or the input of updated data under an old name (see p. 1_20), the program checks the correspondence of the codes to the population type. We consider the following codes as standard. For Backcross, Ril_Selfing, Ril_Sib-mating, and Doubled haploid: 1, 2 and 0 for missing data, or a, b (A, B) and «-» for missing data. For F2: 1, 2, 3, 4, 5 and 0, or a, b, h, d, c (A, B, H, D, C) and «-». If the codes are standard, they will be displayed in the window (here the input file, F2.txt, is for an F2 population).

To input the data and check the data file, press the <Input Data> button.

If your codes differ from the standard ones, you will get a message with a request to fill in the code table. Even if your codes are standard, any excessive symbols in the data will be considered wrong codes, and you will have to fill in the coding table. Alternatively, the excessive codes can be treated as missing values; for that, switch the check button <Control of data codes> to the state <Off>. It may also happen that your codes are standard but have another meaning (see the example).
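A check of the scores against the standard codes listed above might look as follows. The code sets are taken from the text; the function names and the spelling of the population-type labels used as keys are assumptions for illustration.

```python
# Population types with two genotype classes (labels are hypothetical keys).
TWO_CLASS_TYPES = ("Backcross", "RIL_Selfing", "RIL_Sib_mating", "Doubled haploid")

def standard_code_sets(population_type):
    """Standard numeric and letter code sets, including the missing-data code."""
    if population_type == "F2":
        return [set("123450"), set("abhdc-")]
    if population_type in TWO_CLASS_TYPES:
        return [set("120"), set("ab-")]
    raise ValueError(f"unknown population type: {population_type}")

def unexpected_codes(scores, population_type):
    """Codes that fit neither standard set (letters are compared in lower case).
    The set that explains the data best is used for the report."""
    observed = {s.lower() for s in scores}
    candidates = [observed - allowed for allowed in standard_code_sets(population_type)]
    return min(candidates, key=len)

print(unexpected_codes(["a", "B", "h", "-"], "F2"))      # all codes are standard
print(unexpected_codes(["a", "b", "h"], "Backcross"))    # h is not a Backcross code
print(unexpected_codes(["1", "2", "x"], "F2"))           # x is an excessive symbol
```

A non-empty result corresponds to the situation in which the code table has to be filled in, or the excessive symbols are treated as missing values.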

If the treatment results for these data were already saved, you will get a corresponding message. If you nonetheless answer <Yes>, the old treatment results will be deleted. If the answer is <No>, the system will warn you that you cannot continue the analysis. You should then leave the system and start again after changing the name of the data file or the name of the folder for your data.


1_21


Control and correction of errors in the data file

Two types of errors can be detected in the file: errors in genotypes and errors in markers in general.

The first type includes the following:

1. The data include symbols that differ from the defined ones. You can correct these, or automatically treat them as missing data by setting the button <Control of data codes> to the state <Off>.
2. In some markers, the codes are inconsistent, e.g., in a row with codes 2 and 4, codes 1 and/or 3 sometimes appear.

The second type includes the following:

3. Different population sizes for different markers.
4. Fully identical markers, with identical names and scores, appear twice.
5. Markers with identical names but different scores appear.

Messages with detailed information on the number and types of the detected errors are provided. The errors can be fixed independently of the program or by using the program as a tool.

In the first case, all errors are saved in a special file, error.txt, in the same folder where the user has saved the data and results.

In the second case, the errors are presented in the form of data tables and can be corrected within the program. In fact, errors of types 2 and 3 are difficult to correct within the program: it is not clear which symbol should be inserted, or how to replace a marker value that does not correspond to the chosen coding. Errors of types 4 and 5 can easily be corrected by the program.
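The marker-level errors (types 3, 4, and 5) are easy to detect mechanically. The following sketch assumes the data are held as (name, scores) pairs and takes the most frequent row length as the expected population size; both the function name and that convention are our assumptions, not a description of the program's internals.

```python
from collections import Counter

def find_marker_errors(markers):
    """Detect errors of types 3-5 in a list of (name, scores) pairs."""
    errors = {"size_mismatch": [], "exact_duplicates": [], "name_conflicts": []}
    if not markers:
        return errors
    # Assumption: the most frequent number of scores is the true population size.
    expected_size = Counter(len(scores) for _, scores in markers).most_common(1)[0][0]
    first_seen = {}
    for name, scores in markers:
        if len(scores) != expected_size:
            errors["size_mismatch"].append(name)          # type 3
        if name not in first_seen:
            first_seen[name] = list(scores)
        elif first_seen[name] == list(scores):
            errors["exact_duplicates"].append(name)       # type 4
        else:
            errors["name_conflicts"].append(name)         # type 5
    return errors

# Hypothetical backcross rows containing one error of each type.
data = [
    ("m1", ["a", "b", "a", "b"]),
    ("m2", ["a", "b", "a"]),            # shorter row: type 3
    ("m1", ["a", "b", "a", "b"]),       # same name, same scores: type 4
    ("m3", ["b", "b", "a", "a"]),
    ("m3", ["b", "a", "a", "a"]),       # same name, different scores: type 5
]
report = find_marker_errors(data)
print(report)
```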


1_22


Errors of type 5 can be corrected using the program, but a new name must be provided to replace the repeated one. Thus, if we have two markers of an F2 population with identical names but different marker scores, we use the <Change name> option to clear the name field and enter another name, e.g., Xgwm497b.

As a result, in the folder chosen (or created) by the user, a sub-folder named Project+name of data file will be created, and within it a sub-folder <Data> that will contain the corrected data file and the file of data codes; for data with anchor markers, this sub-folder will also include the file of anchor markers (see next page). If you later need to repeat the data input, it is easy to do so from this sub-folder: the data will be loaded automatically and will already be corrected.



1_23


Input of anchor markers

You may have anchor markers in your mapping problem. To enable working with anchors, switch the button <Anchor Data Exist> of the <Input data> window to the state <On>.

Then, during the input process, after pressing the button <Input Data>, the system will ask you to enter the name of the file with anchor markers.

During the input of this file, the program checks the correspondence of its markers to the other markers of the population. In case of inconsistency, the user will get an error report. The file is copied to the sub-folder <Data> of the project. The name of this file, together with the data file name, is displayed on the panel of the main clustering window. We will also use this example to show how to deal with data containing anchor markers.

The structure of the anchor file is as follows: the name of the marker, the number of the chromosome of the marker, and the number defining the order of the anchor marker among the other anchors of this chromosome. If the position of the anchor is not defined, the second number will be -1. The elements of this file are separated from each other by space or tab. The file should be in text format and be named *.txt. Among our example files, one is F2_anchor.txt for F2, for which anchor markers are provided in the file anchor.txt.

After input, the marker name is extended by its sequential number in the input file. Anchor markers are marked by an additional letter <A> on the left.
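The three-column structure of the anchor file, and the consistency check against the population data, can be sketched as follows. The function names, the dictionary keys, the assumption that chromosome numbers are integers, and the sample rows are ours.

```python
def parse_anchor_text(text):
    """Read rows of the form: marker name, chromosome number, order among anchors.
    An order of -1 means that the position of the anchor is not defined."""
    anchors = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        fields = line.split()             # separators: spaces or tabs
        if not fields:
            continue
        if len(fields) != 3:
            raise ValueError(f"row {line_number}: expected 3 fields, got {len(fields)}")
        order = int(fields[2])
        anchors.append({
            "name": fields[0],
            "chromosome": int(fields[1]),
            "order": None if order == -1 else order,
        })
    return anchors

def unknown_anchors(anchors, marker_names):
    """Anchor names that do not occur among the markers of the population."""
    known = set(marker_names)
    return [a["name"] for a in anchors if a["name"] not in known]

# Hypothetical anchor rows; mk9 is not present in the population data.
sample = "mk1 1 1\nmk2\t1\t2\nmk5 3 -1\nmk9 7 1\n"
anchors = parse_anchor_text(sample)
print(anchors)
print(unknown_anchors(anchors, ["mk1", "mk2", "mk3", "mk4", "mk5"]))
```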


1_24


Preliminary treatment

Analyzing markers and genotypes for missing level and segregation

1_25

The first window displays information about missing data and segregation distortion (χ²) of markers, and about missing data of genotypes. Markers can be sorted by missing level or by segregation distortion (as in the example). To delete markers or genotypes, select them and press the <Delete> button.
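The two marker statistics shown in this window can be computed as in the sketch below. It uses the usual Pearson χ² statistic against the expected Mendelian ratio (1:1 for a backcross, 1:2:1 for a codominant F2 marker); the function names and the missing-data code «-» as default are assumptions, and the program may compute these values differently.

```python
def missing_fraction(scores, missing="-"):
    """Share of missing scores of a marker."""
    return sum(s == missing for s in scores) / len(scores)

def chi_square(scores, expected_ratio, missing="-"):
    """Pearson chi-square for segregation distortion.
    expected_ratio maps each genotype code to its expected weight,
    e.g. {"a": 1, "b": 1} or {"a": 1, "h": 2, "b": 1}."""
    observed = [s for s in scores if s != missing]
    n = len(observed)
    if n == 0:
        return 0.0
    total_weight = sum(expected_ratio.values())
    chi2 = 0.0
    for code, weight in expected_ratio.items():
        expected = n * weight / total_weight
        chi2 += (observed.count(code) - expected) ** 2 / expected
    return chi2

# Hypothetical backcross marker: 30 a, 10 b, 10 missing.
bc_scores = ["a"] * 30 + ["b"] * 10 + ["-"] * 10
print(missing_fraction(bc_scores))                 # share of missing data
print(chi_square(bc_scores, {"a": 1, "b": 1}))     # strong distortion from 1:1

# Hypothetical F2 codominant marker in perfect 1:2:1 proportion.
f2_scores = ["a"] * 25 + ["h"] * 50 + ["b"] * 25
print(chi_square(f2_scores, {"a": 1, "h": 2, "b": 1}))
```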




1_26

We can undo the last step of deletion of markers or genotypes, according to the selected menu option. The menu option <Global Undo> allows returning to the initial data. After closing the window, you will get a question asking for confirmation of the deletion request. If the answer is <No>, the window is retained and you can use the Undo option. If the answer is <Yes>, the data are changed as requested, and a question about saving the deleted markers appears.

These markers may be moved to the Heap and be used in the future as attached markers, or saved in a special file, deletedMarkers.txt, in the project folder <Data>.




1_27

Warning: If you plan to input additional portions of data, do not delete genotypes! Otherwise, the population size for the second portion will not be equal to that of the already included data, and such a situation is considered an error.

The second window displays the markers sorted by “informativity” (the maximal value of the LODs for linkage of the corresponding marker to all other markers in the data set). You can delete the markers with low informativity.

You can employ the Undo option for the last step, or even start the analysis from the beginning by using <Global Undo>. After closing the window, the user will get the same questions as described on the previous page.
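For the simplest case, informativity can be illustrated as follows. The sketch assumes a backcross-type population with two genotype codes and known coupling phase, so that a recombinant is simply a pair of differing scores; the rf estimate is the share of recombinants, and the LOD is the log10 likelihood ratio against rf = 0.5. The function names and the toy data are hypothetical, and other population types require different likelihoods.

```python
import math

def recombination_lod(scores_1, scores_2, missing="-"):
    """Return (rf, LOD) for two markers of a backcross-type population."""
    pairs = [(x, y) for x, y in zip(scores_1, scores_2)
             if x != missing and y != missing]
    n = len(pairs)
    if n == 0:
        return 0.5, 0.0
    k = sum(x != y for x, y in pairs)      # number of recombinants
    r = k / n
    if r >= 0.5:
        return 0.5, 0.0                    # no evidence of linkage
    log_likelihood = 0.0
    if k > 0:
        log_likelihood += k * math.log10(r)
    if k < n:
        log_likelihood += (n - k) * math.log10(1.0 - r)
    return r, log_likelihood - n * math.log10(0.5)

def informativity(name, data):
    """Maximal LOD for linkage of the marker to all other markers."""
    return max(recombination_lod(data[name], other)[1]
               for other_name, other in data.items() if other_name != name)

# Hypothetical scores of three markers in ten individuals.
data = {
    "m1": list("aaaaabbbbb"),
    "m2": list("aaaaabbbbb"),
    "m3": list("ababababab"),
}
for name in data:
    print(name, round(informativity(name, data), 3))
```

In this toy data set m1 and m2 are completely linked and therefore highly informative, whereas m3 is nearly independent of both and would be a candidate for deletion.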


Defining the threshold recombination level

The first step is calculating pairwise recombination fractions (rf) for all pairs of markers (using the maximum likelihood estimation procedure). Then, the number of clusters (linkage groups, LGs) is evaluated and displayed as a function of the threshold (maximal) value of rf that allows a preliminary assignment of a marker to a certain LG: namely, marker mi may belong to LGj if recombination between mi and at least one marker from LGj is lower than the threshold rfs. The user can obtain a prediction of the number of LGs for a series of threshold rf values defined by setting the min, step, and max values of rfs. If anchor markers are available, the threshold rfs is increased until a critical level of rf is reached at which anchors from different chromosomes are “ready” to merge.

The system suggests conducting stepwise clustering to define a reasonable initial threshold value rfs. With a very large marker set, this procedure takes a lot of time, so the user may skip this step by answering “No” to the system’s question.

1_28


A corresponding message is displayed in this case, and the process is stopped. In fact, the last step is the one at which the anchor markers are not yet “ready” to fuse, whereas at the next step they could fuse if this were not forbidden (because they belong to different chromosomes). There might be situations in which LGs anchored by markers from different chromosomes tend to fuse already at the first step. In such cases, a smaller initial value of the threshold rf should be taken (and, possibly, a smaller step of changing the rf values).

In the absence of the file of anchor markers (the example in file F2.txt), the situation is different. But in both cases, using the thresholds rf = 0.2 and 0.25, we get 35 and 15 clusters, respectively.
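The stopping rule for data with anchors can be sketched as a scan over increasing thresholds that ends at the last value for which no cluster contains anchors of different chromosomes. The function names, the default rf of 0.5 for unlisted pairs, and the toy data are assumptions made for illustration.

```python
def single_linkage(markers, rf, threshold):
    """Merge groups while some pair of markers from two groups has rf below the threshold."""
    groups = [{m} for m in markers]
    merged = True
    while merged:
        merged = False
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                if any(rf.get(frozenset((a, b)), 0.5) < threshold
                       for a in groups[i] for b in groups[j]):
                    groups[i] |= groups.pop(j)
                    merged = True
                    break
            if merged:
                break
    return groups

def scan_until_anchor_conflict(markers, rf, anchor_chromosome, thresholds):
    """Return (last safe threshold, first conflicting threshold).
    A conflict is a cluster holding anchors of more than one chromosome."""
    last_safe = None
    for threshold in thresholds:
        for group in single_linkage(markers, rf, threshold):
            chromosomes = {anchor_chromosome[m] for m in group if m in anchor_chromosome}
            if len(chromosomes) > 1:
                return last_safe, threshold
        last_safe = threshold
    return last_safe, None

# Hypothetical data: anchors a1 (chromosome 1) and a2 (chromosome 2), plus marker x.
markers = ["a1", "x", "a2"]
rf = {frozenset(("a1", "x")): 0.10, frozenset(("x", "a2")): 0.28,
      frozenset(("a1", "a2")): 0.40}
anchor_chromosome = {"a1": 1, "a2": 2}
result = scan_until_anchor_conflict(markers, rf, anchor_chromosome,
                                    [0.15, 0.20, 0.25, 0.30])
print(result)
```

If the first element of the result is None, the conflict occurs already at the first step, which is the situation in which a smaller initial threshold should be taken.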



1_29


Using these histograms, we can choose a reasonable threshold value of rfs. We strongly recommend starting with a moderately low rfs value, to prevent the fusion of linkage groups that may belong to non-homologous chromosomes displaying quasi-linkage (pseudo-linkage) (see Korol et al., 1994; Peng et al., 2000). With a large number of markers, it is reasonable to choose an initial rfs such that the size of each cluster does not exceed 150-200 markers. Select the desired rfs in the left column of the list by clicking the left mouse button, and then press the button <Choosing of threshold>. The results will appear in the clustering window. We chose rfs = 0.25 and will show the window of clustering results for both our examples.

By double-clicking, we can now choose any level of clustering from the left column of the <Result of clustering> list: it will be displayed as a histogram of the distribution of clusters by number of markers. We can get such histograms for different steps. Thus, for step = 0.2 we have clusters of one, two, three, five, and eight markers (two clusters for each of these sizes), five clusters with 16 markers each, and one cluster for each of the remaining sizes. For step = 0.25 we have six clusters with 50 markers each, and one cluster for each other size.



1_30


The window for working with clusters

This is how the window looks when anchor markers are not available. The clusters (linkage groups) are denoted as LGi (ni), where i is the number of the cluster and ni is its size (number of markers). Note that the clusters are ordered by decreasing size.

1_31


This is how the window looks when anchor markers are available. In this case, the chromosome number defined by the anchor marker(s) is also indicated. In our example, two clusters have anchor markers that belong to chromosome #1, and two have anchors of chromosome #3. Some clusters have no anchors, and for the remaining clusters the anchors define one cluster per chromosome.


1_32


Controlling “bound together” markers

Some of the markers may be of special importance for the user (“priority markers”). Some priority markers can be identified by special symbols added to their names. These symbols can be defined in the window <Part of name to choose priority markers>. In the case of one combination of symbols, you can set these symbols directly, whereas several sets of symbols should be connected by “&” (see the example).

In addition, the user can denote priority markers while treating each linkage group (see p. 1_37). In the current version of the system, we employ the information on priority markers when dealing with the problem of tightly linked markers and choosing among them the so-called “bin” (or “skeleton”) markers. For that, we evaluate for each marker its missing (Miss) and segregation (Segr) levels and sort the markers according to the linear combination (A*Miss + B*Segr). Here the coefficients A and B (A + B = 1) are set equal by default, but the user can define other (unequal) weights by setting A (Missing) in the window <Coefficients of priority>. It is also necessary to set the <Minimum rf> value; markers that are closer than this value are considered “fused”. By default, this value is set to 0.0.

In our example of a data set with anchor markers (file F2_anchor), let us define one marker, r338, as a priority marker, and leave the other parameters of the <Setup for controlling bound together markers> window unchanged.


1_33


Setup for clustering

The left panel of the window includes the name of the data file, the population type, the population size, and the number of markers in the data file. For data with anchors, the name of the file with anchor markers is also provided. In the current version of the package, only the threshold recombination rate is employed as a criterion for clustering. If you have used the function Defining the threshold recombination level, the threshold value is chosen at the stage of preliminary treatment (see p. 1_30) to get the first step of clustering under relatively stringent conditions (resulting in a relatively large number of relatively small linkage groups). If this function has not been used at the previous stages of the analysis, the default value of the threshold recombination rate is 0.05; in that case you should define your own threshold value and press the button <Build Linkage Groups>.

To continue building the linkage groups, change the threshold value of rf and press the button <Build Linkage Groups>. Clearly, the higher the rf, the smaller the number of linkage groups. If you decrease rfs, the clustering starts from the beginning, whereas by increasing rfs you switch on the algorithm of repeated clustering.


1_34


Some additional service options

By using the option <Show> <Population data>, or the corresponding toolbar button, one can get the information on the entire population: all its markers and genotypes. However, this option is practical only for small problems. Due to technical limitations, no more than 550 markers can be shown on the display.


1_35


By using the option <Show> <Population data>, or the corresponding toolbar button, one can get the matrix of pairwise recombination fractions for all markers. However, this option is also practical only for small problems. Due to technical limitations, no more than 550 markers can be shown on the display.


1_36


Analysis and treatment of a separate linkage group

Defining groups of tightly linked (“bound together”) markers

We will take one of the clusters with anchor markers and demonstrate how to define and analyze groups of fusing markers. To choose a cluster, we double-click the left mouse button on the cluster’s name or icon. In the example, we selected cluster LG12, chr 7.

Choosing additional priority markers

In the list of its markers, the symbol A denotes the anchors and the symbol P the priority markers. To choose additional priority markers, we use the sub-option <Select of priority> of <Marker list options>.

Now we choose the desired marker with the left mouse button and then, by pressing the right button, get a prompt. We select the option <Create (or Undo) priority marker>. Correspondingly, the marker becomes a priority marker or, conversely, its priority status is cancelled. This procedure can be applied to several markers.

1_37


Pressing the <Control of bound together markers> button results in the question shown below. If the answer is <Yes>, the system will define groups of markers with rf < <Minimum rf>, a value set by the user for all clusters (see p. 1_32). The algorithm works as follows. For each marker of the cluster, the relative missing values and the χ² score for segregation distortion are scaled by their maximum values within the cluster, and then their linear combination with the <Coefficients priority> is calculated (see p. 1_33). Within each category (anchor, priority, codominant, and dominant (for F2)), markers are sorted by increasing value of this linear combination. Note that anchor markers are considered as having higher rank than priority markers, but the user can reverse this.

All markers are then combined in one set. Each marker of the set, starting from the first marker of highest priority, “establishes” a group of markers whose recombination distances to it are less than <Minimum rf>. Such a marker is referred to as the “delegate” marker of its group. Markers already included in groups established by higher-rank delegates are not considered for the groups of lower-rank delegates. If groups with delegates of equal rank share markers, each shared marker remains in the group whose delegate is closer to it. Thus, only delegate markers are retained in the cluster.
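The grouping procedure described above can be illustrated with a minimal Python sketch. It is not MultiPoint's own code: the `Marker` class, the equal weights standing in for <Coefficients priority>, and the `rf` dictionary keyed by marker pairs are all assumptions made for illustration. For brevity, a shared marker simply goes to the first delegate that claims it, so the "closest delegate" rule for delegates of equal rank is not reproduced.

```python
from dataclasses import dataclass

# Category ranks: lower value = higher rank (anchors before priority markers by default).
RANK = {"anchor": 0, "priority": 1, "codominant": 2, "dominant": 3}

@dataclass
class Marker:            # hypothetical container, not a MultiPoint structure
    name: str
    kind: str            # one of the RANK keys
    missing: float       # fraction of missing genotypes
    chi2: float          # chi-square score for segregation distortion

def quality_scores(markers, w_missing=0.5, w_chi2=0.5):
    """Scale missing level and chi2 by their cluster maxima and combine them linearly."""
    max_miss = max(m.missing for m in markers) or 1.0
    max_chi2 = max(m.chi2 for m in markers) or 1.0
    return {m.name: w_missing * m.missing / max_miss + w_chi2 * m.chi2 / max_chi2
            for m in markers}

def bound_together_groups(markers, rf, min_rf):
    """Greedy grouping: each still unassigned marker, taken in rank order, becomes a delegate."""
    score = quality_scores(markers)
    ordered = sorted(markers, key=lambda m: (RANK[m.kind], score[m.name]))
    assigned, groups = set(), {}
    for m in ordered:
        if m.name in assigned:
            continue
        members = [o.name for o in ordered
                   if o.name != m.name and o.name not in assigned
                   and rf[frozenset((m.name, o.name))] < min_rf]
        groups[m.name] = members          # delegate -> markers that would go to Heap
        assigned.update(members, [m.name])
    return groups

# Toy cluster of four markers (invented values).
markers = [Marker("A", "anchor", 0.00, 1.0), Marker("B", "codominant", 0.10, 2.0),
           Marker("C", "codominant", 0.05, 0.5), Marker("D", "dominant", 0.20, 4.0)]
pairs = {("A", "B"): 0.005, ("A", "C"): 0.20, ("A", "D"): 0.30,
         ("B", "C"): 0.25, ("B", "D"): 0.30, ("C", "D"): 0.004}
rf = {frozenset(p): r for p, r in pairs.items()}
groups = bound_together_groups(markers, rf, min_rf=0.01)
print(groups)
```

In this toy cluster the anchor A becomes the delegate for B, and C (the better of the two codominant markers) becomes the delegate for D.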


Defining groups of tightly linked (”bound together”) markers (continued)

1_38

This function can also be applied to all data immediately after data input; see p. 2_2.


The remaining markers are moved from the cluster to the “Heap” set. These markers do not participate in further map ordering. The system reports the number of groups of bound together markers and the number of markers retained in the cluster.

The names of delegates are marked by symbol S (or AS for anchors and PS for priority markers). Such a marker can be chosen with the right mouse button, or with the function <Change delegate marker> of the <Marker list option> of the main menu. This displays a table of all markers associated with the chosen group, with their missing level, segregation, and distance to the delegate marker. A red rectangle indicates that the marker is dominant (dominant repulsion-phase markers are marked in blue and codominant markers in green).

You can cancel the established groups within the cluster by using the button <Control of bound together markers> and answering <Yes> to the question that appears.


Defining groups of tightly linked (”bound together”) markers (continued)

1_39


We recommend treating each LG after the initial clustering and only then continuing the clustering procedure. In other words, to reach reliable results and reduce the danger of combining non-syntenic loci in one LG due to the “quasi-linkage” (or “pseudo-linkage”) phenomenon (see Korol et al., 1994; Peng et al., 2000), building and ordering of the LGs should be considered as interacting procedures.

To demonstrate the treatment of a separate cluster, we will use the example from data file BC.txt, which allows us to deal with diverse situations. After the input and primary clustering (no marker or genotype deletion was conducted), let us choose a threshold rfs = 0.25. The following picture will be obtained:

Let us choose cluster LG9 for further illustrations.


Additional example

1_40


We employ the option of controlling bound markers and find 3 groups of such markers. The “delegates” of these groups are marked by symbol S. We can see that after removing the bound together markers (3 markers were removed) but retaining the delegates, the cluster includes 24 markers. These three groups can be analyzed as shown before.

Now we can move to the process of multilocus ordering.


Additional example (continued)

1_41


The procedure of multilocus ordering

To start ordering, choose the menu option <Ordering> or use the corresponding toolbar button. The ordering algorithm is based on minimizing the total length of the multilocus map of the linkage group. The problem is solved on the initial data set and on re-sampled sets, in order to test the stability of the obtained order. The number of such sets is defined by the user in the parameter <Number of iteration> (10 by default). Re-sampling can be conducted using the Bootstrap or Jackknife approach (only the second is implemented in the current version of the package). Parameter <Population for Jackknife> defines the part of the total population (in %) sampled at each run (90% by default). The results of the first iteration define the ordering that is used as the “reference” against which all other iterations are compared. Parameter <Time to Es> defines the maximum time allowed for searching for the multilocus order in each iteration. By default, it is defined as a simple function of the number of markers in the cluster. All these parameters can be changed by the user.
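To make the idea of ordering by minimal map length combined with jackknife re-sampling concrete, here is a small Python sketch. It rests on several assumptions that are not taken from MultiPoint: backcross genotypes coded 0/1 with no missing data, an exhaustive search in place of the package's time-limited optimization (feasible only for a handful of markers), and a reference order computed from the full data set. All function names are hypothetical.

```python
import itertools
import random

def rf_matrix(genotypes):
    """Pairwise recombination fractions for backcross data (rows = individuals, 0/1 codes)."""
    n_ind, n_mk = len(genotypes), len(genotypes[0])
    return [[sum(row[i] != row[j] for row in genotypes) / n_ind
             for j in range(n_mk)] for i in range(n_mk)]

def map_length(order, rf):
    """Total map length: sum of rf over adjacent markers in the given order."""
    return sum(rf[a][b] for a, b in zip(order, order[1:]))

def best_order(rf):
    """Exhaustive search for the shortest order (each order and its reverse counted once)."""
    n = len(rf)
    candidates = (p for p in itertools.permutations(range(n)) if p[0] < p[-1])
    return min(candidates, key=lambda p: map_length(p, rf))

def jackknife_orders(genotypes, n_iter=10, fraction=0.9, seed=0):
    """Reference order from the full data, then one order per re-sampled subset."""
    rng = random.Random(seed)
    reference = best_order(rf_matrix(genotypes))
    k = round(fraction * len(genotypes))
    runs = [best_order(rf_matrix(rng.sample(genotypes, k))) for _ in range(n_iter)]
    return reference, runs

# Toy data: 30 individuals, each with at most one crossover along 5 markers.
perm = [2, 0, 4, 1, 3]      # column c holds the marker at true map position perm[c]
genotypes = [[int(perm[c] >= b) for c in range(5)]
             for b in range(6) for _ in range(5)]
reference, runs = jackknife_orders(genotypes)
print(reference, all(r == reference for r in runs))
```

With these clean toy data every jackknife run recovers the reference order; with real data, the runs that disagree point to the unstable neighborhoods discussed later in this section.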

For data with anchor markers, if the order of anchors is known and indicated in the input data, a special check box <Taking into account order of anchors> will appear on the panel <Setup for ordering>. It will be in state <On>, meaning that the preset order of the anchors is taken into account. If you change the state to <Off>, the ordering will be conducted ignoring the preset order of anchors.


1_42


After ordering is finished, a grid table and a graphical display of the LG will appear in

the window. The table shows the effect of variation of the recombination estimates

caused by re-sampling on the local stability of the map. It includes also the

information on missing data and segregation ratios. The graph of the LG includes

cluster name, its length, and rf values for adjacent markers. If the rf exceeds the

threshold value, it will be highlighted in red. Likewise, the anchors, priority markers,

and “delegate” markers are indicated by special symbols.


The procedure of multilocus ordering (continued)

1_43


Options of the table of ordered markers

Some simple service functions are available in this section to facilitate the analysis. For any chosen marker, the user can get a table of its rf values with other markers of the ordered LG. Based on this information and/or the results of ordering displayed in the grid table, you may want to remove this marker (in fact, deletion can be conducted for a single marker or simultaneously for a set of markers). After the marker(s) are chosen using the left mouse button, you can do that using the menu of the table, or the prompt menu called by pressing the right mouse button.

Visualizing the distance table

In the considered option, the distance table is displayed for any one chosen marker. It can show the rf values between the chosen marker and all other markers (using option <All markers>). In this case, you may need to use the scroll bar, which may be time consuming if the number of markers in the LG is relatively high and you employ this option many times. Alternatively, you can display only the distances to its nearest 8 markers on each side (using option <Nearest marker>).


1_44


For delegate markers there is an additional option: <Change delegate marker>. After choosing the delegate marker, you’ll get this option in addition to the previous ones. In the considered example, the chosen marker is the first one in the ordering, hence only the <Display all marker distance> option is possible for displaying its pairwise distances.

After <Change delegate marker> is selected, you get a window with a list of all markers of the group “represented” by the delegate marker. By selecting any marker of this group with the left and then the right mouse button, you can obtain the table of its distances (rfs) to all other markers of the group, or replace the delegate. In the latter case, the ordering of the LG is initiated again to take into account the new marker participating in the multilocus map. Consequently, updated versions of the grid table of ordered markers and the LG graph are displayed.

Options for delegate marker

1_45

Options of the table of ordered markers (continued)



To delete markers from an LG you can use the menu function <Delete marker> or the first option of the prompt menu called by pressing the right mouse button. You can choose several markers by using the keyboard keys <Ctrl> or <Shift>. After the selected markers are deleted, the system automatically moves to re-ordering of the LG, followed by output of a new grid table and LG graph. Simultaneously, the <UnDo> option becomes available.

Such an operation can be conducted several times. At the bottom of the window you’ll get a list of deleted markers numbered according to the order in which they were deleted from the LG.

Using menu option <UnDo> you can recover the deleted markers. This can be done starting from some step. Namely, by choosing the number of a deleted group in the list, you can recover the markers of this group and those deleted after this group. Thus, pressing <UnDo> after the choice shown in the list below, we can return to the LG all markers starting from marker178(193) to the end of the list.

Deleting markers

Anchor marker(s) can also be deleted, but in such an attempt the system displays a warning message. As indicated earlier, all deleted markers are moved to a group referred to as <Heap>; they do not participate in further clustering (if needed) and ordering and can appear in the map only as attached markers.

1_46




After the introduction to the service tools that help in analyzing the clusters (linkage groups), we can describe the algorithm of analysis. The following steps and actions aim to utilize the available information for excluding from the map markers that cause (1) unstable neighborhoods and (2) unreasonable map extension. Clearly, by removing markers that cause map extension, we actually deal with double recombinants. Their appearance over small distances may be caused by both negative interference (e.g., Peng et al., 2000) and errors in marker scoring. The “cleaning” process has not yet been implemented in full, although some functions are already available. For example, automatic “cleaning” of the map from closely linked markers to get a stable skeleton map is conducted by pressing the button <Control of bound together markers>, which allows deleting markers with minimal ranking. The results are shown on p. 1_39. The verification process based on re-sampling procedures (jackknife or bootstrap) reveals unstable local neighborhoods and hence candidate markers that may cause such instability. Crude, approximate information about unstable neighborhoods can be obtained with just 10-20 jackknife runs. A formal objective of cleaning is to get a map with minimal deviation of the left-side and right-side neighborhoods from the 1-1 double diagonal in the grid table (expected under perfect ordering). An ideal 1-1 pattern indicates that sampling variation among the jackknife runs does not affect the results of multilocus ordering. One may relax the stability requirement and, instead of an ideal ordering (1-1 along the “double diagonal”), be satisfied with probabilities ≥0.9.
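The notion of neighborhood stability can be sketched in Python as follows. This is a simplified stand-in for the grid table, not its actual layout: for every pair of markers adjacent in the reference order, it reports the share of re-sampling runs in which the two markers remain adjacent (in either orientation). The function names and the toy orders are assumptions for illustration.

```python
from collections import Counter

def neighbor_frequencies(reference, runs):
    """For each adjacent pair of the reference order, the share of runs keeping it adjacent."""
    counts = Counter()
    for order in runs:
        for a, b in zip(order, order[1:]):
            counts[frozenset((a, b))] += 1      # unordered pair: orientation is ignored
    return {(a, b): counts[frozenset((a, b))] / len(runs)
            for a, b in zip(reference, reference[1:])}

def unstable_pairs(freqs, min_prob=0.9):
    """Adjacent pairs whose neighborhood probability falls below the accepted level."""
    return [pair for pair, p in freqs.items() if p < min_prob]

# Toy example: 10 jackknife runs, two of which swap the last two markers.
reference = ["m1", "m2", "m3", "m4"]
runs = [reference] * 8 + [["m1", "m2", "m4", "m3"]] * 2
freqs = neighbor_frequencies(reference, runs)
print(freqs)
print(unstable_pairs(freqs))
```

Here the pair (m2, m3) is kept in only 8 of 10 runs, so it falls below the relaxed 0.9 level and marks the neighborhood that deserves a closer look.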

We now describe the steps of the algorithm for cleaning up the LG from problematic markers. First, we should check whether the automatically chosen optimization time is sufficient for convergence. For that, we can repeat the ordering procedure several times with the same parameter <Time to Es>. If the same order is obtained, we can conclude that the chosen optimization time is sufficient and we can start “cleaning”.

1_47




We’ll start by deleting markers that violate the monotonic increase of rfs (i.e., that deviate from the expected increase of rf between a marker and its subsequent neighbors). The algorithm detects such markers automatically. By pressing the button <Control of monotony>, you start the process of detecting and removing such markers. At the end of this process a message appears indicating how many markers were deleted.
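The monotony criterion can be illustrated with a short Python sketch. The exact rule used by <Control of monotony> is not documented here, so the following is only one plausible reading: for each marker, the rf to a nearer neighbor in the map should not exceed the rf to the next, farther neighbor on the same side. The `window` and `tol` parameters and the function name are assumptions.

```python
def monotony_violations(order, rf, window=3, tol=0.0):
    """Triples (marker, nearer, farther) where rf to the nearer neighbor exceeds rf to the farther."""
    violations = []
    n = len(order)
    for i in range(n):
        for step in (1, -1):                      # look down the map, then up the map
            for k in range(1, window):
                j, j2 = i + step * k, i + step * (k + 1)
                if not (0 <= j < n and 0 <= j2 < n):
                    break
                a, near, far = order[i], order[j], order[j2]
                if rf[a][near] > rf[a][far] + tol:
                    violations.append((a, near, far))
    return violations

# Toy ordered LG in which rf(m1, m3) is inflated relative to rf(m1, m4).
order = ["m1", "m2", "m3", "m4"]
pairs = {("m1", "m2"): 0.10, ("m1", "m3"): 0.35, ("m1", "m4"): 0.30,
         ("m2", "m3"): 0.10, ("m2", "m4"): 0.20, ("m3", "m4"): 0.10}
rf = {m: {} for m in order}
for (a, b), r in pairs.items():
    rf[a][b] = rf[b][a] = r                       # symmetric lookup table

violations = monotony_violations(order, rf)
print(violations)
```

The sketch only reports violating triples. Deciding which marker of a triple to delete (for example, the one with the higher missing level or the more distorted segregation) is left to the user, as in the manual procedure described in this section.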

In many cases, after one cycle of such cleaning the resulting ordering does not satisfy you (e.g., the probabilities on the diagonal are less than 0.9). You can continue cleaning (removing markers) as will be shown on the next page. Alternatively, you may cancel the results of automatic cleaning by using <Undo> and analyze the situation manually, step by step.

1_48

See also additions on p. 2_4–8




Among markers with strong deviation from 1 on the diagonal, we may choose the marker(s) with the highest missing level and the most distorted segregation. After deleting this marker we can see a clear improvement, manifested in increased values of probabilities along the double diagonal. The name of the deleted marker is shown in the window below the grid table. In case of an unsuccessful choice, you can cancel the deletion by using the <Undo> option.

After deletion, we recommend repeating the control of markers for deviation from monotony. For that, we should again press the button <Control of monotony>. In the considered example, one marker was deleted; it is displayed in the table of deleted markers. It can be marked there and returned by using <Undo> (and this is what we will do).

To close the table of ordered markers we can press the corresponding button.

1_49




After closing the table of ordered markers, we

return to the single LG window that includes the

scheme of the ordered LG, list of its markers,

number of markers and number of deleted

markers. Information about

the number of markers moved

to the Heap set from current

LG is also provided together

with total number of markers

in Heap.

To close this window we can use the button

This brings us to the window where all the clusters

are presented. Ordered LGs are presented as

1_50




Further clustering and treatment of merged clusters

After each of the clusters with ≥ 3 markers has been treated, we return to the window “treatment of all clusters”. It now makes sense to increase the threshold value rfs. For our example, let us increase rfs from 0.25 to 0.27.

Now we proceed with a special algorithm that tests different pairs of clusters for the possibility of merging. Consider a pair of clusters Cm and Cn. All pairs of markers mki-mnj are tested. If the pair with minimal rf(mki, mnj) consists of markers distal in their clusters and this minimum is less than the relaxed threshold value of rf, the clusters will fuse, provided they do not include anchors from different chromosomes.

Clearly, clusters with markers anchoring different chromosomes cannot be merged by definition. If at least one of these two closest markers is interior in its cluster, we analyze the “tentatively” combined cluster after its ordering. If after ordering rf(mki, mnj) is less than or equal to the relaxed threshold value of rf, the clusters will fuse. If rf(mki, mnj) is higher than 1.5 times the relaxed threshold value rfs, fusing is forbidden. And if rf(mki, mnj) is between these two values, the decision is left to the user (visual analysis).

Pressing the button <Build LinkageGroups> initiates the clustering process. If cluster merging depends on the user’s decision (i.e., if rfs < rf(mki, mnj) < 1.5 rfs), the following message will appear:

After pressing <OK> a new window will appear (see next page) with the names of the two closest markers from the indicated two clusters (#5 and #6) that fit the condition rf(mki, mnj) < rfs.
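The merging rules for a pair of clusters can be summarized as a small decision function. This Python sketch is only a restatement of the rules in the text. The function name, its arguments, and the returned labels are hypothetical, and the "no merge" outcome for distal markers whose minimal rf is not below the threshold is an assumption.

```python
def merge_decision(rf_min, both_distal, rfs, rf_after_ordering=None, anchor_conflict=False):
    """Decide the fate of two clusters from the rf between their two closest markers.

    rf_min            -- minimal rf between markers of the two clusters
    both_distal       -- True if both closest markers lie at the ends of their clusters
    rfs               -- relaxed threshold rf
    rf_after_ordering -- rf between the same markers after ordering the tentatively
                         combined cluster (needed when a closest marker is interior)
    anchor_conflict   -- True if the clusters carry anchors of different chromosomes
    """
    if anchor_conflict:
        return "forbidden"
    if both_distal:
        return "merge" if rf_min < rfs else "no merge"
    if rf_after_ordering is None:
        raise ValueError("order the tentatively combined cluster first")
    if rf_after_ordering <= rfs:
        return "merge"
    if rf_after_ordering > 1.5 * rfs:
        return "forbidden"
    return "ask user"                     # rfs < rf <= 1.5 * rfs: visual analysis

# Threshold from the example in the text: rfs = 0.27, so 1.5 * rfs = 0.405.
print(merge_decision(0.20, True, 0.27))
print(merge_decision(0.20, False, 0.27, rf_after_ordering=0.30))
print(merge_decision(0.20, False, 0.27, rf_after_ordering=0.45))
```

The middle call falls between rfs and 1.5 rfs, which is the case in which the program asks the user to decide.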


1_51


The distance between the closest markers also appears in the window. If we press the button <Display clusters>, the figures of three LGs will appear: the two old ordered groups and the newly ordered group obtained by merging them. The markers that displayed the minimal rf before merging are highlighted in bold font in the three groups. Near the name of each LG we can see the sum of the recombination fractions taken over all its consecutive intervals.

We recommend refusing to merge the groups (by answering <No>) under either of two conditions: (1) one (or both) of the bold markers is relatively far from the end of its LG (separated from the end by more than one marker), or (2) the a posteriori distance between the two merged groups considerably exceeds the aforementioned distance between the bold markers. In the discussed example the reasonable answer is, of course, <No>. After the question on this pair of clusters is answered, the window is closed and the clusters are merged or not (depending on the answer).

1_52

Further clustering and treatment of merged clusters (continued)



Usually, clusters obtained from merging are ordered very easily. The described clustering process should be continued until the number of LGs coincides with the number of chromosomes, or until rfs has reached a certain user-defined maximum level (say, 0.30 or 0.35).

As a result, the following pattern of clustering will be obtained. We see 8 clusters; two LGs appear with a changed (mosaic) coloration, telling us that they resulted from the fusion of smaller clusters.

We now need to clean these 2 clusters. There is no need here for control of bound together markers (already conducted earlier). Let us open one of these two, e.g., LG8, and conduct its ordering accompanied by re-sampling analysis. In this example, two markers were deleted at the step of control of monotony and two more were deleted to achieve neighborhood stability (values of probabilities along the double diagonal).

To return to the previous step of clustering, the <Undo> option can be employed.

1_53

Further clustering and treatment of merged clusters (continued)



To get the LG map length in cM, we should choose <Metric length> in the LG title, choose the needed mapping function in the window that appears (e.g., <Haldane>), and press <OK>.

Then the map length of the LG and the marker distances will be shown in cM. In the table, the distances were shown as recombination fractions. To return to this presentation, we should again enter the selection window in the LG title and select the <Recombination> option.

When needed, the map can be printed and/or the information about the map can be saved as an Excel table. For that, we choose option <Printing> and the needed options of the described window of map distance options. Note that in printing mode, an additional option <Summary space> appears. It outputs the map positions of the markers instead of the interval lengths. For more details about output see p. 1_74.
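For reference, the conversion from recombination fractions to map distances, and from interval lengths to cumulative map positions, can be sketched in Python. The Haldane formula d = -50 ln(1 - 2r) cM and the Kosambi formula d = 25 ln((1 + 2r)/(1 - 2r)) cM are the standard definitions, valid for r < 0.5. Whether the program's selection window offers Kosambi in addition to <Haldane> is an assumption here, and the function names are illustrative.

```python
import math

def haldane_cm(r):
    """Haldane mapping function: recombination fraction r (< 0.5) to centimorgans."""
    return -50.0 * math.log(1.0 - 2.0 * r)

def kosambi_cm(r):
    """Kosambi mapping function: recombination fraction r (< 0.5) to centimorgans."""
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))

def map_positions(interval_rfs, mapping=haldane_cm):
    """Cumulative marker positions (cM) from the rf values of consecutive intervals."""
    positions = [0.0]
    for r in interval_rfs:
        positions.append(positions[-1] + mapping(r))
    return positions

# Three markers separated by two intervals of rf = 0.10 each.
positions = map_positions([0.10, 0.10])
print([round(p, 2) for p in positions])
print(round(kosambi_cm(0.10), 2))
```

The first marker is placed at 0 cM and each following marker at the running sum of the converted interval lengths, which corresponds to showing positions rather than interval lengths.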

1_54

Representing the map of an LG


Note: For better-quality visualization of the constructed genetic map, the publicly available software MapChart (Voorrips, 2002) can be used: https://www.wur.nl/en/show/Mapchart.htm


For population of F2 type, the marker types are denoted by colors: red and blue

for the two types of dominant markers and green for co-dominant markers.

If the function <Control of bound together markers> has not been applied, the map of the LG may include

markers with distance 0.0. Such markers are drawn in one line.

The form of the graph in some specific cases

1_55

Representing the map of an LG (continued)



During the treatment of the LG, some markers were moved from it to the Heap set, which does not participate in further clustering. The Heap set will also include new markers that may be added to the problem after the main ordering process was finished (see p. 1_88), as well as markers of small clusters moved to Heap (see p. 1_69). Markers from the Heap group can be added to the LG by using one of the sub-options of the menu option <Extending the linkage group>. There are two options for adding markers to the skeleton map: <insert marker(s)> and <attach marker(s)>.

Why are these functions important? Adding markers to the map makes sense despite the fact that these markers were previously removed from the map in order to prevent their disturbing effect on the quality of multilocus ordering. Indeed, in many cases, the user may want to know the positions of these markers (genes, ESTs, SNPs, etc.) relative to the skeleton markers.

After the first of the two sub-options (or the corresponding toolbar button) is chosen, we’ll get a list of markers from the Heap set. This list is prepared as follows: all markers from Heap are subdivided into groups according to their closeness to each cluster, and for the current LG its group is provided for the further adding steps.

This list displays the marker name, its missing and segregation characteristics, the name of the closest marker, and the distance to it. It also shows the predicted length of the LG after this marker is inserted above or below its closest marker in the LG.

1_56

(In the latest versions of MultiPoint, the form of the marker list is slightly modified; see p. 2_11.)

Adding markers



From the aforementioned list we can select the desired marker with the left mouse button and then choose one of the four possibilities using the right mouse button or <Options for additional markers> from the main menu window.

In this example, we have chosen <marker117>. Please note that on the LG’s graph the marker locus closest to the selected marker is shown in bold font. If one of the first two menu options is chosen, the selected marker will be placed near the marked one. If we choose option <Insert up nearest marker>, the added marker will appear in the LG marker list and in the LG graph, marked by underlining. If one of the last two menu options is chosen, we should indicate with the mouse button the marker from the list, e.g., as explained below. We choose <marker133> and put it above <marker135> by using option <Insert up the marker chosen by user>.

Note that no additional ordering is conducted in this case: the marker is placed at the chosen position on the skeleton map and removed from the list of added markers.

As the result we will get:

Adding markers (continued)


1_57


The user can delete an earlier added marker or any other marker of the LG. This function is complementary to the function of adding markers and is activated only if the function <Extending the linkage group> <insert marker(s)> was previously chosen and the list of added markers is displayed on the screen. To delete a marker, select it from the marker list of the LG using the left mouse button, and then use option <Delete chosen marker> from the menu <Marker list options>. There is also another possibility: a right mouse button click on the selected marker opens the prompt menu, from which the delete option can be chosen. The selected marker will be moved to the Heap set without re-ordering the LG, but with updating of the LG graph, the list of its markers, and the list of added markers.

Note that for each marker of the LG it is always possible to get the list of its distances (recombination fractions) to all other markers of the LG.

Deleting markers


1_58


Attaching markers

This is the second possibility of the option <Extending the linkage group>, called via <attach marker(s)> or by pressing the corresponding toolbar button.

It allows extending the LG, but not the skeleton map, by markers that are closely linked to markers of the skeleton map. This function may be useful at the final stages of analysis, when the skeleton maps for all LGs are already finished and many “excessive” markers remain in Heap. The user may be interested in placing these remaining markers relative to the skeleton markers. The window for this option looks exactly the same as in the previous option, and the list of attached markers is prepared in the same way as the list of added markers (although it looks a bit different).

In the list of markers to be attached we can see the markers’ characteristics: missing and segregation. The system reminds the user to choose one of two possible methods of attachment: choosing either the best interval for each marker, or the markers that correspond to a user-selected interval (e.g., if it is a gap on the LG map). In the first case, the user selects the markers he/she needs. After <Options for attaching> of the main menu is selected (or the right mouse button is pressed), a question about the calculation method appears. Currently, only the <Interval-length method> is implemented.


1_59


According to the complementary way of attaching markers to the skeleton map, the user can choose the interval for which he/she wants to find all suitable candidates from the list of added markers. The interval is marked in red. The algorithm of the currently available <Interval-length method> first searches for the “optimal” interval of each marker (as described above) and then selects the markers for which the marked interval was the solution (if at least one such marker was found).

The idea of this method is very simple and derives from the main criterion employed in this package for multilocus map ordering. Namely, for each chosen marker, the chosen interval corresponds to the minimum increase in the number of recombination events.
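The interval choice can be sketched in Python under a simplifying assumption: the increase in recombination events caused by placing marker m into the interval (a, b) is approximated by rf(a, m) + rf(m, b) - rf(a, b). The function names and the rf lookup table are hypothetical, and placing a marker beyond the ends of the skeleton map is not considered.

```python
def best_interval(marker, skeleton, rf):
    """Return (upper, lower, increase) for the skeleton interval that lengthens the map least."""
    best = None
    for upper, lower in zip(skeleton, skeleton[1:]):
        increase = rf[marker][upper] + rf[marker][lower] - rf[upper][lower]
        if best is None or increase < best[2]:
            best = (upper, lower, increase)
    return best

def candidates_for_interval(interval, heap_markers, skeleton, rf):
    """Heap markers whose optimal interval is the one selected by the user."""
    return [m for m in heap_markers if best_interval(m, skeleton, rf)[:2] == interval]

# Toy skeleton map of three markers and two Heap markers (invented rf values).
skeleton = ["s1", "s2", "s3"]
pairs = {("s1", "s2"): 0.20, ("s2", "s3"): 0.20, ("s1", "s3"): 0.40,
         ("x", "s1"): 0.15, ("x", "s2"): 0.05, ("x", "s3"): 0.25,
         ("y", "s1"): 0.30, ("y", "s2"): 0.10, ("y", "s3"): 0.10}
rf = {m: {} for m in skeleton + ["x", "y"]}
for (a, b), r in pairs.items():
    rf[a][b] = rf[b][a] = r                 # symmetric lookup table

print(best_interval("x", skeleton, rf)[:2])
print(candidates_for_interval(("s2", "s3"), ["x", "y"], skeleton, rf))
```

The first function corresponds to choosing the best interval for each marker, and the second to selecting the markers that fit a user-selected interval.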

To indicate the intervals with attached

markers in the main list of markers of

the considered LG, the upper marker for

each such interval is marked by symbol

“G”; if such upper marker is simultaneously

a “delegate” of a group of tightly linked

markers then its symbol will be “SG”.

Attaching markers (continued)


1_60

(In the new version the list of attached markers is provided in a slightly modified form; see p. 2_11.)


For a marker with the symbol G, we can use <Marker list options> of the main menu (or the prompt menu obtained by pressing the right mouse button) to select one of several options:

The first is to get the table of recombination distances from this marker to other markers of the LG. The next two options display full information about the markers attached to this interval or to the whole LG, respectively. The last two options return to the Heap set either the group of attached markers of the current interval (the chosen marker is the first flank of this interval) or all attached markers of the current LG. These options are available if the menu option <Extending the linkage group> <attach marker(s)> is activated and, therefore, the list of candidate markers for attachment is shown on the screen (this list will change after these options are applied).

For a delegate marker with the symbol SG, an additional option is available that allows analyzing the delegate and replacing it by another marker (option <Change delegate marker> described on p. 1_39).



1_61


An important note: We have already mentioned that marker attachment to the skeleton map is considered one of the final stages of mapping. If, nevertheless, after the attachment the user is going to conduct ordering again, add markers, or divide the LG into sub-groups, a message will appear about the necessity of returning all attached markers to Heap. If the clustering is continued, all the clusters will be checked for the presence of attached markers, and the corresponding message will appear:

If the answer is <Yes>, all attached markers are returned to Heap; with <No>, the chosen function will not be conducted.

For printing the map with attached markers, the option <Printing> on the map of the LG should be chosen, and then the needed options in the window that appears. In the figure we see the skeleton markers on the left side and the attached markers on the right side.



1_62


The user may encounter situations in which, after clustering and ordering, the resulting LG includes one or more long intervals (gaps). It may be desirable to cut such an LG into sub-groups. This can be easily done by using the <Division of the linkage group> option or the corresponding toolbar button.

Before the division procedure can be started, the LG should be ordered and then the ordering table should be closed. In the list of markers you can mark a contiguous block of markers using the <Shift> key (if the division option was not activated, choosing several markers simultaneously is impossible). This simultaneous choice can be cancelled by selecting one arbitrary marker. Pressing the right mouse button opens the option of selecting and creating a new cluster. Correspondingly, the number of markers in the remaining LG will decrease, the marker list will be updated, and in the graph of the LG the selected markers will be re-drawn.

As a result, the LG will be dissected into a few parts. If the LG included attached markers before dissection, these attachments are moved to Heap before dissection, and this change is accompanied by a system message.

1_63

Division of LG into sub-groups (continued)



Due to these actions, just a few markers may remain in the list of markers, and several selected groups will be shown on the graph. If the window of this cluster is closed now, a special message will appear:

If the answer is <Yes>, new clusters will be created, whereas all remaining markers (i.e., those not included in any of the selected groups) will be moved to Heap and can later be used for adding or attaching. The names of the clusters (LGs) will be changed, and the new ones will be denoted by a special sign “NEW”. If the answer is <No>, all changes made to dissect the LG will be cancelled and we’ll see the old LGs.

After dissection, the generated clusters can be ordered and treated like any other cluster. If the clustering process is continued with a higher threshold value, these new clusters will again become candidates for merging, unless they are marked as clusters excluded from further clustering.


1_64

Division of LG into sub-groups (continued)


Additional functions of treating LGs (clusters)

An extended form of the clustering panel

We now return to the description of the clustering window. So far we have been dealing with this window in the form where the clusters are denoted by rectangular icons indicating the number of markers in each, and the form of the rectangle shows whether the cluster was already ordered or was created by merging two smaller clusters. We can also get a more detailed description of all clusters, including their mutual “relationships”. For that, we should select the option <View> <Cluster details> from the main menu. This will result in the appearance of the following table:

In this table, for each cluster, the cluster closest to it is indicated together with the minimal distance between them. For an ordered cluster (LG) we will see its length, its maximum interval length (in cM, calculated with the Haldane mapping function), the number of its markers moved to Heap, and the number of attached markers. In the provided example, all clusters were ordered, and for LG8 and LG6 markers from Heap were attached to the LGs. We can open a separate cluster from this table by double-clicking on the LG name in the column of cluster names.
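The "closest cluster" column of such a table can be reproduced with a few lines of Python. This is an illustrative sketch only: it assumes that the distance between two clusters is the minimal rf over all pairs of their markers, and the cluster names, marker names, and rf values are invented.

```python
def closest_clusters(clusters, rf):
    """For each cluster, the nearest other cluster and the minimal rf between their markers."""
    result = {}
    for name, markers in clusters.items():
        result[name] = min(
            ((other, min(rf[a][b] for a in markers for b in other_markers))
             for other, other_markers in clusters.items() if other != name),
            key=lambda item: item[1])
    return result

# Toy example with three clusters.
clusters = {"LG1": ["a", "b"], "LG2": ["c"], "LG3": ["d"]}
pairs = {("a", "c"): 0.30, ("a", "d"): 0.45, ("b", "c"): 0.28,
         ("b", "d"): 0.40, ("c", "d"): 0.35}
rf = {m: {} for m in "abcd"}
for (x, y), r in pairs.items():
    rf[x][y] = rf[y][x] = r                 # symmetric lookup table

table = closest_clusters(clusters, rf)
for name, (other, dist) in table.items():
    print(name, other, dist)
```

Such a table shows at a glance which pairs of clusters would become candidates for merging if the threshold rf were raised.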

1_65


Saving the results

Careful mapping analysis may be a time-consuming process, with some steps being relatively subjective. Thus, it is important to save intermediate results, so that you can return and check the consequences of decisions made earlier. We recommend saving the results before each new clustering step. Such intermediate results are stored in the file “Save.job” in the sub-folder “IntermediateFiles” of the current project [named (Project + name of the input data file)]. The results are saved each time the option <Save All Clusters> of the main menu of the clustering window is chosen.

The first saved result is stored in this file under the name S1. Its first line includes the main parameters at the current stage: number of clusters, time of recording, existence of markers in the Heap set, etc. The record also includes the names of the markers of each cluster and the cluster's characteristics (e.g., whether the cluster is ordered or not). Subsequent saved results are named sequentially: S2, S3, ….

To select one of the saved results, use the option <Open> <Old linkage groups> of the main menu (see next page). If the user selects the result recorded at the last step, numbering of the next results will continue as expected. But if one of the previous results is selected instead of the last one, the derivative results will be numbered in a different manner. Suppose that S3 was selected, although further results had also been recorded (S4, S5, S6). Then the results derived from S3 will be saved, if desired, as S3_4, S3_5, etc. This allows flexibility in decision making in complicated situations and comparison of the results obtained at different stages of the study.
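The naming rule can be summarized in a short sketch. It only illustrates the rule as described in the text and is not MultiPoint code: the function name next_save_name is hypothetical, and the numbering within a branch (S3_4, then S3_5) is assumed to count upward from the number of the selected step.

```python
import re


def next_save_name(selected, existing):
    """Return the name for a result derived from the saved step `selected`.

    `existing` is the list of names already present in the save file.
    Only top-level steps (S1, S2, ...) are handled in this sketch.
    """
    if not re.fullmatch(r"S\d+", selected):
        raise ValueError("this sketch handles only top-level steps such as 'S3'")
    top_level = [int(n[1:]) for n in existing if re.fullmatch(r"S\d+", n)]
    last = max(top_level)
    k = int(selected[1:])
    if k == last:
        # Continuing from the last saved step: plain sequential numbering.
        return f"S{last + 1}"
    # Continuing from an earlier step: names branch off as S<k>_<n>.
    branch = [int(n.split("_")[1]) for n in existing
              if re.fullmatch(rf"S{k}_\d+", n)]
    return f"S{k}_{max(branch, default=k) + 1}"
```

For the situation described above (S1 to S6 recorded, S3 selected), the sketch yields S3_4 for the first derived result.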


1_66


The saved results can also be opened by using the input option <Open> <Old linkage groups> of the main menu. If this option is chosen, we will get the content of the folder from which the program was started. In particular, we will see all its sub-folders with the names of our projects. We can select the needed project folder (in our example, it is named Project_BC) and press the button <OK>. The table of saved results will be displayed.

In this table, in addition to the conditional names of the steps with saved results, further information is provided, including the time of saving, the <Threshold rf> at the last clustering step, and the number of clusters. It also includes useful information about the presence (marked by the sign V) or absence (marked by -) of unordered clusters, of the Heap set, of arrays of bound together markers <Delegates>, and of arrays of markers attached to some of the clusters <Attached markers>. To select the step of interest, double-click the name of the step in the first column of the table with the left mouse button.



1_67


In a long-term analysis involving many steps and frequent saving of results, the file <Save.job> may become very big. You can clean it up by removing some of the old results with the option <Clear saved clusters> of the main menu. It can be called either during the current analysis (e.g., just before a new saving) or at the beginning of a new round of analysis, i.e. before opening the selected saved result. After selecting this option, a window appears in which you can choose one or a few names of earlier saved results and press the button <Delete Selected Saving>.

After confirmation of this decision, the chosen saved steps will be deleted without affecting the names of the remaining saved steps. If this function is applied at the beginning of a session, you can, immediately after deletion, choose one of the remaining saved steps for further analysis.



1_68


Possible operations with clusters

Consider first the option <Exclude (Undo exclude) from clustering>. This option is used when you need to exclude one or a few clusters from further clustering. First, choose a cluster using the left mouse button (holding the key <ctrl> if several clusters should be selected), and then select the option. This will change the icon of the cluster: it will be marked by a red frame and the sign “exc”. During further clustering, such clusters will not be merged with any other, even if the distance between them is smaller than the threshold value. If we select these clusters again and apply the same option once more, the clusters will be returned to the previous state (i.e., repeated application here means Undo).
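The effect of exclusion on threshold-based merging can be sketched as follows. This is a minimal illustration, not the program's algorithm: it assumes that the distance between two clusters is the minimal pairwise rf between their markers, and the names cluster_distance and merge_below_threshold are hypothetical.

```python
from itertools import combinations


def cluster_distance(a, b, rf):
    """Minimal pairwise rf between the markers of two clusters (assumed rule)."""
    return min(rf[frozenset((x, y))] for x in a for y in b)


def merge_below_threshold(clusters, rf, threshold, excluded=()):
    """Merge clusters closer than `threshold`, skipping excluded clusters.

    clusters: dict mapping cluster name to a set of marker names
    rf: dict mapping frozenset({marker_a, marker_b}) to a recombination rate
    excluded: names of clusters marked "exc"; they are never merged
    """
    clusters = {name: set(markers) for name, markers in clusters.items()}
    merged = True
    while merged:
        merged = False
        active = sorted(name for name in clusters if name not in excluded)
        for a, b in combinations(active, 2):
            if cluster_distance(clusters[a], clusters[b], rf) < threshold:
                # The merged cluster keeps the first name in this sketch.
                clusters[a] |= clusters.pop(b)
                merged = True
                break  # cluster list changed, so restart the scan
    return clusters
```

With three single-marker clusters where the middle one links the other two, excluding the middle cluster leaves all three clusters separate, although each pair involving it lies below the threshold.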

Option <Moving (Undo moving) to Heap> moves the markers of the selected clusters to “Heap”. In this case, these clusters are removed from the list of clusters, and their markers (like any marker of the Heap set) are excluded from clustering. This option may be helpful for isolated markers or small clusters (with 2 markers) that have not been fused with others during the clustering steps. As before, these clusters should be chosen with the left mouse button; then the option can be applied. For this operation <Undo> is not possible. When such an operation is initiated, the option <Save all clusters> is conducted automatically, with a corresponding system report.

Option <Edit> of the main menu provides some options for treating clusters. Note that clusters can be chosen for the treatments described here from cluster lists in the form of either icons or tables. Recall that the form of the cluster list can be changed (option <View> of the main menu). For the <Merging two clusters> operation, the table form is more useful, since it displays the distances between clusters.


1_69


The user may force some clusters to merge even if their distance exceeds the threshold. This option may be important when the user knows that these clusters belong to the same chromosome even if this is not reflected in the anchor marker information. After choosing two clusters with the left mouse button (holding the key <ctrl>), we select the menu option <Merging two clusters>. This will result in the message shown below, and instead of these two clusters we will get a combined one. Note that this combined cluster is not ordered. The initial situation can be recovered by using the <Undo> option or the corresponding tool bar button.



1_70 See also addition on p. 2_3.



Searching a marker between the clusters

During mapping analysis, the user may need information about a particular marker: is it in the Heap set or in some cluster, among the skeleton markers of a linkage group or among the attached markers? Such information can be easily obtained by using the <Find marker’s location> option of the main menu of the clustering window or the corresponding tool bar button.

As a result, we will get a new window with a sorted list of the names of all markers. For each marker, its LG (or Heap) and its status are shown. The bottom window shows the shared initial part of the markers’ names (in our example it is the word “marker”).

To find information about a marker with a known name, we should type its name below the list. As the consecutive letters of the marker’s name are typed in the bottom window, the list will be automatically “positioned”, so that the marker can be easily chosen from the list using the left mouse button.


1_71


After the marker has been selected, we can press the button <Find the group containing chosen marker>. This will give us a list of all markers of the LG containing the selected marker, or the marker to which our marker is attached, or the marker that is the delegate of its group of bound together markers. The name of the LG containing the chosen marker (in this example it is LG6) is provided (it may also be “Heap”).



1_72


Output options

Saving LGs as text files

Option <Final result>

This option can be employed only when all clusters have already been ordered. The window <Parameters for printing> of this option is identical to the window of the <Print> option. Yet this option provides additional possibilities for flexible control of the output information, listed in the second window <Parameters for final result>. The user can output the results for each chromosome to a separate file or get a single file with all chromosomes. The output may include only marker names and their chromosomal positions, or names and genotype calls.

Depending on the user’s requests, the output will include two or three files for all LGs or separate files for each of the LGs. The file named Sk contains skeleton markers only; the file named Sk&Ex contains the skeleton and bound together (twin) markers; the file named Glob contains all markers.

The obtained output files can be used for visualization of the genetic maps, e.g. by the publicly available software MapChart (Voorrips, 2002).

Voorrips, R.E., 2002. MapChart: Software for the graphical presentation of linkage maps and QTLs. Journal of Heredity 93 (1): 77-78.

1_73





Printing linkage group map

Two printing options are available in the system: printing the LG map and printing the graphical genotypes for the same LG. In both cases, you should choose one of the LGs for printing and one of the options of the menu <Print (output to EXCEL) results>. Consider printing the LG map (the first item of the menu). As in the description of printing options on p. 1_54, we will get a panel for defining the method of re-calculating recombination fractions to cM. By choosing the desired parameters and pressing the button <OK>, we will get the graph of the chosen chromosome.
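For orientation, re-calculating a recombination fraction to cM can be sketched with the two standard mapping functions. The Haldane function is the one mentioned in this tutorial; the Kosambi function is shown only as a common alternative, and whether the panel offers it is an assumption here. The function names are illustrative.

```python
import math


def haldane_cm(r):
    """Map distance in cM from recombination fraction r (Haldane, no interference)."""
    if not 0.0 <= r < 0.5:
        raise ValueError("r must satisfy 0 <= r < 0.5")
    return -50.0 * math.log(1.0 - 2.0 * r)


def kosambi_cm(r):
    """Map distance in cM from recombination fraction r (Kosambi)."""
    if not 0.0 <= r < 0.5:
        raise ValueError("r must satisfy 0 <= r < 0.5")
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))
```

For small r both functions give distances close to 100·r cM; they diverge as r approaches 0.5, where the Haldane distance grows faster.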

The size of the figure can be changed by moving its frame. At the top of the figure we can see a menu and the tool bar buttons. These buttons allow changing the Zoom; getting a Preview of the figure, e.g., to see whether it fits on the page; printing the figure; transforming the picture into a table; and copying the picture to a previously opened Excel or Word file, or any other format that allows inserting a picture.


1_74


The menu items partially overlap with the functions of the tool bars. The menu option <Edit> offers copying, and the option <View> allows changing the Zoom. By choosing the menu option <Edit> <Options> or by pressing the right mouse button anywhere on the picture, we will get a special panel for editing. Let us consider its parts.

With the parameters <TOTAL SIZE> we can change the size of the picture exactly as by changing the frame of the picture. The parameters <POSITION> change the position of the picture on the page. The parameters <Width> : <Chromosome> define the width of the “chromosome column” in the middle of the map graph, and <Slope line> the length of the lines connecting the column and the markers. By pressing the button <CHANGE FONT>, we obtain a special panel for setting the font parameters. Note that changing the font may change the overall view of the picture on the page. After changing any parameter, we should press the button <Apply> to update the picture in accordance with the changed parameters.

It is also possible to return to the default parameters by pressing the button <Restore default>.


1_75


The part <PRIORITY> of the editing panel allows the user to define the priorities in choosing the font size. By default, <Fit to page> is chosen, which means that the figure should fit on the page, even at the expense of a small font size. If the size of the figure is changed, the font size will change correspondingly. In this case, the radio button <Total Size> will automatically switch to <On>.

If we want to increase the font size, the radio button <Font size> should be set to <On>. Then, by pressing the <CHANGE FONT> button, we can select the desired font. It may happen that a part of the figure will not fit on the page.

We may need to place our figure on several pages. For that, we should change the options on the panel <MAP>. By default, the value <Single map> is chosen, thus we see only one page. If we choose one of the values <Multiple equivalent> or <Multiple Hierarchic>, the figure will be placed on several pages, but in different forms. This is illustrated by the example.

1_76




By choosing the variant <Multiple equivalent> we will see only a part of the figure, marked at the top by the letter “A”. To see the other parts of the figure we should use the menu options <View> <Next page> and <View> <Prev.page>, or the tool bar buttons. With this choice the figure is divided into two parts that each fit the page size. The figure can be divided into several parts, along its length or its width.

1_77




By choosing the <Multiple Hierarchic> variant, we will see a figure marked at the top by the letter “A” with an internal part marked by the letter “B”. There may be several such inclusions. The transition to the next part occurs in the same way as shown before (p. 1_77).

1_78




If the picture is divided into two relatively narrow parts, they can be placed on one page by using the option <Two columns>. The form of the figure can be modified using the panels <MARGING> and <WIDTH>. The second of these was already described. The first one affects the proportions of the columns in width (<Horizontal>) or in length (<Vertical>) (if 4 figures were placed).

After changing any of the parameters, the button <Apply> should be pressed, whereas to return the parameters to the initial state the button <Restore default> should be pressed.

For saving the picture to an EXCEL file or for printing, use the options of the <File> menu. Option <Print Preview> shows each page prepared for printing. Option <Save as> outputs the information to an EXCEL file with a user-defined name. In this file, the information is saved in two forms simultaneously: as a table and as one or a few pages of the figure. Option <Add to file> allows adding information to the chosen file.

1_79

For some changes made to the last version see p. 2_12.



Note: You can also use the obtained output files to get a better-quality visualization of the constructed genetic maps with the publicly available software MapChart (Voorrips, 2002): https://www.wur.nl/en/show/Mapchart.htm




Printing “graphical genotypes”

The user may generate and print a graphical presentation of the mapping results in the form of “graphical genotypes”. For each ordered LG, each genotype is shown by its alternating segments, highlighted to indicate the grand-parental origin of each segment. For that, the option <Print (output to EXCEL) results> <Graphical genotypes> is employed. To conduct the analysis, the user should choose the mapping function for transforming the mapping results into map positions in cM. This representation is saved directly to EXCEL.

1_80


The window shown here allows choosing different colors for different allelic content per locus. Setting the check box <Sorting> to <On> orders the genotypes according to their similarity to the initial parental lines (with respect to the allelic content of the LG under consideration).

1_81



If needed, the “graphical genotype” presentation can be provided in a more compact form, in groups of 10 genotypes each. This is done by setting the check box <Merge Individual Numbers> to <On>.

On closing the window with the “graphical genotype” output, the user is prompted to choose a name for the EXCEL file in which this output is saved.

1_82



RIL_Selfing, RIL_Sib_mating, and IRIL populations

To demonstrate some specific aspects of analyzing such populations, we will use simulated data (files RIL_observ.txt and RIL_transf.txt). After such data are entered and the preliminary analysis is conducted, the system asks the user to choose one of two possible ways of dealing with recombination scores: (1) using the rf values observed in the RIL population (resulting from the accumulation of recombination events over the several generations of RIL history), or (2) using transformed rf values, to get a “per meiosis” equivalent.

We recommend using the first of these two options, because it allows higher map resolution at the stage of multilocus ordering. This suggestion is confirmed by our tests and comparisons conducted on various simulations. Clearly, even if the first option is selected, the final results should be transformed to get “per meiosis” map distances (Haldane & Waddington, 1931).

To illustrate the two options, we have prepared two files with the same data but saved under different names. One will be used with the “observable” option and the other with the “transformed” option. Let us start with the first option and answer <Yes>. In this case, the process of initial clustering is conducted as shown before for the Backcross population. The threshold rf=0.25 was chosen, resulting in 17 clusters. Of these, LG7 was chosen for further treatment. We first use “bound together markers” to represent the groups of non-recombining markers by their “delegate” markers, and then conduct multilocus ordering.

1_83


Direct presentation of the observed rf values for the ordered LG7 gives an inflated map, hence the need for transformation. This map is obtained by selecting “Observable” in the options <Printing> or <Metric length>.

If “Transformed” is selected, the rf values for the intervals of the same ordered LG7 decrease by about a half compared to the “observed” values. It is noteworthy that the usual practice of deleting double recombinants in adjacent intervals, especially small intervals, is not acceptable for RIL populations. Indeed, in RILs, “double recombinants” are not necessarily the result of scoring errors or real double recombination events. Instead, many of the “double recombinants” more likely result from recombination in adjacent intervals that occurred IN DIFFERENT generations of meiosis in genotypes that remained heterozygous for those regions (in F2, F3, etc.).
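The “per meiosis” transformation can be illustrated with the classical Haldane and Waddington relations between the recombination fraction R observed in RILs and the per-meiosis rate r: R = 2r/(1+2r) for selfing and R = 4r/(1+6r) for sib mating. The sketch below simply inverts these two relations; it is assumed, not confirmed by the tutorial, that the program uses exactly these formulas, and IRIL populations are not covered. The function names are illustrative.

```python
def ril_selfing_observed(r):
    """Expected RIL recombination fraction R for per-meiosis rate r (selfing)."""
    return 2.0 * r / (1.0 + 2.0 * r)


def ril_selfing_per_meiosis(R):
    """Per-meiosis rate r from the observed RIL fraction R (selfing)."""
    return R / (2.0 - 2.0 * R)


def ril_sib_observed(r):
    """Expected RIL recombination fraction R for per-meiosis rate r (sib mating)."""
    return 4.0 * r / (1.0 + 6.0 * r)


def ril_sib_per_meiosis(R):
    """Per-meiosis rate r from the observed RIL fraction R (sib mating)."""
    return R / (4.0 - 6.0 * R)
```

For small intervals in selfing RILs the relation gives r close to R/2, which agrees with the decrease “by about a half” noted above.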

1_84




Consider now the “transformed” case. We use the file RIL_transf.txt and answer <No> to the system’s question. In this case, rf values between the markers will be smaller. Thus, under the same threshold rf value as in the previous example, initial clustering will give far fewer clusters. We therefore select a lower threshold level, rf=0.15, which resulted in 20 clusters. One of these clusters, again LG7, coincided with LG7 from the previous example. The same operations as before, i.e., control of bound together markers and multilocus ordering, resulted directly in a linkage map identical to the one obtained by using the “transformed” option in the previous example. Thus, for relatively simple situations, there seems to be no difference between (a) ordering based on the rf values “observable” in RILs followed by transformation to the “per meiosis” scale, and (b) direct ordering based on rf values transformed to the “per meiosis” scale. However, in more complicated situations the first approach gives more reliable results (Ronin et al., unpublished results).

1_85




1_86

Import of ordered linkage groups

To input one or a few earlier ordered LGs, use the menu option <File> <Open> <Ordered linkage groups for analysis>. After choosing this option you will get a new window for data input. Select the population type and press <Select data file for input>.

The name of the first selected file will appear, and in the column <State> this file will be marked as “select”. To input this file, press the button <Input Data>, which opens the window <Initial data analysis>. On closing this window the file is input, which is reflected by a change in the column <State> from “select” to “input”. Repeat this process for all LGs. Note that the same type of mapping population is assumed for all LGs.

Press <End of Input> and select, as usual, the folder in which to save the project.




1_87


We obtain a window with all imported LGs. Note that instead of the project name we have here “Few ordered clusters”. The value <Recomb Rate threshold> is set to 0.35 (in fact, it is not defined here and makes sense only during clustering).

Each cluster (LG) can be opened. Pay attention to the state of the <Reserve old order> button: by default it is <On> (see more details about this function on p. 1_88). As usual, we can use the menu option <Save all clusters> to save all input information and read it in the future by using the option <Open> <Old linkage groups>. To input additional markers we can use the option <Open> <Append additional markers>. This function is described on p. 1_90, but in the current situation the window for inputting additional markers differs slightly from the one for the standard situation.



1_88

In the input window that appears after choosing the option <Open> <Append additional markers>, pay attention to the parameter Maximum rf; by default it should be 0.35. You can change it before you press the button <Input Data>. When you change its value you should press <OK>. This parameter controls whether the additional marker(s) can be appended to the considered clusters. If, for some marker, its recombination rate with every marker of the clusters is higher than this parameter, the marker cannot be added to any of these clusters.
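The role of the Maximum rf parameter can be sketched as a simple check. This is an illustration of the rule as stated, under the assumption that a marker qualifies for a cluster when its smallest rf with the cluster's markers does not exceed the parameter; the function name closest_cluster and the data layout are hypothetical.

```python
def closest_cluster(marker, clusters, rf, max_rf=0.35):
    """Return the name of the closest cluster the marker may be appended to.

    clusters: dict mapping cluster name to a collection of marker names
    rf: dict mapping frozenset({marker_a, marker_b}) to a recombination rate
    Returns None if the marker is farther than max_rf from every cluster.
    """
    best_name, best_rf = None, None
    for name, members in clusters.items():
        nearest = min(rf[frozenset((marker, m))] for m in members)
        if nearest <= max_rf and (best_rf is None or nearest < best_rf):
            best_name, best_rf = name, nearest
    return best_name
```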

The appended markers, as usual, are saved in Heap and can be added to the closest clusters from the considered set of clusters by using the menu option <Extending the linkage group> <insert marker(s)> or <attach marker(s)>. For more details see p. 1_58.


In the window for treating a single cluster, a special check box <Reserve old order> will appear on the panel <Setup for ordering>. By default, it will be in state <On>. This means that, in an ordering attempt, the multilocus orders obtained in jackknife runs will be displayed around the diagonal pre-defined by the initial (“preserved”) order. The degree of deviation from this diagonal actually displays the map instability.



1_89

Adding new data to the data set


Consider a situation in which, for a created project, the markers have already been clustered, the clusters ordered, and the removed markers attached. What can you do if you now have a new portion of data for the same population? In the previous version of MultiPoint we suggested inputting the new data into Heap and then attempting to attach the new markers to each of the old clusters. A more reasonable approach is implemented in the updated version of the program. Namely, we suggest first testing the new markers for linkage to the old clusters, and after that clustering the remaining markers into new independent clusters. Let us consider this approach using an example.

Stage 1

To input the additional data we use the menu option <Open> <Append additional markers> <Input the new markers>.

The new dataset is tested for agreement of the population type and size with the old data, and for distinctness of the names of the new markers from the old names.

If there are no errors, the new markers are included in the file <DataaddData.txt> of the project. You don’t need to remember the name of this file. You can move to the next stage right away or later. In the latter case, after opening the project, you will get a reminder: “Markers were added to this project. For processing this data it is necessary to use menu option <Edit Treatment of added markers array of candidates for new clusters>”


1_90



Stage 2

After opening the project, you will get a reminder that a new set of data has been added and should be processed. To begin the treatment we choose the menu option <Edit> <Treatment of added markers> <array of candidates for new clusters>.

Markers saved earlier in the file <Data addData.txt> are divided into two groups, while the file is erased. Markers that are closer to any marker of any of the old clusters than the clustering threshold are assigned to the first group and are saved in the project folder in the file <attached.txt>. These markers will be used at Stage 5.
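The division performed at this stage can be sketched as follows. The sketch follows the rule stated above (a new marker goes to the first group if its rf with at least one marker of an old cluster is below the clustering threshold); the function name split_new_markers and the data layout are hypothetical and do not reflect the program's file formats.

```python
def split_new_markers(new_markers, old_clusters, rf, threshold):
    """Split new markers into (attached, remaining) groups.

    old_clusters: dict mapping cluster name to a collection of marker names
    rf: dict mapping frozenset({marker_a, marker_b}) to a recombination rate
    attached: markers closer than `threshold` to some old-cluster marker
    remaining: candidates for new clusters of the subproject
    """
    old_markers = [m for members in old_clusters.values() for m in members]
    attached, remaining = [], []
    for marker in new_markers:
        nearest = min(rf[frozenset((marker, old))] for old in old_markers)
        if nearest < threshold:
            attached.append(marker)
        else:
            remaining.append(marker)
    return attached, remaining
```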

The remaining markers are used to create a subproject with its own clusters, Heap, and bound together markers. The situation is reflected in a message like the following one:

If the number of remaining markers is small, the user may decide to put them to Heap. Creation of the subproject, if needed, is conducted at Stage 3; meanwhile these markers are saved in the file <dataForNewProject.txt>.


1_91



Stage 3

Now the user can create a Subproject. The program reminds the user about this when the initial project is opened: “It is possible to create subproject, using for input file dataForNewProject.txt”. The subproject is created in the usual way, using the file <dataForNewProject.txt> as the source of input data. As usual, the function <Bound together> should be employed, followed by clustering (with the same threshold recombination value as used in the analysis of the main project). Then we should order the markers and delete, if necessary, markers destabilizing the order. The resulting project is called Subproject and is placed in the same folder as the main project. The file with the initial data is erased.

In this example, 7 clusters are created. Markers in clusters of size=1 can be moved to Heap (using the menu <Edit> <Moving to Heap>).


1_92



Stage 4

At this stage, we join the Subproject with the main project. Namely, when we open the main project, the program reminds us that we can move to the stage of merging the projects: “It is possible to add clusters of the subproject to clusters of the main project using menu option <Open Append additional markers Addition the new clusters to the main project>”.

The program prompts the user to open the saved Subproject, then merges the clusters, Heap and other arrays of the two projects, and saves all the data and the total distance matrix of the two projects.

Clusters from the Subproject receive numbers following those of the main project. The resulting project should be saved, while the Subproject is erased.


1_93

Stage 5

At this stage we should return to the file <attached.txt>, created at Stage 2. Markers saved in this file should be attached to clusters of the extended project (resulting from merging the initial project and the subproject). The minimal distance of each of the markers from the attached.txt file to the markers of the initial project is lower than the threshold value. However, this does not exclude that some markers from attached.txt may be closer to clusters originating from the Subproject. After opening the new project resulting from the merge, a reminder message appears: “It is possible to add special markers to the clusters using menu option <Edit Treatment of added markers classification of the remaining new marker>”.

Then, the file <attached.txt> is erased. Clusters with appended new markers are shown as non-ordered; thus they should be ordered.




References

Our algorithms are based on theoretical papers of the entire mapping community and on our own publications. A list of our relevant publications was provided on page 8. Here we provide references to other papers cited in the Tutorial.

Esch E., Weber W.E. 2002, Investigation of crossover interference in barley (Hordeum vulgare L.) using the coefficient of coincidence. Theor Appl Genet 104: 786–796.

Haldane J.B.S., Waddington C.H. 1931, Inbreeding and linkage. Genetics 16: 357–374.

Lander E.S., Green P., Abrahamson J., Barlow A., Daly M.J., Lincoln S.E., and Newberg L. 1987, Mapmaker: an interactive computer package for constructing primary genetic linkage maps of experimental and natural populations. Genomics 1: 174–181.

Lincoln S.E., Daly M.J., Lander E.S. 1993, Constructing Genetic Linkage Maps with MAPMAKER/EXP Version 3.0: A Tutorial and Reference Manual. Whitehead Institute for Biomedical Research Technical Report, Third Edition (Beta Distribution 3B).

Sakamoto T., Danzmann R.G., Gharbi K., Howard P., Ozaki A., Khoo S.K., Woram R.A., Okamoto N., Ferguson M.M., Holm L.-E., Guyomard R., Hoyheim B. 2000, Genetics 155: 1331–1345.

Sivagnanasundaram S., Broman K.W., Liu M., Petronis A. 2004, Quasi-linkage: a confounding factor in linkage analysis of complex diseases? Hum Genet 114: 588–593.

Stam P. 1993, Construction of integrated genetic linkage maps by means of a new computer package: JoinMap. The Plant Journal 3: 739–744.

Yap I., Schneider D., Kleinberg J., Matthews D., Cartinhour S., McCouch S. 2003, A Graph-Theoretic approach to comparing and integrating genetic, physical and sequence-based Maps. Genetics 165: 2235–2247.

Jackson B., Schnable P., Aluru S. 2007, Consensus genetic maps as median orders from inconsistent sources. IEEE-ACM Transactions on Comp. Biol. and Bioinformatics 5: 161–171.

1_94



Table of Contents

Changes and additions to separate chapters of the previous version

New window – Creation of Global parameters

Introducing user defined name for each linkage group

New variant of the function Control of monotony

Some changes made to allow large-size data sets

Changes in function Extending the linkage group

Changes in function Print (output to EXCEL)

Analysis of F2 data with dominant and codominant markers

Displaying clusters (linkage groups) with dominant and codominant markers

Treating a cluster with codominant and two types of dominant markers

Repeated clustering (under relaxed stringency)

Employing Consensus option

Extending the linkage group – insert

Extending the linkage group – attach

Output, final results

Treatment of F2 data with only dominant marker

Population F1 x F1

Data input

Instructions for Recoding


Control of bound together markers

First Clustering

The general view of the obtained clusters

Treatment of each cluster

MultiPoint Tutorial

Part 2 - Basic

2_1

2_2

2_2 2_3 2_4 2_9

2_11 2_12 2_13 2_13 2_15 2_18 2_19 2_25

2_28 2_29 2_30 2_33 2_33 2_34 2_35 2_36 2_37 2_38 2_39 2_40


New window – Creation of Global parameters

After the initial analysis, when you close the window <Initial data analysis>, a new window <Creation global parameters> appears on the screen. It allows you to define the names or parts of names of priority markers and to create groups of bound together markers (with no recombinants). If you do not conduct this operation now, it can be conducted later, separately for each cluster. The parameters “Coefficients of priority” can be changed by the user (by default, the minimum rf is set to zero). Pressing the button <Bound> creates the groups and then reports the number of groups and the number of markers moved to Heap. For F2 data including dominant markers this function must be conducted in this window, i.e. for all markers rather than for each LG separately (see p. 1_39). For other types of data, this function can be employed either for all markers or for each cluster separately. By pressing the button <Display> you can see the resulting groups.

By pressing the button <First clustering> we will get the window of clusters. After we close the window <Creation of global parameters>, the main window of clusters will appear, where we should define the initial threshold value for clustering (see p. 1_28–30).

2_2

Changes and additions to separate chapters of the previous version


Introducing user defined name for each linkage group

This option is an addition to the chapter “Possible operations with clusters” (see p. 1_69). It is called from the menu <Edit> <User’s name of cluster>.

This call results in a message asking you to enter your variant of the name for the chosen linkage group. It should be entered in the window <User’s name of cluster> (bottom, left). The cluster name will be extended accordingly.

In further clustering steps (merging clusters while relaxing the threshold) the names are preserved. If two clusters with different names are merged, the name of the new cluster is a combination of the names of its component clusters.

2_3



New variant of the function Control of monotony

(For the first variant of this function see p. 1_48). This function is based on the following reasons. Let us take any marker on the chromosome. For a correctly ordered map, one would expect that the distance (or recombination rate) from this marker to its adjacent neighbor, then to the next neighbor, etc. will grow monotonically. Deviation from monotony can be considered as an indicator of the presence of problematic markers. And indeed, when unstable neighborhoods are revealed by using jackknife-based re-sampling, one of the major sources of this instability are the markers violating the monotony. Moreover, it appears that these markers are among major contributors to “map expansion”. In such case, some authors recommend to check for double recombinants and remove corresponding data points. This suggestion is based on the assumption that the considered multilocus order is correct. But what if we are not sure about the order ? Or if the mapping population is RIL, hence “simultaneous” recombination events in adjacent intervals could have occurred as two single-exchanges in different generations ? Therefore, in order to escape such an artificial correction, we suggest detecting and removing the markers causing considerable deviation from the natural expectation of monotonic growth of recombination when moving for a chosen marker to its more and more distant neighbor markers (either left- or right-ward).

In the corresponding tests, for each marker m_i, the program sequentially calculates the ratios R = r(m_i, m_{i+1})/r(m_i, m_{i+2}), R = r(m_i, m_{i+1})/r(m_i, m_{i+3}), R = r(m_i, m_{i+1})/r(m_i, m_{i+4}), etc., and does the same to the left of marker m_i. The series extends until the recombination rate r(m_i, m_{i+j}) reaches an arbitrarily chosen level min(0.5; 1.5 r_s), where r_s = r_fs is the threshold recombination value introduced on p. 1_16. Clearly, due to the sampling nature of the recombination rates, one may get R=1 for the estimated rates of recombination even if R<1 for the true rates. Moreover, to be conservative, we may want to tolerate violations of monotony that do not exceed some threshold, i.e., not reject markers that give an R value slightly exceeding 1. For that, the user may define his “degree of conservatism” by setting a threshold value R*, so that cases with R<R* are considered tolerable. In the “hard” regime, the algorithm finds the marker with the highest violation of monotony (highest value of R>R*) and moves only this marker to Heap. The resulting set of markers is ordered again, without showing the order on the screen. Then, again, the worst marker is detected, moved to Heap, etc. (markers moved to Heap can later be returned as attached ones, without the privilege of affecting the multilocus order). In addition to the described “hard” regime of automatic control of monotony, we also suggest a “soft” regime. In this case, the program can delete only one marker from the linkage group in one step. Namely, we check which markers violate the condition R<R* on both sides, leftward and rightward. Of these, we select the one with the highest product R_left*R_right.
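The ratio test described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not MultiPoint's own implementation: the function names, the list-of-lists matrix `r` indexed by map position, the choice to end the series just before the level min(0.5; 1.5 r_s) is reached, and the attribution of each ratio to the marker m_i itself are all assumptions made for the example. Only the search for the worst marker (one step of the “hard” regime) is shown.

```python
def monotony_ratios(r, i, step, r_stop):
    """Ratios R = r(m_i, m_{i+1}) / r(m_i, m_{i+j}) for j = 2, 3, ...

    r      -- symmetric matrix (list of lists) of recombination rates,
              indexed by position in the current marker order
    step   -- +1 to scan rightward from marker i, -1 to scan leftward
    r_stop -- the series ends once r(m_i, m_{i+j}) reaches this level
    """
    n = len(r)
    nearest = i + step
    if not 0 <= nearest < n:
        return []            # marker i has no neighbor on this side
    ratios = []
    j = nearest + step
    while 0 <= j < n and r[i][j] < r_stop:
        if r[i][j] > 0:      # skip zero rates to avoid division by zero
            ratios.append(r[i][nearest] / r[i][j])
        j += step
    return ratios


def worst_marker(r, r_star, r_s):
    """Return (index, R) of the marker with the largest R > r_star,
    or (None, r_star) if no marker exceeds the threshold."""
    r_stop = min(0.5, 1.5 * r_s)
    worst, worst_ratio = None, r_star
    for i in range(len(r)):
        for step in (1, -1):
            for ratio in monotony_ratios(r, i, step, r_stop):
                if ratio > worst_ratio:
                    worst, worst_ratio = i, ratio
    return worst, worst_ratio
```

In the “hard” regime as described, the marker found this way would be moved to Heap, the remaining markers re-ordered, and the search repeated until no marker exceeds R*.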


2_4




In this example, we have got a

rather good result, with relatively

small deviations from the diagonal.

The result is accompanied by

appearance of the panel

<Sequence of operations> that

shows the details of the operation

conducted and the number of

removed markers.

After ordering a cluster, the user can apply the function Control of

monotony, by pressing button <OK> of the corresponding box. Before

that we should define the parameters of this panel or use the default

parameters.

The user can choose the “hard” or “soft” regime and can define the threshold value R*. It is difficult to give a good universal recommendation for this choice that is independent of map density, population size, quality of marker scoring, legitimacy of mechanical deletion of double recombinants, the level of missing data, etc. Clearly, the general intention should be to achieve maximal map stability with minimal loss of markers. If R*=1, all markers with R≥1 will be deleted. By default, R*=1.4.


2_5


The user can try to improve the map by deleting markers that are presumably the troublemakers (causing the deviation from the diagonal), e.g., markers #316 and #312 in our example. These actions will be reflected in the panel <Sequence of operations>, as well as in the <History window> if we open it.

If we find the results unsatisfactory, we should choose the step at which we took the “wrong way” and go back to one step before that turn. We mark this step and apply the <UnDo> option. This returns us to the preferred previous ordering step, and so allows us to find interactively, by trial and error, the best parameters of the <Control of monotony> function. It may be reasonable to start with the “Hard” regime, which detects a set of candidate bad markers. The list of markers deleted using the “Hard” option can then be used as a help. Namely, if after the “Hard” step you apply the “Soft” option, you will get a list of markers that are deleted by the first but not the second operation. This list can help you in choosing individual markers for removal.

To see the names of the deleted markers, we should choose the menu option <DisplayHistoryMap>. A special window appears with a list of markers.

2_6



Now, if we choose a marker in a region of high map instability and press the right mouse button, we’ll get an additional option <Help>. By choosing this option, we obtain a message with a list of markers recommended for removal. Actually, the recommendation is the name(s) of the marker(s) from this list that belong to the selected instability region. In the presented example, this may be marker #382, because of its higher missing level. Before you delete it, the message with the suggested candidates should be closed by pressing <OK>.

The result is indicated below. We can proceed now with other problematic neighborhoods/markers, or apply

<Undo> from any previous step.

2_7



For better analysis of changes in the order quality after any operation that includes re-ordering of markers (i.e., ordering itself, deleting markers, or Control of monotony), the user is provided with an additional parameter. In the list of markers, the parameter <var> is displayed: the standard deviation of the neighbor markers of each marker. In addition, <Glob.var>, the mean value of <var> across all markers, is displayed.

2_8




Some changes made to allow large-size data sets

We briefly describe here some changes made to allow working with large numbers of markers per chromosome, e.g., a few thousand. For a large number of input markers, the matrices of pair-wise recombination rates and LOD values are calculated just after the input, and this takes time (e.g., ~2 min for 4000 markers). During the first saving of the results (function Save all markers) these matrices are also saved, which also takes time, but in further applications of the Save all markers function there is no need to save them again. Still, reading these matrices each time the saved results are read slows down the process. We provide these details to explain why the analysis of big data sets is not as fast as what you have seen when working with small to moderate data sets. It is noteworthy that the ordering function is also time consuming for large data.

The number of markers for which you can see the matrix of distances of all markers, or the marker scores, is limited to 550 (the limitation is caused by the grid tool). To allow working with much larger numbers per chromosome, some changes were made to the grid function. In the provided example we are ordering a cluster with 779 markers. In addition to the previous one-dimensional scrolling (either up-down or left-right), we can now scroll along the diagonal, using the “diagonal scrolling” button. However, keeping in mind the large number of markers, we added one more tool to facilitate the interactive analysis of unstable neighborhoods.

2_9


For that, a new table, Dispersion, is generated. It displays an ordered list of a parameter that quantifies the instability of the neighborhood of each marker:

$\sigma_i^2 = 0.5 \sum_j p_{ij}\,(i-j)^2$,

where $p_{ij}$ is the proportion of jackknife runs in which markers i and j were adjacent neighbors. Markers with stable local order will give $\sigma_i = 1$. The table displays $\sigma_i^2$ for all markers of the analyzed linkage group.

Naturally, the user will want to deal first with the regions of the map with the highest values of this parameter, in order to detect and remove the markers with the most disturbing effect on map quality (i.e., to deal with the regions represented by markers from the top of the Dispersion table). Selecting such a marker re-centers the grid table, so that the grid with its 550 lines/columns covers a part of the total linkage group (which may carry thousands of markers), centered around the chosen marker. After you delete this marker or any other marker from this neighborhood, the table remains in the same position relative to the entire linkage group but without the removed marker, while the list of variances is updated. Note that the deletion and re-calculation cycle takes some time.
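As a rough illustration of how the Dispersion table could be computed, the sketch below evaluates sigma_i^2 = 0.5 * sum_j p_ij (i - j)^2 for every marker and sorts the markers from most to least unstable. The function names and the representation of p as a list of lists (marker positions as indices) are assumptions for the example; how MultiPoint stores its jackknife results is not described in this tutorial.

```python
def neighborhood_dispersion(p):
    """sigma_i^2 = 0.5 * sum_j p[i][j] * (i - j)**2 for every marker i.

    p[i][j] is the proportion of jackknife runs in which the markers at
    positions i and j of the map were adjacent neighbors.
    """
    n = len(p)
    return [0.5 * sum(p[i][j] * (i - j) ** 2 for j in range(n))
            for i in range(n)]


def dispersion_table(names, p):
    """List of (marker name, sigma^2), most unstable markers first."""
    sigma2 = neighborhood_dispersion(p)
    return sorted(zip(names, sigma2), key=lambda item: item[1], reverse=True)
```

An interior marker that keeps the same two adjacent neighbors in every run contributes 0.5 * (1 + 1) = 1, which matches the value quoted above for stable local order.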

2_10



Changes in function Extending the linkage group

In version 2.1, markers that were moved to Heap after application of <Bound together markers> are not displayed in the list of markers that can be inserted in or attached to the skeleton map. Indeed, their distance to their “delegate” markers is zero, so there is no sense in adding them (see p. 1_56).

The list of candidate markers to attach or insert now looks as shown here. The letter “S” denotes the main (delegate) marker of a group of bound together markers. By choosing a marker denoted by “S” we obtain the option <Display all delegate markers> in the additional menu. It allows us to see all groups of bound together markers whose delegates are presented in this list.

Note that for populations F2 with dominant markers these options have been considerably changed in

v.2.1 compared with 1.2 (see p. 2_25).

2_11


Changes in function Print (output to EXCEL)

In v.2.1, all markers from groups of bound

together markers are shown in prints near

the corresponding delegate markers (i.e.,

those that displayed recombination). Names

of such markers are presented in brackets

(as [xxxxx]) with an indication of the

connection to the delegate marker of the

group. In the EXCEL table, the missing level

and segregation ratio are provided

(for more details see p. 1_74 -79).

2_12


Working with F2 data that combine dominant and codominant markers needs special consideration because estimates of recombination rates between repulsion phase markers are biased downward (Mester et al. 2003). Therefore, we proposed to subdivide such a data set into two subsets, each carrying coupling phase dominant markers (amplified on DNA of only one of the two parents) and shared codominant markers. The ordering of markers in the two subsets is then conducted based on the consensus mapping principle: in the two sets the codominant markers should appear in the same order (Mester et al. 2003, 2005). This approach is implemented in the current version of MultiPoint. All steps of mapping are the same as usual, with two exceptions: (i) they are conducted after splitting the data into two subsets, and (ii) the procedure is based on synchronous ordering with the restriction that shared (i.e., codominant) markers should be in the same order.

The linkage groups resulting from clustering of the split data set are displayed in colors: the two alternative types of dominant markers are shown in red and blue, and codominant markers in green. As before, we will have clusters (linkage groups) with dominant markers of both types (red and blue), but the ordering of these clusters will be based on virtual splitting of the markers.

Note: Before initial clustering, the function <Control of bound together markers> must be employed.

Displaying clusters (linkage groups) with dominant and codominant markers

Analysis of F2 data with dominant and codominant markers

2_13


If we now choose for ordering one of the clusters that includes only one type of dominant markers (e.g., with green+blue markers), its analysis will be exactly the same as before. But if the chosen cluster includes green+blue+red markers, it will be displayed differently: you will see two windows, each containing one type of dominant markers and the shared codominant markers. The cluster name represents the type of included dominant markers: <*_r> for red dominants and <*_b> for blue dominants. Codominant markers are denoted by the acronym <CD>, or <CDS> if the marker is a delegate of a group of bound together markers.

In the two windows corresponding to one linkage group we see its two variants (LG_r and LG_b), with green+red and green+blue markers, respectively. Each of these can be treated separately.


2_14


Let us start from ordering the first (*_r) cluster. We can also delete some markers or employ the function <Control of monotony>, but note that it does not automatically delete codominant markers, even if some of them violate monotony. In the same manner we can analyze the second part of the cluster (*_b).

Treating a cluster with codominant and two types of dominant markers

2_15


If we close now any of the windows of the cluster, we’ll get the following icon

It symbolizes the fact that the cluster has not been yet ordered: codominant markers of its two parts

do not yet appear in identical order. So far, we only detected and removed markers that strongly violate

stability. After repeated opening of the cluster we will need to order again both its parts. When the ordering

is applied to a cluster carrying only one type of dominant markers the result is marked as usual:

If necessary, a codominant marker can be deleted manually, e.g., if it causes local map instability. The corresponding information is presented in the table of marker characteristics (column var). If a codominant marker displays high local instability of its relative position and this situation cannot be improved by removing some of its neighbors, you may decide to remove this codominant marker. In the example below, marker Xgwm 181 (#218) is highly unstable in the red variant of the cluster (LG2_r) and to a lesser extent in the blue variant (LG2_b), and we may want to remove this marker.


2_16


After treatment of the two parts of a cluster, it may happen that their codominant markers appear in identical order. That was the case with cluster LG6. Such a situation is marked as follows:

However, more frequently, to reach such a result, we need to apply the operation <Consensus>. For that, we should choose the corresponding option of the submenu for the corresponding part of the cluster. This approach is described further on (p. 2_19).

2_17



Repeated clustering (under relaxed stringency)

In the previous example, we started the analysis from a threshold recombination rate of 0.25. Let us relax it to 0.28. Then, any cluster with two types of dominant markers will be split into two parts. Therefore, two arrays of clusters will appear: the first with codominant markers and one type of dominant markers (r), the second with the same codominant markers and dominant markers of the other type. Re-clustering under relaxed conditions is conducted in two steps, according to the two types of dominant markers. The process of clustering is carried out as before: clusters that are closer to each other (by their ends) than the threshold are merged by default when the informativity score LOD>2.0. Also by default, merging is prevented if the total length of the resulting cluster exceeds 1.1 times the sum of the lengths of the component clusters. In the remaining cases the decision is made by the user. Merged clusters are marked as shown below:

2_18
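The default merging rules for re-clustering described above can be summarized as a small decision function. This is a sketch only: the function and parameter names are hypothetical, and the order in which the two default rules are checked (the length test before the LOD test) is an assumption, since the tutorial does not say which takes precedence.

```python
def merge_decision(end_distance, lod, merged_length, length_a, length_b,
                   threshold, min_lod=2.0, max_expansion=1.1):
    """Suggest what to do with two clusters during re-clustering.

    end_distance  -- recombination rate between the closest cluster ends
    threshold     -- current (relaxed) threshold recombination rate
    merged_length -- map length of the cluster that merging would produce
    """
    if end_distance >= threshold:
        return "keep separate"        # ends are not close enough
    if merged_length > max_expansion * (length_a + length_b):
        return "keep separate"        # merging is prevented by default
    if lod > min_lod:
        return "merge"                # merged by default
    return "ask user"                 # remaining cases: user's decision
```

For example, with the threshold relaxed to 0.28, two clusters of lengths 60 and 45 whose ends are 0.20 apart with LOD 3.5 would be merged if the merged map is 110 long, and kept separate if it is 130 long.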


Employing Consensus option

Our approach to building multilocus maps with dominant repulsion markers using F2 data is based on splitting the linkage groups into two sets, with the requirement that codominant markers be in the same order. This condition will not necessarily hold if each of the two sets is ordered independently. To ensure it, we suggest synchronous (consensus) ordering, for which the <Consensus> menu option should be employed. In the window <Consensus>, the number of codominant markers and the number of conflict markers (denoted by “C”) are displayed in the table <Shared (codominant) markers>. Currently, our algorithm provides an exact solution for synchronous ordering of a pair of chromosomes with ≤8 shared markers. In such a case a corresponding message appears on the screen, and by pressing the corresponding button we start consensus ordering. For more details about consensus analysis see p. 3_2.

The criterion of consensus ordering is the minimum total length of the two chromosomes under the constraint that shared markers must be in the same order. The window shows how this criterion changes during the optimization, together with the elapsed time. By the end of the computation, a corresponding message appears, followed by a window with the final results. If we now close the window <Consensus>, we will again see the two parts of our cluster, but the codominant markers will now appear in the same order, although the resulting maps may be slightly longer (“the cost of consensus ordering”).

2_19



In cases where the number of codominant markers is >8, we should delimit in both variants (red and blue) parts with ≤8 shared markers for consensus ordering of these parts. For each such round of ordering, the flanking regions of these parts remain unchanged. A corresponding message is displayed on the screen.


To select such parts we should press the button <Display chrom.>. A picture with the codominant markers of both parts of the linkage group will appear. There are two possibilities for reconciling the order of codominant markers between the two parts: to invert a sub-group of markers or to transpose a sub-group to another interval. For that, the options of the menu <Reorganization of the cluster> are available, e.g., <Invert the selected part>, which is suitable for the presented example.

2_20


Let us select the maximum interval including the mixed up (inverted) region by indicating in the list of markers the

beginning and the end of this interval in the red variant of the map. For that, it would be reasonable to include non-

shared (i.e., red) markers flanking the inverted region. By pressing button <OK> we conduct the operation and

corresponding change in ordering appears on the screen: the marked interval will be inverted, the length of the

chromosome will change, and instead of the initial mix up we will see isolated conflicts. Before accepting this result,

we should also check what will be the reaction of the blue variant of the map to such operation. Using <UnDo> option

we return back and choose for the test the blue chromosome.
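The inversion step and its effect on map length can be illustrated with a short sketch. The helper names and the nested-dictionary distance table `dist` are assumptions for the example; the tutorial does not specify how MultiPoint computes the length change it reports.

```python
def map_length(order, dist):
    """Sum of distances between adjacent markers in the given order."""
    return sum(dist[a][b] for a, b in zip(order, order[1:]))


def invert_segment(order, first, last):
    """Return a new order with the interval first..last reversed."""
    i, j = sorted((order.index(first), order.index(last)))
    return order[:i] + order[i:j + 1][::-1] + order[j + 1:]


def inversion_effect(order, first, last, dist):
    """New order and the change in map length caused by the inversion."""
    new_order = invert_segment(order, first, last)
    return new_order, map_length(new_order, dist) - map_length(order, dist)
```

Running `inversion_effect` once on the red order and once on the blue order gives the two length changes that the text suggests comparing before saving the preferred variant.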

After the corresponding

inversion is done, we can

compare the results of inversion

of either the red or blue parts

and select the best, based on

the information on their length

change caused by inversion.

The selected change can be

saved by pressing button

<Save transformed order>.

2_21



Here is another example where option <Move the selected part> is more suitable. In this example we have 10

conflicting markers. Let us apply option <Move the selected part> to LG36_r. Then, we select the sub-group of

markers to be transposed and the interval of their new location (in this case we move the selected sub-group to the

upper end of the chromosome). Then, by pressing button <OK> we obtain the result.
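Transposing a sub-group of markers, as in <Move the selected part>, can be sketched in the same spirit. The function name and its `before` argument (the marker in front of which the block is placed, or None for the end of the map) are hypothetical conventions chosen for this example.

```python
def move_segment(order, first, last, before=None):
    """Move the block first..last so it sits immediately before `before`.

    With before=None the block is moved to the end of the order.
    """
    i, j = sorted((order.index(first), order.index(last)))
    block = order[i:j + 1]
    rest = order[:i] + order[j + 1:]
    if before is None:
        return rest + block
    if before in block:
        raise ValueError("target position lies inside the moved block")
    k = rest.index(before)
    return rest[:k] + block + rest[k:]
```

Moving a block to the upper end of the chromosome, as in the example above, corresponds to passing the first marker of the map as `before`.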

2_22



In this example, like in the previous one, we may apply <UnDo> option and conduct the same operation with

second (blue) part of the linkage group. This will also result in the appearance of the table with information on

changes in the map lengths.

To resolve the detected conflicts, we can use the two available options for

Reorganization of the cluster sequentially. In the presented examples we have

got one or two regions with conflicts. We can now conduct consensus ordering

to resolve local conflicts of codominant orders in the red and blue versions of

our map. To do that, we employ option <Display conflict group>, select one of

the conflict groups and press <OK> button. After the window of the consensus

process appears on the screen, we press button <Start>. Due to the small

number of markers in the conflict region of the chosen example, the

optimization process will be very fast. Its end is marked by a message, and a window with the final result appears.

By closing the window of the optimization process,

we will see in the table of shared markers that the

number of conflicts has decreased.

Now, by pressing again button <Display chrom> we obtain the

window displaying codominant markers of both versions of the

chromosome (red and blue). The one remaining conflict can be

resolved by conducting consensus analysis for this region.

2_23



As a result, we obtain identical order of codominant markers in both variants (red and blue) of the map, which can be seen when we close the window <Consensus>. It should be noted that the map of each consensus variant (red and blue) can be longer than the corresponding map obtained without the requirement of identical order for the codominant markers in both variants.

Final comments and reminders:

1. After opening of a cluster with two types of dominant markers, each part (with codominant and

coupling phase dominant markers) should be first treated separately, by using <Ordering> option.

Only after that you should move to the tools in <Consensus> option.

2. Consensus analysis should be started only after the stepwise clustering (with gradually relaxed

threshold recombination rate) was finished.

3. We continue to improve our discrete optimization algorithm in order to increase the number of simultaneously treated markers in consensus analysis, and thereby reduce the need to treat the conflicts by small regions with ≤8 conflicting markers.

2_24



Extending the linkage group – insert function

The function that adds markers to the skeleton map can be applied only to clusters that have already been treated using the Consensus menu, or to clusters that from the beginning included only one type of dominant markers (i.e., in coupling phase). After opening a consensually treated cluster, we will see its two parts (red and blue) with codominant markers (green) in identical order. The function of extending the skeleton map by additional markers is applied separately to each of the two variants (red and blue). After calling this function, a question appears on the screen: which markers will be added first, dominant or codominant? The reason is that dominant markers can be added to the chosen part of the cluster, either red or blue, whereas codominant markers should be added to both parts under the requirement of consensus.

If we answer <No>, the list of relevant candidate markers from the chosen map variant (e.g., red) appears. It includes coupling phase dominant markers from Heap. In addition, the list includes dominant markers from the second variant of the cluster that can also be considered as candidate markers. Using the insert function can be helpful in filling gaps in the map or in extending the map by markers more distal than those of the skeleton map that resulted from consensus treatment. The example shows the lists of additional markers for the red part of a cluster. When a candidate marker is chosen for insertion, the marker on the map closest to this candidate is highlighted in bold font. To move from one variant (e.g., red) to the second (blue) we should use the button:

Red

Similarly, for the second part of the chromosome, we also should choose only dominant markers, because

codominant markers are added to both parts simultaneously (see next page). 2_25


To facilitate adding codominant markers (answer <Yes>) from Heap to one or both parts of the linkage group, a special table appears, together with a switch that allows inserting the marker into one part (red or blue) or into both.

In this table, the following information is provided for each marker to be added: its nearest neighbor in both (Red and Blue) parts of the cluster, the distance to the nearest markers, and the change in map length upon adding this marker upward or downward of the nearest marker. The user can employ this information, together with the already ordered maps, to make his/her decisions. For the variants of adding a marker to both (red and blue) maps, the system analyzes and marks symbolically the possible situations of insertion without violation of the consensus order:

Can be inserted only downward of the nearest marker

Can be inserted only upward of the nearest marker

Cannot be inserted, otherwise consensus is violated

User’s decision: for one cluster upward and for the other downward insertion is better

User’s decision: no conflicts

2_26



As before, the inserted markers are underlined, both in the list of markers and in the figure representing the chromosome map. As before, the added markers can be removed (in such a case, removing a codominant marker from one part leads to its automatic removal from the second part as well).

When a codominant marker is appended to both parts of the linkage group, the possibility of insertion (upward or downward of the closest marker, or prohibition) is defined by the corresponding symbol of the foregoing table. When a marker is inserted in one part only, the result depends on the user’s choice (e.g., consensus may be violated). Thus, this option should be employed only in specific cases, e.g., when a certain marker must be included in the map.

(Figure panels for the Red and Blue parts: a red dominant marker was inserted; a codominant marker was inserted.)

2_27



Extending the linkage group – attach function

The attach function is also applicable only to clusters after Consensus treatment. However, for this function the list of additional markers includes only those dominant markers from Heap that are in coupling with the dominant markers of the considered variant of the cluster. Therefore, for the cluster considered for the function <insert>, the lists of attached markers for the two parts (Red and Blue) will have the following form:

It is worth recalling that the attached markers are not displayed in the list, whereas the markers to which they are attached are marked by the letter G. If we choose such a marker and click the right mouse button, we’ll get an additional menu. Its options allow us to see all markers attached to the chosen marker or chosen interval.

2_28


Output, final results

In fact, the output of the results remains the same as in the previous version. However, the option <Print (output to EXCEL)> can be applied only to a cluster (linkage group) carrying either one type of dominant markers, or both types of dominant markers that are already in consensus order. In the latter case, the user should print separately two graphs for the maps corresponding to the two types of dominant markers. The variant is chosen by answering the question:

The option <Final results> is available only after finishing consensus analysis for all clusters with two types of dominant markers. For each such cluster two types of output file are generated.

2_29


Treatment of F2 data with only dominant markers

Working with such data is very similar to the approach described above. As a result of the initial clustering, we will get two types of clusters:

In this case, each cluster can be treated separately, e.g., as in backcross data. We can alternate ordering of the markers with stepwise clustering, gradually relaxing the threshold recombination and merging end-to-end clusters within each class of markers (red or blue). The result of such steps will be reflected in graphical form, as in the following picture:

An important question is how to “combine” the ordered clusters of each type into representatives of linkage groups, keeping in mind two complications: (a) here we have no shared codominant markers, and (b) there is an increased danger of false linkage between non-syntenic repulsion phase markers (see Mester et al., 2003). We suggest a simple interactive tool based on analysis of distances between the clusters. Let us press the button <Display table of distances>.

2_30


A new window will appear with a table of distances between clusters with opposite linkage phases and possible

menu options.

As a result, the names of chosen clusters disappear from the table, and a list of chromosomes created by this

merging appear in a separate table.


2_31

Rows represent clusters with one type of markers (red) and columns those with the other type (blue). A small distance between two clusters is a basis for considering them parts of one chromosome. By analyzing the table of distances, the user can choose certain rows and columns and activate the function <Select LGs that belong to one chromosome>.
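To make the use of the distance table concrete, the sketch below lists red/blue cluster pairs whose distance is small enough to suggest they belong to one chromosome. It is only an aid to reading the table: the function name, the nested-dictionary layout, the cluster names, and the cut-off value are all hypothetical, and in MultiPoint the choice is made interactively by the user.

```python
def candidate_pairs(table, max_distance):
    """Red/blue cluster pairs no farther apart than max_distance.

    table[red][blue] holds the distance between a red cluster (row)
    and a blue cluster (column); closest pairs are returned first.
    """
    pairs = [(d, red, blue)
             for red, row in table.items()
             for blue, d in row.items()
             if d <= max_distance]
    return [(red, blue, d) for d, red, blue in sorted(pairs)]


# Hypothetical table: two red clusters (rows) and two blue clusters (columns).
table = {
    "LG1_r": {"LG1_b": 0.05, "LG2_b": 0.45},
    "LG2_r": {"LG1_b": 0.40, "LG2_b": 0.10},
}
pairs = candidate_pairs(table, 0.2)
```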


If the user suspects that the last choice was wrong, the option <UnDo> can be applied. Option <All chromosomes> erases the list of the created chromosomes and restores the initial table of distances. Option <Selected chromosome> removes the selected chromosome from the list and restores the corresponding clusters to the table.

Analysis of distances between the distal markers of the clusters may allow determination of the relative position and orientation of clusters in the chromosome. For that, the option <Orientation of LGs> can be employed. The chromosomes ordered in such a way are marked by a special sign. It should be noted that the distances between distal markers of the clusters do not always provide sufficient information to allow unequivocal ordering.

After closing the table of distances and the list of chromosomes, we obtain the list of clusters combined into chromosomes.

2_32




We employ the designation F1xF1 for mapping populations obtained by crossing two heterozygous diploid individuals, although in the literature they are usually referred to as F1. Such crosses are common for outbred species. For a heterozygous locus, the progeny of such a cross may segregate for two to four alleles. Loci heterozygous in both parents and segregating for 4 or 3 alleles will be referred to as F1 (crosses A1A2 x A3A4 and A1A2 x A1A3, A1A2 x A2A3). Similarly, the presence of 2 segregating alleles may represent a situation when both parents are heterozygous for the same two alleles, A1A2 x A1A2 (referred to as F2), or situations when only one parent is heterozygous, A1A2 x AA (referred to as testcross or “backcross”).
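The locus types defined above can be expressed as a small classification function. This is a sketch for illustration only: the function name, the representation of a parental genotype as a pair of allele labels, and the label "non-informative" for loci where neither parent is heterozygous are assumptions, not part of the Recoding program.

```python
def classify_locus(mother, father):
    """Classify a locus in an F1xF1 population from parental genotypes.

    mother, father -- pairs of allele labels, e.g. ("A1", "A2")
    """
    mother_het = mother[0] != mother[1]
    father_het = father[0] != father[1]
    n_alleles = len(set(mother) | set(father))
    if mother_het and father_het:
        # both parents heterozygous: 2 shared alleles -> F2, 3 or 4 -> F1
        return "F2" if n_alleles == 2 else "F1"
    if mother_het or father_het:
        return "backcross"            # only one parent is heterozygous
    return "non-informative"          # assumed label: nothing segregates
```

For instance, `classify_locus(("A1", "A2"), ("A1", "A3"))` falls in the F1 class, while `classify_locus(("A1", "A2"), ("A1", "A1"))` is a testcross (“backcross”) locus informative only for the mother.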

Population F1 x F1

Data input

Data on this population can be prepared in one of two formats: as a Tab-delimited table or in the format of the JoinMap package. During input of population data you should put <Recoding data availability> in state <on> and press the button <Select data file for recoding>. The program recoding.exe will input the data, test the file, output the detected errors, and create two files, with mother and father alleles (moth.txt and fath.txt). The initial part of the names of these files coincides with the name of the initial data file: for example, dataBor-moth.txt and dataBor-fath.txt. These files will appear in the same folder where the initial data file was placed. Some details of the Recoding function are provided in the Instruction on the next page. For F2 markers, the scores of the markers are presented in both the moth.txt and fath.txt files; values of backcross markers appear in (one) corresponding file.

Creation of the two files is the first phase of input. Now you should put <Recoding data availability> in state <off> and, after pressing the button <Select data file for recoding>, select either of the two created files and the folder where you want the solution to be saved. After input is done, the program calculates two matrices of recombination rates, for the female and male sides. Please note that this is a rather slow process, but we hope to expedite it in the future.

2_33


Instructions for Recoding.

If the input file extension is .xls, the desired worksheet is opened (assuming it contains data analogous to those in the example "fam1test.xls"). Otherwise the user should select the format of the text file.

Two formats are currently available:

Tab-delimited table (columns are marker alleles): the same as the Excel worksheet, but the file is tab-delimited text.

"JoinMap" program format (CP population type).

After the program has finished, it creates a file named "<input_file_name>_err.txt".

Messages in the "<input_file_name>_err.txt" file:

1. If the user has clicked "Cancel" at some stage of the input, the file contains the message "Cancel".

2. If the opened worksheet was empty, the message in the file is "Empty worksheet". If some special format was selected and the file is not compatible with it, the message in the file is "Erroneous file format".

3. If the file was interpreted successfully, the file contains: the number of individuals; the number of informative markers for inheritance from father and mother; notes about non-informative markers (list of names) and errors in data (if any); and the user's selection of how to work with the data. For example:

"Genotypes: 237 Father: 244 markers Mother: 225 markers

Notes: Not informative for father genotype:NYU10 !

Not informative for mother genotype:222C,210B,224B,208E,206B,205F,211E,109A,130A,44c,idh2,NYU19,NYU6,23a,NYU3,143C,twhh,142c,1c,NYU50,120C !

Mendelian errors found in YU22 !

Number of individuals in the file header is 237 and does not coincide with numbers for markers 211C (236),NYU3 (240)!

Incompatible dominance types in markers 214E!

User selection:

Markers having nonvalid number of individuals were excluded by the user!

Mendelian errors were replaced with missing values!

Marker 214E made codominant by the user."

If there are informative markers for the father, the data file "<input_file_name>_fath.txt" is created. Each line of the file begins with the marker name followed by a space, then the marker data (coded 1, 2, 3, 4, 5, 0, as usual) separated by tabs. If there are informative markers for the mother, the data file "<input_file_name>_moth.txt" is created in the same format.
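For readers who want to inspect the recoded files outside MultiPoint, the line layout described above (marker name, a space, then tab-separated scores) can be read with a few lines of Python. This is only a sketch of that layout: the function names are hypothetical, and the sample text below is invented for illustration rather than taken from a real output file.

```python
def parse_recoded_line(line):
    """Split one line into (marker_name, list_of_integer_scores)."""
    # The marker name ends at the first space; the rest is tab-separated data.
    name, _, data = line.rstrip("\n").partition(" ")
    scores = [int(tok) for tok in data.split("\t") if tok.strip() != ""]
    return name, scores


def parse_recoded_text(text):
    """Parse the whole content of a *_fath.txt or *_moth.txt style file."""
    markers = {}
    for line in text.splitlines():
        if line.strip():  # skip empty lines
            name, scores = parse_recoded_line(line)
            markers[name] = scores
    return markers


# Invented example content in the described layout.
sample = "mk1 1\t2\t0\t1\nmk2 2\t2\t1\t0\n"
print(parse_recoded_text(sample))
```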

2-34



After input and building the recombination matrices, the program displays a summary table with some characteristics

for each of the markers. It includes data on missing scores and Chi^2 for deviation of segregation ratios from

expected ones for both female and male sides (for backcross markers these data appear for only one of the sides).

For each marker, the table includes also its max LOD value (for most significant linkage on the whole set of the

remaining markers). All data can be sorted according to the values of any of the columns, helping to select the

markers that you may want to delete (too many missing scores, or high segregation distortion, or too loose linkage

with any of the remaining markers).
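As an illustration of two of the columns in this table, the sketch below computes the number of missing scores and a Chi^2 value for deviation from a 1:1 segregation ratio, as would be expected for a backcross-type marker. It assumes that code 0 denotes a missing score and that codes 1 and 2 denote the two genotype classes; these coding assumptions and the function name are hypothetical, and the max LOD column is not reproduced here.

```python
def marker_summary(scores, missing_code=0, class_codes=(1, 2)):
    """Count missing scores and test a 1:1 segregation ratio for one marker."""
    n_missing = sum(1 for s in scores if s == missing_code)
    n1 = sum(1 for s in scores if s == class_codes[0])
    n2 = sum(1 for s in scores if s == class_codes[1])
    n = n1 + n2
    # For an expected 1:1 ratio, sum((O - E)^2 / E) simplifies to (n1 - n2)^2 / n.
    chi2 = (n1 - n2) ** 2 / n if n > 0 else None
    missing_fraction = n_missing / len(scores) if scores else None
    return {"missing": n_missing,
            "missing_fraction": missing_fraction,
            "chi2": chi2}


print(marker_summary([1, 1, 1, 2, 0]))
```

Sorting markers by either value then gives the kind of ranking the table is meant to support when deciding which markers to delete.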

2_35


You can mark such problematic markers and press button <Delete markers>. These markers will appear in the

bottom part of the table and the total number of markers will be updated. Two undo options are available here:

<Undo of last step> or <Global Undo>. After closing this window, we can move to the next step.

Preliminary treatment (continued)

2_36


Control of bound together markers

This process was already described earlier, but its specific features for the F1 x F1 population should be noted. Groups of bound together markers are created separately for the male and female side data. A group is registered if all its markers are of backcross type, or if it includes only one marker of F2 or F1 x F1 type and the others are of backcross type. But if the group includes several markers of F2 or F1 x F1 type, we retain in the group those markers of this type that are bound together in both the moth.txt and fath.txt files. As a result, a window will appear with information about the groups of bound together markers and the number of markers moved to Heap.

Marker clustering is based on a single matrix of pair-wise recombination rates common to the two sets, built by taking the minimum of the male-side and female-side recombination values. To start the process, press the button <First clustering>.
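The construction of the common matrix can be sketched as an element-wise minimum. The sketch assumes that both matrices are indexed by the same marker list and that a pair not estimable on one side (for example, a backcross marker absent from that parent's file) is stored as None; both conventions are assumptions made for illustration, not a description of the program's internal storage.

```python
def combine_min(female, male):
    """Element-wise minimum of two square recombination matrices.

    None marks a pair with no estimate on that side; if only one side
    has a value, that value is used.
    """
    n = len(female)
    combined = [[None] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            values = [v for v in (female[i][j], male[i][j]) if v is not None]
            combined[i][j] = min(values) if values else None
    return combined


female = [[0.0, 0.30], [0.30, 0.0]]
male = [[0.0, 0.20], [0.20, 0.0]]
print(combine_min(female, male))
```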

2_37


First Clustering

Clustering is conducted as with other populations. We recommend starting with small threshold values. In this example we started with a threshold of 0.1.

2_38


The general view of the obtained clusters

At further steps of clustering, the parameter “LOD threshold” is used in addition to the parameter “Recomb.Rate threshold”.

2_39

Treatment of each cluster

In the figure with the obtained clusters on the previous page, green marks clusters that include only shared markers (F2 or F1 x F1), red denotes mother alleles, and blue denotes father alleles. By opening any of the clusters (with a double click) we obtain a window with the two parts of the cluster, according to female (r) and male (b) alleles. Shared markers (F2 or F1) are denoted by Sh; the remaining markers are of backcross type. Each part, r and b, is ordered separately. Ordering combined with re-sampling (jackknife or bootstrap) is a relatively slow process, and we hope to expedite it in a future version.

Treatment of each cluster (continued)

As usual, during analysis a marker can be moved to Heap. A unique (backcross) marker is then deleted from its set, whereas a shared marker (F2 or F1 x F1) can be deleted from both the male and female sets or from only one of them. To make a proper decision, the user can check the effect of the marker on the order stability in both the male and female parts.

During the analysis of each male and female part, it may be useful to take into account the matrices of pair-wise recombination rates and LOD values. They can be displayed on the screen using the corresponding menu options. Note that the LOD matrix is calculated during the ordering procedure and can be displayed after this operation.

After closing the analyzed cluster, we obtain the window that shows all clusters. A cluster with both the male and female parts ordered is denoted as shown below, with LG48 as an example. If the shared markers in the two parts are in the same order (consensus), then it is displayed as LG8.

After repeated opening of these clusters, we obtain the pictures of their parts, while for the cluster in consensus a special message will also appear.

If the shared markers of the male and female parts of a treated cluster are in a conflicting order, they should be re-ordered using the menu option <Consensus>. It makes sense to conduct this step after the user has achieved a reasonable size of the clusters during the stepwise increase of the threshold recombination value. Thus, we demonstrate the consensus analysis for one of the clusters assembled at threshold r=0.25.

In the example, the cluster contains 15 markers of F2 or F1 x F1 type shared by the two parts of the cluster. Each part is already ordered. Thus, we use the menu option <Consensus> on either part of the screen (left or right). The resulting window shows that we have here 15 shared markers, four of them in conflicting order.

By pressing the button <Display chrom.> we can see the markers, their relative positions, and the conflicts. We can also change the conflict situations ourselves, as in the case of F2Dom (see pp. 2-20 to 2-23). Now we press <Creation of consensus order> to start the process of consensus analysis.

2_43


After pressing the button <Start>, we will be able to stop the process only after the function <Optimization by Global Criterion> has finished (this is accompanied by a special message). We then press the button <Stop> and can see that the conflicts are eliminated. (For a more detailed description see Part 3.)

To resolve this conflict, we should repeat the process by pressing the button <Display chrom>. Now that all the conflicts are resolved, we can close this window.


Division option for F1_F1

By opening a cluster, we actually get two sets of markers, maternal and paternal. For each of the sets we should conduct the Ordering and then the Consensus procedures. Consider an example.

This cluster was saved at the Consensus stage, hence the message.

After pressing the <OK> button, we will see the two maps. Let us subdivide this cluster (both its maps, r and b) into two parts.


Division option for F1_F1 (continued)

In the right part (LG2_b), choose the menu option <Division of the linkage group>. Select the markers from the first one to JM042E24r_127(514), using the Shift key. The selected range of markers will be highlighted in blue. By pressing the right mouse button, the following message will be obtained:

Simultaneously, in the left part (LG2_r), a marker displayed in a larger bold font will indicate the border of the marker set to be used for creating a new cluster. The map will be shifted to leave room for displaying the highlighted marker. In the left part we must choose the menu option <Division on the linkage group> and select the set of markers flanked by the highlighted (bold) first and last markers. The user may slightly modify the choice, e.g., by including non-shared markers outside or inside the selected range. However, all shared markers (F2 and F1_F1) must be selected. By pressing the right mouse button, we will get the following message:

By pressing this button we agree with this suggestion, and the selected markers will be removed from the list and surrounded on the map by a dotted frame.



We now return to the right part of the window and click on the field near the marker list. This will allow us to see all earlier selected markers. By pressing the right mouse button we will get the message:

We accept this proposal and delete 24 markers: they will be removed from the list and surrounded on the map by a dotted frame. Now we select all markers from the lower part of the map displayed on the right side of the screen, to create one more cluster, and repeat the procedure described above. This time we do not need to select the menu option <Division on the linkage group> again (neither on the right nor on the left side of the screen).

As a result of these steps, only one marker remains in the cluster (it will be moved to Heap). The derived maps will look as shown in the picture:

After the cluster is closed, the following question appears:

By choosing <Yes> you implement the division of the cluster; otherwise all the steps related to the subdivision of this cluster will be canceled.

2_47



As expected, we now have 26 clusters instead of 25. Cluster LG2 now includes 38 markers and is marked as New, with status “ordered and in consensus”. A new cluster has also appeared, LG26 with 25 markers, with status “ordered and in consensus”. The results should be saved using the menu option <Save all clusters>.

Other functions of the system (repeated clustering and output of the results) are very similar to those for

other types of mapping populations.


3_1

Table of Contents

3_1

3_2

3_3

3_4

3_5

3_6

3_8

3_9

3_10

3_11

3_14

3_15

3_17

3_18

3_18

3_19

3_28

3_32

3_34

3_34

3_37

3_40

3_41

3_44

3_44

3_47

MultiPoint Tutorial Part 3 - Consensus mapping analysis of multiple data sets

Introduction
    Building multilocus consensus maps
    Two-phase algorithm for consensus mapping
    The general scheme of consensus mapping analysis
Input data
Preliminary analysis
Ordering each chromosome separately
Consensus analysis in case of high proportion of shared markers
    Creating derivative datasets without unique markers
    Consensus analysis in the absence of unique markers
    Results of consensus analysis in the absence of unique markers
    Continuation of the “consensus” analysis in the absence of unique markers
Consensus analysis for all markers
    Beginning the consensus analysis
    Local analysis
    Global analysis
    Reviewing the results of consensus analysis
Displaying the results
    Integral map
    Results for each set
Saving the intermediate results and continuing the analysis
Removing and adding sets in the process of consensus analysis
Appendix
    Reorganization of maps for set pairs with conflicting orders
References


Building multilocus consensus maps

The objective: building multilocus genetic maps based on data from different labs and mapping populations, with the requirement that shared markers must be in shared orders. Multilocus consensus genetic mapping (MCGM) is a further complication of genome mapping. Two approaches were suggested to solve MCGM problems, both looking for shared orders with the maximum number of shared markers. The first approach is based on “giving credit” to the available maps; to obtain the consensus solution, different heuristics are employed, e.g., a graph-analytical method based on voting over partial orders (Yap et al. 2003; Jackson et al. 2007).

[Figure: Graph-theoretical approach for reconciling orders received from different sources (Yap et al. 2003)]

The second approach is based on searching for the consensus solution by re-analysis of the raw data, instead of looking for shared orders in pictures of previously constructed maps (Mester et al. 2005; Korol et al. 2009). The algorithm implemented in MultiPoint is based on this approach and includes two phases (see next page). In Phase I, multilocus ordering is performed for each data set, combined with iterative re-sampling to evaluate the stability of marker orders in the individual maps. In Phase II, we consider consensus mapping as a new variant of the famous Traveling Salesperson Problem (TSP) that can be formulated as synchronized-TSP, and MCGM is solved by minimizing the criterion of the sum of recombination lengths along all multilocus maps for the considered chromosome (Mester et al., 2010).

We apply as the main criterion the sum of recombination rates (SRR) taken across the participant maps, i.e., $SRR = \sum_i L_i$. Clearly, the amount of information about the multilocus order provided by the ith dataset is proportional to its sample size $N_i$. Hence, it is natural to employ as optimization criterion the weighted SRR, with $w_i = N_i / \sum_j N_j$ taken as weights: $SRR = \sum_i w_i L_i$. Another factor that may affect the between-set differences in information content is the accuracy of marker scoring. Erroneous scoring leads to inflation of the map length, hence an increased impact of low quality data on the final result. To compensate for this effect, we employ weights that reduce the influence of the datasets with long individual maps, e.g., $w_i = L_{0\min} / L_{0i}$, where $L_{0i}$ and $L_{0\min}$ denote the initial (before consensus analysis) map length of the ith and the shortest chromosome, respectively; to account for both effects, we use weights $w_i = (N_i / L_{0i}) / (\sum_j N_j / L_{0\min})$.

3_2
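The three weighting schemes (by sample size, by initial map length, and their combination) can be written out as a short sketch. It reads the size-based weights as normalized by the total sample size over all datasets, which is an assumption where the extracted formulas are ambiguous; the function and parameter names are hypothetical.

```python
def consensus_weights(sample_sizes, initial_lengths, scheme="combined"):
    """Weights w_i for the weighted SRR criterion.

    sample_sizes:    N_i for each dataset
    initial_lengths: L_0i, map length of each dataset before consensus
    scheme:          "size", "length", or "combined"
    """
    total_n = sum(sample_sizes)
    l_min = min(initial_lengths)
    if scheme == "size":
        return [n / total_n for n in sample_sizes]
    if scheme == "length":
        return [l_min / l for l in initial_lengths]
    if scheme == "combined":
        # Equals the product of the "size" and "length" weights.
        return [(n / l) / (total_n / l_min)
                for n, l in zip(sample_sizes, initial_lengths)]
    raise ValueError("unknown weighting scheme: " + scheme)


def weighted_srr(weights, lengths):
    """Weighted sum of recombination lengths across datasets."""
    return sum(w * l for w, l in zip(weights, lengths))


sizes = [100, 300]      # invented sample sizes
lengths = [2.0, 1.0]    # invented map lengths (sum of recombination rates)
w = consensus_weights(sizes, lengths, "combined")
print(w, weighted_srr(w, lengths))
```

With these invented numbers the larger dataset with the shorter map receives the dominant weight, which is the intended effect of the combined scheme.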


Two-phase algorithm for consensus mapping

[Scheme: Original datasets -> Phase I. Constructing verified multilocus maps -> Phase II. Consensus mapping, by either the FF heuristic algorithm or the SCF exact and heuristic algorithms (heuristic algorithm for n>16; exact algorithm for n up to 14-16)]

For the second phase, i.e., for searching for the consensus solution to MCGM, two different algorithms are available in MultiPoint. The first one, named Full Frame (FF), uses special heuristics for global discrete optimization of synchronized-TSP for all markers (unique, shared conflicting, and shared non-conflicting). Our numerous tests show that the FF algorithm is effective with up to k=10-15 populations (data sets) with a total number of shared markers N<50. For larger problems, we developed another algorithm, based on defining regions of local conflicts in the orders of shared markers (referred to as Specific Conflicted Frames, SCF), followed by “local” multilocus ordering for each such region. This approach allows solving much larger MCGM problems (e.g., with k>20-30 populations and N>50-100 or more markers) by moving consecutively along the SCFs.

Solving MCGM via dissecting the chromosome into SCFs includes defining sets of conflicting marker regions obtained in Phase I (based on non-synchronized solutions). Then, SCFs are formed by analysis of all pairs of the resulting individual maps. Each SCF contains shared conflicting and non-conflicting markers, and some set-specific (“unique”) markers. The remaining non-conflicting shared markers between the SCF regions are considered as “frozen” anchors during the solution process for each SCF region (hence, only SCF markers participate in the optimization process). This version of the algorithm significantly reduces CPU time. Moreover, for certain sizes of SCF an exact solution can be obtained.

The described algorithms are represented in more detail in the schemes on the next two pages.

3_3



The general scheme of consensus mapping analysis

1. Input and separate analysis of each dataset.

2. Possible utilization of bound together markers for all sets simultaneously.

3. Multilocus ordering of each set separately.

4. Consensus ordering, by either of two routes: (a) consensus ordering based on FF (Global First step); (b) defining local regions with conflicting orders of shared markers (SCF) and resolving the conflicts (Local step).

5. Continuing joint analysis of all sets using the preliminary results from the Global First or Local steps.

6. Results: (a) output of the integral map for the chosen method of consensus analysis; (b) displaying all set maps for the chosen method of consensus analysis; (c) displaying for each set the maps based on different consensus methods.

3_4


Consensus local analysis of several sets with conflicting shared markers

1. Analysis of conflicts in pair-wise combinations of datasets, allowing for heuristic rules of transposition and inversion in order to get a better initial point for consensus ordering.

2. Defining frames (regions) with local conflicts across several sets.

3. Consensus analysis for the defined frame of the chosen group of sets.

4. Repeating until all pair-wise conflicts are resolved.

5. Control for the presence of remaining conflicts and their resolution.

3_5


Input data

The consensus analysis system of the MultiPoint package is built for comparing multiple maps and conducting joint analysis of multiple data sets in order to build consensus maps that obey the requirement that shared markers in these maps appear in a shared order. Note that the consensus analysis across multiple data sets is conducted separately for each chromosome. This means that before starting this analysis, the user should have the markers classified into linkage groups, based on the literature, previous analysis with the MultiPoint standard version (2.1), or previous analysis with any other software.

The functions of consensus analysis are provided by the option <Consensus> of the main menu, which includes 3 sub-options. Let us start with the first one, <Input files>. After choosing this option, we should press the button <Select data file for input>.

In the corresponding window we should select a folder with mapping data and a specific data file. The file name will be marked as “select”. By pressing the button <Input Data> we input this file. After all necessary data files are included, and their names, marked as “input”, are seen in the input window, we should press the button <End of input> and choose the folder for saving the intermediate and final results of the analysis. Then the main window of the analysis appears. Note that joint mapping analysis may include datasets from different types of mapping populations, e.g., dihaploid, F2, and RIL, simultaneously.

3_6


Input data (continued)

In the case of mapping data for RIL_Selfing or RIL_Sib_mating populations, the analysis of a separate data set is conducted using “observed” recombination rates, but for consensus mapping analysis we employ “transformed” rates (p. 1_83 to 1_85); otherwise the comparison and joint analysis with other population data would be impossible.

For IRIL data, the number of intercross generations employed in building the mapping population should be indicated.

Input of F2 data is checked for the presence of dominant markers. If they are present, the dataset is split into two subsets, each containing codominant markers and dominant markers in coupling phase (see also p. 2_30 to 2_32). The names of these sets include the name of the initial file and the extension “red” or “blue”.

3_7


Preliminary analysis

After data input, we obtain a window with a table showing the list of all chosen files, population type(s), sample sizes, and numbers of markers. These files are also referred to as (numbered) data sets.

During the analysis of shared markers we should take into account the bound together markers among the shared markers. Markers that belong to this class do not actually participate in further analysis and can be moved to Heap until the last phase, when they can be returned to the final consensus maps. Their presence in the input data is reflected in the difference in the number of markers between the <in set> and <in map> columns. The item <Shared markers> for each set represents the number of shared markers, i.e., those that appear at least once among the marker names of other sets. The number of bound together markers for each set is shown in brackets.

As usual, we can employ the function <Control of bound together markers> and indicate names (or parts of names) of priority markers (see the details on p. 2_2, 1_38). After pressing the button <Start of control>, the item <Markers in the map> will change for each set.

During the analysis of groups of bound together markers, only groups in which no marker is shared, or in which only one marker is shared, are considered (and this shared marker, or “delegate”, will represent the group). In groups with several bound together shared markers and several unique markers, the program chooses the shared marker with the highest priority, which is then used as the “delegate” of the unique markers of the group (these will be moved to Heap), while the other shared markers remain in the set. It is reasonable already at this stage to choose the mode of transformation of recombination rates to map distances (cM) and save the corresponding choice using the option <Save all change>.
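The <Shared markers> count defined above (markers whose names appear at least once in another set) can be reproduced with a small sketch. The data structure, a dictionary from set name to a list of marker names, and the function name are hypothetical choices for illustration.

```python
def shared_marker_counts(sets):
    """For each set, count markers whose names occur in at least one other set."""
    counts = {}
    for name, markers in sets.items():
        # Collect every marker name that appears in any of the other sets.
        others = set()
        for other_name, other_markers in sets.items():
            if other_name != name:
                others.update(other_markers)
        counts[name] = sum(1 for m in markers if m in others)
    return counts


# Invented marker lists for three data sets.
example_sets = {
    "Set1": ["m1", "m2", "m3"],
    "Set2": ["m2", "m4"],
    "Set3": ["m3", "m2", "m5"],
}
print(shared_marker_counts(example_sets))
```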

3_8


Ordering each chromosome separately

Each chromosome can be ordered, and certain markers can be moved to Heap. The “consensus” framework imposes some constraints on this stage of analysis (compared to our standard scheme described in the non-consensus chapters of the tutorial). Namely, when improving the quality of the multilocus order upon resampling analysis, shared problematic markers (if the list of detected problematic markers includes shared markers) cannot be deleted using the automatic <Control of monotony> function. Such markers can be deleted only manually. Note that shared markers are marked by the symbol “Sh”. After separate ordering, each set is marked in the table with a special symbol; the weights of each set in the corresponding optimization criteria also appear in the table.

Our consensus mapping is based on joint analysis of raw mapping data from multiple populations. For the chromosome in question, we consider as the best solution the set of maps for the involved mapping populations that minimizes the criterion “weighted sum of map lengths across the involved populations” (for the proposed weights see p. 3_2). We can choose one of the three proposed weighting approaches, according to: (a) the sample sizes, (b) the lengths of the individual (before consensus analysis) maps, or (c) a combination of (a) and (b). There is also an option to use as weights numerical values (multipliers) proposed by the user. The column <Cur.weight> then shows the values of the chosen weight.

The weight for each set can be changed by the user by selecting the set and clicking the title <Cur.weight>. Then, in the window that appears, the corresponding value is replaced by the needed value, followed by pressing the <OK> button.

3_9


3_10

Consensus analysis in case of high proportion of shared markers

During data input, the program determines the set of shared markers. If many of the shared markers are co-segregating, we perform the “bound together” procedure only for shared markers. It is noteworthy that two markers co-segregating in one dataset may recombine in another dataset, while one or both of them may be absent in a third dataset. Thus, we first build bound together groups only for markers that are present and co-segregate in all data sets. The next step is performing the <Control of bound together markers> function, which defines groups of bound together markers for each data set. During the analysis of each such group, the program takes into account the groups of bound together markers obtained for shared markers. As a result, the maps of separate data sets may include cosegregating shared markers that do not belong to one group of bound together markers. This peculiarity will also be reflected in the corresponding output EXCEL files.

Some peculiarities of the “bound together” function when all markers are shared

We consider here a version of consensus analysis applied only to shared markers. It may be especially useful when several mapping populations have been genotyped with an SNP array. In such situations the proportion of shared markers between at least pairs or trios of populations may be very high. Focusing on shared markers simplifies the analysis and makes it possible to perform consensus mapping for a very high number of markers (Mester et al. 2015). Therefore, after separate ordering of all involved datasets, the user should estimate the proportion of unique markers and decide whether to ignore the unique markers and conduct consensus analysis only for the shared markers.

----------------------------------------------------------

Mester D., Y. Ronin, P. Schnable, S. Aluru and A.B. Korol. 2015. Fast and accurate construction of ultra-dense consensus genetic

maps using evolution strategy optimization. PloS One 10(4): e0122485.


3_11

Consensus analysis in case of high proportion of shared markers (continued)

Creating derivative datasets without unique markers

We consider a stage after the input of all datasets and the ordering of each such set. The data include a high proportion of shared markers and many cosegregating markers.

The assumption that all markers are shared simplifies the analysis, makes it possible to work with a very high number of markers, and, most importantly, allows applying our heuristics to global optimization criteria (Mester et al. 2015). After separate ordering of all involved datasets, we should estimate the proportion of unique (population-specific) markers and decide whether it is small enough to ignore them and perform consensus analysis only for the shared markers. In such a case, by answering YES, the unique markers will be removed from each dataset. The result of this operation is shown in the table on the next page.


3_12

You should apply the function <ordering> to each set. This step provides information on the map length of each set, with and without unique markers (in cM and as a sum of recombination rates across all intervals, i.e., for all pairs of adjacent markers).
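The second measure of map length, the sum of recombination rates over all pairs of adjacent markers, can be computed directly from a marker order and a table of pair-wise rates. The sketch below assumes the rates are stored in a dictionary keyed by unordered marker pairs; this representation, the function name, and the numbers are hypothetical.

```python
def sum_recombination_rates(order, rec):
    """Sum of recombination rates between adjacent markers in the given order.

    order: list of marker names in map order
    rec:   dict mapping frozenset({a, b}) to the recombination rate of a and b
    """
    return sum(rec[frozenset((a, b))] for a, b in zip(order, order[1:]))


# Invented pair-wise recombination rates for three markers.
rates = {
    frozenset(("m1", "m2")): 0.10,
    frozenset(("m2", "m3")): 0.20,
    frozenset(("m1", "m3")): 0.25,
}
print(sum_recombination_rates(["m1", "m2", "m3"], rates))
print(sum_recombination_rates(["m1", "m3", "m2"], rates))
```

In this invented example the first order gives the smaller sum, which is the sense in which a shorter map is preferred by the ordering criterion.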

Using the option <View list of the sets> we can move from the comparative data

on map lengths to details on each set. Here we can choose between

<all markers> and <shared only>.

Creating derivative datasets without unique markers (continued)


3_13

At this stage (before the consensus analysis)

we can compare two variants of the map for

each data set: with and without unique

markers. For that, we should activate the

button <Comparing two first order>, select the

menu option <View list of the sets detail>

All markers> and choose one of the sets.

In the resulting window, we press button

<Display> and obtain two sets: <First> - with

unique markers, and <First shared only> -

without unique markers. Symbol (Sh) indicates

shared markers.

Creating derivative datasets without unique markers (continued)


3_14

Consensus analysis in the absence of unique markers

By activating the button <shared markers> and pressing the buttons <global analysis> and <Start of process>, we obtain the window reflecting the process of consensus analysis. After a certain delay, we will see in the column <Non-consensus solution> the initial map length of each set, the sum of map lengths of all sets (multiplied by 100000) in the column “Criterion”, and the proxy of map length for each set, calculated as a sum of recombination rates across intervals.

During the consensus analysis, the

values in column <Consensus solution>

become smaller. Simultaneously, the

values «Cost of consensus,%»

(reflecting the proximity of the consensus

map lengths and the initial map lengths)

become smaller. When these differences

do not change anymore, you can stop

the process by pressing the button

<Stop>. This table is saved in the project

folder as a txt file CostofConsensus.txt.


3_15

Results of consensus analysis in the absence of unique markers

The results of consensus analysis are summarized in the following table:

We can compare the

consensus and initial marker

order for each of the

analyzed sets. For that, we

should select the menu

option <Comparing all

result with first>, move to

<detail> and select the

desired set.

In the example we can see

two sets that differ in the

degree of changes of the

consensus order compared

to the initial order.


3_16

A useful function is the comparison of marker positions in the initial and consensus maps. If we select a marker in the table <First shared only>, this marker will be highlighted in bold blue in the <Global shared> table.

Results of consensus analysis in the absence of unique markers (continued)


3_17

Continuation of the “consensus” analysis in the absence of unique markers

Two reasons may justify continuing the optimization process in consensus analysis:

(a) The user may want to continue the process, assuming that the optimal solution has not yet been obtained.

(b) A break in computations has occurred during the analysis. In this case, at the next start of the system, a message will appear: «Previous computation was ended abnormality. Do you want to continue the computation from the last control point?» By answering “YES” you can continue the process. You should select and press the same buttons as described on page 3_14, but instead of the button <Start of process> you press the button <Continue the process>.


Consensus analysis for all markers

Beginning the consensus analysis

We employ a combination of two methods for searching for the consensus solution, local and global. The first one allows fast calculation of a good approximation that can be employed as a starting point for the global analysis. In the global analysis we search for the solution by working simultaneously with all markers of the linkage group. In the local analysis, using the individual (non-consensus) solutions for each mapping population, we first reveal regions of local conflicts separated by non-conflicting regions. The consensus analysis is then applied separately to each conflicting region using our heuristic discrete optimization tools. Due to the relatively small size of such regions (with respect to the number of shared markers), the solution does not take much CPU time. Moreover, when the region is really small, an exact solution is also possible. Combining the solutions for the local conflicts, we obtain a good approximation for the global analysis. However, we can also use the global analysis from the beginning. Still, we advise starting with the local analysis.

To start the analysis, we should choose the method of solution. The global analysis is conducted with all datasets. It can start from the results of the initial analysis conducted before consensus with each data set separately (denoted as “first”), or from the results of the local consensus analysis as the initial approximation. After the type of analysis is chosen, the upper part of the window shows the map length for each set (in cM) as well as the total length in cM and the total sum of recombination rates (Criterion), for the initial (before consensus) maps.

Using the options of the <View list of the set> menu, we can see the main characteristics of each set.

3_18


When the local analysis is chosen, the user obtains a table of pair-wise conflicts in the orders of shared markers. For each pair of sets we see the number of shared markers and, when some are in conflicting order, the number of such conflicts. For example, the symbol “14(С_6)” indicates that for the considered pair of sets, the number of shared markers in the targeted

linkage group is 14 and the number of shared markers in conflict(s) is 6. In the considered example, the number of

conflicting markers for each pair is lower than the total number of shared markers per pair. More complex situations

will be shown in the further examples (see Appendix 1).
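The pair-wise counts in this table can be illustrated with a small sketch. It assumes, purely for illustration, that the number of markers in conflict is the smallest number of shared markers that would have to be set aside so that the remaining ones appear in the same order in both sets (allowing for a flipped map orientation). The package may count conflicts differently; the function names and marker lists below are hypothetical.

```python
from bisect import bisect_left


def longest_increasing_run(ranks):
    """Length of the longest strictly increasing subsequence."""
    tails = []
    for r in ranks:
        i = bisect_left(tails, r)
        if i == len(tails):
            tails.append(r)
        else:
            tails[i] = r
    return len(tails)


def conflict_count(order_a, order_b):
    """Return (number of shared markers, number of markers in conflict).

    Assumption: a marker is 'in conflict' if it must be set aside to make
    the remaining shared markers collinear in the two orders.
    """
    pos_b = {m: i for i, m in enumerate(order_b)}
    shared = [m for m in order_a if m in pos_b]
    ranks = [pos_b[m] for m in shared]
    # A map may be read from either end, so test both orientations.
    collinear = max(longest_increasing_run(ranks),
                    longest_increasing_run([-r for r in ranks]))
    return len(shared), len(shared) - collinear


set_a = ["m1", "m2", "m3", "m4", "m5", "u1"]
set_b = ["m1", "m3", "m2", "m4", "m5", "u2"]
n_shared, n_conflict = conflict_count(set_a, set_b)
print(f"{n_shared}(C_{n_conflict})")
```

Under this definition, a pair of sets whose shared markers are in exactly reversed order is reported as free of conflicts.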

In many cases, the data may include bound together markers. They can

appear in different orders in some sets but this cannot be considered as

a conflict. In the example, markers m21 and m87 (with zero

recombination) can be considered as staying in the same order. The

same would be true if zero recombination is found only in one of these

two sets. In calculating the table of pair-wise conflicts, we do not consider such cases as conflicting orders, thereby reducing the total number of conflicts. The user may, however, prefer not to use this simplification by giving a negative answer to the corresponding system question.


Obviously, the alternative answers will result in different tables of pair-wise conflicts.
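The rule for bound together markers can be sketched as follows. This is a minimal illustration, assuming that the recombination rate between the two markers is known in each set; the function and argument names are hypothetical and are not taken from the package.

```python
def is_conflict(order_a, order_b, x, y, rec_a, rec_b,
                ignore_bound_together=True):
    """Decide whether markers x and y are in conflicting order in two sets.

    rec_a, rec_b: recombination rate between x and y in set A and set B.
    With ignore_bound_together=True, a pair showing zero recombination in
    at least one of the two sets is not counted as a conflict.
    """
    forward_a = order_a.index(x) < order_a.index(y)
    forward_b = order_b.index(x) < order_b.index(y)
    if forward_a == forward_b:
        return False          # same relative order, no conflict
    if ignore_bound_together and (rec_a == 0 or rec_b == 0):
        return False          # order of bound together markers is arbitrary
    return True


set_a = ["m20", "m21", "m87"]
set_b = ["m20", "m87", "m21"]
print(is_conflict(set_a, set_b, "m21", "m87", rec_a=0.0, rec_b=0.0))
print(is_conflict(set_a, set_b, "m21", "m87", rec_a=0.0, rec_b=0.0,
                  ignore_bound_together=False))
```

The flag plays the role of the answer to the system question: switching it off counts every reversed pair, which generally gives a larger table of conflicts.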


Local analysis

3_19


We now consider how to search for the solution by dividing the datasets into regions of local conflicts and then resolving these local conflicts. By double-clicking on the name of one of the sets, e.g. Set3, we obtain a new window that shows the shared markers of the chosen set. To confirm the choice we press <OK>.

If we find a few small neighborhoods with conflicts, we may want to select these regions simultaneously, resulting in a combined conflict region.

In the employed example, the differences between the tables are not big because of the small proportion of bound

together markers. If the opposite is the case, the difference would be much more important and would affect the

performance of the analysis.


Local analysis (continued)

3_20


The system collects all the sets that contain the selected conflicting markers. Clearly, if the selected markers in such a

set are in conflict with some markers in other sets from the same defined region, these other markers are also

considered as a part of the conflict. Thus, all sets are tested for the thereby extended group of conflicting markers.

Consequently, the extension involves not only the conflicting markers but also the corresponding sets of populations.

Then, for each set and its local group of conflicting markers, the system finds the first and last markers in the group

and analyzes all markers above the first and below the last conflicting markers, until the next conflict is encountered

on one or both sides or until the end of the linkage group is reached. To conduct local consensus analysis, the

group of conflicting markers is surrounded by a minimal number of shared non-conflicting markers. Red color here

highlights the shared markers comprising the conflict region surrounded by non-conflicting border markers (the whole

group is denoted by left red bracket) for each of the analyzed sets, whereas non-conflicting shared markers (Sh_*) are

shown in usual color. The list of sets included in the local consensus analysis of the region in question is provided as

well as the list of shared markers in this region.


3_21


When selecting an interval of conflicting markers, the number of chosen markers can be increased by using the menu

option <Include inside shared markers>.

The importance of this option can be seen from the following example.

Marker <Sh_ mar19> residing within the group of conflicting markers

shows no conflicts with these markers in any of the sets (hence

presented in black font). Without using this option, its order will not be

controlled, resulting in a possibility of conflicts.

If we do use this option, marker <Sh_ mar19> will be included in the

set of conflicting markers (highlighted in red) and after consensus

ordering will appear in shared order.
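The effect of this option can be sketched in a few lines. The example is a simplified illustration only: it takes the shared markers of one set in map order, finds the span between the first and last conflicting markers, adds one non-conflicting border marker on each side, and optionally includes the non-conflicting shared markers lying inside the span. The function name, the `n_border` parameter and the marker names (except Sh_mar19) are hypothetical; the region extension performed by the package across several sets is more involved.

```python
def local_region(shared_order, conflicting, include_inside_shared=False,
                 n_border=1):
    """Return (border markers, markers whose order is controlled)."""
    idx = [i for i, m in enumerate(shared_order) if m in conflicting]
    if not idx:
        return [], []
    first, last = idx[0], idx[-1]
    start = max(0, first - n_border)
    end = min(len(shared_order) - 1, last + n_border)
    borders = shared_order[start:first] + shared_order[last + 1:end + 1]
    inside = shared_order[first:last + 1]
    # Without the option, only the conflicting markers are re-ordered.
    controlled = [m for m in inside
                  if include_inside_shared or m in conflicting]
    return borders, controlled


order = ["Sh_m1", "Sh_m2", "Sh_m3", "Sh_mar19", "Sh_m5", "Sh_m6", "Sh_m7"]
conflicts = {"Sh_m3", "Sh_m5"}
print(local_region(order, conflicts))
print(local_region(order, conflicts, include_inside_shared=True))
```

In the second call the internal marker Sh_mar19 joins the controlled group, so its position is fixed by the consensus ordering together with its neighbors.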


3_22



We continue to demonstrate the analysis using the example from p. 3_12. As a rule, the regions selected by default (marked by red brackets) are well suited for searching for a local solution. However, we can extend these regions by using the menu option <Change the list of chosen set> <Change the selected part of the chosen set>. Then we choose the set to be changed and, after selecting the upper and lower markers in the list of its markers, press <OK>.

Several conditions should be taken into account. If a shared non-conflict marker is a border marker (included in the

bracketed group), it cannot be an internal marker of the targeted region in any of the sets, and vice versa: if it is

internal for any of such regions, the same should be correct for all other sets. The internal part cannot be bordered

by conflict markers from both sides. Thus, the region should be extended in such a way that at least one of the

border conflicts becomes internal. This may cause changes in all involved sets. The extension may not always be

possible, e.g., if in one of the sets a marker that should be internal is on the upper or bottom border. In such a case

a corresponding message appears and the extension is cancelled.


3_23


By closing the window <Consensus> we get the system’s inquiry whether we are ready to move to the process of

consensus analysis. The answer <No> makes sense if in one of the sets the selected part includes a high number

of markers or is bordered from both sides by conflicting markers. In such a case we may want to return to

<Consensus> window in order to try defining conflicting regions starting from another set. If we answer <Yes>, a

warning message may appear about the long time that will be needed if we employ the exact solution method. For such cases we recommend using the heuristic method of local search. The user may choose one of the two calculation methods in response to a special system request. Our experience shows practically identical results; you may choose the exact method, but the heuristic method works faster. If you get tired of waiting for the result, you may close the ProgressBar window and return to the stage of selection of conflicting intervals, or just change the calculation method (i.e. move to the heuristic method).

After returning to <Consensus> window we should press button <Start of consensus process>. The process of

exact local solution (testing all possible local orders) will take a relatively short time resulting in a line displaying

the results. In the first position of this line we see the number of the trial (1, 2, etc.). Symbol «_СЕ» indicates that

the method of exact solution was employed whereas «_Н» will indicate that heuristic method of local search was

employed. For each set, its total map length after applying the consensus analysis is shown (in cM), together with the number of markers (in brackets) in the region selected for local analysis. The item Time indicates computation

time, in sec (for the heuristic method – the time allocated for the solution), and item Criterion – the reached value

of the optimization criterion (sum of the recombination rates along the treated chromosome multiplied by 100000).

The analysis can be repeated, e.g., for larger parts of the chromosome or for the same part but using the heuristic

method. For that, we should press button <Return to the work with this part>. After the new round of analysis, we’ll

get a new line of results with a new trial number.
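The idea of the exact local solution (testing all possible local orders) and of the Criterion value can be sketched as follows. This is a simplified illustration, not the package's implementation: it assumes that pair-wise recombination rates are available for every set, keeps the two border markers fixed, tries every permutation of the internal markers, and scores a candidate order by the sum over sets of the recombination rates between adjacent markers. The helper `rec_from_positions`, the marker names and the toy positions are hypothetical.

```python
from itertools import permutations


def rec_from_positions(pos, rate_per_unit=0.01):
    """Toy helper: pair-wise recombination rates from map positions."""
    return {frozenset((a, b)): abs(pos[a] - pos[b]) * rate_per_unit
            for a in pos for b in pos if a != b}


def map_length(order, rec):
    """Sum of recombination rates between adjacent markers."""
    return sum(rec[frozenset(pair)] for pair in zip(order, order[1:]))


def exact_consensus(left, inner, right, rec_by_set, scale=100000):
    """Try all orders of the inner markers between two fixed borders."""
    best_value, best_order = None, None
    for perm in permutations(inner):
        order = [left, *perm, right]
        value = sum(map_length(order, rec) for rec in rec_by_set)
        if best_value is None or value < best_value:
            best_value, best_order = value, order
    return best_order, round(best_value * scale)


# Two sets that disagree on the order of markers B and C.
set1 = rec_from_positions({"A": 0.0, "B": 1.0, "C": 2.0, "D": 3.0})
set2 = rec_from_positions({"A": 0.0, "C": 1.0, "B": 1.2, "D": 3.0})
order, criterion = exact_consensus("A", ["C", "B"], "D", [set1, set2])
print(order, criterion)
```

The number of permutations grows factorially with the number of internal markers, which is why the exact method is practical only for small conflict regions and the heuristic method is recommended otherwise.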


3_24



After selecting the heuristic method, we obtain a window representing the optimization process, and can press the

button <Start>.

In our example, we show a situation when for each local group of conflicting markers the solution is searched by

heuristic method (rather than using the exact local solution by testing all possible local orders). This approach is

referred to as “New Full frame algorithm (FF)” (see p. 3_2-3_3). First, the algorithm orders each set separately; the

results are displayed in column “Non-Synchronized Solution” (the length of each solution and the total length are

shown). Once this stage is over, the corresponding field in the right upper corner of the window is highlighted in green.

The next stage, called “Skeleton”, is to find the best order of shared markers. The last stage, called “Consensus”, is to find the optimal consensus solution for shared markers upon the inclusion of non-shared markers. The obtained results are reflected in the column “FF synchronized solution”. The stage “Consensus” is finished after pressing the button <Stop>.

The order of shared markers.

The length of each solution and the total length are the sums of the rates of recombination between adjacent markers along the maps, multiplied by 10,000.

3_25


Each such trial can be chosen for the further steps of analysis by <double click> on the corresponding trial name.

After this choice, the first line of the table will change and the table of pair-wise conflicts will be re-calculated for

all sets. The order of markers in all of the sets will change as a result of consensus analysis.

Working consecutively with each set, i.e., resolving the conflicts in the defined regions by local consensus

analysis, we will reach the situation when the table of shared markers will be free of conflicts.


3_26


The system will also check for conflicts of triples, in addition to pairwise conflicts. If such triple conflicts are

detected, the user will be asked whether he/she would like to resolve these conflicts. If the answer is <Yes>, the

conflicting sets will be shown. In the employed example, conflicts of shared markers were found in sets Set1,

Set7, and Set8, while Set2 was added because it also includes conflicting markers (but in the same order as in

Set1). Consensus analysis should be conducted for the shown sets, as described above.

After finishing the analysis, the system will again check for conflicts, and if no conflicts are found a message

<All right> appears.


3_27


A drawback of local analysis is that the resulting solution orders are combined from several well-ordered pieces. We will show here two examples of

application of global analysis. By choosing global analysis we get the system’s

question whether we want to use the previously obtained local solution as a

starting point for global analysis. Answer <No> gives us the possibility to start

directly the global analysis by pressing button <Start of process>. Global analysis

employs the heuristic method of optimization. In contrast to the window on p. 3_17,

here we show 47 shared markers. Their order changes at the “Skeleton” stage.

After the user stops the “Consensus” stage, the program conducts one more

iteration of consensus analysis and moves to the stage “Recalculating the sets”.

After this stage, the process is finished and the window is closed automatically.

Global analysis

3_28


Another possibility to utilize global analysis is to use it after local analysis,

with a hope to further improve the solution. This time, we choose again

<global analysis> and answer <Yes> to the question whether we want to

use the local solution as a starting point for global analysis.

A window of the process for employing of the heuristic method appears, and we can press the <Start> button.

Global analysis (continued)

3_29


The obtained result shows that the solution has indeed improved, but only very slightly.

The considered variant of the analysis employs as a starting point the order of shared markers obtained by using

the local analysis. Therefore, the stage “Skeleton” is skipped, and instead of the “FF synchronized solution”

column, the column “SCF synchronized solution” is filled in by the values from the local solution. The process of

searching the solution is started here from the stage “Optimization by Global Criterion”.

The global analysis can be continued (with a hope that the solution can be further improved). For that we should

switch on <Global analysis>, answer <No> to the system’s question and press the button <Continue of the

process>.

The window of the process for the heuristic method here is the same as the one shown on p. 3_21. The obtained

solution will replace the solution shown in the line “First global”.


3_30


Consider the second example. It includes 16 sets, but in the figure we show only 12. The local solution looks as follows:

The obtained global solution is obviously worse than the local one.

But when the global analysis was started with the local solution, it considerably improved the result.


3_31


Now for each set we have the results of initial (set-specific) ordering and of several consensus ordering analyses. If

we employ the menu option <View list of the sets> <in detail>, we will see detailed characteristics of the

solutions for each set. Simultaneously, a window will appear that helps to see the correspondence of marker order

in each set with any of the obtained variants of consensus solutions.

Furthermore, by choosing in this window the radio button <Comparison> and one of the sets, e.g., Set3, we get

a new window, that represents all results of individual and consensus ordering of this set.

In this window we can choose one of the employed variants of solution and display the corresponding results

for each set obtained using the chosen method.

Reviewing the results of consensus analysis

3_32


Reviewing the results of consensus analysis (continued)

To conclude the request, the button <Display> in this window should be pressed.

3_33


Displaying the results, integral map

Two types of displaying the results of consensus analysis are currently

available in the package: the compromised map order for each set and integral

map. For the integral presentation we suggest 3 types of outputs: (1) a text file

for all shared markers; (2) a window with graphs of all sets, and (3) a txt file that

serves as input for drawing all ordered shared markers. In any case, we should

indicate which variant of the employed solutions we want to output (Local, First

global, local global).

Menu option <Integral map>: it allows output of shared markers ordered during

consensus analysis, to a text file IntegralMap. The file name also provides info

about the type of the conducted consensus analysis, e.g., IntegralMap ForLocal.

This file contains the names of shared markers with lists of sets where this

marker appears. In {…} brackets we show markers with uncertain order. Thus,

in the employed example, markers *m7 *m59 *m58 can be put in another order

without changing the optimization criterion. Brackets […] include a marker that is

absolutely linked with its next neighbor. In our example, this is marker *m86 that

shows no recombination with marker *m56.

The menu option <Integral picture> allows markers to be output to a special visualization file IntegralMapForPicture_ForLocal.viz. Using the publicly available program Graphviz (http://www.graphviz.org), one can get a graphical presentation of the

integral map, as shown on the next page. This graph does not include bound

together (i.e., absolutely linked) markers. Depending on user’s choice, the file may

include only shared markers or shared plus unique (i.e., set specific) markers. In

the last case, the file name includes sub-name (Unique). All saved files are stored

in a special sub-folder that includes data of the project - the folder <ResultFiles>.

3_34


Unfortunately, the program Graphviz (http://www.graphviz.org) imposes restrictions on the marker names: a name cannot begin with a digit (0,1,…,9) and cannot include some special symbols (*, _, /, etc.). In the integral map, shared

and unique markers are highlighted in brown and grey colors, respectively.
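If marker names violate these restrictions, they can be cleaned before the data are loaded. The sketch below is one possible way to do this, following only the restrictions stated above; the function name and the prefix "m" are hypothetical choices and are not part of the package.

```python
import re


def graphviz_safe_name(name, prefix="m"):
    """Drop every character that is not a letter or digit and make sure
    the result does not start with a digit."""
    cleaned = re.sub(r"[^A-Za-z0-9]", "", name)
    if not cleaned or cleaned[0].isdigit():
        cleaned = prefix + cleaned
    return cleaned


for original in ["*m86", "7_abc/1", "Sh_mar19"]:
    print(original, "->", graphviz_safe_name(original))
```

Different original names may collapse to the same cleaned name (for example, a_1 and a1), so it is worth checking the cleaned list for duplicates before using it.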

Displaying the results, integral map (continued)

3_35


By choosing the menu option <All maps> we get a window with maps for all sets, for the selected variant of the

solution procedure. To see the maps, we should press <Display> button.

Scrolling allows all sets to be seen (if their number is >6).

3_36

Displaying the results, integral map (continued)


To obtain a graphical output of ordered markers for each data set, we need to switch the list of sets to the state <View list of the sets> <in detail>, and then to select the desired variant out of the conducted consensus analyses and the data set of interest.

The menu option <Results for chosen set> has two sub-options for output: to a text file and to an EXCEL file. In turn, output to EXCEL may be in two forms: as a map of markers and as a table of graphical genotypes, exactly as in the previously described options (see p. 1_74-79 and 2_12). The map output is shown in the figure.

Displaying the results for each set

3_37


An output EXCEL file of genotypes for the chosen set looks the same as in the usual mapping analysis (see also

p. 1_80-82).

Displaying the results for each set (continued)

3_38


All markers of the chosen set will be saved in text files of the folder containing the initial data of the project, namely,

in its special sub-folder <ResultFiles>.

To use the menu option <Result for every set><Output to text file>, it is necessary to choose one of the variants of the employed consensus analysis and the needed set.

3_39

After saving is finished, this sub-folder may contain one, two, or three text files: the file with the name *_Sk.txt includes only skeleton markers, the file *_Sk&Ext.txt includes skeleton and bound together markers, whereas the file *_Glob.txt contains all markers including attached ones. Thus, choosing this option generates a file containing skeleton markers; the two other types of files appear only if the solution for the chosen set includes bound together and (or) attached markers.

Displaying the results for each set (continued)


Saving the intermediate results and continuing the analysis

During the analysis, the user may need to save various results, in order to have a flexibility of comparing the

efficiency of different scenarios of consensus mapping. For that, the menu option <Save all change> is employed.

We recommend using this option after the data input, after initial (individual) ordering of the data sets, after finishing the <Global analysis> process and after each of its continuations. You may also want to save the results during some steps of <Local analysis>.

To continue the work, you should choose the option <Consensus>-><Open saved file> of the main menu and

select the step from which you want to continue the process; for example, S1 as one of the four saved steps (see

also p. 1_68-1_70).

Usually, it makes more sense to continue the analysis starting from the last saved step. But you may also want to start the consensus analysis from the beginning, with one or a few of the data sets corrected. As in the usual multilocus mapping, repeated analysis from different saved states will result in a tree of steps. If needed, some of the saved states can be deleted using the <Clear saved file> option.

3_40


3_41

Removing and adding sets in the process of consensus analysis

It may be necessary to remove or add sets during consensus mapping. The process of consensus analysis will have to be repeated, but some sets can be kept in the form in which they were before the analysis. To do this, open the previously saved results of the treatment; it is necessary to choose the step saved before you started attaching markers. The removal and adding functions are activated by the relevant menu options.

When either of these options is chosen, an automatic additional save of the opened set is conducted. In the folder carrying the selected set, a new sub-folder is created: PartSet1 in case of removing a set and ExtendSet1 in case of adding a set. These options can be used several times for the same open set, with corresponding changes in the sub-folder names. For example, if these functions are used for folder PartSet1, a sub-folder PartSet1_2 is created, etc.

Using any of these functions will change the array of shared markers, so all shared markers previously moved to <Heap> and <heapDelegate> will be returned to their sets.

Delete set(s)

Before choosing the menu option, you should select in the list of files the sets you want to delete, then answer <Yes> to the message to confirm the deletion. After the sets are removed, a situation may arise in which the number of shared markers in one of the remaining sets is less than 3. You then receive a message, and such sets are deleted automatically. The remaining sets are searched for markers that were shared in the initial data but became unique after removal of a data set. For the new combination of sets for consensus analysis, the program selects the bound together markers and moves these markers to the <Heap> and <heapDelegate>. The saved sets are displayed in the Consensus window; sets with changed marker content are indicated by the sign !!. This can be a set in which some shared markers have become unique, or a set in which some additional markers have become shared (in the case when these markers were sticky with markers of the deleted set).



3_42

Removing and adding sets in the process of consensus analysis (continued)

In this example we have 8 initial sets.

Set6 (1B_dataGG_3_x) and Set8 (1B_data_GG_10_x) have been deleted. This caused also a need to delete

Set5 (chr1_MC2_1B_tub) because the number of markers that it shared with any other remaining set has become

less than 3. Thus, the result is 5 sets for new consensus analysis.
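The cascade of automatic deletions can be sketched as follows. This is a minimal illustration, assuming that each set is given simply as a collection of marker names and that a marker is "shared" when it occurs in at least one other remaining set; the function name, the parameter `min_shared` and the toy marker names are hypothetical.

```python
def prune_sets(marker_sets, to_delete, min_shared=3):
    """Remove the chosen sets, then repeatedly drop every remaining set
    that shares fewer than min_shared markers with the other sets."""
    remaining = {name: set(markers)
                 for name, markers in marker_sets.items()
                 if name not in to_delete}
    changed = True
    while changed:
        changed = False
        for name in list(remaining):
            others = set().union(*(m for n, m in remaining.items()
                                   if n != name))
            if len(remaining[name] & others) < min_shared:
                del remaining[name]   # too few shared markers left
                changed = True
    return remaining


sets = {
    "Set1": {"a", "b", "c", "d"},
    "Set2": {"a", "b", "c", "e"},
    "Set3": {"e", "f", "g", "h"},
    "Set4": {"f", "g", "h", "a"},
}
print(sorted(prune_sets(sets, to_delete={"Set4"})))
```

In this toy data, removing Set4 leaves Set3 with a single shared marker, so Set3 is dropped as well, much as Set5 is dropped in the example above. The loop is repeated because each automatic deletion can reduce the number of shared markers in the sets that remain.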

The marked sets should be opened and treated using function <Ordering>. The

markers converted from shared to unique ones are also marked with sign !!.

During the analysis, they can be easily seen and, if needed, removed. After the <Ordering> step, this sign disappears from the markers and from the sets.


3_43

Removing and adding sets in the process of consensus analysis (continued)

Add set(s)

When you select this option, a window for the input of sets appears in the Consensus window. After the input we should press the button <End input>. When you add new sets, the number of shared markers may increase, and some “unique” markers that were removed earlier to Heap may become “shared”. If such a marker is found in one of the old sets, all markers of this set will be returned from <Heap> and <heapDelegate> to the analysis. Before the new combination of sets is subjected to consensus analysis, the program selects the bound together markers and moves these markers to the <Heap> and <heapDelegate>. The new enlarged combination of sets is displayed; the sets with markers returned from <Heap> and <heapDelegate> are marked by !!. After opening such sets you should treat them using the functions Control of bound together markers and Ordering. Obviously, the “cleaning” operation for these sets should be conducted from the beginning.


Reorganization of maps for set pairs with conflicting orders

The bottom part of the window includes a table of shared markers, with the number of shared markers for each pair

and number of conflicts. The record «С_10» means that for the considered pair of sets the number of conflicts is 10.

For problems with a large number of conflict markers we strongly suggest first reducing the number of conflicts by reorganizing the sets. For that, we should analyze each pair of sets having a large number of conflicts. In order to obtain the info about conflicting markers for a pair of sets we should first select a line and then a column of the table. Thus, for Set2 (line) × Set3 (column), the number of conflicts is 10. We can display graphically the situation with shared markers for this pair of sets. In the figure presented on the next page we can see: the names of the corresponding files, the names of shared markers of the chosen chromosomes and their map positions, as well as the distance between the markers and the number of set-specific (“unique”) markers in each map interval. Most important for this stage of analysis is that conflicting markers are indicated.

Appendix

3_44


To reduce the number of conflicts we can use the menu option

<Reorganization of the chromosome>. Two options are available for that:

inversion of the group of markers in one of the sets and transposition of a

group of markers to a selected interval.
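The two operations can be illustrated on a plain list of marker names. This is only a sketch of what inversion and transposition do to a marker order; the function names and the example orders are hypothetical, and the package additionally recalculates the map after each change.

```python
def invert_segment(order, first, last):
    """Reverse the part of the order between two markers (inclusive)."""
    i, j = sorted((order.index(first), order.index(last)))
    return order[:i] + order[i:j + 1][::-1] + order[j + 1:]


def move_group(order, group, after):
    """Move a group of markers so that it follows the marker 'after',
    keeping the internal order of the group."""
    moved = [m for m in order if m in group]
    rest = [m for m in order if m not in group]
    k = rest.index(after) + 1
    return rest[:k] + moved + rest[k:]


print(invert_segment(["mar1", "mar9", "mar5", "mar2", "mar3"],
                     "mar9", "mar2"))
print(move_group(["mar12", "mar13", "mar14", "mar15", "mar16"],
                 {"mar13", "mar14"}, after="mar16"))
```

Both functions return a new list and leave the input unchanged, so going back to the previous order, as the <UnDo> option does, only requires keeping the earlier list.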

We now show how the reorganization of the sets can be conducted. We start from the menu option <Move the selected part> and apply it to the set Set3. As a result, a list of all markers of this set together with its picture will appear: the picture on the previous page shows that two markers (mar13, mar14) are good candidates for moving down in the selected part (highlighted in red) of Set2. Press now <OK>. As a result, these markers are moved to the chosen region and a part of the conflict is resolved. If some step in the analysis proves unsuccessful, you can employ the menu option <UnDo> and try another variant of transposition.

Reorganization of maps for set pairs with conflicting orders (continued)

3_45


The transposition resolved only a part of the conflict. The remaining part can be referred to as a “propeller”. If the propeller includes an entire segment, it can be inverted. In our example (p. 3_33), we select Set2 and try to employ the menu option <Inversion the selected part>.

We again will obtain the list of all markers and a

picture of the set.

We select now the part of the list that we want to invert. In this example it is the part from marker *mar9 to marker *mar2. Based on the info from the

picture about the interval length, we can extend the

target segment by including the adjacent unique

markers. Press button <OK> to start. As a result, we

reduced the number of conflicts to two. If we are not

happy with the result, we can employ menu option

<UnDo> and then try several other inversion

variants. If the result is acceptable, the initial sets

should be replaced by the new ones. For that we

press the button <Save transformed data> and

answer <Yes> to the questions asked by the system

when we try to close the window.

The described treatment of individual pair-wise

conflicts can considerably reduce the total number of

conflicts in the sets before we apply global analysis.

Reorganization of maps for set pairs with conflicting orders (continued)

3_46


References


A list of our relevant publications was provided on p. 1_8. Here we provide references to other papers cited in the

Tutorial.


coefficient of coincidence. Theor Appl Genet 104: 786–796.




natural populations. Genetics 121: 174-181.



Research Technical Report Third Edition (Beta Distribution 3B).




linkage analysis of complex diseases? Hum Genet 114: 588-593.







3_47


MultiPoint Tutorial

Part 4 - Building ultra-dense genetic maps in the presence of genotyping errors and missing data

Table of Contents

Introduction   4_2

Input data   4_3

Analysis of missing and segregation   4_4

Window “Creation of global parameters”   4_6

Clustering   4_10

Treatment of a separate LG   4_12

Treatment of a set of LGs   4_19

Option <Save all clusters>   4_24

Output of the final results   4_22

References   4_25

4_1


Introduction

4_2

Recent advances of genomic technologies have opened unprecedented possibilities of relatively inexpensive genotyping at

genome-wide scale generating a large number of SNP markers. It would seem that there is now everything needed to build

high quality ultra-dense genetic maps. This should be the case if genotyping is error free and the number of markers per

chromosome is of the same order of magnitude as the population size. With a very large number of markers available for a mapping population, most of the markers on a genetic map will remain inseparable by recombination and will represent groups of tightly linked loci. In such a case, only one representative per group could be placed on the (skeleton) map; all of the remaining markers can then be attached to the skeleton. The real situation is significantly complicated by technology-associated genotyping errors, which “diversify” a certain part of markers that would be identical in an ideal situation of no errors. The higher the error rate and the ratio of the number of markers to population size, the more difficult is the problem of

building a reliable map. The situation is further complicated by missing data that is usual in genotyping-by-sequencing (GBS)

approach and cannot be compensated by imputation of missing scores, especially for RIL populations.

The sub-package MultiPoint-ultradense suggests a method of addressing these problems that is based on a simple probabilistic

estimation of the proportion of identical markers, as a function of the error level when the errors are rare, and of the radius of

“diversified” markers when the error level is increased (Ronin et al. 2015, 2017). Let, for example, sample size be N = 100 and

the probability of genotyping error p = 0.01 per marker. Then the probability that in all individuals both alleles of the marker m will be unmistakably identified is $P = (1-p)^N = (1-0.01)^{100} \approx e^{-1} \approx 0.37$. This means that, assuming a 1% error rate within a group of absolutely linked markers, about a third will still remain error-free. Thus, for building the skeleton map one can select error-free

markers based on the presence of their “twins” in the sample. However, there is also non-zero probability of an opposite effect,

i.e., when non-identical markers become “twins” because of genotyping errors. Therefore, a certain threshold is introduced in

our algorithm for the selection of markers with a sufficient number of absolutely linked copies (Ronin et al. 2015). With higher

level of errors, the proportion of twin markers may become negligible: the genotyping errors lead to dissipation of the twin

groups, so that the resulting marker agglomerations are “blurred” around the positions of the (unobservable because of errors)

initial points corresponding to error-free situation. Therefore, with higher level of errors we employ an additional marker filtration.

Namely, after the twin groups exceeding a pre-set threshold size ts0 are selected as candidates for the skeletal map, we conduct

clustering of the remaining markers by a procedure similar to k-means algorithm. Then, representative markers of clusters are

added to the set of selected candidate markers for building the skeletal map (Ronin et al. 2017). The developed approach

allows for mapping big sets of markers ($\sim 10^5$ to $10^6$), i.e., it is suitable for mapping data generated by the GBS approach.
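The probability estimate and the selection of twin groups can be illustrated with a short sketch. It is not the algorithm of Ronin et al.; it only computes the error-free probability (1-p)^N and groups markers with identical genotype vectors, keeping groups of at least a threshold size. The function names, the genotype coding and the threshold value are hypothetical.

```python
import math
from collections import defaultdict


def error_free_probability(n_individuals, error_rate):
    """Probability that a marker is scored without error in all individuals."""
    return (1.0 - error_rate) ** n_individuals


def twin_groups(genotypes, min_size):
    """Group markers with identical genotype vectors ('twins') and keep
    the groups that contain at least min_size markers."""
    groups = defaultdict(list)
    for marker, scores in genotypes.items():
        groups[tuple(scores)].append(marker)
    return [members for members in groups.values() if len(members) >= min_size]


print(error_free_probability(100, 0.01), math.exp(-1))

toy = {"m1": "AABH", "m2": "AABH", "m3": "AABB", "m4": "AABH"}
print(twin_groups(toy, min_size=3))
```

A larger threshold lowers the chance of accepting a group that became identical only because of genotyping errors, at the price of leaving fewer candidate markers for the skeleton map.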

---------------------------------------------------------------------

Ronin et al. 2015 Building ultra-dense genetic maps in the presence of genotyping errors and missing data, pp. 127-133 in Proc. the 12th

Intern. Wheat Genetics Symp., edited by Y. Matsuoka and S. Takumi. Springer, Yokohama, Japan.

Ronin et al. 2017 A new approach for building ultra-high density linkage maps based on efficient filtering of trustable markers. Genetics.


Input data

We have two variants of data input: input followed by clustering the markers

into linkage groups (LGs), and input of one LG. In the first, we use option

Open->Population file, in the second we use Open->Input of one LG only. In

both cases, we get the input window. The issues related to input are described

in detail in section 1. As before, the button <Select data file for input> is used

to select the data file and the button <Input Data> to load the data.

It is noteworthy that mapping data, especially those generated via genotyping-by-sequencing at relatively low coverage, may have a high level of missing data and massive segregation distortion (hence high χ² for deviations from the expected ratio). Thus, we suggest filtering the data during input. If the level of missing data is very high, the first filtering step is for missing level. In such a situation, the user gets a warning message:

4_3

After this message, a window to set up the

parameters of filtering is opened.
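The two filtering criteria can be illustrated with a minimal Python sketch. It is not the MultiPoint implementation: the missing-data symbol "-", the function names, the threshold values, and the 1:2:1 expectation for co-dominant F2 markers scored as A/H/B are assumptions made for the illustration.

```python
from collections import Counter

def missing_fraction(calls, missing="-"):
    """Proportion of genotypes with no call (the '-' symbol is an assumption)."""
    return calls.count(missing) / len(calls)

def chi2_segregation(calls, expected, missing="-"):
    """Chi-square statistic for deviation from an expected segregation ratio.

    `expected` maps genotype codes to ratio parts, e.g. {"A": 1, "H": 2, "B": 1}.
    Missing calls are ignored.
    """
    observed = Counter(c for c in calls if c != missing)
    n = sum(observed[g] for g in expected)
    if n == 0:
        return 0.0
    total = sum(expected.values())
    chi2 = 0.0
    for genotype, part in expected.items():
        exp_count = n * part / total
        chi2 += (observed[genotype] - exp_count) ** 2 / exp_count
    return chi2

def passes_filter(calls, expected, max_missing, max_chi2):
    """Keep a marker only if both thresholds (hypothetical values) are met."""
    return (missing_fraction(calls) <= max_missing
            and chi2_segregation(calls, expected) <= max_chi2)

f2_ratio = {"A": 1, "H": 2, "B": 1}
distorted = "A" * 40 + "H" * 40 + "B" * 20
print(chi2_segregation(distorted, f2_ratio))        # 12.0
print(passes_filter(distorted, f2_ratio, 0.2, 6.0))  # False
```

For dominant markers the same function could be called with a two-class expectation such as {"A": 3, "B": 1}, again only as an assumed coding.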


Analysis of missing and segregation

4_4


Joint ULD analysis of co-dominant and dominant markers in F2 populations

Upon data input, filtering markers for missing and segregation is performed separately for co-dominant markers and each of the

two types of dominant markers. In the upper part of the window, choose the button that defines the marker type currently selected for filtering.

Then, for the selected group, set the threshold values of the filtering parameters for missing data and segregation distortion (chi2), save, and move to the next group.

After finishing filtering for all three groups, close the window. The system then generates 3 sub-projects named

_Cod3, _Dom4, and _Dom5, and for each of these sub-projects the selected markers are saved in its folder ‘Data’.

Analysis of missing and segregation (continued)

4_5


Function “bound together markers”

We first describe the parameters that should be defined for this function. Parameters <Part of name…> and <Coefficients of priority> are described in detail in Part 1 (page 1_33). In the considered examples, missing data is a more important complicating factor than segregation distortion, hence we use by default the coefficients 0.9 and 0.1. The markers are selected in accordance with the priority defined by these coefficients, and each pair of markers is compared for identity across all genotypes (excluding those with missing data for the considered markers). If the number of identical scores does not exceed <Min. number of genotypes for two markers>, the markers are considered unlinked. A representative marker of a group of bound together markers (twin group) is included in further analysis if the number of markers in the group is no less than the preset parameter <Min. size of bound together group> (ts0 in Ronin et al. 2017). The default value of this parameter can be changed by the user.

The process of marker selection is started by pressing the <Bound> button. In each group, the pair of markers with the maximal number of identical scores is selected; within the pair, the marker with minimal missing data is taken as the skeleton marker, the representative of the whole twin group (Ronin et al. 2015). With the default parameters in the example, we get 717 twin groups, hence 717 candidate skeleton markers; in total, the groups include 1660 markers. The remaining markers are saved in Heap, which serves as a source of markers that can be tried to fill in the gaps of the ordered LG.
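The selection logic described above can be sketched in Python as follows. This is a simplified illustration, not the program's code: the "-" symbol for missing data, the greedy comparison with the first member of each group, the use of dictionary order as the priority order, and the choice of the representative as the group member with the fewest missing calls are all assumptions.

```python
def identical_scores(a, b, missing="-"):
    """Return (identical, compared) over genotypes scored in both markers."""
    pairs = [(x, y) for x, y in zip(a, b) if x != missing and y != missing]
    return sum(1 for x, y in pairs if x == y), len(pairs)

def are_twins(a, b, min_genotypes, missing="-"):
    """Twins: no conflicting score, and more than `min_genotypes` identical scores."""
    same, compared = identical_scores(a, b, missing)
    return same == compared and same > min_genotypes

def twin_groups(markers, min_genotypes, ts0):
    """Group markers (dict name -> calls, in priority order) into twin groups.

    Returns the groups of size >= ts0 and the list of remaining markers (Heap).
    """
    groups = []
    for name, calls in markers.items():
        for group in groups:
            if are_twins(markers[group[0]], calls, min_genotypes):
                group.append(name)
                break
        else:
            groups.append([name])
    selected = [g for g in groups if len(g) >= ts0]
    heap = [name for g in groups if len(g) < ts0 for name in g]
    return selected, heap

def representative(group, markers, missing="-"):
    """Simplification: the group member with the fewest missing calls."""
    return min(group, key=lambda name: markers[name].count(missing))

markers = {"m1": "AHBAHB", "m2": "AHBAHB", "m3": "AHB-HB", "m4": "BBAHHA"}
selected, heap = twin_groups(markers, min_genotypes=3, ts0=2)
print(selected, heap)                       # [['m1', 'm2', 'm3']] ['m4']
print(representative(selected[0], markers))  # m1
```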

Window “Creation of global parameters”

By answering <Yes> you save the results; if the number of selected markers is too small, answer <No>. It may even happen that no twin groups are detected that fit a stringent chosen threshold ts0.

4_6


The answer <No> implies that skeletal markers will be recruited only from representatives of twin groups of size ≥ ts0; by pressing the <First clustering> button that then appears, you start clustering the selected candidate skeletal markers into LGs. Alternatively, you may choose <Yes> to increase the number of skeletal markers by clustering the remaining markers into kernels of a preset radius (min. rf) using a procedure similar to k-means. You may change the radius depending on population size, data quality, etc. (see Ronin et al. 2017). For a large number of markers the process may take considerable time. At the end, a table appears showing the number of clusters of each size. The user should decide on the minimum size of the clusters to be used as a source of additional skeletal markers.
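To give an idea of what clustering into kernels of a preset radius may look like, here is a rough Python sketch. It is a simple greedy "leader" clustering, not the k-means-like procedure of Ronin et al. 2017, and it uses the proportion of mismatching calls as a crude stand-in for the recombination rate; both choices, as well as the "-" symbol for missing data, are assumptions of this illustration.

```python
from collections import Counter

def mismatch_rate(a, b, missing="-"):
    """Share of differing calls among genotypes scored in both markers."""
    pairs = [(x, y) for x, y in zip(a, b) if x != missing and y != missing]
    if not pairs:
        return 1.0
    return sum(1 for x, y in pairs if x != y) / len(pairs)

def kernels(markers, radius):
    """Assign each marker to the nearest kernel center within `radius`.

    A marker that is farther than `radius` from every center starts a new kernel.
    """
    members = {}  # center name -> list of marker names
    for name, calls in markers.items():
        best, best_d = None, radius
        for center in members:
            d = mismatch_rate(markers[center], calls)
            if d <= best_d:
                best, best_d = center, d
        if best is None:
            members[name] = [name]
        else:
            members[best].append(name)
    return members

def size_table(members):
    """Number of kernels for each kernel size (as in the table shown by the program)."""
    return dict(Counter(len(group) for group in members.values()))

markers = {
    "a": "AAAAAAAAAA", "b": "AAAAAAAAAB",
    "c": "BBBBBBBBBB", "d": "BBBBBBBBBA",
    "e": "ABABABABAB",
}
result = kernels(markers, radius=0.15)
print(result)              # {'a': ['a', 'b'], 'c': ['c', 'd'], 'e': ['e']}
print(size_table(result))  # {2: 2, 1: 1}
```

A larger radius merges more markers into each kernel, which is why the choice should depend on population size and data quality.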

In any case, the window that appears provides two radio buttons: <Reiteration> repeats the process with another value of the parameter ts0 (min. size), while <Continue> leads to the following question:

Window “Creation of global parameters” (continued)

4_7


Again 2 buttons appear: <Reiteration> enables repeating the process with

a changed value of “min. rf group” while selecting <Continue> starts the

examination of the new candidate skeletal markers obtained by clustering

for co-segregation with candidate skeletal markers representing twin groups

of size ≥ ts0. The results of examination appear in the table:

Pressing <OK> saves the selected candidate skeletal markers in a special array, which is reflected in the appearance of a scroll bar and the button <First clustering>.

Pressing this button initiates the calculation of the pairwise recombination rates needed to cluster markers into linkage groups (see next page), followed by ordering of the skeletal markers within LGs. In total, all markers are kept in three arrays: skeletal, bound together (twins), and remaining markers (Heap).

Important note: Mapping data may include a certain proportion of markers in repulsion phase relative to the

majority of markers. If their phase was not defined in advance, this aspect should be taken into account during the

mapping analysis. The estimate of recombination frequency (rf) between two linked repulsion-phase markers will

be >50%, hence the program automatically replaces such estimates by 1 − rf. The accuracy of this simple approach

for phase control during map construction was carefully checked and validated in our simulation tests. In the output

tables (function <FinalResult>) the markers proved to be in repulsion phase relative to the majority of markers in

the skeletal map are marked by a special symbol (‘T’). If the user wants to save the ordered genotypes as well, the

genotyping calls of such markers are transformed, e.g., HBBHABHHABA will be transformed to HAAHBAHHBAB.
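The two operations mentioned in this note, replacing rf by 1 − rf and transforming the calls of a repulsion-phase marker, can be written as a short Python sketch. The function names are ours; the A/H/B coding follows the example above.

```python
# Swap the two homozygous classes; heterozygotes (H) stay unchanged.
_FLIP = str.maketrans("AB", "BA")

def flip_phase(calls):
    """Transform the calls of a repulsion-phase marker to coupling phase."""
    return calls.translate(_FLIP)

def corrected_rf(rf):
    """Replace an estimate above 0.5 by 1 - rf, as described in the note."""
    return 1.0 - rf if rf > 0.5 else rf

print(flip_phase("HBBHABHHABA"))  # HAAHBAHHBAB
```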


4_8


After filtering is complete, the window <Global parameters> opens. By pressing the button <Bound>, you start the <Bound together> process for all markers remaining after filtering. As usual, the ‘priority’ of markers is taken into account in this process. Codominant markers have a higher priority rank, hence they are the first to form groups of twins (Ronin et al. 2017), which may also include dominant markers. The remaining dominant markers are also grouped into twin groups, separately for the two phases, Dom4 and Dom5, according to the origin of the dominant allele, maternal or paternal (‘red’ and ‘blue’).

By answering “Yes”, you can repeat the process, with or without changing the limits of the group sizes. The answer “No” continues the usual analysis, but only for codominant markers. Pressing the button “First clustering” initiates the calculation of the matrices of pairwise recombination rates and opens the clustering window for co-dominant candidate skeletal markers, as in the general (earlier described) protocol of ultra-dense analysis in MultiPoint-ULD. For both types of dominant markers, the corresponding arrays are maintained in Heap and include twin groups of dominant markers. These arrays are saved in the ‘Data’ folders of sub-projects _Dom4 and _Dom5.

The results of grouping appear in a special table (left), representing 3 types of twin groups:

(i) pure codominant (CC), (ii) with more than one codominant and few dominant markers

(C>1D), and (iii) with only one codominant and a few dominant (C_1D). For each of these

groups, we see the distribution of obtained groups under certain gradation of group sizes

(which can be changed by the user, if needed). The table enables choosing the threshold

sizes for the 3 types of twin groups: You select the group name (one of the 3) and the column

defining the minimal allowed size for such type of twin groups. After performing selection for

all 3 types, press button <OK selection>, which results in a message on the number of each

type of groups, hence the initial number of candidate skeletal markers:

The results of the analysis can be saved, as usual, with the option <Save all clusters>. Heap and matrices are saved in the sub-project “_Cod3”. The main stage of the analysis is to build a skeletal map for co-dominant markers in sub-project Cod3 (by the earlier described protocol of MultiPoint-ULD). Each use of the option <Save all clusters> in Cod3 automatically updates Dom4 and Dom5. After finishing the construction of the skeletal map in Cod3, the user can move to the stage of additional saturation of the map by adding dominant markers; this should be done separately in the Dom4 and Dom5 sub-projects. During this analysis, the user may decide that the already constructed skeletal map in Cod3 needs additional changes or revision. For that, the user should open Cod3 again and continue the analysis within Cod3. However, note that saving the changes in Cod3 (using the menu option <Save all clusters>) will automatically remove all insertions of dominant markers made in Dom4 or Dom5.

Joint ULD analysis of co-dominant and dominant markers in F2 populations


4_9


Clustering

After <First clustering> is over and

threshold parameter is chosen, the system

asks whether the project deals with real or

simulated data. The reason is that in the

latter case we know in advance the

simulated order and, therefore, can

evaluate the correspondence between the

generated and reconstructed order of

markers in each LG. For that, we use here

a simple score, coefficient of recovery

(Mester et al. 2003).

4_10


The obtained subdivision of markers into LGs depends on the chosen threshold recombination rate. Too liberal a choice (e.g., 0.4) may lead to fusion of LGs. Replacing it with a smaller value (e.g., 0.25) and pressing the button <Build Linkage Groups> gives you a solution with higher resolution ‘on the spot’. Conversely, too stringent a threshold (e.g., 0.20) may result in fragmentation of LGs (too many LGs compared to the haploid chromosome number of the species). After getting such a result, you may want to fuse some LGs using an increased threshold (see Part 1, page 1_51). In such a case you may want to avoid too large an increase in the total length of the resulting map compared to the sum of the lengths of the fused maps (e.g., prevent fusions that increase the length more than 1.1-fold). To do so, replace the default value of the parameter <Allowed increase of combined cluster> with a new one.
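The effect of the threshold can be illustrated with a small Python sketch. It assumes single-linkage grouping (two markers fall into the same LG whenever a chain of pairwise recombination rates below the threshold connects them); the tutorial does not state which clustering rule the program uses, so this rule, the function names, and the toy rf values are assumptions.

```python
def linkage_groups(names, rf, threshold):
    """Single-linkage grouping of markers by pairwise recombination rate.

    `rf` maps frozenset({a, b}) to the recombination rate between a and b.
    """
    parent = {n: n for n in names}

    def find(x):
        # Follow parent links to the group root, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if rf[frozenset((a, b))] < threshold:
                parent[find(a)] = find(b)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(groups.values(), key=lambda g: names.index(g[0]))

def fusion_allowed(len_a, len_b, len_merged, allowed_increase=1.1):
    """Accept a fusion only if the merged map is not inflated too much."""
    return len_merged <= allowed_increase * (len_a + len_b)

names = ["m1", "m2", "m3", "m4"]
rf = {
    frozenset(("m1", "m2")): 0.05, frozenset(("m3", "m4")): 0.10,
    frozenset(("m2", "m3")): 0.30, frozenset(("m1", "m3")): 0.32,
    frozenset(("m2", "m4")): 0.38, frozenset(("m1", "m4")): 0.45,
}
print(linkage_groups(names, rf, 0.40))  # [['m1', 'm2', 'm3', 'm4']]
print(linkage_groups(names, rf, 0.25))  # [['m1', 'm2'], ['m3', 'm4']]
print(fusion_allowed(50.0, 40.0, 95.0))  # True
```

With the liberal threshold the two groups fuse through the m2–m3 link; with 0.25 they stay separate.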

Clustering (continued)

4_11


Option <Extending the linkage group> has only one option: <insert marker(s)>. It is used mainly for inserting markers from Heap into gaps between markers of the LGs. This operation also leads to re-calculation of the recombination matrices (see next page for details). The function <Division of the linkage group> also differs slightly from that in the previous versions, namely: (a) formation of new clusters by the division should be accompanied by a corresponding re-distribution of Heap markers; and (b) markers that have not been included in the new clusters are moved to Heap, which also involves re-calculation of the matrices.

Treatment of a separate LG

Most functions of this part are analogous to those in section <Analysis and treatment of a separate linkage group> described in Tutorial Part 1, pages 1_42 – 1_50. Function <Control of monotony> is described in Part 2, pages 2_4 – 2_7. Note that markers deleted from the LGs are moved to the Heap, which is accompanied by re-calculation of the matrices and takes some time. The parameter <Time to ES> is calculated automatically as a function of the number of markers in the LG.

4_12


Treatment of a separate LG (continued)

In the window displaying all clusters detailed information is provided on the number of

skeleton markers, bound together markers and markers in Heap. In the current

version of MultiPoint, the table with the LG marker list is extended to include the size

of each twin group and mean rank of markers of the group (useful if one is interested

to compare the obtained order of markers in the LG with the original order in the input

dataset).

In the described example, the process of ordering and removing problematic

markers violating monotony and local map stability of LG12 cluster, has reduced

the number of markers to 85. The result was saved as step S2.

4_13


Treatment of a separate LG: Extending the LG

In this example, a lot of markers closely linked to the considered LG were found in Heap. To see the

entire list of such markers we use the menu option <Extending the linkage group->insert

markers> that leads to the appearance of “List of additional markers” on the screen. This list is

generated as follows: for each Heap marker, the program finds the closest skeleton marker.

Thus, for each LG we obtain a list of “associated” Heap markers.

A special window <Insert to interval> is provided enabling the user to set up a variant of

insertion strategy (for a selected interval or for the entire LG interval-by-interval). The first step is

setting the parameter <inflation coeff.> to control the allowed inflation of the interval caused by

insertion of a candidate marker from the “List of additional markers” (its default value is 1.2,

but we recommend starting with a value of 1.0, followed by a step-wise increase).

4_14


Then the user should select the mode of insertion: manual, for a certain interval, or automatic along the LG (using the option <Input additional markers to the LG>). In the automatic regime, the system checks, for each interval, whether the list contains suitable candidate markers and inserts the best one. In this regime, a button <Break> is available in the window <Insert to interval>. Pressing this button stops the insertion process. After the process is terminated or interrupted, the <Break> button is replaced with the <UnDo last step> button. If needed, by pressing this button you can delete all markers inserted into the LG during this insertion process.

To ensure high quality of the map, we recommend coordinating the insertion process with marker ordering. Then, the whole process can be repeated with the same or a slightly increased inflation coefficient.

In the automatic regime, the choice of the best marker for insertion is controlled by the following rules. From all potential candidates for the current interval, the system selects those that upon insertion do not increase the total interval length by more than allowed by the parameter <Inflation coeff>. If priority markers appear in the list of candidates for the current interval, such markers are preferred. Other quality characteristics taken into account include: (a) missing data, (b) group size and (c) proximity of the candidate’s calculated position to the center of the interval. The quality of each candidate for insertion is quantified using relative weights for (a)-(c) that sum to 1. The weights are defined by the user; they can be changed and are then saved until the next change. In the absence of priority markers, these rules are applied to usual markers, under both manual and automatic insertion regimes.
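The rules above can be made concrete with a hedged Python sketch. The tutorial names the criteria but not the formulas, so the way each characteristic is turned into a score between 0 and 1, the `Candidate` fields, and the example weights are assumptions of this illustration only.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    d_left: float      # distance to the left flanking marker
    d_right: float     # distance to the right flanking marker
    missing: float     # fraction of missing calls
    group_size: int    # size of the twin group or kernel it represents
    priority: bool = False

def inflation(c, interval_length):
    """Ratio of the interval length after insertion to the length before."""
    return (c.d_left + c.d_right) / interval_length

def quality(c, max_group_size, weights):
    """Weighted sum of three scores in [0, 1]; the weights should sum to 1."""
    w_missing, w_size, w_center = weights
    total = c.d_left + c.d_right
    off_center = abs(c.d_left - c.d_right) / total if total > 0 else 0.0
    return (w_missing * (1.0 - c.missing)
            + w_size * c.group_size / max_group_size
            + w_center * (1.0 - off_center))

def best_candidate(candidates, interval_length, inflation_coeff, weights):
    """Pick the best admissible candidate, preferring priority markers."""
    allowed = [c for c in candidates
               if inflation(c, interval_length) <= inflation_coeff]
    if not allowed:
        return None
    pool = [c for c in allowed if c.priority] or allowed
    max_size = max(c.group_size for c in pool)
    return max(pool, key=lambda c: quality(c, max_size, weights))

candidates = [
    Candidate("h1", 2.0, 2.5, 0.10, 3),
    Candidate("h2", 1.0, 3.2, 0.02, 5),
    Candidate("h3", 4.0, 4.0, 0.00, 9),   # would double the interval length
]
best = best_candidate(candidates, interval_length=4.0,
                      inflation_coeff=1.2, weights=(0.4, 0.3, 0.3))
print(best.name)  # h2
```

With an inflation coefficient of 1.0 none of these candidates is admissible, which illustrates why a step-wise increase of the coefficient gradually fills the gaps.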


4_15


Treatment of a separate LG: Extending the LG (continued)

The user can select a single target interval for adding a marker from the list: the corresponding button <Choose the interval on the map and click right mouse button> is marked. On the LG map, we select the interval and, by pressing the right mouse button, choose the option <Interval-length method>. This choice changes the list of additional markers: it now shows only relevant markers (suitable for the selected interval), together with the interval length and the recombination rate of each additional marker to the interval’s flanking markers. We choose only markers close to the interval and obeying the condition that the distances to the flanks are smaller than the interval length.

4_16


The “Extending the LG” function needs a special comment. Its importance in the current version of MultiPoint-ultradense derives from the fact that for building the skeleton map we select as initial candidates only markers that represent twin groups with a size no less than some pre-set threshold ts0. This principle, which gives priority to more reliable markers, may be less relevant in hotspots of recombination, leading to gaps in the map. Similarly, it may prevent getting sufficient coverage in sub-telomeric regions, which are known to have a higher recombination rate. Thus, we complement the set of candidates with markers representing kernels of a certain minimum size resulting from a clustering procedure similar to the k-means approach (Ronin et al. 2017). By using the function “Extending the LG” we actually relax the requirement on the size of trustable twin groups or kernels.

The selected marker enters the interval and is marked by underlining (both in the list and on the map). After this step, the list of additional markers is updated and you can continue inserting additional markers into the same region or target another region of the map. If you don’t like the result of inserting a certain marker, you can delete it. For that, select this marker in the main list, press the right mouse button and use the option <Delete chosen marker>.

When you press the right mouse button on any marker of the list, the program offers two insertion options: manual insertion of the chosen marker by the user, and automatic insertion. For the conditions of automatic selection see p. 4_14.


4_17


In addition to inserting suitable markers from Heap into intervals along the LG, markers from Heap can also be added to the ends of the LG:

before the first marker of the LG

after the last marker of the LG


4_18


Treatment of a set of LGs

Function <Find markers location> is described in Part 1 (p. 1_72). Function <Moving to Heap> is described in Part 1 (p. 1_69), but it takes much more time, due to the much larger data sets treated by the ultra-dense version of the software. Function <User’s name of the cluster> makes it possible to assign names to linkage groups, in addition to LG1, LG2, ... For that, you should first mark the target LG. Then, using this function, open a special window (at the bottom on the left side of the page) and write in it the additional name of the marked LG. This name is preserved in all further manipulations.

To merge two LGs presumably representing two parts of the same chromosome, we can use the menu option <Merging two clusters> from the list of clusters present in the form <Detail>. In this list, for each ordered cluster, its distance to the closest cluster is shown (actually, what is shown is the smallest recombination rate between markers of the two clusters). Thus, for the selected pair of clusters one can check whether their merging would indeed fit the expectation of end-to-end order, in which the closest markers of the two clusters are located near their ends and appear in the combined cluster as adjacent markers or very close neighbors (see Part 1, p. 1_70).
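The end-to-end check described above can be expressed as a short Python sketch. The function names, the `max_offset` tolerance (how many positions from the end still count as "near the end"), and the default rate of 0.5 for pairs without an estimate are hypothetical choices made for this illustration.

```python
def closest_pair(order_a, order_b, rf):
    """Smallest recombination rate between markers of two ordered clusters.

    `rf` maps frozenset({a, b}) to a rate; pairs not listed count as unlinked (0.5).
    """
    return min((rf.get(frozenset((a, b)), 0.5), a, b)
               for a in order_a for b in order_b)

def distance_from_end(order, marker):
    """Number of positions separating a marker from the nearer end of its LG."""
    i = order.index(marker)
    return min(i, len(order) - 1 - i)

def fits_end_to_end(order_a, order_b, rf, max_offset=1):
    """True if the closest markers of the two clusters both lie near cluster ends."""
    rate, a, b = closest_pair(order_a, order_b, rf)
    ok = (distance_from_end(order_a, a) <= max_offset
          and distance_from_end(order_b, b) <= max_offset)
    return ok, rate, a, b

lg_a = ["a1", "a2", "a3"]
lg_b = ["b1", "b2", "b3"]
rf = {frozenset(("a3", "b1")): 0.08, frozenset(("a2", "b2")): 0.30}
print(fits_end_to_end(lg_a, lg_b, rf))  # (True, 0.08, 'a3', 'b1')
```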

4_19


After choosing the option <Merging two clusters> we obtain a window with information about the closest markers of the two clusters. By pressing the button <Display clusters> we see the two clusters and the prediction of the combined cluster. By selecting <Yes> we confirm the merging decision, and the component clusters are removed from the list. The new cluster gets the last number; if the clusters were named earlier by the user, the combined cluster gets a compound name. The connections between markers and clusters must then be updated, which may be a time-consuming task for big mapping projects.

The answer <No> cancels merging of the selected cluster pair.

This procedure can be continued with other pairs of clusters. Ideally, we should finally get a number of clusters equal to the haploid number of chromosomes of our organism.

Please pay attention: the bold markers at the ends of LG12 and LG18 appear as adjacent neighbors in the merged cluster.

Treatment of a set of LGs (continued)

4_20


Option <Save all clusters>

This option is described in Tutorial Part 1, page 1_66. However, in the current version of the software, the two saved distance matrices reflect the transition of removed skeleton markers to Heap during ordering and their backward movement during insertion procedures. If you open the saved project not from the last step, the program will re-analyze the matrices in accordance with the distribution of markers at that step (see the example below).

During the analysis, the option <Save all clusters> can be used many times, enabling you to return to any such step. At each step, the current version of the program needs not only the details of each cluster (LG), but also the two matrices of recombination rates: (i) between the skeleton markers, to build the skeleton map; and (ii) between the skeleton and Heap markers. If you saved the project details sequentially at steps S1, S2, S3, ..., the system remembers only the recombination matrices saved at the last step. If you decide to return to some earlier step, say S2, the matrices are re-calculated in accordance with the sets of skeletal and Heap markers of that step. The new branch of the analysis can be continued (with steps S2_3, S2_4 …), with re-calculation of a corresponding step-specific pair of matrices. You may also use the option <Clear saved clusters> to remove steps that are no longer needed. When you remove some intermediate steps, the matrices remain in memory, whereas when you remove the last step in a branch, the matrices of this branch are deleted. When you open your earlier saved project at a step before the last one, the system re-calculates the recombination matrices in accordance with the subdivision of skeletal markers into LGs and Heap markers.

Very important comment: During work with a cluster that includes deletion or insertion of a large number of markers, the system may generate a message. To avoid damage to the distance matrices, you should finish the current step, save the results and close the program. When you re-enter, the system revises the sizes of the distance matrices, allowing the analysis to continue.

4_21


The option <Print (output to EXCEL)> makes it possible to output not only the skeletal markers of the chosen cluster but also all bound together markers and all markers from Heap attached to this cluster. Flexibility is provided by a special window in which the user defines which markers and which details should appear in the output.

In particular, the user may want to output all markers attached to the skeletal map. We do not recommend doing that, because some of the attached markers, due to genotyping errors, may lie at a distance from the closest interval of the skeleton map that is much larger than the size of the interval. Alternatively, the user may request to include only those attached markers whose distance to the closest interval does not exceed the length of the interval multiplied by a constant, the “relative distance to the interval” (which may be <1 or >1). By default, we use a rather liberal constant of 1.4, but it can be changed by the user.
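As a small worked example of this rule, the following Python sketch keeps an attached marker only if its distance to the closest interval does not exceed the interval length times the constant. The function name, the tuple layout, and the sample distances are hypothetical.

```python
def keep_attached(distance_to_interval, interval_length, relative_distance=1.4):
    """Output rule: distance <= interval length * 'relative distance to the interval'."""
    return distance_to_interval <= relative_distance * interval_length

# (marker name, distance to closest interval, length of that interval)
attached = [("x1", 2.0, 1.5), ("x2", 3.0, 1.5), ("x3", 0.4, 0.5)]
kept = [name for name, dist, length in attached if keep_attached(dist, length)]
print(kept)  # ['x1', 'x3']
```

With an interval of length 1.5 and the default constant 1.4, markers up to 2.1 map units away are included; x2, at distance 3.0, is left out.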

By pressing <OK> we get a new window. For a more detailed description of this function see Part 1, pages 1_54 and 1_74.


4_22


Option <Final result>

This option can be employed only when all clusters have already been ordered. The window <Parameters for printing> of this option is identical to the window of the <Print> option. Yet, this new option provides additional possibilities for flexible control of the output information, listed in the second window <Parameters for final result>. The user can output the results for each chromosome in a separate file or get a single file with all chromosomes. The output may include only marker names and their chromosomal positions, or names and genotype calls.

Depending on the user’s requests, the output will include two or three files for all LGs, or separate files for each of the LGs. The file named Sk contains skeleton markers only; the file named Sk&Ex contains the skeleton and bound together (twin) markers; the file named Glob contains all markers.

4_23



For example, the output file Glob.txt may look like the one in the figure below, where “S” denotes skeletal markers,

“B” – bound together markers, “A” – attached markers, and “AB” - markers bound together with the previous

attached marker. If the data include repulsion-phase markers, then the letter “T” on the left of the marker denotes a

repulsion-phase marker compared to the majority of markers. If the output included genotyping data, the marker

calls for T-markers are transformed (see p. 4_8), hence the genotyping data now include all markers in coupling phase.

Upon selection of option <Marker position> the output will include the marker

coordinate on the chromosome map, while by choosing the option <Interval length>

you will get in the output the distances between adjacent markers.
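The two output forms carry the same information, as the following Python sketch shows. It assumes that the first marker of the LG is placed at position 0; the function names are ours.

```python
from itertools import accumulate

def positions_from_intervals(intervals):
    """Marker coordinates from the distances between adjacent markers."""
    return [0.0] + list(accumulate(intervals))

def intervals_from_positions(positions):
    """Distances between adjacent markers from their coordinates."""
    return [b - a for a, b in zip(positions, positions[1:])]

print(positions_from_intervals([1.5, 0.5, 2.0]))       # [0.0, 1.5, 2.0, 4.0]
print(intervals_from_positions([0.0, 1.5, 2.0, 4.0]))  # [1.5, 0.5, 2.0]
```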

4_24



References


A list of our relevant publications was provided on pages 7-8. Here we provide references to other papers cited in the Tutorial.


coefficient of coincidence. Theor Appl Genet 104:786–796.




natural populations. Genetics 121 174-181.



Research Technical Report Third Edition.




linkage analysis of complex diseases? Hum Genet 114: 588.593.







4_25