Memory approaches to improve multi-start constructive heuristics

Celso C. Ribeiro

Universidade Federal Fluminense, Brazil

Santorini, May 2005

Joint work with Eraldo Fernandes (M.Sc., PUC-Rio, Brazil)

WEA’2005 – IV Workshop on Experimental and Efficient Algorithms


May 2005 Memory approaches to improve multi-start constructive heuristics WEA’2005 2/72

Summary

• Application: DNA sequencing
• Motivation: sequencing by hybridization
• Multi-start randomized constructive heuristic
• Adaptive memory strategy
• Vocabulary building
• Complete heuristic: MS+MEM+VB
• Computational experiments
• Numerical results and comparisons
• Concluding remarks

DNA sequencing

DNA molecule: sequence formed by a combination of four different nucleotide bases: A, C, G, and T

Each DNA molecule may be represented as a word over the alphabet {A,C,G,T} of nucleotide bases

Example: ATAGGCAGGA

Sequencing: identification of the contents of a DNA molecule
• Gel electrophoresis
• Chemical method

Sequencing by hybridization

SBH: alternative approach to DNA sequencing

Two phases:
• Biochemical: hybridization experiment involving a DNA array and the target molecule to be sequenced
• Computational: reconstruction problem using the results of the hybridization experiment

Sequencing by hybridization

DNA array:
• Bidimensional grid
• Each cell contains a probe: small sequence of q nucleotides
• Library C(q): set of all 4^q probes of size q in the array

Hybridization experiment:
• Array is introduced into a solution containing many copies of the target sequence
• A copy of the target sequence reacts with a probe if the latter is a subsequence (of the complement) of the former
• Spectrum: set of all probes of size q that reacted with the target sequence, i.e., subsequences of size q that appear in the target
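The definition of the spectrum can be illustrated with a minimal Python sketch. It assumes an error-free experiment and ignores the complement, so the spectrum is just the set of all substrings of size q of the target; the function name `spectrum` is chosen here for illustration only.

```python
def spectrum(target, q):
    """Ideal spectrum: every substring of size q that appears in the target."""
    return {target[i:i + q] for i in range(len(target) - q + 1)}


if __name__ == "__main__":
    # Target and probe size taken from the example used in the slides.
    print(sorted(spectrum("ATAGGCAGGA", 4)))
    # The library C(q) has 4^q probes, e.g. 256 for q = 4.
    print(4 ** 4)
```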

Sequencing by hybridization

Library C(4): grid with the 4^4 = 256 probes of size 4 (AAAA, AAAT, AAAC, AAAG, ..., GGGG)

Target sequence: ATAGGCAGGA

Sequencing by hybridization

Library C(4): same grid of 256 probes of size 4

Target sequence: ATAGGCAGGA

Spectrum: {ATAG, TAGG, AGGC, GGCA, GCAG, CAGG, AGGA}

Sequencing by hybridization

Reconstruction problem:
• Second phase: reconstruction of the target sequence from the spectrum
• Find a sequence of the probes in the spectrum such that consecutive probes have q-1 bases of superposition

Hamiltonian path problem on the spectrum:
• One vertex for each probe u in the spectrum
• Arc (u,v) from probe u to v if the last q-1 bases of u coincide with the first q-1 bases of v

ATAG
 TAGG
  AGGC
   GGCA
    GCAG
     CAGG
      AGGA
ATAGGCAGGA
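The Hamiltonian path formulation can be sketched in a few lines of Python for the error-free case. This is only an illustration: the function name `reconstruct` is hypothetical, and the exhaustive depth-first search used here takes exponential time in the worst case, so it is not the heuristic presented in this talk.

```python
def reconstruct(spectrum, q):
    """Search for a Hamiltonian path in the graph of the spectrum by DFS."""
    probes = sorted(spectrum)
    # Arc (u, v): the last q-1 bases of u coincide with the first q-1 bases of v.
    succ = {u: [v for v in probes if v != u and u[1:] == v[:q - 1]]
            for u in probes}

    def extend(path, used):
        if len(path) == len(probes):
            return path
        for v in succ[path[-1]]:
            if v not in used:
                found = extend(path + [v], used | {v})
                if found:
                    return found
        return None

    for start in probes:
        path = extend([start], {start})
        if path:
            # Each probe after the first contributes exactly one new base.
            return path[0] + "".join(p[-1] for p in path[1:])
    return None


if __name__ == "__main__":
    s = {"ATAG", "TAGG", "AGGC", "GGCA", "GCAG", "CAGG", "AGGA"}
    print(reconstruct(s, 4))
```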

Sequencing by hybridization

Spectrum: {ATAG, TAGG, AGGC, GGCA, GCAG, CAGG, AGGA}

Figure: graph with one vertex per probe (AGGA, CAGG, AGGC, GCAG, ATAG, GGCA, TAGG). The Hamiltonian path ATAG, TAGG, AGGC, GGCA, GCAG, CAGG, AGGA yields the target sequence ATAGGCAGGA.

Sequencing by hybridization

Hybridization errors:
• Hybridization experiment is not perfect
• False positives: probes that appear in the spectrum but not in the target sequence
• False negatives: probes that occur in the target sequence but not in the spectrum

ATAG
 TAGG
  AGGC
   ----
    GCAG
     CAGG
      AGGA
ATAGGCAGGA

Sequencing by hybridization

Problem of sequencing by hybridization (PSBH): given the spectrum S = {s1, s2, ..., sm}, the size q of the probes, the length n, and the first probe s0 of the target sequence, find a sequence of length at most n containing a maximum number of probes of the spectrum.

PSBH is NP-hard (Blazewicz et al., 1999)

Sequencing by hybridization

Directed graph G = (V,E)
• V = S (probes in the spectrum)
• E = {(u,v): u ∈ S and v ∈ S}
• Superposition o(u,v) between two probes u,v ∈ S: size of the largest sequence that is both a suffix of u and a prefix of v
• Weight w(u,v) of the arc (u,v):

$$w(u,v) = \begin{cases} q - o(u,v), & \text{if } o(u,v) > 0 \\ \ldots, & \text{otherwise} \end{cases}$$
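The superposition and the arc weight can be sketched as follows. The names `overlap` and `weight` are illustrative, and the sketch assumes w(u,v) = q - o(u,v) for every pair of distinct probes, which is a simplification of the piecewise definition on this slide.

```python
def overlap(u, v):
    """o(u,v): size of the largest proper suffix of u that is a prefix of v."""
    for k in range(min(len(u), len(v)) - 1, 0, -1):
        if u[-k:] == v[:k]:
            return k
    return 0


def weight(u, v, q):
    # Assumption: w(u,v) = q - o(u,v), i.e. the number of new bases
    # added to the sequence when probe v is appended after probe u.
    return q - overlap(u, v)


if __name__ == "__main__":
    print(overlap("ATAG", "TAGG"), weight("ATAG", "TAGG", 4))
    print(overlap("AGGC", "GCAG"), weight("AGGC", "GCAG", 4))
```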

Sequencing by hybridization

Spectrum: {ATAG, TAGG, AGGC, GCAG, CAGG, AGGA, GGCG} (q = 4)

Target sequence: ATAGGCAGGA (n = 10)

GGCG: false positive
GGCA: false negative

Figure: graph G with one vertex per probe of the spectrum (AGGA, CAGG, AGGC, GCAG, ATAG, GGCG, TAGG) and arcs labeled with weights 1, 2 and 3.

Sequencing by hybridization

Feasible solutions: acyclic paths in G emanating from vertex s0 with weight less than or equal to n-q

A path in G is a sequence a = (a1, a2, ..., ak) of probes ai ∈ S, i ∈ {1, 2, ..., k}

An optimal solution visits a maximum number of vertices and respects the above constraints
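The feasibility conditions above can be checked with a short Python sketch. The helper names (`overlap`, `path_weight`, `is_feasible`, `spell`) are hypothetical, and the arc weight is assumed to be q - o(u,v) for all pairs of distinct probes.

```python
def overlap(u, v):
    """Size of the largest proper suffix of u that is also a prefix of v."""
    for k in range(min(len(u), len(v)) - 1, 0, -1):
        if u[-k:] == v[:k]:
            return k
    return 0


def path_weight(path, q):
    # Assumption: w(u,v) = q - o(u,v) for consecutive probes u, v.
    return sum(q - overlap(u, v) for u, v in zip(path, path[1:]))


def is_feasible(path, s0, n, q):
    """Path starts at s0, repeats no probe, and has weight at most n - q."""
    return (path[0] == s0
            and len(set(path)) == len(path)
            and path_weight(path, q) <= n - q)


def spell(path):
    """Sequence obtained by merging consecutive probes on their overlaps."""
    seq = path[0]
    for u, v in zip(path, path[1:]):
        seq += v[overlap(u, v):]
    return seq


if __name__ == "__main__":
    a = ["ATAG", "TAGG", "AGGC", "GCAG", "CAGG", "AGGA"]
    print(path_weight(a, 4), is_feasible(a, "ATAG", 10, 4), spell(a))
```

In the example spectrum with errors, this path has weight 6 = n - q and spells the target, while inserting the false positive GGCG raises the weight above n - q.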

Heuristics: ant colony, tabu search, genetic algorithm

This work: multi-start constructive heuristic with a memory-based strategy

Sequencing by hybridization

Figure: the same weighted graph for the spectrum {ATAG, TAGG, AGGC, GCAG, CAGG, AGGA, GGCG} (q = 4, n = 10). The path ATAG, TAGG, AGGC, GCAG, CAGG, AGGA avoids the false positive GGCG and bridges the false negative GGCA:

ATAG
 TAGG
  AGGC
   ----
    GCAG
     CAGG
      AGGA
ATAGGCAGGA

Multi-start randomized constructive heuristic

Iteratively builds multiple solutions using a randomized constructive algorithm

Randomized constructive algorithm builds a different solution at each run

Returns the best solution found

Initial solution formed by a unique probe: a = (s0)

Current partial solution (path) is extended at each iteration by the insertion of a new probe at the end

Multi-start randomized constructive heuristic

Current partial solution (path) is extended at each iteration by the insertion of a new probe at the end

Probe to be inserted is probabilistically selected from a restricted candidate list (RCL)

S(a): probes in the current partial solution a

u: last probe in the current path

$$\mathrm{RCL} = \{ v \in S \setminus S(a) : o(u,v) \ge (1-\alpha) \cdot \max_{t \in S \setminus S(a)} o(u,t) \ \text{and} \ w(a) + w(u,v) \le n-q \}$$

Randomly select a probe v from RCL with probability (greediness):

$$p(u,v) = \frac{1/w(u,v)}{\sum_{t \in S \setminus S(a)} 1/w(u,t)}$$
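One construction run can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' code: the function names are hypothetical, the arc weight is taken as q - o(u,v), the RCL parameter is called `alpha`, and the selection probabilities are normalized over the RCL (which `random.Random.choices` does automatically).

```python
import random


def overlap(u, v):
    """Size of the largest proper suffix of u that is also a prefix of v."""
    for k in range(min(len(u), len(v)) - 1, 0, -1):
        if u[-k:] == v[:k]:
            return k
    return 0


def build_rcl(u, candidates, q, alpha, used_weight, n):
    """Probes with near-maximum overlap that keep the path weight <= n - q."""
    if not candidates:
        return []
    best = max(overlap(u, t) for t in candidates)
    return [v for v in candidates
            if overlap(u, v) >= (1 - alpha) * best
            and used_weight + (q - overlap(u, v)) <= n - q]


def construct(spectrum, s0, n, q, alpha, rng):
    """Extend the path from s0 until the RCL becomes empty."""
    path, total = [s0], 0
    while True:
        u = path[-1]
        candidates = [v for v in sorted(spectrum) if v not in path]
        rcl = build_rcl(u, candidates, q, alpha, total, n)
        if not rcl:
            return path
        # Selection probability proportional to 1 / w(u,v); w >= 1 here.
        bias = [1.0 / (q - overlap(u, v)) for v in rcl]
        v = rng.choices(rcl, weights=bias, k=1)[0]
        total += q - overlap(u, v)
        path.append(v)


if __name__ == "__main__":
    s = {"ATAG", "TAGG", "AGGC", "GCAG", "CAGG", "AGGA", "GGCG"}
    print(construct(s, "ATAG", 10, 4, 0.5, random.Random(1)))
```

A multi-start heuristic simply calls `construct` repeatedly and keeps the longest path.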

Adaptive memory strategy

Application to QAP: Fleurent and Glover, 1999

Pool Q of elite solutions (best solutions found): diversity

Intensification strategy for the constructive algorithm

Makes use of two kinds of information in the construction: superposition between the probes and frequency of the arcs in the elite solutions

Parameter λ used to balance the weights of the two terms: greediness (superposition) and frequency (memory)

Adaptive memory strategy

Greediness, higher when the superposition between probes u and v is larger:

$$x(u,v) = \min_{t \in S \setminus S(a)} \{ w(u,t) / w(u,v) \}$$

Frequency, higher for arcs (u,v) appearing more often in the solutions of the elite set:

$$y(u,v) = \sum_{a'' \in Q:\ (u,v) \in a''} \{ |a''| / \max_{a' \in Q} |a'| \}$$

Combined evaluation:

$$e(u,v) = \lambda \cdot x(u,v) + y(u,v)$$

Probability p(u,v) of selecting a probe v from the RCL to extend the current partial solution whose last probe is u:

$$p(u,v) = \frac{e(u,v)}{\sum_{t \in \mathrm{RCL}} e(u,t)}$$
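The memory-based selection probabilities can be sketched as follows. Several points are assumptions made for illustration: the balance parameter is called `lam` and is taken to multiply the greediness term, the weight function `w` is passed in as a callable, and solutions are lists of probes whose size |a| is the number of probes.

```python
def arcs(path):
    """Set of arcs (u, v) used by a path."""
    return set(zip(path, path[1:]))


def greediness(u, v, candidates, w):
    """x(u,v): smallest weight out of u divided by w(u,v)."""
    return min(w(u, t) for t in candidates) / w(u, v)


def frequency(u, v, pool):
    """y(u,v): sum of |a''| / max |a'| over elite solutions a'' using (u,v)."""
    if not pool:
        return 0.0
    longest = max(len(a) for a in pool)
    return sum(len(a) / longest for a in pool if (u, v) in arcs(a))


def selection_probabilities(u, rcl, candidates, pool, w, lam):
    """p(u,v) = e(u,v) / sum of e(u,t) over the RCL, for lam > 0."""
    e = {v: lam * greediness(u, v, candidates, w) + frequency(u, v, pool)
         for v in rcl}
    total = sum(e.values())
    return {v: e[v] / total for v in rcl}


if __name__ == "__main__":
    pool = [["ATAG", "TAGG", "AGGC"]]
    rcl = ["AGGC", "AGGA"]
    print(selection_probabilities("TAGG", rcl, rcl, pool, lambda u, v: 1, 1.0))
```

With an empty pool the frequency term vanishes and the selection reduces to the purely greedy bias.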

Adaptive memory strategy

Pool update:
• Pool size: at most q solutions
• Solution a is a candidate to be inserted into the pool Q if it is better than the worst solution currently in the pool, i.e., |a| > min_{a'∈Q} |a'|
• Candidate solution a replaces the worst solution in the pool if it is better than the best solution in the pool (|a| > max_{a'∈Q} |a'|) or if it is sufficiently different from every other solution in the pool (min_{a'∈Q} dist(a,a') ≥ dmin)
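The pool update rule can be sketched as below. The slides do not define dist(a,a') nor say what happens while the pool is not yet full, so this sketch assumes that dist counts the arcs of a that are absent from a', that solutions are inserted directly until the pool is full, and that exact duplicates are rejected; `update_pool`, `max_size` and `d_min` are illustrative names.

```python
def arcs(path):
    return set(zip(path, path[1:]))


def dist(a, b):
    # Assumption: number of arcs of a that do not appear in b.
    return len(arcs(a) - arcs(b))


def update_pool(pool, a, max_size, d_min):
    """Try to insert solution a into the pool; return True if inserted."""
    if a in pool:                      # assumption: no duplicates
        return False
    if len(pool) < max_size:           # assumption: fill the pool first
        pool.append(a)
        return True
    worst = min(pool, key=len)
    if len(a) <= len(worst):           # not a candidate
        return False
    better_than_best = len(a) > max(len(b) for b in pool)
    diverse = min(dist(a, b) for b in pool) >= d_min
    if better_than_best or diverse:
        pool[pool.index(worst)] = a    # replace the worst solution
        return True
    return False


if __name__ == "__main__":
    pool = []
    for sol in ([1, 2], [1, 2, 3], [1, 2, 3, 4], [1, 3, 2], [4, 3, 2, 1]):
        print(sol, update_pool(pool, sol, max_size=2, d_min=2), pool)
```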

Vocabulary building

Good solutions are very often formed by the same building blocks (paths)

Optimal solutions formed by components appearing in suboptimal solutions

Identify short paths with optimal superposition and combine them to build optimal solutions

Vocabulary building: Glover and Laguna, 1997
• Find common paths appearing in good solutions (words)
• Combine them into new good solutions (phrases)

Vocabulary building

Solutions encoded as adjacency vectors
• Solution a = (a1,a2,...,ak) represented as a vector x = (x1,x2,...,x|S|)
• If xu = s, then probe s follows immediately after probe u, i.e., the arc (u,s) is used in the path

Example with six probes: a = (1,4,2,3,5)

u    1 2 3 4 5 6
xu   4 3 5 2 - -
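The adjacency vector encoding can be reproduced with a small Python sketch. The function name `to_adjacency` is hypothetical, probes are numbered 1..|S| as in the example, and `None` stands for an undefined entry ("-").

```python
def to_adjacency(path, num_probes):
    """x[u] is the probe that follows u in the path, or None if undefined."""
    x = {u: None for u in range(1, num_probes + 1)}
    for u, s in zip(path, path[1:]):
        x[u] = s
    return x


if __name__ == "__main__":
    print(to_adjacency([1, 4, 2, 3, 5], 6))
```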

Vocabulary building

Some notation:
• Set X of adjacency vectors
• Size(x): number of arcs in the adjacency vector x
• Inter(X): subset of arcs that appear in all vectors in X
• Enclosure(y,X): set formed by all vectors in X that contain the arcs in the adjacency vector y

Example of Inter(x1,x2):

u              1 2 3 4 5 6 7 8
x1             2 3 6 7 - 4 8 -
x2             2 3 6 7 4 5 8 -
Inter(x1,x2)   2 3 6 7 - - 8 -
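The operators Size, Inter and Enclosure can be sketched on adjacency vectors stored as tuples, where position i holds the successor of probe i+1 and `None` marks an undefined entry. The representation and the lowercase function names are choices made for this illustration.

```python
def size(x):
    """Number of arcs (defined entries) in the adjacency vector x."""
    return sum(s is not None for s in x)


def inter(vectors):
    """Arcs that appear in all the given vectors."""
    first, *rest = vectors
    return tuple(s if all(y[i] == s for y in rest) else None
                 for i, s in enumerate(first))


def enclosure(y, X):
    """Vectors of X that contain every arc of y."""
    return [x for x in X
            if all(s is None or x[i] == s for i, s in enumerate(y))]


if __name__ == "__main__":
    x1 = (2, 3, 6, 7, None, 4, 8, None)
    x2 = (2, 3, 6, 7, 4, 5, 8, None)
    common = inter([x1, x2])
    print(common, size(common), len(enclosure(common, [x1, x2])))
```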

Vocabulary building

Find words: given an elite set X, find vectors y with |Enclosure(y,X)| as large as possible and Size(y) ≥ smin (non-elementary small words), where smin is a parameter

Vocabulary building

Algorithm FindWords(X, smin):
  Y ← ∅, X' ← X
  while X' ≠ ∅ do
    x ← rand(X'), Z ← {x}, X'' ← X − {x}
    while X'' ≠ ∅ do
      x ← rand(X'')
      if Size(Inter(Z ∪ {x})) ≥ smin then Z ← Z ∪ {x}
      X'' ← X'' − {x}
    end-while
    if |Z| > 1 then y ← Inter(Z); Y ← Y ∪ {y}
    X' ← X' − Z
  end-while
  return Y

Martins and Plastino, 2005: more effective algorithm based on data mining strategies
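A runnable Python rendering of FindWords might look as follows. It is a sketch under assumptions: adjacency vectors are tuples with `None` for undefined entries, the elite set X contains distinct vectors, and shuffling the remaining vectors stands in for the repeated random draws without replacement of the inner loop.

```python
import random


def size(x):
    return sum(s is not None for s in x)


def inter(vectors):
    first, *rest = vectors
    return tuple(s if all(y[i] == s for y in rest) else None
                 for i, s in enumerate(first))


def find_words(X, s_min, rng):
    """Group elite vectors whose common arcs form a word of size >= s_min."""
    words, remaining = [], list(X)
    while remaining:
        x = rng.choice(remaining)
        Z = [x]
        others = [v for v in X if v != x]
        rng.shuffle(others)            # random visiting order
        for v in others:
            if size(inter(Z + [v])) >= s_min:
                Z.append(v)
        if len(Z) > 1:
            y = inter(Z)
            if y not in words:
                words.append(y)
        remaining = [v for v in remaining if v not in Z]
    return words


if __name__ == "__main__":
    x1 = (2, 3, 6, 7, None, 4, 8, None)
    x2 = (2, 3, 6, 7, 4, 5, 8, None)
    x3 = (None, 1, 2, None, None, None, None, None)
    print(find_words([x1, x2, x3], 4, random.Random(0)))
```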

Vocabulary building

Additional notation:
• x and y: adjacency vectors
• ExtInter(x,y): undefined variables in one of the vectors are filled with the corresponding defined variables in the other

Example of ExtInter(x1,x2):

u                 1 2 3 4 5 6 7 8
x1                2 3 - - - - 8 -
x2                - 3 4 5 6 - - -
ExtInter(x1,x2)   2 3 4 5 6 - 8 -
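The ExtInter operator and the in-degree test used later to combine words can be sketched as below. The slides do not say what happens when both vectors define the same entry with different values; this sketch assumes such a pair is incompatible and returns `None`. The function names are illustrative.

```python
from collections import Counter


def ext_inter(x, y):
    """Fill undefined entries of one vector with defined entries of the other.

    Assumption: conflicting defined entries make the pair incompatible (None).
    """
    out = []
    for a, b in zip(x, y):
        if a is not None and b is not None and a != b:
            return None
        out.append(a if a is not None else b)
    return tuple(out)


def max_in_degree(x):
    """Largest number of arcs entering the same probe."""
    counts = Counter(s for s in x if s is not None)
    return max(counts.values(), default=0)


if __name__ == "__main__":
    x1 = (2, 3, None, None, None, None, 8, None)
    x2 = (None, 3, 4, 5, 6, None, None, None)
    merged = ext_inter(x1, x2)
    print(merged, max_in_degree(merged))
```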

Vocabulary building

Combine words: given a set of words Y, combine them into phrases
• Very similar to the algorithm that finds words, replacing the original operator Inter by the new operator ExtInter

Vocabulary building

Algorithm CombineWords(Y):
  Z ← ∅, Y' ← Y
  while Y' ≠ ∅ do
    y ← rand(Y'), W ← {y}, Y'' ← Y − {y}
    while Y'' ≠ ∅ do
      y ← rand(Y'')
      if MaxInDegree(ExtInter(W,y)) = 1 then W ← W ∪ {y}
      Y'' ← Y'' − {y}
    end-while
    if |W| > 1 then z ← ExtInter(W); Z ← Z ∪ {z}
    Y' ← Y' − W
  end-while
  return Z
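CombineWords can be rendered in Python along the same lines as FindWords. This is a sketch under assumptions: words are tuples with `None` for undefined entries, conflicting entries make two words incompatible, and the phrase is built incrementally by merging each accepted word. As in the pseudocode, the result may still contain cycles or disconnected subpaths.

```python
import random
from collections import Counter


def ext_inter(x, y):
    out = []
    for a, b in zip(x, y):
        if a is not None and b is not None and a != b:
            return None                # assumption: conflict means incompatible
        out.append(a if a is not None else b)
    return tuple(out)


def max_in_degree(x):
    counts = Counter(s for s in x if s is not None)
    return max(counts.values(), default=0)


def combine_words(Y, rng):
    """Merge compatible words into phrases with in-degree at most one."""
    phrases, remaining = [], list(Y)
    while remaining:
        y = rng.choice(remaining)
        W, phrase = [y], y
        others = [v for v in Y if v != y]
        rng.shuffle(others)            # random visiting order
        for v in others:
            merged = ext_inter(phrase, v)
            if merged is not None and max_in_degree(merged) == 1:
                W.append(v)
                phrase = merged
        if len(W) > 1 and phrase not in phrases:
            phrases.append(phrase)
        remaining = [v for v in remaining if v not in W]
    return phrases


if __name__ == "__main__":
    w1 = (2, 3, None, None, None, None, 8, None)
    w2 = (None, 3, 4, 5, 6, None, None, None)
    print(combine_words([w1, w2], random.Random(0)))
```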

Vocabulary building

Phrases may be incomplete or infeasible

Make infeasible phrases (solutions) feasible:
• Insert probe s0 in the best place in case it does not appear in the phrase
• Complete the solution joining subpaths of the phrase

Vocabulary building

Algorithm VocabularyBuilding(X, smin):
  Y ← FindWords(X, smin)
  Z ← CombineWords(Y)
  A ← ∅
  for each z ∈ Z do
    a ← MakeFeasible(z)
    A ← A ∪ {a}
  end-for
  return A

Complete heuristic: MS+MEM+VB

Algorithm MS+MEM+VB:
  Q, X ← ∅; a* ← null
  for i = 1, ..., MAXITER
    a ← GreedyRandomizedMemory(Q, λ)
    if |a| > |a*| then a* ← a
    update weight λ and use a to update pools Q and X
    if i mod(nVB) = 0 then
      A ← VocabularyBuilding(X, smin)
      for every a ∈ A do
        use a to update pools Q and X, and if |a| > |a*| then a* ← a
      end-for
  end-for
  return a*

Q: pool of elite solutions for adaptive memory
X: pool of elite solutions for vocabulary building
|X| >> |Q|
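The control flow of the complete heuristic can be sketched as a generic Python skeleton. The three components are passed in as callables with hypothetical signatures, and the update of the balance weight between iterations is left out, so this shows only the structure of the loop, not the authors' implementation.

```python
def ms_mem_vb(construct, vocabulary_building, update_pools, max_iter, n_vb):
    """Multi-start loop with memory pools and periodic vocabulary building.

    construct(Q) returns a solution (a list of probes), vocabulary_building(X)
    returns a list of solutions, and update_pools(Q, X, a) updates both pools.
    """
    best = None
    Q, X = [], []                      # elite pools for memory and for VB
    for i in range(1, max_iter + 1):
        a = construct(Q)
        if best is None or len(a) > len(best):
            best = a
        update_pools(Q, X, a)
        if i % n_vb == 0:              # vocabulary building every n_vb iterations
            for b in vocabulary_building(X):
                update_pools(Q, X, b)
                if len(b) > len(best):
                    best = b
    return best


if __name__ == "__main__":
    lengths = iter([2, 5, 3, 4])
    result = ms_mem_vb(
        construct=lambda Q: list(range(next(lengths))),
        vocabulary_building=lambda X: [list(range(7))] if len(X) >= 4 else [],
        update_pools=lambda Q, X, a: X.append(a),
        max_iter=4,
        n_vb=2,
    )
    print(len(result))
```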

Computational experiments

Conditions:
• Pentium 2.4 GHz with 512 MB of RAM
• Linux 10.0 with kernel 2.6.3
• Codes in ANSI C++ compiled with GNU compiler version 3.3.2

Instances:
• Set A: instances generated from real human DNA sequences obtained from GenBank
• Set R: instances randomly generated

Computational experiments

Instances A:
• Origin: 40 GenBank sequences
• Five smaller sequences are generated from each original sequence, corresponding to their prefixes of size n = 109, 209, 309, 409, 509
• For each of them, we consider its ideal spectrum, with size respectively equal to 100, 200, 300, 400, 500, using an array with probes of size q = 10
• Total: 200 instances
• 20% of false negatives and 20% of false positives generated for each instance (probe s0 appears in all of them, no repetitions)

Computational experiments

Instances R:
• Origin: 100 random sequences
• Ten smaller sequences are generated from each original sequence, corresponding to their prefixes of size n = 100, 200, ..., 1000
• For each of them, we consider its ideal spectrum, with size respectively equal to 92, 192, ..., 992, using an array with probes of size q = 7
• Total: 1000 instances
• 20% of false negatives and 20% of false positives generated for each instance (probe s0 appears in all of them, no repetitions)

Computational experiments

Solution quality evaluation:
1. Number of probes in the solution: |a|
2. Similarity with the target sequence:
   • Perform the alignment between the sequence σ(a) spelled by the solution and the target sequence σ* (matches: +1, mismatches: -1) to compute the value align(σ(a),σ*) by dynamic programming
   • Compute similarity(a) = 100·(align(σ(a),σ*) + nmax)/(2·nmax), with nmax = max{|σ(a)|,|σ*|}
3. Fraction:
   • fraction(a) = 100·|a|/|a*|
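The similarity measure can be sketched with a standard global alignment computed by dynamic programming. The slides give only the match (+1) and mismatch (-1) scores, so the gap score of -1 and the choice of a global (rather than local) alignment are assumptions; the function names are illustrative.

```python
def align(s, t, match=1, mismatch=-1, gap=-1):
    """Global alignment score of s and t (gap score is an assumption)."""
    prev = [j * gap for j in range(len(t) + 1)]
    for i in range(1, len(s) + 1):
        cur = [i * gap]
        for j in range(1, len(t) + 1):
            diag = prev[j - 1] + (match if s[i - 1] == t[j - 1] else mismatch)
            cur.append(max(diag, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]


def similarity(s, target):
    """100 * (align + nmax) / (2 * nmax), a value between 0 and 100."""
    n_max = max(len(s), len(target))
    return 100.0 * (align(s, target) + n_max) / (2 * n_max)


if __name__ == "__main__":
    print(similarity("ATAGGCAGGA", "ATAGGCAGGA"))
    print(similarity("ATAGGCGGA", "ATAGGCAGGA"))
```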

Computational experiments

Random instances in set R used for parameter setting and tuning
• Weight λ decreases with the iteration counter
• Small values of α are used in the beginning, so that purely greedy solutions are generated when no frequency information is available

• Initial value of decreases with the problem size

• MAXITER = 10·n (iterations)
• Parameters α and λ are updated after blocks of n/2 iterations

Numerical results

Figure: average similarity with the target sequence over all R instances with the same size (curves labeled MS and MS+MEM+VB)

Each additional component (memory, VB) improves the multi-start heuristic


Numerical results

Average computation time over all R instances with the same size


Numerical results

Average similarity with the target sequence observed with algorithm MS+Mem+VB over all R instances with the same size for different rates of errors


Numerical results

Average similarity with the target sequence observed with algorithm MS+Mem+VB over all R instances with the same size for different probe sizes

Numerical results

Best known solution for an instance in set R (n=1000) vs. iteration counter

Numerical results

Best known solution for an instance in set R (n=1000) vs. processing time (10.4 seconds)

Numerical results

Best known solution for another instance in set R (n=1000) vs. iteration counter

Numerical results

Best known solution for another instance in set R (n=1000) vs. processing time (9.0 seconds)

Additional memory computations speed up the multi-start heuristic (better solutions in the same computation time), in spite of the increase in the time per iteration

Memory helps!

Numerical results

α increases and the greedy solutions deteriorate

λ decreases and the memory acts to improve the solutions

Numerical results

Instance in set R with n = 500

Empirical distributions of the time to target solution value

Set a target value (in this case, the optimal value)

Run each algorithm 100 times and record the running time when a solution at least as good as the target value is found

Plot the empirical distributions
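The points of such an empirical distribution can be computed as sketched below. The plotting position p_i = (i - 1/2)/n for the i-th smallest time is a common convention for time-to-target plots and is an assumption here, since the slides do not state which one was used; `ttt_points` is an illustrative name.

```python
def ttt_points(times):
    """Pair the i-th smallest running time with probability (i - 1/2) / n."""
    ts = sorted(times)
    n = len(ts)
    return [(t, (i - 0.5) / n) for i, t in enumerate(ts, start=1)]


if __name__ == "__main__":
    # Hypothetical running times (seconds) of four independent runs.
    for t, p in ttt_points([3.0, 1.0, 2.0, 4.0]):
        print(t, p)
```

Plotting these points for each algorithm gives one curve per algorithm; a curve further to the left reaches the target sooner.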

Numerical results

Instance in set R with n = 500

Algorithms with memory find target values more quickly (algorithms to the left are preferable)

Algorithms with memory are more robust (times to target values are more stable)

Comparisons

Best algorithms in the literature:
• Tabu search: Blazewicz et al., 2000
• Overlapping windows heuristic: Blazewicz et al., 2002
• SOPAS – Genetic algorithm: Endo, 2004

Comparisons

Average similarity with the target sequence observed with the four algorithms over all A instances with the same size

             Sequence length (n)
Algorithm    109    209    309    409    509
TS           98.6   94.1   89.6   88.5   80.7
OW           99.4   95.2   95.7   92.1   90.1
GA           98.3   97.9   99.1   98.1   93.5
MS+Mem+VB    100.0  100.0  99.2   99.4   99.5


Comparisons

Average similarity with the target sequence observed with the four algorithms over all A instances with the same size

(alternatively)

Comparisons

Number of target sequences found by each of the four algorithms over all A instances with the same size

             Sequence length (n)
Algorithm    109    209    309    409    509
TS           28     23     17     10     10
OW           28     20     21     13     14
GA           37     30     37     30     28
MS+Mem+VB    40     40     39     39     39

Comparisons

Average computation times in seconds observed for each of the four algorithms over all A instances with the same size

             Sequence length (n)
Algorithm    109    209    309    409    509
TS           <1.0   5.0    14.0   28.0   51.0
OW           <1.0   <1.0   <1.0   <1.0   <1.0
GA           0.1    0.3    0.9    1.5    2.1
MS+Mem+VB    0.1    0.4    0.9    3.1    6.2

Cray T3E-900

Comparisons

Average computation times in seconds observed for each of the four algorithms over all A instances with the same size

Cray T3E-900

(alternatively)

Comparisons

Number of target sequences found by MS+Mem+VB and the GA over all R instances with the same size

             Sequence length (n)
Algorithm    100   200   300   400   500   600   700   800   900   1000
GA           70    61    55    37    23    11    9     3     1     2
MS+Mem+VB    79    74    83    73    61    52    34    10    13    2


Comparisons

Average similarity with the target sequence over all R instances with the same size


Comparisons

Average computation times in seconds observed for each algorithm over all R instances with the same size

Concluding remarks

New multi-start heuristic for PSBH performs very well

Memory approaches (adaptive memory and vocabulary building) are able to improve multi-start solutions

Parameter tuning may be further improved

Approach can be applied to other optimization problems (e.g. car sequencing problem)