[Lecture Notes in Computer Science] High Performance Computing – HiPC 2007 Volume 4873 || An FPGA-Based Accelerator for Multiple Biological Sequence Alignment with DIALIGN

S. Aluru et al. (Eds.): HiPC 2007, LNCS 4873, pp. 71–82, 2007. © Springer-Verlag Berlin Heidelberg 2007

An FPGA-Based Accelerator for Multiple Biological Sequence Alignment with DIALIGN

Azzedine Boukerche 1, Jan Mendonca Correa 2, Alba Cristina Magalhaes Alves de Melo 2, Ricardo Pezzuol Jacobi 2, and Adson Ferreira Rocha 3

1 SITE-School of Information Technology and Engineering, University of Ottawa, Canada
2 Department of Computer Science, University of Brasilia, Brazil
3 Department of Electrical Engineering, University of Brasilia, Brazil

[email protected], {jan,albamm,rjacobi}@cic.unb.br, [email protected]

Abstract. Multiple sequence alignment (MSA) is a very important problem in Computational Biology since it is often used to identify evolutionary relationships among organisms and to predict secondary/tertiary structure. Since MSA is known to be a computationally challenging problem, many proposals have been made to accelerate it, either by using parallel processing or hardware accelerators. In this paper, we propose an FPGA-based accelerator to execute the most compute-intensive part of DIALIGN, an iterative method to obtain multiple sequence alignments. The experimental results collected on our 200-element FPGA prototype show that a speedup of 383.41 was obtained when compared with the software implementation.

1 Introduction

In the last decade, genome projects have produced a huge amount of biological data. To better understand newly sequenced organisms, biologists compare their sequences against those of other organisms contained in genomic databases, in order to infer properties. Nowadays, this comparison is done millions of times a day, all over the world.

Sequence alignment (or sequence comparison) is in fact the problem of finding an approximate pattern match between the sequences [21]. It can involve only two sequences (pairwise alignment) or more than two sequences (multiple sequence alignment) [9]. In a multiple sequence alignment (MSA), similar residues among a set of nseq sequences are aligned together. Usually, sequences compared with MSA are known to be biologically related and the goal is to obtain conserved subpatterns [5].

MSAs are often scored with the sum-of-pairs (SP) objective function [4] and the exact SP MSA problem is known to be NP-complete [25]. Therefore, heuristic methods are usually used to solve this problem, even when the number of sequences is small.

In general, an MSA problem can be solved with progressive or iterative methods [15]. Progressive methods are executed in three steps. First, the NW algorithm [17] is used to perform pairwise alignments with all sequences. After that, a phylogenetic tree is constructed with the information obtained in phase 1 and, finally, the tree is used to guide the alignment of the sequences sequentially, from the most closely related to the least related ones. CLUSTALW [24] and T-COFFEE [16] are examples of progressive MSA methods. Iterative methods also use dynamic programming but, unlike the progressive methods, they periodically evaluate the quality of the scores produced and realign subgroups of already aligned sequences. PRRP [8] and DIALIGN [14] are examples of iterative methods.

Many dedicated architectures [2,10,12,18] and parallel applications [3] have been proposed to tackle pairwise sequence alignment by accelerating the dynamic programming matrix computation. Fewer proposals [11,19] accelerate MSA algorithms. In this case, the hardware is not used to execute the whole algorithm, but only its most compute-intensive part. Both [19] and [11] proposed FPGA-based accelerators to execute the first phase of CLUSTALW [24], which performs pairwise sequence comparisons among all the sequences.

In this article, we present and evaluate an FPGA-based architecture to execute the most compute intensive part of the DIALIGN algorithm for multiple sequence alignment. Our architecture is designed as a systolic array which is able to compare sequences of any size using the DIALIGN recurrence relations [14]. As far as we know, this is the first hardware-based approach to execute DIALIGN.

The results obtained on a 200-element prototype synthesized for the FPGA Altera Stratix 2 EP2S180F1508I4 show that a speedup of 383.41 is achieved when comparing real DNA sequences of sizes 194439 bp (base pairs) and 169786 bp. In this case, the software implementation took 3 hours and 4 minutes and our FPGA implementation took 28.839 seconds.

The rest of this paper is organized as follows. Section 2 describes the MSA problem and the DIALIGN algorithm to solve it. In Section 3, related work in the area of FPGA architectures for sequence alignment is discussed. Section 4 describes our FPGA-based architecture. Experimental results are discussed in Section 5. Section 6 concludes the paper.

2 Biological Sequence Comparison with DIALIGN

2.1 The Sequence Alignment Problem

To compare two sequences, we need to find the best alignment between them, that is, to place one sequence above the other so that the correspondence between similar characters of the sequences is made clear [21]. We define an alignment as the insertion of spaces in arbitrary locations along the sequences so that they end up with the same length.

Given a pairwise alignment between two sequences s and t, a score can be associated with it as follows. For each two bases in the same column, we associate, for instance, +1 if the two characters are identical (match), -1 if the characters are different (mismatch) and -2 if one of them is a space (gap). The score is the sum of the values computed for each column. The maximal score is called the similarity between the sequences.
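The column-wise scoring just described can be sketched in a few lines of Python. This is only an illustration: the function name `pairwise_score` and the use of "-" to denote a space are choices made here, while the values +1, -1 and -2 are the example values given above. The two rows scored are the first and third rows of Figure 1.

```python
def pairwise_score(s: str, t: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Score an already aligned pair of sequences column by column."""
    if len(s) != len(t):
        raise ValueError("aligned sequences must have the same length")
    total = 0
    for a, b in zip(s, t):
        if a == "-" or b == "-":   # one of the characters is a space (gap)
            total += gap
        elif a == b:               # identical characters (match)
            total += match
        else:                      # different characters (mismatch)
            total += mismatch
    return total


# 9 matches, 1 mismatch and 1 gap: 9 - 1 - 2
print(pairwise_score("GA-CGGATTAG", "GATCGGAATAG"))  # prints 6
```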

One of the first exact methods to globally compare two sequences was NW [17]. It is based on dynamic programming and calculates a similarity matrix of size m x n, where m and n are the sizes of the sequences. NW has time and space complexity O(mn). The NW algorithm was modified to deal with local alignments (SW)[22]. An algorithm based on SW that uses an affine gap function is proposed in [7].

    G  A  -  C  G  G  A  T  T  A  G
    G  T  -  C  G  G  -  T  T  A  -
    G  A  T  C  G  G  A  A  T  A  G

   +3 -1 -6 +3 +3 +3 -3 -1 +3 +3 -3  = 4

Fig. 1. Alignment of sequences s, v and t, with the SP score for each column

An MSA involves more than 2 sequences. In this case, the scoring function to be used is not straightforward. Often, MSAs are scored with the Sum-of-Pairs (SP) function, where every pair of bases in a column is scored with the pairwise scoring function and the score is the sum of all these values [9]. Figure 1 shows an example of an MSA and its score.
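As a quick check of the SP function, the sketch below recomputes the column scores of Figure 1. The function names are illustrative. One assumption is made explicit in the code: a column pair made of two spaces is also scored -2, which is what the -6 in the third column of Figure 1 implies.

```python
from itertools import combinations


def column_pair_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Pairwise score of two characters placed in the same column."""
    if a == "-" or b == "-":
        # Assumption: a pair of two spaces is scored as a gap as well,
        # consistent with the -6 of the third column in Figure 1.
        return gap
    return match if a == b else mismatch


def sp_column_scores(rows: list[str]) -> list[int]:
    """Sum-of-pairs score of every column of a multiple alignment."""
    return [
        sum(column_pair_score(a, b) for a, b in combinations(column, 2))
        for column in zip(*rows)
    ]


rows = ["GA-CGGATTAG", "GT-CGG-TTA-", "GATCGGAATAG"]
columns = sp_column_scores(rows)
print(columns)       # [3, -1, -6, 3, 3, 3, -3, -1, 3, 3, -3]
print(sum(columns))  # 4
```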

2.2 The DIALIGN Algorithm

DIALIGN (DIAGonal ALIGNment) [14] is a method for sequence alignment that can be used for either pairwise or multiple sequence alignment. This method searches for fragments (or diagonals) that have no gaps and aligns them. In DIALIGN, a pairwise alignment is defined to be a chain of fragments [13].

When applied to the MSA problem, DIALIGN is executed in three phases. In the first phase, all pairwise alignments are computed, i.e., there are nseq(nseq-1)/2 chains of fragments, one for each pairwise alignment, where nseq is the number of sequences [13]. In the second phase, the diagonals that compose the pairwise alignments are sorted by their weight and the degree of overlap with other diagonals. This sorted list is used to obtain a multiple alignment with a greedy algorithm, generating alignment Al. In the last phase, the alignment Al is completed with an iterative procedure where the parts of the sequences that are not yet aligned with Al are realigned by executing phase 2 again, in such a way that consistent non-aligned diagonals are included in Al [14]. This phase is repeated until no diagonal with a positive weight can be included in Al.

We now explain in detail the first phase, which is the core of this algorithm. For each pairwise alignment, it is necessary to calculate the relevance of each diagonal found before attempting to align it [13]. This is done by E(l,sm) = -ln(P(l,sm)), where P(l,sm) is the probability that a diagonal D of size l has at least sm matches.

For each candidate diagonal Di, a weight w(Di) is assigned as E(l,sm) if E(l,sm) is above a given threshold T and 0, otherwise.

When the algorithm obtains a new significant diagonal, it tries to align it consistently with other previously calculated significant diagonals [14]. In an alignment of k diagonals D1, D2, …, Dk the total score S is given by the addition of all weights w(Di), i=1 to k.
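The weight and the total score can be illustrated with a small sketch. The text does not spell out P(l,sm); the code below assumes a binomial tail in which every position matches independently with probability p = 0.25 (four equiprobable DNA bases). This probability model, the function names and the threshold used in the example are all assumptions made for illustration.

```python
import math


def diagonal_probability(l: int, sm: int, p: float = 0.25) -> float:
    """Assumed model: probability that a diagonal of size l has at least sm matches."""
    return sum(math.comb(l, k) * p**k * (1 - p) ** (l - k) for k in range(sm, l + 1))


def weight(l: int, sm: int, threshold: float, p: float = 0.25) -> float:
    """w(D) = E(l,sm) if E(l,sm) is above the threshold T, and 0 otherwise."""
    e = -math.log(diagonal_probability(l, sm, p))
    return e if e > threshold else 0.0


def chain_score(diagonals: list[tuple[int, int]], threshold: float) -> float:
    """Total score S: the sum of the weights of the diagonals (l, sm) in the chain."""
    return sum(weight(l, sm, threshold) for l, sm in diagonals)


# A 10-base diagonal with 10 matches is significant; a 4-base one with 2 matches is not.
print(chain_score([(10, 10), (4, 2)], threshold=5.0))
```

Under this model a perfect diagonal of size 10 has E = 10 ln 4, about 13.86, while the second diagonal falls below the threshold and contributes nothing.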

To compute the score S, a dynamic programming strategy is used. Consider two sequences A and B, of sizes m and n, respectively. For each pair (i,j), all integers k with k ≤ min(i,j) are determined for which the diagonal (a_{i-k}b_{j-k}, ..., a_i b_j), beginning at position (i-k,j-k) and ending at position (i,j), has a positive weight w. For each position (i,j), a score(i,j) is defined for the alignment of the prefixes (a_1 a_2 ... a_i) and (b_1 b_2 ... b_j).

The last diagonal Dk aligned in position (i,j) is recovered by the function prec(i,j) = Dk (formula 2). For each diagonal Dk aligned in position (i,j), prec(i,j) chooses the chain of diagonals with the greatest score so far. The score is calculated as in formula 1, where σ(Di,j) is defined as the score of the highest-scoring chain of diagonals that ends at point (i,j).

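\[
\mathrm{score}(i,j) = \max\{\mathrm{score}(i-1,j),\ \mathrm{score}(i,j-1),\ \sigma(D_{i,j})\} \tag{1}
\]

\[
\mathrm{prec}(i,j) =
\begin{cases}
\mathrm{prec}(i,j-1), & \text{if } \mathrm{score}(i,j) = \mathrm{score}(i,j-1) \\
\mathrm{prec}(i-1,j), & \text{if } \mathrm{score}(i,j-1) < \mathrm{score}(i,j) = \mathrm{score}(i-1,j) \\
D_{i,j}, & \text{if } \mathrm{score}(i,j-1),\ \mathrm{score}(i-1,j) < \mathrm{score}(i,j) = \sigma(D_{i,j})
\end{cases} \tag{2}
\]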

Two dynamic programming matrices are calculated: one for the scores (formula 1) and another for the preceding diagonals (prec in formula 2). Once these matrices are calculated, the reverse path on the prec matrix gives the alignment. One example of such an alignment is given in figure 2. In figure 2(a), the subsequences belonging to diagonals are shown in gray and the aligned diagonals are shown as lines. Figure 2(b) shows the final alignment.
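A direct software reading of formulas 1 and 2 is sketched below. It is a simplified illustration rather than the DIALIGN implementation: a diagonal is represented here as (start in A, start in B, length) with 0-based starts, σ is taken as the score of the prefixes before the diagonal plus its weight, and the weight function `toy_weight` (a gap-free run of at least three matches weighs its length) is a hypothetical stand-in for E(l,sm) and the threshold T.

```python
def chain_diagonals(a, b, weight):
    """Fill the score and prec matrices (formulas 1 and 2) and trace the chain back."""
    m, n = len(a), len(b)
    score = [[0] * (n + 1) for _ in range(m + 1)]
    prec = [[None] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sigma, best = 0, None      # best chain ending with a diagonal at (i, j)
            matches = 0
            for k in range(1, min(i, j) + 1):
                matches += a[i - k] == b[j - k]
                w = weight(k, matches)
                if w > 0 and score[i - k][j - k] + w > sigma:
                    sigma = score[i - k][j - k] + w
                    best = (i - k, j - k, k)
            # Formula 1
            score[i][j] = max(score[i - 1][j], score[i][j - 1], sigma)
            # Formula 2
            if score[i][j] == score[i][j - 1]:
                prec[i][j] = prec[i][j - 1]
            elif score[i][j] == score[i - 1][j]:
                prec[i][j] = prec[i - 1][j]
            else:
                prec[i][j] = best
    # Reverse path on the prec matrix
    chain, d = [], prec[m][n]
    while d is not None:
        chain.append(d)
        d = prec[d[0]][d[1]]
    return score[m][n], chain[::-1]


def toy_weight(length, matches):
    """Hypothetical weight: only gap-free runs of 3 or more matches count."""
    return length if matches == length and length >= 3 else 0


print(chain_diagonals("ACGTTT", "ACGAAATTT", toy_weight))
# (6, [(0, 0, 3), (3, 6, 3)])
```

The sketch enumerates every k for every cell, so it runs in O(mn min(m,n)) time; it is meant to make the recurrences concrete, not to be efficient.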

Fig. 2. Example of a pairwise DIALIGN alignment

DIALIGN-P [20] is a parallel version of DIALIGN that executes the first phase of the algorithm in parallel, with a strategy that tries to distribute the pairs of sequences to be compared evenly among the processors. An optimization called anchored alignment is introduced to reduce the execution time of each pairwise alignment. Nevertheless, this optimization potentially reduces the quality of the alignment produced [20]. A speedup of 19.32 was obtained on a 64-processor cluster when comparing 20 sequences.

3 Related Work

There are many proposals in the literature of FPGA-based architectures to accelerate pairwise sequence alignment applications [2,10,12,18] by calculating the similarity matrix antidiagonals in hardware. In this approach, each element is capable of calculating one matrix score per turn. Thus, an N-element array can generate N scores at a time.

Figure 3 shows how each anti-diagonal of the dynamic programming matrix is calculated in parallel by a 5-element systolic array. The query sequence (ACGAT) is previously stored in the elements of the array and the database sequence (CTTAG) flows through the systolic array. Each element calculates one cell in the current anti-diagonal (shown in gray in figure 3) at the same time.

Most of the hardware solutions do not store the entire similarity matrix, obtaining only the similarity score [2]. Besides that, only a limited number of computing elements can be put in the systolic array. To deal with this, the shorter of the two sequences being aligned is often stored in the computing elements as the query sequence. The other sequence can be of any size, since it “passes” through the FPGA (figure 3).

Fig. 3. Generic systolic array to calculate the similarity matrix (figure labels: 5-element systolic array; antidiagonals calculated)
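The anti-diagonal schedule of such an array can be simulated in software. In the sketch below, which uses illustrative names, element p holds query[p] and the database character q reaches it at clock tick p + q, so all cells computed in the same tick lie on one anti-diagonal.

```python
def wavefront_schedule(query: str, database: str):
    """Yield, for every clock tick, the (element, database index) cells computed in parallel."""
    n, m = len(query), len(database)
    for tick in range(n + m - 1):
        # Element p works on database[tick - p], if that character has already
        # entered the array and has not yet left it.
        yield [(p, tick - p) for p in range(n) if 0 <= tick - p < m]


ticks = list(wavefront_schedule("ACGAT", "CTTAG"))
print(len(ticks))  # 9 ticks for a 5 x 5 matrix
print(ticks[0])    # [(0, 0)]
print(ticks[4])    # [(0, 4), (1, 3), (2, 2), (3, 1), (4, 0)]
```

With the sequences of figure 3, the 25 cells are covered in 9 ticks, and all five elements are busy only on the middle anti-diagonal.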

Frequently, the query sequence is longer than the number of computing elements contained in the FPGA. In this case, a partitioning technique is used. To break up query sequences, it is necessary to keep some scores on board so that new scores can be calculated. Some designs avoid this problem by putting many query bases in the same computing element. The drawback is that this requires more registers per element and thus decreases the maximum number of computing elements in the systolic array.

As an alternative, dynamic reconfiguration can be used. In this case, the first part of the query sequence is put directly in the processing elements using the dynamic reconfiguration capability. After that, the FPGA is reconfigured to contain the next part of the query sequence and the database sequence passes again through the FPGA. This procedure continues until the last part of the query sequence is processed.

A drawback of this approach is that reconfiguration time normally takes a few milliseconds.
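The partitioning by reconfiguration can be sketched as follows. The helper names are illustrative, and two things are assumptions: that every pass, including the first, costs one reconfiguration, and the 2 ms reconfiguration time used in the example, which only stands for "a few milliseconds".

```python
def split_query(query: str, n_elements: int) -> list[str]:
    """Break the query into consecutive parts that each fit in the systolic array."""
    return [query[k:k + n_elements] for k in range(0, len(query), n_elements)]


def reconfiguration_overhead(query_len: int, n_elements: int, reconfig_time_s: float):
    """Number of passes of the database sequence and total reconfiguration time."""
    passes = -(-query_len // n_elements)   # ceiling division
    # Assumption: one reconfiguration per pass, the first one included.
    return passes, passes * reconfig_time_s


parts = split_query("A" * 450, n_elements=200)
print([len(p) for p in parts])                       # [200, 200, 50]
print(reconfiguration_overhead(450, 200, 0.002)[0])  # 3
```

The database sequence is streamed once per part, so the overhead grows with the number of parts and not with the length of the database sequence.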

Table 1 presents some hardware approaches to accelerate biological sequence comparison applications. Most of these accelerator proposals tackle the pairwise sequence comparison problem. The only ones that deal with multiple sequence alignment [11,19] accelerate the most compute intensive phase of Clustal-W [24], which does NW[17]-based pairwise sequence alignment among all sequences. Most of the proposals do query sequence splitting either by using reconfiguration or storing many characters at the same systolic cell. The speedups obtained range from 5.6 to 246.9 over the software implementation. Finally, all proposals implement in hardware variations of the NW or SW with constant gap functions [17,22] or affine gap functions [7]. As far as we know, there is no proposal of hardware accelerator for DIALIGN.

Table 1. Comparative Analysis of the Hardware Accelerator Proposals

Paper                 Alignment Algorithm                   Alignment Problem  Seq. Split  Speedup
Oliver et al. [18]    Smith-Waterman [22] / Gotoh [7]       Pairwise           Yes         170 / 125
Lavenier [10]         Smith-Waterman [22]                   Pairwise           Yes         83
Marongiu et al. [12]  Smith-Waterman [22]                   Pairwise           No          5.6
Anish [1]             Gotoh [7]                             Pairwise           Yes         170
Boukerche et al. [2]  Smith-Waterman [22]                   Pairwise           Yes         246.9
Oliver et al. [19]    Needleman-Wunsch [17] / Gotoh [7]     Multiple           Yes         50.9
Lin et al. [11]       Needleman-Wunsch [17]                 Multiple           Yes         34

4 Design of a Reconfigurable Architecture for DIALIGN

As discussed in section 2.2, the most compute intensive phase of DIALIGN is the first one [20], which calculates pairwise alignments among all sequences. These alignments are independent from each other and, therefore, very suitable for hardware parallelization.

Like most of the previous works (section 3), we parallelize the antidiagonal calculation of the dynamic programming matrix using a systolic array (figure 3). However, since the recurrence relations of DIALIGN (formulae 1 and 2) are different from the ones in NW and SW, an entirely distinct design must be made for each systolic element.

The goal of our architecture is to find the best DIALIGN score and its position. To do that, the following modifications were applied. First, we set sm=l in the probability calculation. Second, the ln logarithm (section 2.2) was replaced by a base 2 logarithm.
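One consequence of these two modifications can be worked out directly. The derivation below assumes, which the text does not state, that each position of a diagonal matches independently with probability p, so that P(l,l) = p^l:

```latex
\begin{align*}
E(l,l) &= -\log_2 P(l,l) = -\log_2 p^{\,l} = l \,\log_2\frac{1}{p}, \\
E(l+1,l+1) - E(l,l) &= \log_2\frac{1}{p}.
\end{align*}
```

Under this assumption the weight grows by the same constant for every additional matching base (2 bits per base if p = 1/4), so extending a diagonal would require only an addition and no logarithm evaluation.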

Our linear systolic array calculates the antidiagonals as shown in figure 4, using as a basis the generic systolic array (figure 3). In figure 4, the scores already calculated are shown in gray. The border between the gray and white parts shows the antidiagonal being calculated. Diagonals with a weight greater than the threshold T are shown in black. For a diagonal that ends in position (i,j), the architecture decides whether it will be extended or ended and, in the latter case, whether or not it can be consistently aligned with other diagonals (section 2.2).

Fig. 4. Dynamic programming matrix calculation

An architecture that performs DIALIGN must contain formulae 1 and 2. To improve the performance, diagonal finding and diagonal alignment can be done simultaneously in the systolic vector. The algorithm for each systolic element is shown in figure 5.

    calculate_recurrence_systolic (db_pair, prec(i, j-1), score(i, j-1), i)
    begin
      prec(i,j) = find_prec(prec,score);
      if (match(i,j))
        D(i,j) = extend_current_diagonal();
      else
        if (w(D(i,j)) < T)
          discard (D(i,j));
        else
          if (consistent (D(i,j), prec(i-1,j), prec(i, j-1)))
            prec(i,j) = D(i,j);
            score(i,j) = σ(D(i,j));
          else
            if (w(D(i,j)) > prec(i-1,j) and w(D(i,j)) > prec(i,j-1))
              prec(i,j) = D(i,j);
              score(i,j) = w(D(i,j));
            endif
          endif
        endif
      endif
      best_diagonal_systolic = prec(i,j);
      best_score_systolic = score(i,j);
      send_to_next_systolic(prec(i,j),score(i,j),w(D(i,j)),flags);
    end

Fig. 5. Algorithm executed in each systolic element

Figure 6 shows the systolic array diagram. The database sequence base pairs are input on the left side and the scores and their respective positions are output on the right side of the circuit. A handshake protocol is included to transfer scores and positions between the elements (blocks marked as “I” (input) and “O” (output)). Clk stands for the clock and Rst for reset signal. The DIALIGN recurrence relations are processed in the DAC (Diagonal Alignment Circuit) block.

Fig. 6. Systolic Array Design

Fig. 7. Diagonal Alignment Circuit (DAC). R: register file; M: multiplexers

Figure 7 shows the DAC element. The register bank (R) contains values used in recurrence relations. They are selected by a network of multiplexers (M) to enter the “Recurrence Module”. The results are stored in registers by another set of multiplexers. The control part is done by the “Control Module”.

Figure 8 shows the recurrence module circuit. The inputs (In1 to In15) and the multiplexer control lines (C1 to C9) are on the left side, and the outputs (Out1 to Out6) are on the right side. This circuit is used many times to compute all the relations. The adder (+), In15 and C9 are used to extend the weight w(D) of the current diagonal D by 1 when a match happens. In15, In14 and “+” are used to calculate the sum of scores σ(D). The comparator “=” verifies whether the bases are equal and whether some flag values are equal to zero or one (In5 to In8 depend on C4 and C5, giving Out3). The comparator “>” decides whether w(D) is above T and is also used to find score(i,j) (formula 1). The recurrence relation in formula 2 is implemented by “>”, “=” and “&”. The first line in formula 2 is computed by the “=” comparator. The second and third are translated to “>”, “=” and “&” by the expression (score(i,j) > score(i,j-1)) & (score(i,j) = score(i-1,j)).

Fig. 8. Recurrence Module (inputs In1 to In15, control lines C1 to C9, outputs Out1 to Out6; operators “>”, “=”, “&”, “|” and “+”)

To eliminate the current diagonal Di,j if it is inconsistent, we must test whether the ending position of the previously aligned diagonal is greater than the starting point of the current diagonal. To calculate this, an OR (“|”) and two “>” are used. If Di,j is inconsistent with prec(i-1,j) or it is inconsistent with prec(i,j-1), then Di,j is inconsistent.
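The consistency test can be written down as a short sketch. The representation is an assumption made for illustration: a diagonal is stored as its start in each sequence plus its length, the ending position is taken as one past its last base, and the two “>” comparisons are read as one per sequence. The names `Diagonal`, `conflicts` and `is_consistent` are hypothetical.

```python
from typing import NamedTuple, Optional


class Diagonal(NamedTuple):
    start_i: int   # start in the first sequence (0-based)
    start_j: int   # start in the second sequence (0-based)
    length: int


def conflicts(prev: Optional[Diagonal], cur: Diagonal) -> bool:
    """True if prev ends after cur starts in either sequence (two '>' and one OR)."""
    if prev is None:
        return False
    end_i = prev.start_i + prev.length   # one past the last base of prev
    end_j = prev.start_j + prev.length
    return end_i > cur.start_i or end_j > cur.start_j


def is_consistent(cur: Diagonal, prec_up: Optional[Diagonal], prec_left: Optional[Diagonal]) -> bool:
    """cur is inconsistent if it conflicts with prec(i-1,j) or with prec(i,j-1)."""
    return not (conflicts(prec_up, cur) or conflicts(prec_left, cur))


print(is_consistent(Diagonal(3, 6, 3), Diagonal(0, 0, 3), None))  # True
print(is_consistent(Diagonal(2, 6, 3), Diagonal(0, 0, 3), None))  # False
```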

We also designed a partition method based on dynamic reconfiguration to compare query sequences which have more bases than the FPGA systolic elements (section 3).

5 Experimental Results

Our proposed architecture was designed in SystemC [23] and was translated to Verilog with the FORTE tool [6]. It was then synthesized for an FPGA Altera STRATIX 2 EP2S180F1508I4 using QUARTUS II. Our 200-element prototype runs at 74.48 MHz.

In order to measure the speedup of our architecture, we implemented DIALIGN as an optimized C program. We used the C program and our prototype to compare 2 pairs of real DNA sequences retrieved from the NCBI site. The sequences compared were from the fungus Aspergillus niger, contig An18c0160 (AM270408) and contig An16c0230 (AM270375), and from Encephalitozoon cuniculi (AL590443), with sizes 121589bp, 169786bp and 194439bp, respectively. The wallclock time to compare the first two sequences with the optimized C program running on a 3 GHz Pentium 4 with 512 MB was 6812 seconds, and the FPGA took 17.9 seconds (wallclock time), achieving a speedup of 380.56. The comparison between the second and third sequences took 11053.70 seconds in software and 28.83 seconds in our FPGA prototype, leading to a speedup of 383.41. Note that the wallclock times include neither data transfer times (FPGA prototype) nor disk read operations (software implementation).

We also measured the time needed to reconfigure the FPGA when the query sequence is longer than 200 bases. For this test, we used two variants of anellovirus from NCBI (sequences NC_009225 and AB290918, with sizes 3245bp and 3242bp, respectively). The comparison done by the FPGA took 0.01s and the software comparison took 3.48s, achieving a speedup of 348. The time needed to reconfigure the systolic array was 0.0008s. Finally, we simulated a database search of a 200bp sequence on a 10Mbp synthetic genomic database.

As presented in table 2, the speedup achieved was between 343.03 and 383.41 when compared with the software implementation.

The pairwise stage of multiple alignment with DIALIGN was performed with 4 variants of Human Adenovirus. The DNA sequences used were NC_004001, NC_001405, NC_002067 and NC_003266 with sizes of 34794bp, 35937bp, 35100bp and 35994bp respectively. Each cell in table 3 shows the time in seconds for a given pairwise alignment for both the FPGA architecture and the software implementation.

Table 2. Speedups Achieved by Our Architecture

Query Seq size  Database Seq size  Time FPGA (s)  Time software (s)  Speedup
169,786         194,439            28.83          11,053.70          383.41
121,589         169,786            17.9           6812.00            380.55
3245            3242               0.01           3.48               348.00
200             10,000,000         1.74           661.39             343.03

Table 3. Pairwise Alignment Times for the FPGA and the software implementation

Time (s) FPGA / Software
Sequences  34794bp        35937bp        35100bp
35937bp    1.09 / 415.31  ---            ---
35100bp    1.07 / 406.81  1.11 / 419.35  ---
35994bp    1.10 / 416.33  1.14 / 431.42  1.11 / 420.12

In table 3, six comparisons were made. The total times for the software alignment and the FPGA prototype were 2509.34 seconds and 6.62 seconds, respectively. The speedup achieved was 379.05.
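These totals can be rechecked with a few lines of Python. The sketch takes the times from table 3 and the sequence sizes from the text; it also confirms that four sequences give nseq(nseq-1)/2 = 6 pairwise alignments, as stated in section 2.2. The variable names are illustrative.

```python
from itertools import combinations

sizes = [34794, 35937, 35100, 35994]   # the four Human Adenovirus sequences (bp)

# (FPGA time, software time) in seconds for every pair, as listed in table 3
times = {
    (34794, 35937): (1.09, 415.31),
    (34794, 35100): (1.07, 406.81),
    (34794, 35994): (1.10, 416.33),
    (35937, 35100): (1.11, 419.35),
    (35937, 35994): (1.14, 431.42),
    (35100, 35994): (1.11, 420.12),
}

pairs = list(combinations(sizes, 2))
fpga_total = sum(fpga for fpga, _ in times.values())
software_total = sum(software for _, software in times.values())

print(len(pairs))                                # 6
print(round(fpga_total, 2))                      # 6.62
print(round(software_total, 2))                  # 2509.34
print(round(software_total / fpga_total, 2))     # 379.05
```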

The FPGA STRATIX 2 EP2S180F1508I4 has an estimated price of $10,688. Compared with a 3 GHz Pentium 4 costing $1000, the price/performance ratio is 10688/383.41 = 27.87 for the FPGA against 1000/1 = 1000 for the Pentium. The FPGA’s price/performance ratio is therefore 35.88 times lower than the Pentium’s.

6 Conclusions and Future Work

In this paper, we proposed and evaluated a new hardware architecture for multiple sequence alignment. Our architecture was designed to accelerate the pairwise step of DIALIGN, which is the most compute-intensive part of this algorithm for multiple sequence alignment. The proposed architecture was designed to handle large sequences by splitting the query sequence into blocks of 200. It was then successfully synthesized for an Altera FPGA STRATIX 2 EP2S180F1508I4.

For real DNA sequences of sizes 169 Kbp and 194 Kbp, we obtained a speedup of 383.41 against an optimized C implementation, indicating that the architecture can be very useful to accelerate the multiple sequence alignment problem. The speedups achieved with sequences of very different sizes were between 343 and 383, which indicates that the speedup is not very dependent on the size of the sequences. As future work, we intend to integrate our architecture, which implements the first phase of DIALIGN, with a software algorithm that implements phases 2 and 3, leading to an integrated hardware/software approach. Also, we intend to investigate whether the iterative phase of the algorithm (phase 3) can be implemented partially or fully in an FPGA.

References

1. Anish, A.: Hardware Accelerated Protein Identification, MSc Thesis, Univ. Toronto (2003)

2. Boukerche, A., et al.: Reconfigurable Architecture for Biological Sequence Comparison in Reduced Memory Space. In: IEEE IPDPS/NIDISC (2007)

3. Boukerche, A., et al.: Parallel Strategies for the Local Biological Sequence Alignment in a Cluster of Workstations. Journal of Parallel and Distributed Computing 67, 170–185 (2007)

4. Carrillo, H., Lipman, D.: The Multiple Sequence Alignment Problem. SIAM Journal of Applied Math. 48, 1073–1082 (1988)

5. Durbin, R., Eddy, S., Krogh, A., Mitchison, G.: Biological Sequence Analysis, p. 356. Cambridge Univ Press, Cambridge (1998)

6. Forte Design Systems, Cynthesizer User’s Guide For Cynthesizer 2.4.0. (2005)

7. Gotoh, O.: An improved algorithm for matching biological sequences. Journal of Molecular Biology 162, 705–708 (1982)

8. Gotoh, O.: Significant Improvement in Accuracy of Multiple Protein Sequence Alignments by Iterative Refinements as Assessed by Reference to Structural Alignments. J. Mol. Biol. 264, 823–838 (1996)

9. Gusfield, D.: Algorithms on Strings, Trees and Sequences, p. 534. Cambridge Univ Press, Cambridge (1997)

10. Lavenier, D.: Speeding up genome computations with a systolic accelerator. SIAM news 31 (1998)

11. Lin, X., Peiheng, Z., Dongbo, B., Shengzhong, F., Ninghui, S.: To Accelerate Multiple Sequence Alignment using FPGAs. In: HPCASIA (2005)

12. Marongiu, A., Palazzari, P., Rosato, V.: Designing Hardware for Protein Sequence Alignment. Bioinformatics 19, 1739–1740 (2003)

13. Morgenstern, B., et al.: Multiple DNA and protein sequence alignment based on segment-to-segment comparison. In: Proc. Natl Acad. Sci., USA, pp. 12098–12103 (1996)

14. Morgenstern, B., et al.: DIALIGN: finding local similarities by multiple sequence alignment. Bioinformatics (1998)

15. Mount, D.: Bioinformatics: Sequence and Genome Analysis. C. S. Harbor Lab Press (2004)

16. Notredame, C., Higgins, D., Heringa, J.: T-Coffee: A novel method for fast and accurate multiple sequence alignment. J. Mol. Biol. 302, 205–217 (2000)

17. Needleman, S., Wunsch, C.: A general method applicable to the search for similarities in the amino acid sequence of two proteins. J. Mol. Biol. 48, 443–453 (1970)

18. Oliver, T., Schmidt, B., Maskell, D.: Reconfigurable Architectures for Bio-sequence Database Scanning on FPGAs. IEEE Transactions on Circuits and Systems II 52(12), 851–855 (2005)

19. Oliver, T., Schmidt, B., Nathan, D., Clemens, R., Maskell, D.: Using Reconfigurable Hardware to Accelerate Multiple Sequence Alignment with ClustalW. Bioinformatics 21, 3431–3432 (2005)

20. Schmollinger, M., Nieselt, K., Kaufmann, M., Morgenstern, B.: DIALIGN P: Fast pair-wise and multiple sequence alignment using parallel processors. BMC Bioinformatics (2004)

21. Setubal, J., Meidanis, J.: Introduction to Computational Molecular Biology. PWS Publishing Company, Boston (1997)

22. Smith, T., Waterman, M.: Identification of common molecular sub-sequences. J. Mol. Biology 147, 195–197 (1981)

23. Open SystemC Initiative Draft, Standard SystemC Language Reference Manual (2005)

24. Thompson, J., Higgins, D., Gibson, T.: Clustal W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting. Nucleic Ac. Res. 22, 4673–4680 (1994)

25. Wang, T., Jiang, T.: On the Complexity of the Multiple Sequence Alignment. J. Comp. Biol. 1, 337–348 (1994)