Sequence Alignment with Traceback on Reconfigurable Hardware

Authors: Scott Lloyd and Quinn O. Snell
Publisher/Conf: 2008 International Conference on Reconfigurable Computing and FPGAs
Speaker: De-yu-Chen
Date: 2010.5.5



Most efforts to accelerate bio-sequence applications with hardware have focused on database searches. Given a query sequence, an entire genetic database is scanned looking for other sequences that are similar.

Accelerating a database search is a simpler problem than accelerating alignment. A search needs only the comparison score, which the hardware computes in the forward scan, whereas alignment requires a traceback in addition to the forward scan.

The sequence comparison problem can be mapped to a linear systolic array of Processing Elements (PEs) requiring O(min(m, n)) space.
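The O(min(m, n)) space bound can be illustrated in software with a score-only scan that keeps a single row of the DP matrix. This is a minimal sketch, not the systolic-array design itself: the function name `nw_score` and the scoring values (match +1, mismatch -1, gap -1) are assumptions made for illustration and do not come from the slides.

```python
def nw_score(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment score only, keeping one row of the DP matrix.

    Scoring values are illustrative assumptions, not taken from the slides.
    """
    # Put the shorter sequence along the row so memory is O(min(m, n)).
    if len(b) > len(a):
        a, b = b, a
    row = [j * gap for j in range(len(b) + 1)]  # H[0][j]
    for i in range(1, len(a) + 1):
        diag = row[0]        # H[i-1][0]
        row[0] = i * gap     # H[i][0]
        for j in range(1, len(b) + 1):
            above = row[j]   # H[i-1][j], about to be overwritten
            sub = match if a[i - 1] == b[j - 1] else mismatch
            row[j] = max(diag + sub, above + gap, row[j - 1] + gap)
            diag = above     # becomes H[i-1][j] for the next column
    return row[-1]


if __name__ == "__main__":
    print(nw_score("GCCCTAGCG", "GCGCAATG"))
```

Because only the score survives, this scan alone cannot recover the alignment, which is why traceback needs extra stored state.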


Unlike most acceleration methods, which focus on sequence comparison, this research describes and evaluates a space-efficient global sequence alignment algorithm and architecture that includes traceback, for implementation on reconfigurable hardware.

The algorithm is based on dynamic programming (the Needleman-Wunsch, or NW, algorithm), but it partitions the problem into slices for the FPGA hardware, since sequence lengths are often longer than the number of PEs available in a systolic array.
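The partitioning can be pictured as cutting the columns of the DP matrix into consecutive slices no wider than the PE array. The sketch below is only an illustration: `slice_bounds` and `designated_columns` are hypothetical helper names, and it assumes 1-based column indices with one column per PE.

```python
def slice_bounds(n, num_pes):
    """Split columns 1..n into consecutive slices of at most num_pes columns.

    Returns a list of (first_column, last_column) pairs, 1-based inclusive.
    """
    return [(lo, min(lo + num_pes - 1, n)) for lo in range(1, n + 1, num_pes)]


def designated_columns(n, num_pes):
    """Right-most column of every slice."""
    return [hi for _, hi in slice_bounds(n, num_pes)]


if __name__ == "__main__":
    # 8 columns on a 4-PE array give two slices.
    print(slice_bounds(8, 4))
    print(designated_columns(8, 4))
```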


Let H denote the DP score matrix. The matrix is filled in a scan from upper left to lower right, because each element depends on its neighboring elements (diagonal, above, and left).


During the forward scan, a pointer p ∈ {DIAG, ABOVE, LEFT} indicates the current selection of the MAX function in Equation 1.

The value of p is saved to the traceback matrix T, thus T[i, j] = p. Following the forward scan, traceback proceeds from T[m, n] to T[0, 0], thereby determining the best alignment. The result is a list of edit operations e ∈ {SUBSTITUTE, INSERT, DELETE}.
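The forward scan and traceback described above can be sketched in a few lines of Python. This is an illustrative software version only: the names `scan_full` and `trace_full` are hypothetical Python counterparts, the scoring values are assumed (match +1, mismatch -1, gap -1), ties are assumed to prefer DIAG, and the mapping ABOVE → DELETE, LEFT → INSERT is also an assumption, since the slides do not state it.

```python
DIAG, ABOVE, LEFT = "DIAG", "ABOVE", "LEFT"


def scan_full(a, b, match=1, mismatch=-1, gap=-1):
    """Forward scan: fill H and record the MAX selection in T."""
    m, n = len(a), len(b)
    H = [[0] * (n + 1) for _ in range(m + 1)]
    T = [[None] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):          # first column is reachable only from ABOVE
        H[i][0], T[i][0] = i * gap, ABOVE
    for j in range(1, n + 1):          # first row is reachable only from the LEFT
        H[0][j], T[0][j] = j * gap, LEFT
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            candidates = [
                (H[i - 1][j - 1] + sub, DIAG),
                (H[i - 1][j] + gap, ABOVE),
                (H[i][j - 1] + gap, LEFT),
            ]
            # max() returns the first maximal entry, so ties prefer DIAG.
            H[i][j], T[i][j] = max(candidates, key=lambda c: c[0])
    return H[m][n], T


def trace_full(T, m, n):
    """Traceback from T[m][n] to T[0][0], returning edit operations in order."""
    ops = []
    i, j = m, n
    while i > 0 or j > 0:
        p = T[i][j]
        if p == DIAG:
            ops.append("SUBSTITUTE")   # covers both match and mismatch
            i, j = i - 1, j - 1
        elif p == ABOVE:
            ops.append("DELETE")       # assumed mapping: symbol of A has no partner
            i -= 1
        else:
            ops.append("INSERT")       # assumed mapping: symbol of B has no partner
            j -= 1
    ops.reverse()
    return ops


if __name__ == "__main__":
    s1, s2 = "GCCCTAGCG", "GCGCAATG"
    score, T = scan_full(s1, s2)
    print(score, trace_full(T, len(s1), len(s2)))
```

Storing all of T costs O(mn) memory, which is the cost the partitioned scan is designed to avoid.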


The forward scan consists of two fundamental scan procedures: ScanFull and ScanPartial.

1. ScanFull(A, B, x, y, T): The ScanFull procedure does not partition the DP matrix and produces a full matrix T of traceback pointers that refer to adjacent elements of H. In other words, ScanFull runs the forward scan of the NW algorithm.


2. ScanPartial(A, B, x, y, R): ScanPartial computes the score matrix H and the row pointer matrix R. Given that p indicates the heritage of element H[i, j], the following recurrences for 1 ≤ i ≤ m and 1 ≤ j ≤ n determine R.


Example: S1=GCCCTAGCG S2=GCGCAATG


R matrix

Only the designated columns of R, which correspond to the right-most column of each slice, are actually stored. (The H matrix is not stored.)
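The recurrences for R did not survive in this transcript, so the following is only one plausible reading, stated as an assumption: R[i, j] records the row at which the best path to H[i, j] leaves the previous designated column, and each element inherits that row from whichever neighbor p selects. The sketch keeps just two columns of H and R at a time and stores R only at designated columns. The function name `scan_partial`, the `num_pes` parameter, and the scoring values are hypothetical.

```python
def scan_partial(a, b, num_pes, match=1, mismatch=-1, gap=-1):
    """Forward scan that stores row pointers only at designated columns.

    Assumed meaning of R[j][i]: the row in the previous designated column
    (or in column 0) through which the best path to H[i][j] passes.
    Returns (alignment score, {designated column: list of row pointers}).
    """
    m, n = len(a), len(b)
    H_prev = [i * gap for i in range(m + 1)]  # column 0 of H
    R_prev = list(range(m + 1))               # a boundary column points at itself
    R = {}
    for j in range(1, n + 1):
        H_cur = [j * gap] + [0] * m
        R_cur = [R_prev[0]] + [0] * m         # row 0 is always reached from the LEFT
        for i in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            candidates = [
                (H_prev[i - 1] + sub, R_prev[i - 1]),  # DIAG
                (H_cur[i - 1] + gap, R_cur[i - 1]),    # ABOVE
                (H_prev[i] + gap, R_prev[i]),          # LEFT
            ]
            # Inherit the row pointer of the selected neighbor (ties prefer DIAG).
            H_cur[i], R_cur[i] = max(candidates, key=lambda c: c[0])
        if j % num_pes == 0 or j == n:        # right-most column of a slice
            R[j] = R_cur                      # only these columns are kept
            R_prev = list(range(m + 1))       # the next slice starts from here
        else:
            R_prev = R_cur
        H_prev = H_cur                        # older columns of H are discarded
    return H_prev[m], R


if __name__ == "__main__":
    score, R = scan_partial("GCCCTAGCG", "GCGCAATG", num_pes=4)
    print(score, sorted(R))
```

Under this reading, a traceback can jump from a designated column to the recorded row in the previous one and rerun a full scan on just that block, so memory for pointers grows with the number of slices rather than with the full matrix.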


3. TraceFull(A, B, x, y, T, E): The TraceFull procedure performs the traceback that produces the alignment. T is the traceback pointer matrix and E is the list of edit operations (SUBSTITUTE, INSERT, DELETE).

4. TracePartial(A, B, x, y, R, E)

Example: S1=GCCCTAGCG S2=GCGCAATG Suppose PEs=4.

Step 1: Call ScanPartial(S1, S2, 9, 8, R):

Designated columns

Step 2: Call TracePartial(S1, S2, 9, 8, R, E):


Step 2: Call TracePartial(S1, S2, 8, 7, R, E):

Comparison with the NW algorithm:


The global alignment accelerator is implemented using Qnet [9], an open-source packet-switched network architecture similar to DIMEtalk [18].


Three global alignment implementations are tested in the evaluation:

1) A software-only version of the algorithm presented in this paper. (NW algorithm)
2) A version accelerated by the FPGA.
3) The Myers-Miller global alignment algorithm, for an additional point of reference.

Seq-Gen [1] produced varying lengths of test sequences ranging from 128 to 16383 symbols for the evaluation.
