NEEDLEMAN-WUNSCH AND SMITH-WATERMAN IMPLEMENTATION FOR SPAM/UCE INLINE FILTER

CHIEW MING THONG

SUBMISSION OF DISSERTATION FOR THE FULFILLMENT OF THE DEGREE OF MASTER OF COMPUTER SCIENCE

FACULTY OF COMPUTER SCIENCE AND INFORMATION TECHNOLOGY

UNIVERSITY OF MALAYA

KUALA LUMPUR

2011

Abstract

Spam is a significant problem: it consumes internet bandwidth, wastes internet users' time, wastes the computational resources of internet service providers and reduces the efficiency of email as a means of communication. Despite the various anti-spam solutions that have been introduced, spam mails are often able to avoid detection by slightly modifying their spam signature. This prevents anti-spam solutions from detecting the keywords in emails that are closely associated with spam. Two algorithms, Needleman-Wunsch and Smith-Waterman, will be implemented on an FPGA as spam detection engines. Both algorithms originate from the theory of dynamic programming and are normally used in bioinformatics for sequence alignment. As both are well known for their ability to detect sequences with slight changes caused by mutation, they will be used to detect spam messages that slightly change their spam keywords. An FPGA has been selected as the implementation device. As hardware is faster than software, using an FPGA helps to reduce the scanning time and the CPU load of the computer. Advances in FPGA technology also make it capable of serving as a standalone scanning unit. The effectiveness of both algorithms in spam scanning will be examined. The corpus from the Text Retrieval Conference (TREC 2007) will be used to test the effectiveness of the anti-spam engines.

Acknowledgement

I would like to extend my special thanks to Mr. Emran bin Mohd Tamil for his supervision and support during this project. Thanks also go to Mr. Mohd Yamani Idna Idris, Mr. Zaidi Razak and Mr. Noorzaily Mohamed Noor for their continuous guidance and valuable advice. Thanks to Mr. Farid from Symmid for sharing his Xilinx knowledge and teaching, and to everybody from the System On Chip development group. Last but not least, thanks to my beloved family for their patience and support during this research.

Table of Contents

Abstract
Acknowledgement
Table of Contents
List of Figures
List of Tables

Chapter 1 Introduction
    1.1 Background
    1.2 Motivation and Purposes
    1.3 Research Objective
    1.4 Thesis outline

Chapter 2 Literature Review and Technology Background
    2.1 What is spam?
    2.2 The history of spam
    2.3 Privacy
    2.4 Spam Prevention
    2.5 Rules and Regulations
    2.6 CAN-SPAM Act
    2.7 Ways to prevent spam
    2.8 Combined Solutions
    2.9 The Challenge of Botnets
    2.10 The Effectiveness of Spam Filtering
    2.11 Bayesian Poisoning
    2.12 Needleman-Wunsch Algorithm
    2.13 Smith-Waterman Algorithm
    2.14 Global Alignment and Local Alignment
    2.15 Processing Time Issue
    2.16 Heuristics Based Algorithms
    2.17 The Selection of Algorithm
    2.18 Calculations of Needleman-Wunsch Algorithm
    2.19 Calculations of Smith-Waterman Algorithm
    2.20 FPGA
    2.21 FPGA Platforms
    2.22 Future Demand
    2.23 Parallelism

Chapter 3 Design Methodology
    3.1 Flow Diagram
    3.2 Overall Architecture
        3.2.1 Crossover Ethernet Cable
        3.2.2 Null Modem cable
        3.2.3 JTAG Cable
    3.3 Microblaze
    3.4 Microblaze Hardware Design
    3.5 Microblaze Software Design
    3.6 Software and Hardware Development Applications
        3.6.1 Xilinx ISE Design Suite 10.1
        3.6.2 Xilinx ISE 10.1 (Xilinx Integrated System Environment)
        3.6.3 Xilinx XPS 10.1 (Xilinx Platform Studio)
        3.6.4 Xilinx XPS SDK 10.1 (Xilinx Platform Studio Software Development Kit)
        3.6.5 Xilinx ISE Simulator
        3.6.6 Xilinx Chipscope Pro
    3.7 Other Applications Used
        3.7.1 Hyper Terminal
        3.7.2 Bray Terminal v1.9b
        3.7.3 Simple TCP Client
    3.8 Programming Languages Used
        3.8.1 VHDL
        3.8.2 C Programming
    3.9 Development Board

Chapter 4 Design and Implementation
    4.1 Fast Simplex Link (FSL)
    4.2 IP Design
        4.2.1 FSL side_relay
        4.2.2 Needleman-Wunsch and Smith-Waterman Hardware
            4.2.2.1 Hardware test_nw and test_sw
            4.2.2.2 Hardware array_proc
            4.2.2.3 Hardware processing_element
    4.3 Parallelism

Chapter 5 Results, Simulations, Analysis, and Testing
    5.1 Development of side_relay Unit
    5.2 Developing the Needleman-Wunsch Algorithm IP Unit
    5.3 Developing the Smith-Waterman Algorithm IP Unit
    5.4 Overall Microblaze with Needleman-Wunsch IP and Microblaze with Smith-Waterman IP
    5.5 Post-Route Simulation
    5.6 Hardware Timing Diagram
    5.7 Processing Element of Needleman-Wunsch Post-Route Simulation
    5.8 Processing Element of Smith-Waterman Post-Route Simulation
    5.9 Microblaze FSL side_relay unit Post-Route Simulation
    5.10 Hardware Data Sampling
    5.11 Xilinx Chipscope Pro Sampling of FSL Interface
    5.12 Spam Mail Testing
        5.12.1 Testing Criteria
        5.12.2 Results

Chapter 6 Conclusion
    6.1 Contributions
    6.2 Further Improvements and Future Works

Reference
Appendix A
Appendix B
Appendix C

List of Figures

Figure 2.1 The category of anti-spam solutions (Hunt & Carpinter 2006)
Figure 2.2 Model Of Email Delivery. Source from (Hoanca 2006)
Figure 2.3 The computed score table of Needleman-Wunsch
Figure 2.4 The calculation of a cell in the matrix table of Needleman-Wunsch algorithm
Figure 2.5 The completed calculations of the matrix table of Needleman-Wunsch algorithm
Figure 2.6 The traceback being performed on the matrix table of Needleman-Wunsch algorithm
Figure 2.7 The score table of Smith-Waterman algorithm
Figure 2.8 The computed matrix table of Smith-Waterman algorithm
Figure 2.9 The traceback being performed on the computed matrix table of Smith-Waterman algorithm
Figure 3.1 The stages involved in designing the inline filter
Figure 3.2 Illustrates the operations of the FPGA systems.
Figure 3.3 The overall design of the architecture
Figure 3.4 The design of the Microblaze and how it interconnects with the inline filter.
Figure 3.5 The flow of the software in Microblaze once the design is started.
Figure 3.6 The flow of software after a connection is accepted.
Figure 3.7 Further elaboration of the scanning function is shown in this figure.
Figure 3.8 The sample coding used to display output from the serial port.
Figure 4.1 The block diagram of a FSL bus (Fast Simplex Link (FSL) Bus (v2.11a), 2008)
Figure 4.2 The block diagram of the side_relay hardware
Figure 4.3 The type of interface that could be added with Xilinx Platform Studio.
Figure 4.4 The two design unit that are connected to the Microblaze core.
Figure 4.5 The new connection after reconfiguration is being made
Figure 4.6 State diagram of the side_relay unit
Figure 4.7 The port interface of the algorithm IP
Figure 4.8 The flow chart of the Needleman-Wunsch test_nw and Smith-Waterman test_sw IP. On the right is the further breakdown of the Delay state.
Figure 4.9 Further breakdown of the compute table in the flow chart.
Figure 4.10 The interconnection of side_relay, algorithm IP and the Microblaze core.
Figure 4.11 The port interface of the array_proc hardware
Figure 4.12 The port interface of the processing_element hardware.
Figure 4.13 The mapping of the VHDL block of the algorithm IP.
Figure 4.14 The flow of execution for a single processing element system
Figure 4.15 The flow of execution for multiple processing element system in the middle of the matrix table computation.
Figure 5.1 The diagrams of the connections between the Microblaze core, the side_relay unit and the main engine or algorithm IP.
Figure 5.2 Resource utilization and the timing summary of the side_relay unit
Figure 5.3(a) Resource utilization of Needleman-Wunsch IP
Figure 5.3(b) Resource utilization of Needleman-Wunsch IP
Figure 5.4 Timing summary for Needleman-Wunsch IP
Figure 5.5(a) Resource utilization of Smith-Waterman IP
Figure 5.5(b) Resource utilization of Smith-Waterman IP
Figure 5.6 Timing summary for Smith-Waterman IP
Figure 5.7 Overall resource utilization of the Microblaze system with the Needleman-Wunsch IP
Figure 5.8 Overall resource utilization of the Microblaze system with the Smith-Waterman IP
Figure 5.9 Some of the simulation results of test_nw from Needleman-Wunsch algorithm
Figure 5.10 The test_nw hardware output the result.
Figure 5.11 Post-route simulation of Needleman-Wunsch algorithm processing element
Figure 5.12 Post-route simulation of Smith-Waterman algorithm processing element
Figure 5.13 Post-Route Simulation of Microblaze FSL side_relay unit
Figure 5.14 Label 1, 2, 3 and 4 that shows the interface covered by Chipscope Pro Analyzer in Figure 5.15
Figure 5.15 The Chipscope Pro Analyzer result collected from interface labeled 1, 2, 3 and 4 in Figure 5.14
Figure 5.16 Label 5 and 6 that shows the interface covered by Chipscope Pro Analyzer in Figure 5.17
Figure 5.17 The Chipscope Pro Analyzer result collected from interface labeled 5 and 6 in Figure 5.16
Figure 5.18 The Chipscope Pro Analyzer result collected from interface labeled 7 and 8 in Figure 5.16
Figure 5.19 Procedure used by Criteria 1 to calculate the marks in Microblaze software
Figure 5.20 Procedure used by Criteria 2 to calculate the marks in Microblaze software
Figure B.1 The RTL Schematic of the VHDL block of the algorithm IP
Figure B.2 RTL Schematics of array_proc unit for Needleman-Wunsch
Figure B.3 RTL Schematics of array_proc unit for Smith-Waterman
Figure B.4 The RTL Schematic of the processing element for Needleman-Wunsch
Figure B.5 The RTL Schematic of the processing element for Smith-Waterman
Figure B.6 RTL Schematics of side_relay
Figure C.1 The first half post-route simulation for Needleman-Wunsch Algorithm
Figure C.2 The second half of post-route simulation for Needleman-Wunsch Algorithm
Figure C.3 The first half of post-route simulation for Smith-Waterman Algorithm
Figure C.4 The second half of post-route simulation for Smith-Waterman Algorithm

List of Tables

Table 2.1 Various types of anti-spam solution and its description
Table 4.1 The description of ports that are in the master side of the FSL bus.
Table 4.2 The description of ports that are in the slave side of the FSL bus.
Table 4.3 Port description for the array_proc hardware
Table 4.4 Port description for the processing_element hardware
Table 4.5 The coordinates of values received by diagonal, upper and left registers
Table 5.1(a)(b) The results for testing of spam email.
Table A.1 Microblaze terms and definitions

Chapter 1

Introduction

1.1 Background

Spam has been a major problem on the internet. In 2007 alone, spam cost around US$100 billion in lost productivity worldwide (Ferris Research 2007). The spam problem is so overwhelming that it reduces the efficiency and dependability of the network (Ming-Wei et al. 2005) and consumes bandwidth (Hoanca 2006). If the spam issue is not resolved or reduced, it may stop email from being used as a means of internet communication altogether.

To date, various ways of blocking spam have been researched, proposed and implemented in the real world. Although none of these approaches eliminates the spam problem entirely, they do help to reduce it and to increase the efficiency of email usage. One of the most popular and effective techniques for filtering spam is Bayesian detection. Bayesian detection, however, is not a perfect anti-spam solution, as spammers can bypass it using Bayesian poisoning techniques (Cumming 2006). In Bayesian poisoning, spam keywords are slightly modified to evade detection by the filter. Detecting spam keywords that have been slightly modified calls for suitable approximate matching algorithms.
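The effect of a slightly modified keyword on a token-based Bayesian filter can be illustrated with a minimal sketch. The token probabilities, the neutral value of 0.5 for unseen tokens and the function name `spam_probability` are all assumptions made for illustration; they are not taken from any filter discussed in this dissertation.

```python
from math import prod

# Hypothetical per-token spam probabilities, as a trained filter might hold them.
TOKEN_SPAM_PROB = {"cheap": 0.90, "viagra": 0.99}

# Assumed neutral probability for tokens the filter has never seen.
UNKNOWN_TOKEN_PROB = 0.5


def spam_probability(message):
    """Combine per-token probabilities with the naive Bayes formula."""
    probs = [TOKEN_SPAM_PROB.get(token, UNKNOWN_TOKEN_PROB)
             for token in message.lower().split()]
    spam = prod(probs)                    # evidence that the message is spam
    ham = prod(1.0 - p for p in probs)    # evidence that the message is legitimate
    return spam / (spam + ham)


if __name__ == "__main__":
    print(spam_probability("cheap viagra"))   # known keyword
    print(spam_probability("cheap v1agra"))   # modified keyword is an unseen token
```

Because the modified keyword is looked up as an unseen token, it contributes no evidence and the combined score drops. This is the gap that approximate matching is meant to close.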

After careful consideration, two algorithms were selected for implementation: Needleman-Wunsch and Smith-Waterman. Both algorithms are well known in the field of bioinformatics, where they are used for gene sequence detection. In genes, sequences are compared to find similarities between two sequences that differ slightly as a result of mutations. String sequence detection in bioinformatics is similar to spam keyword detection. The difference is that in spam keyword detection the two sequences being compared are mostly less than fifteen characters long, while in gene comparison the two sequences can be from several thousand to billions of characters long. Based on the reviews done, the two algorithms are not yet widely used because current technology is limited in supporting comparisons of very long sequences. For this dissertation, however, the two algorithms were designed to compare sequences of normal text words that are mostly less than fifteen characters long. The FPGA used was able to accommodate these smaller algorithm designs.
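As a software illustration of how the two algorithms score short keywords, the following is a minimal sketch. The scoring values (match +1, mismatch -1, gap -1) and the function names `nw_score` and `sw_score` are assumptions chosen for illustration. They are not the scoring scheme or the hardware design used in this dissertation, and the traceback step is omitted.

```python
def nw_score(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment score of a and b (Needleman-Wunsch), without traceback."""
    rows, cols = len(a) + 1, len(b) + 1
    h = [[0] * cols for _ in range(rows)]
    # The first row and column hold the cost of aligning a prefix against gaps only.
    for i in range(1, rows):
        h[i][0] = i * gap
    for j in range(1, cols):
        h[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            h[i][j] = max(h[i - 1][j - 1] + s,   # diagonal: align two characters
                          h[i - 1][j] + gap,     # upper: gap in b
                          h[i][j - 1] + gap)     # left: gap in a
    return h[-1][-1]


def sw_score(a, b, match=1, mismatch=-1, gap=-1):
    """Best local alignment score of a and b (Smith-Waterman), without traceback."""
    rows, cols = len(a) + 1, len(b) + 1
    h = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # Scores are clamped at zero so an alignment can restart anywhere.
            h[i][j] = max(0,
                          h[i - 1][j - 1] + s,
                          h[i - 1][j] + gap,
                          h[i][j - 1] + gap)
            best = max(best, h[i][j])
    return best


if __name__ == "__main__":
    print(nw_score("viagra", "viagra"))          # 6: exact match
    print(nw_score("viagra", "v1agra"))          # 4: one substituted character
    print(sw_score("viagra", "buy v1agra now"))  # 4: keyword found inside longer text
```

A detector built on these scores would compare the result against a threshold, so that a keyword with one or two altered characters still counts as a hit. The threshold value is a design choice that this sketch leaves open.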

Even though smaller designs of the two algorithms are implemented, they would still require a lot of processing power and time if implemented in software. Instead, the systems were implemented in FPGA hardware for greater speed. Hardware has the advantages of computing at wire speed, the ability to exploit parallelism and low power consumption.

This dissertation provides an insight into some of the anti-spam approaches being researched and applied in the world. It highlights some of the problems faced by current anti-spam solutions and proposes to solve some of them using the two algorithms. The process of implementing the two algorithms in an FPGA is reviewed and discussed. The dissertation then concludes with simulation, testing and analysis, together with proposals for future work.

1.2 Motivation and Purposes

(i) The problem of spam has been increasing over time, calling for a faster anti-spam scanning method to address it.

(ii) A way to address the Bayesian poisoning problem is needed.

(iii) Needleman-Wunsch and Smith-Waterman are computationally intensive algorithms. In hardware, both algorithms can perform computations faster than in software. An FPGA offers the advantages of computing at wire speed, the ability to exploit parallelism and low power consumption.

1.3 Research Objective

(i) To study existing approaches and challenges in filtering spam and current ways of hardware implementation, and to find two suitable algorithms.

(ii) To design two hardware systems in FPGAs. One system incorporates the first algorithm and the other incorporates the second algorithm.

(iii) To implement both systems and test their effectiveness in detecting poisoned signatures.

1.4 Thesis outline

Chapter 2 Literature Review and Technology Background – This chapter reviews the literature on anti-spam solutions and the background of the algorithms. It also covers the technology background of the design used.

Chapter 3 Design Methodology – This chapter explains the method of developing the environment. It covers the software and hardware used in the design, together with a brief description of the programming languages used.

Chapter 4 Design and Implementation – This chapter describes how the hardware is designed and explains the functions of the various hardware blocks.

Chapter 5 Results, Simulations, Analysis, and Testing – This chapter presents the implementation results and explains the simulations performed on the design, followed by the testing results. Analyses are provided throughout the chapter.

Chapter 6 Conclusion – This chapter summarises the contributions of this research and suggests further improvements and future work.

Chapter 2

Literature Review and Technology Background

2.1 What is spam?

Spam is also known as Unsolicited Commercial Email (UCE) (Hoanca 2006) (Haupt 2004) and Unsolicited Bulk Email (UBE) (Hoanca 2006) (The Definition of Spam 2007). Spam messages are sent to groups of recipients without their consent (Gunnarsson & Ekberg 2003) (The Definition of Spam 2007). UCE and UBE are commonly used by companies and individuals to send emails to a large number of people in a short time. Examples of spam are advertisements for medications, websites, illegal items, rewards and prizes (Oda 2005).

2.2 The history of spam

The first incident of spam can be traced back to Digital Equipment Corporation sending a large number of emails to publicize its new machine to all ARPANET addresses on the United States west coast (Gunnarsson & Ekberg 2003) (Haupt 2004). Back then, the term spam was not yet in use in the computer community (Gunnarsson & Ekberg 2003).

According to Haupt (2004), an email is considered spam if:

i: The identity of the recipient is not relevant.

ii: The recipient never granted consent or permission for the email to be sent.

iii: The sender receives a disproportionate benefit from sending spam mails to recipients.

During the late 1990s, spam became more and more of an issue in the technology world. Opinions in the media and academia differ on this problem. At one extreme, people regard spam as a mild annoyance (Crews 2001). At the other extreme, people regard it as a sign of impending doom, fearing that spam will overwhelm users and stop them from using email altogether. Based on the tenth issue of the Messaging Anti-Abuse Working Group report, the proportion of abusive emails has remained steady in the range of 89% to 92% (MAAWG 2009). By 2015, the volume of spam is predicted to exceed 95% of all email traffic.

According to an AOL report in 2004, spam volume increased by almost 100,000% from 1997 to 2004. Spam email is also used as a vehicle for delivering viruses, worms and phishing attacks that can lead to financial losses, data loss and identity theft (Hoanca 2006) (Ming-Wei et al. 2005) (Catalin & Maria 2009). Even though large companies, organizations and governments have made many efforts in recent years to stop spam, spam traffic continues to rise (Oda 2005). The increase in spam traffic results in the equivalent of a distributed denial-of-service (DDoS) attack, as the resources of mail transfer agents (MTAs) are used to transfer spam traffic alongside real email messages (Ming-Wei et al. 2005).

2.3 Privacy

Generally, internet users prefer to have the best personalized internet services available while at the same time retaining the ability to control their own privacy (Jacobsson & Carlsson 2007). They want the right to determine how their information is used on the internet and by whom (Gunnarsson & Ekberg 2003).

Spam is considered one result of poor privacy protection. Companies that collect information from individuals could sell it to any third party without the owner's permission. The third party could then use the information to send spam to those individuals (Gunnarsson & Ekberg 2003).

2.4 Spam Prevention

At present, spam prevention is divided into three broad categories: legislation, protocol change and filtering (Hunt & Carpinter 2006). In legislation, rules and regulations are made by a country or a group of countries to keep the spam problem in check. In protocol change, new ways of email communication are studied to find better ways to reduce the spam problem. This includes email taxing and other approaches and techniques to eliminate spam. Spam filtering is itself divided into various categories, as shown in Figure 2.1, adapted from (Hunt & Carpinter 2006).

Figure 2.1 The category of anti-spam solutions (Hunt & Carpinter 2006).

2.5 Rules and Regulations

Punishing spammers tends to be difficult, as spam laws are limited to individual countries and states. Besides, there is no universally agreed definition of spam (Oda 2005). A spammer who violates the spam law of one country may not violate the spam laws of another country (Oda 2005). Because of this, all spammers have to do is move to places where they do not violate the law (Hoanca 2006). Fundamentally, privacy protection in the European Union is better than in the United States (Gunnarsson & Ekberg 2003). One of the key differences is that individuals in the United States do not own the data collected from them, while individuals living in European countries do. Citizens of the United States have different levels of privacy protection depending on the state they live in. Based on the lessons learnt from the Second World War, post-war Europe realized the threat posed by the gathering of private information. Private information in the wrong hands might be devastating. European countries adopted the United Nations guidelines and the Council of Europe Convention for the Protection of Human Rights in 1950 (Gunnarsson & Ekberg 2003). Current laws and legislation in the USA and the EU tend to have little effect in controlling spam volume. There are arguments that current law contributes to the increase in spam volume, as spammers are allowed to send spam legally by following certain rules (Carpinter & Hunt 2006).

2.6 CAN-SPAM Act

Under the CAN-SPAM Act, UBE is required to carry labeling, opt-out instructions and the sender's physical address. Under this law, messages are prohibited from having deceptive subject lines and false headers (Gunnarsson & Ekberg 2003). The first case charged under the CAN-SPAM Act was that of Anthony Greco, 18, from Cheektowaga, New York. Greco was alleged to have sent more than 1.5 million messages of spam over instant messaging (spim) on MySpace.com. He threatened to tell other spammers how to send spim unless he was given the exclusive right to keep sending spim (Hoanca 2006).

Despite the existence of the CAN-SPAM Act, studies show a very low rate of compliance by advertisers. The studies were based on 1,133 email messages drawn from 4,800 email messages in 5 email accounts (Galen 2007). These studies show that more needs to be done to reduce the spam problem than legislation alone.

The bill of rights in the United States is based on five principles (Gunnarsson & Ekberg 2003):

i. There must be no personal data record-keeping systems whose very existence is secret.

ii. A person should be able to find out what personal information is stored and used.

iii. A person must be able to prevent his or her personal information from being used or made available for purposes other than the intended purpose.

iv. A person must be able to correct identifiable personal information.

v. Organisations creating, maintaining, using, or selling records of personal data must assure the reliability of the data for their intended use and prevent misuse.

The European Parliament, for its part, adopted the Data Protection Directive (95/46/EC) in 1995. The Directive states that (Gunnarsson & Ekberg 2003):

i. Member states shall protect the fundamental rights and freedoms of persons, and in particular their right to privacy with respect to the processing of data.

ii. There are also principles relating to data quality, which for instance declare that personal data must be collected for specified, explicit and legitimate purposes and not further processed in a way incompatible with those purposes.

iii. Personal data must also be processed lawfully, and it must be adequate, relevant and not excessive in relation to the purposes for which it is collected. The personal data may only be processed if the data subject has given his or her consent.

iv. The controller must also inform the data subject of his or her right to access and rectify the data concerning him or her. In cases where the data have not been obtained from the data subject, the controller must inform the data subject of the identity of the controller, the purposes of the processing and other information.

The terrorist attacks of September 11, 2001 led the American government to ignore the privacy of internet users (Gunnarsson & Ekberg 2003). In order to track the terrorists responsible, the Federal Bureau of Investigation (FBI) installed the controversial cyber snooping software DCS-1000, known as Carnivore, at Internet Service Providers in the United States (Gunnarsson & Ekberg 2003). There are reports that the Central Intelligence Agency (CIA) leaked sensitive commercial information gathered by ECHELON, the signals intelligence collection and analysis network. The leak of this information led Boeing to win an aircraft contract worth $6 billion from Airbus. The former CIA director James Woolsey, however, clarified that the purpose was only to “level the playing field” (Gunnarsson & Ekberg 2003).

Simple Model Of Email Delivery

Figure 2.2 Model Of Email Delivery. Source from (Hoanca 2006).

The sender client, sender server, receiving server and receiving client are software and hardware subsystems. To send an email, the sender client composes a message while connected to the sender server, and the message is then sent to the sender server. The sender server connects to the receiving server and validates the existence of the recipient account before transmitting the message, which is stored on the receiving server. The message is retrieved by the receiving client when the receiving client connects to the system.

(1) Sender Client (Outlook Express, Eudora) -> (2) Sender Server (Exchange, Sendmail) -> (3) Receiving Server (Exchange, Sendmail) -> (4) Receiving Client (Outlook Express, Eudora)

If a technique to block or reduce spam is applied at (4) Receiving Client, it helps to reduce the loss of productivity for the human recipient. However, the cost of delivering the message from the sender server to the receiving server and on to the client is still borne by the server owner and later passed on to the end user. To stop spam effectively, spam control techniques should be applied before the message even leaves the sending client.
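The argument about where filtering is applied can be made concrete with a small sketch of the four-stage delivery model. The stage names follow the model above, while the function names `stages_handling_spam` and `transfers_before_filter` are hypothetical and serve only as an illustration.

```python
# The four subsystems of the simple email delivery model, in delivery order.
STAGES = ["sender client", "sender server", "receiving server", "receiving client"]


def stages_handling_spam(filter_stage):
    """Return every stage that still spends resources on a spam message
    when the filter is placed at filter_stage."""
    index = STAGES.index(filter_stage)
    return STAGES[:index + 1]


def transfers_before_filter(filter_stage):
    """Count the stage-to-stage transfers a spam message makes before it is filtered."""
    return STAGES.index(filter_stage)


if __name__ == "__main__":
    for stage in STAGES:
        print(stage, transfers_before_filter(stage), stages_handling_spam(stage))
```

With the filter at the receiving client, the message has already crossed all three links and consumed resources on both servers. With the filter at the sender client, it crosses none.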

2.7 Ways to prevent spam

Generally, there is no silver bullet in spam prevention. That is, no single approach fits all cases.

Rules and Regulation
    How it works: Laws and legislation are used to ban spamming activities in the country.
    Advantages: Spamming activities in the country are prohibited.
    Disadvantages: Rules and regulations have been the least effective methods used against spam. Spam laws created in a region tend to push spammers offshore, outside the jurisdiction of the law, instead of eliminating spam (Hoanca 2006).

Spam Filtering (Black Listing) (Hoanca 2006)(Lorrie & Brian 1998)
    How it works: The user lists the email addresses to be blocked.
    Advantages: Blacklisted email addresses cannot send emails to the user.
    Disadvantages: Spam filtering at the receiving end is not that effective, as the cost of sending the spam has already been borne by the receiving server. The method can easily be overcome by spammers using botnets or zombie machines.

Spam Filtering (White Listing) (Hoanca 2006)
    How it works: Whitelisting allows only certain email addresses to deliver email to the address.
    Advantages: Emails from other addresses are not accepted, so the spam problem is reduced.
    Disadvantages: Users are required to add new addresses to the list manually.

Spam Filtering (Bayesian Decision Making) (Sahami et al. 1998)
    How it works: A machine learning spam filtering method based on probabilistic calculations.
    Advantages: The anti-spam system continuously calculates the probability of each spam word and updates the spam filter.
    Disadvantages: The filter can be bypassed using Bayesian poisoning.

Rate Throttling Approach (Teergrubing) (Hoanca 2006)
    How it works: Teergrubing delays the receipt of email messages.
    Advantages: Teergrubing has little impact when single messages are sent by a server. For spammers sending a large number of emails, however, it can slow down the spammer's servers significantly. Software that uses this concept includes TarProxy and Jackpot (Hoanca 2006).
    Disadvantages: Besides consuming the resources of the spammers, Teergrubing also consumes the resources of the receiving server (TWINING et al. 2004).

Rate Throttling Approach (TCP damping) (Hoanca 2006)
    How it works: The server that receives email messages calculates spam scores for the delivered messages. It then artificially delays the confirmation of messages that have high spam scores.
    Advantages: For a single sender the delay is not significant, but for spammers that send a large number of messages it greatly slows down the process. TCP damping can also indirectly help authorities to detect spammers: legitimate servers keep on delivering messages even though the delay increases, whereas servers that are sending spam tend to give up as the delay increases.
    Disadvantages: Code on the receiving side has to be rewritten to use the spam scores. Users of servers that are being used for sending spam have to bear the delay imposed by the receiving server. The spam score depends on the filters that determine it, and spammers can modify their message format to evade detection.

Rate Throttling Approach (Grey Listing) (Hoanca 2006)
    How it works: Grey listing first refuses the connection to any server that is not in the whitelist. Normal servers will attempt to retransmit, but spam servers are unlikely to retry.
    Advantages: There are reports that the combination of grey listing, white listing and black listing helps to reduce spam by 88%.
    Disadvantages: Grey listing has a high false positive rate. False positives are incidents in which normal mail is misinterpreted as spam. Besides, some poorly configured servers drop the connection when they are denied the first time.

Alliance-Based Approach (Yu-Fen et al. 2007)
    How it works: Multiple servers located at different locations are connected to each other. Through reliable and secure connections, they synchronise data consisting of spam signatures with each other. Each server has its own group of users and learns from them.
    Advantages: The system is able to block more spam and has good performance.
    Disadvantages: Spam detection still requires a long processing time.

Counterattack Solutions (Lorrie & Brian 1998)
    How it works: Counterattack solutions reply to spam with false applications. The false applications burden the spammers, who are unable to differentiate between real applications and fake ones. The technique also works by sending mass complaints to the spammer's ISP.
    Advantages: These tactics sometimes inconvenience the spam sender. The spam sender may also have accounts revoked by the ISPs.
    Disadvantages: The true identity of the spam sender is sometimes hard to trace, as the sender could be using other victims' email addresses. The counterattacks may also end nowhere, leaving a large number of bounce notice messages.

Opt-out List (Lorrie & Brian 1998)
    How it works: Users click the link provided in the mail to stop receiving mails from the same source in the future.
    Advantages: By selecting the opt-out option provided with the spam mail, the user is able to stop receiving mail from the source.
    Disadvantages: Selecting the opt-out option sometimes lets spammers know that the address exists, and they then spam it even harder.

Channels (Lorrie & Brian 1998)
    How it works: A channelized email account uses multiple email addresses for a single account. A user may use a public email address alias for business cards, public postings on blogs or submitting emails. For private purposes, the user may assign another email address alias to the same account. The account stores the emails received in different channels according to the alias used.
    Advantages: Once the user starts to receive spam on the public address alias, the user can delete that alias alone and not the whole email account.
    Disadvantages: Important or wanted mails are sometimes also received on the same channel that receives spam.

Authentication-Based Spam Control (Hoanca 2006)
    How it works: Users need to log in before using the system. Each account has a reputation score. Users that seldom send spam have a higher reputation score and more control of their account. Users that send a lot of spam have a low reputation score and less control of their account.
    Advantages: As emails are blocked or delayed if the sender's IP is not from an authenticated user, this technique is effective in reducing spam.
    Disadvantages: Spammers could hijack a trusted sender's computer, or infect it with a worm, and use it to send spam.

Munging (Ming-Wei et al. 2005)
    How it works: Munging changes the email address into a form that is not detectable by email harvesters and spam bots. For example, jack@yahoo.com could be changed to “jack at yahoo dot com”.
    Advantages: Munging can fool spambots and other email harvesters temporarily.
    Disadvantages: Spammers could design their spambots to adapt to munging tricks and make the technique ineffective.

Table 2.1 Various types of anti-spam solutions and their descriptions.
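The munging entry of Table 2.1 can be illustrated with a small sketch. It assumes the simplest possible rule, taken from the table's own example: replace "@" with " at " and every "." with " dot ". The function name `munge` is hypothetical and real munging schemes vary.

```python
def munge(address):
    """Rewrite an email address in a human-readable, harvester-unfriendly form."""
    # Replace the separators that address harvesters typically look for.
    return address.replace("@", " at ").replace(".", " dot ")


print(munge("jack@yahoo.com"))
```

As the table notes, a harvester that knows this rule can reverse it just as easily, which is why the protection is only temporary.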

2.8 Combined Solutions

Generally, using combined solutions is better than using a single solution alone. Each solution has its own strengths and weaknesses. By using multiple filtering techniques, the weakness of one solution can be covered by another solution (Hoanca 2006).

2.9 The Challenge of Botnets

To avoid detection, spammers have started to use botnets for their attacks. In (Al-Bataineh & White 2009), the use of botnets, which consist of networks of compromised machines, for spamming is highlighted. A botnet that receives commands from its master can launch an attack and start mass-mailing from a large number of machines spread across many network domains. This makes the source of the spam harder to detect.

2.10 The Effectiveness of Spam Filtering

The results of a spam filter fall into four categories (Yu-Fen et al. 2007):

False positive - Non-spam mail is classified as spam mail.

False negative - Spam mail is classified as non-spam mail.

True positive - Spam mail is classified as spam mail.

True negative - Non-spam mail is classified as non-spam mail.

A spam filter with a high false negative rate is considered less effective because spam mail can still enter the inbox. The lower the false negative rate, the less spam mail is found in the inbox. A high false positive rate is a more serious problem, because legitimate email is classified as spam. If a filter with a high false positive rate is used, important mails are blocked from reaching users.
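The two error rates can be computed from the four counts as in the following minimal sketch. The function name `filter_rates` and the sample counts are illustrative assumptions, not figures from this research; "positive" means "classified as spam".

```python
def filter_rates(tp, tn, fp, fn):
    """Return (false_positive_rate, false_negative_rate) from raw counts."""
    ham_total = tn + fp    # all legitimate (non-spam) mails
    spam_total = tp + fn   # all spam mails
    fpr = fp / ham_total if ham_total else 0.0   # share of ham wrongly blocked
    fnr = fn / spam_total if spam_total else 0.0  # share of spam that got through
    return fpr, fnr


# Hypothetical run: 100 ham mails (5 blocked) and 100 spam mails (10 missed).
print(filter_rates(tp=90, tn=95, fp=5, fn=10))
```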

2.11 Bayesian Poisoning

Bayesian poisoning is a method used by spammers to evade detection by machine-learning anti-spam systems. A keyword like “Viagra” can be modified in various ways to become “V1agra”, “v1@gra” and so on (Hayes 2007). Such slightly modified keywords are injected into spam mail to reduce the sensitivity of spam filters. As a Bayesian filter uses spam mails for training, the weight score of common keywords like Viagra could be reduced, which reduces the effectiveness of the anti-spam system.

At the MIT Spam Conference of 2004, John Graham-Cumming demonstrated two ways that could possibly be used to attack POPFile's Bayesian engine (Graham-Cumming 2004). The first way is to insert random words from various literature into the spam. This method did not work, because the inserted words were either already in the spam signature database, already identified as ham, or in neither category. The other attack was successful, however. It works by inserting random words into a small number of spam mails and adding a web bug to each to confirm reception. Once the web bug confirms that a mail was received, the attacking system is trained to reuse the same poison words. After a large amount of spam has been sent to a user, a set of words is confirmed as able to get through the anti-spam engine. As the threat of Bayesian poisoning is real, research on countermeasures needs to be done before the problem gets out of hand (Graham-Cumming 2006).

2.12 Needleman-Wunsch Algorithm

The Needleman-Wunsch algorithm was first published by Saul B. Needleman and Christian D. Wunsch in (Needleman & Wunsch 1970). It was the first algorithm to apply the concept of dynamic programming to biological sequence comparison. Being an approximate matching algorithm, Needleman-Wunsch is widely used in research on deoxyribonucleic acid (DNA), amino acid and protein alignment (Thomas & Rance 2003)(Lesk, Levitt & Chothia 1986)(Rose & Eisenmenger 1991)(Canella & Miglioli 2003)(Needleman & Wunsch 1970)(Du & Lin 2004)(Xia & Dou 2007)(Mark & Michael 1996). As stated in (Thomas & Rance 2003), DNA consists of three parts: phosphates, sugars and nitrogenous bases. The nitrogenous bases are the information-containing portion of the DNA and come in four types: adenine, cytosine, guanine and thymine. In bioinformatics, these bases are therefore represented as A, C, G and T. The Needleman-Wunsch algorithm has been widely used to find similarities in DNA, protein and amino acid sequences that would otherwise be impractical to find by visual comparison, as the nitrogenous bases in the mitochondrial genomes of the Brook Trout and the Arctic Char alone amount to approximately 16 thousand characters (Thomas & Rance 2003).

2.13 Smith-Waterman Algorithm

The Smith-Waterman algorithm was published in 1981 in (Smith & Waterman 1981). Since then, it has been widely used in areas such as DNA (Li, Shum & Truong 2007) (Xiandong & Vipin 2004), RNA (May et al. 2007), amino acid (Brutlag et al. 1993) and protein sequence comparison (Fa, Xiang-Zhen & Zhi-Yong 2002). Like the Needleman-Wunsch algorithm, Smith-Waterman belongs to the family of dynamic programming algorithms (Nash, Blair & Grefenstette 2001). Dynamic programming algorithms require a large amount of processing power and memory for their calculations. Various research efforts have sought to speed up the algorithm, including the use of clusters (Boukerche, De Melo & Ayala-Rincon 2005) and distributed computers (Jacob et al. 2007), SIMD (Hasan, Al-Ars & Vassiliadis 2007), FPGAs (Benkrid, Ying & Benkrid 2007) (Storaasli, Strenski & Inc 2007) and techniques to reduce memory usage and calculations (Fa, Xiang-Zhen & Zhi-Yong 2002) (Harris et al. 2007).

2.14 Global Alignment and Local Alignment

Global alignment algorithms are used to align two sequences of almost the same length. They assume that the two sequences are almost the same, with only minor differences (Knees, Schedl & Widmer). Local alignment algorithms attempt to find regions of similarity within the sequences. They are more suitable for aligning two sequences that differ greatly in length, where one is very long compared to the other (Boukerche, De Melo & Ayala-Rincon 2005) (Christian & Jon 2006). Needleman-Wunsch is a global alignment algorithm, while Smith-Waterman is a local alignment algorithm (Hasan, Al-Ars & Vassiliadis 2007).

2.15 Processing Time Issue

Dynamic programming algorithms require a large number of calculations to build the matrix table. For example, comparing two strings of 5 characters requires 25 calculations, while comparing two strings of 7 characters requires 49 calculations. The number of calculations grows with the product of the two string lengths, that is, quadratically for strings of equal length, and the memory needed to store the calculated table grows in the same way.
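As a worked check of the figures above, assume one calculation per table cell and write C(m, n) for the number of calculations needed for strings of lengths m and n (the symbol C is introduced here only for illustration):

```latex
\[
C(m, n) = m \cdot n, \qquad C(5, 5) = 25, \qquad C(7, 7) = 49, \qquad C(2m, 2n) = 4\, C(m, n)
\]
```

Doubling both string lengths therefore quadruples the work, which is quadratic rather than exponential growth.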

2.16 Heuristics Based Algorithms

BLAST is a heuristics-based algorithm that is used as an alternative to dynamic programming algorithms in bioinformatics. Heuristic algorithms are faster but less sensitive than dynamic programming algorithms. Even though they are less sensitive, they are chosen in bioinformatics because they require less processing power than dynamic programming algorithms and are therefore faster (Nash, Blair & Grefenstette 2001) (Hasan, Al-Ars & Vassiliadis 2007) (Hsien-Yu, Meng-Lai & Yi 2004) (Boukerche, De Melo & Ayala-Rincon 2005) (Li, Shum & Truong 2007) (Xiandong & Vipin 2004) (Brutlag et al. 1993).

2.17 The Selection of Algorithm

In the process of finding suitable algorithms, papers on various search algorithms were studied. Among the algorithms studied are the Apostolico-Giancarlo algorithm (APOSTOLICO & GIANCARLO 1986), the Horspool algorithm (HORSPOOL 1980), the Knuth-Morris-Pratt algorithm (KNUTH, MORRIS & PRATT 1977) and the Brute Force algorithm. These algorithms were found to be unsuitable because they are exact matching algorithms, which are unable to detect spam signatures that have been slightly modified. Further research found that the algorithms used for sequence comparison in bioinformatics have the characteristics needed for this kind of detection. In bioinformatics, algorithms are designed to detect genome strings that differ slightly from one another as a result of insertion, deletion and substitution. This approximate matching makes the algorithms used in bioinformatics research very suitable for detecting slightly modified spam signatures.
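The limitation of exact matching can be seen in a short sketch. Python's built-in substring test stands in here for the exact algorithms named above (it is not an implementation of any of them), and the helper name `exact_hits` and the list of variants are illustrative assumptions.

```python
def exact_hits(keyword, words):
    """Return the words that contain the keyword as an exact substring."""
    return [w for w in words if keyword in w]


variants = ["viagra", "v1agra", "via1gra", "v1@gra"]
# Only the unmodified spelling is found; every altered variant slips through.
print(exact_hits("viagra", variants))
```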

Among the bioinformatics algorithms, some such as BLAST (ALTSCHUL et al. 1990) are less sensitive in detection but computationally simpler, while others such as Smith-Waterman and Needleman-Wunsch are more sensitive but require more computation and more complex steps (Nash,Blair & GREFENSTETTE 2001). The less sensitive algorithms are faster because they compute less, but they are also less accurate. The more sensitive algorithms require more computation and are more challenging to develop because of the complex steps involved. Despite the complexity and computational demand of the highly sensitive algorithms, this research takes on the bigger challenge. Instead of choosing one highly sensitive algorithm, two algorithms were implemented. It should be noted, however, that this research does not implement the two algorithms together as a single combined unit. The milestone of this research is to implement the algorithms one at a time on the same hardware environment. The possibility of combining both algorithms will be looked into in the future.

2.18 Calculations of Needleman-Wunsch Algorithm

The Needleman-Wunsch algorithm consists of three calculation steps: (i) computing the score, (ii) computing the matrix table and (iii) performing the traceback. In step (i), the two character strings to be tested, S1 “via1gra” and S2 “viagra”, are compared character by character to obtain the match and mismatch scores. A score of 1 is given for a match and 0 for a mismatch.

    S1  ASCII |  -   v   i   a   g   r   a    (S2)
              |     76  69  61  67  72  61    (ASCII)
    -   -     |  0   0   0   0   0   0   0
    v   76    |  0   1   0   0   0   0   0
    i   69    |  0   0   1   0   0   0   0
    a   61    |  0   0   0   1   0   0   1
    1   31    |  0   0   0   0   0   0   0
    g   67    |  0   0   0   0   1   0   0
    r   72    |  0   0   0   0   0   1   0
    a   61    |  0   0   0   1   0   0   1

    (rows are indexed by i, columns by j)

Figure 2.3 The computed score table of Needleman-Wunsch.
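The score table of Figure 2.3 can be reproduced with a few lines of code. This is a minimal software sketch for illustration, not the hardware design of this research; the function name `score_table` is hypothetical, and the zero border row and column follow the layout of the figure.

```python
def score_table(s1, s2, match=1, mismatch=0):
    """Build the (len(s1)+1) x (len(s2)+1) score table with a zero border."""
    table = [[0] * (len(s2) + 1) for _ in range(len(s1) + 1)]
    for i, a in enumerate(s1, start=1):        # rows follow S1
        for j, b in enumerate(s2, start=1):    # columns follow S2
            table[i][j] = match if a == b else mismatch
    return table


for row in score_table("via1gra", "viagra"):
    print(row)
```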

Based on the score table, the main matrix table is calculated using the formula

\[
E_{i,j} = \max \left\{ \begin{array}{l}
E_{i-1,j-1} + \mathrm{score}_{i-1,j-1} \\
E_{i,j-1} + W \\
E_{i-1,j} + W
\end{array} \right.
\]

In this research, a value of 0 is used for the gap penalty W.

[Figure: the matrix table for S1 “via1gra” and S2 “viagra” with the first row and first column set to 0 and the first two cells of row v computed as 1 and 1, shown next to the formula for E_{i,j}.]

Figure 2.4 The calculation of a cell in the matrix table of Needleman-Wunsch algorithm.

    S1  ASCII |  -   v   i   a   g   r   a    (S2)
              |     76  69  61  67  72  61    (ASCII)
    -   -     |  0   0   0   0   0   0   0
    v   76    |  0   1   1   1   1   1   1
    i   69    |  0   1   2   2   2   2   2
    a   61    |  0   1   2   3   3   3   3
    1   31    |  0   1   2   3   3   3   3
    g   67    |  0   1   2   3   4   4   4
    r   72    |  0   1   2   3   4   5   5
    a   61    |  0   1   2   3   4   5   6

    (rows are indexed by i, columns by j)

Figure 2.5 The completed calculations of the matrix table of Needleman-Wunsch algorithm for viagra and via1gra strings.
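The completed matrix can be recomputed in software as a cross-check. The sketch below is an illustrative sequential version, assuming match = 1, mismatch = 0 and gap W = 0 as stated in this section; the name `nw_matrix` is hypothetical. With W = 0 the border row and column stay at 0, as in the figure.

```python
def nw_matrix(s1, s2, match=1, mismatch=0, gap=0):
    """Fill the Needleman-Wunsch matrix E for strings s1 (rows) and s2 (columns)."""
    rows, cols = len(s1) + 1, len(s2) + 1
    E = [[0] * cols for _ in range(rows)]      # zero border, as in the figure
    for i in range(1, rows):
        for j in range(1, cols):
            score = match if s1[i - 1] == s2[j - 1] else mismatch
            E[i][j] = max(E[i - 1][j - 1] + score,  # diagonal: align both characters
                          E[i][j - 1] + gap,        # left: gap in s1
                          E[i - 1][j] + gap)        # up: gap in s2
    return E


for row in nw_matrix("via1gra", "viagra"):
    print(row)
```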

Upon completing the matrix table, the traceback is performed from the bottom-right cell to the top-left cell. At each cell, the step taken is the one that produced the maximum in \( \max\{ E_{i-1,j-1} + \mathrm{score}_{i-1,j-1},\; E_{i,j-1},\; E_{i-1,j} \} \). The traceback is performed to extract the result from the table. The white cells in Figure 2.6 show the traceback path from the value 6 to the value 0 of the matrix table.

[Figure: the completed matrix table for S1 “via1gra” and S2 “viagra”, with the traceback path highlighted from the bottom-right cell (value 6) to the top-left cell (value 0).]

Figure 2.6 The traceback being performed on the matrix table of Needleman-Wunsch algorithm.

The traceback produces the following alignment:

    v i a 1 g r a
    | | |   | | |
    v i a _ g r a

The alignment contains six matches even though an extra character has been injected into the string S1 to produce via1gra.
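The whole procedure, from matrix fill to traceback, can be sketched as follows. This is an illustrative software version under the parameters of this section (match = 1, mismatch = 0, W = 0); the name `nw_align` and the tie-breaking order (diagonal, then up, then left) are assumptions, and a different order can yield a different but equally scored alignment.

```python
def nw_align(s1, s2, match=1, mismatch=0, gap=0):
    """Return (aligned_s1, aligned_s2, score) using Needleman-Wunsch."""
    rows, cols = len(s1) + 1, len(s2) + 1
    E = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        for j in range(1, cols):
            score = match if s1[i - 1] == s2[j - 1] else mismatch
            E[i][j] = max(E[i - 1][j - 1] + score,
                          E[i][j - 1] + gap,
                          E[i - 1][j] + gap)

    # Traceback from the bottom-right cell to the top-left cell.
    top, bottom = [], []
    i, j = len(s1), len(s2)
    while i > 0 or j > 0:
        diag = None
        if i > 0 and j > 0:
            score = match if s1[i - 1] == s2[j - 1] else mismatch
            diag = E[i - 1][j - 1] + score
        if diag is not None and E[i][j] == diag:
            top.append(s1[i - 1]); bottom.append(s2[j - 1])
            i -= 1; j -= 1
        elif i > 0 and (j == 0 or E[i][j] == E[i - 1][j] + gap):
            top.append(s1[i - 1]); bottom.append("_")   # gap in s2
            i -= 1
        else:
            top.append("_"); bottom.append(s2[j - 1])   # gap in s1
            j -= 1
    return "".join(reversed(top)), "".join(reversed(bottom)), E[-1][-1]


a, b, score = nw_align("via1gra", "viagra")
print(a)
print(b)
print(score)
```

Counting the positions where the two returned strings agree gives the number of matches reported in the text.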

2.19 Calculations of Smith-Waterman Algorithm

Like the Needleman-Wunsch algorithm, the calculation of the Smith-Waterman algorithm consists of three parts: computing the score, building the matrix table and performing the traceback. For the score, +2 is given for a match and -1 for a mismatch. The example in Figures 2.7 to 2.9 compares S1 “aphrodisiac” with S2 “aphrod1si@c”.

    S1  ASCII |  -    a   p   h   r   o   d   1   s   i   @   c    (S2)
              |      61  70  68  72  6F  64  31  73  69  40  63    (ASCII)
    -   -     |  0    0   0   0   0   0   0   0   0   0   0   0
    a   61    |  0    2  -1  -1  -1  -1  -1  -1  -1  -1  -1  -1
    p   70    |  0   -1   2  -1  -1  -1  -1  -1  -1  -1  -1  -1
    h   68    |  0   -1  -1   2  -1  -1  -1  -1  -1  -1  -1  -1
    r   72    |  0   -1  -1  -1   2  -1  -1  -1  -1  -1  -1  -1
    o   6F    |  0   -1  -1  -1  -1   2  -1  -1  -1  -1  -1  -1
    d   64    |  0   -1  -1  -1  -1  -1   2  -1  -1  -1  -1  -1
    i   69    |  0   -1  -1  -1  -1  -1  -1  -1  -1   2  -1  -1
    s   73    |  0   -1  -1  -1  -1  -1  -1  -1   2  -1  -1  -1
    i   69    |  0   -1  -1  -1  -1  -1  -1  -1  -1   2  -1  -1
    a   61    |  0    2  -1  -1  -1  -1  -1  -1  -1  -1  -1  -1
    c   63    |  0   -1  -1  -1  -1  -1  -1  -1  -1  -1  -1   2

    (rows are indexed by i, columns by j)

Figure 2.7 The score table of Smith-Waterman algorithm.

Using the score table in Figure 2.7, the matrix table is built using the formula shown below. The completed table is shown in Figure 2.8.

    S1  ASCII |  -    a   p   h   r   o   d   1   s   i   @   c    (S2)
              |      61  70  68  72  6F  64  31  73  69  40  63    (ASCII)
    -   -     |  0    0   0   0   0   0   0   0   0   0   0   0
    a   61    |  0    2   2   2   2   2   2   2   2   2   2   2
    p   70    |  0    2   4   4   4   4   4   4   4   4   4   4
    h   68    |  0    2   4   6   6   6   6   6   6   6   6   6
    r   72    |  0    2   4   6   8   8   8   8   8   8   8   8
    o   6F    |  0    2   4   6   8  10  10  10  10  10  10  10
    d   64    |  0    2   4   6   8  10  12  12  12  12  12  12
    i   69    |  0    2   4   6   8  10  12  12  12  14  14  14
    s   73    |  0    2   4   6   8  10  12  12  14  14  14  14
    i   69    |  0    2   4   6   8  10  12  12  14  16  16  16
    a   61    |  0    2   4   6   8  10  12  12  14  16  16  16
    c   63    |  0    2   4   6   8  10  12  12  14  16  16  18

    (rows are indexed by i, columns by j)

Figure 2.8 The computed matrix table of Smith-Waterman algorithm.
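The matrix of Figure 2.8 can be recomputed with the sketch below. It assumes match = +2 and mismatch = -1 as stated in this section, and a gap value W = 0, which is an assumption inferred from the values in the figure rather than stated in the text. The name `sw_matrix` is hypothetical.

```python
def sw_matrix(s1, s2, match=2, mismatch=-1, gap=0):
    """Fill the Smith-Waterman matrix E for strings s1 (rows) and s2 (columns)."""
    rows, cols = len(s1) + 1, len(s2) + 1
    E = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        for j in range(1, cols):
            score = match if s1[i - 1] == s2[j - 1] else mismatch
            E[i][j] = max(0,                         # local alignment never goes negative
                          E[i - 1][j - 1] + score,   # diagonal
                          E[i][j - 1] + gap,         # left
                          E[i - 1][j] + gap)         # up
    return E


for row in sw_matrix("aphrodisiac", "aphrod1si@c"):
    print(row)
```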

The formula used to build the matrix table is

\[
E_{i,j} = \max \left\{ \begin{array}{l}
0 \\
E_{i-1,j-1} + \mathrm{score}_{i-1,j-1} \\
E_{i,j-1} + W \\
E_{i-1,j} + W
\end{array} \right.
\]

After the table is computed, the traceback is performed starting from the highest value in the matrix table and continues until it reaches a cell whose value is zero.

[Figure: the computed matrix table for S1 “aphrodisiac” and S2 “aphrod1si@c”, with the traceback path highlighted starting from the highest value (18).]

Figure 2.9 The traceback being performed on the computed matrix table of Smith-Waterman algorithm.
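A software sketch of the Smith-Waterman traceback is given below for illustration. It assumes match = +2, mismatch = -1 and W = 0 (the last inferred from the figures), starts at the highest cell and stops at the first cell whose value is zero. The name `sw_align` and the tie-breaking order (diagonal, then up, then left) are assumptions; other orders give alignments with the same score.

```python
def sw_align(s1, s2, match=2, mismatch=-1, gap=0):
    """Return (aligned_s1, aligned_s2, best_score) using Smith-Waterman."""
    rows, cols = len(s1) + 1, len(s2) + 1
    E = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            score = match if s1[i - 1] == s2[j - 1] else mismatch
            E[i][j] = max(0, E[i - 1][j - 1] + score,
                          E[i][j - 1] + gap, E[i - 1][j] + gap)
            if E[i][j] > best:                 # remember the highest cell
                best, best_pos = E[i][j], (i, j)

    # Traceback from the highest cell until a zero-valued cell is reached.
    top, bottom = [], []
    i, j = best_pos
    while i > 0 and j > 0 and E[i][j] > 0:
        score = match if s1[i - 1] == s2[j - 1] else mismatch
        if E[i][j] == E[i - 1][j - 1] + score:
            top.append(s1[i - 1]); bottom.append(s2[j - 1])
            i -= 1; j -= 1
        elif E[i][j] == E[i - 1][j] + gap:
            top.append(s1[i - 1]); bottom.append("_")   # gap in s2
            i -= 1
        else:
            top.append("_"); bottom.append(s2[j - 1])   # gap in s1
            j -= 1
    return "".join(reversed(top)), "".join(reversed(bottom)), best


a, b, score = sw_align("aphrodisiac", "aphrod1si@c")
print(a)
print(b)
print(score)
```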

2.20 FPGA

Hardware circuits are generally divided into two groups, ASICs and FPGAs. An ASIC, or application-specific integrated circuit, is a circuit designed for one particular application and typically targeted for mass production. When mass produced, ASICs tend to cost less per chip because of the low recurring cost. The drawback of an ASIC is that it requires a huge non-recurring cost for design. Once the circuit is fabricated into chips, no modifications can be made to it (JOAN 2009). The design stage therefore requires careful design and testing, as any unforeseen error detected after production could render all the chips already produced defective. An FPGA, or field-programmable gate array, on the other hand, can be reprogrammed. After a circuit has been designed and programmed into an FPGA, the designer can make modifications and reload the new design into the device. This re-programmability makes FPGAs more suitable for prototyping, design testing and development. FPGAs also do not carry a huge non-recurring cost (XILINX 2010).

2.21 FPGA Platforms

To date, Xilinx is still the market leader for FPGAs, followed by Altera. As the founder of FPGA technology, Xilinx leads with more than 50 percent of the market share. The two companies dominate the FPGA market, while smaller companies like Silicon Blue, Achronix, Tabula, Actel, Lattice, Abound (M2000), Tier Logic and others make their entries and exits from time to time.

Xilinx has been actively rolling out new products throughout the years. Each new product brings improvements such as higher capacity, lower power consumption, higher speed and better throughput. Xilinx products are divided into the Spartan category and the more powerful Virtex category. From older to newer models, the Spartan category includes Spartan-3E, Spartan-3A and Spartan-6, and the Virtex category includes Virtex-4, Virtex-5 and Virtex-6. Each model is further broken down into variants that are customized to suit different needs.

Like Xilinx, Altera has been actively rolling out its own products. In the low-cost FPGA category there are Cyclone II and Cyclone III. Altera also offers Arria GX as a low-cost FPGA with transceivers. For high-end FPGAs, the Altera line-up consists of Stratix, Stratix II, Stratix III and Stratix IV.

2.22 Future Demand

The demand for FPGAs has been increasing over the years. FPGAs have been gradually displacing ASICs and widely available off-the-shelf ASSPs (Application Specific Standard Products). FPGAs are widely used in areas such as defense, aerospace, broadcasting, wired and wireless communications, automotive, industrial applications, medical devices and scientific research.

2.23 Parallelism

Parallelism has been widely researched and implemented to speed up the calculation of algorithms. There are two ways of applying it: by using multiple units of hardware that work together, or by applying parallelism inside the hardware itself. For example, in a server, parallelism could be implemented by using four single-core Pentium processors on the same motherboard, or by using one quad-core Pentium processor. Parallelism inside the hardware itself has advantages in terms of lower cost, smaller size, lower power consumption, less heat dissipation and better performance as a result of shorter communication distances. Parallelism is applied in the hardware implementations of the Needleman-Wunsch and Smith-Waterman algorithms in this research. Instead of a single processing element, both hardware designs use multiple processing elements working concurrently. This reduces the table processing time from the order of mn steps to the order of m+n steps, where m and n are the lengths of the two sequences being compared.
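The reduction from mn to about m+n steps can be illustrated in software. Every cell depends only on its upper, left and upper-left neighbours, so all cells on the same anti-diagonal (i + j constant) are independent and could be computed at the same time by separate processing elements. The sketch below is a sequential software model of that idea, not the VHDL design of this research; the name `wavefront_fill` is hypothetical and it uses the Needleman-Wunsch parameters of Section 2.18.

```python
def wavefront_fill(s1, s2, match=1, mismatch=0, gap=0):
    """Fill the matrix one anti-diagonal at a time; return (E, parallel_steps)."""
    m, n = len(s1), len(s2)
    E = [[0] * (n + 1) for _ in range(m + 1)]
    steps = 0
    for d in range(2, m + n + 1):              # anti-diagonal: i + j == d
        lo, hi = max(1, d - n), min(m, d - 1)
        if lo > hi:
            continue
        # Cells on one anti-diagonal do not depend on each other, so in
        # hardware this inner loop would run concurrently in a single step.
        for i in range(lo, hi + 1):
            j = d - i
            score = match if s1[i - 1] == s2[j - 1] else mismatch
            E[i][j] = max(E[i - 1][j - 1] + score,
                          E[i][j - 1] + gap,
                          E[i - 1][j] + gap)
        steps += 1
    return E, steps


E, steps = wavefront_fill("via1gra", "viagra")
print(steps, "parallel steps instead of", 7 * 6, "sequential cell updates")
```

For strings of lengths m and n there are m + n - 1 anti-diagonals, which is where the m+n figure comes from.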

Chapter 3

Design Methodology

This chapter explains the overall implementation of the Needleman-Wunsch and Smith-Waterman spam or unsolicited commercial email inline filters. The two systems undergo several development steps before becoming a reality. The software and hardware applications and tools used for the design are also described in this chapter.

3.1 Flow Diagram

[Figure: flow diagram containing the following stages.]

    Implement and configure the FPGAs with Microblaze and all the hardware units except the side_relay and the algorithm IP.
    Build and code the web server application in C to run on the Microblaze.
    Import the design into ISE.
    Insert the Chipscope Definition and Connection file and the system .ucf file to the hardware in ISE for monitoring purposes.
    Attach 2 FSL interfaces to the Microblaze, one for the side_relay and the other for the algorithm IP.
    Design, develop, simulate and test the side_relay unit in VHDL.
    Design, develop, simulate and test the algorithm IP of Needleman-Wunsch and Smith-Waterman in VHDL.
    Update the C code of the application running on Microblaze to include operations to handle the two new hardware units.
    Perform testing using the complete FPGA system environments.

Figure 3.1 The stages involved in designing the inline filter.

[Figure: two parallel flowcharts, one for the FPGA system integrated with the Needleman-Wunsch hardware and one for the FPGA system integrated with the Smith-Waterman hardware. Each flowchart contains the boxes: Start; FPGA system environments integrated with the algorithm hardware on standby; Send emails to the FPGA system web server via a TCP client application on the desktop; FPGA system perform scanning; Output results; Done (Yes/No); System on standby; Power down; End.]

Figure 3.2 The operations of the FPGA systems.

3.2 Overall Architecture

The overall connections of the inline filter are shown in Figure 3.3 below. The Xilinx Development platform is connected to three cables: the crossover Ethernet cable, the null modem cable and the JTAG Boundary Scan (IEEE/ANSI standard 1149.1-1990) cable.

[Figure: the Xilinx Development Board and its three connections. Xilinx Development tools for hardware and software development and download: JTAG cable (download .bit and .elf to the board). Debugging Terminal: Null Modem cable (connected to RS232 of the development board for debugging and analysis). TCP Client: Crossover cable (send email).]

Figure 3.3 The overall design of the architecture.

3.2.1 Crossover Ethernet Cable

The computer connects to the TCP server on the board using the crossover cable. With simple TCP client software installed on the computer, emails are sent to the Xilinx Development Board for scanning through this Ethernet cable. The crossover Ethernet cable is connected to the Ethernet port on the computer and on the Xilinx Development Board.

3.2.2 Null Modem cable

The Xilinx Development Board model used is the ML505 LX50T FFG1136, which contains one male DB-9 RS232 serial port. This port can be used to communicate and transfer serial data to other devices, and is designed to operate at a speed of up to 115200 Bd. A null modem cable is required to connect the serial port on the Xilinx board to the RS232 serial port on the computer. Using serial port (COM) terminal emulation software like HyperTerminal or Terminal, the user can view the output from the Xilinx board on the computer. This provides information on the current status of the design in the FPGA and helps with debugging.

3.2.3 JTAG Cable

The JTAG cable is used to program the FPGA on the Xilinx Development Board with the hardware and software created by the user. The user creates the bitstream (.bit) and executable and linkable format (.elf) files on the computer using the relevant development tools and then downloads them to the FPGA. The JTAG cable can also be used for debugging with software like Xilinx Chipscope Pro or Xilinx Platform Studio SDK.

3.3 Microblaze

Microblaze is a soft-core RISC (Reduced Instruction Set Computer) processor developed by Xilinx to run embedded applications in FPGAs. Microblaze has 32-bit address and data buses, 32-bit instruction words and 32-bit registers. It is based on the Harvard architecture and provides PLB v46, FSL and LMB (Local Memory Bus) bus interfaces. The number of Microblaze processors that can be implemented in an FPGA depends on the capacity of the FPGA itself. The Microblaze Debug Module (MDM), however, can accommodate the debugging of up to 8 Microblaze processors at a time. Microblaze is highly customizable in that the designer can choose which IP is needed to work with the processor (Embedded Systems Development, 2008).

3.4 Microblaze Hardware Design

The Needleman-Wunsch and Smith-Waterman IP are designed to work with the Xilinx Microblaze processor using Fast Simplex Link (FSL) as the bus interface. As Microblaze is a programmable processor created to work in FPGAs, it is highly customizable. The microprocessor can be created and modified using Xilinx ISE Design Suite.

Figure 3.4 shows how the Microblaze processor, the algorithm IP and the IO hardware interconnect with one another in the FPGA. The soft IP xps_ethernetlite, xps_uartlite and xps_sysace connect to the real hardware of Ethernet_MAC, RS232_Uart and SysACE_CompactFlash. The Microblaze core connects to the BRAM through its own buses, the Data-Side Local Memory Bus (DLMB) and the Instruction-Side Local Memory Bus (ILMB). The Microblaze core has one FSL interface connected to the side relay and two connected to the algorithm IP. The design can be created using Xilinx Platform Studio (XPS). XPS allows users to set the parameters for each block of hardware and helps to configure the algorithm IP and the side_relay to connect to the Microblaze core using the FSL interface.

[Figure: block diagram of the Microblaze system. Block labels: MPMC MODULE INTERFACE; xps_ethernetlite / Ethernet_MAC; xps_uartlite / RS232_Uart; xps_sysace / SysACE_CompactFlash; xps_intc / xps_intc_0; xps_timer / xps_timer_1; microblaze; Mdm / Debug_module; bram_block / lmb_bram; test_nw or test_sw; side_relay; clock_generator / clock_generator_0; proc_sys_reset / proc_sys_reset_0. Connection labels: mb_plb, dlmb, ilmb, dxcl, ixcl, microblaze_0_dbg and four fsl links.]

Figure 3.4 The design of the Microblaze and how it interconnects with the inline filter. The design is located in the FPGA of the Xilinx Development board.

3.5 Microblaze Software Design

To run the Microblaze, software needs to be created and downloaded to the processor. Microblaze supports the C programming language and can run various Real Time Operating Systems (RTOS). For this design, Xilkernel 4.0 is chosen for the development. Xilkernel is an RTOS created by Xilinx and available for free in XPS. It is highly customizable, robust and small in size. There are also third-party RTOSes available on the internet that work with Microblaze. Besides the Xilkernel RTOS, the design uses other libraries: the Xilinx Memory File System (xilmfs), the LibXil FATFile System (Xilfatfs) and the open source lightweight IP stack (lwIP).

[Figure 3.5 flowchart. Boxes shown: Initialize main thread; Initialize LWIP; Start packet receive thread; Create network thread; Set IP and MAC address; Start Echo application thread; Initialize socket; Bind; Listen; Connection requested? (yes: Accept connection). TCP Client side: Request connection; Accept connection.]

Figure 3.5 The flow of the software in Microblaze once the design is started.

[Figure 3.6 flowchart. Boxes shown: Accept connection (from TCP Client); Process echo request; Read data/email from ethernet to memory; Change the data in memory to lower case (example: "Hello Jack. How are you? I'm…." becomes "hello jack. how are you? i'm…."); Call function read_file; Read the spam signature inside the flash database into memory; Perform scanning of data from ethernet and flash database; Generate and print result.]

Figure 3.6 The flow of the software after a connection is accepted. A new thread is created and the relevant functions are called to perform each task.

Figure 3.5 and Figure 3.6 show the flow of the software designed to run on the Microblaze processor. The software acts as a TCP server and constantly listens for connection requests. Once a connection is accepted, a new thread is created to handle it and the TCP server goes back to listening mode. The new thread then performs the functions shown in Figure 3.6. The scanning function of Figure 3.6 is broken down further in Figure 3.7.

[Figure 3.7 flowchart. Boxes shown: Perform scanning of data from ethernet and flash database; Tokenize the email (example: "hello jack. how are you? i'm…." becomes [hello] [jack.] [how] [are] [you?] [i'm]….); Send signature to FSL master of the side_relay; Send tokenized email to FSL master of the algorithm IP; Read result from FSL slave of the algorithm IP; Operation complete? (No: repeat; Yes: Generate and print result).]

Figure 3.7 Further elaboration of the scanning function is shown in this figure.
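The lower-casing step of Figure 3.6 and the tokenization step of Figure 3.7 can be sketched in plain C as follows. This is only an illustrative sketch, not the code that runs on the Microblaze: the function names to_lower_case and tokenize are hypothetical, and splitting on whitespace is an assumption inferred from the example tokens shown in Figure 3.7.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Convert a NUL-terminated buffer to lower case in place. */
void to_lower_case(char *s) {
    assert(s != NULL);
    for (; *s != '\0'; ++s) {
        *s = (char)tolower((unsigned char)*s);
    }
}

/*
 * Split buf on whitespace (assumed delimiter set). The buffer is modified in
 * place: each token pointer refers to a NUL-terminated slice of buf.
 * Returns the number of tokens stored, at most max_tokens.
 */
int tokenize(char *buf, char *tokens[], int max_tokens) {
    int n = 0;
    assert(buf != NULL && tokens != NULL && max_tokens > 0);
    char *tok = strtok(buf, " \t\r\n");
    while (tok != NULL && n < max_tokens) {
        tokens[n++] = tok;
        tok = strtok(NULL, " \t\r\n");
    }
    return n;
}
```

Calling to_lower_case and then tokenize on the buffer "Hello Jack. How are you?" yields the five tokens hello, jack., how, are and you?, matching the example in the figure. Punctuation stays attached to the word, as it does in Figure 3.7.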


3.6 Software and Hardware Development Applications

Several types of software were involved in developing the system. The main ones are described below.

3.6.1 Xilinx ISE Design Suite 10.1

The ISE Design Suite, which comprises Xilinx ISE, EDK and Chipscope Pro, is used in the development of the system.

3.6.2 Xilinx ISE 10.1 (Xilinx Integrated System Environment)

This is the software used to develop the IP (intellectual property) engines of Needleman-Wunsch and Smith-Waterman and the side_relay unit. The tool supports both the VHDL and Verilog hardware description languages. For this system, VHDL was chosen as the development language. The ISE tool provides the ISE simulator to perform behavioral and post-route simulation. Users can also choose another simulator, such as Modelsim from a third-party vendor, when developing their IP engine. Besides being used to develop the VHDL IP engine, Xilinx ISE can also import the Microblaze block from XPS to connect it to the VHDL system and integrate the Chipscope functionality into the ISE project.

3.6.3 Xilinx XPS 10.1 (Xilinx Platform Studio)

Xilinx Platform Studio is the tool used to develop and configure the Microblaze processor. Using the Base System Builder (BSB) wizard, users can choose and customize what they want in their Microblaze block. Among the options that can be chosen in BSB are:
-the target development board used by the design.
-the type of processor used, either PowerPC or Microblaze.
-the processor bus clock frequency and the data and instruction BRAM.
-the IO devices that will be used.
-the sample application for the device.
-the memory device that holds the simple Memory Test and Peripheral Selftest applications of Microblaze.
-the standard input, standard output and boot memory for the devices.

The BSB wizard also helps to configure the UCF (User Constraints File) that connects the design to the hardware. Xilinx Platform Studio supports the creation of simple software applications to run on the processor. It also provides the Xilinx Microprocessor Debugger (XMD) and the GNU debugger for software debugging. For more complicated applications, users can use the XPS SDK software.

3.6.4 Xilinx XPS SDK 10.1 (Xilinx Platform Studio Software Development Kit)

The XPS SDK is used to develop and debug more complex applications. Using XPS SDK, users can connect to the hardware design that has been generated and downloaded into the development board. After connecting to the design on the board, users can create their application in the C programming language and debug it before generating the .elf file and downloading it to the board. This cycle of debugging and downloading the application is repeated throughout the software development.

3.6.5 Xilinx ISE Simulator

The ISE simulator is used during the development of the VHDL system. Using the ISE simulator, users can create a testbench in VHDL that generates input signals for the system under development. They can then view the output signals and perform the necessary modifications and debugging.


3.6.6 Xilinx Chipscope Pro

Chipscope Pro is used to monitor the real hardware signals of the engine once it is downloaded into the FPGA of the board. Using the Chipscope Pro core inserter, users can create the necessary monitoring cores and connect them to the system design. The .bit file is then generated in ISE and downloaded to the board. Once the design starts running on the development board, the Chipscope cores gather signals at the locations in the design chosen by the developer and send them over a JTAG cable to the Chipscope Pro Analyzer software on the computer. Chipscope Pro is used for real-time monitoring and debugging of the design. If errors are observed in Chipscope Pro Analyzer, the user has to return to the initial system design to make corrections. Chipscope Pro can be integrated as part of the ISE design. Four types of core can be integrated in Chipscope Pro:

-IBA (Integrated Bus Analyzer)
Used to debug the IBM CoreConnect Processor Local Bus (PLB).

-ILA (Integrated Logic Analyzer)
A module that lets users view the triggered signals of the hardware design.

-VIO (Virtual Input/Output)
Helps to monitor and drive signals in the design in real time. The VIO core can be used to generate signals into the design and can be integrated as a permanent part of the system design.

-ICON (Integrated Controller)
The ICON core provides the communication path to the other Chipscope cores. A single ICON core can support connections to up to 16 Chipscope cores.

3.7 Other Applications Used

This sub-chapter describes other software and programs that complement the system. They were used to send inputs to and read outputs from the target device for debugging and analysis purposes.

3.7.1 Hyper Terminal

The Hyper Terminal software is used to debug the software applications that run on the Microblaze. As the Xilinx development board has a debugging RS-232 serial port that supports a null modem cable, it can be connected to the RS-232 serial port of the computer. Using the Hyper Terminal application, users can view the outputs of the software application, which aids the debugging process.


Figure 3.8 The sample code used to display output from the serial port.

The figure shows sample code in the XPS SDK, written in C. Note that the functions 'print' and 'xil_printf' are used in the code. The output of these statements is displayed in Hyper Terminal.

3.7.2 Bray Terminal v1.9b

Bray Terminal is an alternative to the Hyper Terminal software.


3.7.3 Simple TCP Client

Simple TCP Client is a simple application from bitArt that creates a TCP connection from the computer on which it is installed to a TCP server. It is used to connect to the TCP server software running on the Microblaze inside the FPGA of the development board.

3.8 Programming Languages Used

Two programming languages were used during the development of the system: VHDL and C. The sub-topics below describe each briefly.

3.8.1 VHDL

VHDL stands for VHSIC Hardware Description Language, where VHSIC is Very High Speed Integrated Circuit. VHDL is one of the most widely used hardware description languages (HDL), alongside Verilog HDL. Its development was initially supported by the US Department of Defense in the 1980s, which used it as a standard for hardware documentation. VHDL was standardized by the Institute of Electrical and Electronics Engineers (IEEE) in 1987 as VHDL-87. It was later revised by the IEEE as VHDL-93, with further revisions following in the 2000s.


For this thesis, VHDL was used to develop the side_relay and the algorithm IP of Needleman-Wunsch and Smith-Waterman. The design units were developed and tested in Xilinx ISE before being attached to the Xilinx Microblaze.

3.8.2 C Programming

C is a structured programming language widely used around the world to develop applications and systems. It was used to develop an application that runs on the Microblaze with Xilkernel as the operating system. This application coordinates the hardware input/output on the development board with the algorithm IP so that they work together. The application also acts as a TCP server that receives and serves connection requests.

3.9 Development Board

The target device on the development board is the XC5VLX50T. This model belongs to the Xilinx Virtex 5 FPGA family and comes in the FF1136 pin package. It comprises 7,200 slices, 28,800 CLB flip-flops and a maximum of 480 Kb of distributed RAM. The device also has 60 blocks of 36 Kb Block RAM (each usable as two 18 Kb blocks), which amounts to 2,160 Kb of Block RAM. The XC5VLX50T can support a Microblaze microprocessor running at up to 150 MHz.


Chapter 4

Design and Implementation

This chapter explains the design of the Needleman-Wunsch and Smith-Waterman hardware. The designs involved are the IP (intellectual property) of Needleman-Wunsch, the IP of Smith-Waterman and the supporting side hardware. The chapter also explains how parallelism is implemented in the Needleman-Wunsch and Smith-Waterman IP engines.

4.1 Fast Simplex Link (FSL)

FSL, or Fast Simplex Link, is a fast communication bus protocol developed by Xilinx. FSL can be used to interconnect two design units in an FPGA. FSL is uni-directional and each interface consists of a master and a slave. Version 7 of the Microblaze processor supports a maximum of 16 FSL channels (Embedded Systems Development, 2008). Figure 4.1 shows the ports of the FSL bus. The master end connects to the design unit that sends data. The slave end connects to the design unit that receives data. The FSL bus was chosen as the interface for the hardware because its design is simpler than the Processor Local Bus (PLB) and it is easier to use.


Figure 4.1 The block diagram of a FSL bus (Fast Simplex Link (FSL) Bus (v2.11a), 2008).

Port Name | Width | Input to FIFO/Output from FIFO | Description
FSL_M_Clk | 1 | Input | Input clock for the FSL master when the FSL is set to asynchronous FIFO mode.
FSL_M_Data | 32 | Input | Takes input data from the connected peripheral or Microblaze processor.
FSL_M_Control | 1 | Input | Extra control bit.
FSL_M_Write | 1 | Input | Controls the writing of data to the FSL bus. The FSL bus reads the value on FSL_M_Data on the rising clock edge when FSL_M_Write is set to '1'.
FSL_M_Full | 1 | Output | Indicates that the FSL FIFO is full when set to '1'.

Table 4.1 The description of ports that are in the master side of the FSL bus.

Port Name | Width | Input to FIFO/Output from FIFO | Description
FSL_S_Clk | 1 | Input | Input clock for the FSL slave when the FSL is set to asynchronous mode.
FSL_S_Data | 32 | Output | Outputs data to the connected peripheral or Microblaze processor.
FSL_S_Control | 1 | Output | Extra control bit.
FSL_S_Read | 1 | Input | Acknowledges that data has been read. A value of '1' on the rising clock edge removes the first value in the FSL FIFO queue.
FSL_S_Exists | 1 | Output | Set to '1' when there is a value in the FSL bus.

Table 4.2 The description of ports that are in the slave side of the FSL bus.
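The handshake described in Table 4.1 and Table 4.2 can be illustrated with a small software model of one FSL channel. This is a behavioural sketch in C and not the Xilinx implementation: the type fsl_fifo, the function names and the depth FSL_DEPTH = 16 are all hypothetical, and clocking is abstracted so that each function call stands for one rising clock edge.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FSL_DEPTH 16u  /* hypothetical FIFO depth, chosen only for illustration */

/* One uni-directional FSL channel modelled as a ring buffer. */
typedef struct {
    uint32_t data[FSL_DEPTH];    /* FSL_M_Data / FSL_S_Data words      */
    bool     control[FSL_DEPTH]; /* FSL_M_Control / FSL_S_Control bits */
    unsigned head;               /* index of the oldest word           */
    unsigned count;              /* number of words currently queued   */
} fsl_fifo;

void fsl_reset(fsl_fifo *f) { f->head = 0; f->count = 0; }

/* FSL_M_Full: '1' when no more words can be written. */
bool fsl_m_full(const fsl_fifo *f) { return f->count == FSL_DEPTH; }

/* FSL_S_Exists: '1' when at least one word is waiting to be read. */
bool fsl_s_exists(const fsl_fifo *f) { return f->count > 0; }

/* Master side: one clock edge with FSL_M_Write = '1'.
 * Returns false (nothing is written) if the FIFO is full. */
bool fsl_m_write(fsl_fifo *f, uint32_t data, bool control) {
    if (fsl_m_full(f)) return false;
    unsigned tail = (f->head + f->count) % FSL_DEPTH;
    f->data[tail] = data;
    f->control[tail] = control;
    f->count++;
    return true;
}

/* Slave side: FSL_S_Data presents the oldest word without removing it. */
uint32_t fsl_s_data(const fsl_fifo *f) {
    assert(fsl_s_exists(f));
    return f->data[f->head];
}

/* Slave side: one clock edge with FSL_S_Read = '1' removes the oldest word. */
void fsl_s_read(fsl_fifo *f) {
    assert(fsl_s_exists(f));
    f->head = (f->head + 1) % FSL_DEPTH;
    f->count--;
}
```

In this model a sender checks fsl_m_full before calling fsl_m_write, and a receiver checks fsl_s_exists, samples fsl_s_data and then acknowledges with fsl_s_read, which mirrors the roles of the ports in the two tables.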


4.2 IP Design

For the IP design, three blocks of hardware were created: the side_relay hardware, the Needleman-Wunsch hardware and the Smith-Waterman hardware. All three blocks were created to be compatible with the FSL bus interface. Further details about the three blocks are given below.

4.2.1 FSL side_relay

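[Figure 4.2 diagram: the side_relay block with slave-side ports FSL_S_Clk, FSL_S_Read, FSL_S_Data, FSL_S_Control and FSL_S_Exists, master-side ports FSL_M_Clk, FSL_M_Write, FSL_M_Data, FSL_M_Control and FSL_M_Full, and common ports FSL_Clk and FSL_Rst.]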

Figure 4.2 The block diagram of the side_relay hardware.

The FSL side_relay unit was created to relay data signals from the Microblaze processor to the algorithm IP. The engine of the algorithm IP requires two slave FSL interfaces and one master FSL interface, but adding a peripheral with three FSL channels to one IP is not supported by Xilinx Platform Studio (XPS). As shown in Figure 4.3, the Microblaze supports only two channels to one IP design unit. An IP design unit with three channels, as on the rightmost side of Figure 4.3, is not supported even though the third channel is recognized by XPS, because XPS cannot generate the software library driver that the third channel needs to operate. The side_relay unit was therefore created to address this problem.

Figure 4.3 The type of interface that could be added with Xilinx Platform Studio.

Two design units are created in Xilinx Platform Studio (XPS), as shown in Figure 4.4. The main engine unit has two FSL bus channels and the side_relay unit has one FSL bus channel connected to the Microblaze core. After the two units are created, an additional FSL master interface is added to the side_relay and an additional slave interface is added to the main engine. A new FSL channel is then created to connect the new side_relay master interface to the new main engine slave interface. Figure 4.5 shows the diagram of the reconfigured units.


Figure 4.4 The two design units that are connected to the Microblaze core. One is the side_relay and the other is either the Needleman-Wunsch or the Smith-Waterman hardware.

Figure 4.5 The new connection after reconfiguration is being made.

As the protocols of the master and slave interfaces differ, the side_relay unit is designed to read data from the slave side of the FSL channel connected to the Microblaze core and to send that data to the master side of the FSL channel connected to the algorithm IP. The protocol and sequence of the signals sent to the master interface connected to the algorithm IP must be exactly the same as those at the master interface of the FSL channel that connects the Microblaze core to the side_relay. The states of the side_relay hardware unit are shown in Figure 4.6 below.

[Figure 4.6 diagram. Two states: Read data from FSL1_S_Data; Write output to FSL_M_Data.]

Figure 4.6 State diagram of the side_relay unit.

4.2.2 Needleman-Wunsch and Smith-Waterman Hardware

This section explains the design of the Needleman-Wunsch and Smith-Waterman hardware. Both share the same design approach and use the same side_relay hardware unit. The main IP unit of each consists of three main hardware blocks: test_nw, array_proc and processing_element for Needleman-Wunsch, and test_sw, array_proc and processing_element for Smith-Waterman.


4.2.2.1 Hardware test_nw and test_sw

The hardware IP blocks test_nw and test_sw act as interfacing controllers that manage the communication between the FSL bus and the array_proc. Each reads data from the slave interface of the FSL bus and transmits the calculated results back to the Microblaze processor via the master FSL interface.

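[Figure 4.7 diagram: the test_nw/test_sw block with slave-side ports FSL_S_Clk, FSL_S_Read, FSL_S_Data, FSL_S_Control and FSL_S_Exists, master-side ports FSL_M_Clk, FSL_M_Write, FSL_M_Data, FSL_M_Control and FSL_M_Full, second slave-side ports FSL1_S_Clk, FSL1_S_Read, FSL1_S_Data, FSL1_S_Control and FSL1_S_Exists, and common ports FSL_Clk and FSL_Rst.]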

Figure 4.7 The port interface of the algorithm IP.

Five extra ports are added to the algorithm IP: FSL1_S_Clk, FSL1_S_Read, FSL1_S_Data, FSL1_S_Control and FSL1_S_Exists. These five ports connect to the slave side of the FSL channel from the side_relay unit.


The Needleman-Wunsch test_nw and Smith-Waterman test_sw blocks are designed to operate in five states. At the beginning, the algorithm IP is in the Idle state. Once a signal is triggered on the FSL bus connected to the side_relay, the IP jumps to the Read Data state. After receiving a fixed amount of data from the side_relay FSL channel, the algorithm IP jumps to a second read state, in which it reads a fixed amount of data from the slave side of the other FSL channel. The algorithm IP then jumps to the Delay state to allow sufficient time for the IP to process the data. Next, the IP jumps to the Write Output state and writes the result to the master interface of the FSL channel connected to it before returning to the Read Data state. Figure 4.8 shows the flow of the algorithm IP.

[Figure 4.8 flowchart. Left: Idle; Read data from FSL1_S_Data (side_relay FSL channel); Read data from FSL_S_Data (algorithm IP FSL channel); Delay state for processing; Write output to FSL_M_Data. Right, breakdown of the Delay state for processing: Compute table; Perform traceback; Output result.]

Figure 4.8 The flow chart of the Needleman-Wunsch test_nw and Smith-Waterman test_sw IP.

On the right is the further breakdown of the Delay state.

[Figure 4.9 flowchart. Compute table: for PE required = 1, activate PE 1; for PE required = 2, activate PE 1 and PE 2; for PE required = 3, activate PE 1 to PE 3; and so on up to PE required = 13. Table completed? (No: repeat; Yes: Perform traceback).]

Figure 4.9 Further breakdown of the compute table in the flow chart.

Figure 4.10 shows how the Needleman-Wunsch or Smith-Waterman hardware is connected to the side_relay hardware and the Microblaze processor. As two systems were built, the algorithm IP is test_nw for the Needleman-Wunsch algorithm in one system and test_sw for the Smith-Waterman algorithm in the other.

[Figure 4.10 diagram: the Microblaze connected through FSL interfaces to the side_relay and to the algorithm IP (test_nw or test_sw). Each FSL link shows ports FSL_S_Clk, FSL_S_Read, FSL_S_Data, FSL_S_Control, FSL_S_Exists, FSL_M_Clk, FSL_M_Write, FSL_M_Data, FSL_M_Control and FSL_M_Full. The link from the side_relay enters the algorithm IP on FSL1_S_Clk, FSL1_S_Read, FSL1_S_Data, FSL1_S_Control and FSL1_S_Exists. Common ports: FSL_Clk and FSL_Rst.]

Figure 4.10 The interconnection of side_relay, algorithm IP and the Microblaze core.


4.2.2.2 Hardware array_proc

[Figure 4.11 diagram: the array_proc block with ports FSL_Exist, FSL_Data, FSL1_Exist, FSL1_Data, permit_enter, FSL_Data_keluar, Clk and Rst.]

Figure 4.11 The port interface of the array_proc hardware.

Port Name | Width | Input/Output | Description
Clk | 1 | Input | Global clock input for the array_proc. Mapped from FSL_Clk.
Rst | 1 | Input | Used to reset the peripherals connected to it.
FSL_Exist | 1 | Input | Mapped from FSL_S_Exists. Indicates that there is data in the FSL bus.
FSL_Data | 32 | Input | Mapped from FSL_S_Data. Contains the data output of the FSL bus.
FSL1_Exist | 1 | Input | Mapped from FSL1_S_Exists. Indicates that there is data in the FSL bus.
FSL1_Data | 32 | Input | Mapped from FSL1_S_Data. Contains the data output of the FSL bus.
FSL_Data_keluar | 32 | Output | Mapped to FSL_M_Data. Sends data to the FSL bus.
permit_enter | 1 | Input | Mapped from sig_permit_enter.

Table 4.3 Port description for the array_proc hardware.

The array_proc hardware is instantiated inside test_nw and test_sw, and it was created differently for the two systems. Its function is to receive values from test_nw or test_sw, store the matrix table, and build the matrix table by sending values to and receiving values from the processing_element units. Once the matrix table is completed, each system performs the traceback of its own algorithm and sends the results back to test_nw or test_sw. The array_proc top-level block diagram and its IO port description are given above in Figure 4.11 and Table 4.3.

4.2.2.3 Hardware processing_element

[Figure 4.12 diagram: the processing_element block with ports comp1, comp2, diagonal_value, up_value, left_value, ctr_in, ctr_out and d_value.]

Figure 4.12 The port interface of the processing_element hardware.

Port Name | Width | Input/Output | Description
comp1 | 32 | Input | Receives a character of the string received from the database.
comp2 | 32 | Input | Receives a character of the string received from the email content.
diagonal_value | 8 | Input | Receives a value from the diagonal position of the matrix table.
up_value | 8 | Input | Receives a value from the upper position of the matrix table.
left_value | 8 | Input | Receives a value from the left position of the matrix table.
ctr_in | 1 | Input | When set to '1', the processing_element reads the values from "comp1", "comp2", "diagonal_value", "up_value" and "left_value".
ctr_out | 1 | Input | When set to '1', the processing_element outputs the calculation result via "d_value".
d_value | 8 | Output | Outputs the calculated result.

Table 4.4 Port description for the processing_element hardware.


The processing_element block is the processing element for the Needleman-Wunsch and Smith-Waterman algorithms. It contains the formula used to calculate each value of the matrix table. Figure 4.12 and Table 4.4 above describe its structure and ports. The formula for the Needleman-Wunsch algorithm is:

\[
E_{i,j} = \max
\begin{cases}
E_{i-1,j-1} + \mathrm{score}_{i-1,j-1} \\
E_{i,j-1} + W \\
E_{i-1,j} + W
\end{cases}
\]

The formula for the Smith-Waterman algorithm is:

\[
E_{i,j} = \max
\begin{cases}
0 \\
E_{i-1,j-1} + \mathrm{score}_{i-1,j-1} \\
E_{i,j-1} + W \\
E_{i-1,j} + W
\end{cases}
\]
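The two recurrences can be written as a pair of C functions that mirror the inputs of the processing_element (two characters plus the diagonal, upper and left values). This is a sketch under stated assumptions rather than the thesis design: the function names are hypothetical, the scores MATCH = +1 and MISMATCH = -1 and the gap penalty W = -1 are illustrative values, and the 8-bit value ports are assumed to hold signed numbers.

```c
#include <assert.h>

/* Hypothetical scoring values; W is the gap penalty of the formulas. */
enum { MATCH = 1, MISMATCH = -1, W = -1 };

/* Assumed signed range of the 8-bit value ports. */
static int in_8bit_range(int v) { return v >= -128 && v <= 127; }

/* Needleman-Wunsch cell: maximum of the diagonal, left and upper candidates. */
int pe_needleman_wunsch(char comp1, char comp2,
                        int diagonal_value, int up_value, int left_value) {
    assert(in_8bit_range(diagonal_value));
    assert(in_8bit_range(up_value));
    assert(in_8bit_range(left_value));
    int best = diagonal_value + (comp1 == comp2 ? MATCH : MISMATCH);
    if (left_value + W > best) best = left_value + W;
    if (up_value + W > best) best = up_value + W;
    return best;
}

/* Smith-Waterman cell: the same three candidates, floored at zero. */
int pe_smith_waterman(char comp1, char comp2,
                      int diagonal_value, int up_value, int left_value) {
    int best = pe_needleman_wunsch(comp1, comp2,
                                   diagonal_value, up_value, left_value);
    return best < 0 ? 0 : best;
}
```

The only difference between the two functions is the extra zero candidate, which is what lets Smith-Waterman restart an alignment anywhere in the table and therefore find local rather than global matches.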


The algorithm IPs are designed in three layers. The outermost layer acts as the interface to the FSL channels. The middle layer, array_proc, acts as the controller and holds the memories used during processing; it also stores the strings during processing and performs the traceback at the end of the process. The processing element contains the formula of the algorithm and receives five inputs from array_proc to process. Figure 4.13 shows the mapping of the design. The RTL schematic of the algorithm IP is shown in Figure 3.15, where the location of array_proc is circled. The RTL schematics of array_proc are included in the appendix as they are too large to be displayed here.

[Figure 4.13 diagram: test_nw or test_sw (with one FSL Master and two FSL Slave interfaces) contains array_proc, which contains processing_element1, processing_element2 and so on, in increments, up to processing_element13.]

Figure 4.13 The mapping of the VHDL block of the algorithm IP.


4.3 Parallelism

As with other dynamic programming algorithms, Needleman-Wunsch and Smith-Waterman require a large amount of processing to compute the matrix table. Both algorithms were designed to compare two strings of at most 13 characters each. This means that each time the Needleman-Wunsch or Smith-Waterman IP receives two strings, it has to perform 13 x 13 calculations. As shown in Figure 4.14 below, the IP has to calculate 169 times to complete the table. This calls for parallel processing. Instead of using one processing element to perform the calculations, the concurrency of the VHDL language is exploited to integrate 13 processing elements into a single IP. The calculation time of both IPs is thereby reduced significantly, to 13+13 steps. The calculation cycle is reduced from 169 to 26, which means that the table calculation time of the parallel system is less than one sixth of that of the single processing element system. Figure 4.15 below shows the execution of the parallel system. The parallel system calculates along anti-diagonals, moving from the top-left to the bottom-right of the table. The maximum number of values that can be calculated concurrently is 13, which is why 13 processing elements are required.

[Figure 4.14 diagram: the matrix table with S1 indexed by i (rows 1 to 13) and S2 indexed by j (columns 1 to 13), partially filled to illustrate the order of execution.]

Figure 4.14 The flow of execution for a single processing element system.

[Figure 4.15 diagram: Process 1 to Process 13 mapped to PE 1 to PE 13, working on one anti-diagonal of the matrix table, with S1 indexed by i (rows 1 to 13) and S2 indexed by j (columns 1 to 13).]

Figure 4.15 The flow of execution for multiple processing element system in the middle of the

matrix table computation.
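The anti-diagonal schedule of Figure 4.15 can be checked with a short C sketch that fills the same Needleman-Wunsch table twice, once cell by cell and once by anti-diagonal wavefronts, and counts the steps. It is a software illustration only: the function names and the scores (MATCH = +1, MISMATCH = -1, GAP = -1 in the role of W) are assumptions, and the inner loop of the wavefront version runs sequentially here where the hardware would use one processing element per cell. Note that the sketch counts 2 x 13 - 1 = 25 wavefronts, one fewer than the 13+13 quoted in the text, which may include an additional cycle that is not modelled here.

```c
#include <assert.h>
#include <string.h>

#define N 13                                  /* string length used in the text */
enum { MATCH = 1, MISMATCH = -1, GAP = -1 };  /* hypothetical scores */

/* One Needleman-Wunsch cell from its diagonal, upper and left neighbours. */
int nw_cell(char a, char b, int diag, int up, int left) {
    int best = diag + (a == b ? MATCH : MISMATCH);
    if (up + GAP > best) best = up + GAP;
    if (left + GAP > best) best = left + GAP;
    return best;
}

/* Fill row 0 and column 0 with cumulative gap penalties. */
void init_border(int E[N + 1][N + 1]) {
    for (int k = 0; k <= N; ++k) { E[k][0] = k * GAP; E[0][k] = k * GAP; }
}

/* Single processing element: one cell per step. Returns the step count. */
int fill_sequential(const char *s1, const char *s2, int E[N + 1][N + 1]) {
    int steps = 0;
    assert(strlen(s1) == (size_t)N && strlen(s2) == (size_t)N);
    init_border(E);
    for (int i = 1; i <= N; ++i)
        for (int j = 1; j <= N; ++j, ++steps)
            E[i][j] = nw_cell(s1[i - 1], s2[j - 1],
                              E[i - 1][j - 1], E[i - 1][j], E[i][j - 1]);
    return steps;
}

/* Wavefront order: all cells with i + j == d form one step.
 * Returns the number of wavefronts; *max_width receives the largest
 * number of cells in any wavefront, i.e. the PEs needed. */
int fill_wavefront(const char *s1, const char *s2, int E[N + 1][N + 1],
                   int *max_width) {
    int steps = 0;
    assert(strlen(s1) == (size_t)N && strlen(s2) == (size_t)N);
    init_border(E);
    *max_width = 0;
    for (int d = 2; d <= 2 * N; ++d, ++steps) {
        int lo = (d - N > 1) ? d - N : 1;   /* first row on this diagonal */
        int hi = (d - 1 < N) ? d - 1 : N;   /* last row on this diagonal  */
        if (hi - lo + 1 > *max_width) *max_width = hi - lo + 1;
        for (int i = lo; i <= hi; ++i) {    /* independent cells */
            int j = d - i;
            E[i][j] = nw_cell(s1[i - 1], s2[j - 1],
                              E[i - 1][j - 1], E[i - 1][j], E[i][j - 1]);
        }
    }
    return steps;
}
```

Every cell on a wavefront depends only on cells from the two previous wavefronts, which is why the cells of one anti-diagonal can be computed at the same time without changing the result.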


To control the 13 processing elements in VHDL, 13 processes were created in the architecture of array_proc. The processes are mapped to the processing elements, which are declared as components in the code. When instructed by the main process, each process activates its processing element as needed. In this way the concurrent execution of VHDL processes is used to run the processing elements in parallel. The pseudo code below demonstrates how a process in the parallel hardware works in VHDL. Table 4.5 then lists the coordinates of the values that each process receives for the diagonal, upper and left inputs of its hardware unit.

With n = 1 to 13

process_table_n : process( Clk )
begin
  if Clk'event and Clk = '1' then
    if ( condition ) then
      :
      :
      :
      sig_diagonal_value(0 to 3) <= diagonal value ;
      sig_up_value(0 to 3) <= upper value ;
      sig_left_value(0 to 3) <= left value ;
    end if;
  end if;
end process ;

Process | Diagonal | Upper | Left
Process 1 | i-1, j-1 | i-1, j | i, j-1
Process 2 | i, j-2 | i, j-1 | i+1, j-2
Process 3 | i+1, j-3 | i+1, j-2 | i+2, j-3
Process 4 | i+2, j-4 | i+2, j-3 | i+3, j-4
Process 5 | i+3, j-5 | i+3, j-4 | i+4, j-5
Process 6 | i+4, j-6 | i+4, j-5 | i+5, j-6
Process 7 | i+5, j-7 | i+5, j-6 | i+6, j-7
Process 8 | i+6, j-8 | i+6, j-7 | i+7, j-8
Process 9 | i+7, j-9 | i+7, j-8 | i+8, j-9
Process 10 | i+8, j-10 | i+8, j-9 | i+9, j-10
Process 11 | i+9, j-11 | i+9, j-10 | i+10, j-11
Process 12 | i+10, j-12 | i+10, j-11 | i+11, j-12
Process 13 | i+11, j-13 | i+11, j-12 | i+12, j-13

Table 4.5 The coordinates of values received by diagonal, upper and left registers.
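The rows of Table 4.5 follow a regular pattern that can be generated from the process number n. The C sketch below reproduces the table under one inferred assumption, labelled as such: process n is taken to compute the cell at (i+n-1, j-n+1), so that its diagonal, upper and left inputs are that cell's three neighbours. The type and function names are hypothetical.

```c
#include <assert.h>

typedef struct { int row; int col; } coord;

typedef struct {
    coord target;   /* inferred: the cell that process n computes */
    coord diagonal; /* source of diagonal_value                   */
    coord upper;    /* source of up_value                         */
    coord left;     /* source of left_value                       */
} pe_inputs;

/* Coordinates read by process n (1 to 13) for the reference cell (i, j). */
pe_inputs process_inputs(int n, int i, int j) {
    pe_inputs p;
    assert(n >= 1 && n <= 13);
    p.target   = (coord){ i + n - 1, j - n + 1 };   /* assumption, see above */
    p.diagonal = (coord){ p.target.row - 1, p.target.col - 1 };
    p.upper    = (coord){ p.target.row - 1, p.target.col };
    p.left     = (coord){ p.target.row,     p.target.col - 1 };
    return p;
}
```

Each successive process moves one row down and one column left, so the 13 target cells lie on a single anti-diagonal, which is consistent with the wavefront execution described for the parallel system.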


Chapter 5

Results, Simulations, Analysis, and Testing

This chapter presents the synthesis results and post-route simulations for the co-processors and their components, which were created in VHDL using Xilinx ISE. It also includes real-time sampling of hardware signals using the Chipscope Pro software tool. The chapter then continues with an explanation of the testing done using the designs.

Figure 5.1 The connections between the Microblaze core, the side_relay unit and the main engine or algorithm IP. The ends of the FSL bus channels are labeled with numbers from 1 to 8.


5.1 Development of side_relay Unit

The side_relay unit was designed to relay signals from the Microblaze core to the main engine of the algorithm IP. Figure 5.1 shows the interconnection of these three units. As the side_relay unit receives signals from the FSL bus channel at the end labeled 2, it has to reproduce them on the master interface at the end labeled 3. Note that the signals reproduced at 3 are not a mirror of what is received at 2; rather, they mirror what is produced at interface 1. The signal protocols of the master and slave interfaces differ because write operations are performed at the master interface and read operations at the slave interface. The side_relay unit relays the signals so that the signals received at interface 4 are the same as those received at interface 2. The side_relay unit is also designed to stop relaying when the FSL_M_Full port of the FSL channel connecting 3 and 4 signals '1' and to continue when FSL_M_Full = '0'.


Figure 5.2 Resource utilization and the timing summary of the side_relay unit.

Figure 5.2 shows the resource utilization of the side_relay hardware. The resource utilization and timing summary are a useful reference for checking that the design meets the speed requirement and that the FPGA has enough resources to implement it. The maximum frequency of the design is reported at around 848 MHz, well above the 125 MHz at which the Microblaze processor is set to run.

5.2 Developing the Needleman-Wunsch Algorithm IP Unit

Figure 5.3(a) below shows the resource utilization of the Needleman-Wunsch co-processor. The design uses around one third of the VLX50T resources. Based on Figure 5.3(b), the design uses 2,118 slice registers, most of them in the array_proc block, which occupies 2,713 slices in total. The design also uses 8,262 slice LUTs, or 28% of the maximum available on the Virtex 5 device. Of this amount, 7,993 are used by array_proc and the remainder by the other blocks. The array_proc uses a large amount of resources because it stores the 3 dimensional array of the matrix table, which is very large. In addition, it has to control 13 processing elements concurrently. The array_proc also contains the part of the algorithm that performs the traceback for Needleman-Wunsch.

Figure 5.3(a) Resource utilization of Needleman-Wunsch IP.


Figure 5.3(b) Resource utilization of Needleman-Wunsch IP.

Figure 5.4 Timing summary for Needleman-Wunsch IP.


Xilinx ISE reports a maximum frequency of 150.943 MHz for the Needleman-Wunsch co-processor during synthesis. This allows the unit to run comfortably with the Microblaze processor, which is set to 125 MHz. The design is able to run at this speed thanks to pipelining, which makes the design run in stages.

5.3 Developing the Smith-Waterman Algorithm IP Unit

As with the Needleman-Wunsch co-processor, most of the Smith-Waterman resource consumption originates from the array_proc block. As shown in Figure 5.5, the array_proc unit consumes 3,147 of the 3,389 slices used by the whole design. It accounts for 1,611 of the 1,714 slice registers and 8,950 of the 9,362 LUTs.


Figure 5.5(a) Resource utilization of Smith-Waterman IP.

Figure 5.5(b) Resource utilization of Smith-Waterman IP.


Figure 5.6 Timing summary for Smith-Waterman IP.

The timing summary shows that the Smith-Waterman co-processor achieves a maximum frequency of 175.623 MHz. Compared with the 150.943 MHz of the Needleman-Wunsch unit, the design is about 25 MHz faster. Both co-processors are capable of running at 125 MHz, the frequency to which the Microblaze processor is set.

5.4 Overall Microblaze with Needleman-Wunsch IP and Microblaze with Smith-Waterman IP

The two systems, the Microblaze processor with Needleman-Wunsch and the Microblaze processor with Smith-Waterman, are imported into ISE from XPS so that Chipscope Pro monitoring cores can be inserted into the design. Although XPS supports the insertion of Chipscope Pro cores, it is more limited in functionality than ISE. Figure 5.7 and Figure 5.8 display the overall resource utilization of the two designs. Both systems use almost one third of the slice registers available in the VLX50T and around half of the slice LUTs. Of these amounts, around 20% of the slice registers and slice LUTs are consumed by the Microblaze processor. About 70% of the 7,200 slices available are used to accommodate the full design. Of this amount, about 30% is consumed by the Microblaze and the remainder by the co-processor IP.

Figure 5.7(a), (b) Overall resource utilization of the Microblaze system with the Needleman-Wunsch IP.

Figure 5.8(a), (b) Overall resource utilization of the Microblaze system with the Smith-Waterman IP.

5.5 Post-Route Simulation

Post-route simulation is also known as post-place-and-route timing simulation. When a post-route simulation is run, the tools create a Standard Delay Format (SDF) file, which the simulator uses to add block and routing delays to the design during IP development with Xilinx ISE. Post-route simulation lets the developer see how the IP design will behave in the actual circuit before the design is imported and connected to the Microblaze processor.

5.6 Hardware Timing Diagram

Figure 5.9 Some of the simulation results of test_nw from Needleman-Wunsch algorithm.

The Needleman-Wunsch block test_nw and the Smith-Waterman block test_sw are almost identical in design. In the initial stage, each hardware block obtains 13 characters from the spam signature database and 13 characters from the spam email before it starts processing, as shown above. At the end of the simulation, the results are produced after the calculations and tracebacks have been performed in the IP. Each IP then loops back to the initial stage and waits to receive new strings of characters to process. The full simulations of test_nw and test_sw are included in Appendix C. In Figure 5.9, test_nw is simulated in the state in which it reads data from the side_relay hardware. The values on ports fsl_clk, fsl1_s_data, fsl1_s_exists, fsl_s_data and fsl_s_exists are generated by the testbench. Ports fsl1_s_data and fsl1_s_exists deliver their values 13 times first, after which ports fsl_s_data and fsl_s_exists continue. The clock fsl_clk was generated with a period of 10 ns.

The testbench drives fsl1_s_data with the first value, followed by a '1' on fsl1_s_exists. The test_nw hardware detects the '1' on fsl1_s_exists at the next rising clock edge. On detecting it, test_nw reads the data on fsl1_s_data and then asserts fsl1_s_read for one clock cycle to indicate that it has read one value from the FSL FIFO. On a real FSL bus, one value is then removed from the queue in first-in, first-out order. The delay between the rising clock edge and test_nw asserting fsl1_s_read was measured at 1.6 ns. After test_nw has received 13 values from fsl1_s_data, it changes state to read data from the Microblaze. Once the testbench has finished generating data for fsl1_s_data and fsl1_s_exists, it continues by generating data for the FSL ports connected to the Microblaze, namely fsl_s_data and fsl_s_exists. The timing and pattern of the data on fsl_s_data and fsl_s_exists are the same as on fsl1_s_data and fsl1_s_exists.

As shown in Appendix C, between 31,750 ns and around 35,250 ns there is no response from the hardware because it is in processing mode: array_proc needs time to fill the matrix table and perform the traceback. test_nw then changes state and places the results on port fsl_m_data. Port fsl_m_write is set to '1' for one clock cycle to indicate that a result value is available on fsl_m_data. Figure 5.9 and Figure 5.10 show successive parts of one continuous run of test_nw. In post-route simulation, different sets of values were applied to fsl1_s_data and fsl_s_data to make sure that the IP performs correctly.
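The two read phases described above can be summarized in a small software model. This is a behavioural sketch only, not the VHDL of test_nw: each FSL FIFO is modelled as a Python deque, and the names fsl_read, collect_inputs, first_word and second_word are hypothetical. Which of the two channels carries the signature and which the email text is not assumed here.

```python
from collections import deque

WORD_LEN = 13  # values read from each FSL channel, as stated in the text


def fsl_read(fifo):
    """Model one FSL slave read.

    Returns (exists, data). If the FIFO holds data (exists = '1'), the
    oldest value is removed, which models the one-cycle read pulse that
    deletes one entry in first-in, first-out order.
    """
    if not fifo:               # exists = '0': nothing to read this cycle
        return False, None
    return True, fifo.popleft()


def collect_inputs(side_relay_fifo, microblaze_fifo):
    """Read 13 values from the side_relay channel, then 13 from Microblaze."""
    first_word, second_word = [], []
    for fifo, target in ((side_relay_fifo, first_word),
                         (microblaze_fifo, second_word)):
        while len(target) < WORD_LEN:
            exists, data = fsl_read(fifo)
            if not exists:
                # The hardware would simply wait; the model stops instead.
                raise RuntimeError("FIFO is empty")
            target.append(data)
    return "".join(first_word), "".join(second_word)


first, second = collect_inputs(deque("abcdefghijklmnop"),
                               deque("0123456789ABCDEF"))
print(first, second)
```

Only after both words are complete would the processing phase start, which corresponds to the interval with no bus activity seen in the simulation.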

Figure 5.10 The test_nw hardware output the result.

5.7 Processing Element of Needleman-Wunsch Post-Route Simulation

Figure 5.11 Post-route simulation of Needleman-Wunsch algorithm processing element.

Figure 5.11 shows the post-route simulation of the Needleman-Wunsch processing element. Various values were tested in the post-route simulation to ensure that the design functions properly. The values on ports comp1, comp2, diagonal_value, up_value, left_value, ctr_in and ctr_out are generated by the testbench. At T1, port ctr_in goes high and the processing element reads the values on ports comp1, comp2, diagonal_value, up_value and left_value. Ports comp1 and comp2 each receive one character from array_proc to be compared. If the value on comp1 matches the value on comp2, a score of 1 is added to the value received on diagonal_value. If the two values do not match, a score of 0 is added instead. Three values are then compared to find the highest: the value from diagonal_value plus the score, the value from up_value and the value from left_value. In Figure 5.11, the highest value appears on port d_value at T3, after port ctr_out signals '1' at T2, as indicated by the arrow.
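The behaviour of one processing element, as described above, can be written as a single function. This is a software sketch of the rule in the text, not the VHDL of the processing element; the function name nw_pe is hypothetical, while the argument names follow the port names. No gap penalty is applied because none appears in the description.

```python
def nw_pe(comp1, comp2, diagonal_value, up_value, left_value):
    """One Needleman-Wunsch cell following the rule described in the text.

    A match adds 1 to the diagonal value and a mismatch adds 0. The value
    returned (d_value) is the largest of the three candidates.
    """
    score = 1 if comp1 == comp2 else 0
    return max(diagonal_value + score, up_value, left_value)


print(nw_pe("a", "a", 2, 1, 2))  # match: diagonal candidate is 2 + 1
print(nw_pe("a", "b", 2, 1, 2))  # mismatch: diagonal candidate stays 2
```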

5.8 Processing Element of Smith-Waterman Post-Route Simulation

Figure 5.12 shows the post-route simulation of the Smith-Waterman processing element. Like the Needleman-Wunsch processing element, it compares three values when port ctr_in signals '1' at T1: the value from port diagonal_value plus the score, the value from port up_value and the value from port left_value. Ports comp1 and comp2 also receive their values when ctr_in signals '1'. The scoring for Smith-Waterman differs from that for Needleman-Wunsch: a match between comp1 and comp2 gives a score of 2, and a mismatch gives -1. If the value on diagonal_value is 0 and comp1 and comp2 do not match, the -1 score is ignored. When port ctr_out signals '1' at T2, the PE responds by presenting the highest value on port d_value at T3.

Figure 5.12 Post-route simulation of Smith-Waterman algorithm processing element.
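The Smith-Waterman scoring rule described in this section can be sketched in the same way. The function below follows the rule exactly as stated in the text (match +2, mismatch -1, with the penalty ignored when the diagonal value is 0). It is not the textbook Smith-Waterman recurrence, which also applies gap penalties, and the name sw_pe is hypothetical.

```python
def sw_pe(comp1, comp2, diagonal_value, up_value, left_value):
    """One Smith-Waterman cell following the rule described in the text."""
    if comp1 == comp2:
        score = 2            # match
    elif diagonal_value == 0:
        score = 0            # mismatch, but the -1 is ignored at zero
    else:
        score = -1           # mismatch
    return max(diagonal_value + score, up_value, left_value)


print(sw_pe("a", "a", 0, 0, 0))  # match from an empty diagonal
print(sw_pe("a", "b", 4, 1, 2))  # mismatch lowers the diagonal candidate
```

Provided the three incoming values are non-negative integers, ignoring the penalty at zero means that a cell value never drops below zero.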

5.9 Microblaze FSL side_relay unit Post-Route Simulation

Figure 5.13 shows a post-route simulation of the side_relay block. The values on ports fsl_clk, fsl_rst, fsl_s_data and fsl_s_exists are generated by the testbench. When side_relay detects that fsl_s_exists is '1' on a rising clock edge, it checks whether fsl_m_full is '1' or '0', as shown at a. A '1' on fsl_m_full means that the FSL FIFO connected to the main algorithm IP is full. If fsl_m_full is '0', as at a, side_relay reads the data on port fsl_s_data (b) and sets fsl_s_read to '1' for one clock cycle (c). After reading the data, side_relay changes to the write state and sets fsl_m_write to '1' for one clock cycle (d), writing the data to fsl_m_data. It then returns to the read state and repeats the process.

Figure 5.13 Post-Route Simulation of Microblaze FSL side_relay unit.
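The read and write states of side_relay can be illustrated with a cycle-by-cycle software model. This is a behavioural sketch under stated assumptions, not the VHDL of the block: both FSL FIFOs are modelled as deques, the FIFO depth sink_depth is a hypothetical parameter, and the function name side_relay_model is chosen for illustration.

```python
from collections import deque


def side_relay_model(source, sink, sink_depth, cycles):
    """Simulate the side_relay read/write loop for a number of clock cycles.

    source: deque modelling the FIFO on the slave side (fsl_s_* ports).
    sink:   deque modelling the FIFO on the master side (fsl_m_* ports).
    Returns a list of (cycle, fsl_s_read, fsl_m_write) tuples.
    """
    state, held, trace = "read", None, []
    for cycle in range(cycles):
        fsl_s_read = fsl_m_write = 0
        if state == "read":
            fsl_s_exists = 1 if source else 0
            fsl_m_full = 1 if len(sink) >= sink_depth else 0
            if fsl_s_exists and not fsl_m_full:
                held = source.popleft()  # take the value on fsl_s_data
                fsl_s_read = 1           # one-cycle read acknowledge
                state = "write"
        else:
            sink.append(held)            # drive the value on fsl_m_data
            fsl_m_write = 1              # one-cycle write strobe
            state = "read"
        trace.append((cycle, fsl_s_read, fsl_m_write))
    return trace


src, dst = deque("abc"), deque()
for row in side_relay_model(src, dst, sink_depth=16, cycles=8):
    print(row)
```

In this model each value takes two cycles to pass through, one for the read and one for the write, and nothing is read while the master-side FIFO is full.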

5.10 Hardware Data Sampling

Using Chipscope Pro Analyzer, hardware signals were captured while the design was running on the Xilinx development board. With appropriate trigger match and capture settings, Chipscope Pro Analyzer can record the real signals while the system is in operation.

5.11 Xilinx Chipscope Pro Sampling of FSL Interface

In this subsection, several screenshots are presented to show how the FSL works in the design. Figure 5.14 shows the FSL bus connections between the Microblaze, side_relay and the main engine. Figure 5.15 shows the sampling results for interfaces 1, 2, 3 and 4 labeled in Figure 5.14.

Figure 5.14 Labels 1, 2, 3 and 4 mark the interfaces sampled by Chipscope Pro Analyzer in Figure 5.15.

Figure 5.15 Chipscope Pro Analyzer results collected from the interfaces labeled 1, 2, 3 and 4 in Figure 5.14.

Figure 5.15 shows the samples collected from the interfaces labeled 1, 2, 3 and 4 in Figure 5.14. The trigger is the FSL_S_Exists port of interface 2, and the storage qualification is set to capture all data. When FSL_M_Write is set to '1' with FSL_M_Data carrying the data 'a', FSL_S_Exists goes to '1' because there is now data in the bus. FSL_S_Read is set to '1' in the next sample, and the data is read from FSL_S_Data at interface 2. After obtaining the data, the side_relay block writes it to a second FSL channel, labeled 3 and 4, which is connected to the co-processor. side_relay sets FSL_M_Write to '1' with the data on FSL_M_Data, as shown at 3 in Figure 5.15. When the co-processor detects that FSL_S_Exists at 4 is '1', it responds by setting FSL_S_Read to '1' in the next cycle, and the data 'a' is read into the co-processor.

Figure 5.16 shows the same FSL bus connections between the Microblaze, side_relay and the main engine, this time with the interfaces labeled 5, 6, 7 and 8. Figure 5.17 and Figure 5.18 show the sampling results for these interfaces.

Figure 5.16 Labels 5 and 6 mark the interfaces sampled by Chipscope Pro Analyzer in Figure 5.17.

Figure 5.17 Chipscope Pro Analyzer results collected from the interfaces labeled 5 and 6 in Figure 5.16.

Figure 5.17 shows the Chipscope Pro samples collected from the interfaces labeled 5 and 6. The FSL bus receives the string of characters for the word "viagra" from the master interface and passes it on to the slave interface. The storage qualification in Chipscope Pro Analyzer was set to store a sample only when FSL_S_Exists is '1'.

Figure 5.18 Chipscope Pro Analyzer results collected from the interfaces labeled 7 and 8 in Figure 5.16.

Figure 5.18 shows some of the activity on the interfaces labeled 7 and 8. The storage qualification was set to store a sample only when FSL_M_Write is '1'. Here the co-processor is passing its scanning results back to the Microblaze processor for further analysis, after comparing the strings of characters it received earlier.

Chipscope Pro Analyzer is vital for verifying that a design runs properly, because its samples are gathered in real time from the running hardware. Improper signals in the Chipscope Pro samples indicate a flaw in the hardware design, which must then be rectified.

5.12 Spam Mail Testing

For testing, a selected group of 100 spam emails and 100 non-spam (ham) emails was used. The spam and ham test corpus was obtained from TREC 2007, and all selected emails are text-based. Spam keywords were identified from the selected emails and saved to the CompactFlash (CF) card. The emails were then sent one by one from the TCP client to the Xilinx development board for scanning.

5.12.1 Testing Criteria

Two sets of criteria were used to test the effectiveness of the Needleman-Wunsch and Smith-Waterman systems. In both, a counter that starts at 0 is incremented by one each time it is triggered, and at the end of the scan the system reports the accumulated score for the email.

In Criteria 1, a keyword shorter than 4 characters must match exactly. A keyword of 4 or 5 characters triggers the counter with at most 1 mismatched character, and a keyword of 6 or more characters triggers it with at most 2 mismatched characters.

Criteria 2 applies stricter rules. A keyword shorter than 5 characters must match exactly. A keyword of 5 or 6 characters is allowed 1 mismatched character, and a keyword of 7 or more characters is allowed at most 2. Figure 5.19 and Figure 5.20 show the flow of the two criteria.

Figure 5.19 Procedure used by Criteria 1 to calculate the marks in Microblaze software.

Figure 5.20 Procedure used by Criteria 2 to calculate the marks in Microblaze software.
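The two sets of rules can be expressed compactly in software. The following is a sketch of the decision logic only, not the Microblaze code used in the thesis. The function names are hypothetical, and it is assumed that the number of mismatches equals the keyword length minus the number of matched characters reported by the co-processor.

```python
def allowed_mismatches(keyword_length, criteria):
    """Mismatched characters tolerated for a keyword of the given length.

    criteria is 1 or 2, following the two rule sets in Section 5.12.1.
    """
    if criteria == 1:
        if keyword_length < 4:
            return 0
        return 1 if keyword_length <= 5 else 2
    if criteria == 2:
        if keyword_length < 5:
            return 0
        return 1 if keyword_length <= 6 else 2
    raise ValueError("criteria must be 1 or 2")


def triggers(keyword_length, matched_chars, criteria):
    """True if the counter should be incremented for this comparison."""
    mismatches = keyword_length - matched_chars  # assumption, see lead-in
    return 0 <= mismatches <= allowed_mismatches(keyword_length, criteria)


# A six-character keyword with two mismatched characters:
print(triggers(6, 4, criteria=1))
print(triggers(6, 4, criteria=2))
```

The example at the end shows the difference between the two rule sets: the same comparison counts towards the score under Criteria 1 but not under the stricter Criteria 2.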

5.12.2 Results

The results of testing 100 spam and 100 ham emails are summarized in Table 5.1(a) and Table 5.1(b), using a score of 10 as the threshold. For ham, 83 emails scored 10 or below under Criteria 1, against 98 under Criteria 2; the same ham emails tend to score lower under Criteria 2 than under Criteria 1. For spam, 94 emails scored above 10 under Criteria 1, against 90 under Criteria 2. If a score of 10 is used as the threshold for classifying an email as spam, Criteria 2 performs better in terms of false positives: only 2 ham emails (2%) are misclassified as spam, compared with 17 (17%) under Criteria 1. Criteria 1, on the other hand, has the higher true positive rate, detecting 94% of the spam against 90% for Criteria 2. These results depend on the spam signature database used, and a more up-to-date database would be expected to improve accuracy. In this test, both systems detected most of the spam emails that had modified their keywords to evade exact-match algorithms.

(a) Ham

Accumulated score          Criteria 1    Criteria 2
10 and below (emails)      83            98
Above 10 (emails)          17            2

(b) Spam

Accumulated score          Criteria 1    Criteria 2
10 and below (emails)      6             10
Above 10 (emails)          94            90

Table 5.1(a)(b) The results for testing of spam email.
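The rates discussed in this section follow directly from the counts in Table 5.1. The snippet below is a worked calculation only; the variable names are illustrative, and the overall accuracy it prints is a derived figure that is not reported in the text.

```python
# Emails scoring above the threshold of 10 (flagged as spam), from Table 5.1.
TOTAL_HAM = 100
TOTAL_SPAM = 100
flagged = {
    "Criteria 1": {"ham": 17, "spam": 94},
    "Criteria 2": {"ham": 2, "spam": 90},
}


def rates(counts):
    """Return false positive rate, true positive rate and overall accuracy."""
    false_positive = counts["ham"] / TOTAL_HAM    # ham wrongly flagged
    true_positive = counts["spam"] / TOTAL_SPAM   # spam correctly flagged
    correct = counts["spam"] + (TOTAL_HAM - counts["ham"])
    accuracy = correct / (TOTAL_HAM + TOTAL_SPAM)
    return {"false_positive": false_positive,
            "true_positive": true_positive,
            "accuracy": accuracy}


summary = {name: rates(counts) for name, counts in flagged.items()}

for name, r in summary.items():
    print(name, r)
```

Criteria 1 trades a higher detection rate for more ham misclassified as spam, while Criteria 2 does the opposite.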

In the tests performed, Needleman-Wunsch and Smith-Waterman gave the same number of matched characters for two inputs of 13 characters each. Spam emails accumulated higher marks than ham under both Criteria 1 and Criteria 2, and most spam keywords that had been slightly modified were detected. For example:

g a r b a g e
| | | |   | |
g a r b i g e

v 1 a g r a
|   | | | |
v i a g r a

p r e s c r 1 p t i o n
| | | | | |   | | | | |
p r e s c r i p t i o n

p h 4 r m a c y
| |   | | | | |
p h a r m a c y

However, some unrelated words are unavoidably detected as near matches as well, for example:

  t a b l e t
  | | | | |
s t a b l e

l a t e r
| |   | |
l a s e r

The accuracy of spam detection depends on the spam signature database used: the more keywords in the database that also appear in the content of the spam emails, the higher the detection rate.
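The number of matched characters in the alignments above can be reproduced in software by filling the matrix with the Needleman-Wunsch scoring rule from Section 5.7 (a match adds 1, a mismatch adds 0). This is a sketch under two assumptions that are not stated in the text: the first row and column of the matrix are initialized to zero, and the inputs are compared at their natural length rather than as fixed 13-character strings. The function name match_count is hypothetical.

```python
def match_count(a, b):
    """Fill the matrix row by row and return the value of the last cell."""
    rows, cols = len(a) + 1, len(b) + 1
    table = [[0] * cols for _ in range(rows)]   # assumed zero boundary
    for i in range(1, rows):
        for j in range(1, cols):
            score = 1 if a[i - 1] == b[j - 1] else 0
            table[i][j] = max(table[i - 1][j - 1] + score,  # diagonal
                              table[i - 1][j],              # up
                              table[i][j - 1])              # left
    return table[-1][-1]


pairs = [("garbage", "garbige"), ("v1agra", "viagra"),
         ("prescr1ption", "prescription"), ("ph4rmacy", "pharmacy"),
         ("tablet", "stable"), ("later", "laser")]
counts = {pair: match_count(*pair) for pair in pairs}

for (a, b), n in counts.items():
    print(f"{a} / {b}: {n} matched characters")
```

For the four modified spam words the sketch gives 6, 5, 11 and 7 matched characters, and for tablet/stable and later/laser it gives 5 and 4, in line with the number of bars in each alignment. The last two pairs show why ordinary words that differ from one another by a single character can also add to the score.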

Chapter 6

Conclusion

Chapter 6 concludes the research by summarizing the contributions made in this thesis. It also discusses possible improvements and future work.

6.1 Contributions

(i) Creating FPGA-based co-processors for the Needleman-Wunsch and Smith-Waterman algorithms in VHDL on the Xilinx FSL interface. Using the FSL interface makes the two hardware designs easier to commercialize and to reuse in future research. Previous FPGA developments in other research did not standardize the interface of their hardware, so the resulting designs were usable only in the laboratory. When attempts are made to reuse such hardware, an additional controller, which slows the design, has to be built because the ports are incompatible. At the time of this research, Xilinx FPGAs are among the most widely used FPGAs on the market, and the FSL bus is the latest bus created to connect custom peripherals to the Xilinx Microblaze processor. This makes the two IPs, Needleman-Wunsch and Smith-Waterman, very flexible, as they connect to the highly customizable Xilinx Microblaze processor.

(ii) Demonstrating the ability of hardware circuits and FPGAs to perform spam scanning. In this research, two FPGA embedded systems were created: one with a Xilinx Microblaze processor connected to Needleman-Wunsch, and the other with a Xilinx Microblaze processor connected to Smith-Waterman. The Microblaze processor acts as the TCP server and as the controller of the algorithm IP.

(iii) Implementing parallelism in the FSL-based FPGA implementations of the Needleman-Wunsch and Smith-Waterman algorithms. Creating parallel Needleman-Wunsch and Smith-Waterman IP is a tedious process considering the number of processing elements involved. With parallelism integrated in the IP, the computation time for the matrix table is reduced to less than one sixth of that of a system with a single processing element.
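One way to see how a reduction of this order can arise is to count computation steps for a 13 x 13 matrix. The sketch below assumes an anti-diagonal (wavefront) schedule in which every cell on the same anti-diagonal is computed in one step by its own processing element. This schedule is an assumption made for illustration and is not confirmed by the text; the figure quoted above is the result reported in this thesis, not a value derived from this sketch.

```python
def wavefront_schedule(n):
    """Group the cells (i, j) of an n x n matrix by anti-diagonal i + j.

    Cells on one anti-diagonal depend only on cells from earlier
    anti-diagonals (up, left and diagonal neighbours), so they can be
    computed in the same step.
    """
    return [[(i, d - i) for i in range(n) if 0 <= d - i < n]
            for d in range(2 * n - 1)]


N = 13                                   # processing elements and word length
schedule = wavefront_schedule(N)

parallel_steps = len(schedule)           # one step per anti-diagonal
sequential_steps = N * N                 # one step per cell with a single PE
widest = max(len(cells) for cells in schedule)
ratio = parallel_steps / sequential_steps

print(parallel_steps, sequential_steps, widest, round(ratio, 3))
```

Under this assumption the matrix needs 25 steps instead of 169, a ratio of about 0.15, and no step needs more than 13 cells at once. Control and traceback overheads are ignored.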

6.2 Further Improvements and Future Works

(i) Creating larger FSL-based FPGA implementations of the Needleman-Wunsch and Smith-Waterman algorithms on future devices. With a larger-capacity device, Needleman-Wunsch and Smith-Waterman units that accept longer strings of characters could be built, and the design could then be applied to other types of application.

(ii) Testing both algorithms together with machine learning anti-spam algorithms. As there is no silver bullet for spam, it would be interesting to see how the Needleman-Wunsch and Smith-Waterman algorithms perform when coupled with other anti-spam solutions.

(iii) Measuring software performance under Xilkernel. XPS SDK can only measure the performance of software running standalone on the Microblaze processor; it does not support profiling or benchmarking of applications running on the Xilkernel operating system. Because a TCP server application on Microblaze requires Xilkernel, the performance of the software could not be measured. If a future version of XPS supports performance measurement for software that uses Xilkernel, the performance of the TCP server application could be studied.

(iv) Measuring network throughput. When the lwIP (Lightweight IP) library is used in socket mode, the maximum achievable TCP throughput is only about 1 Mbps (Velusamy 2008). The Xilinx adapters are not optimized for socket mode, and this is expected to be addressed by Xilinx only in subsequent releases. Once the problem is solved, testing could be done to measure the network throughput of the software.

References

GUNNARSSON, A. & EKBERG, S. (2003) Invasion of Privacy, Master Thesis, Blekinge

Institute of Technology.

HOANCA, B. (2006) How good are our weapons in the spam wars? Technology and Society

Magazine, IEEE, 25, 22-30.

YU-FEN, C., CHIA-MEI, C., BINGCHIANG, J. & HSIAO-CHUNG, L. (2007) An Alliance-

Based Anti-spam Approach. Natural Computation, 2007. ICNC 2007. Third International

Conference on.

HAUPT, R. L. (2004) Unsolicited commercial e-mail (UCE). Antennas and Propagation

Magazine, IEEE, 46, 153-154.

David E. Sorkin. (2003) Spam Laws. Available from : <http://www.spamlaws.com/> [16

December 2003].

LORRIE FAITH, C. & BRIAN, A. L. (1998) Spam! Commun. ACM, 41, 74-83.

ODA, T. (2005) A Spam-Detecting Artificial Immune System. Faculty of Graduate Studies and

Research. Ottawa, Carleton University.

SANPAKDEE, U., WALAIRACHT, A. & WALAIRACHT, S. (2006) Adaptive Spam Mail

Filtering Using Genetic Algorithm. Advanced Communication Technology, 2006. ICACT 2006.

The 8th International Conference.

CARPINTER, J. & HUNT, R. (2006) Tightening the net: A review of current and next

generation spam filtering tools. Computers & Security, 25, 566-578.

MING-WEI, W., YENNUN, H., SHYUE-KUNG, L., ING-YI, C. & SY-YEN, K. (2005) A

multi-faceted approach towards spam-resistible mail. Dependable Computing, 2005.

Proceedings. 11th Pacific Rim International Symposium on.

The Definition of Spam (2007) Available from : <http://www.spamhaus.org/definition.html>

[29 November, 2007].

CREWS, C. W. (2001) Policy Analysis: Why Canning "Spam" is a Bad Idea, Cato Institute.

JACOBSSON, A. & CARLSSON, B. (2007) Privacy and Spam: Empirical Studies of

Unsolicited Commercial e-Mail. Proceedings of IFIP Summer School on Risks & Challenges of

the Network Society.

The Global Economic Impact of Spam, 2005, Available from :

<http://www.ferris.com/?file_id=2004/05/611_409SpamCosts.pdf> [3 December 2007].

BANIT, A., NITIN, K. & MOLLE, M. (2005) Controlling spam Emails at the routers.

Communications, 2005. ICC 2005. 2005 IEEE International Conference on.

GALEN, A. G. (2007) Compliance with the CAN-SPAM Act of 2003. Commun. ACM, 50, 56-

62.

HUNT, R. & CARPINTER, J. (2006) Current and New Developments in Spam Filtering.

Networks, 2006. ICON '06. 14th IEEE International Conference on.

DREYFUS, S. (2002) Richard Bellman on the Birth of Dynamic Programming. Operations

Research, 50, 48-51.

LUIS VON, A., MANUEL, B. & JOHN, L. (2004) Telling humans and computers apart

automatically. Commun. ACM, 47, 56-60.

CATALIN, A. & MARIA, C. (2009) Phishing 101. MIT Spam Conference 2009. Massachusetts

Avenue Cambridge.

AL-BATAINEH, A. & WHITE, G. (2009) Detection and Prevention Methods of Botnet-

generated Spam. MIT Spam Conference 2009. Massachusetts Avenue Cambridge.

FRIESS, N. & AYCOCK, J. (2009) A Kosher Source of Ham. MIT Spam Conference 2009.

Massachusetts Avenue Cambridge.

(2009) Email Metrics Program: The Network Operators' Perspective. Report #10 – Third and

Fourth Quarter 2008. San Francisco, Messaging Anti-Abuse Working Group.

(2008) Email Metrics Program: The Network Operators' Perspective. Report #9 – Second

Quarter 2008. San Francisco, Messaging Anti-Abuse Working Group.

CANELLA, M. & MIGLIOLI, F. (2003) Performing DNA comparison on a bio-inspired tissue

of FPGAs. Parallel and Distributed Processing Symposium, 2003. Proceedings. International.

DU, Z. & LIN, F. (2004) Using blocks+ database in Needleman-Wunsch algorithm. Fuzzy

Information, 2004. Processing NAFIPS '04. IEEE Annual Meeting of the.

FUNG, W. W. L., SHAM, I., YUAN, G. & AAMODT, T. M. (2007) Dynamic Warp Formation

and Scheduling for Efficient GPU Control Flow. Microarchitecture, 2007. MICRO 2007. 40th

Annual IEEE/ACM International Symposium on.

KNEES, P., SCHEDL, M. & WIDMER, G. Multiple Lyrics Alignment: Automatic Retrieval of

Song Lyrics. Proceedings of 6th International Conference on Music Information Retrieval

(ISMIR’05), 564–569.

LESK, A. M., LEVITT, M. & CHOTHIA, C. (1986) Alignment of the amino acid sequences of

distantly related proteins using variable gap penalties. Protein Engineering Design and Selection,

1, 77-78.

MARK, G. & MICHAEL, L. (1996) Using Iterative Dynamic Programming to Obtain Accurate

Pairwise and Multiple Alignments of Protein Structures. Proceedings of the Fourth International

Conference on Intelligent Systems for Molecular Biology. AAAI Press.

NAVEED, T., SIDDIQUI, I. S. & AHMED, S. Parallel Needleman-Wunsch Algorithm for Grid.

Available http://www.gridbus.org/alchemi/files/Parallel%20Needlema.

NEEDLEMAN, S. B. & WUNSCH, C. D. (1970) A general method applicable to the search for

similarities in the amino acid sequence of two proteins. J. Mol. Biol., 48, 443-453.

ROSE, J. & EISENMENGER, F. (1991) A fast unbiased comparison of protein structures by

means of the Needleman-Wunsch algorithm. Journal of Molecular Evolution, 32, 340-354.

THOMAS, R. & RANCE, N. (2003) A parallel algorithm for DNA alignment. Crossroads, 9, 10-

15.

XIA, F. & DOU, Y. (2007) Reducing Storage Requirements in Accelerating Algorithm of Global

BioSequence Alignment on FPGA. Advanced Parallel Processing Technologies.

LI, I., SHUM, W. & TRUONG, K. (2007) 160-fold acceleration of the Smith-Waterman

algorithm using a field programmable gate array (FPGA). BMC Bioinformatics, 8, 185.

MAY, P., KLAU, G., BAUER, M. & STEINKE, T. (2007) Accelerated microRNA-Precursor

Detection Using the Smith-Waterman Algorithm on FPGAs. Distributed, High-Performance and

Grid Computing in Computational Biology.

HARRIS, B., JACOB, A. C., LANCASTER, J. M., BUHLER, J. & CHAMBERLAIN, R. D.

(2007) A Banded Smith-Waterman FPGA Accelerator for Mercury BLASTP. Field

Programmable Logic and Applications, 2007. FPL 2007. International Conference on.

XIANDONG, M. & VIPIN, C. (2004) Bio-sequence analysis with cradle's 3SoC™ software scalable system on chip. Proceedings of the 2004 ACM symposium on Applied computing. Nicosia, Cyprus, ACM.

WEIGUO, L., SCHMIDT, B., VOSS, G., SCHRODER, A. & MULLER-WITTIG, W. (2006)

Bio-sequence database scanning on a GPU. Parallel and Distributed Processing Symposium,

2006. IPDPS 2006. 20th International.

BRUTLAG, D. L., DAUTRICOURT, J. P., DIAZ, R., FIER, J., MOXON, B. & STAMM, R. (1993) BLAZE™: An implementation of the Smith-Waterman sequence comparison algorithm on a massively parallel computer. Computers & chemistry, 17, 203-207.

NASH, H., BLAIR, D. & GREFENSTETTE, J. (2001) Comparing algorithms for large-scale

sequence analysis. Bioinformatics and Bioengineering Conference, 2001. Proceedings of the

IEEE 2nd International Symposium on.

BENKRID, K., YING, L. & BENKRID, A. (2007) Design and Implementation of a Highly

Parameterised FPGA-Based Skeleton for Pairwise Biological Sequence Alignment. Field-

Programmable Custom Computing Machines, 2007. FCCM 2007. 15th Annual IEEE Symposium

on.

GOK, M. & YILMAZ, C. (2006) Efficient Cell Designs for Systolic Smith-Waterman

Implementations. Field Programmable Logic and Applications, 2006. FPL '06. International

Conference on.

CHRISTIAN, K. & JON, C. (2006) Efficient sequence alignment of network traffic. Proceedings

of the 6th ACM SIGCOMM conference on Internet measurement. Rio de Janeiro, Brazil, ACM.

STORAASLI, O., STRENSKI, D. & INC, C. (2007) Exploring Accelerating Science

Applications with FPGAs. Proc. of the Reconfigurable Systems Summer Institute, July.

NUR'AINI ABDUL, R., ROSNI, A., ABDULLAH ZAWAWI HAJI, T. & ZALILA, A. (2006)

Fast Dynamic Programming Based Sequence Alignment Algorithm. Distributed Frameworks for

Multimedia Applications, 2006. The 2nd International Conference on.

LIU, Y., HUANG, W., JOHNSON, J. & VAIDYA, S. (2006) GPU Accelerated Smith-

Waterman. Computational Science – ICCS 2006.

HASAN, L., AL-ARS, Z. & VASSILIADIS, S. (2007) Hardware acceleration of sequence

alignment algorithms-an overview. Design & Technology of Integrated Systems in Nanoscale

Era, 2007. DTIS. International Conference on.

AMAR, S. (2006) Heterogeneous processing: a strategy for augmenting moore's law. Linux J.,

2006, 7.

BENKRID, K., LIU, Y. & BENKRID, A. (2007) High Performance Biosequence Database

Scanning using FPGAs. Acoustics, Speech and Signal Processing, 2007. ICASSP 2007. IEEE

International Conference on.

SMITH, T. F. & WATERMAN, M. S. (1981) Identification of common molecular subsequences.

J. Mol. Biol., 147, 195-197.

PEIHENG, Z., GUANGMING, T. & GUANG, R. G. (2007) Implementation of the Smith-

Waterman algorithm on a reconfigurable supercomputing platform. Proceedings of the 1st

international workshop on High-performance reconfigurable computing technology and

applications: held in conjunction with SC07. Reno, Nevada, ACM.

GOTOH, O. (1982) An improved algorithm for matching biological sequences. Journal of

Molecular Biology, 162, 705.

HSIEN-YU, L., MENG-LAI, Y. & YI, C. (2004) A parallel implementation of the Smith-

Waterman algorithm for massive sequences searching. Engineering in Medicine and Biology

Society, 2004. IEMBS '04. 26th Annual International Conference of the IEEE.

FA, Z., XIANG-ZHEN, Q. & ZHI-YONG, L. (2002) A parallel Smith-Waterman algorithm

based on divide and conquer. Algorithms and Architectures for Parallel Processing, 2002.

Proceedings. Fifth International Conference on.

BOUKERCHE, A., DE MELO, A. C. M. A. & AYALA-RINCON, M. (2005) Parallel strategies

for local biological sequence alignment in a cluster of workstations. Parallel and Distributed

Processing Symposium, 2005. Proceedings. 19th IEEE International.

ROGNES, T. & SEEBERG, E. (2000) Six-fold speed-up of Smith-Waterman sequence database

searches using parallel processing on common microprocessors. Bioinformatics, 16, 699-706.

JACOB, A., SANYAL, S., PAPRZYCKI, M., ARORA, R. & GANZHA, M. (2007) Whole

Genome Comparison on a Network of Workstations. Parallel and Distributed Computing, 2007.

ISPDC '07. Sixth International Symposium on.

TAMIL, E. M., IDRIS, M. Y. I., THONG, C. M., SAUDI, M. M. & JALI, M. Z. (2008)

Needleman Wunsch Implementation for SPAM/UCE Inline Filter. Seventh International

Network Conference (INC 2008). Plymouth, United Kingdom.

RAZAK, Z., ZULKIFLEE, K., SALLEH, R., YAACOB, M. & TAMIL, E. M. (2007) A Real-

Time Line Segmentation Algorithm For An Offline Overlapped Handwritten Jawi Character

Recognition Chip. Malaysian Journal of Computer Science, 20, 12.

ABIDIN, S. A. Z., OTHMAN, A. H., TAMIL, E. M. & JALI, Z. M. (2006) e-Mail Spam Source

of Origin and Content In Open Relay Exploits at Home DSL Connection Using Jackpot

Mailswerver 1.2.2 Honeypot. Proceedings of National ICT Conference. Perlis, Malaysia.

TAMIL, E. M. & IDRIS, M. Y. I. (2006) FPGA Based Approximate String Search Algorithm

Implementation To Detect Polymorphic Worm. Proceedings of 3rd International Conference on

Artificial Intelligence in Engineering and Technology (ICAIET 2006). Sabah, Malaysia.

IDRIS, M. Y. I., TENG, Y. G. & TAMIL, E. M. (2007) Hardware-Based Worm Detection

Design Using Knuth-Morris-Pratt Algorithm. Proceedings of the Conference on IT Research and

Application (CITRA 2007). Selangor, Malaysia.

TAMIL, E. M., IDRIS, M. Y. I. & HENG, T. H. (2007) FPGA Design of Spyware Inline Filter

Using Levenshtein Distance Approximate String Search Algorithm. Proceedings of the SCORED

2007. Universiti Tenaga Nasional, Malaysia.

TAMIL, E. M., IDRIS, M. Y. I., HENG, T. H. & SAUDI, M. (2008) Hardware based

SPAM/UCE Filter Design with Levenshtein Distance Algorithm : A Framework. Proceedings of

Internet Convergence Conference (ICC 2007). Kuala Lumpur, Malaysia.

DU, Y. (2005) A SOC Implementation of Ogg Audio Player using MicroBlaze. Department of

Electrical Engineering, Faculty of Electrical Engineering, Mathematics and Computer Science.

Delft, Delft University of Technology.

MAGNUSSON, P. (2004) Evaluating Xilinx Microblaze for Network SoC Applications.

Department of Computer Science and Electrical Engineering. Luleå, Sweden, Luleå University

of Technology.

BERNSPANG, J. (2004) Interfacing an external Ethernet MAC/PHY to a MicroBlaze system on

a Virtex-II FPGA. Computer Engineering, Dept. of Electrical Engineering at Linköpings

universitet. Brisbane, University of Queensland.

(2008) Embedded Systems Development. Xilinx.

Synthesis and Simulation Design Guide. Xilinx.

PEDRONI, V. A. (2004) Circuit Design with VHDL. Cambridge, Massachusetts, MIT Press.

CHU, P. P. (2006) RTL Hardware Design Using VHDL. New Jersey, John Wiley & Sons, Inc.

(2004) Chipscope PLB IBA. Xilinx.

(2004) Chipscope ICON. Xilinx.

(2004) ML401 Evaluation Platform. Xilinx.

(2008) Fast Simplex Link (FSL) Bus (v2.11a). Xilinx.

ROSINGER, H.-P. (2004) Connecting Customized IP to the MicroBlaze Soft Processor Using

the Fast Simplex Link (FSL) Channel. XAPP529. Xilinx.

VELUSAMY, S. (2008) LightWeight IP (lwIP) Application Examples. v1.0 ed., Xilinx.

(2008) XAPP1026. 1.1 ed., Xilinx.

CASAGRANDE, N. (2003) Basic-Algorithms-of-Bioinformatics Applet.

BRAY (2008) Terminal. 1.9b ed.

JOAN, B. (2009) Difference Between ASIC and FPGA. ASIC vs FPGA. Available from :

<http://www.differencebetween.net/technology/difference-between-asic-and-fpga/> [08 January,

2010].

XILINX (2010) FPGA vs. ASIC. Available from :

<http://www.xilinx.com/company/gettingstarted/fpgavsasic.htm> [08 January 2010].

HAYES, B. (2007) How Many Ways Can You Spell V1@gra? American Scientist. Available from :

<http://amsciadmin.eresources.com/libraries/documents/2008521812126487-2007-07Hayes.pdf> [25 January, 2010].

GRAHAM-CUMMING, J. (2006) Does Bayesian poisoning exist? Available from :

<http://www.virusbtn.com/spambulletin/archive/2006/02/sb200602-poison> [27 December,

2007].

GRAHAM-CUMMING, J. (2004) How to beat an adaptive spam filter. The Spam Conference

2004.

SAHAMI, M., DUMAIS, S., HECKERMAN, D. & HORVITZ, E. (1998) A Bayesian approach

to filtering junk e-mail. AAAI-98 Workshop on Learning for Text Categorization, 460.

FERRIS RESEARCH (2007) Available from :

<http://www.ferris.com/research-library/industry-statistics/> [02 January, 2007].

SPAMHAUS.org (2007) “The SPAMHAUS Project”, Available from :

<http://www.spamhaus.org/effective_filtering.html> [04 January, 2007].

TWINING, D., WILLIAMSON, M. M., MOWBRAY, M. J. F. & RAHMOUNI, M. (2004)

Email prioritization: Reducing delays on legitimate mail caused by junk mail. USENIX

Association.

COHEN, J. (2005) Computer Science and Bioinformatics. Communications of the ACM. ACM.

SANTARINI, M. (2010) Xcell Journal. 2010 Customer Innovation Issue ed. San Jose, Mike

Santarini.

RODRIGUEZ-RAMOS, L. F., ALONSO, A., GAGO, F., GIGANTE, J. V., HERRERA, G. &

VIERA, T. (2006) Adaptive Optics Real-Time Control Using FPGA. Field Programmable Logic

and Applications, 2006. FPL '06. International Conference on.

APOSTOLICO, A. & GIANCARLO, R. (1986) The Boyer-Moore-Galil string searching

strategies revisited. SIAM J. Comput., 15, 98-105.

HORSPOOL, R. N. (1980) Practical fast searching in strings. Software: Practice and

Experience, 10, 501-506.

KNUTH, D. E., MORRIS JR, J. H. & PRATT, V. R. (1977) Fast pattern matching in strings.

SIAM Journal on Computing, 6, 323.

ALTSCHUL, S. F., GISH, W., MILLER, W., MYERS, E. W. & LIPMAN, D. J. (1990) Basic

local alignment search tool. J. Mol. Biol, 215, 403-410.

Appendix A

Microblaze Terms and Definitions

Block RAM (BRAM): Random access memory built into the FPGA. It is the primary storage for the code that runs on the Microblaze, and the user chooses how much is allocated to that code. BRAM is also used as buffer memory by other IP in the Microblaze block. Because BRAM is scattered across the FPGA and limited in size, it should be assigned carefully.

I-cache BRAM: Instruction cache for the Microblaze. It occupies BRAM, and its size is determined by the user.

D-cache BRAM: Data cache for the Microblaze. It also occupies BRAM, and its size is determined by the user.

PLB v46: The Processor Local Bus (PLB) interconnects the Microblaze core with other IP. PLB v46 has been replacing the OPB since EDK 9.2i.

OPB: On-Chip Peripheral Bus. Most of its applications have been replaced by the PLB since EDK 9.2i.

LMB: Local Memory Bus, used by the Microblaze processor to gain fast access to the on-chip BRAM.

FSL: Fast Simplex Link, a fast communication bus protocol that can connect user-developed IP to the Microblaze core or to other design units. The FSL bus is much simpler and easier to use than the PLB because it has fewer ports. Version 7 of the Microblaze processor supports up to 16 parallel FSL channels.

Table A.1 Microblaze terms and definitions.

Appendix B

RTL Schematics

Figure B.1 The RTL Schematic of the VHDL block of the algorithm IP (labelled block: array_proc).

Figure B.2 RTL Schematics of array_proc unit for Needleman-Wunsch, panels (i) to (x).

Figure B.3 RTL Schematics of array_proc unit for Smith-Waterman, panels (i) to (x).

Figure B.4 The RTL Schematic of the processing element for Needleman-Wunsch.

Figure B.5 The RTL Schematic of the processing element for Smith-Waterman.

Figure B.6 RTL Schematics of side_relay

Appendix C

Full simulations of hardware

Figure C.1 The first half post-route simulation for Needleman-Wunsch Algorithm.

Figure C.2 The second half of post-route simulation for Needleman-Wunsch Algorithm.

5.4 Smith-Waterman Algorithm IP Post-Route Simulation

Figure C.3 The first half of post-route simulation for Smith-Waterman Algorithm.

Figure C.4 The second half of post-route simulation for Smith-Waterman Algorithm.
