Introduction

. BioPerf is a suite of representative applications that we have assembled from the computational biology community, where the codes are carefully selected to span a breadth of algorithms and performance characteristics. We have analyzed the complexity of these codes, at the instruction and memory level, using “live” and aggregate data on contemporary high-performance architectures (Apple G5 with the IBM PowerPC 970), and on the IBM cycle-accurate simulator Mambo, previously used to design supercomputers such as IBM p-Series and BlueGene, and currently being used to model future systems. Hence, our work is novel in that it is one of the first efforts to incorporate life science application performance for optimizing high-end computer system architectures. Through dual-platform performance analysis, we offer system design parameters for machine configurations that may improve the performance of these codes. We target this suite for impact to both biologists and computer scientists for the evaluation of systems running bioinformatics applications.

Incorporating Life Science Applications into the Architectural Optimizations of Next-Generation Petaflops Systems

Introduction Performance Analysis

BioPerf

David A. BaderGeorgia Institute of Technology

Vipin Sachdeva University of New Mexico

The following codes are included in BioPerf

Package ExecutableBLAST blastn, blastpFASTA fasta34_t, ssearch34_tCLUSTALW clustalw, clustalw_smpHMMER hmmsearch, hmmpfamT-COFFEE tcoffeeGLIMMER glimmer, glimmer-packagePHYLIP dnapenny, promlkGRAPPA grappaCE cePREDATOR predator

0 200 400 600 800 1000

0.0

0.2

0.4

0.6

0.8

1.0

1.2

0.00

0.01

0.02

0.03

0.04

0.05

0.06

Instr. completed/cycle L1d Miss Rate

L1d

Mis

s R

ate

0 200 400 600 800 1000 1200 1400 1600 1800

0.0

0.2

0.4

0.6

0.8

1.0

1.2

1.4

0.00

0.01

0.02

0.03

0.04

0.05

Instructions Completed/Cycle L1D Cache Misses

Inst

ruct

ions

Com

plet

ed/C

ycle

0 200 400 600 800 1000 1200 1400

0.0

0.2

0.4

0.6

0.8

1.0

1.2

1.4

0.00

0.01

0.02

0.03

0.04

0.05

0.06

0.07

Instr. Completed/Cycle L1D Cache Miss Rate

Blast, hmmpfam, and tcoffee performance graphs (from left to right)

Instructions per cycle increases in the same cycle that the L1 data miss rate decreases. We can thus correlate the performance of the application as it varies, with the system metrics impacting it.

Methodology Performance Analysis through live data

Cumulative Metrics

Separated Regions of Performance

Performance Analysis through live data

6.0e+8 8.0e+8 1.0e+9 1.2e+9 1.4e+9 1.6e+9 1.8e+9 2.0e+9 2.2e+9 2.4e+90

5e+4

1e+5

2e+5

2e+5

3e+5

3e+5

0

2000

4000

6000

8000

10000

12000

14000

16000

18000

INST_TYPE_ARITH L1D_CACHE_MISS_USER

2.20e+9 2.22e+9 2.24e+9 2.26e+9 2.28e+9 2.30e+9 2.32e+9 2.34e+915000

20000

25000

30000

35000

40000

45000

50000

0

10000

20000

30000

40000


2.2e+9 2.4e+9 2.6e+9 2.8e+9 3.0e+9 3.2e+9 3.4e+9 3.6e+9 3.8e+9

0

1e+5

2e+5

3e+5

4e+5

5e+5

0

10000

20000

30000

40000


ClustalW region I (top), II (bottom left) and III (bottom right) showing differences in algorithmic complexity and memory access pattern.

Clustalw’s performance is roughly categorized into three regions.• Every sequence is compared against every other sequence by Smith Waterman, a quadratic time complexity dynamic programming algorithm,• The neighbor joining method in which comparison score of sequences is used to make a guide tree with the sequences at the leaves of the tree• The sequences are combined into a multiple sequence alignment according to the guide tree.

Instruction Profiling

0 500 1000 1500 2000 2500 3000

0.0

0.2

0.4

0.6

0.8

1.0

1.2

1.4

0.000

0.005

0.010

0.015

0.020

0.025

0.030

Instr. Completed/Cycle Branch Mispredicts/Instr.

0 500 1000 1500 2000 2500 3000

0.0

0.2

0.4

0.6

0.8

1.0

1.2

1.4

0.00

0.05

0.10

0.15

0.20

0.25

Instr. Completed/Cycle L1d Miss Rate

Clustalw’s livegraphs with L1d miss rate and branch mispredicts (top left and right).

Performance of the last phase of clustalw is more closely related to branch mispredicts than L1 data miss rate.

www.bioperf.org

Documents

Introduction