Haney - 1HPEC 10/20/2005
MIT Lincoln Laboratory
The HPEC Challenge Benchmark Suite
Ryan Haney, Theresa Meuse, Jeremy Kepner and James Lebak
Massachusetts Institute of Technology Lincoln Laboratory
HPEC 2005
This work is sponsored by the Defense Advanced Research Projects Agency under Air Force Contract FA8721-05-C-0002. Opinions, interpretations, conclusions, and recommendations are those of the authors and are not necessarily endorsed by the United States Government.
Acknowledgements
• Lincoln Laboratory PCA Team
  – Matthew Alexander
  – Jeanette Baran-Gale
  – Hector Chan
  – Edmund Wong
• Shomo Tech Systems
  – Marti Bancroft
• Silicon Graphics Incorporated
  – William Harrod
• Sponsor
  – Robert Graybill, DARPA PCA and HPCS Programs
• Code adapted from Soumekh, Mehrdad, Synthetic Aperture Radar Signal Processing with Matlab Algorithms, Wiley, 1999
Motivation
Advanced Sensor Platforms
Processor and System Architectures
• Single Processor Element
• Multi-computers
• Super-computers
• Tiled Processors
Challenge: Provide benchmarks that test a system at the kernel and multi-processor levels
System Analysis and Design
• Design System
  – Choose components
  – Hardware size
  – Required software performance
• Implement Benchmarks
  – Design
  – Code
  – Tune
• Measure Performance
  – Throughput
  – Power
  – Stability
HPEC Challenge Benchmark Suite
• PCA program kernel benchmarks
  – Single-processor operations
  – Drawn from many different DoD applications
  – Represent both “front-end” signal processing and “back-end” knowledge processing
• HPCS program Synthetic SAR benchmark
  – Multi-processor compact application
  – Representative of a real application workload
  – Designed to be easily scalable and verifiable
Outline
• Introduction
• Kernel Level Benchmarks
  – Kernel Overview
  – Kernel Architecture
  – Generic vs. Optimized Results
• SAR Benchmark
• Release Information
• Summary
Kernel Benchmark Selection
Broad Processing Categories
• “Front-end Processing”
  – Data independent, stream-oriented
  – Signal processing, image processing, high-speed network communication
• “Back-end Processing”
  – Data dependent, thread oriented
  – Information processing, knowledge processing
Specific Kernels
• Signal/Image Processing
  – Finite Impulse Response Filter (FIR)
  – QR Factorization (QR)
  – Singular Value Decomposition (SVD)
  – Constant False Alarm Rate Detection (CFAR)
• Communication
  – Corner Turn (CT)
• Information/Knowledge Processing
  – Graph Optimization via Genetic Algorithm (GA)
  – Pattern Match (PM)
  – Real-time Database Operations (DB)
Signal and Image Processing Kernels
FIR
• Data Set 1: M filters (~10 coefficients)
• Data Set 2: M filters (>100 coefficients)
• Bank of filters applied to input data
• FIR filters implemented in time and frequency domain
[Figure: input matrix of M channels]
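The time-domain form of the FIR kernel can be sketched as follows. This is a minimal illustration, not the benchmark's ANSI C code: the function names `fir_filter` and `fir_bank` are hypothetical, and pairing filter m with channel m is an assumption drawn from the "M filters / M channels" labels on the slide.

```python
def fir_filter(signal, coeffs):
    """Time-domain FIR: y[n] = sum_k coeffs[k] * signal[n - k].

    Samples before the start of the signal are treated as zero, and the
    output has the same length as the input.
    """
    out = []
    for n in range(len(signal)):
        acc = 0
        for k, c in enumerate(coeffs):
            if n - k >= 0:
                acc += c * signal[n - k]
        out.append(acc)
    return out


def fir_bank(channels, filters):
    """Apply filter m to channel m (assumed pairing, one filter per channel)."""
    return [fir_filter(x, h) for x, h in zip(channels, filters)]


# Two channels, two short filters.
result = fir_bank([[1, 2, 3, 4], [1, 0, 0, 0]], [[1, 1], [0.5, 0.25]])
```

The second channel is a unit impulse, so its output reproduces the filter coefficients followed by zeros. The frequency-domain variant mentioned on the slide would replace the inner loops with an FFT, a pointwise multiply, and an inverse FFT.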
SVD
• Produces decomposition of an input matrix, X = UΣV^H
• Classic Golub-Kahan SVD implementation
[Figure: Input Matrix → Bidiagonal Matrix → Diagonal Matrix Σ]
QR
• Computes the factorization of an input matrix, A = QR
• Implementation uses Fast Givens algorithm
[Figure: A (MxN) = Q (MxM) * R (MxN)]
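A QR factorization with the shapes shown above, Q (MxM) and R (MxN), can be sketched with plain Givens rotations. Note the substitution: this uses standard Givens rotations for readability, not the Fast Givens variant the benchmark is stated to use, and the function name `givens_qr` is hypothetical.

```python
import math


def givens_qr(a):
    """Return (q, r) such that a = q * r, with q orthogonal (MxM) and
    r upper triangular (MxN). Uses standard Givens rotations on real data."""
    m, n = len(a), len(a[0])
    r = [[float(v) for v in row] for row in a]
    q = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]
    for j in range(n):
        # Zero column j below the diagonal, working from the bottom row up.
        for i in range(m - 1, j, -1):
            x, y = r[i - 1][j], r[i][j]
            h = math.hypot(x, y)
            if h == 0.0:
                continue
            c, s = x / h, y / h
            # Rotate rows i-1 and i of r so that r[i][j] becomes zero.
            for k in range(n):
                u, v = r[i - 1][k], r[i][k]
                r[i - 1][k] = c * u + s * v
                r[i][k] = -s * u + c * v
            # Accumulate the transposed rotation into columns i-1 and i of q.
            for k in range(m):
                u, v = q[k][i - 1], q[k][i]
                q[k][i - 1] = c * u + s * v
                q[k][i] = -s * u + c * v
    return q, r


a = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
q, r = givens_qr(a)
```

Multiplying `q` by `r` recovers `a` to within floating-point rounding, and every entry of `r` below the diagonal is zero to the same tolerance.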
CFAR
• Creates a target list given a data cube
• Calculates normalized power for each cell, thresholds for target detection
[Figure: data cube C(i,j,k) and T(i,j,k) over beams, Dopplers, and range; Normalize, Threshold; Target List of (i,j,k)]
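The normalize-and-threshold step can be illustrated on a single range line. This is a sketch of one common form, cell-averaging CFAR; the slide does not specify the window shape, so the guard-cell and reference-cell parameters (`num_guard`, `num_ref`) and the function name `cfar_detect` are assumptions.

```python
def cfar_detect(power, num_ref, num_guard, threshold):
    """Cell-averaging CFAR along one range line.

    For each cell, the local noise level is estimated as the mean power of
    up to num_ref reference cells on each side, skipping num_guard guard
    cells next to the cell under test. A cell is reported when its power
    divided by that estimate exceeds threshold.
    """
    targets = []
    n = len(power)
    for i in range(n):
        ref = []
        for offset in range(num_guard + 1, num_guard + num_ref + 1):
            if i - offset >= 0:
                ref.append(power[i - offset])
            if i + offset < n:
                ref.append(power[i + offset])
        if not ref:
            continue  # line too short to estimate noise for this cell
        noise = sum(ref) / len(ref)
        if noise > 0 and power[i] / noise > threshold:
            targets.append(i)
    return targets


# Flat noise floor with one strong return at range cell 10.
line = [1.0] * 20
line[10] = 50.0
hits = cfar_detect(line, num_ref=4, num_guard=1, threshold=5.0)
```

In the benchmark's data cube the same test would be repeated for every beam and Doppler bin, producing the (i,j,k) target list.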
Information and Knowledge Processing Kernels
Pattern Match
• Compute best match for a pattern out of set of candidate patterns
  – Uses weighted mean-square error
[Figure: magnitude vs. range profile of the pattern under test]
Genetic Algorithm
• Evaluate each chromosome
• Select chromosomes for next generation
• Crossover: randomly pair up chromosomes and exchange portions
• Mutation: randomly change each chromosome
[Figure: cycle of Evaluation, Selection (values 0.1, 0.2, 0.3, 0.4 shown), Crossover, Mutation]
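One generation of the four genetic-algorithm steps can be sketched as below. The details are assumptions made for illustration: chromosomes are lists of 0/1 genes, selection is fitness-proportional, crossover is single-point, and the function name `next_generation` is hypothetical. The benchmark's own encoding and operators are not described on the slide.

```python
import random


def next_generation(population, fitness, mutation_rate, rng):
    """Run evaluation, selection, crossover, and mutation once.

    Assumes an even population size, chromosomes of at least two genes,
    and a fitness function that returns positive values.
    """
    # Evaluation: score every chromosome.
    scores = [fitness(c) for c in population]
    # Selection: draw parents with probability proportional to fitness.
    parents = rng.choices(population, weights=scores, k=len(population))
    # Crossover: pair parents at random and swap tails at a random cut point.
    rng.shuffle(parents)
    children = []
    for a, b in zip(parents[0::2], parents[1::2]):
        cut = rng.randrange(1, len(a))
        children.append(a[:cut] + b[cut:])
        children.append(b[:cut] + a[cut:])
    # Mutation: flip each gene independently with a small probability.
    return [
        [1 - g if rng.random() < mutation_rate else g for g in c]
        for c in children
    ]


rng = random.Random(0)
population = [[rng.randint(0, 1) for _ in range(8)] for _ in range(6)]
new_population = next_generation(population, lambda c: sum(c) + 1, 0.05, rng)
```

Population size and chromosome length are preserved from one generation to the next, so the function can be called in a loop.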
Database Operations
• Three generic database operations:
  – search: find all items in a given range
  – insert: add items to the database
  – delete: remove item from the database
[Figure: Red-Black Tree Data Structure; Linked List Data Structures]
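The three operations can be illustrated with a small ordered store. This sketch swaps in a sorted Python list with binary search in place of the red-black tree and linked lists named on the slide, purely to keep the example short; the class name `SortedStore` is hypothetical.

```python
import bisect


class SortedStore:
    """Ordered key store supporting range search, insert, and delete.

    A sorted list stands in for the red-black tree: lookups are
    O(log n), while insert and delete are O(n) because of list shifting.
    """

    def __init__(self):
        self._keys = []

    def insert(self, key):
        """Add a key, keeping the list sorted."""
        bisect.insort(self._keys, key)

    def delete(self, key):
        """Remove one occurrence of key; return True if it was present."""
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            del self._keys[i]
            return True
        return False

    def search(self, low, high):
        """Return all keys k with low <= k <= high, in sorted order."""
        lo = bisect.bisect_left(self._keys, low)
        hi = bisect.bisect_right(self._keys, high)
        return self._keys[lo:hi]


store = SortedStore()
for key in (5, 1, 9, 3, 7):
    store.insert(key)
before = store.search(3, 7)
removed = store.delete(5)
after = store.search(3, 7)
```

A balanced tree would make insert and delete logarithmic as well, which matters at the larger record counts a real-time database workload implies.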
[Figure (Pattern Match): Candidate Pattern 1, Candidate Pattern 2, …, Candidate Pattern N]
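The pattern-match step, choosing the candidate with the smallest weighted mean-square error, can be sketched as follows. The function names and the normalization by the sum of the weights are assumptions; the slide states only that a weighted mean-square error is used.

```python
def weighted_mse(pattern, candidate, weights):
    """Weighted mean-square error between two equal-length profiles."""
    total = sum(w * (p - c) ** 2 for p, c, w in zip(pattern, candidate, weights))
    return total / sum(weights)


def best_match(pattern, candidates, weights):
    """Return the index of the candidate with the lowest weighted MSE."""
    errors = [weighted_mse(pattern, c, weights) for c in candidates]
    return min(range(len(candidates)), key=errors.__getitem__)


pattern = [1.0, 2.0, 3.0]
candidates = [[0.0, 0.0, 0.0], [1.0, 2.0, 4.0], [3.0, 2.0, 1.0]]
match = best_match(pattern, candidates, weights=[1.0, 1.0, 1.0])
```

Here the second candidate differs from the pattern in only one sample, so it has the lowest error and `match` is its index, 1.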
Corner Turn
• Memory rearrangement of matrix contents
  – Switch from row to column major layout
Before:
 0  1  2  3
 4  5  6  7
 8  9 10 11
After:
 0  4  8
 1  5  9
 2  6 10
 3  7 11
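The corner turn on the 3x4 example above can be written as an index remapping over a flat buffer. This is an unoptimized sketch; the function name `corner_turn` is hypothetical.

```python
def corner_turn(flat, rows, cols):
    """Transpose a rows x cols matrix stored row-major in a flat list.

    Element (r, c) sits at index r * cols + c in the input and is written
    to index c * rows + r in the output, which is the row-major layout of
    the cols x rows transposed matrix.
    """
    out = [0] * (rows * cols)
    for r in range(rows):
        for c in range(cols):
            out[c * rows + r] = flat[r * cols + c]
    return out


turned = corner_turn(list(range(12)), rows=3, cols=4)
```

The reads are sequential but the writes stride through memory by `rows` elements, which is why this kernel stresses memory bandwidth and communication rather than arithmetic.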
Kernel Benchmark Architecture
• Kernels written in ANSI C
  – Portable across UNIX based platforms
• Sample data sets provided
• Generation of user defined data sets and sizes possible using Matlab
[Figure: block diagram with Matlab Data Generator, Input Data, C Kernel, Kernel Output Data, Timing Data, Verification Data, C Verification, Verification Results, Matlab Performance Analysis, and Performance Results]
Generic vs. Optimized Kernel Benchmark Results
• Optimized code outperforms baseline in FIR due to use of VSIPL libraries.
• QR and SVD optimized code outperforms baseline because of AltiVec use.
• Results similar for CFAR, PM, and GA. Minimal optimizations were made.
• Optimized Corner Turn code uses AltiVec intrinsics.
• Optimized DB code uses specialized memory management routines.
Outline
• Introduction
• Kernel Level Benchmarks
• SAR Benchmark
  – Overview
  – System Architecture
  – Computational Components
• Release Information
• Summary
Spotlight SAR System
• Intent of Compact App:
  – Scalable
  – High Compute Fidelity
  – Self-Verifying
• Principal performance goal: Throughput
  – Maximize rate of results
  – Overlapped IO and computing
SAR System Architecture
[Figure: block diagram spanning Front-End Sensor Processing and Back-End Knowledge Formation, with blocks marked as Computation or File IO. Blocks: Scalable Data and Template Generator; Kernel #1 Data Read and Image Formation; SAR Image Template Insertion; Kernel #2 Image Storage; Kernel #3 Image Retrieval; Kernel #4 Detection; Validation. Data exchanged: Raw SAR Data Files, SAR Image Files, Templates and Template Files, Groups of Template Files, Sub-Image Detection Files, Detections.]
HPEC community has traditionally focused on Computation … but File IO performance is increasingly important
Data Generation and Computational Stages
[Figure: the same block diagram, covering Sensor Processing and Knowledge Formation. Blocks: Scalable Data and Template Generator; Kernel #1 Image Formation; SAR Image Template Insertion; Kernel #2 Image Storage; Kernel #3 Image Retrieval; Kernel #4 Detection; Validation. Data exchanged: Raw SAR Data Files, SAR Image Files, Templates and Template Files, Groups of Template Files, Sub-Image Detection Files, Detections.]
SAR Overview
• Radar captures echo returns from a ‘swath’ on the ground
• Notional linear FM chirp pulse train, plus two ideally non-overlapping echoes returned from different positions on the swath
• Summation and scaling of echo returns realizes a challengingly long antenna aperture along the flight path
$$ s(t,u) = \sum_{\text{pulses}} \sum_{\text{swath}} \alpha(m,n)\, p\big(t - \tau(m,n)\big) $$
where s(t,u) is the received ‘raw’ SAR signal, α(m,n) is the reflection coefficient scale factor (different for each return from the swath), and p(t − τ(m,n)) is the delayed transmitted SAR waveform.
[Figure: spotlight SAR geometry, fixed to broadside; range X = 2X0, cross-range Y = 2Y0, synthetic aperture L]
Scalable Synthetic Data Generator
• Generates synthetic raw SAR complex data
• Data size is scalable to enable rigorous testing of high performance computing systems
– User defined scale factor determines the size of images generated
• Generates ‘templates’ that consist of rotated and pixelated capitalized letters
Source Code adapted from Soumekh, 1999.
[Figure: Spotlight SAR Returns; axes: cross-range, range]
Kernel 1: SAR Image Formation
Source Code adapted from Soumekh, 1999.
Processing chain:
1. Fourier Transform, (t,u) → (ω,ku): s(t,u) becomes s(ω,ku)
2. Matched Filtering with s*0(ω,ku)
3. Interpolation: kx = sqrt(4k^2 – ku^2), ky = ku, giving F(kx,ky)
4. Inverse Fourier Transform, (kx,ky) → (x,y): F(kx,ky) becomes f(x,y)
[Figure: Spatial Frequency Domain Interpolation. Received samples fit a polar swath; processed samples fit a rectangular swath (axes kx, ky; angle θo).]
[Figure: Wavefront Spotlight SAR Reconstruction; axes: range X (pixels), cross-range Y (pixels)]
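The coordinate mapping behind the interpolation step, kx = sqrt(4k^2 – ku^2) and ky = ku, can be evaluated directly. This sketch covers only the mapping, not the resampling itself; the function name `spatial_frequency` is hypothetical, and returning `None` where the square root would be imaginary is an assumption made for the example.

```python
import math


def spatial_frequency(k, ku):
    """Map a (k, ku) sample to (kx, ky) with kx = sqrt(4k^2 - ku^2), ky = ku.

    Returns None when 4k^2 - ku^2 is negative, where kx has no real value.
    """
    radicand = 4.0 * k * k - ku * ku
    if radicand < 0.0:
        return None
    return math.sqrt(radicand), ku


point = spatial_frequency(5.0, 6.0)

# Evenly spaced k values at a fixed ku land on unevenly spaced kx values.
kx_values = [spatial_frequency(k, 6.0)[0] for k in (5.0, 6.0, 7.0)]
spacing = [b - a for a, b in zip(kx_values, kx_values[1:])]
```

Because the kx spacing is not uniform, the measured samples do not fall on the rectangular grid an inverse FFT needs, which is why the kernel resamples them in the spatial frequency domain first.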
Template Insertion (untimed)
• Inserts rotated pixelated capital letter templates into each SAR image
  – Non-overlapping locations and rotations
  – Randomly selects 50%
  – Used as ideal detection targets in Kernel 4
[Figure: two panels titled "Image with Inserted Templates" (axes: X pixels, Y pixels). One shows the image as it would look with 100% of the templates inserted; the other shows the image with only a random 50% of the templates inserted.]
Kernel 4: Detection
• Detects targets in SAR images
  1. Image difference
  2. Threshold
  3. Sub-regions
  4. Correlate with every template; max is target ID
• Computationally difficult
  – Many small correlations over random pieces of a large image
• 100% recognition with no false alarms
[Figure: Image A and Image B; their Image Difference; the Thresholded Image Difference; a sub-region of the thresholded difference]
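The four detection steps can be sketched on small nested-list images. Several details are assumptions for illustration: sub-regions are taken as fixed, non-overlapping tiles of the template size, correlation is an unnormalized sum of products (adequate only when templates have equal energy), and the names `correlate` and `detect` are hypothetical.

```python
def correlate(region, template):
    """Unnormalized correlation: sum of elementwise products."""
    return sum(a * b for ra, rb in zip(region, template) for a, b in zip(ra, rb))


def detect(image_a, image_b, threshold, templates, size):
    """Return (row, col, template_name) for each tile that changed."""
    rows, cols = len(image_a), len(image_a[0])
    # 1. Image difference.
    diff = [[image_b[r][c] - image_a[r][c] for c in range(cols)] for r in range(rows)]
    # 2. Threshold the difference into a change mask.
    mask = [[1 if abs(v) > threshold else 0 for v in row] for row in diff]
    detections = []
    # 3. Sub-regions: visit each size x size tile that contains a change.
    for r0 in range(0, rows - size + 1, size):
        for c0 in range(0, cols - size + 1, size):
            changed = any(
                mask[r][c]
                for r in range(r0, r0 + size)
                for c in range(c0, c0 + size)
            )
            if not changed:
                continue
            region = [row[c0:c0 + size] for row in diff[r0:r0 + size]]
            # 4. Correlate with every template; the maximum is the target ID.
            scores = {name: correlate(region, t) for name, t in templates.items()}
            detections.append((r0, c0, max(scores, key=scores.get)))
    return detections


templates = {"diag": [[1, 0], [0, 1]], "anti": [[0, 1], [1, 0]]}
image_a = [[0] * 4 for _ in range(4)]
image_b = [row[:] for row in image_a]
image_b[2][1] = 1  # insert the "anti" template with its corner at (2, 0)
image_b[3][0] = 1
found = detect(image_a, image_b, threshold=0.5, templates=templates, size=2)
```

Only the tile at row 2, column 0 differs between the two images, and it correlates most strongly with the "anti" template.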
Benchmark Summary and Computational Challenges
[Figure: pipeline spanning Front-End Sensor Processing and Back-End Knowledge Formation: Scalable Data and Template Generator, Kernel #1 Image Formation, SAR Image Template Insertion, Kernel #4 Detection, Validation; data exchanged: Raw SAR, SAR Image, Templates, Detections]
• Scalable synthetic data generation
• Pulse compression
• Polar Interpolation
• FFT, IFFT (corner turn)
• Sequential store
• Non-sequential retrieve
• Large & small IO
• Large image difference & threshold
• Many small correlations on random pieces of large image
Outline
• Introduction
• Kernel Level Benchmarks
• SAR Benchmark
• Release Information
• Summary
HPEC Challenge Benchmark Release
• http://www.ll.mit.edu/HPECChallenge/
  – Future site of documentation and software
• Initial release is available to PCA, HPCS, and HPEC SI program members through respective program web pages
  – Documentation
  – ANSI C Kernel Benchmarks
  – Single processor MATLAB SAR System Benchmark
• Complete release will be made available to the public in first quarter of CY06
Summary
• The HPEC Challenge is a publicly available suite of benchmarks for the embedded space
– Representative of a wide variety of DoD applications
• Benchmarks stress computation, communication and I/O
• Benchmarks are provided at multiple levels
  – Kernel: small enough to easily understand and optimize
  – Compact application: representative of real workloads
  – Single-processor and multi-processor
• For more information, see http://www.ll.mit.edu/HPECChallenge/
Backup Slides
Kernel Data Set Summary
Kernel             Parameter            Set 1     Set 2     Set 3     Set 4
FIR                Filters              64        20
                   Coefs                128       12
                   Vec Length           4096      1024
QR/SVD             Matrix size          500x100   180x60    150x150
CFAR Detection     Beams                16        48        48        16
                   Range gates          64        3500      1909      9900
                   Dopplers             24        128       64        16
Corner Turn        Matrix size          50x5000   750x5000
Pattern Match      Pattern length       64        128
                   Number of patterns   72        256
Database           Total records        500       102,400
                   Ops per cycle        440       700
Genetic algorithm  Genes                8         96        5         10
                   Chromosomes          50        150       50        150