Haney - 1 HPEC 10/20/2005 MIT Lincoln Laboratory The HPEC Challenge Benchmark Suite Ryan Haney, Theresa Meuse, Jeremy Kepner and James Lebak Massachusetts Institute of Technology Lincoln Laboratory HPEC 2005 This work is sponsored by the Defense Advanced Research Projects Agency under Air Force Contract FA8721-05-C-0002. Opinions, interpretations, conclusions, and recommendations are those of the authors and are not necessarily endorsed by the United States Government.

Source: archive.ll.mit.edu/HPEC/agendas/proc05/Day_1/Presentations/1550_Haney.pdf


  • MIT Lincoln Laboratory, Haney - 1, HPEC 10/20/2005

    The HPEC Challenge Benchmark Suite

    Ryan Haney, Theresa Meuse, Jeremy Kepner and James Lebak

    Massachusetts Institute of Technology Lincoln Laboratory

    HPEC 2005

    This work is sponsored by the Defense Advanced Research Projects Agency under Air Force Contract FA8721-05-C-0002. Opinions, interpretations, conclusions, and recommendations are those of the authors and are not necessarily endorsed by the United States Government.

  • MIT Lincoln Laboratory, Haney - 2

    Acknowledgements

    • Lincoln Laboratory PCA Team
      – Matthew Alexander
      – Jeanette Baran-Gale
      – Hector Chan
      – Edmund Wong

    • Shomo Tech Systems
      – Marti Bancroft

    • Silicon Graphics Incorporated
      – William Harrod

    • Sponsor
      – Robert Graybill, DARPA PCA and HPCS Programs

    • Code adapted from Soumekh, Mehrdad, Synthetic Aperture Radar Signal Processing with Matlab Algorithms, Wiley, 1999

  • MIT Lincoln Laboratory, Haney - 3

    Motivation

    Advanced Sensor Platforms

    Processor and System Architectures
    • Single Processor Element
    • Multi-computers
    • Super-computers
    • Tiled Processors

    Challenge: Provide benchmarks that test a system at the kernel and multi-processor levels

    System Analysis and Design
    • Design System
      – Choose components
      – Hardware size
      – Required software performance
    • Implement Benchmarks
      – Design
      – Code
      – Tune
    • Measure Performance
      – Throughput
      – Power
      – Stability

  • MIT Lincoln Laboratory, Haney - 4

    HPEC Challenge Benchmark Suite

    • PCA program kernel benchmarks
      – Single-processor operations
      – Drawn from many different DoD applications
      – Represent both “front-end” signal processing and “back-end” knowledge processing

    • HPCS program Synthetic SAR benchmark
      – Multi-processor compact application
      – Representative of a real application workload
      – Designed to be easily scalable and verifiable

  • MIT Lincoln Laboratory, Haney - 5

    Outline

    • Introduction
    • Kernel Level Benchmarks
      – Kernel Overview
      – Kernel Architecture
      – Generic vs. Optimized Results
    • SAR Benchmark
    • Release Information
    • Summary

  • MIT Lincoln Laboratory, Haney - 6

    Kernel Benchmark Selection

    Broad Processing Categories

    • “Front-end Processing”
      – Data independent, stream-oriented
      – Signal processing, image processing, high-speed network communication

    • “Back-end Processing”
      – Data dependent, thread oriented
      – Information processing, knowledge processing

    Specific Kernels

    • Signal/Image Processing
      – Finite Impulse Response Filter (FIR)
      – QR Factorization (QR)
      – Singular Value Decomposition (SVD)
      – Constant False Alarm Rate Detection (CFAR)

    • Communication
      – Corner Turn (CT)

    • Information/Knowledge Processing
      – Graph Optimization via Genetic Algorithm (GA)
      – Pattern Match (PM)
      – Real-time Database Operations (DB)

  • MIT Lincoln Laboratory, Haney - 7

    Signal and Image Processing Kernels: FIR, QR, SVD, CFAR

    FIR

    • Bank of filters applied to input data
    • FIR filters implemented in time and frequency domain
    • Data Set 1: M Filters (~10 coefficients)
    • Data Set 2: M Filters (>100 coefficients)

    [Figure: Input Matrix with M Channels]
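
The time-domain form of the FIR kernel can be sketched as follows. This is a minimal illustration, not the benchmark's code: the function name `fir_bank`, the real-valued `float` data, the row-major layout, and the pairing of filter m with channel m are all assumptions made for brevity.

```c
#include <stddef.h>

/* Hypothetical sketch: apply M FIR filters, one per input channel.
 * in:    M x N input samples, row-major (one row per channel)
 * coefs: M x K filter coefficients, row-major
 * out:   M x N outputs (same length as the input, zero initial state)
 * Real-valued floats and the one-filter-per-channel pairing are assumptions. */
void fir_bank(const float *in, const float *coefs, float *out,
              size_t M, size_t N, size_t K)
{
    size_t m, n, k;
    for (m = 0; m < M; ++m) {
        const float *x = in + m * N;
        const float *h = coefs + m * K;
        float *y = out + m * N;
        for (n = 0; n < N; ++n) {
            float acc = 0.0f;
            /* y[n] = sum over k of h[k] * x[n - k]; samples before x[0] count as zero */
            for (k = 0; k < K && k <= n; ++k)
                acc += h[k] * x[n - k];
            y[n] = acc;
        }
    }
}
```

The cost of this direct form grows with N times K per filter, which is why a frequency-domain implementation tends to become attractive as the number of coefficients grows.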

    SVD

    • Produces decomposition of an input matrix, $X = U \Sigma V^H$
    • Classic Golub-Kahan SVD implementation

    [Figure: Input Matrix → Bidiagonal Matrix → Diagonal Matrix Σ]

    QR

    • Computes the factorization of an input matrix, $A = QR$
    • Implementation uses Fast Givens algorithm

    [Figure: A (MxN) = Q (MxM) * R (MxN)]

    CFAR

    • Creates a target list given a data cube
    • Calculates normalized power for each cell, thresholds for target detection

    [Figure: data cube C(i,j,k) with dimensions Beams, Dopplers, Range; T(i,j,k); Normalize, Threshold; Target List (i,j,k)]
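
The normalize-and-threshold step of CFAR can be sketched for a single (beam, doppler) row of the cube. The slide does not say how the local estimate T is formed; the cell-averaging scheme with guard cells used here is a common choice and is an assumption, as are the name `cfar_row` and its parameters.

```c
#include <stddef.h>

/* Hypothetical sketch of cell-averaging CFAR along the range dimension
 * for one (beam, doppler) row of the data cube.
 * power: cell power for range index k = 0..nrange-1
 * ncfar: reference cells on each side; guard: guard cells on each side
 * mu:    detection threshold applied to the normalized power
 * hits:  receives the range indices of detections (capacity nrange)
 * Returns the number of detections. */
size_t cfar_row(const float *power, size_t nrange, size_t ncfar,
                size_t guard, float mu, size_t *hits)
{
    size_t k, d, nhits = 0;
    for (k = 0; k < nrange; ++k) {
        float sum = 0.0f;
        size_t count = 0;
        /* Average the reference cells that lie beyond the guard cells */
        for (d = guard + 1; d <= guard + ncfar; ++d) {
            if (k >= d)         { sum += power[k - d]; ++count; }
            if (k + d < nrange) { sum += power[k + d]; ++count; }
        }
        /* Skip cells that have no usable reference estimate */
        if (count == 0 || sum <= 0.0f)
            continue;
        /* Normalized power = cell power / local average */
        if (power[k] / (sum / (float)count) > mu)
            hits[nhits++] = k;
    }
    return nhits;
}
```

Calling this once per (beam, doppler) pair and recording (i, j, k) for each hit would produce a target list of the kind the slide describes.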

  • MIT Lincoln Laboratory, Haney - 8

    Information and Knowledge Processing Kernels

    Genetic Algorithm

    • Evaluate each chromosome
    • Select chromosomes for next generation
    • Crossover: randomly pair up chromosomes and exchange portions
    • Mutation: randomly change each chromosome

    [Figure: Evaluation, Selection (0.1, 0.2, 0.3, 0.4), Crossover, Mutation]

    Database Operations

    • Three generic database operations:
      – search: find all items in a given range
      – insert: add items to the database
      – delete: remove item from the database

    [Figure: Red-Black Tree Data Structure; Linked List Data Structures]

    Pattern Match

    • Compute best match for a pattern out of set of candidate patterns
      – Uses weighted mean-square error

    [Figure: Pattern under test (Mag vs. Range); Candidate Pattern 1, Candidate Pattern 2, …, Candidate Pattern N]
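
The Pattern Match comparison can be sketched as a search for the candidate with the smallest weighted mean-square error. The name `best_match` and the flat row-major layout are hypothetical, and any shifting or scaling the real kernel may apply before comparing is not modeled.

```c
#include <stddef.h>

/* Hypothetical sketch: return the index of the candidate pattern with the
 * smallest weighted mean-square error against the pattern under test.
 * candidates: npat x len values, row-major; weights: len values.
 * Ties go to the lowest index. */
size_t best_match(const float *test, const float *candidates,
                  const float *weights, size_t npat, size_t len)
{
    size_t p, i, best = 0;
    float best_err = 0.0f;
    for (p = 0; p < npat; ++p) {
        const float *c = candidates + p * len;
        float err = 0.0f, wsum = 0.0f;
        for (i = 0; i < len; ++i) {
            float d = test[i] - c[i];
            err += weights[i] * d * d;
            wsum += weights[i];
        }
        /* Normalize by the total weight to get a mean */
        if (wsum > 0.0f)
            err /= wsum;
        if (p == 0 || err < best_err) {
            best_err = err;
            best = p;
        }
    }
    return best;
}
```

Setting a weight to zero removes that sample from the comparison, which is one way to ignore parts of a pattern that are known to be unreliable.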

    Corner Turn

    • Memory rearrangement of matrix contents
      – Switch from row to column major layout

        0  1  2  3          0  4  8
        4  5  6  7    →     1  5  9
        8  9 10 11          2  6 10
                            3  7 11
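
A corner turn of this kind amounts to an out-of-place transpose. The sketch below is illustrative only; the name `corner_turn` and the use of `float` elements are assumptions, and it makes no attempt at the cache blocking or vector intrinsics an optimized version would likely use.

```c
#include <stddef.h>

/* Hypothetical sketch: out-of-place corner turn (transpose).
 * src is rows x cols, row-major; dst becomes cols x rows, row-major,
 * which is the same memory image as src stored column-major. */
void corner_turn(const float *src, float *dst, size_t rows, size_t cols)
{
    size_t r, c;
    for (r = 0; r < rows; ++r)
        for (c = 0; c < cols; ++c)
            dst[c * rows + r] = src[r * cols + c];
}
```

Applied to the 3 x 4 matrix holding 0 through 11, this yields the memory order 0 4 8 1 5 9 2 6 10 3 7 11, matching the slide's example.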

  • MIT Lincoln Laboratory, Haney - 9

    Kernel Benchmark Architecture

    • Kernels written in ANSI C
      – Portable across UNIX based platforms
    • Sample data sets provided
    • Generation of user defined data sets and sizes possible using Matlab

    [Figure: Matlab Data Generator; Input Data; C Kernel; Output Data; Timing Data; Verification Data; C Verification; Verification Results; Matlab Performance Analysis; Performance Results]
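
The run, time, and verify pattern in this architecture can be sketched in portable C. Everything named here (`kernel_args`, `example_kernel`, `time_kernel`, `verify`) is hypothetical and stands in for the benchmark's own drivers, whose interfaces are not shown on the slide. Only the standard `clock()` call is assumed from the C library.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical stand-in for a kernel: doubles every element in place. */
typedef struct { float *data; size_t n; } kernel_args;

void example_kernel(void *p)
{
    kernel_args *a = (kernel_args *)p;
    size_t i;
    for (i = 0; i < a->n; ++i)
        a->data[i] *= 2.0f;
}

/* Time one kernel call in CPU seconds using the standard clock(). */
double time_kernel(void (*kernel)(void *), void *args)
{
    clock_t start = clock();
    kernel(args);
    return (double)(clock() - start) / (double)CLOCKS_PER_SEC;
}

/* Return 1 if every output is within tol of the verification data. */
int verify(const float *out, const float *ref, size_t n, float tol)
{
    size_t i;
    for (i = 0; i < n; ++i) {
        float d = out[i] - ref[i];
        if (d < 0.0f)
            d = -d;
        /* Written this way so that a NaN difference also fails */
        if (!(d <= tol))
            return 0;
    }
    return 1;
}
```

A driver would load the input and verification data, call `time_kernel` on the kernel under test, write out the elapsed time, and report the result of `verify`. Note that `clock()` measures processor time rather than wall-clock time, so a real harness may prefer a platform timer.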

  • MIT Lincoln Laboratory, Haney - 10

    Generic vs. Optimized Kernel Benchmark Results

    • FIR: optimized code outperforms baseline due to use of VSIPL libraries.
    • QR and SVD: optimized code outperforms baseline because of AltiVec use.
    • CFAR, PM, and GA: results similar. Minimal optimizations were made.
    • Corner Turn: optimized code uses AltiVec intrinsics.
    • DB: optimized code uses specialized memory management routines.

  • MIT Lincoln Laboratory, Haney - 11

    Outline

    • Introduction
    • Kernel Level Benchmarks
    • SAR Benchmark
      – Overview
      – System Architecture
      – Computational Components
    • Release Information
    • Summary

  • MIT Lincoln Laboratory, Haney - 12

    Spotlight SAR System

    • Intent of Compact App:
      – Scalable
      – High Compute Fidelity
      – Self-Verifying

    • Principal performance goal: Throughput
      – Maximize rate of results
      – Overlapped IO and computing

  • MIT Lincoln Laboratory, Haney - 13

    SAR System Architecture

    [Figure: block diagram with blocks marked as Computation or File IO.
    Stages: Scalable Data and Template Generator; Front-End Sensor Processing; Back-End Knowledge Formation; SAR Image Template Insertion; Validation.
    Kernels: Kernel #1 Data Read and Image Formation; Kernel #2 Image Storage; Kernel #3 Image Retrieval; Kernel #4 Detection.
    Data: Raw SAR Data Files; Raw SAR File; SAR Image; SAR Image Files; Templates; Template Files; Groups of Template Files; Image Files; Sub-Image Detection Files; Detections.]

    HPEC community has traditionally focused on Computation … but File IO performance is increasingly important

  • MIT Lincoln Laboratory, Haney - 14

    Data Generation and Computational Stages

    [Figure: block diagram.
    Stages: Scalable Data and Template Generator; Sensor Processing; Knowledge Formation; SAR Image Template Insertion; Validation.
    Kernels: Kernel #1 Image Formation; Kernel #2 Image Storage; Kernel #3 Image Retrieval; Kernel #4 Detection.
    Data: Raw SAR; Raw SAR Data Files; Raw SAR File; SAR Image; SAR Image Files; Templates; Template Files; Groups of Template Files; Image Files; Detection Files; Sub-Image Detection Files; Detections.]

  • MIT Lincoln Laboratory, Haney - 15

    SAR Overview

    • Radar captures echo returns from a ‘swath’ on the ground
    • Notional linear FM chirp pulse train, plus two ideally non-overlapping echoes returned from different positions on the swath
    • Summation and scaling of echo returns realizes a challengingly long antenna aperture along the flight path

    $$ s(t,u) = \sum_{\text{pulses}} \sum_{\text{swath}} \alpha(n,m)\, p\bigl(t - \tau(n,m)\bigr) $$

    – $s(t,u)$: received ‘raw’ SAR
    – $\alpha(n,m)$: reflection coefficient scale factor, different for each return from the swath
    – $p(t - \tau(n,m))$: delayed transmitted SAR waveform

    [Figure: Range, X = 2X0; Cross-Range, Y = 2Y0; Fixed to Broadside; Synthetic Aperture, L]

  • MIT Lincoln Laboratory, Haney - 16

    Scalable Synthetic Data Generator

    • Generates synthetic raw SAR complex data
    • Data size is scalable to enable rigorous testing of high performance computing systems
      – User defined scale factor determines the size of images generated
    • Generates ‘templates’ that consist of rotated and pixelated capitalized letters

    Source Code adapted from Soumekh, 1999.

    [Figure: Spotlight SAR Returns. Axes: Cross-Range, Range.]

  • MIT Lincoln Laboratory, Haney - 17

    Kernel 1: SAR Image Formation

    Source Code adapted from Soumekh, 1999.

    Processing chain:

    1. Fourier Transform, $(t,u) \to (\omega,k_u)$: $s(t,u)$ becomes $s(\omega,k_u)$
    2. Matched Filtering with $s_0^*(\omega,k_u)$
    3. Interpolation, $k_x = \sqrt{4k^2 - k_u^2}$, $k_y = k_u$: gives $F(k_x,k_y)$
    4. Inverse Fourier Transform, $(k_x,k_y) \to (x,y)$: gives $f(x,y)$

    [Figure: Spotlight SAR Reconstruction. Axes: Cross-Range, Pixels; Range, Pixels.]

    [Figure: Spatial Frequency Domain Interpolation. Received Samples Fit a Polar Swath; Processed Samples Fit a Rectangular Swath. Axes: kx, ky.]

  • MIT Lincoln Laboratory, Haney - 18

    Template Insertion (untimed)

    • Inserts rotated pixelated capital letter templates into each SAR image
      – Non-overlapping locations and rotations
      – Randomly selects 50%
      – Used as ideal detection targets in Kernel 4

    [Figure: two panels titled “Image with Inserted Templates”, axes X Pixels and Y Pixels. One panel shows the image if inserted with 100% of the templates; the other shows the image with only a random 50% of the templates inserted.]

  • MIT Lincoln Laboratory, Haney - 19

    Kernel 4: Detection

    • Detects targets in SAR images
      1. Image difference
      2. Threshold
      3. Sub-regions
      4. Correlate with every template; max is target ID

    • Computationally difficult
      – Many small correlations over random pieces of a large image
    • 100% recognition, no false alarms

    [Figure: panels Image A, Image B, Image Difference, Thresholded Image Difference, Sub-region]
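
The detection steps can be sketched in two small functions. Both names (`threshold_difference`, `identify_target`) are hypothetical, sub-region extraction is left out, and the correlation is a plain dot product at zero offset, which is an assumption: the slide says only that each sub-region is correlated with every template.

```c
#include <stddef.h>

/* Hypothetical sketch of steps 1 and 2: difference two images and threshold.
 * mask[i] is 1 where |a[i] - b[i]| exceeds thresh.
 * Returns the number of pixels set in the mask. */
size_t threshold_difference(const float *a, const float *b, size_t npix,
                            float thresh, unsigned char *mask)
{
    size_t i, count = 0;
    for (i = 0; i < npix; ++i) {
        float d = a[i] - b[i];
        if (d < 0.0f)
            d = -d;
        mask[i] = (unsigned char)(d > thresh);
        count += mask[i];
    }
    return count;
}

/* Hypothetical sketch of step 4: correlate one sub-region with every
 * template of the same size; the index of the maximum is the target ID.
 * templates holds ntemplates x npix values, row-major. */
size_t identify_target(const float *sub, const float *templates,
                       size_t ntemplates, size_t npix)
{
    size_t t, i, best = 0;
    float best_corr = 0.0f;
    for (t = 0; t < ntemplates; ++t) {
        const float *tp = templates + t * npix;
        float corr = 0.0f;
        for (i = 0; i < npix; ++i)
            corr += sub[i] * tp[i];
        if (t == 0 || corr > best_corr) {
            best_corr = corr;
            best = t;
        }
    }
    return best;
}
```

An unnormalized dot product favors templates with more energy, so a practical version would likely normalize each correlation before taking the maximum.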

  • MIT Lincoln Laboratory, Haney - 20

    Benchmark Summary and Computational Challenges

    • Scalable synthetic data generation
    • Pulse compression
    • Polar Interpolation
    • FFT, IFFT (corner turn)
    • Sequential store
    • Non-sequential retrieve
    • Large & small IO
    • Large image difference & threshold
    • Many small correlations on random pieces of large image

    [Figure: block diagram. Scalable Data and Template Generator; Raw SAR; Templates; Front-End Sensor Processing; Kernel #1 Image Formation; SAR Image; SAR Image Template Insertion; Back-End Knowledge Formation; Kernel #4 Detection; Detections; Validation.]

  • MIT Lincoln Laboratory, Haney - 21

    Outline

    • Introduction
    • Kernel Level Benchmarks
    • SAR Benchmark
    • Release Information
    • Summary

  • MIT Lincoln Laboratory, Haney - 22

    HPEC Challenge Benchmark Release

    • http://www.ll.mit.edu/HPECChallenge/
      – Future site of documentation and software

    • Initial release is available to PCA, HPCS, and HPEC SI program members through respective program web pages
      – Documentation
      – ANSI C Kernel Benchmarks
      – Single processor MATLAB SAR System Benchmark

    • Complete release will be made available to the public in first quarter of CY06

  • MIT Lincoln Laboratory, Haney - 23

    Summary

    • The HPEC Challenge is a publicly available suite of benchmarks for the embedded space
      – Representative of a wide variety of DoD applications
    • Benchmarks stress computation, communication and I/O
    • Benchmarks are provided at multiple levels
      – Kernel: small enough to easily understand and optimize
      – Compact application: representative of real workloads
      – Single-processor and multi-processor
    • For more information, see http://www.ll.mit.edu/HPECChallenge/

  • MIT Lincoln Laboratory, Haney - 24

    Backup Slides

  • MIT Lincoln Laboratory, Haney - 25

    Kernel Data Set Summary

    Kernel             Parameter           Set 1     Set 2      Set 3     Set 4
    FIR                Filters             64        20
                       Coefs               128       12
                       Vec Length          4096      1024
    QR/SVD             Matrix size         500x100   180x60     150x150
    CFAR Detection     Beams               16        48         48        16
                       Range gates         64        3500       1909      9900
                       Dopplers            24        128        64        16
    Corner Turn        Matrix size         50x5000   750x5000
    Pattern Match      Pattern length      64        128
                       Number of patterns  72        256
    Database           Total records       500       102,400
                       Ops per cycle       440       700
    Genetic algorithm  Genes               8         96         5         10
                       Chromosomes         50        150        50        150