
Multidimensional Performance Modeling for Advanced Embedded Signal Processors

Michael Stebnisky
Carl Hein

Lockheed Martin Advanced Technology Laboratories
1 Federal Street • A&E Building 2W
Camden, New Jersey 08102
mstebnis@atl.lmco.com

Systems Integration

Multidimensional Performance Modeling

• Problem
  - Traditional performance modeling approaches are unable to address emerging requirements and component technologies. This is a result of an increased awareness of, and need for, dynamically adaptive or reconfigurable systems, particularly in the area of power dissipation/performance.

• Goal(s)/Objective(s)
  - Define methods/algorithms to accurately model and optimize reconfigurable architectures and the functions (services) required to support multidimensional performance modeling.
  - Apply ideas developed from InfoPad, ACS, PAC/C, DARES, PCA, and MSP to develop a unique new rapid prototyping/optimization capability.

• Approach
  - Define features required to support accurate performance and multidimensional modeling and optimization of dynamically reconfigurable architectures (DRAs).
  - Evaluate algorithms/methods for performing intelligent, reactive dynamic scheduling.
  - Evaluate algorithms/methods for performing offline analysis, data reduction, pattern recognition, and execution planning.

DoD missions/systems require new approaches/tools to exploit emerging reconfigurable technologies to form polymorphous/power-aware systems.

A Methodology for Verifiable, Validatable Polymorphic Architectures

[Figure: DRA Optimization Process Flow. Block labels: Application Flowgraphs; DRA Object Library; Offline Optimization (Speed, Latency, Power, Energy, QoS, ...); Preprocessed Scheduling Information; Parameterized Resource Executables; Runtime System Model; Runtime Optimization (Speed, Latency, Power, Energy, QoS, ...); Testcase/Scenario Testbench; Schedule/Component Characteristics; Analysis/Feedback to Optimization; Estimates; Local Designs; COTS.]

Dynamically Reconfigurable Architectures

• Based upon a two-phase optimization process: offline and online

• Offline optimization
  - Overall goal: perform component selection, pruning, data extraction, and sensitivity analysis that will maximize the effectiveness of the online optimization
  - Selects an optimum set of verified, validated components from existing libraries
  - Facilitates analysis to identify potential implementations better suited to the application
  - Identifies dependencies/relationships in the data flow graph that will facilitate online scheduling

• Online optimization
  - Components are selected for execution at runtime based upon dynamic figures of merit
  - Figures of merit are complex functions of component characteristics
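As a rough illustration of the online phase, the sketch below picks, from a library that offline optimization has already pruned, the implementation with the best figure of merit. The `Implementation` fields, the linear weighted form of `figure_of_merit`, the weight sets, and all numbers are assumptions made for this example; the slides do not specify how the figures of merit are computed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Implementation:
    """One characterized implementation of a task (all values illustrative)."""
    name: str
    throughput: float  # higher is better
    latency: float     # lower is better
    power: float       # lower is better


def figure_of_merit(impl, weights):
    """Hypothetical figure of merit: a weighted sum in which higher is better.

    The linear form is an assumption; the slides only state that figures of
    merit are complex functions of component characteristics.
    """
    return (weights["throughput"] * impl.throughput
            - weights["latency"] * impl.latency
            - weights["power"] * impl.power)


def select_online(candidates, weights):
    """Pick the candidate with the highest figure of merit for these weights."""
    return max(candidates, key=lambda impl: figure_of_merit(impl, weights))


# Library assumed to have been pruned offline (invented numbers).
fft_library = [
    Implementation("fft_fast", throughput=8.0, latency=2.0, power=6.0),
    Implementation("fft_balanced", throughput=5.0, latency=3.0, power=3.0),
    Implementation("fft_low_power", throughput=2.0, latency=6.0, power=1.0),
]

# Two hypothetical weightings that could be swapped at run time.
speed_weights = {"throughput": 1.0, "latency": 0.1, "power": 0.0}
power_weights = {"throughput": 0.1, "latency": 0.0, "power": 1.0}

print(select_online(fft_library, speed_weights).name)
print(select_online(fft_library, power_weights).name)
```

Because the weights are plain data, changing the selection criteria at run time only means passing a different dictionary; the library itself is untouched.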

Key Aspects of Approach

• Dealing with overwhelming flexibility
  - Management of complexity and methods of abstraction/simplification

• Two-stage optimization
  - Offline (component and subsystem) and online (scheduling)

• Methodology for generating/optimizing/representing components and reconfigurable devices
  - Develop operating points, analyze, improve, and abstract

• Dynamic scheduling with multidimensional goals/constraints
  - "Toolkit" approach using offline extracted information

• Offline information extraction and optimization of scheduling
  - Analysis/characterization/abstraction of tasks/graphs

• Constraint/goal simplification using modes
  - Abstraction method minimizing computation and enforcing interfaces

• Reuse of existing capability to perform the required analysis and data visualization
  - Use the internally developed CSIM

[Figure: CSIM HW/SW co-simulation flow. Labels: Graphical Hierarchical Hardware Architecture Definition (Model of Hardware; Behaviors: PE, SE, ME; Structure: Network Topology); Graphical Hierarchical Software Architecture Definition (SW Data Flow Graph (DFG) with FFT, FIR, and VMUL nodes); Partition; Map/Allocate; Schedule; Application SW for PEs (Board 1/PE1: Recv 1024, Compute 3.2 us, Send 512,2, Send 512,3, Recv 1024); HW/SW Co-Simulation; Visualization & Animation; Analysis; Modify. Caption: Produces animated and static simulation timeline displays as well as processor, communication, and memory statistical data.]

CSIM: HW/SW Virtual Prototyping Capabilities and Features

• CSIM has demonstrated the ability to model complex signal processor network behavior and performance

• CSIM is a C-based, hierarchical simulation environment for modeling system, subsystem, and individual module/component HW/SW performance

• CSIM's independent use of HW/SW models provides a path for reducing development costs as well as life cycle costs

• CSIM provides a common environment for developing and verifying system, subsystem, and module HW/SW interfaces and overall system performance

Detailed information on CSIM can be found at: www.atl.lmco.com/proj/csim/

Automated Support to Offline Analysis/Optimization

[Figure: Design space resulting from evaluating a representative signal processing application]

• Each task/component (e.g., FFT, FIR) is abstracted by several parameters (e.g., throughput, latency, power, size)

• Each task may have several differently optimized implementations

• An online library of implementations is selected

• The application(s) of interest are simulated using a subset of the available components

• The results of simulating each subset are characterized with respect to the parameters of interest

• The characterizations are used to identify a subset of implementations that perform best across the parameters of interest
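One common way to identify candidates that perform best across several parameters at once is to keep only the non-dominated (Pareto-optimal) results. The sketch below applies that idea to invented simulation results; the slides do not say which selection method is used, so Pareto filtering, the metric names, and all numbers are assumptions made for illustration.

```python
def dominates(a, b):
    """True if a is at least as good as b on every metric and better on one.

    Throughput is maximized; latency and power are minimized.
    """
    at_least_as_good = (a["throughput"] >= b["throughput"]
                        and a["latency"] <= b["latency"]
                        and a["power"] <= b["power"])
    strictly_better = (a["throughput"] > b["throughput"]
                       or a["latency"] < b["latency"]
                       or a["power"] < b["power"])
    return at_least_as_good and strictly_better


def pareto_front(results):
    """Keep every result that no other result dominates."""
    return [r for r in results
            if not any(dominates(other, r) for other in results)]


# Hypothetical characterizations of four simulated component subsets.
simulated = [
    {"subset": "A", "throughput": 8.0, "latency": 2.0, "power": 6.0},
    {"subset": "B", "throughput": 5.0, "latency": 3.0, "power": 3.0},
    {"subset": "C", "throughput": 4.0, "latency": 4.0, "power": 5.0},  # worse than B on all three
    {"subset": "D", "throughput": 2.0, "latency": 6.0, "power": 1.0},
]

kept = [r["subset"] for r in pareto_front(simulated)]
print(kept)
```

Subset C is dropped because B beats it on every metric; A, B, and D each represent a different trade-off, so all three would remain available to the online scheduler.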

Processor Architecture and Environment

This graphic illustrates a representative processor architecture and an associated CSIM visualization environment. The core CSIM capability has been augmented with a dynamic scheduler that can apply complex selection criteria to the scheduling process. In this example, these complex selection criteria are encapsulated as several "modes"; the buttons on the lower right are used to interactively switch between the modes. In addition, any of the processors can be "killed" or returned to the simulation, facilitating the evaluation of failover and similar analyses. Illustrative results are shown in the following slides.

Dynamically Reconfigurable Architectures

[Figure: results for five scheduling modes. Labels: power-driven scheduling, "LoPwr"; complex-criteria-driven scheduling, "DDPSWC"; assigned task scheduling, "Base"; assigned speed scheduling, "Fast"; max speed scheduling, "Fastest".]

• Real-time, on-demand dynamic allocation of computational resources

• Each task/component (e.g., FFT, FIR) is abstracted by several parameters (e.g., throughput, latency, power, size)

• Each task may have several differently optimized implementations

• An online library of implementations is selected

• At run time, the implementation best matched to the current selection criteria is executed

• For this example, the ratio of maximum throughput to minimum power dissipation is improved >10X versus standard practice

SWEPT improvement >10X vs standard practice
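To make the mode-based selection and the throughput-to-power ratio concrete, here is a minimal sketch. The mode names "Fastest" and "LoPwr" come from the slide, but the rule attached to each mode, the reading of "standard practice" as a single fixed implementation, and every number are assumptions; the result printed here is not the measurement reported above.

```python
# Mode names are taken from the slide; what each rule optimizes is an
# assumption made for this sketch.
MODE_RULES = {
    "Fastest": lambda impl: impl["throughput"],  # prefer highest throughput
    "LoPwr": lambda impl: -impl["power"],        # prefer lowest power
}


def select_for_mode(mode, library):
    """Return the library entry that best matches the current mode."""
    return max(library, key=MODE_RULES[mode])


def throughput_to_power_ratio(library):
    """Maximum throughput over minimum power, each taken in its own mode."""
    fastest = select_for_mode("Fastest", library)
    lowest = select_for_mode("LoPwr", library)
    return fastest["throughput"] / lowest["power"]


# Invented characterizations; not data from the slides.
fir_library = [
    {"name": "fir_fast", "throughput": 6.0, "power": 6.0},
    {"name": "fir_base", "throughput": 4.0, "power": 4.0},
    {"name": "fir_low_power", "throughput": 2.0, "power": 2.0},
]
fixed_design = [fir_library[1]]  # one implementation, no run-time choice

dynamic_ratio = throughput_to_power_ratio(fir_library)  # 6.0 / 2.0
fixed_ratio = throughput_to_power_ratio(fixed_design)   # 4.0 / 4.0
print(dynamic_ratio / fixed_ratio)
```

With a fixed design the fastest and the lowest-power operating points are the same implementation, so the ratio cannot exceed that one implementation's throughput per watt; a library with run-time selection can pair the best of each.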

Dynamically Reconfigurable Architectures

[Figure: simulation display with callouts "Processors faulted/switched out" and "Processors returned to operation".]

• Validation and verification of next-generation processor architectures

• Any processor may be temporarily or permanently removed from the architecture

• Processors may be returned to the architecture

• Facilitates development and analysis of failover/fault adaptation methodologies

• Extends multidimensional optimization to architectures with intermittently inactive components (especially useful for throughput/power trades)
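The remove/return behavior described above can be sketched as a pool of processors whose membership changes between scheduling passes. The `ProcessorPool` class, its method names, and the round-robin assignment are hypothetical stand-ins chosen for brevity; they are not CSIM's interface or its scheduler.

```python
class ProcessorPool:
    """Tracks which processors are currently available for scheduling."""

    def __init__(self, names):
        self.active = {name: True for name in names}

    def kill(self, name):
        """Switch a processor out, as when it faults."""
        self.active[name] = False

    def restore(self, name):
        """Return a processor to operation."""
        self.active[name] = True

    def schedule(self, tasks):
        """Assign tasks round-robin across the currently active processors."""
        alive = [name for name, up in self.active.items() if up]
        if not alive:
            raise RuntimeError("no active processors")
        return {task: alive[i % len(alive)] for i, task in enumerate(tasks)}


pool = ProcessorPool(["PE1", "PE2", "PE3"])
tasks = ["fft", "fir", "vmul"]

before = pool.schedule(tasks)
pool.kill("PE2")
during = pool.schedule(tasks)   # work is spread over PE1 and PE3 only
pool.restore("PE2")
after = pool.schedule(tasks)

print(before)
print(during)
print(after)
```

Comparing `before`, `during`, and `after` is the kind of check a failover study would make: every task stays mapped while a processor is out, and the original mapping returns once it is restored.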

Other Applications of CSIM

| Program | Description | Customer | Key Technology |
|---|---|---|---|
| VNS | Network attack simulator for training IA operators | US Army CECOM | Token-based performance modeling, HLA |
| AMRFS | Agent-based resource management | ONR | Intelligent agents, dynamic scheduler |
| Advanced Surface Ship Model | Computing infrastructure model | LM NE&SS-Moorestown | Token-based performance modeling, pre-emptive multitasking CPU models |
| Avionics Mission Computing System | Model of JSF-Integrated Core Processor | LM NE&SS-Egan | Token-based performance modeling, hardware/software co-simulation |
| Wireless Mobile Networks | Dynamic ad hoc routing behaviors | DARPA (XGComms), US Army CECOM (WIN-T) | Abstract propagation and arbitration models, dynamic routing algorithms |

Run-time Environment and Design Application for Polymorphous Technology Verification & Validation (RE-ADAPT V&V)

[Figure labels: Specification; Requirements; CSIM & MaC.]

• Impact
  - Design-time modeling and simulation environment for verification and validation (V&V) of PCA-enabled applications
  - Run-time monitoring for PCA-enabled application V&V
  - Approach validation via application to PCA-enabled avionics design

• New Ideas
  - Methodology for run-time detection of requirement violations
  - Framework for automatic generation of monitoring and checking components
  - Dynamic run-time corrector to force reconfiguration into a safe state

Real-time Systems Group, University of Pennsylvania
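The monitor, checker, and corrector roles listed under New Ideas can be pictured with a very small sketch. Everything in it is hypothetical: the requirement names, the thresholds, the state fields, and `SAFE_CONFIG` are invented for illustration, and the code does not represent the MaC framework or any generated monitoring component.

```python
# Hypothetical safe configuration that the corrector falls back to.
SAFE_CONFIG = {"mode": "Base", "active_processors": 4}

# Hypothetical requirements, each a predicate over the observed state.
REQUIREMENTS = {
    "power_budget": lambda state: state["power_w"] <= 10.0,
    "deadline": lambda state: state["latency_us"] <= 5.0,
}


def check(state):
    """Checker: return the names of all requirements the state violates."""
    return [name for name, holds in REQUIREMENTS.items() if not holds(state)]


def monitor_step(state, config):
    """Monitor one observation; on any violation, force the safe config."""
    violations = check(state)
    if violations:
        return dict(SAFE_CONFIG), violations
    return config, violations


config = {"mode": "Fastest", "active_processors": 8}

# Observation within limits: the configuration is left alone.
config, violations = monitor_step({"power_w": 8.0, "latency_us": 3.0}, config)
print(config, violations)

# Observation over the power budget: the corrector takes over.
config, violations = monitor_step({"power_w": 12.0, "latency_us": 3.0}, config)
print(config, violations)
```

Keeping the requirements as data separate from the checking loop mirrors the idea of generating monitoring and checking components from a specification rather than hand-coding them.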

DARPATech Demo

[Figure: demo display with two panels, "PCA Virtual Processor State and Activity" and "System State and Task Flow". Callouts: stream processors indicated by filled boxes; GP processors indicated by outlined boxes; dynamic bar chart indicating total active processors, active stream processors, active GP processors, and active threaded processors; total active processor count display; MaCS messages and status; mission status. Tasks: RADAR, Imaging, Route Planning, Self Test and MaCS, Flight Control, Threat Avoidance, Mission Assignment, Communications.]

Real-time Systems Group, University of Pennsylvania