Upload
trinhduong
View
215
Download
0
Embed Size (px)
Citation preview
Multidimensional Performance ModelingMultidimensional Performance Modelingfor Advanced Embedded Signalfor Advanced Embedded Signal
ProcessorsProcessors
Michael Michael StebniskyStebniskyCarl HeinCarl Hein
Lockheed Martin Advanced Technology LaboratoriesLockheed Martin Advanced Technology Laboratories1 Federal Street 1 Federal Street •• A&E Building 2W A&E Building 2W
Camden, New Jersey 08102Camden, New Jersey 08102mstebnismstebnis@@atlatl..lmcolmco.com.com
Systems IntegrationSystems Integration
22
Multidimensional Performance ModelingMultidimensional Performance Modeling
•• Problem:Problem:—— Traditional performance modeling approaches \are unable toTraditional performance modeling approaches \are unable to
address emerging requirements and component technologies.address emerging requirements and component technologies.This is a result of an increased awareness and need forThis is a result of an increased awareness and need fordynamically adaptive ordynamically adaptive or reconfigurable reconfigurable systems, particularly in systems, particularly inthe area of power dissipation/performance.the area of power dissipation/performance.
•• Goal(s)/Objectives(s)Goal(s)/Objectives(s)—— Define methods/algorithms to acurately model and optimizeDefine methods/algorithms to acurately model and optimize
reconfigurable architectures and functions (services) requiredreconfigurable architectures and functions (services) requiredto support multidimensional performance modeling.to support multidimensional performance modeling.
—— Apply ideas developed from InfoPad, ACS, PAC/C, DARES,Apply ideas developed from InfoPad, ACS, PAC/C, DARES,PCA, and MSP to develop a uniqe new rapidPCA, and MSP to develop a uniqe new rapidprototyping/optimization capability.prototyping/optimization capability.
•• ApproachApproach—— Define features required to support accurate performance andDefine features required to support accurate performance and
multidimensional modeling and optimization of DRAs.multidimensional modeling and optimization of DRAs.
—— Evaluate algorithms/methods for performing intelligent,Evaluate algorithms/methods for performing intelligent,reactive dynamic scheduling.reactive dynamic scheduling.
—— Evaluate algorithms/methods for performing offline analysis,Evaluate algorithms/methods for performing offline analysis,data reduction, pattern recognition, and execution planning.data reduction, pattern recognition, and execution planning.
DoDDoD missions/systems require new approaches/ tools missions/systems require new approaches/ tools to exploit emergingto exploit emergingreconfigurable technologies to form polymorphous/powerreconfigurable technologies to form polymorphous/poweraware systems.aware systems.
33
A Methodology for Verifiable, Validatable Polymorphic Architectures
DRA Optimization Process Flow
RuntimeSystemModel
Parameterized Resource
Executables
OfflineOptimiza
tion: Speed
Latency Power Energy
QoS ...
DRAObjectLibrary Runtime
Optim-ization: Speed
Latency Power Energy
QoS ...
ApplicationFlowgraphs
PreprocessedSchedulingInformation
Schedule/Component
Analysis/Feedback to Optimization
Characteristics
Est
imat
es
Lo
cal
Des
ign
s
CO
TS
Testcase/ Scenario Testbench
DynamicallyDynamically Reconfigurable Reconfigurable Architectures Architectures
•• Based upon a two phaseBased upon a two phaseoptimization process - offlineoptimization process - offlineand onlineand online
•• Offline optimizationOffline optimization—— Overall goal: performOverall goal: perform
component selection, pruning,component selection, pruning,data extraction and sensitivitydata extraction and sensitivityanalysis that will maximize theanalysis that will maximize theeffectiveness of the onlineeffectiveness of the onlineoptimizationoptimization
—— Selects an optimum set ofSelects an optimum set ofverified, validated componentsverified, validated componentsfrom existing librariesfrom existing libraries
—— Facilitates analysis to identifyFacilitates analysis to identifypotential implementations morepotential implementations moreoptimal to the applicationoptimal to the application
—— Identifies dependencies/relationshipsIdentifies dependencies/relationshipsin the data flow graph that will facilitate online schedulingin the data flow graph that will facilitate online scheduling
•• Online optimizationOnline optimization—— Components are selected for execution at runtime based upon dynamic figures of meritComponents are selected for execution at runtime based upon dynamic figures of merit—— Figures of merit are complex functions of component characteristicsFigures of merit are complex functions of component characteristics
44
Key Aspects of ApproachKey Aspects of Approach•• Dealing with overwhelming flexibilityDealing with overwhelming flexibility
—— Management of complexity andManagement of complexity andmethods of abstraction/simplificationmethods of abstraction/simplification
•• Two stage optimizationTwo stage optimization—— Offline (component and subsystem)Offline (component and subsystem)
and online (scheduling)and online (scheduling)
•• Methodology for generating/optimizing/Methodology for generating/optimizing/representing components andrepresenting components andreconfigurablereconfigurable devices devices—— Develop operating points, analyze, improve, andDevelop operating points, analyze, improve, and
abstractabstract
•• Dynamic scheduling with multidimensionalDynamic scheduling with multidimensionalgoals/constraintsgoals/constraints—— ““ToolkitToolkit”” approach using offline extracted information approach using offline extracted information
•• Offline information extraction and optimization of schedulingOffline information extraction and optimization of scheduling—— Analysis/characterization/abstraction of tasks/graphsAnalysis/characterization/abstraction of tasks/graphs
•• Constraint/goal simplification using modesConstraint/goal simplification using modes—— Abstraction method minimizing computation and enforcing interfaceAbstraction method minimizing computation and enforcing interfacess
•• Reuse of existing capability to perform the required analysis and dataReuse of existing capability to perform the required analysis and datavisualizationvisualization—— Use the internally developed CSIMUse the internally developed CSIM
55
Graphical Hierarchical HardwareArchitecture Definition
Graphical Hierarchical SoftwareArchitecture Definition
Produces Animated and Static Simulation TimelineDisplays as well as processor, communication, and
memory statistical data
Modify
Modify
Model of Hardware
BehaviorsPE SE ME
BehaviorsPE SE ME
BehaviorsPE SE ME
Structure:Network TopologyStructure:
Network Topology
SW Data Flow GraphDFG
FFT
FIR FFT VMULVMULVMUL
DFG
FFT
FIR FFT VMULVMULVMUL
Application SW for PEsBoard 1/ PE1 Recv 1024
Compute 3.2 us Send 512,2 Send 512,3 Recv 1024
Schedule
Map / Allocate
Partition
Analysis
HW / SWCo-SimulationVisualization &
Animation
Detailed information on CSIM can be found at: www.atl.lmco.com/proj/csim/Detailed information on CSIM can be found at: www.atl.lmco.com/proj/csim/
CSIM CSIM —— HW/SW Virtual PrototypingHW/SW Virtual PrototypingCapabilities and FeaturesCapabilities and Features
•• CSIM has demonstrated the ability toCSIM has demonstrated the ability tomodel complex signal processormodel complex signal processornetwork behavior and performancenetwork behavior and performance
•• CSIM is a C-based, hierarchicalCSIM is a C-based, hierarchicalsimulation environment forsimulation environment formodeling system, subsystem, andmodeling system, subsystem, andindividual module/componentindividual module/componentHW/SW performanceHW/SW performance
•• CSIMCSIM’’ss independent use of HW/SW independent use of HW/SWmodels provides a path for reducingmodels provides a path for reducingdevelopment costs as well as the lifedevelopment costs as well as the lifecycle costscycle costs
•• CSIM provides a commonCSIM provides a commonenvironment for developing andenvironment for developing andverifying system, subsystem, andverifying system, subsystem, andmodule HW/SW interfaces andmodule HW/SW interfaces andoverall system performanceoverall system performance
66
Design space resulting from evaluating aDesign space resulting from evaluating arepresentative signal processing applicationrepresentative signal processing application
Automated Support to Offline Analysis/Automated Support to Offline Analysis/OptimizationOptimization
•• Each task/component (i.e.Each task/component (i.e.fftfft, fir) is abstracted by, fir) is abstracted byseveral parameters (i.e.several parameters (i.e.throughput, latency, power,throughput, latency, power,size, etc.)size, etc.)
•• Each task may have severalEach task may have severaldifferently optimizeddifferently optimizedimplementationsimplementations
•• An online library ofAn online library ofimplementations is selectedimplementations is selected
•• The application(s) of interestThe application(s) of interestare simulated using a subsetare simulated using a subsetof the available componentsof the available components
•• The results of simulatingThe results of simulatingeach subset is characterizedeach subset is characterizedwith respect to the parameters of interestwith respect to the parameters of interest
•• The characterizations are used to identify a subset of implementationsThe characterizations are used to identify a subset of implementationsthat are most optimal across the parameters of interestthat are most optimal across the parameters of interest
77
Processor Architecture and EnvironmentProcessor Architecture and Environment
This graphic illustrates aThis graphic illustrates arepresentative processorrepresentative processorarchitecture and an associatedarchitecture and an associatedCSIM visualizationCSIM visualizationenvironment. The core CSIMenvironment. The core CSIMcapability has beencapability has beenaugmented with a dynamicaugmented with a dynamicscheduler that can applyscheduler that can applycomplex selection criteria tocomplex selection criteria tothe scheduling process. In thisthe scheduling process. In thisexample, these complexexample, these complexselection criteria areselection criteria areencapsulated as severalencapsulated as several““modesmodes””; the buttons on the; the buttons on thelower right are used tolower right are used tointeractively switch betweeninteractively switch betweenthe modes. In addition, any ofthe modes. In addition, any ofthe processors can be the processors can be ““killedkilled””or returned to the simulation,or returned to the simulation,facilitating the evaluation offacilitating the evaluation offailoverfailover and similar analyses. and similar analyses.Illustrative results are shownIllustrative results are shownin the following slides.in the following slides.
88
Power-drivenscheduling;“LoPwr”
Complex criteriadrivenscheduling,“DDPSWC”
Assignedtaskscheduling,“Base”
Assignedspeedscheduling,“Fast”
Maxspeed-scheduling,“Fastest”
DynamicallyDynamically Reconfigurable Reconfigurable Architectures Architectures•• Real time on-demandReal time on-demand
dynamic allocation ofdynamic allocation ofcomputational resourcescomputational resources
•• Each task/componentEach task/component(i.e. (i.e. fftfft, fir) is abstracted, fir) is abstractedby several parametersby several parameters(i.e. throughput, latency,(i.e. throughput, latency,power, size, etc.)power, size, etc.)
•• Each task may haveEach task may haveseveral differentlyseveral differentlyoptimized implementationsoptimized implementations
•• An online library ofAn online library ofimplementations is selectedimplementations is selected
•• At run time theAt run time theimplementation bestimplementation bestmatched to the currentmatched to the currentselection criteria is executedselection criteria is executed
•• For this example, the maximumFor this example, the maximumthroughput to minimum powerthroughput to minimum powerdissipation ratio is improved >10X versus standard practicedissipation ratio is improved >10X versus standard practice
SWEPT improvement >10X vs standard practiceSWEPT improvement >10XSWEPT improvement >10X vs vs standard practice standard practice
99
Processorsfaulted/
switched out
Processorsreturned tooperation
DynamicallyDynamically Reconfigurable Reconfigurable Architectures Architectures
•• Validation and verificationValidation and verificationof next generationof next generationprocessor architecturesprocessor architectures
•• Any processor may beAny processor may betemporarily or permanentlytemporarily or permanentlyremoved from theremoved from thearchitecturearchitecture
•• Processors may beProcessors may bereturned to the architecturereturned to the architecture
•• Facilitates developmentFacilitates developmentand analysis ofand analysis offailoverfailover/fault adaptation/fault adaptationmethodologiesmethodologies
•• Extends multidimensionalExtends multidimensionaloptimization tooptimization toarchitectures witharchitectures withintermittently inactiveintermittently inactivecomponents (especiallycomponents (especiallyuseful for throughput/useful for throughput/power trades)power trades)
1010
Other Applications of CSIMOther Applications of CSIM
ProgramProgram
VNSVNS
AMRFSAMRFS
AdvancedAdvancedSurfaceSurfaceShip ModelShip Model
AvionicsAvionicsMissionMissionComputingComputingSystemSystem
WirelessWirelessMobileMobileNetworksNetworks
DescriptionDescription
Network attackNetwork attacksimulator for trainingsimulator for trainingIA operatorsIA operators
Agent-based resourceAgent-based resourcemanagementmanagement
ComputingComputinginfrastructure modelinfrastructure model
Model of JSF-Model of JSF-Integrated CoreIntegrated CoreProcessorProcessor
Dynamic ad hocDynamic ad hocrouting behaviorsrouting behaviors
CustomerCustomer
US ArmyUS ArmyCECOMCECOM
ONRONR
LM NE&SS-LM NE&SS-MoorestownMoorestown
LM NE&SS-LM NE&SS-EganEgan
DARPADARPA((XGCommsXGComms))US ArmyUS ArmyCECOM (WIN-T)CECOM (WIN-T)
Key TechnologyKey Technology
Token-based performanceToken-based performancemodeling, HLAmodeling, HLA
Intelligent agents,Intelligent agents,dynamic schedulerdynamic scheduler
Token-based performanceToken-based performancemodeling, pre-emptivemodeling, pre-emptivemultitasking CPU modelsmultitasking CPU models
Token-based performanceToken-based performancemodeling, hardware/modeling, hardware/software co-simulationsoftware co-simulation
Abstract propagation andAbstract propagation andarbitration models,arbitration models,dynamic routingdynamic routingalgorithmsalgorithms
1111
Specification
Requirements
CSIM & MaC
Run-time Environment and DesignRun-time Environment and DesignApplication for Polymorphous TechnologyApplication for Polymorphous TechnologyVerification & Validation (RE-ADAPT V&V)Verification & Validation (RE-ADAPT V&V)
•• ImpactImpact
—— Design-Time Modeling andDesign-Time Modeling andSimulation Environment forSimulation Environment forVerification and Validation (V&V) of PCA enabled applicationsVerification and Validation (V&V) of PCA enabled applications
—— Run-Time Monitoring for PCA Enabled Application V&VRun-Time Monitoring for PCA Enabled Application V&V
—— Approach Validation via application to PCA enabled avionics designApproach Validation via application to PCA enabled avionics design
•• New IdeasNew Ideas
—— Methodology for run-timeMethodology for run-timedetection of requirementdetection of requirementviolationsviolations
—— Framework for automaticFramework for automaticgeneration of monitoring andgeneration of monitoring andchecking componentschecking components
—— Dynamic run-time corrector toDynamic run-time corrector toforce reconfiguration into aforce reconfiguration into asafe statesafe state
Real-time Systems Group, University of Pennsylvania
1212
StreamProcessorsindicated byfilled boxes
GP Processorsindicated byoutlined boxes
Dynamic barchart indicatingtotal activeprocessors,active streamprocessoractive GPprocessors andactive threadedprocessors
Total activeprocessorcount display
MaCS messagesand status
Mission status RADAR Tasks
Imaging
RoutePlanning
Self Testand MaCS
FlightControl
ThreatAvoidance
MissionAssignment
Communi-cations
PCA Virtual ProcessorState and Activity
System State andTask Flow
Real-time Systems Group, University of Pennsylvania
DARPATechDARPATech Demo Demo