Parallel Real-Time Scheduling from Theory to Implementation


Jing Li
Department of Computer Science, Ying Wu College of Computing
New Jersey Institute of Technology
jingli@njit.edu

Outline

- Why do we demand parallelism?
- My research: exploiting parallelism in real-time systems
- Two examples of my research

Future of Computing Performance

[Figure: 40 Years of Microprocessor Trend Data; clock frequency (MHz, log scale) vs. year, 1970-2020.]

Original data up to the year 2010 collected and plotted by M. Horowitz, F. Labonte, O. Shacham, K. Olukotun, L. Hammond, and C. Batten. New plot and data collected for 2010-2015 by K. Rupp.

Power Density

- Clock frequency hits the power wall.

[Image: the fastest sloth working at the DMV (from Zootopia).]

Multi-Core Systems

[Figure: 40 Years of Microprocessor Trend Data; number of logical cores and clock frequency (MHz, log scale) vs. year, 1970-2020.]

Original data up to the year 2010 collected and plotted by M. Horowitz, F. Labonte, O. Shacham, K. Olukotun, L. Hammond, and C. Batten. New plot and data collected for 2010-2015 by K. Rupp.

Multi-Core = Parallelism?

- Concurrent execution of jobs
- Parallel execution of a job

[Figure: jobs (Job 1, Job 2, Job 3) on the cores of a single multi-core machine, contrasting concurrent execution of separate jobs with parallel execution of one job.]

Lane Keeping Assist System in Cars

Since the system interacts with the physical world, its computation must be completed under a time constraint.

Applications Benefit from Intra-Job Parallelism

- Motion planning program in a self-driving car

From Kim et al. [ICCPS13]

Cyber-Physical Systems (CPS)

CPS are built from, and depend upon, the seamless integration of computational algorithms and physical components.

E.g., robotics, drones, autonomous vehicles, etc.

Applications Benefit from Intra-Job Parallelism

- Web search: needs to respond within 100 ms for users to find it responsive.*

[Figure: web search pipeline; a query flows through document index search, second-phase ranking, and snippet generation before a response is returned.]

*Jeff Dean et al. (Google), "The tail at scale." Communications of the ACM 56.2 (2013).

Interactive Cloud Services (ICS)

E.g., web search, online gaming, stock trading, etc.

Real-Time Systems

The performance of these systems depends not only upon their functional aspects, but also upon their temporal aspects.

Real-time performance:
1) Provide hard guarantees of meeting jobs' deadlines (e.g., CPS)
2) Optimize latency-related objectives for jobs (e.g., ICS)

[Figure: jobs (Job 1, Job 2, Job 3) on the cores of a single multi-core machine.]

New Generation of Real-Time Systems

Characteristics:
- New classes of applications with complex functionalities
- Increasing computational demand of each application
- Consolidating multiple applications onto a shared platform
- Rapid increase in the number of cores per chip

Demand: leverage parallelism within the applications to improve real-time performance and system efficiency.

[Figure: jobs (Job 1, Job 2, Job 3) on the cores of a single multi-core machine.]

My research: parallel real-time scheduling from theory to implementation

Outline

- Why do we demand parallelism?
- My research: exploiting parallelism in real-time systems
- Example 1: parallel real-time scheduling for meeting deadlines
- Example 2: parallel real-time scheduling for a target latency

Real-Time Hybrid Simulation (RTHS)

Since the numerical simulation interacts with the physical specimen, its computation must be completed by its deadline.

[Figure: numerical simulation and physical specimen, separated by the cyber-physical boundary.^]

^Robert L. and Terry L. Bowen Large Scale Structures Laboratory at Purdue University

How to Allocate Cores to Multiple Parallel Jobs?

- A real-world system consists of many parallel jobs
- Each job demands different real-time performance
  - E.g., jobs need to meet their deadlines in RTHS
- The goal of parallel real-time scheduling: smartly allocate cores to parallel jobs to meet their deadlines

[Figure: parallel jobs on the cores of a single multi-core machine.]

How to Analyze a Parallel Job?

- An example: computation on an array A[i] with m elements (here, m = 3)

    int i = 0;
    parallel_for (; i < m; i++) {   // iterations may execute in parallel
        compute( A[i] );            // independent work on each array element
    }
    bar();                          // runs only after all iterations finish

Directed Acyclic Graph (DAG) Model

Naturally captures programs generated by parallel languages such as Cilk Plus, Intel TBB, and OpenMP.

Node: sequential computation (a unit node is a single instruction)
Edge: dependence between nodes
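As a hedged sketch (not from the slides), the parallel_for example above could be written in OpenMP, one of the parallel languages the DAG model captures; compute() and bar() are placeholders standing in for the slide's functions:

    // dag_example.c -- compile with: gcc -fopenmp dag_example.c
    #include <stdio.h>

    #define M 3

    // Placeholder for the per-element work from the slide.
    static void compute(int x) { printf("compute(%d)\n", x); }

    // Placeholder for the code that runs after the parallel loop joins.
    static void bar(void) { printf("bar()\n"); }

    int main(void) {
        int A[M] = {10, 20, 30};

        // Each iteration is an independent node of the DAG; the implicit
        // barrier at the end of the loop is the join edge into bar().
        #pragma omp parallel for
        for (int i = 0; i < M; i++) {
            compute(A[i]);
        }

        bar();   // the sink node: depends on all loop iterations
        return 0;
    }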

[Figure: the example DAG executed step by step; at each step, nodes are marked as available, completed, or unavailable.]

Work and Span

Work T1: execution time on one core
Span T∞: execution time on an infinite number of cores (the critical-path length)

For the example DAG: T1 = 18, T∞ = 9.
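As an illustrative sketch (not from the talk), the work and span of a DAG can be computed with one pass over the nodes in topological order; the 4-node fork-join DAG and its node costs below are hypothetical, not the slide's 18-unit example:

    #include <stdio.h>

    #define N 4   /* hypothetical fork-join DAG: 0 -> {1, 2} -> 3 */

    int main(void) {
        /* cost[v] = execution time of node v; edge[u][v] = 1 if u precedes v */
        int cost[N] = {1, 3, 2, 1};
        int edge[N][N] = { {0,1,1,0}, {0,0,0,1}, {0,0,0,1}, {0,0,0,0} };

        int work = 0, span = 0;
        int finish[N];   /* length of the longest path ending at node v */

        /* Nodes are indexed in topological order, so one forward pass suffices. */
        for (int v = 0; v < N; v++) {
            int longest_pred = 0;
            for (int u = 0; u < v; u++)
                if (edge[u][v] && finish[u] > longest_pred)
                    longest_pred = finish[u];
            finish[v] = longest_pred + cost[v];
            work += cost[v];                         /* T1: total work */
            if (finish[v] > span) span = finish[v];  /* T_inf: critical path */
        }

        printf("work T1 = %d, span Tinf = %d\n", work, span);  /* prints 7 and 5 */
        return 0;
    }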

Federated Scheduling (FS)

For parallel tasks, FS has the best bound in terms of schedulability.

FS assigns n_i dedicated cores to each parallel task, where n_i is the minimum number of cores needed for the task to meet its deadline:

    n_i = ⌈ (C_i − L_i) / (D_i − L_i) ⌉

with worst-case work C_i, worst-case span L_i, and deadline D_i (= period) for task i.

[Figure: tasks mapped to dedicated subsets of cores.]
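A minimal sketch of this core-assignment rule in C (the task parameters in main are illustrative, not from the talk):

    /* fs_cores.c -- compile with: gcc fs_cores.c -lm */
    #include <math.h>
    #include <stdio.h>

    // Minimum number of dedicated cores for a parallel task under federated
    // scheduling, given worst-case work C, worst-case span L, and deadline D.
    static int fs_cores(double C, double L, double D) {
        if (D <= L) return -1;                 // infeasible: span exceeds deadline
        return (int)ceil((C - L) / (D - L));
    }

    int main(void) {
        // Illustrative task: work 18, span 9, deadline 12 (arbitrary numbers).
        printf("n_i = %d\n", fs_cores(18.0, 9.0, 12.0));   // ceil(9 / 3) = 3
        return 0;
    }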

FS Platform

- Middleware platform providing the FS service in Linux
- Works with the GNU OpenMP runtime system
- Runs OpenMP programs with minimal modification

[Figure: FS platform architecture; the federated scheduling (FS) layer sits on top of Linux and allocates cores to per-task OpenMP runtimes.]
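The talk does not show implementation code; as a hedged sketch of how dedicated cores might be assigned on Linux (an assumption about the mechanism, not the FS Platform's actual code), a middleware could restrict a task's process to its n_i cores with sched_setaffinity before the OpenMP runtime spawns its worker threads:

    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>

    // Hypothetical helper: pin the calling process to cores
    // [first_core, first_core + n_cores); threads created afterwards
    // (e.g., OpenMP workers) inherit this affinity mask.
    static int pin_to_cores(int first_core, int n_cores) {
        cpu_set_t set;
        CPU_ZERO(&set);
        for (int c = first_core; c < first_core + n_cores; c++)
            CPU_SET(c, &set);
        return sched_setaffinity(0, sizeof(set), &set);  // pid 0 = calling process
    }

    int main(void) {
        if (pin_to_cores(0, 3) != 0) {    // e.g., a task assigned n_i = 3 cores
            perror("sched_setaffinity");
            return 1;
        }
        /* ... launch the task's OpenMP parallel regions here ... */
        return 0;
    }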

Parallelism Improves RTHS Accuracy

An RTHS simulates a nine-story building with a first-story damper.

- Previously, sequential processing power limited the rate to 575 Hz
- Parallel execution now allows a rate of 3000 Hz
- Reduction in error for acceleration and displacement
- Parallelism increases accuracy via faster actuation and sensing

[Figure: normalized error (%) over time (sec) for sequential execution at 575 Hz vs. parallel execution at 3000 Hz.]

Outline

- Why do we demand parallelism?
- My research: exploiting parallelism in real-time systems
- Example 1: parallel real-time scheduling for meeting deadlines
- Example 2: parallel real-time scheduling for a target latency

System for Interactive Cloud Services

Online system: job arrival times are not known in advance.
Objective: maximize the number of jobs that meet a target latency T.

[Figure: web search pipeline with aggregators; a query flows through document index search, second-phase ranking, and snippet generation.]

Workload Distribution Has a Long Tail

[Figure: Bing search workload; distribution of jobs' sequential execution time (work, in ms), with the target latency marked.]

- Large jobs must run in parallel to meet the target latency
- Always run large jobs with full parallelism?

Parallelize Large Jobs According to Load

Tail-Control Strategy: when load is low, run all jobs in parallel; when load is high, run large jobs sequentially (see the sketch after the figure below).

Latency = Processing Time + Waiting Time
- At low load: processing time dominates latency
- At high load: waiting time dominates latency

[Figure: two three-core schedule timelines against the target latency; one schedule misses 0 requests, the other misses 1 request.]
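As a hedged, illustrative sketch (not the actual Tail-Control implementation inside the TBB runtime, and with hypothetical threshold names and values), the per-job parallelism decision could look like this:

    #include <stdio.h>

    // Hypothetical tail-control decision: pick a job's degree of parallelism
    // from the current load and the job's (estimated) sequential work.
    static int choose_parallelism(double job_work_ms, int queued_jobs,
                                  int num_cores) {
        const double large_job_threshold_ms = 50.0;  // illustrative "large job" cutoff
        const int    high_load_threshold    = 8;     // illustrative "high load" cutoff

        int load_is_high = queued_jobs >= high_load_threshold;
        int job_is_large = job_work_ms >= large_job_threshold_ms;

        if (load_is_high && job_is_large)
            return 1;          // high load: serialize large jobs to protect the rest
        return num_cores;      // otherwise: run with full parallelism
    }

    int main(void) {
        printf("low load, large job  -> %d cores\n", choose_parallelism(120.0, 2, 16));
        printf("high load, large job -> %d core(s)\n", choose_parallelism(120.0, 12, 16));
        return 0;
    }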

The Inner Workings of Tail-Control

We implement the tail-control algorithm in the runtime system of Intel Threading Building Blocks (TBB) and evaluate it on a Bing search workload.

[Figure: step-by-step illustration of the tail-control runtime, compared against default work stealing and the target latency.]

Conclusion

Exploit the untapped efficiency in parallel computing platforms and drastically improve the real-time performance of applications.

- Systems guaranteed to meet deadlines for CPS
  - Develop provably good schedulers for parallel applications
  - Incorporate real-time scheduling into the parallel runtime system
- Systems optimized to meet a target latency for ICS
  - Design and implement strategies to optimize real-time performance
