ATLAS: A Scalable and High-Performance Scheduling Algorithm for Multiple Memory Controllers

  • Published on
    02-Jan-2016

DESCRIPTION

ATLAS: A Scalable and High-Performance Scheduling Algorithm for Multiple Memory Controllers. Yoongu Kim, Dongsu Han, Onur Mutlu, Mor Harchol-Balter. Motivation: modern multi-core systems employ multiple memory controllers, and applications contend with each other in multiple controllers.

Transcript

<p>ATLAS: A Scalable and High-Performance Scheduling Algorithm for Multiple Memory Controllers</p> <p>ATLAS</p> <p>A Scalable and High-Performance Scheduling Algorithm for Multiple Memory ControllersYoongu KimDongsu HanOnur MutluMor Harchol-Balter</p> <p>1MotivationModern multi-core systems employ multiple memory controllers</p> <p>Applications contend with each other in multiple controllers</p> <p>How to perform memory scheduling for multiple controllers?22Desired Properties of Memory Scheduling AlgorithmMaximize system performanceWithout starving any cores</p> <p>Configurable by system softwareTo enforce thread priorities and QoS/fairness policies</p> <p>Scalable to a large number of controllersShould not require significant coordination between controllers</p> <p>3No previous scheduling algorithm satisfies all these requirements Multiple memory controllers3Multiple Memory Controllers4CoreMCMemorySingle-MC systemMultiple-MC systemDifference?The need for coordinationCoreMCMemoryMCMemory4MC 1T1T2Memory service timelineThread 1s requestThread 2s requestT2Thread Ranking in Single-MC# of requests: Thread 1 &lt; Thread 2</p> <p>Thread 1 Shorter jobOptimal averagestall time: 2TThread 2STALLThread 1STALLExecution timeline5Thread ranking: Thread 1 &gt; Thread 2</p> <p>Thread 1 Assigned higher rankAssume all requests are to the same bank5Thread Ranking in Multiple-MC6MC 1T2T1T2MC 2T1T1T1MC 1T1T2T2MC 2T1T1T1UncoordinatedCoordinatedMC 1s shorter job: Thread 1Global shorter job: Thread 2</p> <p>MC 1 incorrectly assigns higher rank to Thread 1</p> <p>Global shorter job: Thread 2</p> <p>MC 1 correctly assigns higher rank to Thread 2 </p> <p>CoordinationCoordination Better scheduling decisionsThread 1Thread 2STALLSTALLAvg. stall time: 3TThread 1Thread 2STALLSTALLSAVED CYCLES!Avg. 
Coordination Limits Scalability
  • (Figure: MC 1 through MC 4 exchanging information MC-to-MC or through a Meta-MC)
  • Coordination consumes bandwidth
  • To be scalable, coordination should exchange little information and occur infrequently

The Problem and Our Goal
  • Problem: previous memory scheduling algorithms are not scalable to many controllers; they were not designed for multiple MCs and require significant coordination
  • Our goal: fundamentally redesign the memory scheduling algorithm so that it provides high system throughput and requires little or no coordination among MCs

Outline
  • Motivation
  • Rethinking Memory Scheduling: Minimizing Memory Episode Time
  • ATLAS: Least Attained Service Memory Scheduling, Thread Ranking, Request Prioritization Rules, Coordination
  • Evaluation
  • Conclusion

Rethinking Memory Scheduling
  • A thread alternates between two states (episodes):
  • Compute episode: zero outstanding memory requests, high IPC
  • Memory episode: non-zero outstanding memory requests, low IPC
  • (Figure: outstanding memory requests over time, alternating between memory and compute episodes)
  • Goal: minimize the time spent in memory episodes

How to Minimize Memory Episode Time
  • Prioritize the thread whose memory episode will end the soonest
  • This minimizes the time spent in memory episodes across all threads
  • Supported by queueing theory: Shortest-Remaining-Processing-Time (SRPT) scheduling is optimal in a single-server queue
  • But how do we know the remaining length of a memory episode?

Predicting Memory Episode Lengths
  • We discovered that the past is an excellent predictor of the future
  • Large attained service implies large expected remaining service
  • Q: Why? A: Memory episode lengths are Pareto distributed

Pareto Distribution of Memory Episode Lengths
  • (Figure: Pr{memory episode > x} versus x in cycles for SPEC benchmarks, e.g., 401.bzip2, following a Pareto distribution)
  • Attained service correlates with remaining service: the longer an episode has lasted, the longer it will last
  • Therefore, favoring the least-attained-service memory episode = favoring the memory episode that will end the soonest

Outline (revisited; next: ATLAS, Least Attained Service Memory Scheduling)

Least Attained Service (LAS) Memory Scheduling
  • Queueing theory: prioritize the job with the shortest remaining processing time (provably optimal)
  • Our approach: prioritize the memory episode with the least remaining service
  • Remaining service correlates with attained service, and attained service is tracked by a per-thread counter
  • Least-attained-service (LAS) scheduling: prioritize the memory episode with the least attained service, minimizing memory episode time
  • However, LAS does not consider long-term thread behavior

Long-Term Thread Behavior
  • (Figure: Thread 1 and Thread 2 alternating between memory and compute episodes, contrasting short-term and long-term thread behavior and the resulting relative priority)
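To make the per-thread counter idea concrete, here is a minimal software sketch of LAS ranking under assumed data structures. The class and method names are hypothetical; the slides' full ATLAS design additionally covers thread ranking, request prioritization rules, and coordination across controllers, none of which is modeled here.

    # Minimal sketch of least-attained-service (LAS) ranking (illustrative only).
    class LASScheduler:
        def __init__(self, num_threads):
            # Per-thread counter: total memory service each thread has attained so far.
            self.attained_service = [0] * num_threads

        def account_service(self, thread_id, cycles):
            # Charge 'cycles' of memory service to the thread that was just served.
            self.attained_service[thread_id] += cycles

        def pick_next(self, ready_threads):
            # Among threads with ready requests, favor the one with least attained service,
            # i.e., the one whose memory episode is predicted to end the soonest.
            return min(ready_threads, key=lambda t: self.attained_service[t])

    # Hypothetical usage: thread 1 has attained less service, so it is ranked higher.
    sched = LASScheduler(num_threads=2)
    sched.account_service(0, cycles=300)
    sched.account_service(1, cycles=120)
    print(sched.pick_next(ready_threads=[0, 1]))  # -> 1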
