Fast Switching of Threads Between Cores
Richard Strong & Dean Tullsen (University of California, San Diego)
Jayaram Mudigonda, Jeffrey C. Mogul & Nathan Binkert (HP Labs)
Ruhaim Izmeth | MS14901218
Nipuna Pannala | MS14902208

Fast switching of threads between cores - Advanced Operating Systems

"Fast Switching of Threads Between Cores" is a published research paper on operating systems. This is our attempt to decode the research and present it to the class.

Introduction

● We are now in the multicore era.

● Multicore CPUs enable inter-core communication at a cost orders of magnitude lower than in traditional multiprocessors. This reduces the time the hardware needs to move a migrating thread's working set.

● But the software cost of moving a thread remains high.

Asymmetric Multicore Processor

● Core-to-core performance asymmetry appears to be a very useful way to improve energy and area efficiency.

● It has relatively little performance cost but gives greater throughput per watt.

● Asymmetric multicore processors increase the need for frequent and efficient migration of threads between cores.

Fast Switching of Threads between Cores

● To get good performance when switching threads between cores:

○ The OS scheduler needs to migrate a thread from a slow core to a fast or ideal core.

○ It is also necessary to balance the load between cores (in a symmetric or asymmetric system).

○ All thread execution time segments should be relatively short.

Simple Cores…

● Simple cores can normally be a better match for memory-bound application code.

○ Operating systems and OS-like code are typically memory-bound.

Thread Migration Techniques

● Migration Mechanism 1: Constantinou

○ This work considered a variety of costs associated with thread migration, but its primary focus was the cost of warming up caches and branch predictors.

○ It does not address the software cost of migrating threads between cores.

Thread Migration Techniques

● Migration Mechanism 2: Choi

○ This work covers the specific case of migrating the branch predictor state when a thread switches cores.

○ It does not address the software overhead.

Thread Migration Techniques

Shared-Thread Multiprocessor: Brown & Tullsen

● Hardware manages thread movement.

● Thread state is represented in hardware and is shared among all the cores on a chip.

● Therefore hardware can move threads between cores without direct OS involvement.

Software Approaches to Core Switching

• Is core B idle?
• Is there any thread to run on core A after T switches to B?
• Can we ensure that T is the most appropriate thread to run on B?

Transfer the architectural state of thread T from A to B.
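The three checks and the final state transfer can be sketched as a small model. This is only an illustration, not kernel code from the paper: the `Thread` and `Core` classes, the `try_switch` function, and the `priority` field are hypothetical names introduced here, and "most appropriate" is assumed to mean that no thread queued on B has a higher priority than T.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Thread:
    name: str
    priority: int = 0                          # hypothetical: higher means more urgent
    state: dict = field(default_factory=dict)  # stand-in for registers, PC, etc.


@dataclass
class Core:
    name: str
    running: Optional[Thread] = None
    run_queue: list = field(default_factory=list)

    def is_idle(self) -> bool:
        return self.running is None


def try_switch(t: Thread, a: Core, b: Core) -> bool:
    """Move thread t (currently running on a) to core b if the checks pass."""
    # Is core B idle?
    if not b.is_idle():
        return False
    # Is T the most appropriate thread for B?
    # (Assumption: no thread already queued on B outranks T.)
    if any(other.priority > t.priority for other in b.run_queue):
        return False
    # What runs on A once T has left? Take the next queued thread, if any.
    a.running = a.run_queue.pop(0) if a.run_queue else None
    # Transfer T's architectural state: here the Thread object carries it.
    b.running = t
    return True
```

In this model a switch to a busy core is simply refused; a real scheduler would have more options, such as queueing T on B.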

Approaches used in the research

● V1: Linux's thread-migration mechanism
● V2: Modified scheduler
● V3: Scheduler fast-paths
● V4: Addressing IPI costs
● V5: Cross-core wakeup from quiesce

V1: Linux Thread Migration Mechanism

● Normally used for relatively long-term load balancing across cores.

● The Linux thread migration mechanism is the state of the art in core switching.

● A dedicated thread is available to initiate the migration.

V1: Linux Thread Migration Mechanism

● When a task wants to migrate, it puts itself on the per-core migration queue.

● If the target core is idle, the thread wakes up from the per-core migration queue and moves to the run queue of the target core.

● After getting approval from the target queue, the thread executes on the target core.
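The three steps can be modelled with a pair of queues per core. The sketch below is a simplified illustration, not the Linux implementation: the `Core` class and the functions `request_migration`, `service_migrations`, and `schedule` are hypothetical names, and threads are represented by plain strings.

```python
from collections import deque


class Core:
    def __init__(self, cpu_id):
        self.cpu_id = cpu_id
        self.run_queue = deque()        # threads ready to run on this core
        self.migration_queue = deque()  # pending (thread, target core) requests
        self.running = None             # thread currently executing, if any

    def is_idle(self):
        return self.running is None and not self.run_queue


def request_migration(thread, source, target):
    """Step 1: the task parks itself on the source core's migration queue."""
    if source.running == thread:
        source.running = None
    source.migration_queue.append((thread, target))


def service_migrations(source):
    """Step 2: move each queued thread to its target's run queue if that core is idle."""
    still_waiting = deque()
    while source.migration_queue:
        thread, target = source.migration_queue.popleft()
        if target.is_idle():
            target.run_queue.append(thread)
        else:
            still_waiting.append((thread, target))
    source.migration_queue = still_waiting


def schedule(core):
    """Step 3: the target core picks the next thread from its run queue."""
    if core.running is None and core.run_queue:
        core.running = core.run_queue.popleft()
    return core.running
```

A request for a busy target stays on the migration queue until a later call to `service_migrations` finds the target idle.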

V1: Linux Thread Migration Mechanism - Cons

● This migration approach involves an extra context switch between the initiating thread and the migrating thread.

Linux Thread Migration Mechanism - Increasing Efficiency

● To remove the extra context switch:

○ Let threads make migration decisions themselves

○ Centralize the thread status

○ Increase the number of per-core queues

○ Create cross-core signals

V2: Modified scheduler

[Figure: V2 core switch from Core 0 to Core 1. Labels: Run Queue (one per core), Alternative Queue (AQ), threads T and N, control block of T (Core: 1), SwitchCore(), schedule(), interrupt; steps numbered 1 to 7.]

● Removes the extra context switch described in V1.

● Thread migration is initiated by the process itself.
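One way to read the V2 diagram is that the migrating thread updates its own control block and calls the scheduler, which hands it to the target core through the Alternative Queue. The sketch below models that reading only; the flow is inferred from the diagram labels, and the `Core` and `Thread` classes and the `switch_core` and `schedule` functions are hypothetical stand-ins, not the authors' code.

```python
from collections import deque


class Core:
    def __init__(self, cpu_id):
        self.cpu_id = cpu_id
        self.run_queue = deque()
        self.alt_queue = deque()   # Alternative Queue (AQ): threads arriving from other cores
        self.running = None


class Thread:
    def __init__(self, name, core_id):
        self.name = name
        self.core_id = core_id     # "Core" field of the thread's control block


def schedule(core, cores):
    """Simplified schedule(): hand off a departing thread, absorb the AQ, pick the next thread."""
    current = core.running
    if current is not None and current.core_id != core.cpu_id:
        # The running thread asked to leave: place it on the target's AQ.
        cores[current.core_id].alt_queue.append(current)
        core.running = None
    # Absorb threads that other cores handed to this core.
    while core.alt_queue:
        core.run_queue.append(core.alt_queue.popleft())
    if core.running is None and core.run_queue:
        core.running = core.run_queue.popleft()
    return core.running


def switch_core(thread, target_id, cores):
    """The thread itself starts the migration; no helper thread is involved."""
    source = cores[thread.core_id]
    thread.core_id = target_id          # update the control block
    schedule(source, cores)             # the source core moves on to its next thread
    schedule(cores[target_id], cores)   # stands in for the cross-core interrupt
```

Because the migrating thread calls `schedule` directly, the model has no separate migration thread to switch to first, which is the extra context switch V1 pays for.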

V3: Scheduler fast-paths

● The original modified schedule()
● A fast schedule source version (FSS), called to initiate a core switch
● A fast schedule target version (FST), called at the target core in response to the cross-core signal

FSS and FST omit a number of housekeeping functions normally done in schedule() (e.g. priority calculation).

FSS only passes a hint to FST, so no locking takes place.

FST checks the AQ; FSS does not.

V4: Addressing inter-processor interrupt (IPI) costs

Inter-processor interrupts are sent to wake up polling or paused processors.

The modified scheduler wakes up the target core if it is idle.

The IPI-sending code, which sends interrupts to all members of a specified set, was modified to be more efficient.

schedule() is invoked on the target core by the interrupt.

Modified System Calls

Long-running system calls were modified to initiate CoreSwitch().

Modified system calls: open, stat, read, write, readv, writev, select, poll, fsync, fdatasync, recvfrom, sendto and sendfile.

4096 bytes

Simulation Environment

The M5 simulator was used to generate detailed timelines showing when interesting events such as procedure calls, cache misses, and long-latency instructions occur.

The x86 models in M5 are not debugged.

● Complex core: Alpha EV6 (21264), 64 KB L1
● Simple core: EV4-based (21064), 8 KB L1
● Shared L2: 3.5 MB
● Main-memory access time: 25 ns

Simulation Environment - Configuration naming scheme

sim_XXX: the number of X characters denotes the number of processors

e.g. sim_c: single processor

     sim_sC: dual processor

Core type   750 MHz   3 GHz
Complex     c         C
Simple      s         S

Tests were run on the Linux 2.6.18 kernel.

Only one trial was run per experiment, as the simulator is deterministic.
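The naming scheme can be decoded mechanically. The helper below is a small sketch that assumes the table reads as lowercase letters for 750 MHz cores and uppercase letters for 3 GHz cores; `CORE_CODES` and `decode_config` are names introduced here for illustration.

```python
# Letter -> (core type, clock in MHz), following the naming table.
CORE_CODES = {
    "c": ("complex", 750),
    "C": ("complex", 3000),
    "s": ("simple", 750),
    "S": ("simple", 3000),
}


def decode_config(name):
    """Return one (core type, MHz) pair per processor in a sim_XXX name."""
    prefix = "sim_"
    if not name.startswith(prefix):
        raise ValueError(f"expected a name starting with {prefix!r}: {name!r}")
    letters = name[len(prefix):]
    if not letters:
        raise ValueError("a configuration must name at least one core")
    try:
        return [CORE_CODES[ch] for ch in letters]
    except KeyError as err:
        raise ValueError(f"unknown core code {err.args[0]!r} in {name!r}") from None
```

Under that reading, `decode_config("sim_sC")` describes a dual-processor system with one simple 750 MHz core and one complex 3 GHz core.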

Microbenchmark results

gettid() was modified to call coreswitch() and was then run N = 1,000,000 times in a tight loop.
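The shape of such a microbenchmark is a timed tight loop. The sketch below is a user-space illustration only: it uses the standard `threading.get_native_id()` as a stand-in for the thread-ID call, performs no core switch, and its timings include Python loop overhead. `time_calls` is a name introduced here.

```python
import threading
import time


def time_calls(call, n):
    """Run `call` n times in a tight loop and return the mean cost in nanoseconds."""
    start = time.perf_counter_ns()
    for _ in range(n):
        call()
    elapsed = time.perf_counter_ns() - start
    return elapsed / n


if __name__ == "__main__":
    N = 1_000_000  # same iteration count as the slide
    mean_ns = time_calls(threading.get_native_id, N)
    print(f"mean cost per call: {mean_ns:.1f} ns")
```

Averaging over a large N amortizes the timer overhead, which is why a single call is not timed on its own.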

Cross-core wakeup from quiesce

● Idle-loop polling is inefficient.

● Initiating a cross-CPU interrupt is slow, as a powered-down CPU needs to be awakened.

● The kernel should dynamically decide between polling and powering down based on recent history.
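A history-based decision of this kind could look like the sketch below. It is a hypothetical policy, not one taken from the paper: the `IdlePolicy` class, the averaging rule, and the 50 microsecond break-even value are all assumptions made for illustration.

```python
from collections import deque


class IdlePolicy:
    """Choose between polling and powering down from recent idle-period lengths."""

    def __init__(self, break_even_us=50.0, history=8):
        # break_even_us is a made-up threshold: idle periods shorter than this
        # are assumed to be cheaper to poll through than to sleep through.
        self.break_even_us = break_even_us
        self.recent = deque(maxlen=history)

    def record_idle(self, duration_us):
        """Remember how long the core actually stayed idle last time."""
        self.recent.append(duration_us)

    def decide(self):
        """Return "poll" or "power_down" for the idle period that is starting."""
        if not self.recent:
            return "poll"  # no history yet: stay responsive
        average = sum(self.recent) / len(self.recent)
        return "poll" if average < self.break_even_us else "power_down"
```

The bounded history lets the policy adapt: once the recent idle periods grow long, old short ones fall out of the window and the core starts powering down.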

Macrobenchmark results - Web Benchmark

Macrobenchmark results - Database Benchmark

Uses the “TPC-B-like” example from the Berkeley DB distribution.

Core switches are done only on fdatasync().

Eliminated disk I/O delays by using a RAM disk on the real hardware, and by setting the access time to zero in M5’s disk simulator.

Future Work

● Energy measurement/savings benchmarks for the above tests

● Determining the best core to switch to and the best time to switch

● Optimal mechanism to poll or power down a Processor

Summary

● The cost of core switching matters more when using asymmetric multicores.

● Core switching to slower OS cores on frequent, expensive system calls sometimes reduces performance.

○ But it also makes it possible to power down the complex application cores.

References

● J. Aas. Understanding the Linux 2.6.8.1 CPU Scheduler. http://josh.trancesoftware.com/linux/, Feb. 2005.

● S. Balakrishnan, R. Rajwar, M. Upton, and K. Lai. The Impact of Performance Asymmetry in Emerging Multicore Architectures. In Proc. ISCA, pages 506–517, 2005.

● M. Becchi and P. Crowley. Dynamic Thread Assignment on Heterogeneous Multiprocessor Architectures. J. Instruction Level Parallelism, pages 1–26, June 2008.

● N. L. Binkert, R. G. Dreslinski, L. R. Hsu, K. T. Lim, A. G. Saidi, and S. K. Reinhardt. The M5 Simulator: Modeling Networked Systems. IEEE Micro, 26(4):52–60, 2006.

● D. Brooks, V. Tiwari, and M. Martonosi. Wattch: a framework for architectural-level power analysis and optimizations. In Proc. ISCA, pages 83–94, Jun. 2000.

Q / A

Thank You