
Tornado: Maximizing Locality and Concurrency in a Shared Memory Multiprocessor Operating System

Ben Gamsa, Orran Krieger, Jonathan Appavoo, Michael Stumm

By: Priya Limaye

Locality

• What is Locality of reference?

sum = 0;
for (int i = 0; i < 10; i++) {
    sum = sum + number[i];
}

• Temporal Locality – Recently accessed data and instructions are likely to be accessed in the near future.

• Spatial Locality – Data and instructions close to recently accessed data and instructions are likely to be accessed in the near future.

Locality

• What is Locality of reference?
– Recently accessed data and instructions, and nearby data and instructions, are likely to be accessed in the near future.
– Grab a larger chunk than you immediately need.
– Once you’ve grabbed a chunk, keep it.

Locality in multiprocessor

• Computation depends on data local to the processor.
– Each processor uses data from its own cache.
– Once data is brought into the cache, it stays there.

Locality in multiprocessor

[Diagram: two CPUs, each with its own cache, connected to a shared memory that holds a counter.]

Counter: Shared

[Diagram sequence: both CPUs load the shared counter (0) into their caches and increment it to 1. Reading a cached copy is fine, but the next increment to 2 invalidates the other CPU's cached copy, so the counter's cache line bounces between the processors on every update.]

Comparing counters

1. The shared counter scales well on older architectures.
2. It performs worse on a shared-memory multiprocessor.
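To make the shared-counter behaviour concrete, here is a minimal sketch (not from the paper) of a single counter incremented by two threads. It is correct, but every increment drags the counter's cache line to the incrementing CPU and invalidates the other CPU's copy.

#include <pthread.h>
#include <stdatomic.h>

/* Minimal sketch, not from the paper: one shared counter, two threads. */
atomic_long shared_counter;

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < 1000000; i++)
        atomic_fetch_add(&shared_counter, 1);   /* cache line ping-pongs here */
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return atomic_load(&shared_counter) == 2000000 ? 0 : 1;
}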

Counter: Array

• Sharing requires moving the counter back and forth between CPU caches.
• Split the counter into an array.
• Each CPU gets its own counter.
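A rough sketch of the array idea follows. The cpu_id() helper is an assumption (here built on Linux's sched_getcpu(); it is not part of the paper).

#define _GNU_SOURCE
#include <sched.h>

/* Sketch: one counter slot per CPU. Updates touch only the local slot;
 * reading the total means summing every slot.                          */
#define NCPUS 4

static int cpu_id(void) { return sched_getcpu() % NCPUS; }  /* assumed helper */

long counters[NCPUS];

void counter_inc(void)
{
    counters[cpu_id()]++;            /* no other CPU writes this slot */
}

long counter_read(void)
{
    long sum = 0;
    for (int i = 0; i < NCPUS; i++)  /* total = add all per-CPU counters */
        sum += counters[i];
    return sum;
}

Note that adjacent array elements still sit on the same cache line, which is exactly the false-sharing problem the following slides demonstrate.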

Counter: Array

[Diagram sequence: the array starts at (0, 0); each CPU increments only its own element, giving (1, 1); reading the counter means adding all the elements (1 + 1 = 2).]

Counter: Array

• This solves the sharing problem.
• What about performance?

Comparing counters

The array does not perform better than the shared counter.

Counter: Array

• This solves the sharing problem.
• What about performance?
• What about false sharing?

Counter: False Sharing

[Diagram sequence: both per-CPU counters (0, 0) sit on the same cache line, so the line is shared by both CPUs. When one CPU increments its element to (1, 0), the other CPU's copy is invalidated; when the second CPU increments to (1, 1), the first CPU's copy is invalidated in turn. The elements are logically independent, yet every update still bounces the cache line.]

Solution?

• Use a padded array.
• Different elements map to different cache lines.
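A sketch of the padded version, assuming 64-byte cache lines and the same hypothetical cpu_id() helper as before:

#define _GNU_SOURCE
#include <sched.h>

/* Sketch: each per-CPU counter is padded out to a full cache line (64 bytes
 * assumed), so no two CPUs' counters share a line and updates never
 * invalidate each other.                                                   */
#define NCPUS      4
#define CACHE_LINE 64

static int cpu_id(void) { return sched_getcpu() % NCPUS; }  /* assumed helper */

struct padded_counter {
    long value;
    char pad[CACHE_LINE - sizeof(long)];   /* push the next counter onto its own line */
};

struct padded_counter counters[NCPUS];

void counter_inc(void)
{
    counters[cpu_id()].value++;            /* stays entirely in the local cache */
}

long counter_read(void)
{
    long sum = 0;
    for (int i = 0; i < NCPUS; i++)
        sum += counters[i].value;          /* reads still add all the counters */
    return sum;
}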

Counter: Padded Array

[Diagram sequence: each padded counter occupies its own cache line; the CPUs increment their counters from 0 to 1 with no invalidations, so the updates are independent of each other.]

Comparing counters

The padded array performs better.

Locality in OS

• Poor locality has a serious performance impact.
• Locality is difficult to retrofit into an existing system.
• Tornado:
– Designed from the ground up
– Object-oriented approach
– Natural locality

Tornado

• Object-oriented approach
• Clustered objects
• Protected procedure calls
• Semi-automatic garbage collection
– Simplified locking protocol

Object Oriented Approach

[Diagram sequence: Process 1 and Process 2 both access the process table; each one locks only its own process-table entry, so the two requests proceed without contending on a shared lock.]

Object Oriented Approach

class ProcessTableEntry {
    data
    lock
    code
}
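A sketch of what this buys in practice (names and pthread locking are illustrative, not Tornado's actual code): the lock lives inside each entry, so operations on different processes never touch a common lock.

#include <pthread.h>

#define MAX_PROCESSES 256

/* Illustrative sketch, not Tornado code: each process-table entry carries
 * its own lock. (Locks assumed to be initialised with pthread_mutex_init
 * elsewhere.)                                                             */
struct process_table_entry {
    pthread_mutex_t lock;   /* protects only this entry */
    int             pid;
    /* ... per-process data ... */
};

static struct process_table_entry table[MAX_PROCESSES];

void update_process(int pid)
{
    struct process_table_entry *e = &table[pid % MAX_PROCESSES];
    pthread_mutex_lock(&e->lock);     /* per-entry lock, not a table-wide lock */
    /* ... modify this process's state ... */
    pthread_mutex_unlock(&e->lock);
}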

Object Oriented Approach

• Each resource is represented by a different object.
• Requests to virtual resources are handled independently.
– No shared data structure accesses
– No shared locks

Object Oriented Approach

[Diagram sequence: handling a page-fault exception. The faulting Process searches for the responsible Region; the Region forwards the request to its FCM; the FCM obtains the page via its COR and the DRAM memory manager; the HAT manages the hardware mappings.

HAT = Hardware Address Translation, FCM = File Cache Manager, COR = Cached Object Representative, DRAM = memory manager.]

Object Oriented Approach

• Multiple implementations for system objects.
• The objects used for a resource can be changed dynamically.
• Provides the foundation for other Tornado features.

Clustered Objects

• Improve locality for widely shared objects.
• Appears to clients as a single object.
– Composed of multiple component objects
• Has a representative ‘rep’ for processors.
– The degree of clustering defines how many processors share a rep.
• Clients all use a common clustered object reference.


Clustered Objects : Implementation

• A translation table per processor
– Located at the same virtual address on every processor
– Each entry points to the local rep
• A clustered object reference is just a pointer into the table.
• ‘reps’ are created on demand when first accessed.
– A special global miss-handling object creates them.
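A simplified sketch of the lookup path (assumptions: an index-based reference and a hypothetical miss handler; in Tornado the table sits at a fixed virtual address that maps to per-processor memory):

/* Simplified sketch, not Tornado code. Every processor has its own copy of
 * this table at the same virtual address, so the same clustered-object
 * reference resolves to that processor's rep.                              */
#define MAX_CLUSTERED_OBJECTS 1024

struct rep;                                   /* per-processor representative */
typedef unsigned int co_ref;                  /* clustered-object reference    */

struct rep *co_miss_handler(co_ref ref);      /* assumed: global miss handler
                                                 that creates a rep on demand  */

static struct rep *translation_table[MAX_CLUSTERED_OBJECTS];  /* one copy per CPU */

struct rep *co_resolve(co_ref ref)
{
    struct rep *r = translation_table[ref];
    if (r == NULL)                            /* first access on this processor */
        r = translation_table[ref] = co_miss_handler(ref);
    return r;
}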

Counter: Clustered Object

[Diagram sequence: the counter is a clustered object with one rep per CPU, all reached through a common object reference; each CPU increments only its own rep, so the updates are independent of each other.]

Clustered Objects

• Degree of clustering
• Multiple reps per object
– How is consistency maintained?
• Coordination between reps
– Shared memory
– Remote PPCs

Counter: Clustered Object

[Diagram sequence: each CPU's rep holds a count of 1; when another CPU reads the counter, the result is obtained by adding all the reps' counters (1 + 1 = 2).]

Clustered Objects : Benefits

• Facilitates optimizations commonly applied on multiprocessors, e.g. replication and partitioning of data structures.
• Preserves the object-oriented design.
• Enables incremental optimization.
• A clustered object can have several different implementations.

Synchronization

• Two kinds of locking issues:
– Locking
– Existence guarantees

Synchronization: Locking

• Locking is encapsulated within individual objects.
• Clustered objects limit contention.
• Spin-then-block locks are used.
– Highly efficient
– Reduce the cost of a lock/unlock pair
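A rough sketch of the spin-then-block idea (assumptions: C11 atomics, and sched_yield() standing in for a real block/wakeup path; Tornado's locks integrate with its own scheduler):

#include <sched.h>
#include <stdatomic.h>

/* Rough sketch, not Tornado's lock: spin briefly in the hope that the holder
 * releases soon (fast, stays local), and give up the processor otherwise.   */
typedef struct { atomic_int held; } stb_lock;

void stb_acquire(stb_lock *l)
{
    for (;;) {
        for (int spins = 0; spins < 1000; spins++) {
            int expected = 0;
            if (atomic_load(&l->held) == 0 &&
                atomic_compare_exchange_weak(&l->held, &expected, 1))
                return;                 /* got the lock while spinning */
        }
        sched_yield();                  /* "block": stop burning the CPU */
    }
}

void stb_release(stb_lock *l)
{
    atomic_store(&l->held, 0);
}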

Synchronization: Existence guarantees

• Traditionally, all references to an object are protected by a lock.
– Eliminates races where one thread is accessing the object while another is deallocating it
– Requires a complex global hierarchy of locks
• Tornado instead uses semi-automatic garbage collection.
– A clustered object reference can be used at any time.
– Eliminates the need for existence locks.

Garbage Collection

• Distinguish between temporary references and persistent references.
– Temporary: clustered object references held privately by a thread
– Persistent: stored in shared memory; can persist beyond the lifetime of a thread

Garbage Collection

• Remove all persistent references.
– Normal cleanup
• Remove all temporary references (see the sketch after the diagram walk-through below).
– The kernel is event-driven.
– A counter of in-progress operations is maintained for each processor.
– The object can be deleted once every processor's counter is zero.
• Destroy the object itself.

Garbage Collection

[Diagram walk-through: a list holds the elements 2, 5, 9. Process 1 reads element 5, incrementing its processor's counter to 1. Process 2 then asks to delete the element, but the garbage collector only frees it if the counter is 0, so the delete is deferred. When Process 1 finishes, the counter drops back to 0 and the garbage collector removes the element, leaving 2 and 9.]
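A conceptual sketch of the temporary-reference scheme shown above (simplified and hypothetical: Tornado ties the counts to its event-driven kernel rather than to explicit calls like these):

/* Conceptual sketch, not Tornado code: a per-processor count of in-flight
 * temporary references. Deletion is deferred until every processor's count
 * has drained to zero.                                                     */
#define NCPUS 4

int cpu_id(void);                              /* assumed: this thread's processor id */

struct gc_object {
    int in_use[NCPUS];                         /* temporary references per processor  */
};

void destroy_for_real(struct gc_object *o);    /* assumed: final teardown              */

void ref_enter(struct gc_object *o) { o->in_use[cpu_id()]++; }  /* before an access */
void ref_exit(struct gc_object *o)  { o->in_use[cpu_id()]--; }  /* after the access  */

/* Called once all persistent references have been removed. */
int gc_try_destroy(struct gc_object *o)
{
    for (int i = 0; i < NCPUS; i++)
        if (o->in_use[i] != 0)
            return 0;                          /* still in use somewhere: retry later */
    destroy_for_real(o);
    return 1;
}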

Interprocess communication

• Uses Protected Procedure Calls (PPCs).
• A PPC is a call from a client object to a server object.
– A clustered object call that crosses from the client's protection domain into the server's
• Advantages:
– Client requests are serviced on the local processor.
– Client and server share the processor, similar to handoff scheduling.
– Each client request has exactly one thread in the server.

PPC: Implementation

• Server worker threads are created on demand.
• A list of worker threads is maintained.
• A PPC is implemented as a trap plus some queue manipulation:
– Dequeue a worker thread from the ready workers.
– Enqueue the caller thread on the worker.
– Return-from-trap into the server.
• Registers are used to pass parameters.
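A heavily simplified sketch of the queue manipulation behind a PPC (all names are hypothetical; the trap entry/exit and register-based parameter passing are only hinted at in the comments):

/* Heavily simplified, hypothetical sketch of a PPC dispatch. */
struct thread {
    struct thread *next;
    struct thread *blocked_caller;   /* the client waiting for this worker   */
};

struct ppc_endpoint {
    struct thread *ready_workers;    /* idle server worker threads (per CPU) */
};

struct thread *create_worker(struct ppc_endpoint *ep);  /* assumed: on-demand creation */

/* Runs in the trap handler on the caller's processor. The thread returned is
 * the one resumed by the return-from-trap, now executing in the server; the
 * request's parameters stay in the caller's registers.                       */
struct thread *ppc_call(struct ppc_endpoint *ep, struct thread *caller)
{
    struct thread *worker = ep->ready_workers;
    if (worker != NULL)
        ep->ready_workers = worker->next;    /* dequeue an idle worker           */
    else
        worker = create_worker(ep);          /* none idle: create one on demand  */
    worker->blocked_caller = caller;         /* enqueue the caller on the worker */
    return worker;
}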

Performance

Performance: summary

• Strong basic design
• Highly scalable
• Locality and locking overhead are the major sources of slowdown.

Conclusion

• The object-oriented approach and clustered objects exploit locality and concurrency.
• The OO design has some overhead, but it is low compared to the performance advantages.
• Tornado scales extremely well and achieves high performance on shared-memory multiprocessors.

References

• http://web.cecs.pdx.edu/~walpole/class/cs510/papers/05.pdf

• Presentation by Holly Grimes, CS 533, Winter 2008

• http://en.wikipedia.org/wiki/Locality_of_reference
