Network On Chip Cache Coherency
Final report, part B
Students: Zemer Tzach Kalifon Ethan
Instructor: Walter Isaschar
Winter 2009
Agenda
General concepts.
Description of the coherency protocol.
Architecture design.
Components implementation.
Simulations.
Functionality demonstration.
General Concepts
General Background
Modern CPUs are based on CMP (Chip Multi-Processor) designs.
Improved performance is achieved by “Distribution and Parallelism”.
Cores communicate over a NoC (Network on Chip).
NoC General Diagram
NoC Characteristics
Wormhole packet routing.
Packets follow X-Y (dimension-order) routing.
Units can communicate simultaneously.
Reduced power consumption.
Scalability.
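To make the X-Y path concrete, here is a minimal Python sketch. It assumes routers are addressed by (x, y) grid coordinates and that a packet first travels along the X axis, then along the Y axis. The names `xy_route` and `hop_count` and the coordinate convention are illustrative assumptions, not taken from the project.

```python
def xy_route(src, dst):
    """Return the routers visited from src to dst, both included.

    Assumption: nodes are (x, y) tuples on a 2-D mesh and the packet
    moves along X until the column matches, then along Y.
    """
    x, y = src
    path = [(x, y)]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


def hop_count(src, dst):
    """Number of router-to-router hops on the X-Y path."""
    return len(xy_route(src, dst)) - 1


if __name__ == "__main__":
    print(xy_route((0, 0), (1, 1)))
```

Because the path depends only on the source and destination coordinates, every router can make its forwarding decision locally.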
Cache Coherency
Cache: fast, on-chip temporary storage.
Cache coherency: every CMP core uses only up-to-date data.
Traditionally, cache coherency is enforced by a central memory control unit.
Traditional Cache Coherency
[Figure labels: Line 1000 = X, Line 1000 = X, Line 1000 = Y]
Problem Description
Prior cache coherency protocols do not apply: a NoC has no central unit.
Adding such a unit would harm both the NoC's scalability and its parallelism.
Solution Requirements
High performance: avoid “hot spots” and “bottlenecks”.
Minimize resources.
Preserve the NoC's main characteristics (e.g. scalability).
Solution Basics
Memory control is distributed: each control unit is responsible for its own part of the memory space.
Control units are placed as nodes within the NoC.
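One way to picture the distribution of memory control is a function that maps each memory line to the control unit responsible for it. The sketch below is only an assumption: it interleaves lines across controllers by modulo, whereas the slides say only that control is split by memory space. The name `home_controller` is hypothetical.

```python
def home_controller(line_address, num_controllers):
    """Pick the control unit responsible for a memory line.

    Assumption: lines are interleaved across controllers by modulo.
    The project may partition the memory space differently.
    """
    if num_controllers <= 0:
        raise ValueError("need at least one controller")
    return line_address % num_controllers


if __name__ == "__main__":
    for line in (1000, 1001, 1002, 1003):
        print(line, "->", home_controller(line, 2))
```

With this kind of mapping, requests for neighbouring lines go to different controllers, which is one plausible way to avoid a single hot spot.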
Solution Diagram
Solution General Example
Read miss on line 1000.
The CPU refers to the appropriate controller.
The controller orders the transfer of the data.
Another CPU sends the cache line.
[Figure labels: Line 1000 = ?, Line 1000 = X, Line 1000 = X]
Project Goal
Design and implement a cache coherency protocol for a CMP-based NoC.
Implement the NoC (part one).
Implement cache coherency support for the NoC (part two).
Coherency Protocol
General Description
Three types of transactions: Read, Read for Ownership, and Invalidation.
A cache line's status can be I, S, or E (Invalid, Shared, or Exclusive, respectively).
Each cache control unit keeps a journal that determines each line's status.
Requests are first addressed to the appropriate cache control unit.
Protocol’s Terminology
Requester.
Home Node.
Closest Sharer.
Owner.
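The term "Closest Sharer" can be illustrated with a short sketch. It assumes that "closest" means the fewest hops from the requester on a 2-D mesh (Manhattan distance, which matches the hop count of an X-Y path) and that ties are broken by coordinate order. Both are assumptions for illustration; the slides do not define the metric.

```python
def manhattan(a, b):
    """Hop distance between two (x, y) mesh nodes."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def closest_sharer(requester, sharers):
    """Return the sharer with the fewest hops to the requester, or None.

    Assumption: distance is measured from the requester; sorting first
    makes the tie-break deterministic (lowest coordinates win).
    """
    candidates = sorted(sharers)
    if not candidates:
        return None
    return min(candidates, key=lambda node: manhattan(requester, node))


if __name__ == "__main__":
    print(closest_sharer((1, 1), {(0, 0), (1, 0)}))
```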
Read Miss: Line is Shared
(1) Read Request
(2) Forward Request
(3) Data
(4) ACK
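The four messages of this read miss can be written out as a small trace. The slide names only the messages and their order; the sender and receiver of each message below are inferred from the protocol's terminology (Requester, Home Node, Closest Sharer) and should be read as assumptions. The function name is hypothetical.

```python
def read_miss_shared_flow(requester, home, closest_sharer):
    """Messages for a read miss on a Shared line, in slide order.

    Assumption: the endpoints of each message are inferred, since the
    slide gives only the message names and their numbering.
    """
    return [
        (1, requester, home, "Read Request"),
        (2, home, closest_sharer, "Forward Request"),
        (3, closest_sharer, requester, "Data"),
        (4, requester, home, "ACK"),
    ]


if __name__ == "__main__":
    for step, sender, receiver, name in read_miss_shared_flow(
        "CPU1x1", "Home", "CPU0x0"
    ):
        print(f"({step}) {sender} -> {receiver}: {name}")
```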
Write Miss: Line is Shared
(1) Read for Ownership
(2) Forward and Invalidation Request
(3) Data
(4) ACK
(5) Invalidation
(6) Invalidation ACK
(7) Grant Ownership
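Since Grant Ownership is the last message in this sequence, one plausible reading is that the Home Node grants ownership only after it has received both the data ACK and every Invalidation ACK. The sketch below tracks that condition. The class name, its methods, and the rule itself are assumptions made for illustration, not the project's implementation.

```python
class OwnershipRequest:
    """Tracks one Read for Ownership at the home node (hypothetical)."""

    def __init__(self, requester, sharers_to_invalidate):
        self.requester = requester
        self.pending_invalidation_acks = set(sharers_to_invalidate)
        self.data_acked = False

    def on_data_ack(self):
        # Step (4): the data transfer has been acknowledged.
        self.data_acked = True

    def on_invalidation_ack(self, sharer):
        # Step (6): one sharer confirms it dropped its copy.
        self.pending_invalidation_acks.discard(sharer)

    def can_grant(self):
        # Step (7) is assumed to wait for all acknowledgements.
        return self.data_acked and not self.pending_invalidation_acks


if __name__ == "__main__":
    request = OwnershipRequest("CPU1x1", ["CPU0x0", "CPU0x1"])
    request.on_data_ack()
    request.on_invalidation_ack("CPU0x0")
    print(request.can_grant())
    request.on_invalidation_ack("CPU0x1")
    print(request.can_grant())
```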
Design difficulties (1st example)
(1) Read for Ownership
(2) Invalidation
(3) Invalidation ACK
(4) Forward and Invalidation Request
(5) Data
Design difficulties (2nd example)
(1) Read for Ownership
(2) Forward and Invalidation Request
(2) Invalidation
(3) Data
(4) ACK
Protocol’s Features
Parallel handling of Read requests.
Data is forwarded by the Closest Sharer.
Transparency: any CPU that uses M/E/S/I is supported.
The protocol supports strongly consistent processors.
Architecture
CMP Diagram
CPU Node Structure
NoC Interface
Functions as a gateway to the NoC.
Packs flits into, and unpacks them from, NoC packets.
Transmits and receives data simultaneously.
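Packing and unpacking can be sketched as splitting a payload into fixed-size flits and joining them back. This is a simplification under stated assumptions: the 2-byte flit size, the absence of header or tail flits, and the names `pack` and `unpack` are all illustrative and not taken from the project.

```python
def pack(payload: bytes, flit_size: int = 2) -> list:
    """Split a payload into flits of at most flit_size bytes.

    Assumption: a packet is simply the ordered list of its flits;
    a real NoC packet would also carry routing information.
    """
    if flit_size <= 0:
        raise ValueError("flit_size must be positive")
    return [payload[i:i + flit_size] for i in range(0, len(payload), flit_size)]


def unpack(flits) -> bytes:
    """Reassemble the payload from flits received in order."""
    return b"".join(flits)


if __name__ == "__main__":
    packet = pack(b"abcde")
    print(packet)
    print(unpack(packet))
```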
NoC Interface Structure
CPU Interface
Adapts between the NoC's cache coherency protocol and the CPU.
Translates NoC packets to and from FSB (front-side bus) transactions.
CPU transactions do not prevent the CPU Interface from handling the protocol's packets.
CPU Interface Structure
Controller Node Structure
Cache Coherency Controller
Manages the coherency protocol.
Each CCC (Cache Coherency Controller) is responsible for a specific set of memory lines.
The Directory Table (DT) holds the status of those lines, as well as several protocol information bits.
CCC Structure
DT General Structure
The DT contains the following data for each line:
[Table: DT fields per line; not recoverable from the extracted text.]
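A directory of per-line status can be sketched as follows. Only the I/S/E status comes from the slides; the `sharers` field, the method names, and the way status changes when a sharer or owner is recorded are hypothetical and stand in for the project's actual DT fields and transition rules.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    INVALID = "I"
    SHARED = "S"
    EXCLUSIVE = "E"


@dataclass
class DirectoryEntry:
    status: Status = Status.INVALID
    sharers: set = field(default_factory=set)  # hypothetical field


class DirectoryTable:
    """Per-line bookkeeping for the lines one controller is responsible for."""

    def __init__(self, lines):
        self.entries = {line: DirectoryEntry() for line in lines}

    def status_of(self, line):
        return self.entries[line].status

    def record_sharer(self, line, node):
        # Illustrative rule: any recorded sharer marks the line Shared.
        entry = self.entries[line]
        entry.sharers.add(node)
        entry.status = Status.SHARED

    def record_owner(self, line, node):
        # Illustrative rule: an owner is the only holder of the line.
        entry = self.entries[line]
        entry.sharers = {node}
        entry.status = Status.EXCLUSIVE


if __name__ == "__main__":
    table = DirectoryTable(lines=range(4))
    table.record_sharer(2, "CPU0x0")
    print(table.status_of(2).value, table.status_of(3).value)
```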
Architecture Features
Message length varies according to purpose, which reduces NoC congestion.
Messages carry the transaction information (reduces HW requirements).
A transaction can be blocked only by a memory update (allows high parallelism).
Scalable.
CMP Implementation
CMP Characteristics
The size of a memory unit is 1 [Byte].
A cache line comprises 2 memory units (can be enlarged).
The size of the memory is 16 [Byte].
The CPU's actions are determined by the user.
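These sizes imply a memory of 8 cache lines (16 bytes at 2 bytes per line). The sketch below works through that arithmetic and splits an address into a line index and an offset. It assumes byte addressing with lines aligned on line-size boundaries; the function names are illustrative.

```python
MEMORY_UNIT_BYTES = 1   # from the slide
UNITS_PER_LINE = 2      # from the slide (can be enlarged)
MEMORY_BYTES = 16       # from the slide


def num_lines():
    """Number of cache lines that fit in the memory."""
    return MEMORY_BYTES // (UNITS_PER_LINE * MEMORY_UNIT_BYTES)


def split_address(address):
    """Return (line index, offset within line) for a unit address.

    Assumption: addresses count memory units from 0 and lines are
    aligned on multiples of UNITS_PER_LINE.
    """
    if not 0 <= address < MEMORY_BYTES // MEMORY_UNIT_BYTES:
        raise ValueError("address out of range")
    return divmod(address, UNITS_PER_LINE)


if __name__ == "__main__":
    print(num_lines())
    print(split_address(5))
```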
CPU Implementation
CPU Node Implementation
CCC Node Implementation
CMP Implementation
Synthesis Parameters
System Performance
The system's clock frequency is 100 [MHz].
CPU hold-up (in cycles):

Event         Line's Status   CPU Delay   Total
Invalidation  S               0           9
Invalidation  E               19          28 (M)
Read Miss     I               29          38 (M)
Read Miss     S               29          38
Read Miss     E               49          58 (M)
System Performance
M: memory penalty.
C: dependent on the number of CPUs.
The delay in every node is one or two cycles.
In larger systems the network factor becomes greater.

Event       Line's Status   CPU Delay   Total
Write Miss  I               29          38 (M)
Write Miss  S               29          38 (C)
Write Miss  E               29          38 (C)
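At 100 MHz one clock cycle lasts 10 ns, so the cycle counts in the tables convert directly to time. The short sketch below does the conversion for the reported rows; the (M) and (C) annotations are left out of the numbers, and the function name is illustrative.

```python
CLOCK_HZ = 100_000_000  # 100 MHz, from the slide

# (event, line status, CPU delay in cycles, total in cycles), from the tables
ROWS = [
    ("Invalidation", "S", 0, 9),
    ("Invalidation", "E", 19, 28),
    ("Read Miss", "I", 29, 38),
    ("Read Miss", "S", 29, 38),
    ("Read Miss", "E", 49, 58),
    ("Write Miss", "I", 29, 38),
    ("Write Miss", "S", 29, 38),
    ("Write Miss", "E", 29, 38),
]


def cycles_to_ns(cycles):
    """Convert clock cycles to nanoseconds at CLOCK_HZ."""
    return cycles * 1_000_000_000 // CLOCK_HZ


if __name__ == "__main__":
    for event, status, delay, total in ROWS:
        print(f"{event:<12} {status}  {cycles_to_ns(delay):>4} ns  {cycles_to_ns(total):>4} ns")
```

In every reported row the total exceeds the CPU delay by 9 cycles, that is, 90 ns at this clock frequency.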
CMP Simulations
Read Miss: Line is Shared (1)
CPU1x1 reads a cache line. The appropriate line is stored in CPU0x0.
[Figure: callouts 1, 2]
Read Miss: Line is Shared (2)
[Figure: callouts 1 to 4]
Read Miss: Line is Shared (3)
[Figure: callouts 1, 2, 5, 6]
Read Miss: Line is Exclusive (1)
CPU1x1 reads for ownership. The appropriate line is stored in CPU0x0.
[Figure: callouts 1, 2]
Read Miss: Line is Exclusive (2)
[Figure: callouts 1 to 4]
Read Miss: Line is Exclusive (3)
[Figure: callouts 1, 2, 5]
Read Miss: Line is Exclusive (4)
[Figure: callouts 1, 2, 6, 7]
Demonstration
Demonstration Diagram
Tasks – Part A
Become familiar with the design tools.
Become familiar with the Virtex-II Pro FPGA (application & components).
Design and implement the NoC's router.
Assemble the NoC (2x2 grid) using our router implementation.
Tasks – Part B
Design a cache coherency protocol for the CMP, based on faculty research.
Assemble a CMP based on our NoC.
Implement the protocol as part of the assembled CMP.
Future Work
Memory should be distributed.
Improve NoC Interface latency.
Messages carry all of the transaction's information.
Strongly consistent processors.
Conclusions (1)
All architectural goals were achieved.
Minimal HW utilization makes for a practical solution.
The design is as efficient as the protocol definition allows.
Conclusions (2)
The generic design makes a great basis for further studies and research.
In larger systems, the project's advantages would be even more pronounced.