29
Multicore Processors Raul Queiroz Feitosa Parts of these slides are from the support material provided by W. Stallings

Introdução à Micro-Informática - PUC-Riomicro/SLIDES/Multicore.pdf · Intel Core Architecture Outline . ... 4 SMT cores, each supporting 4 ... Advanced Programmable Interrupt

  • Upload
    hadieu

  • View
    214

  • Download
    0

Embed Size (px)

Citation preview

Multicore Processors

Raul Queiroz Feitosa

Parts of these slides are from the support material provided by W. Stallings

Multicore Computers 2

Objective

“This chapter provides an overview of

multicore systems”.

Stallings

Objective

Multicore Computers 3

Outline

Hardware Performance Issues

Software Performance Issues

Multicore Organizations

Intel Core Architecture

Outline

Multicore Computers 4

Hardware performance issues

Chip Density

Microprocessors performance increase in due to

a) Improved organization, e.g.,

b) Increased clock frequency

both made possible by 1. increasing chip density!

Pipelining

Superscalar

Multithreading

By 2015 → 100 billion transistors on 300mm2 die.

Multicore Computers 5

Hardware performance issues

Relative Performance

Multicore Computers 6

Hardware performance issues

Relative Performance/cycle

Pollack’s rule: performance is roughly proportional

to square root of increase in complexity Double complexity gives 40% more performance

2. Diminishing gains with complexity increase!

Multicore Computers 7

Hardware performance issues

Power

3. Power requirements grow exponentially with chip density and clock frequency!

Multicore Computers 8

Hardware performance issues Increased Complexity

4. Memory transistors have a power density one order of magnitude lower than that of logic.

Multicore Computers 9

Outline

Hardware Performance Issues

Software Performance Issues

Multicore Organizations

Intel Core Architecture

Outline

Multicore Computers 10

Software Performance Issues

small amounts of serial code impact performance

According to Amdahl’s law

where f is the fraction of code infinitely parallelizable with no schedule

overhead.

N

ff

N

1

1

processors parallel on program execute totime

processor single aon program execute totimeSpeedup

Multicore Computers 11

Software Performance Issues

Small amounts of serial code impact performance due to

Communication, distribution of work and cache coherence overheads

percentage of

sequential code

Multicore Computers 12

Software Performance Issues

More recently software Engineers have developed applications that

effectively exploit multiprocessor architecture, e. g., database

applications.

5. New applications exploit multiprocessor architecture!

Multicore Computers 13

Outline

Hardware Performance Issues

Software Performance Issues

Multicore Organization

Intel Core Architecture

Outline

Multicore Computers 14

Multicore Organization

In view of: 1. Increasing chip density.

2. Diminishing gains with complexity increase,

3. Power requirements grow exponentially with chip density and clock frequency.

4. Memory transistors have a power density an order of magnitude lower than that of logic.

5. Applications, which exploit multiprocessor architecture.

what to do with extra transistors made available by the semiconductor industry?

Multicore Computers 15

Multicore Organization

What to do with extra transistors made

available by the semiconductor industry?

Reduce complexity, so that multiple complete processors

fit in a single chip

Reduce clock frequency and increase the proportion of

chip occupied by cache to reduce power requirements

Multicore Computers 16

Multicore Organization

Main variable in a multicore organization:

Number of core processors on chip

Number of levels of cache on chip

Amount of shared cache

Multicore Computers 17

Multicore Organization Alternatives

ARM11 MPCore AMD Opteron

Intel Core Duo Intel Core i7

Multicore Computers 18

Private × shared L2 Cache

Advantages of shared L2 Cache

Constructive interference reduces overall miss rate

Data shared by multiple cores not replicated at cache level

With proper frame replacement algorithms mean amount of shared cache dedicated to each core is dynamic

Threads with less locality can have more cache

Easy inter-process communication through shared memory

Cache coherency confined to L1

Advantages of private L2 Cache

Dedicated L2 cache gives each core more rapid access

Shared L3 cache may also improve performance

Multicore Computers 19

Outline

Hardware Performance Issues

Software Performance Issues

Multicore Organization

Intel Core Architecture

Outline

Multicore Computers 20

Intel Core Architecture

Intel Core Duo uses superscalar cores

Intel Core i7 uses simultaneous multi-

threading (SMT)

Scales up number of threads supported

4 SMT cores, each supporting 4 threads appears as 16

cores.

Multicore Computers 21

Intel x86 Core Duo Organization

Multicore Computers 22

Intel x86 Core Duo Organization

Introduced in 2006

Two x86 superscalar, shared L2 cache

Dedicated L1 cache per core implementing MESI protocol

Protocol extended to accommodate multiple chips (SMP)

Thermal control unit per core

Manages chip heat dissipation

Maximize performance within thermal constraints

If temperature of a core exceeds a threshold, clock rate reduced.

Advanced Programmable Interrupt Controlled (APIC)

Inter-process interrupts between cores

Routes I/O interrupts to appropriate core.

Each APIC includes a timer, so that OS can interrupt the local core.

Multicore Computers 23

Intel x86 Core Duo Organization

Power Management Logic

Monitors thermal conditions and CPU activity

Adjusts voltage and power consumption

Can switch individual logic subsystems

2MB shared L2 cache

Dynamic allocation

MESI support for L1 caches

Extended to support multiple Core Duo in SMP

L2 data shared between local cores or external

Bus interface

Multicore Computers 24

Intel Core i7 Organization

256 KB

L2 Cache

DDR3 Memory

Controllers

Core 1

32 KB I&D

L1 Caches

Up to 15 MB

L3 Cache

QuickPath

Interconnect

Core n

32 KB I&D

L1 Caches

256 KB

L2 Caches

256 KB

L2 Caches

● ● ●

unboxing

Multicore Computers 25

Intel Core i7 Organization

Introduced in November 2008, 2nd Generation in January 2011

Up to six x86 SMT processors

Dedicated L2, shared L3 cache

Speculative pre-fetch for caches

On chip DDR3 memory controller Three 8 byte channels (192 bits) giving 32GB/s

No front side bus

Turbo Boost

Clock frequency is incrementally adjusted on demand.

Hardware Virtualization A facility that allows multiple operating systems to simultaneously processor

resources in a safe and efficient manner

Multicore Computers 26

Intel Core i7 Organization

QuickPath Interconnection

Cache coherent point-to-point

link

High speed communications

between processor chips

6.4G transfers per second, 16 bits

per transfer

Dedicated bi-directional pairs

Total bandwidth 25.6GB/s

Intel QPI animated demo

Multicore Computers 27

Intel Multicore Processors

Cache Latency

Compare Intel Core Processors

General Information about Intel Processors

Multicore Computers 28

Text Book References

These topics are covered in

Stallings - chapter 18

Multicore Computers 29

Multicore Processors

END