36
CS243 Stanford 3/16/06 Parallel Software 2.0 Parallel Software 2.0 Wei Li Wei Li Senior Principal Engineer Senior Principal Engineer Intel Corporation Intel Corporation

Senior Principal Engineer Intel Corporationinfolab.stanford.edu/~ullman/dragon/w06/lectures/cs243-lec16-wei-2.pdf · Senior Principal Engineer Intel Corporation. 2 Agenda Major Technological

  • Upload
    others

  • View
    4

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Senior Principal Engineer Intel Corporationinfolab.stanford.edu/~ullman/dragon/w06/lectures/cs243-lec16-wei-2.pdf · Senior Principal Engineer Intel Corporation. 2 Agenda Major Technological

CS243 Stanford 3/16/06

Parallel Software 2.0Parallel Software 2.0

Wei LiWei LiSenior Principal EngineerSenior Principal Engineer

Intel CorporationIntel Corporation

Page 2: Senior Principal Engineer Intel Corporationinfolab.stanford.edu/~ullman/dragon/w06/lectures/cs243-lec16-wei-2.pdf · Senior Principal Engineer Intel Corporation. 2 Agenda Major Technological

2

AgendaAgenda

Major Technological ChangeMajor Technological Change

Software ResponseSoftware Response

Parallel Software 2.0Parallel Software 2.0

Page 3: Senior Principal Engineer Intel Corporationinfolab.stanford.edu/~ullman/dragon/w06/lectures/cs243-lec16-wei-2.pdf · Senior Principal Engineer Intel Corporation. 2 Agenda Major Technological

3

QuizQuiz

MooreMoore’’s law states which of the s law states which of the following roughly doubles following roughly doubles every 2 years?every 2 years?1.1.FrequencyFrequency

2.2.PerformancePerformance

3.3.TransistorsTransistors

4.4.Transistor DensityTransistor Density

Page 4: Senior Principal Engineer Intel Corporationinfolab.stanford.edu/~ullman/dragon/w06/lectures/cs243-lec16-wei-2.pdf · Senior Principal Engineer Intel Corporation. 2 Agenda Major Technological

4

from www.intel.comwww.intel.com

Page 5: Senior Principal Engineer Intel Corporationinfolab.stanford.edu/~ullman/dragon/w06/lectures/cs243-lec16-wei-2.pdf · Senior Principal Engineer Intel Corporation. 2 Agenda Major Technological

5

Historical Driving ForcesHistorical Driving Forces

0.01

0.1

1

10

1970 1980 1990 2000 2010 20200.01

0.1

1

10

1970 1980 1990 2000 2010 20201

10

100

1000

10000

100000

1970 1980 1990 2000 2010 20201

10

100

1000

10000

100000

1970 1980 1990 2000 2010 2020

Increased PerformanceIncreased Performancevia Increased Frequencyvia Increased Frequency

FeatureFeatureSizeSize(um)(um)

FrequencyFrequency(MHz)(MHz)

2005200565nm65nm

1B+ Transistors1B+ Transistors

1946194620 Numbers20 Numbers

in Main Memoryin Main Memory

19711971I4004 ProcessorI4004 Processor2300 Transistors2300 Transistors

Shrinking GeometryShrinking Geometry

Page 6: Senior Principal Engineer Intel Corporationinfolab.stanford.edu/~ullman/dragon/w06/lectures/cs243-lec16-wei-2.pdf · Senior Principal Engineer Intel Corporation. 2 Agenda Major Technological

6

The ChallengesThe Challenges

10

100

1000

1990 1995 2000 2005 2010 2015

CPUCPUPowerPower(W)(W)

30nm

45nm65nm

90nm

0.13um

0.18um0.25um

0.35um

0.5um

0.7um

0.1

1

10

1990 1993 1997 2001 2005 2009

~30%

30nm

45nm65nm

90nm

0.13um

0.18um0.25um

0.35um

0.5um

0.7um

0.1

1

10

1990 1993 1997 2001 2005 2009

~30%SupplySupplyVoltage Voltage

(V)(V)

Power = Capacitance x VoltagePower = Capacitance x Voltage22 x Frequencyx Frequencyalsoalso

Power ~ VoltagePower ~ Voltage33

Power LimitationsPower Limitations Diminishing Voltage ScalingDiminishing Voltage Scaling

Page 7: Senior Principal Engineer Intel Corporationinfolab.stanford.edu/~ullman/dragon/w06/lectures/cs243-lec16-wei-2.pdf · Senior Principal Engineer Intel Corporation. 2 Agenda Major Technological

Energy:Energy:The Next FrontierThe Next Frontier

Page 8: Senior Principal Engineer Intel Corporationinfolab.stanford.edu/~ullman/dragon/w06/lectures/cs243-lec16-wei-2.pdf · Senior Principal Engineer Intel Corporation. 2 Agenda Major Technological

8

17,066 Flops/Watt17,066 Flops/Watt467 Flops/Dollar467 Flops/Dollar

Energy Efficient Performance Energy Efficient Performance ––High EndHigh End

ASC PurpleASC Purple6 MWatt6 MWatt100 TFlops goal100 TFlops goal12K+ cpus 12K+ cpus –– Power5Power5

$230M$230M

DATACENTERDATACENTER““ENERGY LABELENERGY LABEL””

Source: LLNLSource: LLNL

Source: NASASource: NASA

NASA ColumbiaNASA Columbia2 MWatt2 MWatt60 TFlops goal60 TFlops goal10,240 cpus 10,240 cpus –– Itanium IIItanium II

$50M$50M

30,720 Flops/Watt30,720 Flops/Watt1,288 Flops/Dollar1,288 Flops/Dollar

ComputationalComputationalEfficiencyEfficiency

Page 9: Senior Principal Engineer Intel Corporationinfolab.stanford.edu/~ullman/dragon/w06/lectures/cs243-lec16-wei-2.pdf · Senior Principal Engineer Intel Corporation. 2 Agenda Major Technological

The Classic TradeoffThe Classic Tradeoff

Page 10: Senior Principal Engineer Intel Corporationinfolab.stanford.edu/~ullman/dragon/w06/lectures/cs243-lec16-wei-2.pdf · Senior Principal Engineer Intel Corporation. 2 Agenda Major Technological

The Real The Real ChallengeChallenge

EnergyEnergy--EfficiencyEfficiencyPerformancePerformance

CapabilitiesCapabilities

Page 11: Senior Principal Engineer Intel Corporationinfolab.stanford.edu/~ullman/dragon/w06/lectures/cs243-lec16-wei-2.pdf · Senior Principal Engineer Intel Corporation. 2 Agenda Major Technological

11

Reducing Power with Voltage Reducing Power with Voltage ScalingScaling

Power = Capacitance * VoltagePower = Capacitance * Voltage22 * Frequency* Frequency

Frequency ~ Voltage in region of interestFrequency ~ Voltage in region of interest

Power ~ VoltagePower ~ Voltage33

10% reduction of voltage yields10% reduction of voltage yields

–– 10% reduction in frequency10% reduction in frequency

–– 30% reduction in power30% reduction in power

–– Less than 10% reduction in performanceLess than 10% reduction in performance

0.66%0.66%3%3%1%1%1%1%

PerformancePerformancePower Power FrequencyFrequencyVoltage Voltage

Rule of ThumbRule of Thumb

Page 12: Senior Principal Engineer Intel Corporationinfolab.stanford.edu/~ullman/dragon/w06/lectures/cs243-lec16-wei-2.pdf · Senior Principal Engineer Intel Corporation. 2 Agenda Major Technological

12

Dual Core example of Dual Core example of Voltage ScalingVoltage Scaling

0.66%0.66%3%3%1%1%1%1%

PerformancePerformancePower Power FrequencyFrequencyVoltage Voltage

cachecache

CoreCore CoreCore CoreCore

cachecache

Voltage = 1Voltage = 1

Freq = 1Freq = 1

Area = 1Area = 1

Power = 1Power = 1

PerfPerf = 1= 1

Voltage = Voltage = -- 15%15%

Freq = Freq = -- 15%15%

Area = 2Area = 2

Power = 1Power = 1

PerfPerf = ~1.8= ~1.8

Page 13: Senior Principal Engineer Intel Corporationinfolab.stanford.edu/~ullman/dragon/w06/lectures/cs243-lec16-wei-2.pdf · Senior Principal Engineer Intel Corporation. 2 Agenda Major Technological

13

Multiple cores deliver more Multiple cores deliver more performance per wattperformance per watt

C1C1

C4C4

C2C2

C3C3

SmallSmallcorecore

Big coreBig core

CacheCache

CacheCache

11

22

33

44

11

22

11 11

11

22

33

44

11

22

33

44

PowerPower

PerformancePerformance

Power = Power = ¼¼

Performance = 1/2Performance = 1/2

Many core is more Many core is more power efficientpower efficient

Power ~ areaPower ~ area

Single thread Single thread performance ~ area**.5performance ~ area**.5

Page 14: Senior Principal Engineer Intel Corporationinfolab.stanford.edu/~ullman/dragon/w06/lectures/cs243-lec16-wei-2.pdf · Senior Principal Engineer Intel Corporation. 2 Agenda Major Technological

14

MooreMoore’’s Law will provide s Law will provide transistorstransistors

128

11nm11nm

20162016

256643216842Integration Integration CapacityCapacity(Billions of (Billions of Transistors)Transistors)

8nm8nm16nm16nm22nm22nm32nm32nm45nm45nm65nm65nm90nm90nmFeature SizeFeature Size

20182018201420142012201220102010200820082006200620042004High Volume High Volume ManufacturingManufacturing

Use transistors forUse transistors for•• Multiple coresMultiple cores•• OnOn--core memory (caches)core memory (caches)•• New features (*Ts)New features (*Ts)

Intel process technology capabilities

Multiple cores and caches address power and Multiple cores and caches address power and memory latency issuesmemory latency issues

Page 15: Senior Principal Engineer Intel Corporationinfolab.stanford.edu/~ullman/dragon/w06/lectures/cs243-lec16-wei-2.pdf · Senior Principal Engineer Intel Corporation. 2 Agenda Major Technological

The Dawn of The Dawn of EnergyEnergy--Efficient PerformanceEfficient Performance

Page 16: Senior Principal Engineer Intel Corporationinfolab.stanford.edu/~ullman/dragon/w06/lectures/cs243-lec16-wei-2.pdf · Senior Principal Engineer Intel Corporation. 2 Agenda Major Technological

16

AgendaAgenda

Major Technological ChangeMajor Technological Change

Software ResponseSoftware Response

Parallel Software 2.0Parallel Software 2.0

Page 17: Senior Principal Engineer Intel Corporationinfolab.stanford.edu/~ullman/dragon/w06/lectures/cs243-lec16-wei-2.pdf · Senior Principal Engineer Intel Corporation. 2 Agenda Major Technological

17

MultiMulti--Core Platforms Demand Core Platforms Demand Threaded SoftwareThreaded Software

Page 18: Senior Principal Engineer Intel Corporationinfolab.stanford.edu/~ullman/dragon/w06/lectures/cs243-lec16-wei-2.pdf · Senior Principal Engineer Intel Corporation. 2 Agenda Major Technological

18

The Importance of ThreadingThe Importance of Threading

Do Nothing: Benefits Still VisibleDo Nothing: Benefits Still Visible––Operating systems ready for multiOperating systems ready for multi--processingprocessing––Background tasks benefit from more compute Background tasks benefit from more compute

resourcesresources

Parallelize: Unlock the PotentialParallelize: Unlock the Potential––Native threadsNative threads––Threaded librariesThreaded libraries––Compiler generated threadsCompiler generated threads

Page 19: Senior Principal Engineer Intel Corporationinfolab.stanford.edu/~ullman/dragon/w06/lectures/cs243-lec16-wei-2.pdf · Senior Principal Engineer Intel Corporation. 2 Agenda Major Technological

19

Multiple cores and Parallel Multiple cores and Parallel ProgrammingProgramming

No change in No change in fundamental fundamental programming modelprogramming model

Synchronization and Synchronization and communication costs communication costs greatly reducedgreatly reduced

––Optimization Optimization choices may be choices may be differentdifferent

––Makes it practical to Makes it practical to parallelize more parallelize more programsprograms

P1

cache

P2 P3 P4

cache cache cache

Memory

SMP

C1

cache

Memory

C2 C3 C4

cache cache cache

CMP

Page 20: Senior Principal Engineer Intel Corporationinfolab.stanford.edu/~ullman/dragon/w06/lectures/cs243-lec16-wei-2.pdf · Senior Principal Engineer Intel Corporation. 2 Agenda Major Technological

20

Introducing Introducing ThreadsThreads

DebuggingDebugging

PerformancePerformanceTuningTuning

Architectural Architectural AnalysisAnalysis

Threading for MultiThreading for Multi--CoreCore

Page 21: Senior Principal Engineer Intel Corporationinfolab.stanford.edu/~ullman/dragon/w06/lectures/cs243-lec16-wei-2.pdf · Senior Principal Engineer Intel Corporation. 2 Agenda Major Technological

21

Introducing Introducing ThreadsThreads

DebuggingDebugging

PerformancePerformanceTuningTuning

Architectural Architectural AnalysisAnalysis

Threading for MultiThreading for Multi--CoreCore

Call GraphCall Graph•• Functional StructureFunctional Structure

•• Execution TimesExecution Times

•• CountsCounts

IntelIntel®® VTuneVTune™™Performance AnalyzerPerformance Analyzer

Page 22: Senior Principal Engineer Intel Corporationinfolab.stanford.edu/~ullman/dragon/w06/lectures/cs243-lec16-wei-2.pdf · Senior Principal Engineer Intel Corporation. 2 Agenda Major Technological

22

Page 23: Senior Principal Engineer Intel Corporationinfolab.stanford.edu/~ullman/dragon/w06/lectures/cs243-lec16-wei-2.pdf · Senior Principal Engineer Intel Corporation. 2 Agenda Major Technological

23

Introducing Introducing ThreadsThreads

DebuggingDebugging

PerformancePerformanceTuningTuning

Architectural Architectural AnalysisAnalysis

Threading for MultiThreading for Multi--CoreCore

OpenMPOpenMP Loop ConstructLoop Construct•• Creates one thread per coreCreates one thread per core

•• Assigns iterations to threadsAssigns iterations to threads

IntelIntel®® CompilersCompilers

Page 24: Senior Principal Engineer Intel Corporationinfolab.stanford.edu/~ullman/dragon/w06/lectures/cs243-lec16-wei-2.pdf · Senior Principal Engineer Intel Corporation. 2 Agenda Major Technological

24

Page 25: Senior Principal Engineer Intel Corporationinfolab.stanford.edu/~ullman/dragon/w06/lectures/cs243-lec16-wei-2.pdf · Senior Principal Engineer Intel Corporation. 2 Agenda Major Technological

25

Introducing Introducing ThreadsThreads

DebuggingDebugging

PerformancePerformanceTuningTuning

Architectural Architectural AnalysisAnalysis

Threading for MultiThreading for Multi--CoreCore

Thread Safety IssuesThread Safety Issues•• Data RacesData Races

•• DeadlocksDeadlocks

IntelIntel®® Thread Thread CheckerChecker

Page 26: Senior Principal Engineer Intel Corporationinfolab.stanford.edu/~ullman/dragon/w06/lectures/cs243-lec16-wei-2.pdf · Senior Principal Engineer Intel Corporation. 2 Agenda Major Technological

26

Page 27: Senior Principal Engineer Intel Corporationinfolab.stanford.edu/~ullman/dragon/w06/lectures/cs243-lec16-wei-2.pdf · Senior Principal Engineer Intel Corporation. 2 Agenda Major Technological

27

Introducing Introducing ThreadsThreads

DebuggingDebugging

PerformancePerformanceTuningTuning

Architectural Architectural AnalysisAnalysis

Threading for MultiThreading for Multi--CoreCore

Find Contended LocksFind Contended Locks•• Most OverheadMost Overhead

•• Largest Reduction in Largest Reduction in ParallelismParallelism

IntelIntel®® Thread Thread ProfilerProfiler

Page 28: Senior Principal Engineer Intel Corporationinfolab.stanford.edu/~ullman/dragon/w06/lectures/cs243-lec16-wei-2.pdf · Senior Principal Engineer Intel Corporation. 2 Agenda Major Technological

28

Page 29: Senior Principal Engineer Intel Corporationinfolab.stanford.edu/~ullman/dragon/w06/lectures/cs243-lec16-wei-2.pdf · Senior Principal Engineer Intel Corporation. 2 Agenda Major Technological

29

Page 30: Senior Principal Engineer Intel Corporationinfolab.stanford.edu/~ullman/dragon/w06/lectures/cs243-lec16-wei-2.pdf · Senior Principal Engineer Intel Corporation. 2 Agenda Major Technological

30

Growing Growing Momentum Momentum

For For SoftwareSoftware

ParallelizationParallelization

Activision (Activision (RavensoftRavensoft))

AdobeAdobe

AlgorithmicsAlgorithmics

AliasAlias

AutodeskAutodesk

Business ObjectsBusiness Objects

CakewalkCakewalk

CodecPeopleCodecPeople

Computer AssociatesComputer Associates

Corel (WordPerfect)Corel (WordPerfect)

CyberlinkCyberlink

DiscreetDiscreet

IBMIBM

id Softwareid Software

LandmarkLandmark

MacromediaMacromedia

MainconceptMainconcept

MaxonMaxon

mental imagesmental images

Microsoft (Office Suite)Microsoft (Office Suite)

MidwayMidway

MSCMSC

Novell SUSENovell SUSE

OracleOracle

PegasusPegasus

PinnaclePinnacle

PixarPixar ((RendermanRenderman))

ParadigmParadigm

PTCPTC

Red HatRed Hat

SAPSAP

SASSAS

Siebel CRMSiebel CRM

SignetSignet

SkypeSkype

SLBSLB

SnapStreamSnapStream

Sonic (Sonic (RoxioRoxio))

SonySony

SteinbergSteinberg

SunGardSunGard

SybaseSybase

SymantecSymantec

Thomson Thomson

THQTHQ

UbisoftUbisoft

UGSUGS

ValveValve

Yahoo (Yahoo (MusicmatchMusicmatch))

Other names and brands may be claimed as the property of others.Other names and brands may be claimed as the property of others.

Page 31: Senior Principal Engineer Intel Corporationinfolab.stanford.edu/~ullman/dragon/w06/lectures/cs243-lec16-wei-2.pdf · Senior Principal Engineer Intel Corporation. 2 Agenda Major Technological

31

AgendaAgenda

Major Technological ChangeMajor Technological Change

Software ResponseSoftware Response

Parallel Software 2.0Parallel Software 2.0

Page 32: Senior Principal Engineer Intel Corporationinfolab.stanford.edu/~ullman/dragon/w06/lectures/cs243-lec16-wei-2.pdf · Senior Principal Engineer Intel Corporation. 2 Agenda Major Technological

32

A New A New EraEra……

PerformanceEquals Frequency

Unconstrained Power

Voltage Scaling

PerformanceEquals IPC

THE OLDTHE OLD

THE NEWTHE NEW

Multi-Core

MicroarchitectureAdvancements

Power Efficiency

It is It is happening happening

fastfast……

Page 33: Senior Principal Engineer Intel Corporationinfolab.stanford.edu/~ullman/dragon/w06/lectures/cs243-lec16-wei-2.pdf · Senior Principal Engineer Intel Corporation. 2 Agenda Major Technological

MultiMulti--Core TrajectoryCore Trajectory

DualDual--CoreCore

QuadQuad--CoreCore

20062006 20072007

Page 34: Senior Principal Engineer Intel Corporationinfolab.stanford.edu/~ullman/dragon/w06/lectures/cs243-lec16-wei-2.pdf · Senior Principal Engineer Intel Corporation. 2 Agenda Major Technological

34

Future ArchitectureFuture ArchitectureMany More CoresMany More Cores

Parallel extension of IAParallel extension of IA–– Homogeneous array of coresHomogeneous array of cores

–– FixedFixed--function unitsfunction units

–– CoarseCoarse-- and fineand fine--grained datagrained data--and threadand thread--level parallelismlevel parallelism

–– Global coherency hardwareGlobal coherency hardware

Partitioned arrayPartitioned array–– Application domainsApplication domains

–– Isolated communication trafficIsolated communication traffic

–– Fault toleranceFault tolerance

Power delivery and managementPower delivery and management

High bandwidth memoryHigh bandwidth memory

Reconfigurable cacheReconfigurable cache

Scalable fabricScalable fabric

FixedFixed--function unitsfunction units

CoreCore CoreCore

CoreCore CoreCore

CoreCore CoreCore

CoreCore

CoreCore

CoreCore

CoreCore CoreCore

CoreCore CoreCore

CoreCore CoreCore

Page 35: Senior Principal Engineer Intel Corporationinfolab.stanford.edu/~ullman/dragon/w06/lectures/cs243-lec16-wei-2.pdf · Senior Principal Engineer Intel Corporation. 2 Agenda Major Technological

35

Parallel Software 2.0Parallel Software 2.0

Ease of programmingEase of programming–– Programming language, compiler, toolsProgramming language, compiler, tools

Ubiquitous Ubiquitous –– Consumer/wireless Consumer/wireless vsvs HPC/databaseHPC/database–– Home Home vsvs nuclear labsnuclear labs–– More legacy applicationsMore legacy applications

Explosion of coresExplosion of cores–– 2X cores every 18 months2X cores every 18 months–– Scalable softwareScalable software

ReliabilityReliabilityUser experienceUser experience–– vs. just raw performancevs. just raw performance

EducationEducation–– Mass Mass vsvs elite elite

Page 36: Senior Principal Engineer Intel Corporationinfolab.stanford.edu/~ullman/dragon/w06/lectures/cs243-lec16-wei-2.pdf · Senior Principal Engineer Intel Corporation. 2 Agenda Major Technological

Parallel Software 2.0

The Beginning of a New Era