Supercomputing in Plain English: The Tyranny of the Storage Hierarchy. Henry Neeman, Director, OU Supercomputing Center for Education & Research. Blue Waters Undergraduate Petascale Education Program, May 29 – June 10 2011


  • Supercomputing in Plain English: The Tyranny of the Storage Hierarchy
    Henry Neeman, Director, OU Supercomputing Center for Education & Research
    Blue Waters Undergraduate Petascale Education Program, May 29 - June 10 2011

    Supercomputing in Plain English: Storage Hierarchy BWUPEP2011, UIUC, May 29 - June 10 2011

  • Outline
    What is the storage hierarchy?
    Registers
    Cache
    Main Memory (RAM)
    The Relationship Between RAM and Cache
    The Importance of Being Local
    Hard Disk
    Virtual Memory

  • The Storage Hierarchy
    Top of the hierarchy: fast, expensive, few
    Registers
    Cache memory
    Main memory (RAM)
    Hard disk
    Removable media (CD, DVD etc)
    Internet
    Bottom of the hierarchy: slow, cheap, a lot

  • A Laptop: Dell Latitude Z600 [4]
    Intel Core2 Duo SU9600 1.6 GHz w/3 MB L2 Cache
    4 GB 1066 MHz DDR3 SDRAM
    256 GB SSD Hard Drive
    DVD+RW/CD-RW Drive (8x)
    1 Gbps Ethernet Adapter

  • Storage Speed, Size, Cost (Laptop)
    Device | Speed (MB/sec) [peak] | Size (MB) | Cost ($/MB)
    Registers (Intel Core2 Duo 1.6 GHz) | 314,573 [6] (12,800 MFLOP/s*) | 464 bytes** [11] | (none listed)
    Cache Memory (L2) | 27,276 [7] | 3 | $285 [13]
    Main Memory (1066 MHz DDR3 SDRAM) | 4500 [7] | 4096 | $0.03 [12]
    Hard Drive (SSD) | 250 [9] | 256,000 | $0.002 [12]
    Ethernet (1000 Mbps) | 125 | unlimited | charged per month (typically)
    DVD+R (16x) | 22 [10] | unlimited | $0.00005 [12]
    Phone Modem (56 Kbps) | 0.007 | unlimited | charged per month (typically)
    * MFLOP/s: millions of floating point operations per second
    ** 16 64-bit general purpose registers, 8 80-bit floating point registers, 16 128-bit floating point vector registers
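    To get a feel for the peak speeds in the table above, the following Python sketch estimates how long each level would take to move the same amount of data. The speeds are copied from the table; the 100 MB payload and the name `transfer_seconds` are illustrative assumptions, and real transfers rarely reach peak speed.

```python
# Peak speeds in MB/sec, copied from the laptop table on the slide.
PEAK_MB_PER_SEC = {
    "Registers": 314_573,
    "Cache (L2)": 27_276,
    "Main memory": 4_500,
    "Hard drive (SSD)": 250,
    "Ethernet (1000 Mbps)": 125,
    "DVD+R (16x)": 22,
    "Phone modem (56 Kbps)": 0.007,
}


def transfer_seconds(megabytes, mb_per_sec):
    """Idealized transfer time: payload size divided by peak speed."""
    return megabytes / mb_per_sec


if __name__ == "__main__":
    payload_mb = 100  # hypothetical payload size, not from the slides
    for level, speed in PEAK_MB_PER_SEC.items():
        seconds = transfer_seconds(payload_mb, speed)
        print(f"{level:24s} {seconds:16.6f} sec")
```

    Dividing any two speeds gives the relative gap between levels; for example, the L2 cache figure is roughly six times the main memory figure.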

  • Registers [25]

  • What Are Registers?
    Registers are memory-like locations inside the Central Processing Unit that hold data that are being used right now in operations.
    [Diagram of the CPU: Control Unit (Fetch Next Instruction, Fetch Data, Store Data, Increment Instruction Ptr, Execute Instruction); Arithmetic/Logic Unit (Add, Sub, Mult, Div, And, Or, Not); Registers (Integer, Floating Point)]

  • How Registers Are Used
    Every arithmetic or logical operation has one or more operands and one result.
    Operands are contained in source registers.
    A black box of circuits performs the operation.
    The result goes into a destination register.
    Example: ADD with the addend (5) in R0 and the augend (7) in R1 puts the sum (12) in R2.
    [Diagram: operand registers Ri and Rj feed the operation circuitry, whose result goes into register Rk]
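    The ADD example on this slide can be mimicked with a toy register file. This is only an illustrative sketch in Python: the dictionary-based register file and the name `execute_add` are assumptions made for the example, not how hardware or any real instruction set is implemented.

```python
def execute_add(registers, src_a, src_b, dest):
    """Read two source registers, add them, and write the destination register."""
    registers[dest] = registers[src_a] + registers[src_b]
    return registers


if __name__ == "__main__":
    # Values from the slide's example: addend in R0, augend in R1, sum in R2.
    regs = {"R0": 5, "R1": 7, "R2": 0}
    execute_add(regs, "R0", "R1", "R2")
    print(regs["R2"])
```

    The operands stay in their source registers; only the destination register changes.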

  • How Many Registers?
    Typically, a CPU has less than 8 KB (8192 bytes) of registers, usually split into registers for holding integer values and registers for holding floating point (real) values, plus a few special purpose registers.
    Examples:
    IBM POWER7 (found in IBM p-Series supercomputers): 226 64-bit integer registers and 348 128-bit merged vector/scalar registers (7376 bytes) [28]
    Intel Core2 Duo: 16 64-bit general purpose registers, 8 80-bit floating point registers, 16 128-bit floating point vector registers (464 bytes) [11]
    Intel Itanium2: 128 64-bit integer registers, 128 82-bit floating point registers (2304 bytes, counting each 82-bit register as 10 bytes) [23]
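    The byte totals quoted on this slide follow from multiplying register counts by register widths. The sketch below redoes that arithmetic in Python; the register counts and widths come from the slide, while the function name `register_file_bytes` and the list-of-tuples layout are assumptions for illustration.

```python
def register_file_bytes(groups):
    """Total size in bytes of register groups given as (count, bits_each) pairs."""
    total_bits = sum(count * bits for count, bits in groups)
    return total_bits / 8


# (count, width in bits) for each group of registers, as listed on the slide.
CORE2_DUO = [(16, 64), (8, 80), (16, 128)]
POWER7 = [(226, 64), (348, 128)]
ITANIUM2 = [(128, 64), (128, 82)]

if __name__ == "__main__":
    for name, groups in [("Core2 Duo", CORE2_DUO),
                         ("POWER7", POWER7),
                         ("Itanium2", ITANIUM2)]:
        print(name, register_file_bytes(groups), "bytes")
```

    Counting exact bits, the Itanium2 total comes to 2336 bytes; the 2304 figure corresponds to treating each 82-bit register as 10 bytes. Either way, all three register files are far below 8 KB.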

  • Cache [4]

  • What is Cache?
    A special kind of memory where data reside that are about to be used or have just been used.
    Very fast => very expensive => very small (typically 100 to 10,000 times as expensive as RAM per byte)
    Data in cache can be loaded into or stored from registers at speeds comparable to the speed of performing computations.
    Data that are not in cache (but that are in Main Memory) take much longer to load or store.
    Cache is near the CPU: either inside the CPU or on the motherboard that the CPU sits on.

  • From Cache to the CPU
    Typically, data move between cache and the CPU at speeds relatively near to that of the CPU performing calculations.
    [Diagram: CPU, 307 GB/sec [7]; Cache, 27 GB/sec (6x RAM) [7]]

  • Multiple Levels of Cache
    Most contemporary CPUs have more than one level of cache. For example:
    Intel Core Duo (Yonah) [??]
        Level 1 caches: 32 KB instruction, 32 KB data
        Level 2 cache: 2048 KB unified (instruction+data)
    IBM POWER7 [28]
        Level 1 cache: 32 KB instruction, 32 KB data per core
        Level 2 cache: 256 KB unified per core
        Level 3 cache: 4096 KB unified per core

  • Why Multiple Levels of Cache?
    The lower the level of cache:
        the faster the cache can transfer data to the CPU;
        the smaller that level of cache is (faster => more expensive => smaller).
    Example: IBM POWER7 latency to the CPU [28]
        L1 cache: 1 cycle = 0.29 ns for 3.5 GHz
        L2 cache: 8.5 cycles = 2.43 ns for 3.5 GHz (average)
        L3 cache: 23.5 cycles = 6.71 ns for 3.5 GHz (local to core)
        RAM: 346 cycles = 98.86 ns for 3.5 GHz (1066 MHz RAM)
    Example: Intel Itanium2 latency to the CPU [19]
        L1 cache: 1 cycle = 1.0 ns for 1.0 GHz
        L2 cache: 5 cycles = 5.0 ns for 1.0 GHz
        L3 cache: 12-15 cycles = 12-15 ns for 1.0 GHz
    Example: Intel Core Duo (Yonah)
        L1 cache: 3 cycles = 1.64 ns for a 1.83 GHz CPU = 12 calculations
        L2 cache: 14 cycles = 7.65 ns for a 1.83 GHz CPU = 56 calculations
        RAM: 48 cycles = 26.2 ns for a 1.83 GHz CPU = 192 calculations
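    The nanosecond figures on this slide come from dividing a latency in clock cycles by the clock rate in GHz. The Python sketch below shows that conversion; the function names are assumptions for illustration, and the default of 4 calculations per cycle is inferred from the slide's own numbers (3 cycles = 12 calculations) rather than stated in the source.

```python
def cycles_to_ns(cycles, clock_ghz):
    """Convert a latency in clock cycles to nanoseconds (1 GHz = 1 cycle per ns)."""
    return cycles / clock_ghz


def calculations_lost(cycles, calcs_per_cycle=4):
    """Calculations the CPU could have done while waiting (assumed 4 per cycle)."""
    return cycles * calcs_per_cycle


if __name__ == "__main__":
    # 1.83 GHz example from the slide: L1, L2 and RAM latencies in cycles.
    for level, cycles in [("L1 cache", 3), ("L2 cache", 14), ("RAM", 48)]:
        ns = cycles_to_ns(cycles, 1.83)
        print(f"{level}: {cycles} cycles = {ns:.2f} ns "
              f"= {calculations_lost(cycles)} calculations")
```

    The same conversion applied to the POWER7 numbers gives 346 cycles at 3.5 GHz as about 98.86 ns.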

  • Cache & RAM Latencies [26]
    [Chart with an arrow labeled "Better"]

    MemLat

    3

    3.04

    3

    3

    3

    3

    3

    3

    3

    3

    3

    3

    3

    3

    3

    3

    3

    3

    3.02

    3

    3

    3

    3

    3

    3

    3

    3.02

    3

    3

    3

    3

    3

    3

    3

    3.02

    3

    3

    3

    3

    3

    3

    3

    3

    3

    3

    3

    3

    3

    3

    3

    3.02

    3

    3

    3

    3

    3.03

    3

    3

    3

    3

    3

    3

    3

    3

    3

    3

    3

    3

    3

    3

    3

    3

    3

    3

    3

    3

    3

    3.02

    3

    3

    3

    3

    3

    3

    3

    3

    3

    3

    3

    3

    3

    3

    3

    3

    3.6

    6.8

    9.77

    12.75

    14.14

    14.15

    14.15

    14.14

    14.14

    14.18

    14.15

    14.14

    14.15

    14.15

    14.14

    14.18

    14.16

    14.16

    14.14

    14.19

    14.14

    14.14

    14.13

    14.14

    14.14

    14.13

    14.17

    14.15

    14.14

    14.16

    14.14

    14.16

    14.15

    14.16

    14.14

    14.16

    14.14

    14.18

    14.14

    14.16

    14.16

    14.14

    14.15

    14.14

    14.22

    14.16

    14.14

    14.17

    14.14

    14.14

    14.14

    14.17

    14.15

    14.14

    14.16

    14.17

    14.14

    14.14

    14.14

    14.14

    14.14

    14.17

    14.18

    14.16

    14.15

    14.16

    14.15

    14.14

    14.14

    14.16

    14.14

    14.19

    14.15

    14.16

    14.13

    14.14

    14.15

    14.14

    14.16

    14.18

    14.16

    14.14

    14.14

    14.15

    14.14

    14.17

    14.17

    14.17

    14.16

    14.15

    14.2

    14.19

    14.19

    14.21

    14.25

    14.23

    14.26

    14.26

    14.24

    14.25

    14.26

    14.24

    14.27

    14.25

    14.27

    14.26

    14.41

    14.24

    14.28

    14.29

    14.26

    14.28

    14.25

    14.34

    14.3

    14.51

    14.31

    14.37

    14.28

    14.38

    14.54

    14.44

    14.43

    15.08

    14.51

    14.69

    15.33

    14.95

    16.21

    16.72

    16.65

    18.42

    19.44

    20.33

    26.06

    31.63

    37.64

    42.81

    44.53

    45.52

    45.24

    45.73

    46.74

    46.48

    46.53

    46.75

    46.49

    46.62

    46.89

    47.01

    46.86

    46.93

    47.05

    46.89

    47.3

    47.1

    46.94

    46.81

    47.23

    46.9

    47.31

    47.52

    47.12

    47.63

    46.96

    47.11

    47.25

    47.11

    47.36

    47.81

    47.54

    47.12

    47.38

    47.24

    47.39

    47.55

    47.35

    47.4

    47.24

    47.19

    47.51

    47.52

    47.53

    47.29

    47.64

    48.85

    47.66

    3 cycles

    14 cycles

    47 cycles

    Memory Latency

    Array Size (bytes)

    Latency (clock cycles)

    Cache & RAM Latency: Intel T2400 (1.83 GHz)

    Sheet1

    10243

    10883.04

    11523

    12163

    12803

    13443

    14083

    14723

    15363

    16003

    16643

    17283

    17923

    18563

    19203

    19843

    20483

    21123

    22403.02

    23683

    24963

    26243

    27523

    28803

    30083

    31363

    32643.02

    33923

    35203

    36483

    37763

    39043

    40323

    41603

    43523.02

    45443

    47363

    49283

    51203

    53123

    55043

    56963

    58883

    60803

    62723

    65283

    67843

    70403

    72963

    75523

    78083.02

    80643

    83203

    86403

    89603

    92803.03

    96003

    99203

    102403

    105603

    109443

    113283

    117123

    120963

    124803

    129283

    133763

    138243

    142723

    147203

    152323

    157443

    162563

    167683

    173443

    179203

    184963

    191363.02

    197763

    204163

    210563

    217603

    224643

    231683

    239363

    247043

    255363

    263683

    272003

    280963

    289923

    299523

    309123

    319363

    329603.6

    340486.8

    351369.77

    3628812.75

    3744014.14

    3865614.15

    3987214.15

    4115214.14

    4249614.14

    4384014.18

    4524814.15

    4672014.14

    4819214.15

    4972814.15

    5132814.14

    5299214.18

    5465614.16

    5638414.16

    5817614.14

    6003214.19

    6195214.14

    6393614.14

    6598414.13

    6809614.14

    7027214.14

    7251214.13

    7481614.17

    7718414.15

    7961614.14

    8211214.16

    8473614.14

    8742414.16

    9017614.15

    9305614.16

    9600014.14

    9900814.16

    10214414.14

    10534414.18

    10867214.14

    11212814.16

    11564814.16

    11929614.14

    12307214.15

    12697614.14

    13094414.22

    13504014.16

    13926414.14

    14361614.17

    14816014.14

    15283214.14

    15763214.14

    16256014.17

    16768014.15

    17292814.14

    17836814.16

    18400014.17

    18976014.14

    19571214.14

    20185614.14

    20819214.14

    21472014.14

    22144014.17

    22841614.18

    23558414.16

    24300814.15

    25062414.16

    25849614.15

    26662414.14

    27500814.14

    28364814.16

    29254414.14

    30169614.19

    31116814.15

    32089614.16

    33094414.13

    34131214.14

    35200014.15

    36300814.14

    37440014.16

    38611214.18

    39820814.16

    41068814.14

    42355214.14

    43680014.15

    45049614.14

    46457614.17

    47910414.17

    49408014.17

    50956814.16

    52550414.15

    54195214.2

    55891214.19

    57638414.19

    59443214.21

    61305614.25

    63225614.23

    65203214.26

    67244814.26

    69350414.24

    71520014.25

    73760014.26

    76070414.24

    78451214.27

    80908814.25

    83443214.27

    86054414.26

    88748814.41

    91526414.24

    94387214.28

    97337614.29

    100384014.26

    103526414.28

    106764814.25

    110105614.34

    113548814.3

    117100814.51

    120761614.31

    124537614.37

    128435214.28

    132454414.38

    136595214.54

    140864014.44

    145267214.43

    149811215.08

    154496014.51

    159328014.69

    164307215.33

    169446414.95

    174745616.21

    180211216.72

    185843216.65

    191654418.42

    197644819.44

    203827220.33

    210201626.06

    216774431.63

    223552037.64

    230540842.81

    237747244.53

    245177645.52

    252844845.24

    260748845.73

    268902446.74

    277305646.48

    285977646.53

    294918446.75

    304140846.49

    313651246.62

    323456046.89

    333568047.01

    343993646.86

    354745646.93

    365836847.05

    377273646.89

    389068847.3

    401228847.1

    413772846.94

    426707246.81

    440044847.23

    453798446.9

    467980847.31

    482611247.52

    497696047.12

    513254447.63

    529299246.96

    545843247.11

    562905647.25

    580499247.11

    598643247.36

    617356847.81

    636652847.54

    656550447.12

    677068847.38

    698227247.24

    720051247.39

    742553647.55

    765760047.35

    789696047.4

    814374447.24

    839827247.19

    866073647.51

    893139247.52

    921056047.53

    949843247.29

    979526447.64

    1010137648.85

    1041708847.66

    rmma_intelt2400_memfwdlat_20070

    RIGHTMARK MEMORY ANALYZER V3.72 TEST RESULTS

    ============================================

    CPU ModelGenuine Intel(R) Core(TM) Duo (Yonah) 1828.7 MHz

    L1 Cache Line Size64 bytes

    L2 Cache Line Size128 bytes

    Test TypeD-Cache Latency

    Test StatusCompleted successfully

    Selected Tests120

    Set Size134217728 bytes

    Memory Allocation1

    Thread Lock0

    Min Block Size1024 bytes

    Max Block Size10485760 bytes

    Stride Size64 bytes

    Segments Count1

    NOP Count0

    Measurement Mode0

    NOP Latency0

    D-CACHE FORWARD READ LATENCY TEST

    =================================

    Size(bytes)Latency(cycles)Latency(ns)

    102431.64

    10883.041.66

    115231.64

    121631.64

    128031.64

    134431.64

    140831.64

    147231.64

    153631.64

    160031.64

    166431.64

    172831.64

    179231.64

    185631.64

    192031.64

    198431.64

    204831.64

    211231.64

    22403.021.65

    236831.64

    249631.64

    262431.64

    275231.64

    288031.64

    300831.64

    313631.64

    32643.021.65

    339231.64

    352031.64

    364831.64

    377631.64

    390431.64

    403231.64

    416031.64

    43523.021.65

    454431.64

    473631.64

    492831.64

    512031.64

    531231.64

    550431.64

    569631.64

    588831.64

    608031.64

    627231.64

    652831.64

    678431.64

    704031.64

    729631.64

    755231.64

    78083.021.65

    806431.64

    832031.64

    864031.64

    896031.64

    92803.031.65

    960031.64

    992031.64

    1024031.64

    1056031.64

    1094431.64

    1132831.64

    1171231.64

    1209631.64

    1248031.64

    1292831.64

    1337631.64

    1382431.64

    1427231.64

    1472031.64

    1523231.64

    1574431.64

    1625631.64

    1676831.64

    1734431.64

    1792031.64

    1849631.64

    191363.021.65

    1977631.64

    2041631.64

    2105631.64

    2176031.64

    2246431.64

    2316831.64

    2393631.64

    2470431.64

    2553631.64

    2636831.64

    2720031.64

    2809631.64

    2899231.64

    2995231.64

    3091231.64

    3193631.64

    329603.61.97

    340486.83.72

    351369.775.34

    3628812.756.97

    3744014.147.73

    3865614.157.74

    3987214.157.74

    4115214.147.73

    4249614.147.73

    4384014.187.76

    4524814.157.74

    4672014.147.73

    4819214.157.74

    4972814.157.74

    5132814.147.73

    5299214.187.76

    5465614.167.75

    5638414.167.74

    5817614.147.73

    6003214.197.76

    6195214.147.73

    6393614.147.73

    6598414.137.73

    6809614.147.73

    7027214.147.73

    7251214.137.73

    7481614.177.75

    7718414.157.74

    7961614.147.73

    8211214.167.74

    8473614.147.73

    8742414.167.74

    9017614.157.74

    9305614.167.74

    9600014.147.73

    9900814.167.74

    10214414.147.73

    10534414.187.76

    10867214.147.73

    11212814.167.75

    11564814.167.75

    11929614.147.73

    12307214.157.74

    12697614.147.73

    13094414.227.77

    13504014.167.74

    13926414.147.73

    14361614.177.75

    14816014.147.73

    15283214.147.73

    15763214.147.73

    16256014.177.75

    16768014.157.74

    17292814.147.73

    17836814.167.74

    18400014.177.75

    18976014.147.73

    19571214.147.73

    20185614.147.73

    20819214.147.73

    21472014.147.73

    22144014.177.75

    22841614.187.75

    23558414.167.74

    24300814.157.74

    25062414.167.74

    25849614.157.74

    26662414.147.73

    27500814.147.73

    28364814.167.74

    29254414.147.73

    30169614.197.76

    31116814.157.74

    32089614.167.74

    33094414.137.73

    34131214.147.73

    35200014.157.74

    36300814.147.73

    37440014.167.74

    38611214.187.76

    39820814.167.74

    41068814.147.73

    42355214.147.73

    43680014.157.74

    45049614.147.73

    46457614.177.75

    47910414.177.75

    49408014.177.75

    50956814.167.75

    52550414.157.74

    54195214.27.76

    55891214.197.76

    57638414.197.76

    59443214.217.77

    61305614.257.79

    63225614.237.78

    65203214.267.8

    67244814.267.8

    69350414.247.79

    71520014.257.79

    73760014.267.8

    76070414.247.79

    78451214.277.81

    80908814.257.79

    83443214.277.8

    86054414.267.8

    88748814.417.88

    91526414.247.79

    94387214.287.81

    97337614.297.82

    100384014.267.8

    103526414.287.81

    106764814.257.79

    110105614.347.84

    113548814.37.82

    117100814.517.93

    120761614.317.82

    124537614.377.86

    128435214.287.81

    132454414.387.87

    136595214.547.95

    140864014.447.89

    145267214.437.89

    149811215.088.25

    154496014.517.94

    159328014.698.03

    164307215.338.38

    169446414.958.18

    174745616.218.86

    180211216.729.14

    185843216.659.1

    191654418.4210.07

    197644819.4410.63

    203827220.3311.12

    210201626.0614.25

    216774431.6317.3

    223552037.6420.58

    230540842.8123.41

    237747244.5324.35

    245177645.5224.89

    252844845.2424.74

    260748845.7325.01

    268902446.7425.56

    277305646.4825.42

    285977646.5325.45

    294918446.7525.56

    304140846.4925.42

    313651246.6225.49

    323456046.8925.64

    333568047.0125.71

    343993646.8625.62

    354745646.9325.66

    365836847.0525.73

    377273646.8925.64

    389068847.325.86

    401228847.125.75

    413772846.9425.67

    426707246.8125.6

    440044847.2325.83

    453798446.925.64

    467980847.3125.87

    482611247.5225.99

    497696047.1225.77

    513254447.6326.05

    529299246.9625.68

    545843247.1125.76

    562905647.2525.84

    580499247.1125.76

    598643247.3625.9

    617356847.8126.14

    636652847.5426

    656550447.1225.77

    677068847.3825.91

    698227247.2425.83

    720051247.3925.91

    742553647.5526

    765760047.3525.89

    789696047.425.92

    814374447.2425.83

    839827247.1925.81

    866073647.5125.98

    893139247.5225.98

    921056047.5325.99

    949843247.2925.86

    979526447.6426.05

    1010137648.8526.71

    1041708847.6626.06

    D-CACHE BACKWARD READ LATENCY TEST

    ==================================

    Size(bytes)Latency(cycles)Latency(ns)

    102431.64

    10883.021.65

    115231.64

    121631.64

    128031.64

    134431.64

    140831.64

    147231.64

    153631.64

    160031.64

    166431.64

    172831.64

    179231.64

    185631.64

    192031.64

    198431.64

    204831.64

    211231.64

    224031.64

    236831.64

    249631.64

    262431.64

    275231.64

    288031.64

    300831.64

    313631.64

    326431.64

    339231.64

    352031.64

    364831.64

    377631.64

    390431.64

    403231.64

    416031.64

    435231.64

    454431.64

    473631.64

    492831.64

    512031.64

    531231.64

    550431.64

    569631.64

    588831.64

    608031.64

    627231.64

    652831.64

    678431.64

    704031.64

    729631.64

    755231.64

    780831.64

    806431.64

    832031.64

    864031.64

    896031.64

    928031.64

    960031.64

    992031.64

    1024031.64

    1056031.64

    1094431.64

    1132831.64

    1171231.64

    1209631.64

    1248031.64

    1292831.64

    1337631.64

    1382431.64

    1427231.64

    1472031.64

    1523231.64

    1574431.64

    1625631.64

    1676831.64

    1734431.64

    1792031.64

    1849631.64

    1913631.64

    1977631.64

    2041631.64

    2105631.64

    2176031.64

    2246431.64

    2316831.64

    2393631.64

    2470431.64

    2553631.64

    2636831.64

    2720031.64

    2809631.64

    2899231.64

    2995231.64

    3091231.64

    3193631.64

    329603.61.97

    340486.813.72

    351369.745.33

    3628812.756.97

    3744014.137.73

    3865614.197.76

    3987214.167.74

    4115214.147.73

    4249614.147.73

    4384014.147.73

    4524814.167.74

    4672014.157.74

    4819214.167.74

    4972814.157.74

    5132814.147.73

    5299214.167.74

    5465614.147.73

    5638414.137.73

    5817614.147.73

    6003214.157.74

    6195214.167.74

    6393614.187.75

    6598414.147.73

    6809614.137.73

    7027214.157.74

    7251214.147.73

    7481614.167.74

    7718414.167.75

    7961614.157.74

    8211214.147.73

    8473614.147.73

    8742414.167.74

    9017614.147.73

    9305614.137.73

    9600014.147.73

    9900814.147.73

    10214414.157.74

    10534414.197.76

    10867214.137.73

    11212814.157.74

    11564814.157.74

    11929614.137.73

    12307214.167.74

    12697614.147.73

    13094414.157.74

    13504014.27.77

    13926414.157.74

    14361614.167.75

    14816014.147.73

    15283214.137.73

    15763214.147.73

    16256014.157.74

    16768014.157.74

    17292814.167.74

    17836814.147.73

    18400014.147.73

    18976014.157.74

    19571214.187.75

    20185614.157.74

    20819214.147.73

    21472014.167.74

    22144014.167.74

    22841614.167.74

    23558414.147.73

    24300814.147.73

    25062414.137.73

    25849614.197.76

    26662414.167.74

    27500814.147.73

    28364814.157.74

    29254414.167.74

    30169614.167.74

    31116814.157.74

    32089614.177.75

    33094414.177.75

    34131214.157.74

    35200014.147.73

    36300814.187.75

    37440014.157.74

    38611214.157.74

    39820814.147.73

    41068814.187.76

    42355214.177.75

    43680014.147.73

    45049614.147.73

    46457614.147.73

    47910414.147.73

    49408014.157.74

    50956814.157.74

    52550414.167.74

    54195214.197.76

    55891214.187.76

    57638414.217.77

    59443214.37.82

    61305614.327.83

    63225614.277.8

    65203214.287.81

    67244814.297.81

    69350414.337.84

    71520014.257.79

    73760014.617.99

    76070414.337.84

    78451214.387.86

    80908814.337.84

    83443214.357.85

    86054414.437.89

    88748814.367.85

    91526414.267.8

    94387214.397.87

    97337614.287.81

    100384014.317.82

    103526414.387.87

    106764814.347.84

    110105614.437.89

    113548814.417.88

    117100814.517.93

    120761614.347.84

    124537614.47.87

    128435214.397.87

    132454414.517.94

    136595214.357.85

    140864014.668.02

    145267215.348.39

    149811214.778.08

    154496016.59.02

    159328015.198.31

    164307215.788.63

    169446415.278.35

    174745615.28.31

    180211215.788.63

    185843217.589.61

    191654418.5510.14

    197644820.0510.97

    203827221.611.81

    210201626.2514.36

    216774432.2617.64

    223552039.621.65

    230540842.9623.49

    237747245.6224.95

    245177645.9725.14

    252844846.6125.49

    260748846.8225.6

    268902446.1825.25

    277305647.2125.82

    285977646.8725.63

    294918446.9525.67

    304140847.9326.21

    313651247.2425.83

    323456046.6425.5

    333568047.2325.82

    343993647.3425.89

    354745646.9725.68

    365836847.2825.86

    377273647.4425.94

    389068847.6426.05

    401228846.9825.69

    413772847.5225.99

    426707247.4225.93

    440044847.3725.9

    453798447.4925.97

    467980847.7126.09

    482611247.9926.24

    497696047.6726.07

    513254447.425.92

    529299247.6626.06

    545843247.8526.17

    562905647.6826.08

    58049924826.25

    598643247.4925.97

    617356847.726.08

    636652847.6326.05

    656550448.0726.29

    677068847.4925.97

    698227247.7126.09

    720051248.2626.39

    742553647.3425.89

    765760047.2825.86

    789696047.6426.05

    814374447.8826.18

    839827248.0526.28

    866073647.7926.13

    893139247.8526.17

    921056047.626.03

    949843247.4125.92

    979526447.8626.17

    1010137648.8926.73

    1041708847.4625.95

    D-CACHE RANDOM READ LATENCY TEST

    ================================

    Size(bytes)Latency(cycles)Latency(ns)

    10243.031.65

    108831.64

    115231.64

    121631.64

    128031.64

    13443.031.65

    140831.64

    147231.64

    153631.64

    160031.64

    166431.64

    172831.64

    179231.64

    185631.64

    192031.64

    19843.011.65

    204831.64

    211231.64

    224031.64

    236831.64

    249631.64

    26243.031.65

    275231.64

    288031.64

    300831.64

    31363.021.65

    326431.64

    339231.64

    352031.64

    364831.64

    37763.021.65

    390431.64

    403231.64

    416031.64

    435231.64

    45443.031.65

    473631.64

    492831.64

    512031.64

    531231.64

    55043.021.65

    569631.64

    588831.64

    608031.64

    627231.64

    65283.021.65

    678431.64

    704031.64

    729631.64

    755231.64

    78083.021.65

    806431.64

    832031.64

    864031.64

    896031.64

    928031.64

    960031.64

    992031.64

    1024031.64

    105603.021.65

    1094431.64

    1132831.64

    1171231.64

    1209631.64

    1248031.64

    1292831.64

    1337631.64

    1382431.64

    1427231.64

    147203.031.65

    1523231.64

    1574431.64

    1625631.64

    1676831.64

    173443.031.66

    1792031.64

    1849631.64

    1913631.64

    197763.021.65

    2041631.64

    2105631.64

    2176031.64

    2246431.64

    2316831.64

    239363.021.65

    2470431.64

    2553631.64

    2636831.64

    272003.031.66

    2809631.64

    2899231.64

    2995231.64

    3091231.64

    319363.031.66

    329603.61.97

    340486.653.64

    351369.575.24

    3628812.556.86

    3744013.987.65

    3865614.017.66

    3987214.157.74

    4115214.177.75

    4249614.217.77

    4384014.147.73

    4524814.167.74

    4672014.147.73

    4819214.157.74

    4972814.147.73

    5132814.147.73

    5299214.177.75

    5465614.157.74

    5638414.187.75

    5817614.157.74

    6003214.137.73

    6195214.167.74

    6393614.157.74

    6598414.137.73

    6809614.187.75

    7027214.177.75

    7251214.167.74

    7481614.177.75

    7718414.167.74

    7961614.167.74

    8211214.167.74

    8473614.157.74

    8742414.147.73

    9017614.167.74

    9305614.147.73

    9600014.137.73

    9900814.167.75

    10214414.157.74

    10534414.137.73

    10867214.157.74

    11212814.157.74

    11564814.137.73

    11929614.147.73

    12307214.147.73

    12697614.147.73

    13094414.187.75

    13504014.147.73

    13926414.147.73

    14361614.167.74

    14816014.167.74

    15283214.147.73

    15763214.157.74

    16256014.147.73

    16768014.147.73

    17292814.157.74

    17836814.157.74

    18400014.137.73

    18976014.167.74

    19571214.167.74

    20185614.147.73

    20819214.147.73

    21472014.157.74

    22144014.157.74

    [Embedded chart data, continued (size in bytes, latency in cycles, latency in ns): latency holds near 14.15 cycles (7.74 ns) for sizes from 228,416 bytes to roughly 530,000 bytes, rises gradually to about 20 cycles (10.9 ns) at 1.8 MB, then climbs steeply past 2 MB, reaching about 180 cycles (98.6 ns) at 10 MB.]

    [Embedded chart data: D-cache pseudo-random read latency test (size in bytes, latency in cycles, latency in ns). Latency is about 3 cycles (1.64 ns) for sizes up to about 32 KB, about 14.2 cycles (7.7 ns) from about 37 KB to about 1.3 MB, and, after a steep rise around 2 MB, about 99 to 100 cycles (about 54.5 ns) from about 3 MB to 10 MB.]
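The cycle and nanosecond columns of the latency data differ only by the clock period. The following is a minimal sketch of that conversion, assuming the 1828.7 MHz clock that the RightMark bandwidth report in this deck gives for its test CPU also applies to the latency runs (the latency output shown here does not state a clock speed). The helper name `cycles_to_ns` is hypothetical, and the cache-level labels are an interpretation of the three plateaus, not part of the tool output.

```python
# Clock frequency in MHz. Assumption: same CPU as the bandwidth test.
CLOCK_MHZ = 1828.7


def cycles_to_ns(cycles, clock_mhz=CLOCK_MHZ):
    """Convert a latency measured in CPU cycles to nanoseconds."""
    # One cycle lasts 1000 / clock_mhz nanoseconds.
    return cycles * 1000.0 / clock_mhz


if __name__ == "__main__":
    # Plateau values taken from the latency data; labels are an interpretation.
    for label, cycles in [("L1-sized", 3.0), ("L2-sized", 14.16), ("RAM-sized", 100.16)]:
        print(f"{label}: {cycles} cycles = {cycles_to_ns(cycles):.2f} ns")
```

Under that assumption, the converted values agree with the nanosecond column to two decimal places, which suggests the two columns carry the same information.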

  • Main Memory[13]


  • What is Main Memory?

    Where data reside for a program that is currently running.

    Sometimes called RAM (Random Access Memory): you can load from or store into any main memory location at any time.

    Sometimes called core (from the magnetic cores that some memories used, many years ago).

    Much slower than cache => much cheaper => much bigger.


  • What Main Memory Looks Like

    [Figure: a row of numbered byte locations: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, ..., 536,870,911]

    You can think of main memory as a big long 1D array of bytes.
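The "1D array of bytes" picture can be tried out directly. This is an illustrative sketch only: a 64-byte `bytearray` stands in for main memory, and the addresses and stored values are made up for the example. It shows that a multi-byte value simply occupies a run of consecutive byte addresses.

```python
import struct

# A tiny stand-in for main memory: 64 bytes, addressed 0..63.
memory = bytearray(64)

# Store an 8-byte double starting at byte address 8 ...
struct.pack_into("<d", memory, 8, 3.25)
# ... and a 4-byte integer starting at byte address 16.
struct.pack_into("<i", memory, 16, 42)

# Load both values back by address.
value = struct.unpack_from("<d", memory, 8)[0]
count = struct.unpack_from("<i", memory, 16)[0]

# The slide's last address, 536,870,911, implies 2**29 bytes in total.
total_bytes = 536_870_911 + 1
```

The last line is just arithmetic on the slide's figure: addresses 0 through 536,870,911 cover 2^29 bytes, or 512 MiB.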


  • The Relationship Between Main Memory & Cache


  • RAM is Slow

    [Figure: CPU: 307 GB/sec [6]; main memory: 4.4 GB/sec [7] (1.4%), labeled "Bottleneck"]

    The speed of data transfer between main memory and the CPU is much slower than the speed of calculating, so the CPU spends most of its time waiting for data to come in or go out.


  • Why Have Cache?

    [Figure: CPU; cache: 27 GB/sec (9%) [7]; main memory: 4.4 GB/sec [7] (1.4%)]

    Cache is much closer to the speed of the CPU, so the CPU doesn't have to wait nearly as long for stuff that's already in cache: it can do more operations per second!
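The percentages on these slides follow from simple ratios. This is a small sketch of the arithmetic, assuming the percentages are each bandwidth divided by the 307 GB/sec CPU figure quoted on the "RAM is Slow" slide. The constant and function names are hypothetical.

```python
CPU_GB_PER_SEC = 307.0    # CPU figure from the "RAM is Slow" slide [6]
RAM_GB_PER_SEC = 4.4      # main memory figure [7]
CACHE_GB_PER_SEC = 27.0   # cache figure [7]


def percent_of_cpu(gb_per_sec):
    """Express a transfer rate as a percentage of the CPU rate."""
    return 100.0 * gb_per_sec / CPU_GB_PER_SEC


ram_percent = percent_of_cpu(RAM_GB_PER_SEC)      # a bit over 1%
cache_percent = percent_of_cpu(CACHE_GB_PER_SEC)  # a bit under 9%
cache_vs_ram = CACHE_GB_PER_SEC / RAM_GB_PER_SEC  # how many times faster cache is
```

On these figures, cache delivers data roughly six times faster than main memory, yet still well below the rate the CPU could consume.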


  • Cache & RAM Bandwidths

    [Figure: bandwidth chart, annotated "Better" [26]]


    [Chart: "Cache & RAM Bandwidth: Intel T2400 (1.83 GHz)". X axis: array size (bytes); Y axis: bandwidth (MB/sec); series: Read BW and Write BW. Chart labels: 32 KB (L1 cache size), 2 MB (L2 cache size), 14.2 GB/sec, 7.7 GB/sec, 3.5 GB/sec, 1.4 GB/sec.]

    [Chart source data (rmma_intelt2400_membw_20070904): RightMark Memory Analyzer V3.72 memory bandwidth test on a Genuine Intel(R) Core(TM) Duo (Yonah) at 1828.7 MHz, with a 64-byte stride and block sizes from 1024 bytes up to a configured maximum of 8388608 bytes. Read and write bandwidth are both above 14,000 MB/s for blocks up to 32 KB; about 7,800 MB/s (read) and about 8,000 to 8,300 MB/s (write) from about 38 KB to about 1 MB; and about 3,500 to 3,800 MB/s (read) and about 1,400 to 1,700 MB/s (write) for blocks of 3 MB and larger. Reported maxima: read 7.94 bytes/cycle (14519.22 MB/s) at 24576 bytes; write 7.77 bytes/cycle (14210.38 MB/s) at 1024 bytes; copy 6.36 bytes/cycle (11622.61 MB/s) at 10240 bytes.]

  • Cache Use Jargon

    Cache Hit: the data that the CPU needs right now are already in cache.

    Cache Miss: the data that the CPU needs right now are not currently in cache.

    If all of your data are small enough to fit in cache, then when you run your program, you'll get almost all cache hits (except at the very beginning), which means that your performance could be excellent!

    Sadly, this rarely happens in real life: most problems of scientific or engineering interest are bigger than just a few MB.

  • Cache Lines

    A cache line is a small, contiguous region in cache, corresponding to a contiguous region in RAM of the same size, that is loaded all at once.

    Typical size: 32 to 1024 bytes

    Examples

    Core 2 Duo [26]
      L1 data cache: 64 bytes per line
      L2 cache: 64 bytes per line

    POWER7 [28]
      L1 instruction cache: 128 bytes per line
      L1 data cache: 128 bytes per line
      L2 cache: 128 bytes per line
      L3 cache: 128 bytes per line
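    As a rough illustration of the cache line definition, the C sketch below computes which line-sized block of RAM an address falls in, and where the address sits inside that block. The function names are hypothetical (they are not from the slides), and the sketch assumes that the line size is a power of two.

```c
#include <stdint.h>

/* Hypothetical helpers, not from the slides.
   line_size must be a power of two (for example 64 or 128). */

/* Starting address of the line-sized block of RAM that contains addr. */
uint64_t line_base (uint64_t addr, uint64_t line_size)
{
  return addr & ~(line_size - 1);
}

/* Byte offset of addr within its line. */
uint64_t line_offset (uint64_t addr, uint64_t line_size)
{
  return addr & (line_size - 1);
}

/* Number of float values that fit in one line. */
uint64_t floats_per_line (uint64_t line_size)
{
  return line_size / sizeof(float);
}
```

    With 64-byte lines, address 200 lies in the line that starts at address 192, at offset 8. On a machine with 4-byte floats, each 64-byte line holds 16 of them, so loading one value brings 15 neighbors into cache along with it.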

  • How Cache Works

    When you request data from a particular address in Main Memory, here's what happens:

    The hardware checks whether the data for that address are already in cache. If so, it uses them.

    Otherwise, it loads from Main Memory the entire cache line that contains the address.

    For example, on a 1.83 GHz Intel Core Duo (Yonah), a cache miss makes the program stall (wait) at least 48 cycles (26.2 nanoseconds) for the next cache line to load: time that could have been spent performing up to 192 calculations! [26]
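    The stall figures quoted for the 1.83 GHz example can be checked with a little arithmetic. The sketch below uses hypothetical helper names; the rate of 4 calculations per cycle is an assumption inferred from 192 / 48 and is not stated in the slides.

```c
/* Hypothetical helper: converts a stall measured in clock cycles into
   nanoseconds. One cycle lasts 1/clock_ghz nanoseconds. */
double stall_nanoseconds (double cycles, double clock_ghz)
{
  return cycles / clock_ghz;
}

/* Calculations forgone during the stall, assuming the CPU could otherwise
   complete ops_per_cycle calculations in every cycle (an assumption). */
double lost_calculations (double cycles, double ops_per_cycle)
{
  return cycles * ops_per_cycle;
}
```

    For 48 cycles at 1.83 GHz, stall_nanoseconds gives about 26.2 ns, and at an assumed 4 calculations per cycle, lost_calculations gives 192.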

  • If It's in Cache, It's Also in RAM

    If a particular memory address is currently in cache, then it's also in Main Memory (RAM).

    That is, all of a program's data are in Main Memory, but some are also in cache.

    We'll revisit this point shortly.

  • Mapping Cache Lines to RAM

    Main memory typically maps into cache in one of three ways:

    Direct mapped (occasionally)

    Fully associative (very rare these days)

    Set associative (common)

    DON'T PANIC!

  • Direct Mapped Cache

    Direct Mapped Cache is a scheme in which each location in main memory corresponds to exactly one location in cache (but not the reverse, since cache is much smaller than main memory).

    Typically, if a cache address is represented by c bits, and a main memory address is represented by m bits, then the cache location associated with main memory address A is MOD(A, 2^c); that is, the lowest c bits of A.

    Example: POWER4 L1 instruction cache

  • Direct Mapped Cache Illustration

    Main Memory Address 0100101011100101 must go into cache address 11100101.

    Notice that 11100101 is the low 8 bits of 0100101011100101.

  • Jargon: Cache Conflict

    Suppose that the cache address 11100101 currently contains RAM address 0100101011100101.

    But, we now need to load RAM address 1100101011100101, which maps to the same cache address as 0100101011100101.

    This is called a cache conflict: the CPU needs a RAM location that maps to a cache line already in use.

    In the case of direct mapped cache, every cache conflict leads to the new cache line clobbering the old cache line.

    This can lead to serious performance problems.
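    The direct mapped rule and the conflict example can be sketched in C as follows. The function names are hypothetical, and the sketch follows the simplified model in the slides, in which the cache location is just the lowest c bits of the main memory address.

```c
#include <stdint.h>

/* Cache location for main memory address a under the simplified direct
   mapped scheme: MOD(a, 2^c), that is, the lowest c bits of a.
   Assumes 0 < c < 32. Hypothetical name, not from the slides. */
uint32_t direct_mapped_index (uint32_t a, unsigned c)
{
  return a & ((UINT32_C(1) << c) - 1);
}

/* Nonzero if two different addresses compete for the same cache location. */
int cache_conflict (uint32_t a1, uint32_t a2, unsigned c)
{
  return a1 != a2 &&
         direct_mapped_index(a1, c) == direct_mapped_index(a2, c);
}
```

    In hexadecimal, 0100101011100101 is 0x4AE5 and 1100101011100101 is 0xCAE5. With c = 8, both map to 0xE5 (binary 11100101), so cache_conflict(0x4AE5, 0xCAE5, 8) is nonzero.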

  • Problem with Direct Mapped: F90

    If you have two arrays that start in the same place relative to cache, then they might clobber each other all the time: no cache hits!

        REAL,DIMENSION(multiple_of_cache_size) :: a, b, c
        INTEGER :: index

        DO index = 1, multiple_of_cache_size
          a(index) = b(index) + c(index)
        END DO

    In this example, a(index), b(index) and c(index) all map to the same cache line, so loading c(index) clobbers b(index): no cache reuse!

  • Problem with Direct Mapped: C

    If you have two arrays that start in the same place relative to cache, then they might clobber each other all the time: no cache hits!

        float a[multiple_of_cache_size],
              b[multiple_of_cache_size],
              c[multiple_of_cache_size];
        int index;

        for (index = 0; index < multiple_of_cache_size; index++) {
          a[index] = b[index] + c[index];
        }

    In this example, a[index], b[index] and c[index] all map to the same cache line, so loading c[index] clobbers b[index]: no cache reuse!

  • Fully Associative Cache

    Fully Associative Cache can put any line of main memory into any cache line.

    Typically, the cache management system will put the newly loaded data into the Least Recently Used cache line, though other strategies are possible (e.g., Random, First In First Out, Round Robin, Least Recently Modified).

    So, this can solve, or at least reduce, the cache conflict problem.

    But, fully associative cache tends to be expensive, so it's pretty rare: you need N_cache × N_RAM connections!

  • Fully Associative Illustration

    Main Memory Address 0100101011100101 could go into any cache line.

  • Set Associative Cache

    Set Associative Cache is a compromise between direct mapped and fully associative. A line in main memory can map to any of a fixed number of cache lines.

    For example, 2-way Set Associative Cache can map each main memory line to either of 2 cache lines (e.g., to the Least Recently Used), 3-way maps to any of 3 cache lines, 4-way to 4 lines, and so on.

    Set Associative cache is cheaper than fully associative (you need K × N_RAM connections) but more robust than direct mapped.

  • 2-Way Set Associative Illustration

    Main Memory Address 0100101011100101 could go into cache address 11100101 OR could go into cache address 01100101.
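    One way to read the 2-way illustration: the low bits of the main memory address select a set, and the line may then occupy any way within that set. The C sketch below is a hypothetical model along those lines. Placing the way number in the high bits of the cache address is an assumption chosen to match the illustration, not a description of any particular processor.

```c
#include <stdint.h>

/* Hypothetical sketch: fills slots[0..ways-1] with the cache addresses that
   main memory address a may occupy in a ways-way set associative cache
   whose cache addresses are c bits wide. Assumes ways is a power of two,
   ways <= 2^c, and 0 < c < 32. */
void candidate_slots (uint32_t a, unsigned c, unsigned ways, uint32_t* slots)
{
  unsigned way_bits = 0;
  unsigned set_bits;
  uint32_t set;
  unsigned w;

  while ((1u << way_bits) < ways) {
    way_bits++;                                   /* log2(ways) */
  }
  set_bits = c - way_bits;
  set = a & ((UINT32_C(1) << set_bits) - 1);      /* low bits pick the set */

  for (w = 0; w < ways; w++) {
    slots[w] = ((uint32_t)w << set_bits) | set;   /* high bits pick the way */
  }
}
```

    For address 0100101011100101 (0x4AE5) with c = 8 and ways = 2, the candidates are 01100101 (0x65) and 11100101 (0xE5). With ways = 1 the function reduces to the direct mapped rule.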

  • Cache Associativity Examples

    Core 2 Duo [26]
      L1 data cache: 8-way set associative
      L2 cache: 8-way set associative

    POWER4 [12]
      L1 instruction cache: direct mapped
      L1 data cache: 2-way set associative
      L2 cache: 8-way set associative
      L3 cache: 8-way set associative

    POWER7 [28]
      L1 instruction cache: 4-way set associative
      L1 data cache: 8-way set associative
      L2 cache: 8-way set associative
      L3 cache: 8-way set associative

  • If It's in Cache, It's Also in RAM

    As we saw earlier:

    If a particular memory address is currently in cache, then it's also in Main Memory (RAM).

    That is, all of a program's data are in Main Memory, but some are also in cache.

  • Changing a Value That's in Cache

    Suppose that you have in cache a particular line of main memory (RAM).

    If you don't change the contents of any of that line's bytes while it's in cache, then when it gets clobbered by another main memory line coming into cache, there's no loss of information.

    But, if you change the contents of any byte while it's in cache, then you need to store it back out to main memory before clobbering it.

  • Cache Store Strategies

    Typically, there are two possible cache store strategies:

    Write-through: every single time that a value in cache is changed, that value is also stored back into main memory (RAM).

    Write-back: every single time that a value in cache is changed, the cache line containing that cache location gets marked as dirty. When a cache line gets clobbered, then if it has been marked as dirty, it is stored back into main memory (RAM). [14]
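    To make the difference between the two store strategies concrete, here is a minimal C sketch that models a single cache line and counts how often data are stored back to RAM. The type and function names are hypothetical, and the model ignores everything except the store count.

```c
/* Hypothetical model of one cache line, used only to count stores to RAM. */
typedef struct {
  int  dirty;       /* write-back only: line changed since it was loaded */
  long ram_stores;  /* number of store operations sent to RAM */
} line_model;

/* A value in the line is changed while the line is in cache. */
void write_value (line_model* line, int write_back)
{
  if (write_back) {
    line->dirty = 1;       /* write-back: just mark the line dirty */
  } else {
    line->ram_stores++;    /* write-through: store the value right away */
  }
}

/* The line is clobbered by another line coming into cache. */
void evict_line (line_model* line, int write_back)
{
  if (write_back && line->dirty) {
    line->ram_stores++;    /* store the dirty line back once */
    line->dirty = 0;
  }
}
```

    If a program changes values in the same line 100 times and the line is then clobbered, this model counts 100 stores for write-through and 1 store for write-back. Note that the write-back store moves a whole line, while each write-through store moves a single value.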

  • Cache Store Examples

    Core 2 Duo [26]
      L1 cache: write-back

    Pentium D [26]
      L1 cache: write-through

  • The Importance of Being Local[15]


  • More Data Than Cache

    Let's say that you have 1000 times more data than cache. Then won't most of your data be outside the cache?

    YES!

    Okay, so how does cache help?

  • Improving Your Cache Hit Rate

    Many scientific codes use a lot more data than can fit in cache all at once.

    Therefore, you need to ensure a high cache hit rate even though you've got much more data than cache.

    So, how can you improve your cache hit rate?

    Use the same solution as in Real Estate:

    Location, Location, Location!

  • Data Locality

    Data locality is the principle that, if you use data in a particular memory address, then very soon you'll use either the same address or a nearby address.

    Temporal locality: if you're using address A now, then you'll probably soon use address A again.

    Spatial locality: if you're using address A now, then you'll probably soon use addresses between A-k and A+k, where k is small.

    Note that this principle works well for sufficiently small values of "soon."

    Cache is designed to exploit locality, which is why a cache miss causes a whole line to be loaded.
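    A small model can show why spatial locality pays off when a miss loads a whole line. The C sketch below counts line loads for a strided walk through an array under a deliberately crude assumption: only the most recently loaded line stays in cache. The function name and the model are hypothetical, not taken from the slides.

```c
#include <stddef.h>

/* Hypothetical model: counts how many times a new cache line must be loaded
   when count elements of elem_size bytes are touched, stride elements
   apart, starting at byte address 0, if only the most recently loaded
   line stays in cache. */
size_t count_line_loads (size_t count, size_t stride,
                         size_t elem_size, size_t line_size)
{
  size_t loads = 0;
  size_t current_line = (size_t)-1;   /* no line loaded yet */
  size_t i;

  for (i = 0; i < count; i++) {
    size_t line = (i * stride * elem_size) / line_size;
    if (line != current_line) {
      loads++;
      current_line = line;
    }
  }
  return loads;
}
```

    Touching 1024 consecutive 4-byte values with 64-byte lines needs 64 line loads in this model, so 15 of every 16 accesses are hits. Touching 1024 values that are 16 elements apart needs 1024 line loads, one per access.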

  • Data Locality Is Empirical: C

    Data locality has been observed empirically in many, many programs.

        void ordered_fill (float* array, int array_length)
        { /* ordered_fill */
          int index;

          for (index = 0; index < array_length; index++) {
            array[index] = index;
          } /* for index */
        } /* ordered_fill */

  • Data Locality Is Empirical: F90

    Data locality has been observed empirically in many, many programs.

        SUBROUTINE ordered_fill (array, array_length)
          IMPLICIT NONE
          INTEGER,INTENT(IN) :: array_length
          REAL,DIMENSION(array_length),INTENT(OUT) :: array
          INTEGER :: index

          DO index = 1, array_length
            array(index) = index
          END DO
        END SUBROUTINE ordered_fill

  • No Locality Example: C

    In principle, you could write a program that exhibited absolutely no data locality at all:

        void random_fill (float* array,
                          int* random_permutation_index,
                          int array_length)
        { /* random_fill */
          int index;

          for (index = 0; index < array_length; index++) {
            array[random_permutation_index[index]] = index;
          } /* for index */
        } /* random_fill */
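    The random_fill routine takes a permutation of the array indices as input, but the slides do not show how that permutation is built. One possible way, offered only as a sketch, is a Fisher-Yates shuffle. The helper name make_random_permutation is hypothetical, and rand() is used for brevity even though its modulo bias makes the shuffle slightly non-uniform.

```c
#include <stdlib.h>

/* Hypothetical helper, not from the slides: fills random_permutation_index
   with the values 0 .. array_length-1 in shuffled order (Fisher-Yates).
   rand() is used for brevity; its modulo bias is ignored here. */
void make_random_permutation (int* random_permutation_index,
                              int array_length)
{
  int index;

  /* Start from the identity permutation. */
  for (index = 0; index < array_length; index++) {
    random_permutation_index[index] = index;
  }

  /* Swap each position with a randomly chosen position at or before it. */
  for (index = array_length - 1; index > 0; index--) {
    int other = rand() % (index + 1);   /* 0 <= other <= index */
    int temp  = random_permutation_index[index];
    random_permutation_index[index] = random_permutation_index[other];
    random_permutation_index[other] = temp;
  }
}
```

    Because every index appears exactly once, random_fill still writes every element of the array; only the order of the writes changes, which is what removes the locality.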

  • No Locality Example: F90

    In principle, you could write a program that exhibited absolutely no data locality at all:

        SUBROUTINE random_fill (array, random_permutation_index, array_length)
          IMPLICIT NONE
          INTEGER,INTENT(IN) :: array_length
          INTEGER,DIMENSION(array_length),INTENT(IN) :: &
         &  random_permutation_index
          REAL,DIMENSION(array_length),INTENT(OUT) :: array
          INTEGER :: index

          DO index = 1, array_length
            array(random_permutation_index(index)) = index
          END DO
        END SUBROUTINE random_fill

  • Permuted vs. Ordered

    In a simple array fill, locality provides a factor of 8 to 20 speedup over a randomly ordered fill on a Pentium4.

    Chart: Permuted vs Ordered. Horizontal axis: array size (log2 bytes). Vertical axis: CPU seconds.

        Size (bytes)   log2(size)   Random   Ordered
        134217728      27           27.25    1.379997
        67108864       26           11.07    0.76
        33554432       25            4.05    0.36
        16777216       24            1.64    0.18
        8388608        23            0.71    0.09
        4194304        22            0.35    0.04
        2097152        21            0.18    0.02
        1048576        20            0.08    0.01

  • Exploiting Data Locality

    If you know that your code is capable of operating with a decent amount of data locality, then you can get speedup by focusing your energy on improving the locality of the code's behavior.

    This will substantially increase your cache reuse.

  • A Sample Application

    Matrix-Matrix Multiply

    Let A, B and C be matrices of sizes nr × nc, nr × nk and nk × nc, respectively.

    The definition of A = B · C is

        \[ a_{r,c} = \sum_{k=1}^{nk} b_{r,k} \, c_{k,c} \]

    for r ∈ {1, ..., nr}, c ∈ {1, ..., nc}.

  • Matrix Multiply w/Initialization

        SUBROUTINE matrix_matrix_mult_by_init (dst, src1, src2, &
         &                                     nr, nc, nq)
          IMPLICIT NONE
          INTEGER,INTENT(IN) :: nr, nc, nq
          REAL,DIMENSION(nr,nc),INTENT(OUT) :: dst
          REAL,DIMENSION(nr,nq),INTENT(IN) :: src1
          REAL,DIMENSION(nq,nc),INTENT(IN) :: src2

          INTEGER :: r, c, q

          DO c = 1, nc
            DO r = 1, nr
              dst(r,c) = 0.0
              DO q = 1, nq
                dst(r,c) = dst(r,c) + src1(r,q) * src2(q,c)
              END DO !! q
            END DO !! r
          END DO !! c
        END SUBROUTINE matrix_matrix_mult_by_init

  • Matrix Multiply w/Initialization

        void matrix_matrix_mult_by_init (
                 float** dst, float** src1, float** src2,
                 int nr, int nc, int nq)
        { /* matrix_matrix_mult_by_init */
          int r, c, q;

          for (r = 0; r < nr; r++) {
            for (c = 0; c < nc; c++) {
              dst[r][c] = 0.0;
              for (q = 0; q < nq; q++) {
                dst[r][c] = dst[r][c] + src1[r][q] * src2[q][c];
              } /* for q */
            } /* for c */
          } /* for r */
        } /* matrix_matrix_mult_by_init */

  • Matrix Multiply Via Intrinsic

        SUBROUTINE matrix_matrix_mult_by_intrinsic ( &
         &           dst, src1, src2, nr, nc, nq)
          IMPLICIT NONE
          INTEGER,INTENT(IN) :: nr, nc, nq
          REAL,DIMENSION(nr,nc),INTENT(OUT) :: dst
          REAL,DIMENSION(nr,nq),INTENT(IN) :: src1
          REAL,DIMENSION(nq,nc),INTENT(IN) :: src2

          dst = MATMUL(src1, src2)
        END SUBROUTINE matrix_matrix_mult_by_intrinsic

  • Matrix Multiply Behavior

    If the matrix is big, then each sweep of a row will clobber nearby values in cache.

  • Performance of Matrix Multiply

    Chart: Matrix-Matrix Multiply. Horizontal axis: total problem size in bytes (nr*nc+nr*nq+nq*nc). Vertical axis: CPU sec.

        Size (bytes)   Naive    Init     Intrinsic
        81920            0.02     0.01     0.01
        196608           0.05     0.05     0.03
        327680           0.1      0.11     0.08
        786432           0.38     0.76     0.28
        1310720          1.78     2.18     0.58
        3145728          8.97     9.59     2.4
        5242880         17.9     20.98     4.72
        12582912        72.8     81.46    18.76
        20971520       156.6    161.44    37.41
        50331648       723.9    690.98   151.55


  • Tiling

    Tile: a small rectangular subdomain of a problem domain. Sometimes called a block or a chunk.

    Tiling: breaking the domain into tiles.

    Tiling strategy: operate on each tile to completion, then move to the next tile.

    Tile size can be set at runtime, according to what's best for the machine that you're running on.
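    As a rough sketch of how the tiling strategy might be applied to the matrix multiply shown earlier, the C routine below works through the problem one tile at a time, with tile sizes passed in at runtime. The routine name, its argument order and the three tile-size parameters are illustrative assumptions, not the author's code.

```c
/* Hypothetical sketch of a tiled matrix multiply. dst is nr x nc, src1 is
   nr x nq, src2 is nq x nc, each stored as an array of row pointers.
   rtile, ctile and qtile are the tile extents and must be positive. */
void matrix_matrix_mult_tiled (
         float** dst, float** src1, float** src2,
         int nr, int nc, int nq,
         int rtile, int ctile, int qtile)
{
  int r, c, q, rstart, cstart, qstart;

  /* Initialize the destination once, before any tile is processed. */
  for (r = 0; r < nr; r++) {
    for (c = 0; c < nc; c++) {
      dst[r][c] = 0.0f;
    }
  }

  for (rstart = 0; rstart < nr; rstart += rtile) {
    int rend = (rstart + rtile < nr) ? rstart + rtile : nr;
    for (cstart = 0; cstart < nc; cstart += ctile) {
      int cend = (cstart + ctile < nc) ? cstart + ctile : nc;
      for (qstart = 0; qstart < nq; qstart += qtile) {
        int qend = (qstart + qtile < nq) ? qstart + qtile : nq;

        /* Finish this tile's contribution before moving to the next. */
        for (r = rstart; r < rend; r++) {
          for (c = cstart; c < cend; c++) {
            for (q = qstart; q < qend; q++) {
              dst[r][c] = dst[r][c] + src1[r][q] * src2[q][c];
            }
          }
        }
      }
    }
  }
}
```

    The idea is to pick tile sizes small enough that the pieces of dst, src1 and src2 touched by one tile fit in cache together, so that values are reused before they are clobbered. Good tile sizes depend on the machine and are best found by experiment.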

    Supe