6

Notes on Homework 1

Notes on Homework 1. 2x2 Matrix Multiply C 00 += A 00 B 00 + A 01 B 10 C 10 += A 10 B 00 + A 11 B 10 C 01 += A 00 B 01 + A 01 B 11 C 11 += A 10 B 01 +

Download PPTX Report

Upload
brook-watts
View
213
Download
0

Embed Size (px)

Citation preview

Page 1: Notes on Homework 1. 2x2 Matrix Multiply C 00 += A 00 B 00 + A 01 B 10 C 10 += A 10 B 00 + A 11 B 10 C 01 += A 00 B 01 + A 01 B 11 C 11 += A 10 B 01 +

Notes on Homework 1

Page 2: Notes on Homework 1. 2x2 Matrix Multiply C 00 += A 00 B 00 + A 01 B 10 C 10 += A 10 B 00 + A 11 B 10 C 01 += A 00 B 01 + A 01 B 11 C 11 += A 10 B 01 +

2x2 Matrix Multiply

C00 += A00B00 + A01B10

C10 += A10B00 + A11B10

C01 += A00B01 + A01B11

C11 += A10B01 + A11B11

Rewrite as SIMD algebra

C00_C10 += A00_A10 * B00_B00C01_C11 += A00_A10 * B01_B01C00_C10 += A01_A11 * B10_B10C01_C11 += A01_A11 * B11_B11

02/11/2009CS267 Lecture 7

Page 3: Notes on Homework 1. 2x2 Matrix Multiply C 00 += A 00 B 00 + A 01 B 10 C 10 += A 10 B 00 + A 11 B 10 C 01 += A 00 B 01 + A 01 B 11 C 11 += A 10 B 01 +

Summary of SSE intrinsics

#include <emmintrin.h>

Vector data type:• __m128d

Load and store operations:• _mm_load_pd• _mm_store_pd• _mm_loadu_pd• _mm_storeu_pd

Load and broadcast across vector• _mm_load1_pd

Arithmetic:• _mm_add_pd• _mm_mul_pd

Page 4: Notes on Homework 1. 2x2 Matrix Multiply C 00 += A 00 B 00 + A 01 B 10 C 10 += A 10 B 00 + A 11 B 10 C 01 += A 00 B 01 + A 01 B 11 C 11 += A 10 B 01 +

Example: multiplying 2x2 matrices

#include <emmintrin.h>c1 = _mm_loadu_pd( C+0*lda ); //load unaligned block in Cc2 = _mm_loadu_pd( C+1*lda );for( int i = 0; i < 2; i++ ){ a = _mm_load_pd( A+i*lda ); //load aligned i-th column of A b1 = _mm_load1_pd( B+i+0*lda ); //load i-th row of B b2 = _mm_load1_pd( B+i+1*lda ); c1=_mm_add_pd( c1, _mm_mul_pd( a, b1 ) ); //rank-1 update c2=_mm_add_pd( c2, _mm_mul_pd( a, b2 ) );}_mm_storeu_pd( C+0*lda, c1 ); //store unaligned block in C_mm_storeu_pd( C+1*lda, c2 );

Page 5: Notes on Homework 1. 2x2 Matrix Multiply C 00 += A 00 B 00 + A 01 B 10 C 10 += A 10 B 00 + A 11 B 10 C 01 += A 00 B 01 + A 01 B 11 C 11 += A 10 B 01 +

General suggestions for optimizations• Changing the order of the loops (i, j, k)

• changes which matrix stays in memory and the type of product• Blocking for multiple levels (registers or SIMD, L1, L2)

• L3 is not important as most matrices will fit and optimizations not likely to bring more benefits

• Unrolling the loops• Copy optimizations

• will be a large benefit if done for sub-matrix that is kept in memory• careful not to overdo (overhead is high for every level)

• Aligning memory and padding• can help with SIMD performance• eliminating special cases and bad performance

• Adding SIMD intrinsics• cannot get over 50% without explicit intrinsics or auto-vectorization

• Tuning parameters• various block sizes, register sizes (perhaps automate)

Page 6: Notes on Homework 1. 2x2 Matrix Multiply C 00 += A 00 B 00 + A 01 B 10 C 10 += A 10 B 00 + A 11 B 10 C 01 += A 00 B 01 + A 01 B 11 C 11 += A 10 B 01 +

Other issues• Checking compilers

• options available PGI, PathScale, GNU, Cray• Cray compiler seems to have an issue with explicit intrinsics

• Using compiler flags• remember not allowed to use MM specific flags (-matmul)

• Checking assembly code for SSE instructions • use the –S flag to see assembly• should have mainly ADDPD & MULPD ops, ADDSD & MULSD are

scalar computations• Remember the write-up

• want to know what optimizations you tried and what worked and failed

• try and have an incremental design and show the performance of multiple iterations

• DUE DATE changed, now due Monday Feb 17th at 11:59pm

Jak przygotować się do transformacji cyfrowej? · 65% Taki odsetek przedsiębiorstw w regionie CEE nie słyszał o Digital Transformation. 01 00 01 00 01 00 01 00 01 00 01 00 01

Jak przygotować się do transformacji cyfrowej? · 65% Taki odsetek przedsiębiorstw w regionie CEE nie słyszał o Digital Transformation. 01 00 01 00 01 00 01 00 01 00 01 00 01

Documents

ΕΤΕΠ 01-01-01-00

ΕΤΕΠ 01-01-01-00

Documents

Land Cruiser - Toyota Belgium€¦ · Land Cruiser kW/pk Nuttige Lading M9717-PRIXB-01 • 01/01/2017 00:00:00 • 31/01/2017 00:00:00 Verantwoordelijke uitgever/Editeur responsable:

Land Cruiser - Toyota Belgium€¦ · Land Cruiser kW/pk Nuttige Lading M9717-PRIXB-01 • 01/01/2017 00:00:00 • 31/01/2017 00:00:00 Verantwoordelijke uitgever/Editeur responsable:

Documents

LUZz 00 LUZ b€¦ · LUZz 00 LUZ b

LUZz 00 LUZ b€¦ · LUZz 00 LUZ b

Documents

B 2017 07 20100204330 21/07/2017 00:00:00 3088 …portal.inen.sld.pe/wp-content/uploads/2018/01/TRANS-III-IV... · b 2017 07 20377339461 21/07/2017 00:00:00 3074 6837 21/07/2017 00:00:00

B 2017 07 20100204330 21/07/2017 00:00:00 3088 …portal.inen.sld.pe/wp-content/uploads/2018/01/TRANS-III-IV... · b 2017 07 20377339461 21/07/2017 00:00:00 3074 6837 21/07/2017 00:00:00

Documents

CentreCOM GS924XL ユーザーマニュアル...01-80-C2-00-00-02 LACP 01-80-C2-00-00-03 802.1X EAP/EAPOL frame 01-80-C2-00-00-10 Bridge Management Group address 01-80-C2-00-00-20

CentreCOM GS924XL ユーザーマニュアル...01-80-C2-00-00-02 LACP 01-80-C2-00-00-03 802.1X EAP/EAPOL frame 01-80-C2-00-00-10 Bridge Management Group address 01-80-C2-00-00-20

Documents

( Espiritismo) - C B - Aula 00 - Apresentacao do Curso Basico de Espiritismo # 01.pptx

( Espiritismo) - C B - Aula 00 - Apresentacao do Curso Basico de Espiritismo # 01.pptx

Documents

MK22 - Benz Häcksler · MK22 01 00 02 00 03 00 03 01 04 01 05 00 04 03 05 01 05 02 05 03 06 00 07 00 07 01 08 00 09 00 09 01 10 00 10 01 13 00 13 01 16 00 16 01 16 02 16 03 16 04

MK22 - Benz Häcksler · MK22 01 00 02 00 03 00 03 01 04 01 05 00 04 03 05 01 05 02 05 03 06 00 07 00 07 01 08 00 09 00 09 01 10 00 10 01 13 00 13 01 16 00 16 01 16 02 16 03 16 04

Documents

01 2:00 4/15 1 1 2:00 02 4/15 (B) 075F Takara standard

01 2:00 4/15 1 1 2:00 02 4/15 (B) 075F Takara standard

Documents

11x17 Cantherm template · Thermal Cut-Off 250 Vac MTKF 1A MTTF 1/2A MTYF 4/5A ceramic housing size B (00): 38 size B (01): 68 size B (00): 38 size B (01): 68 size B (00): 38

11x17 Cantherm template · Thermal Cut-Off 250 Vac MTKF 1A MTTF 1/2A MTYF 4/5A ceramic housing size B (00): 38 size B (01): 68 size B (00): 38 size B (01): 68 size B (00): 38

Documents

H:Projekte083-Borchen -00 1. Änd. B-Plan Nr. 21 ... · Title: H:Projekte083-Borchen -00 1. Änd. B-Plan Nr. 21, Hilgenthal!98 Bearbeitung083-031-00-00-B3-01-00-00 B-Plan (1) Author:

H:Projekte083-Borchen -00 1. Änd. B-Plan Nr. 21 ... · Title: H:Projekte083-Borchen -00 1. Änd. B-Plan Nr. 21, Hilgenthal!98 Bearbeitung083-031-00-00-B3-01-00-00 B-Plan (1) Author:

Documents

B.Tech Syllabus Copy B Tech... · Web viewStatistical Process Control in Spinning 03 00 02 03 01 00 04 TT409B Advances in Finishing 03 00 02 03 01 00 04 TT409C Merchandizing & Supply

B.Tech Syllabus Copy B Tech... · Web viewStatistical Process Control in Spinning 03 00 02 03 01 00 04 TT409B Advances in Finishing 03 00 02 03 01 00 04 TT409C Merchandizing & Supply

Documents

OFFSET DISK HARROW - Marchesancatalogo.marchesan.com.br/PDF/0501091466I.pdf · 028 0511065629 offset bar 01 00 00 00 00 029 0511064166 offset bar 01 01 01 01 01 030 0531018099 37,50

OFFSET DISK HARROW - Marchesancatalogo.marchesan.com.br/PDF/0501091466I.pdf · 028 0511065629 offset bar 01 00 00 00 00 029 0511064166 offset bar 01 01 01 01 01 030 0531018099 37,50

Documents

900-0197-01-00 REV B - OutBack Power Inc

900-0197-01-00 REV B - OutBack Power Inc

Documents

HORARIOS DE TUTORIA PRESENCIAL III … · centro universitario 05 san carlos ... b 01 cristians 10:00 a 12:00 estrada alvarado tutr q ... b arturo01 d 09:00 a 12:00 baltodano tutr

HORARIOS DE TUTORIA PRESENCIAL III … · centro universitario 05 san carlos ... b 01 cristians 10:00 a 12:00 estrada alvarado tutr q ... b arturo01 d 09:00 a 12:00 baltodano tutr

Documents

INDEX []00-23 Diffuse B-cell lymphoma (Gastric large cell lymphoma, B-cell type) – 84% 00-34 Eosinophilic gastroenteritis – 99% 00-40 Lymphocytic colitis – 80% G.I. 01-23 Basaloid

INDEX []00-23 Diffuse B-cell lymphoma (Gastric large cell lymphoma, B-cell type) – 84% 00-34 Eosinophilic gastroenteritis – 99% 00-40 Lymphocytic colitis – 80% G.I. 01-23 Basaloid

Documents

COMPOSICION MODULAR ESCALERA EM - · PDF filedin7168 en esla c (1 : 5) (2) e09-00-00-26 (4) n01-08-20-01 (4) n02-08-00-01 b (1 : 5) (2) e06-03-00-00 6063 t6 5,7 a (4) e09-00-00-26

COMPOSICION MODULAR ESCALERA EM - · PDF filedin7168 en esla c (1 : 5) (2) e09-00-00-26 (4) n01-08-20-01 (4) n02-08-00-01 b (1 : 5) (2) e06-03-00-00 6063 t6 5,7 a (4) e09-00-00-26

Documents

o B s s * àíé yr yr O OO r-H I 01 Cñ —s 00 1+1 oo o 9 …...o B s s * àíé yr yr O OO r-H I 01 Cñ —s 00 1+1 oo o 9-1} 00 O CD oo tJ1 00 00 oo 00 00 oo O O 00 oo 00 00 oo

o B s s * àíé yr yr O OO r-H I 01 Cñ —s 00 1+1 oo o 9 …...o B s s * àíé yr yr O OO r-H I 01 Cñ —s 00 1+1 oo o 9-1} 00 O CD oo tJ1 00 00 oo 00 00 oo O O 00 oo 00 00 oo

Documents

Section 00 01 01 PROJECT MANUAL & TITLE PAGESection Title Page 00 01 01 Title Page 00 01 01: 1 00 01 10 Table of Contents 00 01 10: 1-4 00 01 20 List of Drawing Sheets 00 01 20: 1

Section 00 01 01 PROJECT MANUAL & TITLE PAGESection Title Page 00 01 01 Title Page 00 01 01: 1 00 01 10 Table of Contents 00 01 10: 1-4 00 01 20 List of Drawing Sheets 00 01 20: 1

Documents

001-340...CW202 DSP Section Rank 81 01 01 01 01 10 01 10 10 01 RESULTS OCT 1999 1999 — (5) 00 00 oo 00 00 00 00 00 00 00 00 00 00 00 00 oo 00 00 00 00 00 oo 00 00 00 00 00 oo 00

001-340...CW202 DSP Section Rank 81 01 01 01 01 10 01 10 10 01 RESULTS OCT 1999 1999 — (5) 00 00 oo 00 00 00 00 00 00 00 00 00 00 00 00 oo 00 00 00 00 00 oo 00 00 00 00 00 oo 00

Documents

Post-Hoc Analysis with TTK - GitHub Pages · 2021. 7. 29. · A 00.vtu A 01.vtu B 50.vtu B 51.vtu B 54.vtu (a)File System Sim, Time, FILE A, 00, data/A 00.vtu A, 01, data/A 01.vtu

Post-Hoc Analysis with TTK - GitHub Pages · 2021. 7. 29. · A 00.vtu A 01.vtu B 50.vtu B 51.vtu B 54.vtu (a)File System Sim, Time, FILE A, 00, data/A 00.vtu A, 01, data/A 01.vtu

Documents

Outline: LHCb and LCG-AA Ph.Charpentier 01 10 100 111 01 1 01 01 00 01 01 010 10 11 01 00 B 00 l e

Outline: LHCb and LCG-AA Ph.Charpentier 01 10 100 111 01 1 01 01 00 01 01 010 10 11 01 00 B 00 l e

Documents

900-0093-01-00 REV B FLEXnet DC

900-0093-01-00 REV B FLEXnet DC

Documents

2016 - DGO · CAPÍ- GRU- ARTI- SUB- RU-TULO PO GO ARTI- BRI-GO CA RECEITAS CORRENTES 01 00 00 00 00 IMPOSTOS DIRETOS 01 01 00 00 00 Sobre o Rendimento 01 01 01 00 00 Imposto sobre

2016 - DGO · CAPÍ- GRU- ARTI- SUB- RU-TULO PO GO ARTI- BRI-GO CA RECEITAS CORRENTES 01 00 00 00 00 IMPOSTOS DIRETOS 01 01 00 00 00 Sobre o Rendimento 01 01 01 00 00 Imposto sobre

Documents

Force Wise/State Wise list of Medal awardees to the Police ...45 bpr&d 00 00 00 01 46 ncrb 00 00 00 01 47 nia 00 00 01 02 48 svp npa 00 00 00 02 49 nepa 00 00 00 01 50 ncb 00 00 00

Force Wise/State Wise list of Medal awardees to the Police ...45 bpr&d 00 00 00 01 46 ncrb 00 00 00 01 47 nia 00 00 01 02 48 svp npa 00 00 00 02 49 nepa 00 00 00 01 50 ncb 00 00 00

Documents

EDITAL Nº 03/2015- RETIFICADO€¦ · Ana Dorziat Barbosa de Melo 01 01 02 Edna Gusmão de G. Brennand 00 00 00 Fernando Cézar B. de Andrade 02 01 03 Maria Eulina Pessoa de Carvalho

EDITAL Nº 03/2015- RETIFICADO€¦ · Ana Dorziat Barbosa de Melo 01 01 02 Edna Gusmão de G. Brennand 00 00 00 Fernando Cézar B. de Andrade 02 01 03 Maria Eulina Pessoa de Carvalho

Documents

900-0198-01-00 REV B - OutBack Power Inc

900-0198-01-00 REV B - OutBack Power Inc

Documents

99127 00:00 5201 11890 + 11889 + 11792 + 11500 11191 + … · 2019. 11. 29. · STB NK RHSE RHS RHGB RH 23:50-00:00 00:10 00:20 00:30 00:40 00:50 01:00 01:10 01:20 01:30 01:40 01:50

99127 00:00 5201 11890 + 11889 + 11792 + 11500 11191 + … · 2019. 11. 29. · STB NK RHSE RHS RHGB RH 23:50-00:00 00:10 00:20 00:30 00:40 00:50 01:00 01:10 01:20 01:30 01:40 01:50

Documents

€¦ · b 00 CO b b o b o 00 00 o co 00 b 00 tso co o o 00 oo 00 00 00 oo t. txo ... O O O O O O O o z Cho o o o O O X p o O N 00 00 00 00 00 00 t. 00 txo 00

€¦ · b 00 CO b b o b o 00 00 o co 00 b 00 tso co o o 00 oo 00 00 00 oo t. txo ... O O O O O O O o z Cho o o o O O X p o O N 00 00 00 00 00 00 t. 00 txo 00

Documents

Overall Half Marathon - MILO Philippines · PDF file11 20459 Roy DACUTANAN 01:29:17 01:29:13 00:16:18 00:53:31 M B:35 ... 45 20317 Roberto JACINTO 01:42:55 01:42:46 00:18:27 01:00:07

Overall Half Marathon - MILO Philippines · PDF file11 20459 Roy DACUTANAN 01:29:17 01:29:13 00:16:18 00:53:31 M B:35 ... 45 20317 Roberto JACINTO 01:42:55 01:42:46 00:18:27 01:00:07

Documents