Vectors with Values on the JVM Razvan Lupusoru – Intel Paul Sandoz – Oracle @PaulSandoz October 5, 2017

Vectors with Values on the JVM - Oracle, cr.openjdk.java.net/.../j1-2017-Vector-API-CON4826.pdf


Vectors with Values on the JVM

Razvan  Lupusoru  –  Intel    

Paul  Sandoz  –  Oracle  @PaulSandoz  October  5,  2017  


Intel Legal Disclaimer & Optimization Notice

• INFORMATION IN THIS DOCUMENT IS PROVIDED “AS IS”. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO THIS INFORMATION INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.

• Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.

• Copyright © 2016, Intel Corporation. All rights reserved. Intel, Pentium, Xeon, Xeon Phi, Core, VTune, Cilk, and the Intel logo are trademarks of Intel Corporation in the U.S. and other countries.

Optimization Notice

Intel’s compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

Notice revision #20110804


Oracle Safe Harbor Statement

The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.


Java first, Java always, Java for ML


Overview

• Explain SIMD and why it is useful
• Introduce the Vector API and some examples
• Deep dive into how the Vector API is optimized on Intel CPUs


Let’s talk SIMD (by John Rose) (the fight for love and glory!)

It’s still the same old story,
Though now it's multi-core-y;
The data lane-wise fly.
The SIMD basics never lie,
(The Monoid functor maps “APPLY”)
As time goes by.


What is SIMD?

• Single Instruction Multiple Data
• With one “instruction” operate on more stuff


Scalar addition

static void scalarAdd(int[] a, int[] b, int[] r) {
    for (int i = 0; i < a.length; i++) {
        r[i] = a[i] + b[i];
    }
}


Unrolled scalar addition

static void unrolledScalarAdd(int[] a, int[] b, int[] r) {
    int i = 0;
    int lanes = 4;
    // Main loop: four additions per iteration over the largest multiple of `lanes`.
    for (; i < a.length - a.length % lanes; i += lanes) {
        r[i + 0] = a[i + 0] + b[i + 0];
        r[i + 1] = a[i + 1] + b[i + 1];
        r[i + 2] = a[i + 2] + b[i + 2];
        r[i + 3] = a[i + 3] + b[i + 3];
    }
    // Tail: the remaining a.length % lanes elements.
    for (; i < a.length; i++) {
        r[i] = a[i] + b[i];
    }
}


SIMD addition

static void simdAdd(int[] a, int[] b, int[] r) {
    int i = 0;
    int lanes = 4;
    for (; i < a.length - a.length % lanes; i += lanes) {
        // Conceptually a single "=" and a single "+" spanning all four lanes.
        r[i + 0] = a[i + 0] + b[i + 0];
        r[i + 1] = a[i + 1] + b[i + 1];
        r[i + 2] = a[i + 2] + b[i + 2];
        r[i + 3] = a[i + 3] + b[i + 3];
    }
    // Tail: the remaining a.length % lanes elements.
    for (; i < a.length; i++) {
        r[i] = a[i] + b[i];
    }
}


SIMD addition with the Vector API

static void vectorAdd(int[] a, int[] b, int[] r) {
    int i = 0;
    // Lane count comes from the species (8 ints for a 256-bit shape).
    int lanes = INT_256_SPECIES.length();
    for (; i < a.length - a.length % lanes; i += lanes) {
        IntVector<…> av = INT_256_SPECIES.fromArray(a, i);
        IntVector<…> bv = INT_256_SPECIES.fromArray(b, i);
        av.add(bv).intoArray(r, i);
    }
    // Scalar tail: the remaining a.length % lanes elements.
    for (; i < a.length; i++) {
        r[i] = a[i] + b[i];
    }
}


SIMD masking the tail with the Vector API

static void vectorAdd(int[] a, int[] b, int[] r) {
    int i = 0;
    // Lane count comes from the species (8 ints for a 256-bit shape).
    int lanes = INT_256_SPECIES.length();
    for (; i < a.length - a.length % lanes; i += lanes) {
        IntVector<…> av = INT_256_SPECIES.fromArray(a, i);
        IntVector<…> bv = INT_256_SPECIES.fromArray(b, i);
        av.add(bv).intoArray(r, i);
    }
    if (i < a.length) {
        // Masked tail: one vector operation covers the remaining a.length % lanes elements.
        Vector.Mask<…> m = tailMask(a.length, lanes);
        IntVector<…> av = INT_256_SPECIES.fromArray(a, i, m);
        IntVector<…> bv = INT_256_SPECIES.fromArray(b, i, m);
        av.add(bv).intoArray(r, i, m);
    }
}
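The masked tail can be emulated in plain Java to see which lanes take part. This is only a scalar sketch: a boolean[] stands in for the mask, the lane count of 4 is an assumption, and tailMask here is a hypothetical helper written for illustration, not the prototype's implementation.

```java
import java.util.Arrays;

final class MaskedTailSketch {
    static final int LANES = 4; // assumed lane count for this sketch

    // Hypothetical helper: lane l is active only if it maps to one of the
    // leftover (length % lanes) elements.
    static boolean[] tailMask(int length, int lanes) {
        boolean[] mask = new boolean[lanes];
        for (int l = 0; l < length % lanes; l++) {
            mask[l] = true;
        }
        return mask;
    }

    static void add(int[] a, int[] b, int[] r) {
        int i = 0;
        int bound = a.length - a.length % LANES;
        for (; i < bound; i += LANES) {
            // Stands in for one full-width vector add.
            for (int l = 0; l < LANES; l++) {
                r[i + l] = a[i + l] + b[i + l];
            }
        }
        if (i < a.length) {
            boolean[] m = tailMask(a.length, LANES);
            // Stands in for one masked vector add: inactive lanes touch no memory.
            for (int l = 0; l < LANES; l++) {
                if (m[l]) {
                    r[i + l] = a[i + l] + b[i + l];
                }
            }
        }
    }

    public static void main(String[] args) {
        int[] a = {1, 2, 3, 4, 5, 6};
        int[] b = {10, 20, 30, 40, 50, 60};
        int[] r = new int[a.length];
        add(a, b, r);
        System.out.println(Arrays.toString(r));
    }
}
```

With six elements and four lanes, the main loop handles indices 0 to 3 and the masked step handles indices 4 and 5 with the last two lanes switched off.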


SIMD specific use cases

• Image manipulation
• Linear Algebra (BLAS)
• Machine/Deep learning
  • (Matrix multiplication, both sparse and deep)
• Cryptographic algorithms
• Financial applications
• Numerous use-cases within the JDK


Goal of Vector API

• Express simple and complex SIMD-based computations with clear code

• Good reliable performance maximizing use of a processor (Intel, ARM, GPU?)

• Graceful degradation when functionality not available


What about auto-vectorization?

• The runtime compiler can convert some scalar loops into vectorized loops (superword)

• The set of loops recognized is limited and can be fragile
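To make the limitation concrete, here are two loops of the kind usually used to illustrate it. This is a sketch only: whether a given JVM build actually vectorizes the first loop is not guaranteed, and the class and method names are made up for illustration.

```java
import java.util.Arrays;

final class SuperwordCandidates {
    // Every iteration is independent of the others, which is the shape an
    // auto-vectorizer typically looks for.
    static void add(float[] a, float[] b, float[] c) {
        for (int i = 0; i < c.length; i++) {
            c[i] = a[i] + b[i];
        }
    }

    // Each iteration reads the result of the previous one (a loop-carried
    // dependence), so adjacent elements cannot simply be computed as lanes.
    static void prefixSum(float[] a) {
        for (int i = 1; i < a.length; i++) {
            a[i] += a[i - 1];
        }
    }

    public static void main(String[] args) {
        float[] c = new float[4];
        add(new float[]{1, 2, 3, 4}, new float[]{4, 3, 2, 1}, c);
        System.out.println(Arrays.toString(c));

        float[] p = {1, 2, 3, 4};
        prefixSum(p);
        System.out.println(Arrays.toString(p));
    }
}
```

An explicit API removes the guesswork for the first kind of loop: the programmer states the vector operation instead of hoping the pattern is recognized.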


Vector API overview

public interface Vector<E, S extends …> {
    …
    Vector<E, S> add(Vector<E, S> o);
    Vector<E, S> add(Vector<E, S> o, Mask<E, S> m);
}

• A Vector has two type variables
  • A scalar type, E, and a shape, S
• A shape defines how many scalars are packed together (# lanes)
• Vectors of the same shape can be directly operated on
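The two add overloads can be pictured with a tiny fixed-shape stand-in. This is a sketch under assumptions: Int4 is a made-up class with four int lanes, a boolean[] replaces Mask, and lanes switched off in the mask are assumed to keep the receiver's value (the slides do not spell out that detail).

```java
import java.util.Arrays;

final class Int4 {
    static final int LANES = 4; // the "shape": four int lanes
    private final int[] lanes;

    Int4(int... values) {
        if (values.length != LANES) {
            throw new IllegalArgumentException("expected " + LANES + " lanes");
        }
        this.lanes = values.clone();
    }

    // Lane-wise add of two vectors with the same shape.
    Int4 add(Int4 o) {
        int[] r = new int[LANES];
        for (int l = 0; l < LANES; l++) {
            r[l] = lanes[l] + o.lanes[l];
        }
        return new Int4(r);
    }

    // Masked add. Assumption: inactive lanes keep this vector's value.
    Int4 add(Int4 o, boolean[] mask) {
        int[] r = new int[LANES];
        for (int l = 0; l < LANES; l++) {
            r[l] = mask[l] ? lanes[l] + o.lanes[l] : lanes[l];
        }
        return new Int4(r);
    }

    int lane(int l) {
        return lanes[l];
    }

    @Override
    public String toString() {
        return Arrays.toString(lanes);
    }

    public static void main(String[] args) {
        Int4 a = new Int4(1, 2, 3, 4);
        Int4 b = new Int4(10, 10, 10, 10);
        System.out.println(a.add(b));
        System.out.println(a.add(b, new boolean[]{true, false, true, false}));
    }
}
```

Because the lane count is fixed by the class, two Int4 values can always be combined directly, which is the point of carrying the shape in the type.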


Vector API overview

public interface Species<E, S extends …> {
    …
    Vector<E, S> zero();
    Vector<E, S> fromByteArray(byte[] bs, int ix);
}

• A Vector is instantiated from a species (factory)
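The factory role of a species can be sketched in plain Java. The class below is hypothetical: it represents a vector as an int[] and only mirrors the method names zero, broadcast, fromArray and length that appear in the examples of this talk.

```java
import java.util.Arrays;
import java.util.Objects;

final class IntSpeciesSketch {
    private final int lanes;

    // The species fixes the shape: 256 bits of 32-bit ints gives 8 lanes.
    IntSpeciesSketch(int vectorBits) {
        this.lanes = vectorBits / Integer.SIZE;
    }

    int length() {
        return lanes;
    }

    // All lanes zero.
    int[] zero() {
        return new int[lanes];
    }

    // The same scalar in every lane.
    int[] broadcast(int value) {
        int[] v = new int[lanes];
        Arrays.fill(v, value);
        return v;
    }

    // Loads length() consecutive elements starting at index ix.
    int[] fromArray(int[] a, int ix) {
        Objects.checkFromIndexSize(ix, lanes, a.length);
        return Arrays.copyOfRange(a, ix, ix + lanes);
    }

    public static void main(String[] args) {
        IntSpeciesSketch species = new IntSpeciesSketch(128);
        System.out.println(species.length());
        System.out.println(Arrays.toString(species.broadcast(7)));
        System.out.println(Arrays.toString(species.fromArray(new int[]{1, 2, 3, 4, 5, 6}, 2)));
    }
}
```

Every vector created through the same species has the same element type and lane count, so code written against one species stays shape-consistent.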


Vector API use cases in the JDK

• Generating hash codes
• Array mismatch
• Sorting (counting ascending/descending runs)
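As an illustration of why hash codes are a candidate, the polynomial hash used by Arrays.hashCode can be regrouped so that four multiplications per step are independent of each other. The sketch below is scalar Java, not JDK code; the class name and the unroll factor of 4 are choices made for this example.

```java
import java.util.Arrays;

final class UnrolledHash {
    // Computes the same value as Arrays.hashCode(int[]) for a non-null array.
    static int hash(int[] a) {
        final int p1 = 31;
        final int p2 = 31 * 31;
        final int p3 = 31 * 31 * 31;
        final int p4 = 31 * 31 * 31 * 31;
        int h = 1; // same seed as Arrays.hashCode
        int i = 0;
        for (; i + 4 <= a.length; i += 4) {
            // Four steps of h = 31 * h + a[i] expanded into one expression.
            // The four element multiplies do not depend on each other, so they
            // could be computed as lanes and then summed.
            h = p4 * h + p3 * a[i] + p2 * a[i + 1] + p1 * a[i + 2] + a[i + 3];
        }
        for (; i < a.length; i++) {
            h = 31 * h + a[i];
        }
        return h;
    }

    public static void main(String[] args) {
        int[] data = {3, 1, 4, 1, 5, 9, 2, 6, 5, 3};
        System.out.println(hash(data) == Arrays.hashCode(data));
    }
}
```

The regrouping is exact because int arithmetic wraps consistently, so the result matches the sequential formula bit for bit.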


Improvements: or The Monoid functor maps “APPLY”!

DoubleVector<S> map(StrictMath::log1p)

• Use higher order functions
  • Requires “lambda cracking” to extract out the scalar operation and apply the SIMD equivalent instruction(s)
• Many operations can be folded into a general map operation
  • This will reduce the number of explicit operations
  • Including scalar to vector conversions, such as broadcast
• Vector wants to be a value type whose element type is also a value
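The semantics of such a map are easy to state in scalar Java. The sketch below is an assumption-laden stand-in: a double[] plays the vector, and the operator is applied one lane at a time, whereas the proposal is for the compiler to recognize the scalar operation and emit the SIMD equivalent.

```java
import java.util.Arrays;
import java.util.function.DoubleUnaryOperator;

final class MapSketch {
    // Applies the scalar operation to every lane and returns a new "vector".
    static double[] map(double[] lanes, DoubleUnaryOperator op) {
        double[] r = new double[lanes.length];
        for (int l = 0; l < lanes.length; l++) {
            r[l] = op.applyAsDouble(lanes[l]);
        }
        return r;
    }

    public static void main(String[] args) {
        double[] v = {0.0, 1.0, 2.0, 3.0};
        // The example from the slide: a method reference as the lane operation.
        System.out.println(Arrays.toString(map(v, StrictMath::log1p)));
        // Broadcast folded into map: every lane ignores its input.
        System.out.println(Arrays.toString(map(v, x -> 42.0)));
    }
}
```

Seen this way, add, mul, broadcast and similar operations are all instances of one general lane-wise mapping.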


Hardware vs Java Mismatch

Size (bits)     8     16     32    64    128    256    512    …
X86 Register    AL    AX     EAX   RAX   XMM0   YMM0   ZMM0   …
Java Type       byte  short  int   long  …


Hardware vs Java Mismatch

Size (bits)     8     16     32    64    128              256              512              …
X86 Register    AL    AX     EAX   RAX   XMM0             YMM0             ZMM0             …
Java Type       byte  short  int   long  Int128Vector     Int256Vector     Int512Vector     …
                                         Float128Vector   Float256Vector   Float512Vector
                                         Double128Vector  Double256Vector  Double512Vector
                                         …                …                …
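The number of lanes behind each of these type names follows from simple division. A minimal sketch, with a made-up helper name:

```java
final class LaneCount {
    // Lanes = vector width in bits divided by element width in bits.
    static int lanes(int vectorBits, int elementBits) {
        return vectorBits / elementBits;
    }

    public static void main(String[] args) {
        int[] widths = {128, 256, 512};
        for (int bits : widths) {
            System.out.println(bits + " bits: "
                    + lanes(bits, Integer.SIZE) + " int, "
                    + lanes(bits, Float.SIZE) + " float, "
                    + lanes(bits, Double.SIZE) + " double lanes");
        }
    }
}
```

So an Int128Vector holds 4 ints, a Float256Vector holds 8 floats, and a Double512Vector holds 8 doubles.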


Value Types Primer

• Value Type - user-defined primitive type
  • A value type holds its own data in its allocated memory.
• Vector API classes can all be value types:
  • The interest is in the VALUE they hold, and not the container.


Int128Vector

Value Type Memory Layout:
  [ Value Header (optional) | 128 bit Value ]

Vector API Current Memory Layout:
  [ Object Header | Array Field ]
  [ Object Header | Array Length | int_0 | int_1 | int_2 | int_3 ]


So why not use value types?

• Ideally, we should be using value types.
• But…
  1. Value Type support is in Valhalla. Vector API work is in Panama.
  2. Before Value Types, Minimal Value Types are coming:
     • Supported at VM level, language level changes are further out.
     • To use them, Method Handles and combinators are used to refer to value types and pass them around.
     • Very verbose language is needed to construct and put together:
       • Vladimir Ivanov (Oracle) coined the term Vector Pain-gramming when referring to verbosity needed to construct simple vector algorithms.
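To give a feel for the verbosity, here is "x + 1" written once directly and once by assembling method handles with a combinator. This uses only the standard java.lang.invoke API on plain ints; it is not the Minimal Value Types API, which this sketch does not attempt to reproduce.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

final class CombinatorVerbosity {
    static int addOneDirect(int x) {
        return x + 1;
    }

    static int addOneViaHandles(int x) {
        try {
            // Look up Integer.sum(int, int) as a method handle.
            MethodHandle sum = MethodHandles.lookup().findStatic(
                    Integer.class, "sum",
                    MethodType.methodType(int.class, int.class, int.class));
            // Combinator: bind the second argument to the constant 1.
            MethodHandle addOne = MethodHandles.insertArguments(sum, 1, 1);
            return (int) addOne.invokeExact(x);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(addOneDirect(41));
        System.out.println(addOneViaHandles(41));
    }
}
```

Both methods compute the same result, but the second needs a lookup, a type descriptor and a combinator for a single addition, which hints at how a whole vector algorithm would read in that style.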

 


Absence of Value Types

• Value type support is the real solution for ensuring performance of Vector API.
• But, we have another trick in meantime:
  • Use Hotspot C2 type system to map Vector API classes to appropriate registers to hold their values.

Int128Vector  ->  TypeVect <int, 4>  ->  xmm


Intrinsification of Vector API

• Intrinsification: Name of compiler technique used to replace a method’s implementation with a faster, hand-optimized version.

• Vector API Intrinsification: Used to convert calls to the Vector API to intermediate representation in compiler that represents desired operation semantics.

vec3 = vec1.add(vec2);  ->  AddVFNode


for (int i = 0; i + spec.length() < a.length; i += spec.length()) {
    FloatVector<S> vec1 = spec.fromArray(a, i);
    FloatVector<S> vec2 = spec.fromArray(b, i);
    vec1.add(vec2).intoArray(c, i);
}

for (int i = 0; i < a.length; i++) {
    c[i] = a[i] + b[i];
}


for (int i = 0; i + spec.length() < a.length; i += spec.length()) {
    FloatVector<S> vec1 = spec.fromArray(a, i);
    FloatVector<S> vec2 = spec.fromArray(b, i);
    vec1.add(vec2).intoArray(c, i);
}

Compiler nodes shown for this loop:
  ConINode
  LoadVectorNode <float, 8>
  LoadVectorNode <float, 8>
  AddVFNode <float, 8>
  StoreVectorNode


LoadVectorNode (float, 8)    vmovdqu 0x10(%r11,%rbx,4),%ymm0
LoadVectorNode (float, 8)    vmovdqu 0x10(%r10,%rbx,4),%ymm1
AddVFNode (float, 8)         vaddps %ymm1,%ymm0,%ymm0
StoreVectorNode              vmovdqu %ymm0,0x10(%r8,%rbx,4)


Actual code generated

vmovdqu 0x10(%r11,%rbx,4),%ymm0
vmovdqu 0x10(%r10,%rbx,4),%ymm1
vaddps %ymm1,%ymm0,%ymm0
vmovdqu %ymm0,0x10(%r8,%rbx,4)
vmovdqu 0x30(%r11,%rbx,4),%ymm0
vmovdqu 0x30(%r10,%rbx,4),%ymm1
vaddps %ymm1,%ymm0,%ymm0
vmovdqu %ymm0,0x30(%r8,%rbx,4)
vmovdqu 0x50(%r11,%rbx,4),%ymm0
vmovdqu 0x50(%r10,%rbx,4),%ymm1
vaddps %ymm1,%ymm0,%ymm0
vmovdqu %ymm0,0x50(%r8,%rbx,4)
vmovdqu 0x70(%r11,%rbx,4),%ymm0
vmovdqu 0x70(%r10,%rbx,4),%ymm1
vaddps %ymm1,%ymm0,%ymm0
vmovdqu %ymm0,0x70(%r8,%rbx,4)
vmovdqu 0x90(%r11,%rbx,4),%ymm0
vmovdqu 0x90(%r10,%rbx,4),%ymm1
vaddps %ymm1,%ymm0,%ymm0
vmovdqu %ymm0,0x90(%r8,%rbx,4)
vmovdqu 0xb0(%r11,%rbx,4),%ymm0
vmovdqu 0xb0(%r10,%rbx,4),%ymm1
vaddps %ymm1,%ymm0,%ymm0
vmovdqu %ymm0,0xb0(%r8,%rbx,4)
vmovdqu 0xd0(%r11,%rbx,4),%ymm0
vmovdqu 0xd0(%r10,%rbx,4),%ymm1
vaddps %ymm1,%ymm0,%ymm0
vmovdqu %ymm0,0xd0(%r8,%rbx,4)
vmovdqu 0xf0(%r11,%rbx,4),%ymm0
vmovdqu 0xf0(%r10,%rbx,4),%ymm1
vaddps %ymm1,%ymm0,%ymm0
vmovdqu %ymm0,0xf0(%r8,%rbx,4)
add $0x40,%ebx
cmp %edi,%ebx
jl 0x00007f9f6fab4ab0

Key Takeaways:
• Super unrolled 8 times
• No safepoints
• No boxing/unboxing overheads
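The numbers in the listing can be checked with a little arithmetic. In this sketch the names are made up, and reading 0x10 as the offset of the first array element is an assumption based on the first displacement in the listing.

```java
final class UnrollArithmetic {
    static final int ARRAY_BASE = 0x10; // assumption: offset of element 0 within the array object
    static final int LANES = 8;         // 256-bit ymm register / 32-bit float
    static final int UNROLL = 8;        // copies of the load/load/add/store group

    // Byte displacement used by the k-th unrolled copy.
    static int displacement(int k) {
        return ARRAY_BASE + k * LANES * Float.BYTES;
    }

    // Elements consumed per loop iteration, i.e. the increment of the index register.
    static int elementsPerIteration() {
        return LANES * UNROLL;
    }

    public static void main(String[] args) {
        for (int k = 0; k < UNROLL; k++) {
            System.out.println("copy " + k + ": 0x" + Integer.toHexString(displacement(k)));
        }
        System.out.println("index step: 0x" + Integer.toHexString(elementsPerIteration()));
    }
}
```

Each copy advances by 8 floats of 4 bytes (0x20), giving the displacements 0x10 through 0xf0, and 8 copies of 8 lanes account for the add $0x40 on the index register.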


Why Intrinsification?

Mature  support  on  compiler  side  and  no  new  technology  to  introduce.  

Reduced  dependence  on  value  types.  

Thorough  assembler  support.  

Faster  TTM  for  Vector  API  with  the  promise  of  performance.  

Takes advantage of existing compiler optimizations like unrolling and scheduling.


Goals with Intrinsification

• Ability to access SIMD from native architecture:
  • YES - Intrinsification converts API calls to representation that maps to native vector instructions.
• Performance:
  • YES - Vectorized code gets generated.
• Graceful degradation:
  • YES (upcoming) - For operations not supported on a particular architecture, boxing/unboxing will occur just for that operation. Also, can downsize vectors to fit in native architecture size.
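Downsizing can be pictured as carrying out one wide logical operation in several native-width pieces. The slides do not describe how the compiler does this, so the following is only a scalar illustration with hypothetical names and parameters.

```java
import java.util.Arrays;

final class DownsizeSketch {
    // Adds `logicalLanes` elements starting at `offset`, using chunks of at
    // most `nativeLanes` elements, as if a wide vector were split up.
    static void add(float[] a, float[] b, float[] r,
                    int offset, int logicalLanes, int nativeLanes) {
        for (int start = 0; start < logicalLanes; start += nativeLanes) {
            int width = Math.min(nativeLanes, logicalLanes - start);
            // Stands in for one native-width vector add.
            for (int l = 0; l < width; l++) {
                int idx = offset + start + l;
                r[idx] = a[idx] + b[idx];
            }
        }
    }

    public static void main(String[] args) {
        float[] a = new float[16];
        float[] b = new float[16];
        float[] r = new float[16];
        Arrays.fill(a, 1f);
        Arrays.fill(b, 2f);
        // A 512-bit float vector (16 lanes) executed as two 256-bit halves (8 lanes).
        add(a, b, r, 0, 16, 8);
        System.out.println(Arrays.toString(r));
    }
}
```

The result is the same as a single 16-lane add; only the number of native operations changes.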


Challenges with Lack of Value Type Support

Control flow merges for objects limit escape analysis

    IntVector<S> accum = spec.zero();
    for (...) {
        accum = accum.add(spec.broadcast(1));
    }

• Synthetically insert a “VectorUnbox” to transfer between object and value.
• Expand VectorUnbox to Vector PHI node if both inputs are vector values.

Escaping Vector objects need boxing.

    return spec.fromArray(a, i);

• Generate allocator to create an empty Vector object.
• Fill field of Vector object with value and return newly created object.

Method calls that pass Vector instances around need container not just value.

    IntVector<S> val = spec.broadcast(42);
    compute(val);

• Create Vector object and fill with value just as in the return case.


Image Processing Application: Sepia Filter

• Filter applies Sepia toning to an image.

for (int i = 0; i < width * height; i++) {
    magnitudeR[i] = 0.393f * redFlat[i] + 0.769f * greenFlat[i] + 0.189f * blueFlat[i];
    magnitudeG[i] = 0.349f * redFlat[i] + 0.686f * greenFlat[i] + 0.168f * blueFlat[i];
    magnitudeB[i] = 0.272f * redFlat[i] + 0.534f * greenFlat[i] + 0.131f * blueFlat[i];
    if (255.0f < magnitudeR[i]) magnitudeR[i] = 255.0f;
    if (255.0f < magnitudeG[i]) magnitudeG[i] = 255.0f;
    if (255.0f < magnitudeB[i]) magnitudeB[i] = 255.0f;
}

for (int i = 0; i < width * height; i += 8) {
    FloatVector<Shapes.S256Bit> c1 = fspec.broadcast(0.393f);
    FloatVector<Shapes.S256Bit> c2 = fspec.broadcast(0.769f);
    FloatVector<Shapes.S256Bit> c3 = fspec.broadcast(0.189f);
    FloatVector<Shapes.S256Bit> c4 = fspec.broadcast(0.349f);
    FloatVector<Shapes.S256Bit> c5 = fspec.broadcast(0.686f);
    FloatVector<Shapes.S256Bit> c6 = fspec.broadcast(0.168f);
    FloatVector<Shapes.S256Bit> c7 = fspec.broadcast(0.272f);
    FloatVector<Shapes.S256Bit> c8 = fspec.broadcast(0.534f);
    FloatVector<Shapes.S256Bit> c9 = fspec.broadcast(0.131f);
    FloatVector<Shapes.S256Bit> c10 = fspec.broadcast(255f);
    FloatVector<Shapes.S256Bit> redVec = fspec.fromArray(redFlat, i);
    FloatVector<Shapes.S256Bit> greenVec = fspec.fromArray(greenFlat, i);
    FloatVector<Shapes.S256Bit> blueVec = fspec.fromArray(blueFlat, i);
    FloatVector<Shapes.S256Bit> res1 = redVec.mul(c1).add(greenVec.mul(c2)).add(blueVec.mul(c3));
    FloatVector<Shapes.S256Bit> res2 = redVec.mul(c4).add(greenVec.mul(c5)).add(blueVec.mul(c6));
    FloatVector<Shapes.S256Bit> res3 = redVec.mul(c7).add(greenVec.mul(c8)).add(blueVec.mul(c9));
    res1.blend(c10, res1.lessThan(c10)).intoArray(magnitudeR, i);
    res2.blend(c10, res2.lessThan(c10)).intoArray(magnitudeG, i);
    res3.blend(c10, res3.lessThan(c10)).intoArray(magnitudeB, i);
}
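The clamp in the vector version replaces the scalar if with a compare that yields a mask and a blend that selects per lane. The scalar sketch below shows that pairing; it assumes, as the usage above suggests, that blend keeps the receiver's lane where the mask is true and takes the argument's lane otherwise. Class and method names are invented for the example.

```java
import java.util.Arrays;

final class BlendClampSketch {
    // Red channel of the sepia transform, using the coefficients from the slide.
    static float sepiaRed(float r, float g, float b) {
        return 0.393f * r + 0.769f * g + 0.189f * b;
    }

    // Lane-wise clamp without a data-dependent branch around a store:
    // lessThan produces the mask, blend picks one of two values per lane.
    static float[] clamp(float[] v, float limit) {
        float[] out = new float[v.length];
        for (int l = 0; l < v.length; l++) {
            boolean below = v[l] < limit;   // one lane of res.lessThan(c10)
            out[l] = below ? v[l] : limit;  // one lane of res.blend(c10, mask)
        }
        return out;
    }

    public static void main(String[] args) {
        float[] reds = {
            sepiaRed(0f, 0f, 0f),
            sepiaRed(100f, 100f, 100f),
            sepiaRed(255f, 255f, 255f)
        };
        System.out.println(Arrays.toString(clamp(reds, 255f)));
    }
}
```

A white pixel produces a red magnitude above 255 before clamping, so its lane takes the limit, while darker pixels pass through unchanged.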


SEPIA Filter

vmulps ymm3,ymm2,ymm1
vmulps ymm1,ymm4,ymm13
vmulps ymm0,ymm5,ymm4
vaddps ymm0,ymm3,ymm0
vaddps ymm0,ymm0,ymm1
vcmpps ymm1,ymm0,ymm12,1h
vblendvps ymm0,ymm12,ymm0,ymm1
vmovdqu ymmword ptr [r12+r11*8+10h],ymm0

Annotations on the listing: AVX2 SIMD arithmetic operations; advanced vector operation for blend.

[Figure: image before filtering and after filtering from Vector API implementation; up to 6x faster than original implementation]


BLAS Performance

• BLAS I, II algorithms are used in Machine Learning libraries for linear models (logistic and linear regression), collaborative filtering etc. (for example, Spark ML)

• BLAS III routines like GEMM are applicable to neural network and deep learning algorithms.

• Up to 4.5X performance speed-up across BLAS I/II and III algorithms*

*Open JDK Project Panama source build 09182017. Java Hotspot 64-bit Server VM (mixed mode). OS version: Cent OS 7.3 64-bit. Intel® Xeon® Platinum 8180 processor (using 512 byte and 1024 byte chunk of floating point data). JVM options: -XX:+UnlockDiagnosticVMOptions -XX:-CheckIntrinsics -XX:TypeProfileLevel=121 -XX:+UseVectorApiIntrinsics

[Chart: Vector API performance improvements, vertical axis from 0 to 5, for the BLAS routines SDOT (vector operations), SSPR and SSYR (matrix-vector operations), and SGEMM (matrix-matrix operations)]
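For orientation, SDOT is the single-precision dot product. The sketch below shows the usual vector-style formulation in scalar Java: one partial sum per lane, followed by a horizontal reduction. The lane count of 8 and all names are assumptions for illustration, and this is not the benchmarked code.

```java
final class SdotSketch {
    static final int LANES = 8; // assumed: 256-bit vector of floats

    static float sdot(float[] x, float[] y) {
        float[] acc = new float[LANES]; // one running sum per lane
        int i = 0;
        for (; i + LANES <= x.length; i += LANES) {
            // Stands in for one vector multiply and one vector add.
            for (int l = 0; l < LANES; l++) {
                acc[l] += x[i + l] * y[i + l];
            }
        }
        // Horizontal reduction of the lane sums.
        float sum = 0f;
        for (float v : acc) {
            sum += v;
        }
        // Scalar tail for the leftover elements.
        for (; i < x.length; i++) {
            sum += x[i] * y[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        float[] x = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
        float[] y = {2, 2, 2, 2, 2, 2, 2, 2, 2, 2};
        System.out.println(sdot(x, y));
    }
}
```

Because the additions happen in a different order than in a simple left-to-right loop, results on general floating-point data can differ in the last bits; the small integer inputs used here avoid that.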


Intrinsification coverage of Vector API

• Parts of API are supported via intrinsification:
  • Float, Double, and Int Vectors of 128, 256, and 512 sizes have partial API support.
  • add, sub, mul, div, sumAll are supported. equal, lessThan, and blend are supported for 128 and 256 size. The rest are in development and will arrive on a regular basis throughout the rest of this year and early 2018.
  • All examples shown with generated code are supported.
• If experimenting with use of API is desired without performance requirement, pass “-XX:-UseVectorApiIntrinsics” to VM to disable use of Vector API intrinsification.
  • This ensures stability and full coverage with Java implementation (but will be slow for now).


FloatVector<Shapes.S256Bit> offsets = F_SPEC.fromArray(new float[]{0, 1, 2, 3, 4, 5, 6, 7}, 0);
FloatVector<Shapes.S256Bit> vwidth = F_SPEC.broadcast(width);
FloatVector<Shapes.S256Bit> vheight = F_SPEC.broadcast(height);
FloatVector<Shapes.S256Bit> two = F_SPEC.broadcast(2);
FloatVector<Shapes.S256Bit> sheight = vheight.div(two);
FloatVector<Shapes.S256Bit> swidth = vwidth.div(two);
FloatVector<Shapes.S256Bit> thresh = F_SPEC.broadcast(4);
FloatVector<Shapes.S256Bit> zoomfactor1 = F_SPEC.broadcast(0.73f);
FloatVector<Shapes.S256Bit> zoomfactor2 = F_SPEC.broadcast(0.10f);
FloatVector<Shapes.S256Bit> zoomstep = F_SPEC.broadcast(this.zoom_cur_step);
IntVector<Shapes.S256Bit> iones = I_SPEC.broadcast(1);
IntVector<Shapes.S256Bit> max = I_SPEC.broadcast(this.iterations);

for (int row = 0; row < height; row++) {
    for (int col = 0; col < width; col += F_SPEC.length()) {
        FloatVector<Shapes.S256Bit> cre = F_SPEC.broadcast(col).add(offsets).sub(swidth);
        cre = cre.mul(thresh).div(vwidth).div(zoomstep).sub(zoomfactor1);
        FloatVector<Shapes.S256Bit> cim = F_SPEC.broadcast(row).sub(sheight).mul(thresh);
        cim = cim.div(vwidth).div(zoomstep).add(zoomfactor2);
        FloatVector<Shapes.S256Bit> x = F_SPEC.zero();
        FloatVector<Shapes.S256Bit> y = F_SPEC.zero();
        IntVector<Shapes.S256Bit> iter = I_SPEC.zero();
        Vector.Mask<Float, Shapes.S256Bit> mres = F_SPEC.trueMask();
        while (mres.anyTrue() && iter.lessThan(max).allTrue()) {
            FloatVector<Shapes.S256Bit> x_new = x.mul(x).sub(y.mul(y)).add(cre);
            y = two.mul(x).mul(y).add(cim);
            x = x_new;
            IntVector<Shapes.S256Bit> temp = iter.add(iones);
            iter = temp.blend(iter, mres.rebracket(Integer.class));
            mres = x.mul(x).add(y.mul(y)).lessThan(thresh);
        }
        IntVector<Shapes.S256Bit> res = iter.blend(I_SPEC.zero(), iter.lessThan(max));
        res.intoArray(buff, 0);

        for (int i = 0; i < buff.length; i++) {
            image.setRGB(col + i, row, colors[buff[i] % colors.length]);
        }
    }
}
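What each lane of this loop computes is the classic escape-time iteration for one pixel. The scalar sketch below follows a single lane; the names are invented, and the final mapping of "never escaped" to 0 mirrors the last blend above under the assumption that blend keeps the receiver's lane where the mask is true.

```java
final class EscapeTimeSketch {
    // Number of iterations before the point (cre, cim) leaves the radius-2
    // circle, capped at max. Mirrors one lane of the vector loop.
    static int iterations(float cre, float cim, int max) {
        float x = 0f;
        float y = 0f;
        int iter = 0;
        boolean inside = true; // one lane of the mask mres
        while (inside && iter < max) {
            float xNew = x * x - y * y + cre;
            y = 2f * x * y + cim;
            x = xNew;
            iter++; // in the vector loop, blend increments only lanes still inside
            inside = x * x + y * y < 4f;
        }
        return iter;
    }

    // Lanes that reached max are mapped to 0, as in the final blend.
    static int colorIndex(int iter, int max) {
        return iter < max ? iter : 0;
    }

    public static void main(String[] args) {
        int max = 50;
        System.out.println(iterations(2f, 2f, max));
        System.out.println(iterations(0f, 0f, max));
        System.out.println(colorIndex(iterations(0f, 0f, max), max));
    }
}
```

In the vector version all eight pixels share one loop, so the loop keeps running while any lane is still inside, and the mask stops the counters of lanes that have already escaped.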


Where to get it? How to use it?

• Project Panama contains Vector API:
  • http://openjdk.java.net/projects/panama/
• Intel Developer Zone - Vector API Developer Program
  • https://software.intel.com/en-us/articles/vector-api-developer-program-for-java
  • Webpage with information for developers to get started on Vector API
  • Code samples on standard BLAS and FSI algorithms


Backup


Contributors

Oracle:
• Paul Sandoz
• John Rose
• Vladimir Ivanov

Intel:
• Razvan Lupusoru
• Vivek Deshpande
• Rahul Kundu
• Sandhya Viswanathan
• Shravya Rukmannagari
• Ian Graves (ex-Intel)


Int128Vector

Value Type Possible Memory Layout 1:
  [ Value Header (optional) | 128 bit Value ]

Value Type Possible Memory Layout 2:
  [ Value Header (optional) | 64 bit long | 64 bit long ]

Value Type Possible Memory Layout 3:
  [ Value Header (optional) | 32 bit int | 32 bit int | 32 bit int | 32 bit int ]