
Copyright © 2016, Oracle and/or its affiliates. All rights reserved. |

Accelerating Apache Arrow and Quartet FS ActivePivot with SPARC Software in Silicon CON6383

Allen Whipple, Managing Director, New York, Quartet FS

Amir Javanshir, Principal Software Engineer, ISV Engineering, Oracle

Sanjay Rao, Senior Software Engineer, ISV Engineering, Oracle


Safe Harbor Statement

The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.


Program Agenda

1. Data Analytics Accelerator (DAX): Overview and OpenDAX API

2. JDK 8 with DAX

3. Quartet FS: Return on experience

4. Apache Arrow

5. Getting started

The rise of In-Memory Computing

• Data analytics is increasingly relying on in-memory approaches to deliver better time to insight

• The key to delivering performance for data intensive, in-memory analytics is streaming and processing as much data as possible from memory

• Accommodating as much data as possible in memory is essential

• Oracle's DAX technology, built into SPARC processors, provides almost an order of magnitude speed-up for this type of analytics work on compressed data

• We will show you how to use this technology to achieve insight faster, using less hardware


Making Analytics available to all


Introducing Accelerators into Oracle’s SPARC Processors

• Break-through chip design

– 2X-3X memory bandwidth

– 32 cores and 256 threads at 4.13 GHz

– Large 64 MB L3 cache

– Up to 2 TB physical memory per processor

– 3x I/O bandwidth over prior generations

• Data Analytics Acceleration (DAX)

– Offloads processing for lower core usage

– DB12c Query Acceleration

– Open APIs for Customers, Partners and Developers

– Early adopters seeing amazing results

Software in Silicon Technology

[Figure: SPARC M7 & S7: memory at full bandwidth, SQL offloaded from cores to DAX, crypto accelerators, integrated offload]

Analytics Accelerator Engine

[Figure: M7 Query Engine. Data input queues feed four parallel pipelines, each with Decompress, Unpack/Alignment, Scan/Filter/Join and Result Format/Encode stages, sharing local SRAM; data output queues connect to the on-chip network.]

• Data Analytics Accelerators (DAX) built on chip

– Independently process streams of data placed in system memory

– Decompress Data simultaneously

– Frees cores to run other applications, such as OLTP

– Reduces cache pollution

– 8 (M7) or 4 (S7) accelerator units per chip, with 4 pipelines each

– Benefits big data analysis, machine learning algorithms and Oracle DB In-Memory

M7 DAX Operations

• Scan - Scans an array for elements that are equal to, greater than or less than an input value and returns a bit vector with a bit set for each match

• Select - Selects elements from an array based on a bit vector. Input: bit vector; output: the elements whose bit is 1

• Translate - In-set operation: given an input set of integers, determines how many of them are also present in another set

• Extract - Decompression. Supported formats: RLE, N-gram compression, etc.
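The semantics of the first three operations can be sketched in plain Java. This is a software-only illustration under stated assumptions: the class and method names (`DaxOpsSketch`, `scanEquals`, `select`, `translate`) are hypothetical, only the equality form of Scan is shown, and nothing here calls libdax or runs on DAX hardware.

```java
import java.util.Arrays;
import java.util.BitSet;

/** Software-only illustration of Scan, Select and Translate; no DAX hardware or libdax calls are involved. */
class DaxOpsSketch {

    /** Scan: bit i is set when column[i] equals the search value. */
    static BitSet scanEquals(int[] column, int value) {
        BitSet hits = new BitSet(column.length);
        for (int i = 0; i < column.length; i++) {
            if (column[i] == value) {
                hits.set(i);
            }
        }
        return hits;
    }

    /** Select: returns the elements of column whose bit is set in mask. */
    static int[] select(int[] column, BitSet mask) {
        int[] out = new int[column.length];
        int n = 0;
        for (int i = 0; i < column.length; i++) {
            if (mask.get(i)) {
                out[n++] = column[i];
            }
        }
        return Arrays.copyOf(out, n);
    }

    /** Translate: bit i is set when input[i] is a member of the set described by the membership bitmap. */
    static BitSet translate(int[] input, BitSet members) {
        BitSet hits = new BitSet(input.length);
        for (int i = 0; i < input.length; i++) {
            if (input[i] >= 0 && members.get(input[i])) {
                hits.set(i);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        int[] keys = {3, 6, 5, 3, 1, 9, 0, 1, 3, 5};
        int[] values = {6, 5, 3, 2, 0, 1, 3, 6, 2, 0};

        BitSet hits = scanEquals(keys, 1);
        System.out.println("scan keys == 1: " + hits);                                   // {4, 7}
        System.out.println("select values:  " + Arrays.toString(select(values, hits))); // [0, 6]

        BitSet members = new BitSet();
        members.set(1);
        members.set(3);
        // The count of set bits answers "how many inputs are in the other set".
        System.out.println("keys in {1, 3}: " + translate(keys, members).cardinality()); // 5
    }
}
```

Scan feeding Select is the usual pairing: the bit vector produced by one column's scan picks the matching rows out of another column.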


DAX Open APIs for Partners and Application Developers

• Oracle DB12c In-Memory Query Acceleration and Inline Decompression

– Available in Oracle Database 12.1.0.2 BP13 + Oracle Solaris 11.3

• Solaris 11.3 APIs

– libdax APIs support the hardware capabilities, hide details and handle limitations

– libvector APIs extend these with JNI, Python and SQL bindings

• Implementing Java Streams with DAX acceleration

• Implementing DAX with Partners and FOSS (Apache Spark)


Accessing DAX by S/W


Adding DAX to JDK 8 Lambda Streams


• The Lambda Streams API introduced in JDK 8 is a natural fit to expose DAX functionality to accelerate existing and future Java based analytics applications and frameworks

• Implemented as a standalone JDK 8 library

• Successfully offloads Integer Stream filter, allMatch, anyMatch, noneMatch and count functions to DAX

• Existing code is offloaded “under the hood” by the JVM, giving a significant performance boost with dramatically lower compute resources

• Release as part of JDK (OpenJDK project)

Automatic Acceleration of Java Analytics

• Use cases: SQL-style Java, e.g. Top-N integers, outlier detection, cube building, KNN algorithm, weather analysis

• Example results: Weather Data Query (using 10 Million data points)

– Query 1: Number of times the temperature crossed 90 °F: ~10X faster on DAX

– Query 2: Test whether the temperature is always below 100 °F: ~20X faster on DAX

• Executed on SPARC S7-2 running Solaris 12 & JDK 8 Standalone Library
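The two weather queries map directly onto standard JDK 8 stream pipelines. The sketch below uses only `java.util.stream.IntStream`; the class name, method names and the randomly generated readings are assumptions made for illustration, and the code does not show the DAX library itself or how it intercepts a pipeline.

```java
import java.util.Random;
import java.util.stream.IntStream;

/** Standard JDK 8 streams only; whether a pipeline is offloaded depends on the DAX library, which is not shown. */
class WeatherStreamSketch {

    /** Query 1 style: how many readings exceed the threshold (filter + count). */
    static long countAbove(int[] temps, int threshold) {
        return IntStream.of(temps).filter(t -> t > threshold).count();
    }

    /** Query 2 style: is every reading below the limit (allMatch). */
    static boolean alwaysBelow(int[] temps, int limit) {
        return IntStream.of(temps).allMatch(t -> t < limit);
    }

    public static void main(String[] args) {
        // Hypothetical data: 10 million random readings in [-20, 110) degrees F.
        int[] temps = new Random(42).ints(10_000_000, -20, 110).toArray();

        System.out.println("readings above 90F:     " + countAbove(temps, 90));
        System.out.println("always below 100F:      " + alwaysBelow(temps, 100));
        System.out.println("any reading above 105F: " + IntStream.of(temps).anyMatch(t -> t > 105));
        System.out.println("no reading below -20F:  " + IntStream.of(temps).noneMatch(t -> t < -20));
    }
}
```

All five operations named as offloadable (filter, allMatch, anyMatch, noneMatch, count) appear here in their ordinary form, which is the point of the "under the hood" claim: the application code does not change.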

[Chart: Execution time (ms), No Offload vs. DAX Offload, for Weather Query 2, Weather Query 1, Top-N Integers, Percentile Calculator and Outlier Detection]

Workload | Baseline run time (ms) | Trinity run time (ms) | Speedup |
------------ | ------- | ------- | ------- |
Weather Analysis Query 2 | 88 | 4 | 21.8X |
Weather Analysis Query 1 | 129 | 12 | 10.8X |
Top-N Integer | 243 | 59 | 4.1X |
Percentile Calculator | 1860 | 468 | 4.0X |
Outlier Detection | 683 | 188 | 3.6X |

[Diagram: Predicate (> 95 F), Map (filter), Reduce (count)]


www.quartetfs.com

Solving the operational decision-making needs of business users working in time-sensitive and data-intensive environments

About Quartet FS

Established in 2005 by the founders of Summit Systems

6 offices: London, New York, Paris, Singapore, Hong Kong, Sydney

75+ employees, 80+ implementations, 50+ client organisations, 10+ ISV & SaaS partners

PRE-PROCESSING

Data enrichment

Pre-calculations

Custom rules

AGGREGATION

Incremental updates

On the fly aggregation

POST-PROCESSING

Computes complex measures

Reacts to real-time streaming

Includes user specific behaviour

HETEROGENEOUS DATA SOURCES

ActivePivot

• Pure Java in-memory analytics database management system

• Aggregates data incrementally and in real time

• Executes complex computations based on your business logic

• Supports on-demand “what-if” analysis on real-time data

USER INTERFACE

An Open Aggregation and Calculation Framework

What-if analysis

Intuitive exploration

Alerts


Typical ActivePivot Analytics Use Cases

Finance, Pricing, Supply Chain

ActiveDax Prototype

• Written in C

• Simulates Quartet FS ActivePivot’s in-memory computation engine

• Generates an in-memory table based on a set of input parameters: X columns storing dictionary indexes (unsigned integers)

• Generates data randomly

• Runs N parallel threads simulating users

• Each user runs M queries sequentially

• Each query is randomly generated (= value or != value)
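The structure of such a harness can be sketched as follows. The prototype itself is written in C and is not reproduced here; this is a Java stand-in with hypothetical names (`ScanHarnessSketch`, `randomTable`, `runRandomQuery`, `runUsers`) that scans on the cores only and makes no DAX calls.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Core-only stand-in for the harness: random table, N user threads, M random queries per user. */
class ScanHarnessSketch {

    /** Builds cols x rows random dictionary indexes in [0, cardinality), stored column by column. */
    static int[][] randomTable(int cols, int rows, int cardinality, long seed) {
        Random rnd = new Random(seed);
        int[][] table = new int[cols][rows];
        for (int[] column : table) {
            for (int r = 0; r < rows; r++) {
                column[r] = rnd.nextInt(cardinality);
            }
        }
        return table;
    }

    /** One random query: every column gets "= value" or "!= value"; returns rows matching all criteria. */
    static int runRandomQuery(int[][] table, int cardinality, Random rnd) {
        int cols = table.length;
        int[] values = new int[cols];
        boolean[] wantEqual = new boolean[cols];
        for (int c = 0; c < cols; c++) {
            values[c] = rnd.nextInt(cardinality);
            wantEqual[c] = rnd.nextBoolean();
        }
        int matches = 0;
        for (int r = 0; r < table[0].length; r++) {
            boolean ok = true;
            for (int c = 0; c < cols && ok; c++) {
                ok = (table[c][r] == values[c]) == wantEqual[c];
            }
            if (ok) {
                matches++;
            }
        }
        return matches;
    }

    /** Runs `users` threads, each issuing `queriesPerUser` queries in sequence; returns queries completed. */
    static int runUsers(int[][] table, int cardinality, int users, int queriesPerUser) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(users);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (int u = 0; u < users; u++) {
                final long seed = u;
                Callable<Integer> task = () -> {
                    Random rnd = new Random(seed);
                    int done = 0;
                    for (int q = 0; q < queriesPerUser; q++) {
                        runRandomQuery(table, cardinality, rnd);
                        done++;
                    }
                    return done;
                };
                results.add(pool.submit(task));
            }
            int total = 0;
            for (Future<Integer> f : results) {
                total += f.get();
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        int[][] table = randomTable(4, 100_000, 50, 1L);
        long start = System.nanoTime();
        int total = runUsers(table, 50, 8, 20);
        double seconds = (System.nanoTime() - start) / 1e9;
        System.out.printf("%d queries in %.3f s (%.1f queries per second)%n", total, seconds, total / seconds);
    }
}
```

Dividing completed queries by elapsed time gives a queries-per-second figure of the kind used to compare configurations as the thread count grows.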


Scan in-memory table representing indexes

Line Nb | Col 1 | Col 2 | Col 3 | Col 4 | Val 1 | Val 2 | Val 3 |

------------ | ------- | ------- | ------- | ------- | ------- | ------- | ------- |

1 | 3 | 6 | 162 | 32 | 10.59 | 15.30 | 0.26 |

2 | 6 | 5 | 69 | 28 | 0.76 | 13.69 | 0.99 |

3 | 5 | 3 | 32 | 31 | 55.61 | 16.56 | 0.42 |

4 | 3 | 2 | 187 | 22 | -28.01 | 12.71 | 0.82 |

5 | 1 | 0 | 60 | 30 | 66.98 | 8.29 | 0.17 |

6 | 9 | 1 | 28 | 43 | 51.67 | 9.66 | 0.86 |

7 | 0 | 3 | 128 | 13 | 2.71 | 9.12 | 0.93 |

8 | 1 | 6 | 185 | 30 | -10.03 | 12.96 | 0.59 |

9 | 3 | 2 | 173 | 49 | 29.27 | 13.29 | 0.66 |

10 | 5 | 0 | 133 | 38 | -5.70 | 10.14 | 0.24 |

-- List of Search Criteria for each Scan column:

Criteria 1: [ = 1] (index to dictionary)

Criteria 2: [ != 1]

Criteria 3: [ != 57]

Criteria 4: [ = 30]

-- Final Logical Operation: AND

-- Final DAX and No Dax Bit Vectors:

00001001 00000000 00000000

00001001 00000000 00000000

Queries - Matching Matrices - Matching Vectors: 1 / 1 / 1
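The example above can be reproduced in a few lines of plain Java: scan each column with its own criterion, then AND the per-column bit vectors. The class and method names are hypothetical and the scan runs on the cores only. Rows 5 and 8 are the only rows satisfying all four criteria, which corresponds to the leading `00001001` in the bit vectors shown.

```java
import java.util.BitSet;
import java.util.function.IntPredicate;

/** Core-only re-creation of the four-column scan from the example table. */
class TableScanSketch {

    /** Scans each column with its own criterion and ANDs the per-column bit vectors. */
    static BitSet scanAll(int[][] columns, IntPredicate[] criteria) {
        int rows = columns[0].length;
        BitSet result = new BitSet(rows);
        result.set(0, rows); // start with every row selected
        for (int c = 0; c < columns.length; c++) {
            BitSet match = new BitSet(rows);
            for (int r = 0; r < rows; r++) {
                if (criteria[c].test(columns[c][r])) {
                    match.set(r);
                }
            }
            result.and(match);
        }
        return result;
    }

    /** Renders one character per row, first row leftmost. */
    static String toBits(BitSet bits, int rows) {
        StringBuilder sb = new StringBuilder(rows);
        for (int r = 0; r < rows; r++) {
            sb.append(bits.get(r) ? '1' : '0');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int[][] columns = {
            {3, 6, 5, 3, 1, 9, 0, 1, 3, 5},                 // Col 1
            {6, 5, 3, 2, 0, 1, 3, 6, 2, 0},                 // Col 2
            {162, 69, 32, 187, 60, 28, 128, 185, 173, 133}, // Col 3
            {32, 28, 31, 22, 30, 43, 13, 30, 49, 38}        // Col 4
        };
        IntPredicate[] criteria = {
            v -> v == 1,  // Criteria 1
            v -> v != 1,  // Criteria 2
            v -> v != 57, // Criteria 3
            v -> v == 30  // Criteria 4
        };
        System.out.println(toBits(scanAll(columns, criteria), 10)); // 0000100100
    }
}
```

The output is not padded to whole bytes here, so it shows ten bits rather than the 24 printed on the slide.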


DAX vs. Core only


Table scan: Total queries per second

[Chart: Scan Results, 15M Lines, 12 Scan Cols. Queries per second (0 to 500) vs. threads (1, 2, 4, 8, 16, 32, 64, 96, 128, 192) for DAX 8 bits, DAX 16 bits, DAX 32 bits and Core]

DAX vs. Core only


Performance enhancement

[Chart: DAX Enhancement vs. Core only, 15M Lines, 12 Scan Cols. Improvement factor ("x times better", 0 to 25) vs. threads (1 to 192) for 8 bits, 16 bits and 32 bits]

DAX vs. Core only


Scan compressed data

[Chart: Scan Results, 15M Lines, 12 Scan Cols, 16 bit Integers. Queries per second per thread (0 to 14) vs. number of threads (1 to 192) for DAX 16 bits, DAX 16 bits Comp and Core]

DAX vs. Core only


CPU usage

Total CPU Instructions (Billions), 1 thread

DAX | DAX Compressed | Core Only |
------- | ------- | ------- |
1.55 | 3.97 | 98.42 |


Apache Arrow - Introduction

• New Top-Level Apache Software Foundation Project (announced Feb 2016)

• Solves

- Re-formatting of data for cross-system communication

- Overhead in accessing data

- Multiple copies of data in memory

• Backed by key developers of 13 major open source projects, including HBase, Impala, Kudu, Parquet, Phoenix, Spark and Storm

• Fast

- Takes advantage of SIMD operations with better use of the CPU cache

- Columnar memory layout permits random access in O(1) for data retrieval

Arrow – Memory Layout

The only system with in-memory columnar data supporting Complex, Dynamic Schemas and Common Data Layer

Arrow – Vector SCAN Operation

• Written in Java

• Generates in-memory columnar Arrow vectors

• Arrow vectors store a sequence of values in an individual column

• Test Scenario

– Generates test data in two Arrow vectors

– Scans the indexed data in both vectors with an accessor and checks whether they hold the same data (vector1 == vector2)
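The comparison at the heart of this test can be sketched without the Arrow library. In the sketch below, plain `int[]` arrays stand in for the two Arrow vectors so that the example has no external dependency; the class and method names are hypothetical, and the loop runs on the cores rather than on DAX.

```java
import java.util.BitSet;
import java.util.Random;

/** Element-wise equality scan over two columns; int[] arrays stand in for Arrow vectors. */
class VectorEqualsSketch {

    /** Returns a bit vector with bit i set when v1[i] == v2[i]. */
    static BitSet equalPositions(int[] v1, int[] v2) {
        if (v1.length != v2.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        BitSet equal = new BitSet(v1.length);
        for (int i = 0; i < v1.length; i++) {
            if (v1[i] == v2[i]) {
                equal.set(i);
            }
        }
        return equal;
    }

    public static void main(String[] args) {
        // Hypothetical test data: the second vector copies the first, then every third value is changed.
        int[] vector1 = new Random(1).ints(1_000_000, 0, 1000).toArray();
        int[] vector2 = vector1.clone();
        for (int i = 0; i < vector2.length; i += 3) {
            vector2[i] = vector2[i] + 1;
        }
        BitSet equal = equalPositions(vector1, vector2);
        System.out.println("equal positions: " + equal.cardinality() + " of " + vector1.length);
    }
}
```

Because each column is a contiguous array, the loop reads both inputs sequentially, which is the access pattern a columnar layout is designed for.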


Apache Arrow with SPARC M7 DAX


Configuration: SPARC M7-1, 7 cores, 80 GB memory, Oracle Solaris 11.3 with Java 1.8.0

Linear Performance with DAX at 3x lower CPU

[Chart: CPU utilization in % (0 to 60) by data set size, 65 Million to 1 Billion]

Lower CPU Utilization (1/3rd) with DAX Offload

Data set | DAX Time (Sec) | NODAX Time (Sec) |
------------ | ------- | ------- |
1 Million | 0.0144 | 0.0836 |
2 Million | 0.029 | 0.121 |
6 Million | 0.127 | 0.369 |
65 Million | 1.62 | 3.3 |
125 Million | 4.17 | 7.93 |
256 Million | 6.466 | 13.56 |
512 Million | 13.074 | 28.726 |
800 Million | 20.332 | 39.823 |
1 Billion | 29.051 | 60.637 |

[Chart: Time (Sec), logarithmic axis, by data set size]

2X to 6X Speedup with DAX
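The speedup range can be checked by dividing each NODAX time by the matching DAX time from the table above. The short program below does that arithmetic; the class and method names are hypothetical, and the figures are copied from the table with decimal commas written as points.

```java
import java.util.Locale;

/** Computes NODAX / DAX time ratios from the reported measurements. */
class SpeedupSketch {

    /** Element-wise baseline / accelerated ratio. */
    static double[] ratios(double[] baseline, double[] accelerated) {
        double[] out = new double[baseline.length];
        for (int i = 0; i < baseline.length; i++) {
            out[i] = baseline[i] / accelerated[i];
        }
        return out;
    }

    public static void main(String[] args) {
        String[] sizes = {"1 Million", "2 Million", "6 Million", "65 Million", "125 Million",
                          "256 Million", "512 Million", "800 Million", "1 Billion"};
        double[] dax = {0.0144, 0.029, 0.127, 1.62, 4.17, 6.466, 13.074, 20.332, 29.051};
        double[] noDax = {0.0836, 0.121, 0.369, 3.3, 7.93, 13.56, 28.726, 39.823, 60.637};

        double[] speedup = ratios(noDax, dax);
        for (int i = 0; i < sizes.length; i++) {
            System.out.println(String.format(Locale.ROOT, "%-12s %.1fx", sizes[i], speedup[i]));
        }
    }
}
```

The ratios run from about 1.9x (125 Million) to about 5.8x (1 Million), in line with the "2X to 6X" summary; the largest gains are on the smallest data sets, and from 65 Million rows upward the ratio settles at roughly 2x.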


Oracle Software In Silicon Developer Cloud

Free Access for Universities, Researchers, Customers and Partners

• Access M7 DAX Zones with Solaris 11.3

• Prebuilt templates to extend and customize

– Open APIs, libraries, man pages, headers

– Code examples and use cases

– Example Integration for Apache Spark

• White papers and Animation Demo

• Simple Online Click-thru license

Available now at: http://SWiSdev.Oracle.com/DAX/
