Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |

Enterprise-Level Data Analysis with Oracle R Enterprise

Dr. Nadine Schöne, Sales Consultant, Oracle Direct, Sales Consulting

Dr. Michael Haupt, Tech Lead, FastR Project, Virtual Machine Research Group, Oracle Labs

Negib Marhoul, Leading Senior Sales Consultant, Oracle Direct, Sales Consulting

Safe Harbor Statement

The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.

Agenda

1. Data analysis in the enterprise
2. R and Oracle R Enterprise (ORE)
3. Demo
4. Oracle Labs and FastR
5. Further information

Data Analysis in the Enterprise

Background: Statistics and Mining Methods

Time-consuming analysis processes

Multiple iterations

Workflows of recurring steps

Resource-intensive data analyses

Collect data

Identify data

Prepare data

Analyze data

Key Topics for Enterprise Data Analytics

1. Scalability

2. Performance

3. Development & Production

R and Oracle R Enterprise (ORE)

R is …

1. A programming language

2. A statistical workbench

3. A data science ecosystem

R is the lingua franca of data science.

R logo © R Foundation, from http://www.r-project.org

Aspects of Conventional R/Database Interaction

"Collaborative Execution" model

1. User R Engine (desktop): R engine, other R packages, Oracle R Enterprise packages
2. Database Compute Engine: Oracle DB with user tables; SQL in, results out
3. R Engine(s) managed by Oracle DB: R engine, other R packages, Oracle R Enterprise packages; R in, results out

Transparency layer => uses the computing power of the database

No flat file export => saves time and uses the computing power of the server
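The push-down idea can be illustrated with a minimal Python sketch. It uses the standard-library sqlite3 module as a stand-in for the database and is not the ORE API; the `readings` table and all function names are hypothetical. It contrasts exporting every row to the client with letting the database engine compute the aggregate.

```python
import sqlite3

def make_demo_db():
    """Create an in-memory database with a small hypothetical table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (household INTEGER, kwh REAL)")
    rows = [(h, float(h + m)) for h in range(1, 4) for m in range(4)]
    conn.executemany("INSERT INTO readings VALUES (?, ?)", rows)
    return conn

def mean_client_side(conn):
    """Export every row to the client, then aggregate there."""
    sums, counts = {}, {}
    for household, kwh in conn.execute("SELECT household, kwh FROM readings"):
        sums[household] = sums.get(household, 0.0) + kwh
        counts[household] = counts.get(household, 0) + 1
    return {h: sums[h] / counts[h] for h in sums}

def mean_pushed_down(conn):
    """Let the database aggregate; only one row per household comes back."""
    query = "SELECT household, AVG(kwh) FROM readings GROUP BY household"
    return dict(conn.execute(query).fetchall())

if __name__ == "__main__":
    conn = make_demo_db()
    print(mean_client_side(conn))
    print(mean_pushed_down(conn))
```

Both functions return the same means, but the second transfers one row per household instead of every reading, which is the saving the slide attributes to avoiding a flat file export.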

“R is a powerful and interesting tool for data analysis! ORE brings R into a scalable DB engine (solving problems of data management, analysis and scalability). We actually can obtain information and added value from not so actively used data.”

– Stefano Alberto Russo, Researcher at CERN Openlab

• Oracle R Distribution

• ROracle

• Oracle R Enterprise

• Oracle R Advanced Analytics for Hadoop

Free for the R community

Oracle R Enterprise at a Glance

Function push-down – data transformation & statistics

R workspace console

Oracle statistics engine

OBIEE, Web Services

Unchanged user experience

Scalable to large data volumes

Embedding in operational systems

Development, Production, Application

Sensor Data Analysis I

200,000 households

3 years

1 measurement/hour

5.256 billion measurements (26,280 measurements/customer)
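The data volume on this slide follows from simple arithmetic, which can be checked with a short Python sketch. The constant and function names are illustrative, and the calculation assumes 365-day years (leap days ignored). One reading per hour over three years gives 26,280 readings per household, and 200,000 households give 5.256 billion readings in total.

```python
HOUSEHOLDS = 200_000
YEARS = 3
DAYS_PER_YEAR = 365      # assumption: leap days ignored
READINGS_PER_DAY = 24    # one reading per hour

def readings_per_household(years=YEARS):
    """Number of hourly readings one household produces."""
    return years * DAYS_PER_YEAR * READINGS_PER_DAY

def total_readings(households=HOUSEHOLDS, years=YEARS):
    """Number of readings across all households."""
    return households * readings_per_household(years)

if __name__ == "__main__":
    print(readings_per_household())  # 26280
    print(total_readings())          # 5256000000
```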

Sensor Data Analysis II

10 s/model

200,000 households ➔ 200,000 models

Serial execution: 23 days + 4 hours

Oracle R Enterprise: 4.3 hours
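The runtimes can be reproduced with a short Python sketch, assuming the 23-day figure is the time to fit all models one after another at 10 s each. The names are illustrative; the 4.3 hours are taken from the slide, not computed.

```python
from datetime import timedelta

MODELS = 200_000
SECONDS_PER_MODEL = 10
ORE_HOURS = 4.3  # runtime quoted on the slide

def serial_runtime(models=MODELS, seconds_per_model=SECONDS_PER_MODEL):
    """Total time if every model is fitted sequentially."""
    return timedelta(seconds=models * seconds_per_model)

def speedup(serial, parallel_hours=ORE_HOURS):
    """Ratio of the serial runtime to the quoted parallel runtime."""
    return serial / timedelta(hours=parallel_hours)

if __name__ == "__main__":
    serial = serial_runtime()
    print(serial.days, "days", round(serial.seconds / 3600, 1), "hours")
    print(round(speedup(serial)), "x faster")
```

Under these assumptions the serial run takes 23 days and about 3.6 hours, which the slide rounds to 4 hours, and the quoted 4.3 hours corresponds to a speedup of roughly 129x.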

Integration of Data Miner with Oracle R Enterprise

SQL Query node

– Allows R scripts to be integrated

Oracle Advanced Analytics

Wide Range of In-Database Data Mining and Statistical Functions

• Data Understanding & Visualization
– Summary & Descriptive Statistics
– Histograms, scatter plots, box plots, bar charts
– R graphics: 3-D plots, link plots, special R graph types
– Cross tabulations
– Tests for Correlations (t-test, Pearson's, ANOVA)
– Selected Base SAS equivalents

• Data Selection, Preparation and Transformations
– Joins, Tables, Views, Data Selection, Data Filter, SQL time windows, Multiple schemas
– Sampling techniques
– Re-coding, Missing values
– Aggregations
– Spatial data
– R to SQL transparency and push down

• Classification Models
– Logistic Regression (GLM)
– Naive Bayes
– Decision Trees
– Support Vector Machines (SVM)
– Neural Networks (NNs)

• Regression Models
– Multiple Regression (GLM)
– Support Vector Machines

• Clustering
– Hierarchical K-means
– Orthogonal Partitioning
– Expectation Maximization

• Anomaly Detection
– Special case Support Vector Machine (1-Class SVM)

• Associations / Market Basket Analysis
– Apriori algorithm

• Feature Selection and Reduction
– Attribute Importance (Minimum Description Length)
– Principal Components Analysis (PCA)
– Non-negative Matrix Factorization
– Singular Value Decomposition

• Text Mining
– Most OAA algorithms support unstructured data (e.g. customer comments, email, abstracts)

• Transactional Data
– Most OAA algorithms support transactional data (e.g. purchase transactions, repeated measures over time)

• R packages: ability to run open source
– Broad range of R CRAN packages can be run as part of database process via R to SQL transparency and/or via Embedded R mode

* included in every Oracle Database

Demo

Software Components in the VM Image

• R 3.1.1
• Oracle R Enterprise (ORE) 1.4.1
• Oracle DB 12.1.0.2.0
• Oracle SQL Developer 4.0.3
• RStudio 0.98.1079

R, SQL

Benefits

6,054 R packages

Oracle Labs and FastR

Safe Harbor Statement

The following is intended to provide some insight into a line of research in Oracle Labs. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. Oracle reserves the right to alter its development plans and practices at any time, and the development, release, and timing of any features or functionality described in connection with any Oracle product or service remains at the sole discretion of Oracle. Any views expressed in this presentation are my own and do not necessarily reflect the views of Oracle.

The Mission of Oracle Labs is straightforward: Identify, explore, and transfer new technologies that have the potential to substantially improve Oracle's business.

– Edward Screven, Chief Corporate Architect, Oracle

Considerations About R

• R is excellently suited to statistical tasks. Why should one have to use C and Fortran?

• R is inherently parallel as a language. Why should parallelism have to be implemented separately?

Library 2 (R + Fortran)

Library 1 (R + C)

FastR

• Open-source R implementation

– GPL 2

– https://bitbucket.org/allr/fastr

– Research prototype

– Linux, Mac

• Properties

– Implemented in "100% Java"

– Built with Truffle (interpreter) and Graal (dynamic compiler)

Library 2 (R)

Library 1 (R)

Truffle and Graal

[Figure: execution lifecycle in Truffle and Graal. Labels in the diagram:]

– Node transitions: specializing for types (Uninitialized, …, Generic)
– AST interpreter, uninitialized nodes
– Node rewriting for profiling feedback
– AST interpreter, rewritten nodes
– Compilation using partial evaluation
– Compiled code
– Deoptimization to AST interpreter
– Node rewriting to update profiling feedback
– Recompilation using partial evaluation
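The node transition from "uninitialized" through a type-specialized state to "generic" can be sketched in a few lines of Python. This is a simplified illustration of the idea, not Truffle's API: the `AddNode` class and its state names are hypothetical, and instead of replacing nodes in a tree it swaps an implementation pointer inside a single node.

```python
class AddNode:
    """AST node for `left + right` that rewrites itself based on observed types."""

    def __init__(self):
        self.state = "uninitialized"
        self.impl = self._uninitialized

    def execute(self, left, right):
        return self.impl(left, right)

    def _uninitialized(self, left, right):
        # First execution: specialize for the operand types seen now.
        if type(left) is int and type(right) is int:
            self._rewrite("int", self._add_int)
        else:
            self._rewrite("generic", self._add_generic)
        return self.impl(left, right)

    def _add_int(self, left, right):
        # Specialized path: valid only while both operands are ints.
        if type(left) is int and type(right) is int:
            return left + right
        # Guard failed: fall back to the generic implementation for good.
        self._rewrite("generic", self._add_generic)
        return self.impl(left, right)

    def _add_generic(self, left, right):
        # Handles any operand types; never specializes again.
        return left + right

    def _rewrite(self, state, impl):
        self.state = state
        self.impl = impl

if __name__ == "__main__":
    node = AddNode()
    for args in [(1, 2), (3, 4), (1.5, 2), (1, 2)]:
        result = node.execute(*args)
        print(args, "->", result, "state:", node.state)
```

The node only ever moves toward the generic state, so the number of rewrites is bounded. A stable, specialized node is what makes the later compilation step worthwhile, and a failed type guard in compiled code corresponds to the deoptimization arrow in the figure.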

Benchmark Results: Shootout

• Benchmark properties

– "Computer Languages Shootout Game"

– Not typical R applications

• Results

– Note the logarithmic axis

– Most are about 10x faster

– Positive outlier: about 520x

[Figure: speedup per benchmark on a logarithmic axis from 1 to 1000]

Geometric mean: 10x improvement over GNU R
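Speedup factors that span several orders of magnitude are usually summarized with the geometric mean, because a single large outlier would dominate an arithmetic mean. The following Python sketch shows the calculation. The input values are hypothetical and are not the measured results from the slide.

```python
import math

def geometric_mean(speedups):
    """Geometric mean, computed as the exponential of the mean logarithm."""
    if not speedups or any(s <= 0 for s in speedups):
        raise ValueError("speedups must be a non-empty list of positive numbers")
    return math.exp(sum(math.log(s) for s in speedups) / len(speedups))

if __name__ == "__main__":
    # Hypothetical speedup factors, not the benchmark data from the slide.
    example = [1.0, 10.0, 100.0]
    print("geometric mean: ", geometric_mean(example))
    print("arithmetic mean:", sum(example) / len(example))
```

For these example values the geometric mean is about 10 while the arithmetic mean is 37, which matches the visual midpoint one would read off a logarithmic axis.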

PGX: Overview

PGX is a data analysis framework that supports powerful graph analyses of the data.

Recommendation, Influencer Identification, Community Detection, Pattern Matching, …

PGX runs fast, parallel analyses on large graphs, both on a single machine and in a distributed environment.

PGX is tightly integrated with Oracle DB (RDF and PG options), which consistently manages graph data on persistent storage.

[Figure: a graph program (DSL) passes through the compiler and runs on PGX, either on a single machine or distributed]

Our DSL compiler technology allows easy switching between the two environments.

More Information

ORE Discussion Forum: https://community.oracle.com/community/developer/english/business_intelligence/data_warehousing/r

Oracle Advanced Analytics: http://www.oracle.com/technetwork/database/options/advanced-analytics/index.html

ORE-Blog: https://blogs.oracle.com/R/

FastR: https://bitbucket.org/allR/fastR

Graal/Truffle: https://wiki.openjdk.java.net/display/Graal/Main

Oracle Labs on OTN: http://www.oracle.com/technetwork/oracle-labs/index.html

Contact

Dr. Nadine Schöne | Sales Consultant

Email: [email protected]

Tel: +49 331 200 7190

Dr. Michael Haupt | Tech Lead, FastR Project

Email: [email protected]

Tel: +49 331 200 7277

ORACLE Deutschland B.V. & Co. KG

Schiffbauergasse 14

14467 Potsdam
