Oracle Big Data SQL: Create Value with Data
David Teszler, Director, Big Data Analytics, Product Business Group
September 2014
Copyright © 2014, Oracle and/or its affiliates. All rights reserved. |
Safe Harbor Statement
The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.
Oracle Confidential – Internal/Restricted/Highly Restricted
Agenda
1. Seize the Opportunity by Breaking Big Data Technical Barriers
2. Oracle Big Data SQL: Enabling Technology to Unify the Data Platform
3. Demonstration
Big Data Opportunity
Typical use cases in today’s world of fast exploration of big data
• Financial Services
– Money Laundering
– Portfolio Analysis
– Tracking Stock Market
• Manufacturing
– Supply Planning
• Retail
– Returns Fraud
– Buying Patterns
– Session-ization
• Telcos
– Money Laundering
– SIM Card Fraud
– Call Quality
• Utilities
– Network Analysis
– Quality Assessment
– Fraud
CREATE VALUE
SILOS OF INNOVATION
SYSTEMS OF RECORD
Enterprise Big Data Analytics Architecture
Enabling you to Create Value from Data
CREATE VALUE FROM DATA
• BIG DATA INTEGRATION: Streaming + Batch
• BIG DATA MANAGEMENT: Data Reservoir + Data Warehouse
• BIG DATA ANALYTICS: Discovery + Business Analytics
• BIG DATA APPLICATIONS: Mobile + Web + On-device
Discover and predict, fast
Simplify access to all data
Secure and govern all data
MAKING BIG DATA BUSINESS AS USUAL with Oracle Big Data
Enabling your Organization
Array of Technologies
• Relational: Run the Business
– Business transactions
– Business analytics
• Hadoop: Change the Business
– Data reservoirs
– Exploit new analyses
• NoSQL: Scale the Business
– Fast simple data structures
– Scale-out economically
Barriers to Adoption of New Technologies
• INTEGRATION: Adding Big Data to existing architecture is complex
• SKILLS: Lack tools and training to exploit Big Data
• SECURITY: No clear route to governance or enforcement
Overcoming Barriers to Adoption of New Technologies
• INTEGRATION: Engineered Systems
• SKILLS: SQL on All Data
• SECURITY: Database Security on All Data
Oracle Big Data Management System
SOURCES
Oracle Database
Oracle Industry Models
Oracle Advanced Analytics
Oracle Spatial & Graph
Big Data Appliance
Cloudera Hadoop
Oracle Big Data Discovery
Oracle NoSQL Database
Oracle R Advanced Analytics for Hadoop
Oracle Database
Oracle Advanced Security
Oracle Advanced Analytics
Oracle Spatial & Graph
Oracle Exadata
Oracle Big Data Connectors
Oracle Data Integrator
Oracle Big Data SQL
Data Analytics Challenge
Separate silos of information to analyze
Data Analytics Challenge
Separate data access interfaces
SQL on Hadoop is Obvious
Stinger
Data Analytics Challenge
No comprehensive SQL interface
Oracle Big Data Management System
Preserving investment in SQL for Big Data analytics
NoSQL
Snapshot of Oracle SQL Analytic Functions
Use Rich Oracle SQL Dialect Over All Data
• Ranking functions
– rank, dense_rank, cume_dist, percent_rank, ntile
• Window Aggregate functions (moving and cumulative)
– Avg, sum, min, max, count, variance, stddev, first_value, last_value
• LAG/LEAD functions
– Direct inter-row reference using offsets
• Reporting Aggregate functions
– Sum, avg, min, max, variance, stddev, count, ratio_to_report
• Statistical Aggregates
– Correlation, linear regression family, covariance
• Linear regression
– Fitting of an ordinary-least-squares regression line to a set of number pairs.
– Frequently combined with the COVAR_POP, COVAR_SAMP, and CORR functions
• Descriptive Statistics
– DBMS_STAT_FUNCS: summarizes numerical columns of a table and returns count, min, max, range, mean, stats_mode, variance, standard deviation, median, quantile values, +/- n sigma values, top/bottom 5 values
• Correlations
– Pearson’s correlation coefficients, Spearman's and Kendall's (both nonparametric).
• Cross Tabs
– Enhanced with % statistics: chi squared, phi coefficient, Cramer's V, contingency coefficient, Cohen's kappa
• Hypothesis Testing
– Student t-test, F-test, Binomial test, Wilcoxon Signed Ranks test, Chi-square, Mann Whitney test, Kolmogorov-Smirnov test, One-way ANOVA
• Distribution Fitting
– Kolmogorov-Smirnov Test, Anderson-Darling Test, Chi-Squared Test, Normal, Uniform, Weibull, Exponential
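As a rough illustration of the ranking, window-aggregate, and LAG/LEAD families listed above, the sketch below runs standard SQL window functions through Python's built-in sqlite3 module. SQLite stands in for Oracle Database only so that the example is self-contained (it needs SQLite 3.25 or later); the `sales` table, its columns, and its rows are invented for the illustration, and Oracle-specific functions such as ratio_to_report are not covered.

```python
import sqlite3

# Hypothetical sample data: (sales rep, month number, amount).
ROWS = [
    ("ann", 1, 100.0), ("ann", 2, 150.0), ("ann", 3, 120.0),
    ("bob", 1, 200.0), ("bob", 2, 180.0), ("bob", 3, 260.0),
]

QUERY = """
SELECT rep,
       month,
       amount,
       -- Ranking function: position of each month within the rep's own sales.
       RANK() OVER (PARTITION BY rep ORDER BY amount DESC) AS amount_rank,
       -- Cumulative window aggregate: running total per rep.
       SUM(amount) OVER (PARTITION BY rep ORDER BY month) AS running_total,
       -- LAG: direct reference to the previous row, by offset.
       amount - LAG(amount) OVER (PARTITION BY rep ORDER BY month) AS delta
FROM sales
ORDER BY rep, month
"""


def run_analytics(rows=ROWS):
    """Load rows into an in-memory table and evaluate the window functions."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE sales (rep TEXT, month INTEGER, amount REAL)")
        conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in run_analytics():
        print(row)
```

Each output row carries the original columns plus the rank, the running total, and the month-over-month change; the first month of each rep has no previous row, so its change is NULL (None in Python).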
Java UDF (excerpt):
} else {
next = lineNext.getQuantity();
}
if (!q.isEmpty() && (prev.isEmpty() || (eq(q, prev) && gt(q, next)))) {
state = "S";
return state;
}
if (gt(q, prev) && gt(q, next)) {
state = "T";
return state;
}
if (lt(q, prev) && lt(q, next)) {
state = "B";
return state;
}
if (!q.isEmpty() && (next.isEmpty() || (gt(q, prev) && eq(q, next)))) {
state = "E";
return state;
}
if (q.isEmpty() || eq(q, prev)) {
state = "F";
return state;
}
return state;
}
private boolean eq(String a, String b) {
if (a.isEmpty() || b.isEmpty()) {
return false;
}
return a.equals(b);
}
private boolean gt(String a, String b) {
if (a.isEmpty() || b.isEmpty()) {
return false;
}
return Double.parseDouble(a) > Double.parseDouble(b);
}
private boolean lt(String a, String b) {
if (a.isEmpty() || b.isEmpty()) {
return false;
}
return Double.parseDouble(a) < Double.parseDouble(b);
}
public String getState() {
return this.state;
}
}
BagFactory bagFactory = BagFactory.getInstance();
@Override
public Tuple exec(Tuple input) throws IOException {
long c = 0;
String line = "";
String pbkey = "";
V0Line nextLine;
V0Line thisLine;
V0Line processLine;
V0Line evalLine = null;
V0Line prevLine;
boolean noMoreValues = false;
String matchList = "";
ArrayList<V0Line> lineFifo = new ArrayList<V0Line>();
boolean finished = false;
DataBag output = bagFactory.newDefaultBag();
if (input == null) {
return null;
}
if (input.size() == 0) {
return null;
}
Object o = input.get(0);
if (o == null) {
return null;
}
//Object o = input.get(0);
if (!(o instanceof DataBag)) {
int errCode = 2114;
String msg = "Expected input to be DataBag, but"
Pattern Matching With Oracle SQL
Simplified, sophisticated, standards-based syntax
SELECT first_x, last_z
FROM ticker MATCH_RECOGNIZE (
PARTITION BY name ORDER BY time
MEASURES FIRST(x.time) AS first_x,
LAST(z.time) AS last_z
ONE ROW PER MATCH
PATTERN (X+ Y+ W+ Z+)
DEFINE X AS (price < PREV(price)),
Y AS (price > PREV(price)),
W AS (price < PREV(price)),
Z AS (price > PREV(price) AND
z.time - FIRST(x.time) <= 7 ))
250+ Lines of Java UDF vs. 12 Lines of SQL
20x less code
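The MATCH_RECOGNIZE query above can be read as a regular expression over rows: X and W are rows where the price fell, Y and Z are rows where it rose, and the pattern X+ Y+ W+ Z+ is a fall, a rise, a fall, and a rise. The following is a minimal Python sketch of that idea, not Oracle's implementation. It assumes the ticks of a single ticker are already sorted by time (the PARTITION BY and ORDER BY steps), that times are plain numbers, and that matching resumes after the last matched row; the function names are hypothetical.

```python
import re

# One symbol per row: D = price fell, U = price rose, '-' = flat or first row.
# The final group captures the last rising leg (the Z rows).
W_SHAPE = re.compile(r"D+U+D+(U+)")


def encode(prices):
    """Turn a price series into a string of D/U/- symbols."""
    symbols = ["-"]  # the first row has no previous price to compare with
    for prev, cur in zip(prices, prices[1:]):
        symbols.append("D" if cur < prev else "U" if cur > prev else "-")
    return "".join(symbols)


def find_double_bottoms(ticks, max_span=7):
    """ticks: list of (time, price) for one ticker, sorted by time.

    Returns one (first_x_time, last_z_time) tuple per match.
    """
    times = [t for t, _ in ticks]
    symbols = encode([p for _, p in ticks])
    matches, pos = [], 0
    while pos < len(symbols):
        m = W_SHAPE.match(symbols, pos)
        if m is None:
            pos += 1
            continue
        first_x = times[m.start()]
        # A row counts as Z only while it is within max_span of the first X row.
        last_z = None
        for i in range(m.start(1), m.end(1)):
            if times[i] - first_x > max_span:
                break
            last_z = i
        if last_z is None:
            pos += 1  # no valid Z row from this start; try the next row
            continue
        matches.append((first_x, times[last_z]))
        pos = last_z + 1  # resume after the last matched row
    return matches


if __name__ == "__main__":
    sample = [(1, 10), (2, 9), (3, 8), (4, 9), (5, 10),
              (6, 9), (7, 8), (8, 9), (9, 10), (10, 11)]
    print(find_double_bottoms(sample))
```

Encoding rows as symbols keeps the shape of the pattern in one readable expression, which is the same property the slide attributes to the SQL version.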
Finding Patterns in Stock Market Data - Double Bottom (W)
[Figure: ticker price chart, time axis from 10:00 to 10:25]
Oracle Big Data SQL – A New Architecture
• Powerful, high-performance SQL on Hadoop
– Full Oracle SQL capabilities on Hadoop
– SQL query processing local to Hadoop nodes
• Simple data integration of Hadoop and Oracle Database
– Single SQL point-of-entry to access all data
– Scalable joins between Hadoop and RDBMS data
• Optimized hardware
– Balanced Configurations
– No bottlenecks
Want to know what this really means?
100%
Data Stored in Hadoop
Hadoop/NoSQL Ecosystem
{"custId":1185972,"movieId":null,"genreId":null,"time":"2012-07-01:00:00:07","recommended":null,"activity":8}
{"custId":1354924,"movieId":1948,"genreId":9,"time":"2012-07-01:00:00:22","recommended":"N","activity":7}
{"custId":1083711,"movieId":null,"genreId":null,"time":"2012-07-01:00:00:26","recommended":null,"activity":9}
{"custId":1234182,"movieId":11547,"genreId":44,"time":"2012-07-01:00:00:32","recommended":"Y","activity":7}
{"custId":1010220,"movieId":11547,"genreId":44,"time":"2012-07-01:00:00:42","recommended":"Y","activity":6}
{"custId":1143971,"movieId":null,"genreId":null,"time":"2012-07-01:00:00:43","recommended":null,"activity":8}
{"custId":1253676,"movieId":null,"genreId":null,"time":"2012-07-01:00:00:50","recommended":null,"activity":9}
{"custId":1351777,"movieId":608,"genreId":6,"time":"2012-07-01:00:01:03","recommended":"N","activity":7}
{"custId":1143971,"movieId":null,"genreId":null,"time":"2012-07-01:00:01:07","recommended":null,"activity":9}
{"custId":1363545,"movieId":27205,"genreId":9,"time":"2012-07-01:00:01:18","recommended":"Y","activity":7}
{"custId":1067283,"movieId":1124,"genreId":9,"time":"2012-07-01:00:01:26","recommended":"Y","activity":7}
{"custId":1126174,"movieId":16309,"genreId":9,"time":"2012-07-01:00:01:35","recommended":"N","activity":7}
{"custId":1234182,"movieId":11547,"genreId":44,"time":"2012-07-01:00:01:39","recommended":"Y","activity":7}
{"custId":1346299,"movieId":424,"genreId":1,"time":"2012-07-01:00:05:02","recommended":"Y","activity":4}
Example: Files with JSON data
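The click records above are stored as plain JSON text, so any structure has to be applied when the file is read. A minimal Python sketch of that schema-on-read idea follows. It assumes one JSON record per line, reuses three of the sample records, and uses hypothetical function names; it is not how Big Data SQL itself parses the files.

```python
import json

# Three of the sample records, one per line (an assumed file layout).
SAMPLE_LOG = """\
{"custId":1185972,"movieId":null,"genreId":null,"time":"2012-07-01:00:00:07","recommended":null,"activity":8}
{"custId":1354924,"movieId":1948,"genreId":9,"time":"2012-07-01:00:00:22","recommended":"N","activity":7}
{"custId":1234182,"movieId":11547,"genreId":44,"time":"2012-07-01:00:00:32","recommended":"Y","activity":7}
"""


def read_clicks(text, columns):
    """Apply a schema at read time: parse each line and keep only `columns`."""
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        # Missing keys become None, much like NULL columns in a SQL result.
        yield tuple(record.get(col) for col in columns)


def recommended_movies(text):
    """Return (custId, movieId) for clicks on a recommended movie."""
    return [
        (cust, movie)
        for cust, movie, rec in read_clicks(text, ("custId", "movieId", "recommended"))
        if rec == "Y"
    ]


if __name__ == "__main__":
    print(recommended_movies(SAMPLE_LOG))
```

The stored file never changes; a different set of columns is simply a different call to `read_clicks`.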
SQL-on-Hadoop Engines Share Metadata, not MapReduce
Hive Metastore
Engines: Hive, Impala, Spark SQL, Oracle Big Data SQL, …
Table Definitions: movieapp_log_json, Tweets, avro_log
Metastore maps DDL to Java access classes
Enhanced Oracle External Tables
• New types of external tables
– ORACLE_HIVE (inherit metadata)
– ORACLE_HDFS (specify metadata)
• Access parameters for Big Data
– Hadoop cluster
– Remote Hive database/table
• DBMS_HADOOP Package for automatic import
CREATE TABLE movielog (
click VARCHAR2(4000))
ORGANIZATION EXTERNAL (
TYPE ORACLE_HIVE
DEFAULT DIRECTORY DEFAULT_DIR
ACCESS PARAMETERS
(
com.oracle.bigdata.tablename logs
com.oracle.bigdata.cluster mycluster
))
REJECT LIMIT UNLIMITED;
Schema on Read
CUSTOMERS
SELECT name, SUM(purchase)
FROM customers
GROUP BY name;
Intelligent Storage Maximizes Performance
What Can Big Data Learn from Exadata?
Oracle Exadata Storage Server
1. Oracle SQL query issued
• Plan constructed
• Query executed
2. Smart Scan Works on Storage
• Filter out unneeded rows
• Project only queried columns
• Score data models
• Bloom filters to speed up joins
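The Smart Scan steps above (filter rows, project columns, use Bloom filters for joins) can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not Exadata's implementation: the `BloomFilter` class, the `storage_scan` function, and the filter sizes are all hypothetical. The point is that the storage side drops rows and columns before anything is sent to the database.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: may report false positives, never false negatives."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, key):
        # Derive num_hashes bit positions from salted SHA-256 digests.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


def storage_scan(rows, join_filter, key, columns, predicate):
    """Runs on the storage side: filter rows, then project only needed columns."""
    for row in rows:
        # Skip rows that fail the WHERE clause or cannot possibly join.
        if predicate(row) and join_filter.might_contain(row[key]):
            yield {col: row[col] for col in columns}
```

A hypothetical use: the database adds the join keys of its smaller table to a `BloomFilter`, ships the filter to storage, and receives only the rows that might join. False positives are removed later by the real join.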
Big Data SQL Server: A New Hadoop Processing Engine
• Processing Layer: MapReduce and Hive, Spark, Impala, Search, Big Data SQL
• Resource Management (YARN, cgroups)
• Storage Layer: Filesystem (HDFS), NoSQL Databases (Oracle NoSQL DB, HBase)
How do we query Hadoop?
Big Data SQL Query Execution
Components: Hive Metastore, HDFS NameNode, HDFS Data Nodes (each with a BDS Server)
1. Query compilation determines:
• Data locations
• Data structure
• Parallelism
2. Fast reads using Big Data SQL Server
• Schema-on-read using Hadoop classes
• Smart Scan selects only relevant data
3. Process filtered result
• Move relevant data to database
• Join with database tables
• Apply database security policies
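The three execution steps above can be mimicked end to end with a toy Python model. Everything in it is an assumption made for illustration: in-memory lists stand in for HDFS blocks, a thread pool stands in for parallel BDS servers, a dictionary stands in for a database table, and all names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for HDFS: data node name -> rows stored on that node.
BLOCKS = {
    "node1": [{"cust_id": 1, "movie_id": 10, "activity": 7},
              {"cust_id": 2, "movie_id": 11, "activity": 8}],
    "node2": [{"cust_id": 1, "movie_id": 12, "activity": 7},
              {"cust_id": 3, "movie_id": 13, "activity": 7}],
}
# Hypothetical table that lives in the database.
CUSTOMERS = {1: "Ann", 2: "Bob", 3: "Cy"}


def compile_query(blocks):
    """Step 1: decide where the data lives and how many scans run in parallel."""
    nodes = sorted(blocks)
    return {"nodes": nodes, "parallelism": len(nodes)}


def scan_node(rows, predicate, columns):
    """Step 2: runs next to the data; returns only relevant rows and columns."""
    return [{c: r[c] for c in columns} for r in rows if predicate(r)]


def run_query(blocks, customers, allowed_ids):
    """Run all three steps and return (customer name, movie_id) pairs."""
    plan = compile_query(blocks)
    with ThreadPoolExecutor(max_workers=plan["parallelism"]) as pool:
        parts = pool.map(
            lambda node: scan_node(blocks[node],
                                   lambda r: r["activity"] == 7,
                                   ("cust_id", "movie_id")),
            plan["nodes"],
        )
        filtered = [row for part in parts for row in part]
    # Step 3: join with the database table and apply a row-level policy.
    return [(customers[r["cust_id"]], r["movie_id"])
            for r in filtered if r["cust_id"] in allowed_ids]


if __name__ == "__main__":
    print(run_query(BLOCKS, CUSTOMERS, allowed_ids={1, 3}))
```

Only the filtered, projected rows cross from the scan step to the join step, which is the reason the slide places filtering next to the data.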
But How Does Security Work?
1. Database security for query access
• Virtual Private Databases
• Redaction
• Audit Vault and Database Firewall
2. Hadoop security for Hadoop jobs
• Kerberos Authentication
• Apache Sentry (RBAC)
• Audit Vault
3. System-specific encryption
• Database tablespace encryption
• BDA On-disk Encryption
SELECT * FROM my_bigdata_table
WHERE SALES_REP_ID =
SYS_CONTEXT('USERENV','SESSION_USER');
Filter on SESSION_USER
DBMS_REDACT.ADD_POLICY(
object_schema => 'MCLICK',
object_name => 'TWEET_V',
column_name => 'USERNAME',
policy_name => 'tweet_redaction',
function_type => DBMS_REDACT.PARTIAL,
function_parameters =>
'VVVVVVVVVVVVVVVVVVVVVVVVV,*,3,25',
expression => '1=1'
);
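The two snippets above show a row filter on the session user and a partial redaction policy. The Python sketch below imitates the visible effect of each. It is an illustration, not a reimplementation of Virtual Private Database or DBMS_REDACT: the function names are hypothetical, and reading the trailing parameters `*,3,25` as mask character, start position, and end position (1-based, inclusive) is an assumption.

```python
def rows_for_user(rows, session_user):
    """Row-level filter: a sales rep sees only rows tagged with their own id."""
    return [r for r in rows if r["SALES_REP_ID"] == session_user]


def redact_partial(value, mask_char="*", start=3, end=25):
    """Mask characters start..end (1-based, inclusive); keep the rest visible."""
    chars = list(value)
    for i in range(start - 1, min(end, len(chars))):
        chars[i] = mask_char
    return "".join(chars)


if __name__ == "__main__":
    table = [
        {"SALES_REP_ID": "ALICE", "USERNAME": "johnsmith"},
        {"SALES_REP_ID": "BOB", "USERNAME": "janedoe"},
    ]
    for row in rows_for_user(table, "ALICE"):
        print(redact_partial(row["USERNAME"]))
```

In the database both policies are attached to the table or view, so they apply to every query regardless of whether the underlying data sits in Oracle Database or in Hadoop.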
Summary: Oracle Big Data SQL
Oracle SQL, on all your data.
Oracle SQL on Hadoop and beyond
• With a Smart Scan service inspired by Exadata
• With native SQL operators
• With the security of Oracle Database
Key Technologies Driving Innovation
Use the Right Tool for the Job and benefit from the Power of “AND”
• Relational: Run the Business
– Business transactions
– Business analytics
• Hadoop: Change the Business
– Data reservoirs
– Exploit new analyses
• NoSQL: Scale the Business
– Fast simple data structures
– Scale-out economically
Discover and predict, fast
Simplify access to all data
Secure and govern all data
MAKING BIG DATA BUSINESS AS USUAL with Oracle Big Data
Enabling your Organization
Get Started: Oracle Big Data Lite Virtual Machine
• VM containing key components of Oracle Big Data Platform
• Download from OTN
• In sync with latest BDA release
• Used for:
– Learning about the Oracle platform
– Developing applications deployed to BDA
– BDA Client
http://www.oracle.com/technetwork/database/bigdata-appliance/oracle-bigdatalite-2104726.html