Transcript
Page 1: Discover HDP 2.1: Interactive SQL Query in Hadoop with Apache Hive

Page 1 © Hortonworks Inc. 2014

Discover HDP 2.1 Interactive SQL Query in Hadoop with Apache Hive

Hortonworks. We do Hadoop.

Page 2

Speakers

Justin Sears

Hortonworks Product Marketing Manager

Carter Shanklin

Hortonworks Director of Product Management & PM for Apache Hive in Hortonworks Data Platform

Owen O’Malley

Hortonworks Co-Founder, Engineer & Committer for Apache Hive project

Page 3

A Modern Data Architecture

APPLICATIONS: Business Analytics, Custom Applications, Packaged Applications

DEV & DATA TOOLS: Build & Test

OPERATIONS TOOLS: Provision, Manage & Monitor

DATA SYSTEM
  REPOSITORIES: RDBMS, EDW, MPP
  ENTERPRISE HADOOP: Governance & Integration, Data Access, Data Management, Security, Operations

SOURCES: OLTP, ERP, CRM Systems; Documents, Emails; Web Logs, Click Streams; Social Networks; Machine Generated; Sensor Data; Geolocation Data

Page 4

HDP 2.1: Enterprise Hadoop

HDP 2.1 Hortonworks Data Platform

GOVERNANCE & INTEGRATION
  Data Workflow, Lifecycle & Governance: Falcon, Sqoop, Flume, NFS, WebHDFS

DATA ACCESS
  Batch: MapReduce
  Script: Pig
  SQL: Hive/Tez, HCatalog
  NoSQL: HBase, Accumulo
  Stream: Storm
  Search: Solr
  Others: In-Memory Analytics, ISV engines

DATA MANAGEMENT
  YARN: Data Operating System
  HDFS (Hadoop Distributed File System)

SECURITY
  Authentication, Authorization, Accounting, Data Protection
  Storage: HDFS; Resources: YARN; Access: Hive, …; Pipeline: Falcon; Cluster: Knox

OPERATIONS
  Provision, Manage & Monitor: Ambari, Zookeeper
  Scheduling: Oozie

Page 5

HDP 2.1: Enterprise Hadoop

DATA ACCESS: SQL (Hive/Tez, HCatalog) on YARN: Data Operating System

Page 6

Apache Hive After the Stinger Initiative: Speed, Scale & SQL Compliance

Page 7

Hive: SQL Analytics For Any Data Size

Sources: Sensor, Mobile, Weblog, Operational / MPP

Store and Query all Data in Hive

Use Existing SQL Tools and Existing SQL Processes

SQL Queries

Page 8

The Stinger Initiative: Complete

• Community initiative around Hive
• Enables Hive to support interactive workloads
• Enhances Hive’s standard SQL interface for Hadoop
• Improves existing tools & preserves investments

Query Processing (Vectorized Query) + Execution Engine (Tez) + File Format (ORCFile) = 100X
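ORCFile stores data column by column and, as we understand the format, keeps lightweight statistics such as minimum and maximum values for groups of rows, so a reader can skip groups that cannot satisfy a predicate. The plain-Python sketch below models only that skipping idea; the RowGroup class, the tiny group size and the helper names are hypothetical and are not the ORC reader API.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical, deliberately tiny group size so the example stays readable.
ROWS_PER_GROUP = 4


@dataclass
class RowGroup:
    """One column's values for a group of rows, plus min/max statistics."""
    values: List[int]
    min_value: int
    max_value: int


def write_column(values: List[int], group_size: int = ROWS_PER_GROUP) -> List[RowGroup]:
    """Split a column into fixed-size groups and record min/max for each."""
    groups = []
    for start in range(0, len(values), group_size):
        chunk = values[start:start + group_size]
        groups.append(RowGroup(chunk, min(chunk), max(chunk)))
    return groups


def scan_greater_than(groups: List[RowGroup], threshold: int) -> Tuple[List[int], int]:
    """Return values > threshold and the number of groups actually read."""
    matches: List[int] = []
    groups_read = 0
    for group in groups:
        if group.max_value <= threshold:
            continue  # the statistics prove no row in this group can match
        groups_read += 1
        matches.extend(v for v in group.values if v > threshold)
    return matches, groups_read


if __name__ == "__main__":
    column = write_column([1, 2, 3, 4, 5, 6, 7, 8, 20, 21, 22, 23])
    print(scan_greater_than(column, 10))  # ([20, 21, 22, 23], 1)
```

Skipping pays off mainly when the data is sorted or clustered on the filtered column; on randomly ordered data most groups would straddle the threshold and be read anyway.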

Page 9

New in Hive in HDP 2.1: Speed

New Features for Speed

• Interactive query using Hive on Tez
• Vectorized query execution
• Cost-based optimizer
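Vectorized query execution processes a batch of rows at a time, with each column held in its own array, rather than pushing one row at a time through every operator. The Python sketch below contrasts the two styles on a filter-then-sum query. Pure Python will not reproduce the CPU-level speedup; it only illustrates the control flow. The column names unit_sales and store_sales and the batch size of 1024 are assumptions for illustration. Hive's vectorized engine also works on fixed-size row batches, but this is not its code.

```python
from typing import Dict, Iterator, List

# Illustrative batch size; an assumption of this sketch.
BATCH_SIZE = 1024

Columns = Dict[str, List[float]]


def row_at_a_time_total(rows: List[Dict[str, float]], min_units: float) -> float:
    """Row mode: each row passes through the filter and the aggregate on its own."""
    total = 0.0
    for row in rows:
        if row["unit_sales"] >= min_units:
            total += row["store_sales"]
    return total


def batches(columns: Columns, size: int = BATCH_SIZE) -> Iterator[Columns]:
    """Slice the column arrays into aligned batches of at most `size` rows."""
    n_rows = len(columns["unit_sales"])
    for start in range(0, n_rows, size):
        yield {name: values[start:start + size] for name, values in columns.items()}


def vectorized_total(columns: Columns, min_units: float) -> float:
    """Batch mode: each operator runs one tight loop over a whole column batch."""
    total = 0.0
    for batch in batches(columns):
        units = batch["unit_sales"]
        sales = batch["store_sales"]
        # Filter step: positions of qualifying rows, similar in spirit to a selection vector.
        selected = [i for i, u in enumerate(units) if u >= min_units]
        # Aggregate step: sum only the selected positions.
        total += sum(sales[i] for i in selected)
    return total


if __name__ == "__main__":
    units = [float(i % 5) for i in range(3000)]
    sales = [float(i % 7) for i in range(3000)]
    rows = [{"unit_sales": u, "store_sales": s} for u, s in zip(units, sales)]
    cols = {"unit_sales": units, "store_sales": sales}
    print(row_at_a_time_total(rows, 3.0), vectorized_total(cols, 3.0))
```

Both functions return the same total. The batch version replaces per-row dispatch with per-batch loops over contiguous arrays, and a compiled engine can make those loops fast.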

Page 10

New in HDP 2.1: More Than 10 New SQL Features

New SQL Features

• Subqueries for IN / NOT IN
• Support for EXISTS and NOT EXISTS
• Common table expressions (CTEs)
• Support for CHAR datatype
• Scale and precision support for DECIMAL datatype
• JOIN conditions in the WHERE clause
• Cancel jobs via ODBC / JDBC
• Support for Unicode column names
• Permanent functions
• Stream data into Hive from Flume (experimental feature)
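Several of these additions are standard SQL constructs. The sketch below is a hedged illustration of IN and NOT EXISTS subqueries, a common table expression, and a join condition written in the WHERE clause. It runs them against Python's built-in sqlite3 module as a stand-in engine, not Hive. The customers and orders tables and their rows are hypothetical toy data. Hive 0.13 also restricts where subqueries may appear, so check the Hive language manual before reusing a query verbatim.

```python
import sqlite3
from typing import Dict, List, Tuple


def run_queries() -> Dict[str, List[Tuple]]:
    """Run standard-SQL versions of the new constructs against toy tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customers (customer_id INTEGER, name TEXT);
        CREATE TABLE orders (customer_id INTEGER, amount REAL);
        INSERT INTO customers VALUES (1, 'Smith'), (2, 'Jones'), (3, 'Lee');
        INSERT INTO orders VALUES (1, 10.0), (1, 5.5), (2, 7.25);
        """
    )
    queries = {
        # Subquery in IN: customers who placed at least one order.
        "in_subquery": """
            SELECT name FROM customers
            WHERE customer_id IN (SELECT customer_id FROM orders)
            ORDER BY name""",
        # Correlated NOT EXISTS: customers with no orders.
        "not_exists": """
            SELECT name FROM customers c
            WHERE NOT EXISTS (
                SELECT 1 FROM orders o WHERE o.customer_id = c.customer_id)""",
        # CTE (WITH clause) plus a join condition expressed in the WHERE clause.
        "cte_where_join": """
            WITH totals AS (
                SELECT customer_id, SUM(amount) AS total
                FROM orders GROUP BY customer_id)
            SELECT c.name, t.total
            FROM customers c, totals t
            WHERE c.customer_id = t.customer_id
            ORDER BY t.total DESC""",
    }
    results = {key: conn.execute(sql).fetchall() for key, sql in queries.items()}
    conn.close()
    return results


if __name__ == "__main__":
    for key, rows in run_queries().items():
        print(key, rows)
```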

Page 11

Hive’s Journey to SQL Compliance: Evolution of SQL Compliance in Hive

SQL Datatypes
• INT/TINYINT/SMALLINT/BIGINT
• FLOAT/DOUBLE
• BOOLEAN
• ARRAY, MAP, STRUCT, UNION
• STRING
• BINARY
• TIMESTAMP
• DECIMAL
• DATE
• VARCHAR
• CHAR
• Interval Types

SQL Semantics
• SELECT, INSERT
• GROUP BY, ORDER BY, HAVING
• JOIN on explicit join key
• Inner, outer, cross and semi joins
• Sub-queries in the FROM clause
• ROLLUP and CUBE
• UNION
• Standard aggregations (sum, avg, etc.)
• Custom Java UDFs
• Windowing functions (OVER, RANK, etc.)
• Advanced UDFs (ngram, XPath, URL)
• Sub-queries for IN/NOT IN, HAVING
• JOINs in WHERE Clause
• Common Table Expressions (WITH Clause)
• INSERT / UPDATE / DELETE

Legend: Available, Roadmap; Hive 11, Hive 12, Hive 13
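DECIMAL(precision, scale) allows at most precision digits in total, scale of them after the decimal point. The sketch below models that rule with Python's standard decimal module. The helper name to_hive_decimal is hypothetical. The DECIMAL(10,0) default reflects our understanding of how Hive 0.13 treats an unqualified DECIMAL. Two behaviors are assumptions to verify against the Hive documentation: rounding half-up, and returning None (standing in for NULL) when a value has too many integer digits.

```python
from decimal import ROUND_HALF_UP, Decimal, localcontext
from typing import Optional


def to_hive_decimal(value: str, precision: int = 10, scale: int = 0) -> Optional[Decimal]:
    """Fit `value` into DECIMAL(precision, scale); None stands in for SQL NULL."""
    if not 0 <= scale <= precision:
        raise ValueError("expected 0 <= scale <= precision")
    with localcontext() as ctx:
        ctx.prec = 80  # generous working precision so quantize() cannot overflow
        quantum = Decimal(1).scaleb(-scale)  # scale=2 -> Decimal('0.01')
        # Assumed rounding mode: half-up on the digits beyond `scale`.
        rounded = Decimal(value).quantize(quantum, rounding=ROUND_HALF_UP)
        # Only (precision - scale) digits may sit left of the decimal point;
        # assumed overflow behavior: the value becomes NULL.
        if abs(rounded) >= Decimal(10) ** (precision - scale):
            return None
        return rounded


if __name__ == "__main__":
    print(to_hive_decimal("3.14159", 5, 2))  # 3.14
    print(to_hive_decimal("12345.6", 5, 2))  # None: five integer digits, only three allowed
```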

Page 12

New in HDP 2.1: Other Improvements

Other New Hive Features

SQL standard authorization

Hive job visualizer in Ambari

PAM authentication support

SSL encryption support in HiveServer2

Dynamic partition scalability

Page 13

Demo

Page 14

FoodMart Dataset

• FoodMart dataset, replicated 275 times (~10 GB of data)
• Queries run locally on an HDP 2.1 Sandbox
• Queries perform customer analytics

Tables: sales_fact_1997 (fact), customer, time_by_day, other dimension tables
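Here is a hypothetical example of the kind of customer analytics this schema supports; it is not one of the demo's queries. It joins the sales_fact_1997 fact table to the customer and time_by_day dimensions and totals sales per customer per month. Python's sqlite3 module stands in for Hive. Only a handful of columns are kept (customer_id, time_id, store_sales, lname, the_year, month_of_year); we believe they match the FoodMart schema but treat them as assumptions. The rows are invented toy data, not FoodMart data.

```python
import sqlite3
from typing import List, Tuple


def monthly_sales_per_customer() -> List[Tuple[str, int, float]]:
    """Star join: fact table to customer and time dimensions, then aggregate."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, lname TEXT);
        CREATE TABLE time_by_day (time_id INTEGER PRIMARY KEY,
                                  the_year INTEGER, month_of_year INTEGER);
        CREATE TABLE sales_fact_1997 (customer_id INTEGER, time_id INTEGER,
                                      store_sales REAL);
        -- Invented toy rows, not FoodMart data.
        INSERT INTO customer VALUES (1, 'Smith'), (2, 'Jones');
        INSERT INTO time_by_day VALUES (100, 1997, 1), (101, 1997, 2);
        INSERT INTO sales_fact_1997 VALUES
            (1, 100, 4.0), (1, 100, 6.0), (1, 101, 2.5), (2, 101, 8.0);
        """
    )
    rows = conn.execute(
        """
        SELECT c.lname, t.month_of_year, SUM(s.store_sales) AS sales
        FROM sales_fact_1997 s
        JOIN customer c ON s.customer_id = c.customer_id
        JOIN time_by_day t ON s.time_id = t.time_id
        WHERE t.the_year = 1997
        GROUP BY c.lname, t.month_of_year
        ORDER BY c.lname, t.month_of_year
        """
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in monthly_sales_per_customer():
        print(row)
```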

Page 15

Learn More About Hive & The Stinger Initiative

Hortonworks.com/labs/stinger/

Register for the remaining 5 Discover HDP 2.1 Webinars

Hortonworks.com/webinars

Next Webinar:

Apache Falcon for Data Governance in Hadoop, Wednesday, May 21, 10am Pacific