Discover HDP 2.1: Interactive SQL Query in Hadoop with Apache Hive

DESCRIPTION

In February 2013, the open source community launched the Stinger Initiative to improve speed, scale and SQL semantics in Apache Hive. After thirteen months of constant, concerted collaboration (and more than 390,000 new lines of Java code) Stinger is complete with Hive 0.13. In this presentation, Carter Shanklin, Hortonworks director of product management, and Owen O'Malley, Hortonworks co-founder and committer to Apache Hive, discuss how Hive enables interactive query using familiar SQL semantics.

Text of Discover HDP 2.1: Interactive SQL Query in Hadoop with Apache Hive

  • Page 1 Hortonworks Inc. 2014 Discover HDP 2.1 Interactive SQL Query in Hadoop with Apache Hive Hortonworks. We do Hadoop.
  • Page 2 Speakers: Justin Sears, Hortonworks Product Marketing Manager; Carter Shanklin, Hortonworks Director of Product Management & PM for Apache Hive in Hortonworks Data Platform; Owen O'Malley, Hortonworks Co-Founder, Engineer & Committer for Apache Hive project
  • Page 3 OPERATIONS TOOLS Provision, Manage & Monitor DEV & DATA TOOLS Build & Test A Modern Data Architecture APPLICATIONS DATA SYSTEM REPOSITORIES RDBMS EDW MPP Business Analy
  • Page 4 HDP 2.1: Enterprise Hadoop HDP 2.1 Hortonworks Data Platform Provision, Manage & Monitor Ambari Zookeeper Scheduling Oozie Data Workflow, Lifecycle & Governance Falcon Sqoop Flume NFS WebHDFS YARN : Data Opera
  • Page 5 HDP 2.1: Enterprise Hadoop HDP 2.1 Hortonworks Data Platform Provision, Manage & Monitor Ambari Zookeeper Scheduling Oozie Data Workflow, Lifecycle & Governance Falcon Sqoop Flume NFS WebHDFS DATA MANAGEMENT GOVERNANCE & INTEGRATION OPERATIONS Script Pig Search Solr NoSQL HBase Accumulo Stream Storm Others In-Memory Analytics, ISV engines 1 N HDFS (Hadoop Distributed File System) Batch Map Reduce SECURITY Authen
  • Page 6 Apache Hive After the Stinger Initiative: Speed, Scale & SQL Compliance
  • Page 7 Hive: SQL Analytics For Any Data Size Sensor Mobile Weblog Operational / MPP Store and Query all Data in Hive Use Exis
  • Page 8 The Stinger Initiative: Complete. Community initiative around Hive; enables Hive to support interactive workloads; enhances Hive's standard SQL interface for Hadoop; improves existing tools & preserves investments. Query Processing: Vectorized Query Execution + Engine: Tez + File Format: ORCFile = 100X+
  • Page 9 New in Hive HDP 2.1: Speed. New Features for Speed: interactive query using Hive on Tez; vectorized query execution; cost-based optimizer
  • Page 10 New in HDP 2.1: More Than 10 New SQL Features. New SQL Features: subquery for IN / NOT IN; support for EXISTS and NOT EXISTS; common table expressions (CTEs); support for CHAR datatype; scale and precision support for DECIMAL datatype; JOIN conditions in the WHERE clause; cancel jobs via ODBC / JDBC; support for Unicode column names; permanent functions; stream data into Hive from Flume (experimental feature)
  • Page 11 Hive's Journey to SQL Compliance Evolu
  • Page 12 New in HDP 2.1: Other Improvements. Other New Hive Features: SQL standard authorization; Hive job visualizer in Ambari; PAM authentication support; SSL encryption support in HiveServer2; dynamic partition scalability
  • Page 13 Demo
  • Page 14 FoodMart Dataset: FoodMart dataset, replicated 275 times (~10 GB data). Queries run locally on an HDP 2.1 Sandbox. Queries to do some customer analytics. Tables: sales_fact_1997, customer, other dimension tables, time_by_day
  • Page 15 Learn More About Hive & The Stinger Initiative: Hortonworks.com/labs/stinger/ Register for the remaining 5 Discover HDP 2.1 Webinars: Hortonworks.com/webinars Next Webinar: Apache Falcon for Data Governance in Hadoop, Wednesday, May 21, 10am Pacific
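
Three of the SQL features listed on Page 10 (common table expressions, IN subqueries, and NOT EXISTS) can be illustrated with a small customer-analytics sketch over the table names mentioned on Page 14. This is not the demo from the webinar. It runs against an in-memory SQLite database as a local stand-in for Hive, the column names (customer_id, fname, store_sales) and the sample rows are assumptions made for illustration, and the queries have not been verified against Hive 0.13, which may impose its own restrictions on subqueries.

```python
import sqlite3


def run_demo():
    """Run three illustrative queries on toy FoodMart-style tables.

    SQLite is only a stand-in for Hive here. The schema and rows are
    hypothetical and far smaller than the real FoodMart dataset.
    """
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    cur.executescript(
        """
        CREATE TABLE customer (customer_id INTEGER, fname TEXT);
        CREATE TABLE sales_fact_1997 (customer_id INTEGER, store_sales REAL);
        INSERT INTO customer VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
        INSERT INTO sales_fact_1997 VALUES (1, 10.0), (1, 5.5), (2, 3.0);
        """
    )

    # Common table expression: total sales per customer, then join to names.
    cte_rows = cur.execute(
        """
        WITH totals AS (
            SELECT customer_id, SUM(store_sales) AS total_sales
            FROM sales_fact_1997
            GROUP BY customer_id
        )
        SELECT c.fname, t.total_sales
        FROM totals t
        JOIN customer c ON c.customer_id = t.customer_id
        ORDER BY t.total_sales DESC
        """
    ).fetchall()

    # IN subquery: customers with at least one sale.
    in_rows = cur.execute(
        """
        SELECT fname
        FROM customer
        WHERE customer_id IN (SELECT customer_id FROM sales_fact_1997)
        ORDER BY fname
        """
    ).fetchall()

    # Correlated NOT EXISTS: customers with no sales at all.
    not_exists_rows = cur.execute(
        """
        SELECT c.fname
        FROM customer c
        WHERE NOT EXISTS (
            SELECT 1 FROM sales_fact_1997 s
            WHERE s.customer_id = c.customer_id
        )
        """
    ).fetchall()

    conn.close()
    return cte_rows, in_rows, not_exists_rows


if __name__ == "__main__":
    for label, rows in zip(("CTE", "IN", "NOT EXISTS"), run_demo()):
        print(label, rows)
```

In a Hive session, the same three SELECT statements would be submitted through the Hive CLI or a JDBC/ODBC client rather than through sqlite3; only the SQL text is meant to carry over.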