In February 2013, the open source community launched the Stinger Initiative to improve speed, scale and SQL semantics in Apache Hive. After thirteen months of constant, concerted collaboration (and more than 390,000 new lines of Java code) Stinger is complete with Hive 0.13. In this presentation, Carter Shanklin, Hortonworks director of product management, and Owen O'Malley, Hortonworks co-founder and committer to Apache Hive, discuss how Hive enables interactive query using familiar SQL semantics.
Text of Discover HDP 2.1: Interactive SQL Query in Hadoop with Apache Hive
Page 1 Hortonworks Inc. 2014 Discover HDP 2.1 Interactive SQL Query in Hadoop with Apache Hive Hortonworks. We do Hadoop.
Page 2 Speakers: Justin Sears, Hortonworks Product Marketing Manager; Carter Shanklin, Hortonworks Director of Product Management & PM for Apache Hive in Hortonworks Data Platform; Owen O'Malley, Hortonworks Co-Founder, Engineer & Committer for Apache Hive project
Page 3 A Modern Data Architecture (diagram labels): OPERATIONS TOOLS: Provision, Manage & Monitor; DEV & DATA TOOLS: Build & Test; APPLICATIONS; DATA SYSTEM; REPOSITORIES; RDBMS; EDW; MPP; Business Analy
Page 4 HDP 2.1: Enterprise Hadoop. HDP 2.1 Hortonworks Data Platform. Provision, Manage & Monitor: Ambari, Zookeeper. Scheduling: Oozie. Data Workflow, Lifecycle & Governance: Falcon, Sqoop, Flume, NFS, WebHDFS. YARN : Data Opera
Page 6 Apache Hive After the Stinger Initiative: Speed, Scale & SQL Compliance
Page 7 Hive: SQL Analytics For Any Data Size. Sensor; Mobile; Weblog; Operational / MPP. Store and Query all Data in Hive. Use Exis
Page 8 The Stinger Initiative: Complete. Community initiative around Hive; Enables Hive to support interactive workloads; Enhances Hive's standard SQL interface for Hadoop; Improves existing tools & preserves investments. Query Processing (Vectorized Query) + Execution Engine (Tez) + File Format (ORCFile) = 100X
Page 9 New in Hive HDP 2.1: Speed. New Features for Speed: Interactive query using Hive on Tez; Vectorized query execution; Cost-based optimizer
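The slide names vectorized query execution without showing what it means. As a rough illustration only (not Hive's implementation), the Python sketch below contrasts evaluating a filter and a multiply-add one row at a time with applying each operator to a whole batch of column values. The function names, the sample columns, and the batch size of 1024 are assumptions made for this sketch.

```python
BATCH_SIZE = 1024  # assumed batch size for this sketch


def total_row_at_a_time(prices, quantities, min_qty):
    """Apply the filter and the multiply-add to one row per iteration."""
    total = 0
    for price, qty in zip(prices, quantities):
        if qty >= min_qty:
            total += price * qty
    return total


def total_batched(prices, quantities, min_qty, batch_size=BATCH_SIZE):
    """Apply each operator to a whole batch of column values at once."""
    total = 0
    for start in range(0, len(prices), batch_size):
        price_batch = prices[start:start + batch_size]
        qty_batch = quantities[start:start + batch_size]
        # Operator 1: the filter yields the selected positions in the batch.
        selected = [i for i, qty in enumerate(qty_batch) if qty >= min_qty]
        # Operator 2: the multiply runs only over the selected positions.
        products = [price_batch[i] * qty_batch[i] for i in selected]
        # Operator 3: the aggregate consumes the batch of products.
        total += sum(products)
    return total


# Hypothetical column data: integer prices and quantities for 3000 rows.
prices = list(range(1, 3001))
quantities = [i % 5 for i in range(3000)]
row_total = total_row_at_a_time(prices, quantities, 3)
batch_total = total_batched(prices, quantities, 3)
```

Both functions compute the same total; the batched form only changes how the work is organized, which is the property a vectorized engine relies on.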
Page 10 New in HDP 2.1: More Than 10 New SQL Features. New SQL Features: Subquery for IN / NOT IN; Support for EXISTS and NOT EXISTS; Common table expressions (CTEs); Support for CHAR datatype; Scale and precision support for DECIMAL datatype; JOIN conditions in the WHERE clause; Cancel jobs via ODBC / JDBC; Support for Unicode column names; Permanent functions; Stream data into Hive from Flume (Experimental feature)
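Several of the SQL features listed on this slide are standard SQL constructs, so their shape can be sketched outside Hive. The Python example below uses the standard-library sqlite3 module as a stand-in engine; it is not Hive, and Hive may impose restrictions that SQLite does not. The tables, columns, and rows are hypothetical and exist only for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for Hive (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (customer_id INTEGER, name TEXT);
CREATE TABLE sales (customer_id INTEGER, amount REAL);
INSERT INTO customer VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
INSERT INTO sales VALUES (1, 10.0), (1, 5.0), (2, 7.5);
""")

# Subquery for IN: customers that have at least one sale.
in_rows = [r[0] for r in conn.execute("""
    SELECT name FROM customer
    WHERE customer_id IN (SELECT customer_id FROM sales)
    ORDER BY name
""")]

# NOT EXISTS with a correlated subquery: customers with no sales.
not_exists_rows = [r[0] for r in conn.execute("""
    SELECT c.name FROM customer c
    WHERE NOT EXISTS (
        SELECT 1 FROM sales s WHERE s.customer_id = c.customer_id
    )
""")]

# Common table expression, joined with the join condition in the WHERE clause.
cte_rows = conn.execute("""
    WITH totals AS (
        SELECT customer_id, SUM(amount) AS total
        FROM sales
        GROUP BY customer_id
    )
    SELECT c.name, t.total
    FROM customer c, totals t
    WHERE c.customer_id = t.customer_id
    ORDER BY c.name
""").fetchall()

conn.close()
```

The CHAR and DECIMAL items are left out on purpose: SQLite accepts those type names but does not enforce length, scale, or precision, so it would not be a faithful stand-in for them.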
Page 11 Hive's Journey to SQL Compliance Evolu
Page 12 New in HDP 2.1: Other Improvements. Other New Hive Features: SQL standard authorization; Hive job visualizer in Ambari; PAM authentication support; SSL encryption support in HiveServer2; Dynamic partition scalability
Page 13 Demo
Page 14 FoodMart Dataset. FoodMart Dataset, replicated 275 times (~10 GB of data). Queries run locally on an HDP 2.1 Sandbox. Queries to do some customer analytics. Tables (diagram labels): sales_fact_1997; customer; Other Dimension Tables; time_by_day
Page 15 Learn More About Hive & The Stinger Initiative: Hortonworks.com/labs/stinger/. Register for the remaining 5 Discover HDP 2.1 Webinars: Hortonworks.com/webinars. Next Webinar: Apache Falcon for Data Governance in Hadoop, Wednesday, May 21, 10am Pacific