HiveServer2 for Apache Hive

Page 1: HiveServer2 for Apache Hive

June 2012

HiveServer2 Project (WIP) Carl Steinbach | Platform Engineering

Page 2: HiveServer2 for Apache Hive

Hive Background: What is it?

An ETL/Data Warehouse system for Hadoop:

•  SQL->MR Compiler and Execution Engine

•  SerDes: Pluggable Data Format Handlers

•  MetaStore: Persistent Metadata Storage

©2012 Cloudera, Inc. All Rights Reserved. 2

Page 3: HiveServer2 for Apache Hive

Hive Evolution

•  Original Vision:
   – Let users express their queries in a high-level language without having to write MR programs

•  Now more and more:
   – A parallel SQL DBMS that happens to use Hadoop for its storage and execution layer

Page 4: HiveServer2 for Apache Hive

What do users expect from a DBMS?

•  Sessions/Concurrency
   – Persistent client state on the server side
   – Ability to run multiple clients concurrently

•  ODBC/JDBC
   – SQL IDEs, BI, ETL, …

•  Authentication/Authorization

•  Auditing/Logging

Page 5: HiveServer2 for Apache Hive

What’s Missing?

•  Sessions/Concurrency
   – Current Thrift API can’t support concurrency

•  ODBC/JDBC
   – Thrift API doesn’t support common ODBC/JDBC calls

•  Authentication/Authorization
   – Incomplete implementations

•  Auditing/Logging
   – Multiple plugin interfaces in need of consolidation

Page 6: HiveServer2 for Apache Hive

What’s Missing

Concurrency/Sessions

•  Current Thrift API can’t support multiple connections or client sessions.

•  User/Global Configuration and Session Info

•  Query compiler memory leaks
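
The session requirement above can be illustrated with a small sketch. This is not HiveServer2's design: the names `SessionManager`, `open_session`, `set_conf`, `get_conf`, and `close_session`, and the key `example.key`, are hypothetical. It only shows one way per-session settings could overlay a shared global configuration while several clients hold sessions at the same time.

```python
import threading
import uuid
from collections import ChainMap


class SessionManager:
    """Hypothetical sketch: per-session config layered over a global config."""

    def __init__(self, global_conf):
        self._global_conf = dict(global_conf)
        self._sessions = {}
        self._lock = threading.Lock()  # guards the session table

    def open_session(self, user):
        handle = str(uuid.uuid4())
        # Session overrides live in the first map; reads fall through to global.
        conf = ChainMap({}, self._global_conf)
        with self._lock:
            self._sessions[handle] = {"user": user, "conf": conf}
        return handle

    def set_conf(self, handle, key, value):
        with self._lock:
            # ChainMap writes only touch the session layer, never the global one.
            self._sessions[handle]["conf"][key] = value

    def get_conf(self, handle, key):
        with self._lock:
            return self._sessions[handle]["conf"][key]

    def close_session(self, handle):
        with self._lock:
            del self._sessions[handle]
```

With this layout, a setting changed in one session is invisible to every other session, which is the "User/Global Configuration" separation the slide asks for.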

Page 7: HiveServer2 for Apache Hive

What’s Missing

ODBC/JDBC

•  Thrift API can’t support common ODBC/JDBC calls:
   – SQLGetInfo
   – SQLGetTypeInfo
   – SQLCancel
   – SQLGetFunctions
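
Of the calls listed, SQLCancel is the easiest to illustrate: cancelling a statement generally requires some handle that identifies the running work, so that a second caller can reach it. The sketch below is an assumption about how such a handle might behave, not the HiveServer2 API; the class `Operation`, its `state` values, and the `stages` argument are all hypothetical.

```python
import threading


class Operation:
    """Hypothetical handle for one running statement."""

    def __init__(self, operation_id):
        self.operation_id = operation_id
        self.state = "RUNNING"
        self._cancel_requested = threading.Event()

    def cancel(self):
        # Safe to call from another thread; it only records the request.
        self._cancel_requested.set()

    def run(self, stages):
        # Check for a cancel request between stages of work.
        for stage in stages:
            if self._cancel_requested.is_set():
                self.state = "CANCELED"
                return
            stage()
        self.state = "FINISHED"
```

The point of the sketch is the handle itself: an API that runs a statement in a single blocking call with no returned identifier gives a client nothing to pass to a cancel request.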

Page 8: HiveServer2 for Apache Hive

What’s Missing

Authentication/Authorization

•  SASL Authentication for HiveServer

•  Hive supports GRANT/ROLE based authorization, but implementation is incomplete.

•  Code injection vectors: ADD JAR, TRANSFORM, SET x, …
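
As a rough illustration of extending authorization to the vectors listed above, the sketch below flags statements that use ADD JAR, TRANSFORM, or SET and checks them against a set of granted features. Everything here is hypothetical: the function names, the `granted` set, and the regex matching, which is deliberately naive. A real check would presumably run against the parsed statement rather than its text.

```python
import re

# Statement features the slide lists as code injection vectors.
# Patterns are simplistic on purpose; this is a sketch, not a parser.
RESTRICTED = {
    "ADD JAR": re.compile(r"^\s*ADD\s+JAR\b", re.IGNORECASE),
    "TRANSFORM": re.compile(r"\bTRANSFORM\s*\(", re.IGNORECASE),
    "SET": re.compile(r"^\s*SET\s+\S", re.IGNORECASE),
}


def restricted_features(statement):
    """Return the names of restricted features the statement appears to use."""
    return [name for name, pattern in RESTRICTED.items() if pattern.search(statement)]


def is_allowed(statement, granted):
    """Allow the statement only if every restricted feature it uses is granted."""
    return all(feature in granted for feature in restricted_features(statement))
```

For example, `is_allowed("SET x=1", set())` is `False`, while a plain `SELECT a FROM t` uses no restricted feature and passes with an empty grant set.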

Page 9: HiveServer2 for Apache Hive

Project Milestones

•  HiveServer2 Thrift API Spec

•  JDBC/ODBC HiveServer2 Drivers

•  Concurrent Thrift clients
   – Fix query compiler memory leaks
   – User/Global session/configuration information

•  Authentication (Kerberos)

•  Authorization
   – Extend to configuration, ADD x, TRANSFORM, …

Page 10: HiveServer2 for Apache Hive

Who’s working on it?

•  Carl Steinbach – carl@cloudera

•  Prasad Mujumdar – prasadm@cloudera

Page 11: HiveServer2 for Apache Hive

Resources

•  HIVE-2935: Implement HiveServer2

•  HiveServer API Proposal:
   – https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Thrift+API

Page 12: HiveServer2 for Apache Hive

Questions?
