Hive 101: Hive Query Language

Jeff Clouse, 2014-08-21
IN-0021. Copyright 2013, 2014 Inmar, Inc. All Rights Reserved.

Agenda
  • What is Hive
  • HUE
  • HQL: Select, Operators, Functions, Joins, Sub Queries, Union
  • Hive best practices

What is Hive
  • A high-level layer on top of MapReduce: queries are compiled into MapReduce jobs
  • The language is Hive Query Language (HQL)
  • HQL is a subset of ANSI SQL with extensions
  • Metadata (table definitions) is stored in MySQL, the Hive metastore database
  • Syntax and semantics are very much like Oracle and MySQL
  • There are no UPDATEs: data is added with LOAD or INSERT, or replaced with INSERT OVERWRITE

What is Hive
  • Hive tables come in two kinds
      External tables
      Warehouse (managed) tables
  • Dropping an external table deletes only its metadata; the data files remain
  • Dropping a warehouse table really deletes the data along with the metadata

HUE
  • Hadoop User Experience
  • Provides web access to Hive
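The difference between dropping an external table and dropping a warehouse table can be sketched with a toy Python model. It is not Hive itself. Two dictionaries stand in for the metastore and HDFS, and the class name and paths are hypothetical. The model only shows which of the two a DROP touches.

```python
from dataclasses import dataclass, field


@dataclass
class ToyHive:
    """Hypothetical stand-in for the metastore (table -> location) and HDFS."""
    metastore: dict = field(default_factory=dict)  # name -> (location, is_external)
    hdfs: dict = field(default_factory=dict)       # location -> list of file names

    def create_table(self, name, location, external=False):
        self.metastore[name] = (location, external)
        self.hdfs.setdefault(location, [])

    def drop_table(self, name):
        # Metadata is removed for every kind of table.
        location, external = self.metastore.pop(name)
        if not external:
            # Warehouse (managed) table: the data directory is deleted as well.
            self.hdfs.pop(location, None)
        # External table: the files under `location` are left in place.


hive = ToyHive()
hive.create_table("trans_ext", "/data/trans", external=True)   # hypothetical paths
hive.create_table("trans", "/user/hive/warehouse/trans")
hive.hdfs["/data/trans"].append("F0100")
hive.hdfs["/user/hive/warehouse/trans"].append("F0100")

hive.drop_table("trans_ext")
hive.drop_table("trans")
print(sorted(hive.hdfs))  # ['/data/trans']
```

After both drops the metastore is empty. Only the external table's files are still on disk.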


HQL Select Syntax
  • Select      Select * From t1
  • Distinct    Select Distinct col1 From t1
  • Where       Select * From t1 where col1 = 'US'
  • Limit       Select * From t1 limit 5
  • Group By    Select col1, sum(col2) as Total From t1 group by col1
  • Order By    Select col1, sum(col2) as Total From t1 group by col1 order by col1
  • Having      Select col1, sum(col2) as Total From t1 group by col1 having sum(col2) > 50

HQL Predicate Operators
  • A = B                    Equals
  • A <=> B                  Equals, or both sides are NULL
  • A <> B, A != B           Not equal
  • A < B                    Less than
  • A <= B                   Less than or equal to
  • A > B                    Greater than
  • A >= B                   Greater than or equal to
  • A [NOT] BETWEEN B AND C  A is [not] between B and C, inclusive of both ends
  • A IS [NOT] NULL          Checks whether A is [not] NULL
  • A LIKE B                 A matches the pattern B; the wildcards are % (any sequence of characters) and _ (exactly one character)

HQL Arithmetic Operators
  • A - B    Subtract B from A
  • A * B    Multiply A and B
  • A / B    Divide A by B (the result is normally a DOUBLE, even for integer operands)
  • A + B    Add A and B
  • A % B    The remainder resulting from A / B
  • A & B    Bitwise AND of A and B
  • A | B    Bitwise OR of A and B
  • A ^ B    Bitwise XOR of A and B
  • ~A       Bitwise negation of A

HQL Logical Operators
  • A AND B, A && B       Boolean AND of A and B
  • A OR B, A || B        Boolean OR of A and B
  • NOT A, !A             Boolean negation of A
  • A [NOT] IN (B, ...)   A is [not] in a set of values
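`=` and `<=>` differ only when NULL is involved. AND, OR and NOT follow SQL's three-valued logic, and a WHERE clause keeps a row only when its predicate is TRUE. A rough Python model of this, with SQL NULL represented as `None`, is below. The function names are made up for the sketch.

```python
from typing import Callable, Optional

# SQL NULL is modeled as Python None; predicates return True, False or None.
Tri = Optional[bool]


def eq(a, b) -> Tri:
    """A = B: NULL if either side is NULL."""
    if a is None or b is None:
        return None
    return a == b


def null_safe_eq(a, b) -> bool:
    """A <=> B: TRUE if both sides are NULL, FALSE if only one is."""
    if a is None or b is None:
        return a is None and b is None
    return a == b


def and_(a: Tri, b: Tri) -> Tri:
    if a is False or b is False:   # FALSE wins even against NULL
        return False
    if a is None or b is None:
        return None
    return True


def or_(a: Tri, b: Tri) -> Tri:
    if a is True or b is True:     # TRUE wins even against NULL
        return True
    if a is None or b is None:
        return None
    return False


def not_(a: Tri) -> Tri:
    return None if a is None else not a


def where(rows, predicate: Callable[[dict], Tri]):
    """WHERE keeps a row only when the predicate is TRUE (not FALSE, not NULL)."""
    return [r for r in rows if predicate(r) is True]


rows = [{"col1": "US"}, {"col1": "CA"}, {"col1": None}]
print(where(rows, lambda r: not_(eq(r["col1"], "US"))))            # [{'col1': 'CA'}]
print(where(rows, lambda r: not_(null_safe_eq(r["col1"], "US"))))  # [{'col1': 'CA'}, {'col1': None}]
```

The two queries show a common surprise. `NOT (col1 = 'US')` silently drops rows where `col1` is NULL, but `NOT (col1 <=> 'US')` keeps them.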

HQL Functions
  • Math       Round(A), Round(A, 2), Floor(A), Ceiling(A), Rand()
  • Date       Year(date), Month(date), Datediff(date1, date2) (days from date2 to date1), Date_add(startdate, days)
  • String     Length(A), Upper(A), Concat(A, B, ...), Substring(A, start, len) (start is 1-based), Trim(A)
  • Aggregate  Sum(A), Count(*), Min(A), Max(A)
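Here is a Python sketch of how a few of these functions behave. It is a model, not Hive's implementation, and it assumes dates are `'yyyy-MM-dd'` strings. `substr` here only handles start positions of 1 or more. It shows that positions are 1-based, that `Concat` returns NULL when any argument is NULL, and which argument order `Datediff` uses.

```python
from datetime import date, timedelta
from typing import Optional


def substr(s: Optional[str], start: int, length: Optional[int] = None) -> Optional[str]:
    """1-based SUBSTR(s, start[, len]); this sketch only handles start >= 1."""
    if s is None:
        return None
    if start < 1:
        raise ValueError("sketch only models start >= 1")
    begin = start - 1                      # convert 1-based to 0-based
    return s[begin:] if length is None else s[begin:begin + length]


def concat(*parts: Optional[str]) -> Optional[str]:
    """CONCAT returns NULL as soon as any argument is NULL."""
    if any(p is None for p in parts):
        return None
    return "".join(parts)


def datediff(enddate: str, startdate: str) -> int:
    """DATEDIFF(enddate, startdate): whole days from startdate to enddate."""
    return (date.fromisoformat(enddate) - date.fromisoformat(startdate)).days


def date_add(startdate: str, days: int) -> str:
    """DATE_ADD(startdate, days) on 'yyyy-MM-dd' strings."""
    return (date.fromisoformat(startdate) + timedelta(days=days)).isoformat()


print(substr("foobar", 4), substr("foobar", 4, 1))   # bar b
print(concat("US", None))                            # None
print(datediff("2009-03-01", "2009-02-27"))          # 2
print(date_add("2008-12-31", 1))                     # 2009-01-01
```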

HQL Joins
  • Join
      Select * from table1 t1 join table2 t2 on t1.key = t2.key
      Returns only the rows whose key appears in both tables
  • Outer Joins
      Left
        Select * from table1 t1 left join table2 t2 on t1.key = t2.key
        Returns all rows from the left table, t1, and the matching rows from the right table, t2. The t2 columns are NULL for t1 rows with no match.
      Right
        Select * from table1 t1 right join table2 t2 on t1.key = t2.key
        Returns all rows from the right table, t2, and the matching rows from the left table, t1. The t1 columns are NULL for t2 rows with no match.
      Full
        Select * from table1 t1 full outer join table2 t2 on t1.key = t2.key
        Returns all rows from both tables. Where a row has no match in the other table, that table's columns are NULL.
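The four join types differ only in which unmatched rows survive. Here is a toy Python model, not Hive's join implementation. Rows are dicts, and `None` stands in for the NULL-padded side. Output order in Hive is not guaranteed, so the order produced here is just an artifact of the loops.

```python
from typing import List, Optional, Tuple

Row = dict
Pair = Tuple[Optional[Row], Optional[Row]]


def join(left: List[Row], right: List[Row], key: str, how: str = "inner") -> List[Pair]:
    """Toy join: pairs (left_row, right_row); None marks the NULL-padded side."""
    pairs: List[Pair] = []
    matched_right = set()
    for l in left:
        # A NULL key never matches under "=", so it only survives an outer join.
        hits = [i for i, r in enumerate(right)
                if l[key] is not None and l[key] == r[key]]
        for i in hits:
            pairs.append((l, right[i]))
            matched_right.add(i)
        if not hits and how in ("left", "full"):
            pairs.append((l, None))           # right-side columns become NULL
    if how in ("right", "full"):
        # Right rows that never matched get NULL left-side columns.
        pairs.extend((None, r) for i, r in enumerate(right) if i not in matched_right)
    return pairs


t1 = [{"key": 1, "a": "x"}, {"key": 2, "a": "y"}]
t2 = [{"key": 2, "b": "p"}, {"key": 3, "b": "q"}]
for how in ("inner", "left", "right", "full"):
    print(how, len(join(t1, t2, "key", how)))
# prints: inner 1 / left 2 / right 2 / full 3
```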

HQL SubQueries and Union
  • UNION ALL is used to combine multiple result sets
  • Only UNION ALL is supported currently
  • The number and names of the columns returned by each select statement must be the same
      Select *
      from (Select col1, col2
            from t1
            UNION ALL
            select col1, col2
            from t2) unionResults
  • Sub-queries are supported in the FROM clause; Hive 0.13 adds limited support for sub-queries in the WHERE clause (IN and EXISTS only)

HQL Analytics
  • Aggregates: Count(A), Sum(A), Min(A), Max(A), Avg(A)
  • Over(), with Partition By and Order By
  • Lead() and Lag()
  • RANK, ROW_NUMBER, DENSE_RANK, CUME_DIST, PERCENT_RANK, NTILE
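UNION ALL stacks result sets and does not remove duplicates. Here is a minimal Python model of that rule, assuming each result is a `(column_names, rows)` pair; the names are hypothetical. It rejects inputs whose column names differ, as the slide requires. The last line shows what a DISTINCT over the combined rows would keep.

```python
from typing import List, Tuple

Result = Tuple[List[str], List[tuple]]   # hypothetical (column names, rows) pair


def union_all(*results: Result) -> Result:
    """UNION ALL: same column names required; rows concatenated, duplicates kept."""
    columns = results[0][0]
    for cols, _ in results[1:]:
        if cols != columns:
            raise ValueError(f"schema mismatch: {cols} vs {columns}")
    rows: List[tuple] = []
    for _, part in results:
        rows.extend(part)            # no DISTINCT step, unlike UNION
    return columns, rows


t1 = (["col1", "col2"], [(1, "a"), (2, "b")])
t2 = (["col1", "col2"], [(1, "a")])
cols, union_results = union_all(t1, t2)
print(union_results)                        # [(1, 'a'), (2, 'b'), (1, 'a')]
print(list(dict.fromkeys(union_results)))   # [(1, 'a'), (2, 'b')]
```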

HQL Analytics
  • PARTITION BY with one partitioning column, no ORDER BY or window specification
      SELECT a, COUNT(b) OVER (PARTITION BY c) FROM T;
  • PARTITION BY with two partitioning columns, no ORDER BY or window specification
      SELECT a, COUNT(b) OVER (PARTITION BY c, d) FROM T;
  • PARTITION BY with one partitioning column, one ORDER BY column, and no window specification
      SELECT a, SUM(b) OVER (PARTITION BY c ORDER BY d) FROM T;
  • PARTITION BY with two partitioning columns, two ORDER BY columns, and no window specification
      SELECT a, SUM(b) OVER (PARTITION BY c, d ORDER BY e, f) FROM T;
  • PARTITION BY with partitioning, ORDER BY, and window specification
      SELECT a, SUM(b) OVER (PARTITION BY c ORDER BY d ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) FROM T;
      SELECT a, AVG(b) OVER (PARTITION BY c ORDER BY d ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) FROM T;
      SELECT a, AVG(b) OVER (PARTITION BY c ORDER BY d ROWS BETWEEN 3 PRECEDING AND 3 FOLLOWING) FROM T;
      SELECT a, AVG(b) OVER (PARTITION BY c ORDER BY d ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) FROM T;
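A window aggregate groups rows by the PARTITION BY columns and sorts each group by ORDER BY. For every row it then applies the aggregate to the rows inside that row's frame. Below is a Python model of ROWS frames. Its parameters, sample table and helper names are hypothetical.

One caveat applies to the third and fourth examples above, which have an ORDER BY but no explicit frame. There Hive's default frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, which also includes rows that tie on the ORDER BY value. This sketch models ROWS frames only, so it matches that default only when ORDER BY values are unique within a partition.

```python
from collections import defaultdict
from typing import Callable, List, Optional


def window_agg(rows: List[dict], part: str, order: str, value: str,
               agg: Callable[[list], object],
               preceding: Optional[int] = None, following: Optional[int] = 0) -> list:
    """AGG(value) OVER (PARTITION BY part ORDER BY order
                         ROWS BETWEEN <preceding> PRECEDING AND <following> FOLLOWING).

    preceding=None -> UNBOUNDED PRECEDING, following=None -> UNBOUNDED FOLLOWING,
    0 on either side -> CURRENT ROW. Results come back in input-row order.
    """
    partitions = defaultdict(list)
    for i, row in enumerate(rows):
        partitions[row[part]].append(i)
    out = [None] * len(rows)
    for idxs in partitions.values():
        idxs.sort(key=lambda i: rows[i][order])       # ORDER BY within the partition
        n = len(idxs)
        for pos, i in enumerate(idxs):
            lo = 0 if preceding is None else max(0, pos - preceding)
            hi = n - 1 if following is None else min(n - 1, pos + following)
            out[i] = agg([rows[idxs[j]][value] for j in range(lo, hi + 1)])
    return out


def avg(xs):
    return sum(xs) / len(xs)


T = [{"c": "x", "d": 1, "b": 10}, {"c": "x", "d": 2, "b": 20},
     {"c": "x", "d": 3, "b": 30}, {"c": "y", "d": 1, "b": 5}]
print(window_agg(T, "c", "d", "b", sum))                                # [10, 30, 60, 5]
print(window_agg(T, "c", "d", "b", avg, preceding=1))                   # [10.0, 15.0, 25.0, 5.0]
print(window_agg(T, "c", "d", "b", sum, preceding=0, following=None))   # [60, 50, 30, 5]
print(window_agg(T, "c", "d", "b", len, following=None))                # [3, 3, 3, 1]
```

The four calls correspond to these frames:

- a running sum (UNBOUNDED PRECEDING to CURRENT ROW);
- a moving average over 1 PRECEDING to CURRENT ROW;
- CURRENT ROW to UNBOUNDED FOLLOWING;
- a whole-partition count, like `COUNT(b) OVER (PARTITION BY c)`.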

HQL Analytics
  • WINDOW clause
      SELECT a, SUM(b) OVER w FROM T WINDOW w AS (PARTITION BY c ORDER BY d ROWS UNBOUNDED PRECEDING);
  • LEAD using the default lead of 1 row and no default value
      SELECT a, LEAD(a) OVER (PARTITION BY b ORDER BY c ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING) FROM T;
  • LAG specifying a lag of 3 rows and a default value of 0
      SELECT a, LAG(a, 3, 0) OVER (PARTITION BY b ORDER BY c ROWS 3 PRECEDING) FROM T;
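LEAD and LAG read a value a fixed number of rows after or before the current row within the partition. When no such row exists they return the default value, or NULL if no default is given. The Python sketch below models this on a single partition that is already sorted by the ORDER BY column. It assumes a non-negative offset, and the helper names are hypothetical.

```python
from typing import Any, List


def lag(values: List[Any], offset: int = 1, default: Any = None) -> List[Any]:
    """LAG(x, offset, default) over one partition already in ORDER BY order."""
    if offset < 0:
        raise ValueError("sketch assumes offset >= 0")
    return [values[i - offset] if i - offset >= 0 else default
            for i in range(len(values))]


def lead(values: List[Any], offset: int = 1, default: Any = None) -> List[Any]:
    """LEAD(x, offset, default): the value `offset` rows later, else default."""
    if offset < 0:
        raise ValueError("sketch assumes offset >= 0")
    n = len(values)
    return [values[i + offset] if i + offset < n else default
            for i in range(n)]


a = [5, 7, 9, 11, 13]           # one partition of column a, already ordered
print(lead(a))                  # [7, 9, 11, 13, None]
print(lag(a, 3, 0))             # [0, 0, 0, 5, 7]
```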

Hive best practices
  • Join tables from smallest to largest: Hive buffers the earlier tables in a join and streams the last one, so put the largest table last
  • Data Layout
      Partition large tables
      Use the partition in your where clause
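Hive stores each partition as its own subdirectory, named `<column>=<value>`. A filter on the partition column therefore lets Hive skip whole directories. The Python sketch below counts the files a query would read. It assumes a hypothetical Trans table partitioned by month, and the file names and their assignment to months are illustrative.

```python
from typing import Dict, List, Optional

# Hypothetical layout of a table partitioned by month: one sub-directory per
# partition, named <partition column>=<value>, each holding that month's files.
LAYOUT: Dict[str, List[str]] = {
    "trans/month=Jan": ["F0100", "F0101", "F0102", "F0103", "F0104", "F0105"],
    "trans/month=Feb": ["F0200", "F0201", "F0202", "F0203"],
    "trans/month=Dec": ["F1200", "F1201", "F1202", "F1203", "F1204"],
}


def files_to_scan(layout: Dict[str, List[str]], month: Optional[str] = None) -> List[str]:
    """Files a query must read, with or without a filter on the partition column."""
    if month is None:
        # No predicate on `month`: every partition directory is scanned.
        return [f for files in layout.values() for f in files]
    # WHERE month = '<value>': only the matching directory is opened.
    return list(layout.get(f"trans/month={month}", []))


print(len(files_to_scan(LAYOUT)))     # 15
print(files_to_scan(LAYOUT, "Feb"))   # ['F0200', 'F0201', 'F0202', 'F0203']
```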

[Figure: Partitioning by Month. The Trans table is partitioned by month (Jan, Feb, ..., Dec); the files within each partition are F01xx for Jan, F02xx for Feb and F12xx for Dec.]

Hive best practices (continued)
  • Data Layout
      Bucketing

[Figure: Bucketing by Basket_id. Tables Trans and Trans_item; each file holds the rows whose Basket_id hashes to the same bucket.]

Hive best practices (continued)
  • Data Sampling
      Bucket   TABLESAMPLE(BUCKET 30 OUT OF 64 ON basket_id)
      Block    TABLESAMPLE(1 PERCENT), sampled at HDFS block granularity
  • Parallel Processing
      set hive.exec.parallel=true;   (lets independent stages of a query run at the same time)
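Bucket sampling keeps the rows whose bucket number, computed from the hash of the ON column, equals x - 1. Here is a Python model of `TABLESAMPLE(BUCKET 30 OUT OF 64 ON basket_id)`. It assumes that an INT key hashes to its own value and that the bucket is `(hash & Integer.MAX_VALUE) % y`; the helper names and sample data are hypothetical. If the table itself was created with 64 buckets on basket_id (also an assumption), each bucket is a separate file. This sample could then be answered by reading one file instead of scanning the whole table.

```python
from typing import List

INT_MAX = 2**31 - 1   # Java's Integer.MAX_VALUE


def bucket_of(key: int, num_buckets: int) -> int:
    """0-based bucket for an INT key.

    Assumption: the key hashes to its own value and the bucket is
    (hash & Integer.MAX_VALUE) % num_buckets.
    """
    return (key & INT_MAX) % num_buckets


def tablesample_bucket(rows: List[dict], x: int, y: int, col: str) -> List[dict]:
    """TABLESAMPLE(BUCKET x OUT OF y ON col): keep rows in bucket x (1-based)."""
    return [r for r in rows if bucket_of(r[col], y) == x - 1]


baskets = [{"basket_id": i} for i in range(200)]   # hypothetical rows
sample = tablesample_bucket(baskets, 30, 64, "basket_id")
print([r["basket_id"] for r in sample])   # [29, 93, 157]
```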

Questions?