Workshop conducted for Teradata, Islamabad

12: MapReduce and DBMS Hybrids

Zubair Nabi

[email protected]

May 26, 2013


Outline

1 Hive

2 HadoopDB

3 nCluster

4 Summary


Introduction

Data warehousing solution built atop Hadoop by Facebook

Now an Apache open source project

Queries are expressed in SQL-like HiveQL, which are compiled into map-reduce jobs

Also contains a type system for describing RDBMS-like tables

A system catalog, the Hive-Metastore, which contains schemas and statistics, is used for data exploration and query optimization

Stores 2 PB of uncompressed data at Facebook and is heavily used for simple summarization, business intelligence, and machine learning, among many other applications [1]

Also used by Digg, Grooveshark, hi5, Last.fm, Scribd, etc.

[1] https://www.facebook.com/note.php?note_id=89508453919


Data Model

Tables:
  - Similar to RDBMS tables
  - Each table has a corresponding HDFS directory
  - The contents of the table are serialized and stored in files within that directory
  - Serialization can be either system-provided or user-defined
  - Serialization information for each table is also stored in the Hive-Metastore for query optimization
  - Tables can also be defined for data stored in external sources such as HDFS, NFS, and the local FS
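
To make the table model concrete, here is a minimal HiveQL sketch of a table defined over externally stored data; the table name, columns, delimiter, and location are illustrative, not taken from the slides:

    -- Hypothetical external table whose data lives outside Hive's warehouse directory;
    -- dropping the table removes only the metadata, not the underlying files.
    CREATE EXTERNAL TABLE access_logs (
      ip STRING,
      ts STRING,
      request STRING
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    LOCATION '/data/access_logs';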


Data Model (2)

Partitions:
  - Determine the distribution of data within sub-directories of the main table directory
  - For instance, for a table T stored in /wh/T and partitioned on columns ds and ctry:
    - Data with ds value 20090101 and ctry value US will be stored in files within /wh/T/ds=20090101/ctry=US

Buckets:
  - Data within partitions is divided into buckets
  - Buckets are calculated based on the hash of a column within the partition
  - Each bucket is stored within a file in the partition directory
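
A minimal HiveQL sketch of the layout described above; the partition columns follow the /wh/T/ds=.../ctry=... example, while the userid column and bucket count are illustrative assumptions:

    -- Table T partitioned on ds and ctry and bucketed on a (hypothetical) userid column.
    CREATE TABLE T (
      userid INT,
      payload STRING
    )
    PARTITIONED BY (ds STRING, ctry STRING)
    CLUSTERED BY (userid) INTO 32 BUCKETS;

    -- Rows with ds='20090101' and ctry='US' end up as bucket files under
    -- /wh/T/ds=20090101/ctry=US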


Column Data Types

Primitive types: integers, floats, strings, dates, and booleans

Nestable collection types: arrays and maps

Custom types: user-defined
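
As an illustration of the type system (table and column names are hypothetical), a single table can mix primitive and nestable collection types:

    CREATE TABLE user_events (
      userid INT,                    -- primitive type
      tags ARRAY<STRING>,            -- nestable collection: array
      properties MAP<STRING, INT>    -- nestable collection: map
    );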


HiveQL

Supports select, project, join, aggregate, union all, and sub-queries

Tables are created using data definition statements with specific serialization formats, partitioning, and bucketing

Data is loaded from external sources and inserted into tables

Support for multi-table insert: multiple queries on the same input data using a single HiveQL statement

User-defined column transformation and aggregation functions in Java

Custom map-reduce scripts written in any language can be embedded
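
A hedged sketch of embedding a custom script via Hive's TRANSFORM syntax; the script name and output columns are hypothetical, and status_updates is the table used in the example that follows:

    -- Stream each row of status_updates through an external script;
    -- Hive runs the script inside the generated map tasks.
    ADD FILE my_mapper.py;

    SELECT TRANSFORM (userid, status)
           USING 'python my_mapper.py'
           AS (userid, word)
    FROM status_updates;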


Example: Facebook Status

Status updates are stored in flat files in an NFS directory /logs/status_updates

This data is loaded on a daily basis into a Hive table: status_updates(userid int, status string, ds string)

Using:

    LOAD DATA LOCAL INPATH '/logs/status_updates'
    INTO TABLE status_updates PARTITION (ds='2013-05-26')

Detailed profile information, such as gender and academic institution, is present in the table: profiles(userid int, school string, gender int)
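
The slides do not show the DDL for status_updates; a plausible sketch, with ds as the date partition column implied by the LOAD statement above, would be:

    -- Hypothetical DDL; ds is the partition column filled in by each daily load.
    CREATE TABLE status_updates (
      userid INT,
      status STRING
    )
    PARTITIONED BY (ds STRING);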


Example: Facebook Status (2)

Query to work out the frequency of status updates based on gender and academic institution:

    FROM (SELECT a.status, b.school, b.gender
          FROM status_updates a JOIN profiles b
          ON (a.userid = b.userid and
              a.ds='2013-05-26')
         ) subq1
    INSERT OVERWRITE TABLE gender_summary
      PARTITION(ds='2013-05-26')
      SELECT subq1.gender, COUNT(1) GROUP BY subq1.gender
    INSERT OVERWRITE TABLE school_summary
      PARTITION(ds='2013-05-26')
      SELECT subq1.school, COUNT(1) GROUP BY subq1.school


Metastore

Similar to the metastore maintained by traditional warehousing solutions such as Oracle and IBM DB2 (this distinguishes Hive from Pig or Cascading, which have no such store)

Stored in either a traditional DB such as MySQL or an FS such as NFS

Contains the following objects:
  - Database: namespace for tables
  - Table: metadata for a table including columns and their types, owner, storage, and serialization information
  - Partition: metadata for a partition; similar to the information for a table


Outline

1 Hive

2 HadoopDB

3 nCluster

4 Summary


Introduction

Two options for data analytics on shared-nothing clusters:

1 Parallel databases, such as Teradata, Oracle, etc., but:
  - Assume that failures are a rare event
  - Assume that hardware is homogeneous
  - Never tested in deployments with more than a few dozen nodes

2 MapReduce, but:
  - All the shortcomings pointed out by DeWitt and Stonebraker, as discussed before
  - At times an order of magnitude slower than parallel DBs


Hybrid

Combine the scalability and non-existent monetary cost of MapReduce with the performance of parallel DBs

HadoopDB is such a hybrid
  - Unlike Hive, Pig, Greenplum, Aster, etc., which are language- and interface-level hybrids, HadoopDB is a systems-level hybrid

Uses MapReduce as the communication layer atop a cluster of nodes running single-node DBMS instances

PostgreSQL as the database layer, Hadoop as the communication layer, and Hive as the translation layer

Commercialized through the startup Hadapt [2]

[2] http://hadapt.com/


HadoopDB

Consists of four components:

1 Database Connector: Interface between per-node database systems and Hadoop TaskTrackers

2 Catalog: Meta-information about per-node databases

3 Data Loader: Data partitioning across single-node databases

4 SQL to MapReduce to SQL (SMS) Planner: Translation between SQL and MapReduce


HadoopDB Architecture


Database Connector

Uses the Java Database Connectivity (JDBC)-compliant Hadoop InputFormat

The connector is served the SQL query and other information by the MapReduce job

The connector connects to the DB, executes the SQL query, and returns results in the form of key/value pairs

Hadoop in essence sees the DB as just another data source


Catalog

Contains information such as:
1 Connection parameters, such as DB location, format, and any credentials
2 Metadata about the datasets, replica locations, and the partitioning scheme

Stored as an XML file on the HDFS


Data Loader

Consists of two key components:

1 Global Hasher: Executes a custom Hadoop job to repartition raw data files from the HDFS into n parts, where n is the number of nodes in the cluster

2 Local Hasher: Copies a partition from the HDFS to the node-local DB of each node and further partitions it into smaller chunks


SQL to MapReduce to SQL (SMS) Planner

Extends HiveQL in two key ways:

1 Before query execution, the Hive Metastore is updated with references to HadoopDB tables, table schemas, formats, and serialization information

2 All operators whose partitioning key matches that of the node-local databases are converted into SQL queries and pushed down to the database layer
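
A hedged illustration of the push-down idea (table, columns, and partitioning are hypothetical, not taken from the HadoopDB paper): when the GROUP BY key matches the key on which the node-local databases are partitioned, the aggregation can be evaluated entirely inside PostgreSQL, and the MapReduce layer only collects the per-node results.

    -- HiveQL aggregation submitted to HadoopDB; assume sales is hash-partitioned
    -- on customer_id across the node-local PostgreSQL instances.
    SELECT customer_id, SUM(amount) AS total
    FROM sales
    WHERE ds = '2013-05-26'
    GROUP BY customer_id;

    -- Because the GROUP BY key matches the partitioning key, the SMS planner can
    -- push the whole query down; each node runs roughly this SQL locally and the
    -- map-only Hadoop job just concatenates the per-node outputs.
    SELECT customer_id, SUM(amount) AS total
    FROM sales
    WHERE ds = '2013-05-26'
    GROUP BY customer_id;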


Outline

1 Hive

2 HadoopDB

3 nCluster

4 Summary


Introduction

The declarative nature of SQL is too limiting for describing most big data computation

The underlying subsystems are also suboptimal as they do not consider domain-specific optimizations

nCluster makes use of SQL/MR, a framework that inserts user-defined functions, written in any programming language, into SQL queries

By itself, nCluster is a shared-nothing parallel database geared towards analytic workloads

Originally designed by Aster Data Systems and later acquired by Teradata

Used by Barnes and Noble, LinkedIn, SAS, etc.


SQL/MR Functions

Dynamically polymorphic: input and output schemas are decided at runtime

Parallelizable across cores and machines

Composable because their input and output behaviour is identical to that of SQL subqueries

Amenable to static and dynamic optimizations just like SQL subqueries or a relation

Can be implemented in a number of languages, including Java, C#, C++, Python, etc., and can thus make use of third-party libraries

Executed within separate processes to provide sandboxing and resource allocation


Syntax

    SELECT ...
    FROM functionname(
      ON table-or-query
      [PARTITION BY expr, ...]
      [ORDER BY expr, ...]
      [clausename(arg, ...) ...]
    )
    ...

SQL/MR function appears in the FROM clause

ON is the only required clause which specifies the input to the function

PARTITION BY partitions the input to the function on one or more attributes from the schema
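
A minimal, hypothetical invocation with only the required ON clause; the function name and output columns are illustrative, not part of any shipped library:

    -- parse_request is a hypothetical row function applied to every row of web_clicks;
    -- its output schema (host, path) is decided at runtime via the contract.
    SELECT host, path
    FROM parse_request(
      ON web_clicks
    );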


Syntax (2)


ORDER BY sorts the input to the function and can only be used after a PARTITION BY clause

Any number of custom clauses can also be defined, whose names and arguments are passed as a key/value map to the function

Implemented as relations, so easily nestable
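
A fuller hypothetical invocation exercising PARTITION BY, ORDER BY, and a custom argument clause; the function, table, column, and clause names are illustrative:

    -- sessionize is a hypothetical partition function: each user's clicks are grouped
    -- together, processed in timestamp order, and TIMEOUT(...) is handed to the
    -- function as a key/value argument.
    SELECT userid, session_id, click_count
    FROM sessionize(
      ON web_clicks
      PARTITION BY userid
      ORDER BY ts
      TIMEOUT('60')
    );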


Execution Model

Functions are equivalent to either map (row function) or reduce (partition function) functions

Identical to MapReduce, these functions are executed across many nodes and machines

Contracts identical to MapReduce functions:
  - Only one row function operates over a row from the input table
  - Only one partition function operates over a group of rows defined by the PARTITION BY clause, in the order specified by the ORDER BY clause


Programming Interface

A Runtime Contract is passed by the query planner to the function; it contains the names and types of the input columns and the names and values of the argument clauses

The function then completes this contract by filling in the output schema and making a call to complete()

Row and partition functions are implemented through the operateOnSomeRows and operateOnPartition methods, respectively
  - These methods are passed an iterator over their input rows and an emitter object for returning output rows to the database

operateOnPartition can also optionally implement the combiner interface


Installation

Functions need to be installed before they can be used

Can be supplied as a .zip along with third-party libraries

Install-time examination also enables static analysis of properties, such as whether the function is a row or partition function, support for combining, etc.

Any arbitrary file, such as configuration files or binaries, can be installed and is replicated to all workers

Each function is provided with a temporary directory which is garbage collected after execution


Architecture

One or more Queen nodes process queries and hash-partition them across Worker nodes

The query planner honours the Runtime Contract with the function and invokes its initializer (the constructor in the case of Java)

Functions are executed within the Worker databases as separate processes for isolation, security, resource allocation, forced termination, etc.

The worker database implements a “bridge” which manages its communication with the SQL/MR function

The SQL/MR function process contains a “runner” which manages its communication with the worker database (a generic sketch of this separation follows this slide)

Zubair Nabi 12: MapReduce and DBMS Hybrids May 26, 2013 29 / 37
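
The worker-side “bridge” and function-side “runner” are internal to nCluster, so the following is only a generic, self-contained illustration of the underlying idea: run the function as a separate OS process and exchange rows with it over a pipe. The executable name and the newline-delimited row encoding are assumptions; this is not nCluster's actual protocol.

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;

// Generic sketch of process isolation with a pipe-based "bridge".
public class BridgeSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical function executable standing in for the SQL/MR runner.
        Process runner = new ProcessBuilder("./sqlmr-function-runner").start();

        // Bridge role: stream input rows to the function process.
        BufferedWriter toRunner = new BufferedWriter(
                new OutputStreamWriter(runner.getOutputStream()));
        for (String row : new String[] {"row-1", "row-2", "row-3"}) {
            toRunner.write(row);
            toRunner.newLine();
        }
        toRunner.close();  // flush and signal end of input

        // Collect the rows the runner side emits back to the database.
        BufferedReader fromRunner = new BufferedReader(
                new InputStreamReader(runner.getInputStream()));
        String emitted;
        while ((emitted = fromRunner.readLine()) != null) {
            System.out.println("emitted: " + emitted);
        }
        fromRunner.close();
        runner.waitFor();
    }
}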

Page 112: MapReduce and DBMS Hybrids

Architecture (2)

Zubair Nabi 12: MapReduce and DBMS Hybrids May 26, 2013 30 / 37

Page 113: MapReduce and DBMS Hybrids

Example: Wordcount

SELECT token, COUNT(*)
FROM tokenizer(
    ON input-table
    DELIMITER(' ')
)
GROUP BY token;

Zubair Nabi 12: MapReduce and DBMS Hybrids May 26, 2013 31 / 37
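
Under the same assumed helper names as the row-function sketch after the Programming Interface slide, the tokenizer invoked above could implement operateOnSomeRows roughly as follows; the DELIMITER argument clause would be read from the Runtime Contract in the constructor. The accessors are illustrative, not the documented API.

// Inside a Tokenizer row function (sketch; iterator/emitter helpers assumed).
public void operateOnSomeRows(RowIterator inputIterator,
                              RowEmitter outputEmitter) {
    while (inputIterator.advanceToNextRow()) {
        // Emit one output row per token; the outer query then performs the
        // GROUP BY token / COUNT(*) aggregation.
        for (String token : inputIterator.getStringAt(0).split(delimiter)) {
            outputEmitter.addString(token);
            outputEmitter.emitRow();
        }
    }
}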

Page 114: MapReduce and DBMS Hybrids

Example: Clickstream Sessionization

Divide a user’s clicks on a website into sessions

A session includes the user’s clicks within a specified time period (example input and output below; a standalone sketch of the rule follows this slide)

Input clicks:

Timestamp   User ID
10:00:00    238909
00:58:24    7656
10:00:24    238909
02:30:33    7656
10:01:23    238909
10:02:40    238909

Output with session IDs:

Timestamp   User ID   Session ID
10:00:00    238909    0
10:00:24    238909    0
10:01:23    238909    0
10:02:40    238909    1
00:58:24    7656      0
02:30:33    7656      1

Zubair Nabi 12: MapReduce and DBMS Hybrids May 26, 2013 32 / 37
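
To make the rule concrete, here is a small self-contained sketch (plain Java, not SQL/MR code) that assigns session IDs to one user's time-ordered clicks given a timeout in seconds; it reproduces user 238909 from the tables above with the 60-second timeout used on the next slide. All names here are illustrative.

import java.time.Duration;
import java.time.LocalTime;
import java.util.ArrayList;
import java.util.List;

// Standalone illustration of the sessionization rule: within one user's
// clicks ordered by timestamp, start a new session whenever the gap to the
// previous click exceeds the timeout.
public class SessionRule {

    static List<Integer> assignSessions(List<LocalTime> orderedClicks, long timeoutSeconds) {
        List<Integer> sessionIds = new ArrayList<>();
        int session = 0;
        LocalTime previous = null;
        for (LocalTime click : orderedClicks) {
            if (previous != null
                    && Duration.between(previous, click).getSeconds() > timeoutSeconds) {
                session++;  // gap larger than the timeout starts a new session
            }
            sessionIds.add(session);
            previous = click;
        }
        return sessionIds;
    }

    public static void main(String[] args) {
        // User 238909 from the example table, ordered by timestamp, timeout 60s.
        List<LocalTime> clicks = List.of(
                LocalTime.parse("10:00:00"), LocalTime.parse("10:00:24"),
                LocalTime.parse("10:01:23"), LocalTime.parse("10:02:40"));
        System.out.println(assignSessions(clicks, 60));  // prints [0, 0, 0, 1]
    }
}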

Page 117: MapReduce and DBMS Hybrids

Example: Clickstream Sessionization (2)

SELECT ts, userid, session
FROM sessionize (
    ON clicks
    PARTITION BY userid
    ORDER BY ts
    TIMECOLUMN ('ts')
    TIMEOUT (60)
);

Zubair Nabi 12: MapReduce and DBMS Hybrids May 26, 2013 33 / 37

Page 118: MapReduce and DBMS Hybrids

Example: Clickstream Sessionization (3)

public class Sessionize implements PartitionFunction {

    private int timeColumnIndex;
    private int timeout;

    public Sessionize(RuntimeContract contract) {
        // Get time column and timeout from contract
        // Define output schema
        contract.complete();
    }

    public void operateOnPartition(
            PartitionDefinition partition,
            RowIterator inputIterator,
            RowEmitter outputEmitter) {
        // Implement the partition function logic
        // Emit output rows
    }

}

Zubair Nabi 12: MapReduce and DBMS Hybrids May 26, 2013 34 / 37
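
One way the placeholder comments in the skeleton above might be filled in, reusing the session rule from the standalone sketch earlier. This method is meant to drop into the Sessionize class above; the iterator, emitter and column accessors used here (advanceToNextRow, getLongAt, getStringAt, addLong, addString, addInt, emitRow) are assumed names for illustration, and the timestamp is simplified to seconds since epoch.

// Hedged completion of operateOnPartition for the skeleton above; all row
// accessors and emitters used here are assumed, not the documented API.
public void operateOnPartition(
        PartitionDefinition partition,
        RowIterator inputIterator,
        RowEmitter outputEmitter) {
    int session = 0;
    long previousTs = Long.MIN_VALUE;
    while (inputIterator.advanceToNextRow()) {                   // assumed helper
        // Rows arrive ordered by ts because of ORDER BY in the query;
        // timestamps are simplified to seconds since epoch.
        long ts = inputIterator.getLongAt(timeColumnIndex);      // assumed helper
        if (previousTs != Long.MIN_VALUE && ts - previousTs > timeout) {
            session++;  // gap exceeds TIMEOUT: start a new session
        }
        outputEmitter.addLong(ts);                               // ts column
        outputEmitter.addString(inputIterator.getStringAt(1));   // userid column
        outputEmitter.addInt(session);                           // session column
        outputEmitter.emitRow();
        previousTs = ts;
    }
}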

Page 119: MapReduce and DBMS Hybrids

Outline

1 Hive

2 HadoopDB

3 nCluster

4 Summary

Zubair Nabi 12: MapReduce and DBMS Hybrids May 26, 2013 35 / 37

Page 120: MapReduce and DBMS Hybrids

Summary

Hive, HadoopDB, and nCluster explore three different points in the design space

1 Hive uses MapReduce to give DBMS-like functionality

2 HadoopDB uses MapReduce and a DBMS side by side

3 nCluster implements MapReduce within a DBMS

Zubair Nabi 12: MapReduce and DBMS Hybrids May 26, 2013 36 / 37

Page 124: MapReduce and DBMS Hybrids

References

1 Ashish Thusoo, Joydeep Sen Sarma, Namit Jain, Zheng Shao, Prasad Chakka, Suresh Anthony, Hao Liu, Pete Wyckoff, and Raghotham Murthy. 2009. Hive: a warehousing solution over a map-reduce framework. Proc. VLDB Endow. 2, 2 (August 2009), 1626-1629.

2 Azza Abouzeid, Kamil Bajda-Pawlikowski, Daniel Abadi, Avi Silberschatz, and Alexander Rasin. 2009. HadoopDB: an architectural hybrid of MapReduce and DBMS technologies for analytical workloads. Proc. VLDB Endow. 2, 1 (August 2009), 922-933.

3 Eric Friedman, Peter Pawlowski, and John Cieslewicz. 2009. SQL/MapReduce: a practical approach to self-describing, polymorphic, and parallelizable user-defined functions. Proc. VLDB Endow. 2, 2 (August 2009), 1402-1413.

Zubair Nabi 12: MapReduce and DBMS Hybrids May 26, 2013 37 / 37