Page 1: HDInsight in Windows Azuredownload.microsoft.com/download/1/2/2/.../2014.04.22_HDInsightIn… · 22.04.2014  · HDInsight in Windows Azure R 1.00 4 HDInsight Versions on Azure Component

2013 © Trivadis

BASEL BERN BRUGG LAUSANNE ZUERICH DUESSELDORF FRANKFURT A.M. FREIBURG I.BR. HAMBURG MUNICH STUTTGART VIENNA


HDInsight in Windows Azure

Marc Schöni

Meinrad Weiss

April 2014

04.03.2014 · HDInsight in Windows Azure R 1.00


Introduction

HDInsight on Windows Azure


Big data solutions deal with the complexities of:

• VOLUME (size)
• VARIETY (structure)
• VELOCITY (speed)

[Figure: Big Data and VALUE at the centre, with the Focus area of Hadoop/HDInsight highlighted]


HDInsight Versions on Azure

Component                 V 1.6    V 2.1              V 3 (Preview)
Apache Hadoop             1.0.3    1.2.0              2.2.0
Apache Hive               0.9.0    0.11.0             0.12.0
Apache Pig                0.9.3    0.11               0.12
Apache Sqoop              1.4.2    1.4.3              1.4.4
Apache Oozie              3.2.0    3.2.2              4.0.0
Apache HCatalog           0.4.1    Merged with Hive   Merged with Hive
Apache Templeton          0.1.4    Merged with Hive   Merged with Hive
Ambari                    -        API v1.0           API v1.0
SQL Server JDBC Driver    3        No information     No information

Source: http://www.windowsazure.com/en-us/documentation/articles/hdinsight-component-versioning/


Hadoop Zoo


[Figure: Windows Azure HDInsight Service architecture. HDFS and Windows Azure Blob Storage sit behind the Hadoop Filesystem Interface; the components above it are grouped as Query & Metadata, Data Movement, Workflow and Monitoring]


Windows Azure Blob Storage


Windows Azure HDInsight Service


Windows Azure HDInsight Service

[Figure: service overview with a highlighted "Focus" area]


Windows Azure Storage

• Scalable, durable, and available
• Access from anywhere, at any time
• Pay only for what you use
• Use from Windows Azure Compute
• Use from anywhere on the internet


Datacenters and Regions

[Figure: map of the Windows Azure datacenter regions Northern Europe, Western Europe, South Central US, West US and East US]


Local replication only (Geo turned off):

• Higher durability: 3 local replicas in the primary location
• Local replicas are synchronously replicated
• Common failures (disk, node, rack): local copies are used to recover
• Major disasters: the customer is contacted about potential data loss
• Reduced price: 23-34% lower, depending on how much you store
• Turn off Geo for your storage account in the portal when:
  – the data is non-critical and can be recreated after a major disaster
  – the application manages its own replica
  – the company has restrictions on geographic locations

Geo replication (enabled by default):

• Highest level of durability: 3 local replicas each in the primary and the secondary location
• Local replicas are synchronously replicated
• The geo replica is asynchronously replicated
• Common failures (disk, node, rack): local copies are used to recover
• Major disasters: the geo-replicated copy (400+ miles away) is used
• Price remains the same as before


Blob Storage Concepts

• Store large amounts of unstructured text or binary data with the fastest read performance

• Highly scalable, durable, and available file system

• Blobs can be exposed publicly over HTTP

• Securely lock down permissions to blobs
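HDInsight reads blobs through the wasb:// scheme (wasbs:// for TLS), while public access uses the plain HTTP endpoint of the same blob. The following is a small sketch of how the two address forms relate. It assumes the default blob.core.windows.net endpoint. The helper names and the account, container and path values are illustrative placeholders and do not come from the slides.

```python
def blob_http_url(account: str, container: str, blob: str) -> str:
    """Public HTTPS URL of a blob (usable when the container allows public read)."""
    return f"https://{account}.blob.core.windows.net/{container}/{blob.lstrip('/')}"


def wasb_uri(account: str, container: str, path: str, secure: bool = False) -> str:
    """Hadoop-style URI through which an HDInsight cluster reads the same blob."""
    scheme = "wasbs" if secure else "wasb"
    return f"{scheme}://{container}@{account}.blob.core.windows.net/{path.lstrip('/')}"


# Placeholder account/container names, not taken from the slides.
print(blob_http_url("mystorage", "data", "FlightDelay/Weather/2012/weather.csv"))
print(wasb_uri("mystorage", "data", "/FlightDelay/Weather/2012/weather.csv"))
```

The blob name is the same in both forms. Only the scheme and the position of the container name differ.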


Setting up the Windows Azure storage account

[Figure: creating Azure Blob storage in the Azure Portal]


Set Up New Storage


Move Data to Azure Blob Storage

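Ways to move data into Azure Blob storage:

• PowerShell:

    # $storageAccountName, $storageAccountKey and $containername must be set beforehand
    $context = New-AzureStorageContext -StorageAccountName $storageAccountName -StorageAccountKey $storageAccountKey
    Set-AzureStorageBlobContent `
        -File "C:...\2011\Weather2011_H1_JustData.csv" `
        -Container $containername `
        -Blob "FlightDelay/.../2011/Weather2011_H1_JustData.csv" `
        -Context $context

• A tool like CloudBerry, using drag & drop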


Windows Azure HDInsight Service


Windows Azure HDInsight Service


Setting up the Windows Azure HDInsight cluster

[Figure: the HDInsight Console and a Windows Azure HDInsight cluster backed by Azure Blob storage]


Set Up New Cluster


Provision Cluster via PowerShell

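# Create a new HDInsight 3.0 cluster with default storage and a custom Hive metastore.
# $clusterNodes, $storageAccountName_Default, $storageAccountKey_Default, $containerName_Default,
# $hiveSQLDatabaseServerName, $hiveSQLDatabaseName, $hiveCreds, $clusterName, $location
# and $clusterCreds must be set beforehand.
# The backtick must be the last character on a line to continue the pipeline.
$config = New-AzureHDInsightClusterConfig -ClusterSizeInNodes $clusterNodes `
    | Set-AzureHDInsightDefaultStorage `
        -StorageAccountName "$storageAccountName_Default.blob.core.windows.net" `
        -StorageAccountKey $storageAccountKey_Default `
        -StorageContainerName $containerName_Default `
    | Add-AzureHDInsightMetastore `
        -SqlAzureServerName "$hiveSQLDatabaseServerName.database.windows.net" `
        -DatabaseName $hiveSQLDatabaseName `
        -Credential $hiveCreds `
        -MetastoreType HiveMetastore `
    | New-AzureHDInsightCluster `
        -Version "3.0" `
        -Name $clusterName `
        -Location $location `
        -Credential $clusterCreds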


MapReduce


Hadoop MapReduce

• Programming framework (library and runtime) for analyzing datasets stored in HDFS
• Composed of user-supplied Map and Reduce functions:
  – Map(): subdivide and conquer
  – Reduce(): combine and reduce cardinality

1. Divide a large problem into sub-problems.
2. Perform the same function (Do work()) on all sub-problems.
3. Combine the output from all sub-functions.
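The three steps can be simulated in a single process to make the Map()/Reduce() contract concrete. The following is a minimal in-memory sketch. It uses a word count as the example job, which is an assumption and not taken from the slides. On a cluster, Hadoop runs each step in parallel on many nodes and moves the grouped data over the network.

```python
from collections import defaultdict
from typing import Callable, Iterable, Iterator, Tuple

KV = Tuple[str, int]


def run_mapreduce(
    chunks: Iterable[str],
    mapper: Callable[[str], Iterator[KV]],
    reducer: Callable[[str, list], KV],
) -> dict:
    # 1. The input arrives already divided into sub-problems (chunks).
    # 2. Apply the same map function to every chunk and group the output by key.
    groups = defaultdict(list)
    for chunk in chunks:
        for key, value in mapper(chunk):
            groups[key].append(value)
    # 3. Combine the grouped values of each key with the reduce function.
    return dict(reducer(key, values) for key, values in sorted(groups.items()))


def word_mapper(chunk: str) -> Iterator[KV]:
    for word in chunk.lower().split():
        yield word, 1


def sum_reducer(key: str, values: list) -> KV:
    return key, sum(values)


result = run_mapreduce(["the cat", "the dog", "a cat"], word_mapper, sum_reducer)
print(result)  # {'a': 1, 'cat': 2, 'dog': 1, 'the': 2}
```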


MapReduce

• Rapidly processes vast amounts of data in parallel on a large cluster of compute nodes
• The framework schedules and monitors tasks, and re-executes failed tasks
• Typically, both input and output are stored in the file system

[Figure: Map phase with a Mapper on DataNodes 1, 2 and 3; Shuffle/Sort, where the data is shuffled across the network and sorted; Reduce phase with Reducers on DataNodes 1 and 3]
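Between the two phases, each mapper's output is partitioned so that all values for one key reach the same reducer, and each reducer sees its keys in sorted order. The sketch below shows that shuffle/sort step. It assumes hash partitioning in the style of Hadoop's default HashPartitioner. zlib.crc32 stands in for Java's hashCode(), so the concrete key-to-reducer assignment differs from a real cluster.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple


def partition(key: str, num_reducers: int) -> int:
    # Deterministic hash -> reducer index (Hadoop uses key.hashCode() instead).
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def shuffle_sort(
    mapper_outputs: List[List[Tuple[str, int]]], num_reducers: int
) -> List[Dict[str, List[int]]]:
    """Route every (key, value) pair to one reducer and sort the keys per reducer."""
    buckets: List[Dict[str, List[int]]] = [defaultdict(list) for _ in range(num_reducers)]
    for output in mapper_outputs:  # one list per mapper / DataNode
        for key, value in output:
            buckets[partition(key, num_reducers)][key].append(value)
    # Each reducer receives its keys in sorted order, with the values grouped per key.
    return [dict(sorted(bucket.items())) for bucket in buckets]


mappers = [[("b", 1), ("a", 2)], [("a", 3), ("c", 4)], [("b", 5)]]
for i, reducer_input in enumerate(shuffle_sort(mappers, num_reducers=2)):
    print(f"reducer {i}: {reducer_input}")
```

Because the partition depends only on the key, the values for "a" from two different mappers always land in the same reducer.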


Layout Windspeed Calculation

StationID  Date        Windspeed
123        22.01.2012  31
124        22.01.2012  34
125        22.01.2012  22
126        22.01.2012  12
123        23.01.2012  26
124        23.01.2012  29
125        23.01.2012  46
126        23.01.2012  12

[Figure: the rows are distributed across Compute Node 1 and Compute Node 2, one date per node]


Layout Windspeed Calculation

[Figure: one data node holds the 22.01.2012 rows, the other the 23.01.2012 rows]

Map (on each data node):
  22.01.2012 rows (windspeeds 31, 34, 22, 12) -> Key: Max, Value: 34
  23.01.2012 rows (windspeeds 26, 29, 46, 12) -> Key: Max, Value: 46

Reduce:
  (Max, 34), (Max, 46) -> Key: Max, Value: 46
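The windspeed example can be replayed with the two daily blocks from the slide, one block per data node. In this sketch each block is assumed to be a list of (StationID, Date, Windspeed) rows. Each mapper emits a single ("Max", local maximum) pair, and the reducer keeps the largest value.

```python
from typing import List, Tuple

Row = Tuple[int, str, int]  # (StationID, Date, Windspeed)

# The two daily blocks from the slide, one per data node.
block_22_01: List[Row] = [(123, "22.01.2012", 31), (124, "22.01.2012", 34),
                          (125, "22.01.2012", 22), (126, "22.01.2012", 12)]
block_23_01: List[Row] = [(123, "23.01.2012", 26), (124, "23.01.2012", 29),
                          (125, "23.01.2012", 46), (126, "23.01.2012", 12)]


def map_max(block: List[Row]) -> Tuple[str, int]:
    # Map: runs locally on one data node and emits one key-value pair.
    return "Max", max(windspeed for _, _, windspeed in block)


def reduce_max(pairs: List[Tuple[str, int]]) -> Tuple[str, int]:
    # Reduce: combines the partial maxima into the global maximum.
    return "Max", max(value for _, value in pairs)


partials = [map_max(block_22_01), map_max(block_23_01)]
print(partials)              # [('Max', 34), ('Max', 46)]
print(reduce_max(partials))  # ('Max', 46)
```

Only two small pairs cross the network instead of all eight rows. This is why maximum-style aggregations map well onto MapReduce.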


Hadoop Streaming Process


HDInsight .NET Support for MapReduce

• Microsoft .NET MapReduce API for Hadoop, available as a NuGet package

• Execute jobs through PowerShell

• Collect the results in HDFS or directly in WASB storage


Creating a C# Mapper Program

• Reads in weather data through stdin

• Calculates the max windspeed

• Outputs key-value pair to stdout
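Hadoop Streaming only requires an executable that reads input lines from stdin and writes tab-separated key/value lines to stdout. That is why the mapper can be a plain console application. The following is a Python sketch of the same logic, not the presenters' C# code. It assumes comma-separated input with the windspeed in the third column (StationID, Date, Windspeed, as in the example data) and silently skips lines that do not parse.

```python
import sys
from typing import Iterable, Iterator, TextIO


def map_max_windspeed(lines: Iterable[str], windspeed_col: int = 2) -> Iterator[str]:
    """Emit one 'Max<TAB>value' line with the largest windspeed in this input split."""
    best = None
    for line in lines:
        fields = line.rstrip("\n").split(",")
        try:
            speed = float(fields[windspeed_col])
        except (IndexError, ValueError):
            continue  # skip header lines and malformed rows
        if best is None or speed > best:
            best = speed
    if best is not None:
        yield f"Max\t{best:g}"


def main(stdin: TextIO = sys.stdin, stdout: TextIO = sys.stdout) -> None:
    # Hadoop Streaming feeds the input split to stdin and collects stdout.
    for record in map_max_windspeed(stdin):
        stdout.write(record + "\n")
```

To use it as a streaming mapper, save it as mapper.py, end the file with if __name__ == "__main__": main(), and pass it to Hadoop Streaming with the -mapper option.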


Creating a C# Mapper

• Simple console application
• No special libraries needed

Input:
StationID  Date        Windspeed
123        23.01.2012  26
124        23.01.2012  29
125        23.01.2012  46
126        23.01.2012  12

Output (key, value): Max 46


Demo: Creating a C# Reducer Program

• Reads in key-value pairs through stdin

• Calculates the max windspeed

• Outputs the result to stdout
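The reducer reads the mappers' key/value lines from stdin in the same way. This Python sketch assumes tab-separated lines such as Max<TAB>46. It keeps the maximum per key in a dictionary, so it does not depend on Hadoop having sorted the input, and it ignores malformed lines. It is an illustration, not the presenters' C# program.

```python
import sys
from typing import Dict, Iterable, Iterator, TextIO


def reduce_max_lines(lines: Iterable[str]) -> Iterator[str]:
    """Combine 'key<TAB>value' lines into one 'key<TAB>maximum' line per key."""
    maxima: Dict[str, float] = {}
    for line in lines:
        key, sep, value = line.rstrip("\n").partition("\t")
        if not sep:
            continue  # no tab separator: not a key/value line
        try:
            number = float(value)
        except ValueError:
            continue
        if key not in maxima or number > maxima[key]:
            maxima[key] = number
    for key in sorted(maxima):
        yield f"{key}\t{maxima[key]:g}"


def main(stdin: TextIO = sys.stdin, stdout: TextIO = sys.stdout) -> None:
    for record in reduce_max_lines(stdin):
        stdout.write(record + "\n")
```

Saved as reducer.py with an if __name__ == "__main__": main() guard, it can be tested against local data before submitting the job, for example cat weather.csv | python mapper.py | sort | python reducer.py. Here mapper.py stands for any streaming mapper that emits Max<TAB>value lines.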


Creating a C# Reducer

• Same program structure as the mapper

Input (key, value): Max 46, Max 34
Output (key, value): Max 46


Microsoft .NET SDK for Hadoop on CodePlex
http://hadoopsdk.codeplex.com/

HDInsight Interactive JavaScript and Hive Consoles
http://www.windowsazure.com/en-us/manage/services/hdinsight/interactive-javascript-and-hive-consoles/


Hive & HiveQL (HQL)


Hive architecture

• Built on top of Hadoop to provide data management, querying, and analysis

• Access and query data through simple SQL-like statements, called Hive queries

• In short, Hive compiles the queries into MapReduce jobs, and Hadoop executes them


Create, load, and query Hive tables

• HiveQL includes data definition language, data import/export and data manipulation language statements

• See https://cwiki.apache.org/confluence/display/Hive/LanguageManual


Demo: Create and Load Hive Tables

[Figure: from the PowerShell Console, Hive on Windows Azure HDInsight creates a Hive table, a partitioned Hive table and a bucketed table; the demo steps are a CASE statement, table partitioning, a join and the "Cluster by" clause, each returning query results]


Hive: Create Table

Define data structure on top of HDFS Files

CREATE EXTERNAL TABLE Weather_Data
  (Station INT
  ,Date STRING
  ,Visibility DOUBLE
  ,Windspeed DOUBLE
  ,Latitude DOUBLE
  ,Longitude DOUBLE)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE LOCATION '$rootpart/FlightDelay/Weather/2012';

External table: the data files are not bound to the schema (DROP TABLE does not delete the corresponding files).
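An external table is only a schema that is applied when the files are read. The sketch below mirrors that schema-on-read step for the Weather_Data definition. It assumes the behaviour of Hive's default text format, which returns NULL (None here) when a field is missing or cannot be converted to the declared type. The sample line is invented.

```python
from typing import Callable, List, Optional, Tuple

# Column names and types from the CREATE EXTERNAL TABLE Weather_Data statement.
SCHEMA: List[Tuple[str, Callable[[str], object]]] = [
    ("Station", int),
    ("Date", str),
    ("Visibility", float),
    ("Windspeed", float),
    ("Latitude", float),
    ("Longitude", float),
]


def read_row(line: str, delimiter: str = ",") -> dict:
    """Apply the table schema to one line of a text file (schema on read)."""
    fields = line.rstrip("\n").split(delimiter)
    row = {}
    for i, (name, cast) in enumerate(SCHEMA):
        raw: Optional[str] = fields[i] if i < len(fields) else None
        try:
            row[name] = cast(raw) if raw is not None else None
        except ValueError:
            row[name] = None  # unconvertible value -> NULL; the row itself is kept
    return row


print(read_row("3013,20120101,10,n/a,42.3,-71.0"))
```

Nothing is validated when the file is written. A bad value only shows up as NULL when a query reads it, and dropping the table leaves the file untouched.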


Hive: Create Partitioned Table

CREATE EXTERNAL TABLE ExtUSWeatherTypedDataPartitioned
  (Station INT
  ,Date DATE
  ,Visibility DOUBLE
  ,Windspeed DOUBLE
  ,Latitude DOUBLE
  ,Longitude DOUBLE)
PARTITIONED BY (year STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION '$rootpart/FlightDelay/ExtUSWeatherDataPartitioned';

-- Register the directory year=2011 as the partition year='2011'
ALTER TABLE ExtUSWeatherTypedDataPartitioned
  ADD PARTITION (year='2011')
  LOCATION '$rootpart/FlightDelay/ExtUSWeatherDataPartitioned/year=2011';

Partition key year='2011' -> data in /year=2011/
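Each partition value maps to its own subdirectory (year=2011/), so a query that filters on the partition column only has to read the matching directories. The following sketch of that partition pruning works on a plain list of file paths. The paths are invented for illustration.

```python
from typing import Dict, List


def partition_spec(path: str) -> Dict[str, str]:
    """Extract the key=value directory names of a path, e.g. {'year': '2011'}."""
    spec = {}
    for part in path.split("/"):
        key, sep, value = part.partition("=")
        if sep:
            spec[key] = value
    return spec


def prune(paths: List[str], **wanted: str) -> List[str]:
    """Keep only the files whose partition directories match every filter."""
    return [p for p in paths
            if all(partition_spec(p).get(k) == v for k, v in wanted.items())]


files = [
    "ExtUSWeatherDataPartitioned/year=2011/part-0000.csv",
    "ExtUSWeatherDataPartitioned/year=2012/part-0000.csv",
    "ExtUSWeatherDataPartitioned/year=2012/part-0001.csv",
]
# Roughly what WHERE year = '2012' allows Hive to do: skip the year=2011 directory.
print(prune(files, year="2012"))
```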


A view is a purely logical object with no associated storage, as in a regular relational database.


Create View

CREATE VIEW StrongWind AS
SELECT
   Station
  ,Date
  ,Visibility
  ,Windspeed
  ,Latitude
  ,Longitude
FROM Weather_Data
WHERE Windspeed > 10;
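Because a view stores only its query, every read re-runs the query against the current base data. The minimal analogy below makes that point. The View class and the sample rows are hypothetical stand-ins for the Hive view and the Weather_Data table.

```python
from typing import Callable, Dict, Iterator, List

Row = Dict[str, float]


class View:
    """Stores a query (here: a filter), not data; rows are computed on every read."""

    def __init__(self, base: List[Row], predicate: Callable[[Row], bool]):
        self.base = base
        self.predicate = predicate

    def rows(self) -> Iterator[Row]:
        return (row for row in self.base if self.predicate(row))


weather: List[Row] = [{"Station": 1, "Windspeed": 8.0}, {"Station": 2, "Windspeed": 12.5}]
strong_wind = View(weather, lambda r: r["Windspeed"] > 10)
print(list(strong_wind.rows()))  # only station 2
weather.append({"Station": 3, "Windspeed": 20.0})
print(list(strong_wind.rows()))  # station 3 appears without refreshing the view
```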


Using the Hive ODBC driver

• Connector to HDInsight Hive, available as part of HDInsight Hadoop clusters

• Enables business intelligence, analytics, and reporting on data in Hive


Hive SQL Datatypes and Hive SQL Semantics


HQL Date Functions

Return type, function – description:

• int year(string date) – Returns the year part of a date or timestamp string: year("1970-01-01 00:00:00") = 1970, year("1970-01-01") = 1970
• int month(string date) – Returns the month part of a date or timestamp string: month("1970-11-01 00:00:00") = 11, month("1970-11-01") = 11
• int day(string date), dayofmonth(string date) – Returns the day part of a date or timestamp string: day("1970-11-01 00:00:00") = 1, day("1970-11-01") = 1
• int hour(string date) – Returns the hour of the timestamp: hour('2009-07-30 12:58:59') = 12, hour('12:58:59') = 12
• int minute(string date) – Returns the minute of the timestamp
• int second(string date) – Returns the second of the timestamp
• int weekofyear(string date) – Returns the week number of a timestamp string: weekofyear("1970-11-01 00:00:00") = 44, weekofyear("1970-11-01") = 44
• int datediff(string enddate, string startdate) – Returns the number of days from startdate to enddate: datediff('2009-03-01', '2009-02-27') = 2
• string date_add(string startdate, int days) – Adds a number of days to startdate: date_add('2008-12-31', 1) = '2009-01-01'
• string date_sub(string startdate, int days) – Subtracts a number of days from startdate: date_sub('2008-12-31', 1) = '2008-12-30'

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-DateFunctions
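To check expected query results outside the cluster, the documented semantics can be mirrored in a few lines. This Python sketch reproduces the examples in the list above. It assumes ISO-8601 week numbering for weekofyear, which matches the 1970-11-01 = 44 example, and it ignores Hive's NULL handling for invalid input.

```python
from datetime import date, datetime, timedelta
from typing import List


def _to_date(value: str) -> date:
    # Accepts 'yyyy-MM-dd' or 'yyyy-MM-dd HH:mm:ss'; only the date part is used.
    return datetime.strptime(value[:10], "%Y-%m-%d").date()


def _time_fields(value: str) -> List[int]:
    # Accepts 'yyyy-MM-dd HH:mm:ss' or a bare 'HH:mm:ss'.
    return [int(part) for part in value.split(" ")[-1].split(":")]


def year(value: str) -> int:
    return _to_date(value).year


def month(value: str) -> int:
    return _to_date(value).month


def day(value: str) -> int:
    return _to_date(value).day


def hour(value: str) -> int:
    return _time_fields(value)[0]


def minute(value: str) -> int:
    return _time_fields(value)[1]


def second(value: str) -> int:
    return _time_fields(value)[2]


def weekofyear(value: str) -> int:
    # ISO-8601 week: weeks start on Monday, week 1 has at least 4 days of the new year.
    return _to_date(value).isocalendar()[1]


def datediff(enddate: str, startdate: str) -> int:
    return (_to_date(enddate) - _to_date(startdate)).days


def date_add(startdate: str, days: int) -> str:
    return (_to_date(startdate) + timedelta(days=days)).isoformat()


def date_sub(startdate: str, days: int) -> str:
    return (_to_date(startdate) - timedelta(days=days)).isoformat()


print(weekofyear("1970-11-01"), datediff("2009-03-01", "2009-02-27"), date_add("2008-12-31", 1))
```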


Using the Hive ODBC driver

[Figure: Microsoft Excel with PowerPivot connected to Hive through the ODBC driver]


HDInsight PowerShell for Hive
http://hadoopsdk.codeplex.com/wikipage?title=Job%20Submission%20PowerShell%20cmdlets&referringTitle=Home

How to Connect Excel to Windows Azure HDInsight via HiveODBC
http://www.windowsazure.com/en-us/manage/services/hdinsight/connect-excel-with-hive-ODBC/


LINQ to Hive

• Creates and compiles LINQ queries to use against Hive data

• Translates C# or F# LINQ queries into HiveQL queries and executes them on the Hadoop cluster
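The core idea is deferred composition: query operators are chained first and rendered into a single HiveQL statement only when the query runs. The deliberately tiny analogue below uses a hypothetical HiveQuery builder with where/select methods. This is not the SDK's API, and unlike LINQ, which translates typed C# expressions, the conditions here are plain HiveQL strings. The example uses the Weather_Data table from the Hive slides.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class HiveQuery:
    """Hypothetical query builder that renders HiveQL, in the spirit of LINQ to Hive."""
    table: str
    columns: List[str] = field(default_factory=lambda: ["*"])
    filters: List[str] = field(default_factory=list)

    def where(self, condition: str) -> "HiveQuery":
        # Returns a new query object; nothing is executed yet.
        return HiveQuery(self.table, self.columns, self.filters + [condition])

    def select(self, *columns: str) -> "HiveQuery":
        return HiveQuery(self.table, list(columns), self.filters)

    def to_hiveql(self) -> str:
        sql = f"SELECT {', '.join(self.columns)} FROM {self.table}"
        if self.filters:
            sql += " WHERE " + " AND ".join(f"({c})" for c in self.filters)
        return sql


query = HiveQuery("Weather_Data").where("Windspeed > 10").select("Station", "Windspeed")
print(query.to_hiveql())
# SELECT Station, Windspeed FROM Weather_Data WHERE (Windspeed > 10)
```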


Working with LINQ Queries

[Figure: LINQ to Hive queries run through Hive on HDInsight against a Hive table]


LINQ to Hive
http://hadoopsdk.codeplex.com/wikipage?title=LINQ%20to%20Hive&referringTitle=Home

Using the Hadoop .NET SDK with the HDInsight Service
http://www.windowsazure.com/en-us/manage/services/hdinsight/howto-net-libraries/


Sqoop


Using Sqoop to Move Data

• A tool designed for efficiently transferring bulk data between Apache Hadoop and structured datastores such as relational databases
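For imports, Sqoop parallelises the transfer by running several map tasks (4 by default, set with -m). Each task reads one range of a split column, by default the primary key or the column given with --split-by, between the column's minimum and maximum value. The following is a simplified sketch of how such integer ranges can be computed. Sqoop's own splitter differs in rounding details.

```python
from typing import List, Tuple


def integer_splits(lo: int, hi: int, num_mappers: int) -> List[Tuple[int, int]]:
    """Divide [lo, hi] into at most num_mappers contiguous, non-overlapping ranges."""
    if hi < lo:
        return []
    count = hi - lo + 1
    num_mappers = max(1, min(num_mappers, count))
    size, extra = divmod(count, num_mappers)
    splits, start = [], lo
    for i in range(num_mappers):
        end = start + size - 1 + (1 if i < extra else 0)
        splits.append((start, end))  # mapper i reads: WHERE id >= start AND id <= end
        start = end + 1
    return splits


# e.g. SELECT MIN(id), MAX(id) returned 1 and 1000, imported with 4 mappers
print(integer_splits(1, 1000, 4))  # [(1, 250), (251, 500), (501, 750), (751, 1000)]
```

Even ranges only balance the load if the key values are evenly distributed. A skewed split column leaves some mappers with much more work than others.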


Using Sqoop to Copy Data

[Figure: from the PowerShell Console, Hive and Sqoop on HDInsight move data between Azure Blob storage and Windows Azure SQL Database]


Apache Sqoop Reference
http://sqoop.apache.org/docs/1.4.4/SqoopUserGuide.html

Hadoop on Windows Azure - Working With Data
http://www.windowsazure.com/en-us/develop/net/tutorials/hadoop-and-data/


Programming


Developer Friends

• Microsoft .NET SDK for Hadoop
  – WebHDFS Client
  – WebHCat
• Windows PowerShell Integration


Microsoft .NET SDK For Hadoop

• .NET client libraries for Hadoop
• Write MapReduce in Visual Studio using C# or F#
• Debug against local data

[Figure: Microsoft Visual Studio and the cluster's slave nodes]


SDK components

• MapReduce library
• LINQ to Hive client library
• WebClient library
  – WebHDFS client library
  – WebHCat client library


WebClient Libraries in .NET

• WebHDFS client library: works with files in HDFS and Windows Azure Blob storage
• WebHCat client library: manages the scheduling and execution of jobs in an HDInsight cluster

WebHDFS
• Scalable REST API
• Move files into and out of HDFS, and delete files from HDFS
• Perform file and directory functions

WebHCat
• HDInsight job scheduling and execution
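Both client libraries wrap plain HTTP endpoints. The sketch below shows the URL shapes involved. It assumes the Apache default ports, 50070 for WebHDFS and 50111 for WebHCat. On an HDInsight cluster the reachable host, port and authentication may differ, and the host and user names below are placeholders.

```python
from typing import Dict, Optional
from urllib.parse import quote, urlencode


def webhdfs_url(host: str, path: str, op: str, port: int = 50070,
                params: Optional[Dict[str, str]] = None) -> str:
    """WebHDFS REST URL, e.g. op='OPEN', 'CREATE', 'MKDIRS', 'LISTSTATUS' or 'DELETE'."""
    query = urlencode({"op": op, **(params or {})})
    return f"http://{host}:{port}/webhdfs/v1/{quote(path.lstrip('/'))}?{query}"


def webhcat_url(host: str, resource: str, port: int = 50111) -> str:
    """WebHCat (Templeton) REST URL, e.g. resource='status' or 'hive'."""
    return f"http://{host}:{port}/templeton/v1/{resource.lstrip('/')}"


# 'headnode' and 'hdp' are placeholder host and user names.
print(webhdfs_url("headnode", "/FlightDelay/Weather", "LISTSTATUS", params={"user.name": "hdp"}))
print(webhcat_url("headnode", "status"))
```

File operations map to WebHDFS operation codes on a path, while job submission and status calls go to WebHCat resources.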


Creating a Hive Table Using WebHDFS Client

[Figure: a .NET application (WebHDFS) interacts with the HDInsight cluster: copy data from the base machine to Azure Storage, then load the data into a Hive table]


Performing a Remote Job with WebHCat

[Figure: a .NET application (WebHCat) interacts with Hive tables on HDInsight, querying the Hive data using .NET code]


Windows PowerShell Integration

• Manage an HDInsight cluster using a local management console

• PowerShell scripts to build projects, import data into HDFS, and run samples

• Repeatable management through scripting


Integrating PowerShell with HDInsight

[Figure: Windows PowerShell integration with HDInsight: create a cluster, run a MapReduce program, delete the cluster]


Microsoft .NET SDK For Hadoop
http://hadoopsdk.codeplex.com/

Managing Your HDInsight Cluster with PowerShell
http://hadoopsdk.codeplex.com/wikipage?title=PowerShell%20Cmdlets%20for%20Cluster%20Management&referringTitle=Home


Questions?

