HDInsight in Windows Azure (22.04.2014)

  • 2013 © Trivadis

    BASEL BERN BRUGG LAUSANNE ZUERICH DUESSELDORF FRANKFURT A.M. FREIBURG I.BR. HAMBURG MUNICH STUTTGART VIENNA

    HDInsight in Windows Azure

    Marc Schöni

    Meinrad Weiss

    April 2014

    04.03.2014 HDInsight in Windows Azure R 1.00

    Introduction

    HDInsight on Windows Azure

    Big data solutions deal with complexities of:

    • VOLUME (Size)
    • VARIETY (Structure)
    • VELOCITY (Speed)

    [Diagram labels: Big Data; VALUE]

    Focus: Hadoop/HDInsight

    HDInsight Versions on Azure

    Component               V 1.6   V 2.1             V3 (Preview)
    Apache Hadoop           1.0.3   1.2.0             2.2.0
    Apache Hive             0.9.0   0.11.0            0.12.0
    Apache Pig              0.9.3   0.11              0.12
    Apache Sqoop            1.4.2   1.4.3             1.4.4
    Apache Oozie            3.2.0   3.2.2             4.0.0
    Apache HCatalog         0.4.1   Merged with Hive  Merged with Hive
    Apache Templeton        0.1.4   Merged with Hive  Merged with Hive
    Ambari                  -       API v1.0          API v1.0
    SQL Server JDBC Driver  3       No Information    No Information

    Source: http://www.windowsazure.com/en-us/documentation/articles/hdinsight-component-versioning/

    Hadoop Zoo

    Windows Azure HDInsight Service

    [Diagram labels: Query & Metadata; Data Movement; Workflow; Monitoring; Hadoop Filesystem Interface; HDFS; Windows Azure Blob Storage]

    Windows Azure Blob Storage

    Windows Azure Storage

    • Scalable, durable, and available
    • Access from anywhere, at any time
    • Pay only for what the service uses
    • Use from Windows Azure Compute
    • Use from anywhere on the internet

    Datacenters and Regions

    • Northern Europe
    • Western Europe
    • South Central US
    • West US
    • East US

    Locally redundant storage (geo-replication turned off)

    • Higher durability
      • 3 local replicas in primary location
      • Local replicas – synchronously replicated
      • Common failures (disk, node, rack) – use local copies to recover
      • Major disasters – contact customer about potential data loss
    • Reduced price – 23-34% based on how much you store
    • Turn off Geo for your storage account in the portal
      • Non-critical data that can be recreated after a major disaster
      • Application manages its own replica
      • Companies have limitations on geo locations

    Geo-redundant storage

    • Highest level of durability
      • 3 local replicas each in primary and secondary locations
      • Local replicas – synchronously replicated
      • Geo replica – asynchronously replicated
      • Common failures (disk, node, rack) – use local copies to recover
      • Major disasters – use geo-replicated copy (400+ miles apart)
    • Price remains the same as before
    • Enabled by default

    Blob Storage Concepts

    • Store large amounts of unstructured text or binary data with the fastest read performance

    • Highly scalable, durable, and available file system

    • Blobs can be exposed publicly over HTTP

    • Securely lock down permissions to blobs

    Setting up the Windows Azure storage account

    [Diagram labels: Azure Portal; Azure Blob storage]

    Set Up New Storage

    Move Data to Azure Blob Storage

    • PowerShell:

        Set-AzureStorageBlobContent `
            -File "C:...\2011\Weather2011_H1_JustData.csv" `
            -Container $containername `
            -Blob "FlightDelay/.../2011/Weather2011_H1_JustData.csv" `
            -Context $context

    • Tool like CloudBerry: drag and drop

    Windows Azure HDInsight Service

    Setting up the Windows Azure HDInsight cluster

    [Diagram labels: HDInsight Console; Windows Azure HDInsight; Azure Blob storage]

    Set Up New Cluster

    Provision Cluster via PowerShell

    # Create a new HDInsight cluster.
    # All variables used below ($clusterNodes, $storageAccountKey_Default, $hiveCreds, ...)
    # must be set before this statement runs.
    # The pipeline ends in New-AzureHDInsightCluster, so $config receives that cmdlet's output.
    $config = New-AzureHDInsightClusterConfig -ClusterSizeInNodes $clusterNodes `
        | Set-AzureHDInsightDefaultStorage `
            -StorageAccountName "$storageAccountName_Default.blob.core.windows.net" `
            -StorageAccountKey $storageAccountKey_Default `
            -StorageContainerName $containerName_Default `
        | Add-AzureHDInsightMetastore `
            -SqlAzureServerName "$hiveSQLDatabaseServerName.database.windows.net" `
            -DatabaseName $hiveSQLDatabaseName `
            -Credential $hiveCreds `
            -MetastoreType HiveMetastore `
        | New-AzureHDInsightCluster `
            -Version "3.0" `
            -Name $clusterName `
            -Location $location `
            -Credential $clusterCreds

    MapReduce

    Hadoop MapReduce

    • Programming framework (library and runtime) for analyzing datasets stored in HDFS
    • Composed of user-supplied Map and Reduce functions:
      • Map() - subdivide and conquer
      • Reduce() - combine and reduce cardinality

    1. Divide a large problem into sub-problems.
    2. Perform the same function (Do work()) on all sub-problems.
    3. Combine the output from all sub-functions.
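The three steps can be sketched in plain Python without any Hadoop dependency. This is only an illustration of the divide / do work / combine idea; the function names `split_into_chunks`, `do_work`, and `combine`, and the choice of summing numbers, are assumptions made for this example and do not come from the slides.

```python
from functools import reduce


def split_into_chunks(items, chunk_size):
    """Step 1: divide a large problem into sub-problems."""
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]


def do_work(chunk):
    """Step 2: the same function is applied to every sub-problem."""
    return sum(chunk)


def combine(partial_results):
    """Step 3: combine the output of all sub-problems into one result."""
    return reduce(lambda a, b: a + b, partial_results, 0)


numbers = list(range(1, 11))            # the "large" problem: sum of 1..10
chunks = split_into_chunks(numbers, 4)  # three sub-problems
partials = [do_work(chunk) for chunk in chunks]
total = combine(partials)

print(chunks)    # [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10]]
print(partials)  # [10, 26, 19]
print(total)     # 55
```

In a real cluster the calls to `do_work` would run in parallel on different nodes; here they run one after another in a list comprehension.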

    MapReduce

    • Rapidly process vast amounts of data in parallel on a large cluster of compute nodes
    • Framework schedules and monitors tasks, and re-executes failed tasks
    • Typically, both input and output are stored in the file system

    [Diagram: Map Phase (a Mapper on each of DataNode 1, DataNode 2, and DataNode 3), Shuffle/Sort (data is shuffled across the network and sorted), Reduce Phase (Reducers running on the DataNodes)]
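The three phases of the diagram can be imitated in a single Python process. The sketch below is a simplified model, not Hadoop code: the two node lists, the per-station maximum, and the helper names `mapper`, `shuffle_and_sort`, and `reducer` are assumptions chosen for illustration.

```python
from collections import defaultdict

# Records as (station_id, date, windspeed), split across two hypothetical nodes.
node_1 = [(123, "22.01.2012", 31), (124, "22.01.2012", 34)]
node_2 = [(123, "23.01.2012", 26), (124, "23.01.2012", 29)]


def mapper(record):
    """Map phase: emit one (key, value) pair per input record."""
    station_id, _date, windspeed = record
    yield station_id, windspeed


def shuffle_and_sort(mapped_pairs):
    """Shuffle/sort: group all values by key and order the keys."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reducer(key, values):
    """Reduce phase: collapse all values of one key into a single value."""
    return key, max(values)


mapped = [pair for node in (node_1, node_2) for rec in node for pair in mapper(rec)]
grouped = shuffle_and_sort(mapped)
result = [reducer(key, values) for key, values in grouped]

print(grouped)  # [(123, [31, 26]), (124, [34, 29])]
print(result)   # [(123, 31), (124, 34)]
```

The shuffle step is the only one that needs data from every node, which is why the diagram shows it crossing the network.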

    Layout Windspeed Calculation

    Input data:

    StationID  Date        Windspeed
    123        22.01.2012  31
    124        22.01.2012  34
    125        22.01.2012  22
    126        22.01.2012  12
    123        23.01.2012  26
    124        23.01.2012  29
    125        23.01.2012  46
    126        23.01.2012  12

    Compute Node 1:

    StationID  Date        Windspeed
    123        22.01.2012  31
    124        22.01.2012  34
    125        22.01.2012  22
    126        22.01.2012  12

    Compute Node 2:

    StationID  Date        Windspeed
    123        23.01.2012  26
    124        23.01.2012  29
    125        23.01.2012  46
    126        23.01.2012  12

    Layout Windspeed Calculation

    Data Node 1:

    StationID  Date        Windspeed
    123        22.01.2012  31
    124        22.01.2012  34
    125        22.01.2012  22
    126        22.01.2012  12

    Map output (Data Node 1):

    Key  Value
    Max  34

    Data Node 2:

    StationID  Date        Windspeed
    123        23.01.2012  26
    124        23.01.2012  29
    125        23.01.2012  46
    126        23.01.2012  12

    Map output (Data Node 2):

    Key  Value
    Max  46

    Reduce output:

    Key  Value
    Max  46
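The windspeed calculation on this slide can be reproduced with a few lines of Python. This is a sketch under simplifying assumptions: each node computes its local maximum directly in the map step, as the slide shows, and the names `map_max` and `reduce_max` are made up for this example.

```python
# Records as (station_id, date, windspeed), one list per data node.
data_node_1 = [
    (123, "22.01.2012", 31),
    (124, "22.01.2012", 34),
    (125, "22.01.2012", 22),
    (126, "22.01.2012", 12),
]
data_node_2 = [
    (123, "23.01.2012", 26),
    (124, "23.01.2012", 29),
    (125, "23.01.2012", 46),
    (126, "23.01.2012", 12),
]


def map_max(records):
    """Map: emit the local maximum windspeed of one node under the key 'Max'."""
    return "Max", max(windspeed for _station, _date, windspeed in records)


def reduce_max(pairs):
    """Reduce: combine the per-node maxima into the overall maximum."""
    key = pairs[0][0]
    return key, max(value for _key, value in pairs)


map_outputs = [map_max(data_node_1), map_max(data_node_2)]
final = reduce_max(map_outputs)

print(map_outputs)  # [('Max', 34), ('Max', 46)]
print(final)        # ('Max', 46)
```

Only one small key/value pair per node has to travel to the reducer, instead of all eight input rows.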

    Hadoop Streaming Process
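With Hadoop Streaming, the mapper and reducer are ordinary programs that read lines from standard input and write tab-separated key/value lines to standard output. The sketch below imitates that line protocol locally with in-memory streams. The whitespace-separated input layout is taken from the windspeed table; the function names `run_mapper` and `run_reducer` and the single key `Max` are assumptions for illustration, and a real job would read `sys.stdin` and write `sys.stdout` instead.

```python
import io


def run_mapper(lines, out):
    """Mapper: read raw records, write 'key<TAB>value' lines."""
    for line in lines:
        fields = line.split()
        if len(fields) != 3 or not fields[2].isdigit():
            continue  # skip the header row and malformed lines
        out.write(f"Max\t{fields[2]}\n")


def run_reducer(lines, out):
    """Reducer: read key-sorted 'key<TAB>value' lines, write the maximum."""
    best = None
    for line in lines:
        _key, _sep, value = line.rstrip("\n").partition("\t")
        if not value:
            continue
        best = int(value) if best is None else max(best, int(value))
    if best is not None:
        out.write(f"Max\t{best}\n")


raw = io.StringIO(
    "StationID Date Windspeed\n"
    "123 22.01.2012 31\n"
    "124 22.01.2012 34\n"
    "125 23.01.2012 46\n"
    "126 23.01.2012 12\n"
)

mapped = io.StringIO()
run_mapper(raw, mapped)

# Hadoop sorts the mapper output by key before it reaches the reducer.
sorted_lines = sorted(mapped.getvalue().splitlines(keepends=True))

reduced = io.StringIO()
run_reducer(sorted_lines, reduced)
print(reduced.getvalue(), end="")  # Max	46
```

Because the example uses a single key, the reducer does not need to detect key boundaries; a reducer handling several keys would have to track when the key changes between consecutive lines.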
