
Microsoft Big Data Essentials
Module 1 - Introduction to Big Data

Saptak Sen, Microsoft
Bill Ramos, Advaiya

Agenda

• Why Big Data?

• Big Data Lambda Architecture

• Getting started with Windows Azure HDInsight Service

The Business Imperative

• Human Fault Tolerance

• Minimize CapEx

• Low Learning Curve

• Hyper Scale on Demand

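CAP Theorem

• Consistency (C)

• Availability (A)

• Partition Tolerance (P)

A distributed data store cannot guarantee all three properties at once: when a network partition occurs, it must give up either consistency or availability.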
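To make the trade-off concrete, here is a toy sketch in Python; all class and method names are hypothetical and not taken from any product. It models a two-replica store whose secondary can be cut off by a partition. In "CP" mode a read from the cut-off replica is refused (availability is given up); in "AP" mode it is served, possibly stale (consistency is given up). For simplicity the primary always accepts writes, which a real CP system would not necessarily do.

```python
class Unavailable(Exception):
    """Raised when the store refuses a request to preserve consistency."""


class ToyReplicatedStore:
    """Hypothetical two-replica key-value store illustrating the CAP trade-off."""

    def __init__(self, mode: str):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.primary = {}
        self.secondary = {}
        self.partitioned = False  # True simulates a network partition

    def write(self, key, value):
        self.primary[key] = value
        if not self.partitioned:
            # Replication only succeeds while the replicas can talk to each other.
            self.secondary[key] = value

    def read_from_secondary(self, key):
        if self.partitioned and self.mode == "CP":
            # CP: refuse rather than risk returning a stale value.
            raise Unavailable("secondary cannot confirm it is up to date")
        # AP: always answer, even if the value may be stale.
        return self.secondary.get(key)

    def heal(self):
        """End the partition and let the secondary catch up."""
        self.partitioned = False
        self.secondary.update(self.primary)
```

With `mode="AP"`, writing 1, partitioning, then writing 2 leaves a secondary read returning the stale 1 until `heal()` is called; with `mode="CP"` the same read raises `Unavailable`.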

Big Data Lambda Architecture


• Batch layer
  • Stores master dataset
  • Computes arbitrary views

• Speed layer
  • Fast, incremental algorithms
  • Batch layer eventually overrides speed layer

• Serving layer
  • Random access to batch views
  • Updated by batch layer

(Diagram: Serving Layer, Speed Layer, Batch Layer)

The Batch Layer

• Stores master dataset (in append mode)

• Unrestrained computation: arbitrary functions can be run over the entire master dataset

• Horizontally scalable

• High latency

(Diagram: Incoming data streams → Master dataset → Batch views)
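A minimal in-memory sketch of the batch layer, assuming a hypothetical page-view schema; `PageView`, `MasterDataset`, and `compute_batch_view` are illustrative names, not HDInsight APIs. It shows two of the properties listed above: the master dataset only grows by appending, and the batch view is recomputed from scratch over all of it. On a real cluster the recomputation would typically be a distributed job rather than an in-process `Counter`.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class PageView:
    """One immutable fact in the master dataset (hypothetical example schema)."""
    url: str
    timestamp: int  # e.g. seconds since the epoch


class MasterDataset:
    """In-memory stand-in for the append-only master dataset."""

    def __init__(self):
        self._facts = []

    def append(self, fact: PageView) -> None:
        # New data is only ever appended; existing facts are never updated or deleted.
        self._facts.append(fact)

    def snapshot(self) -> tuple:
        # Read-only copy so a batch run sees a fixed set of facts.
        return tuple(self._facts)


def compute_batch_view(dataset: MasterDataset) -> Counter:
    """Recompute page-view counts from scratch over the whole master dataset."""
    return Counter(fact.url for fact in dataset.snapshot())
```

Appending views of `/a`, `/b`, and `/a` and then calling `compute_batch_view` yields a count of 2 for `/a` and 1 for `/b`. The high latency noted above comes from this design choice: every run rereads everything.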

The Speed Layer

• Stream processing of data

• Stores a limited window of data

• Dynamic (incremental) computation: views are updated as each new piece of data arrives

(Diagram: Real-time increments. Incoming data streams → Process stream → Increment views → Real-time views)
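A sketch of the speed layer's incremental update under a simplifying assumption: the "limited window" is taken to be the last N events, whereas a real deployment would more likely keep only the data that arrived since the last batch run. `SpeedLayer` is a hypothetical name. Each incoming event adjusts the real-time view in constant time instead of triggering a full recomputation.

```python
from collections import Counter, deque


class SpeedLayer:
    """Maintains real-time counts over a bounded window of recent events (hypothetical)."""

    def __init__(self, window_size: int):
        if window_size < 1:
            raise ValueError("window_size must be at least 1")
        self.window_size = window_size
        self.window = deque()           # the limited window of recent events
        self.realtime_view = Counter()  # incrementally maintained view

    def process(self, url: str) -> None:
        # Incremental update: add the new event's contribution.
        self.window.append(url)
        self.realtime_view[url] += 1
        # Evict the oldest event once the window is full and undo its contribution.
        if len(self.window) > self.window_size:
            expired = self.window.popleft()
            self.realtime_view[expired] -= 1
            if self.realtime_view[expired] == 0:
                del self.realtime_view[expired]
```

With a window of 2, processing `/a`, `/b`, `/a`, `/c` leaves a real-time view of one `/a` and one `/c`: the earlier events have aged out of the window.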

The Serving Layer

• Queries the batch and real-time views

• Merges the results

(Diagram: Batch views and Real-time views → Querying and merging → Output)
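A sketch of the serving layer's merge step, using plain dictionaries as the batch and real-time views; the function names are hypothetical. It also shows one simple way the batch layer "eventually overrides" the speed layer: when a recomputed batch view is swapped in, the real-time increments it has absorbed are dropped. Assuming the new batch run absorbed all pending increments is a simplification.

```python
from typing import Dict, Mapping


def query(url: str, batch_view: Mapping[str, int], realtime_view: Mapping[str, int]) -> int:
    """Answer a query by merging the precomputed batch count with recent increments."""
    return batch_view.get(url, 0) + realtime_view.get(url, 0)


def on_batch_complete(new_batch_view: Dict[str, int],
                      realtime_view: Dict[str, int]) -> Dict[str, int]:
    """Swap in a freshly recomputed batch view.

    Simplifying assumption: the new batch run has absorbed every event the
    speed layer had counted, so all real-time increments can be discarded.
    """
    realtime_view.clear()
    return new_batch_view
```

For example, with a batch count of 10 for `/a` and 2 more in the real-time view, `query("/a", ...)` returns 12; after a batch run that recomputes `/a` as 12, the real-time view is emptied and the answer stays 12.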

Microsoft Lambda Architecture Support

(Diagram: the products below are arranged across the Serving, Speed, and Batch layers.)

• Windows Azure HDInsight

• Azure Blob storage

• MapReduce, Hive, Pig, Oozie, SSIS

• Federations in Windows Azure SQL Database

• Azure tables

• Memcached/MongoDB

• SQL Server database engine

• SQL Server VM:
  • Columnstore indexes
  • Analysis Services
  • StreamInsight

• Azure Storage Explorer

• Microsoft Excel

• Power Query

• PowerPivot

• Power View

• Power Map

• Reporting Services

• LINQ to Hive

• Analysis Services

Yahoo!

(Diagram: Yahoo! case study mapped onto the Serving, Speed, and Batch layers. Components: Apache Hadoop (Hadoop data), SQL Server Connector (Hadoop Hive ODBC), a third-party staging database, SQL Server Analysis Services (SSAS cube), Microsoft Excel and PowerPivot for Excel, and other BI tools and custom applications.)

Ferranti Computer Systems

(Diagram: Ferranti Computer Systems case study mapped onto the Serving, Speed, and Batch layers. Components: data feed from smart meters, Reactive Extensions (Rx), SQL Server database (In-Memory OLTP), Windows Azure HDInsight, Windows Azure Storage, SQL Server Analysis Services, SQL Server Reporting Services, and Microsoft Dynamics AX.)

Demo 1: Setting up the Windows Azure storage account

(Diagram: Windows Azure Blob storage, browsed with Azure Storage Explorer)

Blob Storage Concepts

• Store large amounts of unstructured text or binary data with fast read performance

• Highly scalable, durable, and available file system

• Blobs can be exposed publicly over HTTP

• Securely lock down permissions to blobs

(Diagram: Account → Container → Blob → Blocks/Pages. Example: the account Contoso holds the containers Images (PIC01.JPG, PIC02.JPG) and Video (VID1.AVI); each blob is made up of blocks or pages.)

http://<account>.blob.core.windows.net/<container>/<blobname>
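A small Python sketch that builds a blob URL in the `http://<account>.blob.core.windows.net/<container>/<blobname>` form shown above. The validation patterns are intended to encode Azure Storage's naming rules (account names: 3 to 24 lowercase letters and digits; container names: 3 to 63 lowercase letters, digits, and single hyphens, starting and ending with a letter or digit); treat them as assumptions to check against current documentation. `blob_url` is a hypothetical helper. Because names must be lowercase, the diagram's Contoso and Images would be written `contoso` and `images` in practice.

```python
import re
from urllib.parse import quote

# Assumed Azure Storage naming rules (verify against current documentation).
ACCOUNT_RE = re.compile(r"[a-z0-9]{3,24}")
# First char alphanumeric; each later char is alphanumeric, or a hyphen
# that must be followed by an alphanumeric (no "--", no trailing "-").
CONTAINER_RE = re.compile(r"[a-z0-9](?:[a-z0-9]|-(?=[a-z0-9])){2,62}")


def blob_url(account: str, container: str, blob_name: str, scheme: str = "https") -> str:
    """Build <scheme>://<account>.blob.core.windows.net/<container>/<blobname>."""
    if not ACCOUNT_RE.fullmatch(account):
        raise ValueError(f"invalid storage account name: {account!r}")
    if not CONTAINER_RE.fullmatch(container):
        raise ValueError(f"invalid container name: {container!r}")
    # Keep '/' unescaped so names like '2013/pic.jpg' act as virtual folders;
    # other reserved characters (spaces, '#', '?') are percent-encoded.
    return f"{scheme}://{account}.blob.core.windows.net/{container}/{quote(blob_name, safe='/')}"
```

For instance, `blob_url("contoso", "images", "PIC01.JPG", scheme="http")` reproduces the slide's URL form, while `blob_url("Contoso", ...)` is rejected because of the uppercase letter. The default scheme is `https` so that blobs are fetched over an encrypted connection.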

Getting started with HDInsight Service

Demo 2: Setting up the Windows Azure HDInsight cluster

(Diagram: Windows Azure HDInsight backed by Windows Azure Blob storage, managed through the HDInsight Console; layers: Serving, Speed, Batch)

https://<ClusterName>.azurehdinsight.net/

Demo 3: Loading data into Windows Azure storage for use with HDInsight

(Diagram: CSV files from local disk loaded into Windows Azure Blob storage for use by Windows Azure HDInsight, managed through the HDInsight Console; layers: Serving, Speed, Batch)
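A sketch of the bookkeeping behind this demo, assuming the cluster reads its input from Blob storage through the WASB scheme (`wasb://` or `wasbs://<container>@<account>.blob.core.windows.net/<path>`). `wasb_uri`, `plan_csv_upload`, the `data/` prefix, and the account and container names used below are hypothetical. The upload itself is not shown, only the mapping from each local CSV file to the URI a Hive or MapReduce job could use to read it once uploaded.

```python
from pathlib import PurePath


def wasb_uri(account: str, container: str, blob_path: str, secure: bool = True) -> str:
    """Build a WASB URI for a blob (assumed format: scheme://container@account.blob.core.windows.net/path)."""
    scheme = "wasbs" if secure else "wasb"
    return f"{scheme}://{container}@{account}.blob.core.windows.net/{blob_path.lstrip('/')}"


def plan_csv_upload(local_files, account: str, container: str, prefix: str = "data") -> dict:
    """Map each local .csv file path to the WASB URI it would have after upload.

    Non-CSV files are skipped; the suffix check is case-insensitive.
    """
    plan = {}
    for f in local_files:
        p = PurePath(f)
        if p.suffix.lower() != ".csv":
            continue
        plan[str(p)] = wasb_uri(account, container, f"{prefix}/{p.name}")
    return plan
```

Keeping the plan as a plain dictionary makes it easy to review which files will land where before anything is copied to the storage account.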

Easy Access to Data, Big & Small

• Simplify access to public & corporate data

• Easily preview, shape, & format your data

• Combine and refine data across multiple sources

• Gain insight across relational, unstructured, & semi-structured data

• Common management of structured & unstructured data

• Query across relational DB & Hadoop with a single T-SQL query

(Diagram: Power Query, Windows Azure Marketplace, Windows Azure HDInsight Service, Parallel Data Warehouse with PolyBase)

Questions?
