Upload
phungcong
View
222
Download
2
Embed Size (px)
Citation preview
Big Data BriefingHow to implement Big Data on Windows Azure
Big Data; running cheaply and efficiently on Windows Azure to discover business value from data you already hold about your business.
Introduction
AgendaWindows Azure IntroHDInsight IntroArchitectureBI and VisualisationsCase Studies
WhoAndy Cross, Director of Microsoft Partner Elastacloud, Windows Azure MVP, Big Data specialist and international speaker.
Time until next coffee 55:00
30 minutes30 minutes30 minutes15 minutes15 minutes
Find out more at http://www.elastacloud.com
WINDOWS AZUREIntroduction to
Today’s Transformation – Cloud
Usage BasedElasticSelf-ServicePooled Resources
Economics ▪ Agility ▪ Focus
PaaS SaaS
IT Continuum
Evolution toward highly-virtual and beyond to cloud
Physical Virtual / Private IaaS
Microsoft On The Internet
One of 3rd world’s largest private networks
1.0
1.0
400m active accounts
More than 3 billion queries per month
1 billion+ authentications per day
750 million unique visitors at any given time
Microsoft account
Global Foundation Services
The Microsoft Cloud~Globally Distributed Data Centers
Quincy, WA Chicago, IL San Antonio, TX Dublin, Ireland Generation 4 DCs
Inside a Microsoft Data Center
Cloud Computing Patterns
tCom
pute
InactivityPeriod
On and Off – Start/End SemesterOn & off workloads (e.g. batch job)Over provisioned capacity is wasted Time to market can be cumbersome
t
Unpredictable Bursting – Web demandUnexpected/unplanned peak in demand Sudden spike impacts performance Can’t over provision for extreme cases Co
mpu
te
t
Predictable Burst – RegistrationServices with micro seasonality trends
Peaks due to periodic increased demandIT complexity and wasted capacity
Com
pute
t
Growing Fast – Research ProjectSuccessful services needs to grow/scale Keeping up w/ growth is big IT challenge Cannot provision hardware fast enoughCo
mpu
te
Applicationbuilding blocks
StorageBig data
Caching
CDN
Database
Identity
Media
Messaging
Networking
Traffic
Big Data Technologies
HDInsight SummaryGA, recent preview of Hadoop 2Managed Hadoop on Windows AzureFamiliar tools such as Hive, Pig, OozieAdditional BoB Microsoft ecosystem tooling with .net SDK
Powershell and .net for provisionExecution with .net and powershell for Hive
Paired with Hortonworks HDP for on-premises Hadoop; compatible with all major Hadoop implementationsCombined with Excel and traditional Microsoft BI stack for compelling solutions
PaaS
Position in Cloud
Centralised Resources
State – a Windows Azure SQL DB stores metadata to allow identical clusters to be spawned
Data – all your data resides on Windows Azure Blob Storage; resilient, secure and cost effective
Logs – machine state and cluster uptime is stored in Windows Azure Table Storage
Provision hundreds of machines against your data; gain understanding rapidly
Speed to understanding500 servers.
20 minutes.
40,000,000,000 data points
Buy
the
Look
Drive Investment
Exploring Sentiment, Social and Metadata Graphs
Person Product
Person
Product
Owns
Owns
Alike
Alike?
Social Media
Microsoft Big Data Solution
Power View Excel with PowerPivot Embedded BIPredictive Analytics
APPsLOBCRMERP
Microsoft EDW
SSAS SSRS
Devices CrawlersSensors Bots
Hadoop On Windows Server
Hadoop On Windows Azure
Microsoft Offerings
Hive ODBC Driver & Hive Add-in for ExcelIntegration with Microsoft PowerPivot
Hadoop based distribution for Windows Server & AzureStrategic Partnership with Hortonworks
JavaScript framework for HadoopRTM of Hadoop connectors for SQL Server and PDW
Hadoop on WindowsInsights to all users by activating new types of data
Integrate with Microsoft Business Intelligence
Choice of deployment on Windows Server + Windows AzureIntegrate with Windows Components (AD, Systems Center)Easy installation and configuration of Hadoop on WindowsSimplified programming with . Net & Javascript integration Integrate with SQL Server Data Warehousing
Diffe
rent
iatio
n
Microsoft Big Data RoadmapTo accelerate the delivery of Microsoft’s Hadoop based solution for Windows Server and service for Windows Azure, Microsoft is announcing a partnership with HortonworksMicrosoft is committed to broadening accessibility and usage of Hadoop to end users, developers and IT professionals in organizations of all sizes
Microsoft is announcing an end-to-end roadmap for Big Data that embraces Apache HadoopTM by distributing enterprise class Hadoop based solutions on both Windows Server and Windows Azure
Microsoft is extending its leadership in business intelligence and data warehousing to provide insights to all users by activating new types of data of any size
Microsoft’s value addsWhy Elastacloud recommend Microsoft’s platform
Skill reuse
Express elegant solutions in C#Familiar Unit Testing patternsConcise programmatic terseness
Your existing development team can immediately
realise value
The frameworks
facilitate deterministic
testing for highly reliable
queries
Complex logic is best expressed in programmatic
form
Commoditised query
Provision
Execute
De-provision
Valu
e
Action Cost
Value of query
Time
Cost
HDInsight in Azure
Hybrid Compatibility
Hadoop On Premises
Name=Andy Pnid=123456
123456 4712
Microsoft are the only vendor with enterprise on premises and cloud big data offerings
Just as Big Data is more than Hadoop; Windows Azure is more than HDInsight
HDPSparkStorm
Elastacloud provisioned a bespoke 500 core virtual cluster on Windows Azure, allowing us to cut down a compute intensive task from 20 days over 8 cores to under 12 hours over 500 cores.
Jedidiah Francis, Data Scientist, ASOS
Architectural ConsiderationsSolving Big Data challenges cohesively
The first challenge you face with Big Data is somewhat the hardest
Elastacloud’s tooling aims to solve these challenges
Ingress, Compute, Egress automated; allowing focus on providing value
Workflows Big Data solutions are rarely the result of a single piece of code running; instead jobs transform data into intermediate domains before completing an overall piece of work
The overall architectural challenge of Big Data is that just as Data can vary, so must architectures.
Batch
Interactive Real TimeHadoop can operate with O(n) over Petabytes of data
Drill, Stinger and Tez bring must quicker querying
Storm allows enormous scalable throughput
Batc
hSe
rvin
gSp
eed
Static Data
Data Stream
Batch Process
Real Time
Operational Data Store
Precompute
Increment
User Query
Technologies possible? Any.Batch Interactive Real Time
Hadoop SQL / MySQL Storm
Spark Hadoop 2 / Tez / Drill Spark
Lucene Mongo
Cassandra
Windows Azure Table Storage
Spark
Batc
hSe
rvin
gSp
eed
Static Data
Data Stream
HDInsight
Storm
SQL/NoSQL Database
Precompute
Increment
User Query
Windows Azure Service Bus
Blob Persist
Elastacloud Service Bus
Spout
Moving Mediatonic to Windows Azure
AWS didn’t work• Mediatonic are a leading UK games company• They had deployed an analytics platform to AWS• They required quite a bit of manpower to maintain their system
operationally • Their biggest cost was an Amazon Redshift data warehouse which was
a bottleneck in their design• Their system was built so that if a server went down whilst collecting
gaming messages they could lose 15 minutes of data• They found the “Big Data” part so complex that only highly-paid
specialists could improve the system
How Windows Azure Helped• A joint collaboration between Microsoft, Elastacloud and Mediatonic
migrated their Game Fuel platform to Windows Azure
• With the Windows Azure Service Bus we were able to ensure that their messaging was continuous and not discrete so no messages were ever lost
• With HDInsight we were able to teach their staff how to build “Big Data” applications
• We were able to deliver a system with unique components of Windows Azure, removing the expensive data warehousing bottleneck but delivering the same functionality
“I can’t believe that I could get up and running with HDInsight doing real work in 30 minutes. It took me days to do the same thing with Elastic Map Reduce on Amazon”
Adam Fletcher – Gamefuel Project Lead
Buy
the
Look
Initial State
Traditional Analytics
Limited
Traditional Analytics allows for the tracing of single actions on a website; who added a product to a bag, where did they come from, where did they go?
Buy
the
Look
Advancement Cross Sell
Improvements possible
Beyond immediacy is the ongoing workflow of a system. Who added items to a basket after using the Buy the Look functionality. However, this lacks comprehension of customer behaviour.
Cutting edgeWho bought this?
Cutting edge – understand your customerPu
rcha
se p
rope
nsity
Time
January February
Proving through Big Data technologies that engaged customers will return through engagement following life events such as pay-days to continue the engagement. Building business on this basis.