Democratizing Big Data with Microsoft Azure HDInsight
Saptak Sen, Solution Engineering Manager, Hortonworks (@saptak)
Nishant Thacker, Technical Product Manager – Big Data, Microsoft (@nishantthacker)


Page 1: Democratizing Big Data with Microsoft Azure HDInsight

Democratizing Big Data with Microsoft Azure HDInsight

Saptak Sen, Solution Engineering Manager, Hortonworks (@saptak)

Nishant Thacker, Technical Product Manager – Big Data, Microsoft (@nishantthacker)

Page 2: Democratizing Big Data with Microsoft Azure HDInsight

Hortonworks + Microsoft: Together Since 2012

"At Hortonworks we have seen more and more Hadoop related workloads and applications move to the cloud. Starting in HDP 2.6, we are adopting a “Cloud First” strategy in which our platform will be available on our cloud platforms – Azure HDInsight at the same time or even before it is available on traditional on-premises settings. With this in mind, we are very excited that Microsoft and Hortonworks will empower Azure HDInsight customers to be the first to benefit from our HDP 2.6 innovation in the near future." - Arun Murthy, co-founder, Hortonworks (February, 2017)

“Operating a fully managed cloud service like Azure HDInsight, which is backed by an enterprise grade SLA, requires that we can deploy the latest bits of Hadoop & Apache Spark on demand. To that end, we are excited that the latest Hortonworks Data Platform 2.6 will be continuously available to Azure HDInsight even before its on-premise release. Hortonworks’ commitment to being cloud first is especially significant given the growing importance of cloud with Hadoop and Spark workloads.” - Dharma Shukla, Distinguished Engineer and General Manager at Microsoft (February, 2017)

Page 3: Democratizing Big Data with Microsoft Azure HDInsight

Big Data in the Cloud

Page 4: Democratizing Big Data with Microsoft Azure HDInsight

Big Data in the Cloud

Page 5: Democratizing Big Data with Microsoft Azure HDInsight

Traditional Clusters

Page 6: Democratizing Big Data with Microsoft Azure HDInsight

Challenges with implementing clusters

Page 7: Democratizing Big Data with Microsoft Azure HDInsight

Hadoop Clusters in the Cloud

Page 8: Democratizing Big Data with Microsoft Azure HDInsight

Why Hadoop in the cloud?

Page 9: Democratizing Big Data with Microsoft Azure HDInsight

Hadoop/Spark Clusters

Distributed Storage
• Files split across storage
• Files replicated
• Nearest node responds
• Abstracted administration

Extensible
• APIs to extend functionality
• Add new capabilities
• Allow for inclusion in custom environments

Automated Failover
• Unmonitored failover to replicated data
• Built for resiliency
• Metadata stored for later retrieval

Hyper-Scale
• Add resources as desired
• Built to include commodity configs
• Direct correlation of performance and resources

Distributed Compute
• Distributed processing
• Resource utilization
• Cost-efficient method calls
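The "files split across storage" and "files replicated" points can be illustrated with a small sketch. It is not how HDFS or Azure storage actually places data (real placement is rack-aware and far more involved). The node names, the block size, and both function names are assumptions made up for this example.

```python
def split_into_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Split a byte string into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, nodes: list[str], replication: int = 3) -> dict[int, list[str]]:
    """Assign each block to `replication` distinct nodes using a simple rotation.

    Hypothetical placement policy, for illustration only.
    """
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    placement = {}
    for block_id in range(num_blocks):
        # Start each block on a different node, then take the next nodes in order.
        placement[block_id] = [
            nodes[(block_id + k) % len(nodes)] for k in range(replication)
        ]
    return placement


# Hypothetical 1000-byte file on a four-node cluster.
blocks = split_into_blocks(b"x" * 1000, block_size=256)
placement = place_replicas(len(blocks), ["node-a", "node-b", "node-c", "node-d"])
for block_id, replicas in placement.items():
    print(block_id, len(blocks[block_id]), replicas)
```

Because every block lives on several nodes, losing one node leaves at least two copies of each block, which is the basis for the "unmonitored failover to replicated data" point on the same slide.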


Page 10: Democratizing Big Data with Microsoft Azure HDInsight


Cloud


Page 11: Democratizing Big Data with Microsoft Azure HDInsight


Big Data in the Cloud


Page 12: Democratizing Big Data with Microsoft Azure HDInsight

HDInsight Provides Purpose-built Cluster Types

Cluster Type: Components

Hadoop: HDFS, MapReduce2, YARN, Tez, Hive, Pig, Sqoop, Oozie, Zookeeper, Ambari Metrics, Slider

HBase: HDFS, MapReduce2, YARN, Tez, Hive, HBase, Phoenix Query Server, Pig, Sqoop, Oozie, Zookeeper, Ambari Metrics

Storm: HDFS, MapReduce2, YARN, Tez, Hive, Pig, Sqoop, Oozie, Zookeeper, Storm, Ambari Metrics, Kafka

Spark: HDFS, MapReduce2, YARN, Tez, Hive, Pig, Sqoop, Oozie, Zookeeper, Ambari Metrics, Spark, Zeppelin, Livy

Interactive Hive: HDFS, MapReduce2, YARN, Tez, Hive2 LLAP, Pig, Sqoop, Oozie, Zookeeper, Ambari Metrics, Slider

R Server: HDFS, MapReduce2, YARN, Tez, Hive, Pig, Sqoop, Oozie, Zookeeper, Ambari Metrics, Spark, Livy

Kafka: HDFS, MapReduce2, YARN, Tez, Hive, Pig, Sqoop, Oozie, Zookeeper, Ambari Metrics, Kafka

• Components marked in RED are the components that drive the cluster type use case
• Spark clusters also have Jupyter installed
• All clusters come HA enabled by default
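To pick a cluster type by the component you need, the list above can be captured as a lookup. This is only a sketch: the component lists are copied from this slide, while the names `BASE`, `CLUSTER_COMPONENTS`, and `cluster_types_with` are invented for the example and are not part of any HDInsight API.

```python
# Components that every cluster type on the slide lists, apart from
# Interactive Hive, which lists "Hive2 LLAP" in place of "Hive".
BASE = ["HDFS", "MapReduce2", "YARN", "Tez", "Hive", "Pig", "Sqoop",
        "Oozie", "Zookeeper", "Ambari Metrics"]

CLUSTER_COMPONENTS = {
    "Hadoop": BASE + ["Slider"],
    "HBase": BASE + ["HBase", "Phoenix Query Server"],
    "Storm": BASE + ["Storm", "Kafka"],
    "Spark": BASE + ["Spark", "Zeppelin", "Livy"],
    "Interactive Hive": [c if c != "Hive" else "Hive2 LLAP" for c in BASE] + ["Slider"],
    "R Server": BASE + ["Spark", "Livy"],
    "Kafka": BASE + ["Kafka"],
}


def cluster_types_with(component: str) -> list[str]:
    """Return the cluster types whose component list includes `component`."""
    return [name for name, parts in CLUSTER_COMPONENTS.items() if component in parts]


print(cluster_types_with("Spark"))
print(cluster_types_with("Kafka"))
```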

Page 13: Democratizing Big Data with Microsoft Azure HDInsight

Big Data in the Cloud

Page 14: Democratizing Big Data with Microsoft Azure HDInsight

Big Data in the Cloud - Options

Page 15: Democratizing Big Data with Microsoft Azure HDInsight

Scenarios for deploying as hybrid

Page 16: Democratizing Big Data with Microsoft Azure HDInsight

Traditional Clusters – On Prem

[Diagram labels: Client, Job (jar) file, Master Node, Task Tracker, Worker Node, Tasks, HDFS, all within a Hadoop Cluster]

Page 17: Democratizing Big Data with Microsoft Azure HDInsight

Clusters in the Cloud

Page 18: Democratizing Big Data with Microsoft Azure HDInsight

Azure HDInsight
Hadoop and Spark as a Service on Azure

Fully managed Hadoop and Spark for the cloud

100% Open Source Hortonworks Data Platform

Clusters up and running in minutes

Managed, monitored and supported by Microsoft with the industry’s best enterprise SLA

Use familiar BI tools for analysis, or open source notebooks for interactive data science

63% lower total cost of ownership than deploying your own Hadoop on-premises*

*IDC study “The Business Value and TCO Advantage of Apache Hadoop in the Cloud with Microsoft Azure HDInsight”

Page 19: Democratizing Big Data with Microsoft Azure HDInsight

HDInsight Cluster

[Diagram labels: HDInsight cluster, Head node, Back-up, Data node, Domain credentials, Azure Data Lake Storage, Azure Storage Blob]

Page 20: Democratizing Big Data with Microsoft Azure HDInsight

HDInsight Cluster Security

[Diagram labels: AAD tenant, Azure VNET to VNET peering, HDInsight Cluster, Head node, Back-up, Data node, Domain credentials, Azure Data Lake Storage, Azure Storage Blob]

Page 21: Democratizing Big Data with Microsoft Azure HDInsight

Decoupling - Benefits

Page 22: Democratizing Big Data with Microsoft Azure HDInsight

What’s New in HDInsight 3.6

• HDInsight 3.6 GA announced during DataWorks Summit Munich

• “HDInsight 3.6 has the latest Hortonworks Data Platform (HDP) 2.6 platform, a collaborative effort between Microsoft and Hortonworks to bring HDP to market cloud-first.”

• https://azure.microsoft.com/en-us/blog/announcing-general-availability-of-azure-hdinsight-3-6/

Page 23: Democratizing Big Data with Microsoft Azure HDInsight

What’s New in HDInsight 3.6

• Interactive Hive improvements
• Spark 2.1 GA*
• Zeppelin added to Spark Cluster Type
• Improved cluster creation time

*GA means clusters are backed by Azure SLA

Page 24: Democratizing Big Data with Microsoft Azure HDInsight

Big Data in the Cloud

Page 25: Democratizing Big Data with Microsoft Azure HDInsight

Big Data Application Architecture

Page 26: Democratizing Big Data with Microsoft Azure HDInsight

The Azure Architecture

[Diagram stages: Ingestion, Backend, Frontend]
[Diagram labels: Source A, Source B, Source C, Source D, Data Factory, PowerShell, Push Stream, Stream Analytics, Azure Data Lake Store, HDInsight, Azure Data Lake Analytics, Azure SQL Data Warehouse, Azure Analysis Services, HiveQL, T-SQL, DAX, Analyst]

Page 27: Democratizing Big Data with Microsoft Azure HDInsight

The Azure Architecture - Detailed

Page 28: Democratizing Big Data with Microsoft Azure HDInsight

Example: Big Data in Telco
Telarix uses big data to help maintain call quality

“Carriers are going to create new wireless applications and offerings – voice, video, MMS, or whatever the next great application is – and our customers’ networks need to be able to support this.”

Vic Bozzo, Senior VP of Worldwide Sales and Marketing

Scenario

Telarix helps telecommunications carriers worldwide maintain call quality, manage costs, and streamline traffic. Telarix’s suite handles traffic and quality management, trading, routing, billing, and settlement for more than 300 billion voice, SMS, content, and data minutes each year.

Solution

Telarix used SQL Server and Azure HDInsight with the ability to analyze large volumes of structured and unstructured data in real time.

Result

• Keep up with carriers who are creating new wireless applications and offerings, such as voice, video, MMS. Telarix will provide these carriers the same business process to trade, route, settle, manage, invoice, bill, and collect, across all of their services.

Page 29: Democratizing Big Data with Microsoft Azure HDInsight

Example: Big Data used for targeted customer advertisement
Linkury uses big data to make online content discovery profitable for search and social engines, publishers, and marketers

“We had gained a lot of traffic, but we couldn’t really manage and analyze the data in real time. Now we have regained control, which means, for example, that we can spend more time analyzing fraud or ad campaigns that are performing poorly”

Kobi Eldar, CTO

Scenario

Linkury is a tool to help monetization of the online advertising market. They needed to analyze hundreds of millions of web traffic events each day to help build targeted advertising based on customer behavior.

Solution

Azure HDInsight (Hadoop-as-a-service) with Storm for HDInsight to analyze real-time data in Hadoop.

Result

• Linkury now captures hundreds of millions of web traffic events in real-time including how users browse/actions, interact with the device, products, etc. to display targeted online advertisements.

• Can now show advertising effectiveness through third party BI tools that show key metrics

Page 30: Democratizing Big Data with Microsoft Azure HDInsight

Example: Big Data used for connected cars
Delphi Automotive uses big data for car owners to keep tabs on their cars

“With Delphi Connect, car owners can find out how close to home their spouse is so they can put the finishing touches on dinner. They can keep tabs on teenage drivers by setting up geo-fences. If the car goes outside of a geo-fence or drives faster than a specified speed limit, mom or dad receives an email or text message.”

Victor Canseco, Managing Director

Scenario

Delphi, a leading global supplier of technologies for the automotive industry, introduced Delphi Connect, an after-market connected-car product that lets drivers digitally interact with their cars through smartphones, tablets, and PCs.

Solution

Azure HDInsight and SQL Server in an Internet of Things (IoT) scenario for capturing and analyzing data from cars (vehicle diagnostics, geo-fencing, geo-location, mileage tracking, Bluetooth). Also use Azure Service Bus and SQL Database to understand geo-fencing around a map.

Result

• Drivers can now understand information on their cars like how they were driven, where they parked, route they took, duration, and mileage. They also get real-time information on what other drivers are doing with their car.
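The geo-fence and speed-limit alerts described in the quote can be sketched in a few lines. This is an illustration only and not Delphi's implementation: it assumes a circular fence, uses the haversine great-circle distance, and every name and value in it (function names, coordinates, radius, limits) is hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def check_vehicle(lat, lon, speed_kmh, fence_center, fence_radius_km, speed_limit_kmh):
    """Return the list of alerts raised by one telemetry reading."""
    alerts = []
    # Circular geo-fence: alert when the car is farther from the centre than the radius.
    if haversine_km(lat, lon, fence_center[0], fence_center[1]) > fence_radius_km:
        alerts.append("outside geo-fence")
    if speed_kmh > speed_limit_kmh:
        alerts.append("over speed limit")
    return alerts


# Hypothetical reading: about one degree of latitude north of the fence centre, at 120 km/h.
home = (47.6, -122.3)
print(check_vehicle(48.6, -122.3, 120, home, fence_radius_km=5, speed_limit_kmh=100))
```

In a deployment like the one described, a check of this kind would run per telemetry event, and a non-empty alert list would trigger the email or text message.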

Page 31: Democratizing Big Data with Microsoft Azure HDInsight

Summary


Page 32: Democratizing Big Data with Microsoft Azure HDInsight

Call to Action

Points to remember

Connect and voice your customers’ opinion

Ramp up on our new services NOW!!

CONNECT
• Contacts: [email protected]
• Docs and Forums:
• https://docs.microsoft.com/en-us/azure/hdinsight/
• https://azure.microsoft.com/en-us/support/forums/

EVOLVE
• Know more
• http://www.microsoft.com/hdinsight
• Leverage free trial on Azure
• https://azure.microsoft.com/en-us/free/
• Try Hortonworks Sandbox on Azure
• http://hortonworks.com/sandbox

LEARN
• http://learnanalytics.microsoft.com/
• Trainings on
• Spark in Azure HDInsight
• Azure HDInsight Administration and Security
• R Server on Azure HDInsight

Page 33: Democratizing Big Data with Microsoft Azure HDInsight
Page 34: Democratizing Big Data with Microsoft Azure HDInsight

© 2016 Microsoft Corporation. All rights reserved.