Democratizing Big Data with Microsoft Azure HDInsight


Saptak Sen, Solution Engineering Manager, Hortonworks (@saptak)

Nishant Thacker, Technical Product Manager, Big Data, Microsoft (@nishantthacker)

Hortonworks + Microsoft: Together Since 2012

"At Hortonworks we have seen more and more Hadoop related workloads and applications move to the cloud. Starting in HDP 2.6, we are adopting a “Cloud First” strategy in which our platform will be available on our cloud platforms - Azure HDInsight at the same time or even before it is available on traditional on-premises settings. With this in mind, we are very excited that Microsoft and Hortonworks will empower Azure HDInsight customers to be the first to benefit from our HDP 2.6 innovation in the near future." - Arun Murthy, co-founder, Hortonworks (February 2017)

“Operating a fully managed cloud service like Azure HDInsight, which is backed by an enterprise grade SLA, requires that we can deploy the latest bits of Hadoop & Apache Spark on demand. To that end, we are excited that the latest Hortonworks Data Platform 2.6 will be continuously available to Azure HDInsight even before its on-premise release. Hortonworks’ commitment to being cloud first is especially significant given the growing importance of cloud with Hadoop and Spark workloads.” - Dharma Shukla, Distinguished Engineer and General Manager at Microsoft (February 2017)

Big Data in the Cloud

Traditional Clusters

Challenges with implementing clusters

Hadoop Clusters in the Cloud

Why Hadoop in the cloud?

Hadoop/Spark Clusters

Distributed Storage
• Files split across storage
• Files replicated
• Nearest node responds
• Abstracted administration

Extensible
• APIs to extend functionality
• Add new capabilities
• Allow for inclusion in custom environments

Automated Failover
• Unmonitored failover to replicated data
• Built for resiliency
• Metadata stored for later retrieval

Hyper-Scale
• Add resources as desired
• Built to include commodity configs
• Direct correlation of performance and resources

Distributed Compute
• Distributed processing
• Resource utilization
• Cost-efficient method calls
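The storage and failover bullets above (files split across storage, files replicated, failover to replicated data) can be illustrated with a small sketch. This is a toy model, not how HDFS places blocks: the round-robin policy, the function names `place_blocks` and `readable_after_failure`, and the node names are all assumptions made for illustration.

```python
def place_blocks(file_size, block_size, nodes, replication=3):
    """Split a file into fixed-size blocks and assign each block to
    `replication` distinct nodes (hypothetical round-robin policy)."""
    if replication > len(nodes):
        raise ValueError("replication factor exceeds node count")
    n_blocks = -(-file_size // block_size)  # ceiling division
    placement = {}
    for block in range(n_blocks):
        # Start each block on a different node so replicas spread evenly.
        placement[block] = [
            nodes[(block + r) % len(nodes)] for r in range(replication)
        ]
    return placement


def readable_after_failure(placement, failed_nodes):
    """Return True if every block still has a replica on a live node."""
    return all(
        any(node not in failed_nodes for node in replicas)
        for replicas in placement.values()
    )


if __name__ == "__main__":
    layout = place_blocks(1000, 128, ["n1", "n2", "n3", "n4"])
    for block, replicas in layout.items():
        print(block, replicas)
    print(readable_after_failure(layout, {"n1", "n2"}))
```

With three replicas per block, losing any two nodes in this toy layout still leaves every block readable, which is the intuition behind failover to replicated data.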

Cloud

(Same five characteristics as listed above: Distributed Storage, Extensible, Automated Failover, Hyper-Scale, Distributed Compute)

Big Data in the Cloud

(Same five characteristics as listed above: Distributed Storage, Extensible, Automated Failover, Hyper-Scale, Distributed Compute)

HDInsight Provides Purpose-built Cluster Types

Cluster Type | Components
Hadoop | HDFS, MapReduce2, YARN, Tez, Hive, Pig, Sqoop, Oozie, Zookeeper, Ambari Metrics, Slider
HBase | HDFS, MapReduce2, YARN, Tez, Hive, HBase, Phoenix Query Server, Pig, Sqoop, Oozie, Zookeeper, Ambari Metrics
Storm | HDFS, MapReduce2, YARN, Tez, Hive, Pig, Sqoop, Oozie, Zookeeper, Storm, Ambari Metrics, Kafka
Spark | HDFS, MapReduce2, YARN, Tez, Hive, Pig, Sqoop, Oozie, Zookeeper, Ambari Metrics, Spark, Zeppelin, Livy
Interactive Hive | HDFS, MapReduce2, YARN, Tez, Hive2 LLAP, Pig, Sqoop, Oozie, Zookeeper, Ambari Metrics, Slider
R Server | HDFS, MapReduce2, YARN, Tez, Hive, Pig, Sqoop, Oozie, Zookeeper, Ambari Metrics, Spark, Livy
Kafka | HDFS, MapReduce2, YARN, Tez, Hive, Pig, Sqoop, Oozie, Zookeeper, Ambari Metrics, Kafka

• Components marked in RED are the components that drive the cluster type use case
• Spark clusters also have Jupyter installed
• All clusters come HA-enabled by default
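As a quick way to work with the cluster type table above, the sketch below encodes it as a Python dictionary and looks up which cluster types ship a given component. The component lists are copied from the table; the names `COMMON`, `CLUSTER_COMPONENTS`, and `cluster_types_with` are hypothetical helpers and not part of any HDInsight API.

```python
# Components that appear in every row of the table above.
COMMON = {
    "HDFS", "MapReduce2", "YARN", "Tez", "Pig", "Sqoop",
    "Oozie", "Zookeeper", "Ambari Metrics",
}

# Per-cluster-type additions, transcribed from the table.
CLUSTER_COMPONENTS = {
    "Hadoop": COMMON | {"Hive", "Slider"},
    "HBase": COMMON | {"Hive", "HBase", "Phoenix Query Server"},
    "Storm": COMMON | {"Hive", "Storm", "Kafka"},
    "Spark": COMMON | {"Hive", "Spark", "Zeppelin", "Livy"},
    "Interactive Hive": COMMON | {"Hive2 LLAP", "Slider"},
    "R Server": COMMON | {"Hive", "Spark", "Livy"},
    "Kafka": COMMON | {"Hive", "Kafka"},
}


def cluster_types_with(component):
    """Return the cluster types (sorted by name) that include `component`."""
    return sorted(
        name for name, components in CLUSTER_COMPONENTS.items()
        if component in components
    )


if __name__ == "__main__":
    print(cluster_types_with("Spark"))
    print(cluster_types_with("Hive2 LLAP"))
```

A lookup like this makes it easy to see, for example, that Spark is listed for both the Spark and R Server cluster types, while Hive2 LLAP appears only under Interactive Hive.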


Big Data in the Cloud - Options

Scenarios for deploying as hybrid

Traditional Clusters - On-Prem

[Diagram: Hadoop Cluster. Labels: Client, Job (jar) file, Master Node, Worker Node, TaskTracker, Tasks, HDFS]

Clusters in the Cloud

Azure HDInsight

Hadoop and Spark as a Service on Azure

Fully managed Hadoop and Spark for the cloud

100% Open Source Hortonworks Data Platform

Clusters up and running in minutes

Managed, monitored and supported by Microsoft with the industry’s best enterprise SLA

Use familiar BI tools for analysis, or open source notebooks for interactive data science

63% lower total cost of ownership than deploying your own Hadoop on-premises*

*IDC study “The Business Value and TCO Advantage of Apache Hadoop in the Cloud with Microsoft Azure HDInsight”

HDInsight Cluster

[Diagram: HDInsight cluster. Labels: Head node, Data node, Back-up, Domain credentials, Azure Data Lake Storage, Azure Storage Blob]

HDInsight Cluster Security

[Diagram: HDInsight cluster security. Labels: AAD tenant, Azure VNET to VNET peering, Domain credentials, Head node, Data node, Back-up, Azure Data Lake Storage, Azure Storage Blob]

Decoupling - Benefits

What’s New in HDInsight 3.6

• HDInsight 3.6 GA announced during DataWorks Summit Munich
• “HDInsight 3.6 has the latest Hortonworks Data Platform (HDP) 2.6 platform, a collaborative effort between Microsoft and Hortonworks to bring HDP to market cloud-first.”
• https://azure.microsoft.com/en-us/blog/announcing-general-availability-of-azure-hdinsight-3-6/
• Interactive Hive improvements
• Spark 2.1 GA*
• Zeppelin added to Spark Cluster Type
• Improved cluster creation time

*GA means clusters are backed by Azure SLA


Big Data Application Architecture

The Azure Architecture

[Diagram: the Azure architecture. Stages: Ingestion, Backend, Frontend. Components: Source A, Source B, Source C, Source D, Data Factory, PowerShell, Stream Analytics, Azure Data Lake Store, HDInsight, Azure Data Lake Analytics, Azure SQL Data Warehouse, Azure Analysis Services. Interfaces to analysts: Push Stream, DAX, T-SQL, HiveQL]

The Azure Architecture - Detailed

Example: Big Data in Telco

Telarix uses big data to help maintain call quality

“Carriers are going to create new wireless applications and offerings (voice, video, MMS, or whatever the next great application is), and our customers’ networks need to be able to support this.”

Vic Bozzo, Senior VP of Worldwide Sales and Marketing

Scenario

Telarix helps telecommunications carriers worldwide maintain call quality, manage costs, and streamline traffic. Telarix’s suite handles traffic and quality management, trading, routing, billing, and settlement for more than 300 billion voice, SMS, content, and data minutes each year.

Solution

Telarix used SQL Server and Azure HDInsight to analyze large volumes of structured and unstructured data in real time.

Result

• Telarix can keep up with carriers that are creating new wireless applications and offerings, such as voice, video, and MMS. Telarix will provide these carriers the same business process to trade, route, settle, manage, invoice, bill, and collect across all of their services.

Example: Big Data used for targeted customer advertisement

Linkury uses big data to make online content discovery profitable for search and social engines, publishers, and marketers

“We had gained a lot of traffic, but we couldn’t really manage and analyze the data in real time. Now we have regained control, which means, for example, that we can spend more time analyzing fraud or ad campaigns that are performing poorly.”

Kobi Eldar, CTO

Scenario

Linkury is a tool that helps monetize the online advertising market. They needed to analyze hundreds of millions of web traffic events each day to help build targeted advertising based on customer behavior.

Solution

Azure HDInsight (Hadoop-as-a-service) with Storm for HDInsight to analyze real-time data in Hadoop.

Result

• Linkury now captures hundreds of millions of web traffic events in real time, including how users browse, what actions they take, and how they interact with devices and products, to display targeted online advertisements.
• Can now show advertising effectiveness through third-party BI tools that show key metrics

Example: Big Data used for connected cars

Delphi Automotive uses big data for car owners to keep tabs on their cars

“With Delphi Connect, car owners can find out how close to home their spouse is so they can put the finishing touches on dinner. They can keep tabs on teenage drivers by setting up geo-fences. If the car goes outside of a geo-fence or drives faster than a specified speed limit, mom or dad receives an email or text message.”

Victor Canseco, Managing Director

Scenario

Delphi, a leading global supplier of technologies for the automotive industry, introduced Delphi Connect, an after-market connected-car product that lets drivers digitally interact with their cars through smartphones, tablets, and PCs.

Solution

Azure HDInsight and SQL Server in an Internet of Things (IoT) scenario for capturing and analyzing data from cars (vehicle diagnostics, geo-fencing, geo-location, mileage tracking, Bluetooth). Also uses Azure Service Bus and SQL Database to understand geo-fencing around a map.

Result

• Drivers can now understand information on their cars, such as how they were driven, where they parked, the route they took, duration, and mileage. They also get real-time information on what other drivers are doing with their car.
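The geo-fence and speed-limit alerts described in the quote above can be sketched as follows. This is only an illustration under stated assumptions: the deck does not describe Delphi's implementation, so the circular fence, the haversine distance, and the names `GeoFence`, `haversine_km`, and `check_reading` are all hypothetical.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


@dataclass
class GeoFence:
    """A circular fence (assumption: real fences may be polygons)."""
    lat: float
    lon: float
    radius_km: float


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def check_reading(fence, lat, lon, speed_kmh, speed_limit_kmh):
    """Return the alerts triggered by one vehicle reading."""
    alerts = []
    if haversine_km(fence.lat, fence.lon, lat, lon) > fence.radius_km:
        alerts.append("outside geo-fence")
    if speed_kmh > speed_limit_kmh:
        alerts.append("over speed limit")
    return alerts


if __name__ == "__main__":
    home = GeoFence(lat=47.6, lon=-122.3, radius_km=5.0)
    print(check_reading(home, 47.6, -122.3, 50, 100))
    print(check_reading(home, 48.6, -122.3, 120, 100))
```

In a pipeline like the one in the Solution section, a check of this kind would run per incoming reading, with the resulting alerts handed to whatever service sends the email or text message.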

Summary


Call to Action

Points to remember

CONNECT
• Contacts:
  • sales@hortonworks.com
• Docs and Forums:
  • https://docs.microsoft.com/en-us/azure/hdinsight/
  • https://azure.microsoft.com/en-us/support/forums/

Connect and voice your customers’ opinion

Ramp up on our new services NOW!!


EVOLVE
• Know more
  • http://www.microsoft.com/hdinsight
• Leverage free trial on Azure
  • https://azure.microsoft.com/en-us/free/
• Try Hortonworks Sandbox on Azure
  • http://hortonworks.com/sandbox

LEARN
• http://learnanalytics.microsoft.com/
• Trainings on
  • Spark in Azure HDInsight
  • Azure HDInsight Administration and Security
  • R Server on Azure HDInsight

© 2016 Microsoft Corporation. All rights reserved.
