Modernizing Business Intelligence and Analytics
Source: cdn.govexec.com/media/ctd_analytic_db_final.pdf

© Cloudera, Inc. All rights reserved.


Justin Erickson, Senior Director, Product Management

Agenda

• What benefits can I achieve from modernizing my analytic DB?
• When and how do I migrate from current systems?
• How does it work in the cloud?

Key Applications

• EDW Optimization: Use your EDW more efficiently by offloading workloads to Hadoop
• Data Preparation: Fast, flexible ETL over large data volumes, so data is always ready for your business
• Self-Service BI & Exploration: Fastest time-to-insights with a modern analytic database designed with Hadoop’s flexibility and agility

Cloudera’s Analytic Database

• Navigator Optimizer: Identify, offload, & optimize workloads to Hadoop
• Hue: Intelligent SQL editor
• Navigator: Audit, lineage, encryption, key management, & policy lifecycles
• BI Partners: Integration with the leading BI tools
• Impala: Interactive query engine for BI & SQL analytics
• Hive-on-Spark: Large-scale ETL & batch processing engine
• Kudu: Data storage for fast & changing data
• Multi-Storage, Multi-Environment

Key Benefits
An analytic database designed for Hadoop

• High-Performance BI and SQL Analytics
• Flexibility for Data and Use Case Variety
• Cost-effective Scale for Today and Tomorrow
• Go Beyond SQL with an Open Architecture

Analytic DB Anatomy
Built for self-service and hybrid cloud

Anatomy of an Analytic Database
Cloudera: Decoupled by Design

Monolithic Analytic Database
• Query Engine
• Storage Engine
• Catalog

Modern Analytic Database
• Query Engine (Impala)
• Catalog (HMS)
• Storage (Kudu)
• Storage (S3)
• Storage (HDFS)
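The decoupling described above can be sketched in a few lines of Python. This is a toy model, not Impala, HMS, or Kudu code: every class and method name here (`InMemoryStorage`, `Catalog`, `QueryEngine`, `select`) is hypothetical and exists only to show a catalog that maps tables to interchangeable storage backends, with query engines that hold no data of their own.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Protocol, Tuple


class Storage(Protocol):
    """Anything that can return rows for a location (hypothetical interface)."""

    def scan(self, location: str) -> Iterable[dict]: ...


@dataclass
class InMemoryStorage:
    """Stand-in for a storage layer such as HDFS, S3, or Kudu."""

    name: str
    data: Dict[str, List[dict]] = field(default_factory=dict)

    def scan(self, location: str) -> Iterable[dict]:
        return iter(self.data.get(location, []))


@dataclass
class Catalog:
    """Shared metadata: table name -> (backend name, location)."""

    tables: Dict[str, Tuple[str, str]] = field(default_factory=dict)

    def register(self, table: str, backend: str, location: str) -> None:
        self.tables[table] = (backend, location)

    def lookup(self, table: str) -> Tuple[str, str]:
        return self.tables[table]


@dataclass
class QueryEngine:
    """Stateless compute: resolves tables through the catalog, then scans storage."""

    catalog: Catalog
    backends: Dict[str, Storage]

    def select(self, table: str,
               predicate: Callable[[dict], bool] = lambda row: True) -> List[dict]:
        backend, location = self.catalog.lookup(table)
        return [row for row in self.backends[backend].scan(location) if predicate(row)]
```

Because the engines hold only references to the catalog and the backends, a second `QueryEngine` can be created or discarded without copying data, which is the property the later cloud slides rely on.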

Pain Points
Traditional Monolithic Analytic Databases

Limited to SQL only
• Maintain data copies for non-SQL

Rigid Data Model
• Tightly coupled storage and compute

Static Sizing
• Major maintenance to add capacity/nodes

Poorly Designed for Cloud
• No elasticity or integration with object storage

Diagram labels: COMPUTE; STORE

Benefits of Cloudera’s Modern Approach
Cloud-Native & On-Premise

Go Beyond SQL
• Open Architecture: Open formats and open storage
• Shared data across SQL and non-SQL workloads

Data Flexibility
• Faster, more agile data acquisition
• Data portability: Open formats and open storage

Cost-Effective Scalability
• Elastic scale on-prem or in the cloud
• Cloud-native pay-per-use and transience
• Proven at big data scale

Hybrid
• Runs across multi-cloud & on-prem
• Multi-storage over S3, HDFS, Kudu, Isilon, DSSD, etc.

Diagram label: Shared Data

EDW Optimization
Expand the Value of Your Data Warehousing Landscape

Motivations for Optimizing the EDW

• Cost containment for existing workloads; limited budget for expansion
• Unable to take on new workloads; unable to keep up with changing business needs
• Difficulty handling both fixed-SLA reports and self-service exploration
• Growing importance of self-service BI, advanced analytics, and cloud

Existing EDW Landscape

Diagram labels: Data Sources; ETL/Staging; EDW; Archive; Data Marts; Canned Reports; Dashboards/Analytic Applications; Non-SQL Workloads; Self-Service BI/Ad Hoc

Optimizing the EDW with Cloudera

• Cost-Effective Scale
  • Say yes to more without the risk
• Go Beyond SQL
  • Exploration, advanced analytics, and more all in one platform
• Modernize the Data Warehouse Landscape
  • Maximize the EDW while enabling iterative, self-service access/BI
  • Well-suited for on-prem, cloud, and hybrid deployments

Customer results:
• 90% less per TB vs RDBMS and 75% less vs Netezza
• Augmented its Oracle EDW with a multi-tenant Cloudera system, with its BI tool configured to let users pull reports from both
• Media Research Firm: Saved tens of millions by offloading DBMS to Cloudera in the cloud

Modern Data Warehouse Environment

Diagram labels: Data Sources; EDW; Modern Data Platform (Analytic Database; Operational Database; Data Science & Engineering; Shared Data Layer); Fixed Reports; Flexible Reporting; Dashboards/Analytic Applications; Non-SQL Workloads; Self-Service BI/Ad Hoc

Navigator Optimizer
Built to help you through the optimization process

• Evaluate: Evaluate the need for offload
• Plan: Impact analysis, prioritized plan
• Offload: Offload each workload
• Optimize: Optimize performance

Diagram labels: Workload Visibility; Identify Use Cases; Objectives; Validate ROI, Cost; Initial POC; Impact Analysis; Prioritized Plan; Estimate Effort; Risk Analysis; Offload Actions; Schema Design; Test & Validate; Fine Tuning Data Model on Hadoop; Optimize Queries for Performance

Workload Visibility
Get insights into what’s happening today
Phase: Evaluate

Evaluate Queries
• Top queries
• Query duplication
• Query complexity
• Common access patterns

Evaluate Data Access
• Top tables, top columns
• Usage-based ER diagram
• All tables/columns in use

Evaluate POC
• Identify initial workload piece for PoC
• Get partitioning key suggestions
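To make "top queries", "query duplication", and "top tables" concrete, here is a minimal Python sketch that computes them from a list of SQL strings. It is not how Navigator Optimizer works internally. The function names are hypothetical, and the regular expressions are a deliberate simplification, where a real tool would use a SQL parser.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple


def normalize(sql: str) -> str:
    """Lowercase, drop a trailing semicolon, mask literals, collapse whitespace."""
    s = sql.strip().rstrip(";").lower()
    s = re.sub(r"'[^']*'", "?", s)            # string literals -> ?
    s = re.sub(r"\b\d+(\.\d+)?\b", "?", s)    # numeric literals -> ?
    return re.sub(r"\s+", " ", s)


def top_queries(log: Iterable[str], n: int = 3) -> List[Tuple[str, int]]:
    """Queries that differ only in literals or spacing count as duplicates."""
    return Counter(normalize(q) for q in log).most_common(n)


def top_tables(log: Iterable[str], n: int = 3) -> List[Tuple[str, int]]:
    """Count table names that directly follow FROM or JOIN (simplified)."""
    pattern = re.compile(r"\b(?:from|join)\s+([a-z_][\w.]*)")
    counts: Counter = Counter()
    for q in log:
        counts.update(pattern.findall(normalize(q)))
    return counts.most_common(n)
```

Masking literals before counting is what lets two runs of the same report with different filter values show up as one duplicated query shape.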

Impact Analysis & Prioritized Plan
Understand what it takes to offload
Phase: Plan

Impact Analysis
• Focus efforts by identifying duplication
• Workload risk assessment based on complexity and best practices
• Understand query compatibility

Prioritized Plan
• Estimate effort
• Identify easiest pieces to start for fast success
• Prioritize workloads for offload
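One way to picture "identify easiest pieces to start" is a simple risk score used as a sort key. The sketch below is purely illustrative: the `Workload` fields, the equal 0.5 weights, and the tie-break rule are assumptions made for this example, not anything the slides or Navigator Optimizer specify.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Workload:
    name: str
    query_count: int     # queries in this workload
    incompatible: int    # queries assumed to need a rewrite (hypothetical input)
    complexity: float    # 0.0 (simple) to 1.0 (complex), hypothetical score


def risk(w: Workload) -> float:
    """Blend the share of incompatible queries with complexity (arbitrary weights)."""
    share = w.incompatible / w.query_count if w.query_count else 0.0
    return 0.5 * share + 0.5 * w.complexity


def prioritize(workloads: List[Workload]) -> List[Workload]:
    """Lowest risk first; among equal risk, larger workloads first."""
    return sorted(workloads, key=lambda w: (risk(w), -w.query_count))
```

Sorting lowest-risk first matches the slide's advice to start where success comes fastest. A real plan would also weigh business value and effort estimates.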

Predictable Offload
Remove the guesswork
Phase: Offload

Understand offload requirements
• Determine most common workload patterns
• Develop data-/usage-driven offload strategy

Actionable recommendations
• Complexity assessment for riskier areas
• Focus efforts by identifying duplication
• Design recommendations for best results

Optimizing within Hadoop
Maintain peak performance
Phase: Optimize

Understand usage and keep up with data needs
• Understand most common usage patterns
• Identify optimization opportunities
• Proactively adjust data models

Performance optimizations
• Best practice guidance for Hive and Impala
• Query performance optimization
• Increase platform adoption

Built for hybrid cloud

What’s Driving Analytics to the Cloud?
Big data deployments in cloud are accelerating:

● Executive Mandate: Minimize on-prem data center footprint
● Increased Agility: End-user self-service
● Elasticity: Optimize infrastructure usage
● Lower Overall TCO

Most Organizations Are or Will Be Hybrid Cloud

• 76% will embrace hybrid cloud (Gartner [1])
• 82% will have a multi-cloud strategy (RightScale [2])
• 50% will “repatriate” at least one public cloud workload back to private cloud or on-prem for cost reasons (451 Research [3])
• 50% of Cloudera’s cloud customers run a hybrid environment

[1] Gartner, Market Trends: Cloud Adoption Trends Favor Public Cloud With a Hybrid Twist, 2015
[2] RightScale, 2016 State of the Cloud Report
[3] 451 Research, AWS Lambda: new and exciting, old and rehashed, more vendor lock-in (or all the above)?, November 22, 2016

Why is this a critical strategy?
• Portability & Cost
• Functionality
• Data Gravity

Cost-Efficiencies & Flexibility in the Cloud
Primary Analytic Database Patterns

ETL: Reduce Operating Costs
Only pay for what you need, when you need it
▪ Transient clusters
▪ Object storage centric
▪ Cloud-native deployment

BI/Analytics: New Insights, New Revenue
Explore and analyze all data, wherever it lives
▪ Long-running clusters
▪ Object storage or local storage
▪ Lift-and-shift deployment

Additive Benefits in the Cloud
Extending core performance, flexibility, scalability, and open architecture benefits

Add Use Cases, Analytics, and Data On-Demand
• Avoid the IT backlog with instant access to all data
• On-demand clusters query directly on shared object storage

Predictable Results Whenever You Want
• Consistent query performance, even during peak times
• Multi-tenancy via isolated clusters on shared data

Just-in-Time Resources
• Real-time capacity for your needs, as they change
• Elastically grow/shrink your cluster via decoupled architecture

Contention-Free ETL
• ETL anytime without impacting other workloads or risking SLAs
• Separate ETL clusters as-needed on shared data

BI/Analytics in the Cloud
Three Architecture Options to Optimize Price/Performance

Transient BI (infrequent usage)
Spin up clusters when needed
● On-demand instances
● Usage-based pricing
● Grow/shrink
● Cluster per tenant or user

Persistent BI (regular usage), Default Choice
Persistent clusters for BI anytime
● Reserved instances
● Node-based pricing
● Grow/shrink
● Cluster per tenant group

Persistent BI with Local Storage (fastest)
Max speed for more regular workloads
● Reserved instances
● Node-based pricing
● Less frequent grow/shrink
● Shared cluster for shared local data

Diagram labels: Object Storage; Transient Cluster; Persistent Cluster; HDFS and/or Kudu
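The price side of this trade-off comes down to simple arithmetic: usage-based pricing scales with hours used, while node-based pricing is paid for every hour of the month. The sketch below assumes both can be expressed as a per-node hourly rate and that a month has 730 hours. The function names and any rates passed in are illustrative placeholders, not Cloudera or cloud-provider prices.

```python
def monthly_cost_transient(nodes: int, hours_used: float, on_demand_rate: float) -> float:
    """Pay only for the hours the cluster actually runs."""
    return nodes * hours_used * on_demand_rate


def monthly_cost_persistent(nodes: int, reserved_rate: float,
                            hours_in_month: float = 730) -> float:
    """Pay for every hour of the month, typically at a lower hourly rate."""
    return nodes * hours_in_month * reserved_rate


def break_even_hours(on_demand_rate: float, reserved_rate: float,
                     hours_in_month: float = 730) -> float:
    """Monthly usage above which a persistent cluster of equal size is cheaper."""
    return hours_in_month * reserved_rate / on_demand_rate
```

For instance, if the reserved rate were half the on-demand rate, the break-even point would be half the month. Usage below that favors transient clusters, and usage above it favors persistent ones.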

Persistent BI on Object Storage
Best for elasticity (and speed vs transient)

● This is usually the best choice
● Best when workloads are:
  o Flexible and changing
  o Frequent during most working days
  o Not scheduled for fixed hours
● Benefits include:
  o Predictable results readily available
  o Full multi-tenant isolation
  o Common data in shared object storage
  o Grow/shrink for TCO efficiency
● Tradeoffs:
  o Per-node performance of object storage (use more, cheaper nodes)

Persistent BI (regular usage), Default Choice
Persistent clusters for ready availability
● Reserved instances
● Node-based pricing
● Grow/shrink
● Cluster per tenant group

Diagram labels: Object Storage; Shared HMS DB; Persistent Cluster

Persistent BI with Locally-Attached Storage
Best performance for consistent workloads

● Best when workloads are:
  o Regular and consistent
  o Consistently querying common data
  o Tight SLAs for performance
  o Fast changing data (that needs Kudu)
  o Running without object storage (e.g., Azure, GCE)
● Benefits include:
  o Faster performance per node on local data
  o Ability to query object storage for rest of data
● Tradeoffs:
  o Less elastic than object-storage-based clusters
  o Less isolation for multi-tenant workloads using same HDFS data
  o Cost if there are off-peak hours

Persistent BI with HDFS (fastest)
Max speed for more regular workloads
● Reserved instances
● Node-based pricing
● Less frequent grow/shrink
● Shared cluster for shared HDFS data

Diagram labels: Object Storage; Persistent Cluster; Local HMS DB; HDFS and/or Kudu

Transient BI on Object Storage
Best TCO for infrequent usage

● Best when workloads are:
  o Infrequent or scheduled
● Benefits include:
  o Lowest TCO with clusters only when needed
  o Full multi-tenant isolation
  o Common data in shared object storage
● Tradeoffs:
  o Delay to spin-up clusters when needed
  o Capability of BI users to spin up clusters
  o Per-node performance of object storage (use more, cheaper nodes)

Transient BI (infrequent usage)
Spin up clusters when needed.
● On-demand instances
● Usage-based pricing
● Grow/shrink
● Cluster per tenant or user

Diagram labels: Object Storage; Cloudera Director; Shared HMS DB; Transient Cluster
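The "best when workloads are" criteria across the three options can be read as a rough decision rule. The Python sketch below encodes one possible reading. The `WorkloadProfile` fields and the order in which the checks are applied are assumptions made for illustration, since the slides do not say how to rank conflicting criteria.

```python
from dataclasses import dataclass
from enum import Enum


class Architecture(Enum):
    TRANSIENT_OBJECT = "Transient BI on object storage"
    PERSISTENT_OBJECT = "Persistent BI on object storage"
    PERSISTENT_LOCAL = "Persistent BI with locally-attached storage"


@dataclass
class WorkloadProfile:
    """Hypothetical summary of a workload, mirroring the slide criteria."""

    infrequent_or_scheduled: bool = False
    tight_performance_slas: bool = False
    fast_changing_data: bool = False       # the slides say this needs Kudu
    object_storage_available: bool = True


def choose(profile: WorkloadProfile) -> Architecture:
    # Assumed precedence: hard requirements for local storage win first.
    if (not profile.object_storage_available
            or profile.fast_changing_data
            or profile.tight_performance_slas):
        return Architecture.PERSISTENT_LOCAL
    # Infrequent or scheduled work is where the slides see the best TCO.
    if profile.infrequent_or_scheduled:
        return Architecture.TRANSIENT_OBJECT
    # Otherwise fall back to the option the slides mark as the default choice.
    return Architecture.PERSISTENT_OBJECT
```

Calling `choose(WorkloadProfile())` with no flags set returns the persistent object-storage option, in line with the slides calling it usually the best choice.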

Thank you
Justin Erickson