www.firstsanfranciscopartners.com Produced by: MONTHLY SERIES Brought to you in partnership with: April 6, 2017 Building a Flexible and Scalable Analytics Architecture

DI&A Webinar: Building a Flexible and Scalable Analytics Architecture


The First Step in Information Management


Polling Question

§ Where is your organization in its readiness to develop a formal Big Data and analytics architecture?
− We have no plans or architecture for analytics, but want to have one.
− We have a strategy and are planning an architecture.
− We have started to implement a planned architecture for analytics.
− None of the above.

pg 2© 2017 First San Francisco Partners www.firstsanfranciscopartners.com

Typical Problems

§ Analytics teams are set up outside of any formal set of guardrails
− They do good work, for a while
− Then they start to ask “Why isn’t this more organized?”
§ A CIO decides that a company needs to do better with data, acquires $15 million in Big Data technology and sets up a data lab
− A few sponsors start to use the lab, but costs of operation seem to exceed the benefits from their efforts
− Someone asks “Why did we do this?”
§ Both were missing a clear plan manifested in a formal architecture

Topics For Today’s Webinar

§ What is a big data and analytics architecture?
§ When should big data and analytics architectures be employed?
§ An architecture for big data systems: key components
§ Best practices
§ Q&A

What is a Big Data and Analytics Architecture?

A Definition of Architecture

§ The art and discipline of designing buildings and structures, from the macro-level of urban planning to the micro-level of creating furniture and machine parts.
§ The design of any complex object or system. It may refer to the implied architecture of abstract things such as music or mathematics, the apparent architecture of natural things such as geological formations or living things, or explicitly planned architecture of human-made things such as buildings, machines, organizations, processes, software and databases.
§ The organized arrangement of component elements to optimize the function, performance, feasibility, cost and/or aesthetics of an overall structure.

From The DAMA Guide to the Data Management Body of Knowledge

Definition of Architectures for Big Data and Analytics

§ Therefore, the Big Data and Analytics architecture is an arrangement of elements that are used to manage and leverage enormous amounts of data to perform analytics.

Considerations for Architectures for Big Data and Analytics

§ Avoid a Winchester house
− Complicated with many permutations and variables
− It is additive
− Making a mistake can get expensive if you bolt on an incompatible set of elements
§ Ensure you need Big Data for Analytics
§ Consider characteristics that optimize the function, performance, feasibility and cost

Winchester house: all design, no architecture
− Rooms: 160; Doorways: 467; Doors: 950; Fireplaces: 47 (gas, wood, coal); Bedrooms: 40
− Constructed 1884 – 1922 (38 continuous years); Cost: $5.5M
− Blueprints: Never made; individual rooms sketched out by Sarah Winchester on paper or other media (e.g., tablecloths)

Elements of a Big Data and Analytics Architecture

[Figure: matrix of architecture elements. Columns: Organization Elements, Functional Elements, Technology Elements. Rows: Data Consumption, Data Supply Chain/Logistics, Data Management.]

Elements shown in the matrix:
− Data Analysts and Scientists; Collaboration, Decision-Making; Access (Publish, Subscribe, Notify); Access tools (BI, Analytics); Applications; Analytics (Descriptive, Predictive, Prescriptive)
− Business/Tech. Planning; Security, Privacy; Business Continuity; Policy and Process; Competency Centers; Self-Service/Data Citizens
− Data Governance; Data Operations; Data Ingestion; Reference and Master Data; Metadata; Data Quality/Hygiene; Glossary Management; Data Glossary; Taxonomy/Ontology
− Pedigree and Preparation; Landing/Staging; Model/Metrics Management; Data Reduction; Machine Learning/AI; Data Lake Management
− ETL/Virtualization; Distributed Processing; Lake, Pond, Warehouse; HDFS, Columnar and Graph; Data Streaming; Web Services

Two Lenses to Derive an Effective Architecture

§ Form: Developing the architecture so all stakeholders can actually understand and develop it
§ Progression: Develop architectures that are best fit for purpose and effective, no matter how simple or complex

Forms of Architectures for Big Data and Analytics

§ Architecture forms:
1. Abstract – Enable and convey insight so it can be considered and adopted
2. Apparent – Obvious structure so it can be used to manage data as well as interface with people and processes
3. Explicitly Planned – Must be comprehensive, not just a technology stack and a bunch of abstract arrows, so you can manage and sustain the environment
§ Your Big Data and Analytics architecture needs to consider all three forms.

Progression of Architecture

[Figure: progression of architecture. Stage labels: Isolated; Scattered Analytics; Recognized; Data-Driven; Insight-Driven; Embedded (support tactical operations, monetization of data). Annotations: “Value has limited audience”; “Audience expands”.]

Your architecture will not be static. Presenting only the ultimate future state is not practical.

When Should Big Data and Analytics Architectures be Employed?

Factors That Trigger the Need for Formal Architecture

These will affect architecture and progression:
§ Veracity (the 4th V) – Scattered, isolated, reactive
§ Variety – Consolidating content, insight from more than just rows and columns
§ Volume – Consolidated content, tactical uses, monetization
§ Velocity – Business velocity, not just data velocity
§ “Net New” – Generation skipping, balance alongside meeting traditional needs

First Progression – Isolated, Scattered Analytics (Abstract)

[Figure: element matrix for the first progression. Columns: Organization elements, Functional elements, Technology elements. Rows: Data Consumption, Data Supply Chain/Logistics, Data Management.]

Elements shown in the matrix:
− Data Analysts; Access (Publish, Subscribe, Notify); Access tools (BI, Analytics); Analytics (Descriptive, Predictive, Prescriptive)
− Landing/Staging; Data Operations; ETL
− HDFS, Columnar and Graph

Isolated, Scattered Analytics

[Figure: example data flow. Labels: Legacy; Usage; Claims; Customer; Client Data; ETL; Ingest; EDW; Hadoop; Spark; BI and Reporting; Predictive Analytics; Analyst; Data Scientist.]

Second Progression – Recognized Value (Abstract)

[Figure: element matrix for the second progression. Columns: Organization elements, Functional elements, Technology elements. Rows: Data Consumption, Data Supply Chain/Logistics, Data Management.]

Elements shown in the matrix:
− Data Analysts and Scientists; Access (Publish, Subscribe, Notify); Access tools (BI, Analytics); Analytics (Descriptive, Predictive, Prescriptive)
− Security, Privacy; Policy and Process; Competency Centers
− Data Governance; Data Operations; Data Ingestion; Reference & Master Data; Data Quality/Hygiene; Glossary Management; Data Glossary
− Landing/Staging; Data Lake Management
− ETL/Virtualization; Lake, Pond, Warehouse; HDFS, Columnar & Graph

Example – B2B Insurance Company and Data Monetization

§ Insight driven, monetizing data as separate line of business
− Data lake, Hadoop, dedicated, isolated data science area; isolated monetization area

[Figure: hybrid data architecture. Labels: Legacy; Usage; Hybrid Data Architecture; New Data Products; Claims; Customer; Client Data; ETL; Ingest, pedigree; EDW; Hadoop Lake; Spark; Governance, Data Management; BI and Reporting; Predictive Analytics.]

Third Progression – Data Driven (Abstract)

[Figure: element matrix for the third progression. Columns: Organization elements, Functional elements, Technology elements. Rows: Data Consumption, Data Supply Chain/Logistics, Data Management.]

Elements shown in the matrix:
− Data Analysts and Scientists; Collaboration, Decision Making; Access (Publish, Subscribe, Notify); Access tools (BI, Analytics); Applications; Analytics (Descriptive, Predictive, Prescriptive)
− Business/Tech. Planning; Security, Privacy; Business Continuity; Policy and Process; Competency Centers; Self Service/Data Citizens
− Data Governance; Data Operations; Data Ingestion; Reference & Master Data; Metadata; Data Quality/Hygiene; Glossary Management; Data Glossary; Taxonomy/Ontology
− Pedigree and Preparation; Landing/Staging; Model/Metrics Management; Data Reduction; Machine Learning/AI; Data Lake Management
− ETL/Virtualization; Distributed Processing; Lake, Pond, Warehouse; HDFS, Columnar & Graph; Data Streaming; Web Services
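The three progression slides can be read as an additive build-up of elements. The sketch below is one hedged way to encode that reading in Python. The element names are transcribed from the slides, but the normalization (for example, treating "ETL" and "ETL/Virtualization" as one element), the assumption that each progression is a strict superset of the previous one, and the helper names `cumulative_elements` and `stage_for` are illustrative assumptions, not part of the original deck.

```python
# Elements first appearing at each progression (names abridged from the slides).
# Assumption: later progressions keep everything introduced earlier.
ADDED_BY_STAGE = [
    ("Isolated, Scattered Analytics", [
        "Landing/Staging", "Data Operations", "ETL", "Data Analysts",
        "Access (Publish, Subscribe, Notify)", "Access tools (BI, Analytics)",
        "Analytics (Descriptive, Predictive, Prescriptive)",
        "HDFS, Columnar and Graph",
    ]),
    ("Recognized Value", [
        "Glossary Management", "Data Governance", "Data Ingestion",
        "Reference and Master Data", "Competency Centers",
        "Data Quality/Hygiene", "Lake, Pond, Warehouse", "Data Glossary",
        "Data Lake Management", "Policy and Process", "Security, Privacy",
    ]),
    ("Data Driven", [
        "Pedigree and Preparation", "Model/Metrics Management",
        "Data Reduction", "Machine Learning/AI", "Self-Service/Data Citizens",
        "Distributed Processing", "Metadata", "Data Streaming",
        "Taxonomy/Ontology", "Web Services", "Collaboration, Decision-Making",
        "Applications", "Business/Tech. Planning", "Business Continuity",
    ]),
]


def cumulative_elements(stages=ADDED_BY_STAGE):
    """Map each progression name to every element available by that stage."""
    seen = set()
    result = {}
    for name, added in stages:
        seen |= set(added)
        result[name] = frozenset(seen)
    return result


def stage_for(required, stages=ADDED_BY_STAGE):
    """Return the earliest progression covering all required elements, or None."""
    needed = set(required)
    for name, elements in cumulative_elements(stages).items():
        if needed <= elements:
            return name
    return None


if __name__ == "__main__":
    print(stage_for(["ETL", "Data Governance"]))
    print(stage_for(["Machine Learning/AI", "Data Streaming"]))
```

A lookup like this can help a team see which progression a planned capability implies, rather than presenting only the ultimate future state.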

Example – “Net New;” Generation-Skipping, Prescriptive

[Figure: “net new” architecture. Labels: New Applications; Usage; Streaming IoT; Social; Digital Content; eMail, Docs; Applications; Storm; Pre-processing, validation; Hadoop connector; Ingest, pedigree; Hadoop Data Lake; Spark; EDW; Logical Data Warehouse; Governance, Data Management; Exploration & Discovery; BI and Reporting; Predictive Analytics; Data Products; Citizen Data Scientist.]

Example – Data and Analytics Technology Stack (Apparent)

[Figure: technology stack from sources to insights. Labels: Sources; Data Intake; Data Preparation: Permissions, Dictionary, Indexing, Pedigree; Data Landing Zone (Data Lake); Data Transformation, Reduction; Analytical Data Assets; Analytical Computing Infrastructure; BI/Reporting Assets; Model Server, Data Access; BI Tool; Analytics Tool; Data Portal; Insights; BI and Reports; Analytics Results; Monetized Data; Results.]

Reference Data Architecture with Services

[Figure: reference data architecture with services. Labels grouped as they appear on the slide.]

− Data Stores: Transactional Application Data Layer; Integrated Data Layer; Data Warehouse Layer; Data Mart Layer; Analytics Data Layer; Archived Data Layer; Metadata Layer
− Content: External; Internal
− Data Management Services: Data Integration Services; Data Movement Services; Data Quality Services; Data Access Services
− Enterprise Services: Environment Management Services; Security Services
− Data within the layers: Master Data; Event Data; Integrated Master Data; Integrated Event Data; Conformed Dimensions; Atomic Facts; Derived Facts; History
− Operational: Conformed Master Data; Integrated Event Data
− Analytic: Conformed Master Data; Derived Facts + History
− Cubes (Multi-Dimensional Analytics); Advanced Analytics (Statistical Analysis, Data Mining, etc.)
− Metadata Layer contents: Ontologies/Dictionaries; Business Rules; Operational Metadata; Technical Master Data
− Archive

An Architecture for Big Data Systems: Key Components

FSFP Reference Architecture – Abstract Type

§ Like an I-beam, the data architecture needs to take the load of meeting business objectives, and distribute that load to supportive structures

[Figure: DATA INSIGHT ARCHITECTURE drawn as an I-beam. Labels: Business Strategy; Data Access Layer; Management Layer; Wrangling Layer.]

FSFP Reference Architecture – Abstract

[Figure: DATA INSIGHT ARCHITECTURE, abstract form.]

− Business Strategy
− Data Access Layer: BI/Reporting, Analytics, Mobile
− Management Layer: Metadata, Lineage, Work Flow, Models, Reference Data, Rules, Canonical Data
− Data Movement/Logistics; Cross-Generation Abstraction Processes and Mapping; Data Virtualization, Services
− Vintage Area: Legacy applications and data structures, traditional methods. Mission: To Serve and Protect
− Contemporary Area: New apps and data structures, Agile methods. Mission: Flexible, Responsive

FSFP Reference Architecture – Apparent

[Figure: DATA INSIGHT ARCHITECTURE, apparent form. Structural labels: Business Strategy; Data Usage; Management; Data Life Cycles; Vintage Area; Contemporary Area.]

Components shown:
− Legacy BI and Reporting; Data Warehouse, ODS, Mart; ETL, EAI, Replication; RDBMS, SQL, In-Memory Appliance
− Data Lake, Pond; NoSQL (HDFS, Graph); Advanced Analytics; Logical DW; Unstructured Data
− Metadata; Lineage; Reference Data; Alignment
− Data Monetization; Visualization; Data Wrangling; Mobile

FSFP Reference Architecture – Explicit

[Figure: DATA INSIGHT ARCHITECTURE, explicit form. Structural labels: Business Strategy; Vintage Area; Contemporary Area.]

− Data Access Layer: BI/Reporting, Analytics, Mobile
− Management Layer: Metadata, Lineage, Work Flow, Models, Reference Data, Rules, Canonical Data
− Data Movement/Logistics; Cross-Generation Abstraction Processes and Mapping
− Vintage Views; Vintage Apps; Agile Apps; Future Apps; $ Monetization
− Web Services; Distributed Processing; Data Virtualization; ETL; Ingestion, pedigree; Preprocess
− DBMS; RDBMS; EDW; NoSQL; Lake; DM; IoT; External Data; Unstructured Data

FSFP Reference Architecture – Data Access Focus

[Figure: DATA INSIGHT ARCHITECTURE with the Data Access Layer expanded. Structural labels: Business Strategy; Data Access Layer; Management Layer; Data Supply Life Cycles and Supply Chains – Movement/Logistics; Vintage Area; Contemporary Area.]

Data Access Layer components: Portals; Report, BI, Query; Workbenches; Labs; Web Services, Data Virtualization; Mobile

Best Practices

§ Apply different lenses
− Consider Forms
− Consider Progressions
§ Reconcile old to new
− Understand business needs
− Reconcile current-state technology and future-state technology
§ Apply the “I-beam”
− Address Vintage and Contemporary systems
§ Have a plan
− Establish priorities
− Identify where you start
− Identify who is affected

Have a Methodology

§ Establish (but with a defined architecture) a Sandbox, PoC
§ Define the Vision of value and return
§ Perform Alignment
§ Assess the V’s, culture and organization readiness
§ Define long-term requirements for use
§ Define operating models
§ Design the Analytics Architecture
§ Develop a realistic roadmap
§ Transition to a sustainable architecture

[Figure: methodology cycle. Labels: Discovery; Strategy; Action; Initiation; Assessment; Vision and Alignment; Requirements; Architecture and Design; Operating Model; Roadmap; Implementation and Operation; Measurement and Sustaining.]

Copyright: First San Francisco Partners, 2017

Questions?

Thank you! See you Thursday, May 4 for our next DIA webinar, The Role of a Data Scientist (Interview with a CDS)

John Ladley @[email protected]

Kelle O’Neal @[email protected]

Need Definitions, Explanations – Not Just Picture

Layer characteristics for three layers: Transactional Application, Data Warehouse, Data Mart

§ Purpose
− Transactional Application: Data produced via the automation of business processes
− Data Warehouse: View of data across the enterprise. Supports dissemination, derivation of knowledge and history
− Data Mart: Data structured and filtered to support specific information needs of small groups of users.
§ Data Life Cycle
− Transactional Application: All base (non-derived) data originates here
− Data Warehouse: Derivations (including aggregations) produced here, and history is inferred
− Data Mart: Data from Warehouse is transformed to support specific reporting
§ Data Operations
− Transactional Application: Create/Source/Read/Update/Delete/Archive
− Data Warehouse: Extract/Transform/Load/Derive/Publish/Archive
− Data Mart: Subscribe/Transform/Archive
§ Data Model
− Transactional Application: Normalized to 3NF
− Data Warehouse: Subject Oriented/Snowflaked/Conformed Dimensions
− Data Mart: Information Requirement Oriented/Snowflaked/Conformed Dimensions

§ Much more is needed than the above
§ Definitions are a technical reference; explanations help stakeholders to understand the reference architecture
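The Data Operations row of the layer table lends itself to a small lookup. The sketch below, in Python, encodes the operations listed for each layer and checks whether an operation belongs to a layer. Treating the table as an enforceable rule set is an assumption made for illustration, and the names `Layer`, `ALLOWED_OPERATIONS`, `is_allowed` and `layers_supporting` are hypothetical, not taken from the deck.

```python
from enum import Enum


class Layer(Enum):
    TRANSACTIONAL_APPLICATION = "Transactional Application"
    DATA_WAREHOUSE = "Data Warehouse"
    DATA_MART = "Data Mart"


# Operations per layer, transcribed from the Data Operations row of the table.
ALLOWED_OPERATIONS = {
    Layer.TRANSACTIONAL_APPLICATION: {
        "create", "source", "read", "update", "delete", "archive",
    },
    Layer.DATA_WAREHOUSE: {
        "extract", "transform", "load", "derive", "publish", "archive",
    },
    Layer.DATA_MART: {"subscribe", "transform", "archive"},
}


def is_allowed(layer, operation):
    """Return True if the table lists this operation for the given layer."""
    return operation.lower() in ALLOWED_OPERATIONS[layer]


def layers_supporting(operation):
    """Return the layers, in table order, that list the given operation."""
    return [layer for layer in Layer if is_allowed(layer, operation)]


if __name__ == "__main__":
    print(is_allowed(Layer.DATA_MART, "Derive"))
    print([layer.value for layer in layers_supporting("transform")])
```

Written this way, the definitions become a checkable technical reference; the explanations for stakeholders still need to be provided in prose.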