The First Step in Information Management
www.firstsanfranciscopartners.com
Produced by:
MONTHLY SERIES
Brought to you in partnership with:
April 6, 2017
Building a Flexible and Scalable Analytics Architecture
Polling Question
§ Where is your organization in its readiness to develop a formal Big Data and analytics architecture?
− We have no plans or architecture for analytics, but want to have one.
− We have a strategy and are planning an architecture.
− We have started to implement a planned architecture for analytics.
− None of the above.
pg 2© 2017 First San Francisco Partners www.firstsanfranciscopartners.com
Typical Problems
§ Analytics teams are set up outside of any formal set of guardrails
− They do good work, for a while
− Then they start to ask “Why isn’t this more organized?”
§ A CIO decides that a company needs to do better with data, and acquires $15 million in Big Data technology and sets up a data lab
− A few sponsors start to use the lab, but costs of operation seem to exceed the benefits from their efforts
− Someone asks “Why did we do this?”
§ Both were missing a clear plan manifested in a formal architecture
Topics For Today’s Webinar
§ What is a big data and analytics architecture?
§ When should big data and analytics architectures be employed?
§ An architecture for big data systems: key components
§ Best practices
§ Q&A
A Definition of Architecture
§ The art and discipline of designing buildings and structures, from the macro-level of urban planning to the micro-level of creating furniture and machine parts.
§ The design of any complex object or system. It may refer to the implied architecture of abstract things such as music or mathematics, the apparent architecture of natural things such as geological formations or living things, or explicitly planned architecture of human-made things such as buildings, machines, organizations, processes, software and databases.
§ The organized arrangement of component elements to optimize the function, performance, feasibility, cost and/or aesthetics of an overall structure.
From The DAMA Guide to the Data Management Body of Knowledge
Definition of Architectures for Big Data and Analytics
§ Therefore, the Big Data and Analytics architecture is an arrangement of elements that are used to manage and leverage enormous amounts of data to perform analytics.
Considerations for Architectures for Big Data and Analytics
§ Avoid a Winchester house
− Complicated with many permutations and variables
− It is additive
− Making a mistake can get expensive if you bolt on an incompatible set of elements
§ Ensure you need Big Data for Analytics
§ Consider characteristics that optimize the function, performance, feasibility and cost
The Winchester house:
Rooms: 160; Doorways: 467; Doors: 950; Fireplaces: 47 (gas, wood, coal); Bedrooms: 40
Constructed 1884 – 1922 (38 continuous years); Cost: $5.5M
Blueprints: Never made; individual rooms sketched out by Sarah Winchester on paper or other media (e.g., tablecloths)
All design – no architecture
Elements of a Big Data and Analytics Architecture
[Diagram: grid of architecture elements. Columns: Organization Elements, Functional Elements, Technology Elements. Rows: Data Consumption, Data Supply Chain/Logistics, Data Management.]
Elements shown: Pedigree and Preparation; Landing/Staging; Model/Metrics Management; Data Reduction; Glossary Management; Machine Learning/AI; Data Governance; Data Operations; Data Ingestion; Reference and Master Data; Competency Centers; Self-Service/Data Citizens; ETL/Virtualization; Distributed Processing; Metadata; Data Quality/Hygiene; Lake, Pond, Warehouse; HDFS, Columnar and Graph; Data Streaming; Data Glossary; Data Lake Management; Taxonomy/Ontology; Web Services; Policy and Process; Data Analysts and Scientists; Collaboration, Decision-Making; Access – Publish, Subscribe, Notify; Access tools – BI, Analytics, Applications; Analytics – Descriptive, Predictive, Prescriptive; Business/Tech. Planning; Security, Privacy; Business Continuity
Two Lenses to Derive an Effective Architecture
§ Form – Developing the architecture so all stakeholders can actually understand and develop it
§ Progression – Develop architectures that are best fit for purpose and effective, no matter how simple or complex
Forms of Architectures for Big Data and Analytics
§ Architecture forms:
1. Abstract – Enable and convey insight so it can be considered and adopted
2. Apparent – Obvious structure so it can be used to manage data as well as interface with people and processes
3. Explicitly Planned – Must be comprehensive, not just a technology stack and a bunch of abstract arrows, so you can manage and sustain the environment
§ Your Big Data and Analytics architecture needs to consider all three forms.
Progression of Architecture
[Diagram: stages of architecture progression. Labels shown: Isolated; Scattered Analytics; Value has limited audience; Recognized; Audience expands; Data-Driven; Insight-Driven; Embedded.]
Embedded
• Support tactical operations
• Monetization of data
Your architecture will not be static. Presenting only the ultimate future state is not practical.
Factors That Trigger the Need for Formal Architecture
These will affect architecture and progression:
§ Veracity (the 4th V) – Scattered, isolated, reactive
§ Variety – Consolidating content, insight from more than just rows and columns
§ Volume – Consolidated content, tactical uses, monetization
§ Velocity – Business velocity, not just data velocity
§ “Net New” – Generation skipping, balance alongside meeting traditional needs
First Progression – Isolated, Scattered Analytics (Abstract)
[Diagram: the element grid (columns: Organization elements, Functional elements, Technology elements; rows: Data Consumption, Data Supply Chain/Logistics, Data Management) showing only the elements in place at this stage.]
Elements shown: Landing/Staging; Data Operations; ETL; Data Analysts; Access – Publish, Subscribe, Notify; Access tools – BI, Analytics; Analytics – Descriptive, Predictive, Prescriptive; HDFS, Columnar and Graph
Isolated, Scattered Analytics
[Diagram: example data flow. Labels shown: Legacy; Usage; EDW; Predictive Analytics; Claims; Customer; Client Data; Hadoop; BI and Reporting; ETL; Ingest; Data Scientist; Spark; Analyst.]
Second Progression – Recognized Value (Abstract)
[Diagram: the element grid (columns: Organization elements, Functional elements, Technology elements; rows: Data Consumption, Data Supply Chain/Logistics, Data Management) showing only the elements in place at this stage.]
Elements shown: Landing/Staging; Glossary Management; Data Governance; Data Operations; Data Ingestion; Reference & Master Data; Competency Centers; ETL/Virtualization; Data Quality/Hygiene; Lake, Pond, Warehouse; HDFS, Columnar & Graph; Data Glossary; Data Lake Management; Policy and Process; Data Analysts and Scientists; Access – Publish, Subscribe, Notify; Access tools – BI, Analytics; Analytics – Descriptive, Predictive, Prescriptive; Security, Privacy
Example – B2B Insurance Company and Data Monetization
§ Insight driven, monetizing data as separate line of business
− Data lake, Hadoop, dedicated, isolated data science area; isolated monetization area
[Diagram: hybrid data architecture. Labels shown: Legacy; Usage; Hybrid Data Architecture; New Data Products; EDW; Predictive Analytics; Claims; Customer; Client Data; Hadoop Lake; Ingest, pedigree; BI and Reporting; Governance, Data Management; ETL; Spark.]
Third Progression – Data Driven (Abstract)
[Diagram: the element grid (columns: Organization elements, Functional elements, Technology elements; rows: Data Consumption, Data Supply Chain/Logistics, Data Management) showing the elements in place at this stage.]
Elements shown: Pedigree and Preparation; Landing/Staging; Model/Metrics Management; Data Reduction; Glossary Management; Machine Learning/AI; Data Governance; Data Operations; Data Ingestion; Reference & Master Data; Competency Centers; Self-Service/Data Citizens; ETL/Virtualization; Distributed Processing; Metadata; Data Quality/Hygiene; Lake, Pond, Warehouse; HDFS, Columnar & Graph; Data Streaming; Data Glossary; Data Lake Management; Taxonomy/Ontology; Web Services; Policy and Process; Data Analysts and Scientists; Collaboration, Decision-Making; Access – Publish, Subscribe, Notify; Access tools – BI, Analytics, Applications; Analytics – Descriptive, Predictive, Prescriptive; Business/Tech. Planning; Security, Privacy; Business Continuity
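One way to make the three progressions comparable is to treat each slide's element list as a set and compute what each stage adds or renames. The sketch below does that in Python. The element names are abbreviated from the slides; treating "Applications" as its own element, and the helper names `added_between` and `retired_between`, are illustrative assumptions rather than anything defined in the presentation.

```python
# Elements per progression, abbreviated from the three progression slides.
# The set-based representation is an assumption made for illustration.
FIRST = {
    "Landing/Staging", "Data Operations", "ETL", "Data Analysts",
    "Access", "Access tools", "Analytics", "HDFS, Columnar and Graph",
}

# The second slide relabels "ETL" and "Data Analysts" and adds governance items.
SECOND = (FIRST - {"ETL", "Data Analysts"}) | {
    "ETL/Virtualization", "Data Analysts and Scientists",
    "Glossary Management", "Data Governance", "Data Ingestion",
    "Reference and Master Data", "Competency Centers",
    "Data Quality/Hygiene", "Lake, Pond, Warehouse", "Data Glossary",
    "Data Lake Management", "Policy and Process", "Security, Privacy",
}

# The third slide keeps everything from the second and adds the rest.
THIRD = SECOND | {
    "Pedigree and Preparation", "Model/Metrics Management", "Data Reduction",
    "Machine Learning/AI", "Self-Service/Data Citizens",
    "Distributed Processing", "Metadata", "Data Streaming",
    "Taxonomy/Ontology", "Web Services", "Collaboration, Decision-Making",
    "Applications", "Business/Tech. Planning", "Business Continuity",
}


def added_between(earlier, later):
    """Elements present in the later stage but not in the earlier one."""
    return sorted(later - earlier)


def retired_between(earlier, later):
    """Elements present in the earlier stage that the later stage drops or renames."""
    return sorted(earlier - later)


if __name__ == "__main__":
    stages = [("First -> Second", FIRST, SECOND), ("Second -> Third", SECOND, THIRD)]
    for label, earlier, later in stages:
        print(label)
        print("  added:  ", ", ".join(added_between(earlier, later)))
        print("  retired:", ", ".join(retired_between(earlier, later)) or "(none)")
```

Listing the deltas this way gives a roadmap view: each stage is described by what it adds to the previous one, not only by the ultimate future state.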
Example – “Net New;” Generation-Skipping, Prescriptive
[Diagram: logical data warehouse architecture. Labels shown: New Applications; Usage; Logical Data Warehouse; Exploration & Discovery; EDW; Predictive Analytics; Applications; Streaming IoT; Social; Digital Content; eMail, Docs; Hadoop Data Lake; Ingest, pedigree; BI and Reporting; Governance, Data Management; Spark; Data Products; Citizen Data Scientist; Storm; Pre-processing, validation; Hadoop connector.]
Example – Data and Analytics Technology Stack – Apparent
[Diagram: technology stack from Sources to Insights. Labels shown: Sources; Data Intake; Data Preparation: Permissions, Dictionary, Indexing, Pedigree; Data Landing Zone (Data Lake); Data Transformation, Reduction; Analytical Data Assets; Analytical Computing Infrastructure; BI/Reporting Assets; Model Server, Data Access; BI Tool; Analytics Tool; Data Portal; Insights; BI and Reports; Analytics Results; Monetized Data; Results.]
Reference Data Architecture with Services
[Diagram: layered reference data architecture with services.]
Data Stores
− Content: External, Internal
− Transactional Application Data Layer: Master Data, Event Data
− Integrated Data Layer: Integrated Master Data, Integrated Event Data
− Data Warehouse Layer: Conformed Dimensions, Atomic Facts, Derived Facts, History
− Data Mart Layer: Operational (Conformed Master Data, Integrated Event Data); Analytic (Conformed Master Data, Derived Facts + History)
− Analytics Data Layer: Cubes (Multi-Dimensional Analytics); Advanced Analytics (Statistical Analysis, Data Mining, etc.)
− Archived Data Layer: Archive
− Metadata Layer: Ontologies/Dictionaries, Business Rules, Operational Metadata, Technical Master Data
Data Management Services
− Data Integration Services, Data Movement Services, Data Quality Services, Data Access Services
Enterprise Services
− Environment Management Services, Security Services
FSFP Reference Architecture – Abstract Type
§ Like an I-beam, the data architecture needs to take the load of meeting business objectives, and distribute that load to supportive structures
[Diagram: DATA INSIGHT ARCHITECTURE. Labels shown: Business Strategy; Data Access Layer; Management Layer; Wrangling Layer.]
FSFP Reference Architecture – Abstract
[Diagram: DATA INSIGHT ARCHITECTURE.]
§ Business Strategy
§ Data Access Layer – BI/Reporting, Analytics, Mobile
§ Management Layer – Metadata, Lineage, Work Flow, Models, Reference Data, Rules, Canonical Data
§ Data Movement/Logistics – Cross-Generation Abstraction Processes and Mapping; Data Virtual’n, Services
§ Vintage Area – Legacy applications and data structures, traditional methods
− Mission: To Serve and Protect
§ Contemporary Area – New apps and data structures, Agile methods
− Mission: Flexible, Responsive
FSFP Reference Architecture – Apparent
[Diagram: DATA INSIGHT ARCHITECTURE. Sections shown: Business Strategy; Data Usage; Management; Data Life Cycles; Vintage Area; Contemporary Area.]
Labels shown: Alignment; Legacy BI and Reporting; Advanced Analytics; Data Monetization; Visualization; Data Wrangling; Mobile; Logical DW; Metadata; Lineage; Reference Data; Data Warehouse, ODS, Mart; ETL, EAI, Replication; RDBMS, SQL, In-Memory Appliance; Data Lake, Pond; NoSQL (HDFS, Graph); Unstructured Data
FSFP Reference Architecture – Explicit
[Diagram: DATA INSIGHT ARCHITECTURE. Sections shown: Business Strategy; Data Access Layer (BI/Reporting, Analytics, Mobile); Management Layer (Metadata, Lineage, Work Flow, Models, Reference Data, Rules, Canonical Data); Data Movement/Logistics (Cross-Generation Abstraction Processes and Mapping); Vintage Area; Contemporary Area.]
Labels shown: Vintage Views; Vintage Apps; Agile Apps; Future Apps; $ Monetization; Web Services; Distributed Processing; Data Virtual’n; DBMS; RDBMS; EDW; DM; ETL; NoSQL; Lake; IoT; Preprocess; Ingestion, pedigree; Ext’l Data; Unstr’d Data
FSFP Reference Architecture – Data Access Focus
[Diagram: DATA INSIGHT ARCHITECTURE. Sections shown: Business Strategy; Data Access Layer; Management Layer; Data Supply Life Cycles and Supply Chains – Movement/Logistics; Vintage Area; Contemporary Area.]
Data access labels shown: Portals; Report, BI, Query; Workbenches; Labs; Mobile; Web Services, Data Virtualization
Best Practices
§ Apply different lenses
− Consider Forms
− Consider Progressions
§ Reconcile old to new
− Understand business needs
− Reconcile current-state technology and future-state technology
§ Apply the “I-beam”
− Address Vintage and Contemporary systems
§ Have a plan
− Establish priorities
− Identify where you start
− Identify who is affected
Have a Methodology
§ Establish (but with a defined architecture) a Sandbox, PoC
§ Define the Vision of value and return
§ Perform Alignment
§ Assess the V’s, culture and organization readiness
§ Define long-term requirements for use
§ Define operating models
§ Design the Analytics Architecture
§ Develop a realistic roadmap
§ Transition to a sustainable architecture
[Diagram: methodology cycle. Phases shown: Discovery; Strategy; Action. Steps shown: Initiation; Vision and Alignment; Assessment; Requirements; Architecture and Design; Operating Model; Roadmap; Implementation and Operation; Measurement and Sustaining.]
Questions?
Thank you! See you Thursday, May 4 for our next DIA webinar, The Role of a Data Scientist (Interview with a CDS)
John Ladley @[email protected]
Kelle O’Neal @[email protected]
Need Definitions, Explanations – Not Just Picture
Layer characteristics for the Transactional Application, Data Warehouse and Data Mart layers:
Purpose
− Transactional Application: Data produced via the automation of business processes
− Data Warehouse: View of data across the enterprise. Supports dissemination, derivation of knowledge and history
− Data Mart: Data structured and filtered to support specific information needs of small groups of users.
Data Life Cycle
− Transactional Application: All base (non-derived) data originates here
− Data Warehouse: Derivations (including aggregations) produced here, and history is inferred
− Data Mart: Data from Warehouse is transformed to support specific reporting
Data Operations
− Transactional Application: Create/Source/Read/Update/Delete/Archive
− Data Warehouse: Extract/Transform/Load/Derive/Publish/Archive
− Data Mart: Subscribe/Transform/Archive
Data Model
− Transactional Application: Normalized to 3NF
− Data Warehouse: Subject Oriented/Snowflaked/Conformed Dimensions
− Data Mart: Information Requirement Oriented/Snowflaked/Conformed Dimensions
• Much more is needed than the above
• Definitions are a technical reference; explanations help stakeholders to understand the reference architecture
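Layer definitions like these can also be kept in a machine-readable form next to the diagram, so the same reference can be queried as well as read. The following is a minimal Python sketch under that assumption. The wording of each field is taken from the layer characteristics slide, while the `LayerDefinition` class, the dictionary keys and the `allows` helper are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class LayerDefinition:
    """One column of the layer characteristics slide (hypothetical structure)."""
    name: str
    purpose: str
    life_cycle: str
    operations: Tuple[str, ...]
    data_model: str


# Field values are copied from the slide; the keys are illustrative.
LAYERS = {
    "transactional_application": LayerDefinition(
        name="Transactional Application",
        purpose="Data produced via the automation of business processes",
        life_cycle="All base (non-derived) data originates here",
        operations=("Create", "Source", "Read", "Update", "Delete", "Archive"),
        data_model="Normalized to 3NF",
    ),
    "data_warehouse": LayerDefinition(
        name="Data Warehouse",
        purpose="View of data across the enterprise. Supports dissemination, "
                "derivation of knowledge and history",
        life_cycle="Derivations (including aggregations) produced here, "
                   "and history is inferred",
        operations=("Extract", "Transform", "Load", "Derive", "Publish", "Archive"),
        data_model="Subject Oriented/Snowflaked/Conformed Dimensions",
    ),
    "data_mart": LayerDefinition(
        name="Data Mart",
        purpose="Data structured and filtered to support specific information "
                "needs of small groups of users",
        life_cycle="Data from Warehouse is transformed to support specific reporting",
        operations=("Subscribe", "Transform", "Archive"),
        data_model="Information Requirement Oriented/Snowflaked/Conformed Dimensions",
    ),
}


def allows(layer_key: str, operation: str) -> bool:
    """Return True if the slide lists the operation for the given layer."""
    listed = {op.lower() for op in LAYERS[layer_key].operations}
    return operation.lower() in listed


if __name__ == "__main__":
    for key, layer in LAYERS.items():
        print(f"{layer.name}: {'/'.join(layer.operations)}")
    print(allows("data_mart", "Load"))
```

A structure like this covers only the definitions; the explanations that help stakeholders understand the reference architecture still need to be written as prose.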