Unlock the Value in your Big Data Reservoir using Oracle Big Data Discovery and Rittman Mead
Mark Rittman, CTO, Rittman Mead
March 2016
[email protected] www.rittmanmead.com @rittmanmead
About the Speaker
- Mark Rittman, Co-Founder of Rittman Mead
- Oracle ACE Director, specialising in Oracle BI & DW
- 14 years' experience with Oracle technology
- Regular columnist for Oracle Magazine
- Author of two Oracle Press Oracle BI books:
  - Oracle Business Intelligence Developers Guide
  - Oracle Exalytics Revealed
- Writer for the Rittman Mead Blog : http://www.rittmanmead.com/blog
- Email : [email protected]
- Twitter : @markrittman
15+ Years in Oracle BI and Data Warehousing
- Started back in 1997 on a bank Oracle DW project
- Our tools were Oracle 7.3.4, SQL*Plus, PL/SQL and shell scripts
- Went on to use Oracle Developer/2000 and Designer/2000
- Our initial users queried the DW using SQL*Plus
- And later on, we rolled out Discoverer/2000 to everyone else
- And life was fun
The Oracle-Centric DW Architecture
- Over time, this data warehouse architecture developed
- Added Oracle Warehouse Builder to automate and model the DW build
- Oracle 9i Application Server (yay!) to deliver reports and web portals
- Data Mining and OLAP in the database
- Oracle 9i for in-database ETL (and RAC)
- Data was typically loaded from Oracle RDBMS and EBS
- It was Oracle, not turtles, all the way down
Many Organisations are Running Big Data Initiatives
- Many customers and organisations are now running initiatives around big data
- Some are IT-led and are looking for cost savings around data warehouse storage + ETL
- Others are skunkworks projects in the marketing department that are now scaling up
- Projects now emerging from pilot exercises
- And design patterns starting to emerge
Common Big Data Design Pattern : Data Reservoir
- Typical implementation of Hadoop and big data in an analytic context is the data lake
- Additional data storage platform with cheap storage, flexible schema support + compute
- Data lands in the data lake or reservoir in raw form, then minimally processed
- Data then accessed directly by data scientists, or processed further into DW
So What is a Data Reservoir?
What Does it Do?
And Does it Replace My Data Warehouse?
An Interesting Question.
T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or +61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)E : [email protected] : www.rittmanmead.com
Meanwhile, back in the real world
Customer 360-Degree Insight
Data from Real-Time, Social & Internet Sources is Strange
[Diagram labels: Single Customer View; Enriched Customer Profile; Correlating; Modeling; Machine Learning; Scoring]
- Typically comes in non-tabular form
- JSON, log files, key/value pairs
- Users often want it speculatively
- Haven't thought through final purpose
- Schema can change over time
- Or maybe there isn't even one
- But the end-users want it now
- Not when your ETL team are next free
Introducing Hadoop - Cheap, Flexible Storage + Compute
- Hadoop & NoSQL better suited to exploratory analysis of newly-arrived, data reservoir-type data
- Flexible schema - applied by user rather than ETL
- Cheap expandable storage for detail-level data
- Better native support for machine-learning and data discovery tools and processes
- Potentially a great fit for our new and emerging customer 360 datasets, and a great platform for analysis
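The idea of a schema "applied by user rather than ETL" can be illustrated with a minimal Python sketch. The records and field names below (user, url, lang) are hypothetical and not taken from the datasets in this deck; the point is only that raw JSON lines are stored as they arrive, and each reader projects the attributes it cares about at read time.

```python
import json

# Hypothetical raw events as they might land in the reservoir: one JSON
# document per line, with fields that vary from record to record.
RAW_LINES = [
    '{"user": "alice", "url": "/blog/post-1"}',
    '{"user": "bob", "url": "/blog/post-2", "lang": "en"}',
    '{"user": "carol"}',
]


def read_with_schema(lines, fields):
    """Apply a schema at read time: project each raw record onto `fields`,
    filling attributes that a record does not carry with None."""
    for line in lines:
        record = json.loads(line)
        yield {name: record.get(name) for name in fields}


# Two readers can apply different schemas to the same raw data.
page_views = list(read_with_schema(RAW_LINES, ["user", "url"]))
languages = list(read_with_schema(RAW_LINES, ["user", "lang"]))
```

Nothing is rejected or reshaped on load; a record with missing attributes simply yields None for them when read.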
Combine with DW for Big Data Management Platform
Delivering a Successful Customer 360-Degree View
- Start with pilot for area of the business that needs a single view of customers
- Then, over time, iterate and build out the Customer 360-degree view
[Roadmap diagram labels:]
- Start with a business area that needs a single customer view
- Obtain clear understanding of customer online & offline behaviour
- Build out Predictive Models and Decision Engines to deliver value now
- Build out Hadoop Data Reservoir, Feeds and link to DW + CRM
- Iterate and build out, add new integrations, incrementally building capability
- Develop and Implement Strategy, Deliver Business Value
- Build DevOps Capability
- Pilot & Quick Win
- Create Full Production Infrastructure
- Pilot (Virtualised / Commodity) Hadoop Infrastructure
But These Data Sources are Strange
[Diagram labels: Single Customer View; Enriched Customer Profile; Correlating; Modeling; Machine Learning; Scoring]
- Typically comes in non-tabular form
- JSON, log files, key/value pairs
- Users often want it speculatively
- Haven't thought through final purpose
- Schema can change over time
- Or maybe there isn't even one
- But the end-users want it now
- Not when your ETL team are next free
Introducing the Data Lab for Raw/Unstructured Data
But Working with Unstructured Textual Data Is Hard
- Data loaded into the reservoir needs preparation and curation before presenting to users
- Specialist skills typically needed to ingest and understand data - and those staff are scarce
- How do we staff and scale projects as our use of big data matures?
Hold on
Haven't we heard this story before?
What Was Oracle Endeca Information Discovery?
- Part of the acquisition of Endeca back in 2012 by Oracle Corporation
- Based on search technology and concept of "faceted search"
- Data stored in flexible NoSQL-style in-memory database called "Endeca Server"
- Added aggregation, text analytics and text enrichment features for "data discovery"
- Explore data in raw form, loose connections, navigate via search rather than hierarchies
- Useful to find out what is relevant and valuable in a dataset before formal modeling
Endeca Server Technology Combined Search + Analytics
- Proprietary database engine focused on search and analytics
- Data organized as records, made up of attributes stored as key/value pairs
- No over-arching schema, no tables, self-describing attributes
- Endeca Server hallmarks:
  - Minimal upfront design
  - Support for "jagged" data
  - Administered via web service calls
  - "No data left behind"
  - "Load and Go"
- But limited in scale (>1m records) - what if it could be rebuilt on Hadoop?
2012
2013
2014
2014
2014
2015
2015
and 2015
2016
Oracle Big Data Discovery
- A visual front-end to the Hadoop data reservoir, providing end-user access to datasets
- Catalog, profile, analyse and combine schema-on-read datasets across the Hadoop cluster
- Visualize and search datasets to gain insights, potentially load in summary form into DW
What Does Big Data Discovery Do?
- Provide a visual catalog and search function across data in the data reservoir
- Profile and understand data, relationships, data quality issues
- Apply simple changes, enrichment to incoming data
- Visualize datasets including combinations (joins)
Delivering a Successful Customer 360-Degree View
- Build out Predictive Models and Decision Engines to deliver value now
- Build out Hadoop Data Reservoir, Feeds and link to DW + CRM
- Build DevOps Capability
Example Scenario : Social Media Analysis
- Rittman Mead want to understand drivers and audience for their website
- What is our most popular content? Who are the most in-demand blog authors?
- Who are the influencers? What do they read?
- Three data sources in scope:
  - RM Website Logs
  - Twitter Stream
  - Website Posts, Comments etc
Ingesting & Sampling Datasets for the DGraph Engine
- Datasets in Hive have to be ingested into DGraph engine before analysis, transformation
- Can either define an automatic Hive table detector process, or manually upload
- Typically ingests 1m row random sample
- 1m row sample provides > 99% confidence that answer is within 2% of value shown, no matter how big the full dataset (1m, 1b, 1q+)
- Makes interactivity cheap - representative dataset
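As a rough sanity check on the sampling claim, the sketch below applies the textbook normal-approximation margin of error for a proportion estimated from a simple random sample. This is an assumption for illustration, not a description of how Big Data Discovery derives its own figure; the worst-case proportion of 0.5 and the function name are likewise illustrative.

```python
import math
from statistics import NormalDist


def margin_of_error(sample_size, confidence=0.99, proportion=0.5):
    """Half-width of the normal-approximation confidence interval for a
    proportion estimated from a simple random sample."""
    # Two-sided critical value, e.g. about 2.576 for 99% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)


margin = margin_of_error(1_000_000)
print(f"99% margin of error for a 1m-row sample: {margin:.4%}")
```

Under these assumptions the margin comes out at roughly 0.13 percentage points, comfortably inside 2%. The formula has no term for the size of the full dataset, which is consistent with the bound holding however large the source table is.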
View Ingested Datasets, Create New Project
- Ingested datasets are now visible in Big Data Discovery Studio
- Create new project from first dataset, then add second
Automatic Enrichment of Ingested Datasets
- Ingestion process has automatically geo-coded host IP addresses
- Other automatic enrichments run after initial discovery step, based on datatypes, content
Initial Data Exploration On Uploaded Dataset Attributes
- For the ACCESS_PER_POST_CAT_AUTHORS dataset, 18 attributes now available
- Combination of original attributes, and derived attributes added by enrichment process
Data Transformation & Enrichment
- Data ingest process automatically applies some enrichments - geocoding etc
- Can apply others from Transformation page - simple transformations & Groovy expressions
Transformations using Text Enrichment / Parsing
- Uses Salience text engine under the covers
- Extract terms, sentiment, noun groups, positive / negative words etc
Create New Attribute using Derived (Transformed) Values
- Choose option to Create New Attribute, to add derived attribute to dataset
- Preview changes, then save to transformation script
Upload Additional Datasets
- Users can upload their own datasets into BDD, from MS Excel or CSV file
- Uploaded data is first loaded into Hive table, then sampled/ingested as normal
Join Datasets On Common Attributes
- Used to create a dataset based on the intersection (typically) of two datasets
- Not required to just view two or more datasets together - think of this as a JOIN and SELECT
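The "JOIN and SELECT" analogy can be made concrete with a small Python sketch of an inner join on a common attribute. The rows and attribute names (post_id, views, author) are hypothetical, and this is plain Python rather than anything Big Data Discovery runs internally.

```python
def inner_join(left, right, key):
    """Return one merged row for every pair of left/right rows that share
    the same value of `key` (the intersection of the two datasets)."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    return [
        {**left_row, **right_row}
        for left_row in left
        for right_row in index.get(left_row[key], [])
    ]


# Hypothetical rows; attribute names are illustrative only.
page_views = [
    {"post_id": 1, "views": 120},
    {"post_id": 2, "views": 45},
    {"post_id": 3, "views": 7},
]
posts = [
    {"post_id": 1, "author": "author_a"},
    {"post_id": 2, "author": "author_b"},
]

joined = inner_join(page_views, posts, "post_id")
```

Post 3 has page views but no matching post record, so it drops out of the joined dataset, which is the "intersection" behaviour described above.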
Create Discovery Pages for Dataset Analysis
- Select from palette of visualisation components
- Select measures, attributes for display
Visualize and Interact With Hadoop Datasets
Faceted Search Across Entire Data Reservoir
- BDD Studio dashboards support faceted search across all attributes, refinements
- Auto-filter dashboard contents on selected attribute values - for data discovery
- Fast analysis and summarisation through Endeca Server technology
[Screenshot callouts: (3) Further refinement on OBIEE in post keywords; (4) Results now filtered on two refinements]
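Faceted search with successive refinements can be sketched in a few lines of Python. The records, attribute names and values below are hypothetical, and the functions are an illustration of the general technique rather than the DGraph or Endeca Server implementation.

```python
from collections import Counter

# Hypothetical records; attribute names and values are illustrative only.
RECORDS = [
    {"author": "author_a", "keyword": "OBIEE", "country": "UK"},
    {"author": "author_a", "keyword": "Hadoop", "country": "US"},
    {"author": "author_b", "keyword": "OBIEE", "country": "US"},
    {"author": "author_b", "keyword": "OBIEE", "country": "UK"},
]


def refine(records, refinements):
    """Keep only the records that match every selected attribute value."""
    return [
        record for record in records
        if all(record.get(attr) == value for attr, value in refinements.items())
    ]


def facet_counts(records, attribute):
    """Count the remaining records per value of `attribute` (one facet)."""
    return Counter(r[attribute] for r in records if attribute in r)


one = refine(RECORDS, {"keyword": "OBIEE"})
two = refine(RECORDS, {"keyword": "OBIEE", "country": "UK"})
authors_after_two = facet_counts(two, "author")
```

Each added refinement narrows the record set, and every facet is recounted over what remains, which is what lets a dashboard auto-filter all of its components on the selected values.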
Comparing BDD to Oracle Visual Analyzer
- Visual Analyzer also provides a form of "data discovery" for BI users
- Similar to Tableau, Qlikview etc
- Inspired by BI elements of OEID
- Uses OBIEE RPD as the primary datasource, so data needs to be curated + structured
- Probably a better option for users who aren't concerned that it's "big data"
- But can still connect to Hadoop via Hive, Impala and Oracle Big Data SQL
Managed vs. Free-Form Data Discovery
- Data in the data reservoir is typically raw, and hasn't been organised into facts, dimensions yet
- In this initial phase, you don't want it to be - too much up-front work with unknown data
- Later on though, users will benefit from structure and hierarchies being added to data
- But this takes work, and you need to understand cost/benefit of doing it now vs. later
Export Prepared Datasets Back to Hive, for OBIEE + VA
- Transformations within BDD can then be used to create curated fact + dim Hive tables
- These can then be used as a more suitable dataset for OBIEE RPD + Visual Analyzer
- Or exported into Exadata or Exalytics to combine with main DW datasets
Further Analyse in Visual Analyzer for Managed Dataset
- Users in Visual Analyzer then have a more structured dataset to use
- Data organised into dimensions, facts, hierarchies and attributes
- Can still access Hadoop directly through Impala or Big Data SQL
- Big Data Discovery though was key to initial understanding of data
Oracle BDD for Data Wrangling + Data Enrichment
- Oracle Big Data Discovery used to go back to the raw event data and add more meaning
- Enrich data, extract nouns + terms, add reference data from file, RDBMS etc
- Understand sentiment + meaning of tweets, link disparate + loosely coupled events
- Faceted search dashboards
But Who Are The Influencers In Our Community?
- Previous counts assumed that all tweet references are equally important
- But some Twitter users are far more influential than others
- Sit at the centre of a community, have 1000s of followers
- A reference by them has massive impact on page views
- Positive or negative comments from them drive perception
- Can we identify them?
- Potentially reach out with analyst program
- Study what website posts go "viral"
- Understand our audience, and the conversation, better
What Communities and Networks Are Our Audience?
- Rittman Mead website features many types of content
- Blogs on BI, data integration, big data, data warehousing
- Op-Eds ("OBIEE12c - Three Months In, What's the Verdict?")
- Articles on a theme, e.g. performance tuning
- Details of new courses, new promotions
- Different communities likely to form around these content types
- Different influencers and patterns of recommendation, discovery
- Can we identify some of the communities, segment our audience?
Graph Example : RM Blog Post Referenced on Twitter
Tweet: Lifting the Lid on OBIEE Internals with Linux Diagnostics Tools http://t.co/gFcUPOm5pI
[Diagram labels: 0000 Page Views; 1000 Page Views; Follows; 2000 Page Views; Follows; 3000 Page Views]
Network Effect Magnified by Extent of Social Graph
Tweet: Lifting the Lid on OBIEE Internals with Linux Diagnostics Tools http://t.co/gFcUPOm5pI
[Diagram labels: 3000 Page Views; 7005 Page Views]
Retweets by Influential Twitter Users Drive Visits
Tweet: Lifting the Lid on OBIEE Internals with Linux Diagnostics Tools http://t.co/gFcUPOm5pI
Retweet: RT: Lifting the Lid on OBIEE Internals with Linux Diagnostics Tools http://t.co/gFcUPOm5pI
[Diagram labels: 3000 Page Views; Retweet; 5003 Page Views]
Retweets, Mentions and Replies Create Communities
[Diagram labels: Retweet, Reply and Mention edges; hashtags #bigdatasql and #thatswhatshesaid]
Property Graph Terminology
[Diagram labels: tweet "Lifting the Lid on OBIEE Internals with Linux Diagnostics Tools http://t.co/gFcUPOm5pI" (shown twice); Mentions; Retweets]
- Node, or Vertex
- Directed Connection, or Edge
Determining Influencers - Factors to Consider
- Different types of Twitter interaction could imply more or less "influence"
- Retweet of another user's tweet implies that person is worth quoting, or that you endorse their opinion
- Reply to another user's tweet could be a weaker recognition of that person's opinion or view
- Mention of a user in a tweet is a weaker recognition that they are part of a community / debate
Relative Importance of Edge Types Added via Weights
[Diagram labels: tweet "Lifting the Lid on OBIEE Internals with Linux Diagnostics Tools http://t.co/gFcUPOm5pI" (shown twice); Edge Property: Mentions, Weight = 30; Edge Property: Retweet, Weight = 100]
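A property graph with weighted edges can be sketched as a small Python data structure. The weights for retweets and mentions are the ones shown on the slide; everything else is an assumption for illustration: the user names are hypothetical, edges are taken to point from the user who acts to the user being retweeted or mentioned, and replies are left out because the deck gives no weight for them. This is not the Oracle Big Data Spatial & Graph API.

```python
from dataclasses import dataclass, field


@dataclass
class Edge:
    source: str   # vertex the edge starts from (assumed: the user who acts)
    target: str   # vertex the edge points to (assumed: the user referenced)
    label: str    # edge type, e.g. "retweet" or "mention"
    properties: dict = field(default_factory=dict)


# Weights from the slide; no weight is given for replies, so they are omitted.
EDGE_WEIGHTS = {"retweet": 100, "mention": 30}


class PropertyGraph:
    def __init__(self):
        self.vertices = set()
        self.edges = []

    def add_edge(self, source, target, label):
        """Add a directed edge carrying its weight as an edge property."""
        self.vertices.update((source, target))
        edge = Edge(source, target, label, {"weight": EDGE_WEIGHTS[label]})
        self.edges.append(edge)
        return edge

    def weighted_in_degree(self, vertex):
        """Sum of the weights of all edges pointing at `vertex`."""
        return sum(e.properties["weight"] for e in self.edges if e.target == vertex)


graph = PropertyGraph()
graph.add_edge("user_b", "user_a", "retweet")
graph.add_edge("user_c", "user_a", "mention")
graph.add_edge("user_a", "user_c", "mention")
```

Storing the weight as a property on each edge, rather than hard-coding it into the analysis, means the relative importance of edge types can be changed without rebuilding the graph.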
Oracle Big Data Spatial & Graph
- Graph, spatial and raster data processing for big data
- Runs on-prem, or in Oracle Big Data Cloud Service
- Installable on commodity cluster using CDH
- Data stored in Apache HBase or Oracle NoSQL DB
- Complements Spatial & Graph in Oracle Database
- Designed for trillions of nodes, edges etc
- Out-of-the-box spatial enrichment services
- Over 35 of most popular graph analysis functions
- Graph traversal, recommendations
- Finding communities and influencers, pattern matching
Calculating Top 10 Users using PageRank Algorithm
Top 10 influencers: markrittman, rmoff, rittmanmead, mRainey, JeromeFr, Nephentur, borkur, BIExperte, i_m_dave, dw_pete
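For readers unfamiliar with PageRank, the sketch below is a textbook weighted variant computed by power iteration in plain Python. It is not the Oracle Big Data Spatial & Graph implementation, and it does not reproduce the ranking above: the users and interactions are hypothetical, the damping factor of 0.85 is the conventional default, and using edge weights to split a user's rank among its out-edges is an assumption.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Weighted PageRank by power iteration.

    `edges` is a list of (source, target, weight) tuples. Each vertex shares
    its rank among its out-edges in proportion to weight; vertices with no
    out-edges spread their rank evenly over all vertices.
    """
    vertices = sorted({v for s, t, _ in edges for v in (s, t)})
    out_weight = {v: 0.0 for v in vertices}
    for source, _, weight in edges:
        out_weight[source] += weight

    rank = {v: 1.0 / len(vertices) for v in vertices}
    for _ in range(iterations):
        dangling = sum(rank[v] for v in vertices if out_weight[v] == 0)
        base = (1 - damping) / len(vertices) + damping * dangling / len(vertices)
        new_rank = {v: base for v in vertices}
        for source, target, weight in edges:
            new_rank[target] += damping * rank[source] * weight / out_weight[source]
        rank = new_rank
    return rank


# Hypothetical interactions: (user who acted, user referenced, edge weight).
EDGES = [
    ("user_b", "user_a", 100),  # retweet
    ("user_c", "user_a", 100),  # retweet
    ("user_d", "user_a", 30),   # mention
    ("user_a", "user_c", 30),   # mention
]

ranks = pagerank(EDGES)
top = sorted(ranks, key=ranks.get, reverse=True)
```

In this toy graph user_a is referenced by three other users and ends up first in `top`; user_c ranks second because its only inbound reference comes from the highly ranked user_a, which is the "influence flows from influencers" effect PageRank is designed to capture.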
Visualising the Social Graph Around Particular Users
Calculating Shortest Path Between Users
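The slide itself shows only a visualisation. As a hedged illustration of the underlying idea, the sketch below finds a fewest-hops path between two users with breadth-first search; the adjacency data and user names are hypothetical, and the algorithm actually used by the graph engine is not stated in the deck.

```python
from collections import deque


def shortest_path(adjacency, start, goal):
    """Breadth-first search; returns the vertices on a shortest (fewest-hops)
    path from `start` to `goal`, or None if `goal` is unreachable."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == goal:
            # Walk back through the predecessors to rebuild the path.
            path = []
            while current is not None:
                path.append(current)
                current = previous[current]
            return path[::-1]
        for neighbour in adjacency.get(current, ()):
            if neighbour not in previous:
                previous[neighbour] = current
                queue.append(neighbour)
    return None


# Hypothetical directed connections between users.
CONNECTIONS = {
    "user_a": ["user_b", "user_c"],
    "user_b": ["user_d"],
    "user_c": ["user_d"],
    "user_d": ["user_e"],
}

path = shortest_path(CONNECTIONS, "user_a", "user_e")
```

Because the edges are directed, a path from user_a to user_e does not imply one in the opposite direction.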
Edge Bundling to Better Illustrate Connection Frequency
Determining Communities via Twitter Interactions
Determining Communities via Twitter Interactions
- Clusters based on actual interaction patterns, not hashtags
- Detects real communities, not ones that exist just in theory
Big Data Rapid Start from Rittman Mead
Extend your organisation's reach into your data with Oracle Big Data Discovery, Cloudera Hadoop and the Rittman Mead Big Data Rapid Start.
The Big Data Rapid Start is a fixed-price, two-week engagement delivered by Rittman Mead's team of Oracle, Big Data and Data Discovery consultants, designed to quickly provide everything required to begin discovering the hidden value of your data.
Move forward with confidence in the technology, process and application of Big Data Discovery with the support of the world's leaders.
Additional Resources
- Articles on the Rittman Mead Blog
  - http://www.rittmanmead.com/category/oracle-big-data-appliance/
  - http://www.rittmanmead.com/category/big-data/
  - http://www.rittmanmead.com/category/oracle-big-data-discovery/
- Rittman Mead offer consulting, training and managed services for Oracle Big Data
- Oracle & Cloudera partners
- http://www.rittmanmead.com/bigdata