Unlock the Value in your Big Data Reservoir using Oracle Big Data Discovery and Rittman Mead
Mark Rittman, CTO, Rittman Mead
March 2016
[email protected] www.rittmanmead.com @rittmanmead


About the Speaker
- Mark Rittman, Co-Founder of Rittman Mead
- Oracle ACE Director, specialising in Oracle BI & DW
- 14 years' experience with Oracle technology
- Regular columnist for Oracle Magazine
- Author of two Oracle Press Oracle BI books: Oracle Business Intelligence Developers Guide and Oracle Exalytics Revealed
- Writer for the Rittman Mead Blog: http://www.rittmanmead.com/blog
- Email: [email protected]
- Twitter: @markrittman

15+ Years in Oracle BI and Data Warehousing
- Started back in 1997 on a bank Oracle DW project
- Our tools were Oracle 7.3.4, SQL*Plus, PL/SQL and shell scripts
- Went on to use Oracle Developer/2000 and Designer/2000
- Our initial users queried the DW using SQL*Plus
- And later on, we rolled out Discoverer/2000 to everyone else
- And life was fun

The Oracle-Centric DW Architecture
- Over time, this data warehouse architecture developed
- Added Oracle Warehouse Builder to automate and model the DW build
- Oracle 9i Application Server (yay!) to deliver reports and web portals
- Data Mining and OLAP in the database
- Oracle 9i for in-database ETL (and RAC)
- Data was typically loaded from Oracle RDBMS and EBS
- It was Oracle (not turtles) all the way down

Many Organisations are Running Big Data Initiatives
- Many customers and organisations are now running initiatives around big data
- Some are IT-led and are looking for cost savings around data warehouse storage + ETL
- Others are skunkworks projects in the marketing department that are now scaling up
- Projects are now emerging from pilot exercises
- And design patterns are starting to emerge

Common Big Data Design Pattern : Data Reservoir
- The typical implementation of Hadoop and big data in an analytic context is the data lake
- An additional data storage platform with cheap storage, flexible schema support + compute
- Data lands in the data lake or reservoir in raw form, and is then minimally processed
- Data is then accessed directly by data scientists, or processed further into the DW

So What is a Data Reservoir?

What Does it Do?

And Does it Replace My Data Warehouse?

An Interesting Question.

T : +44 (0) 1273 911 268 (UK) or (888) 631-1410 (USA) or +61 3 9596 7186 (Australia & New Zealand) or +91 997 256 7970 (India)E : [email protected] : www.rittmanmead.com

Meanwhile, back in the real world

Customer 360-Degree Insight

Data from Real-Time, Social & Internet Sources is Strange
- Typically comes in non-tabular form: JSON, log files, key/value pairs
- Users often want it speculatively, and haven't thought through its final purpose
- Schema can change over time, or maybe there isn't even one
- But the end-users want it now, not when your ETL team are next free
[Diagram labels: Single Customer View; Enriched Customer Profile; Correlating; Modeling; Machine Learning; Scoring]

Introducing Hadoop - Cheap, Flexible Storage + Compute
- Hadoop & NoSQL are better suited to exploratory analysis of newly-arrived, data reservoir-type data
- Flexible schema, applied by the user rather than by ETL
- Cheap, expandable storage for detail-level data
- Better native support for machine-learning and data discovery tools and processes
- Potentially a great fit for our new and emerging customer 360 datasets, and a great platform for analysis
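A "flexible schema, applied by the user rather than by ETL" can be illustrated with a few lines of plain Python. This is only a sketch of the schema-on-read idea, not of how Hadoop or Hive implement it; the sample events, field names and the `read_with_schema` helper are all hypothetical.

```python
import json

# Hypothetical raw events as they might land in the reservoir: one JSON
# document per line, with fields that differ from record to record.
RAW_LINES = [
    '{"user": "alice", "page": "/blog/obiee", "ms": 120}',
    '{"user": "bob", "page": "/blog/hadoop", "referrer": "twitter"}',
    '{"user": "carol", "tags": ["bdd", "hive"]}',
]


def read_with_schema(lines, fields):
    """Apply a caller-chosen schema at read time; absent fields become None."""
    for line in lines:
        record = json.loads(line)
        yield {name: record.get(name) for name in fields}


# Two users read the same raw data with two different schemas.
page_views = list(read_with_schema(RAW_LINES, ["user", "page"]))
referrals = list(read_with_schema(RAW_LINES, ["user", "referrer"]))
```

Nothing is rejected at load time: the third record has no `page` field, so it simply surfaces as `None` for the reader who asked for that column.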

Combine with DW for Big Data Management Platform

Delivering a Successful Customer 360-Degree View
- Start with a pilot for an area of the business that needs a single view of customers
- Then, over time, iterate and build out the Customer 360-degree view
[Roadmap diagram labels: Pilot & Quick Win; Start with a business area that needs a single customer view; Obtain clear understanding of customer online & offline behaviour; Build out Predictive Models and Decision Engines to deliver value now; Build out Hadoop Data Reservoir, Feeds and link to DW + CRM; Iterate and Build-out, add new integrations, incrementally building capability; Develop and Implement Strategy, Deliver Business Value; Build DevOps Capability; Pilot (Virtualised / Commodity) Hadoop Infrastructure; Create Full Production Infrastructure]

But These Data Sources are Strange
- Typically comes in non-tabular form: JSON, log files, key/value pairs
- Users often want it speculatively, and haven't thought through its final purpose
- Schema can change over time, or maybe there isn't even one
- But the end-users want it now, not when your ETL team are next free
[Diagram labels: Single Customer View; Enriched Customer Profile; Correlating; Modeling; Machine Learning; Scoring]


Introducing the Data Lab for Raw/Unstructured Data

But Working with Unstructured Textual Data Is Hard
- Data loaded into the reservoir needs preparation and curation before presenting to users
- Specialist skills are typically needed to ingest and understand data, and those staff are scarce
- How do we staff and scale projects as our use of big data matures?

Hold on

Haven't we heard this story before?

What Was Oracle Endeca Information Discovery?
- Part of the acquisition of Endeca back in 2012 by Oracle Corporation
- Based on search technology and the concept of faceted search
- Data stored in a flexible NoSQL-style in-memory database called Endeca Server
- Added aggregation, text analytics and text enrichment features for data discovery
- Explore data in raw form, with loose connections, and navigate via search rather than hierarchies
- Useful to find out what is relevant and valuable in a dataset before formal modeling

Endeca Server Technology Combined Search + Analytics
- Proprietary database engine focused on search and analytics
- Data organized as records, made up of attributes stored as key/value pairs
- No over-arching schema, no tables, self-describing attributes
- Endeca Server hallmarks:
  - Minimal upfront design
  - Support for jagged data
  - Administered via web service calls
  - "No data left behind"
  - "Load and Go"
- But limited in scale (>1m records): what if it could be rebuilt on Hadoop?
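The record model described above (records as bags of key/value attributes, no tables, jagged data) can be mimicked in a few lines of Python. This is only an illustration of the data model, not of Endeca Server itself; the sample records and the `attribute_profile` helper are hypothetical.

```python
from collections import Counter

# Hypothetical records: each one is just a set of attribute/value pairs,
# and no two records need to share the same attributes ("jagged" data).
records = [
    {"type": "post", "author": "author_a", "category": "OBIEE"},
    {"type": "tweet", "author": "author_b", "hashtags": ["obiee", "bigdata"]},
    {"type": "log", "host": "10.0.0.1", "status": 200},
]


def attribute_profile(recs):
    """Count how many records carry each attribute; no schema is required."""
    return Counter(name for rec in recs for name in rec)


profile = attribute_profile(records)
```

Because every record describes itself, a new attribute can appear in tomorrow's load without any upfront design change; it just shows up in the profile with a low count.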

2012

2013

2014

2014

2014

2015

2015

and 2015

2016

Oracle Big Data Discovery
- A visual front-end to the Hadoop data reservoir, providing end-user access to datasets
- Catalog, profile, analyse and combine schema-on-read datasets across the Hadoop cluster
- Visualize and search datasets to gain insights, potentially loading them in summary form into the DW

What Does Big Data Discovery Do?
- Provide a visual catalog and search function across data in the data reservoir
- Profile and understand data, relationships and data quality issues
- Apply simple changes and enrichment to incoming data
- Visualize datasets, including combinations (joins)


Delivering a Successful Customer 360-Degree View
[Roadmap diagram labels: Build out Predictive Models and Decision Engines to deliver value now; Build out Hadoop Data Reservoir, Feeds and link to DW + CRM; Build DevOps Capability]


Example Scenario : Social Media Analysis
- Rittman Mead want to understand the drivers and audience for their website
- What is our most popular content? Who are the most in-demand blog authors?
- Who are the influencers? What do they read?
- Three data sources in scope:
  - RM Website Logs
  - Twitter Stream
  - Website Posts, Comments etc

Ingesting & Sampling Datasets for the DGraph Engine
- Datasets in Hive have to be ingested into the DGraph engine before analysis and transformation
- Can either define an automatic Hive table detector process, or upload manually
- Typically ingests a 1m row random sample
- A 1m row sample provides > 99% confidence that the answer is within 2% of the value shown, no matter how big the full dataset (1m, 1b, 1q+)
- Makes interactivity cheap, because the sample is a representative dataset
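The slide does not say how the confidence figure is derived. One plausible reading, assumed here, is the standard margin of error for an estimated proportion under simple random sampling, which depends on the sample size and not on the size of the full dataset. The `margin_of_error` helper below is a hypothetical illustration of that arithmetic, not something taken from Big Data Discovery.

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(n, confidence=0.99, p=0.5):
    """Half-width of the confidence interval for a proportion estimated
    from a simple random sample of n rows (p=0.5 is the worst case)."""
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * sqrt(p * (1 - p) / n)


# Worst-case margin for a 1m row sample at 99% confidence.
moe = margin_of_error(1_000_000)
```

Under these assumptions the worst-case margin for a 1m row sample is roughly 0.13%, comfortably inside the 2% quoted on the slide. The population size never enters the formula, which is consistent with the "no matter how big the full dataset" claim.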

View Ingested Datasets, Create New Project
- Ingested datasets are now visible in Big Data Discovery Studio
- Create a new project from the first dataset, then add the second

Automatic Enrichment of Ingested Datasets
- The ingestion process has automatically geo-coded host IP addresses
- Other automatic enrichments run after the initial discovery step, based on datatypes and content

Initial Data Exploration On Uploaded Dataset Attributes
- For the ACCESS_PER_POST_CAT_AUTHORS dataset, 18 attributes are now available
- A combination of original attributes, and derived attributes added by the enrichment process

Data Transformation & Enrichment
- The data ingest process automatically applies some enrichments, such as geocoding
- Others can be applied from the Transformation page: simple transformations & Groovy expressions

Transformations using Text Enrichment / Parsing
- Uses the Salience text engine under the covers
- Extract terms, sentiment, noun groups, positive / negative words etc

Create New Attribute using Derived (Transformed) Values
- Choose the option to Create New Attribute, to add a derived attribute to the dataset
- Preview changes, then save to the transformation script

Upload Additional Datasets
- Users can upload their own datasets into BDD, from an MS Excel or CSV file
- Uploaded data is first loaded into a Hive table, then sampled/ingested as normal

Join Datasets On Common Attributes
- Used to create a new dataset based on the intersection (typically) of two datasets
- Not required if you just want to view two or more datasets together; think of this as a JOIN and SELECT
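The "JOIN and SELECT" analogy can be sketched in plain Python. This is not how Big Data Discovery performs the join; the two datasets, their attribute names and the `inner_join` helper are hypothetical, and the sketch assumes the join key is unique in the right-hand dataset.

```python
# Hypothetical datasets sharing a common attribute, post_id.
page_views = [
    {"post_id": 1, "views": 3000},
    {"post_id": 2, "views": 1200},
    {"post_id": 3, "views": 50},   # no matching post: dropped by the join
]
posts = [
    {"post_id": 1, "author": "author_a", "title": "Post A"},
    {"post_id": 2, "author": "author_b", "title": "Post B"},
]


def inner_join(left, right, key, columns):
    """JOIN left to right on key (intersection only), then SELECT columns."""
    index = {row[key]: row for row in right}  # assumes key is unique in right
    joined = []
    for row in left:
        match = index.get(row[key])
        if match is not None:
            merged = {**row, **match}
            joined.append({name: merged[name] for name in columns})
    return joined


result = inner_join(page_views, posts, "post_id", ["title", "author", "views"])
```

The result is a new, narrower dataset containing only rows present on both sides, which matches the "intersection (typically)" wording above.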

Create Discovery Pages for Dataset Analysis
- Select from a palette of visualisation components
- Select measures and attributes for display

Visualize and Interact With Hadoop Datasets

Faceted Search Across Entire Data Reservoir
- BDD Studio dashboards support faceted search across all attributes and refinements
- Auto-filter dashboard contents on selected attribute values, for data discovery
- Fast analysis and summarisation through Endeca Server technology
[Screenshot callouts: 3. Further refinement on "OBIEE" in post keywords; 4. Results now filtered on two refinements]
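The refinement steps in the callouts can be approximated with a small faceted-search sketch. It only illustrates the general technique (filter on selected attribute values, then recount the remaining facets), not the DGraph engine; the records, attribute names and helper functions are hypothetical.

```python
from collections import Counter

# Hypothetical post records; keywords is a multi-valued attribute.
posts = [
    {"author": "author_a", "category": "OBIEE", "keywords": ["obiee", "rpd"]},
    {"author": "author_b", "category": "OBIEE", "keywords": ["obiee", "linux"]},
    {"author": "author_a", "category": "Big Data", "keywords": ["hadoop", "hive"]},
    {"author": "author_c", "category": "Big Data", "keywords": ["hive", "obiee"]},
]


def as_values(value):
    """Treat single-valued and multi-valued attributes uniformly."""
    return value if isinstance(value, list) else [value]


def refine(records, **selected):
    """Keep only records that match every selected attribute value."""
    return [
        rec for rec in records
        if all(want in as_values(rec.get(attr)) for attr, want in selected.items())
    ]


def facet_counts(records, attr):
    """Count the values of one attribute across the current result set."""
    return Counter(
        v for rec in records for v in as_values(rec.get(attr)) if v is not None
    )


step1 = refine(posts, keywords="obiee")    # first refinement
step2 = refine(step1, author="author_a")   # second refinement on top of it
authors_after_step1 = facet_counts(step1, "author")
```

Each refinement narrows the result set, and the facet counts are recomputed over whatever remains, which is what lets a dashboard auto-filter on the selected values.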

Comparing BDD to Oracle Visual Analyzer
- Visual Analyzer also provides a form of data discovery for BI users
- Similar to Tableau, Qlikview etc
- Inspired by the BI elements of OEID
- Uses the OBIEE RPD as the primary datasource, so data needs to be curated + structured
- Probably a better option for users who aren't concerned that it's big data
- But can still connect to Hadoop via Hive, Impala and Oracle Big Data SQL

Managed vs. Free-Form Data Discovery
- Data in the data reservoir is typically raw, and hasn't been organised into facts and dimensions yet
- In this initial phase, you don't want it to be: too much up-front work with unknown data
- Later on though, users will benefit from structure and hierarchies being added to the data
- But this takes work, and you need to understand the cost/benefit of doing it now vs. later

Export Prepared Datasets Back to Hive, for OBIEE + VA
- Transformations within BDD can then be used to create curated fact + dim Hive tables
- These can then be used as a more suitable dataset for the OBIEE RPD + Visual Analyzer
- Or exported into Exadata or Exalytics to combine with the main DW datasets

Further Analyse in Visual Analyzer for Managed Dataset
- Users in Visual Analyzer then have a more structured dataset to use
- Data organised into dimensions, facts, hierarchies and attributes
- Can still access Hadoop directly through Impala or Big Data SQL
- Big Data Discovery though was key to the initial understanding of the data

Oracle BDD for Data Wrangling + Data Enrichment
- Oracle Big Data Discovery is used to go back to the raw event data and add more meaning
- Enrich data, extract nouns + terms, add reference data from file, RDBMS etc
- Understand the sentiment + meaning of tweets, link disparate + loosely coupled events
- Faceted search dashboards

But Who Are The Influencers In Our Community?
- Previous counts assumed that all tweet references are equally important
- But some Twitter users are far more influential than others
  - They sit at the centre of a community and have 1000s of followers
  - A reference by them has a massive impact on page views
  - Positive or negative comments from them drive perception
- Can we identify them?
  - Potentially reach out with an analyst program
  - Study which website posts go viral
  - Understand our audience, and the conversation, better

What Communities and Networks Are Our Audience?
- The Rittman Mead website features many types of content
  - Blogs on BI, data integration, big data, data warehousing
  - Op-Eds ("OBIEE12c - Three Months In, What's the Verdict?")
  - Articles on a theme, e.g. performance tuning
  - Details of new courses, new promotions
- Different communities are likely to form around these content types
- Different influencers and patterns of recommendation and discovery
- Can we identify some of the communities, and segment our audience?

Graph Example : RM Blog Post Referenced on Twitter
[Diagram: the tweet "Lifting the Lid on OBIEE Internals with Linux Diagnostics Tools http://t.co/gFcUPOm5pI" passing between users connected by "Follows" links, with the page-view counter shown at 0000, 1000, 2000 and 3000]

Network Effect Magnified by Extent of Social Graph
[Diagram: the tweet "Lifting the Lid on OBIEE Internals with Linux Diagnostics Tools http://t.co/gFcUPOm5pI" spreading across a larger social graph, with page views shown at 3000 and 7005]

Retweets by Influential Twitter Users Drive Visits
[Diagram: the tweet "Lifting the Lid on OBIEE Internals with Linux Diagnostics Tools http://t.co/gFcUPOm5pI" at 3000 page views; after a retweet ("RT: Lifting the Lid on OBIEE Internals with Linux Diagnostics Tools http://t.co/gFcUPOm5pI") page views are shown at 5003]

Retweets, Mentions and Replies Create Communities
[Diagram: two groups of users connected by Retweet, Reply and Mention links, one labelled #bigdatasql and the other #thatswhatshesaid]

Property Graph Terminology
[Diagram: two nodes (vertices), each shown with the tweet "Lifting the Lid on OBIEE Internals with Linux Diagnostics Tools http://t.co/gFcUPOm5pI", joined by directed connections (edges) labelled Mentions and Retweets]

Determining Influencers - Factors to Consider
- Different types of Twitter interaction could imply more or less influence
- A retweet of another user's tweet implies that the person is worth quoting, or that you endorse their opinion
- A reply to another user's tweet could be a weaker recognition of that person's opinion or view
- A mention of a user in a tweet is a weaker recognition that they are part of a community / debate

Relative Importance of Edge Types Added via Weights
[Diagram: the same two nodes, each shown with the tweet "Lifting the Lid on OBIEE Internals with Linux Diagnostics Tools http://t.co/gFcUPOm5pI", with the weight attached to each edge as an edge property: Mentions, Weight = 30; Retweet, Weight = 100]
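The property-graph terms above (vertices, directed edges, edge properties such as a weight) map naturally onto a small data structure. The sketch below is plain Python, not the Oracle Big Data Spatial & Graph API. The retweet and mention weights are the ones on the slide; the reply weight, the user names and the helper functions are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    """A directed connection between two vertices, with properties."""
    source: str   # user who retweeted, replied or mentioned
    target: str   # user who was retweeted, replied to or mentioned
    label: str    # edge type
    weight: int   # edge property expressing relative importance


# Retweet = 100 and mention = 30 come from the slide. The slide gives no
# weight for replies, so 50 is an assumed value between the two.
EDGE_WEIGHTS = {"retweet": 100, "reply": 50, "mention": 30}


def make_edge(source, target, label):
    return Edge(source, target, label, EDGE_WEIGHTS[label])


# Hypothetical interactions between three users.
edges = [
    make_edge("user_b", "user_a", "retweet"),
    make_edge("user_c", "user_a", "mention"),
    make_edge("user_c", "user_b", "reply"),
]


def weighted_in_degree(edge_list):
    """Sum incoming edge weights per vertex: a crude first influence score."""
    totals = {}
    for edge in edge_list:
        totals[edge.target] = totals.get(edge.target, 0) + edge.weight
    return totals


scores = weighted_in_degree(edges)
```

Weighted in-degree only looks one hop away; it does not account for who is doing the retweeting, which is the gap that ranking algorithms such as PageRank address.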

Oracle Big Data Spatial & Graph
- Graph, spatial and raster data processing for big data
- Runs on-prem, or in Oracle Big Data Cloud Service
- Installable on a commodity cluster using CDH
- Data stored in Apache HBase or Oracle NoSQL DB
- Complements Spatial & Graph in Oracle Database
- Designed for trillions of nodes, edges etc
- Out-of-the-box spatial enrichment services
- Over 35 of the most popular graph analysis functions
  - Graph traversal, recommendations
  - Finding communities and influencers, pattern matching

Calculating Top 10 Users using Page Rank Algorithm
Top 10 influencers:
1. markrittman
2. rmoff
3. rittmanmead
4. mRainey
5. JeromeFr
6. Nephentur
7. borkur
8. BIExperte
9. i_m_dave
10. dw_pete
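The ranking above was produced with a graph analysis tool; the slides do not show the code. As a rough illustration of what a PageRank calculation does, here is a minimal weighted power-iteration sketch in plain Python. It is not Oracle's implementation, and the damping factor, iteration count, edge direction (from the user who interacts to the user being referenced) and sample edges are all assumptions.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Weighted PageRank by power iteration.

    edges: list of (source, target, weight) tuples.
    Returns a dict mapping each node to a score; scores sum to 1.
    """
    nodes = sorted({n for s, t, _ in edges for n in (s, t)})
    out_weight = {n: 0.0 for n in nodes}
    for source, _, weight in edges:
        out_weight[source] += weight

    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Rank held by nodes with no outgoing edges is shared evenly.
        dangling = sum(rank[n] for n in nodes if out_weight[n] == 0)
        base = (1 - damping) / len(nodes) + damping * dangling / len(nodes)
        new_rank = {n: base for n in nodes}
        for source, target, weight in edges:
            new_rank[target] += damping * rank[source] * weight / out_weight[source]
        rank = new_rank
    return rank


# Hypothetical interactions: (who interacted, who was referenced, weight).
sample_edges = [
    ("user_b", "user_a", 100),  # retweet
    ("user_c", "user_a", 30),   # mention
    ("user_c", "user_b", 100),  # retweet
    ("user_d", "user_a", 100),  # retweet
]

rank = pagerank(sample_edges)
ranking = sorted(rank, key=rank.get, reverse=True)
```

Unlike a simple count of references, a user's score here depends on the scores of the users who reference them, so a retweet from an influential account is worth more than one from an isolated account.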

Visualising the Social Graph Around Particular Users

Calculating Shortest Path Between Users
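The slide shows only the result of a shortest-path query. As an illustration of the underlying idea, the sketch below finds the path with the fewest hops using breadth-first search in plain Python. It is not the Oracle Big Data Spatial & Graph API, it ignores edge weights, and the adjacency list and user names are hypothetical.

```python
from collections import deque


def shortest_path(adjacency, start, goal):
    """Breadth-first search over a directed, unweighted graph.

    Returns the list of nodes on a fewest-hops path, or None if the goal
    cannot be reached from the start.
    """
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the predecessor links back to the start.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in adjacency.get(node, ()):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None


# Hypothetical "interacts with" links between users.
adjacency = {
    "user_a": ["user_b", "user_c"],
    "user_b": ["user_d"],
    "user_c": ["user_d"],
    "user_d": ["user_e"],
}

path = shortest_path(adjacency, "user_a", "user_e")
```

Because the edges are directed, a path may exist in one direction only: here `user_e` has no outgoing links, so nothing is reachable from it.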

Edge Bundling to Better Illustrate Connection Frequency

Determining Communities via Twitter Interactions
- Clusters are based on actual interaction patterns, not hashtags
- Detects real communities, not ones that exist just in theory
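The slides do not say which community-detection algorithm was used. As a deliberately simple stand-in, the sketch below groups users into connected components of the interaction graph: anyone linked by a chain of retweets, replies or mentions lands in the same group. Production community detection usually relies on more refined algorithms; the `communities` helper and the sample interactions are hypothetical.

```python
def communities(interactions):
    """Group users into connected components of the undirected graph
    formed by their interactions (pairs of user names)."""
    neighbours = {}
    for a, b in interactions:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    seen, groups = set(), []
    for user in sorted(neighbours):
        if user in seen:
            continue
        # Depth-first walk collecting everyone reachable from this user.
        stack, group = [user], set()
        while stack:
            current = stack.pop()
            if current in group:
                continue
            group.add(current)
            stack.extend(neighbours[current] - group)
        seen |= group
        groups.append(sorted(group))
    return groups


# Hypothetical interactions (retweets, replies or mentions) between users.
interactions = [
    ("user_a", "user_b"),
    ("user_b", "user_c"),
    ("user_d", "user_e"),
]

groups = communities(interactions)
```

The grouping is driven purely by who actually interacted with whom, which mirrors the point above: two users sharing a hashtag but never interacting would not be placed together.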

Big Data Rapid Start from Rittman Mead
- Extend your organisation's reach into your data with Oracle Big Data Discovery, Cloudera Hadoop and the Rittman Mead Big Data Rapid Start.
- The Big Data Rapid Start is a fixed-price, two-week engagement delivered by Rittman Mead's team of Oracle, Big Data and Data Discovery consultants, designed to quickly provide everything required to begin discovering the hidden value of your data.
- Move forward with confidence in the technology, process and application of Big Data Discovery with the support of the world's leaders.

Additional Resources
- Articles on the Rittman Mead Blog
  - http://www.rittmanmead.com/category/oracle-big-data-appliance/
  - http://www.rittmanmead.com/category/big-data/
  - http://www.rittmanmead.com/category/oracle-big-data-discovery/
- Rittman Mead offer consulting, training and managed services for Oracle Big Data
  - Oracle & Cloudera partners
  - http://www.rittmanmead.com/bigdata
