The Holy Grail of Data Analytics


Dan Lynn, CEO

• Data Services

• Data Strategy

• Data Integration / BI / Analytics

• Modernize Data Infrastructures

• Custom Applications & APIs

• Distributed over 6 states!

• Fully-virtualized staff

www.agildata.com

Dan Lynn, CEO

• Co-Founder @ FullContact

• 15 years building data systems

• Techstars 2011

• dan@agildata.com

All product names, logos, and brands are property of their respective owners. All company, product and service names used are for identification purposes only. Use of these names, logos, and brands does not imply endorsement.

Free MySQL Performance Analyzer

www.agildata.com/gibbs

AgilData Scalable Cluster

TRADE-OFFS

OLTP vs OLAP

OLTP OVERVIEW

• “Online Transaction Processing”

• Database is optimized for low latency access to current data

• Short transactions (INSERT, UPDATE, DELETE)

• High concurrency

• Examples:

• Add item to shopping cart

• Reset password
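A minimal sketch of the "add item to shopping cart" case as one short transaction. It uses Python's built-in sqlite3 purely for illustration; the `cart_item` table and the `add_to_cart` helper are hypothetical names, not from the talk.

```python
import sqlite3

# In-memory database with a hypothetical cart_item table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cart_item ("
    " cart_id INTEGER, sku TEXT, qty INTEGER,"
    " PRIMARY KEY (cart_id, sku))"
)


def add_to_cart(conn, cart_id, sku, qty):
    """One short transaction that touches a single row."""
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute(
            "UPDATE cart_item SET qty = qty + ? WHERE cart_id = ? AND sku = ?",
            (qty, cart_id, sku),
        )
        if cur.rowcount == 0:  # no existing row, so insert a new one
            conn.execute(
                "INSERT INTO cart_item (cart_id, sku, qty) VALUES (?, ?, ?)",
                (cart_id, sku, qty),
            )


add_to_cart(conn, 1, "widget", 2)
add_to_cart(conn, 1, "widget", 1)
row = conn.execute(
    "SELECT qty FROM cart_item WHERE cart_id = 1 AND sku = 'widget'"
).fetchone()
```

Each call reads and writes one row located through the primary key, which is the access pattern an OLTP database is tuned for.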

OLAP OVERVIEW

• Online Analytical Processing

• Database is optimized for aggregation of historical data

• Aggregations can span millions or billions of records

• Low(er) concurrency

• Examples:

• What is our average shopping cart size, grouped by week and by affiliate?

• What are the top 5 paths that users take when navigating our website?
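The first example question can be sketched as a single aggregation query. This is only an illustration using Python's built-in sqlite3; the `cart` table, its columns, and the sample rows are assumptions made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: one row per shopping cart.
conn.execute(
    "CREATE TABLE cart ("
    " id INTEGER PRIMARY KEY, affiliate TEXT, created TEXT, item_count INTEGER)"
)
conn.executemany(
    "INSERT INTO cart (affiliate, created, item_count) VALUES (?, ?, ?)",
    [
        ("acme", "2016-06-06", 2),
        ("acme", "2016-06-07", 4),  # same week as the row above
        ("acme", "2016-06-14", 5),  # following week
        ("zeta", "2016-06-06", 1),
    ],
)

# Average cart size, grouped by week and by affiliate.
rows = conn.execute(
    """
    SELECT strftime('%Y-%W', created) AS week,
           affiliate,
           AVG(item_count) AS avg_size
    FROM cart
    GROUP BY week, affiliate
    ORDER BY week, affiliate
    """
).fetchall()
```

The query reads only three columns but has to visit every row, which is why scan speed matters more than single-row latency for this kind of workload.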

HOW DATABASES OPTIMIZE FOR OLTP

• Optimized for reading or updating an entire row (e.g. the full customer record)

• Data is written to and read from disk on a row-by-row basis.

• Indexes are used to construct full business object from multiple tables via JOINs (e.g. SELECT * FROM order o JOIN customer c ON c.id = o.customer_id)

• Hadoop and NoSQL systems generally behave the same.

• Scan performance is limited
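A rough sketch of why a row-by-row layout favors whole-record access over scans. The fixed-width record format below (`id`, 16-byte `name`, `balance`) is a hypothetical layout chosen for illustration, not how any particular database stores rows.

```python
import struct

# Hypothetical fixed-width row: int id, 16-byte name, float64 balance.
ROW = struct.Struct("<i16sd")

customers = [(1, b"alice", 10.0), (2, b"bob", 25.5), (3, b"carol", 7.25)]

# Rows are written one after another, as on a row-oriented disk page.
page = b"".join(ROW.pack(*c) for c in customers)


def read_row(page, n):
    """Fetch one full record: a single contiguous read."""
    cid, name, balance = ROW.unpack_from(page, n * ROW.size)
    return cid, name.rstrip(b"\0"), balance


def scan_balances(page):
    """Scan one column: every row's bytes are decoded to get one field."""
    return [rec[2] for rec in ROW.iter_unpack(page)]


second_customer = read_row(page, 1)
total_balance = sum(scan_balances(page))
```

`read_row` touches 28 bytes in one place, while `scan_balances` has to walk past the `id` and `name` bytes of every record to reach the one field it needs.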

HOW DATABASES OPTIMIZE FOR OLAP

• Optimized for aggregating columns (e.g. SELECT AVG(unit_price * qty) FROM order_line GROUP BY c.id)

• Data is laid out on disk on a per-column basis.

• Great for scans, not so good for random row-level access

• Doesn’t support random UPDATEs
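A small sketch of the per-column layout, assuming a hypothetical `order_line` table held as one array per column. It is meant to show the access pattern only, not a real columnar file format.

```python
from array import array

# Each column is stored contiguously and separately (hypothetical data).
columns = {
    "unit_price": array("d", [2.5, 10.0, 4.0]),
    "qty": array("i", [4, 1, 3]),
    "sku": ["a", "b", "c"],
}


def avg_line_total(cols):
    """Aggregate over two columns; the sku column is never read."""
    totals = [p * q for p, q in zip(cols["unit_price"], cols["qty"])]
    return sum(totals) / len(totals)


def fetch_row(cols, n):
    """Rebuild one row: needs a separate lookup in every column."""
    return {name: values[n] for name, values in cols.items()}


average = avg_line_total(columns)
second_line = fetch_row(columns, 1)
```

The aggregate reads two tightly packed arrays and skips everything else, while rebuilding a single row costs one lookup per column, which is the trade-off the bullets describe.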

HOW HADOOP OPTIMIZES FOR OLAP

• Data is partitioned in HDFS in append-only blocks of ~64MB.

• These blocks are spread out across the cluster.

• Processing (i.e. queries) is sent to the data, instead of bringing the data to the application for processing.

• Columnar data formats like Parquet can be stored on HDFS for very fast scan performance.

• Updates are very expensive.
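The idea of sending processing to the data can be sketched in plain Python: each block computes a small partial result where it lives, and only those partials are combined. The function names and the tiny block size are illustrative assumptions; this is not Hadoop API code.

```python
def split_into_blocks(records, block_size):
    """Partition records into fixed-size blocks (stand-in for HDFS blocks)."""
    return [records[i:i + block_size] for i in range(0, len(records), block_size)]


def map_block(block):
    """Runs next to the block: returns a small (sum, count) partial."""
    return (sum(block), len(block))


def reduce_partials(partials):
    """Combines the partials from all blocks into the final average."""
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count


blocks = split_into_blocks([3, 5, 7, 9, 11], block_size=2)
average = reduce_partials([map_block(b) for b in blocks])
```

Only one `(sum, count)` pair per block crosses the network, regardless of how many records the block holds.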

[Figure: a database positioned between scan performance and updatability]

THE LAMBDA ARCHITECTURE

• Data Stream (Kafka, etc…)

• Write to HDFS

• Batch Computation (MapReduce, Spark) → Batch Views

• Speed Layer (Storm, Spark Streaming, Flink, etc…) → Real-time views

• Serving Layer (HBase, MySQL, PostgreSQL, etc…)
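A toy sketch of how a serving layer might answer a query by combining a batch view with a real-time view. The `ServingLayer` class, its method names, and the page-view counts are all hypothetical, and the reset on batch completion is a deliberate simplification.

```python
from collections import Counter


class ServingLayer:
    """Answers queries by combining a batch view with a real-time view."""

    def __init__(self, batch_view):
        self.batch_view = Counter(batch_view)  # recomputed periodically by the batch layer
        self.realtime_view = Counter()         # updated per event by the speed layer

    def on_event(self, path):
        # Speed layer: fold each new event in immediately.
        self.realtime_view[path] += 1

    def on_batch_complete(self, new_batch_view):
        # Simplifying assumption: the new batch view already covers every
        # event seen so far, so the real-time view can be reset.
        self.batch_view = Counter(new_batch_view)
        self.realtime_view.clear()

    def page_views(self, path):
        return self.batch_view[path] + self.realtime_view[path]


serving = ServingLayer({"/home": 100, "/cart": 40})
serving.on_event("/cart")
serving.on_event("/cart")
cart_views = serving.page_views("/cart")
```

The same counting logic ends up in two places (batch and speed layer), which is the main operational cost of this design.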

APACHE KUDU

• Apache Project (incubating)

• Started at Cloudera, growing industry adoption.

• Currently v0.9.1

• 1.0 release likely coming out in September 2016

Source: http://www.slideshare.net/cloudera/kudu-new-hadoop-storage-for-fast-analytics-on-fast-data

APACHE KUDU USE CASES

• Online Reporting

• Examples: Operational Data Store, Customer-facing analytics, real-time dashboards

• Workload: Inserts, updates, scans, random lookups

• Time Series

• Examples: Market analytics, fraud detection, risk monitoring, message queueing

• Workload: Inserts, updates, scans, random lookups

• Machine Data Analysis

• Examples: Network threat detection, devops monitoring and alerting

• Workload: Inserts, scans, random lookups

THE ROAD AHEAD

• Reactive processing

• Dynamic / intelligent indexing

• High performance mutable message queueing

• Kudu project website: http://kudu.apache.org/

• Details about OLTP vs OLAP workloads: http://datawarehouse4u.info/OLTP-vs-OLAP.html

• Analyst perspective on Kudu: http://www.dbms2.com/2015/09/28/introduction-to-cloudera-kudu/

www.agildata.com

dan@agildata.com

@danklynn

Thanks!

CREDITS

• Grail image: https://upload.wikimedia.org/wikipedia/commons/1/10/London-Victoria_and_Albert_Museum-Grail-02.jpg

• Balanced scales: https://commons.wikimedia.org/wiki/File:Balanced_scale_of_Justice.svg
