
Introduction to Infobright April 16, 2014



The explosive growth of machine-generated data is creating challenges not only in storage and management but also in how to turn that data into actionable information. On April 16, 2014, our webinar "Introduction to Infobright" explores how Infobright accelerates ad-hoc query performance and reduces costs.


Page 1: Introduction to Infobright April 16, 2014

Introduction to Infobright

Confidential – Do Not Distribute

Page 2: Introduction to Infobright April 16, 2014

Agenda

– Today’s Analytic Challenges
– The Infobright Analytic Platform
– Getting Started

Page 3: Introduction to Infobright April 16, 2014

The Rise of Machine Data

50 billion connected devices

$7.4B mobile advertising billings

190 Exabytes of data from:
– Web logs
– Sensor data
– Call data records
– Transaction records


Page 4: Introduction to Infobright April 16, 2014

More than just “Big” Data

Function: • Transactional • Analytics

Data Refresh: • Dynamic • Static

Query: • Pre-planned • Ad-hoc • Rough • Approximate

Data: • Structured • Unstructured • Semi-structured

Page 5: Introduction to Infobright April 16, 2014

“Real time” Analytics: The New Imperative

– Identify security threats & fraud
– Troubleshoot networks
– Optimize online/mobile ads
– Plan capacity scale-out
– Competitive positioning

Page 6: Introduction to Infobright April 16, 2014

Infobright Powers Big Data Analytics


Page 7: Introduction to Infobright April 16, 2014

Who Is Infobright?

Global provider of database analytics platforms to over 450 direct and OEM customers in the telecom, digital media and marketing, financial services, solution provider, energy and healthcare markets

Page 8: Introduction to Infobright April 16, 2014

Key Benefits of the Knowledge Grid Architecture


Page 9: Introduction to Infobright April 16, 2014

Column vs. Row: What is the best use case?

Row Oriented

All the columns are needed

Transactional processing is required

Column Oriented

Only relevant columns are needed

Reports are aggregates (sum, count, average, etc.)

Page 10: Introduction to Infobright April 16, 2014

Column vs. Row: How it Works

50 days’ worth of data, 1 million rows per day

Disk I/O is the primary limiting factor

A row-oriented design forces the database to retrieve all column data

As table size increases so do the indexes

Load speed degrades since indexes need to be recreated as data is added; this causes huge sorts (another very slow operation)

[Diagram: table of 50M rows × 30 columns]

Page 11: Introduction to Infobright April 16, 2014

Column vs. Row: How it works

Query:
– SELECT Column 11 WHERE Column 17 matches, for the 3rd week (day 15 – day 21)

[Diagram: table of 50M rows × 30 columns]

Page 12: Introduction to Infobright April 16, 2014

Column vs. Row: How it Works

Row-based results:
– Eliminate 43 days
– 7 million rows retrieved
– 210 million data elements retrieved

[Diagram: table of 50M rows × 30 columns]

Page 13: Introduction to Infobright April 16, 2014

Column vs. Row: How it Works

Column-based results:
– Eliminate 43 days
– Eliminate 28 of the 30 columns
– 14 million data elements retrieved
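The arithmetic behind these slides can be checked directly. A quick sketch, using the example's figures (1M rows/day, 30 columns, a 7-day window, 2 columns referenced by the query):

```python
rows_per_day = 1_000_000
days_selected = 7                 # week 3 = day 15 through day 21
columns_total = 30
columns_needed = 2                # e.g. Column 11 and Column 17

rows_scanned = rows_per_day * days_selected            # 7 million rows survive the date filter
row_store_elements = rows_scanned * columns_total      # a row store reads every column of each row
column_store_elements = rows_scanned * columns_needed  # a column store reads only the 2 columns used

print(row_store_elements)     # 210000000
print(column_store_elements)  # 14000000
```

The 15x difference in data elements retrieved is the core of the column-store I/O advantage the slides illustrate.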

[Diagram: table of 50M rows × 30 columns]

Page 14: Introduction to Infobright April 16, 2014

Data Loading Process: Data Packs

Bulk load input data: each column is split into Data Packs of 64K rows (column A → packs A1, A2, A3, … A-n; column B → B1 … B-n; column C → C1 … C-n).
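The pack-splitting step above can be sketched as follows. This is a minimal illustration of fixed-size chunking (`to_data_packs` is a hypothetical helper, not Infobright's API; the 64K pack size comes from the slide):

```python
PACK_SIZE = 65_536  # 64K rows per Data Pack, per the slide

def to_data_packs(column_values):
    """Split one column's values into fixed-size packs; the last pack may be short."""
    return [column_values[i:i + PACK_SIZE]
            for i in range(0, len(column_values), PACK_SIZE)]

column_a = list(range(200_000))   # illustrative column data
packs = to_data_packs(column_a)
print(len(packs))       # 4 packs: 3 full + 1 partial
print(len(packs[-1]))   # 200000 - 3*65536 = 3392 rows in the final pack
```

Because packs are per-column, a query touching 2 of 30 columns only ever considers that column's packs, which is what enables the pruning shown on the later slides.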

Page 15: Introduction to Infobright April 16, 2014

Data Loading Process: Compression &Knowledge Grid

Each 64K Data Pack is compressed and written to on-disk storage; metadata about each pack is kept in the in-memory Knowledge Grid.

Page 16: Introduction to Infobright April 16, 2014

What Your Data Looks Like Now

Original data: 10 TB

Compressed data: 500 GB
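A quick check of the compression ratio these numbers imply, assuming decimal units (1 TB = 1,000 GB):

```python
original_gb = 10 * 1_000   # 10 TB of raw data
compressed_gb = 500        # stored footprint after compression

ratio = original_gb / compressed_gb
print(f"{ratio:.0f}:1")    # 20:1
```

This matches the "20:1+ Compression" figure cited in the customer examples later in the deck.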

Page 17: Introduction to Infobright April 16, 2014

The Knowledge Grid: How it works

Knowledge Nodes answer the query directly, or

Identify only required Data Packs, minimizing decompression, and

Predict required data in advance based on workload

All driven by a granular computing engine

Page 18: Introduction to Infobright April 16, 2014

Queries with the Knowledge Grid: How it Works

Query: How are my sales doing this year?

Granular engine iterates on Knowledge Grid

Each pass eliminates Data Packs

If any Data Packs are needed to resolve query, only those are decompressed

Knowledge Grid

Compressed Data

Page 19: Introduction to Infobright April 16, 2014

Queries with the Knowledge Grid: How it Works

SELECT count(*) FROM employees
WHERE salary > 100000
  AND age < 35
  AND job = 'DBA'
  AND state = 'TX'

Columns: salary, age, job, state

Pack status legend: No Match / Suspect / All Match

Page 20: Introduction to Infobright April 16, 2014

Queries with the Knowledge Grid: How it Works

SELECT count(*) FROM employees
WHERE salary > 100000
  AND age < 35
  AND job = 'DBA'
  AND state = 'TX'

Columns: salary, age, job, state

Pack status legend: No Match / Suspect / All Match

For three of the columns, all packs are resolved from Knowledge Grid metadata alone and ignored; only the one remaining Suspect pack must be decompressed.
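The classification step the slides describe can be sketched as a min/max metadata check. A hypothetical illustration for the `salary > 100000` predicate (the per-pack statistics below are invented, not real Infobright internals):

```python
def classify(pack_min, pack_max, threshold):
    """Classify a Data Pack against `value > threshold` using metadata only."""
    if pack_max <= threshold:
        return "No Match"    # no row can qualify: skip the pack entirely
    if pack_min > threshold:
        return "All Match"   # every row qualifies: answer from metadata, no decompression
    return "Suspect"         # must decompress and scan this pack

# (min, max) salary stats for three illustrative packs
salary_packs = [(30_000, 95_000), (40_000, 120_000), (101_000, 250_000)]
statuses = [classify(lo, hi, 100_000) for lo, hi in salary_packs]
print(statuses)  # ['No Match', 'Suspect', 'All Match']
```

Intersecting these per-column classifications across all four predicates is what lets the engine answer most of the query from metadata and decompress only the Suspect packs.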

Page 21: Introduction to Infobright April 16, 2014

Working with Infobright & Hadoop

General purpose database solutions require:
– Significant administration, ongoing tuning and indexing
– More hardware
– Less flexibility for macroscopic investigative analytics
– Higher total cost of ownership

Data flow: Hadoop Connector → Infobright Enterprise Edition → BI Tools

Page 22: Introduction to Infobright April 16, 2014

Customer Example: JDSU

Low Admin: Users should not need DBAs to keep the solution running

Load Speeds: Ingestion rates continue to increase, placing heavy burden on solutions

High Compression: Want to keep longer histories in less space

Requirements

Lower TCO: Resulting in better value for customers, better margins for providers

Stripped away the “DBA tax” required by previous solutions

Ingesting over 1TB/Hour, with significant headroom beyond that

Over 3X the retention period and a 5X simultaneous reduction in storage requirement

Lower TCO for users, higher margins for JDSU

Results

Little to No Admin

Fast Load Speeds

20:1+ Compression

Exceptional Ad Hoc Query Performance

Very Low TCO


Page 23: Introduction to Infobright April 16, 2014

Customer Example: LiveRail

Low Admin: Reduce the requirements for labor intensive reporting

Ad Hoc Query Capabilities: Ability to mine data for investigative analytics

High Compression: Want to keep longer histories in less space

Requirements

Lower TCO: Robust analytics platform without excessive outlay of capital or people

Eliminated the need for staff to run customized reports using Hive

Developed a portal where customers can run their own ad hoc reporting

Minimal resources required to house the Infobright repository for reporting

Better results for customers, lower costs and higher margins for LiveRail

Results

Little to No Admin

Fast Load Speeds

20:1+ Compression

Exceptional Ad Hoc Query Performance

Very Low TCO


Page 24: Introduction to Infobright April 16, 2014

Customer Example: JC Decaux

Low Admin: Reduce the requirements for labor intensive reporting

Ad Hoc Query Capabilities: Consolidate and issue timely reports from disparate data sources

High Compression: Existing Oracle-based system couldn’t handle the volume of data

Requirements

Lower TCO: Minimize admin required for managing Oracle and work with Hadoop

Ability to create essential reports in less than three minutes

Fast queries: queries originally taking 15+ minutes using MySQL reduced to seconds

Fast uploads: Data loads that used to take two hours are now happening in 20 minutes.

Fast deployment: System implemented in three months

Results

Little to No Admin

Fast Load Speeds

20:1+ Compression

Exceptional Ad Hoc Query Performance

Very Low TCO


Page 25: Introduction to Infobright April 16, 2014

Getting Started with Infobright

– Download our trial
– Follow us on Twitter
– Follow us on LinkedIn
– Join our community