Real-time Analytics from Small Data, Big Data and Huge Data Raanan Dagan, Big Data Solutions, Splunk Copyright © 2012 Splunk Inc.

CeBIT Big Data 2012 - Raanan Dagan, Big Data Product Marketing, Splunk


Real-time Analytics from Small Data, Big Data and Huge Data

Raanan Dagan, Big Data Solutions, Splunk

Copyright © 2012 Splunk Inc.


What I’ll Talk About

Machine Data

Splunk and Big Data, Real-time Analytics

Customer Use Cases


Big Data Comes from Machines

Volume | Velocity | Variety | Variability

GPS, RFID, Hypervisor, Web Servers, Email, Messaging, Clickstreams, Mobile, Telephony, IVR, Databases, Sensors, Telematics, Storage, Servers, Security Devices, Desktops

Machine-generated data is one of the fastest growing, most complex and most valuable segments of big data


What Does Machine Data Look Like?

Sources

Twitter

Care IVR

Middleware Error

Order Processing


Machine Data Contains Critical Insights

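[Figure: sample events from the four sources (Twitter, Care IVR, Middleware Error, Order Processing) with these fields highlighted: Order ID, Customer ID, Product ID, Time Waiting On Hold, Customer’s Tweet, Twitter ID, Company’s Twitter ID. Order ID and Customer ID are each highlighted more than once.]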
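The fields called out on this slide matter because they let events from unrelated sources be tied together. The following is a minimal Python sketch of that idea, not Splunk's implementation. The event records, field names and values are invented for illustration, and which source carries which field is an assumption, since the slide's sample data is not reproduced here.

```python
from collections import defaultdict

# Hypothetical, already-extracted events; every value here is made up.
events = [
    {"source": "order_processing", "customer_id": "C100", "order_id": "O-1", "product_id": "P-9"},
    {"source": "middleware_error", "customer_id": "C100", "order_id": "O-1"},
    {"source": "care_ivr", "customer_id": "C100", "hold_seconds": 630},
    {"source": "twitter", "customer_id": "C100", "twitter_id": "@example_user", "tweet": "Still on hold..."},
    {"source": "order_processing", "customer_id": "C200", "order_id": "O-2", "product_id": "P-3"},
]


def correlate_by_customer(events):
    """Group events from all sources under their shared customer ID."""
    timeline = defaultdict(list)
    for event in events:
        timeline[event["customer_id"]].append(event)
    return dict(timeline)


def sources_seen(timeline, customer_id):
    """Return the sorted names of the sources that mention a customer."""
    return sorted({e["source"] for e in timeline.get(customer_id, [])})


timeline = correlate_by_customer(events)
print(sources_seen(timeline, "C100"))
```

In this sketch a single customer ID links an order, a middleware error, a long hold time and a tweet, which is the kind of cross-source insight the slide points to.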


Splunk: The Platform for Machine Data

Insight and Visualizations for Executives

Statistical Analysis

Proactive Monitoring

Search and Investigation

Machine Data Operational Intelligence

Splunk Index


Splunk Collects and Indexes Machine Data

No upfront schema. No RDBMS. No custom connectors.

Logfiles, Configs, Messages, Traps, Alerts, Metrics, Scripts, Tickets, Changes

Applications: Web logs; Log4J, JMS, JMX; .NET events; Code and scripts

Networking: Configurations; syslog; SNMP; netflow

Databases: Configurations; Audit/query logs; Tables; Schemas

Virtualization & Cloud: Hypervisor; Guest OS, Apps; Cloud

Linux/Unix: Configurations; syslog; File system; ps, iostat, top

Windows: Registry; Event logs; File system; sysinternals

Customer Facing Data: Click-stream data; Shopping cart data; Online transaction data

Outside the Datacenter: Manufacturing, logistics…; CDRs & IPDRs; Power consumption; RFID data; GPS data


Operational Intelligence for IT and Business Users

Web Intelligence | Application Management | Business Analytics | Security & Compliance | IT Operations Management

Users: LOB Owners/Executives, Customer Support, System Administrator, Operations Teams, Security Analysts, IT Executives, Development Teams, Auditors, Website/Business Analysts


The Technical Part


Splunk Has Four Primary Functions

• Searching and Reporting (Search Head)

• Indexing and Search Services (Indexer)

• Local and Distributed Management (Deployment Server)

• Data Collection and Forwarding (Forwarder)

A Splunk install can be one or all roles…


Scalability to Tens of TBs/Day on Commodity Servers

Send data from thousands of servers using a combination of Splunk Forwarders, syslog, WMI, message queues, or other remote protocols

Auto load-balanced forwarding to as many Splunk Indexers as you need to index terabytes/day

Offload search load to Splunk Search Heads
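The load-balanced forwarding step can be illustrated with a short Python sketch. The slide does not describe the balancing policy Splunk Forwarders use, so simple round-robin is swapped in here as a stand-in. The class name and indexer names are hypothetical.

```python
from collections import Counter
from itertools import cycle


class RoundRobinForwarder:
    """Toy forwarder that spreads events evenly over a list of indexers.

    Round-robin is an assumption made for illustration only; it is not
    claimed to be the policy a real Splunk Forwarder applies.
    """

    def __init__(self, indexers):
        if not indexers:
            raise ValueError("at least one indexer is required")
        self._targets = cycle(indexers)

    def forward(self, event):
        # Pick the next indexer in rotation and pair it with the event.
        target = next(self._targets)
        return target, event


forwarder = RoundRobinForwarder(["indexer-a", "indexer-b", "indexer-c"])
assignments = [forwarder.forward(f"event {i}")[0] for i in range(9)]
load = Counter(assignments)
print(load)
```

Adding another name to the list is all it takes to spread the same stream over more indexers, which mirrors the "as many Splunk Indexers as you need" point on the slide.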


Analyzing Heterogeneous Data

Universal Indexing | Late Structure Binding | Analysis and Visualization

No data normalization

Automatically handles timestamps

Parsers not required

Index every term & pattern “blindly”

No attempt to “understand” up front

Normalization as it’s needed

Faster implementation

Easy search language

Multiple views into the same data

Knowledge applied at search-time

No brittle schema to work around

Multiple views into the same data

Find transactions, patterns and trends

Rapid time-to-deploy: hours or days
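The contrast between indexing every term "blindly" and applying knowledge at search time can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Splunk's index format or search language. The sample events, the key=value layout and all function names are invented for illustration.

```python
import re
from collections import defaultdict

# Hypothetical raw events; no schema is declared for them anywhere.
raw_events = [
    "2012-03-06 10:15:02 action=purchase status=200 user=alice",
    "2012-03-06 10:15:07 action=purchase status=503 user=bob",
    "2012-03-06 10:16:11 action=view status=200 user=alice",
]


def build_index(events):
    """Universal indexing: record every token without interpreting it."""
    index = defaultdict(set)
    for event_id, text in enumerate(events):
        for token in re.findall(r"[A-Za-z0-9_]+", text.lower()):
            index[token].add(event_id)
    return index


def search(index, events, *terms):
    """Return the events that contain all of the given terms."""
    if not terms:
        return []
    ids = set.intersection(*(index.get(t.lower(), set()) for t in terms))
    return [events[i] for i in sorted(ids)]


def extract_fields(text):
    """Late structure binding: derive key=value fields only when asked."""
    return dict(re.findall(r"(\w+)=(\S+)", text))


index = build_index(raw_events)
hits = search(index, raw_events, "purchase")
failed_users = [
    extract_fields(e)["user"] for e in hits if extract_fields(e)["status"] != "200"
]
print(failed_users)
```

Because fields are extracted only when a search runs, a different extraction rule can be applied to the same stored events later, which is one reading of "multiple views into the same data".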


Real-time Analytics

[Diagram components: Data; Monitor Input, TCP/UDP Input, Scripted Input; Parsing Queue; Parsing Pipeline; Index Queue; Indexing Pipeline; Real-time Buffer; Real-time Search Process (shown twice); Splunk Index (Raw data, Index Files)]

Parsing Pipeline

• Source, event typing

• Character set normalization

• Line breaking

• Timestamp identification

• Regex transforms
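Three of the parsing stages listed above (line breaking, timestamp identification and regex transforms) can be sketched in Python as follows. This is a simplified illustration, not Splunk's parsing pipeline. The timestamp format, the card-masking transform, the sample log text and every function name are assumptions, and character set normalization is left out.

```python
import re
from datetime import datetime

# Assumed timestamp layout for this sketch only.
TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")
# Hypothetical transform: mask all but the last four digits of a 16-digit number.
MASK_CARD = (re.compile(r"\b\d{12}(\d{4})\b"), r"XXXXXXXXXXXX\1")


def break_lines(chunk):
    """Line breaking: a line that starts with a timestamp begins a new event."""
    events = []
    for line in chunk.splitlines():
        if TIMESTAMP.match(line) or not events:
            events.append(line)
        else:
            events[-1] += "\n" + line  # continuation line, e.g. a stack trace
    return events


def identify_timestamp(event):
    """Timestamp identification: parse the first timestamp found, if any."""
    match = TIMESTAMP.search(event)
    return datetime.strptime(match.group(), "%Y-%m-%d %H:%M:%S") if match else None


def apply_transforms(event):
    """Regex transforms: rewrite the raw text before it is stored."""
    pattern, replacement = MASK_CARD
    return pattern.sub(replacement, event)


def parse(chunk, source):
    """Run the stages in order and tag each event with its source."""
    return [
        {"source": source, "time": identify_timestamp(e), "raw": apply_transforms(e)}
        for e in break_lines(chunk)
    ]


chunk = (
    "2012-03-06 10:15:02 ERROR payment failed card=4111111111111111\n"
    "  at billing.charge(line 42)\n"
    "2012-03-06 10:15:09 INFO retry scheduled\n"
)
parsed = parse(chunk, source="app.log")
print(len(parsed), parsed[0]["time"])
```

The three input lines become two events here because the indented line is treated as a continuation of the first one.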


Splunk and Hadoop

Splunk Hadoop Connect: Reliable Data Export, Import Hadoop Data

Splunk App for HadoopOps: End-to-end monitoring, troubleshooting, analysis of Hadoop environment

Real-time Collection and Analysis

Dashboards, Reports, Access Controls


Splunk Hadoop Connect

Delivers reliable integration between Splunk and Hadoop

Export events collected and aggregated in Splunk to HDFS

Explore and browse HDFS directories and files

Import and index data from HDFS for secure searching, reporting, analysis and visualizations in Splunk


Splunk App for HadoopOps

End-to-end monitoring and troubleshooting for Hadoop

Monitoring of entire Hadoop environment (Network, Switch, Operating System and Database)

Integrated alerting to track and respond to activities from MapReduce to the individual node in the cluster

Centralized real-time view of Hadoop nodes using intuitive heatmap display


Summary - Splunk Big Data Solution

Product-based Solution

Performance at scale

Integrated and End-to-end


Thank You