SLA Compliance Assurance

Charles Wheelus
Senior Data Scientist, Cequint

Splunk .conf 2013
October 2nd, 2013

Thursday, October 3, 13

About me: Charles Wheelus, MSCS

• Senior Data Scientist, Cequint
• Ph.D. Candidate, Florida Atlantic University; research interests: Data Mining and Machine Learning
• 2012 Splunk Ninja Revolution award recipient
• Splunk Certified Architect
• Technology consultant for 20 years
• Splunk user and evangelist for three years
• Started with version 4.3

About Cequint

Cequint provides handset and carrier data services to most major wireless carriers in the U.S.

http://cequint.com

Service Level Agreement (SLA) Compliance Assurance

...or

How to kill a flock of birds with one stone

Disclaimer: No birds were injured during the production of this presentation. :)

SLA Compliance (on a Wireless Carrier network)

The project:

Develop a system that provides proof of our SLA compliance with our carrier customer

Time is of the essence!

What is the significance of a Service Level Agreement?

degraded performance + extended period → unhappy customer

happy customer

Step 1: Determine the Key Performance Indicators

What are Key Performance Indicators (KPI)?

Metrics used to evaluate factors that are critical to the optimal performance of an organization, project, or system

Challenges

• Numerous subsystems
• Different development teams
• Different programming languages
• Different operating systems
• Wide variety of hardware types

The “Cat Herder”

Determine what data to get

Study the SLA

Engage others in the process

• Developers
• Management
• Product team
• Operations

Determine the best place(s) to get the data from

Establish best practice for data input

What simple step can you take in the beginning that will save time later?

Best practices document

Verify the data is in the expected format!
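Verifying the format can be automated before any data reaches Splunk. The following is a minimal sketch, assuming a hypothetical best-practices convention of an ISO 8601 timestamp followed by space-separated key=value pairs. The convention itself and the required field names `kpi` and `value` are illustrative assumptions and do not come from the original deck.

```python
import re
from datetime import datetime

# Hypothetical convention: "<ISO 8601 timestamp> key=value key=value ..."
PAIR_RE = re.compile(r'(\w+)=("[^"]*"|\S+)')
REQUIRED_FIELDS = {"kpi", "value"}  # assumed field names, not from the deck


def parse_kpi_line(line):
    """Return a dict of fields, or raise ValueError if the line is malformed."""
    timestamp, _, rest = line.strip().partition(" ")
    try:
        parsed_time = datetime.fromisoformat(timestamp)
    except ValueError as exc:
        raise ValueError(f"bad timestamp: {timestamp!r}") from exc
    # Quoted values may contain spaces; the quotes are stripped afterwards.
    fields = {key: val.strip('"') for key, val in PAIR_RE.findall(rest)}
    missing = REQUIRED_FIELDS - fields.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    float(fields["value"])  # raises ValueError when the value is not numeric
    fields["_time"] = parsed_time
    return fields
```

A check like this could run in each team's test suite, so that a malformed line is caught by the team that produces it rather than at report time.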

Determine transport method for getting the data into Splunk

• syslog (UDP)
• Universal Forwarder
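For the syslog route, a subsystem only needs to emit one UDP datagram per event. The sketch below shows the idea with a simplified syslog-style line (priority, host, tag, message). The server address, host name, tag, and choice of the local0 facility are assumptions for illustration, and UDP gives no delivery guarantee, so this suits data where an occasional lost event is tolerable.

```python
import socket

FACILITY_LOCAL0 = 16   # syslog facility code for local0
SEVERITY_INFO = 6      # syslog severity code for informational


def format_syslog(message, hostname, app="kpi"):
    """Build a simplified syslog-style line: <PRI>hostname app: message."""
    priority = FACILITY_LOCAL0 * 8 + SEVERITY_INFO
    return f"<{priority}>{hostname} {app}: {message}"


def send_kpi(message, server=("127.0.0.1", 514), hostname="gw01"):
    """Send one KPI event over UDP; delivery is not acknowledged."""
    payload = format_syslog(message, hostname).encode("utf-8")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, server)
    return payload
```

Where loss is not acceptable, writing the same line to a local file and letting a Universal Forwarder ship it is the other option named on the slide.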

[Figure: KPI data flow diagram]

SLA report (RECAP):

• Establish KPI
• Get KPI data into Splunk
• KPI counter aggregation and reconciliation
• Use Splunk REST API to build the report
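The aggregation and reconciliation step can be pictured as follows. This is a minimal sketch under stated assumptions: each subsystem reports per-interval counters, and two adjacent subsystems should agree on how many requests passed between them. The subsystem names, counter names, and tolerance are hypothetical, and in the deck this work is done inside Splunk rather than in Python.

```python
from collections import defaultdict


def aggregate(events):
    """Sum counter values per (subsystem, counter) across all events."""
    totals = defaultdict(int)
    for event in events:
        totals[(event["subsystem"], event["counter"])] += event["count"]
    return dict(totals)


def reconcile(totals, upstream, downstream, tolerance=0.01):
    """Compare what one subsystem sent with what the next one received.

    Returns (difference, within_tolerance). The tolerance is a fraction of
    the upstream total and is an assumed value, not one from the deck.
    """
    sent = totals.get(upstream, 0)
    received = totals.get(downstream, 0)
    difference = sent - received
    allowed = sent * tolerance
    return difference, abs(difference) <= allowed
```

A difference outside the tolerance suggests lost events or a miscounting subsystem, which is worth resolving before the totals go into an SLA report.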


“Black-box” testing

The problem:

Performance information about the Carrier’s self-provisioning gateway is unavailable. We have to run our own tests to determine the expected performance.

Time is of the essence!
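One way to establish a baseline for an opaque service is to time repeated calls and record the outcomes. The sketch below assumes the gateway can be exercised through some callable; the `call` argument stands in for whatever client the test harness actually uses, and nothing here reflects the carrier's real interface.

```python
import statistics
import time


def probe(call, attempts=20, pause=0.0):
    """Time repeated calls to an opaque service and record the outcomes."""
    samples = []
    for _ in range(attempts):
        start = time.perf_counter()
        try:
            call()
            ok = True
        except Exception:
            ok = False
        samples.append((time.perf_counter() - start, ok))
        time.sleep(pause)
    return samples


def summarize(samples):
    """Reduce probe samples to the figures a baseline needs."""
    latencies = sorted(elapsed for elapsed, _ in samples)
    failures = sum(1 for _, ok in samples if not ok)
    return {
        "attempts": len(samples),
        "failures": failures,
        "median_s": statistics.median(latencies),
        "max_s": latencies[-1],
    }
```

Logging each sample as an event, instead of only the summary, would let the same data be charted and re-analyzed later in Splunk.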


Load test results analysis

The problem:

We need a quick way to evaluate the results of load testing.

Time is of the essence!
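A quick evaluation usually comes down to a few figures per load level. The following sketch groups results by load and reports request count, error rate, and a nearest-rank 95th percentile latency. The record fields `load`, `latency_ms`, and `status` are assumed names for illustration; the deck does not describe the actual result format.

```python
import math
from collections import defaultdict


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list (0 < pct <= 100)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize_by_load(results):
    """Group load-test results by load level and summarize each group."""
    groups = defaultdict(list)
    for row in results:
        groups[row["load"]].append(row)
    summary = {}
    for load, rows in sorted(groups.items()):
        latencies = [r["latency_ms"] for r in rows]
        errors = sum(1 for r in rows if r["status"] != "ok")
        summary[load] = {
            "requests": len(rows),
            "error_rate": errors / len(rows),
            "p95_ms": percentile(latencies, 95),
        }
    return summary
```

Reading the summary from the lowest load level upward shows where latency or error rate starts to climb.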


Event Reporting

The problem:

Thousands of subsystem events may be written to the log files, and some events are interdependent. We need a comprehensive and robust system for detecting, correlating, and reporting these events to the correct development team.

Time is of the essence!

Event Reporting

The solution:

Splunk saved and scheduled searches!

With very brief training, the developers are building, saving, and scheduling their own queries
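In the deck this logic lives in Splunk saved searches written by the developers. Purely to illustrate the detect, correlate, and route steps, here is a small Python sketch. It assumes events carry a shared transaction id and a subsystem name, and that the earliest error in a transaction points at the owning team; the field names and the `OWNERS` mapping are hypothetical.

```python
from collections import defaultdict

# Hypothetical mapping from subsystem to owning team.
OWNERS = {"gateway": "team-gateway", "lookup": "team-lookup"}


def correlate(events):
    """Group error events that share a transaction id."""
    by_txn = defaultdict(list)
    for event in events:
        if event["level"] == "ERROR":
            by_txn[event["txn"]].append(event)
    return by_txn


def route(events):
    """Build one report per team, listing correlated errors it owns.

    The earliest error in each transaction is treated as the likely
    origin, so only that subsystem's team is notified.
    """
    reports = defaultdict(list)
    for txn, group in correlate(events).items():
        first = min(group, key=lambda e: e["time"])
        team = OWNERS.get(first["subsystem"], "unassigned")
        reports[team].append({"txn": txn, "related": len(group)})
    return dict(reports)
```

Routing on the earliest error is one possible rule; a team may prefer to be notified of every error in its own subsystem, in which case the grouping key changes but the structure stays the same.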

Event Monitoring and Alarming

The problem:

The operations team requires that the KPIs feed alarm output into its pre-existing monitoring and alarm system

Time is of the essence!

Event Monitoring and Alarming

• Operations has pre-existing alarming software
• Splunk was connected to OPS alarm system using the Splunk API
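The deck does not say which API calls were used. As one plausible shape for such a bridge, the sketch below prepares a request to Splunk's REST search export endpoint (`/services/search/jobs/export` on the management port) and converts the streamed JSON result lines into alarm records. The host, credentials, search string, result field names `kpi` and `value`, and the "alarm when below threshold" rule are all assumptions for illustration.

```python
import base64
import json
import urllib.parse
import urllib.request


def build_export_request(base_url, search, username, password):
    """Prepare a POST to Splunk's search export endpoint (not sent here)."""
    body = urllib.parse.urlencode({"search": search, "output_mode": "json"})
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return urllib.request.Request(
        f"{base_url}/services/search/jobs/export",
        data=body.encode(),
        headers={"Authorization": f"Basic {token}"},
        method="POST",
    )


def alarms_from_export(lines, threshold):
    """Turn streamed JSON result lines into alarm records.

    Each line is expected to hold a "result" object; the field names
    "kpi" and "value" are assumptions about the search output.
    """
    alarms = []
    for line in lines:
        if not line.strip():
            continue
        result = json.loads(line).get("result")
        if result and float(result["value"]) < threshold:
            alarms.append({"kpi": result["kpi"], "value": float(result["value"])})
    return alarms
```

In use, the request would be passed to `urllib.request.urlopen`, the response read line by line, and each alarm record handed to the operations system in whatever format it expects. Note that a search string sent through the REST API begins with the `search` command, for example `search index=kpi | stats avg(value) as value by kpi`, where the index name is hypothetical.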

Performance Analysis

The problem:

The entire team needs to have up-to-the-minute business intelligence.

Time is of the essence!

Performance Analysis

The answer:

Splunk Dashboards and Apps!

Performance Analysis

• Customized tools for Developers
• Dashboards for Operations
• Troubleshooting for Developers and Operations
• Business Intelligence for Management

Cut to the chase

Splunk’s greatest benefits:

• Time savings
• Ability to react quickly (SPL)
• Real-time analytics
• Rapid dashboard production

What’s next?

• Deeper analytics
• New metrics & dashboards
• Modular inputs
• More use of Splunk Apps
