
SplunkLive! Presentation - Data Onboarding with Splunk


Page 1: SplunkLive! Presentation - Data Onboarding with Splunk

Copyright © 2014 Splunk Inc.

Data Onboarding: Ingestion Without the Indigestion

David Millis, Client Architect

Page 2: SplunkLive! Presentation - Data Onboarding with Splunk

2

Legal NoticeDuring the course of this presentation, we may make forward looking statements regarding future events or the expected performance of the company. We caution you that such statements reflect our current expectations and estimates based on factors currently known to us and that actual events or results could differ materially. For important factors that may cause actual results to differ from those contained in our forward-looking statements, please review our filings with the SEC.  The forward-looking statements made in this presentation are being made as of the time and date of its live presentation. If reviewed after its live presentation, this presentation may not contain current or accurate information.  We do not assume any obligation to update any forward looking statements we may make.  In addition, any information about our roadmap outlines our general product direction and is subject to change at any time without notice. It is for informational purposes only and shall not be incorporated into any contract or other commitment. Splunk undertakes no obligation either to develop the features or functionality described or to include any such feature or functionality in a future release.

Splunk®, Splunk>®, Listen to Your Data®, The Engine for Machine Data®, Hunk™, Splunk Cloud™, Splunk Storm® and SPL™ are registered trademarks or trademarks of Splunk Inc. in the United States and/or other countries. All other brand names, product names or trademarks belong to their respective owners. © 2014 Splunk Inc. All rights reserved.

Page 3: SplunkLive! Presentation - Data Onboarding with Splunk

• Systematic way to bring new data sources into Splunk

• Make sure that new data is instantly usable & has maximum value for users

• Goes hand-in-hand with the User Onboarding process (sold separately)

What is the Data Onboarding Process?

Page 4: SplunkLive! Presentation - Data Onboarding with Splunk

Know Your (Data) Pipeline

Page 5: SplunkLive! Presentation - Data Onboarding with Splunk

The Data Pipeline

Page 6: SplunkLive! Presentation - Data Onboarding with Splunk

The Data Pipeline

Any Questions?

Page 7: SplunkLive! Presentation - Data Onboarding with Splunk

The Data Pipeline

Page 8: SplunkLive! Presentation - Data Onboarding with Splunk

• Input Processors: Monitor, FIFO, UDP, TCP, Scripted

• No events yet – just a stream of bytes

• Break data stream into 64KB blocks

• Annotate stream with metadata keys (host, source, sourcetype, index, etc.)

• Can happen on UF, HF or indexer

Inputs – Where it all starts
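As a sketch, the metadata keys mentioned above are assigned per input stanza in inputs.conf. The paths, port, index and sourcetype names here are illustrative, not prescriptive:

```ini
# inputs.conf -- assigns metadata keys at input time (UF, HF, or indexer)
[monitor:///var/log/secure]
sourcetype = linux_secure
index = os_nix

# a network input; connection_host controls how the host key is derived
[tcp://:5140]
sourcetype = syslog
index = network
connection_host = dns
```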

Page 9: SplunkLive! Presentation - Data Onboarding with Splunk

• Check character set

• Break lines

• Process headers

• Can happen on HF or indexer

Parsing Queue

Page 10: SplunkLive! Presentation - Data Onboarding with Splunk

• Merge lines for multi-line events

• Identify events (finally!)

• Extract timestamps

• Exclude events based on timestamp (MAX_DAYS_AGO, ..)

• Can happen on HF or indexer

Aggregation/Merging Queue
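The merging, timestamping, and age-exclusion steps above are controlled in props.conf. A minimal sketch for a hypothetical multi-line sourcetype (the stanza name, regex, and time format are assumptions about the event data):

```ini
# props.conf -- line merging and timestamp extraction for a hypothetical sourcetype
[my_app_log]
# merge continuation lines; a new event starts at a leading ISO date
SHOULD_LINEMERGE = true
BREAK_ONLY_BEFORE = ^\d{4}-\d{2}-\d{2}
# timestamp sits at the start of the event
TIME_PREFIX = ^
TIME_FORMAT = %Y-%m-%d %H:%M:%S %z
MAX_TIMESTAMP_LOOKAHEAD = 25
# exclude events older than 30 days
MAX_DAYS_AGO = 30
```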

Page 11: SplunkLive! Presentation - Data Onboarding with Splunk

• Do regex replacement (field extraction, punctuation extraction, event routing, host/source/sourcetype overrides)

• Annotate events with metadata keys (host, source, sourcetype, ..)

• Can happen on HF or indexer

Typing Queue

Page 12: SplunkLive! Presentation - Data Onboarding with Splunk

• Output processors: TCP, syslog, HTTP
• Index and forward
• Sign blocks
• Calculate license volume and throughput metrics
• Index
• Write to disk
• Can happen on HF or indexer

Indexing Queue

Page 13: SplunkLive! Presentation - Data Onboarding with Splunk

The Data Pipeline

Page 14: SplunkLive! Presentation - Data Onboarding with Splunk

Data Pipeline: UF & Indexer

Page 15: SplunkLive! Presentation - Data Onboarding with Splunk

Data Pipeline: HF & Indexer

Page 16: SplunkLive! Presentation - Data Onboarding with Splunk

Data Pipeline: UF, IF & Indexer

Page 17: SplunkLive! Presentation - Data Onboarding with Splunk

Data Onboarding Process

Page 18: SplunkLive! Presentation - Data Onboarding with Splunk

• Pre-board
• Build the index-time configs
• Build the search-time configs
• Create data models
• Document
• Test
• Get ready to deploy
• Bring it!
• Test & Validate

Process Overview

Page 19: SplunkLive! Presentation - Data Onboarding with Splunk

• Identify the specific sourcetype(s) – onboard each separately
• Check for pre-existing app/TA on splunk.com – don't reinvent the wheel!
• Gather info
  • Where does this data originate/reside? How will Splunk collect it?
  • Which users/groups will need access to this data? Access controls?
  • Determine the indexing volume and data retention requirements
  • Will this data need to drive existing dashboards (ES, PCI, etc.)?
  • Who is the SME for this data?
• Map it out
  • Get a "big enough" sample of the event data
  • Identify and map out fields
  • Assign sourcetype and TA names according to CIM conventions

Pre-Board

Page 20: SplunkLive! Presentation - Data Onboarding with Splunk

• The Common Information Model (CIM) defines relationships in the underlying data, while leaving the raw machine data intact

• A naming convention for fields, eventtypes & tags

• More advanced reporting and correlation requires that the data be normalized, categorized, and parsed

• CIM-compliant data sources can drive CIM-based dashboards (ES, PCI, others)

Tangent: What is the CIM and why should I care?

Page 21: SplunkLive! Presentation - Data Onboarding with Splunk

• Identify necessary configs (inputs, props and transforms) to properly handle:
  • timestamp extraction, timezone, event breaking, sourcetype/host/source assignments

• Do events contain sensitive data (e.g., PII, PAN)? Create masking transforms if necessary

• Package all index-time configs into the TA

Build the Index-time configs
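A masking transform might be sketched as follows. The sourcetype, field name, and regex are hypothetical; adapt them to the actual event format:

```ini
# props.conf -- route the sourcetype through a masking transform
[my_payment_log]
TRANSFORMS-mask_pan = mask_pan
```

```ini
# transforms.conf -- rewrite _raw, masking all but the last 4 digits of the PAN
[mask_pan]
REGEX = (?ms)^(.*card=)\d{12}(\d{4}.*)$
FORMAT = $1XXXXXXXXXXXX$2
DEST_KEY = _raw
```

Because DEST_KEY is _raw, the masked form is what gets indexed; the original digits never reach disk.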

Page 22: SplunkLive! Presentation - Data Onboarding with Splunk

• Assign sourcetype according to event format; events with similar format should have the same sourcetype

• When do I need a separate index?
  • When the data volume will be very large, or when it will be searched exclusively a lot
  • When access to the data needs to be controlled
  • When the data requires a specific data retention policy

• Resist the temptation to create lots of indexes

Tangent: Best & Worst Practices
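When a separate index is justified, its retention policy is set per index in indexes.conf. A sketch, with an illustrative index name and sizes:

```ini
# indexes.conf -- a dedicated index with its own retention policy
[firewall]
homePath   = $SPLUNK_DB/firewall/db
coldPath   = $SPLUNK_DB/firewall/colddb
thawedPath = $SPLUNK_DB/firewall/thaweddb
# ~90 days, after which buckets are frozen (deleted or archived)
frozenTimePeriodInSecs = 7776000
maxTotalDataSizeMB = 500000
```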

Page 23: SplunkLive! Presentation - Data Onboarding with Splunk

• Always specify a sourcetype and index

• Be as specific as possible: use /var/log/fubar.log, not /var/log/

• Arrange your monitored filesystems to minimize unnecessary monitored logfiles

• Use a scratch index while testing new inputs

Best & Worst Practices – [monitor]
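Put together, the points above might look like this in inputs.conf (the scratch index name is an assumption; swap in the real index once the input is validated):

```ini
# inputs.conf -- a specific file, with explicit sourcetype and index
[monitor:///var/log/fubar.log]
sourcetype = fubar
# test against a scratch index first, then switch to the production index
index = scratch
```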

Page 24: SplunkLive! Presentation - Data Onboarding with Splunk

• Look out for inadvertent, runaway monitor clauses

• Don't monitor thousands of files unnecessarily – that's the NSA's job

• From the CLI: splunk list monitor

• From your browser: https://your_splunkd:8089/services/admin/inputstatus/TailingProcessor:FileStatus

Best & Worst Practices – [monitor]

Page 25: SplunkLive! Presentation - Data Onboarding with Splunk

• Find & fix index-time problems BEFORE polluting your index

• A try-it-before-you-fry-it interface for figuring out

• Event breaking

• Timestamp recognition

• Timezone assignment

• Provides the necessary props.conf parameter settings

Another Tangent! Your friend, the Data Previewer

Page 26: SplunkLive! Presentation - Data Onboarding with Splunk

Data Onboarding Process, Continued

Page 27: SplunkLive! Presentation - Data Onboarding with Splunk

• Identify "interesting" events which should be tagged with an existing CIM tag (http://docs.splunk.com/Documentation/CIM/latest/User/Alerts)

• Get a list of all current tags:
  | rest splunk_server=local /services/admin/tags
  | rename tag_name as tag, field_name_value AS definition, eai:acl.app AS app
  | eval definition_and_app=definition . " (" . app . ")"
  | stats values(definition_and_app) as "definitions (app)" by tag
  | sort +tag

• Get a list of all eventtypes (with associated tags):
  | rest splunk_server=local /services/admin/eventtypes
  | rename title as eventtype, search AS definition, eai:acl.app AS app
  | table eventtype definition app tags
  | sort +eventtype

• Examine the current list of CIM tags: for each "interesting" event, identify which tags should be applied. A particular event may have multiple tags

• Are there new tags which should be created, beyond those in the current CIM tag library? If so, add them to the CIM library

Build the Search-time Configs: eventtypes & tags
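As a sketch, an "interesting" event and its CIM tags are defined in eventtypes.conf and tags.conf. The eventtype name and search here are hypothetical:

```ini
# eventtypes.conf -- define an "interesting" event as a saved search condition
[fubar_failed_login]
search = sourcetype=fubar action=failure
```

```ini
# tags.conf -- apply CIM tags to that eventtype; one event may carry several tags
[eventtype=fubar_failed_login]
authentication = enabled
failure = enabled
```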

Page 28: SplunkLive! Presentation - Data Onboarding with Splunk

• Extract "interesting" fields
  • If already in your CIM library, name or alias appropriately
  • If not already in your CIM library, name according to CIM conventions

• Add lookups for missing/desirable fields
  • Lookups may be required to supply CIM-compliant fields/field values (for example, to convert 'sev=42' to 'severity=medium')
  • Make the values more readable for humans

• Put everything into the TA package

Build the Search-time Configs: extractions & lookups
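A sketch of search-time extraction, aliasing, and the 'sev' lookup from the example above. The sourcetype, field names, lookup file, and its columns are all assumptions:

```ini
# props.conf -- search-time field handling for a hypothetical sourcetype
[fubar]
# extract a numeric severity code into 'sev'
EXTRACT-sev = sev=(?<sev>\d+)
# alias a local field name to its CIM equivalent
FIELDALIAS-cim_src = src_host AS src
# enrich events with a CIM-friendly 'severity' value
LOOKUP-severity = sev_lookup sev OUTPUT severity
```

```ini
# transforms.conf -- the lookup table mapping 'sev' codes to 'severity' labels
[sev_lookup]
filename = sev_lookup.csv
```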

Page 29: SplunkLive! Presentation - Data Onboarding with Splunk

• Create data models. What will be interesting for end users?

• Document! (Especially the fields, eventtypes & tags)

• Test
  • Does this data drive relevant existing dashboards correctly?
  • Do the data models work properly / produce correct results?
  • Is the TA packaged properly?
  • Check with originating user/group; is it OK?

Keep Going

Page 30: SplunkLive! Presentation - Data Onboarding with Splunk

• Determine additional Splunk infrastructure required; can existing infrastructure & license support this?

• Will new forwarders be required? If so, initiate CR process(es)

• Will firewall changes be required? If so, initiate CR process(es)

• Will new Splunk roles be required? Create & map to AD roles

• Will new app contexts be required? Create app(s) as necessary

• Will new users be added? Create the accounts

Get Ready to Deploy

Page 31: SplunkLive! Presentation - Data Onboarding with Splunk

• Deploy new search heads & indexers as needed

• Install new forwarders as needed

• Deploy new app & TA to search heads & indexers

• Deploy new TA to relevant forwarders

Bring it!

Page 32: SplunkLive! Presentation - Data Onboarding with Splunk

• All sources reporting?
• Event breaking, timestamp, timezone, host, source, sourcetype?
• Field extractions, aliases, lookups?
• Eventtypes, tags?
• Data model(s)?
• User access?
• Confirm with original requesting user/group: looks OK?

Test & Validate

Page 33: SplunkLive! Presentation - Data Onboarding with Splunk

Done!

Page 34: SplunkLive! Presentation - Data Onboarding with Splunk

• Bring new data sources in correctly the first time

• Reduce the amount of "bad" data in your indexes – and the time spent dealing with it

• Make the new data immediately useful to ALL users – not just the ones who originally requested it

• Allow the data to drive all sorts of dashboards without extra modifications

Gee, This Seems Like a Lot of Work…

Page 35: SplunkLive! Presentation - Data Onboarding with Splunk

• http://docs.splunk.com/Documentation/Splunk/latest/Deploy/Datapipeline

• http://wiki.splunk.com/Community:HowIndexingWorks

• http://wiki.splunk.com/Where_do_I_configure_my_Splunk_settings

• http://docs.splunk.com/Documentation/CIM/latest/User/Overview

• http://docs.splunk.com/Documentation/CIM/latest/User/Alerts

• http://splunk-base.splunk.com/apps/29008/sos-splunk-on-splunk

Reference

Page 36: SplunkLive! Presentation - Data Onboarding with Splunk

Copyright © 2014 Splunk Inc.

Thank You!

David [email protected]