Splunk Dynamic lookup


Dynamic Lookups

Agenda

Lookups in General

Static Lookups

Dynamic Lookups
- Retrieve fields from a web site
- Retrieve fields from a database
- Retrieve fields from a persistent cache


Enrich Your Events with Fields from External Sources


Splunk: The Engine for Machine Data

• Linux/Unix: Configurations, syslog, File system, ps, iostat, top
• Windows: Registry, Event logs, File system, sysinternals
• Networking: Configurations, syslog, SNMP, netflow
• Databases: Configurations, Audit/query logs, Tables, Schemas
• Applications: Web logs, Log4J, JMS, JMX, .NET events, Code and scripts
• Virtualization & Cloud: Hypervisor, Guest OS, Apps, Cloud

Logfiles, Configs, Messages, Traps, Alerts, Metrics, Scripts, Tickets, Changes

Customer Facing Data: Click-stream data, Shopping cart data, Online transaction data

Outside the Datacenter: Manufacturing, logistics…, CDRs & IPDRs, Power consumption, RFID data, GPS data


Interesting Things to Lookup

• User’s Mailing Address
• Error Code Descriptions
• Product Names
• Stock Symbol (from CUSIP)
• External Host Address
• Database Query
• Web Service Call for Status
• Geo Location

Other Reasons For Lookup

• Work around a developer or vendor that does not enrich its logs
• Imaginative correlations
• Example: A website URL with “Like” or “Dislike” count stored in external source
• Make your data more interesting
• Better to see textual descriptions than arcane codes


Static vs. Dynamic Lookup

Static: External data comes from a CSV file

Dynamic: External data comes from the output of an external script, which resembles a CSV file

Static Lookup Review

• Pick the input fields that will be used to get output fields
• Create or locate a CSV file that has all the fields you need in the proper order
• Tell Splunk via the Manager about your CSV file and your lookup
• You can also define lookups manually via props.conf and transforms.conf
• If you use automatic lookups, they will run every time the source, sourcetype or associated host stanza is used in a search
• Non-automatic lookups run only when the lookup command is invoked in the search

Example Static Lookup Conf Files

props.conf

[access_combined]
lookup_http = http_status status OUTPUT status_description, status_type

transforms.conf

[http_status]
filename = http_status.csv
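
As a rough illustration of what the http_status lookup above does logically, the following Python 3 sketch joins an event to a CSV row on the status field and copies the output fields over. The CSV rows, the sample event, and the helper names load_table and enrich are assumptions made for this example; Splunk performs the matching internally and does not expose these functions.

```python
import csv
import io

# Hypothetical contents of http_status.csv (rows are illustrative only).
HTTP_STATUS_CSV = """status,status_description,status_type
200,OK,Successful
404,Not Found,Client Error
"""


def load_table(csv_text, key_field):
    """Index the CSV rows by the input (match) field."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[key_field]: row for row in reader}


def enrich(event, table, key_field, output_fields):
    """Return a copy of the event with the output fields added on a match."""
    enriched = dict(event)
    row = table.get(event.get(key_field, ""))
    if row is not None:
        for field in output_fields:
            enriched[field] = row[field]
    return enriched


table = load_table(HTTP_STATUS_CSV, "status")
event = {"clientip": "10.1.1.1", "status": "404"}  # made-up event
print(enrich(event, table, "status", ["status_description", "status_type"]))
```

An event whose status has no row in the table comes back unchanged, which mirrors how a lookup adds nothing when there is no match.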

Permissions

Define lookups via Splunk Manager and set permissions there

local.meta

[lookups/http_status.csv]
export = system

[transforms/http_status]
export = system

Example Automatic Static Lookup


Dynamic Lookups


• Write the script to simulate access to external source

• Test the script with one set of inputs

• Create the Splunk Version of the lookup script

• Register the script with Splunk via Manager or conf files

• Test the script explicitly before using automatic lookups

Lookups vs Custom Command


• Use dynamic lookups when returning fields given input fields

• Standard use case for users who already are familiar with lookups

• Use a custom command when doing MORE than a lookup

• Not all use cases involve just returning fields

• Decrypt event data

• Translate event data from one format to another with new fields (e.g. FIX)

Write/Test External Field Gathering Script

[Diagram] Your Python Script <-> External Data in Cloud

Send: Input Fields

Return: Output Fields

Example Script to Test External Lookup

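# Given a host, find the corresponding IP addresses
import socket

def mylookup(host):
    try:
        # gethostbyname_ex returns (hostname, aliaslist, ipaddrlist)
        hostname, aliaslist, ipaddrlist = socket.gethostbyname_ex(host)
        return ipaddrlist
    except Exception:
        return []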

External Field Gathering Script with Splunk

[Diagram] Your Python Script <-> External Data in Cloud

Return: Output Fields

Script for Splunk Simulates Reading Input CSV

hostname, ip
a.b.c.com
zorrosty.com
seemanny.com

Output of Script Returns Logically Complete CSV

hostname, ip
a.b.c.com, 1.2.3.4
zorrosty.com, 192.168.1.10
seemanny.com, 10.10.2.10

transforms.conf for Dynamic Lookup

[NameofLookup]
external_cmd = <name>.py field1 … fieldN
external_type = python
fields_list = field1, …, fieldN

Example Dynamic Lookup conf files

transforms.conf

# Note: this is an explicit lookup
[whoisLookup]
external_cmd = whois_lookup.py ip whois
external_type = python
fields_list = ip, whois

Dynamic Lookup Python Flow

def lookup(input):
    Perform external lookup based on input.
    Return result

main()
    Check standard input for CSV headers.
    Write headers to standard output.
    For each line in standard input (input fields):
        Gather input fields into a dictionary (key-value structure)
        ret = lookup(input fields)
        If ret:
            Send to standard output input values and return values from lookup
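
The flow above can be sketched as a runnable skeleton. This is a minimal Python 3 version (the slides themselves use Python 2) and it is not the whois add-on: run_lookup, the stub lookup function, and the sample host data are hypothetical names and values chosen for illustration.

```python
import csv
import io
import sys


def lookup(value):
    """Stand-in for the external call; a real script would query a web
    site, database, or cache here. The data below is made up."""
    fake_source = {"a.b.c.com": "1.2.3.4"}
    return fake_source.get(value, "")


def run_lookup(infile, outfile, in_field, out_field):
    """Read CSV rows, fill in out_field from in_field, write CSV rows."""
    reader = csv.DictReader(infile)
    header = reader.fieldnames or []
    if in_field not in header or out_field not in header:
        raise ValueError("input and output fields must exist in the CSV header")

    writer = csv.DictWriter(outfile, fieldnames=header, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Only call the external source when the output field is still empty.
        if row[in_field] and not row[out_field]:
            row[out_field] = lookup(row[in_field])
        # Rows that remain unresolved are dropped, as in the flow above.
        if row[out_field]:
            writer.writerow(row)


if __name__ == "__main__":
    # Simulate the CSV that would arrive on standard input.
    sample = io.StringIO("hostname,ip\na.b.c.com,\nunknown.example,\n")
    run_lookup(sample, sys.stdout, "hostname", "ip")
```

Feeding the function a small in-memory CSV like this is a quick way to test the script with one set of inputs before registering it with Splunk.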

Whois Lookup

import csv
import sys

def main():
    if len(sys.argv) != 3:
        print("Usage: python whois_lookup.py [ip field] [whois field]")
        sys.exit(0)
    ipf = sys.argv[1]
    whoisf = sys.argv[2]
    r = csv.reader(sys.stdin)
    w = None
    header = []
    first = True
    …

Whois Lookup (cont.) to Read CSV Header

    # First read the "CSV header" and output the field names
    for line in r:
        if first:
            header = line
            if whoisf not in header or ipf not in header:
                print("IP and whois fields must exist in CSV data")
                sys.exit(0)
            csv.writer(sys.stdout).writerow(header)
            w = csv.DictWriter(sys.stdout, header)
            first = False
            continue
        …

Whois Lookup (cont.) to Populate Input Fields

        # Read the result and populate the values for the input fields
        # (ip address in our case)
        result = {}
        i = 0
        while i < len(header):
            if i < len(line):
                result[header[i]] = line[i]
            else:
                result[header[i]] = ''
            i += 1

Whois Lookup (cont.) to Populate Input Fields

        # Perform the whois lookup if necessary
        if len(result[ipf]) and len(result[whoisf]):
            w.writerow(result)
        # Else call external website to get whois field from the ip address as the key
        elif len(result[ipf]):
            result[whoisf] = lookup(result[ipf])
            if len(result[whoisf]):
                w.writerow(result)

Whois Lookup Function

import urllib

LOCATION_URL = "http://some.url.com?query="

# Given an ip, return the whois response
def lookup(ip):
    try:
        # urllib.urlopen is the Python 2 API (urllib.request.urlopen in Python 3)
        whois_ret = urllib.urlopen(LOCATION_URL + ip)
        lines = whois_ret.readlines()
        return lines
    except:
        return ''
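
For readers on Python 3, where urllib.urlopen no longer exists, a comparable function might look like the sketch below. It keeps the same placeholder LOCATION_URL; the fetch_lines helper, the five-second timeout, and the choice to join the response lines into one string are assumptions made for this example, not part of the add-on.

```python
import urllib.error
import urllib.parse
import urllib.request

LOCATION_URL = "http://some.url.com?query="  # placeholder URL from the slide


def fetch_lines(url, timeout=5.0):
    """Fetch a URL and return its body as a list of decoded text lines."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return [raw.decode("utf-8", "replace").rstrip("\r\n") for raw in resp]


def lookup(ip, fetch=fetch_lines):
    """Given an ip, return the whois response as one string ('' on failure).

    The fetch parameter exists only so the function can be exercised
    without network access.
    """
    try:
        lines = fetch(LOCATION_URL + urllib.parse.quote(ip))
    except (urllib.error.URLError, OSError, ValueError):
        return ""
    return " ".join(lines)


# Exercise the function with a fake fetcher instead of a real web call.
print(lookup("1.2.3.4", fetch=lambda url: ["OrgName: Example", "Country: ZZ"]))
```

Returning a single string keeps the value on one CSV row, and the timeout keeps one slow web call from stalling the whole lookup.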

Database Lookup


• Acquire proper modules to connect to the database

• Connect and authenticate to database

• Use a connection pool if possible

• Have lookup function query the database

• Return a list([]) of results

Database Lookup vs. Database Sent To Index

• Well, it depends…
• Use a lookup when:
  - Using needle-in-the-haystack searches with a few users
  - Using form searches returning few results
• Index the database table or view when:
  - Having LOTS of users and ad hoc reporting is needed
  - It’s OK to have “stale” data (N minutes old) for a dynamic database

Example Database Lookup using MySQL

import MySQLdb

# First connect to DB outside of the for loop
conn = MySQLdb.connect(host="localhost", user="name of user", passwd="password", db="Name of DB")
cursor = conn.cursor()

Example Database Lookup (cont.) using MySQL

import MySQLdb
…
# Given a city, find its country
def lookup(city, cur):
    try:
        # Pass the city as a query parameter rather than concatenating it
        selString = "SELECT country FROM city_country WHERE city = %s"
        cur.execute(selString, (city,))
        row = cur.fetchone()
        return row[0]
    except:
        return []
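
The database lookup can be tried end to end without a MySQL server. The sketch below (Python 3) swaps in the built-in sqlite3 module as a stand-in, so the placeholder is ? rather than the %s that MySQLdb uses, and it passes the city as a query parameter. The city_country rows are made-up sample data.

```python
import sqlite3


def lookup(city, cur):
    """Given a city, return its country, or [] when there is no match."""
    try:
        cur.execute("SELECT country FROM city_country WHERE city = ?", (city,))
        row = cur.fetchone()
        return row[0] if row else []
    except sqlite3.Error:
        return []


# Connect once, outside the per-row loop, as recommended above.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE city_country (city TEXT PRIMARY KEY, country TEXT)")
cur.executemany(
    "INSERT INTO city_country VALUES (?, ?)",
    [("Paris", "France"), ("Lima", "Peru")],  # illustrative rows only
)
conn.commit()

# The last input shows that a quote in the value is treated as plain data.
for city in ["Paris", "Lima", "Atlantis", "x' OR '1'='1"]:
    print(city, "->", lookup(city, cur))
```

Because the value never becomes part of the SQL text, field values taken from events cannot change the meaning of the query.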

Lookup Using Key Value Persistent Cache

• Download and install Redis
• Download and install Redis Python module
• Import Redis module in Python and populate key value DB
• Import Redis module in lookup function given to Splunk to look up a value given a key

Redis is an open source, advanced key-value store.

Redis Lookup

import sys

### CHANGE PATH according to your Redis install ###
sys.path.append("/Library/Python/2.6/…/redis-2.4.5-py.egg")

import redis

def main():
    # Connect to Redis; change for your distribution
    pool = redis.ConnectionPool(host='localhost', port=6379, db=0)
    redp = redis.Redis(connection_pool=pool)

Redis Lookup (cont.)

def lookup(redp, mykey):
    try:
        return redp.get(mykey)
    except:
        return ""

Combine Persistent Cache with External Lookup

• For data that is “relatively static”
• First see if the data is in the persistent cache
• If not, look it up in the external source such as a database or web service
• If results come back, add results to the persistent cache and return results
• For data that changes often, you will need to create your own cache retention policies
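
One possible retention policy is a time-to-live (TTL) on each cached entry. The sketch below (Python 3) uses a small in-memory class as a stand-in for Redis so that it runs without a server; TTLCache, cached_lookup, and the injected clock are hypothetical names for this example. Redis itself can expire keys natively, which may make a hand-rolled policy unnecessary.

```python
import time


class TTLCache:
    """Tiny in-memory stand-in for a persistent cache such as Redis."""

    def __init__(self, ttl_seconds, clock=time.time):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self.store = {}

    def get(self, key):
        entry = self.store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self.store[key]  # entry is stale: evict and report a miss
            return None
        return value

    def set(self, key, value):
        self.store[key] = (value, self.clock() + self.ttl)


def cached_lookup(cache, key, external_lookup):
    """Check the cache first; on a miss, call the external source and store the result."""
    value = cache.get(key)
    if value is not None:
        return value
    value = external_lookup(key)
    if value:  # only cache non-empty results
        cache.set(key, value)
    return value


cache = TTLCache(ttl_seconds=300)
print(cached_lookup(cache, "1.2.3.4", lambda ip: "whois data for " + ip))
```

A short TTL suits data that changes often, while "relatively static" data can use a long one or none at all.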

Combining Redis with Whois Lookup

def lookup(redp, ip):
    try:
        ret = redp.get(ip)
        if ret is not None and ret != '':
            return ret
        else:
            whois_ret = urllib.urlopen(LOCATION_URL + ip)
            lines = whois_ret.readlines()
            # readlines() returns a list, so test for emptiness directly
            if lines:
                redp.set(ip, lines)
            return lines
    …
    except:

Where do I get the add-ons from today? Splunkbase!

Add-On (Release): Download Location
• Whois (4.x): http://splunk-base.splunk.com/apps/22381/whois-add-on
• DBLookup (4.x): http://splunk-base.splunk.com/apps/22394/example-lookup-using-a-database
• Redis Lookup (4.x): http://splunk-base.splunk.com/apps/27106/redis-lookup
• Geo IP Lookup (not in these slides) (4.x): http://splunk-base.splunk.com/apps/22282/geo-location-lookup-script-powered-by-maxmind

Conclusion

Lookups are a powerful way to enhance your search experience beyond indexing the data.

Thank You
