Mining Your Logs: Gaining Insight Through Visualization. Google TechTalk, March 2011. Raffael Marty - @zrlram

Mining Your Logs - Gaining Insight Through Visualization


DESCRIPTION

In this two-part presentation we explore log analysis and log visualization. We look at the history of log analysis: where it stands today, what tools are available to process logs, what is working, and, more importantly, what is not. What will the future bring? Do our current approaches hold up under future requirements? We discuss a number of issues and try to figure out how to address them. By looking at various log analysis challenges, we explore how visualization can help address a number of them, keeping in mind that log visualization is not just a science, but also an art. We apply a security lens to look at a number of use-cases in the area of security visualization. From there we discuss what else is needed in the area of visualization, where the challenges lie, and where we should continue putting our research and development efforts.


Page 1: Mining Your Logs - Gaining Insight Through Visualization

Mining Your Logs: Gaining Insight Through Visualization

Google TechTalk March 2011

Raffael Marty - @zrlram

Page 2: Mining Your Logs - Gaining Insight Through Visualization

© by Raffael Marty | Logging as a Service

Raffael Marty

• Founder @
• Chief Security Strategist and Product Manager @ Splunk
• Manager Solutions @ ArcSight
• Intrusion Detection Research @ IBM Research
• IT Security Consultant @ PriceWaterhouse Coopers

Applied Security Visualization
Publisher: Addison Wesley (August, 2008)

ISBN: 0321510100

Page 3: Mining Your Logs - Gaining Insight Through Visualization

Agenda

• Log Analysis
  - History
  - Log Architectures
  - What’s Working and What’s Not?
  - Future Needs
• Data Visualization
  - Visualization Concepts
  - Security Visualization Use-Cases

Page 4: Mining Your Logs - Gaining Insight Through Visualization

Log Analysis

10.0.20.9 - - [22/Mar/2011:10:00:52 +0000] "GET /admin/customer/customer/612/ HTTP/1.1" 200 2261 "https://logdog.loggly.org/admin/customer/customer/" "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_6; en-us) AppleWebKit/533.20.25 (KHTML, like Gecko) Version/5.0.4 Safari/533.20.27" TYhzVH8AAAEAAGOkBOQAAADA 655268

2010-12-28T18:12:10.031+00:00 frontend2-raffy syslog-ng[19600]: syslog-ng starting up; version='3.1.1'

2011-01-10T21:27:04.820+00:00 frontend2-raffy kernel: : [ 664.107313] blocked inbound IN=eth0 OUT= MAC=ff:ff:ff:ff:ff:ff:d8:30:62:5f:6a:a3:08:00 SRC=10.0.20.109 DST=10.0.20.255 LEN=180 TOS=0x00 PREC=0x00 TTL=64 ID=126 PROTO=UDP SPT=17500 DPT=17500 LEN=160
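The firewall record above carries most of its information as KEY=value pairs, which makes it a convenient first target for field extraction. The following is a minimal sketch, assuming a simple regular expression is good enough for this one line; the function name `parse_key_values` is made up for illustration and is not part of any tool mentioned in the talk.

```python
import re

# Firewall line copied from the slide above.
LINE = (
    "2011-01-10T21:27:04.820+00:00 frontend2-raffy kernel: : [ 664.107313] "
    "blocked inbound IN=eth0 OUT= "
    "MAC=ff:ff:ff:ff:ff:ff:d8:30:62:5f:6a:a3:08:00 SRC=10.0.20.109 "
    "DST=10.0.20.255 LEN=180 TOS=0x00 PREC=0x00 TTL=64 ID=126 PROTO=UDP "
    "SPT=17500 DPT=17500 LEN=160"
)

# Upper-case key, "=", then a possibly empty run of non-space characters.
KEY_VALUE = re.compile(r"\b([A-Z]+)=(\S*)")


def parse_key_values(line):
    """Return the KEY=value pairs of a line as a dict (first occurrence wins)."""
    fields = {}
    for key, value in KEY_VALUE.findall(line):
        # LEN appears twice in the sample line; keep the first value
        # instead of silently overwriting it with the second.
        fields.setdefault(key, value)
    return fields


fields = parse_key_values(LINE)
print(fields["SRC"], fields["DST"], fields["PROTO"], fields["DPT"])
```

Even this short record has an empty value (OUT=) and a repeated key (LEN), two details a naive parser gets wrong without anyone noticing.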

Page 5: Mining Your Logs - Gaining Insight Through Visualization

History
• 1980 Eric Allman develops syslogd(8)
• 1996 Intellitactics
• 1997 Tivoli Risk Manager developed by IBM Research in Zurich (later Zurich Correlation Engine, ZCE)
• 1999 - 2010 A number of log management / SIEM players enter the market (software, appliances)
• 2000 ArcSight - 2010 sold for $1.65bn to HP
• 2009 Loggly (logging as a service)

Page 6: Mining Your Logs - Gaining Insight Through Visualization

History - The Other View
• Network management (SNMP)
• IDS false positive reduction
• Security monitoring (multiple data sources)
• Unification of NOC and SOC (failed?)
• Application monitoring (moving up the stack)
  - original tools failed due to architectural constraints
  - new approaches have been presented

Page 7: Mining Your Logs - Gaining Insight Through Visualization


Log Management Today

Where are you?

Page 8: Mining Your Logs - Gaining Insight Through Visualization

Log Management Today

DIY
• grep
• Perl
• SQL

Log Management
• Open source
• Commercial

CEP and SIEM
• Open source
• Commercial

MapReduce
• Open source

Advanced Analytics
• Not log specific!

fewer tools

Page 9: Mining Your Logs - Gaining Insight Through Visualization

Open Source Tools
• graylog2
• logstash
• swatch
• tenshi
• logwatch
• OSSEC
• snare
• lasso
• lire
• LogSurfer
• SEC
• LogHound
• slct
• log2timeline
• logzilla
• OSSIM
• MS Logparser
• Sguil
• Octopussy
• Sagan

this list is likely incomplete!

Page 10: Mining Your Logs - Gaining Insight Through Visualization

© PixlCloud LLC 2011 | pixlcloud | Visualization in the Cloud

Commercial Tools


this list is likely incomplete!

Page 11: Mining Your Logs - Gaining Insight Through Visualization

Log Architectures


Page 12: Mining Your Logs - Gaining Insight Through Visualization

Log Mgmt Architecture

Collection:
- syslog
- OPSEC
- SDEE
- netflow
- database

Storage:
- on board
- external storage array
- clusters

Processing:
- indexing
- context storage
- clustering

Page 13: Mining Your Logs - Gaining Insight Through Visualization

Log Mgmt Architecture

Collection:
- syslog
- OPSEC
- SDEE
- netflow
- database

Processing:
- indexing
- context storage
- clustering

Data Access:
- free-text search
- field-based search
- tagging schemas

[Diagram labels: normalized or raw; raw]

Page 14: Mining Your Logs - Gaining Insight Through Visualization

Agents and Connectors
• piece of code to transport logs to a central location
• features
  - batch
  - compress
  - encrypt
  - sign
  - fail-over
• often additional features:
  - parse
  - normalize
  - aggregate
  - enrichment (context)
• special protocols:
  - OPSEC, SDEE
  - Windows
• file-based collection
• database collection
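Three of the agent features listed above (batch, compress, sign) can be sketched with the Python standard library alone. This is an illustrative sketch under stated assumptions, not how any particular agent works: the shared key, the function names `pack_batch` and `unpack_batch`, and the choice of gzip plus HMAC-SHA256 are all assumptions. Encryption and fail-over are left out.

```python
import gzip
import hashlib
import hmac

# Hypothetical shared secret between agent and collector (assumption).
SHARED_KEY = b"demo-key"


def pack_batch(lines, key=SHARED_KEY):
    """Batch log lines, compress them, and sign the compressed payload."""
    payload = gzip.compress("\n".join(lines).encode("utf-8"))
    signature = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return {"signature": signature, "payload": payload}


def unpack_batch(batch, key=SHARED_KEY):
    """Verify the signature first, then decompress and split into lines."""
    expected = hmac.new(key, batch["payload"], hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, batch["signature"]):
        raise ValueError("signature mismatch: batch was altered in transit")
    return gzip.decompress(batch["payload"]).decode("utf-8").split("\n")


batch = pack_batch(["line one", "line two", "line three"])
print(unpack_batch(batch))
```

Signing the compressed bytes rather than the raw lines lets the collector reject a tampered batch before spending any effort on decompression.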

Page 15: Mining Your Logs - Gaining Insight Through Visualization

SIEM Architecture

[Diagram labels: normalized; raw; asset context; identity context; ...; RDBMS; context / tagging]

Page 16: Mining Your Logs - Gaining Insight Through Visualization

SIEM Architecture
• RDBMS schema
  - Fixed number and type of fields
  - New data sources with new fields?
    ‣ overloading
• RDBMS clusters are expensive and scale poorly
• Need a parser for every data source
• Slow historical data queries
• Hard to configure database efficiently
  - because of different use-cases

Page 17: Mining Your Logs - Gaining Insight Through Visualization

SIEM Architecture Benefits
• Parsed data enables
  - real-time correlation
  - real-time statistics
  - data augmentation (context) close to source
• Unified data access language
  - over a fixed set of fields
• Real-time dashboards

Page 18: Mining Your Logs - Gaining Insight Through Visualization

Search vs. SIEM

• Full-text indexing
  Example search: denied
  - use index to find occurrences of ‘denied’

• Parsing at search time
  Example search: user=rmarty
  - use index to find ALL occurrences of ‘rmarty’
  - apply parser to results
  - remove results where user is not rmarty
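The two example searches can be sketched with a small inverted index. This is a minimal sketch, assuming word-level tokenization and a single `user=` field; the log lines are made up for illustration and the function names are hypothetical.

```python
import re
from collections import defaultdict

# Made-up sample lines for illustration; they are not from the talk.
LOGS = [
    "sshd: accepted password for user=rmarty from 10.0.20.9",
    "sshd: denied login for user=root, requested by rmarty",
    "httpd: user=alice fetched /home/rmarty/notes.txt",
    "firewall: denied inbound packet from 10.0.20.109",
]

USER_FIELD = re.compile(r"\buser=(\w+)")


def build_index(lines):
    """Map every lower-cased word to the set of line numbers containing it."""
    index = defaultdict(set)
    for number, line in enumerate(lines):
        for token in re.findall(r"\w+", line.lower()):
            index[token].add(number)
    return index


def search_term(index, lines, term):
    """Free-text search: a pure index lookup, no parsing involved."""
    return [lines[n] for n in sorted(index.get(term.lower(), ()))]


def search_user(index, lines, user):
    """Field search: index lookup first, then parse and filter the hits."""
    hits = []
    for line in search_term(index, lines, user):
        match = USER_FIELD.search(line)
        if match and match.group(1) == user:
            hits.append(line)
    return hits


index = build_index(LOGS)
print(len(search_term(index, LOGS, "denied")))  # free-text hits
print(len(search_term(index, LOGS, "rmarty")))  # ALL occurrences of the word
print(search_user(index, LOGS, "rmarty"))       # only lines where user=rmarty
```

The index narrows the candidates cheaply, and the parser runs only on those candidates, so nothing has to be parsed at collection time.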

Page 19: Mining Your Logs - Gaining Insight Through Visualization

New SIEM - Hybrid Models
• Use parsers for known data sources
• Collect everything else
• Index all data and use index for search
• Correlate parsed data

Page 20: Mining Your Logs - Gaining Insight Through Visualization

Categorization and Tagging
• How do you find all failed logins across any data source?

security:538 OR “sshd authentication failure” OR “sshd failed password” OR ...

• Does not scale
  - for new data sources
  - for new events of existing sources
• Define a ‘taxonomy’ for all events
• Map events into taxonomy

id -> object, action, status
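The mapping "id -> object, action, status" can be sketched as a lookup table. The three failed-login identifiers are taken from the search string on the slide, which lists them together as failed logins; the tuples assigned to them, the extra success entry, and the function name `tag` are assumptions made for illustration.

```python
# Event identifier -> (object, action, status).
# The first three keys come from the slide's search string; their values
# and the last entry are assumptions for illustration.
TAXONOMY = {
    "security:538": ("authentication", "login", "failure"),
    "sshd authentication failure": ("authentication", "login", "failure"),
    "sshd failed password": ("authentication", "login", "failure"),
    "sshd accepted password": ("authentication", "login", "success"),
}

UNKNOWN = ("unknown", "unknown", "unknown")


def tag(event_id):
    """Look up an event identifier and return its taxonomy fields."""
    obj, action, status = TAXONOMY.get(event_id, UNKNOWN)
    return {"object": obj, "action": action, "status": status}


def is_failed_login(event_id):
    """One taxonomy check replaces the growing OR-list of raw identifiers."""
    tags = tag(event_id)
    return tags["action"] == "login" and tags["status"] == "failure"


for event_id in ["security:538", "sshd failed password", "sshd accepted password"]:
    print(event_id, "->", tag(event_id), is_failed_login(event_id))
```

New data sources then only need new entries in the table; searches, rules, and reports written against object, action, and status stay unchanged.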

Page 21: Mining Your Logs - Gaining Insight Through Visualization

Content Creation
• Rules, dashboards, reports, searches can use taxonomy:

object=authentication AND action=login AND status=success

• All failures related to files:

object=file AND status=failure

• Mixing with other fields:

action=login AND user=rmarty

• Approach scales well
• Huge effort to build and maintain mappings
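The three queries above can be evaluated against tagged events with a few lines of code. This is a sketch that assumes queries consist only of `field=value` clauses joined by `AND`; the events are made up, and `parse_query` and `select` are hypothetical helper names.

```python
def parse_query(query):
    """Turn 'a=b AND c=d' into {'a': 'b', 'c': 'd'} (AND is the only operator)."""
    conditions = {}
    for clause in query.split(" AND "):
        field, _, value = clause.partition("=")
        conditions[field.strip()] = value.strip()
    return conditions


def select(events, query):
    """Return the events whose fields satisfy every clause of the query."""
    conditions = parse_query(query)
    return [
        event
        for event in events
        if all(event.get(field) == value for field, value in conditions.items())
    ]


# Made-up events that already carry taxonomy fields plus a parsed user field.
EVENTS = [
    {"object": "authentication", "action": "login", "status": "success", "user": "rmarty"},
    {"object": "authentication", "action": "login", "status": "failure", "user": "root"},
    {"object": "file", "action": "read", "status": "failure", "user": "rmarty"},
]

for query in [
    "object=authentication AND action=login AND status=success",
    "object=file AND status=failure",
    "action=login AND user=rmarty",
]:
    print(query, "->", len(select(EVENTS, query)))
```

Taxonomy fields and ordinary parsed fields such as `user` sit in the same dictionary, which is what makes the mixed third query possible.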

Page 22: Mining Your Logs - Gaining Insight Through Visualization

Logging as a Service (LaaS)
• Economically advantageous - think about TCO
• Pay as you go
• Elastic infrastructure scales with your needs
• No installation needed
• No setup costs / time for logging solution
• Open platform with RESTful APIs

Page 23: Mining Your Logs - Gaining Insight Through Visualization

Loggly

[Architecture diagram. Labels: Data Sources; Consumers; API; Proxies; Distributed data store; Distributed indexing and processing; Data collection; Data access; mobile-166; My syslog; Loggly user interface; Indexers and Search Machines; Log Archive; UI extensions]

Page 24: Mining Your Logs - Gaining Insight Through Visualization

Tool Usage

                      | DIY                             | MR                               | Log Mgmt                         | SIEM                           | LaaS
data sources          | known, only a few               | known, only a few                | unknown, many                    | known, many                    | -
analysis use-cases    | known, one or a few             | exploration, large-scale         | unknown, many                    | unknown, many                  | extend platform
dynamic use-cases     | no                              | no                               | yes                              | yes                            | yes
real-time correlation | no                              | no                               | no                               | yes                            | extend platform
cost                  | engineer, hardware, maintenance | engineers, hardware, maintenance | license, (hardware), maintenance | license, hardware, maintenance | subscription

Should you rather do it yourself (DIY)?

Page 25: Mining Your Logs - Gaining Insight Through Visualization

What is Working and What is not?


Page 26: Mining Your Logs - Gaining Insight Through Visualization

What’s Working
• Log collection
• Log centralization
• Alerting on a priori known patterns
• Solving specific, known use-cases for sets of known data sources, e.g.,
  - monitoring privileged access to financial servers
  - generating compliance reports
  - security forensics

Page 27: Mining Your Logs - Gaining Insight Through Visualization

What’s Not Working
• Log formats are all over the place and not documented

Mar 16 08:09:58 kernel: [ 0.000000] Normal 1048576 -> 1048576

• No logging guidelines / developer education
• Parsing is broken
  - based on regexes
  - numerous mistakes
  - doesn’t scale

Page 28: Mining Your Logs - Gaining Insight Through Visualization

What’s Not Working
• Normalization is broken:
  - IP to hostnames (when to do DNS lookup)
  - usernames (rmarty vs. ram vs. raffy)
• Categorization / Taxonomy
  - doesn’t scale
  - is buggy
  - is always out of date
  - expensive
• Prioritization has no working formula
• Anomaly detection is voodoo!

Page 29: Mining Your Logs - Gaining Insight Through Visualization

What Does It Mean?
• We don’t understand our data
• Security Operations Center (SOC) monitors all corporate data sources. Analysts
  - don’t know all the applications
  - don’t know all the setups
  - don’t know what log records are ‘normal’ behavior

--> Need tools to enable log owners to work with their data

Page 30: Mining Your Logs - Gaining Insight Through Visualization

Future Needs


Page 31: Mining Your Logs - Gaining Insight Through Visualization

We Need Better Tools
• We will have more and more data and need to deal with larger amounts of data
  - SIEM needs to support new distributed, scalable data management technologies
• More and more application layer data
  - How are we going to deal with all the parsing / entity extraction?
  - We need logging standards and guidelines
• How do we help analysts understand the data?
  - What is important and what is not?
  - Mapping problems to business process, business risk!

Page 32: Mining Your Logs - Gaining Insight Through Visualization

Data Visualization


Page 33: Mining Your Logs - Gaining Insight Through Visualization

Data/Log Visualization
• Exploration and Discovery
• Answer Questions
• Communicate Information
• Support Decisions

Page 34: Mining Your Logs - Gaining Insight Through Visualization

Security Visualization
• We are nowhere!
• Visualization is an afterthought
• Sec Viz dichotomy
• Tools are lacking fundamental capabilities
• Users don’t understand data, how can they understand visuals?

Page 35: Mining Your Logs - Gaining Insight Through Visualization

Visualization Concepts


Page 36: Mining Your Logs - Gaining Insight Through Visualization

The Analysis Approach

Overview first, zoom, details on demand

Principle by Ben Shneiderman

Page 37: Mining Your Logs - Gaining Insight Through Visualization

Simultaneous Views

Page 38: Mining Your Logs - Gaining Insight Through Visualization

Dynamic Coloring

Page 39: Mining Your Logs - Gaining Insight Through Visualization

Linked Views

Page 40: Mining Your Logs - Gaining Insight Through Visualization

Legible / Usable Graphs

Reducing non-data ink!

Page 41: Mining Your Logs - Gaining Insight Through Visualization

Choosing the Right Chart

Page 42: Mining Your Logs - Gaining Insight Through Visualization

Ode to the Pie

Page 43: Mining Your Logs - Gaining Insight Through Visualization

Careful With Interpretations

Page 44: Mining Your Logs - Gaining Insight Through Visualization

SecViz Examples


Page 45: Mining Your Logs - Gaining Insight Through Visualization


Page 46: Mining Your Logs - Gaining Insight Through Visualization


Page 47: Mining Your Logs - Gaining Insight Through Visualization


Page 48: Mining Your Logs - Gaining Insight Through Visualization

Situational Awareness
• Treemap
• Protovis.JS
• Size: Amount
• Brightness: Variance
• Color: Sensor
• Shows: Scans - bright spots
• Thanks to Chris Horsley

Page 49: Mining Your Logs - Gaining Insight Through Visualization


Page 50: Mining Your Logs - Gaining Insight Through Visualization

Firewall Treemap

Page 51: Mining Your Logs - Gaining Insight Through Visualization

Firewall Log

[Figure labels: Port; Source IP; Destination IP]

Page 52: Mining Your Logs - Gaining Insight Through Visualization

IDS Sig Tuning - Treemap

Hierarchy: Source, Destination, Signature, Number of Events

Color: Priority
Size: Number of alerts
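The treemap encoding described above (nest by source, destination, and signature; size each box by its number of alerts) can be sketched in a few lines. This sketch uses a plain slice-and-dice layout, which is a simpler technique than whatever layout the original tool used; the alert tuples are made up, the function names are hypothetical, and the priority-to-color mapping is left out.

```python
# Made-up alerts: (source, destination, signature, priority).
ALERTS = [
    ("10.0.0.1", "10.0.1.5", "portscan", 3),
    ("10.0.0.1", "10.0.1.5", "portscan", 3),
    ("10.0.0.1", "10.0.1.6", "sql-injection", 1),
    ("10.0.0.2", "10.0.1.5", "portscan", 3),
]


def build_tree(alerts):
    """Nest alerts as source -> destination -> signature -> alert count."""
    tree = {}
    for source, destination, signature, _priority in alerts:
        leaf = tree.setdefault(source, {}).setdefault(destination, {})
        leaf[signature] = leaf.get(signature, 0) + 1
    return tree


def weight(node):
    """Number of alerts below a node (a leaf is its own count)."""
    if isinstance(node, int):
        return node
    return sum(weight(child) for child in node.values())


def layout(node, x, y, w, h, depth=0, path=()):
    """Slice-and-dice: split the rectangle by weight, alternating the axis."""
    if isinstance(node, int):
        return [(path, (x, y, w, h))]
    rects = []
    total = weight(node)
    offset = 0.0
    for name, child in node.items():
        share = weight(child) / total
        if depth % 2 == 0:  # even depth: split along the x axis
            rects += layout(child, x + offset * w, y, share * w, h,
                            depth + 1, path + (name,))
        else:               # odd depth: split along the y axis
            rects += layout(child, x, y + offset * h, w, share * h,
                            depth + 1, path + (name,))
        offset += share
    return rects


for path, (x, y, w, h) in layout(build_tree(ALERTS), 0.0, 0.0, 100.0, 100.0):
    print(" > ".join(path), "area =", round(w * h, 1))
```

Each leaf rectangle's area is proportional to its alert count, so a noisy signature between one pair of hosts stands out as a large box.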

Page 53: Mining Your Logs - Gaining Insight Through Visualization

Vulnerability Data by Host

Page 54: Mining Your Logs - Gaining Insight Through Visualization

Visualization Future
• A solution to entity extraction
• Dynamic and interactive displays
• Computer aided intelligence / visualization
  - Computer supported exploration
  - Highly interactive
• Expert system that captures domain knowledge
  - Collaborative

Page 55: Mining Your Logs - Gaining Insight Through Visualization


Share, discuss, challenge, and learn about security visualization.

http://secviz.org

• List: secviz.org/mailinglist

• Twitter: @secviz


Page 56: Mining Your Logs - Gaining Insight Through Visualization


about.me/raffy