Splunk conf2014 - Detecting Fraud and Suspicious Events Using Risk Scoring

Copyright © 2014 Splunk Inc.

Rob Perdue, VP Professional Services, 8020 Labs

Detect Fraud and Suspicious Events Using Risk Scoring

Introduction

• Rob Perdue, VP Professional Services at 8020 Labs
  – Cyber security professional for 12 years
  – Specialize in Security Operations, DFIR in financial sector
  – Previously held positions at IBM, ADP, Viacom and ThreatGRID
  – Splunking since 2008

Agenda

• What I hope you will learn
• Why am I talking about fraud?
• Case Study: W-2 fraud
• Fraud Detection Framework (FDF)
• Creating Baselines
• Risk Scoring
• Cyber use cases for FDF
• Key takeaways
• Q & A

What I Hope You Will Learn

• New and exciting ways to mine your data
• The power of the eval command to score risk
• The usefulness of lookup tables for baselining
  – inputlookup
  – outputlookup
• Different ways to detect suspicious activities

Why Am I Talking About Fraud?

• Contacted to assist in an IR investigation
• Turned out not to be a typical IR engagement
• Ever hear of W-2 fraud? I hadn’t.
  – Steal a W-2 and file taxes before the real person does

Case Study: W-2 Fraud

• Tasked with finding unauthorized access to W-2s
  – During tax season
• Huge amount of data
  – Millions of rows of logs
• Relevant logs spread across several database tables and files
• Not really sure what W-2 fraud looked like

Case Study: W-2 Fraud

• How the data was distributed:
  – Summary Tables
  – Main DB
  – Stand-alone Splunk
  – Several CSV Files

Case Study Cont’d

• An idea…consolidate data into a single Splunk instance
• No signature for fraud, no problem
• Score a risk value for each W-2 transaction
  – Country of origin
  – Uniqueness of Source IP
  – Day of Week
  – History of IP
• All of that resulted in one ugly search…

Case Study Cont’d

• One ugly search…

index=w2 source="summarytable.csv" webpage="*administrator*"
| eval daymonth=date_month+date_mday
| eval full_user=username+"@"+group
| eval full_user=lower(full_user)
| iplocation src
| stats values(Country) AS Country values(Region) AS State values(City) AS City values(date_wday) AS Day dc(daymonth) AS Unique_Days count as user_ip_count by src, full_user
| join full_user [search index=w2 source="summarytableall.csv" webpage="*administrator*" | eval full_user=username+"@"+group | eval full_user=lower(full_user) | stats count as total_W2_events by full_user]
| eval traffic_per_ip=round((user_ip_count/total_W2_events)*100)
| join full_user src [search index=w2_history | stats values(days_seen) AS days_seen values(total_count) AS hist_total_count by src, full_user | fields src, full_user, days_seen, hist_total_count]
| eval Risk_Score=0
| eval Risk_Score=if(traffic_per_ip<100 AND days_seen<14, Risk_Score+3, Risk_Score+0)
| eval Risk_Score=if(traffic_per_ip==100 AND days_seen<14, Risk_Score+1, Risk_Score+0)
| eval Risk_Score=if(Day=="saturday" OR Day=="sunday", Risk_Score+1, Risk_Score+0)
| eval Risk_Score=if(Unique_Days=="1", Risk_Score+2, Risk_Score+0)
| eval Risk_Score=if(total_W2_events=="1", Risk_Score+2, Risk_Score+0)
| eval Risk_Score=if(Country!="United States", Risk_Score+2, Risk_Score+0)
| eval Risk_Score=if(days_seen>60, Risk_Score-3, Risk_Score+0)
| eval Risk_Score=if(traffic_per_ip<100 AND days_seen>13, Risk_Score+1, Risk_Score+0)
| fields full_user, src, Country, State, City, Risk_Score
| sort -Risk_Score

Let’s Break it Down

index=w2 source="summarytable.csv" webpage="*administrator*"
| eval daymonth=date_month+date_mday
| eval full_user=username+"@"+group
| eval full_user=lower(full_user)
| iplocation src
| stats values(Country) AS Country values(Region) AS State values(City) AS City values(date_wday) AS Day dc(daymonth) AS Unique_Days count as user_ip_count by src, full_user

Let’s Keep Breaking it Down

| join full_user [search index=w2 source="summarytableall.csv" webpage="*administrator*" | eval full_user=username+"@"+group | eval full_user=lower(full_user) | stats count as total_W2_events by full_user]
| eval traffic_per_ip=round((user_ip_count/total_W2_events)*100)

Should have used the eventstats command…more on that later.
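The note about eventstats can be illustrated outside Splunk. The following Python sketch shows roughly what an eventstats-style step computes: a per-user total attached to every (src, user) row, without collapsing the rows, followed by the percentage calculation. It assumes the total can be derived from the per-IP counts in the same result set, which may not match the two CSV sources used in the original search. The function name `add_user_totals` and the sample rows are hypothetical.

```python
from collections import defaultdict

def add_user_totals(rows):
    """Attach each user's total event count to every (src, user) row,
    then derive the share of that user's traffic seen from each IP."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["full_user"]] += row["user_ip_count"]
    enriched = []
    for row in rows:
        total = totals[row["full_user"]]
        enriched.append({
            **row,
            "total_W2_events": total,
            "traffic_per_ip": round(row["user_ip_count"] / total * 100),
        })
    return enriched

# Hypothetical sample rows, invented for illustration (not from the talk).
rows = [
    {"src": "10.0.0.1", "full_user": "alice@acme", "user_ip_count": 9},
    {"src": "10.0.0.2", "full_user": "alice@acme", "user_ip_count": 1},
    {"src": "10.0.0.3", "full_user": "bob@acme", "user_ip_count": 4},
]
result = add_user_totals(rows)
```

Every input row survives with two extra fields, which is the property that makes a separate join unnecessary in this simplified setting.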

…and Down

| join full_user src [search index=w2_history | stats values(days_seen) AS days_seen values(total_count) AS hist_total_count by src, full_user | fields src, full_user, days_seen, hist_total_count]

…and Down

| eval Risk_Score=0
| eval Risk_Score=if(traffic_per_ip<100 AND days_seen<14, Risk_Score+3, Risk_Score+0)
| eval Risk_Score=if(traffic_per_ip==100 AND days_seen<14, Risk_Score+1, Risk_Score+0)
| eval Risk_Score=if(Day=="saturday" OR Day=="sunday", Risk_Score+1, Risk_Score+0)
| eval Risk_Score=if(Unique_Days=="1", Risk_Score+2, Risk_Score+0)
| eval Risk_Score=if(total_W2_events=="1", Risk_Score+2, Risk_Score+0)
| eval Risk_Score=if(Country!="United States", Risk_Score+2, Risk_Score+0)
| eval Risk_Score=if(days_seen>60, Risk_Score-3, Risk_Score+0)
| eval Risk_Score=if(traffic_per_ip<100 AND days_seen>13, Risk_Score+1, Risk_Score+0)

And finally…

| fields full_user, src, Country, State, City, Risk_Score
| sort -Risk_Score
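The chained eval statements above amount to a list of independent rules that each add to or subtract from a running score. As a rough transliteration into Python, useful for testing the weights away from Splunk, the sketch below applies the same eight rules to one composite event. The dictionary keys and the two sample events are hypothetical and were invented for illustration.

```python
def score_w2_event(event):
    """Mirror the chained eval statements: start at 0, adjust per rule."""
    risk = 0
    pct = event["traffic_per_ip"]
    days_seen = event["days_seen"]
    if pct < 100 and days_seen < 14:      # new IP carrying part of the traffic
        risk += 3
    if pct == 100 and days_seen < 14:     # new IP carrying all of the traffic
        risk += 1
    if event["day"] in ("saturday", "sunday"):
        risk += 1
    if event["unique_days"] == 1:
        risk += 2
    if event["total_w2_events"] == 1:
        risk += 2
    if event["country"] != "United States":
        risk += 2
    if days_seen > 60:                    # long-known IP lowers the score
        risk -= 3
    if pct < 100 and days_seen > 13:
        risk += 1
    return risk

# Hypothetical composite events (not from the talk).
suspicious = {"traffic_per_ip": 10, "days_seen": 1, "day": "sunday",
              "unique_days": 1, "total_w2_events": 10, "country": "Romania"}
routine = {"traffic_per_ip": 100, "days_seen": 90, "day": "tuesday",
           "unique_days": 20, "total_w2_events": 40,
           "country": "United States"}
```

Here `score_w2_event(suspicious)` evaluates to 8 and `score_w2_event(routine)` to -3, so sorting by score in descending order puts the first event at the top.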

Where’s the Magic?

• Creation of a composite event
  – Join
  – Stats
• Use of eval to score the event
  – | eval Risk_Score=if(traffic_per_ip==100 AND days_seen<14, Risk_Score+1, Risk_Score+0)
• Know the data
  – What did the URL for W-2 access look like?
  – What could I extract from the logs to build a profile?

Closing the Case Study

• It worked, but…
• Reactive in nature
• Not terribly efficient
• Risk scoring could be better
• Spawned the Fraud Detection Framework (FDF)

Fraud Detection Framework

• Utilize everything you can from a single log event
  – Timestamp
  – Time of Day
  – User Agent String
  – URL
  – IP Info
  – User Name
• Enrich the log
  – Eventtypes
  – GeoIP
  – IP History
  – User History
  – Watch lists
  – Tags
• Continuous Baselining
• Risk Scoring

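What’s in a Log?

2002-05-02 17:42:15 172.22.255.255 - 172.30.255.255 80 GET /images/picture.jpg robper 200 Mozilla/4.0+(compatible;MSIE+5.5;+Windows+2000+Server)

• Day of Week: 2002-05-02
• Time of Day: 17:42:15
• Source IP: 172.22.255.255
• Server IP: 172.30.255.255
• Method: GET
• URI Stem: /images/picture.jpg
• User Name: robper
• User Agent: Mozilla/4.0+(compatible;MSIE+5.5;+Windows+2000+Server)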
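To make the field breakdown concrete, here is a minimal Python sketch that splits the sample log line on spaces and derives the day of week and hour from the timestamp. The field order is assumed from this one sample line, and the field names (including `ident` for the `-` placeholder) are hypothetical labels, not names from the talk or from any particular log format specification.

```python
from datetime import datetime

LOG_LINE = ("2002-05-02 17:42:15 172.22.255.255 - 172.30.255.255 80 GET "
            "/images/picture.jpg robper 200 "
            "Mozilla/4.0+(compatible;MSIE+5.5;+Windows+2000+Server)")

# Assumed field order, inferred from the sample line only.
FIELDS = ["date", "time", "src_ip", "ident", "server_ip", "port",
          "method", "uri_stem", "user", "status", "user_agent"]

def parse_line(line):
    """Split one space-delimited log line and add derived time fields."""
    event = dict(zip(FIELDS, line.split(" ")))
    ts = datetime.strptime(event["date"] + " " + event["time"],
                           "%Y-%m-%d %H:%M:%S")
    event["weekday_index"] = ts.weekday()   # Monday is 0, Sunday is 6
    event["hour_of_day"] = ts.hour
    return event

event = parse_line(LOG_LINE)
```

Even this single line yields a day of week, a time of day, two IP addresses, a user name and a user agent, each of which can feed a baseline or a scoring rule.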

Enriching Your Logs

• EventTypes/Tags
  – What kind of transaction was this?
• GeoIP (iplocation)
  – Where is this IP coming from?
• IP History
  – Have I ever seen this IP before?
• User History
  – When’s the last time I’ve seen this ID?
  – Is this an inactive account?
• User Agent String
  – Is this UAS unusual?
  – Have I seen it before from this user?
  – Is there a non-English language preference?
• Watch lists
  – Is this IP on any threat or fraud watchlists?

Building Event Types

• No need to score a GET request to a jpg file
• Fully understand the application you are scoring
  – App Dev guys are our friends
  – Don’t assume you know what a particular URL is, or isn’t, for
• Build eventtypes for transactions of interest
  – W-2 reports
  – Payroll Execution
  – Beneficiary Change
  – Direct Deposit Change
  – Successful Logons

Baselining

• What does this usually look like?
• Enables risk scoring
• Relies heavily on lookup tables
• Lesser known lookup commands
  – inputlookup
  – outputlookup

FDF: Baselines

• GeoIP
  – Where does this client usually log in from?
• User Profiles
  – User Agent String
  – IP Info
  – User Logon History

FDF: GeoIP

• Determine primary location of client
• Feeds into Haversine formula
  – https://apps.splunk.com/app/936/
• Scheduled search
• Utilizes inputlookup and outputlookup
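For readers unfamiliar with the Haversine formula, the sketch below is a plain Python version of the standard great-circle distance calculation between two latitude/longitude pairs. It is not the implementation used by the Splunk app linked above; the function name and the Earth radius constant (an approximate mean value in miles) are choices made for this illustration.

```python
import math

EARTH_RADIUS_MI = 3958.8  # approximate mean Earth radius in miles

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_MI * math.asin(math.sqrt(a))

# Distance from a client's usual location to the location of a new logon;
# the coordinates are approximate values for New York and Los Angeles.
distance = haversine_miles(40.7128, -74.0060, 34.0522, -118.2437)
```

In the framework, the first point would come from the client's baseline location and the second from the GeoIP lookup of the current source IP.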

FDF: GeoIP


• GeoIP Baseline Search:


Slide 24

index=hrapp
| iplocation allfields=true src
| eval clientlat=lat
| eval clientlon=lon
| stats min(_time) AS firstTime max(_time) AS lastTime count by client,Region,Timezone,clientlat,clientlon
| eventstats sum(count) as client_total by client
| inputlookup append=T client_geoProfiles.csv
| eventstats sum(client_total) AS client_total by client,Region,Timezone,clientlat,clientlon
| stats min(firstTime) AS firstTime max(lastTime) AS lastTime sum(count) AS count by client_total, client,Region,Timezone,clientlat,clientlon
| eval percent=round((count/client_total)*100)
| outputlookup client_geoProfiles.csv
| where percent>75
| outputlookup client_geoBase.csv

How this data is used is shown on slide 32.
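The idea behind this baseline search (fold new counts into a stored profile, then keep only the location that accounts for more than 75 percent of a client's events) can be sketched in Python as follows. This is a simplified illustration, not a line-for-line port of the search: it keys on client and region only, and the function names and sample counts are hypothetical.

```python
from collections import Counter

def merge_counts(stored, new):
    """Add newly observed (client, region) counts to the stored profile."""
    merged = Counter(stored)
    merged.update(new)
    return merged

def primary_locations(profile, threshold=75):
    """Return {client: region} for regions holding more than `threshold`
    percent of that client's events."""
    totals = Counter()
    for (client, _region), count in profile.items():
        totals[client] += count
    base = {}
    for (client, region), count in profile.items():
        if round(count / totals[client] * 100) > threshold:
            base[client] = region
    return base

# Hypothetical stored profile and newly observed counts.
stored = {("acme", "New Jersey"): 70, ("acme", "Ohio"): 10}
new = {("acme", "New Jersey"): 15, ("acme", "Texas"): 5,
       ("globex", "Utah"): 3, ("globex", "Iowa"): 3}
profile = merge_counts(stored, new)
base = primary_locations(profile)
```

A client whose traffic is split evenly across locations, like the second one here, gets no primary location, so no distance can be computed for it later.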

How it Looks…


FDF: User Baseline

• Create profiles for each user
  – First/Last Time
  – User Agent String
  – IP Address
• Scheduled search
• Utilizes inputlookup and outputlookup

FDF: User Baseline


• User baseline search:

Breaking it Down

Slide 28

index=hrapp
| fillnull value=unknown tag::src
| stats min(_time) AS firstTime max(_time) AS lastTime first(date_wday) AS weekday by user,client,src,user_agent,tag::src, tag
| inputlookup append=T user_Profiles.csv
| stats min(firstTime) AS firstTime max(lastTime) AS lastTime values(weekday) AS weekday by user,client,src,user_agent,tag::src,tag
| outputlookup user_Profiles.csv

How this data is used is shown on slide 32.
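The pattern in this search is read the stored profile, append it to fresh results, re-aggregate, and write it back. A minimal Python sketch of the re-aggregation step is shown below: the earliest firstTime and latest lastTime win, and weekdays are accumulated. The key is reduced to (user, src, user_agent) for brevity, and the function name and sample rows are hypothetical.

```python
def merge_profiles(stored, fresh):
    """Merge profile rows keyed by (user, src, user_agent)."""
    merged = {key: dict(row) for key, row in stored.items()}
    for key, row in fresh.items():
        if key in merged:
            old = merged[key]
            old["firstTime"] = min(old["firstTime"], row["firstTime"])
            old["lastTime"] = max(old["lastTime"], row["lastTime"])
            old["weekday"] = sorted(set(old["weekday"]) | set(row["weekday"]))
        else:
            merged[key] = dict(row)   # first sighting of this combination
    return merged

# Hypothetical profile rows; times are epoch seconds.
known = ("robper", "172.22.255.255", "MSIE 5.5")
unseen = ("robper", "10.9.8.7", "curl")
stored = {known: {"firstTime": 1000, "lastTime": 5000, "weekday": ["monday"]}}
fresh = {
    known: {"firstTime": 4000, "lastTime": 9000, "weekday": ["friday"]},
    unseen: {"firstTime": 8000, "lastTime": 8000, "weekday": ["sunday"]},
}
profiles = merge_profiles(stored, fresh)
```

Because each run only extends the stored rows, the scheduled search can cover a short time window and still maintain a long-running history.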

How it Looks


FDF: Risk Engine

• Anomaly detection using the baseline data
• Enriches the log data
  – Watchlists
  – Tags
  – Haversine

FDF: Risk Engine

| inputlookup user_Profiles.csv
| search tag=w2 OR tag=payroll
| lookup client_geoBase.csv client OUTPUT clientlat,clientlon
| iplocation allfields=true src
| lookup threatlist ip as src OUTPUT description
| eval short_lon=round(lon, 2)
| eval short_lat=round(lat, 2)
| eval c_lon=round(clientlon, 2)
| eval c_lat=round(clientlat, 2)
| strcat c_lat "," c_lon as latlon
| strcat short_lat "," short_lon as latlon2
| haversine originField=latlon latlon2 unit=mi
| eval diff=(round((lastTime-firstTime)/86400))
| eval risk=0
| eval risk=if(distance>0 AND distance<300, risk+5, risk+0)
| eval risk=if(distance>299, risk+15, risk+0)
| eval risk=if(diff<5, risk+10, risk+0)
| eval risk=if(Country!="United States", risk+50, risk+0)
| eval risk=if('tag::src'="malicious", risk+30, risk+1)
| eval risk=if(weekday="Saturday" OR weekday="Sunday", risk+10, risk+1)
| eval risk=if(description="KnownBad", risk+10, risk+0)
| eval risk=if('tag::src'="whitelisted", risk-10, risk+1)
| eval risk=if(risk<0, 1, risk+0)
| eval distance=round(distance)
| fields src,Country,Region,distance, client, user, tag::src,description,tag,risk
| search risk>0


Slide 32

| inputlookup user_Profiles.csv
| search tag=w2 OR tag=payroll
| lookup client_geoBase.csv client OUTPUT clientlat,clientlon
| iplocation allfields=true src
| lookup threatlist ip as src OUTPUT description
| eval short_lon=round(lon, 2)
| eval short_lat=round(lat, 2)
| eval c_lon=round(clientlon, 2)
| eval c_lat=round(clientlat, 2)
| strcat c_lat "," c_lon as latlon
| strcat short_lat "," short_lon as latlon2
| haversine originField=latlon latlon2 unit=mi

• user_Profiles.csv: from slide 28
• client_geoBase.csv: from slide 24

Let’s Keep Breaking it Down…

| eval diff=(round((lastTime-firstTime)/86400))
| eval risk=0
| eval risk=if(distance>0 AND distance<300, risk+5, risk+0)
| eval risk=if(distance>299, risk+15, risk+0)
| eval risk=if(diff<5, risk+10, risk+0)
| eval risk=if(Country!="United States", risk+50, risk+0)
| eval risk=if('tag::src'="malicious", risk+29, risk+1)
| eval risk=if(weekday="Saturday" OR weekday="Sunday", risk+10, risk+1)
| eval risk=if(description="KnownBad", risk+10, risk+0)
| eval risk=if('tag::src'="whitelisted", risk-10, risk+1)
| eval risk=if(risk<0, 1, risk+0)
| eval distance=round(distance)
| fields src,Country,Region,distance, client, user, tag::src,description,risk
| search risk>0
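Three of the rules in this block are specific to the baseline data: the distance tiers, the age of the profile in days, and the whitelist adjustment with its floor. The Python sketch below isolates just those rules so their interaction is easier to see; the remaining rules (country, weekday, threat list, malicious tag) are left out on purpose. The function name and parameters are hypothetical.

```python
def session_risk(distance_mi, first_time, last_time, whitelisted=False):
    """Score the distance, profile-age and whitelist rules, then apply
    the floor used in the search (a negative score becomes 1)."""
    risk = 0
    if 0 < distance_mi < 300:
        risk += 5
    if distance_mi > 299:
        risk += 15
    diff_days = round((last_time - first_time) / 86400)  # whole days
    if diff_days < 5:            # profile first seen only days ago
        risk += 10
    if whitelisted:
        risk -= 10
    else:
        risk += 1                # the search adds 1 when not whitelisted
    return 1 if risk < 0 else risk

DAY = 86400
far_and_new = session_risk(1000, 0, 1 * DAY)
near_and_whitelisted = session_risk(150, 0, 30 * DAY, whitelisted=True)
```

With these inputs `far_and_new` is 26 and `near_and_whitelisted` is 1. Because of the floor, a whitelisted source never drops to zero, so it still passes the final `risk>0` filter.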

What it Looks Like…


FDF: Scoring Review

• In its current state:
  – Essentially scores the risk of the session
  – Can focus score on particular event types (e.g., direct deposit, payroll)
  – Does not score behavior while in the app
  – Good job of detecting compromised creds
• Can easily be modified to…
  – Detect transaction anomalies (e.g., wire transfers, payroll fraud)
  – Incorporate Benford’s law
    ▪ http://apps.splunk.com/app/355/
  – Score other risks
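Benford's law says that in many naturally occurring sets of amounts, the leading digit d appears with probability log10(1 + 1/d), so about 30 percent of values start with 1. A transaction set that departs strongly from this distribution may merit a closer look. The Python sketch below computes the expected distribution and the observed leading-digit shares; it does not reflect how the linked Splunk app works, and the function names and sample amounts are hypothetical.

```python
import math
from collections import Counter

def benford_expected():
    """Expected share of each leading digit 1-9 under Benford's law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def first_digit_share(amounts):
    """Observed share of each leading digit among non-zero amounts."""
    # Scientific notation puts the leading significant digit first.
    digits = [int(f"{abs(a):e}"[0]) for a in amounts if a != 0]
    counts = Counter(digits)
    return {d: counts[d] / len(digits) for d in range(1, 10)}

expected = benford_expected()
observed = first_digit_share([120, 1500, 19, 230, 0.0042])
```

Comparing `observed` with `expected` digit by digit (for example with a chi-squared statistic) gives a value that could be turned into one more scoring rule.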


FDF: Other Cyber Use Cases

• Compromised creds
  – FTP
  – OWA
  – VPN
  – Custom apps
• User profiles
  – Proxy logs
  – Logon times
• Risk scoring
  – IPS Alert + AV Hit + Failed Logon + ?

FDF: Side Story

• One compromised FTP account reported
  – The client wanted to know how many other accounts were used for unauthorized access
  – ~600 active FTP accounts
• Fortunately the client had a year’s worth of FTP logs in Splunk
• Utilized the FDF framework to detect 14 additional accounts

Key Takeaways

• Baseline your data
• inputlookup and outputlookup are very powerful baselining tools
• Chaining eval statements is an effective way of scoring risk
• Use every bit of information found in an individual log
• Enrich what you can

Q&A


Security office hours: 11:00 AM – 2:00 PM @Room 103, every day
Geek out, share ideas with Enterprise Security developers

Red Team / Blue Team - Challenge your skills and learn new tricks
Mon-Wed: 3:00 PM – 6:00 PM @Splunk Community Lounge
Thurs: 11:00 AM – 2:00 PM
Learn, share and hack

Birds of a feather - Collaborate and brainstorm with security ninjas
Thurs: 12:00 PM – 1:00 PM @Meal Room

THANK YOU
