Impossibility Mining
Traditional Data Mining
Using multidimensional data to find previously unknown, hidden relationships
Not just simple queries/joins
Canonical example: diapers and beer at Walmart
  Urban legend – comes from a 1992 Teradata study of Osco.
Correlation != causation
The terminology currently has negative connotations in the press
Il buono, il brutto, il cattivo (The Good, the Ugly, the Bad)
3 categories of “data mining” for fraud:
  Profiling (il brutto: the ugly)
  Probability Mining (il cattivo: the bad)
  Anomaly Detection (il buono: the good)
Profiling
Looking for a series of characteristics that identify a likely problem
Demographic profiling:
  Looking for a series of personal identifiers to determine likely suspects
  Example: corporate data thieves tend to be males between 30 and 40 years of age
Behavior profiling:
  Looking for a series of behaviors that indicate likely suspects
  Example: corporate data thieves are more likely to work weekends, not take vacations, and be generally highly rated
Profiling – Issues
Demographic profiling, no matter how good, will likely end up with you on CNN
Base rate fallacy: the profile needs to be extraordinarily close to 100% accurate for a population of any size; otherwise the false positives drawn from the large innocent population swamp the few true hits.
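A minimal Bayes' theorem sketch of the base rate fallacy: the share of flagged people who are actually thieves (the positive predictive value) depends heavily on how rare thieves are. The prevalence, sensitivity, and specificity figures below are hypothetical, chosen only for illustration.

```python
def positive_predictive_value(prevalence: float, sensitivity: float, specificity: float) -> float:
    """P(actual thief | flagged by the profile), via Bayes' theorem."""
    true_pos = sensitivity * prevalence               # thieves the profile catches
    false_pos = (1 - specificity) * (1 - prevalence)  # innocents it flags anyway
    return true_pos / (true_pos + false_pos)


if __name__ == "__main__":
    # Hypothetical: 1 thief per 10,000 employees; profile is 99% sensitive and 99% specific.
    ppv = positive_predictive_value(prevalence=0.0001, sensitivity=0.99, specificity=0.99)
    print(f"{ppv:.2%} of flagged employees are actual thieves")
```

Under these assumptions, even a 99%-accurate profile flags roughly 100 innocent employees for every thief it finds.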
Probability Mining
Identifying high-probability issues to target
Can be applied to profiling or anomaly detection
Good for sliding thresholds with competing business drivers
Example: stolen credit cards are more likely to be used at electronics stores for high-ticket items. Applied to a particular profile, a plasma TV purchase may have a 10% chance of being fraudulent.
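A per-category figure like the 10% plasma TV probability has to come from labeled history. A minimal sketch, assuming a hypothetical list of (category, is_fraud) records for one customer profile, estimates P(fraud | category) as a simple frequency and skips categories with too few records to trust:

```python
from collections import defaultdict


def fraud_rate_by_category(labeled, min_count=1):
    """Estimate P(fraud | category) from (category, is_fraud) records.

    Categories with fewer than `min_count` records are skipped because
    the estimate would be too noisy to act on.
    """
    totals = defaultdict(int)
    frauds = defaultdict(int)
    for category, is_fraud in labeled:
        totals[category] += 1
        frauds[category] += int(is_fraud)
    return {c: frauds[c] / totals[c] for c in totals if totals[c] >= min_count}


if __name__ == "__main__":
    # Hypothetical labeled history for one customer profile.
    history = (
        [("electronics", True)] + [("electronics", False)] * 9
        + [("grocery", False)] * 40 + [("grocery", True)]
    )
    print(fraud_rate_by_category(history, min_count=10))
```

A plain frequency is the simplest possible estimator; with sparse categories, a smoothed estimate would usually be preferable.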
Probability Mining – Issues
Business drivers need to be considered
  Is it worth bothering 10 legitimate credit card holders to find 1 stolen card? What about 100? 1,000?
Probability generation requires a lot of data and a pre-labeled dataset to be useful
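One way to turn “is it worth bothering N legitimate cardholders?” into a sliding threshold is an expected-cost comparison: flag a transaction when the expected fraud loss avoided outweighs the expected cost of inconveniencing a legitimate customer. This is a generic decision-theory sketch, not a method described in the slides, and the dollar figures are hypothetical business drivers.

```python
def legit_per_fraud(p_fraud: float) -> float:
    """Legitimate customers bothered per fraud caught when flagging at this probability."""
    return (1 - p_fraud) / p_fraud


def break_even_probability(fraud_loss: float, friction_cost: float) -> float:
    """Lowest P(fraud) at which flagging pays off: p * loss >= (1 - p) * friction."""
    return friction_cost / (fraud_loss + friction_cost)


def should_flag(p_fraud: float, fraud_loss: float, friction_cost: float) -> bool:
    """Flag when the expected loss avoided is at least the expected customer friction."""
    return p_fraud * fraud_loss >= (1 - p_fraud) * friction_cost


if __name__ == "__main__":
    # Hypothetical drivers: $2,000 average fraud loss, $20 cost per annoyed customer.
    print(legit_per_fraud(0.10))
    print(break_even_probability(2000, 20))
    print(should_flag(0.10, 2000, 20))
```

At the 10% plasma TV probability, about 9 legitimate cardholders are bothered per stolen card found; raising the friction cost relative to the fraud loss moves the break-even threshold up.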
Anomaly Detection
Sesame Street analysis (“One of these things is not like the others”)
Relies on finding outliers in the data
Does NOT require a priori expert knowledge of the data
Does require post-analysis expert knowledge to interpret the outliers
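As a minimal example of finding outliers without a priori knowledge of the data, here is a modified z-score check using the median and median absolute deviation (MAD). This is one common technique, not necessarily what the presenters used; the 0.6745 scale factor and 3.5 cutoff follow a widely used convention.

```python
from statistics import median


def robust_outliers(values, cutoff=3.5):
    """Indices of values whose modified z-score exceeds `cutoff`.

    Uses the median and MAD, so a few extreme values cannot drag the
    baseline toward themselves the way they would with a mean and
    standard deviation.
    """
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return []  # most values identical: no spread to measure against, report nothing
    return [i for i, v in enumerate(values) if abs(0.6745 * (v - med) / mad) > cutoff]


if __name__ == "__main__":
    print(robust_outliers([10, 11, 9, 10, 12, 10, 95]))
```

The function only says which points are unusual; deciding what an outlier means still takes the post-analysis expert knowledge noted above.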
Case Example: Anomaly Detection
Product launch event – $1.5 million budget
Launch directors had authority for procurements up to $10,000
Report received that a “person directing the launch event gave a lot of vendor work to his brother-in-law”
There were ~25 recent launch events this could refer to, 10 of which were directed by men
Looked at the financials for each launch event
Data
Event Launch Purchases              Amount
Consulting – Marketing Support   $9,512.00
Supplies – General                 $250.12
Consulting – Advertising         $9,832.00
Supplies – Plasma TV Rental      $9,814.22
Supplies – Catering              $1,233.22
Consulting – Launch Support      $9,763.00
Supplies – Secondary Plasma TV   $9,814.22
Mileage – Reimbursement            $252.84
Benford
Anomaly Detection – How We Found ‘Em
Benford’s Law
  Take a look at both the last and first digits
  Distribution is well off the predicted values
Nearness-to-threshold
  Distribution should not be a logarithmic decline from the approval threshold
  Nothing was over the threshold…
Common sense
  Plasma TV rentals: $10K to rent? Why two?
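The digit and threshold checks can be sketched against the eight purchase amounts from the launch-event table. The helper names, the 10% “just under the threshold” band, and the duplicate-amount check (prompted by “Why two?”) are illustrative assumptions, not the investigators’ actual procedure. With only eight values the Benford comparison is illustrative only; a meaningful digit test needs a much larger sample.

```python
import math
from collections import Counter

# Purchase amounts from the launch-event table (USD).
AMOUNTS = [9512.00, 250.12, 9832.00, 9814.22, 1233.22, 9763.00, 9814.22, 252.84]

APPROVAL_THRESHOLD = 10_000.00  # launch directors' procurement authority


def benford_expected(digit: int) -> float:
    """Benford's Law probability that the leading digit is `digit` (1-9)."""
    return math.log10(1 + 1 / digit)


def first_digit(amount: float) -> int:
    """Leading significant digit of a positive amount (via scientific notation)."""
    return int(f"{amount:e}"[0])


def last_digit(amount: float) -> int:
    """Last digit of the amount expressed in cents."""
    return round(amount * 100) % 10


def near_threshold(amounts, threshold=APPROVAL_THRESHOLD, band=0.10):
    """Amounts within `band` (a fraction of the threshold) just below it."""
    lower = threshold * (1 - band)
    return [a for a in amounts if lower <= a < threshold]


def duplicate_amounts(amounts):
    """Exact amounts that occur more than once."""
    return {a: n for a, n in Counter(amounts).items() if n > 1}


if __name__ == "__main__":
    firsts = Counter(first_digit(a) for a in AMOUNTS)
    n = len(AMOUNTS)
    print("digit  observed  benford")
    for d in range(1, 10):
        print(f"{d:>5}  {firsts[d] / n:>8.2f}  {benford_expected(d):>7.2f}")
    print("last digits (cents):", dict(Counter(last_digit(a) for a in AMOUNTS)))
    print("just under threshold:", near_threshold(AMOUNTS))
    print("repeated amounts:", duplicate_amounts(AMOUNTS))
```

Five of the eight amounts start with 9 (Benford predicts under 5% for that digit), the same five sit within 10% below the $10,000 authority limit, and the plasma TV amount appears twice.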
Results
Subject hired their brother-in-law to do phantom consulting
Subject rented plasma TVs with a $1 buyout option
Case Example: Geospatial Anomalies
Problem: identify web activity that is spurious in nature
Application: successfully applied to internal user data (activity logs) as well as external data (attacks)
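The slides do not say how activity was geolocated or what counted as anomalous, so the following is only a hypothetical sketch: it assumes each event has already been mapped to a latitude/longitude (for example, from its IP address) and flags events far from that user’s median location, using the haversine great-circle distance. The 1,000 km radius is an arbitrary assumption.

```python
import math
from statistics import median

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def geo_outliers(events, radius_km=1000.0):
    """Flag (user, lat, lon) events farther than `radius_km` from the user's median location."""
    by_user = {}
    for user, lat, lon in events:
        by_user.setdefault(user, []).append((lat, lon))
    # Median latitude/longitude per user; a rough "home" that ignores the antimeridian.
    home = {
        user: (median(p[0] for p in pts), median(p[1] for p in pts))
        for user, pts in by_user.items()
    }
    return [
        (user, lat, lon)
        for user, lat, lon in events
        if haversine_km(*home[user], lat, lon) > radius_km
    ]


if __name__ == "__main__":
    # Hypothetical events: four near Chicago, one near Moscow.
    events = [
        ("alice", 41.88, -87.63), ("alice", 41.90, -87.62), ("alice", 41.85, -87.65),
        ("alice", 41.87, -87.64), ("alice", 55.75, 37.62),
    ]
    print(geo_outliers(events))
```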
User DataUser Data
User Data – Plotted as AnomaliesUser Data – Plotted as Anomalies
Outliers – What Were They?
Figure: Outlier Categorization (pie chart). Categories: Foreign Users, Gambling, False Positives, Pornography, Dating Websites, Spyware. Segment values shown: 63%, 10%, 14%, 7%, 3%, 3%.
Impossibility Mining
Is NOT data mining
IS an application of control testing
Looks for patterns that cannot exist in any model of reasonable likelihood
Can be single- or multi-factor
Only identifies real outliers
Impossibility Mining Example – Single Factor
Asset Management
  IT asset management software installed on all machines in a company
  Cataloged installed hardware and software at different points in time
Proactive Look
  Identify any computers where installed memory at time T is less than at time T-1
  Identified several hundred laptops from remote-office users that met the criteria
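A minimal sketch of the single-factor check, assuming the asset management inventory can be exported as a hypothetical mapping of machine ID to timestamped memory readings: any drop between consecutive snapshots is flagged.

```python
def memory_decreases(snapshots):
    """Machines whose installed memory dropped between consecutive snapshots.

    `snapshots` maps machine ID -> list of (timestamp, memory_mb) readings.
    Installed memory should not go down on its own, so any decrease is an
    "impossibility" worth a look.
    """
    flagged = {}
    for machine, readings in snapshots.items():
        ordered = sorted(readings)  # oldest snapshot first
        for (t_prev, mem_prev), (t_now, mem_now) in zip(ordered, ordered[1:]):
            if mem_now < mem_prev:
                flagged.setdefault(machine, []).append((t_prev, mem_prev, t_now, mem_now))
    return flagged


if __name__ == "__main__":
    # Hypothetical inventory snapshots (timestamps as simple integers).
    snaps = {
        "LT-001": [(1, 4096), (2, 2048)],   # memory halved
        "LT-002": [(1, 4096), (2, 4096)],   # unchanged
        "LT-003": [(2, 8192), (1, 4096)],   # upgraded; readings out of order
    }
    print(memory_decreases(snaps))
```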
Impossibility Mining Example – Single Factor, cont’d
Identified commonality in the laptops
  All laptops were serviced by the same IT support location
  Found the drop in memory was consistent with the last “upgrade”
Reviewed eBay activity of the local IT support personnel
  Found the thief, who was removing half of the memory from the laptops of non-power users and selling it!
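Finding the shared support location amounts to asking which attribute value is over-represented among the flagged machines compared with the whole fleet. A minimal sketch, assuming a hypothetical inventory that maps machine ID to a dict of attributes such as `support_site`:

```python
from collections import Counter


def attribute_lift(flagged_ids, inventory, attribute):
    """For each attribute value among flagged machines: (count, lift vs. whole fleet).

    Lift > 1 means the value is over-represented among flagged machines.
    """
    flagged = Counter(inventory[m][attribute] for m in flagged_ids)
    fleet = Counter(attrs[attribute] for attrs in inventory.values())
    n_flagged, n_fleet = sum(flagged.values()), sum(fleet.values())
    return {
        value: (count, (count / n_flagged) / (fleet[value] / n_fleet))
        for value, count in flagged.most_common()
    }


if __name__ == "__main__":
    # Hypothetical fleet of four laptops split across two support sites.
    inventory = {
        "LT-001": {"support_site": "Site A"},
        "LT-002": {"support_site": "Site A"},
        "LT-003": {"support_site": "Site B"},
        "LT-004": {"support_site": "Site B"},
    }
    print(attribute_lift(["LT-001", "LT-002"], inventory, "support_site"))
```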
Impossibility Mining – Dual Factor
Electronic Funds Transfer Investigation
Payment Process
  Manager takes in a payment request and assigns it to a clerk
  Clerk enters payment information and selects a payee
  Manager enters EFT information for the payee and confirms the transaction (cannot change the amount)
  Division Head confirms the name on the account and the amount, and releases funds
Question: Does fraud require collusion?
Impossibility Mining – Dual Factor, cont’d
EFT Audit
  Compared actual EFTs for internal consistency
  Looked for EFTs where the customer ID was the same but the bank routing number was different
  Identified a manager who was manually changing routing information to funnel funds to her husband’s account
  The 3rd set of eyes (Division Head) did not help – an ineffective control
Two process changes
  Only the Division Head can add EFT information
  Automated check implemented to flag EFTs where the bank name does not match the routing number
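Both consistency checks can be sketched in a few lines. The record field names (`customer_id`, `routing`, `bank_name`), the placeholder routing values, and the routing-number directory are hypothetical; in practice the directory would come from an official routing-number reference.

```python
from collections import defaultdict


def inconsistent_routing(efts):
    """Customer IDs that were paid to more than one routing number."""
    seen = defaultdict(set)
    for eft in efts:
        seen[eft["customer_id"]].add(eft["routing"])
    return {cid: routes for cid, routes in seen.items() if len(routes) > 1}


def bank_name_mismatches(efts, routing_directory):
    """EFTs whose stated bank name differs from the bank registered to the routing number.

    Routing numbers missing from `routing_directory` are reported too.
    """
    return [eft for eft in efts if routing_directory.get(eft["routing"]) != eft["bank_name"]]


if __name__ == "__main__":
    # Hypothetical records with placeholder routing values.
    efts = [
        {"customer_id": "C100", "routing": "R-1", "bank_name": "First Bank"},
        {"customer_id": "C100", "routing": "R-2", "bank_name": "First Bank"},
        {"customer_id": "C200", "routing": "R-3", "bank_name": "Second Bank"},
    ]
    directory = {"R-1": "First Bank", "R-2": "Other Bank", "R-3": "Second Bank"}
    print(inconsistent_routing(efts))
    print(bank_name_mismatches(efts, directory))
```

A customer who legitimately changed banks will also appear in the first check, so hits are leads for review rather than proof.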
Impossibility Mining – Data Joining
Unauthorized Computer Access
  Created a table of physical sites
  Calculated the minimum travel time between sites
  Identified anyone logging in to machines at 2 sites where the time between logins was less than the minimum travel time
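The join between login records and the site travel-time table can be sketched as follows. The login tuple layout and the `frozenset`-keyed travel-time table are assumptions made for illustration; the check compares each user’s consecutive logins only.

```python
def impossible_travel(logins, min_travel_minutes):
    """Consecutive logins by the same user at two sites, closer in time than travel allows.

    `logins`: iterable of (user, site, minutes) tuples.
    `min_travel_minutes`: dict keyed by frozenset({site_a, site_b}) -> minimum travel time.
    """
    by_user = {}
    for user, site, t in logins:
        by_user.setdefault(user, []).append((t, site))
    hits = []
    for user, events in by_user.items():
        events.sort()  # chronological order per user
        for (t1, s1), (t2, s2) in zip(events, events[1:]):
            if s1 == s2:
                continue
            needed = min_travel_minutes.get(frozenset((s1, s2)))
            if needed is not None and t2 - t1 < needed:
                hits.append((user, s1, t1, s2, t2))
    return hits


if __name__ == "__main__":
    # Hypothetical sites and logins; HQ and Plant are assumed 2 hours apart.
    travel = {frozenset(("HQ", "Plant")): 120}
    logins = [
        ("bob", "HQ", 0), ("bob", "Plant", 30), ("bob", "HQ", 500),
        ("carol", "HQ", 0), ("carol", "Plant", 300),
    ]
    print(impossible_travel(logins, travel))
```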
Impossibility Mining – Data Joining, cont’d
Identified several stolen passwords
Also highlighted password sharing
…as well as user passwords hard-coded in applications
Impossibility Mining – Conclusions
The less likely something is to occur, the better its candidacy for impossibility mining
Controls can always be implemented to prevent the “impossibilities”, but they are not always implemented correctly
Best example in the media: an insurance fraud case where men were claiming hysterectomies, ovarian cyst removal, Pap tests…
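The insurance example is a two-field impossibility rule. A minimal sketch, assuming hypothetical claim records with `claim_id`, `sex`, and `procedure` fields and a hand-maintained rule table; a hit still needs review, since a data-entry error in the sex field is one innocent explanation.

```python
# Hypothetical rule table: procedures that cannot apply to a given recorded sex.
IMPOSSIBLE_FOR = {
    "M": {"hysterectomy", "ovarian cyst removal", "pap test"},
}


def impossible_claims(claims, rules=IMPOSSIBLE_FOR):
    """IDs of claims whose procedure cannot apply to the patient's recorded sex."""
    return [
        c["claim_id"] for c in claims
        if c["procedure"].lower() in rules.get(c["sex"], set())
    ]


if __name__ == "__main__":
    claims = [
        {"claim_id": 1, "sex": "M", "procedure": "Hysterectomy"},
        {"claim_id": 2, "sex": "F", "procedure": "Hysterectomy"},
        {"claim_id": 3, "sex": "M", "procedure": "Pap test"},
    ]
    print(impossible_claims(claims))
```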
Questions
…Other than “Can we go yet?”