About Me
• Max De Marzi - Neo4j Field Engineer
• My Blog: http://maxdemarzi.com
• Find me on Twitter: @maxdemarzi
• Email me: [email protected]
• GitHub: http://github.com/maxdemarzi
Overview
Types of Fraud
• Credit Card Fraud
• First-Party Fraud
• Synthetic Identities and Fraud Rings
• Insurance Fraud
Types of Analysis
• Traditional Analysis
• Graph-Based Analysis
Fraud Detection and Prevention Common Questions
I don't know, but I know who does
• Alex Beutel, CMU
• Leman Akoglu, Stony Brook
• Christos Faloutsos, CMU
• Graph-Based User Behavior Modeling: From Prediction to Fraud Detection
• http://www.cs.cmu.edu/~abeutel/kdd2015_tutorial/
User Behavior Challenges
• How can we understand normal user behavior?
• How can we find suspicious behavior?
• How can we distinguish the two?
Understanding our Users
MATCH (u:User)-[r:RATED]->(m:Movie)
RETURN u.gender, u.age, COUNT(DISTINCT u) AS user_cnt, COUNT(DISTINCT m) AS mov_cnt, COUNT(r) AS rtg_cnt
Understanding our Users
MATCH (me:User {id: 1})-[r1:RATED]->(m:Movie)<-[r2:RATED]-(similar_users:User)
WHERE ABS(r1.stars - r2.stars) <= 1
RETURN similar_users.gender, similar_users.age, COUNT(DISTINCT similar_users) AS user_cnt, COUNT(r2) AS rtg_cnt
What do Little Girls Like?
MATCH (u:User)-[r:RATED]->(m:Movie)
WHERE u.age = 1 AND u.gender = "F" AND r.stars > 3
RETURN m.title, COUNT(r) AS cnt
ORDER BY cnt DESC
LIMIT 10
What do Men 25-34 Like?
MATCH (u:User)-[r:RATED]->(m:Movie)
WHERE u.age = 25 AND u.gender = "M" AND r.stars > 3
RETURN m.title, COUNT(r) AS cnt
ORDER BY cnt DESC
LIMIT 10
Content Based Recommendations
• Step 1: Collect Item Characteristics
• Step 2: Find similar Items
• Step 3: Recommend Similar Items
• Example: Similar Movie Genres
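The three content-based steps can be sketched in a few lines of Python. This is only an illustration: the catalog, the genre sets and the choice of Jaccard overlap as the similarity measure are assumptions made for the example, not something the slides prescribe.

```python
# Sketch: content-based similarity between movies using genre overlap.
# The genre sets below are made-up sample data.

def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def similar_items(target, catalog, top_n=3):
    """Step 2 and 3: rank the other items by genre overlap with the target."""
    scores = [
        (title, jaccard(catalog[target], genres))
        for title, genres in catalog.items()
        if title != target
    ]
    # Highest overlap first; ties broken alphabetically for stable output.
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return scores[:top_n]

# Step 1: collect item characteristics (here, genres per movie).
catalog = {
    "Toy Story": {"Animation", "Children", "Comedy"},
    "A Bug's Life": {"Animation", "Children", "Comedy"},
    "Heat": {"Action", "Crime", "Thriller"},
    "Aladdin": {"Animation", "Children", "Musical"},
}

print(similar_items("Toy Story", catalog))
```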
Collaborative Filtering Recommendations
• Step 1: Collect User Behavior
• Step 2: Find similar Users
• Step 3: Recommend Behavior taken by similar users
• Example: People with similar musical tastes
Using Relationships for Recommendations
• Content-based filtering: Recommend items based on what users have liked in the past
• Collaborative filtering: Predict what users like based on the similarity of their behaviors, activities and preferences to others
[Diagram: two Person nodes and a Movie node, connected by a RATED relationship (rating: 7) and a SIMILARITY relationship (value: .92)]
Cypher Query: Movie Recommendation
MATCH (watched:Movie {title: "Toy Story"})<-[r1:RATED]-()-[r2:RATED]->(unseen:Movie)
WHERE r1.rating > 7 AND r2.rating > 7
  AND watched.genres = unseen.genres
  AND NOT ((:Person {username: "maxdemarzi"})-[:RATED]->(unseen))
RETURN unseen.title, COUNT(*)
ORDER BY COUNT(*) DESC
LIMIT 25
What are the Top 25 Movies
• that I haven't seen
• with the same genres as Toy Story
• given high ratings
• by people who liked Toy Story
Cypher Query: Ratings of Two Users
MATCH (p1:Person {name: 'Michael Sherman'})-[r1:RATED]->(m:Movie),
      (p2:Person {name: 'Michael Hunger'})-[r2:RATED]->(m:Movie)
RETURN m.name AS Movie, r1.rating AS `M. Sherman's Rating`, r2.rating AS `M. Hunger's Rating`
What are the Movies these 2 users have both rated
Cypher Query: Cosine Similarity
MATCH (p1:Person)-[x:RATED]->(m:Movie)<-[y:RATED]-(p2:Person)
WITH SUM(x.rating * y.rating) AS xyDotProduct,
     SQRT(REDUCE(xDot = 0.0, a IN COLLECT(x.rating) | xDot + a^2)) AS xLength,
     SQRT(REDUCE(yDot = 0.0, b IN COLLECT(y.rating) | yDot + b^2)) AS yLength,
     p1, p2
MERGE (p1)-[s:SIMILARITY]-(p2)
SET s.similarity = xyDotProduct / (xLength * yLength)
Calculate it for all Person nodes with at least one Movie between them
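The same calculation can be checked by hand outside the database. The Python sketch below mirrors the query: the dot product and both vector lengths are taken over only the movies that both people rated. The function name and the sample ratings are assumptions made for illustration.

```python
import math

def cosine_similarity(ratings_a, ratings_b):
    """Cosine similarity over the movies both people have rated.

    ratings_a and ratings_b map movie title -> numeric rating.
    Returns None when there is no co-rated movie or a vector has zero length.
    """
    common = set(ratings_a) & set(ratings_b)
    if not common:
        return None
    dot = sum(ratings_a[m] * ratings_b[m] for m in common)
    len_a = math.sqrt(sum(ratings_a[m] ** 2 for m in common))
    len_b = math.sqrt(sum(ratings_b[m] ** 2 for m in common))
    if len_a == 0 or len_b == 0:
        return None
    return dot / (len_a * len_b)

# Made-up sample data: "Aladdin" is ignored because only one person rated it.
p1 = {"Toy Story": 4, "Heat": 3}
p2 = {"Toy Story": 4, "Heat": 3, "Aladdin": 5}
print(cosine_similarity(p1, p2))
```

Because only co-rated movies enter the calculation, two people who agree on a single shared movie score a perfect 1.0, so in practice it may be worth requiring a minimum number of shared ratings.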
Cypher Query: Your nearest neighbors
MATCH (p1:Person {name: 'Grace Andrews'})-[s:SIMILARITY]-(p2:Person)
WITH p2, s.similarity AS sim
RETURN p2.name AS Neighbor, sim AS Similarity
ORDER BY sim DESC
LIMIT 5
Who are the
• top 5 Persons and their similarity score
• ordered by similarity in descending order
• for Grace Andrews
Cypher Query: k-NN Recommendation
MATCH (m:Movie)<-[r:RATED]-(b:Person)-[s:SIMILARITY]-(p:Person {name: 'Zoltan Varju'})
WHERE NOT ((p)-[:RATED]->(m))
WITH m, s.similarity AS similarity, r.rating AS rating
ORDER BY m.name, similarity DESC
WITH m.name AS movie, COLLECT(rating)[0..3] AS ratings
WITH movie, REDUCE(s = 0, i IN ratings | s + i) * 1.0 / LENGTH(ratings) AS recommendation
ORDER BY recommendation DESC
RETURN movie, recommendation
LIMIT 25
What are the Top 25 Movies
• that Zoltan Varju has not seen
• using the average rating
• by their top 3 neighbors
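To make the k-NN step concrete, here is a rough Python equivalent of the query's logic: for each movie the target has not rated, take the ratings of up to three of the most similar neighbors who rated it and average them. The data structures, names other than Zoltan and sample numbers are assumptions for illustration only.

```python
def knn_recommend(target, ratings, similarity, k=3, limit=25):
    """Average the ratings of the k most similar neighbors per unseen movie.

    ratings:    person -> {movie: rating}
    similarity: person -> {neighbor: similarity score}
    """
    seen = set(ratings.get(target, {}))
    # Visit neighbors from most to least similar, like ORDER BY similarity DESC.
    neighbors = sorted(similarity.get(target, {}).items(), key=lambda p: -p[1])
    per_movie = {}
    for neighbor, _sim in neighbors:
        for movie, stars in ratings.get(neighbor, {}).items():
            if movie in seen:
                continue
            votes = per_movie.setdefault(movie, [])
            if len(votes) < k:  # keep only the first k ratings, like [0..3]
                votes.append(stars)
    recs = [(movie, sum(v) / len(v)) for movie, v in per_movie.items()]
    recs.sort(key=lambda p: (-p[1], p[0]))
    return recs[:limit]

# Made-up sample data.
ratings = {
    "Zoltan": {"Toy Story": 5},
    "A": {"Toy Story": 4, "Heat": 4, "Aladdin": 5},
    "B": {"Heat": 2},
    "C": {"Heat": 3, "Aladdin": 3},
    "D": {"Heat": 1},
}
similarity = {"Zoltan": {"A": 0.9, "B": 0.8, "C": 0.7, "D": 0.1}}

print(knn_recommend("Zoltan", ratings, similarity))
```

In the sample, D's low rating of "Heat" is ignored because three closer neighbors already rated that movie.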
What Rating should I give 101 Dalmatians?
MATCH (me:User {id: 1})-[r1:RATED]->(m:Movie)<-[r2:RATED]-(:User)-[r3:RATED]->(m2:Movie {title: "101 Dalmatians"})
WHERE ABS(r1.stars - r2.stars) <= 1
RETURN AVG(r3.stars)
Modeling "Normal" Behavior
• Predict Edges
• Predict Node Attributes
• Predict Edge Attributes
• Clustering and Community Detection
Predict a Star Rating purely on Demographics
MATCH (u:User)-[r:RATED]->(m:Movie {title: "Toy Story"})
WHERE u.age = 1 AND u.gender = "F"
RETURN AVG(r.stars)
Modeling "Normal" Behavior
• Predict Edges
• Predict Node Attributes
• Predict Edge Attributes
• Clustering and Community Detection
• Fraud Detection
Two Sides of the Same Coin
Recommendations
• Add the relationship that does not exist
Fraud Detection
• Find the relationships that should not exist
Modeling User Behavior
• Modeling normal users and detecting anomalies are two sides of understanding user behavior
• Rough model of normal vs outlier
• Fine-grained models can find more subtle outliers
• Complex models can capture normal and abnormal patterns
• Known fraudulent patterns can be searched for directly
Find the Nodes
// Look up the node for each identifier supplied with the transaction.
ArrayList<Node> nodes = new ArrayList<Node>();
nodes.add(db.findNode(Labels.CC, "number", card));
nodes.add(db.findNode(Labels.Phone, "number", phone));
nodes.add(db.findNode(Labels.Email, "address", address));
nodes.add(db.findNode(Labels.IP, "address", ip));
Add the Crosses
// For each identifier, count how many related nodes of each type it touches.
for (Node node : nodes) {
    HashMap<String, AtomicInteger> crosses = new HashMap<String, AtomicInteger>();
    crosses.put("CCs", new AtomicInteger(0));
    crosses.put("Phones", new AtomicInteger(0));
    crosses.put("Emails", new AtomicInteger(0));
    crosses.put("IPs", new AtomicInteger(0));
    for (Relationship relationship : node.getRelationships(RELATED, Direction.BOTH)) {
        Node thing = relationship.getOtherNode(node);
        // The first label plus "s" selects the counter, e.g. "Phone" -> "Phones".
        String type = thing.getLabels().iterator().next().name() + "s";
        crosses.get(type).getAndIncrement();
    }
    results.add(crosses);
}
Examine Results
[{"ips":4,"emails":7,"ccs":0,"phones":4}, -- cc returned 4 ips, 7 emails, and 3 phones.
 {"ips":1,"emails":1,"ccs":1,"phones":0}, -- phone returned just 1 item for each cross reference check.
 {"ips":2,"emails":0,"ccs":4,"phones":3}, -- email returned 2 ips, 4 credit cards and 3 phones.
 {"ips":0,"emails":1,"ccs":3,"phones":2}] -- ip returned 3 credit cards and 2 phones.
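The cross-reference check is simple enough to prototype without a database. The Python sketch below counts, for one identifier, how many related identifiers of each type it is connected to. The adjacency-dict representation, the (label, value) tuples and the sample values are all assumptions made for this example, not the Neo4j API.

```python
def cross_counts(graph, node):
    """Count the neighbors of `node` by label.

    graph maps a (label, value) tuple to the list of (label, value)
    tuples it is related to.
    """
    counts = {"CC": 0, "Phone": 0, "Email": 0, "IP": 0}
    for label, _value in graph.get(node, ()):
        counts[label] += 1
    return counts

# Made-up sample data: one card linked to a phone, two emails and an IP.
card = ("CC", "4111")
phone = ("Phone", "555-0100")
graph = {
    card: [
        phone,
        ("Email", "a@example.com"),
        ("Email", "b@example.com"),
        ("IP", "10.0.0.1"),
    ],
    phone: [card],
}

print(cross_counts(graph, card))
print(cross_counts(graph, phone))
```

A card tied to many emails or phones, as in the first result row above, is the kind of count that would stand out for review.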
What are some useful subgraphs?
Graph queries: find subgraphs of a particular pattern
Triangle:
MATCH (a)--(b)--(c)--(a) RETURN *
Four-node clique:
MATCH (a)--(b)--(c)--(d)--(a)--(c), (d)--(b) RETURN *
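For intuition, the triangle pattern can also be found with a few lines of Python over an edge list. This is a minimal sketch with made-up node names; unlike the Cypher pattern, which returns a row for every ordering of the same three nodes, it reports each triangle once.

```python
from itertools import combinations

def find_triangles(edges):
    """Return every triangle in an undirected graph as a sorted tuple."""
    adj = {}
    for a, b in edges:
        if a == b:
            continue  # ignore self-loops
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    triangles = set()
    for node, neighbors in adj.items():
        # Two neighbors of the same node that are also connected close a triangle.
        for x, y in combinations(sorted(neighbors), 2):
            if y in adj[x]:
                triangles.add(tuple(sorted((node, x, y))))
    return sorted(triangles)

edges = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")]
print(find_triangles(edges))
```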
Ego-net Patterns
N_i: number of neighbors of ego i
E_i: number of edges in egonet i
W_i: total weight of egonet i
λ_{w,i}: principal eigenvalue of the weighted adjacency matrix of egonet i
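The first three egonet features can be computed directly from a weighted edge list. The Python sketch below is one way to do it, assuming an undirected graph given as (node, node, weight) triples with made-up sample values; the principal eigenvalue is left out because it would need a linear algebra routine.

```python
def egonet_features(weighted_edges, ego):
    """Compute N (neighbors), E (edges) and W (total weight) of an egonet.

    The egonet is the ego, its direct neighbors, and every edge among them.
    """
    neighbors = set()
    for a, b, _w in weighted_edges:
        if a == b:
            continue  # ignore self-loops
        if a == ego:
            neighbors.add(b)
        elif b == ego:
            neighbors.add(a)
    members = neighbors | {ego}
    edge_count = 0
    total_weight = 0.0
    for a, b, w in weighted_edges:
        if a != b and a in members and b in members:
            edge_count += 1
            total_weight += w
    return {"N": len(neighbors), "E": edge_count, "W": total_weight}

# Made-up sample data: the edge b-c falls outside the egonet of "i".
weighted_edges = [
    ("i", "a", 1.0),
    ("i", "b", 2.0),
    ("a", "b", 0.5),
    ("b", "c", 4.0),
]
print(egonet_features(weighted_edges, "i"))
```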
Find Groups within Ego-Nets
Link
First-Party Fraud
• Fraudster's aim: apply for lines of credit, act normally, extend credit, then… run off with it
• Fabricate a network of synthetic IDs, aggregate smaller lines of credit into substantial value
• Often a hidden problem since only banks are hit
• Whereas third-party fraud involves customers whose identities are stolen
• More on that later…
So what?
• Tens of billions of dollars lost by banks every year
• 25% of the total consumer credit write-offs in the USA
• Around 20% of unsecured bad debt in E.U. and N.A. is misclassified
• In reality it is first-party fraud
Then the fraud happens…
• Revolving doors strategy
• Money moves from account to account to provide legitimate transaction history
• Banks duly increase credit lines
• Observed responsible credit behavior
• Fraudsters max out all lines of credit and then bust out
…and the Bank loses
• Collections process ensues
• Real addresses are visited
• Fraudsters deny all knowledge of synthetic IDs
• Bank writes off debt
• Two fraudsters can easily rack up $80k
• Well organized crime rings can rack up many times that
…and Makes it Hard to React
• When the bust out starts to happen, how do you know what to cancel?
• And how do you do it faster than the fraudster to limit your losses?
• A graph, that's how!
Probable Cohabiters Query
MATCH (p1:Person)-[:HOLDS|LIVES_AT*]->() <-[:HOLDS|LIVES_AT*]-(p2:Person) WHERE p1 <> p2 RETURN DISTINCT p1
Risky People
MATCH (p1:Person)-[:HOLDS|LIVES_AT]->() <-[:HOLDS|LIVES_AT]-(p2:Person) -[:HOLDS|LIVES_AT]->() <-[:HOLDS|LIVES_AT]-(p3:Person) WHERE p1 <> p2 AND p2 <> p3 AND p3 <> p1 WITH collect (p1.name) + collect(p2.name) + collect(p3.name) AS names UNWIND names AS fraudster RETURN DISTINCT fraudster
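The chain this query looks for (three distinct people linked through two shared accounts or addresses) can be sketched in plain Python. The `holdings` mapping, the identifier names and every person other than Sol are hypothetical sample data, and the sketch assumes at most one relationship between a person and any given item.

```python
def risky_people(holdings):
    """Find people on a chain p1 - x - p2 - y - p3 of shared items.

    holdings maps person -> set of things they hold or live at.
    x and y must be different items held by the middle person.
    """
    holders = {}
    for person, items in holdings.items():
        for item in items:
            holders.setdefault(item, set()).add(person)
    risky = set()
    for middle, items in holdings.items():
        for x in items:
            for y in items:
                if x == y:
                    continue
                for p1 in holders[x] - {middle}:
                    for p3 in holders[y] - {middle, p1}:
                        risky.update((p1, middle, p3))
    return sorted(risky)

# Made-up sample data: Sol shares an address with Ann and a card with Bob.
holdings = {
    "Sol": {"addr1", "card1"},
    "Ann": {"addr1"},
    "Bob": {"card1"},
    "Eve": {"addr2"},
}
print(risky_people(holdings))
```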
Localize the focus
MATCH (p1:Person {name:'Sol'})-[:HOLDS|LIVES_AT]-()…
Number of fraudsters: [5] Time taken: [13] ms
Quickly Revoke Cards in Bust-Out
MATCH (p1:Person)-[:HOLDS|LIVES_AT]->() <-[:HOLDS|LIVES_AT]-(p2:Person) -[:HOLDS|LIVES_AT]->() <-[:HOLDS|LIVES_AT]-(p3:Person) WHERE p1 <> p2 AND p2 <> p3 AND p3 <> p1 WITH collect (p1) + collect(p2)+ collect(p3) AS names UNWIND names AS fraudster MATCH (fraudster)-[o:OWNS]->(card:CreditCard) DELETE o, card
Whiplash for Cash
http://georgia-clinic.com/blog/wp-content/uploads/2013/10/whiplash.jpg http://cdn2.holytaco.com/wp-content/uploads/2012/06/lottery-winner.jpg
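Whiplash for Cash Example
[Diagram: example graph with Accidents, Cars, People, Doctor and Attorney nodes; People appear as Drivers, Passengers and Witnesses, linked to Cars by Drives and Is Passenger relationships]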
Risk
• $80,000,000,000 lost annually to auto insurance fraud, and growing
• Even small % reductions are worthwhile!
• British policyholders pay ~£100 per year to cover fraud
• US drivers pay $200-$300 per year according to the US National Insurance Crime Bureau
Regular Drivers Query
MATCH (p:Person)-[:DRIVES]->(c:Car) WHERE NOT (p)<-[:BRIEFED]-(:Lawyer) AND NOT (p)<-[:EXAMINED]-(:Doctor) AND NOT (p)-[:WITNESSED]->(:Car) AND NOT (p)-[:PASSENGER_IN]->(:Car) RETURN p,c LIMIT 100
Genuine Claimants Query
MATCH (p:Person)-[:DRIVES]->(:Car), (p)<-[:BRIEFED]-(:Lawyer), (p)<-[:EXAMINED]-(:Doctor) OPTIONAL MATCH (p)-[w:WITNESSED]->(:Car), (p)-[pi:PASSENGER_IN]->(:Car) RETURN p, count(w) AS noWitnessed, count(pi) as noPassengerIn
Fraudsters
MATCH (p:Person)-[:DRIVES]->(:Car), (p)<-[:BRIEFED]-(:Lawyer), (p)<-[:EXAMINED]-(:Doctor), (p)-[w:WITNESSED]->(:Car), (p)-[pi:PASSENGER_IN]->(:Car) WITH p, count(w) AS noWitnessed, count(pi) as noPassengerIn WHERE noWitnessed > 1 OR noPassengerIn > 1 RETURN p
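The intent of the fraudster query can be expressed as a simple rule over per-person records. The Python sketch below is an approximation under stated assumptions: the record layout and sample people are made up, and it counts distinct cars witnessed or ridden in, whereas the Cypher counts result rows.

```python
def find_fraudsters(people):
    """Flag claimants who also show up as witness and passenger elsewhere.

    people maps name -> dict with lists under the keys
    "drives", "lawyers", "doctors", "witnessed" and "passenger_in".
    """
    flagged = []
    for name, rec in people.items():
        # A claimant drives a car and has both a lawyer and a doctor involved.
        has_claim = bool(rec.get("drives")) and bool(rec.get("lawyers")) and bool(rec.get("doctors"))
        witnessed = len(rec.get("witnessed", ()))
        passenger = len(rec.get("passenger_in", ()))
        # Like the query, require both roles and more than one of either.
        involved_elsewhere = witnessed >= 1 and passenger >= 1
        if has_claim and involved_elsewhere and (witnessed > 1 or passenger > 1):
            flagged.append(name)
    return sorted(flagged)

# Made-up sample data.
people = {
    "Alice": {"drives": ["car1"], "lawyers": ["L1"], "doctors": ["D1"],
              "witnessed": [], "passenger_in": []},
    "Bob": {"drives": ["car2"], "lawyers": ["L1"], "doctors": ["D1"],
            "witnessed": ["car3", "car4"], "passenger_in": ["car5"]},
    "Carol": {"drives": ["car6"], "lawyers": [], "doctors": [],
              "witnessed": [], "passenger_in": []},
}
print(find_fraudsters(people))
```

In the sample, Alice is a genuine claimant, Carol is a regular driver, and only Bob is flagged.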
Auto-fraud Graph
• Once you have the fraudsters, finding their support team is easy.
• (fraudster)<-[:EXAMINED]-(d:Doctor)
• (fraudster)<-[:BRIEFED]-(l:Lawyer)
• And it's also easy to find their passengers
• (fraudster)-[:DRIVES]->(:Car)<-[:PASSENGER_IN]-(p)
• And easy to find other places where they may have committed fraud
• (fraudster)-[:WITNESSED]->(:Car)
• (fraudster)-[:PASSENGER_IN]->(:Car)
• And you can see this in milliseconds!
Online Payments Fraud (First-Party)
• Stealing credentials is commonplace
• Phishing, malware, simple naïve users
• Buying stolen credit card numbers is easy
• How should one protect against seemingly fine credentials?
• And valid credit card numbers?
We are all little stars
• Username and passwords
• Two-factor auth
• IP addresses, cookies
• Credit card, PayPal account
• Some gaming sites already do some of this
• Arts and crafts platform Etsy has already embraced the idea of a graph of identity
Specific Weighted Identity Query
MATCH (u:User {username:'Jim', password: 'secret'})
OPTIONAL MATCH (u) -[cookie:PROVIDED]->(:Cookie {id:'1234'}) OPTIONAL MATCH (u)-[address:FROM]->(:IP {network:'128.240.0.0'})
RETURN SUM(cookie.weighting) + SUM(address.weighting) AS score
[Slide annotations on the query: Bare Minimum, Other Specific Considerations, Final Decision]
General Weighted Identity Query
MATCH (u:User {username:'Jim', password: 'secret'})
OPTIONAL MATCH (u)-[rel]->() WHERE has(rel.weighting)
RETURN SUM(rel.weighting) AS score
[Slide annotations on the query: Bare Minimum, All Available Weightings, Final Decision]
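The weighted identity idea boils down to: check the credentials first, then add up the weights of every extra piece of evidence that matches. A minimal Python sketch follows, assuming a hypothetical `account` record and made-up weights; the plaintext password comparison only mirrors the slide's query.

```python
def identity_score(account, username, password, supplied):
    """Sum the weights of the supplied evidence that matches the account.

    Returns None when the credentials (the bare minimum) do not match.
    account["evidence"] maps (kind, value) -> weighting.
    supplied maps kind -> value seen on this login attempt.
    """
    if account["username"] != username or account["password"] != password:
        return None
    return sum(
        weight
        for (kind, value), weight in account["evidence"].items()
        if supplied.get(kind) == value
    )

# Made-up sample data; the weights are arbitrary illustration values.
account = {
    "username": "Jim",
    "password": "secret",
    "evidence": {
        ("cookie", "1234"): 0.5,
        ("ip", "128.240.0.0"): 0.25,
    },
}

print(identity_score(account, "Jim", "secret", {"cookie": "1234", "ip": "128.240.0.0"}))
print(identity_score(account, "Jim", "secret", {"cookie": "1234", "ip": "10.0.0.1"}))
```

The final decision would then compare the score against a threshold chosen by the business, which this sketch leaves open.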
From 1st to 3rd Party
• The 1st party identity graph can easily be extended to 3rd party fraud
• Like in the bank fraud ring, fraudsters can mix-n-match claims
• Start with a few phished accounts and expand from there!
Scan for Potential Fraudsters
MATCH (u1:User)--(x)--(u2:User) WHERE u1 <> u2 AND NOT (x:IP) RETURN x
Network in common is OK
Stop specific fraudster network, quickly
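The scan can be illustrated in Python as grouping users by the identifiers they share while skipping IP addresses, since a network in common is acceptable. The `(user, kind, value)` triples and all sample values below are assumptions made for the example.

```python
def shared_identifiers(links):
    """Return non-IP identifiers that are linked to more than one user.

    links is an iterable of (user, kind, value) triples,
    for example ("Jim", "Cookie", "1234").
    """
    users_by_item = {}
    for user, kind, value in links:
        if kind == "IP":
            continue  # sharing a network is OK, so IPs are ignored
        users_by_item.setdefault((kind, value), set()).add(user)
    return {
        item: sorted(users)
        for item, users in users_by_item.items()
        if len(users) > 1
    }

# Made-up sample data: Jim and Kate share a cookie and an IP address.
links = [
    ("Jim", "Cookie", "1234"),
    ("Kate", "Cookie", "1234"),
    ("Jim", "IP", "128.240.0.0"),
    ("Kate", "IP", "128.240.0.0"),
    ("Lee", "Cookie", "9999"),
]
print(shared_identifiers(links))
```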
MATCH path = (u1:User {username: 'Jim'})-[*]-(x)-[*]-(u2:User) WHERE u1<>u2 AND NOT (x:IP) AND NOT (x:User) RETURN path
How do these fit with traditional fraud prevention?
http://www.gartner.com/newsroom/id/1695014
Gartner's Layered Fraud Prevention Approach
Ask for help if you get stuck
• Online training - http://neo4j.com/graphacademy/
• Videos - http://vimeo.com/neo4j/videos
• Use cases - http://www.neotechnology.com/industries-and-use-cases/
• Meetups
• Books to get you started
• http://www.graphdatabases.com
• http://neo4j.com/book-learning-neo4j/
Deep Neural Networks for Bank Fraud
https://www.youtube.com/watch?v=TAer-PeIypI
Fraud Detection starts about half-way (after intro)