50
Algorithms and Applications in Social Networks 2019/2020, Semester B Slava Novgorodov 1

Algorithms and Applications in Social NetworksMotivation •Fraud is everywhere: –Credit cards fraud –Taxes fraud –Fake companies fraud •It costs our industry billions of dollars

  • Upload
    others

  • View
    2

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Algorithms and Applications in Social NetworksMotivation •Fraud is everywhere: –Credit cards fraud –Taxes fraud –Fake companies fraud •It costs our industry billions of dollars

Algorithms and Applications in Social Networks

2019/2020, Semester B

Slava Novgorodov 1

Page 2: Algorithms and Applications in Social NetworksMotivation •Fraud is everywhere: –Credit cards fraud –Taxes fraud –Fake companies fraud •It costs our industry billions of dollars

Lesson #6

• Social Networks application examples:– Fraud

– Crime

– Terrorism

• Advices for in-practice social network analysis

2

Page 3: Algorithms and Applications in Social NetworksMotivation •Fraud is everywhere: –Credit cards fraud –Taxes fraud –Fake companies fraud •It costs our industry billions of dollars

Fraud detection and prevention

3

Page 4: Algorithms and Applications in Social NetworksMotivation •Fraud is everywhere: –Credit cards fraud –Taxes fraud –Fake companies fraud •It costs our industry billions of dollars

Motivation

• Fraud is everywhere:– Credit cards fraud

– Taxes fraud

– Fake companies fraud

• It costs our industry billions of dollars yearly

4Based on: 1. GOTCHA! IMPROVING FRAUD DETECTION TECHNIQUES slides

2. Neo4j lectures about fraud detection: https://www.youtube.com/watch?v=AeNufTq1W5I

Page 5: Algorithms and Applications in Social NetworksMotivation •Fraud is everywhere: –Credit cards fraud –Taxes fraud –Fake companies fraud •It costs our industry billions of dollars

Fraud detection

Current (non SNA) methods:

• Machine learning algorithms that gives a score to each transaction (i.e. the probability to be fraud)– Improvement directions:

• Better ML algorithms

• More labeled data

• Rules based systems, which usually works as addition to ML techniques (usually written by experts)– Improvement directions:

• Automatic rules generation

• Better sharing of rules between experts

5

Page 6: Algorithms and Applications in Social NetworksMotivation •Fraud is everywhere: –Credit cards fraud –Taxes fraud –Fake companies fraud •It costs our industry billions of dollars

Example of fraud detection

6

ML Score:0.750.910.220.150.71…

Rules:

Page 7: Algorithms and Applications in Social NetworksMotivation •Fraud is everywhere: –Credit cards fraud –Taxes fraud –Fake companies fraud •It costs our industry billions of dollars

Fraud detection

• Basic method: Anomalous behavior detection– Outlier detection: abnormal behavior and/or

characteristics in a data set might often indicate that that person perpetrates suspicious activities.

7

Page 8: Algorithms and Applications in Social NetworksMotivation •Fraud is everywhere: –Credit cards fraud –Taxes fraud –Fake companies fraud •It costs our industry billions of dollars

Fraud detection

• Basic method: Anomalous behavior detection

– Pros: Very simple method

– Cons: A lot of false positives and false negatives

8

Page 9: Algorithms and Applications in Social NetworksMotivation •Fraud is everywhere: –Credit cards fraud –Taxes fraud –Fake companies fraud •It costs our industry billions of dollars

Fraud detection

Current workflow:

9

Page 10: Algorithms and Applications in Social NetworksMotivation •Fraud is everywhere: –Credit cards fraud –Taxes fraud –Fake companies fraud •It costs our industry billions of dollars

Fraud detection

Main (not all) challenges with fraud detection:• Unbalanced:

– Extremely skewed class distribution

– Big data, but only few fraudulent observations (often < 1%)

• Well-considered & Carefully organized:– Complex fraud structures are carefully planned

– Outlier detection no longer sufficient: combination of patterns, preferably well-hidden

– Relationships between fraudsters

• Imperceptibly concealed– Subtlety of fraud: imitating normal behavior, even in identify theft

– Fraudsters are often first “sleeping”, pretending to be a good customer

10

Page 11: Algorithms and Applications in Social NetworksMotivation •Fraud is everywhere: –Credit cards fraud –Taxes fraud –Fake companies fraud •It costs our industry billions of dollars

Social Networks Analysis for Fraud Detection

Model interactions as a network:• Nodes:

– People (Fraudsters/Victims)– Banks– Companies– Resources– ….

• Links:– Credit Card transactions– Loans– “belongs to” relation, “works at” relation …– …

11

Page 12: Algorithms and Applications in Social NetworksMotivation •Fraud is everywhere: –Credit cards fraud –Taxes fraud –Fake companies fraud •It costs our industry billions of dollars

Visualization can help!

Modeling as a network can help even if you just visualize it…

12

FRAUD

Page 13: Algorithms and Applications in Social NetworksMotivation •Fraud is everywhere: –Credit cards fraud –Taxes fraud –Fake companies fraud •It costs our industry billions of dollars

Visualization can help!

Modeling as a network can help even if you just visualize it…

13

LEGITIMATE

Page 14: Algorithms and Applications in Social NetworksMotivation •Fraud is everywhere: –Credit cards fraud –Taxes fraud –Fake companies fraud •It costs our industry billions of dollars

Bipartite graphs folding

Folding:

Connect every red node

to other red node if they

are connected to same

green node

14

Page 15: Algorithms and Applications in Social NetworksMotivation •Fraud is everywhere: –Credit cards fraud –Taxes fraud –Fake companies fraud •It costs our industry billions of dollars

Bipartite graphs weighted folding

Folding:

Connect every blue node to other blue node if they

are connected to same orange node.

If the node already exists, add 1 to its weight

15

Page 16: Algorithms and Applications in Social NetworksMotivation •Fraud is everywhere: –Credit cards fraud –Taxes fraud –Fake companies fraud •It costs our industry billions of dollars

Fraud analysis “basic scheme”

1. Take the data and represent it as a network

2. Decide of the “sides” of the bipartite network

3. Fold it

4. Detect cliques, detect communities, measure centrality…

16

Page 17: Algorithms and Applications in Social NetworksMotivation •Fraud is everywhere: –Credit cards fraud –Taxes fraud –Fake companies fraud •It costs our industry billions of dollars

Homophily

• People tend to associate with other whom they perceive as being similar to themselves in some way. e.g.: same city, hobbies, interests…

17

Page 18: Algorithms and Applications in Social NetworksMotivation •Fraud is everywhere: –Credit cards fraud –Taxes fraud –Fake companies fraud •It costs our industry billions of dollars

Insurance fraud

• Combining different types of links in one network can give much more information

18

Page 19: Algorithms and Applications in Social NetworksMotivation •Fraud is everywhere: –Credit cards fraud –Taxes fraud –Fake companies fraud •It costs our industry billions of dollars

Insurance fraud

• Combining different types of links in one network can give much more information

19

Page 20: Algorithms and Applications in Social NetworksMotivation •Fraud is everywhere: –Credit cards fraud –Taxes fraud –Fake companies fraud •It costs our industry billions of dollars

Credit Card Fraud

20

Very sensitive data

Page 21: Algorithms and Applications in Social NetworksMotivation •Fraud is everywhere: –Credit cards fraud –Taxes fraud –Fake companies fraud •It costs our industry billions of dollars

Taxes Fraud

21

“Spider construction” fraud scheme – open a company, allocate resources,Bankrupt the company, move the resources…

Page 22: Algorithms and Applications in Social NetworksMotivation •Fraud is everywhere: –Credit cards fraud –Taxes fraud –Fake companies fraud •It costs our industry billions of dollars

The solution

• System called Gotcha! (Gotch’all):(by Van Vlasselaer et al.)

22

Our focus

Page 23: Algorithms and Applications in Social NetworksMotivation •Fraud is everywhere: –Credit cards fraud –Taxes fraud –Fake companies fraud •It costs our industry billions of dollars

Individual Scoring

23

Page 24: Algorithms and Applications in Social NetworksMotivation •Fraud is everywhere: –Credit cards fraud –Taxes fraud –Fake companies fraud •It costs our industry billions of dollars

Clique detection

24

“Complete” clique

“Partial” clique

Page 25: Algorithms and Applications in Social NetworksMotivation •Fraud is everywhere: –Credit cards fraud –Taxes fraud –Fake companies fraud •It costs our industry billions of dollars

Clique scoring

25

Suspiciousness of the clique: How many bankrupts? How many frauds?

Page 26: Algorithms and Applications in Social NetworksMotivation •Fraud is everywhere: –Credit cards fraud –Taxes fraud –Fake companies fraud •It costs our industry billions of dollars

Empirical evaluation

• 5 companies, 2 resources

• 4 out of 5 companies are bankrupt

• What about the last company?

26

Page 27: Algorithms and Applications in Social NetworksMotivation •Fraud is everywhere: –Credit cards fraud –Taxes fraud –Fake companies fraud •It costs our industry billions of dollars

Empirical evaluation

• 5 companies, 2 resources

• 4 out of 5 companies are bankrupt

• What about the last company?

27

Page 28: Algorithms and Applications in Social NetworksMotivation •Fraud is everywhere: –Credit cards fraud –Taxes fraud –Fake companies fraud •It costs our industry billions of dollars

Crime detection

28

Page 29: Algorithms and Applications in Social NetworksMotivation •Fraud is everywhere: –Credit cards fraud –Taxes fraud –Fake companies fraud •It costs our industry billions of dollars

Motivation

• Crime is often well organized, with individuals formed into groups/gangs, with structure and hierarchy.

• Crimes have a lot of “meta-data”, that can be better modeled as a network

29Based on: 1. http://liacs.leidenuniv.nl/~takesfw/SNACS/lecture3.pdf

Page 30: Algorithms and Applications in Social NetworksMotivation •Fraud is everywhere: –Credit cards fraud –Taxes fraud –Fake companies fraud •It costs our industry billions of dollars

Dutch Police example

• Gain insight in social networks of soccer fans, group formation and organization

• Dataset: all entries in police systems of law violations of a particular group of people involved in soccer violence

30

Page 31: Algorithms and Applications in Social NetworksMotivation •Fraud is everywhere: –Credit cards fraud –Taxes fraud –Fake companies fraud •It costs our industry billions of dollars

Dutch Police example

31

Page 32: Algorithms and Applications in Social NetworksMotivation •Fraud is everywhere: –Credit cards fraud –Taxes fraud –Fake companies fraud •It costs our industry billions of dollars

Dutch Police example - Dataset

32

Page 33: Algorithms and Applications in Social NetworksMotivation •Fraud is everywhere: –Credit cards fraud –Taxes fraud –Fake companies fraud •It costs our industry billions of dollars

Dutch Police example - Dataset

33

Folded bipartite graph (people and incidents):

Page 34: Algorithms and Applications in Social NetworksMotivation •Fraud is everywhere: –Credit cards fraud –Taxes fraud –Fake companies fraud •It costs our industry billions of dollars

Dutch Police example - Visualization

34

Page 35: Algorithms and Applications in Social NetworksMotivation •Fraud is everywhere: –Credit cards fraud –Taxes fraud –Fake companies fraud •It costs our industry billions of dollars

Dutch Police example - Centrality

35

Page 36: Algorithms and Applications in Social NetworksMotivation •Fraud is everywhere: –Credit cards fraud –Taxes fraud –Fake companies fraud •It costs our industry billions of dollars

Dutch Police example - Centrality

36

Page 37: Algorithms and Applications in Social NetworksMotivation •Fraud is everywhere: –Credit cards fraud –Taxes fraud –Fake companies fraud •It costs our industry billions of dollars

Dutch Police example - Communities

37

Page 38: Algorithms and Applications in Social NetworksMotivation •Fraud is everywhere: –Credit cards fraud –Taxes fraud –Fake companies fraud •It costs our industry billions of dollars

More examples from PD

• Kansas City crime – “Operation Clean Sweep” (2013):– Historically, one of the top 10 most violent cities in the US

– Averages 106 homicides per year

– Averages 3,484 aggravated assaults per year

• Results:

• Details: https://www.nationalpublicsafetypartnership.org/Documents/VRN%20Social%20Network%20Analysis%20Presentation%20July%2021%202015.pdf 38

Page 39: Algorithms and Applications in Social NetworksMotivation •Fraud is everywhere: –Credit cards fraud –Taxes fraud –Fake companies fraud •It costs our industry billions of dollars

Finding Terrorists Cells

39

Page 40: Algorithms and Applications in Social NetworksMotivation •Fraud is everywhere: –Credit cards fraud –Taxes fraud –Fake companies fraud •It costs our industry billions of dollars

9/11 Case Study

• Analyzing such networks is much easier in past, not in future. But still important for the prosecution and potentially detecting other members

• Based on Valdis E. Krebs analysishttp://insna.org/PDF/Connections/v24/2001_I-3-7.pdf

40

Page 41: Algorithms and Applications in Social NetworksMotivation •Fraud is everywhere: –Credit cards fraud –Taxes fraud –Fake companies fraud •It costs our industry billions of dollars

9/11 Case Study

41

Page 42: Algorithms and Applications in Social NetworksMotivation •Fraud is everywhere: –Credit cards fraud –Taxes fraud –Fake companies fraud •It costs our industry billions of dollars

9/11 Case Study

• The beginning (January 2000):

42

Page 43: Algorithms and Applications in Social NetworksMotivation •Fraud is everywhere: –Credit cards fraud –Taxes fraud –Fake companies fraud •It costs our industry billions of dollars

9/11 Case Study

• USS Cole attack (October 2000)

43

Page 44: Algorithms and Applications in Social NetworksMotivation •Fraud is everywhere: –Credit cards fraud –Taxes fraud –Fake companies fraud •It costs our industry billions of dollars

9/11 Case Study

44

Page 45: Algorithms and Applications in Social NetworksMotivation •Fraud is everywhere: –Credit cards fraud –Taxes fraud –Fake companies fraud •It costs our industry billions of dollars

9/11 Case Study

45

Page 46: Algorithms and Applications in Social NetworksMotivation •Fraud is everywhere: –Credit cards fraud –Taxes fraud –Fake companies fraud •It costs our industry billions of dollars

9/11 Case Study

46

Final meetings (shortcuts) in gold

Page 47: Algorithms and Applications in Social NetworksMotivation •Fraud is everywhere: –Credit cards fraud –Taxes fraud –Fake companies fraud •It costs our industry billions of dollars

9/11 Case Study

47

Data to build the network

Page 48: Algorithms and Applications in Social NetworksMotivation •Fraud is everywhere: –Credit cards fraud –Taxes fraud –Fake companies fraud •It costs our industry billions of dollars

Technologies in practice

• Small networks or Initial/Partial analysis:– Python / NetworkX

• Huge networks– Graph databases, such as Neo4j

– Distributed systems like Spark/Hadoop

48

Page 49: Algorithms and Applications in Social NetworksMotivation •Fraud is everywhere: –Credit cards fraud –Taxes fraud –Fake companies fraud •It costs our industry billions of dollars

Visualization, visualization, visualization…

• Very useful in Social Network analysis, helps faster identify patters and important details

49

Page 50: Algorithms and Applications in Social NetworksMotivation •Fraud is everywhere: –Credit cards fraud –Taxes fraud –Fake companies fraud •It costs our industry billions of dollars

Thank you!Questions?

50