Create and use graph visualizations efficiently in your projects.
Kick-start Graph Visualization Projects, by Sébastien Heymann (seb@linkurio.us)
A few words about me: I democratise graph thinking with software. Co-founder of the Gephi project (2008). Co-founder of the Linkurious startup (2013). PhD in computer science, UPMC LIP6 (2013).
A few words about me / Gephi: an open-source project started in 2008, built to solve large graph visualization problems. Latest version downloaded ~400,000 times. http://gephi.org
A few words about me / Linkurious: started in 2012 through a collaboration with Stanford (Mapping the Republic of Letters) and DensityDesign. Now a French startup of 3 people. Linkurious helps companies make sense of data with user-friendly visualization software. We help business analysts, R&D teams, developers and scientists.
Beautiful but unreadable pictures? Let's make graph visualization useful.
Agenda: How to create and use graph visualization successfully?
0. Why?
1. Key takeaways
   a. The 5 questions (practice)
   b. User stories (practice)
   c. Design visualization + interaction
2. Fraud detection use case
3. Q&A
0. Why graph visualization? Huh...
What is a graph? This is a graph. [Figure: nodes linked by the relationships "Father Of", "Father Of" and "Siblings".]
What is a graph? / Nodes & relationships: A graph is a set of nodes linked by relationships. [Figure: the same "Father Of" / "Siblings" graph, with one node and one relationship labeled.]
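A minimal sketch of the same idea in code, assuming a plain Python representation: the graph is stored as (source, relationship, target) triples and turned into a node set plus an adjacency map. The node names are hypothetical, since the slide's figure does not name its nodes. "Siblings" is stored as a single directed edge here for simplicity, even though the relationship itself is symmetric.

```python
from collections import defaultdict

# Hypothetical node names: one father and his two children, who are siblings.
relationships = [
    ("Father", "FATHER_OF", "Child A"),
    ("Father", "FATHER_OF", "Child B"),
    ("Child A", "SIBLING_OF", "Child B"),
]

def build_graph(triples):
    """Return the set of nodes and a map: node -> list of (relationship, neighbor)."""
    nodes = set()
    adjacency = defaultdict(list)
    for source, rel, target in triples:
        nodes.update((source, target))          # every endpoint is a node
        adjacency[source].append((rel, target))  # every triple is a relationship
    return nodes, dict(adjacency)

nodes, adjacency = build_graph(relationships)
print(sorted(nodes))         # ['Child A', 'Child B', 'Father']
print(adjacency["Father"])   # [('FATHER_OF', 'Child A'), ('FATHER_OF', 'Child B')]
```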
Different domains where graphs are important: graphs can be used to model many domains. Social networks: people, objects, movies, restaurants, music... Communications: antennas, servers, phones, people... Supply chains: suppliers, roads, warehouses, products...
Graph visualization can help you in many ways. Do you have a graph project?
Why? "The greatest value of a picture is when it forces us to notice what we never expected to see." (John Tukey, 1962)
How to create and use graph visualization successfully? 1. Key takeaways to kick-start your projects. a. Ask 5 questions. b. Write user stories. c. Design visualization and interaction.
Ask 5 questions / Q1: Data, tadaa? You need data: sourcing, cleaning, updating.
Ask 5 questions / Q1: Data, tadaa? Can you model your data as a graph? Consider sensemaking, scale and complexity. (Image: Martin Grandjean)
Ask 5 questions / Q2: Why use graph visualization in your project? Set up your goal. Administrate: data modelling, database administration. Understand: hypothesis discovery, evidence finding. Monitor: impact analysis, reporting. (Images: XKCD & the web)
Ask 5 questions / Q3: Who will use it? Define personas: data scientist, business analyst, developer, public audience. (Images: PhdComics & Despicable Me)
Ask 5 questions / Q4: What are the constraints? Acknowledge human limits. Short-term memory: about 7 items at most, otherwise the ability to make decisions drops. Vision: displaying more than 10,000 nodes is generally useless.
Ask 5 questions / Q4: What are the constraints? Acknowledge technical limits. Graph size: from 50 nodes to 1B nodes. Machine performance. Server-side vs. client-side rendering. Interactive vs. print.
Ask 5 questions / Q5: How is it used? Define scope. Individual use vs. collaborative work. Artwork vs. integrated into an application.
Ask 5 questions / Summary: the 5 questions
1. What are the data?
2. What is your goal?
3. Who is your end-user?
4. What are the constraints?
5. How is it used?
Ask 5 questions / Your turn! (Practice) Answer the 5 questions for your project.
Write user stories / The developer story: I am creating a Neo4j graph database for my application.
1. I define a data model.
2. I generate a significant graph sample.
3. I create a business query with Cypher.
4. I visualize the query result.
5. I iterate on the data model until it is satisfying.
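Step 2 does not say how the sample is produced. One common technique, shown here only as an assumption and not as the author's method, is a breadth-first "snowball" sample: start from a seed node and grow outward until a node budget is reached, then keep the edges between sampled nodes. The adjacency map and the function name `snowball_sample` are hypothetical.

```python
from collections import deque

def snowball_sample(adjacency, seed, max_nodes):
    """Breadth-first 'snowball' sample grown from `seed`, capped at `max_nodes`.

    `adjacency` maps node -> iterable of neighbors (undirected graphs list
    each edge in both directions, so both directions are kept in the sample).
    """
    sampled = {seed}
    queue = deque([seed])
    while queue and len(sampled) < max_nodes:
        node = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if len(sampled) >= max_nodes:
                break
            if neighbor not in sampled:
                sampled.add(neighbor)
                queue.append(neighbor)
    # Keep only the edges whose two endpoints were both sampled.
    edges = {(u, v) for u in sampled for v in adjacency.get(u, ()) if v in sampled}
    return sampled, edges

adjacency = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a"], "d": ["b", "e"], "e": ["d"]}
nodes, edges = snowball_sample(adjacency, "a", max_nodes=3)
print(sorted(nodes))  # ['a', 'b', 'c']
```

A breadth-first sample keeps local neighborhoods intact, which suits visual inspection, but it over-represents the area around the seed; other sampling strategies may be more representative of the whole database.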
Write user stories / Your turn! (Practice) Write your own user story.
Graph visualization in practice
Design visualization / How to represent graphs?
Design visualization / Common graph representations: Matrices. (a) Nodes are ordered as rows and columns; connections are indicated as filled cells. (b) A matrix representation of a typical biological pathway. (Figure from Gehlenborg 2012)
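A minimal sketch of panel (a), assuming an undirected graph given as an edge list: each node gets a row and a column, and a connection fills the two symmetric cells. The node names and edges are hypothetical.

```python
def adjacency_matrix(nodes, edges, directed=False):
    """Return a 0/1 matrix whose rows and columns follow the order of `nodes`."""
    index = {node: i for i, node in enumerate(nodes)}
    n = len(nodes)
    matrix = [[0] * n for _ in range(n)]
    for source, target in edges:
        i, j = index[source], index[target]
        matrix[i][j] = 1
        if not directed:
            matrix[j][i] = 1  # undirected: fill the mirrored cell too
    return matrix

nodes = ["A", "B", "C", "D"]
edges = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")]
matrix = adjacency_matrix(nodes, edges)
for node, row in zip(nodes, matrix):
    # '#' marks a filled cell (a connection), '.' an empty one
    print(node, " ".join("#" if cell else "." for cell in row))
```

Reordering `nodes` changes which cells are filled next to each other; matrix views usually rely on such a reordering to make clusters appear as dense blocks.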
Design visualization / Common graph representations: Node-link diagrams. (a) A directed graph typical of a biological pathway. (b) An undirected graph with nodes arranged in a circle. (c) A spring-embedded layout of the data from (b). (Figure from Gehlenborg 2012)
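Panel (c) uses a spring-embedded layout. The sketch below is a simplified force-directed layout in the spirit of Fruchterman-Reingold, written from scratch as an illustration rather than as the algorithm used for the figure: all node pairs repel, linked nodes attract, and a cooling "temperature" caps each move so the layout settles. The function name, constants and example graph are assumptions.

```python
import math
import random

def spring_layout(nodes, edges, iterations=200, seed=42):
    """Simplified force-directed layout. `nodes` must be a list (it is sliced)."""
    rng = random.Random(seed)  # fixed seed -> reproducible layout
    pos = {v: [rng.uniform(-1, 1), rng.uniform(-1, 1)] for v in nodes}
    k = 1.0 / math.sqrt(len(nodes))  # ideal distance between nodes (rough unit area)
    temperature = 0.1                # maximum move per iteration
    for _ in range(iterations):
        disp = {v: [0.0, 0.0] for v in nodes}
        # Repulsion between every pair of nodes, strength k^2 / d.
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
                d = math.hypot(dx, dy) or 1e-9
                f = k * k / d
                disp[u][0] += dx / d * f
                disp[u][1] += dy / d * f
                disp[v][0] -= dx / d * f
                disp[v][1] -= dy / d * f
        # Attraction along edges, strength d^2 / k.
        for u, v in edges:
            dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
            d = math.hypot(dx, dy) or 1e-9
            f = d * d / k
            disp[u][0] -= dx / d * f
            disp[u][1] -= dy / d * f
            disp[v][0] += dx / d * f
            disp[v][1] += dy / d * f
        # Move each node along its net force, capped by the temperature.
        for v in nodes:
            dx, dy = disp[v]
            d = math.hypot(dx, dy)
            if d > 0:
                step = min(d, temperature)
                pos[v][0] += dx / d * step
                pos[v][1] += dy / d * step
        temperature *= 0.98  # cool down so the layout converges
    return {v: tuple(p) for v, p in pos.items()}

nodes = ["A", "B", "C", "D", "E"]
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"), ("E", "A")]
layout = spring_layout(nodes, edges)
for node, (x, y) in layout.items():
    print(f"{node}: ({x:.2f}, {y:.2f})")
```

The pairwise repulsion loop is quadratic in the number of nodes, which is fine for a demo; production layouts typically approximate it for large graphs.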
Design visualization / Let's choose node-link diagrams because they are more common.
Design visualization / Map data to visual variables: proximity, hierarchy, group.
Design interaction / Add interactivity: details on demand, search, expand, filter.
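The four interactions can be sketched as operations on the set of nodes currently displayed. This is a hypothetical in-memory model, not the API of Linkurious or any other tool: `search` brings matching nodes into view, `expand` reveals a node's neighbors, `filter` hides nodes that fail a condition, and `details` returns properties only when asked. The class name and example data are assumptions.

```python
from collections import defaultdict

class GraphView:
    """Hypothetical view that tracks which nodes of a larger graph are displayed."""

    def __init__(self, properties, edges):
        self.properties = properties  # node -> dict of attributes
        self.neighbors = defaultdict(set)
        for u, v in edges:            # undirected edges
            self.neighbors[u].add(v)
            self.neighbors[v].add(u)
        self.visible = set()          # start with an empty canvas

    def search(self, text):
        """Search: display nodes whose name contains `text` (case-insensitive)."""
        hits = {n for n in self.properties if text.lower() in n.lower()}
        self.visible |= hits
        return hits

    def expand(self, node):
        """Expand: display the direct neighbors of a node."""
        found = set(self.neighbors[node])
        self.visible |= found
        return found

    def filter(self, predicate):
        """Filter: hide displayed nodes whose properties fail `predicate`."""
        self.visible = {n for n in self.visible if predicate(self.properties[n])}

    def details(self, node):
        """Details on demand: fetch a node's properties only when requested."""
        return self.properties[node]

properties = {
    "Alice": {"role": "analyst"},
    "Bob": {"role": "developer"},
    "Carol": {"role": "analyst"},
    "Dave": {"role": "developer"},
}
view = GraphView(properties, [("Alice", "Bob"), ("Alice", "Carol"), ("Carol", "Dave")])
view.search("ali")                                  # shows Alice
view.expand("Alice")                                # adds Bob and Carol
view.filter(lambda p: p["role"] == "analyst")       # keeps Alice and Carol
print(sorted(view.visible), view.details("Carol"))
```

Starting from an empty canvas and growing it on demand is one way to stay within the human limits listed under Q4.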
Design visualization and interaction / Graph Viz 101 Learn more at http://linkurio.us/graph-viz-101
2. Use case: bank loan fraud detection.
Use case / The cost of fraud: $28.6B. AITE Group estimates that first-party fraud will cost $28.6 billion a year in credit card losses by 2016. Sources: http://news.alaric.com/industry-news/fraud/a-new-approach-to-first-party-fraud-reducing-bad-debt/ and http://bankinganalyticsblog.fico.com/2013/02/first-party-fraud-it-was-me.html
Use case / A common fraud scenario: a look at a common fraud scenario banks face.
1. Create a fake identity. A criminal or a group of criminals mix pieces of information (addresses, phone numbers, social security numbers) to create a synthetic identity.
2. Go to the bank, ask for a loan. The criminal uses the fake identity to register a bank account. He acts like a normal customer and tries to secure a loan.
3. Disappear with the money. Once the criminal feels he cannot get access to more money, he carefully prepares his exit: in a short amount of time, he empties all of his accounts and disappears.
Use case / How do we set up a graph-based fraud detection system? Let's ask our 5 questions:
1. What are the data?
2. What is your goal?
3. Who is your end-user?
4. What are the constraints?
5. How is it used?
Use case / Q1: What are the data? We model customer data as a graph. [Figure: a graph showing a legitimate customer and the information she is linked to: customer name J. Smith, ID J. Smith, home address 58, Eisenhower Square, phone number +33 5 68 98 25 74, loan $25k, credit card $1,234.]
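A minimal sketch of this modeling step, assuming each piece of customer information becomes its own node linked to the customer node. The values come from the slide; the record layout and the `type:value` naming scheme for attribute nodes are assumptions made for illustration.

```python
def customer_to_edges(customer_id, record):
    """Link a customer node to one node per piece of information in `record`."""
    edges = []
    for attribute, value in record.items():
        # Prefix the value with its type so that, e.g., an address and a phone
        # number that happen to share the same text never merge into one node.
        edges.append((customer_id, f"{attribute}:{value}"))
    return edges

record = {
    "address": "58, Eisenhower Square",
    "phone": "+33 5 68 98 25 74",
    "loan": "$25k",
}
edges = customer_to_edges("J. Smith", record)
for edge in edges:
    print(edge)
```

Because information is stored as shared nodes rather than as text fields on the customer, two customers who give the same address automatically end up connected through that address node.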
Use case / Q1: What are the data? In a fraud ring, people share the same information. [Figure: customers P. Martin, J. Smith, E. Selmati and P. Smith linked to shared addresses (58, Eisenhower Square; 14, Roses Street), phone numbers (+33 6 75 89 22 14; +331 42 58 66 00), social security numbers (17873897893, 1787576553, 1787579953, 1267576553), loans ($7k, $12.5k, $20k, $45k) and the numbers 31195855 and 31184274.]
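One way to surface candidate rings, sketched under assumptions: treat customers who share any piece of information as connected, and report the connected groups that contain at least two customers. The example uses hypothetical placeholder customers and values, because the exact links in the figure cannot be recovered from the text. The union-find grouping is a standard connected-components technique, not necessarily what the original system uses. Shared information is only a signal (relatives often share an address), so groups like these are leads for an analyst to review, not proof of fraud.

```python
from collections import defaultdict

def find_fraud_rings(customers, min_size=2):
    """Group customers connected through shared information (connected components).

    `customers` maps customer -> set of attribute nodes such as "phone:...".
    """
    parent = {c: c for c in customers}

    def find(c):
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path halving keeps trees shallow
            c = parent[c]
        return c

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Index each attribute node to the customers linked to it.
    owners = defaultdict(list)
    for customer, attributes in customers.items():
        for attribute in attributes:
            owners[attribute].append(customer)

    # Customers sharing an attribute end up in the same component.
    for linked in owners.values():
        for other in linked[1:]:
            union(linked[0], other)

    groups = defaultdict(set)
    for customer in customers:
        groups[find(customer)].add(customer)
    return [group for group in groups.values() if len(group) >= min_size]

# Hypothetical data: customers 1-3 are chained by a shared phone and a shared address.
customers = {
    "Customer 1": {"address:Addr-1", "phone:Phone-1"},
    "Customer 2": {"address:Addr-2", "phone:Phone-1"},
    "Customer 3": {"address:Addr-2", "ssn:SSN-1"},
    "Customer 4": {"ssn:SSN-2"},
}
rings = find_fraud_rings(customers)
print(rings)
```

Note that Customer 1 and Customer 3 share nothing directly; they are grouped because both share something with Customer 2, which is exactly the kind of indirect link that is hard to spot in tables and easy to see in a graph.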