Visualizing DocGraph in Gephi (June 2013)

Analyzing DocGraph in Gephi

Janos G. Hajagos Stony Brook School of Medicine

NYC Open Data Meetup June 24, 2013

DocGraph

• Based on a FOIA request to CMS by Fred Trotter
• Medicare providers (more than doctors)
• CY 2011 dates of service
• Providers share 10 or more patients in a 30-day forward window
• Initial access restricted to MedStartr funders, but open access as of June 2013

Geographic Visualization

http://isurfsoftware.com/blog/2012/12/13/visualizing-geographic-connections-between-us-doctors/

DocGraph by the numbers

• Directed graph
• Average total degree 52.8
• 940,492 providers (graph nodes/vertices)
• 49,685,810 shared edges
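As a quick arithmetic check, the reported average degree is consistent with dividing the edge count by the node count. The sketch below uses only the two counts from the slide; reading the 52.8 figure as edges per node (rather than in-degree plus out-degree) is an assumption, not something the slides state.

```python
# Counts taken from the slide above.
nodes = 940_492
edges = 49_685_810

# In a directed graph every edge adds one out-degree and one in-degree,
# so mean in-degree == mean out-degree == edges / nodes.
edges_per_node = edges / nodes

# Summing in-degree and out-degree per node would give twice that value.
in_plus_out = 2 * edges / nodes

print(round(edges_per_node, 1))
print(round(in_plus_out, 1))
```

The first value matches the slide, which suggests the figure was computed as edges per node.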

DocGraph Data

NPPES

• National Plan and Provider Enumeration System
• Source of NPI (National Provider Identifier)
• Information is entered and updated by the provider
• CSV file with 314 columns
• MySQL load script generated by a Python script to normalize the database
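The idea of generating a MySQL load script from the CSV header can be sketched as follows. This is a minimal illustration and not the author's script: the helper names `to_sql_name` and `create_table_sql`, the table name `npi_raw`, the sample header names, and the choice of `VARCHAR(255)` for every column are all assumptions. It only produces a raw staging table; the normalization step mentioned above is not shown.

```python
import csv
import io
import re


def to_sql_name(header):
    """Turn a CSV header into a lowercase, underscore-separated column name."""
    return re.sub(r"[^0-9A-Za-z]+", "_", header).strip("_").lower()


def create_table_sql(table, headers):
    """Build a CREATE TABLE statement with every column typed as VARCHAR(255)."""
    columns = ",\n".join(f"  `{to_sql_name(h)}` VARCHAR(255)" for h in headers)
    return f"CREATE TABLE `{table}` (\n{columns}\n);"


# Tiny in-memory stand-in for the 314-column file (hypothetical headers and values).
sample = io.StringIO(
    '"NPI","Provider Last Name (Legal Name)","Practice Postal Code"\n'
    '"1234567890","DOE","112150000"\n'
)
headers = next(csv.reader(sample))  # first row holds the column names
sql = create_table_sql("npi_raw", headers)
print(sql)
```

Generating the statement from the header row avoids typing several hundred column definitions by hand.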

Selecting a sub-graph

Core nodes

Leaf nodes

Core-to-core edges

Core-to-leaf edges

Leaf-to-leaf edges
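The selection steps named on these slides can be sketched in plain Python. The definitions used here are assumptions, since the slide figures are not reproduced in the text: core nodes are taken to be providers located in a chosen set of zip codes, and leaf nodes are non-core providers that share an edge with a core provider. The function name `select_subgraph`, the `(from, to, count)` edge layout, and all sample identifiers are hypothetical.

```python
def select_subgraph(edges, zip_by_npi, core_zips):
    """Split edges around a set of core providers chosen by zip code."""
    # Core nodes: providers whose zip code is in the selected set.
    core = {npi for npi, z in zip_by_npi.items() if z in core_zips}

    # Leaf nodes: non-core providers directly linked to a core provider.
    leaf = set()
    for src, dst, _ in edges:
        if src in core and dst not in core:
            leaf.add(dst)
        elif dst in core and src not in core:
            leaf.add(src)

    groups = {"core-to-core": [], "core-to-leaf": [], "leaf-to-leaf": []}
    for src, dst, count in edges:
        if src in core and dst in core:
            groups["core-to-core"].append((src, dst, count))
        elif src in core or dst in core:
            # Either direction between a core node and a leaf node.
            groups["core-to-leaf"].append((src, dst, count))
        elif src in leaf and dst in leaf:
            groups["leaf-to-leaf"].append((src, dst, count))
        # Edges touching a node outside core and leaf are dropped.
    return core, leaf, groups


# Made-up providers and shared-patient counts, for illustration only.
zip_by_npi = {"A": "11215", "B": "11212", "C": "10001", "D": "10002", "E": "10003"}
edges = [("A", "B", 20), ("A", "C", 11), ("D", "B", 14), ("C", "D", 10), ("D", "E", 25)]

core, leaf, groups = select_subgraph(edges, zip_by_npi, {"11215", "11212"})
print(sorted(core), sorted(leaf))
for name, members in groups.items():
    print(name, members)
```

Keeping the three edge groups separate makes it easy to decide how much of the neighborhood around the core to include in the exported graph.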

Generating GraphML

• XML-based file format for graphs
• Readable by a large number of tools
  – Gephi
  – Mathematica
  – igraph (R)
• NetworkX Python library for graphs can easily export to GraphML
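To show what a GraphML file contains, here is a minimal sketch that writes a small directed graph using only the Python standard library. It is a swapped-in illustration of the format, not the NetworkX export the slide refers to; the function name `to_graphml`, the `weight` attribute, and the sample edges are assumptions.

```python
import xml.etree.ElementTree as ET

NS = "http://graphml.graphdrawing.org/xmlns"


def to_graphml(edges):
    """Serialize (source, target, weight) triples as a directed GraphML string."""
    ET.register_namespace("", NS)  # emit GraphML as the default namespace
    root = ET.Element(f"{{{NS}}}graphml")
    # Declare one integer edge attribute named "weight".
    ET.SubElement(root, f"{{{NS}}}key", {
        "id": "d0", "for": "edge", "attr.name": "weight", "attr.type": "int"})
    graph = ET.SubElement(root, f"{{{NS}}}graph", {"edgedefault": "directed"})
    # One <node> element per distinct endpoint.
    for node_id in sorted({n for s, t, _ in edges for n in (s, t)}):
        ET.SubElement(graph, f"{{{NS}}}node", {"id": node_id})
    # One <edge> element per triple, with its weight stored in a <data> child.
    for s, t, w in edges:
        edge = ET.SubElement(graph, f"{{{NS}}}edge", {"source": s, "target": t})
        data = ET.SubElement(edge, f"{{{NS}}}data", {"key": "d0"})
        data.text = str(w)
    return ET.tostring(root, encoding="unicode")


# Made-up identifiers and counts, for illustration only.
sample_edges = [("A", "B", 12), ("B", "A", 15), ("A", "C", 30)]
graphml_text = to_graphml(sample_edges)
print(graphml_text)
```

With NetworkX installed, building an `nx.DiGraph` and calling `nx.write_graphml(G, path)` produces an equivalent file in one step.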

Gephi

Subset defined from two Brooklyn zip codes (11215, Park Slope, and 11212, Brownsville)

Links

http://strata.oreilly.com/2012/11/docgraph-open-social-doctor-data.html (information)
https://github.com/jhajagos/DocGraph (code)
https://github.com/ftrotter/DocGraph (data)
https://groups.google.com/forum/#!forum/docgraph (mailing list)
http://bit.ly/1459NXn (sample Brooklyn GraphML file)
http://strataconf.com/rx2013/public/schedule/detail/29840 (StrataRX workshop with Fred Trotter)
