Analyzing DocGraph in Gephi
Janos G. Hajagos Stony Brook School of Medicine
NYC Open Data Meetup June 24, 2013
DocGraph
• Based on FOIA request to CMS by Fred Trotter
• Medicare providers (more than doctors)
• CY 2011 date of service
• Share 10 or more patients in a 30 day forward window
• Initial access restricted to MedStartr funders but as of June 2013 open access
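The slides do not spell out how the shared-patient edges were computed, so the following is only a sketch of one plausible reading of "10 or more patients in a 30 day forward window": provider A points to provider B when B saw a patient within 30 days after A did, and an edge is kept when enough distinct patients follow that pattern. The claim records, the function name shared_patient_edges, and its parameters are all hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta

# Hypothetical claim records: (patient_id, provider_npi, date_of_service).
claims = [
    ("p1", "A", date(2011, 1, 3)),
    ("p1", "B", date(2011, 1, 20)),  # 17 days after A: counts toward A -> B
    ("p1", "C", date(2011, 3, 15)),  # more than 30 days after both A and B
    ("p2", "A", date(2011, 2, 1)),
    ("p2", "B", date(2011, 2, 10)),  # 9 days after A: counts toward A -> B
    ("p3", "B", date(2011, 5, 1)),
    ("p3", "A", date(2011, 5, 5)),   # 4 days after B: counts toward B -> A
]


def shared_patient_edges(claims, window_days=30, min_patients=10):
    """Count distinct patients per ordered provider pair (assumed method)."""
    visits_by_patient = defaultdict(list)
    for patient, npi, day in claims:
        visits_by_patient[patient].append((day, npi))

    window = timedelta(days=window_days)
    patients_per_pair = defaultdict(set)
    for patient, visits in visits_by_patient.items():
        for day_a, npi_a in visits:
            for day_b, npi_b in visits:
                # Forward window: B's visit falls 0..window_days after A's.
                # Same-day visits therefore count in both directions.
                if npi_a != npi_b and timedelta(0) <= day_b - day_a <= window:
                    patients_per_pair[(npi_a, npi_b)].add(patient)

    return {
        pair: len(patients)
        for pair, patients in patients_per_pair.items()
        if len(patients) >= min_patients
    }


# A threshold of 2 is used here only because the toy data is tiny.
edges = shared_patient_edges(claims, min_patients=2)
print(edges)
```

Because the window only looks forward in time, A -> B and B -> A are counted separately, which is consistent with the data set being a directed graph.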
Geographic Visualization
http://isurfsoftware.com/blog/2012/12/13/visualizing-geographic-connections-between-us-doctors/
DocGraph by the numbers
• Directed graph
• Average total degree 52.8
• 940,492 providers (graph nodes/vertices)
• 49,685,810 shared edges
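As a quick sanity check on these figures, the reported average degree appears to match the edge count divided by the node count. This is an inference from the numbers, not something the slides state; the variable names are illustrative.

```python
nodes = 940_492      # providers
edges = 49_685_810   # directed shared-patient edges

# Edges per node, i.e. the mean out-degree (equivalently, mean in-degree).
edges_per_node = edges / nodes
print(round(edges_per_node, 1))  # 52.8
```

Counting both incoming and outgoing edges at every node would give twice this value, so the 52.8 figure is best read as edges per node.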
DocGraph Data
NPPES
• National Plan and Provider Enumeration System
• Source of NPI (National Provider Identifier)
• Information is entered and updated by provider
• CSV file with 314 columns
• MySQL load script generated by Python script to normalize database
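A minimal sketch of how a Python script might generate a MySQL load script from the CSV header is shown here. It is not the script from the DocGraph repository: the table name nppes_raw, the file name npidata.csv, the helper names, and the three-column sample header are assumptions for illustration, and the sketch only covers loading the raw file, not the normalization step.

```python
import csv
import io
import re

# Illustrative three-column sample; the real NPPES file has 314 columns.
sample_csv = io.StringIO(
    '"NPI","Entity Type Code",'
    '"Provider Business Practice Location Address Postal Code"\n'
    '"1234567890","1","112151234"\n'
)


def to_column_name(header):
    """Turn a CSV header into a lowercase, underscore-separated identifier."""
    name = re.sub(r"[^0-9a-z]+", "_", header.lower()).strip("_")
    # MySQL identifiers are limited to 64 characters; truncation could
    # produce duplicates on long headers, which a real script must handle.
    return name[:64]


def build_load_script(csv_file, table_name, csv_path):
    """Build CREATE TABLE and LOAD DATA statements from the header row."""
    headers = next(csv.reader(csv_file))
    columns = [to_column_name(h) for h in headers]
    # Every column is loaded as text here; typing is left for a later step.
    column_defs = ",\n".join(f"  `{c}` VARCHAR(255)" for c in columns)
    create = f"CREATE TABLE `{table_name}` (\n{column_defs}\n);"
    load = (
        f"LOAD DATA LOCAL INFILE '{csv_path}'\n"
        f"INTO TABLE `{table_name}`\n"
        "FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"'\n"
        "LINES TERMINATED BY '\\n'\n"
        "IGNORE 1 LINES;"
    )
    return create + "\n\n" + load


script = build_load_script(sample_csv, "nppes_raw", "npidata.csv")
print(script)
```

Generating the statements from the header avoids typing 314 column definitions by hand and keeps the script in step with the file layout.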
Selecting a sub-graph
Core nodes
Leaf nodes
Core-to-core edges
Core-to-leaf edges
Leaf-to-leaf edges
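The sub-graph selection steps above can be sketched in a few lines. The definitions are assumptions, since the slides name the categories without defining them: core nodes are the providers chosen up front, leaf nodes are providers outside the core that share an edge with a core node, and edges touching neither group are dropped. The edge list and the function name select_subgraph are hypothetical.

```python
# Hypothetical directed edges: (from_npi, to_npi, shared_patient_count).
edges = [
    ("A", "B", 25),
    ("A", "X", 12),
    ("X", "Y", 40),
    ("Y", "B", 11),
    ("Z", "W", 30),  # touches neither core nor leaf nodes
]
core = {"A", "B"}


def select_subgraph(edges, core):
    """Return the leaf nodes and the edges grouped by category."""
    # Leaf nodes: non-core endpoints of any edge that touches a core node.
    leaf = set()
    for src, dst, _ in edges:
        if src in core and dst not in core:
            leaf.add(dst)
        elif dst in core and src not in core:
            leaf.add(src)

    groups = {"core-to-core": [], "core-to-leaf": [], "leaf-to-leaf": []}
    for edge in edges:
        src, dst, _ = edge
        if src in core and dst in core:
            groups["core-to-core"].append(edge)
        elif (src in core and dst in leaf) or (src in leaf and dst in core):
            # Assumption: both directions count as core-to-leaf.
            groups["core-to-leaf"].append(edge)
        elif src in leaf and dst in leaf:
            groups["leaf-to-leaf"].append(edge)
        # Anything else lies outside the selected sub-graph.
    return leaf, groups


leaf, groups = select_subgraph(edges, core)
print(sorted(leaf))
print(groups)
```

Keeping the three edge categories separate makes it easy to include or exclude leaf-to-leaf edges, which can dominate the edge count when the core is small.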
Generating GraphML
• XML based file format for graphs
• Readable by a large number of tools
  – Gephi
  – Mathematica
  – igraph (R)
• NetworkX Python library for graphs can easily export to GraphML
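To show what a GraphML file contains, here is a small sketch that writes one with only the Python standard library. It is not the code used for the talk: the function name to_graphml, the weight attribute, and the sample edges are illustrative. With NetworkX installed, networkx.write_graphml(G, path) produces the same kind of file in a single call.

```python
import xml.etree.ElementTree as ET

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"


def tag(name):
    """Qualify an element name with the GraphML namespace."""
    return f"{{{GRAPHML_NS}}}{name}"


def to_graphml(edges):
    """Serialize (source, target, weight) triples as a directed GraphML graph."""
    ET.register_namespace("", GRAPHML_NS)
    root = ET.Element(tag("graphml"))
    # Declare an integer edge attribute named "weight" under the key id "d0".
    ET.SubElement(root, tag("key"), {
        "id": "d0", "for": "edge", "attr.name": "weight", "attr.type": "int",
    })
    graph = ET.SubElement(root, tag("graph"), {"edgedefault": "directed"})

    node_ids = sorted({n for src, dst, _ in edges for n in (src, dst)})
    for node_id in node_ids:
        ET.SubElement(graph, tag("node"), {"id": node_id})

    for src, dst, weight in edges:
        edge = ET.SubElement(graph, tag("edge"), {"source": src, "target": dst})
        data = ET.SubElement(edge, tag("data"), {"key": "d0"})
        data.text = str(weight)

    return ET.tostring(root, encoding="unicode")


# Hypothetical provider identifiers and shared-patient counts.
sample_edges = [("A", "B", 25), ("A", "X", 12), ("X", "Y", 40)]
graphml_text = to_graphml(sample_edges)
print(graphml_text)
```

The output can be saved with a .graphml extension and opened in any tool that reads the format.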
Gephi
Subset defined from 2 Brooklyn zip codes (11215 - Park Slope & 11212 - Brownsville)
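One way such a zip-code subset might be picked from NPPES data is sketched here. The two column names are assumed to match the NPPES header, the sample rows are made up, and the function name core_npis is hypothetical. Postal codes are compared on their first five digits because they may be stored in nine-digit form.

```python
import csv
import io

ZIP_COLUMN = "Provider Business Practice Location Address Postal Code"
CORE_ZIPS = {"11215", "11212"}  # Park Slope and Brownsville

# Made-up rows standing in for the NPPES CSV file.
sample_csv = io.StringIO(
    f'"NPI","{ZIP_COLUMN}"\n'
    '"1000000001","112151234"\n'
    '"1000000002","11212"\n'
    '"1000000003","100161234"\n'
)


def core_npis(csv_file, zips, zip_column=ZIP_COLUMN, npi_column="NPI"):
    """Return the NPIs whose practice location falls in the given zip codes."""
    return {
        row[npi_column]
        for row in csv.DictReader(csv_file)
        if row[zip_column][:5] in zips
    }


selected = core_npis(sample_csv, CORE_ZIPS)
print(sorted(selected))
```

The resulting set of NPIs would serve as the core nodes of the sub-graph.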
Links
http://strata.oreilly.com/2012/11/docgraph-open-social-doctor-data.html (information)
https://github.com/jhajagos/DocGraph (code)
https://github.com/ftrotter/DocGraph (data)
https://groups.google.com/forum/#!forum/docgraph (mailing list)
http://bit.ly/1459NXn (sample Brooklyn GraphML file)
http://strataconf.com/rx2013/public/schedule/detail/29840 (StrataRX workshop with Fred Trotter)