DESCRIPTION
NYC Data Science Academy, NYC Open Data Meetup, Big Data, Data Science, NYC, Vivian Zhang, SupStat Inc,NYC, Gephi, Stony Brook Medicine school, Janos Hajagos, NPPES
Analyzing DocGraph in Gephi
Janos G. Hajagos
Stony Brook School of Medicine
NYC Open Data Meetup June 24, 2013
DocGraph
• Based on a FOIA request to CMS by Fred Trotter
• Medicare providers (more than doctors)
• CY 2011 date of service
• Share 10 or more patients in a 30-day forward window
• Initial access restricted to MedStartr funders, but as of June 2013 open access
Geographic Visualization
http://isurfsoftware.com/blog/2012/12/13/visualizing-geographic-connections-between-us-doctors/
DocGraph by the numbers
• Directed graph
• Average total degree 52.8
• 940,492 providers (graph nodes/vertices)
• 49,685,810 shared edges
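The figures above can be cross-checked with a few lines of arithmetic. This is a minimal sketch using only the numbers on the slide; the variable names are illustrative. The quoted 52.8 matches edges divided by nodes, which in a directed graph is the mean out-degree (and equally the mean in-degree). The slide does not say how "total degree" was computed, so the in-plus-out figure is shown alongside for comparison.

```python
# Figures quoted on the slide.
providers = 940_492      # graph nodes
edges = 49_685_810       # directed edges

# Each directed edge adds one to an out-degree and one to an in-degree.
edges_per_node = edges / providers       # mean out-degree (= mean in-degree)
in_plus_out = 2 * edges / providers      # mean of in-degree + out-degree

# Fraction of all possible directed edges (no self-loops) that are present.
density = edges / (providers * (providers - 1))

print(f"edges per node:    {edges_per_node:.1f}")   # 52.8
print(f"in + out per node: {in_plus_out:.1f}")      # 105.7
print(f"density:           {density:.2e}")
```

The very low density is what makes selecting a sub-graph practical before loading anything into a desktop tool.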
DocGraph Data
NPPES
• National Plan and Provider Enumeration System
• Source of NPI (National Provider Identifier)
• Information is entered and updated by the provider
• CSV file with 314 columns
• MySQL load script generated by a Python script to normalize the database
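One way to picture the normalization step: a wide CSV with numbered, repeated columns is reshaped into one row per value, which is what a child table in a relational schema would hold. The sketch below is not the script from the talk. The column names, the helper `normalize_repeated`, and the sample rows are all assumptions made up for illustration.

```python
import csv
import io

# Tiny stand-in for the NPPES file; the real one has 314 columns.
# Column names and values here are assumed for illustration only.
sample = io.StringIO(
    "NPI,Taxonomy Code_1,Taxonomy Code_2,Taxonomy Code_3\n"
    "1000000001,207Q00000X,,\n"
    "1000000002,208D00000X,207R00000X,\n"
)

def normalize_repeated(rows, key, prefix):
    """Turn numbered columns (prefix_1, prefix_2, ...) into one row per value."""
    out = []
    for row in rows:
        for name, value in row.items():
            if name.startswith(prefix + "_") and value:
                sequence = int(name.rsplit("_", 1)[1])
                out.append((row[key], sequence, value))
    return out

taxonomy_rows = normalize_repeated(csv.DictReader(sample), "NPI", "Taxonomy Code")
for r in taxonomy_rows:
    print(r)
```

Each resulting tuple (identifier, sequence, value) could then be written out as an `INSERT` or a bulk-load file for a child table keyed on the NPI.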
Selecting a sub-graph
Core nodes
Leaf nodes
Core-to-core edges
Core-to-leaf edges
Leaf-to-leaf edges
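The slides name the pieces of a sub-graph without spelling out how they are selected. The following sketch is one plausible reading, stated as an assumption: core nodes are providers in the chosen ZIP codes, leaf nodes are other providers directly connected to a core node, and each edge is labelled by the kinds of node it joins. The provider IDs, ZIP assignments, and edge weights are made up.

```python
# Hypothetical inputs: provider -> ZIP code, and directed edges
# (from_provider, to_provider, shared_patient_count).
provider_zip = {"A": "11215", "B": "11212", "C": "10001", "D": "11790", "E": "10001"}
edges = [
    ("A", "B", 25), ("A", "C", 12), ("D", "B", 40),
    ("C", "D", 15), ("C", "E", 11), ("B", "A", 30),
]
selected_zips = {"11215", "11212"}

# Core nodes: providers located in the selected ZIP codes (assumed definition).
core = {p for p, z in provider_zip.items() if z in selected_zips}

# Leaf nodes: non-core providers that share an edge with a core node.
leaf = set()
for src, dst, _ in edges:
    if src in core and dst not in core:
        leaf.add(dst)
    elif dst in core and src not in core:
        leaf.add(src)

def classify(src, dst):
    """Label an edge by the node kinds it joins; None if outside the sub-graph."""
    if src in core and dst in core:
        return "core-to-core"
    if (src in core and dst in leaf) or (src in leaf and dst in core):
        return "core-to-leaf"
    if src in leaf and dst in leaf:
        return "leaf-to-leaf"
    return None

subgraph = {}
for src, dst, weight in edges:
    kind = classify(src, dst)
    if kind is not None:
        subgraph.setdefault(kind, []).append((src, dst, weight))

for kind, members in subgraph.items():
    print(kind, members)
```

In this toy data the edge from C to E is dropped, because E is neither core nor adjacent to a core node. Keeping or dropping leaf-to-leaf edges is a design choice: they show how the neighbourhood is connected, but they can greatly increase the edge count.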
Generating GraphML
• XML-based file format for graphs
• Readable by a large number of tools
  – Gephi
  – Mathematica
  – igraph (R)
• NetworkX Python library for graphs can easily export to GraphML
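To show what a GraphML file contains, here is a minimal sketch that writes one using only the Python standard library. It is not the export code from the talk; the function name `to_graphml` and the sample nodes and edge are illustrative. With NetworkX installed, `networkx.write_graphml(G, path)` produces an equivalent file from a graph object in a single call.

```python
import xml.etree.ElementTree as ET

NS = "http://graphml.graphdrawing.org/xmlns"

def to_graphml(nodes, edges):
    """Build a GraphML document for a directed, weighted graph."""
    root = ET.Element("graphml", xmlns=NS)
    # Declare one edge attribute ("weight") so readers know its name and type.
    ET.SubElement(root, "key", {"id": "w", "for": "edge",
                                "attr.name": "weight", "attr.type": "int"})
    graph = ET.SubElement(root, "graph", edgedefault="directed")
    for node_id in nodes:
        ET.SubElement(graph, "node", id=node_id)
    for src, dst, weight in edges:
        edge = ET.SubElement(graph, "edge", source=src, target=dst)
        ET.SubElement(edge, "data", key="w").text = str(weight)
    return ET.tostring(root, encoding="unicode")

xml_text = to_graphml(["A", "B"], [("A", "B", 25)])
print(xml_text)
```

Node attributes such as a provider name or specialty would be declared the same way, with a `key` element whose `for` is `node` and a `data` child on each node.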
Gephi
Subset defined from 2 Brooklyn zip codes (11215 - Park Slope & 11212 - Brownsville)
Links
http://strata.oreilly.com/2012/11/docgraph-open-social-doctor-data.html (information)
https://github.com/jhajagos/DocGraph (code)
https://github.com/ftrotter/DocGraph (data)
https://groups.google.com/forum/#!forum/docgraph (mailing list)
http://bit.ly/1459NXn (sample Brooklyn GraphML file)
http://strataconf.com/rx2013/public/schedule/detail/29840 (StrataRX workshop with Fred Trotter)