Slide show on Information visualization for the Data Visualization Meetup in Washington, DC during www.bigdataweek.com April 22, 2013
Information Visualization for Knowledge Discovery
Ben Shneiderman [email protected] @benbendc
Founding Director (1983-2000), Human-Computer Interaction Lab
Professor, Department of Computer Science
Member, Institute for Advanced Computer Studies
University of Maryland
College Park, MD 20742
Turning Messy BigData into Actionable SmallData
Interdisciplinary research community - Computer Science & Info Studies - Psych, Socio, Poli Sci & MITH (www.cs.umd.edu/hcil)
Design Issues
• Input devices & strategies
• Keyboards, pointing devices, voice
• Direct manipulation
• Menus, forms, commands
• Output devices & formats
• Screens, windows, color, sound
• Text, tables, graphics
• Instructions, messages, help
• Collaboration & Social Media
• Help, tutorials, training
• Search
• Visualization
www.awl.com/DTUI
Fifth Edition: 2010
HCI Pride: Serving 5B Users
Mobile, desktop, web, cloud
Diverse users: novice/expert, young/old, literate/illiterate, abled/disabled, cultural, ethnic & linguistic diversity, gender, personality, skills, motivation, ...
Diverse applications: E-commerce, law, health/wellness, education, creative arts, community relationships, politics, IT4ID, policy negotiation, mediation, peace studies, ...
Diverse interfaces: Ubiquitous, pervasive, embedded, tangible, invisible, multimodal, immersive/augmented/virtual, ambient, social, affective, empathic, persuasive, ...
Obama Unveils “Big Data” Initiative (3/2012)
Big Data challenges:
•Developing scalable algorithms for processing imperfect data in distributed data stores
•Creating effective human-computer interaction tools for facilitating rapidly customizable visual reasoning for diverse missions.
http://www.whitehouse.gov/sites/default/files/microsites/ostp/big_data_press_release_final_2.pdf
Information Visualization & Visual Analytics
• Visual bandwidth is enormous
• Human perceptual skills are remarkable
• Trend, cluster, gap, outlier...
• Color, size, shape, proximity...
• Three challenges
• Meaningful visual displays of massive data
• Interaction: widgets & window coordination
• Process models for discovery
1999 2004 2010
Business takes action
• General Dynamics buys MayaViz
• Agilent buys GeneSpring
• Google buys Gapminder
• Oracle buys Hyperion
• Microsoft buys Proclarity
• InfoBuilders buys Advizor Solutions
• SAP buys (Business Objects buys Xcelsius & Inxight & Crystal Reports)
• IBM buys (Cognos buys Celequest) & ILOG
• TIBCO buys Spotfire
Spotfire: Retinol’s role in embryos & vision
Spotfire: DC natality data
http://registration.spotfire.com/eval/default_edu.asp
10M - 100M pixels: Large displays
100M-pixels & more
1M-pixels & less Small mobile devices
Information Visualization: Mantra
• Overview, zoom & filter, details-on-demand
• Overview, zoom & filter, details-on-demand
• Overview, zoom & filter, details-on-demand
• Overview, zoom & filter, details-on-demand
• Overview, zoom & filter, details-on-demand
• Overview, zoom & filter, details-on-demand
• Overview, zoom & filter, details-on-demand
• Overview, zoom & filter, details-on-demand
• Overview, zoom & filter, details-on-demand
• Overview, zoom & filter, details-on-demand
Information Visualization: Data Types
• 1-D Linear: Document Lens, SeeSoft, Info Mural
• 2-D Map: GIS, ArcView, PageMaker, Medical imagery
• 3-D World: CAD, Medical, Molecules, Architecture
• Multi-Var: Spotfire, Tableau, Qliktech, Visual Insight
• Temporal: LifeLines, TimeSearcher, Palantir, DataMontage
• Tree: Cone/Cam/Hyperbolic, SpaceTree, Treemap
• Network: Pajek, UCINet, NodeXL, Gephi, Tom Sawyer
(Side labels on slide: InfoViz, SciViz)
infosthetics.com visualcomplexity.com eagereyes.org flowingdata.com perceptualedge.com datakind.org visual.ly Visualizing.org infovis.org
Anscombe’s Quartet
1 2 3 4
x y x y x y x y
10.0 8.04 10.0 9.14 10.0 7.46 8.0 6.58
8.0 6.95 8.0 8.14 8.0 6.77 8.0 5.76
13.0 7.58 13.0 8.74 13.0 12.74 8.0 7.71
9.0 8.81 9.0 8.77 9.0 7.11 8.0 8.84
11.0 8.33 11.0 9.26 11.0 7.81 8.0 8.47
14.0 9.96 14.0 8.10 14.0 8.84 8.0 7.04
6.0 7.24 6.0 6.13 6.0 6.08 8.0 5.25
4.0 4.26 4.0 3.10 4.0 5.39 19.0 12.50
12.0 10.84 12.0 9.13 12.0 8.15 8.0 5.56
7.0 4.82 7.0 7.26 7.0 6.42 8.0 7.91
5.0 5.68 5.0 4.74 5.0 5.73 8.0 6.89
Property Value
Mean of x 9.0
Variance of x 11.0
Mean of y 7.5
Variance of y 4.12
Correlation 0.816
Linear regression y = 3 + 0.5x
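The shared summary values can be checked directly. The following is a minimal sketch that recomputes them from the table above using only the Python standard library; the names `QUARTET` and `summarize` are illustrative and not from the slides, and the variance used is the sample variance (divide by n - 1).

```python
from math import sqrt

# Anscombe's four datasets, transcribed from the table above.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]  # datasets 1-3 share the same x values
QUARTET = {
    1: (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    2: (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    3: (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    4: ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
        [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(xs, ys):
    """Return means, sample variances, correlation and least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {
        "mean_x": mean_x,
        "var_x": sxx / (n - 1),
        "mean_y": mean_y,
        "var_y": syy / (n - 1),
        "corr": sxy / sqrt(sxx * syy),
        "slope": slope,
        "intercept": mean_y - slope * mean_x,
    }


SUMMARIES = {k: summarize(xs, ys) for k, (xs, ys) in QUARTET.items()}

for k, s in SUMMARIES.items():
    print(k, {name: round(value, 3) for name, value in s.items()})
```

All four datasets print nearly identical rows, which is the point of the example: the statistics agree, so only a plot reveals how different the datasets are.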
Anscombe’s Quartet
Temporal Data: TimeSearcher 1.3
• Time series
• Stocks
• Weather
• Genes
• User-specified patterns
• Rapid search
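One way to picture a user-specified pattern query over time series is as a set of rectangular constraints, each covering a time range and a value range. The sketch below assumes that formulation; the `Timebox` class, the function names and the sample data are hypothetical illustrations, not TimeSearcher's actual code or data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Timebox:
    """A rectangular query: every value in [t_start, t_end] must lie in [v_min, v_max]."""
    t_start: int   # first time index covered (inclusive)
    t_end: int     # last time index covered (inclusive)
    v_min: float
    v_max: float


def matches(series, boxes):
    """True if the series satisfies every timebox."""
    return all(
        box.v_min <= value <= box.v_max
        for box in boxes
        for value in series[box.t_start:box.t_end + 1]
    )


def search(named_series, boxes):
    """Return the names of all series that satisfy the query, in input order."""
    return [name for name, series in named_series.items() if matches(series, boxes)]


# Hypothetical sample data: three short price series.
STOCKS = {
    "A": [10, 12, 15, 14, 9, 8],
    "B": [10, 11, 11, 12, 13, 14],
    "C": [20, 18, 15, 13, 12, 11],
}

# "Between 11 and 16 during t=1..3, then at or below 10 during t=4..5".
QUERY = [Timebox(1, 3, 11, 16), Timebox(4, 5, 0, 10)]

print(search(STOCKS, QUERY))
```

Because each box is an independent range check, adding or moving a box only requires re-running cheap comparisons, which is what makes rapid, interactive search plausible.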
Temporal Data: TimeSearcher 2.0
• Long Time series (>10,000 time points)
• Multiple variables
• Controlled precision in match (Linear, offset, noise, amplitude)
LifeLines: Patient Histories
www.cs.umd.edu/hcil/lifelines
LifeLines2: Align-Rank-Filter & Summarize
LifeFlow: Aggregation Strategy
Temporal Categorical Data (4 records)
LifeLines2 format
Tree of Event Sequences
LifeFlow Aggregation
www.cs.umd.edu/hcil/lifeflow
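The aggregation idea, merging individual records into a tree of event sequences, can be sketched as a prefix tree with counts. This is a simplified illustration under stated assumptions: it keeps only event order and ignores timestamps, and the function names and the four sample records are hypothetical rather than taken from LifeFlow.

```python
def aggregate(records):
    """Merge event sequences into a prefix tree.

    Each node stores how many records pass through it, so records that
    share the same opening sequence of events collapse into one branch.
    """
    root = {"count": 0, "children": {}}
    for sequence in records:
        node = root
        node["count"] += 1
        for event in sequence:
            node = node["children"].setdefault(event, {"count": 0, "children": {}})
            node["count"] += 1
    return root


def render(node, label="(all)", depth=0):
    """Return indented text lines, most frequent branch first."""
    lines = [f"{'  ' * depth}{label} x{node['count']}"]
    ordered = sorted(node["children"].items(), key=lambda kv: -kv[1]["count"])
    for event, child in ordered:
        lines.extend(render(child, event, depth + 1))
    return lines


# Four hypothetical records of temporal categorical data (event names are made up).
RECORDS = [
    ["Arrival", "Emergency", "ICU", "Floor"],
    ["Arrival", "Emergency", "ICU"],
    ["Arrival", "Emergency", "Floor"],
    ["Arrival", "Floor"],
]

TREE = aggregate(RECORDS)
print("\n".join(render(TREE)))
```

The counts at each node are what an aggregated display would map to bar height or width; a fuller version would also carry the time gaps between consecutive events.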
LifeFlow: Interface with User Controls
EventFlow: Original Dataset
LABA_ICSs Merged
SABAs Merged
Align by First LABA_ICS
Reduce Window Size
EventFlow Team: Oracle support
www.cs.umd.edu/hcil/eventflow
www.umdrightnow.umd.edu/news/umd-research-team-developing-powerful-data-visualization-tool-support-oracle
Treemap: Gene Ontology
www.cs.umd.edu/hcil/treemap/
+ Space filling
+ Space limited
+ Color coding
+ Size coding
- Requires learning
(Shneiderman, ACM Trans. on Graphics, 1992 & 2003)
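The space-filling and size-coding properties can be illustrated with a minimal sketch of a slice-and-dice style layout, which splits a rectangle among a node's children in proportion to their sizes and alternates the split direction at each level. The tree format (`name`, `size`, `children` keys), the function names and the sample tree are assumptions for illustration; color coding is omitted and all sizes are assumed positive.

```python
def size(node):
    """Size of a leaf is its 'size'; size of an inner node is the sum of its children."""
    children = node.get("children")
    if not children:
        return node["size"]
    return sum(size(child) for child in children)


def slice_and_dice(node, x, y, w, h, depth=0, out=None):
    """Return (name, x, y, w, h) rectangles for all leaves, filling the given rectangle."""
    if out is None:
        out = []
    children = node.get("children")
    if not children:
        out.append((node["name"], x, y, w, h))
        return out
    total = size(node)
    offset = 0.0  # fraction of the parent rectangle already used
    for child in children:
        frac = size(child) / total
        if depth % 2 == 0:
            # Even depth: lay children side by side, left to right.
            slice_and_dice(child, x + offset * w, y, w * frac, h, depth + 1, out)
        else:
            # Odd depth: stack children top to bottom.
            slice_and_dice(child, x, y + offset * h, w, h * frac, depth + 1, out)
        offset += frac
    return out


# Hypothetical hierarchy: leaf A (size 6) and group B with leaves B1 (3) and B2 (1).
TREE = {
    "name": "root",
    "children": [
        {"name": "A", "size": 6},
        {"name": "B", "children": [
            {"name": "B1", "size": 3},
            {"name": "B2", "size": 1},
        ]},
    ],
}

RECTS = slice_and_dice(TREE, 0.0, 0.0, 10.0, 4.0)
for rect in RECTS:
    print(rect)
```

Every leaf's area is proportional to its size and the leaves tile the whole rectangle, which is the "space filling" and "size coding" behavior listed above. Slice-and-dice can produce thin strips in deep or skewed trees; other layout variants trade ordering for better aspect ratios.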
www.smartmoney.com/marketmap
Treemap: Smartmoney MarketMap
Market falls steeply Feb 27, 2007, with one exception
Market falls steeply Sept 22, 2011, some exceptions
Market mixed, February 8, 2008 Energy & Technology up, Financial & Health Care down
Market rises, September 1, 2010, Gold contrarians
Market rises, March 21, 2011, Sprint declines
newsmap.jp
Treemap: Newsmap (Marcos Weskamp)
Treemap: WHC Emergency Room (6304 patients in Jan 2006)
Group by Admissions/MF, size by service time, color by age
Treemap: WHC Emergency Room (6304 patients in Jan 2006) (only those with service time >12 hours)
Group by Admissions/MF, size by service time, color by age
www.hivegroup.com
Treemap: Supply Chain
www.hivegroup.com
Treemap: Nutritional Analysis
www.spotfire.com
Treemap: Spotfire Bond Portfolio Analysis
Treemap: NY Times – Car & Truck Sales
www.cs.umd.edu/hcil/treemap/
Treemap (Voronoi): NY Times - Inflation
www.nytimes.com/interactive/2008/05/03/business/20080403_SPENDING_GRAPHIC.html
VisualComplexity.com : Manuel Lima
SocialAction
• Integrates statistics & visualization
• 4 case studies, 4-8 weeks (journalist, bibliometrician, terrorist analyst, organizational analyst)
• Identified desired features, gave strong positive feedback about benefits of integration
Perer & Shneiderman, CHI 2008, IEEE CG&A 2009
www.cs.umd.edu/hcil/socialaction
www.centrifugesystems.com
Network from Database Tables
NodeXL: Network Overview for Discovery & Exploration in Excel
www.codeplex.com/nodexl
NodeXL: Import Dialogs
www.codeplex.com/nodexl
Tweets at #WIN09 Conference: 2 groups
Flickr networks
Twitter discussion of #GOP
Red: Republicans, anti-Obama, mention Fox
Blue: Democrats, pro-Obama, mention CNN
Green: non-affiliated
Node size is number of followers
Politico is major bridging group
Analogy: Clusters Are Occluded
Hard to count nodes, clusters
Separate Clusters Are More Comprehensible
Twitter networks: #SOTU
Group-In-A-Box: Twitter Network for #CI2012
Twitter Network for “TTW”
Pennsylvania Innovation Network
PatentTech
SBIR (federal)
PA DCED (state)
Related patent
2: Federal agency
3: Enterprise
5: Inventors
9: Universities
10: PA DCED
11/12: Phil/Pitt metro cnty
13-15: Semi-rural/rural cnty
17: Foreign countries
19: Other states
Pittsburgh Metro
Westinghouse Electric
Pharmaceutical/Medical
No Location Philadelphia
Navy
Innovation Patterns: 11,000 vertices, 26,000 edges
Innovation Clusters: People, Locations, Companies
Interactive Methods to Reveal Patterns
Filtering: Node & link attribute values or statistics
Clustering: Cluster algorithmically by link connectivity
Grouping: Group based on node attributes
Motif Simplification: Common, meaningful structures replaced with simplified glyphs
Senate Co-Voting
Group-In-A-Box by Region
Motif Simplification
(a) Fan motifs & glyphs (b) Connector motifs & glyphs
Clique Motifs & Glyphs: 4, 5 & 6
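As an illustration of the fan case only, here is a minimal sketch that finds a head node with several degree-one neighbors and collapses those leaves into a single glyph node. The definition used (at least `min_leaves` leaves per fan), the glyph naming scheme and the sample edge list are assumptions for this example, not the behavior of NodeXL or any specific tool; connector and clique motifs are not handled.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Undirected adjacency sets from a list of (u, v) pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def find_fans(edges, min_leaves=2):
    """Map each fan head to its sorted leaves (neighbors that have no other neighbor)."""
    adj = build_adjacency(edges)
    fans = {}
    for head, neighbors in adj.items():
        leaves = sorted(n for n in neighbors if len(adj[n]) == 1)
        if len(leaves) >= min_leaves:
            fans[head] = leaves
    return fans


def simplify_fans(edges, min_leaves=2):
    """Replace each fan's leaves with one glyph node, returning the reduced edge list."""
    fans = find_fans(edges, min_leaves)
    glyph_of = {
        leaf: f"fan[{head}]x{len(leaves)}"
        for head, leaves in fans.items()
        for leaf in leaves
    }
    reduced = set()
    for u, v in edges:
        u2, v2 = glyph_of.get(u, u), glyph_of.get(v, v)
        reduced.add(tuple(sorted((u2, v2))))
    return sorted(reduced)


# Hypothetical network: "hub" has three leaves (a, b, c) plus a chain hub-x-y-z.
EDGES = [("hub", "a"), ("hub", "b"), ("hub", "c"),
         ("hub", "x"), ("x", "y"), ("y", "z")]

print(find_fans(EDGES))
print(simplify_fans(EDGES))
```

Six edges shrink to four, with the glyph label recording how many leaves it stands for, so the simplified drawing loses clutter without losing the count.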
Senate Co-Voting: 65% Agreement
Senate Co-Voting: 70% Agreement
Senate Co-Voting: 80% Agreement
Senate Co-Voting: 90% Agreement
Senate Co-Voting: 95% Agreement
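The sequence of thresholds above can be read as edge filtering: two senators are linked only if they vote the same way on at least a given fraction of roll calls. The sketch below assumes that reading; the vote encoding ("Y", "N", "-" for not voting), the function names and the four-senator sample are hypothetical, not the data behind the slides.

```python
from itertools import combinations


def agreement(votes_a, votes_b):
    """Fraction of roll calls, among those where both voted, with identical votes."""
    shared = [(a, b) for a, b in zip(votes_a, votes_b) if a != "-" and b != "-"]
    if not shared:
        return 0.0
    return sum(a == b for a, b in shared) / len(shared)


def co_voting_edges(votes, threshold):
    """Link every pair of senators whose agreement meets the threshold."""
    return [
        (s1, s2)
        for s1, s2 in combinations(sorted(votes), 2)
        if agreement(votes[s1], votes[s2]) >= threshold
    ]


# Hypothetical voting records over ten roll calls.
VOTES = {
    "Sen A": "YYYYYYYYYY",
    "Sen B": "YYYYYYYYYN",
    "Sen C": "YYYYYYYNNN",
    "Sen D": "NNNNNNNYYY",
}

for threshold in (0.65, 0.70, 0.80, 0.90, 0.95):
    edges = co_voting_edges(VOTES, threshold)
    print(f"{threshold:.0%}: {len(edges)} edges {edges}")
```

Raising the threshold removes weaker ties first, so a densely connected network gradually separates into tighter voting blocs, which is the effect the successive slides show.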
Analyzing Social Media Networks with NodeXL
I. Getting Started with Analyzing Social Media Networks
1. Introduction to Social Media and Social Networks
2. Social Media: New Technologies of Collaboration
3. Social Network Analysis
II. NodeXL Tutorial: Learning by Doing
4. Layout, Visual Design & Labeling
5. Calculating & Visualizing Network Metrics
6. Preparing Data & Filtering
7. Clustering & Grouping
III. Social Media Network Analysis Case Studies
8. Email
9. Threaded Networks
10. Twitter
11. Facebook
12. WWW
13. Flickr
14. YouTube
15. Wiki Networks
www.elsevier.com/wps/find/bookdescription.cws_home/723354/description
Social Media Research Foundation
Researchers who want to
- create open tools
- generate & host open data
- support open scholarship
Map, measure & understand social media
Support tool projects to collect, analyze & visualize social media data.
smrfoundation.org
Sense-Making Loop
Thomas & Cook: Illuminating the Path (2004)
Sense-Making Loop: Expanded
Thomas & Cook: Illuminating the Path (2004)
Discovery Process: Systematic Yet Flexible
Preparation
• Own the problem & define the schedule
• Data cleaning & conditioning
• Handle missing & uncertain data
• Extract subsets & link to related information
Purposeful exploration – Hypothesis testing
• Range & distribution
• Relationships & correlations
• Clusters & gaps
• Outliers & anomalies
• Aggregation & summary
• Split & trellis
• Temporal comparisons & multiple views
• Statistics & forecasts
Situated decision making - Social context
• Annotation & marking
• Collaboration & coordination
• Decisions & presentations
UN Millennium Development Goals
• Eradicate extreme poverty and hunger
• Achieve universal primary education
• Promote gender equality and empower women
• Reduce child mortality
• Improve maternal health
• Combat HIV/AIDS, malaria and other diseases
• Ensure environmental sustainability
• Develop a global partnership for development
To be achieved by 2015
30th Anniversary Symposium
May 22-23, 2013
www.cs.umd.edu/hcil
For More Information
• Visit the HCIL website for 700+ papers & info on videos www.cs.umd.edu/hcil
• See Chapter 14 on Info Visualization:
Shneiderman, B. and Plaisant, C., Designing the User Interface: Strategies for Effective Human-Computer Interaction: Fifth Edition (2010) www.awl.com/DTUI
• Edited Collections:
Card, S., Mackinlay, J., and Shneiderman, B. (1999) Readings in Information Visualization: Using Vision to Think
Bederson, B. and Shneiderman, B. (2003) The Craft of Information Visualization: Readings and Reflections
For More Information
• Treemaps
• HiveGroup: www.hivegroup.com
• Smartmoney: www.smartmoney.com/marketmap
• HCIL Treemap 4.0: www.cs.umd.edu/hcil/treemap
• Spotfire: www.spotfire.com
• TimeSearcher: www.cs.umd.edu/hcil/timesearcher
• NodeXL: nodexl.codeplex.com
• Hierarchical Clustering Explorer: www.cs.umd.edu/hcil/hce
• LifeLines2: www.cs.umd.edu/hcil/lifelines2
• EventFlow: www.cs.umd.edu/hcil/eventflow