1
Aditya Parameswaran
Assistant Professor
University of Illinois
(w/ Manasi Vartak, Samuel Madden @ MIT; Tarique Siddiqui, Silu Huang @ Illinois)
http://data-people.cs.illinois.edu
DSIA Workshop, VIS 2015
Towards Visualization Recommendation Systems
2
“Bring out your dead!” courtesy Monty Python
The Dark Ages of Visualization Recommendations
Substantial manual effort and tedious trial-and-error
3
To the Age of Enlightenment: the Holy Grail
Can we build systems that automatically recommend visualizations highlighting patterns
of interest?
“The Holy Grail” courtesy Monty Python
4
Why now?
Reason 1: Too much data: records and attributes
Most of the dataset is unexplored!
5
Why now?
Reason 2: Lack of skills
Harvard Business Review
Mashable.com
6
Limitations in Current Tools
• Big Picture
• Analyst Preferences
• Specification
• Exploration
not ACID …
7
Limitations in Current Tools
• Big Picture
– Poor comprehension of context
• Analyst Preferences
– Limited understanding of user interests
• Specification
– Insufficient means to specify trends of interest
• Exploration
– Inadequate navigation to unexplored areas
8
Recent Attempts at Vizrec Systems
• Tableau Elastic
• Voyager
• Harvest
• Profiler
• Our systems
– SeeDB [VLDB 14 x 2, VLDB 16]
– zenvisage [unpublished]
This conference!
Still early days!
9
SeeDB: Comparative Tasks
Task: Compare staplers (target, query) with other products
Results: Visualizations where staplers “differ most” from other products
Issue: Many attributes → many, many visualizations!
[Figure: example visualizations by state (MA, CA, IL, NY) comparing stapler sales with other sales, and stapler prod with other prod]
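To make the comparative task concrete, here is a minimal sketch of ranking candidate visualizations by how much the target (staplers) differs from everything else. The slide only says "differ most"; the L1 distance between normalized distributions used below is an assumption chosen for illustration, and the function names and toy rows are hypothetical, not taken from SeeDB.

```python
from collections import defaultdict


def distribution(rows, dimension, measure):
    """Sum `measure` per value of `dimension`, normalized to sum to 1."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[dimension]] += row[measure]
    grand_total = sum(totals.values())
    if grand_total == 0:
        return dict.fromkeys(totals, 0.0)
    return {key: value / grand_total for key, value in totals.items()}


def deviation(target_rows, other_rows, dimension, measure):
    """L1 distance between the two distributions (an assumed metric)."""
    p = distribution(target_rows, dimension, measure)
    q = distribution(other_rows, dimension, measure)
    return sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in set(p) | set(q))


def rank_views(rows, is_target, dimensions, measures):
    """Score every (dimension, measure) view, highest deviation first."""
    target = [r for r in rows if is_target(r)]
    others = [r for r in rows if not is_target(r)]
    scored = [
        (deviation(target, others, d, m), d, m)
        for d in dimensions
        for m in measures
    ]
    return sorted(scored, reverse=True)


# Toy data (hypothetical): staplers sell mostly in MA, other products in CA.
rows = [
    {"product": "stapler", "state": "MA", "quarter": "Q1", "sales": 40},
    {"product": "stapler", "state": "MA", "quarter": "Q2", "sales": 40},
    {"product": "stapler", "state": "CA", "quarter": "Q1", "sales": 10},
    {"product": "stapler", "state": "CA", "quarter": "Q2", "sales": 10},
    {"product": "other", "state": "MA", "quarter": "Q1", "sales": 10},
    {"product": "other", "state": "MA", "quarter": "Q2", "sales": 10},
    {"product": "other", "state": "CA", "quarter": "Q1", "sales": 40},
    {"product": "other", "state": "CA", "quarter": "Q2", "sales": 40},
]

ranked = rank_views(
    rows,
    is_target=lambda r: r["product"] == "stapler",
    dimensions=["state", "quarter"],
    measures=["sales"],
)
```

In this toy data the sales-by-state view ranks first, because staplers and other products are split across states in opposite proportions, while sales-by-quarter looks identical for both groups. With many attributes, the number of (dimension, measure) pairs to score grows quickly, which is the issue the slide raises.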
10
zenvisage: Search Tasks
Very early demo! Feedback welcome.
(you saw it here first...)
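One way to picture a search task ("find X with desired criteria") is as a nearest-match query over candidate trends. The sketch below is an assumption for illustration only: it does not describe how the demoed system works. It rescales each series to [0, 1] so that shape rather than magnitude is compared, then ranks candidates by Euclidean distance to a desired trend. All names and data are hypothetical, and series are assumed to have equal length.

```python
import math


def scale(series):
    """Rescale a series to [0, 1]; a constant series maps to all zeros."""
    lo, hi = min(series), max(series)
    if hi == lo:
        return [0.0 for _ in series]
    return [(v - lo) / (hi - lo) for v in series]


def distance(a, b):
    """Euclidean distance between two rescaled, equal-length series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(scale(a), scale(b))))


def search(candidates, desired, top_k=1):
    """Return the names of the `top_k` candidates closest to `desired`."""
    ranked = sorted(candidates, key=lambda name: distance(candidates[name], desired))
    return ranked[:top_k]


# Toy sales-over-time series per product (hypothetical).
candidates = {
    "stapler": [10, 20, 30, 40],
    "printer": [40, 30, 20, 10],
    "paper": [20, 20, 25, 20],
}

# The analyst asks for a steadily rising trend.
best = search(candidates, desired=[1, 2, 3, 4])
all_ranked = search(candidates, desired=[1, 2, 3, 4], top_k=3)
```

Here the steadily rising "stapler" series is the closest match and the falling "printer" series is the farthest.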
11
5 Recommendation Axes
• Specification of Intended Task or Insight
– e.g., comparative (X vs. Y), search (find X matching desired criteria), outliers (find unusual X)
• Data Characteristics
– e.g., typical correlations, patterns, trends across attributes, across rows
• Semantics or Domain Knowledge
• Visual Ease of Understanding
• Analyst Preferences
data-people.cs.illinois.edu/papers/dsia.pdf
12
Architectural Considerations
• Pre-computation
• Online computation
– Sharing
– Parallelism
– Pruning
– Approximations [VLDB’15]
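Two of the online-computation ideas above, sharing and pruning, can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the cited systems' implementation: "sharing" is shown as one scan that updates the aggregates of every candidate view instead of one query per view, and "pruning" is shown as a plain keep-the-top-k cut on hypothetical utility scores. The function names and data are made up for the example.

```python
from collections import defaultdict


def shared_scan(rows, dimensions, measure):
    """One pass over the rows updates the aggregates of every candidate view."""
    sums = {d: defaultdict(float) for d in dimensions}
    for row in rows:
        for d in dimensions:
            sums[d][row[d]] += row[measure]
    return {d: dict(groups) for d, groups in sums.items()}


def prune(scores, keep):
    """Keep the `keep` highest-scoring views; the rest get no further work."""
    return sorted(scores, key=scores.get, reverse=True)[:keep]


# Toy rows (hypothetical).
rows = [
    {"state": "MA", "quarter": "Q1", "sales": 40},
    {"state": "MA", "quarter": "Q2", "sales": 40},
    {"state": "CA", "quarter": "Q1", "sales": 10},
    {"state": "CA", "quarter": "Q2", "sales": 10},
]

# Sharing: both group-bys are answered by a single scan.
aggregates = shared_scan(rows, ["state", "quarter"], "sales")

# Pruning: given (hypothetical) utility scores, continue with the best two views.
survivors = prune({"state": 1.2, "quarter": 0.0, "product": 0.4}, keep=2)
```

A real system would likely prune on partial data with some statistical guarantee rather than on final scores; the fixed top-k cut here only shows where pruning removes work.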
13
A Clarion Call to DSIA Researchers…
Visualization Recommendation Systems:
are critically important
are timely
lead to interesting viz, db, ml, hci problems
Let’s move towards the age of enlightenment!
“The Holy Grail” courtesy Monty Python
14
Ongoing Projects in Interactive Analytics
Minimizing effort & maximizing efficiency
http://data-people.cs.illinois.edu
• Data Manipulation [VLDB ’15 x 2]
• Data Visualization [VLDB ’14 x 2, VLDB ’15, VLDB ’16]
• Data Collaboration [VLDB ’15 x 2, CIDR ’15, TAPP ’15]
• Data Processing with [VLDB ’15, HCOMP ’15, KDD ’15]
datahub
Recent Papers, Demos
POPULACE
15
16
Research Thrust II: Crowds
Minimizing cost and maximizing accuracy in human-powered data management
Data Processing Algorithms
Auxiliary Plugins: Quality, Pricing
Data Processing Systems
Filter [SIGMOD12, VLDB14] Max [SIGMOD12] Clean [KDD12, TKDD13] Categorize [VLDB11]
Search [ICDE14] Debug [NIPS12] Count [HCOMP15]
Deco [CIKM12, VLDB12, TR12, SIGMOD Record 12]
DataSift [HCOMP13, SIGMOD14] HQuery [CIDR11] Conf [KDD13, ICDE15] Evict [TR12] Debias [KDD15] Pricing [VLDB15] Quality [HCOMP14]
17
Human-in-the-loop Data Management
Dual personalities
• Analysts supervising the analysis
– How do we help them get the insights they want?
• Crowds helping the analysis
– How do we best make use of them to process data?
18
[Figure: architecture diagram; labels: Visualizations, Queries (100s), Sharing, Pruning, Optimizer, DBMS, Middleware Layer]
[Figure: interface; labels: Task Specification, Manual Visualization Builder, Visualization Pane, Recommendation Bar]
User Study
Part I: Validate utility metric vs. other metrics
– See paper!
Part II: Study impact of recommendations
– H1: SeeDB finds interesting visualizations faster
– H2: Users prefer tool w/ recommendations
I. SeeDB enables faster analysis
• Users view more visualizations with SeeDB
• Users bookmark more visualizations with SeeDB
• Bookmark rate 3X higher with SeeDB

          # charts         # bookmarks     bookmark rate
Manual    6.3 +/- 3.8      1.1 +/- 1.45    0.14 +/- 0.16
SeeDB     10.8 +/- 4.41    3.4 +/- 1.35    0.43 +/- 0.23
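As a quick check on the "3X" claim, the ratios between the reported means can be computed directly. This is plain arithmetic on the numbers in the table; it ignores the +/- spreads and says nothing about statistical significance. The variable names are illustrative.

```python
# Reported means from the user study table (spreads omitted).
study = {
    "charts": {"manual": 6.3, "seedb": 10.8},
    "bookmarks": {"manual": 1.1, "seedb": 3.4},
    "bookmark_rate": {"manual": 0.14, "seedb": 0.43},
}

# Ratio of the SeeDB mean to the Manual mean for each measure.
ratios = {name: v["seedb"] / v["manual"] for name, v in study.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x")
# charts: 1.71x
# bookmarks: 3.09x
# bookmark_rate: 3.07x
```

The bookmark rate ratio of about 3.07 matches the "3X higher" bullet.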
II. Users Prefer SeeDB
100% of users prefer SeeDB over Manual
“. . . quickly deciding what correlations are relevant” and “[analyze] . . . a new dataset quickly”
“. . . great tool for proposing a set of initial queries for a dataset”
“. . . potential downside may be that it made me lazy so I didn’t bother thinking as much about what I really could study or be interested in”
Questions on Part 2?
24
Overall research agenda …
Human-in-the-loop Data Management
25