Maximizing Correctness with Minimal User Effort to Learn Data Transformations
Bo Wu and Craig Knoblock
University of Southern California
Department of Computer Science
Programming by Example
Video from the official Excel YouTube channel (https://www.youtube.com/watch?v=YPG8PAQQ894)
Problem
Enable the users of PBE systems to achieve maximal correctness with minimal effort on large datasets
Help users identify at least one incorrect record in every iteration, with minimal effort, on large datasets
Approach Overview
Entire dataset
Raw              Transformed
10“ H x 8” W     10
H: 58 x W:25”    58
12”H x 9”W       12
11”H x 6”        11
…                …
30 x 46”         30 x 46

Random sampling

Sampled records
Raw              Transformed
10“ H x 8” W     10
11”H x 6”        11
…                …
30 x 46”         30 x 46

Verifying records
Raw              Transformed
11”H x 6”        11
30 x 46”         30 x 46
…                …

Sorting and color-coding
Raw              Transformed
30 x 46”         30 x 46
11”H x 6”        11
…                …
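The pipeline above (random sampling, verifying, sorting) can be sketched roughly as follows. This is only an illustration: `toy_program` and `looks_suspicious` are hypothetical stand-ins for the learned transformation and the verification step, and the slides do not specify how either is implemented.

```python
import random

INCH_MARKS = "“”\""  # the inch marks that appear in the raw records


def toy_program(raw):
    """Hypothetical learned program: keep the text before the first inch mark."""
    for i, ch in enumerate(raw):
        if ch in INCH_MARKS:
            return raw[:i].strip()
    return raw.strip()


def looks_suspicious(transformed):
    """Hypothetical check: a height should consist of digits only."""
    return not transformed.isdigit()


def review_queue(records, sample_size, seed=0):
    """Sample records, transform them, and list suspicious rows first."""
    rng = random.Random(seed)
    k = min(sample_size, len(records))
    sampled = rng.sample(records, k)
    rows = [(raw, toy_program(raw)) for raw in sampled]
    flagged = [(raw, out, looks_suspicious(out)) for raw, out in rows]
    # sorted() is stable, so rows keep their sampled order within each group.
    return sorted(flagged, key=lambda row: not row[2])


records = ["10“ H x 8” W", "12”H x 9”W", "11”H x 6”", "30 x 46”"]
queue = review_queue(records, sample_size=4)
```

With these four records, the row for `30 x 46”` (transformed to `30 x 46`) is placed at the top of the queue, mirroring the sorted table; color-coding would then be a display concern layered on the boolean flag.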
Verifying Records
• First, recommend records causing runtime errors
  – Records that cause the program to exit abnormally
• Second, recommend potentially incorrect records
  – Learn a binary meta-classifier
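The two-stage recommendation order can be sketched as below. This is a minimal sketch under stated assumptions: `height_program` and `too_large` are hypothetical stand-ins for the learned program and the meta-classifier, and the record `120”H x 9”W` is made up for illustration.

```python
def recommend(records, program, is_questionable):
    """Return records to review: runtime failures first, then questionable outputs."""
    runtime_errors = []
    questionable = []
    for raw in records:
        try:
            output = program(raw)
        except Exception as exc:  # the program exits abnormally on this record
            runtime_errors.append((raw, None, type(exc).__name__))
            continue
        if is_questionable(raw, output):
            questionable.append((raw, output, "questionable"))
    return runtime_errors + questionable


# Hypothetical stand-ins, for illustration only.
def height_program(raw):
    # Raises ValueError when the text before the first ” is not a number.
    return int(raw.split("”")[0])


def too_large(raw, output):
    return output > 100


records = ["11”H x 6”", "30 x 46”", "120”H x 9”W"]
queue = recommend(records, height_program, too_large)
```

Here the record that makes the toy program fail comes first, the record flagged by the stand-in classifier comes second, and the unflagged record is not recommended at all.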
Ex:
Input: 2008 Mitsubishi Galant ES $7500 (Sylmar CA) pic

Raw              Transformed
11”H x 6”        11
30 x 46”         30 x 46
…                …
Learning the Meta-classifier
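[Figure: the meta-classifier combines three families of classifiers: similarity (cs1, cs2, cs3, cs4, …), program agreement (cp1, cp2, cp3, cp4, …), and format ambiguity (cf1, cf2, cf3, cf4, …). The selected classifiers (cs3, cs4, cp2, cf1, …) carry weights w1, w2, w3, w4, ….]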
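One way to read the weighted classifiers in the figure is as a weighted vote. The sketch below assumes that each selected classifier votes +1 (incorrect) or -1 (correct) and that the meta-classifier flags a record when the weighted sum is positive; the slide shows only the classifiers and weights, so the combining rule, the threshold, and all names here are assumptions.

```python
def meta_classify(record, classifiers, weights):
    """Weighted vote: return (flagged_as_incorrect, score)."""
    if len(classifiers) != len(weights):
        raise ValueError("need exactly one weight per classifier")
    score = sum(w * clf(record) for clf, w in zip(classifiers, weights))
    return score > 0, score


# Hypothetical base classifiers over (raw, transformed) pairs.
def not_all_digits(record):
    return 1 if not record[1].isdigit() else -1


def long_output(record):
    return 1 if len(record[1]) > 3 else -1


def output_equals_input(record):
    return 1 if record[0] == record[1] else -1


def empty_output(record):
    return 1 if record[1] == "" else -1


classifiers = [not_all_digits, long_output, output_equals_input, empty_output]
weights = [2.0, 1.0, 1.0, 0.5]  # made-up values standing in for w1..w4

flagged, score = meta_classify(("30 x 46”", "30 x 46"), classifiers, weights)
ok_flagged, ok_score = meta_classify(("11”H x 6”", "11"), classifiers, weights)
```

For the first record the votes are +1, +1, -1, -1, giving a score of 1.5 and a flag; for the second all four vote -1, giving -4.5 and no flag.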
User Study
Experiment setup:
• 5 scenarios with 4000 records per scenario
• 10 graduate students divided into two groups
Summary and Future Work
• Summary
  – Sample records
  – Identify incorrect/questionable records
  – Allow the user to refine the recommendation
  – Color-code the results
• Future work
  – Show histograms of the data
  – Translate the program to readable natural text
Types of Classifiers
• Classifier based on distance
• Classifier based on agreement of programs
• Classifier based on format ambiguity
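As an illustration of the second type only, a program-agreement check might flag a record when candidate programs produce different outputs for it. The two candidate programs below are hypothetical, and the slides do not say how agreement is actually measured; the distance-based and format-ambiguity classifiers are not sketched here.

```python
import re


def programs_disagree(raw, programs):
    """Flag a record when candidate programs do not all give the same output."""
    outputs = set()
    for program in programs:
        try:
            outputs.add(program(raw))
        except Exception:
            outputs.add(None)  # treat a crash as its own distinct outcome
    return len(outputs) > 1


# Two hypothetical programs, both assumed consistent with the user's examples.
def before_first_inch_mark(raw):
    return raw.split("”")[0].strip()


def first_number(raw):
    return re.search(r"\d+", raw).group()


candidates = [before_first_inch_mark, first_number]
agree_case = programs_disagree("11”H x 6”", candidates)
disagree_case = programs_disagree("30 x 46”", candidates)
```

Both programs return `11` for the first record, so it is not flagged; for the second they return `30 x 46` and `30`, so it is.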
Learning from various past results
…
Raw                                                    Transformed
26" H x 24" W x 12.5                                   26
Framed at 21.75" H x 24.25” W                          21
12" H x 9"                                             12
…

Raw                                                    Transformed
Ravage 2099#24 (November, 1994)                        November, 1994
Gambit III#1 (September, 1997)                         September, 1997
(comic) Spidey Super Stories#12/2 (September, 1975)    comic
…

[Figure labels: Examples; Incorrect records; Correct records]