
1

Maximizing Correctness with Minimal User Effort to Learn Data Transformations

Bo Wu and Craig Knoblock

University of Southern California

Department of Computer Science

2

Art website Buyer

3

Dimension of artworks

4

Programming by Example

Video is from Excel YouTube official channel (https://www.youtube.com/watch?v=YPG8PAQQ894)

5

Too Many Records

6

Overconfident Users

Users are often overconfident and do not examine the results thoroughly

7

Variations

8

Problem

Enable the users of PBE systems to achieve maximal correctness with minimal effort on large datasets

Help users identify at least one of the incorrect records in every iteration, with minimal effort, on large datasets

9

Approach Overview

Entire dataset:

Raw | Transformed
10“ H x 8” W | 10
H: 58 x W:25” | 58
12”H x 9”W | 12
11”H x 6” | 11
… | …
30 x 46” | 30 x 46

→ Random sampling

Sampled records:

Raw | Transformed
10“ H x 8” W | 10
11”H x 6” | 11
… | …
30 x 46” | 30 x 46

→ Verifying records

Raw | Transformed
11”H x 6” | 11
30 x 46” | 30 x 46
… | …

→ Sorting and color-coding

Raw | Transformed
30 x 46” | 30 x 46
11”H x 6” | 11
… | …
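A minimal Python sketch of the sample-then-sort loop in this overview, under stated assumptions: `sample_and_rank`, `keep_before_inch_mark`, and `not_a_number` are hypothetical names, the learned transformation is stood in for by a toy rule (keep the text before the first inch mark), and the suspicion score is a toy check, not the meta-classifier the system uses.

```python
import random
import re
from typing import Callable, List, Tuple


def sample_and_rank(
    records: List[str],
    transform: Callable[[str], str],
    suspicion: Callable[[str, str], float],
    k: int,
    seed: int = 0,
) -> List[Tuple[str, str]]:
    """Randomly sample up to k raw records, transform them, and order the
    (raw, transformed) pairs so the most suspicious ones come first."""
    rng = random.Random(seed)
    sampled = rng.sample(records, min(k, len(records)))
    pairs = [(raw, transform(raw)) for raw in sampled]
    # Higher score means "more likely incorrect", so sort in descending order.
    pairs.sort(key=lambda pair: suspicion(*pair), reverse=True)
    return pairs


def keep_before_inch_mark(raw: str) -> str:
    """Hypothetical learned program: keep the text before the first inch mark."""
    return re.split(r'[“”"]', raw)[0].strip()


def not_a_number(raw: str, out: str) -> float:
    """Hypothetical suspicion score: 1.0 if the output is not a plain number."""
    return 0.0 if re.fullmatch(r"\d+(?:\.\d+)?", out) else 1.0


if __name__ == "__main__":
    data = ["10“ H x 8” W", "12”H x 9”W", "11”H x 6”", "30 x 46”"]
    for raw, out in sample_and_rank(data, keep_before_inch_mark, not_a_number, k=4):
        print(f"{raw} -> {out}")
```

When all four records are sampled, "30 x 46”" is listed first because its output is not a single number, which mirrors how the sorted view on the slide moves that record to the top.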

10

Learning from users’ feedback

11

Verifying Records

• First, recommend records that cause runtime errors
  – Records that cause the program to exit abnormally
  – Ex: input "2008 Mitsubishi Galant ES $7500 (Sylmar CA) pic"

• Second, recommend potentially incorrect records
  – Learn a binary meta-classifier

Raw | Transformed
11”H x 6” | 11
30 x 46” | 30 x 46
… | …
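The first step can be sketched by running the learned program on every sampled record and catching exceptions. The program `extract_height` is a hypothetical stand-in for a learned transformation, and treating any raised Python exception as an abnormal exit is an assumption made for illustration.

```python
import re
from typing import Callable, List, Tuple


def split_by_runtime_error(
    records: List[str], program: Callable[[str], str]
) -> Tuple[List[Tuple[str, str]], List[Tuple[str, str]]]:
    """Run the program on each record and return (failed, succeeded).

    failed holds (raw, error description); succeeded holds (raw, output).
    """
    failed: List[Tuple[str, str]] = []
    succeeded: List[Tuple[str, str]] = []
    for raw in records:
        try:
            succeeded.append((raw, program(raw)))
        except Exception as exc:  # the program exited abnormally on this record
            failed.append((raw, repr(exc)))
    return failed, succeeded


def extract_height(raw: str) -> str:
    """Hypothetical learned program: the leading number followed by an inch mark."""
    match = re.match(r'\s*(\d+(?:\.\d+)?)\s*[“”"]', raw)
    if match is None:
        raise ValueError(f"no height found in {raw!r}")
    return match.group(1)


if __name__ == "__main__":
    records = ["11”H x 6”", "2008 Mitsubishi Galant ES $7500 (Sylmar CA) pic"]
    failed, succeeded = split_by_runtime_error(records, extract_height)
    print("recommend first:", [raw for raw, _ in failed])
    print("transformed:", succeeded)
```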

12

Learning the Meta-classifier

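[Figure: base classifiers grouped by type: Similarity (cs1, cs2, cs3, cs4), Program agreement (cp1, cp2, cp3, cp4), and Format ambiguity (cf1, cf2, cf3, cf4), …. The meta-classifier combines the selected classifiers cs3, cs4, cp2, and cf1, weighted by w1, w2, w3, and w4.]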
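One plausible reading of this figure is a linear combination of base-classifier scores. The sketch below assumes each base classifier returns a score in [0, 1] (higher means more likely incorrect) and squashes the weighted sum with a logistic function; `meta_score`, `is_questionable`, the 0.5 threshold, and the toy classifiers are assumptions, not details taken from the slides.

```python
import math
from typing import Callable, Sequence

# A base classifier maps (raw, transformed) to a score in [0, 1];
# higher means "more likely incorrect".
BaseClassifier = Callable[[str, str], float]


def meta_score(
    raw: str,
    out: str,
    classifiers: Sequence[BaseClassifier],
    weights: Sequence[float],
    bias: float = 0.0,
) -> float:
    """Weighted combination of base-classifier scores, squashed to (0, 1)."""
    if len(classifiers) != len(weights):
        raise ValueError("expected one weight per base classifier")
    z = bias + sum(w * clf(raw, out) for clf, w in zip(classifiers, weights))
    return 1.0 / (1.0 + math.exp(-z))


def is_questionable(score: float, threshold: float = 0.5) -> bool:
    """Binary decision of the meta-classifier (the threshold is an assumption)."""
    return score >= threshold


# Two toy base classifiers (hypothetical stand-ins for cs3, cp2, ...).
def has_space(raw: str, out: str) -> float:
    return 1.0 if " " in out else 0.0


def longer_than_five(raw: str, out: str) -> float:
    return 1.0 if len(out) > 5 else 0.0


if __name__ == "__main__":
    s = meta_score("30 x 46”", "30 x 46", [has_space, longer_than_five], [2.0, 1.0], bias=-1.5)
    print(round(s, 3), is_questionable(s))
```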

13

Evaluation

• The recommendation contains incorrect records

14

Evaluation

• The recommendation can place incorrect records on top

15

User study

Experiment setup:
• 5 scenarios with 4000 records per scenario
• 10 graduate students divided into two groups

16

Summary and Future Work

• Summary
  – Sample records
  – Identify incorrect/questionable records
  – Allow users to refine the recommendation
  – Color-code the results

• Future work
  – Show histograms of the data
  – Translate the program into readable natural-language text

17

Questions ?

Data and system available at https://github.com/areshand/Web-Karma

18

Type of Classifiers

• Classifier based on distance
• Classifier based on agreement of programs
• Classifier based on format ambiguity
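As an illustration of the second type, a program-agreement score can be sketched as follows, assuming several candidate programs are consistent with the user's examples and that a record is more suspicious when they disagree on its output. The three programs are hypothetical height extractors, and the scoring formula (one minus the majority share) is an assumption rather than the system's definition.

```python
import re
from collections import Counter
from typing import Callable, List, Optional, Sequence

Program = Callable[[str], str]


def disagreement(raw: str, programs: Sequence[Program]) -> float:
    """Share of candidate programs that do not produce the majority output.

    0.0 means all programs agree; values closer to 1.0 mean more disagreement.
    A program that raises is recorded as the output None.
    """
    if not programs:
        raise ValueError("need at least one candidate program")
    outputs: List[Optional[str]] = []
    for program in programs:
        try:
            outputs.append(program(raw))
        except Exception:
            outputs.append(None)
    majority_count = Counter(outputs).most_common(1)[0][1]
    return 1.0 - majority_count / len(outputs)


# Three hypothetical candidate programs for extracting a height.
def first_number(raw: str) -> str:
    match = re.search(r"\d+(?:\.\d+)?", raw)
    if match is None:
        raise ValueError("no number")
    return match.group(0)


def before_inch_mark(raw: str) -> str:
    return re.split(r'[“”"]', raw)[0].strip()


def leading_digits(raw: str) -> str:
    return re.sub(r"[^\d.].*$", "", raw.strip())


if __name__ == "__main__":
    candidates = [first_number, before_inch_mark, leading_digits]
    for record in ["11”H x 6”", "30 x 46”"]:
        print(record, disagreement(record, candidates))
```

All three programs return "11" for "11”H x 6”", so its score is 0.0, while they split between "30" and "30 x 46" on "30 x 46”", which makes that record more suspicious.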

19

Learning from various past results

Raw | Transformed
26" H x 24" W x 12.5 | 26
Framed at 21.75" H x 24.25” W | 21
12" H x 9" | 12

Raw | Transformed
Ravage 2099#24 (November, 1994) | November, 1994
Gambit III#1 (September, 1997) | September, 1997
(comic) Spidey Super Stories#12/2 (September, 1975) | comic

Examples

Incorrect records

Correct records
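The slides do not say how the weights are learned from past results; the sketch below substitutes plain logistic regression trained by batch gradient descent on labeled past records (1 = incorrect, 0 = correct), with base-classifier outputs as features. `train_meta_weights`, `predict`, the learning rate, the epoch count, and the toy data are illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_meta_weights(
    features: Sequence[Sequence[float]],
    labels: Sequence[int],
    lr: float = 0.5,
    epochs: int = 500,
) -> Tuple[List[float], float]:
    """Fit logistic-regression weights by batch gradient descent.

    features[i] holds the base-classifier scores for past record i;
    labels[i] is 1 if that record was incorrect, 0 if it was correct.
    """
    m, n = len(features), len(features[0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n, 0.0
        for x, y in zip(features, labels):
            err = sigmoid(b + sum(wi * xi for wi, xi in zip(w, x))) - y
            for i in range(n):
                grad_w[i] += err * x[i]
            grad_b += err
        # Average the gradient over the batch and take one descent step.
        w = [wi - lr * g / m for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Probability that a record with base-classifier scores x is incorrect."""
    return sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))


if __name__ == "__main__":
    # Toy past results: feature 0 tracks incorrectness, feature 1 is noise.
    X = [[1.0, 0.0], [1.0, 1.0], [0.0, 0.0], [0.0, 1.0]]
    y = [1, 1, 0, 0]
    w, b = train_meta_weights(X, y)
    print([round(predict(w, b, x), 2) for x in X])
```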

20

Sorting Records

Checking transformed records: runtime errors?
• Yes: rank records using #failed_subprograms
• No: rank records using meta-classifier output

Record | #failed_subprograms
2008 Mitsubishi Galant ES $7500 (Sylmar CA) pic | 3
1998 Honda Civic 12k miles s. Auto. - $3800 (Arcadia) | 2
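The decision on this slide can be sketched in Python. `CheckedRecord` and `rank_for_review` are hypothetical names, and the sketch assumes that a record has runtime errors when #failed_subprograms is above zero, that more failed subprograms means it should be reviewed earlier (as the 3-then-2 order of the example suggests), and that a higher meta-classifier output means more likely incorrect. The meta-classifier values in the usage are made up for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CheckedRecord:
    raw: str
    failed_subprograms: int  # > 0 means the program hit a runtime error
    meta_score: float = 0.0  # meta-classifier output; higher = more suspicious


def rank_for_review(records: List[CheckedRecord]) -> List[CheckedRecord]:
    """Records with runtime errors first (most failed subprograms first),
    then the remaining records by descending meta-classifier output."""
    errors = [r for r in records if r.failed_subprograms > 0]
    others = [r for r in records if r.failed_subprograms == 0]
    errors.sort(key=lambda r: r.failed_subprograms, reverse=True)
    others.sort(key=lambda r: r.meta_score, reverse=True)
    return errors + others


if __name__ == "__main__":
    records = [
        CheckedRecord("11”H x 6”", 0, 0.1),
        CheckedRecord("1998 Honda Civic 12k miles s. Auto. - $3800 (Arcadia)", 2),
        CheckedRecord("30 x 46”", 0, 0.9),
        CheckedRecord("2008 Mitsubishi Galant ES $7500 (Sylmar CA) pic", 3),
    ]
    for r in rank_for_review(records):
        print(r.failed_subprograms, r.meta_score, r.raw)
```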