Fast Cross Validation Via Sequential Analysis - Talk

Embed Size (px)

Citation preview

  • 8/3/2019 Fast Cross Validation Via Sequential Analysis - Talk

    1/16

    Fast Cross-

    Validation viaSequential

    Analysis

    TammoKruegerDanny

    PankninMikio Braun

    Motivation

    Fast CV

    Algorithm

    Meta-parameters

    Example Run

    Experiments

    Test ErrorSpeed Increase

    Conclusion

    Fast Cross-Validation via Sequential Analysis

    Tammo KruegerDanny PankninMikio Braun

    Machine Learning GroupTechnische Universitaet Berlin

    16.12.2011 Big Learning Workshop

  • 8/3/2019 Fast Cross Validation Via Sequential Analysis - Talk

    2/16

    Fast Cross-

    Validation viaSequential

    Analysis

    TammoKruegerDanny

    PankninMikio Braun

    Motivation

    Fast CV

    Algorithm

    Meta-parameters

    Example Run

    Experiments

    Test ErrorSpeed Increase

    Conclusion

    Motivation

    Cross-validation is an indispensable tool for applied MLbut unfortunately very time consuming

    Example fitted Q iteration:

    Cross-Validation 10 10 parameter

    10fold

    50max. iter.

    10reps.

    = 500, 000 reg. problems

    Directly optimizing the error landscape to avoid

    calculations difficult due to noiseOur approach: use increasing subsets of the training data

    1 smaller subsets less training time2 more training data better error estimate3 relative behavior of parameter configurations converges

  • 8/3/2019 Fast Cross Validation Via Sequential Analysis - Talk

    3/16

    Fast Cross-

    Validation viaSequential

    Analysis

    TammoKruegerDanny

    PankninMikio Braun

    Motivation

    Fast CV

    Algorithm

    Meta-parameters

    Example Run

    Experiments

    Test ErrorSpeed Increase

    Conclusion

    Motivation Main Idea (Average over 500 Reps.)

    log()

    Class.

    Error

    0.1

    0.2

    0.3

    0.4

    qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq

    qqq

    q

    q

    q

    qqqqq

    qqqqqqqq

    4 3 2 1 0 1 2

    Size

    q 50

    100

    250

    500

    750

    1000

  • 8/3/2019 Fast Cross Validation Via Sequential Analysis - Talk

    4/16

    Fast Cross-

    Validation viaSequential

    Analysis

    TammoKruegerDanny

    PankninMikio Braun

    Motivation

    Fast CV

    Algorithm

    Meta-parameters

    Example Run

    Experiments

    Test ErrorSpeed Increase

    Conclusion

    Motivation Main Idea (Individual Reps.)

    log()

    Class.

    Error

    0.1

    0.2

    0.3

    0.4

    0.5

    0.1

    0.2

    0.3

    0.4

    0.5

    0.1

    0.2

    0.3

    0.4

    0.5

    1qqqqqqqqqqqqqqqqqqqqqqqqqqqqq

    qqqqqqqqqqqqqq

    q

    q

    q

    qq

    q

    qqq

    q

    qq

    q

    q

    q

    qq

    q

    4

    qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq

    qqq

    q

    q

    q

    qq

    qqqqqqq

    q

    q

    q

    7

    qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq

    qqqqqqqqqqq

    q

    q

    q

    qq

    q

    q

    qqqqqq

    q

    q

    qq

    4 3 2 1 0 1 2

    2qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq

    qqqqqqqqqqqqq

    qq

    qq

    qqqqqq

    qq

    q

    q

    q

    qq

    5

    qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq

    qqqq

    qqqqqq

    qq

    qqq

    q

    qqq

    8

    qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq

    qqqq

    qqqqqq

    qq

    qqqqqq

    4 3 2 1 0 1 2

    3qqqqqqqqqqqqqqqqqqqqqqqqqqqq

    qqqqqqqqqqqqqqqqq

    q

    q

    q

    q

    q

    qqqqqqq

    qqqq

    6

    qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq

    qqqqqqqqqq

    q

    q

    q

    q

    q

    qqqqqq

    qqqq

    q

    qq

    9

    qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqq

    qq

    q

    q

    qqqq

    qqqqqq

    qqq

    q

    q

    4 3 2 1 0 1 2

    Size

    q 50

    100

    250

    500

    750

    1000

  • 8/3/2019 Fast Cross Validation Via Sequential Analysis - Talk

    5/16

    Fast Cross-

    Validation viaSequential

    Analysis

    TammoKruegerDanny

    PankninMikio Braun

    Motivation

    Fast CV

    Algorithm

    Meta-parameters

    Example Run

    Experiments

    Test ErrorSpeed Increase

    Conclusion

    Motivation Exploitation

    Observations:

    Individual runs are noisy, but at least we can see thetendency

    A lot of underperforming parameter configurations

    We can estimate the correct parameter on asufficiently large subset of the data

    Exploitation:

    Transformation of the pointwise test errors of the

    configurations into a binary top or flop scheme Dropping of significant loser configurations along the

    way via tests from the sequential analysis framework Early stopping of the procedure, when we have seen

    enough data for a stable parameter estimation

  • 8/3/2019 Fast Cross Validation Via Sequential Analysis - Talk

    6/16

    Fast Cross-

    Validation viaSequential

    Analysis

    TammoKruegerDanny

    PankninMikio Braun

    Motivation

    Fast CV

    Algorithm

    Meta-parameters

    Example Run

    Experiments

    Test ErrorSpeed Increase

    Conclusion

    Fast Cross-Validation Procedure Algorithm

    data points steps

    conf. d1 d2 d3 dn1 dn 1 2 3 4 5 6 7 8 9 10c1 -2.2 -1.9 -1.8 2.1 1.5 flop 0 0 0 0 1 0 0 0 0 0 ()c2 -1.9 -2.4 -2.3 1.9 2.4 flop 0 1 0 0 0 0 0 0 0 0 ()c3 -1.4 -0.9 -0.7 0.5 0.5 flop 0 1 1 0 0 1 0 0 0 0

    ......

    ... ...

    ck2 0.6 0.6 0.7 -0.8 -0.4 top 0 1 0 1 1 1 1 0 1 1ck1 0.1 0.5 0.7 -0.9 -0.1 top 0 1 1 1 1 1 0 1 1 1

    ck 0.5 0.4 0.6 -0.3 0.0 top 1 1 0 1 1 1 0 1 1 1pointwisePerformance matrix trace matrix

    0 5 10 15 20

    0

    5

    10

    15

    20

    0 0 0 01 0 0 0 0 0 1

    1 01

    11 0

    11

    1

    X

    H0(0, 1, l, l)

    Sa(0, 1, l, l)

    WINNER

    LOSER

    c1ck

    7 8 9 10

    c3 0 0 0 0

    ?=

    ......

    ck2 1 0 1 1ck1 0 1 1 1

    ck 0 1 1 1

    = N/20

    modelSize = 10

    n = N 10

  • 8/3/2019 Fast Cross Validation Via Sequential Analysis - Talk

    7/16

    Fast Cross-

    Validation viaSequential

    Analysis

    TammoKruegerDanny

    PankninMikio Braun

    Motivation

    Fast CV

    Algorithm

    Meta-parameters

    Example Run

    Experiments

    Test ErrorSpeed Increase

    Conclusion

    Meta-parameters Selection of Test Parameters

    ck 0.5 0.4 0.6 -0.3 0.0 top 1 1 0 1 1pointwisePerformance matrix trace matrix

    0 5 10 15 20

    0

    5

    10

    15

    20

    0 0 0 0

    1 0 0 0 0 0

    11 0

    11

    1 01

    11

    X

    H0(0, 1, l, l)

    Sa(0, 1, l, l)

    WINNER

    LOSER

    c1ck

    c

    ckck

    c

    mo

    (0, 1) =argmax0,

    1

    H0 (

    0,

    1, l, l)

    s.t. Sa(0, 1, l, l) (steps 1, steps]

  • 8/3/2019 Fast Cross Validation Via Sequential Analysis - Talk

    8/16

    Fast Cross-

    Validation viaSequential

    Analysis

    TammoKruegerDanny

    PankninMikio Braun

    Motivation

    Fast CV

    Algorithm

    Meta-parameters

    Example Run

    Experiments

    Test ErrorSpeed Increase

    Conclusion

    Meta-parameters False Negative Rate

    Change Point

    FalseNegativeRate

    0.0

    0.2

    0.4

    0.6

    0.8

    q q q

    q q

    q q

    q q

    q

    q qq q q

    q q q q

    5 10 15

    Pi

    q 0.1

    0.2

    0.30.4

    0.5

    0 cp

    steps

    log l1l log10

    log 1ll

    log 1110

    security zone (false negative rate of 0)

    with steps

    log1 ll

    / log2

  • 8/3/2019 Fast Cross Validation Via Sequential Analysis - Talk

    9/16

    Fast Cross-

    Validation viaSequential

    Analysis

    TammoKruegerDanny

    PankninMikio Braun

    Motivation

    Fast CV

    Algorithm

    Meta-parameters

    Example Run

    Experiments

    Test ErrorSpeed Increase

    Conclusion

    Fast Cross-Validation Procedure Example Run

    Step

    Configuration

    100

    200

    300

    400

    500

    600

    1 2 3 4 5 6 7

    Status

    flop

    top

    out

    50

    100

    150

    2 3 4

    50

    100

    150

    3 4 5

    20

    40

    60

    80

    4 5 6

    1020304050

    6070

    5 6 7

  • 8/3/2019 Fast Cross Validation Via Sequential Analysis - Talk

    10/16

    Fast Cross-

    Validation viaSequentialAnalysis

    TammoKruegerDanny

    PankninMikio Braun

    Motivation

    Fast CV

    Algorithm

    Meta-parameters

    Example Run

    Experiments

    Test ErrorSpeed Increase

    Conclusion

    Experimental Setup

    8 classification and 7 regression data sets

    For each dataset:12 for parameter estimation,

    12 for test error estimation

    SVM/SVR and Kernel Ridge Regression/Kernel Logistic

    Regression with Gaussian kernel using 610 parameterconfigurationsParameter estimation with:

    Full 10-fold cross-validation

    Fast cross-validation procedure with 10 steps

    Repeated 50 times with different splits for each datasetCompare:

    Test error difference of fast versus full cross-validationRelative speed factor, i.e. time full cross-validationtime fast cross-validation

  • 8/3/2019 Fast Cross Validation Via Sequential Analysis - Talk

    11/16

    Fast Cross-

    Validation viaSequentialAnalysis

    TammoKruegerDanny

    PankninMikio Braun

    Motivation

    Fast CV

    Algorithm

    Meta-parameters

    Example Run

    Experiments

    Test Error

    Speed Increase

    Conclusion

    Experiments Test Error

    MSEFastCV

    MSEFull

    CV

    0.04

    0.02

    0.00

    0.02

    0.04

    0.06

    Classification

    q

    q

    q

    q

    q

    q

    q

    q

    q

    qq

    q

    q

    q

    q

    q

    q

    q

    qq

    qq

    q

    q

    q

    q

    q

    q

    q

    q

    q

    q

    qq

    q

    q

    q

    banana

    covtype

    german

    image

    ringnorm

    splice

    twonorm

    waveform

    Regression

    q

    qq

    q

    qq

    qq qq

    q

    q

    q

    q

    q

    q

    q

    q

    q

    q

    ba

    nk32nm

    blocks

    bu

    mps

    do

    ppler

    he

    avisine

    kin32nm

    pu

    madyn32nm

    method

    kreg

    sv

  • 8/3/2019 Fast Cross Validation Via Sequential Analysis - Talk

    12/16

    Fast Cross-

    Validation viaSequentialAnalysis

    TammoKruegerDanny

    PankninMikio Braun

    Motivation

    Fast CV

    Algorithm

    Meta-parameters

    Example Run

    Experiments

    Test Error

    Speed Increase

    Conclusion

    Experiments Speed Increase

    TimeFullCV/TimeFastCV

    20

    40

    60

    80

    100

    120

    140

    Classification

    q

    q

    q

    q

    qq

    q

    q

    q

    q

    q

    qqq

    banana

    covtype

    german

    image

    ringnorm

    splice

    twonorm

    waveform

    Regression

    q

    q

    q

    q

    q

    qqq

    q

    q

    q

    q

    q

    q

    ba

    nk32nm

    blocks

    bu

    mps

    do

    ppler

    he

    avisine

    kin32nm

    pu

    madyn32nm

    method

    kreg

    sv

  • 8/3/2019 Fast Cross Validation Via Sequential Analysis - Talk

    13/16

    Fast Cross-

    Validation viaSequentialAnalysis

    TammoKruegerDanny

    PankninMikio Braun

    Motivation

    Fast CV

    Algorithm

    Meta-parameters

    Example Run

    Experiments

    Test Error

    Speed Increase

    Conclusion

    Fast Cross-Validation Procedure Summary

    Motivation: we can estimate the correct parameter on asufficiently large subset of the data

    Transformation: Race of configurations evaluated on

    linearly increasing subsets of the dataAt each step of this race:

    1 Transform the test errors on individual data points of theremaining configurations into a binary top or flop scheme

    2 Drop significant loser configurations along the way using

    tests from the sequential analysis framework3 Apply distribution free testing techniques to decide,

    whether we have gathered enough evidence for a stableparameter estimation

  • 8/3/2019 Fast Cross Validation Via Sequential Analysis - Talk

    14/16

    Fast Cross-

    Validation viaSequentialAnalysis

    TammoKruegerDanny

    PankninMikio Braun

    Motivation

    Fast CV

    Algorithm

    Meta-parameters

    Example Run

    Experiments

    Test Error

    Speed Increase

    Conclusion

    Questions? Remarks?Thanks for your attention!

  • 8/3/2019 Fast Cross Validation Via Sequential Analysis - Talk

    15/16

    Fast Cross-

    Validation viaSequentialAnalysis

    TammoKruegerDanny

    PankninMikio Braun

    Motivation

    Fast CV

    Algorithm

    Meta-parameters

    Example Run

    Experiments

    Test Error

    Speed Increase

    Conclusion

    Experiments Traces Classification

    variable

    value

    0

    100

    200

    300

    400

    500

    600

    0

    100

    200

    300

    400

    500

    600

    0

    100

    200

    300

    400

    500

    600

    banana

    q q

    qqqqqqqqq qqqqqqqqqqqq

    q

    q

    qq

    q

    qqq

    qq

    q

    qqq

    qq

    q

    q

    qqq

    qqqq

    qq

    image

    q

    q

    q

    qqqqqqq

    q

    q

    q

    q

    q

    qqqq

    q

    q

    q

    qqqqqq

    q

    qq

    q

    qqq

    q

    q

    q

    q

    q

    qq

    qq

    qq

    qq

    q

    qq

    q

    q

    twonorm

    q q

    qqqqqqqqqq

    q

    qq

    q

    q

    q

    qqq

    q

    qq

    qq

    q

    q

    q

    q

    q

    qq

    qq

    S1S2S3S4S5S6S7S8S9S10

    covtype

    q

    q

    q

    q q

    qqq

    qqqq

    q

    q

    q

    q

    q

    qqq

    q

    qq

    q

    qqq

    qq

    q

    q

    q

    q

    qq

    qq

    qqq

    q

    qq

    q

    q

    q

    q

    q

    qqqq

    q

    q

    q

    q

    q

    q q

    q

    q

    q

    q

    ringnorm

    qqqqqqqqqqqq

    qq

    qqq

    q

    qqqq

    qqq

    qqq

    qq

    q q

    waveform

    qq qq

    qqqqq q

    qq

    q

    q

    q

    q

    q

    q

    q

    qqqqqqq

    qqqq

    q

    q

    q

    q

    qqqq

    qqq

    qq

    S1S2S3S4S5S6S7S8S9S10

    german

    qq

    q

    q

    qq

    q

    q

    q

    q

    qq

    q

    q

    qq

    q

    q

    q

    q

    qqqq

    q

    q

    q

    q

    q

    q

    q

    q

    q

    qq

    q

    q

    q

    q

    q

    q

    q

    q

    q

    q

    q

    qq

    q

    q

    q

    q

    q

    q

    q

    q

    q

    q

    q

    q

    q

    q

    q

    q

    q

    q

    q

    q

    q

    q

    q

    q

    q

    q

    q

    q

    q

    q

    q

    q

    q

    q

    splice

    qqqqqqqqqqqqqq

    qqqqqqq

    qq

    qqq

    qqqqqq

    qq

    q qqqqqq

    qq

    q qqqq

    qq

    q

    S1S2S3S4S5S6S7S8S9S10

    method

    kreg

    sv

  • 8/3/2019 Fast Cross Validation Via Sequential Analysis - Talk

    16/16

    Fast Cross-

    Validation viaSequentialAnalysis

    TammoKruegerDanny

    PankninMikio Braun

    Motivation

    Fast CV

    Algorithm

    Meta-parameters

    Example Run

    Experiments

    Test Error

    Speed Increase

    Conclusion

    Experiments Traces Regression

    variable

    value

    0

    100

    200

    300

    400

    500

    600

    0

    100

    200

    300

    400

    500

    600

    0

    100

    200

    300

    400

    500600

    bank32nm

    qqq

    qqqqqqqq

    qqqqq q qqqq q q

    doppler

    qq qq

    pumadyn32nm

    q q

    q

    qqqqqqqqqq

    qqqqqqqqqq qq

    qqqqqq qqqqqqqq

    qqqqq qq

    S1S2S3S4S5S6S7S8S9S10

    blocks

    q q q q qqq

    heavisine

    qq

    qq

    q

    qqqqqqq qq

    q qq

    S1S2S3S4S5S6S7S8S9S10

    bumps

    qqqqqqq qqqq qqqqqqq qqqq

    kin32nm

    q

    q

    q

    qqqqqq qqq

    qqqq

    qq

    qq

    qqqq

    qq

    S1S2S3S4S5S6S7S8S9S10

    method

    kreg

    sv