Statistical Machine Learning from Many Graphs
Takigawa (Creative Research Institution, Hokkaido University), [email protected]
Invited talk, the 94th JSAI Special Interest Group on Fundamental Problems in Artificial Intelligence (SIG-FPAI)


  • Slide 1/40: Title: Statistical Machine Learning from Many Graphs

    [email protected]

    The 94th JSAI SIG on Fundamental Problems in Artificial Intelligence

  • Slide 2/40: Problem setting

    [Figure: training data of graphs g1, g2, g3, g4, g5, ..., gn paired with real-valued responses y1, ..., yn (0.1, 0.7, 1.2, 0.2, 1.3, 0.9, ...).]

  • Slide 3/40: Outline

    [Outline slide: three parts (1), (2), (3).]

  • Slide 4/40: Learning task

    [Figure: from training pairs (g1, y1), ..., (g5, y5), learn a map from g to y and predict the response y6 of a new graph g6.]

  • Slide 5/40: Learning task (continued)

    [Figure: same as slide 4, the trained model y = f(g) applied to the new graph g6.]

  • Slide 6/40: Motivating application: chemoinformatics

    Predicting numerical properties (0.1, 0.7, 1.2, ...) of molecules from their structures: (quantitative) structure-property relationships (SPR) and structure-activity relationships (SAR), in the tradition of Hansch's QSAR.

  • Slide 7/40: Structure-activity relationships (SAR)

    Classical Hansch-QSAR describes activity through global physicochemical descriptors such as the octanol-water partition coefficient (LogP) and HOMO/LUMO energy levels.

  • Slide 8/40: What SAR models predict

    Typical targets include biological activity and toxicity as well as pharmacokinetic (ADME: absorption, distribution, metabolism, excretion) properties.

  • Slide 9/40: Molecules as labeled graphs

    [Figure: one molecule drawn at several levels of abstraction: structure diagram; skeletal topology; atom/bond-labeled graph (node labels C, O, N, S); KEGG-atom-labeled graph (KCF types such as C1b, C1c, C1x, C8x, C8y, N1x, O2x, S2a); pharmacophore-type-labeled graph (ChemAxon Screen types a, h, d, r); reduced graph (node labels R, A, L, Ar).]

    The term "graph" itself came into mathematics from the chemists' "graphical notation" for molecules (Biggs, Lloyd, and Wilson, Graph Theory 1736-1936, 1986).

  • Slide 10/40: Scale: PubChem already contains more than 52 million unique compounds (as of 2014-07-22).

  • Slide 11/40: Graphs and structured data beyond chemistry (S. Nowozin, Learning with Structured Data: Applications to Computer Vision, PhD thesis, 2009).

  • Slide 12/40

  • Slide 13/40: Outline (revisited)

    [Outline slide: three parts (1), (2), (3).]

  • Slide 14/40: Bag-of-features representation

    Each of the n training graphs gi is encoded by 0/1 indicators (present = 1, absent = 0) of substructure features x1, ..., x6; a code rendering of this matrix follows below:

        y   | g  | x1 x2 x3 x4 x5 x6
        0.1 | g1 |  0  0  1  1  1  0
        0.7 | g2 |  1  0  0  0  0  1
        0.9 | g3 |  1  1  0  1  1  0
        ... | .. |  ...
        1.2 | gn |  1  0  1  1  1  0
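    As a concrete rendering, here is a minimal Python sketch of this bag-of-features encoding; the fragment sets are toy placeholders rather than actually mined substructures:

        # Build the 0/1 indicator matrix from per-graph substructure sets.
        import numpy as np

        graphs = {                      # gi -> substructures found in gi (toy data)
            "g1": {"x3", "x4", "x5"},
            "g2": {"x1", "x6"},
            "g3": {"x1", "x2", "x4", "x5"},
            "gn": {"x1", "x3", "x4", "x5"},
        }
        features = ["x1", "x2", "x3", "x4", "x5", "x6"]

        X = np.array([[int(f in frags) for f in features]
                      for frags in graphs.values()])
        y = np.array([0.1, 0.7, 0.9, 1.2])   # responses from the slide

        print(X)                             # rows reproduce the matrix above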

  • Slide 15/40: Four families of methods

    Data-Driven Fingerprints: Extended Connectivity Fingerprints (Rogers and Hahn, 2010); frequent and/or bounded-size subgraphs (Wale et al., 2008)

    Sparse Learning: Graph AdaBoost (Kudo et al., 2004); Graph LPBoost, gBoost (Saigo et al., 2009); Graph LARS/LASSO (Tsuda et al., 2007)

    Discriminative Subgraph Mining: LEAP (Yan et al., 2008); GraphSig (Ranu et al., 2009); CORK (Thoma et al., 2009)

    Graph Kernels: marginalized kernels (Kashima et al., 2003, 2004; Mahé et al., 2005); walk kernels (Gärtner et al., 2003; Borgwardt et al., 2005; Vishwanathan et al., 2010); weighted decomposition kernels (Menchetti et al., 2005); subtree kernels (Mahé and Vert, 2009); Weisfeiler-Lehman kernel (Shervashidze et al., 2011)

  • Slide 16/40: Data-Driven Fingerprints

    [Same y/g/x 0/1 matrix as slide 14.]

    The recipe: 1. enumerate substructures from the data themselves; 2. encode each graph by 0/1 indicators over the enumerated set. This contrasts with predefined keys such as the PubChem fingerprint or MACCS keys, which fix the substructure list in advance: a data-driven 0/1 fingerprint adapts its features to the data set at hand.

  • Slide 17/40: Data-Driven Fingerprints: empirical comparison (Wale et al., KAIS, 2008)

    Representations compared: 1. fp: hashed fingerprint; 2. ECFP; 3. MK: 166-bit MACCS keys; 4. FS: frequent subgraphs; 5. GF: graph fragments (bounded-size subgraphs).

    GF and ECFP do well, while FS does not(!). Evaluation by ROC50 AUC (AUC computed up to 50 false positives).

  • Slide 18/40: Data-Driven Fingerprints: hashed fingerprints

    Path-based hashed fingerprints as in Daylight and ChemAxon: enumerated paths are hashed into bit positions of a fixed-length vector (figure after Wale et al. and the ChemAxon documentation).

    https://docs.chemaxon.com/display/jchembase/User's+Guide

  • Slide 19/40: Data-Driven Fingerprints: Extended Connectivity Fingerprints (ECFP)

    http://chembioinfo.com/2011/10/30/revisiting-molecular-hashed-fingerprints/

    1. Grow circular neighborhoods around every atom for radii 0 to 3.
    2. Hash each neighborhood's identifier (cf. SMARTS/SMILES substructure strings) into a fixed-length hashed fingerprint.

    Many variations exist; a sketch follows below. The neighborhood-expansion step goes back to the Morgan algorithm (Morgan: J. Chem. Doc. 5, 107-113, 1965).
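    A short Python sketch of these two steps, under toy assumptions: `ecfp_like`, the md5-based hash, the 64-bit folding, and the adjacency-list molecule are all illustrative choices, not RDKit's actual ECFP implementation:

        import hashlib

        def h(s):
            # stable hash (Python's built-in hash() is salted per process)
            return int(hashlib.md5(str(s).encode()).hexdigest(), 16)

        def ecfp_like(atoms, adj, radius=3, nbits=64):
            ids = {a: h(lbl) for a, lbl in atoms.items()}    # radius-0 identifiers
            bits = set()
            for _ in range(radius + 1):
                bits.update(i % nbits for i in ids.values()) # fold into nbits buckets
                ids = {a: h((ids[a], tuple(sorted(ids[b] for b in adj[a]))))
                       for a in atoms}                       # grow neighborhoods by one bond
            fp = [0] * nbits
            for b in bits:
                fp[b] = 1
            return fp

        atoms = {0: "C", 1: "C", 2: "O"}                     # toy chain C-C-O
        adj = {0: [1], 1: [0, 2], 2: [1]}
        print(ecfp_like(atoms, adj))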

  • Slide 20/40: Graph Kernels

    [Same y/g/x matrix as slide 14.]

    Kernels leave the substructure features implicit: walks, length-l paths, or trees. Taking all subgraphs as features is intractable (NP-hard).

  • Slide 21/40: Graph Kernels: a toy example

    [Figure: graphs g1, g2, g3 over node labels a, b, c, mapped to count vectors over labeled pairs x1 = (a,a), x2 = (a,b), x3 = (a,c), x4 = (b,b), x5 = (b,c):

        g1: 0 1 0 0 1
        g2: 0 0 1 0 1
        g3: 1 2 0 0 0

    The kernel value is the inner product of count vectors:

        K    g1  g2  g3
        g1    2   1   2
        g2    1   2   0
        g3    2   0   5  ]

    Any well-defined substructure counts are OK; the feature vectors never need to be materialized, as the computation below shows.
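    In code the toy example is a one-liner: with the count vectors as rows of a matrix X (copied from the slide, not recomputed from the graphs), the Gram matrix is X times its transpose:

        import numpy as np

        X = np.array([[0, 1, 0, 0, 1],    # g1: counts of x1..x5
                      [0, 0, 1, 0, 1],    # g2
                      [1, 2, 0, 0, 0]])   # g3
        K = X @ X.T                       # K[i, j] = <phi(gi), phi(gj)>
        print(K)                          # [[2 1 2] [1 2 0] [2 0 5]]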

  • Slide 22/40: Graph Kernels: which feature space?

    Marginalized kernels (Kashima et al., 2003, 2004; Mahé et al., 2005); walk kernels (Gärtner et al., 2003; Borgwardt et al., 2005; Vishwanathan et al., 2010); weighted decomposition kernels (Menchetti et al., 2005); subtree kernels (Mahé and Vert, 2009); Weisfeiler-Lehman kernel (Shervashidze et al., 2011).

    Every kernel k(g, g') is an inner product in some (Hilbert) feature space V. The question to ask of each proposal: which feature space V does this k correspond to?

  • Slide 23/40: Graph Kernels: the Weisfeiler-Lehman kernel (Shervashidze et al., JMLR 2011)

    Closely related to ECFP: both iterate over expanding neighborhoods. Based on the label-refinement check of the classical Weisfeiler-Lehman (1968) graph-isomorphism test; a sketch follows below.

    http://www.cc.gatech.edu/~lsong/teaching/8803ML/lecture22.pdf

    [Figure: a small labeled graph (labels x1, x2, x3, x5) being relabeled over WL iterations.]
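    A compact sketch of the WL relabeling and the resulting kernel; the function names and toy graphs are illustrative, and Shervashidze et al. describe the full algorithm. The label-compression table must be shared across graphs, as here, so identical neighborhoods get identical new labels:

        from collections import Counter

        def wl_histograms(graphs, rounds=2):
            """graphs: list of (labels, adj), labels: node -> str, adj: node -> list."""
            hists = [Counter(lbl.values()) for lbl, _ in graphs]
            cur = [dict(lbl) for lbl, _ in graphs]
            table = {}                                 # shared label-compression table
            for _ in range(rounds):
                nxt = []
                for (lbl, adj), labels in zip(graphs, cur):
                    new = {}
                    for v in labels:
                        sig = (labels[v], tuple(sorted(labels[u] for u in adj[v])))
                        new[v] = ("wl", table.setdefault(sig, len(table)))
                    nxt.append(new)
                cur = nxt
                for hist, labels in zip(hists, cur):
                    hist.update(labels.values())       # accumulate per-round label counts
            return hists

        def wl_kernel(h1, h2):                         # inner product of histograms
            return sum(h1[k] * h2[k] for k in h1.keys() & h2.keys())

        g1 = ({0: "C", 1: "C", 2: "C"}, {0: [1], 1: [0, 2], 2: [1]})
        g2 = ({0: "C", 1: "C", 2: "O"}, {0: [1], 1: [0, 2], 2: [1]})
        h1, h2 = wl_histograms([g1, g2])
        print(wl_kernel(h1, h2))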

  • Slide 24/40: Sparse Learning: Boosting

    [Same y/g/x matrix as slide 14.]

    Boosting greedily selects informative substructure features (columns) one at a time.

  • Slide 25/40: Sparse Learning: Boosting: Graph AdaBoost (Kudo et al., NIPS 2004)

    Each iteration searches for the single best subgraph feature; after T iterations the model involves at most T subgraphs. The search runs over the gSpan pattern-growth tree with branch-and-bound pruning. The loss is exchangeable: AdaBoost, Arc-GV, or soft-margin boosting (LPBoost).

  • Slide 26/40: Sparse Learning: Boosting: Graph LPBoost, gBoost (Saigo et al., Mach Learn, 2009)

    Replaces AdaBoost with LPBoost (Demiriz et al., 2002). The problem is equivalent to an SVM with hinge loss and an L1 penalty (the 1-norm SVM), so the subgraph weights come out sparse. LPBoost is a totally corrective boosting method: all weights are re-optimized at every round. Solving this LP by Dantzig-Wolfe decomposition (column generation) is exactly boosting, which gives LPBoost (a coordinate-descent view?). See the LP sketch below.
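    Over a fixed, pre-enumerated 0/1 feature matrix (the real gBoost instead generates columns on demand), the 1-norm SVM is a single linear program. A sketch with scipy, splitting beta into positive and negative parts; `l1_svm`, C, and the toy labels are illustrative assumptions:

        import numpy as np
        from scipy.optimize import linprog

        def l1_svm(X, y, C=1.0):
            """min ||beta||_1 + C*sum(xi)  s.t.  y_i (x_i . beta + b) >= 1 - xi_i."""
            n, d = X.shape
            # variables z = [p, q, b+, b-, xi] with beta = p - q, b = b+ - b-
            c = np.r_[np.ones(2 * d), 0.0, 0.0, C * np.ones(n)]
            A = np.hstack([-y[:, None] * X, y[:, None] * X,
                           -y[:, None], y[:, None], -np.eye(n)])
            res = linprog(c, A_ub=A, b_ub=-np.ones(n),
                          bounds=[(0, None)] * (2 * d + 2 + n))
            return res.x[:d] - res.x[d:2 * d], res.x[2 * d] - res.x[2 * d + 1]

        X = np.array([[0, 0, 1, 1, 1, 0], [1, 0, 0, 0, 0, 1],
                      [1, 1, 0, 1, 1, 0], [1, 0, 1, 1, 1, 0]], float)
        y = np.array([1, -1, 1, 1], float)     # toy +/- class labels
        beta, b = l1_svm(X, y)
        print(np.round(beta, 3), round(b, 3))  # the L1 penalty makes beta sparse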

  • Slide 27/40: Interim summary

    1. Data-Driven Fingerprints: hashed fingerprints, ECFP, frequent/bounded-size subgraphs
    2. Graph Kernels: walk and subtree kernels, Weisfeiler-Lehman
    3. Boosting: AdaBoost, LPBoost

    The three families are closely related; AdaBoost and LPBoost differ mainly in the loss they optimize.

  • Slide 28/40: Outline (revisited)

    [Outline slide: three parts (1), (2), (3).]

  • Slide 29/40: Sparse Learning: the regularized formulation

    For \lambda_1, \lambda_2 > 0, a loss L, and coefficients \beta = (\beta_1, \beta_2, \ldots), model a graph g by

        f(g \mid \beta, \beta_0) := \beta_0 + \sum_j \beta_j I(x_j \subseteq g)

    and fit by

        \min_{\beta, \beta_0} \sum_{i=1}^n L\bigl(y_i, f(g_i \mid \beta, \beta_0)\bigr) + \lambda_1 \|\beta\|_1 + \frac{\lambda_2}{2} \|\beta\|_2^2

    Optimization: in the style of AdaBoost/LPBoost (coordinate descent) or of gradient boosting.

  • Slide 30/40: Sparse Learning: the model as a pseudo-Boolean function

        f(g \mid \beta, \beta_0) := \beta_0 + \sum_j \beta_j I(x_j \subseteq g)

    [Same y/g/x matrix as slide 14.]

    Rather than the AdaBoost/LPBoost or graph-kernel views, regard the model directly as a function of the 0/1 indicators: a pseudo-Boolean function, i.e. a linear threshold function on the Boolean cube.

  • Slide 31/40: Searching with bounding

    As in graph LPBoost, the subgraph search runs over the gSpan pattern-growth tree with branch and bound.

    [Figure: a pattern-growth tree; each node carries a 0/1 occurrence vector over the training graphs.]

  • Slide 32/40: Pseudo-Boolean functions

    A pseudo-Boolean function maps an n-dimensional Boolean vector to a real number: z_i \in \{0, 1\}, f(z_1, z_2, \ldots, z_n) \in \mathbb{R}.

    As a pattern grows, entries of its occurrence vector can only flip from 1 to 0; taking max/min over those flips yields the bounds used for bounding the search.

  • Slide 33/40: Bounding: Morishita-Kudo bounds for separable functions

    For a separable f : \{0,1\}^n \to \mathbb{R}, f(u_1, u_2, \ldots, u_n) = \sum_{i=1}^n f_i(u_i), define

        \bar{f}(u) := \sum_{i \in I_1(u)} \max\{f_i(0), f_i(1)\} + \sum_{i \in I_0(u)} f_i(0)

        \underline{f}(u) := \sum_{i \in I_1(u)} \min\{f_i(0), f_i(1)\} + \sum_{i \in I_0(u)} f_i(0)

    Then \underline{f}(u) \le f(v) \le \bar{f}(u) for all v such that I_1(v) \subseteq I_1(u), e.g.

        u = 0110011101
        v = 0010001100      (n digits)

    Separable 0-1 scores covered by these bounds, sketched in code below (Morishita, 2001; Kudo et al., 2005):

    Gain: \sum_{i=1}^n w_i y_i (2 I(x \subseteq g_i) - 1)

    Weighted error count: \sum_{i=1}^n w_i I(I(x \subseteq g_i) \ne y_i)

    Correlation with response: \sum_{i=1}^n y_i I(x \subseteq g_i)
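    Once f is given coordinate-wise, the two bounds are a few lines; this sketch just restates the formulas above, with the gain score as the worked example (unit weights are an arbitrary choice):

        def mk_bounds(f, u):
            """f: list of (f_i(0), f_i(1)); u: 0/1 list. Bounds f(v) for I1(v) ⊆ I1(u)."""
            ub = sum(max(f0, f1) if ui else f0 for (f0, f1), ui in zip(f, u))
            lb = sum(min(f0, f1) if ui else f0 for (f0, f1), ui in zip(f, u))
            return lb, ub

        # Gain sum_i w_i y_i (2 u_i - 1): f_i(0) = -w_i y_i, f_i(1) = +w_i y_i
        w = [1.0, 1.0, 1.0, 1.0]
        y = [1, -1, 1, 1]
        f = [(-wi * yi, wi * yi) for wi, yi in zip(w, y)]
        print(mk_bounds(f, [1, 0, 1, 1]))      # (-2.0, 4.0)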

  • Slide 34/40: Boosting iterations with branch-and-bound search

    Iteration 1: \beta_1 I(x_1 \subseteq g)
    Iteration 2: \beta_1 I(x_1 \subseteq g) + \beta_2 I(x_2 \subseteq g)
    Iteration 3: \beta_1 I(x_1 \subseteq g) + \beta_2 I(x_2 \subseteq g) + \beta_3 I(x_3 \subseteq g)
    ...

    Main trick: the MK bounds drive branch-and-bound pruning of the pattern-growth subtree, so each boosting iteration finds the best feature x_i without exhaustive enumeration. In practice the k best features per iteration are returned (cf. multiple pricing). A toy sketch follows below.
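    A toy branch-and-bound in this spirit, with itemsets standing in for gSpan's subgraph patterns (an assumed simplification: all the bound needs is that refining a pattern only turns 1s into 0s in its occurrence vector):

        import numpy as np

        D = np.array([[1, 1, 0], [1, 0, 1], [0, 1, 1], [1, 1, 1]])  # graphs x items
        w = np.array([1.0, 1.0, 1.0, 1.0])
        y = np.array([1, -1, 1, 1])

        def gain(u):                    # separable score sum_i w_i y_i (2 u_i - 1)
            return float(np.sum(w * y * (2 * u - 1)))

        def upper(u):                   # MK upper bound over all refinements of u
            return float(np.sum(np.where(u == 1, np.abs(w * y), -w * y)))

        best = {"score": -np.inf, "pattern": ()}

        def search(pattern, u, nxt):
            s = gain(u)
            if s > best["score"]:
                best.update(score=s, pattern=pattern)
            if upper(u) <= best["score"]:
                return                  # prune: no refinement can beat the incumbent
            for j in range(nxt, D.shape[1]):
                search(pattern + (j,), u & D[:, j], j + 1)

        search((), np.ones(len(y), dtype=int), 0)
        print(best)                     # item {1} separates the toy classes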

  • Slide 35/40: The optimizer: Tseng and Yun's BCGD (Block Coordinate Gradient Descent)

    For \min_\beta f(\beta) + R(\beta) over the combined parameter \beta = (\beta_0, \beta) with R nonsmooth, iterate

        \beta^{(t+1)} \leftarrow \beta^{(t)} + \alpha^{(t)} d^{(t)}, \quad d^{(t)} := T(\beta^{(t)}) - \beta^{(t)}

        T(\beta^{(t)}) := \arg\min_\beta \; \langle \nabla f(\beta^{(t)}), \beta - \beta^{(t)} \rangle + \tfrac{1}{2} \langle \beta - \beta^{(t)}, H^{(t)} (\beta - \beta^{(t)}) \rangle + R(\beta)

    i.e. T minimizes a second-order approximation of f at \beta^{(t)}. The coordinate block is chosen Gauss-Southwell style: set d^{(t)}_j = 0 for every j with |d^{(t)}_j| \le C \|d^{(t)}\|_\infty (Gauss-Southwell-r rule), and the step length \alpha^{(t)} is selected by the Armijo rule at each iteration. A simplified instance follows below.
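    A stripped-down instance of one BCGD ingredient, assuming squared loss and single-coordinate blocks; with these assumptions the model minimizer is closed-form (soft-thresholding) and the Armijo search always accepts the full step, so it is omitted. The real solver uses blocks, general losses, and the pattern search to propose coordinates:

        import numpy as np

        def soft(z, t):                            # soft-thresholding operator
            return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

        def elastic_net_cd(X, y, lam1=0.1, lam2=0.1, iters=100):
            n, d = X.shape
            beta = np.zeros(d)                     # start from the zero vector
            h = (X ** 2).sum(axis=0) + lam2        # per-coordinate curvature H_jj
            for _ in range(iters):
                g = X.T @ (X @ beta - y) + lam2 * beta
                step = soft(h * beta - g, lam1) / h - beta  # proposed d_j for every j
                j = int(np.argmax(np.abs(step)))            # Gauss-Southwell choice
                if abs(step[j]) < 1e-10:
                    break                          # no coordinate wants to move
                beta[j] += step[j]
            return beta

        X = np.array([[0, 0, 1, 1, 1, 0], [1, 0, 0, 0, 0, 1],
                      [1, 1, 0, 1, 1, 0], [1, 0, 1, 1, 1, 0]], float)
        y = np.array([0.1, 0.7, 0.9, 1.2])
        print(np.round(elastic_net_cd(X, y), 3))   # sparse weights over x1..x6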

  • Slide 36/40: The overall algorithm

    1) Initialize \beta to the zero vector.
    2) Iterate: in each BCGD iteration, search the pattern tree with MK-bound pruning to find the coordinates (subgraphs) to update.
    3) Apply the BCGD update to those coordinates, and repeat until convergence.

    The search is the one used in boosting, but the objective is elastic-net regularized; an iteration can update several coordinates, not one per iteration as in boosting.

  • Slide 37/40: Properties and discussion

    (+) As with LPBoost, the L1 penalty yields sparse, interpretable models; prediction for a test graph needs only the selected subgraphs, as with a fingerprint; the L2 term stabilizes the solution.

    (-) As a pseudo-Boolean function the representation is not unique: different subgraph sets can induce identical 0/1 indicator columns.

  • Slide 38/40: Summary

    [Closing outline: parts (1), (2), (3) revisited.]