Upload
others
View
2
Download
0
Embed Size (px)
Citation preview
Non-monotone Continuous DR-submodular Maximization:Structure and Algorithms
An Bian, Kfir Y. Levy, Andreas Krause and Joachim M. Buhmann
DR-submodular (Diminishing Returns) Maximization & Its Applications
π Softmax extension for determinantal point processes (DPPs) [Gillenwater et al β12]
π Mean-field inference for log-submodular models [Djolonga et al β14]
π DR-submodular quadratic programming
π (Generalized submodularity over conic lattices) e.g., logistic regression with a non-convex separable regularizer [Antoniadis et al β11]
π Etcβ¦ (more see paper)
Based on Local-Global Relation, can use any solver for finding an approximately stationary point as the subroutine, e.g., the Non-convex Frank-Wolfe solver in [Lascote-Julien β16]
TWO-PHASE ALGORITHMInput: stopping tolerances π", π$, #iterations πΎ", πΎ$π β Non-convex Frank-Wolfe(π, π«, πΎ", π") // Phase I on π«π¬ β π« β© π π β€ π0 β π}π β Non-convex Frank-Wolfe(π, π¬, πΎ$, π$) // Phase II on π¬Output: argmax π π , π(π)
Underlying Properties of DR-submodular Maximization
π Concavity Along Non-negative Directions:
Experimental Results (more see paper)
DR-submodular (DR property) [Bian et al β17]: βπ β€ π β π³, βπ, βπ β βC, it holds,
π ππE + π β π π β₯ π ππE + π β π(π).
- If π differentiable, π»π() is an antitone mapping (βπ β€ π, it holds π»π π β₯ π»π π )
- If π twice differentiable, π»EI$π π β€ 0, βπ
maxπβπ«
π(π)π: π³ β β is continuous DR-submodular. π³ is a hypercube. Wlog, let π³ = π, π0 . π« β π³ is convex and down-closed: π β π« & π β€ π β€ π implies π β π«.
App
licat
ions
Ref
eren
ces
Feldman, Naor, and Schwartz. A unified continuous greedy algorithm for submodular maximization. FOCS 2011
Gillenwater, Kulesza, and Taskar. Near-optimal map inference for determinantal point processes. NIPS 2012.
Bach. Submodular functions: from discrete to continous domains. arXiv:1511.00394, 2015.
Lacoste-Julien. Convergence rate of frank-wolfe for non-convex objectives. arXiv:1607.00345, 2016.
Bian, Mirzasoleiman, Buhmann, and Krause. Guaranteed non-convex optimization: Submodular maximization over continuous domains. AISTATS 2017.
Quadratic Lower Bound. With a πΏ-Lipschitz gradient, for all π and π β Β±βCT , it holds,π π + π β₯ π π + β¨π»π π , πβ© β W$ π X
Strongly DR-submodular & Quadratic Upper Bound. π is π-strongly DR-submodular if for all π and π β Β±βCT , it holds,
π π + π β€ π π + β¨π»π π , πβ© β Z$ π X
Two Guaranteed Algorithms
Guarantee of TWO-PHASE ALGORITHM.
max π π , π π β₯[\( π β πβ $ + π β πβ $)+^_ ` π
β abcd e^f^g^
οΏ½ ,i^ abcdeXfXg^
οΏ½ ,iX ,
where πβ β π β¨ πβ β π
NON-MONOTONE FRANK-WOLFE VARIANTInput: step size πΎ β (0,1]π(o) β 0, π β 0, π‘(o) β 0 // π‘: cumulative step sizeWhile π‘(q) < 1 do:
π(q) β argmaxπβπ«,πsπ0aπ(t) π, π»π(π(q)) // shrunken LMO
πΎq β min πΎ, 1 β π‘(q)
π(qC") β π(q) + πΎqπ(q), π‘(qC") β π‘(q) + πΎq, π + +Output: π(w)
Guarantee of NON-MONOTONE FRANK-WOLFE VARIANT.
π π w β₯ πa"π πβ β π "wX π πβ β z
XW$w
Baselines: - QUADPROGIP: global solver for non-convex quadratic programming (possibly in exponential time)- Projected Gradient Ascent (PROJGRAD) with diminishing step sizes (" qC"β )
DR-submodular Quadratic Programming. Synthetic problem instances π π = ^Xπ|ππ + π
π + π, π« = {π β βCT|ππ β€ π, π β€π0, π β βCCΓT, π β βC} has π linear constraints.
Randomly generated in two manners:1) Uniform distribution (see Figs below); 2) Exponential distribution
Maximizing Softmax Extensions for MAP inference of DPPs.π π = log det diag π π β π + π , π β 0,1 T
π:kernel/similarity matrix. π« is a matching polytope for matched summarization.
Synthetic problem instances: - Softmax objectives: generate π with π random eigenvalues - Generate polytope constraints similarly as that for quadratic programming
Real-world results on matched summarization:Select a set of document pairs out of a corpus of documents, such that the two documents within a pair are similar, and the overall set of pairs is as diverse as possible. Setting similar to [Gillenwater et al β12], experimented on the 2012 US Republican detates data.
0.2 0.4 0.6 0.8 1Match quality controller
2
4
6
8
10
Func
tion
valu
e
0 20 40 60 80 100Iteration
0
0.5
1
1.5
2
2.5
Func
tion
valu
e
Submodular
Concave Convex
DR-submodular
π Approximately Stationary Points & Global Optimum:
(Local-Global Relation). Let π β π« with non-stationarity ππ« π . Define π¬ β π« β© π π β€ π0 β π}. Let π β π¬ with non-stationarity ππ¬ π . Then,
max π π , π π β₯ ^_[π πβ β ππ« π β ππ¬ π ] + [\( π β π
β $ + π β πβ $),
where πβ β π β¨ πβ β π.
- Proof using the essential DR property on carefully constructed auxiliary points
- Good empirical performance for the Two-Phase algorithm: if π is away from πβ, π β πβ $ will augment the bound; if π is close to πβ, by the smoothness of π, should be near optimal.
DR-submodularity captures a subclass of non-convex/non-concave functions that enables exact minimization and approximate maximization in poly. time.
π Investigate geometric properties that underlie such objectives, e.g., a strong relation between stationary points & global optimum is proved.
π Devise two guaranteed algorithms: i) A βtwo-phaseβ algorithm with ΒΌ approximation guarantee. ii) A non-monotone Frank-Wolfe variant with " β approximation guarantee
π Extend to a much broader class of submodular functions on βconicβ lattices.
Abst
ract
8 10 12 14 16Dimensionality
0.85
0.9
0.95
1
Appr
ox. r
atio
8 10 12 14 16Dimensionality
0.85
0.9
0.95
1
Appr
ox. r
atio
8 10 12 14 16Dimensionality
0.9
0.95
1
Appr
ox. r
atio
π = 0.5π π = π π = 1.5π
8 10 12 14 16Dimensionality
-0.05
0
0.05
0.1
0.15
0.2
Func
tion
valu
e
8 10 12 14 16Dimensionality
-0.05
0
0.05
0.1
0.15
0.2
Func
tion
valu
e
8 10 12 14 16Dimensionality
-0.05
0
0.05
0.1
0.15
0.2
Func
tion
valu
e
π = 0.5π π = 1.5ππ = π
key difference from the monotone Frank-Wolfevariant [Bian et al β17]
Lemma. For any π, π, β¨π β π, π»π π β© β₯ π π β¨ π + π π β§ π β 2π π + Z$ πaπ X
If π»π π = 0, then 2π π β₯ π π β¨ π + π π β§ π + [X πaπ XΓ implicit relation between π & π. (finding an exact stationary point is difficult π)
Non-stationarity Measure [Lacoste-Julien β16]. For any π¬ β π³, the non-stationarity of π β π¬ is,ππ¬ π β maxπβπ¬ β¨π β π, π»π π β©
coordinate-wise max. coordinate-wise min.
π·: diameter of π«πΏ: smooth gradient
Softmax (red) & multilinear (blue)extensions, & concave cross-sectionsFig. from [Gillenwater et al β12]