Subset Selection by Pareto Optimization:
Theories and Practical Algorithms
Chao Qian
UBRI, School of Computer Science and Technology, University of Science and Technology of China
http://staff.ustc.edu.cn/~chaoqian/
Email: chaoqian@ustc.edu.cn
Outline
Introduction
Pareto optimization for subset selection
Pareto optimization for large-scale subset selection
Pareto optimization for noisy subset selection
Conclusion
Maximum coverage
Maximum coverage [Feige, JACM'98]: select at most B sets from n given sets to make their union maximal.
Formally stated: given a ground set U, a collection V = {S_1, …, S_n} of subsets of U, and a budget B, find a subset S ⊆ V such that
    max_{S ⊆ V} f(S) = |∪_{S_i ∈ S} S_i|   s.t.   |S| ≤ B.
[Figure: sets S_1, …, S_n covering elements of the ground set.]
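To make the objective concrete, here is a minimal Python sketch (the toy instance and all names are mine, not from the tutorial) that evaluates f(S) and brute-forces a tiny instance:

```python
from itertools import combinations

def coverage(chosen_sets):
    """f(S): the size of the union of the chosen sets."""
    covered = set()
    for s in chosen_sets:
        covered |= s
    return len(covered)

# Toy instance: V = {S_1, ..., S_4} over the ground set U = {1, ..., 6}.
V = [{1, 2, 3}, {3, 4}, {4, 5, 6}, {1, 6}]
B = 2

# Exhaustive search is fine at toy scale; the problem is NP-hard in general.
best = max(combinations(V, B), key=coverage)
print(coverage(best))  # 6: e.g., {1, 2, 3} and {4, 5, 6} cover everything
```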
Sparse regression
Sparse regression [Tropp, TIT'04]: find a sparse approximate solution to the linear regression problem.
Formally stated: given all observation variables V = {v_1, …, v_n}, a predictor variable z, and a budget B, find a subset S ⊆ V such that
    max_{S ⊆ V} R²_{z,S} = (Var(z) − MSE_{z,S}) / Var(z)   s.t.   |S| ≤ B.
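As an illustration, a small Python sketch (synthetic data; no intercept term, so the variables are assumed centered; these details are mine) of computing R²_{z,S} for a given subset S via least squares:

```python
import numpy as np

def r_squared(X, z, subset):
    """R^2_{z,S} = (Var(z) - MSE_{z,S}) / Var(z) for the columns in `subset`."""
    if not subset:
        return 0.0
    Xs = X[:, list(subset)]
    coef, *_ = np.linalg.lstsq(Xs, z, rcond=None)  # least-squares fit of z from S
    mse = np.mean((z - Xs @ coef) ** 2)
    return (np.var(z) - mse) / np.var(z)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))                    # n = 5 observation variables
z = 2 * X[:, 0] - X[:, 3] + 0.1 * rng.normal(size=100)
print(r_squared(X, z, [0, 3]))                   # close to 1: S = {v_1, v_4}
```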
Influence maximization
Influence maximization [Kempe et al., KDD'03]: select a subset of users from a social network to maximize its influence spread.
Formally stated: given a directed graph G = (V, E) with V = {v_1, …, v_n}, edge probabilities p_{u,v} ((u, v) ∈ E), and a budget B, find a subset S ⊆ V such that
    max_{S ⊆ V} f(S) = Σ_{i=1}^n p(S → v_i)   s.t.   |S| ≤ B.
[Figure: a social network with the influential users highlighted.]
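Since f(S) is an expectation over random diffusions, in practice it is estimated by simulation; below is a hedged Python sketch under the independent cascade model (the graph encoding is my assumption):

```python
import random

def simulate_once(graph, seeds):
    """One diffusion: each newly active u tries to activate v with prob. p_uv."""
    active, frontier = set(seeds), list(seeds)
    while frontier:
        u = frontier.pop()
        for v, p_uv in graph.get(u, []):
            if v not in active and random.random() < p_uv:
                active.add(v)
                frontier.append(v)
    return len(active)

def influence_spread(graph, seeds, runs=10000):
    """Monte Carlo estimate of f(S): average number of activated nodes."""
    return sum(simulate_once(graph, seeds) for _ in range(runs)) / runs

# graph maps u -> list of (v, p_uv); a toy 4-node network.
graph = {1: [(2, 0.5), (3, 0.3)], 2: [(4, 0.4)], 3: [(4, 0.2)]}
print(influence_spread(graph, {1}))
```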
Document summarization
Document summarization [Lin & Bilmes, ACL'11]: select a few sentences to best summarize the documents.
Sensor placement
Sensor placement [Krause & Guestrin, IJCAI'09 tutorial]: select a few places to install sensors such that the information gathered is maximized.
[Figures: water contamination detection; fire detection.]
Subset selection
Subset selection is to select a subset of size at most B from a total set of n items to optimize some objective function.
Formally stated: given all items V = {v_1, …, v_n}, an objective function f: 2^V → ℝ, and a budget B, find a subset S ⊆ V such that
    max_{S ⊆ V} f(S)   s.t.   |S| ≤ B.

Application               v_i                           f
maximum coverage          a set of elements             size of the union
sparse regression         an observation variable       MSE of prediction
influence maximization    a social network user         influence spread
document summarization    a sentence                    summary quality
sensor placement          a place to install a sensor   entropy

Many applications, but NP-hard in general!
Subset selection - submodular
Subset selection: given all items V = {v_1, …, v_n}, an objective function f: 2^V → ℝ, and a budget B, find a subset S ⊆ V such that
    max_{S ⊆ V} f(S)   s.t.   |S| ≤ B.
Monotone: for any X ⊆ Y ⊆ V, f(X) ≤ f(Y).
Submodular [Nemhauser et al., MP'78]: satisfies the natural diminishing-returns property, i.e., for any X ⊆ Y ⊆ V and v ∉ Y,
    f(X ∪ {v}) − f(X) ≥ f(Y ∪ {v}) − f(Y).
A discrete analogue of convexity!
Submodularity ratio [Zhang & Vorobeychik, AAAI'16]:
    γ_f = min_{X ⊆ Y, v ∉ Y} [f(X ∪ {v}) − f(X)] / [f(Y ∪ {v}) − f(Y)].
The optimal approximation guarantee for submodular f [Nemhauser & Wolsey, MOR'78]: 1 − 1/e ≈ 0.632, achieved by the greedy algorithm.
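For intuition, a brute-force Python sketch (mine) that evaluates this submodularity ratio directly from the definition; it is exponential in |V|, so only feasible for tiny ground sets:

```python
from itertools import chain, combinations

def subsets(items):
    items = list(items)
    return chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))

def submodularity_ratio(f, V):
    """gamma_f = min over X subset of Y, v not in Y of marginal-gain ratios."""
    gamma = float("inf")
    for Y in map(set, subsets(V)):
        for X in map(set, subsets(Y)):
            for v in V - Y:
                denom = f(Y | {v}) - f(Y)
                if denom > 0:  # monotone f, so denom >= 0; skip zero gains
                    gamma = min(gamma, (f(X | {v}) - f(X)) / denom)
    return gamma

# Coverage functions are submodular, so the ratio is 1.
sets = {1: {1, 2}, 2: {2, 3}, 3: {3, 4}}
f = lambda S: len(set().union(*[sets[i] for i in S]))
print(submodularity_ratio(f, {1, 2, 3}))  # 1.0
```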
Variants of subset selection
• Monotone set function maximization with size constraints
    max_{S ⊆ V} f(S)  s.t.  |S| ≤ B
  greedy guarantee: 1 − e^{−γ} [Das & Kempe, ICML'11]
• Monotone set function maximization with general constraints (|S| ≤ B → c(S) ≤ B)
  greedy guarantee: (γ/2)(1 − e^{−γ}) [Zhang & Vorobeychik, AAAI'16]
• Monotone multiset function maximization with size constraints (S: a subset → a multiset)
  greedy guarantee: (1 − 1/e)/2 [Soma et al., ICML'14]
• Monotone k-submodular function maximization with size constraints (S: a subset → k subsets)
  greedy guarantee: 1/2 [Ohsaka & Yoshida, NIPS'15]
• Monotone sequence function maximization with size constraints (S: a subset → a sequence)
  greedy guarantee: 1 − e^{−1/(2Δ)} [Tschiatschek et al., AAAI'17]
• Ratio optimization of monotone functions
    min_{S ⊆ V} f(S)/g(S)
  greedy guarantee: |X*| / ((1 + (|X*| − 1)(1 − κ̂))γ) [Bai et al., ICML'16]
Previous approaches
• Greedy algorithms
Process: iteratively select the one item that currently optimizes some criterion.
Problem:  max_{S ⊆ V} f(S)  s.t.  |S| ≤ B
Iteration 1: from V = {v_1, v_2, …, v_n}, pick the best single item v* and set S_1 = {v*}.
Iteration j: v* = argmax_{v ∈ V∖S_{j−1}} [f(S_{j−1} ∪ {v}) − f(S_{j−1})];  S_j = S_{j−1} ∪ {v*}.
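A generic Python sketch of this greedy scheme for the size-constrained case (the toy coverage instance is illustrative):

```python
def greedy(f, V, B):
    """Iteratively add the item with the largest marginal gain f(S+v) - f(S)."""
    S = set()
    for _ in range(B):
        v_star = max(V - S, key=lambda v: f(S | {v}) - f(S), default=None)
        if v_star is None or f(S | {v_star}) <= f(S):
            break  # no remaining item improves the criterion
        S.add(v_star)
    return S

# Toy coverage instance: greedy picks {1,2,3} first, then {4,5,6}.
sets = [{1, 2, 3}, {3, 4}, {4, 5, 6}, {1, 6}]
f = lambda S: len(set().union(*[sets[i] for i in S]))
print(greedy(f, set(range(len(sets))), B=2))  # {0, 2}
```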
Previous approaches
• Greedy algorithms
Process: iteratively select the one item that currently optimizes some criterion.
Problem:  max_{S ⊆ V} f(S)  s.t.  c(S) ≤ B
Iteration 1: from V = {v_1, v_2, …, v_n}, pick the best single item v* and set S_1 = {v*}.
Iteration j: v* = argmax_{v ∈ V∖S_{j−1}} [f(S_{j−1} ∪ {v}) − f(S_{j−1})] / [c(S_{j−1} ∪ {v}) − c(S_{j−1})];  S_j = S_{j−1} ∪ {v*}.
Previous approaches
• Greedy algorithms
Process: iteratively select the one item that currently optimizes some criterion.
Problem:  min_{S ⊆ V} f(S)/g(S)
Iteration 1: from V = {v_1, v_2, …, v_n}, pick the best single item v* and set S_1 = {v*}.
Iteration j: v* = argmin_{v ∈ V∖S_{j−1}} [f(S_{j−1} ∪ {v}) − f(S_{j−1})] / [g(S_{j−1} ∪ {v}) − g(S_{j−1})];  S_j = S_{j−1} ∪ {v*}.
Previous approaches
• Greedy algorithms
Process: iteratively select the one item that currently optimizes some criterion.
Problem:  max_{S_1, S_2, …, S_k ⊆ V} f(S_1, S_2, …, S_k)  s.t.  Σ_{1≤i≤k} |S_i| ≤ B
Iteration 1: from V = {v_1, v_2, …, v_n}, pick the best pair (v*, i*) and set S_{i*} = {v*}.
Iteration j: (v*, i*) = argmax_{v ∈ V∖∪_{1≤i≤k} S_i, i ∈ {1, 2, …, k}} [f(S_1, …, S_i ∪ {v}, …, S_k) − f(S_1, …, S_i, …, S_k)];  S_{i*} = S_{i*} ∪ {v*}.
Previous approaches
• Greedy algorithms
Process: iteratively select the one item that currently optimizes some criterion.
Weakness: can get stuck in local optima due to the greedy behavior.
Previous approaches (cont'd)
• Relaxation methods
Process: relax the original problem, then find an optimal solution to the relaxed problem:
    max_{S ⊆ V} f(S)  s.t.  |S| ≤ B
    →  max_{w ∈ ℝ^n} f̂(w)  s.t.  ‖w‖_0 ≤ B    (non-convex)
    →  max_{w ∈ ℝ^n} f̂(w)  s.t.  ‖w‖_1 ≤ B    (convex)
Weakness: the optimal solution of the relaxed problem may be far from the true optimum.
Motivation
Subset selection:  max_{x ∈ {0,1}^n} f(x)  s.t.  |x| ≤ B    (a subset S ⊆ V ↔ a binary vector x ∈ {0,1}^n)
Two conflicting objectives:
1. Optimize the criterion:  max_{x ∈ {0,1}^n} f(x)
2. Keep the size small:  min_{x ∈ {0,1}^n} max{|x| − B, 0}
Why not directly optimize the bi-objective formulation?
    min_{x ∈ {0,1}^n} (−f(x), |x|)
Outline
Introduction
Pareto optimization for subset selection
Pareto optimization for large-scale subset selection
Pareto optimization for noisy subset selection
Conclusion
Pareto optimization
The basic idea: transform the original constrained problem into a bi-objective optimization problem
    max_{x ∈ {0,1}^n} f(x)  s.t.  |x| ≤ B   →   min_x (f_1(x), f_2(x))
[Figure: three solutions x, y, z in the (f_1, f_2) objective space; one quadrant is "better f_1, better f_2", another "worse f_1, better f_2".]
x dominates z:  f_1(x) < f_1(z) ∧ f_2(x) < f_2(z)
x is incomparable with y:  f_1(x) > f_1(y) ∧ f_2(x) < f_2(y)
The same transformation applies to the other problem forms:
    max_{x ∈ {0,1}^n} f(x)  s.t.  c(x) ≤ B
    min_{x ∈ {0,1}^n} f(x)/g(x)
    max_{x ∈ {0,1,…,k}^n} f(x)  s.t.  |x| ≤ B
Pareto optimization
The basic idea: solve the bi-objective reformulation
    max_{x ∈ {0,1}^n} f(x)  s.t.  |x| ≤ B   →   min_x (f_1(x), f_2(x))
with a simple multi-objective evolutionary algorithm [Laumanns et al., TEvC'04], which cycles through initialization, reproduction, and evaluation & updating:
• Initialization: put a random or special solution into the population P.
• Reproduction: pick a solution randomly from P, and randomly change it (e.g., flip each bit of x ∈ {0,1}^n with prob. 1/n).
• Evaluation & updating: if the new solution is not dominated, put it into P and weed out bad solutions.
• Output: select the best solution w.r.t. the original problem.
How to transform each problem form?
    max_{x ∈ {0,1}^n} f(x)  s.t.  |x| ≤ B
    max_{x ∈ {0,1}^n} f(x)  s.t.  c(x) ≤ B
    min_{x ∈ {0,1}^n} f(x)/g(x)
    max_{x ∈ {0,1,…,k}^n} f(x)  s.t.  |x| ≤ B
Pareto optimization vs Greedy algorithms
Greedy algorithms:
• Produce a new solution by adding a single item (single-bit forward search: 0 → 1)
• Maintain only one solution
Pareto optimization:
• Produce a new solution by flipping each bit of a solution with prob. 1/n (single-bit forward search, backward search, and multi-bit search)
• Maintain several non-dominated solutions, thanks to the bi-objective formulation
Pareto optimization may have a better ability to avoid local optima!
Monotone set function maximization with size constraints
The POSS approach [Qian, Yu and Zhou, NIPS'15]
Transformation:
    max_{x ∈ {0,1}^n} f(x)  s.t.  |x| ≤ B      (original)
    →  min_{x ∈ {0,1}^n} (−f(x), |x|)          (bi-objective)
• Initialization: put the special solution 0^n into the population P.
• Reproduction: pick a solution x randomly from P, and flip each bit of x with prob. 1/n to produce a new solution.
• Evaluation & updating: if the new solution is not dominated, put it into P and weed out bad solutions.
• Output: select the best feasible solution.
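A compact Python sketch of POSS along these lines (simplified and mine: objective values are recomputed instead of cached, and solutions of size ≥ 2B are discarded, mirroring how the paper treats overly large solutions as infeasible):

```python
import random

def poss(f, n, B, T):
    """Minimize (-f(x), |x|): keep a population of non-dominated bit-vectors."""
    P = [[0] * n]                                        # the all-zeros solution
    for _ in range(T):
        x = random.choice(P)                             # pick a parent uniformly
        y = [b ^ (random.random() < 1 / n) for b in x]   # flip each bit w.p. 1/n
        if sum(y) >= 2 * B:
            continue                                     # treated as infeasible
        if any(f(z) >= f(y) and sum(z) <= sum(y) for z in P):
            continue                                     # y is weakly dominated
        P = [z for z in P if not (f(y) >= f(z) and sum(y) <= sum(z))] + [y]
    return max((x for x in P if sum(x) <= B), key=f)     # best feasible solution

# Toy coverage instance in binary encoding.
sets = [{1, 2, 3}, {3, 4}, {4, 5, 6}, {1, 6}]
f = lambda x: len(set().union(*[s for s, b in zip(sets, x) if b]))
print(poss(f, n=4, B=2, T=2000))  # most likely [1, 0, 1, 0]
```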
Theoretical analysis
Theorem 1. For monotone set function maximization with size constraints, POSS using E[T] ≤ 2eB²n finds a solution x with |x| ≤ B and f(x) ≥ (1 − e^{−γ})·OPT.
Here E[T] is the expected number of iterations, and 1 − e^{−γ} is the best known polynomial-time approximation ratio, previously obtained by the greedy algorithm [Das & Kempe, ICML'11].
POSS can achieve the same general approximation guarantee as the greedy algorithm.
Proof
Lemma 1. For any S ⊆ V, there exists one item v ∈ V∖S such that
    f(S ∪ {v}) − f(S) ≥ (γ/B)(OPT − f(S)),
where OPT is the optimal function value and γ is the submodularity ratio [Das & Kempe, ICML'11].
Roughly speaking, the improvement obtained by adding a specific item is proportional to the current distance to the optimum.
Proof
Lemma 1. For any S ⊆ V, there exists one item v ∈ V∖S such that
    f(S ∪ {v}) − f(S) ≥ (γ/B)(OPT − f(S)).
Main idea: track, over {0,1}^n, the progress measure j from j = 0 to j = B:
• consider a solution x with |x| ≤ j and f(x) ≥ (1 − (1 − γ/B)^j)·OPT;
• the initial solution 00…0 satisfies this invariant for j = 0;
• at j = B, f(x) ≥ (1 − (1 − γ/B)^B)·OPT ≥ (1 − e^{−γ})·OPT, the desired approximation bound.
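Spelling out the inductive step behind this invariant (standard algebra combining Lemma 1 with the bound on f(x); the offspring x' adds the item from Lemma 1):

```latex
\begin{align*}
f(x') &\ge f(x) + \tfrac{\gamma}{B}\bigl(\mathrm{OPT} - f(x)\bigr)
       = \bigl(1 - \tfrac{\gamma}{B}\bigr) f(x) + \tfrac{\gamma}{B}\,\mathrm{OPT} \\
      &\ge \bigl(1 - \tfrac{\gamma}{B}\bigr)\Bigl(1 - \bigl(1 - \tfrac{\gamma}{B}\bigr)^{j}\Bigr)\mathrm{OPT}
        + \tfrac{\gamma}{B}\,\mathrm{OPT}
       = \Bigl(1 - \bigl(1 - \tfrac{\gamma}{B}\bigr)^{j+1}\Bigr)\mathrm{OPT},
\end{align*}
% and after B such steps, (1 - (1 - \gamma/B)^B) OPT >= (1 - e^{-\gamma}) OPT.
```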
Proof
Lemma 1. For any S ⊆ V, there exists one item v ∈ V∖S such that
    f(S ∪ {v}) − f(S) ≥ (γ/B)(OPT − f(S)).
Main idea:
• consider a solution x with |x| ≤ j and f(x) ≥ (1 − (1 − γ/B)^j)·OPT;
• in each iteration of POSS:
  - select x from the population P: probability 1/|P|;
  - flip exactly the one 0-bit of x given by Lemma 1: probability (1/n)(1 − 1/n)^{n−1} ≥ 1/(en);
  - the offspring x' then has |x'| = |x| + 1 ≤ j + 1 and f(x') ≥ (1 − (1 − γ/B)^{j+1})·OPT;
• hence j → j + 1 happens with probability at least 1/(en|P|) per iteration.
Proof
Lemma 1. For any S ⊆ V, there exists one item v ∈ V∖S such that
    f(S ∪ {v}) − f(S) ≥ (γ/B)(OPT − f(S)).
Main idea (continued):
• j → j + 1 happens with probability at least 1/(en|P|) ≥ 1/(2enB), since the population P keeps at most one solution per size and thus |P| ≤ 2B;
• so j → j + 1 needs at most 2enB iterations in expectation;
• going from j = 0 to j = B therefore needs at most B · 2enB = 2eB²n expected iterations, which proves Theorem 1.
Theoretical analysis
Theorem 1. For monotone set function maximization with size constraints, POSS using E[T] ≤ 2eB²n finds a solution x with |x| ≤ B and f(x) ≥ (1 − e^{−γ})·OPT.
→ POSS can achieve the same general approximation guarantee as the greedy algorithm, the best known polynomial-time approximation ratio [Das & Kempe, ICML'11].
Theorem 2. For the Exponential Decay subclass of sparse regression [Das & Kempe, STOC'08], POSS using E[T] = O(B²(n − B)n log n) finds an optimal solution, while the greedy algorithm cannot.
→ POSS can do better than the greedy algorithm in some cases.
Sparse regression
Sparse regression [Tropp, TIT'04]: find a sparse approximate solution to the linear regression problem.
Formally stated: given all observation variables V = {v_1, …, v_n}, a predictor variable z, and a budget B, find a subset S ⊆ V such that
    max_{S ⊆ V} R²_{z,S} = (Var(z) − MSE_{z,S}) / Var(z)   s.t.   |S| ≤ B.
Experimental results - R² values
Setting: the size constraint B = 8; the number of iterations of POSS: 2eB²n.
[Table: R² values of POSS vs. exhaustive search (OPT), greedy algorithms, and relaxation methods on each data set.]
POSS is significantly better than all the compared methods on all data sets.
Experimental results - R² values
Setting: different size constraints B = 3 to 8.
[Figure: R² vs. B.] POSS tightly follows OPT, and has a clear advantage over the other methods.
Experimental results - running time
Theoretical running time: OPT (exhaustive search): n^B/B^B; greedy methods (FR): Bn; POSS: 2eB²n.
[Figure: running time of POSS to reach the greedy solution quality.] POSS can be much more efficient in practice than its worst-case theoretical bound suggests.
Monotone set function maximization with general constraints
The POMC approach [Qian, Shi, Yu and Tang, IJCAI'17]
Transformation:
    max_{x ∈ {0,1}^n} f(x)  s.t.  c(x) ≤ B      (original)
    →  min_{x ∈ {0,1}^n} (−f(x), c(x))          (bi-objective)
Theory: POMC can achieve the same approximation guarantee (γ/2)(1 − e^{−γ}) as the greedy algorithm [Zhang & Vorobeychik, AAAI'16].
Application: influence maximization.
Monotone multiset function maximization with size constraints
The POMS approach [Qian, Zhang, Tang and Yao, AAAI'18]
Transformation:
    max_{x ∈ ℤ₊^n} f(x)  s.t.  |x| ≤ B      (original)
    →  min_{x ∈ ℤ₊^n} (−f(x), |x|)          (bi-objective)
Theory: POMS can achieve the same approximation guarantee (1 − 1/e)/2 as the greedy algorithm [Soma et al., ICML'14].
Application: generalized influence maximization.
Monotone k-submodular function maximization with size constraints
The MOMS approach [Qian, Shi, Tang and Zhou, TEvC, in press]
Transformation:
    max_{x ∈ {0,1,…,k}^n} f(x)  s.t.  |x| ≤ B      (original)
    →  min_{x ∈ {0,1,…,k}^n} (−f(x), |x|)          (bi-objective)
Theory: MOMS can achieve the same approximation guarantee 1/2 as the greedy algorithm [Ohsaka & Yoshida, NIPS'15].
Application: sensor placement.
Monotone sequence function maximization with size constraints
The POSeqSel approach [Qian, Feng and Tang, IJCAI'18]
Transformation:
    max_{x ∈ 𝒮} f(x)  s.t.  |x| ≤ B      (original, 𝒮 the set of sequences)
    →  min_{x ∈ 𝒮} (−f(x), |x|)          (bi-objective)
Theory: POSeqSel can achieve the approximation guarantee 1 − e^{−1/2}, better than that of the greedy algorithm [Tschiatschek et al., AAAI'17].
Application: movie recommendation.
Ratio optimization of monotone functions
The PORM approach [Qian, Shi, Yu, Tang and Zhou, IJCAI'17]
Transformation:
    min_{x ∈ {0,1}^n} f(x)/g(x)          (original)
    →  min_{x ∈ {0,1}^n} (f(x), −g(x))   (bi-objective)
Theory: PORM can achieve the same approximation guarantee |X*| / ((1 + (|X*| − 1)(1 − κ̂))γ) as the greedy algorithm [Bai et al., ICML'16], where X* is an optimal solution and κ̂ is the curvature.
Application: F-measure maximization in information retrieval.
Pareto optimization for subset selection achieves superior performance on diverse variants of subset selection, both theoretically and empirically. However:
• the running time (e.g., 2eB²n) for achieving a good solution is unsatisfactory when the problem size (e.g., B and n) is large;
• it is a sequential algorithm that cannot be readily parallelized.
Both restrict its application to large-scale real-world problems.
Can we make the Pareto optimization method parallelizable?
Outline
Introduction
Pareto optimization for subset selection
Pareto optimization for large-scale subset selection
Pareto optimization for noisy subset selection
Conclusion
Pareto optimization
1. Randomly generate a solution, and put it into the population P;
2. Loop:
   2.1 pick a solution randomly from P;
   2.2 randomly change it to make a new one;
   2.3 if the new one is not strictly worse:
       2.3.1 put it into P;
       2.3.2 remove worse solutions from P;
3. When terminated, select the best feasible solution from P.
[Flow diagram: random solution → population P → pick a solution → flip each bit with prob. 1/n → a new solution → put it into P and remove worse solutions → … → when terminated, select the best feasible solution. One new solution is generated per iteration.]
Parallel Pareto optimization
[Flow diagram: as before, but in each iteration many solutions are picked and mutated in parallel, producing many new solutions, and the population P is then updated with all of them at once.]
Parallel Pareto optimization
POSS [Qian et al., NIPS'15]: one new solution per iteration; a specific successful mutation has probability about 1/(en).
PPOSS [Qian et al., IJCAI'16]: λ new solutions per iteration, one per processor; the probability that at least one of them is the desired mutation is 1 − (1 − 1/(en))^λ ≈ λ/(en), so the required number of iterations drops roughly by a factor of λ.
Q: does PPOSS keep the same solution quality?
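A Python sketch of this scheme (the process-pool details are my assumptions; only offspring generation runs in parallel, while the population update stays sequential, as in PPOSS):

```python
import random
from concurrent.futures import ProcessPoolExecutor

def mutate(x):
    """Flip each bit independently with probability 1/n."""
    n = len(x)
    return [b ^ (random.random() < 1 / n) for b in x]

def update(P, y, f):
    """POSS-style Pareto update of the population with offspring y."""
    if any(f(z) >= f(y) and sum(z) <= sum(y) for z in P):
        return P                       # y is weakly dominated: keep P unchanged
    return [z for z in P if not (f(y) >= f(z) and sum(y) <= sum(z))] + [y]

def pposs(f, n, B, T, lam):
    P = [[0] * n]
    with ProcessPoolExecutor() as ex:  # may need an `if __name__ == "__main__":` guard
        for _ in range(T):
            parents = [random.choice(P) for _ in range(lam)]
            for y in ex.map(mutate, parents):      # lam offspring in parallel
                if sum(y) < 2 * B:
                    P = update(P, y, f)            # sequential update
    return max((x for x in P if sum(x) <= B), key=f)
```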
Theoretical analysis
Theorem 1. For monotone set function maximization with size constraints, the expected number of iterations until PPOSS finds a solution x with |x| ≤ B and f(x) ≥ (1 − e^{−γ})·OPT (the same approximation bound as POSS) is:
(1) if λ = o(n), then E[T] ≤ 2eB²n/λ;
(2) if λ = Ω(n^i) for 1 ≤ i ≤ B, then E[T] = O(B²/i);
(3) if λ = Ω(n^{min{3B−1, n}}), then E[T] = O(1).
• When the number λ of processors is less than the number n of items, the number T of iterations decreases linearly w.r.t. the number of processors.
Theoretical analysis
Theorem 1. For monotone set function maximization with size constraints, the expected number of iterations until PPOSS finds a solution x with |x| ≤ B and f(x) ≥ (1 − e^{−γ})·OPT (the same approximation bound as POSS) is:
(1) if λ = o(n), then E[T] ≤ 2eB²n/λ;
(2) if λ = Ω(n^i) for 1 ≤ i ≤ B, then E[T] = O(B²/i);
(3) if λ = Ω(n^{min{3B−1, n}}), then E[T] = O(1).
• When the number λ of processors is less than the number n of items, the number T of iterations decreases linearly w.r.t. the number of processors.
• As the number λ of processors keeps increasing, the number T of iterations continues to decrease, eventually to a constant.
Experiments on sparse regression
Compare the speedup as well as the solution quality (measured by R² values) under different numbers of cores.
Experiments on sparse regression
PPOSS (blue line): achieves a speedup of around 8 with 10 cores; the R² values are stable, and better than greedy, the best previous algorithm [Das & Kempe, ICML'11].
PPOSS-asy (red line, the asynchronous version of PPOSS): achieves a better speedup (it avoids the synchronization cost); its R² values are slightly worse (due to the noise introduced by asynchrony).
Pareto optimization for subset selection: achieves superior performance on diverse variants of subset selection, both theoretically and empirically.
Parallel Pareto optimization for subset selection: achieves nearly linear runtime speedup while keeping the solution quality.
However, both require centralized access to the whole data set, which restricts their application to large-scale real-world problems.
Can we make the Pareto optimization method distributable?
Distributed Pareto optimization
POSS [Qian et al., NIPS'15]: a single machine runs POSS with access to the whole data set.
DPOSS [Qian et al., IJCAI'18]:
  Stage 1: split the data into parts (data 1, …, data m); each machine runs POSS on its own part.
  Stage 2: merge the machines' selected subsets, and run POSS on the merged result on a single machine.
[Diagram: Data → Split → POSS on each machine → Merge → POSS on a single machine.]
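A single-process Python sketch of the two-stage idea (mine: the per-machine runs are just a loop here; in a real deployment each part lives on its own machine and f must be computable from local data):

```python
import random

def poss_on(f, items, B, T):
    """Set-based POSS over `items`: minimize (-f(S), |S|)."""
    P = [frozenset()]
    for _ in range(T):
        S = set(random.choice(P))
        for v in items:                              # flip membership w.p. 1/|items|
            if random.random() < 1 / len(items):
                S ^= {v}
        S = frozenset(S)
        if len(S) >= 2 * B:
            continue
        if any(f(Z) >= f(S) and len(Z) <= len(S) for Z in P):
            continue
        P = [Z for Z in P if not (f(S) >= f(Z) and len(S) <= len(Z))] + [S]
    return max((S for S in P if len(S) <= B), key=f)

def dposs(f, items, B, T, num_machines):
    # Stage 1: split the data; each machine runs POSS on its own part.
    parts = [items[i::num_machines] for i in range(num_machines)]
    merged = set()
    for part in parts:                               # conceptually in parallel
        merged |= poss_on(f, part, B, T)
    # Stage 2: merge the partial solutions and run POSS once more on them.
    return poss_on(f, sorted(merged), B, T)

sets = [{1, 2, 3}, {3, 4}, {4, 5, 6}, {1, 6}, {2, 5}, {1, 4}]
f = lambda S: len(set().union(*[sets[i] for i in S]))
print(dposs(f, items=list(range(len(sets))), B=2, T=500, num_machines=2))
```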
Experiments on sparse regression
Compare DPOSS with the state-of-the-art distributed greedy algorithm RandGreeDi [Mirzasoleiman et al., JMLR'16] under different numbers of machines.
On regular-scale data sets: DPOSS is always better than RandGreeDi.
Experiments on sparse regression
On regular-scale data sets: DPOSS is very close to the centralized POSS.
On large-scale data sets: DPOSS is better than RandGreeDi.
Experiments on maximum coverage
[Figures: the corresponding comparisons on regular-scale and on large-scale data sets.]
Pareto optimization for subset selection: achieves superior performance on diverse variants of subset selection, both theoretically and empirically.
Parallel Pareto optimization for subset selection: achieves nearly linear runtime speedup while keeping the solution quality.
Distributed Pareto optimization for subset selection: achieves performance very close to the centralized algorithm.
Previous analyses often assume that the exact value of the objective function can be accessed. However, in many applications of subset selection, only a noisy value of the objective function can be obtained.
Influence maximization: the objective function is the expected number of nodes activated by propagating from S. In practice it is estimated by the average number of active nodes over a limited number of independent diffusion processes [Kempe et al., KDD'03], which yields only a noisy value.
Previous analyses often assume that the exact value of the objective function can be accessed. However, in many applications of subset selection, only a noisy value of the objective function can be obtained.
Sparse regression: the objective function involves the mean squared error of prediction by S. Estimating it on a sample of the data yields only a noisy value.
How about the performance of these methods for noisy subset selection?
Outline
Introduction
Pareto optimization for subset selection
Pareto optimization for large-scale subset selection
Pareto optimization for noisy subset selection
Conclusion
Noisy subset selection
Subset selection: given all items V = {v_1, …, v_n}, an objective function f: 2^V → ℝ, and a budget B, find a subset S ⊆ V such that
    max_{S ⊆ V} f(S)   s.t.   |S| ≤ B.
Noise: only a noisy objective value F(S) can be observed.
    Multiplicative: (1 − ε)·f(S) ≤ F(S) ≤ (1 + ε)·f(S)
    Additive: f(S) − ε ≤ F(S) ≤ f(S) + ε
Applications: influence maximization, sparse regression, crowdsourced image collection summarization [Singla et al., AAAI'16], maximizing information gain in graphical models [Chen et al., COLT'15].
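A tiny Python sketch (the wrapper names are mine) of the two noise models on top of a ground-truth objective f:

```python
import random

def multiplicative_noise(f, eps):
    """F(S) drawn uniformly from [(1 - eps) f(S), (1 + eps) f(S)]."""
    return lambda S: f(S) * (1 + random.uniform(-eps, eps))

def additive_noise(f, eps):
    """F(S) drawn uniformly from [f(S) - eps, f(S) + eps]."""
    return lambda S: f(S) + random.uniform(-eps, eps)

f = lambda S: len(S)                   # toy ground-truth objective
F = multiplicative_noise(f, eps=0.1)   # the algorithm may only query F
print(f({1, 2}), F({1, 2}))            # 2 vs. a value in [1.8, 2.2]
```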
Theoretical analysis for greedy algorithms
The noiseless approximation guarantee [Das & Kempe, ICML'11], with γ the submodularity ratio:
    f(S) ≥ (1 − (1 − γ/B)^B)·OPT ≥ (1 − e^{−γ})·OPT    (a constant approximation ratio)
Multiplicative noise:
    f(S) ≥ [1 / (1 + 2εB/((1 − ε)γ))] · [1 − ((1 − ε)/(1 + ε))^B (1 − γ/B)^B] · OPT
    (ε ≤ 1/B is needed for a constant approximation ratio)
Additive noise:
    f(S) ≥ (1 − (1 − γ/B)^B)·OPT − (2B/γ)(1 − e^{−γ})·ε
The performance largely degrades in noisy environments.
Theoretical analysis for POSS
• POSS can generally achieve the same approximation guarantees as the greedy algorithm under both multiplicative and additive noise.
• POSS has a better ability to avoid the misleading search directions caused by noise.
Maximum coverage examples:
• Greedy: a very bad approximation [Hassidim & Singer, COLT'17]
• POSS: finds the optimal solution through multi-bit search
• POSS: finds the optimal solution through backward search
PONSS
In our previous work, threshold selection (accept a new solution x over y only if f(x) ≥ f(y) + θ, rather than merely f(x) ≥ f(y)) was theoretically shown to be tolerant to noise and to exponentially decrease the running time [Qian et al., ECJ'18].
POSS:  x is "better" than y  ⟺  f(x) ≥ f(y)  ∧  |x| ≤ |y|
PONSS [Qian et al., NIPS'17] builds this threshold into the comparison:
    Multiplicative:  x is "better" than y  ⟺  f(x) ≥ [(1 + θ)/(1 − θ)]·f(y)  ∧  |x| ≤ |y|
    Additive:        x is "better" than y  ⟺  f(x) ≥ f(y) + 2θ  ∧  |x| ≤ |y|
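The PONSS comparison as a Python sketch (after the definitions above; Fx and Fy are noisy objective values, and theta is the threshold parameter, assumed to upper-bound the noise level):

```python
def better_multiplicative(Fx, size_x, Fy, size_y, theta):
    """x replaces y only if its noisy value exceeds the largest value
    the noise alone could fake: F(x) >= (1 + theta) / (1 - theta) * F(y)."""
    return Fx >= (1 + theta) / (1 - theta) * Fy and size_x <= size_y

def better_additive(Fx, size_x, Fy, size_y, theta):
    """Additive counterpart: a margin of 2 * theta."""
    return Fx >= Fy + 2 * theta and size_x <= size_y
```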
Theoretical analysis
Multiplicative noise:
    PONSS:         f(S) ≥ [(1 − ε)/(1 + ε)] · (1 − (1 − γ/B)^B) · OPT
    POSS & greedy: f(S) ≥ [1 / (1 + 2εB/((1 − ε)γ))] · [1 − ((1 − ε)/(1 + ε))^B (1 − γ/B)^B] · OPT
    For γ = 1 (submodular f) and a constant ε, PONSS has a constant approximation ratio, while POSS and greedy only have a Θ(1/B) approximation ratio: PONSS is significantly better.
Additive noise:
    PONSS:         f(S) ≥ (1 − (1 − γ/B)^B)·OPT − 2ε
    POSS & greedy: f(S) ≥ (1 − (1 − γ/B)^B)·OPT − (2B/γ)(1 − e^{−γ})·ε
    PONSS is significantly better.
Experimental results - influence maximization
PONSS (red line) vs. POSS (blue line) vs. greedy (black line):
• Noisy evaluation during the run: the average of 10 independent Monte Carlo simulations.
• Evaluation of the output solution: the average of 10,000 independent Monte Carlo simulations.
Experimental results - sparse regression
PONSS (red line) vs. POSS (blue line) vs. greedy (black line):
• Noisy evaluation during the run: a random sample of 1,000 instances.
• Evaluation of the output solution: the whole data set.
Conclusion
โข Pareto optimization for subset selection
โข Pareto optimization for large-scale subset selection
โข Pareto optimization for noisy subset selection
Future work
• Problem issues
  - Non-monotone objective functions
  - Continuous submodular objective functions
  - Multiple objective functions
• Algorithm issues
  - More complicated MOEAs
• Theory issues
  - Beating the best known approximation guarantees
• Application issues
  - Attempts on more large-scale real-world applications
Collaborators:
USTC: Ke Tang, Guiying Li, Chao Feng
Nanjing University: Yang Yu, Jing-Cheng Shi, Zhi-Hua Zhou
SUSTech: Xin Yao
For details
• C. Qian, Y. Yu, and Z.-H. Zhou. Subset selection by Pareto optimization. In: Advances in Neural Information Processing Systems 28 (NIPS'15), Montreal, Canada, 2015.
• C. Qian, J.-C. Shi, Y. Yu, K. Tang, and Z.-H. Zhou. Parallel Pareto optimization for subset selection. In: Proceedings of the 25th International Joint Conference on Artificial Intelligence (IJCAI'16), New York, NY, 2016.
• C. Qian, J.-C. Shi, Y. Yu, and K. Tang. On subset selection with general cost constraints. In: Proceedings of the 26th International Joint Conference on Artificial Intelligence (IJCAI'17), Melbourne, Australia, 2017.
• C. Qian, J.-C. Shi, Y. Yu, K. Tang, and Z.-H. Zhou. Optimizing ratio of monotone set functions. In: Proceedings of the 26th International Joint Conference on Artificial Intelligence (IJCAI'17), Melbourne, Australia, 2017.
• C. Qian, J.-C. Shi, K. Tang, and Z.-H. Zhou. Constrained monotone k-submodular function maximization using multi-objective evolutionary algorithms with theoretical guarantee. IEEE Transactions on Evolutionary Computation, in press.
For details
• C. Qian, J.-C. Shi, Y. Yu, K. Tang, and Z.-H. Zhou. Subset selection under noise. In: Advances in Neural Information Processing Systems 30 (NIPS'17), Long Beach, CA, 2017.
• C. Qian, Y. Zhang, K. Tang, and X. Yao. On multiset selection with size constraints. In: Proceedings of the 32nd AAAI Conference on Artificial Intelligence (AAAI'18), New Orleans, LA, 2018.
• C. Qian, G. Li, C. Feng, and K. Tang. Distributed Pareto optimization for subset selection. In: Proceedings of the 27th International Joint Conference on Artificial Intelligence (IJCAI'18), Stockholm, Sweden, 2018.
• C. Qian, C. Feng, and K. Tang. Sequence selection by Pareto optimization. In: Proceedings of the 27th International Joint Conference on Artificial Intelligence (IJCAI'18), Stockholm, Sweden, 2018.
• C. Qian, Y. Yu, and Z.-H. Zhou. Analyzing evolutionary optimization in noisy environments. Evolutionary Computation, 2018, 26(1): 1-41.
Codes available at http://staff.ustc.edu.cn/~chaoqian/
THANK YOU!