
Subset Selection by Pareto Optimization:

Theories and Practical Algorithms

Chao Qian

UBRI, School of Computer Science and Technology, University of Science and Technology of China

http://staff.ustc.edu.cn/~chaoqian/

Email: chaoqian@ustc.edu.cn

Outline

Introduction

Pareto optimization for subset selection

Pareto optimization for large-scale subset selection

Pareto optimization for noisy subset selection

Conclusion

Maximum coverage

Maximum coverage [Feige, JACM'98]: select at most B sets from n given sets to make the union maximal

Formally stated: given a ground set U, a collection V = {S_1, …, S_n} of subsets of U and a budget B, it is to find a subset X ⊆ V such that

$\max_{X \subseteq V} f(X) = \bigl|\bigcup_{S_i \in X} S_i\bigr| \quad \text{s.t.} \quad |X| \le B.$

[Figure: an illustrative instance with sets S_1, …, S_i, …, S_l and S_{l+1}, S_{l+2}, …, S_{2l}]
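To make the objective concrete, here is a minimal Python sketch of the coverage function f(X) = |⋃_{S_i∈X} S_i| on toy data (the sets below are illustrative, not from the slides):

```python
def coverage(chosen_sets):
    """Size of the union of the chosen subsets."""
    covered = set()
    for s in chosen_sets:
        covered |= s
    return len(covered)

V = [{1, 2, 3}, {3, 4}, {4, 5, 6}, {1, 6}]   # the collection of subsets of the ground set U
X = [V[0], V[2]]                             # a candidate selection with |X| <= B = 2
print(coverage(X))                           # -> 6, i.e., {1, ..., 6} is covered
```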

Sparse regression

Sparse regression [Tropp, TIT'04]: find a sparse approximation solution to the linear regression problem

Formally stated: given all observation variables V = {v_1, …, v_n}, a predictor variable z and a budget B, it is to find a subset X ⊆ V such that

$\max_{X \subseteq V} R^2_{z,X} = \dfrac{\operatorname{Var}(z) - \operatorname{MSE}_{z,X}}{\operatorname{Var}(z)} \quad \text{s.t.} \quad |X| \le B.$
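As an illustration, a small Python sketch of computing R²_{z,X} for a selected subset of columns by least squares (toy data; the variable names are assumptions for the example, and the data is assumed centered):

```python
import numpy as np

def r_squared(z, features):
    """R^2_{z,X} = (Var(z) - MSE_{z,X}) / Var(z) for the selected columns (assumes centered data)."""
    if features.shape[1] == 0:
        return 0.0
    w, *_ = np.linalg.lstsq(features, z, rcond=None)
    mse = np.mean((z - features @ w) ** 2)
    return (np.var(z) - mse) / np.var(z)

rng = np.random.default_rng(0)
A = rng.normal(size=(100, 5))                      # all observation variables v_1, ..., v_5
z = A[:, 0] + 0.5 * A[:, 2] + 0.1 * rng.normal(size=100)
print(r_squared(z, A[:, [0, 2]]))                  # close to 1 for the informative subset
```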

Influence maximization

Influence maximization [Kempe et al., KDD'03]: select a subset of users from a social network to maximize its influence spread

Formally stated: given a directed graph G = (V, E) with V = {v_1, …, v_n}, edge probabilities p_{u,v} ((u, v) ∈ E) and a budget B, it is to find a subset X ⊆ V such that

$\max_{X \subseteq V} f(X) = \sum_{i=1}^{n} p(X \to v_i) \quad \text{s.t.} \quad |X| \le B.$

Influential users
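Since p(X → v_i) has no closed form in general, the spread is typically estimated by simulation; here is a minimal Python sketch under the independent cascade model (the toy graph and probabilities are assumptions for the example):

```python
import random

def spread(graph, seeds, trials=1000):
    """Monte Carlo estimate of the influence spread of a seed set under independent cascade."""
    total = 0
    for _ in range(trials):
        active, frontier = set(seeds), list(seeds)
        while frontier:
            u = frontier.pop()
            for v, p in graph.get(u, []):        # each edge (u, v) fires with probability p
                if v not in active and random.random() < p:
                    active.add(v)
                    frontier.append(v)
        total += len(active)
    return total / trials                        # average number of activated nodes

g = {0: [(1, 0.5), (2, 0.3)], 1: [(3, 0.4)], 2: [(3, 0.2)]}
print(spread(g, {0}))
```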

Document summarization

Document summarization [Lin & Bilmes, ACL'11]: select a few sentences to best summarize the documents

Sensor placement

Sensor placement [Krause & Guestrin, IJCAI'09 Tutorial]: select a few places to install sensors such that the information gathered is maximized

Water contamination detection Fire detection

Subset selection

Subset selection is to select a subset of size at most B from a total set of n items to optimize some objective function

Formally stated: given all items V = {v_1, …, v_n}, an objective function f: 2^V → ℝ and a budget B, it is to find a subset X ⊆ V such that

$\max_{X \subseteq V} f(X) \quad \text{s.t.} \quad |X| \le B.$

Application              | v_i                          | f
maximum coverage         | a set of elements            | size of the union
sparse regression        | an observation variable      | MSE of prediction
influence maximization   | a social network user        | influence spread
document summarization   | a sentence                   | summary quality
sensor placement         | a place to install a sensor  | entropy

Many applications, but NP-hard in general!

Subset selection - submodular

Subset selection: given all items V = {v_1, …, v_n}, an objective function f: 2^V → ℝ and a budget B, it is to find a subset X ⊆ V such that

$\max_{X \subseteq V} f(X) \quad \text{s.t.} \quad |X| \le B.$

Monotone: for any X ⊆ Y ⊆ V, f(X) ≤ f(Y)

Submodular [Nemhauser et al., MP'78]: satisfies the natural diminishing-returns property, i.e., for any X ⊆ Y ⊆ V, v ∉ Y,

$f(X \cup \{v\}) - f(X) \ \ge\ f(Y \cup \{v\}) - f(Y)$

Submodularity ratio [Zhang & Vorobeychik, AAAI'16]:

$\gamma_f = \min_{X \subseteq Y,\; v \notin Y} \dfrac{f(X \cup \{v\}) - f(X)}{f(Y \cup \{v\}) - f(Y)}$

The optimal approximation guarantee [Nemhauser & Wolsey, MOR'78]: $1 - 1/e \approx 0.632$, by the greedy algorithm

Discrete analogue of convexity!
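For intuition, here is a brute-force Python sketch of the submodularity ratio as defined above; it enumerates all pairs X ⊆ Y, so it is only meant for tiny ground sets (the coverage function below is an illustrative example):

```python
from itertools import combinations

def submodularity_ratio(f, V):
    """gamma_f = min over X <= Y, v not in Y of (f(X+v) - f(X)) / (f(Y+v) - f(Y))."""
    subsets = [frozenset(c) for r in range(len(V) + 1) for c in combinations(V, r)]
    ratio = float("inf")
    for Y in subsets:
        for X in subsets:
            if not X <= Y:
                continue
            for v in V:
                if v in Y:
                    continue
                gain_Y = f(Y | {v}) - f(Y)
                if gain_Y > 0:                       # skip zero denominators in this sketch
                    ratio = min(ratio, (f(X | {v}) - f(X)) / gain_Y)
    return ratio

cover = lambda X: len(set().union(*X)) if X else 0   # a coverage function (submodular)
sets = [frozenset({1, 2}), frozenset({2, 3}), frozenset({3})]
print(submodularity_ratio(cover, sets))              # 1.0: coverage is submodular
```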

Variants of subset selection

• Monotone set function maximization with size constraints: $\max_{X \subseteq V} f(X)\ \text{s.t.}\ |X| \le B$
  greedy approximation guarantee: $1 - 1/e^{\gamma}$ [Das & Kempe, ICML'11]

• Monotone set function maximization with general constraints ($|X| \le B \;\to\; c(X) \le B$)
  greedy approximation guarantee: $(\gamma/2)(1 - 1/e^{\gamma})$ [Zhang & Vorobeychik, AAAI'16]

• Monotone multiset function maximization with size constraints (X: a subset → a multiset)
  greedy approximation guarantee: $(1 - 1/e)/2$ [Soma et al., ICML'14]

• Monotone k-submodular function maximization with size constraints (X: a subset → k subsets)
  greedy approximation guarantee: $1/2$ [Ohsaka & Yoshida, NIPS'15]

• Monotone sequence function maximization with size constraints (X: a subset → a sequence)
  greedy approximation guarantee: $1 - e^{-1/(2\Delta)}$ [Tschiatschek et al., AAAI'17]

• Ratio optimization of monotone functions: $\min_{X \subseteq V} f(X)/g(X)$
  greedy approximation guarantee: $\dfrac{|X^*|}{(1 + (|X^*| - 1)(1 - \kappa))\gamma}$ [Bai et al., ICML'16]

Previous approaches

• Greedy algorithms

Process: iteratively select the item that currently optimizes some criterion

Iteration 1: from V = {v_1, v_2, …, v_n}, select v* and let X_1 = {v*}

Iteration j: select

$v^* = \arg\max_{v \in V \setminus X_{j-1}} f(X_{j-1} \cup \{v\}) - f(X_{j-1})$

from $V \setminus X_{j-1}$ and let $X_j = X_{j-1} \cup \{v^*\}$

๐‘š๐‘Ž๐‘ฅ๐‘‹โŠ†๐‘‰ ๐‘“ ๐‘‹ ๐‘ . ๐‘ก. ๐‘‹ โ‰ค ๐ต

Previous approaches

• Greedy algorithms

Process: the same iterative selection, adapted to a general cost constraint

For $\max_{X \subseteq V} f(X)\ \text{s.t.}\ c(X) \le B$, the selection rule becomes

$v^* = \arg\max_{v \in V \setminus X_{j-1}} \dfrac{f(X_{j-1} \cup \{v\}) - f(X_{j-1})}{c(X_{j-1} \cup \{v\}) - c(X_{j-1})}$

Previous approaches

• Greedy algorithms

Process: the same iterative selection, adapted to ratio optimization

For $\min_{X \subseteq V} f(X)/g(X)$, the selection rule becomes

$v^* = \arg\min_{v \in V \setminus X_{j-1}} \dfrac{f(X_{j-1} \cup \{v\}) - f(X_{j-1})}{g(X_{j-1} \cup \{v\}) - g(X_{j-1})}$

Previous approaches

• Greedy algorithms

Process: the same iterative selection, adapted to k-submodular maximization

For $\max_{X_1, \dots, X_k \subseteq V} f(X_1, \dots, X_k)\ \text{s.t.}\ |\bigcup_{1 \le i \le k} X_i| \le B$, each iteration selects

$(v^*, i^*) = \arg\max_{v \in V \setminus \bigcup_i X_i,\; i \in \{1, \dots, k\}} f(X_1, \dots, X_i \cup \{v\}, \dots, X_k) - f(X_1, \dots, X_i, \dots, X_k)$

and sets $X_{i^*} = X_{i^*} \cup \{v^*\}$

Previous approaches

• Greedy algorithms

Process: iteratively select the item that currently optimizes some criterion

Weakness: may get stuck in local optima due to the greedy behavior


Previous approaches (cont'd)

• Relaxation methods

Process: relax the original problem, then find the optimal solution to the relaxed problem

Weakness: the optimal solution of the relaxed problem may be far from the true optimum of the original problem

๐‘š๐‘Ž๐‘ฅ๐‘‹โŠ†๐‘‰ ๐‘“ ๐‘‹ ๐‘ . ๐‘ก. ๐‘‹ โ‰ค ๐ต

๐‘š๐‘Ž๐‘ฅ๐‘คโˆŠ๐‘…๐‘› ๐‘” ๐‘ค ๐‘ . ๐‘ก. |๐‘ค|0 โ‰ค ๐ต

๐‘š๐‘Ž๐‘ฅ๐‘คโˆŠ๐‘…๐‘› ๐‘” ๐‘ค ๐‘ . ๐‘ก. |๐‘ค|1 โ‰ค ๐ต

non-convex

convex

Motivation

Subset selection (a subset X ⊆ V represented by x ∈ {0,1}^n):

$\max_{x \in \{0,1\}^n} f(x)\ \text{s.t.}\ |x| \le B$

Two conflicting objectives:

1. Optimize the criterion:  $\max_{x \in \{0,1\}^n} f(x)$

2. Keep the size small:  $\min_{x \in \{0,1\}^n} \max\{|x| - B,\, 0\}$

Why not directly optimize the bi-objective formulation?

$\min_{x \in \{0,1\}^n} \bigl(-f(x),\ |x|\bigr)$

Outline

Introduction

Pareto optimization for subset selection

Pareto optimization for large-scale subset selection

Pareto optimization for noisy subset selection

Conclusion

Pareto optimization

The basic idea:

๐‘š๐‘Ž๐‘ฅ๐‘ฅโˆŠ{0,1}๐‘› ๐‘“ ๐‘ฅ ๐‘ . ๐‘ก. ๐‘ฅ โ‰ค ๐ต

๐‘š๐‘–๐‘›๐‘ฅ (๐‘“1 ๐‘ฅ , ๐‘“2 ๐‘ฅ )

bi-objective optimization

z

x

y

๐‘“1

๐‘“2

better ๐‘“1better ๐‘“2

worse ๐‘“1better ๐‘“2

x dominates z :

๐‘“1 ๐‘ฅ < ๐‘“1 ๐‘ง โ‹€ ๐‘“2 ๐‘ฅ < ๐‘“2 ๐‘ง

x is incomparable with y :

๐‘“1 ๐‘ฅ > ๐‘“1 ๐‘ฆ โ‹€ ๐‘“2 ๐‘ฅ < ๐‘“2 ๐‘ฆ

๐‘š๐‘Ž๐‘ฅ๐‘ฅโˆŠ{0,1}๐‘› ๐‘“ ๐‘ฅ ๐‘ . ๐‘ก. ๐‘(๐‘ฅ) โ‰ค ๐ต

๐‘š๐‘–๐‘›๐‘ฅโˆŠ{0,1}๐‘› ๐‘“ ๐‘ฅ /๐‘”(๐‘ฅ)

๐‘š๐‘Ž๐‘ฅ๐‘ฅโˆŠ{0,1,โ€ฆ , ๐‘˜}๐‘› ๐‘“ ๐‘ฅ ๐‘ . ๐‘ก. ๐‘ฅ โ‰ค ๐ต

Pareto optimization

The basic idea:

๐‘š๐‘Ž๐‘ฅ๐‘ฅโˆŠ{0,1}๐‘› ๐‘“ ๐‘ฅ ๐‘ . ๐‘ก. ๐‘ฅ โ‰ค ๐ต

๐‘š๐‘–๐‘›๐‘ฅ (๐‘“1 ๐‘ฅ , ๐‘“2 ๐‘ฅ )

bi-objective optimization

๐‘š๐‘–๐‘›๐‘ฅโˆŠ{0,1}๐‘› ๐‘“ ๐‘ฅ /๐‘”(๐‘ฅ)

population new solutions

reproduction

evaluation & updatinginitialization

A simple multi-objective evolutionary algorithm [Laumanns et al., TEvC'04]

Initialization: put a random or special solution into the population P

Reproduction: pick a solution randomly from P, and randomly change it (e.g., flip each bit of x ∈ {0,1}^n with prob. 1/n)

Evaluation & Updating: if the new solution is not dominated, put it into P and weed out bad solutions

Output: select the best solution w.r.t. the original problem
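A small Python sketch of the dominance relation and the "evaluation & updating" step described above, for a pair of objectives to be minimized (a hedged illustration, not the authors' implementation):

```python
def weakly_dominates(a, b):
    """a is at least as good as b in both (minimized) objectives."""
    return a[0] <= b[0] and a[1] <= b[1]

def update_population(P, new, obj):
    """Keep the population mutually non-dominated after trying to insert `new`."""
    o_new = obj(new)
    if any(weakly_dominates(obj(x), o_new) and obj(x) != o_new for x in P):
        return P                                        # the new solution is dominated: discard it
    return [x for x in P if not weakly_dominates(o_new, obj(x))] + [new]

# toy usage with the bi-objective view (-f(x), |x|), here with f(x) = number of ones
obj = lambda x: (-sum(x), sum(x))
P = [[0, 0, 0, 0]]
P = update_population(P, [1, 0, 0, 0], obj)             # incomparable: both solutions are kept
print([obj(x) for x in P])                              # [(0, 0), (-1, 1)]
```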

How to transform the other formulations, e.g., $\max_{x \in \{0,1\}^n} f(x)\ \text{s.t.}\ c(x) \le B$ and $\max_{x \in \{0,1,\dots,k\}^n} f(x)\ \text{s.t.}\ |x| \le B$?

Pareto optimization vs Greedy algorithms

Greedy algorithms:

• Produce a new solution by adding a single item (single-bit forward search: 0 → 1)

• Maintain only one solution

Pareto optimization:

• Produce a new solution by flipping each bit of a solution with prob. 1/n (single-bit forward search, backward search, multi-bit search)

• Maintain several non-dominated solutions due to bi-objective optimization

Pareto optimization may have a better ability to avoid local optima!

Monotone set function maximization with size constraints

The POSS approach [Qian, Yu and Zhou, NIPS'15]

Transformation:

Initialization: put the special solution 0^n (the empty subset) into the population P

Reproduction: pick a solution x randomly from P, and flip each bit of x with prob. 1/n to produce a new solution

Evaluation & Updating: if the new solution is not dominated, put it into P and weed out bad solutions

Output: select the best feasible solution

๐‘š๐‘Ž๐‘ฅ๐‘ฅโˆŠ{0,1}๐‘› ๐‘“ ๐‘ฅ ๐‘ . ๐‘ก. ๐‘ฅ โ‰ค ๐ต original

๐‘š๐‘–๐‘›๐‘ฅโˆŠ{0,1}๐‘› (โˆ’๐‘“ ๐‘ฅ , |๐‘ฅ|) bi-objective

Theorem 1. For monotone set function maximization with cardinality constraints, POSS using $E[T] \le 2eB^2 n$ finds a solution x with $|x| \le B$ and $f(x) \ge (1 - e^{-\gamma}) \cdot OPT$.

Theoretical analysis

POSS can achieve the same general approximation guarantee as the greedy algorithm

($E[T]$: the expected number of iterations; $1 - e^{-\gamma}$: the best known polynomial-time approximation ratio, previously obtained by the greedy algorithm [Das & Kempe, ICML'11])

Proof

Lemma 1. For any X ⊆ V, there exists one item v ∈ V ∖ X such that

$f(X \cup \{v\}) - f(X) \ \ge\ \frac{\gamma}{B}\bigl(OPT - f(X)\bigr)$

(OPT: the optimal function value; γ: the submodularity ratio [Das & Kempe, ICML'11])

Roughly speaking, the improvement obtained by adding a specific item is proportional to the current distance to the optimum

Proof

Main idea:

• consider a solution x with $|x| \le i$ and $f(x) \ge \bigl(1 - (1 - \frac{\gamma}{B})^{i}\bigr) \cdot OPT$

• the initial solution 00…0 satisfies this with i = 0; once i reaches B,

$1 - \Bigl(1 - \frac{\gamma}{B}\Bigr)^{B} \ \ge\ 1 - e^{-\gamma},$

the desired approximation bound

Lemma 1. For any X ⊆ V, there exists one item v ∈ V ∖ X such that $f(X \cup \{v\}) - f(X) \ge \frac{\gamma}{B}\bigl(OPT - f(X)\bigr)$
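A worked form of the induction step implied above (a sketch that just combines Lemma 1 with the assumed bound on f(x)): applying Lemma 1 to x and flipping the corresponding 0-bit yields x' with $|x'| \le i + 1$ and

$f(x') \ \ge\ f(x) + \frac{\gamma}{B}\bigl(OPT - f(x)\bigr) \ =\ \Bigl(1 - \frac{\gamma}{B}\Bigr) f(x) + \frac{\gamma}{B}\, OPT \ \ge\ \Bigl(1 - \bigl(1 - \tfrac{\gamma}{B}\bigr)^{i+1}\Bigr) \cdot OPT,$

so the index i can be advanced from 0 to B step by step.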

Proof

Main idea (continued):

• consider a solution x with $|x| \le i$ and $f(x) \ge \bigl(1 - (1 - \frac{\gamma}{B})^{i}\bigr) \cdot OPT$

• in each iteration of POSS:

  select this x from the population P: probability $1/|P|$

  flip one specific 0-bit of x to 1 and keep the rest: probability $\frac{1}{n}\bigl(1 - \frac{1}{n}\bigr)^{n-1} \ge \frac{1}{en}$

  by Lemma 1, the result x' satisfies $|x'| = |x| + 1 \le i + 1$ and $f(x') \ge \bigl(1 - (1 - \frac{\gamma}{B})^{i+1}\bigr) \cdot OPT$

  so i → i + 1 with probability at least $\frac{1}{|P|} \cdot \frac{1}{en}$

Proof

๐‘– ๐‘– + 1 the probability: 1

๐‘ƒโˆ™1

๐‘’๐‘›

1

2๐‘’๐ต๐‘›

๐‘ƒ โ‰ค 2๐ต

๐‘– ๐‘– + 1 the expected number of iterations: 2๐‘’๐ต๐‘›

๐‘– = 0 ๐ต the expected number of iterations: ๐ต โˆ™ 2๐‘’๐ต๐‘›

{0,1}๐‘›

๐‘“(๐‘‹ โˆช { ๐‘ฃ}) โˆ’ ๐‘“(๐‘‹) โ‰ฅ๐›พ

๐ต(๐‘‚๐‘ƒ๐‘‡ โˆ’ ๐‘“(๐‘‹))

Lemma 1. For any ๐‘‹ โŠ† ๐‘‰, there exists one item ๐‘ฃ โˆˆ ๐‘‰ โˆ– ๐‘‹ such that

Main idea:

โ€ข consider a solution ๐‘ฅ with |๐‘ฅ| โ‰ค ๐‘– and ๐‘“(๐‘ฅ) โ‰ฅ 1 โˆ’ 1 โˆ’๐›พ

๐ต

๐‘–โˆ™ ๐‘‚๐‘ƒ๐‘‡

โ€ข in each iteration of POSS:

Theoretical analysis

Theorem 2. For the Exponential Decay subclass of sparse regression, POSS using $E[T] = O(B^2 (n - B) n \log n)$ finds an optimal solution, while the greedy algorithm cannot.

POSS can do better than the greedy algorithm in some cases [Das & Kempe, STOC'08]

Theorem 1. For monotone set function maximization with cardinality constraints, POSS using $E[T] \le 2eB^2 n$ finds a solution x with $|x| \le B$ and $f(x) \ge (1 - e^{-\gamma}) \cdot OPT$.

POSS can achieve the same general approximation guarantee as the greedy algorithm

($1 - e^{-\gamma}$: the best known polynomial-time approximation ratio, previously obtained by the greedy algorithm [Das & Kempe, ICML'11])

Sparse regression

Sparse regression [Tropp, TIT'04]: find a sparse approximation solution to the linear regression problem

Formally stated: given all observation variables V = {v_1, …, v_n}, a predictor variable z and a budget B, it is to find a subset X ⊆ V such that

$\max_{X \subseteq V} R^2_{z,X} = \dfrac{\operatorname{Var}(z) - \operatorname{MSE}_{z,X}}{\operatorname{Var}(z)} \quad \text{s.t.} \quad |X| \le B.$

Experimental results - R² values

[Table: R² values of POSS compared with greedy algorithms, relaxation methods, and exhaustive search (OPT) on several data sets]

POSS is significantly better than all the compared methods on all data sets

the size constraint: B = 8;  the number of iterations of POSS: $2eB^2 n$

Experimental results - R² values

POSS tightly follows OPT, and has a clear advantage over the other methods

different size constraints: B = 3 → 8

Experimental results - running time

POSS can be much more efficient in practice than its theoretical running time suggests

theoretical running time: OPT (exhaustive search) $n^B / B^B$;  greedy method (FR) $Bn$;  POSS $2eB^2 n$

Monotone set function maximization with general constraints

The POMC approach [Qian, Shi, Yu and Tang, IJCAI'17]

Transformation:

๐‘š๐‘Ž๐‘ฅ๐‘ฅโˆŠ{0,1}๐‘› ๐‘“ ๐‘ฅ ๐‘ . ๐‘ก. ๐‘(๐‘ฅ) โ‰ค ๐ต original

๐‘š๐‘–๐‘›๐‘ฅโˆŠ{0,1}๐‘› (โˆ’๐‘“ ๐‘ฅ , ๐‘(๐‘ฅ)) bi-objective

Theory: POMC can achieve the same approximation guarantee (๐›พ/2)(1 โˆ’ ๐‘’โˆ’๐›พ) as the greedy algorithm [Zhang & Vorobeychik, AAAIโ€™16]

Application:

influence maximization

Monotone multiset function maximization with size constraints

The POMS approach [Qian, Zhang, Tang and Yao, AAAI'18]

Transformation:

๐‘š๐‘Ž๐‘ฅ๐‘ฅโˆŠZ+๐‘› ๐‘“ ๐‘ฅ ๐‘ . ๐‘ก. |๐‘ฅ| โ‰ค ๐ต original

๐‘š๐‘–๐‘›๐‘ฅโˆŠZ+๐‘› (โˆ’๐‘“ ๐‘ฅ , |๐‘ฅ|) bi-objective

Theory: POMS can achieve the same approximation guarantee (1 โˆ’ 1/๐‘’)/2 as the greedy algorithm [Soma et al., ICMLโ€™14]

Application:

generalized influence maximization

Monotone k-submodular function maximization with size constraints

The MOMS approach [Qian, Shi, Tang and Zhou, TEvC in press]

Transformation:

๐‘š๐‘Ž๐‘ฅ๐‘ฅโˆŠ{0,1,โ€ฆ,๐‘˜}๐‘› ๐‘“ ๐‘ฅ ๐‘ . ๐‘ก. ๐‘ฅ โ‰ค ๐ต original

๐‘š๐‘–๐‘›๐‘ฅโˆŠ{0,1,โ€ฆ,๐‘˜}๐‘› (โˆ’๐‘“ ๐‘ฅ , |๐‘ฅ|) bi-objective

Theory: MOMS can achieve the same approximation guarantee 1/2 as the greedy algorithm [Ohsaka & Yoshida, NIPSโ€™15]

Application:

sensor placement

Monotone sequence function maximization with size constraints

The POSeqSel approach [Qian, Feng and Tang, IJCAI'18]

Transformation:

๐‘š๐‘Ž๐‘ฅ๐‘ฅโˆŠ๐’ฎ ๐‘“ ๐‘ฅ ๐‘ . ๐‘ก. |๐‘ฅ| โ‰ค ๐ต original

๐‘š๐‘–๐‘›๐‘ฅโˆŠ๐’ฎ (โˆ’๐‘“ ๐‘ฅ , |๐‘ฅ|) bi-objective

Theory: POSeqSel can achieve the approximation guarantee 1 โˆ’ ๐‘’โˆ’1/2 better than the greedy algorithm [Tschiatschek et al., AAAIโ€™17]

Application:

movie recommendation

Ratio optimization of monotone functions

The PORM approach [Qian, Shi, Yu, Tang and Zhou, IJCAI'17]

Transformation:

๐‘š๐‘–๐‘›๐‘ฅโˆŠ{0,1}๐‘› ๐‘“ ๐‘ฅ /๐‘”(๐‘ฅ) original

๐‘š๐‘–๐‘›๐‘ฅโˆŠ{0,1}๐‘› (๐‘“ ๐‘ฅ , โˆ’๐‘”(๐‘ฅ)) bi-objective

Theory: PORM can achieve the same approximation guarantee |๐‘‹โˆ—|

(1+( ๐‘‹โˆ— โˆ’1)(1โˆ’๐œ…))๐›พas the greedy algorithm [Bai et al., ICMLโ€™16]

Application:

F-measure maximization in information retrieval

Can we make the Pareto optimization method parallelizable?

Pareto optimization for subset selection

achieve superior performance on diverse variants of subset selection both theoretically and empirically

The running time (e.g., $2eB^2 n$) for achieving a good solution is unsatisfactory when the problem size (e.g., B and n) is large

It is a sequential algorithm that cannot be readily parallelized

These restrict the application to large-scale real-world problems

Outline

Introduction

Pareto optimization for subset selection

Pareto optimization for large-scale subset selection

Pareto optimization for noisy subset selection

Conclusion

Pareto optimization

1. randomly generate a solution, and put it into the population P;
2. loop
   2.1 pick a solution randomly from P;
   2.2 randomly change it to make a new one;
   2.3 if the new one is not strictly worse
       2.3.1 put it into P;
       2.3.2 remove worse solutions from P;
3. when terminated, select the best feasible solution from P.

[Figure: the Pareto optimization loop - a random initial solution enters the population P; each iteration picks a solution, flips each bit with prob. 1/n, puts the new solution into P and removes worse solutions; when terminated, the best feasible solution is selected. Iterations 1, 2, … are executed one after another.]

Parallel Pareto optimization

[Figure: in parallel Pareto optimization, each iteration picks solutions and generates new solutions on many processors simultaneously, while the population P is maintained centrally (iterations 1, 2, …).]

Parallel Pareto optimization

POSS [Qian et al., NIPS'15]: in each iteration, one new solution is generated, and a specific beneficial flip occurs with probability about $\frac{1}{en}$; about T iterations are needed

PPOSS [Qian et al., IJCAI'16]: with N new solutions generated in parallel per iteration, the probability becomes $1 - \bigl(1 - \frac{1}{en}\bigr)^{N} \approx \frac{N}{en}$, so about T/N iterations suffice

Q: the same solution quality?
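A rough Python sketch of the parallel reproduction step (an illustration of the idea only: N offspring are generated in parallel, while the population update stays sequential; the helper names are assumptions, not the authors' code):

```python
import random
from concurrent.futures import ProcessPoolExecutor

def mutate(args):
    """Flip each bit of x independently with probability 1/n (runs inside a worker process)."""
    x, n = args
    random.seed()                                # re-seed so workers do not share a random stream
    return [bit ^ (random.random() < 1 / n) for bit in x]

def parallel_offspring(P, n, N, pool):
    """One PPOSS-style iteration: N parents picked at random, mutated in parallel."""
    parents = [random.choice(P) for _ in range(N)]
    return list(pool.map(mutate, [(x, n) for x in parents]))

if __name__ == "__main__":
    P = [[0] * 8]                                # population holding the all-zeros solution
    with ProcessPoolExecutor(max_workers=4) as pool:
        print(parallel_offspring(P, n=8, N=4, pool=pool))
```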

Theorem 1. For monotone set function maximization with size constraints, the expected number of iterations until PPOSS finds a solution x with $|x| \le B$ and $f(x) \ge (1 - e^{-\gamma}) \cdot OPT$ is

(1) if $N = o(n)$, then $E[T] \le 2eB^2 n / N$;

(2) if $N = \Omega(n^i)$ for $1 \le i \le B$, then $E[T] = O(B^2 / i)$;

(3) if $N = \Omega(n^{\min\{3B-1,\, n\}})$, then $E[T] = O(1)$.

Theoretical analysis

• When the number N of processors is less than the number n of items, the number T of iterations can be reduced linearly w.r.t. the number of processors

the same approximation bound


• With an increasing number N of processors, the number T of iterations can be continuously reduced, eventually to a constant

Experiments on sparse regression

Compare the speedup as well as the solution quality measured by R² values with different numbers of cores

Experiments on sparse regression

PPOSS (blue line): achieves a speedup of around 8 with 10 cores; the R² values are stable, and better than Greedy (the best previous algorithm [Das & Kempe, ICML'11])

PPOSS-asy (red line, the asynchronous version of PPOSS): achieves better speedup (it avoids the synchronization cost), while the R² values are slightly worse (due to the noise introduced by asynchrony)

Can we make the Pareto optimization method distributable?

Pareto optimization for subset selection

achieve superior performance on diverse variants of subset selection both theoretically and empirically

Parallel Pareto optimization for subset selection

achieve nearly linear runtime speedup while keeping the solution quality

Requires centralized access to the whole data set, which restricts the application to large-scale real-world problems

Distributed Pareto optimization

POSS [Qian et al., NIPS'15]: runs on a single machine and needs access to the whole data set

DPOSS [Qian et al., IJCAI'18]:

stage 1 - split the data into data 1, …, data m, and run POSS on each part on its own single machine

stage 2 - merge the solutions found by the m machines, and run POSS on the merged data on a single machine
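A schematic Python sketch of this two-stage idea (purely illustrative: it reuses the single-machine `poss` sketch from earlier and runs the per-part calls sequentially rather than on real machines; `make_f` is an assumed helper that builds the objective over a given list of items):

```python
def dposs(items, make_f, B, m, poss):
    """Two-stage distributed scheme: POSS on each data part, then POSS on the merged candidates."""
    parts = [items[i::m] for i in range(m)]                   # stage 0: split the data into m parts
    merged = []
    for part in parts:                                        # stage 1: POSS on each part
        x = poss(make_f(part), n=len(part), B=B)
        merged += [part[i] for i, b in enumerate(x) if b]     # collect the selected items
    x = poss(make_f(merged), n=len(merged), B=B)              # stage 2: POSS on the merged items
    return [merged[i] for i, b in enumerate(x) if b]
```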

Experiments on sparse regression

Compare DPOSS with the state-of-the-art distributed greedy algorithm RandGreeDi [Mirzasoleiman et al., JMLR'16] under different numbers of machines

On regular-scale data sets

DPOSS is always better than RandGreeDi

Experiments on sparse regression

On regular-scale data sets

DPOSS is very close to the centralized POSS

On large-scale data sets

DPOSS is better than RandGreeDi

Experiments on maximum coverage

On regular-scale data sets

On large-scale data sets

Pareto optimization for subset selection

achieve superior performance on diverse variants of subset selection both theoretically and empirically

Parallel Pareto optimization for subset selection

achieve nearly linear runtime speedup while keeping the solution quality

Distributed Pareto optimization for subset selection

achieve very close performance to the centralized algorithm

Previous analyses often assume that the exact value of the objective function can be accessed

However, in many applications of subset selection, only a noisy value of the objective function can be obtained

The objective function: the expected number of nodes activated by propagating from X

Influence maximization (influential users)

Noise: only the average number of active nodes over independent diffusion processes can be obtained [Kempe et al., KDD'03]

How about the performance for noisy subset selection?

Previous analyses often assume that the exact value of the objective function can be accessed

However, in many applications of subset selection, only a noisy value of the objective function can be obtained

The objective function: the mean squared error of prediction by X

Sparse regression

Noise: the objective can only be evaluated approximately (e.g., on a random sample of the data)

Outline

Introduction

Pareto optimization for subset selection

Pareto optimization for large-scale subset selection

Pareto optimization for noisy subset selection

Conclusion

Noisy subset selection

Subset selection: given all items V = {v_1, …, v_n}, an objective function f: 2^V → ℝ and a budget B, it is to find a subset X ⊆ V such that

$\max_{X \subseteq V} f(X) \quad \text{s.t.} \quad |X| \le B.$

Noise: only a noisy value F(X) of the objective can be obtained

Multiplicative:  $(1 - \epsilon) f(X) \le F(X) \le (1 + \epsilon) f(X)$

Additive:  $f(X) - \epsilon \le F(X) \le f(X) + \epsilon$
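As a tiny illustration, a Python wrapper turning an exact objective into a noisy one under either model (toy code; the uniform noise distribution is an assumption for the example, since the models above only bound F(X)):

```python
import random

def noisy(f, eps, multiplicative=True):
    """Wrap an exact objective f into a noisy evaluation F satisfying the chosen noise model."""
    def F(X):
        if multiplicative:
            return f(X) * (1 + random.uniform(-eps, eps))    # (1 - eps) f(X) <= F(X) <= (1 + eps) f(X)
        return f(X) + random.uniform(-eps, eps)              # f(X) - eps <= F(X) <= f(X) + eps
    return F

F = noisy(lambda X: len(X), eps=0.1)
print(F({1, 2, 3}))                                          # a value in [2.7, 3.3]
```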

Applications: influence maximization, sparse regression

crowdsourced image collection summarization [Singla et al., AAAI'16]

maximizing information gain in graphical models [Chen et al., COLT'15]

Theoretical analysis for greedy algorithms

The noiseless approximation guarantee [Das & Kempe, ICML'11] (γ: the submodularity ratio):

$f(X) \ge \Bigl(1 - \bigl(1 - \frac{\gamma}{B}\bigr)^{B}\Bigr) \cdot OPT \ \ge\ (1 - e^{-\gamma}) \cdot OPT$,  a constant approximation ratio

Multiplicative noise:

$f(X) \ \ge\ \dfrac{1}{1 + \frac{2\epsilon B}{(1-\epsilon)\gamma}} \left(1 - \Bigl(\frac{1-\epsilon}{1+\epsilon}\Bigr)^{B} \Bigl(1 - \frac{\gamma}{B}\Bigr)^{B}\right) \cdot OPT$

Additive noise:

$f(X) \ \ge\ \Bigl(1 - \bigl(1 - \frac{\gamma}{B}\bigr)^{B}\Bigr) \cdot OPT - \Bigl(\frac{2B}{\gamma} - \frac{2B}{\gamma} e^{-\gamma}\Bigr)\epsilon$

(ε ≤ 1/B is needed for a constant approximation ratio)

The performance largely degrades in noisy environments

Theoretical analysis for POSS

• POSS can generally achieve the same approximation guarantees under both multiplicative and additive noise

• POSS has a better ability to avoid the misleading search direction caused by noise

Maximum coverage

Greedy: very bad approximation [Hassidim & Singer, COLT'17]

POSS: find the optimal solution through multi-bit search

POSS: find the optimal solution through backward search

PONSS

In our previous work, threshold selection (accepting x over y only if f(x) ≥ f(y) + ε rather than f(x) ≥ f(y)) was theoretically shown to be tolerant to noise and to exponentially decrease the running time [Qian et al., ECJ'18]

PONSS [Qian et al., NIPS'17] replaces the comparison used by POSS with a noise-aware one:

POSS:  x is regarded as "better" than y  ⇔  $f(x) \ge f(y)$  and  $|x| \le |y|$

PONSS (multiplicative noise):  x is regarded as "better" than y  ⇔  $f(x) \ge \frac{1+\epsilon}{1-\epsilon} f(y)$  and  $|x| \le |y|$

PONSS (additive noise):  x is regarded as "better" than y  ⇔  $f(x) \ge f(y) + 2\epsilon$  and  $|x| \le |y|$
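A small Python helper mirroring this noise-aware comparison (an illustrative sketch; f here denotes the noisy values actually observed):

```python
def ponss_better(fx, fy, size_x, size_y, eps, multiplicative=True):
    """Return True iff x is regarded as better than y under the PONSS comparison."""
    if size_x > size_y:
        return False
    if multiplicative:
        return fx >= (1 + eps) / (1 - eps) * fy
    return fx >= fy + 2 * eps

print(ponss_better(1.05, 1.0, 3, 3, eps=0.1))   # False: the margin is within the noise level
print(ponss_better(1.40, 1.0, 3, 3, eps=0.1))   # True: the advantage exceeds (1+eps)/(1-eps)
```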

Theoretical analysis

Multiplicative noise:

PONSS:  $f(X) \ \ge\ \dfrac{1-\epsilon}{1+\epsilon} \Bigl(1 - \bigl(1 - \frac{\gamma}{B}\bigr)^{B}\Bigr) \cdot OPT$

POSS & Greedy:  $f(X) \ \ge\ \dfrac{1}{1 + \frac{2\epsilon B}{(1-\epsilon)\gamma}} \left(1 - \Bigl(\frac{1-\epsilon}{1+\epsilon}\Bigr)^{B} \Bigl(1 - \frac{\gamma}{B}\Bigr)^{B}\right) \cdot OPT$

the PONSS bound is significantly better

Additive noise:

PONSS:  $f(X) \ \ge\ \Bigl(1 - \bigl(1 - \frac{\gamma}{B}\bigr)^{B}\Bigr) \cdot OPT - 2\epsilon$

POSS & Greedy:  $f(X) \ \ge\ \Bigl(1 - \bigl(1 - \frac{\gamma}{B}\bigr)^{B}\Bigr) \cdot OPT - \Bigl(\frac{2B}{\gamma} - \frac{2B}{\gamma} e^{-\gamma}\Bigr)\epsilon$

the PONSS bound is significantly better

For γ = 1 (submodular) and constant ε: PONSS achieves a constant approximation ratio, while POSS & Greedy only achieve a Θ(1/B) approximation ratio

Experimental results - influence maximization

PONSS (red line) vs POSS (blue line) vs Greedy (black line):

• Noisy evaluation: the average of 10 independent Monte Carlo simulations

• The output solution: the average of 10,000 independent Monte Carlo simulations

Experimental results - sparse regression

PONSS (red line) vs POSS (blue line) vs Greedy (black line):

• Noisy evaluation: a random sample of 1,000 instances

• The output solution: the whole data set

Conclusion

• Pareto optimization for subset selection

• Pareto optimization for large-scale subset selection

• Pareto optimization for noisy subset selection

Future work

• Problem issues
  - Non-monotone objective functions
  - Continuous submodular objective functions
  - Multiple objective functions

• Algorithm issues
  - More complicated MOEAs

• Theory issues
  - Beat the best known approximation guarantee

• Application issues
  - Attempts on more large-scale real-world applications

Collaborators:

Ke Tang, Yang Yu, Jing-Cheng Shi, Zhi-Hua Zhou, Guiying Li, Chao Feng, Xin Yao

(USTC, Nanjing University, SUSTech)

For details

• C. Qian, Y. Yu, and Z.-H. Zhou. Subset selection by Pareto optimization. In: Advances in Neural Information Processing Systems 28 (NIPS'15), Montreal, Canada, 2015.

• C. Qian, J.-C. Shi, Y. Yu, K. Tang, and Z.-H. Zhou. Parallel Pareto optimization for subset selection. In: Proceedings of the 25th International Joint Conference on Artificial Intelligence (IJCAI'16), New York, NY, 2016.

• C. Qian, J.-C. Shi, Y. Yu, and K. Tang. On subset selection with general cost constraints. In: Proceedings of the 26th International Joint Conference on Artificial Intelligence (IJCAI'17), Melbourne, Australia, 2017.

• C. Qian, J.-C. Shi, Y. Yu, K. Tang, and Z.-H. Zhou. Optimizing ratio of monotone set functions. In: Proceedings of the 26th International Joint Conference on Artificial Intelligence (IJCAI'17), Melbourne, Australia, 2017.

• C. Qian, J.-C. Shi, K. Tang, and Z.-H. Zhou. Constrained monotone k-submodular function maximization using multi-objective evolutionary algorithms with theoretical guarantee. IEEE Transactions on Evolutionary Computation, in press.

For details

• C. Qian, J.-C. Shi, Y. Yu, K. Tang, and Z.-H. Zhou. Subset selection under noise. In: Advances in Neural Information Processing Systems 30 (NIPS'17), Long Beach, CA, 2017.

• C. Qian, Y. Zhang, K. Tang, and X. Yao. On multiset selection with size constraints. In: Proceedings of the 32nd AAAI Conference on Artificial Intelligence (AAAI'18), New Orleans, LA, 2018.

• C. Qian, G. Li, C. Feng, and K. Tang. Distributed Pareto optimization for subset selection. In: Proceedings of the 27th International Joint Conference on Artificial Intelligence (IJCAI'18), Stockholm, Sweden, 2018.

• C. Qian, C. Feng, and K. Tang. Sequence selection by Pareto optimization. In: Proceedings of the 27th International Joint Conference on Artificial Intelligence (IJCAI'18), Stockholm, Sweden, 2018.

• C. Qian, Y. Yu, and Z.-H. Zhou. Analyzing evolutionary optimization in noisy environments. Evolutionary Computation, 2018, 26(1): 1-41.

THANK YOU!

Codes available at http://staff.ustc.edu.cn/~chaoqian/
