Non-Adaptive Adaptive Sampling on Turnstile Streams
Sepideh Mahabadi (TTIC)
Ilya Razenshteyn (MSR Redmond)
Samson Zhou (CMU)
David Woodruff (CMU)
Adaptive Sampling
An algorithmic paradigm for solving many data summarization tasks.
Given: n vectors in ℝ^d
• Sample a vector w.p. proportional to its norm
• Project all vectors away from the selected subspace
• Repeat on the residuals
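A minimal offline sketch of this loop in numpy (illustrative, not the streaming algorithm of the talk; the function name and the use of squared norms, i.e. the p = 2 variant, are my choices):

```python
import numpy as np

def adaptive_sampling(A, k, rng=None):
    """Sample k row indices: pick a row w.p. proportional to its
    squared residual norm, then project all rows away from the
    span of the rows selected so far."""
    if rng is None:
        rng = np.random.default_rng(0)
    n, _ = A.shape
    R = A.astype(float)              # current residuals A(I - M^+ M)
    chosen = []
    for _ in range(k):
        norms = (R ** 2).sum(axis=1)
        i = rng.choice(n, p=norms / norms.sum())
        chosen.append(i)
        M = A[chosen]                # rows selected so far
        R = A - A @ np.linalg.pinv(M) @ M
    return chosen
```

Selected rows end up with (numerically) zero residual, so they are almost surely never chosen twice.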
Adaptive Sampling Example
Data Summarization Tasks
Given:
• n by d matrix A ∈ ℝ^{n×d} (rows correspond to n data points, e.g. feature vectors of objects in a dataset)
• parameter k
Goal:
• Find a representation (of "size k") for the data
• Optimize a predefined function
Instances:
• Row/column subset selection
• Subspace approximation
• Projective clustering
• Volume sampling/maximization
Row/column subset selection:
• Find a subset S of k rows minimizing the squared distance of all rows to the subspace spanned by S:
  min_S ‖A − A S⁺S‖_F
Best set of representatives
Subspace approximation:
• Find a subspace H of dimension k minimizing the squared distance of all rows to H:
  min_H ‖A − A H⁺H‖_F
Best approximation with a subspace
Projective clustering:
• Find s subspaces H_1, …, H_s, each of dimension k, minimizing
  Σ_{i=1}^n d(A_i, H)²   where H = H_1 ∪ ⋯ ∪ H_s
Best approximation with several subspaces
Volume sampling/maximization:
• Find a subset S of k rows that maximizes the volume of the parallelepiped spanned by S
A notion for capturing diversity: maximizing diversity
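The volume objective can be evaluated with the Gram determinant; a small numpy sketch (the function name is illustrative):

```python
import numpy as np

def parallelepiped_volume(S):
    # Vol(S)^2 = det(S S^T) for the parallelepiped spanned by the rows of S
    gram = S @ S.T
    return float(np.sqrt(max(np.linalg.det(gram), 0.0)))

# Orthogonal unit rows span volume 1; nearly parallel rows span
# almost no volume, i.e. they form a poorly diversified summary.
print(parallelepiped_volume(np.eye(3)))   # prints 1.0
```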
Given: β’ ππ by ππ matrix π¨π¨ β βππΓπ π
β’ parameter ππGoal: β’ Find a representation (of βsize ππβ) for the dataβ’ Optimize a predefined functionInstances:β’ Row/Column subset selectionβ’ Subspace approximationβ’ Projective clusteringβ’ Volume sampling/maximization
Data Summarization Tasks
Adaptive sampling is used to derive algorithms for all these tasks
Adaptive Sampling [DeshpandeVempala06, DeshpandeVaradarajan07, DeshpandeRademacherVempalaWang06]
Given: n by d matrix A ∈ ℝ^{n×d}, parameter k
• M ← ∅
• For k rounds:
  • Sample a row A_i with probability ‖A_i(I − M⁺M)‖₂² / ‖A(I − M⁺M)‖_F²
    (in the first round M is empty, so row i is sampled w.p. proportional to its squared norm ‖A_i‖₂² / ‖A‖_F²)
  • Append A_i to M
Frobenius norm: ‖A‖_F = (Σ_i Σ_j A_{i,j}²)^{1/2}
Project away from the sampled subspace; M⁺: Moore-Penrose pseudoinverse
Seems inherently sequential
Question:
Can we implement Adaptive Sampling in one pass (non-adaptively)?
Streaming Algorithms
Motivation: Data is huge and cannot be stored in main memory
Streaming algorithms: Given sequential access to the data, make one or several passes over the input
• Solve the problem on the fly
• Use sub-linear storage
Parameters: space, number of passes, approximation
Models:
• Row arrival: rows of A arrive one by one
• Turnstile: we receive updates to the entries of the matrix, i.e., (i, j, Δ) means A_{i,j} ← A_{i,j} + Δ
Focus on the row arrival model for the talk
Our goal: Simulate k rounds of adaptive sampling in 1 pass of streaming
Data summarization tasks were considered in the streaming models in earlier works that used adaptive sampling [e.g. DV'06, DR'10, DRVW'06]
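A linear sketch is what makes the turnstile model workable: each update (i, j, Δ) can be folded into the sketch without ever materializing A. A minimal numpy sketch of this bookkeeping (the dense Gaussian S and the dimensions are illustrative; A is kept here only to verify the result):

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, m = 100, 20, 10
S = rng.standard_normal((m, n))      # any linear sketching matrix

sketch = np.zeros((m, d))            # maintains S A under the stream
A = np.zeros((n, d))                 # full matrix, only for verification
for (i, j, delta) in [(3, 5, 1.0), (7, 2, -2.5), (3, 5, 0.5)]:
    # the update adds delta to A[i, j], so S A changes by
    # delta * S[:, i] in column j
    sketch[:, j] += delta * S[:, i]
    A[i, j] += delta

assert np.allclose(sketch, S @ A)    # incremental sketch equals S A
```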
Outline of Results
1. Simulate adaptive sampling in 1 pass of a turnstile stream
   • L_{p,2} sampling with post-processing matrix P
2. Applications in turnstile streams
   • Row/column subset selection
   • Subspace approximation
   • Projective clustering
   • Volume maximization
3. Volume maximization lower bounds
4. Volume maximization in row arrival
Results: L_{p,2} Sampling with Post-Processing
Input:
• A ∈ ℝ^{n×d} as a (turnstile) stream, p ∈ {1,2}
• a post-processing matrix P ∈ ℝ^{d×d}
Output: samples an index j ∈ [n] w.p. (1 ± ε) ‖A_j P‖₂^p / ‖AP‖_{p,2}^p + 1/poly(n)
In one pass, poly(d, ε⁻¹, log n) space
P corresponds to the projection matrix (I − M⁺M)
Impossible to return the entire row instead of an index in sublinear space: a long stream of small updates + an arbitrarily large update
Results: Adaptive Sampling
Input: A ∈ ℝ^{n×d} as a (turnstile) stream
Output: Return each set S ⊆ [n] of k indices w.p. q_S s.t. Σ_S |p_S − q_S| ≤ ε
• p_S: probability of selecting S via adaptive sampling
• w.r.t. either distance or squared distance (i.e., p ∈ {1,2})
In one pass, poly(d, k, ε⁻¹, log n) space
Besides the indices S, a noisy set of rows r_1, …, r_k is returned
• Each r_i is close to the corresponding A_{s_i} (w.r.t. the residual)
Impossible to return the row exactly in sublinear space: a long stream of small updates + an arbitrarily large update
Applications: Row Subset Selection
Input: A ∈ ℝ^{n×d} and an integer k > 0
Output: k rows of A to form M, minimizing ‖A − A M⁺M‖_F
Our Result: finds M such that
Pr[‖A − A M⁺M‖_F² ≤ 16(k + 1)! ‖A − A_k‖_F²] ≥ 2/3
• A_k: best rank-k approximation of A
• first one-pass turnstile streaming algorithm
• poly(d, k, log n) space
Previous works: centralized setting [e.g. DRVW06, BMD09, GS'12] and row arrival [e.g., CMM'17, GP'14, BDMMUWZ'18]
Applications: Subspace Approximation
Input: A ∈ ℝ^{n×d} and an integer k > 0
Output: k-dimensional subspace H to minimize (Σ_{i=1}^n d(A_i, H)^p)^{1/p}
• p ∈ {1,2}
• d(A_i, H) = ‖A_i(I − H⁺H)‖₂
Our Result I: finds H (which is k noisy rows of A) s.t.
Pr[(Σ_{i=1}^n d(A_i, H)^p)^{1/p} ≤ O((k + 1)!) (Σ_{i=1}^n d(A_i, A_k)^p)^{1/p}] ≥ 2/3
• A_k: best rank-k approximation of A
• poly(d, k, log n) space
• first relative-error algorithm on turnstile streams that returns noisy rows of A
[Levin, Sevekari, Woodruff'18]: (1 + ε)-approximation, but a larger number of rows, and the rows are not from A
Our Result II: finds H (which is poly(k, 1/ε) noisy rows of A) s.t.
Pr[(Σ_{i=1}^n d(A_i, H)^p)^{1/p} ≤ (1 + ε)(Σ_{i=1}^n d(A_i, A_k)^p)^{1/p}] ≥ 2/3
• poly(d, k, 1/ε, log n) space
[Levin, Sevekari, Woodruff'18]: poly(log(nd), k, 1/ε) rows, but the rows are not from A
Applications: Projective Clustering
Input: A ∈ ℝ^{n×d}, target dimension k and target number of subspaces s
Output: s k-dimensional subspaces H_1, …, H_s to minimize (Σ_{i=1}^n d(A_i, H)^p)^{1/p}
• H = H_1 ∪ ⋯ ∪ H_s and p ∈ {1,2}
• d(A_i, H) = min_{j∈[s]} ‖A_i(I − H_j⁺H_j)‖₂
Our Result: finds S (poly(s, k, 1/ε) noisy rows of A), which contains a union T of s k-dimensional subspaces s.t.
Pr[(Σ_{i=1}^n d(A_i, T)^p)^{1/p} ≤ (1 + ε)(Σ_{i=1}^n d(A_i, H)^p)^{1/p}] ≥ 2/3
• H: optimal solution to projective clustering
• first one-pass turnstile streaming algorithm with sublinear space
• poly(d, k, log n, s, 1/ε) space
[BHI'02, HM'04, Che09, FMSW'10]: based on coresets, works in row arrival; [KR'15]: turnstile, but linear in the number of points
Applications: Volume Maximization
Input: A ∈ ℝ^{n×d} and an integer k
Output: k rows r_1, …, r_k of A with maximum volume, i.e., maximizing the volume of the parallelepiped spanned by those vectors
Our Result (Upper Bound I): for an approximation factor C, finds S (a set of k noisy rows of A) s.t.
Pr[C^k k! Vol(S) ≥ Vol(T)] ≥ 2/3
• T: the optimal set of k rows
• first one-pass turnstile streaming algorithm
• Õ(√n d k²/C²) space
[Indyk, M, Oveis Gharan, Rezaei '19 '20]: coreset based, Õ(k)^{k/2} approximation and Õ(poly(k) d) space for row-arrival streams
Volume Maximization Lower Bounds
Input: A ∈ ℝ^{n×d} and an integer k
Output: k rows r_1, …, r_k of A with maximum volume
Our Result (Lower Bound I): for C, any one-pass algorithm that finds a C^k-approximation w.p. ≥ 63/64 in turnstile arrival requires Ω(√n d/C²) space.
• Our upper bound matches this lower bound up to a factor of k² in space and k! in the approximation factor.
Our Result (Lower Bound II): for a fixed constant C, any one-pass algorithm that finds a C^k-approximation w.p. ≥ 63/64 in random-order row arrival requires Ω(n) space.
Volume Maximization - Row Arrival
Input: A ∈ ℝ^{n×d} and an integer k
Output: k rows r_1, …, r_k of A with maximum volume
Our Result (Upper Bound II): for an approximation factor C < √(log n)/k, finds S (a set of k actual rows of A) s.t.
• approximation factor Õ(Ck)^{k/2} with high probability
• one-pass row-arrival streaming algorithm
• Õ(n^{O(1/C)} d) space
[Indyk, M, Oveis Gharan, Rezaei '19 '20]: coreset based, Õ(k)^{Ck/2} approximation and Õ(n^{1/C²} d) space for row-arrival streams
Part 1: The L_{p,2} Sampler
L_{p,2} Sampler with Post-Processing Matrix
Extension of the L_p sampler [Andoni et al.'10] [Monemizadeh, Woodruff'10] [Jowhari et al.'11] [Jayaram, Woodruff'18]:
Input: vector f as a data stream
Output: index i of a coordinate of f sampled w.p. ~ f_i² / ‖f‖₂²
This work:
Input: matrix A as a data stream, a post-processing matrix P
Output: index i of a row of AP sampled w.p. ~ ‖A_i P‖₂² / ‖AP‖_F²
What is new:
1. Generalizing vectors to matrices
2. Handling the post-processing matrix P
L_{2,2} Sampler (ignore P for now)
Input: matrix A as a data stream
Output: index i of a row of A sampled w.p. ~ ‖A_i‖₂² / ‖A‖_F²
Step 1.
• pick t_i ∈ [0,1] uniformly at random
• set B_i ← (1/√t_i) · A_i
Then
Pr[‖B_i‖₂² ≥ ‖A‖_F²] = Pr[‖A_i‖₂² / ‖A‖_F² ≥ t_i] = ‖A_i‖₂² / ‖A‖_F²
Return the i that satisfies ‖B_i‖₂² ≥ ‖A‖_F²
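A quick empirical check of Step 1 in numpy (matrix size and trial count are arbitrary): scaling row i by 1/√t_i makes it cross the threshold ‖A‖_F² with probability exactly ‖A_i‖₂²/‖A‖_F².

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 4))
row_sq = (A ** 2).sum(axis=1)
p = row_sq / row_sq.sum()          # target sampling probabilities

trials = 50_000
hits = np.zeros(5)
for _ in range(trials):
    t = rng.uniform(size=5)
    # ||B_i||^2 = ||A_i||^2 / t_i >= ||A||_F^2  iff  t_i <= p_i
    hits += (row_sq / t) >= row_sq.sum()

# empirical pass rate of each row approaches p_i
assert np.abs(hits / trials - p).max() < 0.02
```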
L_{2,2} Sampler - Issues:
1. Multiple rows passing the threshold
2. Don't have access to the exact values of ‖B_i‖₂ and ‖A‖_F
Issue 1: Multiple rows passing the threshold → set the threshold higher
Let γ² := C log n. Instead of ‖B_i‖₂² ≥ ‖A‖_F², test ‖B_i‖₂² ≥ γ² ‖A‖_F². Then
Pr[‖B_i‖₂² ≥ γ² ‖A‖_F²] = (1/γ²) · ‖A_i‖₂² / ‖A‖_F²
Pr[the squared norm of at least one row exceeds γ² ‖A‖_F²] = Ω(1/γ²)
Pr[the squared norms of more than one row exceed γ² ‖A‖_F²] = O(1/γ⁴)
Success probability: Ω(1/log n); to succeed, repeat Õ(log n) independent copies
Issue 2: Don't have access to the exact values of ‖B_i‖₂ and ‖A‖_F → estimate them
• Estimate the norm of A using an AMS sketch
• Find the heaviest row using CountSketch
• Estimate ‖B_i‖₂ for the rows with large norms
Return the i that satisfies the estimated test ‖B̂_i‖₂ ≥ γ F̂
Count Sketch
Given a stream of items, estimate the frequency of each item (i.e., the coordinates of a vector f)
#rows r = O(log n), #buckets per row b = O(1/ε²)
• Hash functions h_i: [n] → [b]
• Sign functions s_i: [n] → {−1, +1}
• Update: C[i, h_i(j)] += s_i(j) · f_j
• Estimate: f̂_j ← median_i s_i(j) · C[i, h_i(j)]
Estimation guarantee: |f̂_j − f_j| ≤ ε ‖f‖₂
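A compact reference implementation of the update/estimate rules above (the seeded-generator hashing is a stand-in for the pairwise-independent hash and sign functions):

```python
import numpy as np

class CountSketch:
    """r hash rows, b buckets per row; supports turnstile updates."""
    def __init__(self, r, b, seed=0):
        rng = np.random.default_rng(seed)
        self.r, self.b = r, b
        self.C = np.zeros((r, b))
        self.seeds = rng.integers(0, 2**31, size=r)

    def _hash(self, i, j):
        # derive a bucket and a sign for item j in row i; a seeded
        # generator stands in for pairwise-independent h_i and s_i
        g = np.random.default_rng(int(self.seeds[i]) ^ j)
        return int(g.integers(0, self.b)), 2 * int(g.integers(0, 2)) - 1

    def update(self, j, delta):          # f_j += delta
        for i in range(self.r):
            bucket, sign = self._hash(i, j)
            self.C[i, bucket] += sign * delta

    def estimate(self, j):               # median over the r rows
        vals = []
        for i in range(self.r):
            bucket, sign = self._hash(i, j)
            vals.append(sign * self.C[i, bucket])
        return float(np.median(vals))

# one heavy item among many light ones
cs = CountSketch(r=7, b=64, seed=1)
for j in range(20):
    cs.update(j, 1.0)
for _ in range(100):
    cs.update(7, 1.0)
```

With f_7 = 101 and nineteen unit-weight items, cs.estimate(7) should land within a few units of the truth, consistent with the ε‖f‖₂ guarantee.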
Modified Count Sketch: estimate ‖B_i‖₂ for rows with large norms
Run Count Sketch with rows as items, so each bucket stores a signed sum of rows (a vector in ℝ^d)
#rows r = O(log n), #buckets per row b = O(1/ε²)
Estimation guarantee: |‖B_i‖₂ − ‖B̂_i‖₂| ≤ ε ‖B‖_F
Space usage: O(log n) × O(1/ε²) × d
L_{2,2} Sampler, combined
Step 1.
• pick t_i ∈ [0,1] uniformly at random
• set B_i ← (1/√t_i) · A_i
Step 2.
• ‖B̂_i‖₂ is an estimate of ‖B_i‖₂ by the modified CountSketch
• F̂ is an estimate of ‖A‖_F by a modified AMS sketch
Goal: ‖B_i‖₂ ≥ γ ‖A‖_F    Test: ‖B̂_i‖₂ ≥ γ F̂
If the test succeeds, then w.h.p. the estimate of the largest row exceeds the threshold
Handling the Post-Processing Matrix
Input: matrix A as a data stream, a post-processing matrix P
Output: index i of a row of AP sampled w.p. ~ ‖A_i P‖₂² / ‖AP‖_F²
Run the proposed algorithm on A, then multiply by P:
• CountSketch and AMS are both linear transformations
• A is mapped to SA, and S(AP) = (SA)P
Total space for the sampler: O((d/ε²) log² n) bits
Recap: L_{p,2} sampling with post-processing
Input: A ∈ ℝ^{n×d} as a (turnstile) stream, a post-processing P ∈ ℝ^{d×d}
Output: samples an index j ∈ [n] w.p. (1 ± ε) ‖A_j P‖₂² / ‖AP‖_F² + 1/poly(n)
In one pass, poly(d, ε⁻¹, log n) space
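The key fact, that sketching commutes with post-multiplication, is one line of linear algebra; a quick numpy check (the dimensions and the choice P = I − M⁺M are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, m = 50, 8, 6
A = rng.standard_normal((n, d))
S = rng.standard_normal((m, n))         # linear sketch (CountSketch/AMS are linear)
M = A[:3]                               # rows sampled so far
P = np.eye(d) - np.linalg.pinv(M) @ M   # project away from span(M)

# sketch A during the stream, apply P only at query time
assert np.allclose(S @ (A @ P), (S @ A) @ P)
# and P really annihilates the sampled rows: M(I - M^+ M) = 0
assert np.allclose(M @ P, 0)
```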
Part 2: The Adaptive Sampler
Algorithm Using the L_{2,2} Sampler
Maintain k instances of the L_{2,2} sampler with post-processing: G_1, …, G_k
M ← ∅
For round i = 1 to k:
• Set P ← I − M⁺M
• Use G_i to sample a noisy row r_i of A with post-processing matrix P
• Append r_i to M
Issues:
X Noisy perturbation of the rows (unavoidable): sampling index i returns r_i = A_i P + v, where v has small norm ‖v‖ < ε ‖A_i P‖, so r_i ≈ A_i P
X This can drastically change the probabilities: it may zero out the probabilities of some rows
Bad Example
Take a row A_1 with a very large norm and a second row A_2 with a small norm.
True row sampling: once A_1 is selected, ‖A_1(I − M⁺M)‖ = 0, so A_1 is never selected again.
Noisy row sampling: only the noisy copy r_1 ≈ A_1 is projected away, so the residual ‖A_1(I − M⁺M)‖ can remain as large as ε ‖A_1‖ and dominate the residual of A_2; the sampler may then select the same row again and again.
X We cannot hope for a multiplicative bound on the probabilities.
Lemma: Not only is the norm of v small compared to ‖A_i P‖, but its norm projected away from the sampled subspace is also small:
• r_i = A_i P + v
• where ‖vQ‖ ≤ ε ‖A_i P‖ · ‖APQ‖_F / ‖AP‖_F for any projection matrix Q
This lets us bound the additive error of the sampling probabilities in subsequent rounds.
Overview of How to Bound the Error
Suppose the indices reported by our algorithm are i_1, …, i_k. Consider two bases U and W:
• U follows the true rows: U = u_1, …, u_d s.t. {u_1, …, u_k} spans {A_{i_1}, …, A_{i_k}}
• W follows the noisy rows: W = w_1, …, w_d s.t. {w_1, …, w_k} spans {r_{i_1}, …, r_{i_k}}
Write each row A_x in both bases:
• A_x = Σ_{j=1}^d α_{x,j} u_j
• A_x = Σ_{j=1}^d β_{x,j} w_j
Sampling probabilities in terms of U and W in the t-th round:
• The correct probability: Σ_{j=t}^d α_{x,j}² / Σ_{y=1}^n Σ_{j=t}^d α_{y,j}²
• What we sample from: Σ_{j=t}^d β_{x,j}² / Σ_{y=1}^n Σ_{j=t}^d β_{y,j}²
The difference between the correct probability and our algorithm's sampling probability, summed over all rows, is ε for one round:
• The change-of-basis matrix between U and W is close to the identity matrix
• Bound the total variation distance by ε
The error in each round gets propagated k times, so the total error is O(k²ε).
Theorem: Our algorithm reports a set of k indices such that, with high probability,
• the total variation distance between the probability distribution output by the algorithm and the probability distribution of adaptive sampling is at most O(ε)
• the algorithm uses poly(d, 1/ε, k, log n) space
Applications
1. Simulate adaptive sampling in 1 pass
   • L_{p,2} sampling with post-processing matrix P
2. Applications in the turnstile stream
   • Row/column subset selection
   • Subspace approximation
   • Projective clustering
   • Volume maximization
3. Volume maximization lower bounds
4. Volume maximization in row arrival
Key point for the applications: it suffices to get a noisy perturbation of the rows.
Applications: Row Subset Selection

Input: A ∈ ℝ^{n×d} and an integer k > 0
Output: k rows of A, forming a matrix M, minimizing ‖A − AM⁺M‖_F
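The objective above can be evaluated directly with a pseudoinverse. A minimal numpy sketch (the helper `subset_cost` and the toy matrix are illustrative, not part of the streaming algorithm):

```python
import numpy as np

def subset_cost(A, rows):
    # Frobenius cost ||A - A M^+ M||_F for the subset M = A[rows]:
    # total distance of the rows of A to the span of the chosen rows.
    M = A[rows]
    return np.linalg.norm(A - A @ np.linalg.pinv(M) @ M)

A = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [1.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
# rows 0 and 1 span every row except the last, which is at distance 1
```

For example, `subset_cost(A, [0, 1])` is 1 (only the last row lies outside the span), while adding row 3 drives the cost to 0.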
Applications: Row Subset Selection

Adaptive sampling provides a (k + 1)!-approximation for subset selection:
[DRVW'06]: Volume sampling provides a (k + 1)-factor approximation to row subset selection with constant probability.
[DV'06]: The sampling probability of any k-set S under adaptive sampling is at most k! times its sampling probability under volume sampling.
Non-adaptive Adaptive Sampling provides a good approximation to Adaptive Sampling
1. For the set of indices J output by our algorithm, ‖A(I − R⁺R)‖_F ≤ (1 + ε)‖A(I − T⁺T)‖_F w.h.p.
   • R: the set of noisy rows corresponding to J
   • T: the set of true rows corresponding to J
2. For most k-sets J, the probability under adaptive sampling is within an O(1) factor of the probability under non-adaptive sampling.
Applications: Row Subset Selection

Our result: we find M such that
Pr[‖A − AM⁺M‖_F² ≤ 16(k + 1)! ‖A − A_k‖_F²] ≥ 2/3
• A_k: the best rank-k approximation of A
• the first one-pass turnstile streaming algorithm
• poly(k, d, log n) space
Applications: Volume Maximization

Input: A ∈ ℝ^{n×d} and an integer k
Output: k rows of A, forming a matrix M, with maximum volume
Volume of the parallelepiped spanned by those vectors
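The volume of the parallelepiped spanned by a set of rows S is √det(SSᵀ) (the Gram determinant); a quick numpy sketch:

```python
import numpy as np

def volume(S):
    # Volume of the parallelepiped spanned by the rows of S,
    # via the Gram determinant: Vol(S) = sqrt(det(S S^T)).
    return float(np.sqrt(np.linalg.det(S @ S.T)))

S = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])   # two orthonormal rows span a unit square
```

Here `volume(S)` is 1, and scaling both rows by 2 multiplies the volume by 2² = 4.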
Applications: Volume Maximization

[Civril, Magdon '09]: The greedy algorithm provides a k!-approximation to volume maximization.

Greedy
• For k rounds, pick the vector that is farthest away from the current subspace.
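The greedy rule above can be sketched offline, with exact norms and projections (the streaming version must approximate both):

```python
import numpy as np

def greedy_volume(A, k):
    # Greedy of [Civril, Magdon '09]: for k rounds, pick the row farthest
    # from the span of the rows chosen so far, then project that direction out.
    chosen = []
    R = A.astype(float).copy()        # residuals of all rows
    for _ in range(k):
        i = int(np.argmax(np.linalg.norm(R, axis=1)))
        chosen.append(i)
        v = R[i] / np.linalg.norm(R[i])
        R = R - np.outer(R @ v, v)    # remove the selected direction
    return chosen

A = np.array([[10.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [9.0, 1.0, 0.0],
              [0.0, 0.0, 2.0]])
```

On this toy input with k = 2, greedy first takes the heaviest row (index 0), and after projecting it away the remaining farthest row is index 3.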
Applications: Volume Maximization

[Civril, Magdon '09]: The greedy algorithm provides a k!-approximation to volume maximization.

Simulate Greedy
• Maintain k instances of CountSketch, AMS, and an L_{2,2} sampler
• For k rounds:
   • Let r be the row of AP with the largest norm  // by CountSketch
   • If ‖r‖₂² < (α²/4n)·‖AP‖_F², instead sample a row r according to the norms of the rows
   • Add r to the solution, and update the post-processing matrix P

If the largest row exceeds the threshold, it is correctly found by CountSketch w.h.p.
Otherwise, there are enough large rows that the sampler chooses one of them w.h.p.
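Outside the stream, one round of the selection rule above can be sketched as follows; exact norms stand in for the CountSketch/AMS estimates, and direct norm-proportional sampling stands in for the L_{2,2} sampler (all names here are illustrative):

```python
import numpy as np

def select_row(AP, alpha, rng):
    # One round of the (offline) selection rule: take the heaviest row of AP
    # if it clears the threshold (alpha^2 / 4n) * ||AP||_F^2; otherwise sample
    # a row with probability proportional to its squared norm.
    norms2 = np.sum(AP ** 2, axis=1)
    n = AP.shape[0]
    heavy = int(np.argmax(norms2))
    if norms2[heavy] >= (alpha ** 2 / (4 * n)) * norms2.sum():
        return heavy
    return int(rng.choice(n, p=norms2 / norms2.sum()))

AP = np.array([[100.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
```

With a dominant row, as in `AP` above, the threshold branch fires and the heaviest row is chosen deterministically; when no row dominates, the sampler branch picks among the many comparably large rows.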
Applications: Volume Maximization

Input: A ∈ ℝ^{n×d} and an integer k
Output: k rows of A, forming a matrix M, with maximum volume

Our result: for an approximation factor α, we find S (a set of k noisy rows of A) such that
Pr[α^k k! · Vol(S) ≥ Vol(OPT)] ≥ 2/3
• the first one-pass turnstile streaming algorithm
• Õ(√n · dk²α²) space
Problem | Model | Approximation / error | Space | Comments
L_{p,2} sampler | turnstile | (1 + ε) relative + 1/poly(n) | poly(d, 1/ε, log n) |
Adaptive sampling | turnstile | O(ε) total variation distance | poly(d, k, 1/ε, log n) |
Row subset selection | turnstile | O((k + 1)!) | poly(d, k, log n) |
Subspace approximation | turnstile | O((k + 1)!) | poly(d, k, log n) |
Subspace approximation | turnstile | (1 + ε) | poly(d, k, log n, k/ε) | poly(k, k/ε) rows
Projective clustering | turnstile | (1 + ε) | poly(d, k, log n, s, k/ε) | poly(k, s, k/ε) rows
Volume maximization | turnstile | α^k k! | Õ(√n · dkα²) |
Volume maximization | turnstile | α^k | Õ(√n · dkα²) | 2 passes
Volume maximization | row arrival | C^k | O(d) | random order
Volume maximization | row arrival | Õ(C^k · k/ε) | Õ(n^{O(1/C)} · d) | C < √(log n)
Open problems

• Get tight dependence on the parameters
• Further applications of non-adaptive adaptive sampling
• The result on volume maximization in the row-arrival model is not tight, i.e., can we get an O(1)^k approximation without dependence on n?
Thank You!