IRCAM Research & Technology Seminar

Change detection for audio signals in real-time

Arnaud Dessein
Institut de Recherche et Coordination Acoustique/Musique

October 26th 2011


What is audio change detection?

Audio change detection, segmentation, novelty detection:
Finding time boundaries, called change-points, which partition a sound signal into homogeneous and continuous temporal regions, called segments, that exhibit inhomogeneities with the adjacent regions.

Temporality:
- Causality principle.
- On-line or real-time setups.
- But also off-line setups.

Homogeneity:
- Intrinsic homogeneity.
- Inhomogeneity with contiguous segments.
- Criterion for homogeneity.

Examples include speech, music, radio broadcasts [Kemp et al., 2000, Sundaram & Chang, 2000, Foote, 2000].

Figure: Audio change detection.


What do we need?

Approach in many works:
- High level criteria and automatic classification (e.g., speakers, instruments, speech/non speech, voiced/unvoiced, speech/music).
- Drawbacks: assumes the existence and knowledge of classes, relies on a potentially fallible classification, requires a large amount of training data.

Other approach with no assumption on the existence of classes:
- Onset detection [Bello et al., 2005].
- Speaker change detection [Siegler et al., 1997, Tritschler & Gopinath, 1999, Delacourt & Wellekens, 2000, Kotti et al., 2008, Grasic et al., 2010].
- Distance between frames, or statistics on the hypothesis of a change.
- Problem-dependent algorithms and heuristics.
- More generic frameworks with CuSum algorithms [Basseville & Nikiforov, 1993].
- Approximations for parameter estimation resulting in practical shortcomings [Omar & Chaudhari, 2005, Cont et al., 2011].

Our approach:
- No assumption on the existence of classes, similarly to the second approach.
- Control on the variation of the information content.
- Real-time constraints.
- Modularity with various types of signals and criteria.


What do we propose?

- Real-time modular change detection scheme.
- Framework of information geometry for exponential families.
- Statistical grounds through sequential generalized likelihood ratio tests.
- Geometrical interpretation through dually flat Bregman geometry.
- Link between distance and statistic-based methods in a unified framework.
- Addresses the problem of CuSum approaches for parameter estimation.
- Quantization of each segment with an information geometric prototype.

Figure: Audio change detection in the framework of information geometry.


Outline

1. Information geometry
   - Theoretical background
   - Exponential families
   - Dually flat Bregman geometry

2. Proposed system

3. Obtained results


What is information geometry?

Statistical differentiable manifold.
Under certain assumptions, a parametric statistical model $S = \{p_\xi : \xi \in \Xi\}$ of probability densities on a measurable set $\mathcal{X}$ forms a differentiable manifold.

Example: $p_\xi(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{-\frac{(x - \mu)^2}{2\sigma^2}\right\}$ for all $x \in \mathcal{X} = \mathbb{R}$, with $\xi = [\mu, \sigma^2] \in \Xi = \mathbb{R} \times \mathbb{R}_{++}$.

Fisher information metric [Rao, 1945, Chentsov, 1982].
Under certain assumptions, the Fisher information matrix defines the unique Riemannian metric $g$ on $S$.

Affine connections [Chentsov, 1982, Amari & Nagaoka, 2000].
Under certain assumptions, the α-connections $\nabla^{(\alpha)}$ for $\alpha \in \mathbb{R}$ are the unique affine connections on $S$.
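To make the Gaussian example concrete, the Fisher information matrix in the coordinates [µ, σ²] has the well-known closed form diag(1/σ², 1/(2σ⁴)). The sketch below, which is not taken from the talk, compares that closed form with a Monte Carlo estimate of the expected outer product of the score. The function names, sample size and seed are illustrative assumptions.

```python
import math
import random


def gaussian_score(x, mu, var):
    """Score: gradient of log p_xi(x) with respect to xi = [mu, var]."""
    d_mu = (x - mu) / var
    d_var = -0.5 / var + 0.5 * (x - mu) ** 2 / var ** 2
    return d_mu, d_var


def fisher_closed_form(mu, var):
    """Closed-form Fisher information matrix in the coordinates [mu, var]."""
    return [[1.0 / var, 0.0], [0.0, 0.5 / var ** 2]]


def fisher_monte_carlo(mu, var, n=100_000, seed=0):
    """Estimate E[score score^T] from n samples drawn from p_xi."""
    rng = random.Random(seed)
    sigma = math.sqrt(var)
    acc = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(n):
        s = gaussian_score(rng.gauss(mu, sigma), mu, var)
        for a in range(2):
            for b in range(2):
                acc[a][b] += s[a] * s[b]
    return [[v / n for v in row] for row in acc]


if __name__ == "__main__":
    print(fisher_closed_form(1.0, 2.0))
    print(fisher_monte_carlo(1.0, 2.0))
```

The two matrices should agree up to sampling noise. The matrix depends on σ² but not on µ, so the metric is not the Euclidean one in these coordinates.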


What are exponential families?

Exponential family [Darmois, 1935, Koopman, 1936, Pitman, 1936].

$p_\theta(x) = \exp\left(\theta^\top T(x) - F(\theta) + C(x)\right)$ for all $x \in \mathcal{X}$.

Characteristics:
- $\theta$: natural parameters in a non-empty convex open set $\Theta \subseteq \mathbb{R}^d$.
- $F(\theta)$: log-normalizer, smooth strictly convex function on $\Theta$.
- $C(x)$: carrier measure, measurable function on $\mathcal{X}$.
- $T(x)$: sufficient statistic, measurable function on $\mathcal{X}$.
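One way to read this definition is to rewrite the univariate Gaussian in exponential-family form. The sketch below assumes the standard decomposition θ = (µ/σ², −1/(2σ²)), T(x) = (x, x²), C(x) = 0 and F(θ) = −θ₁²/(4θ₂) + ½ log(−π/θ₂), and checks numerically that it reproduces the usual density. The function names are illustrative and do not come from the talk.

```python
import math


def natural_params(mu, var):
    """Natural parameters theta of a Gaussian with mean mu and variance var."""
    return (mu / var, -0.5 / var)


def sufficient_statistic(x):
    """Sufficient statistic T(x) = (x, x^2)."""
    return (x, x * x)


def log_normalizer(theta):
    """Log-normalizer F(theta), defined for theta[1] < 0."""
    t1, t2 = theta
    return -t1 * t1 / (4.0 * t2) + 0.5 * math.log(-math.pi / t2)


def carrier(x):
    """Carrier term C(x); it vanishes for this parameterization."""
    return 0.0


def density_exp_family(x, theta):
    """p_theta(x) = exp(theta^T T(x) - F(theta) + C(x))."""
    t = sufficient_statistic(x)
    inner = theta[0] * t[0] + theta[1] * t[1]
    return math.exp(inner - log_normalizer(theta) + carrier(x))


def density_gaussian(x, mu, var):
    """Usual Gaussian density, used only as a reference."""
    return math.exp(-(x - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


if __name__ == "__main__":
    theta = natural_params(1.0, 2.0)
    for x in (-1.0, 0.0, 0.5, 3.0):
        print(x, density_exp_family(x, theta), density_gaussian(x, 1.0, 2.0))
```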


Figure: A taxonomy of probability measures [Nielsen & Garcia, 2009]. Node labels recoverable from the diagram: probability measure; parametric, non-parametric; exponential families, non-exponential families; univariate, multivariate; uniparameter, bi-parameter, multi-parameter. Distributions named in the diagram: uniform, Cauchy, Lévy skew α-stable, Dirichlet, Weibull, Gaussian, Rayleigh, Bernoulli, binomial, exponential, Poisson, gamma, beta, multinomial.


What is the canonical geometry of exponential families?

- $F$ possesses a conjugate $F^\star$, which is a smooth strictly convex function defined by the Legendre-Fenchel transform $F^\star(\eta) = \sup_{\theta \in \Theta} \left\{\theta^\top \eta - F(\theta)\right\}$ for all $\eta \in H$.
- The expectation parameters $\eta$ form another coordinate system of $S$ and we have the relations $\eta = \nabla F(\theta)$ and $\theta = \nabla F^\star(\eta)$.
- Link with maximum likelihood estimation through $\eta_{\mathrm{mle}} = \frac{1}{n} \sum_{j=1}^{n} T(x_j)$.
- $(S, g, \nabla^{(1)}, \nabla^{(-1)})$ is a dually flat space in which the natural parameters $\theta$ and the expectation parameters $\eta$ are dual affine coordinate systems.
- It generalizes the self-dual Euclidean geometry, with two dual Bregman divergences $B_F$ and $B_{F^\star}$ instead of the self-dual Euclidean distance.

Bregman divergence [Bregman, 1967].

$B_\phi(\xi \,\|\, \xi') = \phi(\xi) - \phi(\xi') - (\xi - \xi')^\top \nabla\phi(\xi')$.

- Canonical divergences of dually flat spaces, bijection with exponential families [Amari & Nagaoka, 2000, Banerjee et al., 2005]: $D_{KL}(p_\xi \,\|\, p_{\xi'}) = B_F(\theta' \,\|\, \theta) = B_{F^\star}(\eta \,\|\, \eta')$.
- Generic algorithms that handle many distances [Dessein & Cont, 2011a].
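These relations can be checked numerically on a small case. The sketch below uses the Poisson family, chosen here only for illustration and not singled out in the talk, with θ = log λ, T(x) = x, F(θ) = exp(θ), η = λ and F*(η) = η log η − η. It verifies the dual coordinate maps, the equality between the Kullback-Leibler divergence and the two Bregman divergences, and the maximum likelihood estimate as an average of sufficient statistics. All function names are assumptions.

```python
import math


def F(theta):
    """Log-normalizer of the Poisson family in the natural parameter."""
    return math.exp(theta)


def grad_F(theta):
    """eta = grad F(theta); for Poisson this is the rate lambda."""
    return math.exp(theta)


def F_star(eta):
    """Legendre-Fenchel conjugate of F."""
    return eta * math.log(eta) - eta


def grad_F_star(eta):
    """theta = grad F*(eta); inverse of grad_F."""
    return math.log(eta)


def bregman(phi, grad_phi, a, b):
    """Scalar Bregman divergence B_phi(a || b)."""
    return phi(a) - phi(b) - (a - b) * grad_phi(b)


def kl_poisson(lam, lam_prime):
    """Closed-form KL divergence between Poisson(lam) and Poisson(lam_prime)."""
    return lam * math.log(lam / lam_prime) - lam + lam_prime


def eta_mle(samples):
    """Maximum likelihood estimate of eta: mean of T(x_j) = x_j."""
    return sum(samples) / len(samples)


if __name__ == "__main__":
    lam, lam_prime = 3.0, 5.0
    theta, theta_prime = math.log(lam), math.log(lam_prime)
    print(kl_poisson(lam, lam_prime))
    print(bregman(F, grad_F, theta_prime, theta))         # B_F(theta' || theta)
    print(bregman(F_star, grad_F_star, lam, lam_prime))   # B_F*(eta || eta')
    print(eta_mle([2, 4, 3]))
```

Note the swapped argument order in the natural parameters: the divergence from p_ξ to p_ξ' is B_F evaluated at (θ', θ), but B_F* evaluated at (η, η').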


Figure: Geometrical viewpoint [Nielsen & Nock, 2009].


Outline

1. Information geometry

2. Proposed system
   - General architecture
   - Sound descriptors modeling
   - Change detection

3. Obtained results


How to segment audio streams?

Architecture:
1. Represent the incoming audio stream with short-time sound descriptors $x_j$.
2. Model the features $x_j$ with probability distributions $p_{\xi_j}$ from a given statistical family.
3. Detect when a change in the parameters $\xi_j$ occurs.

Sequential change detection:
1. Accumulate the incoming observations $x_j$ in a growing window $x = (x_1, \dots, x_n)$.
2. Incrementally try to detect a change at any time $i$ of the window until a change is detected.
3. Discard the observations preceding the change-point and start again with an initial window $x = (x_{i+1}, \dots, x_n)$.

The problem thus reduces to finding one change-point in a given window $x = (x_1, \dots, x_n)$.

Figure: Segmentation at time t.

Figure: Schema of the general architecture of the system.
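The three steps of the sequential scheme can be sketched as a small loop. This is only an illustration under stated assumptions: the names `segment_stream` and `mean_shift_detector` are hypothetical, and the detector is a toy mean-difference rule standing in for the test statistic of the talk.

```python
from typing import Callable, List, Optional, Sequence

# Hypothetical interface: given a window, return the index i at which a new
# segment starts (window[i] is its first sample), or None if no change is found.
Detector = Callable[[Sequence[float]], Optional[int]]


def segment_stream(stream: Sequence[float], find_change: Detector) -> List[int]:
    """Return the absolute positions at which new segments start."""
    change_points: List[int] = []
    window: List[float] = []
    start = 0  # absolute position of window[0] in the stream
    for x in stream:
        window.append(x)             # step 1: grow the window
        i = find_change(window)      # step 2: look for one change-point
        if i is not None:
            start += i
            change_points.append(start)
            window = window[i:]      # step 3: restart after the change
    return change_points


def mean_shift_detector(window: Sequence[float],
                        threshold: float = 2.0) -> Optional[int]:
    """Toy detector (an assumption, not the GLR test): pick the split whose two
    sides have the most different sample means, if the gap exceeds threshold."""
    best_i, best_gap = None, threshold
    for i in range(1, len(window)):
        left = sum(window[:i]) / i
        right = sum(window[i:]) / (len(window) - i)
        gap = abs(right - left)
        if gap > best_gap:
            best_i, best_gap = i, gap
    return best_i


# Noise-free toy stream with changes at positions 10 and 20.
stream = [0.0] * 10 + [5.0] * 10 + [-5.0] * 10
boundaries = segment_stream(stream, mean_shift_detector)
```

Passing the detector as a callable keeps the windowing logic separate from the statistical test, so any single change-point test can be plugged in.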


How to model sounds?

Computation of a sound descriptor $x_j$:
- Fourier or constant-Q transforms for information on the spectral content.
- Mel-frequency cepstral coefficients for information on the timbre.
- Many other possibilities.

Modeling with a probability distribution $p_{\xi_j}$ from a statistical family:
- Categorical distributions.
- Multivariate Gaussian distributions.
- Many other possibilities.

Figure: Sound descriptors modeling.
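As a rough illustration of the two modeling options listed above, the following sketch normalizes a magnitude spectrum frame into a categorical distribution and computes maximum likelihood Gaussian parameters. The function names, the `eps` floor, and the example values are assumptions made here; the Gaussian case is shown for scalar features only to keep the example short.

```python
from typing import List, Sequence, Tuple


def to_categorical(magnitudes: Sequence[float], eps: float = 1e-12) -> List[float]:
    """Normalize a non-negative spectrum frame so that its bins sum to one.

    The small floor eps (an assumption) avoids zero-probability bins.
    """
    floored = [max(m, 0.0) + eps for m in magnitudes]
    total = sum(floored)
    return [m / total for m in floored]


def gaussian_mle(values: Sequence[float]) -> Tuple[float, float]:
    """Maximum likelihood mean and (biased) variance of scalar features."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return mean, var


frame = [0.0, 2.0, 6.0, 2.0]      # made-up magnitude spectrum
probs = to_categorical(frame)      # parameters of a categorical model
mean, var = gaussian_mle([1.0, 2.0, 3.0])
```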


How to detect a change?

Problem: Detect one change-point in a given window $x = (x_1, \dots, x_n)$.

Assumptions: The samples $x_j$ are drawn independently from a given statistical model $S = \{p_\xi : \xi \in \Xi\}$ and are identically distributed before (resp., after) change.

Usual approach [Basseville & Nikiforov, 1993]:
- Assume that $\xi_0$ and $\xi_1$ before and after change are known:
  - $H_0$: $x_1, \dots, x_n \sim p_{\xi_0}$.
  - $H_1^i$: $x_1, \dots, x_i \sim p_{\xi_0}$, and $x_{i+1}, \dots, x_n \sim p_{\xi_1}$.
- CuSum test can be employed by thresholding the likelihood ratio:

$$
\frac{1}{2}\mathrm{LR}^i
= \log \frac{p(x \mid H_1^i)}{p(x \mid H_0)}
= \log \frac{\prod_{j=1}^{i} p_{\xi_0}(x_j) \prod_{j=i+1}^{n} p_{\xi_1}(x_j)}{\prod_{j=1}^{i} p_{\xi_0}(x_j) \prod_{j=i+1}^{n} p_{\xi_0}(x_j)}
= \sum_{j=i+1}^{n} \log \frac{p_{\xi_1}(x_j)}{p_{\xi_0}(x_j)}.
$$

- $\xi_1$ unknown: CuSum test can still be employed by computing generalized likelihood ratio statistics where we replace $\xi_1$ with $\xi_1^i$.
- $\xi_0$ unknown: the test cannot be written in its simple form anymore.
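For the known-parameter case, the thresholded sum of log-likelihood ratios can be computed recursively. The sketch below assumes two normal densities with known means and a shared standard deviation; the function names, the threshold value, and the toy data are illustrative assumptions, not taken from the talk.

```python
from typing import Optional, Sequence


def gaussian_llr(x: float, mu0: float, mu1: float, sigma: float) -> float:
    """log p_{xi1}(x) / p_{xi0}(x) for two normal densities sharing sigma."""
    return (mu1 - mu0) * (x - 0.5 * (mu0 + mu1)) / sigma ** 2


def cusum_alarm(xs: Sequence[float], mu0: float, mu1: float,
                sigma: float, threshold: float) -> Optional[int]:
    """Return the index of the first sample raising the alarm, or None."""
    g = 0.0
    for n, x in enumerate(xs):
        # Recursive form of the maximum over i of the sum of log-likelihood
        # ratios for j > i (an empty sum counts as zero).
        g = max(0.0, g + gaussian_llr(x, mu0, mu1, sigma))
        if g > threshold:
            return n
    return None


# Noise-free toy data: the mean jumps from 0 to 1 at index 20.
xs = [0.0] * 20 + [1.0] * 20
alarm = cusum_alarm(xs, mu0=0.0, mu1=1.0, sigma=1.0, threshold=3.0)
```

With these values each post-change sample adds 0.5 to the statistic, so the alarm is raised a few samples after the change; the threshold trades detection delay against false alarms.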


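Assumptions: The samples $x_j$ are drawn independently from a given exponential model $S = \{p_\theta : \theta \in \Theta\}$ and are identically distributed before (resp., after) change.

Proposed approach for exponential families [Dessein & Cont, 2011b]:
- $\theta_0$ and $\theta_1$ unknown:
  - $H_0$: $x_1, \dots, x_n \sim p_{\theta_0}$.
  - $H_1^i$: $x_1, \dots, x_i \sim p_{\theta_0^i}$, and $x_{i+1}, \dots, x_n \sim p_{\theta_1^i}$.
- Generalized likelihood ratio now becomes:

$$
\begin{aligned}
\frac{1}{2}\mathrm{GLR}^i
&= \log \frac{\prod_{j=1}^{i} p_{\theta_0^i}(x_j) \prod_{j=i+1}^{n} p_{\theta_1^i}(x_j)}{\prod_{j=1}^{i} p_{\theta_0}(x_j) \prod_{j=i+1}^{n} p_{\theta_0}(x_j)} \\
&= \sum_{j=1}^{i} \log \frac{p_{\theta_0^i}(x_j)}{p_{\theta_0}(x_j)} + \sum_{j=i+1}^{n} \log \frac{p_{\theta_1^i}(x_j)}{p_{\theta_0}(x_j)} \\
&= \sum_{j=1}^{i} \left( (\theta_0^i - \theta_0)^\top T(x_j) - F(\theta_0^i) + F(\theta_0) \right)
 + \sum_{j=i+1}^{n} \left( (\theta_1^i - \theta_0)^\top T(x_j) - F(\theta_1^i) + F(\theta_0) \right) \\
&= i \left( F(\theta_0) - F(\theta_0^i) + (\theta_0^i - \theta_0)^\top \nabla F(\theta_{0\,\mathrm{mle}}^i) \right)
 + (n - i) \left( F(\theta_0) - F(\theta_1^i) + (\theta_1^i - \theta_0)^\top \nabla F(\theta_{1\,\mathrm{mle}}^i) \right).
\end{aligned}
$$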
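The last equality uses a standard property of exponential families. As a sketch, assume the densities are written $p_\theta(x) = \exp(\theta^\top T(x) - F(\theta) + C(x))$, where the carrier term $C$ is notation introduced here and cancels in the ratios. The maximum likelihood estimates on each side of the split then match the gradient of $F$ to the average sufficient statistics:

```latex
\nabla F(\theta_{0\,\mathrm{mle}}^i) = \frac{1}{i} \sum_{j=1}^{i} T(x_j),
\qquad
\nabla F(\theta_{1\,\mathrm{mle}}^i) = \frac{1}{n-i} \sum_{j=i+1}^{n} T(x_j),

\text{so that}\quad
\sum_{j=1}^{i} (\theta_0^i - \theta_0)^\top T(x_j)
  = i\,(\theta_0^i - \theta_0)^\top \nabla F(\theta_{0\,\mathrm{mle}}^i),
\qquad
\sum_{j=i+1}^{n} (\theta_1^i - \theta_0)^\top T(x_j)
  = (n-i)\,(\theta_1^i - \theta_0)^\top \nabla F(\theta_{1\,\mathrm{mle}}^i).
```

The terms $F(\theta_0)$, $F(\theta_0^i)$ and $F(\theta_1^i)$ do not depend on $j$, so summing them simply multiplies them by $i$ or $n - i$.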


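Test statistics:

$$
\frac{1}{2}\mathrm{GLR}^i
= i \left( D_{KL}\!\left( p_{\theta_{0\,\mathrm{mle}}^i} \,\middle\|\, p_{\theta_0} \right) - D_{KL}\!\left( p_{\theta_{0\,\mathrm{mle}}^i} \,\middle\|\, p_{\theta_0^i} \right) \right)
+ (n - i) \left( D_{KL}\!\left( p_{\theta_{1\,\mathrm{mle}}^i} \,\middle\|\, p_{\theta_0} \right) - D_{KL}\!\left( p_{\theta_{1\,\mathrm{mle}}^i} \,\middle\|\, p_{\theta_1^i} \right) \right).
$$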


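- Information-geometric interpretation.
- Encompasses statistical and distance-based methods.
- Computationally efficient updates when considering the maximum likelihood estimates (MLEs):

$$
\frac{1}{2}\mathrm{GLR}^i = i\,F^*(\eta_{0\,\mathrm{mle}}^i) + (n - i)\,F^*(\eta_{1\,\mathrm{mle}}^i) - n\,F^*(\eta_{0\,\mathrm{mle}}).
$$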
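To make the closed form concrete, here is a minimal sketch for the unit-variance univariate normal family, where $T(x) = x$, $F(\theta) = \theta^2/2$, the expectation parameter $\eta$ is the mean, and $F^*(\eta) = \eta^2/2$. The function names are hypothetical, and the use of prefix sums to evaluate every split cheaply is an assumption about how efficient updates could be organized, not the implementation from the talk.

```python
from itertools import accumulate
from typing import List, Sequence, Tuple


def conjugate(eta: float) -> float:
    """F*(eta) for the unit-variance normal family, where F(theta) = theta^2 / 2."""
    return 0.5 * eta * eta


def half_glr(xs: Sequence[float]) -> List[float]:
    """Half GLR statistic for every split i = 1, ..., n - 1 of the window."""
    n = len(xs)
    prefix = list(accumulate(xs))     # prefix[i - 1] = x_1 + ... + x_i
    total = prefix[-1]
    stats = []
    for i in range(1, n):
        eta0 = prefix[i - 1] / i                   # MLE before the split
        eta1 = (total - prefix[i - 1]) / (n - i)   # MLE after the split
        stats.append(i * conjugate(eta0) + (n - i) * conjugate(eta1)
                     - n * conjugate(total / n))
    return stats


def best_split(xs: Sequence[float]) -> Tuple[int, float]:
    """Return the split i maximizing the statistic, together with its value."""
    stats = half_glr(xs)
    k = max(range(len(stats)), key=stats.__getitem__)
    return k + 1, stats[k]


# Noise-free toy window: six samples at 0 followed by four samples at 2.
window = [0.0] * 6 + [2.0] * 4
i_hat, value = best_split(window)
```

A change would then be declared when the maximum exceeds a chosen threshold; the threshold itself is left out of this sketch.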


Outline

1. Information geometry
2. Proposed system
3. Obtained results
   - Synthetic data
   - Real-world data
   - Audio data


Fixed-variance univariate normal densities

Figure: Change detection in fixed-variance univariate normal data. Panels: computation time, maximum generalized likelihood ratio, series, change detection, original parameters, estimated parameters, and generalized likelihood ratio.


Univariate exponential densities

Figure: Change detection in univariate exponential data. Panels: computation time, maximum generalized likelihood ratio, series, change detection, original parameters, estimated parameters, and generalized likelihood ratio.


Multivariate normal densities

Figure: Change detection in multivariate normal data. Panels: computation time, maximum generalized likelihood ratio, series, change detection, original parameters, estimated parameters, and generalized likelihood ratio.


Well-log

Figure: Segmentation of well-log data. Panels: computation time, maximum generalized likelihood ratio, time series, segmentation, and estimated parameters.


Daily log-return of the Dow Jones

Figure: Segmentation of the daily log-return of the Dow Jones. Panels: computation time, maximum generalized likelihood ratio, time series, segmentation, and estimated parameters.


Speech

Figure: Speaker change detection in a speech fragment. Panels: original audio (time in s) and generalized likelihood ratio (frame number on both axes).


Polyphonic music

Figure: Note change detection in a polyphonic music excerpt. Panels: original audio and piano roll (pitch against time in s).


What we (don’t) have

Summary and perspectives:
- Representations:
  - Many possibilities.
  - Combinations of descriptors.
  - Feature selection.
- Descriptors modeling:
  - Exponential families and Bregman divergences, mixture models.
  - Model selection.
  - Other geometries, divergences, test statistics.
- Temporality of events:
  - Assumption of quasi-stationarity.
  - Non-stationarity modeling.
  - Conditional distributions, linear/non-linear systems.
- Applications:
  - Evaluation on large datasets in audio and other domains.
  - Onset detection, music segmentation, speaker segmentation, etc.
  - First stage in real-time systems for polyphonic music transcription, music similarity analysis, computer-assisted improvisation.

Thank you for your attention! Questions?

This work was supported by a doctoral fellowship from the UPMC (EDITE) and by a grant from the JST-CNRS ICT (Improving the VR Experience).


Bibliography

Amari, S.-i. & Nagaoka, H. (2000). Methods of information geometry, volume 191 of Translations of Mathematical Monographs. American Mathematical Society.

Banerjee, A., Merugu, S., Dhillon, I. S., & Ghosh, J. (2005). Clustering with Bregman divergences. Journal of Machine Learning Research, 6, 1705–1749.

Basseville, M. & Nikiforov, V. (1993). Detection of abrupt changes: Theory and application. Englewood Cliffs, NJ, USA: Prentice-Hall, Inc.

Bello, J. P., Daudet, L., Abdallah, S., Duxbury, C., Davies, M., & Sandler, M. B. (2005). A tutorial on onset detection in music signals. IEEE Transactions on Speech and Audio Processing, 13(5), 1035–1047.

Bregman, L. M. (1967). The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming. USSR Computational Mathematics and Mathematical Physics, 7(3), 200–217.

Chentsov, N. N. (1982). Statistical decision rules and optimal inference, volume 53 of Translations of Mathematical Monographs. American Mathematical Society.

Cont, A., Dubnov, S., & Assayag, G. (2011). On the information geometry of audio streams with applications to similarity computing. IEEE Transactions on Audio, Speech and Language Processing, 19(4), 837–846.


Darmois, G. (1935). Sur les lois de probabilités à estimation exhaustive. Comptes Rendus des Séances de l’Académie des Sciences, 200, 1265–1266.

Delacourt, P. & Wellekens, C. J. (2000). DISTBIC: A speaker-based segmentation for audio data indexing. Speech Communication, 32(1–2), 111–126.

Dessein, A. & Cont, A. (2011a). Applications of information geometry to audio signal processing. In 14th International Conference on Digital Audio Effects (DAFx), Paris, France.

Dessein, A. & Cont, A. (2011b). Information-geometric approach to real-time audio change detection. Submitted.

Foote, J. (2000). Automatic audio segmentation using a measure of audio novelty. In Proceedings of the IEEE International Conference on Multimedia and Expo (ICME), volume 1 (pp. 452–455). New York City, NY, USA.

Grasic, M., Kos, M., & Kacic, Z. (2010). Online speaker segmentation and clustering using cross-likelihood ratio calculation with reference criterion selection. IET Signal Processing, 4(6), 673–685.

Kemp, T., Schmidt, M., Westphal, M., & Waibel, A. (2000). Strategies for automatic segmentation of audio data. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), volume 3 (pp. 1423–1426). Istanbul, Turkey.


Koopman, B. O. (1936). On distributions admitting a sufficient statistic. Transactions of the American Mathematical Society, 39(3), 399–409.

Kotti, M., Benetos, E., & Kotropoulos, C. (2008). Computationally efficient and robust BIC-based speaker segmentation. IEEE Transactions on Audio, Speech, and Language Processing, 16(5), 920–933.

Nielsen, F. & Garcia, V. (2009). Statistical exponential families: A digest with flash cards.

Nielsen, F. & Nock, R. (2009). Sided and symmetrized Bregman centroids. IEEE Transactions on Information Theory, 55(6), 2882–2904.

Omar, M. K. & Chaudhari, U. (2005). Blind change detection for audio segmentation. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 501–504). Philadelphia, PA, USA.

Pitman, E. J. G. (1936). Sufficient statistics and intrinsic accuracy. Mathematical Proceedings of the Cambridge Philosophical Society, 32(4), 567–579.

Rao, C. R. (1945). Information and accuracy attainable in the estimation of statistical parameters. Bulletin of the Calcutta Mathematical Society, 37, 81–91.


Siegler, M. A., Jain, U., Raj, B., & Stern, R. M. (1997). Automatic segmentation, classification and clustering of broadcast news audio. In Proceedings of the DARPA Speech Recognition Workshop (pp. 97–99). Chantilly, VA, USA.

Sundaram, H. & Chang, S.-F. (2000). Audio scene segmentation using multiple features, models and time scales. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), volume 4 (pp. 2441–2444). Istanbul, Turkey.

Tritschler, A. & Gopinath, R. A. (1999). Improved speaker segmentation and segments clustering using the Bayesian information criterion. In Proceedings of the 6th European Conference on Speech Communication and Technology (Eurospeech), volume 2 (pp. 679–682). Budapest, Hungary.
