


Submitted to International Journal of Computer Vision manuscript No. (will be inserted by the editor)

    Rigid-Motion Scattering for Texture Classification

    Laurent Sifre · Stéphane Mallat

    Received: date / Accepted: date

Abstract A rigid-motion scattering computes adaptive invariants along translations and rotations, with a deep convolutional network. Convolutions are calculated on the rigid-motion group, with wavelets defined on the translation and rotation variables. It preserves joint rotation and translation information, while providing global invariants at any desired scale. Texture classification is studied, through the characterization of stationary processes from a single realization. State-of-the-art results are obtained on multiple texture databases, with important rotation and scaling variabilities.

Keywords Deep network · scattering · wavelet · rigid-motion · texture · classification

Work supported by ANR 10-BLAN-0126 and Advanced ERC InvariantClass 320959.

L. Sifre, CMAP, École Polytechnique, Route de Saclay, 91128 Palaiseau, France. E-mail: [email protected]

S. Mallat, Département d'informatique, École normale supérieure, 45 rue d'Ulm, F-75230 Paris Cedex 05, France.

    1 Introduction

Image classification requires finding representations which reduce non-informative intra-class variability, and hence which are partly invariant, while preserving discriminative information across classes. Deep neural networks build hierarchical invariant representations by applying a succession of linear and non-linear operators which are learned from training data. They provide state-of-the-art results for complex image classification tasks (Hinton & Salakhutdinov, 2006; LeCun et al., 2010; Sermanet et al., 2013; Krizhevsky et al., 2012; Dean et al., 2012). A major issue is to understand the properties of these networks: what needs to be learned, and what is generic and common to most image classification problems. Translations, rotations and scaling are common sources of variability for most images, because of changes of view points and perspective projections of three-dimensional surfaces. Building adaptive invariants to such transformations is usually considered a first necessary step for classification (Poggio et al., 2012). We concentrate on this generic part, which is adapted to the physical properties of the imaging environment, as opposed to the specific content of images, which needs to be learned.



This paper defines deep convolutional scattering networks which provide invariants to translations and rotations, and hence to rigid motions in R². The level of invariance is adapted to the classification task. Scattering transforms have been introduced to build translation invariant representations which are stable to deformations (Mallat, 2012), with applications to image classification (Bruna & Mallat, 2013). They are implemented as a convolutional network, with successive spatial wavelet convolutions at each layer. The translation group is a simple commutative group, parameterized by the locations of the input pixels. The rigid-motion group is non-commutative, and its parameters are not explicitly given by the input image, which raises new issues. The first one is to understand how to represent the joint information between translations and rotations. We shall explain why separating both variables leads to an important loss of information and yields representations which are not sufficiently discriminative. This leads to the construction of a scattering transform on the full rigid-motion group, with rigid-motion convolutions on the joint rotation and translation variables. Rotation variables are explicitly introduced in the second network layer, where convolutions are performed on the rigid-motion group along the joint translation and rotation variables. As opposed to translation scattering, where linear transforms are applied along spatial variables only, rigid-motion scattering recombines the new variables created at the second network layer, as is usually done in deep neural networks. However, a rigid-motion scattering involves no learning, since convolutions are computed with predefined wavelets along the spatial and rotation variables. Its stability is guaranteed by contraction properties, which are explained.

We study applications of rigid-motion scattering to texture classification, where translations, rotations and scaling are major sources of variability. Image textures can be modeled as stationary processes, which are typically non-Gaussian and non-Markovian, with long range dependencies. Texture recognition is a fundamental problem of visual perception, with applications to medical and satellite

imaging, material recognition (Lazebnik et al., 2005; Xu et al., 2010; Liu et al., 2011), and object or scene recognition (Renninger & Malik, 2004). Recognition is performed from a single image, and hence cannot involve high order moments, because their estimators have a variance which is too large. Finding a low-variance ergodic representation which can discriminate these non-Gaussian stationary processes is a fundamental probabilistic and statistical issue.

Translation invariant scattering representations of stationary processes have been studied to discriminate textures which do not involve important rotation or scaling variability (Bruna & Mallat, 2013; Bruna et al., 2013). These results are extended here to joint translation and rotation invariance. Invariance to scaling variability is incorporated through linear projectors. This provides effective invariants, which yield state-of-the-art classification results on a large range of texture databases.

Section 2 reviews the construction of translation invariant scattering transforms. Section 2.4 explains why invariants to rigid motions cannot be computed by separating the translation and rotation variables without losing important information. Joint translation and rotation operators define the rigid-motion group, also called the special Euclidean group. Rigid-motion scattering transforms are studied in Section 3. Convolutions on the rigid-motion group are introduced in Section 3.1 in order to define wavelet transforms over this group. Their properties are described in Section 3.2. A rigid-motion scattering iteratively computes the modulus of such wavelet transforms. The wavelet transforms jointly process translations and rotations, but can be computed with separable convolutions along the spatial and rotation variables. A fast filter bank implementation is described in Section 4, with a cascade of spatial convolutions and downsampling. Invariant scattering representations are applied to image texture classification in Section 5, with state-of-the-art results on four texture datasets containing different types and ranges of variability (KTH TIPS, 2004; UIUC Tex, 2005; UMD, 2009; FMD, 2009). All numerical


experiments are reproducible with the ScatNet (ScatNet, 2013) MATLAB toolbox.

2 Invariance to Translations, Rotations and Deformations

Section 2.1 reviews the properties of translation invariant representations and their stability relatively to deformations. The use of wavelet transforms is justified by their stability to deformations. Their properties are summarized in Section 2.2. Section 2.3 describes translation scattering transforms, implemented with a deep convolutional network. Separable extensions to translation and rotation invariance are discussed in Section 2.4. It is shown that this simple strategy leads to an important loss of information.

2.1 Translation Invariance and Deformation Stability

Building invariants to translations and small deformations is a prototypical representation issue for classification, which carries the major ingredients that make this problem difficult. Translation invariance is simple to compute, and there are many possible strategies, which we briefly review. The main difficulty is to build a representation Φ(x) which is also stable to deformations.

A representation Φ(x) is said to be translation invariant if x_v(u) = x(u − v) has the same representation:

∀v ∈ R² , Φ(x) = Φ(x_v) .

Besides translation invariance, it is often necessary to build invariants to other specific classes of deformations, through linear projectors. Invariants to translation can be computed with a registration Φx(u) = x(u − a(x)), where a(x) is an anchor point which is translated when x is translated: if x_v(u) = x(u − v) then a(x_v) = a(x) + v. For example, a(x) = argmax_u |x ⋆ h(u)| for some filter h(u). Such invariants are simple and preserve as much information as possible. The Fourier transform modulus |x̂(ω)| is also invariant to translation.
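To make these two classical invariants concrete, here is a minimal numpy sketch in one dimension; the smoothing filter h and the signal are made up for illustration and are not taken from the paper.

```python
import numpy as np

def registration_invariant(x, h):
    # Anchor point a(x) = argmax_u |x * h(u)|, computed with a circular
    # convolution; the signal is then shifted so that the anchor moves to 0.
    conv = np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(h)))
    a = np.argmax(np.abs(conv))
    return np.roll(x, -a)

def fourier_modulus(x):
    # |x^(omega)| is invariant to circular translations of x
    return np.abs(np.fft.fft(x))

rng = np.random.default_rng(0)
x = rng.standard_normal(64)
h = np.exp(-np.arange(64.0) ** 2 / 8.0)   # an arbitrary smoothing filter
xv = np.roll(x, 17)                       # translated copy of x

print(np.allclose(registration_invariant(x, h), registration_invariant(xv, h)))  # True
print(np.allclose(fourier_modulus(x), fourier_modulus(xv)))                      # True
```

Both outputs are unchanged by the translation; the instability discussed next appears only once deformations replace pure translations.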

Invariance to translations is often not enough. Suppose that x is not just translated but also deformed, giving x_τ(u) = x(u − τ(u)) with |∇τ(u)| < 1. Deformations belong to the infinite dimensional group of diffeomorphisms. Computing invariants to all deformations would mean losing too much information. In a digit classification problem, a deformation invariant representation would confuse a 1 with a 7. We thus do not want to be invariant to arbitrary deformations, but only to the specific deformations within each digit class, while preserving the information needed to discriminate different classes. Such deformation invariants need to be learned as optimized linear combinations.

Constructing such linear invariants requires the representation to be stable to deformations. A representation Φ(x) is stable to deformations if ‖Φ(x) − Φ(x_τ)‖ is small when the deformation is small. The deformation size is measured by ‖∇τ‖_∞ = sup_u |∇τ(u)|. If this quantity vanishes then τ is a "pure" translation without deformation. Stability is formally defined as Lipschitz continuity relatively to this metric. It means that there exists C > 0 such that for all x(u) and τ(u) with ‖∇τ‖_∞ < 1,

‖Φ(x) − Φ(x_τ)‖ ≤ C ‖∇τ‖_∞ ‖x‖ . (1)

This Lipschitz continuity property implies that deformations are locally linearized by the representation Φ. Indeed, Lipschitz continuous operators are almost everywhere differentiable in the sense of Gâteaux. It results that Φ(x) − Φ(x_τ) can be approximated by a linear operator of ∇τ if ‖∇τ‖_∞ is small. A family of small deformations thus generates a linear space span_τ(Φ(x_τ)). In the transformed space, an invariant to these deformations can then be computed with a linear projection on the orthogonal complement span_τ(Φ(x_τ))^⊥.

Registration invariants are not stable to deformations. If x(u) = 1_{[0,1]²}(u) + 1_{[α,α+1]²}(u), then for τ(u) = εu one can verify that ‖x − x_τ‖ ≥ 1 if |α| > ε^{−1}. It results that (1) is not valid. One can similarly prove that the Fourier transform modulus Φ(x) = |x̂|


is not stable to deformations, because high frequencies move too much under deformation, as can be seen in Figure 1.

Fig. 1 Two images of the same texture (left) from the UIUC Tex dataset (UIUC Tex, 2005) and the log of the modulus of their Fourier transforms (right). The periodic patterns of the texture correspond to fine-grained dots in the Fourier plane. When the texture is deformed, the dots spread over the Fourier plane, which illustrates the fact that the modulus of the Fourier transform is unstable to elastic deformations.

Translation invariance often needs to be computed locally. Translation invariant descriptors which are stable to deformations can be obtained by averaging. If translation invariance is only needed within a limited range smaller than 2^J, then it is sufficient to average x with a smooth window φ_J(u) = 2^{−2J} φ(2^{−J} u) of width 2^J:

x ⋆ φ_J(u) = ∫ x(v) φ_J(u − v) dv . (2)

It is proved in (Mallat, 2012) that if ‖∇φ‖₁ < +∞ and ‖ |u| ∇φ(u) ‖₁ < +∞ and ‖∇τ‖_∞ ≤ 1 − ε with ε > 0, then there exists C such that

‖x_τ ⋆ φ_J − x ⋆ φ_J‖ ≤ C ‖x‖ ( 2^{−J} ‖τ‖_∞ + ‖∇τ‖_∞ ) . (3)

Averaging operators lose all high frequencies, and hence eliminate most of the signal information. These high frequencies can be recovered with a wavelet transform.

    2.2 Wavelet Transform Invariants

Contrary to sinusoidal waves, wavelets are localized functions which are stable to deformations. They are thus well adapted to constructing translation invariants which are stable to deformations. We briefly review wavelet transforms and their applications in computer vision. Wavelet transforms have been used to analyze stationary processes and image textures. They provide a set of coefficients closely related to the power spectrum.

A directional wavelet transform extracts the signal's high frequencies within different frequency bands and orientations. Two-dimensional directional wavelets are obtained by scaling and rotating a single band-pass filter ψ. Multiscale directional wavelet filters are defined for any j ∈ Z and rotation r_θ of angle θ ∈ [0, 2π] by

ψ_{θ,j}(u) = 2^{−2j} ψ(2^{−j} r_{−θ} u) . (4)

If the Fourier transform ψ̂(ω) is centered at a frequency η, then ψ̂_{θ,j}(ω) = ψ̂(2^j r_{−θ} ω) has a support centered at 2^{−j} r_θ η, with a bandwidth proportional to 2^{−j}. We consider a group G of rotations r_θ which is either a finite subgroup of SO(2) or which


is equal to SO(2). A finite rotation group is indexed by Θ = {2kπ/K : 0 ≤ k < K}, and if G = SO(2) then Θ = [0, 2π). The wavelet transform at a scale 2^J is defined by

Wx = { x ⋆ φ_J(u) , x ⋆ ψ_{θ,j}(u) }_{u∈R², θ∈Θ, j<J} . (5)

The wavelets used in this paper are Morlet wavelets,

ψ(u) = α (e^{i u·ξ} − β) e^{−|u|²/(2σ²)} , (6)

where β > 0 is adjusted so that ∫ψ = 0. Morlet wavelets for π ≤ θ < 2π are not computed, since they verify ψ_{θ+π,j} = ψ*_{θ,j}, where z* denotes the complex conjugate of z. The averaging function is chosen to be a Gaussian window

φ(u) = (2πσ²)^{−1} exp(−|u|²/(2σ²)) . (7)

Fig. 2 The Gaussian window φ_J (left) and oriented and dilated Morlet wavelets ψ_{θ,j} (right). Saturation corresponds to amplitude while color corresponds to the complex phase.

Figure 2 shows such a window and Morlet wavelets. To simplify notations, we shall write ∑_{θ∈Θ} h(θ) for a summation over Θ even when Θ = [0, 2π), in which case this discrete sum represents the integral ∫₀^{2π} h(θ) dθ. We consider wavelets which satisfy the following Littlewood-Paley condition, for some ε > 0 and almost all ω ∈ R²:

1 − ε ≤ |φ̂(ω)|² + ∑_{j≤0} ∑_{θ∈Θ} |ψ̂_{θ,j}(ω)|² ≤ 1 . (8)
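As an illustration, the following numpy sketch builds such a Morlet filter bank and evaluates the Littlewood-Paley sum of (8) numerically. All parameters (σ, ξ, the grid size) are illustrative choices rather than the paper's, and the filters are not normalized, so the computed bounds need not be tight.

```python
import numpy as np

def gauss2d(N, scale, sigma=0.8):
    # Gaussian window phi_J(u) = 2^{-2J} phi(2^{-J} u), eq. (7)
    u = np.fft.fftfreq(N) * N                     # integer grid centered at 0
    ux, uy = np.meshgrid(u, u, indexing='ij')
    r2 = (ux ** 2 + uy ** 2) * 2.0 ** (-2 * scale)
    return 2.0 ** (-2 * scale) * np.exp(-r2 / (2 * sigma ** 2)) / (2 * np.pi * sigma ** 2)

def morlet2d(N, j, theta, sigma=0.8, xi=3 * np.pi / 4):
    # psi_{theta,j}(u) = 2^{-2j} psi(2^{-j} r_{-theta} u), eqs. (4) and (6)
    u = np.fft.fftfreq(N) * N
    ux, uy = np.meshgrid(u, u, indexing='ij')
    rx = 2.0 ** (-j) * (np.cos(theta) * ux + np.sin(theta) * uy)
    ry = 2.0 ** (-j) * (-np.sin(theta) * ux + np.cos(theta) * uy)
    env = np.exp(-(rx ** 2 + ry ** 2) / (2 * sigma ** 2))
    osc = np.exp(1j * xi * rx)
    beta = (osc * env).sum() / env.sum()          # beta adjusted so that sum(psi) = 0
    return 2.0 ** (-2 * j) * (osc - beta) * env

# Littlewood-Paley sum |phi^|^2 + sum_{j,theta} |psi^_{theta,j}|^2 as in (8)
N, J, K = 128, 3, 8
lp = np.abs(np.fft.fft2(gauss2d(N, J))) ** 2
for j in range(J):
    for k in range(K):        # angles in [0, pi); conjugate wavelets cover the rest
        lp += np.abs(np.fft.fft2(morlet2d(N, j, np.pi * k / K))) ** 2
print(lp.min(), lp.max())     # empirical frame bounds for these unnormalized filters
```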


Fig. 3 Two images of the same texture (left) and their convolutions with the same Morlet wavelet (right). Even though the texture is highly deformed, the wavelet responds to roughly the same oriented pattern in both images, which illustrates its stability to deformations.

A wavelet transform is covariant to translations: when x is translated, each x ⋆ ψ_{θ,j}(u) is translated. Removing the complex phase, as in a Fourier transform, defines a positive envelope |x ⋆ ψ_{θ,j}(u)| which is still covariant to translations, not invariant. Averaging this positive envelope defines locally translation invariant coefficients which depend upon (u, θ, j):

S₁x(u, θ, j) = |x ⋆ ψ_{θ,j}| ⋆ φ_J(u) . (9)

Such averaged wavelet coefficients are used under various forms in computer vision. Global histograms of quantized filter responses have been used for texture recognition in Leung & Malik (2001). SIFT (Lowe, 2004) and DAISY (Tola et al., 2010) descriptors compute local histograms of orientations. This is similar to the definition of S₁x, but with different wavelets and non-linearities. Due to their stability properties, SIFT-like descriptors have been used extensively for a wide range of applications where stability to deformations is important, such as key point matching between pairs of images from different view points, and generic object recognition.
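A short numpy sketch of S₁x, reusing the gauss2d and morlet2d helpers defined in the previous sketch; convolutions are circular and computed in the Fourier domain, and all parameters remain illustrative.

```python
import numpy as np
# reuses gauss2d and morlet2d from the previous sketch

def conv2_fft(x, f):
    # circular 2-D convolution computed in the Fourier domain
    return np.fft.ifft2(np.fft.fft2(x) * np.fft.fft2(f))

def first_order_scattering(x, J=3, K=8):
    # S1 x(u, theta, j) = | x * psi_{theta,j} | * phi_J (u), eq. (9)
    N = x.shape[0]
    phi = gauss2d(N, J)
    s1 = np.empty((K, J, N, N))
    for j in range(J):
        for k in range(K):
            env = np.abs(conv2_fft(x, morlet2d(N, j, np.pi * k / K)))  # |x * psi|
            s1[k, j] = np.real(conv2_fft(env, phi))                    # averaging by phi_J
    return s1

x = np.random.default_rng(0).standard_normal((128, 128))
print(first_order_scattering(x).shape)   # (8, 3, 128, 128)
```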

2.3 Translation Invariant Scattering

The convolution by φ_J provides local translation invariance but also loses the spatial variability of the wavelet transform. A scattering transform successively recovers the information lost by the averaging which computes the invariants. It consists in a cascade of wavelet modulus transforms, which can be interpreted as a deep neural network.

A scattering transform is computed by iterating on wavelet transforms and modulus operators. To simplify notations, we shall write λ = (θ, j) and Λ = {(θ, j) : θ ∈ [0, 2π], j < J}. The wavelet transform and modulus operations are combined in a single wavelet modulus operator defined by

|W|x = { x ⋆ φ_J , |x ⋆ ψ_λ| }_{λ∈Λ} . (10)

This operator averages coefficients with φ_J to produce invariants to translations, and computes higher frequency wavelet transform envelopes which carry the lost information. A scattering transform can be interpreted as a neural network, illustrated in Figure 4, which propagates a signal x across multiple layers


Fig. 4 Translation scattering can be seen as a neural network which iterates over wavelet modulus operators |W|. Each layer m outputs averaged invariants S_m x and covariant coefficients U_{m+1} x.

of the network, and which outputs at each layer m the scattering invariant coefficients S_m x.

The input of the network is the original signal U₀x = x. The scattering transform is then defined by induction. For any m ≥ 0, applying the wavelet modulus operator |W| to U_m x outputs the scattering coefficients S_m x and computes the next layer of coefficients U_{m+1} x:

|W| U_m x = (S_m x , U_{m+1} x) , (11)

with

S_m x(u, λ₁, ..., λ_m) = U_m x(·, λ₁, ..., λ_m) ⋆ φ_J(u) = || |x ⋆ ψ_{λ₁}| ⋆ ... | ⋆ ψ_{λ_m}| ⋆ φ_J(u)

and

U_{m+1} x(u, λ₁, ..., λ_m, λ_{m+1}) = |U_m x(·, λ₁, ..., λ_m) ⋆ ψ_{λ_{m+1}}(u)| = || |x ⋆ ψ_{λ₁}| ⋆ ... | ⋆ ψ_{λ_m}| ⋆ ψ_{λ_{m+1}}(u)| .

This scattering transform is illustrated in Figure 4. The final scattering vector concatenates all scattering coefficients for 0 ≤ m ≤ M:

Sx = (S_m x)_{0≤m≤M} . (12)

A scattering transform is a non-expansive operator, which is stable to deformations. Let ‖Sx‖² = ∑_m ‖S_m x‖²; one can prove that

‖Sx − Sy‖ ≤ ‖x − y‖ . (13)

Because wavelets are localized and separate scales, one can also prove that if x has a compact support then there exists C > 0 such that

‖Sx_τ − Sx‖ ≤ C ‖x‖ ( 2^{−J} ‖τ‖_∞ + ‖∇τ‖_∞ + ‖Hτ‖_∞ ) , (14)

where Hτ denotes the Hessian of τ.

Most of the energy of scattering coefficients is concentrated in the first two layers m = 1, 2. Applications thus typically concentrate on these two layers. Among the second layer scattering coefficients

S₂x(u, λ₁, λ₂) = ||x ⋆ ψ_{λ₁}| ⋆ ψ_{λ₂}| ⋆ φ_J(u) ,

the coefficients with λ₂ = 2^{j₂} r_{θ₂} and 2^{j₂} ≤ 2^{j₁} have a small energy. Indeed, |x ⋆ ψ_{λ₁}| has an energy concentrated in a lower frequency band. As a result, we only compute scattering coefficients for increasing scales 2^{j₂} > 2^{j₁}.
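Below is a sketch of this cascade up to order m = 2, reusing conv2_fft, gauss2d and morlet2d from the sketches above; it only computes the increasing-scale paths 2^{j₂} > 2^{j₁} discussed in the previous paragraph.

```python
import numpy as np
# reuses conv2_fft, gauss2d and morlet2d from the previous sketches

def scattering2(x, J=3, K=8):
    # Order 0, 1 and 2 coefficients of the translation scattering, eqs. (11)-(12)
    N = x.shape[0]
    phi = gauss2d(N, J)
    psis = {(k, j): morlet2d(N, j, np.pi * k / K) for j in range(J) for k in range(K)}
    s0 = np.real(conv2_fft(x, phi))                    # S0 x = x * phi_J
    s1, s2 = {}, {}
    for (k1, j1), p1 in psis.items():
        u1 = np.abs(conv2_fft(x, p1))                  # U1 x = |x * psi_{lambda1}|
        s1[(k1, j1)] = np.real(conv2_fft(u1, phi))     # S1 x = U1 x * phi_J
        for (k2, j2), p2 in psis.items():
            if j2 > j1:                                # other paths carry little energy
                u2 = np.abs(conv2_fft(u1, p2))         # U2 x
                s2[(k1, j1, k2, j2)] = np.real(conv2_fft(u2, phi))
    return s0, s1, s2

x = np.random.default_rng(1).standard_normal((64, 64))
s0, s1, s2 = scattering2(x)
print(len(s1), len(s2))   # 24 first order paths, 192 increasing-scale second order paths
```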

    2.4 Separable Versus Joint Rigid Motion Invariants

An invariant to a group which is a product of two subgroups can be implemented as a separable product of two invariant operators, one on each subgroup. However, this separable invariant is often too strong, and loses important information. We show this for translations and rotations.

To understand the loss of information produced by separable invariants, let us first consider the two-dimensional translation group over R². A two-dimensional translation invariant operator applied to x(u₁, u₂) can be computed by first applying a translation invariant operator Φ₁ which transforms x(u₁, u₂) along u₁ for u₂ fixed. Then a second translation invariant operator Φ₂ is applied along u₂. The product Φ₂Φ₁ is thus invariant to any two-dimensional translation. However, if x_v(u₁, u₂) = x(u₁ − v(u₂), u₂) then Φ₁x_v = Φ₁x for all v(u₂), although x_v is not a translation of x when v(u₂) is not constant. It results that Φx = Φx_v. This separable operator is invariant to a much larger set of transformations than two-dimensional translations, and can thus confuse two images which are not translations of one another, as in Figure 5 (left). To avoid this information loss, it is necessary to build a translation invariant operator which takes into account the structure of the two-dimensional translation group. This is


Fig. 5: (Left) Two images where each row of the second image is translated by a different amount v(u₁). A separable translation invariant that would start by computing a translation invariant for each row would output the same value, which illustrates the fact that such separable invariants are too strong. (Right) Two textures whose first internal layers are translated by different values for different orientations. In this example, vertical orientations are not translated while horizontal orientations are translated by 1/2 (1, 1). Translation scattering and other separable invariants cannot distinguish these two textures, because they do not connect vertical and horizontal nodes.

why translation invariant scattering operators in R² are not computed as products of scattering operators along the horizontal and vertical variables.

The same phenomenon appears for invariants along translations and rotations, although it is more subtle because translations and rotations interfere. Suppose that we apply a translation invariant operator Φ₁, such as a scattering transform, which separates image components along different orientations, indexed by an orientation parameter θ ∈ [0, 2π]. Applying a second rotation invariant operator Φ₂ which acts along θ produces a translation and rotation invariant operator.

Local Binary Patterns (Zhao et al., 2012) follow this approach. They first build translation invariance with a histogram of oriented patterns. Then, rotation invariance is built on top, either by pooling all patterns that are rotated versions of one another, or by computing the modulus of a Fourier transform on the angular difference that relates rotated patterns.

Such separable invariant operators have the advantage of simplicity, and have thus been used in several computer vision applications. However, as in the separable translation case, separable products of translation and rotation invariants can confuse very different images. Consider a first image which is the sum of arrays of oscillatory patterns along two orthogonal directions, with the same locations. If the two arrays of oriented patterns are shifted, as in Figure 5 (right), we get a very different texture, which is not globally translated or rotated relative to the first. However, an operator Φ₁ which first separates different orientation components and computes a translation invariant representation independently for each component will output the same values for both images, because it does not take into account the joint location and orientation structure of the image. This is the case for separable scattering transforms (Sifre & Mallat, 2012), and for any of the separable translation and rotation invariants used in (Xu et al., 2010; Zhao et al., 2012).

Taking into account the joint structure of the rigid-motion group of rotations and translations in R² was proposed by several researchers (Citti & Sarti, 2006; Duits & Burgeth, 2007; Boscain et al., 2013) to preserve image structures in applications such as noise removal or image enhancement with directional diffusion operators (Duits & Franken, 2011). Similarly, a joint scattering invariant to translations and rotations is constructed directly on the rigid-motion group, in order to take into account the joint information between positions and orientations.


Fig. 6: A rigid-motion convolution (20) with a separable filter ỹ(v, θ) = y(v) ȳ(θ) in SE(2) can be factorized into a two-dimensional convolution with rotated filters y(r_{−θ} v) and a one-dimensional convolution with ȳ(θ).

    3 Rigid-motion Scattering

Translation invariant scattering operators are extended to define invariant representations over any Lie group, by calculating wavelet transforms on this group. Such wavelet transforms are well defined under weak conditions on the Lie group. We concentrate on invariance to the action of rotations and translations, which belong to the special Euclidean group. The next section briefly reviews the properties of the special Euclidean group. A scattering operator (Mallat, 2012) computes an image representation which is invariant relatively to the action of a group, by applying wavelet transforms to functions defined on the group.

    3.1 Rigid-Motion Group

The set of rigid motions is called the special Euclidean group SE(2). We briefly review its properties. A rigid motion in R² is parameterized by a translation v ∈ R² and a rotation r_θ ∈ SO(2) of angle θ ∈ [0, 2π). We write g = (v, θ). Such a rigid motion g maps u ∈ R² to

gu = v + r_θ u . (15)

A rigid motion g applied to an image x(u) translates and rotates the image accordingly:

g.x(u) = x(g^{−1} u) = x(r_{−θ}(u − v)) . (16)

The group action (15) must be compatible with the product, g′.(gu) = (g′.g)u, so that successive applications of two rigid motions g and g′ are equivalent to the application of the single product rigid motion g′.g. Combined with (15), this implies that

g′.g = (v′ + r_{θ′} v , θ + θ′) . (17)

This group product is not commutative. The neutral element is (0, 0), and the inverse of g is

g^{−1} = (−r_{−θ} v , −θ) . (18)

The product (17) of SE(2) is the definition of the semidirect product of the translation group R² and the rotation group SO(2):

SE(2) = R² ⋊ SO(2) .

It is a Lie group, and the left invariant Haar measure of SE(2) is dg = dv dθ, obtained as a product of the Haar measures on R² and SO(2).
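The group law (17) and the inverse (18) are easy to verify numerically. A small sketch with arbitrary random rigid motions:

```python
import numpy as np

def rot(t):
    return np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])

def act(g, u):
    v, t = g
    return v + rot(t) @ u                 # g u = v + r_theta u, eq. (15)

def prod(g2, g1):
    (v2, t2), (v1, t1) = g2, g1
    return (v2 + rot(t2) @ v1, t1 + t2)   # eq. (17)

def inv(g):
    v, t = g
    return (-rot(-t) @ v, -t)             # eq. (18)

rng = np.random.default_rng(0)
g  = (rng.standard_normal(2), 0.7)
gp = (rng.standard_normal(2), -1.3)
u  = rng.standard_normal(2)

# g'.(g u) == (g'.g) u  and  g^{-1}.g acts as the identity
print(np.allclose(act(gp, act(g, u)), act(prod(gp, g), u)))   # True
print(np.allclose(act(prod(inv(g), g), u), u))                # True
```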

The space L²(SE(2)) of finite energy measurable functions x̃(v, θ) is a Hilbert space:

L²(SE(2)) = { x̃ : ∫_{R²} ∫₀^{2π} |x̃(v, θ)|² dθ dv < ∞ } .

The left-invariant convolution of two functions x̃(g) and ỹ(g) is defined by

x̃ ⋆̃ ỹ(g) = ∫_{SE(2)} x̃(g′) ỹ(g′^{−1} g) dg′ .


Since (v′, θ′)^{−1} = (−r_{−θ′} v′, −θ′),

x̃ ⋆̃ ỹ(v, θ) = ∫_{R²} ∫₀^{2π} x̃(v′, θ′) ỹ( r_{−θ′}(v − v′) , θ − θ′ ) dv′ dθ′ . (19)

For separable filters ỹ(v, θ) = y(v) ȳ(θ), this convolution can be factorized into a spatial convolution with rotated filters y(r_{−θ} v), followed by a convolution with ȳ(θ):

x̃ ⋆̃ ỹ(v, θ) = ∫₀^{2π} ( ∫_{R²} x̃(v′, θ′) y(r_{−θ′}(v − v′)) dv′ ) ȳ(θ − θ′) dθ′ . (20)

This is illustrated in Figure 6.
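The factorization (20) translates directly into code. The following numpy sketch discretizes SE(2) on a C × N × N grid (C orientations) and uses a crude nearest-neighbour rotation of the spatial filter; all sizes and filters are illustrative, and this is not the ScatNet implementation.

```python
import numpy as np

def rotate_filter(y, t):
    # sample y(r_{-t} v) on the grid by nearest-neighbour rotation (crude but simple)
    N = y.shape[0]
    u = np.fft.fftfreq(N) * N
    ux, uy = np.meshgrid(u, u, indexing='ij')
    rx =  np.cos(t) * ux + np.sin(t) * uy
    ry = -np.sin(t) * ux + np.cos(t) * uy
    ix = np.rint(rx).astype(int) % N
    iy = np.rint(ry).astype(int) % N
    return y[ix, iy]

def rigid_motion_conv(xt, y, ybar):
    # xt: signal on SE(2), shape (C, N, N) with xt[c] = x~(., 2*pi*c/C)
    # y: spatial filter on an N x N grid; ybar: filter along theta, length C
    C, N, _ = xt.shape
    spatial = np.empty_like(xt)
    for c in range(C):
        yr = rotate_filter(y, 2 * np.pi * c / C)        # y(r_{-theta'} v)
        spatial[c] = np.real(np.fft.ifft2(np.fft.fft2(xt[c]) * np.fft.fft2(yr)))
    # periodic convolution along theta with ybar, eq. (20)
    return np.real(np.fft.ifft(np.fft.fft(spatial, axis=0) *
                               np.fft.fft(ybar)[:, None, None], axis=0))

# usage with a separable filter y(v) ybar(theta)
C, N = 8, 32
xt = np.random.default_rng(0).standard_normal((C, N, N))
g1 = np.exp(-(np.fft.fftfreq(N) * N) ** 2 / 4)
y = g1[:, None] * g1[None, :]            # a Gaussian spatial filter
ybar = np.ones(C) / C                    # plain averaging along orientations
print(rigid_motion_conv(xt, y, ybar).shape)   # (8, 32, 32)
```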

    3.2 Wavelet Transform on the Rigid-Motion Group

A wavelet transform W̃ in L²(SE(2)) is defined by convolutions with an averaging window and wavelets in L²(SE(2)). The wavelets are constructed as separable products of wavelets in L²(R²) and in L²(SO(2)).

A spatial wavelet transform in L²(R²) is defined from L mother wavelets ψ_l(u), which are dilated, ψ_{l,j}(u) = 2^{−2j} ψ_l(2^{−j} u), and from a rotationally symmetric averaging function φ_J(u) = 2^{−2J} φ(2^{−J} u) at the maximum scale 2^J:

Wx = { x ⋆ φ_J(u) , x ⋆ ψ_{l,j}(u) }_{u∈R², 0≤l<L, j<J} .


∀ω ∈ R², 1 − ε ≤ |φ̂(ω)|² + ∑_{0≤l<L} ∑_{j≤0} |ψ̂_{l,j}(ω)|² ≤ 1 .


Fig. 7 Rigid-motion scattering is similar to the translation scattering of Figure 4, but the wavelet modulus operators |W| of the deeper layers are replaced with rigid-motion wavelet modulus operators |W̃|, whose convolutions are applied along the rigid-motion group.

For any m ≥ 0, applying the wavelet modulus operator |W̃| to Ũ_m x outputs the scattering coefficients S̃_m x and computes the next layer of coefficients Ũ_{m+1} x:

|W̃| Ũ_m x = (S̃_m x , Ũ_{m+1} x) , (33)

with

S̃_m x(g, j₁, λ₂, ..., λ_m) = Ũ_m x(·, j₁, λ₂, ..., λ_m) ⋆̃ φ̃_{J,K}(g) = || |x ⋆ ψ_{·,j₁}| ⋆̃ ψ̃_{λ₂} ... ⋆̃ ψ̃_{λ_m}| ⋆̃ φ̃_{J,K}(g)

and

Ũ_{m+1} x(g, j₁, λ₂, ..., λ_m, λ_{m+1}) = |Ũ_m x(·, j₁, λ₂, ..., λ_m) ⋆̃ ψ̃_{λ_{m+1}}(g)| = || |x ⋆ ψ_{·,j₁}| ⋆̃ ψ̃_{λ₂} ... ⋆̃ ψ̃_{λ_m}| ⋆̃ ψ̃_{λ_{m+1}}(g)| . (34)

This rigid-motion scattering transform is illustrated in Figure 7. The final scattering vector concatenates all scattering coefficients for 0 ≤ m ≤ M:

S̃x = (S̃_m x)_{0≤m≤M} . (35)

The following theorem proves that a rigid-motion scattering transform is a non-expansive operator.

Theorem 2 For any M ∈ N and any (x, y) ∈ L²(R²)²,

‖S̃x − S̃y‖ ≤ ‖x − y‖ . (36)

Proof: A modulus is non-expansive, in the sense that for any (a, b) ∈ C², ||a| − |b|| ≤ |a − b|. Since W̃ is a linear non-expansive operator, it results that the wavelet modulus operator |W̃| is also non-expansive:

‖|W̃|x̃ − |W̃|ỹ‖ ≤ ‖x̃ − ỹ‖ .

It then results from (33) that

‖|W̃| Ũ_m x − |W̃| Ũ_m y‖² = ‖S̃_m x − S̃_m y‖² + ‖Ũ_{m+1} x − Ũ_{m+1} y‖² ≤ ‖Ũ_m x − Ũ_m y‖² . (37)

Summing this equation from m = 1 to M gives

∑_{m=1}^{M} ‖S̃_m x − S̃_m y‖² + ‖Ũ_{M+1} x − Ũ_{M+1} y‖² ≤ ‖Ũ₁ x − Ũ₁ y‖² . (38)

Since |W|x = (S₀x , Ũ₁x) is also non-expansive, we get

‖S₀x − S₀y‖² + ‖Ũ₁x − Ũ₁y‖² ≤ ‖x − y‖² . (39)

Inserting (39) in (38) proves (36). □

    4 Fast Rigid-Motion Scattering

For texture classification applications, the first and second layers of scattering are sufficient to achieve state-of-the-art results. This section describes a fast implementation of rigid-motion scattering based on a filter bank implementation of the wavelet transform.

    4.1 Wavelet Filter Bank Implementation

Rigid-motion scattering coefficients are computed by applying a spatial wavelet transform W and then a rigid-motion wavelet transform W̃. This section describes filter bank implementations of the spatial wavelet transform.


Fig. 8 Filter bank implementation of the wavelet transform W with J = 3 scales and C = 2 orientations. A cascade of low-pass filtering with h and downsampling computes the low frequencies A_j x = x ⋆ φ_j, and filters g_θ compute the high frequencies B_{θ,j} x = x ⋆ ψ_{θ,j}. This cascade results in a tree whose internal nodes are intermediate computations and whose leaves are the output of the downsampled wavelet transform.

A wavelet transform

Wx = { x ⋆ φ_J(u) , x ⋆ ψ_{θ,j}(u) }_{u∈R², θ∈Θ, j<J}

is computed with the filter bank cascade illustrated in Figure 8: the low frequencies A_{j+1} x = x ⋆ φ_{j+1} are obtained by filtering A_j x with a low-pass filter h and downsampling, while the high frequencies B_{θ,j} x = x ⋆ ψ_{θ,j} are obtained by filtering A_j x with band-pass filters g_θ.
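A minimal sketch of this cascade, with made-up separable filters (a box low-pass filter h, and two difference filters standing in for the oriented band-pass filters g_θ); the actual implementation uses the Morlet-like filters of Section 2.2.

```python
import numpy as np
from scipy.ndimage import convolve

def wavelet_filter_bank(x, h, gs, J):
    # Cascade of Fig. 8: A_{j+1} x = (A_j x * h) downsampled by 2,
    # and B_{theta,j} x = A_j x * g_theta at each resolution.
    A, B = x, {}
    for j in range(J):
        for theta, g in enumerate(gs):
            B[(theta, j)] = convolve(A, g, mode='wrap')
        A = convolve(A, h, mode='wrap')[::2, ::2]   # low-pass then downsample
    return A, B

h = np.ones((3, 3)) / 9.0                # crude low-pass filter
gs = [np.array([[-1.0, 1.0]]),           # horizontal difference filter
      np.array([[-1.0], [1.0]])]         # vertical difference filter
x = np.random.default_rng(0).standard_normal((64, 64))
A3, B = wavelet_filter_bank(x, h, gs, J=3)
print(A3.shape, sorted(B)[:3])           # (8, 8) and keys (theta, j)
```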


Fig. 9: Filter bank implementation of the rigid-motion wavelet transform W̃ with J = 2 spatial scales, C = 2 orientations, L = 2 spatial wavelets and K = 2 orientation scales. A first cascade computes spatial downsampling and filtering with h and g_{l,θ}. This first cascade is a tree whose leaves are A_J x̃ and B_{l,j} x̃. Each leaf is then retransformed with a second cascade of downsampling and filtering with h̄ and ḡ along the orientation variable. The leaves of the second cascade are C_{J,K} x̃, D_{J,k} x̃ (whose ancestor is A_J x̃) and E_{l,j,K} x̃, F_{l,j,k} x̃ (whose ancestors are the B_{l,j} x̃). These leaves constitute the output of the downsampled rigid-motion wavelet transform. They correspond to the signals x̃ ⋆̃ φ̃_{J,K}, x̃ ⋆̃ ψ̃_{J,k}, x̃ ⋆̃ ψ̃_{l,j,K}, x̃ ⋆̃ ψ̃_{l,j,k}, appropriately downsampled along the spatial and orientation variables.

A second cascade along the orientation variable computes the following subsampled rigid-motion wavelet transform coefficients:

C_{J,K} x̃(n, θ) = x̃ ⋆̃ φ̃_{J,K}(2^J n, 2^K θ)
D_{J,k} x̃(n, θ) = x̃ ⋆̃ ψ̃_{J,k}(2^J n, 2^k θ)
E_{l,j,K} x̃(n, θ) = x̃ ⋆̃ ψ̃_{l,j,K}(2^j n, 2^K θ)
F_{l,j,k} x̃(n, θ) = x̃ ⋆̃ ψ̃_{l,j,k}(2^j n, 2^k θ) .

These subsampled coefficients are initialized from A and B with C_{J,0} x̃ = A_J x̃ and E_{l,j,0} x̃ = B_{l,j} x̃. We compute them by induction:

C_{J,k+1} x̃ = (C_{J,k} x̃ ⋆̄ h̄) ↓2 ,   D_{J,k} x̃ = C_{J,k} x̃ ⋆̄ ḡ ,
E_{l,j,k+1} x̃ = (E_{l,j,k} x̃ ⋆̄ h̄) ↓2 ,   F_{l,j,k} x̃ = E_{l,j,k} x̃ ⋆̄ ḡ ,

where ⋆̄, ↓2, h̄ and ḡ are respectively the discrete convolution, the downsampling, and the low-pass and high-pass filters along the orientation variable θ.
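A sketch of this orientation cascade for a single spatial leaf, with hypothetical length-2 filters h̄ and ḡ standing in for the paper's orientation filters:

```python
import numpy as np

def orientation_cascade(x, hbar, gbar, K):
    # x: coefficients with the orientation axis first, shape (C, ...).
    # Computes C_{J,k} (successive low-pass + downsampling along theta)
    # and D_{J,k} = C_{J,k} conv gbar, following the induction above.
    def circ_conv(a, f):
        # periodic convolution along the orientation axis
        F = np.zeros(a.shape[0]); F[:len(f)] = f
        return np.real(np.fft.ifft(np.fft.fft(a, axis=0) *
                                   np.fft.fft(F).reshape((-1,) + (1,) * (a.ndim - 1)),
                                   axis=0))
    C, D = [x], []
    for k in range(K):
        D.append(circ_conv(C[k], gbar))
        C.append(circ_conv(C[k], hbar)[::2])   # downsample orientations by 2
    return C, D

hbar = np.array([0.5, 0.5])        # hypothetical low-pass filter along theta
gbar = np.array([0.5, -0.5])       # hypothetical high-pass filter along theta
x = np.random.default_rng(0).standard_normal((8, 16, 16))  # C = 8 orientations
C, D = orientation_cascade(x, hbar, gbar, K=3)
print([c.shape[0] for c in C])     # [8, 4, 2, 1]
```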

The first spatial cascade computes CL convolutions at each spatial resolution, which requires O(CLNP) operations and O(CLN) memory. Each leaf is then retransformed by a cascade along the orientation variable θ, of cardinality C. Convolutions along orientations are periodic and, since the size of the filters h̄, ḡ is of the same order as C, we use FFT-based convolutions. One such convolution requires O(C log C) operations. One cascade of filtering and downsampling along orientations requires ∑_k C 2^{−k} log(C 2^{−k}) = O(C log C) operations and O(C) memory. There are O(LN) such cascades, so the total cost for processing along orientations is O(CLN log C) operations and O(CLN) memory. Thus, the total cost of the full rigid-motion wavelet transform W̃ is O(CLN(P + log C)) operations and O(CLN) memory, where C is the number of orientations of the input signal, L is the number of spatial


wavelets, N is the size of the input image, and P is the size of the spatial filters.

    5 Image Texture Classification

Image texture classification has many applications, including satellite, medical and material imaging. It is a relatively well posed computer vision problem, since the different sources of variability contained in texture images can be accurately modeled. This section presents applications of rigid-motion scattering to four texture datasets containing different types and ranges of variability: the (KTH TIPS, 2004; UIUC Tex, 2005; UMD, 2009) texture datasets, and the more challenging FMD (Sharan et al., 2009; FMD, 2009) materials dataset. Results are compared with state-of-the-art algorithms in Tables 1, 2, 3 and 4. All classification experiments are reproducible with the ScatNet (ScatNet, 2013) toolbox for MATLAB.

5.1 Dilation, Shear and Deformation Invariance with a PCA Classifier

Rigid-motion scattering builds invariance to the rigid-motion group. Yet, texture images also undergo other geometric transformations such as dilations, shears or elastic deformations. Dilations and shears, combined with rotations and translations, generate the group of affine transforms. One can define wavelets (Donoho et al., 2011) and a scattering transform on the affine group to build affine invariance. However, this group is much larger, and it would involve heavy and unnecessary computations. Only a limited range of dilations and shears occurs in finite resolution images, which allows one to linearize these variations. Invariance to dilations, shears and deformations is obtained with linear projectors implemented at the classifier level, by taking advantage of the scattering transform's stability to small deformations. In texture applications there is typically a small number of training examples per class, in which case PCA generative classifiers can perform better than linear SVM discriminative classifiers (Bruna & Mallat, 2013).

Let X_c be a stationary process representing a texture class c. Its rigid-motion scattering transform S̃X_c typically has a power law behavior as a function of its scale parameters. It is partially linearized by a logarithm, which thus improves linear classifiers. The random process log S̃X_c has an energy which is essentially concentrated in a low-dimensional affine space

A_c = E(log S̃X_c) + V_c ,

where V_c is the principal component linear space, generated by the eigenvectors of the covariance of log S̃X_c having non-negligible eigenvalues.

The expected value E(log S̃X_c) is estimated by the empirical average µ_c of the log S̃X_{c,i} for all training examples X_{c,i} of the class c. To guarantee that the scattering moments are partially invariant to scaling, we augment the training set by dilating each X_{c,i} by typically 4 scaling factors {1, √2, 2, 2√2}. In the following, we consider {X_{c,i}}_i as the set of training examples augmented by dilation, which are incorporated in the empirical average estimation µ_c of E(log S̃X_c).

The principal component space V_c is estimated from the singular value decomposition (SVD) of the matrix of centered training examples log S̃X_{c,i} − µ_c. The number of non-zero eigenvectors which can be computed is equal to the total number of training examples. We define V_c as the space generated by all these eigenvectors. In texture discrimination applications, it is not necessary to regularize the estimation by reducing the dimension of this space, because there is a small number of training examples.

Given a test image X, we abusively denote by log S̃X the average of the log scattering transforms of X and its dilated versions. It is therefore a scale-averaged scattering transform, which provides partial scaling invariance. We denote by P_{V_c} log S̃X the orthogonal projection of log S̃X on the scattering space V_c of a given class c. The PCA classification computes the class ĉ(X) which minimizes the distance ‖(Id − P_{V_c})(log S̃X − µ_c)‖ between log S̃X and

  • 16 Laurent Sifre, Stéphane Mallat

the affine space µ_c + V_c:

ĉ(X) = argmin_c ‖(Id − P_{V_c})(log S̃X − µ_c)‖² . (43)

The translation and rotation invariance of a rigid-motion scattering S̃X results from the spatial and angular averaging implemented by the convolution with φ̃_{J,K}. It is nearly translation invariant over spatial domains of size 2^J, and over rotations by angles of at most 2^K. The parameters J and K can be adjusted by cross-validation. One can also avoid performing any such averaging, and let a linear supervised classifier directly optimize the averaging. This last approach is possible only if there are enough supervised training examples to learn the appropriate averaging kernel. This is not the case in the texture experiments of Section 5.2, where few training examples are available, but where the classification task is known to be fully translation and rotation invariant. The values of J and K are thus set to their maximum.
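A numpy sketch of this PCA classifier (43), assuming each image has already been reduced to its scale-averaged log scattering vector; all names and shapes are illustrative.

```python
import numpy as np

def fit_pca_classes(train):
    # train: dict mapping class c -> matrix (n_c, d) of log scattering vectors.
    # For each class, store the mean mu_c and an orthonormal basis of V_c
    # spanned by all principal directions of the centered training examples.
    models = {}
    for c, X in train.items():
        mu = X.mean(axis=0)
        # rows of Vt are the right singular vectors of the centered matrix
        _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
        models[c] = (mu, Vt)
    return models

def classify(models, s):
    # c^(X) = argmin_c || (Id - P_{V_c}) (log S~X - mu_c) ||^2, eq. (43)
    def residual(mu, Vt):
        z = s - mu
        return np.sum(z ** 2) - np.sum((Vt @ z) ** 2)  # part of z outside V_c
    return min(models, key=lambda c: residual(*models[c]))

# toy usage with random stand-ins for scattering vectors
rng = np.random.default_rng(0)
train = {c: rng.standard_normal((20, 50)) + 3 * c for c in range(3)}
models = fit_pca_classes(train)
print(classify(models, rng.standard_normal(50) + 3 * 2))   # likely class 2
```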

    5.2 Texture Classification Experiments

This section details classification results on the image texture datasets KTH-TIPS (KTH TIPS, 2004), UIUC (Lazebnik et al., 2005; UIUC Tex, 2005) and UMD (UMD, 2009). These datasets contain images with a different range of variability for each type of geometric transformation. We give results for progressively more invariant versions of the scattering transform, and compare with state-of-the-art approaches on all datasets.

Most state-of-the-art algorithms use separable invariants to define translation and rotation invariant algorithms, and thus lose joint information on positions and orientations. This is the case for (Lazebnik et al., 2005), where rotation invariance is obtained through histograms along concentric circles, as well as for Log-Gaussian Cox processes (COX) (Nguyen et al., 2011) and Basic Image Features (BIF) (Crosier & Griffin, 2008), which use rotation invariant patch descriptors calculated from small filter responses. Sorted Random Projections (SRP) (Liu et al., 2011) replace histograms with a similar sorting algorithm

Fig. 10: Each row shows images from the same texture class in the UIUC database (Lazebnik et al., 2005), with important rotation, scaling and deformation variability.

and add fine scale joint information between orientations and spatial positions by calculating radial and angular differences before sorting. The Wavelet Multifractal Spectrum (WMFS) (Xu et al., 2010) computes wavelet descriptors which are averaged over space and rotations, and are similar to first order scattering coefficients S₁x.

We compare the best published results (Lazebnik et al., 2005; Nguyen et al., 2011; Crosier & Griffin, 2008; Xu et al., 2010; Liu et al., 2011) with scattering invariants on the KTH-TIPS (Table 1), UIUC (Table 2) and UMD (Table 3) texture databases. For each database, Tables 1, 2 and 3 give the mean classification accuracy and standard deviation over 200 random splits between training and testing, for different training sizes. Classification accuracy is computed with scattering representations implemented with progressively more invariants, and with the PCA classifier of Section 5.1. As the training sets are small for each class c, the dimension D of the high variability space V_c is set to the training size. The space V_c is thus generated by the D scattering vectors of the training set. For larger training databases, it must be adjusted by cross-validation, as in (Bruna & Mallat, 2013).


Train size                      5          20         40
COX (Nguyen et al., 2011)       80.2±2.2   92.4±1.1   95.7±0.5
BIF (Crosier & Griffin, 2008)   -          -          98.5
SRP (Liu et al., 2011)          -          -          99.3
Translation scattering          69.1±3.5   94.8±1.3   98.0±0.8
Rigid-motion scattering         69.5±3.6   94.9±1.4   98.3±0.9
+ log & scale invariance        84.3±3.1   98.3±0.9   99.4±0.4

Table 1: Classification accuracy with standard deviations on the (KTH TIPS, 2004) database. Columns correspond to different training sizes per class. The first rows give the best published results. The last rows give results obtained with progressively refined scattering invariants. Best results are bolded.

Training size                      5          10         20
Lazebnik (Lazebnik et al., 2005)   -          92.6       96.0
WMFS (Xu et al., 2010)             93.4       97.0       98.6
BIF (Crosier & Griffin, 2008)      -          -          98.8±0.5
Translation scattering             50.0±2.1   65.2±1.9   79.8±1.8
Rigid-motion scattering            77.1±2.7   90.2±1.4   96.7±0.8
+ log & scale invariance           93.3±1.4   97.8±0.6   99.4±0.4

Table 2: Classification accuracy on the (UIUC Tex, 2005) database.

Training size              5          10         20
WMFS (Xu et al., 2010)     93.4       97.0       98.7
SRP (Liu et al., 2011)     -          -          99.3
Translation scattering     80.2±1.9   91.8±1.4   97.4±0.9
Rigid-motion scattering    87.5±2.2   96.5±1.1   99.2±0.5
+ log & scale invariance   96.6±1.0   98.9±0.6   99.7±0.3

Table 3: Classification accuracy on the (UMD, 2009) database.

Classification accuracies in Tables 1, 2 and 3 are given for different scattering representations. The rows "Translation scattering" correspond to the scattering described in Section 2.3, initially introduced in (Bruna & Mallat, 2013). The rows "Rigid-motion scattering" replace the translation invariant scattering with the rigid-motion scattering of Section 3.3. Finally, the rows "+ log & scale invariance" correspond to the rigid-motion scattering with a logarithm non-linearity to linearize scaling, and with the partial scale invariance described in Section 5.1, with augmentation at training and averaging at testing along a limited range of dilations.

(KTH TIPS, 2004) contains 10 classes of 81 samples with controlled scaling, shear and illumination variations, but no rotation. The rigid-motion scattering does not degrade results, and the scale invariance provides a significant improvement.

(UIUC Tex, 2005) and (UMD, 2009) both contain 25 classes of 40 samples with uncontrolled deformations, including shear, perspective effects and non-rigid deformations. For both these databases, rigid-motion scattering and the scale invariance provide considerable improvements over translation scattering. The overall approach achieves, and often exceeds, state-of-the-art results on all these databases.

Training size                                         50
SRP (Liu et al., 2011)                                48.2
Best single feature (SIFT) in (Sharan et al., 2013)   41.2
Rigid-motion scattering + log on grey images          51.22
Rigid-motion scattering + log on YUV images           53.28

Table 4: Classification accuracy on the (FMD, 2009) database.

(FMD, 2009) contains 10 classes of 100 samples. Each class contains images of the same material, manually extracted from Flickr. Unlike the three previous databases, images within a class are not taken from a single physical sample, but come with a variety of material sub-types which can


be very different. Therefore, the PCA classifier of Section 5.1 cannot linearize deformations, and discriminative classifiers tend to give better results. The scattering results reported in Table 4 are obtained with a one-versus-all linear SVM. Rigid-motion log scattering applied to each channel of a YUV image and concatenated achieves 53.28% accuracy, which is to our knowledge the best result for a single feature. Better results can be obtained using multiple features and a feature selection framework (Sharan et al., 2013).

    6 Conclusion

Rigid-motion scattering provides stable translation and rotation invariants through a cascade of wavelet transforms along the spatial and orientation variables. We have shown that such joint operators provide tighter invariants than separable operators, which tend to be too strong and thus lose too much information. A wavelet transform on the rigid-motion group has been introduced, with a fast implementation based on two cascades of downsampling and filtering. Rigid-motion scattering has been applied to texture classification in the presence of large geometric transformations, and provides state-of-the-art classification results on most texture datasets.

Recent work (Oyallon et al., 2014) has shown that rigid-motion scattering, with an extension to dilations, can also be used for more generic vision tasks such as object recognition, with promising results on the Caltech 101 and 256 datasets. For large scale deep networks, group convolutions might also be useful to learn more structured and meaningful multidimensional filters.

    References

G. E. Hinton and R. R. Salakhutdinov, “Reducing the dimensionality of data with neural networks”, Science, Vol. 313, no. 5786, pp. 504-507, July 2006.

Y. LeCun, K. Kavukcuoglu and C. Farabet, “Convolutional Networks and Applications in Vision”, Proc. of ISCAS, 2010.

T. Poggio, J. Mutch, F. Anselmi, L. Rosasco, J.Z. Leibo, and A. Tacchetti, “The computational magic of the ventral stream: sketch of a theory (and why some deep architectures work)”, MIT-CSAIL-TR-2012-035, December 2012.

P. Sermanet, K. Kavukcuoglu, S. Chintala, Y. LeCun, “Pedestrian Detection with Unsupervised Multi-Stage Feature Learning”, Proc. of Computer Vision and Pattern Recognition (CVPR), 2013.

A. Krizhevsky, I. Sutskever, and G.E. Hinton, “ImageNet Classification with Deep Convolutional Neural Networks”, Proc. of Neural Information Processing Systems (NIPS), 2012.

J. Dean, G.S. Corrado, R. Monga, K. Chen, M. Devin, Q.V. Le, M.Z. Mao, M.A. Ranzato, A. Senior, P. Tucker, K. Yang, A.Y. Ng, “Large Scale Distributed Deep Networks”, Proc. of Neural Information Processing Systems (NIPS), 2012.

J. Bruna, S. Mallat, “Invariant Scattering Convolution Networks”, Trans. on PAMI, vol. 35, no. 8, pp. 1872-1886, 2013.

S. Mallat, “Group Invariant Scattering”, Communications on Pure and Applied Mathematics, vol. 65, no. 10, pp. 1331-1398, 2012.

L. Sifre, S. Mallat, “Combined scattering for rotation invariant texture analysis”, Proc. of European Symposium on Artificial Neural Networks (ESANN), 2012.

L. Sifre, S. Mallat, “Rotation, Scaling and Deformation Invariant Scattering for Texture Discrimination”, Proc. of Computer Vision and Pattern Recognition (CVPR), 2013.

J. Bruna, S. Mallat, E. Bacry and J.-F. Muzy, “Intermittent Process Analysis with Scattering Moments”, submitted to Annals of Statistics, Nov. 2013.

E. Oyallon, S. Mallat, L. Sifre, “Generic Deep Networks with Wavelet Scattering”, submitted to International Conference on Learning Representations (ICLR), 2014.

G. Citti, A. Sarti, “A Cortical Based Model of Perceptual Completion in the Roto-Translation Space”, Journal of Mathematical Imaging and Vision, Vol. 24, no. 3, pp. 307-326, 2006.

U. Boscain, J. Duplaix, J.P. Gauthier, F. Rossi, “Anthropomorphic Image Reconstruction via Hypoelliptic Diffusion”, SIAM Journal on Control and Optimization, Vol. 50, Issue 3, pp. 1071-1733, 2012.

R. Duits, B. Burgeth, “Scale Spaces on Lie Groups”, in Scale Space and Variational Methods in Computer Vision, Springer Lecture Notes in Computer Science, Vol. 4485, pp. 300-312, 2007.

R. Duits, E. Franken, “Left-Invariant Diffusions on the Space of Positions and Orientations and their Application to Crossing-Preserving Smoothing of HARDI images”, International Journal of Computer Vision, Vol. 92, Issue 3, pp. 231-264, 2011.

T. Leung, J. Malik, “Representing and Recognizing the Visual Appearance of Materials using Three-dimensional Textons”, International Journal of Computer Vision, Vol. 43, Issue 1, pp. 29-44, 2001.

R. Girshick, J. Donahue, T. Darrell, J. Malik, “Rich feature hierarchies for accurate object detection and semantic segmentation”, arXiv preprint:1311.2524, 2013.

D. Lowe, “Distinctive image features from scale-invariant keypoints”, IJCV, 60(2):91-110, 2004.

S. Lazebnik, C. Schmid and J. Ponce, “A sparse texture representation using local affine regions”, Trans. on PAMI, vol. 27, no. 8, pp. 1265-1278, 2005.

S. Lazebnik, C. Schmid and J. Ponce, “Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories”, Proc. of Computer Vision and Pattern Recognition (CVPR), 2006.

E. Tola, V. Lepetit, P. Fua, “DAISY: An Efficient Dense Descriptor Applied to Wide Baseline Stereo”, Trans. on PAMI, Vol. 32, no. 5, pp. 815-830, 2010.

H.-G. Nguyen, R. Fablet, and J.-M. Boucher, “Visual textures as realizations of multivariate log-Gaussian Cox processes”, Proc. of Computer Vision and Pattern Recognition (CVPR), 2011.

M. Crosier and L.D. Griffin, “Texture classification with a dictionary of basic image features”, Proc. of Computer Vision and Pattern Recognition (CVPR), 2008.

Y. Xu, X. Yang, H. Ling and H. Ji, “A new texture descriptor using multifractal analysis in multi-orientation wavelet pyramid”, Proc. of Computer Vision and Pattern Recognition (CVPR), 2010.

L. Liu, P. Fieguth, G. Kuang, H. Zha, “Sorted Random Projections for Robust Texture Classification”, Proc. of ICCV, 2011.

G. Zhao, T. Ahonen, J. Matas, M. Pietikäinen, “Rotation-invariant image and video description with local binary pattern features”, Trans. on Image Processing, 21(4):1465-1467, 2012.

L. Sharan, R. Rosenholtz, E. H. Adelson, “Material perception: What can you see in a brief glance?”, Journal of Vision, 9(8):784, 2009.

L. Sharan, C. Liu, R. Rosenholtz, E. H. Adelson, “Recognizing Materials Using Perceptually Inspired Features”, International Journal of Computer Vision, Vol. 103, Issue 3, pp. 348-371, 2013.

G. Yu and J.M. Morel, “A Fully Affine Invariant Image Comparison Method”, Proc. of International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Taipei, 2009.

D. L. Donoho, G. Kutyniok, M. Shahram and X. Zhuang, “A Rational Design of Discrete Shearlet Transform”, Proc. of SampTA'11 (Singapore), 2011.

    S. Mallat, “A Wavelet Tour of Signal Processing, 3rd ed.”,Academic Press, 2008.

L. W. Renninger and J. Malik, “When is scene recognition just texture recognition?”, Vision Research, 44, pp. 2301-2311, 2004.

KTH-TIPS: http://www.nada.kth.se/cvap/databases/kth-tips/
UIUC: http://www-cvr.ai.uiuc.edu/ponce_grp/data/
UMD: http://www.cfar.umd.edu/~fer/website-texture/texture.htm
FMD: http://people.csail.mit.edu/celiu/CVPR2010/FMD/
ScatNet, a MATLAB toolbox for scattering networks: http://www.di.ens.fr/data/software/scatnet/

