Hindawi Publishing Corporation, Discrete Dynamics in Nature and Society, Volume 2015, Article ID 837185, 10 pages, http://dx.doi.org/10.1155/2015/837185
Research Article
Dimension Estimation Using Weighted Correlation Dimension Method

Yuanhong Liu,1,2 Zhiwei Yu,1 Ming Zeng,1 and Shun Wang1

1 Space Control and Inertial Technology Research Center, Harbin Institute of Technology, Harbin 150001, China
2 School of Information and Electrical Engineering, Northeast Petroleum University, Daqing 163318, China

Correspondence should be addressed to Yuanhong Liu; 39522496@qq.com

Received 24 September 2014; Revised 29 December 2014; Accepted 29 December 2014

Academic Editor: Luca Guerrini

Copyright © 2015 Yuanhong Liu et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Dimension reduction is an important tool for feature extraction and has been widely used in many fields, including image processing, discrete-time systems, and fault diagnosis. As a key parameter of dimension reduction, the intrinsic dimension is the smallest number of variables needed to describe a complete dataset. Among the dimension estimation methods, the correlation dimension (CD) method is one of the most popular; it assumes that every point has an identical effect on the intrinsic dimension estimate. This assumption does not hold when the distribution of a dataset is nonuniform: the intrinsic dimension estimated from a high-density area is more reliable than that estimated from a low-density or boundary area. In this paper, a novel weighted correlation dimension (WCD) approach is proposed. The vertex degree of an undirected graph is used to measure the contribution of each point to the intrinsic dimension estimate. To improve the adaptability of WCD estimation, the $k$-means clustering algorithm is adopted to adaptively select the linear portion of the log-log sequence $(\log \delta_k, \log C(n, \delta_k))$. Various factors that affect the performance of WCD are studied. Experiments on synthetic and real datasets show the validity and the advantages of the developed technique.

1. Introduction

Many engineering applications are difficult to analyze with traditional methods owing to the presence of high-dimensional signals, for example, face recognition [1–3], nonlinear dynamic systems [4, 5], and fault diagnosis. Therefore, a suitable dimension reduction of the high-dimensional signals is necessary before further processing.

Currently, considerable attention has been paid to dimension reduction, and many techniques have been reported [6, 7]. They can be roughly divided into two groups: linear methods and nonlinear methods. Principal component analysis (PCA) [8], linear discriminant analysis (LDA), locality preserving projections (LPP), and multidimensional scaling (MDS) are the classical linear methods, in which the original space is assumed to be linear and the raw data can be mapped directly into a lower-dimensional space. Classical nonlinear methods, such as isometric mapping (Isomap), locally linear embedding (LLE) [9], Laplacian eigenmaps (LE), local tangent space alignment (LTSA), Hessian locally linear embedding (HLLE), and diffusion maps (DM), all regard the dataset as being locally homeomorphic to $R^n$, and the local geometry of the high-dimensional space is preserved in the low-dimensional one.

For dimension reduction, one key step is to choose a proper intrinsic dimension. Too low an estimate of the intrinsic dimension may lose significant information, whereas too high an estimate may leave too much redundant information, increasing the amount of calculation and obscuring the important features. Recently, intrinsic dimension estimation methods have attracted considerable attention [10–16]. Usually they are categorized into three classes: the projection approach [17], the probabilistic approach, and the geometric approach. In the projection approach, a low-dimensional representation is first extracted from the high-dimensional space; the representation is then analyzed and the dimension is estimated by PCA, factor analysis, or MDS. The classical probabilistic approach is the maximum likelihood estimate (MLE) [18], which first estimates the probability distribution of a dataset and then estimates the intrinsic dimension by the maximum likelihood method. The accuracy of the intrinsic dimension depends completely on the estimate of the probability distribution. The geometric approach includes the geodesic minimal spanning tree (GMST) and fractal methods. GMST constructs a sequence of minimal spanning trees (MSTs) [19] using the geodesic edge matrix and estimates the intrinsic dimension from the overall lengths of the MSTs. GMST is a global method that does not require estimating the multivariate density of the dataset, but its drawback is the restriction to isometric embeddings. Fractal dimension [20, 21] is a statistical index of the complexity of a dataset, which is commonly calculated by the box-counting method [22–24] and the CD method [25, 26].

In this paper, a WCD method is presented to improve the accuracy of the CD method. The remainder of this paper is organized as follows. Section 2 reviews previous work on dimension estimation. In Section 3, a theoretical analysis of WCD estimation is conducted. Section 4 analyzes the influence of various factors on WCD through experiments. In Section 5, experiments on synthetic and real-world datasets are used to confirm the effectiveness of WCD. Finally, conclusions are drawn in Section 6.

2. Previous Work on Dimension Estimation

Informally, the intrinsic dimension of a dataset is the minimum number of independent variables that can completely describe the dataset, and it can be used to measure the complexity of a dataset: a smaller intrinsic dimension indicates a simpler dataset and vice versa. An accurate estimator of the intrinsic dimension is useful for improving the performance of dimension reduction methods and for extracting features.

A detailed review of intrinsic dimension estimation methods can be found in [16], which summarised almost all the typical methods to date, including Fukunaga-Olsen's method, near neighbor methods, TRN-based methods, projection techniques, multidimensional scaling methods, and fractal-based methods. Recently, some new intrinsic dimension estimation methods have been presented, such as the minimal cover method [27], the axiomatic method [28], the packing number method [29], and the expected absolute projection (EAP) method [30]. Each method has its own characteristics and is therefore suited to different datasets.

Fractal methods are a powerful tool for estimating the intrinsic dimension. Among the existing fractal methods, the Hausdorff dimension method, the box-counting dimension method, and the CD method are the most representative ones. Further research on fractal methods can be found in [31].

The Hausdorff dimension, which is derived from the Hausdorff measure, is the basis of fractal dimension. The Hausdorff measure [32] is therefore introduced first.

Definition 1 (Hausdorff measure). Let $(X, \rho)$ be a metric space. For any subset $U \subset X$, one defines a nonnegative function

$$H^{D}_{\delta}(X) = \inf\left\{ \sum_{i \in \mathbb{N}} \operatorname{diam}(U_i)^{D} : X \subseteq \bigcup_{i \in \mathbb{N}} U_i,\ U_i \text{ open},\ \operatorname{diam}(U_i) < \delta\ \ \forall i \in \mathbb{N} \right\}, \tag{1}$$

where $\operatorname{diam}(U) = \sup\{\rho(x, y) : x, y \in U\}$ represents the diameter of the subset $U$. The $D$-dimensional Hausdorff measure of $X$ can be defined as

$$H^{D}(X) = \lim_{\delta \to 0} H^{D}_{\delta}(X). \tag{2}$$

Definition 2 (Hausdorff dimension). The Hausdorff dimension of a set $X$ in a metric space $(X, \rho)$ is

$$\dim_H(X) = \inf\{D : H^{D}(X) = 0\} = \sup\{D : H^{D}(X) = \infty\}. \tag{3}$$

Hence, the Hausdorff dimension $D$ is the critical value at which the Hausdorff measure jumps from $\infty$ to 0. The Hausdorff dimension provides a perfect theoretical framework for dimension estimation, from which many new fractal dimension estimation methods can be derived. However, the Hausdorff dimension is difficult to use for dimension estimation in practice. The box-counting dimension, derived from the Hausdorff dimension, reduces its computational complexity.

Definition 3 (box-counting dimension). For a totally bounded set $X$ in a metric space, let $N_\delta(X)$ be the minimal number of balls with scale $\delta$ that cover $X$. The box-counting dimension is then [33]

$$\dim_{BC}(X) = \lim_{\delta \to 0} \frac{\log N_\delta(X)}{-\log \delta}, \tag{4}$$

and the necessary condition for the existence of the limit is that $N_\delta(X)$ is proportional to a power of $\delta$:

$$N_\delta(X) = c \cdot \delta^{-D}, \tag{5}$$

where $c$ is a constant. Taking the logarithm of (5) gives

$$\log N_\delta(X) = \log c - D \log \delta. \tag{6}$$

The box-counting dimension $D$ can be expressed as

$$D = \frac{\log c}{\log \delta} - \frac{\log N_\delta(X)}{\log \delta}, \tag{7}$$

and, according to (7), in order to obtain a good estimate of $D$, $\log c / \log \delta$ must approach 0. In practice, because of the limited sample size or the value of $\delta$, $\log c / \log \delta$ cannot be completely eliminated. Usually the box-counting dimension is determined from the slope of the linear part of the curve of $\log N_\delta(X)$ versus $\log \delta$.
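As a rough illustration of the slope-based estimate described above, the following Python sketch counts occupied grid cells instead of a minimal cover by balls. This grid simplification is an assumption made here for brevity and is not the construction of Definition 3; the function names and the test data are hypothetical.

```python
import math

def box_count(points, delta):
    """Number of grid cells of side delta that contain at least one point."""
    return len({tuple(math.floor(c / delta) for c in p) for p in points})

def slope(xs, ys):
    """Least-squares slope of ys against xs."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

def box_counting_dimension(points, deltas):
    """Negative slope of log N_delta against log delta, following (6)."""
    log_delta = [math.log(d) for d in deltas]
    log_count = [math.log(box_count(points, d)) for d in deltas]
    return -slope(log_delta, log_count)

# Assumed test data: points on a diagonal line segment in the plane.
line = [(i / 4096, i / 4096) for i in range(4096)]
deltas = [2.0 ** -k for k in range(2, 8)]
d_line = box_counting_dimension(line, deltas)
```

For this segment the number of occupied cells doubles each time the scale is halved, so the estimate comes out as 1 up to rounding, as expected for a line.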

Although the box-counting method is simpler to calculate than the Hausdorff method, it still has higher computational complexity than the CD method [32]. Let $X = \{x_1, x_2, \ldots, x_n\}$ denote a dataset, $X \in R^{D \times n}$. The correlation integral $C(n, \delta)$ [34] can be defined as

$$C(n, \delta) = \frac{2}{n(n-1)} \sum_{i=1}^{n} \sum_{j=i+1}^{n} I\left(\|x_i - x_j\| < \delta\right), \tag{8}$$

where $\|x_i - x_j\|$ can be any metric between the data points $x_i$ and $x_j$, and $I(\cdot)$ is the Heaviside function, which is 1 if the condition is met and 0 otherwise. $C(n, \delta)$ is the fraction of pairwise distances that are less than $\delta$. It can also be written as

$$C(n, \delta) = \frac{\text{number of distances less than } \delta}{\text{number of distances altogether}}. \tag{9}$$

The CD is defined as

$$D = \lim_{\delta \to 0} \frac{\log C(n, \delta)}{\log \delta}. \tag{10}$$

Although (7) and (10) have the same form, their calculation processes are completely different. The numerator of the CD method represents a global bulk with scale $\delta$, whereas the numerator of the box-counting method stands for the minimum number of hyperspheres with scale $\delta$ that cover the dataset. Note that (10) cannot be applied directly to obtain the CD in practice. A commonly used scheme is to calculate the slope of the curve that relates $\log C(n, \delta)$ to $\log \delta$. Let $(\log \delta_1, \log C(n, \delta_1))$ and $(\log \delta_2, \log C(n, \delta_2))$ denote any two points of the curve; the slope is then defined as

$$D = \frac{\log C(n, \delta_2) - \log C(n, \delta_1)}{\log \delta_2 - \log \delta_1}, \tag{11}$$

and the accuracy of the CD method depends strongly on the choice of $\delta_1$ and $\delta_2$. To obtain an accurate CD, the linear portion of the log-log sequence $(\log \delta_k, \log C(n, \delta_k))$ is selected, and a straight line is then fitted to this linear portion.
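The correlation integral (8) and the two-point slope (11) can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the Euclidean metric is used, the two scales are picked by hand rather than from a fitted linear portion, and the function names and the uniform test data are hypothetical.

```python
import math
import random

def correlation_integral(points, delta):
    """Fraction of point pairs whose Euclidean distance is below delta, as in (8)."""
    n = len(points)
    close = 0
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(points[i], points[j]) < delta:
                close += 1
    return 2.0 * close / (n * (n - 1))

def two_point_slope(points, delta1, delta2):
    """Slope between two points of the log-log curve, as in (11)."""
    c1 = correlation_integral(points, delta1)
    c2 = correlation_integral(points, delta2)
    return (math.log(c2) - math.log(c1)) / (math.log(delta2) - math.log(delta1))

# Assumed test data: 400 points drawn uniformly from the unit square.
rng = random.Random(0)
square = [(rng.random(), rng.random()) for _ in range(400)]
d_square = two_point_slope(square, 0.05, 0.2)
```

For such a two-dimensional sample the slope should land near 2, somewhat below it because points close to the boundary of the square have fewer neighbors.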

3. Theoretical WCD Estimation

3.1. Analysis of WCD Estimation. From a geometric point of view, an object's bulk is directly related to its scale $\delta$ raised to the power of its dimension [31]. For example, the length of a straight line is proportional to the first power of the scale, and the area of a circle is proportional to the second power of the scale. The relationship between the bulk and $\delta$ can be described as

$$\text{bulk} \sim \delta^{\text{dimension}}, \tag{12}$$

where the bulk can be any measure such as a volume, an area, or a mass. Although many notions of bulk are possible, a good bulk function $\beta_{X_j}(\delta)$ is defined in the CD method [31]:

$$\beta_{X_j}(\delta) \approx \frac{1}{n-1} \sum_{i=1,\, i \neq j}^{n} I\left(\|x_i - x_j\| < \delta\right), \tag{13}$$

and (13) indicates that the local bulk is measured by the number of points falling into the hypersphere with scale $\delta$ centered at $x_j$. Note that the term $i = j$ is excluded, so the denominator is $n - 1$ rather than $n$. Since $\beta_{X_j}(\delta)$ is a local bulk, some averaging method should be used to obtain the global bulk. In the CD method, the arithmetic mean is used:

$$C(n, \delta) = \frac{1}{n} \sum_{j=1}^{n} \beta_{X_j}(\delta), \tag{14}$$

where $C(n, \delta)$ is the correlation integral, that is, the global bulk.

For a uniform dataset, a good result can be obtained by using the arithmetic mean for the correlation integral $C(n, \delta)$. However, for a nonuniform dataset it is unreasonable to treat every point equally, because the local bulk $\beta_{X_j}(\delta)$ differs from point to point. Here a weighted bulk approach is considered for the global bulk, that is, each local bulk enters the global bulk with a different weight. The global bulk can then be described as

$$C(n, \delta) = \sum_{j=1}^{n} W(j)\, \beta_{X_j}(\delta), \tag{15}$$

where $W$ is the weight vector.

The local bulk calculated in three cases (high-density points, sparse points, and boundary points) is shown in Figure 1. Leaving noise points aside, the local bulks estimated in a high-density area are clearly more reliable than those in the other two cases. It is therefore natural to increase the weights of the high-density area and to decrease those of the low-density and boundary areas for dimension estimation. An accurate estimate of the data distribution is thus important, and many methods exist for estimating the distribution of a dataset, such as probability distribution estimation methods and boundary detection methods. In this paper, the vertex degree of an undirected graph is used to measure the distribution of a dataset, upon which a novel and simple WCD method is proposed to improve the performance of the CD method. If the vertex degree is large, the area around the vertex is dense; otherwise, the vertex is a sparse point or a boundary point. Moreover, the vertex degree reflects the credibility of the estimated local bulk, so it is reasonable to use the vertex degree as the weight of the local bulk. Twenty points of the dataset in Figure 2 are marked by the vertex degree method: ten squares mark the points with the largest vertex degrees and ten circles mark those with the smallest. It can be seen that the dense area and the sparse or boundary points are distinguished correctly. Therefore, the WCD method is expected to estimate the intrinsic dimension more accurately than the CD method. The WCD method is described in detail in Algorithm 1.
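The weighting idea of (15), together with the vertex-degree weights and local bulks of Algorithm 1, can be sketched as follows. This is a simplified illustration and not the authors' implementation: the normalization of the data to [0, 1] is skipped, $\theta$ is passed in as a plain argument instead of being computed from the dataset, the weights are rescaled to sum to 1 (which only shifts $\log C$ by a constant and leaves the slope unchanged), and the two scales are picked by hand. All function names are hypothetical.

```python
import math
import random

def vertex_degree_weights(points, theta):
    """W(j) = sum_i exp(-||x_i - x_j||^2 / theta^2), rescaled to sum to 1 (assumption)."""
    raw = [sum(math.exp(-math.dist(p, q) ** 2 / theta ** 2) for q in points)
           for p in points]
    total = sum(raw)
    return [w / total for w in raw]

def local_bulk(points, j, delta):
    """Fraction of the other n - 1 points within distance delta of x_j, as in (13)."""
    n = len(points)
    hits = sum(1 for i in range(n)
               if i != j and math.dist(points[i], points[j]) < delta)
    return hits / (n - 1)

def weighted_correlation_integral(points, weights, delta):
    """Weighted global bulk, as in (15)."""
    return sum(w * local_bulk(points, j, delta) for j, w in enumerate(weights))

# Assumed test data: 200 points from a two-dimensional standard Gaussian.
rng = random.Random(1)
data = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
weights = vertex_degree_weights(data, theta=1.0)
c_small = weighted_correlation_integral(data, weights, 0.2)
c_large = weighted_correlation_integral(data, weights, 0.8)
wcd = (math.log(c_large) - math.log(c_small)) / (math.log(0.8) - math.log(0.2))
```

With uniform weights $1/n$ the weighted sum reduces to the arithmetic mean (14), so the CD method is recovered as a special case.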

Input: signal dataset $X$.
Output: intrinsic dimension $D$.
(1) Normalize the dataset $X$ between 0 and 1; the distance matrix $W_1$ can then be constructed by $W_1(j, i) = \|x_i - x_j\|$.
(2) Construct the similarity matrix $W_2(j, i) = \exp(-\|x_i - x_j\|^2 / \theta^2)$, where $\theta$ is the variance of the dataset. The vertex degree, that is, the weight vector, is defined as $W(j) = \sum_{i=1}^{n} W_2(j, i)$.
(3) Compute the scale sequence $(\delta_1, \delta_2, \ldots, \delta_m)$ by $\delta_k = \min(W_1) + k\,((\max(W_1) - \min(W_1))/m)$, $k = 1, 2, \ldots, m$, where $m$ is the number of scales $\delta$.
(4) Compute the local bulk $\beta_{X_j}(\delta_k)$ at point $x_j$: $\beta_{X_j}(\delta_k) \approx (1/(n-1)) \sum_{i=1,\, i \neq j}^{n} I(\|x_i - x_j\| < \delta_k)$, $j = 1, 2, \ldots, n$.
(5) At the scale $\delta_k$, compute the global bulk $C(n, \delta_k) = \sum_{j=1}^{n} W(j)\, \beta_{X_j}(\delta_k)$.
(6) Select the linear part of the log-log sequence $(\log \delta_k, \log C(n, \delta_k))$ by the $k$-means method and fit a line to the linear part by the linear least squares method.
(7) Calculate the correlation dimension as the slope of the fitted line: $D = (\log C(n, \delta_2) - \log C(n, \delta_1)) / (\log \delta_2 - \log \delta_1)$.

Algorithm 1: The calculating procedure of WCD.

[Figure 1: Different local bulks at three different points (sparse point, high density point, and boundary point).]

[Figure 2: The indication of falling into the circle at different location.]

3.2. Selecting the Linear Portion of the log-log Sequence. Selecting different portions of the log-log sequence to calculate the slope leads to different precision of the CD estimate. A log-log plot drawn from the log-log sequence $(\log \delta_k, \log C(n, \delta_k))$ is shown in Figure 3; it can be divided into three portions: the low portion, the middle portion, and the upper portion. In the low portion, the scale $\delta$ of the hyperspheres is small and only a few points fall into the hyperspheres, so even a few noise points can cause a large error, which is why the low portion fluctuates. In the upper portion, where the scale $\delta$ of the hyperspheres is larger than a specific value, the number of points falling into the hyperspheres no longer increases; the scatter plot of the dataset in Figure 4 illustrates this. This is why the upper portion bends down and approaches a plateau. Usually the middle portion is linear, which makes it well suited to estimating the CD of a dataset. To minimize the error caused by nonlinearity, only a small number of points should be chosen from the log-log sequence $(\log \delta_k, \log C(n, \delta_k))$, all lying in the linear portion of the sequence. However, to maximize the sample size, as many points as possible should be included. How can the linear points be chosen accurately from the log-log sequence? Because the three portions of the sequence have distinct characteristics, the $k$-means clustering method can be used to decide which pairs of the log-log sequence should be used for CD estimation. The $k$-means clustering method partitions the log-log sequence into three categories by minimizing the objective function

$$\arg\min_{S} \sum_{i=1}^{c} \sum_{y_j \in S_i} \|y_j - \mu_i\|^2, \tag{16}$$

where $y_j$ is a pair in the sequence $(\log \delta_k, \log C(n, \delta_k))$; $c = 3$ is the number of categories, namely the low portion, the middle portion, and the upper portion; $S_1$, $S_2$, and $S_3$ are the three categories; and $\mu_i$ is the mean of $S_i$. The points that belong to $S_2$ are then chosen to fit a line by the least squares method, and its slope is used as the estimate of the CD. The most important factor in the $k$-means method is the initial value of $\mu_i$. In this paper, the curve is divided equally into three portions, and the mean of each portion is used as the initial value of $\mu_i$.

[Figure 3: log-log plot for computation of CD. Horizontal axis: $\log(\delta)$; vertical axis: $\log C(n, \delta)$. The curve is labeled with its low, middle, and upper portions.]

[Figure 4: Bending explanation for the upper portion.]
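The selection of the linear portion by $k$-means with $c = 3$ and equal-thirds initialization can be sketched as below. This is a minimal illustration, not the authors' code: the clustering is a plain Lloyd iteration written from scratch, the function names are hypothetical, and the input is an assumed toy sequence (flat low portion, middle portion of slope 2, plateau) rather than a real log-log sequence.

```python
import math

def kmeans_three(seq, iterations=100):
    """Plain k-means with c = 3; initial means come from equal thirds of the sequence."""
    n = len(seq)
    thirds = [seq[:n // 3], seq[n // 3: 2 * n // 3], seq[2 * n // 3:]]
    means = [tuple(sum(col) / len(part) for col in zip(*part)) for part in thirds]
    labels = None
    for _ in range(iterations):
        new_labels = [min(range(3), key=lambda c: math.dist(p, means[c])) for p in seq]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(3):
            members = [p for p, lab in zip(seq, labels) if lab == c]
            if members:  # keep the old mean if a cluster becomes empty
                means[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels

def fit_slope(pairs):
    """Least-squares slope of the second coordinate against the first."""
    xs, ys = zip(*pairs)
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return (sum((x - mx) * (y - my) for x, y in pairs)
            / sum((x - mx) ** 2 for x in xs))

# Assumed toy sequence standing in for (log delta_k, log C(n, delta_k)).
low = [(x, 0) for x in range(-7, -2)]
middle = [(x, 2 * (x + 3)) for x in range(-2, 3)]
upper = [(x, 12) for x in range(3, 8)]
sequence = low + middle + upper

labels = kmeans_three(sequence)
middle_cluster = [p for p, lab in zip(sequence, labels) if lab == 1]
slope_mid = fit_slope(middle_cluster)
```

On this toy sequence the middle cluster keeps only the three central points, so the fitted slope is 2. The cluster does not have to cover the whole linear range; it only has to stay inside it.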

33 Complexity Analysis ofWCDMethod In this section thecomputational complexity of WCD method is investigatedand compared with CD method From the whole calculationprocess we can see that the local bulk of WCD methodcosts more calculations than that of CD method For theanalysis we assume that the sample size is 119899 The calculationof a local bulk 120573119895(120575) at point 119909119895 with scale 120575 requires 119899 minus 1

operations and the complexity is 119874(119899 minus 1) There are 119899 localbulks that should be calculated so all of the complexity is119874((119899 minus 1)

119899) However the CD method is only 119874((119899 minus 1))

In addition compared with CD method vertex degree needbe calculated in WCDmethod and the complexity cannot beignored when the sample size is huge All these seem that thecomputational complexity of WCD method is much higherthan CD method But actually it is unnecessary to calculateall local bulks of the dataset for WCD method We can onlyuse very few points to estimate the local bulks and can alsoget a high accuracy result The computational complexity ofWCDmethod is almost the same as CDmethod and this canbe proved by the following experiments

4. Experimental Study

Many factors affect the results of the WCD method, including the sample size, the intrinsic dimension, the selection of the linear portion of the log-log sequence, the number of local bulks used for the correlation integral $C(n, \delta)$, and the selection of scales. In our experiments, samples with different dimensions and sample sizes are generated by the MATLAB randn function; each sample is drawn independently from a Gaussian distribution. The performance of the WCD method is compared with that of the CD method, and the various factors are analyzed. The correlation dimensions obtained by the WCD and CD methods are depicted in Figures 5(a), 5(b), and 5(c) for three different sample sizes. Specifically, only sample sizes of 100, 200, and 500 and intrinsic dimensions of 3, 5, and 8 are plotted; the results are similar for other sample sizes and intrinsic dimensions. For each plot in Figure 5, the horizontal axis indicates the number of local bulks $\beta$, whose maximum value equals the sample size. The vertical axis represents the actual and the estimated values (via the WCD method and the CD method) of the intrinsic dimension. Each horizontal green line represents the actual intrinsic dimension for reference, each red dot denotes the intrinsic dimension estimated by the WCD method, and each black asterisk denotes the intrinsic dimension estimated by the CD method.

It can be observed from Figure 5 that the intrinsic dimensions calculated by the WCD method are more accurate than those calculated by the CD method. However, the front part of the curves plotted by the WCD method fluctuates frequently. This is because few local bulks are used for the intrinsic dimension estimation, which makes the result unstable. In addition, all the curves plotted by the WCD method slope downward as the number of local bulks increases, but they still converge to a good value. In general, the front part of the curves plotted by the WCD method is more precise than the latter part. The loss of precision is mainly caused by the data distribution: the high-density area is chosen first by the vertex degree method to calculate the local bulks, which leads to high accuracy, but as more sparse points or boundary points are used to calculate the local bulks, accuracy is lost. Hence, it is inferred that the number of points used to calculate the local bulks is one of the main factors in intrinsic dimension estimation. This also verifies the effectiveness of using a small number of high-density points to estimate the intrinsic dimension by the WCD method. Examining the curves estimated by both methods, when the sample size is fixed, the accuracy gradually decreases as the actual intrinsic dimension increases. The main reason is that, for the same sample size, the dataset becomes more and more sparse as the intrinsic dimension increases. Observing the curves in Figures 5(a), 5(b), and 5(c), it can be seen that, for the same actual intrinsic dimension, the accuracy of both methods tends to improve as the sample size increases. This is because the dataset becomes denser as the sample size increases. Additionally, the selection of the scales $\delta$ is an important factor affecting the performance of intrinsic dimension estimation. Smaller scales $\delta$ are easily affected by noise, whereas larger scales result in a saturation phenomenon in which the correlation integral $C(n, \delta)$ does not change as $\delta$ increases. In addition, using too many scales inevitably increases the computational cost, whereas using too few reduces the precision.

To analyze the calculation speed, we generate three-dimensional datasets with sample sizes from 100 to 4000 by the MATLAB randn function and estimate the intrinsic dimension by four methods (WCD, CD, MLE, and GMST). The computation time of all four methods is shown in Figure 6. It reveals that the GMST method costs the most computation time, while the WCD method, the MLE method, and the CD method cost almost the same calculation time. However, the computation speed of the WCD method clearly slows down as the number of local bulks increases.

[Figure 5: Estimated and actual intrinsic dimension for datasets on different sample size. Panels: (a) estimate of intrinsic dimension with 100 points; (b) with 200 points; (c) with 500 points. Horizontal axis: the number used to gather statistics $\beta$; vertical axis: actual value and estimate of intrinsic dimension. Legend: weighted correlation dimension, correlation dimension, intrinsic dimension.]

5. Empirical Results

In order to validate the proposed method, the WCD method is used to estimate the intrinsic dimension of two kinds of datasets (synthetic datasets and real-world datasets). Moreover, comparisons with the geodesic minimum spanning tree (GMST), correlation dimension (CD), and maximum likelihood estimation (MLE) methods are performed to further demonstrate the advantages of the method developed in this paper.

5.1. Synthetic Datasets. In this subsection, two synthetic datasets (the Koch curve and the S-curve) are investigated first. The sample sizes of the two datasets are listed in Table 1, and their plots are shown in Figures 7 and 8. The dimensions estimated by all methods are also listed in Table 1. The Koch curve originates from a line segment whose middle segment is repeatedly replaced by the two sides of an equilateral triangle. If a measure whose dimension is less than 1 is used to measure the Koch curve, its Hausdorff measure is infinite; if a two-dimensional measure is used, its Hausdorff measure is 0. So the intrinsic dimension of the Koch curve is between 1 and 2, and the dimensions estimated by the four methods fall into this range. Moreover, the data points in the S-curve dataset lie on a curved surface in three-dimensional space, so the intrinsic dimension of the S-curve dataset is 2. The results show that all the considered methods have high accuracy, of which the one developed in this paper performs best.

[Figure 6: Calculation time comparison. Horizontal axis: sample size; vertical axis: calculation time (s). Legend: improved method, GMST, corr dim, MLE.]

[Figure 7: Koch curve dataset.]

[Figure 8: S-curve dataset.]

Table 1: Intrinsic dimension estimation of synthetic datasets with different methods.

| Dataset | Data dim | Sample size | Improved corr dim | Corr dim | MLE | GMST |
|---|---|---|---|---|---|---|
| Koch curve | 1-2 | 1025 | 1.3801 | 1.1206 | 1.7481 | 1.3424 |
| S-curve | 2 | 2000 | 2.0005 | 1.9567 | 1.9865 | 2.0994 |
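The construction of the Koch curve described above can be sketched as follows. This is a hypothetical generator, not the authors' data: it starts from the unit segment, whereas the dataset in Figure 7 is drawn at a different scale. Five refinements give $4^5 + 1 = 1025$ vertices, which matches the sample size listed in Table 1; whether the dataset was produced this way is an assumption.

```python
import math

def koch_curve(level):
    """Vertices of the Koch curve after `level` refinements of the unit segment."""
    c60, s60 = 0.5, math.sqrt(3.0) / 2.0       # cos and sin of 60 degrees
    points = [(0.0, 0.0), (1.0, 0.0)]
    for _ in range(level):
        refined = [points[0]]
        for (x0, y0), (x1, y1) in zip(points, points[1:]):
            dx, dy = (x1 - x0) / 3.0, (y1 - y0) / 3.0
            a = (x0 + dx, y0 + dy)              # one third along the segment
            b = (x0 + 2 * dx, y0 + 2 * dy)      # two thirds along the segment
            # apex of the equilateral triangle: (dx, dy) rotated by 60 degrees, added to a
            apex = (a[0] + dx * c60 - dy * s60, a[1] + dx * s60 + dy * c60)
            refined.extend([a, apex, b, (x1, y1)])
        points = refined
    return points

koch = koch_curve(5)
similarity_dimension = math.log(4) / math.log(3)
```

Each refinement replaces one segment by four segments of one third the length, so the similarity dimension of the exact curve is $\log 4 / \log 3 \approx 1.26$, which lies in the range 1 to 2 stated in Table 1.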

5.2. Real Datasets. Following the same procedure as in Section 5.1, three real datasets (the laser generated data, the Ikeda map, and the Henon map) are analyzed in this subsection. The datasets are described as follows.

5.2.1. Laser Generated Data. The data were recorded from a far-infrared laser in a chaotic state [4]. The dataset consists of 1000 samples, and the attractor dimension is approximately 2.26. The plot is shown in Figure 9.

5.2.2. Ikeda Map. The Ikeda map [31] is a complex map defined by

$$z_{n+1} = a + R z_n \exp\left(i\left(\phi - \frac{p}{1 + |z_n|^2}\right)\right). \tag{17}$$

The Ikeda map is derived from a model of the plane-wave interactivity field in an optical ring laser. It is iterated many times, and the points $[\operatorname{Re}(z_n), \operatorname{Im}(z_n)]$ are plotted for $n = 2000$. Here $a = 1.0$, $R = 0.9$, $\phi = 0.4$, and $p = 6$. The intrinsic dimension of this attractor is approximately 1.7. The visualization of the map is shown in Figure 10.
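A minimal sketch of iterating (17) with the stated parameters is given below. The starting point `z0 = 0` and the `burn_in` iterations discarded before recording are assumptions made for illustration; the text does not specify them.

```python
import cmath


def ikeda_orbit(n, a=1.0, R=0.9, phi=0.4, p=6.0, z0=0j, burn_in=100):
    """Return n points [Re(z), Im(z)] of the Ikeda map after a transient."""
    z = z0
    points = []
    for k in range(n + burn_in):
        # One application of (17).
        z = a + R * z * cmath.exp(1j * (phi - p / (1.0 + abs(z) ** 2)))
        if k >= burn_in:  # ASSUMPTION: discard the first burn_in iterates
            points.append((z.real, z.imag))
    return points


ikeda_points = ikeda_orbit(2000)
```

Because the exponential factor has modulus one, each step satisfies |z_{n+1}| <= a + R |z_n|, so an orbit started at the origin stays inside the disc of radius a / (1 - R) = 10.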

5.2.3. Henon Map. The Henon map [31] is usually written in the form

$$\begin{aligned} x_{n+1} &= 1.0 - a x_n^2 + y_n, \\ y_{n+1} &= b x_n, \end{aligned} \tag{18}$$

with $a = 1.4$ and $b = 0.3$, which gives an attractor with an intrinsic dimension of approximately 1.3. The plot of the Henon map dataset for $n = 2000$ is shown in Figure 11.
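The recurrence (18) can be iterated directly, as in the following sketch. The starting point at the origin and the discarded `burn_in` transient are assumptions for illustration only.

```python
def henon_orbit(n, a=1.4, b=0.3, x0=0.0, y0=0.0, burn_in=100):
    """Return n points (x, y) of the Henon map after a transient."""
    x, y = x0, y0
    points = []
    for k in range(n + burn_in):
        # Both updates use the previous x, as in (18).
        x, y = 1.0 - a * x * x + y, b * x
        if k >= burn_in:  # ASSUMPTION: discard the first burn_in iterates
            points.append((x, y))
    return points


henon_points = henon_orbit(2000)
```

The tuple assignment matters here: updating `x` first and then computing `y` from the new `x` would implement a different map.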

To estimate the intrinsic dimension of the laser generated data, the phase space is reconstructed by delay-time embedding. Although Takens proved that the original state space of a dynamical system can be reconstructed as long


Table 2: Intrinsic dimension estimation of real datasets with different methods.

Dataset              | Data dim. | Sample size | Improved corr. dim. | Corr. dim. | MLE    | GMST
Laser generated data | 2.06      | 1000        | 2.1027              | 1.9379     | 2.7124 | 1.9842
Ikeda map            | 1.7       | 2000        | 1.6889              | 1.6348     | 1.8010 | 1.8082
Henon map            | 1.3       | 2000        | 1.3584              | 1.1989     | 1.5206 | 1.1962

Figure 9: Dataset of laser generated data.

Figure 10: Dataset of Ikeda map (Im(Z) versus Re(Z)).

as $m > 2D + 1$, where $m$ is the embedding dimension and $D$ denotes the intrinsic dimension of the attractor, it is nontrivial to choose the embedding parameters. If the product $(m - 1)\tau$ is too large, the reconstructed vectors become effectively decorrelated in phase space, which leads to an overestimated dimension. When the product $(m - 1)\tau$ is too small, the components of the reconstructed vectors become effectively redundant, which leads to an underestimated dimension. To allow comparison with [4], we select embedding dimension $m = 5$ and delay time $\tau = 10$. The dimensions of the Ikeda map and the Henon map are estimated directly by the dimension estimation methods, which avoids selecting $m$

Figure 11: Dataset of Henon map (y versus x).

and $\tau$. From Figures 10 and 11 we note that the thinner the attractor, the lower its dimension. The results are listed in Table 2, from which we can infer that the WCD method is also effective on the real datasets.
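The delay-time embedding used for the laser data can be sketched as below. The vector layout [s(t), s(t + tau), ..., s(t + (m - 1) tau)] is one common convention and is assumed here; the function name `delay_embed` is hypothetical.

```python
def delay_embed(series, m=5, tau=10):
    """Reconstruct phase-space vectors from a scalar time series.

    Each vector is [s[t], s[t + tau], ..., s[t + (m - 1) * tau]].
    """
    span = (m - 1) * tau  # the product (m - 1) * tau discussed in the text
    if len(series) <= span:
        raise ValueError("series is too short for this m and tau")
    return [[series[t + k * tau] for k in range(m)]
            for t in range(len(series) - span)]
```

With $m = 5$ and $\tau = 10$, a series of 1000 samples yields 960 reconstructed vectors of dimension 5, since the last 40 samples cannot start a complete vector.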

6. Conclusion

When the distribution of a dataset is nonuniform, the CD method for intrinsic dimension estimation suffers from a large bias. To address this issue, the WCD method has been proposed, with a weight vector determined by the vertex degree. The factors that influence the WCD method have also been analyzed, including the sample size, the selection of the linear portion of the log-log sequence, the number of local bulks used for the correlation integral $C(n, \delta)$, and the selection of scales. The WCD method is validated by experiments on synthetic and real-world datasets.

Compared with the CD method, the main drawback of the WCD method is that the computation slows down when many local bulks $\beta_X(\delta)$ are calculated. However, the experiments indicate that it is unnecessary to calculate all the local bulks of the dataset: using only a few points in the high-density area of the dataset also gives a good result. The experiments above show that the computational cost of the WCD method is almost the same as that of the CD method when fewer than 3500 local bulks are used. Moreover, density estimation by vertex degree is only applicable to a dataset drawn from a single distribution; when the dataset is drawn from multiple distributions, the WCD method fails, which should be studied further.


Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

References

[1] C. K. Loo, A. Samraj, and G. C. Lee, "Evaluation of methods for estimating fractal dimension in motor imagery-based brain computer interface," Discrete Dynamics in Nature and Society, vol. 2011, Article ID 724697, 8 pages, 2011.

[2] A. T. B. Jin, P. Y. Han, and L. H. Siong, "Eigenvector weighting function in face recognition," Discrete Dynamics in Nature and Society, vol. 2011, Article ID 521935, 15 pages, 2011.

[3] S. Wang, T. Shi, M. Zeng, L. Zhang, F. E. Alsaadi, and T. Hayat, "New results on robust finite-time boundedness of uncertain switched neural networks with time-varying delays," Neurocomputing, vol. 151, part 1, pp. 522–530, 2015.

[4] F. Camastra and M. Filippone, "A comparative evaluation of nonlinear dynamics methods for time series prediction," Neural Computing and Applications, vol. 18, no. 8, pp. 1021–1029, 2009.

[5] J. Luo, G. Li, and H. Liu, "Linear control of fractional-order financial chaotic systems with input saturation," Discrete Dynamics in Nature and Society, vol. 2014, Article ID 802429, 8 pages, 2014.

[6] K. M. Carter, R. Raich, and I. Hero, "On local intrinsic dimension estimation and its applications," IEEE Transactions on Signal Processing, vol. 58, no. 2, pp. 650–663, 2010.

[7] S. Yin, G. Wang, and X. Yang, "Robust PLS approach for KPI-related prediction and diagnosis against outliers and missing data," International Journal of Systems Science, vol. 45, no. 7, pp. 1375–1382, 2014.

[8] L. M. Elshenawy, S. Yin, A. S. Naik, and S. X. Ding, "Efficient recursive principal component analysis algorithms for process monitoring," Industrial and Engineering Chemistry Research, vol. 49, no. 1, pp. 252–259, 2010.

[9] E. E. Abusham and E. K. Wong, "Locally linear discriminate embedding for face recognition," Discrete Dynamics in Nature and Society, vol. 2009, Article ID 916382, 8 pages, 2009.

[10] S. Samudrala, K. Rajanr, and B. Ganapathysubramanian, "Data dimensionality reduction in materials science," in Informatics for Materials Science and Engineering: Data-Driven Discovery for Accelerated Experimentation and Application, vol. 1, pp. 97–98, Elsevier Science, 2013.

[11] K. M. Carter, R. Raich, and A. O. Hero, "On local intrinsic dimension estimation and its applications," IEEE Transactions on Signal Processing, vol. 58, no. 2, pp. 650–663, 2010.

[12] L. Liao, Y. Zhang, S. J. Maybank, and Z. Liu, "Intrinsic dimension estimation via nearest constrained subspace classifier," Pattern Recognition, vol. 47, no. 3, pp. 1485–1493, 2014.

[13] R. Heylen and P. Scheunders, "Hyperspectral intrinsic dimensionality estimation with nearest-neighbor distance ratios," IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 6, no. 2, pp. 570–579, 2013.

[14] S. Yin, S. X. Ding, X. Xie, and H. Luo, "A review on basic data-driven approaches for industrial process monitoring," IEEE Transactions on Industrial Electronics, vol. 61, no. 11, pp. 6418–6428, 2014.

[15] S. Ding, S. Yin, K. Peng, H. Hao, and B. Shen, "A novel scheme for key performance indicator prediction and diagnosis with application to an industrial hot strip mill," IEEE Transactions on Industrial Informatics, vol. 9, no. 4, pp. 2239–2247, 2012.

[16] F. Camastra, "Data dimensionality estimation methods: a survey," Pattern Recognition, vol. 36, no. 12, pp. 2945–2954, 2003.

[17] J. C. Harsanyi and C.-I. Chang, "Hyperspectral image classification and dimensionality reduction: an orthogonal subspace projection approach," IEEE Transactions on Geoscience and Remote Sensing, vol. 32, no. 4, pp. 779–785, 1994.

[18] E. Levina and P. J. Bickel, "Maximum likelihood estimation of intrinsic dimension," in Proceedings of the Annual Conference on Neural Information Processing Systems (NIPS '04), pp. 1092–1106, Vancouver, Canada, 2004.

[19] J. He, L. Ding, L. Jiang, Z. Li, and Q. Hu, "Intrinsic dimensionality estimation based on manifold assumption," Journal of Visual Communication and Image Representation, vol. 25, no. 5, pp. 740–747, 2014.

[20] F. Camastra and A. Vinciarelli, "Estimating the intrinsic dimension of data with a fractal-based method," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 10, pp. 1404–1407, 2002.

[21] M. Sadikin and I. Wasito, "Fractal dimension as a data dimensionality reduction method for anomaly detection in time series," in Proceedings of the 7th International Conference on Information & Communication Technologies (ICT '13), vol. 1, May 2013.

[22] Z. Feng and X. Sun, "Box-counting dimensions of fractal interpolation surfaces derived from fractal interpolation functions," Journal of Mathematical Analysis and Applications, vol. 412, no. 1, pp. 416–425, 2014.

[23] D. Sankar and T. Thomas, "Fractal features based on differential box counting method for the categorization of digital mammograms," International Journal of Computer Information System and Industrial Management Applications, vol. 2, pp. 11–19, 2010.

[24] Y.-C. Tzeng, K.-T. Fan, and K.-S. Chen, "A parallel differential box-counting algorithm applied to hyperspectral image classification," IEEE Geoscience and Remote Sensing Letters, vol. 9, no. 2, pp. 272–276, 2012.

[25] A. Yarlagadda, J. V. R. Murthy, and M. H. M. Krishna Prasad, "Estimating correlation dimension using multi layered grid and damped window model over data streams," Procedia Technology, vol. 10, pp. 797–804, 2013.

[26] A. R. Osborne and A. Provenzale, "Finite correlation dimension for stochastic systems with power-law spectra," Physica D: Nonlinear Phenomena, vol. 35, no. 3, pp. 357–381, 1989.

[27] M. Fan, X. Zhang, S. Chen, H. Bao, and S. Maybank, "Dimension estimation of image manifolds by minimal cover approximation," Neurocomputing, vol. 105, no. 1, pp. 19–29, 2013.

[28] V. Pestov, "An axiomatic approach to intrinsic dimension of a dataset," Neural Networks, vol. 21, no. 2-3, pp. 204–213, 2008.

[29] B. Kegl, "Intrinsic dimension estimation using packing numbers," in Proceedings of the 16th Annual Neural Information Processing Systems Conference (NIPS '02), pp. 681–688, December 2002.

[30] K. Johnsson, Manifold dimension estimation for omics data analysis: current methods and a novel approach [M.S. thesis], Lund University, 2011.

[31] J. Theiler, "Estimating fractal dimension," Journal of the Optical Society of America A, vol. 7, no. 6, pp. 1055–1073, 1990.

[32] D. Schleicher, "Hausdorff dimension, its properties, and its surprises," The American Mathematical Monthly, vol. 114, no. 6, pp. 509–528, 2007.

[33] D. Mo and S. H. Huang, "Fractal-based intrinsic dimension estimation and its application in dimensionality reduction," IEEE Transactions on Knowledge and Data Engineering, vol. 24, no. 1, pp. 59–71, 2012.

[34] P. Grassberger and I. Procaccia, "Measuring the strangeness of strange attractors," in The Theory of Chaotic Attractors, pp. 170–189, Springer, New York, NY, USA, 2004.


by PCA, factor analysis, or MDS. The classical probabilistic approach is maximum likelihood estimation (MLE) [18], which first estimates the probability distribution of a dataset and then estimates the intrinsic dimension by the maximum likelihood method. The accuracy of the intrinsic dimension depends entirely on the estimate of the probability distribution. The geometric approach includes the geodesic minimal spanning tree (GMST) and fractal methods. GMST constructs a sequence of minimal spanning trees [19] using the geodesic edge matrix and estimates the intrinsic dimension from the overall lengths of the MSTs. GMST is a global method that does not require estimating the multivariate density of the dataset, but its drawback is the restriction to isometric embeddings. Fractal dimension [20, 21] is a statistical index of the complexity of a dataset, which is commonly calculated by the box-counting method [22–24] and the CD method [25, 26].

In this paper, a WCD method is presented to improve the accuracy of the CD method. The remainder of this paper is organized as follows. Section 2 reviews previous work on dimension estimation. Section 3 presents a theoretical analysis of WCD estimation. Section 4 analyzes, through experiments, the influence of various factors on WCD. In Section 5, experiments on synthetic and real-world datasets are used to confirm the effectiveness of WCD. Finally, conclusions are drawn in Section 6.

2. Previous Work on Dimension Estimation

Informally, the intrinsic dimension of a dataset is the minimum number of independent variables that can completely describe the dataset, and it can be used to measure the complexity of a dataset. A smaller intrinsic dimension indicates a simpler dataset, and vice versa. An accurate estimate of the intrinsic dimension helps improve the performance of dimension reduction methods and feature extraction.

A detailed review of intrinsic dimension estimation methods can be found in [16], which summarizes almost all the typical methods to date, including Fukunaga-Olsen's method, near neighbor methods, TRN-based methods, projection techniques, multidimensional scaling methods, and fractal-based methods. Recently, some new intrinsic dimension estimation methods have been presented, such as the minimal cover method [27], the axiomatic method [28], the packing number method [29], and the expected absolute projection (EAP) method [30]. Each method has its own characteristics and therefore suits different datasets.

Fractal methods are a powerful tool for estimating the intrinsic dimension. Among the existing fractal methods, the Hausdorff dimension method, the box-counting dimension method, and the CD method are the most representative. Further details on fractal methods can be found in [31].

The Hausdorff dimension, which is derived from the Hausdorff measure, is the basis of fractal dimension. The Hausdorff measure [32] is therefore introduced first.

Definition 1 (Hausdorff measure). Let $(X, \rho)$ be a metric space. For any subset $U \subset X$, one defines a nonnegative function

$$H^D_\delta(X) = \inf\left\{ \sum_{i \in \mathbb{N}} \operatorname{diam}(U_i)^D : X \subseteq \bigcup_{i \in \mathbb{N}} U_i,\ U_i \text{ open},\ \operatorname{diam}(U_i) < \delta\ \ \forall i \in \mathbb{N} \right\}, \tag{1}$$

where $\operatorname{diam}(U) = \sup\{\rho(x, y) : x, y \in U\}$ represents the diameter of the subset $U$. The $D$-dimensional Hausdorff measure of $X$ can be defined as

$$H^D(X) = \lim_{\delta \to 0} H^D_\delta(X). \tag{2}$$

Definition 2 (Hausdorff dimension). The Hausdorff dimension of a set $X$ in a metric space $(X, \rho)$ is

$$\dim_H(X) = \inf\{D : H^D(X) = 0\} = \sup\{D : H^D(X) = \infty\}. \tag{3}$$

Hence, the Hausdorff dimension $D$ is the critical value at which the Hausdorff measure jumps from $\infty$ to 0. The Hausdorff dimension provides a rigorous theoretical framework for dimension estimation, from which many fractal dimension estimation methods can be derived. However, the Hausdorff dimension is difficult to compute in practice. The box-counting dimension, which is derived from the Hausdorff dimension, reduces this computational complexity.

Definition 3 (box-counting dimension). For a totally bounded set $X$ in a metric space, let $N_\delta(X)$ be the minimal number of balls with scale $\delta$ that cover $X$. The box-counting dimension is then [33]

$$\dim_{BC}(X) = \lim_{\delta \to 0} \frac{\log N_\delta(X)}{-\log \delta}, \tag{4}$$

and the necessary condition for the existence of the limit is that $N_\delta(X)$ follows a power law in $\delta$:

$$N_\delta(X) = c \cdot \delta^{-D}, \tag{5}$$

where $c$ is a constant. Taking the logarithm of (5) gives

$$\log N_\delta(X) = \log c - D \log \delta. \tag{6}$$

The box-counting dimension $D$ can be expressed as

$$D = \frac{\log c}{\log \delta} - \frac{\log N_\delta(X)}{\log \delta}, \tag{7}$$

and according to (7), in order to obtain a good estimate of $D$, the term $\log c / \log \delta$ must approach 0. In practice, because of the finite sample size or the value of $\delta$, $\log c / \log \delta$ cannot be completely eliminated. Usually the box-counting dimension is determined by fitting the linear part of the curve of $\log N_\delta(X)$ versus $\log \delta$, whose slope is $-D$ by (6).
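The slope-based estimate described above can be sketched as follows. Note the substitution: instead of a minimal cover by balls, the sketch counts occupied cells of an axis-aligned grid, which is the usual practical stand-in. The function names and the choice of scales are assumptions for illustration.

```python
import math


def box_count(points, delta):
    """Count the grid cells of side delta that contain at least one point."""
    return len({tuple(math.floor(c / delta) for c in p) for p in points})


def fit_slope(xs, ys):
    """Least-squares slope of ys against xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def box_counting_dimension(points, deltas):
    """Estimate D as the slope of log N_delta against -log delta, as in (6)."""
    xs = [-math.log(d) for d in deltas]
    ys = [math.log(box_count(points, d)) for d in deltas]
    return fit_slope(xs, ys)


# Points on a straight segment embedded in the plane; the estimate should be near 1.
segment = [((i + 0.5) / 4096.0, 0.0) for i in range(4096)]
scales = [2.0 ** -k for k in range(1, 7)]
estimate = box_counting_dimension(segment, scales)
```

Fitting over several scales rather than evaluating (7) at a single $\delta$ absorbs the constant $\log c$ into the intercept, which is the reason the slope is preferred in practice.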


Although the box-counting method is simpler to calculate than the Hausdorff method, it still has higher computational complexity than the CD method [32]. Let $X = \{x_1, x_2, \ldots, x_n\}$ denote a dataset, $X \in R^{D \times n}$. The correlation integral $C(n, \delta)$ [34] can be defined as

$$C(n, \delta) = \frac{2}{n(n-1)} \sum_{i=1}^{n} \sum_{j=i+1}^{n} I\left(\|x_i - x_j\| < \delta\right), \tag{8}$$

where $\|x_i - x_j\|$ can be any metric between data points $x_i$ and $x_j$, and $I(\cdot)$ is the Heaviside function, which is 1 if the condition is met and 0 otherwise. $C(n, \delta)$ is the fraction of pairwise distances that are less than $\delta$. It can also be written as

$$C(n, \delta) = \frac{\text{number of distances less than } \delta}{\text{number of distances altogether}}. \tag{9}$$

The CD is defined as

$$D = \lim_{\delta \to 0} \frac{\log C(n, \delta)}{\log \delta}. \tag{10}$$

Although (7) and (10) have the same form, their calculation processes are completely different. The numerator in the CD method represents a global bulk with scale $\delta$, whereas the numerator in the box-counting method is the minimum number of hyperspheres with scale $\delta$ that cover the dataset. Note that (10) cannot be applied directly to obtain the CD in practice. A commonly used scheme is to calculate the slope of the curve that relates $\log C(n, \delta)$ to $\log \delta$. Let $(\log \delta_1, \log C(n, \delta_1))$ and $(\log \delta_2, \log C(n, \delta_2))$ denote any two points of the curve; the slope is then defined as

$$D = \frac{\log C(n, \delta_2) - \log C(n, \delta_1)}{\log \delta_2 - \log \delta_1}, \tag{11}$$

and the accuracy of the CD method depends strongly on the choice of $\delta_1$ and $\delta_2$. To obtain an accurate CD, the linear portion of the log-log sequence $(\log \delta_k, \log C(n, \delta_k))$ is selected, and a straight line is then fitted to it.
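A direct, unoptimized sketch of the correlation integral (8) and the two-scale slope (11) is shown below. The Euclidean metric, the function names, and the two scales used in the example are assumptions chosen for illustration.

```python
import math


def correlation_integral(points, delta):
    """Fraction of point pairs whose Euclidean distance is below delta, as in (8)."""
    n = len(points)
    close = sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if math.dist(points[i], points[j]) < delta
    )
    return 2.0 * close / (n * (n - 1))


def two_scale_dimension(points, delta1, delta2):
    """Two-point slope of the log-log curve, as in (11)."""
    c1 = correlation_integral(points, delta1)
    c2 = correlation_integral(points, delta2)
    return (math.log(c2) - math.log(c1)) / (math.log(delta2) - math.log(delta1))


# Evenly spaced points on a line segment; the slope should be close to 1.
line = [(i / 200.0, 0.0) for i in range(200)]
estimate = two_scale_dimension(line, 0.052, 0.104)
```

The estimate falls slightly below 1 here because points near the two ends of the segment have fewer neighbours, a small-scale version of the boundary effect that motivates the weighting in Section 3.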

3. Theoretical WCD Estimation

3.1. Analysis of WCD Estimation. From a geometric point of view, an object's bulk scales as its scale $\delta$ raised to the power of its dimension [31]. For example, the length of a straight line is proportional to the first power of the scale, and the area of a circle is proportional to the second power of the scale. The relationship between the bulk and $\delta$ can be described as

$$\text{bulk} \sim \delta^{\text{dimension}}, \tag{12}$$

where the bulk can be any measure such as a volume, area, or mass. Although many notions of bulk are possible, a good bulk function $\beta_{X_j}(\delta)$ is defined in the CD method [31]:

$$\beta_{X_j}(\delta) \approx \frac{1}{n-1} \sum_{i=1,\, i \neq j}^{n} I\left(\|x_i - x_j\| < \delta\right), \tag{13}$$

and (13) indicates that the local bulk is given by the fraction of points that fall into the hypersphere with scale $\delta$ centered at $x_j$. Note that $i = j$ is excluded, which is why the denominator is $n - 1$ rather than $n$. Since $\beta_{X_j}(\delta)$ is a local bulk, some averaging method should be used to obtain the global bulk. In the CD method, the arithmetic mean is used:

$$C(n, \delta) = \frac{1}{n} \sum_{j=1}^{n} \beta_{X_j}(\delta), \tag{14}$$

where $C(n, \delta)$ is the correlation integral, that is, the global bulk.

For a uniform dataset, a good result can be obtained by using the arithmetic mean for the correlation integral $C(n, \delta)$. However, for a nonuniform dataset, it is unreasonable to treat every point equally, because the local bulk $\beta_{X_j}(\delta)$ differs from point to point. Here a weighted approach is considered for the global bulk, that is, each local bulk is given its own weight. The global bulk can then be described as

$$C(n, \delta) = \sum_{j=1}^{n} W(j)\, \beta_{X_j}(\delta), \tag{15}$$

where $W$ is the weight vector.

Local bulks calculated in three cases (high-density points, sparse points, and boundary points) are shown in Figure 1. Setting noise points aside, the local bulks estimated in high-density areas are clearly more reliable than those in the other two cases. It is therefore natural to increase the weights of high-density areas and to decrease those of low-density and boundary areas for dimension estimation. Accurate estimation of the data distribution is thus important, and there are many methods for estimating the distribution of a dataset, such as probability distribution estimation methods and boundary detection methods. In this paper, the vertex degree of an undirected graph is used to measure the distribution of a dataset, upon which a novel and simple WCD method is proposed to improve the performance of the CD method. If the vertex degree is large, the area around the vertex is dense; otherwise, the vertex is a sparse point or a boundary point. Moreover, the vertex degree reflects the credibility of the estimated local bulk, so it is reasonable to regard the vertex degree as the weight of the local bulk. In Figure 2, twenty points of a dataset are marked by the vertex degree method: ten squares mark the largest vertex degrees and ten circles mark the smallest. The dense area and the sparse or boundary points are distinguished correctly. Therefore, the WCD method is more accurate for intrinsic dimension estimation than the CD method. The WCD method is described in Algorithm 1.
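As a consistency check on (15), added here as an illustration rather than as part of the original derivation: with equal weights $W(j) = 1/n$ the weighted global bulk reduces to the arithmetic mean (14), which in turn equals the pairwise form (8). Rescaling all weights by a constant $\lambda > 0$, for example to make them sum to one (an assumption; the text does not state a normalization), shifts $\log C(n, \delta)$ by a constant and therefore leaves the fitted slope unchanged.

```latex
\begin{aligned}
W(j) = \tfrac{1}{n}
  &\;\Longrightarrow\;
  \sum_{j=1}^{n} W(j)\,\beta_{X_j}(\delta)
  = \frac{1}{n}\sum_{j=1}^{n}\beta_{X_j}(\delta)
  = \frac{1}{n(n-1)}\sum_{j=1}^{n}\sum_{i \neq j} I\left(\|x_i - x_j\| < \delta\right) \\
  &\;=\; \frac{2}{n(n-1)}\sum_{i=1}^{n}\sum_{j=i+1}^{n} I\left(\|x_i - x_j\| < \delta\right), \\[4pt]
\log\Big(\lambda \sum_{j=1}^{n} W(j)\,\beta_{X_j}(\delta)\Big)
  &\;=\; \log\lambda + \log\sum_{j=1}^{n} W(j)\,\beta_{X_j}(\delta), \qquad \lambda > 0 .
\end{aligned}
```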

3.2. Selecting the Linear Portion of the log-log Sequence. Selecting different portions of the log-log sequence to calculate the slope leads to different precision of the CD estimate. A log-log plot of the sequence $(\log \delta_k, \log C(n, \delta_k))$ is shown in Figure 3; it can be divided into three portions: the low portion, the middle portion, and the upper portion. In the low portion, the scale $\delta$ of the


Input: Signal dataset $X$
Output: Intrinsic dimension $D$
(1) Normalize the dataset $X$ between 0 and 1; then the distance matrix $W_1$ can be constructed by $W_1(j, i) = \|x_i - x_j\|$.
(2) Construct the similarity matrix $W_2(j, i) = \exp(-\|x_i - x_j\|^2 / \theta^2)$, where $\theta$ is the variance of the dataset. The vertex degree, that is, the weight vector, is defined as $W(j) = \sum_{i=1}^{n} W_2(j, i)$.
(3) The scale sequence $(\delta_1, \delta_2, \ldots, \delta_m)$ is computed by $\delta_k = \min(W_1) + k((\max(W_1) - \min(W_1))/m)$, $k = 1, 2, \ldots, m$, where $m$ is the number of scales $\delta$.
(4) Compute the local bulk $\beta_{X_j}(\delta_k)$ at point $x_j$: $\beta_{X_j}(\delta_k) \approx (1/(n-1)) \sum_{i=1,\, i \neq j}^{n} I(\|x_i - x_j\| < \delta_k)$, $j = 1, 2, \ldots, n$.
(5) At scale $\delta_k$, the global bulk is computed: $C(n, \delta_k) = \sum_{j=1}^{n} W(j)\, \beta_{X_j}(\delta_k)$.
(6) The linear part of the log-log sequence $(\log \delta_k, \log C(n, \delta_k))$ is selected by the $k$-means method, and a line is fitted to the linear part by the linear least squares method.
(7) The correlation dimension is calculated from the slope of the line: $D = (\log C(n, \delta_2) - \log C(n, \delta_1)) / (\log \delta_2 - \log \delta_1)$.

Algorithm 1: The calculating procedure of WCD.
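Steps (1) to (5) and (7) of Algorithm 1 can be sketched in plain Python as below. This is an illustrative sketch under several stated assumptions, not the authors' implementation: `theta` defaults to the mean pairwise distance because the text only calls it "the variance of the dataset"; the weights are normalized to sum to one; `min(W1)` is taken over distinct pairs; and the k-means selection of step (6) is replaced by an explicit `fit_range`, defaulting to the middle third of the sequence. The optional `n_bulks` argument keeps only the points with the largest vertex degree, following the idea of using a few high-density points.

```python
import math


def wcd_loglog(points, m=20, theta=None, n_bulks=None):
    """Return the pairs (log delta_k, log C(n, delta_k)) of a WCD sketch."""
    n, dim = len(points), len(points[0])
    # Step 1: rescale every coordinate to [0, 1] and build the distance matrix W1.
    lo = [min(p[d] for p in points) for d in range(dim)]
    span = [(max(p[d] for p in points) - lo[d]) or 1.0 for d in range(dim)]
    x = [[(p[d] - lo[d]) / span[d] for d in range(dim)] for p in points]
    w1 = [[math.dist(x[i], x[j]) for i in range(n)] for j in range(n)]
    pair_dists = [w1[j][i] for j in range(n) for i in range(j + 1, n)]
    # Step 2: Gaussian similarity and vertex degree W(j) = sum_i W2(j, i).
    if theta is None:
        theta = sum(pair_dists) / len(pair_dists)  # ASSUMPTION, see lead-in
    degree = [sum(math.exp(-(d / theta) ** 2) for d in w1[j]) for j in range(n)]
    # Optionally keep only the n_bulks points with the largest vertex degree.
    order = sorted(range(n), key=lambda j: degree[j], reverse=True)[:n_bulks]
    total = sum(degree[j] for j in order)
    weight = {j: degree[j] / total for j in order}  # ASSUMPTION: weights sum to 1
    # Step 3: m linearly spaced scales between the smallest and largest distance.
    d_min, d_max = min(pair_dists), max(pair_dists)
    seq = []
    for k in range(1, m + 1):
        delta = d_min + k * (d_max - d_min) / m
        c = 0.0
        for j in order:
            # Step 4: local bulk at x_j, excluding the point itself.
            inside = sum(1 for i in range(n) if i != j and w1[j][i] < delta)
            # Step 5: weighted contribution to the global bulk.
            c += weight[j] * inside / (n - 1)
        if c > 0.0:
            seq.append((math.log(delta), math.log(c)))
    return seq


def fit_slope(pairs):
    """Least-squares slope of a list of (x, y) pairs."""
    if len(pairs) < 2:
        raise ValueError("need at least two points to fit a line")
    mx = sum(p[0] for p in pairs) / len(pairs)
    my = sum(p[1] for p in pairs) / len(pairs)
    sxx = sum((p[0] - mx) ** 2 for p in pairs)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in pairs)
    return sxy / sxx


def wcd_dimension(points, m=20, fit_range=None, **kwargs):
    """Step 7: slope of the fitted line over the chosen part of the sequence."""
    seq = wcd_loglog(points, m=m, **kwargs)
    if fit_range is None:
        # ASSUMPTION: middle third instead of the k-means selection of step 6.
        fit_range = (len(seq) // 3, 2 * len(seq) // 3)
    return fit_slope(seq[fit_range[0]:fit_range[1]])
```

The cost is dominated by the distance matrix and the per-scale counts, so passing a small `n_bulks` reduces the inner loops from all $n$ points to the selected ones.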

Figure 1: Different local bulks at three different points (sparse point, high density point, and boundary point).

Figure 2: The indication of falling into the circle at different locations.

hyperspheres is small, and only a few points fall into the hyperspheres. Even a few noise points can therefore cause a large error, which is why the low portion fluctuates. In the upper portion, where the scale $\delta$ of the hyperspheres is larger than a specific value, the number of points falling into the hyperspheres no longer increases, as illustrated by the scatter plot of the dataset in Figure 4. This is why the upper portion bends down and approaches a

Figure 3: log-log plot for computation of CD (log C(n, δ) versus log(δ), with the low, middle, and upper portions marked).

Figure 4: Bending explanation for the upper portion.

plateau. Usually the middle portion is linear, which makes it well suited to estimating the CD of a dataset. To minimize the error caused by nonlinearity, we should choose only a few points from the log-log sequence $(\log \delta_k, \log C(n, \delta_k))$ and try our best


to choose the linear portion of the sequence. However, to maximize the sample size, we want to include as many points as possible. How can we accurately choose the linear points from the log-log sequence? Because the three portions of the sequence have clearly different characteristics, we can use the $k$-means clustering method to decide which pairs of the log-log sequence should be used for CD estimation. The $k$-means clustering method partitions the log-log sequence into three categories by minimizing the objective function

$$\arg\min_{S} \sum_{i=1}^{c} \sum_{y_j \in S_i} \left\| y_j - \mu_i \right\|^2, \tag{16}$$

where $y_j$ is a pair in the sequence $(\log \delta_k, \log C(n, \delta_k))$; $c = 3$ is the number of categories, which correspond to the low portion, the middle portion, and the upper portion; $S_1$, $S_2$, $S_3$ are the three categories; and $\mu_i$ is the mean of $S_i$. The points that belong to $S_2$ are then fitted with a line by the least squares method, and the slope is used to estimate the CD. The most important factor in the $k$-means method is the initial value of $\mu_i$. In this paper, the curve is divided equally into three portions, and the mean of each portion is used as the initial value of $\mu_i$.
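The selection step can be sketched with a plain Lloyd-style k-means, initialized from three equal parts of the sequence as described above. Using unscaled Euclidean distance in the log-log plane, the iteration cap, and the function names are assumptions made for this illustration.

```python
def kmeans_three_portions(pairs, iterations=100):
    """Label each (log delta, log C) pair as 0 (low), 1 (middle), or 2 (upper)."""
    n = len(pairs)
    cuts = [0, n // 3, 2 * n // 3, n]

    def mean(group):
        return tuple(sum(v) / len(group) for v in zip(*group))

    # Initial means: the sequence is split into three equal parts.
    centers = [mean(pairs[cuts[i]:cuts[i + 1]]) for i in range(3)]
    labels = None
    for _ in range(iterations):
        # Assignment step: nearest center in squared Euclidean distance.
        new_labels = [
            min(range(3),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in pairs
        ]
        if new_labels == labels:
            break  # assignments are stable, so (16) cannot decrease further
        labels = new_labels
        # Update step: recompute each mean; keep the old one if a cluster is empty.
        for c in range(3):
            members = [p for p, lab in zip(pairs, labels) if lab == c]
            if members:
                centers[c] = mean(members)
    return labels


def middle_portion(pairs):
    """Return the pairs assigned to the middle category S_2."""
    labels = kmeans_three_portions(pairs)
    return [p for p, lab in zip(pairs, labels) if lab == 1]
```

The input needs at least three pairs so that each initial part is nonempty. The pairs returned by `middle_portion` are the ones to pass to a least-squares line fit.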

3.3. Complexity Analysis of WCD Method. In this section, the computational complexity of the WCD method is investigated and compared with that of the CD method. From the whole calculation process, we can see that computing the local bulks in the WCD method costs more than in the CD method. For the analysis, we assume that the sample size is $n$. The calculation of a local bulk $\beta_j(\delta)$ at point $x_j$ with scale $\delta$ requires $n - 1$ operations, so its complexity is $O(n - 1)$. There are $n$ local bulks to be calculated, so the total complexity is $O((n - 1)n)$, whereas the CD method is only $O((n - 1))$. In addition, unlike the CD method, the WCD method needs to calculate the vertex degree, and this cost cannot be ignored when the sample size is huge. All of this suggests that the computational complexity of the WCD method is much higher than that of the CD method. In practice, however, it is unnecessary to calculate all the local bulks of the dataset for the WCD method: a small number of points suffices to estimate the local bulks and still gives an accurate result. The computational cost of the WCD method is then almost the same as that of the CD method, as the following experiments show.

4. Experimental Study

Many factors affect the results of the WCD method, including the sample size, the intrinsic dimension, the selection of the linear portion of the log-log sequence, the number of local bulks used for the correlation integral $C(n, \delta)$, and the selection of scales. In our experiments, samples with different dimensions and sample sizes are generated by the MATLAB randn function, so each sample is drawn independently from a Gaussian distribution. The performance of the WCD method is compared with that of the CD method, and the various factors are analyzed. The dimensions estimated by the WCD and CD methods are depicted in Figures 5(a), 5(b), and 5(c) for three different sample sizes. Specifically, only sample sizes of 100, 200, and 500 and intrinsic dimensions of 3, 5, and 8 are plotted; the results are similar for other sample sizes and intrinsic dimensions. In each plot in Figure 5, the horizontal axis indicates the number of local bulks $\beta$, whose maximum value equals the sample size. The vertical axis represents the actual and estimated values (via the WCD method and the CD method) of the intrinsic dimension. Each horizontal green line represents the actual intrinsic dimension for reference. Each red dot denotes the intrinsic dimension estimated by the WCD method. Each black asterisk denotes the intrinsic dimension estimated by the CD method.

It can be observed from Figure 5 that the intrinsic dimensions calculated by the WCD method are more accurate than those calculated by the CD method. However, the front part of each curve plotted by the WCD method fluctuates frequently. This is because few local bulks are used for intrinsic dimension estimation, which makes the result unstable. In addition, all the curves plotted by the WCD method slope downward as the number of local bulks increases, but they still converge to a good value. In general, the front part of the curves plotted by the WCD method is more precise than the latter part. The loss of precision is mainly caused by the data distribution. The high-density area is chosen first to calculate the local bulks by the vertex degree method, which leads to high accuracy. However, as more sparse points or boundary points are used to calculate the local bulks, accuracy is lost. Hence it is inferred that the number of points used to calculate the local bulks is one of the main factors in intrinsic dimension estimation. This also verifies the effectiveness of using a small number of high-density points to estimate the intrinsic dimension by the WCD method. Examining the curves estimated by both methods, when the sample size is fixed, the accuracy gradually decreases as the actual intrinsic dimension increases. The main reason is that, for a fixed sample size, the dataset becomes more and more sparse as the intrinsic dimension increases. Observing the curves in Figures 5(a), 5(b), and 5(c), it can be seen that the accuracy of both methods tends to improve as the sample size increases for the same actual intrinsic dimension. This is because the dataset becomes denser as the sample size increases. Additionally, the selection of scales $\delta$ is also an important factor affecting the performance of intrinsic dimension estimation. Smaller scales $\delta$ are easily affected by noise, whereas larger scales result in saturation, in which the correlation integral $C(n, \delta)$ no longer changes as $\delta$ increases. In addition, using too many scales inevitably increases the computational cost, while using too few reduces the precision.

To analyze the calculation speed, we generate three-dimensional datasets with sample sizes from 100 to 4000 using the MATLAB randn function and estimate the intrinsic dimension with the four methods (WCD, CD, MLE, and GMST). The computation time of all four methods is shown in Figure 6. It reveals that the GMST method costs the most computation time, while the WCD method, the MLE method, and the CD method cost almost the same calculation time. However, the WCD method slows down noticeably as the number of local bulks increases.

Figure 5: Estimated and actual intrinsic dimension for datasets with different sample sizes: (a) 100 points, (b) 200 points, and (c) 500 points. In each panel the horizontal axis is the number of local bulks $\beta$ used to gather statistics, the vertical axis is the actual value and estimate of intrinsic dimension, and the legend entries are weighted correlation dimension, correlation dimension, and intrinsic dimension.

5. Empirical Results

In order to validate the proposed method, the WCD method is used to estimate the intrinsic dimension of two kinds of datasets (synthetic datasets and real-world datasets). Moreover, comparisons with geodesic minimum spanning tree (GMST), correlation dimension (CD), and maximum likelihood estimation (MLE) are also performed to further demonstrate the advantage of the method developed in this paper.

5.1. Synthetic Datasets. In this subsection, two synthetic datasets (Koch curve and S-curve) are investigated first. The sample sizes of the two datasets are 2000, respectively, and plots are shown in Figures 7 and 8. The dimensions estimated by all methods are listed in Table 1. The Koch curve originates from a line whose middle segment is repeatedly replaced by an equilateral triangle. If we use a tool whose dimension is less than 1 to measure the Koch curve, its Hausdorff measure is infinite. If we use two dimensions to measure it, its Hausdorff measure is 0. So the intrinsic dimension of the Koch curve is between 1 and 2, and the dimensions estimated by the four methods fall into this range. Moreover, the data points in the S-curve dataset are contained in a curved surface in three-dimensional space, so the intrinsic dimension of the S-curve dataset is 2. The obtained results show that all the considered methods have high accuracy, among which the one developed in this paper performs best.

Figure 6: Calculation time comparison. Horizontal axis: sample size (500 to 4000); vertical axis: calculation time (s); legend: improved method, GMST, corr dim, and MLE.

Figure 7: Koch curve dataset.

Figure 8: S-curve dataset.

Table 1: Intrinsic dimension estimation of synthetic datasets with different methods.

| Dataset | Data dim | Sample size | Improved corr dim | Corr dim | MLE | GMST |
|---|---|---|---|---|---|---|
| Koch curve | 1-2 | 1025 | 1.3801 | 1.1206 | 1.7481 | 1.3424 |
| S-curve | 2 | 2000 | 2.0005 | 1.9567 | 1.9865 | 2.0994 |
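The Koch construction described in Section 5.1 can be sketched in a few lines. This is only an illustration: starting from the unit segment and pointing every triangle to the left of the direction of travel are assumptions of the sketch, and the paper does not state how its Koch dataset was sampled. Five refinements give $4^5 + 1 = 1025$ vertices, which equals the sample size listed for the Koch curve in Table 1, and the similarity dimension $\log 4 / \log 3 \approx 1.2619$ lies in the interval between 1 and 2 discussed above.

```python
import math


def koch_curve(iterations):
    """Vertices of the Koch curve on the unit segment after `iterations` refinements."""
    points = [(0.0, 0.0), (1.0, 0.0)]
    cos60, sin60 = 0.5, math.sqrt(3.0) / 2.0
    for _ in range(iterations):
        refined = [points[0]]
        for (x0, y0), (x1, y1) in zip(points, points[1:]):
            # One third of the current segment as a vector.
            dx, dy = (x1 - x0) / 3.0, (y1 - y0) / 3.0
            a = (x0 + dx, y0 + dy)
            b = (x0 + 2.0 * dx, y0 + 2.0 * dy)
            # Apex of the equilateral triangle: rotate (dx, dy) by 60 degrees about a.
            apex = (a[0] + dx * cos60 - dy * sin60, a[1] + dx * sin60 + dy * cos60)
            # The middle third (a to b) is replaced by the two sides a-apex and apex-b.
            refined.extend([a, apex, b, (x1, y1)])
        points = refined
    return points


curve = koch_curve(5)
print(len(curve))                 # number of vertices, 4**5 + 1
print(math.log(4) / math.log(3))  # similarity dimension of the Koch curve
```

Each refinement replaces every segment by four shorter ones, so the number of vertices after $k$ refinements is $4^k + 1$.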

5.2. Real Datasets. Following a process similar to that in Section 5.1, three real datasets (the laser generated data, the Ikeda map, and the Henon map) are analyzed in this subsection. The datasets are described as follows.

5.2.1. Laser Generated Data. The data were recorded from a far-infrared laser in a chaotic state [4]. The dataset consists of 1000 samples, and the attractor dimension is approximately 2.26. The plot is shown in Figure 9.

5.2.2. Ikeda Map. The Ikeda map [31] is a complex map defined by

$$z_{n+1} = a + R z_n \exp\left(i\left(\phi - \frac{p}{1 + |z_n|^2}\right)\right). \tag{17}$$

The Ikeda map is derived from a model of the plane-wave interactivity field in an optical ring laser. It is iterated many times, and the points $[\operatorname{Re}(z_n), \operatorname{Im}(z_n)]$ are plotted for $n = 2000$. Here $a = 1.0$, $R = 0.9$, $\phi = 0.4$, and $p = 6$. The intrinsic dimension of this attractor is approximately 1.7. The visualization of the map is shown in Figure 10.
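As an illustration, the Ikeda iteration (17) with the parameters quoted above can be written with Python complex numbers. The starting point $z_0 = 0$ and the number of discarded transient iterates are assumptions of this sketch, not values taken from the paper.

```python
import cmath


def ikeda_orbit(n_points, a=1.0, R=0.9, phi=0.4, p=6.0, z0=0j, discard=1000):
    """Iterate the Ikeda map (17) and return n_points pairs (Re z, Im z).

    z0 and discard are hypothetical choices made for this sketch only.
    """
    z = z0
    orbit = []
    for k in range(discard + n_points):
        z = a + R * z * cmath.exp(1j * (phi - p / (1.0 + abs(z) ** 2)))
        if k >= discard:
            orbit.append((z.real, z.imag))
    return orbit


ikeda = ikeda_orbit(2000)
print(len(ikeda))
```

Because $|z_{n+1}| \le a + R|z_n|$, an orbit started at the origin never leaves the disc of radius $a/(1 - R) = 10$, so the iteration is safe to run without overflow checks.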

5.2.3. Henon Map. The Henon map [31] is usually cast as a pair of equations of the form

$$x_{n+1} = 1.0 - a x_n^2 + y_n, \qquad y_{n+1} = b x_n, \tag{18}$$

with $a = 1.4$ and $b = 0.3$, which gives an attractor with intrinsic dimension of approximately 1.3. The plot of the Henon map dataset for $n = 2000$ is shown in Figure 11.
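A minimal sketch of iterating (18) follows. The initial condition $(0, 0)$ and the discarded transient are assumptions of this sketch; the paper does not specify them.

```python
def henon_orbit(n_points, a=1.4, b=0.3, x0=0.0, y0=0.0, discard=1000):
    """Iterate the Henon map (18) and return n_points pairs (x, y).

    x0, y0, and discard are hypothetical choices made for this sketch only.
    """
    x, y = x0, y0
    orbit = []
    for k in range(discard + n_points):
        # Both coordinates are updated from the previous state simultaneously.
        x, y = 1.0 - a * x * x + y, b * x
        if k >= discard:
            orbit.append((x, y))
    return orbit


henon = henon_orbit(2000)
print(len(henon))
```

Starting from $(0, 0)$, the first two iterates are $(1.0, 0.0)$ and $(-0.4, 0.3)$, which is a quick way to check an implementation by hand.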

To estimate the intrinsic dimension of the laser generated data, the phase space is reconstructed by the delay-time embedding technique. Although Takens proved that the original state space of a dynamical system can be reconstructed as long as $m > 2D + 1$, where $m$ is the embedding dimension and $D$ denotes the intrinsic dimension of the attractor, it is nontrivial to choose the embedding parameters. If the product $(m - 1)\tau$ is too large, the components of the reconstructed vectors become effectively decorrelated in phase space, which leads to a larger dimension estimate. When the product $(m - 1)\tau$ is too small, the components of the reconstructed vectors become effectively redundant, which leads to a smaller dimension estimate. To make the results comparable with [4], we select embedding dimension $m = 5$ and delay time $\tau = 10$. Furthermore, the dimensions of the Ikeda map and the Henon map are estimated directly by the dimension estimation methods, which avoids selecting $m$ and $\tau$. From Figures 10 and 11, we note that the thinner the attractor is, the lower its dimension. The results are listed in Table 2, from which we can infer that the WCD method is also effective on the real datasets.

Table 2: Intrinsic dimension estimation of real datasets with different methods.

| Dataset | Data dim | Sample size | Improved corr dim | Corr dim | MLE | GMST |
|---|---|---|---|---|---|---|
| Laser generated data | 2.06 | 1000 | 2.1027 | 1.9379 | 2.7124 | 1.9842 |
| Ikeda map | 1.7 | 2000 | 1.6889 | 1.6348 | 1.8010 | 1.8082 |
| Henon map | 1.3 | 2000 | 1.3584 | 1.1989 | 1.5206 | 1.1962 |

Figure 9: Dataset of laser generated data.

Figure 10: Dataset of Ikeda map. Horizontal axis: Re(Z); vertical axis: Im(Z).

Figure 11: Dataset of Henon map. Horizontal axis: x; vertical axis: y.
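The delay-time embedding used for the laser series can be sketched as follows. The forward-delay convention (each vector starts at sample $t$ and looks ahead) is an assumption of this sketch; the paper only gives $m = 5$ and $\tau = 10$.

```python
def delay_embed(series, m, tau):
    """Delay vectors (s[t], s[t + tau], ..., s[t + (m - 1) * tau]) for every valid t."""
    span = (m - 1) * tau
    return [
        tuple(series[t + k * tau] for k in range(m))
        for t in range(len(series) - span)
    ]


# Placeholder series of 1000 samples; real laser data would be used instead.
samples = list(range(1000))
vectors = delay_embed(samples, m=5, tau=10)
print(len(vectors), vectors[0])
```

With 1000 samples, $m = 5$, and $\tau = 10$, each vector spans $(m - 1)\tau = 40$ samples, so $1000 - 40 = 960$ five-dimensional points are available to the dimension estimators.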

6. Conclusion

When the distribution of a dataset is nonuniform, the CD method for intrinsic dimension estimation suffers from large bias. To address this issue, the WCD method has been proposed, with an optimized weight vector determined by the vertex degree. The factors influencing the WCD method have also been analyzed, including the sample size, the selection of the linear portion of the log-log sequence, the number of local bulks used for the correlation integral $C(n, \delta)$, and the selection of scales. The WCD method is validated by experiments on synthetic datasets and real-world datasets.

Compared with the CD method, the main drawback of the WCD method is that the computation slows down when many local bulks $\beta_X(\delta)$ are calculated. However, the experiments indicate that it is unnecessary to calculate all the local bulks of the dataset; using only a few points in the high-density area of the dataset also gives a good result. From the above experiments, it can be seen that the computational cost of the WCD method is almost the same as that of the CD method when fewer than 3500 local bulks are used. Moreover, the density estimation of a dataset by vertex degree is only applicable to a single distribution; when the dataset follows multiple distributions, the WCD method will fail, which should be further studied.


Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

References

[1] C. K. Loo, A. Samraj, and G. C. Lee, "Evaluation of methods for estimating fractal dimension in motor imagery-based brain computer interface," Discrete Dynamics in Nature and Society, vol. 2011, Article ID 724697, 8 pages, 2011.

[2] A. T. B. Jin, P. Y. Han, and L. H. Siong, "Eigenvector weighting function in face recognition," Discrete Dynamics in Nature and Society, vol. 2011, Article ID 521935, 15 pages, 2011.

[3] S. Wang, T. Shi, M. Zeng, L. Zhang, F. E. Alsaadi, and T. Hayat, "New results on robust finite-time boundedness of uncertain switched neural networks with time-varying delays," Neurocomputing, vol. 151, part 1, pp. 522–530, 2015.

[4] F. Camastra and M. Filippone, "A comparative evaluation of nonlinear dynamics methods for time series prediction," Neural Computing and Applications, vol. 18, no. 8, pp. 1021–1029, 2009.

[5] J. Luo, G. Li, and H. Liu, "Linear control of fractional-order financial chaotic systems with input saturation," Discrete Dynamics in Nature and Society, vol. 2014, Article ID 802429, 8 pages, 2014.

[6] K. M. Carter, R. Raich, and I. Hero, "On local intrinsic dimension estimation and its applications," IEEE Transactions on Signal Processing, vol. 58, no. 2, pp. 650–663, 2010.

[7] S. Yin, G. Wang, and X. Yang, "Robust PLS approach for KPI-related prediction and diagnosis against outliers and missing data," International Journal of Systems Science, vol. 45, no. 7, pp. 1375–1382, 2014.

[8] L. M. Elshenawy, S. Yin, A. S. Naik, and S. X. Ding, "Efficient recursive principal component analysis algorithms for process monitoring," Industrial and Engineering Chemistry Research, vol. 49, no. 1, pp. 252–259, 2010.

[9] E. E. Abusham and E. K. Wong, "Locally linear discriminate embedding for face recognition," Discrete Dynamics in Nature and Society, vol. 2009, Article ID 916382, 8 pages, 2009.

[10] S. Samudrala, K. Rajan, and B. Ganapathysubramanian, "Data dimensionality reduction in materials science," in Informatics for Materials Science and Engineering: Data-Driven Discovery for Accelerated Experimentation and Application, vol. 1, pp. 97–98, Elsevier Science, 2013.

[11] K. M. Carter, R. Raich, and A. O. Hero, "On local intrinsic dimension estimation and its applications," IEEE Transactions on Signal Processing, vol. 58, no. 2, pp. 650–663, 2010.

[12] L. Liao, Y. Zhang, S. J. Maybank, and Z. Liu, "Intrinsic dimension estimation via nearest constrained subspace classifier," Pattern Recognition, vol. 47, no. 3, pp. 1485–1493, 2014.

[13] R. Heylen and P. Scheunders, "Hyperspectral intrinsic dimensionality estimation with nearest-neighbor distance ratios," IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 6, no. 2, pp. 570–579, 2013.

[14] S. Yin, S. X. Ding, X. Xie, and H. Luo, "A review on basic data-driven approaches for industrial process monitoring," IEEE Transactions on Industrial Electronics, vol. 61, no. 11, pp. 6418–6428, 2014.

[15] S. Ding, S. Yin, K. Peng, H. Hao, and B. Shen, "A novel scheme for key performance indicator prediction and diagnosis with application to an industrial hot strip mill," IEEE Transactions on Industrial Informatics, vol. 9, no. 4, pp. 2239–2247, 2012.

[16] F. Camastra, "Data dimensionality estimation methods: a survey," Pattern Recognition, vol. 36, no. 12, pp. 2945–2954, 2003.

[17] J. C. Harsanyi and C.-I. Chang, "Hyperspectral image classification and dimensionality reduction: an orthogonal subspace projection approach," IEEE Transactions on Geoscience and Remote Sensing, vol. 32, no. 4, pp. 779–785, 1994.

[18] E. Levina and P. J. Bickel, "Maximum likelihood estimation of intrinsic dimension," in Proceedings of the Annual Conference on Neural Information Processing Systems (NIPS '04), pp. 1092–1106, Vancouver, Canada, 2004.

[19] J. He, L. Ding, L. Jiang, Z. Li, and Q. Hu, "Intrinsic dimensionality estimation based on manifold assumption," Journal of Visual Communication and Image Representation, vol. 25, no. 5, pp. 740–747, 2014.

[20] F. Camastra and A. Vinciarelli, "Estimating the intrinsic dimension of data with a fractal-based method," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 10, pp. 1404–1407, 2002.

[21] M. Sadikin and I. Wasito, "Fractal dimension as a data dimensionality reduction method for anomaly detection in time series," in Proceedings of the 7th International Conference on Information & Communication Technologies (ICT '13), vol. 1, May 2013.

[22] Z. Feng and X. Sun, "Box-counting dimensions of fractal interpolation surfaces derived from fractal interpolation functions," Journal of Mathematical Analysis and Applications, vol. 412, no. 1, pp. 416–425, 2014.

[23] D. Sankar and T. Thomas, "Fractal features based on differential box counting method for the categorization of digital mammograms," International Journal of Computer Information System and Industrial Management Applications, vol. 2, pp. 11–19, 2010.

[24] Y.-C. Tzeng, K.-T. Fan, and K.-S. Chen, "A parallel differential box-counting algorithm applied to hyperspectral image classification," IEEE Geoscience and Remote Sensing Letters, vol. 9, no. 2, pp. 272–276, 2012.

[25] A. Yarlagadda, J. V. R. Murthy, and M. H. M. Krishna Prasad, "Estimating correlation dimension using multi layered grid and damped window model over data streams," Procedia Technology, vol. 10, pp. 797–804, 2013.

[26] A. R. Osborne and A. Provenzale, "Finite correlation dimension for stochastic systems with power-law spectra," Physica D: Nonlinear Phenomena, vol. 35, no. 3, pp. 357–381, 1989.

[27] M. Fan, X. Zhang, S. Chen, H. Bao, and S. Maybank, "Dimension estimation of image manifolds by minimal cover approximation," Neurocomputing, vol. 105, no. 1, pp. 19–29, 2013.

[28] V. Pestov, "An axiomatic approach to intrinsic dimension of a dataset," Neural Networks, vol. 21, no. 2-3, pp. 204–213, 2008.

[29] B. Kegl, "Intrinsic dimension estimation using packing numbers," in Proceedings of the 16th Annual Neural Information Processing Systems Conference (NIPS '02), pp. 681–688, December 2002.

[30] K. Johnsson, Manifold dimension estimation for omics data analysis: current methods and a novel approach [M.S. thesis], Lund University, 2011.

[31] J. Theiler, "Estimating fractal dimension," Journal of the Optical Society of America A, vol. 7, no. 6, pp. 1055–1073, 1990.

[32] D. Schleicher, "Hausdorff dimension, its properties, and its surprises," The American Mathematical Monthly, vol. 114, no. 6, pp. 509–528, 2007.

[33] D. Mo and S. H. Huang, "Fractal-based intrinsic dimension estimation and its application in dimensionality reduction," IEEE Transactions on Knowledge and Data Engineering, vol. 24, no. 1, pp. 59–71, 2012.

[34] P. Grassberger and I. Procaccia, "Measuring the strangeness of strange attractors," in The Theory of Chaotic Attractors, pp. 170–189, Springer, New York, NY, USA, 2004.


Although the box-counting method is simpler to calculate than the Hausdorff method, it still has higher computational complexity than the CD method [32]. Let $X = \{x_1, x_2, \ldots, x_n\}$ denote a dataset, $X \in R^{D \times n}$. The correlation integral $C(n, \delta)$ [34] can be defined as

$$C(n, \delta) = \frac{2}{n(n-1)} \sum_{i=1}^{n} \sum_{j=i+1}^{n} I\left(\|x_i - x_j\| < \delta\right), \tag{8}$$

where $\|x_i - x_j\|$ can be any metric between data points $x_i$ and $x_j$, and $I(\cdot)$ is the Heaviside function, which is 1 if the condition is met and 0 otherwise. $C(n, \delta)$ is the fraction of pairwise distances that are less than $\delta$. It can also be written as

$$C(n, \delta) = \frac{\text{distances less than } \delta}{\text{distances altogether}}. \tag{9}$$

The CD is defined as

$$D = \lim_{\delta \to 0} \frac{\log C(n, \delta)}{\log \delta}. \tag{10}$$

Although (7) and (10) have a similar form, their calculation processes are completely different. The numerator of the CD method represents a global bulk at scale $\delta$, whereas the numerator of the box-counting method stands for the minimum number of hyperspheres of scale $\delta$ that cover the dataset. Note that (10) cannot be applied directly to obtain the CD in practice. A commonly used scheme is to calculate the slope of the curve that relates $\log C(n, \delta)$ to $\log \delta$. Let $(\log \delta_1, \log C(n, \delta_1))$ and $(\log \delta_2, \log C(n, \delta_2))$ denote any two points of the curve; the slope is then defined as

$$D = \frac{\log C(n, \delta_2) - \log C(n, \delta_1)}{\log \delta_2 - \log \delta_1}, \tag{11}$$

and the accuracy of the CD method depends strongly on the choice of $\delta_1$ and $\delta_2$. To obtain an accurate CD, the linear portion of the log-log sequence $(\log \delta_k, \log C(n, \delta_k))$ is selected, and a straight line is then fitted to the linear portion.
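The correlation integral (8) and the two-point slope (11) can be illustrated with a short sketch. The Euclidean metric, the uniform test data, and the two scales 0.05 and 0.2 are assumptions chosen for the illustration; they are not the settings used in the paper.

```python
import math
import random


def correlation_integral(points, delta):
    """Fraction of point pairs whose Euclidean distance is below delta, as in (8)."""
    n = len(points)
    close_pairs = 0
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(points[i], points[j]) < delta:
                close_pairs += 1
    return 2.0 * close_pairs / (n * (n - 1))


def two_point_slope(points, delta1, delta2):
    """Slope of log C(n, delta) against log delta between two scales, as in (11)."""
    c1 = correlation_integral(points, delta1)
    c2 = correlation_integral(points, delta2)
    return (math.log(c2) - math.log(c1)) / (math.log(delta2) - math.log(delta1))


# Illustrative data: 400 points drawn uniformly from the unit square,
# a set whose intrinsic dimension is 2.
rng = random.Random(0)
square = [(rng.random(), rng.random()) for _ in range(400)]
estimate = two_point_slope(square, 0.05, 0.2)
print(round(estimate, 2))
```

On such data the estimate typically falls somewhat below 2, because points near the boundary of the square have fewer neighbours than interior points. This is the kind of boundary effect that motivates the weighting introduced in Section 3.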

3. Theoretical WCD Estimation

3.1. Analysis of WCD Estimation. From a geometric point of view, an object's bulk is directly related to the dimension power of its scale $\delta$ [31]. For example, the length of a straight line is the first power of the scale, and the area of a circle is the second power of the scale. The relationship between the bulk and $\delta$ can be described as

$$\text{bulk} \sim \delta^{\text{dimension}}, \tag{12}$$

where the bulk can be any measure such as a volume, area, or mass. Although many notions of bulk are possible, a good quantity for the bulk function $\beta_{X_j}(\delta)$ is defined in the CD method [31]:

$$\beta_{X_j}(\delta) \approx \frac{1}{n-1} \sum_{i=1,\, i \neq j}^{n} I\left(\|x_i - x_j\| < \delta\right), \tag{13}$$

and (13) indicates that the local bulk is given by the number of points falling into the hypersphere of scale $\delta$ centered at $x_j$, normalized by $n - 1$. Note that $i = j$ is excluded, which is why the denominator is $n - 1$ rather than $n$. Since $\beta_{X_j}(\delta)$ is a local bulk, some averaging method should be used for the global bulk. In the CD method, the arithmetic average is used:

$$C(n, \delta) = \frac{1}{n} \sum_{j=1}^{n} \beta_{X_j}(\delta), \tag{14}$$

where $C(n, \delta)$ is the correlation integral, that is, the global bulk.

For a uniform dataset, a good result can be obtained by the arithmetic average for the correlation integral $C(n, \delta)$. However, for a nonuniform dataset, it is unreasonable to treat every point equally because the local bulk $\beta_{X_j}(\delta)$ differs from point to point. Here, a weighted bulk approach is considered for the global bulk, that is, each local bulk is given a different weight; the global bulk can then be described as

$$C(n, \delta) = \sum_{j=1}^{n} W(j)\, \beta_{X_j}(\delta), \tag{15}$$

where $W$ is the weight vector.

Local bulks calculated in three cases, at high-density points, sparse points, and boundary points, are shown in Figure 1. Without considering noise points, it is obvious that the local bulks estimated in the high-density area are more reliable than those in the other two cases. It is natural to increase the weights of the high-density area and simultaneously decrease those of the low-density area and the boundary area for dimension estimation. So accurate estimation of the data distribution is important, and there are many methods for estimating the distribution of a dataset, such as probability distribution estimation methods and boundary detection methods. In this paper, the vertex degree of an undirected graph is used to measure the distribution of a dataset, upon which a novel and simple WCD method is proposed to improve the performance of the CD method. If the vertex degree is large, the area around the vertex is dense; otherwise, the vertex is a sparse point or a boundary point. Moreover, the vertex degree reflects the credibility of the estimated local bulk, so it is reasonable to regard the vertex degree as the weight of the local bulk. Twenty points are marked by the vertex degree method in the dataset in Figure 2, in which ten squares mark the points with the largest vertex degrees and ten circles mark those with the smallest. We can see that the dense area and the sparse or boundary points are distinguished correctly. Therefore, the WCD method is more accurate for intrinsic dimension estimation than the CD method. The specific description of the WCD method is given in Algorithm 1.
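The weighting idea of this subsection can be sketched as follows, using the Gaussian similarity and vertex degree that Algorithm 1 specifies. The kernel width `theta = 0.1`, the synthetic data, and the normalization of the weights to sum to one are assumptions of this sketch. Rescaling all weights by a constant only shifts $\log C(n, \delta)$ by a constant, so it does not change the fitted slope.

```python
import math
import random


def vertex_degrees(points, theta):
    """Vertex degree W(j): sum of Gaussian similarities between x_j and every point."""
    return [
        sum(math.exp(-math.dist(xi, xj) ** 2 / theta ** 2) for xi in points)
        for xj in points
    ]


def local_bulk(points, j, delta):
    """Local bulk of (13): fraction of the other n - 1 points closer than delta to x_j."""
    n = len(points)
    inside = sum(
        1 for i in range(n) if i != j and math.dist(points[i], points[j]) < delta
    )
    return inside / (n - 1)


def weighted_global_bulk(points, delta, weights):
    """Weighted global bulk of (15)."""
    return sum(w * local_bulk(points, j, delta) for j, w in enumerate(weights))


# Illustrative nonuniform data: a dense Gaussian blob plus a few scattered points.
rng = random.Random(1)
dense = [(rng.gauss(0.5, 0.05), rng.gauss(0.5, 0.05)) for _ in range(60)]
sparse = [(rng.random(), rng.random()) for _ in range(10)]
data = dense + sparse

degrees = vertex_degrees(data, theta=0.1)  # theta = 0.1 is a hypothetical choice
total = sum(degrees)
weights = [d / total for d in degrees]     # normalization is an assumption of this sketch

uniform = [1.0 / len(data)] * len(data)
print(weighted_global_bulk(data, 0.1, uniform))  # plain average, as in (14)
print(weighted_global_bulk(data, 0.1, weights))  # vertex-degree weighting, as in (15)
```

With uniform weights $1/n$ the weighted bulk reduces to the arithmetic average (14), so the CD method is recovered as a special case.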

Algorithm 1: The calculating procedure of WCD.

Input: signal dataset $X$.
Output: intrinsic dimension $D$.
(1) Normalize the dataset $X$ between 0 and 1; then the distance matrix $W_1$ can be constructed by $W_1(j, i) = \|x_i - x_j\|$.
(2) Construct the similarity matrix $W_2(j, i) = \exp(-\|x_i - x_j\|^2 / \theta^2)$, where $\theta$ is the variance of the dataset. The vertex degree, that is, the weight vector, is defined as $W(j) = \sum_{i=1}^{n} W_2(j, i)$.
(3) The scale sequence $(\delta_1, \delta_2, \ldots, \delta_m)$ is computed by $\delta_k = \min(W_1) + k((\max(W_1) - \min(W_1))/m)$, $k = 1, 2, \ldots, m$, where $m$ is the number of scales $\delta$.
(4) Compute the local bulk $\beta_{X_j}(\delta_k)$ at point $x_j$: $\beta_{X_j}(\delta_k) \approx \frac{1}{n-1} \sum_{i=1,\, i \neq j}^{n} I(\|x_i - x_j\| < \delta_k)$, $j = 1, 2, \ldots, n$.
(5) At the scale $\delta_k$, the global bulk is computed: $C(n, \delta_k) = \sum_{j=1}^{n} W(j)\, \beta_{X_j}(\delta_k)$.
(6) The linear part of the log-log sequence $(\log \delta_k, \log C(n, \delta_k))$ is selected by the $k$-means method, and a curve is fitted to the linear part by the linear least squares method.
(7) The correlation dimension is calculated from the slope of the curve: $D = (\log C(n, \delta_2) - \log C(n, \delta_1)) / (\log \delta_2 - \log \delta_1)$.

Figure 1: Different local bulks at three different points (sparse point, high density point, and boundary point).

Figure 2: The indication of falling into the circle at different location.

3.2. Selecting the Linear Portion of the log-log Sequence. Selecting different portions of the log-log sequence to calculate the slope leads to different precision of the CD estimation. A log-log plot drawn from the log-log sequence $(\log \delta_k, \log C(n, \delta_k))$ is shown in Figure 3, and it can be divided into three portions: the low portion, the middle portion, and the upper portion. In the low portion, the scale $\delta$ of the hyperspheres is small and only a few points fall into the hyperspheres. So even a few noise points can cause great error, which is why the low portion fluctuates. Besides, in the upper portion, where the scales $\delta$ of the hyperspheres are larger than a specific value, the number of points falling into the hyperspheres no longer increases. The scatter plot of the dataset is shown in Figure 4. This is why the upper portion bends down and approaches a plateau.

Figure 3: log-log plot for computation of CD. Horizontal axis: $\log(\delta)$; vertical axis: $\log C(n, \delta)$; the curve is annotated with the low portion, the middle portion, and the upper portion.

Figure 4: Bending explanation for the upper portion.

Usually the middle portion is linear, which makes it well suited to estimating the CD of a dataset. In order to minimize the error caused by nonlinearity, we should choose a small number of points from the log-log sequence $(\log \delta_k, \log C(n, \delta_k))$ and try our best to choose the linear portion of the sequence. However, to maximize our sample size, we want to include as many points as possible. How can we accurately choose the linear points from the log-log sequence? Given the obvious characteristics of the three portions of the sequence, we can use the $k$-means clustering method to decide which pairs of the log-log sequence should be used for CD estimation. The $k$-means clustering method aims to partition the log-log sequence into three categories by minimizing the objective function

$$\arg\min_{S} \sum_{i=1}^{c} \sum_{y_j \in S_i} \left\| y_j - \mu_i \right\|^2, \tag{16}$$

where $y_j$ is a pair in the sequence $(\log \delta_k, \log C(n, \delta_k))$; $c = 3$ is the number of categories, namely, the low portion, the middle portion, and the upper portion; $S_1$, $S_2$, and $S_3$ are the three categories; and $\mu_i$ is the mean of $S_i$. Hence, the points that belong to $S_2$ are chosen to fit a curve by the least squares method and are used to estimate the CD. The most important factor of the $k$-means method is the initial value of $\mu_i$. In this paper, the curve is divided equally into three portions and the mean of each portion is used as the initial value of $\mu_i$.
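The selection step can be sketched with a plain Lloyd-style $k$-means on the pairs $(\log \delta_k, \log C(n, \delta_k))$, initialized from three equal parts of the sequence as described above, followed by a least squares fit on the middle cluster. The synthetic sequence below (a line of slope 2 that saturates at a plateau) and the iteration cap are assumptions made for illustration only.

```python
import math


def kmeans_three_portions(pairs, iterations=100):
    """Lloyd's k-means with c = 3; initial means come from three equal parts of the sequence.

    Assumes len(pairs) >= 3 so that every initial part is non-empty.
    """
    n = len(pairs)
    bounds = [0, n // 3, 2 * n // 3, n]
    means = []
    for c in range(3):
        part = pairs[bounds[c]:bounds[c + 1]]
        means.append((sum(p[0] for p in part) / len(part),
                      sum(p[1] for p in part) / len(part)))
    labels = [0] * n
    for _ in range(iterations):
        # Assign every pair to its nearest mean.
        new_labels = [min(range(3), key=lambda c: math.dist(p, means[c])) for p in pairs]
        for c in range(3):
            members = [p for p, lab in zip(pairs, new_labels) if lab == c]
            if members:  # keep the previous mean if a cluster becomes empty
                means[c] = (sum(p[0] for p in members) / len(members),
                            sum(p[1] for p in members) / len(members))
        if new_labels == labels:
            break
        labels = new_labels
    return labels


def least_squares_slope(pairs):
    """Slope of the least squares line through the given (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    return sxy / sxx


def estimate_dimension(pairs):
    """Fit a line to the middle cluster (label 1) and return its slope."""
    labels = kmeans_three_portions(pairs)
    middle = [p for p, lab in zip(pairs, labels) if lab == 1]
    return least_squares_slope(middle)


# Synthetic log-log sequence: slope 2 up to log(delta) = -1, then a plateau.
xs = [-4.0 + 0.15 * k for k in range(31)]
sequence = [(x, min(2.0 * x + 6.0, 4.0)) for x in xs]
demo_slope = estimate_dimension(sequence)
print(demo_slope)
```

Note that $k$-means groups the pairs by proximity, not by linearity, so on real sequences the middle cluster can still contain a few curved points; the initialization from equal thirds only makes it likely that cluster 1 covers the middle portion.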

3.3. Complexity Analysis of WCD Method. In this section, the computational complexity of the WCD method is investigated and compared with that of the CD method. From the whole calculation process, we can see that the local bulk computation of the WCD method costs more than that of the CD method. For the analysis, we assume that the sample size is $n$. The calculation of a local bulk $\beta_j(\delta)$ at point $x_j$ with scale $\delta$ requires $n - 1$ operations, so its complexity is $O(n - 1)$. There are $n$ local bulks to be calculated, so the total complexity is $O((n - 1)n)$. However, the complexity of the CD method is only $O((n - 1))$. In addition, compared with the CD method, the vertex degree needs to be calculated in the WCD method, and its cost cannot be ignored when the sample size is huge. All of this suggests that the computational complexity of the WCD method is much higher than that of the CD method. In practice, however, it is unnecessary to calculate all local bulks of the dataset for the WCD method. We can use only very few points to estimate the local bulks and still obtain a highly accurate result. The computational cost of the WCD method is then almost the same as that of the CD method, as the following experiments show.
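The saving from using only a few high-density points can be made concrete with a small sketch that computes the weighted bulk from the $\beta$ points of largest vertex degree and counts the distance evaluations. The function name, the renormalization of the weights over the selected points, and the toy data are all hypothetical choices for this illustration.

```python
import math


def partial_weighted_bulk(points, degrees, delta, beta):
    """Weighted global bulk from the beta points with the largest vertex degree.

    Returns the bulk and the number of distance evaluations, beta * (n - 1).
    Renormalizing the weights over the selected points is an assumption of this sketch.
    """
    n = len(points)
    chosen = sorted(range(n), key=lambda j: degrees[j], reverse=True)[:beta]
    total_weight = sum(degrees[j] for j in chosen)
    bulk = 0.0
    evaluations = 0
    for j in chosen:
        inside = 0
        for i in range(n):
            if i == j:
                continue
            evaluations += 1
            if math.dist(points[i], points[j]) < delta:
                inside += 1
        bulk += (degrees[j] / total_weight) * inside / (n - 1)
    return bulk, evaluations


# Toy one-dimensional data with precomputed, made-up vertex degrees.
toy_points = [(0.0,), (1.0,), (2.0,), (10.0,)]
toy_degrees = [2.0, 3.0, 1.5, 1.0]
print(partial_weighted_bulk(toy_points, toy_degrees, delta=1.5, beta=2))
```

For a fixed scale the cost is $\beta(n - 1)$ distance evaluations instead of $n(n - 1)$, which is where the reduction discussed above comes from when $\beta$ is much smaller than $n$.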

4. Experimental Study

Many factors affect the results of the WCD method, including the sample size, the intrinsic dimension, the selection of the linear portion of the log-log sequence, the number of local bulks used for the correlation integral $C(n, \delta)$, and the selection of scales. In our experiments, samples with different dimensions and sample sizes are generated by the MATLAB randn function. Each sample is drawn independently from a Gaussian distribution. The performance of the WCD method is compared with that of the CD method, and the various factors are analyzed. Correlation dimensions are depicted in Figures 5(a), 5(b), and 5(c) for both the WCD and CD methods, with three different sample sizes. Specifically, only sample sizes of 100, 200, and 500 and intrinsic dimensions of 3, 5, and 8 are plotted; the results are similar for other sample sizes and intrinsic dimensions. For each plot in Figure 5, the horizontal axis indicates the number of local bulks $\beta$, whose maximum value equals the sample size. The vertical axis represents the actual and the estimated values (via the WCD method and the CD method) of the intrinsic dimension. Each horizontal green line represents the actual intrinsic dimension for reference. Each red dot denotes the intrinsic dimension estimated by the WCD method. Each black asterisk denotes the intrinsic dimension estimated by the CD method.

It can be well observed from Figure 5 that the intrinsicdimensions calculated by the WCD method are more accu-rate than the ones by the CD method However the frontpart of the curves plotted by the WCD method fluctuatesfrequently This is because there are few local bulks used forintrinsic dimension estimation which lead to the fact thatthe result is instability In addition all the curves plottedby the WCD method slop downward with the number oflocal bulks increasing but they still can converge to a goodvalue In general the front part of the curves plotted by theWCD method is more precise than the latter part The lossof precision is mainly caused by the data distribution Thehigh dense area is chosen first to calculate the local bulksby the vertex degree method leading to the high accuracyHowever with more sparse points or boundary points beingused to calculate the local bulks the accuracy will be lostHence it is inferred that the number of the points usedto calculate the local bulks is one of the main factors tothe intrinsic dimension estimation This also verifies theeffectiveness of our developed methods of using small highdense points to estimate the intrinsic dimension by theWCDmethod Examining the curves estimated by both methodswhen the samples size is fixed the accuracy will graduallyreduce with the increase of actual intrinsic dimension Themain reason is that the dataset becomes more and moresparse with the increasing intrinsic dimension in the samesample size Observing the curves in Figures 5(a) 5(b)and 5(c) respectively it can be seen that the accuracy ofboth methods tends to improve along with the increasingsample sizes in the same actual intrinsic dimension Thisis because the dataset will become dense with the increaseof the sample sizes Additionally the selection of scales 120575

is also an important factor of affecting the performanceof the intrinsic dimension The smaller scales 120575 will beeasily susceptible to noise however the larger scales willresult in saturation phenomenon in which the correlationintegral 119862(119899 120575) will not change with the increasing scales120575 In addition abundance scales will inevitably increase thecomputational cost and the smaller number one will reducethe precision

To analyze the calculation speed, we generate three-dimensional datasets with sample sizes from 100 to 4000 using the MATLAB randn function and estimate the intrinsic dimension with the four methods (WCD, CD, MLE, and GMST). The computation time of all four methods is shown in Figure 6. It reveals that the GMST method costs the most computation time, while the WCD method, the MLE method, and the CD method cost almost the same calculation time. However, the computation speed of the WCD method obviously slows down as the number of local bulks increases.

Figure 5: Estimated and actual intrinsic dimension for datasets with different sample sizes. Panels: (a) estimate of intrinsic dimension with 100 points, (b) with 200 points, (c) with 500 points. Horizontal axis: the number $\beta$ used to gather statistics; vertical axis: actual value and estimate of intrinsic dimension. Legend: weighted correlation dimension, correlation dimension, intrinsic dimension.

5. Empirical Results

In order to validate the proposed method, the WCD method is used to estimate the intrinsic dimension of two kinds of datasets (synthetic datasets and real world datasets). Moreover, comparisons with geodesic minimum spanning tree (GMST), correlation dimension (CD), and maximum likelihood estimation (MLE) are also performed to further demonstrate the advantage of the method developed in this paper.

5.1. Synthetic Datasets. In this subsection, two synthetic datasets (Koch curve and S-curve) are investigated first. The sample sizes of the two datasets are 2000, respectively, and the plots are shown in Figures 7 and 8. The dimensions estimated by all methods are listed in Table 1. The Koch curve originates from a line whose middle segment is repeatedly replaced by an equilateral triangle. If we use a tool whose dimension is less than 1 to measure the Koch curve, its Hausdorff measure is infinite. If we use a two-dimensional tool to measure it, its Hausdorff measure is 0. So the intrinsic dimension of the Koch curve is between 1 and 2, and the dimension estimated by each of the four methods falls into this range. Moreover, the data points in the S-curve dataset are contained in a curved surface in three-dimensional space, so the intrinsic dimension of the S-curve dataset is 2. The obtained results show that all the considered methods have high accuracy, among which the one developed in this paper performs best.

Figure 6: Calculation time comparison. Horizontal axis: sample size; vertical axis: calculation time (s). Legend: improved method, GMST, corr dim, MLE.

Figure 7: Koch curve dataset.

Figure 8: S-curve dataset.

Table 1: Intrinsic dimension estimation of synthetic datasets with different methods.

| Dataset | Data dim | Sample size | Improved corr dim | Corr dim | MLE | GMST |
|---|---|---|---|---|---|---|
| Koch curve | 1-2 | 1025 | 1.3801 | 1.1206 | 1.7481 | 1.3424 |
| S-curve | 2 | 2000 | 2.0005 | 1.9567 | 1.9865 | 2.0994 |
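As a reference point for the Koch curve estimates, the similarity dimension of the ideal Koch curve can be worked out directly. This is a standard textbook calculation rather than a result of this paper; the symbols $N$ (number of self-similar copies per iteration) and $r$ (scaling ratio) are introduced here only for illustration. Each iteration replaces a segment by $N = 4$ copies scaled by $r = 1/3$, so

```latex
D_{\text{Koch}} = \frac{\log N}{\log (1/r)} = \frac{\log 4}{\log 3} \approx 1.2619 ,
```

which is consistent with the statement that the intrinsic dimension lies between 1 and 2. A finite sample of the curve only approximates this ideal value.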

5.2. Real Datasets. Following a process similar to that in Section 5.1, three real datasets (the laser generated data, the Ikeda map, and the Henon map) are analyzed in this subsection. The considered datasets are described as follows.

5.2.1. Laser Generated Data. The data were recorded from a far-infrared laser in a chaotic state [4] and consist of 1000 samples; the attractor dimension is approximately 2.26. The plot is shown in Figure 9.

5.2.2. Ikeda Map. The Ikeda map [31] is a complex map defined by

$$z_{n+1} = a + R z_n \exp\left( i \left( \phi - \frac{p}{1 + |z_n|^2} \right) \right). \tag{17}$$

The Ikeda map is derived from a model of the plane-wave interactivity field in an optical ring laser. It is iterated many times, and the points $[\mathrm{Re}(z_n), \mathrm{Im}(z_n)]$ are plotted for $n = 2000$. Here $a = 1.0$, $R = 0.9$, $\phi = 0.4$, and $p = 6$. The intrinsic dimension of this attractor is approximately 1.7. The visualization of the map is shown in Figure 10.
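The iteration in equation (17) can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' code: the function name `ikeda_points`, the starting value `z0`, and the `burn_in` length (iterations discarded so that the orbit settles onto the attractor) are assumptions that do not appear in the paper.

```python
import numpy as np

def ikeda_points(n_points=2000, a=1.0, R=0.9, phi=0.4, p=6.0, burn_in=500, z0=0j):
    """Iterate equation (17) and return an (n_points, 2) array of [Re(z), Im(z)].

    burn_in and z0 are illustrative assumptions, not values from the paper.
    """
    z = complex(z0)
    out = np.empty((n_points, 2))
    for k in range(burn_in + n_points):
        # z_{n+1} = a + R z_n exp(i (phi - p / (1 + |z_n|^2)))
        z = a + R * z * np.exp(1j * (phi - p / (1.0 + abs(z) ** 2)))
        if k >= burn_in:
            out[k - burn_in] = (z.real, z.imag)
    return out

points = ikeda_points()  # 2000 points with the parameters quoted in the text
```

Since $|z_{n+1}| \le a + R|z_n|$, an orbit started at the origin stays within radius $a/(1-R) = 10$, which gives a quick sanity check on the generated data.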

5.2.3. Henon Map. The Henon map [31] is usually cast as an equation of the form

$$x_{n+1} = 1.0 - a x_n^2 + y_n, \qquad y_{n+1} = b x_n, \tag{18}$$

with $a = 1.4$ and $b = 0.3$, which gives an attractor with an intrinsic dimension of approximately 1.3. The plot of the Henon map dataset for $n = 2000$ is shown in Figure 11.
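Equation (18) can be iterated in the same way. The sketch below is illustrative only; the function name `henon_points`, the initial condition at the origin, and the `burn_in` length are assumptions not stated in the paper.

```python
import numpy as np

def henon_points(n_points=2000, a=1.4, b=0.3, burn_in=100, x0=0.0, y0=0.0):
    """Iterate equation (18) and return an (n_points, 2) array of [x, y].

    burn_in, x0 and y0 are illustrative assumptions, not values from the paper.
    """
    x, y = x0, y0
    out = np.empty((n_points, 2))
    for k in range(burn_in + n_points):
        # Simultaneous update: the new y uses the old x.
        x, y = 1.0 - a * x * x + y, b * x
        if k >= burn_in:
            out[k - burn_in] = (x, y)
    return out

points = henon_points()  # 2000 points with a = 1.4, b = 0.3
```

The resulting array can be passed directly to a dimension estimator, since no embedding parameters have to be chosen for a map that is already given in its state space.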

To estimate the intrinsic dimension of the laser generated data, the phase space is reconstructed by the delay-time embedding technique. Although Takens has proved that the original state space of a dynamical system can be reconstructed as long as $m > 2D + 1$, where $m$ is the embedding dimension and $D$ denotes the intrinsic dimension of the attractor, it is nontrivial to choose the embedding parameters $m$ and the delay time $\tau$. If the product $(m - 1)\tau$ is too large, the components of the reconstructed vector become effectively decorrelated in phase space, which leads to a larger dimension estimate. When the product $(m - 1)\tau$ is too small, the components of the reconstructed vector become effectively redundant, which leads to a smaller dimension estimate. In order to compare the results with [4], we select embedding dimension $m = 5$ and delay time $\tau = 10$. Furthermore, the dimensions of the Ikeda map and the Henon map are estimated directly by the dimension estimation methods, which avoids selecting $m$ and $\tau$. From Figures 10 and 11, we note that the thinner the attractor, the lower its dimension. The results are listed in Table 2, from which we can infer that the WCD method is also effective on the real datasets.

Table 2: Intrinsic dimension estimation of real datasets with different methods.

| Dataset | Data dim | Sample size | Improved corr dim | Corr dim | MLE | GMST |
|---|---|---|---|---|---|---|
| Laser generated data | 2.06 | 1000 | 2.1027 | 1.9379 | 2.7124 | 1.9842 |
| Ikeda map | 1.7 | 2000 | 1.6889 | 1.6348 | 1.8010 | 1.8082 |
| Henon map | 1.3 | 2000 | 1.3584 | 1.1989 | 1.5206 | 1.1962 |

Figure 9: Dataset of laser generated data.

Figure 10: Dataset of Ikeda map. Horizontal axis: Re(Z); vertical axis: Im(Z).

Figure 11: Dataset of Henon map. Horizontal axis: x; vertical axis: y.
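The delay-time embedding used for the laser data can be sketched as follows. This is a minimal illustration under the stated choice $m = 5$, $\tau = 10$; the function name `delay_embed` is hypothetical, and the sine series in the usage line is only a placeholder, since the laser recording itself is not reproduced here.

```python
import numpy as np

def delay_embed(series, m=5, tau=10):
    """Return the delay vectors [s(t), s(t + tau), ..., s(t + (m - 1) tau)] as rows."""
    series = np.asarray(series, dtype=float)
    n_vectors = series.size - (m - 1) * tau
    if n_vectors <= 0:
        raise ValueError("series too short for the requested m and tau")
    # Row t holds the indices t, t + tau, ..., t + (m - 1) * tau.
    idx = np.arange(n_vectors)[:, None] + tau * np.arange(m)[None, :]
    return series[idx]

# Placeholder series of 1000 samples (NOT the laser data from the paper).
embedded = delay_embed(np.sin(0.05 * np.arange(1000)), m=5, tau=10)
```

With 1000 samples, $m = 5$, and $\tau = 10$, the embedding yields $1000 - (5 - 1) \cdot 10 = 960$ vectors in a five-dimensional space, which is the point set a dimension estimator would then work on.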

6. Conclusion

When the distribution of a dataset is nonuniform, the CD method for intrinsic dimension estimation suffers from large bias. To address this issue, the WCD method has been proposed, with an optimized weight vector determined by the vertex degree. The factors influencing the WCD method have also been analyzed comprehensively, including the sample size, the selection of the linear portion of the log-log sequence, the number of local bulks used for the correlation integral $C(n, \delta)$, and the selection of scales. The WCD method is validated by experiments on synthetic datasets and real world datasets.

Compared with the CD method, the main drawback of the WCD method is that the computation slows down when many local bulks $\beta_X(\delta)$ are calculated. However, the experiments indicate that it is unnecessary to calculate all the local bulks of the dataset; using only a few points in the high density area of the dataset also yields a good result. From the above experiments, it can be seen that the computational cost of the WCD method is almost the same as that of the CD method when fewer than 3500 local bulks are used. Moreover, the density estimation of a dataset by vertex degree is only applicable to a single distribution; when the dataset follows multiple distributions, the WCD method will fail, which should be studied further.


Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

References

[1] C. K. Loo, A. Samraj, and G. C. Lee, "Evaluation of methods for estimating fractal dimension in motor imagery-based brain computer interface," Discrete Dynamics in Nature and Society, vol. 2011, Article ID 724697, 8 pages, 2011.

[2] A. T. B. Jin, P. Y. Han, and L. H. Siong, "Eigenvector weighting function in face recognition," Discrete Dynamics in Nature and Society, vol. 2011, Article ID 521935, 15 pages, 2011.

[3] S. Wang, T. Shi, M. Zeng, L. Zhang, F. E. Alsaadi, and T. Hayat, "New results on robust finite-time boundedness of uncertain switched neural networks with time-varying delays," Neurocomputing, vol. 151, part 1, pp. 522–530, 2015.

[4] F. Camastra and M. Filippone, "A comparative evaluation of nonlinear dynamics methods for time series prediction," Neural Computing and Applications, vol. 18, no. 8, pp. 1021–1029, 2009.

[5] J. Luo, G. Li, and H. Liu, "Linear control of fractional-order financial chaotic systems with input saturation," Discrete Dynamics in Nature and Society, vol. 2014, Article ID 802429, 8 pages, 2014.

[6] K. M. Carter, R. Raich, and I. Hero, "On local intrinsic dimension estimation and its applications," IEEE Transactions on Signal Processing, vol. 58, no. 2, pp. 650–663, 2010.

[7] S. Yin, G. Wang, and X. Yang, "Robust PLS approach for KPI-related prediction and diagnosis against outliers and missing data," International Journal of Systems Science, vol. 45, no. 7, pp. 1375–1382, 2014.

[8] L. M. Elshenawy, S. Yin, A. S. Naik, and S. X. Ding, "Efficient recursive principal component analysis algorithms for process monitoring," Industrial and Engineering Chemistry Research, vol. 49, no. 1, pp. 252–259, 2010.

[9] E. E. Abusham and E. K. Wong, "Locally linear discriminate embedding for face recognition," Discrete Dynamics in Nature and Society, vol. 2009, Article ID 916382, 8 pages, 2009.

[10] S. Samudrala, K. Rajanr, and B. Ganapathysubramanian, "Data dimensionality reduction in materials science," in Informatics for Materials Science and Engineering: Data-Driven Discovery for Accelerated Experimentation and Application, vol. 1, pp. 97–98, Elsevier Science, 2013.

[11] K. M. Carter, R. Raich, and A. O. Hero, "On local intrinsic dimension estimation and its applications," IEEE Transactions on Signal Processing, vol. 58, no. 2, pp. 650–663, 2010.

[12] L. Liao, Y. Zhang, S. J. Maybank, and Z. Liu, "Intrinsic dimension estimation via nearest constrained subspace classifier," Pattern Recognition, vol. 47, no. 3, pp. 1485–1493, 2014.

[13] R. Heylen and P. Scheunders, "Hyperspectral intrinsic dimensionality estimation with nearest-neighbor distance ratios," IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 6, no. 2, pp. 570–579, 2013.

[14] S. Yin, S. X. Ding, X. Xie, and H. Luo, "A review on basic data-driven approaches for industrial process monitoring," IEEE Transactions on Industrial Electronics, vol. 61, no. 11, pp. 6418–6428, 2014.

[15] S. Ding, S. Yin, K. Peng, H. Hao, and B. Shen, "A novel scheme for key performance indicator prediction and diagnosis with application to an industrial hot strip mill," IEEE Transactions on Industrial Informatics, vol. 9, no. 4, pp. 2239–2247, 2012.

[16] F. Camastra, "Data dimensionality estimation methods: a survey," Pattern Recognition, vol. 36, no. 12, pp. 2945–2954, 2003.

[17] J. C. Harsanyi and C.-I. Chang, "Hyperspectral image classification and dimensionality reduction: an orthogonal subspace projection approach," IEEE Transactions on Geoscience and Remote Sensing, vol. 32, no. 4, pp. 779–785, 1994.

[18] E. Levina and P. J. Bickel, "Maximum likelihood estimation of intrinsic dimension," in Proceedings of the Annual Conference on Neural Information Processing Systems (NIPS '04), pp. 1092–1106, Vancouver, Canada, 2004.

[19] J. He, L. Ding, L. Jiang, Z. Li, and Q. Hu, "Intrinsic dimensionality estimation based on manifold assumption," Journal of Visual Communication and Image Representation, vol. 25, no. 5, pp. 740–747, 2014.

[20] F. Camastra and A. Vinciarelli, "Estimating the intrinsic dimension of data with a fractal-based method," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 10, pp. 1404–1407, 2002.

[21] M. Sadikin and I. Wasito, "Fractal dimension as a data dimensionality reduction method for anomaly detection in time series," in Proceedings of the 7th International Conference on Information & Communication Technologies (ICT '13), vol. 1, May 2013.

[22] Z. Feng and X. Sun, "Box-counting dimensions of fractal interpolation surfaces derived from fractal interpolation functions," Journal of Mathematical Analysis and Applications, vol. 412, no. 1, pp. 416–425, 2014.

[23] D. Sankar and T. Thomas, "Fractal features based on differential box counting method for the categorization of digital mammograms," International Journal of Computer Information System and Industrial Management Applications, vol. 2, pp. 11–19, 2010.

[24] Y.-C. Tzeng, K.-T. Fan, and K.-S. Chen, "A parallel differential box-counting algorithm applied to hyperspectral image classification," IEEE Geoscience and Remote Sensing Letters, vol. 9, no. 2, pp. 272–276, 2012.

[25] A. Yarlagadda, J. V. R. Murthy, and M. H. M. Krishna prasad, "Estimating correlation dimension using multi layered grid and damped window model over data streams," Procedia Technology, vol. 10, pp. 797–804, 2013.

[26] A. R. Osborne and A. Provenzale, "Finite correlation dimension for stochastic systems with power-law spectra," Physica D: Nonlinear Phenomena, vol. 35, no. 3, pp. 357–381, 1989.

[27] M. Fan, X. Zhang, S. Chen, H. Bao, and S. Maybank, "Dimension estimation of image manifolds by minimal cover approximation," Neurocomputing, vol. 105, no. 1, pp. 19–29, 2013.

[28] V. Pestov, "An axiomatic approach to intrinsic dimension of a dataset," Neural Networks, vol. 21, no. 2-3, pp. 204–213, 2008.

[29] B. Kegl, "Intrinsic dimension estimation using packing numbers," in Proceedings of the 16th Annual Neural Information Processing Systems Conference (NIPS '02), pp. 681–688, December 2002.

[30] K. Johnsson, Manifold dimension estimation for omics data analysis: current methods and a novel approach [M.S. thesis], Lund University, 2011.

[31] J. Theiler, "Estimating fractal dimension," Journal of the Optical Society of America A, vol. 7, no. 6, pp. 1055–1073, 1990.

[32] D. Schleicher, "Hausdorff dimension, its properties, and its surprises," The American Mathematical Monthly, vol. 114, no. 6, pp. 509–528, 2007.

[33] D. Mo and S. H. Huang, "Fractal-based intrinsic dimension estimation and its application in dimensionality reduction," IEEE Transactions on Knowledge and Data Engineering, vol. 24, no. 1, pp. 59–71, 2012.

[34] P. Grassberger and I. Procaccia, "Measuring the strangeness of strange attractors," in The Theory of Chaotic Attractors, pp. 170–189, Springer, New York, NY, USA, 2004.


Input: signal dataset $X$.
Output: intrinsic dimension $D$.
(1) Normalize the dataset $X$ between 0 and 1; then the distance matrix $W_1$ can be constructed by $W_1(j, i) = \|x_i - x_j\|$.
(2) Construct the similarity matrix $W_2(j, i) = \exp(-\|x_i - x_j\|^2 / \theta^2)$, where $\theta$ is the variance of the dataset. The vertex degree, that is, the weighted vector, is defined as $W(j) = \sum_{i=1}^{n} W_2(j, i)$.
(3) The scale sequence $(\delta_1, \delta_2, \ldots, \delta_m)$ is computed by $\delta_k = \min(W_1) + k\,(\max(W_1) - \min(W_1))/m$, $k = 1, 2, \ldots, m$, where $m$ is the number of scales $\delta$.
(4) Compute the local bulk $\beta_j(\delta_k)$ at point $x_j$: $\beta_{X_j}(\delta_k) \approx \frac{1}{n - 1} \sum_{i=1, i \neq j}^{n} I(\|x_i - x_j\| < \delta_k)$, $j = 1, 2, \ldots, n$.
(5) At the scale $\delta_k$, the global bulk is computed: $C(n, \delta_k) = \sum_{j=1}^{n} W(j)\,\beta_j(\delta_k)$.
(6) The linear part of the log-log sequence $(\log \delta_k, \log C(n, \delta_k))$ is selected by the $k$-means method, and a curve is fitted to the linear part by the linear least squares method.
(7) The correlation dimension is calculated from the slope of the curve: $D = (\log C(n, \delta_2) - \log C(n, \delta_1)) / (\log \delta_2 - \log \delta_1)$.

Algorithm 1: The calculating procedure of WCD.
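Steps (1) to (5) of Algorithm 1, followed by a least squares slope as in steps (6) and (7), can be sketched in Python with NumPy. This is a hedged illustration, not the authors' implementation. The function names `wcd_global_bulk` and `slope_estimate` are hypothetical. Two readings of the algorithm are assumptions made here: $\theta$ is taken as the variance over all normalized coordinates, and $\min(W_1)$ is taken over the full distance matrix including its zero diagonal. The index range passed to `slope_estimate` stands in for the $k$-means selection of step (6).

```python
import numpy as np

def wcd_global_bulk(X, m=20):
    """Steps (1)-(5) of Algorithm 1: scales delta_k, global bulk C(n, delta_k), weights W."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Step 1: min-max normalise every coordinate to [0, 1], then pairwise distances W1.
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # guard against constant columns
    Xn = (X - lo) / span
    diff = Xn[:, None, :] - Xn[None, :, :]
    W1 = np.sqrt((diff ** 2).sum(axis=-1))
    # Step 2: similarity matrix W2 and vertex degree W.
    theta = Xn.var()  # assumption: variance over all normalised coordinates
    W2 = np.exp(-(W1 ** 2) / theta ** 2)
    W = W2.sum(axis=1)
    # Step 3: m equally spaced scales between min(W1) and max(W1).
    d_min, d_max = W1.min(), W1.max()  # assumption: diagonal included, so d_min = 0
    deltas = d_min + np.arange(1, m + 1) * (d_max - d_min) / m
    # Step 4: local bulk beta_j(delta_k), excluding the point itself (i != j).
    off_diag = ~np.eye(n, dtype=bool)
    beta = np.stack([((W1 < d) & off_diag).sum(axis=1) / (n - 1) for d in deltas])
    # Step 5: global bulk as the vertex-degree-weighted sum of local bulks.
    C = beta @ W
    return deltas, C, W

def slope_estimate(deltas, C, idx):
    """Least squares slope of log C against log delta over the chosen indices."""
    keep = [i for i in idx if C[i] > 0]  # log is undefined where C is zero
    slope, _ = np.polyfit(np.log(deltas[keep]), np.log(C[keep]), 1)
    return slope
```

The weights are not normalized here because a constant factor in $W$ only shifts $\log C(n, \delta_k)$ and leaves the slope unchanged. The pairwise distance array makes this sketch quadratic in memory, so it is meant for sample sizes of the order used in the experiments rather than for large datasets.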

Figure 1: Different local bulks at three different points (sparse point, high density point, and boundary point).

Figure 2: The indication of falling into the circle at different locations.

hyperspheres is small, and only a few points fall into the hyperspheres. So a very small number of noise points can cause a great error, which is the reason that the low portion fluctuates. Besides, in the upper portion, where the scales $\delta$ of the hyperspheres are larger than a specific value, the number of points falling into the hyperspheres no longer increases; a scatter plot of the dataset illustrating this is shown in Figure 4. This is the reason that the upper portion bends down and approaches a plateau. Usually the middle portion is linear, which makes it well suited to estimating the CD of a dataset. In order to minimize the error caused by nonlinearity, we should choose only a small number of points from the log-log sequence $(\log \delta_k, \log C(n, \delta_k))$ and try our best to restrict them to the linear portion of the sequence. However, to maximize our sample size, we want to include as many points as possible. How can we accurately choose the linear points from the log-log sequence?

Figure 3: log-log plot for computation of CD, showing the low portion, the middle portion, and the upper portion. Horizontal axis: $\log(\delta)$; vertical axis: $\log C(n, \delta)$.

Figure 4: Bending explanation for the upper portion.

Given the distinct characteristics of the three portions of the sequence, we can use the $k$-means clustering method to decide which pairs of the log-log sequence should be used for CD estimation. The $k$-means clustering method aims to partition the log-log sequence into three categories by minimizing the objective function

$$\arg\min_{S} \sum_{i=1}^{c} \sum_{y_j \in S_i} \left\| y_j - \mu_i \right\|^2, \tag{16}$$

where $y_j$ is a pair in the sequence $(\log \delta_k, \log C(n, \delta_k))$; $c = 3$ is the number of categories, namely, the low portion, the middle portion, and the upper portion; $S_1$, $S_2$, and $S_3$ denote the three categories; and $\mu_i$ is the mean of $S_i$. Hence, the points that belong to $S_2$ are chosen to fit a curve by the least squares method, which is then used to estimate the CD. The most important factor of the $k$-means method is the initial value of $\mu_i$. In this paper, the curve is divided equally into three portions, and the mean of each portion serves as the initial value of $\mu_i$.
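The selection of the linear portion described above can be sketched as a small $k$-means routine with $c = 3$ and the equal-thirds initialization. This is an illustrative sketch only: the function names `select_linear_portion` and `fit_slope` and the iteration cap `n_iter` are assumptions, and clustering the raw $(\log \delta_k, \log C)$ pairs without rescaling is one possible reading of the text.

```python
import numpy as np

def select_linear_portion(log_delta, log_C, n_iter=100):
    """Cluster the log-log pairs into c = 3 groups and return indices of the middle group."""
    Y = np.column_stack([log_delta, log_C])
    # Initial means: divide the curve equally into three consecutive portions.
    parts = np.array_split(np.arange(len(Y)), 3)
    mu = np.array([Y[p].mean(axis=0) for p in parts])
    for _ in range(n_iter):
        # Assign every pair to its nearest mean (squared Euclidean distance).
        d2 = ((Y[:, None, :] - mu[None, :, :]) ** 2).sum(axis=-1)
        labels = d2.argmin(axis=1)
        # Recompute the means; keep the old mean if a cluster becomes empty.
        new_mu = np.array([Y[labels == i].mean(axis=0) if np.any(labels == i) else mu[i]
                           for i in range(3)])
        if np.allclose(new_mu, mu):
            break
        mu = new_mu
    return np.flatnonzero(labels == 1)

def fit_slope(log_delta, log_C, idx):
    """Least squares slope over the selected pairs, used as the dimension estimate."""
    slope, _ = np.polyfit(log_delta[idx], log_C[idx], 1)
    return slope
```

Because the middle cluster is initialized from the middle third of the curve, its label stays attached to the middle portion in typical cases; on strongly curved sequences the cluster boundaries will not coincide exactly with the visually linear range, so the result is best checked against the log-log plot.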

3.3. Complexity Analysis of WCD Method. In this section, the computational complexity of the WCD method is investigated and compared with that of the CD method. From the whole calculation process, we can see that the local bulks of the WCD method require more calculations than those of the CD method. For the analysis, we assume that the sample size is $n$. The calculation of a local bulk $\beta_j(\delta)$ at point $x_j$ with scale $\delta$ requires $n - 1$ operations, so its complexity is $O(n - 1)$. There are $n$ local bulks to be calculated, so the overall complexity is $O((n - 1)n)$. However, the CD method is only $O(n - 1)$. In addition, compared with the CD method, the vertex degree needs to be calculated in the WCD method, and its cost cannot be ignored when the sample size is huge. All this suggests that the computational complexity of the WCD method is much higher than that of the CD method. In practice, however, it is unnecessary to calculate all local bulks of the dataset for the WCD method. We can use only very few points to estimate the local bulks and still obtain a highly accurate result. The computational cost of the WCD method is then almost the same as that of the CD method, as shown by the following experiments.
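The idea of evaluating only a few local bulks can be illustrated with a short sketch that keeps the points with the highest vertex degree and sums their weighted local bulks at a single scale. The function name `weighted_bulk_subset` and its parameters `n_bulks` and `theta` are hypothetical, and the input is assumed to be already normalized. Note that the sketch still forms all pairwise distances to obtain the vertex degree, which is the extra cost mentioned in the text; only the local bulk counting is restricted to `n_bulks` points.

```python
import numpy as np

def weighted_bulk_subset(X, delta, n_bulks, theta):
    """Global bulk at scale delta using only the n_bulks points of highest vertex degree."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Vertex degree of every point (sum of Gaussian similarities).
    W = np.exp(-(dist ** 2) / theta ** 2).sum(axis=1)
    # Indices of the densest points, highest vertex degree first.
    top = np.argsort(W)[::-1][:n_bulks]
    # Local bulks for the selected points only: n_bulks * (n - 1) comparisons.
    counts = (dist[top] < delta).sum(axis=1) - 1  # subtract the point itself (delta > 0)
    beta = counts / (n - 1)
    return float(W[top] @ beta), top
```

Calling the function with `n_bulks` equal to the sample size reproduces the full weighted sum, so the two settings can be compared directly on the same data.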

4. Experimental Study

There are many factors affecting the results of the WCD method, including the sample size, the intrinsic dimension, the selection of the linear portion of the log-log sequence, the number of local bulks used for the correlation integral $C(n, \delta)$, and the selection of scales. In our experiments, samples with different dimensions and sample sizes are generated by the MATLAB randn function. Each sample is drawn independently from a Gaussian distribution. The performance of the WCD method is compared with that of the CD method, and the various factors are analyzed. Correlation dimensions are depicted in Figures 5(a), 5(b), and 5(c) for both the WCD and CD methods, respectively, with three different sample sizes. Specifically, only sample sizes of 100, 200, and 500 and intrinsic dimensions of 3, 5, and 8 are plotted; the behavior is similar for other sample sizes and intrinsic dimensions. For each plot in Figure 5, the horizontal axis indicates the number of local bulks $\beta$, whose maximum value equals the sample size. The vertical axis represents the actual and the estimated values (via the WCD method and the CD method) of the intrinsic dimension. Each horizontal green line represents the actual intrinsic dimension for reference. Each red dot denotes the intrinsic dimension estimated by the WCD method. Each black asterisk denotes the intrinsic dimension estimated by the CD method.

It can be well observed from Figure 5 that the intrinsicdimensions calculated by the WCD method are more accu-rate than the ones by the CD method However the frontpart of the curves plotted by the WCD method fluctuatesfrequently This is because there are few local bulks used forintrinsic dimension estimation which lead to the fact thatthe result is instability In addition all the curves plottedby the WCD method slop downward with the number oflocal bulks increasing but they still can converge to a goodvalue In general the front part of the curves plotted by theWCD method is more precise than the latter part The lossof precision is mainly caused by the data distribution Thehigh dense area is chosen first to calculate the local bulksby the vertex degree method leading to the high accuracyHowever with more sparse points or boundary points beingused to calculate the local bulks the accuracy will be lostHence it is inferred that the number of the points usedto calculate the local bulks is one of the main factors tothe intrinsic dimension estimation This also verifies theeffectiveness of our developed methods of using small highdense points to estimate the intrinsic dimension by theWCDmethod Examining the curves estimated by both methodswhen the samples size is fixed the accuracy will graduallyreduce with the increase of actual intrinsic dimension Themain reason is that the dataset becomes more and moresparse with the increasing intrinsic dimension in the samesample size Observing the curves in Figures 5(a) 5(b)and 5(c) respectively it can be seen that the accuracy ofboth methods tends to improve along with the increasingsample sizes in the same actual intrinsic dimension Thisis because the dataset will become dense with the increaseof the sample sizes Additionally the selection of scales 120575

is also an important factor of affecting the performanceof the intrinsic dimension The smaller scales 120575 will beeasily susceptible to noise however the larger scales willresult in saturation phenomenon in which the correlationintegral 119862(119899 120575) will not change with the increasing scales120575 In addition abundance scales will inevitably increase thecomputational cost and the smaller number one will reducethe precision

For the purpose of analyzing the calculation speed wegenerate three dimension datasets with sample sizes from 100to 4000 by MATLAB randn function and estimate intrinsicdimension by these four methods The computation time ofall four methods is shown in Figure 6 It reveals that theGMSTmethod costs themost computation time whileWCDmethod MLEmethod and CDmethod cost almost the same

6 Discrete Dynamics in Nature and Society

0 20 40 60 80 1002

3

4

5

6

7

8Estimate of intrinsic dimension with 100 points

Actu

al v

alue

and

estim

ate o

f int

rinsic

dim

ensio

n

Weighted correlation dimensionCorrelation dimensionIntrinsic dimension

The number used to gather statistics 120573

(a)

Estimate of intrinsic dimension with 200 points

Actu

al v

alue

and

estim

ate o

f int

rinsic

dim

ensio

n

Weighted correlation dimensionCorrelation dimensionIntrinsic dimension

0 50 100 150 2002

3

4

5

6

7

8

The number used to gather statistics 120573

(b)

Estimate of intrinsic dimension with 500 points

Actu

al v

alue

and

estim

ate o

f int

rinsic

dim

ensio

n

Weighted correlation dimensionCorrelation dimensionIntrinsic dimension

0 100 200 300 400 5002

3

4

5

6

7

8

9

The number used to gather statistics 120573

(c)

Figure 5 Estimated and actual intrinsic dimension for datasets on different sample size

calculation time But the computation speed ofWCDmethodwill obviously slow down with the increase of the local bulks

5 Empirical Results

In order to validate the proposed method WCD method isused to estimate the intrinsic dimension of two kinds of data-sets (the synthetic datasets and the real world datasets)More-over the comparisons with geodesic minimum spanningtree (GMST) correlation dimension (CD) and maximum

likelihood estimation (MLE) are also performed to furtherthe advantage of our developed findings in this paper

51 Synthetic Datasets In this subsection two syntheticdatasets (Koch curve and S-curve) are firstly investigatedThesample sizes of the two datasets are 2000 respectively andplots are shown in Figures 7 and 8The dimensions estimatedby all methods are listed in Table 1 Koch curve originatesfrom a line whose middle segment is repeatedly replaced byan equilateral triangle If we use a tool whose dimension isless than 1 to measure Koch curve its Hausdorff measure is

Discrete Dynamics in Nature and Society 7

500 1000 1500 2000 2500 3000 3500 40000

5

10

15

20

25

30

35

40

Sample size

Calc

ulat

ion

time (

s)

Improved methodGMST

Corr dimMLE

Figure 6 Calculation time comparison

00

2 4 6 8 10

05

1

15

2

25

3

Figure 7 Koch curve dataset

minus1minus05

005

1

02

4

6minus1

0

1

2

3

Figure 8 S-curve dataset

Table 1 Intrinsic dimension estimation of synthetic datasets withdifferent methods

Dataset Datadim

Samplesize

Improvedcorr dim

Corrdim MLE GMST

Kochcurve 1-2 1025 13801 11206 17481 13424

S-curve 2 2000 20005 19567 19865 20994

inf If we use two dimensions to measure it its Hausdorffmeasure is 0 So the intrinsic dimension of Koch curve isbetween 1 and 2 and the dimension estimated by the fourmethods falls into this range Moreover the data points inS-curve dataset are contained in a curved surface in three-dimensional space so the intrinsic dimension of S-curvedataset is 2 The obtained results show that all the consideredmethods have high accuracy in which the developed one inthis paper is the most optimal

52 Real Datasets Following a similar process in 51 anotherthree real datasets (the laser generated data the Ikeda mapand the Henon map) will be analyzed in this subsectionwhere the specific explanations of the considered real datasetsare illustrated as follows

521 Laser Generated Data The data were recorded froma far-infrared-laser in a chaotic state [4] formed by 1000samples and the attractor dimension is approximately 226The plot is shown in Figure 9

522 Ikeda Map Ikeda map [31] is a complex map which isdefined by

119911(119899+1) = 119886 + 119877119911(119899) exp(119894(120601 minus119901

(1 +1003816100381610038161003816119911(119899)

10038161003816100381610038162)

)) (17)

Ikeda map is derived from a model of the plane-waveinteractivity field in an optical ring laser It is iterated manytimes and the points [Re(119911119899) Im(119911119899)] are plotted for 119899 =

2000 Here 119886 = 10 119877 = 09 120601 = 04 and 119901 = 6 Theintrinsic dimension of this attractor is approximately 17 Thevisualization of the map is shown in Figure 10

523 Henon Map Henon map [31] is usually cast as anequation of the form

119883(119899+1) = 10 minus 1198861199092

(119899)+ 119884(119899)

119884(119899+1) = 119887119909(119899)

(18)

with 119886 = 14 and 119887 = 03 and gives an attractor withintrinsic dimension of approximately 13 The plot of Henonmap dataset for 119899 = 2000 is shown in Figure 11

To estimate the intrinsic dimension of the laser generated data, the phase space is reconstructed by delay-time embedding. Although Takens proved that the original state space of a dynamical system can be reconstructed as long as $m > 2D + 1$, where $m$ is the embedding dimension and $D$ denotes the intrinsic dimension of the attractor, it is nontrivial to choose the embedding parameters. If the product $(m - 1)\tau$ is too large, the components of the reconstructed vector become effectively decorrelated in phase space, which leads to a larger dimension estimate. When the product $(m - 1)\tau$ is too small, the components of the reconstructed vector become effectively redundant, which leads to a smaller dimension estimate. In order to compare the results with [4], we select the embedding dimension $m = 5$ and the delay time $\tau = 10$. The dimensions of the Ikeda map and the Henon map, in contrast, are estimated directly from the map points, which avoids selecting $m$ and $\tau$. From Figures 10 and 11 we note that the thinner attractor has the lower dimension. The results are listed in Table 2, from which we can infer that the WCD method is also effective on the real datasets.

Table 2: Intrinsic dimension estimation of real datasets with different methods.

| Dataset | Data dim | Sample size | Improved corr dim | Corr dim | MLE | GMST |
|---|---|---|---|---|---|---|
| Laser generated data | 2.06 | 1000 | 2.1027 | 1.9379 | 2.7124 | 1.9842 |
| Ikeda map | 1.7 | 2000 | 1.6889 | 1.6348 | 1.8010 | 1.8082 |
| Henon map | 1.3 | 2000 | 1.3584 | 1.1989 | 1.5206 | 1.1962 |

Figure 9: Dataset of laser generated data.

Figure 10: Dataset of Ikeda map (axes: Re(Z), Im(Z)).

Figure 11: Dataset of Henon map (axes: x, y).
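The delay-time embedding applied to the laser data can be sketched as follows. This is a minimal illustration, assuming the standard construction in which each reconstructed vector is $(s_t, s_{t+\tau}, \ldots, s_{t+(m-1)\tau})$; the function name `delay_embed` is hypothetical, and only the values $m = 5$ and $\tau = 10$ come from the text.

```python
import numpy as np

def delay_embed(series, m=5, tau=10):
    """Return the delay vectors of a scalar series as an (N, m) array.

    Row t is (s[t], s[t + tau], ..., s[t + (m - 1) * tau]), so a series of
    length L yields N = L - (m - 1) * tau vectors.
    """
    s = np.asarray(series, dtype=float)
    n_vectors = s.size - (m - 1) * tau
    if n_vectors <= 0:
        raise ValueError("series is too short for the chosen m and tau")
    # Index matrix: start position of each row plus the multiples of tau.
    idx = np.arange(n_vectors)[:, None] + tau * np.arange(m)[None, :]
    return s[idx]
```

With 1000 samples, $m = 5$, and $\tau = 10$, this construction gives 960 vectors in a five-dimensional space, which illustrates how the product $(m - 1)\tau$ also reduces the number of usable points.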

6. Conclusion

When the distribution of a dataset is nonuniform, the CD method for intrinsic dimension estimation suffers from a large bias. To address this issue, the WCD method has been proposed, with an optimized weight vector determined by the vertex degree. The factors influencing the WCD method have also been comprehensively analyzed, including the sample size, the selection of the linear portion of the log-log sequence, the number of local bulks used for the correlation integral $C(n, \delta)$, and the selection of scales. The WCD method is validated by experiments on synthetic datasets and real-world datasets.

Compared with the CD method, the main drawback of the WCD method is that the computation slows down when many local bulks $\beta_X(\delta)$ are calculated. However, the experiments indicate that it is unnecessary to calculate all the local bulks of the dataset; using only a few points in the high-density area of the dataset also yields a good result. From the above experiments, it can be seen that the computational cost of the WCD method is almost the same as that of the CD method when fewer than 3500 local bulks are calculated. Moreover, the density estimation of a dataset by vertex degree is only applicable to a single distribution; when the dataset consists of multiple distributions, the WCD method will fail, which should be studied further.

Discrete Dynamics in Nature and Society 9

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

References

[1] C. K. Loo, A. Samraj, and G. C. Lee, “Evaluation of methods for estimating fractal dimension in motor imagery-based brain computer interface,” Discrete Dynamics in Nature and Society, vol. 2011, Article ID 724697, 8 pages, 2011.

[2] A. T. B. Jin, P. Y. Han, and L. H. Siong, “Eigenvector weighting function in face recognition,” Discrete Dynamics in Nature and Society, vol. 2011, Article ID 521935, 15 pages, 2011.

[3] S. Wang, T. Shi, M. Zeng, L. Zhang, F. E. Alsaadi, and T. Hayat, “New results on robust finite-time boundedness of uncertain switched neural networks with time-varying delays,” Neurocomputing, vol. 151, part 1, pp. 522–530, 2015.

[4] F. Camastra and M. Filippone, “A comparative evaluation of nonlinear dynamics methods for time series prediction,” Neural Computing and Applications, vol. 18, no. 8, pp. 1021–1029, 2009.

[5] J. Luo, G. Li, and H. Liu, “Linear control of fractional-order financial chaotic systems with input saturation,” Discrete Dynamics in Nature and Society, vol. 2014, Article ID 802429, 8 pages, 2014.

[6] K. M. Carter, R. Raich, and I. Hero, “On local intrinsic dimension estimation and its applications,” IEEE Transactions on Signal Processing, vol. 58, no. 2, pp. 650–663, 2010.

[7] S. Yin, G. Wang, and X. Yang, “Robust PLS approach for KPI-related prediction and diagnosis against outliers and missing data,” International Journal of Systems Science, vol. 45, no. 7, pp. 1375–1382, 2014.

[8] L. M. Elshenawy, S. Yin, A. S. Naik, and S. X. Ding, “Efficient recursive principal component analysis algorithms for process monitoring,” Industrial and Engineering Chemistry Research, vol. 49, no. 1, pp. 252–259, 2010.

[9] E. E. Abusham and E. K. Wong, “Locally linear discriminate embedding for face recognition,” Discrete Dynamics in Nature and Society, vol. 2009, Article ID 916382, 8 pages, 2009.

[10] S. Samudrala, K. Rajanr, and B. Ganapathysubramanian, “Data dimensionality reduction in materials science,” in Informatics for Materials Science and Engineering: Data-Driven Discovery for Accelerated Experimentation and Application, vol. 1, pp. 97–98, Elsevier Science, 2013.

[11] K. M. Carter, R. Raich, and A. O. Hero, “On local intrinsic dimension estimation and its applications,” IEEE Transactions on Signal Processing, vol. 58, no. 2, pp. 650–663, 2010.

[12] L. Liao, Y. Zhang, S. J. Maybank, and Z. Liu, “Intrinsic dimension estimation via nearest constrained subspace classifier,” Pattern Recognition, vol. 47, no. 3, pp. 1485–1493, 2014.

[13] R. Heylen and P. Scheunders, “Hyperspectral intrinsic dimensionality estimation with nearest-neighbor distance ratios,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 6, no. 2, pp. 570–579, 2013.

[14] S. Yin, S. X. Ding, X. Xie, and H. Luo, “A review on basic data-driven approaches for industrial process monitoring,” IEEE Transactions on Industrial Electronics, vol. 61, no. 11, pp. 6418–6428, 2014.

[15] S. Ding, S. Yin, K. Peng, H. Hao, and B. Shen, “A novel scheme for key performance indicator prediction and diagnosis with application to an industrial hot strip mill,” IEEE Transactions on Industrial Informatics, vol. 9, no. 4, pp. 2239–2247, 2012.

[16] F. Camastra, “Data dimensionality estimation methods: a survey,” Pattern Recognition, vol. 36, no. 12, pp. 2945–2954, 2003.

[17] J. C. Harsanyi and C.-I. Chang, “Hyperspectral image classification and dimensionality reduction: an orthogonal subspace projection approach,” IEEE Transactions on Geoscience and Remote Sensing, vol. 32, no. 4, pp. 779–785, 1994.

[18] E. Levina and P. J. Bickel, “Maximum likelihood estimation of intrinsic dimension,” in Proceedings of the Annual Conference on Neural Information Processing Systems (NIPS ’04), pp. 1092–1106, Vancouver, Canada, 2004.

[19] J. He, L. Ding, L. Jiang, Z. Li, and Q. Hu, “Intrinsic dimensionality estimation based on manifold assumption,” Journal of Visual Communication and Image Representation, vol. 25, no. 5, pp. 740–747, 2014.

[20] F. Camastra and A. Vinciarelli, “Estimating the intrinsic dimension of data with a fractal-based method,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 10, pp. 1404–1407, 2002.

[21] M. Sadikin and I. Wasito, “Fractal dimension as a data dimensionality reduction method for anomaly detection in time series,” in Proceedings of the 7th International Conference on Information & Communication Technologies (ICT ’13), vol. 1, May 2013.

[22] Z. Feng and X. Sun, “Box-counting dimensions of fractal interpolation surfaces derived from fractal interpolation functions,” Journal of Mathematical Analysis and Applications, vol. 412, no. 1, pp. 416–425, 2014.

[23] D. Sankar and T. Thomas, “Fractal features based on differential box counting method for the categorization of digital mammograms,” International Journal of Computer Information System and Industrial Management Applications, vol. 2, pp. 11–19, 2010.

[24] Y.-C. Tzeng, K.-T. Fan, and K.-S. Chen, “A parallel differential box-counting algorithm applied to hyperspectral image classification,” IEEE Geoscience and Remote Sensing Letters, vol. 9, no. 2, pp. 272–276, 2012.

[25] A. Yarlagadda, J. V. R. Murthy, and M. H. M. Krishna prasad, “Estimating correlation dimension using multi layered grid and damped window model over data streams,” Procedia Technology, vol. 10, pp. 797–804, 2013.

[26] A. R. Osborne and A. Provenzale, “Finite correlation dimension for stochastic systems with power-law spectra,” Physica D: Nonlinear Phenomena, vol. 35, no. 3, pp. 357–381, 1989.

[27] M. Fan, X. Zhang, S. Chen, H. Bao, and S. Maybank, “Dimension estimation of image manifolds by minimal cover approximation,” Neurocomputing, vol. 105, no. 1, pp. 19–29, 2013.

[28] V. Pestov, “An axiomatic approach to intrinsic dimension of a dataset,” Neural Networks, vol. 21, no. 2-3, pp. 204–213, 2008.

[29] B. Kegl, “Intrinsic dimension estimation using packing numbers,” in Proceedings of the 16th Annual Neural Information Processing Systems Conference (NIPS ’02), pp. 681–688, December 2002.

[30] K. Johnsson, Manifold dimension estimation for omics data analysis: current methods and a novel approach [M.S. thesis], Lund University, 2011.

[31] J. Theiler, “Estimating fractal dimension,” Journal of the Optical Society of America A, vol. 7, no. 6, pp. 1055–1073, 1990.

[32] D. Schleicher, “Hausdorff dimension, its properties, and its surprises,” The American Mathematical Monthly, vol. 114, no. 6, pp. 509–528, 2007.

[33] D. Mo and S. H. Huang, “Fractal-based intrinsic dimension estimation and its application in dimensionality reduction,” IEEE Transactions on Knowledge and Data Engineering, vol. 24, no. 1, pp. 59–71, 2012.

[34] P. Grassberger and I. Procaccia, “Measuring the strangeness of strange attractors,” in The Theory of Chaotic Attractors, pp. 170–189, Springer, New York, NY, USA, 2004.



to choose the linear portion of the sequence. However, to maximize our sample size, we want to include as many points as possible. How can we accurately choose the linear points from the log-log sequence? Given the distinct characteristics of the three portions of the sequence, we can use the $k$-means clustering method to decide which pairs of the log-log sequence should be used for CD estimation. The $k$-means clustering method aims to partition the log-log sequence into three categories by minimizing the objective function

$$
\arg\min_{S} \sum_{i=1}^{c} \sum_{y_j \in S_i} \left\| y_j - \mu_i \right\|^2, \tag{16}
$$

where $y_j$ is a pair $(\log \delta_k, \log C(n, \delta_k))$ of the sequence; $c = 3$ is the number of categories, namely, the lower portion, the middle portion, and the upper portion; $S_1$, $S_2$, $S_3$ are the three categories; and $\mu_i$ is the mean of category $S_i$. Hence, the points that belong to the middle category $S_2$ are fitted with a straight line by the least squares method, and the fitted slope is used to estimate the CD. The most important factor of the $k$-means method is the initial value of $\mu_i$. In this paper, the curve is divided equally into three portions, and the mean of each portion is used as the initial value of $\mu_i$.
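The selection step described above can be sketched in a few lines. This is an illustrative implementation under stated assumptions, not the authors' code: it runs plain Lloyd iterations on the two-dimensional pairs, uses the three equal thirds of the curve as initial means (as the text prescribes), and takes the cluster with the median center along $\log \delta$ as the middle category. The name `select_linear_portion` and the iteration cap `n_iter` are hypothetical.

```python
import numpy as np

def select_linear_portion(log_delta, log_c, n_iter=100):
    """Cluster the log-log pairs into 3 groups and fit a line to the middle one.

    Returns (slope, mask), where mask marks the pairs of the middle cluster in
    the order in which the inputs were given.
    """
    pts = np.column_stack([log_delta, log_c]).astype(float)
    order = np.argsort(pts[:, 0])          # walk along the curve by scale
    pts = pts[order]

    # Initial means: the curve is divided equally into three portions.
    centers = np.array([part.mean(axis=0) for part in np.array_split(pts, 3)])

    for _ in range(n_iter):
        d2 = ((pts[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)         # nearest center for every pair
        new_centers = np.array([
            pts[labels == i].mean(axis=0) if np.any(labels == i) else centers[i]
            for i in range(3)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers

    # Middle category: the cluster whose center has the median log-scale.
    middle_label = np.argsort(centers[:, 0])[1]
    middle_sorted = labels == middle_label
    if middle_sorted.sum() < 2:
        raise ValueError("middle cluster has too few points for a line fit")

    # Least squares line through the middle category; its slope estimates CD.
    slope, _ = np.polyfit(pts[middle_sorted, 0], pts[middle_sorted, 1], 1)

    mask = np.empty(len(pts), dtype=bool)
    mask[order] = middle_sorted
    return slope, mask
```

On an idealized sequence with a flat lower portion, a linear middle portion of slope 2, and a saturated upper portion, the middle cluster falls inside the linear stretch, so the fitted slope recovers 2. Real sequences are noisier, and the cluster boundaries then depend on the relative scaling of the two coordinates.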

3.3. Complexity Analysis of WCD Method. In this section, the computational complexity of the WCD method is investigated and compared with that of the CD method. From the whole calculation process, we can see that computing the local bulks costs the WCD method more calculations than the CD method. For the analysis, we assume that the sample size is $n$. The calculation of a local bulk $\beta_j(\delta)$ at point $x_j$ with scale $\delta$ requires $n - 1$ operations, so its complexity is $O(n - 1)$. There are $n$ local bulks to be calculated, so the overall complexity is $O((n - 1)n)$, whereas that of the CD method is only $O(n - 1)$. In addition, unlike the CD method, the WCD method needs to calculate the vertex degree, whose cost cannot be ignored when the sample size is huge. All this suggests that the computational complexity of the WCD method is much higher than that of the CD method. In practice, however, it is unnecessary to calculate all the local bulks of the dataset for the WCD method: very few points suffice to estimate the local bulks while still giving a highly accurate result. The computational cost of the WCD method is then almost the same as that of the CD method, which is confirmed by the following experiments.
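The cost argument can be made concrete with a small sketch. It assumes that the local bulk $\beta_j(\delta)$ is the fraction of the other $n - 1$ points lying within distance $\delta$ of $x_j$; the normalization used in the paper may differ, and the function names and the `ref_idx` parameter are hypothetical. Each local bulk needs $n - 1$ distance evaluations, so using $k$ reference points costs about $k(n - 1)$ of them instead of $n(n - 1)$.

```python
import numpy as np

def local_bulk(data, j, delta):
    """Fraction of the other points within distance delta of data[j]."""
    dist = np.linalg.norm(data - data[j], axis=1)
    # The point itself is at distance 0 and is excluded from the count.
    return (np.count_nonzero(dist <= delta) - 1) / (len(data) - 1)

def correlation_integral(data, delta, ref_idx=None):
    """Average local bulk over the reference points (all points by default)."""
    data = np.asarray(data, dtype=float)
    if ref_idx is None:
        ref_idx = range(len(data))
    return float(np.mean([local_bulk(data, j, delta) for j in ref_idx]))
```

Averaging over all points reproduces the usual pair-counting correlation integral, while passing a short `ref_idx` list restricts the work to the chosen reference points.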

4. Experimental Study

Many factors affect the results of the WCD method, including the sample size, the intrinsic dimension, the selection of the linear portion of the log-log sequence, the number of local bulks used for the correlation integral $C(n, \delta)$, and the selection of scales. In our experiments, samples with different dimensions and sample sizes are generated by the MATLAB randn function. Each sample is drawn independently from a Gaussian distribution. The performance of the WCD method is compared with that of the CD method, and the various factors are analyzed. The dimension estimates of both the WCD and CD methods are depicted in Figures 5(a), 5(b), and 5(c) for three different sample sizes. Specifically, only sample sizes of 100, 200, and 500 and intrinsic dimensions of 3, 5, and 8 are plotted; the results are similar for other sample sizes and intrinsic dimensions. For each plot in Figure 5, the horizontal axis indicates the number of local bulks $\beta$, whose maximum value is the sample size. The vertical axis represents the actual and the estimated values (via the WCD method and the CD method) of the intrinsic dimension. Each horizontal green line represents the actual intrinsic dimension for reference. Each red dot denotes the intrinsic dimension estimated by the WCD method. Each black asterisk denotes the intrinsic dimension estimated by the CD method.
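For readers who want to reproduce the baseline in this setup, the following sketch draws Gaussian samples and computes a plain correlation dimension estimate as the least squares slope of $\log C(n, \delta)$ against $\log \delta$. It is only an illustration: NumPy's `standard_normal` stands in for MATLAB's randn, the scale range is an arbitrary assumption, no linear-portion selection or weighting is applied, and the function name is hypothetical.

```python
import numpy as np

def correlation_dimension(data, deltas):
    """Slope of log C(n, delta) versus log delta over the given scales."""
    data = np.asarray(data, dtype=float)
    deltas = np.asarray(deltas, dtype=float)
    n = len(data)
    diff = data[:, None, :] - data[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    pair_dist = dist[np.triu_indices(n, k=1)]      # each pair counted once
    c = np.array([np.mean(pair_dist <= d) for d in deltas])
    keep = c > 0                                    # log is undefined at 0
    slope, _ = np.polyfit(np.log(deltas[keep]), np.log(c[keep]), 1)
    return slope, c

rng = np.random.default_rng(0)
sample = rng.standard_normal((500, 3))             # 500 points, dimension 3
scales = np.geomspace(0.2, 1.0, 10)                # assumed scale range
cd_estimate, _ = correlation_dimension(sample, scales)
```

The estimate depends noticeably on the chosen scales, which is the sensitivity that the selection of the linear portion is meant to address.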

It can be observed from Figure 5 that the intrinsic dimensions calculated by the WCD method are more accurate than those calculated by the CD method. However, the front part of the curves plotted by the WCD method fluctuates frequently. This is because few local bulks are used for intrinsic dimension estimation there, which makes the result unstable. In addition, all the curves plotted by the WCD method slope downward as the number of local bulks increases, but they still converge to a good value. In general, the front part of the curves plotted by the WCD method is more precise than the latter part. The loss of precision is mainly caused by the data distribution. The high-density area is chosen first to calculate the local bulks by the vertex degree method, leading to high accuracy. However, as more sparse points or boundary points are used to calculate the local bulks, accuracy is lost. Hence, it is inferred that the number of points used to calculate the local bulks is one of the main factors in intrinsic dimension estimation. This also verifies the effectiveness of using a small number of high-density points to estimate the intrinsic dimension by the WCD method. Examining the curves estimated by both methods, when the sample size is fixed, the accuracy gradually decreases as the actual intrinsic dimension increases. The main reason is that, for the same sample size, the dataset becomes more and more sparse with increasing intrinsic dimension. Observing the curves in Figures 5(a), 5(b), and 5(c), respectively, it can be seen that, for the same actual intrinsic dimension, the accuracy of both methods tends to improve with increasing sample size. This is because the dataset becomes denser as the sample size increases. Additionally, the selection of the scales $\delta$ is also an important factor affecting the performance of intrinsic dimension estimation. Smaller scales $\delta$ are easily affected by noise, whereas larger scales result in a saturation phenomenon, in which the correlation integral $C(n, \delta)$ no longer changes with increasing scale $\delta$. In addition, using many scales inevitably increases the computational cost, whereas using too few reduces the precision.
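The idea of visiting high-density points first can be illustrated with a short sketch. It assumes an $\varepsilon$-neighborhood graph, in which two points are joined when their distance is at most `eps`; the graph construction and the weighting rule actually used by the WCD method are defined elsewhere in the paper and may differ. The function names and parameters are hypothetical.

```python
import numpy as np

def vertex_degrees(data, eps):
    """Degree of every point in the undirected eps-neighborhood graph."""
    data = np.asarray(data, dtype=float)
    diff = data[:, None, :] - data[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    adjacency = dist <= eps
    np.fill_diagonal(adjacency, False)     # no self-loops
    return adjacency.sum(axis=1)

def densest_points(data, eps, k):
    """Indices of the k points with the highest vertex degree."""
    degrees = vertex_degrees(data, eps)
    return np.argsort(-degrees, kind="stable")[:k]
```

Points in sparse or boundary regions receive low degrees and are therefore visited last, which matches the observation that the estimate degrades once such points are included.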

To analyze the calculation speed, we generate three-dimensional datasets with sample sizes from 100 to 4000 by the MATLAB randn function and estimate the intrinsic dimension by the four methods (WCD, CD, MLE, and GMST). The computation time of all four methods is shown in Figure 6. It reveals that the GMST method costs the most computation time, while the WCD method, the MLE method, and the CD method cost almost the same calculation time. However, the computation speed of the WCD method obviously slows down as the number of local bulks increases.

Figure 5: Estimated and actual intrinsic dimension for datasets of different sample sizes: (a) 100 points, (b) 200 points, (c) 500 points. Horizontal axis: the number $\beta$ used to gather statistics; vertical axis: actual value and estimate of intrinsic dimension. Legend: weighted correlation dimension, correlation dimension, intrinsic dimension.

5. Empirical Results

In order to validate the proposed method, the WCD method is used to estimate the intrinsic dimension of two kinds of datasets (synthetic datasets and real-world datasets). Moreover, comparisons with the geodesic minimum spanning tree (GMST), correlation dimension (CD), and maximum likelihood estimation (MLE) methods are also performed to further demonstrate the advantage of the method developed in this paper.

5.1. Synthetic Datasets. In this subsection, two synthetic datasets (Koch curve and S-curve) are investigated first. The sample sizes of the two datasets are 2000, respectively, and the plots are shown in Figures 7 and 8. The dimensions estimated by all methods are listed in Table 1. The Koch curve originates from a line whose middle segment is repeatedly replaced by an equilateral triangle. If we use a tool whose dimension is less than 1 to measure the Koch curve, its Hausdorff measure is infinite. If we use two dimensions to measure it, its Hausdorff measure is 0. So the intrinsic dimension of the Koch curve is between 1 and 2, and the dimensions estimated by the four methods fall into this range. Moreover, the data points in the S-curve dataset are contained in a curved surface in three-dimensional space, so the intrinsic dimension of the S-curve dataset is 2. The obtained results show that all the considered methods have high accuracy, among which the one developed in this paper is the most accurate.

Table 1: Intrinsic dimension estimation of synthetic datasets with different methods.

| Dataset | Data dim | Sample size | Improved corr dim | Corr dim | MLE | GMST |
|---|---|---|---|---|---|---|
| Koch curve | 1-2 | 1025 | 1.3801 | 1.1206 | 1.7481 | 1.3424 |
| S-curve | 2 | 2000 | 2.0005 | 1.9567 | 1.9865 | 2.0994 |

Figure 6: Calculation time comparison (horizontal axis: sample size; vertical axis: calculation time (s); legend: improved method, GMST, corr dim, MLE).

Figure 7: Koch curve dataset.

Figure 8: S-curve dataset.
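The construction of the Koch curve described in this subsection can be sketched as follows, using complex numbers for the plane. This is an illustrative generator, not the authors' procedure: the function name, the unit base segment, and the choice of five iterations are assumptions. Five iterations give $4^5 + 1 = 1025$ vertices, which coincides with the sample size listed for the Koch curve in Table 1, although the paper does not say how its dataset was produced.

```python
import numpy as np

def koch_curve(n_iter=5):
    """Vertices of the Koch curve on the unit segment after n_iter steps."""
    pts = np.array([0.0 + 0.0j, 1.0 + 0.0j])
    rot = np.exp(1j * np.pi / 3)           # rotation by 60 degrees
    for _ in range(n_iter):
        start, end = pts[:-1], pts[1:]
        third = (end - start) / 3
        p1 = start + third                 # end of the first third
        p2 = p1 + third * rot              # apex of the equilateral triangle
        p3 = start + 2 * third             # start of the last third
        # Interleave the four new vertices of every segment, then close the
        # polyline with the final endpoint.
        refined = np.column_stack([start, p1, p2, p3]).ravel()
        pts = np.append(refined, pts[-1])
    return np.column_stack([pts.real, pts.imag])

koch_points = koch_curve()
```

Each iteration replaces every segment by four segments of one third the length, so the total length grows by a factor of 4/3 per step, which is why the one-dimensional measure diverges while the two-dimensional measure vanishes.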

5.2. Real Datasets. Following a process similar to that in Section 5.1, three real datasets (the laser generated data, the Ikeda map, and the Henon map) are analyzed in this subsection. The datasets are described as follows.

5.2.1. Laser Generated Data. The data were recorded from a far-infrared laser in a chaotic state [4]. The dataset consists of 1000 samples, and the attractor dimension is approximately 2.26. The plot is shown in Figure 9.

5.2.2. Ikeda Map. The Ikeda map [31] is a complex map defined by

$$
z_{n+1} = a + R z_n \exp\left( i \left( \phi - \frac{p}{1 + |z_n|^2} \right) \right). \tag{17}
$$

The Ikeda map is derived from a model of the plane-wave interactivity field in an optical ring laser. It is iterated many times, and the points $[\operatorname{Re}(z_n), \operatorname{Im}(z_n)]$ are plotted for $n = 2000$. Here $a = 1.0$, $R = 0.9$, $\phi = 0.4$, and $p = 6$. The intrinsic dimension of this attractor is approximately 1.7. The visualization of the map is shown in Figure 10.
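A dataset of this kind could be generated by iterating equation (17) directly, as in the sketch below. The parameter values are those quoted in the text; the function name `ikeda_map`, the starting point $z_0 = 0$, and the number of discarded transient iterations are assumptions made for illustration.

```python
import cmath
import numpy as np

def ikeda_map(n_points=2000, a=1.0, R=0.9, phi=0.4, p=6.0,
              n_discard=500, z0=0j):
    """Iterate the Ikeda map and return an (n_points, 2) array [Re z, Im z].

    n_discard and z0 are illustrative assumptions used to skip the transient.
    """
    z = z0
    points = np.empty((n_points, 2))
    for k in range(n_discard + n_points):
        z = a + R * z * cmath.exp(1j * (phi - p / (1.0 + abs(z) ** 2)))
        if k >= n_discard:
            points[k - n_discard] = (z.real, z.imag)
    return points

ikeda_points = ikeda_map()
```

Because $|z_{n+1}| \le a + R|z_n|$ with $R < 1$, the orbit started at the origin stays within the disk of radius $a / (1 - R) = 10$, so the iteration cannot diverge for these parameters.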


Page 6: Research Article Dimension Estimation Using Weighted

[Figure 5: Estimated and actual intrinsic dimension for datasets of different sample sizes. Panels: (a) estimate of intrinsic dimension with 100 points, (b) with 200 points, (c) with 500 points. Horizontal axis: the number $\beta$ used to gather statistics. Vertical axis: actual value and estimate of intrinsic dimension. Curves: weighted correlation dimension, correlation dimension, intrinsic dimension.]

calculation time. But the computation speed of the WCD method will obviously slow down as the number of local bulks increases.

5. Empirical Results

In order to validate the proposed method, the WCD method is used to estimate the intrinsic dimension of two kinds of datasets (synthetic datasets and real-world datasets). Moreover, comparisons with the geodesic minimum spanning tree (GMST), correlation dimension (CD), and maximum likelihood estimation (MLE) methods are also performed to further demonstrate the advantage of the method developed in this paper.

[Figure 6: Calculation time comparison. Horizontal axis: sample size. Vertical axis: calculation time (s). Curves: improved method, GMST, corr. dim., MLE.]

5.1. Synthetic Datasets. In this subsection, two synthetic datasets (Koch curve and S-curve) are investigated first. The sample sizes of the two datasets are 2000, respectively, and plots are shown in Figures 7 and 8. The dimensions estimated by all methods are listed in Table 1. The Koch curve originates from a line whose middle segment is repeatedly replaced by an equilateral triangle. If we use a tool whose dimension is less than 1 to measure the Koch curve, its Hausdorff measure is infinite. If we use two dimensions to measure it, its Hausdorff measure is 0. So the intrinsic dimension of the Koch curve is between 1 and 2, and the dimensions estimated by the four methods fall into this range. Moreover, the data points in the S-curve dataset are contained in a curved surface in three-dimensional space, so the intrinsic dimension of the S-curve dataset is 2. The obtained results show that all the considered methods have high accuracy, with the method developed in this paper performing best.

[Figure 7: Koch curve dataset.]

[Figure 8: S-curve dataset.]

Table 1: Intrinsic dimension estimation of synthetic datasets with different methods.

| Dataset | Data dim. | Sample size | Improved corr. dim. | Corr. dim. | MLE | GMST |
|---|---|---|---|---|---|---|
| Koch curve | 1–2 | 1025 | 1.3801 | 1.1206 | 1.7481 | 1.3424 |
| S-curve | 2 | 2000 | 2.0005 | 1.9567 | 1.9865 | 2.0994 |
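The Koch construction described above can be sketched in a few lines. This is only an illustration under stated assumptions: the paper does not say how its Koch dataset was generated or scaled, and the function name `koch_curve` and the unit-length starting segment are hypothetical choices. The sketch also evaluates the well-known similarity dimension of the Koch curve, log 4 / log 3, which lies between 1 and 2 as the text argues.

```python
import numpy as np

def koch_curve(iterations: int) -> np.ndarray:
    """Return the vertices of a Koch curve after `iterations` refinements.

    Points are handled as complex numbers; the starting segment [0, 1]
    is an assumption, not the scaling used in the paper's Figure 7.
    """
    pts = np.array([0.0 + 0.0j, 1.0 + 0.0j])
    rot = np.exp(1j * np.pi / 3)  # rotation by +60 degrees
    for _ in range(iterations):
        a, b = pts[:-1], pts[1:]
        d = (b - a) / 3.0
        # Each segment a->b becomes a, a+d, peak, a+2d; b is supplied
        # by the next segment (or appended at the very end).
        new = np.column_stack([a, a + d, a + d + d * rot, a + 2 * d]).ravel()
        pts = np.append(new, pts[-1])
    return np.column_stack([pts.real, pts.imag])

points = koch_curve(5)                   # 4**5 segments, 4**5 + 1 vertices
similarity_dim = np.log(4) / np.log(3)   # each step: 4 copies scaled by 1/3
```

Five refinements give 4^5 + 1 vertices, which happens to match the sample size listed for the Koch curve in Table 1; whether the authors generated their data this way is not stated in the text.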

5.2. Real Datasets. Following a process similar to that in Section 5.1, three real datasets (the laser generated data, the Ikeda map, and the Henon map) are analyzed in this subsection. The datasets are described as follows.

5.2.1. Laser Generated Data. The data were recorded from a far-infrared laser in a chaotic state [4] and consist of 1000 samples; the attractor dimension is approximately 2.26. The plot is shown in Figure 9.

5.2.2. Ikeda Map. The Ikeda map [31] is a complex map defined by

$$z_{n+1} = a + R z_n \exp\left(i\left(\phi - \frac{p}{1 + |z_n|^2}\right)\right). \tag{17}$$

The Ikeda map is derived from a model of the plane-wave interactivity field in an optical ring laser. It is iterated many times, and the points $[\mathrm{Re}(z_n), \mathrm{Im}(z_n)]$ are plotted for $n = 2000$. Here $a = 1.0$, $R = 0.9$, $\phi = 0.4$, and $p = 6$. The intrinsic dimension of this attractor is approximately 1.7. The visualization of the map is shown in Figure 10.
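As an illustration of how such a dataset could be produced, the following sketch iterates the Ikeda map with the parameter values quoted in the text. The function name `ikeda_orbit`, the starting point $z_0 = 0$, and the `burn_in` transient are assumptions introduced here; the paper does not state its initial condition or whether a transient was discarded.

```python
import numpy as np

def ikeda_orbit(n_points: int, a: float = 1.0, R: float = 0.9,
                phi: float = 0.4, p: float = 6.0,
                burn_in: int = 1000) -> np.ndarray:
    """Iterate the Ikeda map and return n_points rows of [Re(z), Im(z)].

    burn_in (hypothetical) iterations are discarded so that the
    returned points lie close to the attractor.
    """
    z = 0.0 + 0.0j  # assumed initial condition
    out = np.empty((n_points, 2))
    for k in range(burn_in + n_points):
        z = a + R * z * np.exp(1j * (phi - p / (1.0 + abs(z) ** 2)))
        if k >= burn_in:
            out[k - burn_in] = (z.real, z.imag)
    return out

ikeda = ikeda_orbit(2000)
```

Because $|z_{n+1}| \le a + R|z_n|$, an orbit started at the origin can never leave the disc of radius $a / (1 - R) = 10$, which gives a quick sanity check on the output.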

5.2.3. Henon Map. The Henon map [31] is usually cast as a pair of equations of the form

$$x_{n+1} = 1.0 - a x_n^2 + y_n, \qquad y_{n+1} = b x_n, \tag{18}$$

with $a = 1.4$ and $b = 0.3$, which gives an attractor with an intrinsic dimension of approximately 1.3. The plot of the Henon map dataset for $n = 2000$ is shown in Figure 11.
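A minimal sketch of generating Henon map data with the parameters above follows. The function name `henon_orbit`, the starting point at the origin, and the `burn_in` transient are assumptions made for illustration; the paper does not specify them.

```python
import numpy as np

def henon_orbit(n_points: int, a: float = 1.4, b: float = 0.3,
                burn_in: int = 1000) -> np.ndarray:
    """Iterate the Henon map and return n_points rows of [x, y].

    The first burn_in (hypothetical) iterations are discarded.
    """
    x, y = 0.0, 0.0  # assumed initial condition
    out = np.empty((n_points, 2))
    for k in range(burn_in + n_points):
        # Simultaneous update: the new y uses the old x.
        x, y = 1.0 - a * x * x + y, b * x
        if k >= burn_in:
            out[k - burn_in] = (x, y)
    return out

henon = henon_orbit(2000)
```

The resulting points stay within the axis ranges of Figure 11 (roughly $|x| < 1.5$ and $|y| < 0.4$), which is a convenient check that the update order is right.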

For estimating the intrinsic dimension of the laser generated data, phase space is reconstructed by the delay-time embedding technique. Although Takens has proved that the original state space of a dynamical system will be reconstructed as long as $m > 2D + 1$, where $m$ is the embedding dimension and $D$ denotes the intrinsic dimension of the attractor, it is nontrivial to choose the embedding parameters. If the product $(m - 1)\tau$ is too large, the reconstructed vectors will be effectively decorrelated in phase space, which leads to a larger dimension estimate. When the product $(m - 1)\tau$ is too small, the reconstructed vectors become effectively redundant, which leads to a smaller dimension estimate. In order to compare the index with [4], we select embedding dimension $m = 5$ and delay time $\tau = 10$. Furthermore, the dimensions of the Ikeda map and the Henon map are estimated directly by the dimension estimation methods, which avoids selecting $m$ and $\tau$. From Figures 10 and 11, we note that the thinner the attractor, the lower its dimension. The results are listed in Table 2, from which we can infer that the WCD method is also effective on the real datasets.

Table 2: Intrinsic dimension estimation of real datasets with different methods.

| Dataset | Data dim. | Sample size | Improved corr. dim. | Corr. dim. | MLE | GMST |
|---|---|---|---|---|---|---|
| Laser generated data | 2.06 | 1000 | 2.1027 | 1.9379 | 2.7124 | 1.9842 |
| Ikeda map | 1.7 | 2000 | 1.6889 | 1.6348 | 1.8010 | 1.8082 |
| Henon map | 1.3 | 2000 | 1.3584 | 1.1989 | 1.5206 | 1.1962 |

[Figure 9: Dataset of laser generated data.]

[Figure 10: Dataset of Ikeda map. Horizontal axis: Re(Z). Vertical axis: Im(Z).]

[Figure 11: Dataset of Henon map. Horizontal axis: x. Vertical axis: y.]
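The delay-time embedding used for the laser data can be sketched as follows. This is a generic illustration, not the authors' code: the function name `delay_embed` is hypothetical, and the sine wave stands in for the laser series, which is not reproduced here. Only the values $m = 5$ and $\tau = 10$ come from the text.

```python
import numpy as np

def delay_embed(series, m: int, tau: int) -> np.ndarray:
    """Build delay vectors [s(t), s(t + tau), ..., s(t + (m - 1) * tau)].

    Returns an array of shape (len(series) - (m - 1) * tau, m).
    """
    series = np.asarray(series, dtype=float)
    n_vectors = series.size - (m - 1) * tau
    if n_vectors <= 0:
        raise ValueError("series is too short for this m and tau")
    # Column i holds the series shifted by i * tau samples.
    return np.column_stack(
        [series[i * tau : i * tau + n_vectors] for i in range(m)]
    )

# Stand-in signal with 1000 samples (NOT the laser data).
signal = np.sin(0.1 * np.arange(1000))
embedded = delay_embed(signal, m=5, tau=10)
```

With 1000 samples, $m = 5$, and $\tau = 10$, the embedding window spans $(m - 1)\tau = 40$ samples, so 960 delay vectors remain for the dimension estimators.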

6. Conclusion

When the distribution of a dataset is nonuniform, the CD method for intrinsic dimension estimation suffers from large bias. To address this issue, the WCD method has been proposed, with an optimized weight vector determined by the vertex degree. The factors influencing the WCD method have also been comprehensively analyzed, including the sample size, the selection of the linear portion of the log-log sequence, the number of local bulks used for the correlation integral $C(n, \delta)$, and the selection of scales. The WCD method is validated by experiments on synthetic datasets and real-world datasets.
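For orientation, the unweighted CD baseline referred to above can be sketched as a slope fit on the log-log correlation integral. This is a simplified illustration of the standard correlation-dimension idea, not the WCD method: the function names are hypothetical, the scales are fixed by hand rather than selected by clustering, and every pair of points contributes equally.

```python
import numpy as np

def correlation_integral(points: np.ndarray, radii: np.ndarray) -> np.ndarray:
    """Fraction of distinct point pairs closer than each radius."""
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    pair_d = dists[np.triu_indices(len(points), k=1)]  # each pair once
    return np.array([(pair_d < r).mean() for r in radii])

def correlation_dimension(points: np.ndarray, radii: np.ndarray) -> float:
    """Least-squares slope of log C(r) against log r over the given radii."""
    c = correlation_integral(points, radii)
    slope, _intercept = np.polyfit(np.log(radii), np.log(c), 1)
    return float(slope)

# Toy check: 500 points on a straight segment embedded in 3-D space.
rng = np.random.default_rng(0)
t = rng.uniform(0.0, 1.0, size=500)
line = np.column_stack([t, 2.0 * t, -t])
radii = np.logspace(np.log10(0.02), np.log10(0.2), 8)
estimate = correlation_dimension(line, radii)
```

On this toy segment the fitted slope should come out close to 1, the dimension of a line. Real data need a careful choice of the scale range, which is the step the paper automates.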

Compared with the CD method, the main drawback of the WCD method is that the computation slows down when many local bulks $\beta_X(\delta)$ are calculated. However, the experiments indicate that it is unnecessary to calculate all the local bulks of the dataset; using only a few points in the high-density area of the dataset also gives a good result. From the above experiments, it can be seen that the computational complexity of the WCD method is almost the same as that of the CD method when the number of local bulks is less than 3500. Moreover, the density estimation of a dataset by vertex degree is only applicable to a single distribution; when the dataset follows multiple distributions, the WCD method will fail, which should be studied further.
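The idea of restricting the computation to a few points in the high-density area can be illustrated with a vertex-degree ranking. The sketch below is a loose illustration under explicit assumptions: it uses an undirected $\varepsilon$-neighbourhood graph, and the names `vertex_degrees`, `densest_points`, `eps`, and `beta` are hypothetical. The exact graph construction and weighting used by the WCD method are defined earlier in the paper and are not reproduced here.

```python
import numpy as np

def vertex_degrees(points: np.ndarray, eps: float) -> np.ndarray:
    """Degree of each point in an undirected eps-neighbourhood graph
    (assumed graph type; two points are linked if closer than eps)."""
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    adjacency = dists < eps
    np.fill_diagonal(adjacency, False)  # no self-loops
    return adjacency.sum(axis=1)

def densest_points(points: np.ndarray, eps: float, beta: int) -> np.ndarray:
    """Indices of the beta points with the highest vertex degree."""
    degrees = vertex_degrees(points, eps)
    order = np.argsort(-degrees, kind="stable")  # ties keep index order
    return order[:beta]

# Toy data: a tight cluster of 20 points plus 3 isolated points.
rng = np.random.default_rng(1)
cluster = rng.normal(0.0, 0.05, size=(20, 2))
outliers = np.array([[5.0, 5.0], [-6.0, 4.0], [7.0, -7.0]])
data = np.vstack([cluster, outliers])
selected = densest_points(data, eps=0.5, beta=10)
```

In this toy example the isolated points have degree zero and are never selected, so any statistic computed from `selected` only uses the dense region.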

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

References

[1] C. K. Loo, A. Samraj, and G. C. Lee, "Evaluation of methods for estimating fractal dimension in motor imagery-based brain computer interface," Discrete Dynamics in Nature and Society, vol. 2011, Article ID 724697, 8 pages, 2011.

[2] A. T. B. Jin, P. Y. Han, and L. H. Siong, "Eigenvector weighting function in face recognition," Discrete Dynamics in Nature and Society, vol. 2011, Article ID 521935, 15 pages, 2011.

[3] S. Wang, T. Shi, M. Zeng, L. Zhang, F. E. Alsaadi, and T. Hayat, "New results on robust finite-time boundedness of uncertain switched neural networks with time-varying delays," Neurocomputing, vol. 151, part 1, pp. 522–530, 2015.

[4] F. Camastra and M. Filippone, "A comparative evaluation of nonlinear dynamics methods for time series prediction," Neural Computing and Applications, vol. 18, no. 8, pp. 1021–1029, 2009.

[5] J. Luo, G. Li, and H. Liu, "Linear control of fractional-order financial chaotic systems with input saturation," Discrete Dynamics in Nature and Society, vol. 2014, Article ID 802429, 8 pages, 2014.

[6] K. M. Carter, R. Raich, and I. Hero, "On local intrinsic dimension estimation and its applications," IEEE Transactions on Signal Processing, vol. 58, no. 2, pp. 650–663, 2010.

[7] S. Yin, G. Wang, and X. Yang, "Robust PLS approach for KPI-related prediction and diagnosis against outliers and missing data," International Journal of Systems Science, vol. 45, no. 7, pp. 1375–1382, 2014.

[8] L. M. Elshenawy, S. Yin, A. S. Naik, and S. X. Ding, "Efficient recursive principal component analysis algorithms for process monitoring," Industrial and Engineering Chemistry Research, vol. 49, no. 1, pp. 252–259, 2010.

[9] E. E. Abusham and E. K. Wong, "Locally linear discriminate embedding for face recognition," Discrete Dynamics in Nature and Society, vol. 2009, Article ID 916382, 8 pages, 2009.

[10] S. Samudrala, K. Rajanr, and B. Ganapathysubramanian, "Data dimensionality reduction in materials science," in Informatics for Materials Science and Engineering: Data-Driven Discovery for Accelerated Experimentation and Application, vol. 1, pp. 97–98, Elsevier Science, 2013.

[11] K. M. Carter, R. Raich, and A. O. Hero, "On local intrinsic dimension estimation and its applications," IEEE Transactions on Signal Processing, vol. 58, no. 2, pp. 650–663, 2010.

[12] L. Liao, Y. Zhang, S. J. Maybank, and Z. Liu, "Intrinsic dimension estimation via nearest constrained subspace classifier," Pattern Recognition, vol. 47, no. 3, pp. 1485–1493, 2014.

[13] R. Heylen and P. Scheunders, "Hyperspectral intrinsic dimensionality estimation with nearest-neighbor distance ratios," IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 6, no. 2, pp. 570–579, 2013.

[14] S. Yin, S. X. Ding, X. Xie, and H. Luo, "A review on basic data-driven approaches for industrial process monitoring," IEEE Transactions on Industrial Electronics, vol. 61, no. 11, pp. 6418–6428, 2014.

[15] S. Ding, S. Yin, K. Peng, H. Hao, and B. Shen, "A novel scheme for key performance indicator prediction and diagnosis with application to an industrial hot strip mill," IEEE Transactions on Industrial Informatics, vol. 9, no. 4, pp. 2239–2247, 2012.

[16] F. Camastra, "Data dimensionality estimation methods: a survey," Pattern Recognition, vol. 36, no. 12, pp. 2945–2954, 2003.

[17] J. C. Harsanyi and C.-I. Chang, "Hyperspectral image classification and dimensionality reduction: an orthogonal subspace projection approach," IEEE Transactions on Geoscience and Remote Sensing, vol. 32, no. 4, pp. 779–785, 1994.

[18] E. Levina and P. J. Bickel, "Maximum likelihood estimation of intrinsic dimension," in Proceedings of the Annual Conference on Neural Information Processing Systems (NIPS '04), pp. 1092–1106, Vancouver, Canada, 2004.

[19] J. He, L. Ding, L. Jiang, Z. Li, and Q. Hu, "Intrinsic dimensionality estimation based on manifold assumption," Journal of Visual Communication and Image Representation, vol. 25, no. 5, pp. 740–747, 2014.

[20] F. Camastra and A. Vinciarelli, "Estimating the intrinsic dimension of data with a fractal-based method," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 10, pp. 1404–1407, 2002.

[21] M. Sadikin and I. Wasito, "Fractal dimension as a data dimensionality reduction method for anomaly detection in time series," in Proceedings of the 7th International Conference on Information & Communication Technologies (ICT '13), vol. 1, May 2013.

[22] Z. Feng and X. Sun, "Box-counting dimensions of fractal interpolation surfaces derived from fractal interpolation functions," Journal of Mathematical Analysis and Applications, vol. 412, no. 1, pp. 416–425, 2014.

[23] D. Sankar and T. Thomas, "Fractal features based on differential box counting method for the categorization of digital mammograms," International Journal of Computer Information System and Industrial Management Applications, vol. 2, pp. 11–19, 2010.

[24] Y.-C. Tzeng, K.-T. Fan, and K.-S. Chen, "A parallel differential box-counting algorithm applied to hyperspectral image classification," IEEE Geoscience and Remote Sensing Letters, vol. 9, no. 2, pp. 272–276, 2012.

[25] A. Yarlagadda, J. V. R. Murthy, and M. H. M. Krishna prasad, "Estimating correlation dimension using multi layered grid and damped window model over data streams," Procedia Technology, vol. 10, pp. 797–804, 2013.

[26] A. R. Osborne and A. Provenzale, "Finite correlation dimension for stochastic systems with power-law spectra," Physica D: Nonlinear Phenomena, vol. 35, no. 3, pp. 357–381, 1989.

[27] M. Fan, X. Zhang, S. Chen, H. Bao, and S. Maybank, "Dimension estimation of image manifolds by minimal cover approximation," Neurocomputing, vol. 105, no. 1, pp. 19–29, 2013.

[28] V. Pestov, "An axiomatic approach to intrinsic dimension of a dataset," Neural Networks, vol. 21, no. 2-3, pp. 204–213, 2008.

[29] B. Kegl, "Intrinsic dimension estimation using packing numbers," in Proceedings of the 16th Annual Neural Information Processing Systems Conference (NIPS '02), pp. 681–688, December 2002.

[30] K. Johnsson, Manifold dimension estimation for omics data analysis: current methods and a novel approach [M.S. thesis], Lund University, 2011.

[31] J. Theiler, "Estimating fractal dimension," Journal of the Optical Society of America A, vol. 7, no. 6, pp. 1055–1073, 1990.

[32] D. Schleicher, "Hausdorff dimension, its properties, and its surprises," The American Mathematical Monthly, vol. 114, no. 6, pp. 509–528, 2007.

[33] D. Mo and S. H. Huang, "Fractal-based intrinsic dimension estimation and its application in dimensionality reduction," IEEE Transactions on Knowledge and Data Engineering, vol. 24, no. 1, pp. 59–71, 2012.

[34] P. Grassberger and I. Procaccia, "Measuring the strangeness of strange attractors," in The Theory of Chaotic Attractors, pp. 170–189, Springer, New York, NY, USA, 2004.


8 Discrete Dynamics in Nature and Society

Table 2: Intrinsic dimension estimation of real datasets with different methods.

Dataset               Data dim.  Sample size  Improved corr. dim.  Corr. dim.  MLE     GMST
Laser generated data  2.06       1000         2.1027               1.9379      2.7124  1.9842
Ikeda map             1.7        2000         1.6889               1.6348      1.8010  1.8082
Henon map             1.3        2000         1.3584               1.1989      1.5206  1.1962

Figure 9: Dataset of laser generated data.

Figure 10: Dataset of Ikeda map (axes: Re(Z), Im(Z)).

Figure 11: Dataset of Henon map (axes: x, y).

as $m > 2D + 1$, where $m$ is the embedding dimension and $D$ denotes the intrinsic dimension of the attractor, it is nontrivial to choose the embedding parameters. If the product $(m - 1)\tau$ is too large, the reconstructed vectors are effectively decorrelated in phase space, which leads to a larger dimension estimate. When the product $(m - 1)\tau$ is too small, the reconstructed vectors become effectively redundant, which leads to a smaller dimension estimate. In order to compare the results with [4], we select embedding dimension $m = 5$ and delay time $\tau = 10$. Furthermore, the dimension of the Ikeda map and the Henon map is estimated directly by the dimension estimation methods, which avoids selecting $m$ and $\tau$. From Figures 10 and 11, we note that the thinner the attractor, the lower its dimension. The results are listed in Table 2, from which we can infer that the WCD method is also effective on the real datasets.
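The role of the embedding parameters can be illustrated with a minimal delay-embedding sketch. The function name `delay_embed` and the forward-delay convention (each vector collects the samples at $i, i + \tau, \ldots, i + (m - 1)\tau$) are assumptions made for illustration; the paper does not state which convention was used.

```python
def delay_embed(series, m=5, tau=10):
    """Build delay vectors [x_i, x_(i+tau), ..., x_(i+(m-1)*tau)] from a scalar series.

    m is the embedding dimension and tau the delay time; the defaults are the
    values quoted in the text. The forward-delay layout is an assumption.
    """
    n_vectors = len(series) - (m - 1) * tau
    if n_vectors <= 0:
        raise ValueError("series too short for the requested m and tau")
    return [[series[i + k * tau] for k in range(m)] for i in range(n_vectors)]


# A series of 1000 samples (the sample size listed for the laser data)
# gives 1000 - (5 - 1) * 10 = 960 vectors of dimension 5.
vectors = delay_embed(list(range(1000)))
```

Each vector spans $(m - 1)\tau = 40$ samples of the original series, which is the window length whose size the paragraph above warns about.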

6. Conclusion

When the distribution of a dataset is nonuniform, the CD method for intrinsic dimension estimation suffers from large bias. To address this issue, the WCD method has been proposed, with an optimized weight vector determined by the vertex degree. The factors influencing the WCD method have also been comprehensively analyzed, including the sample size, the selection of the linear portion of the log-log sequence, the number of local bulks used for the correlation integral $C(n, \delta)$, and the selection of scales. The WCD method is validated by experiments on synthetic datasets and real-world datasets.

Compared with the CD method, the main drawback of the WCD method is that the computation slows down when many local bulks $\beta_X(\delta)$ are calculated. However, the experiments indicate that it is unnecessary to calculate all the local bulks of the dataset; using only a few points in the high-density area of the dataset also yields a good result. From the above experiments, it can be seen that the computational complexity of the WCD method is almost the same as that of the CD method when fewer than 3500 local bulks are used. Moreover, the density estimation of a dataset by vertex degree is only applicable to a single distribution; when the dataset follows multiple distributions, the WCD method will fail, which should be further studied.
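The idea of evaluating the correlation integral only at points from the dense part of the dataset can be sketched as follows. This is not the WCD weighting itself. It is a plain correlation-integral estimate restricted to a subset of reference points, and the selection rule (the `k` points of highest degree in an undirected graph linking points closer than `eps`), the function names, and the fit over all supplied scales are assumptions made for illustration. In particular, the k-means selection of the linear portion described in the paper is not reproduced.

```python
import math


def vertex_degrees(points, eps):
    """Degree of every point in the undirected graph linking points within eps."""
    n = len(points)
    degrees = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(points[i], points[j]) <= eps:
                degrees[i] += 1
                degrees[j] += 1
    return degrees


def densest_points(points, eps, k):
    """Indices of the k points with the highest vertex degree (assumed rule)."""
    degrees = vertex_degrees(points, eps)
    order = sorted(range(len(points)), key=lambda i: -degrees[i])
    return order[:k]


def correlation_integral(points, refs, delta):
    """Fraction of (reference, other point) pairs closer than delta."""
    close = 0
    for i in refs:
        for j, q in enumerate(points):
            if j != i and math.dist(points[i], q) < delta:
                close += 1
    return close / (len(refs) * (len(points) - 1))


def log_log_slope(points, refs, deltas):
    """Least-squares slope of log C(delta) against log delta."""
    xs, ys = [], []
    for delta in deltas:
        c = correlation_integral(points, refs, delta)
        if c > 0:  # log is undefined for empty neighbourhoods
            xs.append(math.log(delta))
            ys.append(math.log(c))
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


# Toy data: 500 points on a straight segment in the plane (intrinsic dimension 1).
points = [(i / 499, 2 * i / 499) for i in range(500)]
refs = densest_points(points, eps=0.1, k=100)
estimate = log_log_slope(points, refs, deltas=[0.05, 0.1, 0.2, 0.4])
```

With 100 reference points out of 500, the distance loop costs one fifth of the full pairwise computation, while the slope on this toy segment stays close to 1. Computing the degrees still requires one full pairwise pass in this sketch.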


Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

References

[1] C. K. Loo, A. Samraj, and G. C. Lee, “Evaluation of methods for estimating fractal dimension in motor imagery-based brain computer interface,” Discrete Dynamics in Nature and Society, vol. 2011, Article ID 724697, 8 pages, 2011.

[2] A. T. B. Jin, P. Y. Han, and L. H. Siong, “Eigenvector weighting function in face recognition,” Discrete Dynamics in Nature and Society, vol. 2011, Article ID 521935, 15 pages, 2011.

[3] S. Wang, T. Shi, M. Zeng, L. Zhang, F. E. Alsaadi, and T. Hayat, “New results on robust finite-time boundedness of uncertain switched neural networks with time-varying delays,” Neurocomputing, vol. 151, part 1, pp. 522–530, 2015.

[4] F. Camastra and M. Filippone, “A comparative evaluation of nonlinear dynamics methods for time series prediction,” Neural Computing and Applications, vol. 18, no. 8, pp. 1021–1029, 2009.

[5] J. Luo, G. Li, and H. Liu, “Linear control of fractional-order financial chaotic systems with input saturation,” Discrete Dynamics in Nature and Society, vol. 2014, Article ID 802429, 8 pages, 2014.

[6] K. M. Carter, R. Raich, and I. Hero, “On local intrinsic dimension estimation and its applications,” IEEE Transactions on Signal Processing, vol. 58, no. 2, pp. 650–663, 2010.

[7] S. Yin, G. Wang, and X. Yang, “Robust PLS approach for KPI-related prediction and diagnosis against outliers and missing data,” International Journal of Systems Science, vol. 45, no. 7, pp. 1375–1382, 2014.

[8] L. M. Elshenawy, S. Yin, A. S. Naik, and S. X. Ding, “Efficient recursive principal component analysis algorithms for process monitoring,” Industrial and Engineering Chemistry Research, vol. 49, no. 1, pp. 252–259, 2010.

[9] E. E. Abusham and E. K. Wong, “Locally linear discriminate embedding for face recognition,” Discrete Dynamics in Nature and Society, vol. 2009, Article ID 916382, 8 pages, 2009.

[10] S. Samudrala, K. Rajanr, and B. Ganapathysubramanian, “Data dimensionality reduction in materials science,” in Informatics for Materials Science and Engineering: Data-Driven Discovery for Accelerated Experimentation and Application, vol. 1, pp. 97–98, Elsevier Science, 2013.

[11] K. M. Carter, R. Raich, and A. O. Hero, “On local intrinsic dimension estimation and its applications,” IEEE Transactions on Signal Processing, vol. 58, no. 2, pp. 650–663, 2010.

[12] L. Liao, Y. Zhang, S. J. Maybank, and Z. Liu, “Intrinsic dimension estimation via nearest constrained subspace classifier,” Pattern Recognition, vol. 47, no. 3, pp. 1485–1493, 2014.

[13] R. Heylen and P. Scheunders, “Hyperspectral intrinsic dimensionality estimation with nearest-neighbor distance ratios,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 6, no. 2, pp. 570–579, 2013.

[14] S. Yin, S. X. Ding, X. Xie, and H. Luo, “A review on basic data-driven approaches for industrial process monitoring,” IEEE Transactions on Industrial Electronics, vol. 61, no. 11, pp. 6418–6428, 2014.

[15] S. Ding, S. Yin, K. Peng, H. Hao, and B. Shen, “A novel scheme for key performance indicator prediction and diagnosis with application to an industrial hot strip mill,” IEEE Transactions on Industrial Informatics, vol. 9, no. 4, pp. 2239–2247, 2012.

[16] F. Camastra, “Data dimensionality estimation methods: a survey,” Pattern Recognition, vol. 36, no. 12, pp. 2945–2954, 2003.

[17] J. C. Harsanyi and C.-I. Chang, “Hyperspectral image classification and dimensionality reduction: an orthogonal subspace projection approach,” IEEE Transactions on Geoscience and Remote Sensing, vol. 32, no. 4, pp. 779–785, 1994.

[18] E. Levina and P. J. Bickel, “Maximum likelihood estimation of intrinsic dimension,” in Proceedings of the Annual Conference on Neural Information Processing Systems (NIPS ’04), pp. 1092–1106, Vancouver, Canada, 2004.

[19] J. He, L. Ding, L. Jiang, Z. Li, and Q. Hu, “Intrinsic dimensionality estimation based on manifold assumption,” Journal of Visual Communication and Image Representation, vol. 25, no. 5, pp. 740–747, 2014.

[20] F. Camastra and A. Vinciarelli, “Estimating the intrinsic dimension of data with a fractal-based method,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 10, pp. 1404–1407, 2002.

[21] M. Sadikin and I. Wasito, “Fractal dimension as a data dimensionality reduction method for anomaly detection in time series,” in Proceedings of the 7th International Conference on Information & Communication Technologies (ICT ’13), vol. 1, May 2013.

[22] Z. Feng and X. Sun, “Box-counting dimensions of fractal interpolation surfaces derived from fractal interpolation functions,” Journal of Mathematical Analysis and Applications, vol. 412, no. 1, pp. 416–425, 2014.

[23] D. Sankar and T. Thomas, “Fractal features based on differential box counting method for the categorization of digital mammograms,” International Journal of Computer Information System and Industrial Management Applications, vol. 2, pp. 11–19, 2010.

[24] Y.-C. Tzeng, K.-T. Fan, and K.-S. Chen, “A parallel differential box-counting algorithm applied to hyperspectral image classification,” IEEE Geoscience and Remote Sensing Letters, vol. 9, no. 2, pp. 272–276, 2012.

[25] A. Yarlagadda, J. V. R. Murthy, and M. H. M. Krishna prasad, “Estimating correlation dimension using multi layered grid and damped window model over data streams,” Procedia Technology, vol. 10, pp. 797–804, 2013.

[26] A. R. Osborne and A. Provenzale, “Finite correlation dimension for stochastic systems with power-law spectra,” Physica D: Nonlinear Phenomena, vol. 35, no. 3, pp. 357–381, 1989.

[27] M. Fan, X. Zhang, S. Chen, H. Bao, and S. Maybank, “Dimension estimation of image manifolds by minimal cover approximation,” Neurocomputing, vol. 105, no. 1, pp. 19–29, 2013.

[28] V. Pestov, “An axiomatic approach to intrinsic dimension of a dataset,” Neural Networks, vol. 21, no. 2-3, pp. 204–213, 2008.

[29] B. Kegl, “Intrinsic dimension estimation using packing numbers,” in Proceedings of the 16th Annual Neural Information Processing Systems Conference (NIPS ’02), pp. 681–688, December 2002.

[30] K. Johnsson, Manifold dimension estimation for omics data analysis: current methods and a novel approach [M.S. thesis], Lund University, 2011.

[31] J. Theiler, “Estimating fractal dimension,” Journal of the Optical Society of America A, vol. 7, no. 6, pp. 1055–1073, 1990.

[32] D. Schleicher, “Hausdorff dimension, its properties, and its surprises,” The American Mathematical Monthly, vol. 114, no. 6, pp. 509–528, 2007.

[33] D. Mo and S. H. Huang, “Fractal-based intrinsic dimension estimation and its application in dimensionality reduction,” IEEE Transactions on Knowledge and Data Engineering, vol. 24, no. 1, pp. 59–71, 2012.

[34] P. Grassberger and I. Procaccia, “Measuring the strangeness of strange attractors,” in The Theory of Chaotic Attractors, pp. 170–189, Springer, New York, NY, USA, 2004.
