


Information Sciences 178 (2008) 3943–3956

Contents lists available at ScienceDirect

Information Sciences

journal homepage: www.elsevier.com/locate/ins

Supervised classification of share price trends

Zhanggui Zeng a,*, Hong Yan a,b

a School of Electrical and Information Engineering, The University of Sydney, NSW 2006, Australia
b Department of Electronic Engineering, City University of Hong Kong, 83 Tat Chee Avenue, Kowloon, Hong Kong


Article history: Received 16 March 2006; received in revised form 4 March 2008; accepted 8 June 2008

Keywords: Singular spectrum analysis; Share price data analysis; Clustering algorithms; Supervised pattern classification; Naïve Bayesian classifier

0020-0255/$ - see front matter © 2008 Elsevier Inc. All rights reserved. doi:10.1016/j.ins.2008.06.002

* Corresponding author. Tel.: +61 2 9351 7070; fax: +61 2 9351 3847. E-mail address: [email protected] (Z. Zeng).

Share price trends can be recognized by using data clustering methods. However, the accuracy of these methods may be rather low. This paper presents a novel supervised classification scheme for the recognition and prediction of share price trends. We first produce a smooth time series using zero-phase filtering and singular spectrum analysis from the original share price data. We train pattern classifiers using the classification results of both original and filtered time series and then use these classifiers to predict the future share price trends. Experiment results obtained from both synthetic data and real share prices show that the proposed method is effective and outperforms the well-known K-means clustering algorithm.

© 2008 Elsevier Inc. All rights reserved.

1. Introduction

Share price analysis forms a subject of time series analysis in general [18]. One aim of share price analysis is to find the true price trends or predict the future price changes. The trends of a time series can be determined using many methods, such as filtering, decomposition, modeling, segmentation, and pattern classification [8,12,23]. In existing methods, time series classification is usually carried out using a clustering algorithm because the number of classes is often unknown. However, clustering is typically a subjective process and can be highly problematic [2,6]. There are several differences between supervised classification and clustering procedures:

(1) A classifier is built in supervised classification using training samples that are already classified, while no such samples are available in clustering.

(2) Clustering involves partitioning all input data samples into different groups, whereas a classifier assigns a class label to each input sample [6].

(3) In clustering, the key parameter, the number of classes, may need to be specified subjectively. The same set of data items often needs to be clustered differently for different applications. These problems often do not exist in supervised classification.

This paper proposes a supervised classification scheme for the recognition and prediction of share price trends. Zero-phase filters and singular spectrum analysis (SSA) are employed to transform the original share price data into a smooth time series, and then the trends are classified by a simple labeling process. By associating the classified smooth time series with the original data, the trends in the share price time series are classified. These classified trends are then used as training




samples to build pattern classifiers. Any classifiers can be used in the proposed scheme. Without loss of generality, a Naïve Bayesian classifier is used in this research. After the classifier is built, it can then classify unknown patterns and predict future trends in the share price time series. We also consider different time scales to improve the classification results. The proposed transformation from clustering to supervised classification is a novel approach. Experiments have produced convincing results and confirm the effectiveness of our method.

This paper is organized as follows: Section 2 introduces two approaches to smooth time series. Section 3 presents a technique for classifying the original and filtered time series, and a method for training a Naïve Bayesian classifier. Section 4 describes a Naïve Bayesian predictor. Experimental results obtained using the proposed method and other competitive methods are provided and compared in Section 5. Conclusions are drawn in Section 6.

2. Time series filtering

Suppose that complicated pattern distributions can be converted to simple ones that can be classified or labeled directly. Then we can carry out a coarse classification for the input data. Using the classified data as training samples, we can train a classifier to perform a finer and more accurate classification. For a complicated share price time series, such simple distributions can be defined as supervisory time series or super samples [21]. A supervisory time series is the key to transforming a clustering process into a series of supervised classifications. It guides the training of a classifier and improves the classification accuracy. The supervisory time series should have the following features:

(1) Similarity of overall price movement to the original time series.
(2) Reduction of noise.
(3) Ease of classification and prediction.

Feature 1 implies that a supervisory time series may be produced from the original time series. We can filter the original time series to produce a smoothed series in order to achieve Feature 2. The filtered time series should have a zero-phase difference from the original time series according to Feature 1. Therefore, a zero-phase filter can be used to convert a time series to a supervisory time series. In fact, the supervisory time series is also the principal component of the original time series. Hence, principal component analysis (PCA) and singular spectrum analysis (SSA) can be used to develop the supervisory time series from the original time series [14].

2.1. Zero-phase filtering

A zero-phase filter is a special case of a linear-phase filter in which the phase slope is a = 0 [19]. The impulse response h(t) of a zero-phase filter is even, i.e., h(t) = h(−t), where t denotes the time variable. To be even, it must be symmetric about time 0. In many "off-line" applications, such as filtering recorded audio data, zero-phase filters are usually preferred.

For simplicity, we use the uniform zero-phase filter with an impulse response of

$$h(t) = \begin{cases} 1 & \text{if } -N \le t \le N, \\ 0 & \text{otherwise}. \end{cases} \qquad (1)$$

The order 2N + 1 is taken as the time scale s of the zero-phase filter. Note that the zero-phase filter needs N samples before and after the current time t. The filtered time series is only available over the time interval N < t < L − N, where L is the length of the time series.

2.2. Cascaded uniform zero-phase filters

The output y(t), t = N + 1, ..., L − N, of one uniform zero-phase filter is given by the convolution of h(t) with the input time series x(t), t = 1, ..., L, i.e.,

$$y(t) = x(t) * h(t) = \frac{1}{2N+1} \sum_{k=-N}^{N} x(t+k). \qquad (2)$$

This filter can remove high frequency noise and preserve the phase of x(t). However, its performance is not good enough to obtain a very smooth supervisory time series [7]. One solution to improve the smoothness of the supervisory time series is to introduce cascaded uniform zero-phase filters. Cascaded uniform zero-phase filters are a group of uniform zero-phase filters in series with each other. Suppose that there are m uniform zero-phase filters in a cascade. Its impulse response is

$$h_m(t) = \underbrace{h(t) * h(t) * \cdots * h(t)}_{m}. \qquad (3)$$

A cascaded uniform filter can also be expressed as a single uniform filter with the feedback of the filtered signal. Like a Gaussian filter, the two-dimensional uniform filter is also separable in each dimension [1,16,20]. It can be used for image segmentation or edge detection [4].



Once the filtering scale is selected, the number of the cascaded uniform filters m can be determined by the roughness of the filtered time series. We define a filtering loss as the sum of the squares of the residuals

$$e(m) = \sum_{t=N+1}^{L-N} \left[ y_m(t) - x(t) \right]^2, \qquad (4)$$

where y_m(t) is the filtered time series. The filtering loss usually increases with the increase of the number of uniform zero-phase filters. In addition, we define the roughness of y_m(t) as

$$g(m) = \frac{e(m) - e(m-1)}{e(m) + e(m-1)}. \qquad (5)$$

When g(m) is small, the filtered time series is smooth. For a time series with given length, its roughness converges to zero when the cascade order m increases.

In practice, a roughness threshold ε is set to a pre-specified value, e.g., ε = 0.5%. When the roughness g(m) is greater than the threshold ε, the filtered time series y_m(t) is fed back into the uniform filter. When the roughness is less than the threshold, the filtering is stopped.
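The cascaded filtering loop with this stopping rule can be sketched in Python. This is a minimal illustrative sketch, not the authors' code; in particular, edge samples are padded by repetition so the output keeps the input length, whereas the paper simply restricts attention to the interval N < t < L − N.

```python
def uniform_zero_phase_filter(x, N):
    # Eq. (2): y(t) = (1/(2N+1)) * sum of x(t+k) for k = -N..N.
    # Edge samples are padded by repetition so the output keeps the input
    # length (an assumption; the paper restricts t to N < t < L - N instead).
    L, w = len(x), 2 * N + 1
    padded = [x[0]] * N + list(x) + [x[-1]] * N
    return [sum(padded[t:t + w]) / w for t in range(L)]

def filtering_loss(y, x):
    # Eq. (4): sum of squared residuals between filtered and original series.
    return sum((yi - xi) ** 2 for yi, xi in zip(y, x))

def cascaded_filter(x, N, eps=0.005):
    # Section 2.2: feed the output back through the same uniform filter
    # until the roughness g(m) of Eq. (5) drops below the threshold eps.
    y = uniform_zero_phase_filter(x, N)
    e_prev, m = filtering_loss(y, x), 1
    while True:
        y_next = uniform_zero_phase_filter(y, N)
        e_m, m = filtering_loss(y_next, x), m + 1
        # roughness g(m) = (e(m) - e(m-1)) / (e(m) + e(m-1)), Eq. (5)
        if e_m + e_prev == 0 or (e_m - e_prev) / (e_m + e_prev) < eps:
            return y_next, m
        y, e_prev = y_next, e_m
```

The cascade order m returned here is the number of passes actually applied before the roughness fell below ε.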

2.3. Multiscale time series filtering

By varying the filtering scale s, a series of smooth supervisory time series can be produced. In terms of the frequency and phase response of the zero-phase filter, the number of waves of the filtered time series decreases and the wavelengths become wider as the filtering scale increases. Fig. 1 shows an example of the number of waves. It can be seen from Fig. 1 that some "stable numbers" are unchanged after several consecutive increments in the filtering scale. When the scale is large enough, the filtered time series becomes a horizontal line with a wave number of zero. The classification of share price trends is related to the shape of the filtered time series. Therefore, the classifications of two filtered time series with the same number of waves should have similar results. These consecutive scales that correspond to a stable number of waves can be represented by one effective scale. Usually, the number of effective filtering scales of a time series is much less than its length L. When a time series has finite filtering scales, the pattern clustering process can be transformed into a series of finite supervised classifications.

Two examples of multiscale zero-phase filtering are illustrated in Figs. 2 and 3. The filtering roughness threshold is set to ε = 0.5%. Note that the time series on filtering scale s = 0 are the original series, and some filtered time series have been shifted in the vertical direction in order to reveal their details. As shown in Figs. 2 and 3, the multiscale supervisory time series are very smooth and synchronized with the original time series. Fig. 4 shows the filtering loss function relative to the share price data of QAN. The filtering loss increases substantially during the period from scale 3 to 50, and then increases gradually to a constant value. The final filtered time series is a horizontal line with a wave number of zero.

[Figure: wave number (y-axis) vs. filtering scale (x-axis) for QAN.]

Fig. 1. The number of waves of zero-phase filtered time series for the share prices of QAN. The stable numbers of waves are 11, 5 and 3.


[Figure: filtered time series of QAN (price vs. time in days), scales s = 0, 3, 5, 7, 9, 11.]

Fig. 2. Multiscale zero-phase filtering of QAN data. Some filtered time series have been shifted in the vertical direction in order to reveal their details. All filtered time series are synchronized with the original time series, while the number of waves of the filtered time series decreases as the filtering scale increases.

[Figure: filtered time series of CBA (price vs. time in days), scales s = 0, 3, 5, 7, 9, 11.]

Fig. 3. Multiscale zero-phase filtering of CBA data. Some filtered time series have been shifted in the vertical direction. All filtered time series are synchronized with the original time series, while the number of waves of the filtered time series decreases as the filtering scale increases.


2.4. Singular spectrum analysis (SSA)

SSA was developed for the extraction of periodic or quasi-periodic components from a time series. The basic concept of the SSA method is to transform univariate time series into multivariate time series using time delays. Then, PCA is used to extract the principal components based on the distribution of the multivariate time series [10]. Finally, a noise-reduced and phase-preserved time series, i.e., a supervisory time series, can be reconstructed by the chosen principal components. Interested readers are referred to [14,15,17] for detailed information on the SSA method.

[Figure: filtering loss (y-axis) vs. time scale in days (x-axis) for QAN.]

Fig. 4. Filtering loss versus filtering scale. Filtering loss increases substantially and then increases gradually to a constant as the scale changes from 3 to 50.

Fig. 5 shows the number of waves of the SSA filtered time series for the share price data of QAN. The number of waves decreases significantly during the period from scale 3 to 12, and then decreases gradually to a constant. Two examples of SSA filtering are shown in Figs. 6 and 7. Note that the time series on the filtering scale s = 0 are the original ones, and some filtered time series have been shifted in the vertical direction in order to reveal their details. In order to obtain a smooth filtered time series, only the first principal component is used for the reconstruction of the supervisory time series. As shown in Figs. 6 and 7, the multiscale supervisory time series are very smooth and are synchronized with the original time series.
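The SSA reconstruction described above can be sketched as follows. This is an illustrative pure-Python version, not the authors' implementation: the dominant eigenvector of the lag-covariance matrix is found by power iteration rather than a full SVD, and, as in the experiments, only the first principal component is kept.

```python
import math

def ssa_first_component(x, s, iters=200):
    # SSA sketch (Section 2.4): embed x with window s, find the first
    # principal direction, and reconstruct by anti-diagonal averaging.
    L = len(x)
    K = L - s + 1
    X = [x[i:i + s] for i in range(K)]          # trajectory matrix, K x s
    # lag-covariance matrix C = X^T X (s x s)
    C = [[sum(X[i][a] * X[i][b] for i in range(K)) for b in range(s)]
         for a in range(s)]
    v = [1.0] * s
    for _ in range(iters):                      # power iteration for the
        w = [sum(C[a][b] * v[b] for b in range(s)) for a in range(s)]
        norm = math.sqrt(sum(c * c for c in w)) # dominant eigenvector of C
        v = [c / norm for c in w]
    # rank-1 elementary matrix (X v) v^T, then diagonal averaging back
    # to a series of the original length
    p = [sum(X[i][a] * v[a] for a in range(s)) for i in range(K)]
    recon, counts = [0.0] * L, [0] * L
    for i in range(K):
        for a in range(s):
            recon[i + a] += p[i] * v[a]
            counts[i + a] += 1
    return [r / c for r, c in zip(recon, counts)]
```

Because only the first component is used, the reconstruction is smooth and phase-preserving, which is exactly what the supervisory time series requires.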

3. Classification of share price trends

One task of share price analysis is to determine the true trends from the noisy signals. Filtering is an effective method to discover the trends in consecutive time points. However, this approach has weaknesses, such as phase delay, amplitude loss, etc. Although a zero-phase filter has no phase delay, it needs N prior samples, and it cannot determine the true current trends. Pattern classification searches for trends using many more historic samples than conventional filters. Self-learning pattern classifiers can overcome the disadvantages of these filters.

[Figure: wave number (y-axis) vs. filtering scale (x-axis) for SSA filtering of QAN.]

Fig. 5. The number of waves of SSA filtered time series for QAN data. The number of waves decreases significantly and then decreases gradually to a constant as the scale changes from 3 to 12.


[Figure: SSA filtered time series of QAN (price vs. time in days), scales s = 0, 12, 21, 30, 39.]

Fig. 6. Multiscale SSA filtering of QAN data. The time series on the filtering scale s = 0 is the original one. Some filtered time series have been shifted in the vertical direction. The multiscale supervisory time series are very smooth and are synchronized with the original time series.

[Figure: SSA filtered time series of CBA (price vs. time in days), scales s = 0, 12, 21, 30, 39.]

Fig. 7. Multiscale SSA filtering of CBA data. The time series on filtering scale s = 0 is the original one. Some filtered time series have been shifted in the vertical direction. The multiscale supervisory time series are very smooth and are synchronized with the original time series.


3.1. Class labeling

For a time series, we can define a pattern space with a dimension d. The patterns are the relative lag vectors from the time series x(t),

$$Z(t) = [z(t-d+1), \ldots, z(t-k+1), \ldots, z(t)], \qquad (6)$$

where $z(t-k+1) = \frac{x(t-k+1) - \bar{x}(t)}{\bar{x}(t)}$, $\bar{x}(t) = \frac{1}{d}\sum_{k=1}^{d} x(t-k+1)$, 1 ≤ k ≤ d, and d + N ≤ t ≤ L − N.
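The relative lag vector of Eq. (6) can be computed in a few lines of Python. This is a sketch; the function name is ours, and t is treated as a 0-based list index.

```python
def lag_pattern(x, t, d):
    # Eq. (6): relative lag vector Z(t) built from the last d prices
    # x(t-d+1), ..., x(t), each expressed as a relative deviation from
    # the window mean.
    window = x[t - d + 1:t + 1]
    xbar = sum(window) / d          # window mean, Eq. (6) where-clause
    return [(v - xbar) / xbar for v in window]
```

Normalizing by the window mean makes the pattern scale-free, so patterns from different price levels are comparable.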



In addition, we define a group of patterns Q(t) = [y_m(t − d + 1), ..., y_m(t)], d + N ≤ t ≤ L − N, from the supervisory time series y(t) of x(t). Five classes of trends can be defined for the group of patterns Q(t) according to the features of the patterns:

(1) "Peak" class or Class 1: patterns or vectors which include a local maximum.
(2) "Valley" class or Class 2: patterns which have a local minimum.
(3) "Up" class or Class 3: patterns Q(t) with y_m(t) ≥ y_m(t − 1) ≥ ··· ≥ y_m(t − d + 1).
(4) "Down" class or Class 4: patterns Q(t) with y_m(t) ≤ y_m(t − 1) ≤ ··· ≤ y_m(t − d + 1).
(5) "Oscillation" class or Class 5: patterns which include one or more peaks and valleys.

Obviously, the definitions of pattern classes may be different for different analysis requirements. However, the following analysis method is still effective, whatever the definitions of the pattern classes are.

With these definitions, each pattern of Q(t) can be labeled with a defined class. Patterns Z(t) can be classified by a simple principle: each pattern of Z(t) belongs to the same class as the corresponding pattern Q(t). This principle is reasonable because the filtered and original time series have synchronized trends. It has been proven to be correct in our experiments. Using this principle, the filtered time series are able to supervise the classification of the original time series. This is the reason that the filtered time series y(t) is referred to as super samples or the supervisory time series of the original time series x(t).
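The five labeling rules can be sketched as a small Python function; each original pattern Z(t) then simply inherits the label of the corresponding Q(t). The tie-breaking for flat, non-monotone windows is our assumption, since the paper does not specify it.

```python
def label_window(q):
    # Assign one of the five trend classes (Section 3.1) to a window
    # q = [y_m(t-d+1), ..., y_m(t)] of the supervisory time series.
    d = len(q)
    if all(q[i] >= q[i - 1] for i in range(1, d)):
        return 3                                   # "Up"
    if all(q[i] <= q[i - 1] for i in range(1, d)):
        return 4                                   # "Down"
    # count strict interior local extrema
    peaks = sum(1 for i in range(1, d - 1) if q[i - 1] < q[i] > q[i + 1])
    valleys = sum(1 for i in range(1, d - 1) if q[i - 1] > q[i] < q[i + 1])
    if peaks and valleys:
        return 5                                   # "Oscillation"
    if peaks:
        return 1                                   # "Peak"
    # "Valley"; also our fallback for flat, non-monotone windows
    return 2
```

Because the supervisory series is very smooth, most windows fall cleanly into one of the five cases.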

3.2. Training the Naïve Bayesian classifier

When classified patterns are given, they can be used as training samples to build a classifier. Without loss of generality, the Naïve Bayesian classifier is adopted in this research. Bayesian classification is a probabilistic method [9], and it is optimal in the sense that it minimizes the expected cost of misclassification [22]. The Naïve Bayesian classifier is a simplified Bayesian classifier [5], and it has been used successfully in many applications [11,13]. Let C_j, 1 ≤ j ≤ 5, denote the classes of share price trends, and L_1 denote the end time of training. Classified patterns Z(t), d + N ≤ t ≤ L_1, L_1 < L, can be used to estimate the probabilities P(C_j), 1 ≤ j ≤ 5, and the probability distribution functions P(Z(t)|C_j), 1 ≤ j ≤ 5.

The probabilities P(C_j) can be estimated as follows:

$$P(C_j) = \frac{q_j}{q}, \qquad (7)$$

where q is the number of training samples and q_j is the number of these patterns from the class C_j, 1 ≤ j ≤ 5. The Naïve Bayesian model is based on the assumption that

$$P(Z(t) \mid C_j) = \prod_{k=1}^{d} P(z(t-k+1) \mid C_j). \qquad (8)$$

Because the share price is a continuous measurement, it is typically assumed to follow a Gaussian distribution, i.e.,

$$P(z(t-k+1) \mid C_j) = \frac{1}{\sqrt{2\pi}\,\sigma_{jk}} \, e^{-\frac{(z(t-k+1) - \mu_{jk})^2}{2\sigma_{jk}^2}}, \qquad (9)$$

where μ_jk is the mean of all z(t − k + 1) which belong to class C_j, and σ_jk is the standard deviation of the variable z(t − k + 1) for class C_j. The means can be estimated by

$$\mu_{jk} = \frac{1}{q_j} \sum_{n=1}^{q_j} z(t_n - k + 1), \qquad (10)$$

where z(t_n − k + 1) is the kth value of Z(t_n), which belongs to class j. The standard deviations can be estimated by

$$\sigma_{jk} = \sqrt{\frac{1}{q_j} \sum_{n=1}^{q_j} \big(z(t_n - k + 1) - \mu_{jk}\big)^2}. \qquad (11)$$

3.3. Testing the Naïve Bayesian classifier

The Naïve Bayesian classifier is trained over the time interval d + N ≤ t ≤ L_1, L_1 < L. Now, the classifier is tested over a different time interval, L_1 + 1 ≤ t ≤ L. Given an input pattern Z(t) = [z(t − d + 1), ..., z(t − k + 1), ..., z(t)], L_1 + 1 ≤ t ≤ L, the classifier assigns Z(t) to the class with the highest posterior probability conditioned on Z(t), i.e., if and only if P(C_i|Z(t)) > P(C_j|Z(t)) for all j ≠ i, is Z(t) assigned to class C_i. According to the Bayesian theorem,



$$P(C_j \mid Z(t)) = \frac{P(Z(t) \mid C_j)\, P(C_j)}{P(Z(t))}. \qquad (12)$$

Since P(Z(t)) is the same for all classes C_j, 1 ≤ j ≤ 5, the classification depends on P(Z(t)|C_j) P(C_j). From Section 3.2, the P(C_j), 1 ≤ j ≤ 5, and P(z(t − k + 1)|C_j) are known. The conditional probabilities P(Z(t)|C_j) can be determined from Eq. (8). Hence, the current pattern Z(t) should be classified to class i if P(Z(t)|C_i) P(C_i) > P(Z(t)|C_j) P(C_j) for all j ≠ i.

If the classification result is the same as the previous classification described in Section 3.1, then this is a correct classification. By repeating this classification at different times, the performance of the classifier can be assessed based on the number of correct classifications.

One disadvantage of the zero-phase filter is that it does not work for the last N time points of the original time series, i.e., it has a delay of N from the point of view of real time signal processing. Fortunately, the Naïve Bayesian classifier can classify these time points after training. The last N time points of the supervisory time series can be classified according to the following principle: each pattern Q(t) of the supervisory time series, L − N ≤ t ≤ L, belongs to the same class as the corresponding pattern Z(t) of the original time series, L − N ≤ t ≤ L. Therefore, the classifier overcomes the disadvantage of the zero-phase filter.
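The Gaussian Naïve Bayesian classifier of Sections 3.2 and 3.3 (Eqs. (7)-(12)) can be sketched compactly as follows. This is an illustrative implementation, not the authors' code; log-probabilities and a small variance floor are our additions for numerical safety.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(patterns, labels):
    # Eqs. (7), (10), (11): class priors, per-feature means and standard
    # deviations estimated from labeled training patterns Z(t).
    by_class = defaultdict(list)
    for z, c in zip(patterns, labels):
        by_class[c].append(z)
    q, model = len(patterns), {}
    for c, rows in by_class.items():
        qj, d = len(rows), len(rows[0])
        mu = [sum(r[k] for r in rows) / qj for k in range(d)]
        # floor on sigma avoids a zero-variance feature (our safeguard)
        sigma = [max(math.sqrt(sum((r[k] - mu[k]) ** 2 for r in rows) / qj), 1e-9)
                 for k in range(d)]
        model[c] = (qj / q, mu, sigma)
    return model

def classify(model, z):
    # Eqs. (8), (9), (12): pick the class maximizing P(Z|C_j) P(C_j);
    # log-probabilities are used for numerical stability.
    def log_post(prior, mu, sigma):
        lp = math.log(prior)
        for zk, m, s in zip(z, mu, sigma):
            lp += -math.log(math.sqrt(2 * math.pi) * s) - (zk - m) ** 2 / (2 * s * s)
        return lp
    return max(model, key=lambda c: log_post(*model[c]))
```

In the proposed scheme, `patterns` would be the vectors Z(t) over the training interval and `labels` the classes inherited from the supervisory series.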

4. Prediction of share price trends

The prediction of share price trends can now be realized by classifying the class patterns of the supervisory time series. After the class labeling described in Section 3.1, the supervisory time series is transformed into a class series g(t), e.g., g(t) = {..., 111444442223333, ...}. Let us define a class pattern G(t) = [g(t − d + 1), ..., g(t)], d + N ≤ t ≤ L. Obviously, the future class or trend of G(t) is g(t + 1) over the training time interval d + N ≤ t ≤ L_1, L_1 < L. Hence, we assign G(t) to the class g(t + 1), which has five possibilities defined in Section 3. These classified patterns can be taken as training samples for classifiers.

Given G(t) = [g(t − d + 1), ..., g(t)] in a test time interval L_1 + 1 ≤ t ≤ L, if and only if P(C_i|G(t)) > P(C_j|G(t)) for all j ≠ i, G(t) is assigned to class C_i, i.e., the future class g(t + 1) of G(t) is C_i. Again, according to the Bayesian theorem,

$$P(C_j \mid G(t)) = \frac{P(G(t) \mid C_j)\, P(C_j)}{P(G(t))}. \qquad (13)$$

Since P(G(t)) is the same for all classes C_j, 1 ≤ j ≤ 5, the classification can be made based on P(G(t)|C_j) P(C_j). Classified historical samples can be used to estimate the prior probabilities. Assume that the samples are unbiased; then

$$P(C_j) = \frac{q_j}{q}, \qquad (14)$$

where q is the number of training samples and q_j is the number of historical patterns G(t) which belong to class C_j, 1 ≤ j ≤ 5. The Naïve Bayesian model assumes that

$$P(G(t) \mid C_j) = \prod_{k=1}^{d} P(g(t-k+1) \mid C_j), \qquad (15)$$

$$P(g(t-k+1) \mid C_j) = \frac{q_{jk}}{q_j}, \qquad (16)$$

where q_jk is the number of patterns that belong to C_j and whose kth feature is equal to g(t − k + 1). By comparing this prediction result with the class labeling at time t + 1, we can determine whether the prediction is correct. By repeating the prediction at different time points, the performance of the Naïve Bayesian predictor can be evaluated based on the number of correct predictions.
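The categorical predictor of Eqs. (13)-(16) can be sketched as follows; the function names are ours, and the "class" of each training pattern G(t) is the next label g(t+1), as described above.

```python
from collections import Counter, defaultdict

def fit_class_predictor(g, d):
    # Eqs. (14)-(16): estimate P(C_j) and P(g(t-k+1) | C_j) from the
    # class series g(t); the target of pattern G(t) is g(t+1).
    priors = Counter()
    cond = defaultdict(Counter)   # (j, k) -> counts of feature values
    q = 0
    for t in range(d - 1, len(g) - 1):
        G = g[t - d + 1:t + 1]    # class pattern G(t)
        j = g[t + 1]              # its future class
        priors[j] += 1
        q += 1
        for k, feat in enumerate(G):
            cond[(j, k)][feat] += 1
    return priors, cond, q

def predict_next(G, priors, cond, q):
    # Eq. (13): choose the class maximizing P(G(t)|C_j) P(C_j).
    best, best_p = None, -1.0
    for j, qj in priors.items():
        p = qj / q
        for k, feat in enumerate(G):
            p *= cond[(j, k)][feat] / qj
        if p > best_p:
            best, best_p = j, p
    return best
```

On a strictly periodic class series the predictor recovers the next label exactly; on real data the counts in `cond` encode how often each trend follows each pattern of recent trends.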

5. Experiments

The proposed time series analysis method is composed of three components, i.e., the original time series, the supervisory time series and the classifiers. In order to verify our proposed method, experiments are carried out on the three components.

5.1. Experiments on synthetic signals

First, the method is tested for its sensitivity to noise and its response to signal complexity using synthetic signals with a variable noise amplitude and a variable number of components. Such signals can be given by

$$y(t) = \sum_{i=1}^{D} \sin\!\left(\frac{2\pi}{800}\, 2^{\,i-1}\, t\right) + \frac{a}{2}\, c(t),$$

where c(t) is random noise with unit variance, D is the number of sine components, and a represents the noise amplitude.
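A generator for such test signals can be sketched in Python. Note that reading the garbled printed formula as frequencies doubling with the component index i (the factor 2^(i-1)) is our interpretation, and the function name and default length are ours.

```python
import math
import random

def synthetic_signal(D, a, L=1000, seed=0):
    # Test signal from Section 5.1: a sum of D sinusoids plus scaled noise.
    # The doubling frequencies 2**(i-1) are our reading of the garbled
    # formula in the scanned source; c(t) is unit-variance Gaussian noise.
    rng = random.Random(seed)
    return [sum(math.sin(2 * math.pi / 800 * 2 ** (i - 1) * t)
                for i in range(1, D + 1))
            + (a / 2) * rng.gauss(0.0, 1.0)
            for t in range(1, L + 1)]
```

Setting a = 0 yields the noise-free case used in the first column of Tables 1-4.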



Without loss of generality, a zero-phase filter on the time scale s = 5 is used to produce the supervisory time series. A 5-dimensional Naïve Bayesian classifier, built using the training samples over the period from 5 to 800, classifies testing patterns over the period from 801 to 900, and predicts the pattern classes during the period from 801 to 1000. The performance of the method on the synthetic signals is shown in Tables 1 and 2. The Naïve Bayesian classifier is quite sensitive to the variance of noise. When the noise variance increases, the classification accuracies decrease dramatically. However, the Naïve Bayesian predictor performs trend prediction very well for different noise variances. This is because the Naïve Bayesian predictor is based on a very smooth supervisory time series. When there is no noise in the signals, the performance of the Naïve Bayesian classifier deteriorates as the number of components increases. When there is some noise and the number of components increases toward a certain value, the performance of the Naïve Bayesian classifier improves. But the performance deteriorates when the number of components increases beyond that value. This is because the noise strength becomes relatively weaker when the number of signal components increases from zero. However, when the signal has more and more components, it becomes more and more complex, and the performance of this proposed method deteriorates.

Another experiment is also conducted on the filter scale s = 9 and the dimension d = 9. The performance of our method is shown in Tables 3 and 4. The classification performance is better than that of the first experiment when the number of signal components is D = 1, 2, but worse when the number of components is D = 3-5. This is because larger filtering scales can improve classification accuracy, but larger pattern dimensions will lead to worse classification. Since the Naïve Bayesian predictor is based on the filtered time series, larger filtering scales will lead to better predictions.

5.2. Experiments on supervisory time series

In this study, we propose two methods to construct supervisory time series from the original time series. Experiments are carried out in order to verify their effectiveness and compare their performance.

Table 1
Classification performance for the synthetic signals on filtering scale s = 5 and pattern dimension d = 5

D \ a   0     0.1   0.2   0.3   0.4   0.5
1       97    53    45    27    21    22
2       94    72    64    43    36    33
3       87    76    73    65    51    48
4       78    71    79    77    69    67
5       64    60    60    66    68    70

In this experiment, zero-phase filtering and the Naïve Bayesian classifier are used. D represents the number of sine components and a denotes the noise amplitude.

Table 2
Prediction performance for the synthetic signals on filtering scale s = 5 and pattern dimension d = 5

D \ a   0     0.1   0.2   0.3   0.4   0.5
1       200   190   183   174   177   166
2       186   186   189   177   181   179
3       174   174   174   177   178   181
4       152   152   152   152   152   152
5       166   175   174   151   135   134

In this experiment, zero-phase filtering and the Naïve Bayesian predictor are used.

Table 3
Classification performance for the synthetic signals on filtering scale s = 9 and pattern dimension d = 9

D \ a   0     0.1   0.2   0.3   0.4   0.5
1       94    77    54    53    50    23
2       84    74    68    61    57    52
3       55    50    52    61    65    62
4       44    39    46    63    54    54
5       48    46    44    43    47    47

In this experiment, zero-phase filtering and the Naïve Bayesian classifier are used.


Table 4
Prediction performance for the synthetic signals on filtering scale s = 9 and pattern dimension d = 9

D \ a   0     0.1   0.2   0.3   0.4   0.5
1       200   200   189   189   189   174
2       174   174   174   185   193   193
3       176   176   176   176   176   164
4       186   186   187   153   186   185
5       179   179   179   179   177   177

In this experiment, zero-phase filtering and the Naïve Bayesian predictor are used.


5.2.1. Experiments on multiscale zero-phase filtering

The share prices used to evaluate the performance of the two filtering methods come from the Australian stock market. The share QAN has 760 daily prices over the last three years, and the share CBA has 1012 daily prices over the last four years. According to the class definitions in Section 3.1, each supervisory time series obtained from multiscale zero-phase filtering can be transformed into classified patterns.
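Multiscale zero-phase smoothing can be illustrated with a minimal sketch. We assume here a uniform (boxcar) filter applied forward and then backward over the series, a standard zero-phase construction in which the phase delays of the two passes cancel; the function name `zero_phase_ma` and the test signal are illustrative choices, not the paper's exact filter design.

```python
import numpy as np

def zero_phase_ma(x, s):
    """Smooth x with a length-s uniform (boxcar) filter applied forward
    and then backward, so the phase delays of the two passes cancel."""
    kernel = np.ones(s) / s
    forward = np.convolve(x, kernel, mode="same")
    # reverse, filter again, reverse back: the second pass undoes the
    # phase shift introduced by the first
    return np.convolve(forward[::-1], kernel, mode="same")[::-1]

# Smooth a noisy sine similar to the paper's synthetic signals
# (a = 0.3 plays the role of the noise amplitude).
np.random.seed(0)
t = np.arange(200)
x = np.sin(2 * np.pi * t / 50) + 0.3 * np.random.randn(200)
smooth = zero_phase_ma(x, s=5)
```

Increasing s corresponds to a coarser scale: the smoothed series fluctuates less, which is the property the class-labeling step relies on.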

The probabilities P(Cj), 1 ≤ j ≤ 5, and the probability distribution functions P(Z(t)|Cj), 1 ≤ j ≤ 5, can be estimated over the period from time d + N to 660 for QAN and from time d + N to 912 for CBA. The remaining 100 time points of these time series are used to test the Naïve Bayesian classifier. The number of correct classifications for different scales s and pattern dimensions d is shown in Tables 5 and 6. The two tables reveal that

(1) The classification accuracy is very low when the original time series is taken directly as the supervisory time series. This implies that a smooth supervisory time series plays a very important role in the proposed method.

(2) The filtering scale should be greater than the pattern dimension d in order to achieve higher classification accuracy.

(3) The classification accuracy first increases quickly and then increases gradually as the filtering scale increases.

(4) A lower fluctuation in the time series can lead to higher classification accuracy.

Considering that the filtered time series loses more information at larger filtering scales, a smaller filtering scale is preferred. As a result of the tradeoff between a smaller filtering scale and a higher classification accuracy, s = 7, d = 3 for QAN and s = 5, d = 3 for CBA are reasonable parameters. Classification accuracies of 87% at s = 7, d = 3 for QAN and 83% at s = 5, d = 3 for CBA confirm the effectiveness of the proposed method. It can be seen from the tables that higher classification accuracies are reached at a lower pattern dimension. This is consistent with the observation that adding more features to the patterns introduces more noise, which can degrade the classification performance [3]. Both tables confirm that more information is preserved, but with worse classification performance, when the filtering scale is smaller.
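The training and testing procedure just described (estimating P(Cj) and P(Z(t)|Cj) from the classified patterns, then labeling held-out patterns) can be sketched as follows. The section does not restate the form of the distribution functions, so this sketch assumes per-feature Gaussian likelihoods under the naive independence assumption; the function names are hypothetical.

```python
import numpy as np

def train_naive_bayes(X, y, n_classes=5):
    """Estimate class priors P(Cj) and per-feature Gaussian parameters
    under the naive (feature-independence) assumption.
    X: (n_samples, d) patterns; y: integer labels 0..n_classes-1."""
    priors, means, vars_ = [], [], []
    for j in range(n_classes):
        Xj = X[y == j]
        priors.append(len(Xj) / len(X))
        means.append(Xj.mean(axis=0))
        vars_.append(Xj.var(axis=0) + 1e-6)  # variance floor avoids division by zero
    return np.array(priors), np.array(means), np.array(vars_)

def classify(X, priors, means, vars_):
    """Assign each pattern to the class maximising
    log P(Cj) + sum_i log P(z_i | Cj)."""
    # per-class Gaussian log-likelihood, features treated as independent
    ll = -0.5 * (((X[:, None, :] - means) ** 2) / vars_
                 + np.log(2 * np.pi * vars_)).sum(axis=2)
    return np.argmax(np.log(priors) + ll, axis=1)
```

With well-separated classes the maximum a posteriori rule above recovers the labels; in the paper's setting the "classes" are the five trend patterns of the supervisory time series.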

Table 5
Classification accuracies for QAN data

d \ s     0      3      5      7      9     11
3        30     63     73     87     90     91
5        56     63     60     83     90     90
7        53     56     60     74     86     86
9        47     57     51     68     82     84
11       40     46     45     64     81     75

In this experiment, zero-phase filtering and the Naïve Bayesian classifier are used. Variable d represents the pattern dimension and s denotes the filtering scale.

Table 6
Classification accuracies for CBA data

d \ s     0      3      5      7      9     11
3        30     69     83     82     82     88
5        38     61     81     79     77     85
7        37     54     71     75     73     81
9        36     54     71     68     65     76
11       35     47     67     64     62     68

In this experiment, zero-phase filtering and the Naïve Bayesian classifier are used.



After a supervisory time series is classified, it can be denoted as a character string. Based on the current string pattern, the known probabilities and probability distribution functions, the next character, or class, can be predicted. For both sets of share price data, the predictor is tested over the last 200 days. The prediction accuracies are shown in Tables 7 and 8. The Naïve Bayesian predictor achieves very high classification accuracies for different filtering scales and pattern dimensions, except for the filtering scale s = 0. This implies that a smooth supervisory time series makes a greater contribution to the high prediction accuracy than the Naïve Bayesian predictor itself. Tables 7 and 8 reveal characteristics similar to those shown in Tables 5 and 6. The high prediction accuracies are due to the fact that the predictions are based on very smooth supervisory time series. The tradeoffs between filtering scales and pattern dimensions yield s = 7, d = 3 for QAN and s = 5, d = 3 for CBA. The results of the parameter tradeoff are the same as for the Naïve Bayesian classifier. Comparing the experimental results, we note that the Naïve Bayesian classifier and the Naïve Bayesian predictor show very similar performance (the same tradeoff parameters and similar distributions of accuracies).
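The next-class prediction over the character string can be sketched as a Naïve Bayesian predictor: given the d most recent class labels, score each candidate next class by its prior times per-position conditional probabilities, treating the positions as independent. The counting scheme and add-alpha smoothing below are our own illustrative assumptions, not the paper's exact estimator.

```python
import math
from collections import Counter, defaultdict

def train_predictor(labels, d):
    """Count, for each next class c, how often each symbol occurs at each
    of the d preceding positions (naive independence across positions)."""
    priors = Counter()
    cond = defaultdict(Counter)  # keyed by (position, symbol), counts next class
    for t in range(d, len(labels)):
        c = labels[t]
        priors[c] += 1
        for i in range(d):
            cond[(i, labels[t - d + i])][c] += 1
    return priors, cond

def predict_next(context, priors, cond, alpha=1.0):
    """Return the class maximising log P(c) + sum_i log P(context_i | c),
    with add-alpha smoothing for unseen (position, symbol, class) triples."""
    total = sum(priors.values())
    best, best_score = None, -math.inf
    for c in priors:
        score = math.log(priors[c] / total)
        for i, sym in enumerate(context):
            num = cond[(i, sym)][c] + alpha
            den = priors[c] + alpha * len(priors)
            score += math.log(num / den)
        if score > best_score:
            best, best_score = c, score
    return best
```

On a strongly periodic label string the predictor simply learns the cycle, which is consistent with the high accuracies reported for smooth supervisory series.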

5.2.2. Experiments on SSA filtering

In the SSA filtering experiments, the first principal component is used to construct the supervisory time series. The original time series are the same as those in Section 5.2.1. The Naïve Bayesian classifier and predictor are also applied to the SSA-filtered time series. The performance of the proposed method based on SSA filtering is shown in Tables 9 and 10. The classification accuracies are very similar to those based on zero-phase filtering. As a result of the tradeoff between a smaller filtering scale and a higher classification accuracy, s = 30, d = 3 for QAN and s = 25, d = 3 for CBA are reasonable parameters. Classification accuracies of 84% for QAN and 82% for CBA at these tradeoffs confirm the effectiveness of SSA filtering. However, these results are slightly worse than those from zero-phase filtering.

The prediction performance of the proposed method based on SSA filtering is shown in Tables 11 and 12. The proposed method achieves very high prediction accuracies for different filtering scales and pattern dimensions. The optimal parameters with tradeoffs between filtering scale and pattern dimension are s = 25 and d = 3 for both QAN and CBA. The prediction performance is slightly worse than that of zero-phase filtering.
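SSA filtering with the first principal component can be sketched as follows: embed the series in a trajectory matrix, keep only the rank-1 term of its SVD, and map it back to a series by diagonal averaging (Hankelisation). The window length L plays the role of the filtering scale s; the function name is hypothetical.

```python
import numpy as np

def ssa_first_component(x, L):
    """Reconstruct the first SSA component of series x with window length L."""
    N = len(x)
    K = N - L + 1
    # trajectory matrix: column j is the window x[j : j + L]
    X = np.column_stack([x[j:j + L] for j in range(K)])
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    X1 = s[0] * np.outer(U[:, 0], Vt[0])  # rank-1 approximation
    # diagonal averaging: entry (i, j) contributes to series index i + j
    recon = np.zeros(N)
    counts = np.zeros(N)
    for j in range(K):
        recon[j:j + L] += X1[:, j]
        counts[j:j + L] += 1
    return recon / counts
```

For a trend-dominated series the first component tracks the trend, which is exactly the smooth supervisory series the classifier is trained on.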

Table 7
Prediction accuracies for QAN data

d \ s     0      3      5      7      9     11
3       108    184    190    194    198    198
5         0    184    185    192    197    197
7         0    172    180    189    196    196
9         4    175    181    189    196    195
11        4    174    178    188    195    194

In this experiment, zero-phase filtering and the Naïve Bayesian predictor are used.

Table 8
Prediction accuracies for CBA data

d \ s     0      3      5      7      9     11
3       108    184    196    196    196    194
5        55    183    194    194    194    192
7         0    174    192    191    190    190
9         0    172    192    190    190    190
11        7    170    187    189    188    190

In this experiment, zero-phase filtering and the Naïve Bayesian predictor are used.

Table 9
Classification performance based on SSA filtering for QAN data

d \ s    12     16     21     25     30     35     39
3        62     67     72     74     84     89     88
5        55     63     52     62     75     81     89
7        38     58     54     55     64     73     83
9        39     55     55     55     64     69     81
11       37     34     32     43     58     56     83


Table 10
Classification performance based on SSA filtering for CBA data

d \ s    12     16     21     25     30     35     39
3        68     70     79     82     82     83     86
5        60     61     74     82     81     77     81
7        58     58     69     74     79     73     79
9        45     58     69     74     72     72     73
11       41     52     60     71     68     68     69

In this experiment, SSA filtering and the Naïve Bayesian classifier are used.

Table 11
Prediction performance based on SSA filtering for QAN data

d \ s    12     16     21     25     30     35     39
3       180    184    184    188    187    187    196
5       178    179    180    179    176    182    195
7       170    168    156    162    173    166    189
9       166    167    154    150    173    162    184
11      160    161    168    157    175    159    178

In this experiment, SSA filtering and the Naïve Bayesian predictor are used.

Table 12
Prediction performance based on SSA filtering for CBA data

d \ s    12     16     21     25     30     35     39
3       184    188    190    195    195    194    194
5       183    181    185    194    189    192    192
7       173    176    184    191    191    183    188
9       171    175    183    192    189    188    180
11      167    173    179    190    184    188    177

In this experiment, SSA filtering and the Naïve Bayesian predictor are used.


5.3. Experiments on pattern classifiers

5.3.1. Experiments on KNN classifiers

In order to verify the effects of different classification methods in the proposed scheme, experiments are carried out with the K nearest neighbors (KNN) classifier, which is widely used in many pattern recognition applications. The same original time series and zero-phase filters as in Section 5.2.1 are used in these experiments. The performance of the proposed method based on the KNN is shown in Tables 13–16. The classification accuracies improve as the number of nearest neighbors increases. From Table 13, the classification accuracies at d = 3, K = 50 are similar to those in Table 5 at d = 3. A similar correspondence can be found between Tables 14 and 5, Tables 15 and 6, and Tables 16 and 6. When K is small, e.g., K = 10 or 20, the KNN does not perform as well as the Naïve Bayesian classifier. However, when K is large, e.g., K = 50, the KNN performs slightly better than the Naïve Bayesian classifier. This result is consistent with the well-known observation that the KNN can classify as well as Bayesian classifiers when K is sufficiently large.

Table 13
The number of correct classifications based on the KNN for QAN data at pattern dimension d = 3

d \ s    12     16     21     25     30     35     39
3       184    188    190    195    195    194    194
5       183    181    185    194    189    192    192
7       173    176    184    191    191    183    188
9       171    175    183    192    189    188    180
11      167    173    179    190    184    188    177

In this experiment, zero-phase filtering and the KNN classifier are used. K represents the number of nearest neighbors and s denotes the filtering scale.


Table 14
The number of correct classifications based on the KNN for QAN data at filtering scale s = 7

d \ K    10     20     30     40     50
3        63     67     68     72     81
5        73     84     88     88     87
7        77     79     85     88     87
9        63     76     79     87     92
11       56     61     66     69     67

In this experiment, zero-phase filtering and the KNN classifier are used.

Table 15
The number of correct classifications based on the KNN for CBA data at pattern dimension d = 3

K \ s     3      5      7      9     11
10       60     76     75     74     83
20       67     79     82     79     89
30       65     79     83     80     91
40       68     83     84     79     90
50       70     84     85     81     90

In this experiment, zero-phase filtering and the KNN classifier are used.

Table 16
The number of correct classifications based on the KNN for CBA data at filtering scale s = 7

d \ K    10     20     30     40     50
3        76     79     79     83     84
5        70     81     81     81     81
7        74     81     84     81     84
9        70     68     73     78     81
11       60     61     62     69     70

In this experiment, zero-phase filtering and the KNN classifier are used.


It is interesting that the pattern dimension d = 3 is the optimal selection for all share price experiments. This may imply that the stock market needs at most three days to fully respond to important market news.

5.3.2. Experiments on K-means clustering

In order to verify the effectiveness of the transformation from clustering to supervised classification, experiments are conducted using the well-known K-means clustering algorithm. We use the same original time series and zero-phase filters as in Section 5.2.1. In order to verify the results of pattern clustering, the classes of the filtered time series are assigned the same labels as those defined in Section 3.1. The number of clusters is set to five, and the initial means are set to patterns from different classes. The K-means method is then applied to each filtered time series. Comparing the clustering results with the class labels of the filtered time series reveals the accuracy of the clustering process. The clustering accuracies for the two share prices are very low, as shown in Tables 17 and 18. Clearly, our approach performs better than the clustering-based method.

Table 17
Clustering performance based on the K-means algorithm for QAN data

d \ s     3      5      7      9     11
3        45     42     39     42     48
5        43     37     46     37     44
7        36     36     32     52     46
9        43     34     36     45     35
11       39     40     38     38     36

In this experiment, zero-phase filtering and the K-means clustering method are used. The number of clusters is set to five and the initial means are set to five patterns in different classes.


Table 18
Clustering performance based on the K-means algorithm for CBA data

d \ s     3      5      7      9     11
3        38     38     33     34     32
5        36     30     36     32     34
7        34     32     29     32     29
9        39     34     32     32     31
11       40     38     37     32     34

In this experiment, zero-phase filtering and the K-means clustering method are used. The number of clusters is set to five and the initial means are set to five patterns in different classes.


6. Conclusions

In this paper, zero-phase filters and SSA filters are applied to share price time series to transform the data into smooth supervisory time series. The patterns of the supervisory time series are categorized into five classes, which are then used to classify the original time series. Pattern classifiers, such as the Naïve Bayesian classifier and the Naïve Bayesian predictor, can be trained using the classified original time series and the classified supervisory time series. The proposed method transforms a clustering process into supervised classification at different time scales. The filters perform the role of coarse classification, and the pattern classifiers carry out the fine classification. Our method establishes a bridge between clustering and supervised classification. Its effectiveness has been verified by a large number of experiments. This technique can be extended to many other applications, such as biological data analysis.

Acknowledgement

This work is supported in part by the Hong Kong Research Grant Council (Project CityU 122506).

References

[1] R. Anonie, E. Carai, Gaussian smoothing by optimal iterated uniform convolutions, Computer Artificial Intelligents 11 (4) (1992) 363–373.
[2] V.E. Castro, Why so many clustering algorithms, SIGKDD Explorations 4 (1) (2002) 65–75.
[3] D. Chen, D.D. Hua, Z. Liu, Z.F. Cheng, An integrated system for class prediction using gene expression profiling, in: IEEE International Conference on Control, Automation, Robotics and Vision, 2004, pp. 1023–1028.
[4] M. Dai, P. Baylou, L. Humbert, M. Najim, Image segmentation by a dynamic thresholding using edge detection based on cascaded uniform filters, Signal Processing 52 (1996) 49–63.
[5] R.O. Duda, P.E. Hart, Pattern Classification and Scene Analysis, Wiley, New York, 1973.
[6] E.R. Dougherty, M. Brun, A probabilistic theory of clustering, Pattern Recognition 37 (2004) 917–925.
[7] H. Eugene, An economical class of digital filters for decimation and interpolation, IEEE Transactions on Acoustics, Speech and Signal Processing ASSP-29 (1981) 155–162.
[8] A.C. Harvey, Time Series Models, Harvester Wheatsheaf, Sydney, 1993.
[9] R.A. Johnson, D.W. Wichern, Applied Multivariate Statistical Analysis, second ed., Prentice-Hall, New York, 1988.
[10] I.T. Jolliffe, Principal Component Analysis, Springer-Verlag, New York, 1986.
[11] H.J. Kim, J.U. Kim, Y.G. Ra, Boosting Naïve Bayes text classification using uncertainty-based selective sampling, Neurocomputing 67 (2005) 403–410.
[12] P.A. Lynn, W. Fuerst, Introductory Digital Signal Processing with Computer Application, John Wiley, New York, 1998.
[13] T.M. Mitchell, Machine Learning, McGraw-Hill, New York, 1997.
[14] V. Nekrutkin, Theoretical properties of the "Caterpillar" method of time series analysis, in: Eighth IEEE Signal Processing Workshop on Statistical Signal and Array Processing (SSAP'96), pp. 395–397.
[15] W.H. Press, S.A. Teukolsky, W.T. Vetterling, B.P. Flannery, Numerical Recipes in C, Cambridge University Press, 1992.
[16] R. Rau, J.H. McClellan, Efficient approximation of Gaussian filters, IEEE Transactions on Signal Processing 45 (2) (1997) 468–471.
[17] C.S. Sastry, S. Rawat, A.K. Pujari, V.P. Gulati, Network traffic analysis using singular value decomposition and multiscale transforms, Information Sciences 177 (23) (2007) 5275–5291.
[18] R.H. Shumway, D.S. Stofer, Time Series and Its Application, Springer, New York, 2000.
[19] J.O. Smith, Introduction to Digital Filters, Center for Computer Research in Music and Acoustics (CCRMA), Stanford University, September 2005 Draft. <http://ccrma.stanford.edu/~jos/filters05/>.
[20] W.M. Wells, Efficient synthesis of Gaussian filters by cascaded uniform filters, IEEE Transactions on Pattern Analysis and Machine Intelligence PAMI-8 (1986) 234–239.
[21] R.R. Yager, An extension of the Naïve Bayesian classifier, Information Sciences 176 (2006) 577–588.
[22] M. Zaffalon, The Naïve credal classifier, Journal of Statistical Planning and Inference 105 (2002) 5–21.
[23] Z. Zeng, M.N. Fu, H. Yan, Time series prediction based on pattern classification, Artificial Intelligence in Engineering 15 (2001) 61–69.