
A New Approach to Investigate the existence of patterns in Geoelectrical Signals related to Seismicity of Western Greece, using supervised pattern recognition.

APOSTOLOS IFANTIS, VASILIOS NIKOLAIDIS, GEORGE ECONOMOU*
Department of Electrical Engineering, Control System and Signal Processing Laboratory
Technological Educational Institute of Patras
M. Alexandrou 1 Koukouli, Patras
GREECE
*Department of Physics, Electronics Laboratory, University of Patras, Patras 26500, Greece.

[email protected] [email protected] [email protected]

Abstract: - In this study, a supervised pattern recognition technique is used to examine Long Term Geoelectric Potential difference (LTGP) data recorded during the 1993-1997 period in Western Greece. The study presents initial results from an attempt at automated discovery of similarities between LTGP data recorded during periods of similar seismic activity. In particular, we investigate whether patterns exist in LTGP data recorded during time periods of significant seismic events with geographically adjacent earthquake epicenters. Signals recorded in such periods are grouped together and their properties are compared with those of other groups. Certain interesting properties of the signal groups are detected, indicating a possible correlation between the geoelectric signal structure and the epicenter location of earthquakes. An explanation of these results is also provided.

Key-Words: - Geoelectrical signals, Pattern recognition, Similarity, Earthquake

1 Introduction
Earthquake-related electromagnetic phenomena are considered promising candidates for short-term earthquake prediction. Over the years, numerous electromagnetic signals associated with earthquakes have been reported over a broad range of frequencies [1][2][3]. Due to the many possible sources of noise and other factors that may alter the signal characteristics, the correlation between Long Term Geoelectric Potential difference (LTGP) signals and earthquakes is still under investigation. However, there is strong evidence that anomalous changes of the geoelectromagnetic field do occur prior to strong earthquakes. Aiming towards earthquake prediction, many investigations of geoelectromagnetic signal changes have taken place, in which LTGP measurements are carried out and analyzed using sophisticated signal analysis techniques [4][5][6]. An earthquake is not an instantaneous phenomenon: it is accompanied by preseismic and postseismic geotectonic activity, which affects the characteristics of the LTGP and suggests that an oncoming earthquake could possibly be predicted [5][6].

Presented in this paper are initial results from an attempt at automated discovery of patterns in LTGP data recorded during time periods of similar seismic activity. In particular, we investigate whether patterns exist in LTGP data recorded during time periods of significant seismic events with geographically adjacent earthquake epicenters in Western Greece. Given the known seismic activity of the 1993-1997 time period and the LTGP data recorded during the same period, the goal is to use pattern recognition methods to explore possible mappings between LTGP data and seismic activity information, and to use the recognition performance to draw conclusions on possible structures in the data. We investigate whether time sequences of geoelectric potential data recorded during periods of similar seismic activity are similar to each other, and dissimilar to data recorded in periods of other seismic behavior. The analysis uses consecutive 10-day time periods, which include the periods of major earthquakes that took place in the area of Western Greece. The recordings were collected during an independent experimental investigation at the University of Patras Seismological Laboratory (UPSL), and co-occurred with several destructive earthquakes in the same area of Western Greece, a territory with the highest seismic activity in Europe.

Proceedings of the 8th WSEAS Int. Conference on Automatic Control, Modeling and Simulation, Prague, Czech Republic, March 12-14, 2006 (pp364-369)

1.1 LTGP Acquisition System
The measurement of the LTGP is relatively simple: the potential difference is sensed between pairs of electrodes placed in the ground at specific locations. The electric field is continuously monitored, usually in two perpendicular directions (N-S and E-W), by an appropriate number of electrodes. In this system the LTGP is monitored by four sets of dipoles arranged over short as well as long distances (Fig.1). These dipoles use Pb-PbCl2 electrodes. Two sets have an electrode separation of 100m and are oriented N-S (ch0) and E-W (ch1), perpendicular to each other. The third set (ch2) has an electrode separation of 300m and is oriented NE-SW. The fourth set (ch4) has an electrode separation of 3000m and is oriented NW-SE. The dipoles are installed on the outskirts of the University of Patras, in Rio, in a rather quiet countryside location, and are based in Pleistocene compact conglomerates.

Fig.1 Data Acquisition System

The obtained LTGP signals appear in Figure 2. Figures 2a, 2b, 2c and 2d present the geoelectric potential difference for channels ch0, ch1, ch2 and ch4 respectively. The vertical axis shows the potential in mV and the horizontal axis the time in hours. Figure 2e shows the seismic events; in this panel, the vertical axis shows the earthquake magnitude on the Richter scale and the horizontal axis the time in hours. Details of these significant seismic events are listed in Table 1 and their epicenters appear in Figure 3.

Fig.2 Geoelectric and Seismic signals: (a) Ch0, (b) Ch1, (c) Ch2, (d) Ch4, (e) Seismic Events


Event #   Time (Hours)   Date       Magn.   Dist. (Km)   Depth (Km)
1         1543           5/3/93     5.3     176          6.4
2         1863           18/3/93    4.9     44           1.5
3         2055           26/3/93    5.0     100          18.3
4         3960           13/6/93    5.4     192          40.0
5         4700           14/7/93    5.1     15           50.2
6         10082          25/2/94    5.3     147          49.7
7         11304          16/4/94    5.3     182          36.4
8         16744          29/11/94   4.9     159          28.2
9         16783          1/12/94    4.8     148          35.8
10        21505          15/6/95    5.6     42           50.9
11        21506          15/6/95    5.1     43           3.7
12        24030          28/9/95    4.8     29           30.4
13        30064          6/6/96     4.9     130          35.8
14        42453          5/11/97    4.9     55           28.3
15        42616          12/11/97   4.8     200          56.6
16        42757          18/11/97   6.1     206          36.9
17        42758          18/11/97   5.6     183          48.3
18        42759          18/11/97   5.0     209          32.4

Table 1: Major earthquakes that occurred in Western Greece during the 1993-1997 period.

Fig. 3. Epicenters of the major earthquakes in Patras area in Western Greece during the period 1993-1997.

2 Problem Formulation
The five-year LTGP data consists of 43600 samples per channel for each of the four channels. Each sample is the average of 180 actual measurements taken within one hour, so each value corresponds to an hour of data acquisition. Given the seismic activity of the period, we investigate possible structures in the LTGP data that may be common to all periods with events of geographically adjacent earthquake epicenters.

3 Problem Solution
3.1 Data Preprocessing
Several preprocessing steps were applied to the aforementioned data. For the purposes of this analysis, the entire preprocessing stage reduced the data to a 181 x 30 dataset matrix, each row corresponding to a 10-day period. Consecutive rows correspond to consecutive time periods, and the 181 feature vectors amount to 43440 hours of data acquisition. The features are taken from the frequency domain only, projected using principal component analysis, and then scaled. A more detailed description of the preprocessing steps follows.

As has been shown elsewhere, the recorded signals exhibit slow annual variations and high frequency noise. To remove the bias imposed by the signal’s annual variation, the low-frequency component was filtered out as follows: for each sample, the channel’s average signal level during the past five days was calculated (i.e. the mean value of the previous 120 samples), and the sample was replaced by its difference from that average. Since no average signal level could be computed for the first 120 samples, these were removed from further analysis. This step also introduced some spurious values around periods when data acquisition was interrupted (and the average signal level was therefore zero), but no further action was taken to handle this shortcoming.

The remaining 43480 samples of each channel were split into 240-hour sections, producing 181 segments, each corresponding to a consecutive 10-day period (the final 40 hours of the original five-year period were also dropped from further analysis at this point). The power spectrum of each segment was then estimated by taking the Discrete Fourier Transform (DFT) of the segment’s data and multiplying the transform element-wise by its complex conjugate. The resulting 240 values are symmetric, and thus only the first 120 values were kept for further processing. The four vectors (one per channel, each containing 120 power spectrum values) were then combined into one which contained the entire 480-coordinate information of all channels. Combining the vectors from all periods resulted in a 181 x 480 dataset spanning more than 99% of the original five-year period signals.
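The detrending, segmentation and power spectrum steps described above can be sketched as follows. This is a minimal illustration using NumPy, not the authors' implementation: the function names are hypothetical, and random numbers stand in for the recorded LTGP channels.

```python
import numpy as np

WINDOW = 5 * 24    # trailing five-day window, in hourly samples (120)
SEGMENT = 10 * 24  # ten-day segment length, in hourly samples (240)

def detrend_trailing_mean(x, window=WINDOW):
    """Replace each sample by its difference from the mean of the
    previous `window` samples; the first `window` samples are dropped."""
    x = np.asarray(x, dtype=float)
    csum = np.concatenate(([0.0], np.cumsum(x)))
    # mean of x[i-window:i] for i = window .. len(x)-1
    trailing_mean = (csum[window:-1] - csum[:-window - 1]) / window
    return x[window:] - trailing_mean

def segment_power_spectra(x, segment=SEGMENT):
    """Cut x into consecutive full segments (leftover samples are dropped)
    and return the first half of each segment's power spectrum."""
    n_seg = len(x) // segment
    segs = x[:n_seg * segment].reshape(n_seg, segment)
    spectrum = np.fft.fft(segs, axis=1)
    power = (spectrum * np.conj(spectrum)).real  # DFT times its conjugate
    return power[:, :segment // 2]               # keep 120 of 240 values

def build_features(channels):
    """Concatenate the per-channel spectra of each period into one row."""
    return np.hstack([segment_power_spectra(detrend_trailing_mean(ch))
                      for ch in channels])

# Synthetic stand-in for the four recorded channels (assumption).
rng = np.random.default_rng(0)
channels = rng.normal(size=(4, 43600))
features = build_features(channels)  # one 480-value row per 10-day period
```

With 43600 samples per channel, the sketch drops the first 120 samples and the final 40, which reproduces the 181 x 480 shape quoted in the text.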


The next preprocessing step consisted of a Principal Component Analysis (PCA) performed on the 181 x 480 dataset. Utilizing the statistical properties of the data, the method calculates the eigenvectors of the covariance matrix to expose a set of orthogonal axes along which the data has maximum variance [7][8]. This allowed the projection of the data onto the new principal component axes, and the subsequent reduction of the dimensionality of the problem from 480 to 30 coordinates without major loss of information. Finally, each coordinate (column) of the resulting 181 x 30 dataset was normalized by subtracting the mean and dividing by the standard deviation, so that each coordinate has zero mean and unit standard deviation.

3.2 Pattern recognition
The supervised pattern recognition analysis stage, performed to determine structures in the data, is presented below. Only data from the 181 x 30 dataset that resulted from the previously described preprocessing stage was used. Training examples (the training dataset), as well as the set of records on which classification was to be performed and results evaluated (the testing dataset), were extracted from this dataset or from subsets of it.

In terms of supervised pattern recognition performance this is an ideal case. Having a common training and testing dataset, with common preprocessing on both sets (PCA, global normalization parameters etc.), eases the problem and biases the results towards better recognition. However, since the purpose of this study is not the recognition of unknown data but a better understanding of the existing data, this was a valid decision. At the same time, the expected results were biased towards worse recognition performance by performing no filtering of high frequency noise and no removal of invalid data sections, and by choosing a purely automated method for creating the 181 data vectors.

As mentioned above, time was partitioned into 181 consecutive segments. The actual seismic event could have occurred at any point in the segment. Most segments did not correspond to periods of major seismic activity, and the ones that did could contain any ratio of preseismic to postseismic time. Some periods could contain more than one seismic event. The approach taken can be summarized as follows: taking into account the epicenter location of the major earthquake activity recorded during each of the 181 periods (Table 1), one can classify each vector (row) of the dataset as corresponding to a time period of one of four cases:

(a) periods when no major earthquake activity (with magnitude >= 4.8 Ms) was recorded. These periods were assigned to a class labeled NA (168 periods).

(b) periods with major earthquake activity in the North Ionian region (class NI, 3 periods).

(c) periods with major earthquake activity in the South Ionian region (class SI, 5 periods).

(d) periods with major earthquake activity in the Corinthian and Patraikos Gulf region (class CP, 5 periods).
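The assignment of events to 10-day periods can be sketched as below. The event times are copied from Table 1. The numbering convention (hour 121 as the first retained hour, periods counted from 0) is an assumption inferred from the period boundaries listed in Table 2, and the names used are hypothetical. The regional labels (NI, SI, CP) depend on epicenter locations and are not derived here.

```python
from collections import defaultdict

FIRST_HOUR = 121   # assumed: first hour kept after dropping 120 samples
SEGMENT = 240      # hours per 10-day period
N_PERIODS = 181

# Event number -> time in hours, as listed in Table 1.
EVENT_HOURS = {
    1: 1543, 2: 1863, 3: 2055, 4: 3960, 5: 4700, 6: 10082,
    7: 11304, 8: 16744, 9: 16783, 10: 21505, 11: 21506, 12: 24030,
    13: 30064, 14: 42453, 15: 42616, 16: 42757, 17: 42758, 18: 42759,
}

def period_of(hour):
    """Index of the 10-day period that contains the given hour."""
    return (hour - FIRST_HOUR) // SEGMENT

events_by_period = defaultdict(list)
for event, hour in EVENT_HOURS.items():
    events_by_period[period_of(hour)].append(event)

active_periods = sorted(events_by_period)
n_quiet = N_PERIODS - len(active_periods)  # periods left for class NA
```

Under this convention the 18 events fall into 13 distinct periods, leaving 168 periods without major activity, which agrees with the class sizes given above.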

The resulting class assignments for time periods with major seismic events are shown in Table 2.

Period #   Period Start Time (Hours)   Period End Time (Hours)   Events in period   Class Assign.
5          1321                        1561                      1                  SI
7          1801                        2041                      2                  CP
8          2041                        2281                      3                  SI
15         3721                        3961                      4                  NI
19         4681                        4921                      5                  CP
41         9961                        10201                     6                  NI
46         11161                       11401                     7                  SI
69         16681                       16921                     8,9                NI
89         21481                       21721                     10,11              CP
99         23881                       24121                     12                 CP
124        29881                       30121                     13                 SI
176        42361                       42601                     14                 CP
177        42601                       42841                     15,16,17,18        SI

Table 2: Class assignments for periods with major seismic events.

Since some periods contain more than one of the 18 seismic events, there are only 13 periods with seismic activity. Next, supervised pattern recognition methods were trained using one or more prototypes (examples) representing data from each of the four classes. The dataset (or a subset of the dataset) could then be presented to the trained algorithm for classification. Subject to the discrimination ability of the classifier, classification success (i.e. success in identifying the period’s correct class from information in the signal data) would indicate that the data of each period is similar to the taught prototypes of its class, and would signify a connection between the data in the time period and the type of seismic activity. Even misclassification adds useful insight about the structure of the class data. As seen in a typical 2-d projection of the data’s multidimensional feature space (Fig.4), structures in the data could be impossible to identify or visualize without the aid of such analysis.
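The 181 x 30 dataset used in this stage, and 2-d projections such as the one in Fig.4, rest on the PCA and scaling steps of Section 3.1. A minimal sketch of those two steps is given below, assuming an eigendecomposition of the covariance matrix as described in the text. The function names are hypothetical and random data stands in for the 181 x 480 spectral dataset.

```python
import numpy as np

def pca_project(X, n_components):
    """Project the rows of X onto the n_components eigenvectors of the
    covariance matrix that carry the largest variance."""
    Xc = X - X.mean(axis=0)                  # center each coordinate
    cov = np.cov(Xc, rowvar=False)           # coordinate covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    return Xc @ eigvecs[:, order], eigvals[order]

def standardize(Z):
    """Subtract the column mean and divide by the column standard deviation."""
    return (Z - Z.mean(axis=0)) / Z.std(axis=0)

# Synthetic stand-in for the 181 x 480 spectral dataset (assumption).
rng = np.random.default_rng(0)
X = rng.normal(size=(181, 480))
Z, variances = pca_project(X, 30)
Z = standardize(Z)            # final 181 x 30 dataset
projection_2d = Z[:, :2]      # one possible 2-d projection for plotting
```

Any pair of the 30 columns gives a different 2-d projection; the first two are used here only because they carry the largest variance.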

Fig.4 An example 2-d projection of the entire data.

3.2.1 Four Class Problem
The 4-class supervised classification problem described above (with each class representing a different type of earthquake epicenter area) was presented to a classic supervised classification method, the 1-Nearest Neighbor Classifier (1-NNC) [9]. The 1-NNC simply assigns each data point to the class (type) of the nearest taught (stored) prototype. Euclidean distance was used as the metric of similarity. The goal was to determine whether the vectors belonging to a single class are also located together in a common area of the feature space, as well as to determine the degree of separation between the data vectors of separate classes. For this reason, the mean vector of each of the four classes was used as the prototype for training the method. Used in this way, 1-NNC classification success would indicate that the vectors in each class are located together around the mean value of the class. More importantly, the extent of within-class misclassification would show the interference, i.e. the degree to which items of other classes enter the space around the mean value of the class in question.

Once trained, the method was used to reclassify the entire dataset. Using all 30 PCA axes, the method misclassified 25 of the 181 vectors, an overall 13.81% classification error. The classification error per class (within-class error) was 12.5% (21 out of 168) for class NA, 33.3% (1 out of 3) for class NI, 40% (2 out of 5) for class SI and 20% (1 out of 5) for class CP. Although the results of this classification appear promising, they could be misleading. Due to the large feature space, which allows the data to scatter in space, and the limited number of items (especially in the smaller classes), a situation where most items are classified around their mean could easily be created even if the data was random. The good overall classification result is also of little value due to the difference in size between class NA and the other three classes. Even if all items were to be classified in class NA (a situation very likely with random data, since NA’s mean value would have been calculated over the majority of the data), the overall error reported would be only 7.18% (13 out of 181). However, useful insight can be extracted by observing the confusion matrix of this classification. The confusion matrix compares the original and output classes to show how classes interfered with each other. The confusion matrix of this classification is presented below:

      NA    NI    SI    CP
NA    147   0     9     12
NI    1     2     0     0
SI    2     0     3     0
CP    1     0     0     4

Columns list the sizes of the classes output by the supervised classifier; rows are the original class sizes. The value at row x column y is the number of vectors of original class x classified in class y. Items counted in the diagonal are properly classified. From this matrix the following observations can be made:

(a) The three classes which represent active time periods (NI, SI, and CP) appear separate in the feature space and do not interfere with each other. All misclassifications of members of the active periods are placed in the larger class NA.

(b) The number of NA periods misclassified is relatively small.

(c) The mean value of class NI is separate enough to not attract any vectors from any other class (including the large NA class).
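The classification scheme of this section, a 1-NNC whose only stored prototypes are the class means, can be sketched as below, together with the error figures that follow from the confusion matrix reported above. The function names are hypothetical; the matrix entries are copied from the text.

```python
import numpy as np

CLASSES = ["NA", "NI", "SI", "CP"]

def nearest_mean_classify(X, y, n_classes):
    """1-NNC with the class means as prototypes: each row of X is
    assigned to the class whose mean is nearest in Euclidean distance."""
    means = np.stack([X[y == c].mean(axis=0) for c in range(n_classes)])
    d2 = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are original classes, columns are output classes."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)
    return cm

# Confusion matrix as reported in the text (rows and columns: NA, NI, SI, CP).
reported = np.array([[147, 0, 9, 12],
                     [1,   2, 0,  0],
                     [2,   0, 3,  0],
                     [1,   0, 0,  4]])

class_sizes = reported.sum(axis=1)             # 168, 3, 5, 5
errors = class_sizes - np.diag(reported)       # misclassified per class
per_class_error = errors / class_sizes         # within-class error
overall_error = errors.sum() / reported.sum()  # 25 of 181
# Error if every vector were assigned to the majority class NA.
baseline_error = (reported.sum() - class_sizes[0]) / reported.sum()
```

The baseline figure is the reason the overall error alone says little here: assigning everything to NA would already misclassify only the 13 active periods.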

3.2.2 Two Class Problem
Several similar experiments were performed on sub-cases of the problem, limiting the number of classes, selecting a subspace of the original 30 PCA axes, etc. Focusing on signals from active periods in the NI and SI areas only, the resulting confusion matrix was:

      NI    SI
NI    2     1
SI    0     5

A single item of class NI is misclassified as SI. Indeed, visual inspection of several 2-d scatter plot projections (Fig.5) reveals a pair of vectors, one from each class, that are consistently found adjacent to each other. These vectors correspond to periods 69 and 177. The first is a member of class NI (containing two events in the NI region), and the second, which was labeled SI (containing three events in that region), was a notable exception since it also contained an event occurring in the NI region.

Fig.5 An example 2-d projection of a two class problem.

Fig.6 An example 2-d projection of a three class problem.

The scheme used here may not be able to capture the entire class geometry properly. Elongated or more complex structures are not well represented by their mean value, causing high classification error when 1-NNC is applied. Indeed, such elongated class structures were seen in several projection plots (class SI in Fig.5, classes SI and CP in Fig.6).

4 Conclusion
The three seismically active geographical regions examined have geotectonic differences as well as different signal propagation paths, and thus it would be expected that they exhibit different LTGP signal patterns during a seismic period. The results presented in this work indicate that this assumption holds to a significant degree. The large class of periods with no seismic activity could possibly affect the results, since it contains earthquakes of lesser magnitude (low magnitude seismic activity is very common in Western Greece). The similarity between LTGP signals coinciding with seismic sequences from the same epicenter area could aid towards better utilization of such signals in the future.

Acknowledgements
This work was supported by the National Ministry of Education under the project entitled “Soft computing techniques for multichannel processing”, in the context of the program “Archimidis”.

References:
[1] P. Varotsos and K. Alexopoulos, Physical properties of the variations of the electric field of the earth preceding earthquakes I & II, Tectonophysics, Vol.110, 1984, pp. 73-125.

[2] K. Meyer and R. Pirjola, Anomalous electrotelluric residuals prior to a large imminent earthquake in Greece, Tectonophysics, Vol.125, 1986, pp. 371-378.

[3] G-A. Tselentis and A. Ifantis, Geoelectric variations related to earthquakes observed during a 3-year independent investigation, Geophysical Research Letters, Vol.23, No.11, 1996, pp. 1445-1448.

[4] M. Hayakawa, T. Ito and N. Smirnova, Fractal analysis of ULF geomagnetic data associated with the Guam earthquake on August 8, 1993, Geophysical Research Letters, Vol.26, No.18, 1999, pp. 2797-2800.

[5] L. Telesca, V. Cuomo, V. Lapenna, A new approach to investigate the correlation between geoelectrical time fluctuations and earthquakes in a seismic area of southern Italy, Geophysical Research Letters, Vol.28, No.23, 2001, pp. 4375-4378.

[6] A. Ifantis, G. Economou and G. A. Tselentis, Experimental Investigation of Electrotelluric Field Periodic Anomalies in Western Greece and their Possible Relation to Seismicity during a Five-Year Period, Acta Geophysica Polonica, Vol.51, No.3, 2003, pp. 291-305.

[7] M. R. Anderberg, Cluster Analysis For Applications, Academic Press, 1973.

[8] R. A. Johnson and D. W. Wichern, Applied Multivariate Statistical Analysis, Prentice-Hall International, 1992.

[9] B. G. Batchelor, Practical Approach to Pattern Classification, Plenum Press, 1974.

