
[IEEE 2009 International Association of Computer Science and Information Technology - Spring Conference - Singapore (2009.04.17-2009.04.20)] 2009 International Association of Computer


Rescue and digitization of climate data by extraction from autographic weather charts

Anita S. Diwakar, Sr. Lecturer, Computer Engineering Department
D. K. Nayak, Principal
V.P.M’s Polytechnic, Thane (W), India

Prof. Poornima Talwai
Ramrao Adik Institute of Technology, Navi Mumbai, India

Abstract: Data rescue is the ongoing process of preserving all data at risk of being lost through deterioration of the medium, and of digitizing current and past data into computer-compatible form for easy access. The India Meteorological Department (IMD) has conscientiously taken observations of the weather, recorded them either manually or automatically, and transcribed them onto preset paper forms called autographic charts. Most of these charts are now at risk of being lost because the medium is deteriorating rapidly. This paper presents algorithms for the automatic extraction and storage of atmospheric data using image processing tools. The key factors that influence the methods are speed, accuracy and space. We have tested the algorithms on scanned images of the autographic charts available at IMD and present the results.

Key-Words: - Image processing, Image edge analysis, Curve fitting

I. INTRODUCTION

The India Meteorological Department (IMD) currently holds stacks of historical weather charts. These charts record barometric pressure, temperature and humidity. Analyzing and compiling data from them is slow: at present a user must read values off each chart and enter them manually into a computer database or spreadsheet, one data point at a time. This process is time-consuming, costly and prone to human error. SAMEER has developed the Surface Weather Chart Digitizer (SWCD) software to demonstrate the extraction of data values from IMD’s charts. The software uses image processing tools for every step, from reading the scanned image file to storing the data in a two-dimensional array. The major factors are time, accuracy and space, since the number of charts to be digitized is huge and digitization is an ongoing process. The climatic parameters that are continuously monitored are Rainfall, Temperature, Pressure, Relative Humidity, Wind Speed and Sunshine Duration. There are nearly 15000 charts of each type to be digitized. These rescued data, combined with already available data, will enable better assessment of climate projections, which can serve as input for policy makers seeking to mitigate losses due to natural disasters, and will provide more information for economic development.

Automatic data extraction from these graphs is challenging: the charts are of different types, were captured under varying imaging conditions, are often in poor condition because of their age, and carry noise of diverse kinds. Xiaonan Lu et al. [2] have suggested a method for the extraction of data from 2-D plots. This paper presents the results obtained by following those guidelines, together with a quantitative report of the analysis carried out.

II. PROCEDURE

Figure 1 illustrates the procedure followed to obtain the actual values of the climate parameters Temperature, Pressure and Humidity from the autographic charts called the Thermograph, Microbarograph and Hygrograph respectively. The values obtained are compared with the actual values and the mean square error (MSE) is calculated.

Figure 1. Procedure for data extraction. [Flowchart steps: Image Input; Image Conversion; Detection of Axes; Location of Origin; Finding Horizontal and Vertical Resolution; Filtering/Noise Removal and Binary Masking; Obtaining Absolute Pixel Values; Obtaining Relative Pixel Values; Conversion to Actual Climate Parameter Values: T, P, H]

2009 IACSIT Spring Conference

978-0-7695-3653-8/09 $25.00 © 2009 IEEE

DOI 10.1109/IACSIT-SC.2009.27


A. Image Conversion, Axis Detection and Location of Origin

The autographic charts are scanned and stored in JPEG format. Each image is converted to logical (binary) form. The Sobel edge detector is used to obtain the horizontal and vertical lines in the graphs. To identify the axes among these lines, column-wise array addition is used and a threshold on the number of non-zero pixels is chosen. Of the edge detectors tried (Sobel, Prewitt, Canny, etc.), Sobel was found to be the most effective for this application. It computes the gradient magnitude g at the centre point of a 3×3 neighbourhood as follows.

$$g = \left[G_x^2 + G_y^2\right]^{1/2} \tag{1}$$

$$g = \left\{\left[(z_7 + 2z_8 + z_9) - (z_1 + 2z_2 + z_3)\right]^2 + \left[(z_3 + 2z_6 + z_9) - (z_1 + 2z_4 + z_7)\right]^2\right\}^{1/2} \tag{2}$$

where $G_x$ and $G_y$ are the first derivatives estimated digitally. A pixel at location (x, y) is an edge pixel if $g \geq T$ at that location, where T is a specified threshold.
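Equations (1) and (2) can be sketched in a few lines of Python. This is an illustration only, not the authors' MATLAB implementation: the function names `sobel_magnitude` and `edge_map` are hypothetical, the image is assumed to be a list of rows of numbers, and z1 to z9 are assumed to number the 3×3 neighbourhood row by row from the top-left corner.

```python
import math


def sobel_magnitude(img):
    """Return the gradient magnitude g of Eq. (2) for every interior pixel.

    Border pixels are left at 0.0 because their 3x3 neighbourhood is incomplete.
    """
    rows, cols = len(img), len(img[0])
    g = [[0.0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            # Assumed numbering: z1 z2 z3 / z4 z5 z6 / z7 z8 z9.
            z1, z2, z3 = img[r - 1][c - 1], img[r - 1][c], img[r - 1][c + 1]
            z4, z6 = img[r][c - 1], img[r][c + 1]
            z7, z8, z9 = img[r + 1][c - 1], img[r + 1][c], img[r + 1][c + 1]
            gx = (z7 + 2 * z8 + z9) - (z1 + 2 * z2 + z3)
            gy = (z3 + 2 * z6 + z9) - (z1 + 2 * z4 + z7)
            g[r][c] = math.sqrt(gx * gx + gy * gy)
    return g


def edge_map(img, threshold):
    """Mark a pixel as an edge (1) where g >= threshold, else 0."""
    return [[1 if v >= threshold else 0 for v in row]
            for row in sobel_magnitude(img)]


# A small image with a vertical step between columns 1 and 2.
step = [[0, 0, 1, 1] for _ in range(4)]
edges = edge_map(step, threshold=4)
```

The threshold T is chart-dependent; the value 4 above is chosen only so that the toy step edge is detected.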

To identify the Y axis among the detected lines, column-wise addition of the array is carried out and a threshold on the number of pixels with value 1 is chosen. The index of the column whose sum meets the threshold is taken as the location of the Y axis. The equation of the axis is

x = Index(Threshold).

For example, the Y axis of the Hygrograph shown is x = 3.

Similarly, to identify the X axis, the transpose of the array is taken and column-wise addition is carried out. Again a threshold on the number of 1s is chosen, and the index corresponding to that value is taken as the location of the X axis. The equation of the axis is

y = Index(Threshold).

For example, the X axis of the Hygrograph shown is y = 901.

This procedure is carried out using the following MATLAB functions.

Edge (Sobel): detection of horizontal and vertical lines in the graph
Sum: addition of the columns to obtain a threshold value
Indices: location of the X and Y axes from the number of lines

After the X and Y axes have been detected, the origin is obtained as their point of intersection: [x, y] = [3, 901].
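The axis search described above can be sketched as follows. The function name `locate_origin` and both threshold parameters are hypothetical. The sketch also assumes that the Y axis is the leftmost column meeting the threshold and the X axis is the bottom-most row meeting it, which is consistent with the reported origin [3, 901] but is not stated explicitly in the text.

```python
def locate_origin(binary, col_threshold, row_threshold):
    """Return (x, y) of the origin in a binary edge image, or None.

    binary is a list of rows containing 0/1 values.
    """
    # Column-wise addition: number of 1s in each column.
    col_counts = [sum(col) for col in zip(*binary)]
    # Same operation on the transpose: number of 1s in each row.
    row_counts = [sum(row) for row in binary]

    xs = [i for i, n in enumerate(col_counts) if n >= col_threshold]
    ys = [i for i, n in enumerate(row_counts) if n >= row_threshold]
    if not xs or not ys:
        return None  # no line long enough to be an axis
    # Assumption: Y axis is the leftmost candidate, X axis the lowest one.
    return xs[0], ys[-1]


chart = [
    [0, 1, 0, 0, 0, 0],
    [0, 1, 0, 0, 1, 0],   # the extra 1 is a stray noise pixel
    [0, 1, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 1, 0, 0, 0, 0],
]
origin = locate_origin(chart, col_threshold=4, row_threshold=5)
```

Choosing thresholds close to the full image height and width keeps short grid fragments and noise from being mistaken for an axis.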

B. Finding Horizontal and Vertical Resolution

The resolution is calculated according to the type of chart. The first step in finding the vertical resolution is to fix the length of the Y axis, using the lines detected by horizontal Sobel edge detection. The length of the Y axis, in pixels, is the difference between the indices of the last and the first horizontal lines detected. The second step is to compute the vertical resolution from this length. The vertical resolution obtained for the Hygrograph is 0.18762. Because the horizontal lines are straight, obtaining the vertical resolution and calculating the actual parameter value is simple arithmetic.

The first step in finding the horizontal resolution is to fix the length of the X axis, using the lines detected by vertical Sobel edge detection. The length of the X axis, in pixels, is the difference between the indices of the last and the first vertical lines detected. The vertical lines, however, are not straight but curved, so the exact time at which a particular parameter value is valid is obtained with MATLAB tools. The polynomials that best fit these curves are obtained after analysis of each type of chart. A fifth-degree polynomial gives the best fit, with minimum residual.
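A minimal sketch of the curve-fitting step, written in plain Python rather than with the MATLAB tools the authors used. It fits a least-squares polynomial of a chosen degree by solving the normal equations with Gaussian elimination. The names `polyfit`, `polyval` and `residual` are chosen here for illustration, and the sample points are synthetic. The sketch assumes the input coordinate has been rescaled to roughly [0, 1], because normal equations built from raw pixel indices are badly conditioned at degree five.

```python
def polyfit(xs, ys, degree):
    """Least-squares polynomial coefficients, lowest power first."""
    n = degree + 1
    # Normal equations A c = b: A[i][j] = sum x^(i+j), b[i] = sum y * x^i.
    A = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Forward elimination with partial pivoting.
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(A[r][k]))
        A[k], A[p] = A[p], A[k]
        b[k], b[p] = b[p], b[k]
        for r in range(k + 1, n):
            f = A[r][k] / A[k][k]
            for c in range(k, n):
                A[r][c] -= f * A[k][c]
            b[r] -= f * b[k]
    # Back substitution.
    coeffs = [0.0] * n
    for k in range(n - 1, -1, -1):
        s = sum(A[k][c] * coeffs[c] for c in range(k + 1, n))
        coeffs[k] = (b[k] - s) / A[k][k]
    return coeffs


def polyval(coeffs, x):
    """Evaluate the polynomial at x using Horner's rule."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * x + c
    return result


def residual(coeffs, xs, ys):
    """Sum of squared differences between the fit and the samples."""
    return sum((polyval(coeffs, x) - y) ** 2 for x, y in zip(xs, ys))


# Synthetic samples along one curved time line: t is the rescaled row
# position, offset is the horizontal displacement of the line at that row.
t = [i / 10 for i in range(11)]
offset = [2 + 3 * ti - ti * ti for ti in t]
fit5 = polyfit(t, offset, 5)
```

Comparing `residual` across degrees is one way to reproduce the kind of check that led the authors to a fifth-degree polynomial; the sketch needs more distinct sample points than coefficients.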

C. Image Filtering and Binary Masking

The techniques used for image filtering and denoising depend on the quality of the original charts, whose age ranges from 120 years to as recent as one day. The filtering algorithm is therefore chosen according to the quality of the input image and hence the noise present. The extent of noise is estimated from a histogram of the image after level-1 filtering, and each noise threshold is assigned to a particular algorithm. The denoising techniques use a suitable combination of median filtering and morphological methods such as cleaning. Various methods were explored and results were obtained for the worst case and the best case. Noise removal was 100% successful for the most recent charts and 90% successful in the worst case. After filtering, binary masks are applied to the image to remove unwanted pixels region by region. The binary mask used depends on the type of graph.
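The two operations named above, median filtering and binary masking, can be illustrated with a short sketch. The 3×3 window, the function names `median_filter3` and `apply_mask`, and the toy images are assumptions; the paper does not give the window size or the mask shapes.

```python
def median_filter3(binary):
    """3x3 median filter on a 0/1 image; border pixels are copied unchanged."""
    rows, cols = len(binary), len(binary[0])
    out = [row[:] for row in binary]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            window = [binary[r + dr][c + dc]
                      for dr in (-1, 0, 1) for dc in (-1, 0, 1)]
            out[r][c] = sorted(window)[4]  # middle of the 9 sorted values
    return out


def apply_mask(binary, mask):
    """Keep a pixel only where the mask is 1 (element-wise AND)."""
    return [[p & m for p, m in zip(prow, mrow)]
            for prow, mrow in zip(binary, mask)]


# An isolated noise pixel is removed by the median filter.
speck = [[0] * 5 for _ in range(5)]
speck[2][2] = 1
cleaned = median_filter3(speck)

# A mask that discards the right-hand column of a tiny image.
masked = apply_mask([[1, 1], [1, 0]], [[1, 0], [1, 0]])
```

A median filter of this size preserves a trace that is about three pixels thick but would erase a one-pixel line, which is one reason to denoise before thinning rather than after.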

D. Curve Thinning

The curve obtained after filtering and masking is subjected to thinning, which reduces its thickness to one pixel by removing adjacent pixels. Thinning is applied repeatedly until the result is stable.
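The paper does not say which thinning algorithm was used. As one possible illustration, the sketch below applies the well-known Zhang-Suen thinning scheme, repeating its two sub-iterations until no pixel changes. The function name `zhang_suen_thin` is hypothetical, and the image is assumed to have a border of background pixels.

```python
def zhang_suen_thin(binary):
    """Thin a 0/1 image with the Zhang-Suen scheme until it is stable."""
    img = [row[:] for row in binary]
    rows, cols = len(img), len(img[0])

    def neighbours(r, c):
        # P2..P9, clockwise, starting from the pixel directly above.
        return [img[r - 1][c], img[r - 1][c + 1], img[r][c + 1],
                img[r + 1][c + 1], img[r + 1][c], img[r + 1][c - 1],
                img[r][c - 1], img[r - 1][c - 1]]

    changed = True
    while changed:
        changed = False
        for step in (0, 1):
            to_clear = []
            for r in range(1, rows - 1):
                for c in range(1, cols - 1):
                    if img[r][c] != 1:
                        continue
                    p = neighbours(r, c)
                    b = sum(p)  # number of foreground neighbours
                    # Number of 0 -> 1 transitions around the pixel.
                    a = sum(1 for i in range(8)
                            if p[i] == 0 and p[(i + 1) % 8] == 1)
                    p2, p4, p6, p8 = p[0], p[2], p[4], p[6]
                    if step == 0:
                        cond = p2 * p4 * p6 == 0 and p4 * p6 * p8 == 0
                    else:
                        cond = p2 * p4 * p8 == 0 and p2 * p6 * p8 == 0
                    if 2 <= b <= 6 and a == 1 and cond:
                        to_clear.append((r, c))
            # Pixels are removed together at the end of each sub-iteration.
            for r, c in to_clear:
                img[r][c] = 0
            changed = changed or bool(to_clear)
    return img


# A horizontal band three pixels thick, surrounded by background.
band = [[1 if 2 <= r <= 4 and 1 <= c <= 8 else 0 for c in range(10)]
        for r in range(7)]
thin = zhang_suen_thin(band)
```

The loop ends only when a full pass removes nothing, which matches the "until the result is stable" criterion in the text.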

E. Obtaining Absolute and Relative Pixel Values

After filtering, only the plotted curve remains. The pixel coordinates of the curve are obtained and stored in a two-dimensional array. These absolute values are then combined with the pixel coordinates of the origin to give the relative pixel values, which are also stored in a two-dimensional array.

F. Conversion to Actual Climate Parameter Values (T, P and H)

The actual climate parameter values, i.e. Temperature, Pressure and Humidity, are calculated from the relative pixel values using the horizontal and vertical resolution and the starting value of the plot. A program written in C performs this conversion.
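Steps E and F can be sketched together. The authors used a C program; the Python below is only an illustration under stated assumptions: the mapping from pixels to values is taken to be linear (the curved time lines are ignored), `start_value` is taken to be the parameter value on the X axis, and all function names and numeric inputs other than the origin [3, 901] are hypothetical.

```python
def curve_pixels(binary):
    """Absolute (x, y) coordinates of every curve pixel, sorted by x."""
    return sorted((c, r)
                  for r, row in enumerate(binary)
                  for c, v in enumerate(row) if v)


def to_relative(pixels, origin):
    """Shift absolute coordinates so that the origin becomes (0, 0)."""
    ox, oy = origin
    # Image rows grow downward, so the height above the X axis is oy - y.
    return [(x - ox, oy - y) for x, y in pixels]


def to_values(relative, h_res, v_res, start_value):
    """Scale relative pixel offsets to (time, parameter value) pairs."""
    return [(dx * h_res, start_value + dy * v_res) for dx, dy in relative]


absolute = [(3, 901), (13, 801)]            # two sample curve pixels
relative = to_relative(absolute, origin=(3, 901))
# Hypothetical resolutions: 2.0 time units and 0.5 value units per pixel.
values = to_values(relative, h_res=2.0, v_res=0.5, start_value=20.0)
```

For a real chart, `h_res` and `v_res` would come from the resolution step in Section II.B.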

III. EXPERIMENTAL RESULTS AND DISCUSSION

We carried out the entire procedure on 100 charts of each of the three types, with chart ages ranging from ten years to as recent as one month. The charts were scanned, tagged and stored.

A. Location of Origin

The pixel coordinates of the origin were obtained and matched against the actual values. Graphs for which the values did not match were passed through a different algorithm until the exact values were obtained. Before the filtering algorithm was applied, the quality and quantity of noise were analyzed and denoising was applied accordingly. The absolute and relative pixel values were calculated and stored in a two-dimensional array. The actual climate parameter values were calculated from the horizontal and vertical resolution and the starting value of the parameter, and were also stored. When tested on all three types of graph, the algorithm gave the exact pixel coordinates of the origin for 85.8% of the charts on the first trial. For the remaining 14.2% of the graphs there was a difference of at most 15 pixels along either the horizontal or the vertical axis. This error was removed on the second trial, and the origin was fixed accurately for each graph before the absolute pixel values were evaluated.

B. Actual Climate Parameter Values (T, P and H)

The actual parameter values obtained from the extracted curve after steps B to F of Section II were compared with values read manually from the graphs. A simple exercise was performed to verify the accuracy of the values: the manually observed values were tabulated for each type of parameter and plotted in MS Excel, steps A to F of Section II were carried out on these plots, and the values were extracted. The extracted values and the tabulated values were found to match perfectly. The results are tabulated below.

TABLE 1. MEAN SQUARE ERROR

Type of Graph     Number of values   MSE
Thermograph       242                1.75
Hygrograph        230                2.31
Microbarograph    314                2.54
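The MSE reported in Table 1 is the mean of the squared differences between extracted and reference values. A minimal sketch, with a hypothetical function name and made-up sample numbers (not the paper's data):

```python
def mean_square_error(extracted, reference):
    """Mean of squared differences between two equal-length sequences."""
    if len(extracted) != len(reference) or not extracted:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum((e - r) ** 2
               for e, r in zip(extracted, reference)) / len(extracted)


# Made-up values for illustration only.
mse = mean_square_error([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```

The MSE carries the squared units of the parameter, so the three rows of Table 1 are not directly comparable with one another.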

IV. CONCLUSION

The procedure specified in this paper will be applied to the large number of charts that have been recorded over the last 80-90 years. Automatic extraction of values will reduce time and cost while yielding accurate data. Rigorous quality checks will be made before the data are made available for use. This will help preserve the country’s climate data, which can be of great value to meteorologists in particular and to the scientific community at large. Major user groups include consultants, businesses, legal and engineering firms, government and researchers. The legal community makes up nearly one-third of the total customer profile for earth-platform data, and the engineering community nearly 10% of the customer base. Climatic data are used by engineers in many decision-making settings, including design, construction, marketing and sales. Rescue and digitization of climate data is being carried out the world over to help preserve our climate history.

ACKNOWLEDGEMENT

This work is supported by the India Meteorological Department and SAMEER, Mumbai.

REFERENCES

[1] R. O. Duda and P. E. Hart, Use of the Hough transformation to detect lines and curves in pictures, Communications of the ACM, 15(1):11–15, 1972.

[2] Xiaonan Lu, James Z. Wang, Prasenjit Mitra, and C. Lee Giles, Automatic Extraction of Data from 2-D Plots in Documents, Proceedings of the Ninth International Conference on Document Analysis and Recognition, IEEE, pp. 2822-2828, 2007.

[3] O. R. Terrades and E. Valveny, Radon transform for lineal symbol representation, Proceedings of the International Conference on Document Analysis and Recognition, pp. 195–199, 2003.

[4] H. Freeman, Computer processing of line-drawing images, ACM Computing Surveys, 6(1):57–97, March 1974.

[5] L. S. Tan, S. Burton, R. Crouthamel, A. van Engelen, R. Hutchinson, L. Nicodemus, T. C. Peterson, and F. Rahimzadeh, Guidelines on Climate Data Rescue, World Meteorological Organization, 2004.

[6] Marc S. Plantico, Foreign Weather Data Servicing at NCDC, 1995.

[7] Rafael C. Gonzalez, Richard E. Woods, and Steven L. Eddins, Digital Image Processing using MATLAB, Pearson Education, 2004.

[8] Gregory A. Baxes, Digital Image Processing, John Wiley & Sons, Inc., 1994.

Figure 2. Thermograph, Case 1 results (graph life one month). [Panels: Actual Thermograph; Thermograph after filtering level 1; Thermograph after filtering and masking; Plot of Extracted Values. Axes: time in seconds, temperature in degrees Celsius.]

Figure 3. Microbarograph, Case 2 results (graph life eight years). [Panels: Actual Microbarograph; Microbarograph after filtering level 1; Microbarograph after filtering and masking; Plot of Extracted Values. Axes: time in seconds, pressure in millibars.]