Rescue and digitization of climate data by extraction from autographic weather charts
Anita S. Diwakar, Sr. Lecturer, Computer Engineering Department
D. K. Nayak, Principal
V.P.M’s Polytechnic, Thane (W), India
Prof. Poornima Talwai, Ramrao Adik Institute of Technology, Navi Mumbai, India
Abstract: Data rescue is the ongoing process of preserving all data at risk of being lost due to deterioration of the medium, and of digitizing current and past data into computer-compatible form for easy access. The Indian Meteorological Department (IMD) has conscientiously taken observations of the weather, recorded them either manually or automatically, and transcribed them onto preset paper forms called autographic charts. Most of these charts are now at risk of being lost due to rapid deterioration of the medium. This paper presents algorithms for the automatic extraction and storage of atmospheric data using image processing tools. The key factors that influence the methods are speed, accuracy and space. We have tested the algorithms on scanned images of the autographic charts available at IMD and present the results.
Key-Words: - Image processing, Image edge analysis, Curve fitting
I. INTRODUCTION
The Indian Meteorological Department (IMD) currently holds stacks of historical weather charts that record barometric pressure, temperature and humidity. Analyzing and compiling data from these charts is time consuming: the user must read the data from each chart and manually enter it into a computer database or spreadsheet one data point at a time, a process that is costly and prone to human error. SAMEER has developed the Surface Weather Chart Digitizer (SWCD) software to demonstrate the extraction of data values from IMD’s charts. The software uses image processing tools for each step, from reading the scanned image file to storing the data in a two-dimensional array. The major factors are time, accuracy and space, as the number of charts to be digitized is huge and digitization is an ongoing process. The climatic parameters that are continuously monitored are rainfall, temperature, pressure, relative humidity, wind speed and sunshine duration. There are nearly 15000 charts of each type to be digitized. The rescued data, combined with already available data, will enable better assessments of climate projections, which can serve as input for policy makers to mitigate losses due to natural disasters, and will provide more information for economic development.
Automatic data extraction from these graphs is challenging: the charts are of different types, were imaged under varying conditions, are old and often in poor condition, and carry noise of a diverse nature. Xiaonan Lu et al. [2] have suggested a method for the extraction of data from 2-D plots. This paper presents the results obtained by following those guidelines, together with a quantitative report of the analysis carried out.
II. PROCEDURE
Figure 1 illustrates the procedure followed to obtain the actual values of the climate parameters temperature, pressure and humidity from the autographic charts called the thermograph, microbarograph and hygrograph, respectively. The values obtained are compared with the actual values and the mean square error (MSE) is calculated.
Figure 1. Procedure for data extraction (flowchart steps): Image Input; Image Conversion; Detection of Axes; Location of origin; Finding Horizontal and Vertical Resolution; Filtering/Noise removal and binary masking; Obtaining absolute pixel values; Obtaining relative pixel values; Conversion to actual climate parameter values: T, P, H.
2009 IACSIT Spring Conference
978-0-7695-3653-8/09 $25.00 © 2009 IEEE
DOI 10.1109/IACSIT-SC.2009.27
186
2009 International Association of Computer Science and Information Technology - Spring Conference
A. Image Conversion, Axis Detection and Location of Origin
The autographic charts are scanned and stored in JPEG format. Each image is converted to logical (binary) form. The Sobel edge detector is used to obtain the horizontal and vertical lines in the graphs. To identify the axes among these lines, column-wise array addition is used and a threshold on the number of nonzero pixels is chosen. Of the edge detectors considered (Sobel, Prewitt, Canny, etc.), Sobel was found to be the most effective for this application. It computes the gradient magnitude at the centre point of a 3×3 neighborhood as follows.
$$g = \left[G_x^2 + G_y^2\right]^{1/2} \tag{1}$$
$$g = \left\{\left[(z_7 + 2z_8 + z_9) - (z_1 + 2z_2 + z_3)\right]^2 + \left[(z_3 + 2z_6 + z_9) - (z_1 + 2z_4 + z_7)\right]^2\right\}^{1/2} \tag{2}$$
where $G_x$ and $G_y$ are the first derivatives estimated digitally. A pixel at location (x, y) is an edge pixel if g ≥ T at that location, where T is a specified threshold.
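As a minimal sketch of Eqs. (1) and (2), the following Python code evaluates the Sobel gradient magnitude at one pixel and thresholds it into an edge map. It assumes that z1 to z9 are the pixels of the 3×3 neighborhood numbered row by row, and that the image is a plain 2-D list; the function names `sobel_magnitude` and `edge_map` are illustrative and are not taken from the paper, which used MATLAB.

```python
import math


def sobel_magnitude(img, r, c):
    """Gradient magnitude g at interior pixel (r, c), following Eqs. (1)-(2)."""
    # Assumed convention: z1..z9 are the 3x3 neighborhood, numbered row by row.
    z1, z2, z3 = img[r - 1][c - 1], img[r - 1][c], img[r - 1][c + 1]
    z4, z6 = img[r][c - 1], img[r][c + 1]
    z7, z8, z9 = img[r + 1][c - 1], img[r + 1][c], img[r + 1][c + 1]
    gx = (z7 + 2 * z8 + z9) - (z1 + 2 * z2 + z3)  # derivative across rows
    gy = (z3 + 2 * z6 + z9) - (z1 + 2 * z4 + z7)  # derivative across columns
    return math.sqrt(gx * gx + gy * gy)


def edge_map(img, threshold):
    """Mark interior pixels with g >= threshold as 1; border pixels stay 0."""
    rows, cols = len(img), len(img[0])
    edges = [[0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            if sobel_magnitude(img, r, c) >= threshold:
                edges[r][c] = 1
    return edges
```

A vertical intensity step produces a nonzero $G_y$ and a zero $G_x$, so only the pixels next to the step exceed the threshold.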
To identify the Y axis among the detected lines, column-wise addition of the array is carried out and a threshold on the number of pixels with value 1 is chosen. The index of the column whose sum meets the threshold is taken as the location of the Y axis. The equation of the axis is
X = Index (Threshold). For example, the Y axis for the Hygrograph shown is x = 3.
Similarly, to identify the X axis, the transpose of the array is taken and column-wise addition is carried out. Again a threshold on the number of 1’s is chosen, and the index corresponding to that value is taken as the location of the X axis. The equation of the axis is
Y = Index (Threshold). For example, the X axis for the Hygrograph shown is y = 901.
This procedure is carried out using the following MATLAB functions:
edge (Sobel): detection of the horizontal and vertical lines in the graph
sum: addition of the columns to obtain the values compared with the threshold
Indices: location of the X and Y axes from the detected lines
After the X and Y axes have been detected, the origin is obtained as their point of intersection: [x, y] = [3, 901].
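The axis search described above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' MATLAB code: the edge image is a 2-D list of 0/1 values, the Y axis is taken to be the leftmost column whose sum reaches the threshold, and the X axis the lowest such row. The paper does not say which qualifying index is chosen, so that rule and all function names are hypothetical.

```python
def column_sums(binary):
    """Number of 1-pixels in each column of a 2-D list of 0/1 values."""
    return [sum(col) for col in zip(*binary)]


def row_sums(binary):
    """Number of 1-pixels in each row (column sums of the transposed array)."""
    return [sum(row) for row in binary]


def indices_at_or_above(sums, threshold):
    """Indices whose sum reaches the threshold, in increasing order."""
    return [i for i, s in enumerate(sums) if s >= threshold]


def locate_origin(binary, col_threshold, row_threshold):
    """Return (x, y) of the axis intersection, or None if no axis is found."""
    cols = indices_at_or_above(column_sums(binary), col_threshold)
    rows = indices_at_or_above(row_sums(binary), row_threshold)
    if not cols or not rows:
        return None
    # Assumption: Y axis = leftmost long vertical line,
    # X axis = lowest long horizontal line (largest row index).
    return cols[0], rows[-1]
```

With this convention the origin is reported as (column, row), matching the [x, y] = [3, 901] form used in the text.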
B. Finding Horizontal and Vertical Resolution
The resolution is calculated based on the type of chart. The first step in finding the vertical resolution is to fix the length of the Y axis, using the lines detected by horizontal Sobel edge detection. The length of the Y axis, in pixels, equals the difference between the indices of the last and the first horizontal lines detected. The second step is to compute the vertical resolution from this length; the value obtained for the Hygrograph is 0.18762. Because the horizontal axes are straight lines, obtaining the vertical resolution and calculating the actual parameter value is simple arithmetic.
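The paper does not state the formula for the resolution. A plausible reading, sketched below, is that the resolution is the parameter range spanned by the axis divided by the axis length in pixels. The function names and the sample numbers are assumptions chosen for illustration; they are not measurements from the charts.

```python
def axis_length_px(line_indices):
    """Axis length in pixels: index of the last detected line minus the first."""
    return max(line_indices) - min(line_indices)


def resolution(value_span, length_px):
    """Parameter units represented by one pixel (assumed definition)."""
    if length_px <= 0:
        raise ValueError("axis length must be positive")
    return value_span / length_px


# Hypothetical example: horizontal grid lines detected at these row indices,
# with the chart assumed to span 100 units of relative humidity.
rows_of_lines = [368, 500, 901]
length = axis_length_px(rows_of_lines)       # 533 pixels
print(round(resolution(100.0, length), 5))   # 0.18762
```

A 100-unit span over 533 pixels happens to give a figure close to the one reported for the Hygrograph, but whether the authors' chart had these dimensions is not stated.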
The first step in finding the horizontal resolution is to fix the length of the X axis, using the lines detected by vertical Sobel edge detection. The length of the X axis, in pixels, equals the difference between the indices of the last and the first vertical lines detected. The vertical axes, however, are not straight lines but curves, so the exact time at which a particular parameter value is valid is obtained using MATLAB tools. The polynomial that best fits these curves is obtained by analysing each type of chart; a fifth-degree polynomial gives the best fit, with minimum residual.
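To illustrate the curve-fitting step, here is a small least-squares polynomial fit written in plain Python. It is a sketch, not the MATLAB routine the authors used: it solves the normal equations by Gaussian elimination, which is adequate only when the coordinates are first scaled to a small range such as [-1, 1], because raw pixel coordinates raised to the tenth power make the system badly conditioned. All names are illustrative.

```python
import math


def polyfit(xs, ys, degree):
    """Least-squares coefficients (lowest power first) via the normal equations."""
    n = degree + 1
    powers = [sum(x ** k for x in xs) for k in range(2 * degree + 1)]
    # Augmented matrix [A | b], A[i][j] = sum x^(i+j), b[i] = sum y * x^i.
    a = [[powers[i + j] for j in range(n)]
         + [sum(y * x ** i for x, y in zip(xs, ys))] for i in range(n)]
    for col in range(n):  # forward elimination with partial pivoting
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for k in range(col, n + 1):
                a[r][k] -= f * a[col][k]
    coeffs = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(a[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (a[i][n] - s) / a[i][i]
    return coeffs


def polyval(coeffs, x):
    """Evaluate the polynomial at x using Horner's rule."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * x + c
    return result


def residual_norm(coeffs, xs, ys):
    """Euclidean norm of the fit residuals."""
    return math.sqrt(sum((polyval(coeffs, x) - y) ** 2 for x, y in zip(xs, ys)))
```

Comparing `residual_norm` for several degrees on the pixels of one curved time line is one way to reproduce the kind of comparison that led to the fifth-degree choice.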
C. Image Filtering and Binary Masking
The techniques used for image filtering and denoising depend on the quality of the original charts, as the age of the charts ranges from 120 years to as recent as one day. The filtering algorithm is chosen according to the quality of the input image and hence the noise present. The extent of noise is calculated from a histogram of the image filtered at level 1, and a particular threshold is assigned to each algorithm. The denoising uses a suitable combination of median filtering and morphological methods such as cleaning. Various methods were explored and results were obtained for the worst case and the best case: noise removal from the latest charts was 100% successful, whereas for the worst case it was 90%. After filtering, binary masks are applied to the image to remove unwanted pixels by region. The binary mask to be used depends on the type of graph.
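The two operations named above, median filtering and binary masking, can be sketched for a 0/1 image as follows. The 3×3 window size and the function names are assumptions; the paper does not give the window size or the exact masks used.

```python
def median_filter_3x3(img):
    """3x3 median filter on a 2-D list; border pixels are copied unchanged."""
    rows, cols = len(img), len(img[0])
    out = [row[:] for row in img]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            window = sorted(img[rr][cc]
                            for rr in (r - 1, r, r + 1)
                            for cc in (c - 1, c, c + 1))
            out[r][c] = window[4]  # middle of the 9 sorted values
    return out


def apply_mask(img, mask):
    """Keep pixels where the mask is 1 and clear them where it is 0."""
    return [[p if m else 0 for p, m in zip(prow, mrow)]
            for prow, mrow in zip(img, mask)]
```

A 3×3 median removes isolated specks but also erases strokes that are only one pixel wide, so in a pipeline like this one it would run before thinning, not after.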
D. Curve Thinning
The curve obtained after filtering and masking is subjected to thinning, which reduces the thickness of the curve to one pixel by removing adjacent pixels. Thinning is applied until the result is stable.
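The paper does not name the thinning algorithm. As one standard example of iterative thinning that runs until no pixel changes, the sketch below implements the Zhang-Suen scheme on a 0/1 image stored as a 2-D list. It is a stand-in for illustration, not necessarily the method the authors used.

```python
def zhang_suen_thin(img):
    """Thin a 0/1 image until stable; pixels on the image border are left as-is."""
    img = [row[:] for row in img]
    rows, cols = len(img), len(img[0])

    def neighbours(r, c):
        # P2..P9: clockwise, starting from the pixel directly above.
        return [img[r - 1][c], img[r - 1][c + 1], img[r][c + 1], img[r + 1][c + 1],
                img[r + 1][c], img[r + 1][c - 1], img[r][c - 1], img[r - 1][c - 1]]

    changed = True
    while changed:
        changed = False
        for step in (0, 1):
            to_clear = []
            for r in range(1, rows - 1):
                for c in range(1, cols - 1):
                    if img[r][c] != 1:
                        continue
                    p = neighbours(r, c)
                    b = sum(p)  # number of foreground neighbours
                    # a = number of 0 -> 1 transitions around the pixel
                    a = sum(1 for i in range(8) if p[i] == 0 and p[(i + 1) % 8] == 1)
                    p2, p4, p6, p8 = p[0], p[2], p[4], p[6]
                    if step == 0:
                        ok = p2 * p4 * p6 == 0 and p4 * p6 * p8 == 0
                    else:
                        ok = p2 * p4 * p8 == 0 and p2 * p6 * p8 == 0
                    if 2 <= b <= 6 and a == 1 and ok:
                        to_clear.append((r, c))
            for r, c in to_clear:  # delete in parallel after scanning
                img[r][c] = 0
            changed = changed or bool(to_clear)
    return img
```

Pixels are collected first and cleared only after each scan, so the outcome does not depend on scan order.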
E. Obtaining absolute and relative pixel values
After filtering, only the plotted curve remains. The pixel positions of the curve are obtained and stored in a two-dimensional array. These absolute values are then combined with the pixel position of the origin to give the relative pixel values, which are also stored in a two-dimensional array.
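A minimal sketch of this step, assuming the curve is a 0/1 image stored as a 2-D list and the origin is given as (column, row) as in the earlier [3, 901] example. Image rows are assumed to increase downward, so the height above the X axis is the origin row minus the pixel row; the function names are illustrative.

```python
def curve_pixels(binary):
    """Absolute (x, y) = (column, row) positions of all 1-pixels, sorted by x."""
    return sorted((c, r)
                  for r, row in enumerate(binary)
                  for c, v in enumerate(row) if v)


def to_relative(points, origin):
    """Shift absolute pixel positions so that the origin becomes (0, 0)."""
    x0, y0 = origin
    # Rows grow downward in image coordinates, hence y0 - y for the height.
    return [(x - x0, y0 - y) for x, y in points]
```

For instance, with the origin at (3, 901), a curve pixel at column 13 and row 801 has the relative position (10, 100).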
F. Conversion to actual climate parameter values (T, P, and H)
The actual climate parameter values, i.e. temperature, pressure and humidity, are calculated from the relative pixel values using the horizontal and vertical resolution and the starting value of the plot. A program in C is used for this conversion.
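The conversion can be sketched as a linear scaling of the relative pixel values. The authors used a C program whose details are not given; the Python version below assumes that the starting value is the parameter value at the origin and that time also scales linearly, ignoring the curved time lines discussed earlier. The function name and parameters are hypothetical.

```python
def to_physical(relative_points, h_res, v_res, start_time, start_value):
    """Map relative pixel offsets (dx, dy) to (time, parameter value) pairs.

    h_res and v_res are the time and parameter units per pixel; start_time and
    start_value are assumed to be the values at the chart origin.
    """
    return [(start_time + dx * h_res, start_value + dy * v_res)
            for dx, dy in relative_points]


# Hypothetical numbers: 60 s per pixel, 0.5 units per pixel, origin at 20 units.
print(to_physical([(0, 0), (10, 100)], 60.0, 0.5, 0.0, 20.0))
# [(0.0, 20.0), (600.0, 70.0)]
```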
III. EXPERIMENTAL RESULTS AND DISCUSSION
We carried out the entire procedure on 100 charts of each of the three types, with the age of the charts ranging from ten years to as recent as one month. The charts were scanned, tagged and stored.
A. Location of origin
The pixel values of the origin were obtained and matched with the actual values. Graphs for which the values did not match were passed through a different algorithm until the exact values were obtained. Before the filtering algorithm was applied, the quality and quantity of noise were analyzed and denoising was applied accordingly. The absolute and relative pixel values were calculated and stored in a two-dimensional array. The actual climate parameter values were calculated from the horizontal and vertical resolution and the starting value of the parameter, and were also stored. When tested on all three types of graphs, the algorithm gave the precise pixel values of the origin with 100% accuracy for 85.8% of the charts on the first trial. For the remaining 14.2% of the graphs there was a difference of at most 15 pixels along either the horizontal or the vertical axis. This was removed on the second trial, and the origin was fixed accurately for each graph before the absolute pixel values were evaluated.
B. Actual climate parameter values (T, P, and H)
The actual parameter values obtained from the extracted curve after steps B to F of Section II were checked against the values read physically from the graphs. A simple exercise was performed to verify the accuracy of the values: the physically observed values were manually tabulated for each type of parameter and plotted in MS Excel, steps A to F of Section II were carried out on these plots, and the values were extracted. The extracted values matched the tabulated values perfectly. The results are tabulated below.
TABLE 1. MEAN SQUARE ERROR
Type of Graph     Number of values   MSE
Thermograph       242                1.75
Hygrograph        230                2.31
Microbarograph    314                2.54
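For reference, the mean square error between extracted and actual values is the average of the squared differences. A short sketch follows; the function name is illustrative and the sample values are made up, not taken from the charts.

```python
def mean_square_error(extracted, actual):
    """Average of squared differences between two equally long sequences."""
    if len(extracted) != len(actual) or not extracted:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum((e - a) ** 2 for e, a in zip(extracted, actual)) / len(extracted)


# Made-up example: two readings that are off by 1 and 2 units.
print(mean_square_error([20.0, 21.0], [21.0, 19.0]))  # 2.5
```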
IV. CONCLUSION
The procedure specified in this paper will be applied to the huge number of charts that have been recorded over the last 80-90 years. Automatic extraction of values will reduce time and cost while giving accurate data. Rigorous quality checks and assurances will be made before the data is made available for use. This will help preserve the country’s climate data, which can be of great value to meteorologists in particular and the scientific community at large. Major user groups include consultants, businesses, legal and engineering firms, government and researchers. The legal community makes up nearly one-third of the total customer profile for earth-platform data, and the engineering community makes up nearly 10% of the customer base. Climatic data are used by engineers in many decision-making settings, including design, construction, marketing and sales. Rescue and digitization of climate data is being done the world over to help preserve our climate history.
ACKNOWLEDGEMENT
This work is supported by the Indian Meteorological Department and SAMEER, Mumbai.
REFERENCES
[1] R. O. Duda and P. E. Hart. Use of the Hough transformation to detect lines and curves in pictures. Communications of ACM, 15(1):11–15, 1972.
[2] Xiaonan Lu, James Z. Wang, Prasenjit Mitra, and C. Lee Giles, “Automatic Extraction of Data from 2-D Plots in Documents,” Proceedings of Ninth International Conference on Document Analysis and Recognition, IEEE, Location, pp. 2822-2828, 2007.
[3] O. R. Terrades and E. Valveny. Radon transform for lineal symbol representation. In Proceedings of the International Conference on Document Analysis and Recognition, pages 195–199, 2003.
[4] H. Freeman. Computer processing of line-drawing images. ACM Computing Surveys, 6(1):57–97, March 1974.
[5] L. S. Tan, S. Burton, R. Crouthamel, A. van Engelen, R. Hutchinson, L. Nicodemus, T. C. Peterson, F. Rahimzadeh, Guidelines on Climate Data Rescue, World Meteorological Organization, 2004.
[6] Marc S. Plantico, Foreign Weather Data Servicing at NCDC, 1995.
[7] Rafael C. Gonzalez, Richard E. Woods, Steven L. Eddins, Digital Image Processing using MATLAB, Pearson Education, 2004.
[8] Gregory A. Baxes, Digital Image Processing, John Wiley & Sons, Inc., 1994.
Figure 2. Thermograph-Case 1 Results (Graph life one month). Panels: Actual Thermograph; Thermograph after filtering level 1; Thermograph after filtering and masking; Plot of Extracted Values. Axes: Time in Second, Temperature in degree Celsius.
Figure 3. Microbarograph-Case 2 Results (Graph life eight years). Panels: Actual Microbarograph; Microbarograph after filtering level 1; Microbarograph after filtering and masking; Plot of Extracted Values. Axes: Time in Second, Pressure in millibars.