
Rain Forecasting: Conditional Random Field and SVM Approach

Helin Wang
Mechanical Engineering Department
Carnegie Mellon University
Pittsburgh, PA 15213

[email protected]

Abstract

A machine learning approach to weather forecasting is faster and requires fewer features to produce a reasonably good prediction than forecasting based on physical simulation. It can serve as a confidence check for a simulation, or as a substitute when the resources to run one are unavailable. Because rainfall information is distributed two-dimensionally in geographic space, a conditional random field is a natural candidate for incorporating it; an SVM also seems capable of solving the problem. I empirically compare the two proposed models, and the results show that the SVM outperforms the CRF in our setting as the amount of data grows.

1 Introduction

Modern weather forecasts depend heavily on computer simulations built on physical differential equations and models. I try to develop a machine learning scheme that forecasts weather without explicitly forming and solving those equations, which is much faster computationally and needs far fewer features than running a simulation.

2 Problem definition

2.1 Goal

In this specific case, I want to predict whether it will rain on each of the next three days in the Pittsburgh area, given binary daily rain/no-rain historical data. The mapping is from an N-dimensional vector, containing the recent rain history of the Pittsburgh area and the current rain status of areas near Pittsburgh, to a three-dimensional vector whose elements indicate rain tomorrow, the day after tomorrow, and three days from now.
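This mapping can be illustrated with a small sketch (in Python rather than the original MATLAB; the helper name and the history length of five days are my own assumptions):

```python
# Build one (input, target) example from binary daily rain data.
# Hypothetical helper, not the author's code.

def make_example(pgh_history, neighbors_today, future):
    """pgh_history: last N days of rain (0/1) at Pittsburgh;
    neighbors_today: rain today (0/1) at nearby stations;
    future: rain (0/1) on each of the next three days (the target)."""
    x = list(pgh_history) + list(neighbors_today)  # (N + k)-dimensional input
    y = list(future)                               # 3-dimensional target
    return x, y

# Five days of Pittsburgh history plus three neighbor stations:
x, y = make_example([1, 0, 0, 1, 1], [1, 0, 1], [0, 1, 1])
print(len(x), y)  # 8 [0, 1, 1]
```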

2.2 Data

I collected hourly data from thirteen weather stations in the Pennsylvania area from 2004-11-15 to 2011-11-14. Each record has fifteen dimensions, but I use only one feature: precipitation in inches. Furthermore, I discretized it into a binary indicator of whether rain occurred on a given day. The locations of the stations are shown in Figure 1. The station I forecast for is the one at Pittsburgh, and I use three kinds of station feature sets: first, only the Pittsburgh station; second, the Pittsburgh station plus the three surrounding stations; last, every station shown in the figure.
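The discretization step can be sketched as follows (a Python sketch; treating any nonzero daily total as rain is my assumption, since the paper does not state a threshold):

```python
# Discretize hourly precipitation readings into one rain/no-rain flag per day.

def daily_rain_flags(hourly_precip, hours_per_day=24):
    """hourly_precip: flat list of hourly precipitation readings in inches.
    Returns one 0/1 flag per day: 1 if any rain fell that day."""
    days = [hourly_precip[i:i + hours_per_day]
            for i in range(0, len(hourly_precip), hours_per_day)]
    return [1 if sum(day) > 0.0 else 0 for day in days]

# Two days of data: rain only in hour 3 of the first day.
readings = [0.0] * 48
readings[3] = 0.12
print(daily_rain_flags(readings))  # [1, 0]
```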


Figure 1: Distribution of different stations.

3 Proposed method

I use a Conditional Random Field and a Support Vector Machine separately and compare the results of the two methods. The CRF is chosen because it can easily encode relational information; the SVM because it is a very effective classifier.

3.1 Conditional Random Field

Figure 2: Designed graph structure.

Figure 2 shows the designed graph structure, in which the Y nodes are random variables and the X nodes are features. Each Y is the rain/no-rain prediction for tomorrow, the day after tomorrow, or three days from today; the X nodes encode whether each station is raining now, using the three station sets mentioned before. This model represents P(Y|X) directly; compared with an MRF, which models the joint probability, it spends more of its modeling capacity on the quantity we care about.

Since the X nodes are always observed, the graph reduces to a chain, which makes decoding, inference, and estimation easier. I use UGM, a CRF package for MATLAB, with Viterbi decoding for the most likely sequence and forward/backward for marginal inference. The potentials are log-linear, and the weights w are estimated by a gradient method on the negative log-likelihood. Given the graph structure, the model estimates w from all historical data and predicts the future by decoding and inference.
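As a rough illustration of the decoding step, here is a minimal Viterbi decoder for a three-day chain (a Python sketch; UGM's actual MATLAB interface differs, and the potential values below are made up):

```python
import math

def viterbi(node_logpot, edge_logpot):
    """node_logpot[t][s]: log node potential of state s on day t
    (already conditioned on the observed features X);
    edge_logpot[p][s]: log transition potential between consecutive days.
    Returns the most likely state sequence."""
    T, S = len(node_logpot), len(node_logpot[0])
    score = list(node_logpot[0])
    back = [[0] * S for _ in range(T)]
    for t in range(1, T):
        new = []
        for s in range(S):
            best_prev = max(range(S), key=lambda p: score[p] + edge_logpot[p][s])
            back[t][s] = best_prev
            new.append(score[best_prev] + edge_logpot[best_prev][s] + node_logpot[t][s])
        score = new
    # Backtrack from the best final state.
    path = [max(range(S), key=lambda s: score[s])]
    for t in range(T - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]

# States: 0 = no rain, 1 = rain; three days, made-up potentials.
node = [[math.log(p) for p in row] for row in [[0.2, 0.8], [0.6, 0.4], [0.5, 0.5]]]
edge = [[math.log(p) for p in row] for row in [[0.7, 0.3], [0.3, 0.7]]]
print(viterbi(node, edge))  # [1, 1, 1]
```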

3.2 SVM

I use the station to be predicted as one node of a pair and another station as the other node. For each pair, I train three SVMs that predict the three separate days ahead, and finally use a voting method to decide whether a future day will rain. With this voting scheme, adding new stations is easy; furthermore, each per-pair SVM can be treated as a weak learner and boosted with AdaBoost.
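The voting rule itself can be sketched as a simple majority vote over the per-pair predictions (the predictions below are hypothetical; the real ones would come from the trained per-pair SVMs):

```python
# Majority vote over the binary predictions of each station pair's SVM.

def majority_vote(pair_predictions):
    """pair_predictions: one 0/1 'will it rain?' prediction per station pair.
    Returns 1 if a strict majority of pairs predict rain."""
    votes = sum(pair_predictions)
    return 1 if votes * 2 > len(pair_predictions) else 0

# Three station pairs, each with its own SVM prediction for tomorrow:
print(majority_vote([1, 0, 1]))  # 1 (two of three pairs predict rain)
print(majority_vote([0, 0, 1]))  # 0
```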

4 Experiments

4.1 Questions I have

In the CRF, if I want a single-day prediction (not a sequence), is Viterbi decoding or forward/backward inference better? Given a fixed amount of data, what fraction used for training leads to the lowest empirical error? What is the relation among the number of stations used as features, the number of training examples, and accuracy?

4.2 Experiments I performed

I use two total amounts of data: 100 and 500 data points. For each amount, I vary the training fraction from 1 percent to 91 percent, with the rest used for testing. For each case above, I vary the station features from one station to thirteen stations.
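The split sweep can be sketched as follows (a Python sketch; the 10-percentage-point step is my assumption, since the paper states only the 1–91 percent range):

```python
# For each training fraction, compute (n_train, n_test) split sizes.

def split_sizes(n_total, fractions):
    """Returns one (n_train, n_test) pair per training fraction,
    keeping at least one training example."""
    return [(max(1, int(n_total * f)), n_total - max(1, int(n_total * f)))
            for f in fractions]

fractions = [f / 100 for f in range(1, 92, 10)]  # 1%, 11%, ..., 91%
print(split_sizes(100, fractions)[:3])  # [(1, 99), (11, 89), (21, 79)]
```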

4.3 Experiments detail

Figures 3 and 4 show the CRF results. There are seven lines in each graph: one for Viterbi full-sequence decoding accuracy, for which random-guess accuracy is 12.5 percent (one in 2^3 for three binary labels), and six for Viterbi decoding and marginal inference on each of the three predicted days. Figures 5 and 6 show the SVM results, with three lines each; intuitively, tomorrow has the highest accuracy and the day after tomorrow the second highest.

4.4 Observations

Forward/backward inference normally gives better accuracy for single-day prediction, but the gap shrinks as the amount of training data grows. Using about 40 percent of the data for training and the rest for testing seems to give reasonably good accuracy. In the CRF case, adding feature stations with the same amount of data lowers accuracy, which is reasonable because the model complexity increases. In the SVM voting case, adding feature stations with the same amount of data barely changes accuracy; since the voting rule is similar to AdaBoost, this seems reasonable. For both the CRF and the SVM, switching from 100 to 500 data points lowers accuracy. Overall, and especially in the 500-data-point case, the SVM outperforms the CRF. This may be because the CRF graph does not encode directional information, while the trained SVMs do.
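To illustrate the single-day comparison, here is a minimal forward/backward computation of per-day marginals for a toy three-day chain (a Python sketch with made-up potentials; for a single day one predicts the state with the highest marginal probability, which in general can differ from that day's state on the Viterbi path):

```python
def marginals(node_pot, edge_pot):
    """node_pot[t][s], edge_pot[s][n]: (unnormalized) potentials.
    Returns one marginal distribution P(Y_t = s | X) per day."""
    T, S = len(node_pot), len(node_pot[0])
    # Forward messages.
    fwd = [list(node_pot[0])]
    for t in range(1, T):
        fwd.append([node_pot[t][s] * sum(fwd[-1][p] * edge_pot[p][s] for p in range(S))
                    for s in range(S)])
    # Backward messages.
    bwd = [[1.0] * S]
    for t in range(T - 2, -1, -1):
        bwd.insert(0, [sum(edge_pot[s][n] * node_pot[t + 1][n] * bwd[0][n]
                           for n in range(S)) for s in range(S)])
    # Normalize the product of messages at each day.
    out = []
    for t in range(T):
        unnorm = [fwd[t][s] * bwd[t][s] for s in range(S)]
        z = sum(unnorm)
        out.append([u / z for u in unnorm])
    return out

# States: 0 = no rain, 1 = rain; same made-up potential shapes as before.
node = [[0.2, 0.8], [0.6, 0.4], [0.5, 0.5]]
edge = [[0.7, 0.3], [0.3, 0.7]]
for day, m in enumerate(marginals(node, edge)):
    print(day, [round(p, 3) for p in m])
```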


4.5 Conclusions

Overall, both methods do a fair job considering that I use only the binary rain data. That the SVM outperforms the CRF actually surprised me, because I thought the CRF could encode more relational information; one reason may be that my graph design did not let the CRF use its full power. The SVM is simple to use, needing only one line of MATLAB code, while the UGM package I use for the CRF takes much more time to learn.

4.6 Future work

Adding monthly temperature to the two models is expected to raise accuracy further. Since the data arrive in real time, I am also interested in building a website that provides a machine learning weather forecasting service.




Figure 3: CRF result on 1 station, 4 stations, 13 stations, 100 data points


Figure 4: CRF result on 1 station, 4 stations, 13 stations, 500 data points


Figure 5: SVM result on 1 station, 4 stations, 13 stations, 100 data points


Figure 6: SVM result on 1 station, 4 stations, 13 stations, 500 data points
