Hydrologic regionalization

8/6/2019 Hydrologic regionalization


Hydrologic Regionalization With Clustering

By

Nirdesh Kumar-06004008


Introduction

Regionalization

Areas with homogeneous hydrologic response.

Applications-hydrologic design, planning and management of water resources systems, regional trend analysis, and frequency analysis of floods, low flows and other variables.

Attributes-factors influencing hydrology in the area.

Physiographic-drainage area, slope of the main channel in the drainage basin, soil runoff coefficient and storage.

Location-latitude, longitude and elevation.

Meteorological-specific humidity, temperature, wind velocity, wind direction and rainfall.

Feature vectors-on the basis of the attributes, sites are selected and each site is represented by a feature vector.

Cluster-region containing feature vectors with similar hydrologic response.

Optimum number of clusters obtained by application of cluster validity indices.



Tests applied to check the homogeneity of the region-Regional homogeneity test.

Regions adjusted to improve homogeneity.

Selection of variables influencing the hydrology in a region as attributes

Preparation of feature vectors using selected variables

Formation of clusters by applying clustering algorithm

Identification of optimum number of clusters

Validation of regions to test their homogeneity

Adjustment of heterogeneous regions



Clustering Techniques

Clustering-variety of multivariate statistical procedures that are used to investigate, interpret and classify given data into similar groups or clusters, which may or may not be overlapping.

The data points within a cluster should be as similar as possible and the data points of different clusters should be as dissimilar as possible.

Various algorithms are used for clustering-K-means algorithm, single linkage, complete linkage and Ward's algorithm.



Clustering Algorithms

K-Means Algorithm

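N feature vectors in n-dimensional attribute space.

$y_{ij}$ is the value of attribute j in the ith feature vector.

Each feature vector represents one of the N sites in the study region.

Rescaling-process necessary to nullify the effects of differences in the variance and relative magnitudes of the attributes.

$$x_{ij} = \frac{y_{ij} - \bar{y}_j}{\sigma_j}$$

$x_{ij}$ denotes the rescaled value of $y_{ij}$.

$\sigma_j$ represents the standard deviation of attribute j.

$\bar{y}_j$ is the mean value of attribute j over all N feature vectors.

The K-means algorithm minimizes the objective function

$$F = \sum_{k=1}^{K} \sum_{j=1}^{n} \sum_{i=1}^{N_k} \left( x_{ij}^{k} - x_{\cdot j}^{k} \right)^2$$

K-number of clusters.

$N_k$-number of feature vectors in cluster k.

$x_{ij}^{k}$-rescaled value of attribute j in the feature vector i assigned to cluster k.

$x_{\cdot j}^{k}$-mean value of attribute j for cluster k, computed as

$$x_{\cdot j}^{k} = \frac{1}{N_k} \sum_{i=1}^{N_k} x_{ij}^{k}$$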
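As a minimal sketch of the rescaling and of the objective function F, the two functions below use plain Python lists. The function names and the sample attribute values are illustrative assumptions, and the population standard deviation (division by N) is one possible choice; dividing by N - 1 is equally common.

```python
import math

def rescale(Y):
    """Rescale every attribute (column) of Y to zero mean and unit standard deviation.

    Assumes each attribute actually varies across sites (non-zero standard deviation).
    """
    N, n = len(Y), len(Y[0])
    means = [sum(row[j] for row in Y) / N for j in range(n)]
    # Population standard deviation of each attribute (assumption: divide by N).
    sds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in Y) / N) for j in range(n)]
    return [[(row[j] - means[j]) / sds[j] for j in range(n)] for row in Y]

def objective_F(X, labels, K):
    """Sum of squared deviations of rescaled feature vectors from their cluster means."""
    n = len(X[0])
    F = 0.0
    for k in range(K):
        members = [x for x, lab in zip(X, labels) if lab == k]
        if not members:
            continue  # an empty cluster contributes nothing
        centroid = [sum(x[j] for x in members) / len(members) for j in range(n)]
        F += sum((x[j] - centroid[j]) ** 2 for x in members for j in range(n))
    return F

# Hypothetical sites described by drainage area and main-channel slope.
Y = [[100.0, 0.01], [200.0, 0.02], [300.0, 0.03], [400.0, 0.04]]
X = rescale(Y)
print(objective_F(X, [0, 0, 1, 1], K=2))
```

Without rescaling, the drainage area (hundreds) would dominate the slope (hundredths) in every distance computation; after rescaling both attributes contribute on the same footing.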


By minimizing F, the distance of each feature vector from the centre of the cluster to which it belongs is minimized.

Steps involved in K-means algorithm to delineate clusters for a given value of K are:

1- Set current iteration number t to 0 and maximum number of iterations to t_max.

2- Initialize K cluster centers to random values in the multidimensional feature vector space.

3- Initialize the current feature vector number i to 1.

4- Determine the Euclidean distance of the ith feature vector from the centers of each of the K clusters, and assign it to the cluster whose center is nearest to it.

5- If i < N, increment i to i + 1 and go to step 4; otherwise continue with step 6.

6- Update the centroid of each cluster by computing the average of the feature vectors assigned to it. Then compute F for the current iteration t. If t = 0, increase t to t + 1 and go to step 3. If t > 0, compute the difference in the values of F for iterations t and t - 1. Terminate the algorithm if the change in the value of F between two successive iterations is insignificant; otherwise, continue with step 7.

7- If t < t_max, update t to t + 1 and go to step 3; otherwise, terminate the algorithm.
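The steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the implementation used in the study: the function name, the tolerance `tol` and the `seed` parameter are hypothetical, and step 2 is replaced by a common variant that draws the initial centers from the feature vectors themselves, which reduces the chance of empty clusters.

```python
import random

def kmeans(X, K, t_max=100, tol=1e-9, seed=0):
    """Cluster the rescaled feature vectors X into K groups; returns (labels, centers, F)."""
    rng = random.Random(seed)
    n = len(X[0])
    # Step 2 (variant, an assumption): K distinct feature vectors serve as initial centers.
    centers = [list(x) for x in rng.sample(X, K)]
    prev_F = None
    for t in range(t_max + 1):  # steps 1 and 7: at most t_max + 1 passes
        # Steps 3 to 5: assign each feature vector to its nearest center.
        labels = []
        for x in X:
            d2 = [sum((x[j] - c[j]) ** 2 for j in range(n)) for c in centers]
            labels.append(d2.index(min(d2)))
        # Step 6: update the centroids, then evaluate F.
        for k in range(K):
            members = [x for x, lab in zip(X, labels) if lab == k]
            if members:  # an empty cluster keeps its previous center
                centers[k] = [sum(x[j] for x in members) / len(members) for j in range(n)]
        F = sum((x[j] - centers[lab][j]) ** 2
                for x, lab in zip(X, labels) for j in range(n))
        # Stop when F no longer changes significantly between successive iterations.
        if prev_F is not None and abs(prev_F - F) < tol:
            break
        prev_F = F
    return labels, centers, F

X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
labels, centers, F = kmeans(X, K=2)
print(labels, F)
```

Because the result depends on the initial centers, it is common to repeat the run with several seeds and keep the partition with the smallest F.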



Single linkage and complete linkage algorithms

Single linkage-the distance between the cluster [y_i, y_j], formed by merging clusters y_i and y_j, and any other cluster y_k is the smaller of the distances between y_i and y_k or y_j and y_k.

Complete linkage-the distance between the new cluster [y_i, y_j] and any other singleton cluster y_k is the greater of the distances between y_i and y_k or y_j and y_k.

[Figure: illustrations of single linkage and complete linkage]
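The two update rules differ only in whether the smaller or the greater distance is kept after a merge. A minimal agglomerative sketch, assuming Euclidean distances between sites and a hypothetical function name `agglomerate`, could look like this:

```python
import math

def agglomerate(points, K, method="single"):
    """Merge the two closest clusters repeatedly until K clusters remain (K >= 1)."""
    if method not in ("single", "complete"):
        raise ValueError("method must be 'single' or 'complete'")
    pick = min if method == "single" else max
    clusters = {i: [i] for i in range(len(points))}
    # Pairwise distances between singleton clusters, keyed by (smaller id, larger id).
    dist = {(i, j): math.dist(points[i], points[j])
            for i in clusters for j in clusters if i < j}
    while len(clusters) > K:
        a, b = min(dist, key=dist.get)  # closest pair of clusters, a < b
        for k in clusters:
            if k in (a, b):
                continue
            d_ak = dist.pop((min(a, k), max(a, k)))
            d_bk = dist.pop((min(b, k), max(b, k)))
            # Linkage rule: smaller (single) or greater (complete) of the two distances.
            dist[(min(a, k), max(a, k))] = pick(d_ak, d_bk)
        del dist[(a, b)]
        clusters[a].extend(clusters.pop(b))
    return sorted(sorted(members) for members in clusters.values())

points = [(0.0,), (1.0,), (2.5,), (4.5,)]
print(agglomerate(points, 2, "single"))    # chains neighbouring points together
print(agglomerate(points, 2, "complete"))  # favours compact groups
```

On these four hypothetical points the two rules give different partitions, which illustrates the tendency of single linkage to form elongated chains while complete linkage keeps clusters compact.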


Ward's algorithm

The objective function, W, of Ward's algorithm is the sum of squares of deviations of the feature vectors from the centroid of their respective clusters, which the algorithm seeks to minimize.

At each step in the analysis, the union of every possible pair of clusters is considered, and the two clusters whose fusion results in the smallest increase in W are merged.

The change depends only on the relationship between the two merged clusters and not on the relationships with other clusters.
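The last point follows from a standard identity: merging clusters A and B increases W by n_A n_B / (n_A + n_B) times the squared distance between their centroids, so no other cluster enters the calculation. A small sketch (function names are illustrative assumptions) that computes this increase and checks it against a direct evaluation of W:

```python
def centroid(C):
    """Mean feature vector of cluster C (a list of feature vectors)."""
    n = len(C[0])
    return [sum(x[j] for x in C) / len(C) for j in range(n)]

def within_ss(C):
    """Sum of squared deviations of the members of C from their centroid."""
    c = centroid(C)
    return sum((x[j] - c[j]) ** 2 for x in C for j in range(len(c)))

def ward_increase(A, B):
    """Increase in W caused by merging A and B, from sizes and centroids only."""
    ca, cb = centroid(A), centroid(B)
    d2 = sum((p - q) ** 2 for p, q in zip(ca, cb))
    return len(A) * len(B) / (len(A) + len(B)) * d2

def best_merge(clusters):
    """Indices (i, j) of the pair of clusters whose fusion increases W the least."""
    pairs = [(i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
    return min(pairs, key=lambda p: ward_increase(clusters[p[0]], clusters[p[1]]))

A = [[0.0, 0.0], [2.0, 0.0]]
B = [[5.0, 1.0]]
direct = within_ss(A + B) - within_ss(A) - within_ss(B)
print(ward_increase(A, B), direct)  # the two values agree up to rounding
```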



Cluster Validity Indices

Identification of optimum number of compact and well separated clusters.

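Dunn's index

$$D = \min_{1 \le i \le K} \; \min_{j \ne i} \frac{\delta(C_i, C_j)}{\max_{1 \le k \le K} \Delta(C_k)}$$

$\delta(C_i, C_j)$-distance between clusters $C_i$ and $C_j$.

$\Delta(C_k)$-intracluster distance of cluster $C_k$.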
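A minimal sketch of Dunn's index follows. The slides do not say how the two distances are measured, so the choices here are assumptions: the distance between two clusters is taken as the smallest distance between their members, and the intracluster distance as the cluster diameter (largest distance between two members). Other definitions exist and give different values.

```python
import math
from itertools import combinations

def dunn_index(clusters):
    """Ratio of the smallest intercluster distance to the largest intracluster distance.

    Requires at least two clusters, and at least one cluster with two or more members.
    """
    # Assumed intercluster distance: nearest pair of members from different clusters.
    inter = min(math.dist(p, q)
                for A, B in combinations(clusters, 2) for p in A for q in B)
    # Assumed intracluster distance: cluster diameter.
    intra = max(math.dist(p, q)
                for C in clusters for p, q in combinations(C, 2))
    return inter / intra

compact = [[(0.0, 0.0), (0.0, 1.0)], [(4.0, 0.0), (4.0, 1.0)]]
print(dunn_index(compact))  # larger values indicate compact, well separated clusters
```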



Regional Homogeneity Test

Heterogeneity of the set of plausible regions obtained from the cluster analysis is assessed.

The test uses the advantages offered by the sampling properties of L-moment ratios (LMRs).

It examines whether the between-site dispersion of the sample LMRs for the group of sites under consideration is larger than the dispersion expected in a homogeneous region.

$t^R$-regional average coefficient of L-variation (L-CV).

$t_3^R$-regional average L-skewness.

$t_4^R$-regional average L-kurtosis.

Each regional average is a weighted mean, with a weight applied to the sample L-moment ratios at site i.
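As an illustration, the sample L-moment ratios at a site can be estimated from unbiased probability-weighted moments, and the regional averages formed as weighted means. The function names are hypothetical, and weighting each site by its record length follows the usual Hosking and Wallis convention, which is an assumption here because the slides only mention "a weight".

```python
def sample_lmoment_ratios(data):
    """Return (L-CV, L-skewness, L-kurtosis) of a sample with at least 4 values."""
    x = sorted(data)
    n = len(x)
    # Unbiased probability-weighted moments b0..b3 (i is the 0-based rank).
    b0 = sum(x) / n
    b1 = sum(i * x[i] for i in range(n)) / (n * (n - 1))
    b2 = sum(i * (i - 1) * x[i] for i in range(n)) / (n * (n - 1) * (n - 2))
    b3 = sum(i * (i - 1) * (i - 2) * x[i] for i in range(n)) / (
        n * (n - 1) * (n - 2) * (n - 3))
    # First four sample L-moments.
    l1 = b0
    l2 = 2 * b1 - b0
    l3 = 6 * b2 - 6 * b1 + b0
    l4 = 20 * b3 - 30 * b2 + 12 * b1 - b0
    return l2 / l1, l3 / l2, l4 / l2

def regional_average(ratios, weights):
    """Weighted mean of the at-site (L-CV, L-skewness, L-kurtosis) triples."""
    total = sum(weights)
    return tuple(sum(w * r[m] for r, w in zip(ratios, weights)) / total
                 for m in range(3))

# Two hypothetical sites with 5 and 8 values of seasonal rainfall.
site_a = [1.0, 2.0, 3.0, 4.0, 5.0]
site_b = [2.0, 2.5, 3.0, 3.5, 4.0, 6.0, 7.5, 9.0]
ratios = [sample_lmoment_ratios(site_a), sample_lmoment_ratios(site_b)]
print(regional_average(ratios, weights=[len(site_a), len(site_b)]))
```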



Heterogeneity measures (HM) can be based on three measures of dispersion:

(1) weighted standard deviation of the at-site sample L-CVs ($V_1$);

(2) weighted average distance from the site to the group weighted mean in the two-dimensional space of L-CV and L-skewness ($V_2$);

(3) weighted average distance from the site to the group weighted mean in the two-dimensional space of L-skewness and L-kurtosis ($V_3$).

For each simulated realization (homogeneous region), $V_1$, $V_2$ and $V_3$ are computed.

$\mu_{V_1}, \mu_{V_2}, \mu_{V_3}$ are the means and $\sigma_{V_1}, \sigma_{V_2}, \sigma_{V_3}$ are the standard deviations of the values obtained from the simulated realizations.

HM:

$$H_i = \frac{V_i - \mu_{V_i}}{\sigma_{V_i}}, \quad i = 1, 2, 3$$
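The sketch below computes the three dispersion measures and a heterogeneity measure from them. Several points are assumptions rather than details given in the slides: sites are weighted by record length, the simulated values `V_sim` are supplied by the caller (generating the homogeneous realizations is not shown), the sample standard deviation is used, and the thresholds of 1 and 2 in `classify` are the ones commonly quoted from Hosking and Wallis, paired with the region labels used in this presentation.

```python
import math

def dispersion_measures(sites):
    """sites: list of (record_length, L-CV, L-skewness, L-kurtosis); returns (V1, V2, V3)."""
    total = sum(s[0] for s in sites)
    # Regional (weighted) averages of the three L-moment ratios.
    tR, t3R, t4R = (sum(s[0] * s[m] for s in sites) / total for m in (1, 2, 3))
    V1 = math.sqrt(sum(n * (t - tR) ** 2 for n, t, _, _ in sites) / total)
    V2 = sum(n * math.hypot(t - tR, t3 - t3R) for n, t, t3, _ in sites) / total
    V3 = sum(n * math.hypot(t3 - t3R, t4 - t4R) for n, _, t3, t4 in sites) / total
    return V1, V2, V3

def heterogeneity(V_obs, V_sim):
    """H = (V_obs - mean of simulated V) / standard deviation of simulated V."""
    mu = sum(V_sim) / len(V_sim)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in V_sim) / (len(V_sim) - 1))
    return (V_obs - mu) / sigma

def classify(H):
    """Assumed thresholds: below 1, from 1 up to 2, and 2 or more."""
    if H < 1:
        return "acceptably homogeneous"
    if H < 2:
        return "possibly homogeneous"
    return "definitely heterogeneous"

# Hypothetical region with two sites and three simulated values of V1.
sites = [(10, 0.2, 0.1, 0.1), (10, 0.4, 0.3, 0.2)]
V1, V2, V3 = dispersion_measures(sites)
H1 = heterogeneity(V1, [0.02, 0.04, 0.06])
print(V1, V2, V3, H1, classify(H1))
```

In practice hundreds of simulated realizations would be used rather than three; the tiny list here only keeps the arithmetic easy to follow.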



(6) Merging a region with another or others;

(7) Merging two or more regions and redefining groups;

(8) Obtaining more data and redefining regions.

The first three options are useful in reducing the values of the heterogeneity measures of a region.

Options 4-7 help in ensuring that each region is sufficiently large in terms of collective data length at all the sites in it.



Results and Discussion

The statistical homogeneity of each of the five India Meteorological Department (IMD) summer monsoon rainfall (SMR) regions is tested using SMR data at grid points in the region, as shown in Table 1.

The IMD regions are adjusted to improve their homogeneity, and the results are tabulated in Table 2.

Figure 2 shows the number of sites removed to make the regions acceptably homogeneous.

Table 1- Characteristics of the IMD SMR Regions Determined Using Heterogeneity Measures

Serial   Region Name         Number of     Heterogeneity Measures      Region Type
Number                       Grid Points
1        Peninsular          4             23.28    5. 3     0.26      Definitely heterogeneous
2        West Central        86            10.8     0.64    -1.33      Definitely heterogeneous
3        Northwest           6             20. 6    5.87    -1.08      Definitely heterogeneous
4        Central Northeast   5              4.32   -0.73    -1. 0      Definitely heterogeneous
5        Northeast           36             4.44   -0. 1     1.06      Definitely heterogeneous



Figure 1- SMR regions that are considered as homogeneous by IMD

Figure 2- SMR regions after adjusting



Table 2- Characteristics of SMR regions after adjusting

Serial   Region Name         Number of     Heterogeneity Measures      Number of Grid
Number                       Grid Points                               Points Eliminated
1        Peninsular          27            0.75    -0.34     1.35      22
2        West Central        62            0.80    -1.17    -2.03      24
3        Northwest           40            0.84    -0.86    -1. 0      2
4        Central Northeast   45            0.74    -0.86    -1.47      14
5        Northeast           32            0.45    -1.30    -1.06      04

To delineate new homogeneous SMR regions in the study region, 52 out of 60 NCEP grid boxes covering India are considered.

Rain gauge density is low in the Himalayan region (8 boxes discarded).

Mean monthly values of each of the 15 atmospheric variables are considered at each NCEP grid point for the summer monsoon months.

960 values (15 variables * 16 grid points * 4 months) are obtained for each grid box.

To reduce redundancy, principal components are extracted from these values. The principal components and standardized location attributes (latitude, longitude, and average elevation of terrain in each of the NCEP grid boxes) are considered as attributes to form 52 feature vectors for K-means cluster analysis.
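The construction of the feature vectors can be sketched as follows. Everything here is an illustrative assumption: the function names, the tiny made-up data set, the number of retained components `n_comp`, and the use of power iteration with deflation, which is swapped in only to keep the example free of external libraries (a linear algebra library would normally be used).

```python
import math
import random

def standardize(M):
    """Rescale each column of M to zero mean and unit (population) standard deviation."""
    rows, cols = len(M), len(M[0])
    out = [[0.0] * cols for _ in range(rows)]
    for j in range(cols):
        col = [r[j] for r in M]
        mu = sum(col) / rows
        sd = math.sqrt(sum((v - mu) ** 2 for v in col) / rows) or 1.0  # guard constant columns
        for i in range(rows):
            out[i][j] = (M[i][j] - mu) / sd
    return out

def leading_components(Z, n_comp, iters=500, seed=0):
    """Scores of the first n_comp principal components of the column-centred matrix Z.

    n_comp must not exceed the rank of Z.
    """
    rng = random.Random(seed)
    rows, cols = len(Z), len(Z[0])
    Z = [row[:] for row in Z]
    scores = []
    for _ in range(n_comp):
        v = [rng.gauss(0, 1) for _ in range(cols)]
        for _ in range(iters):  # power iteration on Z^T Z
            s = [sum(z * w for z, w in zip(row, v)) for row in Z]
            v = [sum(Z[i][j] * s[i] for i in range(rows)) for j in range(cols)]
            norm = math.sqrt(sum(w * w for w in v))
            v = [w / norm for w in v]
        s = [sum(z * w for z, w in zip(row, v)) for row in Z]
        scores.append(s)
        # Deflation: remove the component just found before extracting the next one.
        Z = [[Z[i][j] - s[i] * v[j] for j in range(cols)] for i in range(rows)]
    return scores

def build_feature_vectors(predictors, locations, n_comp):
    """One feature vector per grid box: n_comp component scores plus standardized location."""
    pcs = leading_components(standardize(predictors), n_comp)
    loc = standardize(locations)
    return [[pc[i] for pc in pcs] + loc[i] for i in range(len(predictors))]

# Made-up example: 5 grid boxes, 3 predictor values each (960 in the study), 3 location attributes.
predictors = [[1.0, 2.0, 0.5], [2.0, 4.1, 0.1], [3.0, 5.9, 0.9], [4.0, 8.2, 0.3], [5.0, 9.9, 0.7]]
locations = [[10.0, 75.0, 200.0], [15.0, 78.0, 450.0], [20.0, 80.0, 300.0],
             [25.0, 85.0, 100.0], [28.0, 90.0, 50.0]]
for fv in build_feature_vectors(predictors, locations, n_comp=2):
    print([round(v, 3) for v in fv])
```

The first two predictor columns are almost perfectly correlated, so a single component captures nearly all of their joint variation; this is the redundancy that the principal components are meant to remove.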



Figure 3- Grid boxes covering India.

Atmospheric variables influencing rainfall in the hashed box are considered at 16 NCEP grid points shown as black dots surrounding the box.

To know the exact number of regions, the K-means algorithm is applied and cluster validity indices are computed to determine the optimum number of clusters.

Figure 4- Identification of optimal partition provided by K-means clustering algorithm



The partition with the minimum value for the Davies-Bouldin index and the maximum value for Dunn's and Calinski-Harabasz indices is considered as the optimal partition.

Several of the clusters obtained using the K-means algorithm for choices of K greater than 15 are found to be quite small in size; therefore, the clusters obtained for K = 15 are selected as the optimal partition.
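For reference, the Davies-Bouldin and Calinski-Harabasz indices can be sketched as below. The slides do not give their formulas, so the commonly used definitions are assumed: cluster scatter is the mean distance of members from the centroid for Davies-Bouldin, and Calinski-Harabasz is the ratio of between-cluster to within-cluster sum of squares, each divided by its degrees of freedom. Function names are illustrative.

```python
import math

def _centroid(C):
    n = len(C[0])
    return [sum(x[j] for x in C) / len(C) for j in range(n)]

def davies_bouldin(clusters):
    """Mean over clusters of the worst (scatter_i + scatter_j) / centroid distance; lower is better."""
    cents = [_centroid(C) for C in clusters]
    scatter = [sum(math.dist(x, c) for x in C) / len(C) for C, c in zip(clusters, cents)]
    K = len(clusters)
    return sum(
        max((scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
            for j in range(K) if j != i)
        for i in range(K)) / K

def calinski_harabasz(clusters):
    """Between-cluster over within-cluster variance; higher is better. Needs 1 < K < N."""
    N = sum(len(C) for C in clusters)
    K = len(clusters)
    overall = _centroid([x for C in clusters for x in C])
    cents = [_centroid(C) for C in clusters]
    between = sum(len(C) * math.dist(c, overall) ** 2 for C, c in zip(clusters, cents))
    within = sum(math.dist(x, c) ** 2 for C, c in zip(clusters, cents) for x in C)
    return (between / (K - 1)) / (within / (N - K))

partition = [[(0.0, 0.0), (0.0, 2.0)], [(4.0, 0.0), (4.0, 2.0)]]
print(davies_bouldin(partition), calinski_harabasz(partition))
```

Evaluating both indices, together with Dunn's index, for each candidate K and comparing the resulting curves is one way to reproduce the kind of selection described above.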

Figure 5- Clusters in optimal partition obtained using K-means algorithm.



Table 3- Characteristics of the Clusters in Optimal Partition Obtained Using K-Means Algorithm.

Cluster   Cluster Size (in Number    Heterogeneity Measures
Number    of IMD Grid Points)
1         15                         -1.56   -0.46    0.03
2         2                          10. 1    2.7    -0.83
3         22                         17.27    5.46    1.22
4         25                           .65    1.03   -0.33
5         38                          5.20   -1.71   -3.11
6         53                          5.43   -0.40   -1.78
7                                     4.27    0.36   -1.07
8         6                          -0.43   -1.85   -1.55
9         13                          6.08   -1.22   -1.86
10        53                         12.37    0.30   -2.17
11                                    8.51    6.18    4.55
12        4                           2.45    0.86   -0.14
13        46                         12.67    2.74   -0.16
14        20                         -0.20   -1.24   -1.07
15        11                          2.46   -0.05   -1.06

Table 3 shows that clusters 8 and 14 are found to be acceptably homogeneous, cluster 1 is possibly homogeneous, whereas the remaining clusters are heterogeneous.

Overall, 23 out of the 301 IMD grid points considered for regionalization are unallocated, as they are eliminated from different regions to improve statistical homogeneity.

Six sites are transferred to other regions, and 33 sites are separated from clusters to form new regions.


Table 4- Details of Region Formation From Optimal Partition Obtained Using K-Means Algorithm.

The regions are adjusted and all 17 regions are classified as either acceptably homogeneous or possibly homogeneous.


Table 5- Characteristics of the Regions Formed by Adjusting Clusters Obtained Using K-Means Algorithm



Figure 6-Homogeneous rainfall regions obtained by adjusting the clusters

The number of sites that had to be eliminated from the IMD SMR regions to improve their statistical homogeneity is found to be excessive, indicating that the IMD SMR regions are not useful as precursors to derive homogeneous SMR regions.

New SMR regions are delineated using the proposed methodology.



Conclusion

Existing approaches are based on statistics computed from observed hydrology, so independent validation of the delineated regions for homogeneity in hydrology is not possible.

They are also uncertain when forming homogeneous regions in areas where only limited hydrological data are available.

The proposed method has the ability to form regions irrespective of the available data (rain gauges for this study).

However, as seen in this study, there is uncertainty in validating homogeneous regions in areas having few rain gauges.



Thank You
