Hydrologic regionalization

  • 8/6/2019 Hydrologic regionalization

    1/26

    Hydrologic Regionalization With Clustering

    By

    Nirdesh Kumar-06004008

    Introduction

    Regionalization

    Areas with homogeneous hydrologic response.

    Applications-hydrologic design, planning, management of water resources

    systems, regional trend analysis and frequency analysis of floods, low flows and

    other variables.

    Attributes-factors influencing hydrology in the area.

    Physiographic-drainage area, slope of the main channel in the drainage basin, soil runoff coefficient and storage.

    Location-latitude, longitude and elevation.

    Meteorological-specific humidity, temperature, wind velocity, wind direction and rainfall.

    Feature vectors-sites described on the basis of the selected attributes.

    Cluster-a region containing feature vectors with similar hydrologic response.

    The optimum number of clusters is obtained by applying cluster validity indices.

    Tests applied to check the homogeneity of the region-Regional homogeneity test.

    Regions adjusted to improve homogeneity.

    Selection of variables influencing the hydrology in a region as attributes

    Preparation of feature vectors using selected variables

    Formation of clusters by applying clustering algorithm

    Identification of optimum number of clusters

    Validation of regions to test their homogeneity

    Adjustment of heterogeneous regions

    Clustering Techniques

    Clustering-Variety of multivariate statistical procedures that are used to

    investigate, interpret and classify given data into similar groups or clusters, which

    may or may not be overlapping.

    The data points within a cluster should be as similar as possible and the data

    points of different clusters should be as dissimilar as possible.

    Various algorithms are used for clustering-K-means algorithm, single linkage, complete linkage and Ward's algorithm.

    Clustering Algorithms

    K-Means Algorithm

    N feature vectors in n-dimensional attribute space.

    $y_{ij}$ is the value of attribute j in the ith feature vector $y_i$.

    Each feature vector represents one of the N sites in the study region.

    Rescaling-process necessary to nullify the effects of differences in the variance and relative magnitudes of the attributes.

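    $x_{ij} = (y_{ij} - \bar{y}_j) / \sigma_j$ denotes the rescaled value of attribute j in the ith feature vector, whose raw value is $y_{ij}$.

    $\sigma_j$-standard deviation of attribute j.

    $\bar{y}_j$-mean value of attribute j over all N feature vectors.

    The objective function F of the K-means algorithm is

    $$F = \sum_{k=1}^{K} \sum_{j=1}^{n} \sum_{i=1}^{N_k} \left( x_{ij}^{k} - x_{\cdot j}^{k} \right)^2$$

    K-number of clusters.

    $N_k$-number of feature vectors in cluster k.

    $x_{ij}^{k}$-rescaled value of attribute j in the feature vector i assigned to cluster k.

    $x_{\cdot j}^{k}$-mean value of attribute j for cluster k, computed as

    $$x_{\cdot j}^{k} = \frac{1}{N_k} \sum_{i=1}^{N_k} x_{ij}^{k}$$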

    By minimizing F, the distance of each feature vector from the centre of the cluster to which it belongs is minimized.
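    The rescaling and the objective function F described above can be sketched as follows. This is a minimal illustration, not the author's code: the function names `rescale` and `objective_f` are hypothetical, the population standard deviation (division by N) is an assumption, and every attribute is assumed to vary across sites so that its standard deviation is non-zero.

```python
import math


def rescale(vectors):
    """Standardize each attribute to zero mean and unit standard deviation."""
    n_sites = len(vectors)
    n_attrs = len(vectors[0])
    means = [sum(v[j] for v in vectors) / n_sites for j in range(n_attrs)]
    # Assumption: population standard deviation (divide by N, not N - 1).
    stds = [
        math.sqrt(sum((v[j] - means[j]) ** 2 for v in vectors) / n_sites)
        for j in range(n_attrs)
    ]
    return [
        [(v[j] - means[j]) / stds[j] for j in range(n_attrs)]
        for v in vectors
    ]


def objective_f(vectors, labels, k):
    """Sum of squared deviations of feature vectors from their cluster means."""
    total = 0.0
    for cluster in range(k):
        members = [v for v, lab in zip(vectors, labels) if lab == cluster]
        if not members:
            continue  # an empty cluster contributes nothing to F
        centroid = [sum(col) / len(members) for col in zip(*members)]
        total += sum(
            (x - m) ** 2 for v in members for x, m in zip(v, centroid)
        )
    return total
```

    Rescaling first means that an attribute measured in large units, such as drainage area, does not dominate the distances that F accumulates.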

    Steps involved in K-means algorithm to delineate clusters for a given value of K

    are:

    1- Set current iteration number t to 0 and maximum number of iterations to t_max.

    2- Initialize K cluster centers to random values in the multidimensional feature vector

    space.

    3- Initialize the current feature vector number i to 1.

    4- Determine Euclidean distance of ith feature vector from centers of each of the K clusters, and assign it to the cluster whose center is nearest to it.

    5- If i < N, increment i to i + 1 and go to step 4; otherwise continue with step 6.

    6- Update the centroid of each cluster by computing average of the feature vectors

    assigned to it. Then compute F for the current iteration t. If t = 0, increase t to t + 1

    and go to step 3. If t > 0, compute the difference in the values of F for iterations t and

    t - 1. Terminate the algorithm if the change in the value of F between two successive iterations is insignificant; otherwise, continue with step 7.

    7- If t < t_max, update t to t + 1 and go to step 3; otherwise, terminate the algorithm.
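    The steps above can be sketched in plain Python as follows. This is a hedged illustration rather than the implementation used in the study: the function name `kmeans` and the parameters `tol` and `seed` are hypothetical, and step 2 is simplified by drawing the initial centers from the feature vectors themselves instead of from arbitrary random points in the attribute space.

```python
import math
import random


def kmeans(vectors, k, t_max=100, tol=1e-9, seed=0):
    """Return (labels, centers, F) for a K-means partition of the vectors."""
    rng = random.Random(seed)
    # Step 2 (simplified assumption): pick K distinct feature vectors as centers.
    centers = [list(v) for v in rng.sample(vectors, k)]
    labels = [0] * len(vectors)
    f_prev = None
    f = 0.0
    for t in range(t_max + 1):  # steps 1 and 7: t runs from 0 to t_max
        # Steps 3 to 5: assign every feature vector to its nearest center.
        for i, v in enumerate(vectors):
            labels[i] = min(range(k), key=lambda c: math.dist(v, centers[c]))
        # Step 6: move each center to the mean of its assigned vectors.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
        # Step 6: objective function F for the current iteration.
        f = sum(
            math.dist(v, centers[lab]) ** 2 for v, lab in zip(vectors, labels)
        )
        # Step 6: stop when F no longer changes significantly.
        if f_prev is not None and abs(f_prev - f) < tol:
            break
        f_prev = f
    return labels, centers, f
```

    Because the result depends on the initial centers, it is common to rerun the algorithm with several seeds and keep the partition with the smallest F.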

    Single linkage and complete linkage algorithms

    Single linkage-distance between the cluster $[y_i, y_j]$, formed by merging clusters $y_i$ and $y_j$, and any other singleton cluster $y_k$ is the smaller of the distances between $y_i$ and $y_k$ or $y_j$ and $y_k$.

    Complete linkage-distance between the new cluster $[y_i, y_j]$ and any other singleton cluster $y_k$ is the greater of the distances between $y_i$ and $y_k$ or $y_j$ and $y_k$.

    [Figure: illustrations of single linkage and complete linkage]
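    The two linkage rules can be compared with a small agglomerative sketch. The helper names `cluster_distance` and `agglomerate` are hypothetical, and the distance between clusters is evaluated over all member pairs, which gives the same result as repeatedly applying the smaller-of or greater-of update rule described above.

```python
import math


def cluster_distance(cluster_a, cluster_b, linkage="single"):
    """Smallest (single) or largest (complete) pairwise member distance."""
    pairwise = [math.dist(a, b) for a in cluster_a for b in cluster_b]
    return min(pairwise) if linkage == "single" else max(pairwise)


def agglomerate(vectors, n_clusters, linkage="single"):
    """Merge the two closest clusters until n_clusters remain."""
    clusters = [[v] for v in vectors]  # start from singleton clusters
    while len(clusters) > n_clusters:
        pairs = [
            (i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        ]
        i, j = min(
            pairs,
            key=lambda p: cluster_distance(
                clusters[p[0]], clusters[p[1]], linkage
            ),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


sites = [[0.0], [1.0], [2.5], [4.2]]
single = agglomerate(sites, 2, "single")
complete = agglomerate(sites, 2, "complete")
```

    For these four one-dimensional sites, single linkage chains the third site onto the first two, whereas complete linkage pairs it with the fourth, which illustrates why single linkage tends to produce elongated clusters.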

    Ward's algorithm

    Ward's algorithm minimizes an objective function, W, defined as the sum of squares of deviations of the feature vectors from the centroid of their respective clusters.

    At each step in the analysis, the union of every possible pair of clusters is considered, and the two clusters whose fusion results in the smallest increase in W are merged.

    The change depends only on the relationship between the two merged clusters

    and not on the relationships with other clusters.
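    The statement that the change in W depends only on the two merged clusters can be checked numerically. The sketch below uses the standard identity that the increase equals n_a n_b / (n_a + n_b) times the squared distance between the two centroids; the function names are hypothetical.

```python
def centroid(cluster):
    """Mean of the feature vectors in a cluster."""
    return [sum(col) / len(cluster) for col in zip(*cluster)]


def within_ss(cluster):
    """Sum of squared deviations of the members from the cluster centroid."""
    c = centroid(cluster)
    return sum((x - m) ** 2 for v in cluster for x, m in zip(v, c))


def ward_increase(cluster_a, cluster_b):
    """Increase in W caused by merging two clusters."""
    na, nb = len(cluster_a), len(cluster_b)
    ca, cb = centroid(cluster_a), centroid(cluster_b)
    gap = sum((p - q) ** 2 for p, q in zip(ca, cb))
    return na * nb / (na + nb) * gap


a = [[0.0, 0.0], [2.0, 0.0]]
b = [[5.0, 0.0]]
# Direct computation: W after the merge minus W before the merge.
direct = within_ss(a + b) - within_ss(a) - within_ss(b)
shortcut = ward_increase(a, b)
```

    Both `direct` and `shortcut` evaluate the same quantity, so only the sizes and centroids of the two candidate clusters are needed when searching for the best merge.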

    Cluster Validity Indices

    Identification of optimum number of compact and well separated clusters.

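    Dunn's index

    $$D = \frac{\min_{1 \le i < j \le K} \delta(C_i, C_j)}{\max_{1 \le k \le K} \Delta(C_k)}$$

    $\delta(C_i, C_j)$-distance between clusters $C_i$ and $C_j$.

    $\Delta(C_k)$-intracluster distance of cluster $C_k$.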
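    Dunn's index can be sketched as below. The slide does not say how the two distances are measured, so this example assumes the smallest distance between members of two clusters as the intercluster distance and the cluster diameter (largest distance between two members) as the intracluster distance. The function name `dunn_index` is hypothetical, and at least one cluster must contain two distinct points.

```python
import math


def dunn_index(clusters):
    """Ratio of the smallest intercluster distance to the largest diameter."""
    # Assumption: intercluster distance = closest pair of members.
    inter = min(
        math.dist(a, b)
        for i, ci in enumerate(clusters)
        for cj in clusters[i + 1:]
        for a in ci
        for b in cj
    )
    # Assumption: intracluster distance = largest member-to-member distance.
    intra = max(math.dist(a, b) for c in clusters for a in c for b in c)
    return inter / intra


compact = [[[0.0, 0.0], [0.0, 1.0]], [[4.0, 0.0], [4.0, 1.0]]]
mixed = [[[0.0, 0.0], [4.0, 0.0]], [[0.0, 1.0], [4.0, 1.0]]]
```

    A compact, well separated partition such as `compact` gives a larger index than `mixed`, which is why the maximum value of the index is sought.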

    Regional Homogeneity Test

    Heterogeneity of the set of plausible regions obtained from the cluster analysis is assessed.

    Uses the advantages offered by sampling properties of L-moment ratios (LMRs).

    Examines whether the between-site dispersion of the sample LMRs for the group of sites under consideration is larger than the dispersion expected in a homogeneous region.

    $t^R$-regional average coefficient of L-variation (L-CV).

    $t_3^R$-regional average L-skewness.

    $t_4^R$-regional average L-kurtosis.

    A weight is applied to the sample L-moment ratios at each site i when forming the regional averages.
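    As a hedged illustration of the quantities above, the sample L-moment ratios at a site can be computed from unbiased probability weighted moments, and a regional average formed as a weighted mean. The function names are hypothetical, the record must contain at least four values, and the choice of weights (for example the record length at each site) is an assumption, since the slide does not state it.

```python
def sample_lmoment_ratios(data):
    """Return (L-CV, L-skewness, L-kurtosis) of one site's record."""
    x = sorted(data)
    n = len(x)
    # Unbiased probability weighted moments; i is the zero-based rank.
    b0 = sum(x) / n
    b1 = sum(i * x[i] for i in range(n)) / (n * (n - 1))
    b2 = sum(i * (i - 1) * x[i] for i in range(n)) / (n * (n - 1) * (n - 2))
    b3 = sum(i * (i - 1) * (i - 2) * x[i] for i in range(n)) / (
        n * (n - 1) * (n - 2) * (n - 3)
    )
    # First four sample L-moments.
    l1 = b0
    l2 = 2 * b1 - b0
    l3 = 6 * b2 - 6 * b1 + b0
    l4 = 20 * b3 - 30 * b2 + 12 * b1 - b0
    return l2 / l1, l3 / l2, l4 / l2


def regional_average(site_values, weights):
    """Weighted mean of an at-site L-moment ratio over the sites of a region."""
    return sum(w * v for w, v in zip(weights, site_values)) / sum(weights)
```

    Calling `sample_lmoment_ratios` once per site and passing each of the three ratios to `regional_average` gives the regional L-CV, L-skewness and L-kurtosis.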

    Heterogeneity measures (HM) can be based on three measures of dispersion:

    (1) weighted standard deviation of the at-site sample L-CVs ($V_1$);

    (2) weighted average distance from the site to the group weighted mean in the two-dimensional space of L-CV and L-skewness ($V_2$);

    (3) weighted average distance from the site to the group weighted mean in the two-dimensional space of L-skewness and L-kurtosis ($V_3$).
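    The three dispersion measures can be sketched as follows, given the at-site L-CV, L-skewness and L-kurtosis values and one weight per site. The function name `dispersion_measures` and the argument names are hypothetical; the weights are whatever site weights the analyst adopts.

```python
import math


def dispersion_measures(t, t3, t4, weights):
    """Return the three weighted dispersion measures for a group of sites."""
    total = sum(weights)

    def wmean(values):
        return sum(w * v for w, v in zip(weights, values)) / total

    # Group weighted means of L-CV, L-skewness and L-kurtosis.
    t_r, t3_r, t4_r = wmean(t), wmean(t3), wmean(t4)
    # (1) weighted standard deviation of the at-site L-CVs.
    v1 = math.sqrt(
        sum(w * (a - t_r) ** 2 for w, a in zip(weights, t)) / total
    )
    # (2) weighted average distance in the (L-CV, L-skewness) plane.
    v2 = sum(
        w * math.hypot(a - t_r, b - t3_r) for w, a, b in zip(weights, t, t3)
    ) / total
    # (3) weighted average distance in the (L-skewness, L-kurtosis) plane.
    v3 = sum(
        w * math.hypot(b - t3_r, c - t4_r) for w, b, c in zip(weights, t3, t4)
    ) / total
    return v1, v2, v3
```

    All three values are zero when every site has identical L-moment ratios and grow as the sites scatter around the group mean.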

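    For each simulated realization of a homogeneous region, $V_1$, $V_2$ and $V_3$ are computed.

    $\mu_{V_1}$, $\mu_{V_2}$, $\mu_{V_3}$ are the means and $\sigma_{V_1}$, $\sigma_{V_2}$, $\sigma_{V_3}$ are the standard deviations of $V_1$, $V_2$ and $V_3$ over the simulated realizations.

    $$\mathrm{HM}_m = \frac{V_m - \mu_{V_m}}{\sigma_{V_m}}, \quad m = 1, 2, 3$$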
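    A heterogeneity measure standardizes the observed dispersion against the dispersions of the simulated homogeneous regions. The sketch below assumes the simulated values are already available (the simulation itself is not shown), uses the sample standard deviation, and adopts the thresholds of 1 and 2 commonly used with the Hosking and Wallis test to label a region; those thresholds and the function names are assumptions, not taken from the slides.

```python
import statistics


def heterogeneity_measure(v_observed, v_simulated):
    """Standardize an observed dispersion against simulated realizations."""
    mu = statistics.mean(v_simulated)
    sigma = statistics.stdev(v_simulated)  # assumption: sample standard deviation
    return (v_observed - mu) / sigma


def classify(h):
    """Label a region from one heterogeneity measure (assumed thresholds)."""
    if h < 1:
        return "acceptably homogeneous"
    if h < 2:
        return "possibly homogeneous"
    return "definitely heterogeneous"


h = heterogeneity_measure(5.0, [1.0, 2.0, 3.0])
label = classify(h)
```

    The same calculation is repeated for each of the three dispersion measures, giving three heterogeneity measures per region.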

    (6) Merging a region with another or others;

    (7) Merging two or more regions and redefining groups;

    (8) Obtaining more data and redefining regions.

    The first three options are useful in reducing the values of the heterogeneity measures of a region.

    Options 4 to 7 help in ensuring that each region is sufficiently large in terms of the collective data length at all the sites in it.

    Results and Discussion

    The statistical homogeneity of each of the five India Meteorological Department (IMD) summer monsoon rainfall (SMR) regions is tested using SMR data at grid points in the region, as shown in Table 1.

    The IMD regions are adjusted to improve their homogeneity, and the results are tabulated in Table 2.

    Figure 2 shows the number of sites removed to make the regions acceptably homogeneous.

    Serial   Region Name         Number of     Heterogeneity Measures     Region Type
    Number                       Grid Points
    1        Peninsular          4             23.28    5. 3     0.26     Definitely heterogeneous
    2        West Central        86            10.8     0.64    -1.33     Definitely heterogeneous
    3        Northwest           6             20. 6    5.87    -1.08     Definitely heterogeneous
    4        Central Northeast   5             4.32    -0.73    -1. 0     Definitely heterogeneous
    5        Northeast           36            4.44    -0. 1     1.06     Definitely heterogeneous

    Table 1- Characteristics of the IMD SMR Regions Determined Using Heterogeneity Measures

    Figure 1- SMR regions that are considered as homogeneous by IMD

    Figure 2- SMR regions after adjusting

    Serial   Region Name         Number of     Heterogeneity Measures     Number of Grid
    Number                       Grid Points                              Points Eliminated
    1        Peninsular          27            0.75    -0.34     1.35     22
    2        West Central        62            0.80    -1.17    -2.03     24
    3        Northwest           40            0.84    -0.86    -1. 0     2
    4        Central Northeast   45            0.74    -0.86    -1.47     14
    5        Northeast           32            0.45    -1.30    -1.06     04

    Table 2- Characteristics of SMR regions after adjusting

    To delineate new homogeneous SMR regions in the study region, 52 out of 60 NCEP grid boxes covering India are considered.

    Rain gauge density is low in the Himalayan region (8 boxes discarded).

    Mean monthly values of each of the 15 atmospheric variables are considered at each NCEP grid point for the summer monsoon months.

    960 values (15 variables * 16 grid points * 4 months) are obtained for each grid box.

    To reduce redundancy, the principal components of these values and standardized location attributes (latitude, longitude, and average elevation of terrain in each of the NCEP grid boxes) are considered as attributes to form 52 feature vectors for K-means cluster analysis.

    Figure 3- Grid boxes covering India.

    Atmospheric variables influencing rainfall in the hashed box are considered at 16

    NCEP grid points shown as black dots surrounding the box.

    To determine the number of regions, the K-means algorithm is applied and cluster validity indices are computed to identify the optimum number of clusters.

    Figure 4- Identification of optimal partition

    provided by K-means clustering algorithm

    The partition with the minimum value of the Davies-Bouldin index and the maximum values of Dunn's and Calinski-Harabasz indices is considered as the optimal partition.

    Several of the clusters obtained using the K-means algorithm for K greater than 15 are found to be quite small in size; therefore, the clusters obtained for K = 15 are selected as the optimal partition.
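    The Davies-Bouldin and Calinski-Harabasz indices named above can be sketched as follows, using their usual textbook definitions with Euclidean distances. The function names are hypothetical, and the code assumes at least two non-empty clusters with distinct centroids and more feature vectors than clusters.

```python
import math


def _centroid(cluster):
    """Mean of the feature vectors in a cluster."""
    return [sum(col) / len(cluster) for col in zip(*cluster)]


def davies_bouldin(clusters):
    """Average, over clusters, of the worst scatter-to-separation ratio."""
    cents = [_centroid(c) for c in clusters]
    # Scatter: mean distance of the members from their centroid.
    scatter = [
        sum(math.dist(v, m) for v in c) / len(c)
        for c, m in zip(clusters, cents)
    ]
    k = len(clusters)
    worst = [
        max(
            (scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
            for j in range(k)
            if j != i
        )
        for i in range(k)
    ]
    return sum(worst) / k


def calinski_harabasz(clusters):
    """Ratio of between-cluster to within-cluster dispersion."""
    points = [v for c in clusters for v in c]
    n, k = len(points), len(clusters)
    overall = _centroid(points)
    between = sum(
        len(c) * math.dist(_centroid(c), overall) ** 2 for c in clusters
    )
    within = sum(
        math.dist(v, _centroid(c)) ** 2 for c in clusters for v in c
    )
    return (between / (k - 1)) / (within / (n - k))


good = [[[0.0, 0.0], [0.0, 2.0]], [[10.0, 0.0], [10.0, 2.0]]]
poor = [[[0.0, 0.0], [10.0, 0.0]], [[0.0, 2.0], [10.0, 2.0]]]
```

    Evaluating both functions on the partitions obtained for each candidate K, and looking for a low Davies-Bouldin value together with a high Calinski-Harabasz value, mirrors the selection rule stated above.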

    Figure 5- Clusters in optimal partition obtained using K-means algorithm.

    Cluster   Cluster Size (in Number   Heterogeneity Measures
    Number    of IMD Grid Points)
    1         15                        -1.56    -0.46     0.03
    2         2                         10. 1     2.7     -0.83
    3         22                        17.27     5.46     1.22
    4         25                         .65      1.03    -0.33
    5         38                         5.20    -1.71    -3.11
    6         53                         5.43    -0.40    -1.78
    7                                    4.27     0.36    -1.07
    8         6                         -0.43    -1.85    -1.55
    9         13                         6.08    -1.22    -1.86
    10        53                        12.37     0.30    -2.17
    11                                   8.51     6.18     4.55
    12        4                          2.45     0.86    -0.14
    13        46                        12.67     2.74    -0.16
    14        20                        -0.20    -1.24    -1.07
    15        11                         2.46    -0.05    -1.06

    Table 3- Characteristics of the Clusters in Optimal Partition Obtained Using K-Means Algorithm.

    Table 3 shows that clusters 8 and 14 are acceptably homogeneous and cluster 1 is possibly homogeneous, whereas the remaining clusters are heterogeneous.

    Overall, 23 out of the 301 IMD grid points considered for regionalization are

    unallocated, as they are eliminated from different regions to improve statistical

    homogeneity.

    Six sites are transferred to other regions, and 33 sites are separated from clusters to

    form new regions.

    Table 4- Details of Region Formation From Optimal Partition Obtained Using K-Means Algorithm.

    The regions are adjusted and all 17 regions are classified as either acceptably homogeneous or possibly homogeneous.

    Table 5- Characteristics of the Regions Formed by Adjusting Clusters Obtained Using K-Means Algorithm

    Figure 6-Homogeneous rainfall regions obtained by adjusting the clusters

    The number of sites that had to be eliminated from the IMD SMR regions to improve their statistical homogeneity is excessive, indicating that these regions are not useful as precursors for deriving homogeneous SMR regions.

    New SMR regions are delineated using the proposed methodology.

    Conclusion

    Existing approaches are based on statistics computed from observed hydrology.

    Independent validation of the delineated regions for homogeneity in hydrology is therefore not possible.

    There is uncertainty in forming homogeneous regions in areas where limited hydrological data are available.

    The proposed method has the ability to form regions irrespective of the available data (rain gauges for this study).

    However, as seen in this study, there is uncertainty in validating homogeneous regions in areas with few rain gauges.

    Thank You