Handouts on Data-driven Modelling, part 3 (UNESCO-IHE)



Data-driven modelling in water-related problems. PART 3

    Dimitri P. Solomatine

    www.ihe.nl/hi/sol [email protected]

UNESCO-IHE Institute for Water Education, Hydroinformatics Chair

    D.P. Solomatine. Data-driven modelling (part 3). 2

Finding groups (clusters) in data (unsupervised learning)


    D.P. Solomatine. Data-driven modelling (part 3). 3

Clustering

Classification is aimed at identifying a mapping (function) that maps any given input xi to a nominal variable (class) yi.

Finding the groups (clusters) in an input data set is clustering.

Clustering is often the preparation phase for classification:

the identified clusters can be labelled as classes; each input instance can then be associated with an output value (class), and the set of instances {xi, yi} can be built

[Figure: example data set with three identified clusters (Cluster 1, Cluster 2, Cluster 3)]

    D.P. Solomatine. Data-driven modelling (part 3). 4

Reasons to use clustering

labelling large data sets can be very costly;

clustering may actually give an insight into the data and help discover classes which are not known in advance;

clustering may find features that can be used for categorization.


    D.P. Solomatine. Data-driven modelling (part 3). 5

Voronoi diagrams

    D.P. Solomatine. Data-driven modelling (part 3). 6

Methods for clustering

partition-based clustering (K-means, fuzzy C-means, based on Euclidean distance);

hierarchical clustering (agglomerative hierarchical clustering, nearest-neighbour algorithm);

feature extraction methods: principal component analysis (PCA), self-organizing feature (SOF) maps (also referred to as Kohonen neural networks).


    D.P. Solomatine. Data-driven modelling (part 3). 7

k-means clustering

find the best division of N samples into K clusters Ci such that the total distance between the clustered samples and their respective centers (that is, the total variance) is minimized:

$$J = \sum_{i=1}^{K} \sum_{n \in C_i} \| x_n - \mu_i \|^2$$

where μi is the center of cluster i.

    D.P. Solomatine. Data-driven modelling (part 3). 8

k-means clustering: algorithm

1 randomly assign instances to the clusters

2 compute the centers according to the formula below

3 reassign the instances to the nearest cluster centers

4 recalculate centers

5 reassign the instances to the new centers

repeat 2-5 until the total variance J stops decreasing (or the centers stop moving).

$$\mu_i = \frac{1}{N_i} \sum_{n \in C_i} x_n$$

where Ni is the number of instances in cluster Ci.
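A minimal NumPy sketch of the k-means loop above; the names and the stopping test are illustrative, and empty clusters are not handled.

```python
import numpy as np

def k_means(X, K, max_iter=100, seed=0):
    """Minimal k-means sketch: X is an (N, d) array, K the number of clusters."""
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, K, size=len(X))      # step 1: random assignment
    for _ in range(max_iter):
        # steps 2/4: each center is the mean of the instances in its cluster
        centers = np.array([X[labels == i].mean(axis=0) for i in range(K)])
        # steps 3/5: reassign every instance to its nearest center
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = d.argmin(axis=1)
        if np.array_equal(new_labels, labels):    # centers stopped moving
            break
        labels = new_labels
    return labels, centers
```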


    D.P. Solomatine. Data-driven modelling (part 3). 9

k-means clustering: illustration

    D.P. Solomatine. Data-driven modelling (part 3). 10

Kohonen network (Self-organizing feature map - SOFM)


    D.P. Solomatine. Data-driven modelling (part 3). 11

SOFM: main idea

[Figure: inputs x1 ... xM are connected through weights w11 ... wNM to output nodes arranged in a grid; during training the weight vector of node j moves through positions j(0), j(t1), j(t2) in the input space]

    D.P. Solomatine. Data-driven modelling (part 3). 12

SOFM: algorithm (1)

0 Initialize weights, normally with small random values.

Set topological neighborhood parameters.

Set learning rate parameters.

Iteration number t = 1.

1 While the stopping condition is false, do iteration t (steps 2-8):

2 For each input vector x = {x1, ..., xN} do steps 3-8:

3 For each output node k calculate the similarity measure (in this case the Euclidean distance) between the input and the weight vector:

$$D(k) = \sum_{i=1}^{N} (w_{ik} - x_i)^2$$


    D.P. Solomatine. Data-driven modelling (part 3). 13

SOFM: algorithm (2)

4 Find the index kmax such that D(k) is a minimum; this identifies the winning node.

5 Update the weights for the node kmax and for all nodes k within a specified neighborhood radius r from kmax:

$$w_{ik}(t+1) = w_{ik}(t) + \alpha(t)\, N(r, t)\, [x_i - w_{ik}(t)]$$

6 Update the learning rate α(t).

7 Reduce the radius r used in the neighborhood function N (this can be done less frequently than at each iteration).

8 Test the stopping condition.
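A compact sketch of steps 0-8 under common assumptions (exponentially decaying learning rate, Gaussian neighborhood function, linearly shrinking radius); all names and schedules are illustrative.

```python
import numpy as np

def train_sofm(X, grid=(10, 10), iters=5000, lr0=0.5, r0=5.0, seed=0):
    """SOFM sketch: X is (S, N); returns weights of shape (rows, cols, N)."""
    rng = np.random.default_rng(seed)
    rows, cols = grid
    W = 0.1 * rng.random((rows, cols, X.shape[1]))          # step 0: small random weights
    coords = np.stack(np.meshgrid(np.arange(rows), np.arange(cols),
                                  indexing="ij"), axis=-1)  # grid positions of the nodes
    for t in range(iters):
        x = X[rng.integers(len(X))]                    # step 2: pick an input vector
        D = ((W - x) ** 2).sum(axis=2)                 # step 3: distance to every node
        kmax = np.unravel_index(D.argmin(), D.shape)   # step 4: winning node
        lr = lr0 * np.exp(-t / iters)                  # step 6: decaying learning rate
        r = max(r0 * (1.0 - t / iters), 1.0)           # step 7: shrinking radius
        g = ((coords - np.array(kmax)) ** 2).sum(axis=2)
        N_rt = np.exp(-g / (2.0 * r * r))              # Gaussian neighborhood N(r, t)
        W += lr * N_rt[:, :, None] * (x - W)           # step 5: weight update
    return W
```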

    D.P. Solomatine. Data-driven modelling (part 3). 14

SOFM: example

Input set: points sampled randomly in a square (the probability of sampling a point in the central square region was 20 times greater than elsewhere in the square)

The target space is discrete and includes 100 output nodes arranged in 2 dimensions

SOFM is able to find the cluster: the area where the points concentrate


    D.P. Solomatine. Data-driven modelling (part 3). 15

SOFM: visualisation and interpretation

count maps, the easiest and most widely used method: a plot showing, for each output node, the number of times it was the winning one. It can be interpolated into colour shading as well

distance matrix (of size K x K) whose elements are the Euclidean distances of each output unit to its immediate neighbouring units

    D.P. Solomatine. Data-driven modelling (part 3). 16

SOFM: visualization and interpretation

vector position or cluster maps:

colours are coded according to their similarity in the input space

each dot corresponds to one output map unit

each map unit is connected to its neighbours by lines


    D.P. Solomatine. Data-driven modelling (part 3). 17

SOFM: visualization and interpretation

vector position or cluster maps: in 3D

    D.P. Solomatine. Data-driven modelling (part 3). 18

Instance-based learning (lazy learning)


    D.P. Solomatine. Data-driven modelling (part 3). 19

Lazy and eager learning

Eager learning:

first the ML (data-driven) model is built

then it is tested and used

Lazy learning:

no ML model is built (i.e. "lazy")

when new examples come, the output is generated immediately on the basis of the training examples

    Other names for lazy learning:

    Instance-based

    Exemplar-based

    Case-based

    Experience-based

    Edited k-nearest neighbor

    D.P. Solomatine. Data-driven modelling (part 3). 20

k-Nearest neighbors method: classification

instances are points in 2-dim. space, output is boolean (+ or -)

a new instance xq is classified w.r.t. the proximity of the nearest training instances:

to class + (if 1 neighbor is considered)

to class - (if 4 neighbors are considered)

for discrete-valued outputs assign the most common value

Voronoi diagram for 1-Nearest neighbor


    D.P. Solomatine. Data-driven modelling (part 3). 21

Notations

instance x is described as {a1(x), ..., an(x)}, where ar(x) denotes the value of the r-th attribute of instance x.

the distance between two instances xi and xj is defined to be d(xi, xj), where

$$d(x_i, x_j) = \sqrt{\sum_{r=1}^{n} \big(a_r(x_i) - a_r(x_j)\big)^2}$$

    D.P. Solomatine. Data-driven modelling (part 3). 22

k-Nearest neighbor algorithm

Training

Build the set of training examples D.

Classification

Given a query instance xq to be classified,

let x1, ..., xk denote the k instances from D that are nearest to xq. Return

$$F(x_q) = \arg\max_{v \in V} \sum_{i=1}^{k} \delta(v, f(x_i))$$

where δ(a, b) = 1 if a = b, and δ(a, b) = 0 otherwise;

V = {v1, ..., vs} is the set of possible output values.
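A minimal sketch of the classification step above (Euclidean distance, majority vote); D_X, D_y and x_q are illustrative names.

```python
import numpy as np
from collections import Counter

def knn_classify(D_X, D_y, x_q, k=3):
    """k-NN classification: D_X is (S, n) training inputs, D_y the class
    labels, x_q the query instance; returns the most common neighbor class."""
    dist = np.sqrt(((D_X - x_q) ** 2).sum(axis=1))   # d(x_q, x) for all training x
    nearest = np.argsort(dist)[:k]                   # indices of the k nearest instances
    return Counter(D_y[i] for i in nearest).most_common(1)[0][0]
```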


    D.P. Solomatine. Data-driven modelling (part 3). 23

k-Nearest neighbors: regression (target function is real-valued)

model a real-valued target function F: R^n -> R.

instances are points in n-dim. space, the output is a real number

a new instance xq is valued w.r.t.:

the values of the nearest training instances (the average of k instances is taken, or the weighted average)

the values and proximity of the nearest training instances (a locally weighted regression model is built and used to predict the value of the new instance)

In this case the final line of the k-NN algorithm should be replaced by the line

$$F(x_q) = \frac{1}{k} \sum_{i=1}^{k} f(x_i)$$

    D.P. Solomatine. Data-driven modelling (part 3). 24

Distance-weighted k-NN algorithm (classification)

weigh the contribution of each of the k neighbors according to their distance to the query point xq, giving greater weight wi to closer neighbors

This can be accomplished by replacing the final line in the algorithm by

$$F(x_q) = \arg\max_{v \in V} \sum_{i=1}^{k} w_i\, \delta(v, f(x_i))$$

where the weight is

$$w_i = \frac{1}{d(x_q, x_i)^2}$$


    D.P. Solomatine. Data-driven modelling (part 3). 25

Distance-weighted k-NN algorithm (numerical prediction)

for real-valued output this is accomplished by replacing the final line in the algorithm by

$$F(x_q) = \frac{\sum_{i=1}^{k} w_i\, f(x_i)}{\sum_{i=1}^{k} w_i}$$

where the weight is

$$w_i = \frac{1}{d(x_q, x_i)^2}$$
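A sketch of the distance-weighted regression variant just defined; the small eps guarding against a zero distance is an added assumption.

```python
import numpy as np

def knn_regress(D_X, D_y, x_q, k=3, eps=1e-12):
    """Distance-weighted k-NN regression: weighted average of the k nearest
    training outputs with w_i = 1 / d(x_q, x_i)^2."""
    dist = np.sqrt(((D_X - x_q) ** 2).sum(axis=1))
    nearest = np.argsort(dist)[:k]
    w = 1.0 / (dist[nearest] ** 2 + eps)   # eps avoids division by zero when d = 0
    return (w * D_y[nearest]).sum() / w.sum()
```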

    D.P. Solomatine. Data-driven modelling (part 3). 26

k-Nearest neighbors: using all examples

for classification:

$$F(x_q) = \arg\max_{v \in V} \sum_{i=1}^{\text{all instances}} w_i\, \delta(v, f(x_i))$$

for regression:

$$F(x_q) = \frac{\sum_{i=1}^{\text{all instances}} w_i\, f(x_i)}{\sum_{i=1}^{\text{all instances}} w_i}$$


    D.P. Solomatine. Data-driven modelling (part 3). 27

k-Nearest neighbors: comments

k-NN creates a local model in the proximity of the new instance, instead of a global model of all training instances

robust to noisy training data

requires a considerable amount of data

the distance between instances is calculated based on all attributes (and not on 1 as in decision trees). Possible problem: imagine instances described by 20 attributes, of which only 2 are relevant to the target function

curse of dimensionality: the nearest neighbor method is easily misled when X is high-dimensional

solution: stretch the j-th axis by a weight zj chosen to minimize the prediction error

as the number of training instances approaches infinity, k-NN approaches Bayesian optimal classification

    D.P. Solomatine. Data-driven modelling (part 3). 28

Locally weighted regression (1)

construct an explicit approximation F(x) of the target function f(x) over a local region surrounding the new query point xq

If F(x) is linear, then this is called locally weighted linear regression:

$$F(x) = w_0 + w_1 a_1(x) + \ldots + w_n a_n(x)$$

Instead of minimizing the global error E, here the local error E(xq) has to be minimized


    D.P. Solomatine. Data-driven modelling (part 3). 29

Locally weighted regression (2)

Various approaches to minimizing the error E(xq):

1 Minimize the squared error over just the k nearest neighbors:

$$E_1(x_q) = \frac{1}{2} \sum_{x \,\in\, k \text{ nearest nbrs of } x_q} \big(f(x) - F(x)\big)^2$$

2 Minimize the squared error over the entire set D of training examples, while weighting the error of each training example by some decreasing function K of its distance from xq:

$$E_2(x_q) = \frac{1}{2} \sum_{x \in D} \big(f(x) - F(x)\big)^2\, K\big(d(x_q, x)\big)$$

3 Combine 1 and 2 (to reduce computational costs):

$$E_3(x_q) = \frac{1}{2} \sum_{x \,\in\, k \text{ nearest nbrs of } x_q} \big(f(x) - F(x)\big)^2\, K\big(d(x_q, x)\big)$$
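A sketch of approach 3 (weighted least squares on the k nearest neighbors); the Gaussian kernel K and its bandwidth are illustrative choices.

```python
import numpy as np

def lwr_predict(D_X, D_y, x_q, k=10):
    """Locally weighted linear regression: fit F(x) = w0 + w1 a1(x) + ...
    on the k nearest neighbors of x_q, weighting errors by a kernel K."""
    dist = np.sqrt(((D_X - x_q) ** 2).sum(axis=1))
    nearest = np.argsort(dist)[:k]
    h = np.median(dist[nearest]) + 1e-12            # bandwidth of the kernel
    Kw = np.exp(-(dist[nearest] / h) ** 2)          # decreasing function of distance
    A = np.hstack([np.ones((k, 1)), D_X[nearest]])  # design matrix with intercept
    sw = np.sqrt(Kw)                                # weighted least squares via sqrt weights
    coef, *_ = np.linalg.lstsq(sw[:, None] * A, sw * D_y[nearest], rcond=None)
    return np.r_[1.0, x_q] @ coef                   # evaluate F at the query point
```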

    D.P. Solomatine. Data-driven modelling (part 3). 30

Case-based reasoning (CBR)

instance-based learning, but the output is not real-valued: it is represented by symbolic descriptions

the methods used to retrieve similar instances are more elaborate (not just Euclidean distance)

Applications:

conceptual design of mechanical devices based on a stored library of previous designs (Sycara, 1992)

new legal cases based on previous rulings (Ashley, 1990)

selection of an appropriate hydrological model based on previous experience (Kukuric, 1997, PhD at IHE)


    D.P. Solomatine. Data-driven modelling (part 3). 31

Remarks on Lazy and Eager learning

Lazy methods: k-NN, locally weighted regression, CBR

Eager learners are "eager": before they observe the testing instance xq they have already built a global approximation of the target function.

Lazy learners:

defer the decision of how to generalize beyond the training data until each new instance is encountered,

when new examples come, the output is generated immediately on the basis of the nearest training examples

Lazy learners have a richer set of hypotheses: they select an appropriate hypothesis (e.g. a linear function) for each new instance

So lazy methods are better suited to customize to unknown future instances

    D.P. Solomatine. Data-driven modelling (part 3). 32

Fuzzy rule-based systems


    D.P. Solomatine. Data-driven modelling (part 3). 33

Fuzzy logic

introduced in 1965 by Lotfi Zadeh, University of California, Berkeley

Boolean logic is two-valued (False, True). Fuzzy logic is multi-valued (False ... AlmostFalse ... AlmostTrue ... True)

Fuzzy set theory deals with the degree of truth that an outcome belongs to a certain category (partial truth)

a fuzzy set A on a universe U: for any u ∈ U there is a corresponding real number μA(u) ∈ [0, 1] called the grade of membership of u belonging to A

the mapping μA: U → [0, 1] is called the membership function of A

    D.P. Solomatine. Data-driven modelling (part 3). 34

Example of an ordinary and a fuzzy set "tall people"


    D.P. Solomatine. Data-driven modelling (part 3). 35

Various shapes of membership functions

[α⁻, α⁺] is the support of the fuzzy set, α¹ is its kernel

[Figure: four membership function shapes: a) triangular, b) bell-shaped, c) dome-shaped, d) inverted cycloid]

    D.P. Solomatine. Data-driven modelling (part 3). 36

Example of a membership function "appropriate water level in the reservoir"

[Figure: membership function with its support and kernel indicated]


    D.P. Solomatine. Data-driven modelling (part 3). 37

Alpha-cut

The α-cut of a fuzzy set is the crisp set of all u with μA(u) ≥ α. Example: 0.5-cut = [4.5, 7.0]

    D.P. Solomatine. Data-driven modelling (part 3). 38

Fuzzy numbers

Special cases of fuzzy sets are fuzzy numbers.

A fuzzy subset A of the set of real numbers is called a fuzzy number if:

there is at least one z such that μA(z) = 1 (normality assumption)

for all real numbers a, b, c with a < c < b:

μA(c) ≥ min(μA(a), μA(b))

(convexity assumption, meaning that the membership function of a fuzzy number consists of an increasing and a decreasing part, and possibly flat parts)


    D.P. Solomatine. Data-driven modelling (part 3). 39

Linguistic variable: example

[Figure: the linguistic variable WATER LEVEL takes fuzzy values such as "Enough volume for flood detention", "Navigable" and "Environmentally friendly", linked by compatibility links (fuzzy restrictions) to the base variable water level (m) on the universe 0-50]

A linguistic variable can take linguistic values (like low, high, navigable) associated with fuzzy subsets M of the universe U (here U = [0, 50])

    D.P. Solomatine. Data-driven modelling (part 3). 40

Operations on fuzzy sets


    D.P. Solomatine. Data-driven modelling (part 3). 41

Fuzzy rules

Fuzzy rules are linguistic constructs of the type

IF A THEN B

where A and B are collections of propositions containing linguistic variables (i.e. variables with linguistic values). A is called the premise and B is the consequence of the rule.

If there are K premises in a system, the i-th rule has the form:

If a1 is A_{i,1} ⊗ a2 is A_{i,2} ⊗ ... ⊗ aK is A_{i,K} then B_i

where a is a crisp input, A and B are linguistic variables, and ⊗ is one of the operators AND, OR, XOR.

    D.P. Solomatine. Data-driven modelling (part 3). 42

Additive model of combining rules


    D.P. Solomatine. Data-driven modelling (part 3). 43

Fuzzy rule-based systems (FS)

use linguistic variables based on fuzzy logic

based on encoding relationships between variables in the form of rules

rules are generated through the analysis of large data samples

such rules are used to produce the values of the output variables given new input values

    D.P. Solomatine. Data-driven modelling (part 3). 44

Example: Fuzzy rules in control

[Figure: membership functions STOP, SLOW, MEDIUM, FAST, BLAST over AIR MOTOR SPEED (0-100) and COLD, COOL, RIGHT, WARM, HOT over TEMPERATURE °C (5-35); the fired rule responses are combined by the weighted sum or the crested weighted sum method and defuzzified using the centroid of the area]

rules like: IF Temperature is Cool THEN AirMotorSpeed := Slow

If Cold, then stop. If Cool, then slow. If Right, then medium. If Warm, then fast. If Hot, then blast.

Input: Temperature = 22. What will be the AirMotorSpeed?

Temperature is RIGHT with degree of fulfillment (DOF) = 0.6 and WARM with DOF = 0.2

two rules are fired


    D.P. Solomatine. Data-driven modelling (part 3). 45

Combining premises in a rule

Degree of fulfillment (DOF) is the extent to which the premise (left) part of a fuzzy rule is satisfied

The means to combine the memberships of the inputs to the corresponding fuzzy sets into a DOF is called inference

Product inference for rule i is defined as:

$$DOF_i = \prod_{k=1}^{K} \mu_{A_{i,k}}(a_k)$$

(the rule is sensitive to the change in the amount of truth contained in each premise)

Minimum inference for rule i is defined as:

$$DOF_i = \min_{k=1..K} \mu_{A_{i,k}}(a_k)$$

    D.P. Solomatine. Data-driven modelling (part 3). 46

Combining rules: example for 2 inputs

[Figure: rule table combining the fuzzy values (L, M, H) of two inputs into the fuzzy value of the output]


    D.P. Solomatine. Data-driven modelling (part 3). 47

Combining rules: weighted sum combination

the weighted sum combination uses the DOF of each rule as a weight

If there are I rules, each having a response fuzzy set Bi with DOF ωi, the combined membership function is

$$\mu_B(x) = \frac{\sum_{i=1}^{I} \omega_i\, \mu_{B_i}(x)}{\displaystyle \max_u \sum_{i=1}^{I} \omega_i\, \mu_{B_i}(u)}$$

[Figure: the weighted sum combination method over AIR MOTOR SPEED (0-100)]

    D.P. Solomatine. Data-driven modelling (part 3). 48

Combining rules: crested weighted sum combination

the crested weighted sum combination is obtained when each output membership function is clipped off at a height corresponding to the rule's degree of fulfillment

If there are I rules, each having a response fuzzy set Bi with DOF ωi, the combined membership function is

$$\mu_B(x) = \frac{\sum_{i=1}^{I} \min\big(\omega_i,\, \mu_{B_i}(x)\big)}{\displaystyle \max_u \sum_{i=1}^{I} \min\big(\omega_i,\, \mu_{B_i}(u)\big)}$$

[Figure: the crested weighted sum combination method over AIR MOTOR SPEED (0-100)]


    D.P. Solomatine. Data-driven modelling (part 3). 49

Combining rules: defuzzification

Defuzzification is a mapping from the fuzzy combination of consequences Bi to a crisp consequence

this is actually the identification of the fuzzy mean

the most widely used method is:

find the centroid (center of gravity) of the area below the membership function and take its abscissa as the crisp output.

[Figure: defuzzification of the weighted sum combination using the centroid of the area, over AIR MOTOR SPEED (0-100)]
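A minimal end-to-end sketch of the chain above (triangular membership functions, weighted sum combination, centroid defuzzification); the rule parameters and temperatures are illustrative, not taken from the slides.

```python
import numpy as np

def tri(x, a, b, c):
    """Triangular membership function with support (a, c) and kernel b."""
    return np.maximum(np.minimum((x - a) / (b - a), (c - x) / (c - b)), 0.0)

# illustrative rules: (temperature premise, motor-speed response)
rules = [
    ((15.0, 20.0, 25.0), (20.0, 40.0, 60.0)),   # IF Temperature is Right THEN Medium
    ((20.0, 25.0, 30.0), (40.0, 60.0, 80.0)),   # IF Temperature is Warm  THEN Fast
]

def infer(temp):
    speed = np.linspace(0.0, 100.0, 501)          # discretized output universe
    combined = np.zeros_like(speed)
    for (ta, tb, tc), (sa, sb, sc) in rules:
        dof = tri(np.asarray(temp, dtype=float), ta, tb, tc)  # DOF of the rule
        combined += dof * tri(speed, sa, sb, sc)  # weighted sum combination
    # defuzzification: abscissa of the centroid of the combined area
    return (speed * combined).sum() / combined.sum()

print(infer(22.0))   # crisp AirMotorSpeed for Temperature = 22
```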

    D.P. Solomatine. Data-driven modelling (part 3). 50

In the previous example the rules were given. But how can we build them from data?

the following is given/assumed:

the known rule structure, that is, the number of premises in each rule

the shapes of the membership functions

the number of rules

the training set T is given: a set of S observed input (a) and output (b) real-valued vectors:

$$T = \{\,(a_1(s), \ldots, a_K(s),\, b(s));\; s = 1, \ldots, S\,\}$$

It is assumed that we are training I rules with K premises in a system, where the i-th rule has the following form:

If a1 is A_{i,1} AND a2 is A_{i,2} AND ... AND aK is A_{i,K} then B_i

where a is a crisp input, and A and B are triangular fuzzy numbers.

the parameters of A and B (supports and kernels) are to be found


    D.P. Solomatine. Data-driven modelling (part 3). 51

Building rules from data: weighted counting algorithm (1)

    D.P. Solomatine. Data-driven modelling (part 3). 52

Building rules from data: weighted counting algorithm (2)

uses the subset of the training set that satisfies the premises of a rule at least to a degree of fulfilment threshold ε to construct the shape of the corresponding consequence

It is accomplished with the following steps (i is the rule number, k is the premise number)


    D.P. Solomatine. Data-driven modelling (part 3). 53

Building rules from data: weighted counting algorithm (3)

1 Define the support (α⁻_{i,k}, α⁺_{i,k}) of the i-th rule's premise A_{i,k}.

2 A_{i,k} is assumed to be a triangular fuzzy number (α⁻_{i,k}, α¹_{i,k}, α⁺_{i,k})_T, where α¹_{i,k} is the mean of all a_k(s) values which fulfil the i-th rule at least partially:

$$\alpha^1_{i,k} = \frac{1}{N_i} \sum_{s \in R_i} a_k(s)$$

3 Calculate the DOFs ω_i(s) for each premise vector (a1(s), ..., aK(s)) corresponding to the training set T and each rule i whose premises were determined in step 1.

4 Select a threshold ε > 0 such that only responses with DOF > ε will be considered in the construction of the rule response. The corresponding response is assumed to be also a triangular fuzzy number (β⁻_i, β¹_i, β⁺_i)_T defined by:

$$\beta_i^- = \min_{s:\,\omega_i(s) > \varepsilon} b(s), \qquad \beta_i^1 = \frac{\sum_{s:\,\omega_i(s) > \varepsilon} \omega_i(s)\, b(s)}{\sum_{s:\,\omega_i(s) > \varepsilon} \omega_i(s)}, \qquad \beta_i^+ = \max_{s:\,\omega_i(s) > \varepsilon} b(s)$$
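A compact sketch of steps 1-4 for single-premise rules; the supports chosen in step 1 and the threshold eps are illustrative inputs.

```python
import numpy as np

def tri_mf(x, lo, mid, hi):
    """Triangular membership function with support (lo, hi) and kernel mid."""
    return np.maximum(np.minimum((x - lo) / (mid - lo), (hi - x) / (hi - mid)), 0.0)

def weighted_counting(a, b, supports, eps=0.1):
    """a, b: 1-D arrays of observed inputs/outputs; supports: list of premise
    supports (alpha-, alpha+) from step 1. Returns triangular premise and
    response parameters for each rule."""
    rules = []
    for lo, hi in supports:
        part = (a > lo) & (a < hi)          # samples fulfilling the rule at least partially
        mid = a[part].mean()                # step 2: kernel = mean of those a(s)
        dof = tri_mf(a, lo, mid, hi)        # step 3: DOF of every training sample
        sel = dof > eps                     # step 4: keep responses with DOF > eps
        beta = (b[sel].min(),                                 # beta-
                (dof[sel] * b[sel]).sum() / dof[sel].sum(),   # beta1, weighted mean
                b[sel].max())                                 # beta+
        rules.append(((lo, mid, hi), beta))
    return rules
```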

    D.P. Solomatine. Data-driven modelling (part 3). 54

Fuzzy rule-based system: learning rules from data

[Figure: HISTORICAL DATA feed TRAINING, which produces the RULES (EXPERT JUDGEMENTS are not considered here); the CRISP INPUT (X) passes through a FUZZIFIER, the FUZZY INFERENCE ENGINE applies the rules, and a DEFUZZIFIER produces the CRISP OUTPUT (Y)]


    D.P. Solomatine. Data-driven modelling (part 3). 55

Modeling spatial rainfall distribution using a Fuzzy rule-based system:

filling missing data in past records

estimating rainfall depth at the station Caprile (based on data for Arabba and Andraz) in case of a sudden equipment failure

[Figure: map showing the stations Arabba, Andraz and Caprile]

Case study: catchment in the Veneto region, Italy

    D.P. Solomatine. Data-driven modelling (part 3). 56

Problem formulation

Daily precipitation at three stations in 1985-91

Data split for training and verification

Daily precipitation at Andraz & Arabba used to determine the daily precipitation at Caprile

Performance indices:

Mean square error (MSE) between modeled & observed data

Percentage of predictions within a predefined tolerance target (5% is used)

Problems:

missing records in training data

non-uniform distribution of data


    D.P. Solomatine. Data-driven modelling (part 3). 57

Methods considered

Traditional Normal ratio method:

$$P_X = \frac{1}{3}\left( \frac{N_X}{N_A} P_A + \frac{N_X}{N_B} P_B + \frac{N_X}{N_C} P_C \right)$$

Neural network

Fuzzy rule-based system
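A small worked sketch of the normal ratio formula; P holds the same-day precipitation and N the annual normals at stations A, B, C, with purely illustrative numbers.

```python
def normal_ratio(P, N, N_X):
    """Normal ratio estimate: P_X = (1/3) * sum over stations of (N_X / N_s) * P_s."""
    return sum((N_X / N[s]) * P[s] for s in P) / len(P)

P_X = normal_ratio(P={"A": 12.0, "B": 10.5, "C": 14.0},
                   N={"A": 900.0, "B": 850.0, "C": 1000.0},
                   N_X=950.0)
print(P_X)
```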

    D.P. Solomatine. Data-driven modelling (part 3). 58

How many rules to use?

Too many rules lead to overfitting and a higher error on verification.

[Figure: effect of the number of rules (4, 9, 16, 25, 36) on the mean square error for the training set 1988-91 (T) and the verification set 1985-87 (V)]


    D.P. Solomatine. Data-driven modelling (part 3). 59

Results: best performance

[Figure: scatter plots of simulated vs. observed precipitation (0-80) for the training period (1989-91) and the verification period (1985-88), and the precipitation at CAPRILE for the first 120 days of 1987]

    D.P. Solomatine. Data-driven modelling (part 3). 60

Veneto case study: comparison of fuzzy rules, neural network and the normal ratio method

[Figure: performance comparison (Case 1) of FRBS, NN and the traditional normal ratio method (TRAD): mean square error and percentage of predictions within the 5% tolerance, for the 1989-91 training period and the 1985-88 verification years]


    D.P. Solomatine. Data-driven modelling (part 3). 61

Veneto case study: conclusions

FRBS was more accurate than the ANN and the Normal ratio method

its training is faster than that of an ANN

Issues to pay attention to:

curse of dimensionality: more than 5 inputs is very difficult to handle

too many rules may cause overfitting

non-uniformly distributed data lead to empty areas where rules cannot be trained

    D.P. Solomatine. Data-driven modelling (part 3). 62

Case study Delfland: training an ANN or Fuzzy controller on data obtained from an optimal controller in water level control

[Figure: hydrological processes in the polders produce the water level y(t); the Aquarius optimal controller computes the control signal u(t) (pumping rate) from the target water level y(t)d; the ANN or FRBS model is trained on the error in the control signal]

a data-driven controller (ANN or Fuzzy rule-based system) is trained on data generated by the optimal controller, and can then replace it


    D.P. Solomatine. Data-driven modelling (part 3). 63

Case study: Delfland

    D.P. Solomatine. Data-driven modelling (part 3). 64

Replicating the controller by an ANN (output: pump status at time t)

Input variables in Local control:

water level at time t-1

water level at time t

pump status at time t-1

Input variables in Centralised dynamic control:

precipitation at time t-2

precipitation at time t-1

precipitation at time t

water level at time t-1

water level at time t

groundwater level at time t

pump status at time t-1


    D.P. Solomatine. Data-driven modelling (part 3). 65

Performance of the Neural network reproducing the behaviour of an optimal controller

[Figure: pump status over time as reproduced by the neural network]

    D.P. Solomatine. Data-driven modelling (part 3). 66

Fuzzy rules reproducing optimal control of the water level in Delfland

[Figure: pump status over time as reproduced by the fuzzy rules]


    D.P. Solomatine. Data-driven modelling (part 3). 67

Bayesian learning

    D.P. Solomatine. Data-driven modelling (part 3). 68

Bayes theorem

we are interested in determining the best hypothesis h from some space H, given the observed data D

Some notations:

P(h) = prior probability that hypothesis h holds

P(D) = prior probability that the training data D will be observed (without knowledge of which hypothesis holds)

P(D|h) = probability of observing data D given that h holds

P(h|D) = probability that h holds given the observed data D

Bayes theorem:

$$P(h|D) = \frac{P(D|h)\, P(h)}{P(D)}$$


    D.P. Solomatine. Data-driven modelling (part 3). 69

    Selecting "best" hypothesis usingSelecting "best" hypothesis using BayesBayes theoremtheorem

learning in the Bayesian sense: selecting the most probable hypothesis (maximum a posteriori hypothesis, MAP):

$$h_{MAP} = \arg\max_{h \in H} P(h|D) = \arg\max_{h \in H} \frac{P(D|h)\,P(h)}{P(D)} = \arg\max_{h \in H} P(D|h)\,P(h)$$

P(D|h) is called the likelihood of data D given h

if all hypotheses are equally probable, then the maximum likelihood (ML) hypothesis is:

$$h_{ML} = \arg\max_{h \in H} P(D|h)$$

    D.P. Solomatine. Data-driven modelling (part 3). 70

Bayesian learning: example

    hypothesis h = "patient has cancer", alternative = "no cancer"

    prior knowledge (without data): P(h)=0.008

    data that can be observed: test with 2 outcomes (+ or -):

    right results:

    P(+/cancer) = 0.98 P(-/nocancer) = 0.97

    errors:

    P(-/cancer) = 0.02 P(+/nocancer) = 0.03

    suppose data is observed: a patient is tested and result is +is then hypothesis correct?: choose hypothesis with MAP, that ishypothesis for which P(D/h)P(h) = max

    P(+/cancer) P(cancer) = 0.98 * 0.008 = 0.0078

    P(+/nocancer) P(nocancer) = 0.03 * 0.992 = 0.0298

    --> hypothesis "no cancer" wins
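The same MAP computation in a few lines; the dictionaries just restate the numbers from the slide.

```python
priors = {"cancer": 0.008, "nocancer": 0.992}
pos_likelihood = {"cancer": 0.98, "nocancer": 0.03}   # P(+ | h)

# unnormalized posteriors P(+ | h) * P(h) for the observed '+' result
scores = {h: pos_likelihood[h] * priors[h] for h in priors}
print(scores)                       # {'cancer': 0.00784, 'nocancer': 0.02976}
print(max(scores, key=scores.get))  # 'nocancer' -> the "no cancer" hypothesis wins
```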


    D.P. Solomatine. Data-driven modelling (part 3). 71

Naive Bayes classifier

assume that each instance x of the data set is characterized by several attributes {a1, ..., an}

the target function F(x) can take on any value from a finite set V

a set of training examples {xi} is provided

when a new instance <a1, ..., an> is presented, the classifier should identify the most probable target value vMAP.

    D.P. Solomatine. Data-driven modelling (part 3). 72

Naive Bayes classifier (2)

This condition can be written as:

$$v_{MAP} = \arg\max_{v_j \in V} P(v_j \mid a_1, \ldots, a_n)$$

or, by applying Bayes theorem:

$$v_{MAP} = \arg\max_{v_j \in V} \frac{P(a_1, \ldots, a_n \mid v_j)\, P(v_j)}{P(a_1, \ldots, a_n)} = \arg\max_{v_j \in V} P(a_1, \ldots, a_n \mid v_j)\, P(v_j)$$

P(vj) can be estimated simply by counting the frequency with which each target value vj occurs in the data


    D.P. Solomatine. Data-driven modelling (part 3). 73

Naive Bayes classifier (3)

the terms P(a1, ..., an | vj) can be estimated by counting in a similar way; however, the total number of these terms is equal to the number of possible instances times the number of possible target values, so this is difficult.

The solution is a simplifying assumption: the attribute values a1, ..., an are conditionally independent given the target value. In this case P(a1, ..., an | vj) = ∏i P(ai | vj), and estimating P(ai | vj) is much easier, also by counting frequencies.

This gives the rule of the naive Bayes classifier:

$$v_{NB} = \arg\max_{v_j \in V} P(v_j) \prod_i P(a_i \mid v_j)$$
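A counting-based sketch of the classifier just derived; frequency estimates only, with no smoothing for unseen attribute values.

```python
from collections import Counter, defaultdict

def train_nb(X, y):
    """X: list of attribute tuples, y: target values. Returns the counts
    needed for the frequency estimates of P(v) and P(a_i | v)."""
    priors = Counter(y)
    cond = defaultdict(Counter)            # (class, attribute index) -> value counts
    for xs, v in zip(X, y):
        for i, a in enumerate(xs):
            cond[(v, i)][a] += 1
    return priors, cond, len(y)

def classify_nb(xs, priors, cond, S):
    """Return argmax over v of P(v) * prod_i P(a_i | v)."""
    best_v, best_p = None, -1.0
    for v, nv in priors.items():
        p = nv / S                         # frequency estimate of P(v)
        for i, a in enumerate(xs):
            p *= cond[(v, i)][a] / nv      # frequency estimate of P(a_i | v)
        if p > best_p:
            best_v, best_p = v, p
    return best_v
```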

    D.P. Solomatine. Data-driven modelling (part 3). 74

Modular models: committee machines, ensembles, mixtures of experts, boosting


    D.P. Solomatine. Data-driven modelling (part 3). 75

Committee machine (modular model)

Instead of building one model, several models are built, each responsible for a particular situation.

Consider a forecasting model Q(t+1) = f(R(t-2), R(t-3), Q(t-1)).

[Figure: past records in the space Rainfall(t-3), Rainfall(t-2), Flow Q(t) are split into high, medium and low flows, and separate models are built; a new record (hydrometeorological condition) is attributed to one (or several) classes, and the corresponding models are run]

    D.P. Solomatine. Data-driven modelling (part 3). 76

Committee machine (modular model)


    D.P. Solomatine. Data-driven modelling (part 3). 77

Committee machines (modular model)

input data is split into subsets and separate data-driven models are trained:

hard split: sort according to the position in the input space (low - high rainfall); this allows bringing in physical insight

no split: do not sort, but train several models on the same data and then combine the results by some voting scheme (committee machine)

voting by majority, weighted majority, or by averaging

soft split: split according to how well a given model trained with this data, and then train also other models. Example: boosting (a code sketch follows after the boosting diagram below):

present the original training data set (N examples) to machine 1

assign higher probability to samples that are badly classified; sample N examples from the training set based on the new distribution

train machine 2

continue, ending with n machines

    D.P. Solomatine. Data-driven modelling (part 3). 78

Committee machine with hard split, expert (specialised) models trained on subsets

[Figure: a splitting (gating) machine routes the input x to machines 1 ... n, which produce the outputs y1 ... yn]


    D.P. Solomatine. Data-driven modelling (part 3). 79

Committee machine with no split (ensemble), all models are trained on the same set

[Figure: the input x feeds machines 1 ... n in parallel (no splitting); their outputs y1 ... yn pass through a combiner (averaging scheme) to produce y]

    D.P. Solomatine. Data-driven modelling (part 3). 80

Committee machine with soft split of data. Boosting

[Figure: machines 1 ... n are trained in sequence; before each machine, N training examples are sampled from a distribution in which badly predicted examples are given higher probability (redistribution); the outputs y1 ... yn are combined by a weighted averaging scheme into y]
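A sketch of the resampling idea in the diagram above; the doubling of the probability of badly predicted examples and the train_machine interface are illustrative assumptions, not a specific published boosting variant.

```python
import numpy as np

def boost_by_resampling(X, y, train_machine, n_machines=5, seed=0):
    """Train n machines in sequence; each sees N examples resampled from a
    distribution that favors previously badly predicted examples.
    train_machine(X, y) must return a predict function (assumed interface)."""
    rng = np.random.default_rng(seed)
    N = len(X)
    p = np.full(N, 1.0 / N)                # start from the uniform distribution
    machines = []
    for _ in range(n_machines):
        idx = rng.choice(N, size=N, p=p)   # sample N examples from the distribution
        m = train_machine(X[idx], y[idx])
        machines.append(m)
        wrong = m(X) != y                  # which original examples are mispredicted
        p = np.where(wrong, 2.0 * p, p)    # redistribution: raise their probability
        p /= p.sum()
    return machines
```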


    D.P. Solomatine. Data-driven modelling (part 3). 81

Using mixtures of experts (models): each model is for a particular hydrological condition

[Figure: a decision tree routes inputs by hydrological condition using thresholds such as Pa(t-1) > 50 and Pa Mov2(t-2) at 200 (Condition 3); each Y/N branch leads to Module 1 or Module 2, each of which can be an M5 model tree or an ANN]

    D.P. Solomatine. Data-driven modelling (part 3). 82

Combining physically-based and data-driven models. Complementary use of a data-driven model

[Figure: input data and model parameters feed a hydrologic forecasting model of the physical system; the model errors with respect to the observed output train a data-driven error forecasting model, and the forecasted errors are combined with the model output to produce the improved output]


    D.P. Solomatine. Data-driven modelling (part 3). 83

End of Part 3