In the Name of God
Statistical Learning Theory, Chapter 2: Consistency of Learning Processes
Author: Vladimir N. Vapnik
Presented by: M.A. Keyvanrad
Supervisor: Dr. Shiry
1. Introduction

Consistency of learning processes: to explain when a learning machine that minimizes empirical risk can achieve a small value of actual risk (i.e., generalize) and when it cannot. Equivalently, to describe necessary and sufficient conditions for the consistency of learning processes that minimize the empirical risk. It is extremely important to use concepts that describe necessary and sufficient conditions for consistency: this guarantees that the constructed theory is general and cannot be improved from the conceptual point of view.
Slide 3
Slide 4
The classical definition of consistency: let Q(z, α_l) be the function that minimizes the empirical risk functional

R_emp(α) = (1/l) Σ_{i=1..l} Q(z_i, α)

for a given set of i.i.d. observations z_1, ..., z_l.
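The definition above can be sketched in code. The following is a minimal illustration, not Vapnik's own construction: it assumes a hypothetical finite class of threshold indicator functions and a synthetic labeled sample, and picks the function Q(z, α_l) that minimizes the empirical risk functional over the class.

```python
import random

def empirical_risk(f, sample):
    """R_emp(f) = (1/l) * sum of the 0-1 loss of f over the l observations."""
    return sum(f(x) != y for x, y in sample) / len(sample)

def erm(function_class, sample):
    """Return the function in the class that minimizes the empirical risk."""
    return min(function_class, key=lambda f: empirical_risk(f, sample))

random.seed(0)
# Hypothetical finite class: threshold indicators f_t(x) = 1 if x >= t.
function_class = [lambda x, t=t: int(x >= t) for t in
                  [i / 10 for i in range(11)]]

# Synthetic i.i.d. sample: labels from the threshold 0.5, with 10% label noise.
sample = []
for _ in range(200):
    x = random.random()
    y = int(x >= 0.5) if random.random() > 0.1 else 1 - int(x >= 0.5)
    sample.append((x, y))

f_l = erm(function_class, sample)
print("empirical risk of the ERM minimizer:", empirical_risk(f_l, sample))
```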
Slide 5
The classical definition of consistency: the ERM principle is consistent for the set of functions Q(z, α), α ∈ Λ, and for the probability distribution function F(z) if the following two sequences converge in probability to the same limit:

R(α_l) → inf_{α∈Λ} R(α) as l → ∞,   (1)
R_emp(α_l) → inf_{α∈Λ} R(α) as l → ∞.   (2)

Equation (1) asserts that the values of achieved risks converge to the best possible one. Equation (2) asserts that one can estimate, on the basis of the values of the empirical risk, the minimal possible value of the risk.
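Both convergences in the definition can be observed numerically. The sketch below assumes a hypothetical class of threshold classifiers on uniform data with a known noise rate, so the true risk R(t) = η + (1 − 2η)|t − 0.5| can be computed in closed form; as the sample size l grows, both the actual risk and the empirical risk of the ERM minimizer approach the best possible value inf R(α) = η.

```python
import random

random.seed(1)
ETA = 0.1  # assumed label-noise rate, so inf R(alpha) = ETA

def true_risk(t):
    # For x ~ Uniform[0,1], true threshold 0.5, noise ETA:
    # R(t) = ETA + (1 - 2*ETA) * |t - 0.5|
    return ETA + (1 - 2 * ETA) * abs(t - 0.5)

def draw_sample(l):
    s = []
    for _ in range(l):
        x = random.random()
        y = int(x >= 0.5)
        if random.random() < ETA:
            y = 1 - y
        s.append((x, y))
    return s

def emp_risk(t, sample):
    return sum(int(x >= t) != y for x, y in sample) / len(sample)

thresholds = [i / 100 for i in range(101)]
best_possible = min(true_risk(t) for t in thresholds)  # = ETA

for l in (10, 100, 10000):
    sample = draw_sample(l)
    t_l = min(thresholds, key=lambda t: emp_risk(t, sample))
    # R(alpha_l) and R_emp(alpha_l) should both approach best_possible
    print(l, round(true_risk(t_l), 3), round(emp_risk(t_l, sample), 3))
```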
Slide 6
The classical definition of consistency
Slide 7
Goal: to obtain conditions of consistency for the ERM method in terms of general characteristics of the set of functions and the probability measure. Under the classical definition this is an impossible task, because that definition includes cases of trivial consistency.
Slide 8
Trivial consistency
Slide 9
Slide 10
ERM consistency: in order to create a theory of consistency of the ERM method that depends only on the general properties (capacity) of the set of functions, a definition of consistency that excludes the trivial cases is needed. This is achieved by the definition of non-trivial (strict) consistency.
Slide 11
Non-trivial consistency
Slide 12
Slide 13
Key theorem of learning theory
Slide 14
Consistency of the ERM principle
Slide 15
Consistency of the ERM principle: the key theorem states that consistency is equivalent to uniform one-sided convergence of empirical risks to actual risks, in contrast to the so-called uniform two-sided convergence defined by the equation

lim_{l→∞} P{ sup_{α∈Λ} |R(α) − R_emp(α)| > ε } = 0,  ∀ε > 0.
Slide 16
Consistency of the ML Method
Slide 17
Consistency of the ML Method
Slide 18
Slide 19
Empirical process
Slide 20
Consistency of an empirical process: the necessary and sufficient condition for the empirical process sup_{α∈Λ} |R(α) − R_emp(α)| to converge in probability to zero is that the entropy equality

lim_{l→∞} H^Λ(ε, l) / l = 0,  ∀ε > 0

holds, where H^Λ(ε, l) is the VC entropy of the set of functions Q(z, α), α ∈ Λ.
Slide 21
Law of large numbers and its generalization
Slide 22
Law of large numbers and its generalization
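The distinction at stake here can be sketched numerically: the classical law of large numbers concerns the deviation |R_emp − R| for one fixed function, while its generalization requires the supremum of this deviation over the whole set of functions to vanish. The sketch below is my own illustration with an assumed finite set of threshold indicators on uniform data, where the true means are known analytically (E[1{z ≥ t}] = 1 − t).

```python
import random

random.seed(3)

thresholds = [i / 20 for i in range(21)]

def true_mean(t):
    # E[Q(z, t)] with Q(z, t) = 1 if z >= t, for z ~ Uniform[0, 1]
    return 1 - t

def deviations(l):
    """Per-function deviations |empirical mean - true mean| on one sample."""
    sample = [random.random() for _ in range(l)]
    devs = []
    for t in thresholds:
        emp = sum(z >= t for z in sample) / l
        devs.append(abs(emp - true_mean(t)))
    return devs

for l in (10, 100, 10000):
    d = deviations(l)
    # one fixed function (classical LLN) vs. sup over the set (generalization)
    print(l, round(d[10], 4), round(max(d), 4))
```

For a finite set both quantities vanish as l grows; the theory above is about when the supremum still vanishes for infinite sets.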
Slide 23
Entropy
Slide 24
Entropy of the set of indicator functions (Diversity)
Slide 25
Entropy of the set of indicator functions (Diversity)
Slide 26
Entropy of the set of indicator functions (Random entropy)
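The quantities named in these slide titles can be computed directly for small cases: the diversity N^Λ(z_1, ..., z_l) is the number of distinct dichotomies the set of indicator functions induces on a given sample, and the random entropy is H^Λ(z_1, ..., z_l) = ln N^Λ(z_1, ..., z_l). Below is a small sketch assuming a dense grid of threshold functions as a stand-in for the continuum of thresholds.

```python
import math
import random

random.seed(4)

def dichotomies(points, function_class):
    """Distinct labelings (dichotomies) the class induces on the sample."""
    return {tuple(f(z) for z in points) for f in function_class}

# Threshold indicator functions on [0, 1] (a fine grid standing in for the
# continuum of possible thresholds).
function_class = [lambda z, t=t: int(z >= t) for t in
                  [i / 1000 for i in range(1001)]]

l = 8
points = sorted(random.random() for _ in range(l))
N = len(dichotomies(points, function_class))
H = math.log(N)  # random entropy H = ln N^Lambda(z_1, ..., z_l)
# threshold functions realize at most l + 1 dichotomies on l points
print(N, round(H, 3))
```

The slow (logarithmic, rather than linear) growth of such counts with l is exactly what the entropy conditions in the following slides quantify.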
Slide 27
Entropy of the set of real functions (Diversity)
Slide 28
Slide 29
VC Entropy of the set of real functions
Slide 30
VC Entropy of the set of real functions (Random entropy and VC entropy)
Slide 31
Conditions for uniform two-sided convergence
Slide 32
Conditions for uniform two-sided convergence (Corollary)
Slide 33
Slide 34
Uniform one-sided convergence: uniform two-sided convergence can be described as the simultaneous fulfillment of the two conditions

lim_{l→∞} P{ sup_{α∈Λ} (R(α) − R_emp(α)) > ε } = 0,
lim_{l→∞} P{ sup_{α∈Λ} (R_emp(α) − R(α)) > ε } = 0,   (9)

and thus includes uniform one-sided convergence (the first condition), which is a sufficient condition for ERM consistency. For consistency of the ERM method, however, the second condition of (9) can be violated.
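The decomposition of two-sided convergence into two one-sided conditions can be checked numerically: the two-sided supremum sup |R − R_emp| is exactly the larger of the two one-sided suprema. The sketch below is my own illustration with an assumed finite class of threshold functions on uniform data, where R(t) = 1 − t is known in closed form.

```python
import random

random.seed(5)

thresholds = [i / 20 for i in range(21)]

def true_risk(t):
    # R(t) = E[Q(z, t)] with Q(z, t) = 1 if z >= t, z ~ Uniform[0, 1]
    return 1 - t

for l in (100, 10000):
    sample = [random.random() for _ in range(l)]
    emp = {t: sum(z >= t for z in sample) / l for t in thresholds}
    one_sided = max(true_risk(t) - emp[t] for t in thresholds)
    reverse_side = max(emp[t] - true_risk(t) for t in thresholds)
    two_sided = max(abs(true_risk(t) - emp[t]) for t in thresholds)
    # the two-sided supremum equals the larger of the two one-sided suprema
    assert abs(two_sided - max(one_sided, reverse_side)) < 1e-12
    print(l, round(one_sided, 4), round(reverse_side, 4))
```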
Slide 35
Conditions for uniform one-sided convergence
Slide 36
Conditions for uniform one-sided convergence
Slide 37
Slide 38
Models of reasoning.
Deductive: moving from the general to the particular. The ideal approach is to obtain corollaries (consequences) using a system of axioms and inference rules; this guarantees that true consequences are obtained from true premises.
Inductive: moving from the particular to the general; the formation of general judgments from particular assertions. Judgments obtained from particular assertions are not always true.
Slide 39
Demarcation problem: proposed by Kant, it is a central question of inductive theory. What is the difference between the cases with a justified inductive step and those for which the inductive step is not justified? Is there a formal way to distinguish between true theories and false theories?
Slide 40
Example: assume that meteorology is a true theory and astrology is a false one. What is the formal difference between them? The complexity of their models? The predictive ability of their models? Their use of mathematics? The level of formality of inference? None of these gives a clear advantage to either theory.
Slide 41
Criterion for demarcation: suggested by Popper in the 1930s, a necessary condition for the justifiability of a theory is the feasibility of its falsification. By falsification, Popper means the existence of a collection of particular assertions that cannot be explained by the given theory although they fall into its domain. If the given theory can be falsified, it satisfies the necessary condition of a scientific theory.
Slide 42
Slide 43
Nature of the ERM principle: what happens if the condition of one-sided convergence (Theorem 11) is not valid? Why is the ERM method not consistent in this case? Answer: if uniform two-sided convergence does not take place, then the method of minimizing the empirical risk is non-falsifiable.
Slide 44
Complete (Popper's) non-falsifiability
Slide 45
Complete (Popper's) non-falsifiability
Slide 46
Complete (Popper's) non-falsifiability
Slide 47
Partial non-falsifiability
Slide 48
Partial non-falsifiability
Slide 49
Potential non-falsifiability (Definition)
Slide 50
Potential non-falsifiability (Graphical representation): a potentially non-falsifiable learning machine
Slide 51
Potential non-falsifiability (Generalization)
Slide 52
Potential non-falsifiability
Slide 53
References
Vapnik, Vladimir N., The Nature of Statistical Learning Theory, Springer, 2000.
Veganzones, Miguel A., Consistency and Bounds on the Rate of Convergence for ERM Methods.