
Quantifying Knowledge
Fouad Chedid
Department of Computer Science, Notre Dame University, Lebanon


Page 1

Information Content Versus Knowledge

• While the information content of a string x can be measured by its Kolmogorov complexity K(x), it is not clear how to measure the knowledge stored in x.

• We argue that the knowledge contained in a string x is relative to the hypothesis assumed to compute x.

Page 2

• If H is the hypothesis used to explain x, then we suggest measuring the knowledge in x by K(H).

• The absolute knowledge in x is K(H_0), where H_0 is a simplest hypothesis for x.

Page 3

•Using Bayes’ rule and Solomonoff’s universal distribution, we obtain

•K(x) = K(H) + K(x|H) – K(H|x).

•Here one would expect H to be consistent with x and so K(H|x) to be minimal.

•Discarding K(H|x) gives

•K(x) = K(H) + K(x|H).
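As a sketch of where this identity comes from (assuming the coding-theorem correspondence P(·) ≈ 2^(−K(·)) between Solomonoff's universal distribution and Kolmogorov complexity, with all equalities holding up to additive constants):

\begin{align*}
P(H \mid x) &= \frac{P(H)\,P(x \mid H)}{P(x)}
  && \text{(Bayes' rule)} \\
2^{-K(H \mid x)} &\approx \frac{2^{-K(H)}\,2^{-K(x \mid H)}}{2^{-K(x)}}
  && \text{(substitute } P(\cdot) \approx 2^{-K(\cdot)}\text{)} \\
K(x) &= K(H) + K(x \mid H) - K(H \mid x)
  && \text{(take } -\log_2 \text{ and rearrange)}
\end{align*}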

Page 4

•We interpret K(H) as a measure of the knowledge part in x relative to H.

•K(x|H) is a measure of the accidental information (noise) in x relative to H.

Page 5

A Simple Example

• Suppose we record our observations of an ongoing phenomenon and stop gathering data at time t_1, after having obtained the segment

x = 10101010101010

• The information in x (= K(x)) is about the number of bits in a shortest program for x, something like:

for i in range(7):
    print("10", end="")   # emits x = 10101010101010

Page 6

•This program assumes the hypothesis

H = “x contains the repeating element 10”.

It is this H that we call the knowledge in x. The amount K(x|H), which is about log 7 (the bits needed to specify the repetition count given H), measures the amount of noise in x under H.
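A minimal Python sketch of this accounting (the variable names are ours; log 7 is just the cost of specifying the repetition count once H is assumed):

import math

x = "10" * 7                    # the observed segment 10101010101010
count = len(x) // len("10")     # given H, only the repetition count remains
noise_bits = math.log2(count)   # K(x|H) is about log 7 ~ 2.8 bits
print(f"repetitions: {count}, noise under H: about {noise_bits:.1f} bits")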

• Other hypotheses exist that trade off the amount of knowledge against the level of noise that can be tolerated; the right trade-off is application-dependent.

Page 7

• This work is similar in spirit to Kolmogorov's 1974 proposal to found statistical theory on finite combinatorial and computational principles, independent of probabilistic assumptions, by expressing the relation between the individual data and its explanation (model or hypothesis) through Kolmogorov's structure function.

Page 8

Kolmogorov's Approach to Non-probabilistic Statistics

• Kolmogorov expressed the relation between an individual data sample and a specific constrained data model through his structure function Φ_x(·).

• Let the data be finite binary strings and the models be finite sets of binary strings. Consider model classes consisting of models of a given maximal Kolmogorov complexity.

Page 9

Kolmogorov's structure function Φ_x(k) of the given data x expresses the relation between the complexity constraint k on a model S and the least log-cardinality of a model containing the data:

Φ_x(k) = min_S { log |S| : S ∋ x, K(S) ≤ k }.
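A toy Python sketch of this definition. The model class and the description-length proxy for K(S) are illustrative inventions (the true K is uncomputable); the point is only that Φ_x(k) drops as the complexity budget k grows:

import math
from itertools import product

x = "10101010"
n = len(x)

# Candidate models: (description, membership test). We charge
# K(S) ~ len(description) as a crude, hypothetical complexity proxy.
models = [
    ("any",          lambda s: True),                       # all n-bit strings
    ("(10)*",        lambda s: s == "10" * (len(s) // 2)),  # alternating strings
    ("balanced",     lambda s: s.count("0") == s.count("1")),
    ("literal " + x, lambda s: s == x),                     # the singleton {x}
]

def phi(k: int) -> float:
    """Least log2|S| over models S with x in S and K(S) <= k."""
    best = math.inf
    for desc, member in models:
        if len(desc) > k or not member(x):
            continue
        size = sum(member("".join(bits)) for bits in product("01", repeat=n))
        best = min(best, math.log2(size))
    return best

for k in (3, 5, 12):
    print(f"phi_x({k}) = {phi(k):.2f}")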

Page 10

Kolmogorov explains …

To each constructive object x corresponds a function Φ_x(k) of a natural number k: the log of minimal cardinality of x-containing sets that allow definitions of complexity at most k. If the element x itself allows a simple definition, then the function Φ drops to 1 even for small k. Lacking such a definition, the element is random in the negative sense. But it is positively probabilistically random only when the function Φ, having taken the value Φ_0 at a relatively small k = k_0, then changes approximately as

Φ_x(k) = Φ_0 − (k − k_0).

Page 11

• This function Φ_x(k), its variants, and its relation to model selection have been the subject of numerous publications, but in my opinion it has not previously been well understood.

• We view Kolmogorov's structure function as a rewrite of Bayes' rule using Solomonoff's universal distribution, as explained earlier.

Page 12

Understanding Kolmogorov's Structure Function

• Φ_x(k), the log of minimal cardinality of x-containing sets that allow definitions of complexity at most k, is a particular case of K(x|H), where H is a finite set containing x and K(H) ≤ k.

• Thus we interpret Φ_x(k) as a measure of the amount of accidental information (noise) in x when x is bound to a model of Kolmogorov complexity at most k.

• If x is typical of a finite set S, then we expect K(x|S) to be about log |S|.
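This expectation is the standard index argument: given S, the string x can be specified by its index in a canonical enumeration of S, so

\[
K(x \mid S) \le \log |S| + O(1),
\]

and typicality of x in S means the bound is nearly tight: x has no regularity that would allow a shorter description beyond its membership in S.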

Page 13

• The terms Φ_0 and k_0 in Kolmogorov's structure function correspond to a hypothesis H_0 of small Kolmogorov complexity k_0 that explains nothing about x.

• In this case I(H_0 : x) = 0, which leads to K(x|H_0) = K(x) and K(H_0|x) = K(H_0) = k_0. So the approximation

K(x) = K(H) + K(x|H) − K(H|x),

or equivalently

K(x|H) = K(x) − (K(H) − K(H|x)),

degenerates to Kolmogorov's structure function

Φ_x(k) = Φ_0 − (k − k_0).
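A sketch of the substitution (assuming that for k ≥ k_0 there are hypotheses H with K(H) = k and K(H|x) ≈ 0, and writing Φ_0 = Φ_x(k_0) ≈ K(x) − k_0, so that the value at k_0 sits near the sufficiency line; everything holds up to small additive terms):

\begin{align*}
\Phi_x(k) \approx K(x \mid H) &= K(x) - K(H) + K(H \mid x) \\
&\approx K(x) - k \\
&= (K(x) - k_0) - (k - k_0) \\
&= \Phi_0 - (k - k_0).
\end{align*}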

Page 14

• In general, taking H to be a hypothesis for x that attains Φ_x(k) with K(H) ≤ k, we have K(x) = K(x|H) + K(H) − K(H|x) ≤ K(x|H) + K(H) ≤ Φ_x(k) + k. Thus Φ_x(k) ≥ K(x) − k.

• This explains why Kolmogorov drew a picture of Φ_x(k) as a function of k monotonically approaching the diagonal (the sufficiency line, where Φ_x(k) + k = K(x)).

• This diagonal corresponds to the minimum value of Φ_x(k), attained when there exists some H of Kolmogorov complexity at most k such that K(x) = k + Φ_x(k).

Page 15

• Such an H, with k = K(H), is called a sufficient statistic for x, and the expression k + Φ_x(k) is treated as a two-part code separating the meaningful information in x, represented by k, from the meaningless accidental information (noise) in x under the hypothesis H.

Page 16

A Simple Derivation of a Fundamental Result

• Vitányi's Best-Fit Function: the randomness deficiency δ(x|S) of a string x in the set S is defined by δ(x|S) = log |S| − K(x|S) for x ∈ S, and ∞ otherwise.

• The minimal randomness deficiency function is β_x(k) = min_S { δ(x|S) : S ∋ x, K(S) ≤ k }.

• A model S for which x incurs deficiency β_x(k) is a best-fit model. We say S is optimal for x, and then K(S|x) ≈ 0.

Page 17

•Rissanen’s Minimum Description Length Function: Consider the two-part code for x consisting of the constrained model cost K(S) and the length of the index of x in S. The MDL function is

λ_x(k) = min_S { K(S) + log |S| : S ∋ x, K(S) ≤ k }.
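Continuing the toy sketch from page 9 (same hypothetical x, n, and models, with K(S) ~ len(description) as the stand-in complexity), the MDL function simply adds the model cost to the log-cardinality:

import math
from itertools import product

# Reuses `x`, `n`, and `models` from the toy sketch on page 9.

def mdl(k: int) -> float:
    """Least K(S) + log2|S| over models S with x in S and K(S) <= k."""
    best = math.inf
    for desc, member in models:
        if len(desc) > k or not member(x):
            continue
        size = sum(member("".join(bits)) for bits in product("01", repeat=n))
        best = min(best, len(desc) + math.log2(size))
    return best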

Page 18

• The results in [Vereshchagin and Vitányi 2004] are obtained by analyzing the relation between the three structure functions Φ_x(k), β_x(k), and λ_x(k). The most fundamental result there is the equality

β_x(k) = Φ_x(k) + k − K(x) = λ_x(k) − K(x),

which holds within additive terms that are logarithmic in |x|.

• This result improves a previous result by Gács, Tromp, and Vitányi (2001), in which it was proven that

β_x(k) ≤ Φ_x(k) + k − K(x) + O(1),

and where the authors mentioned that it would be nice to have an inequality in the other direction as well.

Page 19

• We understand the structure functions Φ_x(k) and β_x(k) as being equivalent to K(x|S) and K(S|x), respectively, where K(S) ≤ k.

•Using the approximation

K(x) = K(S) + K(x|S) – K(S|x)

or equivalently

K(x|S) + K(S) = K(x) + K(S|x)

gives the equality

Φ_x(k) + k = K(x) + β_x(k) = λ_x(k).
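Spelled out, with Φ_x(k) standing in for K(x|S), β_x(k) for K(S|x), K(S) = k, and λ_x(k) = K(S) + log|S| ≈ k + Φ_x(k):

\begin{align*}
K(x \mid S) + K(S) &= K(x) + K(S \mid x) \\
\Phi_x(k) + k &= K(x) + \beta_x(k) = \lambda_x(k).
\end{align*}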

Page 20

• We mention that the approach used in the previous two references relies on a much more complicated argument, in which a shortest program for a string x is assumed to be divisible into two parts, the model part (K(S)) and the data-to-model part (K(x|S)); such a division is very difficult to carry out.

• Gács and Vitányi credited the Invariance Theorem for such a deep and useful fact. This view led to the equation

K(x) = min_T { K(T) + K(x|T) : T ∈ {T_0, T_1, …} },

which holds up to additive constants, where T_0, T_1, … is the standard enumeration of Turing machines.

Page 21

• The whole theory of algorithmic statistics is based on this interpretation of K(x) as the length of a shortest two-part code for x.

•We argue that the use of the Invariance Theorem to suggest a two-part code of an object is too artificial.

•We prefer to use the three-part code suggested by Bayes’ rule and think of the two-part code as an approximation of the three-part code in which the model is considered to be optimal.

Page 22

Thank You