Ivo D. Dinov, UCLA Statistics, http://www.stat.ucla.edu/~dinov, Courses & Students
An Introduction to Information Theory and Entropy¹

1. Measuring complexity
Researchers in the field of complexity face a classic problem: How can we tell that the system we are looking at is actually a complex system? Should we even be studying such a system? Of course, in practice, we will study the systems that interest us, for whatever reasons, so the problem identified above tends not to be a real problem. On the other hand, having chosen a system to study, we might well ask: How complex is this system?
In this more general context, we probably want at least to be able to compare two systems, and be able to say that system A is more complex than system B. Eventually, we probably would like to have some sort of numerical rating scale.
We can't expect to be able to come up with a single universal measure of complexity. The best we are likely to have is a measuring system useful to a particular observer, in a particular context, for a particular purpose.
Our focus here will be on measures related to how surprising or unexpected an observation, or event, is. This approach has been described as information theory.
2. Some probability background

There are two main notions of the probability of an event happening. These are:
¹ Based on references at the end of the manuscript.
A frequentist version of probability: In this version, we assume we have a set of possible events, each of which we assume occurs some number of times. Thus, if there are N distinct possible events (x_1, x_2, …, x_N), no two of which can occur simultaneously, and the events occur with frequencies (n_1, n_2, …, n_N), we say that the probability of event x_i is given by

P(x_i) = n_i / Σ_{j=1}^{N} n_j.

This definition has the nice property that Σ_{i=1}^{N} P(x_i) = 1.
An observer-relative (Bayesian) version of probability: In this version, we take a statement of probability to be an assertion about the belief that a specific observer has of the occurrence of a specific event. Note that in this version of probability, it is possible that two different observers may assign different probabilities to the same event. Furthermore, the probability of an event is likely to change as we learn more about the event, or the context of the event.
In some cases, we may be able to find a reasonable correspondence between these two views of probability. In particular, we may sometimes be able to understand the observer-relative version of the probability of an event to be an approximation to the frequentist version, and to view new knowledge as providing us a better estimate of the relative frequencies.
Some probability basics, where A and B are events:

P(~A) = P(A^c) = 1 - P(A)
P(A ∪ B) = P(A) + P(B) - P(A ∩ B).

We will often denote P(A ∩ B) by P(A, B). If P(A, B) = 0, we say A and B are mutually exclusive events.
Conditional probability: P(A | B) is the probability of A, given that we know B occurred. The joint probability of both A and B is given by: P(A, B) = P(A | B) × P(B). Since P(A, B) = P(B, A), we have Bayes' Theorem:

P(A | B) × P(B) = P(B | A) × P(A), so P(A | B) = P(B | A) × P(A) / P(B).
If two events A and B are such that P(A | B) = P(A), we say that the events A and B are independent. From Bayes' Theorem, we will also have that P(B | A) = P(B), and furthermore, P(A, B) = P(A | B) × P(B) = P(A) × P(B). This last equation is often taken as the definition of independence.
We have in essence begun here the development of a mathematical methodology for drawing inferences about the world from uncertain knowledge. We could say that our observation of the coin showing heads gives us information about the world. We now develop a formal mathematical definition of the information content of an event, which occurs with a certain probability.
3. Axiomatic Development of Information Theory

We now want to develop a usable measure of the information we get from observing the occurrence of an event having probability p. Our first reduction is to ignore any particular features of the event, and only observe whether or not it happened.
In essence this means that we can think of the event as the observance of a symbol whose probability of occurring is p. We will thus be defining the information in terms of the probability p.
The following represent a set of reasonable axioms for an information measure I(p):
1. Information is a non-negative quantity: I(p) ≥ 0.
2. If an event has probability 1, we get no information from the occurrence of the event: I(1) = 0.
3. If two independent events occur (whose joint probability is the product of their individual probabilities), then the information we get from observing the events is the sum of the two informations: I(p1 * p2) = I(p1) + I(p2).
4. We will want our information measure to be a continuous (and, in fact, monotonic) function of the probability (slight changes in probability should result in slight changes in information).
Corollaries of these axioms include:

1. I(p^2) = I(p * p) = I(p) + I(p) = 2 * I(p).
2. Thus, I(p^n) = n * I(p), by induction.
3. I(p) = I((p^(1/m))^m) = m * I(p^(1/m)), so I(p^(1/m)) = (1/m) * I(p), and thus, in general, I(p^(n/m)) = (n/m) * I(p).
4. And thus, by continuity, we get, for 0 < p ≤ 1 and 0 < a, I(p^a) = a * I(p).
5. From these, we can derive the nice property of the information measure:

I(p) = -log_b(p) = log_b(1/p)

for some log-base b. The base b determines the units we are using. Of course, we can change the units by changing the base, using the change-of-base formula, for b1, b2, x > 0: log_{b1}(x) = log_{b2}(x) / log_{b2}(b1).
Thus, using different bases for the logarithm results in information measures which are just constant multiples of each other, corresponding with measurements in different units:
1. log2 units are bits (from binary).
2. log3 units are trits (from trinary).
3. log_e units are nats (from natural logarithm). (We commonly use ln(x) = log_e(x).)
4. log10 units are Hartleys, after R.V.L. Hartley, 1942.
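To make the unit conversions concrete, here is a minimal Python sketch (not part of the original notes; the function name info is our own) of I(p) with a selectable log base:

```python
import math

def info(p, base=2):
    """Information measure I(p) = -log_base(p) for an event of probability p."""
    return -math.log(p) / math.log(base)

# One fair-coin flip (p = 1/2), expressed in each of the units above:
in_bits = info(0.5, base=2)        # 1 bit
in_nats = info(0.5, base=math.e)   # ln(2), about 0.693 nats
in_hartleys = info(0.5, base=10)   # log10(2), about 0.301 Hartleys
```

The three values are constant multiples of one another, exactly as the change-of-base formula promises.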
Unless we want to emphasize the units, we need not bother to specify the base for the logarithm, and simply write log(p). Typically, we think in terms of log2(p).
Example: Suppose we flip a fair coin once. The outcomes are events H and T, each with probability ½, and thus a single flip of a coin gives us –log2(1/2) = 1 bit of information (whether the outcome is an H or a T). Flipping a fair coin n times (or, equivalently, independently flipping n fair coins) gives us –log2((1/2)^n) = log2(2^n) = n × log2(2) = n bits of information.
We could randomly generate (see http://socr.stat.ucla.edu/) a sequence of 25 flips as, for example:

{HTHHTTHTHHHTHTTTHTHHHTHTT}

or, using 1 for H and 0 for T, the 25 bits

{1011001011101000101110100}.
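The flip counting above is easy to check numerically; this small Python sketch (our own illustration, with hypothetical names) computes the information in n flips and generates a random H/T sequence like the one shown:

```python
import math
import random

def flip_information(n):
    """Bits of information from n independent fair-coin flips: -log2((1/2)**n) = n."""
    return -math.log2(0.5 ** n)

# A reproducible random sequence of 25 flips, as H/T and as binary digits.
random.seed(0)  # fixed seed so the example is reproducible
flips = "".join(random.choice("HT") for _ in range(25))
bits = flips.replace("H", "1").replace("T", "0")
```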
We thus get the nice fact that n flips of a fair coin give us n bits of information, and take n binary digits to specify. That these two quantities are the same reassures us that we have done a reasonable axiomatic definition of the information measure.
4. Some Entropy Theory

Suppose now that we have n symbols {a_1, a_2, …, a_n}, and some source is providing us with a stream of these symbols. Suppose further that the source emits the symbols with probabilities {p_1, p_2, …, p_n}, respectively. For now, we also assume that the symbols are emitted independently (successive symbols do not depend in any way on past symbols).
What is the average amount of information we get from each symbol we see in the stream? What we really want here is a weighted average. If we observe the symbol a_i, we will be getting log(1/p_i) information from that particular observation. In a long run of (say N) observations, we will see (approximately) N × p_i occurrences of the symbol a_i (in the frequentist sense, that's what it means to say that the probability of seeing a_i is p_i). Thus, in the N (independent) observations, we will get total information I of

I = Σ_{i=1}^{n} N × p_i × log(1/p_i).
And therefore, the average information we get per symbol observed will be

I/N = (1/N) Σ_{i=1}^{n} N × p_i × log(1/p_i) = Σ_{i=1}^{n} p_i × log(1/p_i).
Note that lim_{x→0} x log(1/x) = 0, so we can, for our purposes, define p_i × log(1/p_i) to be 0 when p_i = 0. This brings us to a fundamental definition.
This definition is essentially due to Shannon in 1948, in the seminal papers in the field of information theory.
As we have observed, we have defined information strictly in terms of the probabilities of events. Therefore, let us suppose that we have a set of probabilities (a probability distribution P = {p_1, p_2, …, p_n}).
Definition: We define the (Shannon-Wiener) entropy of the distribution P by:

H(P) = Σ_{k=1}^{n} p_k × log(1/p_k) = -Σ_{k=1}^{n} p_k × log(p_k).   (1)
There is an obvious generalization of the entropy for a continuous, rather than discrete, probability distribution P(x):

H(P) = -∫ P(x) × log P(x) dx.   (2)
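Definition (1) translates directly into code; a short Python sketch (our own, using the 0 × log(1/0) = 0 convention noted above):

```python
import math

def entropy(P, base=2):
    """Shannon entropy (1): sum of p_k * log(1/p_k), with 0*log(1/0) taken as 0."""
    return sum(p * math.log(1.0 / p, base) for p in P if p > 0)

h_fair = entropy([0.5, 0.5])    # a fair coin: 1 bit
h_sure = entropy([1.0, 0.0])    # a certain event: 0 bits
h_die = entropy([1 / 6] * 6)    # a fair die: log2(6), about 2.585 bits
```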
Another way to think about this is in terms of expected value. Given a PDF/PMF P(x), we can define the expected value of an associated function F(x) by:

E(F(X)) = ∫ F(x) × P(x) dx.
With this definition, we have that H(P) = E(I(p)). In other words, the entropy of a probability distribution is just the expected value of the information measure of that distribution. We'll discuss the following few important points:
1. What properties does the function H(P) have? For example, does it have extrema, and if so, where?
2. Is entropy a reasonable name for this? In particular, the name entropy is already in use in physics/thermodynamics. How are these uses of the term related to each other?
3. What can we do with this new tool?
5. The Gibbs inequality

First, note that the (natural-log) function ln(x) has derivative 1/x. From this, we find that the tangent to ln(x) at x = 1 is the line y = x - 1. Further, since ln(x) is concave down, we have, for x > 0, that ln(x) ≤ x - 1, with equality only when x = 1.
Now, given two probability distributions, P = {p_1, p_2, …, p_n} and Q = {q_1, q_2, …, q_n}, where p_k, q_k ≥ 0 and Σ_{k=1}^{n} p_k = Σ_{k=1}^{n} q_k = 1, we have

Σ_{k=1}^{n} p_k ln(q_k/p_k) ≤ Σ_{k=1}^{n} p_k (q_k/p_k - 1) = Σ_{k=1}^{n} q_k - Σ_{k=1}^{n} p_k = 1 - 1 = 0,   (3)
with equality only when p_k = q_k for all k. It is easy to see that the inequality actually holds for any log-base, not just base e.
We can now use the Gibbs inequality to find the probability distribution which maximizes the entropy function. Suppose P = {p_1, p_2, …, p_n} is a probability distribution.
Taking Q to be the uniform distribution, q_k = 1/n, we have

0 ≤ -Σ_{k=1}^{n} p_k log(q_k/p_k) = Σ_{k=1}^{n} p_k log(n × p_k) = log(n) - Σ_{k=1}^{n} p_k log(1/p_k) = log(n) - H(P),   (4)

with equality only when p_k = 1/n for all k. The first step is the application of the Gibbs inequality (3).
What this means is that:

0 ≤ H(P) ≤ log(n).   (5)

In particular, if for some k_0, p_{k_0} = 1 and p_k = 0 for all k ≠ k_0, we have H(P) = 0.
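Both the Gibbs inequality (3) and the bound (5) are easy to verify numerically; a Python sketch (our own, in nats so the bound reads 0 ≤ H(P) ≤ ln(n)):

```python
import math

def gibbs_lhs(P, Q):
    """Left-hand side of the Gibbs inequality (3): sum of p_k * ln(q_k / p_k)."""
    return sum(p * math.log(q / p) for p, q in zip(P, Q) if p > 0)

def entropy_nats(P):
    """H(P) in nats, matching the natural log used in (3)."""
    return sum(p * math.log(1.0 / p) for p in P if p > 0)

P = [0.7, 0.2, 0.1]
uniform = [1 / 3] * 3
# gibbs_lhs(P, uniform) is negative, and is 0 exactly when P is uniform;
# entropy_nats(P) always lies between 0 and ln(3).
```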
On the other end of the spectrum, the entropy reaches H(P) = log(n) (the maximum possible entropy) only when all of the events (outcomes) have the same probability, p_k = 1/n. That is, the maximum of the entropy function is log(the number of possible events/outcomes), and occurs when all the events are equally likely.
This illustrates the entropy as a measure of uncertainty: high entropy means lots of uncertainty, and low entropy yields high certainty about the outcome of the process/experiment.
Example: How much information is obtained by a single neuropsychiatric (NP) test? The maximum information occurs if all outcomes/scores/results of the NP test have equal probability of being observed (e.g., in an AD vs. NoAD test, on average half the subjects should end up having AD and the other half should not have dementia). Here are several common situations indicating the conditions for obtaining the maximum information from a single NP test result; we use equation (1):
Experiment/Process Type | Max Information [plug into equation (1), p_k = 1/n]
Binary test (AD vs. NoAD) | 1 bit = log2(2)
Five-level test results: [Extreme (E), Severe (S), Moderate (M), MCI, Normal (N)] | 2.3 bits = log2(5)
Twelve-level test results: E, E-, S+, S, S-, …, MCI-, N | 3.6 bits = log2(12)
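The table's bit counts follow from equation (1) with p_k = 1/n; a Python sketch (our own) of the three rows and the resulting gains:

```python
import math

# Maximum information (bits) from a test with n equally likely outcomes: log2(n).
levels = {"binary": 2, "five-level": 5, "twelve-level": 12}
max_bits = {name: math.log2(n) for name, n in levels.items()}

# Gains: about 1.3 bits going from 5 to 12 levels,
# and about 2.6 bits going from binary to 12 levels.
gain_5_to_12 = max_bits["twelve-level"] - max_bits["five-level"]
gain_2_to_12 = max_bits["twelve-level"] - max_bits["binary"]
```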
Thus, using +/–'s gives the patients/doctors about 1.3 more bits of information, per test-level, than without using +/–'s, and about 2.6 bits per grade more than binary (AD vs. NoAD) type test results. This is naturally expected, as we actually have more information available in addition to the presence or absence of AD NP symptoms.
Example: The genetic code provides us with sequences constructed from 4 symbols (A, C, G, T). The maximum average information per symbol is log2(4) = 2 bits. If the source provides codons (blocks of 3 of these symbols), then the maximum average information is 6 bits per block, as I(p^k) = k × I(p), p = ¼. If we used different units, e.g., the natural logarithm, the maximum entropy would be ln(4^3) ≈ 4.159 nats per block.
Remarks:
1. First, these definitions of information and entropy may not match some other uses of the terms. For example, if we know that a source will, with equal probability, transmit either the complete text of Hamlet or the complete text of Macbeth (and nothing else), then receiving the complete text of Hamlet provides us with precisely 1 bit of information. Suppose a book contains ASCII characters. If the book is to provide us with information at the maximum rate, then each ASCII character will occur with equal probability – it will be a random sequence of characters.
2. Second, it is important to recognize that our definitions of information and entropy depend only on the probability distribution. In general, it won't make sense for us to talk about the information or the entropy of a source without specifying its probability distribution.
3. Beyond that, it can certainly happen that two different observers of the same data stream have different models of the source, and thus associate different probability distributions to the source. The two observers will then assign different values to the information and entropy associated with the source.
This observation accords with our intuition: two people listening to the same tune can get very different information from the music. For example, depending on their music backgrounds, one person may get excited, another one may get bored, yet another one may fall asleep. The first listener, who knows and enjoys music, may assign unequal probabilities to each sound/chord/epoch, as he/she may anticipate much of where the tune goes.
To the other listener, by contrast, the musical composition may sound like a (random) unsynchronized collection of chords (e.g., abstract jazz), and hence the amount of information comprehended by this listener will be significantly higher, as the probabilities that he/she assigns to each note/chord are roughly equal.
A physical Example (Gas Particles): Let us consider a simple model for an idealized gas. Suppose a cubical volume V contains gas made up of N point particles. Assume also that through some mechanism, we can determine the location of each particle sufficiently well as to be able to locate it within a box with sides 1/100 of the sides of the containing volume V. There are 10^6 of these small boxes within V.
A frequentist probability model for this system is obtained by measuring the number of particles in each of the 10^6 small boxes at one fixed time, and assigning a probability p_k of finding a gas particle in the small box by counting the number of particles n_k in the box, and dividing by N. That is, p_k = n_k/N. From this probability distribution model, we can calculate the entropy:

H(P) = Σ_{k=1}^{10^6} p_k log(1/p_k) = Σ_{k=1}^{10^6} (n_k/N) log(N/n_k).
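The occupancy-based entropy above can be computed directly from the counts n_k; a Python sketch (our own, with a small box count standing in for the 10^6 of the text):

```python
import math

def box_entropy(counts):
    """H(P) from box occupancy counts n_k, with p_k = n_k / N; empty boxes contribute 0."""
    N = sum(counts)
    return sum((n / N) * math.log(N / n) for n in counts if n > 0)

# 100 boxes stand in for the 10**6 of the text.
h_even = box_entropy([1] * 100)   # even occupancy: the maximum, log(100)
h_single = box_entropy([100])     # everything in one box: 0
```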
There are a couple of special cases to consider (representing the extrema of the values of the entropy).
1. If the particles are evenly distributed among the 10^6 boxes, then we will have that each n_k = N/10^6, and in this case the entropy will be:

H(P) = Σ_{k=1}^{10^6} (1/10^6) log(10^6) = log(10^6) = 6 × log(10).

This case obviously presents a maximum entropy configuration.
2. At the opposite side of the spectrum, we have all the particles sitting in exactly one small box, and the entropy of each of those configurations is:

H(P) = Σ_{k=1}^{10^6} p_k log(1/p_k) = 1 × log(1) = 0, as p_{k_0} = 1 and p_k = 0 for k ≠ k_0.

This case obviously presents a minimum entropy configuration.
Notice that these two calculated entropies of the system depend in a strong way on the relative scale of measurement. For example, if the particles are evenly distributed, and we increase our accuracy of measurement by a factor of 10 (i.e., if each small box is 1/1000 of the side of V), then the calculated maximum entropy, in the first case, will be log(10^9) instead of log(10^6).
In addition, for physical systems, we know that quantum limits (e.g., Heisenberg uncertainty relations) will give us a bound on the accuracy of our measurements, and thus a more or less natural scale for doing entropy calculations.
On the other hand, for macroscopic systems, we are likely to find that we can only make relative rather than absolute entropy calculations.
Third, suppose we generalize our model slightly, and allow the particles to move about within V. A configuration of the system is then simply a list of 10^6 numbers b_k, with 0 ≤ b_k ≤ N (i.e., a list of the numbers of particles in each of the small boxes).
Suppose that the motions of the particles are such that for each particle, there is an equal probability that it will move into any given new small box during one (macroscopic) time step.
How likely is it that at some later time we will find the system in a high entropy configuration? How likely is it that if we start the system in a low entropy configuration, it will stay in a low entropy configuration for an appreciable length of time?
If the system is not currently in a maximum entropy configuration, how likely is it that the entropy will increase in succeeding time steps (rather than stay the same or decrease)?
Recall the binomial coefficients (the number of arrangements of n objects taken k at a time, where the order does not matter; combinations):

C(n, k) = n! / (k! (n-k)!).

And recall Stirling's approximation of n!: n! ≈ √(2πn) (n/e)^n, for large n.
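A quick numerical check of these two facts (a Python sketch, our own):

```python
import math

def stirling(n):
    """Stirling's approximation: n! is roughly sqrt(2*pi*n) * (n/e)**n."""
    return math.sqrt(2 * math.pi * n) * (n / math.e) ** n

# The relative error of the approximation is already under 1% at n = 10.
ratio = stirling(10) / math.factorial(10)

# One binomial coefficient used in the box-pair count: C(10**6, 2), about 5 * 10**11.
pairs = math.comb(10**6, 2)
```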
There are 10^6 configurations with all the particles sitting in exactly one small box, and as we showed above, the entropy of each of those configurations is H(P) = 0. These are obviously minimum entropy configurations.
If we now consider pairs of small boxes, the number of configurations with all the particles evenly distributed between two boxes is C(10^6, 2) ≈ 5 × 10^11, which is large. The entropy of each of these configurations is

H(P | All evenly in 2 boxes) = -(1/2) log(1/2) - (1/2) log(1/2) = log(2).
tota
l num
ber o
f sys
tem
con
figur
atio
ns, i
n te
rms
of th
e nu
mbe
r of p
artic
les
with
in
a sm
all b
ox, i
s at
leas
t 5*1
011 +
106 . I
f w
e st
art t
he s
yste
m in
a c
onfig
urat
ion
with
entro
py 0
, the
n th
e pr
obab
ility
that
at s
ome
late
r tim
e it
will
be
in a
con
figur
atio
n w
ith
entro
py
)2lo
g()
(≥
PH
will
be
larg
er t
han
[5*1
011]
/ [5
*1011
+ 1
06 ]> 1
-105 ,
as
|S|
|Ev
ent
|
P(Ev
ent)
=
and
0,
,,
>∀
+>
+++
xb
ab
aax
ba
xa
. H
ere,
a=
P(Al
l-in-
2-bo
xes)
,
b=P(
all-i
n on
e-bo
x) a
nd x
=P(
all-i
n-m
ore-
than
-2-b
oxes
).
As an example at the other end, consider the number of configurations with the particles distributed almost equally, except that half the boxes are short by one particle, and the rest have one extra particle. The number of such configurations is:

C(10^6, 10^6/2) ≈ 10^(3×10^5).
Each of these configurations has entropy essentially approximately equal to log(10^6).
From this, we can conclude that if we start the system in a configuration with entropy of 0 (i.e., all particles in one box), the probability that later it will be in a higher entropy configuration will be larger than 1 - 10^(-3×10^5).
Similar arguments (with similar results in terms of probabilities) can be made for starting in any configuration with entropy appreciably less than log(10^6) (the maximum). In other words, it is overwhelmingly probable that as time passes, macroscopically, the system will increase in entropy until it reaches the maximum.
In many respects, these general arguments can be thought of as a proof (or at least an explanation) of a version of the second law of thermodynamics:
Given any macroscopic system which is free to change configurations, and given any configuration with entropy less than the maximum, there will be overwhelmingly many more accessible configurations with higher entropy than lower entropy, and thus, with probability indistinguishable from 1, the system will (in macroscopic time steps) successively change to configurations with higher entropy until it reaches the maximum.
7. Shannon's communication theory

In some classic 1948 papers, Claude Shannon laid the foundations for contemporary information, coding, and communication theory.
He developed a general model for communication systems, and a set of theoretical tools for analyzing such systems. His basic model consists of three parts: a sender (or source), a channel, and a receiver (or sink). In addition, the model also includes encoding and decoding elements, and noise within the communication channel.

[Figure: Source → Source Encoding → Channel → Channel Decoding → Receiver; the source signal enters the encoder, the received signal leaves the decoder, and Noise feeds into the Channel.]
In Shannon's discrete model, it is assumed that the source provides a stream of symbols selected from a finite alphabet A = {a_1, a_2, …, a_n}, which are then encoded.
The code is sent through the channel (and possibly corrupted by noise). At the other end of the channel, the receiver will decode, and derive information from the sequence of symbols.
Given a source of symbols and a channel with noise (in particular, a probability model for these elements), we can talk about the capacity of the channel. The general model Shannon worked with involved two sets of symbols, the input symbols and the output symbols.
Let us say the two sets of symbols are A = {a_1, a_2, …, a_n} and B = {b_1, b_2, …, b_m}. Note that we do not necessarily assume the same number of symbols in the two sets. Given the noise in the channel, when symbol b_k comes out of the channel, we cannot be sure which a_l was put in. The channel is characterized by the set of probabilities {P(a_l | b_k)}_{l,k}.
We can then consider various related information and entropy measures. First, we can consider the information we get from observing a symbol b_k. Given a probability model of the source, we have an a priori estimate P(a_l) that symbol a_l will be sent next. Upon observing b_k, we can revise our estimate to P(a_l | b_k). The change in our information (the mutual information) will be given by:

I(a_l ; b_k) = log(1/P(a_l)) - log(1/P(a_l | b_k)) = log( P(a_l | b_k) / P(a_l) ).   (6)
We have the properties of this functional:

1. I(a_l ; b_k) = I(b_k ; a_l)
2. I(a_l ; b_k) = log(P(a_l | b_k)) + I(a_l)
3. I(a_l ; b_k) ≤ I(a_l)

If a_l and b_k are independent (i.e., P(a_l | b_k) = P(a_l)), then I(a_l ; b_k) = 0.
What we oftentimes want is to average the mutual information over all the symbols:

I(a_l ; B) = Σ_{k∈B} P(b_k | a_l) × I(a_l ; b_k) = Σ_{k∈B} P(b_k | a_l) × log( P(b_k | a_l) / P(b_k) ),
I(A ; b_k) = Σ_{l∈A} P(a_l | b_k) × I(a_l ; b_k) = Σ_{l∈A} P(a_l | b_k) × log( P(a_l | b_k) / P(a_l) ).   (7)
Therefore

I(A ; B) = Σ_{l∈A} P(a_l) × I(a_l ; B) = Σ_{l∈A} Σ_{k∈B} P(a_l) × P(b_k | a_l) × log( P(a_l | b_k) / P(a_l) ) = Σ_{l∈A} Σ_{k∈B} P(a_l, b_k) × log( P(a_l, b_k) / (P(a_l) × P(b_k)) ).   (8)
We have the properties: I(A ; B) ≥ 0, and I(A ; B) = 0 if and only if A and B are independent.
Definition: Conditional Entropy is defined by

H(A | B) = Σ_{l∈A} Σ_{k∈B} P(a_l, b_k) × log( 1 / P(a_l | b_k) ).   (9)
Notice that:

H(A, B) = Σ_{l∈A} Σ_{k∈B} P(a_l, b_k) × log(1/P(a_l, b_k)),
H(A) = Σ_{l∈A} P(a_l) × log(1/P(a_l)),   H(B) = Σ_{k∈B} P(b_k) × log(1/P(b_k)).   (10)

And

H(A, B) = H(A) + H(B | A) = H(B) + H(A | B);
I(A ; B) = H(A) + H(B) – H(A, B) = H(A) - H(A | B) = H(B) - H(B | A) ≥ 0.
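These identities are easy to confirm on a small joint distribution; a Python sketch (our own toy numbers) computing H(A), H(B), H(A, B), H(A | B) and I(A ; B) in bits:

```python
import math

def H(probs):
    """Entropy (bits) of a list of probabilities, skipping zero entries."""
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

# A toy 2x2 joint distribution P(a_l, b_k): rows index a_l, columns index b_k.
joint = [[0.4, 0.1],
         [0.2, 0.3]]
pa = [sum(row) for row in joint]             # marginal P(a_l)
pb = [sum(col) for col in zip(*joint)]       # marginal P(b_k)
H_AB = H([p for row in joint for p in row])  # H(A, B)
I_AB = H(pa) + H(pb) - H_AB                  # I(A;B) = H(A) + H(B) - H(A,B)
H_A_given_B = H_AB - H(pb)                   # H(A|B) = H(A,B) - H(B)
```

On these numbers I(A ; B) comes out positive, and H(A) - H(A | B) reproduces it, as the identities require.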
If we are given a channel, we could ask what is the maximum possible information that can be transmitted through the channel. We could also ask what mix of the symbols {a_l} we should use to achieve the maximum bandwidth. In particular, using the definitions above, we can define the Channel Capacity, C, to be:

C = max_{P(a)} I(A ; B).
We have the nice property that if we are using the channel at its capacity, then for each of the a_l, I(a_l ; B) = C, and thus, we can maximize channel use by maximizing the use for each symbol independently.
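The notes do not work a capacity example, but the standard binary symmetric channel (a swapped-in illustration, not from the text) makes C = max I(A ; B) concrete: with crossover probability eps, the maximizing input mix is uniform, and C = 1 - h2(eps) bits per symbol, where h2 is the binary entropy function:

```python
import math

def h2(p):
    """Binary entropy function (bits)."""
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bsc_capacity(eps):
    """Capacity of a binary symmetric channel with crossover probability eps."""
    return 1.0 - h2(eps)

# eps = 0 gives a noiseless channel (1 bit/symbol);
# eps = 0.5 scrambles the input completely (capacity 0).
```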
Theorem [Shannon]: For any channel, there exist ways of encoding input symbols such that we can simultaneously utilize the channel as closely as we wish to the capacity, and at the same time have an error rate as close to zero as we wish.
This is actually quite a remarkable theorem. We might naively guess that in order to minimize the error rate, we would have to use more of the channel capacity for error detection/correction, and less for actual transmission of information. Shannon showed that it is possible to keep error rates low and still use the channel for information transmission at (or near) its capacity.
Unfortunately, Shannon's proof has a couple of downsides. The first is that the proof is non-constructive. It doesn't tell us how to construct the coding system to optimize channel use, but only tells us that such a code exists. The second is that in order to use the capacity with a low error rate, we may have to encode very large blocks of data. This means that if we are attempting to use the channel in real time, there may be time lags while we are filling buffers. There is thus still much work possible in the search for efficient coding schemes.
Among the things we can do is look at natural coding systems (such as, for example, the DNA coding system, or neural systems) and see how they use the capacity of their channel. It is not unreasonable to assume that evolution will have done a pretty good job of optimizing channel use.
8. Application of Entropy to modeling DNA sequences
Let us apply some of these ideas to the (general) problem of analyzing genomes. We can start with an example such as the comparatively small genome of Escherichia coli, strain K-12, substrain MG1655, version M52. This example has the convenient features:
1. It has been completely sequenced.
2. The sequence is available for downloading: http://www.genetics.wisc.edu/
3. Annotated versions are available for further work.
4. It is large enough to be interesting (somewhat over 4 mega-bases, or 4 million nucleotides), but not so huge as to be completely unwieldy.
5. This data file begins with:
>gb|U00096|U00096 Escherichia coli K-12 MG1655 complete genome
AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGC
TTCTGAACTGGTTACCTGCCGTGAGTAAATTAAAATTTTATTGACTTAGGTCACTAAATACTTTAACCAA
TATAGGCATAGCGCACAGACAGATAAAAATTACAGAGTACACAACATCCATGAAACGCATTAGCACCACC
ATTACCACCACCATCACCATTACCACAGGTAACGGTGCGGGCTGACGCGTACAGGAAACACAGAAAAAAG
CCCGCACCTGACAGTGCGGGCTTTTTTTTTCGACCAAAGGTAACGAGGTAACAACCATGCGAGTGTTGAA
In this exploratory project, our goal will be to apply the information and entropy ideas outlined above to genome analysis. Our first step is to generate a random genome of comparable size to compare things with.
We can use SOCR, Excel, SAS, R, C++, Java, or other languages/programs to generate a file containing a random sequence of about 4 million letters A, C, G, T. In the actual genome, these letters stand for the nucleotides adenine (A), cytosine (C), guanine (G), and thymine (T).
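As one concrete sketch of this step (the function name, the fixed seed, and the short demo length are our own illustrative choices; any of the languages listed above would do), a uniform random genome can be generated in a few lines of Python:

```python
import random

def random_genome(n, alphabet="ACGT", seed=0):
    """Return a string of n letters drawn uniformly at random from the alphabet."""
    rng = random.Random(seed)                   # fixed seed for reproducibility
    return "".join(rng.choices(alphabet, k=n))

# For the real comparison we would use n = 4_000_000; a short demo:
g = random_genome(100)
```

Writing the result to a file then gives the comparison genome described in the text.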
There are other approaches to this process, e.g., randomly shuffling an actual observed genome (thus maintaining the relative proportions of As, Cs, Gs, and Ts). Part of the justification for this methodology is that actual (identified) coding sections of DNA tend to have a ratio (A+T)/(G+C) ≠ 1.
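The shuffling approach can be sketched just as briefly (a minimal Python version; the helper name and seed are ours):

```python
import random

def shuffled_genome(seq, seed=0):
    """Randomly permute an observed sequence, preserving its exact base composition."""
    letters = list(seq)
    random.Random(seed).shuffle(letters)
    return "".join(letters)

# Demonstrate on the short E. coli chunk used later in the text.
chunk = "AGCTTTTCATTCTGACTGCAACGGGCAATATGTC"
shuf = shuffled_genome(chunk)
```

The shuffled string has exactly the same letter counts as the original, which is the point of this method.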
One can hope that important stretches of DNA will have entropy different from other stretches. Of course, as noted above, the entropy measure depends in an essential way on the probability model attributed to the source. We will want to try to build a model that catches important aspects of what we find interesting or significant.
We will want to use our knowledge of the systems in which DNA is embedded to guide the development of our models. On the other hand, we probably don't want to constrain the model too much. Remember that information and entropy are measures of unexpectedness. If we constrain our model too much, we won't leave any room for the unexpected!
We know, for example, that simple repetitions have low entropy. But if the code being used is redundant (sometimes called degenerate), with multiple encodings for the same symbol (as is the case for DNA codons), what looks to one observer to be a random stream may be recognized by another observer (who knows the code) to be a simple repetition.
The coding sequences for peptides and proteins are encoded via codons, that is, by sequences of blocks of triples of nucleotides. Thus, for example, the codon AGC on mRNA (messenger RNA) codes for the amino acid serine (or, if we happen to be reading in the reverse direction, CGA, it might code for alanine). On DNA, AGC codes for UCG or CGA on the mRNA, and thus could code for cysteine or arginine.
Amino acids specified by each codon sequence on mRNA.
A = adenine; G = guanine; C = cytosine; T = thymine; U = uracil.
(Figure: The Genetic Code, http://www.accessexcellence.org/)
Amino acid key for the above Figure:
Ala=Alanine, Arg=Arginine, Asn=Asparagine, Asp=Aspartic acid, Cys=Cysteine, Gln=Glutamine, Glu=Glutamic acid, Gly=Glycine, His=Histidine, Ile=Isoleucine, Leu=Leucine, Lys=Lysine, Met=Methionine, Phe=Phenylalanine, Pro=Proline, Ser=Serine, Thr=Threonine, Trp=Tryptophan, Tyr=Tyrosine, Val=Valine
As a first step, consider each of the three-nucleotide codons as a distinct symbol. We can then take a chunk of genome and estimate the probability of occurrence of each codon by simply counting and dividing by the length. At this level, we are assuming we have no knowledge of where codons start, and so in this model we assume that readout could begin at any nucleotide. We thus use each three adjacent nucleotides.
For example, given the DNA chunk (length=34):

AGCTTTTCATTCTGACTGCAACGGGCAATATGTC
Our codon count yields (in lexicographical order!):

AAC 1  AAT 1  ACG 1  ACT 1  AGC 1  ATA 1  ATG 1  ATT 1  CAA 2
CAT 1  CGG 1  CTG 2  CTT 1  GAC 1  GCA 2  GCT 1  GGC 1  GGG 1
GTC 1  TAT 1  TCA 1  TCT 1  TGA 1  TGC 1  TGT 1  TTC 2  TTT 2
We can then estimate the entropy of this sequence by:

H = -\sum_{l=1}^{27} p_l \log_2 p_l \approx 4.7 bits.

The maximum possible entropy for this chunk would be: \log_2(32) = 5 bits.
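This counting procedure can be sketched in a few lines of Python (the helper name is ours); on the length-34 chunk above it reproduces the counts and the ≈4.7-bit estimate (exactly 4.6875 bits):

```python
import math
from collections import Counter

def codon_counts_and_entropy(seq):
    """Count every window of three adjacent nucleotides (all reading frames,
    as in the text) and estimate the entropy of the codon distribution in bits."""
    counts = Counter(seq[i:i + 3] for i in range(len(seq) - 2))
    n = sum(counts.values())
    H = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return counts, H

chunk = "AGCTTTTCATTCTGACTGCAACGGGCAATATGTC"   # the length-34 chunk above
counts, H = codon_counts_and_entropy(chunk)    # 32 windows, 27 distinct codons
```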
Suppose we want to find interesting sections (features) in the genome. As a starting place, we can slide a window over the genome, and estimate the entropy within the window. The plot below shows the entropy estimates for the E. coli genome, within a window of size 3^8 = 6561. The window is slid in steps of size 3^4 = 81.
This results in 57,194 snapshots, one for each placement of the window. For comparison, the values for a random genome are also shown.
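The sliding-window procedure can be sketched as follows (a minimal Python version; the window and step sizes follow the text, while the short random input used for the demonstration is our own stand-in for the full genome):

```python
import math
import random
from collections import Counter

def codon_entropy_bits(seq):
    """Entropy (bits) of the distribution of all three-adjacent-nucleotide windows."""
    counts = Counter(seq[i:i + 3] for i in range(len(seq) - 2))
    n = sum(counts.values())
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def window_entropies(genome, window=3**8, step=3**4):
    """One entropy estimate per placement of the sliding window."""
    return [codon_entropy_bits(genome[i:i + window])
            for i in range(0, len(genome) - window + 1, step)]

# On the full 4.6-Mb genome this yields the ~57,000 snapshots described above;
# here we only demonstrate on a short random string.
toy = "".join(random.Random(1).choices("ACGT", k=3**8 + 5 * 3**4))
ents = window_entropies(toy)
```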
At this level, we can make the simple observation that the actual genome values are quite different from the comparative random string. The values for E. coli range from about 5.8 to about 5.96, while the random values are clustered quite closely above 5.99 (the maximum possible is \log_2(4^3) = \log_2(64) = 6; recall I(p) = -\log_b(p) = \log_b(1/p) and p = 1/64).
Comparison of the entropy for the E. coli genome (spiky curve) and a random genome sequence.
9. Measures of Dimensionality and relation to Entropy
A useful generalization of entropy (as a measure of complexity) was developed by the Hungarian mathematician A. Renyi. The Renyi Entropy is defined via the moments of order q of a probability distribution P = {p_i}:

S_q = \frac{1}{1-q} \log \sum_i p_i^q    (11)
Taking the limit as q → 1, we get:

S_1 = \sum_i p_i \log \frac{1}{p_i}

The last expression is exactly the entropy we previously defined. So, S_q is a generalized entropy for any real number q. The limit of S_q as q → 1 is S_1 because (by L'Hôpital's rule):

\lim_{q\to 1} \frac{\log \sum_i p_i^q}{1-q}
  = \lim_{q\to 1} \frac{\frac{\partial}{\partial q} \log \sum_i p_i^q}{\frac{\partial}{\partial q}(1-q)}
  = \lim_{q\to 1} \frac{\sum_i p_i^q \log p_i}{-\sum_i p_i^q}
  = \sum_i p_i \log \frac{1}{p_i}
Using the Renyi Entropy, we can then define a generalized dimension associated with a data set. Suppose a data set is distributed among bins of diameter r; we can let p_i be the probability that a data item falls in the i-th bin (estimated by counting the data elements in the bin, and dividing by the total number of items). We can then define a dimension (for each q):

D_q = \frac{1}{q-1} \lim_{r\to 0} \frac{\log \sum_i p_i^q}{\log r}    (12)
Why do we call this a generalized dimension? Consider first D_{q=0}. We define p_i^0 = 0 when p_i = 0. Also, let N_r be the number of non-empty bins of diameter r it takes to cover the data set. Then we have:

D_0 = \lim_{r\to 0} \frac{\log \sum_i p_i^0}{\log(1/r)} = \lim_{r\to 0} \frac{\log N_r}{\log(1/r)}

Definition: D_0 is the Hausdorff dimension, which is sometimes called the fractal dimension of the set.
Examples:

1. 1D: Consider the unit interval [0,1]. Let r_k = \frac{1}{2^k}. Then N_{r_k} = 2^k, and

D_0 = \lim_{r\to 0} \frac{\log(2^k)}{\log(2^k)} = 1

2. 2D: Consider the unit square [0,1]×[0,1]. Again, let r_k = \frac{1}{2^k}. Then N_{r_k} = 2^{2k}, and

D_0 = \lim_{r\to 0} \frac{\log(2^{2k})}{\log(2^k)} = 2
3. 1D? 2D? Consider the Cantor set. The construction of the Cantor set is done by induction.
The Cantor set is what remains from the interval after we have removed middle thirds countably many times. It is an uncountable set, with measure (length) 0. For this set we will let r_k = \frac{1}{3^k}. Then N_{r_k} = 2^k, and

D_0 = \lim_{r\to 0} \frac{\log(2^k)}{\log(3^k)} = \frac{\log(2)}{\log(3)} = 0.631\ldots
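This box-counting computation is easy to confirm numerically (a small Python check of the arithmetic above):

```python
import math

# Box-counting check for the Cantor set: at scale r_k = 3**-k it takes
# N = 2**k intervals to cover the set, so log(N) / log(1/r) is the same
# at every level k, and the limit is log(2)/log(3) = 0.6309...
k = 10
estimate = math.log(2 ** k) / math.log(3 ** k)
```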
The Cantor set is a traditional example of a fractal. It is self-similar, and has Hausdorff dimension of 0.631, which is strictly greater than its (integer) topological dimension 0. Some nonlinear dynamical systems have trajectories which are locally the product of a Cantor set with a manifold (i.e., Poincare sections are generalized Cantor sets).
Properties of D_q:

1. Monotonicity: If q_1 \le q_2, then D_{q_2} \le D_{q_1}.

2. Fractal Calculations: If the set is strictly self-similar with equal probabilities p_i = 1/N, then the calculations are trivial and we do not need to take the limit as r → 0, since

D_q = \frac{1}{q-1} \cdot \frac{\log(N \times (1/N)^q)}{\log r} = \frac{\log N}{\log(1/r)} = D_0
This is the case, for example, for the Cantor set.
3. Information Dimension:

D_1 = \lim_{r\to 0} \frac{\sum_i p_i \log(1/p_i)}{\log(1/r)}    (13)

The numerator is just the entropy of the probability distribution.
4. Correlation Dimension: This dimension is related to the probability of finding two elements of the set within a distance r of each other.

D_2 = \lim_{r\to 0} \frac{\log \sum_i p_i^2}{\log r}    (14)
10. Mutual Information

Consider a system with the input X and output Y (X, Y random variables). How can we measure the uncertainty about X after observing Y? Let's define the conditional entropy of X given Y: H(X|Y) = H(X,Y) - H(Y), where
H(X,Y) = -\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f_{X,Y}(x,y) \log f_{X,Y}(x,y)\, dx\, dy

H(Y) = -\int_{-\infty}^{\infty} f_Y(y) \log f_Y(y)\, dy
The conditional entropy H(X|Y) represents the amount of the uncertainty remaining about the system input X after the system output Y has been observed. The next claim is intuitively clear: the difference H(X) - H(X|Y) must represent the uncertainty about the system input that is resolved by observing the system output:

I(X,Y) = H(X) - H(X|Y)    (15)
Mutual information can qualitatively be thought of as a measure of how well one image explains the other, and is maximized at the optimal alignment. It can be expressed in the following form:

I(A,B) = H(A) + H(B) - H(A,B) = \sum_a \sum_b P(a,b) \log \frac{P(a,b)}{P(a)\,P(b)}    (16)
The conditional probability P(b|a) is the probability that B will take the value b given that A has the value a. The conditional entropy is therefore the average of the entropy of B for each value of A, weighted according to the probability of getting that value of A:

H(B|A) = -\sum_{a,b} P(a,b) \log P(b|a) = H(A,B) - H(A)    (17)
Thus the equation for mutual information can be rewritten as:

I(A,B) = H(A) - H(A|B) = H(B) - H(B|A)    (18)
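These identities can be checked numerically. A small Python sketch (the 2×2 joint distribution over two binary "images" is our own toy example, not from the text):

```python
import math

def entropy_bits(probs):
    """Entropy in bits of an iterable of probabilities (zeros skipped)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A toy joint distribution P(a, b) for two binary images A and B
# (our own illustrative numbers).
P = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
PA = {a: sum(p for (x, _), p in P.items() if x == a) for a in (0, 1)}
PB = {b: sum(p for (_, y), p in P.items() if y == b) for b in (0, 1)}

HA = entropy_bits(PA.values())
HB = entropy_bits(PB.values())
HAB = entropy_bits(P.values())

I = HA + HB - HAB                                  # Eq. (16), first form
I_alt = sum(p * math.log2(p / (PA[a] * PB[b]))     # Eq. (16), sum form
            for (a, b), p in P.items())
H_A_given_B = HAB - HB                             # H(A|B) = H(A,B) - H(B)
```

Both forms of Eq. (16) agree, and I also equals H(A) - H(A|B), as Eq. (18) requires.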
Registration by maximization of mutual information therefore involves finding the transformation that makes image A the best possible predictor for image B within the region of overlap.
The advantage of mutual information over joint entropy is that it includes the entropies of the separate images. Mutual information and joint entropy are computed for the overlapping parts of the images, and the measures are therefore sensitive to the size and the contents of the overlap. A problem that can occur when using joint entropy on its own is that low values (normally associated with a high degree of alignment) can be found for complete misregistrations. For example, when transforming one image to such an extent that only an area of background overlaps for the two images, the joint histogram will be very sharp: there is only one peak, from background.
Mutual information is better equipped to avoid such problems, because it includes the marginal entropies H(A) and H(B). These will have low values when the overlapping part of the images contains only background, and high values when it contains anatomical structure. The marginal entropies will thus balance the measure somewhat by penalizing transformations that decrease the amount of information in the separate images. Consequently, mutual information is less sensitive to overlap than joint entropy, although not completely immune.
11. Normalized Mutual Information

The size of the overlapping part of the images influences the mutual information measure in two ways. First of all, a decrease in overlap decreases the number of samples, which reduces the statistical power of the probability distribution estimation. Secondly, the mutual information measure may actually increase with increasing misregistration (which usually coincides with decreasing overlap). This can occur when the relative areas of object and background even out and the sum of the marginal entropies increases faster than the joint entropy. Studholme et al. proposed a normalized measure of mutual information, which is less sensitive to changes in overlap:

NMI = \frac{H(A) + H(B)}{H(A,B)}    (19)
Maes et al. have suggested the use of the Entropy Correlation Coefficient (ECC) as another form of normalized mutual information. NMI and ECC are related in the following manner:

ECC = 2 - \frac{2}{NMI} = \frac{2\, I(A,B)}{H(A) + H(B)}
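Both normalized measures are one-liners once the entropies are in hand; a minimal Python sketch on a toy joint distribution (our own numbers) confirms the stated relation between NMI and ECC:

```python
import math

def entropy_bits(probs):
    """Entropy in bits of an iterable of probabilities (zeros skipped)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Toy joint distribution for two binary images (our own illustrative numbers).
P = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
HA = entropy_bits([0.5, 0.5])      # marginal entropy of A
HB = entropy_bits([0.5, 0.5])      # marginal entropy of B
HAB = entropy_bits(P.values())     # joint entropy H(A,B)

NMI = (HA + HB) / HAB              # Eq. (19), Studholme et al.
I = HA + HB - HAB                  # mutual information
ECC = 2 * I / (HA + HB)            # Entropy Correlation Coefficient, Maes et al.
```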
12. Kullback-Leibler divergence

It could be useful to define a distance between two vector distributions. If f_X(x) is a distribution of the vector X, and g_X(x) is a different distribution of X, then the distance between these distributions can be written as:

D(f \| g) = \int_{-\infty}^{\infty} f_X(x) \log \frac{f_X(x)}{g_X(x)}\, dx    (20)
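The discrete analogue of Eq. (20) replaces the integral by a sum; a minimal Python sketch (the two distributions are our own toy numbers):

```python
import math

def kl_divergence_bits(f, g):
    """Discrete analogue of Eq. (20): D(f || g) = sum_x f(x) log2(f(x)/g(x))."""
    return sum(p * math.log2(p / q) for p, q in zip(f, g) if p > 0)

# Two distributions over a 4-letter alphabet (our own illustrative numbers).
uniform = [0.25, 0.25, 0.25, 0.25]
skewed = [0.5, 0.25, 0.125, 0.125]
d = kl_divergence_bits(skewed, uniform)   # 0.25 bits
```

The divergence is zero exactly when the two distributions coincide.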
For a single image, the entropy is normally calculated from the image intensity histogram, in which the probabilities {p_1, p_2, p_3, ..., p_n} are the histogram entries. If all voxels in an image have the same intensity a, the histogram contains a single non-zero element with probability of 1, and the entropy of this image is 0.
If this uniform image were to include some noise, then the histogram will contain a cluster of non-zero entries around a peak at the average intensity value. So the addition of noise to the image tends to equalize the probabilities, which increases the entropy. One consequence is that interpolation of an image may smooth the image, which can reduce the noise and consequently 'sharpen' the histogram. This sharpening of the histograms reduces entropy.
Application of entropy for intramodality image registration: The goal now is to calculate the entropy of a difference image. If two perfectly aligned identical images are subtracted, the result is an entirely uniform image that has zero entropy. For two images that differ by noise, the histogram will be blurred, giving higher entropy, as is shown in the Figure below. Any misregistration, however, will lead to edge artifacts that further increase the entropy. Very similar images can therefore be registered by iteratively minimizing the entropy of the difference image.
Figure 1: Histogram is blurred if the images are not aligned (Guido Gerig)
13. Joint entropy

Joint entropy measures the amount of information in the two images combined. If these two images are totally unrelated, then the joint entropy will be the sum of the entropies of the individual images. The more similar the images are, the lower the joint entropy compared to the sum of the individual entropies. The concept of joint entropy can be visualized using a joint histogram calculated from the images, as shown in the Figure below. For all voxels in the overlapping regions of the images we plot the intensity of this voxel in image A against the intensity of the corresponding voxel in image B. The joint histogram can be normalized by dividing by the total number of voxels N, and regarded as a joint probability density function (PDF) P(a,b) of images A and B. The number of elements in the PDF can either be determined by the range of intensity values in the two images, or from a partitioning of the intensity space into "bins".
Definition: The joint entropy H(A,B) is therefore given by:

H(A,B) = -\sum_a \sum_b P(a,b) \log P(a,b)    (21)

where a, b represent the original image intensities or the selected intensity bins. As can be seen from the Figure below, the joint histograms disperse or blur with increasing misregistration, and thus the entropy increases.
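A minimal sketch of Eq. (21) on two tiny one-dimensional "images" (the intensity lists are our own toy numbers) shows the histogram-dispersal effect described above:

```python
import math
from collections import Counter

def joint_entropy(img_a, img_b):
    """H(A,B) in bits from the normalized joint histogram of two
    equally sized images (the discrete sum of Eq. 21)."""
    pairs = Counter(zip(img_a, img_b))
    n = len(img_a)
    return -sum((c / n) * math.log2(c / n) for c in pairs.values())

a = [0, 0, 1, 1, 2, 2, 3, 3]                 # toy intensity list (our own numbers)
aligned = joint_entropy(a, a)                # identical images: H(A,B) = H(A) = 2 bits
shifted = joint_entropy(a, a[1:] + a[:1])    # misaligned copy: histogram disperses
```

The misaligned pair spreads the joint histogram over more cells, raising the joint entropy, exactly the behavior the figure below illustrates.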
Figure 2: Example 2D histograms of the head images: (a) identical MR images, (b) MR and CT images. (Hill et al., "Voxel similarity measures for automated image registration," Visualization in Biomedical Computing 1994, Proc. SPIE vol. 2359, pp. 205-216, 1994.) Panels: Aligned; Translated by 2 mm; Translated by 5 mm.
14. References

[1] Brillouin, L., Science and Information Theory, Academic Press, New York, 1956.
[2] Brooks, Daniel R., and Wiley, E. O., Evolution as Entropy: Toward a Unified Theory of Biology, Second Edition, University of Chicago Press, Chicago, 1988.
[3] Campbell, Jeremy, Grammatical Man: Information, Entropy, Language, and Life, Simon and Schuster, New York, 1982.
[4] Cover, T. M., and Thomas, J. A., Elements of Information Theory, John Wiley and Sons, New York, 1991.
[5] DeLillo, Don, White Noise, Viking/Penguin, New York, 1984.
[6] Feller, W., An Introduction to Probability Theory and Its Applications, Wiley, New York, 1957.
[7] Feynman, Richard, Feynman Lectures on Computation, Addison-Wesley, Reading, 1996.
[8] Gatlin, L. L., Information Theory and the Living System, Columbia University Press, New York, 1972.
[9] Haken, Hermann, Information and Self-Organization: A Macroscopic Approach to Complex Systems, Springer-Verlag, Berlin/New York, 1988.
[10] Hamming, R. W., Error detecting and error correcting codes, Bell Syst. Tech. J. 29, 147, 1950.
[11] Hamming, R. W., Coding and Information Theory, 2nd ed., Prentice-Hall, Englewood Cliffs, 1986.
[12] Hill, R., A First Course in Coding Theory, Clarendon Press, Oxford, 1986.
[13] Hodges, A., Alan Turing: The Enigma, Vintage, London, 1983.
[14] Hofstadter, Douglas R., Metamagical Themas: Questing for the Essence of Mind and Pattern, Basic Books, New York, 1985.
[15] Jones, D. S., Elementary Information Theory, Clarendon Press, Oxford, 1979.
[16] Knuth, Eldon L., Introduction to Statistical Thermodynamics, McGraw-Hill, New York, 1966.
[17] Landauer, R., Information is physical, Phys. Today, May 1991, 23-29.
[18] Landauer, R., The physical nature of information, Phys. Lett. A 217, 188, 1996.
[19] van Lint, J. H., Coding Theory, Springer-Verlag, New York/Berlin, 1982.
[20] Lipton, R. J., Using DNA to solve NP-complete problems, Science 268, 542-545, Apr. 28, 1995.
[21] MacWilliams, F. J., and Sloane, N. J. A., The Theory of Error Correcting Codes, Elsevier Science, Amsterdam, 1977.
[22] Martin, N. F. G., and England, J. W., Mathematical Theory of Entropy, Addison-Wesley, Reading, 1981.
[23] Maxwell, J. C., Theory of Heat, Longmans, Green and Co, London, 1871.
[24] von Neumann, John, Probabilistic logic and the synthesis of reliable organisms from unreliable components, in Automata Studies (Shannon, McCarthy, eds.), 1956.
[25] Papadimitriou, C. H., Computational Complexity, Addison-Wesley, Reading, 1994.
[26] Pierce, John R., An Introduction to Information Theory: Symbols, Signals and Noise, second revised edition, Dover Publications, New York, 1980.
[27] Roman, Steven, Introduction to Coding and Information Theory, Springer-Verlag, Berlin/New York, 1997.
[28] Sampson, Jeffrey R., Adaptive Information Processing: An Introductory Survey, Springer-Verlag, Berlin/New York, 1976.
[29] Schroeder, Manfred, Fractals, Chaos, Power Laws: Minutes from an Infinite Paradise, W. H. Freeman, New York, 1991.
[30] Shannon, C. E., A mathematical theory of communication, Bell Syst. Tech. J. 27, 379; also p. 623, 1948.
[31] Slepian, D., ed., Key Papers in the Development of Information Theory, IEEE Press, New York, 1974.
[32] Turing, A. M., On computable numbers, with an application to the Entscheidungsproblem, Proc. Lond. Math. Soc. Ser. 2, 42, 230; see also Proc. Lond. Math. Soc. Ser. 2, 43, 544, 1936.
[33] Zurek, W. H., Thermodynamic cost of computation, algorithmic complexity and the information metric, Nature 341, 119-124, 1989.
[34] Buzug, T. M., and Weese, J., "Image registration for DSA quality enhancement," Computerized Imaging Graphics 22, 103, 1998.
[35] Tom Carter's Notes: http://cogs.csustan.edu/~tom/