Information Theory
Mohamed Hamada
Software Engineering Lab, The University of Aizu
Email: [email protected]
http://www.u-aizu.ac.jp/~hamada
Evaluation
• Active sheets 20%
• Midterm Exam 30%
• Final Exam 50 %
1
Goals
• Understand the concepts of information entropy and channel capacity
• Understand the digital communication model and its components
• Understand how the components operate
• Understand data compression
• Understand error detection and correction
2
Course Outline
• Introduction to set theory & probability
• Introduction to information theory
• Coding techniques & data compression
• Information Entropy
• Communication Channel
• Error Detection and Correction
3
Final Overview
4
Lecture 1
Introduction to Set Theory and Probability
• 1. Sets, Operations on sets
• 2. Trial, Probability space, Events
• 3. Random variables, Probability distribution
• 4. Expected values, Variance
• 5. Conditional Probability
• 6. Bayes' Theorem
5
Lecture 2
• Overview of Information Theory
• Digital Communication
• History
• What is Information Theory
6
OVERVIEW OF INFORMATION THEORY FRAMEWORK
[Diagram: Information Theory at the center, connected to Communication Theory, Probability Theory, Statistics, Mathematics, Economics, Physics, Computer Science, and other fields.]
7
DIGITAL COMMUNICATION
8
Lecture 3
• Watching a Coding Video (50 mins.)
• What is Information Theory
• Information Source
• Introduction to Source Coding
10
INFORMATION TRANSFER ACROSS CHANNELS
[Block diagram: sent messages/symbols pass from the source through source coding (compression, source entropy) and channel coding (error correction, channel capacity), across the channel, then through channel decoding and source decoding (decompression) to the receiver as received messages; the diagram highlights the capacity vs. efficiency trade-off.]
11
Digital Communication Systems
[Block diagram: Information Source (memoryless, stationary, discrete) → Source Encoder → Channel Encoder → Modulator → Channel → De-Modulator → Channel Decoder → Source Decoder → Destination.]
12
Information Source
• Memoryless: the generated symbols (of a source message) are independent.
• Stationary: the statistics of the source do not change with time.
• Discrete: the source produces symbols at discrete unit times.
13
Lecture 4
• What is Entropy
• Information Source
• Measure of Information
• Self-Information
• Unit of Information
• Entropy
• Properties
14
Measure of Information
• The amount of information gained after observing a symbol s_k which has probability p_k:
I(s_k) = log(1 / p_k)
15
Unit of Information
• Depends on the BASE of the logarithm:
I(s_k) = log2(1 / p_k)   →  bits
I(s_k) = ln(1 / p_k)     →  nats
I(s_k) = log10(1 / p_k)  →  hartleys
16
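Changing the unit is just changing the base of the logarithm. A minimal sketch in Python (not part of the original slides; the function name is illustrative):

```python
import math

def self_information(p, base=2):
    """Information gained by observing a symbol of probability p: I = log_base(1/p)."""
    return math.log(1.0 / p, base)

p = 0.125
print(self_information(p, 2))        # ≈ 3.0 bits
print(self_information(p, math.e))   # ≈ 2.079 nats
print(self_information(p, 10))       # ≈ 0.903 hartleys
```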
Lecture 5
• Entropy
• Conditional Entropy
• Example
• Joint Entropy
• Chain Rule
17
Entropy H(S)
• Entropy is the average information content of a source:
H(S) = E[I(s_k)] = Σ_{k=0}^{K-1} p_k log2(1 / p_k)
18
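A short sketch of this formula (illustrative helper name `entropy`, not from the slides), applied to the five-symbol source that reappears in the Huffman example later in the deck:

```python
import math

def entropy(probs):
    """H(S) = sum over symbols of p_k * log2(1/p_k), skipping zero-probability symbols."""
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

print(entropy([0.4, 0.2, 0.2, 0.1, 0.1]))  # ≈ 2.122 bits per symbol
```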
Conditional Entropy H(Y|X)
The amount of information contained in Y when X is given:
H(Y|X) = Σ_j P(X = v_j) H(Y | X = v_j)
19
Joint Entropy
The amount of information contained in both events X and Y:
H(X, Y) = -Σ_{x,y} p(x, y) log p(x, y)
20
Chain Rule
Relationship between conditional and joint entropy:
H(X, Y) = H(X) + H(Y|X)
21
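A small numerical check of the chain rule, using a made-up 2×2 joint distribution purely for illustration (not from the slides):

```python
import math

def H(probs):
    """Entropy in bits of a list of probabilities."""
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

# Hypothetical joint distribution p(x, y), for illustration only
p_xy = {("x0", "y0"): 0.5, ("x0", "y1"): 0.25,
        ("x1", "y0"): 0.125, ("x1", "y1"): 0.125}

H_XY = H(p_xy.values())                 # joint entropy H(X, Y)
p_x = {"x0": 0.75, "x1": 0.25}          # marginal distribution of X
H_X = H(p_x.values())

# Conditional entropy H(Y|X) = sum_x p(x) * H(Y | X = x)
H_Y_given_X = sum(p_x[x] * H([p_xy[(x, y)] / p_x[x] for y in ("y0", "y1")])
                  for x in ("x0", "x1"))

print(H_XY, H_X + H_Y_given_X)   # both ≈ 1.75 bits: the chain rule holds
```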
Lecture 6
• Entropy and Data Compression
• Uniquely decodable codes
• Prefix Code
• Average Code Length
• Shannon’s First Theorem
• Kraft-McMillan Inequality
• Code Efficiency
• Code Extension
22
Prefix Coding (Instantaneous code)
• A prefix code is defined as a code in which no codeword is the prefix of any other codeword.
• A prefix code is uniquely decodable.
23
Average Code Length
• Source has K symbols
• Each symbol s_k has probability p_k
• Each symbol s_k is represented by a codeword c_k of length l_k bits
• Average codeword length:
L = Σ_{k=0}^{K-1} p_k l_k
[Diagram: Information Source emits s_k → Source Encoder outputs c_k]
24
Shannon's First Theorem: The Source Coding Theorem
The outputs of an information source cannot be represented by a source code whose average length is less than the source entropy:
L ≥ H(S)
25
Kraft-McMillan Inequality
• If the codeword lengths of a code satisfy the Kraft-McMillan inequality, then a prefix code with these codeword lengths can be constructed:
Σ_{k=0}^{K-1} 2^(-l_k) ≤ 1
26
Code Efficiency η
η = H(S) / L
• An efficient code means η is close to 1.
27
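A sketch tying average code length, the source coding theorem bound, the Kraft-McMillan check, and the efficiency η together, using the probabilities and codeword lengths from the Huffman example later in the deck (all names are illustrative, not from the slides):

```python
import math

probs   = [0.4, 0.2, 0.2, 0.1, 0.1]   # p_k
lengths = [1, 2, 3, 4, 4]             # l_k (codeword lengths from the Huffman example)

H = sum(p * math.log2(1.0 / p) for p in probs)   # source entropy H(S)
L = sum(p * l for p, l in zip(probs, lengths))   # average code length L = sum p_k * l_k
kraft = sum(2.0 ** -l for l in lengths)          # Kraft-McMillan sum, must be <= 1

print(f"H(S) = {H:.3f} bits, L = {L:.2f} bits")  # L >= H(S), as the source coding theorem requires
print(f"Kraft sum = {kraft}")                    # 1.0 here, so a prefix code with these lengths exists
print(f"efficiency eta = {H / L:.3f}")           # close to 1 for an efficient code
```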
Lecture 7
• Mid Term Exam
28
Lecture 8
• Source Coding Techniques
• Huffman Code
• Two-pass Huffman Code
• Lempel-Ziv Encoding
• Lempel-Ziv Decoding
29
Data Compression
[Block diagram: Information Source → Source Encoder → Channel Encoder → Modulator → Channel → De-Modulator → Channel Decoder → Source Decoder → Destination; source coding (data compression) takes place in the Source Encoder/Decoder.]
Source coding techniques:
1. Huffman Code
2. Two-pass Huffman Code
3. Lempel-Ziv Code
4. Fano Code
5. Shannon Code
6. Arithmetic Code
30
Huffman Code
1. Take the two smallest probabilities together: P(i) + P(j)
2. Replace symbols i and j by a new symbol with the combined probability
3. Go to 1, until finished
Application examples: JPEG, MPEG, MP3
31
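A minimal Huffman-coding sketch following the merge-the-two-smallest-probabilities loop above (function name and data structures are illustrative, not the lecture's implementation):

```python
import heapq

def huffman(probs):
    """probs: dict symbol -> probability. Returns dict symbol -> codeword."""
    # Each heap entry: (probability, tie-breaker, {symbol: partial codeword})
    heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)   # smallest probability
        p2, _, c2 = heapq.heappop(heap)   # second smallest
        # Prefix '0' to one group and '1' to the other, then merge them
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        count += 1
        heapq.heappush(heap, (p1 + p2, count, merged))
    return heap[0][2]

print(huffman({"s2": 0.4, "s1": 0.2, "s3": 0.2, "s0": 0.1, "s4": 0.1}))
```

Ties between equal probabilities can be broken in different ways, giving different but equally optimal codes; that is what the "Another Solution B" table below illustrates.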
Another Solution B
Source symbol s_k   Stage I   Stage II   Stage III   Stage IV   Code
s2                  0.4       0.4        0.4         0.6        1
s1                  0.2       0.2        0.4         0.4        01
s3                  0.2       0.2        0.2                    000
s0                  0.1       0.2                               0010
s4                  0.1                                         0011
32
Two-pass Huffman Code
This method is used when the probabilities of the symbols in the information source are unknown. We first estimate these probabilities by counting the occurrences of each symbol in the given message, and then build the Huffman code. This can be summarized in the following two passes:
Pass 1: Measure the occurrence probability of each character in the message
Pass 2: Build the Huffman code from the estimated probabilities
33
Source Coding Techniques: 2. Two-pass Huffman Code
Example
Consider the message: M=ABABABABABACADABACADABACADABACAD
L(M) = 32
#(A) = 16, p(A) = 16/32 = 0.5
#(B) = 8,  p(B) = 8/32 = 0.25
#(C) = 4,  p(C) = 4/32 = 0.125
#(D) = 4,  p(D) = 4/32 = 0.125
34
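The two passes can be sketched as follows (the `huffman` helper from the earlier sketch is assumed; names are illustrative):

```python
from collections import Counter

M = "ABABABABABACADABACADABACADABACAD"

# Pass 1: estimate probabilities from symbol occurrences
counts = Counter(M)
probs = {s: c / len(M) for s, c in counts.items()}
print(probs)  # {'A': 0.5, 'B': 0.25, 'C': 0.125, 'D': 0.125}

# Pass 2: build the Huffman code from the estimated probabilities
# code = huffman(probs)
```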
Lempel-Ziv Coding
• Universal: effective for different types of data
• Lossless: no errors at reproduction
Applications: GIF, TIFF, V.42bis modem compression standard, PostScript Level 2
35
Lempel-Ziv Coding
Encoding
Input: 0 0 0 1 0 1 1 1 0 0 1 0 1 0 0 1 0 1 …
Codebook index:   1   2   3   4   5   6   7   8   9
Subsequence:      0   1
Representation:
36
Lempel-Ziv Coding Example
Information bits: 0 0 0 1 0 1 1 1 0 0 1 0 1 0 0 1 0 1 …
Codebook index:   1      2      3      4      5      6      7      8      9
Subsequence:      0      1      00     01     011    10     010    100    101
Representation:                 11     12     42     21     41     61     62
Source code:                    0010   0011   1001   0100   1000   1100   1101
Source-encoded bits: 0010 0011 1001 0100 1000 1100 1101
37
LZ Encoding Example
Consider the input message: a a b a a c a b c a b
Initial dictionary: a = 0, b = 1, c = 2
LZ Encoder: aabaacabcab → 0 0 1 3 2 4 7 1
LZ encoding process (output code, dictionary update):
0   aa = 3    (aa not in dictionary: output the code of a, add aa to the dictionary)
0   ab = 4    (continue with a, store ab in the dictionary)
1   ba = 5    (continue with b, store ba in the dictionary)
3   aac = 6   (aa is in the dictionary, aac is not: output the code of aa, add aac)
2   ca = 7
4   abc = 8
7   cab = 9
1   —         (end of input: output the code of the final segment b)
38
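A sketch of an LZW-style encoder that reproduces the output above (function name is illustrative; this is a simplification, not the lecture's exact procedure):

```python
def lz_encode(message, alphabet):
    """LZW-style encoding: the dictionary starts with the basic symbols."""
    dictionary = {s: i for i, s in enumerate(alphabet)}  # e.g. {'a': 0, 'b': 1, 'c': 2}
    w, output = "", []
    for ch in message:
        if w + ch in dictionary:
            w += ch                                # longest match so far, keep extending
        else:
            output.append(dictionary[w])           # emit the code of the longest match
            dictionary[w + ch] = len(dictionary)   # store the new segment
            w = ch
    if w:
        output.append(dictionary[w])               # flush the final segment
    return output, dictionary

codes, d = lz_encode("aabaacabcab", "abc")
print(codes)  # [0, 0, 1, 3, 2, 4, 7, 1], as on the slide
```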
UNIVERSAL (LZW) Decoder
1. Start with the basic symbol set.
2. Read a code c from the compressed file. The address c in the dictionary determines the segment w. Write w to the output file.
3. Add wa to the dictionary, where a is the first letter of the next segment.
39
LZ Decoding Example
LZ Decoder: 0 0 1 3 2 4 7 1 → aabaacabcab
Initial dictionary: a = 0, b = 1, c = 2
LZ decoding process (input code, output, dictionary update):
0   a    —         (output a)
0   a    aa = 3    (output a determines ? = a; update aa)
1   b    ab = 4    (output 1 determines ! = b; update ab)
3   aa   ba = 5
2   c    aac = 6
4   ab   ca = 7
7   ca   abc = 8
1   b    cab = 9
Output string: a a b a a c a b c a b
40
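A matching decoder sketch (illustrative name `lz_decode`; it also handles the standard corner case of a code that is not yet in the dictionary, which does not arise in this example):

```python
def lz_decode(codes, alphabet):
    """LZW-style decoding: rebuild the dictionary while reading the codes."""
    dictionary = {i: s for i, s in enumerate(alphabet)}  # {0: 'a', 1: 'b', 2: 'c'}
    w = dictionary[codes[0]]
    output = [w]
    for c in codes[1:]:
        if c in dictionary:
            entry = dictionary[c]
        else:                              # code refers to the segment being built (w + w[0])
            entry = w + w[0]
        output.append(entry)
        dictionary[len(dictionary)] = w + entry[0]   # previous segment + first letter of this one
        w = entry
    return "".join(output)

print(lz_decode([0, 0, 1, 3, 2, 4, 7, 1], "abc"))  # aabaacabcab
```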
Lecture 9
• Source Coding Techniques
• Fano Code
• Coding Examples
• Shannon Code
• Code Comparison
41
Fano Code
The Fano code is constructed as follows:
1. Arrange the information source symbols in order of decreasing probability.
2. Divide the symbols into two groups whose probabilities are as nearly equal as possible.
3. Assign one of the binary symbols (i.e. 0 or 1) to each group as the next code symbol.
4. Repeat steps 2 and 3 within each group as long as possible.
5. Stop when there are no more groups to divide.
42
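A recursive sketch of this procedure (illustrative name `fano`; the split point is chosen so the two group probabilities are as equal as possible):

```python
def fano(symbols):
    """symbols: list of (symbol, probability), sorted by decreasing probability.
    Returns dict symbol -> codeword."""
    if len(symbols) <= 1:
        return {s: "" for s, _ in symbols}
    total, running, split = sum(p for _, p in symbols), 0.0, 1
    best = float("inf")
    for i in range(1, len(symbols)):            # find the most balanced split point
        running += symbols[i - 1][1]
        if abs(total - 2 * running) < best:
            best, split = abs(total - 2 * running), i
    codes = {}
    for s, c in fano(symbols[:split]).items():  # first group gets prefix '0'
        codes[s] = "0" + c
    for s, c in fano(symbols[split:]).items():  # second group gets prefix '1'
        codes[s] = "1" + c
    return codes

print(fano([("A", 0.39), ("B", 0.18), ("C", 0.15), ("D", 0.15), ("E", 0.13)]))
# {'A': '00', 'B': '01', 'C': '10', 'D': '110', 'E': '111'} -- matches Example 3 below
```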
Example 3: Fano Code
Symbol   Probability   Fano Code
A        0.39          00
B        0.18          01
C        0.15          10
D        0.15          110
E        0.13          111
43
Shannon Code
The Shannon code is constructed as follows:
1. Calculate the series of cumulative probabilities q_k = Σ_{i=1}^{k-1} p_i, k = 1, 2, …, n
2. Calculate the code length for each symbol using log(1/p_k) ≤ l_k < log(1/p_k) + 1
3. Write q_k in the form c_1 2^-1 + c_2 2^-2 + … + c_{l_k} 2^-l_k, where each c_i is either 0 or 1; the codeword for the symbol is c_1 c_2 … c_{l_k}
44
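A sketch of these three steps (illustrative name `shannon_code`; it assumes the probabilities are already sorted in decreasing order, as in Example 4 below):

```python
import math

def shannon_code(probs):
    """probs: list of probabilities in decreasing order. Returns list of codewords."""
    codes, q = [], 0.0
    for p in probs:
        l = math.ceil(math.log2(1.0 / p))   # code length: log2(1/p) <= l < log2(1/p) + 1
        # Expand the cumulative probability q to l binary digits
        bits, frac = "", q
        for _ in range(l):
            frac *= 2
            bits += "1" if frac >= 1 else "0"
            frac -= int(frac)
        codes.append(bits)
        q += p                              # q_k is the sum of probabilities of earlier symbols
    return codes

probs = [1/4, 1/4, 1/8, 1/8, 1/16, 1/16, 1/32, 1/32, 1/32, 1/32]
print(shannon_code(probs))
# ['00', '01', '100', '101', '1100', '1101', '11100', '11101', '11110', '11111']
```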
Example 4: Shannon Code
Symbol   Probability   q_k     Length l_k   Shannon Code
A        1/4           0       2            00
B        1/4           1/4     2            01
C        1/8           1/2     3            100
D        1/8           5/8     3            101
E        1/16          3/4     4            1100
F        1/16          13/16   4            1101
G        1/32          7/8     5            11100
H        1/32          29/32   5            11101
I        1/32          15/16   5            11110
J        1/32          31/32   5            11111
45
Lecture 10
• Source Coding Techniques
• Arithmetic Coding
• Arithmetic vs. Huffman Coding
• Coding Examples
• Arithmetic Decoding
• Decoding Examples
46
Arithmetic Code: Coding
ArithmeticEncoding(Message)
1. CurrentInterval = [0, 1);
While the end of the message is not reached:
  2. Read letter xi from the message;
  3. Divide CurrentInterval into subintervals IR_CurrentInterval;
  4. CurrentInterval = the subinterval of IR_CurrentInterval corresponding to xi;
Output any number from CurrentInterval (usually its left boundary);
47
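A direct sketch of this loop (illustrative name `arithmetic_encode`; it uses exact rational arithmetic via `fractions.Fraction` to avoid floating-point boundary issues, which is a convenience of the sketch, not part of the lecture):

```python
from fractions import Fraction

MODEL = [("A", Fraction(4, 10)), ("B", Fraction(3, 10)),
         ("C", Fraction(1, 10)), ("#", Fraction(2, 10))]

def arithmetic_encode(message, model=MODEL):
    """Shrink [low, high) once per symbol; the message must end with the terminator '#'."""
    low, high = Fraction(0), Fraction(1)
    for x in message:
        width = high - low
        for sym, p in model:             # divide the current interval in model order
            sub_high = low + width * p
            if sym == x:                 # keep the subinterval of the symbol just read
                high = sub_high
                break
            low = sub_high
    return low, high                     # any number in [low, high) encodes the message

low, high = arithmetic_encode("ABBC#")
print(float(low), float(high))           # 0.23608 0.2368, as in the example below
```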
Arithmetic Code: Coding
Example 1 — input message: A B B C #
Symbol probabilities: A = 0.4, B = 0.3, C = 0.1, # = 0.2
1. CurrentInterval = [0, 1);
2. Read Xi
Xi   Current interval     Subintervals
A    [0, 1)
48
Arithmetic Code: Coding
Example 1 — input message: A B B C #
Symbol probabilities: A = 0.4, B = 0.3, C = 0.1, # = 0.2
Xi   Current interval     Subintervals
A    [0, 1)               [0, 0.4), [0.4, 0.7), [0.7, 0.8), [0.8, 1)
B    [0, 0.4)             [0, 0.16), [0.16, 0.28), [0.28, 0.32), [0.32, 0.4)
B    [0.16, 0.28)         [0.16, 0.208), [0.208, 0.244), [0.244, 0.256), [0.256, 0.28)
C    [0.208, 0.244)       [0.208, 0.2224), [0.2224, 0.2332), [0.2332, 0.2368), [0.2368, 0.244)
#    [0.2332, 0.2368)     [0.2332, 0.23464), [0.23464, 0.23572), [0.23572, 0.23608), [0.23608, 0.2368)
# is the end of the input message → stop; return the current interval [0.23608, 0.2368).
49
Arithmetic Code
50
Arithmetic Code: Decoding
ArithmeticDecoding(Codeword)
0. CurrentInterval = [0, 1);
While(1):
  1. Divide CurrentInterval into subintervals IR_CurrentInterval;
  2. Determine the subinterval_i of CurrentInterval to which Codeword belongs;
  3. Output the letter xi corresponding to this subinterval;
  4. If xi is the symbol '#', return;
  5. CurrentInterval = subinterval_i in IR_CurrentInterval;
Arithmetic Code: Decoding
Example — input codeword: 0.23608
Symbol probabilities: A = 0.4, B = 0.3, C = 0.1, # = 0.2
0. CurrentInterval = [0, 1);
Current interval     Subintervals     Output
[0, 1)
51
Arithmetic Code: Decoding
Example — input codeword: 0.23608
Symbol probabilities: A = 0.4, B = 0.3, C = 0.1, # = 0.2
Current interval     Subintervals                                                                      Output
[0, 1)               [0, 0.4), [0.4, 0.7), [0.7, 0.8), [0.8, 1)                                        A
[0, 0.4)             [0, 0.16), [0.16, 0.28), [0.28, 0.32), [0.32, 0.4)                                B
[0.16, 0.28)         [0.16, 0.208), [0.208, 0.244), [0.244, 0.256), [0.256, 0.28)                      B
[0.208, 0.244)       [0.208, 0.2224), [0.2224, 0.2332), [0.2332, 0.2368), [0.2368, 0.244)              C
[0.2332, 0.2368)     [0.2332, 0.23464), [0.23464, 0.23572), [0.23572, 0.23608), [0.23608, 0.2368)      #
We repeat steps 1 to 5 of the algorithm until the output symbol is '#'.
4. If xi is the symbol '#' → yes → stop.
Return the output message: A B B C #
52
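A matching decoder sketch under the same assumptions (illustrative name `arithmetic_decode`; exact rational arithmetic again):

```python
from fractions import Fraction

MODEL = [("A", Fraction(4, 10)), ("B", Fraction(3, 10)),
         ("C", Fraction(1, 10)), ("#", Fraction(2, 10))]

def arithmetic_decode(codeword, model=MODEL):
    """Decode symbols until the terminator '#' appears."""
    low, high, out = Fraction(0), Fraction(1), []
    while not out or out[-1] != "#":
        width = high - low
        for sym, p in model:               # scan the subintervals in model order
            sub_high = low + width * p
            if codeword < sub_high:        # the codeword lies in this subinterval
                out.append(sym)
                high = sub_high
                break
            low = sub_high
    return "".join(out)

print(arithmetic_decode(Fraction("0.23608")))   # ABBC#
```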
Lecture 11
• Memoryless information source
• Information source with memory:
  - Stochastic process
  - Markov process
  - Ergodic process
53
[Block diagram: Information Source (memoryless, stationary, discrete; stochastic, Markov, ergodic) → Source Encoder → Channel Encoder → Modulator → Channel → De-Modulator → Channel Decoder → Source Decoder → Destination.]
54
Information Source
• Stochastic process: any sequence of random variables from some probability space.
• Markov information source: an information source with memory in which the probability of a symbol occurring in a message depends on a finite number of preceding symbols.
• Ergodic: a Markov chain is said to be ergodic if, after a certain finite number of steps, it is possible to go from any state to any other state with a nonzero probability.
55
Lecture 12
• Communication Channel
• Noiseless binary channel
• Binary Symmetric Channel (BSC)
• Symmetric Channel
• Mutual Information
• Channel Capacity
56
[Block diagram: Information Source → Source Encoder → Channel Encoder → Modulator → Channel → De-Modulator → Channel Decoder → Source Decoder → Destination. Channel topics: Noiseless Binary Channel, Binary Symmetric Channel, Symmetric Channel, Channel Capacity, Mutual Information, Conditional Entropy; source models: Stochastic, Markov, Ergodic.]
57
Noiseless Binary Channel
[Channel diagram: 0 → 0 and 1 → 1 with probability 1]
Transition matrix p(y | x):
        y=0   y=1
x=0      1     0
x=1      0     1
58
Binary Symmetric Channel (BSC) — a noisy channel
[Channel diagram: X → BSC → Y; 0 → 0 and 1 → 1 with probability 1-p; 0 → 1 and 1 → 0 with probability p]
Transition matrix p(y | x):
        y=0    y=1
x=0     1-p     p
x=1      p     1-p
59
Symmetric Channel (noisy channel)
In the transition matrix of this channel, all the rows are permutations of each other, and so are the columns.
Example:
        y1    y2    y3
x1     0.3   0.2   0.5
x2     0.5   0.3   0.2
x3     0.2   0.5   0.3
60
Mutual Information (MI)
Note that:
• I(X, Y) = H(Y) - H(Y|X)
• I(X, Y) = H(X) - H(X|Y)
• I(X, Y) = H(X) + H(Y) - H(X, Y)
• I(X, Y) = I(Y, X)
• I(X, X) = H(X)
[Venn diagram: H(X) and H(Y) overlap in I(X, Y); the non-overlapping parts are H(X|Y) and H(Y|X).]
61
Channel Capacity
The channel capacity C is the highest rate, in bits per channel use, at which information can be sent with arbitrarily low probability of error:
C = max over p(x) of I(X, Y)
where the maximum is taken over all possible input distributions p(x).
62
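For the binary symmetric channel this maximum is attained at the uniform input distribution and equals C = 1 - H(p). A numerical sketch (helper names are illustrative, not from the slides):

```python
import math

def h2(p):
    """Binary entropy function H(p) in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bsc_mutual_information(p_error, p_x1):
    """I(X, Y) = H(Y) - H(Y|X) for a BSC with crossover probability p_error."""
    p_y1 = p_x1 * (1 - p_error) + (1 - p_x1) * p_error   # P(Y = 1)
    return h2(p_y1) - h2(p_error)                        # H(Y|X) = H(p_error) for every input

p = 0.1
capacity = bsc_mutual_information(p, 0.5)   # maximum is at the uniform input p_x1 = 0.5
print(capacity, 1 - h2(p))                  # both ≈ 0.531 bits per channel use
```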
Lecture 13
• Channel Coding/Decoding
• Hamming Method:
- Hamming Distance
- Hamming Weight
• Hamming (7, 4)
63
[Block diagram: Information Source → Source Encoder → Channel Encoder → Modulator → Channel → De-Modulator → Channel Decoder → Source Decoder → Destination; Hamming coding takes place in the Channel Encoder and Channel Decoder.]
64
Example: Hamming (7, 4) Codes
• Generator matrix G = [ I4 | P ]
We assume that the sequence of symbols generated by the information source is divided into blocks of 4 symbols u1 u2 u3 u4. Codewords have length 7: (u1 u2 u3 u4 c1 c2 c3), with parity bits
c1 = u2 + u3 + u4
c2 = u1 + u3 + u4
c3 = u1 + u2 + u4
where + is modulo-2 addition: 0 + 0 = 1 + 1 = 0 and 1 + 0 = 0 + 1 = 1, and the ui are the information bits (the I4 part).
G = [ I4 | P ] =
1 0 0 0   0 1 1
0 1 0 0   1 0 1
0 0 1 0   1 1 0
0 0 0 1   1 1 1
65
Hamming (7, 4) Syndrome Decoding
Let G = [ Ik | P ]. For the Hamming (7, 4) code: n = 7 and k = 4.
H is the parity-check matrix, H^T is the transpose of H, and P^T is the transpose of P.
Step 1. Construct H = [ P^T | I_{n-k} ]
Step 2. Arrange the columns of H in order of increasing binary value
Step 3. Determine the syndrome S = y · H^T (y is the received message)
Step 4. If S = 0 then no error occurred during transmission of the information
Step 5. If S ≠ 0 then S gives the binary representation of the error position (we assume only one error occurred)
66
Example:
G = [ I4 | P ] =
1 0 0 0   0 1 1
0 1 0 0   1 0 1
0 0 1 0   1 1 0
0 0 0 1   1 1 1
Step 1. H = [ P^T | I_{n-k} ] =
0 1 1 1   1 0 0
1 0 1 1   0 1 0
1 1 0 1   0 0 1
Step 2. Arrange the columns of H in order of increasing binary value:
H =
0 0 0 1 1 1 1
0 1 1 0 0 1 1
1 0 1 0 1 0 1
Step 3. Suppose that y = (1111011) is received:
S = y · H^T = (101) = (5)10
Step 5. An error occurred at position 5 of the received message y = (1111011).
The correct sent message is then (1111111).
67
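A sketch of the complete encode/correct cycle (helper names are illustrative; the matrices are the ones from the example above, with H's columns in increasing binary order so that the syndrome value equals the error position):

```python
# Parity-check matrix with column j (1-based) equal to the binary representation of j
H = [[0, 0, 0, 1, 1, 1, 1],
     [0, 1, 1, 0, 0, 1, 1],
     [1, 0, 1, 0, 1, 0, 1]]

def encode(u):
    """Codeword (u1 u2 u3 u4 c1 c2 c3) with c1=u2+u3+u4, c2=u1+u3+u4, c3=u1+u2+u4 (mod 2)."""
    u1, u2, u3, u4 = u
    return [u1, u2, u3, u4, (u2 + u3 + u4) % 2, (u1 + u3 + u4) % 2, (u1 + u2 + u4) % 2]

def syndrome(y):
    """S = y . H^T over GF(2); as a binary number it is the error position (0 = no error)."""
    bits = [sum(h * b for h, b in zip(row, y)) % 2 for row in H]
    return bits[0] * 4 + bits[1] * 2 + bits[2]

def correct(y):
    y = list(y)
    pos = syndrome(y)
    if pos:                        # single-error assumption: flip the bit at the error position
        y[pos - 1] ^= 1
    return y

y = [1, 1, 1, 1, 0, 1, 1]          # received word from the example
print(syndrome(y))                 # 5 -> error at position 5
print(correct(y))                  # [1, 1, 1, 1, 1, 1, 1], the corrected codeword
print(correct(y) == encode([1, 1, 1, 1]))  # True
```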
Comments for Final Exam
Time: Jan. 31, 2012, Period 2, Room M4
16 questions: 13 multiple choice, 3 fill in the blank
Closed book: no textbook, no handouts
Allowed: dictionary, calculator, empty sheet of paper
Coverage: from Lecture 2 until the last lecture
68