Information Theory
Mohamed Hamada
Software Engineering Lab, The University of Aizu
Email: [email protected]
http://www.u-aizu.ac.jp/~hamada
Evaluation
• Active sheets 20%
• Midterm Exam 30%
• Final Exam 50 %
1
Goals
• Understand the concepts of information entropy and channel capacity
• Understand the digital communication model and its components
• Understand how the components operate
• Understand data compression
• Understand error detection and correction
2
Course Outline
• Introduction to set theory & probability
• Introduction to information theory
• Coding techniques & data compression
• Information Entropy
• Communication Channel
• Error Detection and Correction
3
Final Overview
4
Lecture 1
Introduction to Set Theory and Probability
• 1. Sets, Operations on sets
• 2. Trial, Probability space, Events
• 3. Random variables, Probability distribution
• 4. Expected values, Variance
• 5. Conditional Probability
• 6. Bayes' Theorem
5
Lecture 2
• Overview of Information Theory
• Digital Communication
• History
• What is Information Theory
6
OVERVIEW OF INFORMATION THEORY FRAMEWORK
[Diagram: Information Theory at the center, connected to Communication Theory, Probability Theory, Statistics, Mathematics, Economics, Physics, Computer Science, and other fields.]
7
DIGITAL COMMUNICATION
8
Lecture 3
• Watching a Coding Video (50 mins.)
• What is Information Theory
• Information Source
• Introduction to Source Coding
10
INFORMATION TRANSFER ACROSS CHANNELS
[Block diagram: sent messages/symbols pass from the source through source coding (compression, source entropy) and channel coding (error correction, channel capacity), across the channel, then through channel decoding and source decoding (decompression) to the receiver as received messages; the diagram highlights the capacity vs. efficiency trade-off.]
11
Digital Communication Systems
[Block diagram: Information Source (memoryless, stationary, discrete) → Source Encoder → Channel Encoder → Modulator → Channel → De-Modulator → Channel Decoder → Source Decoder → Destination.]
12
Information Source
• Memoryless: the generated symbols (of a source message) are independent.
• Stationary: the statistics of the source do not change with time.
• Discrete: the source produces symbols at discrete unit times.
13
Lecture 4
• What is Entropy
• Information Source
• Measure of Information
• Self-Information
• Unit of Information
• Entropy
• Properties
14
Measure of Information
• The amount of information gained after observing a symbol s_k which has probability p_k:
I(s_k) = log(1 / p_k)
15
Unit of Information
• Depends on the BASE of the logarithm:
I(s_k) = log2(1 / p_k)   →  bits
I(s_k) = ln(1 / p_k)     →  nats
I(s_k) = log10(1 / p_k)  →  hartleys
16
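Changing the unit is just changing the base of the logarithm. A minimal sketch in Python (not part of the original slides; the function name is illustrative):

```python
import math

def self_information(p, base=2):
    """Information gained by observing a symbol of probability p: I = log_base(1/p)."""
    return math.log(1.0 / p, base)

p = 0.125
print(self_information(p, 2))        # ≈ 3.0 bits
print(self_information(p, math.e))   # ≈ 2.079 nats
print(self_information(p, 10))       # ≈ 0.903 hartleys
```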
Lecture 5
• Entropy
• Conditional Entropy
• Example
• Joint Entropy
• Chain Rule
17
Entropy H(S)
• Entropy is the average information content of a source:
H(S) = E[I(s_k)] = Σ_{k=0}^{K-1} p_k log2(1 / p_k)
18
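A short sketch of this formula (illustrative helper name `entropy`, not from the slides), applied to the five-symbol source that reappears in the Huffman example later in the deck:

```python
import math

def entropy(probs):
    """H(S) = sum over symbols of p_k * log2(1/p_k), skipping zero-probability symbols."""
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

print(entropy([0.4, 0.2, 0.2, 0.1, 0.1]))  # ≈ 2.122 bits per symbol
```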
Conditional Entropy H(Y|X)
The amount of information contained in Y when X is given:
H(Y|X) = Σ_j P(X = v_j) H(Y | X = v_j)
19
Joint Entropy
The amount of information contained in both events X and Y:
H(X, Y) = -Σ_{x,y} p(x, y) log p(x, y)
20
Chain Rule
Relationship between conditional and joint entropy:
H(X, Y) = H(X) + H(Y|X)
21
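A small numerical check of the chain rule, using a made-up 2×2 joint distribution purely for illustration (not from the slides):

```python
import math

def H(probs):
    """Entropy in bits of a list of probabilities."""
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

# Hypothetical joint distribution p(x, y), for illustration only
p_xy = {("x0", "y0"): 0.5, ("x0", "y1"): 0.25,
        ("x1", "y0"): 0.125, ("x1", "y1"): 0.125}

H_XY = H(p_xy.values())                 # joint entropy H(X, Y)
p_x = {"x0": 0.75, "x1": 0.25}          # marginal distribution of X
H_X = H(p_x.values())

# Conditional entropy H(Y|X) = sum_x p(x) * H(Y | X = x)
H_Y_given_X = sum(p_x[x] * H([p_xy[(x, y)] / p_x[x] for y in ("y0", "y1")])
                  for x in ("x0", "x1"))

print(H_XY, H_X + H_Y_given_X)   # both ≈ 1.75 bits: the chain rule holds
```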
Lecture 6
• Entropy and Data Compression
• Uniquely decodable codes
• Prefix Code
• Average Code Length
• Shannon’s First Theorem
• Kraft-McMillan Inequality
• Code Efficiency
• Code Extension
22
Prefix Coding (Instantaneous code)
• A prefix code is defined as a code in which no codeword is the prefix of any other codeword.
• A prefix code is uniquely decodable.
23
Average Code Length
• Source has K symbols
• Each symbol s_k has probability p_k
• Each symbol s_k is represented by a codeword c_k of length l_k bits
• Average codeword length:
L = Σ_{k=0}^{K-1} p_k l_k
[Diagram: Information Source emits s_k → Source Encoder outputs c_k]
24
Shannon's First Theorem: The Source Coding Theorem
The outputs of an information source cannot be represented by a source code whose average length is less than the source entropy:
L ≥ H(S)
25
Kraft-McMillan Inequality
• If the codeword lengths of a code satisfy the Kraft-McMillan inequality, then a prefix code with these codeword lengths can be constructed:
Σ_{k=0}^{K-1} 2^(-l_k) ≤ 1
26
Code Efficiency η
η = H(S) / L
• An efficient code means η is close to 1.
27
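A sketch tying average code length, the source coding theorem bound, the Kraft-McMillan check, and the efficiency η together, using the probabilities and codeword lengths from the Huffman example later in the deck (all names are illustrative, not from the slides):

```python
import math

probs   = [0.4, 0.2, 0.2, 0.1, 0.1]   # p_k
lengths = [1, 2, 3, 4, 4]             # l_k (codeword lengths from the Huffman example)

H = sum(p * math.log2(1.0 / p) for p in probs)   # source entropy H(S)
L = sum(p * l for p, l in zip(probs, lengths))   # average code length L = sum p_k * l_k
kraft = sum(2.0 ** -l for l in lengths)          # Kraft-McMillan sum, must be <= 1

print(f"H(S) = {H:.3f} bits, L = {L:.2f} bits")  # L >= H(S), as the source coding theorem requires
print(f"Kraft sum = {kraft}")                    # 1.0 here, so a prefix code with these lengths exists
print(f"efficiency eta = {H / L:.3f}")           # close to 1 for an efficient code
```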
Lecture 7
• Mid Term Exam
28
Lecture 8
• Source Coding Techniques
• Huffman Code
• Two-pass Huffman Code
• Lempel-Ziv Encoding
• Lempel-Ziv Decoding
29
Data Compression
[Block diagram: Information Source → Source Encoder → Channel Encoder → Modulator → Channel → De-Modulator → Channel Decoder → Source Decoder → Destination; source coding (data compression) takes place in the Source Encoder/Decoder.]
Source coding techniques:
1. Huffman Code
2. Two-pass Huffman Code
3. Lempel-Ziv Code
4. Fano Code
5. Shannon Code
6. Arithmetic Code
30
Huffman Code
1. Take the two smallest probabilities together: P(i) + P(j)
2. Replace symbols i and j by a new symbol with the combined probability
3. Go to 1, until finished
Application examples: JPEG, MPEG, MP3
31
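A minimal Huffman-coding sketch following the merge-the-two-smallest-probabilities loop above (function name and data structures are illustrative, not the lecture's implementation):

```python
import heapq

def huffman(probs):
    """probs: dict symbol -> probability. Returns dict symbol -> codeword."""
    # Each heap entry: (probability, tie-breaker, {symbol: partial codeword})
    heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)   # smallest probability
        p2, _, c2 = heapq.heappop(heap)   # second smallest
        # Prefix '0' to one group and '1' to the other, then merge them
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        count += 1
        heapq.heappush(heap, (p1 + p2, count, merged))
    return heap[0][2]

print(huffman({"s2": 0.4, "s1": 0.2, "s3": 0.2, "s0": 0.1, "s4": 0.1}))
```

Ties between equal probabilities can be broken in different ways, giving different but equally optimal codes; that is what the "Another Solution B" table below illustrates.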
Another Solution B
Source symbol s_k   Stage I   Stage II   Stage III   Stage IV   Code
s2                  0.4       0.4        0.4         0.6        1
s1                  0.2       0.2        0.4         0.4        01
s3                  0.2       0.2        0.2                    000
s0                  0.1       0.2                               0010
s4                  0.1                                         0011
32
Two-pass Huffman Code
This method is used when the probabilities of the symbols in the information source are unknown. We first estimate these probabilities by counting the occurrences of each symbol in the given message, and then build the Huffman code. This can be summarized in the following two passes:
Pass 1: Measure the occurrence probability of each character in the message
Pass 2: Build the Huffman code from the estimated probabilities
33
Source Coding Techniques: 2. Two-pass Huffman Code
Example
Consider the message: M=ABABABABABACADABACADABACADABACAD
L(M) = 32
#(A) = 16, p(A) = 16/32 = 0.5
#(B) = 8,  p(B) = 8/32 = 0.25
#(C) = 4,  p(C) = 4/32 = 0.125
#(D) = 4,  p(D) = 4/32 = 0.125
34
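The two passes can be sketched as follows (the `huffman` helper from the earlier sketch is assumed; names are illustrative):

```python
from collections import Counter

M = "ABABABABABACADABACADABACADABACAD"

# Pass 1: estimate probabilities from symbol occurrences
counts = Counter(M)
probs = {s: c / len(M) for s, c in counts.items()}
print(probs)  # {'A': 0.5, 'B': 0.25, 'C': 0.125, 'D': 0.125}

# Pass 2: build the Huffman code from the estimated probabilities
# code = huffman(probs)
```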
Lempel-Ziv Coding
• Universal: effective for different types of data
• Lossless: no errors at reproduction
Applications: GIF, TIFF, V.42bis modem compression standard, PostScript Level 2
35
Lempel-Ziv Coding
Encoding
Input: 0 0 0 1 0 1 1 1 0 0 1 0 1 0 0 1 0 1 …
Codebook index:   1   2   3   4   5   6   7   8   9
Subsequence:      0   1
Representation:
36
Lempel-Ziv Coding Example
Information bits: 0 0 0 1 0 1 1 1 0 0 1 0 1 0 0 1 0 1 …
Codebook index:   1      2      3      4      5      6      7      8      9
Subsequence:      0      1      00     01     011    10     010    100    101
Representation:                 11     12     42     21     41     61     62
Source code:                    0010   0011   1001   0100   1000   1100   1101
Source-encoded bits: 0010 0011 1001 0100 1000 1100 1101
37
LZ Encoding Example
Consider the input message: a a b a a c a b c a b
Initial dictionary: a = 0, b = 1, c = 2
LZ Encoder: aabaacabcab → 0 0 1 3 2 4 7 1
LZ encoding process (output code, dictionary update):
0   aa = 3    (aa not in dictionary: output the code of a, add aa to the dictionary)
0   ab = 4    (continue with a, store ab in the dictionary)
1   ba = 5    (continue with b, store ba in the dictionary)
3   aac = 6   (aa is in the dictionary, aac is not: output the code of aa, add aac)
2   ca = 7
4   abc = 8
7   cab = 9
1   —         (end of input: output the code of the final segment b)
38
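A sketch of an LZW-style encoder that reproduces the output above (function name is illustrative; this is a simplification, not the lecture's exact procedure):

```python
def lz_encode(message, alphabet):
    """LZW-style encoding: the dictionary starts with the basic symbols."""
    dictionary = {s: i for i, s in enumerate(alphabet)}  # e.g. {'a': 0, 'b': 1, 'c': 2}
    w, output = "", []
    for ch in message:
        if w + ch in dictionary:
            w += ch                                # longest match so far, keep extending
        else:
            output.append(dictionary[w])           # emit the code of the longest match
            dictionary[w + ch] = len(dictionary)   # store the new segment
            w = ch
    if w:
        output.append(dictionary[w])               # flush the final segment
    return output, dictionary

codes, d = lz_encode("aabaacabcab", "abc")
print(codes)  # [0, 0, 1, 3, 2, 4, 7, 1], as on the slide
```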
UNIVERSAL (LZW) Decoder
1. Start with the basic symbol set.
2. Read a code c from the compressed file. The address c in the dictionary determines the segment w. Write w to the output file.
3. Add wa to the dictionary, where a is the first letter of the next segment.
39
LZ Decoding Example
LZ Decoder: 0 0 1 3 2 4 7 1 → aabaacabcab
Initial dictionary: a = 0, b = 1, c = 2
LZ decoding process (input code, output, dictionary update):
0   a    —         (output a)
0   a    aa = 3    (output a determines ? = a; update aa)
1   b    ab = 4    (output 1 determines ! = b; update ab)
3   aa   ba = 5
2   c    aac = 6
4   ab   ca = 7
7   ca   abc = 8
1   b    cab = 9
Output string: a a b a a c a b c a b
40
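A matching decoder sketch (illustrative name `lz_decode`; it also handles the standard corner case of a code that is not yet in the dictionary, which does not arise in this example):

```python
def lz_decode(codes, alphabet):
    """LZW-style decoding: rebuild the dictionary while reading the codes."""
    dictionary = {i: s for i, s in enumerate(alphabet)}  # {0: 'a', 1: 'b', 2: 'c'}
    w = dictionary[codes[0]]
    output = [w]
    for c in codes[1:]:
        if c in dictionary:
            entry = dictionary[c]
        else:                              # code refers to the segment being built (w + w[0])
            entry = w + w[0]
        output.append(entry)
        dictionary[len(dictionary)] = w + entry[0]   # previous segment + first letter of this one
        w = entry
    return "".join(output)

print(lz_decode([0, 0, 1, 3, 2, 4, 7, 1], "abc"))  # aabaacabcab
```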
Lecture 9
• Source Coding Techniques
• Fano Code
• Coding Examples
• Shannon Code
• Code Comparison
41
Fano Code
The Fano code is constructed as follows:
1. Arrange the information source symbols in order of decreasing probability.
2. Divide the symbols into two groups whose probabilities are as nearly equal as possible.
3. Assign one of the binary symbols (i.e. 0 or 1) to each group as the next code symbol.
4. Repeat steps 2 and 3 within each group as long as possible.
5. Stop when there are no more groups to divide.
42
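A recursive sketch of this procedure (illustrative name `fano`; the split point is chosen so the two group probabilities are as equal as possible):

```python
def fano(symbols):
    """symbols: list of (symbol, probability), sorted by decreasing probability.
    Returns dict symbol -> codeword."""
    if len(symbols) <= 1:
        return {s: "" for s, _ in symbols}
    total, running, split = sum(p for _, p in symbols), 0.0, 1
    best = float("inf")
    for i in range(1, len(symbols)):            # find the most balanced split point
        running += symbols[i - 1][1]
        if abs(total - 2 * running) < best:
            best, split = abs(total - 2 * running), i
    codes = {}
    for s, c in fano(symbols[:split]).items():  # first group gets prefix '0'
        codes[s] = "0" + c
    for s, c in fano(symbols[split:]).items():  # second group gets prefix '1'
        codes[s] = "1" + c
    return codes

print(fano([("A", 0.39), ("B", 0.18), ("C", 0.15), ("D", 0.15), ("E", 0.13)]))
# {'A': '00', 'B': '01', 'C': '10', 'D': '110', 'E': '111'} -- matches Example 3 below
```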
Example 3: Fano Code
Symbol   Probability   Fano Code
A        0.39          00
B        0.18          01
C        0.15          10
D        0.15          110
E        0.13          111
43
Shannon Code
The Shannon code is constructed as follows:
1. Calculate the series of cumulative probabilities q_k = Σ_{i=1}^{k-1} p_i, k = 1, 2, …, n
2. Calculate the code length for each symbol using log(1/p_k) ≤ l_k < log(1/p_k) + 1
3. Write q_k in the form c_1 2^-1 + c_2 2^-2 + … + c_{l_k} 2^-l_k, where each c_i is either 0 or 1; the codeword for the symbol is c_1 c_2 … c_{l_k}
44
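A sketch of these three steps (illustrative name `shannon_code`; it assumes the probabilities are already sorted in decreasing order, as in Example 4 below):

```python
import math

def shannon_code(probs):
    """probs: list of probabilities in decreasing order. Returns list of codewords."""
    codes, q = [], 0.0
    for p in probs:
        l = math.ceil(math.log2(1.0 / p))   # code length: log2(1/p) <= l < log2(1/p) + 1
        # Expand the cumulative probability q to l binary digits
        bits, frac = "", q
        for _ in range(l):
            frac *= 2
            bits += "1" if frac >= 1 else "0"
            frac -= int(frac)
        codes.append(bits)
        q += p                              # q_k is the sum of probabilities of earlier symbols
    return codes

probs = [1/4, 1/4, 1/8, 1/8, 1/16, 1/16, 1/32, 1/32, 1/32, 1/32]
print(shannon_code(probs))
# ['00', '01', '100', '101', '1100', '1101', '11100', '11101', '11110', '11111']
```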
Example 4: Shannon Code
Symbol   Probability   q_k     Length l_k   Shannon Code
A        1/4           0       2            00
B        1/4           1/4     2            01
C        1/8           1/2     3            100
D        1/8           5/8     3            101
E        1/16          3/4     4            1100
F        1/16          13/16   4            1101
G        1/32          7/8     5            11100
H        1/32          29/32   5            11101
I        1/32          15/16   5            11110
J        1/32          31/32   5            11111
45
Lecture 10
• Source Coding Techniques
• Arithmetic Coding
• Arithmetic vs. Huffman Coding
• Coding Examples
• Arithmetic Decoding
• Decoding Examples
46
Arithmetic Code: Coding
ArithmeticEncoding(Message)
1. CurrentInterval = [0, 1);
While the end of the message is not reached:
  2. Read letter xi from the message;
  3. Divide CurrentInterval into subintervals IR_CurrentInterval;
  4. CurrentInterval = the subinterval of IR_CurrentInterval corresponding to xi;
Output any number from CurrentInterval (usually its left boundary);
47
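A direct sketch of this loop (illustrative name `arithmetic_encode`; it uses exact rational arithmetic via `fractions.Fraction` to avoid floating-point boundary issues, which is a convenience of the sketch, not part of the lecture):

```python
from fractions import Fraction

MODEL = [("A", Fraction(4, 10)), ("B", Fraction(3, 10)),
         ("C", Fraction(1, 10)), ("#", Fraction(2, 10))]

def arithmetic_encode(message, model=MODEL):
    """Shrink [low, high) once per symbol; the message must end with the terminator '#'."""
    low, high = Fraction(0), Fraction(1)
    for x in message:
        width = high - low
        for sym, p in model:             # divide the current interval in model order
            sub_high = low + width * p
            if sym == x:                 # keep the subinterval of the symbol just read
                high = sub_high
                break
            low = sub_high
    return low, high                     # any number in [low, high) encodes the message

low, high = arithmetic_encode("ABBC#")
print(float(low), float(high))           # 0.23608 0.2368, as in the example below
```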
Arithmetic Code: Coding
Example 1 — input message: A B B C #
Symbol probabilities: A = 0.4, B = 0.3, C = 0.1, # = 0.2
1. CurrentInterval = [0, 1);
2. Read Xi
Xi   Current interval     Subintervals
A    [0, 1)
48
Arithmetic Code: Coding
Example 1 — input message: A B B C #
Symbol probabilities: A = 0.4, B = 0.3, C = 0.1, # = 0.2
Xi   Current interval     Subintervals
A    [0, 1)               [0, 0.4), [0.4, 0.7), [0.7, 0.8), [0.8, 1)
B    [0, 0.4)             [0, 0.16), [0.16, 0.28), [0.28, 0.32), [0.32, 0.4)
B    [0.16, 0.28)         [0.16, 0.208), [0.208, 0.244), [0.244, 0.256), [0.256, 0.28)
C    [0.208, 0.244)       [0.208, 0.2224), [0.2224, 0.2332), [0.2332, 0.2368), [0.2368, 0.244)
#    [0.2332, 0.2368)     [0.2332, 0.23464), [0.23464, 0.23572), [0.23572, 0.23608), [0.23608, 0.2368)
# is the end of the input message → stop; return the current interval [0.23608, 0.2368).
49
Arithmetic Code
50
Arithmetic Code: Decoding
ArithmeticDecoding(Codeword)
0. CurrentInterval = [0, 1);
While(1):
  1. Divide CurrentInterval into subintervals IR_CurrentInterval;
  2. Determine the subinterval_i of CurrentInterval to which Codeword belongs;
  3. Output the letter xi corresponding to this subinterval;
  4. If xi is the symbol '#', return;
  5. CurrentInterval = subinterval_i in IR_CurrentInterval;
Arithmetic Code: Decoding
Example — input codeword: 0.23608
Symbol probabilities: A = 0.4, B = 0.3, C = 0.1, # = 0.2
0. CurrentInterval = [0, 1);
Current interval     Subintervals     Output
[0, 1)
51
Arithmetic Code: Decoding
Example — input codeword: 0.23608
Symbol probabilities: A = 0.4, B = 0.3, C = 0.1, # = 0.2
Current interval     Subintervals                                                                      Output
[0, 1)               [0, 0.4), [0.4, 0.7), [0.7, 0.8), [0.8, 1)                                        A
[0, 0.4)             [0, 0.16), [0.16, 0.28), [0.28, 0.32), [0.32, 0.4)                                B
[0.16, 0.28)         [0.16, 0.208), [0.208, 0.244), [0.244, 0.256), [0.256, 0.28)                      B
[0.208, 0.244)       [0.208, 0.2224), [0.2224, 0.2332), [0.2332, 0.2368), [0.2368, 0.244)              C
[0.2332, 0.2368)     [0.2332, 0.23464), [0.23464, 0.23572), [0.23572, 0.23608), [0.23608, 0.2368)      #
We repeat steps 1 to 5 of the algorithm until the output symbol is '#'.
4. If xi is the symbol '#' → yes → stop.
Return the output message: A B B C #
52
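A matching decoder sketch under the same assumptions (illustrative name `arithmetic_decode`; exact rational arithmetic again):

```python
from fractions import Fraction

MODEL = [("A", Fraction(4, 10)), ("B", Fraction(3, 10)),
         ("C", Fraction(1, 10)), ("#", Fraction(2, 10))]

def arithmetic_decode(codeword, model=MODEL):
    """Decode symbols until the terminator '#' appears."""
    low, high, out = Fraction(0), Fraction(1), []
    while not out or out[-1] != "#":
        width = high - low
        for sym, p in model:               # scan the subintervals in model order
            sub_high = low + width * p
            if codeword < sub_high:        # the codeword lies in this subinterval
                out.append(sym)
                high = sub_high
                break
            low = sub_high
    return "".join(out)

print(arithmetic_decode(Fraction("0.23608")))   # ABBC#
```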
Lecture 11
• Memoryless information source
• Information source with memory:
  - Stochastic process
  - Markov process
  - Ergodic process
53
[Block diagram: Information Source (memoryless, stationary, discrete; stochastic, Markov, ergodic) → Source Encoder → Channel Encoder → Modulator → Channel → De-Modulator → Channel Decoder → Source Decoder → Destination.]
54
Information Source
• Stochastic process: any sequence of random variables from some probability space.
• Markov information source: an information source with memory in which the probability of a symbol occurring in a message depends on a finite number of preceding symbols.
• Ergodic: a Markov chain is said to be ergodic if, after a certain finite number of steps, it is possible to go from any state to any other state with a nonzero probability.
55
Lecture 12
• Communication Channel
• Noiseless binary channel
• Binary Symmetric Channel (BSC)
• Symmetric Channel
• Mutual Information
• Channel Capacity
56
[Block diagram: Information Source → Source Encoder → Channel Encoder → Modulator → Channel → De-Modulator → Channel Decoder → Source Decoder → Destination. Channel topics: Noiseless Binary Channel, Binary Symmetric Channel, Symmetric Channel, Channel Capacity, Mutual Information, Conditional Entropy; source models: Stochastic, Markov, Ergodic.]
57
Noiseless Binary Channel
[Channel diagram: 0 → 0 and 1 → 1 with probability 1]
Transition matrix p(y | x):
        y=0   y=1
x=0      1     0
x=1      0     1
58
Binary Symmetric Channel (BSC) — a noisy channel
[Channel diagram: X → BSC → Y; 0 → 0 and 1 → 1 with probability 1-p; 0 → 1 and 1 → 0 with probability p]
Transition matrix p(y | x):
        y=0    y=1
x=0     1-p     p
x=1      p     1-p
59
Symmetric Channel (noisy channel)
In the transition matrix of this channel, all the rows are permutations of each other, and so are the columns.
Example:
        y1    y2    y3
x1     0.3   0.2   0.5
x2     0.5   0.3   0.2
x3     0.2   0.5   0.3
60
Mutual Information (MI)
Note that:
• I(X, Y) = H(Y) - H(Y|X)
• I(X, Y) = H(X) - H(X|Y)
• I(X, Y) = H(X) + H(Y) - H(X, Y)
• I(X, Y) = I(Y, X)
• I(X, X) = H(X)
[Venn diagram: H(X) and H(Y) overlap in I(X, Y); the non-overlapping parts are H(X|Y) and H(Y|X).]
61
Channel Capacity
The channel capacity C is the highest rate, in bits per channel use, at which information can be sent with arbitrarily low probability of error:
C = max over p(x) of I(X, Y)
where the maximum is taken over all possible input distributions p(x).
62
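For the binary symmetric channel this maximum is attained at the uniform input distribution and equals C = 1 - H(p). A numerical sketch (helper names are illustrative, not from the slides):

```python
import math

def h2(p):
    """Binary entropy function H(p) in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bsc_mutual_information(p_error, p_x1):
    """I(X, Y) = H(Y) - H(Y|X) for a BSC with crossover probability p_error."""
    p_y1 = p_x1 * (1 - p_error) + (1 - p_x1) * p_error   # P(Y = 1)
    return h2(p_y1) - h2(p_error)                        # H(Y|X) = H(p_error) for every input

p = 0.1
capacity = bsc_mutual_information(p, 0.5)   # maximum is at the uniform input p_x1 = 0.5
print(capacity, 1 - h2(p))                  # both ≈ 0.531 bits per channel use
```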
Lecture 13
• Channel Coding/Decoding
• Hamming Method:
- Hamming Distance
- Hamming Weight
• Hamming (7, 4)
63
[Block diagram: Information Source → Source Encoder → Channel Encoder → Modulator → Channel → De-Modulator → Channel Decoder → Source Decoder → Destination; Hamming coding takes place in the Channel Encoder and Channel Decoder.]
64
Example: Hamming (7, 4) Codes
• Generator matrix G = [ I4 | P ]
We assume that the sequence of symbols generated by the information source is divided into blocks of 4 symbols u1 u2 u3 u4. Codewords have length 7: (u1 u2 u3 u4 c1 c2 c3), with parity bits
c1 = u2 + u3 + u4
c2 = u1 + u3 + u4
c3 = u1 + u2 + u4
where + is modulo-2 addition: 0 + 0 = 1 + 1 = 0 and 1 + 0 = 0 + 1 = 1, and the ui are the information bits (the I4 part).
G = [ I4 | P ] =
1 0 0 0   0 1 1
0 1 0 0   1 0 1
0 0 1 0   1 1 0
0 0 0 1   1 1 1
65
Hamming (7, 4) Syndrome Decoding
Let G = [ Ik | P ]. For the Hamming (7, 4) code: n = 7 and k = 4.
H is the parity-check matrix, H^T is the transpose of H, and P^T is the transpose of P.
Step 1. Construct H = [ P^T | I_{n-k} ]
Step 2. Arrange the columns of H in order of increasing binary value
Step 3. Determine the syndrome S = y · H^T (y is the received message)
Step 4. If S = 0 then no error occurred during transmission of the information
Step 5. If S ≠ 0 then S gives the binary representation of the error position (we assume only one error occurred)
66
Example:
G = [ I4 | P ] =
1 0 0 0   0 1 1
0 1 0 0   1 0 1
0 0 1 0   1 1 0
0 0 0 1   1 1 1
Step 1. H = [ P^T | I_{n-k} ] =
0 1 1 1   1 0 0
1 0 1 1   0 1 0
1 1 0 1   0 0 1
Step 2. Arrange the columns of H in order of increasing binary value:
H =
0 0 0 1 1 1 1
0 1 1 0 0 1 1
1 0 1 0 1 0 1
Step 3. Suppose that y = (1111011) is received:
S = y · H^T = (101) = (5)10
Step 5. An error occurred at position 5 of the received message y = (1111011).
The correct sent message is then (1111111).
67
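A sketch of the complete encode/correct cycle (helper names are illustrative; the matrices are the ones from the example above, with H's columns in increasing binary order so that the syndrome value equals the error position):

```python
# Parity-check matrix with column j (1-based) equal to the binary representation of j
H = [[0, 0, 0, 1, 1, 1, 1],
     [0, 1, 1, 0, 0, 1, 1],
     [1, 0, 1, 0, 1, 0, 1]]

def encode(u):
    """Codeword (u1 u2 u3 u4 c1 c2 c3) with c1=u2+u3+u4, c2=u1+u3+u4, c3=u1+u2+u4 (mod 2)."""
    u1, u2, u3, u4 = u
    return [u1, u2, u3, u4, (u2 + u3 + u4) % 2, (u1 + u3 + u4) % 2, (u1 + u2 + u4) % 2]

def syndrome(y):
    """S = y . H^T over GF(2); as a binary number it is the error position (0 = no error)."""
    bits = [sum(h * b for h, b in zip(row, y)) % 2 for row in H]
    return bits[0] * 4 + bits[1] * 2 + bits[2]

def correct(y):
    y = list(y)
    pos = syndrome(y)
    if pos:                        # single-error assumption: flip the bit at the error position
        y[pos - 1] ^= 1
    return y

y = [1, 1, 1, 1, 0, 1, 1]          # received word from the example
print(syndrome(y))                 # 5 -> error at position 5
print(correct(y))                  # [1, 1, 1, 1, 1, 1, 1], the corrected codeword
print(correct(y) == encode([1, 1, 1, 1]))  # True
```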
Comments for Final Exam
Time: Jan. 31, 2012, Period 2, Room M4
16 questions: 13 multiple choice, 3 fill in the blank
Closed book: no textbook, no handouts
Allowed: dictionary, calculator, empty sheet of paper
Coverage: from Lecture 2 until the last lecture
68