Coding and Entropy

Harvard QR48, February 3, 2010


    Squeezing out the Air

Suppose you want to ship pillows in boxes and are charged by the size of the box

    Lossless data compression

    Entropy = lower limit of compressibility



Claude Shannon (1916-2001)
A Mathematical Theory of Communication (1948)



    Communication over a Channel

Source S (symbols) -> Coded bits X -> Channel -> Received bits Y -> Decoded message T (symbols)

Encode bits before putting them in the channel

    Decode bits when they come out of the channel

E.g., the transformation from S into X changes yea -> 1, nay -> 0

Changing Y into T does the reverse

For now, assume no noise in the channel, i.e. X = Y



Example: Telegraphy

Source: English letters -> Morse Code

[Diagram: Washington encodes D as -.. , the signal travels over the line, and Baltimore decodes -.. back to D]



Low and High Information Content Messages

The more frequent a message is, the less information it conveys when it occurs

    Two weather forecast messages:

Bos: [sequence of forecast icons not captured]

LA: [sequence of forecast icons not captured]

In LA, sunny is a low-information message and cloudy is a high-information message



    Harvard Grades

Less information in Harvard grades now than in the recent past

    % A A- B+ B B- C+

    2005 24 25 21 13 6 2

    1995 21 23 20 14 8 3

    1986 14 19 21 17 10 5



    Fixed Length Codes (Block Codes)

    Example: 4 symbols, A, B, C, D

    A=00, B=01, C=10, D=11

In general, with n symbols, codes need to be of length lg n, rounded up (see the sketch below)

For English text, 26 letters + space = 27 symbols, so length = 5 since 2^4 < 27 < 2^5

    (replace all punctuation marks by space)

    AKA block codes
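A minimal Python sketch of this rule (an illustrative addition, not from the original slides):

```python
import math

def block_code_length(n_symbols: int) -> int:
    """Bits per symbol for a fixed-length (block) code: lg n, rounded up."""
    return math.ceil(math.log2(n_symbols))

print(block_code_length(4))   # 4 symbols (A, B, C, D) -> 2 bits
print(block_code_length(27))  # 26 letters + space -> 5 bits, since 2^4 < 27 <= 2^5
```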



    Modeling the Message Source

Characteristics of the stream of messages coming from the source affect the choice of the coding method

We need a model for a source of English text that can be described and analyzed mathematically

[Diagram: Source -> Destination]



    How can we improve on block codes?

    Simple 4-symbol example: A, B, C, D

    If that is all we know, need 2 bits/symbol

    What if we know symbol frequencies?

Use shorter codes for more frequent symbols

Morse Code does something like this

    Example:

Symbol     A    B    C    D
Frequency  .7   .1   .1   .1
Code       0    100  101  110



    Prefix Codes

No codeword is a prefix of any other, so there is only one way to decode left to right (see the sketch below)

Symbol     A    B    C    D
Frequency  .7   .1   .1   .1
Code       0    100  101  110
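A small decoding sketch, assuming the code table above, showing why a prefix code has only one left-to-right decoding (an added illustration, not the deck's code):

```python
# Code table from the slide; no codeword is a prefix of another.
CODE = {"A": "0", "B": "100", "C": "101", "D": "110"}
DECODE = {bits: sym for sym, bits in CODE.items()}

def decode(bitstring: str) -> str:
    """Scan left to right; emit a symbol as soon as the buffer matches a codeword."""
    out, buf = [], ""
    for bit in bitstring:
        buf += bit
        if buf in DECODE:        # the first match is the only possible match
            out.append(DECODE[buf])
            buf = ""
    if buf:
        raise ValueError("trailing bits do not form a complete codeword")
    return "".join(out)

print(decode("0100101110"))  # -> "ABCD"
```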



    Minimum Average Code Length?

    Average bits per symbol:

Symbol     A    B    C    D
Frequency  .7   .1   .1   .1
Code 1     0    100  101  110
Code 2     0    10   110  111

Code 1: .7·1 + .1·3 + .1·3 + .1·3 = 1.6 bits/symbol
Code 2: .7·1 + .1·2 + .1·3 + .1·3 = 1.5 bits/symbol
(down from 2 bits/symbol for the block code)



    Entropy of this code


    Self-Information

If a symbol S has frequency p, its self-information is H(S) = lg(1/p) = -lg p.

    S A B C D

    p .25 .25 .25 .25

    H(S) 2 2 2 2

    p .7 .1 .1 .1

    H(S) .51 3.32 3.32 3.32
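A quick sketch of the definition in Python (my illustration; the values match the table above up to rounding):

```python
from math import log2

def self_information(p: float) -> float:
    """Self-information of a symbol with frequency p: lg(1/p) = -lg p bits."""
    return -log2(p)

for p in (0.25, 0.7, 0.1):
    print(p, round(self_information(p), 2))
# prints: 0.25 2.0, 0.7 0.51, 0.1 3.32
```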


First-Order Entropy of Source = Average Self-Information

S        A     B     C     D     Σ(-p lg p)
p        .25   .25   .25   .25
-lg p    2     2     2     2
-p lg p  .5    .5    .5    .5    2

p        .7    .1    .1    .1
-lg p    .51   3.32  3.32  3.32
-p lg p  .357  .332  .332  .332  1.353
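A short sketch of the same calculation (illustrative; the slide's 1.353 comes from summing the rounded -p lg p terms, the unrounded value is about 1.357):

```python
from math import log2

def first_order_entropy(freqs) -> float:
    """Average self-information: sum of -p * lg p over all symbol frequencies."""
    return sum(-p * log2(p) for p in freqs if p > 0)

print(first_order_entropy([0.25, 0.25, 0.25, 0.25]))  # 2.0
print(first_order_entropy([0.7, 0.1, 0.1, 0.1]))      # ~1.357 (slide rounds to 1.353)
```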


Entropy, Compressibility, Redundancy

Lower entropy: more redundant, more compressible, less information

Higher entropy: less redundant, less compressible, more information

A source of yeas and nays takes 24 bits per symbol but contains at most one bit per symbol of information

010110010100010101000001 = yea
010011100100000110101001 = nay


Entropy and Compression

Symbol     A    B    C    D
Frequency  .7   .1   .1   .1
Code       0    10   110  111

Average length for this code = .7·1 + .1·2 + .1·3 + .1·3 = 1.5

No code taking only symbol frequencies into account can be better than first-order entropy

First-order entropy of this source = .7·lg(1/.7) + .1·lg(1/.1) + .1·lg(1/.1) + .1·lg(1/.1) = 1.353

First-order entropy of English is about 4 bits/character, based on typical English texts

Efficiency of code = (entropy of source)/(average code length) = 1.353/1.5 ≈ 0.90
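The efficiency computation as a small sketch (my addition, using the slide's frequencies and code):

```python
from math import log2

freqs   = {"A": 0.7, "B": 0.1, "C": 0.1, "D": 0.1}
lengths = {"A": 1, "B": 2, "C": 3, "D": 3}   # code 0, 10, 110, 111

avg_length = sum(freqs[s] * lengths[s] for s in freqs)   # 1.5 bits/symbol
entropy = sum(-p * log2(p) for p in freqs.values())      # ~1.357 bits/symbol
print(round(entropy / avg_length, 2))                    # ~0.9
```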


A Simple Prefix Code: Huffman Codes

Suppose we know the symbol frequencies. We can calculate the (first-order) entropy. Can we design a code to match?

There is an algorithm that transforms a set of symbol frequencies into a variable-length prefix code that achieves average code length approximately equal to the entropy.

    David Huffman, 1951


    Huffman Code Example

Symbol     A    B    C    D    E
Frequency  .35  .05  .2   .15  .25

Merge the two lowest-frequency nodes repeatedly:
B + D -> BD (.2)
BD + C -> BCD (.4)
A + E -> AE (.6)
AE + BCD -> ABCDE (1.0)


Huffman Code Example

Symbol     A    B    C    D    E
Frequency  .35  .05  .2   .15  .25

Labeling the tree edges 0/1 gives the code:
A 00
B 100
C 11
D 101
E 01

Entropy = 2.12
Average length = 2.20
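A minimal sketch of the Huffman construction (an added illustration of the algorithm described above, not code from the deck); the codeword lengths, and hence the average length, match the example even though the particular 0/1 labels may differ:

```python
import heapq
from itertools import count

def huffman_code(freqs):
    """Repeatedly merge the two lowest-frequency nodes, prepending 0/1 to the codes."""
    ties = count()  # tie-breaker so the heap never has to compare the code dicts
    heap = [(p, next(ties), {sym: ""}) for sym, p in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, left = heapq.heappop(heap)
        p2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (p1 + p2, next(ties), merged))
    return heap[0][2]

freqs = {"A": 0.35, "B": 0.05, "C": 0.2, "D": 0.15, "E": 0.25}
code = huffman_code(freqs)
avg_length = sum(freqs[s] * len(code[s]) for s in freqs)
print(code)                  # one valid Huffman code; labels differ from the slide, lengths agree
print(round(avg_length, 2))  # 2.2, matching the slide
```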


    Efficiency of Huffman Codes

Huffman codes are as efficient as possible if only first-order information (symbol frequencies) is taken into account.

A Huffman code is always within 1 bit/symbol of the entropy.


    Second-Order Entropy

Second-order entropy of a source is calculated by treating digrams as single symbols according to their frequencies

Occurrences of q and u are not independent, so it is helpful to treat qu as one symbol

Second-order entropy of English is about 3.3 bits/character
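One way to make the digram idea concrete (a rough sketch under my own assumptions, counting overlapping digrams in a sample string; a meaningful estimate for English would need a large corpus):

```python
from math import log2
from collections import Counter

def digram_entropy_per_char(text: str) -> float:
    """Entropy of the digram distribution, divided by 2 to get bits per character."""
    digrams = [text[i:i + 2] for i in range(len(text) - 1)]
    counts = Counter(digrams)
    total = sum(counts.values())
    return sum(-(c / total) * log2(c / total) for c in counts.values()) / 2

# Toy example only: far too little text for a real estimate of English.
print(digram_entropy_per_char("the quick brown fox jumps over the lazy dog"))
```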


How English Would Look Based on frequencies alone

0: xfoml rxkhrjffjuj zlpwcfwkcyj ffjeyvkcqsghyd qpaamkbzaacibzlhjqd

1: ocroh hli rgwr nmielwis eu ll nbnesebya th eei alhenhttpa oobttva

2: On ie antsoutinys are t inctore st be s deamy achin d ilonasive tucoowe at

3: IN NO IST LAT WHEY CRATICT FROURE BIRS GROCID PONDENOME OF DEMONSTURES OF THE REPTAGIN IS REGOACTIONA



How English Would Look Based on word frequencies

1) REPRESENTING AND SPEEDILY IS AN GOOD APT OR COME CAN DIFFERENT NATURAL HERE HE THE A IN CAME THE TO OF TO EXPERT GRAY COME TO FURNISHES THE LINE MESSAGE HAD BE THESE

2) THE HEAD AND IN FRONTAL ATTACK ON AN ENGLISH WRITER THAT THE CHARACTER OF THIS POINT IS THEREFORE ANOTHER METHOD FOR THE LETTERS THAT THE TIME OF WHO EVER TOLD THE PROBLEM FOR AN UNEXPECTED


What is the entropy of English?

Entropy is the limit of the information per symbol using single symbols, digrams, trigrams, ...

Not really calculable because English is a finite language!

Nonetheless it can be determined experimentally using Shannon's game

    Answer: a little more than 1 bit/character


Shannon's Remarkable 1948 paper


Shannon's Source Coding Theorem

No code can achieve efficiency greater than 1, but

For any source, there are codes with efficiency as close to 1 as desired.

The proof does not give a method to find the best codes. It just sets a limit on how good they can be.


Huffman coding is used widely

E.g., JPEGs use Huffman codes for the pixel-to-pixel changes in color values

Colors usually change gradually, so there are many small numbers (0, 1, 2, ...) in this sequence, as the sketch below illustrates

JPEGs sometimes use a fancier compression method called arithmetic coding

Arithmetic coding produces about 5% better compression
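A toy sketch of why this works (hypothetical pixel values, added for illustration; real JPEG encoding involves more steps than shown here): neighboring values change slowly, so the stream of differences is dominated by a few small numbers, exactly the skewed frequencies Huffman coding exploits.

```python
from collections import Counter

# Hypothetical row of gradually changing color values.
row = [52, 53, 53, 54, 56, 55, 55, 57, 58, 58, 59, 60]

# Pixel-to-pixel changes: mostly small numbers such as 0, 1, 2.
deltas = [b - a for a, b in zip(row, row[1:])]
print(deltas)           # [1, 0, 1, 2, -1, 0, 2, 1, 0, 1, 1]
print(Counter(deltas))  # skewed counts -> short Huffman codewords for 0 and 1
```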


Why don't JPEGs use arithmetic coding?

    Because it is patented by IBM

    United States Patent 4,905,297

    Langdon, Jr. , et al. February 27, 1990

    Arithmetic coding encoder and decoder system

Abstract: Apparatus and method for compressing and de-compressing binary decision data by arithmetic coding and decoding wherein the estimated probability Qe of the less probable of the two decision events, or outcomes, adapts as decisions are successively encoded. To facilitate coding computations, an augend value A for the current number line interval is held to approximate ...

    What if Huffman had patented his code?