Coding and Entropy

Harvard QR48, February 3, 2010


    Squeezing out the Air

Suppose you want to ship pillows in boxes and are charged by the size of the box

    Lossless data compression

    Entropy = lower limit of compressibility



Claude Shannon (1916-2001)
A Mathematical Theory of Communication (1948)



    Communication over a Channel

Source S (symbols) -> Coded bits X -> Channel -> Received bits Y -> Decoded message T (symbols)

Encode bits before putting them in the channel

    Decode bits when they come out of the channel

E.g., the transformation from S into X changes yea -> 1, nay -> 0

Changing Y into T does the reverse

For now, assume no noise in the channel, i.e. X = Y



Example: Telegraphy

Source: English letters -> Morse Code

[Diagram: Washington encodes D as -.. , the signal travels over the line, and Baltimore decodes -.. back to D]



Low and High Information Content Messages

The more frequent a message is, the less information it conveys when it occurs

    Two weather forecast messages:

Bos: [sequence of forecast icons not captured]

LA: [sequence of forecast icons not captured]

In LA, sunny is a low-information message and cloudy is a high-information message



    Harvard Grades

Less information in Harvard grades now than in the recent past

    % A A- B+ B B- C+

    2005 24 25 21 13 6 2

    1995 21 23 20 14 8 3

    1986 14 19 21 17 10 5



    Fixed Length Codes (Block Codes)

    Example: 4 symbols, A, B, C, D

    A=00, B=01, C=10, D=11

In general, with n symbols, codes need to be of length lg n, rounded up (see the sketch below)

For English text, 26 letters + space = 27 symbols, so length = 5 since 2^4 < 27 < 2^5

    (replace all punctuation marks by space)

    AKA block codes
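A minimal Python sketch of this rule (an illustrative addition, not from the original slides):

```python
import math

def block_code_length(n_symbols: int) -> int:
    """Bits per symbol for a fixed-length (block) code: lg n, rounded up."""
    return math.ceil(math.log2(n_symbols))

print(block_code_length(4))   # 4 symbols (A, B, C, D) -> 2 bits
print(block_code_length(27))  # 26 letters + space -> 5 bits, since 2^4 < 27 <= 2^5
```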



    Modeling the Message Source

Characteristics of the stream of messages coming from the source affect the choice of the coding method

We need a model for a source of English text that can be described and analyzed mathematically

[Diagram: Source -> Destination]



    How can we improve on block codes?

    Simple 4-symbol example: A, B, C, D

    If that is all we know, need 2 bits/symbol

    What if we know symbol frequencies?

Use shorter codes for more frequent symbols

Morse Code does something like this

    Example:

Symbol     A    B    C    D
Frequency  .7   .1   .1   .1
Code       0    100  101  110



    Prefix Codes

No codeword is a prefix of any other, so there is only one way to decode left to right (see the sketch below)

Symbol     A    B    C    D
Frequency  .7   .1   .1   .1
Code       0    100  101  110
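A small decoding sketch, assuming the code table above, showing why a prefix code has only one left-to-right decoding (an added illustration, not the deck's code):

```python
# Code table from the slide; no codeword is a prefix of another.
CODE = {"A": "0", "B": "100", "C": "101", "D": "110"}
DECODE = {bits: sym for sym, bits in CODE.items()}

def decode(bitstring: str) -> str:
    """Scan left to right; emit a symbol as soon as the buffer matches a codeword."""
    out, buf = [], ""
    for bit in bitstring:
        buf += bit
        if buf in DECODE:        # the first match is the only possible match
            out.append(DECODE[buf])
            buf = ""
    if buf:
        raise ValueError("trailing bits do not form a complete codeword")
    return "".join(out)

print(decode("0100101110"))  # -> "ABCD"
```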



    Minimum Average Code Length?

    Average bits per symbol:

Symbol     A    B    C    D
Frequency  .7   .1   .1   .1
Code 1     0    100  101  110
Code 2     0    10   110  111

Code 1: .7·1 + .1·3 + .1·3 + .1·3 = 1.6 bits/symbol
Code 2: .7·1 + .1·2 + .1·3 + .1·3 = 1.5 bits/symbol
(down from 2 bits/symbol for the block code)



    Entropy of this code


    Self-Information

If a symbol S has frequency p, its self-information is H(S) = lg(1/p) = -lg p.

    S A B C D

    p .25 .25 .25 .25

    H(S) 2 2 2 2

    p .7 .1 .1 .1

    H(S) .51 3.32 3.32 3.32
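A quick sketch of the definition in Python (my illustration; the values match the table above up to rounding):

```python
from math import log2

def self_information(p: float) -> float:
    """Self-information of a symbol with frequency p: lg(1/p) = -lg p bits."""
    return -log2(p)

for p in (0.25, 0.7, 0.1):
    print(p, round(self_information(p), 2))
# prints: 0.25 2.0, 0.7 0.51, 0.1 3.32
```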


First-Order Entropy of Source = Average Self-Information

S        A     B     C     D     Σ(-p lg p)
p        .25   .25   .25   .25
-lg p    2     2     2     2
-p lg p  .5    .5    .5    .5    2

p        .7    .1    .1    .1
-lg p    .51   3.32  3.32  3.32
-p lg p  .357  .332  .332  .332  1.353
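A short sketch of the same calculation (illustrative; the slide's 1.353 comes from summing the rounded -p lg p terms, the unrounded value is about 1.357):

```python
from math import log2

def first_order_entropy(freqs) -> float:
    """Average self-information: sum of -p * lg p over all symbol frequencies."""
    return sum(-p * log2(p) for p in freqs if p > 0)

print(first_order_entropy([0.25, 0.25, 0.25, 0.25]))  # 2.0
print(first_order_entropy([0.7, 0.1, 0.1, 0.1]))      # ~1.357 (slide rounds to 1.353)
```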


Entropy, Compressibility, Redundancy

Lower entropy: more redundant, more compressible, less information

Higher entropy: less redundant, less compressible, more information

A source of yeas and nays takes 24 bits per symbol but contains at most one bit per symbol of information

010110010100010101000001 = yea
010011100100000110101001 = nay


Entropy and Compression

Symbol     A    B    C    D
Frequency  .7   .1   .1   .1
Code       0    10   110  111

Average length for this code = .7·1 + .1·2 + .1·3 + .1·3 = 1.5

No code taking only symbol frequencies into account can be better than first-order entropy

First-order entropy of this source = .7·lg(1/.7) + .1·lg(1/.1) + .1·lg(1/.1) + .1·lg(1/.1) = 1.353

First-order entropy of English is about 4 bits/character, based on typical English texts

Efficiency of code = (entropy of source)/(average code length) = 1.353/1.5 ≈ 0.90
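The efficiency computation as a small sketch (my addition, using the slide's frequencies and code):

```python
from math import log2

freqs   = {"A": 0.7, "B": 0.1, "C": 0.1, "D": 0.1}
lengths = {"A": 1, "B": 2, "C": 3, "D": 3}   # code 0, 10, 110, 111

avg_length = sum(freqs[s] * lengths[s] for s in freqs)   # 1.5 bits/symbol
entropy = sum(-p * log2(p) for p in freqs.values())      # ~1.357 bits/symbol
print(round(entropy / avg_length, 2))                    # ~0.9
```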


A Simple Prefix Code: Huffman Codes

Suppose we know the symbol frequencies. We can calculate the (first-order) entropy. Can we design a code to match?

There is an algorithm that transforms a set of symbol frequencies into a variable-length prefix code that achieves average code length approximately equal to the entropy.

    David Huffman, 1951


    Huffman Code Example

Symbol     A    B    C    D    E
Frequency  .35  .05  .2   .15  .25

Merge the two lowest-frequency nodes repeatedly:
B + D -> BD (.2)
BD + C -> BCD (.4)
A + E -> AE (.6)
AE + BCD -> ABCDE (1.0)


Huffman Code Example

Symbol     A    B    C    D    E
Frequency  .35  .05  .2   .15  .25

Labeling the tree edges 0/1 gives the code:
A 00
B 100
C 11
D 101
E 01

Entropy = 2.12
Average length = 2.20
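A minimal sketch of the Huffman construction (an added illustration of the algorithm described above, not code from the deck); the codeword lengths, and hence the average length, match the example even though the particular 0/1 labels may differ:

```python
import heapq
from itertools import count

def huffman_code(freqs):
    """Repeatedly merge the two lowest-frequency nodes, prepending 0/1 to the codes."""
    ties = count()  # tie-breaker so the heap never has to compare the code dicts
    heap = [(p, next(ties), {sym: ""}) for sym, p in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, left = heapq.heappop(heap)
        p2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (p1 + p2, next(ties), merged))
    return heap[0][2]

freqs = {"A": 0.35, "B": 0.05, "C": 0.2, "D": 0.15, "E": 0.25}
code = huffman_code(freqs)
avg_length = sum(freqs[s] * len(code[s]) for s in freqs)
print(code)                  # one valid Huffman code; labels differ from the slide, lengths agree
print(round(avg_length, 2))  # 2.2, matching the slide
```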


    Efficiency of Huffman Codes

Huffman codes are as efficient as possible if only first-order information (symbol frequencies) is taken into account.

A Huffman code is always within 1 bit/symbol of the entropy.


    Second-Order Entropy

Second-order entropy of a source is calculated by treating digrams as single symbols according to their frequencies

Occurrences of q and u are not independent, so it is helpful to treat qu as one symbol

Second-order entropy of English is about 3.3 bits/character
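One way to make the digram idea concrete (a rough sketch under my own assumptions, counting overlapping digrams in a sample string; a meaningful estimate for English would need a large corpus):

```python
from math import log2
from collections import Counter

def digram_entropy_per_char(text: str) -> float:
    """Entropy of the digram distribution, divided by 2 to get bits per character."""
    digrams = [text[i:i + 2] for i in range(len(text) - 1)]
    counts = Counter(digrams)
    total = sum(counts.values())
    return sum(-(c / total) * log2(c / total) for c in counts.values()) / 2

# Toy example only: far too little text for a real estimate of English.
print(digram_entropy_per_char("the quick brown fox jumps over the lazy dog"))
```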


How English Would Look Based on frequencies alone

0: xfoml rxkhrjffjuj zlpwcfwkcyj ffjeyvkcqsghyd qpaamkbzaacibzlhjqd

1: ocroh hli rgwr nmielwis eu ll nbnesebya th eei alhenhttpa oobttva

2: On ie antsoutinys are t inctore st be s deamy achin d ilonasive tucoowe at

3: IN NO IST LAT WHEY CRATICT FROURE BIRS GROCID PONDENOME OF DEMONSTURES OF THE REPTAGIN IS REGOACTIONA



How English Would Look Based on word frequencies

1) REPRESENTING AND SPEEDILY IS AN GOOD APT OR COME CAN DIFFERENT NATURAL HERE HE THE A IN CAME THE TO OF TO EXPERT GRAY COME TO FURNISHES THE LINE MESSAGE HAD BE THESE

2) THE HEAD AND IN FRONTAL ATTACK ON AN ENGLISH WRITER THAT THE CHARACTER OF THIS POINT IS THEREFORE ANOTHER METHOD FOR THE LETTERS THAT THE TIME OF WHO EVER TOLD THE PROBLEM FOR AN UNEXPECTED


What is the entropy of English?

Entropy is the limit of the information per symbol using single symbols, digrams, trigrams, ...

Not really calculable because English is a finite language!

Nonetheless it can be determined experimentally using Shannon's game

    Answer: a little more than 1 bit/character


Shannon's Remarkable 1948 paper


Shannon's Source Coding Theorem

No code can achieve efficiency greater than 1, but

For any source, there are codes with efficiency as close to 1 as desired.

The proof does not give a method to find the best codes. It just sets a limit on how good they can be.


Huffman coding is used widely

E.g., JPEGs use Huffman codes for the pixel-to-pixel changes in color values

Colors usually change gradually, so there are many small numbers (0, 1, 2, ...) in this sequence, as the sketch below illustrates

JPEGs sometimes use a fancier compression method called arithmetic coding

Arithmetic coding produces about 5% better compression
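A toy sketch of why this works (hypothetical pixel values, added for illustration; real JPEG encoding involves more steps than shown here): neighboring values change slowly, so the stream of differences is dominated by a few small numbers, exactly the skewed frequencies Huffman coding exploits.

```python
from collections import Counter

# Hypothetical row of gradually changing color values.
row = [52, 53, 53, 54, 56, 55, 55, 57, 58, 58, 59, 60]

# Pixel-to-pixel changes: mostly small numbers such as 0, 1, 2.
deltas = [b - a for a, b in zip(row, row[1:])]
print(deltas)           # [1, 0, 1, 2, -1, 0, 2, 1, 0, 1, 1]
print(Counter(deltas))  # skewed counts -> short Huffman codewords for 0 and 1
```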


Why don't JPEGs use arithmetic coding?

    Because it is patented by IBM

    United States Patent 4,905,297

    Langdon, Jr. , et al. February 27, 1990

    Arithmetic coding encoder and decoder system

Abstract: Apparatus and method for compressing and de-compressing binary decision data by arithmetic coding and decoding wherein the estimated probability Qe of the less probable of the two decision events, or outcomes, adapts as decisions are successively encoded. To facilitate coding computations, an augend value A for the current number line interval is held to approximate ...

    What if Huffman had patented his code?