
Digital Communication
Information Theory

Digital communication systems and networks are described by their data rates. The ultimate goal of a communication system is to transfer information rather than data. Defining what information is, how to characterize it, and how to improve information transfer rates are the challenges addressed by information theory. One of the people who started the digital revolution was Claude Shannon. Shannon showed in one of his major research papers that all information sources (people speaking, television pictures, telegraph keys) have a "source rate" associated with them which can be measured in bits per second. Similarly, communication channels have a capacity that can be expressed in the same units. Information can be transmitted over the channel if and only if the source rate does not exceed the channel capacity.

Information Sources and Entropy

Consider a weather forecast for the Sahara desert. This forecast is an information source. The information source has two main outcomes: rain or no-rain. Clearly, the outcome no-rain contains little information; it is a highly probable outcome. The outcome rain, however, contains considerable information; it is a highly improbable event.

In information theory, an information source is a probability distribution, i.e. a set of probabilities assigned to a set of outcomes. This reflects the fact that the information contained in an outcome is determined not only by the outcome itself, but by how uncertain it is. An almost certain outcome contains little information.

A measure of the information contained in an outcome was introduced by Hartley in 1927. He defined the information (sometimes called self-information) contained in an outcome x_i as:

I(x_i) = log2(1 / P(x_i)) = -log2 P(x_i)    Eq. (60)

This measure satisfies our requirement that the information contained in an outcome should grow as the outcome becomes less likely. If P(x_i) = 1, then I(x_i) = 0, telling us that a certain event contains no information.

The definition also satisfies the requirement that the total information in independent events should add. Clearly, a rain forecast for two days contains twice as much information as a forecast for one day. From Eq. (60), for two independent outcomes x_i and x_j:

I(x_i and x_j) = log2(1 / P(x_i and x_j))
               = log2(1 / (P(x_i) × P(x_j)))
               = log2(1 / P(x_i)) + log2(1 / P(x_j))
               = I(x_i) + I(x_j)    Eq. (61)
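As an illustration (a minimal Python sketch, not part of the original notes), Eq. (60) and the additivity of Eq. (61) can be checked numerically; the 0.05 rain probability is just an assumed example value:

import math

def self_information(p):
    # Self-information in bits of an outcome with probability p (Eq. 60).
    return -math.log2(p)

# Two independent outcomes, e.g. rain forecasts for two separate days.
p_rain = 0.05                        # assumed example probability
joint = p_rain * p_rain              # independence: P(xi and xj) = P(xi)P(xj)
print(self_information(p_rain))      # ~4.32 bits for one day
print(self_information(joint))       # ~8.64 bits, i.e. I(xi) + I(xj) as in Eq. (61)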

Hartley's measure defines the information in a single outcome. The measure entropy, H(X) (sometimes called absolute entropy), defines the information content of the source X as a whole. It is the mean information provided by the source per source output, or symbol. From Eq. (60) we have:

H(X) = Σ P(x_i) I(x_i) = -Σ P(x_i) log2 P(x_i)    Eq. (62)
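A corresponding sketch of Eq. (62): the entropy of a source described by a list of outcome probabilities (assumed to sum to 1). The Sahara-style forecast shows how a highly skewed source carries little information per symbol:

import math

def entropy(probabilities):
    # Mean information per symbol, H(X) = -sum(P * log2 P), in bits (Eq. 62).
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

print(entropy([0.99, 0.01]))   # ~0.08 bits/symbol for a very predictable source
print(entropy([0.5, 0.5]))     # 1 bit/symbol, the maximum for a two-outcome source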

 

Example

Consider the case where there is a need to communicate a set of student grades, of which there are four: "A", "B", "C" and "D". It is a trivial task to map the grades into a binary alphabet as follows:

A → 00
B → 01
C → 10
D → 11

Since there are 50 grades to be communicated, the above coding will require the transfer of 100 binary symbols, or bits. The question arises whether the information could be accurately transferred using fewer than 100 bits. If the four grades are equiprobable (that is, each grade is equally likely to occur) then the probability of any grade is 0.25 and the entropy is:

H = -4 × [P(any grade) × log2 P(any grade)]
  = -4 × (0.25) × (-2)
  = 2 bits/symbol

When this is the case, the information cannot be transmitted in fewer than 100 bits.
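The equiprobable case can be verified directly (a small sketch using the 50-grade count from the example):

import math

probs = [0.25] * 4                          # four equiprobable grades
H = -sum(p * math.log2(p) for p in probs)
print(H)                                    # 2.0 bits/symbol
print(50 * H)                               # 100.0 bits for 50 grades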


However, if the distribution of the grades is as shown below, we may calculate the entropy as follows:

SYMBOL   QUANTITY   PROBABILITY
A        5          PA = 0.10
B        10         PB = 0.20
C        28         PC = 0.56
D        7          PD = 0.14

H = -(PA log2 PA + PB log2 PB + PC log2 PC + PD log2 PD)
  = -(0.10 × -3.3219 + 0.20 × -2.3219 + 0.56 × -0.8365 + 0.14 × -2.8365)
  = 1.6621 bits/symbol

From this we may infer that the most efficient coding of the information will require, on average, 1.6621 binary symbols per information symbol. Therefore we should be able to find a way to code the information with fewer than 100 binary symbols (note that 50 × 1.6621 = 83.105).
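The same calculation reproduces the 83.1-bit lower bound (a sketch, using the probabilities from the table above):

import math

probs = {"A": 0.10, "B": 0.20, "C": 0.56, "D": 0.14}
H = -sum(p * math.log2(p) for p in probs.values())
print(H)         # ~1.662 bits/symbol
print(50 * H)    # ~83.1 bits: the minimum needed to code 50 grades from this source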

At the same time we must be cognizant of the fact that it will never be possible to accurately code the information with fewer than 84 binary symbols.

We can easily demonstrate a reduction in the required number of binary symbols by adopting a variable-length coding and assigning shorter codes to the more frequently occurring message symbols. A solution is as follows:

A → 010
B → 00
C → 1
D → 011

If the above coding is used, only 84 bits will be required (check this on your own). Note that in the above assignment each code is unique and is not a prefix of any other code in the set. This variable-length coding is known as Huffman coding (after the engineer who designed it) and is a practical way to design relatively efficient, prefix-free, variable-length codes when the probability distribution of the source symbols is known.
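The 84-bit total can be checked by weighting each code length by how often its grade occurs (a sketch using the counts and code assignment above):

code = {"A": "010", "B": "00", "C": "1", "D": "011"}
counts = {"A": 5, "B": 10, "C": 28, "D": 7}        # the 50 grades from the table

total_bits = sum(len(code[g]) * n for g, n in counts.items())
print(total_bits)                                  # 84 bits in total
print(total_bits / sum(counts.values()))           # 1.68 bits/symbol on average, vs. an entropy of ~1.662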

As an exercise, write down the binary sequence for the following grades, then, starting with the first bit, see why it is possible for the receiver to accurately decode the received binary sequence:

C D C C D A C C D B B C C D C C D D B B C A B C A



(Hint)

The binary sequence for the first six grades is:

1 0 1 1 1 1 0 1 1 0 1 0

The only grade whose code does not start with a 0 is C; also, there are no codes 10 or 101, therefore the first symbol is C. There is no code 0 or 01, therefore the second symbol is 011, i.e. D, and so on. Huffman coding is only one example of many types of compression coding. One of its drawbacks is that it requires the probabilities of the message elements to be known beforehand.
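Because no codeword is a prefix of another, the receiver can decode the stream one bit at a time; a minimal decoder sketch (the bit string is the one given in the hint):

code = {"A": "010", "B": "00", "C": "1", "D": "011"}
decode_table = {bits: grade for grade, bits in code.items()}

def decode(bitstream):
    # Emit a grade whenever the accumulated bits form a complete codeword.
    grades, buffer = [], ""
    for b in bitstream:
        buffer += b
        if buffer in decode_table:       # prefix-free, so a match is always unambiguous
            grades.append(decode_table[buffer])
            buffer = ""
    return "".join(grades)

print(decode("101111011010"))            # -> "CDCCDA", the first six grades of the exercise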

Entropy then gives us the average information content of the source.

The relationship between information, bandwidth and noise

The most important question associated with a communication channel is the maximum rate at which it can transfer information.

Information can only be transferred by a signal if the signal is permitted to change. Analogue signals passing through physical channels cannot change arbitrarily fast. The rate at which a signal may change is determined by its bandwidth. In fact, it is governed by the same Nyquist-Shannon law that governs sampling: a signal of bandwidth B may change at a maximum rate of 2B. If each change is used to signify a bit, the maximum information rate is 2B.

The Nyquist-Shannon theorem makes no observation concerning the magnitude of the change. If changes of differing magnitude are each associated with a separate bit, the information rate may be increased. Thus, if each time the signal changes it can take one of n levels, the information rate is increased to:

R = 2B log2(n) bits/sec
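A quick numeric illustration of this formula (a sketch; the 3 kHz bandwidth is just an assumed example value):

import math

def max_rate(bandwidth_hz, levels):
    # Maximum information rate in bits/s for a noiseless channel using n signalling levels.
    return 2 * bandwidth_hz * math.log2(levels)

for n in (2, 4, 16):
    print(n, max_rate(3000, n))    # 6000, 12000 and 24000 bits/s for a 3 kHz channel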

This formula states that as n tends to infinity, so does the information rate.

Is there a limit on the number of levels? The limit is set by the presence of noise. If we continue to subdivide the magnitude of the changes into ever-decreasing intervals, we reach a point where we cannot distinguish the individual levels because of the noise. Noise therefore places a limit on the maximum rate at which we can transfer information. Obviously, what really matters is the signal-to-noise ratio (SNR). This is defined as the ratio of signal power S to noise power N, and is often expressed in decibels:

SNR = 10 log10(S/N) dB

Also note that it is common to see the following expressions for power in many texts:



P_dBW = 10 log10(S/1) dBW

P_dBm = 10 log10(S/0.001) dBm

The first equation expresses power as a ratio to 1 Watt and the second expresses power as a ratio to 1 milliWatt. These are expressions of absolute power and should not be confused with SNR.
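A small sketch of the three conversions, which are easy to mix up (the power values are assumed example figures):

import math

def snr_db(signal_power_w, noise_power_w):
    return 10 * math.log10(signal_power_w / noise_power_w)

def power_dbw(power_w):
    return 10 * math.log10(power_w / 1.0)      # referenced to 1 Watt

def power_dbm(power_w):
    return 10 * math.log10(power_w / 0.001)    # referenced to 1 milliWatt

print(snr_db(2.0, 0.063))    # ~15 dB
print(power_dbw(2.0))        # ~3 dBW
print(power_dbm(2.0))        # ~33 dBm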

There is a theoretical maximum to the rate at which information can pass error-free over the channel. This maximum is called the channel capacity, C. The famous Hartley-Shannon Law states that the channel capacity C is given by:

C = B log2(1 + S/N) bits/sec

Note that S/N in this expression is a linear ratio (i.e. not stated in dB). For example, a 10 kHz channel operating at an SNR of 15 dB has a theoretical maximum information rate of:

10,000 × log2(1 + 31.62) ≈ 50,280 bits/second
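The 10 kHz / 15 dB example can be reproduced directly (a sketch; note the dB-to-linear conversion before applying the formula):

import math

def channel_capacity(bandwidth_hz, snr_db):
    # Hartley-Shannon law: C = B * log2(1 + S/N), with S/N as a linear ratio.
    snr_linear = 10 ** (snr_db / 10)
    return bandwidth_hz * math.log2(1 + snr_linear)

print(channel_capacity(10_000, 15))    # ~50,280 bits/s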

The theorem makes no statement as to how the channel capacity is achieved. In practice, real channels only approach this limit. Providing high channel efficiency is the goal of coding techniques. The failure to achieve perfect performance is measured by the bit-error-rate (BER). Typically BERs are of the order of 10^-6.

Bit error rate is a measured quantity: the number of errored bits received divided by the number of bits transmitted. If a system has a BER of 10^-6, this means that in a test measurement one bit error was received for every one million bits transmitted. The probability of error, P(e), is a theoretical expectation of the bit error rate.
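The distinction between the measured BER and the theoretical P(e) can be made concrete with a trivial simulation sketch (the error probability is an assumed example value):

import random

p_error = 1e-3                       # theoretical probability of error, P(e)
n_bits = 100_000
errors = sum(random.random() < p_error for _ in range(n_bits))
print(p_error, errors / n_bits)      # P(e) vs. the measured bit-error-rate for this run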
