MIT
Physical Layer and Coding
Muriel Médard, Professor
EECS, MIT
Overview
• A variety of physical media: copper, free space, optical fiber
• Unified way of addressing signals at the input and the output of these media:
  – basic models
  – Nyquist sampling theorem
• How we use the channels: modulation
• Modeling the net effect of channels: transition probabilities and errors
• Making up for channel errors:
  – principles of coding
  – theoretical limits
Signals
• Channel has a physical signal traversing a physical medium
• Electromagnetic signal
• Polarization of the wave

[Figure: TM and TE polarization; propagation of the wave]
Signals
• What do we mean by frequency?
• We can express a signal as a superposition of sinusoids, each with its own frequency
• A continuous signal is bandlimited to a bandwidth W at carrier frequency f0 if it can be expressed in terms of sinusoids in a range of frequencies of width W around f0
• If we multiply the signal by a frequency shift, then we operate around 0: the baseband representation

[Figure: passband spectrum from f0 - W/2 to f0 + W/2; baseband spectrum from -W/2 to W/2]
Nyquist sampling theorem
• We can wholly reconstruct a bandlimited signal from its periodic samples as long as those samples are no more than 1/W apart
• For a continuous signal x(t), the discrete-time signal is x[n] = x(nT), where T = 1/W

[Figure: x(t) with samples taken at t = 0, T, 2T, 3T, 4T, 5T, 6T]
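A minimal sketch of the sampling-and-reconstruction idea (not from the slides; the test signal, bandwidth W = 8 Hz, and sample window are assumptions):

```python
import math

W = 8.0             # assumed bandwidth in Hz: signal content confined to |f| < W/2
T = 1.0 / W         # sampling interval T = 1/W from the slide

def x(t):
    # Example bandlimited signal: sinusoids at 1 Hz and 3 Hz, both below W/2 = 4 Hz
    return math.sin(2 * math.pi * t) + 0.5 * math.cos(6 * math.pi * t)

samples = {n: x(n * T) for n in range(-400, 401)}   # x[n] = x(nT)

def sinc(u):
    return 1.0 if u == 0 else math.sin(math.pi * u) / (math.pi * u)

def reconstruct(t):
    # Interpolation formula: x(t) = sum_n x(nT) sinc((t - nT) / T)
    return sum(v * sinc((t - n * T) / T) for n, v in samples.items())

t0 = 0.3217  # an off-grid time
print(abs(reconstruct(t0) - x(t0)))  # small: error comes only from truncating the sum
```

The residual error here is due only to truncating the infinite interpolation sum to a finite window of samples.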
Modulation: creating signals
• How we generate signals: by varying amplitude, phase, or frequency
• On-off keying (OOK): send 1, 0; a special case of amplitude modulation
• Frequency-shift keying (FSK): change frequencies
• Phase-shift keying (PSK): 2-PSK, 4-PSK
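A sketch of 4-PSK as complex carrier phasors (not from the slides; the Gray bit-to-phase mapping is an assumption):

```python
import cmath
import math

def qpsk_modulate(bits):
    # Assumed Gray mapping of bit pairs to the four phases 45, 135, 225, 315 degrees
    gray = {(0, 0): 0, (0, 1): 1, (1, 1): 2, (1, 0): 3}
    symbols = []
    for i in range(0, len(bits), 2):
        k = gray[(bits[i], bits[i + 1])]
        symbols.append(cmath.exp(1j * (math.pi / 4 + k * math.pi / 2)))  # unit phasor
    return symbols

syms = qpsk_modulate([0, 0, 1, 1])
print(abs(syms[0] + syms[1]) < 1e-9)  # True: 00 and 11 map to antipodal phases
```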
Modulation
• Pulse position modulation (PPM)
• Combining phase and amplitude, for instance:
[Figure: constellation on amplitude levels -2, -1, 0, +1, +2 on each axis; the PAM 5x5 symbol code for the 100BASE-T2 PHY]
Channels
• Canonical channel model: signal goes into channel; there is a non-additive effect and addition of noise
• Very commonly, G is a multiplicative effect:
  – Y[n] = g X[n] + N[n]
  – Y[n] = g-1 X[n-1] + g0 X[n] + N[n]: intersymbol interference (ISI)

[Figure: Tx sends X[n] through the channel effect G; noise N[n] is added to G(X[n]) to give Y[n] at the Rx]
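The two multiplicative models above can be sketched as a toy simulation (not from the slides; the tap names and all numeric values are assumptions):

```python
import random

random.seed(0)  # reproducible toy noise

def flat_channel(x, g=0.8, sigma=0.1):
    # Y[n] = g X[n] + N[n]
    return [g * xn + random.gauss(0, sigma) for xn in x]

def isi_channel(x, g1=0.4, g0=1.0, sigma=0.1):
    # Two-tap ISI model: Y[n] = g1 X[n-1] + g0 X[n] + N[n], taking X[-1] = 0
    y, prev = [], 0.0
    for xn in x:
        y.append(g1 * prev + g0 * xn + random.gauss(0, sigma))
        prev = xn
    return y

print(isi_channel([1.0, 0.0, 0.0], sigma=0.0))  # a single pulse smears: [1.0, 0.4, 0.0]
```

With the noise turned off, the ISI channel shows a transmitted pulse leaking into the following symbol time.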
Effect of the channel
• If we put in a certain string of input signals X[1], X[2], …, then we have a certain probability of observing a certain output string Y[1], Y[2], …
• The net effect is the probability of a certain output given a certain input
Channel model
• The channel is described by an input alphabet, an output alphabet, a set of probabilities on the input alphabet, and a set of channel transitions; for a memoryless channel, on every symbol the channel behaves independently of other times
• Channel with input alphabet of size k+1 and output alphabet of size m+1
• p(y | x) is the transition probability: the probability of getting output y for input x

[Figure: transition diagram from Tx symbols 0, …, k to Rx symbols 0, …, m]
How do we determine the probabilities?
• We want to get to p(x | y)
• Use Bayes's rule: p(x | y) = p(x, y) / p(y)
• Also p(x, y) = p(y | x) p(x)
• p(y) = Σi p(y | xi) p(xi)
• For the case where all n inputs have the same probability 1/n:
  p(x | y) = p(y | x) (1/n) / ( (1/n) Σi p(y | xi) ) = p(y | x) / Σi p(y | xi)
• How can we try to make up for the effect of the channel?
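The uniform-prior Bayes computation above can be sketched directly (not from the slides; the BSC-style transition function and p = 0.1 are assumptions):

```python
def posterior(y, inputs, p_y_given_x):
    # p(x | y) = p(y | x) / sum_i p(y | x_i) when all inputs are equally likely
    total = sum(p_y_given_x(y, xi) for xi in inputs)
    return {x: p_y_given_x(y, x) / total for x in inputs}

def bsc(y, x, p=0.1):
    # Toy transition probabilities: correct with 1 - p, flipped with p
    return 1 - p if y == x else p

post = posterior(0, [0, 1], bsc)
print(post)
```

For a received 0, the posterior puts probability 0.9 on input 0 and 0.1 on input 1, matching the transition probabilities because the prior is uniform.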
When do we use codes
• Two different types of codes:
  – source codes: compression
  – channel codes: error correction
  – The source-channel separation theorem says the two can be done independently for a large family of channels

[Figure: stream → source encoder → s → channel encoder → x → modulator, channel, receiver, etc. → y → channel decoder → source decoder]
What is a reasonable way to construct codes?
• Suppose that errors occur independently from symbol to symbol: a reasonable way to code is to make codewords as dissimilar as possible

  x1 = 01000101001110
  x2 = 01001100000111

• The number of bits in which the codewords differ (shown in red on the slide) is the Hamming distance
• If we received y = 01000101001111, we would map it to the first codeword; we'll make this more precise later
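The distances in this example can be checked with a short sketch (not from the slides):

```python
def hamming(a, b):
    # Number of positions where two equal-length bit strings differ
    return sum(u != v for u, v in zip(a, b))

codewords = ["01000101001110", "01001100000111"]
y = "01000101001111"

decoded = min(codewords, key=lambda c: hamming(c, y))  # minimum-distance rule
print(hamming(codewords[0], y), hamming(codewords[1], y))  # 1 3
print(decoded == codewords[0])  # True
```

y is at distance 1 from x1 and distance 3 from x2, so the minimum-distance rule picks x1.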
What do we mean by constructing a code?
• Block code: map a block of bits to another, generally larger, block of bits
• An (n, k) linear block code encodes a k-bit message into an n-bit code vector; the rate of the code is R = k/n
• In general, the code maps a message from a set M of size 2^k onto a codeword of length n
• Every codeword is in one-to-one correspondence with a message
• An error occurs in decoding if we map a received word to the wrong message or to more than one message
What do we mean by decoding?
• We choose the most likely input to have yielded the observed output - maximum likelihood decoding
• Given that we observe the output vector y, the probability that x1 was transmitted is p(x1 | y) and the probability that x2 was transmitted is p(x2 | y) (let's look at 2 codewords)
• Then we pick x1 when p(x1 | y) / p(x2 | y) > 1, or equivalently when ln p(x1 | y) - ln p(x2 | y) > 0 in terms of log likelihoods
Decoding example
• Consider the binary symmetric channel (BSC)

[Figure: BSC transition diagram; Tx 0 → Rx 0 and Tx 1 → Rx 1 each with probability 1-p, crossovers each with probability p]

• For the received y of the previous example, the probabilities are:
  – p(y | x1) = (1-p)^13 p: 13 bits without error and 1 error
  – p(y | x2) = (1-p)^11 p^3: 11 bits without error and 3 errors
• The BER is p
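These two likelihoods can be compared numerically (not from the slides; the crossover probability p = 0.1 is an assumption):

```python
p = 0.1  # assumed BSC crossover probability

lik_x1 = (1 - p) ** 13 * p        # p(y | x1): 13 correct bits, 1 flipped
lik_x2 = (1 - p) ** 11 * p ** 3   # p(y | x2): 11 correct bits, 3 flipped

print(lik_x1 > lik_x2)            # True: for p < 1/2, fewer errors is more likely
print(lik_x1 / lik_x2)            # ((1 - p) / p) ** 2, i.e. 81 up to rounding
```

The likelihood ratio depends only on the difference in error counts, which is why minimum-distance decoding agrees with maximum-likelihood decoding on the BSC when p < 1/2.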
Hamming distance and code capability
• The minimum Hamming distance dmin of a code is the minimum Hamming distance between any two codewords
• A code can detect up to t errors iff dmin ≥ t + 1
• A code can correct up to t errors iff dmin ≥ 2t + 1
• A code can correct up to t errors and detect up to t' > t errors iff dmin ≥ 2t + 1 and dmin ≥ t + t' + 1
• How does dmin relate to r = n - k, the number of redundant bits?
• Singleton bound: dmin ≤ r + 1
• Example: take strings of bits and add a parity check: r is 1 and dmin is 2
• These are bounds; most codes don't achieve them
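The detection and correction conditions above can be inverted to read off a code's capability from dmin (a sketch, not from the slides):

```python
def capability(dmin):
    # Largest t with dmin >= t + 1, and largest t with dmin >= 2t + 1
    detect = dmin - 1
    correct = (dmin - 1) // 2
    return detect, correct

# Single parity check: dmin = 2 -> detect 1, correct 0
print(capability(2))  # (1, 0)
# (3, 1) repetition code: dmin = 3 -> detect 2, correct 1
print(capability(3))  # (2, 1)
```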
Linear codes: how do we build them?
• Codeword x = s G, where for an (n, k) code G is a matrix with k rows and n columns
• Linear code because sums of codewords are codewords
• How can we find dmin? The Hamming distance between two binary codewords is the same as the Hamming weight of their sum
• Thus, the minimum Hamming distance is the minimum weight over the non-zero code vectors
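Finding dmin as the minimum nonzero weight can be sketched by enumerating all codewords x = sG (not from the slides; the (6, 3) generator matrix is an assumed example):

```python
from itertools import product

# An assumed (6, 3) generator matrix for illustration
G = [[1, 0, 0, 1, 1, 0],
     [0, 1, 0, 0, 1, 1],
     [0, 0, 1, 1, 0, 1]]

def encode(s):
    # x = s G over GF(2)
    return tuple(sum(si * G[i][j] for i, si in enumerate(s)) % 2
                 for j in range(len(G[0])))

codewords = [encode(s) for s in product([0, 1], repeat=len(G))]
dmin = min(sum(c) for c in codewords if any(c))  # minimum nonzero weight
print(len(codewords), dmin)  # 8 3
```

Exhaustive enumeration is only feasible for small k; it is shown here to make the weight-equals-distance argument concrete.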
Syndromes and parity-check
• For any code, it is always possible to find a systematic code, i.e. a code for which G = [ I (k x k) | P (k x r) ], by manipulating the rows using linear operations
• We define the parity-check matrix H to be the n x r matrix

  H = [ -P ]   (-P is k x r; over GF(2), -P = P)
      [  I ]   (I is r x r)

• We have x H = 0
• Syndrome s = y H
• Define the error sequence e = y + x
• Then s = e H
• Decoding is done by finding the minimum-Hamming-weight sequence e that satisfies s = e H and decoding to the codeword y + e
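Syndrome decoding with a systematic G = [I | P] can be sketched end to end (not from the slides; the (6, 3) parity block P is an assumed example):

```python
from itertools import product

P = [[1, 1, 0],
     [0, 1, 1],
     [1, 0, 1]]   # parity block of a systematic G = [I | P]; an assumed (6, 3) example
k = len(P)
r = len(P[0])
n = k + r

def encode(s):
    # Systematic encoding: codeword = (message bits, parity bits s P)
    parity = tuple(sum(s[i] * P[i][j] for i in range(k)) % 2 for j in range(r))
    return tuple(s) + parity

# H = [-P ; I] stacked (over GF(2), -P = P): every codeword x satisfies x H = 0
H = [row[:] for row in P] + [[1 if i == j else 0 for j in range(r)] for i in range(r)]

def syndrome(y):
    # s = y H over GF(2)
    return tuple(sum(y[i] * H[i][j] for i in range(n)) % 2 for j in range(r))

x = encode((1, 0, 1))
y = list(x)
y[2] ^= 1                              # one channel error
s = syndrome(y)
# Decode: minimum-weight error pattern e with e H = s, then correct to y + e
e = min((c for c in product([0, 1], repeat=n) if syndrome(c) == s), key=sum)
corrected = tuple(yi ^ ei for yi, ei in zip(y, e))
print(syndrome(x), corrected == x)  # (0, 0, 0) True
```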
Let’s think in 3-D
• The Hamming distance between codewords is related to t, the number of errors a (n, k) linear block code can correct
• Example: the (3, 1) repetition code can correct at most 1 error and detect 2 errors

[Figure: the 3-bit cube with vertices 000 through 111; the codewords 000 and 111 sit at opposite corners]
Further bounds
• Hamming bound: r ≥ log2 [ Σ_{j=0}^{t} (n choose j) ]
• Gilbert bound: there exists a code such that r ≤ log2 [ Σ_{j=0}^{2t} (n choose j) ]
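Both bounds are sums of binomial coefficients and are easy to evaluate (a sketch, not from the slides):

```python
from math import comb, log2

def hamming_bound(n, t):
    # r must be at least log2 of the volume of a radius-t Hamming sphere
    return log2(sum(comb(n, j) for j in range(t + 1)))

def gilbert_bound(n, t):
    # some t-error-correcting code exists with r at most log2 of the radius-2t volume
    return log2(sum(comb(n, j) for j in range(2 * t + 1)))

# The (7, 4) Hamming code (t = 1, r = 3) meets the Hamming bound with equality
print(hamming_bound(7, 1))  # 3.0
print(gilbert_bound(7, 1))
```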
Cyclic Codes
• Cyclic codes are codes with the cyclic shift property on codewords: if (a, b, c, d, e) is a codeword, so is (b, c, d, e, a)
• A generator matrix is then

  G = [ g_m … g_0  0   …   0  ]
      [ 0   g_m … g_0  …   0  ]
      [ …                   … ]
      [ 0   …   0  g_m … g_0  ]

• Can we make use of the regular structure of this matrix to simplify things?
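The shifted-row structure can be built mechanically from the generator coefficients (a sketch, not from the slides; g(D) = D^3 + D + 1 for the (7, 4) cyclic Hamming code is a standard textbook example, assumed here):

```python
g = [1, 0, 1, 1]          # coefficients g_m ... g_0 of g(D) = D^3 + D + 1, m = 3
n = 7
k = n - (len(g) - 1)      # k = n - m = 4

# Each row of G is the coefficient vector of g shifted one position further right
G = [[0] * i + g + [0] * (n - len(g) - i) for i in range(k)]
for row in G:
    print(row)
```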
Polynomials on fields
• Galois field: a field with a finite number of elements
• We can define a polynomial over a finite field (Galois field) GF(q) as f(D) = f_n D^n + f_{n-1} D^{n-1} + … + f_0
• D is not a variable in the field; it is an indeterminate - think of the D-transform or the Z-transform
• For polynomials, addition and multiplication can be defined; they are commutative and associative, closure holds, and an additive inverse is defined, BUT THERE IS NO MULTIPLICATIVE INVERSE
• Instead, we have the following theorem (the Euclidean division algorithm): let f(D) and g(D) be polynomials over GF(q) and let g(D) have degree at least 1; then there exist unique polynomials h(D) and r(D) over GF(q) for which the degree of r(D) is less than that of g(D) and
  f(D) = g(D) h(D) + r(D)
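For GF(2), Euclidean division can be sketched with polynomials packed into integers (not from the slides; the bit-mask representation is an implementation choice):

```python
def poly_divmod(f, g):
    # Long division over GF(2); polynomials are ints with bit i = coefficient of D^i
    h = 0
    while f and f.bit_length() >= g.bit_length():
        shift = f.bit_length() - g.bit_length()
        h ^= 1 << shift        # add D^shift to the quotient
        f ^= g << shift        # subtract (= XOR) the shifted divisor
    return h, f                # quotient h(D) and remainder r(D), deg r < deg g

# Divide f(D) = D^4 + D + 1 by g(D) = D^2 + 1
h, r = poly_divmod(0b10011, 0b101)
print(bin(h), bin(r))  # 0b101 0b10, i.e. h(D) = D^2 + 1 and r(D) = D
```

Indeed (D^2 + 1)(D^2 + 1) + D = D^4 + D + 1 over GF(2), since the cross terms 2D^2 vanish.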
Relation between polynomials and cyclic codes
• Represent a vector x = (x_{n-1}, …, x_0) as x(D) = x_{n-1} D^{n-1} + … + x_0
• We say that x(D) is a codeword when its coefficients are the letters of a codeword
• If we have a cyclic code, then the remainder of dividing D x(D) by D^n - 1 (also called the remainder of D x(D) modulo D^n - 1) is also a codeword
• To see this: D x(D) = x_{n-1} D^n + … + x_0 D = x_{n-1}(D^n - 1) + x_{n-2} D^{n-1} + … + x_0 D + x_{n-1}
• Define g(D) to be the lowest-degree monic polynomial that is a codeword in a cyclic code; define m to be its degree
• Because of linearity, any D^j g(D) is also a codeword, so in general a(D) g(D) is a codeword
• Moreover, all codewords have this form
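The identity "multiply by D, reduce mod D^n - 1, get the cyclic shift" can be checked directly (a sketch, not from the slides; bit masks with bit i = coefficient of D^i, n = 7 assumed):

```python
n = 7

def times_D_mod(x):
    # Multiply by D, then reduce modulo D^n - 1 (over GF(2): D^n wraps to 1)
    x <<= 1
    if x >> n:                 # the D^n term wraps around to the constant term
        x = (x & ((1 << n) - 1)) | 1
    return x

def cyclic_shift(x):
    # Left-rotate the n coefficient bits
    return ((x << 1) & ((1 << n) - 1)) | (x >> (n - 1))

x = 0b1011001
print(times_D_mod(x) == cyclic_shift(x))  # True
```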
Implementing systematic cyclic codes
• In polynomial form, a systematic binary codeword is of the form x(D) = D^r t(D) - d(D) = D^r t(D) + d(D) (over GF(2), subtraction and addition coincide)
• The check bits are d(D), and d(D) has degree less than r
• Let's take d(D) to be the remainder of D^r t(D) / g(D)
• What about the syndrome?
• s(D) = remainder of x(D) / g(D) = remainder of D^r t(D) / g(D) + remainder of d(D) / g(D) = d(D) + d(D) = 2 d(D) = 0
• For some q(D), we have g(D) q(D) + d(D) = D^r t(D)
• So g(D) q(D) = D^r t(D) - d(D) = D^r t(D) + d(D), which is our codeword x(D)
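Systematic cyclic encoding by remainder can be sketched as follows (not from the slides; g(D) = D^3 + D + 1 and the message are assumptions, with polynomials as bit masks, bit i = coefficient of D^i):

```python
def poly_mod(f, g):
    # Remainder of f(D) / g(D) over GF(2)
    while f and f.bit_length() >= g.bit_length():
        f ^= g << (f.bit_length() - g.bit_length())
    return f

g = 0b1011                   # assumed g(D) = D^3 + D + 1, so r = 3
r = g.bit_length() - 1

def encode(t):
    shifted = t << r         # D^r t(D)
    d = poly_mod(shifted, g) # check bits d(D)
    return shifted ^ d       # x(D) = D^r t(D) + d(D)

x = encode(0b1101)
# Every codeword is divisible by g(D), i.e. its syndrome is 0
print(poly_mod(x, g) == 0)  # True
```

The high bits of the codeword are the message itself, which is what "systematic" means.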
Implementing cyclic codes
[Figure: a divide-by-g(D) circuit built from a feedback shift register with taps g_0, g_1, …, g_{r-1} and stages s_1, s_2, …, s_{r-1}; the encoder feeds the input, premultiplied by D^r, through the divide-by-g(D) circuit]

Circuits are easily implemented using shift registers to represent polynomials in D
Decoding
• Brute force: divide y(D) by g(D), from there get the syndrome, check the syndrome against a syndrome table, generate the n-bit error pattern, then add it to y(D) to obtain the corrected word
• However, we know that we can only correct a number t of errors
• Let's define the Meggitt set to be the set of correctable error patterns such that e_{n-1} = 1
• Use Meggitt's theorem: suppose that g(D) h(D) = D^n - 1 and that the remainder of e(D) / g(D) is s(D); then the remainder of [D^b e(D) mod (D^n - 1)] / g(D) equals the remainder of [D^b s(D)] / g(D)
• So we only need to store syndromes for error patterns in the Meggitt set
Different types of codes: Hamming codes
• Hamming codes are some of the few "perfect" codes
• Sphere of radius v: the set of all sequences at distance v or less from a given sequence
• A v-error-correcting sphere-packed code has the property that the spheres of radius v around the codewords are non-overlapping and that every sequence is at most v+1 in distance from some codeword
• A v-error-correcting sphere-packed code for which every sequence is at most v in distance from some codeword is perfect
• Hamming codes: the rows of H are all the different nonzero sequences of length n - k
• Thus n = 2^(n-k) - 1
• For all Hamming codes, the minimum Hamming distance is 3, so they can correct 1 error or detect 2
• The rate of a Hamming code is R = (2^r - r - 1) / (2^r - 1)
Cyclic Redundancy Check Codes (CRCs)
• Normally we look only at whether the syndrome is 0 or not, which is why CRCs are error-detecting
• This is not a property of the CRC code itself; it is a matter of how it is decoded
• Usually used in a concatenated fashion with an error-correcting code
• g(D) = (D + 1) p(D), where p(D) is a primitive polynomial, which means that it divides D^(2^(r-1) - 1) - 1 but no lower-order polynomial of the form D^v - 1
• Length n = 2^(r-1) - 1
• Usually used in shortened mode: for a systematic code, stuff with 0s; those 0s need not be transmitted but must be taken into account at decoding
Convolutional codes
• Rather than map a block to a block, we code continuously
• Tend to be used in lower-BER channels
• We can define the code as a set of states: for M memory elements in the encoder, there are 2^M states
• The trellis structure is as follows, say M = 2:

[Figure: trellis with states 00, 01, 10, 11 between times t and t + 1]

Decoding:
• We use a time between known states
• Possible ways of getting from one state to another are called the adversary paths
• The Viterbi algorithm is a means of applying dynamic programming to path selection
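The Viterbi idea can be sketched for an M = 2, rate-1/2 code (not from the slides; the generators 7 and 5 octal are a standard textbook choice, assumed here, as are the message and error position):

```python
def encode(bits):
    # Rate-1/2 convolutional encoder, M = 2 memory elements -> 4 states
    s = (0, 0)
    out = []
    for b in bits:
        out += [b ^ s[0] ^ s[1], b ^ s[1]]   # generator taps 111 and 101
        s = (b, s[0])
    return out

def viterbi(received):
    # Dynamic programming over the trellis: keep the best path into each state
    metrics = {(0, 0): 0}        # encoder assumed to start in the all-zero state
    paths = {(0, 0): []}
    for i in range(0, len(received), 2):
        r = received[i:i + 2]
        new_m, new_p = {}, {}
        for s, m in metrics.items():
            for b in (0, 1):
                o = [b ^ s[0] ^ s[1], b ^ s[1]]
                ns = (b, s[0])
                cost = m + sum(x != y for x, y in zip(o, r))  # Hamming branch metric
                if ns not in new_m or cost < new_m[ns]:
                    new_m[ns], new_p[ns] = cost, paths[s] + [b]
        metrics, paths = new_m, new_p
    best = min(metrics, key=metrics.get)
    return paths[best]

msg = [1, 0, 1, 1, 0, 0]      # trailing zeros flush the encoder
coded = encode(msg)
coded[3] ^= 1                 # inject one channel error
print(viterbi(coded) == msg)  # True: the single error is corrected
```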
Beyond codes based on distance
• What is the best performance we can get with a coded system?
• Let us consider a channel with bandwidth W, with noise energy N per Hz, and with a maximum E on the mean transmitted power
• Shannon's limit is that if we have no delay constraint and no complexity constraint on the coder and decoder, the best achievable error-free rate (capacity) is

  C = W log2( 1 + E / (N W) )  bits per second

• A modulation that achieves this limit is one that itself looks like noise, and the codes are random
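The capacity formula is easy to evaluate (a sketch, not from the slides; all numeric parameter values are assumptions):

```python
from math import log2

def capacity(W, E, N):
    # C = W log2(1 + E / (N W)): W in Hz, E mean signal power, N noise density per Hz
    return W * log2(1 + E / (N * W))

# Illustrative numbers: W = 1 MHz, E = 1 mW, N = 1e-12 W/Hz, so E/(N W) = 1000
print(capacity(1e6, 1e-3, 1e-12))  # about 9.97e6 bits per second
```

Note the logarithm: at fixed E and N, doubling the bandwidth less than doubles the capacity, because the same power is spread over twice the noise bandwidth.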
Capacity
• Recall the BSC

[Figure: BSC with input X and output Y; correct transitions with probability 1-p, crossovers with probability p]

• Capacity is the maximum over input distributions of I(X; Y) = H(Y) - H(Y|X)
• H is the entropy: H(Z) = - Σ p_Z(z) log( p_Z(z) )
• Let's work it out: for the BSC, this gives C = 1 - H(p), where H(p) is the binary entropy function
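The worked-out BSC capacity C = 1 - H(p) can be sketched numerically (not from the slides):

```python
from math import log2

def H2(p):
    # Binary entropy in bits, with H2(0) = H2(1) = 0 by convention
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

def bsc_capacity(p):
    # The maximum of I(X; Y) is achieved by uniform inputs, giving 1 - H2(p)
    return 1 - H2(p)

print(bsc_capacity(0.0), bsc_capacity(0.5))  # 1.0 0.0
```

A noiseless BSC carries one bit per use; at p = 1/2 the output is independent of the input and the capacity drops to zero.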
Random codes
• The codes are random in that the codewords are selected at random from a large number of possible codewords
• The coding theorem gives results based on the fact that an error is unlikely because, for long enough codewords, two codewords are unlikely to be close enough to be confused
• Codes have emerged that work on this principle
• Examples: turbo codes and low-density parity-check (LDPC) codes