MIT
Physical Layer and Coding
Muriel Médard, Professor
EECS, MIT
Overview
• A variety of physical media: copper, free space, optical fiber
• Unified way of addressing signals at the input and the output of these media:
  – basic models
  – Nyquist sampling theorem
• How we use the channels: modulation
• Modeling the net effect of channels: transition probabilities and errors
• Making up for channel errors:
  – principles of coding
  – theoretical limits
Signals
• Channel has a physical signal traversing a physical medium
• Electromagnetic signal
• Polarization of the wave

[Figure: TM and TE polarization; propagation of the wave]
Signals
• What do we mean by frequency?
• We can express a signal as a superposition of sinusoids, each with its own frequency
• A continuous signal is bandlimited to a bandwidth W at carrier frequency f0 if it can be expressed in terms of sinusoids in a range of frequencies of width W around f0
• If we multiply the signal by a frequency shift, then we operate around 0: the baseband representation

[Figure: passband spectrum from f0 - W/2 to f0 + W/2; baseband spectrum from -W/2 to W/2]
Nyquist sampling theorem
• We can wholly reconstruct a bandlimited signal from its periodic samples as long as those samples are no more than 1/W apart
• For a continuous signal x(t), the discrete-time signal is x[n] = x(nT), where T = 1/W

[Figure: x(t) with samples taken at t = 0, T, 2T, 3T, 4T, 5T, 6T]
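A minimal sketch of the sampling-and-reconstruction idea (not from the slides; the test signal, bandwidth W = 8 Hz, and sample window are assumptions):

```python
import math

W = 8.0             # assumed bandwidth in Hz: signal content confined to |f| < W/2
T = 1.0 / W         # sampling interval T = 1/W from the slide

def x(t):
    # Example bandlimited signal: sinusoids at 1 Hz and 3 Hz, both below W/2 = 4 Hz
    return math.sin(2 * math.pi * t) + 0.5 * math.cos(6 * math.pi * t)

samples = {n: x(n * T) for n in range(-400, 401)}   # x[n] = x(nT)

def sinc(u):
    return 1.0 if u == 0 else math.sin(math.pi * u) / (math.pi * u)

def reconstruct(t):
    # Interpolation formula: x(t) = sum_n x(nT) sinc((t - nT) / T)
    return sum(v * sinc((t - n * T) / T) for n, v in samples.items())

t0 = 0.3217  # an off-grid time
print(abs(reconstruct(t0) - x(t0)))  # small: error comes only from truncating the sum
```

The residual error here is due only to truncating the infinite interpolation sum to a finite window of samples.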
Modulation: creating signals
• How we generate signals: by varying amplitude, phase, or frequency
• On-off keying (OOK): send 1, 0; a special case of amplitude modulation
• Frequency-shift keying (FSK): change frequencies
• Phase-shift keying (PSK): 2-PSK, 4-PSK
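A sketch of 4-PSK as complex carrier phasors (not from the slides; the Gray bit-to-phase mapping is an assumption):

```python
import cmath
import math

def qpsk_modulate(bits):
    # Assumed Gray mapping of bit pairs to the four phases 45, 135, 225, 315 degrees
    gray = {(0, 0): 0, (0, 1): 1, (1, 1): 2, (1, 0): 3}
    symbols = []
    for i in range(0, len(bits), 2):
        k = gray[(bits[i], bits[i + 1])]
        symbols.append(cmath.exp(1j * (math.pi / 4 + k * math.pi / 2)))  # unit phasor
    return symbols

syms = qpsk_modulate([0, 0, 1, 1])
print(abs(syms[0] + syms[1]) < 1e-9)  # True: 00 and 11 map to antipodal phases
```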
Modulation
• Pulse position modulation (PPM)
• Combining phase and amplitude, for instance:
[Figure: constellation on amplitude levels -2, -1, 0, +1, +2 on each axis; the PAM 5x5 symbol code for the 100BASE-T2 PHY]
Channels
• Canonical channel model: signal goes into channel; there is a non-additive effect and addition of noise
• Very commonly, G is a multiplicative effect:
  – Y[n] = g X[n] + N[n]
  – Y[n] = g-1 X[n-1] + g0 X[n] + N[n]: intersymbol interference (ISI)

[Figure: Tx sends X[n] through the channel effect G; noise N[n] is added to G(X[n]) to give Y[n] at the Rx]
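The two multiplicative models above can be sketched as a toy simulation (not from the slides; the tap names and all numeric values are assumptions):

```python
import random

random.seed(0)  # reproducible toy noise

def flat_channel(x, g=0.8, sigma=0.1):
    # Y[n] = g X[n] + N[n]
    return [g * xn + random.gauss(0, sigma) for xn in x]

def isi_channel(x, g1=0.4, g0=1.0, sigma=0.1):
    # Two-tap ISI model: Y[n] = g1 X[n-1] + g0 X[n] + N[n], taking X[-1] = 0
    y, prev = [], 0.0
    for xn in x:
        y.append(g1 * prev + g0 * xn + random.gauss(0, sigma))
        prev = xn
    return y

print(isi_channel([1.0, 0.0, 0.0], sigma=0.0))  # a single pulse smears: [1.0, 0.4, 0.0]
```

With the noise turned off, the ISI channel shows a transmitted pulse leaking into the following symbol time.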
Effect of the channel
• If we put in a certain string of input signals X[1], X[2], …, then we have a certain probability of observing a certain output string Y[1], Y[2], …
• The net effect is the probability of a certain output given a certain input
Channel model
• The channel is described by an input alphabet, an output alphabet, a set of probabilities on the input alphabet, and a set of channel transitions; for a memoryless channel, on every symbol the channel behaves independently of other times
• Channel with input alphabet of size k+1 and output alphabet of size m+1
• p(y | x) is the transition probability: the probability of getting output y for input x

[Figure: transition diagram from Tx symbols 0, …, k to Rx symbols 0, …, m]
How do we determine the probabilities?
• We want to get to p(x | y)
• Use Bayes's rule: p(x | y) = p(x, y) / p(y)
• Also p(x, y) = p(y | x) p(x)
• p(y) = Σi p(y | xi) p(xi)
• For the case where all n inputs have the same probability 1/n:
  p(x | y) = p(y | x) (1/n) / ( (1/n) Σi p(y | xi) ) = p(y | x) / Σi p(y | xi)
• How can we try to make up for the effect of the channel?
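The uniform-prior Bayes computation above can be sketched directly (not from the slides; the BSC-style transition function and p = 0.1 are assumptions):

```python
def posterior(y, inputs, p_y_given_x):
    # p(x | y) = p(y | x) / sum_i p(y | x_i) when all inputs are equally likely
    total = sum(p_y_given_x(y, xi) for xi in inputs)
    return {x: p_y_given_x(y, x) / total for x in inputs}

def bsc(y, x, p=0.1):
    # Toy transition probabilities: correct with 1 - p, flipped with p
    return 1 - p if y == x else p

post = posterior(0, [0, 1], bsc)
print(post)
```

For a received 0, the posterior puts probability 0.9 on input 0 and 0.1 on input 1, matching the transition probabilities because the prior is uniform.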
When do we use codes
• Two different types of codes:
  – source codes: compression
  – channel codes: error correction
  – The source-channel separation theorem says the two can be done independently for a large family of channels

[Figure: stream → source encoder → s → channel encoder → x → modulator, channel, receiver, etc. → y → channel decoder → source decoder]
What is a reasonable way to construct codes?
• Suppose that errors occur independently from symbol to symbol: a reasonable way to code is to make codewords as dissimilar as possible

  x1 = 01000101001110
  x2 = 01001100000111

• The number of bits in which the codewords differ (shown in red on the slide) is the Hamming distance
• If we received y = 01000101001111, we would map it to the first codeword; we'll make this more precise later
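The distances in this example can be checked with a short sketch (not from the slides):

```python
def hamming(a, b):
    # Number of positions where two equal-length bit strings differ
    return sum(u != v for u, v in zip(a, b))

codewords = ["01000101001110", "01001100000111"]
y = "01000101001111"

decoded = min(codewords, key=lambda c: hamming(c, y))  # minimum-distance rule
print(hamming(codewords[0], y), hamming(codewords[1], y))  # 1 3
print(decoded == codewords[0])  # True
```

y is at distance 1 from x1 and distance 3 from x2, so the minimum-distance rule picks x1.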
What do we mean by constructing a code?
• Block code: map a block of bits to another, generally larger, block of bits
• An (n, k) linear block code encodes a k-bit message into an n-bit code vector; the rate of the code is R = k/n
• In general, the code maps a message from a set M of size 2^k onto a codeword of length n
• Every codeword is in one-to-one correspondence with a message
• An error occurs in decoding if we map a received word to the wrong message or to more than one message
What do we mean by decoding?
• We choose the most likely input to have yielded the observed output - maximum likelihood decoding
• Given that we observe the output vector y, the probability that x1 was transmitted is p(x1 | y) and the probability that x2 was transmitted is p(x2 | y) (let's look at 2 codewords)
• Then we pick x1 when p(x1 | y) / p(x2 | y) > 1, or equivalently when ln p(x1 | y) - ln p(x2 | y) > 0 in terms of log likelihoods
Decoding example
• Consider the binary symmetric channel (BSC)

[Figure: BSC transition diagram; Tx 0 → Rx 0 and Tx 1 → Rx 1 each with probability 1-p, crossovers each with probability p]

• For the received y of the previous example, the probabilities are:
  – p(y | x1) = (1-p)^13 p: 13 bits without error and 1 error
  – p(y | x2) = (1-p)^11 p^3: 11 bits without error and 3 errors
• The BER is p
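These two likelihoods can be compared numerically (not from the slides; the crossover probability p = 0.1 is an assumption):

```python
p = 0.1  # assumed BSC crossover probability

lik_x1 = (1 - p) ** 13 * p        # p(y | x1): 13 correct bits, 1 flipped
lik_x2 = (1 - p) ** 11 * p ** 3   # p(y | x2): 11 correct bits, 3 flipped

print(lik_x1 > lik_x2)            # True: for p < 1/2, fewer errors is more likely
print(lik_x1 / lik_x2)            # ((1 - p) / p) ** 2, i.e. 81 up to rounding
```

The likelihood ratio depends only on the difference in error counts, which is why minimum-distance decoding agrees with maximum-likelihood decoding on the BSC when p < 1/2.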
Hamming distance and code capability
• The minimum Hamming distance dmin of a code is the minimum Hamming distance between any two codewords
• A code can detect up to t errors iff dmin ≥ t + 1
• A code can correct up to t errors iff dmin ≥ 2t + 1
• A code can correct up to t errors and detect up to t' > t errors iff dmin ≥ 2t + 1 and dmin ≥ t + t' + 1
• How does dmin relate to r = n - k, the number of redundant bits?
• Singleton bound: dmin ≤ r + 1
• Example: take strings of bits and add a parity check: r is 1 and dmin is 2
• These are bounds; most codes don't achieve them
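The detection and correction conditions above can be inverted to read off a code's capability from dmin (a sketch, not from the slides):

```python
def capability(dmin):
    # Largest t with dmin >= t + 1, and largest t with dmin >= 2t + 1
    detect = dmin - 1
    correct = (dmin - 1) // 2
    return detect, correct

# Single parity check: dmin = 2 -> detect 1, correct 0
print(capability(2))  # (1, 0)
# (3, 1) repetition code: dmin = 3 -> detect 2, correct 1
print(capability(3))  # (2, 1)
```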
Linear codes: how do we build them?
• Codeword x = s G, where for an (n, k) code G is a matrix with k rows and n columns
• Linear code because sums of codewords are codewords
• How can we find dmin? The Hamming distance between two binary codewords is the same as the Hamming weight of their sum
• Thus, the minimum Hamming distance is the minimum weight over the non-zero code vectors
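Finding dmin as the minimum nonzero weight can be sketched by enumerating all codewords x = sG (not from the slides; the (6, 3) generator matrix is an assumed example):

```python
from itertools import product

# An assumed (6, 3) generator matrix for illustration
G = [[1, 0, 0, 1, 1, 0],
     [0, 1, 0, 0, 1, 1],
     [0, 0, 1, 1, 0, 1]]

def encode(s):
    # x = s G over GF(2)
    return tuple(sum(si * G[i][j] for i, si in enumerate(s)) % 2
                 for j in range(len(G[0])))

codewords = [encode(s) for s in product([0, 1], repeat=len(G))]
dmin = min(sum(c) for c in codewords if any(c))  # minimum nonzero weight
print(len(codewords), dmin)  # 8 3
```

Exhaustive enumeration is only feasible for small k; it is shown here to make the weight-equals-distance argument concrete.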
Syndromes and parity-check
• For any code, it is always possible to find a systematic code, i.e. a code for which G = [ I (k x k) | P (k x r) ], by manipulating the rows using linear operations
• We define the parity-check matrix H to be the n x r matrix

  H = [ -P ]   (-P is k x r; over GF(2), -P = P)
      [  I ]   (I is r x r)

• We have x H = 0
• Syndrome s = y H
• Define the error sequence e = y + x
• Then s = e H
• Decoding is done by finding the minimum-Hamming-weight sequence e that satisfies s = e H and decoding to the codeword y + e
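Syndrome decoding with a systematic G = [I | P] can be sketched end to end (not from the slides; the (6, 3) parity block P is an assumed example):

```python
from itertools import product

P = [[1, 1, 0],
     [0, 1, 1],
     [1, 0, 1]]   # parity block of a systematic G = [I | P]; an assumed (6, 3) example
k = len(P)
r = len(P[0])
n = k + r

def encode(s):
    # Systematic encoding: codeword = (message bits, parity bits s P)
    parity = tuple(sum(s[i] * P[i][j] for i in range(k)) % 2 for j in range(r))
    return tuple(s) + parity

# H = [-P ; I] stacked (over GF(2), -P = P): every codeword x satisfies x H = 0
H = [row[:] for row in P] + [[1 if i == j else 0 for j in range(r)] for i in range(r)]

def syndrome(y):
    # s = y H over GF(2)
    return tuple(sum(y[i] * H[i][j] for i in range(n)) % 2 for j in range(r))

x = encode((1, 0, 1))
y = list(x)
y[2] ^= 1                              # one channel error
s = syndrome(y)
# Decode: minimum-weight error pattern e with e H = s, then correct to y + e
e = min((c for c in product([0, 1], repeat=n) if syndrome(c) == s), key=sum)
corrected = tuple(yi ^ ei for yi, ei in zip(y, e))
print(syndrome(x), corrected == x)  # (0, 0, 0) True
```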
Let’s think in 3-D
• The Hamming distance between codewords is related to t, the number of errors a (n, k) linear block code can correct
• Example: the (3, 1) repetition code can correct at most 1 error and detect 2 errors

[Figure: the 3-bit cube with vertices 000 through 111; the codewords 000 and 111 sit at opposite corners]
Further bounds
• Hamming bound: r ≥ log2 [ Σ_{j=0}^{t} (n choose j) ]
• Gilbert bound: there exists a code such that r ≤ log2 [ Σ_{j=0}^{2t} (n choose j) ]
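Both bounds are sums of binomial coefficients and are easy to evaluate (a sketch, not from the slides):

```python
from math import comb, log2

def hamming_bound(n, t):
    # r must be at least log2 of the volume of a radius-t Hamming sphere
    return log2(sum(comb(n, j) for j in range(t + 1)))

def gilbert_bound(n, t):
    # some t-error-correcting code exists with r at most log2 of the radius-2t volume
    return log2(sum(comb(n, j) for j in range(2 * t + 1)))

# The (7, 4) Hamming code (t = 1, r = 3) meets the Hamming bound with equality
print(hamming_bound(7, 1))  # 3.0
print(gilbert_bound(7, 1))
```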
Cyclic Codes
• Cyclic codes are codes with the cyclic shift property on codewords: if (a, b, c, d, e) is a codeword, so is (b, c, d, e, a)
• A generator matrix is then

  G = [ g_m … g_0  0   …   0  ]
      [ 0   g_m … g_0  …   0  ]
      [ …                   … ]
      [ 0   …   0  g_m … g_0  ]

• Can we make use of the regular structure of this matrix to simplify things?
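The shifted-row structure can be built mechanically from the generator coefficients (a sketch, not from the slides; g(D) = D^3 + D + 1 for the (7, 4) cyclic Hamming code is a standard textbook example, assumed here):

```python
g = [1, 0, 1, 1]          # coefficients g_m ... g_0 of g(D) = D^3 + D + 1, m = 3
n = 7
k = n - (len(g) - 1)      # k = n - m = 4

# Each row of G is the coefficient vector of g shifted one position further right
G = [[0] * i + g + [0] * (n - len(g) - i) for i in range(k)]
for row in G:
    print(row)
```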
Polynomials on fields
• Galois field: a field with a finite number of elements
• We can define a polynomial over a finite field (Galois field) GF(q) as f(D) = f_n D^n + f_{n-1} D^{n-1} + … + f_0
• D is not a variable in the field; it is an indeterminate - think of the D-transform or the Z-transform
• For polynomials, addition and multiplication can be defined; they are commutative and associative, closure holds, and an additive inverse is defined, BUT THERE IS NO MULTIPLICATIVE INVERSE
• Instead, we have the following theorem (the Euclidean division algorithm): let f(D) and g(D) be polynomials over GF(q) and let g(D) have degree at least 1; then there exist unique polynomials h(D) and r(D) over GF(q) for which the degree of r(D) is less than that of g(D) and
  f(D) = g(D) h(D) + r(D)
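For GF(2), Euclidean division can be sketched with polynomials packed into integers (not from the slides; the bit-mask representation is an implementation choice):

```python
def poly_divmod(f, g):
    # Long division over GF(2); polynomials are ints with bit i = coefficient of D^i
    h = 0
    while f and f.bit_length() >= g.bit_length():
        shift = f.bit_length() - g.bit_length()
        h ^= 1 << shift        # add D^shift to the quotient
        f ^= g << shift        # subtract (= XOR) the shifted divisor
    return h, f                # quotient h(D) and remainder r(D), deg r < deg g

# Divide f(D) = D^4 + D + 1 by g(D) = D^2 + 1
h, r = poly_divmod(0b10011, 0b101)
print(bin(h), bin(r))  # 0b101 0b10, i.e. h(D) = D^2 + 1 and r(D) = D
```

Indeed (D^2 + 1)(D^2 + 1) + D = D^4 + D + 1 over GF(2), since the cross terms 2D^2 vanish.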
Relation between polynomials and cyclic codes
• Represent a vector x = (x_{n-1}, …, x_0) as x(D) = x_{n-1} D^{n-1} + … + x_0
• We say that x(D) is a codeword when its coefficients are the letters of a codeword
• If we have a cyclic code, then the remainder of dividing D x(D) by D^n - 1 (also called the remainder of D x(D) modulo D^n - 1) is also a codeword
• To see this: D x(D) = x_{n-1} D^n + … + x_0 D = x_{n-1}(D^n - 1) + x_{n-2} D^{n-1} + … + x_0 D + x_{n-1}
• Define g(D) to be the lowest-degree monic polynomial that is a codeword in a cyclic code; define m to be its degree
• Because of linearity, any D^j g(D) is also a codeword, so in general a(D) g(D) is a codeword
• Moreover, all codewords have this form
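The identity "multiply by D, reduce mod D^n - 1, get the cyclic shift" can be checked directly (a sketch, not from the slides; bit masks with bit i = coefficient of D^i, n = 7 assumed):

```python
n = 7

def times_D_mod(x):
    # Multiply by D, then reduce modulo D^n - 1 (over GF(2): D^n wraps to 1)
    x <<= 1
    if x >> n:                 # the D^n term wraps around to the constant term
        x = (x & ((1 << n) - 1)) | 1
    return x

def cyclic_shift(x):
    # Left-rotate the n coefficient bits
    return ((x << 1) & ((1 << n) - 1)) | (x >> (n - 1))

x = 0b1011001
print(times_D_mod(x) == cyclic_shift(x))  # True
```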
Implementing systematic cyclic codes
• In polynomial form, a systematic binary codeword is of the form x(D) = D^r t(D) - d(D) = D^r t(D) + d(D) (over GF(2), subtraction and addition coincide)
• The check bits are d(D), and d(D) has degree less than r
• Let's take d(D) to be the remainder of D^r t(D) / g(D)
• What about the syndrome?
• s(D) = remainder of x(D) / g(D) = remainder of D^r t(D) / g(D) + remainder of d(D) / g(D) = d(D) + d(D) = 2 d(D) = 0
• For some q(D), we have g(D) q(D) + d(D) = D^r t(D)
• So g(D) q(D) = D^r t(D) - d(D) = D^r t(D) + d(D), which is our codeword x(D)
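Systematic cyclic encoding by remainder can be sketched as follows (not from the slides; g(D) = D^3 + D + 1 and the message are assumptions, with polynomials as bit masks, bit i = coefficient of D^i):

```python
def poly_mod(f, g):
    # Remainder of f(D) / g(D) over GF(2)
    while f and f.bit_length() >= g.bit_length():
        f ^= g << (f.bit_length() - g.bit_length())
    return f

g = 0b1011                   # assumed g(D) = D^3 + D + 1, so r = 3
r = g.bit_length() - 1

def encode(t):
    shifted = t << r         # D^r t(D)
    d = poly_mod(shifted, g) # check bits d(D)
    return shifted ^ d       # x(D) = D^r t(D) + d(D)

x = encode(0b1101)
# Every codeword is divisible by g(D), i.e. its syndrome is 0
print(poly_mod(x, g) == 0)  # True
```

The high bits of the codeword are the message itself, which is what "systematic" means.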
Implementing cyclic codes
[Figure: a divide-by-g(D) circuit built from a feedback shift register with taps g_0, g_1, …, g_{r-1} and stages s_1, s_2, …, s_{r-1}; the encoder feeds the input, premultiplied by D^r, through the divide-by-g(D) circuit]

Circuits are easily implemented using shift registers to represent polynomials in D
Decoding
• Brute force: divide y(D) by g(D), from there get the syndrome, check the syndrome against a syndrome table, generate the n-bit error pattern, then add it to y(D) to obtain the corrected word
• However, we know that we can only correct a number t of errors
• Let's define the Meggitt set to be the set of correctable error patterns such that e_{n-1} = 1
• Use Meggitt's theorem: suppose that g(D) h(D) = D^n - 1 and that the remainder of e(D) / g(D) is s(D); then the remainder of [D^b e(D) mod (D^n - 1)] / g(D) equals the remainder of [D^b s(D)] / g(D)
• So we only need to store syndromes for error patterns in the Meggitt set
Different types of codes: Hamming codes
• Hamming codes are some of the few "perfect" codes
• Sphere of radius v: the set of all sequences at distance v or less from a given sequence
• A v-error-correcting sphere-packed code has the property that the spheres of radius v around the codewords are non-overlapping and that every sequence is at most v+1 in distance from some codeword
• A v-error-correcting sphere-packed code for which every sequence is at most v in distance from some codeword is perfect
• Hamming codes: the rows of H are all the different nonzero sequences of length n - k
• Thus n = 2^(n-k) - 1
• For all Hamming codes, the minimum Hamming distance is 3, so they can correct 1 error or detect 2
• The rate of a Hamming code is R = (2^r - r - 1) / (2^r - 1)
Cyclic Redundancy Check Codes (CRCs)
• Normally we look only at whether the syndrome is 0 or not, which is why CRCs are error-detecting
• This is not a property of the CRC code itself; it is a matter of how it is decoded
• Usually used in a concatenated fashion with an error-correcting code
• g(D) = (D + 1) p(D), where p(D) is a primitive polynomial, which means that it divides D^(2^(r-1) - 1) - 1 but no lower-order polynomial of the form D^v - 1
• Length n = 2^(r-1) - 1
• Usually used in shortened mode: for a systematic code, stuff with 0s; those 0s need not be transmitted but must be taken into account at decoding
Convolutional codes
• Rather than map a block to a block, we code continuously
• Tend to be used in lower-BER channels
• We can define the code as a set of states: for M memory elements in the encoder, there are 2^M states
• The trellis structure is as follows, say M = 2:

[Figure: trellis with states 00, 01, 10, 11 between times t and t + 1]

Decoding:
• We use a time between known states
• Possible ways of getting from one state to another are called the adversary paths
• The Viterbi algorithm is a means of applying dynamic programming to path selection
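The Viterbi idea can be sketched for an M = 2, rate-1/2 code (not from the slides; the generators 7 and 5 octal are a standard textbook choice, assumed here, as are the message and error position):

```python
def encode(bits):
    # Rate-1/2 convolutional encoder, M = 2 memory elements -> 4 states
    s = (0, 0)
    out = []
    for b in bits:
        out += [b ^ s[0] ^ s[1], b ^ s[1]]   # generator taps 111 and 101
        s = (b, s[0])
    return out

def viterbi(received):
    # Dynamic programming over the trellis: keep the best path into each state
    metrics = {(0, 0): 0}        # encoder assumed to start in the all-zero state
    paths = {(0, 0): []}
    for i in range(0, len(received), 2):
        r = received[i:i + 2]
        new_m, new_p = {}, {}
        for s, m in metrics.items():
            for b in (0, 1):
                o = [b ^ s[0] ^ s[1], b ^ s[1]]
                ns = (b, s[0])
                cost = m + sum(x != y for x, y in zip(o, r))  # Hamming branch metric
                if ns not in new_m or cost < new_m[ns]:
                    new_m[ns], new_p[ns] = cost, paths[s] + [b]
        metrics, paths = new_m, new_p
    best = min(metrics, key=metrics.get)
    return paths[best]

msg = [1, 0, 1, 1, 0, 0]      # trailing zeros flush the encoder
coded = encode(msg)
coded[3] ^= 1                 # inject one channel error
print(viterbi(coded) == msg)  # True: the single error is corrected
```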
Beyond codes based on distance
• What is the best performance we can get with a coded system?
• Let us consider a channel with bandwidth W, with noise energy N per Hz, and with a maximum E on the mean transmitted power
• Shannon's limit is that if we have no delay constraint and no complexity constraint on the coder and decoder, the best achievable error-free rate (capacity) is

  C = W log2( 1 + E / (N W) )  bits per second

• A modulation that achieves this limit is one that itself looks like noise, and the codes are random
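The capacity formula is easy to evaluate (a sketch, not from the slides; all numeric parameter values are assumptions):

```python
from math import log2

def capacity(W, E, N):
    # C = W log2(1 + E / (N W)): W in Hz, E mean signal power, N noise density per Hz
    return W * log2(1 + E / (N * W))

# Illustrative numbers: W = 1 MHz, E = 1 mW, N = 1e-12 W/Hz, so E/(N W) = 1000
print(capacity(1e6, 1e-3, 1e-12))  # about 9.97e6 bits per second
```

Note the logarithm: at fixed E and N, doubling the bandwidth less than doubles the capacity, because the same power is spread over twice the noise bandwidth.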
Capacity
• Recall the BSC

[Figure: BSC with input X and output Y; correct transitions with probability 1-p, crossovers with probability p]

• Capacity is the maximum over input distributions of I(X; Y) = H(Y) - H(Y|X)
• H is the entropy: H(Z) = - Σ p_Z(z) log( p_Z(z) )
• Let's work it out: for the BSC, this gives C = 1 - H(p), where H(p) is the binary entropy function
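The worked-out BSC capacity C = 1 - H(p) can be sketched numerically (not from the slides):

```python
from math import log2

def H2(p):
    # Binary entropy in bits, with H2(0) = H2(1) = 0 by convention
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

def bsc_capacity(p):
    # The maximum of I(X; Y) is achieved by uniform inputs, giving 1 - H2(p)
    return 1 - H2(p)

print(bsc_capacity(0.0), bsc_capacity(0.5))  # 1.0 0.0
```

A noiseless BSC carries one bit per use; at p = 1/2 the output is independent of the input and the capacity drops to zero.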
Random codes
• The codes are random in that the codewords are selected at random from a large number of possible codewords
• The coding theorem gives results based on the fact that an error is unlikely because, for long enough codewords, two codewords are unlikely to be close enough to be confused
• Codes have emerged that work on this principle
• Examples: turbo codes and low-density parity-check (LDPC) codes