
  • MIT

    Physical Layer and Coding

    Muriel Médard, Professor

    EECS, MIT

  • MIT

    Overview

    • A variety of physical media: copper, free space, optical fiber
    • Unified way of addressing signals at the input and the output of these media:
      – basic models
      – Nyquist sampling theorem
    • How we use the channels: modulation
    • Modeling the net effect of channels: transition probabilities and errors
    • Making up for channel errors:
      – principles of coding
      – theoretical limits

  • MIT

    Signals

    • Channel has a physical signal traversing a physical medium
    • Electromagnetic signal
    • Polarization of the wave
    [Figure: TM and TE polarizations; propagation of the wave]

  • MIT

    Signals

    • What do we mean by frequency?
    • We can express a signal as a superposition of sinusoids, each of them with its own frequency
    • A continuous signal is bandlimited to a bandwidth W at carrier frequency f0 if it can be expressed in terms of sinusoids in a range of frequencies of width W around a frequency f0
    • If we multiply the signal by a frequency shift, then we operate around 0, the baseband representation
    [Figure: passband spectrum occupying f0 − W/2 to f0 + W/2 around f0; equivalent baseband spectrum occupying −W/2 to W/2 around 0]

  • MIT

    Nyquist sampling theorem

    • We can wholly reconstruct a signal from its periodic samples as long as those samples are within 1/W of each other
    • Continuous signal x(t); the discrete-time signal is x[n] = x(nT), where T = 1/W
    [Figure: x(t) with samples taken at t = 0, T, 2T, 3T, 4T, 5T, 6T]
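A minimal sketch of the reconstruction the theorem promises, assuming samples taken at T = 1/W as on the slide; the normalized-sinc kernel is standard, but the toy 1 Hz signal and the truncation to 200 samples are illustrative choices, not from the slides:

```python
import math

def sinc(u):
    """Normalized sinc: sin(pi u) / (pi u), with sinc(0) = 1."""
    return 1.0 if u == 0 else math.sin(math.pi * u) / (math.pi * u)

def reconstruct(samples, T, t):
    """Interpolate x(t) from its samples x[n] = x(nT)."""
    return sum(x_n * sinc((t - n * T) / T) for n, x_n in enumerate(samples))

# A 1 Hz sinusoid fits inside a two-sided bandwidth W = 4 Hz, so T = 1/W works.
W = 4.0
T = 1.0 / W
samples = [math.sin(2 * math.pi * n * T) for n in range(200)]
t = 5.1234  # a point between sampling instants
print(reconstruct(samples, T, t), math.sin(2 * math.pi * t))  # nearly equal
```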

  • MIT

    Modulation: creating signals

    • How we generate signals, by varying amplitude, phase, or frequency
    • On-off keying (OOK): send 1, 0; a special case of amplitude modulation
    • Frequency-shift keying (FSK): change frequencies
    • Phase-shift keying (PSK): 2-PSK, 4-PSK
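To make the three mappings concrete, a small sketch (my own example symbols and tone frequencies, not from the slides) of OOK, 4-PSK, and binary FSK:

```python
import cmath, math

def ook(bit):
    """On-off keying: amplitude 1 for a 1, amplitude 0 for a 0."""
    return complex(bit, 0)

def qpsk(b1, b0):
    """4-PSK: two bits select one of four phases 45, 135, 225, 315 degrees."""
    k = 2 * b1 + b0
    return cmath.exp(1j * (math.pi / 4 + k * math.pi / 2))

def fsk(bit, t, f0=1000.0, f1=2000.0):
    """Binary FSK: a 0 is sent at frequency f0, a 1 at f1 (example tones)."""
    return math.cos(2 * math.pi * (f1 if bit else f0) * t)

print(ook(1), qpsk(1, 0), fsk(1, t=0.25e-3))
```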

  • MIT

    Modulation

    • Pulse position modulation (PPM)

    • Combining phase and amplitude, for instance:

    [Figure: 5×5 amplitude/phase constellation with levels −2, −1, 0, +1, +2 on each axis. Caption: PAM 5×5 symbol code for 100BASE-T2 PHY]

  • MIT

    Channels

    • Canonical channel model: the signal goes into the channel, where it undergoes a (generally non-additive) effect G followed by the addition of noise
    • Very commonly, G is a multiplicative effect:
      – Y[n] = g X[n] + N[n]
      – Y[n] = g_{-1} X[n-1] + g_0 X[n] + N[n]: INTERSYMBOL INTERFERENCE (ISI)
    [Diagram: Tx → X[n] → G → G(X[n]) → adder with N[n] → Y[n] → Rx]
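A quick simulation sketch of the two-tap ISI model above with added Gaussian noise; the tap values and noise level are made-up numbers for illustration:

```python
import random

def isi_channel(x, g_prev=0.4, g_cur=1.0, sigma=0.1):
    """Y[n] = g_{-1} X[n-1] + g_0 X[n] + N[n], with Gaussian N[n]."""
    y, prev = [], 0.0
    for xn in x:
        y.append(g_prev * prev + g_cur * xn + random.gauss(0.0, sigma))
        prev = xn
    return y

bits = [1, 0, 1, 1, 0, 1, 0, 0]
x = [2 * b - 1 for b in bits]   # 2-PSK: map {0, 1} to {-1, +1}
print(isi_channel(x))
```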

  • MIT

    Effect of the channel

    • If we put in a certain string of input signals X[1], X[2], …, then we have a certain probability of observing a certain output string Y[1], Y[2], …
    • The net effect is captured by the probability of each possible output given the input that was sent

  • MIT

    Channel model

    • The channel is described by an input alphabet, an output alphabet, a set of probabilities on the input alphabet, and a set of channel transitions; for a memoryless channel, on every symbol the channel behaves independently of other times
    • Channel with input alphabet of size k+1 and output alphabet of size m+1
    • p(y | x) is the transition probability: the probability of getting output y for input x
    [Diagram: transition graph from Tx symbols 0, …, k to Rx symbols 0, …, m]

  • MIT

    How do we determine the probabilities?

    • We want to get to p(x | y)
    • Use Bayes' rule: p(x | y) = p(x, y) / p(y)
    • Also p(x, y) = p(y | x) p(x)
    • p(y) = Σ_i p(y | x_i) p(x_i)
    • For the case where all n inputs have the same probability 1/n, p(x | y) = (1/n) p(y | x) / ((1/n) Σ_i p(y | x_i)) = p(y | x) / Σ_i p(y | x_i)
    • How can we try to make up for the effect of the channel?
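A numeric sketch of these steps for a single BSC use with equiprobable inputs; the crossover probability p = 0.1 is an arbitrary example value:

```python
def posterior(y, p=0.1, prior=(0.5, 0.5)):
    """p(x | y) for a BSC via Bayes' rule, with p(y) = sum_i p(y | x_i) p(x_i)."""
    lik = [1 - p if x == y else p for x in (0, 1)]    # p(y | x) for x = 0, 1
    p_y = sum(l * q for l, q in zip(lik, prior))      # total probability of y
    return [l * q / p_y for l, q in zip(lik, prior)]  # [p(0 | y), p(1 | y)]

# With uniform priors this reduces to p(y | x) / sum_i p(y | x_i).
print(posterior(y=1))   # [0.1, 0.9]: observing a 1 makes x = 1 far more likely
```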

  • MIT

    When do we use codes?

    • Two different types of codes:
      – source codes: compression
      – channel codes: error correction
      – The source-channel separation theorem says the two can be done independently for a large family of channels
    [Diagram: stream → source encoder → channel encoder → x → modulator, channel, receiver, etc. → y → channel decoder → source decoder]

  • MIT

    What is a reasonable way to construct codes?

    • Suppose that errors occur independently from symbol to symbol: a reasonable way to code is to make codewords as dissimilar as possible
      x1 = 01000101001110
      x2 = 01001100000111
    • The number of positions in which they differ (the bits shown in red on the original slide) is their Hamming distance
    • If we received y = 01000101001111, we would map it to the first codeword; we'll make this more precise later (see the sketch below)
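A sketch of this nearest-codeword idea, using the two codewords from the slide:

```python
def hamming(a, b):
    """Number of positions in which two equal-length bit strings differ."""
    return sum(ca != cb for ca, cb in zip(a, b))

codewords = ["01000101001110", "01001100000111"]   # x1 and x2 from the slide
y = "01000101001111"

# d(y, x1) = 1 and d(y, x2) = 3, so y is mapped to x1.
print([hamming(y, c) for c in codewords])
print(min(codewords, key=lambda c: hamming(y, c)))
```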

  • MIT

    What do we mean by constructing a code?

    • Block code: map a block of bits to another, generally larger, block of bits
    • An (n, k) linear block code encodes a k-bit message into an n-bit code vector; the rate of the code is R = k/n
    • In general, a code maps a message from a set M of size 2^k onto a codeword of length n
    • Every codeword is in one-to-one correspondence with a message
    • An error occurs in decoding if we map a received word to the wrong message or to more than one message

  • MIT

    What do we mean by decoding?

    • We choose the most likely input to have yielded the observed output: maximum-likelihood (ML) decoding (with equiprobable inputs, this is the same as maximizing the posterior p(x | y))
    • Given that we observe the output vector y, the probability that x1 was transmitted is p(x1 | y) and the probability that x2 was transmitted is p(x2 | y) (let's look at 2 codewords)
    • Then we pick x1 when p(x1 | y) / p(x2 | y) > 1 or, equivalently, when ln p(x1 | y) − ln p(x2 | y) > 0 in terms of log likelihoods

  • MIT

    Decoding example

    • Consider the binary symmetric channel (BSC)

    • The likelihoods are:
      – p(y | x1) = (1−p)^13 p : 13 positions with no error and 1 error
      – p(y | x2) = (1−p)^11 p^3 : 11 positions with no error and 3 errors
    • The BER is p
    [Diagram: BSC with crossover probability p: Tx 0 → Rx 0 and Tx 1 → Rx 1 with probability 1−p; 0 → 1 and 1 → 0 with probability p]
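Plugging numbers into these likelihoods (p = 0.01 is an arbitrary crossover probability) shows how decisively y is attributed to x1:

```python
p = 0.01                         # BSC crossover probability (illustrative)
lik_x1 = (1 - p) ** 13 * p       # y differs from x1 in 1 of 14 positions
lik_x2 = (1 - p) ** 11 * p ** 3  # y differs from x2 in 3 of 14 positions
print(lik_x1, lik_x2, lik_x1 / lik_x2)   # ratio is ((1-p)/p)^2 = 9801
```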

  • MIT

    Hamming distance and code capability

    • The minimum Hamming distance dmin of a code is the minimum Hamming distance between any two codewords
    • A code can detect up to t errors iff dmin ≥ t + 1
    • A code can correct up to t errors iff dmin ≥ 2t + 1
    • A code can correct up to t errors and detect up to t' > t errors iff dmin ≥ 2t + 1 and dmin ≥ t + t' + 1
    • How does dmin relate to r = n − k, the number of redundant bits?
    • Singleton bound: dmin ≤ r + 1
    • Example: take strings of n bits and add a parity check: r is 1 and dmin is 2
    • These are bounds; most codes don't achieve them

  • MIT

    Linear codes: how do we build them?

    • Codeword x = s G, where for an (n, k) code G is a matrix with k rows and n columns
    • It is a linear code because sums of codewords are codewords
    • How can we find dmin? The Hamming distance between two binary codewords is the same as the Hamming weight of their sum
    • Thus, the minimum Hamming distance is the minimum weight over the non-zero code vectors

  • MIT

    Syndromes and parity-check

    • For any linear code, it is always possible to find an equivalent systematic code, i.e. a code for which G = [ I_{k×k}  P_{k×r} ], by manipulating the rows using linear operations
    • We define the parity-check matrix H (n rows, r columns) to be:
          H = [ −P_{k×r} ]
              [  I_{r×r} ]
      (over GF(2), −P = P)
    • We have x H = 0
    • Syndrome s = y H
    • Define the error sequence e = y + x
    • Then s = e H
    • Decoding is done by finding the minimum-Hamming-weight sequence e that satisfies s = e H and decoding to the codeword y + e
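A sketch of syndrome decoding in the slide's row-vector convention (s = y H, a table of minimum-weight error patterns, decode to y + e), reusing the (7,4) example:

```python
# P is the k x r parity block of G = [I | P]; H = [P ; I_r] is n x r.
P = [[1,1,0],
     [1,0,1],
     [0,1,1],
     [1,1,1]]
H = P + [[1,0,0], [0,1,0], [0,0,1]]

def times_H(v, H):
    """Syndrome: the row vector v times H over GF(2)."""
    return tuple(sum(vi * hij for vi, hij in zip(v, col)) % 2
                 for col in zip(*H))

# Table from syndrome to minimum-weight error pattern (single-bit errors).
table = {times_H([int(i == j) for j in range(7)], H):
         [int(i == j) for j in range(7)] for i in range(7)}

y = [1, 0, 1, 1, 0, 0, 1]                      # received word
e = table.get(times_H(y, H), [0] * 7)          # all-zero syndrome: no error
print([(yi + ei) % 2 for yi, ei in zip(y, e)]) # decode to the codeword y + e
```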

  • MIT

    Let’s think in 3-D

    • The Hamming distance between codewords is related to t, the number of errors an (n, k) linear block code can correct
    • Example: the (3, 1) repetition code can correct at most 1 error or detect 2 errors
    [Figure: the 3-bit cube with vertices 000, 100, 010, 001, 110, 101, 011, 111; the codewords 000 and 111 sit at opposite corners]

  • MIT

    Further bounds

    • Hamming bound: r ≥ log2( Σ_{j=0}^{t} C(n, j) )
    • Gilbert bound: there exists a code such that r ≤ log2( Σ_{j=0}^{2t} C(n, j) )
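A numeric sketch of both bounds for the example pair n = 7, t = 1 (my own check of the reconstructed formulas, where C(n, j) is the binomial coefficient):

```python
from math import comb, log2

n, t = 7, 1   # block length and number of correctable errors (example)

hamming_lb = log2(sum(comb(n, j) for j in range(t + 1)))      # r must be >= this
gilbert_ub = log2(sum(comb(n, j) for j in range(2 * t + 1)))  # some code has r <= this

# Hamming bound: r >= 3, met with equality by the (7,4) Hamming code;
# Gilbert bound: a 1-error-correcting code exists with r <= log2(29) ~ 4.86.
print(hamming_lb, gilbert_ub)
```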

  • MIT

    Cyclic Codes

    • Cyclic codes are codes with a cyclic-shift property on codewords: if (a, b, c, d, e) is a codeword, so is (b, c, d, e, a)
    • A generator matrix is then
          G = [ g_m ... g_0  0   ...  0  ]
              [ 0  g_m  ...  g_0 ...  0  ]
              [            ...           ]
              [ 0  ...  0  g_m  ...  g_0 ]
    • Can we make use of the regular structure of this matrix to simplify things?

  • MIT

    Polynomials on fields

    • Galois field: a field with a finite number of elements
    • We can define a polynomial over a finite field (Galois field) GF(q) as f(D) = f_n D^n + f_{n-1} D^{n-1} + … + f_0
    • D is not a variable in the field, it is an indeterminate - think of the D-transform or the Z-transform
    • For polynomials, addition and multiplication can be defined; they are commutative and associative, closure holds, and an additive inverse exists, BUT THERE IS NO MULTIPLICATIVE INVERSE
    • Instead, we have the following theorem: let f(D) and g(D) be polynomials over GF(q) and let g(D) have degree at least 1. Then there exist unique polynomials h(D) and r(D) over GF(q) for which the degree of r(D) is less than that of g(D) and
      f(D) = g(D) h(D) + r(D) (Euclidean division algorithm)
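A sketch of Euclidean division for the binary case GF(2); the coefficient-list representation (highest degree first) is my own choice for illustration:

```python
def poly_divmod(f, g):
    """Return (h, r) with f = g*h + r over GF(2) and deg r < deg g >= 1.
    Polynomials are bit lists, highest-degree coefficient first."""
    f = f[:]                          # work on a copy of the dividend
    h = [0] * max(len(f) - len(g) + 1, 1)
    for i in range(len(f) - len(g) + 1):
        if f[i]:                      # cancel the leading term with g * D^k
            h[i] = 1
            for j, gj in enumerate(g):
                f[i + j] ^= gj        # subtraction over GF(2) is XOR
    return h, f[-(len(g) - 1):]

# f(D) = D^4 + D + 1 divided by g(D) = D^2 + 1:
h, r = poly_divmod([1, 0, 0, 1, 1], [1, 0, 1])
print(h, r)   # h(D) = D^2 + 1, r(D) = D: (D^2+1)(D^2+1) + D = D^4 + D + 1
```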

  • MIT

    Relation between polynomials and cyclic codes

    • Represent a vector x = (x_{n-1}, …, x_0) as x(D) = x_{n-1} D^{n-1} + … + x_0
    • We say that x(D) is a codeword when its coefficients are the letters of a codeword
    • If we have a cyclic code, then the remainder of dividing D x(D) by D^n − 1 (also called the remainder of D x(D) modulo D^n − 1) is also a codeword
    • To see this: D x(D) = x_{n-1} D^n + … + x_0 D = x_{n-1} (D^n − 1) + x_{n-2} D^{n-1} + … + x_0 D + x_{n-1}
    • Define g(D) to be the lowest-degree monic polynomial which is a codeword in a cyclic code; define m to be its degree
    • Because of linearity, any D^j g(D) is also a codeword, so in general a(D) g(D) is a codeword
    • Moreover, all codewords have this form

  • MIT

    Implementing systematic cyclic codes

    • In polynomial form, a systematic binary code is of the form x(D) = D^r t(D) − d(D) = D^r t(D) + d(D) (over GF(2), − and + coincide)
    • The check bits are d(D), and d(D) has degree less than r
    • Let's take d(D) to be the remainder of D^r t(D) / g(D)
    • What about the syndrome?
    • s(D) = remainder of x(D) / g(D) = remainder of D^r t(D) / g(D) + remainder of d(D) / g(D) = 2 d(D) = 0
    • For some q(D), we have g(D) q(D) + d(D) = D^r t(D)
    • So g(D) q(D) = D^r t(D) − d(D) = D^r t(D) + d(D), which is our codeword x(D)
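A sketch of this systematic encoding (the CRC recipe): shift the message up by r, take the remainder modulo g(D), and append it. The generator g(D) = D^3 + D + 1 and the message are example values of mine:

```python
def gf2_mod(f, g):
    """Remainder of f(D) / g(D) over GF(2); bit lists, high degree first."""
    f = f[:]
    for i in range(len(f) - len(g) + 1):
        if f[i]:
            for j, gj in enumerate(g):
                f[i + j] ^= gj
    return f[-(len(g) - 1):]

g = [1, 0, 1, 1]            # g(D) = D^3 + D + 1, so r = 3 check bits
t = [1, 0, 0, 1, 1]         # message t(D) = D^4 + D + 1 (arbitrary example)
d = gf2_mod(t + [0] * 3, g) # d(D) = remainder of D^r t(D) / g(D)
x = t + d                   # systematic codeword x(D) = D^r t(D) + d(D)
print(d, gf2_mod(x, g))     # [1, 0, 0] and remainder [0, 0, 0], as derived
```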

  • MIT

    Implementing cyclic codes

    Divide by g(D) circuit
    [Diagram: a divide-by-g(D) circuit: shift-register stages s_1, s_2, …, s_{r-1} with XOR adders and feedback taps weighted g_0, g_1, …, g_{r-1}; the encoder passes the shifted message through the divide-by-g(D) circuit]
    Circuits are easily implemented using shift registers to represent polynomials in D
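A behavioral sketch of the divide-by-g(D) register: one message bit enters per clock, the register holds the running remainder, and the tap pattern implements the subtraction of g(D). This models the circuit's function rather than any particular schematic; g(D) = D^3 + D + 1 is again the example generator:

```python
def lfsr_divide(bits, g):
    """Clock bits (highest degree first) through a divide-by-g(D) register.
    g = [g_r, ..., g_0] with g_r = 1; returns the remainder of the input."""
    r = len(g) - 1
    reg = [0] * r                 # stages hold the remainder, high end at reg[0]
    for b in bits:
        fb = reg[0]               # coefficient about to reach degree r
        reg = reg[1:] + [b]       # multiply the remainder by D, bring in b
        if fb:
            for j in range(r):
                reg[j] ^= g[j + 1]   # subtract g(D); its leading term is gone
    return reg

g = [1, 0, 1, 1]   # g(D) = D^3 + D + 1
print(lfsr_divide([1, 0, 0, 1, 1, 0, 0, 0], g))  # [1, 0, 0]: same d(D) as above
```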

  • MIT

    Decoding

    • Brute force: divide y(D) by g(D), from there get the syndrome, check the syndrome against a syndrome table, generate the n-bit error pattern, then add it to y(D) to obtain the corrected word
    • However, we know that we can only correct a number t of errors
    • Let's define the Meggitt set to be the set of correctable error patterns such that e_{n-1} = 1
    • Use Meggitt's theorem: suppose that g(D) h(D) = D^n − 1 and that the remainder of e(D) / g(D) is s(D); then the remainder of [D^b e(D) mod (D^n − 1)] / g(D) equals the remainder of [D^b s(D)] / g(D)
    • So we only need to store syndromes for the error patterns in the Meggitt set

  • MIT

    Different types of codes: Hamming codes

    • Hamming codes are some of the few "perfect" codes
    • Sphere of radius v: the set of all sequences at distance v or less from a given sequence
    • A v-error-correcting sphere-packed code has the property that the spheres of radius v around the codewords are non-overlapping and that every sequence is at distance at most v + 1 from some codeword
    • A v-error-correcting sphere-packed code for which every sequence is at distance at most v from some codeword is perfect
    • Hamming codes: the rows of H are all the different nonzero sequences of length r = n − k
    • Thus n = 2^{n−k} − 1
    • For all Hamming codes, the minimum Hamming distance is 3, so they can correct 1 error or detect 2
    • The rate of a Hamming code is R = (2^r − r − 1) / (2^r − 1)
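A quick check (mine, for the smallest nontrivial case r = 3, i.e. the (7,4) code) that the Hamming parameters satisfy the sphere-packing condition with equality, which is exactly what "perfect" demands:

```python
from math import comb

r = 3                # redundant bits
n = 2**r - 1         # block length: 7
k = n - r            # message bits: 4

# 2^k disjoint spheres of radius 1, each containing 1 + n sequences,
# must exactly fill the space of 2^n binary sequences.
sphere = comb(n, 0) + comb(n, 1)     # 8 sequences per sphere
print(2**k * sphere == 2**n)         # True: 16 * 8 = 128 = 2^7
```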

  • MIT

    Cyclic Redundancy Check Codes (CRCs)

    • Normally we look only to see whether the syndrome is 0 or not, which is why these are error-detecting codes
    • This is not a property of the CRC code itself; it comes from the way it is decoded
    • Usually used in a concatenated fashion with an error-correcting code
    • g(D) = (D + 1) p(D), where p(D) is a primitive polynomial, which means that it divides D^{2^{r−1} − 1} − 1 but no lower-order polynomial of the form D^v − 1
    • Length n = 2^{r−1} − 1
    • Usually used in shortened mode: for a systematic code, stuff with 0s; those 0s need not be transmitted but must be taken into account at decoding

  • MIT

    Convolutional codes

    • Rather than map a block to a block, we code continuously
    • Tend to be used in lower-BER channels
    • We can define the code as a set of states: for M memory elements in the encoder, there are 2^M states
    • The trellis structure is as follows, say for M = 2:
    [Figure: trellis section from time t to t + 1 over the four states 00, 01, 10, 11]
    Decoding
    • We use a time between known states
    • Possible ways of getting from one state to another are called the adversary paths
    • The Viterbi algorithm is a means of applying dynamic programming to path selection (see the sketch below)
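A compact hard-decision Viterbi sketch for an M = 2, rate-1/2 convolutional code. The slides don't specify the encoder taps, so I'm assuming the classic generators 111 and 101 (octal (7, 5)):

```python
def branch(state, b):
    """One trellis branch: from a 2-bit state with input b, return
    (next_state, output bits) for generators g1 = 111, g2 = 101."""
    reg = (b << 2) | state                     # bits [b, s1, s0]
    o1 = (reg ^ (reg >> 1) ^ (reg >> 2)) & 1   # parity over taps 111
    o2 = (reg ^ (reg >> 2)) & 1                # parity over taps 101
    return reg >> 1, (o1, o2)

def encode(bits):
    state, out = 0, []
    for b in bits:
        state, o = branch(state, b)
        out += o
    return out

def viterbi(received):
    """Dynamic programming over the trellis with Hamming branch metrics."""
    metric, paths = {0: 0}, {0: []}            # start in the all-zero state
    for i in range(0, len(received), 2):
        r = received[i:i + 2]
        new_metric, new_paths = {}, {}
        for s, m in metric.items():
            for b in (0, 1):
                ns, out = branch(s, b)
                cost = m + (out[0] != r[0]) + (out[1] != r[1])
                if cost < new_metric.get(ns, float("inf")):   # keep survivor
                    new_metric[ns], new_paths[ns] = cost, paths[s] + [b]
        metric, paths = new_metric, new_paths
    return paths[min(metric, key=metric.get)]

msg = [1, 0, 1, 1, 0, 0]      # trailing zeros drive the encoder back to state 0
rx = encode(msg)
rx[3] ^= 1                    # inject one channel error
print(viterbi(rx) == msg)     # True: the error is corrected
```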

  • MIT

    Beyond codes based on distance

    • What is the best performance we can get with a coded system?
    • Let us consider a channel with bandwidth W, with noise energy N per Hz, and with a maximum E on the mean transmitted power
    • Shannon's limit is that if we have no delay constraint and no complexity constraint on the coder and decoder, the best achievable error-free rate (capacity) is given by
          C = W log2( 1 + E / (N W) )   bits per second
    • A modulation that achieves this limit is one that itself looks like noise, and the codes are random
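Evaluating the limit for illustrative numbers (the values below are my own, chosen to resemble a telephone line; they are not from the slides):

```python
from math import log2

W = 3000.0    # bandwidth, Hz
E = 0.3       # constraint on mean transmitted power, watts
N = 1e-7      # noise energy per Hz, watts/Hz

C = W * log2(1 + E / (N * W))   # Shannon capacity, bits per second
print(C)   # about 30 kbit/s, the ballpark of telephone-line modem rates
```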

  • MIT

    Capacity

    • Recall the BSC
    [Diagram: BSC from X to Y with crossover probability p: 0 → 0 and 1 → 1 with probability 1 − p; 0 → 1 and 1 → 0 with probability p]
    • Capacity is: C = max over input distributions of I(X; Y) = H(Y) − H(Y|X)
    • H is the entropy, H(Z) = − Σ_z p_Z(z) log p_Z(z)
    • Let's work it out ... (see the sketch below)
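Working through the slide's "let's work it out": with a uniform input, H(Y) = 1 and H(Y|X) = H(p), giving the standard result C = 1 − H(p). A small sketch:

```python
from math import log2

def H2(p):
    """Binary entropy in bits, with H2(0) = H2(1) = 0 by convention."""
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

def bsc_capacity(p):
    """C = max I(X; Y) = 1 - H2(p), achieved by a uniform input."""
    return 1.0 - H2(p)

for p in (0.0, 0.11, 0.5):
    print(p, bsc_capacity(p))   # 1.0 at p = 0, down to 0.0 at p = 0.5
```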

  • MIT

    Random codes

    • The codes are random in that the codewords are selected at random from a large set of possible codewords
    • The coding theorem gives results based on the fact that an error is unlikely because, for long enough codewords, two codewords are unlikely to be close enough to be confused
    • Codes have emerged which work on that principle
    • Examples: Turbo codes and low-density parity-check (LDPC) codes