LTE Implementation Using Xilinx FPGA

DESCRIPTION

This is a textbook on LTE implementation using Xilinx models; the implementation is also carried out on an FPGA.

  • implementation

    using FPGA

    Supervised By

    Dr. Mohammed Rizk

  • LTE implementation using XILINX FPGA

    July 8, 2013

  • Prepared by

    Abdullah Elsaied Kamal Elsheikh

    eng [email protected]

    Ahmed Helmy Elbendary

    [email protected]

    Ahmed Talal Khalifa

    [email protected]

    Belal Mohammed Abu El-Ola

    [email protected]

    Eslam Ossama Youssef

    [email protected]

    Hassan Hesham Hassan Shalaby

    [email protected]

    Hazem Mamdouh Tharwat

    eng.hazem [email protected]

  • Khalid Eid Elsayed

    [email protected]

    Mahmoud Gamal

    [email protected]

    Muhammad Gamal Abbas Ahmed

    eng [email protected]

    Samer Sarwat Nageeb

    samersarwat [email protected]

  • Acknowledgments

    This book was written during our fourth year at the Department of Communications Engineering at the University of Alexandria, and it describes the work and study carried out for our graduation project. It could not have been written without the support and patience of many people, and we are obliged to everyone who assisted us during that time. In particular, we want to express our gratitude to our supervisor, Dr. Mohamed Rizk, for all his valuable advice, encouragement, and discussions. The opportunity to work with him was a precious experience; he gave all his effort and time to help us learn, search, and do our best in this project.

    We also want to thank our professors in the communication department, who did their best to teach us the soul of communication and electronic engineering, especially Dr. Essam Sourrour, who gave us help and support whenever we asked. Our deep thanks also go to the teaching assistants Eng. Kareem Banawan, Eng. Ahmed Serag, and Eng. Mostafa Medra, who were our beacon throughout our project journey.

    We also want to thank Eng. Mohammed Mostafa for helping us with the FPGA.

    Most of all, we thank our beloved families for their immeasurable support, encouragement, and patience while we worked on this project. Without their love and understanding, this book and our project would not have come to fruition.

    At the end and the beginning, we would be remiss if we failed to express our profound gratitude to Allah, whose assistance we always ask for and to whom we owe any success and progress we have made in our lives.

  • Preface

    Market needs for higher data rates are driving the evolution of wireless cellular systems from narrowband 2G GSM systems to 4G LTE systems supporting peak data rates up to 100 Mbps.

    For LTE specifications, complex signal processing techniques such as multiple-input multiple-output (MIMO), along with radio technologies like OFDMA, are considered key to achieving target throughputs in excess of 100 Mbps. In-building coverage is also regarded as a key requirement for future wireless growth, with technologies such as pico and femto base stations trying to address this issue.

    The emerging wireless technologies described above pose significant challenges for operating equipment manufacturers needing to design products that are not only scalable and cost-effective but also flexible and reusable. These diverse requirements ultimately make the FPGA the hardware platform of choice.

    The aim of our project is to implement the LTE physical layer on FPGA.

  • Abbreviations

    16-QAM 16 quadrature amplitude modulation

    2G Second generation

    3G Third generation

    3GPP Third Generation Partnership Project

    4G Fourth generation

    64-QAM 64 quadrature amplitude modulation

    ARQ Automatic repeat request

    BCJR Bahl, Cocke, Jelinek and Raviv

    BLAST Bell Labs Layered Space Time

    BPSK Binary phase shift keying

    E-UTRA Evolved UMTS Terrestrial Radio Access

    EGC Equal Gain Combining

    eNB E-UTRAN NodeB

    FDD Frequency Division Duplex

    FDMA Frequency division multiple access

    FFT Fast Fourier transform

    HARQ Hybrid ARQ

    HDA Hard Decision Aided

    HSDPA High speed downlink packet access

    LLR Log Likelihood Ratio

    MAP maximum a posteriori

    MIMO Multiple Input Multiple Output

  • MISO Multiple Input Single Output

    ML Maximum Likelihood

    MMSE Minimum Mean Square Error

    MRC Maximum Ratio Combining

    MU-MIMO Multi User MIMO

    OFDM Orthogonal frequency division multiplexing

    OFDMA Orthogonal frequency division multiple access

    PAPR Peak-to-Average Power Ratio

    PMI Precoding Matrix Indicator

    QAM Quadrature Amplitude Modulation

    QPSK Quadrature Phase Shift Keying

    RI Rank Indicator

    SFBC Space-Frequency Block Code

    SIC Successive Interference Cancellation

    SIMO Single Input Multiple Output

    SISO Single Input Single Output

    SNR Signal-to-Noise Ratio

    STBC Space-Time Block Code

    STC Space-Time Code

    STTC Space-Time Trellis Code

    SU-MIMO Single User MIMO

    TDD Time Division Duplex

    V-BLAST Vertical BLAST

    ZF Zero Forcing

  • Contents

    1 Overview on LTE

    1.1 Motivation For LTE

    1.2 From UMTS to LTE

    1.2.1 High Level Architecture of LTE

    1.2.2 Long Term Evolution

    1.3 System Architecture Evolution

    2 FPGA

    2.1 Key factors for describing FPGAs

    2.1.1 Fabrication process

    2.1.2 Logic density

    2.1.3 Clock management

    2.1.4 On-chip memory

    2.1.5 DSP capabilities

    2.2 Virtex-5 FPGA Features

    2.2.1 Summary of Virtex-5 FPGA Features

    2.2.2 Virtex-5 FPGA Logic

    2.2.3 550 MHz Clock Technology

    2.2.4 SelectIO Technology

    2.2.5 550 MHz Integrated Block Memory

    2.2.6 550 MHz DSP48E Slices

    2.2.7 Digitally Controlled Impedance (DCI) Active I/O Termination

    2.2.8 Advanced Flip-Chip Packaging

    2.2.9 System Monitor

    2.2.10 65-nm Copper CMOS Process

    2.2.11 Tri-Mode Ethernet Media Access Controller

    2.2.12 RocketIO GTP Transceivers (LXT/SXT only)

    2.3 Architectural Description

    2.3.1 Virtex-5 FPGA Array Overview

    2.3.2 Virtex-5 FPGA Features

    2.3.3 Input/Output Blocks (SelectIO)

    2.3.4 Configurable Logic Blocks (CLBs)

    2.3.5 Block RAM

    2.3.6 Global Clocking

    2.3.7 DSP48E Slices

    2.3.8 Routing Resources

    2.3.9 Boundary Scan

    2.3.10 Configuration

    2.3.11 System Monitor

    2.3.12 Virtex-5 LXT, SXT, TXT, and FXT Platform Features

    2.3.13 Tri-Mode (10/100/1000 Mb/s) Ethernet MACs

    2.3.14 Integrated Endpoint Blocks for PCI Express

    2.3.15 Virtex-5 LXT and SXT Platform Features

    2.3.16 RocketIO GTP Transceivers

    2.3.17 Virtex-5 TXT and FXT Platform Features

    2.3.18 RocketIO GTX Serial Transceivers

    2.4 ML505 evaluation board

    3 CRC and Segmentation

    3.1 CRC (cyclic redundancy check)

    3.1.1 CRC polynomials

    3.1.2 CRC calculation

    3.1.3 Modulo-2 arithmetic example

    3.1.4 CRC calculation example

    3.2 Segmentation

    3.2.1 What is segmentation?

    3.2.2 Example:

    3.2.3 Problem solution

    3.2.4 Segmentation process

    3.3 Matlab code

    3.4 VHDL code

    4 Turbo Codes

    4.1 A Brief History of Turbo Codes

    4.2 Turbo Encoding

    4.2.1 The Component Encoder with Binary Codes

    4.2.2 Interleaving

    4.2.3 Trellis Termination

    4.2.4 Puncturing

    4.3 Iterative Decoding Principle

    4.3.1 BCJR Algorithm

    4.3.2 Tools for Iterative Decoding of Turbo Codes

    4.4 Optimal and Suboptimal Algorithms for Turbo Decoding

    4.4.1 MAP algorithm

    4.4.2 Log-MAP Algorithm

    4.4.3 Max-Log-MAP Algorithm

    4.5 Improvements In Turbo Decoding

    4.5.1 Extrinsic Information Scaling

    4.5.2 The Sliding Window Soft Input Soft Output Decoder

    4.5.3 Stopping Criteria for Turbo Decoding

    4.5.4 Modulo Normalization

    4.6 LTE Standard

    4.6.1 Turbo Encoder

    4.6.2 Trellis termination for turbo encoder

    4.6.3 Interleaver

    4.7 Implementation of Turbo Encoder

    4.7.1 Encoder

    4.7.2 The Turbo Encoder main blocks

    4.7.3 PISO

    4.7.4 Interleaver

    4.7.5 Convolutional code

    4.7.6 SIPO

    4.7.7 TRELLIS

    4.8 Simulations of Turbo Encoder

    4.8.1 By using Modelsim and Matlab

    4.9 Workflow for Turbo Decoder

    4.9.1 Design

    4.9.2 Implementation

    4.10 Design Phase

    4.10.1 Algorithm

    4.10.2 Extrinsic Information Scaling

    4.10.3 Sliding window

    4.10.4 Stopping Criteria

    4.10.5 Internal word length

    4.11 Implementation of Map Decoder

    4.11.1 Architecture

    4.11.2 Timing

    4.12 Implementation of Turbo Decoder

    4.12.1 Architecture

    4.12.2 Timing

    4.12.3 Power

    4.12.4 Resource utilization

    4.12.5 Throughput

    4.12.6 BER

    5 RATE MATCHING

    5.1 Subblock interleaving

    5.2 permutation

    5.3 Subblock interlacing

    5.4 Hybrid ARQ soft buffer limitation

    5.5 RV starting points

    5.6 Implementation of Rate Matching Transmitter

    5.6.1 The Rate Matching Transmitter main blocks

    5.6.2 Sub block interleaver

    5.6.3 The function of the Sub block interleaver

    5.6.4 Bit collection

    5.7 Simulation of Transmitter

    5.7.1 the first Sub block interleaver

    5.7.2 the Third Sub block interleaver

    5.7.3 The Bit collection Block

    5.7.4 The Bit selection Block

    5.8 Simulation of receiver

    5.8.1 Matlab

    5.8.2 VHDL

    6 Scrambling

    6.1 PN-sequences

    6.1.1 m-sequences

    6.1.2 Preferred Pair

    6.1.3 Gold Codes

    6.2 Scrambler

    6.3 Why scrambling

    6.3.1 Data randomization

    6.3.2 PAPR reduction (peak to average power ratio)

    6.4 Matlab code

    7 Digital Modulation Technique

    7.1 INTRODUCTION

    7.2 HIERARCHY OF DIGITAL MODULATION TECHNIQUES

    7.3 Pass band Transmission Model

    7.4 COHERENT PHASE-SHIFT KEYING

    7.4.1 Binary Phase-Shift Keying

    7.4.2 QUADRIPHASE-SHIFT KEYING

    7.4.3 M-ARY PSK

    7.4.4 Frequency-Shift Keying

    7.4.5 M-ary QUADRATURE AMPLITUDE Modulation (QAM Mod.):

    7.4.6 Compare between (PSK) Vs (QAM)

    7.5 Noncoherent Orthogonal Modulation:

    7.5.1 NONCOHERENT BINARY FSK:

    7.5.2 Differential phase shift keying (DPSK):

    7.6 Table of BER equations

    7.7 Modulation in LTE

    7.8 Soft demodulation

    8 MIMO

    8.1 MIMO concepts and capacity

    8.1.1 Introduction

    8.1.2 WIRELESS CHANNEL IMPAIRMENTS:

    8.1.3 What is MIMO

    8.1.4 MIMO vs. Channel Capacity

    8.1.5 SISO, SIMO, MISO and MIMO terminology

    8.2 Diversity

    8.2.1 Types of diversity:

    8.2.2 Receive Diversity:

    8.3 Spatial multiplexing

    8.3.1 Principles of Operation

    8.3.2 V-blast

    8.3.3 spatial multiplexing Types:

    8.4 Downlink MIMO modes in LTE

    8.4.1 Precoding for two antenna ports

    8.4.2 CDD-based precoding

    9 Orthogonal Frequency Division Multiplexing (OFDM)

    9.1 Introduction

    9.2 OFDM

    9.2.1 Why OFDM

    9.2.2 Orthogonal Multiplexing Principle

    9.2.3 OFDM advantages and disadvantages

    9.2.4 Peak-to-Average Power Ratio and Sensitivity to Non-Linearity

    9.2.5 PAPR Reduction Techniques

    9.2.6 Cyclic Prefix Insertion

    9.2.7 Frequency-domain model of OFDM transmission

    9.2.8 Channel estimation and reference symbols

    9.3 OFDM as a user-multiplexing and multiple-access scheme

    9.4 The downlink physical resource:

    A Matlab

    A.1 Communications System Toolbox

    A.2 Fixed Point Toolbox

    A.3 Matlab

    A.4 HDL Verifier

    A.4.1 Workflow for Using the Cosimulation Wizard to Create a MATLAB System Object

    B Xilinx ISE Overview

    B.1 Design Flow Overview

    B.1.1 Design Entry

    B.1.2 Design Synthesis

    B.1.3 Design Verification (simulation)

    B.1.4 Design Implementation

    B.1.5 Device Configuration

    B.2 Starting the ISE Software

    B.2.1 Create a New Project

    B.2.2 Create an HDL Source

    B.2.3 Checking the Syntax of the New Counter Module

    B.2.4 Implement Design and Verify Constraints

  • List of Figures

    1.1 Global total traffic in mobile networks, 2007-2012

    1.2 Main LTE performance targets

    1.3 Evolution of the system architecture from GSM and UMTS to LTE.

    2.1 Global total traffic in mobile networks, 2007-2012

    4.1 Brief history of turbo codes

    4.2 The Turbo Coding/Decoding Principle

    4.3 Encoder Block Diagram (Binary)

    4.4 Recursive systematic convolutional encoder with feedback: rate 1/2 code with memory 2. The generator polynomials are $g_0(D) = 1 + D + D^2$ and $g_1(D) = 1 + D^2$

    4.5 soft-in/soft-out decoder

    4.6 Iterative decoding procedure with two soft-in/soft-out decoders

    4.7 Relation between Map, Log Map and Max Log Map

    4.8 Trellis structure of Systematic Convolution Codes with Feedback Encoders

    4.9 turbo code with different scaling factors and block length 5114 bit, 8 iterations, AWGN

    4.10 Graphical representation of a real-time MAP architecture

    4.11 Average number of iterations for various stopping schemes

    4.12 Graphical example of modulo normalisation.

    4.13 Hardware realisation of modulo normalisation.

    4.14 Structure of rate 1/3 turbo encoder (dotted lines apply for trellis termination only)

    4.15 The work flow used

    4.16 Steps of floating point design

    4.17 Fixed point design is obtained by quantizing the floating point design

    4.18 Steps of implementation

    4.19 BER rate curve for turbo codes using Map at different iterations

    4.20 comparison between max log map and map BER curves (interleaver size = 1088, number of iterations = 3)

    4.21 comparison between different scaling factors (interleaver size = 1088, number of iterations = 3)

    4.22 comparison between different sliding window techniques (interleaver size = 1088, number of iterations = 3)

    4.23 comparison between two B units and no sliding window (interleaver size = 1088, number of iterations = 3)

    4.24 comparison between different early stopping criteria

    4.25 relation between BER and internal size of turbo decoder at SNR -9.16 dB and 2 iterations

    4.26 comparison between floating point and fixed point turbo decoder with internal width of 11 (interleaver size = 1088, number of iterations = 2)

    4.27 High-level VLSI architecture of the implemented max-log map decoder (thin boxes indicate registers).

    4.28 High-level VLSI architecture of the implemented turbo decoder.

    4.29 The timing diagram of the implemented map decoder

    4.30 The timing diagram of the implemented map decoder

    4.31 The placed and routed design on FPGA

    4.32 BER curves for the implemented decoder

    5.1 Circular-buffer rate matching for turbo

    8.1 CHANNEL IMPAIRMENTS

    8.2 Shadowing

    8.3 Interference

    8.4 SISO

    8.5 SIMO

    8.6 MISO

    8.7 MIMO

    8.8 MIMO single-user

    8.9 MIMO multi-user

    8.10 table 1

    8.11 Frequency diversity Vs time at one slot

    8.12 Frequency diversity Vs time at two slots

    8.13 Two Antenna Delay Diversity

    8.14 Two Antenna Cyclic Delay Diversity

    8.15 Receive Diversity

    8.16 main idea of Receive Diversity

    8.17 Selective Combining

    8.18 branch selective diversity

    8.19 Threshold Combining

    8.20 Switch-and-examine strategy

    8.21 Switch-and-stay strategy

    8.22 Maximal Ratio Combining

    8.23 Transmit Diversity

    8.24 Space-Time Block

    8.25 Space-Frequency Block

    8.26 Transmit Diversity Principle

    8.27 (no caption)

    8.28 (no caption)

    8.29 Two-Branch Transmit Diversity

    8.30 (no caption)

    8.31 Two-Branch transmit diversity

    8.32 (no caption)


  • 8.33 Demodulation/decoding of spatially multiplexed signals based on successive interference cancellation
    8.34 2 × 2 MIMO channel
    8.35
    8.36
    8.37
    8.38 Transmit diversity precoding and RE mapping for two antenna ports
    8.39
    8.40 Illustration of feedback-based MIMO precoding

    9.1 Spectral efficiency of OFDM compared to classical multicarrier modulation: (a) classical multicarrier system spectrum; (b) OFDM system spectrum.
    9.2 Extension to wider transmission bandwidth by means of multi-carrier transmission.
    9.3 Per-subcarrier pulse shape and spectrum for basic OFDM transmission.
    9.4 OFDM subcarrier spacing.
    9.5 Serial-to-Parallel (S/P) conversion operation for OFDM.
    9.6 Effect of channel on signals with short and long symbol duration.
    9.7 OFDM system model: (a) transmitter; (b) receiver.
    9.8 OFDM Cyclic Prefix (CP) insertion.
    9.9 PAPR distribution for different numbers of OFDM subcarriers.
    9.10 Time dispersion and corresponding received-signal timing.
    9.11 Cyclic-prefix insertion.
    9.12 Frequency-domain model of OFDM transmission/reception.
    9.13 Frequency-domain model of OFDM transmission/reception with one-tap equalization at the receiver.
    9.14 Time-frequency grid with known reference symbols.
    9.15 OFDM as a user-multiplexing/multiple-access scheme: (a) downlink and (b) uplink
    9.16 Distributed user multiplexing
    9.17 Uplink transmission-timing control
    9.18 The LTE downlink physical resource
    9.19 Frequency-domain structure for LTE downlink
    9.20 Detailed time domain structure for LTE downlink transmission
    9.21
    9.22 Downlink resource block assuming normal cyclic prefix (i.e., 7 OFDM symbols per slot). With extended cyclic prefix there are six OFDM symbols per slot.


  • List of Tables

    1.1 Key features of the air interfaces of WCDMA and LTE
    1.2 Key features of the radio access networks of UMTS and LTE
    1.3 Key features of the core networks of UMTS and LTE

    4.1 Detailed power consumption
    4.2 Summary of power consumption
    4.3 Resources utilization
    4.4 Throughput of the implemented design


  • Chapter 1

    Overview on LTE

    1.1 Motivation For LTE

    The evolution of 3G systems into 4G is driven by the creation and development of new services for mobile devices, and is enabled by advancement of the technology available for mobile systems. There has also been an evolution of the environment in which mobile systems are deployed and operated, in terms of competition between mobile operators, challenges from other mobile technologies, and new regulation of spectrum use and market aspects of mobile systems.

    The rapid evolution of the technology used in telecommunication systems, consumer electronics, and specifically mobile devices has been remarkable in the last 20 years. Moore's law illustrates this and indicates a continuing evolution of processor performance and increased memory size, often combined with reduced size, power consumption, and cost for devices. High-resolution color displays and megapixel camera sensors are also coming into all types of mobile devices. Combined with a high-speed internet backbone often based on optical fiber networks, we see that a range of technology enablers are in place to go hand-in-hand with advancement in mobile communications technology such as LTE.

    The rapid increase in use of the internet to provide all kinds of services since the 1990s started at the same time as 2G and 3G mobile systems came into widespread use. The natural next step was that those internet-based services also moved to the mobile devices, creating what is today known as mobile broadband. Being able to support the same Internet Protocol (IP)-based services in a mobile device that people use at home with a fixed broadband connection is a major challenge and a prime driver for the evolution of LTE. A few services were already supported by the evolved 2.5G systems, but it is not until the systems are designed primarily for IP-based services that the real mobile IP revolution can take off. An interesting aspect of the migration of broadband services to mobile devices is that a mobile flavor is also added. The mobile position and the mobility and roaming capabilities do in fact create a whole new range of services tailored to the mobile environment.

    Fixed telephony (POTS) and earlier generations of mobile technology were built for circuit-switched services, primarily voice. The first data services over GSM were circuit switched, with packet-based GPRS coming in as a later addition. This also influenced the first development of 3G, which was based on circuit-switched data, with packet-switched services as an add-on. It was not until the 3G evolution into HSPA and later LTE/LTE-Advanced that packet-switched services and IP were made the primary design target. The old circuit-switched services remain, but on LTE they will be provided over IP, with Voice-over-IP (VoIP) as an example. IP is in itself service agnostic and thereby enables a range of services with different requirements.

    The main service-related design parameters for a radio interface supporting a variety of services are:

    Data rate. Many services with lower data rates such as voice services are important and still occupy a large part of a mobile network's overall capacity, but it is the higher data rate services that drive the design of the radio interface. The ever increasing demand for higher data rates for web browsing, streaming and file transfer pushes the peak data rates for mobile systems from kbit/s for 2G, to Mbit/s for 3G and getting close to Gbit/s for 4G.

    increased user data rates as shown in figure ??

    cell-edge bit-rate, for uniformity of service provision

    Figure 1.1: Global total traffic in mobile networks, 2007-2012

    Delay. Interactive services such as real-time gaming, but also web browsing and interactive file transfer, have requirements for very low delay, making it a primary design target. There are, however, many applications such as e-mail and television where the delay requirements are not as strict. The delay for a packet sent from a server to a client and back is called latency.

    delays, in terms of both connection establishment and transmission latency


  • Capacity. From the mobile system operator's point of view, it is not only the peak data rates provided to the end-user that are of importance, but also the total data rate that can be provided on average from each deployed base station site and per hertz of licensed spectrum. This measure of capacity is called spectral efficiency. In the case of capacity shortage in a mobile system, the Quality-of-Service (QoS) for the individual end-users may be degraded.

    reduced cost per bit, implying improved spectral efficiency

    greater flexibility of spectrum usage, in both new and pre-existing bands

    Also

    seamless mobility, including between different radio-access technologies

    reasonable power consumption for the mobile terminal

    packet-switched optimized

    Figure 1.2: Main LTE performance targets

    1.2 From UMTS to LTE

    1.2.1 High Level Architecture of LTE

    In 2004, 3GPP began a study into the long term evolution of UMTS. The aim was to keep 3GPP's mobile communication systems competitive over timescales of 10 years and beyond, by delivering the high data rates and low latencies that future users would require. Figure 1.3 shows the resulting architecture and the way in which that architecture developed from that of UMTS.

    In the new architecture, the evolved packet core (EPC) is a direct replacement for the packet switched domain of UMTS and GSM. It distributes all types of information to the user, voice as well as data, using the packet switching technologies that have traditionally been used for data alone. There is no equivalent to the circuit switched domain: instead, voice calls are transported using voice over IP. The evolved UMTS terrestrial radio access network (E-UTRAN) handles the EPC's radio communications with the mobile, so is a direct replacement for the UTRAN. The mobile is still known as the user equipment, though its internal operation is very different from before.


  • Figure 1.3: Evolution of the system architecture from GSM and UMTS to LTE.

    The new architecture was designed as part of two 3GPP work items, namely system architecture evolution (SAE), which covered the core network, and long term evolution (LTE), which covered the radio access network, air interface and mobile. Officially, the whole system is known as the evolved packet system (EPS), while the acronym LTE refers only to the evolution of the air interface. Despite this official usage, LTE has become a colloquial name for the whole system, and is regularly used in this way by 3GPP.

    1.2.2 Long Term Evolution

    The main output of the study into long-term evolution was a requirements specification for the air interface [6], in which the most important requirements were as follows.

    LTE was required to deliver a peak data rate of 100 Mbps in the downlink and 50 Mbps in the uplink. This requirement was exceeded in the eventual system, which delivers peak data rates of 300 Mbps and 75 Mbps respectively. For comparison, the peak data rate of WCDMA, in Release 6 of the 3GPP specifications, is 14 Mbps in the downlink and 5.7 Mbps in the uplink.

    It cannot be stressed too strongly, however, that these peak data rates can only be reached in idealized conditions, and are wholly unachievable in any realistic scenario. A better measure is the spectral efficiency, which expresses the typical capacity of one cell per unit bandwidth. LTE was required to support a spectral efficiency three to four times greater than that of Release 6 WCDMA in the downlink and two to three times greater in the uplink.

    Latency is another important issue, particularly for time-critical applications such as voice and interactive games. There are two aspects to this. Firstly, the requirements state that the time taken for data to travel between the mobile phone and the fixed network should be less than five milliseconds, provided that the air interface is uncongested. Secondly, mobile phones can operate in two states: an active state in which they are communicating with the network and a low-power standby state. The requirements state that a phone should switch from standby to the active state, after an intervention from the user, in less than 100 milliseconds.

    There are also requirements on coverage and mobility. LTE is optimized for cell sizes up to 5 km, works with degraded performance up to 30 km and supports cell sizes of up to 100 km. It is also optimized for mobile speeds up to 15 km/h, works with high performance up to 120 km/h and supports speeds of up to 350 km/h. Finally, LTE is designed to work with a variety of different bandwidths, which range from 1.4 MHz up to a maximum of 20 MHz. Table 1.1 summarizes its key technical features, and compares them with those of WCDMA.

    Feature                      WCDMA                  LTE
    Multiple access scheme       WCDMA                  OFDMA and SC-FDMA
    Frequency re-use             100%                   Flexible
    Use of MIMO antennas         From Release 7         Yes
    Bandwidth                    5 MHz                  1.4, 3, 5, 10, 15 or 20 MHz
    Frame duration               10 ms                  10 ms
    Transmission time interval   2 or 10 ms             1 ms
    Modes of operation           FDD and TDD            FDD and TDD
    Uplink timing advance        Not required           Required
    Transport channels           Dedicated and shared   Shared
    Uplink power control         Fast                   Slow

    Table 1.1: Key features of the air interfaces of WCDMA and LTE
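
    As a rough cross-check of the figures above, the peak data rates quoted in Section 1.2.2 can be divided by the channel bandwidths in Table 1.1. The sketch below assumes that the 300/75 Mbps LTE peaks apply to the 20 MHz bandwidth and that the Release 6 WCDMA peaks apply to a 5 MHz carrier; the function name is only illustrative. Note that this peak rate per hertz is not the cell spectral efficiency defined earlier, which describes typical rather than best-case capacity.

```python
def peak_rate_per_hz(peak_rate_mbps, bandwidth_mhz):
    """Peak data rate divided by channel bandwidth, in bit/s per Hz."""
    return peak_rate_mbps / bandwidth_mhz


# (peak rate in Mbps, bandwidth in MHz); the pairing of rate and bandwidth
# is an assumption made for this illustration.
systems = {
    "LTE downlink": (300, 20),
    "LTE uplink": (75, 20),
    "WCDMA R6 downlink": (14, 5),
    "WCDMA R6 uplink": (5.7, 5),
}

for name, (rate, bandwidth) in systems.items():
    print(f"{name}: {peak_rate_per_hz(rate, bandwidth):.2f} bit/s/Hz")
```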

    1.3 System Architecture Evolution

    The main output of the study into system architecture evolution was a requirements specification for the fixed network, in which the most important requirements were as follows.

    The evolved packet core routes packets using the Internet Protocol (IP) and supports devices that are using IP version 4, IP version 6, or dual stack IP version 4/version 6. In addition, the EPC provides users with always-on connectivity to the outside world, by setting up a basic IP connection for a device when it switches on and maintaining that connection until it switches off. This is different from the behaviour of UMTS and GSM, in which the network only sets up an IP connection on request and tears that connection down when it is no longer required.

    The EPC is designed as a data pipe that simply transports information to and from the user: it is not concerned with the information content or with the application. This is similar to the behaviour of the internet, which transports packets that originate from any application software, but is different from that of a traditional telecommunication system, in which the voice application is an integral part of the system. Because of this, voice applications do not form part of LTE: instead, voice calls are controlled by some external entity such as the IP multimedia subsystem (IMS). The EPC simply transports the voice packets in the same way as any other data stream.

    Unlike the internet, the EPC contains mechanisms to specify and control the data rate, error rate and delay that a data stream will receive. There is no explicit requirement on the maximum time required for data to travel across the EPC, but the relevant specification suggests a user plane latency of 10 milliseconds for a non-roaming mobile, increasing to 50 milliseconds in a typical roaming scenario [8]. To calculate the total delay, we have to add the earlier figure for the delay across the air interface, giving a typical delay in a non-roaming scenario of around 20 milliseconds.

    The EPC is also required to support inter-system handovers between LTE and earlier 2G and 3G technologies. These cover not only UMTS and GSM, but also non-3GPP systems such as cdma2000 and WiMAX. Tables 1.2 and 1.3 summarize the key features of the radio access network and the evolved packet core, and compare them with the corresponding features of UMTS.

    Feature                           UMTS                             LTE
    Radio access network components   Node B, RNC                      eNB
    RRC protocol states               CELL_DCH, CELL_FACH, CELL_PCH,   RRC_IDLE, RRC_CONNECTED
                                      URA_PCH, RRC_IDLE
    Handovers                         Soft and hard                    Hard
    Neighbour lists                   Always required                  Not required

    Table 1.2: Key features of the radio access networks of UMTS and LTE

    Feature                 UMTS                         LTE
    IP version support      IPv4 and IPv6                IPv4 and IPv6
    USIM version support    Release 99 USIM onwards      Release 99 USIM onwards
    Transport mechanisms    Circuit & packet switching   Packet switching
    CS domain components    MSC server, MGW              n/a
    PS domain components    SGSN, GGSN                   MME, S-GW, P-GW
    IP connectivity         After registration           During registration
    Voice and SMS           Included                     External

    Table 1.3: Key features of the core networks of UMTS and LTE


  • Bibliography

    [1] Christopher Cox. An Introduction to LTE. John Wiley & Sons Ltd, 2012.

    [2] Erik Dahlman, Stefan Parkvall, and Johan Sköld. 4G LTE/LTE-Advanced for Mobile Broadband. Elsevier Ltd., 2011.

    [3] Harri Holma and Antti Toskala. LTE for UMTS: OFDMA and SC-FDMA Based Radio Access. John Wiley & Sons, Ltd, 2009.


  • Chapter 2

    FPGA

    Field programmable gate arrays (FPGAs) are digital integrated circuits (ICs) that contain configurable (programmable) blocks of logic along with configurable interconnects between these blocks. Design engineers can configure (program) such devices to perform a tremendous variety of tasks.

    2.1 Key factors for describing FPGAs

    2.1.1 Fabrication process

    A more advanced fabrication process brings higher integration, and thus higher density and/or reduced chip size.

    2.1.2 Logic density

    The terminology used for expressing the logic density of FPGAs is quite confusing. The point is that we need a unit to express the logic capability of an FPGA; the problem is how to define this unit. Xilinx uses the term LC (Logic Cell) for the logic structure consisting of a 4-input look-up table (LUT), a D flip-flop and some additional circuitry. As new features are introduced into a logic block, its functionality increases and can no longer be expressed easily in terms of LCs.

    Xilinx uses the term CLB (Configurable Logic Block) to name the basic logic block of all its FPGAs. Each CLB has 8 LCs. But since these 8 LCs provide greater functionality than they would if they were separate, Xilinx now uses the unit ELC (Equivalent Logic Cell; 1 ELC = 1.125 LC) to state the complexity of its FPGAs. To complicate matters further, Xilinx introduced the term ASMBL (Advanced Silicon Modular Block, pronounced like "assemble") to describe the new feature-rich architecture of its Virtex-4 building blocks.

    2.1.3 Clock management

    Clock management comprises two basic functions:

    Removing clock skew and propagation delay

    All parts of a digital circuit need to be synchronized to a desired clock signal. If the circuit is large, complex, and operating at high frequencies, the clock propagation delay and clock skew have a great impact on its performance. Therefore, providing a clock signal with zero delay in all parts of an FPGA becomes crucial. Generally, this can be done using either DLLs (Delay-Locked Loops) or PLLs (Phase-Locked Loops). Both types of circuit yield the same result: they compensate for the delay generated on the routing network inside the FPGA, providing a zero-delay clock signal (with respect to a user source clock) to different parts of the FPGA.

    Clock generation and phase shifting

    Since the whole point of FPGAs lies in their configurability, having the option to make different parts of the same FPGA (called clock domains) work at different frequencies dramatically simplifies the design while improving performance. Clock multiplication gives the designer a number of design alternatives. For instance, a 50 MHz source clock multiplied 4X by the DLL/PLL can drive an FPGA design operating at 200 MHz. This technique can simplify board design because the clock path on the board no longer has to distribute such a high-speed signal.
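
    The 50 MHz to 200 MHz example can be written as a small calculation. This is a minimal sketch of an idealized synthesizer whose output is f_in × M / D; the function and parameter names are illustrative and do not correspond to any vendor primitive or tool setting.

```python
from fractions import Fraction


def synthesized_clock_hz(f_in_hz, multiply, divide=1):
    """Output frequency of an idealized clock synthesizer: f_in * M / D."""
    if multiply < 1 or divide < 1:
        raise ValueError("multiply and divide must be positive integers")
    return Fraction(f_in_hz) * multiply / divide


def period_ns(f_hz):
    """Clock period in nanoseconds for a frequency given in hertz."""
    return float(Fraction(10**9) / f_hz)


board_clock = 50_000_000                                        # 50 MHz source clock on the board
fabric_clock = synthesized_clock_hz(board_clock, multiply=4)    # 4X multiplication inside the FPGA
print(f"{float(fabric_clock) / 1e6:.0f} MHz, period {period_ns(fabric_clock):.1f} ns")
```

    Only the 50 MHz signal has to be routed on the board; the 200 MHz clock (5 ns period) exists only inside the device.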

    2.1.4 On-chip memory

    As FPGA applications grow in complexity, so does their need for memory. Using look-up tables as registers for storing data cannot provide enough space for serious applications, especially if these applications require numerous arithmetic computations and are time-sensitive. As this is often the case, external memory cannot deliver the desired efficiency either. This is why, with every new generation of FPGAs, more and more memory is embedded into the FPGA. The main advantages of embedded (built-in) memory are:

    Short access time

    High bandwidth

    Great versatility: the embedded memory can behave like various memory forms and implement some of the most commonly used memory functions, including RAM (synchronous/asynchronous), ROM, FIFO, buffers, cache, shift registers, etc.

    2.1.5 DSP capabilities

    The majority of FPGA applications require some sort of Digital Signal Processing (DSP). DSP requires many computations to take place in short periods of time. To reduce the time these computations take and to increase efficiency, computations are executed in parallel and pipelined. FPGAs are ideal for implementing this mode of DSP, thanks to their adaptable structure. Over the years, FPGA manufacturers have developed special DSP units to help designers fully exploit the FPGA's possibilities. These units are designed to optimize execution of the most commonly used DSP algorithms (filtering, compression, encoding/decoding, equalization, digital conversion, FFT, modulation, etc.). They usually contain a large number of multipliers (in parallel), accumulators, shift registers and adders.
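
    To make the role of the multipliers and accumulators concrete, the sketch below models a direct-form FIR filter in software. It is only an illustration of the multiply-accumulate pattern, not a description of any particular DSP block: in an FPGA each product in the inner list could be computed by its own multiplier in the same clock cycle, whereas the Python loop evaluates them one after another. The function name is illustrative.

```python
def fir_filter(samples, coefficients):
    """Direct-form FIR filter: each output is a sum of products (MAC operations)."""
    taps = len(coefficients)
    history = [0] * taps  # delay line, newest sample first
    outputs = []
    for sample in samples:
        # Shift the new sample into the delay line (a shift register in hardware).
        history = [sample] + history[:-1]
        # One product per tap; in hardware these could run in parallel.
        products = [c * x for c, x in zip(coefficients, history)]
        # The adder/accumulator stage sums the products.
        outputs.append(sum(products))
    return outputs


print(fir_filter([1, 2, 3, 4], [1, 1]))  # two-tap moving sum
```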


  • 2.2 Virtex-5 FPGA Features

    2.2.1 Summary of Virtex-5 FPGA Features

    - Cross-platform compatibility
    - Most advanced, high-performance, optimal-utilization, FPGA fabric
        - Real 6-input look-up table (LUT) technology
        - Dual 5-LUT option
        - Improved reduced-hop routing
        - 64-bit distributed RAM option
        - SRL32/Dual SRL16 option
    - Powerful clock management tile (CMT) clocking
        - Digital Clock Manager (DCM) blocks for zero delay buffering, frequency synthesis, and clock phase shifting
        - PLL blocks for input jitter filtering, zero delay buffering, frequency synthesis, and phase-matched clock division
    - 36-Kbit block RAM/FIFOs
        - True dual-port RAM blocks
        - Enhanced optional programmable FIFO logic
        - Programmable
            - True dual-port widths up to x36
            - Simple dual-port widths up to x72
        - Built-in optional error-correction circuitry
        - Optionally program each block as two independent 18-Kbit blocks
    - High-performance parallel SelectIO technology
        - 1.2 to 3.3V I/O operation
        - Source-synchronous interfacing using ChipSync technology
        - Digitally-controlled impedance (DCI) active termination
        - Flexible fine-grained I/O banking
        - High-speed memory interface support
    - Advanced DSP48E slices
        - 25 x 18, two's complement, multiplication
        - Optional adder, subtracter, and accumulator
        - Optional pipelining
        - Optional bitwise logical functionality
        - Dedicated cascade connections
    - Flexible configuration options
        - SPI and Parallel FLASH interface
        - Multi-bitstream support with dedicated fallback reconfiguration logic
        - Auto bus width detection capability
    - System Monitoring capability on all devices
        - On-chip/Off-chip thermal monitoring
        - On-chip/Off-chip power supply monitoring
        - JTAG access to all monitored quantities
    - Integrated Endpoint blocks for PCI Express designs
        - LXT, SXT, TXT, and FXT Platforms
        - Compliant with the PCI Express Base Specification 1.1
        - x1, x4, or x8 lane support per block
        - Works in conjunction with RocketIO transceivers
    - Tri-mode 10/100/1000 Mb/s Ethernet MACs
        - RocketIO transceivers can be used as PHY or connect to external PHY using many soft MII (Media Independent Interface) options
    - RocketIO GTP transceivers 100 Mb/s to 3.75 Gb/s
        - LXT and SXT Platforms
    - RocketIO GTX transceivers 150 Mb/s to 6.5 Gb/s
        - TXT and FXT Platforms
    - PowerPC 440 Microprocessors
        - FXT Platform only
        - RISC architecture
        - 7-stage pipeline
        - 32-Kbyte instruction and data caches included
        - Optimized processor interface structure (crossbar)
    - 65-nm copper CMOS process technology
    - 1.0V core voltage
    - High signal-integrity flip-chip packaging available in standard or Pb-free package options


  • Notes:

    1. Virtex-5 FPGA slices are organized differently from previous generations. Each Virtex-5 FPGA slice contains four LUTs and four flip-flops (previously it was two LUTs and two flip-flops).

    2. Each DSP48E slice contains a 25 x 18 multiplier, an adder, and an accumulator.

    3. Block RAMs are fundamentally 36 Kbits in size. Each block can also be used as two independent 18-Kbit blocks.

    4. Each Clock Management Tile (CMT) contains two DCMs and one PLL.

    5. This table lists separate Ethernet MACs per device.

    6. RocketIO GTP transceivers are designed to run from 100 Mb/s to 3.75 Gb/s. RocketIO GTX transceivers are designed to run from 150 Mb/s to 6.5 Gb/s.

    7. This number does not include RocketIO transceivers.

    8. Includes configuration Bank 0.


  • 2.2.2 Virtex-5 FPGA Logic

    - On average, one to two speed grade improvement over Virtex-4 devices
    - Cascadable 32-bit variable shift registers or 64-bit distributed memory capability
    - Superior routing architecture with enhanced diagonal routing supports block-to-block connectivity with minimal hops
    - Up to 330,000 logic cells including:
        - Up to 207,360 internal fabric flip-flops with clock enable (XC5VLX330)
        - Up to 207,360 real 6-input look-up tables (LUTs) with greater than 13 million total LUT bits
    - Two outputs for dual 5-LUT mode give enhanced utilization
    - Logic expanding multiplexers and I/O registers
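
    Functionally, a 6-input LUT is a 64-entry truth table addressed by its six inputs, which is where the "greater than 13 million total LUT bits" figure comes from. The sketch below is a software model only; the helper names are illustrative, and the packing of the table into an integer is an assumption made for the example rather than the device's actual configuration format.

```python
LUT_INPUTS = 6
LUT_BITS = 1 << LUT_INPUTS  # 64 truth-table bits per 6-input LUT


def build_lut(func):
    """Pack the truth table of a 6-input Boolean function into a 64-bit integer."""
    table = 0
    for address in range(LUT_BITS):
        inputs = [(address >> i) & 1 for i in range(LUT_INPUTS)]
        if func(*inputs):
            table |= 1 << address
    return table


def read_lut(table, inputs):
    """Evaluate the LUT: the six input bits form the address of one table bit."""
    address = sum(bit << i for i, bit in enumerate(inputs))
    return (table >> address) & 1


# Any function of six inputs fits in one LUT; a 6-input AND is used as an example.
and6 = build_lut(lambda a, b, c, d, e, f: a & b & c & d & e & f)
print(read_lut(and6, [1, 1, 1, 1, 1, 1]), read_lut(and6, [1, 1, 1, 1, 1, 0]))

# Total truth-table bits for the largest device quoted above.
total_lut_bits = 207_360 * LUT_BITS
print(total_lut_bits)
```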

    2.2.3 550 MHz Clock Technology

    - Up to six Clock Management Tiles (CMTs)
        - Each CMT contains two DCMs and one PLL, for up to eighteen total clock generators
        - Flexible DCM-to-PLL or PLL-to-DCM cascade
        - Precision clock deskew and phase shift
        - Flexible frequency synthesis
        - Multiple operating modes to ease performance trade-off decisions
        - Improved maximum input/output frequency
        - Fine-grained phase shifting resolution
        - Input jitter filtering
        - Low-power operation
        - Wide phase shift range
    - Differential clock tree structure for optimized low-jitter clocking and precise duty cycle
    - 32 global clock networks
    - Regional, I/O, and local clocks in addition to global clocks

    2.2.4 SelectIO Technology

    - Up to 1,200 user I/Os
    - Wide selection of I/O standards from 1.2V to 3.3V
    - Extremely high-performance
        - Up to 800 Mb/s HSTL and SSTL (on all single-ended I/Os)
        - Up to 1.25 Gb/s LVDS (on all differential I/O pairs)
    - True differential termination on-chip
    - Same edge capture at input and output I/Os
    - Extensive memory interface support

    2.2.5 550 MHz Integrated Block Memory

    - Up to 16.4 Mbits of integrated block memory
    - 36-Kbit blocks with optional dual 18-Kbit mode
    - True dual-port RAM cells
    - Independent port width selection (x1 to x72)
        - Up to x36 total per port for true dual port operation
        - Up to x72 total per port for simple dual port operation (one Read port and one Write port)
        - Memory bits plus parity/sideband memory support for x9, x18, x36, and x72 widths
        - Configurations from 32K x 1 to 512 x 72 (8K x 4 to 512 x 72 for FIFO operation)
    - Multirate FIFO support logic
        - Full and Empty flag with fully programmable Almost Full and Almost Empty flags
    - Synchronous FIFO support without Flag uncertainty
    - Optional pipeline stages for higher performance
    - Byte-write capability
    - Dedicated cascade routing to form 64K x 1 memory without using FPGA routing
    - Integrated optional ECC for high-reliability memory requirements
    - Special reduced-power design for 18 Kbit (and below)
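
    The quoted block RAM configurations can be checked with a little arithmetic. The sketch below assumes that each 36-Kbit block holds 32 Kbit of data plus 4 Kbit of parity/sideband bits, and that the parity bits are only reachable at widths that are multiples of 9; this split is inferred from the 32K x 1 and 512 x 72 figures above, and the function name is illustrative.

```python
DATA_BITS_PER_BLOCK = 32 * 1024    # data bits in one 36-Kbit block (assumed split)
TOTAL_BITS_PER_BLOCK = 36 * 1024   # data plus parity/sideband bits


def bram_depth(port_width):
    """Depth of one block for a given port width (for example x1, x4, x9, x18, x36, x72)."""
    # Widths that are multiples of 9 carry one parity/sideband bit per 8 data bits.
    data_width = port_width * 8 // 9 if port_width % 9 == 0 else port_width
    return DATA_BITS_PER_BLOCK // data_width


for width in (1, 4, 9, 18, 36, 72):
    depth = bram_depth(width)
    print(f"{depth} x {width} = {depth * width} bits")
```

    Under these assumptions the narrow x1 and x4 configurations use only the 32 Kbit of data storage, while the x9 to x72 configurations use all 36 Kbit.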

    2.2.6 550 MHz DSP48E Slices

    - 25 x 18 two's complement multiplication
    - Optional pipeline stages for enhanced performance
    - Optional 48-bit accumulator for multiply accumulate (MACC) operation with optional accumulator cascade to 96 bits
    - Integrated adder for complex-multiply or multiply-add operation
    - Optional bitwise logical operation modes
    - Independent C registers per slice
    - Fully cascadable in a DSP column without external routing resources
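
    The 25 x 18 multiplier and 48-bit accumulator can be modelled in a few lines to show why the accumulator is wider than the product. This is a behavioural sketch under simplifying assumptions (no pipeline registers, no cascade, wrap-around on overflow); the function names are illustrative and do not correspond to any slice port or attribute.

```python
def to_signed(value, bits):
    """Wrap an integer into the two's complement range of the given width."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def macc(pairs, a_bits=25, b_bits=18, acc_bits=48):
    """Multiply-accumulate a sequence of (a, b) operand pairs."""
    acc = 0
    for a, b in pairs:
        a = to_signed(a, a_bits)   # 25-bit signed operand
        b = to_signed(b, b_bits)   # 18-bit signed operand
        acc = to_signed(acc + a * b, acc_bits)  # 48-bit accumulator wraps on overflow
    return acc


print(macc([(3, 4), (-2, 5)]))  # 3*4 + (-2)*5

# A 25 x 18 signed product needs at most 25 + 18 = 43 bits, so a 48-bit
# accumulator leaves 5 guard bits for summing many products.
guard_bits = 48 - (25 + 18)
print(guard_bits)
```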

    2.2.7 Digitally Controlled Impedance (DCI) Active I/O Termination

    - Optional series or parallel termination
    - Temperature and voltage compensation
    - Makes board layout much easier
        - Reduces resistors
        - Places termination in the ideal location, at the signal

    2.2.8 Advanced Flip-Chip Packaging

    - Pre-engineered packaging technology for proven superior signal integrity
        - Minimized inductive loops from signal to return
        - Optimal signal-to-PWR/GND ratios
    - Reduces SSO induced noise by up to 7x
    - Pb-Free and standard packages

    2.2.9 System Monitor

    - On-Chip temperature measurement (±4°C)
    - On-Chip power supply measurement (±1%)
    - Easy to use, self-contained
        - No design required for basic operation
        - Autonomous monitoring of all on-chip sensors
        - User programmable alarm thresholds for on-chip sensors
    - User accessible 10-bit 200kSPS ADC
        - Automatic calibration of offset and gain error
        - DNL = 0.9 LSBs maximum
    - Up to 17 external analog input channels supported
        - 0V to 1V input range
        - Monitor external sensors e.g., voltage, temperature
        - General purpose analog inputs
    - Full access from fabric or JTAG TAP to System Monitor
    - Fully operational prior to FPGA configuration and during device power down (access via JTAG TAP only)
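
    For the external analog inputs, the 10-bit resolution and 0V to 1V range fix the size of one ADC step. The sketch below assumes an ideal linear, unipolar transfer function (code 0 at 0 V, 1024 steps across 1 V); the real converter's transfer function and calibration are not described in this chapter, and the function name is illustrative.

```python
ADC_BITS = 10
FULL_SCALE_VOLTS = 1.0  # 0 V to 1 V external input range


def adc_code_to_volts(code):
    """Map a 10-bit ADC code to volts, assuming an ideal linear unipolar transfer function."""
    if not 0 <= code < (1 << ADC_BITS):
        raise ValueError("code out of range for a 10-bit ADC")
    return code * FULL_SCALE_VOLTS / (1 << ADC_BITS)


lsb_mv = adc_code_to_volts(1) * 1000
print(f"1 LSB = {lsb_mv:.3f} mV, code 512 = {adc_code_to_volts(512):.3f} V")
```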

    2.2.10 65-nm Copper CMOS Process

    - 1.0V Core Voltage
    - 12-layer metal provides maximum routing capability and accommodates hard-IP immersion
    - Triple-oxide technology for proven reduced static power consumption

    2.2.11 Tri-Mode Ethernet Media Access Controller

    - Designed to the IEEE 802.3-2002 specification
    - Operates at 10, 100, and 1,000 Mb/s
    - Supports tri-mode auto-negotiation
    - Receive address filter (5 address entries)
    - Fully monolithic 1000Base-X solution with RocketIO GTP transceivers
    - Supports multiple external PHY connections (RGMII, GMII, etc.) interfaces through soft logic and SelectIO resources
    - Supports connection to external PHY device through SGMII using soft logic and RocketIO GTP transceivers
    - Receive and transmit statistics available through separate interface
    - Separate host and client interfaces
    - Support for jumbo frames
    - Support for VLAN
    - Flexible, user-configurable host interface
    - Supports IEEE 802.3ah-2004 unidirectional mode

    2.2.12 RocketIO GTP Transceivers (LXT/SXT only)

    - Full-duplex serial transceiver capable of 100 Mb/s to 3.75 Gb/s baud rates
    - 8B/10B, user-defined FPGA logic, or no encoding options
    - Channel bonding support
    - CRC generation and checking
    - Programmable pre-emphasis or pre-equalization for the transmitter
    - Programmable termination and voltage swing
    - Programmable equalization for the receiver
    - Receiver signal detect and loss of signal indicator
    - User dynamic reconfiguration using secondary configuration bus
    - Out of Band (OOB) support for Serial ATA (SATA)
    - Electrical idle, beaconing, receiver detection, and PCI Express and SATA spread-spectrum clocking support
    - Less than 100 mW typical power consumption
    - Built-in PRBS Generators and Checkers
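
    When 8B/10B encoding is selected, every 8 data bits are sent as 10 line bits, so the usable data rate is 80% of the line rate. The sketch below applies this to the two ends of the quoted GTP range; the function name and the encoding labels are illustrative, and protocol overhead above the line code is ignored.

```python
def payload_rate_bps(line_rate_bps, encoding="8b10b"):
    """Usable data rate of a serial link for a given line rate and encoding."""
    if encoding == "8b10b":
        return line_rate_bps * 8 // 10  # 8 data bits carried per 10 line bits
    if encoding == "none":
        return line_rate_bps            # no line-code overhead
    raise ValueError(f"unknown encoding: {encoding}")


for line_rate in (100_000_000, 3_750_000_000):
    print(line_rate, "->", payload_rate_bps(line_rate))
```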

    2.3 Architectural Description

    2.3.1 Virtex-5 FPGA Array Overview

    Virtex-5 devices are user-programmable gate arrays with various configurable elements and embedded cores optimized for high-density and high-performance system designs. Virtex-5 devices implement the following functionality:

    - I/O blocks provide the interface between package pins and the internal configurable logic. Most popular and leading-edge I/O standards are supported by programmable I/O blocks (IOBs). The IOBs can be connected to very flexible ChipSync logic for enhanced source-synchronous interfacing. Source-synchronous optimizations include per-bit deskew (on both input and output signals), data serializers/deserializers, clock dividers, and dedicated I/O and local clocking resources.

    - Configurable Logic Blocks (CLBs), the basic logic elements for Xilinx FPGAs, provide combinatorial and synchronous logic as well as distributed memory and SRL32 shift register capability. Virtex-5 FPGA CLBs are based on real 6-input look-up table technology and provide superior capabilities and performance compared to previous generations of programmable logic.

    - Block RAM modules provide flexible 36 Kbit true dual-port RAM that is cascadable to form larger memory blocks. In addition, Virtex-5 FPGA block RAMs contain optional programmable FIFO logic for increased device utilization. Each block RAM can also be configured as two independent 18 Kbit true dual-port RAM blocks, providing memory granularity for designs needing smaller RAM blocks.

    - Cascadable embedded DSP48E slices with 25 x 18 two's complement multipliers and 48-bit adder/subtracter/accumulator provide massively parallel DSP algorithm support. In addition, each DSP48E slice can be used to perform bitwise logical functions.

    - Clock Management Tile (CMT) blocks provide the most flexible, highest-performance clocking for FPGAs. Each CMT contains two Digital Clock Manager (DCM) blocks (self-calibrating, fully digital), and one PLL block (self-calibrating, analog) for clock distribution delay compensation, clock multiplication/division, coarse-/fine-grained clock phase shifting, and input clock jitter filtering.

    Additionally, LXT, SXT, TXT, and FXT devices also contain:

    - Integrated Endpoint blocks for PCI Express designs providing x1, x4, or x8 PCI Express Endpoint functionality. When used in conjunction with RocketIO transceivers, a complete PCI Express Endpoint can be implemented with minimal FPGA logic utilization.

    - 10/100/1000 Mb/s Ethernet media-access control blocks offer Ethernet capability.

    LXT and SXT devices contain:

    - RocketIO GTP transceivers capable of running up to 3.75 Gb/s. Each GTP transceiver supports full-duplex, clock-and-data recovery.

    TXT and FXT devices contain:

    GTX transceivers capable of running up to 6.5 Gb/s. Each GTX transceiver supports full-duplex, clock-anddata recovery. FXT devices contain:

    Embedded IBM PowerPC 440 RISC CPUs. Each PowerPC 440 CPU is capable of run-ning up to 550 MHz. Each PowerPC 440 CPU also has an APU (Auxiliary Processor Unit)interface that supports hardware acceleration, and an integrated cross-bar for high datathroughput.

    general routing matrix (GRM) provides an array of routing switches between each in-ternal component. Each programmable element is tied to a switch matrix, allowing multipleconnections to the general routing matrix. The overall programmable interconnection ishierarchical and designed to support high-speed designs. In Virtex-5 devices, the routingconnections are optimized to support CLB interconnection in the fewest number of hops.Reducing hops greatly increases post place-and-route (PAR) design performance. All pro-grammable elements, including the routing resources, are controlled by values stored instatic storage elements. These values are loaded into the FPGA during configuration andcan be reloaded to change the functions of the programmable elements.

    2.3.2 Virtex-5 FPGA Features

    This section briefly describes the features of the Virtex-5 family of FPGAs.

    2.3.3 Input/Output Blocks (SelectIO)

    IOBs are programmable and can be categorized as follows:

    Programmable single-ended or differential (LVDS) operation

    Input block with an optional single data rate (SDR) or double data rate (DDR) register

    Output block with an optional SDR or DDR register

    Bidirectional block

    Per-bit deskew circuitry

    Dedicated I/O and regional clocking resources

    Built-in data serializer/deserializer

    The IOB registers are either edge-triggered D-type flip-flops or level-sensitive latches. IOBs support the following single-ended standards:

    LVTTL

    LVCMOS (3.3V, 2.5V, 1.8V, 1.5V, and 1.2V)

    PCI (33 and 66 MHz)

    PCI-X

    GTL and GTLP

    HSTL 1.5V and 1.8V (Class I, II, III, and IV)

    HSTL 1.2V (Class I)

    SSTL 1.8V and 2.5V (Class I and II)

    The Digitally Controlled Impedance (DCI) I/O feature can be configured to provide on-chip termination for each single-ended I/O standard and some differential I/O standards. The IOB elements also support the following differential signaling I/O standards:

    LVDS and Extended LVDS (2.5V only)

    BLVDS (Bus LVDS)

    ULVDS

    Hypertransport

    Differential HSTL 1.5V and 1.8V (Class I and II)

    Differential SSTL 1.8V and 2.5V (Class I and II)

    RSDS (2.5V point-to-point)

    Two adjacent pads are used for each differential pair. Two or four IOB blocks connect to one switch matrix to access the routing resources. Per-bit deskew circuitry allows for programmable signal delay internal to the FPGA. Per-bit deskew flexibly provides fine-grained increments of delay to carefully produce a range of signal delays. This is especially useful for synchronizing signal edges in source-synchronous interfaces. General purpose I/O in select locations (eight per bank) are designed to be regional clock capable I/O by adding special hardware connections for I/O in the same locality. These regional clock inputs are distributed within a limited region to minimize clock skew between IOBs. Regional I/O clocking supplements the global clocking resources. Data serializer/deserializer capability is added to every I/O to support source-synchronous interfaces. A serial-to-parallel converter with associated clock divider is included in the input path, and a parallel-to-serial converter in the output path. An in-depth guide to the Virtex-5 FPGA IOB is found in the Virtex-5 FPGA Tri-Mode Ethernet MAC User Guide.


  • 2.3.4 Configurable Logic Blocks (CLBs)

    A Virtex-5 FPGA CLB resource is made up of two slices. Each slice is equivalent and contains:

    Four function generators

    Four storage elements

    Arithmetic logic gates

    Large multiplexers

    Fast carry look-ahead chain

    The function generators are configurable as 6-input LUTs or dual-output 5-input LUTs. SLICEMs in some CLBs can be configured to operate as 32-bit shift registers (or 16-bit x 2 shift registers) or as 64-bit distributed RAM. In addition, the four storage elements can be configured as either edge-triggered D-type flip-flops or level-sensitive latches. Each CLB has internal fast interconnect and connects to a switch matrix to access general routing resources. The Virtex-5 FPGA CLBs are further discussed in the Virtex-5 FPGA User Guide.

    2.3.5 Block RAM

    The 36 Kbit true dual-port RAM block resources are programmable from 32K x 1 to 512 x 72, in various depth and width configurations. In addition, each 36 Kbit block can also be configured to operate as two independent 18 Kbit dual-port RAM blocks. Each port is totally synchronous and independent, offering three read-during-write modes. Block RAM is cascadable to implement large embedded storage blocks. Additionally, back-end pipeline registers, clock control circuitry, built-in FIFO support, ECC, and byte write enable features are also provided as options. The block RAM feature in Virtex-5 devices is further discussed in the Virtex-5 FPGA User Guide.
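    As an illustration of the depth and width range quoted above, the following Python sketch enumerates the aspect ratios of one 36 Kbit block. The intermediate widths (2, 4, 9, 18, 36) and the rule that widths of 9 bits and above carry one parity bit per byte are assumptions based on the usual Xilinx data-width convention, not statements from this text; `bram36_configs` is a hypothetical helper name.

```python
# Assumed port widths of a 36 Kbit block RAM, from 32K x 1 to 512 x 72.
BRAM36_WIDTHS = [1, 2, 4, 9, 18, 36, 72]


def bram36_configs():
    """Return (depth, width, total_bits) for each assumed aspect ratio."""
    configs = []
    for width in BRAM36_WIDTHS:
        # Assumption: widths >= 9 include one parity bit per data byte.
        data_width = width if width < 9 else width * 8 // 9
        depth = 32 * 1024 // data_width  # 32 Kbit of data per block
        configs.append((depth, width, depth * width))
    return configs


if __name__ == "__main__":
    for depth, width, bits in bram36_configs():
        print(f"{depth:6d} x {width:2d} = {bits} bits")
```

    Under these assumptions the narrow configurations use 32 Kbit of storage, and only the parity-carrying widths reach the full 36 Kbit.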

    2.3.6 Global Clocking

    The CMTs and global-clock multiplexer buffers provide a complete solution for designing high-speed clock networks. Each CMT contains two DCMs and one PLL. The DCMs and PLLs can be used independently or extensively cascaded. Up to six CMT blocks are available, providing up to eighteen total clock generator elements. Each DCM provides familiar clock generation capability. To generate deskewed internal or external clocks, each DCM can be used to eliminate clock distribution delay. The DCM also provides 90°, 180°, and 270° phase-shifted versions of the output clocks. Fine-grained phase shifting offers higher-resolution phase adjustment with fraction of the clock period increments. Flexible frequency synthesis provides a clock output frequency equal to a fractional or integer multiple of the input clock frequency. To augment the DCM capability, Virtex-5 FPGA CMTs also contain a PLL. This block provides reference clock jitter filtering and further frequency synthesis options. Virtex-5 devices have 32 global-clock MUX buffers. The clock tree is designed to be differential. Differential clocking helps reduce jitter and duty cycle distortion.
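    The arithmetic behind this paragraph can be sketched in a few lines of Python. The relation f_out = f_in x M / D is the generic form of fractional frequency synthesis; the function names, the 100 MHz input, and the M = 7, D = 2 values below are illustrative assumptions, and no attempt is made to model the legal multiplier and divider ranges of a real DCM or PLL.

```python
from fractions import Fraction


def synthesized_clock_mhz(f_in_mhz, multiply, divide):
    """Generic frequency synthesis: f_out = f_in * M / D."""
    return Fraction(f_in_mhz) * Fraction(multiply, divide)


def clock_generators(cmt_blocks, dcms_per_cmt=2, plls_per_cmt=1):
    """Total clock generator elements for a given number of CMTs."""
    return cmt_blocks * (dcms_per_cmt + plls_per_cmt)


def phase_shift_ns(f_mhz, degrees):
    """Delay in nanoseconds corresponding to a phase shift in degrees."""
    period_ns = Fraction(1000) / Fraction(f_mhz)
    return period_ns * Fraction(degrees, 360)


if __name__ == "__main__":
    print(synthesized_clock_mhz(100, 7, 2))  # hypothetical 100 MHz input
    print(clock_generators(6))               # six CMTs, as in the text
    print(phase_shift_ns(100, 90))           # quarter-period shift
```

    With six CMTs the count comes out to the eighteen clock generator elements quoted above.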


  • 2.3.7 DSP48E Slices

    DSP48E slice resources contain a 25 x 18 twos complement multiplier and a 48-bit adder/subtracter/accumulator. Each DSP48E slice also contains extensive cascade capability to efficiently implement high-speed DSP algorithms. The Virtex-5 FPGA DSP48E slice features are further discussed in Virtex-5 FPGA XtremeDSP Design Considerations.
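    To make the operand widths concrete, here is a minimal behavioural sketch in Python of one multiply-accumulate step with a 25-bit and an 18-bit signed operand and a 48-bit accumulator. It is only a numerical model under the assumption that the accumulator wraps around in two's complement; it ignores pipelining, cascading, and every other DSP48E mode, and `dsp48e_mac` is a hypothetical name.

```python
def to_signed(value, bits):
    """Wrap an integer into the two's complement range of `bits` bits."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def dsp48e_mac(acc, a, b):
    """One multiply-accumulate step: acc + a * b, wrapped to 48 bits.

    a must fit in 25 signed bits and b in 18 signed bits.
    """
    if not -(1 << 24) <= a < (1 << 24):
        raise ValueError("a does not fit in 25 signed bits")
    if not -(1 << 17) <= b < (1 << 17):
        raise ValueError("b does not fit in 18 signed bits")
    return to_signed(acc + a * b, 48)


if __name__ == "__main__":
    acc = 0
    for a, b in [(3, 4), (-5, 6), (7, -8)]:
        acc = dsp48e_mac(acc, a, b)
    print(acc)  # 12 - 30 - 56
```

    The widest product, 25 + 18 = 43 bits, leaves five guard bits in the 48-bit accumulator, which is why long sums of products can be accumulated before any overflow handling is needed.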

    2.3.8 Routing Resources

    All components in Virtex-5 devices use the same interconnect scheme and the same access to the global routing matrix. In addition, the CLB-to-CLB routing is designed to offer a complete set of connectivity in as few hops as possible. Timing models are shared, greatly improving the predictability of the performance for high-speed designs.

    2.3.9 Boundary Scan

    Boundary-Scan instructions and associated data registers support a standard methodology for accessing and configuring Virtex-5 devices, complying with IEEE standards 1149.1 and 1532.

    2.3.10 Configuration

    Virtex-5 devices are configured by loading the bitstream into internal configuration memory using one of the following modes:

    Slave-serial mode

    Master-serial mode

    Slave SelectMAP mode

    Master SelectMAP mode

    Boundary-Scan mode (IEEE-1532 and -1149)

    SPI mode (Serial Peripheral Interface standard Flash)

    BPI-up/BPI-down modes (Byte-wide Peripheral interface standard x8 or x16 NOR Flash)

    In addition, Virtex-5 devices also support the following configuration options:

    256-bit AES bitstream decryption for IP protection

    Multi-bitstream management (MBM) for cold/warm boot support

    Parallel configuration bus width auto-detection

    Parallel daisy chain

    Configuration CRC and ECC support for the most robust, flexible device integrity checking

    Virtex-5 device configuration is further discussed in the Virtex-5 FPGA Configuration Guide.


  • 2.3.11 System Monitor

    FPGAs are an important building block in high availability/reliability infrastructure. Therefore, there is need to better monitor the on-chip physical environment of the FPGA and its immediate surroundings within the system. For the first time, the Virtex-5 family System Monitor facilitates easier monitoring of the FPGA and its external environment. Every member of the Virtex-5 family contains a System Monitor block. The System Monitor is built around a 10-bit 200 kSPS ADC (Analog-to-Digital Converter). This ADC is used to digitize a number of on-chip sensors to provide information about the physical environment within the FPGA. On-chip sensors include a temperature sensor and power supply sensors. Access to the external environment is provided via a number of external analog input channels. These analog inputs are general purpose and can be used to digitize a wide variety of voltage signal types. Support for unipolar, bipolar, and true differential input schemes is provided. There is full access to the on-chip sensors and external channels via the JTAG TAP, allowing the existing JTAG infrastructure on the PC board to be used for analog test and advanced diagnostics during development or after deployment in the field. The System Monitor is fully operational after power up and before configuration of the FPGA. System Monitor does not require an explicit instantiation in a design to gain access to its basic functionality. This allows the System Monitor to be used even at a late stage in the design cycle. The Virtex-5 FPGA System Monitor is further discussed in the Virtex-5 FPGA System Monitor User Guide.
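    To give a feel for what a 10-bit, 200 kSPS converter means numerically, the Python sketch below models an ideal unipolar quantizer. The 1.0 V full-scale value, the truncating quantizer, and the function names are assumptions made for illustration only; the real transfer functions of the on-chip sensors are defined in the System Monitor User Guide and are not reproduced here.

```python
def adc_code(voltage, full_scale=1.0, bits=10):
    """Ideal unipolar quantizer: map a voltage to an output code.

    full_scale is an assumed reference value, not a datasheet figure.
    Out-of-range inputs are clamped to the first or last code.
    """
    levels = 1 << bits
    code = int(voltage / full_scale * levels)
    return max(0, min(levels - 1, code))


def code_to_voltage(code, full_scale=1.0, bits=10):
    """Lower edge of the voltage interval represented by a code."""
    return code * full_scale / (1 << bits)


def sample_period_us(rate_ksps=200):
    """Time between conversions at the given sampling rate."""
    return 1000.0 / rate_ksps


if __name__ == "__main__":
    print(adc_code(0.5), code_to_voltage(512), sample_period_us())
```

    A 10-bit converter distinguishes 1024 levels, and at 200 kSPS a new sample is available every 5 microseconds.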

    2.3.12 Virtex-5 LXT, SXT, TXT, and FXT Platform Features

    This section briefly describes blocks available only in LXT, SXT, TXT, and FXT devices.

    2.3.13 Tri-Mode (10/100/1000 Mb/s) Ethernet MACs

    Virtex-5 LXT, SXT, TXT, and FXT devices contain up to eight embedded Ethernet MACs, two per Ethernet MAC block. The blocks have the following characteristics:

    Designed to the IEEE 802.3-2002 specification

    UNH-compliance tested

    RGMII/GMII Interface with SelectIO or SGMII interface when used with RocketIO transceivers

    Half or full duplex

    Supports Jumbo frames

    1000 Base-X PCS/PMA: When used with RocketIO GTP transceiver, can provide complete 1000 Base-X implementation on-chip

    DCR-bus connection to microprocessors


  • 2.3.14 Integrated Endpoint Blocks for PCI Express

    Virtex-5 LXT, SXT, TXT, and FXT devices contain up to four integrated Endpoint blocks. These blocks implement Transaction Layer, Data Link Layer, and Physical Layer functions to provide complete PCI Express Endpoint functionality with minimal FPGA logic utilization. The blocks have the following characteristics:

    Compliant with the PCI Express Base Specification 1.1

    Works in conjunction with RocketIO transceivers to provide complete endpoint functionality

    1, 4, or 8 lane support per block

    2.3.15 Virtex-5 LXT and SXT Platform Features

    This section briefly describes blocks available only in LXT and SXT devices.

    2.3.16 RocketIO GTP Transceivers

    4 - 24 channel RocketIO GTP transceivers capable of running 100 Mb/s to 3.75 Gb/s.

    Full clock and data recovery

    8/16-bit or 10/20-bit datapath support

    Optional 8B/10B or FPGA-based encode/decode

    Integrated FIFO/elastic buffer

    Channel bonding and clock correction support

    Embedded 32-bit CRC generation/checking

    Integrated comma-detect or A1/A2 detection

    Programmable pre-emphasis (AKA transmitter equalization)

    Programmable transmitter output swing

    Programmable receiver equalization

    Programmable receiver termination

    Embedded support for:

    Out of Band (OOB) signalling: Serial ATA

    Beaconing, electrical idle, and PCI Express receiver detection

    Built-in PRBS generator/checker

    Virtex-5 FPGA RocketIO GTP transceivers are further discussed in the Virtex-5 FPGA RocketIO GTP Transceiver User Guide.

    2.3.17 Virtex-5 TXT and FXT Platform Features

    This section describes blocks only available in TXT and FXT devices.


  • 2.3.18 RocketIO GTX Serial Transceivers

    (TXT/FXT) 8 - 48 channel RocketIO GTX serial transceivers capable of running 150 Mb/s to 6.5 Gb/s.

    Full clock and data recovery

    8/16/32-bit or 10/20/40-bit datapath support

    Optional 8B/10B encoding, gearbox for programmable 64B/66B or 64B/67B encoding, or FPGA-based encode/decode

    Integrated FIFO/elastic buffer

    Channel bonding and clock correction support

    Dual embedded 32-bit CRC generation/checking

    Integrated programmable character detection

    Programmable de-emphasis (AKA transmitter equalization)

    Programmable transmitter output swings

    Programmable receiver equalization

    Programmable receiver termination

    Embedded support for:

    Serial ATA: Out of Band (OOB) signalling

    PCI Express: Beaconing, electrical idle, and receiver detection

    Built-in PRBS generator/checker

    Virtex-5 FPGA RocketIO GTX transceivers are further discussed in the Virtex-5 FPGA RocketIO GTX Transceiver User Guide.
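    The line rates quoted for the GTP and GTX transceivers are raw serial rates; the usable payload rate depends on the line encoding selected. The Python sketch below computes it from the coding ratio alone (8 payload bits per 10 line bits for 8B/10B, and so on). It is a back-of-the-envelope estimate that ignores protocol framing and clock-correction overhead, and the dictionary keys and function name are illustrative choices.

```python
from fractions import Fraction

# Payload bits per transmitted bit for each line encoding.
ENCODINGS = {
    "none": Fraction(1),
    "8b10b": Fraction(8, 10),
    "64b66b": Fraction(64, 66),
    "64b67b": Fraction(64, 67),
}


def payload_rate_gbps(line_rate_gbps, encoding):
    """Payload rate in Gb/s after removing the line-coding overhead."""
    return Fraction(str(line_rate_gbps)) * ENCODINGS[encoding]


if __name__ == "__main__":
    print(float(payload_rate_gbps(3.75, "8b10b")))   # GTP at its top rate
    print(float(payload_rate_gbps(6.5, "64b66b")))   # GTX at its top rate
```

    For example, a GTP link at 3.75 Gb/s with 8B/10B encoding carries 3 Gb/s of payload.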


  • 2.4 ML505 evaluation board

    [Figure: block diagram of the ML505 evaluation board. A Virtex-5 LXT/SXT/FXT FPGA is connected to: GPIO (buttons, LEDs, DIP switches); a PLL clock generator plus user oscillator; the System Monitor; SMA differential in/out clocks; dual PS/2; GTP links (PCIe x1, 2 Serial ATA, 4 SFP, 4 SMA); Flash, synchronous SRAM, Platform Flash, SPI, and a DDR2 SO-DIMM; a System ACE controller; CF and PC4 connections; a CPLD for miscellaneous glue logic; an XGI header; a USB controller (host and peripheral); a 10/100/1000 Ethernet PHY with RJ-45; an AC97 audio codec (line out/headphone, digital audio, mic in/line in); a battery and fan header; an RS-232 transceiver (serial); a VGA input codec; a DVI output codec (DVI-I video out); a character LCD; an IIC EEPROM on the user IIC bus; and a piezo/speaker. Configuration paths shown: SelectMAP, SPI, BPI Flash, slave serial, master serial, and JTAG.]

    Figure 2.1: Block diagram of the ML505 evaluation board


  • Bibliography

    [1] Ognjen ekic. Fpga comparative analysis. note.

    [2] Xilinx. ML505/ML506/ML507 Evaluation Platform User Guide, 2009.

    [3] Xilinx. Virtex-5 Family Overview, 2009.


  • Chapter 3

    CRC and Segmentation

    3.1 CRC (cyclic redundancy check)

    The first step in the processing sequence is the CRC attachment. A fixed 24-bit CRC (CRC-24A) is appended to each transport block (TB); a transport block is defined as the data accepted by the physical layer to be jointly encoded. CRC codes are error-detecting codes typically used in automatic repeat request (ARQ) systems. CRC codes have no error-correction capability, but they can be used in combination with an error-correcting code to improve the performance of the system. A CRC constructed from an $(n, k)$ cyclic code is capable of detecting any error burst of length $n - k$ or less. More generally, binary $(n, k)$ CRC codes are capable of detecting the following error patterns:

    1. All error bursts of length $n - k$ or less.

    2. A fraction of error bursts of length equal to $n - k + 1$; the fraction equals $1 - 2^{-(n-k-1)}$.

    3. A fraction of error bursts of length greater than $n - k + 1$; the fraction equals $1 - 2^{-(n-k)}$.

    4. All combinations of $d_{\min} - 1$ (or fewer) errors.

    5. All error patterns with an odd number of errors if the generator polynomial $g(X)$ for the code has an even number of nonzero coefficients.
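    To put numbers on the burst-detection properties in the first three items, the following Python sketch evaluates the detected fraction for a given burst length and number of parity bits, using the standard results for cyclic codes. The function name is hypothetical and `parity_bits` stands for $n - k$.

```python
from fractions import Fraction


def detected_burst_fraction(burst_len, parity_bits):
    """Fraction of error bursts of a given length that a CRC detects.

    parity_bits is r = n - k. Uses the standard cyclic-code results:
    all bursts up to r bits, 1 - 2^-(r-1) at r + 1 bits, and
    1 - 2^-r for longer bursts.
    """
    r = parity_bits
    if burst_len <= r:
        return Fraction(1)
    if burst_len == r + 1:
        return 1 - Fraction(1, 2 ** (r - 1))
    return 1 - Fraction(1, 2 ** r)


if __name__ == "__main__":
    for length in (24, 25, 1000):
        missed = 1 - detected_burst_fraction(length, 24)
        print(length, float(missed))
```

    For a 24-bit CRC, the share of undetected bursts longer than 25 bits is $2^{-24}$, roughly six in a hundred million.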

    3.1.1 CRC polynomials

    Denote the input bits to the CRC computation by $a_0, a_1, a_2, a_3, \ldots, a_{A-1}$, and the parity bits by $p_0, p_1, p_2, p_3, \ldots, p_{L-1}$. $A$ is the size of the input sequence and $L$ is the number of parity bits. The parity bits are generated by one of the following cyclic generator polynomials:

    1. $g_{CRC24A}(D) = [D^{24} + D^{23} + D^{18} + D^{17} + D^{14} + D^{11} + D^{10} + D^{7} + D^{6} + D^{5} + D^{4} + D^{3} + D + 1]$

    2. $g_{CRC24B}(D) = [D^{24} + D^{23} + D^{6} + D^{5} + D + 1]$

    3. $g_{CRC16}(D) = [D^{16} + D^{12} + D^{5} + 1]$

    4. $g_{CRC8}(D) = [D^{8} + D^{7} + D^{4} + D^{3} + D + 1]$
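    For implementation, each polynomial is usually stored as a vector of binary coefficients, highest power first. The Python sketch below derives those vectors from the exponents listed above; the dictionary and function names are illustrative. The vector obtained for CRC-24A agrees with the crc24a generator entered in the MATLAB section later in this chapter.

```python
# Exponents with nonzero coefficients, copied from the polynomials above.
CRC_EXPONENTS = {
    "gCRC24A": [24, 23, 18, 17, 14, 11, 10, 7, 6, 5, 4, 3, 1, 0],
    "gCRC24B": [24, 23, 6, 5, 1, 0],
    "gCRC16": [16, 12, 5, 0],
    "gCRC8": [8, 7, 4, 3, 1, 0],
}


def poly_bits(exponents):
    """Coefficient vector of a polynomial, highest power first."""
    degree = max(exponents)
    present = set(exponents)
    return [1 if (degree - i) in present else 0 for i in range(degree + 1)]


if __name__ == "__main__":
    for name, exps in CRC_EXPONENTS.items():
        print(name, "".join(str(b) for b in poly_bits(exps)))
```

    A polynomial of degree $L$ always yields $L + 1$ coefficients, so the 24-bit CRCs are represented by 25-element vectors.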


  • 3.1.2 CRC calculation

    The theory of a CRC calculation is straightforward. The data is treated by the CRC algorithm as a binary number. This number is divided by another binary number called the polynomial. The remainder of the division is the CRC checksum, which is appended to the transmitted message. The receiver divides the message (including the calculated CRC) by the same polynomial the transmitter used. If the remainder of this division is zero, the transmission was successful. However, if the remainder is not equal to zero, an error occurred during the transmission. The division uses modulo-2 arithmetic, in which addition and subtraction are both realized by XORing the two numbers.

    3.1.3 Modulo-2 arithmetic example

    1 0 0 1 1 0 0 1 0 1
    0 1 0 0 1 1 0 1 1 1   XOR
    1 1 0 1 0 1 0 0 1 0   =

    XOR function:

    X1 X2 Y
    0  0  0
    1  0  1
    0  1  1
    1  1  0
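    The same bitwise operation can be reproduced in a couple of lines of Python. This is only a check of the example above, with bit sequences held as lists; `xor_bits` is a hypothetical helper name.

```python
def xor_bits(a, b):
    """Modulo-2 addition of two equal-length bit sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return [x ^ y for x, y in zip(a, b)]


if __name__ == "__main__":
    first = [1, 0, 0, 1, 1, 0, 0, 1, 0, 1]
    second = [0, 1, 0, 0, 1, 1, 0, 1, 1, 1]
    print(xor_bits(first, second))
```

    Because XOR has no carries, modulo-2 addition and subtraction are the same operation, which is what makes CRC division cheap in hardware.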

    3.1.4 CRC calculation example

    In this example calculation, the message is six bits long. In general, the message can have any length. Before we can start calculating the CRC value, the message has to be augmented by n zero bits, where n is the degree of the polynomial (one less than its length in bits). The CRC-16 polynomial has degree 16; therefore, 16 zero bits have to be appended to the original message. In this example calculation, the polynomial 1 0 1 is three bits long and has degree 2; therefore, the message has to be extended by two zeros at the end. An example calculation for a CRC is shown in Example 1. The reverse calculation is shown in Example 2.

    1. Example (1):

    Message = 1 1 0 1 0 1

    CRC polynomial = 1 0 1

    1 1 0 1 0 1 0 0 ÷ 1 0 1 = 1 1 1 0 1 1 = Quotient (has no function in CRC calculation)

    1 1 0 1 0 1 0 0
    1 0 1
      1 1 1
      1 0 1
        1 0 0
        1 0 1
            1 1 0
            1 0 1
              1 1 0
              1 0 1
                1 1 = Remainder = CRC checksum

    Message with CRC = 1 1 0 1 0 1 1 1

    2. Example (2):

    Message with CRC = 1 1 0 1 0 1 1 1

    Polynomial = 1 0 1

    1 1 0 1 0 1 1 1 ÷ 1 0 1 = 1 1 1 0 1 1 = Quotient

    1 1 0 1 0 1 1 1
    1 0 1
      1 1 1
      1 0 1
        1 0 0
        1 0 1
            1 1 1
            1 0 1
              1 0 1
              1 0 1
                0 0 = Checksum is zero, therefore, no transmission error
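    The two worked examples can be reproduced with a short Python sketch of modulo-2 long division. This is an illustrative bit-list implementation, not the MATLAB code described later in this chapter; the function names are hypothetical, and the generator is assumed to start with a 1 and to be at least two bits long.

```python
def mod2_divide(bits, poly):
    """Modulo-2 long division; returns (quotient, remainder).

    Assumes poly[0] == 1 and len(poly) >= 2.
    """
    work = list(bits)
    quotient = []
    for i in range(len(work) - len(poly) + 1):
        if work[i]:
            quotient.append(1)
            # Subtract (XOR) the generator aligned at position i.
            for j, p in enumerate(poly):
                work[i + j] ^= p
        else:
            quotient.append(0)
    return quotient, work[-(len(poly) - 1):]


def crc_append(message, poly):
    """Append the CRC checksum (degree-many bits) to the message."""
    padded = list(message) + [0] * (len(poly) - 1)
    _, remainder = mod2_divide(padded, poly)
    return list(message) + remainder


def crc_check(codeword, poly):
    """True if the received codeword leaves a zero remainder."""
    _, remainder = mod2_divide(codeword, poly)
    return not any(remainder)


if __name__ == "__main__":
    poly = [1, 0, 1]
    sent = crc_append([1, 1, 0, 1, 0, 1], poly)
    print(sent, crc_check(sent, poly))
```

    With the message 1 1 0 1 0 1 and the polynomial 1 0 1, `crc_append` returns 1 1 0 1 0 1 1 1 and `crc_check` accepts it, matching Examples 1 and 2. The same functions work unchanged with a 25-bit generator such as CRC-24A.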

    3.2 Segmentation

    To reduce complexity, only a fixed set of turbo interleaver sizes is supported, as given in the table of the Release 8 standard. The difference between two adjacent interleaver sizes is 8 bits for small codeblocks and goes up to 64 bits for the largest codeblock sizes. The reason for the coarser granularity of interleaver sizes for larger codeblocks is that a larger number of filler bits is still a small fraction of the codeblock size when the codeblock size is large.
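    The set of supported sizes is regular enough to be generated rather than typed in. The Python sketch below builds it from four ranges with steps of 8, 16, 32, and 64 bits; the range boundaries are quoted from memory of the interleaver parameter table in 3GPP TS 36.212 and should be checked against the standard before use. The function name is hypothetical.

```python
def turbo_interleaver_sizes():
    """Supported turbo interleaver sizes K, in increasing order.

    Range boundaries are an assumption recalled from the
    3GPP TS 36.212 interleaver parameter table.
    """
    sizes = list(range(40, 512 + 1, 8))
    sizes += range(528, 1024 + 1, 16)
    sizes += range(1056, 2048 + 1, 32)
    sizes += range(2112, 6144 + 1, 64)
    return sizes


if __name__ == "__main__":
    k = turbo_interleaver_sizes()
    print(len(k), k[0], k[-1])
```

    Under these assumptions the table holds 188 sizes between 40 and 6144 bits, and the worst-case padding to the next size never exceeds 63 bits.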

    3.2.1 What is segmentation ?

    The maximum codeblock size is limited to 6144 bits.

    When the transport block is larger than 6144 bits, segmentation of the input bit sequence is performed.

    When the transport block size is not matched to the turbo interleaver size, filler bits are added.

    3.2.2 Example:

    Let us assume a transport block size of 19 000 bits.

    It will be segmented into four codeblocks.

    The last three segments are of maximum size 6144 bits and the first segment is of size 576 bits.

    The first segment size is actually 568 bits and is matched to the nearest interleaver size of 576.

    This results in a filler bits overhead of 8 bits (576 - 568 = 8 bits).

    There is a problem with this segmentation approach:

    Vastly different codeblock sizes would result in different turbo code performance.

    This approach increases the number of filler bits, which carry no useful data and only add delay and redundancy.
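    The naive approach in this example can be written down directly. The Python sketch below fills as many maximum-size segments as possible and rounds the leftover up to the next interleaver size; it deliberately ignores the CRC bits, as the example does. The interleaver size ranges are an assumption recalled from 3GPP TS 36.212, and both function names are hypothetical.

```python
def turbo_interleaver_sizes():
    """Supported interleaver sizes (ranges assumed from TS 36.212)."""
    sizes = list(range(40, 512 + 1, 8))
    sizes += range(528, 1024 + 1, 16)
    sizes += range(1056, 2048 + 1, 32)
    sizes += range(2112, 6144 + 1, 64)
    return sizes


def naive_segmentation(tb_bits, max_size=6144):
    """Full-size segments plus one short first segment.

    Returns (segment_sizes, filler_bits). CRC bits are ignored.
    """
    full, rest = divmod(tb_bits, max_size)
    segments = [max_size] * full
    filler = 0
    if rest:
        # Round the leftover up to the nearest supported size.
        k = min(s for s in turbo_interleaver_sizes() if s >= rest)
        filler = k - rest
        segments.insert(0, k)
    return segments, filler


if __name__ == "__main__":
    print(naive_segmentation(19000))
```

    For 19 000 bits this gives one 576-bit segment followed by three 6144-bit segments with 8 filler bits, so the first codeblock is more than ten times shorter than the others.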

    3.2.3 Problem solution

    In order to reduce the number of filler bits while keeping the codeblock sizes approximately the same, the LTE system uses two adjacent interleaver sizes. In the later stages of LTE standard development, it was agreed that filler bits are mostly removed after channel coding.

    3.2.4 Segmentation process


    A fixed 24-bit CRC (CRC-24A) is calculated for each transport block coming from the MAC layer.

    The calculated CRC is appended to the transport block in order to check the data integrity at the receiver end.

    If the input sequence length is shorter than 40 bits, filler bits are added to the beginning of the code block (no segmentation).

    If the resulting bit sequence is longer than the maximum allowed code block size (6144 bits), a segmentation process must be carried out.

    After the segmentation, filler bits are added to the beginning of the first segment, if needed.

    Finally, an additional 24-bit CRC (CRC-24B) is attached to each resulting segment.
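    The size calculation behind these steps can be sketched in Python as follows. The sketch follows the code block segmentation rules of 3GPP TS 36.212 as recalled here (two adjacent sizes K+ and K-, with a 24-bit CRC per segment only when more than one segment is needed); it computes sizes and counts only, does not move any bits, and is not the authors' MATLAB code. The function and key names are hypothetical, and the interleaver size ranges are an assumption to be checked against the standard.

```python
def turbo_interleaver_sizes():
    """Supported interleaver sizes (ranges assumed from TS 36.212)."""
    sizes = list(range(40, 512 + 1, 8))
    sizes += range(528, 1024 + 1, 16)
    sizes += range(1056, 2048 + 1, 32)
    sizes += range(2112, 6144 + 1, 64)
    return sizes


def code_block_segmentation(b, z=6144):
    """Segment sizes for an input of b bits (transport block + CRC)."""
    sizes = turbo_interleaver_sizes()
    if b <= z:
        crc_len, c = 0, 1
    else:
        crc_len = 24
        c = -(-b // (z - crc_len))          # ceiling division
    b_prime = b + c * crc_len               # bits including segment CRCs
    k_plus = min(k for k in sizes if c * k >= b_prime)
    if c == 1:
        k_minus, c_minus, c_plus = 0, 0, 1
    else:
        k_minus = max(k for k in sizes if k < k_plus)
        c_minus = (c * k_plus - b_prime) // (k_plus - k_minus)
        c_plus = c - c_minus
    filler = c_plus * k_plus + c_minus * k_minus - b_prime
    return {"C": c, "K+": k_plus, "K-": k_minus,
            "C+": c_plus, "C-": c_minus, "F": filler}


if __name__ == "__main__":
    # 19 000-bit transport block plus its 24-bit CRC.
    print(code_block_segmentation(19000 + 24))
```

    For a 19 000-bit transport block this yields four segments, three of 4800 bits and one of 4736 bits, with 16 filler bits, so all codeblocks are nearly the same size.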

    3.3 Matlab code

    For the MATLAB code, I first needed to enter all the available values of K in order to choose the suitable ones from them.

    This pseudo-code, taken from the standard, is very helpful in explaining the whole segmentation process.


    Now, to describe the CRC operation, I will use the following MATLAB code.

    First, we need to enter the CRC generator vector: crc24a = [1 1 0 0 0 0 1 1 0 0 1 0 0 1 1 0 0 1 1 1 1 1 0 1 1];

    The CRC generation process is the long division of the data block by the CRC generator; in binary, we can translate this operation into a simple XOR process, shifting the CRC generator after every XOR operation.

    We append 24 zeros to the data block, which provides space to insert the CRC remainder at the end of the process; the result is named shifted_data.

    We use a while loop and two variables, count and coordinator. count = length(shifted_data) - length(crc24a); determines the number of shifts needed. coordinator = 1; is initialized to one and then incremented at each shifting step.

    while coordinat