POLITECNICO DI TORINO
Faculty of Engineering
Master degree in
Communications and Computer Networks Engineering
MASTER THESIS
Software Defined Radio
Implementation of a Satellite
Radio System
Supervisor
Prof. Roberto Garello
April 2020
Candidate
Luca Vigna
To you who are the most beautiful
gift life could ever give me
Abstract
This thesis is focused on the design and implementation of a
Telecommand Satellite Radio System and it has been carried out
with the Italian aerospace engineering company Argotec.
A design of the digital signal processing part of the system is
proposed, having as guideline for the definition of the
specifications the Consultative Committee for Space Data
Systems (CCSDS) Standard.
The focus is mainly on the Physical and Coding layers of the
architecture proposed by CCSDS. Simulations and performance
analysis are carried out through the MATLAB Software.
A research on an experimental coding scheme, able to provide
significant coding gain, is presented and the theoretical results
are verified in practice.
The document ends with the implementation of the designed
system, in C language, on a development board (equipped with
a Xilinx Zynq-7000) used to test and validate the prototype.
Contents
Abstract
Contents
List of Figures
List of Tables
Acronyms
1 Introduction
1.1 Background and Motivation
1.2 Objectives
1.3 Thesis Outline
2 Telecommand System Design
2.1 Project Description
2.2 Project Specification
2.3 Functional Block Diagram
2.4 Transmitter Setup
2.5 Received Signal
2.6 Test Set-up
3 Physical Layer Design
3.1 RF and Modulation Systems
3.2 Automatic Gain Control
3.2.1 Block Design and Trade-off analysis
3.2.2 Simulation results
3.2.3 Optimization and Performance results
3.3 Sub-Carrier Synchronizer
3.3.1 Design and Trade-off analysis
3.3.2 Simulation and Performance results
3.4 Filter
3.4.1 Design and Trade-off analysis
3.4.2 Simulation and Performance results
3.5 Clock Recovery
3.5.1 Design and Trade-off analysis
3.5.2 Simulation and Performance results
3.6 Phase Recovery
3.6.1 Design and Trade-off analysis
3.6.2 Simulation and Performance results
4 Coding Layer Design
4.1 TC Synchronization and Channel Coding
4.2 Frame Synchronizer
4.2.1 Design and Trade-off analysis
4.3 Decoder
4.3.1 Design and Trade-off analysis
4.4 Simulation and Performance results
5 LDPC Code
5.1 Overview
5.2 Encoder
5.3 Randomizer
5.4 Decoder
5.5 Simulation and Performance results
6 Prototype Testing
6.1 Testing Overview
6.2 CPU utilization
6.3 BCH SEC Decoder
6.4 LDPC SPA Decoder
6.5 Custom IP Core
7 Final Remarks
7.1 Conclusions
7.2 Future Work
Bibliography
List of Figures
Figure 2.1 – Basic Scheme of Telecommand System
Figure 2.2 – CCSDS Protocols [1]
Figure 2.3 – Typical Spacecraft Transponder
Figure 2.4 – Block Diagram of Telecommand System
Figure 2.5 – GNU Radio flow graph for Tx signal generation
Figure 2.6 – Tx Signal in time domain (No Noise)
Figure 2.7 – Spectrum of the signal with data (No Noise)
Figure 2.8 – Block Scheme of the test set-up
Figure 2.9 – Zynq-7000 SoC Architectural Overview
Figure 3.1 – Carrier Modulation Modes (PLOP-2)
Figure 3.2 – AGC Absolute value block diagram
Figure 3.3 – AGC RMS block diagram
Figure 3.4 – Effect of AGC RMS to Rx Signal
Figure 3.5 – AGC Gain Value for different α (AGC absolute value)
Figure 3.6 – AGC Gain Value for different α (AGC RMS)
Figure 3.7 – PSD of Rx Signal
Figure 3.8 – Ideal (Left) vs Real (Right) Oscillator frequency response
Figure 3.9 – PLL Block Diagram
Figure 3.10 – PLL Discriminator
Figure 3.11 – First Order PLL (up) – Second Order PLL (down)
Figure 3.12 – Steady-state errors of first order PLL
Figure 3.13 – IIR Biquad Filter
Figure 3.14 – Step response (Left) – Nyquist diagram (Right) with 𝜍 = 0.237 and 𝜔𝑛 = 316.23
Figure 3.15 – Estimated Phase in Rx simulation – First test
Figure 3.16 – Step response (Left) – Nyquist diagram (Right) with 𝜍 = 0.75 and 𝜔𝑛 = 10
Figure 3.17 – Estimated Phase in Rx simulation – Second test
Figure 3.18 – Bode Diagram with 𝜍 = 0.707, 𝜔𝑛 = 110 and 𝑇𝑠 = 1 𝑚𝑠
Figure 3.19 – Step response with 𝜍 = 0.707, 𝜔𝑛 = 110 and 𝑇𝑠 = 1 𝑚𝑠
Figure 3.20 – Estimated Phase (Left) – Scattering Diagram after PLL (Right)
Figure 3.21 – Shaping pulse at the Tx in time
Figure 3.22 – FIR filter Block Diagram
Figure 3.23 – Rx Signal filtered (MA) – No Noise
Figure 3.24 – SRRC Impulse Response at different roll-off
Figure 3.25 – Eye diagram with MA filter (Left) – Scattering Diagram (Right) – 𝐸𝑏/𝑁0 = 25
Figure 3.26 – Rx Signal filtered (SRRC, rolloff = 0.25) – No Noise
Figure 3.27 – Linear interpolation between 4 samples
Figure 3.28 – Delay Locked Loop for Timing estimation
Figure 3.29 – Signal’s Energy sampling at different τ
Figure 3.30 – Timing Error Detector S-curves for Strobe and Midway samples
Figure 3.31 – Costas Loop Block Diagram
Figure 3.32 – Scattering Diagram in presence of Doppler Shift
Figure 3.33 – Phase Estimated by Costas Loop with different 𝛾
Figure 4.1 – Format of a CLTU [18]
Figure 4.2 – CLTU Reception State Diagram [18]
Figure 4.3 – BCH Start Sequence Circular Autocorrelation
Figure 4.4 – Cross Correlation between BCH Start Sequence and Idle Sequence
Figure 4.5 – Format of a BCH Codeblock [18]
Figure 4.6 – BCH(63,56) Encoder [1]
Figure 4.7 – Implementation of Error Correction Mode Decoder [18]
Figure 4.8 – BER versus 𝐸𝑏/𝑁0 with BCH and without Coding
Figure 4.9 – CER versus 𝐸𝑏/𝑁0 with BCH and without Coding
Figure 5.1 – Parity Check Matrix of LDPC(128,64)
Figure 5.2 – CLTU Structure for LDPC(128,64) code
Figure 5.3 – LDPC Start Sequence Circular Autocorrelation
Figure 5.4 – Cross Correlation between LDPC Start Sequence and Idle Sequence
Figure 5.5 – LDPC Start Sequence Autocorrelation 𝐸𝑏/𝑁0 = 4
Figure 5.6 – How Randomizer works – Tx side (Up) – Rx side (Down)
Figure 5.7 – LFSR proposed by CCSDS [1]
Figure 5.8 – Tanner Graph (left) – H matrix (right)
Figure 5.9 – BER versus 𝐸𝑏/𝑁0 with different LDPC decoder
Figure 5.10 – CER versus 𝐸𝑏/𝑁0 with different LDPC decoder
Figure 5.11 – BER versus 𝐸𝑏/𝑁0 – Comparison between Coding Schemes
Figure 5.12 – CER versus 𝐸𝑏/𝑁0 – Comparison between Coding Schemes
Figure 6.1 – GNU Radio Interface
Figure 6.2 – CPU utilization by the two FSM
Figure 6.3 – CPU utilization by Physical Layer’s blocks
Figure 6.4 – CPU utilization by Coding Layer’s blocks
Figure 6.5 – CER of uncoded system
Figure 6.6 – BCH Decoder 𝐸𝑏/𝑁0 = 8 𝑑𝐵
Figure 6.7 – LDPC Decoder 𝐸𝑏/𝑁0 = 5 𝑑𝐵
Figure 6.8 – LDPC Decoder 𝐸𝑏/𝑁0 = 4 𝑑𝐵
Figure 6.9 – SPA decoder CER performance on Zynq-7000
Figure 6.10 – SPA decoder CER performance at different data rates
Figure 6.11 – Block Design with Vivado tool
List of Tables
Table 2.1 – System Specifications
Table 3.1 – AGC Performance
Table 3.2 – Steady-state errors of a PLL
Table 3.3 – Subcarrier removal Performance
Table 3.4 – Sub-Carrier Synchronizer Performance
Table 3.5 – Filter Performance
Table 3.6 – Clock Recovery Performance
Table 3.7 – Symbol Lock Detector Performance
Table 3.8 – Costas Loop Performance
Table 4.1 – Events and States of Coding Layer
Table 4.2 – Codeblock Decision for BCH Decoder (SEC)
Table 6.1 – Performance of Coding Layer FSM (with BCH Decoder)
Table 6.2 – Performance of the Coding Layer FSM (with LDPC Decoder)
Acronyms
ADC Analog to Digital Converter
AGC Automatic Gain Control
AWGN Additive White Gaussian Noise
BCH Bose–Chaudhuri–Hocquenghem
BER Bit Error Rate
BR Buffer Register
CCSDS Consultative Committee for Space Data Systems
CER Codeword Error Rate
CLTU Communications Link Transmission Unit
CMM Carrier Modulation Modes
DAC Digital to Analog Converter
DLL Delay Locked Loop
DSP Digital Signal Processing
EOD Even Odd Detector
FDD Frequency Discriminator Device
FIR Finite Impulse Response
FSM Finite State Machine
ISI Intersymbol Interference
LDPC Low-Density Parity-Check
LFSR Linear Feedback Shift Register
LLR Log-Likelihood Ratio
LO Local Oscillator
LUT Lookup Table
MA Moving Average
MS Min-Sum
NCO Numerically Controlled Oscillator
NMS Normalized Min-Sum
PL Programmable Logic
PLOP Physical Layer Operations Procedure
PS Processing System
RF Radio Frequency
RMS Root Mean Square
Rx Receiver
SDR Software Defined Radio
SNR Signal-to-Noise Ratio
SPA Sum-Product Algorithm
SR Syndrome Register
TC Telecommand
TM Telemetry
Tx Transmitter
CHAPTER 1
1 Introduction
1.1 BACKGROUND AND MOTIVATION
One of the basic functions of any spacecraft is the ability to establish a connection
with ground stations or with other spacecraft to exchange information,
commands, control data, etc. For this purpose, the spacecraft uses a transceiver.
A transceiver designed for a fixed type of communication (using analog
components) offers no flexibility to the user, since it cannot adapt to new
configurations. For this reason, the trend is to move most of the functionality of a
conventional radio to the digital domain (using re-programmable hardware
logic), where some functions are fully implemented in software. This technology
is known as Software Defined Radio (SDR).
Usually, an SDR-based transceiver results in a compact, low-power and flexible
communication system, at the cost of expensive high-performance components.
However, in a satellite system a flexible device is very appealing, because a
generic hardware design can address different communication needs with
varying characteristics.
Applying this concept to small satellites can increase data throughput, add the
possibility to perform software updates over-the-air and make it possible to reuse
the hardware platform for multiple missions with different requirements [1].
The focus of the aerospace company Argotec on the production of nanosatellites
for Deep Space applications led to a collaboration for the all in-house design and
implementation of an SDR.
Argotec is an aerospace engineering company based in Turin and founded by
David Avino in 2008. Currently, Argotec is about to integrate two platforms
(ArgoMoon and LICIACube), which will fly with two different NASA missions
in the years 2020 and 2021.
As anticipated, the platform conceived by the company follows an all in-house
approach. A first design of the project is derived from the ideas of Argotec’s
engineers. The realization and testing of all the subsystems, up to their
integration on the complete platform, are then carried out in the internal
laboratories.
The main drivers of this thesis work are the company's need to expand its
portfolio with a deep-space radio for nanosatellites, the opportunity to
implement an SDR that can be ported to real flight hardware, and the will to
research and develop new technologies that can improve the current
performance of a system that is experiencing an important period of growth.
In a typical radio communication system, the basic data flow is made by
Telemetry (TM) and Telecommand (TC) data, in downlink and uplink
respectively.
The work presented in this thesis is focused on the design and implementation
of the Telecommand System. It must reliably and transparently convey control
information from an originating source (located on the Earth) to a remotely
located physical device (i.e. the satellite).
1.2 OBJECTIVES
The goal of this master’s thesis is to carry out a preliminary feasibility analysis
of the project, starting from a basic design of the subsystem, and to build a
prototype of the system that works as close as possible to realistic conditions.
The objective is to find the best solution that fits all the requirements. The first
requirement is that the system must be compliant with the CCSDS Standard.
Starting from it, the main objectives of the work are reported below:
• Literature study of a Satellite Radio System focusing on the Receiver (Rx)
structure;
• Study of a Base Station Transmitter (Tx) to generate an Uplink signal, able
to send realistic data;
• Feasibility analysis of the proposed architecture through MATLAB
simulations;
• Implementation on a Development Board able to work in real-time;
• Characterization of each block in order to evaluate which of them could
benefit from a hardware implementation (e.g. in terms of performance,
resource utilization etc.);
• Test of the prototype under simulated real-environment conditions.
1.3 THESIS OUTLINE
The thesis is organized as follows:
• Chapter 2, Telecommand System Design, reports the study and the
definition of the system requirements in order to have a guideline for the
development. A functional block diagram of the receiver is proposed;
• Chapter 3, Physical Layer Design, describes the functionalities of the
physical layer. The receiver block diagram, defined in the previous
Chapter, is implemented and simulated in MATLAB in order to study the
feasibility of each block (understanding the functionalities and the various
parameters) fitting them to the generated Tx signal and analyzing the
trade-offs;
• Chapter 4, Coding Layer Design, gives an overview of the
Synchronization and Channel Coding Sublayer proposed by CCSDS. The
Frame Synchronizer and the Decoder implementations are shown, with
particular attention to their performance;
• Chapter 5, LDPC Code, presents, theoretically and practically, a new
coding scheme that can work at very low SNR values;
• Chapter 6, Prototype Testing, presents a real-time prototype. After the
simulation model is validated, the MATLAB code is ported on a
Development Board (C language) and the system is tested;
• Chapter 7, Final Remarks, draws the conclusions, highlighting the final
result obtained and giving a hint about future work.
CHAPTER 2
2 Telecommand System Design
This chapter provides an overview of the project highlighting the characteristics of the
transmitted signal, how it is simulated and what are the effects of the channel and of the
non-ideal RF front ends on it. A high-level block diagram of the receiver adopted and the
test set-up used are presented.
2.1 PROJECT DESCRIPTION
A space link is a communications link between a spacecraft and its associated
ground system or between two spacecraft. Figure 2.1 shows the typical
configuration, in which the ground station sends telecommands to a spacecraft.
Figure 2.1 – Basic Scheme of Telecommand System
The telecommands are encoded and modulated by a ground system and sent
over the Radio Frequency (RF) Space Link. The task of the spacecraft’s
transponder is to receive the signal, demodulate it and extract the telecommand
through a decoder.
The transponder on board the spacecraft is an SDR, i.e. a radio communication
system that uses a minimum amount of analog/Radio Frequency components to
convert the RF signal to a digital format (and vice versa). Once the analog signal
is digitized, all processing is performed using hardware logic. Typically, this
processing includes filtering, modulation, up/down conversion and
demodulation.
For the architecture of the satellite subsystem, the Recommendations proposed
by the CCSDS, whose purpose is to promote interoperability and cross support
among cooperating space agencies, are taken as the reference point.
The CCSDS protocols are shown in Figure 2.2.
Figure 2.2 – CCSDS Protocols [1]
The focus will be mainly on:
• Physical Layer: provides the RF channel and the techniques required to
operate it (e.g. modulation, demodulation, bit synchronization);
• Synchronization and Channel Coding Sublayer: provides functions
necessary for transferring frames over a space link (e.g. error-control
coding, synchronization, pseudo-randomizing, repeated transmissions).
2.2 PROJECT SPECIFICATION
The typical structure of the transponder is shown in Figure 2.3.
Figure 2.3 – Typical Spacecraft Transponder
This work does not deal with the RF Front-End block of the SDR, but only with
the Digital baseband Signal Processing (DSP). However, the impairments
introduced by the Tx and Rx front-ends are studied in detail.
The choice to be compliant with CCSDS standards simplifies the definition of the
initial specifications of the Radio Frequency signal. Table 2.1 reports the
characteristics of the transmitted signal. The purpose of these choices is to obtain
an SDR prototype that is compatible with real space applications. The
specifications are explained in more detail in Section 2.4.
Table 2.1 – System Specifications
Specification            Value
Modulation Waveform      BPSK
Sub-carrier Waveform     Sine Wave
Sub-carrier frequency    𝑓𝑠𝑐 = 16 𝑘𝐻𝑧
Sample Rate              𝑓𝑠 = 48 𝑘𝐻𝑧
Data Rate                𝑅𝑠 = 1 𝑘𝑏𝑖𝑡/𝑠
Decoding Scheme          BCH / SEC Mode
Physical Layer           PLOP-2
2.3 FUNCTIONAL BLOCK DIAGRAM
The receiver designed in this thesis is a coherent receiver. In coherent
transmission, the receiver knows what type of data is being sent. Coherency
implies a strict timing mechanism, because even a data signal may look like
noise if the receiver looks at the wrong part of it [1].
More generally, a receiver must demodulate and decode the received signal, and
must also compensate, as far as possible, for the impairments introduced by the
channel and by the non-ideal components. The functional block diagram taken as
the reference point is shown in Figure 2.4.
Figure 2.4 – Block Diagram of Telecommand System
2.4 TRANSMITTER SETUP
The transmitted signal, used as input for the receiver developed in this thesis, is
generated on a workstation with GNU Radio, a free and open-source software
development toolkit that provides signal processing blocks to implement
software radios. It can be used with readily available low-cost external RF
hardware to create software-defined radios, or without hardware in a
simulation-like environment [4]. Using the blocks provided by the software, the
signal is generated and written to a binary file for MATLAB simulation. For the
real-time test, the generated stream is instead sent to the development board
through a socket over an Ethernet connection.
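As an illustration of how such a binary file can be consumed outside MATLAB, the following is a minimal sketch in Python. It assumes that the file was written with complex items, which GNU Radio stores as raw interleaved 32-bit floats (I, Q), and that the byte order is little-endian; neither the item type nor the byte order is stated in this work, and the function name read_complex64 is hypothetical.

```python
import struct

def read_complex64(raw):
    """Decode interleaved little-endian float32 I/Q pairs into complex samples.

    Assumption: the capture holds complex items (two float32 per sample).
    """
    n_floats = len(raw) // 4
    n_floats -= n_floats % 2              # drop a trailing unpaired float, if any
    values = struct.unpack(f"<{n_floats}f", raw[: 4 * n_floats])
    return [complex(i, q) for i, q in zip(values[0::2], values[1::2])]

# Round trip on an in-memory buffer instead of a real capture file.
demo = struct.pack("<4f", 1.0, 0.0, -1.0, 0.5)
samples = read_complex64(demo)
```

For a real capture, the bytes would be read from the file in binary mode and passed to the same function.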
Figure 2.5 shows a high-level GNU Radio block diagram.
Figure 2.5 – GNU Radio flow graph for Tx signal generation
The specifications reported in Table 2.1 derive from CCSDS Recommendations.
BPSK modulation is used because it is considered the most robust modulation
scheme in terms of noise immunity: it tolerates the highest level of distortion
while still being successfully demodulated. A sine-wave sub-carrier with a
frequency of 16 kHz (PSK modulated) is chosen for telecommand. This separates
the spectrum of the transmitted data from the RF carrier. The sub-carrier is
totally suppressed.
The Tx signal equation is reported below:
$$t_{signal}(t) = \sqrt{2P_t}\; e^{j2\pi f_c t} \sum_{n} a_n\, s(t - nT)\, e^{j2\pi f_{sc} t}$$
where 𝑃𝑡 is the total available transmitter power, 𝑎𝑛 is the sequence of
constellation points, 𝑠(𝑡) is the shaping pulse, 𝑓𝑐 is the carrier frequency and 𝑓𝑠𝑐
is the sub-carrier frequency. 𝑇 is the data interval (𝑅𝑏 = 1/𝑇 is the data rate).
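The structure of the Tx signal can be sketched numerically. The following Python fragment generates the sub-carrier-modulated part of the signal with the values of Table 2.1. It is only a sketch under stated assumptions: a rectangular shaping pulse of duration T, the mapping 0 → +1 and 1 → −1, and the RF carrier term omitted; none of these is taken from this work, and the function name tx_baseband is hypothetical.

```python
import cmath
import math

F_SC = 16_000.0   # sub-carrier frequency [Hz], Table 2.1
F_S = 48_000.0    # sample rate [Hz], Table 2.1
R_B = 1_000.0     # data rate [bit/s], Table 2.1

def tx_baseband(bits, p_t=1.0):
    """Samples of sqrt(2*P_t) * sum_n a_n s(t - nT) * exp(j*2*pi*f_sc*t).

    Assumptions: rectangular s(t) of duration T, mapping 0 -> +1 and 1 -> -1,
    RF carrier term exp(j*2*pi*f_c*t) omitted.
    """
    sps = int(F_S / R_B)                  # samples per symbol (48)
    amp = math.sqrt(2.0 * p_t)
    out = []
    for n, bit in enumerate(bits):
        a_n = 1.0 if bit == 0 else -1.0   # BPSK constellation point
        for m in range(sps):
            t = (n * sps + m) / F_S       # sampling instant [s]
            out.append(amp * a_n * cmath.exp(2j * math.pi * F_SC * t))
    return out

signal = tx_baseband([0, 1, 0, 1])
```

With 𝑓𝑠𝑐 = 16 kHz and 𝑓𝑠 = 48 kHz, each sub-carrier cycle spans exactly three samples, and each symbol contains 16 full sub-carrier cycles.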
Figure 2.6 shows the signal in the time domain, with no noise present.
Figure 2.6 – Tx Signal in time domain (No Noise)
The choice to work at 48 samples per symbol derives from the requirement to
process the signal at the lowest possible frequency for simulation purposes.
Since the CCSDS does not specify the Tx carrier frequency, 768 kHz is taken as
the reference. To satisfy the Nyquist theorem (in the RF front-end), the signal
should be sampled at no less than 32 kHz (twice the sub-carrier frequency).
When data are present this rate is not enough to avoid aliasing, so the next
integer divider of 768 kHz is chosen. In summary, the Analog to Digital
Converter (ADC) at the RF front-end is assumed to sample the received signal at
48 kHz, which also gives an integer number of samples per symbol.
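The selection of the sampling rate can be checked with a short search. The sketch below, in Python, reads the rule as follows: the rate must be an integer divider of the 768 kHz reference, strictly above the 32 kHz Nyquist limit, and a multiple of the 1 kHz symbol rate. This reading of the constraints, and the function name pick_sample_rate, are assumptions made for the example.

```python
F_CARRIER_HZ = 768_000   # reference carrier frequency chosen in the text
F_NYQUIST_HZ = 32_000    # twice the 16 kHz sub-carrier frequency
R_S_HZ = 1_000           # symbol rate (1 kbit/s with BPSK)

def pick_sample_rate(f_ref_hz, f_nyquist_hz, symbol_rate_hz):
    """Lowest rate above f_nyquist_hz that divides f_ref_hz exactly and
    gives an integer number of samples per symbol."""
    for divider in range(f_ref_hz // f_nyquist_hz, 0, -1):
        if f_ref_hz % divider:
            continue                      # not an integer divider
        rate = f_ref_hz // divider
        if rate > f_nyquist_hz and rate % symbol_rate_hz == 0:
            return rate
    raise ValueError("no suitable sample rate found")

f_s = pick_sample_rate(F_CARRIER_HZ, F_NYQUIST_HZ, R_S_HZ)
samples_per_symbol = f_s // R_S_HZ
```

Under these constraints the search skips 38.4 kHz (768 kHz / 20), which does not give an integer number of samples per symbol, and stops at 768 kHz / 16.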
When data are present, the BPSK modulation at the sub-carrier frequency is
visible in the spectrum of the signal (Figure 2.7).
Figure 2.7 – Spectrum of the signal with data (No Noise)
The Physical Layer Operations Procedures (PLOPs) specify the sequence of
operations performed during a communications session. For recent missions,
CCSDS requires that PLOP-2 is used. It defines three data formats: Acquisition
Sequence, Communications Link Transmission Unit (CLTU) and Idle Sequence.
The Acquisition Sequence provides initial symbol synchronization within the
incoming stream of detected symbols; its pattern consists of alternating
“ones” and “zeros”. The CLTU is the data structure that contains the data
symbols to be transmitted to the receiving end. The Idle Sequence is the data
structure that maintains symbol synchronization in the absence of CLTUs; its
pattern is the same as that of the Acquisition Sequence. In PLOP-2 the channel is
not deactivated after each transmitted CLTU. For this reason, when there is a
connection, the Acquisition Sequence is always present to ensure that bit
synchronization is maintained. Moreover, a minimum Idle Sequence of one octet
is systematically inserted between CLTUs to eliminate the small but finite
possibility of synchronization lockout [1].
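The ordering of the three PLOP-2 data formats can be illustrated with a small Python sketch. The default Acquisition Sequence length (128 bits), the starting bit of the alternating pattern, and the choice to append an Idle Sequence after the last CLTU as well are assumptions made for the example; the function names are hypothetical.

```python
def alternating(n_bits, first=0):
    """Alternating pattern used by the Acquisition and Idle Sequences."""
    return [(first + i) % 2 for i in range(n_bits)]

def plop2_stream(cltus, acq_bits=128, idle_bits=8):
    """Acquisition Sequence, then each CLTU followed by an Idle Sequence.

    Assumptions: acq_bits default and starting bit are illustrative only.
    """
    if idle_bits < 8:
        raise ValueError("at least one octet of Idle Sequence is required")
    stream = alternating(acq_bits)
    for cltu in cltus:
        stream += list(cltu)              # CLTU bits, passed through unchanged
        stream += alternating(idle_bits)  # keeps symbol synchronization alive
    return stream

stream = plop2_stream([[1, 1, 1, 1], [0, 0]], acq_bits=16, idle_bits=8)
```

Because the channel is never deactivated, the receiver always sees symbol transitions between CLTUs, which is what allows the clock recovery to stay locked.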
2.5 RECEIVED SIGNAL
The Transmitter and Receiver RF Front-Ends introduce impairments in the
received signal because of their non-idealities. Another objective of the digital
baseband block is to compensate for these impairments as far as possible.
Considering only an Additive White Gaussian Noise (AWGN) introduced by the
channel, the Rx signal at the RF Front End is in the form:
$$r(t) = \sqrt{2P_t}\; e^{j2\pi f_c t} \sum_{n} a_n\, p(t - nT)\, e^{j2\pi f_{sc} t} + n(t)$$
where 𝑎𝑛 is the sequence of constellation points used at the Tx, 𝑝(𝑡) is the
cascade of the Tx filter and the impulse response of the channel (𝑠(𝑡) ∗ 𝑐(𝑡)) and
𝑛(𝑡) is the noise introduced by the channel.
Typical quantities used to describe the relative power of noise in an AWGN
channel are the Signal-to-Noise Ratio (SNR), the ratio of bit energy to noise
power spectral density (𝐸𝑏/𝑁0) and the ratio of symbol energy to noise power
spectral density (𝐸𝑠/𝑁0). The relation between the last two quantities (expressed
in dB) is:
$$\frac{E_s}{N_0} = \frac{E_b}{N_0} + 10 \cdot \log_{10}(k)$$
where k is the number of information bits per symbol.
In the considered system, k is influenced by the size of the modulation alphabet
(M) and by the code rate (𝑅𝑐) and it is:
$$k = R_c \cdot \log_2(M)$$
Instead, the relationship between 𝐸𝑠/𝑁0 and the SNR (in dB), for a complex input
signal, is:
$$\frac{E_s}{N_0} = 10 \cdot \log_{10}\left(\frac{T_{sym}}{T_{samp}}\right) + SNR$$
where 𝑇𝑠𝑦𝑚 is the symbol time (1/𝑅𝑠) and 𝑇𝑠𝑎𝑚𝑝 is the sampling time (1/𝑓𝑠) [5].
The block that models the channel in GNU Radio requires the noise voltage as
the parameter for the AWGN noise. Since the SNR is the ratio between the
input signal power (𝑆) and the noise power (𝑁), the noise voltage can be derived
as the square root of 𝑁 using the formulas above. Combining the two relations
gives, in dB, SNR = 𝐸𝑏/𝑁0 + 10 ∙ log10(𝑘) − 10 ∙ log10(𝑇𝑠𝑦𝑚/𝑇𝑠𝑎𝑚𝑝); in linear scale the
sums become products, so the noise voltage is equal to:
$$\sqrt{N} = \sqrt{\frac{S}{\dfrac{E_b}{N_0} \cdot k \cdot \dfrac{T_{samp}}{T_{sym}}}}$$
where all the quantities are considered in linear scale. Choosing a value of 𝐸𝑏/𝑁0
therefore provides the corresponding noise voltage to model the AWGN channel.
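A minimal sketch of this computation is given below. It chains the two dB relations and converts to linear scale only at the end; the function name and its arguments are assumptions chosen for the example, not part of GNU Radio.

```python
import math


def noise_voltage(signal_power, ebn0_db, k, samples_per_symbol):
    """Noise voltage sqrt(N) for a target Eb/N0 (in dB).

    signal_power       -- input signal power S (linear)
    k                  -- information bits per symbol, k = Rc * log2(M)
    samples_per_symbol -- Tsym / Tsamp
    """
    es_n0_db = ebn0_db + 10.0 * math.log10(k)
    snr_db = es_n0_db - 10.0 * math.log10(samples_per_symbol)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    return math.sqrt(noise_power)
```

With 48 samples per symbol, unit signal power and k = 1, an Eb/N0 of 0 dB corresponds to a noise voltage of √48, since the noise is spread over a bandwidth 48 times larger than the symbol rate.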
Non-idealities are also present in the received signal; they originate from
amplifiers, mixers, A/D and D/A converters.
Amplifiers introduce third-order distortion and additive thermal noise (that is
AWGN).
Mixers, which up- and down-convert the complex baseband signal that contains the
useful information by means of a Local Oscillator (LO), introduce carrier
frequency offset 𝜐(𝑡), phase noise 𝜃(𝑡) and IQ imbalance.
The ADC and the Digital to Analog Converter (DAC) add other impairments, such as
sampling clock offset (𝜏) and sampling jitter (𝜏(𝑡)).
After the analog RF front-end (at the digital baseband) the signal is affected by
different types of impairments. The Rx signal, with the RF impairments, is:
$$r_{BB}(t) = \sqrt{2P_t}\, e^{j(2\pi\upsilon(t)t + \theta(t))} \sum_n a_n\, p\big((t - nT) - \tau(t)\big) + w(t)$$
where 𝑤(𝑡) takes into account the noise introduced by the channel plus other
thermal interferences.
The presence of the subcarrier is ignored in the equation, but its contributions are
still present and must be removed by the digital baseband block through the Sub-
Carrier Synchronization block.
2.6 TEST SET-UP
As already anticipated, once the system is validated through MATLAB
simulation, the project is ported to C and executed on a development board.
The set-up is shown below.
Figure 2.8 – Block Scheme of the test set-up
The transmitter is a Workstation that generates the signal using the software
GNU Radio (see Chapter 2.4). In detail, the Figure represents the connections
used to create the virtual link between the Workstation and the development
board (ETHERNET), to program the Xilinx device (PROG) and to print some control
variables on a terminal (UART).
The signal is sent through a Gigabit Ethernet connection to a ZedBoard, an
evaluation and development board based on the Xilinx Zynq-7000 All
Programmable SoC (Zynq-7000 SoC XC7Z020). The SoC is composed of two major
functional blocks: the Processing System (PS) and the Programmable Logic (PL).
The ARM Cortex-A9 CPUs are the heart of the PS, which also includes on-chip
memory, external memory interfaces, and a rich set of peripheral connectivity
interfaces [6].
Figure 2.9 – Zynq-7000 SoC Architectural Overview
Among the main characteristics, the maximum CPU frequency is 667 MHz and the
total amount of DDR3 memory is 512 MB. The processor also has two USB 2.0
interfaces, Gigabit Ethernet and other interfaces, such as the UART that allows
the ZedBoard to interact with the Workstation.
The PL offers the designer the ability to implement their own custom logic which
can work alongside the software running on the processor cores. The two areas
of the device, PS and PL, are linked by a series of interfaces which adhere to the
AXI4 interconnect standard. These interfaces allow the designer to implement
custom logic in the PL which can be connected to the PS and extend the range of
peripherals which are available and visible in the processor’s memory map.
This technology is used to create a block of custom logic in the PL, and then add
control and status monitoring capabilities by using memory mapped registers
which the processors can access via the AXI4 interconnect [7].
For each receiver block, the possibility of exploiting this functionality is
evaluated and, where appropriate, the block is exported to hardware logic with
the purpose of speeding up the implementation.
CHAPTER 3
3 Physical Layer Design
This chapter explains in detail the functionalities of the physical layer. For
each block, different possible designs are discussed and their trade-offs are
evaluated, focusing on the implementation of the chosen one. Simulation and
performance results are then presented.
3.1 RF AND MODULATION SYSTEMS
The Physical Layer is responsible for activating and deactivating the physical
connection between the transmitter and the receiver. The subsystem modulates
the CLTUs onto the RF carrier. As discussed in Chapter 2.4, PLOP-2 is chosen
and the Carrier Modulation Modes (CMM) are reported in the following diagram.
Figure 3.1 – Carrier Modulation Modes (PLOP-2)
The CMMs consist of different states of data modulation which create the
physical telecommand channel.
With PLOP-2, CMM-1 is in charge of establishing and maintaining the RF
connection, so in this state no telecommand modulation is present. CMM-2
transmits the Acquisition Sequence to enable the receiving end to acquire bit
synchronization. CMM-3 deals with the transmission of one CLTU, while CMM-4
sends the Idle Sequence between CLTUs in order to maintain the modulation part
of the physical connection.
The next Sections explain the blocks that make up the physical layer
(Figure 2.4). For each block, the possible designs are shown and their
trade-offs are evaluated, focusing on the implementation of the chosen one.
Simulation and performance results are then presented.
3.2 AUTOMATIC GAIN CONTROL
In order to adjust the received signal strength to a certain desired value, an
Automatic Gain Control (AGC) block is implemented. The AGC should process
the signal and provide feedback to the RF chain. If the input power is high, it
can decide to reduce the signal strength to minimize the distortion due to
nonlinearity or, if the input power is low, it can increase the gain to reduce the
noise figure and boost the SNR.
Since this project concerns the digital baseband processing part, the AGC is
implemented as the first block in the chain, with the purpose of bringing the
signal to a reference value.
3.2.1 Block Design and Trade-off analysis
The response time plays an important role when designing the AGC. There is
usually a compromise between a loop that responds quickly to undesired input
level fluctuations and one that undesirably modifies the amplitude modulation
of the signal. This is because large and abrupt changes in the input level of
the signal may lead to unacceptable recovery behavior.
In some transponders, the AGC is designed to keep constant the average of the
absolute value of the voltage at the AGC output. In other transponders, the
AGC is designed to keep constant the Root Mean Square (RMS) voltage at the
AGC output [8].
The following block diagram reports the first of the two possible solutions.
Figure 3.2 – AGC Absolute value block diagram
In the diagram above a recursive algorithm for updating the AGC gain (G) is
applied:
$$G_{k+1} = G_k + \alpha\,\big(\mathit{ref} - E\{|r|\}\big)$$
where 𝛼 is the updating step size, ref is the reference value to which the average
absolute value of the signal must be brought and 𝐸{|𝑟|} is the average, over N
samples, of the absolute value of the Rx signal. The output is obtained by
multiplying each new received sample by the AGC gain. For both AGCs the gain is
assumed to be the same for the real and the imaginary part of the signal.
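The recursion can be sketched as follows. The sketch assumes that the average 𝐸{|𝑟|} is taken on the last N gain-scaled samples, which is what closes the feedback loop; the function name and the default values are chosen only for illustration and are not those of the thesis implementation.

```python
from collections import deque


def agc_absolute(samples, ref=1.0, alpha=1e-2, n_avg=16, gain=1.0):
    """Absolute-value AGC: G[k+1] = G[k] + alpha * (ref - E{|r|}).

    Returns the gain-scaled samples and the final gain.
    """
    window = deque(maxlen=n_avg)      # last n_avg output magnitudes
    out = []
    for x in samples:
        y = gain * x                  # same gain on real and imaginary part
        out.append(y)
        window.append(abs(y))
        mean_abs = sum(window) / len(window)   # E{|r|}, recomputed every sample
        gain += alpha * (ref - mean_abs)
    return out, gain
```

With a constant-amplitude input of 0.5 and ref = 1, the gain settles at 2. Note that the average is recomputed over the whole window at every sample, which is the costly part of this design.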
The second AGC computes the average RMS level of the signal and then
divides the signal by this value. The result is obtained using a single-pole IIR
filter:
$$p_{k+1} = (1 - \alpha)\,p_k + \alpha\,|x_k|^2, \qquad r_k = \sqrt{p_k}$$
where 𝑥𝑘 is the input signal, 𝑝𝑘 is the averaged power and 𝑟𝑘 is the RMS. The
diagram in Figure 3.3 summarizes the block implementation.
Figure 3.3 – AGC RMS block diagram
The filter has a single design parameter, the decay value 𝛼, and its response
is analogous to that of an electronic low-pass filter. The decay value 𝛼 is
related to the time constant 𝜏, expressed in samples. For example, with
𝛼 = 0.01, the time constant is 𝜏 = −1/ln(1 − 𝛼) ≈ 99.5 samples. Since the
sampling frequency is 𝑓𝑠 = 48 kHz, the time constant is around 2 ms, as can be
seen in the figure below.
Figure 3.4 – Effect of AGC RMS to Rx Signal
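A minimal sketch of the RMS-based AGC and of the time constant computation is reported here. The function names and the initial power estimate `power0` are assumptions made for the example; the initial estimate must be strictly positive to avoid a division by zero.

```python
import math


def agc_rms(samples, alpha=0.01, power0=1.0):
    """RMS AGC: p[k+1] = (1 - alpha) * p[k] + alpha * |x[k]|^2, r[k] = sqrt(p[k]).

    Each sample is divided by the current RMS estimate.
    Returns the normalized samples and the final RMS estimate.
    """
    p = power0
    out = []
    for x in samples:
        out.append(x / math.sqrt(p))                   # normalize with r[k]
        p = (1.0 - alpha) * p + alpha * abs(x) ** 2    # single-pole IIR update
    return out, math.sqrt(p)


def time_constant_samples(alpha):
    """Time constant of the single-pole filter, in samples."""
    return -1.0 / math.log(1.0 - alpha)
```

For 𝛼 = 0.01, `time_constant_samples(0.01) / 48000` gives about 2.07 ms, in agreement with the value quoted in the text.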
In the next subsection, the AGC type is chosen by evaluating the performance of
the blocks in a real-time implementation.
3.2.2 Simulation results
Simulations of the two different implementations are carried out on a real
signal.
As regards the first design, a compromise must be reached on how many samples
are used for the average. Since, from the specification, there are 48 samples
per symbol, 𝑁 = 256 samples are used for the mean of the received signal. This
is a reasonable trade-off, because the signal is observed over an interval of
about 5.3 ms (256 samples at 48 kHz), i.e. a little more than five symbols.
Figure 3.5 reports the trend of the G parameter for different step sizes 𝛼. The
AWGN samples are generated once and added each time to the same portion of the
Rx signal. 𝛼 is taken from 10^-6 up to 10^-2, since other values do not give
interesting results.
As is easy to see, the variance of the AGC gain parameter rises as 𝛼 increases.
However, higher values of 𝛼 lead to a faster convergence: for 𝛼 = 10^-6 it is
hard to see where the value converges after 1 second. When the step size
becomes too large, the system diverges (the AGC gain goes to −∞).
A good compromise is to take a value around 10^-4, where the variance of the
result is small and the convergence is quite fast.
Figure 3.5 – AGC Gain Value for different α (AGC absolute value)
The same simulations made for the first AGC are performed for the second one
(same conditions hold).
Also in this case it is possible to see the trend of the gain for the various
step sizes. The variance of the curves is almost equal to the previous case and
the choice of the 𝛼 parameter does not change.
Figure 3.6 – AGC Gain Value for different α (AGC RMS)
Both AGCs are implemented in C language and executed, as already anticipated,
on the Xilinx board.
The results, as the parameter α changes, are as expected. As regards the
performance in terms of clock cycles needed to update the gain parameter, the
first AGC performs worse.
3.2.3 Optimization and Performance results
An optimization is adopted for the computation of the average over N samples.
Instead of iterating every time over all the considered samples, an accumulator
that contains the average value is initialized and, when a new sample enters,
its contribution (divided by N) is added to the accumulator while that of the
oldest sample is removed. This avoids looping over the whole vector containing
the incoming signal, and the number of clock cycles needed to perform the
operation is considerably reduced. Table 3.1 reports the performance.
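The accumulator technique can be sketched as follows. The class name and its interface are hypothetical and only illustrate the idea of updating the mean with the newest and the oldest sample instead of summing N values each time.

```python
from collections import deque


class RunningMean:
    """Mean over the last n samples, updated in constant time per sample."""

    def __init__(self, n):
        self.n = n
        self.buf = deque([0.0] * n, maxlen=n)   # window, initially all zeros
        self.acc = 0.0                          # current mean of the window

    def update(self, sample):
        oldest = self.buf[0]
        self.buf.append(sample)                 # the oldest sample drops out
        self.acc += (sample - oldest) / self.n  # add the new one, remove the old one
        return self.acc
```

In floating point the accumulator can slowly drift from the exact mean, so a long-running implementation may want to recompute the sum from the buffer from time to time.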
Table 3.1 – AGC Performance
Type          Clock Cycles   Elapsed Time [µs]   Repetitions   CPU [%]
AGC 1         647            1.94                48000         9.31
AGC 1 mod.    131            0.39                48000         1.87
AGC 2         80             0.24                48000         1.15
For each implemented AGC, the table reports the clock cycles needed from the
instant in which a new sample enters the loop to the instant in which the AGC
gain value is available. For all the performance analyses, from now on, the CPU
utilization (in percentage) of the Zynq-7000 is calculated considering the
number of times that the operation is executed in one second. Since the
sampling frequency is 48 kHz, and the AGC calculates the gain at every new
sample, the operation is repeated 48000 times per second.
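The figures in the table can be reproduced with the short sketch below. The counter frequency of 333.5 MHz (half of the 667 MHz CPU clock) is not stated in the text: it is inferred from the ratio between clock cycles and elapsed time in Table 3.1 and should be read as an assumption, as should the function names.

```python
def cycles_to_us(cycles, counter_hz=667e6 / 2):
    """Elapsed time in microseconds; counter_hz is an inferred assumption."""
    return cycles / counter_hz * 1e6


def cpu_load_percent(elapsed_us, repetitions_per_s):
    """Fraction of one second spent in the operation, in percent."""
    return elapsed_us * 1e-6 * repetitions_per_s * 100.0
```

For AGC 1, `cpu_load_percent(1.94, 48000)` gives about 9.31 %, which matches the table.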
As can be noticed from Table 3.1, the performance of the modified version of
the first AGC (AGC 1 mod.) is comparable with that of the second AGC
implementation (AGC 2), but it is still worse in terms of computational needs.
For this reason, AGC 2 is chosen for the final implementation. One note on the
choice of α is needed: in the real implementation, a slightly different value
must be used to reach the same result as in the simulation environment.
3.3 SUB-CARRIER SYNCHRONIZER
In order to remove the sub-carrier, the baseband signal is simply multiplied by
the output of a local oscillator that (ideally) generates a pure
single-frequency sine wave.
The PSD of the signal, in which the presence of the subcarrier is evident, is
reported in the following picture.
Figure 3.7 – PSD of Rx Signal
The real process introduces the impairments described before, which must be
compensated. The phase of the sub-carrier generated by the local oscillator must
be aligned to the phase of the incoming signal, and the phase noise makes the
job more difficult.
If the oscillator were ideal, its spectrum could be represented by a Dirac
delta at the carrier frequency 𝑓𝑐, but the real one presents a bell-shaped
skirt due to the phase noise profile.
Figure 3.8 – Ideal (Left) vs Real (Right) Oscillator frequency response
The phase noise 𝜃(𝑡) is described in the frequency domain by its PSD in dBc/Hz.
It is the ratio between the noise power measured in 1 Hz bandwidth, at a
frequency offset 𝑓𝑚, and the power of the carrier [9].
3.3.1 Design and Trade-off analysis
The best estimate of the unknown phase 𝜃 based on the observation over an
interval is the phase maximizing the log-likelihood function (Maximum
Likelihood, ML). The open loop ML estimator suffers from two main drawbacks.
The first one is that the phase estimate is available only at the end of the
observation interval. This leads to the second drawback: the data received in
this interval have to be stored, in order to postpone every decision about them.
The solution is to use an iterative procedure. Assume that at the end of the k-th
carrier period an estimate 𝜃𝑘 of the carrier phase is available, so that it can
be used as the starting value to be updated according to the received signal
observed in the next carrier period. Since maximizing the log-likelihood
function is equivalent to requiring that its derivative be zero, the derivative
can be conditioned on the value 𝜃𝑘 obtained in the previous interval. At the end
of the training phase it is reasonable to assume that 𝜃𝑘 ≈ 𝜃. The sign of the
derivative indicates whether the estimate is smaller or greater than the real
phase value.
The recursive algorithm adopted to update the estimate is:
$$\theta_{k+1} = \theta_k - \alpha_k\, E\{e(\theta_k)\}$$
The new phase estimate is equal to the old one minus a coefficient that
multiplies the average of the error. The ensemble average 𝐸{∙} can be
substituted by a time average.
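The recursion can be illustrated with the sketch below, in which the ensemble average is replaced by a time average over `n_avg` error evaluations. The error function used in the example, sin(𝜃𝑘 − 𝜃), is only an assumption chosen because it is positive when the estimate is larger than the true phase; it is not the error signal of the thesis.

```python
import math


def track_phase(error_fn, theta0=0.0, alpha=0.1, n_iter=200, n_avg=1):
    """Iterate theta[k+1] = theta[k] - alpha * mean(e(theta[k]))."""
    theta = theta0
    history = []
    for _ in range(n_iter):
        avg_err = sum(error_fn(theta) for _ in range(n_avg)) / n_avg
        theta -= alpha * avg_err
        history.append(theta)
    return history


def example_error(true_phase):
    """Hypothetical noiseless error signal, zero when the estimate is exact."""
    return lambda theta: math.sin(theta - true_phase)
```

Calling `track_phase(example_error(0.7))` produces a sequence of estimates that approaches 0.7 rad; a larger `alpha` speeds up convergence at the price of a noisier estimate when the error is noisy.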
A practical implementation can be obtained through the Phase Locked Loop
(PLL). The adopted solution is reported in Figure 3.9
Figure 3.9 – PLL Block Diagram
The block diagram shows that the received baseband signal is multiplied by a
signal at the subcarrier frequency, generated by the Voltage Controlled
Oscillator (VCO), plus a phase offset given by the feedback loop. The loop
calculates the instantaneous phase of the signal 𝑥𝑘 as:
$$\theta_k = \tan^{-1}\left(\frac{\mathrm{Im}(x_k)}{\mathrm{Re}(x_k)}\right)$$
where Re and Im are the real and imaginary components of the signal
respectively. This value is used as the error 𝑒(𝜃𝑘) to change the frequency of
the VCO and, if the Loop Filter is not present, the PLL is of the First Order
(the error enters directly in the VCO). After the multiplication by the
subcarrier, the components are filtered in order to remove residual high
frequency terms and for the reasons reported in Section 3.4.
The choice of the discriminator is a trade-off between the performance of the
PLL and the required hardware resources, as well as complexity [10].
The optimum compromise is to use as discriminator the error signal reported in
the formula above. Looking at Figure 3.10 (left) and considering a constellation
point in the complex plane, the angle 𝜃 can be estimated as described.
Figure 3.10 – PLL Discriminator
The ratio between the imaginary and the real part is tan(𝜃), so the angle is
given by tan⁻¹(∙). Moreover, the ratio removes the data dependency, since the
data are present in both parts. The output of the discriminator is a bounded
function, Figure 3.10 (right), between −𝜋/2 and 𝜋/2, so the VCO input stays
under control.
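The data-independence of the discriminator can be checked with a few lines of code. The helper `received_symbol`, which builds a noiseless BPSK symbol rotated by a phase offset, is a hypothetical name introduced only for this sketch.

```python
import cmath
import math


def phase_discriminator(x):
    """atan(Im/Re) of a complex sample, bounded between -pi/2 and pi/2."""
    if x.real == 0.0:
        if x.imag == 0.0:
            return 0.0
        return math.copysign(math.pi / 2.0, x.imag)
    return math.atan(x.imag / x.real)


def received_symbol(bit, phase_offset):
    """Hypothetical noiseless BPSK symbol (+1 or -1) rotated by phase_offset."""
    amplitude = 1.0 if bit == 0 else -1.0
    return amplitude * cmath.exp(1j * phase_offset)
```

Both `received_symbol(0, 0.3)` and `received_symbol(1, 0.3)` give a discriminator output of 0.3 rad, because the sign of the data cancels in the ratio.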
The PLL can be of different orders. As already anticipated, if the error signal is
input to the VCO without further filtering, the PLL is of the first order. The block
diagrams are shown below
Figure 3.11 – First Order PLL (up) – Second Order PLL (down)
The transfer function of the first order PLL is:
$$H_{eq}(s) = \frac{A k_1}{s + A k_1}$$
where 𝐴 and 𝑘1 are the gains of the loop.
The drawback of the first order PLL is that the residual phase offset is a
function of the frequency offset.
Figure 3.12 – Steady-state errors of first order PLL
As can be seen, the PLL is able to completely recover a phase offset, with
the response in Figure 3.12 (left). A constant frequency offset translates into
a constant phase offset (center). Finally, if a non-constant frequency offset
is present, the PLL phase error goes to infinity (right).
For this reason a second order PLL is implemented in this project: it is able
to track not only the phase, but also the frequency offset. If the frequency
offset is non-constant, the phase is recovered with a constant residual error.
The following table summarizes the steady-state errors of a PLL.
Table 3.2 – Steady-state errors of a PLL
Errors                      1st order PLL   2nd order PLL   3rd order PLL
Phase Offset                0               0               0
Constant Freq. Offset       Constant        0               0
Non-Constant Freq. Offset   ∞               Constant        0
The case of a third order PLL is reported, but it is not implemented since the
complexity of the system grows. Moreover, increasing the tracking capability of
the PLL puts more noise in the system, since the equivalent bandwidth of
the PLL increases. Lastly, a fine phase recovery loop of the first order is
inserted in the block diagram of the receiver, which translates into an
increase of the overall loop order and bandwidth.
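From Figure 3.11 it is possible to describe the system with the following
transfer function:
$$H_{eq}(s) = \frac{4\varsigma\pi f_n s + (2\pi f_n)^2}{s^2 + 4\varsigma\pi f_n s + (2\pi f_n)^2}$$
where
$$f_n = \frac{1}{2\pi}\sqrt{\frac{A k_1}{\tau_1}}$$
is the natural frequency,
$$\varsigma = \frac{\tau_2}{2}\sqrt{\frac{A k_1}{\tau_1}}$$
is the damping factor and
$$H(s) = \frac{1 + s\tau_2}{s\tau_1}$$
is the transfer function of the loop filter (with 𝜏1 in the denominator,
consistently with the expressions of 𝑓𝑛 and 𝜍).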
These are the three most important parameters in a PLL design and they will be
analyzed in detail later.
In order to convert the model from continuous to discrete time it is possible
to use the Bilinear Transform [11], which substitutes the variable s with the
following approximation:
$$s \leftarrow \frac{2}{T}\,\frac{z - 1}{z + 1}$$
where T is the sampling period. The result, after some algebraic operations and
with 𝜔𝑛 = 2𝜋𝑓𝑛, is:
$$H_{eq}[z] = \frac{\left(\frac{4}{T}\varsigma\omega_n + \omega_n^2\right) z^2 + 2\omega_n^2 z + \omega_n^2 - \frac{4}{T}\varsigma\omega_n}{\left(\frac{4}{T^2} + \frac{4}{T}\varsigma\omega_n + \omega_n^2\right) z^2 + \left(2\omega_n^2 - \frac{8}{T^2}\right) z + \frac{4}{T^2} - \frac{4}{T}\varsigma\omega_n + \omega_n^2}$$
Consequently, because 𝐻𝑒𝑞(𝑠) is a second-order analog filter, 𝐻𝑒𝑞[𝑧] is a
second-order digital filter. Dividing numerator and denominator by the leading
denominator coefficient and by 𝑧², the system can be represented in the
following form:
$$H_{eq}[z] = \frac{b_0 + b_1 z^{-1} + b_2 z^{-2}}{1 + a_1 z^{-1} + a_2 z^{-2}}$$
where 𝑏0, 𝑏1 and 𝑏2 are the feed-forward coefficients and 𝑎1, 𝑎2 are the feedback
coefficients. The functional block diagram is reported in the figure below (𝑧⁻¹
represents a single sample delay).
Figure 3.13 – IIR Biquad Filter
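The mapping from the design parameters to the biquad coefficients can be sketched as follows. The function names are assumptions; the coefficients are those of the bilinear-transformed transfer function, normalized so that the leading feedback coefficient is 1, and the filter is written in direct form I.

```python
import math


def pll_biquad_coeffs(damping, omega_n, t_s):
    """Feed-forward (b) and feedback (a) coefficients of Heq[z].

    damping -- damping factor
    omega_n -- natural frequency in rad/s
    t_s     -- sampling period in seconds
    """
    c = 4.0 * damping * omega_n / t_s
    w2 = omega_n ** 2
    k = 4.0 / t_s ** 2
    a0 = k + c + w2                       # normalization term
    b = [(c + w2) / a0, 2.0 * w2 / a0, (w2 - c) / a0]
    a = [1.0, (2.0 * w2 - 2.0 * k) / a0, (k - c + w2) / a0]
    return b, a


def biquad_filter(b, a, x):
    """Direct form I second-order IIR filter."""
    x1 = x2 = y1 = y2 = 0.0
    y = []
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y
```

Feeding a unit step to `biquad_filter` with the coefficients obtained for a given damping factor and natural frequency gives the closed-loop step response, which settles at 1 because the DC gain of 𝐻𝑒𝑞 is unitary.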
A final observation concerns the design of the VCO. The signal, in
practice, can be generated using the ‘math.h’ library. The generated sin and cos
have a high accuracy, but the process is quite expensive in terms of clock cycles
(see 3.3.2). An alternative method is to use a Lookup Table (LUT) with a
Numerically Controlled Oscillator (NCO). Internally, the NCO keeps track of
the phase of the sine wave it produces, and it increments this phase at each
sample point [12]. Obviously, some precision is lost, but the loss can be
alleviated by exploiting some tricks (saving the error computed at each
iteration) and by choosing an acceptable dimension for the LUT. Three different
phase values are sufficient to represent the sinusoid, since the sampling
frequency is three times the subcarrier frequency.
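A possible structure of an NCO based on a LUT is sketched below. The class name, the default table size and the use of a fractional phase accumulator (which keeps the rounding error from one sample to the next) are assumptions for illustration; the table size must be a multiple of 4 so that the cosine can be read from the sine table with a quarter-period offset.

```python
import math


class LutNco:
    """Sine/cosine generator based on a phase accumulator and a lookup table."""

    def __init__(self, freq_hz, sample_rate_hz, table_size=1024):
        if table_size % 4 != 0:
            raise ValueError("table_size must be a multiple of 4")
        self.size = table_size
        self.table = [math.sin(2.0 * math.pi * i / table_size)
                      for i in range(table_size)]
        # Phase increment per sample, expressed in table entries
        self.step = freq_hz * table_size / sample_rate_hz
        self.phase = 0.0

    def next(self):
        """Return (cos, sin) for the current sample and advance the phase."""
        idx = int(self.phase) % self.size
        s = self.table[idx]
        c = self.table[(idx + self.size // 4) % self.size]   # cos(x) = sin(x + pi/2)
        self.phase = (self.phase + self.step) % self.size
        return c, s
```

With `LutNco(16000, 48000, table_size=12)` the step is exactly 4 entries, so only three entries of the table are ever read for each output: this is the case in which the sampling frequency is three times the oscillator frequency.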
3.3.2 Simulation and Performance results
Considering the second order PLL (IIR Biquad Filter) and fixing the loop gain
parameters 𝜏1 and 𝜏2, it is possible to verify the step response and the
stability of the system using MATLAB.
Figure 3.14 – Step response (Left) – Nyquist diagram (Right) with 𝜍 = 0.237 and 𝜔𝑛 = 316.23
With these values, the result is the one reported in Figure 3.14. The Nyquist
diagram shows that the system is stable, since the plot is far from the
critical value -1. The overshoot in the step response is due to the low value
of 𝜍. When the solution is implemented in the MATLAB simulation of the
Receiver, the result reflects the study just done. The output phase in time is
shown below.
Figure 3.15 – Estimated Phase in Rx simulation – First test
The system can be brought to a slower convergence by modifying the parameters.
The following test shows the difference with respect to the first case. The
overshoot is smaller and the step response is slower, since the damping factor
is higher and the natural frequency is lower.
Figure 3.16 – Step response (Left) – Nyquist diagram (Right) with 𝜍 = 0.75 and 𝜔𝑛 = 10
Also in this case, verifying the solution in the Rx simulation, the phase has
the following trend.
Figure 3.17 – Estimated Phase in Rx simulation – Second test
It is possible to notice that the system response is now slower to converge to
a constant value and that only one overshoot is present. The inset shows a zoom
of a portion of the plot, where the phase value can be seen to oscillate. This
small residual oscillation appears once the phase is recovered and indicates
that the PLL is working correctly.
When the simulation is ported to C language, the result is not the expected
one: the error oscillates between its limit values, so the PLL is not able to
track the phase. The reason is that, when matched filtering is used, the
subcarrier synchronizer must work at the optimum sampling instant, since it
must know the period over which it must integrate the result. Moreover, as
regards the choice of the loop bandwidth 𝜔𝑛, the smaller it is, the smaller the
tracking error, but also the smaller the frequency offset that the PLL can
follow. For this reason, a good compromise is reached by increasing its value.
The coefficients of the IIR filter are calculated in MATLAB fixing the sampling
time to 1 ms (duration of one symbol), the damping factor to √2/2 (the typical
value) and 𝜔𝑛 = 110 rad/s.
Figure 3.18 reports the corresponding Bode Diagram, which shows the
frequency response of the system in magnitude and phase.
Figure 3.18 – Bode Diagram with 𝜍 = 0.707, 𝜔𝑛 = 110 and 𝑇𝑠 = 1 ms
The step response has an overshoot and the system converges to the final value
quite fast.
Figure 3.19 – Step response with 𝜍 = 0.707, 𝜔𝑛 = 110 and 𝑇𝑠 = 1 ms
An important parameter is the noise equivalent bandwidth of the PLL that, for
a second order PLL and for this system, is:
$$B_{eq} = \pi f_n \sqrt{\varsigma + \frac{1}{4\varsigma}} = 56.64\ \mathrm{Hz}$$
The experimental results confirm that, as in theory, the larger the noise
equivalent bandwidth, the more additive noise enters the system. However, the
larger the bandwidth, the larger the frequency offset that the loop can track.
A problem arises in the real implementation of the IIR filter. It can be shown
that systems containing non-linearities and feedback loops are always prone to
autonomous oscillations, or limit cycles, even if floating-point numbers are
used [13]. To avoid the problem, the estimated phase is clipped to the interval
from 0 to 2𝜋, but another undesired effect appears, as shown in Figure 3.20.
Figure 3.20 – Estimated Phase (Left) – Scattering Diagram after PLL (Right)
For these reasons, the final implementation of the PLL is slightly different from
Figure 3.13, but the same study, just done, holds.
As already anticipated in the design subsection, an NCO is implemented. In the
following table is shown the difference between removing the presence of the
subcarrier multiplying the incoming signal by a signal generated by the ‘math.h’
library and using a LUT
Table 3.3 – Subcarrier removal Performance
Type     Clock Cycles   Elapsed Time [µs]   Repetitions   CPU [%]
math.h   501            1.50                48000         7.20
LUT      134            0.40                48000         1.92
As expected, the gain obtained by using the Lookup Table is considerable. The
trade-off is in the dimension of the LUT: the larger it is, the better the
accuracy of the result, but the higher the occupied memory. Some tricks can be
used to reduce the size of the LUTs, such as exploiting the same table for the
sine and the cosine, since the two differ only by a quarter-period phase shift.
The CPU utilization of the Sub-Carrier Synchronizer (to be summed with the
value in Table 3.3) is very low. It is executed once per symbol and the
performance is reported in the following table.
Table 3.4 – Sub-Carrier Synchronizer Performance
Type                       Clock Cycles   Elapsed Time [µs]   Repetitions   CPU [%]
Sub-Carrier Synchronizer   212            0.64                1000          0.06
In summary, the stability study plays an important role in the design of a PLL.
More in general, some of the key points to keep in mind are reported in the
following list:
• The convergence speed depends mainly on the damping factor: the smaller
it is, the faster the convergence of the PLL, but the more overshoots are
present;
• The smaller the Loop Bandwidth, the smaller the tracking error, but also
the smaller the frequency offset that the PLL can follow;
• The larger the Noise equivalent bandwidth (it depends on the damping
factor and on the PLL loop bandwidth), the higher the additive noise in the
system;
• When matched filtering is used, the PLL must know the period over which
it integrates the result;
• When the PLL is locked, the estimated phase continuously oscillates in a
small range;
• In the implementation of a PLL, cycle slips are present and they must be
solved by slightly changing the structure of the block.
3.4 FILTER
The Rx filter is matched to the one used at the Tx side. The filter used at the
Tx is called shaping pulse, while the one used at the Rx is called matched
filter. Their purpose is to shape the spectrum by limiting the effective
bandwidth of the signal and to reduce the effect of the noise. More in
general, the system should not have Intersymbol Interference (ISI), so it must
satisfy the Nyquist criterion. When multiple symbols are transmitted, the
impulse response of the channel causes each transmitted symbol to be spread in
the time domain, so that at the receiving end the symbols overlap. The Nyquist
theorem translates the time domain condition into a frequency domain condition,
providing a method to construct band-limited functions.
Two different kinds of filter are presented theoretically and then implemented
in practice, showing how they differ.
3.4.1 Design and Trade-off analysis
The first filter used at the Tx is a rectangular pulse with length equal to the
symbol length (Figure 3.21)
Figure 3.21 – Shaping pulse at the Tx in time
In order to match the filter at the Tx, a Moving Average (MA) filter is used at
the Rx. It is a Low Pass Finite Impulse Response (FIR) filter that takes N
samples of input at a time, takes their average and produces a single output
point.
Figure 3.22 – FIR filter Block Diagram
Figure 3.22 reports the typical representation of a FIR filter, where 𝑢𝑛 are
the taps of the filter, 𝑟[𝑛] is the signal to be filtered, 𝑧⁻¹ represents a
single delay line and 𝑦[𝑛] is the filtered signal. The output is described as
the convolution between the input signal and the impulse response of the filter
that, in the discrete domain, is:
$$y[n] = \sum_{k=0}^{N-1} r[n-k]\, u[k]$$
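The convolution can be written directly as a double loop, as in the sketch below. The function names are assumptions; samples before the start of the signal are taken as zero, and the MA filter is obtained with N equal taps of value 1/N.

```python
def fir_filter(taps, signal):
    """y[n] = sum over k of r[n - k] * u[k], with r[n] = 0 for n < 0."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k in range(len(taps)):
            if n - k >= 0:
                acc += signal[n - k] * taps[k]
        out.append(acc)
    return out


def moving_average_taps(n):
    """Taps of a moving average filter over n samples."""
    return [1.0 / n] * n
```

The inner loop over all the taps is executed for every incoming sample, which is why this block is expensive when it runs at the full sampling rate.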
Neglecting the noise, the output of the filter is reported in the following plot.
Figure 3.23 – Rx Signal filtered (MA) – No Noise
The second type of filter is the Square Root Raised Cosine (SRRC). It is a
particular case of Nyquist filter and its impulse response is defined as follows:
$$h(t) = \frac{2\beta}{\pi\sqrt{T}} \cdot \frac{\cos\left[(1+\beta)\pi t/T\right] + \dfrac{\sin\left[(1-\beta)\pi t/T\right]}{4\beta t/T}}{1 - (4\beta t/T)^2}$$
where T is the symbol time and β is the rolloff factor, a measure of the
excess bandwidth of the filter, i.e., the bandwidth occupied beyond the Nyquist
bandwidth of 1/(2𝑇) [14].
Figure 3.24 – SRRC Impulse Response at different roll-off
The impulse response for different rolloff factors is reported in the plot
above. The filter, theoretically, has an infinite number of taps, so it has
infinite attenuation in the stop band. However, in the implementation, its
length must be reduced to a finite value.
A smaller rolloff gives a narrower bandwidth; however, the side lobes increase,
so the attenuation in the stop band is reduced. In other words, low values of β
allow for a more efficient use of the spectrum but increase the ISI.
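A truncated SRRC filter can be generated as in the following sketch. The function name and its arguments are assumptions. The taps are normalized to unit energy, so the leading constant of the impulse response does not affect the result, and the two points where the expression becomes 0/0 (t = 0 and |t| = T/(4β)) are replaced by their limits. The sketch returns an odd number of taps, symmetric around t = 0.

```python
import math


def srrc_taps(beta, sps, span):
    """Unit-energy SRRC taps.

    beta -- rolloff factor (0 < beta <= 1)
    sps  -- samples per symbol
    span -- filter length in symbols (the response is cut at +/- span/2)
    """
    n_half = span * sps // 2
    taps = []
    for n in range(-n_half, n_half + 1):
        t = n / sps                                   # time in symbol periods
        if n == 0:
            h = 1.0 - beta + 4.0 * beta / math.pi     # limit for t -> 0
        elif abs(abs(4.0 * beta * t) - 1.0) < 1e-9:   # limit for |t| -> T/(4*beta)
            h = (beta / math.sqrt(2.0)) * (
                (1.0 + 2.0 / math.pi) * math.sin(math.pi / (4.0 * beta))
                + (1.0 - 2.0 / math.pi) * math.cos(math.pi / (4.0 * beta)))
        else:
            num = (math.sin(math.pi * t * (1.0 - beta))
                   + 4.0 * beta * t * math.cos(math.pi * t * (1.0 + beta)))
            den = math.pi * t * (1.0 - (4.0 * beta * t) ** 2)
            h = num / den
        taps.append(h)
    norm = math.sqrt(sum(h * h for h in taps))
    return [h / norm for h in taps]
```

For example, `srrc_taps(0.25, 8, 6)` returns 49 taps; with this rolloff the singular point |t| = T falls exactly on a sample, which is why the limit must be handled explicitly.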
In Figure 3.24, the filter is represented using 8 samples per symbol and its
response is cut at 6 symbols. This trade-off is chosen in order to compare the
MA filter with the SRRC one using the same number of taps (8 ∙ 6 = 48). In this
way the complexity of the filtering stage is not increased and a reasonable
number of side lobes is considered. From now on, all the following blocks work
at a lower number of samples per symbol, so the resolution of the result is
lower and a trade-off must be reached.
The downsampling of the Rx signal, at this stage, is simply done by taking one
sample every six:
$$\text{downsampling factor} = \frac{f_s}{f_{s,down}} = \frac{48000}{8000} = 6$$
Performance analyses are presented in the next subsection.
3.4.2 Simulation and Performance results
A useful tool for the evaluation of the combined effects of channel noise
and ISI on the performance of a baseband pulse-transmission system is the Eye
Diagram. It is an oscilloscope display in which a digital signal from a receiver is
repetitively sampled and applied to the vertical input, while the data rate is used
to trigger the horizontal sweep.
Several system performance measures can be derived by analyzing the display.
An open eye pattern corresponds to minimal signal distortion. Distortion of the
signal waveform due to ISI and noise appears as closure of the eye pattern [15]
Figure 3.25 – Eye diagram with MA filter (Left) – Scattering Diagram (Right) - Eb/N0 = 25
The figure above reports the eye diagram after the MA filter with 𝐸𝑏/𝑁0 = 25.
The scattering diagram at the same 𝐸𝑏/𝑁0 is also shown in the figure (right).
The filtered signal with the SRRC filter is shown in Figure 3.26
Figure 3.26 - Rx Signal filtered (SRRC, rolloff = 0.25) – No Noise
A rolloff of 0.25 is used at both the Tx and Rx sides. The difference with respect
to the MA filter is that the signal is rounded and the Subcarrier Synchronizer was
able to completely recover the phase of the signal.
With a large rolloff, the eye diagram is completely open: if the sampling instant
is the ideal one there is no ISI, and a small sampling time error produces only a
small ISI. Instead, if the rolloff is small, a small sampling time error produces
a large ISI.
In the implementation on the development board, the MA filter is used (the SRRC
filter implementation is left as Future Work) and the taps are all equal in order
to generate the impulse response of the filter in Figure 3.21.
As already anticipated, one of the objectives of the project is to find blocks
that can be moved to a hardware implementation. The performance analysis shows
that the FIR filter is very expensive in terms of CPU utilization since, every
time a new sample of the signal enters the system, a loop must be performed
to multiply all the taps of the filter by the most recent samples of the signal.
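The per-sample loop mentioned above can be sketched as follows. This is only an illustrative Python model, not the C code running on the board: the names `make_ma_taps` and `FirFilter` are hypothetical, and the number of taps is left as a parameter.

```python
from collections import deque


def make_ma_taps(num_taps):
    """Equal taps (moving average) with unit DC gain; illustrative helper."""
    return [1.0 / num_taps] * num_taps


class FirFilter:
    """Direct-form FIR filter: one multiply-accumulate loop per input sample."""

    def __init__(self, taps):
        self.taps = list(taps)
        # Delay line holding the most recent samples, newest first.
        self.delay_line = deque([0.0] * len(self.taps), maxlen=len(self.taps))

    def push(self, sample):
        # Insert the new sample; the oldest one falls off the other end.
        self.delay_line.appendleft(sample)
        acc = 0.0
        # This loop runs for every incoming sample, which is why the
        # filter dominates the CPU budget in a software implementation.
        for tap, past in zip(self.taps, self.delay_line):
            acc += tap * past
        return acc
```

With `num_taps` taps, each call to `push` costs `num_taps` multiplications and additions, so the cost grows linearly with both the filter length and the sample rate.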
The following table shows the gain in CPU utilization between the two
implementations.
Table 3.5 – Filter Performance
| Type | Clock Cycles | Elapsed Time [µs] | Repetitions | CPU [%] |
|---|---|---|---|---|
| Filter 1 | 1888 | 5.67 | 48000 | 27.22 |
| Filter 2 | 603 | 1.81 | 48000 | 8.69 |
Filter 1 is the software implementation of the filter, while Filter 2 is its
hardware implementation, which reduces the clock cycles needed to filter one
sample by a factor of about three. The hardware implementation of the block is
discussed in detail in Section 6.5.
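As a quick check of the figures in Table 3.5, the CPU column can be reproduced from the elapsed time and the number of repetitions, assuming (as done for the other tables of this Chapter) that the repetitions are counted over one second. The function name below is hypothetical.

```python
def cpu_utilization_percent(elapsed_us, repetitions_per_second):
    """Share of one second spent in a block, in percent."""
    busy_us = elapsed_us * repetitions_per_second
    return 100.0 * busy_us / 1_000_000.0


# Values taken from Table 3.5.
software = cpu_utilization_percent(5.67, 48000)   # Filter 1
hardware = cpu_utilization_percent(1.81, 48000)   # Filter 2
speedup = 1888 / 603                              # ratio of clock cycles
```

The same relation applies to the other performance tables of this Chapter.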
3.5 CLOCK RECOVERY
The ADC marks the boundary between the analog and digital parts of the
signal processing chain and provides the DSP block with a sequence of samples
at a certain rate (~N samples per symbol). Clock recovery is required to determine
the optimum sampling instant in order to down-sample the signal to 1 sample
per symbol.
Moreover, the clocks used to sample the signal are not ideal and introduce an error
on the sampling time, which manifests itself as noise and can limit the system
performance. This timing error is known as jitter and is expressed in seconds.
3.5.1 Design and Trade-off analysis
The optimum sampling instant is the one that maximizes the power of the
discrete signal (independently of the data). This translates into finding the
maximum of the likelihood of the sampling time offset (τ):
$$\Lambda(\tau) = E\left[\,|r(nT + \tau)|^2\,\right]$$
The likelihood of τ is the average power of the received signal sampled with
offset τ. In other words, the best possible sampling instant is the one for which
the output sequence, after sampling, has the maximum possible power.
This kind of clock recovery is an Open Loop Estimation. Calculating τ requires a
window of L samples, so this block introduces delay in the processing.
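A minimal sketch of the open loop estimation is given below, under the assumption that the expectation is replaced by an average over a finite number of symbols and that τ is searched only on the sample grid. The function name and its arguments are hypothetical.

```python
def open_loop_timing(samples, sps, num_symbols):
    """Return (offset, power) for the sample offset with maximum average power.

    samples     : filtered received samples (real or complex)
    sps         : samples per symbol
    num_symbols : number of symbols averaged (window of sps * num_symbols samples)
    """
    best_offset, best_power = 0, -1.0
    for offset in range(sps):
        # Take one sample per symbol, always at the same offset.
        picked = [samples[offset + k * sps] for k in range(num_symbols)]
        power = sum(abs(v) ** 2 for v in picked) / num_symbols
        if power > best_power:
            best_offset, best_power = offset, power
    return best_offset, best_power
```

The window length `sps * num_symbols` is the delay introduced by the block: a longer window gives a more reliable estimate at the price of a longer wait before the first decision.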
Once the optimum sampling instant is found, an interpolation is needed because
it can fall between two samples. This increases the complexity of the system, but
it may not be necessary if there is no downsampling after the filtering stage.
There are different kinds of interpolator; the least complex is the linear one,
which provides a very rough interpolation between samples.
Figure 3.27 – Linear interpolation between 4 samples
The figure above shows an example of the linear interpolator where 4 samples
per symbol are taken into account.
In this respect, if the filter used at the receiver is the MA (which works with 48
samples per symbol), the error made by using the sample nearest to τ is negligible.
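For illustration only, a linear interpolator can be written in a few lines. The function name is hypothetical and the position is assumed to be non-negative and expressed in units of samples.

```python
def linear_interpolate(samples, position):
    """Value at a fractional sample position, on a straight line between neighbours."""
    index = int(position)        # sample just before the requested position
    frac = position - index      # fractional distance to the next sample
    if frac == 0.0:
        return samples[index]
    return (1.0 - frac) * samples[index] + frac * samples[index + 1]
```

With few samples per symbol the straight-line approximation is coarse, which is consistent with the remark above that, at 48 samples per symbol, simply taking the nearest sample is already accurate enough.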
In this project, the Open Loop Timing estimation is performed once in the
receiver chain and again whenever the symbol clock synchronization is lost.
Once the symbol lock is acquired, a different kind of estimation is used: a Closed
Loop Timing estimation.
This technique takes the derivative of the likelihood (the derivative of the power
of the observation with respect to τ):
$$\frac{d\Lambda(\tau)}{d\tau} = \mathrm{Re}\left( r^*(kT + \tau)\, r'(kT + \tau) \right)$$
The derivative can be approximated by a finite difference:
$$r'(kT + \tau) = r(kT + \tau + T/2) - r(kT + \tau - T/2)$$
The loop is called Delay Locked Loop (DLL). In particular, the error calculated as
above, which takes the samples at the current time ($z_{2k}$), at time
$\tau + T/2$ ($z_{2k+1}$) and at time $\tau - T/2$ ($z_{2k-1}$), is called
Early-Late correlator.
The block diagram is reported in Figure 3.28.
Figure 3.28 – Delay Locked Loop for Timing estimation
The circuit updates the error made in the calculation of the clock timing offset
and only needs 2 samples per symbol. The initialization is done using the
optimum sampling instant derived from the open loop. The error is computed by
taking the product of the middle tap (complex conjugate) times the difference of
the previous and the next. The error is used to update τ with a gradient algorithm:
$$\tau_{k+1} = \tau_k + \gamma e_k$$
This is a first order loop and its drawback is error propagation.
Since the sample at time $\tau + T/2$ is not yet available, the derivative can be
calculated by delaying the correlator by $T/2$, obtaining the Gardner correlator,
which is the one implemented in the project.
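The behaviour of the loop can be illustrated with a small sketch. It is not the project code: it assumes a real-valued, noise-free test waveform (alternating BPSK symbols with period T = 1, modelled as a cosine) that can be evaluated at any time instant, so no interpolator is needed. The function names and the values of the parameters are hypothetical.

```python
import math


def received(t):
    """Illustrative noise-free waveform: alternating symbols, period T = 1."""
    return math.cos(math.pi * t)


def gardner_error(prev_strobe, midway, curr_strobe):
    """Middle sample times (previous - next), for a real-valued signal."""
    return midway * (prev_strobe - curr_strobe)


def track_timing(tau0, gamma, num_symbols):
    """First order loop: tau_{k+1} = tau_k + gamma * e_k (tau in symbol periods)."""
    tau = tau0
    for k in range(1, num_symbols + 1):
        e = gardner_error(
            received(k - 1.0 + tau),   # previous strobe
            received(k - 0.5 + tau),   # midway sample
            received(k + tau),         # current strobe
        )
        tau = tau + gamma * e
    return tau
```

For this test waveform the error is proportional to $-\sin(2\pi\tau/T)$, so the loop drives τ towards zero from both positive and negative initial offsets. With real data the error is noisier, because it is non-zero only on symbol transitions.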
3.5.2 Simulation and Performance results
There are no particular problems in the implementation of the Open Loop
Timing on the Xilinx board. The only trade-off here is the number of symbols
needed to find the optimum sampling instant.
It can be shown that, with fewer than 50 symbols, some errors can appear if the
signal-to-noise ratio is low. Taking 100 (or more) symbols gives a good estimate
of the optimum sampling instant at which the incoming signal must be sampled.
Obviously, the higher the number of symbols taken into account, the higher the
delay introduced in the processing chain.
The proof that the Open Loop Timing provides the correct sampling instant is
shown below.
Figure 3.29 – Signal’s Energy sampling at different τ
Figure 3.29 illustrates the signal’s energy calculated by sampling the signal at
different sampling instants (shifting the signal by one sample each time).
The energy is calculated by squaring the real part of the filtered signal (since
the imaginary part, after the PLL, carries no more information) over a fixed
period of time. The highest value of the energy is at the optimum sampling
instant, when no further delay is added to the signal. Furthermore, the effect of
sampling at the wrong τ can be seen in the scattering diagram: the two
constellation points gradually move closer to each other as the delay increases,
and they collapse into a single point when the signal is sampled with
$\tau = N/2 = 24$ samples (the energy is equal to zero).
In the real implementation, the Open Loop Timing estimation is followed by a
Closed Loop that is able to follow variations in the incoming signal
due to the different rates of the sample clocks of the transmitter and
receiver.
The Channel Model in GNU Radio is able to simulate this timing jitter, which
gives the user a more realistic environment. The difficulty in
implementing a Closed Loop timing is the tuning of the parameter $\gamma$.
Its value depends on the SNR: the lower the signal-to-noise ratio, the
lower the step size should be, since the noise reduces the stability of the loop.
However, simulations show that noise can help the loop to escape
from some minima (since it is a gradient algorithm). When the SNR is high and
$\gamma$ is too small, variations in the timing are not followed by the loop.
An acceptable trade-off can be reached by tuning the parameter starting from a
very low value ($\gamma = 10^{-6}$) and increasing it until the loop approaches
instability. A value similar to the one extracted from the MATLAB simulation is
chosen.
The performance of the two kinds of loop described above is reported here.
Table 3.6 – Clock Recovery Performance
| Type | Clock Cycles | Elapsed Time [µs] | Repetitions | CPU [%] |
|---|---|---|---|---|
| Open Loop | 135429 | 406.69 | 1 | 0.04 |
| Closed Loop | 140 | 0.42 | 1000 | 0.04 |
In the table above it is assumed that the Open Loop Timing is performed once per
second. In reality, it is performed every time the Symbol Lock is lost. The reason
for having a Symbol Lock detector is that, when the clock recovery is not able to
find the optimum sampling instant (e.g. SNR is too low, timing jitter is too high,
etc.), the receiver should be able to signal the event.
In the literature there are different kinds of symbol lock detectors that would
perform quite well in this project. The main problem is to make the
detector independent of the SNR and of all the other blocks of the receiver (e.g.
independent of the AGC).
One detector is briefly reported here, showing some optimizations in the real
implementation. The detector described here requires only two samples per
symbol (the same used by the Gardner timing recovery), has good performance
with BPSK and works well in AWGN [16]. The algorithm is:
• Calculate the error at the optimum sampling instant (the same as
Gardner): $u(k) = r(kT + \tau - T/2) \cdot [r(kT + \tau) - r(kT + \tau - T)]$;
• Calculate the error shifted by half a symbol (at the midway sample):
$w(k) = r(kT + \tau) \cdot [r(kT + \tau + T/2) - r(kT + \tau - T/2)]$;
• Take the expectation of both errors and compute the difference:
$\mathit{Diff} = E\{|w(k)|\} - E\{|u(k)|\}$.
The working principle can be understood better by tracing the S-curves of the
two errors. Shifting τ away from the optimum sampling instant, the two
expectations are equal with opposite sign.
Figure 3.30 – Timing Error Detector S-curves for Strobe and Midway samples
From the plot it is easy to see that when Diff goes to zero the symbol lock can
be declared. Obviously, this method works well when the Acquisition
Sequence is present, but when data are sent the errors increase. Also, if the
expectation of the error is taken over a reasonable interval, the error increases
too much when the data sequence is too long. So the practical implementation is
not very appealing, also because it turns out to be very sensitive to the SNR.
The detector can be slightly modified by also taking into account the sign of the
sample at the optimum sampling instant [17]. Only the error calculated by the
Gardner detector is needed, and it seems to perform well in the conditions of the
presented project.
In order to declare the symbol lock, a lookup table can be derived, at different
SNRs, to set the threshold checked by the detector at each optimum sampling
instant. The performance is reported in Table 3.7.
Table 3.7 – Symbol Lock Detector Performance
| Type | Clock Cycles | Elapsed Time [µs] | Repetitions | CPU [%] |
|---|---|---|---|---|
| Symbol Lock Detector | 176 | 0.53 | 1000 | 0.05 |
| Get threshold from LUT | 113 | 0.34 | 1 | 0.00 |
3.6 PHASE RECOVERY
Tx and Rx carrier frequencies are slightly different so, after frequency
down-conversion at the receiving end, the signal has a residual frequency offset.
Separate estimations of the frequency and phase offsets are necessary when the
former is very high. However, since a frequency offset translates into a linear
phase variation (in time), a phase recovery loop alone can be used when
the frequency offset is small. A PLL is already implemented in the Subcarrier
Synchronizer block, so a similar one is explained here that works directly on
the symbol, making a decision on it.
3.6.1 Design and Trade-off analysis
The block implemented here is a Costas Loop. It has a Frequency Discriminator
Device (FDD), a non-linear device returning an error signal proportional
to the frequency of the input. The error is calculated by setting to zero the
derivative of the log-likelihood of the estimated phase. The result is:
$$e_k = \mathrm{Im}\{ a_k^* \, x_k \, e^{-j\hat{\theta}_k} \}$$
where $a_k$ is the hard decision taken on the incoming symbol $x_k$. At this point
it is assumed that all the information about the bit is on the real part of the
signal, so the decision is taken by looking only at it. Hard decision means that
the value of the symbol is chosen by comparing it with a threshold: if the symbol
is above the threshold it is demodulated as 1, otherwise as 0. The symbols are
passed to the layer above (Coding Layer).
Figure 3.31 – Costas Loop Block Diagram
The updated value of the phase is given by a gradient algorithm and is equal to:
$$\theta_{k+1} = \theta_k + \gamma e_k$$
where $\gamma$ is the VCO sensitivity.
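The error computation and the phase update can be sketched as follows. This is a simplified, noise-free illustration and not the code of the project: BPSK symbols are represented as ±1 (instead of the bits 1/0), the threshold of the hard decision is assumed to be zero, and the function name is hypothetical.

```python
import cmath


def costas_loop(symbols, gamma, theta0=0.0):
    """Decision-directed first order phase loop for BPSK symbols.

    Returns the final phase estimate and the list of hard decisions (+1/-1).
    """
    theta = theta0
    decisions = []
    for x in symbols:
        y = x * cmath.exp(-1j * theta)      # de-rotate by the current estimate
        a = 1.0 if y.real >= 0 else -1.0    # hard decision on the real part
        e = (a * y).imag                    # Im{a* x e^{-j theta}}; a is real
        theta = theta + gamma * e           # gradient update of the phase
        decisions.append(a)
    return theta, decisions
```

When the residual phase error is small and the decisions are correct, the error equals the sine of the difference between the true and the estimated phase, so the estimate moves towards the true phase at a speed set by γ.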
The Costas Loop is a PLL able to track the Doppler shift of the carrier. The
Doppler effect is a phenomenon that modifies the received signal frequency as a
function of the relative motion between transmitter and receiver. If a constant
Doppler shift is present, in the absence of the Costas Loop, the scattering diagram
looks as follows.
Figure 3.32 – Scattering Diagram in presence of Doppler Shift
Since the Costas Loop is a first order PLL, the study done in Section 3.3 is
valid here as well. The estimated phase at the output of the block, for different
values of the step size $\gamma$, is reported here.
Figure 3.33 – Phase Estimated by Costas Loop with different γ
As expected, increasing the VCO sensitivity makes the response of the loop faster,
but also more unstable.
3.6.2 Simulation and Performance results
As anticipated, the Costas Loop works at one sample per symbol, the one
taken at the optimum sampling instant (given by the Clock Recovery block). The
performance in terms of clock cycles needed by the loop is reported here.
Table 3.8 – Costas Loop Performance
| Type | Clock Cycles | Elapsed Time [µs] | Repetitions | CPU [%] |
|---|---|---|---|---|
| Costas Loop | 410 | 1.23 | 1000 | 0.12 |
The loop processes 1000 symbols in one second and the CPU utilization is 0.12%.
Also in this case, the C implementation needs a small modification of the $\gamma$
parameter in order to obtain the same results as the simulation.
CHAPTER 4
4 Coding Layer Design
This Chapter describes the Synchronization and Channel Coding Schemes used with the
Telecommand space data link protocol. The data format, the coding scheme and the frame
synchronizer block used are explained with particular attention to their performances.
4.1 TC SYNCHRONIZATION AND CHANNEL CODING
The CCSDS specifies a method for synchronizing codewords using a data unit
called CLTU (already mentioned in the previous Chapter), which consists of a
Start Sequence, one or more Bose–Chaudhuri–Hocquenghem (BCH) Codeblocks, and a
Tail Sequence.
The CLTU is delimited by two fixed data patterns, at the start and at the end. The
Start Sequence provides the synchronization pattern and delimits the beginning of
the first BCH Codeblock. After N BCH Codeblocks, the Tail Sequence is placed and
it marks the end of the CLTU.
Figure 4.1 – Format of a CLTU [18]
The Reception Logic adopted, at the receiving end, can be represented by the
following state diagram
Figure 4.2 – CLTU Reception State Diagram [18]
The Inactive state (S1) is the first state in which the Physical Layer is not sending
any bits to the Coding Layer because it does not have bit synchronization of an
RF signal. When it achieves the synchronization, after an event signaling, it sends
the demodulated bits to the Coding Layer. The Coding Layer switches to the
Search state (S2) in which it looks for the Start sequence. When the sequence is
found, the Decode state (S3) is enabled and the next received bits are decoded
until an End sequence or an uncorrectable word is found. All the events of the
state machine are described in the following table
Table 4.1 – Events and States of Coding Layer
| Event | Inactive (S1) | Search (S2) | Decode (S3) |
|---|---|---|---|
| Channel Activation (E1) | Go to Search | / | / |
| Channel Deactivation (E2) | / | Go to Inactive | Go to Inactive |
| Start Sequence Found (E3) | / | Go to Decode | / |
| Codeblock Rejection (E4) | / | / | Go to Search |
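The transitions of Table 4.1 can be captured by a small lookup table. The sketch below is only an illustration of the reception logic: the dictionary and function names are hypothetical, and the combinations marked with "/" in the table are assumed to leave the state unchanged.

```python
# (current state, event) -> next state, as listed in Table 4.1.
TRANSITIONS = {
    ("S1", "E1"): "S2",   # Inactive + Channel Activation   -> Search
    ("S2", "E2"): "S1",   # Search   + Channel Deactivation -> Inactive
    ("S3", "E2"): "S1",   # Decode   + Channel Deactivation -> Inactive
    ("S2", "E3"): "S3",   # Search   + Start Sequence Found -> Decode
    ("S3", "E4"): "S2",   # Decode   + Codeblock Rejection  -> Search
}


def next_state(state, event):
    """Return the next state; pairs not in the table keep the current state."""
    return TRANSITIONS.get((state, event), state)


def run(events, state="S1"):
    """Apply a sequence of events starting from the Inactive state."""
    for event in events:
        state = next_state(state, event)
    return state
```

For example, the event sequence E1, E3, E4 (activation, start sequence found, codeblock rejected) brings the Coding Layer back to the Search state, ready for the next CLTU.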
In the next Sections, the two main blocks of the Sublayer are described in detail,
showing, in particular, the characteristics of the CLTU and of the BCH coding
scheme.
4.2 FRAME SYNCHRONIZER
When the Coding Layer is in the Search state, it looks for the Start Sequence in
order to synchronize the receiver with the CLTU.
For this purpose, the Start Sequence is designed to provide low
autocorrelation side lobes, thereby reducing the probability that the receiving
end will mistakenly identify the pattern. In fact, it is clearly distinguishable from
the Idle Sequence.
The sequence is composed of 16 bits and the pattern is EB 90 (in hexadecimal).
Its autocorrelation is reported here.
Figure 4.3 – BCH Start Sequence Circular Autocorrelation
The autocorrelation is the correlation of a signal with a delayed copy of itself as
a function of delay. In the plot above, the autocorrelation starts with a delay of
half of the sequence length (shift = 8 samples means that the sequence is perfectly
aligned with itself).
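The circular autocorrelation of the Start Sequence can be reproduced with a few lines. The sketch below makes two assumptions that are not taken from the figure: the bits are mapped to ±1 before correlating, and the result is normalized by the sequence length, so the values may be scaled and shifted differently from the plot. The function names are hypothetical.

```python
def hex_to_bits(hex_string, num_bits):
    """Convert a hexadecimal pattern to a list of bits, most significant first."""
    value = int(hex_string, 16)
    return [(value >> (num_bits - 1 - i)) & 1 for i in range(num_bits)]


def circular_autocorrelation(bits):
    """Normalized circular autocorrelation of a bit sequence mapped to +1/-1."""
    n = len(bits)
    bipolar = [1 if b else -1 for b in bits]
    return [
        sum(bipolar[m] * bipolar[(m + shift) % n] for m in range(n)) / n
        for shift in range(n)
    ]


start_sequence = hex_to_bits("EB90", 16)
acf = circular_autocorrelation(start_sequence)
# acf[0] is the aligned case (shift of zero); the other entries are side lobes.
```

With this convention the aligned case is at index 0, while in the plot above it is drawn in the middle of the horizontal axis.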
4.2.1 Design and Trade-off analysis
Considering the properties of the Start Sequence, the easiest and most intuitive
way to implement the CLTU Synchronizer is to correlate it with the
incoming symbols bit by bit. The incoming signal is considered in chunks of length
equal to the length of the Start Sequence and the cross correlation is defined as:
$$(s \star x)[n] \triangleq \sum_{m=-\infty}^{\infty} s[m]\, x[m + n]$$
where s is the Start Sequence, x is the incoming signal and n is the shift in samples.
For example, the cross correlation of the Start Sequence with the Idle Sequence is
shown in the following figure.
Figure 4.4 – Cross Correlation between BCH Start Sequence and Idle Sequence
For the Frame Synchronizer just described, it is assumed that the decision on the
symbol at the Physical Layer is a hard decision (Section 3.6.1). However, the
decision can be a Soft Decision, meaning that the soft information coming from
the channel is exploited. Soft decisions are often used with Viterbi decoders,
which decode convolutional codes, so this approach is not used with the BCH coding
scheme. This kind of correlation is described in Chapter 5 when a
convolutional code is presented.
Since the Physical Layer (due to the modulation scheme used) is not able to
resolve the data ambiguity, the value of the cross correlation is also used for this
purpose. Data ambiguity means that there is a phase shift of 180° in the
incoming signal. When the Frame Synchronizer is looking for the Start Sequence,
it tries to find either the sequence or its inverse (for which the correlation has
the opposite value). If the Start Sequence of the CLTU is found inverted, all the
following bits are inverted until the Coding Layer returns to state S2.
Lastly, the Recommended Standard [1] specifies two options for the reception of
the CLTU regarding the Start Sequence. In the first one, the CLTU is accepted
only if the Start Sequence perfectly matches the expected bit pattern (no errors are
allowed in the sequence). In the second one, the CLTU is also accepted if the
Start Sequence differs by 1 bit from the expected pattern (1 error is allowed). As
will be presented in the next Section, since a decoder in single error
correcting mode is used, the Recommendation suggests using the second option
since it results in better performance.
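The search for the Start Sequence, including the polarity check and the tolerance of one bit error, can be sketched on hard-decision bits as follows. Counting mismatches is equivalent to thresholding the correlation. The function name and the returned tuple are hypothetical choices made for this illustration.

```python
# Start Sequence EB90 (hexadecimal), most significant bit first.
START = [1, 1, 1, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0]


def find_start_sequence(bits, max_errors=1):
    """Slide over the bit stream looking for START or its inverse.

    Returns (index of the first bit after the sequence, inverted flag),
    or None if the sequence is not found.
    """
    n = len(START)
    for pos in range(len(bits) - n + 1):
        window = bits[pos:pos + n]
        mismatches = sum(b != s for b, s in zip(window, START))
        if mismatches <= max_errors:
            return pos + n, False       # sequence found with normal polarity
        if n - mismatches <= max_errors:
            return pos + n, True        # inverted sequence: 180 degree ambiguity
    return None
```

When the flag is `True`, the caller inverts all the following bits before decoding them, until the Coding Layer goes back to the Search state. Setting `max_errors=0` gives the first option of the Standard, `max_errors=1` the second one.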
4. Coding Layer Design
61
4.3 DECODER
The Synchronization and Channel Coding Sublayer encodes the Frames to be
transmitted with a BCH block code and generates a set of BCH Codeblocks. The
format of a BCH Codeblock is reported in the following Figure
Figure 4.5 – Format of a BCH Codeblock [18]
The Information bits are used as input to the encoding process that generates
seven Parity bits. The Filler bit is always set to zero.
This code is called systematic since the information word is unchanged by the
encoding process. The error control field is characterized by a generator
polynomial and each information word is mapped into it (multiplying them). For
systematic codes, all the possible error control fields have an even number of bits
equal to one (Parity code). This property is exploited at the receiver side since it
makes it possible to detect any odd number of errors.
The code is an expurgated Hamming code, derived from a basic (63,57)
Hamming code. The basic Hamming code can be improved by limiting it to the
even-parity codewords. The resulting code has only $2^{56}$ codewords, so it has an
information word of 56 bits and 7 check bits. The generator polynomial is:
$$g(x) = x^7 + x^6 + x^2 + 1$$
The modified BCH code has a minimum distance of four, meaning that each
codeword differs from every other codeword in at least four bit positions.
The BCH encoder is shown in Figure 4.6 and is briefly explained, since it is
implemented for the performance measurements in Section 4.4.
Figure 4.6 – BCH(63,56) Encoder [1]
The shift registers are initialized to zero and the switches are in position one
while the 56 Information bits enter the encoder. The Syndrome is updated at
each new bit and, after the information bits, the switches move to position two.
The seven parity bits are calculated and inverted (complemented) and, for the
64-th bit (filler bit), the switches are in position three.
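A software model of the encoder can be sketched as a division by the generator polynomial. The sketch assumes that "changed in sign" means that the seven parity bits are logically complemented, and that bits are processed most significant first. The function names are hypothetical and the code is not the MATLAB implementation used for the simulations.

```python
G_LOW = 0x45  # x^6 + x^2 + 1: g(x) = x^7 + x^6 + x^2 + 1 without its x^7 term


def bch_parity(info_bits):
    """Remainder of u(x) * x^7 divided by g(x), as a 7-bit integer."""
    reg = 0
    for bit in info_bits:
        feedback = bit ^ ((reg >> 6) & 1)   # input bit XOR register output
        reg = (reg << 1) & 0x7F             # shift the 7-bit register
        if feedback:
            reg ^= G_LOW                    # apply the taps of g(x)
    return reg


def bch_encode(info_bits):
    """56 information bits -> 64-bit codeblock (info, 7 parity bits, filler)."""
    assert len(info_bits) == 56
    parity = bch_parity(info_bits) ^ 0x7F   # assumed: parity bits are complemented
    parity_bits = [(parity >> (6 - i)) & 1 for i in range(7)]
    return list(info_bits) + parity_bits + [0]   # filler bit set to zero
```

Because of the complement, an all-zero information word does not produce an all-zero Codeblock: its seven parity bits are all ones.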
4.3.1 Design and Trade-off analysis
The decoder can be used either in error-detecting or error-correcting mode. In the
first case, it can detect up to three errors and it can also detect any odd number
of errors. In the second case, it can detect up to two errors and correct a single
error.
The solution adopted at the Rx, as reported in the Specification, is the Decoder in
Single Error Correction (SEC) mode
Figure 4.7 – Implementation of Error Correction Mode Decoder [18]
The decoder operates as follows: the Even Odd Detector (EOD) and the
Syndrome Register (SR) are all set to zero before receiving a new BCH Codeblock.
When the Coding Layer switches to the Decode state, the next 63 bits are input to
the Buffer Register (BR), SR and EOD.
After the 63 bits, the BR is full, SR contains the syndrome and EOD is 1 if the
parity is odd and 0 if it is even.
The seven switches close momentarily to transfer the contents of the SR to the
Position Location Register (PLR) and to transfer the output of the EOD to the
XOR gate and to the Hold device. This is done to save the result and to allow the
decoder to calculate the syndrome of another incoming BCH Codeblock while it
tries to correct the current one.
As the next BCH Codeblock is shifted into the buffer BR, the current one is shifted
out and, if there is an error, it is corrected at the output XOR gate. This correction
occurs when the contents of the PLR register are ‘000001’ and the Hold output is
‘1’.
The decisions on whether the Codeword is correct, is rejected or must be
corrected are summarized in the following table.
Table 4.2 – Codeblock Decision for BCH Decoder (SEC)
| DECISION | EOD | SB |
|---|---|---|
| ‘No Errors’ | 0 | 0 |
| ‘Codeword Rejection’ | 0 | 1 |
| ‘Codeword Rejection’ | 1 | 0 |
| ‘Correction of single error’ | 1 | 1 |
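The decision rule of Table 4.2 can be illustrated with the following sketch. It relies on several assumptions: SB is interpreted as "the syndrome is different from zero", the input is a 63-bit word whose parity bits are not complemented (a plain multiple of g(x) plus possible errors), and the position of the error is found with a brute-force search instead of the Position Location Register of Figure 4.7. The function names are hypothetical.

```python
G_FULL = 0xC5  # g(x) = x^7 + x^6 + x^2 + 1


def poly_mod(bits):
    """Remainder of the polynomial given by bits (MSB first) divided by g(x)."""
    reg = 0
    for bit in bits:
        reg = (reg << 1) | bit
        if reg & 0x80:
            reg ^= G_FULL
    return reg


def sec_decode(word):
    """Apply the decisions of Table 4.2 to a 63-bit word."""
    syndrome = poly_mod(word)
    odd = sum(word) % 2                      # output of the Even Odd Detector
    if syndrome == 0 and odd == 0:
        return "no errors", list(word)
    if syndrome != 0 and odd == 1:
        # Single error assumed: find the bit whose flip explains the syndrome.
        for pos in range(63):
            trial = [0] * 63
            trial[pos] = 1
            if poly_mod(trial) == syndrome:
                fixed = list(word)
                fixed[pos] ^= 1
                return "corrected", fixed
    return "rejected", list(word)
```

A double error gives an even parity with a non-zero syndrome and is therefore rejected, in agreement with the statement that the SEC mode corrects one error and detects two.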
The Decoding process, as already specified in Table 4.2, stops when an
uncorrectable Codeword is found. The Tail Sequence in the CLTU has the same
length as a BCH Codeblock and it has a particular pattern (C5 C5 C5 C5 C5 C5 C5
79 in hexadecimal) that makes it different from any valid or correctable Codeblock.
The performance of the code is reported in the next Section and another type
of Coding Scheme is presented in Chapter 5.
4.4 SIMULATION AND PERFORMANCE RESULTS
In order to evaluate the performance of the BCH SEC mode decoder, a MATLAB
simulation is executed.
A BCH(63,56) encoder is implemented in the MATLAB environment as
explained previously. In the simulation, a certain number of random bits are
encoded, modulated and sent over an AWGN Channel.
At the receiver side, the bits are demodulated and decoded with the BCH SEC
Mode Decoder described in the previous Section. Bit Error Rate (BER) and
Codeword Error Rate (CER) are calculated as a function of $E_b/N_0$.
The BER is calculated by comparing the transmitted sequence of bits to the
received bits and counting the number of errors. It is defined as:
$$BER = \frac{N_{\mathrm{Err\text{-}bits}}}{N_{\mathrm{Tx\text{-}bits}}}$$
It is the ratio between the number of erroneous received bits and the number of
total transmitted bits. In the same way, the CER is the number of erroneous
received codewords over the number of total transmitted codewords.
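The two definitions can be written down directly. The sketch below is not the MATLAB simulation: it only counts errors on two bit sequences of equal length, and the function name is hypothetical.

```python
def error_rates(tx_bits, rx_bits, codeword_len):
    """Return (BER, CER) for two equally long bit sequences."""
    bit_errors = sum(t != r for t, r in zip(tx_bits, rx_bits))
    num_codewords = len(tx_bits) // codeword_len
    codeword_errors = 0
    for i in range(num_codewords):
        lo, hi = i * codeword_len, (i + 1) * codeword_len
        # A codeword is wrong if at least one of its bits is wrong.
        if tx_bits[lo:hi] != rx_bits[lo:hi]:
            codeword_errors += 1
    return bit_errors / len(tx_bits), codeword_errors / num_codewords
```

Since a single wrong bit is enough to count a whole codeword as erroneous, the CER is never lower than the BER for the same sequences.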
For the simulation, $E_b/N_0$ is incremented by 0.5 dB every time
500 erroneous codewords have been received. Obviously, the more $E_b/N_0$
increases, the more time the simulation requires to obtain reliable values of BER
and CER, since it is more difficult to reach such a high number of errors; for this
reason the simulation is stopped before BER and CER become very small. The next
Figures illustrate the results, comparing the BCH code with a system where no
coding scheme is used. To obtain the performance of the uncoded system, the same
MATLAB simulation is used without the encoder and the decoder.
Figure 4.8 – BER versus Eb/N0 with BCH and without Coding
Figure 4.9 – CER versus Eb/N0 with BCH and without Coding
As can be seen from the plots, the BCH decoder, which is able to correct a
single error, gains about 2 dB with respect to an uncoded system. For example,
to have a Bit Error Rate of $10^{-4}$ an $E_b/N_0$ of 7 dB is needed. The same
$E_b/N_0$ results in a higher value of CER, since the decoder is not able to
correct multiple errors. The capability of detecting an odd number of errors is not
considered here.
The curves will be compared with the real-time execution of the system, in which
all the other blocks, described in the previous Chapter, are implemented in the
receiver. To anticipate the results, the simulation is quite similar to the real-time
prototype (see Chapter 6).
CHAPTER 5
5 LDPC Code
In this Chapter a different type of coding scheme is presented. It provides higher coding
gain with respect to the BCH code, presented in the previous Chapter. However, its
implementation is more expensive so, pros and cons will be analyzed here.
5.1 OVERVIEW
Low-Density Parity-Check (LDPC) Codes are among the most powerful
channel coding schemes [19]. They are block codes with a very sparse parity check
matrix (small number of ones), i.e. the matrix that describes the linear relations
that the components of a codeword must satisfy. These codes can be:
• Regular LDPC: they have the same number of edges leaving each node (same
number of 1’s in each column and same number of 1’s in each row of the
parity check matrix). They have good performance but a little worse than
turbo codes;
• Irregular LDPC: the degree of each node is allowed to vary and they are
competitive with turbo codes.
The parameters of the binary linear block code (C) are:
• k → number of information bits;
• n → number of code bits;
• $R_c = k/n$ → coding rate;
• $d_{min}$ → minimum distance.
The block code can be described by different methods, such as a Codebook, a Parity
check matrix (H) or a generator matrix (G), and by a graphical representation
through a bipartite graph (Tanner Graph).
5.2 ENCODER
Encoding of LDPC codes, in general, has a non-negligible complexity. Two LDPC codes
are specified by the CCSDS Standard, with parameters ($n = 128$, $k = 64$) and
($n = 512$, $k = 256$).
Different types of encoder are present in the literature, which can provide high
performance while greatly simplifying the computation of the codeword.
However, this project concerns the implementation of the receiver structure and
the encoding process is done by a Workstation. For this reason, this work does
not investigate the different encoders; only the classic method ($u \cdot G$ encoder)
is described, since it is implemented for use by the transmitter.
For any linear block code C(n, k), encoding can be performed by collecting the
information bits into a k-bit information vector $\mathbf{u} = (u_1, \dots, u_k)$,
and obtaining the n-bit codeword $\mathbf{c} = (c_1, \dots, c_n)$ as
$\mathbf{c} = \mathbf{u}\mathbf{G}$, where $\mathbf{G}$ is the generator matrix.
The generator matrix $\mathbf{G}$ can be obtained from the equation
$\mathbf{G}\mathbf{H}^T = \mathbf{0}_{k \times r}$, where $\mathbf{0}_{k \times r}$
is the all-zero matrix and $\mathbf{H}$ is the parity check matrix given by the
Standard. For the short block length code, it is equal to
Figure 5.1 – Parity Check Matrix of LDPC(128,64)
where $I_M$ and $0_M$ are the M×M identity and zero matrices, respectively, and
$\Phi$ is the first right circular shift of $I_M$. That is, $\Phi$ has a non-zero
entry in row i and column j if $j = i + 1 \bmod M$. It should be noted that
$\Phi^2$ is the second right circular shift of $I_M$, etc., and $\Phi^0 = I_M$.
The operator ⊕ indicates modulo-2 addition [1].
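The building blocks of the parity check matrix can be generated as in the following sketch, which only illustrates the definition of the circular shifts and of the modulo-2 addition. It does not reproduce the matrix of Figure 5.1, and the function names are hypothetical. The value M = 16 used in the example is an assumption for the LDPC(128,64) code.

```python
def shift_matrix(m, power):
    """power-th right circular shift of the m x m identity (power = 0 gives I)."""
    return [[1 if j == (i + power) % m else 0 for j in range(m)] for i in range(m)]


def mat_mul_mod2(a, b):
    """Product of two square binary matrices, modulo 2."""
    m = len(a)
    return [
        [sum(a[i][k] & b[k][j] for k in range(m)) % 2 for j in range(m)]
        for i in range(m)
    ]


def mat_add_mod2(a, b):
    """Element-wise modulo-2 addition (XOR) of two binary matrices."""
    return [[x ^ y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


M = 16                        # assumed sub-matrix size for the (128,64) code
identity = shift_matrix(M, 0)
phi = shift_matrix(M, 1)      # non-zero entry in row i, column (i + 1) mod M
phi_squared = mat_mul_mod2(phi, phi)
```

Multiplying Φ by itself M times gives back the identity, which is why only the exponents from 0 to M − 1 describe distinct sub-matrices.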
The encoding $\mathbf{c} = \mathbf{u}\mathbf{G}$ described above is called systematic
since the first k bits of the codeword are equal to the information vector.
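The u·G encoding and the relation between G and H can be illustrated on a toy example. The matrices below belong to a small (7,4) systematic code chosen only for readability; they are not the CCSDS LDPC matrices, and the function names are hypothetical.

```python
# Toy systematic (7,4) code: G = [I | P], H = [P^T | I]. Not the CCSDS matrices.
G = [
    [1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]
H = [
    [1, 1, 0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 1, 0],
    [0, 1, 1, 1, 0, 0, 1],
]


def encode(u, g):
    """Codeword c = u G with modulo-2 arithmetic."""
    n = len(g[0])
    return [sum(u[i] & g[i][j] for i in range(len(u))) % 2 for j in range(n)]


def syndrome(c, h):
    """c H^T modulo 2: all zeros for a valid codeword."""
    return [sum(c[j] & row[j] for j in range(len(c))) % 2 for row in h]


u = [1, 0, 1, 1]
c = encode(u, G)
```

The first four bits of `c` are the information bits themselves, and `syndrome(c, H)` is all zeros. The same check on every row of G verifies the condition that G multiplied by the transpose of H is the all-zero matrix.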
The short code is the one chosen for the implementation, since LDPC(512,256)
would add too much complexity to the Rx.
The CLTU has the same structure as for BCH coding (Start Sequence, LDPC
blocks, End Sequence) and nothing changes in the reception procedure. The length
of the Start and End Sequences changes with respect to the BCH case. For
LDPC(128,64), the CLTU is illustrated in Figure 5.2.
Figure 5.2 – CLTU Structure for LDPC(128,64) code
The Start Sequence is composed of 64 bits and has the pattern 0347 76C7 2728
95B0 (in hexadecimal). Its autocorrelation is reported in Figure 5.3.
Figure 5.3 – LDPC Start Sequence Circular Autocorrelation
The cross correlation between the Start and Idle Sequences is reported below.
Figure 5.4 – Cross Correlation between LDPC Start Sequence and Idle Sequence
It is easy to see that the cross correlation gives lower values than in
the BCH case, where the mean value is 0.25, while the autocorrelation presents
some side lobes (but without the two peaks of Figure 4.3). This was expected
since the Start Sequence is four times longer.
The properties of the sequence are preserved when looking at its cross correlation
with a noisy version of itself. Simulating $E_b/N_0 = 4$, the result is the
following.
Figure 5.5 – LDPC Start Sequence Autocorrelation ($E_b/N_0 = 4$)
The peak is still clearly distinguishable from the side lobes so, in order to test
the system implementing an LDPC decoder, the Start Sequence can be easily
recognized by performing, at the Rx side, a cross correlation with the incoming
signal and comparing the result with a threshold between 0.4 and 0.8.
As for the BCH CLTU, after the Start Sequence there are N LDPC Codewords.
Two observations must be made about the encoding process of these blocks. The
first one is that, if the frame to be transferred does not fit exactly
within an integral number of LDPC codewords, then fill bits must be appended
to the last LDPC Codeword. The fill pattern consists of alternating ones
and zeros (starting with a one).
The second observation is that LDPC coding, by itself, cannot guarantee
sufficient bit transitions to keep receiver symbol synchronizers in lock. For this
reason, a randomizer should be added in the Rx chain (see Section 5.3).
Lastly, an End Sequence is present after the last Codeword. In reality, it is
optional, since it can be omitted by relying on the ability of the decoder to
exit from the decoding state when a non-correctable codeword is found. This
reduces the complexity at the receiver and increases the throughput, since
the CLTUs can be transferred back to back without sending the End
Sequence in between. In this project, for completeness, the End Sequence is
included in the CLTU and the decoder looks for a non-correctable codeword
before signaling the Coding Layer to switch to the Search state. The sequence has
the pattern C5C5 C5C5 C5C5 C579 (in hexadecimal).
5.3 RANDOMIZER
In order to maintain bit (or symbol) synchronization with the received
communications signal, every data capture system at the receiving end requires
the incoming signal to have a minimum bit transition density. Since, as
said in the previous Section, LDPC encoding does not guarantee sufficient bit
transitions, a Randomizer must be used.
The Randomizer is a block used to increase the randomness of the Tx binary
sequence. A random binary sequence is a sequence of equiprobable
($p(0) = p(1) = \frac{1}{2}$) and statistically independent bits.
The properties of a random binary sequence of length N are:
• No 0/1 bias → $N/2$ bits are 0 and $N/2$ bits are 1;
• Transition density → for any bit, the probability of having a transition
(from 0 to 1 or vice versa) is $\frac{1}{2}$;
• Autocorrelation → the sequence is ‘orthogonal’ to all its shifted
versions. This means that, when comparing it with a cyclically shifted
version, half of the bits are equal and half of the bits differ;
• Run distribution → the probability of finding a long run (a sequence of
identical bits) is very small.
A Randomizer is a (digital) device able to output a pseudo-random binary
sequence, i.e., a sequence with nearly optimal randomness properties. How the
Randomizer works is illustrated in the following Figure
Figure 5.6 – How Randomizer works – Tx side (Up) – Rx side (Down)
The Randomizer is based on a Linear Feedback Shift Register (LFSR), that is, a
shift register whose input bit is a linear function of its previous state. The LFSR,
given a starting seed (the initial value of the registers), produces a periodic binary
sequence whose maximum period is $N = 2^M - 1$, where $M$ is the number of shift
registers used.
The period covers all possible binary configurations (states) except the all-zero
state, which must be avoided.
The properties of a random binary sequence are maintained, but the out-of-phase
correlation is not exactly 0 as it should be. However, the difference is very small
and becomes negligible as the period N increases. Moreover, the autocorrelation
properties are very good only when the entire period is used. When only a segment
is used, the out-of-phase autocorrelation may show quite high peaks.
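As background (a standard result for maximal-length LFSR sequences, not a derivation taken from this thesis), the statement above can be written explicitly. Mapping the bits to antipodal values $c_n \in \{+1, -1\}$ and assuming the LFSR reaches the full period $N = 2^M - 1$, the normalized periodic autocorrelation is:

$$R(\tau) = \frac{1}{N} \sum_{n=0}^{N-1} c_n \, c_{(n+\tau) \bmod N} = \begin{cases} 1 & \text{if } \tau \equiv 0 \pmod{N} \\ -\dfrac{1}{N} & \text{otherwise} \end{cases}$$

For $N = 255$ the out-of-phase value is $-1/255 \approx -0.004$, which shows why the deviation from 0 becomes negligible as the period grows.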
The LFSR proposed by the CCSDS has 8 shift registers, so the maximum period is
$N = 2^8 - 1 = 255$.
Figure 5.7 – LFSR proposed by CCSDS [1]
The generator polynomial is:
$$h(x) = x^8 + x^6 + x^4 + x^3 + x^2 + x + 1$$
The Standard specifies how to generate the random sequence and how it must be
applied to the transfer frames. Every time a new CLTU starts, the LFSR is reset.
For this reason, in order not to add complexity to the receiver, the sequence is
generated once and stored in a lookup table. This greatly simplifies the operation,
since the Rx does not have to perform many shift and XOR operations for every
received CLTU.
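The lookup-table approach can be sketched as below. This is an illustration under stated assumptions rather than the thesis implementation: the tap orientation of the recurrence and the all-ones seed are assumptions that should be checked against the Standard, and the names `lfsr_sequence`, `RANDOM_TABLE` and `randomize` are hypothetical.

```python
def lfsr_sequence(length=255, seed=(1, 1, 1, 1, 1, 1, 1, 1)):
    """Generate bits from an LFSR with h(x) = x^8 + x^6 + x^4 + x^3 + x^2 + x + 1.

    Assumed recurrence: s[n] = s[n-2] ^ s[n-4] ^ s[n-5] ^ s[n-6] ^ s[n-7] ^ s[n-8].
    The first 8 output bits are the seed itself (assumed all ones here).
    """
    s = list(seed)
    while len(s) < length:
        s.append(s[-2] ^ s[-4] ^ s[-5] ^ s[-6] ^ s[-7] ^ s[-8])
    return s[:length]

# The table is computed once, e.g. at start-up, and reused for every CLTU.
RANDOM_TABLE = lfsr_sequence(255)

def randomize(bits):
    """XOR the bits with the table, restarting from index 0 (LFSR reset per CLTU)."""
    return [b ^ RANDOM_TABLE[i % len(RANDOM_TABLE)] for i, b in enumerate(bits)]

payload = [0] * 16               # a run of zeros with no transitions at all
scrambled = randomize(payload)   # transitions are introduced by the XOR
restored = randomize(scrambled)  # applying the same XOR again restores the payload
print(scrambled, restored == payload)
```

Since the XOR is its own inverse, the same routine can serve on both the Tx and the Rx side, provided both restart from the beginning of the table at each CLTU.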
5.4 DECODER
The LDPC Decoder exploits ‘soft symbols’ to decode a Codeword. Soft decision
decoding is a class of algorithms that takes a stream or a block of received values
and decodes them by considering a range of possible values that they may take. It
considers the reliability of each received pulse to form better estimates of the input
data. This is the opposite of the hard decision used by the BCH decoder, where the
bits are compared with a threshold and are definitively considered as ‘1’ or
‘0’.
Summarizing, the Phase Recovery block (Chapter 3.6) does not make a decision
on the received bits: it transfers them to the Coding Layer as they are, without
truncating the channel information that they carry. This,
theoretically, provides a performance improvement of about 2-3 dB.
Nothing is specified in the Standard about the type of decoder to be used. In the
literature there are different kinds of LDPC decoders, as described in [20].
Three of them are briefly presented here, with particular attention
to the one that is actually ported to C language and executed
on the development board.
The decoders are the following:
• The iterative Sum-Product Algorithm (SPA);
• The iterative Min-Sum (MS) Algorithm;
• The iterative Normalized Min-Sum (NMS) Algorithm.
The SPA works on the Tanner graph, implementing an iterative exchange and
update of reliability values concerning each received bit.
As already anticipated, the matrix H can be represented by a bipartite graph,
where n nodes correspond to codeword bits (variable nodes, also called message
nodes, on the left) and r nodes correspond to redundancy bits (check nodes, on the
right).
Variable nodes and check nodes are connected to each other (a 1 in the H matrix
corresponds to an edge) and they exchange probabilistic information.
$q_{i\to j}(x)$ is the information sent by the i-th variable node to the j-th check node:
it is the probability that the bit takes the value $x$, with $x \in \{0,1\}$. In the same
way, $r_{j\to i}(x)$ is the information sent by the j-th check node to the i-th variable
node.
Figure 5.8 – Tanner Graph (left) – H matrix (right)
An LDPC code can be decoded by the belief propagation decoding algorithm,
that directly matches the code bipartite graph. After variable nodes are initialized
with the channel messages, the decoding messages are iteratively computed by
all the variable nodes and check nodes and exchanged through the edges
between the neighboring nodes.
The algorithm is implemented in the Log-Likelihood Ratio (LLR) domain: for each
received bit, the logarithm of the ratio between the probability that the bit is 0
and the probability that the bit is 1 is calculated. The LLR is reported on
each edge and a Belief Propagation algorithm is applied. The iterative decoding
updates the LLR values at each iteration and uses them to calculate the syndrome.
Once the syndrome is zero, the decoder stops.
The LLR of a random variable $U$ is defined as:
$$L(U) = \ln\left(\frac{P(U = 0)}{P(U = 1)}\right)$$
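where $P(U = x)$ is the probability that $U$ takes the value $x$. In the LLR-SPA, the
messages sent between the nodes are:
$$\Gamma_{i\to j}(x_i) = \ln\left(\frac{q_{i\to j}(0)}{q_{i\to j}(1)}\right), \qquad \Lambda_{j\to i}(x_i) = \ln\left(\frac{r_{j\to i}(0)}{r_{j\to i}(1)}\right)$$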
For the iterative decoding, these values are initialized to $\Gamma_{i\to j}(x_i) = L(x_i)$ and
$\Lambda_{j\to i}(x_i) = 0$, where, for BPSK, the value $L(x_i)$ is equal to:
$$L(x_i) = \ln\left(\frac{P(x_i = 0 \mid y_i)}{P(x_i = 1 \mid y_i)}\right) = \frac{2 y_i}{\sigma^2}$$
where $P(x_i = x \mid y_i)$ is the probability that the i-th transmitted bit takes the value
$x$, conditioned on the received symbol $y_i$, and $\sigma^2$ is the variance of the noise.
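This is done for each pair of connected nodes. The value sent from the check nodes
to the variable nodes is updated as:
$$\Lambda_{j\to i}(x_i) = 2 \cdot \operatorname{atanh}\left\{\prod_{i' \in N(j) \setminus \{i\}} \tanh\left[\frac{1}{2}\Gamma_{i'\to j}(x_{i'})\right]\right\}$$
where $N(j)$ is the set of variable nodes connected to the j-th check node.
Then, the message sent from the variable nodes to the check nodes is updated as:
$$\Gamma_{i\to j}(x_i) = L(x_i) + \sum_{j' \in M(i) \setminus \{j\}} \Lambda_{j'\to i}(x_i)$$
where $M(i)$ is the set of check nodes connected to the i-th variable node.
At the same time, the value of $\Gamma_i(x_i)$ is calculated (the same sum extended to
all the check nodes in $M(i)$) and an estimate of the transmitted bit is given by:
$$\hat{x}_i = \begin{cases} 0 & \text{if } \Gamma_i(x_i) \ge 0 \\ 1 & \text{if } \Gamma_i(x_i) < 0 \end{cases}$$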
If the estimated codeword satisfies all the parity check matrix constraints, the
decoding stops and the decoded codeword is given as output. Otherwise, the
process described above is repeated up to a maximum number of iterations. If, after
the last iteration, the estimated codeword still does not satisfy the constraints,
the codeword is considered non-correctable and the coding layer switches to the
search state.
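The steps above can be sketched in a few lines. This is a minimal illustration and not the decoder ported to the development board: the parity-check matrix `H_TOY` is a small Hamming (7,4) matrix chosen only to keep the example short (it is not the CCSDS LDPC matrix), the mapping bit 0 → +1, bit 1 → −1 is assumed, and the function names are hypothetical.

```python
import math

# Toy parity-check matrix (Hamming (7,4)), a stand-in for the real LDPC matrix.
H_TOY = [[1, 1, 0, 1, 1, 0, 0],
         [1, 0, 1, 1, 0, 1, 0],
         [0, 1, 1, 1, 0, 0, 1]]

def channel_llr(received, sigma2):
    """Initial LLRs for BPSK over AWGN: L(x_i) = 2 * y_i / sigma^2."""
    return [2.0 * y / sigma2 for y in received]

def spa_decode(H, llr, max_iter=20):
    """LLR-domain Sum-Product decoding. Returns (bits, iterations, success)."""
    rows, cols = len(H), len(H[0])
    edges = [(j, i) for j in range(rows) for i in range(cols) if H[j][i]]
    gamma = {(j, i): llr[i] for (j, i) in edges}  # variable -> check messages
    lam = {edge: 0.0 for edge in edges}           # check -> variable messages
    bits = [0 if value >= 0 else 1 for value in llr]
    for iteration in range(1, max_iter + 1):
        # Check node update: tanh rule over all the other connected variables.
        for (j, i) in edges:
            prod = 1.0
            for k in range(cols):
                if H[j][k] and k != i:
                    prod *= math.tanh(0.5 * gamma[(j, k)])
            prod = max(min(prod, 0.999999), -0.999999)  # keep atanh finite
            lam[(j, i)] = 2.0 * math.atanh(prod)
        # A-posteriori LLR of each bit, then extrinsic variable node update.
        total = [llr[i] + sum(lam[(j, i)] for j in range(rows) if H[j][i])
                 for i in range(cols)]
        for (j, i) in edges:
            gamma[(j, i)] = total[i] - lam[(j, i)]
        bits = [0 if value >= 0 else 1 for value in total]
        # Stop as soon as every parity check is satisfied (zero syndrome).
        if all(sum(H[j][i] * bits[i] for i in range(cols)) % 2 == 0
               for j in range(rows)):
            return bits, iteration, True
    return bits, max_iter, False

# All-zero codeword sent as +1 symbols; the third sample has the wrong sign.
received = [0.9, 1.1, -0.2, 0.8, 1.0, 1.2, 0.7]
print(spa_decode(H_TOY, channel_llr(received, sigma2=0.5)))
```

When the function returns `success = False`, the caller would treat the codeword as non-correctable, which corresponds to the switch to the search state described above.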
The algorithm just explained is computationally expensive, since it
requires costly operations such as atanh(∙), exp(∙), mod(∙), the update of a
matrix, etc. However, it is suitable for a possible hardware implementation.
Since the load of the Rx described in this work is not very high, the SPA can be
implemented in software without any particular performance bottleneck.
Other types of decoders, which make use of mathematical approximations, are
available. The MS algorithm consists in applying the Max-Log approximation to
the computation of the messages between variable and check nodes. The NMS
algorithm is based on the same principle, but it introduces a normalizing factor
on the LLR values in order to improve the Min-Sum algorithm.
Their performance is similar to that of the SPA, but they introduce a small
loss in the result.
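As an illustration of the approximation, the check node update of the MS and NMS algorithms can be sketched as follows. This is a generic textbook form, not code from the thesis; the function name is hypothetical, and `alpha = 0.8` is used only because it is the normalization factor quoted in the simulation section.

```python
def check_node_min_sum(incoming, alpha=1.0):
    """Check-to-variable messages for one check node.

    incoming: LLR messages received from the connected variable nodes.
    alpha:    1.0 gives plain Min-Sum, a value below 1 gives Normalized Min-Sum.
    """
    outgoing = []
    for i in range(len(incoming)):
        others = incoming[:i] + incoming[i + 1:]  # exclude the destination node
        sign = 1.0
        for value in others:
            if value < 0:
                sign = -sign
        outgoing.append(alpha * sign * min(abs(value) for value in others))
    return outgoing

messages = [3.6, -0.8, 3.2, 4.8]
print(check_node_min_sum(messages))             # Min-Sum
print(check_node_min_sum(messages, alpha=0.8))  # Normalized Min-Sum
```

For these inputs the exact tanh rule gives a magnitude of about 2.6 for the second output, while plain Min-Sum gives 3.2. The overestimate is what the normalization factor is meant to compensate, at the cost of replacing atanh and tanh with comparisons only.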
5.5 SIMULATION AND PERFORMANCE RESULTS
In order to compare the performance of the different decoders, a MATLAB
simulation is performed.
The same kind of simulation as in Chapter 4.4 is executed. Random bits are encoded
with the classical method, described in Section 5.2, modulated and sent over an
AWGN Channel. At the receiver side, the bits are demodulated and decoded
with the three decoders just presented. BER and CER are calculated as a
function of $E_b/N_0$.
For the simulation, $E_b/N_0$ is incremented by 0.5 dB every time 20 erred
codewords have been received. The Randomizer is not implemented in this test
and a maximum of 20 decoding iterations is considered. Obviously,
fewer points are extracted, since the decoding process requires more
time than the BCH one.
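The stopping rule of the simulation can be sketched as below. The sketch is in Python rather than MATLAB and makes several assumptions: unit-energy BPSK, a noise variance of $\sigma^2 = 1/(2 R \, E_b/N_0)$ for a code of rate $R$, and a plain hard-decision stand-in where the real simulation would run the encoder and one of the three decoders. All names are hypothetical.

```python
import math
import random

def noise_sigma(ebn0_db, code_rate):
    """Noise standard deviation for unit-energy BPSK at the given Eb/N0 (dB)."""
    ebn0 = 10.0 ** (ebn0_db / 10.0)
    return math.sqrt(1.0 / (2.0 * code_rate * ebn0))

def hard_decision(received, sigma):
    """Stand-in decoder: threshold at zero (bit 0 -> +1, bit 1 -> -1)."""
    return [0 if y >= 0 else 1 for y in received]

def simulate_point(ebn0_db, n_bits, code_rate, decode,
                   target_errors=20, max_words=100000, rng=None):
    """Run codewords until 'target_errors' erred codewords are seen; return (BER, CER)."""
    rng = rng or random.Random(0)
    sigma = noise_sigma(ebn0_db, code_rate)
    bit_errors = word_errors = words = 0
    while word_errors < target_errors and words < max_words:
        sent = [rng.randint(0, 1) for _ in range(n_bits)]  # encoder omitted here
        received = [(1.0 - 2.0 * b) + rng.gauss(0.0, sigma) for b in sent]
        estimate = decode(received, sigma)
        errors = sum(a != b for a, b in zip(sent, estimate))
        bit_errors += errors
        word_errors += 1 if errors else 0
        words += 1
    return bit_errors / (words * n_bits), word_errors / words

for ebn0_db in (0.0, 0.5, 1.0):  # 0.5 dB steps, as in the text
    print(ebn0_db, simulate_point(ebn0_db, 128, 1.0, hard_decision))
```

Stopping after a fixed number of erred codewords keeps the run time bounded, but with only 20 errors per point the estimated rates carry a noticeable statistical uncertainty.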
In the next Figures, the results are illustrated.
Figure 5.9 – BER versus $E_b/N_0$ with different LDPC decoders
Figure 5.10 – CER versus $E_b/N_0$ with different LDPC decoders
The SPA decoder has the best performance in terms of BER and CER at very low
values of SNR. From $E_b/N_0 = 4$ dB, the NMS algorithm gives better results. This
will be analyzed in detail as future work, when all the decoders are implemented on
the Zedboard. The result can be explained by the fact that the simulation is
stopped before a statistically reliable value of the error rate is obtained.
The NMS algorithm (with a normalization factor of 0.8) seems to perform slightly
better than MS, but the Sum-Product Algorithm is the best one overall. However, MS
and NMS offer a good trade-off between implementation complexity and
performance.
Now, the performance of the SPA decoder (considered since it is the one
implemented in practice) is compared with that of the BCH
decoder, already presented. The same BER and CER graphs are produced,
increasing the number of points for the SPA decoder.
Figure 5.11 – BER versus $E_b/N_0$ – Comparison between Coding Schemes
Figure 5.12 – CER versus $E_b/N_0$ – Comparison between Coding Schemes
With these plots, the difference between the LDPC and BCH coding schemes is
easier to see. It was already anticipated that the LDPC code makes it possible to
achieve a gain of about 2-3 dB with respect to the BCH code. This is visible in the
graphs, where LDPC has the best performance.
For example, considering an acceptable value of BER ($10^{-4}$), LDPC achieves it at
an $E_b/N_0$ of around 4.5 dB, whereas BCH needs
7 dB. These results will be compared with the practical system implementation.
Using a complex coding scheme, such as the one described in this Chapter,
instead of the BCH code can make an important difference when working
at very low values of SNR. Other, simpler types of decoders will be
investigated as a future task, in order to simplify the implementation of the
Rx.
CHAPTER 6
6 Prototype Testing
This Chapter describes the set-up used to test the receiver architecture designed and
simulated in the previous Chapters. All the receiver blocks are assembled in order
to write software able to process a real-time signal. Improvements to the system and
performance analyses are made to obtain a prototype that can be upgraded with future
work.
6.1 TESTING OVERVIEW
The goal of this Chapter is to implement the functional baseband receiver
designed in the previous Chapters, able to receive a signal and process it to
extract the information that it carries (Telecommand data). The test set-up used
was already described in Section 2.6 and shown in Figure 2.8.
Recalling the scheme, the Ethernet link is used to send the signal generated
by the Workstation; it is also used by the Zedboard to send back some useful
variables that can be observed graphically in the GNU
Radio environment.
An example of the interface is reported in Figure 6.1. Some
parts of the screen are dedicated to the visualization of the transmitted signal and
an Options tab is present to control its generation. The third part is dedicated to
showing the signal processed by the receiver chain.
Figure 6.1 – GNU Radio Interface
More in detail, once the program is running it is possible to see the spectrum of the
Tx signal, the shape of the signal in time and an ‘Option’ panel that can be used to
modify the parameters. For example, it is possible to add a time delay to the
signal, to add a frequency offset, to change the value of $E_b/N_0$ (modeled by the
channel model block) and to add phase noise.
Other plots show the processing performed on the signal by the receiver, which
sends back some data. For example, the filtered signal and the scatter
diagram after the PLL are shown in Figure 6.2.
As far as the receiver is concerned, the Xilinx chip is programmed with the C
code written using all the information about the blocks derived from the
MATLAB simulations. Using the UART interface, it is also possible to print
some control variables on the terminal to check the correct execution of the
software.
With the set-up just described, each block is tested and the obtained results are
compared with the ones extracted in the MATLAB simulation.
6.2 CPU UTILIZATION
The number of clock cycles needed by each block of the Physical Layer to
perform its operation was already given in Chapter 3. The Physical Layer and the
Coding Layer are implemented in software as two separate Finite State
Machines (FSMs).
The CPU utilization of the Coding Layer depends on the coding scheme used in
the receiver. Considering the BCH Decoder, the following table summarizes the
performance.
Table 6.1 – Performance of Coding Layer FSM (with BCH Decoder)
| Type | Clock Cycles | Elapsed Time [µs] | Repetitions | CPU [%] |
|---|---|---|---|---|
| Frame Sync. | 532 | 1.6 | 1000 | 0.16 |
| Decoder | 817 | 2.45 | 1000 | 0.25 |
Here, it is assumed that the Frame Synchronizer performs the correlation every
time a new symbol enters the block (1000 bits per second). The BCH
Decoder is likewise assumed to decode 1000 bits in one second and to
detect and correct one bit in each received frame. These are the usual operations
that the decoder performs every time a new start sequence is detected. So, the two
blocks never work simultaneously, and their CPU utilization must be added to
that of the Physical Layer. As can be noticed, the Rx is not very complex at
this stage, so the use of a more complex decoding scheme is reasonable.
For this purpose, the performance of the Coding Layer FSM with the
LDPC decoder is reported below.
Table 6.2 – Performance of the Coding Layer FSM (with LDPC Decoder)
| Type | Clock Cycles | Elapsed Time [µs] | Repetitions | CPU [%] |
|---|---|---|---|---|
| Frame Sync. | 1300 | 3.9 | 1000 | 0.39 |
| Decoder | 194412 | 583.82 | 8 | 0.47 |
The same approach as for the BCH Coding Layer is followed. Obviously, the
Frame Synchronizer requires more resources since the correlation is performed
on 128 bits instead of 16. For the Decoder, the clock cycles needed
to decode one frame (128 bits) with only one iteration are considered. In one
second, the decoder decodes around 8 frames. The performance is slightly worse
than that of the BCH decoder, but not to the point of excluding the use of the LDPC
coding scheme in the Receiver. However, the difference becomes visible at lower
SNR, where the Decoder must perform more than one iteration to decode the frames.
Considering a maximum of 20 iterations and supposing that the
decoder performs all of them for every received frame, the CPU utilization increases
to 9.34 %.
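The CPU percentages in Tables 6.1 and 6.2 follow from the elapsed time of one execution multiplied by the number of executions per second. A short check of the arithmetic, with a hypothetical helper name, is given below.

```python
def cpu_utilization_percent(elapsed_us, repetitions_per_second):
    """Percentage of one second spent in a block executed 'repetitions' times."""
    return elapsed_us * 1e-6 * repetitions_per_second * 100.0

bch_sync = cpu_utilization_percent(1.6, 1000)          # Table 6.1, Frame Sync.
bch_decoder = cpu_utilization_percent(2.45, 1000)      # Table 6.1, Decoder
ldpc_sync = cpu_utilization_percent(3.9, 1000)         # Table 6.2, Frame Sync.
ldpc_one_iter = cpu_utilization_percent(583.82, 8)     # Table 6.2, Decoder
ldpc_worst_case = 20 * ldpc_one_iter                   # 20 iterations per frame

print(bch_sync, bch_decoder, ldpc_sync, ldpc_one_iter, ldpc_worst_case)
```

The worst case assumes that the cost of the decoder scales linearly with the number of iterations, which reproduces the 9.34 % figure quoted above.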
Summarizing, the CPU utilization with the BCH and LDPC decoders
is shown in the following pie charts (the worst case for the
LDPC decoder is taken into account, i.e., it must perform the maximum number of
iterations).
Figure 6.2 – CPU utilization by the two FSMs
The utilization is calculated considering the number of times that each operation
is performed in one second of execution. The CPU always runs at its maximum
frequency (667 MHz), the L1 and L2 caches are enabled by default and the
software runs on a single core.
As far as the Physical Layer is concerned, the percentage is the sum of all the
contributions described in Chapter 3 and it is equal to 15.05 %.
More in detail, the chart in Figure 6.3 shows how the utilization is divided
among the blocks.
Figure 6.3 – CPU utilization by Physical Layer’s blocks
Although MA filtering and the signal variance computation are moved to hardware,
they are the main contributions in the processing chain.
The two Coding Layers, instead, use the CPU as shown in Figure 6.4.
Figure 6.4 – CPU utilization by Coding Layer’s blocks
6.3 BCH SEC DECODER
This Section verifies the performance, in terms of BER and CER, of the BCH
Decoder. First of all, the system is tested with a CLTU built without coding the
information bits. The comparison between the codeword error rate simulated in
MATLAB and the one obtained with the implemented software is shown below.
Figure 6.5 – CER of uncoded system
As expected, the two curves are very similar. So, the physical
layer of the system is working properly, providing a hard decision on the
processed symbols.
To test the BCH Decoder, a series of CLTUs containing real NOOP Commands
(the spacecraft does not use this TC; it is only used for tests) is sent to the
Receiver. As already seen in simulation, the Decoder works with very low BER
at $E_b/N_0$ above 8 dB. Taking this limit value, where the simulated BER was
around $10^{-5}$, at least one error per Codeword is observed and corrected, as can
be seen in the Xilinx Vitis Terminal (Figure 6.6). In the left part of the
image, the scatter diagram at the output of the PLL is shown.
Figure 6.6 – BCH Decoder ($E_b/N_0 = 8$ dB)
Reducing $E_b/N_0$ to 7 dB, some Codewords contain an even number of errors (and
are rejected), while others contain an odd number of errors of which only one is
corrected, so the Codeword is still erred.
6.4 LDPC SPA DECODER
To test the LDPC Decoder (SPA), a C program is written that creates a
CLTU by:
• Placing the Start Sequence;
• Encoding the frames, whose content is given by characters passed as input
parameters to the program and coded in ASCII;
• Closing the last frame by adding fill bits as a sequence of ones and zeros;
• Placing the End Sequence.
The generated sequence is sent through the Ethernet interface using
GNU Radio, which translates the bits into the signal that will be processed
by the Receiver.
As expected, the performance of the LDPC Coding is better than that of the BCH one.
At the same values of $E_b/N_0$, the LDPC decoder decodes the signal with only one
iteration and with no errors.
At lower SNR, the LDPC decoder starts to increase the number of iterations and, at
$E_b/N_0 = 5$ dB, their mean is around 3. From simulation, this value of $E_b/N_0$
gives a BER quite similar to the one obtained with BCH at $E_b/N_0 = 8$ dB. So, the
gain of 2-3 dB is also confirmed here. In the test, the alphabet is sent several
times in separate CLTUs. After the character ‘Z’, a sequence of ‘U’ characters is
present (in ASCII, ‘U’ corresponds to the fill pattern ‘01010101’). The number next
to each frame represents the number of iterations performed by the SPA to decode it.
Figure 6.7 – LDPC Decoder ($E_b/N_0 = 5$ dB)
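The way the test payload is split into frames and padded with ‘U’ characters can be sketched as follows. This is an illustration, not the C program used in the tests: the information block length of 64 bits per frame is an assumption made for the example, and the function names are hypothetical.

```python
def text_to_bits(text):
    """Convert ASCII characters to a list of bits, most significant bit first."""
    return [(ord(ch) >> shift) & 1 for ch in text for shift in range(7, -1, -1)]

def split_into_frames(text, info_bits=64, fill_char="U"):
    """Pad the text with the fill character and split it into equal-size frames."""
    chars_per_frame = info_bits // 8
    padding = (-len(text)) % chars_per_frame
    padded = text + fill_char * padding
    return [text_to_bits(padded[k:k + chars_per_frame])
            for k in range(0, len(padded), chars_per_frame)]

alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
frames = split_into_frames(alphabet)  # frames still to be LDPC-encoded
print(len(frames), text_to_bits("U"))
```

With these assumptions the 26 letters occupy three full frames plus two characters of a fourth one, and the remaining six characters of the last frame are filled with ‘U’, i.e., with the alternating ‘01010101’ pattern.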
Reducing $E_b/N_0$ to 4 dB, the constellation points almost merge into a single
cloud. The mean number of iterations increases and some uncorrected Codewords
appear. In Figure 6.8, one frame in the first CLTU is not corrected by the Decoder.
This means that the decoder performed the maximum number of iterations but the
parity check conditions are still not satisfied, so the decoder rejects the Codeword
and the Coding Layer goes to the Search state.
Figure 6.8 – LDPC Decoder ($E_b/N_0 = 4$ dB)
To better assess the performance of the SPA decoder implemented on the Zedboard,
a CER curve is extracted and compared with the one derived from the MATLAB
simulation. The CER of the uncoded system is also reported in the figure.
Figure 6.9 – SPA decoder CER performance on Zynq-7000
As can be seen, the decoder’s performance is quite similar to the theoretical
one. At the same value of $E_b/N_0$, a penalty smaller than 1 dB is present, which
still leaves a good reason to prefer the LDPC code to the BCH one. The
calculation of the CER becomes very difficult for $E_b/N_0$ greater than 5 dB,
since it is very small and, probably, closer to the theoretical behavior of the
decoder at high SNR.
A last interesting observation concerns the maximum data rate that the
system can achieve with the LDPC code.
The codeword error rate curves for data rates from 1 kbit/s up to 4 kbit/s
are reported in Figure 6.10.
Figure 6.10 – SPA decoder CER performance at different data rates
It is evident that, for a data rate greater than or equal to 3 kbit/s, the system is
unusable, since the CER approaches (and even exceeds) that of an uncoded
system. There are several reasons for this, and most of them must be
attributed to the physical layer. As seen in the design, all the blocks are tuned
to work at 1 kbit/s. In order to obtain a valid result, the study performed for each
block should be repeated for the different data rates and the software must adapt
the blocks accordingly.
A final observation is that increasing the data rate decreases
the number of samples per symbol. First of all, this means that
the data rate cannot physically increase beyond a certain value; secondly, a larger
error is made in the choice of the optimum sample, since the system is
implemented without interpolation between samples. Obviously, the
performance seen in Figure 6.10 can be improved by taking these
remarks into account.
6.5 CUSTOM IP CORE
This last Section provides an overview of the development flow of
accelerator IP cores, used to implement in hardware, within the Zynq programmable
logic, part of the architecture previously discussed.
As already anticipated in Chapter 2.6, the Programmable Logic of the Zedboard
is based on the Xilinx 7-Series FPGA fabric and offers the designer the ability to
implement their own custom logic which can work alongside the software
running on the processor cores [7].
For designing and testing custom IP blocks, the Xilinx Vivado tool suite is used.
A block of custom logic, in the PL, can be created, adding control and status
monitoring capabilities by using memory mapped registers which the processors
can access via the AXI4 interconnect.
The AXI4 signals are completely flexible from the point of view of the VHDL
design. During the creation of a Xilinx IP block, the Vivado tools can be used to
map each AXI signal onto the signal name that the designer used when creating
the IP.
The first block implemented in hardware is the MA Filter. This thesis does not
cover how the VHDL implementation is done, but some observations are
reported here.
First of all, the logic followed in the filter implementation is not the same as in
the software one (Figure 3.22). If the same logic were used, the output of the filter
could be obtained in one shot (only 1 clock cycle), since it is possible to use N
multipliers and 1 adder with N inputs simultaneously (N is the number of filter
taps). However, the resources required by this extremely parallel
architecture would significantly impact the chip utilization.
For this reason, the approach followed is to use the same number of multipliers, but
more adders with fewer inputs. In this way, more clock cycles are required, but fewer
resources are used. As already seen in the performance results of the filter
implementation, the hardware implementation is much faster than the software
one.
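One possible reading of "more adders with fewer inputs" is an adder tree, in which the N products are summed in successive stages. The sketch below models that idea in software only to count the stages; it is not the VHDL design of the thesis, and the function names, the 16 taps and the fan-in values are assumptions made for the example.

```python
def adder_tree_sum(products, fan_in=2):
    """Sum the values in stages of small adders. Returns (total, number of stages)."""
    level = list(products)
    stages = 0
    while len(level) > 1:
        # Each adder of this stage sums at most 'fan_in' values of the previous one.
        level = [sum(level[k:k + fan_in]) for k in range(0, len(level), fan_in)]
        stages += 1
    return level[0], stages

def ma_filter_output(samples, taps, fan_in=2):
    """One filter output: N parallel multiplications followed by the adder tree."""
    products = [s * t for s, t in zip(samples, taps)]
    return adder_tree_sum(products, fan_in)

samples = list(range(1, 17))  # 16 hypothetical input samples
taps = [1.0 / 16] * 16        # moving-average taps
print(ma_filter_output(samples, taps, fan_in=2))
print(ma_filter_output(samples, taps, fan_in=4))
```

With 16 taps, two-input adders need four stages while four-input adders need two, which illustrates the trade-off between the size of each adder and the number of clock cycles.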
Earlier in this work, the importance of having an estimate of the SNR was pointed
out, and it was said that the calculation of the signal’s variance
is essential. The calculation of the variance of a complex signal is quite expensive
in terms of the operations that must be executed. For this reason, this block is
also implemented in hardware, resulting in a significant reduction of the clock
cycles needed to perform the operation. The same logic used for the software
implementation is used here.
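For reference, the variance of a block of complex samples can be computed from two running sums, as sketched below. The formula $\mathrm{Var}(x) = E[|x|^2] - |E[x]|^2$ is the standard definition; whether the thesis implementation organizes the computation in this way is not stated here, and the function name is hypothetical.

```python
def complex_variance(samples):
    """Variance of complex samples: mean of |x|^2 minus squared magnitude of the mean."""
    n = len(samples)
    mean = sum(samples) / n
    mean_power = sum(x.real * x.real + x.imag * x.imag for x in samples) / n
    return mean_power - (mean.real * mean.real + mean.imag * mean.imag)

constellation = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]
print(complex_variance(constellation))
```

Each sample costs two multiplications and a few additions, so the operation count grows linearly with the block length, which is consistent with the motivation for moving this block to hardware.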
Lastly, an overview of how to add the two blocks to a simple architecture is
given.
The following picture shows a basic block design in Vivado into which the
two peripherals are inserted.
Figure 6.11 – Block Design with Vivado tool
In the block design, the Processing System (PS) is essential and it must be connected
to the custom peripherals. After these blocks are added, running
the Run Connection Automation command automatically adds the AXI Interconnect and
the PS Reset and connects them to the PS and to the peripherals. These blocks
are needed to properly interface the Zynq PS with the AXI4 peripherals.
After this important step, the design must be validated, synthesized and
implemented and, finally, the bitstream can be generated [21].
At the end of the design process, it is possible to export the hardware platform,
which can be used in the Xilinx Vitis tool, the environment used to write the
application to be run on the Zedboard and to provide software drivers for the
hardware configuration just presented.
CHAPTER 7
7 Final Remarks
This last Chapter summarizes the objectives of the thesis, focusing on the achieved results.
Then, a list of topics that could be further analyzed and studied in depth is presented as
future work.
7.1 CONCLUSIONS
In this thesis work, the implementation of a Software Defined Radio for a Satellite
Radio System has been presented. The main goal was to obtain a
Receiver prototype able to receive, process and decode a signal that is as close as
possible to a real one.
For this purpose, a Receiver architecture was analyzed and the design was
adapted to the specifications chosen for the prototyping.
Each block of the receiver chain was analyzed through
simulations in the MATLAB environment, creating a first implementation that was
not able to work with a real-time signal, but was able to process a signal having some
specific characteristics.
Finally, a prototype was implemented on a development board with good
results, which are very close to the ones obtained in the design phase. The
prototype was tested and validated, providing a solid basis for future
developments.
A summary of the objectives proposed in Chapter 1.2 is reported here, focusing
on the obtained results:
• Literature study of a Satellite Radio System focusing on the Receiver
structure: the study of a typical coherent satellite receiver gave
information about its architecture. Following the CCSDS Standard, a
series of specifications about the transmitted signal in a TC system were fixed
and they served as a guideline in the modeling of the initial proposed Rx
architecture;
• Study of a Base Station Transmitter to generate an Uplink signal, able to
send realistic data: the software GNU Radio was chosen in order
to model the transmitted signal based on the fixed specifications. This
choice made it possible to build a flexible transmitter structure that is able to save
the signal as a stream of bits in a file, for the first part of the simulation in
MATLAB, but also to send the signal in real time using different
interfaces for the later implementation on the development board;
• Feasibility analysis of the proposed architecture through MATLAB
simulations: MATLAB was used as the simulation environment. The receiver
chain was implemented, optimizing each block and tuning its parameters
to adapt them to the transmitted signal. A study of their performance
was carried out, choosing the best solution in the design process;
• Implementation on a Development Board able to work in real time: once
the architecture was frozen, all the designed blocks were
implemented in C language on a development board;
• Performance measurements of each block, finding which of them can be
implemented in hardware: two blocks that required too many resources when
implemented in software were moved to hardware;
• Test of the prototype simulating real environment conditions: the
prototype was tested and validated through performance analysis. The
receiver was also tested in harsh conditions, changing the SNR and adding
some impairments to the transmitted signal.
7.2 FUTURE WORK
The test campaign of the Receiver prototype is still ongoing and is needed to
assess the system performance and the system
specifications more precisely.
When the specifications of the system are defined in detail, the structure of
the Telecommand System can be frozen for a future realization of the RF part of
the receiver. This will make it possible to study the behavior of the baseband part
of the receiver with the real impairments added by the non-idealities of the RF
front-end.
After that, following the same steps used for the design and implementation of the TC
system, the Telemetry system will be taken into account. Transmitter structure,
MATLAB simulations, performance measurements and implementation on a
development board will be carried out.
The two projects will then be merged in order to obtain a complete satellite
transponder prototype.
Bibliography
[1] Gara Quintana-Diaz, Roger Birkeland, “Software-Defined Radios in Satellite
Communications”, Norwegian University of Science and Technology, May
2018
[2] CCSDS Blue Book, “TC synchronization and Channel Coding”,
CCSDS 231.0-B-3, September 2017
[3] “Communication Systems / Coherent Receivers”,
https://en.wikibooks.org/wiki/Communication_Systems/Coherent_Receivers
[4] “GNU Radio”, https://wiki.gnuradio.org/index.php/Main_Page
[5] “AWGN Channel”, https://it.mathworks.com/help/comm/ug/awgn-channel.html
[6] Zynq-7000 SoC, “Technical Reference Manual”, UG585 (v1.12.2), July 2018
[7] Rich Griffin, Silica EMEA, “Designing a Custom AXI-lite Slave Peripheral”,
Version 1.0, July 2014
[8] Deep Space Network, “Pseudo-Noise and Regenerative Ranging”, 810-005
214, Rev. B, 2019
[9] Lydi Smaini, “RF Analog Impairments Modeling for Communication Systems
Simulation: Application to OFDM-Based Transceivers”, September 2012
[10] “Phase Lock Loop (PLL)”,
https://gssc.esa.int/navipedia/index.php/Phase_Lock_Loop_(PLL), 2011
[11] “Bilinear Transform”, https://en.wikipedia.org/wiki/Bilinear_transform
[12] “Building a Numerically Controlled Oscillator”
https://zipcpu.com/dsp/2017/12/09/nco.html, 2017
[13] Timo I. Laakso and Bing Zeng, “Limit Cycles in Floating-Point
Implementations of Recursive Filters – A Review”, IEEE International
Symposium, June 1992
[14] Erkin Cubukcu, “Root Raised Cosine (RRC) Filters and Pulse Shaping in
Communication Systems”, Friday 2012
https://ntrs.nasa.gov/archive/nasa/casi.ntrs.nasa.gov/20120008631.pdf
[15] “Eye Pattern”, https://en.wikipedia.org/wiki/Eye_pattern
[16] Huang Lou and Pingfen Lin, “A New Lock Detector for Gardner’s Timing
Recovery Method”, IEEE Transactions on Consumer Electronics, May 2008
[17] G. Karam, V. Paxal and M. Moeneclaey, “Lock Detectors for Timing
Recovery”, International Conference on Communications, June 1996
[18] CCSDS Green Book, “TC Synchronization and Channel Coding – Summary of
Concept and Rationale”, CCSDS 230.1-G-2, November 2012
[19] TU Wien, “LDPC Codes”,
https://www.nt.tuwien.ac.at/wp-content/uploads/2016/10/DC2-16_Ch7_LDPC.pdf
[20] Vikram Chandrasetty, Syed Mahfuzul Aziz, “Resource Efficient LDPC
Decoders”, Academic Press, December 2017
[21] “Creating a Custom IP core using the IP Integrator”, Digilent,
https://reference.digilentinc.com/learn/programmable-logic/tutorials/zedboard-creating-custom-ip-cores/start