Active-Stereo Synchronization of multiple Displays via Ethernet
Master Thesis
Saarland University
Faculty 6 - Natural Sciences and Technology I
Computer and Communications Technology
submitted by: Julian Metzger
on: January 2, 2012
supervisor: Dipl.-Ing. Jochen Miroll
1st examiner: Prof. Dr.-Ing. Thorsten Herfet
2nd examiner: Prof. Dr.-Ing. Philipp Slusallek
Master Thesis
B.Sc. Julian Metzger
Topic: Active-Stereo Synchronization of multiple Displays via Ethernet
Tiled displays, video walls and virtual reality (VR) installations typically consist of multiple identical displays such as a number (n) of CRT monitors, LCDs or projectors, creating an immersive experience by capturing a large part of the field of vision. Composition of the set of n displays in an (x · y = n) setup enables an increased resolution compared to a single display. In order to create a single virtual canvas, the displays have to be frame-locked (Framelock) and the content sources have to be generator-locked (GenLock).
In this work, Framelock of n displays driven by n independent PCs, where one display serves as the clock master, shall be realized via Ethernet at an accuracy that is sufficient for 120 Hz active-stereo while ghosting artifacts are limited. A scalable mechanism and the software framework for this purpose shall be established and results of an evaluation of the accuracy in theory and by measurement shall be obtained.
This topic includes the following tasks:
• Brief description of real-time Ethernet and comparison of Ethernet clock synchronization protocols and their implementations, such as NTPv4, IEEE 1588 (PTP) and 802.1as.
• Description and analysis of GenLock mechanisms as used in MPEG Transport streams (H.222.0 | ISO/IEC 13818-1) when implemented in software on PCs, as well as digital video signal (DVI/HDMI) generation and Genlock.
• Description and evaluation of runtime refresh rate variation and its artifacts.
• Summary of stereoscopic display technologies and of scalable, tiled (VR) display wall projects as provided in the literature, and their requirements.
• Description of possible IP-based, scalable multi display Framelock architectures in which one of the displays serves as the master clock.
• Design, implementation and evaluation of a prototype for active stereoscopy.
• Measurement of synchronization accuracy and long term stability in the presence of background traffic and extrapolation of the results for many (n ≥ 8) displays.
Software development may be based upon pre-existing open source projects and/or video driver code. The prototype shall consist of at least three display nodes, for which hardware is available. Measurements may be obtained by synchronous display of "dummy" images.
Supervisor: Dipl.-Ing. Jochen Miroll
Lehrstuhl für Nachrichtentechnik (Chair of Telecommunications), FR Informatik
Prof. Dr. Th. Herfet
Universität des Saarlandes, Campus Saarbrücken, C6 3, 10th floor, 66123 Saarbrücken
Phone (0681) 302-6541, Fax (0681) 302-6542
www.nt.uni-saarland.de
Statement under Oath
I confirm under oath that I have written this thesis on my own and that I have not used any other media or materials than the ones referred to in this thesis.
Declaration of Consent
I agree to make both versions of my thesis (with a passing grade) accessible to the public by having them added to the library of the Computer Science Department.
Saarbrücken, …………………………….. (Date) …………………………………………. (Signature)
Contents
1 Introduction
2 Project Description
3 Refresh Rate and Display Timing
4 Synchronization Techniques on Ethernet
4.1 Realtime Ethernet
4.2 Network Time Protocol
4.3 Precision Time Protocol
4.4 802.1as
5 Genlock in MPEG
6 Display Technologies for Stereo Vision
6.1 Active Shutter Stereo Display Technology
6.2 Polarization Stereo Display Technology
6.3 HDMI 1.4a
7 Other Display Wall Solutions and Projects
7.1 Hardware Genlock
7.2 SoftGenLock
7.3 WinSGL
8 Refresh Rate Adaptation
8.1 Display Timing on Common Graphics Devices
8.2 Software controlled VCXO
9 Phase-Locked-Loop
9.1 Frequency Characteristics of PLLs
9.2 Type I and Type II Phase-Locked-Loops
9.3 Inner Loop Filter
9.3.1 Loop Filter Design Tools
9.4 Software PLL in the Display Synchronization
9.4.1 PLL Design Choices
10 Necessity of Synchronization
11 Synchronization Architecture
11.1 Synchronization Packet Format
12 Implementation Details
12.1 Clock Master Display
12.2 Slave Display Nodes
12.2.1 The VBLANK Detector
12.2.2 Synchronization Core
12.2.3 RTT Estimator
12.3 Frame deadline Predictor
13 Measurements and Synchronization Performance
14 Outlook
References
A SPLL Code Snippet
B Display Timings on Intel Graphics Cards
B.1 Display Pipe timing registers
B.2 Intel graphics devices generations
List of Figures
List of Tables
Glossary
1 Introduction
Composite displays built from LCDs are an appropriate solution for large screen sizes. These
display walls can provide a very high resolution of more than 10 megapixels and a very high
pixel density, as each single LCD in the composite display already features roughly
2 megapixels or more. LCDs provide a color brilliance that is unmatched by digital video
projectors, and their content remains visible in bright environments.
High performance projectors are very expensive compared to LCDs, and even consumer
LCDs provide a high image quality. Furthermore, no canvas, which is also quite expensive
for large screen sizes, is necessary.
Today stereo capable displays are available at moderate prices, which makes it possible to build
a large stereoscopic composite display with several times Full HD resolution.
There are several commercial and non-commercial software solutions for connecting a number
of displays into one large screen. What these solutions have in common, however, is that the displays
are connected to the content generating nodes by dedicated display cabling like DVI or HDMI.
Connecting multiple PCs to the composite display is complex and takes time to set up. A
reconfiguration requires reconnecting cables and reconfiguring software.
The Display Wall project that is investigated at the Intel Visual Computing Institute in
Saarbrücken aims to build a stereo capable composite display. What sets it apart is that the
displays are connected only through an IP network. Configuration and transfer of the video content
are purely software based and encapsulated in IP packets.
To enable a good visual quality and to create the impression of one composite display, with
seamless images across the borders of the single screens, a tight time synchronization of the
displays is necessary. Furthermore, it is essential for the goal of making the Display Wall stereo
capable. Missing synchronization already disturbs the visual quality of two-dimensional
content, but it completely destroys any stereo effect and leaves the spectator with doubled
images and severe ghosting.
Not only the display nodes but also the image generating nodes should be synchronized, so that
they transmit the content in time and match their rendering rate to that of the sinks.
As the only connection to the outside is an IP network, the synchronization is required to run
over Ethernet and to do without any additional synchronization cabling. The topic of this thesis
is the synchronization of active-stereo displays over Ethernet.
Figure 1: Projected Prototype
2 Project Description
The goal of the project described in this thesis is to implement and evaluate a software
based method to synchronize a number of display nodes assembled into a composite display.
A prototype with three synchronized display nodes was built to demonstrate and test the
development.
The requirements are:
• The synchronization should be precise enough to enable active-stereo vision.
• The synchronization should rely on Ethernet only and do without any dedicated,
additional cabling for synchronization.
• The synchronization must not create artifacts that affect the visual quality.
• The software architecture should scale to a larger number of display nodes.
Figure 1 illustrates the projected prototype.
One of the display nodes, the clock master display (CMD), serves as master clock and provides
a reference clock to a set of slave display nodes (SDNs). Each node is connected to a stereo
capable display, driven with a refresh rate of e.g. 120 Hz. The master is connected to the slave
nodes via Ethernet and periodically broadcasts synchronization information so that the slaves can
adapt to the master's exact refresh rate. The stereo glasses are separately synchronized to
one display node via infrared.
When discussing video synchronization and composite displays, several terms are used to
describe the level of synchronization:
Genlock The video output of a system is synchronized to an external clock signal (generator
lock).
Framelock Two or more nodes that display frames at exactly the same rate and with the
same phase are framelocked. Framelock is a crucial requirement for active stereo vision.
Swaplock Applications that run on different nodes but generate content for a composite
display need to swap their buffers at exactly the same time. This is ensured by swaplock.
Framelock is an indispensable requirement for displaying active stereo on a composite display.
Missing framelock not only disturbs the impression of the composite image. Since the stereo
shutter glasses are synchronized to only one display node, the stereo effect also
vanishes on all displays but this one, as no stereo separation is possible there.
The developed synchronization architecture is to be included in the Display Wall project later
and will display video streams received over Ethernet. Thus synchronization with the
video sources is also necessary: the generation rate of video frames must match the consumption
rate at the video sinks in order to avoid buffer overruns and underruns. This can be achieved with
genlock. However, the restrictions on frequency are much tighter at the video sinks. Moreover,
it is not guaranteed that the video sources are on the same network as the video sinks. A
genlock from a video source over a large distance would complicate the synchronization due
to larger jitter and delay, possible packet reordering and packet loss.
For these reasons it was decided to use a reverse genlock. Instead of the video source clocking
the displays, the master display provides information to the video source about the frame
consumption rate and the video generation requirements.
Swaplock is also a necessary ingredient for the final Display Wall. Swaplock ensures that
all video sinks display the correct video frame. This is especially important for video scenes
that contain movement, as it avoids incoherence between the individual parts of the composite
image. The synchronization architecture itself does not provide swaplock, but it supports the
implementation of swaplock by providing the necessary information.
The main goal of this work is to ensure proper and accurate framelock, but it also provides
the necessary information for genlock and swaplock.
Sections 3, 4, 5, 6 and 9 present the theoretical background regarding synchronization
and display technologies.
Section 7 summarizes earlier display wall projects and commercial solutions, and section 8 explains
the approaches and results of three different refresh rate variation methods.
Section 11 describes the synchronization architecture, as implemented in the prototype, to
build a scalable, synchronized display wall for active stereo.
Section 12 explains the details of the implementation.
3 Refresh Rate and Display Timing
This section explains the details of refresh rate and display timing.
The image on a display device is generated at some rate by the graphics device and sent to
the display. The display refreshes the currently shown image with the new pixel data. Today
LCDs are the common display devices and have nearly displaced CRTs. The way the visible image
is created is completely different, but the format of the pixel data sent via cable to
the displays is still the same.
To understand the background of refresh rate generation, one has to look at the image
generation of analog CRTs: In a CRT an electron beam moves over the screen and excites
a fluorescent material to emit light. The pixels are drawn one after another onto the
screen, starting from the upper left corner and proceeding line by line to the bottom right.
At each line end the beam needs to be steered from the right edge back to the beginning of
the next line. This action is called the horizontal retrace. During the retrace the beam must
be switched off to prevent drawing unwanted pixels onto the screen. The phase in which the
beam is switched off and retraced horizontally is called the horizontal blanking interval
(HBLANK).
To signal the end of the line to the monitor, the horizontal sync interval (HSYNC) is included
at the end of each line. It is positioned within the HBLANK. To allow the analog voltage
signal on the display cable to stabilize before and after HSYNC, two additional margins are
inserted, the front porch and the back porch.
The same applies in the vertical direction. At the end of the bottom line, the electron
beam has to travel back to the upper left corner of the display. Therefore the vertical blanking
interval (VBLANK) follows after the last line of the visible image. The vertical sync
interval (VSYNC) instructs the electron beam to retrace. A vertical front and back porch are
included as well.
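The structure of a scan line and of a frame described above can be sketched as a small data type. This is only an illustration: the class and field names are hypothetical, and the numbers are the widely used 1920x1080 at 60 Hz timing from CEA-861, which is an assumed example mode and not necessarily the mode used in the prototype.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AxisTiming:
    """Timing of one scan direction, in pixels (horizontal) or lines (vertical)."""
    active: int       # visible pixels or lines
    front_porch: int  # margin between the visible area and the sync pulse
    sync: int         # width of the HSYNC or VSYNC pulse
    back_porch: int   # margin between the sync pulse and the next visible area

    @property
    def blank(self) -> int:
        """Length of the blanking interval (HBLANK or VBLANK)."""
        return self.front_porch + self.sync + self.back_porch

    @property
    def total(self) -> int:
        """Total length including blanking (H_TOTAL or V_TOTAL)."""
        return self.active + self.blank


# Assumed example mode: 1920x1080 at 60 Hz (CEA-861 timing).
horizontal = AxisTiming(active=1920, front_porch=88, sync=44, back_porch=148)
vertical = AxisTiming(active=1080, front_porch=4, sync=5, back_porch=36)

print(horizontal.blank, horizontal.total)  # 280 2200
print(vertical.blank, vertical.total)      # 45 1125
```

The sync pulse lies inside the blanking interval, surrounded by the two porches, which is why the blanking length is the sum of all three.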
A periodic refresh of the image is necessary to create an image that does not appear to flicker
to the viewer. The fluorescent material has a specific afterglow. If a pixel is not redrawn
within this afterglow period, it will darken and vanish. The rate of this periodic refreshing of
each image pixel is called the refresh rate, in the following denoted as $r_v$, and is identical
to the rate of the VSYNCs. Depending on the hardware and the sensitivity of
the viewer, CRTs require refresh rates of at least 75 to 80 Hz for an undisturbed perception of the image.
As the electron beam requires a few µs to retrace, the blanking periods must be
sufficiently long. The lengths of the blanking periods for CRTs are specified by the Video Electronics
Standards Association (VESA) in the general timing formula (GTF). Figure 2 shows the
geometry of the image.
Figure 2: pixel alignment (diagram of the visible display area within the frame; labels: H_ACTIVE, H_TOTAL, HSYNC, VSYNC, V_BLANK_START, V_BLANK_END, V_SYNC_START, V_SYNC_END)
On the cable the pixels are transferred one after another in one continuous stream. This is
illustrated in figure 3.
Figure 3: Pixel transmission on the display cable
The image generation of LCDs is completely different from that of CRTs. An LCD has a
fixed number of pixels, which are addressed through a matrix. Once a pixel is switched on, it
theoretically needs to be changed only if the image content changes. Although the DVI
specification already mentions a selective refresh, a periodic refresh with a fixed
rate is commonly used. As no flickering occurs, the refresh rate on LCDs is typically chosen as 60 Hz.
On digital television sets (DTVs), refresh rates that are equal to or multiples of video frame rates
are preferred, e.g. 60 Hz · 1000/1001 ≈ 59.94 Hz = 2 · 29.97 fps.
The format of the video output of the graphics card is the same for digital display devices.
All periods described above are contained in the video signal. The basis for the transmission in
DVI and HDMI is the TMDS link. Each TMDS link contains three data channels that carry
10-bit symbols, each of which is created from 8 bits of pixel data. Each TMDS link can be clocked
with a rate of up to 165 MHz. This clock is transmitted on the display cable and is called the
pixelclock or dotclock. On digital display devices this pixelclock is used for the synchronization of
display and graphics card instead of the HSYNC and VSYNC. One TMDS link is mandatory,
the second one is optional. Intel graphics devices only support one TMDS link.
The pixelclock is directly related to the refresh rate. The pixelclock is the rate at which the
single pixels are transmitted. The refresh rate $r_v$ is the pixelclock $f_p$ divided by
the total number of pixels per frame, i.e. the product of the total line length $h_{tot}$ and the
total number of lines $v_{tot}$, both including the blanking intervals:
$$r_v = \frac{f_p}{h_{tot} \cdot v_{tot}} \tag{1}$$
Equation 1 shows that there are two ways to change the refresh rate:
1. A change of the denominator, the number of pixels transferred to the display.
2. Modification of the numerator, the pixelclock.
Both methods were evaluated; the approach and the results are explained in detail in section 8.
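As a rough numerical illustration of equation 1 and of the two options, the following sketch computes the refresh rate for an assumed example mode (1920x1080 with 2200 x 1125 total pixels and a 148.5 MHz pixelclock). The function name and the chosen variations (one extra blanking line, a pixelclock reduced by 100 ppm) are assumptions for illustration and do not describe the method of section 8.

```python
def refresh_rate(pixel_clock_hz: float, h_total: int, v_total: int) -> float:
    """Equation 1: refresh rate = pixelclock / (h_tot * v_tot)."""
    return pixel_clock_hz / (h_total * v_total)


# Assumed example mode: 2200 x 1125 total pixels at 148.5 MHz.
base = refresh_rate(148.5e6, 2200, 1125)

# Option 1: change the denominator, here by adding one line to the frame.
longer_frame = refresh_rate(148.5e6, 2200, 1126)

# Option 2: change the numerator, here by lowering the pixelclock by 100 ppm.
slower_clock = refresh_rate(148.5e6 * (1 - 100e-6), 2200, 1125)

print(f"{base:.4f} Hz, {longer_frame:.4f} Hz, {slower_clock:.4f} Hz")
```

In this example a single extra line changes the refresh rate far more than a 100 ppm change of the pixelclock, which hints at why the granularity of the two options differs.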
4 Synchronization Techniques on Ethernet
Synchronization between remote systems is crucial to many applications. There are several
points that influence the accuracy of synchronization over networks:
• frequency stability and deviation of quartz oscillators
• accuracy of timestamp generation on ingress and egress of network packets
• delay and jitter induced by the network
• network topology
• delay and scheduling indeterminism of the operating systems
The following part gives an overview of realtime Ethernet and time synchronization protocols.
4.1 Realtime Ethernet
Standard Ethernet lacks realtime capabilities. It neither provides reliable transmission nor
adheres to deterministic timing constraints.
Indeterminism in 802.3 is introduced at several points:
• no realtime scheduling of the OS
• dynamic address resolution
• collisions on the shared medium
• delay and lost packets due to congestion
A number of realtime Ethernet solutions, mostly commercial, currently exist that aim
to overcome the problems stated above.
Two concepts of realtime can be distinguished:
hard realtime Missing a deadline is not tolerable and is regarded as a failure of the system.
soft realtime Missing a deadline results in degraded system performance but the system
remains functional.
Approaches to make standard Ethernet realtime capable can be made on all network layers. The
effectiveness, however, increases towards the physical layer.
One method to enable soft realtime is to introduce a priority scheme. Depending on the
deadline and importance of the data, a priority number is assigned. Network devices at the
nodes and switches transmit packets with a higher priority first.
However, this cannot guarantee determinism. If many senders transmit high
priority packets, congestion or packet losses can still occur.
Achieving hard realtime usually requires modifications on the lower layers. In switched
networks, switches with realtime extensions are necessary. The principle in most approaches to
establishing a guaranteed transmission delay and bandwidth is to introduce TDMA. A master
assigns timeslots to the nodes in the network. In each timeslot only one specific node is
allowed to send data. An accurate time synchronization is required between master and slaves,
as the slaves must transmit exactly at the assigned time.
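The TDMA principle can be sketched in a few lines. The sketch assumes a simple round-robin schedule with equally long slots; the function name, the node names and the slot length are hypothetical and not taken from any particular realtime Ethernet product.

```python
from typing import Sequence


def slot_owner(time_us: int, slot_length_us: int, nodes: Sequence[str]) -> str:
    """Return the node that may transmit at the given synchronized time."""
    cycle_length_us = slot_length_us * len(nodes)
    slot_index = (time_us % cycle_length_us) // slot_length_us
    return nodes[slot_index]


nodes = ["master", "slave-1", "slave-2"]  # hypothetical node names
print(slot_owner(0, 250, nodes))    # master
print(slot_owner(300, 250, nodes))  # slave-1
print(slot_owner(800, 250, nodes))  # master (second cycle)
```

All nodes evaluate the same function on their own clock, so a clock error larger than the guard time between slots would let two nodes transmit at once. This is why the scheme depends on accurate time synchronization.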
4.2 Network Time Protocol
The Network Time Protocol (NTP) was invented in 1985. Its goal is to synchronize clocks
on distant nodes. The techniques and algorithms it uses enable a precision in the range of tens of
milliseconds, and in ideal cases in LANs down to a few milliseconds. The protocol is based on a
periodic exchange of synchronization messages containing timestamps. The current version
of the protocol is version 4.
The NTP timestamp is a 64-bit value, which consists of an unsigned 32-bit seconds field
and an unsigned 32-bit fractional seconds field. This gives a resolution of $2^{-32}$ s ≈ 233 ps and
a range of $2^{32}$ s ≈ 136.19 years. For special purposes a 32-bit short format and a 128-bit
date format are also available.
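The 64-bit timestamp layout can be illustrated with a short conversion sketch. The helper names are hypothetical, the conversion truncates rather than rounds, and the era handling of real NTP implementations is ignored.

```python
NTP_FRACTION = 1 << 32  # the fraction field counts units of 2**-32 s


def to_ntp_timestamp(seconds: float) -> int:
    """Pack a non-negative time in seconds into the 64-bit NTP format."""
    whole = int(seconds)
    fraction = int((seconds - whole) * NTP_FRACTION)
    return ((whole & 0xFFFFFFFF) << 32) | fraction


def from_ntp_timestamp(timestamp: int) -> float:
    """Unpack a 64-bit NTP timestamp into seconds."""
    return (timestamp >> 32) + (timestamp & 0xFFFFFFFF) / NTP_FRACTION


stamp = to_ntp_timestamp(1.5)
print(hex(stamp))                 # 0x180000000
print(from_ntp_timestamp(stamp))  # 1.5
print(f"{1e12 / NTP_FRACTION:.1f} ps resolution")
```

The upper 32 bits hold the seconds and the lower 32 bits the fraction, so 1.5 s becomes 1 in the upper half and 0x80000000 (one half) in the lower half.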
The synchronization is based on calculations with four timestamps, which are exchanged in
the NTP packets. In the following, one synchronization round is described, as depicted in
figure 4: Peer A, in the role of a polling client, sends an NTP message containing timestamp
$t_1(t)$ to peer B. Peer B generates timestamp $t_2(t)$ on packet ingress. It builds a reply packet,
inserts $t_1(t)$ and $t_2(t)$ and adds $t_3(t)$, the timestamp generated at the time of egress. When
the reply packet arrives at peer A, peer A generates timestamp $t_4(t)$.
Figure 4: NTP Synchronization Diagram
From these timestamps peer A is able to calculate the round-trip delay δ(t), the time that the
packet needs to travel one round, excluding the processing time at the remote node. Knowing
the round-trip delay, the client is able to compute the offset θ(t), the difference between $t_2(t)$
and the time shown by peer A's own clock at the moment $t_2(t)$ was generated. The calculation of
θ(t) assumes a symmetric path delay.
$$\delta(t) = (t_2(t) - t_1(t)) + (t_4(t) - t_3(t)) \tag{2}$$
$$\theta(t) = (t_2(t) - t_1(t)) - \frac{\delta(t)}{2} = \frac{(t_2(t) - t_1(t)) - (t_4(t) - t_3(t))}{2} \tag{3}$$
To increase accuracy and minimize the impact of jitter and of temporarily increased round-trip
delay due to congestion, NTP uses a clock filtering algorithm. For each incoming NTP packet
from the different servers, further statistics besides θ(t) and δ(t) are calculated. Based on these statistics
only the good NTP time servers are chosen as time reference. Details can be found in [9].
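A small worked example may help to see how equations 2 and 3 behave. The scenario (peer B's clock 5 s ahead, 10 ms in each direction, 1 ms processing time at B) and the function name are assumptions chosen for illustration; the clock filtering step is not modeled.

```python
def ntp_delay_and_offset(t1: float, t2: float, t3: float, t4: float):
    """Round-trip delay (equation 2) and clock offset (equation 3).

    t1 and t4 are read from the clock of peer A, t2 and t3 from peer B.
    """
    delay = (t2 - t1) + (t4 - t3)
    offset = ((t2 - t1) - (t4 - t3)) / 2
    return delay, offset


# Assumed scenario: B is 5 s ahead of A, each direction takes 10 ms,
# and B needs 1 ms to build the reply.
delay, offset = ntp_delay_and_offset(t1=100.000, t2=105.010, t3=105.011, t4=100.021)
print(f"delay = {delay * 1e3:.1f} ms, offset = {offset:.3f} s")
```

Because the path is symmetric in this scenario, the 5 s clock difference is recovered exactly and the 1 ms processing time at B does not enter the delay. With asymmetric paths, half of the asymmetry would appear as an offset error.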
4.3 Precision Time Protocol
The Precision Time Protocol (PTP), standardized as IEEE 1588, was developed to provide
finer time precision and an increased accuracy compared to NTP. It was proposed in 2002 and
updated to its second version in 2008. It can provide accuracy down to the ns range. It includes
an algorithm to build up a hierarchical clock tree with one root clock, the grandmaster (GM).
The GM is typically connected to an external high precision clock, e.g. GPS or a radio clock.
One of the key differences compared to NTP is the separation of the synchronization into two
separate steps:
• Synchronization of the oscillator frequency at the slaves to the reference frequency of
the GM.
• Calculation of the delay to the GM to determine the absolute time.
The process of frequency synchronization is called syntonization. For syntonization the PTP
server, the clock master, periodically broadcasts packets with timestamp $t_m$ to the client, the
PTP slave. At each arrival of a syntonization packet, the client generates timestamp $t_s$. The
slave is in sync with the master if, for consecutive packets $i$ and $i+1$,
$$t_m^{i+1} - t_m^{i} = t_s^{i+1} - t_s^{i} \quad\text{and}\quad t_s^{i+1} - t_m^{i+1} = t_s^{i} - t_m^{i} \tag{4}$$
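The syntonization condition compares the interval between two sync packets as seen by the master with the same interval as seen by the slave. The following sketch computes this ratio for a series of timestamps. The function name and the example values (50 ppm frequency error, 2 ms constant network delay) are assumptions, and jitter is ignored.

```python
def interval_ratios(master_stamps, slave_stamps):
    """Ratio of slave interval to master interval for consecutive sync packets.

    A ratio of 1.0 means that the slave clock runs at the master's frequency.
    """
    ratios = []
    for i in range(len(master_stamps) - 1):
        master_interval = master_stamps[i + 1] - master_stamps[i]
        slave_interval = slave_stamps[i + 1] - slave_stamps[i]
        ratios.append(slave_interval / master_interval)
    return ratios


# Assumed example: one sync packet per second, slave oscillator 50 ppm too
# fast, constant network delay of 2 ms (it cancels out in the intervals).
t_m = [0.0, 1.0, 2.0, 3.0]
t_s = [0.002 + t * (1 + 50e-6) for t in t_m]
for ratio in interval_ratios(t_m, t_s):
    print(f"deviation: {(ratio - 1) * 1e6:+.1f} ppm")
```

A constant delay drops out of the interval differences, so syntonization works without knowing the path delay. Only the absolute offset requires the separate delay measurement.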
Figure 5: PTP Syntonization
The follow_up messages shown in figure 5 belong to one of two syntonization options. In the two-step
option the client creates its local timestamp on the reception of the sync packet. The
master's timestamp is received afterwards in the follow_up message. This option is included
in the standard to allow nodes to take part in the synchronization that are not able to alter the
packet content on the fly. These could be, e.g., PTP capable Ethernet bridges. Alternatively,
PTP allows the follow_up packet to be omitted if the timestamp is instead sent within the
sync packet. The sync and follow_up messages are sent as multicast, which allows all clients
to receive the syntonization information.
In IEEE 1588 version 1 the sync message contains the timestamps for syntonization and
the information for building a clock tree in hierarchical network topologies. To increase the
precision, the second version of PTP splits the sync packets of PTP version 1 into a sync
and an announce message. Syntonization is based on the sync packets, and the announce
messages are exchanged between the network nodes to build the clock tree. Version 2 allows
sync packets to be sent as often as every 1/8 s, whereas the shortest sync interval in version 1 is 1 s. The
separation into sync and announce reduces the overhead, as the clock tree information can
be updated at a lower rate.
One of the most important ingredients is a hardware timestamping mechanism. Timestamps
that are generated on the application layer suffer from varying scheduling and processing
delay, induced by passing the packets through the network stack. Timestamps that are
generated by the network hardware preserve the exact moment of packet ingress even when
the PTP packet is handled with delay by the PTP application. Therefore IEEE 1588 capable
network hardware includes a hardware timestamping mechanism between the MAC layer and
the physical layer, which increases the synchronization performance of PTP.
Syntonization provides the correct frequency at the client. To remove a possible
offset between the slave and master clock, PTP proceeds similarly to NTP. The clock offset is
calculated in the same way as in NTP, given in equation 3. Timestamps $t_1$ and $t_2$ are already
available from the last sync packet. To get timestamps $t_3$ and $t_4$, the slave sends a delay_req
packet to the master. Timestamp $t_3$ is generated at packet egress. The master responds with
a delay_resp packet, which contains $t_4$, the time of packet arrival at the master.
If multiple master clocks are available, PTP employs the best master clock algorithm
(BMCA) to find the most accurate and stable master clock. In hierarchical topologies
with multiple switches which can act as slave and master, the BMCA also finds a loop-free
configuration.
Network switches and routers usually degrade the synchronization, as they
introduce queuing delay that depends on the network load and is not deterministic from
the nodes' point of view. IEEE 1588 capable routers and switches address this problem. Besides clock
master and slave, two additional clock types are defined:
• boundary clocks (BC)
• transparent clocks (TC)
A BC acts as a slave and a master simultaneously. On the slave port it receives the synchronization
information from the GM or another BC and synchronizes its local clock to the master.
On the other ports it acts as a master and provides synchronization to connected slave clocks.
Sync packets are not forwarded through a BC.
Transparent clocks relay all sync messages. However, they communicate the delay introduced
by buffering in their queues by altering the sync packets or the follow_up messages, respectively.
Two types of TCs are described by IEEE 1588:
The end-to-end transparent clock measures the time by which the sync packet was delayed
in the switch, the residence time. For this purpose, timestamps are taken on ingress and egress; the
residence time is their difference. The residence time is communicated to the slave clocks via a
correction field in the PTP packets. As mentioned above, this can happen either directly in
the sync packet or in a separate follow_up. The delay at the slave clocks is measured using
the delay_req and delay_resp messages as described above. For a precise calculation of the
residence time, the clock of the TC needs to be syntonized but not synchronized.
In addition to the residence time, the peer-to-peer transparent clocks also measure the link
delay to their direct neighbors. When a sync packet travels to the slave clock, the link delays
and the residence times are summed up in the correction field. Thus all information for syntonization
and synchronization is provided by the sync packets or the follow_up
messages, respectively. No additional delay_req and delay_resp packets need to be exchanged between
slave and master. This decreases the load on the GM in large networks.
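How the correction field accumulates along the path can be sketched as follows. The message class, its field names and the delay values are hypothetical simplifications and do not reproduce the PTP wire format; they only illustrate the difference between the two transparent clock types.

```python
from dataclasses import dataclass


@dataclass
class SyncMessage:
    origin_timestamp: float  # t_m set by the master, in seconds
    correction: float = 0.0  # delay accumulated by transparent clocks


def pass_e2e_clock(msg: SyncMessage, ingress: float, egress: float) -> None:
    """End-to-end TC: add the residence time (egress minus ingress)."""
    msg.correction += egress - ingress


def pass_p2p_clock(msg: SyncMessage, ingress: float, egress: float,
                   upstream_link_delay: float) -> None:
    """Peer-to-peer TC: add the residence time and the measured link delay."""
    msg.correction += (egress - ingress) + upstream_link_delay


# Assumed path with two peer-to-peer TCs; ingress and egress are local
# timestamps of the respective switch.
msg = SyncMessage(origin_timestamp=10.0)
pass_p2p_clock(msg, ingress=0.0, egress=40e-6, upstream_link_delay=1e-6)
pass_p2p_clock(msg, ingress=0.0, egress=15e-6, upstream_link_delay=2e-6)
print(f"{msg.correction * 1e6:.1f} us")  # 58.0 us
```

Only the difference of the two local timestamps enters the correction, which is why a transparent clock needs the right frequency but not the right absolute time.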
One application field for PTP is realtime Ethernet, where it provides accurate synchronization
for TDMA.
4.4 802.1as
802.1as is a standard developed by the AVB group of 802.1. It is closely related to IEEE 1588,
being a subset of that standard composed for a specific application scenario and enhanced
for use in 802.11 wireless networks as well. It is designed to synchronize clocks in heterogeneous
bridged networks with a deviation of at most ±500 ns. Its particular purpose is the
synchronized playback of audio and video streams. It assumes quartz oscillators which conform
to a maximum offset of ±100 ppm and a frequency drift of at most 1 ppm/s. It defines several
default values where PTP offers a parameter set, e.g. it specifies the syntonization interval to be
1/8 s and the delay calculation interval as 1 s.
It features an automatic selection of the best clock, the GM, and the construction of a hierarchical
clock distribution tree. A clock that can serve as GM sends announce messages. If a GM
receives announce messages of a more precise GM, it stops announcing itself. Clock
aware bridges relay only the announce messages of the best GM. In the end only one GM
is left and supplies the whole network with its clock.
The 802.1as capable bridges, which are similar to the peer-to-peer TCs
in IEEE 1588, play a particular role. Each bridge measures on each port the delay to its neighbor, using the
calculation described for NTP and PTP in equation 2. Additionally, the bridges determine for each link the clock ratio
$r_c$ of the neighbor compared to their own clock:
$$r_c = \frac{t_n(i+1) - t_n(i)}{t_l(i+1) - t_l(i)} \tag{5}$$
Here $t_n$ are the timestamps received from the neighbors and $t_l$ are the locally generated
timestamps. The timestamps are exchanged and generated in the same way as for syntonization in PTP.
No GM is needed to calculate the delay and the clock ratio to the neighbor nodes. As soon as a GM announces itself, synchronization is reached within a short period, because the necessary values have already been calculated. The clock is propagated from the root clock towards the leaves of the clock distribution tree. The clock ratios along the clock path are accumulated to compute the ratio between the GM and the local clock:
\[ R_c = R_{cn} + (1.0 - r_c) \tag{6} \]
$R_c$ is the clock ratio between the local clock and the GM, $r_c$ is the clock ratio between the local clock and the neighbor's clock in the direction of the master clock. $R_{cn}$ is the accumulated clock ratio between the neighbor and the GM and is initialized with 1. The delay is the sum of all propagation and processing delays.
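As a minimal sketch of equations 5 and 6, the following Python snippet computes the per-link clock ratio from two consecutive timestamp pairs and accumulates it towards the GM. The function names and the example timestamps are illustrative assumptions and are not taken from 802.1as.

```python
def neighbor_clock_ratio(t_n, t_l):
    """Equation 5: ratio of the neighbor's clock rate to the local clock rate.

    t_n: two consecutive timestamps received from the neighbor
    t_l: the two locally generated timestamps for the same events
    """
    return (t_n[1] - t_n[0]) / (t_l[1] - t_l[0])


def accumulated_clock_ratio(R_cn, r_c):
    """Equation 6: clock ratio between local clock and GM, as given in the text."""
    return R_cn + (1.0 - r_c)


# Hypothetical example: the neighbor counts 1000.1 ticks while the local
# clock counts 1000 ticks, i.e. the neighbor runs 100 ppm faster.
r_c = neighbor_clock_ratio((1000.0, 2000.1), (500.0, 1500.0))
R_c = accumulated_clock_ratio(1.0, r_c)  # the neighbor is the GM, so R_cn = 1
```

For ratios within a few hundred ppm of 1, this additive form differs from the quotient R_cn / r_c only in second order.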
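When the clock ratio and the accumulated delay to the clock source are available at a client, the client is able to calculate the correct time as:
\[
\begin{aligned}
t_s(t) &\overset{!}{=} t_m(t) + \Delta(t) \\
&= t_m(t) + \bigl(\Delta_p(t) + \Delta_r(t)\bigr) + \delta_{pn}(t) \cdot R_c(t) \\
&= t_m(t) + \sum_i \bigl(\delta_p^i(t) + \delta_r^i(t)\bigr) R_c^i(t) + \delta_{pn}(t) \cdot R_c(t)
\end{aligned}
\tag{7}
\]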
$t_s(t)$ is the correct time at the slave node, $t_m(t)$ is the timestamp sent from the clock master, $\Delta_p(t)$ is the accumulated propagation delay on the path and $\Delta_r(t)$ the accumulated resident delay. The index $i$ runs over the 802.1as bridges on the path between master and slave. $R_c(t)$ is the clock ratio as given in equation 6 and $R_c^i(t)$ is the accumulated clock ratio between bridge $i$ and the GM. $\delta_{pn}(t)$ is the estimated propagation delay to the slave's direct neighbor; $\delta_p^i(t)$ and $\delta_r^i(t)$ are the path delay and the resident delay of bridge $i$.
Figure 6: Synchronization with 802.1as capable bridges
The propagation and resident delays in figure 6 are assumed to be already relative to the GM's clock ratio.
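Equation 7 can be illustrated with a few lines of Python. This is only a sketch: the function name, the tuple layout for the bridges and the example delays are assumptions chosen for illustration.

```python
def slave_time(t_m, bridges, delta_pn, R_c):
    """Equation 7: master timestamp plus accumulated, rate-corrected delays.

    t_m:      timestamp sent by the clock master
    bridges:  one (delta_p, delta_r, R_ci) tuple per bridge on the path
    delta_pn: propagation delay to the slave's direct neighbor
    R_c:      accumulated clock ratio of the slave (equation 6)
    """
    path_delay = sum((delta_p + delta_r) * R_ci
                     for delta_p, delta_r, R_ci in bridges)
    return t_m + path_delay + delta_pn * R_c


# Hypothetical values in nanoseconds: two bridges, all clock ratios equal to 1.0.
t_s = slave_time(1_000_000, [(100, 500, 1.0), (200, 400, 1.0)], 50, 1.0)
```

With all ratios equal to 1.0 the result is simply the master timestamp plus the sum of all delays on the path.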
5 Genlock in MPEG
Broadcasters transmit MPEG transport streams at a predefined frame rate. The renderers must play the received transport stream at exactly the same rate, otherwise buffer overruns or underruns will occur. These would lead to frame skips or frame repetitions, respectively. It is crucial to eliminate even the slightest clock deviation, as even a small frequency difference accumulates a phase offset with each played frame. Therefore the frequency of the receiver must be synchronized to the sender's frequency, which is accomplished by a genlock mechanism.
Besides the synchronization of sender and receiver in broadcast mode, synchronization is also necessary when files are played back on a local system. Corresponding audio and video streams need to be played out at a matching rate, called lipsync, to provide smooth playback and correct timing. Thus an MPEG program stream also needs to include timing information, although the playback only affects one local system.
ISO/IEC 13818-1 [13] describes the insertion of timestamps into MPEG-2 transport and program streams.
The genlock mechanism is implemented by including reference timestamps in the MPEG transport and program streams. In transport streams these timestamps are called program clock reference (PCR); in program streams they are called system clock reference (SCR). The function is basically the same, so the description using the term PCR applies to the SCR as well.
The timestamps represent a 27MHz system clock. As nearly all devices derive their system frequency from a 27MHz quartz, the PCR timestamps are a direct representation of that clock. The timestamp base is counted at 90kHz, which is 1/300 of 27MHz.
The timestamps are 42 bits wide: a 33-bit PCR base field encodes the 90kHz value, and the remainder is encoded in the 9-bit PCR extension field. Although 9 bits could hold values up to 511, the remainder wraps at 300. Figure 7 shows the position of the PCR field in the MPEG TS.
Figure 7: PCR in Mpeg Transport Stream
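The current system clock can be calculated from both PCR fields as
\[ \mathrm{PCR}(i) = \mathrm{PCR\_base}(i) \cdot 300 + \mathrm{PCR\_ext}(i). \tag{8} \]
Conversely, base and extension are obtained from the current system clock value as
\[ \mathrm{PCR\_base}(i) = \mathrm{PCR}(i) / 300, \qquad \mathrm{PCR\_ext}(i) = \mathrm{PCR}(i) \bmod 300 \tag{9} \]
where the division is an integer division.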
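Equations 8 and 9 translate directly into integer arithmetic. The following Python sketch uses illustrative function names and ignores the wrap-around of the 33-bit base field.

```python
SYSTEM_CLOCK_HZ = 27_000_000


def split_pcr(pcr):
    """Equation 9: split a 27MHz counter value into (base, extension)."""
    return pcr // 300, pcr % 300


def join_pcr(base, ext):
    """Equation 8: rebuild the 27MHz counter value from both fields."""
    return base * 300 + ext


# One second plus one 27MHz tick: the base counts 90kHz ticks,
# the extension holds the remaining 27MHz ticks (0 to 299).
base, ext = split_pcr(SYSTEM_CLOCK_HZ + 1)
restored = join_pcr(base, ext)
```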
As figure 7 indicates, the PCR is an optional field in the MPEG TS packets: not every packet carries a PCR; instead, the PCR is inserted periodically. ISO/IEC 13818-1 specifies the maximum interval between two consecutive PCR values, 40ms for TS and 100ms for PS. The maximum allowed jitter is ±500ns.
Figure 8 illustrates how the PCR is used to synchronize sender and receiver. The sender
generates the 42-bit PCR base and PCR extension values using a counter on its system time
clock (STC). The PCR is used to ensure synchronization in the encoded video and audio
streams. The PCR timestamps are multiplexed together with encoded audio and video data
into the MPEG TS.
Figure 8: PCR at transmitter and receiver
The receiver demultiplexes the TS and extracts the PCR timestamps. The PCRs are used for two purposes:
• The receiver synchronizes its own clock to the clock of the sender. The synchronization of the STC is accomplished by a phase-locked loop (PLL). Section 9 describes the PLL in detail. As soon as the PLL has locked onto the sender's frequency, the delay between the encoding and the decoding of the video is constant. The maximum amount of jitter allowed in the PCR of an MPEG TS is ±500 ns, thus the receiver is able to acquire lock in less than a second.
• The packetized elementary streams (PES) carried in the MPEG TS contain a Presentation Time Stamp (PTS) and an optional Decoding Time Stamp (DTS); both are relative to the PCR. The DTS tells the receiver at what time it has to decode a frame. This is necessary if the decoding order of the frames differs from their presentation order.
The PTS carries the time at which a frame should be played out. DTS and PTS are values ahead of the current time, but limited by ISO/IEC 13818-1 to 1s. Thus a receiver must have enough buffer to store at least as many frames as correspond to 1s of play time.
6 Display Technologies for Stereo Vision
The key to making a two-dimensional image on a flat screen appear as a three-dimensional object is to produce and separate two images, one for each eye.
In the field of computer and television displays there are two competing technologies, active shutter technology and polarization. Both require 3D glasses that carry out the separation of the stereo images.
Current research tries to remove the need to wear glasses by moving the image separation into parallax barriers integrated in the display. A few autostereoscopic displays are already available and these techniques will probably spread, but they still have drawbacks that rule them out for the composition of a display wall: they usually require the spectator to stand still in front of the screen. Eye-tracking methods that cope with this problem still support only a limited number of viewers.
6.1 Active Shutter Stereo Display Technology
The active shutter technique interleaves the frames for the right and the left eye in time. The stereo glasses consist of two small LCDs that can be switched independently between transparent and opaque. The glasses alternate between opaque and transparent, synchronized to the refresh rate of the stereo display, so that only the corresponding eye can see the current frame. The eye whose LCD is switched to opaque actually sees nothing, but with a sufficient refresh rate, typically 120Hz, the brain substitutes the image it has seen last. The brain then combines the images from both eyes into a stereo impression.
Figure 9: Active Shutter Technique (figure labels: 120 Hz, 60 Hz)
The stereo glasses halve the refresh rate seen by each eye. Thus the refresh rate of the display should be doubled compared to two-dimensional mode. The drawback of the shutter technique is a reduced brightness due to the switching LCDs in the glasses. The perceived brightness is determined by the on- and off-time of the glasses, but is at most 50%. The advantage is that the visible resolution is not reduced.
6.2 Polarization Stereo Display Technology
The polarization stereo display technique makes use of polarized light and polarizing filters. The two frames for the right and the left eye are interlaced into one frame. The odd lines belong to the frame for one eye, the even ones to the frame for the other eye; they are separated by emitting them with differently polarized light.
There are two types of polarization that can be used:
• linear polarization
• circular polarization
The viewer wears glasses with polarizing filters. Each lens lets only one direction of polarization pass, thus the lines of the interlaced frame are separated. In this technique both parts of the stereo image are seen at the same time.
Linear polarization is the simpler form and also allows cheaper filters. However, linear polarization is not rotation invariant, so viewers lose the stereo impression if they tilt their head. Therefore circular polarization is the preferred method.
Figure 10: Left- and right-handed circular polarized waves
In contrast to the shutter technique there is no flickering, but the interlaced image provides only half of the resolution of 2D mode.
6.3 HDMI 1.4a
HDMI 1.4a is the part of the HDMI 1.4 standard that describes the format for transmitting stereo images from the graphics device to the display. Support for HDMI 1.4a in stereo capable devices can be expected to increase, and it might also be the video format of choice in the final Display Wall. Therefore it is briefly explained here.
Figure 11: HDMI 1.4a Frame Packing (compare p.8[9])
Before the introduction of HDMI 1.4a there were two methods to bring stereo data onto displays, both described above:
• time interleaving the frames for right and left eye, which doubles the refresh rate compared to that of 2D content
• interleaving both stereo fields into one frame, which halves the vertical resolution
HDMI 1.4a defines three stereo frame formats. While the video sinks need to be capable of HDMI 1.4a, it is possible to generate HDMI 1.4a conforming video formats on graphics cards supporting only HDMI 1.3 by using custom modelines.
Several further formats are proposed for addition to the standard in the future; currently the following modes are defined and mandatory for an HDMI 1.4a capable sink:
• Frame Packing
• Side-by-Side (Half)
• Top-and-Bottom
All three modes transmit the fields for the left and the right eye in one stereo frame. The modes Side-by-Side (Half) and Top-and-Bottom halve the resolution in horizontal and vertical direction, respectively, and attach both fields to each other. Both methods differ from frame interleaving only in the arrangement of rows or columns. However, the format is suitable for both stereo display technologies. The display hardware can either rearrange the pixels into an interleaved frame for the polarization technique, or display both halves of the frame alternately, scaled to full screen, for the active shutter technique.
The Frame Packing mode is of particular interest. It stacks the frames for the left and the right eye vertically into one “superframe”. The superframe has twice the height of a 2D frame plus an additional margin between both stereo fields. In this mode the pixelclock is doubled compared to 2D video.
On a video display that uses the shutter technique, both stereo fields of the superframe are shown alternately. Thus the refresh rate at the video sink is twice the refresh rate of the graphics device.
This mode is an alternative for synchronization. The graphics cards would measure a frame rate that equals the refresh rate for one eye, and the synchronization would be based on this frequency. The displays automatically double the refresh rate and flip between the frames for the right and the left eye. It turned out that nodes using HDMI 1.4a frame packing at a refresh rate of e.g. 60Hz and nodes using time-interleaved stereo frames at a rate of 120Hz can also be synchronized with each other. The visible results on the display are the same, but the pixel transport and the VBLANK rate at the display nodes are different. The resolution at a refresh rate of 60Hz is limited to 720p.
7 Other Display Wall Solutions and Projects
Display walls are a solution if large display areas are necessary and the use of projectors or special extra-large displays is not possible. Therefore a number of different solutions for composing display walls exist. A selection of display wall technologies and projects is described in the following section and related to our project.
7.1 Hardware Genlock
Solutions for building a display wall with synchronized displays come from different hardware manufacturers. One example, from NVidia, is presented here. High-end graphics devices of the NVidia Quadro series can be connected to an additional NVidia Quadro G-Sync card. The G-Sync card adds a synchronization interface to the graphics cards, such that the Quadro graphics devices can be put into slave mode and follow the timing received at the input ports of the G-Sync card. G-Sync cards of remote hosts can be connected via CAT5 patch cables to relay a framelock signal and, where applicable, a genlock signal between the connected graphics cards. Though the NVidia solution uses the same cables as Ethernet, the two are incompatible with each other, as signals and voltage levels are different.
Each G-Sync card has two framelock ports that can serve either as input or as output port. Thus the framelock server can provide synchronization signals to at most two clients, and each client can relay the synchronization signal to one further client. This solution therefore requires the nodes to be connected in a daisy chain.
In addition to the framelock it is possible to feed a genlock signal to the master by connecting a genlock source to the genlock connector on the G-Sync card. The master, following the genlock, also synchronizes the clients to the genlock. Furthermore, the NVidia graphics driver provides a GLX extension for swap lock.
NVidia states that its synchronization solution is precise below scanline level. This means that the phase offset between all synchronized displays will not be larger than the period of one horizontal line. At a display resolution of 1080p and a refresh rate of 120Hz this is less than ±10µs.
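As a rough plausibility check of this bound, the duration of one scanline follows from the refresh rate and the total number of lines per frame. The total of 1125 lines used below is an assumption (the usual total line count of 1080p video timings, including blanking); it is not given in the text:
\[ T_{line} = \frac{1}{f_v \cdot v_{tot}} = \frac{1}{120\,\mathrm{Hz} \cdot 1125} \approx 7.4\,\mu\mathrm{s} \]
Under this assumption one scanline indeed lasts less than 10µs.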
Though dedicated hardware synchronization solutions are very accurate, there are some shortcomings:
The specialized hardware is quite expensive, though it allows two screens to be connected to each graphics device. Compared to a broadcast or hierarchical scenario, the daisy-chain setup is vulnerable if one of the slaves fails: all slaves behind the faulty slave are cut off and lose synchronization. In case of a permanent failure, cables need to be reconnected. Furthermore, it needs a dedicated cable connection purely for the synchronization.
7.2 SoftGenLock
The software SoftGenLock[1] was released in 2001. Its goal is to provide genlock synchronization for active stereo on analog CRTs with a precision of 5µs to 40µs. The approach is to use standard consumer graphics cards and to avoid hardware modifications of the graphics devices. The synchronization is controlled by a master, which signals sync events to a number of slaves. Master and slaves are connected in a star topology.
Active stereo is displayed by keeping two buffers, one for the left and one for the right eye. With each VBLANK the source address of the displayed image is exchanged, thus switching between the buffers. The shutter glasses for active stereo are connected to the master.
Two ingredients are necessary for the synchronization:
• detection of VBLANKS at the master and the slaves
• modification of the refresh rate at the slaves
VBLANK detection can be done in two ways:
• attaching an interrupt handler to the interrupt of the graphics card
• polling of a VGA state register
The first is obviously the more efficient method, as the CPU load between the VBLANKs is minimized. However, it is implemented for NVidia graphics devices only.
The second approach supports a wider range of graphics devices. The busy-waiting time is minimized by estimating the time until the next VBLANK. Within this interval nothing is expected to happen, so the process puts itself to sleep and wakes up just before the next VBLANK.
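The sleep-then-poll idea can be sketched as follows. This is a simplified illustration, not SoftGenLock's code: `vblank_active` stands for a hypothetical function that reads the VBLANK bit of the VGA state register, and the guard interval is an assumed tuning parameter.

```python
import time


def wait_for_vblank(vblank_active, last_vblank, period, guard=0.001,
                    now=time.monotonic, sleep=time.sleep):
    """Sleep until shortly before the expected VBLANK, then busy-wait.

    vblank_active: hypothetical callable, True while the display is in VBLANK
    last_vblank:   time of the previous VBLANK in seconds
    period:        estimated frame period in seconds
    guard:         wake up this many seconds before the expected VBLANK
    """
    idle = (last_vblank + period - guard) - now()
    if idle > 0:
        sleep(idle)            # nothing is expected to happen in this interval
    while not vblank_active():
        pass                   # short busy-wait just before the VBLANK
    return now()               # detection time of the VBLANK
```

The clock and sleep functions are passed in as parameters so that the routine can be exercised with a simulated clock; on a non-real-time system the guard interval has to cover the scheduling latency.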
Refresh rate modification also can be accomplished by two different methods.
• modifications of the pixelclock
• adjustments of the image geometry - adding or removing hidden columns or lines
In SoftGenLock the modification of the pixelclock is bound to NVidia graphics cards. However, it can be implemented for graphics cards of other vendors too.
The latter approach, changing the frame geometry, is described in section 8. SoftGenLock, however, accesses the VGA registers. These registers are specified by the VGA standard. This standard is rather old and therefore has a number of limitations, e.g. no support for high resolutions. Furthermore, the access requires two steps: first the index of the desired register is written into the index/data register¹. Afterwards the data is read from or written to the same register. The authors of SoftGenLock report problems that arise from this non-atomic register access, as the graphics driver might access the index/data register at the same time. This can lead to an index written as data, or data written as index. According to the authors the results range from a corrupted display to a deadlock of the whole system. How often this occurs depends on the hardware used. Because of this, the pixelclock modification is preferred.
¹ address offset: 0x3C0
The genlocking mechanism works as follows [1, compare p.257]:
• On detection of a VBLANK the master sends a signal to the slaves.
• Each slave measures the arrival time of the signal. It estimates the time of the VBLANK
at the master tm by subtracting the estimated signal runtime.
• All slaves measure the time tl of their local VBLANK.
• If the offset between the VBLANK of master and slave |tm − tl| is larger than the accepted tolerance, the slave alters its refresh rate accordingly.
• On detection of the VBLANK event all nodes swap their buffers.
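The slave-side decision in these steps can be illustrated with a short sketch. The function name, the sign convention of the return value and the example numbers in the comments are assumptions for illustration; the wrap-around of the phase at the frame boundary is ignored.

```python
def slave_rate_correction(t_arrival, signal_runtime, t_l, tolerance):
    """Return +1 to raise, -1 to lower, or 0 to keep the slave's refresh rate.

    t_arrival:      arrival time of the master's sync signal at the slave
    signal_runtime: estimated (constant) delay of the signal path
    t_l:            time of the slave's own VBLANK
    All times are in seconds on the slave's clock.
    """
    t_m = t_arrival - signal_runtime      # estimated VBLANK time at the master
    offset = t_l - t_m                    # > 0: the slave's VBLANK is late
    if abs(offset) <= tolerance:
        return 0                          # within tolerance: no change
    return 1 if offset > 0 else -1        # late: speed up, early: slow down
```

For example, with a signal runtime of 5µs and a tolerance of 10µs, a slave whose VBLANK occurs 50µs after the estimated master VBLANK would raise its refresh rate.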
Two things are crucial. To calculate the exact time of the VBLANK at the master, the slaves must assume a constant delay between the VBLANK event at the master and the arrival of the signal.
a) This requires a real-time operating system with deterministic scheduling.
b) Standard Ethernet is ruled out as signal path. Instead, separate cabling via the parallel port is used.
Master and slaves are connected via the parallel port. The authors report having connected up to 4 nodes to each of the 8 data pins of the parallel port. Thus 33 nodes (the master and 8 · 4 = 32 slaves) can be synchronized without additional effort for amplifying the signals on the wire. According to the authors, the parallel port provides fast signaling with a constant delay of around 5µs on a real-time system.
One of the shortcomings is the need for a real-time system. The fact that each VBLANK triggers a calculation makes missed VBLANK interrupts severe, especially at the master. The display configuration via VGA registers is now, eleven years after the release of SoftGenLock, outdated and not used by many modern graphics cards.
Both ways of adjusting the video timings work well on analog video sinks. A CRT does not directly receive a pixelclock signal, but is synchronized to the graphics device by the horizontal and vertical sync signals. Our experiments showed that digital video sinks behave differently and are more sensitive to changes of the video timings.
SoftGenLock is not developed further, but has inspired two derivatives: WinSGL for Windows and Genlock for Linux systems.
7.3 WinSGL
WinSGL is a solution for software genlock on Windows systems. It was presented in 2006 at the Eurographics Symposium.
The differences compared to SoftGenLock are:
• It does without a real-time operating system and runs on standard Windows.
• A constant runtime delay cannot be guaranteed due to the non-real-time OS. Thus the clock master is a dedicated piece of hardware, such as a function generator or a microprocessor, generating an external clock signal to which all slaves synchronize.
• The target video sinks are projectors, the target refresh rate is 60Hz, and active stereo is not supported.
• It relies on third-party software for the frequency adjustments: PowerStrip² from EnTech Taiwan.
The targeted video sinks are digital projectors connected to the VGA port. However, the authors state that the results were also tested and are valid on devices connected via DVI.
The authors report that in their experiments adjustments of the pixelclock caused jitter and distortion of the whole image. Therefore WinSGL does not adjust the pixelclock. Furthermore, they describe the digital video sinks they used as reacting very sensitively to changes of the invisible pixels. They experienced shifts of the image in all cases other than increasing or decreasing the vertical front porch. As a result, WinSGL restricts itself to manipulations of the vertical front porch to achieve smooth frequency adjustments. Though the primary devices were DLP projectors, they report the same results for two DELL LCDs.
The detection of VBLANKs is accomplished by a call to the Windows DirectDraw API. Detection, however, is not guaranteed: sometimes a VBLANK is “overlooked” due to scheduling latencies, so WinSGL uses timestamping to cope with missed VBLANKs. In this case it skips one comparison and continues with the next two timestamps.
The steps of synchronization are summarized below:
1. Initialization starts with finding two modelines nearest to the genlock frequency, one
above and one below.
2. In the next step the phase offset is eliminated by lowering or raising the refresh rate until |tm − tl| ≤ tolerance.
3. As soon as the phase offset is near zero, the slaves try to stay in sync. For each received
sync signal from the genlock master, the slaves compare the timestamp of arrival with
their local VBLANKs. If the deviation exceeds the tolerance range, the modeline is
switched.
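The third step, staying in sync by switching between two modelines, can be simulated with a few lines of Python. This is a simplified sketch and not WinSGL's implementation: the two refresh rates and the tolerance are hypothetical values, the phase is assumed to be already aligned, and one comparison per frame is assumed.

```python
def simulate_modeline_switching(f_genlock, f_low, f_high, tolerance, frames):
    """Keep a slave in sync by switching between two refresh rates.

    Returns (largest absolute phase offset in seconds, number of switches).
    """
    offset = 0.0            # slave VBLANK time minus master VBLANK time
    f_current = f_low
    worst, switches = 0.0, 0
    for _ in range(frames):
        # The offset grows when the slave period is longer than the master's.
        offset += 1.0 / f_current - 1.0 / f_genlock
        worst = max(worst, abs(offset))
        if offset > tolerance and f_current != f_high:
            f_current, switches = f_high, switches + 1   # slave late: speed up
        elif offset < -tolerance and f_current != f_low:
            f_current, switches = f_low, switches + 1    # slave early: slow down
    return worst, switches


# Hypothetical modelines 0.04Hz below and above a 60Hz genlock signal,
# tolerance 30µs, simulated for 600 frames (10 seconds).
worst, switches = simulate_modeline_switching(60.0, 59.96, 60.04, 30e-6, 600)
```

In this simulation the offset drifts by about 11µs per frame, so the slave has to switch modelines every few frames, and the offset overshoots the tolerance by at most one such step.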
² http://entechtaiwan.com/util/ps.shtm
The tested resolution was 1024x768 at a rate of 60Hz. The granularity between two possible refresh rates is therefore approximately 0.07Hz to 0.08Hz. To stay in sync, the slaves therefore need to switch very frequently between the two modelines.
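This granularity can be reproduced with a short calculation. The timing values below are an assumption: they are the standard VESA timing for 1024x768 at 60Hz (pixelclock 65MHz, 1344 total pixels per line, 806 total lines), which is not stated in the text. Adding one line to the vertical front porch then changes the refresh rate by
\[ \Delta v_r = 65\,\mathrm{MHz} \cdot \left( \frac{1}{1344 \cdot 806} - \frac{1}{1344 \cdot 807} \right) \approx 60.004\,\mathrm{Hz} - 59.929\,\mathrm{Hz} \approx 0.074\,\mathrm{Hz}. \]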
A synchronization precision of ±30µs is reported. The results were also compared to the performance of SoftGenLock, which was stated to achieve a higher synchronization precision of up to ±7µs.
The effects of refresh rate variations described there are almost contrary to our results explained in section 8. This indicates a strong dependence on the display hardware used.
8 Refresh Rate Adaptation
One of the requirements for display synchronization is a method to adapt the refresh rate of the display. As already mentioned on page 12, according to equation 1 two approaches are possible:
• changing the number of pixels transferred to the display
• varying the pixelclock
The following part describes both approaches and the results on common graphics cards. Tests were made primarily on Intel graphics devices of the fourth³ and fifth⁴ generation, but also on a recent NVidia graphics card.
8.1 Display Timing on Common Graphics Devices
Resolution Variations
As depicted in figure 2 on page 11, the screen resolution consists of an active part – containing
the visible pixels – and the blanking part, that is not visible on the screen. This is illustrated
by an example.
The VESA coordinated video timings (CVT) formula delivers for a resolution of 720p at
120Hz the following modeline:
pixelclock   horizontal   horizontal blanking      vertical   vertical blanking
             active       front   HSYNC   back     active     front   VSYNC   back
162.00MHz    1280         96      136     232      720        3       5       47
Table 1: Modeline for 720p at 120Hz
The non-visible pixel margin is large enough to allow changing the number of columns in the horizontal blanking period as well as the number of rows in the vertical blanking period. Changes of hidden pixels do not affect the visible display resolution. Thus, apart from possible switching artifacts, no persistent distortion of the image is produced.
Graphics card registers determine the display resolution. If the addresses of these registers are known, changes can be made without a reset of the display, although disturbances might still occur. The corresponding display resolution registers for Intel graphics cards are summarized in the appendix on page 78. The standard resolution setting mechanism provided by the driver always resets the display and is therefore not usable.
³ i965GM (Crestline)
⁴ integrated graphics in Ironlake processors

Figure 12: Refresh rate [Hz] over the number of pixels, for the following values of htot and vtot
rv          htot  vtot
120.119Hz   1746  771
120.101Hz   1744  772
120.083Hz   1742  773
120.066Hz   1740  774
120.049Hz   1738  775
120.032Hz   1736  776
120.016Hz   1734  777
120.000Hz   1732  778
119.985Hz   1730  779
119.970Hz   1728  780

The resolution changes are limited to addition or removal of whole rows or columns. By combining horizontal and vertical direction, granularity can be increased. The refresh rate difference in case of changing the number of lines is
\[ \Delta v_r = \frac{f_p}{h_{tot}} \cdot \left( \frac{v_{tot2} - v_{tot1}}{v_{tot1} \cdot v_{tot2}} \right) \tag{10} \]
Analogously, the difference in case of changing the number of columns is
\[ \Delta v_r = \frac{f_p}{v_{tot}} \cdot \left( \frac{h_{tot2} - h_{tot1}}{h_{tot1} \cdot h_{tot2}} \right) \tag{11} \]
Combining both directions leads to
\[ \Delta v_r = f_p \cdot \left( \frac{1}{h_{tot1} \cdot v_{tot1}} - \frac{1}{h_{tot2} \cdot v_{tot2}} \right) \tag{12} \]
$\Delta v_r$ is the change of the refresh rate, $f_p$ is the frequency of the pixelclock, and $h_{tot}$, $v_{tot}$ are the total numbers of pixels in horizontal and vertical direction, respectively, including the active portion and the hidden pixels.
Taking the resolution in table 1 as an example, removing one line increases the refresh rate by 0.16Hz. Removing one column changes the refresh rate by approximately 0.07Hz. A combination of horizontal increase and vertical decrease can provide a granularity of roughly 0.015Hz to 0.02Hz. A lower pixelclock frequency requires a lower resolution to produce the same refresh rate. Thus fewer hidden lines are included in the frame, which further decreases the step size for removal or addition of columns to roughly 0.01Hz.
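The step sizes for the modeline of table 1 can be checked with a short Python sketch of equation 12. The function names and the chosen combination of two additional columns and one line less are illustrative assumptions.

```python
def refresh_rate(f_p, h_tot, v_tot):
    """Refresh rate in Hz for pixelclock f_p and the total frame size."""
    return f_p / (h_tot * v_tot)


def refresh_rate_change(f_p, h_tot1, v_tot1, h_tot2, v_tot2):
    """Equation 12: refresh rate of geometry 1 minus that of geometry 2."""
    return f_p * (1.0 / (h_tot1 * v_tot1) - 1.0 / (h_tot2 * v_tot2))


# Modeline from table 1 (720p at 120Hz).
F_P = 162.0e6
H_TOT = 1280 + 96 + 136 + 232   # 1744 pixels per line
V_TOT = 720 + 3 + 5 + 47        # 775 lines per frame

one_line = abs(refresh_rate_change(F_P, H_TOT, V_TOT, H_TOT, V_TOT - 1))
one_column = abs(refresh_rate_change(F_P, H_TOT, V_TOT, H_TOT - 1, V_TOT))
# Two columns more and one line less (an arbitrarily chosen combination).
combined = abs(refresh_rate_change(F_P, H_TOT, V_TOT, H_TOT + 2, V_TOT - 1))
```

With these numbers one line corresponds to about 0.155Hz, one column to about 0.069Hz and the combined step to about 0.017Hz.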
The results, however, showed this method of refresh rate variation to be unfeasible for synchronization in this project, for the following reasons: The tested displays reacted to modifications of the resolution with a black screen for roughly one second before bringing the display content back. The refresh rate has changed afterwards, but the blackout period is not acceptable for the application in a display wall. The granularity of the refresh rate adaptation and the estimated amount of jitter give reason to expect changes of the refresh rate every ten to hundred milliseconds. This would lead to a continuously black screen.
Pixelclock Variations
The pixelclock is produced by a frequency synthesizer on the graphics device. It is generated
by applying multipliers and divisors to a reference frequency that is provided by a quartz
oscillator.
The exact calculation formula depends on the graphics card vendor and also differs between
different hardware generations.
Equation 13 shows how the pixelclock is generated on Intel graphics cards of generations 4 to 6.⁵ [12]
\[ f_p = \frac{f_{ref} \cdot \bigl(5 \cdot (M_1 + 2) + (M_2 + 2)\bigr)}{(N + 2) \cdot (P_1 \cdot P_2)} \tag{13} \]
$f_p$ is the pixelclock and $f_{ref}$ a fixed frequency (96MHz in generation four Intel devices) that is derived from an oscillator. $M_1$, $M_2$, $N$ and $P_1$, $P_2$ are integer parameters that can be adjusted within predetermined limits.[12]
The multiplier and divisor parameters are controlled by a number of registers. The appendix on page 78 shows this in more detail. Writing into these registers controls the pixelclock; on Intel graphics devices the change is carried out at the next VBLANK. Pixelclock variations mostly work without visible artifacts apart from occasional flickering. Sometimes, however, the display switches off for a second, probably because the PLL in the display lost lock on the pixelclock frequency. This happens non-deterministically.
As can be seen from equation 13, the step size between two dot clocks, and thus between two refresh rates, is neither arbitrarily small nor uniform. Fig. 13 illustrates this; the corresponding parameter sets are summarized in table 2.
⁵ Intel graphics devices generations are summarized in the appendix on page 80.

Figure 13: Refresh rates around 120Hz on an Intel 965GM integrated graphics (plot titled “Dotclock on Intel Graphics”; vertical axis: refresh rate [Hz], 117 to 123)
VRate      ∆fV      M1  M2  N  P1  P2  fVCO         valid
118.58 Hz  0.14 Hz  15  8   5  1   10  1302.86 MHz  y
119.04 Hz  0.22 Hz  18  7   6  1   10  1308.00 MHz  y
119.41 Hz  0.36 Hz  13  5   4  1   10  1312.00 MHz  y
119.83 Hz  0.42 Hz  15  9   5  1   10  1316.57 MHz  y
120.14 Hz  0.31 Hz  18  8   6  1   10  1320.00 MHz  y
120.38 Hz  0.24 Hz  21  7   7  1   10  1322.66 MHz  y
120.57 Hz  0.19 Hz  10  7   3  1   10  1324.80 MHz  n
120.86 Hz  0.29 Hz  13  6   4  1   10  1328.00 MHz  y
121.07 Hz  0.21 Hz  16  5   5  1   10  1330.28 MHz  y
Table 2: Parameters for different dot clocks on Intel graphics
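The frequencies in table 2 can be recomputed from equation 13 with a short Python sketch. The function name is illustrative, and the VCO frequency is assumed here to be the pixelclock before the division by P1 · P2, which reproduces the fVCO column of the table.

```python
F_REF = 96.0e6  # reference frequency of generation four Intel devices


def intel_clocks(m1, m2, n, p1, p2, f_ref=F_REF):
    """Equation 13: return (VCO frequency, pixelclock) in Hz."""
    f_vco = f_ref * (5 * (m1 + 2) + (m2 + 2)) / (n + 2)
    return f_vco, f_vco / (p1 * p2)


# Parameter sets (M1, M2, N, P1, P2) of three rows of table 2.
for params in [(15, 8, 5, 1, 10), (18, 8, 6, 1, 10), (21, 7, 7, 1, 10)]:
    f_vco, f_p = intel_clocks(*params)
    print(params, round(f_vco / 1e6, 2), "MHz VCO,",
          round(f_p / 1e6, 3), "MHz pixelclock")
```

Because all parameters are integers, neighboring parameter sets yield unevenly spaced frequencies, which is the reason for the non-uniform step sizes in the table.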
The results above make it obvious that the steps between two refresh rates are too large for a smooth synchronization. One possibility to deal with this problem is to switch between two refresh rates at each VBLANK, so that the average refresh rate matches the target refresh rate. Unfortunately it turned out that the LCDs are not able to follow such fast variations of the pixelclock. The frequent switching produces an artificial jitter on the pixelclock that presumably overwhelms the PLL in the displays. As a result the displays switch the image off until synchronization with the pixelclock is regained. Once synchronization is lost, the display remains black until the pixelclock variations are suspended.
The results differ from those of the SoftGenLock project[1] described in section 7.2. Two important distinctions have to be kept in mind. First, we want a continuous frequency and phase lock instead of temporary adjustments to match the phase. Second, the mentioned project uses analog CRT monitors. As described in section 3, there is no pixelclock in the analog display data transmission; the refresh rate is derived from the frequency of HSYNC and VSYNC. An increase or decrease of the pixelclock changes the rate of the sync signals, but is not directly visible to the display as a changed clock.
The results of WinSGL[20] are also quite contrary to the results presented here. While its authors experienced stronger artifacts with pixelclock modifications and successfully manipulated the number of lines in the front porch, our results showed pixelclock adaptation to be more stable and hidden pixel variations to be useless. It seems likely that the effects of both methods strongly depend on the hardware used and on the signal processing in the display devices. Moreover, it can be observed that newer generations of displays react more sensitively to unexpected disturbances, presumably due to an increased amount of signal processing.
8.2 Software controlled VCXO
Standard PCs usually do not provide a VCXO (voltage controlled crystal oscillator). Set-top boxes (STBs) for DVB contain a VCXO to enable genlock as described in section 5, but this VCXO is usually not accessible by custom software.
An STB with an Intel consumer electronics processor is available. The CE4100 processor is a SoC based on the Intel Atom and enhanced by special multimedia features. STBs with the CE4100 contain a software controllable VCXO. The SDK for the CE4100 platform provides functions to set the VCXO voltage. The VCXO oscillates at a nominal frequency of 27MHz. The pixelclock is derived from these 27MHz and thus can be adapted by controlling the VCXO voltage.
The refresh rate adaptation has a very fine granularity which is determined by the sigma-delta
DAC that controls the VCXO voltage. It is applied immediately and does not produce any
artifacts.
The pull range of the quartz is specified as ±125ppm.[10] We experienced that the refresh
rate can be changed in a range of approximately ±0.025Hz.
One of the drawbacks is the very small tuning range of the refresh rate. This is not surprising
as the tolerance of a quartz in a DVB settop box is specified as ±30ppm.
That implies that building a system with a number of sync nodes, special care needs to be
taken that the tuning range of all nodes are overlapping.
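The overlap requirement can be checked with a few lines of Python. This is only a sketch: the function name and the example center frequencies are hypothetical, and each node is assumed to be described by its free-running refresh rate and a symmetric tuning range in Hz.

```python
def common_tuning_range(nodes):
    """Return the (low, high) refresh-rate interval every node can reach.

    nodes is a list of (center_hz, pull_hz) pairs (assumed representation).
    Returns None if the individual tuning ranges do not overlap.
    """
    low = max(center - pull for center, pull in nodes)
    high = min(center + pull for center, pull in nodes)
    return (low, high) if low <= high else None


# Three hypothetical nodes, each tunable by +/-0.025 Hz around its own center.
print(common_tuning_range([(120.000, 0.025), (120.003, 0.025), (119.998, 0.025)]))
```

A clock master whose refresh rate lies outside the returned interval could not be followed by all nodes at the same time.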
As the CE4100 STBs provide a very fine granular refresh rate adaptation, they will be used
as display nodes for the prototype.
9 Phase-Locked-Loop
The core of nearly every system that performs frequency and phase synchronization or
frequency estimation is a PLL. While the basic design of PLLs is common, there are many
implementation choices for the individual building blocks. PLLs can be purely analog,
mixed signal, digital or implemented in software. Designing PLLs for a specific application
is non-trivial and a field of its own. The performance of a PLL determines the performance
of a communication system to a large extent.
The next part describes the operating principles of PLLs. The focus is on digital PLLs, as
the Synchronization Architecture described in this thesis uses a digital PLL implemented in
software - a software phase-locked-loop (SPLL). The structure and theory apply to analog
PLLs as well.
A PLL is a negative feedback system. Its goal is to lock the phase of an internal oscillator to
an external reference signal. As the frequency is the derivative of the phase,
$$\frac{d\varphi(t)}{dt} = \omega(t) \qquad (14)$$
locked frequency is a consequence of locked phase.
Figure 14 depicts the block diagram of a PLL.
Figure 14: Phase-Locked-Loop
The building blocks of a PLL are:
Phase Detector (A) Estimator of the error in the phase of the reference signal and the
internal oscillator signal.
Inner-Loop Filter (B) Lowpass filter to reduce the jitter on the phase error.
(Digitally) Controlled Oscillator (C) Oscillator which oscillates with its inherent frequency
f0 but can be controlled by an external signal to adjust its frequency within the pull
range foutmin ≤ f0 ≤ foutmax .
Phase Predictor (D) Integrator that predicts the next phase ϕlo based on the current oscil-
lator frequency and past ϕlo.
Phase Detector
The phase detector calculates the phase error θ, the difference of the phase of the external
reference signal and the internal oscillator signal predicted by the phase predictor. In digital
PLLs it is simply an adder.
$$\theta(n) = \theta_{ref}(n) - \theta_{lo}(n) \qquad (15)$$
Inner-Loop Filter
There are two sources of phase noise:
1. The external phase is disturbed by phase noise, which can be produced by various sources
but is introduced in particular by the channel.
2. The internal phase can be disturbed by noise that is produced by the oscillator.
In the application described in this thesis, phase refers to clock ticks, sampled as
discrete timestamps. The term phase noise therefore refers here to deviations of the period
between timestamps, which should be equally spaced. The term jitter is thus used here as
a synonym for phase noise.
Figure 15: Phase Noise
The jitter is directly reflected in the phase error. To enable the PLL to follow frequency
drifts but prevent the jitter from influencing the PLL's output frequency, the inner loop
filter reduces the high frequency components in the phase error. In a digital
phase-locked-loop (DPLL) the loop filter is usually an IIR lowpass. The choice of inner loop
filter greatly influences the behavior of the PLL and its reaction to dynamics in the inputs.
It determines the order and the type of the PLL. The order of a PLL is the highest power of
the denominator in the closed loop transfer function. The order of the PLL is always one
higher than the order of the inner loop filter.
θ̂(n) is the filtered phase error, which is obtained by filtering the phase error θ(n) with the
inner loop filter with transfer function Hlf(z). The inner loop filter is explained in more
detail later.
$$\hat{\theta}(n) = H_{lf}(\theta(n)) \qquad (16)$$
Figure 16: Relation between VCO pulled frequency and gain K (curves for K = 0.5, K = 1.0 and
K = 2.0; x-axis: frequency correction factor, normalized to T/2; y-axis: VCO frequency
centered around f0, normalized to (fmax − fmin)/2)
Controlled Oscillator
The filtered phase error is used to change the output frequency of an oscillator, e.g. a
voltage controlled oscillator (VCO), numerically controlled oscillator (NCO) or digitally
controlled oscillator (DCO). Without external control this device oscillates at its inherent
frequency fout = f0. By adding the filtered phase error θ̂(n), it is pulled to a different
frequency. This explains the strong relation between the frequency characteristics of the
loop filter and the frequency characteristics of the PLL.
The output frequency of the PLL is calculated as
$$f_{out}(n) = f_0 + K \cdot f_s \cdot \hat{\theta}(n) \qquad (17)$$
The phase error was accumulated within 1/fs, the period between two clock samples. The term
fs in equation 17 accounts for this.
A VCO typically has a limited pull range. The output frequency of the oscillator as a function
of the phase error is controlled by the oscillator gain K. The gain thereby determines
the change of the oscillator frequency if the phase error θ ≠ 0, as depicted in figure 16. The
choice of the gain factor however has no influence on the limit of the pull range. A PLL with
a larger oscillator gain K will reach the positive or negative limit of fout at a smaller phase
error than a PLL with a smaller gain.
A gain that is too small leaves the pull range of the oscillator unexploited.
A larger gain reduces the acquisition time, the time until the frequency is locked. The
downside is a reduced stability against jitter and phase noise, as their impact on the
PLL output frequency fout is higher. A smaller gain instead leads to a more stable frequency
control and a smooth fout. Thus the choice of the gain trades a fast settling-time against
stability. Application requirements and the expected jitter have to be taken into account.
The gain factor plays a special role in PLLs of type I, as described in section 9.2.
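The relation between gain and pull range can be made concrete with a short Python sketch of equation 17 followed by a clamp to the pull range. The function names and the numeric values are assumptions chosen for illustration only; they are not taken from the prototype.

```python
def oscillator_output(theta_f, f0, K, fs, f_min, f_max):
    """Equation 17, then clamping to the pull range [f_min, f_max]."""
    f_out = f0 + K * fs * theta_f
    return min(max(f_out, f_min), f_max)


def saturation_phase_error(f0, K, fs, f_max):
    """Smallest positive filtered phase error at which f_out reaches f_max."""
    return (f_max - f0) / (K * fs)


# Doubling the gain halves the phase error at which the limit is reached.
for K in (0.5, 1.0, 2.0):
    print(K, saturation_phase_error(f0=120.0, K=K, fs=120.0, f_max=120.025))
```

The limits f_min and f_max themselves do not depend on K, which matches the statement that the gain has no influence on the limit of the pull range.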
Phase Predictor
When the PLL is locked fout(n) = fref (n). The phase predictor estimates the next phase
value, based on the current oscillator frequency in equation 17 by adding the number of
expected clock ticks within one sample interval. The prediction is required to compensate
the delay induced by the digital loop filter. The transfer function is that of an integrator:
$$P(z) = \frac{T}{1 - z^{-1}} \qquad (18)$$
The predicted phase is the output of the integrator:
$$P(f(n)) = \varphi_{lo}(n+1) = f_{out}(n) \cdot T + \varphi_{lo}(n) \qquad (19)$$
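The interplay of the four building blocks can be illustrated with a minimal Python sketch of one loop iteration per sample. It is not the implementation used in this thesis: phases are assumed to be measured in cycles, the loop filter is assumed to be a first-order IIR lowpass with a hypothetical coefficient alpha, and the values of f_ref, f0, fs and K are examples.

```python
def simulate_pll(f_ref, f0, fs, K, alpha, steps):
    """Type-I software PLL built from equations 15, 17 and 19.

    Returns a list of (phase_error, f_out) tuples, one per sample.
    """
    T = 1.0 / fs       # sample interval
    phi_ref = 0.0      # phase of the external reference, in cycles
    phi_lo = 0.0       # predicted phase of the local oscillator, in cycles
    theta_f = 0.0      # filtered phase error (state of the assumed lowpass)
    history = []
    for _ in range(steps):
        theta = phi_ref - phi_lo                            # phase detector (eq. 15)
        theta_f = (1.0 - alpha) * theta_f + alpha * theta   # inner loop filter
        f_out = f0 + K * fs * theta_f                       # oscillator (eq. 17)
        phi_lo = f_out * T + phi_lo                         # phase predictor (eq. 19)
        phi_ref = f_ref * T + phi_ref                       # reference advances
        history.append((theta, f_out))
    return history


theta, f_out = simulate_pll(119.988, 120.0, 120.0, K=0.1, alpha=0.2, steps=2000)[-1]
print(f_out, theta)
```

With these example values the output frequency settles on f_ref while a constant phase error of (f_ref − f0)/(K · fs) remains, because the sketch contains no integrator in the loop filter.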
9.1 Frequency Characteristics of PLLs
Figure 17: Frequency Ranges of PLLs
The pull range of PLLs is divided into four ranges:
lock range The frequency range in which the unlocked PLL can lock onto the reference
frequency without skipping one or multiple periods.
pull-out range The range in which a locked PLL is able to follow a frequency step of the
reference frequency.
pull-in range The frequency range in which an unlocked PLL is able to lock onto the
reference frequency, but only if at least one period is skipped.
hold-in range The range in which a locked PLL can follow a slow frequency drift of the
reference frequency in finite time.
9.2 Type I and Type II Phase-Locked-Loops
The type of a PLL is determined by the number of integrators in the open-loop transfer
function. PLLs of type I and type II have the most practical relevance.
The difference between type I and II is that a type II PLL contains an integrator in the loop
filter. The effect of this integrator is a steady-state phase error of zero. The steady-state
phase error is the phase error that remains constant once the PLL is locked onto the
reference frequency.
In type I PLLs the filtered phase error controls the DCO frequency directly. If the internal
oscillator frequency f0 differs from the reference frequency fref, a constant phase error is
necessary to make both frequencies equal. This is undesirable for some applications.
The loop filter of type II PLLs integrates over the filtered phase error, thus maintaining a
non-zero output even in case of a zero phase error. Therefore the steady-state phase error
is zero. The drawback is a longer settling-time and increased overshoot compared to the
type I PLL.
Figure 18: Settling behavior of type-I (left) and type-II (right) PLL (both panels: frequency
[Hz] over time [s], curves fout and fref)
Special care must be taken when designing digital filters that contain integrators. The transfer
function of an integrator is
$$H(z) = \frac{1}{1 - z^{-1}} \qquad (20)$$
and thus has a pole at z = 1, making the loop filter potentially unstable. Shifting the pole
slightly into the unit circle by using a coefficient b1 = 1 − ε, e.g. b1 = 0.98, helps to maintain
stability, but cannot eliminate the steady-state phase error completely. The modified transfer
function is
$$H(z) = \frac{1}{1 - 0.98 z^{-1}}. \qquad (21)$$
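The effect of moving the pole can be seen by feeding a constant input into both filters. The following Python sketch (function name chosen for illustration) implements the recursion y(k) = x(k) + b1 · y(k − 1), which corresponds to equations 20 and 21 for b1 = 1 and b1 = 0.98.

```python
def integrate(samples, b1=1.0):
    """Run y(k) = x(k) + b1 * y(k-1) over samples and return the last output."""
    y = 0.0
    for x in samples:
        y = x + b1 * y
    return y


constant_input = [1.0] * 2000
print(integrate(constant_input))        # ideal integrator: grows without bound
print(integrate(constant_input, 0.98))  # leaky integrator: settles at 1 / (1 - 0.98)
```

The ideal integrator keeps growing for any constant non-zero input, whereas the leaky variant has a finite DC gain of 1/(1 − 0.98) = 50. A finite gain means that some residual phase error is needed to sustain a constant filter output, which is why the steady-state phase error does not vanish completely.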
9.3 Inner Loop Filter
The loop filter largely determines the behavior of the PLL. Its purpose is to filter the
phase error in order to suppress the phase noise. The design of loop filters is a comprehensive
topic of its own and will not be covered here. There are filter design tools that create filters
with the desired frequency characteristics, based on the theory of filter design.
As the loop filter is a lowpass filter, it has a cutoff frequency above which frequencies are
attenuated. In the context of PLLs, the term loop filter bandwidth f0 is used more often than
the common −3dB cutoff. f0 is defined as the frequency at which an asymptote to the falloff
crosses the 0dB line.
Figure 19: Bandwidth definition of loop filter
Loop Filter Design
The typical design process for digital filters follows these two steps:
1. Calculation of an analog filter with a transfer function which fulfills the given require-
ments.
2. Bilinear transform from the s-plane to the z-plane to get a digital filter.
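For the simplest case the two steps can be carried out by hand. The Python sketch below assumes a first-order analog lowpass H(s) = ωc/(s + ωc) as step 1 and applies the bilinear transform with frequency pre-warping as step 2. The pre-warping and the function names are assumptions of this sketch; the coefficients are returned in the form H(z) = (a0 + a1 z^-1)/(1 + b1 z^-1).

```python
import cmath
import math


def lowpass1_bilinear(fc, fs):
    """First-order lowpass with cutoff fc, designed via the bilinear transform.

    Returns (a, b) with a = [a0, a1] and b = [b1].
    """
    k = math.tan(math.pi * fc / fs)   # pre-warped, normalized analog cutoff
    a0 = k / (1.0 + k)
    b1 = (k - 1.0) / (k + 1.0)
    return [a0, a0], [b1]


def magnitude(a, b, f, fs):
    """|H(z)| evaluated on the unit circle at frequency f."""
    z1 = cmath.exp(-2j * math.pi * f / fs)   # z^-1
    num = sum(c * z1 ** i for i, c in enumerate(a))
    den = 1.0 + sum(c * z1 ** (i + 1) for i, c in enumerate(b))
    return abs(num / den)


a, b = lowpass1_bilinear(fc=0.01, fs=120.0)
print(a, b, magnitude(a, b, 0.01, 120.0))
```

The magnitude at fc evaluates to 1/√2, i.e. −3dB, and the gain at 0Hz is 1. Higher filter orders follow the same procedure but are usually left to a design tool.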
Loop Filter Example: Butterworth Filter
Butterworth filters are IIR filters. They have a maximally flat magnitude below the cutoff
frequency and an almost linear falloff of ≈ −20dB · n/decade, where n is the order of the
filter. This makes the Butterworth filter a common choice for PLLs, although other filters
provide a sharper falloff and Butterworth filters do not have a linear phase.
The transfer function of a digital IIR filter is
$$H(z) = \frac{a_0 + a_1 z^{-1} + a_2 z^{-2} + \ldots + a_n z^{-n}}{1 + b_1 z^{-1} + b_2 z^{-2} + \ldots + b_n z^{-n}}. \qquad (22)$$
Figure 20: Bode plots of Butterworth filters with different orders (magnitude [dB] and phase
[deg] over frequency [Hz])
The difference equation describing the filter output therefore is
$$y(k) = a_0 x(k) + a_1 x(k-1) + \ldots + a_n x(k-n) - b_1 y(k-1) - \ldots - b_n y(k-n). \qquad (23)$$
Equation 23 can be easily implemented as digital filter using 2n shift registers. In a SPLL it
requires 2n additions and 2n+ 1 multiplications. 2n values must be stored. A code snippet
implementing a SPLL is given in the appendix on page 77.
Figure 20 compares the Bode plots for four Butterworth filters of different order. The cutoff
frequency is 0.02Hz.
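Equation 23 translates almost literally into code. The following Python class is a minimal sketch and not the snippet from the appendix; the class name is hypothetical, and the two deques take the role of the shift registers for past inputs and outputs.

```python
from collections import deque


class IIRFilter:
    """Direct implementation of the difference equation 23.

    a holds a0..an, b holds b1..bn (the leading 1 of the denominator is implied).
    """

    def __init__(self, a, b):
        self.a = list(a)
        self.b = list(b)
        self.x_hist = deque([0.0] * len(self.a), maxlen=len(self.a))
        self.y_hist = deque([0.0] * len(self.b), maxlen=len(self.b))

    def step(self, x):
        self.x_hist.appendleft(x)                                  # x_hist[i] is x(k-i)
        y = sum(ai * xi for ai, xi in zip(self.a, self.x_hist))
        y -= sum(bi * yi for bi, yi in zip(self.b, self.y_hist))   # y_hist[i] is y(k-1-i)
        self.y_hist.appendleft(y)
        return y


# First-order example: a0 = 0.1, b1 = -0.9, DC gain a0 / (1 + b1) = 1.
lowpass = IIRFilter([0.1], [-0.9])
outputs = [lowpass.step(1.0) for _ in range(500)]
print(outputs[0], outputs[-1])
```

For a unit step the output of this example filter starts at 0.1 and approaches 1, the DC gain of the filter.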
9.3.1 Loop Filter Design Tools
Two tools help in the loop filter design process. The fdatool, part of the Matlab⁶ Signal
Processing toolbox, delivers the filter coefficients for a digital Butterworth filter, given
the cutoff frequency, sampling rate and order of the filter.
To design a digital IIR filter, Matlab follows the digital filter design principle described above:
First an analog filter is designed which fulfills the specified requirements. After some
intermediate steps, Matlab uses the bilinear transform to create a digital filter with the same
transfer function as the analog filter.⁷
A special PLL design tool is included in CppSim⁸, which is available for free. The PLL
designer of CppSim calculates coefficients for analog loop filters. The user has to specify the
loop filter bandwidth, PLL type, filter order and shape. The analog filter given by CppSim
has to be transformed to a digital loop filter with the bilinear transform to be used in a SPLL.
⁶ www.mathworks.com
⁷ see http://www.mathworks.de/help/toolbox/signal/ref/butter.html
⁸ http://www.cppsim.com
9.4 Software PLL in the Display Synchronization
Synchronization over a jitter inducing channel requires a PLL; other frequency estimation
methods are highly unreliable and provide poor performance. Averaging over a frequency which
is constant but disturbed by normally distributed phase noise requires an infeasibly large
window. A PLL instead is able to lock very precisely onto a frequency, enabled by the negative
feedback. To emphasize this, the result of a simulation is depicted below. It compares the
performance of a 2nd order PLL with a cutoff frequency of 10⁻²Hz to an exponentially weighted
moving average⁹ filter with α = 0.2, α = 0.1 and α = 0.01. The standard deviation of the normal
distribution is σ = 10⁻⁷s, thus about 68% of the jitter is below 100ns. Such a small σ is far
from the conditions found on a standard Ethernet or on a (non-realtime) operating system. It is
even below the allowed 500ns of jitter in the PCR of an MPEG TS. This small σ is only chosen
here to illustrate the poor frequency estimation capabilities of an averaging method.
Figure 21: Comparison of a PLL and EWMA averaging (frequency [Hz] over time [s]; curves: EWMA
with α = 0.2, EWMA with α = 0.1, EWMA with α = 0.01, PLL fout, fref)
This result makes obvious that an accurate synchronization which depends on a noisy and
non-deterministic channel cannot do without a PLL. The frequency estimation with an EWMA,
even with the smallest coefficient, makes larger jumps than the pull range of the VCXO in
the CE4100 STB. Furthermore, as already stated, a jitter in the range of 100ns is
unrealistically small.
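For reference, an averaging estimator of the kind compared here can be sketched in Python as follows. The sketch assumes the common EWMA recursion s(n) = α · x(n) + (1 − α) · s(n − 1) applied to raw estimates 1/period; the function name is hypothetical, and the exact estimator used for the simulation may differ.

```python
def ewma_frequency(timestamps, alpha):
    """Estimate a frequency in Hz from event timestamps given in seconds.

    Each pair of consecutive timestamps yields one raw estimate 1/period,
    which is smoothed with an exponentially weighted moving average.
    """
    estimate = None
    for previous, current in zip(timestamps, timestamps[1:]):
        raw = 1.0 / (current - previous)
        if estimate is None:
            estimate = raw
        else:
            estimate = alpha * raw + (1.0 - alpha) * estimate
    return estimate


# Jitter-free timestamps of a 120 Hz signal.
print(ewma_frequency([n / 120.0 for n in range(1000)], alpha=0.1))
```

Unlike a PLL, this estimator has no feedback path: every jittered timestamp enters the estimate directly, attenuated only by the factor α.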
9.4.1 PLL Design Choices
A PLL achieves its best performance if it is designed specifically for a certain application.
Therefore a specification of the characteristics and an analysis of the prevalent conditions
are necessary.
Measurements have shown that under the conditions valid for the prototype, the expected
jitter on network packets and on local VBLANKs is in both cases in the order of several tens
of µs.
⁹ The EWMA is explained on page 68.
Matlab simulations have been made to evaluate different parameters and the effect of tweaking
these parameters. Results of these simulations are presented in the following. Successful
simulations in Matlab do not guarantee that the results are directly comparable to results
experienced in the application. However, they usually give a good hint, help to shorten the
design process and can be used to verify a successful application. Circumstances that lead to
a different behavior of simulation and application testing are:
• The VCXO has a limited pull range.
• Depending on the system load of the hardware used, lags and late timestamps occur.
Particularly problematic are bursts of late timestamps that are delivered to the
application all at once.
• The simulated phase noise is normally distributed and statistically independent.
• There is a delay of at least 2T between a change of the VCXO control voltage and the
detection by the PLL.
• The PLL gain must be adapted to exploit the pull range of the VCXO and cannot be
adjusted freely as in the simulation.
The next part describes the evaluation and choice of some PLL loop filter characteristics. The
simulations made in Matlab are based on measurements and observations made on the target
system. Determining one parameter also has an impact on the other parameters, e.g. a
larger gain factor requires a better filtered phase error to preserve a stable VCXO frequency
output. Therefore the tests of the parameters have not been made in the order described here,
but have been repeated with different variations.
Loop Filter Order
The order of the loop filter determines the filtering performance of the inner loop filter. A
higher order provides a sharper falloff, but at the cost of overshoot. The additional
computation effort of filters of order two or three compared to order one loop filters is
nearly negligible in software. Simulations with loop filters of order one to three show the
differences in transient behavior. The simulations have been made with normally distributed
jitter of mean zero and a standard deviation of 2.5 · 10⁻⁴s, thus about 68% of the jitter is
below 250µs. The PLL gain is 5.0 and the cutoff frequency fc = 10⁻²Hz.
Figure 22: Simulation of filtering performance of different loop filter orders (phase offset [s]
over time [s]; curves: unfiltered θ and θ filtered with 1st, 2nd and 3rd order filters)
Figure 22 shows the phase error θ and the filtered phase error θ̂ with loop filters of order 1, 2
and 3. As the order of the PLL is always one higher than the order of the loop filter, this is
equivalent to PLLs of order 2, 3 and 4. Figure 23 shows the resulting output frequency fout.
The conditions for the simulation are the same as described above. The left image depicts the
complete simulation including the transient phase, the right one is a close-up of fout in
frequency lock.
Figure 23: Simulation of fout with different PLL orders (frequency [Hz] over time [s]; curves:
fout with 1st, 2nd and 3rd order filter, fref)
The figures show a filtering performance that increases with the filter order. However, the
better filtering is achieved at the cost of an increased settling-time. The order of the loop
filter should therefore be chosen with respect to the application's preconditions. If higher
jitter is expected, better stability might make an increased settling-time acceptable.
Cutoff Frequency
The cutoff frequency is the most important property of the loop filter. A well chosen cutoff
frequency is necessary to achieve reasonable filtering. If the cutoff frequency is too small,
the PLL has a very long settling time, takes long to adapt to frequency changes and reacts
with delay to steps of the reference frequency. On the other hand, a reasonable amount of
filtering is necessary to make the PLL stable and to obtain a smooth VCXO output frequency.
Figure 24 shows the results of different cutoff frequencies. On the left a filter of order 1 is
simulated; the right graph shows an order 2 loop filter.
Figure 24: Simulation of different cutoff frequencies (both panels: frequency [Hz] over time [s];
curves: fc = 10⁻⁴Hz, fc = 10⁻¹Hz, fc = 10⁻²Hz, fref)
The result shows a considerable overshoot for the second order filter. A first order filter with
a cutoff frequency of 10⁻²Hz proves to be sufficient under the conditions valid for the prototype.
PLL Type
As framelock requires a zero phase offset, a type II PLL that eliminates all phase error
automatically would be convenient. However, the limited pull range of the VCXO in the
CE4100 STBs makes it difficult to find a stable, fast and reliable type II PLL. While the PLL
tries to eliminate the phase offset and lock onto the reference frequency, the loop filter of a
type II PLL integrates the phase error. During acquisition the integrator in the loop filter
builds up a value for θ that is larger than required for fout = fref. The PLL starts to
reduce θ when fout = fref. The reduction however requires some time, and this leads to the
typical transient phase of type II PLLs as shown in figure 18.
On the CE4100 STBs the elimination of a phase offset may however require more than 30
seconds. Within this time such a large value for θ is accumulated that fout = foutmax
or fout = foutmin. The PLL is not able to reduce θ within one period, thus one period is
skipped. This behavior continues and the PLL becomes unstable. The result is that fout
continuously oscillates between foutmax and foutmin.
Figure 25: Settling behavior of a type II PLL with small filter gain (frequency [Hz] over
time [s]; curves: fout, fref, foutmin, foutmax)
One possibility to prevent this is to make the control of fout very slow by choosing a very
small gain factor K or a small filter gain. However, this leads to a very long acquisition
time, as shown in figure 25.
For the two reasons stated above, a type I PLL is used, giving up automatic phase error
elimination in favor of a fast settling-time. Instead, a two step solution for phase error
elimination has been worked out:
1. acquisition of frequency lock
2. iterative reduction of the steady-state phase error towards zero.
The implementation details are explained in section 12.
10 Necessity of Synchronization
As mentioned before, active stereo on a composite display is not possible without framelock.
The functional principle of active stereo technology requires synchronization on a
composite display. Without synchronization the stereo separation would be lost, as the stereo
glasses would let parts of both images pass to both eyes.
In case of unsynchronized refresh rates there is a non-constant phase offset. The viewer
would thus experience a continuously changing amount of ghosting. The maximal ghosting
occurs if the switching of the stereo glasses is shifted exactly half a period against the phase
of the display. The viewer then sees the same effect as without stereo glasses. A phase
shift of one period, 1/120Hz = 8.33ms, swaps right and left frame.
The effect of a missing swaplock is also severe, as the nodes would consume frames at
different rates. This leads on the one hand to a discontinuous composite image, and on the
other hand to frame skips.
Even the slightest difference in refresh rate is problematic, as the phase offset accumulates
with each period. Figure 26 illustrates the impact of frequency offsets of the pixelclock.
Figure 26: Impact of frequency differences (expected phase offset from -360° to 360° over
time [s]; curves: 100 kHz offset (0.1%), 10 kHz offset (0.01%), -50 kHz offset (-0.05%))
The manufacturing tolerance of a standard quartz is around ±30ppm. Therefore, even if
identical hardware is used and exactly the same modeline is chosen, frequency differences
cannot be avoided. Besides, this would leave the correction of the phase offset unsolved.
Apart from the manufacturing tolerance, a quartz has a temperature dependent offset that
can be in the same order of magnitude. Finally, a quartz is subject to aging, which influences
the frequency by around ±1ppm/year.
The frequency offsets stated above may seem very small. The example calculation below
shows the effects on the refresh rate. Assumed is a resolution of 1080p at a refresh rate of
120Hz. The original pixelclock is 142.625MHz.
A deviation of the quartz of 30ppm leads to an offset pixelclock of
$$f'_p = f_p \left(1 + 30 \cdot 10^{-6}\right) = 142.629\,\mathrm{MHz}. \qquad (24)$$
The difference of the pixelclock is 4.28kHz, the difference of the refresh rate is 0.0018Hz.
This leads to a 180 degree phase offset after 4 minutes and 37 seconds.
Therefore the only possibility to ensure a proper framelock and swaplock is a steady
synchronization of all display nodes.
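The arithmetic of this example can be retraced with two small helper functions. The Python sketch below uses hypothetical function names and takes the pixelclock, the tolerance and the refresh rate difference from the text as inputs.

```python
def pixelclock_offset_hz(pixelclock_hz, ppm):
    """Absolute pixelclock deviation caused by a relative quartz error in ppm."""
    return pixelclock_hz * ppm * 1e-6


def seconds_until_phase_offset(delta_refresh_hz, degrees=180.0):
    """Time until two free-running displays have drifted apart by `degrees`."""
    return (degrees / 360.0) / abs(delta_refresh_hz)


print(pixelclock_offset_hz(142.625e6, 30))   # deviation in Hz
print(seconds_until_phase_offset(0.0018))    # seconds until 180 degrees
```

A pixelclock of 142.625MHz and 30ppm give a deviation of about 4.28kHz, and a refresh rate difference of 0.0018Hz accumulates half a period after roughly 278 seconds, in line with the 4 minutes and 37 seconds stated above.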
11 Synchronization Architecture
The Synchronization Architecture consists of three types of participants:
• clock master display (CMD)
• slave display node (SDN)
• frame deadline predictor
The communication is based on small UDP packets - the display clock reference (DCR)
messages. The format of the DCRs is shown in figure 28.
The CMD serves as the master clock for all other participants in the synchronization
architecture. Only one clock master is allowed.
It measures the refresh rate of its graphics card and maintains a framecounter. On the
detection of a VBLANK it sends a DCR packet to all receivers. It can be configured to
subsample, i.e. to send a packet only every n-th VBLANK. The hardware used for the slave
display nodes has limited computation power, and each received DCR packet triggers a frequency
estimation and adaptation loop. Subsampling therefore reduces the system load. As the sender
includes the framenumber in the DCR packets, the slave nodes are able to detect the
subsampling ratio automatically; no initial configuration is necessary.
The DCR packets also contain a PCR field. The PCR is generated at the master and delivers
the required information for synchronized video playback at the SDNs. The video sources can
include this PCR in the generated video streams, where it provides a timebase for the
decoders at the display nodes.
The SDNs listen for incoming DCR packets and use them to generate timing information.
Independently from DCR message reception they watch their own VBLANK events. The
slave nodes comprise a SPLL which is triggered by receiving a DCR message. The SPLL
compares the local refresh rate with the refresh rate of the clock master. According to the
result the SPLL controls the local refresh rate.
Additionally all slave nodes measure the RTT to the master. The purpose is to estimate the
delay of the DCR packets, to deal with the phase offset.
The frame deadline predictor runs on the video source(s). It listens to DCR packets from
the CMD. Just as the SDNs, it measures the RTT to the CMD. From the master's refresh rate,
the estimated RTT and the framenumber of the master it can predict at what time the video
source has to send a frame to the display. As the Display Wall can receive video streams
from multiple sources, the frame deadline predictor provides the possibility to relay the DCR
packets received from the CMD to multiple sources. One instance of the frame deadline
predictor runs on each video source. One of the frame deadline predictors receives the DCR
packets from the CMD and broadcasts or multicasts the DCR messages to the other frame
deadline predictors. Thus the master only needs to send DCR packets to one video source
via unicast. To let the other frame deadline predictors estimate the RTT to the master, the
relaying frame deadline predictor inserts the RTT measured to the CMD into the DCR packets.
Figure 27 gives an overview of the synchronization architecture, the communication paths
and the transmitted information. The DCR messages are sent at the vertical refresh rate vr,
possibly subsampled by the factors n and k. The subsampling of the packets for SDNs and
frame deadline predictors does not necessarily need to be equal. The following
sections discuss the parts of the synchronization architecture in detail.
Figure 27: Synchronization Architecture
11.1 Synchronization Packet Format
Figure 28: Format of the DCR packets
The DCR packets carry a 16 byte payload. They provide the necessary information for
synchronization.
The field framenumber is filled with the framenumber of the CMD. This number is increased
on each VBLANK. It is a 32-bit integer; at a refresh rate of 120Hz it therefore wraps
around every 414 days and 6 hours. The framenumber fulfills two functions: first, it lets the
clients detect packet losses or reordering; second, it lets the clients detect the subsampling
ratio.
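How a client could derive both pieces of information from the framenumber is sketched below in Python. The function name and the strategy (taking the most common step between consecutive packets as the subsampling ratio) are assumptions of this sketch, not the method of the prototype. Reordered packets are not handled.

```python
from collections import Counter

WRAP = 2 ** 32  # the framenumber is a 32-bit counter


def detect_subsampling(framenumbers):
    """Guess the subsampling ratio from a sequence of received framenumbers.

    Returns (ratio, lost): ratio is the most common step between consecutive
    packets, lost is the number of packets missing in between.
    """
    steps = [(b - a) % WRAP for a, b in zip(framenumbers, framenumbers[1:])]
    ratio = Counter(steps).most_common(1)[0][0]
    lost = sum(step // ratio - 1 for step in steps if step > ratio)
    return ratio, lost


print(detect_subsampling([0, 4, 8, 12, 20, 24]))   # one packet (frame 16) is missing
print(detect_subsampling([WRAP - 4, 0, 4]))        # counter wrap-around
```

Taking the differences modulo 2^32 keeps the detection working across the wrap-around of the counter.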
The RTT field is used for relaying purposes at the video sources. The frame deadline predictor
provides the functionality to pass the received DCR packets to additional destinations. To
let these nodes know the RTT to the CMD the relaying node puts the estimated RTT into
the DCR packets before it broadcasts them.
A PCR is generated at the CMD, which is inserted by the video sources into the video streams
to enable synchronized playback at the display nodes. The PCR is transmitted in the PCR
field, divided in PCR base and extension as described in section 5.
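A 16 byte payload with these three fields could be serialized as in the following Python sketch. The field widths and their order (32-bit framenumber, 32-bit rtt_useconds, 64-bit PCR, network byte order) are assumptions made for illustration; the authoritative layout is the one in figure 28.

```python
import struct

# Assumed layout, not taken from figure 28:
# uint32 framenumber, uint32 rtt_useconds, uint64 pcr, network byte order.
DCR_FORMAT = "!IIQ"


def pack_dcr(framenumber, rtt_useconds, pcr):
    """Serialize one DCR message into a 16 byte payload."""
    return struct.pack(DCR_FORMAT, framenumber % 2 ** 32, rtt_useconds, pcr)


def unpack_dcr(payload):
    """Return (framenumber, rtt_useconds, pcr) from a 16 byte payload."""
    return struct.unpack(DCR_FORMAT, payload)


payload = pack_dcr(framenumber=7, rtt_useconds=0, pcr=513)
print(len(payload), unpack_dcr(payload))
```

A relaying node would unpack the message, replace the RTT field with its own estimate and pack it again before forwarding.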
Reference Clock Independent Frequency Estimation
Besides carrying the information described above, the DCR packets fulfill a special purpose:
by design the DCR messages do not contain any frequency information, but serve as a sync
ping. Sharing absolute time or frequency via the DCR packets would require that the clocks
are synchronized; otherwise timestamps or time differences would not be reliable, as the
referenced clocks might have divergent clock rates.
Instead the timestamps are generated at packet ingress. Thus each synchronized system relies
only on its own clock, and all calculations are based on this local clock. This is similar to the
PTP syntonization described in section 4.3. Absolute time is of no interest, but the clock
rates, here the refresh rates, must match.
12 Implementation Details
The following section details the working principles and implementation details of the com-
ponents of the synchronization architecture.
12.1 Clock Master Display
The master display estimates its refresh rate by watching the VBLANKs. There are a few
possibilities to do this:
• Hooking an event handler to the interrupt of the graphics card. However, not every
interrupt is a VBLANK interrupt, thus for reliable detection the interrupt handler
must look up the interrupt source in an interrupt status register. This requires
knowledge of the graphics card's register addresses. This is the most direct method, but
requires information on the graphics card used.
• Graphics card drivers usually implement the detection of VBLANKs. If the driver
provides access to this functionality from outside the driver, the VBLANKs can
be queried through this API. For example, the direct rendering manager (DRM) uses this
method.
• OpenGL Extensions to the X Server (GLX) includes a function call to wait for the next
VBLANK. This method involves a lot more overhead than the direct interrupt handler,
given that the VBLANK information is in the end, after a number of function calls,
delivered by the driver's interrupt handler. However, it requires no technical knowledge
about the graphics card or driver and runs on most Xservers without further effort.
The current implementation of the CMD uses GLX to detect VBLANKs, as it avoids an
implementation for specific hardware. GLX provides a blocking call which returns on the
VBLANK event. This function also returns the number of the current frame since the start
of the Xserver. This framenumber is transmitted in the DCR packets to SDNs and frame
deadline predictors.
On the CE4100 STBs no X server is running. If the CMD runs on a CE4100 STB, a different
method for VBLANK detection therefore has to be used. A library in the CE4100 SDK provides
a function for this purpose. It is also a blocking call, which returns on the next detected
VBLANK. However, it does not deliver a framecount, so a virtual framecount must be
maintained. To keep the framecount reliable, missed VBLANKs must be detected by
timestamping and frequency estimation.
The functional principle of the CMD is depicted in figure 29. These two steps are repeated
continuously:
1. With a blocking call the CMD pauses until the next VBLANK.
2. When the function returns and the framecount fc and the subsampling ratio rS
indicate that a DCR shall be sent, the CMD assembles the packet: it writes the current
framecount into the field framenumber, sets the field rtt_useconds to zero and adds
the PCR value.
Figure 29: Flow diagram of the CMD
The PCR is derived from a 27MHz clock, but a standard PC offers no access to such a clock.
Furthermore, this 27MHz clock must be synchronized to the refresh rate in order to do
reverse genlock. Therefore we create virtual 27MHz timestamps and calculate a virtual PCR
from them.
A virtual 27MHz clock tick vClk can be calculated for each frame n from the refresh rate vr
by
\[ vClk[n] = vClk[n-1] + 27\,\mathrm{MHz} \cdot \frac{1}{v_r[n]} \tag{25} \]
PCR base and extension are calculated according to equation 9. The virtual PCR inserted
into the sync packet is:
\[ PCR[n] = \bigl(\lfloor vClk[n]/300 \rfloor \ll 9\bigr) + \bigl(vClk[n] \bmod 300\bigr) \tag{26} \]
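Equations 25 and 26 can be illustrated with a minimal sketch. The function names are
chosen here for illustration and are not taken from the thesis software; the virtual clock
is assumed to be kept as a floating point tick count that is truncated to whole ticks
before packing.

```python
CLOCK_HZ = 27_000_000  # nominal rate of the virtual clock


def next_vclk(prev_vclk: float, refresh_rate_hz: float) -> float:
    """Advance the virtual 27MHz clock by one frame period (equation 25)."""
    return prev_vclk + CLOCK_HZ / refresh_rate_hz


def virtual_pcr(vclk: float) -> int:
    """Combine PCR base and extension as written in equation 26."""
    ticks = int(vclk)        # whole 27MHz ticks (assumption: truncate)
    base = ticks // 300      # 27MHz / 300 = 90kHz part
    extension = ticks % 300  # remainder, 0..299
    return (base << 9) + extension
```

At a refresh rate of 120Hz the virtual clock advances by 225000 ticks per frame, so the
extension stays zero as long as the estimated refresh rate is exactly 120Hz.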
All nodes in the synchronization architecture are synchronized to the refresh rate of the
CMD, and thus also to the virtual 27MHz clock, even if an outside observer would measure a
rate different from 27MHz.
The ICMP Echo requests that are received from the SDNs and frame deadline predictors
are answered automatically by the Linux kernel, as the sent packets conform to the ICMP
protocol. Thus no additional implementation is needed at the CMD.
12.2 Slave Display Nodes
The main part of the synchronization is carried out by the slave display nodes (SDNs). A
block diagram is depicted in figure 30.
Three tasks are executed concurrently and independently:
1. VBLANK Detector (VD): detection of local VBLANKs and generation of VBLANK
timestamps (marked green in figure 30)
2. Synchronization Core (SC): reception of DCR packets, and frequency adaptation
(blue)
3. RTT Estimator: estimation of the RTT (yellow)
The first and second tasks are separated for the following reason. The reception of a
DCR packet triggers one round of synchronization: the SPLL updates all values and adjusts
the output frequency of the VCXO. To do this, the phase detector requires the current
VBLANK timestamp. If the phase detector waited for the next VBLANK, which pauses the
process execution, the next DCR packet might arrive before that VBLANK and would be lost.
Therefore the handling of VBLANKs is moved into an independent process which tracks all
VBLANKs. This ensures that each synchronization round can be executed instantly.
Figure 30: Blockdiagram of the SDNs
12.2.1 The VBLANK Detector
The VD calls the blocking function wait_for_vblank(). This function returns as soon as a
VBLANK event happens and generates a timestamp tvb.
It is not guaranteed that all VBLANK events are caught. Therefore, the VD keeps track of
the refresh rate. It detects missed VBLANKs by comparing the time difference between the
current and the last VBLANK timestamp with the period of the refresh rate. A VBLANK
is assumed to be lost if: tvb(n) − tvb(n − 1) > 1.5 · 1/vr.
On detection of a loss, the VD determines the number of missed VBLANKs. Not handling the
loss of multiple VBLANKs can have a severe impact on the synchronization. The reason is
the subsampling: if subsampling is enabled, only every n-th timestamp is compared. If
k timestamps are not detected, with k not being a multiple of n, there will be a
jump in the phase difference.
An example is illustrated in figure 31 with a subsampling of two and one missed VBLANK.
Figure 31: Consequence of a lost VBLANK
In this example every second timestamp is compared. Before the slave misses the
VBLANK, the phase error is small. As a consequence of the undetected VBLANK, the
framecounter is not increased. This shifts the timestamps associated for comparison by
1/vr. The phase error instantly increases by one period.
The number of missed VBLANKs is calculated by dividing the time difference between current
and the last VBLANK timestamp by the estimated VBLANK interval. The framecounter is
increased accordingly.
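The loss detection and the framecounter correction can be sketched as follows. This is a
minimal illustration under the assumption that the VBLANK period is already known from the
frequency estimation; the names missed_vblanks and FrameCounter are hypothetical and do
not stem from the thesis software.

```python
def missed_vblanks(t_prev: float, t_now: float, period: float) -> int:
    """Number of VBLANKs lost between two consecutive timestamps."""
    interval = t_now - t_prev
    if interval <= 1.5 * period:  # normal case: no VBLANK was missed
        return 0
    # Divide the gap by the estimated VBLANK interval; one of the
    # elapsed periods belongs to the VBLANK that was actually seen.
    return round(interval / period) - 1


class FrameCounter:
    """Virtual framecount that stays consistent across missed VBLANKs."""

    def __init__(self, period: float, start: int = 0):
        self.period = period
        self.count = start
        self.last = None  # timestamp of the previous VBLANK

    def on_vblank(self, t_now: float) -> int:
        if self.last is not None:
            self.count += 1 + missed_vblanks(self.last, t_now, self.period)
        self.last = t_now
        return self.count
```

With a period of 1/120 s, a gap of two periods between two timestamps advances the counter
by two, so the association of timestamps used for subsampled comparison is preserved.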
12.2.2 Synchronization Core
The SC controls the whole synchronization. It listens on the predefined port for incoming
DCR packets. The time of arrival is stamped by the network protocol stack. Each arriving
packet triggers one PLL synchronization round. The synchronization round comprises the
following steps:
1. calculation of phase error in positive and negative direction
2. calculation of the direction with smaller phase error
3. filtering of the phase error
4. adjustment of VCXO control voltage
5. estimation of CMD and VBLANK frequency
6. calculation of statistics
One goal was to make the PLL aware of the periodicity. For framelock it does not matter
if one or multiple periods are skipped, as long as the association of left and right frames
is maintained.
The advantage of periodicity is that a phase error can be corrected either by increasing or
by decreasing the output frequency of the VCXO. If fout is increased, the SDN catches up
with the CMD. If fout is decreased, the SDN lets itself fall back.
Two different strategies of phase error elimination are possible:
direction-aware: choose the direction of smaller phase error
pull-range-aware: choose the direction where the difference between the limit of the VCXO
pull range and the reference frequency is larger.
The first strategy is easier to implement, as for the latter the PLL must know fref, foutmin
and foutmax. This requires a longer initialization, during which estimates of these
frequencies are calculated. Moreover, it requires thresholding to prevent the PLL from
compensating a very small phase error introduced by jitter by shifting the phase of its
internal oscillator by a whole period. Both strategies are compared briefly in section 13.
For the prototype the direction-aware strategy was implemented.
For the direction-aware strategy the PLL calculates the phase difference from each DCR
packet timestamp to two VBLANK timestamps: the one recorded before the arrival of the DCR
packet and the one following it. The smaller of both differences gives the direction of
phase reduction. If fref is not in the center of the pull range, phase error reduction in
one direction takes longer than in the other direction.
Figure 32: Direction of phase correction
The calculations are triggered whenever a sync packet arrives. To enable the calculation of
phase offset in positive and negative direction, not tdcr(i) but tdcr(i−2) is the reference phase
for offset computation. It is compared with tvb(i− 1) and tvb(i− 2). Thus the calculation is
delayed by two. This ensures that always a timestamp tvb is available which was generated
at the VBLANK after the reception of the current DCR packet. This avoids jumps in the
phase error.
The core of the SDNs is a software PLL. However, this SPLL has some peculiarities:
• A standard DPLL comprises a predictor with transfer function T/(1 − z^−1), which predicts
the next phase value based on the current VCXO or NCO frequency. Here, however, no
phase prediction is made; instead the next phase is measured directly.
• Though it is possible to calculate a corrected phase - the time at which the VBLANK
should have happened - it is not possible to correct the phase of the VBLANKs directly.
Instead the phase has to be shifted by increasing or decreasing the refresh rate until zero
phase offset – this is especially important for a type I PLL, as it does not automatically
eliminate the steady-state phase error.
• Usually a PLL compares clock ticks in a fixed interval. A faster clock implies more
clock ticks in this interval. Here the interval is not fixed, but timestamps which mark
the events are compared. Thus a higher number means a longer period, thus a slower
clock.
The SPLL in the SC works as described in section 9. The DCR timestamp tdcr(n − 2) is
compared with the VBLANK timestamps tvb(n − 2) and tvb(n − 1) to find the positive and
negative phase error:
\[ \theta^{+}(n) = t_{dcr}(n-2) - t_{vb}(n-2), \qquad \theta^{-}(n) = t_{dcr}(n-2) - t_{vb}(n-1) \tag{27} \]
The minimum of both in terms of magnitude is used as phase error:
\[ \theta(n) = \min\bigl(\theta^{+}(n), \theta^{-}(n)\bigr) \tag{28} \]
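The direction-aware selection of the phase error can be sketched as below. The sketch
assumes that both phase errors are formed as the DCR timestamp minus the respective VBLANK
timestamp, so that the error towards the following VBLANK is negative, and that the
candidate of smaller magnitude is selected. The function name is illustrative only.

```python
def direction_aware_phase_error(t_dcr: float,
                                t_vb_before: float,
                                t_vb_after: float) -> float:
    """Return the signed phase error of smaller magnitude.

    t_vb_before is the VBLANK timestamp recorded before the DCR packet,
    t_vb_after the one recorded after it (all times in seconds).
    """
    theta_pos = t_dcr - t_vb_before  # >= 0: distance to the earlier VBLANK
    theta_neg = t_dcr - t_vb_after   # <= 0: distance to the later VBLANK
    if abs(theta_pos) <= abs(theta_neg):
        return theta_pos
    return theta_neg
```

The sign of the returned value encodes the direction in which the refresh rate has to be
adjusted, its magnitude never exceeds half a VBLANK period.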
To eliminate the influence of jitter on the SPLL's frequency output, the phase error is
filtered with a Butterworth filter. A first order loop filter has been implemented. The
filter coefficients can be specified in the configuration; otherwise they are set to
default values.
The cutoff frequency of the filter can be chosen below 1Hz. This is still enough to follow
slow frequency drifts, but reliably filters all high-frequency jitter. The structure of the
filter is shown below:
Figure 33: Loop filter structure in Direct Form I
One choice of coefficients for a first order loop filter with a cutoff frequency of 0.01Hz is:
\[ a_0 = a_1 = 0.267 \cdot 10^{-3}, \quad b_0 = 1.0, \quad b_1 = -0.9995 \tag{29} \]
The following filter coefficients also provide good performance [7, p.114]:
\[ a_0 = 0.005, \quad a_1 = -0.0012, \quad b_0 = 1.0, \quad b_1 = -0.9962 \tag{30} \]
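The transfer function is
\[ H_{LF}(z) = \frac{a_0 + a_1 z^{-1}}{1 + b_1 z^{-1}} \tag{31} \]
The corresponding difference equation calculated by the loop filter, with
\tilde{\theta} denoting the filtered phase error, is
\[ \tilde{\theta}(n) = a_0 \cdot \theta(n) + a_1 \cdot \theta(n-1) - b_1 \cdot \tilde{\theta}(n-1). \tag{32} \]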
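A first order loop filter of this form can be written in a few lines. The following sketch
uses the coefficients of equation 29 as defaults and assumes zero initial conditions; the
class name is chosen for illustration.

```python
class FirstOrderLoopFilter:
    """First order IIR lowpass in Direct Form I (b0 is assumed to be 1)."""

    def __init__(self, a0: float = 0.267e-3, a1: float = 0.267e-3,
                 b1: float = -0.9995):
        self.a0, self.a1, self.b1 = a0, a1, b1
        self.x_prev = 0.0  # previous unfiltered phase error
        self.y_prev = 0.0  # previous filtered phase error

    def step(self, x: float) -> float:
        """Filter one phase error sample and return the filtered value."""
        y = self.a0 * x + self.a1 * self.x_prev - self.b1 * self.y_prev
        self.x_prev = x
        self.y_prev = y
        return y
```

For a constant input the output settles at (a0 + a1)/(1 + b1) times the input, which is
about 1.07 for the rounded coefficients of equation 29.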
VCXO Gain
The output of the loop filter is used to control the frequency of the VCXO on the CE4100
STB. The phase error, and thus also the filtered phase error, lies in the range from a few
ms down to tens of µs. Therefore the filtered phase error must be multiplied by a gain
factor to bring it into a value range where it can control the frequency reasonably.
A too small gain factor does not exploit the whole VCXO range, which is already small.
E.g. at a DCR rate of 120Hz, the phase error cannot become larger than T = 8.33ms,
therefore the gain must be at least RV/T to achieve full frequency control. RV denotes the
maximal numerical value that is accepted by the CE4100's VCXO control. However, a gain
larger than this minimum is preferable, as it reduces the settling time and the
steady-state phase error. (A larger steady-state phase error would in any case be
eliminated in a second step.)
A too large gain factor makes the PLL sensitive to phase noise and results in a less stable
output frequency. One possibility to combine the advantages of both is a nonlinear
gain factor. The PLL collects statistics about the difference between reference and VCXO
frequency. The gain factor is increased at a larger frequency difference, thus providing a
shorter settling time. At a small frequency difference, in contrast, fout is smoother.
The risk of a non-reachable tuning range is avoided with type II PLLs, however at the price
of a longer settling time and a far more oscillating transient phase.
The SC uses two different gain factors. The gain factor to control the VCXO as described
above and a second gain factor for separate frequency estimation PLLs. The PLL responsible
for the synchronization does not calculate any absolute frequency: The filtered phase error is
the input to the VCXO, which generates a resulting frequency of fout = f0 +K · θ. The PLL
however does not know the absolute value of f0. Therefore it also can not calculate fout.
Nevertheless the SDN should know its own refresh rate and the refresh rate of the master for
three reasons:
• The initialization step, described below, needs an estimate of the master's and its
own frequency.
• The VBLANK Detector needs information about the refresh rate to detect missed
VBLANKs.
• The elimination of the steady-state phase error is dependent on detection of frequency
lock.
Frequency estimation by averaging over a number of periods is not reliable enough, as shown
in section 9.4. Estimation by an SPLL, in contrast, delivers an accuracy in the order of
10^−4 Hz at very limited cost. An SPLL is easily implemented and requires only around ten
arithmetic operations.
Therefore the SC comprises three SPLLs:
1. synchronization SPLL controlling the refresh rate
2. CMD refresh rate estimation SPLL
3. local VBLANK refresh rate estimation SPLL
The working principle of both frequency estimating SPLLs is as described in section 9.
Page 77 in the appendix shows a sample implementation of an SPLL.
Detection of Frequency Lock
The SC is able to find out whether its PLL is locked to the reference frequency. For this
it calculates first and second order statistics over the difference of fref and fout. In
each PLL round the difference is calculated as
\[ \Delta f = f_{ref} - f_{out} \tag{33} \]
∆f is stored in a ring buffer which holds the last 600 frequency differences.
Every five seconds the mean of ∆f is computed as the sum of the elements in the ring
buffer, divided by the number of elements N ≤ 600:
\[ \mu = \sum_{i=0}^{N-1} \frac{\Delta f_i}{N} \tag{34} \]
The variance is calculated from µ and the expectation over (∆f)^2:
\[ \sigma^2 = \frac{1}{N} \sum_{i=0}^{N-1} (\Delta f_i)^2 - \mu^2 \tag{35} \]
The frequency is accepted as locked if both the mean and the variance are below
a threshold.
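The lock detection can be sketched with a bounded buffer. The threshold values below are
placeholders chosen for this example, since the text does not state them, and the mean is
compared by its absolute value, which is an assumption as well. The class name is
illustrative.

```python
from collections import deque


class LockDetector:
    """Frequency lock detection from mean and variance of f_ref - f_out."""

    def __init__(self, capacity: int = 600,
                 mean_limit: float = 1e-3,       # hypothetical threshold [Hz]
                 variance_limit: float = 1e-6):  # hypothetical threshold [Hz^2]
        self.samples = deque(maxlen=capacity)    # ring buffer, oldest dropped
        self.mean_limit = mean_limit
        self.variance_limit = variance_limit

    def add(self, f_ref: float, f_out: float) -> None:
        """Store the frequency difference of one PLL round (equation 33)."""
        self.samples.append(f_ref - f_out)

    def statistics(self):
        """Return (mean, variance) according to equations 34 and 35."""
        n = len(self.samples)
        mean = sum(self.samples) / n
        variance = sum(d * d for d in self.samples) / n - mean * mean
        return mean, max(variance, 0.0)  # guard against rounding below zero

    def locked(self) -> bool:
        if not self.samples:
            return False
        mean, variance = self.statistics()
        return abs(mean) < self.mean_limit and variance < self.variance_limit
```

In the SC the call to locked() would be made every five seconds, while add() is called in
every PLL round.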
Elimination of Steady-State Phase Error
The PLL is of type I for the reasons explained in section 9.4. The drawback is a non-zero
steady-state phase error. This phase error is not acceptable, because it destroys the
stereo vision. Increasing the gain factor is insufficient, because it can decrease but not
remove the steady-state phase error. Moreover, the consequence is a less stable frequency
output. Therefore a different approach is used to remove the steady-state phase error.
The idea to achieve a zero phase error and frequency lock is the following:
A steady-state phase error is required to compensate the frequency difference between fref
and fout. The equation for the PLL output frequency is replaced by
\[ f_{out}(n) = f_0 + G \cdot \bigl(\theta(n) + \gamma(n)\bigr) \tag{36} \]
If γ(n) is equal to the expected steady-state phase error, the PLL will lock with a phase
error of zero.
However, the steady-state phase error depends on the difference of fref and f0. The process
of finding γ is:
1. Calculation of statistics to detect frequency lock. On frequency lock the steady-state
phase error is known.
2. Stepwise adjustment of γ:
\[ \gamma = \gamma + k \cdot \theta(n) \tag{37} \]
where k ≤ 1. Higher values of k speed up the elimination of the steady-state phase error
but reduce stability. A k of 0.25 works reliably on the CE4100 STB.
3. A change of γ leads to a temporary increase or decrease of fout. It is therefore
necessary to wait until the phase error has been reduced and frequency lock has been
regained. The process then returns to step 1.
The whole process is executed iteratively.
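The effect of the three steps can be illustrated with a quasi-static sketch. It assumes
that the loop settles completely between two adjustments of γ, so that in lock fout equals
fref and equation 36 determines the remaining phase error. The numerical values in the
usage note and the function names are chosen for illustration only.

```python
def steady_state_phase_error(f_ref: float, f0: float,
                             gain: float, gamma: float) -> float:
    """Phase error at lock: f_ref = f0 + gain * (theta + gamma)."""
    return (f_ref - f0) / gain - gamma


def eliminate_steady_state_error(f_ref: float, f0: float, gain: float,
                                 k: float = 0.25, rounds: int = 40):
    """Iterate steps 1 to 3 and return (gamma, history of phase errors)."""
    gamma = 0.0
    history = []
    for _ in range(rounds):
        # Step 1: frequency lock reached, steady-state error is known.
        theta = steady_state_phase_error(f_ref, f0, gain, gamma)
        history.append(theta)
        # Step 2: stepwise adjustment of gamma (equation 37).
        gamma += k * theta
        # Step 3: wait for renewed lock, modeled by the next iteration.
    return gamma, history
```

Under this assumption every round reduces the remaining phase error by the factor (1 − k).
For fref = 120Hz, f0 = 119.99Hz and a gain of 10Hz/s the error starts at 1ms and γ
converges to 1ms.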
γ is continuously adapted. This allows the PLL to also follow frequency drifts. Nevertheless
the iterative phase error elimination requires several minutes, depending on k. The
transient time to acquire frequency lock and zero phase error can thus be divided into two
stages:
• First the PLL obtains frequency lock with constant phase offset. The phase error is
reduced down to the steady-state phase error as fast as the VCXO tuning range allows.
With a sufficient VCXO gain the phase error is already reduced to less than 1ms in
this stage.
• The second stage iteratively eliminates the remaining phase error and thereby increases
synchronization accuracy.
Figure 34 shows a simulation of the γ adaptation process. The left image shows the phase
error θ, the filtered phase error and γ. The right image shows the reference frequency and
the VCXO output frequency. In the simulation k = 0.15 is used. The inner loop filter is a
second order Butterworth lowpass.
The steady-state phase error approaches zero while γ is iteratively adjusted.
Figure 34: Simulation of iterative steady-state phase error elimination. Left: phase offset
[s] over time [s] for θ, θ filtered and γ. Right: frequency [Hz] over time [s] for fref and
fout.
Two Step Initialization
In the following a two step initialization process is described, which aims to establish
good initial conditions for synchronization. The goal is to reduce the time that the PLL
requires to reach frequency lock with a steady-state phase error of zero. During this
initialization, synchronization is not yet started.
Estimation of γ: If γ were known in advance, frequency lock with a zero steady-state
phase error would be reached much faster. A first initialization stage calculates an initial
value for γ. For this, fref and f0 are estimated by two separate SPLLs.
From the difference of fref and f0 and the approximately known VCXO range an estimate
of γ can be calculated. The VCXO range is roughly ±0.025Hz. Thus γ is calculated as
\[ \gamma = \left( \frac{f_{ref} - f_0}{|R_{f_{out}}|} \cdot R_V \right) \cdot \frac{1}{K} \tag{38} \]
where R_fout is the tuning range of the VCXO, RV is the numerical input range of the VCXO
control voltage (see footnote 10) and K is the PLL gain.
When the PLL has acquired stable frequency lock, γ is set to the estimated value.
Footnote 10: For the exact value see the extra appendix “Software Documentation”.
VBLANK Resets: The ideal case would be a perfectly guessed γ and an initial phase offset of
(nearly) zero. The alignment of the VBLANK phase is random and cannot be influenced
directly. The only possibility to change the position of the VBLANKs is to force the
graphics card to reset the display output. One way to achieve this is to switch between
different refresh rates to prompt the graphics driver to reconfigure the display.
The display driver of the CE4100 provides a function which configures the whole display
settings. Calling this function twice, once with an arbitrary refresh rate and a second time
with the desired refresh rate, offers the chance that the phase of the VBLANKs has
shifted afterwards. The implementation switches to 100Hz and then back to the original
120Hz.
A phase shift, however, is random from the application's view and not guaranteed. To
increase the chance, an arbitrary delay of 10ms is included. The probability of a random
phase offset of zero is very small. Thus a threshold is defined within which a phase offset
is accepted. The threshold value must be chosen with care:
• A too small threshold leaves the capabilities unexploited. Nevertheless the consequence
of a too small threshold is less severe than that of a too large one.
• A too large threshold increases the risk that no phase shift within the threshold
is found. It is not feasible to leave the time in which VBLANK resets are tried unlimited.
First, there is no determinism in this process, and no synchronization is possible within
this step. Second, at some point in time, depending on the initial phase offset and the
position of fref in the tunable VCXO range, it would be quicker to remove the phase
offset by increasing or decreasing fout than to wait for a small random phase offset.
Limiting the time of the VBLANK reset stage carries the risk that the phase offset on
termination is larger than phase offsets obtained by chance in earlier attempts.
An empirical value for the threshold is T/4.
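The reset stage can be sketched as a bounded retry loop. The two callbacks are hypothetical
stand-ins: measure_offset is assumed to return the current phase offset in seconds and
reset_display is assumed to perform the switch to another refresh rate and back. The time
limit is modeled here by a maximum number of attempts, which is a simplification.

```python
import time


def vblank_reset_stage(measure_offset, reset_display, period: float,
                       max_attempts: int = 20,
                       threshold_fraction: float = 0.25,
                       delay_s: float = 0.010):
    """Retry display resets until the phase offset is within the threshold.

    Returns the last measured offset and the number of resets performed.
    """
    threshold = threshold_fraction * period  # empirical value T/4
    offset = measure_offset()
    attempts = 0
    while abs(offset) > threshold and attempts < max_attempts:
        reset_display()      # hypothetical: e.g. 120Hz -> 100Hz -> 120Hz
        time.sleep(delay_s)  # arbitrary delay to vary the resulting phase
        offset = measure_offset()
        attempts += 1
    return offset, attempts
```

As noted in the text, the offset returned after the last attempt is not necessarily the
smallest one observed. A variant could stop as soon as removing the offset by frequency
adjustment is expected to be faster than further resets.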
12.2.3 RTT Estimator
For proper framelock and good stereo visibility it is not enough to synchronize the
frequency; the phase offset must also be reduced to zero. To estimate the phase at the CMD,
the SDNs must know how long the DCR packet was delayed during transmission. The
RTT estimator sends ICMP [8] packets to the CMD, conforming to the commonly used ping
method. As this is implemented on nearly every network capable machine, no additional
implementation at the CMD is necessary.
The destination address for the ICMP packets is known, as it is the source of the sync
packets. The ICMP packets are sent with type 8 (ICMP Echo Request). The payload of these
packets is a timestamp, which is generated by the RTT estimator directly before it sends the
packet. Ping packets are always returned with exactly the same payload; only the type field
is changed to type 0 (Echo Reply) and the checksum is recalculated.
The RTT estimator creates a timestamp on reception of an ICMP type 0 packet. Some
filtering is necessary to avoid confusion with ICMP packets from other sources: each ICMP
type 8 packet carries a 16-bit identifier. The identifier can be chosen arbitrarily, but as
it should be unique, the process id of the sending process is a good choice.
Thus the RTT estimator extracts the payload and matches the identifier against its own
process id. If it was the origin of the ICMP echo packet, it extracts the timestamp from the
payload and subtracts it from the timestamp generated on packet ingress. The difference
equals the RTT. Included in this RTT is the processing delay at the CMD. This delay is
small enough to be neglected: the reply packet is formed in kernel space and requires almost
no computation, except for the checksum, which is a quite simple calculation. Besides, the
RTT is used to estimate the delay between the VBLANK at the sender and the arrival time
of the sync packet. This transmission also includes some processing delay, which is
corrected by a manual offset. The processing delay of creating the RTT reply therefore
compensates a part of this offset.
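The packet handling of the RTT estimator can be sketched without opening a socket. The
sketch assumes that the timestamp is carried as an 8 byte double in network byte order,
which is a choice made for this example and not taken from the thesis software. Sending
requires a raw socket and elevated privileges, and on Linux a received raw IPv4 packet
typically still carries the IP header, which has to be stripped before parsing.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """16-bit one's complement checksum as used by ICMP."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:  # fold carries back into the lower 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_echo_request(identifier: int, sequence: int,
                       send_time: float) -> bytes:
    """ICMP type 8 packet whose payload is the send timestamp."""
    payload = struct.pack("!d", send_time)
    header = struct.pack("!BBHHH", 8, 0, 0, identifier & 0xFFFF, sequence)
    checksum = internet_checksum(header + payload)
    header = struct.pack("!BBHHH", 8, 0, checksum,
                         identifier & 0xFFFF, sequence)
    return header + payload


def rtt_from_reply(packet: bytes, identifier: int, receive_time: float):
    """Return the RTT in seconds, or None if the packet is not our reply."""
    icmp_type, _code, _checksum, ident, _seq = struct.unpack(
        "!BBHHH", packet[:8])
    if icmp_type != 0 or ident != (identifier & 0xFFFF):
        return None  # not an echo reply, or a reply for another process
    (send_time,) = struct.unpack("!d", packet[8:16])
    return receive_time - send_time
```

The identifier is masked to 16 bits because a process id can exceed the width of the
identifier field.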
The interval between two RTT estimations is set in the configuration file. Experiments
showed that values in the range of several seconds work well. However, the tests so far
have been made under good conditions; in a different environment another configuration
might be necessary. The interval has to be chosen small enough to let the estimated RTT
follow the dynamics of the transmission delay. Nevertheless, a too small interval increases
the network load on the CMD and might, depending on the number of SDNs, increase the jitter
of the sync packets.
RTT filtering: The estimated RTT is subtracted from the arrival time of the sync packets,
as a close approximation of the original VBLANK time at the CMD. Due to this, jitter
in the RTT has a linear influence on the phase error and thus also results in a less stable
frequency output of the PLL.
To avoid degradation of the framelock caused by jitter on the RTT measurements, the RTT
needs to be filtered. A good choice is an exponentially weighted moving average (EWMA).
The EWMA is an IIR filter, where older values are multiplied with decreasing weights. Its
impulse response is a decaying exponential function. The EWMA is calculated according to
y(n) = α · x(n) + (1− α) · y(n− 1). (39)
For filtering the RTT two characteristics have to be traded off:
• fast adaptation to step-like changes in the RTT
• sufficient filtering of random jitter
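The EWMA of equation 39 can be sketched as follows. Initializing the filter state with the
first sample is an assumption made for this example.

```python
def ewma(samples, alpha: float, initial=None):
    """Exponentially weighted moving average of a sequence of RTT samples."""
    y = samples[0] if initial is None else initial  # assumed start value
    filtered = []
    for x in samples:
        y = alpha * x + (1.0 - alpha) * y  # equation 39
        filtered.append(y)
    return filtered
```

A larger α follows a step in the RTT faster, a smaller α suppresses random jitter more
strongly. After a step, the remaining distance to the new mean shrinks by the factor
(1 − α) with every update.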
Figure 35 shows different values of α. The RTT has mean 0.2ms between 0 and 35s, afterwards
the RTT jumps to a mean of 0.5ms. The standard deviation is 0.1.
Figure 35: Comparison of EWMA filter results for different α. The plot shows RTT [ms] over
time [s] for the measured RTT and for the filtered RTT with α = 0.1, α = 0.2 and α = 0.4.
From the figure it can be seen that for coefficient α = 0.1 it takes the EWMA more than 20s
to adapt to the new RTT. With α = 0.2 it instead takes around 7s, of course at the price of
less smooth filtering.
Figure 36 compares the filtering performance for update intervals of 1s and 5s, both with
α = 0.25.
Figure 36: Filtered RTT with different update intervals. The plot shows RTT [ms] over
time [s] for the measured and the filtered RTT at update intervals of 1s and 5s.
The EWMA filter coefficient α and the update interval need to be chosen carefully
with respect to the expected amount of jitter.
12.3 Frame deadline Predictor
To ensure steady and interruption-free video frame playback on master and slave displays,
the video source needs to know at what time each frame has to be put onto the network.
Artifacts produced by missing or late frames are highly undesirable. If the video source
sends frames at a higher frequency than the displays can consume, the end-to-end delay
increases, which should be prevented as well. Especially when the video frames come from
realtime renderers, it should be ensured that the rendering nodes know how fast
the frames need to be generated.
In situations where the video source and the displays are not on the same network, a highly
dynamic RTT has to be dealt with, to ensure that an increasing RTT does not irreversibly
empty the buffers at the display nodes.
The calculation of the right due time for each frame needs three pieces of information:
• The framenumber of the frame that is currently displayed at the display nodes.
• The rate of frame consumption at the displays: the refresh rate.
• The round trip time between the CMD and the video source.
The precondition is complete synchronization between the CMD and the SDNs, ensuring that
the framenumber and the refresh rate of the master are the same for all slaves. Estimating
the RTT only to the master and neglecting a different RTT to the slave nodes is based on
the assumption that, as master and slave displays are arranged to a composite display, the
RTT is nearly the same.
The framenumber is contained in the payload of the DCR packets. The refresh rate is deduced
from the intervals between the DCR packets. Each time a DCR packet is received, the frame
deadline predictor generates a timestamp with its local clock. This timestamp is the input
to the SPLL. In the SPLL the timestamp is compared with a predicted timestamp, which is
an estimate of the time, in terms of the local clock, at which the next DCR packet
arrives. If this prediction deviates from the time when the DCR message is actually
received, the SPLL adjusts the estimated frequency.
The estimated RTT is the delay to the source of the DCR packets. This source is not
necessarily the same as the CMD, as the packets may also be received from another frame
deadline predictor that is in relaying mode. Therefore the value of the rtt field in the DCR
packet is added to the estimated RTT.
The frametime ft(n) for a certain frame number n is then calculated by
\[ ft(n) = \bigl(n - frame(t)\bigr) \cdot \frac{1}{v_r} + rtt, \]
where frame(t) is the framenumber of the currently displayed frame,
vr is the estimated refresh rate based on the local clock and
rtt is the estimated round trip time, consisting of the RTT to the source of the DCR messages
and the value in the rtt_useconds field in the DCR packets.
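The frametime formula translates directly into code. The parameter names are chosen for
illustration; the only assumption is that the rtt_useconds field holds microseconds, as its
name suggests.

```python
def frame_time(n: int, current_frame: int, refresh_rate_hz: float,
               rtt_to_source_s: float, rtt_field_us: int = 0) -> float:
    """Frametime in seconds for frame n, relative to the current frame.

    rtt_to_source_s is the estimated RTT to the source of the DCR packets,
    rtt_field_us the value of the rtt_useconds field of the DCR packet.
    """
    rtt = rtt_to_source_s + rtt_field_us * 1e-6
    return (n - current_frame) / refresh_rate_hz + rtt
```

For example, with frame 1080 currently displayed, a refresh rate of 120Hz, an estimated RTT
of 0.4ms and an rtt_useconds value of 100, the frametime of frame 1200 is about 1.0005s.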
Figure 37: Frametime Predictor
Figure 37 shows the sequence diagram of the frame deadline predictor.
13 Measurements and Synchronization Performance
Measurement Methods
To prove the correct synchronization different external measurement methods have been
tested:
• Making the pixelclock visible on an oscilloscope.
• Tapping the voltage of the IR diode which synchronizes the shutter glasses, and display-
ing it on an oscilloscope.
• Observing the synchronization with shutter glasses and judging the visual quality.
The pixelclock is transmitted on a pin pair as a differential voltage signal. To monitor changes
of the refresh rate, the voltage signal was tapped and displayed on an oscilloscope. While
it was possible to observe changes of the pixelclock on Intel graphics devices, the range of
the VCXO frequency control is below the precision of measurement. The pixelclock signal is
too noisy to determine the frequency exactly by the averaged estimation of the period length.
Thus it is not possible to measure a change of the pixelclock on the CE4100 STBs reliably.
Tapping the IR diode voltage turned out to be a good way to prove the synchronization of
the frequency. The shutter glasses are synchronized by an IR diode, which is driven
by a square-wave voltage signal with a frequency of half the refresh rate. The signal of the
IR diode connected to the CMD is used to trigger the oscilloscope. The voltage from an IR
diode connected to the SDN is displayed as second signal. If the frequencies of CMD and
SDN are equal, the signals do not drift against each other on the oscilloscope. Thus it was
possible to verify that the SDNs successfully acquire frequency lock. The elimination of the
phase offset, in contrast, cannot be proved this way, as a time difference between both
voltage signals does not directly correspond to a phase offset.
An effective method to validate not only frequency lock but also phase offset elimination is
to observe the displays simultaneously with shutter glasses. It turned out, that the human
brain is very sensitive to even small phase offset and ghosting. Though it presumably depends
on the viewer, the stereo effect vanishes at a phase offset of roughly 1.5ms, but is already
severely degraded at offsets of more than 1ms. This also depends on the stereo emitter. Not
all emitters leave the LCDs of the shutter glasses opened over the whole period of the frame.
Some IR emitters close both eyes at the beginning and the end of the frame period. This
ensures that ghosting is eliminated, also in case of not perfectly synchronized stereo glasses.
Of course this increases the amount of light that is filtered by the glasses to more than 50%.
To observe the tracking behavior of the PLL a variable phase offset can be set by the user.
Synchronization Performance
The software displays values calculated in the PLL. These are e.g. the phase error and
filtered phase error, the frequency of the DCR packets and the frequency of the VBLANKs. As
external measurements verified that the synchronization works, these values may be regarded
as trustworthy.
From the reported values it can be concluded that in case of frequency lock the difference
between reference frequency and local refresh rate is in the range of 10^−5 Hz. The phase
error in case of frequency lock and elimination of steady-state phase error is in the range
of ±50µs to ±100µs.
Settling-Time
An important characteristic is the settling-time. Though the settling-time is usually
determined mainly by the PLL parameters, here the small pull range of the VCXO is the main
limiting factor. While the PLL is in acquisition and reduces the phase offset, it decreases
or increases fout. However, the frequency difference between f0 and the pull range limit is
very small compared to the length of one period. Therefore reduction of the phase offset
always requires a notable amount of time. Below an example is given which describes the
worst case, if the phase error is corrected using the pull-range-aware strategy.
The phase offset that can be reduced within one second is:
\[ \Delta\theta/s = f_{ref} \left( \frac{1}{f_{ref}} - \frac{1}{f_{out}} \right) = \left( 1 - \frac{f_{ref}}{f_{out}} \right) \tag{40} \]
Assume f0 = fref, i.e. the frequencies of CMD and SDN are equal. The frequency
difference between fref and foutmax or foutmin is roughly ±0.025Hz. Thus equation 40 becomes
\[ \Delta\theta/s = \left( 1 - \frac{1}{1 \pm \frac{0.025\,\mathrm{Hz}}{f_{ref}}} \right) \tag{41} \]
At a refresh rate of 120Hz, the slave can therefore catch up or let itself fall back by
208µs per second, and thus reduce the phase offset by roughly 4.5°/s. Here 360° corresponds
to 2 frames, thus two periods. At a phase shift of one period, i.e. 180°, the frames for
right and left eye are swapped. A 180° offset can be corrected by swapping the buffers for
right and left eye, thus the phase offset can be at most one period. If the phase offset is
corrected using the direction-aware strategy, the maximal phase offset is ±T/2.
If the reference frequency is close to a VCXO limit, it might be faster to always correct a
phase offset of up to T in the direction with the larger frequency margin than to correct at
most T/2 in the nearer direction at a slower speed.
In the example above, using the pull-range-aware strategy, correction of a phase error of T
requires about 40 s, which is the worst case. If the phase error is instead reduced using the
direction-aware strategy, this would be the ideal scenario and require only half of the time.
Furthermore, the pull-range-aware strategy requires a longer initialization to estimate the pull
range limits.
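The numbers of this example can be reproduced with a few lines of C. This is only a sketch of equation 40; the function names are chosen for illustration and are not part of the prototype.

```c
#include <assert.h>
#include <math.h>

/* Phase slew in seconds of phase offset per second of wall time,
 * following equation (40): 1 - f_ref / f_out.
 * Positive: the slave catches up; negative: it falls back. */
double phase_slew_per_second(double f_ref, double f_out)
{
    assert(f_ref > 0.0 && f_out > 0.0);
    return 1.0 - f_ref / f_out;
}

/* Time in seconds needed to remove a phase offset (given in seconds)
 * at a constant output frequency f_out. Returns INFINITY if f_out
 * equals f_ref, because the offset then never shrinks. */
double correction_time(double offset_s, double f_ref, double f_out)
{
    double slew = fabs(phase_slew_per_second(f_ref, f_out));
    if (slew == 0.0)
        return INFINITY;
    return fabs(offset_s) / slew;
}
```

With f_ref = 120 Hz and f_out = 120.025 Hz the slew evaluates to about 208 µs per second, and correction_time(1.0/120.0, 120.0, 120.025) to about 40 s for a full period T.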
Nevertheless, the worst case for the direction-aware strategy would be that the PLL tries
to catch up a phase error of T/2 with, e.g., fref = 0.9999 · foutmax. As fref approaches foutmax,
the acquisition time can theoretically become infinite. As a best master clock selection algorithm
aims to find the clock which is nearest to f0 of most slaves, this case is less probable. Still, it
may leave single displays with outlying quartz frequencies unsynchronized, which is unacceptable
for the Display Wall. Therefore the pull-range-aware strategy is the better choice.
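The trade-off between the two directions can be illustrated with a small estimate. The sketch below is not the decision logic of the prototype; it only compares the time needed to catch up a lag at foutmax with the time needed to fall back the remaining part of the period at foutmin, using equation 40. All function and parameter names are hypothetical.

```c
#include <assert.h>
#include <math.h>

/* Estimated time in seconds to remove a lag of theta_s seconds by speeding
 * up to f_max. INFINITY if f_max leaves no margin above f_ref. */
double catch_up_time(double theta_s, double f_ref, double f_max)
{
    assert(theta_s >= 0.0 && f_ref > 0.0 && f_max > 0.0);
    double slew = 1.0 - f_ref / f_max;   /* equation (40) with f_out = f_max */
    return slew > 0.0 ? theta_s / slew : INFINITY;
}

/* Estimated time to remove the same lag by slowing down to f_min until the
 * reference has advanced by the remaining (period_s - theta_s). */
double fall_back_time(double theta_s, double period_s, double f_ref, double f_min)
{
    assert(theta_s >= 0.0 && theta_s <= period_s && f_ref > 0.0 && f_min > 0.0);
    double slew = f_ref / f_min - 1.0;   /* magnitude of equation (40), f_out = f_min */
    return slew > 0.0 ? (period_s - theta_s) / slew : INFINITY;
}

/* Returns +1 if catching up is estimated to be faster, -1 otherwise. */
int faster_direction(double theta_s, double period_s,
                     double f_ref, double f_min, double f_max)
{
    return catch_up_time(theta_s, f_ref, f_max)
               <= fall_back_time(theta_s, period_s, f_ref, f_min) ? +1 : -1;
}
```

For an assumed pull range of 119.975 Hz to 120.025 Hz and fref = 0.9999 · foutmax, catching up T/2 takes roughly three times longer than falling back T/2, and the catch-up time grows without bound as fref approaches foutmax.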
Scaling for a Larger Number of Nodes
The synchronization architecture itself also scales well to a larger number of nodes. The
network load that each additional display node adds for synchronization is negligible.
As the DCR messages are transmitted by broadcast or multicast, no additional sync packets
are necessary. The additional display node will send and receive ICMP Echo packets, but only at
a very low rate of one packet every n seconds.
However, each display node needs its own video stream, thus each additional SDN adds traffic
to the network. The number of displays is therefore limited by the bandwidth of the IP
network and by the point where congestion produces so many packet losses that synchronization
is no longer reliable.
Still, assuming a video bitrate of 5 Mbps, the traffic for 20 display nodes will only consume
10% of the available bandwidth of gigabit Ethernet. In this example each additional node
increases the network load by 0.5% of the bandwidth.
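The estimate can be generalized to other bitrates and link speeds. The helper below is a minimal sketch with hypothetical names; it ignores protocol overhead and all other traffic on the link.

```c
#include <assert.h>

/* Fraction of the link capacity used by n_nodes video streams. */
double link_utilization(int n_nodes, double stream_mbps, double link_mbps)
{
    assert(n_nodes >= 0 && stream_mbps >= 0.0 && link_mbps > 0.0);
    return n_nodes * stream_mbps / link_mbps;
}

/* Largest number of nodes whose streams stay within max_utilization (0..1). */
int max_nodes(double stream_mbps, double link_mbps, double max_utilization)
{
    assert(stream_mbps > 0.0 && link_mbps > 0.0 && max_utilization >= 0.0);
    return (int)(max_utilization * link_mbps / stream_mbps);
}
```

link_utilization(20, 5.0, 1000.0) yields the 10% stated above; max_nodes(5.0, 1000.0, 0.5) gives the node count at which half of the link would be occupied by video traffic.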
One possibility to reduce the impact of network load on the jitter of DCR packets would be
to include a QoS priority handling for the DCR packets.
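One way to request such a priority on a POSIX system is to mark the DCR datagrams with a DSCP value through the IP_TOS socket option. The sketch below is an assumption about how this could be done and is not part of the prototype; the function names are hypothetical, and whether switches honor the marking depends on their configuration.

```c
#include <assert.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Convert a 6-bit DSCP code point into the 8-bit IPv4 TOS byte
 * (DSCP occupies the upper six bits, the lower two bits are ECN). */
int dscp_to_tos(int dscp)
{
    assert(dscp >= 0 && dscp <= 63);
    return dscp << 2;
}

/* Mark all datagrams sent on an IPv4 UDP socket with the given DSCP.
 * Returns 0 on success and -1 on error, like setsockopt(). */
int mark_dcr_socket(int sockfd, int dscp)
{
    int tos = dscp_to_tos(dscp);
    return setsockopt(sockfd, IPPROTO_IP, IP_TOS, &tos, sizeof tos);
}
```

A sender would call mark_dcr_socket(fd, 46) once after creating the socket; DSCP 46 (Expedited Forwarding) is a common choice for latency-sensitive traffic.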
14 Outlook
The prototype has proven that the synchronization works accurately and provides enough stability
to create good visual quality for stereo content. It is expected to scale to a larger
number of displays as well.
It has been shown that after acquisition of frequency lock, good tracking performance is provided.
The STBs used maintain the VCXO frequency once it is set. Thus subsampling the
refresh rate to a lower DCR packet rate is also possible, provided the jitter is filtered well.
Even a temporary loss of network packets due to loss of the network connection does not
break the synchronization. During this time synchronization is suspended, but the
STBs maintain the last set refresh rate. Thus the displays will maintain framelock under
the following conditions: the displays have already acquired framelock, and the frequency
drift of the oscillators at slave and master is negligible. Of course, the longer synchronization
is suspended, the more likely it is that framelock gets lost. As soon as packet reception
continues, the synchronization is resumed and will regain accurate synchronization.
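A rough feeling for the possible holdover time follows from equation 40. The estimate below assumes that the residual frequency difference stays constant at the reported 10^-5 Hz while synchronization is suspended, and it uses 100 µs as an illustrative phase budget; both are assumptions for this example, and real oscillator drift (e.g. due to temperature) is ignored.

```latex
\[
\frac{\Delta\theta}{s} = 1 - \frac{f_{ref}}{f_{out}}
  = \frac{f_{out} - f_{ref}}{f_{out}}
  \approx \frac{\Delta f}{f_{ref}}
  = \frac{10^{-5}\,\mathrm{Hz}}{120\,\mathrm{Hz}}
  \approx 8.3 \cdot 10^{-8}
  \;\;(\approx 0.083\,\mu\mathrm{s\ per\ second}),
\qquad
t_{100\,\mu\mathrm{s}} \approx \frac{100\,\mu\mathrm{s}}{0.083\,\mu\mathrm{s/s}}
  \approx 1200\,\mathrm{s}
\]
```

Under these assumptions the phase would take about 20 minutes to drift by 100 µs.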
There are some points where the synchronization performance might be improved.
• Further effort can be put into the loop filter of the PLL. The synchronization architecture
was developed and tested in a very favorable environment with low network traffic.
Other conditions may require PLL characteristics that tend more towards stability
and increased phase noise filtering.
• All software is currently implemented in user space. Shifting the detection of VBLANKs
into kernel space might reduce the jitter of the local refresh rate estimation, as this
reduces the number of function calls between the VBLANK event and the timestamp
generation. The ingress timestamps of DCR packets are already created by the network
card driver in this implementation, thus in kernel space.
Looking back, it was shown that synchronization of active-stereo displays is possible over an
IP network, without dedicated synchronization cabling. We have found hardware that
features a refresh rate control which works well for very precise synchronization. While
precision might be increased with further effort, this might not be necessary, as the precision
provided by the prototype is sufficient to ensure a framelock that is accurate enough to create
a good stereo impression.
References
[1] J. Allard, V. Gouranton, G. Lamarque, E. Melin, and B. Raffin. Softgenlock: Active
Stereo and Genlock for PC Cluster. In Proceedings of the Joint IPT/EGVE’03 Workshop,
pages 255–260, 2003.
[2] D. K. Banerjee. PLL performance, simulation, and design. Indianapolis, IN, 2006.
[3] Digital Display Working Group. Digital Visual Interface. 1.0 edition, 1999.
[4] J. Eidson. IEEE 1588: An Update on the Standard and its Application. 2006.
[5] G. M. Garner. IEEE 802.1AS and IEEE 1588. 2010.
[6] HDMI Licensing, LLC. High-Definition Multimedia Interface Specification. Number 1.4. 2010.
[7] T. Herfet. Future Media Internet: Video- & Audiotransport - A new Paradigm. 2009.
[8] IETF. Internet Control Message Protocol. Number RFC 792. 1981.
[9] IETF. Network Time Protocol Version 4: Protocol and Algorithms Specification. Number
RFC 5905. 2010.
[10] Integrated Device Technology. IDT6V49061A: VCXO Audio/Video Clock Generator.
2011.
[11] Intel Corporation. Intel CE platform references (confidential).
[12] Intel Corporation. Intel 965 Express Chipset Family and Intel G35 Express Chipset
Graphics Controller PRM: Programmer’s Reference Manual (PRM). 2008.
[13] ISO/IEC. Information technology — Generic coding of moving pictures and associated
audio information: Systems. Number 13818-1:2000(E). 2 edition, 2000.
[14] J. Kiszka, B. Wagner, Y. Zhang, and J. Broenink. RTnet - A flexible Hard Real-Time
Networking Framework. 2005.
[15] Nirnimesh and P. J. Narayanan. Scalable, Tiled Display Wall for Graphics using a
Coordinated Cluster of PCs. 2006.
[16] NVIDIA Corporation. NVIDIA Quadro G-Sync II User Guide.
http://de.download.nvidia.com/nvidia/Quadro/PDFs/Quadro_GSync_5800_4800_install_guide.pdf,
2011-11-26.
[17] M. H. Perrott. PLL Design Using the PLL Design Assistant Program. 2008.
[18] K. B. Stanton. 802.1AS Tutorial. 2008.
[19] M. Waschbusch, D. Cotting, M. Duller, and M. Gross. WinSGL: Software Genlocking
for Cost-Effective Display Synchronization under Microsoft Windows. 2006.
[20] M. Waschbusch, D. Cotting, M. Duller, and M. Gross. WinSGL: synchronizing displays
in parallel graphics using cost-effective software genlocking. Parallel Computing, vol.
33(6):420–437, 2007.
[21] H. Weibel. IEEE 1588, Standard for a Precision Clock Synchronization Protocol. 2006.
[22] H. Weibel. Technology Update on IEEE 1588: The Second Edition of the High Precision
Clock Synchronization Protocol. 2009.
[23] X.org Foundation. Development/Documentation/HowVideoCardsWork.
http://wiki.x.org/wiki/Development/Documentation/HowVideoCardsWork, 2011-01-21.
A SPLL Code Snippet
The following listing shows a minimalistic example of an SPLL for comparison of timestamps.
double SPLL(double t_dcr) {
    static double theta_filtered; // loop filter output, kept between calls
    static double theta[2];       // current [0] and previous [1] phase error
    static double fout;           // output frequency
    static double t_osc;          // predicted oscillator timestamp
    // SPLL parameters:
    double a0 = 0.00013;
    double a1 = 0.00013;
    double b1 = -0.99997;
    double f0 = 120.0;
    double K = 1.0;
    // phase detector
    theta[0] = t_osc - t_dcr;
    // inner loop filter
    theta_filtered = a0*theta[0] + a1*theta[1] - b1*theta_filtered;
    // VCXO control
    fout = f0 + K*theta_filtered;
    // phase prediction
    t_osc = 1/fout + t_osc;
    theta[1] = theta[0];
    return fout;
}
Figure 38: Minimalistic SPLL code snippet
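To observe how the listing behaves, it can be driven with synthetic timestamps. The variant below is a sketch for experimentation only: it keeps the parameters of the listing but moves the static variables into a hypothetical spll_state structure so that the loop can be restarted, and it feeds ideal, jitter-free reference timestamps.

```c
#include <assert.h>
#include <math.h>

/* State that the listing above keeps in static variables. */
typedef struct {
    double theta_filtered; /* loop filter output */
    double theta_prev;     /* previous phase error */
    double t_osc;          /* predicted oscillator timestamp in seconds */
} spll_state;

/* One SPLL update with the parameters of the listing; returns fout in Hz. */
double spll_step(spll_state *s, double t_dcr)
{
    const double a0 = 0.00013, a1 = 0.00013, b1 = -0.99997;
    const double f0 = 120.0, K = 1.0;

    double theta = s->t_osc - t_dcr;                    /* phase detector */
    s->theta_filtered = a0 * theta + a1 * s->theta_prev
                      - b1 * s->theta_filtered;         /* loop filter */
    double fout = f0 + K * s->theta_filtered;           /* VCXO control */
    s->t_osc += 1.0 / fout;                             /* phase prediction */
    s->theta_prev = theta;
    return fout;
}

/* Feed n ideal reference timestamps at f_ref, starting with the oscillator
 * prediction ahead by initial_offset seconds; returns the final phase error. */
double spll_simulate(double f_ref, double initial_offset, long n)
{
    assert(f_ref > 0.0 && n >= 0);
    spll_state s = { 0.0, 0.0, initial_offset };
    for (long i = 0; i < n; i++)
        spll_step(&s, (double)i / f_ref);
    return s.t_osc - (double)n / f_ref;
}
```

With f_ref equal to f0 = 120 Hz, an initial offset of 1 ms decays slowly in a damped oscillation. With these parameters the loop needs on the order of several hundred thousand updates to settle, which reflects the very narrow loop bandwidth.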
B Display Timings on Intel Graphics Cards
The following part briefly summarizes the registers that must be accessed for refresh rate
adaptation on Intel graphics devices. As this method of refresh rate adaptation is not used in
the developed prototype, only a coarse overview is given. More details can be found in [12].
Pixelclock Calculation
The exact calculation of the pixelclock on Intel graphics depends on the graphics chipset. In
most chipsets it is calculated by:
$$\mathrm{dotclock} = \frac{\mathrm{refclock} \cdot \left(5 \cdot (M1 + 2) + (M2 + 2)\right)}{(N + 2) \cdot (P1 \cdot P2)}.$$
The integrated graphics in the Intel Atom, named Pineview, uses this formula:
$$\mathrm{dotclock} = \frac{\mathrm{refclock} \cdot (M2 + 2)}{N \cdot (P1 \cdot P2)}.$$
On Pineview the reference clock is 120MHz, on devices of generation four it is 96MHz.
The limits of the parameters are summarized in table 3.
parameter   min        max
M1          10         22
M2          5          9
N           1          6
P1          1          8
VCO         1750 MHz   3500 MHz
Table 3: Parameters for pixelclock on Intel graphics devices of fourth generation
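The two pixelclock formulas of this section can be evaluated with integer arithmetic as sketched below. The function names are hypothetical and the example values are chosen for illustration only; P2 is not listed in table 3, so its valid values have to be taken from [12].

```c
#include <assert.h>
#include <stdint.h>

/* Dot clock in kHz for most chipsets, per the first formula of this section.
 * m1, m2 and n are passed as in the formula; the +2 offsets are applied here. */
uint32_t dotclock_khz(uint32_t refclock_khz, uint32_t m1, uint32_t m2,
                      uint32_t n, uint32_t p1, uint32_t p2)
{
    assert(p1 > 0 && p2 > 0);
    uint64_t m = 5ull * (m1 + 2) + (m2 + 2);
    return (uint32_t)(refclock_khz * m / ((uint64_t)(n + 2) * p1 * p2));
}

/* Dot clock in kHz on Pineview, per the second formula of this section. */
uint32_t dotclock_pineview_khz(uint32_t refclock_khz, uint32_t m2,
                               uint32_t n, uint32_t p1, uint32_t p2)
{
    assert(n > 0 && p1 > 0 && p2 > 0);
    return (uint32_t)((uint64_t)refclock_khz * (m2 + 2) / ((uint64_t)n * p1 * p2));
}
```

For example, a 96 MHz reference clock with M1 = 12, M2 = 8, N = 2, P1 = 2 and an assumed P2 = 10 gives 96000 · 80 / (4 · 20) = 96000 kHz. The integer division truncates, so results are exact only when the divisor divides the numerator.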
The Intel graphics devices have two pixel pipes, denoted A and B. All registers exist
once for pipe A and once for pipe B. The registers that determine the pixelclock parameters
are briefly summarized below. More details can be found in [12].
DPLL Control Register
DPLLA/DPLLB control register (0x06014/0x06018)
- bit 31 controls/indicates which Pipe is currently in use (A/B)
- bit 8 controls which one of the DPLL Divisor Registers (e.g. FPA0/FPA1) is used
- bits 25:24 determine P2
- bits 23:16 determine P1
[Figure: bit layout of the DPLL control register, bits 31 to 0. Labeled fields: DPLL[A,B] VCO Enable; DPLL[A,B] Mode Select (0 1, or 1 0 in LVDS mode, only on mobile devices); FP[A,B]0/1 P2 Post Divisor (bits 25:24); FP[A,B]0/1 P1 Post Divisor (bits 23:16); FP[A,B]0 / FP[A,B]1 selector (bit 8, only valid on DevCTG); FP[A,B]0 P1 Post Divisor (bits 7:0, only valid on DevCTG).]
Figure 39: DPLL Control Register
- bits 7:0 determine P1 (only on DevCTG, see footnote 11)
DPLL Divisor Register
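[Figure: bit layout of the FP[A,B] divisor register, bits 31 to 0. Labeled fields: FP[A,B] N Divisor (bits 21:16), FP[A,B] M1 Divisor (bits 13:8), FP[A,B] M2 Divisor (bits 5:0); all other bits are 0. For each divisor the register value is two less than the actual value.]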
Figure 40: FPA Control Register
FPA0/FPA1/FPB0/FPB1 DPLL Divisor register (0x06040/0x06044/0x06048/0x0604C)
- two registers for each pipe, which contain the same parameters; switching between both
registers is possible
- bits 21:16 determine parameter N
- bits 13:8 determine M1
- bits 5:0 determine M2
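The bit positions listed above translate directly into shift and mask operations. The following sketch only illustrates the field layout; the type and function names are hypothetical, and actual register access (e.g. via memory-mapped I/O) is outside its scope.

```c
#include <assert.h>
#include <stdint.h>

/* Divisor fields of an FP[A,B]0/1 register value. */
typedef struct {
    uint32_t n;   /* bits 21:16 */
    uint32_t m1;  /* bits 13:8  */
    uint32_t m2;  /* bits 5:0   */
} fp_divisors;

/* Extract `width` bits of `reg` starting at bit `lsb`. */
static uint32_t bits(uint32_t reg, unsigned lsb, unsigned width)
{
    assert(width > 0 && width < 32 && lsb + width <= 32);
    return (reg >> lsb) & ((1u << width) - 1u);
}

/* Split a 32-bit register value into its N, M1 and M2 fields. */
fp_divisors fp_decode(uint32_t reg)
{
    fp_divisors d = { bits(reg, 16, 6), bits(reg, 8, 6), bits(reg, 0, 6) };
    return d;
}

/* Pack N, M1 and M2 back into a register value; all other bits are 0. */
uint32_t fp_encode(fp_divisors d)
{
    assert(d.n < 64 && d.m1 < 64 && d.m2 < 64);
    return (d.n << 16) | (d.m1 << 8) | d.m2;
}
```

For instance, the value 0x00020C08 decodes to N = 2, M1 = 12 and M2 = 8.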
B.1 Display Pipe timing registers
The registers that determine the display resolution are summarized in table 4:
Footnote 11: Intel® GM45 Chipset
register name   address offset     bits 31:16         bits 15:0
HTOTAL A/B      0x60000/0x61000    horizontal total   horizontal active
HBLANK A/B      0x60004/0x61004    hblank end         hblank start
HSYNC A/B       0x60008/0x61008    hsync end          hsync start
VTOTAL A/B      0x6000C/0x6100C    vertical total     vertical active
VBLANK A/B      0x60010/0x61010    vblank end         vblank start
VSYNC A/B       0x60014/0x61014    vsync end          vsync start
Table 4: Resolution registers
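Each of these registers packs two values into one 32-bit word, and together with the pixelclock the totals determine the refresh rate as dotclock / (horizontal total · vertical total). The sketch below uses hypothetical function names. Whether the hardware stores a count or the count minus one is not stated here and should be checked in [12], so the refresh rate function takes plain pixel and line counts.

```c
#include <assert.h>
#include <stdint.h>

/* Upper half of a timing register (bits 31:16), e.g. horizontal total. */
uint32_t timing_high(uint32_t reg) { return reg >> 16; }

/* Lower half of a timing register (bits 15:0), e.g. horizontal active. */
uint32_t timing_low(uint32_t reg) { return reg & 0xffffu; }

/* Refresh rate in Hz from the dot clock and the total frame size
 * (pixels per line and lines per frame, both including blanking). */
double refresh_rate_hz(double dotclock_hz, uint32_t htotal, uint32_t vtotal)
{
    assert(htotal > 0 && vtotal > 0);
    return dotclock_hz / ((double)htotal * (double)vtotal);
}
```

As an illustration, a dot clock of 148.5 MHz with 1650 pixels per line and 750 lines per frame results in 120 Hz.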
B.2 Intel graphics devices generations
Generation 2 i830, 845g, i85x, i865
Generation 3 i915g, i915gm, i945g, i945gm, Pineview
Generation 4 i965g (Broadwater), i965gm (Crestline), G33, G45, GM45
Generation 5 Ironlake
Generation 6 Sandy Bridge
Table 5: Generation of the Intel GMAs
List of Figures
1 Projected Prototype
2 Pixel alignment
3 Pixel transmission on the display cable
4 NTP Synchronization Diagram
5 PTP Syntonization
6 Synchronization with 802.1as capable bridges
7 PCR in MPEG Transport Stream
8 PCR at transmitter and receiver
9 Active Shutter Technique
10 Left- and right-handed circular polarized waves
11 HDMI 1.4a Frame Packing (compare p. 8 [9])
12
13 Refresh rates around 120Hz on an Intel 965GM integrated graphics
14 Phase-Locked-Loop
15 Phase Noise
16 Relation between VCO pulled frequency and gain K
17 Frequency Ranges of PLLs
18 Settling behavior of type-I (left) and type-II (right) PLL
19 Bandwidth definition of loop filter
20 Bode plots of Butterworth filters with different orders
21 Comparison of a PLL and EWMA averaging
22 Simulation of filtering performance of different loop filter orders
23 Simulation of fout with different PLL orders
24 Simulation of different cutoff frequencies
25 Settling behavior of a type II PLL with small filter gain
26 Impact of frequency differences
27 Synchronization Architecture
28 Format of the DCR packets
29 Flow diagram of the CMD
30 Block diagram of the SDNs
31 Consequence of a lost VBLANK
32 Direction of phase correction
33 Loop filter structure in Direct Form I
34 Simulation of iterative steady-state phase error elimination
35 Comparison of EWMA filter results for different α
36 Filtered RTT with different update intervals
37 Frametime Predictor
38 Minimalistic SPLL code snippet
39 DPLL Control Register
40 FPA Control Register
List of Tables
1 Modeline for 720p at 120Hz
2 Parameters for different dot clocks on Intel graphics
3 Parameters for pixelclock on Intel graphics devices of fourth generation
4 Resolution registers
5 Generation of the Intel GMAs
Glossary
BMCA best master clock algorithm - algorithm in PTP that aims to find the most stable
and accurate clock amongst multiple candidates and, in hierarchical topologies, builds up
a loop-free clock distribution tree.
CMD Clock Master Display - node which provides the frequency information for the other
display nodes.
CVT Coordinated Video Timings - formula which defines the size of blanking on LCDs for
each display resolution, specification from VESA.
DCO Digitally Controlled Oscillator - oscillator whose frequency output is digitally
controlled.
DCR Display Clock Reference - small UDP packets that provide information for
synchronization.
DPLL Digital Phase-Locked-Loop - digital, time discrete PLL.
DTS Decoding Timestamp - timestamp relative to the PCR which tells the receiver the
correct decoding time of a frame, used in MPEG TS and MPEG PS.
DTV Digital Television Set.
EWMA Exponentially Weighted Moving Average - IIR filter with exponentially decreasing
impulse response.
GM grandmaster - root node that provides the master clock in 802.1as clock trees.
GTF General Timing Formula - formula which defines the size of blanking on CRTs for each
display resolution, specification from VESA.
HBLANK horizontal blanking interval - the time in which the electron beam of a CRT travels
from the end of one line to the beginning of the next line; still present in digital video
outputs.
HSYNC horizontal sync interval - a signal on the video cable which signaled the CRT to
begin the next line.
MPEG TS MPEG transport stream - used in error-prone environments, e.g. DVB
broadcast.
NCO Numerically Controlled Oscillator - oscillator whose frequency can be set by numerical
values.
NTP Network Time Protocol - network protocol for clock synchronization, also over large
distances and many hops.
PCR program clock reference - 27 MHz timestamp in MPEG TS, sampled at 90 kHz.
PLL Phase-Locked-Loop - closed loop feedback system for frequency and phase
synchronization.
PTP Precision Time Protocol - network protocol for sub-µs precise clock synchronization.
PTS Presentation Time Stamp - timestamp relative to the PCR which tells the receiver the
correct playout time of a frame, used in MPEG TS and MPEG PS.
SCR system clock reference - 27 MHz timestamp in MPEG PS, sampled at 90 kHz.
SDN Slave Display Node - node in the Display Wall which synchronizes to the CMD.
SPLL Software Phase-Locked-Loop - PLL implemented purely in software.
STC System Time Clock - system clock.
VBI Vertical Blanking Interval.
VBLANK vertical blanking interval - the time in which the electron beam of a CRT travels
from the bottom right to the top left corner; still present in digital video outputs.
VCO Voltage Controlled Oscillator - oscillator whose frequency can be influenced by an
external control voltage.
VSYNC vertical sync interval - a signal on the video cable which signaled the CRT to begin
the next frame.