Active-Stereo Synchronization of multiple Displays via Ethernet

Master Thesis

Saarland University

Faculty 6 - Natural Sciences and Technology I

Computer and Communications Technology

submitted by: Julian Metzger

on: January 2, 2012

supervisor: Dipl.-Ing. Jochen Miroll

1st examiner: Prof. Dr.-Ing. Thorsten Herfet

2nd examiner: Prof. Dr.-Ing. Philipp Slusallek

Master Thesis

B.Sc. Julian Metzger

Topic: Active-Stereo Synchronization of multiple Displays via Ethernet

Tiled displays, video walls and virtual reality (VR) installations typically consist of multiple identical displays, such as a number (n) of CRT monitors, LCDs or projectors, creating an immersive experience by capturing a large part of the field of vision. Composing the set of n displays in an (x·y = n) setup yields an increased resolution compared to a single display. In order to create a single virtual canvas, the displays have to be frame-locked (Framelock) and the content sources have to be generator-locked (GenLock).

In this work, Framelock of n displays driven by n independent PCs, where one display serves as the clock master, shall be realized via Ethernet at an accuracy that is sufficient for 120Hz active stereo while ghosting artifacts are limited. A scalable mechanism and the software framework for this purpose shall be established, and the accuracy shall be evaluated both in theory and by measurement.

This topic includes the following tasks:

• Brief description of real-time Ethernet and comparison of Ethernet clock synchronization protocols and their implementations, such as NTPv4, IEEE 1588 (PTP) and 802.1as.

• Description and analysis of GenLock mechanisms as used in MPEG transport streams (H.222.0/ISO/IEC 13818-1) when implemented in software on PCs, as well as digital video signal (DVI/HDMI) generation and Genlock.

• Description and evaluation of runtime refresh rate variation and its artifacts.

• Summary of stereoscopic display technologies and of scalable, tiled (VR) display wall projects as provided in the literature, and their requirements.

• Description of possible IP-based, scalable multi-display Framelock architectures in which one of the displays serves as the master clock.

• Design, implementation and evaluation of a prototype for active stereoscopy.

• Measurement of synchronization accuracy and long-term stability in the presence of background traffic, and extrapolation of the results for many (n ≥ 8) displays.

Software development may be based upon pre-existing open source projects and/or video driver code. The prototype shall consist of at least three display nodes, for which hardware is available. Measurements may be obtained by synchronous display of "dummy" images.

Supervisor: Dipl.-Ing. Jochen Miroll

Lehrstuhl für Nachrichtentechnik (Chair of Telecommunications)

FR Informatik (Department of Computer Science)

Prof. Dr. Th. Herfet

Universität des Saarlandes (Saarland University), Campus Saarbrücken C6 3, 10th floor, 66123 Saarbrücken

Phone (0681) 302-6541, Fax (0681) 302-6542

www.nt.uni-saarland.de

Statement under Oath

I confirm under oath that I have written this thesis on my own and that I have not used any other media or materials than the ones referred to in this thesis.

Declaration of Consent

I agree to make both versions of my thesis (with a passing grade) accessible to the public by having them added to the library of the Computer Science Department.

Saarbrücken, ____________________ (Date)    ____________________ (Signature)

Contents

1 Introduction
2 Project Description
3 Refresh Rate and Display Timing
4 Synchronization Techniques on Ethernet
4.1 Realtime Ethernet
4.2 Network Time Protocol
4.3 Precision Time Protocol
4.4 802.1as
5 Genlock in MPEG
6 Display Technologies for Stereo Vision
6.1 Active Shutter Stereo Display Technology
6.2 Polarization Stereo Display Technology
6.3 HDMI 1.4a
7 Other Display Wall Solutions and Projects
7.1 Hardware Genlock
7.2 SoftGenLock
7.3 WinSGL
8 Refresh Rate Adaptation
8.1 Display Timing on Common Graphics Devices
8.2 Software controlled VCXO
9 Phase-Locked-Loop
9.1 Frequency Characteristics of PLLs
9.2 Type I and Type II Phase-Locked-Loops
9.3 Inner Loop Filter
9.3.1 Loop Filter Design Tools
9.4 Software PLL in the Display Synchronization
9.4.1 PLL Design Choices
10 Necessity of Synchronization
11 Synchronization Architecture
11.1 Synchronization Packet Format
12 Implementation Details
12.1 Clock Master Display
12.2 Slave Display Nodes
12.2.1 The VBLANK Detector
12.2.2 Synchronization Core
12.2.3 RTT Estimator
12.3 Frame deadline Predictor
13 Measurements and Synchronization Performance
14 Outlook
References
A SPLL Code Snippet
B Display Timings on Intel Graphics Cards
B.1 Display Pipe timing registers
B.2 Intel graphics devices generations
List of Figures
List of Tables
Glossary


1 Introduction

Composite displays built from LCDs are an appropriate solution for large screen sizes. Such display walls can provide a very high resolution of more than 10 megapixels and a high pixel density, as each single LCD in the composite display already features roughly 2 megapixels or more. LCDs offer a color brilliance that is unmatched by digital video projectors, and their content remains visible in bright environments.

High-performance projectors are very expensive compared to LCDs, whereas even consumer LCDs provide high image quality. Furthermore, no projection canvas, which is also quite expensive for large screen sizes, is necessary.

Today, stereo-capable displays are available at moderate prices, which makes it possible to build a large stereoscopic composite display with a resolution of several times Full HD.

There are several commercial and non-commercial software solutions for combining a number of displays into one large screen. These solutions, however, have in common that the displays are connected to the content-generating nodes by dedicated display cabling such as DVI or HDMI. Connecting multiple PCs to the composite display is complex and time-consuming to set up, and any reconfiguration requires reconnecting cables and reconfiguring software.

The Display Wall project under investigation at the Intel Visual Computing Institute in Saarbrücken aims to build a stereo-capable composite display. What makes it exceptional is that the displays are connected only via an IP network. Configuration and transfer of the video content are purely software-based, encapsulated in IP packets.

To enable good visual quality and to create the impression of one composite display with seamless images across the borders of the single screens, a tight time synchronization of the displays is necessary. It is also essential for making the Display Wall stereo-capable. Missing synchronization already disturbs the visual quality of two-dimensional content, but it completely destroys any stereo effect and leaves the spectator with doubled images and severe ghosting.

Not only the display nodes but also the image-generating nodes should be synchronized, so that they transmit the content just in time and match their rendering rate to that of the sinks.

As the only connection to the outside is an IP network, the synchronization must run over Ethernet without any additional synchronization cabling. The topic of this thesis is the synchronization of active-stereo displays over Ethernet.


Figure 1: Projected Prototype

2 Project Description

The goal of the project described in this thesis is to implement and evaluate a software-based method for synchronizing a number of display nodes assembled into a composite display. A prototype with three synchronized display nodes was built to demonstrate and test the development.

The specified requirements are:

• The synchronization should be precise enough to enable active-stereo vision.

• The synchronization should rely on Ethernet only, without any dedicated, additional cabling for synchronization.

• The synchronization must not create artifacts that affect the visual quality.

• The software architecture should scale to a larger number of display nodes.

Figure 1 illustrates the projected prototype.

One of the display nodes, the clock master display (CMD), serves as master clock and provides a reference clock to a set of slave display nodes (SDNs). Each node is connected to a stereo-capable display driven at a refresh rate of, e.g., 120Hz. The master is connected to the slave nodes via Ethernet and periodically broadcasts synchronization information that allows the slaves to adapt to the master's exact refresh rate. The stereo glasses are synchronized separately to one display node via infrared.

In the context of video synchronization and composite displays, several terms describe the level of synchronization:

Genlock The video output of a system is synchronized to an external clock signal (generator lock).

Framelock Two or more nodes that display frames at exactly the same rate and with the same phase are framelocked. Framelock is a crucial requirement for active stereo vision.

Swaplock Applications that run on different nodes but generate content for a composite display need to swap their buffers at exactly the same time. This is ensured by swaplock.

Framelock is an indispensable requirement for displaying active stereo on a composite display. Missing framelock does not only disturb the impression of the composite image: as the stereo shutter glasses are synchronized to only one display node, stereo vision vanishes on all other displays, since no stereo separation is possible there.

The developed synchronization architecture is to be integrated into the Display Wall project later and will display video streams received over Ethernet. Therefore, synchronization with the video sources is necessary as well: the generation rate of video frames must match the consumption rate at the video sinks in order to avoid buffer over- and underruns. This can be achieved with genlock. However, the frequency constraints are much tighter at the video sinks. Also, it is not guaranteed that the video sources are on the same network as the video sinks. A genlock from a video source over a large distance would complicate the synchronization due to larger jitter and delay, possible packet reordering and packet loss.

For these reasons, a reverse genlock was chosen: instead of the video source clocking the displays, the master display provides the video source with information about the frame consumption rate and the video generation requirements.

Swaplock is also a necessary ingredient of the final Display Wall. Swaplock ensures that all video sinks display the correct video frame. This is especially important for video scenes that contain motion, as it avoids incoherence between the single parts of the composite image. The synchronization architecture itself does not provide swaplock, but it supports the implementation of swaplock by providing the necessary information.

The main goal of this work is to ensure proper and accurate framelock, but the architecture also provides the necessary information for genlock and swaplock.

The following sections 3, 4, 5, 6 and 9 present the theoretical background regarding synchronization and display technologies.

Section 7 summarizes earlier display wall projects and commercial solutions; section 8 explains the approaches and results of three different refresh rate variation methods.

Section 11 describes the synchronization architecture, as implemented in the prototype, for building a scalable, synchronized display wall for active stereo.

Section 12 explains the details of the implementation.


3 Refresh Rate and Display Timing

The following section explains the details of refresh rate and display timing.

The image on a display device is generated at some rate by the graphics device and sent to the display. The display refreshes the currently shown image with the new pixel data. Today, LCDs are the common display devices and have nearly displaced CRTs. Although the creation of the visible image is completely different, the format of the pixel data sent via cable to the displays is still the same.

To understand the background of refresh rate generation, one has to look at the image generation of analog CRTs: in a CRT, an electron beam moves over the screen and excites a fluorescent material to emit light. Each pixel is drawn serially and separately onto the screen, starting from the upper left corner and proceeding line by line to the bottom right. At the end of each line, the beam needs to be steered from the right edge back to the beginning of the next line. This action is called the horizontal retrace. During the retrace the beam must be switched off to prevent drawing unwanted pixels onto the screen. The phase in which the beam is switched off and horizontally retraced is called the horizontal blanking interval (HBLANK).

To signal the end of a line to the monitor, the horizontal sync interval (HSYNC) is included at the end of each line. It is positioned within the HBLANK. To allow the analog voltage signal on the display cable to stabilize before and after HSYNC, two additional margins are inserted, the front porch and the back porch.

The same applies in the vertical direction. After the last line, the electron beam is required to travel back to the upper left display corner. Therefore the vertical blanking interval (VBLANK) follows the last line of the visible image. The vertical sync interval (VSYNC) instructs the electron beam to retrace. A vertical front porch and back porch are included as well.

A periodic refresh is necessary to create an image that does not appear to flicker to the viewer. The fluorescent material has a specific afterglow: if a pixel is not redrawn within this afterglow period, it fades out and vanishes. The rate of this periodic refresh of each image pixel is called the refresh rate, in the following denoted as rv, and is identical to the rate of the VSYNCs. Depending on the hardware and the susceptibility of the viewer, CRTs require refresh rates of at least 75 to 80Hz for an undisturbed image perception.

As the electron beam requires a few µs to retrace, the blanking periods must be sufficiently long. The lengths of the blanking periods for CRTs are specified by the Video Electronics Standards Association (VESA) in the Generalized Timing Formula (GTF). Figure 2 shows the geometry of the image.



Figure 2: Pixel alignment (visible display area surrounded by the blanking and sync intervals; labeled quantities: H_ACTIVE, H_TOTAL, HSYNC, V_BLANK_START, V_BLANK_END, V_SYNC_START, V_SYNC_END, VSYNC)

On the cable, the pixels are transferred one after another in one continuous stream, as illustrated in figure 3.

Figure 3: Pixel transmission on the display cable

The image generation of LCDs is completely different from that of CRTs. An LCD has a fixed number of pixels, which are addressed via a matrix. Once a pixel is switched on, it theoretically needs to be changed only if the image content changes. Although the DVI specification already mentions a selective refresh, a periodic refresh at a fixed rate is commonly used. As no flickering occurs, the refresh rate of LCDs is typically chosen as 60Hz.

On digital television sets (DTVs), refresh rates that are equal to or multiples of video frame rates are preferred, e.g. 60Hz · 1000/1001 ≈ 59.94Hz = 2 · 29.97fps.

The format of the video output of the graphics card is the same for digital display devices: all periods described above are contained in the video signal. The basis for the transmission in DVI and HDMI is the TMDS link. Each TMDS link contains three data channels that carry 10-bit symbols, each encoded from 8 bits of pixel data. Each TMDS link can be clocked at up to 165MHz. This clock is transmitted on the display cable and is called the pixelclock or dotclock. On digital display devices, this pixelclock is used for synchronizing display and graphics card instead of HSYNC and VSYNC. One TMDS link is mandatory, the second one optional. Intel graphics devices support only one TMDS link.

The pixelclock is directly related to the refresh rate, as it is the rate at which each single pixel is transmitted. The refresh rate rv is the pixelclock fp divided by the total number of pixels transmitted per frame, htot · vtot, which includes the blanking intervals:

$$r_v = \frac{f_p}{h_{tot} \cdot v_{tot}} \tag{1}$$

Equation 1 shows that there are two possibilities to change the refresh rate:

1. Changing the denominator, i.e. the number of pixels transferred to the display per frame.

2. Modifying the numerator, the pixelclock.

Both methods were evaluated; the approach and the results are explained in detail in section 8.
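Equation 1 and the granularity of the two knobs can be compared numerically. The sketch below assumes the widely used 1920×1080 timing at 60Hz (2200 × 1125 total pixels, 148.5MHz pixelclock) purely as an example; it is not necessarily the configuration of the prototype. It contrasts adding a single blanking line per frame with pulling the pixelclock by a hypothetical 10ppm.

```python
def refresh_rate(f_p_hz: float, h_tot: int, v_tot: int) -> float:
    """Equation 1: pixelclock divided by the total pixels per frame (incl. blanking)."""
    return f_p_hz / (h_tot * v_tot)


# Example timing (assumption): 1920x1080 active, 2200x1125 total, 148.5 MHz.
F_P, H_TOT, V_TOT = 148_500_000, 2200, 1125
base = refresh_rate(F_P, H_TOT, V_TOT)

# Method 1: change the denominator by one extra line per frame.
one_more_line = refresh_rate(F_P, H_TOT, V_TOT + 1)
step_ppm_lines = (one_more_line - base) / base * 1e6

# Method 2: change the numerator, here a -10 ppm pixelclock adjustment.
pulled = refresh_rate(F_P * (1 - 10e-6), H_TOT, V_TOT)
step_ppm_clock = (pulled - base) / base * 1e6

print(f"base refresh rate: {base:.6f} Hz")
print(f"+1 line:        {one_more_line:.6f} Hz ({step_ppm_lines:.1f} ppm)")
print(f"-10 ppm clock:  {pulled:.6f} Hz ({step_ppm_clock:.1f} ppm)")
```

With these numbers one extra line shifts the rate by roughly -888ppm, almost ninety times the 10ppm pixelclock step, so line-based changes are coarse while pixelclock changes can be fine-grained if the clock generator allows it.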


4 Synchronization Techniques on Ethernet

Synchronization between remote systems is crucial to many applications. Several factors influence the accuracy of synchronization over networks:

• frequency stability and deviation of quartz oscillators

• accuracy of timestamp generation on ingress and egress of network packets

• delay and jitter induced by the network

• network topology

• delay and scheduling indeterminism of the operating systems

The following part gives an overview of realtime Ethernet and time synchronization protocols.

4.1 Realtime Ethernet

Standard Ethernet lacks realtime capabilities: it neither guarantees reliable transmission nor adheres to deterministic timing constraints.

Indeterminism in 802.3 is introduced at several points:

• no realtime scheduling of the OS

• dynamic address resolution

• collisions on the shared medium

• delay and lost packets due to congestion

A number of realtime Ethernet solutions, mostly commercial, currently exist that aim to overcome the problems stated above.

Two concepts of realtime can be distinguished:

hard realtime Missing a deadline is not tolerable and is regarded as a failure of the system.

soft realtime Missing a deadline results in degraded system performance, but the system remains functional.

Approaches to make standard Ethernet realtime-capable can be applied on all network layers; their effectiveness, however, increases towards the physical layer.

One method to enable soft realtime is to introduce a priority scheme. Depending on the deadline and importance of the data, a priority number is assigned, and the network devices in the nodes and switches transmit packets with a higher priority first.

However, this cannot guarantee determinism: if many senders transmit high-priority packets, congestion or packet losses can still occur.



Achieving hard realtime usually requires modifications on the lower layers. In switched networks, switches with realtime extensions are necessary. The principle behind most approaches to establishing a guaranteed transmission delay and bandwidth is time division multiple access (TDMA): a master assigns timeslots to the nodes in the network, and in each timeslot only one specific node is allowed to send data. An accurate time synchronization between master and slaves is required, as the slaves must transmit exactly at the assigned time.
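The slot principle can be illustrated with a small sketch. The schedule, slot length and guard interval below are hypothetical parameters, not values of a particular realtime Ethernet product. A node may only transmit when its synchronized clock places it inside its own slot, away from the slot boundaries, so the residual clock error between the nodes must stay below the guard interval.

```python
from typing import Sequence


def slot_owner(t_us: int, schedule: Sequence[str], slot_us: int) -> str:
    """Return the node owning the timeslot that contains time t_us.

    The schedule repeats every len(schedule) * slot_us microseconds.
    """
    cycle_us = slot_us * len(schedule)
    return schedule[(t_us % cycle_us) // slot_us]


def may_send(node: str, local_t_us: int, schedule: Sequence[str],
             slot_us: int, guard_us: int) -> bool:
    """A node transmits only inside its own slot and outside the guard
    intervals at both slot boundaries, which absorb residual clock error."""
    pos_in_slot = local_t_us % slot_us
    return (slot_owner(local_t_us, schedule, slot_us) == node
            and guard_us <= pos_in_slot < slot_us - guard_us)


# Hypothetical cycle: three nodes, 100 us slots, 10 us guard intervals.
SCHEDULE = ["A", "B", "C"]
print(slot_owner(250, SCHEDULE, 100))          # C
print(may_send("B", 150, SCHEDULE, 100, 10))   # True
print(may_send("B", 195, SCHEDULE, 100, 10))   # False: inside the guard
```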

4.2 Network Time Protocol

The Network Time Protocol (NTP) was introduced in 1985. Its goal is to synchronize clocks on distant nodes. Its techniques and algorithms achieve a precision of tens of milliseconds, and in ideal cases in LANs down to a few milliseconds. The protocol is based on a periodic exchange of synchronization messages containing timestamps. The current version of the protocol is version 4.

The NTP timestamp is a 64-bit value consisting of an unsigned 32-bit seconds field and an unsigned 32-bit fraction field, which gives a resolution of $2^{-32}$ s ≈ 233 ps and a range of $2^{32}$ s ≈ 136 years. For special purposes, a 32-bit short format and a 128-bit date format are also available.
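The 32.32 fixed-point layout can be made concrete with a short conversion sketch. It works on integer nanoseconds to avoid floating-point rounding, ignores the era handling that NTPv4 needs when the 32-bit seconds field wraps, and the function names are illustrative only.

```python
NS_PER_S = 1_000_000_000
FRAC_UNITS = 1 << 32      # the fraction field counts units of 2**-32 s
SECONDS_WRAP = 1 << 32    # the 32-bit seconds field wraps after 2**32 s


def ns_to_ntp64(ns: int) -> int:
    """Pack a time in nanoseconds into the 64-bit NTP timestamp format:
    upper 32 bits = seconds, lower 32 bits = fraction of a second."""
    seconds, rem_ns = divmod(ns, NS_PER_S)
    fraction = rem_ns * FRAC_UNITS // NS_PER_S  # truncating conversion
    return ((seconds & 0xFFFFFFFF) << 32) | fraction


def ntp64_to_ns(ts: int) -> int:
    """Unpack a 64-bit NTP timestamp back into nanoseconds."""
    seconds = ts >> 32
    fraction = ts & 0xFFFFFFFF
    return seconds * NS_PER_S + fraction * NS_PER_S // FRAC_UNITS


resolution_ps = 1e12 / FRAC_UNITS                   # size of one fraction unit
range_years = SECONDS_WRAP / (365 * 24 * 3600)      # 365-day years
print(f"{resolution_ps:.2f} ps, {range_years:.2f} years")
print(hex(ns_to_ntp64(1_500_000_000)))  # 1.5 s -> 0x180000000
```

Because both directions truncate, a round trip through the 64-bit format loses at most 1 ns here, well within the 233 ps resolution of the fraction field plus the nanosecond granularity of the input.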

The synchronization is based on calculations with four timestamps, which are exchanged in the NTP packets. One synchronization round, as depicted in figure 4, proceeds as follows: peer A, in the role of a polling client, sends an NTP message containing timestamp t1(t) to peer B. Peer B generates timestamp t2(t) on packet ingress. It builds a reply packet, inserts t1(t) and t2(t), and adds t3(t), the timestamp generated at the time of egress. When the reply packet arrives at peer A, peer A generates timestamp t4(t).

Figure 4: NTP Synchronization Diagram

From these timestamps, peer A can calculate the round-trip delay δ(t), the time the packet needs to travel one round, excluding the processing time at the remote node. Knowing the round-trip delay, the client can compute the offset θ(t): the difference between t2(t) and the time that peer A's clock showed at the moment t2(t) was generated. The calculation of θ(t) assumes a symmetric path delay.

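$$\delta(t) = \big(t_2(t) - t_1(t)\big) + \big(t_4(t) - t_3(t)\big) \tag{2}$$

$$\theta(t) = \big(t_2(t) - t_1(t)\big) - \frac{\delta(t)}{2} = \frac{\big(t_2(t) - t_1(t)\big) - \big(t_4(t) - t_3(t)\big)}{2} \tag{3}$$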
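Equations 2 and 3 translate directly into code. The sketch uses hypothetical timestamps in milliseconds: peer B's clock is assumed to run 5 ms ahead of peer A's, each direction takes 2 ms, and B needs 1 ms to process the request.

```python
def ntp_delay_offset(t1, t2, t3, t4):
    """Equations 2 and 3.

    t1: client transmit, t2: server receive,
    t3: server transmit, t4: client receive.
    Returns (round-trip delay, offset of the server clock w.r.t. the client).
    """
    delay = (t2 - t1) + (t4 - t3)
    offset = ((t2 - t1) - (t4 - t3)) / 2
    return delay, offset


# Symmetric path: 2 ms each way, B is 5 ms ahead, 1 ms processing at B.
print(ntp_delay_offset(100_000, 100_007, 100_008, 100_005))  # (4, 5.0)
```

If the path is asymmetric, e.g. 3 ms from A to B and 1 ms back, `ntp_delay_offset(100_000, 100_008, 100_009, 100_005)` still reports a 4 ms round trip, but the offset comes out as 6 ms instead of 5 ms. The error equals half the asymmetry, which is why equation 3 relies on the symmetric-delay assumption.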

To increase accuracy and to minimize the impact of jitter and of round-trip delays temporarily increased by congestion, NTP uses a clock filtering algorithm. For each incoming NTP packet from the different servers, statistics besides θ(t) and δ(t) are calculated. Based on these statistics, only suitable NTP time servers are chosen as time references. Details can be found in [9].
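The core idea of the per-server clock filter in NTPv4 is to keep a short window of recent (offset, delay) samples and to prefer the sample with the lowest round-trip delay, since it is least affected by queuing. The sketch below reduces the filter to exactly that idea; it is a simplification that omits dispersion, jitter statistics and the server selection and clustering algorithms, and the window size of 8 is an assumption.

```python
from collections import deque


class MinDelayFilter:
    """Keep the last `size` (offset, delay) samples of one server and
    return the offset of the sample with the smallest round-trip delay."""

    def __init__(self, size: int = 8):
        self.samples = deque(maxlen=size)  # the oldest sample drops out automatically

    def add(self, offset: float, delay: float) -> float:
        self.samples.append((offset, delay))
        best_offset, _ = min(self.samples, key=lambda s: s[1])
        return best_offset


f = MinDelayFilter()
print(f.add(5.0, 4.0))   # 5.0
print(f.add(9.0, 20.0))  # 5.0: the congested sample is ignored
print(f.add(6.0, 3.0))   # 6.0: the new lowest-delay sample wins
```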

4.3 Precision Time Protocol

The Precision Time Protocol (PTP), standardized as IEEE 1588, was developed to provide finer time resolution and higher accuracy than NTP. Its first version was published in 2002, and it was updated to the second version in 2008. It can achieve accuracies in the nanosecond range. It includes an algorithm to build a hierarchical clock tree with one root clock, the grandmaster (GM). The GM is typically connected to an external high-precision clock, e.g. GPS or a radio clock.

One of the key differences compared to NTP is the separation of the synchronization into two steps:

• Synchronization of the oscillator frequency at the slaves to the reference frequency of the GM.

• Calculation of the delay to the GM to determine the absolute time.

The synchronization of the oscillator frequency is called syntonization. For syntonization, the PTP server, i.e. the clock master, periodically sends sync packets with timestamp tm to the clients, the PTP slaves. At each arrival of a sync packet, the client generates timestamp ts. The slave is syntonized with the master if, for consecutive sync packets i and i+1,

$$t_m^{i+1} - t_m^{i} = t_s^{i+1} - t_s^{i} \quad \text{and} \quad t_s^{i+1} - t_m^{i+1} = t_s^{i} - t_m^{i} \tag{4}$$
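Equation 4 can be checked numerically. The sketch below assumes hypothetical sync timestamps in integer nanoseconds, taken 1/8 s apart, and estimates the slave's frequency offset from the first and last sync only; a real PTP servo would additionally filter the timestamp jitter.

```python
def is_syntonized(tm, ts, tol_ns=0):
    """Equation 4: the slave is syntonized if t_s - t_m stays constant
    (within tol_ns) over all received sync packets."""
    first = ts[0] - tm[0]
    return all(abs((s - m) - first) <= tol_ns for m, s in zip(tm, ts))


def freq_offset_ppm(tm, ts):
    """Frequency offset of the slave relative to the master in ppm,
    computed from the first and the last sync packet."""
    elapsed_master = tm[-1] - tm[0]
    elapsed_slave = ts[-1] - ts[0]
    return (elapsed_slave - elapsed_master) * 1e6 / elapsed_master


tm = [0, 125_000_000, 250_000_000]               # master timestamps
ts_locked = [7_000, 125_007_000, 250_007_000]    # constant 7 us offset
ts_fast = [7_000, 125_008_250, 250_009_500]      # slave runs 10 ppm fast
print(is_syntonized(tm, ts_locked), freq_offset_ppm(tm, ts_locked))
print(is_syntonized(tm, ts_fast), freq_offset_ppm(tm, ts_fast))
```

The constant 7 µs offset of `ts_locked` satisfies equation 4; removing such an offset is the task of the separate delay and offset calculation.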



Figure 5: PTP Syntonization

The follow_up messages shown in figure 5 belong to one of two syntonization options. In the two-step option, the client creates its local timestamp on reception of the sync packet, and the master's timestamp is received afterwards in the follow_up message. This option is included in the standard to allow nodes that cannot alter the packet content on the fly, e.g. PTP-capable Ethernet bridges, to take part in the synchronization. Alternatively, PTP allows omitting the follow_up packet if the timestamp is sent within the sync packet instead. The sync and follow_up messages are sent as multicast, which allows all clients to receive the syntonization information.

In IEEE 1588 version 1, the sync message contains both the timestamps for syntonization and the information for building a clock tree in hierarchical network topologies. To increase precision, the second version of PTP splits the sync packets of version 1 into a sync and an announce message. Syntonization is based on the sync packets, while the announce messages are exchanged between the network nodes to build the clock tree. For the sync packets, version 2 defines intervals as short as 1/8 s, whereas the shortest interval in version 1 is 1 s. The separation into sync and announce reduces the overhead, as the clock tree information can be updated at a lower rate.

One of the most important ingredients is a hardware timestamping mechanism. Timestamps generated at the application layer suffer from varying scheduling and processing delays, induced by passing the packets through the network stack. Timestamps generated by the network hardware preserve the exact moment of packet ingress, even when the PTP packet is handled with delay by the PTP application. Therefore, IEEE 1588-capable network hardware includes a hardware timestamping mechanism between the MAC layer and the physical layer, which increases the synchronization performance of PTP.

Syntonization provides the correct frequency at the client. To remove a possible offset between the slave and master clock, PTP proceeds similarly to NTP: the clock offset is calculated as in NTP, given in equation 3. Timestamps t1 and t2 are already available from the last sync packet. To obtain timestamps t3 and t4, the slave sends a delay_req packet to the master; timestamp t3 is generated at packet egress. The master responds with a delay_resp packet, which contains t4, the time at which the delay_req arrived at the master.

If multiple master clocks are available, PTP employs the best master clock algorithm (BMCA) to find the most accurate and stable master clock. In hierarchical topologies with multiple switches that can act as slave and master, the BMCA also establishes a loop-free configuration.

Network switches and routers usually degrade the synchronization, as they introduce queuing delay that depends on the network load and is not deterministic from the node's point of view. IEEE 1588-capable routers and switches address this problem. Besides clock master and slave, two additional clock types are defined:

• boundary clocks (BC)

• transparent clocks (TC)

A BC acts as slave and master simultaneously. On its slave port, it receives the synchronization information from the GM or another BC and synchronizes its local clock to that master. On the other ports, it acts as a master and provides synchronization to the connected slave clocks. Sync packets are not forwarded through a BC.

Transparent clocks relay all sync messages. However, they communicate the delay introduced by buffering in their queues by modifying the sync packets or the follow_up messages, respectively. IEEE 1588 describes two types of TCs:

The end-to-end transparent clock measures the time for which the sync packet was delayed in the switch, the residence time. For this, timestamps are taken on ingress and egress; the residence time is their difference. The residence time is communicated to the slave clocks via a correction field in the PTP packets. As mentioned above, this can happen either directly in the sync packet or in a separate follow_up. The delay to the slave clocks is measured using the delay_req and delay_resp messages as described above. For a precise calculation of the residence time, the clock of the TC needs to be syntonized but not synchronized.
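The effect of the correction field can be shown with hypothetical numbers in nanoseconds. The sketch assumes one end-to-end TC whose residence times are subtracted before equation 3 is applied, and it assumes, as a simplification, that the residence time of the delay_req is likewise accumulated and returned to the slave. The slave clock is set 10 µs ahead of the master, and each link takes 1 µs.

```python
def residence_time(ingress_ns: int, egress_ns: int) -> int:
    """Time a packet spent inside one transparent clock."""
    return egress_ns - ingress_ns


def e2e_offset_and_delay(t1, t2, t3, t4, corr_sync=0, corr_req=0):
    """Offset (slave minus master) and mean path delay as in equation 3,
    after removing the residence times accumulated in the correction fields.

    t1: sync egress (master), t2: sync ingress (slave),
    t3: delay_req egress (slave), t4: delay_req ingress (master).
    """
    forward = (t2 - t1) - corr_sync    # master -> slave without queuing
    backward = (t4 - t3) - corr_req    # slave -> master without queuing
    return (forward - backward) / 2, (forward + backward) / 2


# One TC: the sync waits 50 us in its queue, the delay_req only 30 us.
corr_sync = residence_time(1_000, 51_000)
corr_req = residence_time(91_000, 121_000)
print(e2e_offset_and_delay(0, 62_000, 100_000, 122_000, corr_sync, corr_req))
print(e2e_offset_and_delay(0, 62_000, 100_000, 122_000))  # without TC support
```

With the corrections, the sketch recovers the 10 µs offset and the 2 µs one-way path delay; without them, the offset estimate is off by 10 µs, half the difference between the two residence times.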

In addition to the residence time, peer-to-peer transparent clocks also measure the link delay to their direct neighbors. When a sync packet travels to the slave clock, the link delays and residence times are summed up in the correction field. Thus all information for syntonization and synchronization is provided by the sync packets or the follow_up messages, respectively. No additional delay_req and delay_resp packets need to be exchanged between slave and master, which decreases the load on the GM in large networks.
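For the peer-to-peer case, the sketch below assumes, as a simplification, that every TC adds the measured delay of its upstream link plus its residence time to the correction field, and that the slave finally subtracts this sum and its own measured link delay from a single sync. All numbers are hypothetical nanosecond values, and the function names are illustrative.

```python
def p2p_correction(hops):
    """Correction field after the sync has passed all P2P transparent clocks.

    hops: (upstream_link_delay_ns, residence_time_ns) per TC on the path.
    """
    return sum(link + residence for link, residence in hops)


def p2p_offset(t1: int, t2: int, correction: int, own_link_delay: int) -> int:
    """Slave offset from a single sync; no delay_req/delay_resp needed."""
    return t2 - t1 - correction - own_link_delay


# Slave clock 10 us ahead; one TC with a 1 us inbound link and 50 us residence;
# the slave's own link to the TC measures 1.2 us.
corr = p2p_correction([(1_000, 50_000)])
print(p2p_offset(t1=0, t2=62_200, correction=corr, own_link_delay=1_200))
```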

One application field for PTP is realtime Ethernet, where it provides accurate synchronization

for TDMA.

4.4 802.1as

802.1as is a standard developed by the Audio Video Bridging (AVB) task group of IEEE 802.1. It is closely related to IEEE 1588: it is a subset of that standard tailored to a specific application scenario and extended for use in 802.11 wireless networks as well. It is designed to synchronize clocks in heterogeneous bridged networks with a deviation of at most ±500ns. Its particular purpose is the synchronized playback of audio and video streams. It assumes quartz oscillators with a maximum frequency offset of ±100ppm and a frequency drift of at most 1ppm/s. Where PTP offers a parameter set, 802.1as defines fixed default values, e.g. it specifies the syntonization interval as 1/8 s and the delay calculation interval as 1 s.

It features automatic selection of the best clock, the GM, and construction of a hierarchical clock distribution tree. A clock that can serve as GM sends announce messages. If a GM candidate receives announce messages of a more precise GM, it stops announcing itself. Clock-aware bridges relay only the announce messages of the best GM. In the end, only one GM is left, which supplies the whole network with its clock.
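The selection of the best GM boils down to comparing the announced clock properties. The sketch below is a simplified model, loosely following the attribute order of the IEEE 1588 dataset comparison (priority1, clock class, clock accuracy, variance, priority2, and the clock identity as final tie-breaker, lower being better in each field); the field names and values are illustrative assumptions, and the topology-dependent steps of the real comparison are omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Announce:
    """Simplified announce content; lower values are better in every field."""
    priority1: int
    clock_class: int
    clock_accuracy: int
    variance: int
    priority2: int
    identity: int  # unique clock identity, used only as the final tie-breaker

    def rank(self) -> tuple:
        # The fields are compared lexicographically in this order.
        return (self.priority1, self.clock_class, self.clock_accuracy,
                self.variance, self.priority2, self.identity)


def best_grandmaster(announces):
    """Return the announce of the clock that wins the comparison."""
    return min(announces, key=Announce.rank)


candidate_a = Announce(128, 248, 0xFE, 0xFFFF, 128, identity=2)
candidate_b = Announce(128, 6, 0x21, 0x4E5D, 128, identity=7)  # better class
print(best_grandmaster([candidate_a, candidate_b]) == candidate_b)  # True
```

Because every bridge applies the same comparison to the announce messages it relays, all nodes converge on the same GM, as described above.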

The 802.1as-capable bridges play a particular role; they are similar to the peer-to-peer TCs in IEEE 1588. On each port, a bridge measures the delay to its neighbor, using the metric described for NTP and PTP in equation 2. Additionally, the bridges determine for each link the clock ratio rc of the neighbor's clock compared to their own clock:

$$r_c = \frac{t_n(i+1) - t_n(i)}{t_l(i+1) - t_l(i)} \tag{5}$$

Here, tn are the timestamps received from the neighbor and tl are the locally generated timestamps. Timestamps are exchanged and generated in the same way as for syntonization in PTP.
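Equation 5 and its use for rescaling a locally measured interval can be sketched as follows. The timestamps are hypothetical nanosecond values taken 1/8 s apart, with the neighbor assumed to run 20 ppm fast; the function names are illustrative.

```python
def neighbor_rate_ratio(t_n, t_l):
    """Equation 5: elapsed neighbor time divided by elapsed local time
    between two timestamp pairs (i, i+1), all in nanoseconds."""
    return (t_n[1] - t_n[0]) / (t_l[1] - t_l[0])


def to_neighbor_timebase(local_interval_ns, r_c):
    """Express an interval measured with the local clock in neighbor time."""
    return local_interval_ns * r_c


# Hypothetical exchange 1/8 s apart; the neighbor runs 20 ppm fast.
r_c = neighbor_rate_ratio(t_n=[500, 125_003_000], t_l=[0, 125_000_000])
print(r_c)
print(to_neighbor_timebase(50_000, r_c))  # a local residence time in neighbor time
```

Scaling intervals in this way is what allows delays measured with different local clocks to be expressed relative to one common clock further up the tree.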

No GM is needed for calculating the delay and clock ratio to the neighbor nodes. As soon as a GM announces itself, synchronization is achieved within a short period, as the necessary values have already been calculated. The clock is propagated from the root clock towards the leaves of the clock distribution tree. The clock ratios along the clock path are cumulated to compute the ratio between the GM and the local clock.

$$ R_c = R_c^n + (1.0 - r_c) \qquad (6) $$

R_c is the clock ratio between the local clock and the GM. r_c is the clock ratio between the local clock and the neighbor's clock in the direction of the master clock. R_c^n is the accumulated clock ratio between the neighbor and the GM and is initialized with 1. The delay is the sum of all propagation and processing delays.
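The accumulation of equation 6 along a chain of bridges can be sketched as follows. We read r_c as the neighbor-over-local ratio of equation 5 and R_c as the local-over-GM ratio; this reading is our interpretation. Under it, the exact cumulative ratio would be the product of the inverse per-hop ratios. The additive form of equation 6 is then a first-order approximation, which is very accurate because all ratios differ from 1 by only a few ppm. The sketch compares both forms. The function names are hypothetical.

```python
import math


def cumulative_ratio(rc_per_hop: list) -> float:
    """Equation 6 applied hop by hop, from the GM (where R_c = 1) towards the slave.

    rc_per_hop[k] is the ratio r_c measured at hop k against its upstream neighbor.
    """
    ratio = 1.0
    for rc in rc_per_hop:
        ratio = ratio + (1.0 - rc)
    return ratio


def exact_ratio(rc_per_hop: list) -> float:
    """Product of the inverse per-hop ratios, for comparison with the additive form."""
    return math.prod(1.0 / rc for rc in rc_per_hop)
```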

When the clock ratio and the accumulated delay to the clock source are available at a client, the client is able to calculate the correct time as:

$$ t_s(t) \overset{!}{=} t_m(t) + \Delta(t) = t_m(t) + \left(\Delta_p(t) + \Delta_r(t)\right) + \delta_p^n(t) \cdot R_c(t) = t_m(t) + \sum_i \left(\delta_p^i(t) + \delta_r^i(t)\right) R_c^i(t) + \delta_p^n(t) \cdot R_c(t) \qquad (7) $$

t_s(t) is the correct time at the slave node and t_m(t) is the timestamp sent by the clock master. Δ_p(t) is the accumulated propagation delay on the path and Δ_r(t) the accumulated residence delay. The index i runs over the 802.1as bridges on the path between master and slave. R_c(t) is the clock ratio as given in equation 6, and R_c^i(t) is the accumulated clock ratio between bridge i and the GM. δ_p^n(t) is the estimated propagation delay to the slave's direct neighbor. δ_p^i(t) and δ_r^i(t) are the path delay and the residence delay of bridge i.

Figure 6: Synchronization with 802.1as capable bridges

The propagation and residence delays in figure 6 are assumed to be already scaled to the GM's clock ratio.
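The last line of equation 7 can be evaluated as in the following sketch. The `BridgeHop` structure and the function name are hypothetical, and all delays are assumed to be given in seconds.

```python
from dataclasses import dataclass


@dataclass
class BridgeHop:
    path_delay: float       # delta_p^i in seconds
    residence_delay: float  # delta_r^i in seconds
    ratio_to_gm: float      # R_c^i, accumulated clock ratio between bridge i and the GM


def slave_time(t_master: float, hops: list, last_link_delay: float, ratio_to_gm: float) -> float:
    """Equation 7: t_s = t_m + sum_i (delta_p^i + delta_r^i) * R_c^i + delta_p^n * R_c."""
    path = sum((h.path_delay + h.residence_delay) * h.ratio_to_gm for h in hops)
    return t_master + path + last_link_delay * ratio_to_gm
```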


5 Genlock in MPEG

Broadcasters transmit MPEG transport streams at a predefined frame rate. The renderers must play the received transport stream at exactly the same rate. Otherwise buffer overruns or underruns occur, which lead to skipped or repeated frames, respectively. Even the slightest clock deviation must be eliminated, because even a small frequency difference accumulates a phase offset with each played frame. Therefore the frequency of the receiver must be synchronized to the sender's frequency, which is accomplished by a genlock mechanism.
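To illustrate how a small frequency difference accumulates, assume a receiver clock that deviates by 20ppm from the sender and a frame rate of 25 frames/s. Both values are chosen purely for illustration.

```python
def accumulated_offset(freq_error_ppm: float, frame_rate_hz: float, duration_s: float):
    """Phase offset after duration_s if the receiver clock deviates by
    freq_error_ppm from the sender and is never corrected.
    Returns (offset in seconds, offset in frames)."""
    offset_s = freq_error_ppm * 1e-6 * duration_s
    return offset_s, offset_s * frame_rate_hz


# hypothetical example: 20 ppm clock error, 25 frames/s, one hour of playback
offset_s, offset_frames = accumulated_offset(20.0, 25.0, 3600.0)
```

After one hour the offset reaches 72ms, i.e. 1.8 frames. Without genlock the receiver would therefore skip or repeat a frame roughly every 2000s.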

Besides the synchronization of sender and receiver in broadcast mode, synchronization is also necessary when files are played back on a local system. Corresponding audio and video streams need to be played out at matching rates, called lipsync, to provide smooth playback and correct timing. Thus an MPEG program stream also needs to include timing information, even though playback only affects one local system.

ISO/IEC 13818-1 [13] describes the insertion of timestamps into MPEG-2 transport and program streams.

The genlock mechanism is implemented by including reference timestamps in the MPEG transport and program streams. In transport streams these timestamps are called program clock reference (PCR), in program streams system clock reference (SCR). Their function is basically the same, so the following description in terms of the PCR applies to the SCR as well.

The timestamps represent a 27MHz system clock. Nearly all devices derive their system frequency from a 27MHz quartz, so the PCR timestamps are a direct representation of that clock. The coarse part of the timestamp counts at 90kHz, which is 1/300 of 27MHz.

The timestamps are expressed in 42 bits. A 33-bit PCR base field encodes the 90kHz value, and a 9-bit PCR extension field encodes the remainder. Although 9 bits could hold values up to 511, the remainder wraps at 300. Figure 7 shows the position of the PCR field in the MPEG TS.


Figure 7: PCR in MPEG Transport Stream

The current system clock can be calculated from both PCR fields as

$$ PCR(i) = PCR_{base}(i) \cdot 300 + PCR_{ext}(i) \qquad (8) $$

Conversely, base and extension are obtained from the system clock value as

$$ PCR_{base}(i) = \left\lfloor PCR(i) / 300 \right\rfloor, \qquad PCR_{ext}(i) = PCR(i) \bmod 300 \qquad (9) $$
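Equations 8 and 9 can be sketched together with the 48-bit PCR field of the adaptation field. In ISO/IEC 13818-1 this field consists of the 33-bit base, 6 reserved bits and the 9-bit extension. Setting the reserved bits to 1 is an assumption of this sketch, and the helper names are hypothetical.

```python
PCR_EXT_MODULUS = 300                      # 27 MHz / 90 kHz
PCR_WRAP = (1 << 33) * PCR_EXT_MODULUS     # 27 MHz ticks until the 33-bit base wraps


def split_pcr(pcr: int) -> tuple:
    """Equation 9: 27 MHz tick count -> (PCR_base in 90 kHz units, PCR_ext in 0..299)."""
    pcr %= PCR_WRAP
    return pcr // PCR_EXT_MODULUS, pcr % PCR_EXT_MODULUS


def join_pcr(base: int, ext: int) -> int:
    """Equation 8: PCR = PCR_base * 300 + PCR_ext."""
    return base * PCR_EXT_MODULUS + ext


def pack_pcr(pcr: int) -> bytes:
    """48-bit field: 33-bit base, 6 reserved bits (all set to 1), 9-bit extension."""
    base, ext = split_pcr(pcr)
    return ((base << 15) | (0x3F << 9) | ext).to_bytes(6, "big")


def unpack_pcr(field: bytes) -> int:
    """Inverse of pack_pcr: recover the 27 MHz tick count from the 6-byte field."""
    value = int.from_bytes(field[:6], "big")
    return join_pcr(value >> 15, value & 0x1FF)
```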

As figure 7 indicates, the PCR is an optional field in the MPEG TS packets. Not every packet carries a PCR; instead, the PCR is inserted periodically. ISO/IEC 13818-1 limits the interval between two consecutive PCR values to 100ms for TS and between two consecutive SCR values to 700ms for PS. DVB tightens the PCR interval to 40ms. The maximum allowed jitter is ±500ns.

Figure 8 illustrates how the PCR is used to synchronize sender and receiver. The sender generates the 42-bit PCR, i.e. PCR base and PCR extension, from a counter driven by its system time clock (STC). The PCR is used to ensure synchronization of the encoded video and audio streams. The PCR timestamps are multiplexed together with the encoded audio and video data into the MPEG TS.

Figure 8: PCR at transmitter and receiver

The receiver demultiplexes the TS and extracts the PCR timestamps. The PCRs are used for two purposes:

• The receiver synchronizes its own clock to the clock of the sender. The synchronization of the STC is accomplished by a phase-locked loop (PLL); section 9 describes the PLL in detail. As soon as the PLL has locked onto the sender's frequency, the delay between the encoding and the decoding of the video is constant. The maximum jitter allowed in the PCR of an MPEG TS is ±500ns, so the receiver is able to acquire lock in less than a second.

• The packetized elementary streams (PES) carried in the MPEG TS contain a Presentation Time Stamp (PTS) and an optional Decoding Time Stamp (DTS), both relative to the PCR. The DTS tells the receiver at what time it has to decode a frame. This is necessary if the frames arrive in a different order than the one in which they have to be decoded.

The PTS specifies the time at which a frame should be played out. DTS and PTS are values ahead of the current time, but ISO/IEC 13818-1 limits this lead to 1s. Thus a receiver must have enough buffer to store at least 1s of play time.
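The first purpose can be illustrated by estimating the frequency ratio between sender and receiver from a series of PCR values and their local arrival times. This sketch uses a plain least-squares fit, a swapped-in technique rather than the PLL of section 9. It assumes arrival times without network jitter and no PCR wrap-around. The function name is hypothetical.

```python
def estimate_clock_ratio(local_times_s: list, pcr_ticks: list) -> float:
    """Least-squares slope of the received PCR values (converted to seconds at the
    nominal 27 MHz) over the local arrival times. A slope above 1 means that the
    sender's clock runs faster than the receiver's. PCR wrap-around is not handled."""
    n = len(local_times_s)
    if n < 2 or n != len(pcr_ticks):
        raise ValueError("need at least two matching samples")
    pcr_s = [p / 27e6 for p in pcr_ticks]
    mean_t = sum(local_times_s) / n
    mean_p = sum(pcr_s) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(local_times_s, pcr_s))
    var = sum((t - mean_t) ** 2 for t in local_times_s)
    return cov / var
```

A PLL performs the same task continuously and also locks the phase, which a one-off slope estimate does not.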


6 Display Technologies for Stereo Vision

The key for making a two-dimensional image on a flat screen visible as a three-dimensional

object is to produce and separate two images, one for each eye.

In the field of computer and television displays there are two competing technologies, active

shutter technology and polarization. Both require 3D glasses that carry out the separation

of the stereo images.

Current research tries to eliminate the need to wear glasses by moving the image separation into parallax barriers integrated in the display. A few autostereoscopic displays are already available, and this technique will probably spread. However, these displays still have drawbacks that rule them out for the composition of a display wall: they usually require the spectator to stay still in front of the screen. Eye-tracking methods that cope with this problem still support only a limited number of viewers.

6.1 Active Shutter Stereo Display Technology

The active shutter technique interleaves the frames for the right and the left eye in time. The stereo glasses consist of two small LCDs that can be independently switched transparent or opaque. The glasses alternately switch from opaque to transparent, synchronized to the refresh rate of the stereo display, so that only the corresponding eye can see the current frame. The eye whose LCD is switched to opaque actually sees nothing. With a sufficient refresh rate, typically 120Hz, the brain substitutes the image it has seen last and assembles the images from both eyes into a stereo impression.

Figure 9: Active Shutter Technique (120 Hz display, 60 Hz per eye)

The stereo glasses halve the refresh rate perceived by each eye. Thus the refresh rate should be doubled compared to two-dimensional mode. The drawback of the shutter technique is a reduced brightness due to the switching LCDs in the glasses. The perceived brightness is determined by the on and off times of the glasses, but is at most 50%. The advantage is that the visible resolution is not reduced.

6.2 Polarization Stereo Display Technology

The polarization stereo display technique makes use of polarized light and polarizing filters.

The two frames for the right and the left eye are interlaced into one frame. The odd lines belong to the frame for one eye and the even lines to the frame for the other eye. Both are separated by emitting them with different polarization.

There are two types of polarization that can be used:

• linear polarization

• circular polarization

The viewer wears glasses with polarizing filters. Each lens lets only one direction of polarization pass, so the lines of the interleaved frame are separated. With this technique both parts of the stereo image are seen at the same time.

Linear polarization is the simpler form and allows cheaper filters. However, linear polarization is not rotation invariant, so viewers lose stereo vision when they tilt their head. Therefore circular polarization is the preferred method.

Figure 10: Left- and right-handed circular polarized waves

In contrast to the shutter technique there is no flickering, but the interlaced image provides only half of the vertical resolution of 2D mode.

6.3 HDMI 1.4a

HDMI 1.4a is the part of the HDMI 1.4 standard that describes the format for transmitting stereo images from the graphics device to the display. Support for HDMI 1.4a is expected to increase in future stereo-capable devices. It might also become the video format of choice for the final Display Wall, so it is briefly explained here.

Figure 11: HDMI 1.4a Frame Packing (compare p.8 [9])

Before the introduction of HDMI 1.4a there were two methods to bring stereo data onto displays, both described above:

• time interleaving the frames for the right and the left eye, which doubles the refresh rate compared to that of 2D content

• interleaving the lines of both stereo fields into one frame, which halves the vertical resolution

HDMI 1.4a defines three formats for packing both stereo fields into one frame. Only the video sink must support HDMI 1.4a. By using custom modelines, HDMI 1.4a conforming video formats can be generated on graphics cards that support only HDMI 1.3.

Several further formats have been proposed for future addition to the standard. Currently the following modes are defined and mandatory for an HDMI 1.4a capable sink:

• Frame Packing

• Side-by-Side (Half)

• Top-and-Bottom

All three modes transmit the fields for the left and the right eye in one stereo frame. The modes Side-by-Side (Half) and Top-and-Bottom halve the resolution in horizontal or vertical direction, respectively, and place both fields next to each other. Both methods differ from the interleaving described above only in the arrangement of rows or columns. The format is suitable for both stereo display technologies. For the polarization technique, the display hardware can rearrange the pixels into an interleaved frame. For the active shutter technique, it can display both halves of the frame alternately, scaled to full screen.

The Frame Packing mode is of particular interest. It stacks the frames for the left and the right eye vertically into one "superframe". The superframe is twice as high as the 2D frame plus an additional margin between both stereo fields. In this mode the pixelclock is doubled compared to 2D video.
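The Frame Packing timing can be derived from a 2D timing as in the following sketch. It assumes, as for the progressive HDMI 1.4a formats, that the margin between the two fields (the "active space") equals the vertical blanking of the 2D format. The 720p60 example values (1650 x 750 total pixels, 74.25MHz) are the common CEA-861 timing and serve only as illustration. The names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VideoTiming:
    h_active: int
    h_total: int
    v_active: int
    v_total: int
    pixel_clock_hz: float

    @property
    def refresh_hz(self) -> float:
        return self.pixel_clock_hz / (self.h_total * self.v_total)


def frame_packing(t2d: VideoTiming) -> VideoTiming:
    """Stack the left and right field vertically. The gap between them is assumed
    to equal the 2D vertical blanking (progressive HDMI 1.4a formats)."""
    active_space = t2d.v_total - t2d.v_active
    return VideoTiming(
        h_active=t2d.h_active,
        h_total=t2d.h_total,
        v_active=2 * t2d.v_active + active_space,
        v_total=2 * t2d.v_total,
        pixel_clock_hz=2 * t2d.pixel_clock_hz,  # doubled pixel clock, same refresh rate
    )
```

For 720p60 this yields a 1280 x 1470 superframe, 1500 total lines and a pixelclock of 148.5MHz, while the refresh rate stays at 60Hz.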

On a video display that uses the shutter technique, both stereo fields of the superframe are shown alternately. Thus the refresh rate at the video sink is twice the refresh rate of the graphics device.

This mode offers an alternative for synchronization. The graphics cards would measure a frame rate equal to the refresh rate for one eye, and the synchronization would be based on this frequency. The displays automatically double the refresh rate and flip between the frames for the right and the left eye. It turned out that even a mix of nodes can be synchronized, with some using HDMI 1.4a frame packing at a refresh rate of e.g. 60Hz and others time-interleaved stereo frames at 120Hz. The visible results on the display are the same, but the pixel transport and the VBLANK rate at the display nodes differ. At a refresh rate of 60Hz the resolution is limited to 720p.


7 Other Display Wall Solutions and Projects

Display walls are a solution when large display areas are necessary and beamers or special extra-large displays cannot be used. Therefore a number of different solutions for composing display walls exist. The following section describes a selection of display wall technologies and projects and relates them to our project.

7.1 Hardware Genlock

Solutions for building a display wall with synchronized displays are offered by several hardware manufacturers. One example from NVidia is presented here. High-end graphics devices of the NVidia Quadro series can be connected to an additional NVidia Quadro G-Sync card. The G-Sync card adds a synchronization interface to the graphics cards. The Quadro graphics devices can then be put into slave mode and follow the timing received at the input ports of the G-Sync card. G-Sync cards of remote hosts can be connected via CAT5 patch cables to relay a framelock and optionally a genlock signal between the connected graphics cards. Although NVidia's solution uses the same cables as Ethernet, the two are incompatible with each other, because signals and voltage levels differ.

Each G-Sync card has two framelock ports that can serve either as input or as output port. Thus the framelock server can provide synchronization signals to at most two clients, and each client can relay the synchronization signal to another client. This solution therefore requires the nodes to be connected in a daisy chain.

In addition to the framelock, a genlock signal can be fed to the master by connecting a genlock source to the genlock connector on the G-Sync card. The master follows the genlock and thereby also synchronizes the clients to it. Beyond the framelock, the NVidia graphics driver provides a GLX extension for swap lock.

NVidia states that its synchronization solution is precise below scanline level. This means that the phase offset between all synchronized displays is not larger than the period of one horizontal line. At a display resolution of 1080p and a refresh rate of 120Hz, that is less than ±10µs.
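The scanline bound can be checked with a one-line calculation. The total of 1125 lines per 1080p frame, including vertical blanking, follows the CEA-861 timing and is an assumption of this sketch.

```python
def line_period_us(refresh_hz: float, v_total: int) -> float:
    """Duration of one horizontal line in microseconds: one frame period divided
    by the total number of lines, including the vertical blanking."""
    return 1e6 / (refresh_hz * v_total)


# 1080p at 120 Hz with an assumed 1125 total lines: about 7.4 microseconds per line
period = line_period_us(120.0, 1125)
```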

Although dedicated hardware synchronization solutions are very accurate, they have some shortcomings:

The specialized hardware is quite expensive, although each graphics device can drive two screens. Compared to a broadcast or hierarchical setup, the daisy chain is vulnerable if one of the slaves fails. All slaves behind the faulty slave are cut off and lose synchronization. In case of a permanent failure, cables need to be reconnected. Furthermore, a dedicated cable connection is needed purely for the synchronization.


7.2 SoftGenLock

The software SoftGenLock [1] was released in 2001. Its goal is to provide genlock synchronization for active stereo on analog CRTs with a precision of 5µs to 40µs. The approach uses standard consumer graphics cards and avoids hardware modifications of the graphics devices. The synchronization is controlled by a master, which signals sync events to a number of slaves. Master and slaves are connected in a star topology.

Active stereo is displayed by maintaining two buffers, one for the left and one for the right eye. With each VBLANK the source address of the displayed image is swapped, which switches between the buffers. The shutter glasses for active stereo are connected to the master.

Two ingredients are necessary for the synchronization:

• detection of VBLANKs at the master and the slaves

• modification of the refresh rate at the slaves

VBLANK detection can be done in two ways:

• attaching an interrupt handler to the interrupt of the graphics card

• polling of a VGA state register

The first is the more efficient method, as it minimizes the CPU load between the VBLANKs. However, it is implemented for NVidia graphics devices only.

The second approach supports a wider range of graphics devices. The busy-waiting time is minimized by estimating the time until the next VBLANK. Nothing is expected to happen within this interval, so the process puts itself to sleep and wakes up just before the next VBLANK.
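The sleep-then-poll strategy of the second approach can be sketched as follows. `in_vblank` is a hypothetical stand-in for reading the VGA status register. The guard interval of 0.5ms is an assumed value that must exceed the scheduler's wake-up latency. The clock and sleep functions are injectable, so the logic can be tested without hardware.

```python
import time
from typing import Callable


def wait_for_vblank(predicted_vblank: float,
                    in_vblank: Callable[[], bool],
                    now: Callable[[], float] = time.monotonic,
                    sleep: Callable[[float], None] = time.sleep,
                    guard_s: float = 0.0005) -> float:
    """Sleep until shortly before the predicted VBLANK, then busy-poll.
    Returns the time at which the VBLANK was observed."""
    remaining = predicted_vblank - guard_s - now()
    if remaining > 0:
        sleep(remaining)       # nothing is expected to happen before this point
    while not in_vblank():     # busy waiting, limited to roughly guard_s
        pass
    return now()
```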

Refresh rate modification can also be accomplished by two different methods:

• modifications of the pixelclock

• adjustments of the image geometry - adding or removing hidden columns or lines

In Softgenlock the modification of the pixelclock is bound to NVidia graphics cards. However

it can be implemented for graphics cards of other vendors too.

The latter approach, changing the frame geometry, is described in section 8. SoftGenLock, however, accesses the VGA registers for this purpose. These registers are specified by the VGA standard, which is rather old and therefore has a number of limitations, e.g. no support for high resolutions. Furthermore, access requires two steps. First, the index of the desired register is written into the index/data register¹. Afterwards, the data is read from or written to the same register. The authors of SoftGenLock report problems that arise from this non-atomic register access, because the graphics driver might access the index/data register at the same time. This can lead to an index being written as data, or data being written as an index. According to the authors, the results range from a corrupted display to a deadlock of the whole system. How often this happens depends on the hardware used. For this reason, the pixelclock modification is preferred.

¹ Address offset: 0x3C0

The genlocking mechanism works as follows [1, compare p.257]:

• On detection of a VBLANK the master sends a signal to the slaves.

• Each slave measures the arrival time of the signal. It estimates the time t_m of the VBLANK at the master by subtracting the estimated signal runtime.

• All slaves measure the time t_l of their local VBLANK.

• If the offset |t_m − t_l| between the VBLANK of master and slave is larger than the accepted tolerance, the slave alters its refresh rate accordingly.

• On detection of the VBLANK event, all nodes swap their buffers.
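A single correction decision of a slave, following the steps listed above, might look like this. Two details are assumptions of this sketch and are not taken from SoftGenLock: the offset wraps around to the nearest master VBLANK, and the result is returned as -1, 0 or +1. The names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SlaveConfig:
    signal_runtime_s: float  # assumed constant delay of the sync signal
    tolerance_s: float       # accepted |t_m - t_l|
    frame_period_s: float    # nominal frame period


def refresh_correction(cfg: SlaveConfig, arrival_s: float, local_vblank_s: float) -> int:
    """Return -1 (lower the refresh rate, slave ahead), +1 (raise it, slave behind)
    or 0 (within tolerance)."""
    t_m = arrival_s - cfg.signal_runtime_s  # estimated VBLANK time at the master
    half = cfg.frame_period_s / 2
    # wrap the offset into [-T/2, T/2) so that the nearest master VBLANK is used
    offset = (local_vblank_s - t_m + half) % cfg.frame_period_s - half
    if abs(offset) <= cfg.tolerance_s:
        return 0
    return -1 if offset < 0 else 1
```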

To calculate the exact time of the VBLANK at the master, the slaves must assume a constant delay between the VBLANK event at the master and the arrival of the signal. This has two crucial consequences:

a) A real-time operating system is required for deterministic scheduling.

b) Standard Ethernet is ruled out as signal path. Instead, separate cabling via the parallel port is used.

Master and slaves are connected via the parallel port. The authors report connecting up to 4 nodes to each of the 8 data pins of the parallel port. Thus 32 slaves plus the master, i.e. 33 nodes, can be synchronized without additional effort for amplifying the signals on the wire. According to the authors, the parallel port provides fast signaling with a constant delay of around 5µs on a real-time system.

One of the shortcomings is the need for a real-time system. Each VBLANK triggers a calculation, so overlooked VBLANK interrupts are severe, especially at the master. Eleven years after the release of SoftGenLock, display configuration via VGA registers is outdated and not supported by many modern graphics cards.

Both ways of adjusting the video timing work well on analog video sinks. A CRT does not directly receive a pixelclock signal; it is synchronized to the graphics device by the horizontal and vertical sync signals. Our experiments showed that digital video sinks behave differently and are more sensitive to changes of the video timing.

SoftGenLock is no longer developed, but it has inspired two derivatives: WinSGL for Windows and Genlock for Linux systems.


7.3 WinSGL

WinSGL is a software genlock solution for Windows systems. It was proposed in 2006 at the Eurographics Symposium.

The differences compared to SoftGenLock are:

• It does without a real-time operating system and runs on a standard Windows.

• A constant runtime delay cannot be guaranteed because the OS is not real-time. Therefore the clock master is dedicated hardware, such as a function generator or a microprocessor, which generates an external clock signal to which all slaves synchronize.

• The target video sinks are beamers, the target refresh rate is 60Hz, and active stereo is not targeted.

• It relies on third-party software for the frequency adjustments: PowerStrip² from EnTech Taiwan.

The targeted video sinks are digital beamers connected to the VGA port. However, the authors state that the results were also tested on, and are valid for, devices connected via DVI.

The authors report that in their experiments, adjustments of the pixelclock caused jitter and distortion of the whole image. Therefore WinSGL does not adjust the pixelclock. Furthermore, they describe the digital video sinks they used as reacting very sensitively to changes of invisible pixels. They observed shifts of the image in all cases other than increasing or decreasing the vertical front porch. As a result, WinSGL restricts itself to manipulating the vertical front porch to achieve smooth frequency adjustments. Although the primary devices were DLP projectors, they report the same results for two DELL LCDs.

VBLANKs are detected by a call to the Windows DirectDraw API. Detection, however, is not guaranteed: sometimes a VBLANK is "overlooked" due to scheduling latencies. WinSGL therefore uses timestamping to cope with missed VBLANKs. In this case it skips one comparison and continues with the next two timestamps.
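Skipping comparisons across an overlooked VBLANK can be sketched by checking the spacing of consecutive timestamps. The source does not give the exact criterion WinSGL uses. The jitter threshold here is an assumed parameter, and the function name is hypothetical.

```python
def usable_intervals(timestamps: list, period_s: float, max_jitter_s: float) -> list:
    """Intervals between consecutive VBLANK timestamps that span exactly one frame.

    An interval of roughly two or more frame periods means that a VBLANK was
    overlooked. That comparison is skipped and the next pair is used instead.
    """
    usable = []
    for earlier, later in zip(timestamps, timestamps[1:]):
        interval = later - earlier
        if abs(interval - period_s) <= max_jitter_s:
            usable.append(interval)
        # otherwise (typically a missed VBLANK in between) skip this pair
    return usable
```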

The steps of synchronization are summarized below:

1. Initialization starts by finding the two modelines nearest to the genlock frequency, one above and one below.

2. Next, the phase offset is eliminated by decreasing or increasing the refresh rate until |t_m − t_l| ≤ tolerance.

3. As soon as the phase offset is near zero, the slaves try to stay in sync. For each sync signal received from the genlock master, the slaves compare the arrival timestamp with their local VBLANKs. If the deviation exceeds the tolerance range, the modeline is switched.
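The third step can be simulated to show how switching between two modelines keeps the phase offset bounded. The refresh rates and the tolerance are hypothetical values. The rates (59.96Hz and 60.035Hz) are chosen in line with the 0.07Hz to 0.08Hz granularity mentioned below, and the tolerance is 30µs.

```python
def simulate_two_modelines(f_master: float, f_low: float, f_high: float,
                           tolerance_s: float, initial_offset_s: float, frames: int) -> list:
    """Phase offset (local VBLANK minus master VBLANK, positive: slave behind)
    after each master frame, for a slave that toggles between two modelines."""
    offset = initial_offset_s
    rate = f_high if offset > 0 else f_low
    history = []
    for _ in range(frames):
        if offset > tolerance_s:
            rate = f_high   # behind: use the faster modeline
        elif offset < -tolerance_s:
            rate = f_low    # ahead: use the slower modeline
        # the slave's frame is shorter or longer than the master's by this amount
        offset += 1.0 / rate - 1.0 / f_master
        history.append(offset)
    return history
```

Once the offset has been pulled into the tolerance band, it stays within the tolerance plus one per-frame step, which is about 11µs here.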

The tested resolution was 1024x768 at a rate of 60Hz. The granularity between two possible refresh rates is therefore approximately 0.07Hz to 0.08Hz. To stay in sync, the slaves therefore need to switch very frequently between the two modelines.

² http://entechtaiwan.com/util/ps.shtm

A synchronization precision of ±30µs is reported. The results were also compared to the

performance of Softgenlock, which was stated to achieve a higher synchronization precision

of up to ±7µs.
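The granularity stated for WinSGL corresponds to adding or removing one line of the vertical front porch. The following calculation checks this. It assumes the VESA DMT timing for 1024x768 at 60Hz (65MHz pixel clock, 1344 x 806 total pixels).

```python
def refresh_rate(pixel_clock_hz: float, h_total: int, v_total: int) -> float:
    """Refresh rate for a given total geometry (visible plus hidden pixels)."""
    return pixel_clock_hz / (h_total * v_total)


# assumed VESA DMT timing for 1024x768 at 60 Hz: 65 MHz, 1344 x 806 total pixels
nominal = refresh_rate(65e6, 1344, 806)        # about 60.004 Hz
one_more_line = refresh_rate(65e6, 1344, 807)  # one extra line in the vertical front porch
step = nominal - one_more_line                 # about 0.074 Hz
```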

The effects of refresh rate variations described there are almost contrary to our results explained in section 8. This indicates a strong dependence on the display hardware used.

8 Refresh Rate Adaptation

One of the requirements for a display synchronization is a method to adapt the refresh rate

of the display. As already mentioned on page 12, according to equation 1 two approaches are

possible:

• changing the number of pixels transferred to the display

• variations of the pixelclock

The following part describes both approaches and their results on common graphics cards. Tests were made primarily on Intel graphics devices of the fourth³ and fifth⁴ generation, and also on a recent NVidia graphics card.

8.1 Display Timing on Common Graphics Devices

Resolution Variations

As depicted in figure 2 on page 11, the screen resolution consists of an active part – containing

the visible pixels – and the blanking part, that is not visible on the screen. This is illustrated

by an example.

The VESA coordinated video timings (CVT) formula yields the following modeline for a resolution of 720p at 120Hz. Front porch, sync and back porch together form the horizontal and the vertical blanking:

pixelclock   h. active   h. front   HSYNC   h. back   v. active   v. front   VSYNC   v. back
162.00MHz    1280        96         136     232       720         3          5       47

Table 1: Modeline for 720p at 120Hz

The non-visible pixel margin is large enough to allow changing the number of columns in the horizontal blanking period as well as the number of rows in the vertical blanking period. Changes of hidden pixels do not affect the visible display resolution. Thus, apart from possible switching artifacts, no persistent distortion of the image is produced.

Graphics card registers determine the display resolution. If the addresses of these registers are known, changes can be made without a reset of the display, although disturbances might still occur. The corresponding display resolution registers for Intel graphics cards are summarized in the appendix on page 78. The standard resolution setting mechanism provided by the driver always resets the display and is therefore not usable.

The resolution changes are limited to the addition or removal of whole rows or columns. Combining horizontal and vertical changes increases the granularity.

³ i965GM (Crestline)
⁴ integrated graphics in Ironlake processors

Figure 12: Refresh rate [Hz] over the total number of pixels (about 1.346·10^6 to 1.348·10^6); plotted combinations:

refresh rate   htot   vtot
120.119Hz      1746   771
120.101Hz      1744   772
120.083Hz      1742   773
120.066Hz      1740   774
120.049Hz      1738   775
120.032Hz      1736   776
120.016Hz      1734   777
120.000Hz      1732   778
119.985Hz      1730   779
119.970Hz      1728   780

The refresh rate difference when the number of lines is changed is

$$ \Delta v_r = \frac{f_p}{h_{tot}} \cdot \frac{v_{tot2} - v_{tot1}}{v_{tot1} \cdot v_{tot2}} \qquad (10) $$

Analogously, the difference when the number of columns is changed is

$$ \Delta v_r = \frac{f_p}{v_{tot}} \cdot \frac{h_{tot2} - h_{tot1}}{h_{tot1} \cdot h_{tot2}} \qquad (11) $$

Combining both directions leads to

$$ \Delta v_r = f_p \cdot \left( \frac{1}{h_{tot1} \cdot v_{tot1}} - \frac{1}{h_{tot2} \cdot v_{tot2}} \right) \qquad (12) $$

Δv_r is the change of the refresh rate and f_p is the frequency of the pixelclock. h_tot and v_tot are the total numbers of pixels in horizontal and vertical direction, respectively, including the active portion and the hidden pixels.

Taking the modeline in table 1 as an example, removing one line increases the refresh rate by 0.16Hz. Removing one column changes the refresh rate by approximately 0.07Hz. A combination of a horizontal increase and a vertical decrease can provide a granularity of roughly 0.015Hz to 0.02Hz. A lower pixelclock frequency requires a lower resolution to produce the same refresh rate. Thus fewer hidden lines are included in the frame, which further decreases the step size for the removal or addition of columns to roughly 0.01Hz.
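Equations 10 to 12 can be evaluated for the modeline in table 1 with a short sketch. The hypothetical helper `closest_geometry` uses arbitrary search ranges. It only shows how combining both directions reaches refresh rates closer to the target than single-line or single-column steps do.

```python
from itertools import product

F_P = 162.00e6                  # pixel clock from table 1
H_TOT = 1280 + 96 + 136 + 232   # 1744 columns
V_TOT = 720 + 3 + 5 + 47        # 775 lines


def refresh(htot: int, vtot: int, fp: float = F_P) -> float:
    """Refresh rate for a given total geometry (visible plus hidden pixels)."""
    return fp / (htot * vtot)


def delta_vr(htot1: int, vtot1: int, htot2: int, vtot2: int, fp: float = F_P) -> float:
    """Equation 12: refresh rate of geometry 1 minus refresh rate of geometry 2.
    With htot1 == htot2 it reduces to equation 10, with vtot1 == vtot2 to equation 11."""
    return fp * (1 / (htot1 * vtot1) - 1 / (htot2 * vtot2))


# refresh rate increase when removing one hidden line or one hidden column
remove_line = -delta_vr(H_TOT, V_TOT, H_TOT, V_TOT - 1)    # about 0.155 Hz
remove_column = -delta_vr(H_TOT, V_TOT, H_TOT - 1, V_TOT)  # about 0.069 Hz


def closest_geometry(target_hz: float, dh_range=range(-20, 21), dv_range=range(-5, 6)):
    """Hypothetical helper: combined change of htot and vtot closest to target_hz."""
    candidates = ((H_TOT + dh, V_TOT + dv) for dh, dv in product(dh_range, dv_range))
    return min(candidates, key=lambda g: abs(refresh(*g) - target_hz))
```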

However, the results showed this method of refresh rate variation to be unfeasible for synchronization in this project. The tested displays reacted to modifications of the resolution with a black screen for roughly one second before bringing the display content back. The refresh rate has changed afterwards, but the blackout period is not acceptable in a display wall. Given the granularity of the refresh rate adaptation and the estimated amount of jitter, we expect refresh rate changes every ten to a hundred milliseconds. This would result in a permanently black screen.

Pixelclock Variations

The pixelclock is produced by a frequency synthesizer on the graphics device. It is generated

by applying multipliers and divisors to a reference frequency that is provided by a quartz

oscillator.

The exact calculation formula depends on the graphics card vendor and also differs between hardware generations.

Equation 13 shows how the pixelclock is generated on Intel graphics cards of generations 4 to 6⁵ [12]:

$$ f_p = \frac{f_{ref} \cdot \left(5 \cdot (M_1 + 2) + (M_2 + 2)\right)}{(N + 2) \cdot (P_1 \cdot P_2)} \qquad (13) $$

f_p is the pixelclock and f_ref is a fixed frequency (96MHz in generation four Intel devices) derived from an oscillator. M_1, M_2, N and P_1, P_2 are integer parameters that can be adjusted within predetermined limits [12].
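Equation 13 and the VCO values in table 2 can be reproduced with the following sketch. The VCO frequency is taken as the numerator of equation 13 divided by (N + 2), which matches the f_VCO column of table 2. The parameter ranges in `candidate_vcos` are hypothetical and ignore the hardware validity limits documented in [12].

```python
F_REF_HZ = 96e6  # reference frequency of generation-four Intel devices, as stated in the text


def vco_hz(m1: int, m2: int, n: int, f_ref: float = F_REF_HZ) -> float:
    """VCO frequency: f_ref * (5*(M1+2) + (M2+2)) / (N+2)."""
    return f_ref * (5 * (m1 + 2) + (m2 + 2)) / (n + 2)


def pixel_clock_hz(m1: int, m2: int, n: int, p1: int, p2: int, f_ref: float = F_REF_HZ) -> float:
    """Equation 13: the VCO frequency divided by the post dividers P1 * P2."""
    return vco_hz(m1, m2, n, f_ref) / (p1 * p2)


def candidate_vcos(m1_range, m2_range, n_range) -> list:
    """Sorted distinct VCO frequencies in MHz for hypothetical parameter ranges
    (hardware validity limits are not modeled)."""
    return sorted({round(vco_hz(m1, m2, n) / 1e6, 2)
                   for m1 in m1_range for m2 in m2_range for n in n_range})
```

The differences between consecutive entries of `candidate_vcos(...)` vary from pair to pair. This reflects the non-uniform step size discussed below.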

The multiplier and divisor parameters are controlled by a number of registers, which are shown in more detail in the appendix on page 78. Writing into these registers controls the pixelclock. On Intel graphics devices the change is carried out at the next VBLANK.

Pixelclock variations mostly work without visible artifacts, apart from occasional flickering. Sometimes, however, the display switches off for a second, probably because the PLL in the display lost lock on the pixelclock frequency. This happens non-deterministically.

As equation 13 shows, the step size between two dot clocks, and thus between two refresh rates, is neither arbitrarily small nor uniform. Figure 13 illustrates this; the corresponding parameter sets are summarized in table 2.

⁵ Intel graphics device generations are summarized in the appendix on page 80.


Figure 13: Refresh rates around 120Hz on an Intel 965GM integrated graphics (plot "Dotclock on Intel Graphics", refresh rate [Hz] from 117 to 123)

refresh rate   Δf_V      M1   M2   N   P1   P2   f_VCO         valid
118.58 Hz      0.14 Hz   15   8    5   1    10   1302.86 MHz   y
119.04 Hz      0.22 Hz   18   7    6   1    10   1308.00 MHz   y
119.41 Hz      0.36 Hz   13   5    4   1    10   1312.00 MHz   y
119.83 Hz      0.42 Hz   15   9    5   1    10   1316.57 MHz   y
120.14 Hz      0.31 Hz   18   8    6   1    10   1320.00 MHz   y
120.38 Hz      0.24 Hz   21   7    7   1    10   1322.66 MHz   y
120.57 Hz      0.19 Hz   10   7    3   1    10   1324.80 MHz   n
120.86 Hz      0.29 Hz   13   6    4   1    10   1328.00 MHz   y
121.07 Hz      0.21 Hz   16   5    5   1    10   1330.28 MHz   y

Table 2: Parameters for different dot clocks on Intel graphics

These results show that the steps between two refresh rates are too large for smooth synchronization. One way to deal with this problem is to switch between two refresh rates at each VBLANK, so that the average refresh rate matches the target refresh rate. Unfortunately, it turned out that the LCDs are not able to follow such fast variations of the pixelclock. The frequent switching produces an artificial jitter on the pixelclock that presumably overwhelms the PLL in the displays. As a result, the displays switch the image off until synchronization with the pixelclock is regained. Once synchronization is lost, the display remains black until the pixelclock variations are suspended.
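The averaging idea can be made concrete as follows. Frame periods, not rates, add up. The fraction of frames at the lower rate is therefore chosen so that the mean period equals the target period. The error-diffusion scheduler and the example rates 119.83Hz and 120.14Hz from table 2 are illustrative choices. As described above, the tested LCDs could not follow this kind of switching.

```python
def low_rate_fraction(f_low: float, f_high: float, f_target: float) -> float:
    """Fraction of frames that must use f_low so that the mean frame period
    equals 1/f_target (periods, not rates, are averaged)."""
    return (1 / f_target - 1 / f_high) / (1 / f_low - 1 / f_high)


def schedule(f_low: float, f_high: float, f_target: float, frames: int) -> list:
    """Error-diffusion schedule: the refresh rate to use for each VBLANK."""
    p = low_rate_fraction(f_low, f_high, f_target)
    acc, out = 0.0, []
    for _ in range(frames):
        acc += p
        if acc >= 1.0:
            acc -= 1.0
            out.append(f_low)
        else:
            out.append(f_high)
    return out
```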

These results differ from those of the SoftGenLock project [1] described in section 7.2. Two important distinctions have to be kept in mind. First, we want a continuous frequency and phase lock instead of temporary adjustments to match the phase. Second, the mentioned project uses analog CRT monitors. As described in section 3, analog display data transmission has no pixelclock; the refresh rate is derived from the frequency of HSYNC and VSYNC. An increase or decrease of the pixelclock changes the rate of the sync signals, but the display does not see it directly as a changed clock.

The results of WinSGL [20] are also quite contrary to the results presented here. The WinSGL authors experienced stronger artifacts with pixelclock modifications and successfully manipulated the number of lines in the front porch. In our results, pixelclock adaptation was more stable and hidden pixel variations were useless. The effects of both methods most likely depend strongly on the hardware used and on the signal processing in the display devices. Moreover, newer generations of displays appear to react more sensitively to unexpected disturbances, presumably due to an increased amount of signal processing.

8.2 Software controlled VCXO

Standard PCs usually do not provide a VCXO (voltage-controlled crystal oscillator). Set-top boxes (STBs) for DVB contain a VCXO to enable genlock as described in section 5, but this VCXO is usually not accessible by custom software.

An STB with an Intel consumer electronics processor is available. The CE4100 processor is an SoC based on the Intel Atom and enhanced with special multimedia features. STBs with the CE4100 contain a software-controllable VCXO. The SDK for the CE4100 platform provides functions to set the VCXO voltage. The VCXO oscillates at a nominal frequency of 27MHz. The pixelclock is derived from these 27MHz and can thus be adapted by controlling the VCXO voltage.

The refresh rate adaptation has a very fine granularity which is determined by the sigma-delta

DAC that controls the VCXO voltage. It is applied immediately and does not produce any

artifacts.

The pull range of the quartz is specified as ±125 ppm [10]. In our experiments, the refresh rate could be changed within a range of approximately ±0.025 Hz.

One drawback is the very small tuning range of the refresh rate. This is not surprising, as the tolerance of a quartz in a DVB set-top box is specified as ±30 ppm. Consequently, when building a system with several sync nodes, special care must be taken that the tuning ranges of all nodes overlap.
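Whether the tuning ranges of a set of nodes overlap can be checked with a small calculation. The sketch below assumes that each node is characterized by its nominal (uncontrolled) refresh rate and a symmetric tuning range around it; the function name and the example values are hypothetical and not measurements from the prototype.

```python
def common_tuning_range(nodes):
    """Return the (low, high) refresh-rate interval that every node can reach,
    or None if the tuning ranges do not overlap.

    nodes: iterable of (nominal_hz, half_range_hz) tuples, assuming a
    symmetric tuning range nominal +/- half_range for each node.
    """
    lows = [nominal - half for nominal, half in nodes]
    highs = [nominal + half for nominal, half in nodes]
    low, high = max(lows), min(highs)
    return (low, high) if low <= high else None


# Hypothetical nodes, each with a +/-0.025 Hz tuning range.
print(common_tuning_range([(119.990, 0.025), (120.010, 0.025)]))
print(common_tuning_range([(119.970, 0.025), (120.030, 0.025)]))  # None: no overlap
```

All nodes, including the clock master, have to run at a common refresh rate, so the intersection of the ranges (not their union) is what matters.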

As the CE4100 STBs provide a very fine-grained refresh rate adaptation, they are used as display nodes for the prototype.

9 PHASE-LOCKED-LOOP

9 Phase-Locked-Loop

The core of nearly every system performing frequency and phase synchronization or frequency estimation is a PLL. While the basic design of PLLs is common to all of them, there are many implementation choices for the individual building blocks. PLLs can be purely analog, mixed-signal, digital or implemented in software. Designing a PLL for a specific application is non-trivial and a field of its own, and the performance of the PLL largely determines the performance of a communication system.

The next part describes the operating principles of PLLs. The focus lies on digital PLLs, as the Synchronization Architecture described in this thesis uses a digital PLL implemented in software, a software phase-locked-loop (SPLL). The structure and theory apply to analog PLLs as well.

A PLL is a negative feedback system. Its goal is to lock the phase of an internal oscillator to an external reference signal. As the frequency is the derivative of the phase,

$$\frac{d\varphi(t)}{dt} = \omega(t), \tag{14}$$

locked frequency is a consequence of locked phase.

Figure 14 depicts the block diagram of a PLL.

Figure 14: Phase-Locked-Loop

The building blocks of a PLL are:

Phase Detector (A) Estimates the error between the phase of the reference signal and the phase of the internal oscillator signal.

Inner-Loop Filter (B) Lowpass filter that reduces the jitter on the phase error.

(Digitally) Controlled Oscillator (C) Oscillator that oscillates at its inherent frequency $f_0$ but can be controlled by an external signal to adjust its frequency within the pull range $f_{out,min} \le f_{out} \le f_{out,max}$.

Phase Predictor (D) Integrator that predicts the next phase $\varphi_{lo}$ based on the current oscillator frequency and the past $\varphi_{lo}$.

Phase Detector

The phase detector calculates the phase error θ, the difference between the phase of the external reference signal and the phase of the internal oscillator signal as predicted by the phase predictor. In digital PLLs it is simply an adder that subtracts the two phases.

$$\theta(n) = \varphi_{ref}(n) - \varphi_{lo}(n) \tag{15}$$

Inner-Loop Filter

There are two sources of phase noise:

1. The external phase is disturbed by phase noise, which can have various sources but is introduced in particular by the channel.

2. The internal phase can be disturbed by noise produced by the oscillator.

In the application described in this thesis, phase refers to clock ticks, sampled as discrete timestamps. The term phase noise therefore refers to deviations of the period between timestamps that should be equally spaced. For this reason, the term jitter is used here as a synonym for phase noise.

Figure 15: Phase Noise
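To make this notion of jitter concrete, the following minimal sketch measures how much each period between consecutive timestamps deviates from the mean period. It assumes timestamps in seconds that ideally would be equally spaced; the function name is illustrative only.

```python
def period_jitter(timestamps):
    """Deviation of each period between consecutive timestamps from the mean period.

    timestamps: increasing event times in seconds (e.g. VBLANK or packet
    ingress times) that ideally would be equally spaced.
    """
    periods = [b - a for a, b in zip(timestamps, timestamps[1:])]
    mean_period = sum(periods) / len(periods)
    return [p - mean_period for p in periods]


# The third event arrives 0.1 s late, which shows up as +0.1 and -0.1.
print(period_jitter([0.0, 1.0, 2.1, 3.0]))
```

A single late timestamp affects two consecutive periods with opposite signs, which is one reason why the phase (the timestamp itself) rather than the individual period is fed into the PLL.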

The jitter is directly reflected in the phase error. To enable the PLL to follow frequency drifts while preventing the jitter from influencing the PLL's output frequency, the inner loop filter reduces the high frequency components in the phase error. In a digital phase-locked-loop (DPLL) the loop filter is usually an IIR lowpass. The choice of the inner loop filter greatly influences the behavior of the PLL and its reaction to dynamics in the inputs. It determines the order and the type of the PLL. The order of a PLL is the highest power of the denominator of the closed-loop transfer function. The order of the PLL is always one higher than the order of the loop filter.

Figure 16: Relation between VCO pulled frequency and gain K of the inner loop filter. (Plot: VCO frequency centered around f0, normalized to (fmax − fmin)/2, over the frequency correction factor, normalized to T/2, for K = 0.5, 1.0 and 2.0.)

$\bar{\theta}(n)$ is the filtered phase error, obtained by filtering the phase error with the inner loop filter with transfer function $H_{lf}(z)$. The inner loop filter is explained in more detail later.

$$\bar{\theta}(n) = H_{lf}\big(\theta(n)\big) \tag{16}$$

Controlled Oscillator

The filtered phase error is used to change the output frequency of an oscillator, e.g. a voltage controlled oscillator (VCO), numerically controlled oscillator (NCO) or digitally controlled oscillator (DCO). Without external control this device oscillates at its inherent frequency $f_{out} = f_0$. Adding the filtered phase error $\bar{\theta}(n)$ pulls it to a different frequency. This explains the close relation between the frequency characteristics of the loop filter and the frequency characteristics of the PLL.

The output frequency of the PLL is calculated as

$$f_{out}(n) = f_0 + K \cdot f_s \cdot \bar{\theta}(n) \tag{17}$$

The phase error was accumulated within $1/f_s$, the period between two clock samples. The factor $f_s$ in equation 17 accounts for this.

A VCO typically has a limited pull range. The output frequency of the oscillator as a function of the phase error is controlled by the oscillator gain K. The gain determines how much the oscillator frequency changes for a phase error $\theta \neq 0$, as depicted in figure 16. The choice of the gain factor, however, has no influence on the limits of the pull range. A PLL with a larger oscillator gain K reaches the positive or negative limit of $f_{out}$ at a smaller phase error than a PLL with a smaller gain.

A gain that is too small leaves the pull range of the oscillator unexploited.


A larger gain reduces the acquisition time, the time until the frequency is locked. The downside is a reduced stability against jitter and phase noise, as their impact on the PLL output frequency $f_{out}$ is higher. A smaller gain instead leads to a more stable frequency control and a smooth $f_{out}$. The choice of the gain therefore trades a fast settling time against stability, and it has to take the application requirements and the expected jitter into account. The gain factor plays a special role in type I PLLs, as described in section 9.2.
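Equation 17 combined with a limited pull range can be sketched as follows. The clamping to [f_min, f_max] models the pull range described above; the function name, parameter names and numbers are assumptions for illustration. The usage lines show that the same phase error drives a high-gain oscillator into its limit while a low-gain oscillator still has headroom.

```python
def dco_frequency(theta_f, f0, gain, fs, f_min, f_max):
    """Oscillator output frequency according to equation 17, limited to the pull range.

    theta_f: filtered phase error (same phase unit as the phase detector)
    f0:      inherent oscillator frequency [Hz]
    gain:    oscillator gain K
    fs:      sample rate of the phase detector [Hz]
    """
    f_out = f0 + gain * fs * theta_f
    # A real VCO/VCXO cannot leave its pull range, whatever the phase error is.
    return min(max(f_out, f_min), f_max)


# Same phase error, three different gains (hypothetical values).
for k in (1.0, 2.0, 4.0):
    print(k, dco_frequency(1e-4, 120.0, k, 120.0, 119.975, 120.025))
```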

Phase Predictor

When the PLL is locked, $f_{out}(n) = f_{ref}(n)$. The phase predictor estimates the next phase value from the current oscillator frequency in equation 17 by adding the number of clock ticks expected within one sample interval $T = 1/f_s$. The prediction is required to compensate the delay introduced by the digital loop filter. The transfer function is that of an integrator:

$$P(z) = \frac{T}{1 - z^{-1}} \tag{18}$$

The predicted phase is the output of the integrator:

$$P\big(f(n)\big) = \varphi_{lo}(n+1) = f_{out}(n) \cdot T + \varphi_{lo}(n) \tag{19}$$
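The four building blocks can be combined into a complete loop. The sketch below is a minimal type I software PLL following equations 15 to 19, not the SPLL code from the appendix. It assumes phases counted in oscillator cycles and uses a hypothetical one-pole lowpass y(n) = α·x(n) + (1 − α)·y(n − 1) as inner loop filter; class name, parameters and values are illustrative.

```python
class SoftwarePLL:
    """Minimal type I software PLL following equations 15 to 19.

    Phases are counted in oscillator cycles. The inner loop filter is a
    one-pole IIR lowpass y(n) = alpha * x(n) + (1 - alpha) * y(n - 1),
    an illustrative choice rather than the filter used in the prototype.
    """

    def __init__(self, f0, fs, gain, alpha):
        self.f0 = f0          # inherent oscillator frequency [Hz]
        self.fs = fs          # phase detector sample rate [Hz]
        self.T = 1.0 / fs     # sample interval [s]
        self.gain = gain      # oscillator gain K
        self.alpha = alpha    # loop filter coefficient
        self.phi_lo = 0.0     # predicted local phase phi_lo(n) [cycles]
        self.theta_f = 0.0    # filtered phase error
        self.f_out = f0       # current output frequency [Hz]

    def step(self, phi_ref):
        """Process one reference phase sample phi_ref(n) and return f_out(n)."""
        theta = phi_ref - self.phi_lo                                  # (15) phase detector
        self.theta_f = self.alpha * theta + (1.0 - self.alpha) * self.theta_f  # (16) loop filter
        self.f_out = self.f0 + self.gain * self.fs * self.theta_f      # (17) controlled oscillator
        self.phi_lo += self.f_out * self.T                             # (19) phase predictor
        return self.f_out


# Reference at 120 Hz sampled at 10 Hz; the local oscillator is 0.01 Hz too slow.
pll = SoftwarePLL(f0=119.99, fs=10.0, gain=1.0, alpha=0.1)
for n in range(2000):
    f = pll.step(120.0 * n / 10.0)
print(f, pll.theta_f)
```

Because this is a type I loop, the output frequency converges to 120 Hz while the filtered phase error settles at the non-zero value (fref − f0)/(K·fs) = 0.001 cycles instead of zero.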

9.1 Frequency Characteristics of PLLs

Figure 17: Frequency Ranges of PLLs

The frequency behavior of a PLL is characterized by four ranges:

lock range The frequency range in which an unlocked PLL can lock onto the reference frequency without skipping one or more periods.

pull-out range The range of frequency steps of the reference frequency that a locked PLL is able to follow.

pull-in range The frequency range in which an unlocked PLL is able to lock onto the reference frequency, possibly only after skipping one or more periods.

hold-in range The frequency range over which a locked PLL can follow a slow drift of the reference frequency.


9.2 Type I and Type II Phase-Locked-Loops

The type of a PLL is determined by the number of integrators in the open-loop transfer function. PLLs of type I and type II are of the greatest practical relevance.

The difference between type I and type II is that a type II PLL contains an integrator in the loop filter. The effect of this integrator is a steady-state phase error of zero. The steady-state phase error is the phase error that remains constant once the PLL is locked onto the reference frequency.

In type I PLLs the filtered phase error controls the DCO frequency directly. If the internal oscillator frequency $f_0$ differs from the reference frequency $f_{ref}$, a constant phase error is necessary to make both frequencies equal. This is undesirable for some applications.
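The size of this constant phase error follows from equation 17. In the locked state $f_{out} = f_{ref}$; assuming a loop filter with unit DC gain, the filtered phase error then settles at a constant value $\bar{\theta}_{ss}$:

$$f_{ref} = f_0 + K \cdot f_s \cdot \bar{\theta}_{ss} \quad\Longrightarrow\quad \bar{\theta}_{ss} = \frac{f_{ref} - f_0}{K \cdot f_s}$$

A larger gain K reduces the steady-state phase error but, as discussed for the controlled oscillator, makes the output frequency more sensitive to jitter.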

The loop filter of a type II PLL integrates the filtered phase error and thus maintains a non-zero output even when the phase error is zero. Therefore the steady-state phase error is zero. The drawback is a longer settling time and increased overshoot compared to the type I PLL.

Figure 18: Settling behavior of type-I (left) and type-II (right) PLL (frequency [Hz] over time [s], showing fout and fref).

Special care must be taken when designing digital filters that contain integrators. The transfer function of an integrator is

$$H(z) = \frac{1}{1 - z^{-1}} \tag{20}$$

and thus has a pole at $z = 1$, making the loop filter potentially unstable. Shifting the pole slightly into the unit circle, to $z = 1 - \varepsilon$ (e.g. $z = 0.98$), helps to maintain stability, but cannot eliminate the steady-state phase error completely. The modified transfer function is

$$H(z) = \frac{1}{1 - 0.98\,z^{-1}}. \tag{21}$$
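The effect of shifting the pole can be seen in a few lines of code. The leaky integrator of equation 21 has the finite DC gain 1/(1 − 0.98) = 50, so a non-zero input is needed to hold a non-zero output; this is why the steady-state phase error cannot be eliminated completely. The sketch is illustrative and not taken from the prototype.

```python
def leaky_integrator(samples, leak=0.98):
    """Filter y(k) = x(k) + leak * y(k-1), i.e. H(z) = 1 / (1 - leak * z^-1).

    leak = 1.0 gives the ideal integrator of equation 20 (pole at z = 1);
    leak < 1.0 moves the pole inside the unit circle as in equation 21.
    """
    y = 0.0
    out = []
    for x in samples:
        y = x + leak * y
        out.append(y)
    return out


# A constant input drives the leaky integrator towards 1 / (1 - 0.98) = 50
# instead of letting the output grow without bound.
print(leaky_integrator([1.0] * 1000)[-1])
```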

9.3 Inner Loop Filter

The loop filter largely determines the behavior of the PLL. Its purpose is to filter the phase error in order to suppress the phase noise. The design of loop filters is a comprehensive topic of its own and is not covered here. Filter design tools exist that create filters with the desired frequency characteristics based on the theory of filter design.

As a lowpass filter, the loop filter has a cutoff frequency above which frequencies are filtered out. In the context of PLLs, the term loop filter bandwidth f0 is used more often than the common −3 dB cutoff. f0 is defined as the frequency at which the asymptote of the falloff crosses the 0 dB line.

Figure 19: Bandwidth definition of loop filter

Loop Filter Design

The typical design process for digital filters follows these two steps:

1. Design of an analog filter whose transfer function fulfills the given requirements.

2. Bilinear transform from the s-plane to the z-plane to obtain a digital filter.
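The two steps can be illustrated for the simplest case, a first-order analog lowpass H(s) = ωc/(s + ωc). The sketch substitutes s = 2fs·(1 − z^-1)/(1 + z^-1) and prewarps the cutoff so that the digital filter has its −3 dB point exactly at fc; prewarping is a common refinement and an assumption here. Function names and values are hypothetical, and the coefficients follow the convention of equation 22.

```python
import cmath
import math


def first_order_lowpass(fc, fs):
    """Digital first-order lowpass from H(s) = wc / (s + wc) via the bilinear
    transform s = 2*fs * (1 - z^-1) / (1 + z^-1), with frequency prewarping.

    Returns (a, b) with H(z) = (a0 + a1 z^-1) / (1 + b1 z^-1) and b = [b1].
    """
    k = 2.0 * fs
    wc = k * math.tan(math.pi * fc / fs)   # prewarped analog cutoff [rad/s]
    a0 = wc / (k + wc)
    b1 = (wc - k) / (k + wc)
    return [a0, a0], [b1]


def magnitude(a, b, f, fs):
    """|H(e^jw)| of the digital filter at frequency f [Hz]."""
    zinv = cmath.exp(-2j * math.pi * f / fs)
    num = sum(ai * zinv**i for i, ai in enumerate(a))
    den = 1 + sum(bi * zinv**(i + 1) for i, bi in enumerate(b))
    return abs(num / den)


a, b = first_order_lowpass(fc=0.01, fs=120.0)
print(a, b)
print(magnitude(a, b, 0.0, 120.0), magnitude(a, b, 0.01, 120.0))  # 1 and 1/sqrt(2)
```

For higher orders the algebra grows quickly, which is why design tools are normally used to obtain the coefficients.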

Loop Filter Example: Butterworth Filter

Butterworth filters are IIR filters. They have a maximally flat magnitude response below the cutoff frequency and an asymptotic falloff of approximately −20 dB · n per decade, where n is the order of the filter. This makes the Butterworth filter a common choice for PLLs, although other filters provide a sharper falloff and Butterworth filters do not have a linear phase.

The transfer function of a digital IIR filter is

$$H(z) = \frac{a_0 + a_1 z^{-1} + a_2 z^{-2} + \dots + a_n z^{-n}}{1 + b_1 z^{-1} + b_2 z^{-2} + \dots + b_n z^{-n}}. \tag{22}$$


Figure 20: Bode plots of Butterworth filters with different orders (magnitude [dB] and phase [deg] over frequency [Hz]).

The difference equation describing the filter output therefore is

$$y(k) = a_0 x(k) + a_1 x(k-1) + \dots + a_n x(k-n) - b_1 y(k-1) - \dots - b_n y(k-n). \tag{23}$$

Equation 23 can easily be implemented as a digital filter using 2n shift registers. In an SPLL it requires 2n additions and 2n + 1 multiplications, and 2n past values must be stored. A code snippet implementing an SPLL is given in the appendix on page 77.
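A direct implementation of equation 23 keeps the most recent input and output values in two delay lines. The sketch below is a generic illustration, not the code from the appendix; the class name is hypothetical, and the coefficient lists follow the sign convention of equation 22 (the feedback coefficients b1 to bn are subtracted).

```python
from collections import deque


class IIRFilter:
    """Direct-form implementation of the difference equation 23."""

    def __init__(self, a, b):
        self.a = list(a)   # feedforward coefficients a0 .. an
        self.b = list(b)   # feedback coefficients b1 .. bn
        # x_hist holds x(k), x(k-1), ...; y_hist holds y(k-1), y(k-2), ...
        self.x_hist = deque([0.0] * len(self.a), maxlen=len(self.a))
        self.y_hist = deque([0.0] * len(self.b), maxlen=len(self.b))

    def step(self, x):
        """Feed one input sample x(k) and return the output y(k)."""
        self.x_hist.appendleft(x)                      # oldest input drops out
        y = sum(ai * xi for ai, xi in zip(self.a, self.x_hist))
        y -= sum(bi * yi for bi, yi in zip(self.b, self.y_hist))
        self.y_hist.appendleft(y)                      # oldest output drops out
        return y


# One-pole lowpass y(k) = 0.1 x(k) + 0.9 y(k-1), i.e. a0 = 0.1, b1 = -0.9.
lp = IIRFilter(a=[0.1], b=[-0.9])
print([lp.step(1.0) for _ in range(3)])  # step response rising towards 1.0
```

Each call performs n + 1 feedforward and n feedback multiplications, matching the operation count given above.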

Figure 20 compares the Bode plots for four Butterworth filters of different order. The cutoff

frequency is 0.02Hz.

9.3.1 Loop Filter Design Tools

Two tools help in the loop filter design process. The fdatool, part of the Matlab⁶ Signal Processing Toolbox, delivers the filter coefficients for a digital Butterworth filter, given the cutoff frequency, sampling rate and order of the filter.

To design a digital IIR filter, Matlab follows the digital filter design principle described above: first an analog filter is designed that fulfills the specified requirements. After some intermediate steps, Matlab uses the bilinear transform to create a digital filter that approximates the transfer function of the analog filter.⁷

⁶ www.mathworks.com

CppSim⁸, which is available for free, includes a dedicated PLL design tool. The PLL designer of CppSim calculates coefficients for analog loop filters. The user has to specify the loop filter bandwidth, PLL type, filter order and shape. To be used in an SPLL, the analog filter given by CppSim has to be transformed into a digital loop filter with the bilinear transform.

⁷ See http://www.mathworks.de/help/toolbox/signal/ref/butter.html

⁸ http://www.cppsim.com

9.4 Software PLL in the Display Synchronization

Synchronization over a channel that introduces jitter requires a PLL; other frequency estimation methods are highly unreliable and perform poorly. Estimating a constant frequency that is disturbed by normally distributed phase noise by averaging requires an infeasibly large window. A PLL, in contrast, is able to lock very precisely onto a frequency thanks to its negative feedback. To emphasize this, the result of a simulation is depicted below. It compares the performance of a 2nd order PLL with a cutoff frequency of $10^{-2}$ Hz to an exponentially weighted moving average⁹ (EWMA) filter with α = 0.2, α = 0.1 and α = 0.01. The standard deviation of the normally distributed jitter is $\sigma = 10^{-7}$ s, so about 68% of the jitter samples lie within ±100 ns. Such a small σ is far from the conditions found on a standard Ethernet or on a (non-realtime) operating system. It is even below the 500 ns of jitter allowed for the PCR of an MPEG TS. This small σ is only chosen here to illustrate the poor frequency estimation capabilities of an averaging method.

Figure 21: Comparison of a PLL and EWMA averaging (frequency [Hz] over time [s]; curves: EWMA with α = 0.2, α = 0.1 and α = 0.01, PLL fout, fref).

This result shows that accurate synchronization over a noisy and non-deterministic channel cannot do without a PLL. Even with the smallest coefficient, the EWMA frequency estimate makes larger jumps than the pull range of the VCXO in the CE4100 STB allows. Furthermore, as already stated, jitter in the range of 100 ns is completely unrealistic in this setting.
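The averaging side of this comparison can be reproduced in outline. The sketch assumes that the EWMA is applied to the instantaneous frequency 1/(t_k − t_(k−1)) computed from consecutive timestamps, which is one plausible reading of the averaging method; function names, seed and sample count are illustrative, and the sketch does not reproduce figure 21 exactly.

```python
import random


def ewma_frequency(timestamps, alpha):
    """Frequency estimate from timestamps using an exponentially weighted
    moving average of the instantaneous frequency 1 / period."""
    estimate = None
    for prev, cur in zip(timestamps, timestamps[1:]):
        inst = 1.0 / (cur - prev)
        estimate = inst if estimate is None else alpha * inst + (1.0 - alpha) * estimate
    return estimate


def jittered_timestamps(f_ref, count, sigma, seed=1):
    """Ideal ticks of f_ref disturbed by normally distributed jitter (std sigma)."""
    rng = random.Random(seed)
    return [n / f_ref + rng.gauss(0.0, sigma) for n in range(count)]


ts = jittered_timestamps(f_ref=120.0, count=5000, sigma=1e-7)
for alpha in (0.2, 0.1, 0.01):
    print(alpha, ewma_frequency(ts, alpha))
```

Every new timestamp directly moves the estimate, and there is no feedback that would pull the estimate back, so the residual fluctuation only shrinks slowly with smaller α.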

9.4.1 PLL Design Choices

A PLL achieves the best performance if it is designed specifically for a certain application. This requires a specification of the desired characteristics and an analysis of the prevailing conditions. Measurements have shown that under the conditions of the prototype, the expected jitter on both network packets and local VBLANKs is in the order of several tens of µs.

⁹ The EWMA is explained on page 68.

Matlab simulations have been made to evaluate different parameters and the effect of tweaking them. Results of these simulations are presented in the following. Successful simulations in Matlab do not guarantee that the results carry over directly to the application. However, they usually give a good indication, help to shorten the design process and can support the validation of a design. Circumstances that lead to different behavior in simulation and in application testing are:

• The VCXO has a limited pull range.

• Depending on the system load of the hardware, lags and late timestamps occur. Particularly problematic are several late timestamps that are delivered to the application all at once.

• The simulated phase noise is normally distributed and statistically independent, which need not hold for the real jitter.

• There is a delay of at least 2T between a change of the VCXO control voltage and its detection by the PLL.

• The PLL gain must be adapted to exploit the pull range of the VCXO and cannot be adjusted freely as in the simulation.

The next part describes the evaluation and choice of some PLL loop filter characteristics. The Matlab simulations are based on measurements and observations made on the target system. Fixing one parameter also affects the other parameters; e.g. a larger gain factor requires a better filtered phase error to preserve a stable VCXO output frequency. Therefore the parameters have not been tested in the order described here, but repeatedly in different combinations.

Loop Filter Order

The order of the loop filter determines the filtering performance of the inner loop filter. A higher order provides a sharper falloff, but at the cost of overshoot. In software, the additional computation effort of second or third order loop filters compared to first order loop filters is nearly negligible. Simulations with loop filters of order one to three show the differences in transient behavior. The simulations have been made with normally distributed jitter with zero mean and a standard deviation of $2.5 \cdot 10^{-4}$ s, so about 68% of the jitter samples lie within ±250 µs. The PLL gain is 5.0 and the cutoff frequency is $f_c = 10^{-2}$ Hz.
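The share of jitter samples within one standard deviation can be checked numerically. The snippet draws normally distributed jitter with the σ used in the simulation; the seed and the sample count are arbitrary choices for illustration.

```python
import random

sigma = 2.5e-4  # standard deviation of the simulated jitter [s]
rng = random.Random(42)
samples = [rng.gauss(0.0, sigma) for _ in range(100_000)]
within = sum(1 for j in samples if abs(j) <= sigma) / len(samples)
print(f"{within:.3f} of the jitter samples lie within +/-{sigma * 1e6:.0f} us")
```

For a normal distribution the expected share is about 0.683; the remaining third of the samples, and occasionally much larger outliers, still reach the phase detector and have to be handled by the loop filter.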


Figure 22: Simulation of filtering performance of different loop filter orders (phase offset [s] over time [s]; curves: θ and $\bar{\theta}$ filtered with 1st, 2nd and 3rd order loop filters).

Figure 22 shows the phase error θ and the filtered phase error $\bar{\theta}$ with loop filters of order 1, 2 and 3. As the order of the PLL is always one higher than the order of the loop filter, this is equivalent to PLLs of order 2, 3 and 4. Figure 23 shows the resulting output frequency $f_{out}$. The conditions for the simulation are the same as described above. The left image depicts the complete simulation including the transient phase; the right one is a close-up of $f_{out}$ in frequency lock.

Figure 23: Simulation of fout with different PLL orders (frequency [Hz] over time [s]; left: complete simulation, right: close-up from 300 s to 450 s; curves: fout with 1st, 2nd and 3rd order filter, fref).

The figures show that the filtering performance increases with the filter order. However, the better filtering comes at the cost of an increased settling time. The order of the loop filter should therefore be chosen according to the preconditions of the application. If higher jitter is expected, the better stability may make an increased settling time acceptable.


Cutoff Frequency

The cutoff frequency is the most important property of the loop filter, and a well chosen cutoff frequency is necessary to achieve reasonable filtering. If the cutoff frequency is too low, the PLL has a very long settling time, takes long to adapt to frequency changes and reacts with a delay to steps of the reference frequency. On the other hand, a reasonable amount of filtering is necessary to keep the PLL stable and the VCXO output frequency smooth.

Figure 24 shows the results for different cutoff frequencies. The left graph shows a first order loop filter, the right graph a second order loop filter.

Figure 24: Simulation of different cutoff frequencies (frequency [Hz] over time [s]; left: first order, right: second order loop filter; curves for fc = 10^-4 Hz, 10^-2 Hz and 10^-1 Hz, and fref).

The results show noticeable overshoot for the second order filter. A first order filter with a cutoff frequency of $10^{-2}$ Hz proves to be sufficient under the conditions of the prototype.

PLL Type

As framelock requires a zero phase offset, a type II PLL that eliminates all phase error automatically would be convenient. However, the limited pull range of the VCXO in the CE4100 STBs makes it difficult to find a stable, fast and reliable type II PLL. While the PLL tries to eliminate the phase offset and lock onto the reference frequency, the loop filter of a type II PLL integrates the phase error. During acquisition, the integrator in the loop filter builds up a value of $\bar{\theta}$ that is larger than required for $f_{out} = f_{ref}$. The PLL only starts to reduce $\bar{\theta}$ when $f_{out} = f_{ref}$. This reduction takes some time, which leads to the typical transient phase of type II PLLs shown in figure 18.

On the CE4100 STBs, however, the elimination of a phase offset may take more than 30 seconds. Within this time, $\bar{\theta}$ accumulates to such an extent that $f_{out} = f_{out,max}$ or $f_{out} = f_{out,min}$. The PLL is not able to reduce $\bar{\theta}$ within one period, so one period is skipped. This behavior repeats and the PLL becomes unstable. As a result, $f_{out}$ continuously oscillates between $f_{out,max}$ and $f_{out,min}$.


Figure 25: Settling behavior of a type II PLL with small filter gain (frequency [Hz] over time [s]; curves: fout, fref, foutmin, foutmax).

One possibility to prevent this is to make the control of $f_{out}$ very slow by choosing a very small gain factor K or a small filter gain. However, this leads to a very long acquisition time, as shown in figure 25.

For these two reasons, a type I PLL is used, which provides a fast settling time at the price of giving up automatic phase error elimination. Instead, a two-step solution for phase error elimination has been worked out:

1. acquisition of frequency lock

2. iterative reduction of the steady-state phase error towards zero.

The implementation details are explained in section 12.


10 NECESSITY OF SYNCHRONIZATION

10 Necessity of Synchronization

As mentioned before, active stereo on a composite display is not possible without framelock. Without synchronization the stereo separation would be lost, as the stereo glasses would let parts of both images pass to both eyes.

With unsynchronized refresh rates the phase offset is not constant, so the viewer experiences a continuously changing amount of ghosting. The maximum ghosting occurs when the switching of the stereo glasses is shifted by exactly half a period against the phase of the display; the viewer then sees the same as without stereo glasses. A phase shift of one full period, 1/120 Hz = 8.33 ms, swaps the left and right frames.

The effect of a missing swaplock is also severe, as the nodes would consume frames at different rates. This leads both to a discontinuous composite image and to frame skips.

Even the slightest difference in refresh rate is problematic, as the phase offset accumulates with each period. Figure 26 illustrates the impact of frequency offsets of the pixelclock.

Figure 26: Impact of frequency differences (expected phase offset between −360° and 360° over time [s] for pixelclock offsets of 100 kHz (0.1%), 10 kHz (0.01%) and −50 kHz (−0.05%)).

The manufacturing tolerance of a standard quartz is around ±30 ppm. Therefore, even if identical hardware is used and exactly the same modeline is chosen, frequency differences cannot be avoided. Moreover, equal frequencies alone would still leave the phase offset uncorrected.

Apart from the manufacturing tolerance, a quartz has a temperature-dependent offset that can be of the same order of magnitude. Finally, a quartz is subject to aging, which changes its frequency by around ±1 ppm/year.

The frequency offsets stated above may seem very small. The example calculation below shows their effect on the refresh rate. A resolution of 1080p at a refresh rate of 120 Hz is assumed, with an original pixelclock of 142.625 MHz.

A deviation of the quartz of 30 ppm leads to an offset pixelclock of

$$f'_p = f_p \left(1 + 30 \cdot 10^{-6}\right) = 142.629\,\text{MHz}. \tag{24}$$

The difference of the pixelclock is 4.28 kHz. Since the refresh rate scales with the pixelclock, the refresh rate differs by $120\,\text{Hz} \cdot 30 \cdot 10^{-6} = 0.0036$ Hz. This leads to a 180 degree phase offset, i.e. half a period, after $0.5 / 0.0036\,\text{Hz} \approx 139$ s, or about 2 minutes and 19 seconds.
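The example calculation can be expressed as a small function, which makes it easy to repeat for other pixelclocks, refresh rates and tolerances. It relies only on the fact that the refresh rate is pixelclock / (htotal · vtotal) and therefore shares the relative deviation of the pixelclock; the function name is hypothetical.

```python
def drift_from_ppm(pixelclock_hz, refresh_hz, ppm):
    """Effect of a relative quartz deviation on pixelclock and refresh rate.

    Returns (pixelclock offset [Hz], refresh rate offset [Hz],
    seconds until the accumulated phase offset reaches 180 degrees).
    """
    rel = ppm * 1e-6
    d_pixelclock = pixelclock_hz * rel
    d_refresh = refresh_hz * rel            # same relative deviation as the pixelclock
    t_half_period = 0.5 / d_refresh         # half a cycle of accumulated phase difference
    return d_pixelclock, d_refresh, t_half_period


print(drift_from_ppm(142.625e6, 120.0, 30))
```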

Therefore the only way to ensure proper framelock and swaplock is a continuous synchronization of all display nodes.


11 SYNCHRONIZATION ARCHITECTURE

11 Synchronization Architecture

The Synchronization Architecture consists of three types of participants:

• clock master display (CMD)

• slave display node (SDN)

• frame deadline predictor

The communication is based on small UDP packets, the display clock reference (DCR) messages. The format of the DCRs is shown in figure 28.

The CMD serves as the master clock for all other participants in the synchronization architecture. Only one clock master is allowed.

The CMD measures the refresh rate of its graphics card and maintains a framecounter. When it detects a VBLANK, it sends a DCR packet to all receivers. It can be configured to subsample, i.e. to send a packet only on every n-th VBLANK. The hardware used for the slave display nodes has limited computing power, and each received DCR packet triggers a frequency estimation and adaptation loop, so subsampling reduces the system load. As the sender includes the framenumber in the DCR packets, the slave nodes can detect the subsampling ratio automatically, and no initial configuration is necessary.

The DCR packets also contain a PCR field. The PCR is generated at the master and delivers the information required for synchronized video playback at the SDNs. The video sources can include this PCR in the generated video streams, where it provides a timebase for the decoders at the display nodes.

The SDNs listen for incoming DCR packets and use them to generate timing information. Independently of DCR message reception, they watch their own VBLANK events. The slave nodes contain an SPLL that is triggered by the reception of a DCR message. The SPLL compares the local refresh rate with the refresh rate of the clock master and controls the local refresh rate according to the result.

Additionally, all slave nodes measure the RTT to the master. The purpose is to estimate the delay of the DCR packets in order to deal with the phase offset.
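A minimal sketch of how an RTT estimate could enter a phase offset calculation on a slave node. It assumes a symmetric network path, so that the one-way delay of a DCR packet is half the RTT, and it ignores processing delays on both sides; the function and its parameters are hypothetical and not taken from the implementation.

```python
def estimated_phase_offset(local_vblank, dcr_arrival, rtt, period):
    """Phase offset of a local VBLANK relative to the master's VBLANK.

    All times are in seconds on the local clock. The master's VBLANK is
    assumed to have happened one one-way delay (rtt / 2) before the DCR
    arrived, i.e. the network path is assumed to be symmetric. The result
    is wrapped into [-period/2, period/2).
    """
    master_vblank = dcr_arrival - rtt / 2.0
    diff = local_vblank - master_vblank
    return (diff + period / 2.0) % period - period / 2.0


# DCR arrives at 1.0102 s with an RTT of 0.4 ms; the local VBLANK occurs at 1.0120 s.
print(estimated_phase_offset(1.0120, 1.0102, 0.0004, 1 / 120))  # about +2 ms
```

Wrapping the result into half a period on either side reflects that only the position within the refresh period matters for framelock, not the absolute frame.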

The frame deadline predictor runs on the video source(s). It listens to DCR packets from the CMD and, just like the SDNs, measures the RTT to the CMD. From the master's refresh rate, the estimated RTT and the framenumber of the master, it can predict at what time the video source has to send a frame to the display. As the Display Wall can receive video streams from multiple sources, the frame deadline predictor can relay the DCR packets received from the CMD to multiple sources. One instance of the frame deadline predictor runs on each video source. One of the frame deadline predictors receives the DCR packets from the CMD and broadcasts or multicasts the DCR messages to the other frame deadline predictors. Thus the master only needs to send DCR packets to one video source via unicast. To let the other frame deadline predictors estimate the RTT to the master, the relaying frame deadline predictor inserts the RTT it measured to the CMD into the DCR packets.
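The prediction described above can be sketched with a few lines of arithmetic. The sketch assumes a symmetric network path (one-way delay = RTT/2) and a fixed, application-specific time budget for transport, decoding and display pipeline; both assumptions, as well as the function names, are hypothetical and not taken from the implementation.

```python
def predict_vblank_time(dcr_arrival, rtt, dcr_frame, target_frame, refresh_hz):
    """Local time at which the master is expected to show target_frame."""
    master_vblank = dcr_arrival - rtt / 2.0          # symmetric path assumed
    return master_vblank + (target_frame - dcr_frame) / refresh_hz


def send_deadline(dcr_arrival, rtt, dcr_frame, target_frame, refresh_hz, budget):
    """Latest local send time, leaving `budget` seconds for transport,
    decoding and the display pipeline (hypothetical parameter)."""
    return predict_vblank_time(dcr_arrival, rtt, dcr_frame, target_frame, refresh_hz) - budget


# DCR for frame 1200 arrived at t = 10.0 s with RTT 2 ms; frame 1212 is due
# 12 frames later at 120 Hz, and 5 ms are reserved for delivery.
print(send_deadline(10.0, 0.002, 1200, 1212, 120.0, 0.005))  # about 10.094 s
```

On the video sources that receive relayed DCR packets, the RTT inserted by the relaying predictor would have to be added to the locally measured RTT to the relay.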

Figure 27 gives an overview of the synchronization architecture, the communication paths and the transmitted information. The DCR messages are sent at the vertical refresh rate vr, optionally subsampled by the factors n and k. The subsampling of the packets for SDNs and for frame deadline predictors does not need to be equal. The following sections discuss the parts of the synchronization architecture in detail.

Figure 27: Synchronization Architecture



11.1 Synchronization Packet Format

Figure 28: Format of the DCR packets

The DCR packets carry a 16 byte payload. They provide the necessary information for

synchronization.

The field framenumber is filled with the framenumber of the CMD, which is increased on each VBLANK. It is a 32-bit integer; at a refresh rate of 120 Hz it therefore wraps around every 414 days and 6 hours ($2^{32} / 120\,\text{Hz} \approx 3.58 \cdot 10^{7}$ s). The framenumber fulfills two functions: first, it lets the clients detect packet losses or reordering; second, the clients can detect the subsampling ratio.
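Both functions of the framenumber can be sketched as follows. Taking the greatest common divisor of the framenumber differences is one possible way to detect the subsampling ratio, chosen here for illustration and not necessarily the method of the implementation: lost packets produce multiples of the ratio and do not change the result, but reordered packets are not handled, and the estimate is too large as long as every received difference is a multiple of twice the ratio. The modulo operation handles the 32-bit wrap-around; all names are hypothetical.

```python
from math import gcd

WRAP = 2**32  # framenumber is a 32-bit counter


def frame_delta(prev, cur):
    """Difference of two framenumbers, robust against the 32-bit wrap-around."""
    return (cur - prev) % WRAP


def subsampling_ratio(framenumbers):
    """Estimate the subsampling ratio from consecutively received framenumbers
    as the greatest common divisor of their differences."""
    ratio = 0
    for prev, cur in zip(framenumbers, framenumbers[1:]):
        ratio = gcd(ratio, frame_delta(prev, cur))
    return ratio


def lost_packets(prev, cur, ratio):
    """Number of DCR packets missing between two consecutive receptions."""
    return frame_delta(prev, cur) // ratio - 1


print(subsampling_ratio([10, 14, 18, 26, 30]))   # 4, despite the lost packet 22
print(lost_packets(18, 26, 4))                   # 1
print(subsampling_ratio([2**32 - 4, 0, 4]))      # 4 across the wrap-around
```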

The RTT field is used for relaying purposes at the video sources. The frame deadline predictor can pass the received DCR packets on to additional destinations. To let these nodes know the RTT to the CMD, the relaying node puts the estimated RTT into the DCR packets before broadcasting them.

A PCR is generated at the CMD and inserted by the video sources into the video streams to enable synchronized playback at the display nodes. The PCR is transmitted in the PCR field, divided into PCR base and extension as described in section 5.
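The division into base and extension follows MPEG-2 Systems (ISO/IEC 13818-1): the base counts 27 MHz ticks divided by 300 (i.e. a 90 kHz clock) in 33 bits, and the extension holds the remainder 0 to 299, so that PCR = base · 300 + extension. The sketch below shows only this split; how the values are packed into the DCR field is not shown, and the function name is illustrative.

```python
def split_pcr(ticks_27mhz):
    """Split a 27 MHz tick count into PCR base (90 kHz, 33 bit) and
    PCR extension (0..299), as defined in ISO/IEC 13818-1."""
    base = (ticks_27mhz // 300) % (1 << 33)
    ext = ticks_27mhz % 300
    return base, ext


print(split_pcr(27_000_000))  # one second: (90000, 0)
print(split_pcr(301))         # (1, 1)
```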

Reference Clock Independent Frequency Estimation

Apart from the information they carry, the DCR packets fulfill a special purpose: by design, the DCR messages do not contain any frequency information but serve as a sync ping. Sharing absolute time or frequency via the DCR packets would require the clocks to be synchronized; otherwise timestamps or time differences would not be reliable, as the referenced clocks might run at divergent rates.

Instead, the timestamps are generated at packet ingress. Thus each synchronized system relies only on its own clock, and all calculations are based on this local clock. This is similar to PTP syntonization as described in section 4.3. Absolute time is of no interest, but the clock rates (here, the refresh rates) must match.

12 IMPLEMENTATION DETAILS

12 Implementation Details

The following section describes the working principles and implementation details of the components of the synchronization architecture.

12.1 Clock Master Display

The master display estimates its refresh rate by watching the VBLANKs. There are a few possibilities to do this:

• Hooking an event handler to the interrupt of the graphics card. However, not every interrupt is a VBLANK interrupt, so for reliable detection the interrupt handler must look up the interrupt source in an interrupt status register. This requires knowledge of the register addresses of the graphics card. This is the most direct method, but it requires information on the graphics card used.

• Graphics card drivers usually implement the detection of VBLANKs. If the driver provides access to this functionality from outside the driver, the VBLANKs can be queried via this API. The direct rendering manager (DRM), for example, uses this method.

• The OpenGL Extension to the X Window System (GLX) includes a function call to wait for the next VBLANK. This method has considerably more overhead than the direct interrupt handler, since the VBLANK information is ultimately delivered by the driver's interrupt handler after a number of function calls. However, it requires no technical knowledge about the graphics card or driver and runs on most X servers without further effort.

The current implementation of the CMD uses GLX to detect VBLANKs, as this avoids a hardware-specific implementation. GLX provides a blocking call which returns on the VBLANK event. This function also returns the number of the current frame since the start of the X server. This frame number is transmitted in the DCR packets to the SDNs and frame deadline predictors.

No X server runs on the CE4100 STBs, so if the CMD runs on a CE4100 STB, a different method for VBLANK detection has to be used. A library in the CE4100 SDK provides a function for this purpose. It is also a blocking call, which returns on the next detected VBLANK. However, it does not deliver a frame count, so a virtual frame count must be maintained. Missed VBLANKs must be detected by timestamping and frequency estimation to keep the frame count reliable.

The functional principle of the CMD is depicted in figure 29. The following two steps are repeated continuously:

1. With a blocking call, the CMD pauses until the next VBLANK.

2. When the function returns, and if the frame count fc and the subsampling ratio rS indicate that a DCR shall be sent, the CMD assembles the packet: it fills the current frame count into the field framenumber, sets the field rtt_useconds to zero and adds the PCR value.

Figure 29: Flow diagram of the CMD
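The two steps above can be sketched as a loop. This is a minimal Python sketch, not the CMD implementation: the blocking VBLANK call (GLX or the CE4100 SDK function) is abstracted as an iterator `vblanks` that yields the frame count after each VBLANK, the send rule `fc % r_s == 0` is an assumed reading of the subsampling ratio, and `DcrPacket`, `make_pcr` and `send` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class DcrPacket:
    framenumber: int   # frame count fc at the VBLANK
    rtt_useconds: int  # zero at the CMD, the origin of the DCR packets
    pcr: int           # virtual PCR value


def cmd_loop(vblanks: Iterable[int], r_s: int,
             make_pcr: Callable[[int], int],
             send: Callable[[DcrPacket], None]) -> None:
    """Hypothetical CMD main loop: one iteration per VBLANK."""
    for fc in vblanks:          # step 1: blocks until the next VBLANK
        if fc % r_s != 0:       # step 2: assumed subsampling rule
            continue
        send(DcrPacket(framenumber=fc, rtt_useconds=0, pcr=make_pcr(fc)))
```

With `r_s = 2`, for example, a DCR packet is sent on every second VBLANK.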

The PCR is based on a 27 MHz clock, but a standard PC provides no access to a 27 MHz clock. Furthermore, this 27 MHz clock must be synchronized to the refresh rate in order to perform reverse genlock. Therefore virtual 27 MHz timestamps are created, and a virtual PCR is calculated from them.

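A virtual 27 MHz clock tick count vClk can be calculated for each frame n from the refresh rate vr by

$$\mathrm{vClk}[n] = \mathrm{vClk}[n-1] + \frac{27\,\mathrm{MHz}}{v_r[n]} \tag{25}$$

PCR base and extension are calculated according to equation 9; the virtual PCR inserted into the sync packet is

$$\mathrm{PCR}[n] = \left(\left\lfloor \frac{\mathrm{vClk}[n]}{300} \right\rfloor \ll 9\right) + \left(\mathrm{vClk}[n] \bmod 300\right) \tag{26}$$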
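Equations (25) and (26) translate into a few lines of code. The sketch below illustrates them under assumptions: the virtual clock is accumulated as a float so that fractional ticks are not lost when 27 MHz/vr is not an integer, the 33-bit wrap-around of the PCR base is taken from MPEG-2 Systems rather than from this text, and the function names are hypothetical.

```python
PCR_BASE_MASK = (1 << 33) - 1  # the PCR base is a 33-bit field in MPEG-2 TS


def advance_vclk(vclk_prev: float, refresh_rate_hz: float) -> float:
    """Equation (25): add one frame period worth of virtual 27 MHz ticks."""
    return vclk_prev + 27_000_000 / refresh_rate_hz


def pcr_from_vclk(vclk: float) -> int:
    """Equation (26): pack the PCR base and the 9-bit extension."""
    ticks = int(vclk)                      # whole virtual 27 MHz ticks
    base = (ticks // 300) & PCR_BASE_MASK  # 90 kHz part
    ext = ticks % 300                      # 27 MHz remainder, 0..299
    return (base << 9) + ext
```

After 120 frames at exactly 120 Hz, the virtual clock has advanced by 27 000 000 ticks, i.e. one virtual second.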

All nodes in the synchronization architecture are synchronized to the refresh rate of the CMD, and thus also to the virtual 27 MHz clock, even though an external observer would measure a rate different from 27 MHz.

The ICMP Echo Requests received from the SDNs and frame deadline predictors are answered automatically by the Linux kernel, as the sent packets conform to the ICMP protocol. Thus no additional implementation is needed at the CMD.

12.2 Slave Display Nodes

The main part of the synchronization is carried out by the slave display nodes (SDNs). A

block diagram is depicted in figure 30.

Three tasks are executed concurrently and independently:

1. VBLANK Detector (VD): detection of local VBLANKs and generation of VBLANK

timestamps (marked green in figure 30)

2. Synchronization Core (SC): reception of DCR packets and frequency adaptation (blue)

3. RTT Estimator: estimation of the RTT (yellow)

The first and second tasks are separated for the following reason. The reception of a DCR packet triggers one round of synchronization: the SPLL updates all values and adjusts the output frequency of the VCXO. To do this, the phase detector requires the current VBLANK timestamp. If the phase detector waited for the next VBLANK, which pauses process execution, the next DCR packet might arrive before that VBLANK and get lost. Therefore VBLANK handling is separated into an independent process which traces all VBLANKs. This ensures that each synchronization round can be executed immediately.
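The decoupling described above can be illustrated with a small shared buffer. The thesis uses separate processes; this Python sketch uses threads and a lock instead, and the class name `VblankHistory` is hypothetical. The VBLANK detector only appends timestamps, and the synchronization core only reads the most recent ones, so it never blocks waiting for a VBLANK.

```python
import threading
from collections import deque
from typing import List


class VblankHistory:
    """Recent VBLANK timestamps shared between the VD and the SC."""

    def __init__(self, depth: int = 4) -> None:
        self._lock = threading.Lock()
        self._ts: deque = deque(maxlen=depth)  # oldest entries drop out

    def record(self, t_vb: float) -> None:
        """Called by the VBLANK detector after each wait_for_vblank()."""
        with self._lock:
            self._ts.append(t_vb)

    def latest(self, k: int = 2) -> List[float]:
        """Called by the synchronization core; returns without waiting."""
        with self._lock:
            return list(self._ts)[-k:]
```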

Figure 30: Block diagram of the SDNs



12.2.1 The VBLANK Detector

The VD calls the blocking function wait_for_vblank(). This function returns as soon as a

VBLANK event happens and generates a timestamp tvb.

It is not guaranteed that all VBLANK events are caught. Therefore the VD keeps track of the refresh rate: it detects missed VBLANKs by comparing the time difference between the current and the previous VBLANK timestamp with the period of the refresh rate. A VBLANK is assumed to be lost if tvb(n) − tvb(n−1) > 1.5 · 1/vr.

On detection of a loss, the VD determines the number of missed VBLANKs. Failing to handle the loss of multiple VBLANKs can severely impact the synchronization. The reason is subsampling: if subsampling is enabled, only every n-th timestamp is compared. If k timestamps are not detected, with k not a multiple of n, the phase difference jumps.

An example is illustrated in figure 31 with a subsampling of two and one missed VBLANK.

Figure 31: Consequence of a lost VBLANK

In this example every second timestamp is compared. Before the slave misses the VBLANK, the phase error is small. Because the VBLANK is not detected, the frame counter is not incremented. This shifts the timestamps paired for comparison by 1/vr, and the phase error instantly increases by one period.

The number of missed VBLANKs is calculated by dividing the time difference between the current and the previous VBLANK timestamp by the estimated VBLANK interval. The frame counter is increased accordingly.
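A minimal sketch of this bookkeeping, assuming timestamps in seconds. Rounding the ratio of the measured interval to the estimated period to the nearest integer is an assumption; the text only states that the interval is divided by the period.

```python
def update_framecount(framecount: int, t_prev: float, t_now: float,
                      refresh_rate_hz: float) -> int:
    """Advance the frame counter, compensating for missed VBLANKs."""
    period = 1.0 / refresh_rate_hz
    dt = t_now - t_prev
    if dt > 1.5 * period:
        # At least one VBLANK was missed: count all elapsed periods so the
        # frame counter, and thus the subsampled comparison, stays aligned.
        return framecount + round(dt / period)
    return framecount + 1
```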

12.2.2 Synchronization Core

The SC controls the whole synchronization. It listens on a predefined port for incoming DCR packets; the time of arrival is stamped by the network protocol stack. Each arriving packet triggers one PLL synchronization round, which comprises the following steps:


1. calculation of phase error in positive and negative direction

2. selection of the direction with the smaller phase error

3. filtering of the phase error

4. adjustment of VCXO control voltage

5. estimation of CMD and VBLANK frequency

6. calculation of statistics

One goal was to make the PLL aware of the periodicity. For framelock it does not matter whether one or multiple periods are skipped, as long as the association of left and right frames is maintained.

The advantage of periodicity is that a phase error can be corrected either by increasing or by decreasing the output frequency of the VCXO. If fout is increased, the SDN catches up with the CMD; if fout is decreased, the SDN falls back.

Two different strategies of phase error elimination are possible:

• direction-aware: choose the direction of the smaller phase error.

• pull-range-aware: choose the direction in which the difference between the limit of the VCXO pull range and the reference frequency is larger.

The first strategy is easier to implement, as for the latter the PLL must know fref, foutmin and foutmax. This requires a longer initialization, during which estimates of these frequencies are calculated. Moreover, it requires thresholding to prevent the PLL from compensating a very small, jitter-induced phase error by shifting the phase of its internal oscillator by a whole period. In section 13 both strategies are briefly compared. For the prototype, the direction-aware strategy was implemented.

For the direction-aware strategy, the PLL calculates the phase difference from each DCR packet timestamp to the VBLANK timestamp recorded before the arrival of the DCR packet and to the following VBLANK. The smaller of the two determines the direction of phase reduction. If fref is not in the center of the pull range, phase error reduction in one direction takes longer than in the other.



Figure 32: Direction of phase correction

The calculations are triggered whenever a sync packet arrives. To enable the calculation of the phase offset in both the positive and the negative direction, the reference phase for offset computation is not tdcr(i) but tdcr(i−2). It is compared with tvb(i−1) and tvb(i−2), so the calculation is delayed by two synchronization rounds. This ensures that a timestamp tvb generated at the VBLANK after the reception of the current DCR packet is always available, which avoids jumps in the phase error.

The core of the SDNs is a software PLL. However, this SPLL has some special characteristics:

• A standard DPLL comprises a predictor with transfer function T/(1 − z⁻¹), which predicts the next phase value based on the current VCXO or NCO frequency. Here, however, no phase prediction is made; instead, the next phase is measured directly.

• Though it is possible to calculate a corrected phase (the time at which the VBLANK should have happened), it is not possible to correct the phase of the VBLANKs directly. Instead, the phase has to be shifted by increasing or decreasing the refresh rate until the phase offset is zero. This is especially important for a type I PLL, as it does not automatically eliminate the steady-state phase error.

• Usually a PLL compares clock ticks in a fixed interval, where a faster clock implies more clock ticks in this interval. Here the interval is not fixed; instead, the timestamps that mark the events are compared. Thus a larger value means a longer period, and therefore a slower clock.

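The SPLL in the SC works as described in section 9. The DCR timestamp tdcr(n−2) is compared with the VBLANK timestamps tvb(n−1) and tvb(n−2) to find the positive and the negative phase error:

$$\begin{aligned} \theta^{+}(n) &= t_{dcr}(n-2) - t_{vb}(n-2) \\ \theta^{-}(n) &= t_{dcr}(n-2) - t_{vb}(n-1) \end{aligned} \tag{27}$$

The candidate with the smaller magnitude is used as phase error:

$$\theta(n) = \begin{cases} \theta^{+}(n) & \text{if } |\theta^{+}(n)| \le |\theta^{-}(n)| \\ \theta^{-}(n) & \text{otherwise} \end{cases} \tag{28}$$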
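The direction selection can be sketched as follows, picking the candidate with the smaller magnitude. In this sketch, with hypothetical parameter names, `t_vb_before` is the VBLANK timestamp preceding the DCR reference and `t_vb_after` the one following it, all in seconds of the local clock; a positive result means the local VBLANK came before the DCR reference.

```python
def phase_error(t_dcr: float, t_vb_before: float, t_vb_after: float) -> float:
    """Signed phase error in the direction of the smaller correction."""
    theta_pos = t_dcr - t_vb_before  # >= 0: local VBLANK preceded the reference
    theta_neg = t_dcr - t_vb_after   # <= 0: local VBLANK followed the reference
    return theta_pos if abs(theta_pos) <= abs(theta_neg) else theta_neg
```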

To eliminate the influence of jitter on the SPLL's frequency output, the phase error is filtered with a Butterworth filter; a first-order loop filter has been implemented. The filter coefficients can be specified in the configuration or are set to default values.

The cutoff frequency of the filter can be chosen below 1 Hz. This is still enough to follow slow frequency drifts, but reliably filters out all high-frequency jitter. The structure of the filter is shown in figure 33.

Figure 33: Loop filter structure in Direct Form I

A choice of coefficients for a first-order loop filter with a cutoff frequency of 0.01 Hz is:

$$a_0 = a_1 = 0.267 \cdot 10^{-3}, \quad b_0 = 1.0, \quad b_1 = -0.9995 \tag{29}$$

These filter coefficients [7, p. 114] also provide good performance:

$$a_0 = 0.005, \quad a_1 = -0.0012, \quad b_0 = 1.0, \quad b_1 = -0.9962 \tag{30}$$

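The transfer function is

$$H_{LF}(z) = \frac{a_0 + a_1 z^{-1}}{1 + b_1 z^{-1}} \tag{31}$$

The corresponding difference equation computed by the loop filter, with θ̄ denoting the filtered phase error, is

$$\bar\theta(n) = a_0\,\theta(n) + a_1\,\theta(n-1) - b_1\,\bar\theta(n-1) \tag{32}$$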
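The difference equation maps directly onto a Direct Form I update that keeps one past input and one past output. The class below is a sketch with hypothetical names. It is shown with the coefficients of equation (30), whose DC gain (a0 + a1)/(1 + b1) is 0.0038/0.0038 = 1, so a constant phase error passes through unchanged in steady state.

```python
class FirstOrderLoopFilter:
    """First-order IIR loop filter in Direct Form I (b0 normalized to 1)."""

    def __init__(self, a0: float, a1: float, b1: float) -> None:
        self.a0, self.a1, self.b1 = a0, a1, b1
        self.x_prev = 0.0  # previous raw phase error theta(n-1)
        self.y_prev = 0.0  # previous filtered phase error theta_bar(n-1)

    def step(self, theta: float) -> float:
        y = self.a0 * theta + self.a1 * self.x_prev - self.b1 * self.y_prev
        self.x_prev, self.y_prev = theta, y
        return y


loop_filter = FirstOrderLoopFilter(a0=0.005, a1=-0.0012, b1=-0.9962)
```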



VCXO Gain

The output of the loop filter controls the frequency of the VCXO on the CE4100 STB. The phase error, and thus also the filtered phase error, lies in the range from a few ms down to tens of µs. Therefore the filtered phase error must be multiplied by a gain factor to bring it into a value range in which it can reasonably control the frequency.

A gain factor that is too small does not exploit the full VCXO range, which is small to begin with. For example, at a DCR rate of 120 Hz the phase error cannot become larger than T = 8.33 ms; therefore the gain must be at least RV/T to achieve full frequency control, where RV denotes the maximal numerical value accepted by the CE4100's VCXO control. However, a gain larger than this minimum is preferable, as it reduces the settling time and the steady-state phase error. (A larger steady-state phase error is, in any case, eliminated in a second step.)

A gain factor that is too large makes the PLL sensitive to phase noise and results in a less stable output frequency. One way to combine the advantages of both is a nonlinear gain factor: the PLL collects statistics about the difference between the reference and the VCXO frequency, and the gain factor is increased when the frequency difference is large, providing a shorter settling time. At a small frequency difference, in contrast, fout is smoother.
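The minimum gain and the nonlinear gain idea can be written down as follows. `r_v` stands for the VCXO control range RV, whose actual value is documented elsewhere. The two-level schedule and its switching threshold are purely illustrative assumptions, since the text does not specify how the gain grows with the frequency difference.

```python
def minimum_gain(r_v: float, dcr_rate_hz: float) -> float:
    """RV / T with T = 1 / DCR rate: a full-period error drives the full range."""
    return r_v * dcr_rate_hz


def scheduled_gain(k_small: float, k_large: float,
                   delta_f_abs: float, delta_f_switch: float) -> float:
    """Hypothetical two-level gain: large far from lock (short settling),
    small near lock (smoother fout)."""
    return k_large if delta_f_abs > delta_f_switch else k_small
```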

The risk of a non-reachable tuning range is avoided with type II PLLs, but at the price of a longer settling time and a much more oscillatory transient phase.

The SC uses two different gain factors: the gain factor controlling the VCXO, as described above, and a second gain factor for separate frequency estimation PLLs. The PLL responsible for the synchronization does not calculate any absolute frequency: the filtered phase error θ̄ is the input to the VCXO, which generates the frequency fout = f0 + K · θ̄. The PLL, however, does not know the absolute value of f0 and therefore cannot calculate fout.

Nevertheless, the SDN should know its own refresh rate and the refresh rate of the master, for three reasons:

• The initialization step, described below, needs an estimate of the master's frequency and of its own.

• The VBLANK Detector needs information about the refresh rate to detect missed VBLANKs.

• The elimination of the steady-state phase error depends on the detection of frequency lock.

Frequency estimation by averaging over a number of periods is not reliable enough, as shown in section 9.4. Estimation by an SPLL, in contrast, delivers an accuracy on the order of 10⁻⁴ Hz at very limited cost: an SPLL is easy to implement and requires only around ten arithmetic operations.

Therefore the SC comprises three SPLLs:


1. synchronization SPLL controlling the refresh rate

2. CMD refresh rate estimation SPLL

3. local VBLANK refresh rate estimation SPLL

Both frequency-estimating SPLLs work as described in section 9. Page 77 in the appendix shows a sample implementation of an SPLL.
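The appendix implementation is not reproduced here. As an illustration only, the following sketch estimates a frequency from event timestamps with a second-order (alpha-beta) tracking loop: it predicts the next timestamp from the current period estimate, and the prediction error corrects both the phase and the period. The gains `alpha` and `beta` are assumed values chosen for stability, not taken from the thesis.

```python
from typing import Optional


class FrequencySpll:
    """Timestamp-driven frequency estimator (alpha-beta tracking loop)."""

    def __init__(self, nominal_hz: float, alpha: float = 0.1,
                 beta: float = 0.005) -> None:
        self.period = 1.0 / nominal_hz      # current period estimate [s]
        self.alpha, self.beta = alpha, beta  # phase and period loop gains
        self.t_est: Optional[float] = None   # filtered time of the last event

    def update(self, t: float) -> float:
        """Feed one event timestamp (local clock); return the frequency estimate."""
        if self.t_est is None:
            self.t_est = t                     # first event only sets the phase
        else:
            t_pred = self.t_est + self.period  # predicted time of this event
            err = t - t_pred                   # phase detector output
            self.t_est = t_pred + self.alpha * err
            self.period += self.beta * err     # integrator: frequency correction
        return 1.0 / self.period
```

Fed with DCR arrival times, such an estimator tracks the CMD refresh rate; fed with local VBLANK timestamps, it tracks the local refresh rate.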

Detection of Frequency Lock

The SC is able to determine whether its PLL is locked to the reference frequency. For this purpose, it calculates first- and second-order statistics over the difference between fref and fout. In each PLL round, the difference is calculated as

$$\Delta f = f_{ref} - f_{out} \tag{33}$$

Δf is stored in a ring buffer that holds the last 600 frequency differences. Every five seconds, the mean of Δf is computed as the sum of the elements in the ring buffer divided by the number of elements N ≤ 600:

$$\mu = \frac{1}{N}\sum_{i=0}^{N-1} \Delta f_i \tag{34}$$

The variance is calculated from µ and the expectation of (Δf)²:

$$\sigma^2 = \frac{1}{N}\sum_{i=0}^{N-1} (\Delta f_i)^2 - \mu^2 \tag{35}$$

The frequency is accepted as locked if the magnitude of the mean and the variance are both below a threshold.
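A minimal lock detector along these lines, with a 600-entry ring buffer and the statistics of equations (34) and (35), could look as follows. The thresholds are hypothetical parameters (the text gives no values), the mean is compared by magnitude, and the five-second evaluation cadence is left to the caller.

```python
from collections import deque
from typing import Tuple


class LockDetector:
    def __init__(self, mean_threshold: float, var_threshold: float,
                 capacity: int = 600) -> None:
        self.buf: deque = deque(maxlen=capacity)  # ring buffer of delta f
        self.mean_threshold = mean_threshold
        self.var_threshold = var_threshold

    def push(self, f_ref: float, f_out: float) -> None:
        """Store delta f = f_ref - f_out (equation 33); called every PLL round."""
        self.buf.append(f_ref - f_out)

    def stats(self) -> Tuple[float, float]:
        """Mean (34) and variance (35) over the N <= capacity stored values."""
        n = len(self.buf)
        mu = sum(self.buf) / n
        var = sum(d * d for d in self.buf) / n - mu * mu
        return mu, var

    def is_locked(self) -> bool:
        if not self.buf:
            return False
        mu, var = self.stats()
        return abs(mu) < self.mean_threshold and var < self.var_threshold
```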

Elimination of Steady-State Phase Error

The PLL is of type I for the reasons explained in section 9.4. The drawback is a non-zero steady-state phase error. This phase error, however, is not acceptable, because it destroys the stereo vision. Increasing the gain factor is insufficient, because it can decrease but not remove the steady-state phase error; moreover, a larger gain results in a less stable frequency output. Therefore a different approach is used to remove the steady-state phase error.

The idea for achieving zero phase error together with frequency lock is the following. In a type I PLL, a steady-state phase error is required to compensate the frequency difference between fref and the free-running frequency f0. The equation for the PLL output frequency is therefore replaced by

$$f_{out}(n) = f_0 + K \cdot \left(\bar\theta(n) + \gamma(n)\right) \tag{36}$$

If γ(n) equals the expected steady-state phase error, the PLL locks with a phase error of zero.

However, the steady-state phase error depends on the difference between fref and f0. γ is found as follows:

1. Calculation of statistics to detect frequency lock. On frequency lock, the steady-state phase error is known.

2. Stepwise adjustment of γ:

$$\gamma \leftarrow \gamma + k \cdot \bar\theta(n) \tag{37}$$

where k ≤ 1. Higher values of k speed up the elimination of the steady-state phase error but tend to reduce stability. A value of k = 0.25 works reliably on the CE4100 STB.

3. A change of γ leads to a temporary increase or decrease of fout. Therefore the process waits until the phase error has been reduced and frequency lock has been regained, and then returns to step 1.

The whole process is executed iteratively.
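The convergence of the stepwise γ update can be checked with a deliberately simplified model. The sketch assumes that each iteration reaches frequency lock instantly, so that the filtered phase error equals its type I steady-state value; setting fout = fref in equation (36) and writing K for the loop gain gives that value as (fref − f0)/K − γ. Under this assumption the remaining error shrinks by the factor (1 − k) per iteration. The function and its parameters are hypothetical.

```python
def adapt_gamma(f_ref: float, f0: float, gain: float,
                k: float = 0.25, iterations: int = 40) -> float:
    """Toy model of the iterative gamma adaptation (equation 37)."""
    gamma = 0.0
    for _ in range(iterations):
        theta_ss = (f_ref - f0) / gain - gamma  # steady-state error at lock
        gamma += k * theta_ss                   # stepwise adjustment
    return gamma
```

With k = 0.25, the residual falls to 0.75⁴⁰ ≈ 10⁻⁵ of its initial value after 40 iterations; in the real system, each iteration additionally has to wait for frequency lock.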

γ is adapted continuously, which also allows the PLL to follow frequency drifts. Nevertheless, the iterative phase error elimination takes several minutes, depending on k. The transient phase until frequency lock and zero phase error are acquired can thus be divided into two stages:

• First, the PLL obtains frequency lock with a constant phase offset. The phase error is reduced down to the steady-state phase error as fast as the VCXO tuning range allows. With a sufficient VCXO gain, the phase error is already reduced to less than 1 ms in this stage.

• The second stage iteratively eliminates the remaining phase error and thereby increases the synchronization accuracy.

Figure 34 shows a simulation of the γ adaptation process. The left image shows the phase error θ, the filtered phase error θ̄ and γ; the right image shows the reference frequency and the VCXO output frequency. The simulation uses k = 0.15, and the inner loop filter is a second-order Butterworth lowpass.

The steady-state phase error approaches zero while γ is iteratively adjusted.

[Figure 34 plots. Left: phase offset [s] versus time [s] (0 to 500 s), curves θ, θ filtered and γ. Right: frequency [Hz] versus time [s] (0 to 500 s), curves fref and fout.]

Figure 34: Simulation of iterative steady-state phase error elimination

Two Step Initialization

The following describes a two-step initialization process which aims to establish good initial conditions for synchronization. The goal is to reduce the time the PLL requires to reach frequency lock with zero steady-state phase error. During this initialization, synchronization has not yet started.

Estimation of γ: If γ were known in advance, frequency lock with zero steady-state phase error would be reached much faster. A first initialization stage therefore calculates an initial value for γ. For this, fref and f0 are estimated by two separate SPLLs. From the difference between fref and f0 and the approximately known VCXO range, an estimate of γ can be calculated. The VCXO range is roughly ±0.025 Hz. Thus γ is calculated as

$$\gamma = \left(\frac{f_{ref} - f_0}{|R_{f_{out}}|} \cdot R_V\right) \cdot \frac{1}{K} \tag{38}$$

where Rfout is the tuning range of the VCXO, RV is the numerical input range¹⁰ of the VCXO control voltage and K is the PLL gain.
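Equation (38) as a function. Here `r_fout` is the tuning range in Hz and `r_v` the corresponding span of numerical control values, so the first factor is the control value needed to pull f0 to fref. The example values are hypothetical, since RV is only given in the separate software documentation.

```python
def initial_gamma(f_ref: float, f0: float, r_fout: float,
                  r_v: float, gain: float) -> float:
    """Equation (38): initial gamma from the estimated frequency offset."""
    control_value = (f_ref - f0) / abs(r_fout) * r_v  # VCXO control units
    return control_value / gain                       # equivalent phase offset
```

For example, with a 0.005 Hz offset, a hypothetical range of 0.05 Hz mapped onto 1000 control values and K = 10⁵, γ evaluates to 1 ms.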

When the PLL has acquired stable frequency lock, γ is set to the estimated value.

VBLANK Resets: Ideally, γ would be guessed perfectly and the initial phase offset would be (nearly) zero. The alignment of the VBLANK phase, however, is random and cannot be influenced directly. The only way to change the position of the VBLANKs is to force the graphics card to reset the display output, for example by switching between different refresh rates to prompt the graphics driver to reconfigure the display.

The display driver of the CE4100 provides a function which configures the complete display settings. Calling this function twice, once with an arbitrary refresh rate and a second time with the desired refresh rate, gives a chance that the phase of the VBLANKs has shifted afterwards. The implementation switches to 100 Hz and then back to the original 120 Hz.

¹⁰ For the exact value, see the separate appendix “Software Documentation”.

A phase shift, however, is random from the application's point of view and not guaranteed. To increase the chance of a shift, an arbitrary delay of 10 ms is included. The probability of a random phase offset of exactly zero is very small, so a threshold is defined within which a phase offset is accepted. The threshold value must be chosen with care:

• A threshold that is too large leaves the capabilities of the VBLANK resets unexploited. Nevertheless, the consequence of a too large threshold is less severe than that of a too small one.

• A threshold that is too small increases the risk that no phase shift within the threshold is found. The time spent trying VBLANK resets must therefore be limited. First, the process is not deterministic, and no synchronization is possible during this step. Second, at some point, depending on the initial phase offset and the position of fref within the tunable VCXO range, it becomes quicker to remove the phase offset by increasing or decreasing fout than to wait for a small random phase offset. Limiting the duration of the VBLANK reset stage carries the risk that the phase offset on termination is larger than phase offsets “diced” before.

An empirical value for the threshold is T/4.
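The reset stage can be sketched as a bounded retry loop. The callables `reset_display` (for example, switching to 100 Hz and back to 120 Hz with a 10 ms delay) and `measure_offset` are hypothetical stand-ins for the CE4100 driver call and the phase measurement, and the limit is expressed as a number of attempts rather than a time span. As described above, the loop cannot return to an earlier, better phase: it ends with whatever offset the last reset produced.

```python
from typing import Callable


def vblank_reset_search(reset_display: Callable[[], None],
                        measure_offset: Callable[[], float],
                        period: float, max_attempts: int) -> float:
    """Reset the display output until |offset| <= T/4 or attempts run out."""
    threshold = period / 4.0         # empirical threshold T/4
    offset = measure_offset()
    attempts = 0
    while abs(offset) > threshold and attempts < max_attempts:
        reset_display()              # VBLANK phase changes randomly
        offset = measure_offset()
        attempts += 1
    return offset                    # may still exceed the threshold
```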



12.2.3 RTT Estimator

For proper framelock and good stereo visibility, it is not enough to synchronize the frequency; the phase offset must also be reduced to zero. To estimate the phase at the CMD, the SDNs must know how long the DCR packet was delayed during transmission. The RTT estimator sends ICMP [8] packets to the CMD, following the commonly used ping method. As this is implemented on nearly every network-capable machine, no additional implementation is necessary at the CMD.

The destination address for the ICMP packets is known, as it is the source of the sync packets. The ICMP packets are sent with type 8 (ICMP Echo Request). The payload of these packets is a timestamp, which the RTT estimator generates directly before sending the packet. Ping packets are always returned with exactly the same payload; only the type field is changed to type 0 (Echo Reply) and the checksum is recalculated.

The RTT estimator creates a timestamp on reception of an ICMP type 0 packet. Some filtering is necessary to avoid confusion with ICMP packets from other sources: each ICMP type 8 packet carries a 16-bit identifier. The identifier can be chosen arbitrarily, but as it should be unique, the process ID of the sending process is a good choice.

The RTT estimator thus extracts the payload and matches the identifier against its own process ID. If it was the origin of the ICMP echo packet, it extracts the timestamp from the payload and subtracts it from the timestamp generated on packet ingress. The difference equals the RTT. This RTT includes the processing delay at the CMD. Nevertheless, this delay is small enough to be neglected: the reply packet is formed in kernel space and requires almost no computation except for the checksum, which is a simple calculation. Besides, the RTT is used to estimate the delay between the VBLANK at the sender and the arrival time of the sync packet. This transmission also includes some processing delay, which is corrected by a manual offset; the processing delay of creating the ICMP reply therefore compensates part of this offset.
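The echo request and reply handling can be illustrated without opening a socket. The sketch builds an ICMP Echo Request (type 8) whose identifier is the process ID truncated to 16 bits, carries the send timestamp as an 8-byte big-endian double (the payload encoding is an assumption), computes the standard Internet checksum, and extracts the RTT from a reply. Actually sending and receiving would need a raw socket, which typically requires root privileges; on Linux, a raw IPv4 ICMP socket also delivers the IP header in front of the ICMP message, whereas `rtt_from_reply` expects the ICMP message only.

```python
import os
import struct
from typing import Optional

ICMP_ECHO_REPLY = 0
ICMP_ECHO_REQUEST = 8


def internet_checksum(data: bytes) -> int:
    """16-bit one's complement checksum used by ICMP (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_echo_request(ident: int, seq: int, t_send: float) -> bytes:
    payload = struct.pack("!d", t_send)  # send timestamp
    header = struct.pack("!BBHHH", ICMP_ECHO_REQUEST, 0, 0, ident, seq)
    csum = internet_checksum(header + payload)
    return struct.pack("!BBHHH", ICMP_ECHO_REQUEST, 0, csum, ident, seq) + payload


def rtt_from_reply(icmp: bytes, ident: int, t_recv: float) -> Optional[float]:
    """Return the RTT if this is our Echo Reply, otherwise None."""
    icmp_type, _code, _csum, rid, _seq = struct.unpack("!BBHHH", icmp[:8])
    if icmp_type != ICMP_ECHO_REPLY or rid != ident:
        return None  # reply to another process's ping
    (t_send,) = struct.unpack("!d", icmp[8:16])
    return t_recv - t_send


IDENT = os.getpid() & 0xFFFF  # process ID as 16-bit identifier
```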

The interval between two RTT estimations is set in the configuration file. Experiments showed that values in the range of several seconds work well. However, the tests so far have been made under good conditions; a different environment might require another configuration. The interval has to be small enough for the estimated RTT to follow the dynamics of the transmission delay. Nevertheless, a too small interval increases the network load on the CMD and might, depending on the number of SDNs, increase the jitter of the sync packets.

RTT filtering: The estimated RTT is subtracted from the arrival time of the sync packets to obtain a close approximation of the original VBLANK time at the CMD. Consequently, jitter in the RTT has a linear influence on the phase error and thus also results in a less stable frequency output of the PLL.

To avoid degradation of the framelock caused by jitter in the RTT measurements, the RTT needs to be filtered. A good choice is an exponentially weighted moving average (EWMA). The EWMA is an IIR filter in which older values are multiplied with decreasing weights; its impulse response is a decaying exponential function. The EWMA is calculated according to

$$y(n) = \alpha \cdot x(n) + (1-\alpha) \cdot y(n-1) \tag{39}$$

For filtering the RTT two characteristics have to be traded off:

• fast adaptation to step-like changes in the RTT

• sufficient filtering of random jitter
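Equation (39) and this trade-off can be made concrete with a short sketch. Seeding the filter with the first measurement is an assumption (the text does not state the initial value). After a step in the RTT, the remaining fraction of the step decays as (1 − α)ⁿ, so the number of updates needed to get within a given fraction follows directly; multiplying it by the update interval gives the adaptation time.

```python
import math
from typing import Optional


class Ewma:
    """Exponentially weighted moving average, y(n) = a*x(n) + (1-a)*y(n-1)."""

    def __init__(self, alpha: float) -> None:
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha
        self.y: Optional[float] = None

    def update(self, x: float) -> float:
        if self.y is None:
            self.y = x  # assumption: start from the first measured RTT
        else:
            self.y = self.alpha * x + (1.0 - self.alpha) * self.y
        return self.y


def updates_to_settle(alpha: float, residual: float) -> int:
    """Updates after a step until at most `residual` of it remains (0 < alpha < 1)."""
    return math.ceil(math.log(residual) / math.log(1.0 - alpha))
```

For example, to get within 10% of a new RTT level, α = 0.1 needs 22 updates and α = 0.2 needs 11.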

Figure 35 shows the results for different values of α. The RTT has a mean of 0.2 ms between 0 and 35 s; afterwards it jumps to a mean of 0.5 ms. The standard deviation is 0.1 ms.

[Figure 35 plot: RTT [ms] versus time [s] (0 to 100 s); curves: measured RTT and filtered RTT for α = 0.1, α = 0.2 and α = 0.4.]


Figure 35: Comparison of EWMA filter results for different α

The figure shows that with α = 0.1 the EWMA takes more than 20 s to adapt to the new RTT. With α = 0.2 it takes only around 7 s, at the price of less smooth filtering.

Figure 36 compares the filtering performance for update intervals of 1s and 5s, both with

α = 0.25.

[Figure 36 plot: RTT [ms] versus time [s] (0 to 100 s); curves: measured and filtered RTT for update intervals of 1 s and 5 s.]


Figure 36: Filtered RTT with different update intervals

The EWMA filter coefficient α and the update interval need to be chosen carefully with respect to the expected amount of jitter.

12.3 Frame Deadline Predictor

To ensure steady, interruption-free video playback on master and slave displays, the video source needs to know at what time each frame has to be put onto the network. Artifacts produced by missing or late frames are highly undesirable. If the video source sends frames at a higher rate than the displays can consume, the end-to-end delay increases, which should be prevented as well. Especially when the video frames come from real-time renderers, it should be ensured that the rendering nodes know how fast the frames need to be generated.

When the video source and the displays are not on the same network, a highly dynamic RTT has to be dealt with, to ensure that an increasing RTT does not irreversibly empty the buffers at the display nodes.

The calculation of the correct due time for each frame needs three pieces of information:

• The framenumber of the frame that is currently displayed at the display nodes.

• The rate of frame consumption at the displays: the refresh rate.

• The round trip time between the CMD and the video source.

The precondition is complete synchronization between the CMD and the SDNs, ensuring that the frame number and the refresh rate of the master are the same for all slaves. Estimating the RTT only to the master, and neglecting different RTTs to the slave nodes, is based on the assumption that the RTT is nearly the same, since master and slave displays are arranged into a composite display.

The frame number is contained in the payload of the DCR packets. The refresh rate is deduced from the intervals between the DCR packets. Each time a DCR packet is received, the frame deadline predictor generates a timestamp with its local clock. This timestamp is the input to the SPLL, where it is compared with a predicted timestamp: an estimate of when, according to the local clock, the next DCR packet arrives. If this expectation deviates from the time at which the DCR message is received, the SPLL adjusts the estimated frequency.

The estimated RTT is the delay to the source of the DCR packets. This source is not necessarily the CMD, as the packets may also be received from another frame deadline predictor operating in relaying mode. Therefore the value of the rtt field in the DCR packet is added to the estimated RTT.

The frametime ft(n) for a certain frame number n is then calculated by

$$ft(n) = \left(n - \mathrm{frame}(t)\right) \cdot \frac{1}{v_r} + rtt$$

where frame(t) is the frame number of the currently displayed frame, vr is the refresh rate estimated with the local clock, and rtt is the estimated round trip time, consisting of the RTT to the source of the DCR messages plus the value in the rtt_useconds field of the DCR packets.
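The frametime formula in code, as a sketch with hypothetical names: `frame_now` is frame(t) taken from the latest DCR packet, `rtt_s` is the filtered RTT to the DCR source in seconds, and `rtt_useconds` is the value of the DCR field in microseconds.

```python
def frame_time(n: int, frame_now: int, refresh_rate_hz: float,
               rtt_s: float, rtt_useconds: int) -> float:
    """Frametime ft(n) in seconds of the local clock, relative to now."""
    rtt = rtt_s + rtt_useconds * 1e-6  # RTT to DCR source plus upstream RTT
    return (n - frame_now) / refresh_rate_hz + rtt
```

For frame 130, ten frames ahead at 120 Hz with a total RTT of 1 ms, this yields about 84.3 ms.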

Figure 37: Frametime Predictor

Figure 37 shows the sequence diagram of the frame deadline predictor.


13 Measurements and Synchronization Performance

Measurement Methods

To verify correct synchronization, different external measurement methods have been tested:

• Making the pixel clock visible on an oscilloscope.

• Tapping the voltage of the IR diode which synchronizes the shutter glasses and displaying it on an oscilloscope.

• Observing the synchronization with shutter glasses and judging the visual quality.

The pixel clock is transmitted on a pin pair as a differential voltage signal. To monitor changes of the refresh rate, this voltage signal was tapped and displayed on an oscilloscope. While it was possible to observe changes of the pixel clock on Intel graphics devices, the range of the VCXO frequency control is below the measurement precision: the pixel clock signal is too noisy to determine the frequency exactly by averaging the period length. Thus a change of the pixel clock on the CE4100 STBs cannot be measured reliably.

Tapping the IR diode voltage has proven to be a good way to verify frequency synchronization. The shutter glasses are synchronized by an IR diode, which is driven by a rectangular voltage signal with a frequency of half the refresh rate. The IR diode signal of the CMD is used to trigger the oscilloscope, and the voltage of an IR diode connected to the SDN is displayed as the second signal. If the frequencies of CMD and SDN are equal, the signals do not drift relative to each other on the oscilloscope. In this way it was possible to verify that the SDNs successfully acquire frequency lock. The elimination of the phase offset, in contrast, cannot be proven this way, as a time difference between the two voltage signals does not directly correspond to a phase offset.

An effective method to validate not only frequency lock but also phase offset elimination is to observe the displays simultaneously through shutter glasses. It turned out that the human brain is very sensitive to even small phase offsets and ghosting. Though it presumably depends on the viewer, the stereo effect vanishes at a phase offset of roughly 1.5 ms and is already severely degraded at offsets of more than 1 ms. This also depends on the stereo emitter: not all emitters keep the LCD shutters of the glasses open over the whole frame period. Some IR emitters close both eyes at the beginning and at the end of the frame period, which eliminates ghosting even with stereo glasses that are not perfectly synchronized. Of course, this increases the fraction of light filtered by the glasses to more than 50%.

To observe the tracking behavior of the PLL, a variable phase offset can be set by the user.



Synchronization Performance

The software displays values calculated in the PLL, such as the phase error and the filtered phase error, the frequency of the DCR packets and the frequency of the VBLANKs. As external measurements verified that the synchronization works, these values may be regarded as trustworthy.

From the reported values it can be concluded that, in case of frequency lock, the difference between the reference frequency and the local refresh rate is in the range of 10⁻⁵ Hz. The phase error in case of frequency lock and elimination of the steady-state phase error is in the range of ±50 µs to ±100 µs.

Settling-Time

An important characteristic is the settling time. Though the settling time is usually determined mainly by the PLL parameters, here the small pull range of the VCXO is the main limiting factor. While the PLL is in acquisition and reduces the phase offset, it decreases or increases fout. However, the frequency difference between f0 and the pull-range limit is very small, so the phase gained per second is small compared to the length of one period. Therefore, reducing the phase offset always requires a notable amount of time. The following example describes the worst case, in which the phase error is corrected using the pull-range-aware strategy.

The phase offset that can be decreased within one second is:

$$\Delta\theta/\mathrm{s} = f_{ref}\left(\frac{1}{f_{ref}} - \frac{1}{f_{out}}\right) = 1 - \frac{f_{ref}}{f_{out}} \qquad (40)$$

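Assume that f0 = fref, i.e. the nominal frequencies of the CMD and the SDN are equal. The frequency difference between fref and foutmax or foutmin is roughly ±0.025 Hz, so that fout = fref ± 0.025 Hz. Thus equation 40 becomes

$$\Delta\theta/\mathrm{s} = 1 - \frac{1}{1 \pm \frac{0.025\,\mathrm{Hz}}{f_{ref}}} \qquad (41)$$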

At a refresh rate of 120 Hz, the slave can therefore catch up or fall back by about 208 µs per second, i.e. reduce the phase offset by roughly 4.5°/s. Here, 360° corresponds to two frames, i.e. two periods. At a phase shift of one period (180°), the frames for the right and the left eye are swapped. A 180° offset can be corrected by swapping the buffers for the right and the left eye, so the phase offset that has to be corrected is at most one period T. If the phase offset is corrected using the direction-aware strategy, the maximal phase offset is ±T/2.
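The numbers above can be checked with a few lines of C. The following sketch evaluates equation 40 for a slave running at the upper or lower pull-range limit (fref ± 0.025 Hz, as assumed above) and converts the result to degrees per second, using the convention that 360° corresponds to two frame periods. The function names are illustrative and not part of the prototype.

```c
/* Equation 40: phase (in seconds) gained per second when the local
 * oscillator runs at f_out while the reference runs at f_ref.
 * Positive: the slave catches up; negative: it falls back. */
double phase_rate(double f_ref, double f_out)
{
    return 1.0 - f_ref / f_out;
}

/* Convert a phase rate in s/s into degrees per second, where 360 degrees
 * correspond to two frame periods (one left/right stereo pair). */
double phase_rate_deg(double rate, double f_ref)
{
    double stereo_period = 2.0 / f_ref;   /* two frames, in seconds */
    return rate / stereo_period * 360.0;
}
```

For fref = 120 Hz and fout = 120.025 Hz, phase_rate returns about 2.08 · 10⁻⁴ (208 µs per second), which phase_rate_deg converts to about 4.5°/s. Running at 119.975 Hz gives practically the same magnitude with a negative sign.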

If the reference frequency is close to one of the VCXO pull-range limits, it may be faster to always correct a phase offset of up to T in the same direction than to correct at most T/2 in the shorter direction at a lower rate.



In the example above, correcting a phase error of T with the pull-range-aware strategy takes about 40 s, which is the worst case. With the direction-aware strategy, the ideal scenario requires only half of this time. Furthermore, the pull-range-aware strategy requires a longer initialization to estimate the pull-range limits.

Nevertheless, the worst case for the direction-aware strategy is that the PLL has to catch up a phase error of T/2 while fref is very close to a pull-range limit, e.g. fref = 0.9999 · foutmax. As fref approaches the limit, the acquisition time theoretically grows without bound. Since a best master clock selection algorithm aims to find the clock that is closest to f0 for most of the slaves, this case is less likely. Still, it may leave individual displays with outlying quartz frequencies unsynchronized, which is unacceptable for the Display Wall. Therefore, the pull-range-aware strategy is the better choice.
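To make the trade-off between the two strategies concrete, the following C sketch computes the correction time from equation 40 for a given phase lag. It models the lag of the slave as a value between 0 and T (larger offsets are removed by swapping the left and right buffers). It assumes that the pull-range-aware strategy always corrects towards the pull-range limit with the larger frequency margin, while the direction-aware strategy always takes the shorter way. The type and function names are hypothetical, and fref is assumed to lie strictly inside the pull range.

```c
/* Hypothetical description of the VCXO pull range in Hz;
 * f_min < f_ref < f_max is assumed throughout. */
typedef struct {
    double f_min;
    double f_max;
} pull_range;

/* Seconds needed to catch up 'dist' seconds of phase at f_max (equation 40). */
static double time_catch_up(double dist, double f_ref, pull_range r)
{
    return dist / (1.0 - f_ref / r.f_max);
}

/* Seconds needed to fall back 'dist' seconds of phase at f_min. */
static double time_fall_back(double dist, double f_ref, pull_range r)
{
    return dist / (f_ref / r.f_min - 1.0);
}

/* Direction-aware: correct in the shorter direction.
 * 'lag' is how far the slave is behind the reference, 0 <= lag < T. */
double direction_aware_time(double lag, double f_ref, pull_range r)
{
    double T = 1.0 / f_ref;
    if (lag <= T / 2.0)
        return time_catch_up(lag, f_ref, r);
    return time_fall_back(T - lag, f_ref, r);
}

/* Pull-range-aware (as modelled here): always move towards the pull-range
 * limit that leaves the larger frequency margin, even if that means
 * covering almost a full period. Ties are resolved by catching up. */
double pull_range_aware_time(double lag, double f_ref, pull_range r)
{
    double T = 1.0 / f_ref;
    if (lag == 0.0)
        return 0.0;
    if (r.f_max - f_ref >= f_ref - r.f_min)
        return time_catch_up(lag, f_ref, r);
    return time_fall_back(T - lag, f_ref, r);
}
```

With a pull range of 119.975 Hz to 120.025 Hz and fref = 120 Hz, pull_range_aware_time approaches 40 s for a lag close to one period T, while direction_aware_time needs about 20 s for a lag of T/2, as in the example above. With fref = 0.9999 · foutmax, direction_aware_time for a lag of T/2 rises to about 42 s, whereas pull_range_aware_time needs about 13 s because it falls back towards the distant lower limit.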

Scaling for a Larger Number of Nodes

The synchronization architecture itself also scales well to a larger number of nodes. On the synchronization side, the network load caused by each additional display node is negligible. As the DCR messages are transmitted by broadcast or multicast, no additional sync packets are necessary. An additional display node sends and receives ICMP Echo packets, but only at a very low rate of one packet every n seconds.

However, each display node needs its own video stream, so each additional SDN adds traffic to the network. The number of displays is therefore limited by the bandwidth of the IP network and by the point at which congestion causes so many packet losses that the synchronization is no longer reliable.

Nevertheless, assuming a video bitrate of 5 Mbps, the traffic for 20 display nodes consumes only 10% of the available bandwidth of gigabit Ethernet. In this example, each additional node increases the network load by 0.5% of the bandwidth.
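The bandwidth estimate above is simple arithmetic. The following C sketch reproduces it and, for an assumed maximum tolerable link load, gives an upper bound on the number of display nodes. The 50% threshold used in the example is a hypothetical value, not a measured congestion limit, and DCR, ICMP and protocol overhead are ignored.

```c
/* Percentage of the link capacity used by n_nodes video streams of
 * stream_bps each; DCR, ICMP and protocol overhead are ignored. */
double link_load_percent(int n_nodes, double stream_bps, double link_bps)
{
    return 100.0 * n_nodes * stream_bps / link_bps;
}

/* Largest number of display nodes whose video streams stay below the
 * fraction max_load (0..1) of the link capacity. */
int max_nodes(double stream_bps, double link_bps, double max_load)
{
    return (int)(max_load * link_bps / stream_bps);
}
```

link_load_percent(20, 5e6, 1e9) returns 10 and link_load_percent(1, 5e6, 1e9) returns 0.5, matching the figures above; under the assumed 50% threshold, max_nodes(5e6, 1e9, 0.5) returns 100.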

One way to reduce the impact of network load on the jitter of the DCR packets would be QoS priority handling for the DCR packets.
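A minimal sketch of such QoS marking on Linux, assuming the DCR packets are sent over an ordinary UDP socket: the sender sets the DSCP field of its outgoing IPv4 packets with the standard IP_TOS socket option. The choice of the Expedited Forwarding code point (46) is an assumption, and the marking only has an effect if the switches in the display network are configured to prioritize it. The function name is hypothetical.

```c
#include <netinet/in.h>
#include <sys/socket.h>

/* Expedited Forwarding code point (RFC 3246); using it for DCR packets
 * is an assumption of this sketch. */
#define DSCP_EF 46

/* Mark all IPv4 packets sent on socket fd with the given DSCP value.
 * The DSCP occupies the upper six bits of the former TOS byte; the two
 * ECN bits are left at zero. Returns 0 on success, -1 on error. */
int set_dscp(int fd, int dscp)
{
    int tos = (dscp & 0x3f) << 2;
    return setsockopt(fd, IPPROTO_IP, IP_TOS, &tos, sizeof tos);
}
```

The CMD would call set_dscp once on its DCR socket after creating it; reading IP_TOS back with getsockopt then reports 0xB8.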


14 Outlook

The prototype has shown that the synchronization works accurately and is stable enough to provide good visual quality for stereo content. It is expected to scale to a larger number of displays as well.

It has also shown that good tracking performance is achieved once frequency lock has been acquired. The STBs used keep the VCXO frequency once it is set. It is therefore also possible to subsample the refresh rate to a lower DCR packet rate, provided that the jitter is filtered well.

Even a temporary loss of network packets, for example due to a lost network connection, does not break the synchronization. During this time, synchronization is suspended, but the STBs keep the last refresh rate that was set. The displays therefore maintain framelock under the following conditions: the displays have already acquired frame lock, and the frequency drift of the oscillators at slave and master is negligible. Of course, the longer synchronization is suspended, the more likely it becomes that framelock is lost. As soon as packet reception resumes, the synchronization continues and regains accurate lock.
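How long framelock survives such an outage can be estimated with the same reasoning as equation 40. The C sketch below assumes that the residual frequency difference between master and slave stays constant at the value observed under lock (in the range of 10⁻⁵ Hz, as reported above), which is exactly the negligible-drift condition named in the previous paragraph; temperature or aging drift of the quartz oscillators is not modeled. The function name and the tolerance parameter are illustrative.

```c
/* Time in seconds until a constant residual frequency error df (Hz) of the
 * slave, relative to the reference refresh rate f (Hz), accumulates a phase
 * error of 'tolerance' seconds. Per second, the slave gains or loses
 * df / (f + df) = 1 - f / (f + df) seconds of phase (cf. equation 40). */
double holdover_time(double f, double df, double tolerance)
{
    double drift = df / (f + df);
    if (drift < 0.0)
        drift = -drift;          /* only the magnitude matters here */
    return tolerance / drift;    /* df == 0 yields +infinity */
}
```

With f = 120 Hz, df = 10⁻⁵ Hz and a tolerance of 1 ms (the offset above which the stereo effect was observed to be severely degraded), holdover_time returns about 12 000 s, i.e. more than three hours; any real oscillator drift shortens this.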

There are some points where the synchronization performance might be improved:

• Further effort can be put into the loop filter of the PLL. The synchronization architecture was developed and tested in a very favorable environment with low network traffic. Other conditions may require PLL characteristics that lean more towards stability and stronger phase noise filtering.

• All software is currently implemented in user space. Moving the detection of VBLANKs into kernel space might reduce the jitter of the local refresh rate estimation, as this reduces the number of function calls between the VBLANK event and the generation of the timestamp. In this implementation, the timestamps of incoming DCR packets are already created in kernel space by the network card driver.

Looking back, it was shown that active stereo displays can be synchronized over an IP network without dedicated synchronization cabling. We found hardware whose refresh rate control works well for very precise synchronization. Although the precision might be increased with further effort, this may not be necessary, as the precision provided by the prototype is sufficient to ensure a framelock that is accurate enough to create a good stereo impression.


References

[1] J. Allard, V. Gouranton, G. Lamarque, E. Melin, and B. Raffin. Softgenlock: Active

Stereo and Genlock for PC Cluster. In Proceedings of the Joint IPT/EGVE’03 Workshop,

pages 255–260, 2003.

[2] D. K. Banerjee. PLL performance, simulation, and design. Indianapolis, IN, 2006.

[3] Digital Display Working Group. Digital Visual Interface. 1.0 edition, 1999.

[4] J. Eidson. IEEE 1588: An Update on the Standard and its Application. 2006.

[5] G. M. Garner. IEEE 802.1AS and IEEE 1588. 2010.

[6] HDMI Licensing, LLC. High-Definition Multimedia Interface Specification. Number 1.4. 2010.

[7] T. Herfet. Future Media Internet: Video- & Audiotransport - A new Paradigm. 2009.

[8] IETF. Internet Control Message Protocol. Number RFC 792. 1981.

[9] IETF. Network Time Protocol Version 4: Protocol and Algorithms Specification. Number RFC 5905. 2010.

[10] Integrated Device Technology. IDT6V49061A: VCXO Audio/Video Clock Generator.

2011.

[11] Intel Corporation. Intel CE platform references (confidential).

[12] Intel Corporation. Intel 965 Express Chipset Family and Intel G35 Express Chipset

Graphics Controller PRM: Programmer’s Reference Manual (PRM). 2008.

[13] ISO/IEC. Information technology — Generic coding of moving pictures and associated

audio information: Systems. Number 13818-1:2000(E). 2nd edition, 2000.

[14] J. Kiszka, B. Wagner, Y. Zhang, and J. Broenink. RTnet - A flexible Hard Real-Time

Networking Framework. 2005.

[15] Nirnimesh and P. J. Narayanan. Scalable, Tiled Display Wall for Graphics using a

Coordinated Cluster of PCs. 2006.

[16] NVIDIA Corporation. NVIDIA Quadro G-Sync II User Guide. http://de.download.nvidia.com/nvidia/Quadro/PDFs/Quadro_GSync_5800_4800_install_guide.pdf, 2011-11-26.

[17] M. H. Perrott. PLL Design Using the PLL Design Assistant Program. 2008.

[18] K. B. Stanton. 802.1AS Tutorial. 2008.

[19] M. Waschbusch, D. Cotting, M. Duller, and M. Gross. WinSGL: Software Genlocking

for Cost-Effective Display Synchronization under Microsoft Windows. 2006.

[20] M. Waschbusch, D. Cotting, M. Duller, and M. Gross. WinSGL: synchronizing displays in parallel graphics using cost-effective software genlocking. Parallel Computing, vol. 33(6):420–437, 2007.

[21] H. Weibel. IEEE 1588, Standard for a Precision Clock Synchronization Protocol. 2006.

[22] H. Weibel. Technology Update on IEEE 1588: The Second Edition of the High Precision

Clock Synchronization Protocol. 2009.

[23] X.org Foundation. Development/Documentation/HowVideoCardsWork. http://wiki.x.org/wiki/Development/Documentation/HowVideoCardsWork, 2011-01-21.


A SPLL Code Snippet

The following listing shows a minimalistic example of an SPLL that compares timestamps.

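double SPLL(double t_dcr) {
    static int initialized = 0;
    static double theta_filtered;   // filtered phase error (loop filter output)
    static double theta[2];         // current and previous phase error
    static double fout;             // output frequency
    static double t_osc;            // predicted oscillator timestamp
    // SPLL parameters:
    const double a0 = 0.00013;
    const double a1 = 0.00013;
    const double b1 = -0.99997;
    const double f0 = 120.0;        // nominal refresh rate in Hz
    const double K = 1.0;           // oscillator gain
    // start the prediction at the first reference timestamp so that the
    // loop does not begin with an arbitrarily large phase error
    if (!initialized) {
        t_osc = t_dcr;
        initialized = 1;
    }
    // phase detector
    theta[0] = t_osc - t_dcr;
    // inner loop filter (first-order IIR in Direct Form I)
    theta_filtered = a0*theta[0] + a1*theta[1] - b1*theta_filtered;
    // VCXO control
    fout = f0 + K*theta_filtered;
    // phase prediction: timestamp of the next oscillator tick
    t_osc = t_osc + 1/fout;
    theta[1] = theta[0];
    return fout;
}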


Figure 38: Minimalistic SPLL code snippet


B Display Timings on Intel Graphics Cards

The following section briefly summarizes the registers that must be accessed for refresh rate adaptation on Intel graphics devices. As this method of refresh rate adaptation is not used in the developed prototype, only a coarse overview is given. More details can be found in [12].

Pixelclock Calculation

The exact calculation of the pixel clock on Intel graphics depends on the graphics chipset. In most chipsets it is calculated by:

$$\mathrm{dotclock} = \frac{\mathrm{refclock}\cdot\left(5\,(M_1+2) + (M_2+2)\right)}{(N+2)\cdot(P_1\cdot P_2)}$$

The integrated graphics of the Intel Atom, named Pineview, uses this formula:

$$\mathrm{dotclock} = \frac{\mathrm{refclock}\cdot(M_2+2)}{N\cdot(P_1\cdot P_2)}$$

On Pineview the reference clock is 120MHz, on devices of generation four it is 96MHz.

The limits of the parameters are summarized in Table 3.

parameter min max

M1 10 22

M2 5 9

N 1 6

P1 1 8

VCO 1750MHz 3500MHz

Table 3: Parameters for pixelclock on Intel graphics devices of fourth generation
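The two formulas above can be evaluated directly. The C sketch below computes the dot clock in MHz and checks the VCO frequency against the Table 3 limits. It assumes, as the Linux i915 driver does, that the VCO frequency is refclock · (5 (M1 + 2) + (M2 + 2)) / (N + 2), i.e. the value before the division by P1 · P2. The parameter values in the example, in particular P2, are hypothetical, and the function names are illustrative.

```c
/* VCO limits of fourth-generation devices from Table 3, in MHz. */
#define VCO_MIN_MHZ 1750.0
#define VCO_MAX_MHZ 3500.0

/* Dot clock in MHz for most Intel chipsets (first formula above).
 * m1, m2 and n are the raw register values, which the formula offsets by 2.
 * The intermediate VCO frequency (before the P1 * P2 post division) is
 * stored in *vco_mhz so that it can be checked against the limits. */
double dotclock_i9xx(double refclk_mhz, int m1, int m2, int n,
                     int p1, int p2, double *vco_mhz)
{
    double m = 5.0 * (m1 + 2) + (m2 + 2);
    *vco_mhz = refclk_mhz * m / (n + 2);
    return *vco_mhz / (p1 * p2);
}

/* Dot clock in MHz for Pineview (second formula above). */
double dotclock_pineview(double refclk_mhz, int m2, int n, int p1, int p2)
{
    return refclk_mhz * (m2 + 2) / (n * (p1 * p2));
}

/* Returns 1 if the VCO frequency lies within the Table 3 limits. */
int vco_in_range(double vco_mhz)
{
    return vco_mhz >= VCO_MIN_MHZ && vco_mhz <= VCO_MAX_MHZ;
}
```

With the hypothetical values refclock = 96 MHz, M1 = 14, M2 = 7, N = 2, P1 = 2 and P2 = 10, dotclock_i9xx yields a VCO frequency of 2136 MHz (inside the Table 3 range) and a dot clock of 106.8 MHz.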

The Intel graphics devices have two pixel pipes, denoted A and B. All registers exist once for pipe A and once for pipe B. The registers that determine the pixel clock parameters are briefly summarized below. More details can be found in [12].

DPLL Control Register

DPLLA/DPLLB control register (0x06014/0x06018)

- bit 31 enables the VCO of the DPLL (DPLL[A,B] VCO Enable in Figure 39)

- bit 8 controls which one of the DPLL Divisor Registers (e.g. FPA0/FPA1) is used

- bits 25:24 determine P2

- bits 23:16 determine P1

[Diagram: bit layout of the DPLL control register with the fields VCO Enable, Mode Select (binary 10 in LVDS mode, only on mobile devices), P2 post divisor (bits 25:24), P1 post divisor (bits 23:16), FP[A,B]0/FP[A,B]1 selector (bit 8) and an additional P1 post divisor (bits 7:0); the last two fields are only valid on DevCTG.]

Figure 39: DPLL Control Register

- bits 7:0 determine P1 (only on DevCTG¹¹)

DPLL Divisor Register

[Diagram: bit layout of the FP[A,B] divisor register with the N divisor, M1 divisor and M2 divisor fields; each register value is two less than the actual divisor value.]

Figure 40: FPA Control Register

FPA0/FPA1/FPB0/FPB1 DPLL Divisor register (0x06040/0x06044/0x06048/0x0604C)

- two registers for each pipe, which contain the same parameters; switching between both

registers is possible

- bits 21:16 determine parameter N

- bits 13:8 determine M1

- bits 5:0 determine M2
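The field layout listed above maps directly to bit operations. The C sketch below decodes and encodes the N, M1 and M2 fields of an FP divisor register value. Following Figure 40, these are the raw register values, two less than the actual divisors, which is why the dot clock formula adds 2. How the register is read from or written to the device (MMIO access) is not shown, and the type and function names are hypothetical.

```c
#include <stdint.h>

/* Divisor fields of an FPA0/FPA1/FPB0/FPB1 register value (raw values). */
typedef struct {
    unsigned n;    /* bits 21:16 */
    unsigned m1;   /* bits 13:8  */
    unsigned m2;   /* bits 5:0   */
} fp_divisors;

/* Extract bits hi..lo (inclusive) from a 32-bit register value. */
static unsigned field(uint32_t reg, unsigned hi, unsigned lo)
{
    return (unsigned)((reg >> lo) & ((1u << (hi - lo + 1)) - 1u));
}

fp_divisors fp_decode(uint32_t reg)
{
    fp_divisors d = { field(reg, 21, 16), field(reg, 13, 8), field(reg, 5, 0) };
    return d;
}

/* Build a register value from the three 6-bit fields (other bits zero). */
uint32_t fp_encode(fp_divisors d)
{
    return ((uint32_t)(d.n & 0x3fu) << 16)
         | ((uint32_t)(d.m1 & 0x3fu) << 8)
         | (uint32_t)(d.m2 & 0x3fu);
}
```

For example, N = 2, M1 = 14 and M2 = 7 encode to 0x00020E07, and fp_decode recovers the same three fields.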

B.1 Display Pipe timing registers

The registers that determine the display resolution and timing are summarized in Table 4:

¹¹ Intel® GM45 Chipset



register name   address offset    function

HTOTAL A/B      0x60000/0x61000   bits 31:16 horizontal total; bits 15:0 horizontal active

HBLANK A/B      0x60004/0x61004   bits 31:16 hblank end; bits 15:0 hblank start

HSYNC A/B       0x60008/0x61008   bits 31:16 hsync end; bits 15:0 hsync start

VTOTAL A/B      0x6000C/0x6100C   bits 31:16 vertical total; bits 15:0 vertical active

VBLANK A/B      0x60010/0x61010   bits 31:16 vblank end; bits 15:0 vblank start

VSYNC A/B       0x60014/0x61014   bits 31:16 vsync end; bits 15:0 vsync start

Table 4: Resolution registers
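Together with the pixel clock, the registers in Table 4 determine the refresh rate: it equals the dot clock divided by the product of horizontal total and vertical total. The encoding of the fields is not stated here; the C sketch below assumes, as the Linux i915 driver does when reading these registers, that each 16-bit field holds its value minus one. The function names are hypothetical.

```c
#include <stdint.h>

/* Upper (bits 31:16) and lower (bits 15:0) field of a timing register. */
static unsigned upper16(uint32_t reg) { return (unsigned)(reg >> 16); }
static unsigned lower16(uint32_t reg) { return (unsigned)(reg & 0xffffu); }

/* Active width or height from HTOTAL/VTOTAL bits 15:0
 * (assumed to be stored as value - 1). */
unsigned timing_active(uint32_t total_reg)
{
    return lower16(total_reg) + 1;
}

/* Total pixels per line or lines per frame from bits 31:16
 * (assumed to be stored as value - 1). */
unsigned timing_total(uint32_t total_reg)
{
    return upper16(total_reg) + 1;
}

/* Refresh rate in Hz = dot clock / (horizontal total * vertical total). */
double refresh_rate_hz(double dotclock_hz, uint32_t htotal_reg, uint32_t vtotal_reg)
{
    return dotclock_hz / ((double)timing_total(htotal_reg) * timing_total(vtotal_reg));
}
```

For example, a 720p timing with 1650 × 750 total pixels, driven by a 148.5 MHz dot clock (twice the 74.25 MHz used for 720p at 60 Hz), yields exactly 120 Hz.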

B.2 Intel graphics devices generations

Generation 2 i830, 845g, i85x, i865

Generation 3 i915g, i915gm, i945g, i945gm, Pineview

Generation 4 i965g (Broadwater), i965gm (Crestline), G33, G45, GM45

Generation 5 Ironlake

Generation 6 Sandy Bridge

Table 5: Generation of the Intel GMAs


List of Figures

1 Projected Prototype . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

2 pixel alignment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

3 Pixel transmission on the display cable . . . . . . . . . . . . . . . . . . . . . . 11

4 NTP Synchronization Diagram . . . . . . . . . . . . . . . . . . . . . . . . . . 14

5 PTP Syntonization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

6 Synchronization with 802.1as capable bridges . . . . . . . . . . . . . . . . . . 19

7 PCR in Mpeg Transport Stream . . . . . . . . . . . . . . . . . . . . . . . . . 21

8 PCR at transmitter and receiver . . . . . . . . . . . . . . . . . . . . . . . . . 22

9 Active Shutter Technique . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

10 Left- and right-handed circular polarized waves . . . . . . . . . . . . . . . . . 24

11 HDMI 1.4a Frame Packing (compare p.8[9]) . . . . . . . . . . . . . . . . . . . 25

12 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

13 Refresh rates around 120Hz on an Intel 965GM integrated graphics . . . . . . 35

14 Phase-Locked-Loop . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37

15 Phase Noise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38

16 Relation between VCO pulled frequency and gain K . . . . . . . . . . . . . . 39

17 Frequency Ranges of PLLs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40

18 Settling behavior of type-I (left) and type-II (right) PLL . . . . . . . . . . . . 41

19 Bandwidth definition of loop filter . . . . . . . . . . . . . . . . . . . . . . . . 42

20 Bode plots of Butterworth filters with different orders . . . . . . . . . . . . . 43

21 Comparison of a PLL and EWMA averaging . . . . . . . . . . . . . . . . . . 45

22 Simulation of filtering performance of different loop filter orders . . . . . . . . 47

23 Simulation of fout with different PLL orders. . . . . . . . . . . . . . . . . . . 47

24 Simulation of different cutoff frequencies . . . . . . . . . . . . . . . . . . . . . 48

25 Settling behavior of a type II PLL with small filter gain . . . . . . . . . . . . 49

26 Impact of frequency differences . . . . . . . . . . . . . . . . . . . . . . . . . . 50

27 Synchronization Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . 53

28 Format of the DCR packets . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54

29 Flow diagram of the CMD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56

30 Blockdiagram of the SDNs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57

31 Consequence of a lost VBLANK . . . . . . . . . . . . . . . . . . . . . . . . . 58

32 Direction of phase correction . . . . . . . . . . . . . . . . . . . . . . . . . . . 60

33 Loop filter structure in Direct Form I . . . . . . . . . . . . . . . . . . . . . . 61

34 Simulation of iterative steady-state phase error elimination . . . . . . . . . . 65

35 Comparison of EWMA filter results for different α . . . . . . . . . . . . . . . 68

36 Filtered RTT with different update intervals . . . . . . . . . . . . . . . . . . . 69

37 Frametime Predictor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70

38 Minimalistic SPLL code snippet . . . . . . . . . . . . . . . . . . . . . . . . . . 77


39 DPLL Control Register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79

40 FPA Control Register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79

List of Tables

1 Modeline for 720p at 120Hz . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32

2 Parameters for different dot clocks on Intel graphics . . . . . . . . . . . . . . 35

3 Parameters for pixelclock on Intel graphics devices of fourth generation . . . 78

4 Resolution registers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80

5 Generation of the Intel GMAs . . . . . . . . . . . . . . . . . . . . . . . . . . . 80


Glossary

BMCA best master clock algorithm - algorithm in PTP that aims to find the most stable and accurate clock among multiple candidates and, in hierarchical topologies, builds up a loop-free clock distribution tree.

CMD Clock Master Display - node which provides the frequency information for the other

display nodes.

CVT Coordinated Video Timings - formula which defines the size of blanking on LCDs for

each display resolution, specification from VESA.

DCO Digitally Controlled Oscillator - oscillator whose frequency output is digitally con-

trolled.

DCR Display Clock Reference - small UDP packets that provide information for synchroniza-

tion.

DPLL Digital Phase-Locked-Loop - digital, time discrete PLL.

DTS Decoding Timestamp - timestamp relative to the PCR which tells the receiver the

correct decoding time of a frame, used in MPEG TS and MPEG PS.

DTV Digital Television Set.

EWMA Exponentially Weighted Moving Average - IIR filter with exponentially decreasing

impulse response.

GM grandmaster - root node that provides the master clock in 802.1as clock trees.

GTF General Timing Formula - formula which defines the size of blanking on CRTs for each

display resolution, specification from VESA.

HBLANK horizontal blanking interval - the time in which the electron beam of a CRT returns from the end of one line to the beginning of the next line; still present in digital video outputs.

HSYNC horizontal sync interval - a signal on the video cable which tells the CRT to begin the next line.

MPEG TS MPEG transport stream - used in error-prone environments, e.g. DVB broadcast.

NCO Numerically Controlled Oscillator - oscillator whose frequency can be set by numerical

values.


NTP Network Time Protocol - network protocol for clock synchronization, also over large

distances and many hops.

PCR program clock reference - 27MHz timestamp in MPEG TS, composed of a 33-bit base counted at 90kHz and a 9-bit extension counted at 27MHz.

PLL Phase-Locked-Loop - closed loop feedback system for frequency and phase synchroniza-

tion.

PTP Precision Time Protocol - network protocol for sub-µs precise clock synchronization.

PTS Presentation Time Stamp - timestamp relative to the PCR which tells the receiver the

correct playout time of a frame, used in MPEG TS and MPEG PS.

SCR system clock reference - 27MHz timestamp in MPEG PS, composed of a 33-bit base counted at 90kHz and a 9-bit extension counted at 27MHz.

SDN Slave Display Node - node in the Display Wall which synchronizes to the CMD.

SPLL Software Phase-Locked-Loop - PLL implemented purely in software.

STC System Time Clock - system clock.

VBI Vertical Blanking Interval.

VBLANK vertical blanking interval - the time in which the electron beam of a CRT returns from the bottom right to the top left corner; still present in digital video outputs.

VCO Voltage Controlled Oscillator - oscillator whose frequency can be influenced by an

external control voltage.

VSYNC vertical sync interval - a signal on the video cable which tells the CRT to begin the next frame.
