Martin Rubli: Building a Webcam Infrastructure for GNU/Linux. Master Thesis, EPFL, Switzerland, 2006. Prof. Matthias Grossglauser, Laboratory for Computer Communications and Applications, EPFL; Richard Nicolet, Logitech; Remy Zimmermann, Logitech.


Martin Rubli

Building a Webcam Infrastructure

for GNU/Linux

Master Thesis
EPFL, Switzerland, 2006

Prof. Matthias Grossglauser, Laboratory for Computer Communications and Applications, EPFL

Richard Nicolet, Logitech
Remy Zimmermann, Logitech


© 2006, Martin Rubli
School of Computer and Communication Sciences, Swiss Federal Institute of Technology, Lausanne, Switzerland

Logitech, Fremont, California

Revision a.

All trademarks used are properties of their respective owners.

This document was set in Meridien LT and Frutiger using the LaTeX typesetting system on Debian GNU/Linux.


Abstract

In this thesis we analyze the current state of webcam support on the GNU/Linux platform. Based on the results gained from that analysis we develop a framework of new software components and improve the current platform with the goal of enhancing the user experience of webcam owners. Along the way we gain close insight into the components involved in streaming video from a webcam and into what today's hardware is capable of doing.


Contents

1 Introduction

2 Current state of webcam hardware
  2.1 Introduction
  2.2 Terminology
  2.3 Logitech webcams
    2.3.1 History
    2.3.2 Cameras using proprietary protocols
    2.3.3 USB Video Class cameras
  2.4 USB Video Class
    2.4.1 Introduction
    2.4.2 Device descriptor
    2.4.3 Device topology
    2.4.4 Controls
    2.4.5 Payload formats
    2.4.6 Transfer modes
  2.5 Non-Logitech cameras

3 An introduction to Linux multimedia
  3.1 Introduction
  3.2 Linux kernel multimedia support
    3.2.1 A brief history of Video4Linux
    3.2.2 Linux audio support
  3.3 Linux user mode multimedia support
    3.3.1 GStreamer
    3.3.2 NMM
  3.4 Current discussion

4 Current state of Linux webcam support
  4.1 Introduction
    4.1.1 Webcams and audio
  4.2 V4L2: Video for Linux Two
    4.2.1 Overview
    4.2.2 The API
    4.2.3 Summary
  4.3 Drivers
    4.3.1 The Philips USB Webcam driver
    4.3.2 The Spca5xx Webcam driver
    4.3.3 The QuickCam Messenger & Communicate driver
    4.3.4 The QuickCam Express driver
    4.3.5 The Linux USB Video Class driver
  4.4 Applications
    4.4.1 V4L2 applications
    4.4.2 V4L applications
    4.4.3 GStreamer applications
  4.5 Problems and design issues
    4.5.1 Kernel mode vs. user mode
    4.5.2 The Video4Linux user mode library
    4.5.3 V4L2 related problems

5 Designing the webcam infrastructure
  5.1 Introduction
  5.2 Goals
  5.3 Architecture overview
  5.4 Components
    5.4.1 Overview
    5.4.2 UVC driver
    5.4.3 V4L2
    5.4.4 GStreamer
    5.4.5 v4l2src
    5.4.6 lvfilter
    5.4.7 LVGstCap (part 1 of 3: video streaming)
    5.4.8 libwebcam
    5.4.9 libwebcampanel
    5.4.10 LVGstCap (part 2 of 3: camera controls)
    5.4.11 liblumvp
    5.4.12 LVGstCap (part 3 of 3: feature controls)
    5.4.13 lvcmdpanel
  5.5 Flashback: current problems

6 Enhancing existing components
  6.1 Linux UVC driver
    6.1.1 Multiple open
    6.1.2 UVC extension support
    6.1.3 V4L2 controls in sysfs
  6.2 Video4Linux
  6.3 GStreamer
  6.4 Bits and pieces

7 New components
  7.1 libwebcam
    7.1.1 Enumeration functions
    7.1.2 Thread-safety
  7.2 liblumvp and lvfilter
  7.3 libwebcampanel
    7.3.1 Meta information
    7.3.2 Feature controls
  7.4 Build system
  7.5 Limitations
    7.5.1 UVC driver
    7.5.2 Linux webcam framework
  7.6 Outlook
  7.7 Licensing
    7.7.1 Libraries
    7.7.2 Applications
  7.8 Distribution

8 The new webcam infrastructure at work
  8.1 LVGstCap
  8.2 lvcmdpanel

9 Conclusion

A List of Logitech webcam USB PIDs


Chapter 1

Introduction

Getting a webcam to work on Linux is a challenge on different levels. Making the system recognize the device properly sets the bar to a level that many users feel unable to cross, often for mostly unsubstantiated fear of compiling kernel drivers. Even once that first hurdle is cleared, the adventure has only just started. A webcam is perfectly useless without good software that takes advantage of its features, so where do users go from here? Since the first webcams appeared on the market, they have evolved from simple devices that captured relatively poor-quality videos the size of a postage stamp to high-tech devices that allow screen-filling videos to be recorded, all while applying complex real-time video processing in hardware and software.

Traditionally, Linux has been used for server installations and only in recent years has it started to conquer the desktop. This fact still shows in the form of two important differences when one compares webcam support on Linux and Windows. For one, Linux applications have primarily focused on retrieving still images from the cameras, oftentimes for "live" cameras on the Internet that update a static picture every few seconds. These programs often work in a headless environment, i.e. one that does not require a graphical user interface and a physical screen. For another, webcam manufacturers have provided little support for the Linux platform, most of which was in the form of giving technical information to the open source community without taking the opportunity to actively participate and influence the direction that webcam software takes.

This project is an attempt by Logitech to change this in order to provide Linux users with an improved webcam experience that eventually converges towards the one that Windows users enjoy today.

Obviously, the timeline of such an undertaking is on the order of years due to the sheer number of components and people involved. Luckily, the scope of a Master thesis is enough to lay the required foundations, not only of a technical nature but also in terms of establishing discussions between the parties involved.

In the course of this project, apart from presenting the newly developed framework, we will look at many of the components that already exist today, highlighting their strengths but also their weaknesses. It was this extensive analysis that eventually led to the design of the proposed framework in an attempt to learn from previous mistakes and raise awareness of current limitations. The latter is especially important for a platform that has to keep up with powerful and agile competitors.

The foundations we laid with the Linux webcam framework make it easier for developers to base their products on a common core, which reduces development time, increases stability, and makes applications easier to maintain. All of these are key to establishing a successful multimedia platform and delivering users the experience they expect from an operating system that has officially set out to conquer the desktop.

I would like to thank first of all my supervisors at Logitech, Richard Nicolet and Remy Zimmermann, for their advice and the expertise they shared with me, but also the rest of the video driver and firmware team for their big help with various questions that kept coming up. Thanks also to Matthias Grossglauser, my supervisor at EPFL, for his guidance.

A big thank you to the people in the open source community I got to work with or ask questions to. In particular this goes to Laurent Pinchart, the author of the Linux UVC driver, first of all for having written the driver, thereby letting me concentrate on the higher-level components, and second of all for the constructive collaboration in extending it.

Last but not least, thanks to everybody who helped make this project happen in one way or another but whose name did not make it into this section.

Fremont, USA, September 2006


Chapter 2

Current state of webcam hardware

2.1 Introduction

The goal of this chapter is to give an overview of the webcams that are currently on the market. We will first focus on Logitech devices and devote a small section to cameras of other vendors later on. We will also give an overview of the USB Video Class, or simply UVC, specification, which is the designated standard for all future USB camera devices. The Linux webcam framework was designed primarily with UVC devices in mind and the main goal of this chapter is to present the hardware requirements of the framework. Therefore, the majority of the chapter is dedicated to UVC cameras, as devices using proprietary protocols are slowly being phased out by the manufacturers. We will nevertheless mention the most important past generations of webcams because some of them remain in broad use and it will be interesting to see how they differ in functionality.

2.2 Terminology

There are a few terms that will keep coming up in the rest of the report. Let us quickly go over some of them to avoid any terminology-related confusion.

USB modes  In the context of USB we will often use the terms high-speed to denote USB 2.0 operation and full-speed for the USB 1.x case. There also exists a mode called low-speed that was designed for very low bandwidth devices like keyboards or mice. For webcams, low-speed is irrelevant.

Image resolutions  There are a number of standard resolutions that have corresponding acronyms. We will sometimes use these acronyms for readability's sake. Table 2.1 lists the most common ones.¹

Width [px]   Height [px]   Acronym
160          120           QSIF
176          144           QCIF
320          240           QVGA (also SIF)
352          288           CIF
640          480           VGA
1024         768           XGA
1280         960           SXGA (4:3)
1280         1024          SXGA (5:4)

Table 2.1: List of standard resolutions and commonly used acronyms.

2.3 Logitech webcams

2.3.1 History

In recent years the market has seen a myriad of different webcam models and technologies. The first webcams were devices for the parallel port, allowing very limited bandwidth and a user experience that was far from the plug-and-play that users take for granted nowadays.

With the advent of the Universal Serial Bus, webcams finally became comfortable and simple enough to use for the average PC user. Driver installation became simple and multiple devices could share the bus. Using a printer and a webcam at the same time was no longer a problem. One of the limitations of USB, however, was its still relatively low bandwidth: image resolutions above 320x240 pixels required compression algorithms that could send VGA images over the bus at tolerable frame rates.

Higher resolution video at 25 or more frames per second only became possible when USB 2.0 was introduced. A maximum theoretical transfer rate of 480 Mb/s provides enough reserves for the next generations of webcams with multi-megapixel sensors. All recent Logitech cameras take advantage of USB 2.0, although they still work on USB 1.x controllers, albeit with a limited resolution set.

¹ For some of the acronyms there exist different resolutions depending on the analog video standard they were derived from. For example, 352x288 is the PAL version of CIF whereas NTSC CIF is 352x240.


2.3.2 Cameras using proprietary protocols

From a driver point of view, Logitech cameras are best distinguished by the ASIC² they are based on. While the sensors are also an important component that the driver has to know about, such knowledge becomes less important because the firmware hides sensor-specific commands from the USB interface. In the case of UVC cameras, even the ASIC is completely abstracted by the protocol and, in the optimal case, every UVC camera works with any UVC driver, at least as far as the functionality covered by the standard is concerned.

The following list shows a number of Logitech's non-UVC cameras and is therefore grouped by the ASIC family they use. We will see in chapter 4 that this categorization is useful when it comes to selecting a driver.

Vimicro 30x based  Cameras with the Vimicro 301 or 302 chips are USB 1.1 devices, in the case of the 302 with built-in audio support. They support a maximum resolution of VGA at 15 frames per second. Apart from uncompressed YUV data, they can also deliver uncompressed 8 or 9-bit RGB Bayer data or, with the help of an integrated encoder chip, JPEG frames.

• Logitech QuickCam IM

• Logitech QuickCam Connect

• Logitech QuickCam Chat

• Logitech QuickCam Messenger

• Logitech QuickCam for Notebooks

• Logitech QuickCam for Notebooks Deluxe

• Logitech QuickCam Communicate STX

• Labtec Webcam Plus

• Labtec Notebook Pro

Philips SAA8116 based  The Philips SAA8116 is also a USB 1.1 chipset that supports VGA at a maximum of 15 fps. It has built-in microphone support and delivers image data in 8, 9, or 10-bit RGB Bayer format. It can also use a proprietary YUV compression format that we will encounter again in section 4.3.1 where we talk about the Linux driver for cameras based on this chip.

• Logitech QuickCam Zoom

• Logitech QuickCam Pro 3000

• Logitech QuickCam Pro 4000

• Logitech QuickCam Orbit/Sphere³

² The application-specific integrated circuit in a webcam is the processor designed to process the image data and communicate it to the host.

³ There also exists a model of this camera that does not use Philips ASICs but the SPCA525 described below. This model has a different USB identifier, as can be seen in the table in appendix A.


• Logitech QuickCam Pro for Notebooks

• Logitech ViewPort AV100

• Cisco VT Camera

Sunplus SPCA561 based  The Sunplus SPCA561 is a low-end USB 1.1 chipset that only supports the CIF format at up to 15 fps. The following is a list of cameras that are based on this chip:

• Logitech QuickCam Chat

• Logitech QuickCam Express

• Logitech QuickCam for Notebooks

• Labtec Webcam

• Labtec Webcam Plus

2.3.3 USB Video Class cameras

Logitech was the first webcam manufacturer to offer products that use the USB Video Class protocol, although this transition was done in two steps. It started with a first set of cameras containing the Sunplus SPCA525 chip, which supports both a proprietary protocol and the UVC standard. The USB descriptors of these cameras still announce the camera as a so-called vendor class device. This conservative approach was due to the fact that the first models did not pass all the tests required to qualify as UVC devices. As we will see later on when we talk about the Linux UVC driver in more detail, the UVC support of these cameras is still fairly complete, which is why the driver simply overrides the device class and treats them as ordinary UVC devices.

The following is a complete list of these devices:

• Logitech QuickCam Fusion

• Logitech QuickCam Orbit MP/Sphere MP

• Logitech QuickCam Pro 5000

• Logitech QuickCam for Notebooks Pro

• Logitech QuickCam for Dell Notebooks (built-in camera for notebooks)

• Acer OrbiCam (built-in camera for notebooks)

• Cisco VT Camera II

Figure 2.1 shows product photos of some of these cameras.

All SPCA525 based cameras are USB 2.0 compliant and include an audio chip. They support VGA at 30 fps and, depending on the sensor used, higher resolutions up to 1.3 megapixels at lower frame rates. To reduce the traffic on the bus they feature a built-in JPEG encoder to support streaming of MJPEG data in addition to uncompressed YUV.


Figure 2.1: The first Logitech webcams with UVC support: (a) QuickCam Fusion, (b) QuickCam Orbit MP, (c) QuickCam Pro 5000, (d) QuickCam for Notebooks Pro.


The next generation of Logitech webcams, scheduled for the second half of 2006, are pure UVC-compliant cameras. Among them are the QuickCam UltraVision and the 2006 model of the QuickCam Fusion.

Figure 2.2: The first pure Logitech UVC webcam: QuickCam UltraVision

All of these new cameras are supported by the Linux UVC driver and are automatically recognized because their USB descriptors mark them as USB Video Class devices, thereby eliminating the need to hardcode their product identifiers in the software.

2.4 USB Video Class

2.4.1 Introduction

We have already briefly mentioned the concept of USB device classes. Each device can either classify itself as a custom, vendor-specific device or as belonging to one of the different device classes that the USB forum has defined. There exist many device classes, some of the best-known being mass storage, HID (Human Interface Devices), printers, and audio devices. If an operating system comes with a USB class driver for a given device class, it can take advantage of most or all of the device's features without requiring the installation of a specific driver, hence greatly adding to the user's plug-and-play experience.

The USB Video Class standard follows the same strategy, supporting video devices such as digital camcorders, television tuners, and webcams. It supports a variety of features that cover the most frequent use cases while allowing device manufacturers to add their own extensions.

The remainder of this section gives the reader a short introduction to some of the key concepts of UVC. We will only cover what is important to understand the scope of this report and refer the interested reader to [6] for the technical details.


2.4.2 Device descriptor

USB devices are self-descriptive to a large degree, exporting all information necessary for a driver to make the device work in a so-called descriptor. While the USB standard imposes a few ground rules on what the descriptor must contain and on the format of that data, different device classes build their own class-specific descriptors on top of these.

The UVC descriptor contains such information as the list of video standards, resolutions, and frame rates supported by the device, as well as a description of all the entities that the device defines. The host can retrieve all the information it needs from these descriptors and make the device's features available to applications.

2.4.3 Device topology

The functionality of UVC devices is divided into two kinds of entities: units and terminals. Terminals are data sources or data sinks, typical examples being a CCD sensor or a USB endpoint. Terminals only have a single pin through which they can be connected to other entities. Units, on the other hand, are intermediate entities that have at least one input and one output pin. They can be used to select one of many inputs (selector unit) or to control image attributes (processing unit).

There is a special type of unit that we will talk most about in this report: the extension unit. Extension units are the means through which vendors can add features to their devices that the UVC standard does not specify. To do anything useful with the functionality that extension units provide, the host driver or application must have additional knowledge about the device because, while the extension units themselves are self-descriptive, the controls they contain are not. We shall see the implications of this fact later on when we discuss the Linux UVC driver.

When the driver initializes the device, it enumerates its entities and builds a graph with two terminal nodes, an input and an output terminal, and one or multiple units in between.

2.4.4 Controls

Both units and terminals contain sets of so-called controls through which a wide range of camera settings can be changed or retrieved. Table 2.2 lists a few typical examples of such controls, grouped by the entities they belong to. Note that the extension unit controls are not specified by the standard but are instead taken from the list of extension controls that the current Logitech UVC webcams provide.


Camera terminal:
• Exposure time
• Lens focus
• Zoom
• Motor control (pan/tilt/roll)

Processing unit:
• Backlight compensation
• Brightness
• Contrast
• Hue
• Saturation
• White balance

Extension units:
• Pan/tilt reset
• Firmware version
• LED state
• Pixel defect correction

Table 2.2: A selection of UVC terminal and unit controls. The controls in the first two groups are defined in the standard; the availability and definition of the controls in the last group depend on the camera model.

2.4.5 Payload formats

The UVC standard defines a number of different formats for the streaming data that is to be transferred from the device to the host, such as DV, MPEG-2, MJPEG, or uncompressed. Each of these formats has its own adapted header format that the driver needs to be able to parse and process correctly. MJPEG and uncompressed are the only formats used by today's Logitech webcams and they are also currently the only ones understood by the Linux UVC driver.

2.4.6 Transfer modes

UVC devices have the choice between using bulk and isochronous data transfers. Bulk transfers guarantee that all data arrives without loss but make no similar guarantees as to bandwidth or latency. They are commonly used in file transfers where reliability is more important than speed. Isochronous transfers are used when a minimum speed is required but the loss of certain packets is tolerable. Most webcams use isochronous transfers because it is more acceptable to drop a frame than to transmit and display frames with a delay. In the case of a lost frame, the driver can simply repeat the previous frame, something that is barely noticeable by the user, whereas delayed frames are usually considered more disruptive to a video conversation.

2.5 Non-Logitech cameras

Creative WebCam  Creative has a number of webcams that work on Linux, most of them with the SPCA5xx driver. A list of supported devices can be found on the developer's website [23]. Creative also has a collection of links to drivers that work with some of their older camera models [3].


Microsoft LifeCam  In summer 2006 Microsoft entered the webcam market with two new products, the LifeCam VX-3000 and VX-6000 models. Neither of them is currently supported on Linux because they use a proprietary protocol. Further models are scheduled but none of them are reported to be UVC compliant at this time.


Chapter 3

An introduction to Linux multimedia

3.1 Introduction

This chapter gives an overview of what the current state of multimedia support looks like on GNU/Linux. We shall first look at the history of the involved components and then proceed to the more technical details. At the end of this chapter the reader should have an overview of the different multimedia components available on Linux and how they work together.

3.2 Linux kernel multimedia support

3.2.1 A brief history of Video4Linux

Video devices were available long before webcams became popular. TV tuner cards formed the first category of devices to spur the development of a multimedia framework for Linux. In 1996 a series of drivers targeted at the popular Brooktree Bt848 chipset used in many TV cards made it into the 2.0 kernel under the name of bttv. The driver evolved quickly to include support for radio tuners and other chipsets. Eventually, more drivers started to show up, among them the first webcam driver for the Connectix QuickCam.

The next stable kernel version, Linux 2.2, was released in 1999 and included a multimedia framework called Video4Linux, or V4L for short, that provided a common API for the available video drivers. It must be said that the name is somewhat misleading in the sense that Video4Linux supports not only video devices but a whole range of related functions like radio tuners or teletext decoders.

With V4L being criticized as too inflexible, work on a successor had started as early as 1998 and, after four years, was merged into version 2.5 of the official Linux kernel development tree. When version 2.6 of the kernel was released, it was the first version of Linux to officially include Video for Linux Two, or simply V4L2¹. Backports of V4L2 to earlier kernel versions, in particular 2.4, were developed and are still being used today.

V4L and V4L2 coexisted for a long time in the Linux 2.6 series, but as of July 2006 the old V4L1 API was officially deprecated and removed from the kernel. This leaves Video4Linux 2 as the sole kernel subsystem for video processing on current Linux versions.

3.2.2 Linux audio support

Linux has traditionally separated audio and video support. For one thing, audio has been around much longer than video has, and for another, both subsystems have followed a rather strict separation of concerns. Even though they were developed by different teams at different times, their history is marked by somewhat similar events.

Open Sound System

The Open Sound System, or simply OSS, was originally developed not only for the Linux operating system but for a number of different Unix derivatives. While successful for a long time, its rather simple architecture suffers from a number of problems, the most serious of which, to the average user, is the inability to share a sound device among different applications. For example, it is not possible to hear system notification sounds while an audio application is playing music in the background: the first application to claim the device blocks it for all other applications.

Together with a number of non-technical reasons, this eventually led to the development of ALSA, the Advanced Linux Sound Architecture.

Advanced Linux Sound Architecture

Starting with Linux 2.6, ALSA became the standard Linux sound subsystem, although OSS is still available as a deprecated option. The reason for this is the lack of ALSA audio drivers for some older sound devices.

Thanks to features such as device sharing among applications, most new applications come with ALSA support built in, and many existing applications are converting from older audio frameworks.

3.3 Linux user mode multimedia support

The Linux kernel community tries to move as many components as possible into user space. On the one hand this approach brings a number of advantages like easier debugging, faster development, and increased stability. On the other hand, user space solutions can suffer from problems such as reduced flexibility, a lack of transparency, or lower performance due to increased overhead.

¹ Note the variety in spelling: depending on the author and the context, Video for Linux Two is also referred to as Video4Linux 2 or just Video4Linux.

Nevertheless, the gains seem to outweigh the drawbacks, which is why a lot of effort has gone into the development of user space multimedia frameworks. Depending on the point of view, the fact that there is a variety of such frameworks available can be seen as a positive or negative outcome of this trend. The lack of a single common multimedia framework undoubtedly makes it more difficult for application developers to pick a basis for their software. The available choices range from simple media decoding libraries to fully grown network-oriented and pipeline-based frameworks.

For the rest of this section we will present two of what we consider the most promising frameworks available today, GStreamer and NMM. The latter is still relatively young and therefore not as widespread as GStreamer, which has found its way into all current Linux distributions, albeit not always in its latest and most complete version. Both projects are available under open source licenses (LGPL and LGPL/GPL combined, respectively).

3.3.1 GStreamer

GStreamer can be thought of as a rather generic multimedia layer that provides solid support for pipeline-centric primitives such as elements, pads, and buffers. It bears some resemblance to Microsoft DirectShow, which has been the center of Windows multimedia technology for many years now.

The GStreamer architecture is strongly plugin-based, i.e. the core library provides basic functions like capability negotiation, routing facilities, or synchronization, while all input, processing, and output is handled by plugins that are loaded on the fly. Each plugin has an arbitrary number of so-called pads. Two elements can be linked by their pads, with the data flowing from the source pad to the sink pad.

A typical pipeline consists of one or more sources that are connected via multiple processing elements to one or more sinks. Figure 3.1 shows a very simple example.

Figure 3.1: A simple GStreamer pipeline that plays an MP3 audio file on the default ALSA output device. The mad plugin decodes the MP3 data that it receives from the file source and sends the raw audio data to the ALSA sink.

Table 3.1 lists a few plugins for each category. Source elements are characterized by the fact that they only have source pads, sink elements only have sink pads, and processing elements have at least one of each.

Sources        Processing        Sinks
filesrc        audioresample     udpsink
alsasrc        identity          alsasink
v4l2src        videoflip         xvimagesink

Table 3.1: An arbitrary selection of GStreamer source, processing, and sink plugins.

3.3.2 NMM

NMM stands for Network-Integrated Multimedia Middleware and, as the name already suggests, it tightly integrates network resources into the streaming process. By doing so NMM sets a counterpoint to most other multimedia frameworks, which take a machine-centric approach where input, processing, and output usually all happen on the same machine.

Let us look at two common examples of how today's multimedia software interacts with the network:

1. Playback of a file residing on a file server in the network

2. Playback of an on-demand audio or video stream coming from the network

1. Playback of a network file  From the point of view of a player application, this is the easiest case because it is almost entirely transparent to the application. The main requirement is that the underlying layers (operating system or desktop environment) know how to make network resources available to their applications in a manner that resembles access to local resources as closely as possible. There are different ways this can be realized, e.g. in kernel mode or user mode, but all of these are classified under the name of a virtual file system.

As an example, an application can simply open a file path such as \\192.168.0.10\media\clip.avi (a UNC path for a Windows file server resource) or sftp://192.168.1.2/home/mrubli/music/clip.ogg (a generic URL for a secure FTP resource as used by many Linux environments). The underlying layers make sure that all the usual input/output functions work the same on these files as on local files. So apart from supporting the syntax of such network paths, the burden is not on the application writer.


2. Playback of an on-demand stream  Playing back on-demand multimedia streams has been made popular by applications such as RealPlayer or Windows Media Player. The applications communicate with a streaming server via partially proprietary protocols based on UDP or TCP. The burden of flow control, loss detection, and loss recovery lies entirely on the application's shoulders. Apart from that, the client plays a rather passive role by just processing the received data locally and exercising relatively little control over the provided data flow. It is usually limited to starting or stopping the stream and jumping to a particular location within the stream. In particular, the application has no way of actively controlling remote devices, e.g. the zoom factor of the camera from which the video stream originates.

Note how there is no transparency from the point of view of the streaming client. It requires deep knowledge of different network layers and protocols, which strongly reduces platform independence and interoperability.

NMM tries to escape this machine-centric view by providing an infrastructure that makes the entire network topology transparent to applications using the framework. The elements of the flow graph can be distributed within a network without requiring the application to be aware of this fact.

This allows applications to access remote hardware as if it were plugged into the local computer. An application can change channels on a remote TV tuner card or control the zoom level of a digital camera connected to a remote machine. The NMM framework abstracts all these controls and builds communication channels that reliably transmit data between the involved machines.

The website of the NMM project [16] lists a number of impressive examples of the software's capabilities. One of them can be seen in figure 3.2. The photo is from an article that describes the setup of a video wall in detail [13].

3.4 Current discussion

Over the years many video device drivers have been developed by many different people. Each one of these developers had their own vision of what a driver should or should not do. While the V4L2 API specifies the syntax and semantics of the function calls that drivers have to implement, it does not provide much help in terms of higher-level guidance, therefore leaving room for interpretation.

The classic example where different people have different opinions is the case of video formats and whether V4L2 drivers should include support for format conversion. Some devices provide uncompressed data streams whereas others offer compressed video data in addition to uncompressed formats. Not every application, however, may be able to process compressed data, which is why certain driver writers have included decompressor modules in their drivers. In the case of a decompressor-enabled driver, format conversion can occur transparently if an application asks for uncompressed data but the device provides only compressed data. This guarantees maximum compatibility and allows applications to focus on their core business: processing or displaying video data.

Figure 3.2: Video wall based on NMM. It uses two laptop computers to display one half of a video each and a third system that renders the entire video.

Other authors take the view that decompressor modules have no place in the kernel and base their opinion partly on ideological and partly on technical reasons, such as the inability to use floating point mathematics in kernel space. Therefore, for an application to work with devices that provide compressed data, it has to supply its own decompressor module, possibly leading to code (and bug) duplication unless a common library is used to carry out such tasks.

We will see the advantages and disadvantages of both approaches, together with possible solutions (existing and yet to be written), in more detail in the next chapter. What both sides have in common is the view that the main task of a multimedia framework is to abstract the device in a high-level manner, so that applications need as little a priori knowledge as possible of the nature, brand, and model of the device they are talking to.


Chapter 4

Current state of Linux webcam support

4.1 Introduction

In the previous chapter we saw a number of components involved in getting multimedia data from the device to the user's eyes and ears. This chapter will show how these components are linked together in order to support webcams. We will find out what exactly they do and don't do and what the interfaces between them look like. After this chapter readers should understand what is going on behind the scenes when a user opens their favorite webcam application, and they should have enough background to understand the necessity of the enhancements and additions that were part of this project.

4.1.1 Webcams and audio

With the advent of USB webcams, vendors started including microphones in the devices. To the host system these webcams appear as two separate devices, one being the video part, the other being the microphone. The microphone adheres to the USB Audio Class standard and is available to every host that supplies a USB audio class driver. On Linux, this driver is called snd-usb-audio and exposes recognized device functions as ALSA devices.

Due to the availability of the Linux USB audio class driver there was no particular need for us to concentrate on the audio part of current webcams, as they work out of the box. For this reason, and because Video4Linux does not (need to) know about the audio part of webcams, audio will only come up in the remainder of this report when it requires particular attention.


4.2 V4L2: Video for Linux Two

Video for Linux was already briefly introduced in section 3.2.1, where we saw the evolution from the first video device drivers into what is today known as Video for Linux Two, or just V4L2. This section focuses on the technical aspects of this subsystem.

4.2.1 Overview

In a nutshell, V4L2 abstracts different video devices behind a common API that applications can use to retrieve video data without being aware of the particularities of the involved hardware. Figure 4.1 shows a schematic of the architecture.

Figure 4.1: Simplified view of the components involved when a V4L2 application displays video. The dashed arrows indicate that there are further operating system layers involved between the driver and the hardware. The gray box shows which components run in kernel space.

The full story is a little more complicated than that. For one thing, V4L2 not only supports video devices but also related subdevices like audio chips integrated on multimedia boards, teletext decoders, or remote control interfaces. The fact that these subdevices have relatively little in common makes the job of specifying a common API difficult.

The following is a list of device types that are supported by V4L2 with, where available, a few examples:

• Video capture devices (TV tuners, DVB decoders, webcams)

• Video overlay devices (TV tuners)

• Raw and sliced VBI input devices (Teletext, EPG, and closed captioning decoders)

• Radio receivers (Radio tuners integrated on some TV tuner cards)

• Video output devices


In addition, the V4L2 specification talks about codecs and effects, which are not real devices but virtual ones that can modify video data. However, support for these was never implemented, mostly due to disagreement over how they should be implemented, i.e. in user space or kernel space.

The scope of this project encompasses only the first category of the above list, video capture devices. Even though the API was originally designed with analog devices in mind, webcam drivers also fall into this category. It is also the category with by far the greatest number of devices, drivers, and practical applications.

4.2.2 The API

Due to its nature as a subsystem that communicates both with kernel space components and user space processes, V4L2 has two different interfaces: one for user space and one for kernel space.

The V4L2 user space API

Every application that wishes to use the services that V4L2 provides needs a way to communicate with the V4L2 subsystem. This communication is based on two basic mechanisms: file I/O and ioctls.

Like most devices on Unix-like systems, V4L2 devices appear as so-called device nodes in a special tree within the file system. These device nodes can be read from and written to in a similar manner as ordinary files. Using the read and write system calls is one of two ways to exchange data between video devices and applications. The other is the use of mapped memory, where kernel space buffers are mapped into an application's address space to eliminate the need to copy memory around, thereby increasing performance.
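The zero-copy idea behind memory mapping can be demonstrated with a generic, non-V4L2 sketch: a file-backed shared mapping gives the process a direct view of the same pages the kernel uses for the file, so writes through the mapping become visible to ordinary reads without an explicit copy. V4L2's mmap streaming applies the same mechanism to the driver's capture buffers (this example uses a temporary file as a stand-in for such a buffer):

```python
import mmap
import os
import tempfile

# Create a small file to stand in for a kernel-managed capture buffer.
fd, path = tempfile.mkstemp()
os.ftruncate(fd, 4096)

# Map it into our address space and write through the mapping.
with mmap.mmap(fd, 4096, mmap.MAP_SHARED) as buf:
    buf[:11] = b"frame bytes"

# An ordinary read sees the data: both views share the same pages.
print(os.pread(fd, 11, 0))           # b'frame bytes'
os.close(fd)
os.unlink(path)
```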

Ioctls are a way for an application and a kernel space component to communicate without the usual read and write system calls. While ioctls are not used to exchange large amounts of data, they are an ideal means to exchange control commands. In V4L2, everything that is not reading or writing of video data is accomplished through ioctls¹.
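The flavor of an ioctl call can be shown with a generic example that needs no video hardware: FIONREAD asks a file descriptor how many bytes are waiting to be read. V4L2's VIDIOC_* ioctls follow the same request-plus-argument pattern but require an actual device node:

```python
import array
import fcntl
import os
import termios

# A pipe with five bytes queued in it.
r, w = os.pipe()
os.write(w, b"hello")

# ioctl(fd, FIONREAD, &n): a control request, not a read or a write.
n = array.array("i", [0])
fcntl.ioctl(r, termios.FIONREAD, n)
print(n[0])                          # 5
os.close(r)
os.close(w)
```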

The V4L2 API [5] defines more than 50 such ioctls, ranging from video format enumeration to stream control. The fact that the entire V4L2 API is based on these two relatively basic elements makes it quite simple. That simplicity does, however, come with a few caveats, as we will see later on when we discuss the shortcomings of the current Linux video architecture.

The V4L2 kernel interface

The user space API is only one half of the V4L2 subsystem. The other half consists of the driver interface that every driver abstracting a device for V4L2 must implement.

¹ In the case of memory-mapped communication, or mmap, even the readiness of buffers is communicated via ioctls.


Obviously kernel space does not know the same abstractions as user space, so in the case of the V4L2 kernel interface all exchange is done through standard function calls. When a V4L2 driver loads, it registers itself with the V4L2 subsystem and gives it a number of function addresses that are called whenever V4L2 needs something from the driver, usually in response to a user space ioctl or read/write system call. At each callback the driver carries out the requested action and returns a value indicating success or failure.

The V4L2 kernel interface does not specify how drivers have to work internally because the devices that these drivers talk to are fundamentally different. While webcam drivers usually communicate with their webcams through the USB subsystem, other drivers find themselves accessing the PCI bus to which TV tuner cards are connected. Therefore, each driver depends on its own set of kernel subsystems. What makes them V4L2 drivers is the fact that they all implement a small number of V4L2 functions.
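The registration-and-callback pattern described above can be sketched as a toy model. The real interface consists of C structures of function pointers; every name in this Python sketch is invented for illustration:

```python
# The "subsystem": a registry mapping device nodes to callback tables.
_v4l2_devices = {}

def v4l2_register(node, ops):
    """A driver hands the subsystem its table of entry points."""
    _v4l2_devices[node] = ops

def v4l2_ioctl(node, request, *args):
    """The subsystem dispatches a user request to the right driver."""
    handler = _v4l2_devices[node].get(request)
    if handler is None:
        return "ENOTTY"              # request not supported by this driver
    return handler(*args)

# A minimal "driver" implementing two callbacks.
mycam_ops = {
    "querycap": lambda: {"driver": "mycam", "card": "Example Webcam"},
    "s_fmt": lambda fmt: f"format set to {fmt}",
}
v4l2_register("/dev/video0", mycam_ops)

print(v4l2_ioctl("/dev/video0", "querycap")["card"])   # Example Webcam
print(v4l2_ioctl("/dev/video0", "s_fmt", "YUYV"))      # format set to YUYV
```

The point the sketch makes is that the subsystem itself stays generic: it only forwards requests, while all device-specific behavior lives behind the registered callbacks.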

4.2.3 Summary

We have seen that the V4L2 subsystem itself is a rather thin layer that provides a standardized way for video applications and video device drivers to communicate. Compared to other platforms, where the multimedia subsystems have many additional tasks like converting between formats and managing data flow, clocks, and pipelines, the V4L2 subsystem is rather low-level and focused on its core task: the exchange of video data and control commands.

4.3 Drivers

This section presents five drivers that are in one way or another relevant to the Logitech QuickCam series of webcams. All of them are either V4L1 or V4L2 drivers and available as open source.

4.3.1 The Philips USB Webcam driver

The Philips USB Webcam driver, or simply PWC, has a troubled history and has caused a lot of discussion and controversy in the Linux community.

The original version of the driver was written by a developer known under the pseudonym Nemosoft as a project he did with the support of Philips. At the time there was no USB 2.0, so video compression had to be applied to video streams above a certain data rate. These compression algorithms were proprietary, and Philips did not want to release them as open source. Therefore, the driver was split into two parts: the actual device driver (pwc), which supported the basic video modes that could be used without compression, and a decompressor module (pwcx), which attached to the driver and enabled the higher resolutions. Only the former was released in source code; the decompressor module remained available only in binary form. The pwc driver eventually made it into the official kernel, but the pwcx module had to be downloaded and installed separately.

In August 2004, the maintainer of the Linux kernel USB subsystem, Greg Kroah-Hartman, decided to remove the hook that allowed the pwcx module to attach to the video stream. The reason he gave was that the kernel is licensed under the GPL and such functionality is considered a violation of it.

As a reaction, Nemosoft demanded that the pwc driver be removed entirely from the kernel because he felt that his work had been crippled and he did not agree with the way the situation was handled by the kernel maintainers. Much of the history can be found in [1] and the links in the article.

Only a few weeks later, Luc Saillard published a pure open source version of the driver after having reverse-engineered large parts of the original pwcx module. Ever since, the driver has been under continuous development and was even ported to V4L2.

The driver works with many Philips-based webcams from different vendors, among them a number of Logitech cameras. The complete list of Logitech USB PIDs compatible with the PWC driver can be found in appendix A.

4.3.2 The Spca5xx Webcam driver

The name of the Spca5xx Webcam driver is a little misleading because it suggests that it only works with the Sunplus SPCA5xx series of chipsets. While that was true at one time, Michel Xhaard has developed the Spca5xx driver into one of the most versatile Linux webcam drivers in existence today. Next to the mentioned Sunplus chipsets, it supports a number of others from manufacturers such as Pixart, Sonix, Vimicro, or Zoran. The (incomplete) list of supported cameras at [23] contains more than 200 cameras, and the author is working on support for additional chipsets.

The main drawback of the Spca5xx driver is the fact that it does not support the V4L2 API yet. This limitation, and the way the driver has quickly grown over time, are the main reasons why the author has recently started rewriting the driver from scratch, this time based on V4L2 and under the name of gspca.

Among the many supported cameras on the list there is a fair number of Logitech's older camera models as well as some newer ones. Again, appendix A has a list of these devices.

4.3.3 The QuickCam Messenger & Communicate driver

This driver supports a relatively small number of cameras, notably a few models of the QuickCam Messenger, QuickCam Communicate, and QuickCam Express series. They are all based on the STMicroelectronics 6422 chip. The driver supports only V4L1 at the time of this writing and can be found at [14].


4.3.4 The QuickCam Express driver

Another relatively limited V4L1 driver, [19] focuses on the Logitech QuickCam Express and QuickCam Web models, which contain chipsets from STMicroelectronics' 6xx series. It is still actively maintained, although there are no signs yet of a V4L2 version.

4.3.5 The Linux USB Video Class driver

Robot contests have been the starting point for many an open source software project. The Linux UVC driver is one of the more prominent examples. It was developed in 2005 by Laurent Pinchart because he needed support for the Logitech QuickCam for Notebooks Pro camera that he was planning to use for his robot. The project quickly earned a lot of interest among Linux users who tried to get their cameras to work. Driven by both personal and community interest, the driver has left the status of a hobby project behind and is designated to become the official UVC driver of the Linux kernel.

Since this driver is one of the cornerstones of this project, we will give a basic overview of it here. Later, in section 6.1, we shall discuss extensions and changes that were made to support the Linux webcam infrastructure. The official project website can be found at [17].

Technical overview

The Linux UVC driver, or uvcvideo for short, is a Video4Linux 2 driver and a USB driver at the same time. It registers with the USB stack as a handler for devices of the UVC device class and, whenever a matching device is connected, the driver initializes the device and registers it as a V4L2 device. Let us now look at a few tasks and aspects of the UVC driver in the order in which they typically occur.

Device enumeration  The first task of any USB driver is to define a criteria list for the operating system so that the latter knows which devices the driver is willing and able to handle. We saw in section 2.3.3 that some Logitech cameras do not announce themselves as UVC devices even though they are capable of the protocol. For this reason, uvcvideo includes a hard-coded list of product IDs of such devices in addition to the generic class specifier.
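The matching logic can be sketched as follows. Here 0x0E is the USB video class code and 0x046D is Logitech's vendor ID; the two product IDs in the quirk list are placeholders, not the actual list from the driver:

```python
USB_CLASS_VIDEO = 0x0E

# Hard-coded (vendor ID, product ID) pairs for cameras that speak UVC
# but do not announce the class. These two PIDs are hypothetical.
QUIRK_LIST = {(0x046D, 0x08C1), (0x046D, 0x08C2)}

def uvcvideo_matches(vid, pid, dev_class):
    """Accept a device if it claims the UVC class or is on the quirk list."""
    return dev_class == USB_CLASS_VIDEO or (vid, pid) in QUIRK_LIST

print(uvcvideo_matches(0x1234, 0x0001, 0x0E))   # True: generic class match
print(uvcvideo_matches(0x046D, 0x08C1, 0xFF))   # True: quirk list match
print(uvcvideo_matches(0x1234, 0x0001, 0xFF))   # False: not a UVC device
```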

Device initialization  As soon as a supported device is discovered, the driver reads and parses the device's control descriptor and, if successful, sets up the internal data structures for units and terminals before it finally registers the camera with the V4L2 subsystem. At this point the device becomes visible to user space, usually in the form of a device node, e.g. /dev/video0.

Stream setup and streaming  If a V4L2 application requests a video stream, the driver enters the so-called probe/commit phase to negotiate the parameters of the video stream. This includes setting attributes like the video data format, frame size, and frame rate. When the driver finally receives video data from the device, it must parse the packets, check them for errors, and reassemble the raw frame data before it can send a frame to the application.

Controls  Video streaming does not only consist of receiving video data from the device; applications can also use different controls to change the settings of the camera or the properties of the video stream. These control requests must be translated from the V4L2 requests that the driver receives into UVC requests understood by the device. This process requires some mapping information because the translation is all but obvious. We will have a closer look at this problem and how it can be solved later on.
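As an illustration of what such mapping information looks like, the sketch below ties two V4L2 control IDs to the corresponding UVC processing-unit selectors. The numeric constants are the values from the respective specifications; the table layout and the helper function are invented for illustration:

```python
V4L2_CID_BRIGHTNESS = 0x00980900     # V4L2_CID_BASE + 0
V4L2_CID_CONTRAST   = 0x00980901     # V4L2_CID_BASE + 1

CONTROL_MAP = {
    # V4L2 control ID -> (UVC processing-unit selector, payload size)
    V4L2_CID_BRIGHTNESS: (0x02, 2),  # PU_BRIGHTNESS_CONTROL
    V4L2_CID_CONTRAST:   (0x03, 2),  # PU_CONTRAST_CONTROL
}

def to_uvc_set_cur(v4l2_id, value):
    """Translate a V4L2 control write into a UVC SET_CUR payload."""
    selector, size = CONTROL_MAP[v4l2_id]
    # UVC control payloads are little-endian byte strings.
    return selector, value.to_bytes(size, "little", signed=value < 0)

print(to_uvc_set_cur(V4L2_CID_BRIGHTNESS, 200))  # (2, b'\xc8\x00')
```

A real mapping additionally has to record which unit or terminal a control lives in and how value ranges are scaled, which is exactly the kind of information the driver cannot derive automatically.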

Outlook

For obvious reasons V4L2 cannot support all possible features that the UVC specification defines. The driver thus needs to take measures that allow user space applications to access such features nonetheless. In section 6.1 we shall see one such example, which was realized with the help of the sysfs virtual file system and is about to be included in the project.

It is safe to say that the Linux USB Video Class driver is going to be the most important Linux webcam driver for the foreseeable future. Logitech is already moving all cameras onto the UVC track, and other vendors are expected to follow, given that UVC is a Windows Vista logo requirement. For Linux users this means that all these cameras will be natively supported by the Linux UVC driver.

4.4 Applications

4.4.1 V4L2 applications

Ekiga

Ekiga is a VoIP and video conferencing application that supports SIP and H.323, which makes it compatible not only with applications such as NetMeeting but also with conferencing hardware that supports the same standards. It comes with plugins for both V4L1 and V4L2 and is therefore able to support a large number of different webcams.

Given its resemblance to other popular conferencing software, Ekiga is one of the main applications for webcams on Linux. It is licensed under the GPL; documentation, sources, and binary packages can be downloaded from [18].

luvcview

This tool was developed by the author of the Spca5xx driver with the intention of supporting some features unique to the Linux UVC driver, hence its name.


Figure 4.2: The main window of Ekiga during a call.

Thanks to its simplicity it has become one of the favorite programs for testing whether a newly installed camera works. It is based on V4L2 for video input and the SDL library for video output. The simple user interface allows basic camera controls to be manipulated, including some of the custom controls that the UVC driver provides to enable mechanical pan/tilt for the Logitech QuickCam Orbit camera series.

The latest version includes a patch that was written during this project to help with the debugging of camera and driver issues. It allows the raw data received from the device to be easily saved into files with the help of command line options. luvcview can be downloaded from [22].

Figure 4.3 shows a screenshot of the luvcview user interface and the command line used to start it in the background.

fswebcam

This nifty application is proof that not all webcam software needs a GUI to be useful. Purely command line based, it can be used to retrieve pictures from a webcam and store them in files, e.g. for uploading them to a web server at regular intervals. The fswebcam website can be found at [9].


Figure 4.3: The window of luvcview and the console used to start it in the background.

4.4.2 V4L applications

Camorama

Camorama is a V4L1-only application made for taking pictures either manually or at specified intervals. It can even upload the pictures to a remote web server. Camorama allows adjusting the most common camera controls and includes a number of video filters, some of which don't seem very stable, though. It can be downloaded from [11] and is part of many Linux distributions. Unfortunately, development seems to stand still at the moment. Figure 4.4 shows Camorama in action.

4.4.3 GStreamer applications

There are many small multimedia applications that use the GStreamer engine as a back-end but only a relatively small number of prominent ones. The most used ones are probably Amarok, the default KDE music player, and Totem, GNOME's main media player. At the moment Amarok is limited to audio, although video support is being discussed.

What makes Totem interesting from the point of view of webcam users is a little webcam utility called Vanity. Unfortunately, it has received very little attention from both developers and users, and it remains to be seen whether the project will be revived or even integrated into Totem.


Figure 4.4: Camorama streaming at QVGA resolution from a Logitech QuickCam Messenger camera using the Spca5xx driver.

We will see another webcam application based on GStreamer in the next chapter when we look at the software that was developed for this project. At that point we shall also see how GStreamer and V4L2 work together.

4.5 Problems and design issues

As with every architecture, there are a number of drawbacks, some of which were briefly hinted at in the previous sections. We will now look at these issues in more detail and see what their implications for webcam support on the Linux platform are. At the same time we will look at possible solutions to these problems and at how other platforms handle them.

4.5.1 Kernel mode vs. user mode

The discussion whether functionality X should be implemented in user mode or in kernel mode is an all-time classic in the open source community, particularly around the Linux kernel. Unfortunately, these discussions are oftentimes far from conclusive, leading to slower progress in the implementation of certain features or, in the worst case, to effectively discontinued projects due to lack of consensus and acceptance.

Table 4.1 shows the most notable differences between kernel mode and user mode implementations of multimedia functionality. While the points are focused on webcam applications, many of them can also be applied to other domains like audio processing or even to devices completely unrelated to multimedia. In the following we will analyze these different points and present possible solutions and workarounds.

Kernel space:
  + Transparency for user space
  + Direct device access
  + Device works "out of the box"
  - No floating point math
  - Complicated debugging
  - Open source only
  - No callback functions

User space:
  + Simple upgrading
  + Simple debugging
  + Safer (bugs only affect one process)
  + More flexible licensing
  - Difficult to establish a standard
  - Requires a flexible kernel back-end

Table 4.1: Kernel space vs. user space software development

Format transparency

One of the main problems in multimedia applications is the myriad of formats that are in use. Different vendors use different compression schemes for a number of reasons: licensing and implementation costs, memory and processing power constraints, backward compatibility, and personal or corporate preference. For application developers it becomes increasingly difficult to stay current on which devices use which formats and to support them all. In some cases, as with the cameras using the PWC driver, it may even be impossible to integrate certain algorithms for legal reasons. This is a strong argument for hiding the entire format conversion layer from the application, so that every application only needs to support a very small number of standard formats to remain compatible with all hardware and drivers.

A typical example is the way the current Logitech webcam drivers for Windows are implemented. While the devices usually provide two formats, compressed MJPEG and uncompressed YUY2, applications get to see neither of them. Instead, they are offered the choice between I420 and 24-bit RGB, with the latter being especially easy to process because each pixel is represented by a red, a green, and a blue 8-bit color value. These formats are provided independently of the mode in which the camera is being used. For example, if the camera is streaming in MJPEG mode and the capturing software requests RGB data, the driver uses its internal decompressor module to convert the JPEG data coming from the camera into uncompressed RGB. The capturing software is not aware of this process and does not need to have its own JPEG decoder; one nontrivial module less to implement.

At which layer this format conversion should happen depends on a number of factors of both technical and historical nature. Traditionally, Windows and Linux have seen different attempts at multimedia frameworks, and many of them have only survived because their removal would break compatibility with older applications still relying on these APIs. If vendors and driver developers are interested in supporting these outdated frameworks, they may need to provide format filters for each one of them in the case of a proprietary streaming format. If, however, the conversion takes place in the driver itself, all frameworks can be presented with a standard format that they are guaranteed to understand. This can greatly simplify development by concentrating the effort on a single driver instead of on different framework components.

There are also performance considerations when deciding on which level a conversion should take place. If two or more applications want to access the video stream of a camera at the same time, they will create as many different pipelines as there are applications. If the format conversion, or any other computationally intensive process, is done in the user space framework, the same process has to be carried out in the pipeline of each application because there is no way for the applications to share the result. This has the effect of multiplying the required work, which leads to poor scalability. In the opposite case, where the conversion process is carried out before the stream is multiplexed, the work is done just once, in the driver, and all the frameworks receive the processed data as input, thereby significantly reducing the overhead associated with multiple parallel streams.

Feature transparency

Up until now our discussion has focused primarily on format conversion. There exists another category of video processing that is different in a very important way: computer vision. Computer vision is a form of image or video processing with the goal of extracting metadata that enables computers to "see", or at least to recognize certain features and patterns. A few classic examples are face tracking, where the algorithm tries to keep track of the position of one or multiple faces, feature tracking, where the computer locates not only the face but features like the eyes, nose, or mouth, and face recognition, where software can recognize faces it has previously memorized. To see the fundamental difference between computer vision and format conversion modules we first have to look at a basic mechanism of multimedia frameworks: pipeline graph construction.

When an application wants to play a certain media source, it should not have to know the individual filters that become part of the pipeline in order to do so. The framework should automatically build a flow graph that puts the right decoders and converters in the right order. The algorithms that do this are usually based on capability descriptors that belong to each element, combined with priorities to resolve ambiguities. For example, a decoder filter could have a capability descriptor that says "able to parse and decode .mp3 files" and "able to output uncompressed audio/x-wav data". When an application wants to play an .mp3 file, it can simply request a pipeline that has the given .mp3 file as input and delivers audio/x-wav data as output.

In many cases multiple graphs can fulfill the given task, so the graph builder algorithm has to make decisions. Back in our example, there could be two MP3 decoders on the system: one that uses the SIMD instruction set of the CPU if available and one that uses only simple arithmetic. Let us call the first module mp3_simd and assume it has a priority of 100. The default MP3 decoder is called mp3_dec and has a lower priority of 50. Naturally, the graph builder algorithm will first try to build the graph using mp3_simd. If the current CPU supports the required SIMD instructions, the graph construction will succeed. In the opposite case, where the machine lacks SIMD, mp3_simd can refuse to be part of the graph, but the framework can still build a working graph by falling back to the standard decoder, mp3_dec.
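The priority-based fallback just described can be sketched in a few lines of C. The module names and priorities come from the example above; the capability strings and the `available` predicate are deliberate simplifications of the much richer capability structures that real frameworks such as GStreamer use.

```c
#include <stddef.h>
#include <string.h>

/* A minimal model of a framework element: a capability pair, a priority,
 * and a predicate that lets the element refuse to join a graph at build
 * time (e.g. mp3_simd refusing when the CPU lacks SIMD support). */
struct element {
    const char *name;
    const char *input_caps;    /* e.g. "audio/mpeg" */
    const char *output_caps;   /* e.g. "audio/x-wav" */
    int priority;              /* higher is preferred */
    int (*available)(void);    /* may refuse at build time */
};

static int simd_supported;     /* would come from CPU detection */
static int simd_check(void) { return simd_supported; }
static int always(void)      { return 1; }

static struct element registry[] = {
    { "mp3_simd", "audio/mpeg", "audio/x-wav", 100, simd_check },
    { "mp3_dec",  "audio/mpeg", "audio/x-wav",  50, always    },
};

/* Pick the highest-priority element that links the two capability
 * strings and does not refuse; returns NULL if no graph can be built. */
static const struct element *pick(const char *in, const char *out) {
    const struct element *best = NULL;
    for (size_t i = 0; i < sizeof registry / sizeof registry[0]; i++) {
        const struct element *e = &registry[i];
        if (strcmp(e->input_caps, in) != 0 || strcmp(e->output_caps, out) != 0)
            continue;
        if (!e->available())
            continue;
        if (!best || e->priority > best->priority)
            best = e;
    }
    return best;
}
```

With simd_supported cleared, pick falls back to mp3_dec exactly as the text describes; with it set, the higher-priority mp3_simd wins.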

Imagine now an audio quality improvement filter called audio_qual that accepts uncompressed audio/x-wav data as input and outputs the same type of data. How can the application benefit from audio_qual without having to know about it? The graph builder algorithm will always take the simplest graph possible, so it sees no advantage in introducing an additional filter element that, from the algorithm's capability-oriented perspective, is nothing but a null operation. This problem is not easy to solve, because making every audio application aware of the plugin's existence is not always practical.

The case of computer vision is very similar with respect to the pipeline graph creation process. The computer vision module does not modify the data, so the input and output formats are the same and the framework does not see the need to include the element in the graph.

One elegant solution to this problem is to do the processing in kernel mode in the webcam driver, before the data actually reaches the pipeline source. Obviously, this approach can require a format conversion in the driver if the computer vision algorithms cannot work directly on the video format delivered by the camera. The solution presented in the previous section thus becomes not only a performance advantage but a necessity to support certain features transparently for all applications.


Direct device access

Another main advantage of a kernel mode multimedia framework is that the framework has easy access to special features that the device provides. For example, a new camera model can introduce motion control for pan and tilt. If the user mode multimedia framework is not aware of this, or is incapable of mapping these controls onto its primitives, applications running on top of it cannot use these features. This point is also valid for kernel mode frameworks, but it is generally easier to communicate between kernel components than across the barrier between user mode and kernel mode. For an application to communicate with the driver, the framework API is not enough; a special side channel has to be established. The design of such a side channel can turn out to be rather complicated if future reusability is a requirement, because of the difficulty of predicting the features of upcoming devices.

We will see a concrete example of this issue, and a possible solution, later on when we look at how the webcam framework developed as part of this project communicates with the device driver.

Callbacks

Many APIs rely on callbacks to implement certain features, as opposed to polling or waiting on handles. The advantage of this approach is that it has virtually no impact on performance (especially compared to polling) and is much simpler because it does not require the application to use multiple threads to poll or wait.

There are many cases where such notification schemes are useful:

• Notification about newly available or unplugged devices

• Notification about controls whose value has changed, possibly as a result of some built-in device automatism

• Notification about device buttons that have been pressed

• Notification about the success or failure of an action asynchronously triggered by the application (e.g. a pan or tilt request that can take some time to finish)

• Notification about non-fatal errors on the bus or in the driver

Unfortunately, current operating systems provide no way to do direct callbacks from kernel mode to user mode. Therefore, for V4L2 applications to enjoy the comfort of callback notification, a user space component would have to be introduced that wraps the polling or waiting and calls the application whenever an event occurs. In chapter 7 we propose a design that does just that.
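Such a wrapper could look roughly like the following sketch, which runs poll in a worker thread and invokes a user-supplied callback for each event. A pipe file descriptor stands in for the device handle here, since the actual V4L2 event source depends on the driver; all names are illustrative.

```c
#include <poll.h>
#include <pthread.h>
#include <unistd.h>

/* Event callback type; a real wrapper would pass an event structure
 * (control change, button press, ...) instead of a raw byte. */
typedef void (*event_cb)(unsigned char event, void *user);

/* Convenience callback used in the usage example: counts events. */
static void count_cb(unsigned char event, void *user) {
    (void)event;
    ++*(int *)user;
}

struct watcher {
    int fd;            /* device fd; a pipe read end in this sketch */
    event_cb cb;
    void *user;
    pthread_t thread;
    int stop_fd;       /* writing here wakes up and stops the thread */
};

static void *watch_loop(void *arg) {
    struct watcher *w = arg;
    struct pollfd fds[2] = {
        { .fd = w->fd,      .events = POLLIN },
        { .fd = w->stop_fd, .events = POLLIN },
    };
    for (;;) {
        if (poll(fds, 2, -1) < 0)
            break;
        if (fds[0].revents & POLLIN) {      /* drain events first */
            unsigned char ev;
            if (read(w->fd, &ev, 1) == 1)
                w->cb(ev, w->user);         /* callback in worker thread */
            continue;
        }
        if (fds[1].revents & POLLIN)
            break;                          /* stop requested */
    }
    return NULL;
}

static int watcher_start(struct watcher *w, int fd, event_cb cb,
                         void *user, int stop_fd) {
    w->fd = fd; w->cb = cb; w->user = user; w->stop_fd = stop_fd;
    return pthread_create(&w->thread, NULL, watch_loop, w);
}

static void watcher_stop(struct watcher *w, int stop_write_fd) {
    unsigned char b = 0;
    ssize_t n = write(stop_write_fd, &b, 1);
    (void)n;
    pthread_join(w->thread, NULL);
}
```

Because pending events are drained before the stop request is honored, no notification delivered prior to the stop is lost.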

Ease of use

The Linux kernel comes with a variety of features built in, including many drivers that users of other operating systems have to download and install separately. If a certain device works "out of the box", it makes for a good user experience because people can immediately start using the device with their favorite applications. Such behavior is obviously desirable because it frees users from having to compile and install the driver themselves, something that not every Linux user may be comfortable doing.

On the other hand, the disadvantage of such an approach is the limited upgradeability of kernel components. Even though current distributions provide comfortable packaging of precompiled kernels, such an upgrade usually requires rebooting the machine. In comparison, upgrading a user mode application is as easy as restarting it once the application package has been upgraded. In high-availability environments, e.g. in the case of a popular webcam streaming server, the downtime incurred by a reboot can be unacceptable.

Development aspects

For a number of reasons, programming in user mode tends to be easier than programming in kernel mode. Three of these reasons are the variety of development tools, the implications of a software bug, and the comfort of the API.

Traditionally there are many more tools available for developing applications than for developing kernel components. One simple reason is that the development of user space tools is itself easier; another is that there are far more application developers than system developers. There is a large variety of debugging tools and helper libraries out there, but almost none of them are applicable to kernel mode software. The Linux kernel mode developer therefore has to rely mostly on the kernel's built-in tools. While these are very useful, they cannot compare with the comfort of the kernel debugger tools available on the Windows platform.

If a problem occurs in a kernel component, the implications can be manifold. In some cases the entire machine can freeze without so much as a single line of output that would help locate the problem. In less severe cases the kernel manages to write enough useful debug information to the system log and may even continue to run without the component in question. Nevertheless, such an isolated crash often requires a reboot of the test machine because the crashed component can no longer be replaced by a new, and possibly fixed, version. These circumstances inevitably call for two machines, one for development and one for testing. In user mode, an application bug is almost always limited to a single process, and trying out a new version is as easy as recompiling and relaunching the program.

Finally, not all the comfort of the API that application programmers are used to is available in kernel space. Seemingly simple tasks like memory allocation, string handling, and basic mathematics can suddenly become much more complicated. One important difference is that floating point operations are oftentimes not available in kernel mode for performance reasons². One has to resort to algorithms that avoid floating point computations or apply tricks that are unlikely to be well received by the Linux kernel community.

All of these points make the development of multimedia software in user mode much easier, an important consideration given the complexity of the algorithms and subsystems involved.

Licensing

Nothing prevents the writing of closed source software for Linux. As a matter of fact, there is a large number of commercial Linux applications out there that were ported from other operating systems or written from scratch without their source code being released. The GNU General Public License (GPL), under which the Linux kernel and most of the system software is released, does not forbid closed source applications.

The situation for kernel modules, however, is more complicated. Since the GPL requires derived works of a GPL-licensed product to be published under the same terms, most kernel modules are assumed to be derived works, effectively ruling out the development of closed source kernel modules [20].

There seems, however, to be an acceptable way of including a binary module in the Linux kernel. It basically consists of having a wrapper module, itself under the GPL, that serves as a proxy for the kernel functions required by a second module. This second module can be distributed in binary-only form and does not have to adopt the kernel's license because it can no longer be considered a derived work.

Even after sidestepping the legal issues of a binary-only kernel module, a few arguments remain against realizing a project in such a way, notably the lack of acceptance in the community and the difficult maintenance given the large number of different kernel packages that exist. In many cases, the software would have to be recompiled for every minor upgrade and for every flavor and architecture of the supported Linux distributions. This can drastically limit the range of supported platforms.

4.5.2 The Video4Linux user mode library

One solution to most of the problems just described keeps coming up when new and missing features and design issues are discussed on the V4L mailing list: a widely available, open source, user mode library that complements the kernel part of V4L2. Such a library could take over tasks like format conversion, provide a flexible interface for more direct hardware access, and take complexity away from today's applications. At the same time, the kernel part could concentrate entirely on providing the drivers that abstract device capabilities and on making sure that they implement the interfaces required by the V4L library.

² Banning floating point from kernel mode allows the kernel to omit the otherwise expensive saving and restoring of floating point registers when the currently executing code is preempted.


While the approach sounds very promising and would bring the Linux multimedia platform a large step forward, nobody has been willing or able to start such a project. In the meantime, other user mode frameworks like GStreamer or NMM have partly stepped into the breach. Unfortunately, since these frameworks do not primarily target V4L, they are rarely able to abstract all desirable features. The growing popularity of these multimedia architectures, in turn, makes it increasingly harder for a V4L library to become widespread and eventually the tool of choice for V4L2 front-ends. It seems fair to say that the project of the V4L user mode library died long before it even reached the stage of a draft, and that it would take a fair amount of initiative to revive it.

4.5.3 V4L2 related problems

Video4Linux has a number of problems that have their roots partially in the legacy of V4L1 and Unix systems in general, as well as in design decisions that were made with strictly analog devices in mind. For some of them easy fixes are possible; for others, solutions are more difficult.

Input and output

We saw in section 4.2.2 that V4L2 provides two different ways for applications to read and write video data: the standard read and write system calls, and memory-mapped buffers (mmap). Device input and output using the read/write interface used to be, and in some cases still is, very popular, but it is not the technique of choice because it does not allow meta information such as frame timestamps to be communicated alongside the data. This classic I/O-based approach, in turn, has the advantage of enabling every application that supports file I/O to work with V4L2 devices.

While it would be possible for drivers to implement both techniques, some of them choose not to support read/write and mmap at the same time. The uvcvideo driver, for example, does not support the read/write protocol in favor of the more flexible mmap.

The fact that the availability of either protocol depends on the driver in use erodes the usefulness of the abstraction layer that V4L is supposed to provide. To be on the safe side, an application would have to implement both protocols, again something that not all application authors choose to do. Usually their decision depends on the purpose of their tool and the hardware they have access to during development.

The legacy of ioctl

The ioctl system call was first introduced with AT&T Unix Version 7 in the late seventies. It was used to exchange control data that did not fit into the stream-oriented I/O model. The operating system forwards ioctl requests directly to the driver responsible for the device. Let us look at the prototype of the ioctl function to understand where some of the design limitations in V4L2 come from:

int ioctl(int device, int request, void *argp);

There are two properties that stick out for an interface based on this function:

1. There is only one untyped argument for passing data.

2. Every call needs a device handle.

The fact that ioctl provides only one argument for passing data between caller and callee is not a serious technical limitation in practice, and neither is its untypedness. The way this interface is used, however, prevents the compiler from doing any sort of compile-time type checking, leading to possibly hard-to-find bugs if a wrong data type is passed. For developers this also makes for a rather unintuitive interface, since even relatively simple requests require data structures where a few individual arguments of basic types would be simpler.
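The type-checking problem can be demonstrated with a small sketch. Here fake_ioctl stands in for the real system call and the request code is made up; the point is that the untyped pointer accepts any argument without a diagnostic, while a thin typed wrapper restores compile-time checking.

```c
#include <stddef.h>

/* The shape of the V4L2 control structure (simplified). */
struct control_arg {
    unsigned int id;
    int value;
};

/* A stand-in for ioctl: one untyped pointer, so the compiler accepts
 * *any* pointer here -- passing &some_int instead of a
 * struct control_arg * compiles without a warning. */
static int fake_ioctl(int device, int request, void *argp) {
    (void)device;
    (void)request;
    return argp == NULL ? -1 : 0;
}

/* One way out: a thin typed wrapper per request.  0x561B is a made-up
 * request code, not the real VIDIOC_G_CTRL value. */
static int get_ctrl(int device, struct control_arg *arg) {
    return fake_ioctl(device, 0x561B, arg);
}
```

Calling fake_ioctl with a pointer to a plain int compiles cleanly, whereas get_ctrl with the same argument draws a compiler diagnostic — which is precisely the safety the raw interface gives up.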

While the first point is mostly a cosmetic one, the second one imposes a more important limitation on applications: no "stateless" calls to the V4L2 subsystem are possible. Since the operating system requires a device handle to be passed with the ioctl request, the application has no choice but to open the device prior to making the ioctl call. As a consequence, device-independent V4L2 functions are impossible. It is easy to come up with a few occasions where such stateless functions would be desirable:

• Device enumeration. It is currently left to the application to enumerate the device nodes in the /dev directory and filter those that belong to V4L2 devices.

• Device information querying. Unless the driver supports multiple opening of the same device, something that is not trivial to implement because the associated policies have to be carefully thought through, applications have no more information than what the name of the device node itself provides. Currently this is restricted to the device type (video devices are called videoN, radio devices radioN, etc., where N is a number).

• Module enumeration. If the V4L2 system were to provide format conversion and other processing filters, applications would want to retrieve a list of the currently available modules without having to open a device first.

• System capability querying. Similarly, V4L2 capabilities whose existence is independent of a device's presence in the system could be queried without the application having to know which capability was introduced with which kernel version and hardcoding corresponding conditionals.


It is clear that the current API was designed to blend in nicely with the Unix way of communicating between applications and system components. While this keeps the API rather simple from a technical point of view, one has to ask whether it is worth sticking to legacy interfaces that clearly were not, and at the time could not have been, designed to handle all the cases that come up nowadays. Especially for fast advancing areas like multimedia, a less generic but more flexible approach is often desirable.

Missing frame format enumeration

We have mentioned that the current Video4Linux API was designed mostly with analog devices in mind. Analog video devices have a certain advantage over digital ones in that they oftentimes have no constraints as to the video size and frame rate they can deliver. For digital devices, this is different. While the sensors used by digital webcams theoretically provide similar capabilities, these are hidden by the firmware to adapt to the way that digital video data is transmitted and used. So while an analog TV card may very well be capable of delivering an image 673 pixels wide and 187 pixels high, most webcams are not. Instead, they limit the supported resolutions to a finite set, most of them with a particular aspect ratio such as 4:3. Similar restrictions apply to frame rates, where multiples of 5 or 2.5 dominate.

One implication of this is that at the time V4L2 was designed, there was no need to provide applications with a way to retrieve these finite sets. This has peculiar effects at times:

• Many applications are completely unaware of the frame rate and rely on the driver to apply a default value.

• The only way for V4L2 applications to enumerate frame rates is to test them one by one and check whether the driver accepts them.

• Since a one-by-one enumeration of resolutions is impossible due to the sheer number of possible value combinations, applications simply have to live with this limitation and either provide a hardcoded list of resolutions likely to be supported or have the user enter them by hand. Once a selection is made, the application can test the given resolution. To make this process less frustrating than it sounds, V4L2 drivers return the nearest valid resolution if a resolution switch fails. As an example, if an application requests 660x430, the driver is likely to set the resolution to 640x480.
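The nearest-match behavior can be modeled as follows. The resolution table is a hypothetical webcam's mode list, not taken from any real device; drivers implement the equivalent logic inside their VIDIOC_S_FMT and VIDIOC_TRY_FMT handling.

```c
struct res {
    int w, h;
};

/* A hypothetical webcam's discrete resolution set (4:3 modes). */
static const struct res modes[] = {
    { 160, 120 }, { 320, 240 }, { 352, 288 }, { 640, 480 }, { 1280, 960 },
};

/* Return the supported mode closest to the request, mimicking how a
 * driver snaps an unsupported resolution request to a valid one. */
static struct res nearest_mode(int w, int h) {
    const struct res *best = &modes[0];
    long best_d = -1;
    for (unsigned i = 0; i < sizeof modes / sizeof modes[0]; i++) {
        long dw = modes[i].w - w, dh = modes[i].h - h;
        long d = dw * dw + dh * dh;     /* squared Euclidean distance */
        if (best_d < 0 || d < best_d) {
            best_d = d;
            best = &modes[i];
        }
    }
    return *best;
}
```

With this table, a request for 660x430 snaps to 640x480, matching the example in the text.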

We shall see in 6.2 how this severe limitation was removed by enhancing the V4L2 API.

Control value size

Another limitation that is likely to become a severe problem in the future is the structure that V4L2 uses to get and set the values of device controls:


struct v4l2_control {
	__u32 id;	/* Identifies the control. */
	__s32 value;	/* New value or current value. */
};

The value field is limited to 32 bits, which is satisfactory for most simple controls but not for more complex ones. This has already given rise to the recent introduction of extended controls (see the VIDIOC_G_EXT_CTRLS, VIDIOC_S_EXT_CTRLS, and VIDIOC_TRY_EXT_CTRLS ioctls in [5]), which allow applications to group several control requests and provide some room for extension.

We will come back to this issue at the beginning of chapter 5 when we discuss the goals of our webcam framework.

Lack of current documentation

The last problem we want to look at is unfortunately not limited to V4L2 but affects a wide range of software products, especially in the non-commercial and open source sector: poor documentation.

The V4L2 documentation is split into two parts, an API specification for application programmers [5] and a driver writer's guide [4]. While the former is mostly complete and up-to-date, the latter is completely outdated, of little help except for getting a first overview, and gives no guidelines on how to implement a driver and what to watch out for.

The main source of information on how to write a V4L2 driver is therefore the source code of existing drivers. The lack of a reference driver does not make the choice easy, though, and some poorly written drivers exist out there. Moreover, there is little documentation available on what the V4L2 subsystem actually does and does not do. Again, delving into the source code is the best and only way to get answers.

This lack of starting points for developers is likely one of the biggest problems of V4L2 at the moment. It sets the threshold for newcomers quite high and makes it hard for established developers to find common guidelines to adhere to, something that in turn prevents code sharing and modularization of common features.

As part of this project the author has tried to set a good example by properly documenting the newly added frame format enumeration features and providing a reference implementation that demonstrates their usage. One can only hope that the current developers eventually take a little time out of their schedules to document the existing code while the knowledge and recollection are still there.

Stream synchronization

There is one important aspect normally present in multimedia frameworks that all applications known to the author have blissfully ignored without any obviously bad consequences: synchronization of multimedia streams.


Whenever a computer processes audio and video inputs simultaneously, there is an inevitable tendency for the two streams to slowly drift apart as they are recorded. This has numerous reasons and there are different strategies to reduce the problem, many of which are explained in [12], an excellent article by the author of VirtualDub, an extremely popular video processing utility for Windows.

The fact that no bad consequences can be observed with current Linux webcam software does not mean, however, that the problem does not exist on the Linux platform. The problem only becomes apparent when videos are recorded that include an audio stream, and none of the common applications seem to do that yet. Once this has changed, applications will need to find a way to keep the video and audio streams from drifting apart. V4L2 on its own cannot prevent this because it has no access to the audio data.

Despite all these problems, Linux has a functioning webcam platform today. It is only a matter of time and effort to resolve the remaining issues one by one. The next chapter is a first step in that direction, as it provides some ideas and many real solutions.


Chapter 5

Designing the webcam infrastructure

5.1 Introduction

Having seen all the relevant requirements for operating a webcam on Linux, we can finally discuss what our webcam framework looks like. This chapter treats the ideas and goals behind the project, how we have tackled the difficulties, and why the solution looks the way it does today.

We will present all the components involved in a high-level manner and save the technical details for the two following chapters. To conclude, we shall revisit the problems discussed in the previous chapters and summarize how our solution solves them and strives to avoid similar problems in the future. Before doing so, however, we need to be clear about the goals we want to achieve and set priorities. Software engineering without clear goals in mind is almost guaranteed to lose focus of the main tasks over the little things and features.

5.2 Goals

The main goal of the project, enhancing the webcam experience of Linux users, is a rather vague one and does not primarily lend itself as a template for a technical specification. It does, however, entail a number of secondary goals, or means, that fit together to achieve the primary goal. These goals are of a more concrete nature and can be broken down into technical or environmental requirements.

Apart from the obvious technical challenges that need to be solved, there is another group of problems that are less immediate but must nevertheless be carefully considered: business and legal decisions. When a company engages with open source software, conflicts inevitably arise, usually between the protection of intellectual property and the publishing of source code. Their consideration has played an important role in defining the infrastructure of the webcam framework and we will return to the topic when discussing the components affected by it.

Let us now look at the different goals one by one and at how they were achieved.

A solution that works As trivial as it may sound, the solution should work. Not only on the small selection of systems that happens to be supported by the developer, but on as broad a system base as possible and for as many users as possible. Nothing is more frustrating for a user than downloading a program just to find out that it does not work on their system.

Unfortunately, limiting the system base to a certain degree cannot always be avoided, for practical and technical reasons. The practical reasons stem mostly from the fact that it is impossible to test the software on every system combination out there. Many different versions of the kernel can be combined with just as many different versions of the C runtime library. On the technical side, there is an entire list of features that a given solution is based on and without which it cannot work properly. The size of the supported system base is therefore a tradeoff between development and testing effort on one side and satisfying as many users as possible on the other.

Making this tradeoff was not particularly difficult for this project, as one of the pillars of the webcam framework already sets a quite strict technical limit. For USB 2.0 isochronous mode to work properly, a Linux kernel of version 2.6.15 or higher is strongly recommended because the USB stack of earlier versions is known to have issues that can cause errors in the communication between drivers and devices. In a similar way, certain features of Video4Linux 2 only became available in recent versions of the kernel, notably the frame format enumeration that we will see in 6.2.

This does not mean, however, that the solution does not work at all on systems that do not meet these requirements; the feature set of the webcam framework on older platforms is just smaller. Everything that does not depend on features of the UVC driver works on kernels older than 2.6.15, and a V4L2 implementation without frame format enumeration prevents only that particular feature from working.

A solution that works best, but not exclusively, with Logitech cameras Parts of the solution we have developed are clearly optimized for the latest Logitech cameras; there is no need to hide this fact. Logitech has invested large amounts of money and time into developing the QuickCam hardware and software. There is a lot of intellectual property contained in the software, as well as some components licensed from third party companies. Even if Logitech wanted to distribute these features in source code form, it would not be legally possible. As a result, these components must be distributed in binary format, and they are designed to work only if a Logitech camera is present in the system, because other cameras do not implement the necessary features.

These binary components are limited to a single dynamic library that is not required for the webcam infrastructure to work. For users this means that there is some extra functionality available if they are using a Logitech camera, but nothing stops them from using the same software with any other UVC compliant camera.

Planning ahead In the fast moving world of consumer electronics it is sometimes hard to predict where technology will lead us a few years from now. Future webcams will have many features that today's software does not know about. It is therefore important to be prepared for such features by designing interfaces in a way that makes them easily extensible to accommodate new challenges.

A typical example of this necessity is the set of values of certain camera controls. Most controls are limited to 32-bit integer values, which is enough for simple controls such as image brightness or camera tilt. One can imagine, however, that certain software supported features could need to transmit chunks of data to the camera that do not fit in 32 bits. Image processing on the host could compute a list of defective pixels that the camera should interpolate in the firmware, or it could transmit region information to help the camera use different exposure settings for foreground and background.

In the provided solution we have avoided fixed-length value limitations wherever possible. Each control can have arbitrarily long values, and all fixed-length strings, often used in APIs for simplicity, have been replaced by variable-length, null-terminated strings. While it is true that this approach is slightly more complicated for all involved parties, it ensures that future features do not run into data width bottlenecks. We have carefully planned the API in a way that puts the burden on the libraries and not on the applications and their developers. For applications, buffer management is mostly transparent, and the enumeration API functions are no different than if fixed-width data had been used.
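One common way to implement such variable-length values, used here purely as an illustration (the function name and error convention are invented, not taken from the actual framework API), is to let the caller pass a buffer plus its size, and to report the required size when the buffer is too small rather than truncating:

```c
#include <string.h>

/* Hypothetical variable-length control read: the caller passes a buffer
 * and its size in *len.  On success *len is set to the bytes actually
 * used; a too-small buffer reports the required size instead of
 * truncating, so the caller can retry with a larger buffer. */
static int get_control_value(const unsigned char *src, unsigned src_len,
                             unsigned char *buf, unsigned *len) {
    if (*len < src_len) {
        *len = src_len;         /* tell the caller how much is needed */
        return -1;
    }
    memcpy(buf, src, src_len);
    *len = src_len;
    return 0;
}
```

This two-call pattern (probe the size, then fetch) is what lets the library rather than the application own the sizing decisions, at the cost of a slightly chattier interface.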

Another example that guarantees future extensibility is the generic access to UVC extension units that we added to the UVC driver. Without such a feature, the driver would need to be updated for every new camera model, the very process that generalizing standards like UVC strive to avoid. The new sysfs interface of the UVC driver gives user mode applications generic raw access to the controls provided by a camera's UVC extension units. Since these extension units are self-descriptive, the driver can retrieve all required information at runtime and need not be recompiled.

There are a few other places where we have planned ahead for future extensions, such as the abstraction layers we are taking advantage of and the modularity of some of the involved modules. These examples will be explained in more detail in the rest of this chapter.


Dealing with current problems A prerequisite for, and at the same time a goal of, this project was solving the problems we saw in chapter 4 in the best manner for everybody. This means that we did not want to further complicate the current situation by introducing parallel systems, but instead help solve these problems so that existing applications can also benefit from the improvements we required for our framework.

Admittedly, it may sometimes seem easier to reinvent the wheel than to improve the wheels already in place, but in the end having a single solution that suits multiple problems is preferable, because a combined effort often achieves a higher quality than two half-baked solutions do. The effects of a developer branching the software out of frustration with the line a project is following can be seen quite often in the open source community. The recent Mambo/Joomla dispute¹ is a typical example where it is doubtful that the split has resulted in an advantage for any of the involved parties.

Let us use the UVC driver as an example to illustrate the situation in the webcam context. Creating our own driver or forking the current one would have made it easier to introduce features that are interesting for Logitech, because we could have changed the interface without discussing the implications with anyone. By doing so, however, both drivers would have received less testing, and it would have been harder to synchronize changes applicable to both branches. Keeping a single driver is a big win for the Linux webcam user and avoids the frustrating situation where two similar devices require two slightly different drivers.

Community acceptance Many Linux projects with a commercial background have received a lukewarm reception from the open source community in the past, sometimes for valid reasons, sometimes out of fear and skepticism. There is no recipe for guaranteed acceptance by the Linux community, but there are a few traps one can try to avoid.

One of the traps that many companies fall into is that they strictly limit use of their software to their own products. Obviously, for certain device classes they may not have any choice; take the example of a graphics board. Fortunately, for the scope of this project, this was relatively easy given that the webcams for which it was primarily designed adhere to the USB Video Class standard. Linux users have every interest in good UVC support, so there were very few negative reactions to Logitech's involvement. The fact that somebody was already developing a UVC driver when we started the project may also have helped convince some of the more suspicious characters out there that it was not our intent to create a software solution that was merely for Logitech's benefit.

Throughout the project we have strived to add features to the UVC driver that we depend on for the best support of our cameras in the most generic way, so that devices of other vendors can take advantage of them. A typical example of this is the support for UVC extensions. While not strictly necessary for streaming video, all additional camera features are built on top of UVC extension units. It can therefore be expected that other vendors will use the same mechanisms as Logitech, so that by the time more UVC devices appear on the market, they will already be natively supported by Linux.

1The open source content management system Mambo was forked in August 2005 after the company that owned the trademark founded a non-profit organization with which many of the developers did not agree. The fork was named Joomla.

Avoid the slowness of democracy This goal may at first seem diametrically opposed to the previous point. The open source community is a democracy where everyone can contribute their opinions, concerns, and suggestions. While this often helps make sure that bad solutions never even end up being realized, it renders the process almost as slow as in politics. For projects with time constraints and full-time jobs behind them, this is less than optimal, so we had to avoid being stalled by long discussions that dissolve without yielding an actual solution.

However, as is so often the case, it can turn out to be more fruitful to confront people with an actual piece of software that they can touch and test. Feedback becomes more concrete, and the limitations become more visible, as do the good points. If a project finds rapid acceptance with users, developers are likely to become inspired and contribute, or eventually use some of the ideas for other projects. We are confident that the webcam framework will show some of the pros as well as the cons that a user mode library brings. Maybe one day somebody will revive the project of a V4L2 user mode library and integrate parts of the webcam framework as a subset of its functionality, because that is where it would ideally lie.

5.3 Architecture overview

With a number of high-level goals in mind, we can start to translate these goals into an architecture of components and specify each component's tasks and interfaces. To start off, let us compare what the component stack looks like with the conventional approach on one side and with the webcam framework on the other.

From section 4.2 we already know how V4L2 interfaces with the UVC driver on one side and the webcam application on the other (figure 5.1a). The stack is relatively simple: all data, i.e. control and video data, flows through V4L2, which does not carry out any processing itself. This approach is used by all current webcam applications and suffers from a few issues identified in section 4.5.

The webcam framework positions itself between the operating system and the application that receives live video from a camera. Figure 5.1b illustrates the different subsystems involved and where the core of the webcam framework is located.

We see that the webcam framework fills a relatively small spot in the entire system, but it is one of the two interfaces that a webcam application uses to communicate with the camera. This leaves the application the flexibility to choose, for every task, the component that performs it best: V4L2 for video streaming and related tasks such as frame format enumeration or stream setup, and the webcam framework for accessing camera controls and advanced features that require more detailed information than what V4L2 provides.

Figure 5.1: Layer schema of the components involved in a video stream with (a) the conventional approach and (b) the webcam framework in action. Note the border between user space and kernel space and how both V4L2 and sysfs have interfaces to either side.

5.4 Components

5.4.1 Overview

Despite what the previous schema suggests, the Linux webcam framework is not a single monolithic component but a collection of different libraries with strictly separated tasks. This modularity ensures that no single component grows too complicated and that the package remains easy to maintain and use. Figure 5.2 gives an overview of the entire framework in the context of the GStreamer and Qt based webcam application, as well as a panel application. Both these applications are provided as part of the package and can be seen in action in chapter 8.

Figure 5.2: Overview of the webcam framework kernel space and user space components. The dashed box shows the three components that use the GStreamer multimedia framework.

In the remainder of this section we will look at all of these components, what their tasks are, and what the interfaces between them look like. While doing so we shall see how they accomplish the goals discussed above.

5.4.2 UVC driver

The UVC driver was already introduced in section 4.3.5, therefore we will only give a short recapitulation at this point. Its key tasks are:

• Supervise device enumeration and register the camera with the system.

• Communicate with the camera using the UVC protocol over USB.

• Verify and interpret the received data.

• Respond to V4L2 requests originating from applications.

• Provide additional interfaces for features not supported by V4L2.

It is the last of these points that makes it a key component in the webcam framework. Conventional webcam drivers oriented themselves towards the features supported by V4L2 and tried to implement these as far as possible. This was not an easy task since the specifications available to the developers were often incomplete or even had to be reverse engineered from scratch. Therefore the necessity to support features unknown to V4L2 rarely arose.

With the USB Video Class standard this is completely different. The standard is publicly available and if both device manufacturers and driver engineers stick to it, compatibility comes naturally. The challenge stems from the fact that the functions described in the UVC standard are not a subset of those supported by V4L2. It is therefore impossible for a Video4Linux application to make use of the entire UVC feature spectrum without resorting to interfaces that work in parallel to the V4L2 API.

For the UVC driver the sysfs virtual file system takes over this role. It provides raw access to user mode software in a generic manner, all of this in parallel to the V4L2 API, which is still used for the entire video streaming part and provides support for a fairly general subset of the camera controls.

5.4.3 V4L2

We have seen previously that Video4Linux has two key tasks relevant to webcams:

• Make the video stream captured by the device driver available to applications.

• Provide image and camera related controls to applications.

V4L2 is good at the first point but it has some deficiencies when it comes to the second one due to its limitation of control values to 32 bits (see section 4.5.3). This is why our scenario does not rely solely on V4L2 for webcam controls but uses the UVC driver's sysfs interface where necessary.


We can see from the figures that V4L2 serves as the interface between user mode and kernel mode. In user mode it takes requests from the application, which it then redirects towards the UVC driver that runs in kernel mode, and vice versa for the replies that originate from the driver and end up in the application.

Another important point is that V4L2 is not limited to talking to one application at a time. As long as the driver supports it (there is no multiplexing done on Video4Linux' part), the same device can be opened multiple times by one or more processes. This is required by the current webcam framework because the video application is not the only component to access the V4L2 device handle. We will see the different access scenarios as we go.

5.4.4 GStreamer

Parts of our webcam framework are built on top of GStreamer because, in our opinion, it is currently the most advanced multimedia framework on the Linux platform. Its integration with the GNOME desktop environment proves that it has reached a respectable degree of stability and flexibility, and Phonon, the multimedia framework of KDE 4, will have a back-end for GStreamer. Together with the ongoing intensive development that takes place, this makes it a safe choice for multimedia applications and is likely to guarantee a smooth integration into future software.

Note that even though, currently, GStreamer is the only framework supported by the Linux webcam framework, plugins for different libraries like NMM can be written very easily. All that needs to be ported in such a case is the lvfilter plugin, the interface between GStreamer and liblumvp. This will become clear as we talk more about the components involved.

There are three elements in the figure that take advantage of the GStreamer multimedia framework. Simply speaking, the box labeled GStreamer is the "application" as far as V4L2 is concerned. Technically speaking, only the GStreamer v4l2src plugin uses the V4L2 API; all other components use techniques provided by the GStreamer library to exchange data.

Figure 5.3 visualizes this by comparing the component overview of a V4L2 application to a GStreamer application that uses a V4L2 video source.

5.4.5 v4l2src

As the name already suggests, this plugin is the source of all V4L2 data that flows through the GStreamer pipeline. It translates V4L2 device properties into pad capabilities and pipeline state changes into V4L2 commands. This is best illustrated by an example. Table 5.1 shows the functions that v4l2src uses and the V4L2 counterparts that they call. Note that v4l2src does not directly process the GStreamer state transitions but is based on the GstPushSrc plugin that wraps those and uses a callback mechanism.

The capability negotiation that is carried out during stream initialization uses the information retrieved from V4L2 function calls like ENUMFMT or


(a) Components involved when a V4L2 application displays video.

(b) Components involved when a GStreamer based application displays V4L2 video.

Figure 5.3: Component overview with and without the use of the GStreamer multimedia framework.

GStreamer   V4L2                Description

start       open                Initialization
get_caps    ENUMFMT, TRY_FMT    Format enumeration
set_caps    S_FMT, STREAMON     Stream setup
create      DQBUF, QBUF         Streaming
...         ...
stop        STREAMOFF, close    Cleanup

Table 5.1: Translation between GStreamer and V4L2 elements and functions.


G_FMT to create a special data description format that GStreamer uses internally to check pads for compatibility. There are two so-called caps descriptors involved in our example: the pad capabilities and the fixed capabilities.

The former is created by enumerating the device features during the get_caps phase. It is a set that contains the supported range of formats, resolutions, and frame rates and looks something like this:

video/x-raw-yuv,
    format=YUY2,
    width=[ 160, 1280 ],
    height=[ 120, 960 ],
    framerate=[ 5/1, 30/1 ];

image/jpeg,
    width=[ 160, 960 ],
    height=[ 120, 720 ],
    framerate=[ 5/1, 25/1 ]

The format is mostly self-explanatory. The camera supports two pixel formats, YUV (uncompressed) and MJPEG (compressed), and the intervals give the upper and lower limits on frame size and frame rate. Note that the section for the uncompressed format has an additional format attribute that specifies the FourCC code. This is necessary for the pipeline to identify the exact YUV format used, as there are many different ones, with YUY2 being only one of them.

The descriptor for the fixed capabilities is set only after the set_caps phase, when the stream format has been negotiated with V4L2. This capability contains no ranges or lists but is a simple subset of the pad capabilities. After requesting an uncompressed VGA stream at 25 fps from the camera, for example, it would look as follows:

video/x-raw-yuv,format=YUY2,width=640,height=480,framerate=25/1

We can clearly see that the format chosen for the pipeline is a subset of the pad capabilities seen above. The intervals have disappeared and all attributes have fixed values now. All data that flows through the pipeline after the caps are fixed is of this format.

5.4.6 lvfilter

The Logitech video filter, or lvfilter for short, is also realized as a GStreamer plugin. Its task is relatively simple: intercept the video stream when enabled (filter mode) and act as a no-op when disabled (pass-through mode).


We will come back to the functionality of lvfilter when we look at some of the other components, in particular liblumvp. For the moment, let lvfilter be a no-op.

5.4.7 LVGstCap (part 1 of 3: video streaming)

The sample webcam software provided as part of the framework is LVGstCap, the Logitech Video GStreamer Capture application. It is the third component in our schema that uses GStreamer and the only one with a user interface. LVGstCap is also the first webcam capture program to use the approach depicted in figure 5.1b, i.e. to use both V4L2 and the webcam framework simultaneously to access the device. This fact remains completely transparent to the user as everything is nicely integrated into a single interface.

Among others, LVGstCap provides the basic features expected from a webcam capture application:

• List the available cameras and select one.

• List the available frame formats (i.e. a combination of pixel format, image resolution, and frame rate) and select one.

• Start, stop, and freeze the video stream.

• Modify image controls (e.g. brightness, contrast, sharpness).

These features work with all webcams as long as the camera is supported by Linux and its driver works with the GStreamer v4l2src plugin.

On top of this basic functionality LVGstCap supports some additional features. We will talk about them in parts 2 and 3.

5.4.8 libwebcam

The Webcam library is a cornerstone of the webcam framework in that all other new components rely on it in one way or another. Being more than only an important technical element, libwebcam realizes part of what the Video4Linux user space library was always supposed to be: an easy to use library that shields its users from many of the difficulties and problems of using the V4L2 API directly.

Today libwebcam provides the following core features:

• Enumeration of all cameras available in the system.

• Provide detailed information about the detected devices.

• Wrapper for the V4L2 frame format enumeration.

• Provide unified access to V4L2 and sysfs camera controls.

In addition, the interface is prepared to handle device events, ranging from newly detected cameras and control value changes to device button events. It is easy to add new features without breaking application compatibility, and the addition of new controls or events is straightforward.


5.4.9 libwebcampanel

The Webcam panel library takes libwebcam one step further. While libwebcam is still relatively low-level and does not interpret any of the controls or events directly, libwebcampanel does just that. It combines internal information about specific devices with the controls provided by libwebcam to provide applications with meta information and other added value. This makes it a common repository for device-specific information that would otherwise be distributed and duplicated within various applications. The core features of libwebcampanel are:

• Provide meta data that applications need to display camera information and user-friendly control elements.

• Implement a superset of libwebcam’s functionality.

• Give access to the feature controls that liblumvp provides.

We can see that the main goal of libwebcampanel is making the development of generic webcam applications easier. It is for this reason that most applications will want to use libwebcampanel instead of the lower-level libwebcam.

The last point of the above list will become clear when we discuss liblumvp. Before doing so, however, let us look at LVGstCap one more time to see how it uses the control meta information.

5.4.10 LVGstCap (part 2 of 3: camera controls)

When the user selects a device in LVGstCap, it immediately enumerates the controls that the chosen device provides and displays them in a side panel. Ordinarily, i.e. in the case of V4L2 controls, there is no additional information on the control apart from the value range and whether the control is a number, a Boolean, or a list of choices. While most controls can be made to fit in one of these categories, in practice there are a number of controls for which this representation is not quite right.

Two examples are controls whose value is a bitmask and read-only controls. In the former case it seems inappropriate to present the user with an integer control that accepts values from, say, 0 to 255 when each bit has a distinct meaning. libwebcampanel might transform such a control either into a list of eight choices if the bits are mutually exclusive or split it up into eight different Boolean controls if arbitrary bit combinations are allowed. This allows LVGstCap to display the controls in a generic manner.

In the case of read-only controls the user should not be allowed to change the GUI element but still be able to read its current value. Therefore, if libwebcampanel sets the read-only flag on a certain control, LVGstCap will disable user interaction with it and gray it out to make this fact visually clear to the user. We will see a few concrete examples of such cases later in chapter 7.


5.4.11 liblumvp

The name liblumvp stands for Logitech user mode video processing library. It is the only component of the webcam framework that is not open source because it contains Logitech intellectual property. liblumvp consists of a fairly simple video pipeline that passes the video data it receives through a list of plugins that can process and modify the images before they are output again.

The library receives all its input from lvfilter. Whenever lvfilter is in filter mode, it sends the video data it intercepts to liblumvp and uses the (possibly modified) video buffer it receives back as its output. All of this remains transparent to the application2.

One can think of a multitude of plugins that liblumvp could include; basically it could implement all the features that Logitech QuickCam provides on Windows. This requires applications to be able to communicate with these plugins, for example to enable or disable them or change certain parameters. For this reason, the library exposes a number of controls, so-called feature controls, in a manner almost identical to how libwebcam does it. This is where the second reason for the additional layer introduced by libwebcampanel lies: it can provide applications with a list of hardware camera controls on the one hand and a list of liblumvp software controls on the other hand. Applications can handle both categories in an almost symmetric manner3, which is just what LVGstCap does.

5.4.12 LVGstCap (part 3 of 3: feature controls)

LVGstCap uses libwebcampanel not only for presenting camera controls to the user but also for feature controls if liblumvp is currently enabled. When a video stream is started, the feature control list is retrieved and its control items are displayed to the user in a special tab next to the ordinary controls.

The application also has access to the names of the different features that are compiled into liblumvp. This information can be used to group the controls into categories when required.

When the user changes a feature control, LVGstCap communicates this to libwebcampanel, which takes care of the communication with liblumvp. We will later see that this communication is not as trivial in all cases as it may look at first. In the example of a video application that incorporates both video output and control panel in a single process, there is no need for special measures. There is, however, a case where this does not hold true: panel applications.

2As a matter of fact, the application must explicitly include lvfilter in its GStreamer pipeline, but once the pipeline stands, its presence is transparent and needs no further attention. We will see the advantages and disadvantages of this in chapter 7.

3The reasons why the two are not treated exactly the same are explained in chapter 7.


5.4.13 lvcmdpanel

A panel application is a (usually simple) program that does not do any video handling itself but allows the user to control a video stream that is currently active in another application. There are a few situations where panel applications are useful:

• Allow command line tools or scripts to modify video stream parameters.

• Permit control over the video stream of an application that does not have its own control panel.

• Provide an additional way of changing controls, e.g. from a tray application.

Our webcam framework includes an example application of the first kind, a command line tool called lvcmdpanel. Figure 5.4 shows the output of the help command. Chapter 8 has a sample session to illustrate some of the commands.

lvcmdpanel 0.1

Control webcam video using the command line

Usage: lvcmdpanel [OPTIONS]... [VALUES]...

  -h, --help                 Print help and exit
  -V, --version              Print version and exit
  -v, --verbose              Enable verbose output
  -d, --device=devicename    Specify the device to use
  -l, --list                 List available cameras
  -c, --clist                List available controls
  -g, --get=control          Retrieve the current control value
  -s, --set=control          Set a new control value

Figure 5.4: Command line options supported by lvcmdpanel.

5.5 Flashback: current problems

In chapter 4 we discovered a number of issues that current V4L2 applications have to deal with. Let us now revisit them one by one and show how our webcam framework avoids or solves them. Note that we do not go into great technical detail here but save that for chapter 7.

Avoid kernel mode components Apart from some work on the UVC driver and V4L2 that is necessary to exploit the full feature set provided by current webcams, the entire framework consists of user mode components. This demonstrates that there are good ways to realize video processing and related tasks in user mode today and that good solutions can be found for most of the associated drawbacks.

Direct device access While direct device access can never be achieved without the support of select kernel mode components, we tackled this problem by extending the UVC driver so that it allows user mode applications to access the full spectrum of UVC extensions. With the help of sysfs, we have developed an interface that is superior to any standard C interface in that it allows shell scripts and system commands to access the hardware in an intuitive way.

Simple API We have seen that mechanisms such as function callbacks are valuable, if not indispensable, for certain features like event notification. The webcam framework provides the corresponding interfaces, which can be used as soon as the kernel space components implement the necessary underlying mechanisms.

In addition, the enumeration APIs that our libraries provide are superior in terms of usability to those that V4L2 offers. While some V4L2 functions like frame format enumeration can require dozens of ioctl calls and the management of dynamic data structures in the client, our framework allows all enumeration data to be retrieved in two function calls. The first one returns the required buffer size and the second one returns the data in one self-contained block of memory. The complexity on the application's side is minimal and so is the overhead.

Complicated device enumeration Applications should not have to loop through the huge number of device nodes in the system and filter out the devices they can handle. This approach requires applications to know criteria they should not have to know, such as whether a given device node is a video device or not. If these criteria change, all applications have to be updated, which is a big problem if certain programs are no longer maintained. This problem is solved by the device enumeration function of libwebcam.

No stateless device information querying It seems unnecessary to open a device just to retrieve its name and other information an application may want to present to its user. In the same way that listing the contents of a directory with ls does not open each single file, it would be desirable to query the device information at enumeration time. libwebcam does this by maintaining an internal list of camera devices that contains such data. It can be retrieved at any time by any application without opening a V4L2 device.

Missing frame format enumeration As we will see later on, this problem was solved by adding the missing functionality directly to V4L2, with the UVC driver being the first one to support it. To keep the API as uniform and simple as possible for application developers, libwebcam has a wrapper for frame format enumeration that greatly reduces the complexity associated with retrieving the supported frame formats.

Lack of current documentation While we have not solved the problem of parts of the V4L2 documentation being outdated or incomplete, we did make sure that all libraries that application developers can interact with are thoroughly documented; an extensive API specification is available in HTML format. In addition, this report gives a vast amount of design and implementation background. This is a big advantage for developers who want to use parts of the webcam framework for their own applications.

The next two chapters are devoted to the more technical details of what was presented in this chapter. We will first look at the extensions and changes that were applied to currently existing components before we focus on the newly developed aspects of the webcam framework.


Chapter 6

Enhancing existing components

In order to realize the webcam framework as described in the previous chapter, a few extensions and changes to existing components were necessary. These range from small patches that correct wrong or inflexible behavior to rewrites of bigger software parts. This chapter sums up the most important of these and lists them in order of their importance.

6.1 Linux UVC driver

With UVC devices being at the center of the Linux webcam framework, the UVC driver was the main focus of attention as far as preexisting components are concerned. The following sections describe some important changes and give an outlook on what is about to change in the near future.

6.1.1 Multiple open

From chapter 5 we know that multiple open is a useful feature to work around some of V4L2's limitations. Since the webcam framework relies on the camera driver being able to manage multiple simultaneously opened file handles to a given device, this was one of the most important extensions to the UVC driver.

The main challenge when developing a concept for multiple device opening is permissions and priorities. As with ordinary file handles, where the operating system must make sure that readers and writers do not disrupt each other, the video subsystem must make sure that two video device handles cannot influence each other in unwanted ways. Webcam drivers that are unable to multiplex the video stream must make sure that only a single device handle is streaming at a time.

While this seems easy enough to do, the problem arises because the concept of "streaming" is not clearly definable. When does streaming start? When does it stop? There are several steps involved between when an application decides to start the video stream and when it frees the device again:

1. Open the device.

2. Set up the stream format.

3. Start the stream.

4. Stop the stream.

5. Close the device.

Drawing the line at the right place is a trade-off between preventing unwanted interactions on the one hand and allowing a maximum of parallel access on the other. We decided to place the boundary right before the stream setup. To this end we divided the Video4Linux functions into privileged (or streaming) ioctls and unprivileged ioctls and introduced a state machine for the device handles (figure 6.1).

Figure 6.1: The state machine for the device handles of the Linux UVC driver used to guarantee device consistency for concurrent applications. The rounded rectangles show which ioctls can be carried out in the corresponding state.

There are four different states:


• Closed The first unprivileged state. While not technically a state in the software, this state serves as a visualization for all nonexistent handles that are about to spring into existence when they are opened by an application. It is also the state that all handles end up in when the application closes them.

• Passive The second unprivileged state. Every handle is created in this state. It stands for the fact that the application has opened the device but has not yet made any steps towards starting the stream. Querying device information or enumerating controls can already happen in this state.

• Active The first privileged state. A handle moves from passive to active when it starts setting up the video stream. Four ioctls can be identified in the UVC driver that applications use before they start streaming: TRY_FMT, S_FMT, and S_PARM for stream format setup and REQBUFS for buffer allocation. As soon as an application calls one of these functions, its handle moves into the active state, unless there is already another handle for the same device in a privileged state, in which case an error is returned.

• Streaming The second privileged state. Using the STREAMON ioctl lets a handle move from active to streaming. Obviously only one handle can be in this state at a time for any given device because the driver made sure that no two handles could get into the active state in the first place.

The categorization of all ioctls into privileged and unprivileged ones not only yields the state transition events but also decides which ioctls can be used in which states. Table 6.1 contains a list of privileged ioctls. Also note that the only way for an application with a handle in a privileged state to give up its privileges is to close the handle.

ioctl       Description

S_INPUT     Select the current video input (no-op in uvcvideo).
QUERYBUF    Retrieve information about a buffer.
QBUF        Queue a video buffer.
DQBUF       Dequeue a video buffer.
STREAMON    Start streaming.
STREAMOFF   Stop streaming.

Table 6.1: Privileged ioctls in the uvcvideo state machine used for multiple open.

This schema guarantees that different device handles for the same device can perform the tasks required for panel applications and the Linux webcam framework while ensuring that the panel application cannot stop the stream or change its attributes in a way that could endanger the video application.


6.1.2 UVC extension support

We saw in section 2.4, when we discussed the USB Video Class specification, that extension units are important for device manufacturers to add additional features. For this reason, UVC drivers should have an interface that allows applications to access these extension units. Otherwise, applications may not be able to exploit the full range of device capabilities.

Raw extension control support through sysfs

The first and obvious way to expose UVC extension controls in a generic way is to give applications raw access. Under Linux sysfs is an ideal way to realize such an interface. Extensions and their controls are mapped to a hierarchical structure of virtual directories and files that applications can read from and write to. The files are treated like binary files, i.e. what the application writes to the file is sent as is to the device and what the application reads from the file is the same buffer that the driver has received from the device. During this whole process no interpretation of the relayed data is being done on the driver's side.

Let us look at a simplified example of such a sysfs directory structure:

extensions/
|-- 63610682-5070-49AB-B8CC-B3855E8D221D
|-- 63610682-5070-49AB-B8CC-B3855E8D221E
|-- 63610682-5070-49AB-B8CC-B3855E8D221F
+-- 63610682-5070-49AB-B8CC-B3855E8D2256
    |-- ctrl_1
    +-- ctrl_2
        |-- cur
        |-- def
        |-- info
        |-- len
        |-- max
        |-- min
        |-- name
        +-- res

We can see that the camera supports four different extension units, each of which is identified by a unique ID. The contents of the last one show two controls, and one of the controls has its virtual files visible. All these files correspond directly to the UVC commands of the same name. For example, the read-only files def and len map to GET_DEF and GET_LEN. In the case of the only writable file, cur, there are two corresponding UVC commands: GET_CUR and SET_CUR. Whatever is written to the cur file is wrapped within a SET_CUR command and sent to the device. In the opposite case, where an application opens cur and reads from it, the driver creates a GET_CUR request, sends it to the device and turns the device response into the file contents, followed by an end-of-file marker. If an error occurs during the process, the corresponding read or write call returns an error code.
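The following shell session sketches how an application might interact with such an interface. The exact mount point and the textual content of the metadata files are assumptions; to keep the example self-contained it mocks the directory tree under /tmp instead of touching a real sysfs, using one of the extension unit GUIDs from the listing above.

```shell
# Mocked sysfs tree (the real path under /sys is driver-dependent; this layout
# merely mirrors the structure shown above for illustration).
CTRL=/tmp/uvc_ext_demo/extensions/63610682-5070-49AB-B8CC-B3855E8D2256/ctrl_2
mkdir -p "$CTRL"

# The driver would populate the read-only files from GET_NAME/GET_LEN etc.
printf 'Example control' > "$CTRL/name"
printf '4' > "$CTRL/len"

# An application reads the control's metadata...
cat "$CTRL/name"; echo
cat "$CTRL/len"; echo

# ...and writes a raw 4-byte payload, which the driver would wrap in SET_CUR
# and send to the device unmodified:
printf '\001\000\000\000' > "$CTRL/cur"
wc -c < "$CTRL/cur"
```

Because the files are treated as opaque binary buffers, tools like printf, dd, or xxd are all an administrator needs to poke at an extension control.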

While this approach works well and is supported by our extended UVC driver, there is a limitation associated with the way that ownership and permissions are set on these virtual files. This can lead to security issues on multi-user machines, as section 7.5.1 will show.

Another problem with this approach of using raw data is that applications must know exactly what they are doing. This is undesirable in the case of generic applications because the knowledge has to be duplicated in every single one of them. The following section describes a possible way to resolve this issue.

Mapping UVC to V4L2 controls

V4L2 applications cannot use the raw sysfs controls unless they include the necessary tools and knowledge. Obviously, it would be easier to just use a library like libwebcam or libwebcampanel that can wrap any sort of controls behind a simple and consistent interface, but there are situations where this may not be an option, for example in the case of applications that are no longer maintained. If such an application has functions to enumerate V4L2 controls and present them in a generic manner, then all it would take to allow the program to use UVC extension controls is a mapping between the two. Designing and implementing a flexible mechanism that can cover most of the cases to be expected in the foreseeable future is an ongoing process for which the groundwork was laid as part of this project.

One of the assumptions we made was that there could be a 1:n mapping between UVC and V4L2 controls but not in the opposite direction. The rationale behind this is that V4L2 controls must already be as simple as possible and sensible, since the application is in contact with them. For UVC controls, however, it is conceivable that a device packs multiple related settings into a single control1. If that is the case, applications should see multiple V4L2 controls without knowing that the driver maps them to one and the same UVC control in the background. Figure 6.2 gives a schema of such a mapping.
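To make the 1:n idea concrete, the following C sketch shows one possible shape of such a mapping descriptor and how a driver could extract a V4L2 control value from the raw bytes of a UVC control. The structure and function names are illustrative assumptions, not the actual uvcvideo data structures.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* Hypothetical mapping descriptor: where inside the raw UVC control payload
 * the V4L2 control's bits live. A 1:n mapping is simply several of these
 * descriptors referring to the same UVC control. */
struct uvc_v4l2_mapping {
    uint32_t v4l2_id;   /* V4L2 control ID exposed to applications */
    size_t   offset;    /* bit offset inside the UVC control payload */
    size_t   size;      /* width of the mapped field in bits */
};

/* Extract the mapped value from a little-endian UVC control payload. */
static uint32_t mapping_get(const struct uvc_v4l2_mapping *m,
                            const uint8_t *data, size_t len)
{
    uint64_t raw = 0;

    /* Assemble the payload into a single integer, least significant byte
     * first (controls wider than 8 bytes are not handled by this sketch). */
    for (size_t i = 0; i < len && i < 8; i++)
        raw |= (uint64_t)data[i] << (8 * i);

    return (uint32_t)((raw >> m->offset) & ((1ULL << m->size) - 1));
}
```

With this shape, a 4-byte control carrying two 16-bit values would get two descriptors, one with offset 0 and one with offset 16, and the application would see two independent V4L2 controls.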

The next fundamental point was the question of where the mapping definitions should come from. The obvious answer is from the driver itself, but with the perspective of an increasing release frequency of new UVC devices in mind, this cannot be the final answer. It would mean that new driver versions would have to be released on a very frequent basis only to update the mappings. We therefore came to the conclusion that the driver should hardcode as few control mappings as possible, with the majority coming from user space.

The decision on how such mappings are going to be fed to the driver has not yet been made. Two solutions seem reasonable:

1. Through sysfs. User space applications could write mapping data to a sysfs file and the driver would generate a mapping from the data. The

1 As a matter of fact, we shall see such an example in section 7.3.


Figure 6.2: Schema of a UVC control to V4L2 control mapping. The UVC control descriptor contains information about how to locate and access the UVC control. The V4L2 control part has attributes that determine offset and length inside the UVC control as well as the properties of the V4L2 control.

main challenge here would be to find a reasonable format that is both human-readable and easily parseable by the driver. XML would be ideal for the former, but a driver cannot be expected to parse XML. Binary data would be easier for the driver to parse but would contradict the philosophy of sysfs, according to which exchanged data should be human-readable. Whatever the format looks like, the mapping setup would be as easy as redirecting a configuration file to a sysfs file.

2. Through custom ioctls. For the driver side the same argument as for a binary sysfs file applies, with the difference that ioctls were designed for binary data. The drawback is that a specialized user space application, such as a control daemon, would be necessary to install the mapping data.

For the moment, we restrict ourselves to hardcoded mappings. The future will show which way turns out to be the best to manage the mapping configuration from user space.

Internally the driver manages a global list of control descriptors with their V4L2 mappings. In addition, a device-dependent list of controls, the so-called control instances, is used to store information about each device's controls, such as the range of valid values. When a control descriptor is added, the driver loops through all devices and adds a control instance only if the device in question supports the new control. This process required another change to the driver's architecture: the addition of a global device list.

Many drivers do not need to maintain an internal list of devices because almost all APIs provide space for a custom pointer in the structures that they make available when they call an application. Such a pointer allows for better scaling and less overhead because the driver does not have to walk any data structures to retrieve its internal state. This is indispensable for performance critical applications and helps simplify the code in any case. The Linux UVC driver also uses this technique whenever possible, but for adding and removing control mappings it must fall back to using the device list. Luckily, this does not cause any performance problems because these are exceptional events that do not occur during streaming.

Once all the data structures are in place, the V4L2 control access functions must be rewritten to use the mappings. Laurent Pinchart is currently working on this as part of his rewrite of the control code that fixes a number of other small problems.

6.1.3 V4L2 controls in sysfs

In connection with the topics mentioned above there is an interesting discussion going on about whether all V4L2 controls could be exposed to sysfs by default and in a generic manner. The idea comes from the pvrusb2 driver [10], which does just that. What originally started out as a tool for debugging turned out to be a useful option for scripting the supported devices.

Given the broad application scenarios and generic nature of the feature, it would be optimal if the V4L2 core took care of automatically exposing all device controls to sysfs in addition to the V4L2 controls that are available today. While currently not more than a point of discussion and an entry on the wish list, it is likely that Video4Linux will eventually receive such a control mapping layer. It would complete the sysfs interface of uvcvideo in a very nice manner and open the doors for entirely new tools.

If such an interface became reality, libwebcam could automatically use it if the current driver does not support multiple opening of the same device, because this would prevent it from using the V4L2 controls that it uses now. This switch would be completely transparent to users of libwebcam.

6.2 Video4Linux

In section 4.5.3 we saw a number of issues that developers of software using the current version of V4L2 have to deal with. While most of them could not be fixed without breaking backwards compatibility, the most severe one, the lack of frame format enumeration described in section 4.5.3, was relatively easy to overcome.


V4L2 currently provides a way to enumerate a device's supported pixel formats using the VIDIOC_ENUM_FMT ioctl. It does this by using the standard way for list enumeration in V4L2: The application repeatedly calls a given ioctl with an increasing index, starting at zero, and receives all the corresponding list entries in return. If there are no more entries left, i.e. the index is out of bounds, the driver returns the EINVAL error value.

There are two fundamental problems with this approach:

• Application complexity. The application cannot know how many entries there are in the list. Using a single dynamically allocated memory buffer is therefore out of the question unless the buffer size is chosen much bigger than the average expected size. The only reliable and scalable way is to build up a linked list within the application and add an entry for each ioctl call. This shifts the complexity towards the application, something that should be avoided by an API in order to encourage developers to use it in the first place and to discourage possibly unreliable hacks.

• Non-atomicity. If the list that the application wants to enumerate does not remain static over time, there is always a chance that the list changes while an application is enumerating its contents. If this happens, the received data is inevitably inconsistent, leading to unexpected behavior in the best case or crashes in the worst case.

The first idea for a workaround that comes to mind is that the driver could return a special error value indicating that the data has changed and that the application should restart the enumeration. Unfortunately, this does not work because the driver has no way of knowing if an application is currently enumerating at all. Nothing forbids the application to start with an index other than zero or to quit the enumeration process before the driver has had a chance to return the end-of-list marker.

When we decided to add frame size and frame rate enumeration, our first draft would have solved both of these problems at once. The entire list would have been returned in a single buffer, making it easy for the application to parse on the one hand and rendering the mechanism insusceptible to consistency issues on the other. The draft received little positive feedback, however, and we had to settle for a less elegant version that we present in the remainder of this section. The advantage of the second approach is its obvious simplicity on the driver side. It is left up to the reader to decide whether driver simplicity justifies the above problems.

No matter what enumeration approach is chosen, an important point must be kept in mind: the attributes pixel format, frame size, and frame rate are not independent of each other. For any given pixel format, there is a list of supported frame sizes, and any given combination of pixel format and frame size determines the supported frame rates. This seems to imply a certain hierarchy of these three attributes, but it is not necessarily clear what this hierarchy should look like. Technical details, like the UVC descriptor format, suggest the following:

1. Pixel format

2. Frame size

3. Frame rate

However, for users it may not be obvious why they should even care about the pixel format. A video stream should mainly have a large enough image and a high enough frame rate. The pixel format, and whether compression is used, is just a technicality that the application should deal with in an intelligent and transparent manner. As a result, a user might prefer a list of frame sizes to choose from first and, possibly, a list of frame rates as a function of the selected resolution.

In order to keep the V4L2 frame format enumeration API consistent with the other layers, we decided to leave the hierarchy in the order mentioned above. An application can still opt to collect the entire attribute hierarchy and present the user with a more suitable order.

Once such a hierarchy has been established, the input and output values of each of the enumeration functions become obvious. The highest level has no dependency on lower levels; the lower levels depend only on the higher levels. This mechanism can theoretically be extended to an arbitrary number of attributes, although in practice there are limits to what can be considered a reasonable number of input values. Table 6.2 gives the situation for the three attributes used by webcams.

Enumeration attribute   Input parameters              Output values
Pixel format            none                          Pixel formats
Frame size              Pixel format f                Frame sizes supported for pixel format f
Frame rate              Pixel format f, frame size s  Frame rates supported for pixel format f and frame size s

Table 6.2: Input and output values of the frame format enumeration functions.

As it happens, the V4L2 API already provided a function for pixel format enumeration, which means that it could be seamlessly integrated with our design for frame size and frame rate enumeration. These functions are now part of the official V4L2 API, the documentation for which can be found at [5].


6.3 GStreamer

GStreamer has had V4L2 support in the form of the v4l2src plugin for a while, but the plugin had not received any testing with webcams using the UVC driver. There is a particularity about the UVC driver that causes it not to work with a few applications: the absence of the VIDIOC_G_PARM and VIDIOC_S_PARM ioctls, which do not apply to digital devices. The GStreamer V4L2 source was one of the applications that relied on these functions being present and failed otherwise.

After two small patches, however, the first to remove the above dependency and the second to fix a small bug in the frame rate detection code, the v4l2src plugin worked great with UVC webcams and proved to be a good choice as a basis for our project.

In September 2006, Edgard Lima, one of the plugin's authors, added proper support for frame rate negotiation using GStreamer capabilities, which allows GStreamer applications to take full advantage of the spectrum of streaming parameters.

6.4 Bits and pieces

Especially during the first few weeks the project involved a lot of testing and bug fixing in various applications. Some of these changes are listed below.

Ekiga During some tests with a prototype camera a bug in the JPEG decoder of Ekiga became apparent. The JPEG standard allows an encoder to add a customized Huffman table if it does not want to use the one defined in the standard. The decoder did not process such images properly and failed to display the image as a result.

There were also two issues with unsupported ioctls and the frame rate computation, very similar to those in GStreamer's v4l2src.

Spca5xx The Spca5xx driver already supports a large number of webcams, as we saw in section 4.3.2, and the author relies largely on user feedback to maintain his compatibility list. We also did some tests at Logitech with a number of our older cameras and found a few that were not recognized by the driver but would work after patching its USB PID list.

luvcview The luvcview tool had a problem with empty frames that could occur with certain cameras and would make the application crash. This was fixed as part of a patch that added two different modes for capturing raw frames. One mode writes each received frame into a separate file (raw frame capturing), the other one creates one single file where it stores the complete video stream (raw frame stream capturing). The first mode can be used to easily capture frames from the camera, although, depending on the pixel format, the data may require some post processing, e.g. adding an image header.


Chapter 7

New components

Chapter 5 gave an overview of our webcam framework and described its goals without going into much technical detail. This chapter elaborates on how some of these goals were achieved and implemented. It also explains the design decisions and why we chose certain solutions over others.

At the same time we will show the limitations of the current solution and their implications for future extensibility. Another topic of this chapter is the licensing model of the framework, a crucial topic of any open source project. We will also give an outlook on future work and opportunities.

7.1 libwebcam

The goals of the Webcam library, or simply libwebcam, were briefly covered in section 5.4.8. The API is described in great detail in the documentation that comes with the sources. The functions can be grouped into the following categories:

• Initialization and cleanup

• Opening and closing devices

• Device enumeration and information retrieval

• Frame format enumeration

• Control enumeration and usage

• Event enumeration and registration

The general usage is rather simple. Each application must initialize the library before it is first used. This allows the library to properly set up its internal data structures. The client can then continue by either enumerating devices or, if it already knows which device it wants to open, directly go ahead and open a device. If a device was successfully opened, the library returns a handle that the application has to use for all subsequent requests. This handle is then used for tasks such as enumerating frame formats or controls and reading or writing control values. Once the application is done, it should close the device handles and uninitialize the library to properly free any resources that the library may have allocated.

Let us now look at a few implementation details that application developers using libwebcam should know about.

7.1.1 Enumeration functions

All enumeration functions use an approach that makes it very easy for applications to retrieve the contents of the list in question. This means that enumeration usually takes exactly two calls, the first one to determine the required buffer size, the second one to fill the buffer. In the rare case where the list changes between the two calls, a third call can be necessary, but with the current implementation this situation can only arise for devices. The following pseudo code illustrates the usage of this enumeration schema from the point of view of the application.

buffer := NULL
buffer_size := 0
required_size := c_enum(buffer : NULL, size : buffer_size)
while (required_size > buffer_size)
    buffer_size := required_size
    buffer := allocate_memory(size : buffer_size)
    required_size := c_enum(buffer : buffer, size : buffer_size)

Obviously, the syntax of the actual API looks slightly different and applications must do proper memory management and error handling.
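The same two-call pattern in C, with a mocked enumeration function standing in for the real libwebcam API (whose names and signatures differ), shows the memory management involved:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Mock of a libwebcam-style enumeration function (hypothetical, not the real
 * API): copies the list into `buffer` if it fits and always returns the size
 * the caller actually needs. */
static size_t c_enum(char *buffer, size_t size)
{
    static const char list[] = "device0\0device1\0device2";
    if (buffer && size >= sizeof(list))
        memcpy(buffer, list, sizeof(list));
    return sizeof(list);
}

/* Two-call pattern: query the required size, allocate, fill. The loop repeats
 * in the rare case where the list grows between the two calls. */
static char *enumerate(size_t *out_size)
{
    char *buffer = NULL;
    size_t buffer_size = 0;
    size_t required = c_enum(NULL, 0);

    while (required > buffer_size) {
        buffer_size = required;
        buffer = realloc(buffer, buffer_size);
        if (!buffer)
            return NULL;    /* caller sees allocation failure */
        required = c_enum(buffer, buffer_size);
    }
    *out_size = buffer_size;
    return buffer;          /* caller frees the buffer when done */
}
```

In the common case the while body runs exactly once, i.e. the enumeration costs two calls as described above.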

Another aspect that makes this type of enumeration very easy for its users is that the buffer is completely self-contained. Even though the buffer can contain variable-sized data, it can be treated as an array through which the application can loop. Figure 7.1 illustrates the memory layout of such a buffer.

7.1.2 Thread-safety

The entire library is programmed in a way that makes it safe to use from multi-threaded applications. All internal data structures are protected against simultaneous changes from different threads that could otherwise lead to inconsistent data or program errors. Since most GUI applications are multi-threaded, this spares the application developer from taking additional steps to prevent multiple simultaneous calls to libwebcam functions.


Figure 7.1: Illustration of the memory block returned by a libwebcam enumeration function. The buffer contains three list items and a number of variable-sized items (strings in the example). Each list item has four words of fixed-sized data and two char pointers. The second item shows pointers to two strings in the variable-sized data area at the end of the buffer. Pointers can also be NULL, in which case there is no space reserved for them to point to. Note that only the pointers belonging to the second item are illustrated.


7.2 liblumvp and lvfilter

The Logitech user mode video processing library is in some ways similar to libwebcam. It also provides controls, as we have seen in section 5.4.11, and its interface is very similar when it comes to library initialization/cleanup or control enumeration. The function categories are:

• Initialization and cleanup

• Opening and closing devices

• Video stream initialization

• Feature enumeration and management

• Feature control enumeration and usage

• Video processing

In our webcam framework, liblumvp is not directly used by the application. Instead, its two clients are lvfilter, the video interception filter that delivers video data, and libwebcampanel, from which it receives commands directed at the features that it provides. Nothing, however, prevents an application from directly using liblumvp, apart from the fact that this would make the application directly dependent on a library that was designed to act transparently in the background.

lvfilter hooks into the GStreamer video pipeline where it influences the stream capability negotiation in a way that makes sure that the format is understood by liblumvp. It then initializes the latter with the negotiated stream parameters and waits for the pipeline state to change. When the stream starts, it redirects all video frames through liblumvp, where they can be processed and, possibly, modified before it outputs them to the remaining elements in the pipeline.

While lvfilter takes care of the proper initialization of liblumvp, it does not use the feature controls that liblumvp provides. Interaction with these happens through libwebcampanel, as we will see shortly.

We have mentioned that applications must explicitly make use of liblumvp by including lvfilter in their GStreamer pipeline. This has positive and negative sides. The list of drawbacks is led by the fact that it does not smoothly integrate into existing applications and that each application must test for the existence of lvfilter if it wants to use the extra features. It is this very fact, however, that can also be seen as an opportunity. Some users do not like components to work transparently, either because they could potentially have negative interactions that would make problems hard to debug or because they do not trust closed source libraries.

Before we move on to the next topic, a few words about the two plugins that are currently available:


Mirror The first, very simple plugin is available on any camera and lets the user mirror the image vertically and horizontally. While the first one can be used to turn a webcam into an actual mirror, the second one can be useful for laptops with cameras built into the top of the screen, because these are usually rotatable by 180 degrees along the upper edge and allow switching between targeting the user and what is in front of the user.

Face tracking This module corresponds closely to what users of the QuickCam software on Windows know as the "Track two or more of us" mode of the face tracking feature. The algorithm detects people's faces and zooms in on them, so that they are better visible when the user moves away from the camera. If the camera supports mechanical pan and tilt, like the Logitech QuickCam Orbit, it does so by moving the lens head in the right direction. For other cameras the same is done digitally. This feature is only available for Logitech cameras that are UVC compatible.

In the future, more features from the Logitech QuickCam software will be made available for Linux users through similar plugins.

7.3 libwebcampanel

The interface of the Webcam panel library is very similar to the one provided by libwebcam. This was a design decision that should make it easy for applications that started out using libwebcam to switch to libwebcampanel when they want more functionality.

7.3.1 Meta information

Section 5.4.9 gave a high-level overview of what sort of information filtering libwebcampanel adds on top of what libwebcam provides. Let us look at these in more detail.

• Devices

– Camera name change: The camera name string in libwebcam comes from the V4L2 driver and is usually generic. In the case of the UVC driver it is always "USB Video Class device", not very helpful for the user who has three different UVC cameras connected. For this reason libwebcampanel has a built-in database of device names that it associates with the help of their USB vendor and product IDs. This gives the application more descriptive names like "Logitech QuickCam Fusion". If the library recognizes only the vendor but not the device ID, say 0x1234, it is still able to provide a somewhat useful string like "Unknown Logitech camera (0x1234)".

• Controls


– Control attribute modification: These modifications range from simple name changes to more complex ones, like modification of the value ranges or completely changing the type of a control. Controls can also be made read-only or write-only.

– Control deletion: A control can be hidden from the application. This can be useful in cases where a driver wrongly reports a generic control that is not supported by the hardware. The library can filter those out, stopping them from appearing in the application and confusing users.

– Control splitting: A single control can be split into multiple controls. As a fictitious example, a 3D motion control could be split up into three different motion controls, one for each axis.

While the first point is pretty self-explanatory, the second one deserves a few real-life examples.

Example 1: Control attribute modification The UVC standard defines a control called Auto-exposure mode. It determines what parameters the camera changes to adapt to different lighting conditions. This control is an 8-bit wide bitmask with only four of the eight bits actually being used. The bits are mutually exclusive, leaving 1, 2, 4, 8 as the set of legal values. However, due to the limited control description capabilities of UVC, the control is usually exported as an integer control with valid values ranging from 1 to 255.

If an application uses a generic algorithm to display such a control, it might present the user with a slider or range control that can take all possible values between 1 and 255. Unfortunately, most values will have no effect because they do not represent a valid bitmask.

libwebcampanel comes with enough information to avoid this situation by turning the auto-exposure mode control into a selection control that allows only four different settings: the ones defined in the UVC standard. Now the user will see a list box, or whatever the application developer decided to use to represent a selection control, with each entry having a distinct and clear meaning and no chance for the user to accidentally select invalid values, a major gain in usability.
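The translation can be sketched as follows. The table of legal raw values comes straight from the UVC auto-exposure mode definition cited above, while the function names are illustrative, not the actual libwebcampanel API:

```c
#include <assert.h>

/* The four legal UVC auto-exposure mode bitmask values (bits 0-3). */
static const unsigned char ae_modes[] = { 1, 2, 4, 8 };

/* Map a menu index (what the user picks from the list box) to the raw
 * bitmask value sent to the device. Returns 0 on an invalid index. */
static unsigned char ae_menu_to_raw(int index)
{
    return (index >= 0 && index < 4) ? ae_modes[index] : 0;
}

/* Map a raw value read from the device back to a menu index,
 * or -1 if the value is not one of the four legal bitmasks. */
static int ae_raw_to_menu(unsigned char raw)
{
    for (int i = 0; i < 4; i++)
        if (ae_modes[i] == raw)
            return i;
    return -1;      /* e.g. 3 or 255: not a valid mutually exclusive mode */
}
```

Because the menu indices are dense (0 to 3), a generic application can render the control as a list box without ever seeing the 252 invalid raw values.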

Example 2: Control splitting The Logitech QuickCam Orbit series has mechanical pan and tilt capabilities with the help of two little motors. Both motors can be moved separately by a given angle. The control through which these capabilities are exposed, however, combines both values, i.e. relative pan angle and relative tilt angle, in a single 4-byte control containing a signed 2-byte integer for each. For an application, such a control is virtually unusable without knowledge of how the control values have to be interpreted.

libwebcampanel solves this problem very elegantly by splitting up the control into two separate controls: relative pan angle and relative tilt angle. It also marks both controls as write-only, because it makes no sense to read a relative angle, and as action controls, meaning that changing the controls causes a one-time action to be performed. The application can use this information, as in the example of LVGstCap, to present the user with a slider control that can be dragged to either side and jumps back to the neutral position when released.
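A sketch of the split: the 4-byte control payload holds two signed 16-bit angles, and the library presents each half as its own control. The little-endian layout and the helper names are assumptions for illustration; the real byte layout is defined by the device.

```c
#include <assert.h>
#include <stdint.h>

/* Pack relative pan and tilt angles (in device units) into the 4-byte
 * payload of the combined hardware control, little-endian. */
static void pantilt_pack(int16_t pan, int16_t tilt, uint8_t out[4])
{
    out[0] = (uint8_t)(pan & 0xff);
    out[1] = (uint8_t)((pan >> 8) & 0xff);
    out[2] = (uint8_t)(tilt & 0xff);
    out[3] = (uint8_t)((tilt >> 8) & 0xff);
}

/* The inverse: what a library does internally to present two separate
 * single-value controls on top of the combined one. */
static void pantilt_split(const uint8_t in[4], int16_t *pan, int16_t *tilt)
{
    *pan  = (int16_t)(in[0] | (in[1] << 8));
    *tilt = (int16_t)(in[2] | (in[3] << 8));
}
```

When the user moves only the pan slider, the library packs the requested pan angle together with a tilt angle of zero, so each exposed control stays independent from the application's point of view.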

Obviously, most of this information is device-specific and needs to be kept up-to-date whenever new devices become available. It can therefore be expected that new minor versions of the library appear rather frequently, including only minor changes.

An alternative approach would be to move all device-specific information outside the library, e.g. into XML configuration files. While this would make it easier to keep the information current, it would also make it harder to describe device-specific behavior. The future will show which one of these approaches is more suitable.

7.3.2 Feature controls

Feature controls directly influence what goes on inside liblumvp. They can enable or disable certain features or change the way video effects operate. They are different from ordinary controls in a few ways and require a few special provisions, as we shall see now.

Controls vs. feature controls

We have previously mentioned that controls and feature controls are handled in an almost symmetrical manner. The small but important difference between the two is that ordinary controls are device-related but feature controls are stream-related. What this means is that the list of device controls can be queried before the application takes any steps to start the video stream. The driver, and therefore V4L2, know about them from the very start. At this time, the GStreamer pipeline may not even be built, and lvfilter and liblumvp not loaded.

So in practice, a video application will probably query the camera controls right after device connection but feature controls only when the video is about to be displayed. This timing difference would make it considerably more complicated for applications to manage a combined control list in a user-friendly manner. As a nice side-effect, it becomes easy for the application to selectively support only one set of controls or to clearly separate the two sets.

Communication between client and library

There is another very important point that was left unmentioned until now and that only occurs in the case of a panel application. The video stream, and therefore liblumvp, and the panel application that uses libwebcampanel run in two different processes. This means that the application would try in vain to change feature controls. liblumvp could well be loaded into the application's address space, but it would be a second and completely independent instance. To avoid this problem, the two libraries must be able to communicate across process borders: a clear case for inter-process communication.

Both libwebcampanel and liblumvp have a socket implementation over which they can transfer all requests related to feature controls. Their semantics are completely identical, only the medium differs. Whenever a client opens a device using liblumvp (in our case this is done by lvfilter), it creates a socket server thread that waits for such requests. libwebcampanel, on the other side, has a socket client that it uses to send requests to liblumvp whenever one of the feature control functions is used.

There is a possible optimization here, namely the use of the C interface instead of the IPC interface whenever both libraries run in the same process. However, the IPC implementation does not cause any noticeable delays since the amount of transmitted data remains in the order of kilobytes. We opted for the simpler solution of using the same interface in both cases, although the C version is still available and ready to use if circumstances make it seem preferable.
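The request/response exchange can be illustrated with a socketpair standing in for the real Unix socket. The message format and function names below are made up for the example and are not the actual wire protocol of the two libraries.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/socket.h>
#include <unistd.h>

/* Made-up feature control request: set control `id` to `value`. */
struct fc_request { uint32_t id; int32_t value; };
struct fc_reply   { int32_t status; };

/* Client side (libwebcampanel's role): send a request, wait for the reply. */
static int fc_set(int sock, uint32_t id, int32_t value)
{
    struct fc_request req = { id, value };
    struct fc_reply rep = { -1 };
    if (write(sock, &req, sizeof(req)) != (ssize_t)sizeof(req))
        return -1;
    if (read(sock, &rep, sizeof(rep)) != (ssize_t)sizeof(rep))
        return -1;
    return rep.status;
}

/* Server side (liblumvp's role): handle exactly one request. */
static void fc_serve_one(int sock)
{
    struct fc_request req;
    if (read(sock, &req, sizeof(req)) == (ssize_t)sizeof(req)) {
        struct fc_reply rep = { 0 };    /* pretend the feature was applied */
        write(sock, &rep, sizeof(rep));
    }
}
```

In the real framework the server thread created by liblumvp loops over fc_serve_one-style handling; the panel application only ever sees the synchronous client call.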

7.4 Build system

The build system of the webcam framework is based on the Autotools suite, the traditional choice for most Linux software. The project is mostly self-contained with the exception of liblumvp, which has some dependencies on convenience libraries1 outside the build tree. These convenience libraries contain some of the functionality that liblumvp plugins rely on and were ported from the corresponding Windows libraries.

The directory structure of the open source part looks as follows:

/
+--lib
|  +--libwebcam
|  +--libwebcampanel
|  +--gstlvfilter
|
+--src
   +--lvgstcap

The top level Makefile generated by Autotools compiles all the components, although each component can also be built and installed on its own. Generic build instructions are included in the source archive.
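For readers who want to reproduce such a layout, the following is a minimal sketch of what the Autotools input files for this tree could look like. It is purely illustrative; the actual configure.ac and Makefile.am files shipped with the framework may differ considerably.

```
# configure.ac (illustrative sketch)
AC_INIT([webcam-framework], [0.1])
AM_INIT_AUTOMAKE([foreign])
AC_PROG_CC
LT_INIT
AC_CONFIG_FILES([Makefile lib/Makefile lib/libwebcam/Makefile])
AC_OUTPUT

# lib/libwebcam/Makefile.am (illustrative sketch)
lib_LTLIBRARIES = libwebcam.la
libwebcam_la_SOURCES = libwebcam.c
```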

1 Convenience libraries group a number of partially linked object files together. While they are not suitable for use as-is, they can be compiled into other projects in a similar way to ordinary object files.


7.5 Limitations

Each solution has its trade-offs and limitations, and it is important to be aware of them. Some of them have technical reasons; others are the result of time constraints or are beyond the project's scope. This section is dedicated to making developers and users of the Linux webcam framework aware of these limitations. At the same time it gives pointers for future work, which is the topic of the next section.

7.5.1 UVC driver

Even though the Linux UVC driver is stable and supports all basic UVC features needed for video streaming and managing video controls, it is still work in progress, and much remains to be done before it implements the entire UVC standard. At the moment, however, having a complete UVC driver cannot be more than a long-term goal: for one thing, the UVC standard describes many features and technologies for which no devices exist today, and for another, not even Windows ships with such a driver. What is important, and this is a short-term goal that will be reached soon, is that the driver supports the features that today's devices use. Luckily, the list of tasks to get there is now down to a relatively small number of items. A few of these are discussed below.

Support for status interrupts

The UVC standard defines a status interrupt endpoint that devices must implement if they want to take advantage of certain special features. These are:

• Hardware triggers (e.g. buttons on the camera device for functions such as still image capturing)

• Asynchronous controls (e.g. motor controls whose execution can take a considerable amount of time and after completion of which the driver should be notified)

• AutoUpdate controls (controls whose values can change without an external set request, e.g. sensor-based controls)

When such an event occurs, the device sends a corresponding interrupt packet to the host and the UVC driver can take the necessary action, for example update the internal state or pass the notification on to user space applications.

Currently, the Linux UVC driver has no support for status interrupts and consequently ignores the packets. While this has no influence on the video stream itself, it prevents applications from receiving device button events or from being notified when a motor control command has finished. The latter can be quite useful for applications because they may want to prevent the user from sending further motion commands while the device is still moving.


In the context of mechanical pan/tilt there are two other issues that the lack of such a notification brings with it:

1. Motion tracking. When a motion tracking algorithm, like the one used for multiple face tracking in liblumvp, issues a pan or tilt command to the camera, it must temporarily stop processing the video frames for the duration of the movement. Otherwise, the entire scene would be interpreted as being in motion due to the viewport translation that takes place. After the motion has completed, the algorithm must resynchronize. If the algorithm has no way of knowing the exact completion time, it must resort to approximations and guesswork, thereby decreasing its performance. This is what liblumvp does at the moment.

2. Keeping track of the current angle. If the hardware itself does not provide the driver with information about the current pan and tilt angles, the driver or a user space library can approximate them by keeping track of the relative motion commands it sends to the device. For this purpose, it needs to know whether a given command has succeeded and, if so, at what point in time, in order to avoid overlapping requests.
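The bookkeeping described in point 2 can be sketched in a few lines. The following is a hypothetical illustration; the class and method names are invented, and the fixed duration estimate stands in for the completion notification that the driver cannot currently deliver:

```python
import time

class PanTiltTracker:
    """Approximate the current pan/tilt angles by accumulating relative
    motion commands and rejecting overlapping requests (hypothetical sketch)."""

    def __init__(self):
        self.pan = 0            # accumulated pan angle (device units)
        self.tilt = 0           # accumulated tilt angle (device units)
        self.busy_until = 0.0   # estimated completion time of the last command

    def move_relative(self, d_pan, d_tilt, duration_estimate=0.5):
        # Refuse overlapping requests while the device is (presumably) moving.
        if time.monotonic() < self.busy_until:
            return False
        self.busy_until = time.monotonic() + duration_estimate
        self.pan += d_pan
        self.tilt += d_tilt
        return True

    def motion_complete(self):
        # Would ideally be driven by a status interrupt; for now it simply
        # clears the busy window early.
        self.busy_until = 0.0
```

With status interrupt support, motion_complete() would be called from the event notification instead of relying on the duration estimate.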

One of the reasons why the UVC driver does not currently process status interrupts is that the V4L2 API does not itself have any event notification support. As we saw in section 4.5.1, such a scheme is not easy to implement due to the lack of callback techniques that kernel space components have at their disposal.

The sysfs interface that is about to be included in the UVC driver is a first step in the direction of adding a notification scheme. Since kernel 2.6.17 it is possible to make sysfs attributes pollable (see [2] for an overview of the interface). This polling process does not impose any CPU load on the system because it is implemented with the help of the poll system call. The polling process sleeps and wakes up as soon as one of the monitored attributes changes. For the application this incurs some extra complexity, notably the necessity of multi-threading.

This is clearly a task for a library like libwebcam. The polling functionality only needs to be written once, and at the same time the notifications can be sent using a more application-friendly mechanism like callbacks. libwebcam already has an interface designed for this exact purpose. As soon as the driver is up to the task, applications will be able to register callback functions for individual events, some of them coming from the hardware, others being synthesized by the library itself.
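Such a callback interface could look roughly like the sketch below. All names are hypothetical and merely illustrate the idea: a library-internal polling thread would watch the pollable sysfs attributes and translate changes into application callbacks, so that applications never deal with poll or threads themselves.

```python
import threading
from collections import defaultdict

class EventDispatcher:
    """Sketch of a callback-based event interface. In a real library a
    background thread would poll sysfs attributes and call notify();
    applications only register callbacks. All names are hypothetical."""

    def __init__(self):
        self._callbacks = defaultdict(list)
        self._lock = threading.Lock()

    def register(self, event, callback):
        # Application side: subscribe to an event by name.
        with self._lock:
            self._callbacks[event].append(callback)

    def notify(self, event, value):
        # Library side: called from the (hypothetical) polling thread
        # when a monitored attribute changes.
        with self._lock:
            callbacks = list(self._callbacks[event])
        for cb in callbacks:
            cb(event, value)

# Example: an application registers for control change events and the
# polling thread reports a brightness update.
dispatcher = EventDispatcher()
seen = []
dispatcher.register("control-changed", lambda ev, val: seen.append((ev, val)))
dispatcher.notify("control-changed", ("brightness", 255))
```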

Sysfs permissions

Another problem that still awaits resolution is finding a method to avoid giving all users arbitrary access to the controls exported to the sysfs virtual file system. Since sysfs attributes have fixed root:root ownership when the UVC driver creates them, this does not leave much choice when it comes to defining permissions. Modes 0660 and 0664, on the one hand, would only give the superuser write access to the sysfs attributes, and therefore to the UVC extension controls. Mode 0666, on the other hand, would permit every user to change the behavior of the attached video devices, leading to a rather undesirable situation: a guest user who happens to be logged in via SSH on a machine on which a video conference is in progress could change settings such as brightness or even cause the camera to tilt, despite not having access to the video stream or the V4L2 interface itself.

For device nodes this problem is usually resolved by changing the group ownership to something like root:video and giving them 0660 permissions. This still does not give fine-grained permissions to individual users, but at least a user has to be a member of the video group to be able to access the camera.

A good solution would be to duplicate the ownership and permissions from the device node and apply them to the sysfs nodes. This would make sure that whoever has access to the V4L2 video device also has access to the device's UVC extensions and controls. Currently, however, such a solution does not seem feasible due to the hard-coded attribute ownership.

Another approach to the problem would be to let user space handle the permissions. Even though sysfs attributes have their UID and GID set to 0 on creation, they do preserve new values when set from user space, e.g. using chmod. A user space application running with elevated privileges could therefore take care of this task.
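A sketch of such a privileged helper is shown below. The function name and the idea of copying the permission bits verbatim are assumptions for illustration; note that os.chown normally requires elevated privileges, which is exactly why this would run as a privileged helper.

```python
import os
import stat

def mirror_device_permissions(device_node, sysfs_attr):
    """Copy ownership and the permission bits from a device node
    (e.g. /dev/video0) to a sysfs attribute, so that whoever may open
    the V4L2 device may also use its UVC extension controls.
    Hypothetical helper; chown requires elevated privileges."""
    st = os.stat(device_node)
    os.chown(sysfs_attr, st.st_uid, st.st_gid)
    # Keep only the permission bits, e.g. 0660 for root:video device nodes.
    os.chmod(sysfs_attr, stat.S_IMODE(st.st_mode))
```

A udev rule or init script could invoke such a helper whenever a camera is plugged in.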

Ongoing development

The ongoing development of the UVC driver is of course not a limitation in itself. The limitation merely stems from the fact that not all of the proposed changes have made their way into the main driver branch yet. As of the time of this writing, the author is rewriting parts of the driver to be more modular and to better adapt them to future needs. At the same time, he is integrating the extensions presented in 6.1 piece by piece. The latest SVN version of the UVC driver does not yet contain the sysfs interface, but it will be added as soon as the completely rewritten control management is finished. Therefore, for the time being, users who want to try out the webcam framework in its entirety, in particular functions that require raw access to the extension units, need to use the version distributed as part of the framework.

Another aspect of the current rewrite is the consolidation of some internal structures, notably the combination of the uvc_terminal and uvc_unit structs. This will simplify large parts of the control code because both entity types can contain controls. The version distributed with the framework does not properly support controls on the camera terminal. This only affects the controls related to exposure parameters and will automatically be fixed during the merge back.


Still image support

The UVC specification includes features to retrieve still images from the camera. Still images are treated differently from streaming video in that they do not have to be real-time, which gives the camera time to apply image quality enhancing algorithms and techniques.

At the moment, the Linux UVC driver does not support this method at all. This is hardly a limitation because current applications are simply not prepared for such a special mode. All single frame capture applications that currently exist open a video stream and then process single frames only, something that obviously works perfectly fine with the UVC driver.

In the future one could, however, think of some interesting features like the ability to read still images directly from /dev/videoX after setting a few parameters in sysfs. This would allow frame capturing with simple command line tools or amazingly simple scripts. Imagine the following, for example:

dd if=/dev/video0 of=capture.jpg

It would be fairly simple to extend the driver to support such a feature, but the priorities are clearly elsewhere at the moment.

7.5.2 Linux webcam framework

Missing event support

The fact that libwebcam currently lacks support for events, even though the interface for them is already in place, was mentioned above. To give the reader an idea of what the future holds, let us look at the list of events that libwebcam and libwebcampanel could support:

• Device discovered/unplugged

• Control value changed automatically (e.g. for UVC AutoUpdate controls)

• Control value changed by client (to synchronize multiple clients of libwebcam)

• Control value change completed (for asynchronous controls)

• Other, driver-specific events

• Feature control value changed (libwebcampanel only)

• Events specific to liblumvp feature plugins (libwebcampanel only)

Again, the events supported by libwebcampanel will be a superset of those known to libwebcam, in a manner analogous to controls.


Single stream per device

The entire framework is laid out to work with only a single video stream at a time. This means that it is impossible to multiplex the stream, for example with the help of the GStreamer tee element, and control the feature plugins separately for both substreams.

This design decision was made for simple practicality; the additional work required would hardly justify the benefits. For most conceivable applications this is not a limitation, though. There are no applications today that provide multiple video windows per camera at the same time, and the possible use cases seem restricted to development and debugging purposes.

There is another reason why it is unlikely that such applications will appear in the near future: the XVideo extension used on Linux to accelerate video rendering can only be used by one stream at a time, so any additional streams would have to be rendered using unaccelerated methods. In GStreamer terms this means that the slower ximagesink would have to be used instead of xvimagesink, which is the default in LVGstCap.

7.6 Outlook

Providing an outlook on the further development of the Linux webcam framework is not easy at this moment, given that it has not been published yet and has therefore received very little feedback. There are, however, a few signs that there is quite some demand out there for Linux webcam software as well as related information.

For one thing, requests and responses that come up on the Linux UVC mailing list clearly show that the current software has deficits. A classic example is the fact that there are still many programs out there that do not support V4L2 but are still based on the deprecated V4L1 interface. Even V4L2 applications still use API calls that are not suitable for digital devices, clearly showing their origins in the world of TV cards.

For another thing, the demand for detailed and reliable information is quite large. Linux users who want to use webcams have a number of information-related problems to overcome. Typical questions that arise are:

• What camera should I buy so that it works on Linux?

• I have camera X. Does it work on Linux?

• Which driver do I need? Where do I download it?

• How do I compile and install the driver? How can I verify its proper functioning?

• What applications are there? What can they do?

• What camera features are supported? What would it take to fix this?

None of these questions is easy to answer. Even though the information is present somewhere on the web, it is usually hard to find because there is no single point to start from. Many sites are incomplete and/or feature outdated information, making the search even harder.

Providing software is thus not the only task on the to-do list of Linux webcam developers. More and better information is required, something that Logitech is taking the initiative in. Together with the webcam framework, Logitech will publish a website that is designated to become such an information portal. At the end of this chapter we will give more details about that project.

In terms of software, the Linux webcam framework certainly has the potential to spur the development of great new webcam applications as well as to give new and improved tools to preexisting ones. Our hope is that, on the one hand, the broader use of the framework will bring forth further needs that can be satisfied by future versions and, on the other hand, that the project will give impulses for improving the existing components. The Linux UVC driver is one such component that is rapidly improving. As we have seen during the discussion of limitations above, new versions will create the need for libwebcam extensions.

But libwebcam is not the only component that will see further improvements. Logitech will add more feature plugins to liblumvp as the framework gains momentum, with the most prominent one being an algorithm for face tracking. Compared to the current motion tracking algorithm it performs much better when there is only a single person visible in the picture.

7.7 Licensing

The licensing of open source software is a complex topic, especially when combined with closed source components. There are literally hundreds of different open source licenses out there, and many projects choose to use their own adapted license, further complicating the situation.

7.7.1 Libraries

One key point that poses constraints on the licensing of a project is the set of licenses used by the underlying components. In our case, this situation is quite easy. The only closed source component of our framework, liblumvp, uses GStreamer, which is in turn developed under the LGPL. The LGPL is considered one of the most appropriate licenses for libraries because it allows both open and closed source components to link against it. Such a licensing scheme considerably increases the number of potential users because developers of closed source applications do not need to reinvent the wheel but can instead rely on libraries proven to be stable. For this reason libwebcam and libwebcampanel are also released under the LGPL, enabling any application to link against them and use their features. The same reasoning applies to the lvfilter GStreamer plugin.

The only closed source component of the webcam framework is the liblumvp library. Some of the feature plugins contain code that Logitech has licensed from third parties under conditions that disallow their distribution in source code form. While liblumvp is free of charge, it is covered by an end-user license agreement very similar to the one that is used for Logitech's Windows applications.

There is one question that keeps coming up in Internet forums when closed source components are discussed: "Why doesn't the company want to publish the source code?" The answer is usually not that companies do not want to but that they cannot for legal reasons. Hardware manufacturers often buy software modules from specialized companies, and these licenses do not allow the source to be made public.

7.7.2 Applications

All non-library code, in particular LVGstCap and lvcmdpanel, is licensed under version 2 of the GNU GPL. This allows anybody to make changes to the code and publish new versions as long as the modified source code is also made available.

Table 7.1 gives an overview of the licenses used for the different components of this project. The complete text of the GPL and LGPL licenses can be found in [7] and [8].

Component        License

libwebcam        LGPL
libwebcampanel   LGPL
lvfilter         LGPL
liblumvp         Closed source
LVGstCap         GPL
lvcmdpanel       GPL
Samples          Public domain

Table 7.1: Overview of the licenses used for the Linux webcam framework components.

7.8 Distribution

Making the webcam framework public and getting people to use it, test it, and provide feedback will be an important task of the upcoming months. Logitech is currently setting up a web server that is expected to go online in the last quarter of 2006 and will contain the following:

• List of drivers: Overview of the different webcam drivers available for Logitech cameras.


• Compatibility information: Which devices work with which drivers?

• FAQ: Answers to questions that frequently come up in the context of webcams.

• Downloads: All components of the Linux webcam framework (incl. sources except for liblumvp).

• Forum: Possibility for users to discuss problems with each other and ask questions to Logitech developers.

The address will be announced through the appropriate channels, for example on the mailing list of the Linux UVC driver.


Chapter 8

The new webcam infrastructure at work

After the technical details it is now time to see the webcam framework in action, or at least static snapshots of this action. The user only has direct contact with the video capture application LVGstCap and the panel application lvcmdpanel. The work of the remaining components is, however, still visible, especially in the case of lvcmdpanel, whose interface is very close to libwebcampanel's.

8.1 LVGstCap

Figure 8.1 shows a screenshot of LVGstCap with its separation into video and control areas. The video window on the left displays the current picture streaming from the webcam, while the right-hand side contains both camera and feature controls in separate tabs.

The Camera tab allows the user to change settings directly related to the image and the camera itself. All control elements are dynamically generated from the information that libwebcampanel provides. The Features tab gives control over the plugins that liblumvp contains. Currently it allows flipping the image about the horizontal and vertical axes and enabling or disabling the face tracker.

8.2 lvcmdpanel

The following console transcript shows an example of how lvcmdpanel can be used.

$ lvcmdpanel -l
Listing available devices:


Figure 8.1: A screenshot of LVGstCap with the format choice menu open.

video0   Unknown Logitech camera (0x08cc)
video1   Logitech QuickCam Fusion

There are two devices in the system; one was recognized, the other one was detected as an unknown Logitech device and its USB PID is displayed instead.

$ lvcmdpanel -d video1 -c
Listing available controls for device video1:

Power Line Frequency
Backlight Compensation
Gamma
Contrast
Brightness

$ lvcmdpanel -d video1 -cv
Listing available controls for device video1:

Power Line Frequency
  ID      : 13,
  Type    : Choice,
  Flags   : { CAN_READ, CAN_WRITE, IS_CUSTOM },
  Values  : { ’Disabled’[0], ’50 Hz’[1], ’60 Hz’[2] },
  Default : 2

Backlight Compensation
  ID      : 12,
  Type    : Dword,
  Flags   : { CAN_READ, CAN_WRITE, IS_CUSTOM },
  Values  : [ 0 .. 2, step size: 1 ],
  Default : 1


Gamma
  ID      : 6,
  Type    : Dword,
  Flags   : { CAN_READ, CAN_WRITE },
  Values  : [ 100 .. 220, step size: 120 ],
  Default : 220

Contrast
  ID      : 2,
  Type    : Dword,
  Flags   : { CAN_READ, CAN_WRITE },
  Values  : [ 0 .. 255, step size: 1 ],
  Default : 32

Brightness
  ID      : 1,
  Type    : Dword,
  Flags   : { CAN_READ, CAN_WRITE },
  Values  : [ 0 .. 255, step size: 1 ],
  Default : 127

The -c command line switch outputs a list of controls supported by the specified video device, in this case the second one. For the second list the verbose switch was enabled, which yields detailed information about the type of control, the accepted and default values, etc. (Note that the output was slightly shortened by leaving out a number of less interesting controls.)

The final part of the transcript is easiest to follow by first starting an instance of luvcview in the background. The commands below change the brightness of the image while luvcview, or any other video application, is running.

$ lvcmdpanel -d video1 -g brightness
127

$ lvcmdpanel -d video1 -s brightness 255

$ lvcmdpanel -d video1 -g brightness
255

The current brightness value is 127, as printed by the first command. The second command changes the brightness to the maximum of 255, and the third one shows that the value was in fact changed.

The last example shows how simple it is to create scripts that automate tasks with the help of panel applications. Even writing an actual panel application is very straightforward; lvcmdpanel consists of less than 400 lines of code and already covers the basic functionality.
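As an illustration of such scripting, the sketch below builds a simple brightness ramp out of repeated lvcmdpanel invocations. The -d and -s switches are those shown in the transcript above; everything else (the function names and the step width) is invented for this example, and actually running the ramp requires lvcmdpanel to be installed.

```python
import subprocess

def brightness_ramp_commands(device, start, stop, step):
    """Build the lvcmdpanel invocations for a simple brightness fade.
    The -d/-s switches are those shown in the transcript above."""
    return [["lvcmdpanel", "-d", device, "-s", "brightness", str(v)]
            for v in range(start, stop + 1, step)]

def run_ramp(device="video1"):
    # Would actually drive the camera; requires lvcmdpanel to be installed.
    for cmd in brightness_ramp_commands(device, 0, 255, 51):
        subprocess.run(cmd, check=True)
```

An equivalent shell loop would be just as short, which is precisely the point of exposing the panel functionality through a command line tool.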


Chapter 9

Conclusion

Jumping into work in the open source community with the support of a company at one's back is truly gratifying. The expression "it's the little things that count" immediately comes to mind, and the positive reactions one receives, even for small favors, are a great motivation along the way.

Having been on the user side of hardware and software products for many years myself, I know how helpful the little insider tips can be. Until recently most companies were unaware of the fact that small pieces of information that seem obvious inside a product team can have a much higher value when carried outside. The success of modern media like Internet forums with employee participation and corporate blogs is a clear sign of this. Open source is in some ways similar to these media. Simple information that is given out comes back in the form of improved product support, drivers written from scratch, and, last but not least, reputation.

The Logitech video team has had such a relationship with the open source community for a while, although in a rather low-profile manner, leading to little public perception. This is the first time that we have actively participated, and while it remains to be seen what the influence of the project will be, the little feedback we have received so far makes us confident that the project is a success and will not end here.

As far as the author's personal experience is concerned, the vast majority of it was of a positive nature. I was in contact with project mailing lists, developers, and ordinary users of open source software without a strong programming background. Of these three, the last two are certainly the easiest to work with. Developers are grateful for feedback, test results, suggestions, and patches, whereas users appreciate help with questions to which the answers are not necessarily obvious.

Mailing lists are a category of their own. While many fruitful discussions are held, some of them reminded me of modern politics. What makes democracy a successful process is the fact that everybody has their say and everybody is encouraged to speak up, something that holds true for mailing lists as well. Unfortunately, the good and the bad sides go hand in hand, and so mailing lists inherit the dangers of slow decision making and standstill. Many discussions fail to reach a conclusion and silently dissolve, much to the frustration of the person who brought up the topic. If open source developers need to learn one thing, it is to see their users as customers and to treat them as such. The pragmatic solution often beats the technically more elegant one in terms of utility, a fact that each developer must learn to live with.

The future will show whether we are able to reach our long-term goal: achieving a webcam experience that can catch up with what Windows offers nowadays. Linux has undoubtedly become a competitive platform, but in order not to lose its momentum it must focus on its weaknesses, and multimedia is clearly one of them. The components are there for the most part, but they need to be consistently improved to make sure that they work together more closely. There are high hopes for KDE 4 with its multimedia architecture, and camera support will definitely have its place in it. The moment when Linux users can plug in their webcam, start their favorite instant messenger, and have a video conference taking advantage of all the camera's features is within grasp, an opportunity not to be missed.


Appendix A

List of Logitech webcam USB PIDs

This appendix contains a list of webcams manufactured by Logitech, their USB identifiers, and the names of the drivers they are reported or tested to work with. We use the following abbreviated driver names in the table:

Key Driver

pwc Philips USB Webcam driver (see 4.3.1)

qcexpress QuickCam Express driver (see 4.3.4)

quickcam QuickCam Messenger & Communicate driver (see 4.3.3)

spca5xx Spca5xx Webcam driver (see 4.3.2)

uvcvideo Linux USB Video Class driver (see 4.3.5)

The table below contains the following information:

1. The USB product ID as reported, for example, by lsusb. Note that the vendor ID is always 0x046D.

2. The ASIC that the camera is based on.

3. The name under which the product was released.

4. The driver by which the camera is supported.

An asterisk means that the state of support for the given camera is untested but that the camera is likely to work with the driver given the ASIC. Possibly the driver may need patching in order to recognize the given PID. A dash means that the camera is not currently supported.


PID ASIC Product name Driver

0840 ST600 Logitech QuickCam Express qcexpress

0850 ST610 Logitech QuickCam Web qcexpress

0870 ST602 Logitech QuickCam Express / Logitech QuickCam for Notebooks / Labtec WebCam qcexpress

0892 VC321 Acer OrbiCam –

0896 VC321 Acer OrbiCam –

08A0 VC301 Logitech QuickCam IM spca5xx

08A2 VC302 Labtec Webcam Plus spca5xx

08A4 VC301 Logitech QuickCam IM spca5xx (*)

08A7 VC302 Logitech QuickCam Image spca5xx (*)

08A9 VC302 Logitech QuickCam for Notebooks Deluxe spca5xx

08AA VC302 Labtec Notebook Pro spca5xx

08AC VC301 Logitech QuickCam IM spca5xx (*)

08AD VC302 Logitech QuickCam Communicate STX spca5xx

08AE VC302 Logitech QuickCam for Notebooks spca5xx

08B0 SAA8116 Logitech QuickCam Pro / Logitech QuickCam Pro 3000 pwc

08B1 SAA8116 Logitech QuickCam Pro for Notebooks pwc

08B2 SAA8116 Logitech QuickCam Pro 4000 pwc

08B3 SAA8116 Logitech QuickCam Zoom pwc

08B4 SAA8116 Logitech QuickCam Zoom pwc

08B5 SAA8116 Logitech QuickCam Orbit / Logitech QuickCam Sphere pwc

08B6 SAA8116 Cisco VT Camera pwc

08B7 SAA8116 Logitech ViewPort AV100 pwc

08BD SAA8116 Logitech QuickCam Pro 4000 pwc

08BE SAA8116 Logitech QuickCam Zoom pwc

08C1 SPCA525 Logitech QuickCam Fusion uvcvideo

08C2 SPCA525 Logitech QuickCam Orbit MP / Logitech QuickCam Sphere MP uvcvideo

08C3 SPCA525 Logitech QuickCam for Notebooks Pro uvcvideo

08C5 SPCA525 Logitech QuickCam Pro 5000 uvcvideo

08C6 SPCA525 QuickCam for Dell Notebooks uvcvideo

08C7 SPCA525 Cisco VT Camera II uvcvideo

08D9 VC302 Logitech QuickCam IM / Logitech QuickCam Connect spca5xx

08DA VC302 Logitech QuickCam Messenger spca5xx


08F0 ST6422 Logitech QuickCam Messenger quickcam

08F1 ST6422 Logitech QuickCam Express quickcam (*)

08F4 ST6422 Labtec WebCam quickcam (*)

08F5 ST6422 Logitech QuickCam Communicate quickcam

08F6 ST6422 Logitech QuickCam Communicate quickcam

0920 ICM532 Logitech QuickCam Express spca5xx

0921 ICM532 Labtec WebCam spca5xx

0922 ICM532 Logitech QuickCam Live spca5xx (*)

0928 SPCA561B Logitech QuickCam Express spca5xx

0929 SPCA561B Labtec WebCam spca5xx

092A SPCA561B Logitech QuickCam for Notebooks spca5xx

092B SPCA561B Labtec WebCam Plus spca5xx

092C SPCA561B Logitech QuickCam Chat spca5xx

092D SPCA561B Logitech QuickCam Express spca5xx (*)

092E SPCA561B Logitech QuickCam Chat spca5xx (*)

092F SPCA561B Logitech QuickCam Express spca5xx

09C0 SPCA525 QuickCam for Dell Notebooks uvcvideo


Bibliography

[1] Jonathan Corbet. Linux loses the Philips webcam driver. LWN, 2004. URL http://lwn.net/Articles/99615/.

[2] Jonathan Corbet. Some upcoming sysfs enhancements. LWN, 2006. URL http://lwn.net/Articles/174660/.

[3] Creative. Creative Open Source: Webcam support. URL http://opensource.creative.com/.

[4] Bill Dirks. Video for Linux Two - Driver Writer’s Guide, 1999. URL http://www.thedirks.org/v4l2/v4l2dwg.htm.

[5] Bill Dirks, Michael H. Schimek, and Hans Verkuil. Video for Linux Two API Specification, 1999-2006. URL http://www.linuxtv.org/downloads/video4linux/API/V4L2_API/.

[6] USB Implementers Forum. Universal Serial Bus Device Class Definition for Video Devices. Revision 1.1 edition, 2005. URL http://www.usb.org/developers/devclass_docs.

[7] Free Software Foundation. GNU General Public License. 1991. URL http://www.gnu.org/copyleft/gpl.html.

[8] Free Software Foundation. GNU Lesser General Public License. 1999. URL http://www.gnu.org/copyleft/lesser.html.

[9] Philip Heron. fswebcam, 2006. URL http://www.firestorm.cx/fswebcam/.

[10] Mike Isely. pvrusb2 driver, 2006. URL http://www.isely.net/pvrusb2/pvrusb2.html.

[11] Greg Jones and Jens Knutson. Camorama, 2005. URL http://camorama.fixedgear.org/.

[12] Avery Lee. Capture timing and capture sync. 2005. URL http://www.virtualdub.org/blog/pivot/entry.php?id=78.


[13] Marco Lohse. Setting up a Video Wall with NMM. 2004. URL http://graphics.cs.uni-sb.de/NMM/current/Docs/videowall/index.html.

[14] Christian Magnusson. QuickCam Messenger & Communicate driver forLinux, 2006. URL http://home.mag.cx/messenger/.

[15] Juan Antonio Martínez. VideoForLinux: El canal del Pingüino (“ThePenguin Channel”). TLDP-ES/LuCAS, 1998. URL http://es.tldp.org/Articulos-periodisticos/jantonio/video4linux/v4l_1.html.

[16] Motama and Saarland University Computer Graphics Lab. Network-Integrated Multimedia Middleware. URL http://www.networkmultimedia.org/.

[17] Laurent Pinchart. Linux UVC driver, 2006. URL http://linux-uvc.berlios.de/.

[18] Damien Sandras. Ekiga, 2006. URL http://www.ekiga.org/.

[19] Tuukka Toivonen and Kurt Wal. QuickCam Express Driver, 2006. URLhttp://qce-ga.sourceforge.net/.

[20] Linus Torvalds. Linux GPL and binary module exception clause?, December 2003. URL http://www.ussg.iu.edu/hypermail/linux/kernel/0312.0/0670.html.

[21] Dave Wilson. FOURCC.org, 2006. URL http://www.fourcc.org/.

[22] Michel Xhaard. luvcview, 2006. URL http://mxhaard.free.fr/spca50x/Investigation/uvc/.

[23] Michel Xhaard. SPCA5xx Webcam driver, 2006. URL http://mxhaard.free.fr/spca5xx.html.
