First Edition Community Release



    Multi-Touch Technologies

    NUI Group Authors

    1st edition [Community Release]: May 2009

This book is © 2009 NUI Group. All the content on this page is protected by an Attribution-Share Alike License. For more information, see http://creativecommons.org/licenses/by-sa/3.0

    Cover design: Justin Riggio

    Contact: [email protected]


Contents

Preface . . . . . . . . . . . . . . . . . . . . . . . . . i
About this book . . . . . . . . . . . . . . . . . . . . . ii
About the authors . . . . . . . . . . . . . . . . . . . . iii

Chapter 1: Hardware Technologies . . . . . . . . . . 1

1.1 Introduction to Hardware 2
1.2 Introduction to Optical Multi-Touch Technologies 3
1.2.1 Infrared Light Sources . . . . . . . . . . . . . . . . . . 3
1.2.2 Infrared Cameras . . . . . . . . . . . . . . . . . . . . 5
1.3 Frustrated Total Internal Reflection (FTIR) 9
1.3.1 FTIR Layers . . . . . . . . . . . . . . . . . . . . . . 10
1.3.2 Compliant Surfaces . . . . . . . . . . . . . . . . . . 11
1.4 Diffused Illumination (DI) 13
1.4.1 Front Diffused Illumination . . . . . . . . . . . . . . 13
1.4.2 Rear Diffused Illumination . . . . . . . . . . . . . . 13
1.5 Laser Light Plane (LLP) 15
1.6 Diffused Surface Illumination (DSI) 17
1.7 LED Light Plane (LED-LP) 18
1.8 Technique Comparison 20
1.9 Display Techniques 23
1.9.1 Projectors . . . . . . . . . . . . . . . . . . . . . . . 23
1.9.2 LCD Monitors . . . . . . . . . . . . . . . . . . . . . 24

Chapter 2: Software & Applications . . . . . . . . 25

2.1 Introduction to Software Programming 26
2.2 Tracking 27
2.2.1 Tracking for Multi-Touch . . . . . . . . . . . . . . . 28
2.3 Gesture Recognition 30
2.3.1 GUIs to Gesture . . . . . . . . . . . . . . . . . . . . 30
2.3.2 Gesture Widgets . . . . . . . . . . . . . . . . . . . . 31
2.3.3 Gesture Recognition . . . . . . . . . . . . . . . . . . 32
2.3.4 Development Frameworks . . . . . . . . . . . . . . . 34
2.4 Python 37
2.4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . 37
2.4.2 Python Multi-touch Modules and Projects . . . . . . 38
2.4.3 PyMT . . . . . . . . . . . . . . . . . . . . . . . . . 39
2.4.4 PyMT Architecture . . . . . . . . . . . . . . . . . . 39
2.4.5 OpenGL . . . . . . . . . . . . . . . . . . . . . . . . 40
2.4.6 Windows, Widgets and Events . . . . . . . . . . . . 41
2.4.7 Programming for Multi-Touch Input . . . . . . . . . 45
2.4.8 Implementing the Rotate/Scale/Move Interaction . . 46
2.4.9 Drawing with Matrix Transforms . . . . . . . . . . . 47
2.4.10 Parameterizing the Problem and Computing the Transformation . . 48
2.4.11 Interpret Touch Locations to Compute the Parameters 50
2.5 ActionScript 3 & Flash 52
2.5.1 Opening the Lines of Communication . . . . . . . . 52
2.6 .NET/C# 56
2.6.1 Advantages of using .NET . . . . . . . . . . . . . . 56
2.6.2 History of .NET and multi-touch . . . . . . . . . . . 56
2.6.3 Getting started with .NET and multi-touch . . . . . 57
2.7 Frameworks and Libraries 59
2.7.1 Computer Vision . . . . . . . . . . . . . . . . . . . 59
2.7.2 Gateway Applications . . . . . . . . . . . . . . . . . 61
2.7.3 Clients . . . . . . . . . . . . . . . . . . . . . . . . . 61
2.7.4 Simulators . . . . . . . . . . . . . . . . . . . . . . . 63

Appendix A: Abbreviations 65
Appendix B: Building an LLP setup 67
Appendix C: Building a Rear-DI Setup 75
Appendix D: Building an Interactive Floor 81
Appendix E: References 87


Introduction to Multi-Touch

In this chapter:

About this book
About the authors


    ii Multi-Touch Technologies Book

    About this book

Founded in 2006, Natural User Interface Group (~ NUI Group) is an interactive media community researching and creating open source machine sensing techniques to benefit artistic, commercial and educational applications. It is the largest online (~5000+) open source community related to the open study of Natural User Interfaces.

NUI Group offers a collaborative environment for developers that are interested in learning and sharing new HCI (Human Computer Interaction) methods and concepts. This may include topics such as: augmented reality, voice/handwriting/gesture recognition, touch computing, computer vision, and information visualization.

This book is a collection of wiki entries and chapters written exclusively for this book, making this publication the first of its kind and unique in its field. All contributors are listed here, and countless others have been a part of the community that has developed and nurtured the NUI Group multi-touch community; to them we are indebted.

We are sure you'll find this book pleasant to read. Should you have any comments, criticisms or section proposals, please feel free to contact us via [email protected]. Our doors are always open, and we are always looking for new people with similar interests and dreams.


About the authors

Alex Teiche is an enthusiast who has been working with Human-Computer Interaction for several years, with a very acute interest in collaborative multi-user interaction. He recently completed his LLP table, and has been contributing to the PyMT project for several months. He is now experimenting with thin multi-touch techniques that can be constructed using equipment easily accessible to the average hobbyist.

Ashish Kumar Rai is an undergraduate student in Electronics Engineering at Institute of Technology, Banaras Hindu University, India. His major contribution is the development of the simulator QMTSim. Currently his work consists of development of innovative hardware and software solutions for the advancement of NUI.

Chris Yanc has been a professional interactive designer at an online marketing agency in Cleveland, OH for over 4 years. After constructing his own FTIR multi-touch table, he has been focusing on using Flash and ActionScript 3.0 to develop multi-touch applications from communication with CCV and Touchlib. Chris has been sharing the progress of his multi-touch Flash experiments with the NUI Group community.

Christian Moore is a pioneer in open-source technology and is one of the key evangelists of Natural User Interfaces. As the founder of NUI Group, he has led a community that has developed state-of-the-art human sensing techniques and is working on the creation of open standards to unify the community's efforts into well-formatted documentation and frameworks.

Donovan Solms is a Multimedia student at the University of Pretoria in Gauteng, South Africa. After building the first multi-touch screen in South Africa in the first quarter of 2007, he continued to develop software for multi-touch using .NET 3 and ActionScript 3. He can currently be found building commercial multi-touch surfaces and software at Multimedia Solutions in South Africa.

Görkem Çetin defines himself an open source evangelist. His current doctorate studies focus on human computer interaction issues of open source software. Çetin has authored 5 books, written numerous articles for technical and sector magazines and spoken at various conferences.

Justin Riggio is a professional interactive developer and designer who has been working freelance for over 5 years now. He lives and works in the NYC area and can be found at the Jersey shore surfing at times. He has designed and built a 40" screen DI multi-touch table and has set up and tested LLP LCD setups and also projected units. You can find on his drafting table a new plan for a DSI system and an AS3 multi-touch document reader.

Nolan Ramseyer is a student at University of California, San Diego in the United States studying Bioengineering: Biotechnology with a background in computers and multimedia. His work in optical-based multi-touch has consumed a large part of his free time, and through his experimentation and learning he has helped others achieve their own projects and goals.

Paul D'Intino is the Technical Training Manager at Pioneer Electronics in Melbourne, Australia. In 2003 he completed a Certificate III in Electrotechnology Entertainment and Servicing, and went on to repairing Consumer Electronics equipment, specializing in Plasma TVs. His work in both hardware and software for multi-touch systems grew from his passion for electronics, which he aims to further develop with help from the NUI Group.

Laurence Muller (M.Sc.) is a scientific programmer at the Scientific Visualization and Virtual Reality Research Group (Section Computational Science) at the University of Amsterdam in the Netherlands. In 2008 he completed his thesis research project titled "Multi-touch displays: design, applications and performance evaluation." Currently he is working on innovative scientific software for multi-touch devices.

Ramsin Khoshabeh is a graduate Ph.D. student at the University of California, San Diego in the Electrical and Computer Engineering department. He works in the Distributed Cognition and Human-Computer Interaction Laboratory in the Cognitive Science department. His research is focused on developing novel tools for the interaction with and analysis of large datasets of videos. As part of his work, he has created MultiMouse, a multi-touch Windows mouse interface, and has built VideoNavigator, a cross-platform multi-touch video navigation interface.

Rishi Bedi is a student in Baltimore, Maryland with an interest in emerging user interaction technologies, programming in several languages, and computer hardware. As an actively contributing NUI Group community member, he hopes to build his own multi-touch devices & applications, and to continue to support the community's various endeavors. Also interested in graphic design, he has designed this book and has worked in various print and electronic mediums.

Mohammad Taha Bintahir is an undergraduate student at University of Ontario Institute of Technology in Toronto, Canada studying Electrical Engineering and Business and focusing on microelectronics and programming. He is currently doing personal research for his thesis project involving both electronic and optical based multi-touch and multi-modal hardware systems.

Thomas Hansen is a PhD student at the University of Iowa and an open source enthusiast. His research interests include Information Visualization, Computer Graphics, and especially Human Computer Interaction directed at therapeutic applications of novel interfaces such as multi-touch displays. This and his other work on multi-touch interfaces has been supported by the University of Iowa's Mathematical and Physical Sciences Funding Program.

Tim Roth studied Interaction Design at the Zurich University of the Arts. During his studies, he focussed on interface design, dynamic data visualization and new methods of human computer interaction. His diploma dealt with the customer and consultant situation in the private banking sector. The final output was a multi-touch surface that helped the consultant explain and visualize the complexities of modern financial planning.

Seth Sandler is a University of California, San Diego graduate with a Bachelor's degree in Interdisciplinary Computing and the Arts, with an emphasis in music. During Seth's final year of study he began researching multi-touch and ultimately produced a musical multi-touch project entitled AudioTouch. Seth is most notably known for his presence on the NUI Group forum, the introduction of the MTmini, and CCV development.

Editors

This book was edited & designed by Görkem Çetin, Rishi Bedi and Seth Sandler.

Copyright

This book is © 2009 NUI Group. All the content on this page is protected by an Attribution-Share Alike License. For more information, see http://creativecommons.org/licenses/by-sa/3.0



Chapter 1
Multi-Touch Technologies

In this chapter:

1.1 Introduction to Hardware
1.2 Introduction to Optical Techniques
1.3 Frustrated Total Internal Reflection (FTIR)
1.4 Diffused Illumination (DI)
1.5 Laser Light Plane (LLP)
1.6 Diffused Surface Illumination (DSI)
1.7 LED Light Plane (LED-LP)
1.8 Technique Comparison
1.9 Display Techniques


    1.1 Introduction to Hardware

Multi-touch (or multitouch) denotes a set of interaction techniques that allow computer users to control graphical applications with several fingers. Multi-touch devices consist of a touch screen (e.g., computer display, table, wall) or touchpad, as well as software that recognizes multiple simultaneous touch points, as opposed to the standard touchscreen (e.g. computer touchpad, ATM), which recognizes only one touch point.

The Natural User Interface and its influence on multi-touch gestural interface design has brought key changes in computing hardware design, especially the creation of true multi-touch hardware systems (i.e. support for more than two inputs). The aim of NUI Group is to provide an open platform where hardware and software knowledge can be exchanged freely; through this free exchange of knowledge and information, there has been an increase in hardware development.

On the hardware frontier, NUI Group aims to be an informational resource hub for others interested in prototyping and/or constructing a low cost, high resolution open-source multi-input hardware system. Through the community's research efforts, there have been improvements to existing multi-touch systems as well as the creation of new techniques that allow for the development of not only multi-touch hardware systems, but also multi-modal devices. At the moment there are five major techniques being refined by the community that allow for the creation of a stable multi-touch hardware system; these include: Jeff Han's pioneering Frustrated Total Internal Reflection (FTIR), Rear Diffused Illumination (Rear DI) such as Microsoft's Surface table, Laser Light Plane (LLP) pioneered in the community by Alex Popovich and also seen in Microsoft's LaserTouch prototype, LED-Light Plane (LED-LP) developed within the community by Nima Motamedi, and finally Diffused Surface Illumination (DSI) developed within the community by Tim Roth.

These five techniques utilized by the community work on the principle of computer vision and optics (cameras). While optical sensing makes up the vast majority of techniques in the NUI Group community, there are several other sensing techniques that can be utilized in making natural user interface and multi-touch devices. Some of these sensing approaches include proximity, acoustic, capacitive, resistive, motion, orientation, and pressure sensing. Often, various sensors are combined to form a particular multi-touch sensing technique. In this chapter, we explore some of the mentioned techniques.


1.2 Introduction to Optical Multi-Touch Technologies

Optical or light sensing (camera) based solutions make up a large percentage of multi-touch devices. Their scalability, low cost and ease of setup are compelling reasons for the popularity of optical solutions. Stereo vision, overhead cameras, Frustrated Total Internal Reflection, Front and Rear Diffused Illumination, Laser Light Plane, and Diffused Surface Illumination are all examples of camera based multi-touch systems.

Each of these techniques consists of an optical sensor (typically a camera), an infrared light source, and visual feedback in the form of projection or LCD. Prior to learning about each particular technique, it is important to understand these three parts that all optical techniques share.

1.2.1 Infrared Light Sources

Infrared (IR in short) is a portion of the light spectrum that lies just beyond what can be seen by the human eye. It is a range of wavelengths longer than visible light, but shorter than microwaves. Near Infrared (NIR) is the lower end of the infrared light spectrum and is typically considered to span wavelengths between 700 nm and 1000 nm (nanometers). Most digital camera sensors are also sensitive to at least NIR and are often fitted with a filter to remove that part of the spectrum so they only capture the visible light spectrum. By removing the infrared filter and replacing it with one that removes the visible light instead, a camera that only sees infrared light is created.

In regards to multi-touch, infrared light is mainly used to distinguish between a visual image on the touch surface and the object(s)/finger(s) being tracked. Since most systems have a visual feedback system where an image from a projector, LCD or other display is on the touch surface, it is important that the camera does not see this image when attempting to track objects overlaid on the display. In order to separate the objects being tracked from the visual display, a camera, as explained above, is modified to see only the infrared spectrum of light; this cuts out the visual image (visible light spectrum) from being seen by the camera, and therefore the camera is able to see only the infrared light that illuminates the object(s)/finger(s) on the touch surface.

Many brands of acrylic sheet are intentionally designed to reduce their IR transmission above 900 nm to help control heat when used as windows. Some camera sensors have either dramatically greater or dramatically lessened sensitivity at 940 nm, which also has a slightly reduced occurrence in natural sunlight.

IR LEDs are not required, but an IR light source is. In most multi-touch optical techniques (especially LED-LP and FTIR), IR LEDs are used because they are efficient and effective at providing infrared light. On the other hand, Diffused Illumination (DI) does not require IR LEDs per se, but rather some kind of infrared light source, such as an infrared illuminator (which may have LEDs inside). Laser Light Plane (LLP) uses IR lasers as the IR light source.

Most of the time, LEDs can be bought in the form of single LEDs or LED ribbons:

Single LEDs: Single through-hole LEDs are a cheap and easy way to create LED frames when making FTIR, DSI, and LED-LP multi-touch setups. They require some knowledge of soldering and a little electrical wiring when constructing. Online LED calculators make it easy for people to figure out how to wire the LEDs up. The most commonly used through-hole IR LED is the OSRAM SFH 485 P. If you are trying to make an LCD FTIR setup, you will need brighter than normal IR LEDs, so these are probably your best bet.

LED ribbons: The easiest solution for making LED frames. Instead of soldering a ton of through-hole LEDs, LED ribbons are FFC cables with surface mount IR LEDs already soldered onto them. They come with an adhesive side that can be stuck into a frame and wrapped around a piece of acrylic as one continuous LED ribbon. All that is required is to wrap the ribbon, plug it into the power adapter, and you are done. Good quality ribbons can be found at environmentallights.com.

LED emitters: When making Rear DI or Front DI setups, pre-built IR emitters are much easier than soldering a ton of single through-hole LEDs together. For most Rear DI setups, 4 of these are usually all that is needed to completely cover the inside of the box. When using grouped LEDs, you will have to eliminate hot spot IR blobs caused by the emitters by bouncing the IR light off the sides and floor of the enclosed box.
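The wiring math behind those LED calculators is just Ohm's law. A minimal sketch of the series-resistor calculation (the supply voltage, forward voltage and current values below are illustrative examples, not specifications from this book):

```python
def series_resistor(supply_v, led_forward_v, led_current_a, leds_in_series):
    """Series resistor for a string of identical LEDs (Ohm's law).

    R = (V_supply - n * V_forward) / I
    """
    drop = supply_v - leds_in_series * led_forward_v
    if drop <= 0:
        raise ValueError("supply voltage too low for this many LEDs in series")
    return drop / led_current_a

# Example: 12 V supply driving 3 IR LEDs (1.5 V forward drop each) at 100 mA
r = series_resistor(12.0, 1.5, 0.100, 3)
print(f"{r:.0f} ohms")  # 75 ohms
```

Always cross-check the forward voltage and maximum current against the data sheet of the actual LED you buy.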

Before buying LEDs, it is strongly advised to check the LED's data sheet. Wavelength, angle, and radiant intensity are the most important specifications for all techniques.

Wavelength: 780-940 nm. LEDs in this range are easily seen by most cameras, and visible light filters can easily be found for these wavelengths. The lower the wavelength, the higher the sensitivity, which equates to a higher ability to determine pressure.

Radiant intensity: Minimum of 80 mW. The ideal is the highest radiant intensity you can find, which can be much higher than 80 mW.

Angle for FTIR: Anything less than +/- 48 degrees will not take full advantage of total internal reflection, and anything above +/- 48 degrees will escape the acrylic. In order to ensure there is coverage, going beyond +/- 48 degrees is fine, but anything above +/- 60 degrees is really just a waste, as the extra +/- 12 degrees (60 - 48) will escape the acrylic.

Angle for diffused illumination: Wider angle LEDs are generally better. The wider the LED angle, the easier it is to achieve even illumination.

For DI setups, many builders run into the problem of hotspots. To eliminate these, IR light needs to be bounced off the bottom of the box when mounted, to avoid IR hot spots on the screen. Also, a band pass filter for the camera is required; homemade band pass filters such as exposed developed negative film or a piece of a floppy disk will work, but they provide poor results. It is always better to buy a (cheap) band pass filter and fit it into the lens of the camera.

1.2.2 Infrared Cameras

Simple webcams work very well for multi-touch setups, but they need to be modified first. Regular webcams and cameras block out infrared light, letting only visible light in. We need just the opposite. Typically, by opening the camera up, you can simply pop the filter off, but on expensive cameras this filter is usually applied directly to the lens and cannot be modified.

Most cameras will show some infrared light without modification, but much better performance can be achieved if the filter is replaced.

The performance of the multi-touch device depends on the components used; therefore it is important to carefully select your hardware components. Before buying a camera it is important to know for what purpose you will be using it. When you are building your first (small) test multi-touch device, the requirements may be lower than when you are building one that is going to be used for demonstration purposes.

Resolution: The resolution of the camera is very important; the higher the resolution, the more pixels are available to detect fingers or objects in the camera image. This is very important for the precision of the touch device. For small multi-touch surfaces a low resolution webcam (320 x 240 pixels) can be sufficient. Larger surfaces require cameras with a resolution of 640 x 480 or higher in order to maintain precision.
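The resolution/precision trade-off can be made concrete by dividing the surface size by the camera resolution. A small illustrative calculation (the surface dimensions are made-up examples, not recommendations from this book):

```python
def mm_per_pixel(surface_width_mm, surface_height_mm, cam_w_px, cam_h_px):
    """Physical size of one camera pixel projected onto the touch surface.

    Larger values mean coarser tracking; a fingertip (roughly 10 mm wide)
    should cover several pixels for reliable detection.
    """
    return (surface_width_mm / cam_w_px, surface_height_mm / cam_h_px)

# A 4:3, 600 x 450 mm surface seen by two common camera resolutions:
print(mm_per_pixel(600, 450, 320, 240))  # (1.875, 1.875)  - coarse
print(mm_per_pixel(600, 450, 640, 480))  # (0.9375, 0.9375) - finer
```

At 320 x 240 each pixel covers almost 2 mm of surface, which is why that resolution only suffices for small screens.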

Frame rate: The frame rate is the number of frames a camera can capture within one second. More snapshots mean more data about what happened in a specific time step. In order to cope with fast movements and keep the system responsive, a camera with a frame rate of at least 30 frames per second (FPS) is recommended. Higher frame rates provide a smoother and more responsive experience.

Interface: Basically there are two types of interfaces that can be used to connect a camera device to a computer. Depending on the available budget, one can choose between a consumer grade webcam that uses a USB interface or a professional camera that uses the IEEE 1394 interface (also known as FireWire). An IEEE 1394 device is recommended because it usually has less overhead and lower latency in transferring the camera image to the computer. Again, lower latency results in a more responsive system.

Lens type: Most consumer webcams contain an infrared (IR) filter that prevents IR light from reaching the camera sensor. This is done to prevent image distortion. However, for our purpose, we want to capture and use IR light. On some webcams it is possible to remove the IR filter. This filter is placed behind the lens and often has a red color. If it is not possible to remove the IR filter, the lens has to be replaced with a lens without the coating. Webcams often use an M12 mount. Professional series cameras (IEEE 1394) often come without a lens. Depending on the type, it is usually possible to use an M12, C or CS mount to attach the lens.

Choosing the right lens can be a difficult task; fortunately, many manufacturers provide an online lens calculator. The calculator computes the required focal length based on two input parameters: the distance between the lens and the object (touch surface), and the width or height of the touch surface. Be sure to check that the calculator chooses a proper lens. Lenses with a low focal length often suffer from severe image distortion (barrel distortion / fish eye), which can complicate the calibration of the touch tracking software.
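Those online lens calculators implement the pinhole (thin lens) approximation: focal length ≈ sensor size × distance / object size. A minimal sketch (the sensor width and distances below are illustrative assumptions, not values from this book):

```python
def required_focal_length_mm(sensor_mm, distance_mm, surface_mm):
    """Pinhole-camera approximation: f = sensor_size * distance / object_size.

    sensor_mm   - width (or height) of the camera sensor
    distance_mm - lens-to-surface distance
    surface_mm  - width (or height) of the touch surface to cover
    """
    return sensor_mm * distance_mm / surface_mm

# Example: a 1/4" sensor (~3.6 mm wide) mounted 700 mm below an
# 800 mm wide touch surface:
f = required_focal_length_mm(3.6, 700, 800)
print(f"{f:.2f} mm")  # 3.15 mm -> a short, wide-angle lens
```

Note how a short camera-to-surface distance forces a short focal length, which is exactly the wide-angle regime where barrel distortion becomes a problem.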

Camera sensor & IR bandpass filter: Since FTIR works with IR light, we need to check whether our webcam is able to see IR light. Often user manuals mention the sensor name. Using the sensor name, one can find the data sheet of the camera sensor. The data sheet usually contains a page with a graph of the spectral sensitivity characteristics, such as the one from the data sheet of the Sony ICX098BQ CCD sensor [Fig. 1].

Figure 1: Sensitivity of a Sony CCD sensor

The graph shows the spectral sensitivity characteristics. These characteristics show how sensitive the sensor is to light of specific wavelengths. The wavelength of the IR light used here is around 880 nm. Unfortunately, the example image only shows a range between 400 and 700 nm.

Before we can actually use the camera, it is required to add a bandpass filter. Otherwise, the (IR sensitive) camera will also show all other colors of the spectrum. In order to block this light, one can use a cut-off filter or a bandpass filter. The cut-off filter blocks light below a certain wavelength; the bandpass filter only allows light of a specific wavelength to pass through.

Bandpass filters are usually quite expensive; a cheap solution is to use overexposed developed negatives.

Recommended hardware

When using a USB webcam, it is recommended to buy either of the following:

The Playstation 3 camera (640x480 at 30 FPS). The filter can be removed, and higher frame rates are possible at lower resolutions.

Philips SPC 900NC (640x480 at 30 FPS). The filter cannot be removed; it is required to replace the lens with a different one.

When using IEEE 1394 (FireWire), it is recommended to buy:

Unibrain Fire-i (640x480 at 30 FPS), a cheap low latency camera. It uses the same sensor as the Philips SPC 900NC.

Point Grey Firefly (640x480 at 60 FPS)

FireWire cameras have some benefits over normal USB webcams:

Higher frame rate
Larger capture size
Higher bandwidth
Less driver overhead (due to less compression)


1.3 Frustrated Total Internal Reflection (FTIR)

FTIR is a name used by the multi-touch community to describe an optical multi-touch methodology developed by Jeff Han (Han 2005). The phrase actually refers to the well-known optical phenomenon underlying Han's method. Total Internal Reflection describes a condition present in certain materials when light enters one material from another material with a higher refractive index, at an angle of incidence greater than a specific angle (Gettys, Keller and Skove 1989, p. 799). The specific angle at which this occurs depends on the refractive indexes of both materials, and is known as the critical angle, which can be calculated mathematically using Snell's law.
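As a concrete instance of the Snell's law calculation, the critical angle is arcsin(n2/n1), where n1 is the denser medium. A short sketch (the refractive indices for acrylic and air are standard textbook values, not taken from this book):

```python
import math

def critical_angle_deg(n_dense, n_rare):
    """Critical angle for total internal reflection, from Snell's law.

    Light travelling inside the denser medium (n_dense) that hits the
    boundary at more than this angle from the surface normal is totally
    reflected back into the medium.
    """
    if n_rare >= n_dense:
        raise ValueError("TIR only occurs going from a denser to a rarer medium")
    return math.degrees(math.asin(n_rare / n_dense))

# Acrylic waveguide (n ~ 1.49) against air (n = 1.0):
theta_c = critical_angle_deg(1.49, 1.0)
print(f"{theta_c:.1f} degrees")  # ~42.2 degrees from the normal
```

Rays steeper than about 42 degrees from the normal stay trapped inside the acrylic, which is what lets the edge-mounted LEDs flood the sheet with infrared light.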

When this happens, no refraction occurs in the material, and the light beam is totally reflected. Han's method uses this to great effect, flooding the inside of a piece of acrylic with infrared light by trapping the light rays within the acrylic using the principle of Total Internal Reflection. When the user comes into contact with the surface, the light rays are said to be frustrated, since they can now pass through into the contact material (usually skin), and the reflection is no longer total at that point. [Fig. 2] This frustrated light is scattered downwards towards an infrared webcam, which is capable of picking these blobs up and relaying them to tracking software.

Figure 2: FTIR schematic diagram depicting the bare minimum of parts needed for an FTIR setup.


    1.3.1 FIR Layers

    Tis principle is very useul or implementing multi-touch displays, since the light

    that is rustrated by the user is now able to exit the acrylic in a well dened area

    under the contact point and becomes clearly visible to the camera below.

    Acrylic

    According to the paper o Han, it is necessary to use acrylic or the screen. Te

    minimum thickness is 6 mm however large screens should use 1 cm to prevent the

    screen rom bending.

    Beore a sheet o acrylic can be used or a multi-touch screen it needs to be pre-

    pared. Because acrylic oten gets cut up roughly, it is required to polish the sides

    o the sheet. Tis is done to improve the illumination rom the sides. o polish

    the sheet it is recommend to use dierent kinds o sandpaper. First start with a

    ne sandpaper to remove most o the scratches, ater that continue with very ne,

    super ne and even wet sandpaper. o make your sheets shine you can use Brasso.

    Bafe

    Te bafe is required to hide the light that is leaking rom the sides o the LEDs.

    Tis can be a border o any material (wood/metal).

    Diuser

    Without a diuser the camera will not only see the touches, but also all objects

    behind the surace. By using a diuser, only bright objects (touches) will be visibleto the camera. All other noise data will be let out.

    Compliant layer

    With a basic FIR setup, the perormance mainly depends on how greasy the

    ngertips o the user are. Wet ngers are able to make better contact with the

    surace. Dry ngers and objects wont be able to rustrate the IR. o overcome

    this problem it is recommended to add a compliant layer on top o the surace.

    Instead o rustrating the total internal reection by touch, a compliant layer will

    act as a proxy. Te complaint layer can be made out o a silicon material such as

    ELASOSIL M 4641. o protect and improve the touch surace, rear projection

    material such as Rosco Gray #02105 can be used. With this setup it is no longer

    required to have a diuser on the rear side.


1.3.2 Compliant Surfaces

The compliant surface is an overlay placed above the acrylic waveguide in an FTIR-based multi-touch system. The compliant surface overlay needs to be made of a material that has a higher refractive index than that of the acrylic waveguide, and one that will couple with the acrylic surface under pressure to set off the FTIR effect, then uncouple once the pressure is released. Note that compliant surfaces are only needed for FTIR, not for any other method (DI, LLP, DSI). The compliant surface overlay can also be used as a projection screen.

The compliant surface, or compliant layer, is simply an additional layer between the projection surface and the acrylic. It enhances the finger contact and gives you more robust blobs, particularly when dragging, as your finger will have less adhesion to the surface. In the FTIR technique, infrared light is emitted into the side of the acrylic waveguide and travels inside the medium (due to total internal reflection, much like in a fibre optic cable); when you touch the surface of the acrylic, you frustrate this TIR effect, causing the light to refract out of the medium at the points of contact and creating the blobs (bright luminescent objects). There is much experimentation ongoing in the quest for the perfect compliant layer. Some materials used to success include Sorta Clear 40 and similar catalyzed silicone rubbers, Lexel, and various brands of RTV silicone. Others have used fabric-based solutions like silicone-impregnated interfacing and Sulky Solvy.

The original successful method, still rather popular, is to cast a smooth silicone surface directly on top of your acrylic and then lay your projection surface on that after it cures. This requires a material that closely approximates the optical properties of the acrylic, as it will then be a part of the acrylic as far as your transmitted IR is concerned; hence the three rubber materials mentioned earlier: they all have a refractive index that is slightly higher than, but very close to, that of acrylic. Gaining in popularity is the cast texture method. Tinkerman has been leading the pack in making this a simple process for DIY rather than an involved commercial process. Essentially, by applying the compliant layer to the underside of the projection surface, texturing it, and then laying the result on the acrylic, you gain several benefits. The compliant surface is no longer a part of the acrylic TIR effect, so you are no longer limited to materials with a refractive index similar to that of acrylic, although RTV and Lexel remain the most popular choices, edging out catalyzed silicones here. Since it is largely suspended over the acrylic by the texture, except where you touch it, you get less attenuation of your refracted IR light, resulting in brighter blobs.

Fabric-based solutions have a smaller following here, and a less dramatic proven success rate, but are without question the easiest to implement if an appropriate material can be sourced. Basically they involve lining the edges of your acrylic with two-sided tape and stretching the fabric over it, then repeating the process to attach your display surface. Currently the hunt is on to find a compliant surface overlay that works as both a projection surface and a touch surface. Users have been experimenting with various rubber materials like silicone. Compliant surfaces, while not a baseline requirement for FTIR, present the following advantages:

• Protects the expensive acrylic from scratches
• Blocks a lot of light pollution
• Provides consistent results (the effectiveness of a bare acrylic touch seems to depend on how sweaty/greasy your hands are)
• Zero visual disparity between the touch surface and the projection surface
• Pressure sensitive
• Seems to react better for dragging movements (at least in my experience)
• Brighter blobs to track, as there is no longer a diffuser between the IR blob light and the camera

Developing a compliant surface for an LCD FTIR setup is difficult, as the surface must be absolutely clear and distortion-free, so as to not obscure the LCD image. This difficulty is not present in projection-based FTIR setups, as the image is projected onto the compliant surface itself. To date, no one has successfully built an LCD FTIR setup with a 100% clear compliant surface. However, several individuals have had success with LCD FTIR setups with no compliant surface whatsoever. It appears that the need for a compliant surface largely depends on how strong the blobs are without one, and on the specific setup.


1.4 Diffused Illumination (DI)

Diffused Illumination (DI) comes in two main forms: Front Diffused Illumination and Rear Diffused Illumination. Both techniques rely on the same basic principle: the contrast between the idle background image and the finger that touches the surface.

1.4.1 Front Diffused Illumination

Visible light (often from the ambient surroundings) is shined at the screen from above the touch surface. A diffuser is placed on top of or beneath the touch surface. When an object touches the surface, a shadow is created in the position of the object. The camera senses this shadow.

1.4.2 Rear Diffused Illumination

Depending on the size and configuration of the table, it can be quite challenging to achieve a uniform distribution of IR light across the surface for rear DI setups. While certain areas are lit well and touches there are detected easily, other areas are darker, requiring the user to press harder in order for a touch to be detected. The first approach to solving this problem should be to optimize the hardware setup, i.e. the positioning of illuminators, changing wall materials, etc. However, if no further improvement is possible on this level, a software-based approach might help.

Currently, Touchlib applies any filter with the same intensity to the whole input image. It makes sense, though, to change the filter's intensity for different areas of the surface to compensate for changing light conditions. A gradient-based approach (http://eis.comp.lancs.ac.uk/~dominik/cms/index.php?pid=24) can be applied, where a grayscale map is used to determine, on a per-pixel basis, how strongly the respective filter is to be applied to a certain pixel. This grayscale map can be created in an additional calibration step.
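As a sketch of this idea (all names here are illustrative, not Touchlib's actual API): with NumPy, a calibration grayscale map normalized to [0, 1] can blend per pixel between the unfiltered and fully filtered image.

```python
# Sketch of gradient-based per-pixel filter intensity: instead of one
# global filter amount, a calibration grayscale map scales the filter
# strength pixel by pixel. Names and values are illustrative.

import numpy as np

def apply_weighted_filter(image, filtered, weight_map):
    """Blend the filtered image with the original, per pixel."""
    return weight_map * filtered + (1.0 - weight_map) * image

image = np.full((2, 2), 100.0)       # input camera frame (toy size)
filtered = np.full((2, 2), 0.0)      # e.g. fully darkened by a filter
weight_map = np.array([[1.0, 0.5],   # calibration map: filter strength
                       [0.0, 0.25]])

out = apply_weighted_filter(image, filtered, weight_map)
```

Pixels where the map is bright (well-lit regions) receive the full filter; darker map regions are left closer to the raw image, compensating for uneven illumination.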

Infrared light is shined at the screen from below the touch surface. A diffuser is placed on top of or beneath the touch surface. When an object touches the surface it reflects more light than the diffuser or objects in the background; the extra light is sensed by a camera. [Fig. 3] Depending on the diffuser, this method can also detect hover and objects placed on the surface. Rear DI, as demonstrated in the figure below, requires infrared illuminators to function. While these can be bought pre-fabricated, as discussed in the Infrared Light Source section, they can also be constructed manually using individual LEDs. Unlike other setups, Rear DI also needs some sort of diffuser material to diffuse the light, which frequently also doubles as the projection surface.

Figure 3: Rear DI schematic


1.5: Laser Light Plane (LLP)

Infrared light from one or more lasers is shined just above the surface. The laser plane of light is about 1 mm thick and is positioned right above the surface; when a finger touches it, the light hits the tip of the finger, which registers as an IR blob. [Fig. 4]

Infrared lasers are an easy and usually inexpensive way to create a multi-touch setup using the LLP method. Most setups go with 2-4 lasers, positioned at the corners of the touch surface. The laser wattage power rating (mW, W) is related to the brightness of the laser: the more power, the brighter the IR plane will be.

The common light wavelengths used are 780 nm and 940 nm, as those are the wavelengths available on the Aixiz.com website, where most people buy their laser modules. Laser modules need to have line lenses on them to create a light plane. The 120 degree line lens is most commonly used, so as to reduce the number of lasers necessary to cover the entire touch surface.

Safety when using lasers of any power is important, so exercise common sense and be mindful of where the laser beams are traveling.

1.5.1 Laser Safety

Infrared lasers are used to achieve the LLP effect, and these lasers carry some inherent risk. For most multi-touch setups, 5 mW-25 mW is sufficient. Even lasers of this grade, however, do possess some risk factors with regard to eye damage. Visible light lasers, while certainly dangerous if looked into directly, activate a blink response, minimizing the laser's damage. Infrared lasers, however, are imperceptible to the human eye and therefore activate no blink response, allowing for greater damage. The lasers would fall under these laser safety categories [Wikipedia]:

A Class 3B laser is hazardous if the eye is exposed directly, but diffuse reflections such as from paper or other matte surfaces are not harmful. Continuous lasers in the wavelength range from 315 nm to far infrared are limited to 0.5 W. For pulsed lasers between 400 and 700 nm, the limit is 30 mJ. Other limits apply to other wavelengths and to ultrashort pulsed lasers. Protective eyewear is typically required where direct viewing of a class 3B laser beam may occur. Class 3B lasers must be equipped with a key switch and a safety interlock.

Figure 4: LLP schematic

As explained in Appendix B, a line lens is used to expand a laser's one-dimensional beam into a plane. This line lens reduces the intensity of the laser, but risk is still present. It is imperative that laser safety goggles matching the wavelength of the lasers used (i.e. 780 nm) are worn during setup of the LLP device. Following initial setup and alignment, caution should still be exercised when using an LLP setup. No objects that could refract light oddly should be placed on the setup; for example, a wine glass's cylindrical bottom would cause laser light to be reflected all over the area haphazardly. A fence should be erected around the border of the setup to enclose stray laser beams, and no laser should ever be looked at directly. Goggles vary widely in their protection: optical density (OD) is a measure of the protection provided by a lens against various wavelengths. It is logarithmic, so a lens providing OD 5 reduces beam power by 100 times more than an OD 3 lens. There is merit to setting up a system with low-power visible red lasers for basic alignment, etc., since they are safer and optically behave similarly. When switching to IR, detection cards are available that allow observation of low-power IR lasers by absorbing the light and re-emitting visible light; like an index card used to check a visible beam, these are a useful way to check the alignment of the IR laser while wearing the goggles.

To reiterate the most important safety precautions enumerated above: always wear infrared safety goggles when setting up, and never shine a laser directly into your eye. If extreme caution is exercised and these rules are not broken, LLP setups are quite safe. However, violation of any of these safety guidelines could result in serious injury and permanent damage to your retina.
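Because OD is a base-10 logarithm of attenuation, the protection factor of a pair of goggles is easy to compute; this short snippet (illustrative) reproduces the OD 5 versus OD 3 comparison above.

```python
# Optical density is logarithmic: the transmitted fraction of beam
# power through a lens is 10 ** (-OD).

def transmitted_fraction(od):
    return 10 ** (-od)

od3 = transmitted_fraction(3)   # 1/1,000 of the beam passes
od5 = transmitted_fraction(5)   # 1/100,000 of the beam passes
ratio = od3 / od5               # OD 5 blocks 100x more than OD 3
```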


1.6: Diffused Surface Illumination (DSI)

DSI uses a special acrylic to distribute the IR light evenly across the surface. Basically, you use your standard FTIR setup with an LED frame (no compliant silicone surface needed), and just switch to a special acrylic. [Fig. 5] This acrylic contains small particles inside the material, acting like thousands of small mirrors. When you shine IR light into the edges of this material, the light gets redirected and spread to the surface of the acrylic. The effect is similar to DI, but with even illumination, no hotspots, and the same setup process as FTIR.

Evonik manufactures several different types of Endlighten. These vary in their thickness and in the amount of particles inside the material. Available thicknesses range between 6-10 mm, with L, XL and XXL denoting the particle amount. The 6 mm (L) is too flexible for a table setup, but the 10 mm (XXL) works nicely.

Figure 5: DSI Schematic


1.7. LED Light Plane (LED-LP)

LED-LP is set up the same way as an FTIR setup, except that the thick acrylic that the infrared light travels through is removed and the light travels over the touch surface. This picture [Fig. 6] shows the layers that are common in an LED-LP setup.

The infrared LEDs are placed around the touch surface; surrounding all sides is preferred to get a more even distribution of light. Similar to LLP, LED-LP creates a plane of IR light that lies over the touch surface. Since the light coming from the LEDs is conical instead of a flat laser plane, the light will light up objects placed above the touch surface instead of only those touching it. This can be compensated for by adjusting filter settings in the software (Touchlib/Community Core Vision), such as the threshold levels, to only pick up objects that are lit up when they are very close to the touch surface. This is a problem for people starting with this type of setup and takes some patience. It is also recommended that a bezel (as can be seen in the picture above) is put over the LEDs to shield the light into more of a plane.

Figure 6: LED-LP 3D schematic created in Second Life

LED-LP is usually only recommended when working with an LCD screen, as there are better methods, such as Rear DI, when using a projector, and Rear DI usually does not work with an LCD screen. Like Rear DI and LLP, the touch surface need not be thick as in FTIR, but only strong enough to support the forces from working on the touch surface.

1.7.1 Layers used

First, a touch surface should be used: a strong, durable, optically clear surface that can take the pressure of user interaction. This is usually acrylic or glass. If used in a projector setup, the image is stopped and displayed on the projection layer. If used in an LCD setup, the diffuser is placed below the LCD screen to evenly distribute the light from the LCD backlight.

The source of infrared light for an LED-LP setup comes from infrared LEDs that are placed around at least two sides of the acrylic, right above the touch surface. Typically, the more sides surrounded, the better the setup will perform in IR-prevalent lighting conditions. Refer to the LED section for more information on IR LEDs.

A computer webcam is placed on the opposite side of the touch surface so that it can see the blobs. See the camera section for more information on cameras that are commonly used.

Figure 7: Picture of LED-LP setup


1.8 Technique Comparison

A frequent question is "What is the best technique?" Unfortunately, there is no simple answer. Each technique has its advantages and disadvantages. The answer is less about what is best and more about what is best for you, which only you can answer.

FTIR

Advantages:
• An enclosed box is not required
• Blobs have strong contrast
• Allows for varying blob pressure
• With a compliant surface, it can be used with something as small as a pen tip

Disadvantages:
• Setup calls for some type of LED frame (soldering required)
• Requires a compliant surface (silicone rubber) for proper use - no glass surface
• Cannot recognize objects or fiducial markers

Rear DI

Advantages:
• No need for a compliant surface, just a diffuser/projection surface on top/bottom
• Can use any transparent material like glass (not just acrylic)
• No LED frame required
• No soldering (you can buy the IR illuminators ready to go)
• Can track objects, fingers, fiducials, hovering

Disadvantages:
• Difficult to get even illumination
• Blobs have lower contrast (harder to pick up by software)
• Greater chance of false blobs
• Enclosed box is required


Front DI

Advantages:
• No need for a compliant surface, just a diffuser/projection surface on top/bottom
• Can use any transparent material like glass (not just acrylic)
• No LED frame required
• No soldering (you can buy the IR illuminators ready to go)
• Can track fingers and hovering
• An enclosed box is not required
• Simplest setup

Disadvantages:
• Cannot track objects and fiducials
• Difficult to get even illumination
• Greater chance of false blobs
• Not as reliable (relies heavily on the ambient lighting environment)

LLP

Advantages:
• No compliant surface (silicone) needed
• Can use any transparent material like glass (not just acrylic)
• No LED frame required
• An enclosed box is not required
• Simplest setup
• Could be slightly cheaper than other techniques

Disadvantages:
• Cannot track traditional objects and fiducials
• Not truly pressure sensitive (since light intensity doesn't change with pressure)
• Can cause occlusion if only using 1 or 2 lasers, where light hitting one finger blocks another finger from receiving light

  • 8/8/2019 First Edition Community Release

    32/100

    22 Multi-Touch Technologies Handbook

DSI

Advantages:
• No compliant surface (silicone) needed
• Can easily switch back and forth between DI (DSI) and FTIR
• Can detect objects, hovering, and fiducials
• Is pressure sensitive
• No hotspots
• Even finger/object illumination throughout the surface

Disadvantages:
• Endlighten acrylic costs more than regular acrylic (but some of the cost can be made up since no IR illuminators are needed)
• Blobs have lower contrast (harder to pick up by software) than FTIR and LLP
• Possible size restrictions due to the softness of the plexiglass

LED-LP

Advantages:
• No compliant surface (silicone) needed
• Can use any transparent material like glass (not just acrylic)
• An enclosed box is not required
• Could be slightly cheaper than other techniques
• Hovering might be detected

Disadvantages:
• Does not track objects or fiducials
• LED frame (soldering) required
• Only narrow-beam LEDs can be used - no ribbons


1.9: Display Techniques

1.9.1 Projectors

The use of a projector is one of the methods used to display visual feedback on the table. Any video display device (LCDs, OLEDs, etc.) should work, but projectors tend to be the most versatile and popular in terms of image size.

There are two main display types of projectors: DLP and LCD.

• LCD (Liquid Crystal Display) projectors are made up of a grid of dots that go on and off as needed. They use the same technology that is in your flat panel monitor or laptop screen. These displays are very sharp and have very strong color. LCD projectors have a screen-door effect, and the color tends to fade after a few years.

• DLP (Digital Light Processing) is a technology that works by the use of thousands of tiny mirrors moving back and forth, with the color then created by a spinning color wheel. This type of projector has a very good contrast ratio and is very small in physical size. DLP may have a slower response rate.

One thing to consider with projectors is brightness, measured in ANSI lumens. The higher the number, the brighter the projector and the brighter the setting can be. In a home theater setting you always want a brighter screen, but in a table setup it differs. With the projector close to the screen and the screen size being smaller, the projection can be too bright and give you a hotspot in the screen. This hotspot effect can be dazzling after looking at the screen for several minutes.

One of the biggest limiting points of a projector is the throw distance: the distance needed between the lens of the projector and the screen to get the right image size. For example, in order to have a screen size of 34 in, you may need a distance of 2 feet between your projector and the screen. Check to see if your projection image can be adjusted to fit the size of the screen you are using.
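The governing quantity here is the projector's throw ratio (throw distance divided by image width). A quick calculation with assumed throw ratios (check your projector's specification sheet for the real value) shows why short-throw projectors suit compact table enclosures:

```python
# image_width = throw_distance / throw_ratio. The throw-ratio numbers
# below are assumptions for illustration, not specs of any projector.

def image_width(throw_distance_m, throw_ratio):
    return throw_distance_m / throw_ratio

# A "standard" projector (throw ratio ~2.0) at 0.5 m inside a table box:
standard = image_width(0.5, 2.0)      # 0.25 m wide image - too small
# A short-throw projector (throw ratio ~0.5) at the same distance:
short_throw = image_width(0.5, 0.5)   # 1.0 m wide image
```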

In case you want to make your multi-touch display into a boxed solution, a mirror can assist in redirecting the projected image. This provides the necessary throw length between the projector and screen while allowing a more compact configuration.

The aspect ratio is the ratio of image width to image height for the projected image. For example, if you have a standard television, your aspect ratio is 4:3; if you have a widescreen TV, you have a 16:9 ratio. In most cases a 4:3 ratio is preferred, but it depends on the use of the interface.

1.9.2 LCD Monitors

While projection-based displays tend to be the most versatile in terms of setup size, LCD displays are also an option for providing visual feedback on multi-touch setups. All LCD displays are inherently transparent: the LCD matrix itself has no opacity. If all of the plastic casing of the display is removed and the IR-blocking diffuser layers are discarded, this LCD matrix remains, attached to its circuit boards, controller, and power supply (PSU). When mounted in a multi-touch setup, this LCD matrix allows infrared light to pass through it and, at the same time, displays an image rivaling that of an unmodified LCD monitor or projector.

Naturally, in order for an LCD monitor to produce an image when disassembled, it must be connected to its circuit boards. These circuit boards are attached to the matrix via FFC (flat flex cable) ribbons of a certain length. An LCD monitor is unusable for a multi-touch setup if these FFC ribbons are too short to extend all components of the LCD monitor far enough away from the matrix itself so as to not inhibit transparency of infrared light through it. Databases of FFC-compliant LCD monitors, up to 19", exist on websites such as LumenLabs. LCD monitors that are not FFC compliant are still potentially usable for multi-touch setups if the FFC ribbons are extended manually. FFC extensions can be found in electronics parts stores and can be soldered onto the existing cable to make it the necessary length.


Chapter 2: Software & Applications

In this chapter:

2.1 Introduction to Software
2.2 Blob Tracking
2.3 Gesture Recognition
2.4 Python
2.5 ActionScript 3
2.6 .NET/C#
2.7 List of Frameworks and Applications


2.1: Introduction to Software Programming

Programming for multi-touch input is much like any other form of coding; however, there are certain protocols, methods, and standards in the multi-touch world of programming. Through the work of NUI Group and other organizations, frameworks have been developed for several languages, such as ActionScript 3, Python, C, C++, C#, and Java. Multi-touch programming is two-fold: reading and translating the blob input from the camera or other input device, and relaying this information through pre-defined protocols to frameworks which allow this raw blob data to be assembled into gestures that a high-level language can then use to interact with an application. TUIO (Tangible User Interface Protocol) has become the industry standard for tracking blob data, and the following chapters discuss both aspects of multi-touch software: touch tracking, as well as the applications operating off of the tracking frameworks.
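As a rough illustration of what a TUIO tracker transmits: the cursor profile sends "alive" (the currently active session IDs), "set" (per-cursor state), and "fseq" (frame sequence number) messages. The sketch below models that logic with plain Python lists instead of real OSC/UDP packets, and the class name is ours, not part of any TUIO library.

```python
# Illustrative sketch of TUIO cursor ("2Dcur") profile handling.
# Real TUIO rides on OSC over UDP; plain lists are used here so the
# alive/set/fseq logic can be shown without an OSC implementation.

class TuioCursorClient:
    def __init__(self):
        self.cursors = {}  # session id -> (x, y), normalized coordinates

    def handle(self, message):
        kind, *args = message
        if kind == "alive":
            # any session id missing from the alive list has been removed
            alive = set(args)
            for sid in list(self.cursors):
                if sid not in alive:
                    del self.cursors[sid]
        elif kind == "set":
            # set: session id, x, y, x-velocity, y-velocity, acceleration
            sid, x, y, vx, vy, accel = args
            self.cursors[sid] = (x, y)
        elif kind == "fseq":
            pass  # frame sequence number marks the end of an update frame

client = TuioCursorClient()
client.handle(["alive", 1, 2])
client.handle(["set", 1, 0.25, 0.50, 0.0, 0.0, 0.0])
client.handle(["set", 2, 0.75, 0.50, 0.0, 0.0, 0.0])
client.handle(["alive", 2])          # cursor 1 was lifted
client.handle(["fseq", 42])
```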


2.2. Tracking

Object tracking has been a fundamental research element in the field of Computer Vision. The task of tracking consists of reliably being able to re-identify a given object for a series of video frames containing that object (estimating the states of physical objects over time from a series of unreliable observations). In general this is a very difficult problem, since the object needs to first be detected (quite often in clutter, with occlusion, or under varying lighting conditions) in all the frames, and then the data must be associated somehow between frames in order to identify a recognized object.

Countless approaches have been presented as solutions to the problem. The most common model for the tracking problem is the generative model, which is the basis of popular solutions such as the Kalman and particle filters.

In most systems, a robust background-subtraction algorithm needs to first preprocess each frame. This ensures that static or background objects in the images are taken out of consideration. Since illumination changes frequently in videos, adaptive models of the background, such as Gaussian mixture models, have been developed to intelligently adapt to non-uniform, dynamic backgrounds.

Once the background has been subtracted out, all that remains are the foreground objects. These objects are often identified by their centroids, and it is these points that are tracked from frame to frame. Given an extracted centroid, the tracking algorithm predicts where that point should be located in the next frame.

For example, to illustrate tracking, a fairly simple model built on a Kalman filter might be a linear constant-velocity model. A Kalman filter is governed by two dynamical equations, the state equation and the observation equation. In this example, the set of equations would respectively be:

    xk = xk-1 + Vuk + ηk
    yk = xk + νk

The state equation describes the propagation of the state variable, xk, with process noise, ηk (often drawn from a zero-mean normal distribution). If we say xk-1 represents the true position of the object at time k-1, then the model predicts the position of the object at time k as a linear combination of its position at time k-1, a velocity term based on constant velocity, Vuk, and the noise term, ηk. Now since, according to the model, we don't have direct observations of the precise object positions, the xk's, we need the observation equation. The observation equation describes the actual observed variable, yk, which in this case we would simply define as a noisy observation of the true position, xk. The measurement noise, νk, is also assumed to be zero-mean Gaussian white noise. Using these two equations, the Kalman filter recursively iterates a predicted object position given information about its previous state.
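A minimal sketch of this constant-velocity Kalman filter in Python, one-dimensional for brevity; the noise variances Q and R are illustrative values, not taken from any particular setup.

```python
# A 1-D Kalman filter for the constant-velocity model above: the state
# is the blob position, the known per-frame velocity V enters as a
# control term, and observations are noisy positions.

class Kalman1D:
    def __init__(self, x0, V, Q=0.01, R=0.5):
        self.x = x0      # position estimate
        self.P = 1.0     # estimate variance
        self.V = V       # constant velocity per frame
        self.Q = Q       # process noise variance (illustrative)
        self.R = R       # measurement noise variance (illustrative)

    def predict(self):
        self.x += self.V               # state equation: xk = xk-1 + V
        self.P += self.Q
        return self.x

    def update(self, y):
        K = self.P / (self.P + self.R) # Kalman gain
        self.x += K * (y - self.x)     # correct with observation yk
        self.P *= (1.0 - K)
        return self.x

kf = Kalman1D(x0=0.0, V=1.0)
for y in [1.1, 1.9, 3.2, 4.0]:  # noisy observations of 1, 2, 3, 4
    kf.predict()
    kf.update(y)
```

Each frame, predict() applies the state equation and update() corrects the estimate with the noisy observation, so the estimate settles near the true trajectory.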

For each state prediction, data association is performed to connect tracking predictions with object observations. Tracks that have no nearby observation are considered to have died, and observations that have no nearby track are considered to be new objects to be tracked.

2.2.1 Tracking for Multi-Touch

Tracking is very important to multi-touch technology. It is what enables multiple fingers to perform various actions without interrupting each other. We are also able to identify gestures because the trajectory of each finger can be traced over time, which would be impossible without tracking.

Thankfully, today's multi-touch hardware greatly simplifies the task of tracking an object, so even the simplest Kalman filter in actuality becomes unnecessary. In fact, much of the performance bottleneck of tracking systems tends to come from generating and maintaining a model of the background. The computational costs of these systems heavily constrict CPU usage unless clever tricks are employed. However, with the current infrared (IR) approaches in multi-touch hardware (e.g., FTIR or DI), an adaptive background model turns out to be overkill. Because the captured images filter out (non-infrared) light, much of the background is removed by the hardware. Given these IR images, it is often sufficient to simply capture a single static background image to remove nearly all of the ambient light. This background image is then subtracted from all subsequent frames, the resultant frames have a threshold applied to them, and we are left with images containing blobs of foreground objects (fingers or surface widgets we wish to track).
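Under these assumptions, blob extraction reduces to a few array operations. A sketch with NumPy arrays standing in for 8-bit grayscale camera frames (the threshold value is an assumption; real setups tune it per camera):

```python
# Static-background subtraction as described above: subtract a single
# captured background frame, then threshold to keep bright foreground.

import numpy as np

def extract_blob_mask(frame, background, threshold=40):
    """Return a boolean mask of pixels brighter than the background."""
    diff = frame.astype(np.int16) - background.astype(np.int16)
    return diff > threshold

background = np.zeros((4, 4), dtype=np.uint8)  # captured with no touches
frame = background.copy()
frame[1:3, 1:3] = 200                          # a bright 2x2 "fingertip" blob

mask = extract_blob_mask(frame, background)
```

The resulting mask is what a connected-component pass would then group into individual blobs and reduce to centroids.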

Furthermore, given the domain, the tracking problem is even simpler. Unlike general tracking solutions, we know that from one frame of video to another (~33 ms at a standard refresh rate of 30 Hz), a human finger will travel only a limited distance. In light of this predictability, we can avoid modeling object dynamics and merely perform data association by looking for the closest matching object between two frames with a k-Nearest Neighbors (k-NN) approach. Normal NN exhaustively compares data points from one set to another to form correspondences between the closest points, often measured with Euclidean distance. With k-NN, a data point is compared with the set of k points closest to it, and they vote for the label assigned to the point. Using this approach, we are able to reliably track blobs identified in the images for higher-level implementations. Heuristics often need to be employed to deal with situations where ambiguity occurs, for example, when one object occludes another.

Both the Touchlib and CCV frameworks contain examples of such trackers. Using OpenCV, an open-source computer vision library that enables straightforward manipulation of images and videos, the frameworks track detected blobs in real time with high confidence. The information pertaining to the tracked blobs (position, ID, area, etc.) can then be transmitted as events that identify when a new blob has been detected, when a blob dies, and when a blob has moved. Application developers then utilize this information by creating outside applications that listen for these blob events and act upon them.
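The event flow just described can be modeled with a small dispatcher. The class and event names below are illustrative, not Touchlib's or CCV's actual API; they merely show how per-frame tracking output becomes "blob appeared / moved / died" callbacks for an outside application:

```python
class BlobEventDispatcher:
    """Compares consecutive frames of tracked blobs (id -> (x, y)) and
    notifies listeners of new, moved and dead blobs."""

    def __init__(self):
        self.blobs = {}       # last known position per blob id
        self.listeners = []   # callables receiving (event, blob_id, pos)

    def add_listener(self, callback):
        self.listeners.append(callback)

    def _emit(self, event, blob_id, pos):
        for callback in self.listeners:
            callback(event, blob_id, pos)

    def update(self, tracked):
        """`tracked` is this frame's dict of blob id -> (x, y)."""
        for blob_id, pos in tracked.items():
            if blob_id not in self.blobs:
                self._emit("blob_down", blob_id, pos)
            elif self.blobs[blob_id] != pos:
                self._emit("blob_move", blob_id, pos)
        for blob_id in set(self.blobs) - set(tracked):
            self._emit("blob_up", blob_id, self.blobs[blob_id])
        self.blobs = dict(tracked)
```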


    2.3: Gesture Recognition

The Challenge of NUI: Simple is hard. Easy is harder. Invisible is hardest. - Jean-Louis Gassée

    2.3.1 GUIs to Gesture

The future of Human-Computer Interaction is the Natural User Interface, which is now blurring its boundary with the present. With the advancement of cheap yet reliable multi-touch hardware, it won't be long before we see multi-touch screens not only in research labs but also in studies, living rooms and perhaps kitchens too.

The mouse and the GUI have been among the main reasons for the huge penetration of computers into society. However, that interaction technique is indirect and recognition-based. The Natural User Interface with multi-touch screens is intuitive, contextual and evocative. The shift from GUI to gesture-based interfaces will further make computers an integral but unobtrusive part of our lifestyle.

In its broadest sense, the notion of gesture embraces "all kinds of instances where an individual engages in movements whose communicative intent is paramount, manifest, and openly acknowledged" [3]. Communication through gestures has been one of the oldest forms of interaction in human civilization, owing to various psychological reasons which, however, are beyond the scope of the present discussion. GUI systems leverage previous experience and familiarity with the application, whereas the NUI leverages human assumptions and their logical conclusions to present an intuitive and contextual interface based on gestures. Thus a gesture-based interface is a perfect candidate for social and collaborative tasks, as well as for applications involving an artistic touch. The interface is physical, more visible, and offers direct manipulation.

However, only preliminary gestures are being used today, in stand-alone applications on multi-touch hardware, which gives plenty of scope to its critics. A multi-touch interface requires a new approach rather than a re-implementation of GUI and WIMP methods. The form of the gestures determines whether the type of interaction is actually multi-touch or merely single-touch multi-user. We will discuss the kinds of new gesture widgets required, the development of gesture recognition


modules, and the supporting framework needed to fully leverage the utility of multi-touch hardware and to develop customizable, easy-to-use, complex multi-touch applications.

    2.3.2 Gesture Widgets

Multi-touch based NUI setups provide a strong motivation and platform for a gestural interface, as they are object-based (as opposed to WIMP) and hence remove the abstraction between the real world and the application. The goal should be a direct-manipulation, high-immersion interface, but one tolerant of the lower accuracy such an interface implies. The popular gestures for scaling, rotating and translating images with two fingers, commonly referred to as manipulation gestures, are good examples of natural gestures. [Fig. 1]
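The mathematics behind these manipulation gestures is compact: two touch points moving from one frame to the next fully determine a translation, a uniform scale and a rotation. A sketch (the function name and conventions are ours, not from any particular framework):

```python
import math

def two_finger_transform(p1, p2, q1, q2):
    """Derive the scale/rotate/translate parameters implied by two touch
    points moving from (p1, p2) to (q1, q2) - the classic pinch gesture."""
    # scale: ratio of the distances between the two fingers
    d_before = math.hypot(p2[0] - p1[0], p2[1] - p1[1])
    d_after = math.hypot(q2[0] - q1[0], q2[1] - q1[1])
    scale = d_after / d_before
    # rotation: change in the angle of the line joining the fingers
    a_before = math.atan2(p2[1] - p1[1], p2[0] - p1[0])
    a_after = math.atan2(q2[1] - q1[1], q2[0] - q1[0])
    angle = a_after - a_before
    # translation: displacement of the midpoint between the fingers
    tx = (q1[0] + q2[0]) / 2 - (p1[0] + p2[0]) / 2
    ty = (q1[1] + q2[1]) / 2 - (p1[1] + p2[1]) / 2
    return scale, angle, (tx, ty)
```

Applying the resulting transform to an image each frame yields the familiar photo-manipulation behaviour with no explicit gesture classification at all.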

New types of gesture widgets [Fig. 2] need to be built to fully implement the concept of direct manipulation with objects (everything in NUI is an object with which the user can interact) and an evocative, contextual environment, rather than merely emulating mouse clicks with gestures. Gesture widgets should be designed with creative thinking and proper user feedback, keeping in mind the context of the application and the underlying environment. These gesture widgets can then be extended by the application developer to design complex applications. They should also support customizable, user-defined gestures.

Figure 1: Examples of gestures in multi-touch applications

Figure 2: Gestural widget examples

    2.3.3 Gesture Recognition

The primary goal of gesture recognition research is to create a system which can identify specific human gestures and use them to convey information or control devices. To ensure accurate gesture recognition and an intuitive interface, a number of constraints are applied to the model. The multi-touch setup should allow more than one user to interact with it, working on independent (but possibly collaborative) applications, and should allow multiple such setups to work in tandem.

Gestures are defined by a starting point within the boundary of one context, an end point, and the dynamic motion between the start and end points. With multi-touch input, the system should also be able to recognize the meaning of combinations of gestures separated in space or time [Fig. 3]. The gesture recognition procedure can be categorized into three sequential processes:

Detection of Intention: Gestures should only be interpreted when they are made within the application window. With the implementation of multi-touch support in the X Input Extension Protocol and the upcoming XI2, it will be the job of the X server to relay the touch event sequence to the correct application.

Figure 3: Gesture examples - (1) interpreted as 1 gesture, (2) as 3.

Gesture Segmentation: The same set of gestures in the same application can map to several meanings depending on the context of the touch events. The touch events must therefore be partitioned according to the object of intention, and the patterned data sent on to the gesture recognition module.

Gesture Classification: The gesture recognition module works on the patterned data to map it to the correct command. There are various techniques for gesture recognition which can be used alone or in combination, such as Hidden Markov Models, Artificial Neural Networks and Finite State Machines.
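As a toy illustration of the classification stage, the sketch below maps a segmented touch trajectory to one of a handful of gesture labels by its net displacement. Real recognizers use the statistical models named above; the label names and thresholds here are arbitrary:

```python
def classify_stroke(points, tap_radius=10):
    """Map a touch trajectory (list of (x, y) points) to a gesture label."""
    (x0, y0), (x1, y1) = points[0], points[-1]
    dx, dy = x1 - x0, y1 - y0
    if abs(dx) <= tap_radius and abs(dy) <= tap_radius:
        return "tap"                      # barely moved: treat as a tap
    if abs(dx) >= abs(dy):                # dominant horizontal motion
        return "swipe_right" if dx > 0 else "swipe_left"
    return "swipe_down" if dy > 0 else "swipe_up"
```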

One of the most important techniques is the Hidden Markov Model (HMM). An HMM is a statistical model in which distributed initial points work well and the output distributions are automatically learned by the training process. It is capable of modeling spatio-temporal time series where the same gesture can differ in shape and duration. [4]

An artificial neural network (ANN), often just called a neural network (NN), is a computational model based on biological neural networks [Fig. 4]. It is very flexible in changing environments. A novel approach can be based on the use of a set of hybrid recognizers using both HMMs and partially recurrent ANNs. [2]
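The core computation of an HMM-based recognizer is the likelihood of an observation sequence under each trained gesture model; the model that explains the touch trajectory best wins. A minimal forward-algorithm sketch, assuming discrete observation symbols and dictionary-based model parameters (a simplification of the references above):

```python
def forward(obs, states, start_p, trans_p, emit_p):
    """P(observation sequence | model) via the HMM forward algorithm.

    `start_p[s]` is the initial probability of state s, `trans_p[r][s]`
    the transition probability r -> s, and `emit_p[s][o]` the probability
    of emitting observation symbol o from state s.
    """
    # initialize with the first observation
    alpha = {s: start_p[s] * emit_p[s][obs[0]] for s in states}
    # inductively extend the forward probabilities over the sequence
    for o in obs[1:]:
        alpha = {s: emit_p[s][o] * sum(alpha[r] * trans_p[r][s] for r in states)
                 for s in states}
    return sum(alpha.values())
```

A recognizer would train one such model per gesture and pick the gesture whose model assigns the quantized trajectory the highest likelihood.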

The gesture recognition module should also provide functionality for online and offline recognition, as well as for shot and progressive gestures. A model based on the present discussion is pictured in Figure 4.

Figure 4: Blob detection to gesture recognition framework outline

Separating the gesture recognition module from the main client architecture, which itself can be an MVC architecture, makes it possible to swap in a gesture recognition module as required: if, say, speech or other inputs later become available to the gesture recognition module, it can map them to the appropriate command for the interface controller to update the view and the model.

    2.3.4 Development Frameworks

A number of frameworks have been released, and more are being developed, to help in building multi-touch applications by providing an interface for the management of touch events in an object-oriented fashion. However, the level of abstraction still stops at device input management, in the form of touch events sent through the TUIO protocol irrespective of the underlying hardware. The gesture recognition task, which will realize the true potential of multi-touch surfaces, is still the job of the client. Often some basic gestures are already included, in particular those for natural manipulation (e.g. of photos), but in general these frameworks aren't focused on gestural interfaces; they rather tend to port the GUI and WIMP canons to a multi-touch environment. [1]

Gesture recognition modules have gained a lot of momentum with the emergence of the NUI. Some of the important frameworks are:

Sparsh-UI - (http://code.google.com/p/sparsh-ui/)

Sparsh-UI [30], published under the LGPL license, seems to be the first actual multi-touch gesture recognition framework. It can be connected to a variety of hardware devices and supports different operating systems, programming languages and UI frameworks. Touch messages are retrieved from the connected devices and then processed for gesture recognition. Every visual component of the client interface can be associated with a specific set of gestures that the framework will attempt to recognize. New gestures and recognition algorithms can be added to the default set included in the package.


Gesture Definition Markup Language - (http://nuigroup.com/wiki/Getsure_Recognition/)

GDML is a proposed XML dialect that describes how events on the input surface are built up to create distinct gestures. Using an XML dialect to describe gestures allows individual applications to specify their range of supported gestures to the Gesture Engine. Custom gestures can be supported. This project envisages:

1. Describing a standard library (Gesturelib) of gestures suitable for the majority of applications

2. A library of code that supports the defined gestures and generates events to the application layer.

Grafiti

Grafiti is a C# framework built on top of the TUIO client that manages multi-touch interactions in table-top interfaces. The possible use of tangible objects is particularly contemplated. It is designed to support third-party modules for (specialized) gesture recognition algorithms; a set of modules for the recognition of some basic gestures is nevertheless included in the project.

NUIFrame

NUIFrame is a C++ framework based on the model discussed above (currently under development). It provides a separate gesture recognition module to which the client application supplies, besides the touch-event sequence, contextual information regarding the view of the interface. This ensures that the same gesture on different objects can result in different operations depending on the context. It will also support custom gestures based on a user's specification for a particular command. The set of gesture widgets will also support automatic debugging by generating touch patterns according to the gestures each widget supports.

AME Patterns Library

The AME Patterns library is a new C++ pattern recognition library, currently focused on real-time gesture recognition. It uses concept-based programming to express gesture recognition algorithms in a generic fashion. The library has recently been released under the GNU General Public License as part of AMELiA (the Arts, Media and Engineering Library Assortment), an open source library collection. It implements both a traditional hidden Markov model for gesture recognition and some reduced-parameter models that lower the training requirements (only 1-2 examples) and improve run-time performance while maintaining good recognition results.

There are also many challenges associated with the accuracy and usefulness of gesture recognition software. They can be enumerated as follows:

• Noise patterns in the gestures.

• Inconsistency of people in the use of their gestures.

• Finding the common gestures actually used in a given domain instead of mapping commands to difficult-to-remember gestures.

• Tolerance of variability in the reproduction of gestures.

• Tolerance of inaccurate pointing, as compared to a GUI system.

• Distinguishing intentional gestures from unintentional postural adjustments, known as the gesture saliency problem.

• Gorilla arm: humans aren't designed to hold their arms in front of their faces making small motions. After more than a very few selections, the arm begins to feel sore, cramped, and oversized; the operator looks like a gorilla while using the touch screen and feels like one afterwards. This has limited the development and use of vertically-oriented touch screens as a mainstream input technology despite a promising start in the early 1980s. It is not, however, a problem for short-term usage.

The NUI can be further improved by augmenting the input to the gesture recognition module with proximity and pressure sensing, voice input, facial gestures, etc.


    2.4. Python

Python is a dynamic object-oriented programming language that can be used for many kinds of software development. It offers strong support for integration with other languages and tools, comes with extensive standard libraries, and can be learned in a few days. Many Python programmers report substantial productivity gains and feel the language encourages the development of higher quality, more maintainable code.

    2.4.1 Introduction

This section describes two great technologies: multi-touch user interfaces and the Python programming language [1]. The focus is specifically directed towards using Python for developing multi-touch software. The Python module PyMT [2] is introduced and used to demonstrate some of the paradigms and challenges involved in writing software for multi-touch input. PyMT is an open source Python module for rapid prototyping and development of multi-touch applications.

In many ways, both Python and multi-touch have emerged for similar reasons: making computers easier to use. Although as a programming language Python certainly requires advanced technical knowledge of computers, a major goal is to let developers write code faster and with less hassle. Python is often touted as being much easier to learn than other programming languages [3]. Many Python programmers feel strongly that the language lets them focus on problem solving and reaching their goals, rather than dealing with arcane syntax and strict language properties. With its emphasis on readability and a plethora of modules available from the open source community, Python tries to make programming as easy and fun as multi-touch user interfaces make computers natural and intuitive to use.

Although the availability of multi-touch interfaces is growing at an unprecedented rate, interactions and applications for multi-touch interfaces have yet to reach their full potential. Especially during this still very experimental phase of exploring multi-touch interfaces, being able to implement and test interactions or prototypes quickly is of utmost importance. Python's dynamic nature, rapid development capabilities, and plethora of available modules make it an ideal language for quickly prototyping and developing multi-touch interactions and applications.


    2.4.2 Python Multi-touch Modules and Projects

The following is a brief overview of some available multi-touch related Python projects and modules:

PyMT [2]

PyMT is a Python module for developing multi-touch enabled, media-rich OpenGL applications. It was started as a research project at the University of Iowa [10], and more recently has been maintained and grown substantially thanks to a group of very motivated and talented open source developers [2]. It runs on Linux, OS X, and Windows. PyMT will be discussed in further detail in the following subsections. PyMT is licensed under the GPL license.

touchPy [4]

touchPy is a Python framework for working with the TUIO [8] protocol. It listens to TUIO input and implements an Observer pattern that developers can use or subclass. The Observer pattern makes touchPy platform and module agnostic; for example, it does not care which framework is used to draw to the screen. There is a series of great tutorials for touchPy written by Alex Teiche [9].

PointIR [5]

PointIR is a Python-based multi-touch system demoed at PyCon 2008. While it certainly looks very promising, it seems to have vanished from the (searchable?) realms of the Internet. Specific license information is unknown.

libAVG [6]

According to its web page, libavg is "a high-level media development platform with a focus on interactive installations" [4]. While not strictly a multi-touch module, it has been used successfully to receive TUIO input and do blob tracking, working as a multi-touch capable system. libavg is currently available for Linux and Mac OS X. It is open source and licensed under the LGPL.

pyTUIO [7]

A Python module for receiving and parsing TUIO input. pyTUIO is licensed under the MIT license.


2.4.3 PyMT

PyMT is a Python module for developing multi-touch enabled, media-rich OpenGL applications. The main objective of PyMT is to make developing novel and custom interfaces as easy and fast as possible. This section introduces PyMT in more detail, discusses its architecture, and uses it as an example to examine some of the challenges involved in writing software for multi-touch user interfaces.

Every (useful) program ever written does two things: take input and provide output. Without some sort of input, a program would always produce the exact same output (e.g. "Hello World!"), which would make the program rather useless. Without output, no one would ever know what the program was doing, or whether it was doing anything at all. For applications on multi-touch displays, the main method of input is touch. Graphics drawn on the display are the primary output.

PyMT tries to make dealing with these two forms of input and output as easy and flexible as possible. For input, PyMT wraps the TUIO protocol [8] into an event-driven widget framework. For graphical output, PyMT builds on OpenGL to allow for hardware-accelerated graphics and maximum flexibility in drawing.
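The idea of an event-driven widget framework can be illustrated schematically. The classes below are a minimal sketch of the pattern, not PyMT's actual API: touch events enter at the root and propagate down a widget tree until some widget handles them.

```python
class Widget:
    """Minimal widget node: propagates touch events to children first,
    then tries a handler method on itself (e.g. on_touch_down)."""

    def __init__(self):
        self.children = []

    def add_widget(self, widget):
        self.children.append(widget)

    def dispatch(self, event, touch_id, x, y):
        # give children a chance to consume the event first
        for child in self.children:
            if child.dispatch(event, touch_id, x, y):
                return True
        # fall back to this widget's own handler, if it defines one
        handler = getattr(self, event, None)
        return bool(handler and handler(touch_id, x, y))

class TapLogger(Widget):
    """Example custom widget: records every touch-down it receives."""

    def __init__(self):
        super().__init__()
        self.taps = []

    def on_touch_down(self, touch_id, x, y):
        self.taps.append((touch_id, x, y))
        return True   # event consumed; stop propagation
```

In a real framework, a TUIO client would translate network messages into these dispatch calls each frame.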

2.4.4 PyMT Architecture

Figure 5 displays the main architecture of PyMT. As already mentioned, it uses TUIO and OpenGL for input/output. The current version of PyMT relies on pyglet [11], a cross-platform OpenGL windowing and multimedia library for Python. Pyglet's multimedia functionality in particular makes dealing with image, audio and video files very easy: most media files can be loaded with a single line of Python code (and drawn/played to an OpenGL context with a single line as well). This subsection briefly discusses the role of OpenGL as a rendering engine, as well as the event system and widget hierarchy at the heart of PyMT.


Figure 5: PyMT architecture: PyMT builds on top of pyglet [11], which provides OpenGL windowing and multimedia loading for various file formats. It also implements a TUIO client that listens for input over the network. PyMT glues these technologies together and provides an object-oriented library of widgets and a hierarchical system for layout and event propagation.

    2.4.5 OpenGL

Using OpenGL as a drawing backend has both advantages and disadvantages. While it allows for ultimate performance and flexibility in terms of 2D and 3D rendering, it is also very low-level; its learning curve can therefore be rather steep, and it requires a good understanding of the basics of computer graphics.

PyMT tries to counteract the need for advanced OpenGL knowledge by providing basic drawing functions that act as wrappers around OpenGL. For example, PyMT includes functions such as drawCircle, drawRectangle, drawLine, drawTexturedRectangle and many others that require no knowledge of OpenGL. Using OpenGL as the underlying rendering engine, however, lets more advanced users take full control over the visual output of their applications at peak performance.

PyMT also provides helper functions and classes to aid advanced OpenGL programming. Creating, for example, a frame buffer object or shaders from GLSL source can be done in one line. PyMT also handles projection matrices and layout transformations under the hood, so that widgets can always draw and act in their local coordinate space without having to deal with outside parameters and transformations. The main idea is to do things the easy way most of the time, but to provide easy access to raw OpenGL power when advanced techniques or custom drawing are required.


Describing OpenGL itself far exceeds the scope of this document. For general information about OpenGL see [12][13], the standard reference "The Red Book" [14], or the classic NeHe OpenGL tutorials on nehe.gamedev.net [15].

    2.4.6 Windows, Widgets and Events

Most Graphical User Interface (GUI) toolkits and frameworks have a concept called the widget. A widget is considered the building block of a common graphical user interface: an element that is displayed visually and can be manipulated through interactions. The Wikipedia article on widgets, for example, says the following [16]:

In computer programming, a widget (or control) is an element of a graphical user interface (GUI) that displays an information arrangement changeable by the user, such as a window or a text box. [...] A family of common reusable widgets has evolved for holding general information based on the PARC research for the Xerox Alto User Interface. [...] Each type of widget generally is defined as a class by object-oriented programming (OOP). Therefore, many widgets are derived from class inheritance.

Wikipedia article: GUI widget [16]

PyMT uses similar concepts to other GUI toolkits and provides an array of widgets for use in multi-touch applications as part of the framework. However, the main focus lies on letting the programmer easily implement custom widgets and experiment with novel interaction techniques, rather than providing a stock of standard widgets. This is motivated primarily by the observations and assumptions that:

Only very few widgets and interactions have proven themselves standard within the natural user interface (NUI).

The NUI is inherently different from the classic GUI / WIMP (Window, Icon, Menu, Pointing