
Aalto University

School of Science

Degree Programme in Computer Science and Engineering

Jere Nevalainen

A First Person Immersive Animation Tool

Master’s Thesis

Espoo, May 11, 2015

Supervisor: Assistant Professor Perttu Hämäläinen


Aalto University
School of Science
Degree Programme in Computer Science and Engineering

ABSTRACT OF MASTER’S THESIS

Author: Jere Nevalainen

Title: A First Person Immersive Animation Tool

Date: May 11, 2015 Pages: viii + 60

Major: Media Technology Code: T-111

Supervisor: Assistant Professor Perttu Hämäläinen

Computer-generated animation has an important role in both the video game and film industries. 3D computer animation is generally done with 2D devices, such as the computer mouse, that are not optimal for that kind of use. Using them efficiently in 3D requires long training periods, which makes them especially inefficient in novice hands. Additionally, 2D display devices, such as the computer monitor, are unable to give the user depth perception, so the user has to rely on perspective projection.

The goal of this thesis was to create an animation tool that provides the user with a 3D view of the objects of interest and lets the user manipulate the objects with his or her own hands in virtual space. This should be quite natural for users, because humans have lived their whole lives in a 3D world. The tool was developed and tested in collaboration with professional game industry animators.

The tool uses an Oculus Rift DK2 device to give the user a 3D view with 6 DOF head tracking. The hand tracking is done with a Leap Motion controller, which is mounted in front of the Oculus Rift device. The software was programmed using the Unity game engine.

According to the results, there are uses for this sort of tool, especially in the first rough posing phases. The hand tracking quality still leaves room for improvement, but even at the current level the tool can increase productivity in certain parts of the workflow.

Keywords: animation, virtual reality, oculus rift, leap motion, hand tracking, hand gestures

Language: English


Aalto University
School of Science
Degree Programme in Computer Science and Engineering

ABSTRACT OF MASTER’S THESIS

Author: Jere Nevalainen

Title: Ensimmäisen persoonan immersiivinen animaatiotyökalu (A First Person Immersive Animation Tool)

Date: May 11, 2015 Pages: viii + 60

Major: Media Technology Code: T-111

Supervisor: Assistant Professor Perttu Hämäläinen

Computer-generated animation plays an important role in the video game and film industries. 3D computer animations are generally created with 2D devices, such as the mouse, which are not optimal for this use. Using them efficiently requires long practice, so they are inefficient especially in the hands of beginners. 2D display devices are also unable to convey depth, so users have to settle for a perspective projection.

The purpose of this thesis was to create an animation tool with which an animator sees a character in 3D and can adjust its pose with his or her own hands, much like in traditional puppet animation. This should feel very natural, because humans have learned to live in a three-dimensional world. The tool was developed in collaboration with animators working in the game industry.

The tool uses an Oculus Rift DK2 device to give the user a 3D view and six degrees of freedom tracking for head movements. Hand motion is tracked with a Leap Motion controller. The tool was programmed using the Unity game engine.

Based on the results, a tool of this kind would be useful especially for rough early-stage posing. The accuracy of the hand tracking still leaves room for improvement, but even at its current level it can speed up certain parts of the workflow.

Keywords: animation, virtual reality, oculus rift, leap motion, hand tracking, hand gestures

Language: English


Acknowledgements

I would like to thank my thesis supervisor, Professor Perttu Hämäläinen, for the original idea for this thesis and for all the input that followed. The whole project was very interesting and I learned a lot in the process.

I would also like to thank all the testers who gave their input during the prototyping phase and the testers who participated in the final user test. The future work ideas I received were very valuable.

I thank the computer science guild Tietokilta ry for providing me with seven years of networking and joy during my studies. The years on the board and in various committees were an excellent counterbalance to study and work life.

Finally, I want to thank my family, friends and especially Gaja for all the support during this whole project. It was quite a ride.

Espoo, May 11, 2015

Jere Nevalainen


Abbreviations and Acronyms

DK2     Development Kit 2
DLL     Dynamic Link Library
DOF     Degrees of Freedom
FBX     Filmbox (file format)
FOV     Field of View
HCI     Human-Computer Interaction
HMD     Head Mounted Display
Lerp    Linear Interpolation
NCF IK  Non-Iterative, Closed-Form, Inverse Kinematic Chain Solver
SDK     Software Development Kit
Slerp   Spherical Linear Interpolation
SUS     System Usability Scale
VR      Virtual Reality


Contents

Abbreviations and Acronyms

1 Introduction

2 Goals and requirements
2.1 Goals
2.2 Requirements

3 Interaction in computer animation
3.1 Human-computer interaction techniques in 3D
3.1.1 Command based interaction
3.1.2 Menu selection
3.1.3 Form fill-in
3.1.4 Natural language
3.1.5 Direct manipulation
3.2 Input devices
3.2.1 Keyboard
3.2.2 Mouse
3.2.2.1 Axis Selection in Mouse Based Interaction
3.2.3 Physical input devices with varying degrees of freedom
3.2.4 Computer vision based input
3.2.5 Physically manipulable devices
3.2.6 Wearable devices
3.3 Output devices
3.3.1 Monoscopic displays
3.3.2 Stereoscopic displays

4 Environment
4.1 Hardware
4.1.1 Leap Motion Controller
4.1.2 Oculus Rift Development Kit 2
4.1.3 Wireless Xbox 360 Controller
4.1.4 Keyboard
4.2 Software
4.2.1 Unity

5 Implementation
5.1 System overview
5.2 Inverse Kinematics
5.3 Leap Motion controller
5.3.1 Hand tracking
5.3.2 Image passthrough
5.4 Oculus Rift
5.5 Poseable character
5.6 User Interface
5.6.1 Timeline Slider
5.6.2 Buttons
5.6.3 Wireless Xbox 360 Controller
5.7 Floor
5.8 Animation controller

6 Evaluation
6.1 Usability testing
6.2 Usability testing results

7 Conclusions and future work
7.1 Conclusions
7.2 Future work


Chapter 1

Introduction

Three dimensional (3D) computer animation has become a staple of movie and video game entertainment. Computer animation's roots are deep, and for a long time animations have been created with tools that work only in two dimensions (2D): the computer mouse, keyboard and the traditional 2D computer display. These are not optimal when working with 3D worlds. There are alternative ways to create high quality animation data, such as motion capture technologies. The problems with motion capture technologies are the generally large space requirements, the need for actors and long calibration procedures. Such things are not always possible, so the mouse and keyboard combination remains a solid option.

When it comes to working at a desktop, various vision based and mechanical devices have been invented that are supposed to be more suitable for 3D work. These offer more degrees of freedom (DOF) than the popular computer mouse, which is confined to working in two dimensions. Many of these devices have remained at the proof-of-concept level and have never found their way into professional usage.

Lately virtual reality (VR) head mounted display (HMD) devices have gained a boost in popularity with the introduction of consumer grade devices such as the Oculus Rift Development Kits 1 and 2 and the upcoming SteamVR by Valve Corporation and HTC Corporation. These devices bring 3D virtual worlds with head tracking to homes and offices at a very affordable price.

Hand and tool tracking has been researched for decades. The resulting devices have had little presence in the input device market compared to the dominating input devices: the computer keyboard and mouse. Lately a hand tracking device called the Leap Motion controller was introduced, and it offers 6 degrees of freedom tracking for hands and tools. Research shows that 6 DOF tasks such as positioning and rotating objects in virtual space should be done with 6 DOF tools, so using the device for animation purposes is an attractive idea.

The goal of this thesis was to build an animation tool that uses the latest easily available virtual reality hardware and to see how well current virtual reality technologies fit into animation production. The combination of a 3D display, head tracking and hand tracking hopefully offers the user a very natural way to interact with a poseable 3D character. The tool should not require a lot of space around the user, and setting up the system should not require a lot of calibration every time the system is brought into use. The tool is especially suitable for novices, since most animation software has quite a steep learning curve, because the interaction with mouse and keyboard is not natural when working with 3D objects. When using these existing tools, the user has to learn how his or her 2D actions map to the 3D world. Bringing the user's hands into the 3D world and giving the user a 3D presentation of the world hopefully makes the learning process much faster, since the user has already learned how to interact with 3D objects in real life with his or her hands.

In Chapter 2 we describe the goals of this thesis more thoroughly. In Chapter 3 we go through different interaction techniques in 3D and how different input and display devices have been used in 3D animation. In Chapter 4 we describe the hardware and software used in this thesis project. In Chapter 5 we describe how the required features were implemented and how the hardware functions were used in the software. In Chapter 6 we evaluate how suitable the created tool is for animation production. Lastly, in Chapter 7 we discuss the conclusions and possible future work.


Chapter 2

Goals and requirements

In this chapter we go through the goals and requirements for our project.

2.1 Goals

The goal was to create a first person immersive animation tool using current, easily available hardware that hopefully feels very natural for the users. We then found out how well it fared in animation creation and how animation professionals felt about using such devices in their work.

The 2 DOF mouse and keyboard have been dominating the industry for a long time, although they are not optimal for 6 DOF 3D manipulation (see Chapter 3). Using 6 DOF devices for 6 DOF tasks presumably decreases task completion times, but not necessarily accuracy, because the computer mouse and keyboard have been found to be very precise devices [19]. We asked computer animation professionals to test the system and to provide feedback on how well they could see this sort of a system being used in the industry and for what sort of tasks.

2.2 Requirements

The following requirements were defined at the start of this thesis for the character posing tasks:

• The user needs to be able to freely move the animated character above a plane that is designated as the floor.


• The character itself should be moved by grabbing the root body part, which for a humanoid character is usually the hip region.

• Other body parts should be rotatable and twistable.

• Arms start moving from the shoulders, legs start moving from the hips, head from the chest and spine from the hips.

• Fingers and toes are not movable, since there is no scaling in this version of this project. Fingers and toes are too small to be grabbed reliably.

For timeline control and keyframing, the following requirements were defined:

• The user creates keyframes and the tool will interpolate the frames between them.

• The user can also modify and delete these keyframes at will.

• For body part rotations, spherical linear interpolation (Slerp) is used, and for the root positions, linear interpolation (Lerp) is used (a minimal sketch of this blending is given after this list).

• The user must be able to manually scroll through the animation by scrubbing the timeline. This means the user must be able to jump to any point of the animation and insert new keyframes or edit existing keyframes.

• The length of the whole animation, therefore also the length of the timeline, must be adjustable by the user.
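
As an illustration of the interpolation requirement above, the following minimal Unity C# sketch (not the code of the implemented tool) blends two keyframe poses: the root position is interpolated with Vector3.Lerp and each bone rotation with Quaternion.Slerp. The KeyframePose type and its field names are hypothetical.

    using System.Collections.Generic;
    using UnityEngine;

    // Hypothetical keyframe pose: a root position plus one rotation per bone.
    public class KeyframePose
    {
        public Vector3 rootPosition;
        public List<Quaternion> boneRotations = new List<Quaternion>();
    }

    public static class PoseInterpolation
    {
        // t is the normalized time between the two keyframes, in the range 0..1.
        public static KeyframePose Blend(KeyframePose a, KeyframePose b, float t)
        {
            t = Mathf.Clamp01(t);
            var result = new KeyframePose();
            // Linear interpolation for the root position.
            result.rootPosition = Vector3.Lerp(a.rootPosition, b.rootPosition, t);
            // Spherical linear interpolation for every body part rotation.
            for (int i = 0; i < a.boneRotations.Count; i++)
            {
                result.boneRotations.Add(
                    Quaternion.Slerp(a.boneRotations[i], b.boneRotations[i], t));
            }
            return result;
        }
    }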

Additionally, the viewpoint must be adjustable. If the default viewpoint is set in front of the animated character, the user must be able to look at the character from any angle and distance.

Originally the plan was to be able to export animation data in Filmbox (FBX) format, but as the project went on this idea was scrapped due to time and software constraints. The Unity 4.6 Free edition has limitations on how it can handle external Dynamic Link Libraries (DLL). Unity 4 Free can handle managed DLL libraries, which means it only supports .NET assemblies. Exporting in FBX requires use of the FBX Software Development Kit (SDK) DLL, which is an unmanaged C++ library and therefore requires a Unity 4 Pro license. Late in this thesis project Unity 5 was released, and it can use the FBX DLL for free, but at that point it was too late.


Chapter 3

Interaction in computer animation

The field of computer animation is filled with different techniques and devices for creating realistic and imaginative animations. Some favor ease of use and speed over quality, while some systems use a lot of resources, because sometimes only the finest quality is acceptable.

The first section, 3.1, of this chapter concentrates on what requirements interaction in 3D and animation has and how different interaction techniques manage to meet them. The next section, 3.2, looks at different input devices and discusses the pros and cons of each one when interacting with 3D objects or the user interface. Lastly, interacting with computers without any feedback is quite difficult; there is a wide variety of output devices and some of them are more suitable for 3D work than others. These are discussed in section 3.3.

3.1 Human-computer interaction techniques in 3D

When building a user interface, no matter if it is for 2D or 3D work, the interface should be intuitive and easy enough to use so the user can concentrate on learning the task domain instead of fighting the interface syntax or rules. [22] Human capabilities should be taken into consideration when designing how the user will interact with the virtual presentations. There is plenty of research into how the human sensorimotor system works and how we handle 3D tasks. All this data should be taken into consideration so the user interface can be not only intuitive but also natural for the target demography. The more natural and transparent the experience is, the more engaged and productive the users are. [14]

In real life 3D objects can be moved around and rotated, and depending on the material they can also be reshaped. Being able to do these things intuitively in a virtual space increases immersion and makes the users feel like they are really there. The user interface fades away and the users are left alone with their tasks. Virtual spaces also allow things that cannot be done in real life, such as creating objects from nothing, destroying them without a trace and scaling them freely.

Just manipulating objects is not enough when working with 3D objects and animations. If the application is interactive, the users generally have to be able to change their viewpoint at will. They need to be able to move around in the virtual space and circle around points of interest, so they can, for example, see the animation they have been working on play back from different angles. Working on small details, such as a character's finger movements, is difficult or in some cases impossible without zooming features.

On top of manipulating objects and moving around in the virtual space, the user usually needs to be able to control the application he or she is working with. This can be as simple as shutting down the application or, in some cases, more complex, such as when the user needs to switch between tools or modify parameters that affect the 3D scenes he or she is working on. Even if some interaction techniques or devices are exceptionally good for manipulating virtual 3D objects, they might be extremely cumbersome or inefficient for controlling the application itself. This might introduce issues such as an increased number of interaction devices and the need for the user to switch between them and adjust to each of them.

An important application control issue in computer animation is timeline scrubbing. The users need to be able to move forward and backward on a timeline so they can observe how any changes they have made affect the animation. The users also need to have some sort of control over what sort of poses the characters have at certain times. One way to do this is keyframes. The user poses a character and then tells the application to save the current situation as a keyframe. After the user has set different keyframes at different points on the timeline, the software can interpolate poses for the frames between the keyframes.
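
As a rough sketch of how timeline scrubbing over keyframes can be implemented (illustrative only, not the implementation of the thesis tool), the snippet below finds the two keyframes surrounding a scrub time in an ascending list of keyframe times and returns the normalized blend factor that an interpolation routine such as Lerp or Slerp would then use:

    using System.Collections.Generic;
    using UnityEngine;

    public static class Timeline
    {
        // keyTimes: keyframe times in seconds, sorted in ascending order.
        // time: the scrub time chosen by the user.
        public static void Sample(List<float> keyTimes, float time,
                                  out int prev, out int next, out float blend)
        {
            int last = keyTimes.Count - 1;
            // Clamp to the ends of the timeline.
            if (time <= keyTimes[0]) { prev = next = 0; blend = 0f; return; }
            if (time >= keyTimes[last]) { prev = next = last; blend = 0f; return; }

            // A linear scan is enough for short animations; long ones could
            // use a binary search instead.
            prev = 0;
            while (keyTimes[prev + 1] < time) prev++;
            next = prev + 1;
            blend = Mathf.InverseLerp(keyTimes[prev], keyTimes[next], time);
        }
    }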

Different interaction techniques have been developed to aid human and computer co-operation. Each of them has areas where it is better than the others. We will have a look at how common techniques can handle the tasks that arise when working with 3D objects and animation software. Human-computer interaction methods are commonly split into different styles: command language, menu selection and direct interaction [22], sometimes with form fill-in and natural language as fourth and fifth categories [29]. These methods can be used alone or mixed together.

3.1.1 Command based interaction

In command based interaction the user inputs text commands in a text field or a command line. The commands may have extra variables appended to the end that change what the command does. The advantages and disadvantages of a command based interface are [29]:

• Advantages

– Flexibility

– Appealing to advanced users

– Support for user initiative

– Allows creation of user-defined macros

• Disadvantages

– Poor error handling

– Requires substantial training and memorization

Command based interaction is not very suitable when dealing with objects in 3D space. The user needs a high level of expertise to know how the object reacts to the commands and variables that he or she inputs. An example of a command that moves an object to a position in 3D space could be translate 50 50 0, where translate is the command to move the object to the position given by the trailing numbers, as XYZ coordinates in this example. The user has to know how much one coordinate unit moves the object in the 3D space in question. Command based interaction is very precise and fast if the user happens to know the exact coordinates beforehand, but even in that case the interaction technique used is probably form fill-in, described in section 3.1.3.
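
For illustration, a command interpreter of this kind could look like the hypothetical Unity C# sketch below, which parses a line such as ”translate 50 50 0” and moves the selected object by that offset; the single supported verb and the use of world units are assumptions made for this example only.

    using System.Globalization;
    using UnityEngine;

    public static class CommandInput
    {
        // Parses and executes one command line against the selected object.
        public static void Execute(string line, Transform selected)
        {
            string[] parts = line.Split(' ');
            if (parts.Length == 4 && parts[0] == "translate")
            {
                float x = float.Parse(parts[1], CultureInfo.InvariantCulture);
                float y = float.Parse(parts[2], CultureInfo.InvariantCulture);
                float z = float.Parse(parts[3], CultureInfo.InvariantCulture);
                // Move the object by the given offset in world coordinates.
                selected.position += new Vector3(x, y, z);
            }
            else
            {
                Debug.LogWarning("Unknown command: " + line);
            }
        }
    }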


Since actions have their own commands and some commands might even combine multiple actions into one, the user has to learn all relevant commands by heart or refer to a manual every time infrequent commands are needed. This makes the interaction style quite slow for rookies, but seasoned veterans can use the full flexibility and power of this technique.

3.1.2 Menu selection

Menu selection systems provide the user with lists of actions. The user can pick the best action for the current situation. As long as the list has sensible terminology and the available actions are comprehensive enough, the user can get through their task with minimal learning, especially when compared to command based systems. Shneiderman [29] lists the menu selection advantages and disadvantages as:

• Advantages

– Short learning times

– Reduces keystrokes

– Helps structure decision making

– Permits use of dialog-management tools

– Easy support of error handling

• Disadvantages

– Actions might get lost in case of several menus

– May be slower for advanced users than for example command language

– Consumes screen space

– Requires a fast enough display rate

Using menu selection to manipulate 3D objects has the same problems as command based interaction. Menu selection is essentially the same thing; the commands are just laid out for the user to invoke with the mouse or keyboard shortcuts. Menus are better suited for selecting tools or modes that are then used with direct manipulation. More on direct manipulation comes later in section 3.1.5.


3.1.3 Form fill-in

When using form fill-in techniques, the users literally fill in forms that the user interface presents to them. Form fill-in usually accompanies other interaction techniques, since it is geared towards data entry rather than telling the computer what actions to perform. For example, a menu selected action can give the user a dialog box with forms so the user can specify in more detail what the action is supposed to do. Form fill-in advantages and disadvantages are [29]:

• Advantages

– Simplifies data entry

– Modest training required

– Can give assistance to the user

– Permits use of form-management tools

• Disadvantages

– Consumes screen space

Form fill-in is very common in modern 3D capable applications. For example, the Unity editor interface shows a frame window for the currently selected object, and the window includes information about the object. Some of the information cannot be changed directly, but the form contains fields for variables such as position, rotation and scale. The user can directly input the values he or she wants without dragging the object around on the screen. This is faster and more accurate than manipulating the object with a mouse, for example, but the values have to be known beforehand. Otherwise finding the correct position or rotation probably takes multiple tries and a lot longer than with direct interaction through the object.

3.1.4 Natural language

Natural language is completely different from the techniques mentioned before. In natural language systems the user simply writes or speaks natural language sentences and the computer is supposed to understand what to do. For example, the user could say ”Make a kneeling pose with both hands up in the air”. Without any rules or syntax, the number of possible interpretations for the commands is immense and there has been little success so far. [29] Natural language will not be covered further, as it is not particularly relevant yet.


3.1.5 Direct manipulation

Direct manipulation systems are more visual and rapid than the systems discussed so far. In direct manipulation systems the objects of interest are, firstly, kept visible at all times. Secondly, any complex syntax is replaced by physical actions or well labeled buttons. Finally, operations consist of many rapid and easily reversible actions and their effects can be seen on the objects in real time. [15] Direct manipulation advantages and disadvantages are [29]:

• Advantages

– Visually presents task concepts

– Easy learning

– Easy retention

– Allows errors to be avoided

– Encourages exploration

– Affords high subjective satisfaction

• Disadvantages

– May be hard to program

– May require graphics display and pointing devices

The visual nature of direct manipulation is obviously a very attractive concept when designing a user interface for 3D interaction. The rapid and iterative approach is also effective when posing characters for an animation. If the first pose is not quite what the user wanted, he or she can easily fine-tune the pose or reverse his or her actions. An application control task such as moving on a timeline can be done directly by clicking on a point on the timeline so that the time jumps to that position, or by dragging a marker that shows the current position.

When using physical input devices for animation and posing purposes, direct manipulation is the most desirable of the described interaction techniques. A major reason for this is that animation is always visual, and with direct manipulation the users can always see how their actions affect the character on the screen. Especially when using head mounted display (HMD) devices that place the user in the virtual world, directly manipulating the objects the user actually sees in 3D is as natural as it can get.


3.2 Input devices

Ultimately, all input devices are devices that encode motion, sound or other wave data into a signal that can be read by a computer. Since computers have so many different uses, fundamentally different input devices have been developed over time. Some are more generalistic and serve many purposes, while some have been designed for specialized tasks.

In this section we cover current and past input devices that have been used when dealing with objects in 3D space. User interface usage is also touched on, if the device is especially suitable for that. Some of the devices, such as the mouse and keyboard, are common in everyday computer usage, while some of them are more experimental or especially geared towards 3D work.

3.2.1 Keyboard

In 3D interaction, the keyboard offers more experienced users a large degree of control and accuracy at the price of sacrificing ease of use [19]. Since the keyboard is mostly used to input exact values in fields the interface provides, the user has to already know the values, or at least be able to roughly guess the correct values and fine tune them later. If the user is unfamiliar with how the interface maps the values to the 3D world, adjusting objects such as character limbs may require multiple tries to get results even close to acceptable.

The keyboard can also be used to manipulate objects directly without inputting any values. The keyboard's keys can be mapped to operations that manipulate objects in the virtual world. For example, six keys can be mapped to positive and negative motions on the X, Y and Z axes, and similarly six other keys to rotations. This might sound like a lot to remember for something that is very natural for people to do with physical objects in real life.

In the real world we can move and rotate objects relative to their current position by picking them up and simply moving them. With a keyboard the movement has to be done by pressing a button that changes the position or rotation incrementally. All interaction with a keyboard that is not about inputting values into fields is always incremental, which takes away from the naturalness.
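
The following hypothetical Unity C# sketch illustrates the kind of key mapping described above: held keys translate the target object incrementally along the X, Y and Z axes, and two additional keys rotate it around the vertical axis. The chosen keys and speeds are arbitrary and not taken from the thesis tool.

    using UnityEngine;

    public class KeyboardManipulator : MonoBehaviour
    {
        public Transform target;        // the object being posed
        public float moveSpeed = 1f;    // units per second
        public float turnSpeed = 45f;   // degrees per second

        void Update()
        {
            // Six keys for positive and negative motion on the three axes.
            Vector3 move = Vector3.zero;
            if (Input.GetKey(KeyCode.D)) move.x += 1f;
            if (Input.GetKey(KeyCode.A)) move.x -= 1f;
            if (Input.GetKey(KeyCode.E)) move.y += 1f;
            if (Input.GetKey(KeyCode.Q)) move.y -= 1f;
            if (Input.GetKey(KeyCode.W)) move.z += 1f;
            if (Input.GetKey(KeyCode.S)) move.z -= 1f;
            target.position += move * moveSpeed * Time.deltaTime;

            // Two keys for incremental rotation around the vertical axis.
            if (Input.GetKey(KeyCode.LeftArrow))
                target.Rotate(Vector3.up, -turnSpeed * Time.deltaTime, Space.World);
            if (Input.GetKey(KeyCode.RightArrow))
                target.Rotate(Vector3.up, turnSpeed * Time.deltaTime, Space.World);
        }
    }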

Keyboards excel in user familiarity because, just like the computer mouse, most users learn to use them as soon as they start using computers. Since nearly every computer system has a keyboard, most computer software supports them. In 3D software the keyboard's role is more on the supportive side. The keyboard is used to input shortcuts to change tools or usage modes, so an input device more suitable for handling 3D interaction can be used more efficiently.

3.2.2 Mouse

The computer mouse, just like the keyboard, offers a high level of precision and control. These days, the mouse should be familiar to anyone who is interested in doing computer animation, and all the major animation software packages have user interfaces that are generally used with a mouse. Even though the mouse has pervaded the animation scene so widely, we can show some reasons why it is not necessarily the best option for 3D posing and animation.

Since the mouse is so prevalent in current computer setups, most of the successful 3D modeling software has user interfaces designed for the mouse. The user interfaces usually have menus and tool buttons laid out on the screen around a window that contains the scene where the virtual objects lie. At least a set of the most important tools also has keyboard shortcuts, but the interfaces are generally best used with the mouse and cannot be used with interaction devices that cannot control the mouse cursor and send signals that imply a mouse click.

Selecting objects with a mouse in 3D environments is generally done by ray casting. A ray is cast from the viewpoint through the pixel the mouse is pointing at, and the first object the ray intersects is selected. This has been found to be an efficient and easy to learn way to select objects. [31] How these selected objects can be manipulated is discussed in section 3.2.2.1.
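
A minimal Unity C# sketch of this ray casting selection could look like the following; it illustrates the general technique and is not code from the thesis tool.

    using UnityEngine;

    public class MouseSelector : MonoBehaviour
    {
        public Transform selected;

        void Update()
        {
            if (Input.GetMouseButtonDown(0))
            {
                // Cast a ray from the camera through the pixel under the cursor.
                Ray ray = Camera.main.ScreenPointToRay(Input.mousePosition);
                RaycastHit hit;
                // The first collider the ray hits becomes the selected object.
                if (Physics.Raycast(ray, out hit))
                {
                    selected = hit.transform;
                }
            }
        }
    }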

The generic computer mouse has 2 DOF. The mouse can be slid on a table in any direction parallel to the table surface, but the movement will still be mapped to two coordinate axes, for example X and Y. If an object is translated in 3D space in a way that changes all three of its Cartesian coordinates, the mouse cannot perform the translation in just one action [18]. The first action translates two of the object's coordinates (let them be X and Y) and the second action translates the final coordinate. How the affected coordinates are chosen depends on the software. A common way is to use transform gizmos that display the axes on screen, so that one or two of the axes can be chosen for modifying. More on transform gizmos can be found in section 3.2.2.1.


There have been more exotic computer mice with more than 2 DOF. Two of these are the Cubic Mouse by Frohlich et al. [11] and the Rockin’ Mouse by Balakrishnan et al. [1] These are not shaped like the computer mouse we know today and they did not become staple computer peripherals. However, testers of both mice had faster results when working with 3D objects and preferred the more exotic mice over the traditional mouse.

One advantage of the computer mouse is that using one leaves the non-dominant hand free for other tasks and devices. This way the user can enhance his or her productivity with, for example, a keyboard that is used to input keyboard shortcuts. These shortcuts can change the tools the user controls with the mouse, change between different viewpoints or switch between programs altogether. This eliminates the time needed when using the mouse to go through menus or click tool buttons in the interface to change interaction modes or tools.

3.2.2.1 Axis Selection in Mouse Based Interaction

Computer mice generally have two degrees of freedom, but translation, rotation and scaling each have three adjustable variables. Because of this, there has to be a way to select which variables the mouse movements modify. There have been different solutions for this, ranging from keyboard shortcuts that toggle between modes to on-screen tools that are used with the mouse itself.

The most effective way to apply transformations to objects on the scene with a mouse is to use manipulators that are on the scene with the objects. When the user manipulates the objects, the manipulators move along with them and the user's attention stays on the object. They also separate the required 3D operations into simpler 1D or 2D operations. Moreover, well implemented manipulators give graphical hints about how their actions affect the objects. [20]

In modern 3D capable applications such as Unity, Maya or 3ds Max, these transformation widgets are usually called transform gizmos. A picture of the transform gizmos in Unity can be found in figure 3.1.

Figure 3.1: Unity transform gizmos and their keyboard shortcuts as seen in the Unity documentation [34]

The gizmos for the translate and scale operations are heavily based on the skitters and jacks principle by Bier [3]. He describes skitters as cursors that visually show positive vectors for all three axes of the object. The skitter can be on the surface or inside of the object. Jacks are the same thing but with negative vectors added. In figure 3.1 the green, red and blue arrows or cubes of the translate and scale gizmos show these positive axes. If the user clicks any of them and drags the mouse, the object will be translated or scaled in that direction. If the user clicks on the plane perpendicular to two axes, the mouse dragging applies to both axes at the same time. This means the user can manipulate at most two axes at the same time; to affect all three axes the user needs at minimum two operations.
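
The core idea of such an axis constrained drag can be shown with the following sketch (hypothetical, not from the thesis tool): the unconstrained drag offset is projected onto the selected gizmo axis, so only the coordinate along that axis changes.

    using UnityEngine;

    public static class AxisConstrainedMove
    {
        // startPosition: object position when the drag began
        // dragOffset: unconstrained world-space movement derived from the mouse
        // axis: the gizmo axis the user clicked, e.g. Vector3.right
        public static Vector3 Apply(Vector3 startPosition, Vector3 dragOffset, Vector3 axis)
        {
            axis = axis.normalized;
            float along = Vector3.Dot(dragOffset, axis); // signed distance along the axis
            return startPosition + axis * along;
        }
    }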

The rotation gizmo is based on Bell's Trackball [2], the Virtual Trackball by Chen et al. [8] and the Arcball by Shoemake [30]. Each of these creates a visual ball inside or around the object that is to be rotated. The user can rotate these virtual balls with the mouse cursor as they would rotate a physical trackball mouse. This rotation maps to the object and it moves accordingly. In the modern rotation gizmo, if the user clicks on any of the colored lines around the ball and drags the mouse, the object will rotate along the selected axis. When dragging elsewhere on the ball, which axes are affected depends on the viewpoint and on which direction the mouse is moved in, but in the end the rotations apply to at most two of the axes simultaneously.
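
A rough sketch of the arcball idea, in the spirit of the cited virtual trackballs but not reproducing their exact formulations, maps two cursor positions onto a virtual unit sphere and uses the rotation between the two resulting vectors:

    using UnityEngine;

    public static class Arcball
    {
        // screenPoint is in the range [-1, 1] on both axes, relative to the
        // centre of the viewport.
        public static Vector3 MapToSphere(Vector2 screenPoint)
        {
            float lengthSquared = screenPoint.sqrMagnitude;
            if (lengthSquared <= 1f)
            {
                // Inside the ball: lift the point onto the sphere surface.
                return new Vector3(screenPoint.x, screenPoint.y,
                                   Mathf.Sqrt(1f - lengthSquared));
            }
            // Outside the ball: clamp to the sphere's silhouette.
            return new Vector3(screenPoint.x, screenPoint.y, 0f).normalized;
        }

        // Rotation caused by dragging the cursor from one position to another.
        public static Quaternion DragRotation(Vector2 from, Vector2 to)
        {
            return Quaternion.FromToRotation(MapToSphere(from), MapToSphere(to));
        }
    }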

3.2.3 Physical input devices with varying degrees of freedom

Since the mouse and keyboard are clearly not ideal for 3D work, it is not surprising that multiple solutions have been devised to aid users in their work. They range from simple knobs and sliders to sophisticated systems that track the position and orientation of the human hand in full 6 DOF.

Knob boxes have been used with 3D tools such as CAD software. They are boxes or planks that have knobs for each movement and rotation axis. Although these devices have high precision, the interaction style is far from natural for human beings, who are used to directly manipulating objects with their hands. When interacting with 3D objects, we should be able to take advantage of the skills we have learned while growing up in a 3D world. [13] This means the devices should take hand movements into account in an integral way, not a separable one. Tasks such as positioning an object in 3D space are done fastest with integral tools [18], and devices that take advantage of the user's visual and proprioceptive abilities have the best hand-eye coordination results [21].

There are different ways to track how the user uses his or her hands. Common ways are mechanical, electromagnetic, optical, acoustic and inertial tracking [13]. Mechanically tracked devices have a handle the user grabs, and the handle is connected to an arm that has sensors in its joints that track the rotations between joints. These sensors make it possible to geometrically calculate the position and rotation of the object. These devices are generally cheap and track changes fast, but the usable area is constrained by the arm length and the mechanical parts will wear out over time.

Electromagnetic trackers use transmitters and receivers to sense how the tracked objects are oriented and positioned. While mechanical devices might be stiff enough that the user can let go of the object, electromagnetically tracked devices have to be kept in hand at all times when tracking. They generally use less desk space than the mechanical counterparts and the tracking area can be made larger with added receivers. However, they are prone to interference from other electrical devices or magnetic objects. [13]

Optical devices are tracked with cameras. The devices can have easily recognizable markers, such as infrared LEDs, so the system has to do fewer calculations when trying to recognize the tracked object. Vision based systems are discussed more in section 3.2.4, but in short they generally have large operation areas, while their greatest pitfall is occlusion problems. If any other object, such as the user's hand, blocks the camera from seeing the object or its markers, the tracking halts and the system has to predict the motions or simply tell the user to bring the object back in sight.

Acoustic systems do not differ greatly from electromagnetically tracked systems. They use sound emitters and receivers instead of electromagnetism. They also suffer from interference, but in this case from sound waves echoing from other surfaces. Physical objects between the tracked object and the trackers cause problems, just as with optical systems. [13]


Inertial tracking systems use devices such as gyroscopes to sense the object's motion internally. This means no wires in the way, as long as the data transmission is done wirelessly. Although the setup is easy, gyroscope based devices start to drift over time and need calibration according to the surrounding temperature. [13]

When compared to a computer mouse, using these integral 6 DOF devices generally results in faster task completion times and faster learning, because moving one's hand is a natural way for users to position objects. On the other hand, accuracy is not always on par with mouse or knob based systems, and if the devices are free moving, fatigue becomes a problem. Holding one's arm up and moving it around in the air takes more effort than using a computer mouse on a desk. [20, 38]

It might feel like 6 DOF controllers should always be used instead of a traditional computer mouse, since they are supersets of 2 DOF devices. A controller with 6 DOF can do everything a mouse can and more, while a mouse is generally confined to 2 DOF. Usually the reasons boil down to ergonomics, cost and accuracy. Modern mice are usually quite light and cheap and are offered in various shapes fitting practically any size of hand. 6 DOF devices might require a lot of desk space, can be quite costly due to being specialized equipment and might not be as gentle on the user's arm when being used for 8 hours per day or longer. [18] Specialized hardware also requires support in the software. For an input tool to become popular, it needs software support, but on the other hand implementing support for it costs time and money. If the device is very uncommon, software vendors may feel that it is not worth the money to add support for it.

3.2.4 Computer vision based input

A very natural way to create pose and motion data is to use physical dolls or living actors to act out the poses and motions in front of cameras. Stop motion animation has been used, and is still used, to make feature length movies. Stop motion animation involves moving and molding objects incrementally and photographing each frame separately. When the frames are played back sequentially, it looks like the inanimate objects are moving. When computers are added to the equation, the computer can take fewer key frames and interpolate the frames between them. This makes animating faster, and the number of frames between key frames can be adjusted so the fluidity of the animation does not depend on how many photographs of the incremental motions are taken.


Feng et al. created an artist doll based capture system [10] that uses two cameras to look at a wooden doll that has its joints painted in different colors. The computer calculates from the photographs how the doll is posed. Even though this particular system is intended for searching a database of already existing motion data, the same idea can be used to create key frames for an animation.

The advantages of these doll based systems are that they do not generally require a lot of space, so a desk next to a computer is sufficient. Neither do they require much physical effort from the animator, as long as the doll is made of a material that is easy to manipulate. Using materials such as wax to make the doll makes it possible to create quite imaginative characters, so the tracked skeletons are not limited to humanoid forms.

When it comes to computer animation, one particularly successful computer vision based interaction technique is motion capture. Motion capture has become one of the most common ways to create fluid and lifelike animations. It is widely used when producing computer games, animated films or live action films with added computer generated elements. When using motion capture, one or more actors act in front of a camera, their motions are tracked and the motions are applied to virtual characters in real time (online) or at a later point (offline). These capture systems range from single-camera setups that look at the actor from one angle to multi-camera setups where the actor is seen at all times from multiple angles to combat occlusion problems and to avoid confusing the actor's limbs.

Motion capture systems can be divided into two categories: with or without markers. When using systems with markers, the actors wear bodysuits that have active electronic markers that may flash in a certain order, or passive reflective markers that are seen at all times unless occluded by something. Both types of markers are placed in predetermined places that mark certain spots on the skeleton onto which the motions will be mapped. The computer software uses these markers and their relative positions to deduce which limbs move where and how the motions should look on the character.

Markerless systems do not require the actors to wear suits with markings or to put any such markings on their clothes. The system analyzes the input from one or multiple cameras and tries to build a skeleton based on its knowledge of the human form. These are more suitable especially for video games, where suiting up to play a game can quickly become cumbersome. Markerless systems are faster to start using, but the tracking accuracy is not as good as with passive or active marker systems [24].


Since the actors are most commonly human, their motions sometimes have to be mapped to skeletons remarkably different from ours. For example, an animated film might have aliens with arms twice as long as a human's, while still having tiny legs and a torso in comparison. Without some sort of animation retargeting, the human motions will not produce the wanted effects when applied to the alien body. Luckily this is a widely researched field, and there are systems such as KinEtre by Chen et al. [7] that can map human motions to objects such as chairs, or work by Ishigaki et al. [16] that allows real time control of video game characters.

Motion capture technology can also be used for facial animation. As the actors talk and make faces to convey emotion, the markers on the actor's face move, and the tracked motions can be applied to virtual faces just as full body capture is applied to skeletons. Although motion capture is generally not used to capture the body type or appearance of the actors, because of facial capture an increasing number of video game characters are being modeled after the actors who do their motion capture performances and voice acting. This way the end result can be as close to reality as possible.

Although motion capture produces very lifelike animations, it is not without downsides. The equipment cost is high, especially when using multi-camera setups, the performance space tends to be very large and the systems require actors to perform the motions. Actors introduce the problem of mapping human motions to different skeletons. [17]

Common issues with all computer vision based systems are lighting, camera placement and calibration. [17] If the system uses light visible to the human eye, the same lighting based problems are present as in photography. With too little light the system cannot see enough to discern between limbs, and on the other hand too much light causes overexposure and the view is again obscured. Some systems use infrared light that comes from an infrared projector near the camera and reflects from the target surfaces. These also have problems with lighting, since some light sources, like the sun and incandescent light bulbs, radiate a lot of infrared light that is often in the same wavelength range as the infrared light from the projectors.

Since cameras cannot see through solid objects, camera placement becomes a thing to consider. Some angles are better than others when trying to maximize how well different limbs can be tracked visually. Bad camera angles may cause body parts to occlude other body parts too much for the computer to handle. The number of cameras has to be considered as well: the more cameras you have, the more angles you can cover, but the equipment cost also goes up proportionally.


Camera systems usually have to be calibrated in some way. Some systems need to see the scene without the tracked objects, so the system knows what the background looks like and can ignore it while analyzing the data. Some stereo systems, like the Microsoft Kinect, have to be calibrated so the depth data comes out right. Motion capture systems with multiple cameras have to be calibrated so the camera positions and the tracking area are known to the system.

3.2.5 Physically manipulable devices

Physically manipulable devices that track joint rotations and positions electronically do not have the occlusion problems that are present in camera based systems. The devices have internal sensors that keep track of how the skeletal structure is posed, and the computer can map this data to different characters much like camera based motion capture systems do. The ease of use of these devices is comparable to vision based dolls, since only the tracking system changes. Also, the electronics provide some extra features that cannot be achieved with simple dolls in front of cameras.

One interesting feature of incorporating electronics into physical skeletons is that the joints can have motors in them. This makes it possible to make the physical skeleton take a preconfigured pose quickly without the user manually resetting it to a starting pose or an existing key frame. One such device is an actuated physical puppet by Yoshizaki et al. [37] On top of the automated pose reconfiguration, the motors can make the puppet perform more humanlike motions when, for example, reaching with an arm towards a target. Some joints might resist change because human joints have limited ranges of motion, and some body parts can be rotated by multiple joints; for example, hand twisting can be done from the shoulder or from the elbow. With recorded human data the motors can make these redundant rotations more realistic.

Because such devices are quite complicated, the skeletal structure is generally fixed and cannot be easily changed to imitate virtual characters with significantly different skeletons. [19] Of course the joints can be mapped to the virtual character's joints even if the distances between joints are different, but then the results have to be constantly observed on a screen, because the physical model no longer represents the virtual character's poses.

In contrast to such fixed topology devices, Jacobson et al. have made a modular input device [19] that consists of interchangeable, hot-pluggable parts that can be put together to form different skeletons. The parts can be of different lengths and shapes and can have multiple connectors, so branching is also possible. The system automatically notices changes in the topology, and the mapping to a virtual character is done semi-automatically. Compared to mouse and keyboard posing, this approach did not show a significant difference in completion time or in the accuracy of the final pose, but the amount of work needed was clearly lower.

Physically poseable dolls and devices, no matter if they are tracked visually or internally, always have the advantage of great naturalness. The users can look at the devices from different angles and rotate them around with their hands, as long as the devices are handy enough. Physical feedback from touch is there, and as long as the topologies of the physical device and the virtual character are close enough to each other, the users do not have to look at a computer screen all the time to see if the pose is coming along.

3.2.6 Wearable devices

Two early wearable devices with 3D capabilities were the Z-Glove and the DataGlove. Both devices are gloves that have positional and rotational tracking in 3D, tactile feedback in the fingers, and flex sensors that track how much each finger is bent. Manipulating objects with such devices is very natural, but the devices are now roughly 30 years old and quite dated. [39] A more modern version of such a glove is, for example, the setup by Gallo [12] that combines a DG5 VHand 2.0 glove with a Nintendo Wiimote wireless game controller. With this setup, the user can grab objects, translate and rotate them, and control the camera-to-object distance. The device was intended for exploring 3D medical data, but one can see how such a setup could also work when posing a virtual character for an animation.

Since wearable devices have direct contact with multiple parts of the skin or clothes, they can add one feature that is missing from many other interaction devices: tactile feedback. For example, a glove can give some sort of stimulus to fingers that are in contact with a virtual object. If the glove had some motorized joints and a somewhat stiff skin, the glove could even resist grabbing motions and make it feel like the object is really in the user's hand.

An obvious drawback to using wearable devices is that the user has to put the device on before use. Depending on how complicated and cumbersome the device is, this could be just a slight annoyance or a major issue. If the setup process of such a system takes so long that it is sensible to keep it on for the whole day, then doing animation with it would pretty much tie the user or users to the task for the whole duration.


With the advent of modern head mounted display (HMD) devices, more and more wearable devices are coming to the market to enhance the immersion of those HMD devices. Many of them might have animation potential, but little research has been done yet. Nevertheless, Thomas Zimmerman said it best: "Just as speech is our natural means of communication, the human hand is our natural means of manipulating the physical world." [39]

3.3 Output devices

Since animation is such a visual field, using computers to generate animations without any visual feedback would be quite frustrating. The visual feedback devices essentially come in two flavors, 2D and 3D display devices. In this section we go through some common devices in use and compare how well they fare in 3D animation.

3.3.1 Monoscopic displays

As of now, most desktop computer and laptop monitors are only capable of producing a monoscopic 2D image. If we want to present a 3D environment on such a flat display, the 3D points need to be mapped to a 2D plane. This process is called 3D projection. There are different types of projection, such as orthographic projection and perspective projection. Orthographic projection ignores such perspective effects as objects in the distance looking smaller than ones close to the viewpoint, so it is not feasible for 3D posing and animation. The user has no idea which objects are close and which further away, unless they are occluded by other objects.

On the other hand, perspective projection strives for a realistic presentation of the world. The view is similar to what you would see if you looked at a view with one eye open. You lack the depth cues of stereo vision, but you can still make some assumptions about how far away the objects are based on their size and how they are positioned relative to other objects.
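
As a brief illustration that is not part of the original text, a simple pinhole-camera perspective projection maps a camera-space point $(x, y, z)$ with focal length $f$ to image coordinates

$$x' = f\,\frac{x}{z}, \qquad y' = f\,\frac{y}{z},$$

so objects with a larger depth $z$ map to smaller image coordinates, which is exactly the size cue described above. An orthographic projection simply drops the division by $z$ and uses $x' = x$, $y' = y$, discarding that cue.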

Even though the perspective projection tries to be as realistic as possible on a flat surface, trying to grab an object with a 3 DOF or 6 DOF input device while looking at the 2D perspective projection is analogous to trying to grab an object in real life with one eye closed. On top of that, many input devices lack any sort of tactile feedback, so the user cannot count on feeling the object either. This obviously poses a problem when trying to pose a 3D character.


Research shows that 3D tasks such as positioning and pointing are slower when using monoscopic display devices and 6 DOF input devices, but there is no significant disadvantage when doing rotations [4, 5]. On the other hand, object selection works quite well on 2D screens when using a 2 DOF input device such as a mouse with ray casting [25, 26].

In conclusion, monoscopic display devices are perfectly sufficient for object selection when using 2 DOF input devices, but object manipulation is more cumbersome than with stereoscopic display devices.

3.3.2 Stereoscopic displays

Stereoscopic displays use various techniques to display different information to each eye. These images are combined in the human brain and as a result the user gets depth perception. This way the user does not have to settle for a 2D perspective projection, although both eyes do get their own perspective projections when using a flat display device to provide the image data to the eyes.

The displays either show the images simultaneously to both eyes or use techniques that make it possible to alter which eye is going to receive the information. Movie theaters, some televisions and monitors use either active shutter glasses or passive polarizing glasses. Active shutter glasses are synchronized with the display device and switch at a high rate which eye can see the images. When the shuttering rate is high enough, the vision system considers the input stereoscopic and the user can see depth. A problem with the glasses is that they generally make the display slightly darker than it would be without them. Moreover, they decrease the viewing angle. If the user is not watching the screen from a suitable angle, the image quality plummets.

Passive polarization glasses have one lens with horizontal polarization and the other with vertical polarization. The display device has filters that make it possible for the glasses to deliver every other frame to each eye. They do not require electricity and have better viewing angles than their active counterparts.

One widely researched field in stereoscopic displays is head mounted displays (HMD). These devices are helmets or goggles that display imagery simultaneously to both eyes even if the users turn their head to either side. They often have some sort of head tracking system in place, so the user's head motions change his or her viewpoint in the virtual world. Possibly the first HMD VR system was the Sword of Damocles by Sutherland [32]. The device was quite cumbersome and the tracking features required a considerable space above the user, where the tracking system hung, hence the name. Later devices brought improvements, but were still too bulky to be used for extended periods of time [27].

HMD devices have had a boost in popularity lately with the introduction of more recreation-oriented HMDs like the Oculus Rift by Oculus VR and the Vive by Valve Corporation and HTC Corporation. They are much lighter and more compact than earlier designs and can be worn for a longer time before the user gets fatigued by the weight. These devices have an enclosed screen that is positioned in front of the user's eyes by using head straps. Between the screen and the user's eyes are lenses that increase the field of view (FOV). Because the lenses distort the viewed image, the image on the screen has to be distorted as well to counterbalance the lens distortion. This is why the image looks rounded and oddly colored on the edges if looked at on a regular computer screen. Pictures of this distortion can be seen on the edges of figures 5.1 and 5.7a.

Both of these modern devices track the position and rotation of the device, so the user can look at objects of interest from different angles by moving his or her head or upper body. This way actions like camera rotation and movement can be done without having to use the hands, which may be busy with manipulation tools.

Research shows that having a stereoscopic view of the scene means faster completion times when pointing [4] and positioning [5] in 3D environments, but not when matching object rotations [5]. In neither study was head tracking found to decrease task completion times, but for camera control purposes the possibilities are intriguing.


Chapter 4

Environment

In this chapter we introduce the hardware and software used in this project.

4.1 Hardware

The hardware needs to be able to show the user 3D imagery, follow the user's head motions and track the user's hands. These requirements are met by combining an Oculus Rift DK2 head mounted display (HMD) and a Leap Motion controller.

4.1.1 Leap Motion Controller

The Leap Motion controller is a hand and tool tracking device developed and manufactured by Leap Motion, Inc. The controller uses infrared light to track hands and tools within an effective range of 2.5 cm to 60 cm. The tracking angle is 150° wide and 120° deep. The device weighs 45 g and has measurements of 13 mm × 13 mm × 76 mm.

The tracking device was originally designed to be placed on a flat table surface, and the users would move their hands above the device. After the Oculus Rift and other HMD devices gained popularity, Leap Motion released a VR Developer Mount that is attached to a VR device with double-sided tape, and the Leap Motion controller can be placed in a slot in the mount. This way the tracker does not have to be permanently attached to the VR device and can still be used in desktop mode if needed. A Leap Motion controller attached to an Oculus Rift DK2 can be seen in figure 4.1. Due to the controller placement, the hands can only be tracked when the users have them in view. If the users turn their heads and keep their hands stationary, the controller loses the hands.

Figure 4.1: A Leap Motion controller attached to an Oculus Rift DK2

The Leap Motion controller has some physical limitations due to the way it works. Two CMOS sensors in the device track infrared light with a wavelength of 850 nm [9]. This somewhat restricts how the room where the device is used can be lit. Light sources that have high power in the infrared spectrum near the tracked 850 nm degrade the quality and reliability of tracking. Light sources that usually cause less than ideal conditions are incandescent light bulbs, halogen lamps and daylight. LED lamps and fluorescent lamps generally cause fewer problems with tracking. Leap Motion has a setting called Robust Mode that tries to improve tracking in a brightly lit environment, but currently it cannot completely negate unfavorable lighting conditions.

The sensors sense infrared light originally emitted by the three neighboring infrared LEDs and reflected from the user's hands. If the area the controller is pointed at has highly reflective surfaces, the infrared light can be reflected back to the device from further away than intended. This may trick the device into believing the reflective surface is actually a hand, or the reflected light can cause problems with hand tracking, because some of the light can shine back between the user's fingers. For best results the facing direction should not contain highly reflective objects such as mirrors, glossy computer screens, brightly reflecting white walls or metallic objects such as lamp shades.

This project uses Leap Motion controller firmware version 1.7.0 and SDK version 2.2.5+26752. The known issues list for this version includes reduced pinch gesture tracking in HMD mode when compared to the standard desktop mode, which is a bit problematic, since holding the character's body parts by pinching comes naturally to users. Some people had more issues with pinching than others, so we included a secondary way of grabbing the body parts. The known issues list also includes overall reduced tracking quality in HMD mode compared to the desktop mode, but the quality is gradually getting better.

4.1.2 Oculus Rift Development Kit 2

Oculus Rift DK2 is a VR HMD device developed by Oculus VR. It features a 1920 × 1080 low-persistence PenTile AMOLED display with a 75 Hz refresh rate, which means each eye gets a resolution of 960 × 1080. The device has 6 DOF tracking, with rotation tracking done internally and positional tracking done with a near-infrared CMOS sensor. The device has lenses between the user's eyes and the screen, which gives the device a 100° nominal FOV. The device connects via HDMI 1.4b and USB 2.0. The device weighs 440 g.

The device shows each eye a separate image simultaneously through lenses that give a wider FOV than would be possible without them. Because of the distortion caused by the lenses, the original image needs to be distorted as well to compensate. Due to this, simply using the Oculus Rift as a secondary monitor and displaying the computer desktop on it will not work, because the image has to be processed before it can be seen properly through the lenses.

The 6 DOF tracking means the users can rotate and move their heads and the virtual world follows, as long as head tracking features have been implemented in the software. The users can for example tilt their heads and peek through windows.

Although research shows that head tracking does not make manipulating objects faster when using a stereoscopic display device [4, 5], the head tracking can be used to control the camera. With the Oculus Rift, the user can move his or her head and upper body to look at the object of interest from different angles without having to use another interaction device to move the viewpoint. This makes small adjustments to the viewpoint extremely fast, and this way of moving the viewpoint is very natural to human beings, who in real life can do the same when handling physical objects. In the end, being able to move around the object makes the manipulation faster as well: the character's arm might be occluded by the body, and a slight adjustment in the user's position makes it visible, instead of rotating the object and then possibly rotating it back to its original position.

When using a Leap Motion controller in head mounted display (HMD) mode with an Oculus Rift, the Leap Motion controller is attached to the front panel of the Oculus Rift device with either glue or tape. This configuration can sometimes cause problems with positional head tracking, because the positional infrared camera has to see the infrared LEDs in the Rift's front panel and sides. While the Leap Motion controller itself does not block the view to the LEDs when installed correctly, the user's hands can sometimes block enough of the view that the positional tracking loses the headset's position. The user sees this as jerky movement of the whole virtual world, and such unexpected movements can result in some discomfort. To minimize this, we found that turning about 45° sideways away from the monitor-mounted camera keeps the hands from blocking the LEDs too much. The camera itself can also be placed on a table in such a way that it always sees enough of the LEDs.

4.1.3 Wireless Xbox 360 Controller

The Wireless Xbox 360 controller is a gamepad manufactured by Microsoft Corporation for the Xbox 360 video game console. The controller is also compatible with Windows Vista, Windows 7, Windows 8 and some other operating systems with the aid of a Wireless Gaming Receiver, also manufactured by Microsoft. The controller has the following inputs:

• 2x analog control sticks

• 2x analog triggers

• Digital D-pad

• 11x digital buttons:


– 4x face buttons (A, B, X, Y)

– 2x shoulder or ’bumper’ buttons

– Back and Start buttons

– 2x digital buttons that activate when the analog sticks are pressed down

– Guide button

In this project the controller was used to navigate in the virtual space and perform some of the actions the user can also do with hand tracking. This way the users can choose whether to do everything with tracked hands or keep one hand on the controller and use only the other hand for posing. Not all of the inputs are utilized; the button layout can be seen in figure 5.6.

4.1.4 Keyboard

In this project the keyboard served two purposes: application control (reset and exit) and as an alternative grabbing tool.

Other possible purposes for the keyboard would be expanded application control tasks, such as loading different models, saving and loading animation data and exporting data to different formats. As of now, the keyboard's role is limited.

4.2 Software

The selected software must be able to take advantage of the SDKs offered by Oculus VR and Leap Motion. Additionally, the wider the support for 3D manipulation the software has, the less development time needs to be spent on the engine and presentation side and the more time can be allocated to making the user experience natural.

4.2.1 Unity

Unity is a cross-platform game engine developed by Unity Technologies. It was originally released in 2005 and has since grown into one of the most used game engines. Scripting in Unity uses Mono, which is an open source implementation of Microsoft's .NET framework. Scripting can be done in either C#, Boo or UnityScript. UnityScript is syntactically similar to JavaScript and Unity Technologies even calls it JavaScript in the integrated development environment (IDE) and in the documentation, but there are some differences. This project was done completely in C# using Unity version 4.6.1f.

Unity was chosen for this project because it is officially supported by both Oculus VR and Leap Motion. While Oculus VR offers integration for other engines such as Unreal Engine 4, at the time of writing Leap Motion support for Unity is better than for any other relevant platform.

Oculus VR and Leap Motion provide ready-made assets for Unity. The most important assets are prefabricated camera rigs with hand controllers included, so the scene can be rendered properly with the Oculus Rift device and the hands detected by the Leap Motion are rendered in the correct position relative to the user's head. The Leap Motion assets also include different-looking skeletal models to represent tracked hands and a few different hand models users can expand with their own code.


Chapter 5

Implementation

This chapter describes the system that was implemented through iterative prototyping and user testing. We organized three testing rounds with four, four and five users respectively. A couple of differences between the prototypes and the final version are illustrated in figure 5.7. An overview picture of the final version is shown in figure 5.1.

5.1 System overview

The basic workflow goes like this: as the system boots up, the users take comfortable positions on their chairs and set themselves up so their hands will not block the Oculus Rift positional tracking camera and there are no reflective surfaces in front of them, so the Leap Motion controller does not get interference. When the users are ready, they should press the Back button on the Xbox 360 controller or Tab on the keyboard. This resets the head tracking and puts the viewpoint in the default position in front of the character. Now the system has been set up.

The character starts in a standard T-pose and the timeline is set at the leftmost end, which marks the time at zero seconds, and the initial keyframe has been created. Now the user can pose the character into the initial pose he or she wants. This is done by manipulating the character with the Leap Motion pinching gesture, or using the Xbox 360 controller or keyboard if the user wants to use the button-to-grab mode. The user can move around the character with the Xbox 360 controller or with head motions. When the user is done with the initial pose, he or she should move the timeline slider to another position on the timeline and start posing again. As soon as the user grabs and lets go of the character in this new timeline position, a new keyframe is created. Then the user can make additional adjustments to the pose and the new keyframe is edited. The user should repeat these steps until the desired number of keyframes has been created.

Figure 5.1: Overall picture of the tool. The user is using the Xbox 360 controller with one hand and posing the character with the other. To see the buttons on the right the user has to turn his or her head.

If at any point the user thinks one of the keyframes does not fit the flow, the user can select it with the timeline slider or by using the "Previous/Next Keyframe" buttons and push the delete keyframe button. On the other hand, if some part of the animation needs additional keyframes between existing ones, the user can simply go to that part of the animation and pose normally. The new keyframe is created between the old keyframes. The user can also use undo to cancel unwanted changes to a pose, but the undo buffer resets every time the user moves in any direction on the timeline or changes between keyframes with the buttons.

While the user is doing posing work, or after the animation is ready, he or she can push the "Play Animation" button to see what the animation looks like. If the user feels the animation is going too fast or too slow, he or she can adjust the animation length with the buttons on his or her right side or the D-pad on the Xbox 360 controller.

Because the Leap Motion controller works better the more it can see of the user's fingers, the way the user holds his or her hand while grabbing makes a big difference to tracking reliability when using the pinching mode. Although the gesture can get tiring on the hand after a while, it is best if the user can keep all the other fingers extended and visible while grabbing with the thumb and index finger. A proper way is shown in figure 5.7a.

5.2 Inverse Kinematics

There are two ways a model and its limbs can be posed: forward kinematics (FK) and inverse kinematics (IK). If we have a model with an arm and take a chain of bones and joints from the shoulder joint to the end of the index finger, with forward kinematics we can calculate the position of the index finger from the lengths of the bones and the angles of the joints. With inverse kinematics we can calculate the angles of the joints when we know the desired end position for the index finger and the lengths of the chain's bones. In short, FK poses the chain from root to tip and IK posing is based on the target. Since in this project we want to be able to pose the character by using the Leap Motion tracker and grabbing the model's limbs in a virtual space, inverse kinematics is the proper approach.
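
As a simplified illustration that is not part of the original text, consider a planar two-bone chain with bone lengths $l_1, l_2$ and joint angles $\theta_1, \theta_2$. Forward kinematics gives the tip position directly:

$$p = \bigl(l_1\cos\theta_1 + l_2\cos(\theta_1+\theta_2),\; l_1\sin\theta_1 + l_2\sin(\theta_1+\theta_2)\bigr).$$

Inverse kinematics goes the other way: given a reachable target $p$ at distance $d = \lVert p\rVert \le l_1 + l_2$ from the root, the law of cosines gives $\cos\theta_2 = (d^2 - l_1^2 - l_2^2)/(2\,l_1 l_2)$, after which $\theta_1$ follows from the direction of $p$.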

Unity has a built-in inverse kinematics solver, but it was part of the Pro-only features until the release of Unity 5. As this project was developed using Unity version 4.6.1, we had to implement the solver ourselves. Our solver is a C# implementation of the Non-Iterative, Closed-Form, Inverse Kinematic Chain Solver (NCF IK) as described by Philip Taylor [33]. Compared to other inverse kinematics solving methods such as the Jacobian Inversion Method [35] or the Cyclic-Coordinate Descent Method [36], the NCF IK can be solved in one pass with no iteration and is, at least in our opinion, simpler to implement. NCF IK does not support twisting the target limb to a correct position, but this does not matter in this project since the twisting is done by the user manually. The implementation can solve IK chains of length N. The solver tries to preserve the initial shape of the chain when reaching for the target.

Pseudocode for the solver can be seen in program 5.1. In this pseudocode the angles and targets are as follows:

Chain target alignment
Each bone is aligned towards the target before determining what to do based on the bone's location in the chain. This is done by aligning the chain by making the vector from the current (sub)chain's root to the tip of the chain parallel with the vector from the current (sub)chain's root to the target. This maintains the overall shape of the chain and keeps all deformations on a single plane.

IK bone angle
Angle between the bone length vector and the bone-to-goal vector.

FK bone angle
Angle between the bone-to-chain-tip vector before solving and the bone length vector.

Maximum FK bone angle
The angle between the bone length vector and the vector from the current (sub)chain's root to the tip of the chain, if the rest of the chain was laid out straight with the tip of the chain staying where it was before straightening. This tells how much the bone can rotate away from the current tip of the chain while still being able to reach it with the rest of the chain.

Maximum IK bone angle
Same as the above, except the tip of the chain stays at the IK target. This tells how much the bone can rotate away from the IK target while still being able to reach it with the rest of the chain.


Function SolveIKChain( chain )
begin
    calculate chain target alignment
    for each bone in chain
    begin
        apply chain target alignment to bone
        if bone is last bone
            aim bone at target
        else if bone is second last
            use trigonometry to calculate bone angle
        else
        begin
            determine FK bone angle
            determine maximum FK bone angle
            determine maximum IK bone angle
            IK bone angle = ( FK bone angle / maximum FK bone angle ) * maximum IK bone angle
        end
    end
end

Program 5.1: Pseudocode for NCF IK [33]
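
The "maximum FK bone angle" and "maximum IK bone angle" in the pseudocode can both be derived with the law of cosines. The following C# helper is a hedged sketch of that step, our own illustration rather than the thesis code: boneLength is the length of the current bone, remainingChainLength is the summed length of the bones after it, and distanceToGoal is the distance from the current (sub)chain's root to either the chain tip (FK case) or the IK target (IK case).

    using System;

    static class IkMath
    {
        // Law of cosines on the triangle formed by the bone, the remaining chain
        // laid out straight, and the root-to-goal distance; returns the angle at
        // the (sub)chain's root in degrees.
        public static float MaxBoneAngleDegrees(float boneLength,
                                                float remainingChainLength,
                                                float distanceToGoal)
        {
            float cos = (boneLength * boneLength + distanceToGoal * distanceToGoal
                         - remainingChainLength * remainingChainLength)
                        / (2f * boneLength * distanceToGoal);

            // Clamp to handle unreachable or fully stretched configurations.
            if (cos < -1f) cos = -1f;
            else if (cos > 1f) cos = 1f;

            return (float)(Math.Acos(cos) * 180.0 / Math.PI);
        }
    }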

Page 43: A First Person Immersive Animation Tool · The tool uses an Oculus Rift DK2 device to give the user a 3D view with 6 DOF head tracking. The hand tracking is done with a Leap Motion

CHAPTER 5. IMPLEMENTATION 35

The solver itself resides in the poseable character's behavior script called PoseableCharacter. The solver takes a ChainData object as a parameter. The ChainData contains the chain's starting body part (for example the left shoulder), ending body part (for example the left forearm) and the end point (for example the left wrist, as that is where the left forearm ends). These are stored as Unity Transforms, which all objects in the scene have. These transforms are the transforms of the character's limb objects. They are used to calculate the bone lengths and to rotate the character's limbs, as any rotation applied to the transforms applies directly to the character. The ChainData object also includes the target position, all the transforms in the chain as a list, the lengths of the bones in a list and the total length of the chain. All this information is needed by the solver, and the ChainData is built by a method in the PoseableCharacter script that takes the above-mentioned three body part transforms as parameters.
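
A minimal sketch of what such a container could look like is shown below. It is our own hedged illustration assuming Unity's Transform API; the field and method names are hypothetical and not taken from the thesis code.

    using System.Collections.Generic;
    using UnityEngine;

    // Hedged sketch of a ChainData-style container: the three body part transforms,
    // the IK target, the chain's transforms and the cached bone lengths.
    public class ChainData
    {
        public Transform startPart;     // e.g. the left shoulder
        public Transform endPart;       // e.g. the left forearm
        public Transform endPoint;      // e.g. the left wrist, the tip of the chain
        public Vector3 targetPosition;  // where the tip should reach

        public List<Transform> chainTransforms = new List<Transform>();
        public List<float> boneLengths = new List<float>();
        public float totalLength;

        // Walks the hierarchy from the end point up to the starting part and caches
        // the bone lengths; assumes endPoint is a descendant of startPart.
        public void Build()
        {
            chainTransforms.Clear();
            boneLengths.Clear();
            totalLength = 0f;

            for (Transform t = endPoint; t != null; t = t.parent)
            {
                chainTransforms.Insert(0, t);
                if (t == startPart) break;
            }

            for (int i = 0; i < chainTransforms.Count - 1; i++)
            {
                float len = Vector3.Distance(chainTransforms[i].position,
                                             chainTransforms[i + 1].position);
                boneLengths.Add(len);
                totalLength += len;
            }
        }
    }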

The starting body parts for the chains are set by the user in the Unity editor. For the arms and hands the starting part is currently the corresponding shoulder, and for the legs and feet the starting parts are the right and left sides of the hips. If the system at some point supports importing models at runtime instead of through the editor, this feature needs to be changed so the editor is not needed.

5.3 Leap Motion controller

The development of the Leap Motion controller features involved downloading Leap Motion's Unity Core Assets package, modifying the relevant classes and creating some new ones.

5.3.1 Hand tracking

The basis for the hands is the RigidHand prefab and class found in Leap Motion's Unity Core Assets package. RigidHand, which inherits from SkeletalHand, handles moving the hand model in the scene based on the Leap Motion controller's input. The hand model itself is made out of various polyhedra: the bones in the fingers are not visibly connected to each other, the palm has a see-through hole in the middle and there are no visible wrist or forearm models. This hand model was chosen over models that are more realistic human hands with attached arms, because the model covers less of the view when in use while still being very hand-like and natural. The Leap Motion Unity assets include plenty of RiggedHands that are modeled after real human hands and arms, and they are better suited for virtual worlds where a different sort of immersion is sought after. The hand model can be seen in figures 5.1 and 5.7a.

The pinching gesture, grab-by-button and rotating the character's limbs are features we added to the RigidHand behavior. The pinching gesture checks whether the tips of the thumb and index finger are close enough to each other. If they are and no grabbing has been initiated yet, the script checks if there are any body parts close to the midpoint between the fingers. If there is a body part, a sound notification is played, a translucent sphere is created on the surface of the body part to show where it was grabbed and a ChainData object is created, unless the grabbed body part was the character's hips. If the part was the hips, then all translation and rotation of the palm of the hand is directly applied to it. If a ChainData was created, any rotation of the palm of the hand is applied to the starting body part of the chain. If the user grabs the character's left foot, all rotations are applied to the joint that connects the left thigh to the hips. After the rotation has been applied, the system solves the IK chain and rotates the joints accordingly. The IK target stays between the pinching fingers when the hand moves around. The pinching gesture ends when the distance between the index finger and thumb goes over a set threshold. When the pinching gesture ends and there is an existing keyframe at that time in the animation, the pose stored in that keyframe is edited. Otherwise a new keyframe is created and the pose is stored in it.
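
The following is a hedged sketch of that pinch-and-grab check, assuming Unity's physics API; the class, field names and distance thresholds are our own illustration, not the actual thesis code.

    using UnityEngine;

    // Hedged sketch: detect a pinch from the thumb-index fingertip distance and
    // look for a grabbable body part near the midpoint between the two fingertips.
    public class PinchGrabber : MonoBehaviour
    {
        public Transform thumbTip;
        public Transform indexTip;
        public float pinchStartDistance = 0.03f; // metres, hypothetical threshold
        public float pinchStopDistance = 0.05f;  // larger value to avoid flickering
        public float grabRadius = 0.05f;

        private Collider grabbedPart;

        void Update()
        {
            float d = Vector3.Distance(thumbTip.position, indexTip.position);

            if (grabbedPart == null && d < pinchStartDistance)
            {
                // Look for body part colliders around the midpoint of the pinch.
                Vector3 midPoint = (thumbTip.position + indexTip.position) * 0.5f;
                Collider[] hits = Physics.OverlapSphere(midPoint, grabRadius);
                if (hits.Length > 0)
                    grabbedPart = hits[0]; // the real tool would pick the closest body
                                           // part and build a ChainData for it here
            }
            else if (grabbedPart != null && d > pinchStopDistance)
            {
                grabbedPart = null;        // pinch released; a keyframe would be stored here
            }
        }
    }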

If the button to grab a limb is pressed, the same things happen as with the pinching gesture, except that the point where the limb is selected is at the tip of the index finger. Hand movement and rotation are applied as with the pinching gesture.

There is no limit on how many hands can be used simultaneously, so the user can have a friend "help" with the animation tasks, but the tracking area quickly gets cluttered and the other user will not have a 3D view of the scene anyway, so in practice this is of little use. The only way the left and right hands differ in the system is that the right hand gets a red sphere as the grabbing marker and the left hand gets a blue one. This is useful if two body parts are grabbed and the grabbing positions get close to each other.

5.3.2 Image passthrough

With the Leap Motion SDK it is possible to access the infrared camera data. This makes it possible to feed this data into the Unity scene in front of the Oculus cameras. Consequently the user can see whatever the Leap Motion sees, which is quite useful if the application requires the use of input devices such as a mouse, a keyboard or a game controller. Without the image passthrough mode, the user might lose track of where the mouse or keyboard is while being immersed in the virtual world. The image passthrough feature requires that the Allow Images setting is enabled in the Leap Motion control panel. A comparison between the view with image passthrough enabled and disabled is shown in figure 5.2.

The image data is rendered on two quads, one for each eye, that are hierarchically children of the Oculus Rift camera rig prefab. We made the quads togglable with a button that can be pressed with the Leap Motion tracked hands. The mode is not togglable by keyboard or Xbox 360 controller buttons. When the button is in the OFF position, the quads are simply not rendered.
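
Toggling can be as simple as enabling or disabling the quads' renderers; a minimal hedged sketch of one possible implementation (our own, not the thesis code):

    using UnityEngine;

    // Hedged sketch: turn the per-eye passthrough quads on or off from the UI button.
    public class PassthroughToggle : MonoBehaviour
    {
        public Renderer leftEyeQuad;
        public Renderer rightEyeQuad;

        public void SetPassthrough(bool on)
        {
            leftEyeQuad.enabled = on;
            rightEyeQuad.enabled = on;
        }
    }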

By default the passthrough is on, so the user can see where the Oculus Rift positional camera and other input devices are while he or she is settling into a comfortable position. Some people prefer to keep the passthrough on while posing the character, while some find it distracting and turn it off. It also causes some discomfort to some people, since the image data is slightly delayed compared to head position and orientation changes. The delay is small, but it is easily noticeable if the user turns his or her head left and right repeatedly.

Figure 5.2: Leap Motion image passthrough ON and OFF


5.4 Oculus Rift

Although Oculus VR provides its own Unity assets, for this project the assets came from Leap Motion. The Leap Motion VR assets combine the Oculus VR assets with Leap Motion's own. The main difference between the HMD prefabs is the added Leap Motion tracker between the eye positions. With this positioning the tracked hands are rendered in the correct positions in the virtual space relative to the user's head.

The user can reset the Oculus Rift position by pressing Tab on the keyboard or pressing the Back button on the Xbox 360 controller. This resets the initial position for the user in the virtual space, puts the user interface to his or her left and right sides and places the poseable character in front of him or her. Resetting is usually needed in three situations. The first is when beginning to use the tool and still looking for a comfortable position that is favorable for the Leap Motion controller and the Oculus Rift positional tracker.

The second reason is that the user might have a swivel chair with rollers and accidentally move around too much when immersed in the virtual space. Swiveling too much can cause the Oculus Rift positional tracking to fail when it cannot see enough of the tracking LEDs. The user can also turn too far away from the table to place the Xbox 360 controller on it or to use the keyboard. In this situation the user can just return to his or her original physical position and hit the reset position button.

The third situation can happen if the user accidentally loses the poseable character when moving around the virtual space with the Xbox 360 controller. Instead of flying around and looking for it, the user can just hit reset and the character is again in front of him or her, without actually resetting the scene and all keyframes.

In the end the camera rig prefab LeapOVRPlayerControllerButtonScroll, which includes the Oculus Rift and Leap Motion support, was left mostly unchanged. The additions are the widgets for the user interface and the possibility to toggle the image passthrough with a button.

5.5 Poseable character

The character model used as the default testing character in this project is a model called Alpha by Mixamo Inc. [23] The character is a generic male bipedal humanoid and can be seen in figure 5.7a. After importing this model into Unity, the Ragdoll Wizard tool in the Unity editor was used. The tool creates suitable colliders, rigidbodies and joints for the character. The user has to drag the correct limbs from the character's hierarchy to the wizard's properties, which means the tool does not care how many bones the character model has.

The resulting rigidbodies and joints are not needed for this tool to work, so they can be deleted after the ragdoll creation is done. There is no need for physics, so the rigidbodies would not do anything and the joints are not used at any point. The colliders are necessary for the body part grabbing and the floor to work. When a collider touches the floor, the floor's actions are triggered. The floor is more thoroughly explained in section 5.7. When the grabbing is triggered by pinching or pushing a button, an overlap sphere checks if there are any colliders inside the radius. If a collider is detected, the actions described in section 5.3.1 are performed.

The character has a behavior script PoseableCharacter attached to it. The script is responsible for building ChainData and solving IK for the ChainData.

5.6 User Interface

The user interface consists of buttons, text labels and a slider that look like they are floating in space around the user. Hierarchically the buttons and slider are placed as children of the Oculus camera rig. This placement means that after the Oculus Rift tracking position has been reset, the buttons stay where they are, even if the user moves around on his or her chair. This way the buttons on the left and right become visible by turning the head, and the user can get closer to or further away from the buttons if he or she feels like it. If, for example, the tracking quality deteriorates quickly with distance because of unfavorable lighting conditions, the user can move his or her face towards the buttons so the tracking distance from the Leap Motion tracker to the user's fingers gets shorter and the buttons can be pushed again.

5.6.1 Timeline Slider

The animation timeline is shown as a horizontal slider, which has a marker the user can move with his or her hands or an Xbox 360 controller. The timeline shows created keyframes as blue dots above the slider. The leftmost point of the slider is the beginning of the animation and the pose at time zero seconds. The rightmost point is the end of the animation and the pose at the user-set maximum time. The timeline is shown in figure 5.3.

The slider and its behavior script SliderTimeline are based on the Leap Motion Unity widget slider that comes with Leap Motion's Unity Core Assets package. Modifications were made so the slider shows dots for each keyframe's position in the animation's time frame instead of showing a set number of dots spread evenly. The largest modification is the scrubbing feature. When the user moves the marker on the timeline, the slider tells the AnimationController (described in section 5.8) to set the character's pose to what it is supposed to be at that part of the animation. It also tells the AnimationController what the currently selected keyframe is.

When the scene has been reset and the camera is located in the default position, the timeline slider is positioned below and slightly further away from the viewpoint than the poseable character is. In this position the user should be able to handle the slider and the character without moving around on the chair, since the tracking distance is enough to cover the whole area. If for whatever reason the tracked hands start malfunctioning at the slider distance, the user can physically move a bit forward and move the viewpoint backwards with the Xbox 360 controller, so the slider comes closer while the character stays the same distance away from the user.

Figure 5.3: Timeline slider with previous and next keyframe buttons

In the first version of the timeline slider the user first had to move the slider marker to a position where he or she wanted to create a new keyframe or edit an existing one, make a pose with the character and then hit a Create Keyframe button on the left-side menu or hit a corresponding button on the Xbox 360 controller. If the slider marker was set to the end of the animation when creating this new keyframe, a new one-second clip was added to the animation and its length was controlled with buttons on the right side of the user. If the slider marker was between two existing keyframes, a new keyframe was created between them while the total length between the previous keyframes did not change. This means the new keyframe was an intermediate pose between the original start and end poses, and the start and end poses themselves did not change in any way. Keyframes were removed with a Remove Keyframe button or by pushing an equivalent Xbox 360 controller button. This removed the keyframe at the slider marker position or the first keyframe to the left of the marker, if the marker was between keyframes.

In user tests this version of the timeline proved to be slightly confusing to the users. Users regularly forgot to push the Create Keyframe button before moving to a new position on the slider. This discarded all the changes the user had made: no new keyframes were created and no existing keyframes were edited. Sometimes the users were unsure whether their actions had been saved in any way.

The next and final version removed the Create Keyframe button completely. Now the user moves the slider marker on the slider just as before and poses the character. Each time the user lets go of the character a new keyframe is created or an existing one is modified. This removes the need for the user to remember to push the Create Keyframe button before starting on his or her next posing action.

In this version the whole animation does not get longer when a new keyframe is created with the slider marker at the right end of the slider. It just marks the end of the animation and the final pose of the character. Instead, the buttons that previously set the amount of time between keyframes now control the length of the whole animation. The default setting is 5 seconds. If the user sets the total length to 15 seconds, the leftmost point of the slider is the animation at zero seconds, the middle point is at 7.5 seconds and the end of the animation occurs at 15 seconds.

The users got the hang of this version faster than the previous one. The users did not have to ask whether their actions had been saved and whether they could move to another part of the animation.

5.6.2 Buttons

The buttons are based on the togglable button class ButtonToggleBase found in Leap Motion's Unity Core Assets package. The togglable button is used for the passthrough and floor locking buttons, with modifications only to the logic that tells what the buttons should do when they are on or off.


The rest of the buttons have a new base called ButtonSingleBase, which is only on while the button is kept pressed. This type of button is more suitable for triggering actions than for switching between modes.

The buttons are used by poking at them with the RigidHand that is tracked by the Leap Motion controller. Any finger or part of the hand model works for pressing the buttons. The buttons have an adjustable depth that determines how far they have to be pushed down before they activate, which means they are not static and have visible depth to them. Some of the buttons have togglable on and off modes that activate and disable features, while most of the buttons are single-press and perform the bound action each time they are pressed down.

The buttons and their actions are:

Undo
Undoes the previous posing if possible. A list of previous positions is held as long as the user does not move back or forth on the timeline or delete the current keyframe.

Remove keyframe
Removes the keyframe that is the first one to the left of the current slider position, or the keyframe exactly at the current slider position, depending on how the slider is positioned. The keyframe at timeline position 0 cannot be deleted.

Passthrough (ON/OFF)
Toggles the Leap Motion image passthrough mode. When ON, the user can see a black-and-white picture of what the Leap Motion tracker sees. When OFF, the area behind the character and buttons is simply black. This button is useful if the user has lost track of the table, keyboard or gamepad. The default state is ON.

Lock limbs to floor (ON/OFF)
Toggles whether the character's limbs stick to the floor. When ON, the character's limbs stay at the positions where they first touched the floor. When OFF, the limbs slide on the floor as the character moves above the floor.

Previous / Next
Buttons on the left and right side of the timeline slider. The buttons jump to the previous and next keyframe respectively.

Increase / Decrease animation length
Increase or decrease the total animation length with these buttons. This changes the total length of the timeline slider in seconds, but does not change the length visually in virtual space.

The button layout can be seen in figures 5.4 and 5.5.

Figure 5.4: Button layout on the left side of the user

5.6.3 Wireless Xbox 360 Controller

The Xbox 360 controller implementation is not complicated. The Unity input manager was used to bind buttons for all of the actions described in figure 5.6. Behaviour scripts for the floor, timeline slider, animation controller and camera rig check whether any of those actions are triggered and act accordingly.
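
A hedged sketch of how such bindings can be polled each frame with Unity's input manager is shown below; the axis and button names are hypothetical Input Manager entries, not the project's actual bindings.

    using UnityEngine;

    // Hedged sketch: poll Input Manager axes and buttons bound to the Xbox 360 controller.
    public class GamepadInput : MonoBehaviour
    {
        void Update()
        {
            // The left stick (bound to "Horizontal"/"Vertical") moves the viewpoint.
            float moveX = Input.GetAxis("Horizontal");
            float moveZ = Input.GetAxis("Vertical");
            transform.Translate(new Vector3(moveX, 0f, moveZ) * Time.deltaTime);

            // A hypothetical "ResetView" binding for the Back button resets head tracking.
            if (Input.GetButtonDown("ResetView"))
                Debug.Log("Reset head tracking and viewpoint");
        }
    }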

5.7 Floor

The floor is a blue grid that is placed under the character in the virtual world. This floor keeps the character's limbs from going under it, unless the character is grabbed by the hips and forced through the floor. This will cause heavy deformations to its arms and legs. The point of the floor is to show the user a plane which the character can walk on and to aid with poses like kneeling and walking.

Figure 5.5: Button layout on the right side of the user

The floor has two modes: lock limbs to floor ON and OFF. The difference between them is whether a limb that has touched the floor will stay in place even if its hierarchical parents move it around. If the locking mode is ON and the character's feet are touching the ground, they will stay in place when the character is moved around from the hips. If the mode is OFF, the feet will slide on the floor as the character is moved around. No matter which mode is on, the character's knees will bend if the character is pushed towards the floor by the hips, and the knees will straighten if the character is moved away from the floor. The same rules apply to other body parts as well.

The floor does this by keeping track of body part colliders that come in contact with the floor. If a contact happens, a ChainData object is created. This ChainData object is identical to what is created when a body part is grabbed by pinching or by pushing a button to grab. In a sense, pinching fingers are created at the position where the contact happens. If the locking is on, the pinch stays in place, and if the mode is off, the pinching follows the character's movements while staying in the floor plane. As long as the contact is happening, the ChainData persists and the IK chain is solved for that body part.
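
One way this contact bookkeeping could be done is sketched below; it is our own hedged illustration (the polling approach and all names are assumptions, not the thesis code), which simply compares each body part collider's lowest point against the floor plane every frame and remembers the first contact point as an IK target.

    using System.Collections.Generic;
    using UnityEngine;

    // Hedged sketch: remember where each body part first touched the floor so a
    // floor-level IK "pinch" can hold it there while the lock mode is on.
    public class FloorContacts : MonoBehaviour
    {
        public List<Collider> bodyParts;   // the character's colliders, assigned in the editor
        public float floorY = 0f;          // height of the floor plane
        public bool lockLimbsToFloor = true;

        private readonly Dictionary<Collider, Vector3> contactPoints =
            new Dictionary<Collider, Vector3>();

        void Update()
        {
            foreach (Collider part in bodyParts)
            {
                bool touching = part.bounds.min.y <= floorY;

                if (!touching)
                {
                    contactPoints.Remove(part);
                }
                else if (!contactPoints.ContainsKey(part) || !lockLimbsToFloor)
                {
                    // First contact, or unlocked mode: (re)place the contact point
                    // on the floor plane under the body part.
                    Vector3 p = part.transform.position;
                    contactPoints[part] = new Vector3(p.x, floorY, p.z);
                }
                // Each entry in contactPoints would be used as an IK target here.
            }
        }
    }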


Figure 5.6: Xbox 360 controller layout

5.8 Animation controller

The AnimationController behavior script is responsible for keeping track of the keyframes and the undo system, setting the character's pose when changing between keyframes or scrubbing on the timeline, and playing the animation whenever the "Play Animation" button is pressed.

Keyframe data is stored in Keyframe objects, of which the AnimationController keeps a list. When the animation is played, the AnimationController interpolates the character's current pose based on the keyframe data, how the keyframes are positioned on the timeline and how long the whole animation is.

Whenever the user lets go of the character, the AnimationController checks if a keyframe exists at the current position. If there is one, it is updated with the current pose, otherwise a new one is created. The keyframes store their position on the timeline as a percentage of how far to the right they are. In the list they are in the same order as they appear on the timeline, so the method that handles the animation can go through the list sequentially and use the percentage figures to calculate how long the duration between each keyframe should be.
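
A hedged sketch of how such percentage-based keyframes could drive playback is shown below (our own illustration, not the thesis code): find the two keyframes surrounding the normalized playback time and blend their stored joint rotations.

    using System.Collections.Generic;
    using UnityEngine;

    // Hedged sketch: a keyframe stores a normalized timeline position (0..1) and a
    // pose as one local rotation per posed joint.
    public class PoseKeyframe
    {
        public float normalizedTime;
        public List<Quaternion> jointRotations;
    }

    public static class PoseSampler
    {
        // 'time' is seconds since playback started, 'length' the total animation length.
        public static void Apply(List<PoseKeyframe> keyframes, List<Transform> joints,
                                 float time, float length)
        {
            if (keyframes.Count == 0 || joints.Count == 0) return;
            float t = Mathf.Clamp01(time / length);

            // Find the keyframes on either side of t (the list is kept in timeline order).
            PoseKeyframe prev = keyframes[0];
            PoseKeyframe next = keyframes[keyframes.Count - 1];
            for (int i = 0; i < keyframes.Count - 1; i++)
            {
                if (t >= keyframes[i].normalizedTime && t <= keyframes[i + 1].normalizedTime)
                {
                    prev = keyframes[i];
                    next = keyframes[i + 1];
                    break;
                }
            }

            float span = next.normalizedTime - prev.normalizedTime;
            float blend = span > 0f ? (t - prev.normalizedTime) / span : 0f;

            for (int j = 0; j < joints.Count; j++)
                joints[j].localRotation = Quaternion.Slerp(prev.jointRotations[j],
                                                           next.jointRotations[j], blend);
        }
    }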

The undo system stores poses in a list and a new pose is added each time the user starts grabbing. This means the user can cancel his or her previous posing or posings by pressing the undo button. The undo buffer is cleared after any action is performed on the timeline.

(a) On the left, the very first IK test with the Oculus Rift and Leap Motion controller included. On the right, the same grab in the final version.

(b) The difference between the original menu design and the final design. The final design is much simpler to use after the removal of the Create / Edit Keyframe buttons.

Figure 5.7: The tool in prototype and final phases.

When the user uses the timeline to scrub, the timeline tells the AnimationController to set the character's pose to a pose stored in a keyframe, or to interpolate a pose between keyframes if the position on the timeline is between two keyframes.


Chapter 6

Evaluation

6.1 Usability testing

To see how the tool works in real life, a final user test was organized with five professional game industry animators. Three of them had used an earlier version of this tool briefly before and the other two had seen a video of it in action.

The testers were told to familiarize themselves with the controls for five minutes, use 15 minutes to animate whatever animation clip they wanted and immediately after that fill in a system usability scale questionnaire and answer five open questions. After answering the predetermined questions the testers had time to provide any other feedback they might have.

The system usability scale (SUS) was developed by John Brooke in 1986 [6]. It is a simple and fast 10-item Likert scale that has been widely used in usability testing across various technologies. The 10 questions were selected from a pool of 50 questions based on results from 20 testers. The standard 10 questions can be seen in table 6.1. Over 40% of current usability testing is done with SUS [28].

Of the 10 questions, the odd-numbered questions have a positive tone in their wording while the even-numbered questions are worded negatively. To calculate the SUS score, each odd-numbered question score is reduced by one (score - 1) and each even-numbered score is subtracted from five (5 - score). This gives each question a score of 0-4, where zero is always the worst and 4 is always the best result. Then all of the question scores are tallied together and multiplied by 2.5 to get a score of 0 to 100.
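
The scoring rule is easy to express in code; the following C# helper is our own hedged sketch, not part of the original study or the thesis software.

    using System;

    // Hedged sketch: compute a SUS score from ten answers on a 1-5 scale
    // (index 0 corresponds to question 1).
    static class Sus
    {
        public static double Score(int[] answers)
        {
            if (answers.Length != 10)
                throw new ArgumentException("SUS needs exactly 10 answers.");

            int sum = 0;
            for (int i = 0; i < 10; i++)
            {
                bool oddQuestion = (i % 2 == 0);      // questions 1, 3, 5, 7, 9
                sum += oddQuestion ? answers[i] - 1   // positively worded: score - 1
                                   : 5 - answers[i];  // negatively worded: 5 - score
            }
            return sum * 2.5;                         // final score on a 0-100 scale
        }
    }

For example, answering 3 to every question gives five odd-question scores of 2 and five even-question scores of 2, for a sum of 20 and a SUS score of 20 × 2.5 = 50.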


Although the scoring is between 0 and 100, SUS scores should not be considered percentages. A better way to make sense of the results is to convert the obtained SUS score to a percentile rank using tables based on researched data. A table of such rankings can be seen in table 6.2 [28]. This way you can find out roughly how usable the system really is. According to this data the mean score is 68, so a system with a SUS score of 68 scores better than approximately 50% of systems.

For the open questions, the testers wrote their answers on a piece of paper immediately after filling in the SUS questionnaire and before any debriefing. The open questions were as follows:

• What was good in the tested system?

• What was bad in the tested system?

• What would you change or add?

• If the system worked perfectly, how would you integrate it into your tool palette and workflow?

• For what sort of animation purpose or work stage would this sort of system work best?

After these questions the testers had time to talk about the system and offer feedback on issues that were not touched upon by the questionnaire and open questions. We also had a general conversation with the testers about how they liked the hardware and software.

6.2 Usability testing results

The mean SUS score from the five testers is 62.5, which gives the system a grade of D and places it in the higher end of the percentile range 15-34 according to the curved grading scale interpretation of SUS scores by Sauro [28]. The grading scale is shown in table 6.2.

1 = Strongly disagree, 5 = Strongly agree

1. I think that I would like to use this system frequently.
2. I found the system unnecessarily complex.
3. I thought the system was easy to use.
4. I think that I would need the support of a technical person to be able to use this system.
5. I found the various functions in this system were well integrated.
6. I thought there was too much inconsistency in this system.
7. I would imagine that most people would learn to use this system very quickly.
8. I found the system very cumbersome to use.
9. I felt very confident using the system.
10. I needed to learn a lot of things before I could get going with this system.

Table 6.1: System usability scale questionnaire

SUS Score Range   Grade   Percentile Range
84.1-100          A+      96-100
80.8-84           A       90-95
78.9-80.7         A-      85-89
77.2-78.8         B+      80-84
74.1-77.1         B       70-79
72.6-74           B-      65-69
71.1-72.5         C+      60-64
65-71             C       41-59
62.7-64.9         C-      35-40
51.7-62.6         D       15-34
0-51.7            F       0-14

Table 6.2: Curved grading scale interpretation of SUS scores according to Sauro [28]

Looking at the average scores per question in figure 6.1, the major score reducers are user confidence when using the system at 1.4 points and user eagerness to use the system frequently at 1.6 points, both on a 0-4 point scale. In the debriefing the testers said that the major reasons decreasing their confidence in using the system were the occasionally unreliable tracking and how the implemented timeline worked. The testers said the issues were major enough that they would not use the system frequently, but they would like to use the tool for certain tasks, especially if the troublesome parts were addressed.

Figure 6.1: The mean SUS scores by question, on the 0-4 scale where higher is better. Error bars represent the 95% confidence interval. The per-question means, in the order of the statements in table 6.1, were 1.6, 3.4, 2.6, 3.4, 3.0, 2.2, 2.6, 2.0, 1.4 and 2.8.
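The error bars are standard 95% confidence intervals for a mean of five samples. As a small illustration, assuming the per-tester scores for one item are available as an array (the class and method names below are hypothetical), the half-width of such an interval can be computed with the two-sided t critical value for four degrees of freedom:

    using System;
    using System.Linq;

    // Illustrative sketch: mean and 95% confidence interval half-width for a
    // sample of five scores; the t critical value 2.776 is valid for n = 5.
    public static class ConfidenceInterval
    {
        public static void MeanWith95Ci(float[] samples,
                                        out float mean, out float halfWidth)
        {
            int n = samples.Length;                      // five testers here
            float m = samples.Average();
            double variance = samples.Sum(x => (x - m) * (x - m)) / (n - 1);
            double standardError = Math.Sqrt(variance) / Math.Sqrt(n);
            const double t95Df4 = 2.776;                 // two-sided 95%, df = 4
            mean = m;
            halfWidth = (float)(t95Df4 * standardError);
        }
    }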

Better scores were attained on the questions that address how easy the system is to use and how fast the testers could adapt to it. The testers said the character posing was very easy and natural to learn, and the user interface was simple and easy to grasp. However, the testers took some time to get used to the timeline slider and how it operates; otherwise the system was easy to learn.

When the testers were given the choice between a pinching gesture and pressing a button to grab a limb, in the end all of the testers started using the button mode. In the debriefing they said it was much faster and more precise, because the pinching mode sometimes stops pinching if the hand position becomes too awkward for the Leap Motion controller to track. Starting the pinch is also unreliable if the hands are at the edge of the Leap Motion controller's range. Releasing the pinch also causes some residual movement in the limbs unless the user can keep his or her hand very steady while letting go. One tester wanted to use the pinching much more than the button, because he felt that pinching the limbs to manipulate them is more natural than having to press a button and depend only on the index finger's motions. That tester said positional manipulation was fine with the button mode, but orientation is easier with pinching. In the end he kept trying to use the pinching mode, but eventually switched to the button mode. At the time of writing, the known issues for the Leap Motion controller's VR mode list pinch tracking as unreliable, so this issue will hopefully be solved by future Leap Motion updates.
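One possible way to make the pinch grab less fragile, independently of SDK improvements, would be to apply hysteresis to the pinch strength signal: a grab starts only above a high threshold and ends only below a clearly lower one, so small tracking fluctuations around a single threshold do not repeatedly drop the limb. The sketch below only illustrates the idea; the threshold values and the assumption of a 0-1 pinch strength input are illustrative and not part of the implemented tool.

    // Illustrative sketch: hysteresis for a 0-1 pinch strength signal. A grab
    // starts only above grabThreshold and ends only below releaseThreshold,
    // so small fluctuations near a single threshold do not repeatedly drop
    // the grabbed limb.
    public class PinchHysteresis
    {
        private readonly float grabThreshold;
        private readonly float releaseThreshold;
        private bool grabbing;

        public PinchHysteresis(float grabThreshold = 0.8f,
                               float releaseThreshold = 0.5f)
        {
            this.grabThreshold = grabThreshold;
            this.releaseThreshold = releaseThreshold;
        }

        // Call once per tracking frame with the current pinch strength.
        public bool Update(float pinchStrength)
        {
            if (!grabbing && pinchStrength > grabThreshold)
                grabbing = true;
            else if (grabbing && pinchStrength < releaseThreshold)
                grabbing = false;
            return grabbing;
        }
    }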

When asked "What was good in the system?", one tester answered that posing the character was intuitive, fun and easy once they learned how to do it. Another tester also noted how intuitive especially the pinching manipulation was, while also appreciating how the user interface displayed the basic functions. Two testers especially liked how the Oculus Rift made it feel like the character is really there and can easily be looked at from different viewpoints. The button mode for grabbing the character was also said to work very well.

To the question "What was bad in the system?" three testers found the keyframe handling lacking: they either had trouble with the timeline or with keeping track of where the character is positioned in the scene between keyframes. Hand responsiveness also came up as lacking, and one tester had trouble rotating the character at the hips, because the rotation comes from the palm and rotating the palm requires moving the whole arm. One tester had problems with placing the character's feet on the floor.

To "What would you change or add?" three testers gave quite similar answers, wanting more advanced rig features. For example, they would like to see limbs or joints locked in place and hand gestures mapped directly to the character's hand. A ghost that shows the character's position and pose in the previous keyframe was also a wanted feature. A better timeline in general came up in the answers, and one tester would like an indicator that shows where the Oculus Rift positional tracking camera is, so it is not blocked by the hands as easily.

"If the system worked perfectly, how would you integrate it into your tool palette and workflow?" was met with answers from two opposite ends. A majority of four testers would most likely use it for rough draft work, initial keyframing or testing out deformations in the rigging stage. However, one tester would have liked to try polishing existing animations made in some other software.


The question "For what sort of animation purpose or work stage would this sort of system work best?" received answers similar to the previous question. While rough work was on top, there were suggestions to use it as a previewing tool for artists, so they could observe their models with it. Editing existing motion clips also came up again, this time from a different tester than in the previous question.

During the debriefing the testers mentioned fatigue as a possible issue. Since the Leap Motion controller is attached to the Oculus Rift, it is not advisable to rest one's elbows on a table, because the table will interfere with the tracking quality. This means the user's hands have to be unsupported when using the system, unless the chair happens to have suitable arm rests. On the other hand, upper body movement is quite important if the user wants to fully utilize the Oculus Rift's positional tracking features, so the arm rests would not help much in the long run. Additionally, when using the pinching motion, the user's hands have to be kept in a pose that makes the fingers as visible to the Leap Motion controller as possible. This hand pose can get a bit tiring after a while. Using the button mode is easier on the hand, but arm fatigue remains a problem.

The testers also reported problems with the system's floor implementation. Sometimes the character's feet jumped between different pose configurations while the character was supposed to be stationary. The testers liked how the character takes the floor into account when solving the pose, but the floor implementation needs some work to make it more intuitive to use. As of now, the user can select between a mode that locks the body part touching the floor in place, and one where the body part simply slides on the floor. The locking mode was good when the joint rotations were being manipulated, but it made changing the position of the floor-touching body part difficult. To change the position, the user has to disable the floor locking mode or alternatively lift the touching part off the floor and place it elsewhere. The testers also noted that the character's heel can in some situations go through the floor.
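The locking mode described above could, for example, be expressed as a small component that pins the contact point to the position it had when the lock was enabled, so that joint rotations elsewhere in the chain cannot drag it along the floor. The following Unity C# sketch only illustrates the idea under the assumption that the contact point is represented by a transform; the class and field names are hypothetical and this is not the tool's actual implementation.

    using UnityEngine;

    // Illustrative sketch: while 'locked' is true, the contact point is pinned
    // to the position it had when the lock was enabled, so joint rotations
    // elsewhere in the chain cannot drag it along the floor.
    public class FloorLock : MonoBehaviour
    {
        public bool locked;
        private Vector3 lockedPosition;
        private bool wasLocked;

        void LateUpdate()
        {
            if (locked && !wasLocked)
                lockedPosition = transform.position;   // remember the contact point

            if (locked)
                transform.position = lockedPosition;   // hold it in place

            wasLocked = locked;
        }
    }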

Movable keyframes were the most wanted feature for the timeline. While the inaccurate timeline control when using the Leap Motion controller is also a problem, being able to fix the keyframes later on would alleviate such problems. In general the timeline was found to be a bit difficult to use, and another approach to the time manipulation problem should probably be considered, since the task is quite precise and the implemented timeline is too rough for it.
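A minimal data structure for movable keyframes could store the time stamp explicitly in each keyframe, so that moving a keyframe only means changing that value and re-sorting the list, instead of the keyframe being tied to a fixed slot on the slider. The sketch below is illustrative; the type and field names are not taken from the implemented system.

    using System.Collections.Generic;
    using UnityEngine;

    // Illustrative sketch: each keyframe owns its time stamp, so moving a
    // keyframe only means changing the time and re-sorting the list.
    public class PoseKeyframe
    {
        public float time;    // position on the timeline in seconds
        public Dictionary<string, Quaternion> jointRotations =
            new Dictionary<string, Quaternion>();
    }

    public class PoseTimeline
    {
        private readonly List<PoseKeyframe> keyframes = new List<PoseKeyframe>();

        public void Add(PoseKeyframe keyframe)
        {
            keyframes.Add(keyframe);
            keyframes.Sort((a, b) => a.time.CompareTo(b.time));
        }

        // Moving a keyframe later on is a time change followed by a re-sort.
        public void Move(PoseKeyframe keyframe, float newTime)
        {
            keyframe.time = Mathf.Max(0f, newTime);
            keyframes.Sort((a, b) => a.time.CompareTo(b.time));
        }
    }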

One tester suggested handling viewpoint movement with a turntable under the character that can be used to rotate the character in front of the user. This would not affect the animation data, but it would make it possible to change the viewpoint with the Leap Motion controller, so the user does not have to switch to the gamepad for camera control.

As a positive point, the testers said the system keeps things simple, which makes it quite easy to learn. It did not take long for the users to become proficient enough to do the animations they wanted. The provided five minutes was a bit too short to get a good handle on how to best use the Leap Motion controller, but the system's interface was easy to learn in that time. The implemented features were easy to use with the provided controls, and the user interface was intuitive, except for the timeline. There were no problems when the users had to alternate between the gamepad and the Leap Motion controller.


Chapter 7

Conclusions and future work

The goal was to create an animation tool that feels natural to the user and is more productive than the dominant keyboard and mouse combination. In this chapter we discuss the results and see how well we managed to meet our goals.

7.1 Conclusions

The grade D from the SUS questionnaire does not look good on paper, and the reason for the grade became clear when discussing the problems with the testers. The hand tracking was too unreliable for the testers to feel that they would like to use the system frequently. They did not feel confident when using the tool, because at any moment the hands could disappear or jerk in some direction and mess up the task at hand, especially when using the timeline. Using the timeline with the Xbox 360 controller is easier and more precise than with the Leap Motion, because the hands shake a bit and the timeline work is quite precise. Indeed, the slider probably was not ideal for this sort of timeline work, and another approach to the problem should be found. One idea is presented in section 7.2.

On the other hand, the testers said that posing with the Leap Motion felt very natural after they learned the limitations of the device and avoided overly long reaching motions. They said they could see use for such tools in tasks such as rough initial keyframing, manipulating existing animation data, or as a tool for artists to inspect their models in 3D. As long as the needed features were implemented and the hand tracking was more reliable, they could integrate it into their tool palette.


The hand tracking will hopefully improve over time, since the Leap Motion VR mode is still in beta and its tracking quality is not as good as that of the original desktop mode. We will follow its development with interest.

The head tracking worked very well in conjunction with object manipulation by hand tracking. While the user had two hands grabbing the character, he could still change the viewpoint without having to free one hand to use the gamepad. The users expressed how natural it feels to work with a virtual object that looks like it is really there.

In the end, the tool performed much better in posing tasks than in application-control tasks such as keyframe handling and timeline scrubbing. Combining the tool with the keyboard and mouse, using the image passthrough feature as an aid, could increase the tool's usability and productivity by a large margin. The technology is not quite there yet because of the reliability issues, but these issues are likely to be alleviated by software updates.

7.2 Future work

Although the system performed fairly well for posing tasks, it is not very useful with the present features, because the animation data cannot be exported and the character model can only be changed in the Unity editor, after which the whole system needs to be built again. The testers also had some features in mind that would make the system more desirable. To make the system more useful, here are some ideas for future work:

Better timeline
The timeline presented in this version was found to be difficult to use and too inaccurate when using the Leap Motion. A possible alternative is to implement a more traditional timeline, as found in current leading animation software, and control it with a mouse. Since the Oculus Rift and Leap Motion combination has the passthrough ability, finding the mouse on a table would not be a problem. The computer mouse is very accurate and well established for timeline control, so this approach should be considered.

Controlling the character's hands with hand gestures
One obvious use for the Leap Motion controller is to control the character's hands directly. The Leap Motion SDK gives accurate joint positions, so the character's hand joints could be mapped to them, even if the hand is not completely human-like.
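A rough sketch of such a mapping is given below, assuming the tracked hand joints and the character's finger bones are available as Unity transforms in matching order; a real retargeting step would also apply per-joint offset rotations, and the names used here are illustrative.

    using UnityEngine;

    // Illustrative sketch: copy rotations from tracked hand joints to the
    // character's corresponding finger bones. Assumes both arrays are filled
    // in the same order and share a similar rest pose.
    public class HandRetarget : MonoBehaviour
    {
        public Transform[] trackedJoints;    // driven by the hand tracking rig
        public Transform[] characterJoints;  // the character's finger bones

        void LateUpdate()
        {
            int count = Mathf.Min(trackedJoints.Length, characterJoints.Length);
            for (int i = 0; i < count; i++)
                characterJoints[i].rotation = trackedJoints[i].rotation;
        }
    }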

Turntable below the character
This idea came from the testers. A spinnable disc under the character that can be grabbed with the Leap Motion hands would make rotating around the character faster and more intuitive than using the Xbox 360 controller.
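As a sketch of the idea, the disc could be a parent transform of the character that is rotated by the angle the grabbing hand sweeps around the disc centre. The component below is illustrative and assumes the grabbing logic assigns the hand transform while the disc is held; it is not part of the current tool.

    using UnityEngine;

    // Illustrative sketch: a spinnable disc that the character is parented to.
    // While a hand is grabbing the disc, the disc is rotated around the world
    // up axis by the angle the hand has swept around the disc centre, which
    // rotates the character with it without touching the animation data.
    public class Turntable : MonoBehaviour
    {
        public Transform grabbingHand;   // assigned by the grab logic while held
        private Vector3 previousDir;

        void Update()
        {
            if (grabbingHand == null)
            {
                previousDir = Vector3.zero;
                return;
            }

            Vector3 dir = grabbingHand.position - transform.position;
            dir.y = 0f;                  // project the hand onto the disc plane

            if (previousDir != Vector3.zero && dir != Vector3.zero)
            {
                float angle = Vector3.Angle(previousDir, dir);
                float sign = Mathf.Sign(Vector3.Dot(Vector3.up,
                                        Vector3.Cross(previousDir, dir)));
                transform.Rotate(Vector3.up, sign * angle, Space.World);
            }
            previousDir = dir;
        }
    }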

Animation data import and export
To make the system truly useful, the resulting animation needs to be usable elsewhere. FBX export seems like the most sensible solution. Animation data importing is also useful, so the system can be used to inspect and modify animation data made earlier with the system or with other software.

Model import
Right now it is not possible to import a character inside the system. The user has to use the Unity editor to import the character and build the system again. On top of that, the user has to run the ragdoll wizard, drag the relevant transforms into place and tell the PoseableCharacter script which transforms are to be used as IK chain starting points.

Locking joints and body parts in place
The testers would have liked to lock certain body parts in place, much like the floor does with the limbs that come in contact with it. A button press that locks the grabbed limb in place should be implemented.
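Since the character is set up with Unity's ragdoll wizard, one simple way to implement this could be to make the grabbed limb's rigidbody kinematic while the lock is active, so the solver cannot move it. The sketch below assumes a gamepad button named "LockLimb" in the Unity Input Manager; both the button name and the class are illustrative, not part of the existing tool.

    using UnityEngine;

    // Illustrative sketch: toggle a lock on the currently grabbed limb by
    // making its rigidbody kinematic, so the physics/IK solve cannot move it.
    public class LimbLockToggle : MonoBehaviour
    {
        public Rigidbody grabbedLimb;          // assigned by the grabbing logic

        void Update()
        {
            // Assumed binding: a button named "LockLimb" in the Unity Input
            // Manager; the name is hypothetical, not an existing binding.
            if (grabbedLimb != null && Input.GetButtonDown("LockLimb"))
                grabbedLimb.isKinematic = !grabbedLimb.isKinematic;
        }
    }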

Page 65: A First Person Immersive Animation Tool · The tool uses an Oculus Rift DK2 device to give the user a 3D view with 6 DOF head tracking. The hand tracking is done with a Leap Motion

Bibliography

[1] Balakrishnan, R., Baudel, T., Kurtenbach, G., and Fitzmaurice, G. The Rockin'Mouse: integral 3D manipulation on a plane. In Proceedings of the ACM SIGCHI Conference on Human factors in computing systems (1997), ACM, pp. 311–318.

[2] Bell, G. Bell's trackball. Written as part of the "flip" demo to demonstrate the Silicon Graphics (now SGI) hardware (1988).

[3] Bier, E. A. Skitters and jacks: Interactive 3D positioning tools. In Proceedings of the 1986 Workshop on Interactive 3D Graphics (New York, NY, USA, 1987), I3D '86, ACM, pp. 183–196.

[4] Boritz, J., and Booth, K. S. A study of interactive 3D point location in a computer simulated virtual environment. In Proceedings of the ACM symposium on Virtual reality software and technology (1997), ACM, pp. 181–187.

[5] Boritz, J., and Booth, K. S. A study of interactive 6 DOF docking in a computerised virtual environment. In Virtual Reality Annual International Symposium, 1998. Proceedings., IEEE 1998 (1998), IEEE, pp. 139–146.

[6] Brooke, J. SUS - a quick and dirty usability scale. Usability evaluation in industry 189, 194 (1996), 4–7.

[7] Chen, J., Izadi, S., and Fitzgibbon, A. Kinetre: animating the world with the human body. In Proceedings of the 25th annual ACM symposium on User interface software and technology (2012), ACM, pp. 435–444.

[8] Chen, M., Mountford, S. J., and Sellen, A. A study in interactive 3-D rotation using 2-D control devices. SIGGRAPH Comput. Graph. 22, 4 (June 1988), 121–129.


[9] Colgan, A. How does the Leap Motion controller work?, Aug. 2014. http://blog.leapmotion.com/hardware-to-software-how-does-the-leap-motion-controller-work/. Accessed: 23 January 2015.

[10] Feng, T.-C., Gunawardane, P., Davis, J., and Jiang, B. Motion capture data retrieval using an artist's doll. In Pattern Recognition, 2008. ICPR 2008. 19th International Conference on (2008), IEEE, pp. 1–4.

[11] Frohlich, B., and Plate, J. The cubic mouse: A new device for three-dimensional input. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (New York, NY, USA, 2000), CHI '00, ACM, pp. 526–531.

[12] Gallo, L. A glove-based interface for 3D medical image visualization. In Intelligent interactive multimedia systems and services. Springer, 2010, pp. 221–230.

[13] Hand, C. A survey of 3-D input devices. DMU CS TR94/2 (1994).

[14] Hand, C. A survey of 3D interaction techniques. Computer Graphics Forum 16, 5 (1997), 269–281.

[15] Hutchins, E. L., Hollan, J. D., and Norman, D. A. Direct manipulation interfaces. Human–Computer Interaction 1, 4 (1985), 311–338.

[16] Ishigaki, S., White, T., Zordan, V. B., and Liu, C. K. Performance-based control interface for character animation. In ACM Transactions on Graphics (TOG) (2009), vol. 28, ACM, p. 61.

[17] Ishii, H., and Ullmer, B. Tangible bits: Towards seamless interfaces between people, bits and atoms. In Proceedings of the ACM SIGCHI Conference on Human Factors in Computing Systems (New York, NY, USA, 1997), CHI '97, ACM, pp. 234–241.

[18] Jacob, R. J. K., Sibert, L. E., McFarlane, D. C., and Mullen, Jr., M. P. Integrality and separability of input devices. ACM Trans. Comput.-Hum. Interact. 1, 1 (Mar. 1994), 3–26.

[19] Jacobson, A., Panozzo, D., Glauser, O., Pradalier, C., Hilliges, O., and Sorkine-Hornung, O. Tangible and modular input device for character articulation. ACM Transactions on Graphics (proceedings of ACM SIGGRAPH) 33, 4 (2014).


[20] Jankowski, J., and Hachet, M. A Survey of Interaction Techniques for Interactive 3D Environments. In Eurographics 2013 - STAR (Girona, Spain, May 2013).

[21] Jones, P. E. Three-dimensional input device with six degrees of freedom. Mechatronics 9, 7 (1999), 717–729.

[22] Margono, S., and Shneiderman, B. A study of file manipulation by novices using commands vs. direct manipulation. Sparks of Innovation in Human-computer Interaction (1993), 39.

[23] Mixamo Inc. Mixamo Alpha-model, Mar. 2015. https://www.mixamo.com/editor/new/729?character_id=110749. Accessed: 1 April 2015.

[24] Moeslund, T. B., Hilton, A., and Kruger, V. A survey of advances in vision-based human motion capture and analysis. Computer vision and image understanding 104, 2 (2006), 90–126.

[25] Oh, J.-Y., and Stuerzlinger, W. Moving objects with 2D input devices in CAD systems and desktop virtual environments. In Proceedings of Graphics Interface 2005 (2005), Canadian Human-Computer Communications Society, pp. 195–202.

[26] Poupyrev, I., Ichikawa, T., Weghorst, S., and Billinghurst, M. Egocentric object manipulation in virtual environments: empirical evaluation of interaction techniques. In Computer Graphics Forum (1998), vol. 17, Wiley Online Library, pp. 41–52.

[27] Rolland, J. P., Holloway, R. L., and Fuchs, H. Comparison of optical and video see-through, head-mounted displays. In Photonics for Industrial Applications (1995), International Society for Optics and Photonics, pp. 293–307.

[28] Sauro, J., and Lewis, J. R. Quantifying the user experience: Practical statistics for user research. Elsevier, 2012.

[29] Shneiderman, B., and Plaisant, C. Designing the User Interface: Strategies for Effective Human-Computer Interaction (5th Edition). Pearson Addison Wesley, 2010.

[30] Shoemake, K. ARCBALL: A user interface for specifying three-dimensional orientation using a mouse. In Proceedings of the Conference on Graphics Interface '92 (San Francisco, CA, USA, 1992), Morgan Kaufmann Publishers Inc., pp. 151–156.


[31] Smith, G., Stuerzlinger, W., Salzman, T., Watson, B., and Buchanan, J. 3D scene manipulation with 2D devices and constraints. In Graphics Interface (2001), vol. 1, pp. 135–142.

[32] Sutherland, I. E. A head-mounted three dimensional display. In Proceedings of the December 9-11, 1968, fall joint computer conference, part I (1968), ACM, pp. 757–764.

[33] Taylor, P. Non-iterative, closed-form, inverse kinematic chain solver (NCF IK). In Game Programming Gems 8, A. Lake, Ed. Cengage Learning, 2010, pp. 141–151.

[34] Unity Technologies. Unity documentation: Positioning GameObjects, Mar. 2015. http://docs.unity3d.com/Manual/PositioningGameObjects.html. Accessed: 17 March 2015.

[35] Watt, A., and Watt, M. Advanced animation and rendering techniques. Addison-Wesley, 1992.

[36] Welman, C. Inverse kinematics and geometric constraints for articulated figure manipulation. Master's thesis, Simon Fraser University, 1993.

[37] Yoshizaki, W., Sugiura, Y., Chiou, A. C., Hashimoto, S., Inami, M., Igarashi, T., Akazawa, Y., Kawachi, K., Kagami, S., and Mochimaru, M. An actuated physical puppet as an input device for controlling a digital manikin. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (2011), ACM, pp. 637–646.

[38] Zhai, S. User performance in relation to 3D input device design. SIGGRAPH Comput. Graph. 32, 4 (Nov. 1998), 50–54.

[39] Zimmerman, T. G., Lanier, J., Blanchard, C., Bryson, S., and Harvill, Y. A hand gesture interface device. In Proceedings of the SIGCHI/GI Conference on Human Factors in Computing Systems and Graphics Interface (New York, NY, USA, 1987), CHI '87, ACM, pp. 189–192.