Natural User Interfaces for Augmented Reality. Keynote speech given by Mark Billinghurst at the CHINZ 2012 conference in Dunedin, July 2nd, 2012.
Natural Interfaces for Augmented Reality
Mark Billinghurst, HIT Lab NZ
University of Canterbury
Augmented Reality Definition
Defining characteristics [Azuma 97]:
- Combines real and virtual images: both can be seen at the same time
- Interactive in real time: the virtual content can be interacted with
- Registered in 3D: virtual objects appear fixed in space
AR Today
Most widely used AR is mobile or web based.
Mobile AR:
- Outdoor AR (GPS + compass): Layar (10 million+ users), Junaio, etc.
- Indoor AR (image-based tracking): QCAR, String, etc.
Web-based AR (Flash):
- FLARToolKit marker tracking
- Markerless tracking
AR Interaction
You can see spatially registered AR... but how can you interact with it?
AR Interaction Today
Mostly simple interaction.
Mobile:
- Outdoor (Junaio, Layar, Wikitude, etc.): viewing information in place, touching virtual tags
- Indoor (Invizimals, Qualcomm demos): changing viewpoint, screen-based interaction (touch screen)
Web based:
- Changing viewpoint, screen interaction (mouse)
History of AR Interaction
1. AR Information Viewing
Information is registered to real-world context; hand-held AR displays.
Interaction:
- Manipulation of a window into information space
- 2D/3D virtual viewpoint control
Applications: context-aware information displays
Examples: NaviCam, Chameleon, etc. [NaviCam: Rekimoto et al. 1997]
Current AR Information Browsers
Mobile AR: GPS + compass
Many applications: Layar, Wikitude, Acrossair, PressLite, Yelp, AR Car Finder, …
2. 3D AR Interfaces
Virtual objects displayed in 3D physical space and manipulated directly.
- HMDs and 6 DOF head tracking
- 6 DOF hand trackers for input
Interaction:
- Viewpoint control
- Traditional 3D UI interaction: manipulation, selection, etc.
- Requires custom input devices [Kiyokawa et al. 2000]
VLEGO: AR 3D Interaction
3. Augmented Surfaces and Tangible Interfaces
Basic principles:
- Virtual objects are projected on a surface
- Physical objects are used as controls for virtual objects
- Support for collaboration
Augmented Surfaces [Rekimoto et al. 1998]
- Front projection
- Marker-based tracking
- Multiple projection surfaces
Tangible User Interfaces [Ishii 97]
- Create digital shadows for physical objects
- Foreground: graspable UI
- Background: ambient interfaces
Tangible Interface: ARgroove
Collaborative instrument exploring physically based interaction.
- Move and track a physical record
- Map physical actions to MIDI output: translation, rotation, tilt, shake
Limitation:
- AR output shown on a screen; separation between input and output
Lessons from Tangible Interfaces
Benefits:
- Physical objects make us smart (affordances, constraints)
- Objects aid collaboration (shared meaning)
- Objects increase understanding (cognitive artifacts)
Limitations:
- Difficult to change object properties
- Limited display capabilities (projection onto a surface)
- Separation between object and display
4. Tangible AR
AR overcomes the limitations of TUIs:
- Enhances display possibilities
- Merges task and display space
- Provides public and private views
TUI + AR = Tangible AR: apply TUI methods to AR interface design
Example Tangible AR Applications
Use of natural physical object manipulations to control virtual objects.
- LevelHead (Oliver): physical cubes become rooms
- VOMAR (Kato 2000): furniture catalog book
  - Turn over a page to see new models
  - Paddle interaction: push, shake, incline, hit, scoop
VOMAR Interface
Evolution of AR Interaction
1. Information viewing interfaces: simple (conceptually!), unobtrusive
2. 3D AR interfaces: expressive, creative, require attention
3. Tangible interfaces: embedded into conventional environments
4. Tangible AR: combines TUI input + AR display
Limitations
Typical limitations:
- Simple or no interaction (viewpoint control only)
- Require custom devices
- Single-mode interaction
- 2D input for 3D content (screen-based interaction)
- No understanding of the real world
- Explicit rather than implicit interaction
- Unintelligent interfaces (no learning)
Natural Interaction
The Vision of AR
To Make the Vision Real...
Hardware/software requirements:
- Contact lens displays
- Free-space hand/body tracking
- Environment recognition
- Speech/gesture recognition
- Etc.
Natural Interaction
- Automatically detecting the real environment: environmental awareness, physically based interaction
- Gesture input: free-hand interaction
- Multimodal input: speech and gesture interaction, implicit rather than explicit interaction
Environmental Awareness
AR MicroMachines
AR experience with environment awareness and physically based interaction.
- Based on the MS Kinect RGB-D sensor
- Augmented environment supports occlusion and shadows
- Physically based interaction between real and virtual objects
Operating Environment
Architecture
Our framework uses five libraries:
- OpenNI
- OpenCV
- OPIRA
- Bullet Physics
- OpenSceneGraph
System Flow
The system flow consists of three stages (a minimal sketch of the loop follows):
1. Image processing and marker tracking
2. Physics simulation
3. Rendering
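As a minimal sketch of that loop (the stub functions are hypothetical placeholders standing in for the OpenNI, OPIRA, Bullet, and OpenSceneGraph calls; this is not the AR MicroMachines source):

    // Hypothetical skeleton of the three-stage flow. The stubs below
    // are illustrative placeholders, not real library APIs.
    #include <chrono>

    static void captureAndTrack() { /* OpenNI frame grab + OPIRA tracking */ }
    static void stepPhysics(float dt) { /* Bullet: world->stepSimulation(dt) */ }
    static void renderFrame() { /* OSG: draw scene with occlusion/shadows */ }

    int main() {
        auto prev = std::chrono::steady_clock::now();
        for (;;) {
            captureAndTrack();                       // 1. image processing
            auto now = std::chrono::steady_clock::now();
            float dt = std::chrono::duration<float>(now - prev).count();
            prev = now;
            stepPhysics(dt);                         // 2. physics simulation
            renderFrame();                           // 3. rendering
        }
    }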
Physics Simulation
- Create a virtual mesh over the real world (see the sketch below)
- Update it at 10 fps, so real objects can be moved
- Used by the physics engine for collision detection (virtual vs. real)
- Used by OpenSceneGraph for occlusion and shadows
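One plausible way to realize such an updatable collision surface in Bullet (a sketch under assumptions: the depth image is resampled into a fixed 160x120 height grid in metres; this is not the original implementation) is a btHeightfieldTerrainShape, which reads its height buffer by pointer:

    // Sketch: a depth-derived collision surface for Bullet. The height
    // buffer is read by pointer, so refilling `heights` from each new
    // depth frame (resampled to the grid) updates collisions at ~10 fps.
    // Grid size and height range are illustrative assumptions.
    #include <btBulletDynamicsCommon.h>
    #include <BulletCollision/CollisionShapes/btHeightfieldTerrainShape.h>
    #include <vector>

    const int GRID_W = 160, GRID_H = 120;
    std::vector<float> heights(GRID_W * GRID_H, 0.0f);

    void addDepthSurface(btDiscreteDynamicsWorld* world) {
        btHeightfieldTerrainShape* terrain = new btHeightfieldTerrainShape(
            GRID_W, GRID_H, heights.data(),
            1.0f,            // height scale (data already in metres)
            -2.0f, 2.0f,     // min/max height
            1,               // up axis = Y
            PHY_FLOAT, false);
        btRigidBody::btRigidBodyConstructionInfo info(
            0.0f, nullptr, terrain);   // mass 0: static collision surface
        world->addRigidBody(new btRigidBody(info));
    }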
Rendering
[Figures: occlusion and shadows]
Natural Gesture Interaction
HIT Lab NZ AR Gesture Library
Motivation: AR MicroMachines and PhobiAR
- Treated the environment as static (no tracking)
- Tracked objects in 2D
More realistic interaction requires 3D gesture tracking.
Motivation: Occlusion Issues
- AR MicroMachines only achieved realistic occlusion because the user's viewpoint matched the Kinect's
- Proper occlusion requires a more complete model of scene objects
Architecture
HIT Lab NZ's Gesture Library
Capture module:
- Supports PCL, OpenNI, OpenCV, and the Kinect SDK
- Provides access to depth, RGB, and XYZRGB data
- Usage: capturing colour images, depth images, and concatenated point clouds from one or more cameras
- For example: Kinect for Xbox 360, Kinect for Windows, Asus Xtion Pro Live (a capture sketch follows)
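For instance, a capture path through OpenCV's OpenNI backend might look like this (a sketch only, not the library's own API):

    // Sketch: depth + colour capture through OpenCV's OpenNI backend.
    // Constant names are from OpenCV 3+; older versions use CV_CAP_OPENNI.
    #include <opencv2/opencv.hpp>

    int main() {
        cv::VideoCapture cap(cv::CAP_OPENNI2);   // Kinect/Xtion via OpenNI2
        if (!cap.isOpened()) return 1;
        cv::Mat depth, bgr;
        for (;;) {
            if (!cap.grab()) break;
            cap.retrieve(depth, cv::CAP_OPENNI_DEPTH_MAP);  // CV_16UC1, mm
            cap.retrieve(bgr,   cv::CAP_OPENNI_BGR_IMAGE);  // CV_8UC3
            cv::imshow("depth", depth * 8);   // brighten for display
            cv::imshow("rgb", bgr);
            if (cv::waitKey(1) == 27) break;  // Esc to quit
        }
        return 0;
    }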
Segmentation module:
- Segments images and point clouds based on colour, depth, and space
- Usage: segmenting images or point clouds using colour models, depth, or spatial properties such as location, shape, and size
- For example: skin-colour segmentation, depth thresholding (sketched below)
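A minimal sketch of those two example segmenters in OpenCV (the threshold values are illustrative, not the library's tuned parameters):

    // Sketch: skin-colour segmentation in YCrCb plus a depth threshold.
    #include <opencv2/opencv.hpp>

    // Pixels whose Cr/Cb fall in a typical skin range (assumed bounds).
    cv::Mat skinMask(const cv::Mat& bgr) {
        cv::Mat ycrcb, mask;
        cv::cvtColor(bgr, ycrcb, cv::COLOR_BGR2YCrCb);
        cv::inRange(ycrcb, cv::Scalar(0, 133, 77),
                           cv::Scalar(255, 173, 127), mask);
        return mask;
    }

    // Pixels closer than maxMillimetres (e.g. a hand in front of the scene).
    cv::Mat depthMask(const cv::Mat& depth16u, int maxMillimetres = 800) {
        cv::Mat mask = (depth16u > 0) & (depth16u < maxMillimetres);
        return mask;
    }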
Tracking and classification module:
- Identifies and tracks objects between frames based on XYZRGB
- Usage: identifying the current position/orientation of the tracked object in space
- For example: a training set of hand poses, where colours represent unique regions of the hand; raw (uncleaned) classifier output on real hand input (depth image)
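Frame-to-frame identification can be as simple as nearest-centroid association, sketched below (an illustration only, not the library's tracker; the 5 cm gate is an assumed threshold):

    // Sketch: naive frame-to-frame association by nearest 3D centroid.
    // Each segmented blob is reduced to its XYZ centroid; a blob in the
    // new frame inherits the ID of the closest old centroid within 5 cm.
    #include <cmath>
    #include <vector>

    struct Blob { int id; float x, y, z; };

    void associate(const std::vector<Blob>& prev, std::vector<Blob>& curr,
                   int& nextId, float maxDist = 0.05f) {
        for (Blob& c : curr) {
            c.id = -1;
            float best = maxDist;
            for (const Blob& p : prev) {
                float d = std::sqrt((c.x - p.x) * (c.x - p.x) +
                                    (c.y - p.y) * (c.y - p.y) +
                                    (c.z - p.z) * (c.z - p.z));
                if (d < best) { best = d; c.id = p.id; }
            }
            if (c.id < 0) c.id = nextId++;   // unmatched: start a new track
        }
    }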
Modeling module:
- Hand recognition/modeling: skeleton based (low-resolution approximation) or model based (more accurate representation)
- Object modeling: identification and tracking of rigid-body objects
- Physical modeling: physical interaction via sphere proxy, model based, or mesh based
- Usage: general spatial interaction in AR/VR environments
Method: represent models as collections of spheres that move with the models in the Bullet physics engine (see the sketch below).
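A sketch of such a sphere proxy in Bullet (the sphere radius and kinematic setup are illustrative assumptions, not the system's actual parameters):

    // Sketch: a kinematic sphere-proxy body. Tracked 3D points (e.g. on
    // the hand) are wrapped in small spheres on one compound shape; the
    // body is kinematic, so tracking drives it and it pushes dynamic
    // objects without being pushed back.
    #include <btBulletDynamicsCommon.h>
    #include <vector>

    btRigidBody* makeSphereProxy(const std::vector<btVector3>& centres,
                                 float radius = 0.015f) {  // assumed 1.5 cm
        btCompoundShape* shape = new btCompoundShape();
        for (const btVector3& c : centres)
            shape->addChildShape(
                btTransform(btQuaternion::getIdentity(), c),
                new btSphereShape(radius));
        btRigidBody::btRigidBodyConstructionInfo info(
            0.0f, new btDefaultMotionState(), shape);      // mass 0
        btRigidBody* body = new btRigidBody(info);
        body->setCollisionFlags(body->getCollisionFlags() |
                                btCollisionObject::CF_KINEMATIC_OBJECT);
        body->setActivationState(DISABLE_DEACTIVATION);
        return body;  // each frame: set its transform from the tracker
    }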
Method: render the AR scene with OpenSceneGraph, using the depth map for occlusion (sketched below).
Shadows are yet to be implemented.
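A standard way to get this kind of occlusion in OpenSceneGraph (a sketch of the general "phantom geometry" technique, not necessarily this system's code) is to draw the real-world mesh into the depth buffer only, before the virtual content:

    // Sketch: "phantom" real-world geometry for occlusion in OSG.
    // The mesh writes depth but no colour, so the camera video shows
    // through while virtual objects behind it are hidden correctly.
    #include <osg/ColorMask>
    #include <osg/Geode>
    #include <osg/StateSet>

    void makeOcclusionPhantom(osg::Geode* realWorldMesh) {
        osg::StateSet* ss = realWorldMesh->getOrCreateStateSet();
        // Disable all colour writes; depth writes remain enabled.
        ss->setAttributeAndModes(
            new osg::ColorMask(false, false, false, false));
        // Draw before the virtual content so its depth is already there.
        ss->setRenderBinDetails(-1, "RenderBin");
    }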
Results
Gesture recognition module:
- Static: hand pose recognition
- Dynamic: meaningful movement recognition
- Context based: gestures with context, e.g. pointing
- Usage: issuing commands, anticipating user intention, and high-level interaction
Multimodal Interaction
Multimodal Interaction
Combined speech and gesture input; the two are complementary.
- Speech: modal commands, quantities
- Gesture: selection, motion, qualities
Previous work found multimodal interfaces intuitive for 2D/3D graphics interaction. (A simple fusion scheme is sketched below.)
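One simple fusion scheme (an illustration only, not the system's actual fusion module): pair each speech command with the most recent gesture event inside a short time window.

    // Sketch: time-window fusion of speech and gesture events. A speech
    // command binds to the latest gesture (e.g. a pointing target) seen
    // within the last 1.5 seconds; both event types are illustrative.
    #include <cmath>
    #include <optional>
    #include <string>

    struct GestureEvent { double time; float x, y, z; };  // pointing target
    struct SpeechEvent  { double time; std::string verb; };
    struct FusedCommand { std::string verb; float x, y, z; };

    std::optional<FusedCommand> fuse(const SpeechEvent& s,
                                     const std::optional<GestureEvent>& g,
                                     double window = 1.5) {
        if (g && std::abs(s.time - g->time) <= window)
            return FusedCommand{s.verb, g->x, g->y, g->z};
        return std::nullopt;  // no gesture close enough in time
    }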
1. Marker-Based Multimodal Interface
Add speech recognition to VOMAR: paddle + speech commands.
Commands recognized (see the parsing sketch below):
- Create: "Make a blue chair" creates a virtual object and places it on the paddle
- Duplicate: "Copy this" duplicates a virtual object and places it on the paddle
- Grab: "Grab table" selects a virtual object and places it on the paddle
- Place: "Place here" places the attached object in the workspace
- Move: "Move the couch" attaches a virtual object in the workspace to the paddle so that it follows the paddle's movement
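A naive sketch of mapping recognized utterances to this command set (keyword matching on lowercased text; the real system used a proper speech grammar):

    // Sketch: naive keyword mapping from a lowercased utterance to the
    // VOMAR command set. This only illustrates the command categories.
    #include <string>

    enum class Cmd { Create, Duplicate, Grab, Place, Move, Unknown };

    Cmd parse(const std::string& utterance) {   // expects lowercase text
        if (utterance.rfind("make", 0) == 0)  return Cmd::Create;
        if (utterance.rfind("copy", 0) == 0)  return Cmd::Duplicate;
        if (utterance.rfind("grab", 0) == 0)  return Cmd::Grab;
        if (utterance.rfind("place", 0) == 0) return Cmd::Place;
        if (utterance.rfind("move", 0) == 0)  return Cmd::Move;
        return Cmd::Unknown;
    }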
System Architecture
Object Relationships
"Put chair behind the table”Where is behind?
View specific regions
User Evaluation
- Performance time: speech + static paddle significantly faster
- Gesture-only condition less accurate for position/orientation
- Users preferred speech + paddle input (subjective surveys)
2. Free-Hand Multimodal Input
- Use the free hand to interact with AR content
- Recognize simple gestures, no marker tracking: point, move, pick/drop
Multimodal Architecture
Multimodal Fusion
Hand Occlusion
User Evaluation
Task: change object shape, colour, and position.
- Conditions: speech only, gesture only, multimodal
- Measures: performance time, errors, subjective survey
Experimental Setup
Change object shape and colour
Results
Average performance time (multimodal and speech fastest):
- Gesture: 15.44 s
- Speech: 12.38 s
- Multimodal: 11.78 s
No difference in user errors.
User subjective survey:
- Q1 ("How natural was it to manipulate the object?"): MMI and speech rated significantly better
- 70% preferred MMI, 25% speech only, 5% gesture only
Future Directions
Future research:
- Mobile real-world capture
- Mobile gesture input
- Intelligent interfaces
- Virtual characters
Natural Gesture Interaction on Mobile
Use the mobile camera for hand tracking and fingertip detection (a sketch follows).
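Fingertip detection is commonly done with contours and convex hulls; below is a sketch of that standard OpenCV approach (not necessarily the method used in this work):

    // Sketch: fingertip candidates from a binary hand mask via the
    // largest contour's convex hull. Real pipelines refine candidates
    // with convexity-defect depth/angle tests.
    #include <algorithm>
    #include <opencv2/opencv.hpp>
    #include <vector>

    std::vector<cv::Point> fingertips(const cv::Mat& handMask) {
        std::vector<std::vector<cv::Point>> contours;
        cv::findContours(handMask.clone(), contours,
                         cv::RETR_EXTERNAL, cv::CHAIN_APPROX_SIMPLE);
        std::vector<cv::Point> tips;
        if (contours.empty()) return tips;
        auto& hand = *std::max_element(           // largest blob = hand
            contours.begin(), contours.end(),
            [](const std::vector<cv::Point>& a,
               const std::vector<cv::Point>& b) {
                return cv::contourArea(a) < cv::contourArea(b);
            });
        std::vector<int> hullIdx;
        cv::convexHull(hand, hullIdx, false, false);  // hull point indices
        for (int i : hullIdx) tips.push_back(hand[i]);
        return tips;
    }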
Evaluation
- Gesture input more than twice as slow as touch
- No difference in perceived naturalness
Intelligent Interfaces
Most AR systems are stupid:
- Don't recognize user behaviour
- Don't provide feedback
- Don't adapt to the user
Especially important for training:
- Scaffolded learning
- Moving beyond checklists of actions
Intelligent Interfaces
- AR interface + intelligent tutoring system
- ASPIRE constraint-based system (from UC)
- Constraints: relevance condition, satisfaction condition, feedback (a sketch of this representation follows)
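A sketch of what a constraint in this relevance/satisfaction/feedback style could look like (an illustrative representation; ASPIRE's internal format differs):

    // Sketch: a constraint in the relevance/satisfaction/feedback style
    // of constraint-based tutoring. The State contents are left abstract
    // because the domain predicates are invented for illustration.
    #include <functional>
    #include <string>

    struct State { /* current state of the trainee's solution */ };

    struct Constraint {
        std::function<bool(const State&)> relevance;     // does it apply?
        std::function<bool(const State&)> satisfaction;  // is it met?
        std::string feedback;            // shown when relevant but violated
    };

    bool violated(const Constraint& c, const State& s) {
        return c.relevance(s) && !c.satisfaction(s);
    }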
Domain Ontology
Intelligent Feedback
- Actively monitors user behaviour (implicit vs. explicit interaction)
- Provides corrective feedback
Evaluation Results
16 subjects, with and without the ITS:
- Improved task completion
- Improved learning
Intelligent Agents
AR characters:
- Virtual embodiment of the system
- Multimodal input/output
Examples: AR Lego, Welbo, etc.
Mr Virtuoso:
- AR character more real, more fun
- On-screen 3D and AR similar in usefulness
Conclusions
Conclusions
- AR traditionally involves tangible interaction
- New technologies support natural interaction: environment capture, natural gestures, multimodal interaction
- Opportunities for future research: mobile, intelligent systems, characters