The final lecture in the COSC 426 graduate course in Augmented Reality. Taught by Mark Billinghurst from the HIT Lab NZ at the University of Canterbury on Sept. 19th 2012
COSC 426: Augmented Reality
Mark Billinghurst
Sept 19th 2012
Lecture 9: AR Research Directions
Looking to the Future
The Future is with us
It takes at least 20 years for new technologies to go from the lab to the lounge.
“The technologies that will significantly affect our lives over the next 10 years have been around for a decade.
The future is with us. The trick is learning how to spot it. The commercialization of research, in other words, is far more about prospecting than alchemy.”
Bill Buxton
Oct 11th 2004
experiences
applications
tools
components
Sony CSL © 2004
Research Directions
Tracking, Display
Authoring
Interaction
Usability
Research Directions
Components: markerless tracking, hybrid tracking; displays, input devices
Tools: authoring tools, user-generated content
Applications: interaction techniques/metaphors
Experiences: user evaluation, novel AR/MR experiences
HMD Design
Occlusion with See-through HMDs
The problem:
Occluding real objects with virtual objects
Occluding virtual objects with real objects
(Figure: real scene vs. view through a current see-through HMD)
ELMO (Kiyokawa 2001)
Occlusive see-through HMD
Masking LCD
Real-time range finding
ELMO Demo
ELMO Design
Use an LCD mask to block the real world
Depth sensing for occluding virtual images
(Schematic: real world → LCD mask → optical combiner, with virtual images from the display and a depth sensor)
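A minimal sketch (not the original ELMO implementation) of the per-pixel decision such an occlusion-capable HMD can make: opacify the LCD mask only where a virtual fragment lies in front of the sensed real-world depth.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Sketch only: the mask blocks the real world where virtual content is nearer,
// so virtual objects occlude real ones; elsewhere the mask stays transparent
// and nearer real objects naturally occlude the virtual image.
void updateOcclusionMask(const std::vector<float>&   realDepth,     // from the range finder (metres)
                         const std::vector<float>&   virtualDepth,  // virtual scene depth buffer (metres)
                         const std::vector<uint8_t>& virtualDrawn,  // 1 where a virtual pixel was rendered
                         std::vector<uint8_t>&       lcdMask)       // 255 = block real world, 0 = transparent
{
    for (std::size_t i = 0; i < lcdMask.size(); ++i) {
        const bool virtualInFront = virtualDrawn[i] && virtualDepth[i] < realDepth[i];
        lcdMask[i] = virtualInFront ? 255 : 0;
    }
}
```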
ELMO Results
Future Displays
Always on, unobtrusive
Google Glass
Contact Lens Display
Babak Parviz, University of Washington
MEMS components
Transparent elements
Micro-sensors
Challenges: miniaturization, assembly, eye safety
Contact Lens Prototype
Applications
Interaction Techniques
Input techniques: 3D vs. 2D input; pen/buttons/gestures
Natural interaction: speech + gesture input
Intelligent interfaces: artificial agents, context sensing
Flexible Displays
Flexible lens surface
Bimanual interaction
Digital paper analogy
Red Planet, 2000
Tangible User Interfaces (TUIs)
GUMMI bendable display prototype (reproduced by permission of Sony CSL)
Lucid Touch Microsoft Research & Mitsubishi Electric Research Labs Wigdor, D., Forlines, C., Baudisch, P., Barnwell, J., Shen, C.
LucidTouch: A See-Through Mobile Device In Proceedings of UIST 2007, Newport, Rhode Island, October 7-10, 2007, pp. 269–278.
Auditory Modalities
Auditory: auditory icons, earcons, speech synthesis/recognition
Nomadic Radio (Sawhney): combines spatialized audio, auditory cues, and speech synthesis/recognition
Gestural interfaces
1. Micro-gestures (unistroke, smartPad)
2. Device-based gestures (tilt-based examples)
3. Embodied interaction (EyeToy)
Natural Gesture Interaction on Mobile
Use the mobile camera for hand tracking
Fingertip detection
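A minimal sketch (assumed thresholds, not the system shown in the lecture) of camera-only fingertip detection: skin-colour segmentation, the largest contour, and convex-hull points far from the palm centre as fingertip candidates.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>
#include <opencv2/opencv.hpp>

std::vector<cv::Point> detectFingertips(const cv::Mat& frameBGR)
{
    cv::Mat ycrcb, skinMask;
    cv::cvtColor(frameBGR, ycrcb, cv::COLOR_BGR2YCrCb);
    cv::inRange(ycrcb, cv::Scalar(0, 133, 77), cv::Scalar(255, 173, 127), skinMask); // rough skin range
    cv::medianBlur(skinMask, skinMask, 5);

    std::vector<std::vector<cv::Point> > contours;
    cv::findContours(skinMask, contours, cv::RETR_EXTERNAL, cv::CHAIN_APPROX_SIMPLE);
    if (contours.empty()) return std::vector<cv::Point>();

    // Assume the largest skin-coloured blob is the hand.
    std::size_t handIdx = 0;
    for (std::size_t i = 1; i < contours.size(); ++i)
        if (cv::contourArea(contours[i]) > cv::contourArea(contours[handIdx])) handIdx = i;
    const std::vector<cv::Point>& hand = contours[handIdx];

    std::vector<int> hullIdx;
    cv::convexHull(hand, hullIdx);

    // Hull points far from the contour centroid are fingertip candidates (crude heuristic).
    cv::Moments m = cv::moments(hand);
    const cv::Point2f centre(float(m.m10 / m.m00), float(m.m01 / m.m00));
    const double minDist = 0.4 * std::sqrt(cv::contourArea(hand));

    std::vector<cv::Point> tips;
    for (std::size_t k = 0; k < hullIdx.size(); ++k) {
        const cv::Point p = hand[hullIdx[k]];
        const float dx = p.x - centre.x, dy = p.y - centre.y;
        if (std::sqrt(dx * dx + dy * dy) > minDist) tips.push_back(p);
    }
    return tips;
}
```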
Evaluation
Gesture input more than twice as slow as touch input
No difference in naturalness
Haptic Modalities
Haptic interfaces
Simple uses in mobiles? (vibration instead of a ringtone)
Sony’s TouchEngine
- physiological experiments show you can perceive two stimuli 5 ms apart, and displacements as small as 0.2 microns
(TouchEngine actuator diagram: n piezoelectric layers, 4 µm each, ~28 µm total, driven by voltage V)
Haptic Input
AR Haptic Workbench
CSIRO 2003 – Adcock et al.
AR Haptic Interface
Phantom, ARToolKit, Magellan
Natural Interaction
The Vision of AR
To Make the Vision Real...
Hardware/software requirements:
Contact lens displays
Free-space hand/body tracking
Environment recognition
Speech/gesture recognition
Etc.
Natural Interaction
Automatically detecting the real environment: environmental awareness, physically based interaction
Gesture input: free-hand interaction
Multimodal input: speech and gesture interaction; implicit rather than explicit interaction
Environmental Awareness
AR MicroMachines
AR experience with environment awareness and physically-based interaction
Based on the MS Kinect RGB-D sensor
Augmented environment supports occlusion, shadows, and physically-based interaction between real and virtual objects
Operating Environment
Architecture
Our framework uses five libraries:
OpenNI, OpenCV, OPIRA, Bullet Physics, OpenSceneGraph
System Flow
The system flow consists of three sections:
1. Image processing and marker tracking
2. Physics simulation
3. Rendering
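A minimal sketch of that three-stage per-frame loop; the Bullet and OpenSceneGraph calls are real, but the structure and the tracking helper are assumptions rather than the original framework code.

```cpp
#include <btBulletDynamicsCommon.h>
#include <osgViewer/Viewer>

void updateTrackingAndSceneMesh();   // hypothetical helper: grabs the RGB-D frame,
                                     // updates the camera pose and the real-world mesh

void runFrameLoop(osgViewer::Viewer& viewer, btDiscreteDynamicsWorld* physicsWorld)
{
    while (!viewer.done()) {
        // 1. Image processing and marker tracking.
        updateTrackingAndSceneMesh();

        // 2. Physics simulation: virtual bodies collide with the real-world mesh.
        //    (A real implementation would pass the measured frame time.)
        physicsWorld->stepSimulation(1.0f / 60.0f, 10);

        // 3. Rendering: draw the scene graph, including occlusion and shadows.
        viewer.frame();
    }
}
```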
Physics Simulation
Create a virtual mesh over the real world
Updated at 10 fps – real objects can be moved
Used by the physics engine for collision detection (virtual/real)
Used by OpenSceneGraph for occlusion and shadows
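A minimal sketch (an assumption, not the original code) of how such a depth-derived mesh can be handed to Bullet as a static collision object, so virtual rigid bodies rest on and collide with real surfaces; the mesh would be rebuilt periodically so moved real objects are picked up.

```cpp
#include <cstddef>
#include <vector>
#include <btBulletDynamicsCommon.h>

btRigidBody* makeRealWorldCollider(const std::vector<btVector3>& vertices,
                                   const std::vector<int>& triangleIndices)
{
    btTriangleMesh* mesh = new btTriangleMesh();
    for (std::size_t i = 0; i + 2 < triangleIndices.size(); i += 3) {
        mesh->addTriangle(vertices[triangleIndices[i]],
                          vertices[triangleIndices[i + 1]],
                          vertices[triangleIndices[i + 2]]);
    }
    btBvhTriangleMeshShape* shape = new btBvhTriangleMeshShape(mesh, true);

    // Mass 0 => static: the real world pushes virtual objects, never the reverse.
    btDefaultMotionState* motion = new btDefaultMotionState(btTransform::getIdentity());
    btRigidBody::btRigidBodyConstructionInfo info(0.0f, motion, shape);
    return new btRigidBody(info);
}
```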
Rendering
Occlusion and shadows
Natural Gesture Interaction
Motivation
AR MicroMachines and PhobiAR:
• Treated the environment as static – no tracking
• Tracked objects in 2D
More realistic interaction requires 3D gesture tracking
Motivation: Occlusion Issues
AR MicroMachines only achieved realistic occlusion because the user’s viewpoint matched the Kinect’s
Proper occlusion requires a more complete model of scene objects
HITLabNZ’s Gesture Library – Architecture
5. Gesture: static gestures, dynamic gestures, context-based gestures
4. Modeling: hand recognition/modeling, rigid-body modeling
3. Classification/Tracking
2. Segmentation
1. Hardware Interface
HITLabNZ’s Gesture Library – 1. Hardware Interface
o Supports PCL, OpenNI, OpenCV, and the Kinect SDK.
o Provides access to depth, RGB, and XYZRGB data.
o Usage: capturing colour images, depth images, and concatenated point clouds from a single or multiple cameras.
o For example:
Kinect for Xbox 360
Kinect for Windows
Asus Xtion Pro Live
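A minimal sketch of this layer using PCL's OpenNI grabber (one of the back-ends listed above): stream concatenated XYZRGB point clouds from a Kinect/Xtion and hand each frame to the rest of the pipeline.

```cpp
#include <boost/date_time/posix_time/posix_time.hpp>
#include <boost/function.hpp>
#include <boost/thread/thread.hpp>
#include <pcl/io/openni_grabber.h>
#include <pcl/point_cloud.h>
#include <pcl/point_types.h>

void onCloud(const pcl::PointCloud<pcl::PointXYZRGBA>::ConstPtr& cloud)
{
    // Hand the colour + depth point cloud to the segmentation layer (omitted here).
}

int main()
{
    pcl::OpenNIGrabber grabber;   // opens the first connected RGB-D camera
    boost::function<void (const pcl::PointCloud<pcl::PointXYZRGBA>::ConstPtr&)> cb = &onCloud;
    grabber.registerCallback(cb);

    grabber.start();
    // Frames now arrive asynchronously on the grabber's thread via onCloud().
    boost::this_thread::sleep(boost::posix_time::seconds(30));
    grabber.stop();
    return 0;
}
```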
HITLabNZ’s Gesture Library – 2. Segmentation
o Segment images and point clouds based on color, depth, and space.
o Usage: segmenting images or point clouds using color models, depth, or spatial properties such as location, shape, and size.
o For example:
Skin color segmentation
Depth threshold
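A minimal sketch (assumed thresholds, not the library's actual values) of those two segmentation cues: a skin-colour mask from the RGB image and a depth threshold keeping only points inside the interaction volume.

```cpp
#include <opencv2/opencv.hpp>

// Assumes the depth image is registered to the colour image,
// CV_16U in millimetres (Kinect convention), 0 = no reading.
cv::Mat segmentHand(const cv::Mat& colorBGR, const cv::Mat& depthMM)
{
    cv::Mat ycrcb, skinMask, depthMask, handMask;
    cv::cvtColor(colorBGR, ycrcb, cv::COLOR_BGR2YCrCb);
    cv::inRange(ycrcb, cv::Scalar(0, 133, 77), cv::Scalar(255, 173, 127), skinMask);

    // Keep only pixels nearer than ~80 cm (assumed interaction volume).
    cv::inRange(depthMM, cv::Scalar(1), cv::Scalar(800), depthMask);

    cv::bitwise_and(skinMask, depthMask, handMask);
    cv::morphologyEx(handMask, handMask, cv::MORPH_OPEN,
                     cv::getStructuringElement(cv::MORPH_ELLIPSE, cv::Size(5, 5)));
    return handMask;   // 255 = hand/foreground candidate
}
```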
HITLabNZ’s Gesture Library – 3. Classification/Tracking
o Identify and track objects between frames based on XYZRGB.
o Usage: Identifying current position/orientation of the tracked object in space.
o For example:
Training set of hand poses, colors represent unique regions of the hand.
Raw output (without cleaning) classified on real hand input (depth image).
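A minimal sketch (an assumption, not the library's actual tracker) of the frame-to-frame tracking half of this layer: associate segmented XYZRGB clusters with existing tracks by nearest centroid, keeping a stable id and the current 3D position for each object.

```cpp
#include <cstddef>
#include <limits>
#include <vector>
#include <pcl/common/centroid.h>
#include <pcl/point_cloud.h>
#include <pcl/point_types.h>

struct Track { int id; Eigen::Vector3f position; };

void updateTracks(std::vector<Track>& tracks,
                  const std::vector<pcl::PointCloud<pcl::PointXYZRGB>::Ptr>& clusters,
                  int& nextId,
                  float maxJump = 0.10f)   // metres an object may move between frames (assumed)
{
    for (std::size_t i = 0; i < clusters.size(); ++i) {
        Eigen::Vector4f c4;
        pcl::compute3DCentroid(*clusters[i], c4);
        const Eigen::Vector3f c = c4.head<3>();

        Track* best = 0;
        float bestDist = std::numeric_limits<float>::max();
        for (std::size_t t = 0; t < tracks.size(); ++t) {
            const float d = (tracks[t].position - c).norm();
            if (d < bestDist) { bestDist = d; best = &tracks[t]; }
        }
        if (best && bestDist < maxJump)
            best->position = c;                    // same object, updated position
        else
            tracks.push_back(Track{nextId++, c});  // a new object entered the scene
    }
}
```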
HITLabNZ’s Gesture Library – 4. Modeling
o Hand recognition/modeling: skeleton based (for a low-resolution approximation), model based (for a more accurate representation)
o Object modeling (identification and tracking of rigid-body objects)
o Physical modeling (physical interaction): sphere proxy, model based, mesh based
o Usage: general spatial interaction in AR/VR environments
Method
Represent models as collections of spheres moving with the models in the Bullet physics engine
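A minimal sketch of that sphere-proxy idea (details assumed): gather the spheres approximating a tracked hand or object into a compound shape and drive it as a kinematic Bullet body, so it can push virtual rigid bodies around.

```cpp
#include <cstddef>
#include <vector>
#include <btBulletDynamicsCommon.h>

btRigidBody* makeSphereProxy(const std::vector<btVector3>& sphereCentres, btScalar radius)
{
    btCompoundShape* compound = new btCompoundShape();
    for (std::size_t i = 0; i < sphereCentres.size(); ++i)
        compound->addChildShape(btTransform(btQuaternion::getIdentity(), sphereCentres[i]),
                                new btSphereShape(radius));

    btDefaultMotionState* motion = new btDefaultMotionState(btTransform::getIdentity());
    btRigidBody::btRigidBodyConstructionInfo info(0.0f, motion, compound);   // mass 0: kinematic
    btRigidBody* body = new btRigidBody(info);
    body->setCollisionFlags(body->getCollisionFlags() | btCollisionObject::CF_KINEMATIC_OBJECT);
    body->setActivationState(DISABLE_DEACTIVATION);
    return body;
}

// Each frame, teleport the proxy to the newest tracked pose:
//   proxy->getMotionState()->setWorldTransform(trackedPose);
```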
Method
Render the AR scene with OpenSceneGraph, using the depth map for occlusion
Shadows yet to be implemented
Results
HITLabNZ’s Gesture Library – 5. Gesture
o Static (hand pose recognition)
o Dynamic (meaningful movement recognition)
o Context-based gesture recognition (gestures with context, e.g. pointing)
o Usage: issuing commands, anticipating user intention, and high-level interaction.
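A minimal sketch (all thresholds assumed) of those three gesture levels: a static pose from the extended-finger count, a dynamic swipe from palm motion over recent frames, and a context-based "point" gesture.

```cpp
#include <deque>
#include <opencv2/core.hpp>

enum Gesture { NONE, FIST, POINT, OPEN_HAND, SWIPE_RIGHT };

Gesture recognise(int fingertipCount, const cv::Point3f& palmNow,
                  std::deque<cv::Point3f>& palmHistory)
{
    palmHistory.push_back(palmNow);
    if (palmHistory.size() > 15) palmHistory.pop_front();   // ~0.5 s of history at 30 fps

    // Dynamic gesture: large rightward palm displacement over the window.
    if (palmHistory.size() == 15 &&
        palmHistory.back().x - palmHistory.front().x > 0.20f)   // metres (assumed)
        return SWIPE_RIGHT;

    // Static / context-based gestures from the hand pose.
    if (fingertipCount == 0) return FIST;
    if (fingertipCount == 1) return POINT;      // context: cast a ray for selection
    if (fingertipCount >= 4) return OPEN_HAND;
    return NONE;
}
```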
Multimodal Interaction
Combined speech and gesture input
Gesture and speech are complementary:
Speech – modal commands, quantities
Gesture – selection, motion, qualities
Previous work found multimodal interfaces intuitive for 2D/3D graphics interaction
1. Marker Based Multimodal Interface
Add speech recognition to VOMAR
Paddle + speech commands
Commands Recognized
Create command – "Make a blue chair": create a virtual object and place it on the paddle.
Duplicate command – "Copy this": duplicate a virtual object and place it on the paddle.
Grab command – "Grab table": select a virtual object and place it on the paddle.
Place command – "Place here": place the attached object in the workspace.
Move command – "Move the couch": attach a virtual object in the workspace to the paddle so that it follows the paddle movement.
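A minimal sketch (an assumption, not the original VOMAR code) of mapping those recognised speech strings onto commands; the paddle pose then supplies the spatial part ("this", "here") of each multimodal command.

```cpp
#include <algorithm>
#include <cctype>
#include <sstream>
#include <string>

enum Command { NONE, CREATE, DUPLICATE, GRAB, PLACE, MOVE };

Command parseSpeech(const std::string& utterance)
{
    std::istringstream in(utterance);
    std::string first;
    in >> first;                                        // keyword-spot the first word
    std::transform(first.begin(), first.end(), first.begin(), ::tolower);

    if (first == "make")  return CREATE;     // "Make a blue chair"
    if (first == "copy")  return DUPLICATE;  // "Copy this"
    if (first == "grab")  return GRAB;       // "Grab table"
    if (first == "place") return PLACE;      // "Place here"
    if (first == "move")  return MOVE;       // "Move the couch"
    return NONE;
}
```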
System Architecture
Object Relationships
"Put chair behind the table” Where is behind?
View specific regions
User Evaluation
Performance time: speech + static paddle significantly faster
Gesture-only condition less accurate for position/orientation
Users preferred speech + paddle input
Subjective Surveys
2. Free Hand Multimodal Input
Use free hands to interact with AR content
Recognize simple gestures
No marker tracking
Point, Move, Pick/Drop
Multimodal Architecture
Multimodal Fusion
Hand Occlusion
User Evaluation
Change object shape, colour, and position
Conditions: speech only, gesture only, multimodal
Measures: performance time, errors, subjective survey
Experimental Setup
Change object shape and colour
Results
Average performance time (multimodal and speech fastest):
Gesture: 15.44 s, Speech: 12.38 s, Multimodal: 11.78 s
No difference in user errors
User subjective survey:
Q1: How natural was it to manipulate the object? - MMI, speech significantly better
70% preferred MMI, 25% speech only, 5% gesture only
Intelligent Interfaces
Most AR systems are "stupid":
They don’t recognize user behaviour
They don’t provide feedback
They don’t adapt to the user
Especially important for training: scaffolded learning, moving beyond checklists of actions
Intelligent Interfaces
AR interface + intelligent tutoring system
ASPIRE constraint-based system (from UC)
Constraints: relevance condition, satisfaction condition, feedback
Domain Ontology
Intelligent Feedback
Actively monitors user behaviour
Implicit vs. explicit interaction
Provides corrective feedback
Evaluation Results
16 subjects, with and without the ITS
Improved task completion
Improved learning
Intelligent Agents
AR characters: virtual embodiment of the system, multimodal input/output
Examples: AR Lego, Welbo, etc.
Mr Virtuoso:
- AR character more real, more fun
- On-screen 3D and AR similar in usefulness
Context Sensing
Context Sensing (TKK project)
Using context to manage information
Context from: speech, gaze, the real world
AR Display
Gaze Interaction
AR View
More Information Over Time
Experiences
Novel Experiences
Crossing boundaries: ubiquitous VR/AR
Collaborative experiences: massive AR, AR + social networking
Usability
Crossing Boundaries
Jun Rekimoto, Sony CSL
Invisible Interfaces
Jun Rekimoto, Sony CSL
Milgram’s Reality-Virtuality continuum
Reality–Virtuality (RV) Continuum:
Real Environment → Augmented Reality (AR) → Augmented Virtuality (AV) → Virtual Environment
Mixed Reality spans the middle of the continuum
The MagicBook
Moves across the continuum: Reality → Augmented Reality (AR) → Augmented Virtuality (AV) → Virtuality
Invisible Interfaces
Jun Rekimoto, Sony CSL
Example: Visualizing Sensor Networks
Rauhala et al. 2007 (Linköping)
Network of humidity sensors
ZigBee wireless communication
Use Mobile AR to Visualize Humidity
Invisible Interfaces
Jun Rekimoto, Sony CSL
UbiVR – CAMAR
CAMAR Companion
CAMAR Viewer
CAMAR Controller
GIST - Korea
ubiHome @ GIST
©ubiHome
(Diagram: ubiHome smart-home testbed – sensors such as ubiKey, ubiTrack, a couch sensor, a door sensor, Tag-it, and a PDA provide who/what/when/where/how context to services such as media services, a light service, and an MR window)
CAMAR - GIST
(CAMAR: Context-Aware Mobile Augmented Reality)
UCAM: Architecture
(Architecture diagram: operating system; BAN/PAN (BT) and TCP/IP (discovery, control, event) networking; network and context interfaces; Sensor Service (Integrator, Manager, Interpreter, ServiceProvider); content; with wear-UCAM, vr-UCAM, and ubi-UCAM variants)
Hybrid User Interfaces
Goal: to incorporate AR into a normal meeting environment
Physical components: real props
Display elements: 2D and 3D (AR) displays
Interaction metaphor: use multiple tools, each relevant to the task
Hybrid User Interfaces
1. Personal – private display
2. Tabletop – private display + group display
3. Whiteboard – private display + public display
4. Multigroup – private display + group display + public display
(Diagram: Milgram’s reality–virtuality axis (Reality → Virtual Reality) crossed with Weiser’s terminal–ubiquitous axis, locating Desktop, AR, VR, UbiComp, Mobile AR, Ubi AR, and Ubi VR)
From: Joe Newman
(Diagram: the same Milgram × Weiser space extended with a third axis from single user to massive multi-user)
Remote Collaboration
AR Client
HMD and HHD showing virtual images over the real world
Images drawn by a remote expert
Local interaction
Shared Visual Context (Fussell, 1999)
Remote video collaboration: shared manual, video viewing
Compared video, audio, and side-by-side collaboration
Communication analysis
WACL (Kurata, 2004)
Wearable camera/laser pointer
Independent pointer control
Remote panorama view
WACL (Kurata, 2004)
Remote expert view: panorama viewing, annotation, image capture
As If Being There (Poelman, 2012)
AR + scene capture
HMD viewing, remote expert
Gesture input
Scene capture (PTAM), stereo camera
As If Being There (Poelman, 2012)
Gesture interaction: hand postures recognized, menu superimposed on the hands
Real World Capture
Using Kinect for 3D scene capture
Camera tracking
AR overlay
Remote situational awareness
Remote scene capture with AR annotations added
Massive Multiuser
Handheld AR for the first time allows extremely high numbers of AR users
Requires:
New types of applications/games
New infrastructure (server/client/peer-to-peer)
Content distribution…
Future Directions
Massive Multiuser
2D applications: MSN – 29 million, Skype – 10 million, Facebook – 100m+
3D/VR applications: Second Life – >50K, stereo projection – <500
Augmented reality: Shared Space (1999) – 4, Invisible Train (2004) – 8
BASIC VIEW
PERSONAL VIEW
Augmented Reality 2.0 Infrastructure
Leveraging Web 2.0:
Content retrieval using HTTP
XML encoded meta information
KML placemarks + extensions
Queries:
Based on location (from GPS, image recognition)
Based on situation (barcode markers)
Queries also deliver tracking feature databases
Everybody can set up an AR 2.0 server
Syndication:
Community servers for end-user content
Tagging
AR client subscribes to an arbitrary number of feeds
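A minimal sketch of that content-retrieval idea: ask a hypothetical community server over HTTP for KML/ARML placemarks near the device's GPS position; the endpoint name and parameters are illustrative only, the libcurl calls are real.

```cpp
#include <sstream>
#include <string>
#include <curl/curl.h>

static size_t appendToString(char* data, size_t size, size_t nmemb, void* userdata)
{
    static_cast<std::string*>(userdata)->append(data, size * nmemb);
    return size * nmemb;
}

std::string fetchNearbyPlacemarks(double latitude, double longitude)
{
    std::ostringstream url;
    url << "http://example-ar2-server.org/placemarks?lat=" << latitude
        << "&lon=" << longitude << "&radius=500";        // hypothetical endpoint

    std::string body;
    CURL* curl = curl_easy_init();
    curl_easy_setopt(curl, CURLOPT_URL, url.str().c_str());
    curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, appendToString);
    curl_easy_setopt(curl, CURLOPT_WRITEDATA, &body);
    curl_easy_perform(curl);
    curl_easy_cleanup(curl);
    return body;                                         // XML (KML + extensions) to be parsed
}
```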
Content
Content creation and delivery:
Content creation pipeline
Delivering previously unknown content
Streaming of data (objects, multimedia)
Applications
Distribution:
How do users learn about all that content?
How do they access it?
ARML (AR Markup Language)
Scaling Up
AR on a City Scale
Using mobile phones as ubiquitous sensors
MIT Senseable City Lab
http://senseable.mit.edu/
WikiCity Rome (Senseable City Lab MIT)
Conclusions
AR Research in the HIT Lab NZ
Gesture interaction: gesture library
Multimodal interaction: collaborative speech/gesture interfaces
Mobile AR interfaces: outdoor AR, interaction methods, navigation tools
AR authoring tools: visual programming for AR
Remote collaboration: mobile AR for remote interaction
More Information
• Mark Billinghurst
• Websites
– http://www.hitlabnz.org/
– http://artoolkit.sourceforge.net/
– http://www.osgart.org/
– http://www.hitlabnz.org/wiki/buildAR/