Natural User Interfaces for Augmented Reality. Keynote speech given by Mark Billinghurst at the CHINZ 2012 conference in Dunedin, July 2nd, 2012.
Natural Interfaces for Augmented Reality
Mark Billinghurst, HIT Lab NZ
University of Canterbury
Augmented Reality Definition
Defining characteristics [Azuma 97]:
- Combines real and virtual images: both can be seen at the same time
- Interactive in real time: the virtual content can be interacted with
- Registered in 3D: virtual objects appear fixed in space
AR Today
Most widely used AR is mobile or web based.
Mobile AR:
- Outdoor AR (GPS + compass): Layar (10 million+ users), Junaio, etc.
- Indoor AR (image-based tracking): QCAR, String, etc.
Web-based AR (Flash):
- FLARToolKit marker tracking
- Markerless tracking
AR Interaction
You can see spatially registered AR... but how can you interact with it?
AR Interaction Today
Mostly simple interaction.
Mobile:
- Outdoor (Junaio, Layar, Wikitude, etc.): viewing information in place, touching virtual tags
- Indoor (Invizimals, Qualcomm demos): changing viewpoint, screen-based interaction (touch screen)
Web based:
- Changing viewpoint, screen interaction (mouse)
History of AR Interaction
1. AR Information Viewing
Information is registered to real-world context; hand-held AR displays.
Interaction:
- Manipulation of a window into information space
- 2D/3D virtual viewpoint control
Applications: context-aware information displays
Examples: NaviCam, Chameleon, etc. [NaviCam: Rekimoto et al. 1997]
Current AR Information Browsers
Mobile AR: GPS + compass
Many applications: Layar, Wikitude, Acrossair, PressLite, Yelp, AR Car Finder, …
2. 3D AR Interfaces
Virtual objects displayed in 3D physical space and manipulated directly.
- HMDs and 6 DOF head tracking
- 6 DOF hand trackers for input
Interaction:
- Viewpoint control
- Traditional 3D UI interaction: manipulation, selection, etc.
- Requires custom input devices [Kiyokawa et al. 2000]
VLEGO: AR 3D Interaction
3. Augmented Surfaces and Tangible Interfaces
Basic principles:
- Virtual objects are projected on a surface
- Physical objects are used as controls for virtual objects
- Support for collaboration
Augmented Surfaces [Rekimoto et al. 1998]
- Front projection
- Marker-based tracking
- Multiple projection surfaces
Tangible User Interfaces [Ishii 97]
- Create digital shadows for physical objects
- Foreground: graspable UI
- Background: ambient interfaces
Tangible Interface: ARgroove
Collaborative instrument exploring physically based interaction.
- Move and track a physical record
- Map physical actions to MIDI output: translation, rotation, tilt, shake
Limitation:
- AR output shown on a screen; separation between input and output
Lessons from Tangible Interfaces
Benefits:
- Physical objects make us smart (affordances, constraints)
- Objects aid collaboration (shared meaning)
- Objects increase understanding (cognitive artifacts)
Limitations:
- Difficult to change object properties
- Limited display capabilities (projection onto a surface)
- Separation between object and display
4. Tangible AR
AR overcomes the limitations of TUIs:
- Enhances display possibilities
- Merges task and display space
- Provides public and private views
TUI + AR = Tangible AR: apply TUI methods to AR interface design
Example Tangible AR Applications
Use of natural physical object manipulations to control virtual objects.
- LevelHead (Oliver): physical cubes become rooms
- VOMAR (Kato 2000): furniture catalog book
  - Turn over a page to see new models
  - Paddle interaction: push, shake, incline, hit, scoop
VOMAR Interface
Evolution of AR Interaction
1. Information viewing interfaces: simple (conceptually!), unobtrusive
2. 3D AR interfaces: expressive, creative, require attention
3. Tangible interfaces: embedded into conventional environments
4. Tangible AR: combines TUI input + AR display
Limitations
Typical limitations:
- Simple or no interaction (viewpoint control only)
- Require custom devices
- Single-mode interaction
- 2D input for 3D content (screen-based interaction)
- No understanding of the real world
- Explicit rather than implicit interaction
- Unintelligent interfaces (no learning)
Natural Interaction
The Vision of AR
To Make the Vision Real...
Hardware/software requirements:
- Contact lens displays
- Free-space hand/body tracking
- Environment recognition
- Speech/gesture recognition
- Etc.
Natural Interaction
- Automatically detecting the real environment: environmental awareness, physically based interaction
- Gesture input: free-hand interaction
- Multimodal input: speech and gesture interaction, implicit rather than explicit interaction
Environmental Awareness
AR MicroMachines
AR experience with environment awareness and physically based interaction.
- Based on the MS Kinect RGB-D sensor
- Augmented environment supports occlusion and shadows
- Physically based interaction between real and virtual objects
Operating Environment
Architecture
Our framework uses five libraries:
- OpenNI
- OpenCV
- OPIRA
- Bullet Physics
- OpenSceneGraph
System Flow
The system flow consists of three stages (a minimal sketch of the loop follows):
1. Image processing and marker tracking
2. Physics simulation
3. Rendering
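As a minimal sketch of that loop (the stub functions are hypothetical placeholders standing in for the OpenNI, OPIRA, Bullet, and OpenSceneGraph calls; this is not the AR MicroMachines source):

    // Hypothetical skeleton of the three-stage flow. The stubs below
    // are illustrative placeholders, not real library APIs.
    #include <chrono>

    static void captureAndTrack() { /* OpenNI frame grab + OPIRA tracking */ }
    static void stepPhysics(float dt) { /* Bullet: world->stepSimulation(dt) */ }
    static void renderFrame() { /* OSG: draw scene with occlusion/shadows */ }

    int main() {
        auto prev = std::chrono::steady_clock::now();
        for (;;) {
            captureAndTrack();                       // 1. image processing
            auto now = std::chrono::steady_clock::now();
            float dt = std::chrono::duration<float>(now - prev).count();
            prev = now;
            stepPhysics(dt);                         // 2. physics simulation
            renderFrame();                           // 3. rendering
        }
    }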
Physics Simulation
- Create a virtual mesh over the real world (see the sketch below)
- Update it at 10 fps, so real objects can be moved
- Used by the physics engine for collision detection (virtual vs. real)
- Used by OpenSceneGraph for occlusion and shadows
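One plausible way to realize such an updatable collision surface in Bullet (a sketch under assumptions: the depth image is resampled into a fixed 160x120 height grid in metres; this is not the original implementation) is a btHeightfieldTerrainShape, which reads its height buffer by pointer:

    // Sketch: a depth-derived collision surface for Bullet. The height
    // buffer is read by pointer, so refilling `heights` from each new
    // depth frame (resampled to the grid) updates collisions at ~10 fps.
    // Grid size and height range are illustrative assumptions.
    #include <btBulletDynamicsCommon.h>
    #include <BulletCollision/CollisionShapes/btHeightfieldTerrainShape.h>
    #include <vector>

    const int GRID_W = 160, GRID_H = 120;
    std::vector<float> heights(GRID_W * GRID_H, 0.0f);

    void addDepthSurface(btDiscreteDynamicsWorld* world) {
        btHeightfieldTerrainShape* terrain = new btHeightfieldTerrainShape(
            GRID_W, GRID_H, heights.data(),
            1.0f,            // height scale (data already in metres)
            -2.0f, 2.0f,     // min/max height
            1,               // up axis = Y
            PHY_FLOAT, false);
        btRigidBody::btRigidBodyConstructionInfo info(
            0.0f, nullptr, terrain);   // mass 0: static collision surface
        world->addRigidBody(new btRigidBody(info));
    }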
Rendering
[Figures: occlusion and shadows]
Natural Gesture Interaction
HIT Lab NZ AR Gesture Library
Motivation: AR MicroMachines and PhobiAR
- Treated the environment as static (no tracking)
- Tracked objects in 2D
More realistic interaction requires 3D gesture tracking.
Motivation: Occlusion Issues
- AR MicroMachines only achieved realistic occlusion because the user's viewpoint matched the Kinect's
- Proper occlusion requires a more complete model of scene objects
Architecture
HIT Lab NZ's Gesture Library
Capture module:
- Supports PCL, OpenNI, OpenCV, and the Kinect SDK
- Provides access to depth, RGB, and XYZRGB data
- Usage: capturing colour images, depth images, and concatenated point clouds from one or more cameras
- For example: Kinect for Xbox 360, Kinect for Windows, Asus Xtion Pro Live (a capture sketch follows)
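For instance, a capture path through OpenCV's OpenNI backend might look like this (a sketch only, not the library's own API):

    // Sketch: depth + colour capture through OpenCV's OpenNI backend.
    // Constant names are from OpenCV 3+; older versions use CV_CAP_OPENNI.
    #include <opencv2/opencv.hpp>

    int main() {
        cv::VideoCapture cap(cv::CAP_OPENNI2);   // Kinect/Xtion via OpenNI2
        if (!cap.isOpened()) return 1;
        cv::Mat depth, bgr;
        for (;;) {
            if (!cap.grab()) break;
            cap.retrieve(depth, cv::CAP_OPENNI_DEPTH_MAP);  // CV_16UC1, mm
            cap.retrieve(bgr,   cv::CAP_OPENNI_BGR_IMAGE);  // CV_8UC3
            cv::imshow("depth", depth * 8);   // brighten for display
            cv::imshow("rgb", bgr);
            if (cv::waitKey(1) == 27) break;  // Esc to quit
        }
        return 0;
    }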
Segmentation module:
- Segments images and point clouds based on colour, depth, and space
- Usage: segmenting images or point clouds using colour models, depth, or spatial properties such as location, shape, and size
- For example: skin-colour segmentation, depth thresholding (sketched below)
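A minimal sketch of those two example segmenters in OpenCV (the threshold values are illustrative, not the library's tuned parameters):

    // Sketch: skin-colour segmentation in YCrCb plus a depth threshold.
    #include <opencv2/opencv.hpp>

    // Pixels whose Cr/Cb fall in a typical skin range (assumed bounds).
    cv::Mat skinMask(const cv::Mat& bgr) {
        cv::Mat ycrcb, mask;
        cv::cvtColor(bgr, ycrcb, cv::COLOR_BGR2YCrCb);
        cv::inRange(ycrcb, cv::Scalar(0, 133, 77),
                           cv::Scalar(255, 173, 127), mask);
        return mask;
    }

    // Pixels closer than maxMillimetres (e.g. a hand in front of the scene).
    cv::Mat depthMask(const cv::Mat& depth16u, int maxMillimetres = 800) {
        cv::Mat mask = (depth16u > 0) & (depth16u < maxMillimetres);
        return mask;
    }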
Tracking and classification module:
- Identifies and tracks objects between frames based on XYZRGB
- Usage: identifying the current position/orientation of the tracked object in space
- For example: a training set of hand poses, where colours represent unique regions of the hand; raw (uncleaned) classifier output on real hand input (depth image)
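Frame-to-frame identification can be as simple as nearest-centroid association, sketched below (an illustration only, not the library's tracker; the 5 cm gate is an assumed threshold):

    // Sketch: naive frame-to-frame association by nearest 3D centroid.
    // Each segmented blob is reduced to its XYZ centroid; a blob in the
    // new frame inherits the ID of the closest old centroid within 5 cm.
    #include <cmath>
    #include <vector>

    struct Blob { int id; float x, y, z; };

    void associate(const std::vector<Blob>& prev, std::vector<Blob>& curr,
                   int& nextId, float maxDist = 0.05f) {
        for (Blob& c : curr) {
            c.id = -1;
            float best = maxDist;
            for (const Blob& p : prev) {
                float d = std::sqrt((c.x - p.x) * (c.x - p.x) +
                                    (c.y - p.y) * (c.y - p.y) +
                                    (c.z - p.z) * (c.z - p.z));
                if (d < best) { best = d; c.id = p.id; }
            }
            if (c.id < 0) c.id = nextId++;   // unmatched: start a new track
        }
    }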
Modeling module:
- Hand recognition/modeling: skeleton based (low-resolution approximation) or model based (more accurate representation)
- Object modeling: identification and tracking of rigid-body objects
- Physical modeling: physical interaction via sphere proxy, model based, or mesh based
- Usage: general spatial interaction in AR/VR environments
Method: represent models as collections of spheres that move with the models in the Bullet physics engine (see the sketch below).
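A sketch of such a sphere proxy in Bullet (the sphere radius and kinematic setup are illustrative assumptions, not the system's actual parameters):

    // Sketch: a kinematic sphere-proxy body. Tracked 3D points (e.g. on
    // the hand) are wrapped in small spheres on one compound shape; the
    // body is kinematic, so tracking drives it and it pushes dynamic
    // objects without being pushed back.
    #include <btBulletDynamicsCommon.h>
    #include <vector>

    btRigidBody* makeSphereProxy(const std::vector<btVector3>& centres,
                                 float radius = 0.015f) {  // assumed 1.5 cm
        btCompoundShape* shape = new btCompoundShape();
        for (const btVector3& c : centres)
            shape->addChildShape(
                btTransform(btQuaternion::getIdentity(), c),
                new btSphereShape(radius));
        btRigidBody::btRigidBodyConstructionInfo info(
            0.0f, new btDefaultMotionState(), shape);      // mass 0
        btRigidBody* body = new btRigidBody(info);
        body->setCollisionFlags(body->getCollisionFlags() |
                                btCollisionObject::CF_KINEMATIC_OBJECT);
        body->setActivationState(DISABLE_DEACTIVATION);
        return body;  // each frame: set its transform from the tracker
    }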
Method: render the AR scene with OpenSceneGraph, using the depth map for occlusion (sketched below).
Shadows are yet to be implemented.
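A standard way to get this kind of occlusion in OpenSceneGraph (a sketch of the general "phantom geometry" technique, not necessarily this system's code) is to draw the real-world mesh into the depth buffer only, before the virtual content:

    // Sketch: "phantom" real-world geometry for occlusion in OSG.
    // The mesh writes depth but no colour, so the camera video shows
    // through while virtual objects behind it are hidden correctly.
    #include <osg/ColorMask>
    #include <osg/Geode>
    #include <osg/StateSet>

    void makeOcclusionPhantom(osg::Geode* realWorldMesh) {
        osg::StateSet* ss = realWorldMesh->getOrCreateStateSet();
        // Disable all colour writes; depth writes remain enabled.
        ss->setAttributeAndModes(
            new osg::ColorMask(false, false, false, false));
        // Draw before the virtual content so its depth is already there.
        ss->setRenderBinDetails(-1, "RenderBin");
    }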
Results
Gesture recognition module:
- Static: hand pose recognition
- Dynamic: meaningful movement recognition
- Context based: gestures with context, e.g. pointing
- Usage: issuing commands, anticipating user intention, and high-level interaction
Multimodal Interaction
Multimodal Interaction
Combined speech and gesture input; the two are complementary.
- Speech: modal commands, quantities
- Gesture: selection, motion, qualities
Previous work found multimodal interfaces intuitive for 2D/3D graphics interaction. (A simple fusion scheme is sketched below.)
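One simple fusion scheme (an illustration only, not the system's actual fusion module): pair each speech command with the most recent gesture event inside a short time window.

    // Sketch: time-window fusion of speech and gesture events. A speech
    // command binds to the latest gesture (e.g. a pointing target) seen
    // within the last 1.5 seconds; both event types are illustrative.
    #include <cmath>
    #include <optional>
    #include <string>

    struct GestureEvent { double time; float x, y, z; };  // pointing target
    struct SpeechEvent  { double time; std::string verb; };
    struct FusedCommand { std::string verb; float x, y, z; };

    std::optional<FusedCommand> fuse(const SpeechEvent& s,
                                     const std::optional<GestureEvent>& g,
                                     double window = 1.5) {
        if (g && std::abs(s.time - g->time) <= window)
            return FusedCommand{s.verb, g->x, g->y, g->z};
        return std::nullopt;  // no gesture close enough in time
    }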
1. Marker-Based Multimodal Interface
Add speech recognition to VOMAR: paddle + speech commands.
Commands recognized (see the parsing sketch below):
- Create: "Make a blue chair" creates a virtual object and places it on the paddle
- Duplicate: "Copy this" duplicates a virtual object and places it on the paddle
- Grab: "Grab table" selects a virtual object and places it on the paddle
- Place: "Place here" places the attached object in the workspace
- Move: "Move the couch" attaches a virtual object in the workspace to the paddle so that it follows the paddle's movement
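A naive sketch of mapping recognized utterances to this command set (keyword matching on lowercased text; the real system used a proper speech grammar):

    // Sketch: naive keyword mapping from a lowercased utterance to the
    // VOMAR command set. This only illustrates the command categories.
    #include <string>

    enum class Cmd { Create, Duplicate, Grab, Place, Move, Unknown };

    Cmd parse(const std::string& utterance) {   // expects lowercase text
        if (utterance.rfind("make", 0) == 0)  return Cmd::Create;
        if (utterance.rfind("copy", 0) == 0)  return Cmd::Duplicate;
        if (utterance.rfind("grab", 0) == 0)  return Cmd::Grab;
        if (utterance.rfind("place", 0) == 0) return Cmd::Place;
        if (utterance.rfind("move", 0) == 0)  return Cmd::Move;
        return Cmd::Unknown;
    }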
System Architecture
Object Relationships
"Put chair behind the table”Where is behind?
View specific regions
User Evaluation
- Performance time: speech + static paddle significantly faster
- Gesture-only condition less accurate for position/orientation
- Users preferred speech + paddle input (subjective surveys)
2. Free-Hand Multimodal Input
- Use the free hand to interact with AR content
- Recognize simple gestures, no marker tracking: point, move, pick/drop
Multimodal Architecture
Multimodal Fusion
Hand Occlusion
User Evaluation
Task: change object shape, colour, and position.
- Conditions: speech only, gesture only, multimodal
- Measures: performance time, errors, subjective survey
Experimental Setup
Change object shape and colour
Results
Average performance time (multimodal and speech fastest):
- Gesture: 15.44 s
- Speech: 12.38 s
- Multimodal: 11.78 s
No difference in user errors.
User subjective survey:
- Q1 ("How natural was it to manipulate the object?"): MMI and speech rated significantly better
- 70% preferred MMI, 25% speech only, 5% gesture only
Future Directions
Future research:
- Mobile real-world capture
- Mobile gesture input
- Intelligent interfaces
- Virtual characters
Natural Gesture Interaction on Mobile
Use the mobile camera for hand tracking and fingertip detection (a sketch follows).
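Fingertip detection is commonly done with contours and convex hulls; below is a sketch of that standard OpenCV approach (not necessarily the method used in this work):

    // Sketch: fingertip candidates from a binary hand mask via the
    // largest contour's convex hull. Real pipelines refine candidates
    // with convexity-defect depth/angle tests.
    #include <algorithm>
    #include <opencv2/opencv.hpp>
    #include <vector>

    std::vector<cv::Point> fingertips(const cv::Mat& handMask) {
        std::vector<std::vector<cv::Point>> contours;
        cv::findContours(handMask.clone(), contours,
                         cv::RETR_EXTERNAL, cv::CHAIN_APPROX_SIMPLE);
        std::vector<cv::Point> tips;
        if (contours.empty()) return tips;
        auto& hand = *std::max_element(           // largest blob = hand
            contours.begin(), contours.end(),
            [](const std::vector<cv::Point>& a,
               const std::vector<cv::Point>& b) {
                return cv::contourArea(a) < cv::contourArea(b);
            });
        std::vector<int> hullIdx;
        cv::convexHull(hand, hullIdx, false, false);  // hull point indices
        for (int i : hullIdx) tips.push_back(hand[i]);
        return tips;
    }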
Evaluation
- Gesture input more than twice as slow as touch
- No difference in perceived naturalness
Intelligent Interfaces
Most AR systems are stupid:
- Don't recognize user behaviour
- Don't provide feedback
- Don't adapt to the user
Especially important for training:
- Scaffolded learning
- Moving beyond checklists of actions
Intelligent Interfaces
- AR interface + intelligent tutoring system
- ASPIRE constraint-based system (from UC)
- Constraints: relevance condition, satisfaction condition, feedback (a sketch of this representation follows)
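A sketch of what a constraint in this relevance/satisfaction/feedback style could look like (an illustrative representation; ASPIRE's internal format differs):

    // Sketch: a constraint in the relevance/satisfaction/feedback style
    // of constraint-based tutoring. The State contents are left abstract
    // because the domain predicates are invented for illustration.
    #include <functional>
    #include <string>

    struct State { /* current state of the trainee's solution */ };

    struct Constraint {
        std::function<bool(const State&)> relevance;     // does it apply?
        std::function<bool(const State&)> satisfaction;  // is it met?
        std::string feedback;            // shown when relevant but violated
    };

    bool violated(const Constraint& c, const State& s) {
        return c.relevance(s) && !c.satisfaction(s);
    }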
Domain Ontology
Intelligent Feedback
- Actively monitors user behaviour (implicit vs. explicit interaction)
- Provides corrective feedback
Evaluation Results
16 subjects, with and without the ITS:
- Improved task completion
- Improved learning
Intelligent Agents
AR characters:
- Virtual embodiment of the system
- Multimodal input/output
Examples: AR Lego, Welbo, etc.
Mr Virtuoso:
- AR character more real, more fun
- On-screen 3D and AR similar in usefulness
Conclusions
Conclusions
- AR traditionally involves tangible interaction
- New technologies support natural interaction: environment capture, natural gestures, multimodal interaction
- Opportunities for future research: mobile, intelligent systems, characters