The final lecture in the COSC 426 graduate course in Augmented Reality. Taught by Mark Billinghurst from the HIT Lab NZ at the University of Canterbury on Sept. 19th 2012
COSC 426: Augmented Reality
Mark Billinghurst
Sept 19th 2012
Lecture 9: AR Research Directions
Looking to the Future
The Future is with us
It takes at least 20 years for new technologies to go from the lab to the lounge.
“The technologies that will significantly affect our lives over the next 10 years have been around for a decade.
The future is with us. The trick is learning how to spot it. The commercialization of research, in other words, is far more about prospecting than alchemy.”
Bill Buxton
Oct 11th 2004
experiences
applications
tools
components
Sony CSL © 2004
Research Directions
Tracking, Display
Authoring
Interaction
Usability
Research Directions
Components: markerless tracking, hybrid tracking; displays, input devices
Tools: authoring tools, user-generated content
Applications: interaction techniques/metaphors
Experiences: user evaluation, novel AR/MR experiences
HMD Design
Occlusion with See-through HMDs
The problem:
Occluding real objects with virtual objects
Occluding virtual objects with real objects
(Figure: real scene vs. view through a current see-through HMD)
ELMO (Kiyokawa 2001)
Occlusive see-through HMD
Masking LCD
Real-time range finding
ELMO Demo
ELMO Design
Use an LCD mask to block the real world
Depth sensing for occluding virtual images
(Schematic: real world → LCD mask → optical combiner, with virtual images from the display and a depth sensor)
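A minimal sketch (not the original ELMO implementation) of the per-pixel decision such an occlusion-capable HMD can make: opacify the LCD mask only where a virtual fragment lies in front of the sensed real-world depth.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Sketch only: the mask blocks the real world where virtual content is nearer,
// so virtual objects occlude real ones; elsewhere the mask stays transparent
// and nearer real objects naturally occlude the virtual image.
void updateOcclusionMask(const std::vector<float>&   realDepth,     // from the range finder (metres)
                         const std::vector<float>&   virtualDepth,  // virtual scene depth buffer (metres)
                         const std::vector<uint8_t>& virtualDrawn,  // 1 where a virtual pixel was rendered
                         std::vector<uint8_t>&       lcdMask)       // 255 = block real world, 0 = transparent
{
    for (std::size_t i = 0; i < lcdMask.size(); ++i) {
        const bool virtualInFront = virtualDrawn[i] && virtualDepth[i] < realDepth[i];
        lcdMask[i] = virtualInFront ? 255 : 0;
    }
}
```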
ELMO Results
Future Displays
Always on, unobtrusive
Google Glass
Contact Lens Display
Babak Parviz, University of Washington
MEMS components
Transparent elements
Micro-sensors
Challenges: miniaturization, assembly, eye safety
Contact Lens Prototype
Applications
Interaction Techniques
Input techniques: 3D vs. 2D input; pen/buttons/gestures
Natural interaction: speech + gesture input
Intelligent interfaces: artificial agents, context sensing
Flexible Displays
Flexible lens surface
Bimanual interaction
Digital paper analogy
Red Planet, 2000
Tangible User Interfaces (TUIs)
GUMMI bendable display prototype (reproduced by permission of Sony CSL)
Lucid Touch Microsoft Research & Mitsubishi Electric Research Labs Wigdor, D., Forlines, C., Baudisch, P., Barnwell, J., Shen, C.
LucidTouch: A See-Through Mobile Device In Proceedings of UIST 2007, Newport, Rhode Island, October 7-10, 2007, pp. 269–278.
Auditory Modalities
Auditory: auditory icons, earcons, speech synthesis/recognition
Nomadic Radio (Sawhney): combines spatialized audio, auditory cues, and speech synthesis/recognition
Gestural interfaces
1. Micro-gestures (unistroke, smartPad)
2. Device-based gestures (tilt-based examples)
3. Embodied interaction (EyeToy)
Natural Gesture Interaction on Mobile
Use the mobile camera for hand tracking
Fingertip detection
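A minimal sketch (assumed thresholds, not the system shown in the lecture) of camera-only fingertip detection: skin-colour segmentation, the largest contour, and convex-hull points far from the palm centre as fingertip candidates.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>
#include <opencv2/opencv.hpp>

std::vector<cv::Point> detectFingertips(const cv::Mat& frameBGR)
{
    cv::Mat ycrcb, skinMask;
    cv::cvtColor(frameBGR, ycrcb, cv::COLOR_BGR2YCrCb);
    cv::inRange(ycrcb, cv::Scalar(0, 133, 77), cv::Scalar(255, 173, 127), skinMask); // rough skin range
    cv::medianBlur(skinMask, skinMask, 5);

    std::vector<std::vector<cv::Point> > contours;
    cv::findContours(skinMask, contours, cv::RETR_EXTERNAL, cv::CHAIN_APPROX_SIMPLE);
    if (contours.empty()) return std::vector<cv::Point>();

    // Assume the largest skin-coloured blob is the hand.
    std::size_t handIdx = 0;
    for (std::size_t i = 1; i < contours.size(); ++i)
        if (cv::contourArea(contours[i]) > cv::contourArea(contours[handIdx])) handIdx = i;
    const std::vector<cv::Point>& hand = contours[handIdx];

    std::vector<int> hullIdx;
    cv::convexHull(hand, hullIdx);

    // Hull points far from the contour centroid are fingertip candidates (crude heuristic).
    cv::Moments m = cv::moments(hand);
    const cv::Point2f centre(float(m.m10 / m.m00), float(m.m01 / m.m00));
    const double minDist = 0.4 * std::sqrt(cv::contourArea(hand));

    std::vector<cv::Point> tips;
    for (std::size_t k = 0; k < hullIdx.size(); ++k) {
        const cv::Point p = hand[hullIdx[k]];
        const float dx = p.x - centre.x, dy = p.y - centre.y;
        if (std::sqrt(dx * dx + dy * dy) > minDist) tips.push_back(p);
    }
    return tips;
}
```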
Evaluation
Gesture input more than twice as slow as touch input
No difference in naturalness
Haptic Modalities
Haptic interfaces
Simple uses in mobiles? (vibration instead of a ringtone)
Sony’s TouchEngine
- physiological experiments show you can perceive two stimuli 5 ms apart, and displacements as small as 0.2 microns
(TouchEngine actuator diagram: n piezoelectric layers, 4 µm each, ~28 µm total, driven by voltage V)
Haptic Input
AR Haptic Workbench
CSIRO 2003 – Adcock et al.
AR Haptic Interface
Phantom, ARToolKit, Magellan
Natural Interaction
The Vision of AR
To Make the Vision Real...
Hardware/software requirements:
Contact lens displays
Free-space hand/body tracking
Environment recognition
Speech/gesture recognition
Etc.
Natural Interaction
Automatically detecting the real environment: environmental awareness, physically based interaction
Gesture input: free-hand interaction
Multimodal input: speech and gesture interaction; implicit rather than explicit interaction
Environmental Awareness
AR MicroMachines
AR experience with environment awareness and physically-based interaction
Based on the MS Kinect RGB-D sensor
Augmented environment supports occlusion, shadows, and physically-based interaction between real and virtual objects
Operating Environment
Architecture
Our framework uses five libraries:
OpenNI, OpenCV, OPIRA, Bullet Physics, OpenSceneGraph
System Flow
The system flow consists of three sections:
1. Image processing and marker tracking
2. Physics simulation
3. Rendering
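A minimal sketch of that three-stage per-frame loop; the Bullet and OpenSceneGraph calls are real, but the structure and the tracking helper are assumptions rather than the original framework code.

```cpp
#include <btBulletDynamicsCommon.h>
#include <osgViewer/Viewer>

void updateTrackingAndSceneMesh();   // hypothetical helper: grabs the RGB-D frame,
                                     // updates the camera pose and the real-world mesh

void runFrameLoop(osgViewer::Viewer& viewer, btDiscreteDynamicsWorld* physicsWorld)
{
    while (!viewer.done()) {
        // 1. Image processing and marker tracking.
        updateTrackingAndSceneMesh();

        // 2. Physics simulation: virtual bodies collide with the real-world mesh.
        //    (A real implementation would pass the measured frame time.)
        physicsWorld->stepSimulation(1.0f / 60.0f, 10);

        // 3. Rendering: draw the scene graph, including occlusion and shadows.
        viewer.frame();
    }
}
```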
Physics Simulation
Create a virtual mesh over the real world
Updated at 10 fps – real objects can be moved
Used by the physics engine for collision detection (virtual/real)
Used by OpenSceneGraph for occlusion and shadows
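A minimal sketch (an assumption, not the original code) of how such a depth-derived mesh can be handed to Bullet as a static collision object, so virtual rigid bodies rest on and collide with real surfaces; the mesh would be rebuilt periodically so moved real objects are picked up.

```cpp
#include <cstddef>
#include <vector>
#include <btBulletDynamicsCommon.h>

btRigidBody* makeRealWorldCollider(const std::vector<btVector3>& vertices,
                                   const std::vector<int>& triangleIndices)
{
    btTriangleMesh* mesh = new btTriangleMesh();
    for (std::size_t i = 0; i + 2 < triangleIndices.size(); i += 3) {
        mesh->addTriangle(vertices[triangleIndices[i]],
                          vertices[triangleIndices[i + 1]],
                          vertices[triangleIndices[i + 2]]);
    }
    btBvhTriangleMeshShape* shape = new btBvhTriangleMeshShape(mesh, true);

    // Mass 0 => static: the real world pushes virtual objects, never the reverse.
    btDefaultMotionState* motion = new btDefaultMotionState(btTransform::getIdentity());
    btRigidBody::btRigidBodyConstructionInfo info(0.0f, motion, shape);
    return new btRigidBody(info);
}
```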
Rendering
Occlusion and shadows
Natural Gesture Interaction
Motivation
AR MicroMachines and PhobiAR:
• Treated the environment as static – no tracking
• Tracked objects in 2D
More realistic interaction requires 3D gesture tracking
Motivation: Occlusion Issues
AR MicroMachines only achieved realistic occlusion because the user’s viewpoint matched the Kinect’s
Proper occlusion requires a more complete model of scene objects
HITLabNZ’s Gesture Library – Architecture
5. Gesture: static gestures, dynamic gestures, context-based gestures
4. Modeling: hand recognition/modeling, rigid-body modeling
3. Classification/Tracking
2. Segmentation
1. Hardware Interface
HITLabNZ’s Gesture Library – 1. Hardware Interface
o Supports PCL, OpenNI, OpenCV, and the Kinect SDK.
o Provides access to depth, RGB, and XYZRGB data.
o Usage: capturing colour images, depth images, and concatenated point clouds from a single or multiple cameras.
o For example:
Kinect for Xbox 360
Kinect for Windows
Asus Xtion Pro Live
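A minimal sketch of this layer using PCL's OpenNI grabber (one of the back-ends listed above): stream concatenated XYZRGB point clouds from a Kinect/Xtion and hand each frame to the rest of the pipeline.

```cpp
#include <boost/date_time/posix_time/posix_time.hpp>
#include <boost/function.hpp>
#include <boost/thread/thread.hpp>
#include <pcl/io/openni_grabber.h>
#include <pcl/point_cloud.h>
#include <pcl/point_types.h>

void onCloud(const pcl::PointCloud<pcl::PointXYZRGBA>::ConstPtr& cloud)
{
    // Hand the colour + depth point cloud to the segmentation layer (omitted here).
}

int main()
{
    pcl::OpenNIGrabber grabber;   // opens the first connected RGB-D camera
    boost::function<void (const pcl::PointCloud<pcl::PointXYZRGBA>::ConstPtr&)> cb = &onCloud;
    grabber.registerCallback(cb);

    grabber.start();
    // Frames now arrive asynchronously on the grabber's thread via onCloud().
    boost::this_thread::sleep(boost::posix_time::seconds(30));
    grabber.stop();
    return 0;
}
```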
HITLabNZ’s Gesture Library – 2. Segmentation
o Segment images and point clouds based on color, depth, and space.
o Usage: segmenting images or point clouds using color models, depth, or spatial properties such as location, shape, and size.
o For example:
Skin color segmentation
Depth threshold
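A minimal sketch (assumed thresholds, not the library's actual values) of those two segmentation cues: a skin-colour mask from the RGB image and a depth threshold keeping only points inside the interaction volume.

```cpp
#include <opencv2/opencv.hpp>

// Assumes the depth image is registered to the colour image,
// CV_16U in millimetres (Kinect convention), 0 = no reading.
cv::Mat segmentHand(const cv::Mat& colorBGR, const cv::Mat& depthMM)
{
    cv::Mat ycrcb, skinMask, depthMask, handMask;
    cv::cvtColor(colorBGR, ycrcb, cv::COLOR_BGR2YCrCb);
    cv::inRange(ycrcb, cv::Scalar(0, 133, 77), cv::Scalar(255, 173, 127), skinMask);

    // Keep only pixels nearer than ~80 cm (assumed interaction volume).
    cv::inRange(depthMM, cv::Scalar(1), cv::Scalar(800), depthMask);

    cv::bitwise_and(skinMask, depthMask, handMask);
    cv::morphologyEx(handMask, handMask, cv::MORPH_OPEN,
                     cv::getStructuringElement(cv::MORPH_ELLIPSE, cv::Size(5, 5)));
    return handMask;   // 255 = hand/foreground candidate
}
```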
HITLabNZ’s Gesture Library – 3. Classification/Tracking
o Identify and track objects between frames based on XYZRGB.
o Usage: Identifying current position/orientation of the tracked object in space.
o For example:
Training set of hand poses, colors represent unique regions of the hand.
Raw output (without cleaning) classified on real hand input (depth image).
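A minimal sketch (an assumption, not the library's actual tracker) of the frame-to-frame tracking half of this layer: associate segmented XYZRGB clusters with existing tracks by nearest centroid, keeping a stable id and the current 3D position for each object.

```cpp
#include <cstddef>
#include <limits>
#include <vector>
#include <pcl/common/centroid.h>
#include <pcl/point_cloud.h>
#include <pcl/point_types.h>

struct Track { int id; Eigen::Vector3f position; };

void updateTracks(std::vector<Track>& tracks,
                  const std::vector<pcl::PointCloud<pcl::PointXYZRGB>::Ptr>& clusters,
                  int& nextId,
                  float maxJump = 0.10f)   // metres an object may move between frames (assumed)
{
    for (std::size_t i = 0; i < clusters.size(); ++i) {
        Eigen::Vector4f c4;
        pcl::compute3DCentroid(*clusters[i], c4);
        const Eigen::Vector3f c = c4.head<3>();

        Track* best = 0;
        float bestDist = std::numeric_limits<float>::max();
        for (std::size_t t = 0; t < tracks.size(); ++t) {
            const float d = (tracks[t].position - c).norm();
            if (d < bestDist) { bestDist = d; best = &tracks[t]; }
        }
        if (best && bestDist < maxJump)
            best->position = c;                    // same object, updated position
        else
            tracks.push_back(Track{nextId++, c});  // a new object entered the scene
    }
}
```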
HITLabNZ’s Gesture Library – 4. Modeling
o Hand recognition/modeling: skeleton based (for a low-resolution approximation), model based (for a more accurate representation)
o Object modeling (identification and tracking of rigid-body objects)
o Physical modeling (physical interaction): sphere proxy, model based, mesh based
o Usage: general spatial interaction in AR/VR environments
Method
Represent models as collections of spheres moving with the models in the Bullet physics engine
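A minimal sketch of that sphere-proxy idea (details assumed): gather the spheres approximating a tracked hand or object into a compound shape and drive it as a kinematic Bullet body, so it can push virtual rigid bodies around.

```cpp
#include <cstddef>
#include <vector>
#include <btBulletDynamicsCommon.h>

btRigidBody* makeSphereProxy(const std::vector<btVector3>& sphereCentres, btScalar radius)
{
    btCompoundShape* compound = new btCompoundShape();
    for (std::size_t i = 0; i < sphereCentres.size(); ++i)
        compound->addChildShape(btTransform(btQuaternion::getIdentity(), sphereCentres[i]),
                                new btSphereShape(radius));

    btDefaultMotionState* motion = new btDefaultMotionState(btTransform::getIdentity());
    btRigidBody::btRigidBodyConstructionInfo info(0.0f, motion, compound);   // mass 0: kinematic
    btRigidBody* body = new btRigidBody(info);
    body->setCollisionFlags(body->getCollisionFlags() | btCollisionObject::CF_KINEMATIC_OBJECT);
    body->setActivationState(DISABLE_DEACTIVATION);
    return body;
}

// Each frame, teleport the proxy to the newest tracked pose:
//   proxy->getMotionState()->setWorldTransform(trackedPose);
```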
Method
Render the AR scene with OpenSceneGraph, using the depth map for occlusion
Shadows yet to be implemented
Results
HITLabNZ’s Gesture Library – 5. Gesture
o Static (hand pose recognition)
o Dynamic (meaningful movement recognition)
o Context-based gesture recognition (gestures with context, e.g. pointing)
o Usage: issuing commands, anticipating user intention, and high-level interaction.
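A minimal sketch (all thresholds assumed) of those three gesture levels: a static pose from the extended-finger count, a dynamic swipe from palm motion over recent frames, and a context-based "point" gesture.

```cpp
#include <deque>
#include <opencv2/core.hpp>

enum Gesture { NONE, FIST, POINT, OPEN_HAND, SWIPE_RIGHT };

Gesture recognise(int fingertipCount, const cv::Point3f& palmNow,
                  std::deque<cv::Point3f>& palmHistory)
{
    palmHistory.push_back(palmNow);
    if (palmHistory.size() > 15) palmHistory.pop_front();   // ~0.5 s of history at 30 fps

    // Dynamic gesture: large rightward palm displacement over the window.
    if (palmHistory.size() == 15 &&
        palmHistory.back().x - palmHistory.front().x > 0.20f)   // metres (assumed)
        return SWIPE_RIGHT;

    // Static / context-based gestures from the hand pose.
    if (fingertipCount == 0) return FIST;
    if (fingertipCount == 1) return POINT;      // context: cast a ray for selection
    if (fingertipCount >= 4) return OPEN_HAND;
    return NONE;
}
```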
Multimodal Interaction
Combined speech and gesture input
Gesture and speech are complementary:
Speech – modal commands, quantities
Gesture – selection, motion, qualities
Previous work found multimodal interfaces intuitive for 2D/3D graphics interaction
1. Marker Based Multimodal Interface
Add speech recognition to VOMAR
Paddle + speech commands
Commands Recognized
Create command – "Make a blue chair": create a virtual object and place it on the paddle.
Duplicate command – "Copy this": duplicate a virtual object and place it on the paddle.
Grab command – "Grab table": select a virtual object and place it on the paddle.
Place command – "Place here": place the attached object in the workspace.
Move command – "Move the couch": attach a virtual object in the workspace to the paddle so that it follows the paddle movement.
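A minimal sketch (an assumption, not the original VOMAR code) of mapping those recognised speech strings onto commands; the paddle pose then supplies the spatial part ("this", "here") of each multimodal command.

```cpp
#include <algorithm>
#include <cctype>
#include <sstream>
#include <string>

enum Command { NONE, CREATE, DUPLICATE, GRAB, PLACE, MOVE };

Command parseSpeech(const std::string& utterance)
{
    std::istringstream in(utterance);
    std::string first;
    in >> first;                                        // keyword-spot the first word
    std::transform(first.begin(), first.end(), first.begin(), ::tolower);

    if (first == "make")  return CREATE;     // "Make a blue chair"
    if (first == "copy")  return DUPLICATE;  // "Copy this"
    if (first == "grab")  return GRAB;       // "Grab table"
    if (first == "place") return PLACE;      // "Place here"
    if (first == "move")  return MOVE;       // "Move the couch"
    return NONE;
}
```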
System Architecture
Object Relationships
"Put chair behind the table” Where is behind?
View specific regions
User Evaluation
Performance time: speech + static paddle significantly faster
Gesture-only condition less accurate for position/orientation
Users preferred speech + paddle input
Subjective Surveys
2. Free Hand Multimodal Input
Use free hands to interact with AR content
Recognize simple gestures
No marker tracking
Point, Move, Pick/Drop
Multimodal Architecture
Multimodal Fusion
Hand Occlusion
User Evaluation
Change object shape, colour, and position
Conditions: speech only, gesture only, multimodal
Measures: performance time, errors, subjective survey
Experimental Setup
Change object shape and colour
Results
Average performance time (multimodal and speech fastest):
Gesture: 15.44 s, Speech: 12.38 s, Multimodal: 11.78 s
No difference in user errors
User subjective survey:
Q1: How natural was it to manipulate the object? - MMI, speech significantly better
70% preferred MMI, 25% speech only, 5% gesture only
Intelligent Interfaces
Most AR systems are "stupid":
They don’t recognize user behaviour
They don’t provide feedback
They don’t adapt to the user
Especially important for training: scaffolded learning, moving beyond checklists of actions
Intelligent Interfaces
AR interface + intelligent tutoring system
ASPIRE constraint-based system (from UC)
Constraints: relevance condition, satisfaction condition, feedback
Domain Ontology
Intelligent Feedback
Actively monitors user behaviour
Implicit vs. explicit interaction
Provides corrective feedback
Evaluation Results
16 subjects, with and without the ITS
Improved task completion
Improved learning
Intelligent Agents
AR characters: virtual embodiment of the system, multimodal input/output
Examples: AR Lego, Welbo, etc.
Mr Virtuoso:
- AR character more real, more fun
- On-screen 3D and AR similar in usefulness
Context Sensing
Context Sensing (TKK project)
Using context to manage information
Context from: speech, gaze, the real world
AR Display
Gaze Interaction
AR View
More Information Over Time
Experiences
Novel Experiences
Crossing boundaries: ubiquitous VR/AR
Collaborative experiences: massive AR, AR + social networking
Usability
Crossing Boundaries
Jun Rekimoto, Sony CSL
Invisible Interfaces
Jun Rekimoto, Sony CSL
Milgram’s Reality-Virtuality continuum
Reality–Virtuality (RV) Continuum:
Real Environment → Augmented Reality (AR) → Augmented Virtuality (AV) → Virtual Environment
Mixed Reality spans the middle of the continuum
The MagicBook
Moves across the continuum: Reality → Augmented Reality (AR) → Augmented Virtuality (AV) → Virtuality
Invisible Interfaces
Jun Rekimoto, Sony CSL
Example: Visualizing Sensor Networks
Rauhala et al. 2007 (Linköping)
Network of humidity sensors
ZigBee wireless communication
Use Mobile AR to Visualize Humidity
Invisible Interfaces
Jun Rekimoto, Sony CSL
UbiVR – CAMAR
CAMAR Companion
CAMAR Viewer
CAMAR Controller
GIST - Korea
ubiHome @ GIST
©ubiHome
(Diagram: ubiHome smart-home testbed – sensors such as ubiKey, ubiTrack, a couch sensor, a door sensor, Tag-it, and a PDA provide who/what/when/where/how context to services such as media services, a light service, and an MR window)
CAMAR - GIST
(CAMAR: Context-Aware Mobile Augmented Reality)
UCAM: Architecture
(Architecture diagram: operating system; BAN/PAN (BT) and TCP/IP (discovery, control, event) networking; network and context interfaces; Sensor Service (Integrator, Manager, Interpreter, ServiceProvider); content; with wear-UCAM, vr-UCAM, and ubi-UCAM variants)
Hybrid User Interfaces
Goal: to incorporate AR into a normal meeting environment
Physical components: real props
Display elements: 2D and 3D (AR) displays
Interaction metaphor: use multiple tools, each relevant to the task
Hybrid User Interfaces
1. Personal – private display
2. Tabletop – private display + group display
3. Whiteboard – private display + public display
4. Multigroup – private display + group display + public display
(Diagram: Milgram’s reality–virtuality axis (Reality → Virtual Reality) crossed with Weiser’s terminal–ubiquitous axis, locating Desktop, AR, VR, UbiComp, Mobile AR, Ubi AR, and Ubi VR)
From: Joe Newman
(Diagram: the same Milgram × Weiser space extended with a third axis from single user to massive multi-user)
Remote Collaboration
AR Client
HMD and HHD showing virtual images over the real world
Images drawn by a remote expert
Local interaction
Shared Visual Context (Fussell, 1999)
Remote video collaboration: shared manual, video viewing
Compared video, audio, and side-by-side collaboration
Communication analysis
WACL (Kurata, 2004)
Wearable camera/laser pointer
Independent pointer control
Remote panorama view
WACL (Kurata, 2004)
Remote expert view: panorama viewing, annotation, image capture
As If Being There (Poelman, 2012)
AR + scene capture
HMD viewing, remote expert
Gesture input
Scene capture (PTAM), stereo camera
As If Being There (Poelman, 2012)
Gesture interaction: hand postures recognized, menu superimposed on the hands
Real World Capture
Using Kinect for 3D scene capture
Camera tracking
AR overlay
Remote situational awareness
Remote scene capture with AR annotations added
Massive Multiuser
Handheld AR for the first time allows extremely high numbers of AR users
Requires:
New types of applications/games
New infrastructure (server/client/peer-to-peer)
Content distribution…
Future Directions
Massive Multiuser
2D applications: MSN – 29 million, Skype – 10 million, Facebook – 100m+
3D/VR applications: Second Life – >50K, stereo projection – <500
Augmented reality: Shared Space (1999) – 4, Invisible Train (2004) – 8
BASIC VIEW
PERSONAL VIEW
Augmented Reality 2.0 Infrastructure
Leveraging Web 2.0:
Content retrieval using HTTP
XML encoded meta information
KML placemarks + extensions
Queries:
Based on location (from GPS, image recognition)
Based on situation (barcode markers)
Queries also deliver tracking feature databases
Everybody can set up an AR 2.0 server
Syndication:
Community servers for end-user content
Tagging
AR client subscribes to an arbitrary number of feeds
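A minimal sketch of that content-retrieval idea: ask a hypothetical community server over HTTP for KML/ARML placemarks near the device's GPS position; the endpoint name and parameters are illustrative only, the libcurl calls are real.

```cpp
#include <sstream>
#include <string>
#include <curl/curl.h>

static size_t appendToString(char* data, size_t size, size_t nmemb, void* userdata)
{
    static_cast<std::string*>(userdata)->append(data, size * nmemb);
    return size * nmemb;
}

std::string fetchNearbyPlacemarks(double latitude, double longitude)
{
    std::ostringstream url;
    url << "http://example-ar2-server.org/placemarks?lat=" << latitude
        << "&lon=" << longitude << "&radius=500";        // hypothetical endpoint

    std::string body;
    CURL* curl = curl_easy_init();
    curl_easy_setopt(curl, CURLOPT_URL, url.str().c_str());
    curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, appendToString);
    curl_easy_setopt(curl, CURLOPT_WRITEDATA, &body);
    curl_easy_perform(curl);
    curl_easy_cleanup(curl);
    return body;                                         // XML (KML + extensions) to be parsed
}
```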
Content
Content creation and delivery:
Content creation pipeline
Delivering previously unknown content
Streaming of data (objects, multimedia)
Applications
Distribution:
How do users learn about all that content?
How do they access it?
ARML (AR Markup Language)
Scaling Up
AR on a City Scale
Using mobile phones as ubiquitous sensors
MIT Senseable City Lab
http://senseable.mit.edu/
WikiCity Rome (Senseable City Lab MIT)
Conclusions
AR Research in the HIT Lab NZ
Gesture interaction: gesture library
Multimodal interaction: collaborative speech/gesture interfaces
Mobile AR interfaces: outdoor AR, interaction methods, navigation tools
AR authoring tools: visual programming for AR
Remote collaboration: mobile AR for remote interaction
More Information
• Mark Billinghurst
• Websites
– http://www.hitlabnz.org/
– http://artoolkit.sourceforge.net/
– http://www.osgart.org/
– http://www.hitlabnz.org/wiki/buildAR/