
Conversational Pointing Gestures for Virtual Reality Interaction
Implications from an Empirical Study

Thies Pfeiffer (tpfeiffe@techfak.uni-bielefeld.de)
Marc E. Latoschik (marcl@techfak.uni-bielefeld.de)
Ipke Wachsmuth (ipke@techfak.uni-bielefeld.de)
AI Group, Faculty of Technology, Bielefeld University, Germany

Conversational Interface Agents facilitate natural interactions in Virtual Reality Environments.

Motivation

Deictic expressions (such as "put that there") are fundamental in human communication to refer to entities in the environment. In situated contexts, deictic expressions often comprise pointing gestures directed at regions or objects. One of the primary tasks in Virtual Reality applications is the visually perceivable manipulation of objects. Thus VR research has focused on developing metaphors that optimize the tradeoff between a swift and precise selection of objects. Prominent examples are ray casting, occlusion, or arm extension. These techniques are well suited for interacting directly with the system. When the interaction with the system is mediated, e.g., by an Embodied Conversational Agent (ECA), the primary focus lies on a smooth understanding of natural communication and natural gestures. It is thus recommended to improve the robustness and accuracy of the interpretation of natural pointing gestures, i.e., gestures without facilitation per visual aids or other auxiliaries. To attain these ends, we contribute results from a study on pointing and draw conclusions for the implementation of pointing-based conversational interactions in immersive Virtual Reality.
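Ray casting is one of the direct-selection metaphors named above. As a point of reference for readers unfamiliar with it, the following is a minimal sketch (our own illustration, not part of the poster) that selects the nearest object whose spherical proxy is hit by the pointing ray; all names and the sphere simplification are assumptions.

```python
import numpy as np

def ray_cast_select(origin, direction, objects):
    """Return the nearest object hit by the pointing ray, or None.

    objects: list of (name, center, radius) with spherical proxies.
    Illustrative sketch only, not the poster's implementation.
    """
    o = np.asarray(origin, dtype=float)
    d = np.asarray(direction, dtype=float)
    d = d / np.linalg.norm(d)                   # normalize ray direction

    best = None
    for name, center, radius in objects:
        c = np.asarray(center, dtype=float)
        t = np.dot(c - o, d)                    # distance along the ray to the closest point
        if t < 0:                               # object lies behind the hand
            continue
        miss = np.linalg.norm((o + t * d) - c)  # perpendicular distance from the ray
        if miss <= radius and (best is None or t < best[0]):
            best = (t, name)
    return None if best is None else best[1]

# Example: two objects on a table, ray cast from the hand along +x
objects = [("cube", (1.0, 0.0, 0.0), 0.05), ("ball", (2.0, 0.3, 0.0), 0.05)]
print(ray_cast_select((0, 0, 0), (1, 0, 0), objects))   # -> "cube"
```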


Aim

How accurate are pointing gestures?

- Improved models for human pointing interpretation
- Advances for the production of pointing gestures
- Contributing to more robust multimodal conversational interfaces

Applications

- Human-Computer Interaction: multimodal interfaces
- Human-Agent Interaction: multimodal conversational interfaces
- Assistive Technology and Human-Robot Interaction
- Empirical research and Usability Studies: automatic multimodal annotation, automatic grounding of gestures based on a world model

Research Background

- Kranstedt, Lücking, Pfeiffer, Rieser & Staudacher. Measuring and Reconstructing Pointing in Visual Contexts. In Proceedings of the Brandial 2006.
- Kranstedt, Lücking, Pfeiffer, Rieser & Wachsmuth. Deictic Object Reference in Task-oriented Dialogue. In Situated Communication. Mouton de Gruyter, Berlin, 2006.
- Weiß, Pfeiffer, Eikmeyer & Rickheit. Processing Instructions. In Situated Communication. Mouton de Gruyter, Berlin, 2006.
- Kranstedt, Lücking, Pfeiffer, Rieser & Wachsmuth. Deixis: How to Determine Demonstrated Objects Using a Pointing Cone. In 6th International Gesture Workshop. Springer-Verlag GmbH, Berlin Heidelberg, 2006.
- Pfeiffer & Latoschik. Resolving Object References in Multimodal Dialogues for Immersive Virtual Environments. In Proceedings of the IEEE Virtual Reality 2004.
- Pfeiffer, Voss & Latoschik. Resolution of Multimodal Object References Using Conceptual Short Term Memory. In Proceedings of the EuroCogSci03.

Method

Study on object pointing
- Interaction of two participants
- Two conditions: speech + gesture and gesture only
- Real objects
- Study with 62 participants
- Cooperative effort with linguists

Technology
- Audio + video recordings
- Motion capturing using ART GmbH optical tracking system
- Automatic adaptation of a model of the user's posture
- Special hand-made soft gloves

Interaction Game
- Description Giver is presented with the object to demonstrate
- Description Giver utters deictic expression (s+g or g only)
- Object Identifier tries to identify the object
- Description Giver gives feedback (yes/no)
- Proceed with next object
- No corrections or repairs!

[Figure: experimental setup with spotlight, camera, tracking system, and displays (M1: task; M2 + M3: system time).]

How to determine the parameters of the pointing extension model? What is the opening angle? What defines the anchor of the model? Do we aim with the index finger (IFP) or by gazing over the index finger (GFP)?
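The last question contrasts two ray definitions: a ray along the index finger (IFP) versus a ray from the eye over the fingertip (GFP). The sketch below illustrates both definitions; it is our own illustration, not the study's code, and the marker names, anchor choices and coordinates are hypothetical.

```python
import numpy as np

def _unit(v):
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)

def ifp_ray(finger_base, finger_tip):
    """Index-Finger-Pointing: ray directed along the index finger (base -> tip),
    anchored here at the fingertip (one possible choice of anchor)."""
    tip = np.asarray(finger_tip, dtype=float)
    return tip, _unit(tip - np.asarray(finger_base, dtype=float))

def gfp_ray(eye, finger_tip):
    """Gaze-Finger-Pointing: ray anchored at the eye,
    directed from the eye over the fingertip."""
    e = np.asarray(eye, dtype=float)
    return e, _unit(np.asarray(finger_tip, dtype=float) - e)

# Hypothetical tracked positions (metres) from the motion-capture system
eye         = (0.00, 1.60, 0.00)
finger_base = (0.25, 1.20, 0.40)
finger_tip  = (0.30, 1.18, 0.48)

print("IFP:", ifp_ray(finger_base, finger_tip))
print("GFP:", gfp_ray(eye, finger_tip))
```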

Exploring the data

The study combines multimodal data comprising audio, video, motion capture and annotation data. A coherent, synchronized view of all data sources is provided using the Interactive Augmented Data Explorer (IADE), developed at Bielefeld University. The picture to the right shows a session with the Interactive Augmented Data Explorer. The scientist to the right interactively explores the recorded data for qualitative analysis. The model to the left shows the table with the objects and a stick-figure driven by the motion capture data. The video taken from one camera perspective is displayed on the floating panel, together with the audio recordings. Information from specific annotation tiers is presented as floating text. The scientist can, e.g., take the perspective of the description giver or the object identifier. All elements are interactive, e.g. the video panels and annotations can be resized or positioned to allow for a comfortable investigation.

Observations
- Pointing is fuzzy (as expected), but even in short range distance
- Fuzziness increases with distance
- Overshooting at the edge of the domain (intentional)
- Still, the human object identifier shows a good performance of 83.9% correct identifications

Modelling approach
- Ellipse shapes of the bagplots suggest a cone-based model of the extension of pointing

[Figure: A visualization of the intersections of the pointing-ray (dots) for four different objects over all participants. The data is grouped via bagplots, the asterisk marks the mean, the darker area clusters 50 percent, the brighter area 75 percent of the demonstrations. In the depicted setting the person pointing was standing to the left, the person identifying the objects to the right.]
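The bagplot figure groups the points where each pointing ray meets the table surface. A minimal sketch of that ray-plane intersection (our own illustration; the function and variable names are hypothetical, not taken from the study):

```python
import numpy as np

def ray_plane_intersection(origin, direction, plane_point, plane_normal):
    """Point where a pointing ray crosses the table plane, or None if it
    runs parallel or points away. The dots grouped by the bagplots are
    such intersection points, one per pointing gesture."""
    o, d = np.asarray(origin, float), np.asarray(direction, float)
    p, n = np.asarray(plane_point, float), np.asarray(plane_normal, float)
    denom = np.dot(d, n)
    if abs(denom) < 1e-9:            # ray parallel to the table
        return None
    t = np.dot(p - o, n) / denom
    return None if t < 0 else o + t * d

# Hypothetical IFP ray intersected with a table top at height 0.75 m
hit = ray_plane_intersection((0.3, 1.18, 0.48), (0.5, -0.4, 0.77),
                             plane_point=(0, 0.75, 0), plane_normal=(0, 1, 0))
print(hit)
```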

Results

Simulations

Several simulation runs have been conducted to test different approaches to model the pointing extension on the collected data.

Strict Semantic Model

For a strict semantic model the pointing extension has to single out one and only one object. In the simulation runs we determined the optimal angle for such a pointing cone per row and for the overall area. The results are depicted in the table below. For the strict semantic model the GFP offers better performance while having a narrower opening angle: GFP is more accurate than IFP.

row   IFP α   IFP perf.   GFP α   GFP perf.
1     84      70.27       86      68.92
2     80      61.84       68      75
3     71      71.43       69      81.82
4     60      53.95       38      65.79
5     36      43.84       24      57.53
6     24      31.15       25      42.62
7     14      23.26       17      23.26
8     10      7.14        10      14.29
all   71      38.54       61      48.12

Table: The optimal opening angles (α) per row for a strict semantic pointing cone model of the pointing extension. In addition, the performance in terms of correctly identified objects in percent of all objects within the specified area is depicted, both for IFP and GFP. The row titled "all" shows the performance for rows 1-7; row 8 has been excluded because of the overshooting behavior.
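A hedged sketch of how such a strict semantic simulation could be realised (our own reconstruction, not the study's code): a trial counts as successful only if the cone singles out exactly one object, the intended referent, and the opening angle is swept to maximise the success rate. Treating α as the full opening angle is an assumption, as are all names and data structures below.

```python
import numpy as np

def in_cone(apex, axis, half_angle_deg, point):
    """True if point lies inside the cone defined by apex, axis and half-angle."""
    v = np.asarray(point, float) - np.asarray(apex, float)
    v = v / np.linalg.norm(v)
    a = np.asarray(axis, float) / np.linalg.norm(np.asarray(axis, float))
    return np.degrees(np.arccos(np.clip(np.dot(v, a), -1.0, 1.0))) <= half_angle_deg

def strict_semantic_success(trial, opening_angle_deg):
    """A trial succeeds iff the cone contains exactly one object: the target."""
    apex, axis, target, objects = trial          # objects: {name: position}
    inside = [name for name, pos in objects.items()
              if in_cone(apex, axis, opening_angle_deg / 2.0, pos)]
    return inside == [target]

def best_opening_angle(trials, candidate_angles):
    """Sweep candidate opening angles and return (best angle, success rate in %)."""
    scored = [(sum(strict_semantic_success(t, a) for t in trials) / len(trials), a)
              for a in candidate_angles]
    rate, angle = max(scored)
    return angle, 100.0 * rate
```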

Pragmatic Model

When the pointing extension is handled on the level of pragmatics, we can allow for inference mechanisms to disambiguate between several objects and, hence, use heuristics. For the simulation, we used a basic heuristic based on the angular distance between objects and the pointing-ray. The table below depicts the results of our simulation runs. This time IFP performs better than GFP: IFP is more precise than GFP. The opening angles in the proximal rows are rather large, while the angles in the more distal rows are much smaller. This motivates us to distinguish between proximal and distal pointing.

row   IFP α   IFP perf.   GFP α   GFP perf.
1     120     98.65       143     98.65
2     109     100         124     100
3     99      94.81       94      93.51
4     109     98.68       89      93.42
5     72      97.26       75      94.52
6     44      91.8        50      90.16
7     38      86.05       41      67.44
8     31      52.38       26      69.05
all   120     96.04       143     92.71

Table: The optimal opening angles (α) per row for a pragmatic pointing cone model of the pointing extension. In addition, the performance in terms of correctly identified objects in percent of all objects within the specified area is depicted, both for IFP and GFP. The row titled "all" shows the performance for rows 1-7; row 8 has been excluded because of the overshooting behavior.
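The poster describes the pragmatic heuristic only as being based on the angular distance between the objects and the pointing-ray. One possible reading, sketched below with hypothetical names (not the study's code): among the objects inside the cone, pick the angularly closest one and count the trial as successful if it is the intended referent.

```python
import numpy as np

def angular_distance_deg(apex, axis, point):
    """Angle in degrees between the pointing ray and the direction to an object."""
    v = np.asarray(point, float) - np.asarray(apex, float)
    v = v / np.linalg.norm(v)
    a = np.asarray(axis, float) / np.linalg.norm(np.asarray(axis, float))
    return np.degrees(np.arccos(np.clip(np.dot(v, a), -1.0, 1.0)))

def pragmatic_success(trial, opening_angle_deg):
    """Among objects inside the cone, pick the angularly closest one;
    the trial succeeds iff that object is the intended referent."""
    apex, axis, target, objects = trial          # objects: {name: position}
    distances = {name: angular_distance_deg(apex, axis, pos)
                 for name, pos in objects.items()}
    candidates = {n: d for n, d in distances.items()
                  if d <= opening_angle_deg / 2.0}
    if not candidates:
        return False
    return min(candidates, key=candidates.get) == target
```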

Conclusion

Primary
- Pointing is best interpreted at the level of pragmatics and not semantics
- Index-Finger-Pointing is more precise
- Gaze-Finger-Pointing is more accurate
- The results stated in the tables above and our qualitative observations using IADE suggest a dichotomy of proximal vs. distal pointing. This fits nicely with the dichotomy common in many languages (here vs. there).
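The precise/accurate distinction can be read as spread versus systematic offset of the angular pointing error. The sketch below shows one way to compute both; this is our own formulation with made-up numbers, not a definition or data from the poster.

```python
import numpy as np

def accuracy_and_precision(angular_errors_deg):
    """Summarize pointing errors: accuracy as the mean angular error
    (systematic offset) and precision as its standard deviation (spread).
    Illustrative definitions only; the poster does not fix these formulas."""
    e = np.asarray(angular_errors_deg, dtype=float)
    return {"accuracy_mean_error": float(e.mean()),
            "precision_spread": float(e.std(ddof=1))}

# Hypothetical per-trial angular errors (degrees) for IFP and GFP
ifp = [4.1, 4.3, 3.9, 4.2, 4.0]   # small spread        -> more precise
gfp = [2.0, 5.5, 1.2, 4.8, 2.5]   # smaller mean error  -> more accurate
print("IFP:", accuracy_and_precision(ifp))
print("GFP:", accuracy_and_precision(gfp))
```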

Secondary
- Taking the direction of gaze into account does not always improve performance (contrary to the mainstream opinion). At least in the setting used in our study, with widely spaced objects (20 cm), it can be ignored when going for high overall success.
- Humans display a non-linear behavior at the borders of the domain.

[Figure: A combined model for the extension of proximal and distal pointing, showing a proximal cone and a distal cone. The boundary between proximal and distal pointing is defined by the personal distance d.]
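A minimal sketch of the combined model's selection logic; the personal distance d and the two opening angles are free parameters, and the values used below are placeholders rather than results from the study.

```python
def combined_cone_angle(distance_to_region, personal_distance_d,
                        proximal_opening_deg, distal_opening_deg):
    """Combined model: use the wide proximal cone inside the personal
    distance d and the narrow distal cone beyond it.
    Parameter values are placeholders, not results from the study."""
    if distance_to_region <= personal_distance_d:
        return proximal_opening_deg
    return distal_opening_deg

# Placeholder parameters: d = 1.0 m, 110 deg proximal cone, 40 deg distal cone
print(combined_cone_angle(0.6, 1.0, 110.0, 40.0))   # -> 110.0 (proximal)
print(combined_cone_angle(1.8, 1.0, 110.0, 40.0))   # -> 40.0  (distal)
```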

Future Work
- Confirm the results in a mixed setting with one human and one embodied conversational agent over virtual objects

Acknowledgements

This work has been funded by the EC in the project PASION, FP6 - IST program, reference number 27654, and by the Deutsche Forschungsgemeinschaft (DFG) in the Collaborative Research Center 360 (SFB 360), "Situated Artificial Communicators".

Bibliography

- D. A. Bowman, E. Kruijff, J. J. LaViola Jr., and I. Poupyrev. 3D User Interfaces – Theory and Practice. Addison-Wesley, 2005.
- E. Kaiser, A. Olwal, D. McGee, H. Benko, A. Corradini, X. Li, P. Cohen, and S. Feiner. Mutual Disambiguation of 3D Multimodal Interaction in Augmented and Virtual Reality. In Proceedings of the 5th International Conference on Multimodal Interfaces, pages 12–19. ACM Press, 2003.
- M. E. Latoschik. A Gesture Processing Framework for Multimodal Interaction in Virtual Reality. In Proceedings of the 1st International Conference on Computer Graphics, Virtual Reality and Visualisation in Africa, AFRIGRAPH 2001, pages 95–100. ACM SIGGRAPH, 2001.
- A. Olwal, H. Benko, and S. Feiner. SenseShapes: Using Statistical Geometry for Object Selection in a Multimodal Augmented Reality System. In Proceedings of The Second IEEE and ACM International Symposium on Mixed and Augmented Reality (ISMAR 2003), pages 300–301, Tokyo, Japan, October 7–10, 2003.
- I. Wachsmuth, B. Lenzmann, T. Jörding, B. Jung, M. Latoschik, and M. Fröhlich. A Virtual Interface Agent and its Agency. In Proceedings of the First International Conference on Autonomous Agents, pages 516–517, 1997.
- C. A. Wingrave, D. A. Bowman, and N. Ramakrishnan. Towards Preferences in Virtual Environment Interfaces. In EGVE '02: Proceedings of the Workshop on Virtual Environments 2002, pages 63–72, Aire-la-Ville, Switzerland, 2002. Eurographics Association.