Dan Bohus Activity Argon Microsoft.Speech Speech Recognition NLU NLU Speech Pipeline Vision Pipeline USB Camera Kinect Capture IP Camera Background Models Image Processing Optical

Dan Bohus

in physically situated interactive systems

Microphone

Microphone Array

Capture

VAD

Voice Activity

Argon

Microsoft.Speech

Speech Recognition

NLU

NLU

Speech Pipeline

Vision Pipeline

USB Camera

Kinect

Capture

IP Camera

Background Models

Image Processing

Optical Flow

Detection & Tracking

Face Tracking

Face Recognition

Gender Detection

Face Pose Tracking

Skeletal Tracking

Blob Tracking

RFID / Badge

IR Proximity Sensor

GUI / Mouse Eventing

Accelerometer

Other Input Sensors

Fusion and Scene Analysis

inputs

PointGrey Camera

Visual Focus-of-Attention Model

Engagement Model

Speech Source-Target Model

Dialog Management / Interaction Planning

Output Control

inputs

Rendering and Effectors

outputs

PointGrey Camera

Floor Inference Model

Identity Inference Model

Semantic Input Inference Model

Natural Language Generation

Gaze Control

Gesture Control

Display/GUI Control

Speech Synthesis

3D Avatar Head

Nao Robot

GUI

Finite-State Dialog Management

HTN-based Dialog Management *

Situated Activity Management *

sense

think

act

Managing complexity

programming models for parallel, coordinated computation

debugging and visualization tools

Time

Uncertainty & ML

inputs

outputs

stream double f;

f=3; f=x*f-y;

persistence w/ historical access (e.g. f[-200ms]), sampling, transforms (e.g. f.Slope[-500ms:0ms])
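
The notation above belongs to a language the slides do not otherwise show, so the following is only a minimal Python sketch of the same ideas: a stream that persists its timestamped samples, answers historical lookups such as `f[-200ms]`, and computes a windowed transform such as `f.Slope[-500ms:0ms]`. The class name `Stream` and the methods `post`, `at`, and `slope` are hypothetical, and defining the slope as the endpoint difference divided by the window length is an assumption, not the original system's definition.

```python
from bisect import bisect_right


class Stream:
    """Hypothetical time-indexed stream: keeps every (time_ms, value) sample."""

    def __init__(self):
        self.times = []   # sample timestamps in ms, strictly increasing
        self.values = []  # sample values, parallel to self.times

    def post(self, time_ms, value):
        """Append a new sample; timestamps must be strictly increasing."""
        if self.times and time_ms <= self.times[-1]:
            raise ValueError("samples must arrive in time order")
        self.times.append(time_ms)
        self.values.append(value)

    def at(self, now_ms, offset_ms=0):
        """Value as of now_ms + offset_ms (zero-order hold), like f[-200ms]."""
        i = bisect_right(self.times, now_ms + offset_ms)
        if i == 0:
            raise LookupError("no sample at or before the requested time")
        return self.values[i - 1]

    def slope(self, now_ms, start_ms, end_ms):
        """Average change per ms over a window, like f.Slope[-500ms:0ms]."""
        if end_ms <= start_ms:
            raise ValueError("window must have positive length")
        delta = self.at(now_ms, end_ms) - self.at(now_ms, start_ms)
        return delta / (end_ms - start_ms)


f = Stream()
for t, v in [(0, 1.0), (250, 2.0), (500, 4.0), (750, 3.0), (1000, 6.0)]:
    f.post(t, v)

latest = f.at(1000)             # current value
earlier = f.at(1000, -200)      # value 200 ms ago (the sample posted at 750 ms)
trend = f.slope(1000, -500, 0)  # rate of change over the last 500 ms
```

Holding the last sample until the next one arrives is the simplest sampling policy; interpolating between neighbouring samples would be an equally plausible choice for continuous signals.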

synchronization and coordination primitives

inputs

outputs

meta-reasoning about time

Microphone array capture

Sound source localization

Speech recognition

Language understanding

Infrared proximity sensors

Badge sensors

Face detection and tracking

Head-pose tracking

Facial feature tracking

Face identity recognition

Gender detection

Attention models

Engagement models

Turn-taking models

Behavioral control

Dialog management

Natural language generation

Speech synthesis

Avatar synthesis

Robot motion control

Floor-plan models

User models

composability

testing and maintenance

versioning

interactivity (with outside world or other components)

blame assignment

system-level optimization

Artificial Intelligence

Software Engineering

Machine Learning

Systems