Understanding and Predicting User Satisfaction with Intelligent Assistants

Julia Kiseleva, Kyle Williams, Jiepu Jiang, Ahmed Hassan Awadallah, Aidan C. Crook, Imed Zitouni, Tasos Anastasakos

Eindhoven University of Technology, Pennsylvania State University, University of Massachusetts Amherst, Microsoft


Page 2: Understanding and Predicting User Satisfaction with Intelligent Assistants

Why do we care?

[Figure: percentage of traffic from Desktop vs. Mobile over time, 2011-09 to 2015-09. x-axis: Timeline; y-axis: Percentage of Traffic (0 to 100).]

http://gs.statcounter.com

Page 5: Understanding and Predicting User Satisfaction with Intelligent Assistants

Understanding User Satisfaction with Intelligent Assistants

Page 6: Understanding and Predicting User Satisfaction with Intelligent Assistants

User’s dialogue with Cortana. Task is “Finding a hotel in Chicago”:

Q1: how is the weather in Chicago
Q2: how is it this weekend
Q3: find me hotels
Q4: which one of these is the cheapest
Q5: which one of these has at least 4 stars
Q6: find me directions from the Chicago airport to number one

Page 7: Understanding and Predicting User Satisfaction with Intelligent Assistants

User’s dialogue with Cortana. Task is “Finding a pharmacy”:

Q1: find me a pharmacy nearby
Q2: which of these is highly rated
Q3: show more information about number 2
Q4: how long will it take me to get there
Q5: Thanks

Page 8: Understanding and Predicting User Satisfaction with Intelligent Assistants

Research Questions

• RQ1: What are characteristic types of scenarios of use?

Page 9: Understanding and Predicting User Satisfaction with Intelligent Assistants

Controlling Device

• Call a person

• Send a text message

• Check on-device calendar

• Open an application

• Turn on/off wi-fi

• Play music

Page 13: Understanding and Predicting User Satisfaction with Intelligent Assistants

[Figure: results page with labeled elements: Knowledge Pane, Image Answers, Location Answer, Organic Results.]

Page 15: Understanding and Predicting User Satisfaction with Intelligent Assistants

Search Dialogue

User: “Do I need to have a jacket tomorrow?”
Cortana: “You could probably go without one. The forecast shows …”

Page 18: Understanding and Predicting User Satisfaction with Intelligent Assistants

Search Dialogue

User: “show restaurants near me”
Cortana: “Here are ten restaurants near you”
User: “show the best restaurants near me”
Cortana: “Here are ten restaurants near you that have good reviews”
User: “show directions to the second one”
Cortana: “Getting you directions to the Mayuri Indian Cuisine”

Page 20: Understanding and Predicting User Satisfaction with Intelligent Assistants

Research Questions

• RQ1: What are characteristic types of scenarios of use?
• RQ2: How can we measure different aspects of user satisfaction?
• RQ3: What are key factors determining user satisfaction for the different scenarios?
• RQ4: How to characterize abandonment in the web search scenario?
• RQ5: How does query-level satisfaction relate to overall user satisfaction for the search dialogue scenario?

Approach: user study

Page 23: Understanding and Predicting User Satisfaction with Intelligent Assistants

User Study Participants

• 60 participants
• Age: 25.53 +/- 5.42 years
• Language: English 55%, Other 45%
• Gender: Male 75%, Female 25%
• Education: Computer Science 82%, Electrical Engineering 8%, Mathematics 2%, Other 8%

Page 24: Understanding and Predicting User Satisfaction with Intelligent Assistants

User Study Design

• Video instructions (same for all participants)
• Tasks are realistic, mined from Cortana logs:
  o Control type of tasks
  o Queries where users don’t click
  o Search dialogue tasks, mostly localization type of queries

Page 25: Understanding and Predicting User Satisfaction with Intelligent Assistants

Find out what is the hair color of your favorite celebrity

Page 26: Understanding and Predicting User Satisfaction with Intelligent Assistants

You are planning a vacation. Pick a place. Check if the weather is good enough for the period you are planning the vacation. Find a hotel that suits you. Find the driving directions to this place.

Page 29: Understanding and Predicting User Satisfaction with Intelligent Assistants

Questionnaire: Controlling Device

• Were you able to complete the task?
  o Yes/No
• How satisfied are you with your experience in this task?
  o 5-point Likert scale
• How well did Cortana recognize what you said?
  o 5-point Likert scale
• Did you put in a lot of effort to complete the task?
  o 5-point Likert scale

5 tasks, 20 minutes

Page 31: Understanding and Predicting User Satisfaction with Intelligent Assistants

Questionnaire: Good Abandonment

• Were you able to complete the task?
  o Yes/No
• Where did you find the answer?
  o Answer Box, Image, SERP, Visited Website
• Which query led you to finding the answer?
  o First, Second, Third, >= Fourth
• How satisfied are you with your experience in this task?
  o 5-point Likert scale
• Did you put in a lot of effort to complete the task?
  o 5-point Likert scale

5 tasks, 20 minutes

Page 33: Understanding and Predicting User Satisfaction with Intelligent Assistants

Questionnaire: Search Dialogue

• Were you able to complete the task?
  o Yes/No
• How satisfied are you with your experience in this task?
  o If the task has sub-tasks, participants indicate their graded satisfaction, e.g.:
    a. How satisfied are you with your experience in finding a hotel?
    b. How satisfied are you with your experience in finding directions?
• How well did Cortana recognize what you said?
  o 5-point Likert scale
• Did you put in a lot of effort to complete the task?
  o 5-point Likert scale

8 tasks: 1 simple, 4 with 2 subtasks, 3 with 3 subtasks

30 minutes

Page 34: Understanding and Predicting User Satisfaction with Intelligent Assistants

Search Dialog Dataset

• 540 tasks, comprising 2,040 queries, of which 1,969 were unique
• The average query length is 7.07
• The simple task generated 130 queries in total
• Tasks with 2 context switches generated 685 queries
• Tasks with 3 context switches generated 1,355 queries

Page 35: Understanding and Predicting User Satisfaction with Intelligent Assistants

Factors Determining Satisfaction

RQ3: What are key factors determining user satisfaction for the different scenarios?

Page 36: Understanding and Predicting User Satisfaction with Intelligent Assistants

Results Over Scenarios

[Figure: two bar charts, mean Satisfaction Level and mean Efforts, each shown for: Across Scenarios, Device Control, Web Search, Structured Dialog.]

Page 37: Understanding and Predicting User Satisfaction with Intelligent Assistants

Results: ‘Good Abandonment’

RQ4: How to characterize abandonment in the web search scenario?

Page 38: Understanding and Predicting User Satisfaction with Intelligent Assistants

Results: ‘Good Abandonment’

[Figure: two bar charts of mean Satisfaction Level. Left, by the query that led to the answer: First Query, Second Query, Third Query, >= Fourth Query. Right, by where the answer was found: Answer Box, Image, SERP, Visited Website.]

Page 39: Understanding and Predicting User Satisfaction with Intelligent Assistants

Search Dialogue Satisfaction

RQ5: How does query-level satisfaction relate to overall user satisfaction for the structured search dialogue scenario?

Page 40: Understanding and Predicting User Satisfaction with Intelligent Assistants

User: “show restaurants near me” (SAT?)
Cortana: “Here are ten restaurants near you” (SAT?)
User: “show the best restaurants near me” (SAT?)
Cortana: “Here are ten restaurants near you that have good reviews” (SAT?)
User: “show directions to the second one” (SAT?)
Cortana: “Getting you directions to the Mayuri Indian Cuisine” (SAT?)

Overall SAT??

Page 44: Understanding and Predicting User Satisfaction with Intelligent Assistants

Satisfaction Over Different Tasks

[Figure: Number of Answers at each Satisfaction Level (1 to 5) for three task types: Weather Task, Mission Task (2 sub-tasks), Mission Task (3 sub-tasks).]

Page 45: Understanding and Predicting User Satisfaction with Intelligent Assistants

Q1: what do you have medicine for the stomach ache
Q2: stomach ache medicine over the counter
Q3: show me the nearest pharmacy
Q4: more information on the second one
Q5: do they have a stool softener
Q6: does Fred Meyer have stool softeners

General Search

Search Dialog

Combination of scenarios

User’s dialogue with Cortana related to the ‘stomach ache’ problem

Page 46: Understanding and Predicting User Satisfaction with Intelligent Assistants

Conclusions (1)

• RQ1: What are characteristic types of scenarios of use?
  o We proposed three main types of scenarios
• RQ2: How can we measure different aspects of user satisfaction?
  o We designed a series of user studies tailored to the three scenarios
• RQ3: What are key factors determining user satisfaction for the different scenarios?
  o Effort is a key component of user satisfaction across the different intelligent assistant scenarios

Page 47: Understanding and Predicting User Satisfaction with Intelligent Assistants

Conclusions (2)

• RQ4: How to characterize abandonment in the web search scenario?
  o We concluded that to measure good abandonment we need to investigate other forms of interaction signals that are not based on clicks or reformulation
• RQ5: How does query-level satisfaction relate to overall user satisfaction for the search dialogue scenario?
  o We looked at user satisfaction as ‘a user journey towards an information goal where each step is important,’ and showed the importance of session context

Page 48: Understanding and Predicting User Satisfaction with Intelligent Assistants

Predicting User Satisfaction with Intelligent Assistants (Good Abandonment Case)

Page 50: Understanding and Predicting User Satisfaction with Intelligent Assistants

Evaluating User Satisfaction

• We need metrics to evaluate user satisfaction
• Good abandonment [Human et al., 2009]: on mobile, 36% of abandoned queries were likely good; on desktop, 14.3%
• Traditional methods use implicit signals (clicks and dwell time), which don’t work when there are no clicks

Page 51: Understanding and Predicting User Satisfaction with Intelligent Assistants

Our Main Research Problem

In the absence of clicks, what is the relationship between a user's gestures and satisfaction and can we use gestures to detect satisfaction and good abandonment?

Page 54: Understanding and Predicting User Satisfaction with Intelligent Assistants

Research Questions

• RQ1: What SERP elements are the sources of good abandonment in mobile search?
• RQ2: Do a user's gestures provide signals that can be used to detect satisfaction and good abandonment in mobile search?
• RQ3: Which user gestures provide the strongest signals for satisfaction and good abandonment?

Approach: user study and crowdsourcing

Page 55: Understanding and Predicting User Satisfaction with Intelligent Assistants

Crowdsourcing Procedure

Random sample of abandoned queries from the search logs of a personal digital assistant during one week in June 2015 (no query suggestion)

Page 56: Understanding and Predicting User Satisfaction with Intelligent Assistants

Crowdsourcing Procedure

Query: Peniston
Previous Query: third eroics

Page 57: Understanding and Predicting User Satisfaction with Intelligent Assistants

Crowdsourcing Data

• Total number of queries: 3,895
• Judgment agreement (3 judgments per query): 73%
• After filtering: SAT 1,565 and DSAT 1,924

Page 59: Understanding and Predicting User Satisfaction with Intelligent Assistants

RQ1: Reasons of Good Abandonment

Mean of Satisfaction

Page 62: Understanding and Predicting User Satisfaction with Intelligent Assistants

Query and Session Features

Session Features
• Session duration
• Number of queries in session

Query Features
• Index of query within session
• Time to next query
• Query length (number of words)
• Is this query a reformulation
• Was this query reformulated

Click Features
• Click count
• Number of SAT clicks (> 30 sec)
• Number of back-click clicks (< 30 sec)

Page 63: Understanding and Predicting User Satisfaction with Intelligent Assistants

Baseline 1: Click & Dwell

Features: the session, query, and click features listed earlier.

B1: Click, Dwell with no Reformulation
• Click > 30 sec
• No Reformulation
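A minimal sketch of how B1 could be written as a rule, assuming a query counts as satisfied only if it has at least one click with dwell time above 30 seconds and was not reformulated. The `QueryLog` record and its field names are hypothetical, not taken from the slides.

```python
from dataclasses import dataclass
from typing import List

SAT_DWELL_SECONDS = 30.0  # dwell threshold for a "SAT click", as on the slide


@dataclass
class QueryLog:
    """Hypothetical per-query record; field names are illustrative only."""
    click_dwell_seconds: List[float]  # dwell time of each click on the SERP
    was_reformulated: bool            # True if the user rephrased this query


def b1_click_dwell(q: QueryLog) -> bool:
    """Predict SAT iff there is a long-dwell click and no reformulation."""
    has_sat_click = any(d > SAT_DWELL_SECONDS for d in q.click_dwell_seconds)
    return has_sat_click and not q.was_reformulated
```

For an abandoned query the click list is empty, so this rule can never predict SAT.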

Page 64: Understanding and Predicting User Satisfaction with Intelligent Assistants

Baseline 2: Optimistic

Features: the session, query, and click features listed earlier.

B2: Optimistic
• NO Click
• NO Reformulation
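B2 can be sketched as a rule too, under the assumption that "optimistic" means every query with no click and no reformulation is predicted as satisfied. The function name and arguments are illustrative. The slide does not say how B2 treats queries that do have clicks, so the sketch simply returns False for them.

```python
def b2_optimistic(click_count: int, was_reformulated: bool) -> bool:
    """Predict SAT for an abandoned query that was not reformulated.

    Assumption: queries with clicks are outside the rule and get False.
    """
    abandoned = click_count == 0
    return abandoned and not was_reformulated
```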

Page 65: Understanding and Predicting User Satisfaction with Intelligent Assistants

Baseline 3: Query-Session Model

B3: Query-Session Model: a Random Forest trained on the session, query, and click features listed earlier.

Page 67: Understanding and Predicting User Satisfaction with Intelligent Assistants

Gesture Features (1)

• Viewport features, swipe-related:
  o up swipes and down swipes
  o changes in swipe direction
  o swiped distance in pixels and average swiped distance
  o swipe distance divided by time spent on the SERP
• Time to Focus
  o Time to focus on Answer
  o Time to focus on Organic Search Results
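The swipe-related features could be computed from a log of swipe events roughly as follows. This is a sketch under assumptions: the `Swipe` record, the sign convention for direction, and the output key names are all hypothetical; the slides only name the features.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Swipe:
    """Hypothetical swipe event: signed vertical displacement in pixels."""
    dy_pixels: float  # assumed convention: > 0 is a down swipe, < 0 an up swipe


def swipe_features(swipes: List[Swipe], serp_seconds: float) -> Dict[str, float]:
    # Ignore zero-length swipes so every remaining event has a direction.
    moves = [s.dy_pixels for s in swipes if s.dy_pixels != 0]
    directions = [1 if dy > 0 else -1 for dy in moves]
    # A direction change is any consecutive pair of swipes with opposite signs.
    changes = sum(1 for a, b in zip(directions, directions[1:]) if a != b)
    total = sum(abs(dy) for dy in moves)
    return {
        "up_swipes": directions.count(-1),
        "down_swipes": directions.count(1),
        "direction_changes": changes,
        "swiped_distance_px": total,
        "avg_swiped_distance_px": total / len(moves) if moves else 0.0,
        "swipe_px_per_second": total / serp_seconds if serp_seconds > 0 else 0.0,
    }
```

For example, `swipe_features([Swipe(300), Swipe(200), Swipe(-100)], serp_seconds=10)` describes two down swipes followed by one up swipe over a 10-second SERP visit.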

Page 68: Understanding and Predicting User Satisfaction with Intelligent Assistants

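GF (2): Attributed Reading Time

Viewing time is weighted by the share of the viewport height the element occupies:

• 3 seconds at 33% of viewport: 1s
• 6 seconds at 66% of viewport: 4s
• 2 seconds at 20% of viewport: 0.4s

Attributed reading time: 1s + 4s + 0.4s = 5.4s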
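The arithmetic on this slide can be reproduced with a short sketch, assuming attributed reading time is the sum of viewing durations weighted by the fraction of the viewport the element occupies. The function name and input format are illustrative, and the slide's 33% and 66% are taken as one third and two thirds.

```python
from typing import Iterable, Tuple


def attributed_reading_time(views: Iterable[Tuple[float, float]]) -> float:
    """Sum seconds * viewport fraction over (seconds, fraction) pairs."""
    return sum(seconds * fraction for seconds, fraction in views)


# Slide example: 3 s at 1/3, 6 s at 2/3, 2 s at 20% of the viewport.
example = attributed_reading_time([(3, 1 / 3), (6, 2 / 3), (2, 0.20)])
```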

Page 69: Understanding and Predicting User Satisfaction with Intelligent Assistants

GF (3): Attributed Reading Time Per Pixel

Attributed reading time: 5.4s
Pixel area: 400 pixels x 300 pixels

5.4s / (400 pix x 300 pix) = 0.045 ms/pix²
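The per-pixel normalization is a single division; a small sketch, with a hypothetical function name, makes the unit conversion from seconds to milliseconds explicit.

```python
def reading_time_per_pixel_ms(reading_seconds: float,
                              width_px: int, height_px: int) -> float:
    """Attributed reading time divided by element area, in ms per square pixel."""
    area = width_px * height_px
    if area <= 0:
        raise ValueError("element area must be positive")
    return reading_seconds * 1000.0 / area


# Slide example: 5.4 s over a 400 x 300 pixel element.
per_pixel = reading_time_per_pixel_ms(5.4, 400, 300)
```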

Page 70: Understanding and Predicting User Satisfaction with Intelligent Assistants

Models: Detecting Good Abandonment

M1: Gesture Model: a Random Forest trained on gesture features

M2: Gesture Model + Query and Session Features: a Random Forest trained on gesture, query, and session features

Page 71: Understanding and Predicting User Satisfaction with Intelligent Assistants

RQ2: Are gestures useful? (1)

On abandoned queries only (user study data): 148 SAT queries and 313 DSAT queries

Page 72: Understanding and Predicting User Satisfaction with Intelligent Assistants

RQ2: Are gestures useful? (2)

On crowdsourced data: 1565 SAT queries and 1924 DSAT queries

Page 73: Understanding and Predicting User Satisfaction with Intelligent Assistants

RQ2: Are gestures useful? (3)

On all user study data: 179 SAT queries and 384 DSAT queries

Gesture features are useful to detect user satisfaction in general!

Page 74: Understanding and Predicting User Satisfaction with Intelligent Assistants

Conclusions

• RQ1: What SERP elements are the sources of good abandonment in mobile search?
  o Answer, Images and Snippet
• RQ2: Do a user's gestures provide signals that can be used to detect satisfaction and good abandonment in mobile search?
  o Yes
• RQ3: Which user gestures provide the strongest signals for satisfaction and good abandonment?
  o Time spent interacting with Answers is positively correlated with satisfaction. Swipe actions and time spent on the SERP are negatively correlated.

Page 75: Understanding and Predicting User Satisfaction with Intelligent Assistants

• Answer, Images and Snippet are potential sources of good abandonment

• User gestures provide useful signals to detect good abandonment

• Time spent interacting with Answers is positively correlated with satisfaction. Swipe actions and time spent on the SERP are negatively correlated

Questions?