
Final Report
IN5480 Specialization in research in design of IT
Autumn 2018
Interaction with AI: Use of Conversational Agents

By Brage Westvik Bråten, Vetle Alexander Gjestang, Ivar Skorpen Johnsen and Linett Simonsen


Table of contents

1. Introduction
   1.1 Group members
   1.2 Aim
   1.3 Project motivation
   1.4 User group
   1.5 Questions we want to address
2. Background
   2.1 Definitions
   2.2 Literature review
3. Methods
   3.1 Interviews
   3.2 Review of Conversational Agents
4. Empirical findings
   4.1 Interviews
   4.2 Review of Conversational Agents
      4.2.1 Siri
      4.2.2 Amazon Alexa
      4.2.3 Google Assistant
5. Discussion
6. Concluding remarks
7. Evaluation approach and reflections on the proposed plan
8. References
Appendix A
   A.1 Purpose, design and implementation
   A.2 Reflections on the process
Appendix B
   B.1 Initial script
   B.2 Iterations on the script
Appendix C
Appendix D
   D.1 Scenario 1: Automation Level 7
   D.2 Scenario 2: Automation Level 10


1. Introduction

1.1 Group members

Our group consists of Brage W. Bråten ([email protected]), Vetle A. Gjestang ([email protected]), Ivar S. Johnsen ([email protected]) and Linett Simonsen ([email protected]), all attending the master's programme in Informatics: Design, Use, Interaction. Brage, Vetle and Ivar are in their first semester. Linett is in her third semester.

1.2 Aim

The main objective of this project is to examine human interaction with Conversational Agents (CAs) across platforms, including laptops, smartphones and smart speakers. Our main aim is to investigate why users do or do not use CAs. We also want to investigate to what extent users trust their CA, and whether the level of trust impacts which tasks the CA is given.

1.3 Project motivation

Conversational Agents are heavily advertised by technology manufacturers, and products such as headphones are made with buttons for quick access to them. According to Moore et al. (2017), "conversational interfaces are hitting the mainstream and becoming ubiquitous in our daily lives". Still, we noticed that most of the group members in this project do not use their CAs, i.e. Siri or Google Assistant, while we have anecdotal stories of friends who run their entire homes through their CAs. This study will investigate whether different motivations, such as practical needs, and different security concerns make users differ in their use of such virtual assistants.

1.4 User group

We have chosen young adults, so-called digital natives (Prensky, 2001), as our user group for this project. Digital natives are the generation of young people who are "native speakers" of the digital language of computers, video games and the Internet (Prensky, 2001). One of the reasons for choosing young adults as our user group is that we find it interesting to study how digital natives have included CAs in their daily life. Another reason is that young adults are an accessible user group, and therefore easy to recruit for interviews within the time constraints of this project.

1.5 Questions we want to address

Our key research question is as follows:

• What types of tasks do people use conversational agents for?

To answer our key research question, we will also take into consideration:

• To what extent do users trust the CA with regard to which tasks it is given?

2. Background

2.1 Definitions

According to Luger and Sellen (2016), there has in recent years been a rise of Conversational Agents (CAs) in everyday life. CAs can be defined as "dialogue systems often endowed with 'humanlike behaviour'" (Vassallo et al., 2010, p. 358). Another definition, retrieved from Chatbots.org (2018), describes a CA as a "software program which interprets and responds to statements made by users in ordinary natural language". Such programs integrate computational linguistics techniques with communication over the Internet.

2.2 Literature review

Researchers and practitioners in the field of Human-Computer Interaction (HCI) have for decades been improving their skills in designing Graphical User Interfaces (GUIs) (Følstad & Brandtzæg, 2017). As new technology develops, new opportunities arise, and according to Følstad and Brandtzæg (2017, p. 38), major technology companies currently see natural language user interfaces as the next big thing. If digital interaction moves from GUI-based websites and apps towards Natural Language User Interfaces (NLUIs), huge challenges and opportunities may await the field of HCI (Følstad & Brandtzæg, 2017). NLUIs are widely used in CAs.


In NLUI-based systems, where the content and features of the underlying service are mostly hidden from the user, interaction is more dependent on the user's input than in a GUI-based system (Følstad & Brandtzæg, 2017). Designing for usability in an NLUI-based system will for instance include suggesting to users what they can expect from the service, and adequately interpreting the users' responses. In purely voice-based dialogue systems, where previously demarcated interactions, services, and content blur into the same conversational thread, focus will turn towards entire service processes across conversational touchpoints with the user.

When designing for NLUI-based systems, Følstad and Brandtzæg (2017) emphasize the issues surrounding so-called one-size-fits-all setups. When all users, regardless of needs, preferences, and degrees of digital literacy, receive responses in the same language, undesirable biases may be introduced (Følstad & Brandtzæg, 2017). However, as new technology emerges, personalization can be supported to a degree that mitigates such biases and divides (Følstad & Brandtzæg, 2017). According to Følstad and Brandtzæg (2017), a key success factor for NLUIs is how well they can "support conversational processes while providing useful output". It is therefore important to design both for user guidance towards attainable goals and for acceptable responses in cases of conversational breakdowns. In the article, Følstad and Brandtzæg (2017) also express their concerns regarding ethical and privacy challenges in NLUI-based systems. They argue that stronger attention to ethics and privacy is needed, and that HCI researchers and practitioners have an important role to play in this work (Følstad & Brandtzæg, 2017).

The article "Like Having a Really Bad PA" by Luger and Sellen (2016) also addresses some of the current limitations of NLUI-based systems. Based on 14 semi-structured interviews conducted with users of different CA systems, the authors find user expectations to be "dramatically out of step with the operation of the systems" (Luger & Sellen, 2016, p. 5286). According to Luger and Sellen, users in general have poor mental models of how their CA works, and they tend to have overly high expectations of the system's intelligence, capabilities and goals.

In their study, Luger and Sellen found that the primary goal of their interviewees was never solely to use the CA, making the system a means to an end rather than an end in itself. The users Luger and Sellen interviewed mostly used their CA for relatively simple tasks, such as checking the upcoming weather and setting alarms, particularly in situations where their hands were otherwise occupied by other tasks. According to Luger and Sellen, this implies that the principal use case of a CA system is hands-free operation.

The majority of the users Luger and Sellen interviewed tended to engage with their CA system only up to the point where it ceased to provide utility. Most of the participants were reluctant to use their CA for complex or sensitive tasks, especially where they perceived a high social cost of failure. In their study, Luger and Sellen also found that participants with technical skills were better able to "see beyond artificial humanlike qualities to devise their own mental models of interaction". Less skilled users described greater levels of frustration, leading them to doubt the intelligence of their CA.

The factor that most negatively affected the participants' use of their CA was that it had misunderstood their words or commands. If the CA responded to task requests by defaulting to on-screen web search results, this was commonly perceived as a system failure. According to Luger and Sellen, a majority of the participants expressed a desire for more natural conversational interactions with their CA. Several of the interviewees also reported issues regarding a lack of feedback and transparency. In conclusion, Luger and Sellen state that there is a need to reconsider the humanlike cues and affordances relied upon by multimodal systems.

3. Methods

In this study we use interviews as our method for gathering data. We will compare the findings from our interviews with the relevant literature, and by holding the two up against each other, possibly get a firmer grip on the situation. As practically every younger person we know owns a smartphone with a CA, we expect participants to be easily within our reach. In addition to the interviews, we will do a review of three commonly used CAs.

3.1 Interviews

In this study we aim to conduct in-depth semi-structured interviews with different users of Conversational Agents. We chose to conduct interviews because we want to get an understanding of the different users' motivations for using or not using Conversational Agents. We also want to hear the users' thoughts on issues regarding privacy, reliability and dependency. Prior to the interviews, we will inform the interviewees about all aspects of the study. Thereafter the interviewees have to sign an informed consent form, confirming their voluntary participation. After we have conducted the interviews, we will transcribe our notes. We will then do an open coding, labelling issues and key points as they arise.

3.2 Review of Conversational Agents

In this study we review three different, commonly used Conversational Agents. We have chosen Apple's Siri, Google's Google Assistant and Amazon's Alexa, and we compare the three to identify similarities and differences.

4. Empirical findings

4.1 Interviews

We have conducted interviews with people in our social circles to gather data on their use of CAs. Through the interviews we noticed that people rarely use their CA. However, when they did use one, they generally stuck to a single CA. One of our interviewees used a smart speaker; the rest used the CA on their smartphones. One interviewee said that even though he had heard about the possibility of speaking to his phone, he had never used it. He didn't see a need for it, and he wasn't interested in trying it.

Another interviewee said that in his apartment, almost the entire lighting system was connected to his CA. However, even though he could control the lights through an NLUI, he mainly used an app or the good old-fashioned light switch. For him, it was awkward and unnatural to talk to his CA.

A third interviewee's motivation for using his CA was that he simply enjoyed it. He described using his CA as another interface for interacting with Google. When looking for information while on the couch, he found it easier to ask a question out loud than to pick up his phone and manually type it into Google. He also used it to find out when specific shops close, or to get the weather forecast.


4.2 Review of Conversational Agents

4.2.1 Siri

Siri is Apple's CA, and it is included on every new Apple device. You can ask it simple queries, and Siri uses voice recognition and search functions to carry out the task as well as possible within the CA's technical constraints. You can for example ask Siri to "call my mom" or "set a timer for 10 minutes", and the CA will do so. For harder, more personal or controversial queries like "Is Donald Trump a good president?", the CA will answer "Here's what I found on the web for 'Is Donald Trump a good president'". The CA will not read any of the articles aloud, but it gives you easy access to click on them. One of our interviewees used Siri.

4.2.2 Amazon Alexa

Alexa is Amazon's CA, and it is only available through a smart speaker. Since Alexa is made by one of the largest online retailers in the world, it is very good at buying things from Amazon's website. If you for example have forgotten diapers for your baby while grocery shopping, you can ask Alexa to order them for you. None of our interviewees used Amazon Alexa.

4.2.3 Google Assistant

Google Assistant is Google's CA. It is available as an app for iPhone and Android phones, and it also comes preloaded on a number of Android phones. According to Følstad and Brandtzæg (2017), Google Assistant "reliably helps you out with questions in natural language, such as when the sun sets or where to find the nearest coffee shop, even when asked follow-up questions for directions or opening hours". Google Assistant is very good at tasks that require looking something up on the Internet. If you ask Google Assistant "Hvor på tabellen ligger Brann?", or in English "Where in the standings is Brann placed?", it will show you the full standings of the Norwegian soccer league. Several of our interviewees used Google Assistant.


5. Discussion

The users we interviewed seemed to use their CAs mostly for rather simple tasks, especially ones requiring hands-free operation (e.g. when cooking). When trying to recruit interviewees, we noticed that many didn't use their CAs at all, even though almost everybody has one in their pocket. The reason for this limited use is something we find interesting, but it was not within our research scope. We found that the level of trust users placed in their CA was generally held back by scepticism about either the CA's abilities or privacy issues.

One interviewee had the possibility to control his apartment through the interface of his CA, but said he felt stupid interacting with it. One of the reasons was that he had to talk to the CA in an unnatural way for it to understand his intentions. This might be a limitation stemming from UX designers' limited knowledge of the nuances and complexity behind natural conversation (Moore et al., 2017). According to Moore et al. (2017), modeling natural conversation is still a hard problem. Although it is easy to get a system to produce words, none of the current CAs display general conversational competence (Moore et al., 2017). According to Luger and Sellen's (2016) findings, several of their participants desired the ability to carry out more natural conversational interactions. However, if an answer from a CA is too natural, users may come to over-expect its performance (Lin et al., 2016).

Some interviewees also felt uncomfortable talking to their CAs in public. These findings are also present in "Like Having a Really Bad PA" by Luger and Sellen (2016). Users in Luger and Sellen's research pointed out that especially in social situations, they were careful about which tasks they gave their CAs, only using them for very simple tasks such as getting directions to a given place. The article also pointed out that differences in understanding of the underlying technology affect how it is used: users with a deeper understanding of the technology are more likely to adjust how a question is phrased to make it easier for the CA. Although we didn't go in depth into our interviewees' technological understanding, this might also explain why some of them either had a very specific use for their CA or didn't bother with it at all.


One of our interviewees, who used Google Assistant on his Google Home device, said he didn't use the CA on a regular basis due to its technological limitations. He preferred using his smartphone simply because, in his opinion, the CA in the Google Home wasn't working well enough to be trusted. He mentioned that he didn't even trust the CA to set an alarm, due to a previous experience where the alarm didn't go off. The reason for buying it, he said, was to test the technology, and he had concluded that, given the CA's technological limitations, his interactions with it would remain limited. According to Følstad and Brandtzæg, conversations with Google Assistant "break down fast enough for this to be an interface for only the most enthusiastic of techies" (Følstad & Brandtzæg, 2017, p. 40), a statement that resonates with the experience of our interviewee.

Another interviewee only used Siri, and only for very specific hands-free tasks, such as when cooking. He used Siri to time the cooking of e.g. rice. He also said that he uses Siri a lot when it's cold outside and he is wearing gloves; the main reason was that it was easier to use Siri than to pick up the phone, take off his gloves and dial someone. When asked if he had had bad experiences with this use, he answered that he had almost dialed the wrong person a couple of times, but as Siri announces what it is about to do, he has time to abort the call. He admitted that he waits until he is not in close proximity to anyone else before calling, as he feels a bit weird talking to his own phone through headphones. This also limits the use of the CA somewhat, e.g. in downtown Oslo.

Lin et al. (2016) describe that it is important that chatbots do not speak too naturally, because users might become confused about how smart the chatbot really is. We think this also applies to CAs. It is hard to convey to users which tasks the CA can and cannot perform, and the unnatural language of CAs might actually help convey that the CA is neither especially advanced nor powerful.

One way designers can help users understand and create a mental model of their CA is to design CAs as animal companions. Phillips et al. (2012) state that using animals as an analogy can help users better understand the capabilities of robots, and we believe this is applicable to the design of CAs as well. From our interviews we found that some of the users, out of old habit, only used the CA as a search engine. This could be related to the CA being named Google Assistant: the mere association with the name Google might project the wrong mental model, i.e. that the CA's only use is to conduct web searches.

Looking at our own findings and the findings of Luger and Sellen, it seems that in most situations the average user has more motivation or reasons to use another interface, or to perform a task themselves, than to make use of their CA.

6. Concluding remarks

We do not consider the data gathered during the project substantial enough to draw a clear conclusion. We do, however, see that our findings are generally in line with the literature we reviewed for the project. The similarities are reflected in how users talk to their CA, i.e. altering their language structure for a voice command to increase the efficiency of task completion. This is also mentioned by the users in the paper by Luger and Sellen (2016).

Our findings indicate that users do not have a high level of trust in their CAs. The participants in this study generally showed a reluctance to give their CA tasks involving risk, e.g. setting important alarms, as well as tasks that were perceived as too complicated for their CA to handle.

The interviewees in our project are younger and on average more skilled in IT than the participants in the research done by Luger and Sellen (2016). Interestingly, our interviewees still had less trust in their CAs, or saw fewer benefits of using them, in contradiction to what the article describes: that those with the most knowledge of computer science are more likely to use a CA. This is an interesting finding, even though most of the findings from our interviews are in line with the article.

As a continuation of this research, an interesting question to address would be why users feel stupid while talking to their CAs, and how CAs can be designed to fit more naturally into everyday life. Hands-free calling has been around for decades, so seemingly talking to no one in public has become quite natural for many. There can be many reasons why this feels strange with CAs, such as knowing one is talking to a computer, or the fact that one often has to use a restricted language that sounds more like commands than a conversation.


7. Evaluation approach and reflections on the proposed plan

We ended up changing our research question from "What causes the users to or not to interact with the virtual assistant?" to "What types of tasks do people use conversational agents for?", as we considered our initial question too wide and open. The rephrased question gave us the possibility to gain more insight into what kinds of situations CAs are used in, while still not completely ruling out insight into reasons for not using CAs. From our previous experience with open projects, a lot of time can be consumed just by trying to narrow the initial scope to fit the timeframe and resources.

In hindsight, we can see that the interviews we conducted provided less information than we had initially hoped for. For this reason, we had to rely on literature studies to a higher degree than planned. We could have cast a wider net with an online survey, but this was cut due to time constraints. This is unfortunate, considering that the data from the interviews did not have the volume we had hoped for. An online survey would probably have given us more data, which in turn would have given us a better basis for generalisation. In our study, we decided to exclude all non-users of CAs as participants. However, while searching for possible interviewees, we found that several young adults didn't use a CA even though they had one available. We consider this an interesting topic for future research.


8. References

Chatbots.org. (2018). Conversational Agent. Retrieved October 16, 2018 from https://www.chatbots.org/conversational_agent/

Følstad, A., & Brandtzæg, P. B. (2017). Chatbots and the new world of HCI. Interactions, 24(4), 38-42.

Lin, L., D'Haro, L. F., & Banchs, R. (2016). A web-based platform for collection of human-chatbot interactions. Proceedings of the Fourth International Conference on Human Agent Interaction (HAI '16), 363-366. ACM. doi: https://doi.org/10.1145/2974804.2980500

Luger, E., & Sellen, A. (2016). "Like having a really bad PA": The gulf between user expectation and experience of conversational agents. Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems, 5286-5297. ACM.

Moore, R. J., Arar, R., Ren, G., & Szymanski, M. H. (2017). Conversational UX design. Proceedings of the 2017 CHI Conference Extended Abstracts on Human Factors in Computing Systems (CHI EA '17), 492-497. ACM. doi: https://doi.org/10.1145/3027063.3027077

Phillips, E., Ososky, S., Swigert, B., & Jentsch, F. (2012). Human-animal teams as an analog for future human-robot teams. Proceedings of the Human Factors and Ergonomics Society Annual Meeting, 56(1), 1553-1557.

Prensky, M. (2001). Digital natives, digital immigrants. On the Horizon, 9(5), 1-6.

Vassallo, G., Pilato, G., Augello, A., & Gaglio, S. (2010). Phrase coherence in conceptual spaces for conversational agents. Semantic Computing, 357-371. doi: 10.1002/9780470588222.ch18


Appendix A

A.1 Purpose, design and implementation

The purpose of our prototype, Excuse Bot, is to give excuses for being late to an appointment. The key task of the chatbot is to propose excuses that can be used in situations where the user is late for something. We chose Chatfuel as the implementation platform since we wanted to create a bot for messaging and social media platforms, more specifically a Facebook Messenger bot. On Facebook Messenger, 46 % of the bots are powered by Chatfuel¹, and the company aims to make it easy for everyone to build chatbots on Facebook Messenger. We created a scripted chatbot, and implemented some AI rules so the chatbot would be able to recognise certain phrases and reply with a relevant answer.

Figure 1: A diagram showing how we planned to implement the chatbot prototype.

¹ Chatfuel. (2018). Relationship-based Messenger marketing. Retrieved 15.10.18 from https://chatfuel.com/


An example of a dialogue the chatbot should support.

User: Hello
Bot: Hi, [first name]. Are you late for something?
User: Yes
Bot: Do you want help with an excuse?
User: Yes
Bot: Does this one work?
Bot: I'm so sorry. The public transport (metro, tram, bus) was delayed.
User: Yes
Bot: Okay, that's great! Glad I could help!
Bot: Good luck with your excuse!
User: Thanks!
Bot: You're welcome!

A list of examples of excuses gathered from the “Excuse Bank”.

- There was construction right in front of my front door, so I had to wait for the construction workers to finish.
- I'm so sorry. The public transport (metro, tram, bus) was delayed.
- There were two dogs fighting on my way here, so I had to help out the owners.
- I'm sorry, I misplaced a contact lens and couldn't find it.
- I had a dream where I was already here, and then I woke up! I'm so sorry!
- I'm so so sorry. My pet got stuck in the toilet, so I had to get him out.
- I was stuck in an elevator with a kid who pushed all the buttons.
- I'm so sorry! I got confused and drove to my elementary school, then I saw myself in the mirror and remembered I'm not 9 years old and I don't go to elementary school. I'm sure you can understand.

The AI rules that were implemented in Chatfuel:

- Phrases like "Hi" will trigger the response message "Hi, [first name]. Are you late for something?".
- Phrases like "Thanks" will trigger the response message "You're welcome!".
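To illustrate the idea behind such scripted rules, the sketch below shows in Python how a simple keyword-triggered bot of this kind could work. This is our own simplification, not Chatfuel's actual implementation; the phrase lists and responses are example assumptions.

import random

# Simplified keyword rules in the spirit of the Chatfuel setup above.
RULES = {
    ("hi", "hello", "hey"): "Hi, [first name]. Are you late for something?",
    ("thanks", "thank you"): "You're welcome!",
}

EXCUSES = [
    "I'm so sorry. The public transport (metro, tram, bus) was delayed.",
    "I was stuck in an elevator with a kid who pushed all the buttons.",
]

def reply(message):
    text = message.lower()
    # The first rule whose phrase occurs in the message triggers its response.
    for phrases, response in RULES.items():
        if any(phrase in text for phrase in phrases):
            return response
    if "yes" in text:
        return "Does this one work? " + random.choice(EXCUSES)
    return "Sorry, I didn't understand that."

print(reply("Hello"))    # Hi, [first name]. Are you late for something?
print(reply("Thanks!"))  # You're welcome!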


Figure 2: A diagram showing how we implemented our prototype, with four different excuses for each category (personal, work and school).


A.2 Reflections on the process

We started the design process with a discussion of what we wanted the purpose of the chatbot to be. After some brainstorming, we decided to create an Excuse Bot, partly because we wanted the chatbot to have a narrow scope. After deciding on the purpose, we identified the key tasks we wanted the chatbot to perform. We decided to divide the excuses into categories based on appointment type, starting with three different appointment categories (school, work and personal). In Chatfuel we created four different blocks of excuses linked to each category block. We tested the chatbot ourselves and iterated several times. Because the excuses are selected at random, each excuse has an equal likelihood of being shown; a weakness of the chosen design is that the same excuse can therefore be shown twice in a row.
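As a small illustration of this weakness, the sketch below (our own simplification in Python, not Chatfuel code) shows the uniform random selection and one way to avoid showing the same excuse twice in a row.

import random

work_excuses = [
    "The public transport (metro, tram, bus) was delayed.",
    "There was construction right in front of my front door.",
    "I misplaced a contact lens and couldn't find it.",
    "My pet got stuck in the toilet, so I had to get him out.",
]

def next_excuse(previous=None):
    # random.choice draws uniformly, so it can repeat the previous excuse;
    # re-drawing until the excuse differs removes back-to-back repeats.
    excuse = random.choice(work_excuses)
    while excuse == previous:
        excuse = random.choice(work_excuses)
    return excuse

first = next_excuse()
second = next_excuse(previous=first)  # guaranteed to differ from the first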

Figure 3: Screenshots of our chatbot interacting with a user.


Appendix B

B.1 Initial script

We started the work process by running the initial script. There are two epochs in this script. According to Brownlee (2018), the number of epochs is a "hyperparameter of gradient descent that controls the number of complete passes through the training dataset".² The results after the second epoch were an accuracy of around 26 % and a validation accuracy of around 1 %. As we understand these results, the neural network hadn't learned very well.

# Keras imports needed by the script (added here for completeness).
from keras.models import Sequential
from keras.layers import Dense, Activation, Dropout

batch_size = 32
max_words = 1000
epochs = 2

# Model: a simple feed-forward network over bag-of-words input vectors.
# num_classes is assumed to be set earlier in the script, from the dataset.
model = Sequential()
model.add(Dense(512, input_shape=(max_words,)))  # one hidden layer with 512 units
model.add(Activation('relu'))
model.add(Dropout(0.5))  # randomly drop half the activations to reduce overfitting
model.add(Dense(num_classes))
model.add(Activation('softmax'))  # output class probabilities
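For the accuracy values below to be produced, the model also has to be compiled and trained. The handed-out script presumably contains a step along these lines; the exact loss, optimizer and data variable names here are our assumptions, shown only as a sketch.

model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

# x_train/y_train are assumed to be the bag-of-words vectors and one-hot
# labels prepared earlier in the script; the held-out validation split
# produces the validation accuracy reported in Figure 4.
history = model.fit(x_train, y_train,
                    batch_size=batch_size,
                    epochs=epochs,
                    validation_split=0.1)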

Figure 4: The results from the initial script handed out by Morten Goodwin.

² Brownlee, J. (2018). https://machinelearningmastery.com/difference-between-a-batch-and-an-epoch/


B.2 Iterations on the script

We continued by increasing and decreasing different values, and by changing the types and number of layers. We then observed how the level of accuracy changed, and iterated to improve the results.

batch_size = 32
max_words = 1000
epochs = 5

# Model: same setup as the initial script, but with five epochs, two much
# larger dense layers, and a lower dropout rate.
model = Sequential()
model.add(Dense(5000, input_shape=(max_words,)))
model.add(Dense(7000))  # input_shape is only needed on the first layer
model.add(Activation('relu'))
model.add(Dropout(0.3))
model.add(Dense(num_classes))
model.add(Activation('softmax'))


Figure 5: The results we got when using five epochs and two dense layers. Dense layers are fully connected layers, where all neurons in one layer are connected to those in the next layer.

From the results, we can see that the deep neural net has learned a lot from the training data set. However, when tested on new data, the accuracy is low. This can be read from the high training accuracy (0.8122) and the low validation accuracy (0.0200), a clear sign of overfitting. We increased the number of dense layers to two and increased their sizes from 512 to 5000 and 7000. We had no scientific reason for doing this, other than wanting to experiment.

The time the machine needed to run through the epochs also increased drastically. This is mainly because of the much larger dense layers, but also because we changed the dropout value from 0.5 to 0.3. Dropout is a technique used to tackle overfitting. We decreased the dropout to speed up the learning process, but at the cost of a less robust learning algorithm.


Appendix C

Rema 1000 Smart House commercial. Link: https://www.youtube.com/watch?v=sgJLpuprQp8

The commercial shows a highly automated home. Almost everything in the main character's life is run by voice commands to some computer. When he goes to the dentist and gets an anesthetic, his voice and pronunciation get slurred. He is then locked out of his home because the voice-controlled lock does not respond. The problem escalates when the VA misinterprets his voice commands, e.g. "Hey! Open the door!" becomes "Play: On the floor".

This problem could be solved by adding a combination lock to the door, or by making it possible to override the system with a regular key. The problems shown in the commercial could easily have been discovered earlier by testing the system. The AI in the home could also have been designed to consider the context of a command; e.g., the likelihood of someone shouting "play on the floor" at a door lock is very low. A fuzzy match could also have noticed the similarity between the interpreted "play on the floor" and "hey, open the door".
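As a small illustration of the fuzzy-matching idea, the sketch below compares the misheard transcript against a set of known commands using Python's standard difflib module. The candidate commands are our own hypothetical examples.

from difflib import SequenceMatcher

heard = "play on the floor"
# Hypothetical commands the smart home understands.
known_commands = ["hey, open the door", "turn on the lights", "set a timer"]

# Score each known command by string similarity to the transcript.
for command in known_commands:
    score = SequenceMatcher(None, heard, command).ratio()
    print(f"{command!r}: {score:.2f}")

# "hey, open the door" scores clearly highest, so instead of playing music
# the system could ask the user to confirm that command.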


Appendix D

Introduction: Sorry for our long Appendix D. We had too much fun with Scenario 2.

D.1 Scenario 1: Automation Level 7

Level 7: The computer generates recommended options; the human decides (or inputs their own choice) and the system carries it out.

We propose a system where the intelligent agent gathers data through a normal recruitment process, but instead of HR reading résumés and application letters, the AI does this. The AI then recommends to HR which candidates are most suitable for an available position. HR then invites the candidates to an interview and follows the normal hiring routine.

The advantage of such a system is that it can help humans make better decisions while the human remains in control. A problem with this form of AI is that it can be biased by its training data. There was recently a big news story about Amazon's hiring AI being unfavourable towards women because it had combed through male-dominated résumés³ to accrue its data. By involving humans in the hiring process, such biases can be dealt with.

D.2 Scenario 2: Automation Level 10

Level 10: The computer acts autonomously, ignoring the human.

We propose a system where the intelligent agent hires people based on big data gathered from the Internet and IoT-connected devices. No one has to manually apply for a job, because another AI program searches for people who are looking for jobs. This job-searching AI looks for companies the person may want to work for, based on manually configured preferences and big data.

³ https://nordic.businessinsider.com/amazon-built-ai-to-hire-people-discriminated-against-women-2018-10?r=US&IR=T


The recruitment AI, on the other hand, constantly searches for profiles matching the needs of the company that is hiring. When the recruitment AI makes a match with a job-searching AI, a person is automatically hired. The person gets a notification and the ability to approve or decline the given position. Should the person accept their new job, they get a rudimentary briefing about the specific workplace and can simply walk into their new workplace.
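Purely as a sketch of this imagined matching protocol, the example below shows what the core matching rule might look like. All class names, fields and data are hypothetical assumptions on our part.

from dataclasses import dataclass

@dataclass
class Profile:
    name: str
    skills: set
    preferences: set  # e.g. industries the person wants to work in

@dataclass
class Opening:
    company: str
    required_skills: set
    industry: str

def match(profile, opening):
    # Hire automatically when the person's skills cover the opening's
    # requirements and the industry matches their preferences.
    return (opening.required_skills <= profile.skills
            and opening.industry in profile.preferences)

candidate = Profile("Kari", {"python", "design"}, {"tech"})
opening = Opening("Acme", {"python"}, "tech")
if match(candidate, opening):
    print(f"{candidate.name} is automatically hired by {opening.company}")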

Because of the high level of automation, the AI also produces a "stock video" presentation of the new employee for the company. It gathers pictures and personal details from social media and automatically generates a bite-sized video for the company to watch and "get to know their new employee".

The advantage of such a system is that it frees up time for the human employees. It is also possible that it will hire better-suited candidates than a human recruiter would. A disadvantage is that co-workers and supervisors know essentially nothing about the new employee before they start work. It can also be considered strange to have new people just walking in to work, never having met anyone there before.

A possible problem with this scenario is that the company is constantly changing employees. This might make current workers feel uncertain about their job security, which can create a bad work environment. Taking this problem further, one could argue that this constant uncertainty raises stress levels and leads to mental health problems or fatigue. That, in turn, could impact employees' engagement and motivation at both extremes of the spectrum: either losing interest in work life, or maximizing their efforts beyond healthy levels to keep their job.

A solution to this problem could be to give the AI rules and restrictions on how to manage employees, e.g. only looking for new employees if an access card is scheduled for deactivation.
