Final Report
IN5480 – Specialization in research in design of IT
Autumn 2018
Interaction with AI: Use of Conversational Agents
By Brage Westvik Bråten, Vetle Alexander Gjestang, Ivar Skorpen Johnsen and Linett Simonsen
Table of contents
1. Introduction
1.1 Group members
1.2 Aim
1.3 Project motivation
1.4 User group
1.5 Questions we want to address
2. Background
2.1 Definitions
2.2 Literature review
3. Methods
3.1 Interviews
3.2 Review of Conversational Agents
4. Empirical findings
4.1 Interviews
4.2 Review of Conversational Agents
4.2.1 Siri
4.2.2 Amazon Alexa
4.2.3 Google Assistant
5. Discussion
6. Concluding remarks
7. Evaluation approach and reflections on the proposed plan
8. References
Appendix A
A.1 Purpose, design and implementation
A.2 Reflections on the process
Appendix B
B.1 Initial script
B.2 Iterations on the script
Appendix C
Appendix D
D.1 Scenario 1: Automation Level 7
D.2 Scenario 2: Automation Level 10
1. Introduction
1.1 Group members
Our group consists of Brage W. Bråten ([email protected]), Vetle A. Gjestang
([email protected]), Ivar S. Johnsen ([email protected]) and Linett Simonsen
([email protected]), all attending the master's program in Informatics: Design, Use, Interaction.
Brage, Vetle and Ivar are in their first semester. Linett is in her third semester.
1.2 Aim
The main objective of this project is to examine human interaction with Conversational Agents (CAs) across platforms, including laptops, smartphones and smart speakers. Our main aim is to investigate the causes for using or not using CAs. We also want to investigate to what extent users trust their CA, and whether the level of trust impacts which tasks the CA is given.
1.3 Project motivation
Conversational Agents are heavily advertised by technology manufacturers, and products such
as headphones are made with buttons for quick access to them. According to Moore et al.
(2017), “conversational interfaces are hitting the mainstream and becoming ubiquitous in our
daily lives". However, we noticed that most of the group members in this project do not use their CAs, e.g. Siri or Google Assistant. At the same time, we have anecdotal accounts of friends who run their entire homes through their CAs. This study will investigate whether there are different motivations, such as practical needs, and different security concerns, that make users differ in their use of such virtual assistants.
1.4 User group
We have chosen young adults, so-called digital natives (Prensky, 2001), as our user group for
this project. Digital natives are the generation of young people who are “native speakers” of the
digital language of computers, video games and the Internet (Prensky, 2001). One of the
reasons for choosing young adults as our user group is that we find it interesting to study how
digital natives have included CAs in their daily lives. Another reason is that young adults are an accessible user group, and therefore easy to recruit for interviews within the time constraints of this project.
1.5 Questions we want to address
Our key research question is as follows:
• What types of tasks do people use conversational agents for?
To answer our key research question we will also take into consideration:
• To what extent do users trust the CA with regard to which tasks it is given?
2. Background
2.1 Definitions
According to Luger and Sellen (2016), there has in recent years been a rise of Conversational Agents (CAs) in everyday life. CAs can be defined as "dialogue systems often endowed with 'humanlike behaviour'" (Vassallo et al., 2010, p. 358). Another definition of a CA, retrieved from Chatbots.org (2018), is as a "software program which interprets and responds to statements made by users in ordinary natural language". Such programs integrate computational linguistics techniques with communication over the Internet.
2.2 Literature review
Researchers and practitioners in the field of Human-Computer Interaction (HCI) have for
decades been improving their skills in designing Graphical User Interfaces (GUIs) (Følstad &
Brandtzæg, 2017). As new technology develops, new opportunities arise, and according to
Følstad and Brandtzæg (2017, p. 38), major technology companies currently see natural
language user interfaces as the next big thing. If digital interaction moves from GUI based
websites and apps towards Natural Language User Interfaces (NLUI), huge challenges and
opportunities may await the field of HCI (Følstad & Brandtzæg, 2017). NLUIs are widely used for CAs.
In NLUI based systems, where the content and features of the underlying service are mostly hidden from the user, interaction is more dependent on the user's input than in a GUI based system (Følstad & Brandtzæg, 2017). Designing for usability in a NLUI based system will for instance include suggesting to the users what they can expect from the service, and adequately interpreting the users' responses. In purely voice-based dialogue systems, where interactions, services, and content that were previously demarcated blur into the same conversational thread, the focus will turn towards entire service processes across conversational touchpoints with the user.
When designing for NLUI based systems, Følstad and Brandtzæg (2017) emphasize the issues
regarding so-called one-size-fits-all setups. When all users, regardless of needs, preferences,
and degrees of digital literacy, receive responses in the same language, it is possible that some
undesirable biases may be introduced (Følstad & Brandtzæg, 2017). However, as new
technology emerges, personalization can be supported to a degree that biases and divides are
mitigated (Følstad & Brandtzæg, 2017). According to Følstad and Brandtzæg (2017), a key
success factor with NLUIs is how well they can “support conversational processes while
providing useful output". For this reason, it is important to design both for user guidance towards attainable goals and for acceptable responses in cases of conversational breakdowns. In the article, Følstad and Brandtzæg (2017) also express their concerns regarding ethical and privacy challenges in NLUI based systems. It is argued that stronger attention to ethics and privacy is needed, and that HCI researchers and practitioners have an important role to play in this work (Følstad & Brandtzæg, 2017).
The article "Like Having a Really Bad PA", written by Luger and Sellen (2016), also addresses
some of the current limitations of NLUI based systems. Based on 14 semi-structured
interviews conducted with users of different CA systems, the authors find user expectations to
be “dramatically out of step with the operation of the systems” (Luger & Sellen, 2016, p. 5286).
According to Luger and Sellen, users in general have poor mental models of how their CA works,
and they tend to have too high expectations regarding the system’s intelligence, capability and
goals.
In their study, Luger and Sellen found that the primary user goal of their interviewees wasn’t
solely to use the CA, making the system a means to an end rather than an end in itself. The
users Luger and Sellen interviewed mostly used their CA for relatively simple tasks, such as
checking upcoming weather and setting alarms, particularly in situations where their hands were otherwise engaged and occupied by other tasks. According to Luger and Sellen, this implies that the principal use case of a CA system is hands-free.
The majority of the users Luger and Sellen interviewed tended to engage with their CA system only up to the point where it ceased to provide utility. Most of the participants were reluctant to use their CA for complex or sensitive tasks, especially where they perceived a high social cost of failure. In their study, Luger and Sellen also found that participants with technical skills were better able to "see beyond artificial humanlike qualities to devise their own mental models of interaction". Less skilled users described greater levels of frustration, leading them to doubt the intelligence of their CA.
The main factor that had negatively affected the participants' use of their CA was that it had misunderstood their words or commands. If the CA responded to task requests by defaulting to on-screen web-search results, this was commonly perceived as a system failure. According to Luger and Sellen, a majority of the participants expressed a desire for more natural conversational interactions with their CA. Several of the interviewees also reported issues regarding a lack of feedback and transparency. In conclusion, Luger and Sellen state that there is a need to reconsider the humanlike cues and affordances relied upon by multimodal systems.
3. Methods
In this study we will use interviews to gather data. We will compare the findings from our interviews with the relevant literature, and can thereby possibly get a firmer grip on the situation. As practically every younger person we know owns a smartphone with a CA, we expect that participants will be easily within our reach. In addition to the interviews, we will review three commonly used CAs.
3.1 Interviews
In this study we aim to conduct in-depth semi-structured interviews with different users of Conversational Agents. We chose to conduct interviews because we want to gain an understanding of the different users' motivations for using or not using Conversational Agents.
We also want to hear the users’ thoughts on issues regarding privacy, reliability and
dependency. Prior to the interviews, we will inform the interviewees about all aspects of the trial.
Thereafter the interviewees have to sign an informed consent form, confirming their voluntary participation. After we have conducted the interviews, we will transcribe our notes. We will then perform open coding to label issues and key points as they arise.
3.2 Review of Conversational Agents
In this study we will review three different, commonly used Conversational Agents. We have chosen to review Apple's Siri, Google's Google Assistant and Amazon's Alexa, and to compare the three in order to identify similarities and differences.
4. Empirical findings
4.1 Interviews
We have conducted interviews with people in our social circles to try to gather data on their use
of CAs. Through the interviews we noticed that people rarely use their CA. However, when they did use one, they generally stuck to a single CA. One of our interviewees used a smart speaker; the rest
used the CA on their smartphones. One interviewee said that even though he had heard about
the possibility of speaking to his phone, he had never used it. He didn’t see a need for it, and he
wasn’t interested in trying it.
Another interviewee said that in his apartment, almost the entire lighting system was connected
to the CA. However, even though he could control the lights through a NLUI, he mainly used an
app or the good old-fashioned light switch. For him, it felt awkward and unnatural to talk to his
CA.
Another interviewee's motivation for using his CA was that he simply enjoyed it. He described that he uses his CA as another interface for interacting with Google. When looking for information while on the couch, he finds it easier to ask the question out loud than to pick up his phone and manually type the question into Google. He also uses it to find out when specific shops close, or to get the weather forecast.
4.2 Review of Conversational Agents
4.2.1 Siri
Siri is Apple's CA, and it comes on every new Apple device. You can ask it simple queries, and Siri uses voice recognition and search functions to carry out the task as well as possible within the CA's technical constraints. You can for example ask Siri to "call my mom" or "set a timer for 10 minutes", and the CA will do so. To harder, more personal or controversial queries, such as "Is Donald Trump a good president?", the CA will answer "Here's what I found on the web for 'Is Donald Trump a good president'". The CA will not read any of the articles aloud, but it gives you easy access to click on them. One of our interviewees used Siri.
4.2.2 Amazon Alexa
Alexa is Amazon's CA, and it is only available through a "smart speaker". Since Alexa is made by one of the largest online retailers in the world, it is very good at buying things from Amazon's website. If you for example have forgotten diapers for your baby while grocery shopping, you can ask Alexa to order them for you. None of our interviewees used Amazon Alexa.
4.2.3 Google Assistant
Google Assistant is Google's CA. Google Assistant is available as an app for iPhone and Android phones, and it also comes preloaded on a few Android phones. According to Følstad and Brandtzæg (2017), Google Assistant "reliably helps you out with questions in natural language, such as when the sun sets or where to find the nearest coffee shop, even when asked follow-up questions for directions or opening hours". Google Assistant is very good at tasks that require looking something up on the Internet. If you ask Google Assistant "Hvor på tabellen ligger Brann?", or in English "Where in the league table is Brann?", Google Assistant will show you the full standings of the Norwegian football league. Several of our interviewees used Google Assistant.
5. Discussion
The users we interviewed seemed to mostly use their CAs for rather simple tasks, typically performed hands-free (e.g. when cooking). When trying to recruit interviewees, we noticed that many didn't use their CAs at all, even though almost everybody has one in their pocket. The reasons for their limited use are something we find interesting, but they were not within our research scope. We found that the level of trust users gave their CA was generally held back by scepticism about either the CA's abilities or privacy issues.
One interviewee had the possibility to control his apartment through the interface of his CA, but said he felt stupid interacting with it. One of the reasons he felt stupid was that he had to talk to the CA in an unnatural way for it to understand his intentions. This might be a limitation due to the UX designers' limited knowledge of the nuances and complexity behind a natural conversation (Moore et al., 2017). According to Moore et al. (2017), modeling natural conversation is still a hard problem. Although it is easy to get a system to produce words, none of the current CAs display general conversational competence (Moore et al., 2017). According to Luger and Sellen's (2016) findings, several of their participants desired more natural conversational interactions. However, if an answer from a CA is too natural, users could form inflated expectations of its performance (Lin et al., 2016).
Some interviewees also felt uncomfortable talking to their CAs in public. These findings are also present in the article "Like having a really bad PA" by Luger and Sellen (2016). Users in Luger and Sellen's research pointed out that especially in social situations, they were careful about which tasks they gave their CAs. They only used them for very simple tasks, such as directions from here to a given place. The article also points out that differing levels of understanding of the underlying technology affect how it is used. Users with a higher understanding of the technology are more likely to adjust how a question is phrased, to make it easier for the CA. Although we didn't go in depth on our interviewees' technological understanding, this might also explain why some of them either had a very specific use for their CAs, or didn't bother at all.
One of our interviewees, who used Google Assistant on his Google Home device, said he didn't use the CA on a regular basis due to its technological limitations. The interviewee said he preferred using his smartphone simply because, in his opinion, the CA in the Google Home wasn't working well enough to be trusted. He mentioned that he didn't even trust the CA to set an alarm, due to a previous experience where the alarm didn't go off. The reason for buying it, he said, was to test the technology. He had concluded that due to the CA's technological limitations, his interactions with it were limited. According to Følstad and Brandtzæg, conversations with Google Assistant "break down fast enough for this to be an interface for only the most enthusiastic of techies" (Følstad & Brandtzæg, 2017, p. 40), a statement that can be seen in relation to the experience of our interviewee.
Another interviewee only used Siri. He used Siri for very specific hands-free tasks, such as when he was cooking, using Siri to time the cooking of e.g. rice. He also said that he uses Siri a lot when it's cold outside and he is wearing gloves. The main reason was that it was easier to use Siri than to pick up the phone, take off his gloves and then dial someone. When asked if he had had bad experiences with this use, he answered that he had almost dialed the wrong person a couple of times, but as Siri announces what it is about to do, he has time to abort the call. He admitted that he waits until he is not in close proximity to anyone else before calling, as he feels a bit weird talking to his own phone through headphones. This also limits the use of the CA somewhat, e.g. in downtown Oslo.
Lin et al. (2016) describe that it is important that chatbots do not speak too naturally, because users might become confused about how smart the chatbot really is. We think this is also applicable to CAs. It is hard to convey to users which tasks the CA can and cannot perform. The unnatural language of a CA might thus be a good way to convey to users that the CA is neither the most advanced nor the most powerful system.
One way designers can help users understand and form a mental model of their CA is to design CAs as animal companions. Phillips et al. (2012) say that using animals as an analogy can help users better understand the capabilities of robots. We believe this is applicable to the design of CAs as well. From our interviews we found that some of the users, out of old habit, only used the CA as a search engine. This could be related to the name
of the CA being Google Assistant. The mere association with the name Google might project a wrong mental model, i.e. that the CA's only use is to conduct web searches.
Looking at our own findings and the findings of Luger and Sellen, it seems that in most situations the average user has more motivation or reason to use another interface, or to perform a task themselves, than to make use of their CA.
6. Concluding remarks
We do not consider the data we have gathered during the project substantial enough to draw a clear conclusion. We do, however, see that the findings are generally in line with the literature we reviewed for the project. Similarities between our research and the literature are reflected in how the users talk to their CA, i.e. altering language structure for a voice command to increase the efficiency of task completion. This is also mentioned by the users in the paper by Luger and Sellen (2016).
Our findings indicate that users do not have a high level of trust in their CAs. The participants in this study showed a general reluctance to give their CA tasks involving risk, e.g. setting important alarms, and tasks that were perceived as too complicated for their CAs to handle.
The interviewees in our project are younger and on average more skilled in IT than the participants in the research done by Luger and Sellen (2016). Interestingly enough, our interviewees still had less trust in their CAs, or saw fewer benefits in using them, in contradiction to what the article describes: that those with the most knowledge within computer science are more likely to use a CA. This is an interesting finding, even though most of the findings from our interviews are in line with the article.
As a continuation of this research, an interesting question to address would be why the users
feel stupid while talking to their CAs, and how they can be designed to fit more naturally into
everyday life. Hands-free has been around for decades, so seemingly talking to no one in public
has become quite natural for many. There can be many reasons why this feels strange with CAs, such as knowing that it is a computer, or the fact that one often has to use a restricted language that sounds more like commands than a conversation.
7. Evaluation approach and reflections on the proposed plan
We ended up changing our research question from "What causes the users to or not to interact with the virtual assistant?" to "What types of tasks do people use conversational agents for?", as we considered our initial question too wide and open. The rephrased question gave us the possibility to gain more insight into what kinds of situations CAs are used in, while still not completely ruling out insight into reasons for not using CAs. From our previous experience with open projects, a lot of time can be consumed just trying to narrow the initial scope to fit the timeframe and resources.
In hindsight, we can see that the interviews we conducted provided less information than we initially expected and had hoped for. For this reason, we had to rely on literature studies to a higher degree than planned. We could have cast a wider net with an online survey, but this was cut due to time constraints. This is unfortunate, considering that the data from the interviews was not of the volume we had hoped for. An online survey would probably have given us more data, which in turn would have given us a better basis for generalisation. In our study, we decided to exclude all non-users of CAs as participants. However, while searching for possible interviewees, we found that several young adults didn't use a CA even though they had one available. We propose this as an interesting topic for future research.
8. References
Chatbots.org. (2018). Conversational Agent. Retrieved October 16, 2018 from
https://www.chatbots.org/conversational_agent/
Følstad, A., & Brandtzæg, P. B. (2017). Chatbots and the new world of HCI. Interactions,
24(4), 38-42.
Lin, L., D'Haro, L. F., & Banchs, R. (2016). A Web-based Platform for Collection of
Human-Chatbot Interactions. Proceedings of the Fourth International Conference on
Human Agent Interaction (HAI '16). ACM, New York, NY, USA, 363-366. doi:
https://doi.org/10.1145/2974804.2980500
Luger, E., & Sellen, A. (2016). Like having a really bad PA: the gulf between user expectation
and experience of conversational agents. Proceedings of the 2016 CHI Conference on
Human Factors in Computing Systems, 5286-5297. ACM.
Moore, R. J., Arar, R., Ren, G., & Szymanski, M. H. (2017). Conversational UX Design. Proceedings of the 2017 CHI Conference Extended Abstracts on Human Factors in Computing Systems (CHI EA '17). ACM, New York, NY, USA, 492-497. doi: https://doi.org/10.1145/3027063.3027077
Phillips, E., Ososky, S., Swigert, B., & Jentsch, F. (2012). Human-animal teams as an analog
for future human-robot teams. Proceedings of the Human Factors and Ergonomics
Society Annual Meeting, 56(1), 1553-1557.
Prensky, M. (2001). Digital Natives, Digital Immigrants. On the Horizon, 9(5), 1-6.
Vassallo, G., Pilato, G., Augello, A., & Gaglio, S. (2010). Phrase Coherence in Conceptual
Spaces for Conversational Agents. Semantic Computing, 357-371. doi:
10.1002/9780470588222.ch18
Appendix A
A.1 Purpose, design and implementation
The purpose of our prototype (Excuse Bot) is to give excuses for being late to an appointment. The key task of the chatbot is to propose excuses that can be used in situations where the user is late for something. We chose Chatfuel as the implementation platform since we wanted to create a bot for messaging and social media platforms, more specifically a Facebook Messenger bot. On Facebook Messenger, 46% of the bots are powered by Chatfuel¹, and the company aims to make it easy for everyone to build chatbots on Facebook Messenger. We created a scripted chatbot, and implemented some AI rules so the chatbot would be able to recognise certain phrases and reply with a relevant answer.
Figure 1: A diagram showing how we planned to implement the chatbot prototype.
1 Chatfuel. (2018). Relationship-based Messenger marketing. Retrieved October 15, 2018 from https://chatfuel.com/
An example of a dialogue the chatbot should support:
User: Hello
Bot: Hi, [first name]. Are you late for something?
User: Yes
Bot: Do you want help with an excuse?
User: Yes
Bot: Does this one work?
Bot: I'm so sorry. The public transport (metro, tram, bus) was delayed.
User: Yes
Bot: Okay, that's great! Glad I could help!
Bot: Good luck with your excuse!
User: Thanks!
Bot: You're welcome!
A list of examples of excuses gathered from the "Excuse Bank":
- There was construction right in front of my front door so I had to wait for the construction
workers to get done.
- I'm so sorry. The public transport (metro, tram, bus) was delayed.
- There were two dogs fighting on my way here so I had to help out the owners.
- I’m sorry, I misplaced the contact lens and couldn’t find it.
- I had a dream where I already was here, and then I woke up! I’m so sorry!
- I’m so so sorry. My pet got stuck in the toilet so I had to get him out.
- I was stuck in an elevator with a kid who pushed all the buttons.
- I'm so sorry! I got confused and drove to my elementary school, then I saw myself in the mirror and remembered I'm not 9 years old and I don't go to elementary school. I'm sure you can understand.
The AI rules that were implemented in Chatfuel:
- Phrases like “Hi” will trigger the response message “Hi, [first name]. Are you late for
something?”.
- Phrases like “Thanks” will trigger the response message “You're welcome!”.
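As a minimal illustration of how such keyword-triggered rules behave, here is a Python sketch. The rule table and function are our own invention for this report; Chatfuel implements the matching internally.

# Hypothetical re-implementation of the two trigger rules above.
# Note: naive substring matching; e.g. "hi" would also match inside "this".
RULES = {
    ("hi", "hello", "hey"): "Hi, {first_name}. Are you late for something?",
    ("thanks", "thank you"): "You're welcome!",
}

def reply(message, first_name):
    """Return the response of the first rule whose trigger occurs in the message."""
    text = message.lower()
    for triggers, response in RULES.items():
        if any(trigger in text for trigger in triggers):
            return response.format(first_name=first_name)
    return "Sorry, I didn't catch that. Are you late for something?"

print(reply("Hello!", "Linett"))   # -> Hi, Linett. Are you late for something?
print(reply("Thanks!", "Linett"))  # -> You're welcome!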
Figure 2: A diagram showing how we implemented our prototype, with four different excuses for each category (personal, work and school).
A.2 Reflections on the process
We started the design process with a discussion on what we wanted the purpose of the chatbot
to be. After some brainstorming, we decided to create an Excuse Bot. One reason for this was that we wanted the chatbot to have a narrow scope. After we had decided the
purpose of the chatbot, we identified the key tasks we wanted the chatbot to perform. We
decided that we wanted to divide the excuses into categories based on appointment type, and
we chose to start with three different appointment categories (school, work and personal). In
Chatfuel we created four different blocks of excuses linked to each category-block. We tested
the chatbot ourselves, and iterated several times. Because the excuses are drawn at random, each excuse has an equal likelihood of being shown. A weakness of the chosen design is that the same excuse can be drawn twice in a row; a fix for this is sketched below.
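A minimal Python sketch of how the repetition could be avoided (our own illustration; Chatfuel does not expose this logic):

import random

def pick_excuse(excuses, last=None):
    # Draw uniformly, but never return the same excuse twice in a row.
    candidates = [e for e in excuses if e != last]
    return random.choice(candidates)

work_excuses = [
    "I'm so sorry. The public transport (metro, tram, bus) was delayed.",
    "I was stuck in an elevator with a kid who pushed all the buttons.",
    "There were two dogs fighting on my way here so I had to help out the owners.",
]
first = pick_excuse(work_excuses)
second = pick_excuse(work_excuses, last=first)  # guaranteed to differ from the first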
Figure 3: Screenshots of our chatbot interacting with a user.
Appendix B
B.1 Initial script
We started the work process by running the initial script. There are two epochs in this script. According to Brownlee (2018), the number of epochs is a "hyperparameter of gradient descent that controls the number of complete passes through the training dataset"². The results after the second epoch were an accuracy of around 26% and a validation accuracy of around 1%. As we understand these results, the neural network hadn't learned very well.
# Imports needed by the snippet (an assumption; the handed-out script appears
# to be based on the standard Keras text-classification example)
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation

batch_size = 32   # samples per gradient update
max_words = 1000  # length of the bag-of-words input vector
epochs = 2        # complete passes through the training data

# Model: one 512-unit hidden layer with ReLU and dropout, then a softmax
# classifier; num_classes is derived from the training labels earlier in the script
model = Sequential()
model.add(Dense(512, input_shape=(max_words,)))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(num_classes))
model.add(Activation('softmax'))
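For context, accuracy and validation accuracy values like those reported in Figure 4 come from compiling and training the model roughly as follows. This is our reconstruction of the standard Keras training call, not the exact handed-out script; x_train and y_train stand for the vectorized texts and one-hot labels prepared earlier.

model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

history = model.fit(x_train, y_train,
                    batch_size=batch_size,
                    epochs=epochs,
                    verbose=1,
                    validation_split=0.1)  # reports both accuracy and validation accuracy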
Figure 4: These are the results from the program given out by Morten Goodwin.
2 https://machinelearningmastery.com/difference-between-a-batch-and-an-epoch/
B.2 Iterations on the script
We iterated on the script by increasing and decreasing different values, in addition to changing the types and number of layers. We then observed how the level of accuracy changed, and made further iterations to improve the results.
batch_size = 32
max_words = 1000
epochs = 5        # increased from 2 to 5

# Model: two much larger dense layers instead of the single 512-unit layer
model = Sequential()
model.add(Dense(5000, input_shape=(max_words,)))
# Note: Keras ignores input_shape on any layer after the first
model.add(Dense(7000, input_shape=(max_words,)))
model.add(Activation('relu'))
model.add(Dropout(0.3))  # lowered from 0.5
model.add(Dense(num_classes))
model.add(Activation('softmax'))
Figure 5: These are the results we got when using five epochs and two dense layers. Dense
layers are fully connected layers where all neurons in one layer are connected to those in the
next layer.
From the results, we can see that the deep neural net has learned the training data well. However, when tested on new data, the accuracy is low; in other words, the model is overfitting. This can be read from the high accuracy value (0.8122) and the low validation accuracy value (0.0200).
We increased the number of dense layers to two and increased their sizes from 512 to 5000 and 7000 units. We have no scientific reason for why we did this, other than that we wanted to experiment.
The time the machine used to run through the different epochs also increased drastically. This is mainly because of the much larger dense layers, but also because we changed the dropout value from 0.5 to 0.3. Dropout is a technique used to tackle overfitting. We tried decreasing the dropout to speed up the learning process, but at the cost of a less robust learning algorithm.
Appendix C
Rema 1000 Smart House commercial. Link: https://www.youtube.com/watch?v=sgJLpuprQp8
In the commercial, we see a highly automated home. Just about everything in the main character's life is run by voice commands to some computer. When he goes to the dentist and gets an anesthetic, his voice and pronunciation become slurred. He is then locked out of his home because the voice-controlled lock does not respond. The problem escalates when the VA misinterprets the voice commands, e.g. "Hey! Open the door!" becomes "Play: On the floor".
This problem could be solved by adding a combination lock to the door, or by having the possibility to override the system with a regular key. The problems shown in the commercial could easily have been discovered earlier by testing the system. The AI in the home could also have been designed to consider the context of a command; e.g. the likelihood of someone shouting "play on the floor" to a door lock is very low. A fuzzy search could also have noticed the similarity between the interpreted "play on the floor" and "hey, open the door".
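As a rough Python sketch of such a fuzzy comparison, the standard library's difflib can score string similarity; the idea of using the score to trigger a confirmation question is our own illustration:

import difflib

def similarity(a, b):
    # Ratio in [0, 1]; higher means the strings are more alike.
    return difflib.SequenceMatcher(None, a.lower(), b.lower()).ratio()

heard = "play on the floor"
intended = "hey, open the door"
print(similarity(heard, intended))  # about 0.69 - close enough to ask for confirmation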
Appendix D

Introduction: Sorry for our long appendix D. We had too much fun with Scenario 2.
D.1 Scenario 1: Automation Level 7
Level 7: The computer generates recommended options; the human decides (or inputs their own choice) and the system carries out the action.
We propose a system where the intelligent agent gathers data through a normal recruitment process, but instead of HR reading résumés and application letters, the AI does this. The AI then recommends to HR the candidates that are most suitable for an available position. HR then invites the candidates for an interview and follows the normal hiring routine.
The advantage of such a system is that it can help humans make better decisions while the human remains in control. A problem with this form of AI is that it can be biased due to its training data. There was recently a big news story about Amazon's hiring AI being unfavourable towards women because it had combed through male-dominated résumés to accrue its data³. By involving humans in the hiring process, such biases can be dealt with.
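A toy Python sketch of the level-7 division of labour (the candidate data and scoring rule are entirely made up): the system only produces a ranked shortlist, and the hiring decision stays with HR.

# Toy illustration of automation level 7: the system recommends, a human decides.
applicants = [
    {"name": "Candidate A", "years_experience": 4, "matching_skills": 7},
    {"name": "Candidate B", "years_experience": 1, "matching_skills": 2},
    {"name": "Candidate C", "years_experience": 6, "matching_skills": 5},
]

def fit_score(candidate):
    # A deliberately naive score; a real system would be far more complex
    # and would need auditing for bias in its training data.
    return candidate["years_experience"] + candidate["matching_skills"]

shortlist = sorted(applicants, key=fit_score, reverse=True)[:2]
print([c["name"] for c in shortlist])  # HR reviews this list and decides whom to invite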
D.2 Scenario 2: Automation Level 10
Level 10: The computer acts autonomously, ignoring the human.
We propose a system where the intelligent agent hires people based on big data gathered from the Internet and IoT-connected devices. No one has to manually apply for a job, because another AI program searches for people who are looking for jobs. This AI looks for companies that a person may want to work for, based on manually configured preferences and big data.
3 https://nordic.businessinsider.com/amazon-built-ai-to-hire-people-discriminated-against-women-2018-10?r=US&IR=T
The recruitment AI, on the other hand, is constantly searching for profiles matching the needs of
the company that is hiring. When the recruitment AI makes a match with a job-searching AI, a
person is automatically hired. The person gets a notification and the ability to approve or decline the given position. Should the person accept the new job, they get a rudimentary fill-in about the specific workplace and can simply walk into their new workplace.
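By contrast with the level-7 sketch above, at level 10 the matching and hiring step itself runs without a human decision. A toy sketch (all data and the match function are invented):

# Toy illustration of automation level 10: the system hires on its own.
jobs = [{"title": "Developer", "needs": {"python", "keras"}}]
candidates = [
    {"name": "Candidate A", "skills": {"python", "keras", "sql"}},
    {"name": "Candidate B", "skills": {"java"}},
]

def match(job, candidate):
    # Overlap between required and offered skills.
    return len(job["needs"] & candidate["skills"])

for job in jobs:
    hired = max(candidates, key=lambda c: match(job, c))
    # No human approves this step; the person is only notified afterwards
    # and may then accept or decline.
    print(f"Hired {hired['name']} as {job['title']}")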
Because of the highest level of automation, the AI also provides a "stock-video" presentation of the new employee to the company. It gathers pictures and personal details from social media and automatically produces a bite-sized video for the company to watch and "get to know their new employee".
The advantage of such a system is that it frees up time for the human employees. It is also possible that it will hire better-suited candidates than a human recruiter would. A disadvantage of such a system is that co-workers and supervisors know essentially nothing about the new employee before they start work. It can also be considered strange to have new people just walking in to work, never having met anyone there before.
A possible problem with this scenario is that the company is constantly changing employees. This might cause current workers to feel uncertain about their job security, which can create a bad work environment. Taking this further, we could argue that this constant uncertainty impacts stress levels and leads to mental health problems or fatigue. That, in turn, could impact employees' engagement and motivation at both extremes of the spectrum: either losing interest in work life, or maximizing their efforts beyond healthy levels to keep their job.
A solution to this problem could be to give the AI rules and restrictions on how to manage employees, e.g. that the AI will only look for new employees if an access card is scheduled for deactivation.