A DELOITTE SERIES ON CONVERSATIONAL AI

A conversational journey
How the three Ts of conversational AI build better voice assistants

Vatatmaja, Jyotirmay Gadewadikar, Sherry Comes, and Timothy Murphy

THE DELOITTE CENTER FOR INTEGRATED RESEARCH



IN THE FIRST article of our conversational AI series, we explored how the proliferation of voice assistants and messaging platforms is giving way to a new era of user interfaces (see the sidebar, “A five-part series on conversational AI”). Whether it’s in the car, on a phone, or in a smart home device, nearly 112 million US consumers rely on their voice assistants at least once a month, and that number continues to grow.1

Yet the popularity of voice assistants isn’t without its growing pains. These can range from the mundane, such as misinterpreting a request to order a roll of paper towels, to the more troubling error of providing a harmful health recommendation (or conversely, providing an accurate but difficult-to-interpret recommendation).2 Despite the uptick in adoption of voice-enabled virtual assistants, designing effective products is a nontrivial endeavor. Virtual assistants often deal with multiple, sometimes complex scenarios that require understanding a range of queries to which users expect a quick, accurate, and easily interpretable response.

In our experience, designing an intuitive and effective voice assistant is not as straightforward as combining structured and unstructured data with powerful AI capabilities such as natural language processing (NLP) and machine learning. Instead, virtual voice assistants require designers to match their technical capabilities and resources with human intuition and oversight. Voice assistant design is both art and science. This means incorporating sociological and geographical factors (such as accounting for regional accents), and simultaneously ensuring these voice assistants are properly calibrated to deliver messages in a conversational manner (e.g., proper tone and tenor). In this article, we explore “three Ts” of designing dynamic and flexible voice assistants: training, testing, and tuning.

The popularity of voice assistants is on the rise, but learning to design an intuitive and effective model is still a work in progress. How can organizations use the three Ts (training, testing, and tuning) to create more human-like voice assistants?

A FIVE-PART SERIES ON CONVERSATIONAL AI

Over the next year, we will discuss the implications and use cases of conversational AI. In this chapter, we discuss three Ts to developing effective voice assistants. In our remaining chapters, we leverage secondary research and case studies to explore the following topics:

Conversational AI makes its business case: The initial chapter of this series breaks down what constitutes conversational AI and the myriad ways companies can leverage its capabilities.

Acoustic authentication: Explains how conversational systems can enhance security protocols by integrating voice into the multiauthentication process.

Industry use cases: Highlights how virtual assistants appear to be changing the face of customer service in banking, technology, and health care.

The liability of conversational systems: Explores how the more we integrate conversational bots into our work and lives, the more we should take steps to understand their liability in terms of insurance, training, auditing, and the ethical implications.

Training the voice assistant: Matching human need with AI capabilities

There’s a paradox to designing voice assistants. While these assistants are underpinned by advanced AI and NLP capabilities, AI is only “smart” in a very narrow sense: It is most effective at solving well-defined problems.3 But consider the nature of a conversation: It’s free-flowing, words and turns of phrase can take on multiple meanings based on context and tone, and at a moment’s notice, we can jump from one topic to another. So how do designers marry an expansive need, conversational interaction, with a traditionally narrow solution?

Human-assisted trainers. A common misperception is that voice assistants need to be everything to everyone. Instead, most are asked to perform relatively specific tasks, such as responding to routine call center issues or helping people select an artist from their music library. With this in mind, designers can benefit from working directly with stakeholders to identify requirements and goals. At its core, this means solving well-defined problems that are easily tied to productivity measures (e.g., an airport voice assistant can measure how quickly and accurately it resolves customer queries).

In some of our earlier research, we found some of the best systems are designed directly with the communities that will interact with the AI solutions.4 That is, they benefit from making the human the focal point of the design process (also referred to as keeping the “human in the middle”).

In the call center example, this means working with and observing how call center employees interact with customers. What are the routine inquiries? Are there more complex asks that trip employees up? When does confusion arise between employees and customers?

Understanding these common challenges empowers designers to map a high-level process flow of the call fulfillment process. As demonstrated in figure 1, these mappings create the underlying foundation for recording and organizing calls into a manageable data set populated with keywords and phrases.

FIGURE 1

Transcribing call center conversations for model training

< New call >: An indicator variable to track the start of a new call
< Customer utterance >: Denotes the utterances spoken by the caller
< Business issue/intent >: Denotes the caller’s intent
< Agent response >: The response returned by the human agent to the caller’s utterances

Source: Deloitte analysis.


Admittedly, figure 1 simplifies the data structure, but once designers have properly categorized these conversations, millions of recorded conversations can be transcribed to text and processed through mappings similar to this example.
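As a rough illustration of the mapping in figure 1, the four fields could be held in a small record type. This is only a sketch: the class name, field names, intent labels, and sample rows below are hypothetical and do not come from the article.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class TranscriptRow:
    """One row of a transcribed call, mirroring the four fields in figure 1."""
    new_call: bool            # indicator: True on the first row of a call
    customer_utterance: str   # what the caller said
    intent: str               # business issue/intent assigned to the utterance
    agent_response: str       # what the human agent replied


def group_into_calls(rows: Iterable[TranscriptRow]) -> list[list[TranscriptRow]]:
    """Split a flat stream of rows into calls using the new-call indicator."""
    calls: list[list[TranscriptRow]] = []
    for row in rows:
        if row.new_call or not calls:
            calls.append([])
        calls[-1].append(row)
    return calls


# Hypothetical sample data, invented for illustration only.
rows = [
    TranscriptRow(True, "I lost my card", "report_lost_card", "I can block it for you."),
    TranscriptRow(False, "Yes, please block it", "confirm_block", "Done. A new card is on its way."),
    TranscriptRow(True, "What's my balance?", "check_balance", "Let me look that up."),
]
calls = group_into_calls(rows)
print(len(calls))  # number of separate calls found in the stream
```

Keeping the new-call indicator on each row, rather than nesting rows under a call object from the start, matches the flat layout of the figure and makes it easy to load rows from a spreadsheet-like export.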

Training the right data for your AI solution. After designers map the high-level process flow, numerous data sources are processed to train the voice assistants. This starts with transcribing voice data to text and parsing it into “human utterances.” Utterances are stretches of speech separated by pauses in conversation, and they range from single words to clauses to complete sentences. As seen in figure 1, utterances can be structured into business issues and resolutions.
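One way to picture the parsing step is a pause-based splitter over word timestamps. The sketch below assumes the transcription step yields a start and end time for each word and uses an arbitrary 0.6-second pause threshold. Both the input shape and the threshold are assumptions made for illustration, not details from the article.

```python
from typing import NamedTuple, Optional


class Word(NamedTuple):
    text: str
    start: float  # seconds from the start of the recording
    end: float


def split_into_utterances(words: list[Word], min_pause: float = 0.6) -> list[str]:
    """Start a new utterance whenever the silence before a word is at least min_pause."""
    utterances: list[list[str]] = []
    previous_end: Optional[float] = None
    for word in words:
        if previous_end is None or word.start - previous_end >= min_pause:
            utterances.append([])
        utterances[-1].append(word.text)
        previous_end = word.end
    return [" ".join(u) for u in utterances]


# Hypothetical timestamps, invented for illustration only.
words = [
    Word("hi", 0.0, 0.3),
    Word("I", 1.2, 1.3),
    Word("lost", 1.35, 1.6),
    Word("my", 1.65, 1.8),
    Word("card", 1.85, 2.2),
    Word("yesterday", 3.1, 3.7),
]
utterances = split_into_utterances(words)
print(utterances)
```

The result shows the range the text describes: a single word, a clause, and another single word, each separated by a long pause.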

After transforming the unstructured text into structured utterances, machine learning techniques, such as clustering analysis, create incredibly granular groupings within the data to uncover common patterns in the conversation. At this point, more supervised algorithms provide confidence scores that subject matter experts can validate and, when appropriate, use to correct machine learning conclusions. Taken together, putting humans in the middle, coupled with machine learning, creates foundational insights that inform these prospective voice assistants.
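The human-in-the-middle step described above can be sketched as a simple routing rule: predictions whose confidence score clears a threshold are accepted, and the rest are queued for a subject matter expert. The 0.8 threshold, the class and function names, and the sample data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    utterance: str
    intent: str        # label proposed by the model
    confidence: float  # model's score in [0, 1]


def route_for_review(predictions, threshold=0.8):
    """Accept confident predictions; queue the rest for a subject matter expert."""
    accepted, needs_review = [], []
    for p in predictions:
        (accepted if p.confidence >= threshold else needs_review).append(p)
    return accepted, needs_review


def apply_corrections(needs_review, corrections):
    """Replace the model's label with the expert's label where one was supplied."""
    return [
        Prediction(p.utterance, corrections.get(p.utterance, p.intent), p.confidence)
        for p in needs_review
    ]


# Hypothetical model output and expert feedback, invented for illustration only.
predictions = [
    Prediction("I lost my card", "report_lost_card", 0.95),
    Prediction("card thing broke", "check_balance", 0.41),
]
accepted, queue = route_for_review(predictions)
reviewed = apply_corrections(queue, {"card thing broke": "report_damaged_card"})
print([p.intent for p in accepted + reviewed])
```

In practice the threshold would be tuned against how much expert time is available and how costly a wrong intent is for the use case.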

Testing the voice assistant: Uncovering the many dimensions of “accuracy”

Testing a conversational system, such as a voice assistant, involves more than ensuring that business issues are correctly mapped to resolutions. As many of us know from our own experiences, one-to-one conversations can easily be misinterpreted. If we aren’t familiar with an accent, we may misunderstand a question, and if we are speaking to someone from a different geographical location, words can take on different meanings (for instance, “chaps” can mean good friends or something a cowboy wears). Conversational systems are no different, except that, unlike us, they lack the ability to understand context.

For these reasons, designers should build quality assurance metrics that stress-test their models across a number of user personas, including:

• Variations in geography. As in the examples above, this consists of validating that the system can accurately interpret accents and contextually understand keyword meanings across groups. Taking this further, it may mean testing the model across multiple languages.

• Historical contexts. The models typically work best when they incorporate past conversations. If a prior resolution didn’t properly address an issue, then it can come off as tone-deaf if the model recommends the same solution again.

• Adaptable to real-life situations. Voice assistants benefit from testing in real-world situations. For instance, can the voice assistant cut through the background noise of the morning commute on the subway?

• Behavioral modeling. How we say something affects how people act on it, and conversational systems don’t naturally have good bedside manners (e.g., telling someone they have a low balance on their checking account can probably benefit from a delicate delivery). It’s on the designer to ensure the responses are delivered in a natural and pleasant manner that users will be open to accepting.

All four dimensions show the importance of uncovering and accounting for implicit bias. If the algorithm doesn’t understand a specific accent, it may have been trained on a biased data set. In this case, the designers should work back to the training data to create a more inclusive design. Fortunately, the testing process can help bring these issues to light.
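A minimal sketch of how persona-based stress testing might surface such gaps: compute intent accuracy separately for each persona and flag any persona that trails the best-served one. The persona names, the 0.1 gap threshold, and the test results are hypothetical.

```python
from collections import defaultdict


def accuracy_by_persona(results):
    """results: iterable of (persona, expected_intent, predicted_intent) tuples."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for persona, expected, predicted in results:
        totals[persona] += 1
        correct[persona] += int(expected == predicted)
    return {persona: correct[persona] / totals[persona] for persona in totals}


def flag_gaps(accuracy, max_gap=0.1):
    """Return personas whose accuracy trails the best-served persona by more than max_gap."""
    best = max(accuracy.values())
    return sorted(p for p, acc in accuracy.items() if best - acc > max_gap)


# Hypothetical test results, invented for illustration only.
results = [
    ("accent_a", "check_balance", "check_balance"),
    ("accent_a", "lost_card", "lost_card"),
    ("accent_b", "check_balance", "check_balance"),
    ("accent_b", "lost_card", "check_balance"),
    ("noisy_subway", "check_balance", "check_balance"),
]
accuracy = accuracy_by_persona(results)
flagged = flag_gaps(accuracy)
print(accuracy, flagged)
```

A flagged persona is a prompt to revisit the training data for that group, not proof of bias on its own, since small test sets produce noisy accuracy figures.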

Tuning for humans: Making conversations flow naturally

Voice assistants do not have to pass as humans, but they should be able to communicate in a pleasant and interpretable manner. In this spirit, designers can improve upon their voice assistants by tuning their models with a more natural delivery. Tuning a voice assistant includes:

• Pronunciation. Designers should build a pronunciation dictionary that standardizes the speech of recurring words.5 This reinforces the importance of focusing the goal of each voice assistant design to ensure a more manageable universe of words.

• Pauses. How pauses are deployed, both in their placement and duration, influences how natural a conversation sounds.6

• Pitch and pace. Since many languages, such as English, are nontonal, the pitch and pace of words and sentences often convey a speaker’s feelings.7 For instance, a rising intonation in the middle of a sentence indicates a speaker isn’t done talking, even if it’s followed by a pause. Further, a fast speaking pace can represent excitement, while a slower pace may indicate a more relaxed feel.

These natural changes in prosody work in concert to make conversations more natural and inviting. And with the help of virtual assistants, designers can deliver helpful conversations at scale.
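The article does not name a tool for tuning, but one common way to express pronunciation, pauses, pitch, and pace is Speech Synthesis Markup Language (SSML), a W3C standard. The sketch below is an assumption-laden example rather than the authors’ method: the dictionary entry, function name, and default values are hypothetical, and support for individual SSML elements varies by speech engine.

```python
from xml.sax.saxutils import escape, quoteattr

# Hypothetical pronunciation dictionary: written form -> how it should be spoken.
PRONUNCIATIONS = {"APR": "annual percentage rate"}


def to_ssml(sentences, pause_ms=300, rate="medium", pitch="medium"):
    """Wrap sentences in SSML with substitutions, pauses, and prosody settings."""
    parts = []
    for sentence in sentences:
        words = []
        for word in sentence.split():
            core = word.strip(".,!?")  # look up the word without trailing punctuation
            if core in PRONUNCIATIONS:
                spoken = f"<sub alias={quoteattr(PRONUNCIATIONS[core])}>{escape(core)}</sub>"
                words.append(escape(word).replace(escape(core), spoken, 1))
            else:
                words.append(escape(word))
        parts.append(" ".join(words))
    pause = f'<break time="{int(pause_ms)}ms"/>'  # explicit pause between sentences
    body = pause.join(parts)
    return f"<speak><prosody rate={quoteattr(rate)} pitch={quoteattr(pitch)}>{body}</prosody></speak>"


ssml = to_ssml(["Your APR is 4 percent.", "Anything else?"], pause_ms=400, rate="slow")
print(ssml)
```

Keeping the pronunciation dictionary small and tied to the assistant’s business goal echoes the point above about a manageable universe of words.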

The ever-improving assistant

Building an accurate and natural voice assistant is an iterative process. While it starts with training, it doesn’t simply move on to testing and then end with tuning. Instead, each part of the process builds and iterates on the other. Implicit biases can occur during training, but testing can help designers uncover and address these biases. Likewise, if tuning reveals that pauses are inappropriate, the training data should be restructured to properly account for these natural breaks in conversation.

When designing your own voice assistants, remember:

1. The business objective should dictate the design.

2. Training, testing, and tuning are a dynamic process, with each step informing the other.

3. The work is never done. This is an iterative process, where each version informs and improves upon future releases.

By establishing a well-articulated goal, designers can continually improve upon their voice assistants to sound a bit more human with each iteration.


Endnotes

1. Victoria Petrock, “US voice assistants users 2019: Who, what, when, where and why,” eMarketer, July 15, 2019.

2. Lauren Goode, “Your voice assistant may be getting smarter, but it’s still awkward,” Wired, December 27, 2018.

3. Jim Guszcza, “Smarter together: Why artificial intelligence needs human-centered design,” Deloitte Review 22, January 22, 2018.

4. Dr. Scott Pobiner and Timothy Murphy, From smart products to smart systems: The importance of participatory design in the age of artificial intelligence, Deloitte Insights, December 11, 2018.

5. Pearl, 2016.

6. Cohen, 2004.

7. Reedy, 2015.

Acknowledgments

The authors would like to thank Scott Pobiner of Deloitte Consulting LLP for his contributions to this series.


About the authors

Vatatmaja

Vatatmaja is a specialist leader for Deloitte in the Applied AI group. He is a quintessential IT professional with a current focus on cognitive computing, AI, deep learning, and emerging technologies, applying this knowledge to find interesting business solutions that improve productivity measures. He has frequently synthesized and recognized abstract patterns, facts, theories, trends, inferences, relationships, key issues, and themes in complex and variable unrelated situations when solving client business problems. Prior to joining Deloitte, he spent nearly 20 years with IBM, championing and leading teams in the area of enterprise integration, mobile, and IBM Watson.

Jyotirmay Gadewadikar

Jyotirmay Gadewadikar is a manager at Deloitte in the Applied AI group. He helps enterprises make strategic decisions with AI and analytics and is a recipient of the Department of Homeland Security’s Scientific Leadership Award. Gadewadikar has led global teams of data scientists, business analysts, software developers, and client stakeholders to conceptualize, design, and implement AI-enabled customized solutions through the analysis of available technology platforms, evangelization of supervised and unsupervised machine learning algorithms, and natural language processing and understanding methods.

Sherry Comes

Sherry Comes is a managing director at Deloitte’s Applied AI group, specializing in the areas of voice solutions, AI, NLP, sentiment analysis, analytics, data science, and ML. Her innovative approach has resulted in her receiving many innovation awards, and leading and being an integral part of many groundbreaking advancements, such as being the first person to bring AI solutions to Africa as a Distinguished Engineer at IBM Watson. Comes has done extensive work around creating voice virtual assistants in the financial services industry and has a number of patents in her name. She started her career in research, working for the National Center for Atmospheric Research, and has held management and senior IT and executive positions at several companies, including Genpact, Century Link Telecommunications, Advanced Micro Devices, and IBM.

Timothy Murphy

Tim Murphy is a researcher and analytical scientist at Deloitte Services LP, developing thought leadership for Deloitte’s Center for Integrated Research. His research focuses on the managerial implications of the behavioral sciences within the workforce and the marketplace.


Contact us

Our insights can help you take advantage of change. If you’re looking for fresh ideas to address your challenges, we should talk.

Practice contacts

Sherry Comes
Managing director, Deloitte Consulting LLP | Applied AI group, Conversational AI leader
+1 720 325 3757

Sherry Comes is a managing director at Deloitte’s Applied AI group, specializing in the areas of voice solutions, AI, NLP, sentiment analysis, analytics, data science, and ML.

Vatatmaja
Specialist leader, Analytics and Cognitive | Deloitte Consulting LLP
+1 816 802 7207

Vatatmaja is a specialist leader for Deloitte in the Applied AI group. A quintessential IT professional, he is currently focused on cognitive computing, AI, DL, and emerging technologies, applying this knowledge to find interesting business solutions that improve productivity measures.

Industry Center contact

Tim Murphy
Senior manager, Center for Integrated Research | Deloitte Services LP
+1 414 977 2252

Tim Murphy is a researcher and analytical scientist at Deloitte Services LP, developing thought leadership for Deloitte’s Center for Integrated Research. His research focuses on the managerial implications of the behavioral sciences within the workforce and the marketplace.


Deloitte Analytics and AI

Achieving your business outcomes, whether a small-scale program or an enterprisewide initiative, demands ever-smarter insights—delivered faster than ever before. Doing that in today’s complex, connected world requires the ability to combine a high-performance blend of humans with machines, automation with intelligence, and business analytics with data science. Welcome to the Age of With, where Deloitte translates the science of analytics—through our services, solutions, and capabilities—into reality for your business.

About the Deloitte Center for Integrated Research

Deloitte’s Center for Integrated Research focuses on developing fresh perspectives on critical business issues that cut across industries and functions, from the rapid change of emerging technologies to the consistent factor of human behavior. We look at transformative topics in new ways, delivering new thinking in a variety of formats, such as research articles, short videos, in-person workshops, and online courses.


About Deloitte Insights

Deloitte Insights publishes original articles, reports and periodicals that provide insights for businesses, the public sector and NGOs. Our goal is to draw upon research and experience from throughout our professional services organization, and that of coauthors in academia and business, to advance the conversation on a broad spectrum of topics of interest to executives and government leaders.

Deloitte Insights is an imprint of Deloitte Development LLC.

About this publication

This publication contains general information only, and none of Deloitte Touche Tohmatsu Limited, its member firms, or its and their affiliates are, by means of this publication, rendering accounting, business, financial, investment, legal, tax, or other professional advice or services. This publication is not a substitute for such professional advice or services, nor should it be used as a basis for any decision or action that may affect your finances or your business. Before making any decision or taking any action that may affect your finances or your business, you should consult a qualified professional adviser.

None of Deloitte Touche Tohmatsu Limited, its member firms, or its and their respective affiliates shall be responsible for any loss whatsoever sustained by any person who relies on this publication.

About Deloitte

Deloitte refers to one or more of Deloitte Touche Tohmatsu Limited, a UK private company limited by guarantee (“DTTL”), its network of member firms, and their related entities. DTTL and each of its member firms are legally separate and independent entities. DTTL (also referred to as “Deloitte Global”) does not provide services to clients. In the United States, Deloitte refers to one or more of the US member firms of DTTL, their related entities that operate using the “Deloitte” name in the United States and their respective affiliates. Certain services may not be available to attest clients under the rules and regulations of public accounting. Please see www.deloitte.com/about to learn more about our global network of member firms.

Copyright © 2019 Deloitte Development LLC. All rights reserved. Member of Deloitte Touche Tohmatsu Limited

Deloitte Insights contributors
Editorial: Rithu Thomas, Rupesh Bhat, Abrar Khan, and Preetha Devan
Creative: Sonya Vasilieff and Emily Moreano
Promotion: Ankana Chakraborty
Cover artwork: Neil Webb
