
Source: psyciq.apa.org/wp-content/uploads/2018/04/05-Models-and...



Dr. Cassell: It has to be fate. After last night's keynote and this morning's keynote, I decided I had to totally rewrite my talk, and I did. It was way edgier than the original version. The second slide said: do we really want to espouse a vision of a white man sitting alone at his computer doing math problems? Then this wild thing happened: the file just won't save. We used it earlier and it was fine, and now it won't save. Apparently I'm not supposed to give that talk, so I've gone back to my terribly tame talk, but I'm still going to start with a short riff on history.

The history that you've heard so far is very cognitive, and you heard from Eric this morning about Lick, and it's true that Lick did play a role in the history of AI. But the truth is that in the early 1950s, two parallel series of lectures and workshops were held to talk about designing a new science, called artificial intelligence by one group and cybernetics by the other. Both groups invited the same young mathematicians, and for those of you who know anything about artificial intelligence, these names are going to be super familiar: Marvin Minsky, John McCarthy, Claude Shannon, Allen Newell.

They went to both lectures. One was called the Dartmouth Workshop, and it took place over eight weeks on the Dartmouth campus, which is where John McCarthy was teaching. It was all of those mathematicians and one psychologist, Herb Simon, who later got a Nobel prize and a Turing award, which is computer science's Nobel prize, so no idiot. In fact, in typical Carnegie Mellon style, he and his colleague Allen Newell, both from Carnegie Mellon, and in typical psychological style, were the only ones who during that period built a working system. Everybody else had a lot of great ideas; they just couldn't turn them into something that worked.

Newell and Simon actually came to the meeting with a working model of human learning running on a computer, but that's not what I want to talk about here. What I want to talk about is that that group is what we remember as the birth of AI, but those same mathematicians, between 1946 and 1956, so before the Dartmouth Workshop, attended a series of workshops called the Macy Lectures that took place in New York. The Macy Foundation funded these. The same mathematicians attended, but the rest of the attendees were much broader in the fields they represented. Margaret Mead was there, anthropologist. Her husband, Gregory Bateson, was there, anthropologist, cultural critic. Several psychologists were there, including Lick, who was a psychologist.

Whereas the theme of the Dartmouth Workshop was, our premise is that a perfect model can be built of human intelligence, including all aspects of it, such as learning and all other features of human intelligence, such that a machine may think like a human, the second one was much broader. One of the topics was ... and I don't have this in front of me, this is what I want to show you ... anthropology's digital and analog correlates. Cool. And cybernetics, and ethnicity, and culture's role in learning and how to model it with computers. Now, have you ever heard of this history of AI?

05 Models and Implementations of Social Skills in Virtual Humans (wit...Transcript by Rev.com


No. Isn't it interesting that the broader, cross-disciplinary version has been shrouded in history? We're not going to let that happen anymore, because over the last 10 years, what I've been doing is trying to bring in a much broader understanding of human intelligence and human behavior and use that as the basis of designing machines. The way I do that is ... I just read this last night: Morris Halle, a very famous phonologist who was Noam Chomsky's closest colleague at MIT, just died two days ago, sadly, in his 90s. In one of his memorial speeches, someone said Morris Halle believed in the virtuous interaction of empirical coverage, descriptive insight and theoretical elegance. I thought, when I die, may someone say that about me.

I thought I would put it here, write it down, and when I die could someone please bring this back up again. Okay, so what I've been looking at is ... and this is not surprising, because I was on Cynthia Breazeal's dissertation committee ... social interaction, and I'm going to tell you more about that, and I do it in a very different way than Cynthia does. I want to start by showing a clip, without a theoretical definition or an empirical model, just so you get a sense of what I'm talking about. This is a child interacting with a virtual peer. Our first virtual peer was built in 2000 and was presented at CHI, the Human-Computer Interaction conference, in 2001.

We've used it since then for a number of purposes: for working with children with autism and Asperger's, for working with curiosity, a number of different things, and I'm not, unfortunately, going to tell you about the autism work. This particular clip comes from some of my work on ethnicity, looking at ethnicity and identity, looking at dialect use as an index, an important index, of community affiliation, how that might affect learning in the classroom, and how we can use that to build educational systems that don't suppress identity but celebrate it. This is the only part about that that you're going to see. I just want to play this clip.

This kid has been given no introduction, except: you're going to work with a virtual peer on the science task that you're doing in class, and first the two of you can brainstorm and then you can practice your presentation to the teacher. Two conditions. In one condition, the virtual peer spoke African American Vernacular English for the first task and then said to the child, "Oh, my teacher likes it when I use my school English," and switched into what we call mainstream American English. It's not a real dialect, it doesn't exist in nature, but it exists in classrooms. In the second condition, the virtual peer used mainstream American English all the way through, but the words were identical, the content was identical. Okay, here, I just want you to look at the nature of the interaction.

Speaker 1: Alex.

Dr. Cassell: Can you raise the sound?


Speaker 1: Alex. You'd be the teacher and I answer questions.

Speaker 2: Okay. First question, do you think it needs to be high up or low?

Speaker 1: I think this need to be a little low but I think it's a little bit high.

Speaker 2: Why?

Speaker 1: Because, if it is really, really low, you're not going to be able to see it but if it's really, really high, you're not going to be ... it's going to touch the clouds and it's going to touch the moon.

Speaker 2: I think the bridge is going to fall over if it's too high up.

Speaker 1: Best is less. Next question.

Speaker 2: Good answer. Do you think it's better if you have more little beams or fewer thicker beams?

Speaker 1: Small little beams.

Speaker 2: Could you give more information?

Speaker 1: Yes, I can. I'm an expert ...

Dr. Cassell: Okay, you can stop the video. I think you get the sense. As you can tell, the child does not like the virtual peer, is not engaged with the task. No, on the contrary: the child is super engaged, is demonstrating what we would call in everyday language rapport, and is showing some kind of bond with that agent. In fact, our results showed ... so we were looking at dialect ... our results showed that when the agent spoke African American English in the first task and then mainstream American English, the kids learned more, and there was a transfer effect several days later. However, when we took a closer look at the data, when we ran a logistic regression on the data, we found that that was entirely mediated by rapport.
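As a toy illustration of what "entirely mediated by rapport" means statistically: the raw condition effect on learning disappears once you hold rapport constant. The numbers below are invented, and simple stratified means stand in for the actual logistic regression.

```python
import statistics

# Hypothetical per-child records: study condition, whether independent
# raters judged the video as showing rapport, and learning gain.
# (Illustrative toy numbers only -- not the study's actual data.)
records = [
    # (condition, rapport, learning_gain)
    ("AAVE-then-MAE", True, 0.8), ("AAVE-then-MAE", True, 0.7),
    ("AAVE-then-MAE", False, 0.2), ("MAE-only", True, 0.75),
    ("MAE-only", False, 0.25), ("MAE-only", False, 0.15),
]

def mean_gain(rows):
    return statistics.mean(g for _, _, g in rows)

# Raw condition effect: the dialect-switching condition gains more...
aave = [r for r in records if r[0] == "AAVE-then-MAE"]
mae = [r for r in records if r[0] == "MAE-only"]
print(round(mean_gain(aave) - mean_gain(mae), 3))  # positive raw effect

# ...but stratified by rapport, the within-stratum condition gap vanishes,
# which is the signature of mediation by rapport.
for rapport in (True, False):
    a = [r for r in aave if r[1] == rapport]
    m = [r for r in mae if r[1] == rapport]
    print(rapport, round(mean_gain(a) - mean_gain(m), 3))
```

In this constructed example the condition only changes how often rapport arises; given rapport (or its absence), learning is the same in both conditions.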

The children whose videos were objectively and independently rated as having rapport were the ones who learned the most and showed that transfer effect. That's really interesting. This is a demonstration of something that we all demonstrate every single minute, and I've been watching it around me because I can't not watch it. When we meet someone, we don't just give them information. We don't just pursue what you might call propositional goals. We also pursue conversational goals ... or interactional goals. We make our interactions smooth, we take turns. Some of us, like the ones from New York, for us, interaction means talking at exactly the same time.


For others of us, who may have grown up in California, or even worse, Iceland, it means nodding carefully for almost three seconds, which is interminable for a New Yorker, and then replying. But there are also interpersonal functions, and we always do this. Even if we're at a bus stop, we'll say, "Nice boots," or, "Weather looks great; finally, I think spring is coming," and those are interpersonal goals. Over time, if we see those people over and over again, our interpersonal style will change. We will no longer say, "Nice weather we're having." We'll say, "Why are you wearing a raincoat? We're not going to get rain today. It looks totally dumb."

That kind of rapport, as you saw with that child, is not just window dressing. It actually improves task performance. Kids who learn with friends learn more. Survey respondents give higher ... more complete, actually ... answers to a survey, and more honest answers. Physicians who build rapport with patients in the exam room do better in studies. Sales, of course, works better when there's rapport, which is why people call you during dinner and say, "John, or Justin," at which point I always say, "This is Dr. Cassell, how can I help you," because my friends never start a conversation that way, so I know it's sales. Now, the methodology I use is very different from what you've seen today.

That's because I have a PhD in psychology, and so, while I'm fascinated by artificial intelligence and I build AIs, I'm at least as fascinated by human behavior, and the link between the two, for me, is like in 1952. These machines are ways of understanding human behavior, and that's why I first began building them. When I built the very first virtual human in 1993, '94, it was to investigate a hypothesis about the relationship between discourse and dialogue and nonverbal behavior, and then I kind of thought, these are cool. I'm not going back to my day job. I'm going to do this. But I still always start with human-human behavior and build a formal model that is a predictive model of the most rigorous sort.

I use that model to implement the algorithms that are going to make the virtual human work, use the virtual human to validate the model ... we also use other techniques to validate the model ... and then, if it works, we think about interventions that this virtual human could fulfill. In the last eight to 10 years, I've been thinking more about, initially, some of the issues around the notion of autonomy. For AI scientists, autonomy has been the holy grail, and it really started to bug me, because we're not autonomous. Why is it the holy grail, when human behavior demonstrates that we're so interdependent and not autonomous?

I started trying to think of how I could build machines where the unit of analysis is the dyad and not the individual. I love that challenge; people told me it was impossible, and that's also a real motivator for me. I also like the challenge of including time in my analysis, and I think it's something that we don't do in AI or in human-computer interaction. Often we say: they used it, they liked it, on to the next system. Think about Siri. Siri spends more time with you, probably, than anyone else in your life. Why is it the case, then, that every day Siri says, "Hi, I'm Siri, I'm your personal assistant"? No shit.


I know. So I wanted to see whether I could both use the dyad as the unit of analysis and use time, temporality, as part of the analysis, and then build time into a machine and build the dyad into a machine, and that's what I've been working on for the last 10 years. I start by observing human behavior in lots of different contexts, and I try to make them as ecologically valid as possible, and then my students and I analyze it at one-thirtieth-of-a-second granularity, with lots and lots of undergraduates in my lab, usually around 46 over this summer, doing annotation.

We're looking at everything. We're looking at eyebrow raises, we're looking at gaze shifts, we're looking at smiles, we're looking at torso shifts, and we're pairing them to what's said, to the acoustics of the voice, and so on. It gets us to look at really tiny phenomena, which are fascinating, like the difference between a true smile, which is called a Duchenne smile, and a regular smile. A regular smile looks like this. A Duchenne smile looks like this. It's differentiated by the muscles at the corners of the eyes. Or it gets us to look at the interaction between bodies. It turns out that over time we entrain, or coordinate, with one another.

That can be looked at as bidirectional, where we come together and start to act like one another, or unidirectional. My students and I have pioneered a technique to look at causality, and it turns out that when you have rapport with somebody else, that leads to increasing coordination of the body, and we look at tiny little gestures of the mouth. It turns out that when children work together ... we've been looking at teens doing peer tutoring of linear algebra, in a data set of 60 hours of data, which took us four years to annotate ... in that data set, we saw only three instances of praise, but we saw a lot of teasing and a lot of insulting.
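One simple way to picture directional entrainment between two annotated, frame-level behavior streams is lagged correlation: if one person's signal best predicts the other's a moment later, the influence looks unidirectional. This is only an illustrative stand-in, not the causality technique the lab actually pioneered; the signals and frame rate below are invented.

```python
# Pearson correlation of two equal-length sequences.
def corr(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den if den else 0.0

def lagged_corr(a, b, lag):
    """Correlate a[t] with b[t+lag]; lag is in frames (30 frames = 1 s)."""
    if lag >= 0:
        return corr(a[:len(a) - lag], b[lag:])
    return lagged_corr(b, a, -lag)

# Toy binary smile signals sampled at 30 fps: b echoes a, 3 frames later,
# so a "leads" b -- a unidirectional pattern.
a = [0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0]
b = [0] * 3 + a[:-3]

best = max(range(-5, 6), key=lambda L: lagged_corr(a, b, L))
print(best)  # lag (in frames) at which a best predicts b
```

A positive best lag here suggests a leads b; a symmetric peak around lag 0 would look more like bidirectional convergence.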

Now, you've all been teenagers, otherwise you wouldn't be adults, and so you can probably think back to how you indicated that you were friends with somebody. Now, the interesting thing to me is that computer scientists have been building tutoring systems for a long time now, and a colleague of mine was trying to build social interaction into a tutoring system, and rather than looking at teens, he sat in his armchair ... and he's very, very smart ... he sat in his armchair and thought to himself, "Okay, what's a social interaction? Politeness." Well, lo and behold, that didn't work so well.

Insulting works super well, and our results in a human-human experiment showed that those students who violated social norms through teasing and insulting learned more linear algebra, and those students who did not violate social norms, who stuck to the social norms, learned less. But the effect only worked for friends. When strangers insulted one another ... so it's this beautiful interaction ... it did not work at all. It actually led to less learning. After we've done these human-human experiments, that's when we begin to implement things, and you can see here a range of the kinds of virtual humans that I've been implementing over the last two and a half decades with my students.


After we've got that, then we evaluate. We were lucky enough to be able to demo some work at Davos, the World Economic Forum, in 2017. We had to pixelate the world leaders' facial expressions so you can't tell who they are, and we use that evaluation to always say to ourselves, "We forgot to look at X behavior in the human-human data set and it's not implemented in the agent-human data set. We have to go back to the data, because we never get it all." We look at a lot of different metrics. I told you some of the kinds of things that we annotate for, and here are some more. We annotate for a lot of different phenomena, because we don't want to take for granted that we know what rapport is.

Right now, I'm working with a postdoc on a model of the role of nonverbal behavior in rapport building and maintenance. This doesn't exist; no one has done this before. It has to include nonverbal behaviors that play with language, but some of them aren't going to, because of course our initial idea was: look at where people insult or use other kinds of rapport language, and then look at what they do with their bodies. But that's going to miss the ones that don't occur with language, and so we're obliged to code every single nonverbal behavior and then do thin-slice annotation, pioneered by Nalini Ambady and Bob Rosenthal, where you give participants, naïve annotators who don't know what you're working on, a 30-second clip.

You ask them to tell you whether there's rapport or not. From one to seven, they annotate it, and then they also tell you their confidence. With all of this, what are we ending up with? We're ending up with socially aware systems, and you saw some of these last night, but ours are quite different, because they're systems that run off a model based on human-human behavior. They build interpersonal rapport, and they also manage task goals and interactional goals in a particular task. We know they're going to work, because the same things that we do with humans ... and I know this after two and a half decades of doing this work ... we do with agents. Cynthia is well known for building super realistic systems, as she mentioned last night.
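The thin-slice bookkeeping just described can be sketched in a few lines: naive raters score each 30-second slice for rapport on a 1-7 scale and report a confidence, and the slices get a per-slice aggregate. The confidence-weighted average here is my assumption for illustration, not the published procedure, and the ratings are invented.

```python
from collections import defaultdict

# (slice_index, rater, rapport_1_to_7, confidence_0_to_1)
# Each slice_index names one 30-second clip of a session.
ratings = [
    (0, "r1", 6, 0.9), (0, "r2", 5, 0.6),
    (1, "r1", 2, 0.8), (1, "r2", 3, 0.5),
]

def slice_scores(ratings):
    """Confidence-weighted mean rapport score per 30-second slice."""
    by_slice = defaultdict(list)
    for idx, _, score, conf in ratings:
        by_slice[idx].append((score, conf))
    return {
        idx: sum(s * c for s, c in rs) / sum(c for _, c in rs)
        for idx, rs in by_slice.items()
    }

print(slice_scores(ratings))  # per-slice rapport estimates
```

The resulting time series of slice scores is the kind of independent evidence you can then correlate against the frame-level behavior codes.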

Systems that really look like humans: that's not what I'm after. Eric this morning said, "Forget the uncanny valley; who cares about the uncanny valley?" I care about the uncanny valley, because I don't want to build the most human-like machine. I want to build a machine that establishes, in the technical psychological sense, intersubjectivity with people. That is, one that evokes from people the most human-like responses in them. So it's not the virtual agent's humanness that concerns me; it's yours, it's people's. That means building tiny little protocols that, as Trevarthen showed with parent-infant pairs, unconsciously evoke from us the same kinds of behaviors: eye gaze shifts, head nods, torso moves. That's what interests me.

I'm going to show you some examples. These are some of the teens who are tutoring one another in linear algebra.


Female: I like XBOX games.

Female: I'm more of a PC fan myself.

Female: Do you play Call of Duty?

Female: Not really.

Dr. Cassell: It's painful to watch isn't it?

Female: What's your favorite video game?

Female: Either Minecraft or League of Legends.

Female: I love Minecraft.

Female: Yeah. I like this one.

Female: I like the new horse update.

Female: Yeah. Have you tried the Minecraft Hunger Games?

Female: Yeah.

Female: I love those.

Dr. Cassell: Okay, you can stop the video. Were these guys friends or strangers beforehand? Can you tell from their bodies, as well as from the fact that they don't know what kinds of video games each other play? Yeah. What kinds of cues do you think you're seeing? Upright, still ... yeah, super still. What else? Not leaning in, yup. Yeah, there are a lot of nonverbal cues, as well as the diminished range of prosody in the voice. These are the kinds of things that we saw, and the way we got this was by telling the kids that we had a problem with the camera. Actually, that's why I was late; you're all being recorded.

No, telling them that we had a problem with the camera and would they just kind of hang out for 10 minutes while we fixed it, and then they would tutor each other. A would tutor B, and then we'd say, we're going to give you a snack, and we'd leave them alone for 10 minutes, and then B would tutor A, so we got both social and task periods of time. Now, this is going to be quite different. Can you play this? You can tell by her facial expression already.

Female: I don't ... It's not supposed to be in the sun. I don't want to take it off because it'll ... Yeah, that's really swollen and ...

Dr. Cassell: The girl in the sundress is asking the other, why she is wearing a hat and the girl with the hat says, because I've got really pus-filled sores on my scalp.


Female: Do you think we'll get to use that computer thing, that computer game that you're ...

Female: [inaudible 00:27:45].

Female: It was really a good computer game.

Dr. Cassell: Okay, friends or strangers? Friends. You know because that's the rhetoric of a talk, but are there also embodied cues that fill you in? Leaning forward, yeah. Did you see the prosody, the hype? "I don't know, it's so confusing." It's my favorite moment in our data set, I think, that little clip. It's so teenage. The fact that she's looking at her armpit ... unfortunately, that's a one-way mirror, so the experimenters are back there rolling their eyes ... the fact that they're talking about pus-filled sores: all of these are violations of the social norms. Now, we take data like this and we build a quite sophisticated model, as I told you, because we use the dyad as the unit of analysis.

The model needs to consider the dyad, that is, the two people, as engaging in joint action, in the Herb Clark sense. They're doing one thing. It also has to be multilevel, and this ... I tell my students, if there's one thing they leave my lab with, they have to remember that visible behaviors allow us to infer underlying psychological states. They are not the same as underlying psychological states. Why do I say this? Because there's a lot of bad research out there that says, "Oh look, she's smiling. She's happy." Well, in fact, in the United States, we more often smile for reasons of embarrassment than happiness, and in other cultures it's actually pretty rare that smiles are more likely happiness than something else.

You can find a lot of bad work, especially among computer scientists who have sat in their armchairs and thought to themselves: smiles, I know, happiness. And it's not the case. It's multilevel: we code visible behaviors, and we correlate them with independent evidence for underlying psychological states, so there's no circularity, and the thin-slice annotation allows us to do that. Time is taken into account, so you see all the utterances there, and it's cross-modal, both verbal and nonverbal. This is a model of rapport that we've been developing over a number of years, and of course it's cross-modal, verbal and nonverbal.

It takes place over time, so over time you change the conversational strategies you use with other people, and from now on, I expect, your conversations with other people are going to be filled with little metacognitive moments where you say to yourself, "Is he self-disclosing? He is, isn't he? I suppose that would mean things are going well. Well, how about I try a little mutual self-disclosure," and so on. As time goes on, you'll begin to indicate, for example in a marriage, that you're a group of two. You'll smile at private jokes. You'll tease the other person in front of others, and so forth. This model is what we've implemented into agents, and I'm going to show you one of those agents.


Now, the model has to be made into something that's process-oriented, so that we can turn it into algorithms, and it has to be temporally sensitive. We took a fairly new machine learning algorithm called temporal association rules and made it real-time, so that in real time the agent can figure out what's likely to happen given what happened before. Here's a great example. If the kid who's tutoring at that given moment smiles, and then the other kid violates social norms, and then the first kid smiles, and the second kid also violates social norms: super high rapport. Now, let's look at the same thing in strangers. The first kid smiles, the second kid violates social norms.

They talk at the same time: really low rapport. These temporal association rules are cool because they're predicting the level of rapport based on the verbal and nonverbal events that come before. I'm not going to show you the architecture in much detail, just a sketch, but we've built three modules that no one was able, or maybe just wasn't interested, to build before. One is a module that classifies what we say into self-disclosure, praise, negative self-disclosure, violating social norms. One is a module that automatically, in real time, estimates the rapport between the agent and the human, taking into account not just the human's behavior but the agent's behavior. And the third is a module that takes all of that as input and decides what the best thing is to say next in order to build or maintain rapport.
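A toy sketch of how rules like the smile/norm-violation example might be applied to a dyad's event stream. The rule patterns and event names here are invented for illustration; the real system learns its rules from data and runs them in real time.

```python
def matches(pattern, events):
    """True if pattern occurs as an ordered (not necessarily
    contiguous) subsequence of the event stream."""
    it = iter(events)
    return all(p in it for p in pattern)

RULES = [
    # (ordered pattern of antecedent events, predicted rapport level)
    (("tutor_smile", "tutee_norm_violation",
      "tutor_smile", "tutee_norm_violation"), "high"),
    (("tutor_smile", "tutee_norm_violation", "overlap_talk"), "low"),
]

def predict_rapport(events):
    """Return the prediction of the first rule whose pattern fires."""
    for pattern, level in RULES:
        if matches(pattern, events):
            return level
    return "unknown"

# Friends: smile -> violation -> smile -> violation, as in the talk.
friends = ["tutor_smile", "tutee_norm_violation",
           "tutor_smile", "tutee_norm_violation"]
# Strangers: smile -> violation -> simultaneous talk.
strangers = ["tutor_smile", "tutee_norm_violation", "overlap_talk"]
print(predict_rapport(friends), predict_rapport(strangers))
```

Real temporal association rules also carry timing windows and confidence scores; this sketch keeps only the ordered-antecedents-predict-consequent shape.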

We've been building a rapport-aware peer tutor based on those data, and the system is the same except that it has a learner module and some things from a classic tutoring system. We're in the middle of testing it, and we're making things hard for ourselves. We've got three conditions. One is no rapport building whatsoever. Another is rapport building from other people's systems, where they didn't look at human behavior, and that's basically linearly rising rapport over the course of the interaction. And then ... we're putting all our money on the third condition ... it's adaptive: what happened in the previous turn, what the interlocutor, the other person, did, determines what the agent does.

We're doing a mobile system, and that's not particularly interesting, and then we've got this full-sized personal assistant that we've been using mostly for demos, built on top of this rapport model, and I'm going to show you the version we showed at Davos, because it's kind of fun. It has a module that tracks nonverbal behavior, so that it can detect smiles and eye gaze and head nods. It has a module that automatically detects rapport in real time, on the upper left, and then it has conversational strategy recognition that can tell whether the person is using a rapport-building, rapport-maintaining or rapport-breaking conversational strategy.

The tracking of facial movements is work by Louis-Philippe Morency, and then it has a module that selects the next move, as I told you, and generates that next move. These particular things that you're seeing here are screens that a really talented animation company built for us, so that the people who were not engaged with Sara at that moment could also see what was going on in the mind of the agent, and what the agent thought was going on in the mind of the person interacting. Can you play that video? I'm not going to show you all of it, because it's long. This was a really high-stress demo, as you can imagine, and my students and I basically didn't leave the lab for three months. They came to us on the first day of February and asked us to demo in June.

Sara: Hi. I'm Sara. It's such a pleasure to meet you. May I ask your name?

Eddie: Hey Sara. My name is Eddie.

Sara: Nice of you to take the time to stop by. How are things?

Eddie: It's going pretty good. It's going pretty good.

Dr. Cassell: There's that module deciding that that was a self-disclosure, and that rapport therefore went up.

Sara: Great. Well, let me introduce myself.

Dr. Cassell: She's also self-disclosing.

Sara: I've been asked to play matchmaker by helping attendees find sessions to attend and people to meet based on their interest. All you need to do is tell me a little bit about yourself and what you want to do while you're here.

Eddie: I'm a masters student at Carnegie Mellon University, in Computer Science Department and my research focuses on artificial intelligence and natural language processing and I'm mainly here just to network, meet people in kind of the tech scene and maybe go to some interesting sessions.

Sara: I've never met someone like you before. It's refreshing.

Dr. Cassell: Now, remember, we have no idea what she's going to say.

Sara: This is my favorite part. Let's see if we can find some interesting people to meet. Just one minute while I look this up.

Dr. Cassell: You're supposed to have Google Glass.

Sara: There are details on the screen about Justine Cassell. I feel you share some interests.

Eddie: Yeah. Justine Cassell. I've heard of her. It'll be nice to meet her though.

Sara: I can either send a message on your behalf suggesting a meeting or I can give you the information and you can setup the meeting yourself.


Dr. Cassell: The World Economic Forum let us use the system that attendees use to collaborate with one another.

Sara: Would you like me to send the message?

Eddie: Yeah, please do.

Sara: Good. You'll see a message from me in your TopLink private message inbox introducing you two.

Eddie: Awesome.

Dr. Cassell: People have asked me a number of times why I wanted to work on social AI and what gave me the idea for this project.

Male: I started to really see that human communication is a lot more complicated than exchanging words. Timing is very essential. Hidden meaning in words is very important too.

Male: Then we'll use the acoustics; for example, we should know the tone of voice of the user. We'll use cameras to capture your facial expressions, and there are 64 landmarks in your face.

Male: These are things we really have to keep in mind as we continue through our development of Sara.

Female: The ArticuLab sits at the intersection of language technologies, human-computer interaction, machine learning, and psychology. We come together and work on interdisciplinary projects that look at the social aspects of technical design.

Female: Sara is one of the examples of the bright future of artificial intelligence.

Male: Sara, standing for Socially-Aware Robot Assistant, is your personal assistant. She can recommend interesting sessions and interesting people at this Davos conference.

Female: In this era of fears about robots taking over our jobs and killer bots, I really want to highlight the role that robots and agents can play as collaborators with people, as partners with people, instead of taking over for people.

Sara: Okay, one, two, three, smile.

Dr. Cassell: At the end of every interaction, you offer to take a selfie. We have world leaders coming back and saying, "That one didn't turn out well. Can I take another selfie?" Okay, you can stop the video, so you get the idea. We've been conducting research now, looking at those interactions, to see whether we can predict whether rapport was built in a particular session and what built it. I'm just going to give you a last example, and I want to say by the way that ... I forgot to say this, the Alex project, which is the one that I showed you just one single little clip of in the beginning, the one that's about dialect and rapport and science learning in second and third graders.

Actually, by total coincidence, it happens to be being demoed next door in the convention center. I got this email saying, "Would you like to demo at the USA Science and Engineering Festival in DC?" I already knew I was coming for this, so I said, "Yeah, great," assuming it was going to be at NSF; it turns out to be connected to this building. My lab manager, Lauren, whom you saw in the video of Sara, and former PhD student Samantha, who did her PhD on the Alex work, are there all day today, all day tomorrow, and most of the day Sunday, starting at 9 AM, and you can interact with Alex and talk to them. Today is mostly a school group day, but they're welcoming adults.

In fact, I think they'd be grateful for an adult at some point. It's right in the middle of the convention center, on the M3 floor, I think. Anyone know if that's right? I think it's the M3 floor, but it's the USA Science and Engineering Festival. This is an entirely different example, and the reason I'm showing it to you is that it's a great example of this process we follow, which is really what I wanted to talk to this group about, because I think it's missing in a lot of work and I think it's the place where psychologists really need to play a role, and that is model construction. You can't just take a human and copy that human into an agent.

Really what we force ourselves to do is to generalize, kind of like we're taught to do in graduate school: to generalize from our data collection of human-human interaction to a formal model that is predictive enough to allow us to build algorithms, and then we iterate. We go around and around. We have a new topic that we've been working on, and I'm so excited about it, for lots of reasons. I often get funding from people who are interested in the future of work, for example. Everyone is talking about innovation and 21st-century job skills and collaboration. What's interesting to me, though, is that when I go into classrooms, I see the death of innovation.

The death even of everything that might lead to a 21st-century job skill, because in this era of funding for schools being based on test scores, teachers have no choice but to teach to the test, and when you teach to the test, you systematically remove any trace of exploration, curiosity, self-drive, and self-efficacy in students. We collected data, so we've been looking at curiosity. How can we protect, inspire, and maintain curiosity in classrooms, in after-school programs, and at home, given the way classrooms are working today? There's a lot of literature on curiosity, but there's not a lot of work on the social aspects of curiosity.

All of the work that's been done on curiosity in psychology has been on individual predictors. There's this big, surprising research gap. If kids are asked to collaborate, or if children work in groups, doesn't it make sense that something in that group is also going to play a role in curiosity? So that's what we've been looking at, and it really does play a role. It's actually more predictive of curiosity than individual behaviors. We, as usual, have a dual research goal. One is to develop a model of the social predictors of curiosity amongst a group of children, and the second is to build a technology that can inspire, maintain, and protect curiosity in the classroom.

There's a third goal too, because the technology we're building is actually cheap these days, but that doesn't mean it's cheap enough for every classroom, and we also want teachers to understand the importance of this. So our third research goal is to write a guide for teachers on how to inspire curiosity in their lesson plans, because we often see teachers telling students to work together. I have to tell you this example. We stopped using our ... we collected data in a school context, the science classroom, in an after-school program, and in our lab. We ended up not using the data from the school classroom or the after-school program because so systematically was any hint of curiosity squashed.

My favorite example is this classroom where the teacher told the students to divide into groups of three or four, and she gave them a number of stones and said, "Okay, now, I want the first child to pick up the stone and smell the stone. Good. Now, write down what you smelled. Now, I want the second child to pick up the stone and feel the stone. Great. Now, write down what you felt." At this point, I was like, "Ah," and it went like that for the whole lesson. The kids had a list of perfectly boring characteristics of stones. They never ... okay, it's true, some kid might have thrown a stone at another kid if they hadn't been smelling and feeling and otherwise dealing with the stone in this very regimented way, but it's also possible that they might have learned something about it.

They might have found mica embedded in a matrix, or they might have found the correlation between the weight and the kind of stone. None of that happened, and likewise with the after-school program. What we ended up doing was using a Rube Goldberg machine task, and you'll notice that in all of our research there's never an adult nearby, because children act so differently in the presence of an adult than in the absence of one. We ended up using this Rube Goldberg machine. We brought in groups of kids, three to four kids per group, gave them straws and cups and all kinds of random stuff, and asked them to build a Rube Goldberg machine.

We did some very fancy building of four simultaneously linked cameras and microphones on all of the kids, so that we had all of their behavior, and then we looked at visible behaviors that, in the literature, seemed like they might be drivers of curiosity in the individual. We looked at putative drivers, in phenomena or in functions. Then we looked at visible behaviors that fulfill those functions, and then we annotated everything in our own data, and you can see all of those cameras there. We had, once again, naïve annotators annotating thin-slice curiosity, and by the way, the way we do thin-slice is we divide the video into 30-second clips, then randomize the order of those clips before giving them to an annotator, because we don't want to know the delta.

We don't want to know the difference between clip one and clip two. We want to know the absolute value of rapport, curiosity, or whatever it happens to be, and then we analyze it. What we ended up with is an analysis that comes from a continuous-time SEM factor analysis. It allows us to validate the putative functions and the putative behaviors that fulfill those functions and, lo and behold, you can see by the numbers which were validated and which weren't. What we found was that there's a strong correlation between the visible behaviors and the drivers, and between the drivers and thin-slice curiosity.
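As a concrete sketch of the thin-slice preparation just described, the snippet below splits a recording's timeline into 30-second clips and scrambles their presentation order. The function name and the seed handling are illustrative assumptions, not the lab's actual tooling.

```python
import random

def make_thin_slices(video_length_s, slice_length_s=30, seed=0):
    """Cut a recording's timeline into fixed-length slices and shuffle
    their presentation order, so an annotator rates each clip's absolute
    level of rapport or curiosity rather than the change from the
    previous clip. (Illustrative sketch, not the lab's actual tooling.)"""
    slices = [(start, min(start + slice_length_s, video_length_s))
              for start in range(0, video_length_s, slice_length_s)]
    random.Random(seed).shuffle(slices)  # randomized viewing order
    return slices

# A 5-minute recording yields ten 30-second clips in scrambled order.
viewing_order = make_thin_slices(300)
```

Because each slice keeps its (start, end) timestamps, sorting the annotated slices recovers the original timeline, which is what lets the rise and fall of rapport be reassembled afterward.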

It's more the case that social behaviors across the children in the group predict curiosity than that a stream of individual behaviors over time predicts it. That's really cool. Once again, we used temporal association rules to see, over time, what was predictive across kids. For example, kid one says something that's a hypothesis, and kid two then verbalizes an idea about that hypothesis. Kid three says, "That's stupid" (that, by the way, is what negative sentiment evaluation is in everyday language) and, lo and behold, that leads one of the kids in that group to say, "Wait a second. I think we just have to look at what it's doing."

That's curiosity. That's what we're looking for. I'm not going to show you this video for reasons of time but now we're integrating that into a virtual agent who's going to play with the kids and that's where I'm going to stop. Thank you very much. Questions. I think you have to raise your hands really high. Yeah.
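The kind of temporal association rule described above can be illustrated with a toy subsequence check, assuming each group's session has already been annotated as a time-stamped event stream. The labels and function name here are hypothetical, and real rule mining is considerably more involved; this only shows the shape of the question being asked of the data.

```python
def rule_fires(events, pattern):
    """True if the labels in `pattern` occur in temporal order within the
    group's event stream, possibly produced by different kids and with
    arbitrary gaps. events: list of (time_s, kid_id, label) tuples."""
    stream = iter(sorted(events))  # order by timestamp
    return all(any(label == wanted for _, _, label in stream)
               for wanted in pattern)

# One group's (hypothetical) annotated stream.
events = [
    (12.0, 1, "hypothesis"),
    (15.5, 2, "idea_verbalization"),
    (18.2, 3, "negative_evaluation"),   # "that's stupid"
    (21.0, 2, "curiosity_utterance"),   # "let's look at what it's doing"
]
fired = rule_fires(events, ["hypothesis", "idea_verbalization",
                            "negative_evaluation"])
```

Counting how often such a rule fires shortly before a rise in thin-slice curiosity is one way to ask whether the cross-kid sequence, rather than any one child's behavior, is doing the predicting.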

Female: Thank you so much. I have a quick question about the piece you just mentioned, about cutting out little 30-second slices and mixing up their order. Do you, in that way, continue to test your hypothesis, or really your theoretically derived belief, that rapport builds over time, because you can then unscramble them?

Dr. Cassell: Yeah, exactly. Thin-slice annotation is a really interesting technique, and when Ambady and Rosenthal wrote the first paper about it, they showed that very thin slices of behavior can be recognized by naïve annotators as representative of particular underlying states, such as rapport between two people, intimacy between two people, drive, all kinds of underlying states. We do all of these thin slices because, with rapport, you have to be careful of circularity. You can't say, look for places where they smile at one another and then say, "Look, it's correlated with smiles." We need a way to get what's called, in computer science, ground truth.

What's the actual state of the thing to which we can correlate those visible behaviors? That's what this is giving us. We get the clips out of order, we reassemble them, and then we see what predicts a fall, a rise, or a maintenance in those visible behaviors. Yeah, exactly.

Female: Hi. I have a question about the analytical software that you're using, and I wonder if it's proprietary or if it's something that we might be able to use. For example, you had several screenshots where you were looking at facial video, biometrics, all those kinds of things. And in tandem with that: do you think it's possible to do this kind of research with a virtual lab? You obviously have a beautiful lab there at Carnegie Mellon, but I was wondering if this might be something you could conceivably have in a virtual lab.

Dr. Cassell: The slide that I showed you, the one with the analysis along the side, used a piece of software called ELAN, which is free for download. You have to have a fairly powerful computer because you're loading video onto it, so it's a little intensive, and because the software is free, it's not like you're paying for help using it. But the guy who invented it ... it's an interesting story, because the guy who invented it somehow ended up doing that instead of the PhD he was supposed to be doing. He needed a tool to do the analysis before he did the PhD and never got further than the tool. He's very responsive, so it's a good piece of software for that reason too.

As far as doing it in a virtual lab, yeah, totally. This annotation is so intense; I've been doing it my whole career, and actually my dissertation was never published, because the Journal of Child Language told me that an N of 12 was not large enough to publish in the Journal of Child Language, and I thought, let me chain you to a desk and get you to annotate 12 hours of data, and then you tell me whether it's enough. I didn't write that to them, although I was really tempted, but it never got published because the N wasn't large enough. Now, that was a long time ago and times have changed, but for us, having 46 annotators is not always possible, and it just takes too long even with 46 people.

We're starting to crowdsource the annotations, and it's taken us a long time to find a way to do that. How do we train them? How do we find people who are going to pay attention? How do we throw out judgments where it's clear they haven't paid attention, like when their answer comes in under a thirtieth of a second when the clip is 30 seconds long, something like that? How do we combine five judgments, because we get three to five judgments on every behavior, and we don't want to take the mean, that would be meaningless, so what do we do? We're starting to find ways of using crowdsourcing that would make this feasible for a virtual lab, as well as for people who don't have the luxury of 46 undergraduates, and we're excited about that. Yup.
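A minimal sketch of the two crowdsourcing safeguards just raised: a response-time gate for inattentive workers, and an aggregation rule once the mean is ruled out. The particular cutoff, the minimum of three judgments, the use of the median (one robust alternative to the mean), and the function name are all illustrative assumptions, not the lab's actual pipeline.

```python
import statistics

def aggregate_clip_ratings(ratings, min_response_s=1.0, min_judgments=3):
    """ratings: list of (score, response_time_s) pairs from crowd workers
    for one clip. Judgments returned implausibly fast are discarded as
    inattentive; the survivors are summarized with the median, which
    shrugs off a single careless rating in a way the mean would not."""
    kept = [score for score, rt in ratings if rt >= min_response_s]
    if len(kept) < min_judgments:
        return None  # too few trustworthy judgments; re-annotate the clip
    return statistics.median(kept)

# Five workers rated one clip; the 0.2-second response is thrown out.
score = aggregate_clip_ratings([(4, 12.0), (5, 20.0), (1, 0.2),
                                (4, 15.0), (3, 9.0)])
```

Returning None for under-annotated clips, rather than a low-confidence number, keeps the downstream rapport and curiosity models from training on judgments nobody actually made carefully.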

Female: That was a really fascinating talk, and I really like that you collected so many videos of people having interactions in the lab, and I'm especially fascinated to hear that people don't compliment each other. I mean, not only in the lab but also in real life; it seems insulting is more frequent. I have some ongoing research on why people don't compliment other people, but I wonder why you think that's the case, that sometimes we ... often we have nice thoughts about other people but we don't tell them. Why is that?

Dr. Cassell: Compliments and praise are a tiny bit different from one another. Maybe a compliment is a subcategory of praise. There's a famous paper by Tickle-Degnen and Rosenthal on the nonverbal correlates of rapport, and they show that over the course of a relationship, attention to the other person stays constant, coordination with the other person's body during conversation rises, and, here's the kicker, positivity towards the other person goes down. You can think about this in your own friendships, and we see it in videos among strangers. At the very beginning, they're going, "Oh, good, good, good, good." It would be weird if I said to my best friend, "Good, good, good, good."

She'd say to me, "What is wrong with you?" Or, really interestingly, one of the things we found is that we break rapport with someone else by doing exactly the same behaviors but at the wrong time. If I start observing social norms with a close friend, that will break rapport: "Why, thank you. That is so kind of you to have helped me with this. I will appreciate that help." And they're going to be going, "Whoa, I must have done something really wrong." Praise is used more often at the bus stop, in the beginning, than later on because, and this is my interpretation of our results, the violation of societal norms shows that you're a unit of two, that you have the luxury of following interpersonal norms rather than societal norms.

Those interpersonal norms can include things like letting go of praise or other kinds of compliment. I wonder whether compliments are a subcategory of praise or vice versa, but I think that's what we're seeing. Certainly there are places where you want to praise someone. If your spouse is wearing something brand new and you go, "Eh, that's not going to work so well," that's definitely a diminishment of positivity that is not going to maintain rapport. We do lose some of that over time.

Female: We have time for just one more question, and then we will take a 15-minute coffee break.

Male: Is this on, actually? Yes, it is. Thank you for this inspiring presentation. I know I just praised you, but I actually have a friendly amendment about exactly this issue of norms. I think it's not correct that friends break social norms. What they do is figure out what norms apply to that conversation. It's actually that they're norm conforming ...

Dr. Cassell: Conforming the conversation to the relationship.

Male: The conversation as it becomes the relationship, yes. So you may actually see breaking norms in both directions. You actually gave just such an example: if you are too formal with a friend, you're breaking the norm, but you're actually conforming to the norm that would apply if you were talking to a stranger. What I find fascinating about this, and it has implications for robots and computers, is that you have to know which norm set applies to which phase of a conversation and which phase of a relationship. You wouldn't want to train your assistant to be norm violating, but rather to recognize which norms you should conform to, with whom, and in what phase. That's my friendly amendment.

Dr. Cassell: Yeah. That might be a definitional difference. It might be a definitional difference because what you're saying, basically, is: do the thing that you should do at that point in the relationship, and call that a norm. I would say that we come into an interaction with a stranger with our best guess at what's going to work in that context, and it can include things like politeness and other things like that. We get that set from the society and the particular interaction context, and that's why I call that adhering to social norms. The other set of norms comes from knowing the other person, and that's what I call violating social norms.

It's not violating norms; it's violating societal norms and adhering to interpersonal norms. Yes, there's a norm set, and actually one of the things we found in the literature was that being predictable is a high contributor to rapport. The PhD student who is doing this work, who was actually a master's student at the time, said to me, "So, you should be more predictable as my adviser," which, I thought, was great. Predictability can exist among strangers because we know that there's a societal set. Now, of course, there are subcultures and there are different contexts, so this is a complex notion, but as time goes on, we also know more about the other person, from the rapport-building behavior of attending to the other person, and then that's the set of norms that we follow. Yeah. Thank you all.
