
Anon-Emoji: An Optical See-Through Augmented Reality System for Reducing Appearance Bias in Social Interactions

Ran Sun¹   Harald Haraldsson¹   Yuhang Zhao¹,²   Serge Belongie¹,²

¹Cornell Tech   ²Cornell University

Abstract

Facial expressions are an important part of human communication. At the same time, faces also reveal aspects of our identities – ethnicity, gender, or even sexual orientation – that can surface biases held by our counterparts and lead to downstream inequalities in our social interactions. However, we do not have the option to entirely change what we look like, or even to hide the identity information that our faces expose. Because Optical See-Through (OST) Augmented Reality (AR) headsets offer near-eye displays and better depth alignment between virtual renderings and the environment, we chose an OST AR approach for our solution. In this paper, we present a system for OST AR headsets that occludes the subject's facial features with an emotion-presenting emoji model in 3D space.

1. Introduction

1.1. Background and Motivation

Physical appearance generates many biases in different social scenarios. Research in social psychology has shown that facial appearance [13], for example attractiveness, often mediates hiring decisions [9][8]. Biases tied to ethnicity, gender, and other identity-related information have also resulted in workplace discrimination [5]. Recent work has shown the problem is especially acute for African-American ride-sharing or hospitality service providers [7].

Much of our identity can be read directly from our faces, and it is currently impossible to conceal our appearance entirely in situations where identity information is unnecessary or even detrimental. At the same time, face-to-face social interaction provides important context beyond personal identity: up to 70 percent of communication during in-person interactions happens non-verbally [14]. Simply concealing people's faces during interactions would therefore discard valuable information. Since human faces are extremely expressive, we want to preserve that expressiveness while removing identity from a person's appearance.

Figure 1. A viewer wearing the Magic Leap One headset facing a subject with a wearable image marker on their head. Below are the views captured through the AR headset: an emoji model rendered around the subject's head. Color variations are caused by the device and lighting, and are not noticeable to users.

Emojis are widely used in text-based communication, mainly to compensate for the loss of facial expression in virtual communication [6]. Given their growing use on social media to hide people's faces in photos, we see a natural role for emojis in creating anonymity. This inspired us to use emojis to restore people's facial expressions while concealing their faces for anonymity purposes.

Motivated by this, we explored a technological solution that occludes people's physical appearance while preserving the other important elements of a social interaction. Our work can serve as a low-fidelity counterpart, in AR settings, to generative deep learning approaches to content manipulation such as Deepfakes or GANs [12].


Beyond hiding people's identity, we ultimately intend to replace their faces with realistic human faces of another, generated identity. From the viewer's perspective, they would then still be interacting with a real person who exhibits full facial expressions and facial muscle movements, while all unnecessary identity information is left out. Emoji models are generic, easily understood face models that we use to demonstrate the idea.

1.2. Advantages of OST AR Headsets

The advantages of OST AR displays make them well suited to this solution. Compared with handheld AR displays, head-mounted displays are near-eye displays that give the system more control over what the user sees. With AR headsets, users no longer have to consume virtual content through a handheld screen with a limited field of view. In addition, virtual content is better embedded in reality, enhancing the immersiveness of the experience.

Among AR head-mounted devices, there are two display technologies for superimposing virtual renderings on the real environment [4]: Video See-Through (VST) displays and Optical See-Through (OST) displays. Unlike a VST head-mounted display, an OST display lets users look at reality directly through optical lenses while optical combiners present the rendered content [4]. This yields better alignment between virtual content and reality, not only on a pixel-precise basis but also on a depth-precise basis [4], an advantage VST displays do not share.

Although the general intention of AR is to add objects to a real environment, it can also remove them [10][3]. Derived from AR, the concept of Diminished Reality (DR) describes the process of concealing, altering, and seeing through objects in reality [11]. Occluding real-world objects, however, is not a strength of OST AR displays. Compared to OST displays, VST displays have far more control over both the virtual content being rendered and the real-world objects in view; as a result, almost all DR systems are built with Video See-Through displays [11]. We investigated the occlusion performance of an OST AR headset, the Magic Leap One, and propose a system that leverages the headset's limited concealing effect to give people an option of anonymity during interactions.

2. Related Work

Bimber and Raskar [4] discussed the disadvantages of OST AR head-mounted displays with regard to rendering. They state that OST AR devices are generally incapable of providing consistent occlusion effects between virtual and real objects: as users see through the optical lenses into the real world, light from the illuminated environment passes through the lenses and interferes with the virtual renderings displayed on the optical combiners (essentially half-silvered mirrors) [4].

Unlike rendering in VST AR displays, where virtual content can be either opaque or semi-transparent, the nature of OST displays means that all renderings are semi-transparent, and transparency increases further outdoors or near a strong light source. Occluding real-world objects with virtual renderings in OST AR displays is therefore challenging. Although Kiyokawa et al. used LCD panels to selectively block incoming light according to the rendered graphics [4][1], this approach has not been widely adopted by modern OST AR headset manufacturers. This lays the ground for the challenges we seek to tackle.
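To first order, this difference can be captured by a simple compositing model. Under the assumption of an ideal additive optical combiner (a sketch; real devices also depend on combiner efficiency and display calibration), the perceived color is

    \[
    C_{\mathrm{OST}} \approx \min\bigl(C_{\mathrm{virtual}} + C_{\mathrm{real}},\, C_{\max}\bigr),
    \qquad
    C_{\mathrm{VST}} = \alpha\, C_{\mathrm{virtual}} + (1 - \alpha)\, C_{\mathrm{real}}, \quad \alpha \in [0, 1].
    \]

Because the OST term is purely additive, the real object always contributes to what the viewer sees, whereas a VST compositor can set alpha = 1 and occlude completely. The additive form also suggests that a bright rendering, with C_virtual near C_max, saturates the perceived color and masks the real object, consistent with our observations in Section 4.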

Overlaying avatar models on human faces is becoming a new trend in applications for OST AR headsets. Magic Leap and Weta Workshop previewed a multiplayer mode at GDC 2019 [2][1] that overlays comic-style 3D avatar models on other players' heads. Because that multiplayer mode requires every participant to wear a headset, it can obtain the position and orientation of each player's head to facilitate the rendering. In the system we present, one user wears the headset and the other users, the subjects, do not. Our focus is on altering the perception of the viewer wearing the headset so as to preserve the anonymity of the subject; we therefore do not require everyone in view to wear a device. Moreover, since the subject wears nothing that occludes their face, we can perform facial feature detection and analysis, enhancing the rendering with information obtained from the subject's face, such as their facial expression.

3. Limitations

As discussed, concealing real objects with virtual renderings in OST headsets is generally challenging because of the nature of the display. Most OST headsets are designed for indoor use, favoring the visibility of renderings in lower ambient light; likewise, our design works better in indoor settings without strong light coming through the headset. In addition, users with different interpupillary distances may perceive the rendering at different depths in 3D space, a challenge introduced by the device. As a result, some users experienced misalignment between the rendered model and the subject's head.

4. Method

An informal survey of members of the research team provided preliminary anecdotal evidence of how renderings in OST AR displays alter people's perception of color and edges.


To understand how color renderings affect how people perceive colors in the real world, we overlaid single-colored squares on top of single-colored squares displayed on computer screens and shuffled the colors of both the renderings and the on-screen squares. Viewers reported the colors they perceived, and the results confirmed that bright colors such as white, yellow, and light green are best at changing people's perception of color, and thus have the strongest concealing effect.
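The additive model sketched in Section 2 predicts this outcome. The snippet below scores a few candidate overlay colors by how far they push the perceived color away from the underlying screen color; the overlay and screen colors and the distance metric are illustrative assumptions, not measured data.

    import numpy as np

    # First-order OST compositing from the model above: the combiner adds
    # virtual light to scene light, clipped to the display's dynamic range.
    # Colors are RGB triples in [0, 1]; everything here is illustrative.
    def ost_composite(virtual, real):
        v, r = np.asarray(virtual, float), np.asarray(real, float)
        return np.clip(v + r, 0.0, 1.0)

    # Crude concealment proxy: mean distance between what the viewer sees
    # and the underlying screen color, over a set of screen colors.
    def concealment_score(virtual, reals):
        return float(np.mean([np.linalg.norm(ost_composite(virtual, r) - np.asarray(r, float))
                              for r in reals]))

    overlays = {
        "white":       (1.0, 1.0, 1.0),
        "yellow":      (1.0, 0.9, 0.1),
        "light green": (0.6, 1.0, 0.6),
        "dark blue":   (0.1, 0.1, 0.4),
    }
    screen_colors = [(0.8, 0.2, 0.2), (0.2, 0.6, 0.9), (0.3, 0.3, 0.3), (0.9, 0.8, 0.2)]

    scores = {name: concealment_score(rgb, screen_colors) for name, rgb in overlays.items()}
    for name, s in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(f"{name:12s} color drift = {s:.3f}")  # bright overlays rank highest

Under this model the bright overlays score far above the dark one, matching the survey responses.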

In a second survey, we observed how different renderings influence people's perception of edge information. We separately combined a bright-colored background with pixelated faces and with high-frequency facial features, and overlaid the renderings on images of human faces. Based on the viewers' feedback, high-frequency facial features easily stand out against renderings that have only a bright background or low-frequency detail. We therefore settled on a combination of high-frequency details over a bright background as the better approach to obscuring human faces.

Based on these observations, we designed the Anon-Emoji system. It consists of an image marker that the subject wears on the head and an application running on the Magic Leap One headset. When the headset picks up the image marker, the system tracks the position and orientation of the subject's head and renders an emotion-presenting 3D emoji around it so as to occlude the face.

We chose emojis for three main reasons: their nature as carriers of non-verbal information; their established use for anonymity on social media and in casual profiles; and their strong concealing effect in OST AR settings. The emotional information they convey is widely recognized and easy to comprehend. Because they are roughly head-shaped, they make sense to viewers when overlaid on a subject's head. People already conceal their faces with emojis when posting photos on social media to preserve anonymity. Finally, emojis are generally bright yellow, which our observations suggest is ideal for occluding objects in OST AR displays.

In the near future, we will implement a separate process running in the background on the device that performs facial landmark detection and sentiment analysis to extract the subject's facial expression, maps it to a matching emoji, and updates the rendering accordingly in real time. We expect that changing the rendering based on facial expressions can compensate for the non-verbal communication that the occlusion wipes away.
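As a concrete illustration of the planned pipeline, the sketch below classifies a rough expression from a camera frame and picks an emoji asset to render. It is a minimal sketch using OpenCV's stock Haar cascades as stand-ins for a proper landmark detector and sentiment model, and the emoji asset names are hypothetical.

    import cv2

    # Minimal sketch of the planned background process. The Haar cascades
    # ship with OpenCV and serve only as stand-ins for a proper facial
    # landmark detector and sentiment model.
    FACE = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    SMILE = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_smile.xml")

    EXPRESSION_TO_EMOJI = {
        "happy": "emoji_smile.fbx",      # hypothetical asset name
        "neutral": "emoji_neutral.fbx",  # hypothetical asset name
    }

    def emoji_for_frame(frame_bgr):
        """Pick the emoji asset matching the dominant face's expression."""
        gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
        faces = FACE.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
        if len(faces) == 0:
            return None
        x, y, w, h = max(faces, key=lambda f: f[2] * f[3])  # largest face
        mouth = gray[y + h // 2 : y + h, x : x + w]         # lower half of face
        smiles = SMILE.detectMultiScale(mouth, scaleFactor=1.7, minNeighbors=20)
        return EXPRESSION_TO_EMOJI["happy" if len(smiles) else "neutral"]

    # Stand-in capture loop: a webcam in place of the headset camera.
    cap = cv2.VideoCapture(0)
    ok, frame = cap.read()
    if ok:
        print("render:", emoji_for_frame(frame))
    cap.release()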

5. System Design

System requirements: Magic Leap One headset, OS version 0.95.2, MLSDK 0.20.0, Unity 2019.1.1, OpenCV for Unity 2.3.4.

Our system consists of an AR headset and an image marker. The viewer wears the AR headset, while the subject wears only an image marker on their forehead.

Figure 2. The OST AR headset used in the system: Magic Leap One headset with a lightpack and a 6DoF controller.

The image marker identifies the position and rotation of the subject's head. Once the marker is picked up by the headset, a 3D emoji model is rendered around the subject's head, obscuring most of the visual information of the subject's face from all directions.
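On the Magic Leap One, this tracking is provided by the platform's image-tracking API. As a device-independent sketch of the same step, the snippet below estimates a printed marker's 6DoF pose with OpenCV's ArUco module (assuming the classic aruco API from opencv-contrib-python, pre-4.7; the camera intrinsics and marker size are placeholders).

    import cv2
    import numpy as np

    # Device-independent sketch of the marker-tracking step. On the
    # headset, the platform's image-tracking API performs this instead.
    DICTIONARY = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)
    MARKER_SIDE_M = 0.08                    # placeholder: 8 cm printed marker
    K = np.array([[800.0, 0.0, 320.0],      # placeholder camera intrinsics
                  [0.0, 800.0, 240.0],      # for a 640x480 stream
                  [0.0, 0.0, 1.0]])
    DIST = np.zeros(5)                      # placeholder: no lens distortion

    def head_pose(frame_bgr):
        """Return (rvec, tvec) of the first detected marker, or None."""
        gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
        corners, ids, _ = cv2.aruco.detectMarkers(gray, DICTIONARY)
        if ids is None:
            return None
        rvecs, tvecs, _ = cv2.aruco.estimatePoseSingleMarkers(
            corners, MARKER_SIDE_M, K, DIST)
        return rvecs[0], tvecs[0]

Here tvec gives the marker's position in camera coordinates and rvec its orientation; the renderer would anchor the emoji at this pose, offset along the marker's normal so that the model encloses the head.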

6. Future Work

As mentioned in previous sections, we will integrate a facial expression extraction process and present that information through changes in the emoji models. Once the system is implemented, we aim to use it in several further studies: (a) How do rendered emojis influence how users perceive other people's emotions? (b) If the rendered emoji does not match the subject's real facial expression, how does that affect the viewer's perception of others' emotions? (c) Can we ease emotional tension during social interactions by delaying the rendering of negative facial expressions (e.g., anger)?

7. Discussion

Augmented Reality headsets have become increasingly available on the consumer market, and we foresee a near future in which many more people have access to them. At a point in time when some people own head-mounted smart devices that give them extra information and real-time computation power while others do not, inequality and privacy issues may arise. Much as people have begun to worry that their buildings and property could become free billboards for AR advertisements, we want those who do not own headsets to have the option of being anonymous in the AR world as well.


References

[1] Magic Leap & Weta Workshop preview multiplayer mode for Dr. Grordbort's Invaders at GDC. https://magic-leap.reality.news/news/magic-leap-weta-workshop-preview-multiplayer-mode-for-dr-grordborts-invaders-gdc-0194957/. Accessed: 2019-05-08.

[2] Magic Leap unveils latest multiplayer VR prototype at GDC 2019. https://variety.com/2019/gaming/news/magic-leap-multiplayer-vr-grordbattle-1203166465/. Accessed: 2019-05-08.

[3] Ronald T. Azuma. A survey of augmented reality. 1997.

[4] Oliver Bimber and Ramesh Raskar. Modern approaches to augmented reality. In ACM SIGGRAPH 2006 Courses (SIGGRAPH '06), 2006.

[5] Elizabeth A. Deitch, Adam Barsky, Rebecca M. Butz, Suzanne Chan, Arthur P. Brief, and Jill C. Bradley. Subtle yet significant: The existence and impact of everyday racial discrimination in the workplace. Human Relations, 2003.

[6] Daantje Derks, Agneta H. Fischer, and Arjan E. R. Bos. The role of emotion in computer-mediated communication: A review. 2008.

[7] Benjamin Edelman, Michael Luca, and Dan Svirsky. Racial discrimination in the sharing economy: Evidence from a field experiment. American Economic Journal: Applied Economics, 2017.

[8] Madeline E. Heilman and Lois R. Saruwatari. When beauty is beastly: The effects of appearance and sex on evaluations of job applicants for managerial and nonmanagerial jobs. Organizational Behavior and Human Performance, 23(3):360–372, 1979.

[9] Stefanie Johnson, Kenneth Podratz, Robert Dipboye, and Ellie Gibbons. Physical attractiveness biases in ratings of employment suitability: Tracking down the "beauty is beastly" effect. Journal of Social Psychology, 2010.

[10] Steve Mann. Mediated reality. Fundamentals of Wearable Computers and Augmented Reality, Lawrence Erlbaum Associates, Inc., 1994.

[11] Shohei Mori, Sei Ikeda, and Hideo Saito. A survey of diminished reality: Techniques for visually concealing, eliminating, and seeing through real objects. IPSJ Transactions on Computer Vision and Applications, 9(1), June 2017.

[12] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Advances in Neural Information Processing Systems, 2014.

[13] Sarah V. Stevenage and Yolanda McKay. Model applicants: The effect of facial appearance on recruitment decisions. British Journal of Psychology, 1999.

[14] D. S. Sundaram and Cynthia Webster. The role of nonverbal communication in service encounters. 2000.