

SurroundWeb: Mitigating Privacy Concerns in a 3D Web Browser

John Vilk, University of Massachusetts, Amherst

Alexander Moshchuk, Google

David Molnar, Benjamin Livshits, Eyal Ofek, Microsoft Research

Chris Rossbach, VMware Research

Helen J. Wang, Ran Gal, Microsoft Research

2015 IEEE Symposium on Security and Privacy. © 2015, John Vilk. Under license to IEEE. DOI 10.1109/SP.2015.33

Abstract—Immersive experiences that mix digital and real-world objects are becoming reality, but they raise serious privacy concerns as they require real-time sensor input. These experiences are already present on smartphones and game consoles via Kinect, and will eventually emerge on the web platform. However, browsers do not expose the display interfaces needed to render immersive experiences. Previous security research focuses on controlling application access to sensor input alone, and does not deal with display interfaces. Recent research in human-computer interaction has explored a variety of high-level rendering interfaces for immersive experiences, but these interfaces reveal sensitive data to the application. Bringing immersive experiences to the web requires a high-level interface that mitigates privacy concerns.

This paper presents SurroundWeb, the first 3D web browser, which provides the novel functionality of rendering web content onto a room while tackling many of the inherent privacy challenges. Following the principle of least privilege, we propose three abstractions for immersive rendering: 1) the room skeleton lets applications place content in response to the physical dimensions and locations of renderable surfaces in a room; 2) the detection sandbox lets applications declaratively place content near recognized objects in the room without revealing if the object is present; and 3) satellite screens let applications display content across devices registered with SurroundWeb. Through user surveys, we validate that these abstractions limit the amount of revealed information to an acceptable degree. In addition, we show that a wide range of immersive experiences can be implemented with acceptable performance.

Index Terms—augmented reality; JavaScript; web browser; projection mapping

I. Introduction

Immersive experiences mix digital and real-world objects to create rich computing experiences. These experiences can take many forms, and span a wide variety of sensor and display technologies. Games using the Kinect for Xbox can sense the user's body position, then show an avatar that mimics the user's movements. Smartphone translation applications, such as Word Lens, perform real-time translations in the video stream from phone cameras. A projector paired with a Kinect can enhance a military shooter game with a grenade that "bounces" out of the television screen and onto the floor [13]. Head-mounted displays such as Google Glass, Epson Moverio, and Meta SpaceGlasses have dropped dramatically in price in the last five years and point the way toward affordable mainstream platforms that enable immersive experiences.

Toward a 3D web: In this paper, we set out to build a 3D web browser that lets web pages and applications project content onto physical surfaces in a room. While this is a novel and somewhat futuristic idea, it is possible to implement using modern sensors such as the Kinect. However, designing such a browser poses a plethora of privacy challenges. For example, current immersive experiences build custom rendering support on top of raw sensor data, which could contain sensitive data such as medical documents, credit card numbers, and faces of children should they be in view of the sensor. This paper is an exploration of how to reconcile 3D browser functionality with privacy concerns. In addition to this paper, we produced an informative video that we encourage readers to watch to familiarize themselves with this scenario: http://research.microsoft.com/apps/video/?id=212669.

Tension between functionality and privacy: There is a clear tension between functionality and privacy: a 3D browser needs to expose high-level immersive rendering interfaces without revealing sensitive information to the application. Traditional web pages and applications are sandboxed within the browser; the browser interprets their HTML and CSS to determine page layout. Unfortunately, as prior experience with CSS demonstrates, these seemingly innocuous declarative forms of describing web content are fraught with privacy issues, such as surprising CSS-based attacks that use history sniffing [30] and even more subtle ones that rely on scrollbars [16] and stateless fingerprinting [1].

In this work we discovered that a 3D browser needs to expose new rendering abstractions that let web applications project content into the room, while limiting the amount of revealed information to acceptable levels. Previous security research focuses on controlling application access to sensor data through filtering, access control, and sandboxing, but this research does not touch upon the complementary rendering interfaces that a 3D web browser requires [11, 12, 15]. Recent HCI research describes high-level interfaces for immersive rendering, but these approaches reveal sensitive data to the application [7, 13, 14].

SurroundWeb: In this paper, we present SurroundWeb, a 3D web browser that lets web applications display web content around a physical room in a manner that follows the principle of least privilege. We give applications access to three high-level interfaces that support a wide variety of existing and proposed immersive experiences while limiting the amount of information revealed to acceptable levels. First, the room skeleton exposes the location, size, and input capabilities of flat rectangular surfaces in the room, and a mechanism for associating web content with each. Applications can use this interface to make intelligent layout decisions according to the physical properties of the room. Second, the detection sandbox lets applications declaratively place content relative to objects in the room, without revealing the presence or locations of these objects. Third, satellite screens let applications display content across devices registered with SurroundWeb.

A. Contributions

This paper makes the following contributions:

• We implement SurroundWeb, a novel 3D web browser built on top of Internet Explorer. SurroundWeb lets web pages display content across multiple surfaces in a room, run across multiple phones and tablets, and take natural user inputs.

• To resolve the tension between privacy and functionality, we propose three novel abstractions: room skeleton, detection sandbox, and satellite screens.

• We define the notions of detection privacy, rendering privacy, and interaction privacy as key properties for privacy in immersive applications, and show how SurroundWeb provides these properties.

• We evaluate the privacy and performance of SurroundWeb. To evaluate privacy, we survey users recruited through a professional survey provider and ask them about the information revealed by SurroundWeb compared to legacy approaches.¹ We show that SurroundWeb has acceptable performance by benchmarking the rendering and layout speed of the room skeleton and detection sandbox.

B. Paper Organization

The rest of this paper is organized as follows. In Section II, we discuss the threat model and provide an overview of SurroundWeb. Section III describes our abstractions for immersive rendering. Section IV describes key privacy properties for immersive applications, and how SurroundWeb's abstractions provide these properties. Section V presents the implementation of SurroundWeb, and Section VI contains an evaluation of our design decisions and an assessment of the performance of our prototype. Section VII discusses the limitations of our approach alongside future work, and Section VIII describes related work. Finally, Section IX concludes.

¹Prior to running our surveys, we reviewed our questions, the data we proposed to collect, and our choice of survey provider with our institution's group responsible for protecting the privacy and safety of human subjects.

Fig. 1: Architectural diagram of SurroundWeb. Items below the thick line are considered trusted, and items with solid borders are off-the-shelf components.

II. Overview

In this work, we focus on the scenario where the computer, its operating system, the sensor hardware, and SurroundWeb itself are trusted, but SurroundWeb is executing an untrusted third-party web page or application. The SurroundWeb environment is identical to the browser sandbox used in regular browsers, except that it has been augmented with SurroundWeb interfaces. As a result, the web application can only access sensor data and display devices indirectly, through trusted APIs that we describe in this paper.

Figure 1 displays SurroundWeb's system diagram, with trusted components below the thick line and off-the-shelf components with solid borders. SurroundWeb consists of two main parts: the SurroundWeb API, which extends the browser API with immersive rendering support, and the SurroundWeb Renderer, which is responsible for interacting with sensors and display devices. Like in a regular web browser, web pages have no direct access to native resources (including sensors and displays), and are restricted to the interfaces furnished to them through JavaScript, HTML, and CSS.

Just like regular CSS, SurroundWeb supports both absolute and relative placement of web content within a room. For SurroundWeb, the notion of placement needs to be adapted to the context of 3D rendering. Figure 2 describes how SurroundWeb provides room-level positioning that is analogous to CSS positioning. Just as with regular CSS, the control over placement given to the web page or application opens the door to possible privacy violations. Liang et al. describe how CSS features like customizable scrollbars and media queries can be used to learn information about the user's browser and, in some cases, to sniff information about browsing histories with the help of sophisticated timing attacks [16].

CSS absolute placement: #picture {position: absolute; top: 400px;}
Content is placed at an absolute location, relative to the content's parent element. The example places a picture 400 pixels below its parent element.

SurroundWeb room-location-aware placement: <segment screen="4"><img src="picture.jpg" /></segment>
Content is placed at an absolute location in the room. The example places picture.jpg on screen 4, which corresponds to a surface in the room.

CSS relative placement: #picture {position: relative; right: 50px;}
Content is placed at a location relative to its normal position in the document. Here, a picture is shifted 50 pixels to the right.

SurroundWeb object-relative placement: #picture {left-of: "chair";}
Content is placed relative to a recognized object in the room. Here, a picture is placed to the left of a chair.

Fig. 2: SurroundWeb supports positioning web content in the room in a way that is analogous to CSS positioning.

Fig. 3: Four web applications enabled by SurroundWeb, shown with multiple projectors and an HDTV: (a) a car racing news site, (b) virtual windows, (c) road maps, (d) projected-awareness IM.

III. SurroundWeb Abstractions

We augment the browser with two privacy-preserving immersive rendering interfaces: the room skeleton for room-location-aware rendering tasks, and the detection sandbox for object-relative rendering tasks. We also discuss satellite screens, which are an extension to the room skeleton that lets applications display across devices. We introduce these abstractions with the help of example applications, described in Figures 3 and 4, which clarify their intent and utility.

A. The Room Skeleton

The room skeleton reveals the location and dimensions of all flat renderable surfaces in the room to the application as JavaScript objects, and exposes an API for displaying HTML content on each. We call these renderable surfaces screens. SurroundWeb handles rendering and projecting the content into the physical room. Using this interface, applications can perform room-location-aware rendering. Figure 5 visualizes the data that the room skeleton provides to the application.

Room Setup: In a static room, SurroundWeb can construct the room skeleton in a one-time setup phase, and can reuse it until the room changes. First, the setup process uses a depth camera to locate flat surfaces in the room that are large enough to host content. Next, it discovers all available display devices and determines which of them can show content on the detected surfaces. The process for doing this differs depending on the display device. Head-mounted displays can support rendering content on arbitrary surfaces, but projectors are limited to the area they project onto. For projectors, monitors, and televisions, SurroundWeb maps device display coordinates to room coordinates; this process can be automated using a video camera. Finally, the setup process discovers which input events are supported for which displays. For example, a touchscreen monitor supports touch events, and depth cameras can be used to support touch events on projected flat surfaces [25].

Runtime: At runtime, the application receives only the set of screens in the room. Each screen has a resolution, a measure of pixel density (points-per-inch), a location relative to other screens, and a list of input events that the screen accepts; for example, "none," "mouse," or "touch." Applications can use this information to dynamically adapt how they present their content in response to the capabilities of the room.

The application can associate HTML content with a screen to display content in the room. SurroundWeb uses a standard off-the-shelf browser renderer to render the content, and then displays it in the room according to information discovered at setup time. If the screen supports input events, the application can use existing standard web APIs to register event listeners on the HTML content.

SurroundPoint (Room Skeleton): Each screen in the room becomes a rendering surface for a room-wide presentation (see Figure 6).
Car Racing News Site (Room Skeleton): A live video feed displays on a central monitor, with racing results projected around it (see Figure 3a).
Virtual Windows (Room Skeleton): "Virtual windows" render on surfaces around the room that display scenery from distant places (see Figure 3b).
Road Maps (Room Skeleton): The active map area displays on a central screen, with the surrounding area projected around it (see Figure 3c).
Projected Awareness IM [3] (Room Skeleton): Instant messages display on a central screen, with frequent contacts projected above (see Figure 3d). This is an example of Focus+Context from UI research [2].
Karaoke (Room Skeleton): Song lyrics appear above a central screen, with music videos playing around the room (see Figure 10).
Advertisements (Detection Sandbox): Applications can register advertisements to display near particular room objects detected by the detection sandbox without learning the objects' locations or presence.
Calorie Informer (Detection Sandbox): The Calorie Informer displays calorie counts and serving information next to recognized food products without knowing what products the user has.
SmartGlass [20] (Satellite Screens): Xbox SmartGlass turns a smartphone or tablet into a second screen; a web page can use satellite screens to do the same.
Multiplayer Poker (Satellite Screens): Each user views their cards on a satellite screen on a smartphone or tablet, with the public state of the game displayed on a surface in the room.

Fig. 4: Web applications and immersive experiences that are possible with SurroundWeb.

Fig. 5: Our room skeleton and detection sandbox abstractions reveal significantly less information than raw sensor data. Here, we visualize the information that these interfaces provide to the application (panels: raw data, room skeleton, detection sandbox). The detection sandbox reveals nothing, as the application provides content to be rendered near objects, but is not informed whether the content is rendered or whether the object is present.
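As a small illustration of the runtime interface (the Screen.getAll() API and screen properties are detailed in Section V-B), the sketch below places content on the first screen that accepts touch input and listens for taps; the element ids are hypothetical, not from the paper.

// Minimal sketch: pick the first screen that accepts touch input and
// attach an (assumed) "controls" segment to it.
var touchScn = Screen.getAll().filter(function (scn) {
  return scn.capabilities.indexOf('touch') !== -1;
})[0];
if (touchScn) {
  document.getElementById('controls').setAttribute('screen', touchScn.id);
  // Standard web APIs still work on the segment's content.
  document.getElementById('play-button').addEventListener('click', function () {
    console.log('play tapped on screen ' + touchScn.id);
  });
}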

Example 1 (SurroundPoint): SurroundPoint is a full-room presentation application pictured in Figure 6. While traditional presentation software is limited to a single screen, SurroundPoint can display information around the entire room. Each slide in a SurroundPoint presentation has "main" content to present on a primary display, plus optional additional content. By querying the room skeleton, SurroundPoint adapts the presentation to different room layouts.

Consider the case where the room has only a single 1080p monitor and no projectors, such as when running on a laptop or in a conference room. Here, the room skeleton contains only one screen: a single 1920×1080 rectangle. Based on this information, SurroundPoint knows that it should show only the "main" content.

In contrast, consider the room shown in Figure 5. This room contains multiple projectable screens, exposed through the room skeleton. SurroundPoint can detect that there is a monitor plus additional peripheral screens that can be used for showing additional content. □

Fig. 6: The SurroundPoint presentation application uses the room skeleton to render an immersive presentation with least privilege. This image displays SurroundPoint running in our SurroundWeb prototype.

B. The Detection Sandbox

Advances in object detection make it possible to quickly and relatively accurately determine the presence and location of many objects or people in a room. Object detection enables object-relative rendering, where an application specifies that content should be rendered next to an object in the room. For example, an application can detect that the user is holding a bottle of medicine, then show instructions for safely using the medicine as an "annotation" to the bottle. However, object detection is a privacy challenge because the presence of objects, such as particular medicines, can reveal sensitive information about a user's life. This creates a tension between privacy and functionality.

Our detection sandbox provides least privilege for object-relative rendering. All object recognition code runs as part of trusted code in SurroundWeb. Applications register HTML content up front with the detection sandbox using a system of rendering constraints that can reference physical objects.

After the application loads, the detection sandbox immediately renders all registered content, regardless of whether or not it will be shown in the room. In doing so, we prevent the application from using tracking pixels to determine the presence of an object [29]. Then, SurroundWeb checks registered content against a list of detected objects. If there is a match, the detection sandbox solves the rendering constraints to determine a rendering location, and places the content in the room.

Constraint solving occurs asynchronously on a separate thread from the web application, preventing the application from directly timing constraint solving to infer whether or not an object is present. In addition, to prevent the application from determining the presence of an object, the detection sandbox suppresses user input events to the registered content and does not notify the application if content is displayed in the room. In Section V, we show how applications specify rendering constraints via CSS and discuss the FLARE constraint solver [28].

Example 2 (Object-contextual ads): An application can register an ad to display if an energy drink can is present. When the can is placed in view of a camera in the room, the detection sandbox detects that the can is present and also detects that the application has registered content to display if an energy drink is present. The detection sandbox then displays the content, which in this case is an ad encouraging the user to drink tea instead. The application, however, never learns if the content displays or if the can is present.

With our detection sandbox, we cannot support users clicking on ads, because the input event would leak the presence of the object to the application. However, we believe that we can leverage the sizeable amount of previous work on privacy-preserving ad delivery to make interaction possible in the future [9, 32]. We discuss this idea in more detail in Section VII. □

C. Satellite Screens

In our discussion of the room skeleton above, we talked about exposing fixed, flat surfaces present in a room as virtual screens for content. Today, however, many people have personal mobile devices, such as smartphones or tablets. To accommodate these devices, we extend the room skeleton with remote displays that we call satellite screens.

Satellite screens let applications build multi-player experiences without needing to explicitly tackle creating a distributed system. By navigating to a URL of a cloud service, phones, tablets, or anything with a web browser can register a display with the room skeleton of a particular SurroundWeb instance. JavaScript running on the cloud service discovers the device's screen size and input capabilities, then communicates these to the room skeleton. The room skeleton surfaces each device to the application as a new screen, which can be rendered to like any other surface in the room. When a user closes the web page, the cloud service notifies the room skeleton to remove the screen.

Example 3 (Private displays for poker): Satellite screens enable applications that need private displays. For example, a poker application might use a shared high-resolution display to show the public state of the game. As players join personal phones or tablets as satellite screens, the application shows each player's hand on her own device. Players can also make bets by pressing input buttons on their own device. □

IV. Privacy Properties

Our abstractions provide three privacy properties: detection privacy, rendering privacy, and interaction privacy. We explain each in detail, elaborating on how we provide them in the design of our abstractions. We then discuss important limitations and how they may be addressed.

A. Detection Privacy

Property: Detection privacy means that an application can customize its layout based on the presence of an object in the room, but the application never learns whether the object is present or not. Without detection privacy, applications could scan a room and look for items that reveal sensitive information about a user's lifestyle.

For example, an e-commerce application could scan a room to detect valuable items, estimate the user's net worth, and then adjust the prices it offers to the user accordingly. As another example, an application could use optical character recognition to "read" documents left in a room, potentially learning sensitive information such as social security numbers, credit card numbers, or other financial data.

Because the presence of these objects is sensitive, these privacy threats apply even if the application has access to a high-level API for detecting objects and their properties, instead of raw video and depth streams [11]. At the same time, as we argued above, continuous object recognition enables new experiences. Therefore, detection privacy is an important goal for balancing privacy and functionality in immersive room experiences.

Mechanism: The detection sandbox provides detection privacy. Our threat model for detection privacy is that applications are allowed to register arbitrary content in the detection sandbox. This registration takes the form of rendering constraints specified relative to a physical object's position, which tell the detection sandbox where to display the registered content. The rendering process is handled by trusted code in SurroundWeb, which internally renders content regardless of the object's presence to defeat tracking pixels. SurroundWeb also blocks input events to this content. As a result, the application never learns whether an object is present or not, no matter what is placed in the detection sandbox. However, our approach places limitations on applications, both fundamental to the concept of the detection sandbox and as artifacts of our current approach. We discuss these in detail in Section VII.

B. Rendering Privacy

Property: Rendering privacy means that an application can render content into a room, but it learns no information about the room beyond an explicitly specified set of properties needed to render. Without rendering privacy, applications would have access to additional incidental information, which may be sensitive in nature. For example, many existing immersive room applications process raw camera data directly to determine where to render content. The raw camera feed can contain large amounts of incidental sensitive information, such as the faces of children or documents in front of the camera. However, the application needs access to a subset of this data so it can intelligently determine where to place virtual objects in the physical room. Therefore, rendering privacy is an important goal for balancing privacy and functionality in immersive room experiences.

Mechanism: The challenge in rendering privacy is creating an abstraction that follows the principle of least privilege for rendering, which we accomplish through the room skeleton. Our threat model for rendering privacy is that applications are allowed to query the room skeleton to discover screens, their capabilities, and their relative locations, as we described above. Unlike with the detection sandbox, we explicitly allow the web server to learn the information in the room skeleton. The rendering privacy guarantee differs from the detection privacy guarantee: in this case we explicitly leak a specific set of information to the application, while with detection privacy we leak no information about the presence or absence of objects. User surveys in Section VI show that revealing this information is acceptable to users.

C. Interaction Privacy

Property: Interaction privacy means that an application can receive natural user inputs from users, but it does not see other information such as the user's appearance or how many people are present. Interaction privacy is important because sensing interactions usually requires sensing people directly. For example, without a system that supports interaction privacy, an application that uses gesture controls could potentially see a user while she is naked or see the faces of people in a room. This kind of information is even more sensitive than the objects in the room. Therefore, interaction privacy is an important goal for balancing privacy and functionality in immersive room experiences.

Mechanism: We provide interaction privacy through a combination of two mechanisms. First, trusted code in SurroundWeb runs all natural user interaction detection code, such as gesture detection. Just as with the detection sandbox, applications never talk directly to gesture detection code. This means that applications cannot directly access sensitive information about the user.

Second, we map natural user gestures to existing UI events, such as mouse events. We perform this remapping to enable interactions with applications even if those applications have not been specifically enhanced for natural gesture interaction. These applications are never explicitly informed that they are interacting with a user through gesture detection, as opposed to through a mouse and keyboard. Our choice to focus on remapping gestures to existing UI events does limit applications. In Section VII we discuss how this could be relaxed while maintaining our privacy properties.

V. Implementation

Excluding external dependencies, our SurroundWeb prototype is written in 10K lines of C# and 1.5K lines of JavaScript. We describe how we implement the core SurroundWeb abstractions from Section III, and how we expose the abstractions to web applications in a natural manner. We discuss how SurroundWeb extends CSP to let web sites safely embed content from other origins using iframes. We walk the reader through writing the frontend to a karaoke web application, which illustrates how easy it is to write a SurroundWeb application, and describe a variety of items we have built using our prototype.

A. Abstraction Implementations

Our SurroundWeb prototype works with instrumented rooms, each of which contains multiple projectors and Kinect sensors. SurroundWeb uses the Kinect sensors to scan the geometry of the room, then uses the projectors to display application content at appropriate places in the room. Figure 1 displays an architectural diagram of the prototype.

Content rendering: The prototype uses the C# WebBrowser component to render HTML content in Internet Explorer. The prototype maps the rendered content to room locations, which are mapped back to display device coordinates. The prototype displays the content at the resulting display coordinates.

Room skeleton: The room skeleton exposes the sizes and relative locations of flat surfaces in the physical room that are able to display content. Our prototype constructs the room skeleton offline using KinectFusion [22]. Since Kinect depth data is very noisy, the prototype applies custom filters to the data stream that smooth the data to make it simpler to identify flat surfaces. Once the prototype detects the flat surfaces, it maps each back to display device coordinates.

Detection sandbox: The detection sandbox lets applications place content relative to recognized objects in the room without leaking the presence of those objects to the application. Our prototype detection sandbox consists of two core components: object detection, to detect and locate objects in the physical room, and constraint solving, to determine where to place web content in the room.

To detect objects, our prototype runs object classifiers on continuous depth and video feeds from Kinect cameras. While detecting arbitrary objects is a difficult computer vision problem, existing experiences use QR-like identifiers to recognize objects, which can be read using consumer-grade cameras. In addition, while our prototype is limited to vision-based object detection, object detection can be accomplished through other means, such as RFID tags or sensors located in objects. After an object is detected, the prototype checks the current web page for registered content, then updates its rendering of the room.

SurroundWeb compiles the locations of detected objects and application-specified object-relative content locations into 3D constraints. These constraints are fed into the FLARE constraint solver, which finds appropriate room coordinates for the content [28]. SurroundWeb uses these coordinates to render content appropriately in the room.

Satellite screens: We host a SurroundWeb satellite screen service in Microsoft Azure. Users point their phone, tablet, or other device with a browser to a specific URL associated with the running SurroundWeb instance. The front end runs JavaScript that discovers the browser's capabilities, then sets a cookie in the browser containing a unique screen identifier, and finally registers this new screen with a service backend. The screen appears in the room skeleton as a new renderable surface with no location information, and the Azure service proxies user input to SurroundWeb.
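The sketch below illustrates what such a front-end page might do; the endpoint path, payload shape, and response format are assumptions for illustration, not the real Azure service's API.

// Hypothetical front end for the satellite screen service; the endpoint
// and payload shape are assumed, not taken from the actual service.
window.onload = function () {
  var xhr = new XMLHttpRequest();
  xhr.open('POST', '/api/register-screen');  // assumed endpoint
  xhr.setRequestHeader('Content-Type', 'application/json');
  xhr.onload = function () {
    // Persist the backend-assigned screen identifier, as the paper
    // describes, so this device keeps the same identity across visits.
    document.cookie = 'screenId=' + JSON.parse(xhr.responseText).id;
  };
  xhr.send(JSON.stringify({
    width: window.screen.width,    // screen size discovered in the browser
    height: window.screen.height,
    capabilities: ('ontouchstart' in window) ? ['touch'] : ['mouse']
  }));
};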

Natural user interaction: The prototype uses the Kinect SDK to detect people and gestures [19]. Figure 7 shows a photograph of SurroundWeb detecting that the user is performing a pushing gesture over a particular part of the wall. After a gesture is detected, SurroundWeb maps the gesture to a generic mouse or keyboard event, then injects that event into the running web page.

Fig. 7: SurroundWeb maps natural gestures to mouse events. Here, the user uses a standard "push" gesture to tell SurroundWeb to inject a click event into the web application.

getAll() (static): Returns an array of all screens in the room.
id: A unique string identifier for this screen.
ppi: The number of pixels per inch.
height: Height of the screen in pixels.
width: Width of the screen in pixels.
capabilities: List of JavaScript events supported on this screen.
location: Location of the screen in the room as an object literal, with fields ul (upper-left) and lr (lower-right), each containing an object literal with x, y, and z fields.

Fig. 8: Properties of each Screen object, which are exposed through JavaScript.
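Our prototype performs this remapping in trusted C# code; purely as an illustration of the idea, a browser-side shim could synthesize the equivalent DOM event as follows (the mapping from room coordinates to page coordinates is assumed to have already happened).

// Illustrative sketch: deliver a recognized "push" gesture at page
// coordinates (x, y) as an ordinary click event.
function injectClick(x, y) {
  var target = document.elementFromPoint(x, y);
  if (target) {
    target.dispatchEvent(new MouseEvent('click', {
      bubbles: true, cancelable: true, clientX: x, clientY: y
    }));
  }
}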

B. API Design

The prototype exposes the room skeleton, satellite screens, and detection sandbox to web pages through extensions to HTML, CSS, and JavaScript. We briefly describe these extensions below.

Room skeleton: To display content on a screen in the room skeleton, the application must first identify which screen it wants to use. The application queries for available screens by calling Screen.getAll(). Figure 8 displays all of the properties available on each Screen object. Then, it must encapsulate the web content in a <segment> HTML tag, which is a new tag that we add to the web platform. The segment tag must have a screen property with the relevant screen's id value assigned. Once this is done, the content will appear in the room on the specified screen.

// Display tweet on first screen in room skeleton.
$('#tweet').attr('screen', Screen.getAll()[0].id);

<segment id="tweet">
  <div class="tweet">
    @TypeScriptLang: TypeScript 1.4 now available! Now with union types,
    type aliases, better inference, and more ES6 support!
  </div>
</segment>


left-of: Place the segment to the left of the object.
right-of: Place the segment to the right of the object.
above: Place the segment above the object.
below: Place the segment below the object.
valign: Vertically align the segment with the object, relative to the plane perpendicular to the ground.
halign: Horizontally align the segment with the object, relative to the ground.

Fig. 9: Our prototype lets applications specify object-relative layout constraints on content via new CSS properties. These can be applied to segment HTML elements, which we add to the web platform. Each property takes a list of object names.

Satellite screens: The process for displaying content on a satellite screen is identical to displaying content on a screen in the room skeleton. Satellite screens appear in the list of screens returned through Screen.getAll(), so an application simply associates each satellite screen's id with the segment it wishes to display on that screen.
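For instance, in the multiplayer poker scenario, a page might route a private hand to a satellite screen as sketched below. We assume here that satellite screens can be recognized by a missing location field, since the prototype registers them with no location information (Section V-A); the element id is hypothetical.

// Sketch: find a satellite screen (assumed to be one without location
// data) and display the player's hand on it.
var satellite = Screen.getAll().filter(function (scn) {
  return !scn.location;
})[0];
if (satellite) {
  document.getElementById('player-hand').setAttribute('screen', satellite.id);
}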

Detection sandbox: Our prototype augments CSS to support object-relative content positioning in the room.² Web applications encapsulate the web content in a <segment> tag, and specify object-relative CSS constraints on the tag. We describe the constraints that our prototype supports in Figure 9, which translate into FLARE 3D constraints in a straightforward manner [28].

/* Advertise lightbulb prices near a lamp in the room. */
#lightbulb-prices { left-of: "lamp" }

<segment id="lightbulb-prices">
  <div class="prices">
    GE 60-Watt Incandescent Light Bulbs: 4 for $6.49 @ Amazon.com
  </div>
</segment>

C. Embedding Web Content

We now discuss how to handle embedding of content from different origins inside of a web page running in SurroundWeb. In this discussion, we use the same web site principal as defined in the same-origin policy (SOP) for SurroundWeb web pages, which is labeled by a web site's origin: the 3-tuple 〈protocol, domain name, port〉.

In an effort to be backward-compatible and to preserve concepts that are familiar to the developer, we extend the security model present on the web today: SurroundWeb web pages can embed content from other origins, such as scripts and images, and can use Content Security Policy (CSP) to restrict which origins they can embed particular types of content from. Like in the web today, scripts embedded from other origins have full access to the web page's Document Object Model (DOM), which includes all browser APIs, SurroundWeb interfaces, and segments defined by the web page.

We preserve this property for compatibility reasons, as many current web sites and JavaScript libraries rely on the ability to load popular scripts, such as jQuery, from Content Distribution Networks (CDNs). Since SurroundWeb extends HTML, JavaScript, and CSS, these existing libraries will still have utility in SurroundWeb.

²Since we are unable to modify Internet Explorer's CSS engine, our prototype implementation exposes constraints through equivalent HTML attributes on segment elements.

Fig. 10: The Karaoke application uses the room skeleton to make room layout decisions. The application dynamically decides to show a video on the high-resolution TV. Lyrics and photos go on projected screens, and related videos are projected on the table.

Web pages can use the iframe tag to safely embed content from untrusted origins without granting them access to SurroundWeb's interfaces. In the current web, a frame has a separate DOM from the embedding site. If the frame is from a different origin than the embedding site, then the embedded origin and the embedding origin cannot access each other's DOM. We extend CSP to allow web pages to control whether or not particular origins can access the SurroundWeb interfaces from within a frame. If a web page denies an embedded origin access to these interfaces, then the iframe will render as it does today: to a fixed-size rectangle whose placement the embedding origin controls. If the web page allows an embedded origin access to these interfaces, then the iframe will be able to render content in the room skeleton and detection sandbox.
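The paper does not name the CSP directive it adds; purely as an illustrative sketch, a policy granting one embedded origin access to the immersive interfaces might look like the following, where surround-src is a hypothetical directive name:

Content-Security-Policy: frame-src https://widgets.example.com https://ads.example.net;
                         surround-src https://widgets.example.com

Under such a policy, frames from widgets.example.com could render into the room skeleton, while frames from ads.example.net would remain confined to ordinary fixed-size rectangles.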

D. Walkthrough: Karaoke Application

To illustrate how a web page can be developed using SurroundWeb, we will walk through a sample Karaoke application, shown in Figure 10.

This web page renders karaoke lyrics above a central screen, with a video on the central screen and pictures around the screen. Related songs are rendered on a table. The web page contains the following HTML:

<segment id="lyrics"><!--Lyrics HTML --></segment ><segment id="video"><!--Video HTML --></segment ><segment id="related">

<!--Related songs HTML --></segment >

The web page must scan the room skeleton to assign the segments specified in the HTML to relevant screens. Figure 11 shows all of the rendering code put together; the numbered steps below walk through it.

<html>
<head>
<script type="text/javascript">
// Wait for HTML to load before running code.
window.onload = function () {
  var screens = Screen.getAll(), bigVScn, maxPpi = 0;
  function isVertical(scn) {
    var scnLoc = scn.location, ul = scnLoc.ul, lr = scnLoc.lr,
        zDelta = Math.abs(ul.z - lr.z),
        xDelta = Math.abs(ul.x - lr.x),
        yDelta = Math.abs(ul.y - lr.y);
    return zDelta > xDelta || zDelta > yDelta;
  }
  // Find the highest resolution vertical screen.
  screens.forEach(function (scn) {
    if (isVertical(scn) && scn.ppi > maxPpi) {
      bigVScn = scn;
      maxPpi = bigVScn.ppi;
    }
  });
  // Assign video to screen.
  document.getElementById('video').setAttribute('screen', bigVScn.id);

  var aboveScn, bigLoc = bigVScn.location;
  screens.forEach(function (scn) {
    if (!isVertical(scn) || scn === bigVScn) return;
    var scnLoc = scn.location, ul = scnLoc.ul, lr = scnLoc.lr;
    if (lr.z > bigLoc.ul.z) {
      // scn is above bigVScn.
      if (aboveScn) {
        // Is scn closer to bigVScn than aboveScn?
        if (aboveScn.location.lr.z > lr.z)
          aboveScn = scn;
      } else aboveScn = scn;
    }
  });
  // Assign lyrics to screen.
  document.getElementById('lyrics').setAttribute('screen', aboveScn.id);

  var bigHScn, maxArea = 0;
  screens.forEach(function (scn) {
    var area = scn.height * scn.width;
    if (!isVertical(scn) && area > maxArea) {
      maxArea = area; bigHScn = scn;
    }
  });
  // Assign related videos to screen.
  document.getElementById('related').setAttribute('screen', bigHScn.id);

  // Assign random related media to other screens.
  screens.forEach(function (scn) {
    if (scn !== aboveScn && scn !== bigHScn && scn !== bigVScn)
      renderMedia(scn);
  });
  function renderMedia(scn) {
    var newSgm = document.createElement('segment');
    newSgm.setAttribute('screen', scn.id);
    newSgm.appendChild(constructRandomMedia());
    document.body.appendChild(newSgm);
  }
};
</script>
</head>
<body>
<segment id="lyrics"><!-- Lyrics HTML --></segment>
<segment id="video"><!-- Video HTML --></segment>
<segment id="related"><!-- Related songs HTML --></segment>
</body>
</html>

Fig. 11: The complete SurroundWeb code for the Karaoke application.

1) Using JavaScript, the web page locates the vertical screen with the highest resolution, which will contain the video:

var screens = Screen.getAll(), bigVScn, maxPpi = 0;
function isVertical(scn) {
  var scnLoc = scn.location, ul = scnLoc.ul, lr = scnLoc.lr,
      zDelta = Math.abs(ul.z - lr.z),
      xDelta = Math.abs(ul.x - lr.x),
      yDelta = Math.abs(ul.y - lr.y);
  return zDelta > xDelta || zDelta > yDelta;
}
// Find the highest resolution vertical screen.
screens.forEach(function (scn) {
  if (isVertical(scn) && scn.ppi > maxPpi) {
    bigVScn = scn;
    maxPpi = bigVScn.ppi;
  }
});
// Assign video to screen.
document.getElementById('video').setAttribute('screen', bigVScn.id);

2) The web page determines the closest vertical screen above the main screen, and renders the karaoke lyrics to it. In the code below, z is the distance from the floor:

var aboveScn, bigLoc = bigVScn.location;
screens.forEach(function (scn) {
  if (!isVertical(scn) || scn === bigVScn) return;
  var scnLoc = scn.location, ul = scnLoc.ul, lr = scnLoc.lr;
  if (lr.z > bigLoc.ul.z) {
    // scn is above bigVScn.
    if (aboveScn) {
      // Is scn closer to bigVScn than aboveScn?
      if (aboveScn.location.lr.z > lr.z)
        aboveScn = scn;
    } else aboveScn = scn;
  }
});
// Assign lyrics to screen.
document.getElementById('lyrics').setAttribute('screen', aboveScn.id);

3) For the listing of related videos, the application locates the largest horizontal screen in the room:

var bigHScn, maxArea = 0;
screens.forEach(function (scn) {
  var area = scn.height * scn.width;
  if (!isVertical(scn) && area > maxArea) {
    maxArea = area; bigHScn = scn;
  }
});
// Assign related videos to screen.
document.getElementById('related').setAttribute('screen', bigHScn.id);

4) Finally, the application assigns random media to render on other screens:

screens.forEach(function (scn) {
  if (scn !== aboveScn && scn !== bigHScn && scn !== bigVScn)
    renderMedia(scn);
});
function renderMedia(scn) {
  var newSgm = document.createElement('segment');
  newSgm.setAttribute('screen', scn.id);
  newSgm.appendChild(constructRandomMedia());
  document.body.appendChild(newSgm);
}

Now that the rendering code has finished, the karaoke application can update each screen in the same manner that a vanilla web site updates individual HTML elements on the page. Should the chosen configuration be suboptimal for the user, the Karaoke application can provide controls that allow the user to dynamically choose the rendering location of the segments.

Fig. 12: Using our adaptation library, a 3D-cube demo runs across multiple surfaces in our SurroundWeb prototype after a simple one-line change. While the demo believes it is drawing a 3D scene to a single HTML5 canvas, it is actually drawing the scene across three SurroundWeb screens.

E. Application Showcase

Using the SurroundWeb prototype, we implemented room-wide presentation software, a soda can object detector, and a JavaScript library that adapts existing HTML5 canvas applications to work across screens in the room skeleton.

Room-wide presentations: SurroundPoint is a room-wide presentation application where each slide can span multiple surfaces in the room. A SurroundPoint presentation can be seen in Figure 6. Each slide contains "main" content, and a set of optional additional content. By querying the room skeleton, SurroundPoint adapts the presentation to different environments. We used SurroundPoint to deliver multiple presentations to the press [21, 31].

Soda can detector: We implemented a soda can detector that supports detecting different types of soft drink cans using a nearest-neighbor classifier based on color image histograms. When the classifier detects a soda can, the detector alerts the detection sandbox, which updates the rendering of the room. Note that object detectors are considered to be a trusted part of SurroundWeb, and web pages do not directly interact with them.
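As a rough sketch of this classification approach (our own simplification, not the prototype's code, which runs in trusted C#): compute a normalized color histogram for a camera crop and pick the labeled training example with the smallest L1 distance.

// Sketch of nearest-neighbor classification over color histograms.
// hist() maps an array of [r, g, b] pixels to a normalized histogram
// with 8 buckets per channel; classify() returns the label of the
// nearest training example under L1 distance.
function hist(pixels) {
  var h = new Array(24).fill(0);
  pixels.forEach(function (p) {
    h[Math.floor(p[0] / 32)] += 1;        // red buckets 0-7
    h[8 + Math.floor(p[1] / 32)] += 1;    // green buckets 8-15
    h[16 + Math.floor(p[2] / 32)] += 1;   // blue buckets 16-23
  });
  return h.map(function (v) { return v / pixels.length; });
}
function classify(pixels, training) {  // training: [{label, hist}]
  var h = hist(pixels), best = null, bestDist = Infinity;
  training.forEach(function (ex) {
    var d = ex.hist.reduce(function (s, v, i) {
      return s + Math.abs(v - h[i]);
    }, 0);
    if (d < bestDist) { bestDist = d; best = ex.label; }
  });
  return best;
}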

Adapting existing web content: Since SurroundWeb merely adds additional features to the web platform, existing web sites already function in SurroundWeb, but they are limited to a single screen. To ease the transition into a multi-screen layout, we wrote a JavaScript library called JumboTron that automatically adapts applications that use the HTML5 canvas element to render across multiple surfaces in the room. The library detects screens in the room skeleton that are close together, and creates distinct canvas elements for each screen. It then binds the canvas elements together in a JumboTron object, which emulates the canvas interface. The application interacts with the JumboTron object, which proxies the input to the relevant sub-canvases. Figure 12 displays a three.js demo that required a one-line change to use our library and render across surfaces.
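A minimal sketch of the idea behind JumboTron (the names and offset bookkeeping are ours, not the library's): a canvas-like facade replays each drawing call on every sub-canvas, shifted into that screen's local coordinate system.

// Sketch: bind per-screen canvases behind one canvas-like object.
// offsets[i] gives the global coordinate of sub-canvas i's top-left.
function JumboTron(canvases, offsets) {
  var ctxs = canvases.map(function (c) { return c.getContext('2d'); });
  return {
    fillRect: function (x, y, w, h) {
      ctxs.forEach(function (ctx, i) {
        // Shift into each sub-canvas's local coordinates.
        ctx.fillRect(x - offsets[i].x, y - offsets[i].y, w, h);
      });
    }
    // ...other CanvasRenderingContext2D methods would be proxied the
    // same way to emulate the full canvas interface.
  };
}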

VI. Evaluation

The focus of this evaluation is two-fold. We want to evaluate whether our abstractions deliver adequate privacy guarantees, which we do via a user study, and whether the performance of SurroundWeb is acceptable.

A. Privacy

To evaluate the privacy of our abstractions, we conducted two surveys of random US Internet users using the Instant.ly survey service [33]. Below are the research questions we were trying to address via the surveys and the answers we discovered experimentally.

• RQ1: Is the information revealed by specific abstractions considered less sensitive than raw data? (Yes) We asked this question to evaluate whether our abstractions in fact mitigate privacy concerns.

• RQ2: Do user privacy preferences change depending on the application asking for data? (Yes) We asked this question to evaluate whether follow-on work should support different abstractions or different permissions for different web pages.

• RQ3: Is there a difference in user-perceived sensitivity between giving the relative locations of flat surfaces vs. just stating the dimensions of flat surfaces? (No) We asked this question because we were not sure whether to include relative locations in the room skeleton; the answer justified including them because doing so increased functionality without changing the perceived sensitivity of the data given to the application.

We did not ask any questions that required participants to supply us with personally identifiable information. Prior to running our surveys, we reviewed our questions, the data we proposed to collect, and our choice of survey provider with our institution's group responsible for protecting the privacy and safety of human subjects.

Room skeleton sensitivity: The first survey measures user privacy attitudes toward the information that the room skeleton reveals to applications. The survey began with a briefing summarizing the capabilities commonly available to immersive experiences. Next, we showed users two pictures: a color picture of a room, and a visualization of the data exposed from that room in the room skeleton. Figure 14 displays these pictures. The survey asked the following question: Only consider the information explicitly displayed in the two images. Which image contains more information that you consider "private"? We also asked respondents to justify their answer in a free-text field.

Finding 1: Out of 50 respondents, 78% claimed that they felt that the raw data was more sensitive than the information that the room skeleton exposes to web pages. However, we noticed from the written justifications that some people did not understand the survey and answered randomly. Others mistakenly chose the information they felt was most sensitive. After filtering out the random respondents and accounting for those who misinterpreted the question, 87.5% claimed that they felt the raw data was more sensitive than the information in the room skeleton. This supports our choice of data to reveal in the room skeleton.

Fig. 13: Survey participants chose which data should be given to three hypothetical applications by choosing zero or more of the following options: (a) large flat surfaces: size (height and width) and orientation (standing up / lying down); (b) location of large flat surfaces; (c) head position; (d) hand positions; (e) body position; (f) face; (g) raw image.

Fig. 14: Excerpt from our room skeleton sensitivity survey. We asked users which image contains more information that they consider private; 87.5% believe the raw image contains more private information.

Application-specific surveys: Our second survey explores a broader set of possible data that could be released to applications in the context of different application scenarios. We presented 50 survey-takers with three different application descriptions: a "911 Assist" application that detects falls and calls 911, a "Dance Game" that asks users to mimic dance moves, and a "Media Player" that plays videos. We asked them which information they would feel comfortable sharing with each application. Participants chose from the visualizations of different data available to SurroundWeb shown in Figure 13. Participants could choose all, some, or none of the choices as information they would be comfortable revealing to the application. Then, we asked participants to justify their answer in a free-text field. From this survey, we have the following findings:

Finding 2: Users have different privacy preferences for different applications. For example, when asked about a hypothetical 911 Assist app, one person stated, "It seems like it would only be used in an emergency and only communicated to emergency personnel", and another said "Any info that would help with 911 is worth giving". Users were more skeptical of releasing information to the dance game; one user stated, "A dance game would not need more information than the general outline or placements of the body". In some cases respondents thought creatively about how an application could use additional information; one respondent suggested that the Media Player application could adjust the projection of the video to "where you are watching". These responses support a design with fine-grained permission levels for individual applications, which we view as future work.

Finding 3: Users did not distinguish between the sensitivity of screen sizes and room geometry. Screen sizes include the number, orientation, and size of screens, but not their positions in the room. Room geometry refers to the information revealed through our room skeleton abstraction. Before conducting our surveys, we hypothesized (RQ3) that users would find room geometry to be more sensitive than screen sizes. In fact, our data does not confirm this hypothesis. In response to this finding, we decided that the room skeleton would expose the room's geometry.

B. Performance

Room skeleton performance: SurroundWeb performs a one-time scan of a room to extract planes suitable for projection, using a Kinect depth camera. Figure 15 shows the results of scanning three fairly typical rooms we chose. No room took longer than 70 seconds to scan, which is reasonable for a one-time setup cost.

Detection sandbox constraint solving time: SurroundWeb uses a constraint solver to determine the position of segments that web sites register with the detection sandbox in the room without leaking the presence or location of an object to the web application. The speed of the constraint solver is therefore important for web


Room type        Scanning Time (s)   # Planes Found
Living room #1   30                  19
Office           70                  7
Living room #2   17                  13

Fig. 15: Scanning times, in seconds, and results for three representative rooms.

[Plot: constraint solving time (sec), 0 to 8, versus number of segments, 0 to 100.]

Fig. 16: Solver performance as the number of segments registered with the detection sandbox increases. The error bars indicate 95% confidence intervals.

applications that use the detection sandbox. Note that constraint solving occurs on its own thread, so applications have no way of measuring this information. We considered two scenarios for benchmarking the performance of the constraint solver.

• We considered the scenario where the web application registers only constraints of the form "show this segment near a specific object." Figure 16 shows how solving time increases for this scenario as the number of registered content segments in a single web application grows. While we expect pages to have many fewer than 100 segments registered with the detection sandbox, the main point of this experiment is that constraint solving time scales linearly as the number of segments grows.

• Next, we considered the scenario where the web page uses the solver for a more complicated layout. We tested a scene with 12 detected objects and 8 content segments. We created a "stress" script with 30 constraints, including constraints for non-overlap in area between segments, forcing segments to have specified relative positions, specified distance bounds between segments in 3D space, and constraints on the line of sight to each segment. The constraints were solved in less than 4 seconds on one core of an Intel Core i7 2.2 GHz processor.

In both cases, only segments that use constraints incur latency. Since the constraint solver operates on its own thread, it does not delay the display of content rendered via the room skeleton.
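To make the registration model concrete, the following is a minimal sketch of how a page might hand a segment and its constraints to the detection sandbox. The object model, method name, and constraint vocabulary here are illustrative assumptions, not the prototype's actual API:

```javascript
// Hypothetical sketch of registering object-relative content with the
// detection sandbox. All names below are illustrative assumptions.
var ad = document.createElement('div');
ad.textContent = 'Sheet music on sale';

// Hand the segment and its constraints to the trusted renderer. The call
// returns nothing the page can introspect: placement, and even whether a
// piano was detected at all, stay inside the sandbox.
window.surroundWeb.detectionSandbox.register(ad, [
  { type: 'nearObject', object: 'piano', maxDistanceMeters: 1.0 },
  { type: 'noOverlap' },    // must not cover other segments
  { type: 'lineOfSight' }   // keep the segment visible to the user
]);
```

The essential design point is that registration is fire-and-forget: the page supplies declarative constraints like those benchmarked above and learns nothing about how, where, or whether they were satisfied.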

[Left plot: frames per second (FPS), 0 to 70, versus number of 192×108 screens, 0 to 100. Right plot: frames per second (FPS), 0 to 70, versus screen size: 540×480, 810×640, 1440×720, 1920×1080.]

Fig. 17: On the left, maximum rendering frame rate as the number of same-size screens increases. On the right, the maximum rendering frame rate of a single screen as its size increases. Error bars indicate 95% confidence intervals.

Rendering performance: We ran a benchmark that measures how fast our prototype can alter the contents of HTML5 canvas elements mapped to individual screens. Figure 17 displays the results. In the left configuration, the benchmark measures the frame rate as it increases the number of 192×108 screens. To stress the novel pieces of the SurroundWeb rendering pipeline, we must render across multiple screens. For a single screen, SurroundWeb simply uses the embedded Trident rendering engine from Internet Explorer.

In the right configuration, the benchmark measures the frame rate as it increases the size of a single screen. With 25 or fewer screens, or a single screen with resolution up to 1,440×720, the prototype maintains an acceptable frame rate above 30 FPS. These numbers could be improved by tighter integration into the rendering pipeline of a web browser. At present, our prototype must copy each frame multiple times and across language boundaries, as it is written in C# but embeds a native WebBrowser control. Despite these limitations, our prototype achieves reasonable frame rates.
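For reference, a minimal benchmark in the same spirit can be written with only standard web APIs; in SurroundWeb each canvas would additionally be mapped to a room screen, which is the step this sketch necessarily omits:

```javascript
// Sketch of a canvas frame-rate benchmark: repaint one canvas per screen
// each frame for five seconds and report the achieved frame rate. The
// mapping of canvases to room screens is assumed, not shown.
var canvases = [];
for (var i = 0; i < 25; i++) {
  var c = document.createElement('canvas');
  c.width = 192;
  c.height = 108;
  document.body.appendChild(c); // in SurroundWeb, one canvas per room screen
  canvases.push(c);
}

var frames = 0;
var start = performance.now();
function tick() {
  for (var i = 0; i < canvases.length; i++) {
    var ctx = canvases[i].getContext('2d');
    ctx.fillStyle = 'hsl(' + (frames % 360) + ', 80%, 50%)';
    ctx.fillRect(0, 0, 192, 108); // repaint the entire surface each frame
  }
  frames++;
  if (performance.now() - start < 5000) {
    requestAnimationFrame(tick);
  } else {
    console.log('FPS: ' + (frames / 5).toFixed(1));
  }
}
requestAnimationFrame(tick);
```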


VII. Limitations and Future Work

Detection, Rendering, and Interaction privacy, presented in Section IV, are variations on a theme: enabling least privilege for immersive room experiences. In each case, we provide an abstraction that supports immersive experiences while revealing minimal information. We discuss limitations to the privacy properties provided by our abstractions and limitations to our abstraction implementations in SurroundWeb, along with other future work.

Social engineering: Applications can ask users to explicitly tell them if an object is present in the room, or to send information about the room to the site. These attacks are not prevented by our abstractions, but they could also be carried out by existing applications.

Browser fingerprinting: Browser fingerprinting allows a web page to uniquely identify a user based on the instance of her browser. Our interfaces add new information to the web browser that could be used to fingerprint the user, such as the location and sizes of screens in the room. We note that browser fingerprinting is far from a solved problem, with recent work showing that even seemingly robust countermeasures fail to prevent fingerprinting in standard browsers [1, 24]. We also do not solve the browser fingerprinting problem.

Clickjacking: Clickjacking is the problem of a malicious site overlaying UI elements on the elements of another site, causing the malicious site to intercept clicks intended for the other site [10]. As a result, the browser takes an unexpected action on behalf of the user, such as authorizing a web site to access the user's information.

SurroundWeb forbids segments to overlap, guaranteeing that a user's input on a screen is received by the visible segment. This property allows web sites to grant iframes access to the SurroundWeb API with the assurance that the iframe cannot intercept screen input events intended for the embedding site.

However, because SurroundWeb extends the existing web model, it is possible that a web site has embedded a malicious script that uses existing web APIs to create an overlay within the segment. Thus, SurroundWeb does not solve the clickjacking problem as it is currently faced on the web. That said, we also do not make the clickjacking problem worse, and we do not believe our abstractions introduce new ways to perform clickjacking.
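The non-overlap guarantee itself reduces to a simple geometric invariant that the trusted layout engine can enforce when admitting a segment. A minimal sketch, with an assumed axis-aligned rectangle representation for segment placements:

```javascript
// Sketch of the invariant behind SurroundWeb's clickjacking defense:
// no two segments may overlap, so any input event on a screen belongs
// unambiguously to the one visible segment there. The rectangle
// representation is an assumption for illustration.
function overlaps(a, b) {
  return a.x < b.x + b.width  && b.x < a.x + a.width &&
         a.y < b.y + b.height && b.y < a.y + a.height;
}

function canPlace(newSegment, placedSegments) {
  // Reject any placement that would cover an existing segment.
  return placedSegments.every(function (s) {
    return !overlaps(newSegment, s);
  });
}
```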

Side channels: The web platform allows introspection on documents, and different web browsers have subtly different interpretations of web APIs. Malicious JavaScript can use these differences and introspection to learn sensitive information about the user's session. One key example is history sniffing, where JavaScript code from malicious web applications could determine whether a URL had been visited by querying the color of a link once rendered. While this property is no longer directly accessible to JavaScript in recent browsers, recent work has found multiple interactive side channels that leak whether a URL has been visited [16, 36].

Because SurroundWeb extends the web platform, side channels that exist on the current web are still present in SurroundWeb. For the detection sandbox, we avoid these side channels by having a separate trusted renderer place object-relative content registered by applications. Application interactions with the detection sandbox are strictly one-way: the application hands content and constraints to the detection sandbox, and has no way to query the content to determine its room location, or to determine whether the content is displayed in the room at all. The detection sandbox renders content immediately, regardless of whether it appears in the room, to prevent the application from using tracking pixels or web bugs to determine if content is displayed [29]. Furthermore, applications receive no input from content placed in the detection sandbox. In this way we isolate the content in the detection sandbox and mitigate side channels that could reveal to the application whether the content is present and, if so, where.
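The following sketch illustrates this one-way, render-always discipline from the trusted renderer's perspective. It is a conceptual illustration, not the prototype's code; the helper functions are stand-in stubs:

```javascript
// Conceptual sketch of why tracking pixels fail against the detection
// sandbox: the trusted renderer loads and paints registered content
// unconditionally, so network traffic looks identical whether or not
// the trigger object is in the room. Helper bodies are stubs.
function renderOffscreen(segment) { /* fetch and paint; any web bugs fire here */ }
function project(segment, placement) { /* paint onto the chosen room surface */ }
function solve(constraints, objects) { return null; /* constraint solver stub */ }

function handleRegistration(segment, constraints, room) {
  // Always fetch and render, even if the content will never be shown.
  renderOffscreen(segment);
  // Solve for a placement on the trusted side, on its own thread.
  var placement = solve(constraints, room.detectedObjects);
  if (placement !== null) {
    project(segment, placement);
  }
  // In neither branch does anything flow back to the application:
  // no callback, no input events, no queryable layout state.
}
```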

There may also be new side channels that reveal sensitive information about the room. For example, although constraint solving occurs on its own thread, performance may differ in the presence or absence of an object in the room. As another example, our mapping from natural gestures to mouse events may reveal that the user is interacting with gestures, or other information about the user. Characterizing and defending against such side channels is future work.

Extending the detection sandbox: In our prototype, the detection sandbox allows only for registering content to be displayed when specific objects are detected. We could extend this to enable matching the color of an element to the color of a nearby object. As a further step, web applications might specify portions of a page, or entire pages, that are allowed to access object recognition events in exchange for complete isolation from the web server.

These approaches would require careful consideration of how to prevent leaking information about the presence of an object through JavaScript introspection on element properties or other side channels. We could use information flow control to prevent sensitive information from affecting data sent back to web servers, but implementing sound information flow tracking for all of JavaScript and its natively-implemented browser environment without unduly impacting performance presents a formidable challenge.

The current detection sandbox is limited to trusted object classifiers, preventing applications from providing application-specific object classifiers. The detection sandbox could be extended to support untrusted object classifiers through native sandboxing [27]. These untrusted classifiers would be provided with raw sensor data, and would be able to alert the detection sandbox when they detect


objects, but would not be able to leak any information outside of the sandbox.

Our detection sandbox does appear to rule out server-side computation dependent on object presence, barring a sandboxed and trusted server component. For example, cloud-based object recognition may require sandboxed code on the server.

Supporting clickable object-relative ads: Because our sandbox prevents user inputs from reaching registered content, in our prototype users can see object-dependent ads but cannot click on them. Previous work on privacy-preserving ads has suggested locally aggregating user clicks or using anonymizing relay services to fetch additional ad content [9]. We could use these approaches to create privacy-friendly yet interactive object-dependent ads.

VIII. Related Work

SurroundWeb bridges the gap between emerging research in HCI on augmented reality rendering abstractions and research in security on preserving user privacy in augmented reality settings. Figure 18 summarizes the capabilities that these research systems surface compared with SurroundWeb.

A. Regulating Access to Sensor Data

Existing research in security focuses explicitly on regulating sensor data provided to untrusted applications; none provide any capabilities for rendering content in an augmented reality environment. This research is orthogonal to SurroundWeb, and could be applied to expose rich sensor data to 3D web pages in a privacy-preserving manner.

Darkly [12] performs privacy transforms on sensor inputs using computer vision algorithms (such as blurring, extracting edges, or picking out important features) to prevent applications from extracting private information from sensor data. For sensitive information, Darkly provides the application with opaque references that it can pass to trusted library routines, and allows the application to apply basic numerical calculations to the data. Darkly also provides a trusted GUI component, but applications are unable to introspect on the GUI's contents, which is an important feature on the web. In addition, Darkly does not view the presence of input events on the trusted GUI as sensitive, whereas SurroundWeb must necessarily block input events to content registered with the detection sandbox to prevent the application from learning which objects are in the room.

As a separate approach to the same problem, previous work introduced the recognizer abstraction, in which trusted code in the OS constructs higher-level data from sensor data and applications receive permission-controlled access to it [5, 11]. SurroundWeb itself uses a trusted recognizer to map natural user input to existing web events. πBox takes a third approach to this problem, and sandboxes code that interacts with sensitive sensor information [15]. SurroundWeb's detection sandbox, which uses a constraint-based layout system, could be extended to support full-fledged sandboxed layout decisions that make use of sensitive sensor information.

B. High-level Rendering Abstractions

On the other hand, recent research in HCI has created useful abstractions for rendering content in an augmented reality environment, but they are either limited in functionality or reveal potentially sensitive information to the application. Many of these abstractions can be reimplemented completely or partially in SurroundWeb to preserve user privacy. IllumiRoom uses projectors combined with a standard TV screen to create gaming experiences that "spill out" of the TV and into the room, but to do this the application has access to the raw sensor feed [13]. SurroundWeb can create a similar experience using its room skeleton abstraction.

Panelrama lets web pages intelligently split their content across local devices using a "Panel" abstraction, but it makes layout decisions using only static device properties, such as screen size, rather than data available through sensors, such as room location [37]. In addition, because Panelrama does not depend on support in the web browser, application developers must explicitly manage state synchronization across multiple communicating pages, each with its own DOM tree. In contrast, with SurroundWeb the developer writes a single page with all elements in the same DOM tree. Panelrama could be reimplemented on top of SurroundWeb's room skeleton and satellite screens, and extended to incorporate room location into its layout algorithm.
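To illustrate the difference in programming model, consider the following hypothetical SurroundWeb-style page. The screen-placement API is an assumed name for illustration; the point is that both elements share one DOM tree, so no cross-page synchronization is needed:

```javascript
// Hypothetical sketch of SurroundWeb's single-page model: both segments
// live in one document. The screens.place() call is an illustrative
// assumption, not the prototype's actual API.
var player = document.createElement('video');
var lyrics = document.createElement('div');
document.body.appendChild(player);
document.body.appendChild(lyrics);

// Assumed placement API: map each element onto a surface from the room
// skeleton. In Panelrama, these would be two separate pages that must
// exchange messages to stay in sync.
window.surroundWeb.screens.place(player, 'largest-wall-surface');
window.surroundWeb.screens.place(lyrics, 'nearby-table-surface');

var currentTrack = { lyrics: 'La la la' };
lyrics.textContent = currentTrack.lyrics; // one document, one assignment
```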

Layar uses tags in printed material and GPS coordinates to superimpose interactive content relative to a tagged object or location in a mobile phone's camera feed [14]. Unlike with SurroundWeb's detection sandbox, accessing Layar content reveals to the application that the user either possesses the tagged object or is near a tagged physical location. The Argon mobile browser extends HTML to let web applications place content at particular GPS locations, which the user can view through her mobile phone [7]. As with Layar, viewing and loading this content reveals to the application that the user is near a tagged GPS location. SurroundWeb avoids this side channel by immediately loading and rendering any content registered with the detection sandbox, regardless of whether or not it will be displayed in the room.

C. Access Control

Recent work on world-driven access control restricts sensor input to applications in response to the environment; e.g., it can disable access to the camera when the user is in a bathroom [26]. Mazurek et al. surveyed 33 users about how they think about controlling access to data provided by a variety of devices, and discovered that many users' mental models of access control are incorrect [18].


Field     Related work                 Application Sensor Input   Rendering Support
Security  SurroundWeb                  Filtered, Sandboxed        Multi-device, Room-location-aware, Object-relative
          Darkly [12]                  Filtered                   —
          OS-level Recognizers [11]    Filtered                   —
          World-driven ACL [26]        Filtered                   —
          πBox [15]                    Sandboxed                  —
HCI       IllumiRoom [13]              Unrestricted               Room-location-aware
          Panelrama [37]               —                          Multi-device
          Layar [14]                   Object presence            Object-relative
          Argon [7]                    Object presence            Object-relative

Fig. 18: Summary of related work in security and HCI. Emerging research in security focuses solely on restricting application access to sensitive sensor data, while HCI research generally focuses on application rendering capabilities without considering their privacy implications. SurroundWeb bridges the gap between these two bodies of research.

Vaniea et al. performed an experiment to determine how users notice and fix access-control permission errors depending on where the access-control policy is spatially located on a website [34].

D. Browser Privacy

Previous research in web security has unearthed a variety of privacy issues in existing web browsers. While a great deal of focus has been on stateful user tracking via cookies and policies such as Do Not Track, there are additional issues inherent in browser design that may compromise privacy.

Most notably, a variety of methods exist for history sniffing, which allow web sites to learn about users' visits to other sites. Weinberg et al. describe interactive side channels that leak whether a URL has been visited [36]. Liang et al. discovered a novel CSS timing attack for sniffing browser history [16].

Nikiforakis et al. [23] propose a way to combat browser-based stateless fingerprinting by applying randomization policies, which are evaluated to minimize the page breakage they might cause. This approach is proposed as a complement to private browsing mode, which is designed to protect against stateful fingerprinting.

Other security research discusses browser fingerprinting, which allows a web page to uniquely identify a user based on the instance of the browser. The work of Mayer [17] and Eckersley [6] presents large-scale studies that show the possibility of effective stateless web tracking via only the attributes of a user's browsing environment. These studies prompted some follow-up efforts [8, 35] to build better fingerprinting libraries. Yen et al. [38] performed a fingerprinting study by analyzing month-long logs of Bing and Hotmail, and showed that the combination of the User-Agent HTTP header with a client's IP address was enough to track approximately 80% of the hosts in their dataset.

Chapman and Evans show how side channels in web applications can be detected with a black-box approach; future work could apply this technique to search for detection sandbox side channels [4].

IX. Conclusion

This paper presents SurroundWeb, the first 3D web browser, which enables web applications to project web content into a room in a manner that follows the principle of least privilege. We described three rendering abstractions for the web platform that support a wide variety of existing and proposed immersive experiences while limiting the amount of information revealed to acceptable levels. The room skeleton lets applications place web content on renderable surfaces in the room. The detection sandbox lets applications declaratively place web content relative to objects in the room without revealing any object's location or presence. Satellite screens let applications display content across multiple devices. We defined detection privacy, rendering privacy, and interaction privacy as key properties for privacy in immersive applications, and demonstrated that our interfaces provide these properties. Finally, we validated through user surveys that two of our abstractions reveal an acceptable amount of information to applications, and demonstrated that our prototype implementation has acceptable performance.

Acknowledgements

We thank Lydia Chilton for suggesting the terms "room skeleton" and "detection sandbox." We thank Janice Tsai for help with survey design and human protection review. We thank Doug Burger, Hrvoje Benko, Shuo Chen, Lydia Chilton, Jaron Lanier, Dan Liebling, Blair MacIntyre, James Mickens, Meredith Ringel Morris, Lior Shapira, Scott Saponas, Margus Veanes, and Andy Wilson for helpful discussions and feedback. We also thank the anonymous reviewers for their useful feedback, which greatly improved this paper.

References

[1] G. Acar, M. Juarez, N. Nikiforakis, C. Diaz, S. Gurses, F. Piessens, and B. Preneel. FPDetective: Dusting the Web for Fingerprinters. In Proceedings of the Conference on Computer & Communications Security, 2013.

[2] P. Baudisch, N. Good, and P. Stewart. Focus plus context screens: Combining display technology with visualization techniques. In Proceedings of the Symposium on User Interface Software and Technology, 2001.


[3] J. Birnholtz, L. Reynolds, E. Luxenberg, C. Gutwin, and M. Mustafa. Awareness beyond the desktop: Exploring attention and distraction with a projected peripheral-vision display. In Proceedings of Graphics Interface, 2010.

[4] P. Chapman and D. Evans. Automated black-box detection of side-channel vulnerabilities in web applications. In Proceedings of the Conference on Computer and Communications Security, 2011.

[5] L. D'Antoni, A. M. Dunn, S. Jana, T. Kohno, B. Livshits, D. Molnar, A. Moshchuk, E. Ofek, F. Roesner, S. Saponas, M. Veanes, and H. J. Wang. Operating System Support for Augmented Reality Applications. In Workshop on Hot Topics in Operating Systems, 2013.

[6] P. Eckersley. How Unique Is Your Browser? In Proceedings of the Privacy Enhancing Technologies Symposium, 2010.

[7] B. M. et al. Argon Mobile Web Browser, 2013. https://research.cc.gatech.edu/kharma/.

[8] E. Flood and J. Karlsson. Browser Fingerprinting. Master of Science Thesis, Chalmers University of Technology, 2012.

[9] S. Guha, B. Cheng, and P. Francis. Privad: Practical Privacy in Online Advertising. In Proceedings of the Symposium on Networked Systems Design and Implementation, 2011.

[10] R. Hansen and J. Grossman. Clickjacking. http://www.sectheory.com/clickjacking.htm.

[11] S. Jana, D. Molnar, A. Moshchuk, A. Dunn, B. Livshits, H. J. Wang, and E. Ofek. Enabling fine-grained permissions for augmented reality applications with recognizers. In Proceedings of the USENIX Security Symposium, 2013.

[12] S. Jana, A. Narayanan, and V. Shmatikov. A Scanner Darkly: Privacy for Perceptual Applications. In Proceedings of the IEEE Symposium on Security and Privacy, 2013.

[13] B. R. Jones, H. Benko, E. Ofek, and A. D. Wilson. IllumiRoom: Peripheral Projected Illusions for Interactive Experiences. In Proceedings of the Conference on Human Factors in Computing Systems, 2013.

[14] Layar. Layar Solutions, 2015. https://www.layar.com/solutions/.

[15] S. Lee, E. L. Wong, D. Goel, M. Dahlin, and V. Shmatikov. πBox: A Platform for Privacy-Preserving Apps. In Proceedings of the Symposium on Networked Systems Design and Implementation, 2013.

[16] B. Liang, W. You, L. Liu, W. Shi, and M. Heiderich. Scriptless timing attacks on web browser privacy. In Proceedings of the IEEE/IFIP International Conference on Dependable Systems and Networks, 2014.

[17] J. R. Mayer. Any person... a pamphleteer. Senior Thesis,Stanford University, 2009.

[18] M. L. Mazurek, J. P. Arsenault, J. Bresee, N. Gupta, I. Ion, C. Johns, D. Lee, Y. Liang, J. Olsen, B. Salmon, R. Shay, K. Vaniea, L. Bauer, L. F. Cranor, G. R. Ganger, and M. K. Reiter. Access Control for Home Data Sharing: Attitudes, Needs and Practices. In Proceedings of the Conference on Human Factors in Computing Systems, 2010.

[19] Microsoft Corporation. Kinect for Windows SDK, 2013. http://www.microsoft.com/en-us/kinectforwindows/.

[20] Microsoft Corporation. Xbox SmartGlass, 2014. http://www.xbox.com/en-US/SMARTGLASS.

[21] D. Molnar, E. Ofek, and Microsoft Corporation. SurroundWeb: Spreading the Web to Multiple Screens. http://research.microsoft.com/apps/video/?id=212669.

[22] R. A. Newcombe, S. Izadi, O. Hilliges, D. Molyneaux, D. Kim, A. J. Davison, P. Kohli, J. Shotton, S. Hodges, and A. Fitzgibbon. KinectFusion: Real-time dense surface mapping and tracking. In Proceedings of the IEEE International Symposium on Mixed and Augmented Reality, 2011.

[23] N. Nikiforakis, W. Joosen, and B. Livshits. PriVaricator: Deceiving Fingerprinters with Little White Lies. In Proceedings of the International World Wide Web Conference, 2015.

[24] N. Nikiforakis, A. Kapravelos, W. Joosen, C. Kruegel, F. Piessens, and G. Vigna. Cookieless Monster: Exploring the Ecosystem of Web-based Device Fingerprinting. In Proceedings of the IEEE Symposium on Security and Privacy, 2013.

[25] K. Parrish. Kinect for Windows, Ubi Turns Any Surface into Touch Screen, 2013. http://www.tomshardware.com/news/kinect-ubi-touch-screen-windows-8-projector,23887.html.

[26] F. Roesner, D. Molnar, A. Moshchuk, T. Kohno, and H. J. Wang. World-Driven Access Control. In Proceedings of the Conference on Computer and Communications Security, 2014.

[27] D. Sehr, R. Muth, C. Biffle, V. Khimenko, E. Pasko, K. Schimpf, B. Yee, and B. Chen. Adapting Software Fault Isolation to Contemporary CPU Architectures. In Proceedings of the USENIX Security Symposium, 2010.

[28] L. Shapira, R. Gal, E. Ofek, and P. Kohli. FLARE: Fast Layout for Augmented Reality Applications. In Proceedings of the IEEE International Symposium on Mixed and Augmented Reality, 2014.

[29] R. M. Smith. The Web Bug FAQ, 1999. https://w2.eff.org/Privacy/Marketing/web_bug.html.

[30] S. Stamm. Plugging the CSS History Leak, 2010. http://mzl.la/1mzshEY.

[31] R. Taylor. Internet everywhere? How the web could be on every surface of a room. http://www.bbc.com/news/technology-27243375.

[32] V. Toubiana, A. Narayanan, D. Boneh, H. Nissenbaum, and S. Barocas. Adnostic: Privacy Preserving Targeted Advertising. In Proceedings of the Network and Distributed System Security Symposium, 2010.

[33] uSample. Instant.ly survey creator, 2013. http://instant.ly.

[34] K. Vaniea, L. Bauer, L. F. Cranor, and M. K. Reiter. Out of sight, out of mind: Effects of displaying access-control information near the item it controls. In Proceedings of the IEEE Conference on Privacy, Security and Trust, 2012.

[35] V. Vasilyev. fingerprintjs library. https://github.com/Valve/fingerprintjs, 2013.

[36] Z. Weinberg, E. Y. Chen, P. R. Jayaraman, and C. Jackson. I Still Know What You Visited Last Summer. In Proceedings of the IEEE Symposium on Security and Privacy, 2011.

[37] J. Yang and D. Wigdor. Panelrama: Enabling easy specification of cross-device web applications. In Proceedings of the Conference on Human Factors in Computing Systems, 2014.

[38] T.-F. Yen, Y. Xie, F. Yu, R. P. Yu, and M. Abadi. Host Fingerprinting and Tracking on the Web: Privacy and Security Implications. In Proceedings of the Network and Distributed System Security Symposium, 2012.
