

Attention and Engagement-Awareness in the Wild: A Large-Scale Study with Adaptive Notifications

Tadashi Okoshi∗, Kota Tsubouchi†, Masaya Taji†, Takanori Ichikawa†, and Hideyuki Tokuda∗

∗Graduate School of Media and Governance, Keio University
[email protected]

†Yahoo Japan Corporation

Abstract: In today’s advancing ubiquitous computing age, with its ever-increasing amount of information from various applications and services available for consumption, the management of people’s attention has become very important. In particular, the high volume of notifications on mobile devices has become a major cause of user interruption. There has been much research aimed at detecting the opportune moment to present such information to users in a way that lowers cognitive load or frustration. However, evaluation of such systems in a real-world production environment with real users and notifications, and evaluation of users’ engagement with the presented notifications beyond simple responsiveness, have not been adequately studied. To the best of our knowledge, this study is the first to investigate user interruptibility and engagement using a real-world large-scale mobile application and real-world notifications consisting of actual news content. We equipped the Yahoo! JAPAN Android app, one of the most popular applications on the national market, with our mobile-sensing and machine-learning-based interruptibility estimation logic. We conducted a large-scale in-the-wild user study with more than 680,000 users for three weeks. The results show that in most cases delaying notification delivery until an interruptible moment is detected is beneficial to users and results in a significant reduction of user response time (49.7%) compared to delivering the notifications immediately. We also observed a higher number of notifications opened in our system as well as constant improvement in user engagement levels throughout the entire study period.

I. INTRODUCTION

While the capacity of our attention as humans is constant, the amount of information available for consumption has been growing by several orders of magnitude. Concomitant with advances in computing and multitasking operating systems, more devices, and more applications and services, increasing volumes of notifications that proactively convey information to users are resulting in a greater number of interruptions. Versatile applications and services in the cloud are being developed and utilized in this ubiquitous computing age. Such software and services generate enormous amounts of various types of information for users, such as big data analysis, schedule reminders, messages from social media friends, the weather forecast, breaking news, and status updates from devices. Such information is delivered to users through devices such as smartphones and other mobile devices, wearable watches, and even through ambient devices embedded in a user’s environment. For better timeliness and speediness, the provision of such information has progressively become more proactive, and it is often delivered through push notification systems.

In this information-overload world, the constant and limited capacity of human attention has become a new bottleneck [1] in computing. Push notifications that pop up in the background of a user’s attention at random times cause interruptions and divided attention. There have been several reports on the negative effects caused by divided attention in terms of productivity, emotion, and mental state [2], [3], [4], [5].

Researchers have been investigating user interruptibility in various ubiquitous and pervasive computing situations using different techniques, with the objective of ensuring that interruptive notifications do not unnecessarily steal users’ precious attention resources. A breakpoint [6], the boundary between two adjacent units of user activity, is known to be a moment at which an interruption has a lower impact on the user’s cognitive load. We previously investigated the real-time detection of users’ breakpoints in their device interactions and physical activities using mobile sensing and machine learning techniques on smartphones [7] and wearable watch devices [8].

However, we found that three significant issues remain to be studied: (1) real-world evaluation of breakpoint-based adaptive notification with an actual product application and its associated notifications, (2) software architecture design of such interruptibility estimation for real-world deployment on both the client and server sides, and (3) comprehensive evaluation of user behavior in terms of not only interruptibility but also users’ further engagement levels with the notification content.

In this paper, we present the results and findings from a large-scale study conducted on smartphone users’ interruptibility and user engagement with a popular real-world smartphone application. We designed and implemented our real-time breakpoint detection and notification scheduling mechanisms inside the Yahoo! JAPAN Android application [9] (shown in Figure 1), one of the most popular applications in the national application market. Considering several real-world requirements related to simplicity, scalability, and efficiency, our mechanism particularly focuses on the user’s physical-activity breakpoints [8], relying on activity recognition APIs on the smartphone platform. Using mobile machine learning techniques, the detection mechanism embedded in the app detects the user’s breakpoints in real time and shows incoming notifications at such timings.

Fig. 1. Screenshots of Yahoo! JAPAN Android Application [9] (Front screen (left), Weather radar (center), Notification (right))

Our large-scale user study, conducted over three weeks with a total of 687,840 users, demonstrated the effectiveness of our approach. We found that, in most cases, notifications delivered at delayed breakpoint timings led to earlier overall click timing by users. While the notification delivery delay due to the additional breakpoint detection (as opposed to the conventional “deliver immediately” approach) is trivial, once the notification is delivered, user response time was significantly reduced (by 49.7%) with our approach. We also observed a higher number of notifications opened in our system as well as constant improvement in user engagement levels throughout the entire study period.

The contribution of this paper is three-fold. First, we present the design and implementation of our interruptibility detection mechanism on a large-scale real-world smartphone platform. Second, we discuss our large-scale in-the-wild user study on user interruptibility and engagement conducted using an actual product and associated notifications in a real-world situation. Finally, we evaluate our work in terms of not only interruptibility but also further user engagement with the presented notification contents. The remainder of this paper is organized as follows. Section II explains the interruption overload problem. Section III discusses related work. Section IV clarifies our research goals. Section V specifies requirements for our solution. Section VI presents the system design and architecture of our system, AtteliaY. Section VII describes our initial model training study. Section VIII reports on our large-scale in-the-wild user study conducted with 687,840 users for three weeks, in terms of our experimental design, methodology, results, and analysis. Section IX discusses further research opportunities arising from the user study. Section X concludes this paper.

II. INTERRUPTION OVERLOAD

Our current computing life suffers from interruption overload caused by large numbers of notifications presented in inappropriate ways. Interruption overload is one class of a broader information overload problem discussed in the literature [10], [11], [12]. More studies have recently been conducted in the context of interruptions and multitasking [13], [14], [15], [16], [17].

The main source of interruption overload is notifications from computer system entities such as local operating systems, messaging services connected to other users, and various applications. Notifications in computer systems were originally designed to provide newly available information to users in a more timely and speedy manner (than polling by the user). Since typical notification systems deliver notifications to users as soon as they are available, users end up facing numerous interruptive notifications arriving in the background of their current tasks at random times, regardless of their timing preferences. When a notification is perceived and recognized by a user, some amount of his/her attention, which has limited capacity [18], [19], is allocated to the information carried by the notification. This situation is called “divided attention” [20].

Past studies have revealed several types of negative effects of interruptive notifications, on productivity [2], [3], [4], [5], [21], [22], emotional and social attribution [21], and psycho-physiological states [3]. The need for computing systems that can adapt their behavior to human users’ attentional resources has been gradually recognized, with a growing body of literature, particularly on sensing users’ attentional states.

III. RELATED WORK

There are two main targets for sensing a human’s current attentional state: the user’s current cognitive load and interruptibility.

A. Sensing Users’ Cognitive Load

In cognitive psychology, the concept of cognitive load is defined as the total amount of mental effort allocated to working memory. Several different approaches for measuring this load have been proposed, including (a) subjective rating-based methods, (b) task performance-based methods, and (c) physiological response-based methods.

Several studies on the subjective rating-based approach have shown that the measurement of cognitive load through post hoc self-reporting is a relatively reliable methodology for mental effort assessment [23]. The most widely used tool for assessing a user’s cognitive load is the NASA Task Load Index (NASA-TLX) [24]. Although use of this method is widespread, the post hoc nature of the approach makes it difficult to apply to versatile ubiquitous computing systems where an assessment needs to be completed in real time.

The measurement of a user’s task performance is used to objectively assess the user’s cognitive load during task execution. The user’s performance on their primary, focal task is used in “primary task measurements,” whereas “secondary task measurements” exploit the performance of a secondary task that the user is (often asked to be) executing simultaneously with the primary task [23]. In this methodology, variation in reaction performance indicates variation in cognitive load. However, this methodology may not be feasible in ubiquitous computing situations where a user multitasks with frequent switching between multiple tasks, making it difficult to measure the user’s response performance across various types of tasks using uniform measurement criteria.

The psycho-physiological response-based method includes several different techniques, such as tracking of eye movement and pupil size [25], [26], [27], [28], readings from electrocardiograms (ECG), galvanic skin response (GSR) [26], [29], [30], electroencephalogram (EEG) [29], [28], heart rate (HR), and HR variability (HRV) [31], [32], [28]. Haapalainen et al. [33] found that, in desktop computing, the combined use of an electrocardiogram and heat flux is the most accurate at classifying low and high levels of cognitive load. Although this approach looks promising in terms of detecting users’ cognitive load in real time, the burden placed on the target users is not trivial.


B. Sensing User’s Interruptibility

Rather than sensing a user’s cognitive load to represent that user’s internal mental status relatively directly, several researchers have proposed detecting the user’s interruptibility from the viewpoint of the source of possible interruptions. This class of research can be categorized into two main groups: (a) estimation of interruptibility at a certain point in time based on a user’s context, and (b) detection of the user’s breakpoints [6]. A breakpoint is the boundary between two adjacent actions of a person, and was found to be a timing at which interrupting the user results in relatively lower frustration and cognitive overhead [21], [34], [35].

Following early research in the desktop computing domain [36], [37], [38], [39], [40], more studies have recently been conducted in the mobile field. Ho et al. used wireless on-body accelerometers to trigger interruptions when users change activities [41] and found that interruptions at these transition times reduce user annoyance. The most recent studies have been on widespread mobile and smartphone environments. Fischer et al. identified breakpoints after phone calls and text messages [42]. They found that users tend to be more responsive to notifications after these activities than at other random times. Hofte et al. used an experience sampling methodology to collect information on location, transit status, company, and activities in order to build a model of interruptibility [43], particularly for phone calls. Pejovic et al. expanded the use of context to detect interruptible moments on smartphones, including user activity, location, time of day, and emotional states [44]. Recent studies have even detected user boredom [45] as yet another opportune moment for notifications and engagement level [46] as a further indicator of the user’s response to received information.

At the system level, the current fragmented delivery of interruptive notifications over mobile networks is also known to be inefficient in terms of power consumption. Acer et al. [47] showed that delaying notification delivery can yield power savings in mobile devices.

Our previous works, Attelia I [7] and II [8], have followed the same research trend in interruptibility on smartphones and wearable watches (multi-device environment). Towards the realization of opportune moment detection and on-the-fly adaptation in notification scheduling, we particularly emphasized four design principles: (1) feasibility on users’ real mobile and wearable devices, (2) supporting real-time detection, (3) applicability to diverse types of applications, and (4) affinity to all-day use. Attelia realizes real-time breakpoint detection on smartphones and wearable watches without any external sensors or modification to existing operating systems or applications.

C. Further Research Challenges

Although the studies cited above show that researchers are actively working on interruptibility and notification scheduling on smartphones and wearable devices, there remain significant research challenges that need to be addressed:

1) To the best of our knowledge, no study has investigated and evaluated user interruptibility with real (product-level) applications and the real notifications issued from such applications. This is primarily because the major smartphone platforms (e.g., Android and iOS) have not opened their APIs for controlling notifications. Thus, past studies mainly used a custom sample application and/or related custom-made notifications prepared in an ad hoc manner for their user studies. Although some studies [48], [49] focused on real-world smartphone notifications, their main contributions pertained to analysis of the current situation.

2) The system design for such interruptibility detection and notification adaptation in real situations (with real-world applications used by real users) has not been adequately studied.

3) User engagement with information content presented via notifications, beyond the user’s initial responses to the notifications (such as response time or click rate), has not been adequately evaluated.

IV. RESEARCH GOALS AND APPROACHES

Considering the research issues outlined above, this study aims to investigate smartphone users’ interruptibility with respect to notifications and their further engagement with the notified content at systematically estimated “opportune timings.” In particular, this research conducts the investigation in a real-world environment with a real application, real users, and real notifications. To achieve these goals, we took the following approaches after careful discussion.

A1. Embedding interruptibility estimation logic into a market-leading smartphone application: We added our interruptibility estimation logic into the “Yahoo! JAPAN” Android application [9]. Yahoo! JAPAN has been popular in the Japanese market since its launch in 1996, with a search engine market share of 32% [50] (its share of Yahoo!’s worldwide market share is 3.4%). The Android app, shown in Fig. 1, has an installed base of more than 10 million users, making it one of the most popular smartphone applications. The app is a portal-like application with several different features including Web search, news reader, weather map, and links to a variety of Yahoo! JAPAN services. To the best of our knowledge, we are the first to conduct interruptibility research with a real-world application that is utilized by such a large number of users.

A2. Using real-world notifications: Along with the application, we use real notifications issued from Yahoo! services on the app to evaluate our interruptibility estimation approach. Whereas most previous studies used artificial notifications or ESM [51] prompts as notifications, utilization of real notifications from real information sources enables us to understand how users behave in real situations.

A3. Investigating users’ engagement: Finally, we quantitatively measure user engagement levels for the content presented through each notification in addition to several immediate response criteria, such as response time and click rate. Because users load Web content from Yahoo! JAPAN servers when they click notifications, we decided to track users’ browsing behavior from the server side by measuring engagement-related criteria such as session length and revisit rate. This evaluation facilitates our understanding of how users behave and engage with the presented content beyond immediate “click to open” behavior.


V. REQUIREMENTS FOR SYSTEM DESIGN AND DEVELOPMENT

Notwithstanding the goals and approaches above, conducting such research in a real production environment differs in several ways from doing so in a research-purpose environment. Here we present the requirements for system design and development that emerged from our discussions at the beginning of this project. Since we aim to port the original research software Attelia into the Yahoo! JAPAN commercial production system, we faced several real-world requirements related to acceptable behavior and user experience of the product application, as well as Yahoo!’s business-oriented decisions and restrictions.

R1. Users’ additional burden needs to be minimized: When the interruptibility estimation is added as extra logic to an existing application, depending on the types of sensors and APIs the logic uses, the users of the app may experience an additional burden, such as explicit confirmation to grant additional permissions (e.g., accessibility, location information) to the application. Such burden should be minimized in order to retain the existing user base of the application.

R2. Power overhead needs to be minimized: As power is always a precious resource in mobile devices, and because mobile users are very conscious about an application’s power usage, our system design needs to be energy-aware and minimize power overhead.

R3. Cross-platform generalizability needs to be considered: Although our first-step experimental system can be implemented on a single platform for research purposes, the fundamental system design needs to take into account cross-platform generalizability across the two major platforms, iOS and Android.

R4. Collection of sensitive data needs to conform to the corporate policy and process: Additional collection of sensitive and/or privacy-related sensor data, such as fine-grained location information, needs to be proposed, carefully discussed, and approved in the corporate-wide business process for assuring end users’ privacy protection. This means that the process can take time and our system design may need to start with a minimum set of data collection for the time-bound research period.

R5. System design and development needs to conform to the existing product management: The “Yahoo! JAPAN” Android application includes many commercial-level product features from many Yahoo! JAPAN services. Its product planning, design, and development processes are managed under business-oriented governance. Thus, the design and development of our system naturally needs to fit such existing processes. This also means that opportunities to release application updates to the market (i.e., Google Play) are considerably limited compared to a single-purpose research prototype application, which often contains only bare-bones features and can be pushed even nightly.

VI. SYSTEM DESIGN

In accordance with the goals, approaches, and requirements outlined above, this section explains our system design.

Fig. 2. Two Types of Breakpoints - Device Interaction and Physical Activity

A. Detection of Physical Activity Breakpoints

For the concrete interruptibility estimation design to embed inside the app, we decided to use and extend our previous work on real-time mobile detection of breakpoints developed in Attelia [7], [8]. We previously placed breakpoints into two classes, namely physical activity breakpoints and device interaction breakpoints, as shown in Figure 2. In the figure, a user is sitting down and doing work on her tablet. After a while, she decides to take a coffee break. She stands up, walks to the kitchen, pours a coffee, walks back, sits down on the couch, and enjoys her coffee while watching a video on her smartphone. In essence, in our daily lives with smartphones and wearable devices, there is a significant amount of time when we simply carry or wear them but do not actively use (manipulate) them, in contrast to the periods when we actually do use (manipulate) them. By detecting the two different types of breakpoints, our previous work comprehensively detected interruptible moments in the user’s ubiquitous computing life.

Meanwhile, in the present study, we pay special attention to the utilization of physical activity breakpoints. This is done for several reasons. Collecting UI interaction data on a smartphone requires the user’s explicit permission for the accessibility API on the smartphone platform (against R1). Furthermore, gaining access to such sensitive user information takes a long time in the product management process, and it is not even clear whether such data collection would be approved (against R4). On the other hand, detection of physical activity breakpoints has several advantages. Physical activity breakpoints alone can cover a user’s all-day computing life as long as the user is carrying the device (even during the user’s active device use period). Moreover, the activity recognition API used for physical activity breakpoint detection has recently been implemented in both of the major mobile platforms [52], [53] (compatible with R3), and those APIs are considered optimized in terms of efficiency, accuracy, and power consumption (compatible with R2).

In our design, the system utilizes the Google Play Services Location API (activity recognition API) [53] provided by the Android platform, reads its detected activity results (e.g., walking, still, on bike), and detects changes in the user’s activity as candidate timings of opportune moments for notification.
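The idea of treating activity changes as candidate breakpoints can be illustrated with a minimal Python sketch. It is not the production Android implementation: the `ActivityReading` type, its field names, and the `candidate_breakpoints` function are hypothetical names introduced here, and the input is assumed to be a time-ordered stream of already-recognized activity labels.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass(frozen=True)
class ActivityReading:
    """One recognized activity (hypothetical type for illustration)."""
    timestamp: float  # assumed to be seconds on some monotonic clock
    activity: str     # e.g. "walking", "still", "on_bike"


def candidate_breakpoints(
    readings: Iterable[ActivityReading],
) -> Iterator[ActivityReading]:
    """Yield each reading whose activity differs from the previous one.

    A change of activity label is treated as a candidate opportune
    moment; the first reading has no predecessor and is never yielded.
    """
    previous: Optional[str] = None
    for reading in readings:
        if previous is not None and reading.activity != previous:
            yield reading
        previous = reading.activity
```

In the system described in this paper, such candidates are not used directly; they feed the feature extraction and prediction stages described in the following subsections.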


B. Notification to Be Tested

Table I gives the definitions of the different notification classes currently used in Yahoo! JAPAN services. For the first step of our study, we decided to use the “Recommendation” class, for three reasons. First, this class of notifications consists mainly of updates from services such as sports and showbiz-related news departments and does not include any emergency content that needs to be presented to users immediately, so we can explore various notification scheduling techniques. Second, recipients of “Recommendation” notifications, i.e., those who enabled reception of this information class on the app’s preference screen, receive exactly the same content, so we can focus on investigating a user’s behavior and interruptibility without being influenced by any personalization aspect of the notification content. Finally, this plan also fits the current product planning of the Yahoo! JAPAN application (compatible with R5).

TABLE I. NOTIFICATION CLASSES

Class Name        Real-Time/Batched   Content Examples
Breaking news     real-time           breaking news articles
Natural disaster  real-time           weather update, earthquakes
Recommendation    batched             sports & showbiz news, tips
Personal          batched             transit, email, travel, auction

The recommendation notifications have four basic notification timing slots (8AM, noon, 6PM, and 9PM) every day. The content of the notifications is posted to the batch content push queue by the news provider department in advance and then sent to each user at the scheduled timings. At each timing slot, only one notification content is pushed to the users.
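As a small illustration of the four daily slots, the following Python sketch computes the next scheduled push time from a given moment. The names `SLOTS` and `next_slot` are hypothetical, and the choice that a request made exactly at a slot time rolls over to the following slot is an assumption of this sketch, not something stated in the paper.

```python
from datetime import datetime, time, timedelta

# The four daily timing slots named in the text: 8AM, noon, 6PM, 9PM.
SLOTS = (time(8, 0), time(12, 0), time(18, 0), time(21, 0))


def next_slot(now: datetime) -> datetime:
    """Return the first slot strictly after `now` (assumed local time)."""
    for slot in SLOTS:
        candidate = datetime.combine(now.date(), slot)
        if candidate > now:
            return candidate
    # All of today's slots have passed: use tomorrow's first slot.
    return datetime.combine(now.date() + timedelta(days=1), SLOTS[0])
```

Under the adaptive scheme studied here, the slot time marks when content is pushed to the client; the moment the notification is actually shown is decided later by breakpoint detection on the device.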

C. System Architecture

Figure 3 shows the architecture of our production system. Our implementation consists of a series of additional components inside the Yahoo! JAPAN Android application as well as components on the server side.

Fig. 3. System Architecture

1) Notification Content Fetching: On the client side, News Fetcher is the component that triggers the whole breakpoint detection process. News Fetcher maintains a TCP connection to the server. When new content (to be presented to the user) is received from the server, News Fetcher notifies the Controller component, which initiates the core breakpoint detection logic.

2) Life-cycle of Breakpoint Detection: Figure 4 shows the life-cycle of the core breakpoint detection logic. The core logic is initiated upon the arrival of new notification content. Once the logic starts, it executes the core detection steps (sensing, feature extraction, and prediction) every time new sensor data is detected by Mobile Sensing. When a breakpoint is detected by Predictor (with an installed model), the system issues a new Android notification and terminates the core detection logic.

Fig. 4. Life-cycle of Breakpoint Detection Logic

With this design, we can minimize the duration of on-the-fly mobile sensing and online prediction and also minimize the power consumption overhead (compatible with R2). We tested the additional power consumption in the quality assurance department and confirmed that the additional overhead was less than 3% compared to the power consumption of the original application.

3) Mobile Sensing: The Mobile Sensing component obtains several types of sensor data. Table II lists the sensor data collected on the client. The data mainly consists of a series of output values from the GooglePlay Service Location API’s “ActivityRecognition” and other device-based events (e.g., screen on/off events). For the activity recognition results from “ActivityRecognition”, we used only data with a confidence value greater than 51 (the value can be between 0 and 100), based on our empirical knowledge. The Mobile Sensing component obtains all of these data through individually implemented event handlers for each sensor. Any change in sensor data, such as when the user’s activity changes or when the device volume is changed, is sensed and logged by the system.
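The confidence filter described above can be sketched as follows. This is a minimal illustration, not the production Android code: the `ActivityResult` type and the `accept_activity` function are hypothetical names introduced here, and only the threshold of 51 on a 0 to 100 scale comes from the text.

```python
from dataclasses import dataclass
from typing import Optional

# Threshold stated in the text: only results with confidence > 51 are used.
CONFIDENCE_THRESHOLD = 51


@dataclass(frozen=True)
class ActivityResult:
    """Hypothetical container for one activity recognition result."""
    activity: str    # e.g. "STILL", "TILTING", "IN_VEHICLE"
    confidence: int  # 0..100


def accept_activity(result: ActivityResult) -> Optional[str]:
    """Return the activity label if it is confident enough, else None."""
    if result.confidence > CONFIDENCE_THRESHOLD:
        return result.activity
    return None
```

A result such as `ActivityResult("STILL", 80)` passes the filter, while one with a confidence of exactly 51 is discarded because the comparison is strict.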

TABLE II. COLLECTED SENSOR DATA

| Sensor Name | Values |
|---|---|
| Activity Recognition | IN_VEHICLE, ON_BICYCLE, ON_FOOT, STILL, TILTING, UNKNOWN |
| Device Volume | integer (0 to 7) |
| Network Mode | off, Mobile, Wi-Fi, WiMAX, LTE |
| Screen Sleep | normal, sleeping |
| Vibration | off, on |

4) Feature Extractor: When new sensor data is detected, the Feature Extractor and Predictor modules execute and predict whether the current moment is a breakpoint for the user. Table III lists the features extracted from the sensor data. The total number of features is 387. The system extracts seven types of features: timestamp (hour), activity type, volume, device sleep/awake status, vibration status, silent mode setting, and network connection type. In addition to these sensor values, the transition of the sensor data is introduced as an eighth type of feature. For every possible “From” and “To” pair of sensor status transitions, we prepared a dedicated feature value. These transition features help the system characterize the changes detected in each sensor and thereby detect breakpoints that occur at the moment of an activity change (e.g., a user tilting the phone, reported as a change from “STILL” to “TILTING”).

TABLE III. FEATURES EXTRACTED FROM THE SENSOR DATA

| Event Types | Explanation | Levels |
|---|---|---|
| Timestamp (Hour) | Hour value extracted from time stamp of client | 24 |
| Collected sensor data | Present status of the client device | 34 |
| Trigger | Sensor type triggering this detection process | 3 |
| Transitions of sensor values | “From and To” transition sets of each sensor: activity recognition (6x6), volume (15x15), silent mode (4x4), network (6x6) | 326 |

We referenced the detailed specifications of the Android platform to specify the number of possible values in activity transition, volume transition, silent mode transition, and network connection transition. To ensure the desired cross-platform nature of the system design, we also confirmed that these feature designs are feasible and reasonable for a future implementation of the system on the iOS platform.
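To make the feature layout concrete, the following sketch encodes the hour and an activity transition as indicator vectors. The indicator (one-hot) encoding and the function names are assumptions made for illustration; the text states only that each “From” and “To” pair has a dedicated feature value. The level counts are copied from Table III.

```python
from typing import List

# Activity labels as listed in Table II.
ACTIVITIES = ["IN_VEHICLE", "ON_BICYCLE", "ON_FOOT", "STILL", "TILTING", "UNKNOWN"]

# Level counts as reported in Table III.
TABLE_III_LEVELS = {
    "timestamp_hour": 24,
    "collected_sensor_data": 34,
    "trigger": 3,
    "transitions": 326,
}


def hour_features(hour: int) -> List[int]:
    """Indicator vector with one slot per hour of the day (assumed encoding)."""
    if not 0 <= hour <= 23:
        raise ValueError("hour must be in 0..23")
    vec = [0] * 24
    vec[hour] = 1
    return vec


def activity_transition_features(prev: str, cur: str) -> List[int]:
    """Indicator vector with one slot per (From, To) activity pair (6x6)."""
    n = len(ACTIVITIES)
    vec = [0] * (n * n)
    vec[ACTIVITIES.index(prev) * n + ACTIVITIES.index(cur)] = 1
    return vec


# The per-type level counts add up to the 387 features stated in the text.
TOTAL_FEATURES = sum(TABLE_III_LEVELS.values())
```

Under this encoding, a “STILL” to “TILTING” change sets exactly one of the 36 activity transition slots, which lets a linear model assign a separate weight to each kind of transition.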

5) Predictor: Using the above features, the actual breakpoint prediction is executed in Predictor with an installed linear regression model. The model parameters are generated on the server side and then downloaded to the clients.

It is possible that no breakpoint is detected in Predictor for a long period after the core detection module has been initiated. For example, when a user has placed the phone on the desk and no sensor data update (activity recognition, volume change, etc.) is detected, the core logic may not detect any breakpoints until a change in a time-related feature value influences a prediction result. In such cases, the system automatically issues a notification after a specified timeout T. We set this value to 1 hour throughout this research.
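The life-cycle and timeout behavior described above can be summarized in a short sketch. It is a simplification under stated assumptions: sensor updates are modeled as a time-ordered list of `(timestamp, features)` pairs, `is_breakpoint` stands in for the installed model, and the function name `schedule_notification` is hypothetical. Unlike the real system, the sketch re-evaluates the model only when a sensor event arrives.

```python
from typing import Callable, Dict, Iterable, Tuple

TIMEOUT_SECONDS = 3600  # timeout T = 1 hour, as stated in the text


def schedule_notification(
    arrival_time: float,
    events: Iterable[Tuple[float, Dict[str, int]]],
    is_breakpoint: Callable[[Dict[str, int]], bool],
) -> Tuple[float, str]:
    """Return (post_time, reason) for one piece of notification content.

    `events` must be ordered by timestamp. The detection logic starts at
    `arrival_time` and stops at the first predicted breakpoint or at the
    timeout, whichever comes first.
    """
    deadline = arrival_time + TIMEOUT_SECONDS
    for timestamp, features in events:
        if timestamp < arrival_time:
            continue  # sensed before the detection logic was started
        if timestamp >= deadline:
            break  # the timeout fires before this event is processed
        if is_breakpoint(features):
            return timestamp, "breakpoint"
    return deadline, "timeout"
```

For example, with content arriving at time 0 and a model that fires on the second of two sensor events, the notification is posted at the time of that second event; with no sensor events at all, it is posted at 3,600 seconds.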

6) Ground Truth Annotation: The user’s reaction to the notification, along with the current sensor data, is logged on the client. When a user clicks a notification within 10 seconds of the breakpoint (that triggered the notification), all the sensor data at that breakpoint timing is logged with a “breakpoint” ground truth annotation. We configured the 10-second value for this study empirically, based on Yahoo! JAPAN’s knowledge of notifications. The log data is periodically sent to the server for further model training.
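The annotation rule can be written as a small predicate. This is a sketch with hypothetical names (`annotate_breakpoint`, `notified_at`, `clicked_at`); treating the 10-second boundary as inclusive is an assumption, since the text says only “within 10 seconds”.

```python
from typing import Optional

CLICK_WINDOW_SECONDS = 10  # empirically chosen value stated in the text


def annotate_breakpoint(notified_at: float, clicked_at: Optional[float]) -> bool:
    """Return True if the sample should be labeled as a true breakpoint.

    `clicked_at` is None when the user never clicked the notification.
    """
    if clicked_at is None:
        return False
    elapsed = clicked_at - notified_at
    return 0 <= elapsed <= CLICK_WINDOW_SECONDS
```

A click 5 seconds after the notification yields a positive label, while a click 30 seconds later, or no click at all, yields a negative one.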

7) Model Training and Update at the Server: The log data, along with the ground truth annotation, is sent to the server nightly. On the server side, a new model is built every night from all the data uploaded from the clients so far. The resulting model parameters are made available on the Model Server and are downloaded to the clients on a daily basis. This scheme of periodic model updates at the server and their distribution to the clients fits our requirement R5, since the system can update the model frequently without rebuilding and redistributing the client application.

Because the size of the client log is expected to be huge, we prepared a dedicated Hadoop [54] cluster with 32 workers for model training. After the log data from the clients is split into 32 buckets on the server side, each worker trains a logistic regression model with LIBLINEAR [55]. The 32 generated models are then merged into a final unified model, taking into account the differing amounts of training data behind each model.
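One plausible way to merge per-bucket linear models while accounting for the amount of training data is a sample-count-weighted average of the weight vectors. The text does not specify the merging rule, so the sketch below is an assumption, and `merge_models` is a hypothetical name.

```python
from typing import List, Sequence


def merge_models(
    weights: Sequence[Sequence[float]], counts: Sequence[int]
) -> List[float]:
    """Average weight vectors, weighting each by its number of training samples."""
    if len(weights) != len(counts) or not weights:
        raise ValueError("need one sample count per model")
    total = sum(counts)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    dim = len(weights[0])
    merged = [0.0] * dim
    for vector, n in zip(weights, counts):
        if len(vector) != dim:
            raise ValueError("all models must have the same dimension")
        for i, value in enumerate(vector):
            merged[i] += value * n / total
    return merged
```

With two models `[1.0, 0.0]` and `[3.0, 2.0]` trained on 1 and 3 samples respectively, the merged model is pulled toward the second one, which saw more data.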

VII. INITIAL MODEL TRAINING

On the basis of the system design presented above, we implemented a prototype version (a new version of the Yahoo! JAPAN Android app) and conducted an initial study to collect users’ response logs to notifications and train the initial prediction model (to be used later in the production user study), as well as to check the basic behavior of our implementation. We installed the app on 39 Android smartphones belonging to members of our lab and continued the study for 35 days.

During the study, notifications from the real Yahoo! service were delivered at the conventional random timings. The user’s response time to each notification, along with the sensor data at the time of the user’s click, was logged on the client and sent to the server. As described in Section VI-C6, the sensor data is annotated as a “breakpoint” ground truth when the user clicks a notification within 10 seconds, and as a non-breakpoint otherwise.

The prediction models, trained from all the log and annotation data over 35 days, showed an average accuracy of 94.8% in our 10-fold cross-validation. In this model, a total of 104 features (all those with a value over 0) are selected for predicting breakpoints. Table IV shows the top 5 influencing features. Interestingly, the top feature is the “hour” value of the clock being equal to 10. We speculate that this may be due to the working style of the participants (our lab members). The second to fifth features relate to activity changes between different “From” and “To” activities. It is interesting to see that “IN_VEHICLE” (literally meaning that the user is in a vehicle) appears twice in this ranking. According to Google’s API definition [53], this class means that the device is in a vehicle but does not necessarily mean that the owner is driving the vehicle. “ON_FOOT” includes both walking and running.

TABLE IV. TOP 5 INFLUENCING FEATURES IN THE INITIAL MODEL

| Feature Name | Value |
|---|---|
| Hour value=10 | 4.27 |
| Activity (STILL to ON_FOOT) | 3.51 |
| Activity (STILL to IN_VEHICLE) | 3.50 |
| Activity (IN_VEHICLE to STILL) | 3.30 |
| Activity (TILTING to STILL) | 3.23 |

The API definition considers “TILTING” to be detected when “the device angle relative to gravity changed significantly. This often occurs when a device is picked up from a desk or a user who is sitting stands up.” [53]. Although these five features do not solely determine a breakpoint, we judged the results to be plausible after a review discussion among the members, which included some subjective impressions.

VIII. EVALUATION

On the basis of these promising results from our initial study, we conducted a large-scale “in-the-wild” user study in the production environment with about 680,000 users for three weeks to better understand how our breakpoint-based notification scheduling works in a real user environment. Our evaluation criteria are as follows.

1. Investigating users’ immediate response to the breakpoint-scheduled notifications: We want to see how effectively the breakpoint detection works from several different points of view, such as the relationship between activity changes and the breakpoints finally detected, and the actual delay of the notification caused by waiting for a breakpoint.

2. Investigating users’ response to the breakpoint-scheduled notifications: We want to observe how users react to the notifications scheduled at detected breakpoint timings.

3. Investigating users’ (long-term) engagement with the presented contents and services: We want to investigate how the users’ engagement level with the source Web service of the notifications (Yahoo! JAPAN) is influenced in the longer term, beyond the users’ immediate reaction in a short period of time.

A. Participants

We selected 687,840 users (approximately 10% of the total user base of the Yahoo! Android application) as participants in this study. We used an existing A/B test infrastructure inside our application, in which a specific functional component of the app can be enabled (or disabled) for a specific subclass of users based on the device ID. Using this system, we randomly selected 5% of the whole user base as the experimental group and another 5% as the control group.
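A common way to split users by device ID is deterministic hash bucketing, sketched below. The text does not describe how the A/B infrastructure maps device IDs to groups, so the hashing scheme, the salt string, and the function name `assign_group` are all assumptions made for illustration.

```python
import hashlib


def assign_group(device_id: str, salt: str = "breakpoint-study") -> str:
    """Map a device ID to 'experimental', 'control', or 'excluded'.

    The ID is hashed into one of 100 buckets: buckets 0-4 (5%) form the
    experimental group and buckets 5-9 (5%) form the control group.
    """
    digest = hashlib.sha256(f"{salt}:{device_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    if bucket < 5:
        return "experimental"
    if bucket < 10:
        return "control"
    return "excluded"
```

Because the assignment depends only on the device ID and the salt, a device stays in the same group across app restarts and model updates, which is what a multi-week comparison requires.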

Table V shows the demographics of the users. We split these users evenly into two groups to compare the results and validate the effectiveness of our system: (a) the experimental group (users for whom our interruptibility detection and notification scheduling are enabled), and (b) the control group (users for whom our logic is not used).

TABLE V. USER DEMOGRAPHICS

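| Attribute | Category | Value |
|---|---|---|
| Number of Users | | 687,840 |
| Gender | Male | 60.7% |
| Gender | Female | 39.3% |
| Age Group (Median: 44.0, Stdev.: 12.3) | 0-19 | 3.6% |
| | 20-29 | 7.9% |
| | 30-39 | 22.3% |
| | 40-49 | 35.2% |
| | 50-59 | 21.0% |
| | 60-69 | 8.4% |
| | 70-79 | 1.4% |
| | 80- | 0.3% |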

Selection of the participating users was not visible to the participants. The application release was checked and confirmed by the corporate legal and compliance departments. The study was conducted in conformity with corporate regulations and the end user agreement.

B. Experiment Procedure

The user study was conducted for three weeks (21 days) in September 2016. To ensure the stability of the production application, the new version (including our implementation) was released to the production environment with a graduated deployment scheme on the app store. After three days, the new version was made available to all users.

The mechanism we utilized for this user study is the one detailed in Section VI. Our logic was enabled only for the experimental group users and not for the control group users. Note that, except for the delivery timing difference of the specific “Recommendation” type notifications described in Section VI-B, the users in both groups experienced the same notification content and delivery timings.

All users (of the experimental group) used the same model for breakpoint prediction. At the beginning of the study, the model trained in our initial study was installed on each client’s device. Once the study began, a new model was trained every night on our Hadoop cluster from all clients’ log data and was then downloaded to each client as a daily update.

C. Results and Analysis

Through the nightly model update training over 21 days, our prediction model was gradually adapted to the users’ real usage. After 21 days (i.e., 21 iterations of the model update), the latest model showed an average accuracy of 91.6% under the same 10-fold cross-validation methodology that we used in Section VII.

1) Breakpoint and Activity Change: Table VI breaks down the detected breakpoints with a “true” annotation (i.e., breakpoints whose notification, presented on the basis of the breakpoint detection trigger, was clicked by the user within 10 seconds) into activity change pairs.

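TABLE VI. BREAKDOWN OF “TRUE”-LABELED DETECTED BREAKPOINTS INTO ACTIVITY CHANGES (%; ROWS: FROM, COLUMNS: TO)

| From \ To | IN_VEHICLE | ON_BICYCLE | ON_FOOT | STILL | UNKNOWN | TILTING |
|---|---|---|---|---|---|---|
| IN_VEHICLE | - | 0.01 | 0.22 | 6.05 | 0.99 | 2.71 |
| ON_BICYCLE | 0.03 | - | 0.01 | 0.32 | 0.01 | 0.09 |
| ON_FOOT | 0.48 | 0.00 | - | 2.68 | 0.28 | 1.29 |
| STILL | 8.16 | 0.06 | 1.75 | - | 7.42 | 43.28 |
| UNKNOWN | 0.84 | 0.00 | 0.19 | 3.51 | - | 1.50 |
| TILTING | 2.63 | 0.04 | 0.62 | 13.55 | 1.27 | - |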

Very interestingly, the “STILL to TILTING” activity change showed the highest value (43.28%). As Google’s API document [53] mentions, timings such as “when a device is picked up from a desk” or when “a user who is sitting stands up” appear to have been opportune moments for the notification recipients. Moreover, activity changes to “STILL” showed high numbers, such as “TILTING to STILL” (13.55%) and “IN_VEHICLE to STILL” (6.05%). This matches our previous hypothesis [8] that people would have breakpoints when changing from a high energy state to a low energy state. On the other hand, we see that “STILL to IN_VEHICLE” resulted in the third highest number, 8.16%. What types of real-world situations are captured by this change is a possible topic for future research.

2) Notification Delay Due To Breakpoint: Figure 5 shows a cumulative distribution function (CDF) of the delay from when notification content arrives at the client to when the core estimation logic detects a breakpoint and the actual notification is posted. The graph on the right side shows the overview. We configured the timeout to 1 hour (3,600 seconds), so the value gets very close to 1 at 3,600 seconds. (We also observed a very rare situation where a notification was further delayed due to an implementation issue.) Looking at the left side graph (zooming from 0 to 100 seconds), we see that more than 70% of notifications were posted within approximately 10 seconds. The overall average delay is 236.8 seconds (approximately 4 minutes). The timeout occurrence rate is 1.11%.

Fig. 5. CDF on Delay from Content Reception To Notification Issue

3) Users’ Response to Notifications: One of the most interesting results we obtained from this study is the users’ response time to presented notifications. Figure 6 shows a CDF of the users’ response time to the presented notifications (from when a notification is posted to when a user clicks it) in both user groups. The response times in the two groups are significantly different: the time in the experimental group is 1,639.2 seconds (standard deviation: 1,690.2) while that in the control group is 3,258.1 seconds (standard deviation: 1,920.6). Comparing the two groups, we see that user response time was reduced by 49.7%, with statistical significance. Combined with the fact that more than 90% of notifications were delivered (Figure 5), these results mean that, in most cases, the users’ clicks on notifications occurred earlier under our breakpoint-based notification scheduling. We conclude that, in most cases, the delay in notification delivery due to breakpoint detection does not hurt and is even beneficial, because the users click earlier.

Fig. 6. CDF on Users’ Response Time To Notifications
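As a check on the arithmetic, the reported 49.7% reduction follows directly from the two response times quoted above. The symbols $\bar{t}_{\mathrm{ctl}}$ and $\bar{t}_{\mathrm{exp}}$ are notation introduced here, assuming that the reported figures are the group means in seconds.

```latex
\[
\frac{\bar{t}_{\mathrm{ctl}} - \bar{t}_{\mathrm{exp}}}{\bar{t}_{\mathrm{ctl}}}
  = \frac{3258.1 - 1639.2}{3258.1}
  = \frac{1618.9}{3258.1}
  \approx 0.497
\]
```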

Figures 7 and 8 show CDFs of the response time in the two user groups for each of the four notifications issued every day. As explained previously, the “Recommendation” class news content becomes available four times a day: 8AM, noon, 6PM, and 9PM. When we plot each of them on the graph (blue: 8AM, red: noon, green: 6PM, grey: 9PM), we confirm that, for all of them, breakpoint-scheduled notifications resulted in clearly shorter response times. The average response times in the experimental group are 1,490.0 sec (8AM), 1,527.8 sec (noon), 1,824.2 sec (6PM), and 1,645.0 sec (9PM), and those in the control group are 3,306.9 sec (8AM), 2,995.7 sec (noon), 3,413.6 sec (6PM), and 3,400.6 sec (9PM).

Fig. 7. CDF on Users’ Response Time To Notifications (Experimental Group)

Fig. 8. CDF on Users’ Response Time To Notifications (Control Group)

4) Users’ Content Browsing: Table VII shows the number of opened notifications on a daily basis. Including the initial three days when we gradually released the new application to Google Play, we measured the numbers for 24 days in total.

TABLE VII. USER BROWSING ON NOTIFICATION CONTENT

| Day | DOW | Control Group | Experimental Group | Gain (%) |
|---|---|---|---|---|
| (Deploy Day 1) | (Tue) | 1,837 | 1,880 | +2.34 |
| (Deploy Day 2) | (Wed) | 5,459 | 5,715 | +4.69 |
| (Deploy Day 3) | (Thu) | 9,096 | 9,280 | +2.02 |
| Study Day 1 | (Fri) | 29,837 | 30,386 | +1.84 |
| Study Day 2 | (Sat) | 27,747 | 27,918 | +0.62 |
| Study Day 3 | (Sun) | 36,528 | 36,927 | +1.09 |
| Study Day 4 | (Mon) | 39,990 | 41,253 | +3.16 |
| Study Day 5 | (Tue) | 31,237 | 31,792 | +1.78 |
| Study Day 6 | (Wed) | 33,869 | 34,320 | +1.33 |
| Study Day 7 | (Thu) | 48,419 | 49,184 | +1.58 |
| Study Day 8 | (Fri) | 53,191 | 53,874 | +1.28 |
| Study Day 9 | (Sat) | 49,008 | 49,379 | +0.76 |
| Study Day 10 | (Sun) | 45,544 | 45,717 | +0.38 |
| Study Day 11 | (Mon)* | 50,630 | 51,142 | +1.01 |
| Study Day 12 | (Tue) | 39,136 | 41,266 | +5.44 |
| Study Day 13 | (Wed) | 69,301 | 72,747 | +4.97 |
| Study Day 14 | (Thu)* | 37,845 | 38,345 | +1.32 |
| Study Day 15 | (Fri) | 42,394 | 42,971 | +1.36 |
| Study Day 16 | (Sat) | 47,824 | 48,051 | +0.48 |
| Study Day 17 | (Sun) | 45,256 | 46,330 | +2.37 |
| Study Day 18 | (Mon) | 33,128 | 33,678 | +1.66 |
| Study Day 19 | (Tue) | 39,089 | 39,479 | +1.00 |
| Study Day 20 | (Wed) | (No data due to temporary system downtime) | | |
| Study Day 21 | (Thu) | 42,569 | 43,159 | +1.39 |

(*): National public holiday in Japan

The interesting finding here is that, throughout the 24-day period (on every day for which data is available), the experimental group continued to record a greater number of accesses than the control group. The average gain over the 21 days of the user study period was 1.91% (standard deviation: 1.35). We believe that this amount of gain is meaningful for Web services when compared with industry practice. In addition, we observe a larger gain on weekdays than on other days, including weekends and holidays. The average gain on weekdays is 2.39% (standard deviation: 1.45), whereas that on weekends and holidays is 1.00% (standard deviation: 0.64). This difference may stem from differences in users’ physical activities (commuting to work vs. being at home) and in their net access activities between weekdays and other days.
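The gain column of Table VII is the relative difference between the two groups’ daily counts. The sketch below recomputes it for three rows copied from the table; the function name `gain_percent` is introduced here for illustration.

```python
def gain_percent(control: int, experimental: int) -> float:
    """Relative gain of the experimental group over the control group, in %."""
    return (experimental - control) / control * 100.0


# (day, control count, experimental count) copied from Table VII.
ROWS = [
    ("Deploy Day 1", 1837, 1880),
    ("Study Day 1", 29837, 30386),
    ("Study Day 21", 42569, 43159),
]

# Gains rounded to two decimals, as in the table.
GAINS = {day: round(gain_percent(c, e), 2) for day, c, e in ROWS}
```

For these rows the computed values match the table entries of +2.34, +1.84, and +1.39.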

5) Longer-Term User Engagement: Finally, we analyzed users’ longer-term engagement level with the Yahoo! JAPAN Web service. As the evaluation criterion, we used the 5-level user engagement scale that Yahoo! JAPAN typically uses for its user engagement analysis. This scale ranks each user as “5” if she/he visits the Yahoo! Web site 6-7 days a week, “4” in the case of 4-5 days, “3” in the case of 2-3 days, “2” in the case of 1 day, and “1” in the case of 0 days. Using this metric, we summarized the scores of the 680,000 users on a group-by-group basis. Table VIII shows the average level of users in each group. We first summarized the data for the day before the study began and then did the same for days 7, 14, and 21 of the study.

TABLE VIII. USER ENGAGEMENT LEVEL IN BOTH GROUPS

| Timestamp | Control Grp | Experimental Grp | Gain (%) |
|---|---|---|---|
| Before Study | 3.757 | 3.758 | 0.030 |
| Study Day 7 | 3.753 | 3.756 | 0.077 |
| Study Day 14 | 3.766 | 3.770 | 0.097 |
| Study Day 21 | 3.828 | 3.832 | 0.104 |

As seen in the table, the difference between the experimental group and the control group continues to increase, from “Before Study” (0.030%) to “Day 7” (0.077%), “Day 14” (0.097%), and “Day 21” (0.104%). We believe that this growth in the gain, by as much as 3.46 times in three weeks, is a promising result in terms of long-term user engagement, beyond simple session-wise user responses to notifications.
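The five-level engagement scale defined above maps the number of days per week on which a user visits the site to a level from 1 to 5. A minimal sketch follows; the function name `engagement_level` is hypothetical, and the thresholds are taken from the text.

```python
def engagement_level(days_visited_per_week: int) -> int:
    """Map weekly visit days (0..7) to the 5-level engagement scale."""
    d = days_visited_per_week
    if not 0 <= d <= 7:
        raise ValueError("days_visited_per_week must be in 0..7")
    if d >= 6:
        return 5  # 6-7 days a week
    if d >= 4:
        return 4  # 4-5 days
    if d >= 2:
        return 3  # 2-3 days
    if d == 1:
        return 2  # 1 day
    return 1      # 0 days
```

The group-level figures in Table VIII would then be the mean of this level over all users in each group.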

IX. DISCUSSION

We now discuss the further research opportunities that this research enables. First, we are interested in utilizing more types of sensors and features to explore even better system performance. In this study, the types of sensors were limited due to time constraints related to both corporate approval and advance notification to the users. Once we have cleared these requirements, injecting more types of sensor information into the estimation model will be our future research focus, while always maintaining our protection of users’ privacy. In Figure 5, more than 70% of notifications were posted within approximately 10 seconds. This may mean that, in some cases, a breakpoint was detected immediately at the first iteration of the logic execution. We consider that this comes from the relatively simple set of features we used for the current system. For example, if a trained model contains a relatively large weight for “device volume set to value x”, then the first execution immediately detects a breakpoint and fires the notification. Even so, we are encouraged by the results of this study, obtained with that limited set of sensor and feature types.

Use of multiple models, including model personalization, is another research avenue. In our user study, we observed an interesting difference in users’ responses between weekdays and other days (Section VIII-C4). Further investigation of this difference, as well as of day-to-day differences, is our immediate future work.

We are also interested in an even longer user study in the production environment. During the present study, we did not observe statistical significance in the differences between the groups in user clicks and user engagement. However, we see clearly reduced p values in some cases (for user engagement, the p value decreased from 0.67 (before study) to 0.20 (day 21)). This motivates us to pursue further investigation over an even longer period.

Evaluation of the relationship between the notification types (and content) and the user’s preferences and primary task is another item of future work. In this paper, as a first step, we focused on the generic user response; thus, the influence of notifications from other applications, including popular messenger applications, is yet to be studied. During our user study, we assumed that such notifications are random external variables that can occur in both user groups.

Finally, qualitative analysis from the end user’s perspective is also among our future work. We could not conduct such an evaluation within the application itself during our user study, since our experiment uses the real product application. A user interview would be a concrete method for such an evaluation.

X. CONCLUSION

In today’s advancing ubiquitous computing age, where increasingly proactive notifications have been causing interruption overload, we addressed the problem of finding users’ interruptible moments in the mobile environment in order to provide them with better notification experiences featuring lower frustration, more responsiveness, and more engagement with the content. We developed real-time interruptibility estimation logic based on breakpoint detection inside the Yahoo! JAPAN Android app, one of the most popular smartphone applications on the national market. Our large-scale in-the-wild user study in the production environment with more than 680,000 real users for 21 days clearly demonstrated the effectiveness of the system. We found that, in most cases, the notification delivery delay due to breakpoint detection does not hurt and even improves a user’s overall click timing (earlier), with significantly reduced user response time (49.7%). We also observed a continuous increase in content click numbers and user engagement level over the entire study period.

REFERENCES

[1] D. Garlan, D. Siewiorek, A. Smailagic, and P. Steenkiste, “Project aura: toward distraction-free pervasive computing,” Pervasive Computing, IEEE, vol. 1, no. 2, pp. 22–31, April-June 2002.

[2] J. G. Kreifeldt and M. E. McCarthy, “Interruption as a test of the user-computer interface,” in JPL Proceeding of the 17th Annual Conference on Manual Control, 1981, pp. 655–667.

[3] F. R. Zijlstra, R. A. Roe, A. B. Leonora, and I. Krediet, “Temporal factors in mental work: Effects of interrupted activities,” Journal of Occupational and Organizational Psychology, vol. 72, no. 2, pp. 163–185, 1999.

[4] C. Speier, J. S. Valacich, and I. Vessey, “The influence of task interruption on individual decision making: An information overload perspective,” Decision Sciences, vol. 30, no. 2, pp. 337–360, 1999.

[5] M. Czerwinski, E. Cutrell, and E. Horvitz, “Instant messaging: Effects of relevance and timing,” in People and computers XIV: Proceedings of HCI, vol. 2. British Computer Society, 2000, pp. 71–76.

[6] D. Newtson and G. Engquist, “The perceptual organization of ongoing behavior,” Journal of Experimental Social Psychology, vol. 12, no. 5, pp. 436–450, 1976.

[7] T. Okoshi, J. Ramos, H. Nozaki, J. Nakazawa, A. K. Dey, and H. Tokuda, “Attelia: Reducing user’s cognitive load due to interruptive notifications on smart phones,” in Proceedings of IEEE International Conference on Pervasive Computing and Communications 2015, ser. PerCom ’15, 2015.

[8] T. Okoshi, J. Ramos, H. Nozaki, J. Nakazawa, A. K. Dey, and H. Tokuda, “Reducing users’ perceived mental effort due to interruptive notifications in multi-device mobile environments,” in Proceedings of the 2015 ACM International Joint Conference on Pervasive and Ubiquitous Computing, ser. UbiComp ’15, 2015, pp. 475–486.

[9] “Yahoo! japan android application,” https://play.google.com/store/apps/details?id=jp.co.yahoo.android.yjtop, Yahoo Japan Corporation, Sep. 2016.

[10] H. A. Simon, “Designing organizations for an information-rich world,” Computers, communication, and the public interest, vol. 37, pp. 40–41, 1971.

[11] A. Toffler, Future shock. Bantam, 1990.

[12] J. T. Milord and R. P. Perry, “A methodological study of overload,” The Journal of General Psychology, vol. 97, no. 1, pp. 131–137, 1977.

[13] “Attention management in ubiquitous computing environments (amuce 07),” http://www.ubicomp.org/ubicomp2007/index-14.htm.

[14] S. Gould, D. Brumby, A. Cox, V. Gonzalez, D. Salvucci, and N. Taatgen, “Multitasking and interruptions: a sig on bridging the gap between research on the micro and macro worlds,” in CHI’12 Extended Abstracts on Human Factors in Computing Systems, 2012, pp. 1189–1192.

[15] B. Poppinga, M. Pielot, N. Henze, N. Oliver, K. Church, and A. S. Shirazi, “Smarttention, please! intelligent attention management on mobile devices,” in Proceedings of the 17th International Conference on Human-Computer Interaction with Mobile Devices and Services Adjunct, ser. MobileHCI ’15, 2015, pp. 1066–1069.

[16] D. Weber, A. S. Shirazi, S. Gehring, N. Henze, B. Poppinga, M. Pielot, and T. Okoshi, “Smarttention, please!: 2nd workshop on intelligent attention management on mobile devices,” in Proceedings of the 18th International Conference on Human-Computer Interaction with Mobile Devices and Services Adjunct, ser. MobileHCI ’16, 2016, pp. 914–917.

[17] A. Voit, B. Poppinga, D. Weber, M. Bohmer, N. Henze, S. Gehring, T. Okoshi, and V. Pejovic, “Ubittention: Smart & ambient notification and attention management,” in Adjunct Proceedings of the 2016 ACM International Joint Conference on Pervasive and Ubiquitous Computing, ser. UbiComp ’16, 2016, pp. 1520–1523.

[18] D. Kahneman, Attention and effort. Prentice-Hall, Inc., 1973.

[19] R. M. Shiffrin and W. Schneider, “Controlled and automatic human information processing: II. perceptual learning, automatic attending and a general theory,” Psychological review, vol. 84, no. 2, p. 127, 1977.

[20] R. J. Sternberg and K. Sternberg, Cognitive Psychology, 6th ed. Cengage Learning, 2012.

[21] P. D. Adamczyk and B. P. Bailey, “If not now, when?: the effects of interruption at different moments within task execution,” in Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, ser. CHI ’04, 2004, pp. 271–278.

[22] B. P. Bailey and J. A. Konstan, “On the need for attention-aware systems: Measuring effects of interruption on task performance, error rate, and affective state,” Computers in Human Behavior, vol. 22, no. 4, pp. 685–708, 2006.

[23] F. Paas, J. E. Tuovinen, H. Tabbers, and P. W. Van Gerven, “Cognitive load measurement as a means to advance cognitive load theory,” Educational psychologist, vol. 38, no. 1, pp. 63–71, 2003.

[24] S. G. Hart and L. E. Staveland, “Development of NASA-TLX (Task Load Index): Results of empirical and theoretical research,” in Human Mental Workload, ser. Advances in Psychology, P. A. Hancock and N. Meshkati, Eds. North-Holland, 1988, vol. 52, pp. 139–183.

[25] J. Beatty and B. Lucero-Wagoner, “The pupillary system,” Handbook of psychophysiology, vol. 2, pp. 142–162, 2000.

[26] C. S. Ikehara and M. E. Crosby, “Assessing cognitive load with physiological sensors,” in System Sciences, 2005. HICSS’05. Proceedings of the 38th Annual Hawaii International Conference on, 2005, pp. 295a–295a.

[27] S. T. Iqbal, P. D. Adamczyk, X. S. Zheng, and B. P. Bailey, “Towards an index of opportunity: understanding changes in mental workload during task execution,” in Proceedings of the SIGCHI conference on Human factors in computing systems, 2005, pp. 311–320.

[28] G. F. Wilson, “An analysis of mental workload in pilots during flight using multiple psychophysiological measures,” The International Journal of Aviation Psychology, vol. 12, no. 1, pp. 3–18, 2002.

[29] K. Ryu and R. Myung, “Evaluation of mental workload with a combined measure based on physiological indices during a dual task of tracking and mental arithmetic,” International Journal of Industrial Ergonomics, vol. 35, no. 11, pp. 991–1009, 2005.

[30] Y. Shi, N. Ruiz, R. Taib, E. Choi, and F. Chen, “Galvanic skin response (gsr) as an index of cognitive load,” in CHI’07 extended abstracts on Human factors in computing systems, 2007, pp. 2651–2656.

[31] T. K. Fredericks, S. D. Choi, J. Hart, S. E. Butt, and A. Mital, “An investigation of myocardial aerobic capacity as a measure of both physical and cognitive workloads,” International Journal of Industrial Ergonomics, vol. 35, no. 12, pp. 1097–1107, 2005.

[32] L. Mulder, “Measurement and analysis methods of heart rate and respiration for use in applied environments,” Biological psychology, vol. 34, no. 2, pp. 205–236, 1992.

[33] E. Haapalainen, S. Kim, J. F. Forlizzi, and A. K. Dey, “Psycho-physiological measures for assessing cognitive load,” in Proceedingsof the 12th ACM international conference on Ubiquitous computing,ser. Ubicomp ’10, 2010, pp. 301–310.

[34] S. T. Iqbal and B. P. Bailey, “Investigating the effectiveness of mental workload as a predictor of opportune moments for interruption,” in CHI ’05 Extended Abstracts on Human Factors in Computing Systems, ser. CHI EA ’05, 2005, pp. 1489–1492.

[35] S. T. Iqbal and B. Bailey, “Leveraging characteristics of task structure to predict the cost of interruption,” in Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, ser. CHI ’06, 2006, pp. 741–750.

[36] E. Horvitz and J. Apacible, “Learning and reasoning about interruption,” in Proceedings of the 5th International Conference on Multimodal Interfaces, ser. ICMI ’03, 2003, pp. 20–27.

[37] S. Hudson, J. Fogarty, C. Atkeson, D. Avrahami, J. Forlizzi, S. Kiesler, J. Lee, and J. Yang, “Predicting human interruptibility with sensors: A wizard of oz feasibility study,” in Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, ser. CHI ’03, 2003, pp. 257–264.

[38] J. B. Begole, N. E. Matsakis, and J. C. Tang, “Lilsys: Sensing unavailability,” in Proceedings of the 2004 ACM Conference on Computer Supported Cooperative Work, ser. CSCW ’04, 2004, pp. 511–514.

[39] E. Horvitz, P. Koch, and J. Apacible, “Busybody: Creating and fielding personalized models of the cost of interruption,” in Proceedings of the 2004 ACM Conference on Computer Supported Cooperative Work, ser. CSCW ’04, 2004, pp. 507–510.

[40] S. T. Iqbal and B. P. Bailey, “Oasis: A framework for linking notification delivery to the perceptual structure of goal-directed tasks,” ACM Transactions on Computer-Human Interaction, vol. 17, no. 4, pp. 15:1–15:28, Dec. 2010.

[41] J. Ho and S. S. Intille, “Using context-aware computing to reduce the perceived burden of interruptions from mobile devices,” in Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, ser. CHI ’05, 2005, pp. 909–918.

[42] J. E. Fischer, C. Greenhalgh, and S. Benford, “Investigating episodes of mobile phone activity as indicators of opportune moments to deliver notifications,” in Proceedings of the 13th International Conference on Human Computer Interaction with Mobile Devices and Services, ser. MobileHCI ’11, 2011, pp. 181–190.

[43] G. H. H. ter Hofte, “Xensible interruptions from your mobile phone,” in Proceedings of the 9th International Conference on Human Computer Interaction with Mobile Devices and Services, ser. MobileHCI ’07, 2007, pp. 178–181.

[44] V. Pejovic and M. Musolesi, “InterruptMe: Designing Intelligent Prompting Mechanisms for Pervasive Applications,” in Proceedings of the 2014 ACM International Joint Conference on Pervasive and Ubiquitous Computing, ser. UbiComp ’14, 2014, pp. 395–906.

[45] M. Pielot, T. Dingler, J. S. Pedro, and N. Oliver, “When attention is not scarce - detecting boredom from mobile phone usage,” in Proceedings of the 2015 ACM International Joint Conference on Pervasive and Ubiquitous Computing, ser. UbiComp ’15, 2015, pp. 825–836.

[46] A. Mathur, N. D. Lane, and F. Kawsar, “Engagement-aware computing: Modelling user engagement from mobile contexts,” in Proceedings of the 2016 ACM International Joint Conference on Pervasive and Ubiquitous Computing, ser. UbiComp ’16, 2016, pp. 622–633.

[47] U. Acer, A. Mashhadi, C. Forlivesi, and F. Kawsar, “Energy efficient scheduling for mobile push notifications,” in Proceedings of the 12th EAI International Conference on Mobile and Ubiquitous Systems: Computing, Networking and Services on 12th EAI International Conference on Mobile and Ubiquitous Systems: Computing, Networking and Services, ser. MOBIQUITOUS ’15, 2015, pp. 100–109.

[48] A. Sahami Shirazi, N. Henze, T. Dingler, M. Pielot, D. Weber, and A. Schmidt, “Large-scale assessment of mobile notifications,” in Proceedings of the 32nd annual ACM conference on Human factors in computing systems - CHI ’14, 2014, pp. 3055–3064.

[49] M. Pielot, K. Church, and R. de Oliveira, “An In-Situ Study of Mobile Phone Notifications,” in Proceedings of the 16th International Conference on Human-Computer Interaction with Mobile Devices and Services - MobileHCI ’14, 2014, pp. 233–242.

[50] StatCounter, Inc., “StatCounter,” http://statcounter.com/, Jul. 2016.

[51] M. Csikszentmihalyi and R. Larson, “Validity and reliability of the experience-sampling method.” The Journal of nervous and mental disease, vol. 175, no. 9, pp. 526–536, 1987.

[52] Apple Inc., “CMMotionActivityManager,” https://developer.apple.com/library/ios/documentation/CoreMotion/Reference/CMMotionActivityManager_class/index.html, 2014.

[53] Google Inc., “Making your app location-aware - Android Developers,” https://developer.android.com/intl/ja/training/location/index.html.

[54] The Apache Software Foundation, “Apache Hadoop Project,” http://hadoop.apache.org/.

[55] Machine Learning Group, National Taiwan University, “LIBLINEAR – A Library for Large Linear Classification,” https://www.csie.ntu.edu.tw/~cjlin/liblinear/.