Hindawi
Wireless Communications and Mobile Computing
Volume 2018, Article ID 3136471, 24 pages
https://doi.org/10.1155/2018/3136471

Research Article
Smart Behavioral Analytics over a Low-Cost IoT Wi-Fi Tracking Real Deployment

Javier Andión, José M. Navarro, Gregorio López, Manuel Álvarez-Campana, and Juan C. Dueñas

Departamento de Ingeniería de Sistemas Telemáticos, Escuela Técnica Superior de Ingenieros de Telecomunicación, Universidad Politécnica de Madrid, Avenida Complutense 30, 28040 Madrid, Spain

Correspondence should be addressed to Javier Andión; jandion@upm.es

Received 3 August 2018; Revised 22 October 2018; Accepted 11 November 2018; Published 2 December 2018

Guest Editor: Jorge Lanza

Copyright © 2018 Javier Andión et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

In an increasingly urbanized world, the so-called Smart Cities need to be driven by the principles of efficiency and sustainability. Information and Communications Technologies and, in particular, the Internet of Things will play a key role in this, since they will allow monitoring and optimizing all the municipal services that exist and shall exist. People flow monitoring stands out in this context due to its wide range of applications, spanning from monitoring transport infrastructure to physical security applications. There are different techniques to perform people flow monitoring, each with pros and cons, as in any other engineering problem. Typically, the options that provide the most accurate results are also the most expensive ones, whereas there are cases where presence detection in given areas is enough and cost is a limiting factor. The main goal of this paper is to prove that a minimal deployment of sensors, combined with adequate analysis and visualization algorithms, can render useful results. In order to achieve this goal, a dataset is used with one year of data from a real infrastructure composed of 9 Wi-Fi tracking sensors deployed in the Telecommunications Engineering School of Universidad Politécnica de Madrid, which is visited by 4000 people daily and covers 1.8 hectares. The data analysis includes time and occupancy, position of people, and identification of common behaviors, as well as a comparison of the accuracy of the considered solution with actual data and a video monitoring system available at the library of the school. The obtained insights can be used for optimizing the management and operation of the school, as well as for other similar infrastructures and, in general, for other kinds of applications which require people flow monitoring at low cost without high accuracy.

1. Introduction

The world is becoming increasingly urbanized. Based on the latest revision of the World Urbanization Prospects from the United Nations (UN) [1], nowadays 55% of the global population lives in urban areas, and this percentage is expected to increase up to 68% by 2050. In addition, the number of so-called megacities (cities with more than 10 million inhabitants) around the world has gone from 10 in 1990 to 28 in 2014, and it is estimated that there will be 43 in 2030.

As a result, efficiency and sustainability become the key principles for the so-called Smart Cities, so that they can accommodate such a number of inhabitants while guaranteeing high levels of comfort. Information and Communication Technologies (ICT) and, in particular, the new paradigm of the Internet of Things (IoT) are key for Smart Cities in that they will allow monitoring city services, ranging from traffic management to waste collection, and running optimizations based on the huge amount of gathered data.

One of the main challenges when considering deploying new Smart City services is that there are many platforms, technologies, and protocols available and that they typically involve a huge number of devices, so the associated investment is considerable. Therefore, it is especially interesting to have testbeds available for experimentation, but they need to be representative enough so that the conclusions obtained from them are significant. In this context, university campuses appear as ideal places for experimenting with and evaluating innovative proposals that can later be scaled to the cities where they are typically integrated, thus giving rise to the concept of the living lab [2, 3], which is already a reality in universities around the world, such as Delft University [4], University of British Columbia [5], or Harvard University [6].

Figure 1: Summary of the sensors deployed in Smart CEI Moncloa (as of July 2018). For each of the 13 engineering schools (1 Telecommunications, 2 Forestry I, 3 Forestry II, 4 Rectorate, 5 Civil, 6 Agricultural I, 7 Agricultural II, 8 Aeronautical & Aerospace, 9 Naval & Marine, 10 Architecture I, 11 Architecture II, 12 Health Science & Sport, 13 Industrial) and in total, the figure gives the number of environmental and Wi-Fi tracking sensors. Figure 1 is reproduced from [8] (2017).

Universidad Politécnica de Madrid (UPM) and, in particular, the Campus of International Excellence (CEI) of Moncloa present great potential in this sense. This campus is integrated in the metropolitan area of Madrid, spreads across 5.5 km², and has a daily flow of up to 120,000 people (which is comparable to or even greater than the population of many Spanish provincial capitals).

In order to make the most of this potential, within the UPM City of the Future initiative [7], the IoT platform for Smart City services experimentation Smart CEI Moncloa was deployed [8]. This platform currently offers two pilot services, namely, environmental monitoring and people flow monitoring. The devices used for the environmental monitoring service are based on Arduino and collect measurements of temperature, humidity, luminosity, noise, CO, and NO2. The devices used for the people flow monitoring service are based on Raspberry Pi and perform Wi-Fi tracking. As Figure 1 shows, for the time being there are 77 devices deployed across the 13 engineering schools of the CEI Moncloa: 52 for the people flow monitoring service and 25 for the environmental monitoring service. The platform has been up and running since 2016, so there is plenty of data available to be analyzed.

This paper focuses on analyzing the people flow monitoring data gathered in the Telecommunications Engineering School (ETSIT) of UPM during 2016. People flow monitoring is a hot topic nowadays because of its wide range of applications in Smart Cities, spanning from monitoring public transport infrastructure (e.g., metro, airports), private transport infrastructure (e.g., highways), overcrowded scenarios (e.g., demonstrations, concerts), or customer behaviors (e.g., malls) to physical security applications (e.g., presence of unauthorized people in restricted areas). For these purposes, in many cases it is enough to provide presence detection in given areas instead of more sophisticated and costly solutions that perform very accurate location, which require fingerprinting and very dense sensor deployments gathering data at very high frequencies. This is indeed the case of the people flow monitoring service considered in this paper, which is based on a few low-cost devices that upload data every 15 minutes and that are independent from the institutional network. This independence allows tracking users connected to different Wi-Fi networks, if these spatially coexist, or even users not connected to any.
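
As an illustration of presence detection at this granularity, the following minimal sketch counts the distinct devices seen by each sensor in 15-minute windows. The record layout (timestamp, sensor name, MAC address) and the function names are assumptions made for illustration; they are not the platform's actual data format.

```python
from collections import defaultdict
from datetime import datetime

WINDOW_MINUTES = 15  # upload period stated in the text


def window_start(ts):
    """Floor a datetime to the start of its 15-minute window."""
    return ts.replace(minute=ts.minute - ts.minute % WINDOW_MINUTES,
                      second=0, microsecond=0)


def occupancy_per_window(records):
    """Count distinct MAC addresses per (sensor, window start).

    `records` is an iterable of (timestamp, sensor, mac) tuples; this
    layout is hypothetical.
    """
    seen = defaultdict(set)
    for ts, sensor, mac in records:
        # Lower-casing makes the same address compare equal regardless of case.
        seen[(sensor, window_start(ts))].add(mac.lower())
    return {key: len(macs) for key, macs in seen.items()}
```

Counting distinct addresses per window, rather than raw detections, keeps a device that is seen many times within the same window from inflating the occupancy figure.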

Hence, as Figure 1 shows, in the ETSIT, which is one of the biggest schools in the CEI Moncloa, visited by 4000 people daily (3000 students, 500 professors and researchers, and 500 administrative and maintenance staff, approximately), there are 9 Wi-Fi sensors covering 1.8 hectares of indoor areas. Figure 2 shows the location of these sensors. Dark areas correspond to the floor of the buildings of the ETSIT (buildings A, B, C, and D). There is a Wi-Fi sensor at the entry of each building. In addition, there are also sensors in the library (sensor 4) and at the student tables (sensor 3), as they are large spaces usually crowded with students. As can also be seen, the area covered by each sensor varies, with some of them covering especially large areas, such as the ones in the library (1300 m²) or at the main entrance (1270 m²).

This paper aims to explore the useful insights that can be obtained from such a cost-effective solution for people flow monitoring. Thus, the paper performs a detailed analysis of the people flow monitoring data, including a temporal analysis, a spatial analysis, and an activity pattern analysis, as well as a comparison of the performance of this solution with a much more expensive one based on video monitoring at the library of the ETSIT. These analyses can be used for optimizing the management and operation of the school, from the work shifts to the proper operation of the lighting to reduce energy consumption and thus the carbon footprint. The conclusions can be valid for similar infrastructures, but are also relevant in general for municipalities, which will not typically be able to afford accurate and costly location systems all over the city.

| Sensor | Description |
| --- | --- |
| 1 Entr. A | Main entrance of building A of the school |
| 2 Entr. A Sec. | Secondary accesses to building A. It covers most of the classrooms of this building |
| 3 Std. Tables | Sensor close to "Entr. A", just above an area of tables where students work and study in groups |
| 4 Library | Inside the library, which is open every day from 9 am to 9 pm |
| 5 Entr. B | Main entrance to building B. This building contains mostly laboratories and offices |
| 6 Entr. B Lab | Secondary entrance of this building, which is the closest entrance to the laboratories |
| 7 Entr. B Sec. | Secondary access to building B, covering the classrooms of this building |
| 8 Entr. C | Main entrance of building C. This building contains only research laboratories and offices; students do not have classes in this building |
| 9 Entr. D | Entrance of building D, covering the backdoor of the library |

Figure 2: Summary of the Wi-Fi sensors deployed in the Telecommunication Engineering School. The map on the left-hand side also shows the location of (α) offices and labs, (β) classrooms, and (γ) cantina.

The rest of the paper is structured as follows. Section 2 presents an exhaustive analysis of techniques currently used for monitoring people flows and identifying behaviors. Section 3 outlines the main characteristics of the sensor infrastructure and the IoT platform that collects the data analyzed in this paper. After a brief sketch of the methodology used in this work, Section 4 presents the analysis performed on data with respect to time and occupancy, position of people, and identification of common behaviors or activity patterns. Then, Section 5 describes the data available as ground truth and the validation of the analyzed system. Finally, Section 6 summarizes the main findings of the analysis and presents some ideas for building similar systems.

2. Related Work

People flow monitoring has always been a useful source of information. Knowing the position or trajectory of a person or a mass of people allows for the creation of a wide range of different applications, such as crowd monitoring in events or concerts, the discovery of the most common routes in shopping malls, analysis of space usage in public or private infrastructures, or security against unusual activities (e.g., presence of unauthorized people in restricted areas). In the last decades, the spread of communication technologies (e.g., the popularization of Wi-Fi networks or the use of smartphones) has become a vast source of data, which allows for the improvement or even the automation of techniques to monitor people.

In most cases, positioning in outdoor scenarios can be solved using the Global Positioning System (GPS), but it typically presents limitations in terms of accuracy in indoor scenarios [9]. Thus, indoor location or pedestrian location has been a key research topic in recent years. Some works aim to discover with high precision how people move inside buildings by using the mobile network or personal area networks, e.g., [10]. These systems try to help users find their path in a building, measure the length of their stay in a mall for commercial purposes, or simply ease people's movements by removing obstacles. The most common technologies used in recent years to achieve this kind of pedestrian tracking have been video camera systems, radiofrequency signals, Bluetooth, smartphone sensors, and Wi-Fi networks. These tracking methods can be classified based on two aspects:

(i) Whether they need user intervention, like the usage of a smartphone application or specific hardware, in which case they are classified as active, or do not need the cooperation of the users to work, i.e., passive systems.

(ii) Whether or not a dedicated sensor network deployment is needed (e.g., it can be avoided by using the existing network of Wi-Fi access points).

The usage of video camera systems and computer vision presents some advantages, such as the fact that it is a passive system, that it can use existing camera network deployments, or that it provides almost real-time feedback. However, it also presents many drawbacks, although there are works that propose ways to mitigate these effects [11, 12], such as its dependency on visual aspects (e.g., poor lighting, obstacles), its low scalability due to deployment complexity and costs, the difficulty of fusing multiple video streams to provide automatic monitoring, or the difficulty of tracking users individually. Reference [13] provides a survey on computer vision techniques for the specific case of crowd scene analysis, covering from existing models and popular algorithms to current research problems and trends.

Table 1: Summary of the state of the art of people flow monitoring techniques which are not based on Wi-Fi tracking (P/A stands for Passive/Active).

| Technology | Ref. | Year | Scenario | Application | P/A | Own deployment |
| --- | --- | --- | --- | --- | --- | --- |
| Video camera & Computer Vision | [13] | 2015 | Indoor/Outdoor | Crowd scene analysis | P | No |
| RF | [14–16] | 2013, 2014, 2015 | Indoor | Short range movements (1-10 m) with high accuracy (e.g., elderly care, baby care) | P | Yes |
| Cellular networks | [18] | 2011 | Indoor/Outdoor | Analyze people flow in a suburban area near NYC with accuracy around hundreds of meters | P | No |
| GPS | [19] | 2015 | Outdoor | Prevent critical situations in overcrowded scenarios (e.g., concerts) | A | No |
| Smartphone sensors (e.g., accelerometer) | [20] | 2018 | Indoor | PDR. Track individual pedestrian | A | Depends |
| Bluetooth | [21] | 2017 | Indoor | Monitor people flow (comparing Wi-Fi with Bluetooth) | A | Yes |

People localization and tracking based on radiofrequency (RF) measurements has also been widely addressed in the literature. References [14–16] represent some recent remarkable research work on this topic. These solutions are based on antennas that transmit RF signals and are able to locate and track people based on body radio reflections. Consequently, they can be classified as passive systems. They provide very accurate results, allowing tracking of forward and backward movements or body part movements (including breathing), and are even able to track several people under certain configurations. However, it is also difficult to track a fairly high number of users individually with this kind of technique (even if we assume that a person may have a certain type of body radio reflection signature, it would not be different enough between similar people and might change depending on the distance between the person and the vantage points). They are mainly applied to short-range movements (1-10 meters), so they could only be applied in small rooms at the cost of a large and specific deployment.

People flow monitoring can also be performed by exploiting the ubiquity, communications capabilities, and integrated sensors of smartphones [17]. In [18], for instance, Call Data Records (CDR) are used to locate the base stations the smartphones are attached to and thus analyze people flow in and out of a suburban city near New York City. The main drawbacks of this way of locating and tracking people are its low accuracy (around hundreds of meters) and that the terminal has to be active (e.g., make or receive a call, or send or receive an SMS) in order to be monitored, although this technique would be classified as passive, given that the needed user activity is not aimed at contributing to the positioning.

Reference [19] proposes a solution to prevent critical situations in overcrowded scenarios based on a smartphone application that transmits its GPS location data. Although it yields good results in terms of accuracy, the main drawbacks of this approach are that it is an active technique, since it requires the installation of the app; that the consumption of the app impacts the autonomy of the terminal; and that it may not work properly indoors.

Smartphone sensors can also be used to perform Pedestrian Dead Reckoning (PDR), which is a method that tries to estimate a pedestrian's position based on their past position and the output of the smartphone's sensors, e.g., accelerometer, gyroscope, etc. This technique is usually supported by other positioning methods [20], but in most cases it does not need a specific sensor network other than the smartphone itself. It is an active system and it is typically used to track individual pedestrians instead of flocks. The main drawback of this technique is that it suffers from tracking error accumulation and frequently needs an extra location measurement.
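
The error accumulation mentioned above follows from the way dead reckoning chains its estimates. A minimal sketch, assuming a simple step-and-heading model (the function name and the input format are illustrative and are not taken from [20]):

```python
import math


def dead_reckon(start, steps):
    """Return the positions reached from `start` after each step.

    `start` is an (x, y) pair in meters; `steps` is an iterable of
    (step_length_m, heading_rad) pairs, with the heading measured
    counter-clockwise from the x axis.
    """
    x, y = start
    path = [(x, y)]
    for length, heading in steps:
        # Each position is built on the previous one, so an error in any
        # step estimate is carried into every later position.
        x += length * math.cos(heading)
        y += length * math.sin(heading)
        path.append((x, y))
    return path
```

Because no step is ever checked against an independent measurement, a small bias in the heading or step length grows with the number of steps, which is why an external position fix is needed from time to time.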

Bluetooth has also been used to monitor people flows in indoor environments. Some works, such as [21], compare the usage of Bluetooth with other techniques, such as Wi-Fi, and conclude that its capabilities are below those of other options, although it can be used in combination with other positioning systems to improve their accuracy. The main disadvantage of these tracking systems is that most Bluetooth devices only react to Bluetooth signals when the users make them visible to the network. Also, these implementations require a specific deployment of Bluetooth sensors, which usually cannot be used for other tasks. Table 1 summarizes the research work reviewed so far, without considering works based on Wi-Fi tracking.

Lastly, people tracking systems based on Wi-Fi have been a hot topic for more than fifteen years and still are. This is indeed the technique used in this paper. Thus, Table 2 is exclusively devoted to summarizing and comparing previous research work based on this technique.

As shown in Table 2, works related to Wi-Fi tracking techniques can focus on different objectives: some try to obtain users' positions as accurately as possible [22–32], others analyze the trajectories followed by pedestrians [33–35] or flocks [36–38], and finally others study the occupation of different zones [39–42] and obtain behavior patterns [36, 43–49].

Table 2: Summary of the state of the art of people flow monitoring techniques based on Wi-Fi tracking (P/A stands for Passive/Active).

| Ref. | Year | Scenario | Application | P/A | Own deployment |
| --- | --- | --- | --- | --- | --- |
| [22] | 2003 | Indoor | One of the earliest approaches on precise positioning using Wi-Fi (precision 26m) | A | Yes |
| [23] | 2006 | Indoor | Wi-Fi fingerprint to identify the general location and applying logistic regression to distinguish between finer-grained locations | A | Yes |
| [24] | 2006 | Indoor office building | Precise positioning. PDR combined with Wi-Fi to reduce the accumulated error | A | Yes |
| [25] | 2007 | Outdoor and indoor | Creation of Wi-Fi map. Positioning comparing with the created map | A | Yes. Own hardware. Offline analysis |
| [26] | 2007 | Indoor campus | Comparison of positioning and tracking methods using Wi-Fi | P | No. Offline |
| [27] | 2008 | Indoor campus | Estimate the position using Wi-Fi and tracking with PDR | A | Yes |
| [28] | 2009 | Indoor campus | Real time Wi-Fi positioning. Web portal to check user's positions | A | Yes |
| [39] | 2009 | Indoor campus | Count of users in different buildings. Analysis of users' mobility between buildings | P | No. Institutional APs |
| [57] | 2010 | Indoor campus and office building | Coarse position with Wi-Fi and Bluetooth. Graph of user co-occurrence | A | No. Offline |
| [43] | 2010 | Indoor campus and office building | Calculation of stay length based on Wi-Fi positioning. Analysis of favorite locations | A | No. Offline |
| [44] | 2011 | Indoor campus | Extension to [39]. User characterization based on their mobility patterns | P | No. Institutional APs |
| [29] | 2011 | Indoor tunnel in construction | Precise positioning in real time of workers inside a tunnel under construction using Wi-Fi (precision 5 m) | P | Yes. Own AP deployment |
| [30] | 2012 | Indoor campus | Creation of Wi-Fi fingerprint map. Map usage for positioning with smartphone application | A | No |
| [36] | 2012 | Indoor campus | Study of crowd movement, Wi-Fi based. Analysis of mobility patterns, users' arrivals and departures from campus | P | No. Institutional APs |
| [58] | 2013 | Indoor | Wi-Fi path analysis in real time | A | No. Institutional APs |
| [33] | 2014 | Indoor and outdoor campus | Analyze pedestrian destination frequencies in an area of 55 hectares of a university campus during 5 weekdays | P | No. Institutional APs and Radius server |
| [59] | 2014 | Indoor campus | Localization and tracking system exploiting particle filters to combine dead reckoning, Wi-Fi RSS-based analysis, and knowledge of the floor plan (precision 0.7 m) | A | |
| [60] | 2015 | Indoor shopping mall | Wi-Fi Channel State Information analysis to detect shopper activities | P | Yes. Own AP deployment |
| [31] | 2015 | Indoor | Precise positioning based on sensor fusion combining Wi-Fi, PDR, and landmarks. Smartphone application (positioning 1 m) | A | No. Smartphones |
| [32] | 2015 | Indoor parking | Precise positioning combining Wi-Fi RSS and electromagnetic field map | | |
| [37] | 2015 | Outdoor concert. Indoor campus | Portable Wi-Fi based user count. Analysis of crowds in concert and in campus | P | Yes. Raspberry Pi based |
| [61] | 2015 | Outdoor | Creation of Wi-Fi map using GPS | A | |
| [62] | 2016 | Indoor | Precise positioning combining Wi-Fi and PDR | A | No. Smartphones |
| [34] | 2016 | Indoor airport | User path detection. Combining Wi-Fi, GPS, PDR, and Bluetooth to create a multilevel map and study of user's trajectory prediction | A | No. Smartphones |
| [45] | 2016 | Indoor campus | Analysis of users' activities. User tagging based on activities registered | P | No. Institutional APs |
| [46] | 2016 | Indoor and outdoor campus | Analysis of user movements to different food points to predict the operation of new stores based on price and location | P | No. Institutional APs |
| [63] | 2017 | Indoor | Crowdsourcing positioning based on Wi-Fi fingerprint | A | No. Institutional APs |
| [41] | 2018 | Indoor | Coarse positioning, room level precision, based on probabilistic Wi-Fi fingerprint. Usage of Hidden Markov chain models to analyze user movement | P | No. Institutional APs |
| [35] | 2016 | Indoor | Trajectory analysis based on Hidden Markov chain models | P | No. Institutional APs |
| [42] | 2017 | Indoor | Estimate the number of participants and their space and time evolution in an area of about 167 hectares during the 2016 Open Day of the European JRC | P | No. Institutional APs |
| [47] | 2016 | Indoor/Outdoor | Study mobility-related activities in a campus of 440 hectares based on the 2700 APs of the institutional network and an additional opt-in smartphone application | A | No. Institutional APs |
| [53] | 2014 | Indoor | Classify users in a hospital (e.g., patient, doctor, administrative) by checking the number of hours and the positions of a user over time based on the institutional Wi-Fi network | P | No. Institutional APs |
| [38] | 2012 | Indoor | Identify flocks walking in a building and their behavior based on signal strength from the institutional Wi-Fi and using clustering techniques | P | No. Institutional APs |
| [45] | 2016 | Indoor | Analyze users' occupation (based on Markov models) as well as regular and irregular hours in a university campus | P | No. Institutional APs |
| [40] | 2016 | Indoor | Analyze room utilization and people tracking, providing heat maps. Analyze device statistics | A | No. Institutional APs |
| [48] | 2017 | Indoor/Outdoor | Analyze people mobility monitoring and tracking in Smart Cities and traffic in a highway (e.g., driving behavior, traffic forecasting) | P | Yes. Raspberry Pi based |
| [49] | 2018 | Indoor/Outdoor | Provide user localization, user profiling, and device classification | A | Yes. Raspberry Pi based |
| [54] | 2005 | Indoor/Outdoor | Analyze Wi-Fi tracking records gathered during more than one year in Madeira to classify users as tourists or locals and identify touristic spots | P | Yes. Based on TP-Link MR3240v2 home router |
| [56] | 2017 | Indoor | Obtain semantic trajectories. Classify users based on their locations. Analyze the probability of a user going to a specific shop based on their history and propose the creation of a recommender based on the whole dataset | - | - |

Wi-Fi tracking of a specific user is usually done by analyzing the collected records related to a specific MAC address, i.e., tracking users is equivalent to tracking their MAC address. This technique, in general, presents issues related to tracking people individually and to privacy. Although it is true that a user carrying several devices (e.g., smartphone, tablet, laptop) with the Wi-Fi interfaces on would at first be identified as several users, after a reasonable period of time this information can be correlated to fix the problem [50]. In the case of the work presented in this paper, the files associated with the measurements of different sensors of the same building are compared in order to avoid counting the same mobile/person several times (e.g., due to overlapping Wi-Fi cells). The resulting file allows for the analysis of people flow at a building level, identifying the total stay time or the frequency of the visited places. However, as explained below in the position analysis subsection, this problem appears in the collected data and it is necessary to postprocess the data to deal with these collisions. Users may also use so-called MAC spoofing (i.e., replacing their actual MAC address with the MAC address of another device), which could be seen as a kind of attack. However, this may yield connectivity problems and it is a negligible behavior in the scenario considered in this paper.
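
A minimal sketch of this kind of per-building merging is given next. It assumes that each sensor's file has already been parsed into (window, sensor, MAC) records and that a sensor-to-building mapping is available; the names and the mapping are hypothetical, and the procedure is a simplification rather than the exact postprocessing used in this work.

```python
from collections import defaultdict

# Hypothetical sensor-to-building mapping, for illustration only.
SENSOR_BUILDING = {
    "Entr. A": "A",
    "Entr. A Sec.": "A",
    "Entr. B": "B",
}


def devices_per_building(records, sensor_building=SENSOR_BUILDING):
    """Count each MAC address once per (building, window).

    `records` is an iterable of (window, sensor, mac) tuples.
    """
    seen = defaultdict(set)
    for window, sensor, mac in records:
        building = sensor_building[sensor]
        # A set absorbs repeated sightings of the same MAC address by
        # sensors of the same building (e.g., overlapping Wi-Fi cells).
        seen[(building, window)].add(mac.lower())
    return {key: len(macs) for key, macs in seen.items()}
```

This only removes duplicates of the same MAC address; a user carrying several devices would still be counted once per device unless the addresses are correlated as discussed above.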

Regarding privacy, several proposals to protect Wi-Fi communications by means of MAC address anonymization have arisen in recent years. First, these proposals appeared as apps for smartphones (allowing a kind of MAC spoofing), but recently smartphone manufacturers have started including these techniques in the latest versions of their Operating Systems (OS) (e.g., iOS, Android, and Windows). Such MAC address anonymization techniques aim to avoid using the actual MAC address until the device gets connected to the Wi-Fi network (i.e., they use a fake MAC address in their probe frames). The specific solution for this problem depends on the manufacturer and OS. In the case of iOS, the solution involves sending locally administered MAC addresses in the probe frames, randomly selecting the three least significant bytes of the MAC address. This can be easily detected just by inspecting the first byte of the MAC address. In the case of Android, some manufacturers have decided to use random MAC addresses in the probe frames from the MAC address ranges assigned by the IEEE to them. Nevertheless, even with these techniques in place, it is possible to end up obtaining the actual device information [51, 52]. In addition, MAC randomization is not actually a relevant problem in the scenario considered in this paper, since most of the devices are connected to the available Wi-Fi networks (e.g., Eduroam) and, to connect to a Wi-Fi network, devices must use their actual MAC addresses.
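
The first-byte check mentioned above relies on the IEEE 802 convention that bit 0x02 of the first octet marks an address as locally administered. A minimal sketch (the function name is illustrative; colon- or hyphen-separated addresses are assumed):

```python
def is_locally_administered(mac):
    """Return True if the MAC address has the locally administered bit set."""
    first_octet = int(mac.replace("-", ":").split(":")[0], 16)
    # Bit 0x02 of the first octet is the universal/local (U/L) bit.
    return bool(first_octet & 0x02)
```

Such a check would not flag the random addresses drawn from manufacturer-assigned ranges described for some Android devices, since those keep the bit cleared.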

Wi-Fi tracking systems can be classified into two main groups: those that use the enterprise Wireless Local Area Network (WLAN) itself and those that use a dedicated low-cost passive Wi-Fi infrastructure, which is indeed the case of the actual deployment considered in this paper. One of the main drawbacks of the systems that use the enterprise WLAN is that they can only track the users of such networks, whereas independent dedicated low-cost passive Wi-Fi infrastructures allow tracking the users connected to different Wi-Fi networks, if they spatially coexist, or even those not connected to any, if MAC randomization is not used.

As examples of works that use the enterprise WLAN or the existing infrastructure of access points (APs), [42] presents the 2016 Open Day of the European Joint Research Center (JRC), where 8000 people participated within an area of about 167 hectares, as a case study where the Wi-Fi infrastructure of the event was used to estimate the number of participants and their space and time evolution based on properly processed MAC addresses. Reference [47] presents MobiCamp, a large-scale testbed composed of around 2700 APs to study mobility-related activities, which combines user mobility traces based on Simple Network Management Protocol (SNMP) data with enriched data (e.g., gender, age) provided by an opt-in smartphone application.

Reference [53] represents yet another example of the analysis that can be made with this kind of information. Its scenario is a hospital and, by checking the number of hours and the positions of a user over time, the authors can classify that user according to a role, e.g., patient, doctor, administrative, etc. Reference [38] identifies flocks walking in a building and their behaviors, applying clustering techniques to the signal strength measurements provided by the institutional WLAN.

Reference [33] presents a campus scenario where, by using the university network infrastructure, a detailed profile of the user's activity can be obtained. The users tracked are those logged into the university network, which provides extra information about the user, such as their role, gender, etc. Combining that information with a detailed map, which contains thousands of Points of Interest (POIs), the authors can extract an activity log that shows the different user's activities with a minute precision. The main drawback of this work is that it totally depends on the users' profiles database and the POI map, and both are resources that are complicated to gain access to or to create. In reference [45], employing the university network infrastructure and the location of each AP in the university campus, each sensor record only stores the closest AP. With this simple information, the authors can create an activity profile similar to the one shown in [33]. By analyzing the basic results obtained, they are able to extract new information (e.g., a count of irregular hours) or detect patterns of anomalous events (e.g., periods of exams or holidays).

As a last example of systems that use the institutional WLAN, reference [40] presents a web application in which the occupation of different rooms on a campus is shown in real time. The number of people in the room is calculated using the number of Wi-Fi devices detected by the APs. Using the signal strength measurement of each of them, a heatmap is drawn that shows the user distribution in the room. The collected data are analyzed offline to make reports of utilization of the different rooms and to obtain conclusions from the detected patterns.

On the other hand, [37, 48, 49, 54] represent some examples of works which use independent dedicated low-cost passive Wi-Fi infrastructures, as is the case of the deployment considered in this paper. In the case of [48], a network of devices called MOBYWIT, based on a Raspberry Pi and two wireless USB dongles, is used to track the movement of people and vehicles, sniffing not only Wi-Fi but also Bluetooth signals emitted by smartphones and vehicle hands-free calling systems. In the case of [49], a passive Wi-Fi infrastructure based on low-cost devices that combine a Raspberry Pi and a TP-LINK Wi-Fi dongle is used to provide user localization, user profiling, and device classification based on the properly processed MAC addresses captured from the IEEE 802.11 probe request frames. Reference [37] also uses this approach to count people in a concert, and in reference [54] the considered scenario is a whole island (Madeira, Portugal), where the records gathered all over it are analyzed to classify users as tourists or locals, as well as to identify touristic spots.

One of the main features that make the work presented in this paper stand out compared to previous work is that one-year data from an actual Wi-Fi tracking system deployed in a real-life environment is analyzed. Most previous works consider hours, a few days (e.g., weekdays), or weeks. Only the work presented in [54] covers a similar period of time (being even larger), but the analysis is much broader, being far away from the level of detail provided in this paper. The considered period of time allows analyzing seasonality effects and other patterns that, although they may be seen as common knowledge, do bring value since they represent numerical evidence that supports decision making (e.g., someone may think that the Wi-Fi access in a given area does not work properly because it is always overcrowded, but numerical evidence is needed to appropriately justify the investment of increasing the number of APs of the corporate WLAN in that given area to improve the service). In addition, such well-known patterns, when obtained automatically by processing the available data, become models which can be used to detect anomalies or atypical situations, as is common practice in unsupervised machine learning. It is also worth mentioning the use of clustering to improve the data analysis and interpretation (as in previous works, e.g., [38]), as well as the application of the semantic trajectory concept [55], which combines positioning data with an external source of information to classify the different positions according to the activity carried out in the area (e.g., users positioned in the cantina will be eating) and which has not been extensively explored in previous works (e.g., [56]).

Figure 3: (a) Raspberry Pi; (b) TP-LINK USB Wi-Fi dongle; (c) developed Wi-Fi listening device.

Another strong point of the work presented in this paper compared to previous work is the validation of the Wi-Fi tracking technique to estimate the occupation of the library of the Telecommunications School and the comparison with a video camera system. Only a few previous works, such as [40] or [49], perform a similar validation (in [40], the number of people detected in the room is compared with the attendance list of the seminar taking place there, and in [49], students are requested to turn on the Wi-Fi interfaces of their devices and provide the MAC addresses of their smartphones and laptops through an anonymous web form to serve as ground truth for device classification). However, again, the period considered in this paper is much larger and it is proved that the Wi-Fi tracking system outperforms the more expensive video camera system. As a matter of fact, the Wi-Fi tracking system is actually used on a day-to-day basis by the library staff, which illustrates the value that this kind of IoT infrastructure can bring to real-life problems and services.

3. Data Acquisition Infrastructure

Figure 3 shows the Wi-Fi listening device/sensor developed for the people flow monitoring sensor network of the Smart CEI Moncloa. It is based on very common and cheap hardware, namely, a Raspberry Pi board [64], one of the most widely used hardware platforms for IoT, and a TP-LINK USB Wi-Fi dongle [65] configured in monitor mode. As a result, the cost of this solution is in the order of tens of euros (around 80 € when manufactured in 2016), which represents a remarkable cost reduction compared to other solutions available in the market.

As Figure 4 illustrates, broadly speaking, these Wi-Fi sensors scan each of the Wi-Fi channels from both the 2.4 GHz and the 5 GHz bands during a configurable amount of time (currently 250 ms), read the header of the IEEE 802.11 radio packets (e.g., data packets or probe requests) in their region of coverage, and record the sender MAC addresses. As these MAC addresses are unique per device, counting them is a good indicator of the number of devices available in the surroundings of the Wi-Fi sensors (although there may be issues related to the fact that a single user can carry several devices, as already pointed out in Section 2), and they allow for temporal correlation analysis, thus obtaining useful information such as stay time, availability patterns, etc.

Regarding how this information is carried all the way up to the backend server and processed there, Figure 5 shows an overview of the communications architecture and protocol stack of the people flow monitoring service of the Smart CEI Moncloa.

As can be seen, the Wi-Fi sensors are directly connected to the backend via the Ethernet network of the UPM. Communications are protected end-to-end by the use of Transport Layer Security (TLS) on top of Transmission Control Protocol (TCP)/Internet Protocol (IP). Measurements are periodically sent using Message Queue Telemetry Transport (MQTT) [66]. The publish/subscribe mechanism provided by MQTT allows the Wi-Fi sensors not only to send measurements (i.e., events) periodically but also to receive commands (e.g., to reboot them or to perform a remote firmware update).

Taking advantage of the hierarchical structure of the MQTT topics, all the publish events follow the structure SERVICE/ID/EVENT(/TIMESTAMP). Thus, the publish events from the Wi-Fi sensors start with "Wi-Fi", followed by the MAC address of the sensor's Ethernet interface, which is used as unique ID. The format of the content published under the different topics is Comma Separated Values (CSV), which is a lightweight solution, especially appropriate when the data structure is fixed, since the meaning of each field of the subsequent lines is explained only in the header at the beginning of the file.

Privacy issues have also been considered carefully: the developed Wi-Fi sensors apply an irreversible MD5 hash function with salt to the MAC address, which prevents brute-force attacks with precomputed tables. In addition, as Figure 5 shows, once anonymized, the data are carried securely up to the platform servers, where they are handled in an aggregate manner instead of individually.
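As an illustration of the anonymization step, the following minimal Python sketch applies a salted MD5 hash to a MAC address using the standard hashlib module. The function name, the normalization of the address, and the way the salt is generated and combined are assumptions made for the example; the paper does not detail them. The sketch also assumes that all sensors share the same salt, since otherwise the same device could not be correlated across sensors.

```python
import hashlib
import os


def anonymize_mac(mac: str, salt: bytes) -> str:
    """Return the salted MD5 digest (hex) of a normalized MAC address."""
    # Normalize so that 'AA-BB-...' and 'aa:bb:...' map to the same digest.
    normalized = mac.strip().lower().replace("-", ":")
    return hashlib.md5(salt + normalized.encode("ascii")).hexdigest()


if __name__ == "__main__":
    # Hypothetical secret salt; in a real deployment it would be provisioned
    # once and kept identical on every sensor.
    salt = os.urandom(16)
    print(anonymize_mac("3C:5A:B4:00:11:22", salt))
```

MD5 is used here only because the text names it; a keyed construction such as HMAC with a stronger hash would be a natural alternative where the threat model requires it.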

Furthermore, the software of the developed sensors has been modified in order to prevent the MAC anonymization mechanisms presented in Section 2 from affecting the obtained measurements. Thus, the Wi-Fi frames with locally administered MAC addresses or including special MAC address ranges are discarded, so these devices are not considered. Anyway, as already mentioned in Section 2, MAC randomization is not actually a major issue in our case, since most of the users are connected to the Eduroam free Wi-Fi access, so their smartphones end up using their actual MAC address.

Figure 4: Sketch of how the developed Wi-Fi listening devices obtain the MAC addresses of the users surrounding them. (Diagram labels: users 1 and 3 send data frames on channels 1 and 3; users 2 and 4 send probe requests on channels N and M; access points AP1 to AP4 operate on channels 1 to 4; the developed Wi-Fi listening device listens "x" ms on each channel.)

Figure 5: Communications architecture and protocol stack for the people flow monitoring service of the Smart CEI Moncloa. (Diagram labels: the Wi-Fi sensors and the backend server share the stack MQTT/TLS/TCP/IP/802.3; the backend server comprises an MQTT broker, real-time and batch processing, a REST API server, a CSV file-based temporal DB, and a dashboard web server.)

After gathering the CSV files for a large time span, we moved to an offline analysis platform on a private cluster. The Apache Spark 2.2 software platform was selected as the data processing tool due to its optimized capabilities to work with large amounts of structured data. Gathering data over such a time span leads to large datasets, which common centralized systems will struggle to process but distributed systems such as Spark can handle easily.

The private cluster mentioned is composed of eight HP ProLiant SL250s Gen 8 machines with two Intel Xeon E5-2630 v2 2.6 GHz processors (6 cores each) and 32 GB of RAM. In addition to Spark, this platform also runs an Apache Hadoop Filesystem, where the dataset and the results are stored. This is a distributed filesystem which allows all the machines to access the stored data in parallel. The usage of such a platform is advisable in order to speed up the analysis but not mandatory, since the algorithms we are going to describe are available in many other software platforms (such as those provided with R or Python).

4. Data Analysis and Results

4.1. Data Processing. Although this work is not a proper data mining process, given that we are not using those kinds of algorithms and analyses, the necessary steps prior to the actual analysis are the same as in a KDD (Knowledge Discovery in Databases) process [67]. For our analysis, we took the aforementioned CSV format files, each one containing the data collected by a single sensor during a period of 15 minutes. The observation period used in our analysis is a full year, from January 2016 to December 2016. In a one-year time lapse, 9 sensors, each generating a file every fifteen minutes, would create around 315K files. This is a theoretical maximum, because a file is only created if the sensor is active and detects at least one device during the period. The sensors were deployed at the beginning of 2016, but the deployment was not totally finished until March. There were also some holidays during 2016 when ETSIT was fully closed. During those days, some of the sensors detected no devices (although there are always security staff somewhere in the covered area), so they did not generate a file. As a result, the number of files which we actually have for the analysis is 246K.

Figure 6: Histogram of users by number of registered hours (horizontal axis: hours; vertical axis: number of users, logarithmic scale): (a) raw analysis; (b) first threshold applied.
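The theoretical maximum of around 315K files quoted in Section 4.1 follows from simple arithmetic, sketched below in Python. The variable names are illustrative, and 246,000 is the rounded figure given in the text rather than an exact count.

```python
SENSORS = 9
FILES_PER_HOUR = 60 // 15      # one file per 15-minute period
HOURS_PER_DAY = 24
DAYS = 365                     # with 366 days (2016 was a leap year) the result is 316,224

max_files = SENSORS * FILES_PER_HOUR * HOURS_PER_DAY * DAYS
actual_files = 246_000         # rounded value reported in the text
coverage = actual_files / max_files

print(max_files)               # 315360
print(f"Share of theoretical maximum: {coverage:.0%}")
```

Under these assumptions, the files actually available amount to a little under four fifths of the theoretical maximum, which is consistent with the incomplete deployment until March and the days on which some sensors detected no devices.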

The first step in preprocessing these data was to merge all the files into a single one, resulting in a 2 GB CSV plain text file, where each row represents the observation of a device during the associated time period, including, among others, the anonymized MAC address, the sensor name, and the detection of the device for a given minute in the observation timespan. With this transformation, the resulting dataset contains 63 million rows, each one representing the detection of a single device in a single minute by a single sensor.

A total of 128K unique devices were detected during the whole year. It is known that the number of people that regularly go to ETSIT is around 4K, so not all of the devices detected can be considered actual users of interest; consequently, the data needed to be filtered. Only the devices of people that perform activities related to the place, such as students, professors, etc., should be taken into account. Thus, it is necessary to filter out devices, keeping only the ones that can be labelled as users. We apply filters based on the observations of each device.

So, for each device, we count the number of minutes it was detected in the entire dataset. A device can be seen by more than one sensor during the same minute, so, to generate this measure, the repeated minutes are counted only once per device. Grouping the resulting counts, a histogram (Figure 6) is obtained, on which it is possible to make a classification of the devices based on the total time recorded during the whole year.

To facilitate the analysis of the chart, the horizontal axis has been expressed in hours and the count in the vertical axis is shown in logarithmic scale. Around 95% of the devices were seen for less than 48 hours during the whole year, i.e., on average less than four hours per month. This group is mainly composed of people passing near the school buildings, momentarily entering the coverage area of the sensors without accessing ETSIT. Figure 6(b) shows the histogram applying a lower threshold of 48 hours and an upper threshold of 3650 hours, an average of 10 hours per day. In this chart, some isolated peaks can be observed in the tail of the graph, starting around 1000 hours on the horizontal axis. A detailed analysis of these peaks revealed that they were devices that remained connected continuously for several days, like servers. So we applied the label "user" only to those devices that registered a number of hours during the year between these two thresholds. As shown in Table 3, from the 128K detected devices, only 4653 were classified as users, over which we will perform the rest of the analysis.
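The filtering described above (distinct minutes per device, then a lower and an upper threshold) can be sketched in plain Python as follows. This is an illustrative single-machine version, not the Spark job used in the paper; the function name and the row layout are hypothetical, and whether the thresholds are inclusive is an assumption, since the text does not say.

```python
from collections import defaultdict

LOWER_HOURS = 48      # below this, a device is treated as a passer-by
UPPER_HOURS = 3650    # above this, a device is treated as static equipment


def label_users(rows, lower_hours=LOWER_HOURS, upper_hours=UPPER_HOURS):
    """Return the set of devices whose total detected time lies within the thresholds.

    rows: iterable of (device_id, minute_timestamp, sensor_name) tuples.
    """
    minutes_seen = defaultdict(set)
    for device, minute, _sensor in rows:
        # A set keeps each minute once, even if several sensors saw the device.
        minutes_seen[device].add(minute)

    users = set()
    for device, minutes in minutes_seen.items():
        hours = len(minutes) / 60
        if lower_hours <= hours <= upper_hours:   # inclusive bounds are an assumption
            users.add(device)
    return users
```

With the thresholds of the paper, this kind of filter reduces the 128K detected devices to the 4653 labelled as users.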

Table 3: Dataset statistics before and after preprocessing.

                        Rows              Devices
Before preprocessing    63427489          128188
After preprocessing     41294344 (65%)    4653 (3.6%)

The last step before the proper analyses is to merge this dataset with the information about the position and name of the sensors. Additionally, only the data columns with relevant information for the analyses are kept, in order to reduce the in-memory cost:

(1) timestamp: the detection time, measured in minutes.

(2) user: the (anonymized) MAC address of the detected device.

(3) sensor: the MAC address of the sensor which made the detection.

(4) sensorName: the name of the sensor which made the detection.

(5) accessPoint: the MAC address of the SSID to which the device is connected, if any.

(6) sensorLat: latitude of the sensor's position.

(7) sensorLng: longitude of the sensor's position.

Figure 7: Daily analysis aggregated during 2016 (horizontal axis: date, January to December): (a) accumulated minutes (×1000); (b) unique users registered (×100); (c) ratio between minutes and users (minutes/user).

4.2. Temporal Analysis. Once the data was ready, we performed a temporal study aiming to know whether the captured data allowed for the identification of significant periods of activity or trends in user behavior in ETSIT. As one year is too long for a minute-level analysis, we aggregated our data by days and by hours.

The charts in Figures 7 to 10 are organized in groups of two. The first one, the accumulated time chart, shows the number of activity minutes registered by each sensor during a time slice. The second chart presents the number of unique users seen by each sensor during the time slice. In addition, Figures 7 and 8 include a third chart with the ratio of accumulated time over unique users; this provides a hint on how stationary users are. In the ratio chart, peaks represent moments when users are still (e.g., students in class) and dips are associated with transition times (e.g., students arriving at the school).

First, Figure 7 shows an overview of these metrics throughout the year. In this figure, the measurements of the 9 sensors are aggregated into a single line. In the case of Figure 7(a), the result is not exactly equal to the sum of the activity minutes each sensor accumulates, since a user can be detected in the same minute by different sensors and these occasions are represented as a single instant in this line.
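The three daily metrics of Figure 7 (accumulated minutes, unique users, and their ratio), aggregated over all sensors, can be sketched as follows. This is a minimal plain-Python illustration under an assumed row layout of (user, timestamp, sensor); the function name is hypothetical and the paper's actual implementation runs on Spark.

```python
from collections import defaultdict
from datetime import datetime


def daily_metrics(rows):
    """Aggregate detections per day across all sensors.

    rows: iterable of (user, timestamp: datetime, sensor) tuples.
    Returns {date: (accumulated_minutes, unique_users, minutes_per_user)}.
    """
    user_minutes = defaultdict(set)   # date -> {(user, minute)}
    users = defaultdict(set)          # date -> {user}
    for user, ts, _sensor in rows:
        minute = ts.replace(second=0, microsecond=0)
        # The same user seen by several sensors in one minute counts once.
        user_minutes[ts.date()].add((user, minute))
        users[ts.date()].add(user)

    return {
        day: (
            len(user_minutes[day]),
            len(users[day]),
            len(user_minutes[day]) / len(users[day]),
        )
        for day in sorted(users)
    }
```

Keeping the sensor in the grouping key instead of discarding it would give the per-sensor series of Figure 8.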

Figure 8: Daily analysis during 2016, by sensor (Entr A, Entr A Sec, Entr B, Entr B Lab, Entr B Sec, Entr C, Entr D, Library, Std Tables; horizontal axis: date, January to December): (a) minutes accumulated by each sensor (×1000); (b) unique users registered by each sensor (×100); (c) ratio between minutes and users by each sensor (minutes/user).

The most obvious observation that can be drawn from Figure 7 is the presence of a strange behavior during the months of January and February. The reason is that, as previously discussed, the sensor network began to be deployed in January 2016 and was not completely operative until March. We keep these data in the analysis because they can be used to compare an anomalous situation with a regular one; also, it is much more intuitive to present a whole year range instead of nine months. Nevertheless, data collected in those months are not used to draw conclusions.

Holiday periods are clearly shown in the unique users chart when the line falls, for example, in March (Easter), summer holidays, or some isolated holidays. These periods can also be seen in the ratio chart, which increases because on those dates many fewer users attend the school, but those who do usually spend long periods of time in the library. A remarkable point is October 12th, on which the ratio chart reaches its highest peak and the number of users is almost zero. That day even the library was closed, so only the security staff were in the school. We can also observe the effect of weekends on each chart: both the number of users and the accumulated minutes decrease, but the ratio increases for the same reason given for the holidays, i.e., users study at the library during the weekends. Finally, between the months of May and June, a decreasing trend can be observed both in the unique users and in the accumulated minutes, going up in mid-June for the examination period.

Figure 8 shows a second set of graphs that correspond to a daily analysis of the whole year, representing each sensor. One of the first conclusions that can be drawn is the difference between the proportions in the unique users chart and in the accumulated minutes chart. The difference between the numbers of registered users per sensor is not as remarkable as the difference between the numbers of accumulated minutes. Again, this is due to the fact that users spend much longer periods of time in the library than in other areas. Observing the line of the sensor of building C in the accumulated minutes chart, it can be seen that the magnitude is maintained throughout the year. This sensor covers a building where there are professors' offices and research laboratories, i.e., this sensor registers mostly users who work at the school, and they maintain a more regular schedule throughout the year than students, who go to classes or to the library.

Figure 9: Calendar view (rows: days of the week, Sunday to Saturday; columns: January to December): (a) unique users registered by sensor Entr A Sec; (b) minutes accumulated by sensor Entr A Sec; (c) unique users registered by sensor Entr C; (d) minutes accumulated by sensor Entr C.

Figure 10: Hourly analysis during the first week of November (horizontal axis: date, Mon 05 to Sun 11): (a) minutes accumulated by each sensor (×1000); (b) unique users registered by each sensor (×100).

Table 4: Sensor records collisions.

Collisions    Count       Total       Percent
9             3           27          0.00
8             45          360         0.00
7             421         2947        0.01
6             5371        32226       0.08
5             59222       296110      0.72
4             560994      2243976     5.46
3             2525718     7577154     18.42
2             7268825     14537650    35.35
1             16433862    16433862    39.96

The line charts can be difficult to interpret for periods of time as long as a whole year. To ease the visual analysis, a new calendar visualization is offered in which the same data can be studied in a more intuitive way. Figure 9 presents the full year view in calendar format, with each row representing one day of the week and the months delimited in black. The color intensity of each cell is proportional to the data it represents. Figure 9 is formed by 4 subfigures, which correspond to the count of users and the number of minutes registered by the sensors "Entr A Sec" and "Entr C". These visualizations are relative and can be used to obtain general conclusions. For a detailed study, both visualizations with absolute measurements and a data examination are still necessary.
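The layout behind such a calendar view can be sketched in a few lines of Python: each day is mapped to a row (day of the week, Sunday first, as in Figure 9) and a column (week index within the year), and each value is scaled relative to the maximum to obtain a color intensity. The function names and the linear scaling are assumptions for illustration; the paper does not describe how the figure was produced.

```python
from datetime import date


def calendar_cell(day: date, year_start: date) -> tuple:
    """Return (row, column): row 0 is Sunday, row 6 is Saturday; column is the week index."""
    row = (day.weekday() + 1) % 7                 # Python uses Monday=0
    offset = (year_start.weekday() + 1) % 7       # row of the first day of the year
    column = ((day - year_start).days + offset) // 7
    return row, column


def calendar_grid(daily_values: dict, year_start: date) -> dict:
    """Map {date: value} to {(row, column): intensity in [0, 1]}."""
    peak = max(daily_values.values(), default=0)
    return {
        calendar_cell(day, year_start): (value / peak if peak else 0.0)
        for day, value in daily_values.items()
    }
```

Because intensities are relative to the maximum of each series, two calendars built this way are not directly comparable in absolute terms, which matches the caveat above about relative visualizations.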

Figure 9(a) represents the number of users detected by the sensor "Entr A Sec", which covers the classrooms of this building, and Figure 9(b) the number of minutes accumulated by the same sensor. It is clear that during the first two months of the year this sensor was not operational. On Saturdays and Sundays (last and first rows), this sensor registers almost no activity, and holiday periods are clearly identified, such as Easter in March or the summer holidays in July and August. But this visualization reveals other information that is more difficult to observe in a line graph, such as the fact that Mondays are the days on which users spend the least time in this area. Some blank cells also stand out in the last months of the year, revealing that the sensor did not work during those days.

Figures 9(c) and 9(d) represent the number of users detected and the minutes accumulated by the "Entr C" sensor. As mentioned in the description of Figure 8, this sensor shows very stable measurements because it covers the research laboratories, and the workers' schedules are not affected by the school calendar, as is the case of the activity seen by other sensors like "Entr A Sec".

Figure 10 presents a different view, plotted at the hour level during a week in November. This eases the identification of activity hours, which span from 8 in the morning to 22 in the evening. Another noticeable effect is the valley in the middle of the day, corresponding to the lunch break, when users move to the cantina (see Figure 2) or leave the buildings. Finally, it is observed that the users leave the school gradually during the afternoon.

4.3. Position Analysis. Before getting into the details and insights obtained from the analysis of the one-year gathered data from the spatial perspective, it is worth mentioning that a set of tests was carried out in a controlled environment during the first stage of the deployment in order to check that the Wi-Fi tracking system worked properly. These tests included tracking a well-known group of MAC addresses throughout the Wi-Fi tracking sensors, checking that they appeared in the appropriate ones. It was also checked that the system correctly located and tracked the security staff throughout their night security tours.

A spatial analysis provides insight on how the users are distributed throughout the buildings during different times of the year. To reach these conclusions, we have improved the method presented in [68]. The first improvement is related to the event in which a user is detected by two or more sensors during the same minute. From now on, this event will be identified as a collision. The number of collisions is significant in the case of some sensors that are close to each other (e.g., library and building D) or sensors that cover transition areas (e.g., Entr A or Std Tables).

Table 4 shows a study of the number of collisions. The order of a collision is the number of sensors that collide for the same minute, and the count is the number of rows in which a collision of that order occurs. Collisions of order 2 and 3 group more than 50% of the data. The solution to this situation was to eliminate these collisions by replacing, for this experiment, all the rows corresponding to a collision by a single row whose latitude and longitude are the centroid of the positions of the sensors participating in the collision.
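The collision handling described above can be sketched as follows: detections of the same user in the same minute are grouped, and each group is replaced by a single position at the centroid of the participating sensors. The sketch is illustrative plain Python with a hypothetical row layout; it takes the centroid as the arithmetic mean of latitudes and longitudes, which is an assumption (reasonable at building scale) rather than something the text specifies.

```python
from collections import defaultdict


def resolve_collisions(rows):
    """Collapse same-user, same-minute detections into one centroid position.

    rows: iterable of (user, minute, sensor_lat, sensor_lng), one per detection.
    Returns {(user, minute): (lat, lng, order)}, where order is the number
    of sensors that detected the user in that minute.
    """
    groups = defaultdict(list)
    for user, minute, lat, lng in rows:
        groups[(user, minute)].append((lat, lng))

    resolved = {}
    for key, positions in groups.items():
        order = len(positions)
        lat = sum(p[0] for p in positions) / order
        lng = sum(p[1] for p in positions) / order
        resolved[key] = (lat, lng, order)
    return resolved
```

Counting the resulting `order` values would reproduce a breakdown by collision order like the one in Table 4.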

The second improvement consists in the incorporation of external information with the approximate position of the access points to which the users are connected. This information has been obtained from the Mylnikov Geo API [69], getting the position of all the ESSIDs registered throughout the year. In the cases where the user is connected and the approximate position of the access point is available, this new position is used instead of the position of the sensor that generates the row. This allows measuring the user's position with more accuracy. In addition, it smooths the effect of using only the positions of the sensors, which causes the resulting possible positions to always lie inside the convex hull of the polygon formed by the positions of the sensors.

Figure 11: Monthly centroids heatmaps: (a) June; (b) August; (c) September.

This experiment is based on the user's centroid concept. This centroid is the average position of a certain user over a certain period of time. Representing these points on the map reveals hints of the users' behaviors over the studied time slice, based on the number of people in each area.
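A user's centroid over a period can be computed as the mean of that user's positions, one per detected minute. The following plain-Python sketch assumes positions are already resolved (one row per user and minute) and uses hypothetical names; it is not the paper's implementation.

```python
from collections import defaultdict


def user_centroids(positions):
    """Average position of each user over the period covered by the input.

    positions: iterable of (user, lat, lng), one per detected minute.
    Returns {user: (mean_lat, mean_lng)}.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0])   # user -> [sum_lat, sum_lng, count]
    for user, lat, lng in positions:
        acc = sums[user]
        acc[0] += lat
        acc[1] += lng
        acc[2] += 1
    return {user: (s[0] / s[2], s[1] / s[2]) for user, s in sums.items()}
```

Because every minute contributes equally, a user who spends most of the period in one place has a centroid close to that place; restricting the input to a month or to an hour yields the monthly and hourly views discussed in this section.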

Figure 11 shows heatmaps of three different months. Figure 11(a) is June, a month belonging to the second semester of the academic year and the month when the final exams take place. Figure 11(b) is August, during the summer holidays, when there are no classes or exams and most of the professors, researchers, and staff are on vacation. Figure 11(c) represents September, the start of the semester.

Figure 11(b) confirms one of the facts extracted in the temporal analysis: on holidays, the number of users falls and most of the users present in the school keep studying at the library (the warmest point is located over the library and is much smaller than in other months).

Both Figures 11(a) and 11(c) show that building A is the area with the highest concentration of users. Even so, there are differences between the two figures. In Figure 11(a), the warmest area in the map is over the library. This is explained by the final exam period of June. The same observation can be repeated in other periods of time to discover when the students have exams. In Figure 11(c), the hot spots are on the classrooms. September is the beginning of the academic year, when attendance to classes is at its highest. Thus, this observation is an indicator of the level of students' attendance to class.

The same type of analysis can be done with shorter time frames to obtain more detailed behaviors. Figure 12 shows the centroids heatmaps grouped by hours on September 2nd.

A detailed analysis by hours like this allows observing the users' movement throughout the day. The first row of maps, Figures 12(a), 12(b), 12(c), and 12(d), shows the evolution of the users' centroids at lunch time, between 12:00 and 16:00. In this transition, it can be seen that there are hot spots on the classes and the library at the beginning. On the next map, those centroids move to the cantina (see Figure 2), and in the last map they return to their original positions. The second row of maps in Figure 12 presents the start and the end of the activity time in the school. In Figures 12(e) and 12(f), it can be observed how the first users of the day go directly to classes. By contrast, Figures 12(g) and 12(h) reveal that users tend to be at the library at the end of the day.

Finally, Table 5 presents the count of the different users detected by each sensor throughout the year. Recalling that the total number of users obtained in Section 4.1 is 4653, the data in the table reveal that, over the year, most users have been seen by each sensor at some time. The two exceptions are the sensor of building C and the sensor of the laboratories of building B. These sensors cover the professors' offices and research laboratories, so it is unusual for students to stay in those areas.

Table 5: Sensor annual statistics.

Sensor        Number of rows   Number of users
Library       14166793         4674
Entr D        6818165          4621
Std Tables    5625974          4670
Entr A        4736237          4620
Entr A Sec    2352019          4579
Entr B Sec    2258618          4530
Entr B Lab    2229560          3630
Entr C        1469026          3093
Entr B        1248462          4127

Figure 12: Hourly centroids heatmaps of 2nd September: (a) 12:00 to 13:00; (b) 13:00 to 14:00; (c) 14:00 to 15:00; (d) 15:00 to 16:00; (e) 8:00 to 9:00; (f) 9:00 to 10:00; (g) 19:00 to 20:00; (h) 20:00 to 21:00.

4.4. Behavior Analysis. The third set of experiments we performed deals with the behaviors that each user follows throughout a single day. To obtain them, we grouped the data using a user-day key. For each key, a vector of 24 positions (one per hour) is created. In each position of this vector, we record the sensor that has detected this user most of the time during that hour. This vector therefore represents the route that the user followed throughout that day, hour by hour. Once the behavior vectors are obtained, the information of the day and the user is discarded to make a count of the most repeated behavior vectors. The dimensionality of these vectors makes the number of possible behaviors huge, theoretically $24^{10} \approx 6 \times 10^{13}$, or $14^{10} \approx 289$K millions using only the activity hours, but it is known that users behave similarly, so a much lower number of behaviors can be expected, even though it will still be a high number.
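The grouping and counting steps described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `(user, day, hour, sensor)` record layout, the `EMPTY` marker, the function names, and the sample data are all assumptions made for the example, and "detected most of the time" is approximated by the number of detections in the hour.

```python
from collections import Counter, defaultdict

HOURS = 24
EMPTY = "-"  # hypothetical marker for hours with no detection


def behavior_vectors(records):
    """Build one 24-position vector per (user, day) key.

    `records` is an iterable of (user, day, hour, sensor) detections;
    this record layout is an assumption made for the example.
    """
    per_key = defaultdict(lambda: [Counter() for _ in range(HOURS)])
    for user, day, hour, sensor in records:
        per_key[(user, day)][hour][sensor] += 1
    # For each hour keep the sensor with the most detections, if any.
    return {
        key: tuple(c.most_common(1)[0][0] if c else EMPTY for c in hours)
        for key, hours in per_key.items()
    }


def most_repeated(vectors, n):
    """Discard user and day, then count identical behavior vectors."""
    return Counter(vectors.values()).most_common(n)


# Invented sample: two users seen near the library from 9:00 to 11:00.
records = [
    ("u1", "d1", 9, "Library"), ("u1", "d1", 9, "Library"),
    ("u1", "d1", 9, "Entr A"), ("u1", "d1", 10, "Library"),
    ("u2", "d1", 9, "Library"), ("u2", "d1", 10, "Library"),
]
vectors = behavior_vectors(records)
top_vector, top_count = most_repeated(vectors, 1)[0]
```

In this sample both users end up with the same vector, so the most repeated behavior has a count of two even though the underlying detections differ.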

Table 6 presents the 20 most repeated behaviors throughout the year. A total of 285K behaviors are detected, of which 139K are unique. The 500 most frequent behaviors account for 25% of the total behaviors. Table 6 shows only the part corresponding to the activity hours of the school, which, as observed in the temporal analysis, run from 8:00 am to 10:00 pm. As can be seen, the majority of the most repeated behaviors are periods of between two and five hours of stay in the library. It is necessary to expand the analysis to the top 20 to observe the class attendance behaviors. It is easy to appreciate that the different behaviors usually fall in either the morning or the afternoon, with lunchtime from 1:00 pm to 3:00 pm, which means that the majority of users go either in the morning or in the afternoon, but they do not spend all day at school.

Table 6: Top 20 most repeated behaviors. The columns are the 14 hourly slots of the activity period, from 8:00 to 9:00 up to 21:00 to 22:00; each row lists the sensor recorded in each occupied slot.

Rank  Sensors recorded in the occupied slots
1     Lib Lib Lib Lib Lib
2     Lib Lib Lib Lib
3     EntA
4     Lib Lib Lib
5     Lib Lib Lib Lib
6     Lib
7     EntA
8     Lib Lib Lib Lib Lib Lib
9     Std Std Std Std Std Std Std Std Std Std Std Std Std Std
10    EntA
11    EntA
12    Lib Lib Lib Lib Lib
13    Lib Lib
14    Lib Lib
15    EntA
16    EntA EntA
17    EntA
18    Lib Lib
19    SecA SecA SecA SecA
20    EntA

Lib = Library; EntA = Entr A; SecA = Entr A Sec; Std = Std Tables.

Another quite obvious observation is that many of the behaviors obtained are very similar to one another (e.g., going to the library from 9:00 am to 2:00 pm or going to the library from 9:00 am to 1:00 pm). In order to lower this redundancy, we performed a clustering procedure on them. Each behavior can be understood as a categorical vector of length 24 in which the categories correspond to the 9 possible sensors of the data set plus the empty category. The chosen clustering algorithm is Proximus [70], due to its simplicity, efficiency, scalability, and the reliability of its results. The algorithm works with binary vectors and creates clusters based on the Hamming distance (the number of bits that differ between two binary vectors). A vector is chosen as the center of the cluster, and other vectors whose distance to the center is smaller than the maximum cluster radius are added to that cluster.
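The center-and-radius idea can be illustrated with a short sketch. Note that this is not Proximus itself: it is a simplified greedy grouping that only reproduces the two ingredients named above (Hamming distance and a maximum cluster radius). The function names, the way centers are picked (the first vector not yet assigned), and the sample vectors are assumptions made for the example.

```python
def hamming(u, v):
    """Number of positions in which two equal-length binary vectors differ."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(a != b for a, b in zip(u, v))


def radius_clusters(vectors, max_radius):
    """Greedy grouping by Hamming distance to a cluster center.

    The first vector that fits no existing cluster becomes a new center;
    a vector joins the first cluster whose center is closer than max_radius.
    """
    clusters = []  # list of (center, members) pairs
    for vec in vectors:
        for center, members in clusters:
            if hamming(center, vec) < max_radius:
                members.append(vec)
                break
        else:
            clusters.append((vec, [vec]))
    return clusters


# Invented sample: two groups of near-identical binary vectors.
sample = [(1, 1, 0, 0), (1, 1, 1, 0), (0, 0, 1, 1), (0, 0, 0, 1)]
clusters = radius_clusters(sample, max_radius=2)
```

With a radius of 2, the four sample vectors collapse into two clusters of two members each, which is the kind of redundancy reduction sought for near-identical behaviors.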

The behaviors are expressed as categorical vectors, so it is necessary to transform them into binary vectors to be used in Proximus. The transformation shown in Equation (1) is proposed. This transformation is simple and also reversible, which allows for the recovery of the original behavior vectors after clustering.

$$
\begin{aligned}
B &= \{h_0, \ldots, h_{23}\}, \quad h_i \in \{0, a_1, \ldots, a_9\} \\
\vec{B} &= \{\delta_{0,a_1}, \ldots, \delta_{0,a_9}, \ldots, \delta_{23,a_1}, \ldots, \delta_{23,a_9}\} \\
\delta_{i,a_j} &= \begin{cases} 0 & \text{if } h_i \neq a_j \\ 1 & \text{if } h_i = a_j \end{cases}
\end{aligned}
\tag{1}
$$

where $a_1, \ldots, a_9$ are the nine sensors, $B$ is the behavior vector described above, and $\vec{B}$ is the transformed vector of length $24 \times 9$, made up of 24 subgroups of 9 samples, one subgroup per hour. Every sample in a subgroup is 0 except for the one at the index of the sensor that is active in that hour, which is set to 1.
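A minimal sketch of this transformation and of its inverse is given below, to make the reversibility concrete. The sensor labels `a1` to `a9`, the use of `0` for the empty category, and the function names are placeholders chosen for the example.

```python
SENSORS = tuple(f"a{j}" for j in range(1, 10))  # placeholder sensor labels
EMPTY = 0  # empty category: no sensor active in that hour


def encode(behavior):
    """Map a categorical behavior vector to a binary vector.

    Each hour contributes one subgroup of len(SENSORS) bits, with a 1
    only at the index of the active sensor (all zeros if the hour is empty).
    """
    bits = []
    for h in behavior:
        bits.extend(1 if h == a else 0 for a in SENSORS)
    return bits


def decode(bits):
    """Recover the categorical behavior vector from its binary form."""
    n = len(SENSORS)
    behavior = []
    for start in range(0, len(bits), n):
        group = bits[start:start + n]
        behavior.append(SENSORS[group.index(1)] if 1 in group else EMPTY)
    return behavior


# Invented sample: sensor a1 from 9:00 to 11:00 and sensor a4 at 13:00.
behavior = [EMPTY] * 24
behavior[9] = behavior[10] = "a1"
behavior[13] = "a4"
encoded = encode(behavior)
restored = decode(encoded)
```

The encoded vector has 216 positions with at most one bit set per hour, so two behaviors that differ in a single hour are at a Hamming distance of at most 2.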

Table 7 shows the results of applying the Proximus clustering to the behaviors found. The first column is the center of the cluster, the second is the number of distinct behavior vectors grouped in that cluster (cluster components), and the third is the number of recorded behaviors that fall within that cluster. The results are ordered by the number of behaviors included in the cluster. A total of 45707 clusters are obtained. The first 300 account for 50% of the behaviors recorded throughout the year. The most important clusters in Table 7 can be understood in this way:


Table 7: Behavior clusters.

No.  Cluster center                                        Cluster components   Behaviors count
1    Entr A Sec from 9:00 to 14:00                         415                  4963
2    Entr B Sec from 10:00 to 13:00                        458                  3756
3    Entr B Sec from 15:00 to 19:00                        471                  2924
4    Library from 9:00 to 13:00                            256                  2870
5    Library from 10:00 to 12:00                           176                  2396
6    Library from 18:00 to 21:00                           244                  2231
7    Entr A 13:00                                          787                  1979
8    Library 12:00                                         563                  1894
9    Library from 18:00 to 19:00 and from 20:00 to 21:00   175                  1853
10   Entr B Lab 11:00 to 20:00                             568                  1831

(i) Clusters 1, 2, and 3: students attending classes. These sensors cover the main classrooms in the school, and the intervals match the class schedule.

(ii) Clusters 4, 5, 6, and 9: students at the library.

(iii) Cluster 7: users who are usually outside of the sensors' coverage areas, arriving or leaving.

(iv) Cluster 10: laboratory equipment and professors. This sensor covers some of the professors' offices and laboratories, which usually have some laboratory equipment connected during work hours.

5. Validation of Occupancy Estimation Based on Wi-Fi Tracking

Although, as already pointed out in Section 4.3, it was tested at the very first stage of the deployment that the Wi-Fi sensors properly locate and track well-known MAC addresses, it was still necessary to validate the accuracy of the system for estimating occupancy (as it can be distorted by the aforementioned fact that a single user can carry several devices connected to Wi-Fi networks). As no ground-truth data were available for the full set of buildings, and only some data were found for the library, we centered our validation efforts on comparing our data with the available ones: if we can trust our results in that area, then we can extend our trust to the rest of the areas, for which no well-known data are available.

The library of ETSIT has 408 study sites, and it offers a web service to check the number of available seats at a given moment [71]. This system is based on two sources of information: a person who counts the empty seats every opening hour from Friday to Sunday, and a video camera located at the main entrance of the library that counts the number of people entering or leaving at 15-minute intervals. The human system provides a ground truth about the number of occupied seats, but this measure is very different from the actual number of people in the library, since a common situation, especially during examination terms, is that students place their study material at the seat to reserve it while they are not in the library. This situation is a problem for the library staff, and therefore they installed the video camera system to count the students' entrances and exits.

This system generates an estimation of the number of people in the library by adding to the previous measure the number of people detected entering and subtracting the number of people detected leaving. The system is not perfect, and in most cases it carries an accumulated error that grows over the day in the estimation of the number of people in the library. The total error can be calculated clearly at the end of the day, when the library closes and the number of people inside is supposed to be zero. In summer, there is a situation that aggravates this error and, consequently, the measure produced by this system: due to the rise in temperature, the back door of the library is opened to improve ventilation, which allows students to exit through it, although they must continue entering through the main door. The camera does not count students leaving through this back door.
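The running estimate and its end-of-day residual can be sketched as follows. This is only an illustration of the arithmetic described above: the function names and the per-interval counts are invented for the example, and `itertools.accumulate` with the `initial` argument requires Python 3.8 or later.

```python
from itertools import accumulate


def running_occupancy(entries, exits, start=0):
    """Estimated occupancy after each 15-minute interval.

    Each estimate is the previous one plus the people counted entering
    minus the people counted leaving during the interval.
    """
    if len(entries) != len(exits):
        raise ValueError("entries and exits must cover the same intervals")
    deltas = [e - x for e, x in zip(entries, exits)]
    return list(accumulate(deltas, initial=start))[1:]


def closing_error(entries, exits):
    """Residual estimate at closing time, when the true occupancy is zero."""
    estimates = running_occupancy(entries, exits)
    return estimates[-1] if estimates else 0


# Invented counts: 15 people enter, but only 13 exits are counted
# (for instance, because two people left through an unmonitored door).
entries = [10, 5, 0, 0]
exits = [0, 3, 6, 4]
estimates = running_occupancy(entries, exits)
residual = closing_error(entries, exits)
```

Because every missed exit stays in the running sum, the residual at closing time is a direct measure of the error accumulated during the day.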

The library staff provided us with the data collected by the two systems (human and camera) between June 5th and 30th. These measurements can be compared with those obtained by the Wi-Fi sensor installed at the library in order to validate them. For this test, all the data collected by the sensor are used, without filtering out the MACs of the sporadic users as explained before.

Figure 13 shows the data collected by the three systems on Sunday, June 5th. This is the first day with data from the three sources. Other days in which these three sources are present show the same trends. It is clear that there is a divergence between the human system observation and the rest of the data. The graph of the human system shows that the number of occupied seats increases in the first hours up to the maximum and remains steady until the end of the day, without being affected by the behavior of the users at lunchtime. However, this effect is reflected in the camera system and Wi-Fi tracking measurements. We observed that, in general, the number of people counted by the camera system is below the Wi-Fi-tracking system measure, although the proportion is maintained over time. Finally, the figure shows the cumulative error effect of the camera system, which at the end of the day still reports 65 people in the library.

Figure 13: Measure of library occupation using Wi-Fi, camera, and human systems (number of users, from 0 to 400, against hour, from 10:00 to 22:00).

Figure 14: Measures every 15 minutes of library occupation during June (hours 00 to 23): (a) Wi-Fi records; (b) camera records; (c) difference between Wi-Fi and camera.

To better study the relation between the camera system and the Wi-Fi-tracking system, we generated a detailed visualization that allows us to observe the data of the whole month in a single figure. Figure 14 is composed of 3 subfigures; each of them is a matrix of colored cells in which the rows represent a full day and the columns are divisions of 15 minutes. The color intensity is proportional to the number of users measured in each interval, being more intense when more users are detected. Figure 14(a) shows the measurements of the Wi-Fi sensor. The behavior is the one observed in previous sections of this document. Figure 14(b) shows the measurement of the camera system. During the first half of the day, the same trend as in Figure 14(a) can be observed, but in the afternoon and at night the accumulated error becomes noticeable. This error reaches a maximum of 378 people, with an average of 139 people at the end of the day, while the Wi-Fi-tracking system registers a maximum of 10 and an average of 7 at the end of the day (these are machines that are still turned on when the library is empty of people). Finally, Figure 14(c) shows the difference between the two previous ones, using green when the difference is positive (the Wi-Fi-tracking system registers a higher value than the camera system) and red otherwise. This figure validates the measurements obtained by the Wi-Fi-tracking system, since the difference with the measure of the camera system in the first half of the day is small (an average of 27 people, counting the data from 8:00 to 16:00), whereas in the afternoon the error accumulated by the camera system makes the average difference grow to 83 people between 16:00 and 22:00.
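The comparison by time window can be sketched as below. The day-by-slot matrix layout follows the description of Figure 14, but the function name and the sample values are invented, and taking the absolute value of the difference is an assumption: the text does not state whether the reported averages are signed.

```python
from statistics import mean

SLOTS_PER_HOUR = 4  # 15-minute divisions, as in Figure 14


def mean_abs_difference(wifi, camera, start_hour, end_hour):
    """Average |Wi-Fi - camera| over the slots in [start_hour, end_hour).

    `wifi` and `camera` are lists of days, each day being a list of
    96 occupancy values (one per 15-minute slot).
    """
    lo, hi = start_hour * SLOTS_PER_HOUR, end_hour * SLOTS_PER_HOUR
    diffs = [
        abs(w - c)
        for wifi_day, camera_day in zip(wifi, camera)
        for w, c in zip(wifi_day[lo:hi], camera_day[lo:hi])
    ]
    return mean(diffs)


# Invented single day: the camera drifts upward from 16:00 onward.
wifi = [[100] * 96]
camera = [[90] * 64 + [150] * 32]
morning = mean_abs_difference(wifi, camera, 8, 16)
afternoon = mean_abs_difference(wifi, camera, 16, 22)
```

Splitting the day at 16:00 separates the period in which both systems agree from the period dominated by the camera's accumulated error.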

The Wi-Fi sensor of the library was not placed for the specific purpose of counting the number of people in the library, and in consequence its coverage area can include devices that are outside the library. This explains why in some cases it registers a measurement greater than that of the camera system. In any case, the Wi-Fi sensor, with its limitations, registers a measure very similar to that of the camera-based people-counting system, while avoiding the cumulative error of that system and being much cheaper (tens of euros against thousands of euros) and less intrusive than a video camera. Currently, since the camera system is already installed, the Wi-Fi sensor measures can be used as a maximum, which would limit the error made by this system. In summary, the usage of a Wi-Fi sensor to count people in the library provides accurate results despite the potential errors induced by the sensing period, collisions, the ratio between users and MACs seen, and the data processing performed. These results invite us to trust that the results of our analyses for the rest of the areas on campus are also correct.

6. Discussion and Conclusions

In the execution of this case study, we have learned some lessons about the limitations we faced that are worth taking into account for similar systems and analyses. First, there is the key issue of the sensor deployment: it is not just a question of density (number of sensors per unit of surface), but of adapting its design to the topology of the place to be monitored. So, for example, it is crucial to have sensors not only at the entrances and exits of buildings but also inside them, as in the considered deployment, where there are sensors placed at the entrances and exits of the buildings but also in other especially relevant places, such as the library or the work-in-group area.

A complete coverage of indoor devices, without collisions and for the complete timespan of the stay indoors, would require a large deployment with many sensors able to cover all corners. This is not suitable in this context; instead, we relied on 9 sensors, some at gates and others in large rooms, which proved to be enough to check whether a device is within the reach of any of them. But some assumptions had to be made: we considered as valid only those behaviors that spent a significant amount of time once they had been seen (to remove transient behaviors and those of people walking outside the buildings); also, we discarded behaviors that did not appear enough times in the yearly timespan. On the other hand, the topology of these buildings did not impose severe restrictions on the sensors' coverage. In fact, we found several devices seen by different sensors at once. In summary, we can conclude that the finer the spatial granularity (more sensors), the larger the set of different behaviors, so getting raw data from sensors would lead to an explosion of states that would render meaningless results.

The gathered data are another key issue: some works in the literature obtained signal strength measurements every few seconds, allowing them to identify indoor trajectories. This was not our case, and in fact this proved to be very limiting. Thus, the precision of our analysis is limited by the regions covered by each sensor, which prevents us from analyzing any kind of movement within regions. Nevertheless, as has been seen, relevant results can still be obtained without the increase in energy consumption and the risk of flooding the school communications network that the other approach would entail.

As regards the analysis of data to identify users' behaviors, a long observation period is a must. Obtaining data for a complete work cycle (in this case, a full school year) helps in discovering common behaviors that happen in a university. Using long observation times can also help in detecting erroneous or atypical operation of the sensors, as happened in the first months of the year in this case. Getting even larger observation datasets would reinforce the behaviors we have found (e.g., seasonality), but the chance to discover new ones is negligible, because the expected behaviors on the campus are periodic with the school year.

The usage of big data platforms for analysis, while not mandatory, eases the management of large datasets and the execution of iterative studies on the data. The preprocessing work carried out allowed us to clean and filter our data. For example, some discovered behaviors are too regular and extended over time, which may correspond to servers or machines that we could then filter out.

The temporal analysis has revealed some behaviors that are maintained throughout the year and others that occur occasionally. The work has focused on the study of behaviors that are repeated and maintained throughout the year. A closer view, such as the analysis per hour, shows the hours of activity, nocturnal patterns, or movements at mealtime. In the daily view, the difference in activity between work days and weekends or holidays is clearly seen. Visualization has been a key technique in order to detect and understand these temporal patterns.

The spatial analysis revealed that a collision problem occurs in user detection, but its effect was mitigated by the use of innovative algorithms and techniques, such as the calculation of centroids and the combination of sensor information with external sources of information (i.e., semantic trajectories), so a much more precise positioning of the users is achieved than with the exclusive use of the sensors. The visualization of centroids fostered the visual understanding of complex data, such as the position of each user over a month, and the superposition of the centroids in a heatmap made it possible to know the movements of the groups of people and the occupation of the different zones.


Furthermore, we have found dominant users' behaviors as the most repeated behaviors registered by the sensors throughout the year. The number of behaviors found has been huge, but the application of the Proximus clustering algorithm reduced this number to a manageable amount. Then, the study of the obtained clusters has revealed that the most frequent behaviors coincide with what can be expected from a college building: researchers working in laboratories and students attending classes or studying in the library.

Although some of the discovered behaviors and patterns can be seen as common knowledge, it is worth stressing that they do bring value, since they represent numerical evidence that supports decision making (e.g., someone may think that the Wi-Fi access in a given area does not work properly because it is always overcrowded, but numerical evidence is needed to appropriately justify the investment of increasing the number of APs of the corporate WLAN in that area to improve the service). In addition, such well-known patterns, when obtained automatically by processing the available data, become baseline models which can be used to detect anomalies or atypical situations, as is common practice in unsupervised machine learning.

Lastly, we have validated the accuracy of using Wi-Fi tracking for occupancy estimation, comparing it with the library staff manual counting (considered as ground truth) and with a video camera system installed at the library main entrance. As a main finding, Wi-Fi tracking has proved to be more accurate than the video camera system, in addition to being much cheaper. As a matter of fact, the library staff currently prefers the occupancy estimation based on Wi-Fi tracking to the one based on the video camera system. Nevertheless, the accuracy of the Wi-Fi tracking system can be further improved by correlating MAC addresses detected in the same places over sufficiently long periods of time and considering only one of them, thus mitigating the issue related to the fact that a single person can bring several devices connected to Wi-Fi networks.

Besides this, the Wi-Fi tracking system is currently used by the library staff to perform more sophisticated studies, such as figuring out the percentage of students from the different schools of the university who come to study at the library of the Telecommunications Engineering School during the weekend. Figure 15 shows the results of such an analysis, which is an example of how this kind of IoT system can help in solving real-life problems and improving the operation of already running services.

Figure 15: Users who only go to the library of the Telecommunication School on weekends, classified by their school (CIVIL, INDUS, NAVAL, SPORT, ARQUI, FORES, EDIF, AERO, TELECOM).

To summarize, we have studied a one-year Wi-Fi tracking dataset obtained from a reduced set of low-cost sensors with limited capabilities, deployed on an actual university campus that receives around 4000 people every day. We have processed the data in order to identify traces of Wi-Fi-enabled mobile devices, which are identified as people moving in the campus buildings, and then we have extracted people's stays, movements, and common behaviors. The obtained results represent numerical evidence that illustrates how a low-cost Wi-Fi tracking system can be used in real-life conditions to improve or optimize the operation of the monitored premises. These results can allow dimensioning the WLAN infrastructure or the canteen personnel appropriately, or detecting anomalous situations in real time. Furthermore, the data from the people flow monitoring system are currently being used together with the data from the environmental monitoring system to try to reduce the environmental footprint of the school [72]. In addition, the Wi-Fi tracking system is actually used by the library staff in their day-to-day activity, which illustrates the value that this kind of IoT infrastructure can bring to real-life problems and services.

Data Availability

The dataset with the Wi-Fi sensor records used to support the findings of this study is available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work has been partly funded by Universidad Politécnica de Madrid through the project RES2+U (Responsables, Sostenibles y Universitarios) (http://blogs.upm.es/res2masu). The work performed by José M. Navarro was funded by the Ministerio de Educación, Cultura y Deporte de España under Grant no. FPU 14/03209.

References

[1] "2018 Revision of World Urbanization Prospects - Multimedia Library - United Nations Department of Economic and Social Affairs," https://www.un.org/development/desa/publications/2018-revision-of-world-urbanization-prospects.html.

[2] M. Pallot, "Engaging Users into Research and Innovation: The Living Lab Approach as a User Centred Open Innovation Ecosystem," https://web.archive.org/web/20120509081658/http://www.cwe-projects.eu/pub/bscw.cgi/1760838?id=715404_1760838.

[3] E. Almirall and J. Wareham, "Living Labs: arbiters of mid- and ground-level innovation," Technology Analysis & Strategic Management, vol. 23, no. 1, pp. 87–102, 2011.

[4] "TU Delft Green office," https://www.tudelft.nl/sustainability.

[5] "UBC Campus as a living laboratory," https://sustain.ubc.ca/our-commitment/campus-living-lab.

[6] "Harvard Sustainability," https://green.harvard.edu.

[7] "UPM City of the Future initiative," http://blogs.upm.es/cityofthefuture-upm/en/initiative.

[8] M. Alvarez-Campana, G. Lopez, E. Vazquez, V. A. Villagra, and J. Berrocal, "Smart CEI Moncloa: An IoT-based platform for people flow and environmental monitoring on a Smart University Campus," Sensors, vol. 17, no. 12, 2017.

[9] M. B. Kjærgaard, H. Blunck, T. Godsk, T. Toftkjær, D. L. Christensen, and K. Grønbæk, "Indoor positioning using GPS revisited," Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics): Preface, vol. 6030, pp. 38–56, 2010.

[10] A. Belmonte-Hernandez, G. Hernandez-Penaloza, F. Alvarez, and G. Conti, "Adaptive Fingerprinting in Multi-Sensor Fusion for Accurate Indoor Tracking," IEEE Sensors Journal, vol. 17, no. 15, pp. 4983–4998, 2017.

[11] M. S. Kristoffersen, J. V. Dueholm, R. Gade, and T. B. Moeslund, "Pedestrian counting with occlusion handling using stereo thermal cameras," Sensors, vol. 16, no. 1, 2016.

[12] L. Zheng, X. Ruan, Y. Chen, and M. Huang, "Shadow removal for pedestrian detection and tracking in indoor environments," Multimedia Tools and Applications, vol. 76, no. 18, pp. 18321–18337, 2017.

[13] T. Li, H. Chang, M. Wang, B. Ni, R. Hong, and S. Yan, "Crowded scene analysis: a survey," IEEE Transactions on Circuits and Systems for Video Technology, vol. 25, no. 3, pp. 367–386, 2015.

[14] F. Adib and D. Katabi, "See through walls with WiFi," in Proceedings of the Annual Conference of the ACM Special Interest Group on Data Communication on the Applications, Technologies, Architectures, and Protocols for Computer Communication, ACM SIGCOMM 2013, pp. 75–86, China, August 2013.

[15] Z. Kabelac, D. Katabi, and R. C. Miller, "3D Tracking via Body Radio Reflections," in Proceedings of the 11th USENIX Conference on Networked Systems Design and Implementation, pp. 317–329, 2014.

[16] F. Adib, Z. Kabelac, and D. Katabi, "Multi-person localization via RF body reflections," in Proceedings of the 12th USENIX Symposium on Networked Systems Design and Implementation, NSDI 2015, pp. 279–292, USA, May 2015.

[17] Z.-A. Deng, G. Wang, D. Qin, Z. Na, Y. Cui, and J. Chen, "Continuous indoor positioning fusing WiFi, smartphone sensors and landmarks," Sensors, vol. 16, no. 9, 2016.

[18] R. A. Becker, R. Caceres, K. Hanson et al., "A tale of one city: Using cellular network data for urban planning," IEEE Pervasive Computing, vol. 10, no. 4, pp. 18–26, 2011.

[19] J. E. Mallah, F. Carrino, O. A. Khaled, and E. Mugellini, "Crowd monitoring: critical situations prevention using smartphones and group detection," Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics): Preface, vol. 9189, pp. 496–505, 2015.

[20] J. Kuang, X. Niu, and X. Chen, "Robust Pedestrian Dead Reckoning Based on MEMS-IMU for Smartphones," Sensors, vol. 18, no. 5, p. 1391, 2018.

[21] A. Kurkcu and K. Ozbay, "Estimating Pedestrian Densities, Wait Times, and Flows with Wi-Fi and Bluetooth Sensors," Transportation Research Record, vol. 2644, no. 1, pp. 72–82, 2017.

[22] A. Kotanen, M. Hannikainen, H. Leppakoski, and T. Hamalainen, "Positioning with IEEE 802.11b wireless LAN," in Proceedings of the 14th IEEE 2003 International Symposium on Personal, Indoor and Mobile Radio Communications, vol. 3, pp. 2218–2222, Beijing, China, 2003.

[23] W. Ho, A. Smailagic, D. P. Siewiorek, and C. Faloutsos, "An adaptive two-phase approach to WiFi location sensing," in Proceedings of the 4th Annual IEEE International Conference on Pervasive Computing and Communications Workshops, PerCom Workshops 2006, pp. 452–456, Italy, March 2006.

[24] F. Evennou and F. Marx, "Advanced integration of WiFi and inertial navigation systems for indoor mobile positioning," EURASIP Journal on Applied Signal Processing, vol. 2006, Article ID 86706, 11 pages, 2006.

[25] J. Rekimoto, T. Miyaki, and T. Ishizawa, "LifeTag: WiFi-based continuous location logging for life pattern analysis," LNCS, vol. 4718, pp. 35–49, 2007.

[26] J. A. Besada, A. M. Bernardos, P. Tarrío, and J. R. Casar, "Analysis of tracking methods for wireless indoor localization," in Proceedings of the 2nd International Symposium on Wireless Pervasive Computing (ISWPC '07), pp. 492–497, February 2007.

[27] O. Woodman and R. Harle, "Pedestrian localisation for indoor environments," in Proceedings of the 10th International Conference on Ubiquitous Computing (UbiComp '08), pp. 114–123, Seoul, Republic of Korea, September 2008.

[28] F. Aloul, A. Sagahyroon, A. Al-Shami, I. Al-Midfa, and R. Moutassem, "Using mobiles for on campus location tracking," in Proceedings of the 7th International Conference on Advances in Mobile Computing and Multimedia, MoMM2009, pp. 231–235, Malaysia, December 2009.

[29] S. Woo, S. Jeong, E. Mok et al., "Application of WiFi-based indoor positioning system for labor tracking at construction sites: A case study in Guangzhou MTR," Automation in Construction, vol. 20, no. 1, pp. 3–13, 2011.

[30] N. Le Dortz, F. Gain, and P. Zetterberg, "WiFi fingerprint indoor positioning system using probability distribution comparison," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '12), pp. 2301–2304, March 2012.

[31] Z. Chen, H. Zou, H. Jiang, Q. Zhu, Y. C. Soh, and L. Xie, "Fusion of WiFi, smartphone sensors and landmarks using the kalman filter for indoor localization," Sensors, vol. 15, no. 1, pp. 715–732, 2015.

[32] Y. Shu, C. Bo, G. Shen, C. Zhao, L. Li, and F. Zhao, "Magicol: indoor localization using pervasive magnetic field and opportunistic wifi sensing," IEEE Journal on Selected Areas in Communications, vol. 33, no. 7, pp. 1443–1457, 2015.

[33] A. Danalet, B. Farooq, and M. Bierlaire, "A Bayesian approach to detect pedestrian destination-sequences from WiFi signatures," Transportation Research Part C: Emerging Technologies, vol. 44, pp. 146–170, 2014.


[34] O. Czogalla and S. Naumann, "Pedestrian indoor navigation for complex public facilities," in Proceedings of the 2016 International Conference on Indoor Positioning and Indoor Navigation, IPIN 2016, pp. 1–8, Spain, October 2016.

[35] L. Schauer, P. Marcus, and C. Linnhoff-Popien, "Towards feasible Wi-Fi based indoor tracking systems using probabilistic methods," in Proceedings of the 2016 International Conference on Indoor Positioning and Indoor Navigation, IPIN 2016, pp. 1–8, Spain, October 2016.

[36] F. Meneses and A. Moreira, "Large scale movement analysis from WiFi based location data," in Proceedings of the 2012 International Conference on Indoor Positioning and Indoor Navigation, IPIN 2012, Australia, November 2012.

[37] B. Bonne, A. Barzan, P. Quax, and W. Lamotte, "WiFiPi: involuntary tracking of visitors at mass events," in Proceedings of the IEEE 14th International Symposium on a World of Wireless, Mobile and Multimedia Networks (WoWMoM '13), pp. 1–6, Madrid, Spain, June 2013.

[38] M. B. Kjaergaard, M. Wirz, D. Roggen, and G. Troster, "Mobile sensing of pedestrian flocks in indoor environments using WiFi signals," in Proceedings of the 10th IEEE International Conference on Pervasive Computing and Communications (PerCom '12), pp. 95–102, Lugano, Switzerland, March 2012.

[39] S. Sendra, M. Garcia, C. Turro, and J. Lloret, "People mobility behaviour study in a university campus using WLANs," in Proceedings of the 3rd International Conference on Mobile Ubiquitous Computing, Systems, Services, and Technologies, UBICOMM 2009, pp. 124–129, Malta, October 2009.

[40] J. Scheuner, G. Mazlami, D. Schoni et al., "Probr - A Generic and Passive WiFi Tracking System," in Proceedings of the 41st IEEE Conference on Local Computer Networks, LCN 2016, pp. 495–502, UAE, November 2016.

[41] Y. Li, S. Williams, B. Moran, and A. Kealy, "Quantized RSS Based Wi-Fi Indoor Localization with Room Level Accuracy," in Proceedings of the International Global Navigation Satellite Systems 2018, 2018.

[42] A. Alessandrini, C. Gioia, F. Sermi, I. Sofos, D. Tarchi, and M. Vespe, "WiFi positioning and Big Data to monitor flows of people on a wide scale," in Proceedings of the 25th European Navigation Conference, ENC 2017, pp. 322–328, Switzerland, May 2017.

[43] L. Vu, K. Nahrstedt, S. Retika, and I. Gupta, "Joint bluetooth/wifi scanning framework for characterizing and leveraging people movement in university campus," in Proceedings of the 13th ACM International Conference on Modeling, Analysis, and Simulation of Wireless and Mobile Systems (MSWiM '10), pp. 257–265, October 2010.

[44] M. Garcia, S. Sendra, C. Turro, and J. Lloret, "User's Macro and Micro-mobility Study using WLANs in a University Campus," International Journal On Advances in Internet Technology, vol. 4, no. 1, pp. 37–46, 2011.

[45] Y. Xu, I. D. G. Groeneveld, R. Sulzer, E. Theocharous, O. T. Willems, and M. S. Tryfona, "Determine activity based on the classified identity of users by using Wi-Fi monitoring," Geomatics Synthesis Group Project Report, 2016.

[46] A. Danalet, L. Tinguely, M. D. Lapparent, and M. Bierlaire, "Location choice with longitudinal WiFi data," Journal of Choice Modelling, vol. 18, pp. 1–17, 2016.

[47] M. Zhou, K. Sui, M. Ma, Y. Zhao, D. Pei, and T. Moscibroda, "MobiCamp: A campus-wide testbed for studying mobile physical activities," in Proceedings of the 3rd International Workshop on Physical Analytics, WPA 2016, pp. 1–6, Singapore, 2016.

[48] A. Fernandez-Ares, A. M. Mora, M. G. Arenas et al., "Studying real traffic and mobility scenarios for a Smart City using a new monitoring and tracking system," Future Generation Computer Systems, vol. 76, pp. 163–179, 2017.

[49] A. E. Redondi and M. Cesana, "Building up knowledge through passive WiFi probes," Computer Communications, vol. 117, pp. 1–12, 2018.

[50] L. Huang, K. Matsuura, H. Yamanet, and K. Sezaki, "Enhancing wireless location privacy using silent period," in Proceedings of the 2005 IEEE Wireless Communications and Networking Conference, WCNC 2005: Broadband Wireless for the Masses - Ready for Take-off, pp. 1187–1192, USA, March 2005.

[51] J. Martin, T. Mayberry, C. Donahue et al., "A Study of MAC Address Randomization in Mobile Devices and When it Fails," Proceedings on Privacy Enhancing Technologies, vol. 2017, no. 4, pp. 365–383, 2017.

[52] M. Vanhoef, C. Matte, M. Cunche, L. S. Cardoso, and F. Piessens, "Why MAC address randomization is not enough: an analysis of Wi-Fi network discovery mechanisms," in Proceedings of the 11th ACM on Asia Conference on Computer and Communications Security, pp. 413–424, ACM, Xi'an, China, June 2016.

[53] A. J. Ruiz-Ruiz, H. Blunck, T. S. Prentow, A. Stisen, and M. B. Kjaergaard, "Analysis methods for extracting knowledge from large-scale WiFi monitoring to inform building facility planning," in Proceedings of the 2014 12th IEEE International Conference on Pervasive Computing and Communications, PerCom 2014, pp. 130–138, Hungary, March 2014.

[54] N Nunes M Ribeiro C Prandi and V Nisi ldquoBeanstalk - Acommunity based passive Wi-Fi tracking system for analysingtourism dynamicsrdquo in Proceedings of the 9th ACM SIGCHISymposium on Engineering Interactive Computing Systems EICS2017 pp 93ndash98 Portugal June 2017

[55] C ParentN Pelekis YTheodoridis et al ldquoSemantic trajectoriesmodeling and analysisrdquo ACMComputing Surveys vol 45 no 4pp 1ndash32 2013

[56] C Wei ldquoMining of User Behavioral Features Based on IndoorSemantic Trajectoriesrdquo Boletın Tecnico ISSN0376-723X vol55 2017

[57] K V Long D Quang and N Klara Lessons learned from blue-toothwifi scanning deployment in university campus UrbanaIllinois USA 2010

[58] V Radu and M K Marina ldquoHiMLoc indoor smartphonelocalization via activity aware pedestrian dead reckoning withselective crowdsourced WiFi fingerprintingrdquo in Proceedings ofthe International Conference on Indoor Positioning and IndoorNavigation (IPIN rsquo13) pp 1ndash10 IEEE Montbeliard-BelfortFrance October 2013

[59] FHongY Zhang Z ZhangMWei Y Feng andZGuo ldquoWaPIndoor localization and tracking using WiFi-Assisted Particlefilterrdquo in Proceedings of the 39th Annual IEEE Conference onLocal Computer Networks LCN 2014 pp 210ndash217 CanadaSeptember 2014

[60] Y Zeng P H Pathak and P Mohapatra ldquoAnalyzing shopperrsquosbehavior through WiFi signalsrdquo in Proceedings of the 2ndWorkshop on Physical AnalyticsWPA 2015 pp 13ndash18 Italy 2015

[61] P Sapiezynski A Stopczynski R Gatej and S LehmannldquoTracking human mobility using WiFi signalsrdquo PLoS ONE vol10 no 7 p e0130824 2015

[62] Z Tian Y Jin M Zhou Z Wu and Z Li ldquoWi-FiMARGIntegration for Indoor Pedestrian Localizationrdquo Sensors vol 16no 12 p 2100 2016


[63] B. Zhou, Q. Li, Q. Mao, and W. Tu, “A Robust Crowdsourcing-Based Indoor Localization System,” Sensors, vol. 17, no. 4, p. 864, 2017.

[64] “Raspberry Pi,” https://www.raspberrypi.org/.

[65] “TP-LINK Wi-Fi USB dongle datasheet,” http://www.tp-link.com/us/products/details/cat-5520_TL-WN722N.html.

[66] “ISO/IEC 20922:2016 - Information technology – Message Queuing Telemetry Transport (MQTT) v3.1.1,” https://www.iso.org/standard/69466.html.

[67] G. Piateski-Shapiro and W. Frawley, Knowledge Discovery in Databases, MIT Press, Cambridge, MA, USA, 1991.

[68] J. Andion Jimenez, J. M. Navarro Gonzalez, M. Alvarez-Campana Fernandez-Corredor, and J. C. Duenas Lopez, “A passive non-intrusive cheap method to identify behaviours and habits in the Campus,” in Proceedings of the XIII Jornadas de Ingenieria Telematica - JITEL2017, vol. 40, no. 47, pp. 10–4995, September 2017.

[69] A. Mylinikov, “Geo project,” https://www.mylnikov.org/.

[70] K. Mehmet and G. Ananth, “PROXIMUS: A framework for analyzing very high dimensional discrete-attributed datasets,” in Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '03, vol. 147, no. 156, pp. 147–156, USA, August 2003.

[71] “UPM Library occupation,” http://ceiboard.dit.upm.es/smart-campus/biblio.

[72] C. A. R. Inarejos, A. Rodríguez, G. Lopez, and M. Alvarez-Campana, “Analisis de la huella de carbono de la ETSIT de la UPM y propuesta de mejora basada en datos de la plataforma IoT Smart CEI Moncloa,” in Proceedings of the I Congreso Iberoamericano de Ciudades Inteligentes (ICSC-CITIES 2018), 2018.


[Figure 1: table listing, for each engineering school, the number of Wi-Fi tracking sensors and environmental sensors, with totals. Schools: 1 Telecommunications, 2 Forestry I, 3 Forestry II, 4 Rectorate, 5 Civil, 6 Agricultural I, 7 Agricultural II, 8 Aeronautical & Aerospace, 9 Naval & Marine, 10 Architecture I, 11 Architecture II, 12 Health Science & Sport, 13 Industrial.]

Figure 1: Summary of the sensors deployed in Smart CEI Moncloa (as of July 2018). Figure 1 is reproduced from [8] (2017).

around the World, such as Delft University [4], University of British Columbia [5], or Harvard University [6].

Universidad Politécnica de Madrid and, in particular, the Campus of International Excellence (CEI) of Moncloa present great potential in this sense. This campus is integrated in the metropolitan area of Madrid, spreads across 55 Km2, and has a daily flow of up to 120,000 people (comparable to, or even greater than, the population of many Spanish provincial capitals).

In order to make the most of this potential, the IoT platform for Smart City services experimentation, Smart CEI Moncloa, was deployed [8] within the UPM City of the Future initiative [7]. This platform currently offers two pilot services, namely, environmental monitoring and people flow monitoring. The devices used for the environmental monitoring service are based on Arduino and collect measurements of temperature, humidity, luminosity, noise, CO, and NO2. The devices used for the people flow monitoring service are based on Raspberry Pi and perform Wi-Fi tracking. As Figure 1 shows, for the time being there are 77 devices deployed across the 13 engineering schools of the CEI Moncloa: 52 for the people flow monitoring service and 25 for the environmental monitoring service. The platform has been up and running since 2016, so there is plenty of data available to be analyzed.

This paper focuses on analyzing the people flow monitoring data gathered in the Telecommunications Engineering School (ETSIT) of UPM during 2016. People flow monitoring is a hot topic nowadays because it has a wide range of applications in Smart Cities, spanning from monitoring public transport infrastructure (e.g., metro, airports), private transport infrastructure (e.g., highways), overcrowded scenarios (e.g., demonstrations, concerts), or customer behaviors (e.g., malls) to physical security applications (e.g., presence of unauthorized people in restricted areas). For these purposes, in many cases it is enough to provide presence detection in given areas instead of more sophisticated and costly solutions for very accurate location, which require fingerprinting and very dense sensor deployments gathering data at very high frequencies. This is the case of the people flow monitoring service considered in this paper, which is based on a few low-cost devices that upload data every 15 minutes and that are independent from the institutional network. This independence allows tracking users connected to different Wi-Fi networks that spatially coexist, or even users not connected to any network.
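As an illustration of what presence detection at this granularity amounts to, the following minimal Python sketch counts the distinct MAC addresses seen by each sensor in each 15-minute window. The record layout `(sensor_id, mac, timestamp)` and the sensor name are assumptions made for the example; the format actually uploaded by the deployed devices is not described here.

```python
from collections import defaultdict
from datetime import datetime

WINDOW_MINUTES = 15  # upload period stated in the text


def window_start(ts: datetime) -> datetime:
    """Round a timestamp down to the start of its 15-minute window."""
    return ts.replace(minute=ts.minute - ts.minute % WINDOW_MINUTES,
                      second=0, microsecond=0)


def presence_counts(records):
    """Count distinct MAC addresses per (sensor, window).

    `records` is an iterable of (sensor_id, mac, timestamp) tuples;
    this layout is a hypothetical one chosen for illustration.
    """
    seen = defaultdict(set)
    for sensor_id, mac, ts in records:
        seen[(sensor_id, window_start(ts))].add(mac.lower())
    return {key: len(macs) for key, macs in seen.items()}


records = [
    ("library", "AA:BB:CC:00:00:01", datetime(2016, 3, 1, 10, 2)),
    ("library", "aa:bb:cc:00:00:01", datetime(2016, 3, 1, 10, 9)),   # same device again
    ("library", "AA:BB:CC:00:00:02", datetime(2016, 3, 1, 10, 14)),
    ("library", "AA:BB:CC:00:00:02", datetime(2016, 3, 1, 10, 16)),  # falls in the next window
]
counts = presence_counts(records)
```

Counting sets of addresses rather than raw records keeps a device that is detected many times within one window from inflating the occupancy figure.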

Hence, as Figure 1 shows, in the ETSIT, which is one of the biggest schools in the CEI Moncloa and is visited by 4000 people daily (approximately 3000 students, 500 professors and researchers, and 500 administrative and maintenance staff), there are 9 Wi-Fi sensors covering 1.8 hectares of indoor areas. Figure 2 shows the location of these sensors. Dark areas correspond to the floor of the buildings of the ETSIT (buildings A, B, C, and D). There is a Wi-Fi sensor at the entry of each building. In addition, there are also sensors in the library (sensor 4) and at the student tables (sensor 3), as they are large spaces usually crowded by students. As can also be seen, the area covered by each sensor varies, with some of them covering especially large areas, such as the ones in the library (1300 m2) or in the main entrance (1270 m2).

This paper aims to explore the useful insights that can be obtained from such a cost-effective solution for people flow monitoring. Thus, the paper performs a detailed analysis of the people flow monitoring data, including a temporal analysis, a spatial analysis, and an activity pattern analysis, as well as a comparison of the performance of this solution with a much more expensive one based on video monitoring at the library of the ETSIT. These analyses can be used for optimizing the management and operation of the school, from the work shifts to the proper operation of the lighting to reduce energy consumption and, so, the carbon footprint. The conclusions can be valid for similar infrastructures, but are also relevant in general for municipalities, which will not typically be able to afford accurate and costly location systems all over the city.

Sensor | Description
1 Entr A | Main entrance of building A of the school.
2 Entr A Sec | Secondary accesses to building A. It covers most of the classrooms of this building.
3 Std Tables | Sensor close to “Entr A”, just above an area of tables where students work and study in groups.
4 Library | Inside the library, which is open every day from 9 am to 9 pm.
5 Entr B | Main entrance to building B. This building contains mostly laboratories and offices.
6 Entr B Lab | Secondary entrance of this building, which is the closest entrance to the laboratories.
7 Entr B Sec | Secondary access to building B, covering the classrooms of this building.
8 Entr C | Main entrance of building C. This building contains only research laboratories and offices; students do not have classes in this building.
9 Entr D | Entrance of building D, covering the backdoor of the library.

Figure 2: Summary of the Wi-Fi sensors deployed in the Telecommunication Engineering School. The map on the left-hand side also shows the location of (α) offices and labs, (β) classrooms, (γ) cantina.

The rest of the paper is structured as follows. Section 2 presents an exhaustive analysis of techniques currently used for monitoring people flows and identifying behaviors. Section 3 outlines the main characteristics of the sensor infrastructure and the IoT platform that collects the data analyzed in this paper. After a brief sketch of the methodology used in this work, Section 4 presents the analysis performed on data with respect to time and occupancy, position of people, and identification of common behaviors or activity patterns. Then, Section 5 describes the data available as ground truth and the validation of the analyzed system. Finally, Section 6 summarizes the main findings of the analysis and presents some ideas for building similar systems.

2. Related Work

People flow monitoring has always been a useful piece of information. Knowing the position or trajectory of a person or of a mass of people enables a wide range of applications, such as crowd monitoring in events or concerts, the discovery of the most common routes in shopping malls, analysis of space usage in public or private infrastructures, or security against unusual activities (e.g., presence of unauthorized people in restricted areas). In the last decades, the spread of communication technologies (e.g., the popularization of Wi-Fi networks or the use of smartphones) has become a vast source of data, which allows for the improvement or even the automation of techniques to monitor people.

In most cases, positioning in outdoor scenarios can be solved using the Global Positioning System (GPS), but it typically presents limitations in terms of accuracy in indoor scenarios [9]. Thus, indoor location or pedestrian location has been a key research topic in recent years. Some works aim to discover with high precision how people move inside buildings by using the mobile network or personal area networks, e.g., [10]. These systems try to help users find their path in a building, measure the length of their stay in a mall for commercial purposes, or simply ease people's movements by removing obstacles. The most common technologies used in recent years to achieve this kind of pedestrian tracking have been video camera systems, radiofrequency signals, Bluetooth, smartphone sensors, and Wi-Fi networks. These tracking methods can be classified based on two aspects:

(i) Whether they need user intervention, such as the use of a smartphone application or specific hardware, in which case they are classified as active, or do not need the cooperation of the users to work, i.e., passive systems.

(ii) Whether or not a dedicated sensor network deployment is needed (it is not needed, e.g., when the existing network of Wi-Fi access points is used).

The usage of video camera systems and computer vision presents some advantages: it is a passive system, it can use existing camera network deployments, and it provides almost real-time feedback. However, it also presents many drawbacks, although there are works that propose ways to mitigate them [11, 12]: its dependency on visual aspects (e.g., poor lighting, obstacles), its low scalability due to deployment complexity and costs, the difficulty of fusing multiple video streams to provide automatic monitoring, and the difficulty of tracking users individually. Reference [13] provides a survey on computer vision techniques for the specific case of crowd scene analysis, covering from existing models and popular algorithms to current research problems and trends.

Table 1: Summary of the state of the art of people flow monitoring techniques which are not based on Wi-Fi tracking (P/A stands for Passive/Active).

Technology | Ref. | Year | Scenario | Application | P/A | Own deployment
Video camera & computer vision | [13] | 2015 | Indoor/Outdoor | Crowd scene analysis | P | No
RF | [14–16] | 2013, 2014, 2015 | Indoor | Short range movements (1-10 m) with high accuracy (e.g., elderly care, baby care) | P | Yes
Cellular networks | [18] | 2011 | Indoor/Outdoor | Analyze people flow in a suburban area near NYC with accuracy around hundreds of meters | P | No
GPS | [19] | 2015 | Outdoor | Prevent critical situations in overcrowded scenarios (e.g., concerts) | A | No
Smartphone sensors (e.g., accelerometer) | [20] | 2018 | Indoor | PDR. Track individual pedestrian | A | Depends
Bluetooth | [21] | 2017 | Indoor | Monitor people flow (comparing Wi-Fi with Bluetooth) | A | Yes

People localization and tracking based on radiofrequency (RF) measurements has also been widely addressed in the literature. References [14–16] represent some recent remarkable research work on this topic. These solutions are based on antennas that transmit RF signals and are able to locate and track people based on body radio reflections. Consequently, they can be classified as passive systems. They provide very accurate results, allowing tracking of forward and backward movements or body part movements (including breathing), and are even able to track several people under certain configurations. However, it is difficult to track a fairly high number of users individually with this kind of technique (even if we assume that a person may have a certain type of body radio reflection signature, it would not be different enough between similar people and might change depending on the distance between the person and the vantage points). They are mainly applied to short range movements (1-10 meters), so they could only be applied in small rooms at the cost of a large and specific deployment.

People flow monitoring can also be performed by exploiting the ubiquity, communications capabilities, and integrated sensors of smartphones [17]. In [18], for instance, Call Data Records (CDR) are used to locate the base stations the smartphones are attached to and thus analyze people flow in and out of a suburban city near New York City. The main drawbacks of this way of locating and tracking people are its low accuracy (around hundreds of meters) and that the terminal has to be active (e.g., make or receive a call, or send or receive an SMS) in order to be monitored. Even so, this technique would be classified as passive, given that the needed user activity is not aimed at contributing to the positioning.

Reference [19] proposes a solution to prevent critical situations in overcrowded scenarios based on a smartphone application that transmits its GPS location data. Although it yields good results in terms of accuracy, this approach has three main drawbacks: it is an active technique, since it requires the installation of the app; the consumption of the app affects the autonomy of the terminal; and it may not work properly indoors.

Smartphone sensors can also be used to perform Pedestrian Dead Reckoning (PDR), a method that estimates a pedestrian's position based on their past position and the output of the smartphone's sensors (e.g., accelerometer, gyroscope). This technique is usually supported by other positioning methods [20], but in most cases it does not need a specific sensor network other than the smartphone itself. It is an active system, and it is typically used to track individual pedestrians instead of flocks. Its main drawback is that it suffers from tracking error accumulation and frequently needs an additional location measurement.
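The error accumulation mentioned above can be illustrated with a minimal Python sketch of the dead-reckoning update, in which each detected step advances the previous position along the estimated heading. The step length of 0.7 m, the heading bias of 0.05 rad, and the convention of measuring heading from the x-axis are assumptions chosen for the example, not values taken from the works cited.

```python
import math


def pdr_step(x, y, heading_rad, step_length_m):
    """Advance a 2-D position by one detected step along the given heading."""
    return (x + step_length_m * math.cos(heading_rad),
            y + step_length_m * math.sin(heading_rad))


def walk(n_steps, heading_rad, step_length_m=0.7, start=(0.0, 0.0)):
    """Apply the dead-reckoning update n_steps times from a start position."""
    x, y = start
    for _ in range(n_steps):
        x, y = pdr_step(x, y, heading_rad, step_length_m)
    return x, y


def drift_after(n_steps, heading_bias_rad=0.05):
    """Distance between the true path (heading 0) and a path walked
    with a constant heading bias (hypothetical gyroscope error)."""
    true_pos = walk(n_steps, 0.0)
    estimated_pos = walk(n_steps, heading_bias_rad)
    return math.dist(true_pos, estimated_pos)
```

Because every new estimate is built on the previous one, a small constant heading error produces a position error that grows with the number of steps, which is why PDR is usually corrected periodically with an independent location fix.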

Bluetooth has also been used to monitor people flows in indoor environments. Some works, such as [21], compare the usage of Bluetooth with other techniques such as Wi-Fi and conclude that its capabilities are below those of other options, although it can be used in combination with other positioning systems to improve their accuracy. The main disadvantage of these tracking systems is that most Bluetooth devices only react to Bluetooth signals when the users make them visible to the network. Also, these implementations require a specific deployment of Bluetooth sensors, which usually cannot be used for other tasks. Table 1 summarizes the research work reviewed so far, without considering works based on Wi-Fi tracking.

Lastly, people tracking systems based on Wi-Fi have been a hot topic for more than fifteen years and still are. This is the technique used in this paper. Thus, Table 2 is exclusively devoted to summarizing and comparing previous research work based on this technique.

As shown in Table 2, works related to Wi-Fi tracking techniques can focus on different objectives: some try to obtain users' positions as accurately as possible [22–32], others analyze the trajectories followed by pedestrians [33–35] or flocks [36–38], and finally others study the occupation of different zones [39–42] and obtain behavior patterns [36, 43–49].

Table 2: Summary of the state of the art of people flow monitoring techniques based on Wi-Fi tracking (P/A stands for Passive/Active).

Ref. | Year | Scenario | Application | P/A | Own deployment
[22] | 2003 | Indoor | One of the earliest approaches on precise positioning using Wi-Fi (precision 26m) | A | Yes
[23] | 2006 | Indoor | Wi-Fi fingerprint to identify the general location and applying logistic regression to distinguish between finer-grained locations | A | Yes
[24] | 2006 | Indoor office building | Precise positioning. PDR combined with Wi-Fi to reduce the accumulated error | A | Yes
[25] | 2007 | Outdoor and indoor | Creation of Wi-Fi map. Positioning comparing with the created map | A | Yes. Own hardware. Offline analysis
[26] | 2007 | Indoor campus | Comparison of positioning and tracking methods using Wi-Fi | P | No. Offline
[27] | 2008 | Indoor campus | Estimate the position using Wi-Fi and tracking with PDR | A | Yes
[28] | 2009 | Indoor campus | Real time Wi-Fi positioning web portal to check user's positions | A | Yes
[39] | 2009 | Indoor campus | Count of users in different buildings. Analysis of users' mobility between buildings | P | No. Institutional APs
[57] | 2010 | Indoor campus and office building | Coarse position with Wi-Fi and Bluetooth. Graph of user co-occurrence | A | No. Offline
[43] | 2010 | Indoor campus and office building | Calculation of stay length based on Wi-Fi positioning. Analysis of favorite locations | A | No. Offline
[44] | 2011 | Indoor campus | Extension to [39]. User characterization based on their mobility patterns | P | No. Institutional APs
[29] | 2011 | Indoor tunnel in construction | Precise positioning in real time of workers inside a tunnel under construction using Wi-Fi (precision 5m) | P | Yes. Own AP deployment
[30] | 2012 | Indoor campus | Creation of Wi-Fi fingerprint map. Map usage for positioning with smartphone application | A | No
[36] | 2012 | Indoor campus | Study of crowd movement, Wi-Fi based. Analysis of mobility patterns, users' arrivals and departures from campus | P | No. Institutional APs
[58] | 2013 | Indoor | Wi-Fi path analysis in real time | A | No. Institutional APs
[33] | 2014 | Indoor and outdoor campus | Analyze pedestrian destination frequencies in an area of 55 hectares of a university campus during 5 weekdays | P | No. Institutional APs and Radius server
[59] | 2014 | Indoor campus | Localization and tracking system exploiting particle filters to combine dead reckoning, Wi-Fi RSS-based analysis, and knowledge of floor plan (precision 0.7m) | A |
[60] | 2015 | Indoor shopping mall | Wi-Fi Channel State Information analysis to detect shopper activities | P | Yes. Own AP deployment
[31] | 2015 | Indoor | Precise positioning based on sensor fusion combining Wi-Fi, PDR, and landmarks. Smartphone application (positioning 1m) | A | No. Smartphones
[32] | 2015 | Indoor parking | Precise positioning combining Wi-Fi RSS and electromagnetic field map | |
[37] | 2013 | Outdoor concert. Indoor campus | Portable Wi-Fi based user count. Analysis of crowds in concert and in campus | P | Yes. Raspberry Pi based
[61] | 2015 | Outdoor | Creation of Wi-Fi map using GPS | A |
[62] | 2016 | Indoor | Precise positioning combining Wi-Fi and PDR | A | No. Smartphones
[34] | 2016 | Indoor airport | User path detection. Combining Wi-Fi, GPS, PDR, and Bluetooth to create a multilevel map and study of user's trajectory prediction | A | No. Smartphones
[45] | 2016 | Indoor campus | Analysis of users' activities. User tagging based on activities registered | P | No. Institutional APs
[46] | 2016 | Indoor and outdoor campus | Analysis of user movements to different food points to predict the operation of new stores based on price and location | P | No. Institutional APs
[63] | 2017 | Indoor | Crowdsourcing positioning based on Wi-Fi fingerprint | A | No. Institutional APs
[41] | 2018 | Indoor | Coarse positioning, room level precision, based on probabilistic Wi-Fi fingerprint. Usage of Hidden Markov chain models to analyze user movement | P | No. Institutional APs
[35] | 2016 | Indoor | Trajectory analysis based on Hidden Markov chain models | P | No. Institutional APs
[42] | 2017 | Indoor | Estimate the number of participants and their space and time evolution in an area of about 167 hectares during 2016 Open Day of the European JRC | P | No. Institutional APs
[47] | 2016 | Indoor/Outdoor | Study mobility-related activities in a campus of 440 hectares based on the 2700 APs of the institutional network and additional opt-in smartphone application | A | No. Institutional APs
[53] | 2014 | Indoor | Classify users in a hospital (e.g., patient, doctor, administrative) by checking the number of hours and the positions of a user over time based on the institutional Wi-Fi network | P | No. Institutional APs
[38] | 2012 | Indoor | Identify flocks walking in a building and their behavior based on signal strength from the institutional Wi-Fi and using clustering techniques | P | No. Institutional APs
[45] | 2016 | Indoor | Analyze users' occupation (based on Markov models) as well as regular and irregular hours in a university campus | P | No. Institutional APs
[40] | 2016 | Indoor | Analyze room utilization and people tracking, providing heat maps. Analyze device statistics | A | No. Institutional APs
[48] | 2017 | Indoor/Outdoor | Analyze people mobility monitoring and tracking in Smart Cities and traffic in a highway (e.g., driving behavior, traffic forecasting) | P | Yes. Raspberry Pi based
[49] | 2018 | Indoor/Outdoor | Provide user localization, user profiling, and device classification | A | Yes. Raspberry Pi based
[54] | 2017 | Indoor/Outdoor | Analyze Wi-Fi tracking records gathered during more than one year in Madeira to classify users as tourists or locals and identify touristic spots | P | Yes. Based on TP-Link MR3240v2 home router
[56] | 2017 | Indoor | Obtain semantic trajectories. Classify users based on their locations. Analyze the probability of a user going to a specific shop based on their history and propose the creation of a recommender based on the whole dataset | - | -

Wi-Fi tracking of a specific user is usually done by analyzing the collected records related to a specific MAC address; i.e., tracking users is equivalent to tracking their MAC addresses. This technique, in general, presents issues related to tracking people individually and to privacy. Although it is true that a user carrying several devices (e.g., smartphone, tablet, laptop) with the Wi-Fi interfaces on would at first be identified as several users, after a reasonable period of time this information can be correlated to fix the problem [50]. In the case of the work presented in this paper, the files associated with the measurements of different sensors of the same building are compared in order to avoid counting the same mobile/person several times (e.g., due to overlapping Wi-Fi cells). The resulting file allows for the analysis of people flow at a building level, identifying the total stay time or the frequency of the visited places. However, as explained below in the position analysis subsection, this problem still appears in the collected data, and it is necessary to postprocess the data to deal with these collisions. Users may also use so-called MAC spoofing (i.e., replacing their actual MAC address with the MAC address of another device), which could be seen as a kind of attack. However, this may yield connectivity problems, and it is a negligible behavior in the scenario considered in this paper.
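A minimal Python sketch of this kind of building-level merge is given below. It is not the procedure used in the deployment: the record layout `(sensor_id, mac, window, rssi_dbm)`, the sensor names, and the rule of assigning a device seen by several sensors in the same 15-minute window to the sensor with the strongest signal are all assumptions made for illustration.

```python
from collections import defaultdict


def merge_building_records(records):
    """Keep one observation per (MAC, window) across the sensors of a building.

    `records` is an iterable of (sensor_id, mac, window, rssi_dbm) tuples,
    where `window` is the index of a 15-minute upload interval. When several
    sensors report the same MAC in the same window (overlapping Wi-Fi cells),
    the strongest reading wins; this tie-breaking rule is an assumption.
    """
    best = {}
    for sensor_id, mac, window, rssi_dbm in records:
        key = (mac.lower(), window)
        if key not in best or rssi_dbm > best[key][1]:
            best[key] = (sensor_id, rssi_dbm)
    return best


def stay_minutes(best, window_minutes=15):
    """Total stay time per MAC, counting each window only once."""
    totals = defaultdict(int)
    for mac, _window in best:
        totals[mac] += window_minutes
    return dict(totals)


records = [
    ("entr_a", "AA:BB:CC:00:00:01", 0, -70),
    ("std_tables", "aa:bb:cc:00:00:01", 0, -55),  # same device, overlapping cell
    ("std_tables", "AA:BB:CC:00:00:01", 1, -60),
    ("entr_a", "AA:BB:CC:00:00:02", 0, -80),
]
best = merge_building_records(records)
stay = stay_minutes(best)
```

Keying on the pair (MAC, window) is what prevents a device in the overlap of two cells from being counted twice; the per-sensor assignment that remains can then feed a frequency count of visited places.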

Regarding privacy, several proposals to protect Wi-Fi communications by means of MAC address anonymization have arisen in recent years. First, these proposals appeared as apps for smartphones (allowing a kind of MAC spoofing), but recently smartphone manufacturers have started including these techniques in the latest versions of their Operating Systems (OS) (e.g., iOS, Android, and Windows). Such MAC address anonymization techniques aim to avoid using the actual MAC address until the device gets connected to the Wi-Fi network (i.e., they use a fake MAC address in their probe frames). The specific solution for this problem depends on the manufacturer and OS. In the case of iOS, the solution involves sending locally administered MAC addresses in the probe frames, randomly selecting the three least significant bytes of the MAC address. This can be easily detected just by inspecting the first byte of the MAC address. In the case of Android, some manufacturers have decided to use random MAC addresses in the probe frames from the MAC address ranges assigned to them by the IEEE. Nevertheless, even with these techniques in place, it is possible to end up obtaining the actual device information [51, 52]. In addition, MAC randomization is not actually a relevant problem in the scenario considered in this paper, since most of the devices are connected to the available Wi-Fi networks (e.g., Eduroam), and to connect to a Wi-Fi network devices must use their actual MAC addresses.
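The first-byte check mentioned above relies on the locally administered flag of IEEE 802 MAC addresses, which is the second least significant bit (mask 0x02) of the first octet. A minimal Python sketch follows; the function name and the example addresses are illustrative.

```python
def is_locally_administered(mac: str) -> bool:
    """Return True if the locally administered bit of a MAC address is set.

    The flag is the 0x02 bit of the first octet; addresses with this bit
    set were not assigned from a manufacturer's IEEE range.
    """
    first_octet = int(mac.replace("-", ":").split(":")[0], 16)
    return bool(first_octet & 0x02)


# Illustrative addresses (not taken from the dataset).
probe_macs = ["da:a1:19:00:00:01", "00:1A:2B:3C:4D:5E", "02-00-00-00-00-00"]
flags = [is_locally_administered(mac) for mac in probe_macs]
```

Such a filter only flags addresses of the locally administered kind; random addresses drawn from a manufacturer's own assigned range, as described for some Android devices, would not be caught by it.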

Wi-Fi tracking systems can be classified into two main groups: those that use the enterprise Wireless Local Area Network (WLAN) itself and those that use a dedicated low-cost passive Wi-Fi infrastructure, which is the case of the actual deployment considered in this paper. One of the main drawbacks of the systems that use the enterprise WLAN is that they can only track the users of such networks, whereas independent dedicated low-cost passive Wi-Fi infrastructures allow tracking users connected to different Wi-Fi networks that spatially coexist, or even users not connected to any network, if MAC randomization is not used.

As examples of works that use the enterprise WLAN or the existing infrastructure of access points (APs), [42] presents the 2016 Open Day of the European Joint Research Center (JRC), in which 8000 people participated within an area of about 167 hectares, as a case study where the Wi-Fi infrastructure of the event was used to estimate the number of participants and their space and time evolution based on properly processed MAC addresses. Reference [47] presents MobiCamp, a large-scale testbed composed of around 2700 APs to study mobility-related activities, which combines user mobility traces based on Simple Network Management Protocol (SNMP) data with enriched data (e.g., gender, age) provided by an opt-in smartphone application.

Reference [53] represents yet another example of the analysis that can be made with this kind of information. Its scenario is a hospital, and by checking the number of hours and the positions of a user over time, the authors can classify that user according to a role (e.g., patient, doctor, administrative). Reference [38] identifies flocks walking in a building and their behaviors by applying clustering techniques to the signal strength measurements provided by the institutional WLAN.

Reference [33] presents a campus scenario where, by using the university network infrastructure, a detailed profile of the user's activity can be obtained. The users tracked are those logged into the university network, which provides extra information about each user, such as role or gender. Combining that information with a detailed map that contains thousands of Points of Interest (POIs), the authors can extract an activity log that shows the different user activities with one-minute precision. The main drawback of this work is that it depends entirely on the users' profile database and the POI map, and both are resources that are difficult to access or create. In reference [45], which employs the university network infrastructure and the location of each AP in the university campus, each sensor record only stores the closest AP. With this simple information, the authors can create an activity profile similar to the one shown in [33]. By analyzing the basic results obtained, they are able to extract new information (e.g., a count of irregular hours) or detect patterns of anomalous events (e.g., periods of exams or holidays).

As a last example of systems that use the institutional WLAN, reference [40] presents a web application in which the occupation of different rooms on a campus is shown in real time. The number of people in a room is calculated from the number of Wi-Fi devices detected by the APs. Using the signal strength measurement of each device, a heatmap is drawn that shows the user distribution in the room. The collected data are analyzed offline to produce utilization reports for the different rooms and to draw conclusions from the detected patterns.

On the other hand, [37, 48, 49, 54] represent some examples of works which use independent, dedicated, low-cost passive Wi-Fi infrastructures, as is the case of the deployment considered in this paper. In the case of [48], a network of devices called MOBYWIT, each based on a Raspberry Pi and two wireless USB dongles, is used to track the movement of people and vehicles, sniffing not only Wi-Fi but also Bluetooth signals emitted by smartphones and vehicle hands-free calling systems. In the case of [49], a passive Wi-Fi infrastructure based on low-cost devices that combine a Raspberry Pi and a TP-LINK Wi-Fi dongle is used to provide user localization, user profiling, and device classification based on the properly processed MAC addresses captured from the IEEE 802.11 probe request frames. Reference [37] also uses this approach to count people in a concert and, in reference [54], the considered scenario is a whole island (Madeira, Portugal), where the records gathered all over the island are analyzed to classify users as tourists or locals, as well as to identify tourist spots.

One of the main features that make the work presented in this paper stand out compared to previous work is that one year of data from an actual Wi-Fi tracking system deployed in a real-life environment is analyzed. Most previous works consider hours, a few days (e.g., weekdays), or weeks. Only the work presented in [54] covers a similar period of time (even longer), but its analysis is much broader and far from the level of detail provided in this paper. The considered period of time allows analyzing seasonality effects and other patterns that, although they may be seen as common knowledge, do bring value, since they represent numerical evidence that supports decision making (e.g., someone may think that the Wi-Fi access in a given area does not work properly because it is always overcrowded, but numerical evidence is needed to justify the investment of increasing the number of APs of the corporate WLAN in that area to improve the service). In addition, such well-known patterns, when obtained automatically by processing the available data, become models which can be used to detect anomalies or atypical situations, as is common practice in unsupervised machine learning. It is also worth mentioning the use of clustering to improve the data analysis and interpretation (as in previous works, e.g., [38]), as well as the application of the semantic trajectory concept [55], which combines positioning data with an external source of information to classify the different positions according to the activity carried out in the area (e.g., users positioned in the cantina will be eating), and which has not been extensively explored in previous works (e.g., [56]).

Figure 3: (a) Raspberry Pi; (b) TP-LINK USB Wi-Fi dongle; (c) developed Wi-Fi listening device.

Another strong point of the work presented in this paper compared to previous work is the validation of the Wi-Fi tracking technique to estimate the occupation of the library of the Telecommunications School and the comparison with a video camera system. Only a few previous works, such as [40] or [49], perform a similar validation (in [40], the number of people detected in the room is compared with the attendance list of the seminar taking place there and, in [49], students are requested to turn on the Wi-Fi interfaces of their devices and provide the MAC addresses of their smartphones and laptops through an anonymous web form to serve as ground truth for device classification). However, again, the period considered in this paper is much longer, and it is shown that the Wi-Fi tracking system outperforms the more expensive video camera system. As a matter of fact, the Wi-Fi tracking system is actually used on a day-to-day basis by the library staff, which illustrates the value that this kind of IoT infrastructure can bring to real-life problems and services.

3. Data Acquisition Infrastructure

Figure 3 shows the Wi-Fi listening device/sensor developed for the people flow monitoring sensor network of the Smart CEI Moncloa. It is based on very common and cheap hardware, namely, a Raspberry Pi board [64], one of the most widely used hardware platforms for IoT, and a TP-LINK USB Wi-Fi dongle [65] configured in monitor mode. As a result, the cost of this solution is in the order of tens of euros (around 80 € when manufactured in 2016), which represents a remarkable cost reduction compared to other solutions available in the market.

As Figure 4 illustrates, broadly speaking, these Wi-Fi sensors scan each of the Wi-Fi channels of both the 2.4 GHz and the 5 GHz bands during a configurable amount of time (currently 250 ms), read the header of the IEEE 802.11 radio packets (e.g., data packets or probe requests) in their region of coverage, and record the sender MAC addresses. As these MAC addresses are unique per device, counting them is a good indicator of the number of devices present in the surroundings of the Wi-Fi sensors (although there may be issues related to the fact that a single user can carry several devices, as already pointed out in Section 2), and they allow for temporal correlation analysis, thus yielding useful information such as stay time or availability patterns.

Regarding how this information is carried all the way up to the backend server and processed there, Figure 5 shows an overview of the communications architecture and protocol stack of the people flow monitoring service of the Smart CEI Moncloa.

As can be seen, the Wi-Fi sensors are directly connected to the backend via the Ethernet network of the UPM. Communications are protected end-to-end by the use of Transport Layer Security (TLS) on top of Transmission Control Protocol (TCP)/Internet Protocol (IP). Measurements are periodically sent using Message Queue Telemetry Transport (MQTT) [66]. The publish/subscribe mechanism provided by MQTT allows the Wi-Fi sensors not only to send measurements (i.e., events) periodically, but also to receive commands (e.g., to reboot them or to perform a remote firmware update).

Taking advantage of the hierarchical structure of the MQTT topics, all the publish events follow the structure SERVICE/ID/EVENT(/TIMESTAMP). Thus, the publish events from the Wi-Fi sensors start with Wi-Fi, followed by the MAC address of the sensor's Ethernet interface, which is used as unique ID. The content published under the different topics is formatted as Comma Separated Values (CSV), a lightweight solution that is especially appropriate when the data structure is fixed, since the meaning of each field of the subsequent lines is explained only once, in the header at the beginning of the file.
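As a rough illustration of the topic hierarchy and CSV payload described above, the following Python sketch builds a topic string and a CSV message. The slash separator is the standard MQTT topic level separator; treating the timestamp as an optional last level, the event name `detections`, the sensor ID, and the column names are all hypothetical choices made for illustration, since the actual field layout used by the sensors is not given in the text.

```python
import csv
import io

def build_topic(service, sensor_id, event, timestamp=None):
    """Join topic levels with '/', the MQTT topic level separator."""
    levels = [service, sensor_id, event]
    if timestamp is not None:
        levels.append(str(timestamp))
    return "/".join(levels)

def build_csv_payload(header, rows):
    """Serialize rows as CSV with a single header line at the top."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()

# Hypothetical sensor ID (Ethernet MAC), event name, and column names.
topic = build_topic("Wi-Fi", "b8:27:eb:00:00:01", "detections", 1478509200)
payload = build_csv_payload(
    ["minute", "device_hash", "access_point"],
    [[1478509200, "9f86d081884c7d65", ""], [1478509260, "2c26b46b68ffc68f", ""]],
)
```

Because the header is sent once per message, each data line carries only values, which keeps the payload small when the structure is fixed.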

Privacy issues have also been considered carefully: the developed Wi-Fi sensors apply an irreversible MD5 hash function with salt to the MAC address, which prevents brute-force attacks based on precomputed tables. In addition, as Figure 5 shows, once anonymized, the data are carried securely up to the platform servers, where they are handled in an aggregate manner instead of individually.
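The anonymization step can be sketched in a few lines of Python using the standard `hashlib` module. This is only a minimal illustration: prefixing the salt to the normalized address, the normalization itself, and the example salt value are assumptions, as the text does not describe how the sensors combine salt and MAC address.

```python
import hashlib

def anonymize_mac(mac, salt):
    """Return the hex MD5 digest of salt + normalized MAC address."""
    # Normalize so that the same device always maps to the same token.
    normalized = mac.lower().replace("-", ":").encode("ascii")
    return hashlib.md5(salt + normalized).hexdigest()

salt = b"example-secret-salt"  # hypothetical value; a real salt must stay private
token = anonymize_mac("AA:BB:CC:DD:EE:FF", salt)
```

The same device always yields the same token under a given salt, so temporal correlation remains possible, while a table precomputed without the salt is of no use to an attacker.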

Figure 4: Sketch of how the developed Wi-Fi listening devices obtain the MAC addresses of the users surrounding them. (Labels: AP1 to AP4 on channels ch1 to ch4; User1 and User3 send data frames on ch1 and ch3; User2 and User4 send probe requests on chN and chM; the developed Wi-Fi listening device listens "x" ms at each channel.)

Figure 5: Communications architecture and protocol stack for the people flow monitoring service of the Smart CEI Moncloa. (Labels: Wi-Fi sensors and backend server, each with the stack MQTT, TLS, TCP, IP, 802.3; MQTT broker; real-time and batch processing; REST API server; CSV file-based temporal DB; dashboard web server.)

Furthermore, the software of the developed sensors has been modified to prevent the MAC anonymization mechanisms presented in Section 2 from affecting the obtained measurements. Thus, the Wi-Fi frames with locally administered MAC addresses or with special MAC address ranges are discarded, so these devices are not considered. In any case, as already mentioned in Section 2, MAC randomization is not a major issue in our case, since most of the users are connected to the free Eduroam Wi-Fi access, so their smartphones end up using their actual MAC address.
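The check for locally administered addresses mentioned above relies on a standard property of IEEE 802 MAC addresses: the second least significant bit of the first octet is set when the address is locally administered, which is the case for randomized addresses. The Python sketch below shows only this check; the function names and sample addresses are hypothetical, and the "special MAC address ranges" also filtered by the sensors are not covered because the text does not list them.

```python
def is_locally_administered(mac):
    """True if the locally administered (U/L) bit of the first octet is set."""
    first_octet = int(mac.split(":")[0], 16)
    return bool(first_octet & 0x02)

def keep_frame(sender_mac):
    """Keep only frames whose sender uses a globally administered address."""
    return not is_locally_administered(sender_mac)

# Hypothetical sender addresses; the second one has the U/L bit set.
frames = ["00:1a:2b:3c:4d:5e", "da:a1:19:00:00:01", "b8:27:eb:12:34:56"]
kept = [mac for mac in frames if keep_frame(mac)]
```

Such a filter has to run on the raw address, that is, before the hashing step used for anonymization.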

After gathering the CSV files for a large time span, we moved to an offline analysis platform on a private cluster. The Apache Spark 2.2 software platform was selected as the data processing tool due to its optimized capabilities to work with large amounts of structured data. Such a time span leads to large datasets, which a common centralized system would struggle to process but distributed systems such as Spark can handle easily.

The private cluster is composed of eight HP ProLiant SL250s Gen8 machines, each with two Intel Xeon E5-2630 v2 2.6 GHz processors (6 cores each) and 32 GB of RAM. In addition to Spark, this platform also runs an Apache Hadoop filesystem where the dataset and the results are stored. This is a distributed filesystem which allows all the machines to access the stored data in parallel. The use of such a platform is advisable in order to speed up the analysis, but it is not mandatory, since the algorithms we are going to describe are available in many other software platforms (such as those provided with R or Python).

4. Data Analysis and Results

4.1. Data Processing. Although this work is not a proper data mining process, given that we are not using those kinds of algorithms and analyses, the necessary steps prior to the actual analysis are the same as in a KDD (Knowledge Discovery in Databases) process [67]. For our analysis, we took the aforementioned CSV files, each one containing the data collected by a single sensor during a period of 15 minutes. The observation period used in our analysis is a full year, from January 2016 to December 2016. Over one year, 9 sensors each generating a file every fifteen minutes would create around 315K files. This is a theoretical maximum, because a file is only created if the sensor is active and detects at least one device during the period. The sensors were deployed at the beginning of 2016, but the deployment was not completely finished until March. There were also some holidays during 2016 when ETSIT was fully closed. During those days, some of the sensors detected no devices (although there are always security staff somewhere in the covered area), so they did not generate a file. As a result, the number of files actually available for the analysis is 246K.

Figure 6: Histogram of users by number of registered hours (vertical axis: number of users, logarithmic scale; horizontal axis: hours): (a) raw analysis; (b) first threshold applied.
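The theoretical maximum of around 315K files quoted above can be checked with a short calculation. The sketch below uses 365 days, which reproduces the figure in the text; since 2016 was a leap year, 366 days would give 316,224 instead. The variable names are illustrative only.

```python
SENSORS = 9
FILES_PER_HOUR = 60 // 15   # one file per 15-minute period
HOURS_PER_DAY = 24
DAYS = 365                  # 365 matches the ~315K in the text; 366 gives 316,224

max_files = SENSORS * FILES_PER_HOUR * HOURS_PER_DAY * DAYS
observed_files = 246_000    # approximate figure reported in the text
coverage = observed_files / max_files
```

Under these assumptions, the 246K available files correspond to roughly 78% of the theoretical maximum.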

The first step in preprocessing these data was to merge all the files into a single one, resulting in a 2 GB plain-text CSV file where each row represents the observation of a device during the associated time period, including, among others, the anonymized MAC address, the sensor name, and the detection of the device for a given minute in the observation timespan. With this transformation, the resulting dataset contains 63 million rows, each one representing the detection of a single device in a single minute by a single sensor.

A total of 128K unique devices were detected during the whole year. It is known that the number of people that regularly go to ETSIT is around 4K, so not all of the detected devices can be considered actual users of interest; consequently, the data needed to be filtered. Only the devices of people that perform activities related to the place, such as students or professors, should be taken into account. Thus, it is necessary to filter out devices, keeping only the ones that can be labelled as users. We apply filters based on the observations of each device.

So, for each device, we count the number of minutes it was detected in the entire dataset. A device can be seen by more than one sensor during the same minute, so, to generate this measure, repeated minutes are counted only once per device. Grouping the resulting counts, a histogram (Figure 6) is obtained, on which it is possible to classify the devices based on the total time recorded during the whole year.

To facilitate the analysis of the chart, the horizontal axis is expressed in hours and the count in the vertical axis is shown in logarithmic scale. Around 95% of the devices were seen for less than 48 hours during the whole year, on average less than four hours per month. This group is mainly composed of people passing near the school buildings, momentarily entering the coverage area of the sensors without accessing ETSIT. Figure 6(b) shows the histogram after applying a lower threshold of 48 hours and an upper threshold of 3650 hours, an average of 10 hours per day. In this chart, some isolated peaks can be observed in the tail of the graph, starting around 1000 hours on the horizontal axis. A detailed analysis of these peaks revealed that they were devices that remained connected continuously for several days, like servers. So, we applied the label "user" only to those devices that registered a number of hours during the year between these two thresholds. As shown in Table 3, from the 128K detected devices, only 4653 were classified as users, over which we perform the rest of the analysis.
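The filtering rule described above can be sketched in plain Python as follows. The paper performs this step on Spark; this single-machine version only illustrates the logic. The row layout `(device, sensor, minute)`, the function names, and the choice of inclusive bounds are assumptions made for the example.

```python
from collections import defaultdict

LOWER_HOURS = 48     # lower threshold from the text
UPPER_HOURS = 3650   # upper threshold from the text

def hours_per_device(rows):
    """rows: iterable of (device, sensor, minute). A minute seen by several
    sensors at once is counted only once per device."""
    minutes = defaultdict(set)
    for device, _sensor, minute in rows:
        minutes[device].add(minute)
    return {device: len(seen) / 60 for device, seen in minutes.items()}

def select_users(rows, lower=LOWER_HOURS, upper=UPPER_HOURS):
    """Keep devices whose yearly total lies between the two thresholds."""
    totals = hours_per_device(rows)
    return {d for d, hours in totals.items() if lower <= hours <= upper}
```

Storing the minutes of each device in a set is what removes the duplicates produced when two sensors detect the same device in the same minute.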

Table 3: Dataset statistics before and after preprocessing.

                        Rows             Devices
Before preprocessing    63427489         128188
After preprocessing     41294344 (65%)   4653 (3.6%)

Figure 7: Daily analysis aggregated during 2016: (a) minutes accumulated (×1000); (b) unique users registered (×100); (c) ratio between minutes and users (minutes/user).

The last step before the proper analyses is to merge this dataset with the information about the position and name of the sensors. Additionally, only the data columns with relevant information for the analyses are kept, in order to reduce the in-memory cost:

(1) timestamp: the detection time, measured in minutes

(2) user: the MAC address of the detected device

(3) sensor: the MAC address of the sensor which made the detection

(4) sensorName: the name of the sensor which made the detection

(5) accessPoint: the MAC address of the SSID to which the device is connected, if any

(6) sensorLat: latitude of the sensor's position

(7) sensorLng: longitude of the sensor's position

4.2. Temporal Analysis. Once the data were ready, we performed a temporal study aiming to determine whether the captured data allowed for the identification of significant periods of activity or trends in user behavior in ETSIT. As one year is too long for a minute-level analysis, we aggregated our data by days and by hours.

Figures 7 to 10 are organized into groups of two charts. The first one, the accumulated time chart, shows the number of activity minutes registered by each sensor during a time slice. The second chart presents the number of unique users seen by each sensor during the time slice. In addition, Figures 7 and 8 include a third chart with the ratio of accumulated time over unique users, which provides a hint on how stationary users are: in the ratio chart, peaks represent moments when users are still (e.g., students in class) and dips are associated with transition times (e.g., students arriving at the school).

First, Figure 7 shows an overview of these metrics throughout the year. In this figure, the measurements of the 9 sensors are aggregated into a single line. In the case of Figure 7(a), the result is not exactly equal to the sum of the activity minutes accumulated by each sensor, since a user can be detected in the same minute by different sensors, and these occasions are counted as a single instant in this line.
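The three aggregated metrics (accumulated minutes, unique users, and their ratio) can be sketched as below. This is a simplified single-machine illustration, not the Spark job used in the paper; the row layout `(user, sensor, minute)`, with the minute counted from the start of the observation period, is an assumption.

```python
from collections import defaultdict

MINUTES_PER_DAY = 24 * 60

def daily_metrics(rows):
    """rows: iterable of (user, sensor, minute). Returns
    {day: (accumulated_minutes, unique_users, minutes_per_user)}."""
    instants = defaultdict(set)   # day -> {(user, minute)}, deduplicated across sensors
    users = defaultdict(set)      # day -> {user}
    for user, _sensor, minute in rows:
        day = minute // MINUTES_PER_DAY
        instants[day].add((user, minute))
        users[day].add(user)
    return {
        day: (len(instants[day]), len(users[day]),
              len(instants[day]) / len(users[day]))
        for day in instants
    }
```

Keeping `(user, minute)` pairs in a set is what makes a minute detected by several sensors count as a single instant, as in the aggregated line of Figure 7(a).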

Figure 8: Daily analysis during 2016: (a) minutes accumulated by each sensor (×1000); (b) unique users registered by each sensor (×100); (c) ratio between minutes and users by each sensor (minutes/user). Sensors: Entr A, Entr A Sec, Entr B, Entr B Lab, Entr B Sec, Entr C, Entr D, Library, Std Tables.

The most obvious observation that can be drawn from Figure 7 is the anomalous behavior during the months of January and February. The reason is that, as previously discussed, the sensor network began to be deployed in January 2016 and was not completely operative until March. We keep these data in the analysis because they can be used to compare an anomalous situation with a regular one; also, it is much more intuitive to present a whole-year range instead of nine months. Nevertheless, the data collected in those months are not used to draw conclusions.

Holiday periods are clearly shown in the unique users chart where the line falls, for example, in March (Easter), the summer holidays, or some isolated holidays. These periods can also be seen in the ratio chart, which increases because on those dates far fewer users attend the school, but those who do usually spend long periods of time in the library. A remarkable point is October 12th, on which the ratio chart reaches its highest peak and the number of users is almost zero. That day, even the library was closed, so only the security staff were in the school. We can also observe the effect of weekends on each chart: both the number of users and the accumulated minutes decrease, but the ratio increases for the same reason given for the holidays, namely that users study at the library during the weekends. Finally, between May and June, a decreasing trend can be observed both in the unique users and in the accumulated minutes, which go up again in mid-June for the examination period.

Figure 8 shows a second set of graphs that correspond to a daily analysis of the whole year, with one line per sensor. One of the first conclusions that can be drawn is the difference in proportions between the unique users chart and the accumulated minutes chart. The difference between sensors in the number of registered users is not as remarkable as the difference in the number of accumulated minutes. Again, this is due to the fact that users spend much longer periods of time in the library than in other areas. Observing the line of the sensor of building C in the accumulated minutes chart, it can be seen that the magnitude is maintained throughout the year. This sensor covers a building where there are professors' offices and research laboratories, i.e., this sensor registers mostly users who work at the school, and they maintain a more regular schedule throughout the year than students who go to classes or to the library.

Figure 9: Calendar view (rows: Sun to Sat; columns: Jan to Dec): (a) unique users registered by sensor Entr A Sec; (b) minutes accumulated by sensor Entr A Sec; (c) unique users registered by sensor Entr C; (d) minutes accumulated by sensor Entr C.

Figure 10: Hourly analysis during first week of November (horizontal axis: Mon 05 to Sun 11): (a) minutes accumulated by each sensor (×1000); (b) unique users registered by each sensor (×100).

Table 4: Sensor records collisions.

Collisions   Count      Total      Percent
9            3          27         0.00
8            45         360        0.00
7            421        2947       0.01
6            5371       32226      0.08
5            59222      296110     0.72
4            560994     2243976    5.46
3            2525718    7577154    18.42
2            7268825    14537650   35.35
1            16433862   16433862   39.96

The line charts can be difficult to interpret for periods of time as long as a whole year. To ease the visual analysis, a calendar visualization is offered in which the same data can be studied in a more intuitive way. Figure 9 presents the full-year view in calendar format, with each row representing one day of the week and the months delimited in black. The color intensity of each cell is proportional to the data it represents. Figure 9 is formed by 4 subfigures, which correspond to the count of users and the number of minutes registered by the sensors "Entr A Sec" and "Entr C". These visualizations are relative and can be used to obtain general conclusions. For a detailed study, visualizations with absolute measurements and an examination of the data are still necessary.

Figure 9(a) represents the number of users detected by the sensor "Entr A Sec", which covers the classrooms of this building, and Figure 9(b) the number of minutes accumulated by the same sensor. It is clear that during the first two months of the year this sensor was not operational. On Saturdays and Sundays (last and first rows), this sensor registers almost no activity, and holiday periods are clearly identified, such as Easter in March or the summer holidays in July and August. This visualization also reveals other information that is more difficult to observe in a line graph, such as the fact that Mondays are the days when users spend the least time in this area. Some blank cells also stand out in the last months of the year, revealing that the sensor did not work during those days.

Figures 9(c) and 9(d) represent the number of users detected and the minutes accumulated by the "Entr C" sensor. As mentioned in the description of Figure 8, this sensor shows very stable measures because it covers the research laboratories, and the workers' schedules are not affected by the school calendar, unlike the activity seen by other sensors such as "Entr A Sec".

Figure 10 presents a different view, plotted at the hour level during a week in November. This eases the identification of activity hours, which span from 8:00 in the morning to 22:00 in the evening. Another noticeable effect is the valley in the middle of the day, corresponding to the lunch break, when users move to the cantina (see Figure 2) or leave the buildings. Finally, it is observed that the users leave the school gradually during the afternoon.

4.3. Position Analysis. Before getting into the details and insights obtained from the analysis of the one-year data from the spatial perspective, it is worth mentioning that a set of tests was carried out in a controlled environment during the first stage of the deployment in order to check that the Wi-Fi tracking system worked properly. These tests included tracking a well-known group of MAC addresses throughout the Wi-Fi tracking sensors, checking that they appeared in the appropriate ones. It was also checked that the system correctly located and tracked the security staff throughout their night security tours.

A spatial analysis provides insight on how the users are distributed throughout the buildings during different times of the year. To reach these conclusions, we have improved the method presented in [68]. The first improvement is related to the event in which a user is detected by two or more sensors during the same minute. From now on, this event will be referred to as a collision. The number of collisions is significant in the case of sensors that are close to each other (e.g., library and building D) or sensors that cover transition areas (e.g., Entr A or Std Tables).

Table 4 shows a study of the number of collisions. The order of a collision is the number of sensors that collide for the same minute, the count is the number of user-minute events in which a collision of that order occurs, and the total is the number of rows involved (the count multiplied by the order). Collisions of order 2 and 3 account for more than 50% of the data. The solution to this situation was to eliminate these collisions by replacing, for this experiment, all the rows corresponding to a collision by a single row whose latitude and longitude are the centroid of the positions of the sensors participating in the collision.
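The collision handling described above can be sketched as follows. This is a minimal illustration under stated assumptions: the row layout `(user, minute, sensor)`, the function name, and the use of a plain arithmetic mean of latitudes and longitudes as the centroid (reasonable for an area of a few hectares) are not taken from the paper.

```python
from collections import defaultdict

def resolve_collisions(rows, sensor_positions):
    """rows: iterable of (user, minute, sensor).
    sensor_positions: {sensor: (lat, lng)}.
    Returns {(user, minute): (lat, lng)} with one position per user-minute."""
    grouped = defaultdict(set)
    for user, minute, sensor in rows:
        grouped[(user, minute)].add(sensor)
    resolved = {}
    for key, sensors in grouped.items():
        lats = [sensor_positions[s][0] for s in sensors]
        lngs = [sensor_positions[s][1] for s in sensors]
        # Order-1 "collisions" simply keep the position of their single sensor.
        resolved[key] = (sum(lats) / len(lats), sum(lngs) / len(lngs))
    return resolved
```

After this step every user-minute appears exactly once, so the later position statistics are not biased towards areas covered by several overlapping sensors.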

The second improvement consists in the incorporation of external information with the approximate position of the access points to which the users are connected. This information has been obtained from the Mylnikov Geo API [69], retrieving the position of all the ESSIDs registered throughout the year. In the cases where the user is connected and the approximate position of the access point is available, this new position is used instead of the position of the sensor that generates the row. This allows measuring the user's position more accurately. In addition, it smooths the effect of using only the positions of the sensors, which causes the resulting possible positions to always lie inside the convex hull of the polygon formed by the positions of the sensors.

Figure 11: Monthly centroids heatmaps: (a) June; (b) August; (c) September.

This experiment is based on the concept of the user's centroid. This centroid is the average position of a certain user over a certain period of time. Representing these points on the map reveals hints of the users' behaviors over the studied time slice, based on the number of people in each area.
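A user's centroid, as defined above, is simply the mean of the positions recorded for that user within the time slice. The sketch below assumes one `(user, lat, lng)` observation per user-minute, already restricted to the period of interest; the names and the input layout are hypothetical.

```python
from collections import defaultdict

def user_centroids(observations):
    """observations: iterable of (user, lat, lng), one per user-minute within
    the time slice of interest. Returns {user: (mean_lat, mean_lng)}."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for user, lat, lng in observations:
        acc = sums[user]
        acc[0] += lat
        acc[1] += lng
        acc[2] += 1
    return {user: (s[0] / s[2], s[1] / s[2]) for user, s in sums.items()}
```

Plotting one point per user and accumulating them into a heatmap yields maps of the kind shown for monthly or hourly slices.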

Figure 11 shows heatmaps of three different months. Figure 11(a) is June, a month belonging to the second semester of the academic year and the month when the final exams take place. Figure 11(b) is August, during the summer holidays, when there are no classes or exams and most of the professors, researchers, and staff are on vacation. Figure 11(c) represents September, the start of the semester.

Figure 11(b) confirms one of the facts extracted in the temporal analysis: on holidays, the number of users falls and most of the users present in the school keep studying at the library (the warmest point is located over the library and is much smaller than in other months).

Both Figures 11(a) and 11(c) show that building A is the area with the highest concentration of users. Even so, there are differences between the two figures. In Figure 11(a), the warmest area in the map is over the library. This is explained by the final exam period of June. The same observation can be repeated in other periods of time to discover when the students have exams. In Figure 11(c), the hot spots are on the classrooms. September is the beginning of the academic year, when class attendance is highest. Thus, this observation is an indicator of the level of students' attendance to class.

The same type of analysis can be done with shorter time frames to obtain more detailed behaviors. Figure 12 shows the centroids heatmap grouped by hours on September 2nd.

A detailed analysis by hours like this allows observing the users' movement throughout the day. The first row of maps, in Figures 12(a), 12(b), 12(c), and 12(d), shows the evolution of the users' centroids at lunch time, between 12:00 and 16:00. In this transition, it can be seen that there are hot spots on the classrooms and the library at the beginning. On the next map, those centroids move to the cantina (see Figure 2), and in the last map they return to their original positions. The second row of maps in Figure 12 presents the start and the end of the activity time in the school. In Figures 12(e) and 12(f), it can be observed how the first users of the day go directly to classes. By contrast, Figures 12(g) and 12(h) reveal that users tend to be at the library at the end of the day.

Finally, Table 5 presents the count of the different users detected by each sensor throughout the year. Recalling that the total number of users obtained in Section 4.1 is 4653, the data in the table reveal that, over the year, most users have been seen by each sensor at some time. The two exceptions are the sensor of building C and the sensor of the laboratories of building B. These sensors cover the professors' offices and research laboratories, so it is unusual for students to stay in those areas.

Figure 12: Hourly centroids heatmaps of 2nd September: (a) 12:00 to 13:00; (b) 13:00 to 14:00; (c) 14:00 to 15:00; (d) 15:00 to 16:00; (e) 8:00 to 9:00; (f) 9:00 to 10:00; (g) 19:00 to 20:00; (h) 20:00 to 21:00.

Table 5: Sensor annual statistics.

Sensor        Number of rows   Number of users
Library       14166793         4674
Entr D        6818165          4621
Std Tables    5625974          4670
Entr A        4736237          4620
Entr A Sec    2352019          4579
Entr B Sec    2258618          4530
Entr B Lab    2229560          3630
Entr C        1469026          3093
Entr B        1248462          4127

4.4. Behavior Analysis. The third set of experiments we performed deals with the behaviors that each user follows throughout a single day. To obtain them, we grouped the data using a user-day key. For each key, a vector of 24 positions, one per hour, is created. In each position of this vector, we store the sensor that has detected this user most of the time during that hour. This vector represents, therefore, the route that the user followed throughout that day, hour by hour. Once the behavior vectors are obtained, the information of the day and the user is discarded to count the most repeated behavior vectors. The dimensionality of these vectors makes the number of possible behaviors huge, theoretically $24^{10} \approx 6 \times 10^{13}$, or $14^{10} \approx 289$K million using only the activity hours, but it is known that users behave similarly, so a much lower number of behaviors can be expected, even though it will still be a high number.
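The construction of the behavior vectors and the subsequent frequency count can be sketched as below. This is an illustrative single-machine version, not the Spark implementation used in the paper; the row layout `(user, day, hour, sensor)` with one row per detected minute, the use of `None` for the empty category, and the arbitrary tie-breaking between sensors with equal minutes are assumptions.

```python
from collections import Counter, defaultdict

HOURS = 24
EMPTY = None  # empty category: user not detected during that hour

def behavior_vectors(rows):
    """rows: iterable of (user, day, hour, sensor), one per detected minute.
    Returns {(user, day): tuple of 24 entries, each a sensor name or EMPTY}."""
    counts = defaultdict(Counter)  # (user, day, hour) -> sensor -> minutes
    for user, day, hour, sensor in rows:
        counts[(user, day, hour)][sensor] += 1
    vectors = defaultdict(lambda: [EMPTY] * HOURS)
    for (user, day, hour), per_sensor in counts.items():
        # Sensor that detected the user for most minutes of this hour.
        vectors[(user, day)][hour] = per_sensor.most_common(1)[0][0]
    return {key: tuple(vec) for key, vec in vectors.items()}

def most_repeated(vectors, top=20):
    """Drop the user-day keys and rank behaviors by frequency."""
    return Counter(vectors.values()).most_common(top)
```

Converting each vector to a tuple makes it hashable, so identical behaviors from different users or days fall into the same counter entry.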

Table 6 presents the 20 most repeated behaviors throughout the year. A total of 285K behaviors are detected, of which 139K are unique. The 500 most frequent behaviors account for 25% of the total behaviors. Table 6 shows only the part corresponding to the activity hours of the school, which, as observed in the temporal analysis, span from 8:00 am to 10:00 pm. As can be seen, the majority of the most repeated behaviors are periods of between two and five hours of stay in the library. It is necessary to expand the analysis to the top 20 to observe the class attendance behaviors. It is easy to appreciate that the different behaviors usually take place either in the morning or in the afternoon, with lunchtime from 1:00 pm to 3:00 pm, which means that the majority of users go either in the morning or in the afternoon, but they do not spend all day at school.

Table 6: Top 20 most repeated behaviors over the hourly slots from 8:00-9:00 to 21:00-22:00. For each behavior, the sensor and the number of hourly slots in which it appears are listed.

Rank   Sensor   Hourly slots
1      Lib      5
2      Lib      4
3      EntA     1
4      Lib      3
5      Lib      4
6      Lib      1
7      EntA     1
8      Lib      6
9      Std      14
10     EntA     1
11     EntA     1
12     Lib      5
13     Lib      2
14     Lib      2
15     EntA     1
16     EntA     2
17     EntA     1
18     Lib      2
19     SecA     4
20     EntA     1

Lib = Library; EntA = Entr A; SecA = Entr A Sec; Std = Std Tables.

Another quite obvious observation is that many of the behaviors obtained are very similar to one another (e.g., going to the library from 9:00 am to 2:00 pm or going to the library from 9:00 am to 1:00 pm). In order to lower this redundancy, we performed a clustering procedure on them. Each behavior can be understood as a categorical vector of length 24, in which the categories correspond to the 9 possible sensors of the data set plus the empty category. The chosen clustering algorithm is Proximus [70], due to its simplicity, efficiency, scalability, and the reliability of its results. The algorithm works with binary vectors and creates clusters based on the Hamming distance (the number of bits that differ between two binary vectors). A vector is chosen as the center of the cluster, and other vectors whose distance to the center is smaller than the maximum cluster radius are added to that cluster.
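To make the radius idea concrete, the sketch below groups binary vectors around centers using the Hamming distance. It is a deliberately simplified greedy procedure written for illustration and is not the Proximus algorithm itself; the function names and the sample vectors are assumptions.

```python
def hamming(u, v):
    """Number of positions at which two equal-length binary vectors differ."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(a != b for a, b in zip(u, v))

def radius_clusters(vectors, max_radius):
    """Greedy grouping by Hamming distance (illustrative, not Proximus).

    Each vector joins the first existing center whose distance is smaller
    than max_radius; otherwise it becomes the center of a new cluster.
    """
    clusters = []  # list of (center, members) pairs
    for vec in vectors:
        for center, members in clusters:
            if hamming(center, vec) < max_radius:
                members.append(vec)
                break
        else:
            clusters.append((vec, [vec]))
    return clusters

# Hypothetical 4-bit vectors: two groups of near-identical patterns.
vectors = [(1, 1, 0, 0), (1, 0, 0, 0), (0, 0, 1, 1), (0, 1, 1, 1)]
clusters = radius_clusters(vectors, max_radius=2)
```

With a radius of 2, vectors that differ in a single bit from a center fall into the same cluster, which mirrors how near-duplicate behaviors (for example, stays that differ by one hour) end up grouped together.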

The behaviors are expressed as categorical vectors, so it is necessary to transform them into binary vectors to be used in Proximus. The transformation shown in Equation (1) is proposed. This transformation is simple and also reversible, which allows for the recovery of the original behaviors after clustering.

\[
\begin{aligned}
B &= \{h_0, \ldots, h_{23}\}, \quad h_i \in \{0, a_1, \ldots, a_9\} \\
\vec{B} &= \{\delta_{0 a_1}, \ldots, \delta_{0 a_9}, \ldots, \delta_{23 a_1}, \ldots, \delta_{23 a_9}\} \\
\delta_{i a_j} &=
\begin{cases}
0 & \text{if } h_i \neq a_j \\
1 & \text{if } h_i = a_j
\end{cases}
\end{aligned}
\tag{1}
\]

Where $a_1, \ldots, a_9$ are the nine sensors, $B$ is the behavior vector described above, and $\vec{B}$ is the transformed vector of length $24 \times 9$, made out of subgroups of 9 samples, each subgroup associated with one hour. Every sample in each subgroup is 0, except for the one at the index of the sensor active in that hour, which is marked with a 1.
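A minimal sketch of the transformation in Equation (1) and its inverse is given below. The ordering of the sensors in `SENSORS` (that is, which sensor plays the role of each $a_j$) and the use of `None` for the empty category are assumptions made for the example.

```python
# The nine sensors a_1..a_9; the order is an assumption for illustration.
SENSORS = ["Library", "Entr A", "Entr A Sec", "Entr B", "Entr B Sec",
           "Entr B Lab", "Entr C", "Entr D", "Std Tables"]

def encode(behavior):
    """Map a 24-slot categorical vector to a 24*9 binary vector.

    Each hour contributes 9 bits; the bit of the active sensor is 1 and
    all bits are 0 when the hour is empty (None).
    """
    if len(behavior) != 24:
        raise ValueError("a behavior must have 24 hourly slots")
    bits = []
    for hour_value in behavior:
        bits.extend(1 if hour_value == sensor else 0 for sensor in SENSORS)
    return bits

def decode(bits):
    """Recover the categorical vector from its binary form."""
    n = len(SENSORS)
    behavior = []
    for start in range(0, len(bits), n):
        group = bits[start:start + n]
        behavior.append(SENSORS[group.index(1)] if 1 in group else None)
    return behavior

# Hypothetical behavior: two hours in the library in the morning.
behavior = [None] * 24
behavior[9] = "Library"
behavior[10] = "Library"
bits = encode(behavior)
```

Because at most one bit per hour is set, two behaviors that differ in one hour differ in at most two bits, so the Hamming distance on the encoded vectors stays closely tied to the number of differing hours.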

Table 7 shows the results of applying the Proximus clustering to the behaviors found. The first column is the center of the cluster, the second the number of distinct behaviors grouped in that cluster, and the third the number of recorded behavior occurrences that fall within that cluster. The results are ordered by the number of behaviors included in the cluster. A total of 45,707 clusters are obtained. The first 300 group 50% of the behaviors recorded throughout the year. The most important cluster behaviors in Table 7 can be understood as follows:


Table 7: Behavior clusters.

#    Cluster center                                            Cluster components   Behaviors count
1    Entr A Sec from 9:00 to 14:00                             415                  4963
2    Entr B Sec from 10:00 to 13:00                            458                  3756
3    Entr B Sec from 15:00 to 19:00                            471                  2924
4    Library from 9:00 to 13:00                                256                  2870
5    Library from 10:00 to 12:00                               176                  2396
6    Library from 18:00 to 21:00                               244                  2231
7    Entr A 13:00                                              787                  1979
8    Library 12:00                                             563                  1894
9    Library from 18:00 to 19:00 and from 20:00 to 21:00       175                  1853
10   Entr B Lab 11:00 to 20:00                                 568                  1831

(i) Clusters 1, 2, and 3: students attending classes. These sensors cover the main classrooms in the school, and the intervals match the class schedule.

(ii) Clusters 4, 5, 6, and 9: students at the library.

(iii) Cluster 7: users who are usually outside the sensors' coverage areas, arriving or leaving.

(iv) Cluster 10: laboratory equipment and professors. This sensor covers some of the professors' offices and laboratories, which usually have some laboratory equipment connected during work hours.

5. Validation of Occupancy Estimation Based on Wi-Fi Tracking

Although, as already pointed out in Section 4.3, it was tested at the very first stage of the deployment that the Wi-Fi sensors properly locate and track well-known MAC addresses, it was still necessary to validate the accuracy of the system for estimating occupancy (as it can be distorted by the aforementioned fact that a single user can carry several devices connected to Wi-Fi networks). As no ground-truth data were available for the full set of buildings and only some data were found for the library, we centered our validation efforts on comparing our data with the available ones: if we can trust our results in that area, then we can extend our trust to the rest of the areas, for which no well-known data are available.

The library of ETSIT has 408 study seats and offers a web service to check the number of available seats at a given moment [71]. This system is based on two sources of information: a person who counts the empty seats every opening hour from Friday to Sunday and a video camera located at the main entrance of the library that counts the number of people entering or leaving at 15-minute intervals. The human system provides a ground truth about the number of occupied seats, but this measure is very different from the actual number of people in the library, since a common situation, especially during examination terms, is that students place their study material at the seat to reserve it while they are not in the library. This situation is a problem for the library staff, and therefore they installed the video camera system to count the students' entrances and exits.

This system generates an estimation of the number of people in the library by adding to the previous measure the number of people detected entering and subtracting the number of people detected leaving. The system is not perfect, and in most cases it carries an accumulated error that grows over the day in the estimation of the number of people in the library. The total error can be calculated clearly at the end of the day, when the library closes and the number of people inside is supposed to be zero. In summer there is a situation that aggravates this error and, consequently, the measure achieved by this system: due to the rise in temperature, the back door of the library is opened to improve ventilation, which allows students to exit through it, although they must continue entering through the main door. The camera does not count students leaving through this back door.
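The accumulation of error in such an entry/exit counter can be illustrated with a small sketch. The function name and all the numbers below are made up for the example; they are not measurements from the library.

```python
def camera_occupancy(entries, exits, start=0):
    """Running occupancy estimate: previous value + entries - exits."""
    estimate = start
    series = []
    for entered, left in zip(entries, exits):
        estimate += entered - left
        series.append(estimate)
    return series

# Hypothetical counts per interval over one day.
entries = [40, 25, 10, 5]
true_exits = [0, 10, 30, 40]      # everyone has left by closing time
missed_exits = [0, 2, 8, 10]      # exits the camera did not see (e.g., back door)
seen_exits = [t - m for t, m in zip(true_exits, missed_exits)]

series = camera_occupancy(entries, seen_exits)

# At closing time the true occupancy is zero, so the last estimate
# equals the error accumulated during the day.
accumulated_error = series[-1]
```

In this toy case the residual at closing time equals the total number of missed exits, which is why the end-of-day value is a convenient measure of the camera system's cumulative error.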

The library staff provided us with the data collected by the two systems (human and camera) between June 5th and 30th. These measurements can be compared with those obtained by the Wi-Fi sensor installed at the library to validate them. For this test, all the data collected by the sensor are used, without filtering the MACs of the sporadic users as explained before.

Figure 13 shows the data collected by the three systems on Sunday, June 5th. This is the first day with data from the three sources. Other days in which these three sources are present show the same trends. It is clear that there is a divergence between the human system observation and the rest of the data. The graph of the human system shows that the number of occupied seats increases in the first hours up to the maximum and remains steady until the end of the day, without being affected by the behavior of the users at lunchtime. However, this effect is reflected in the camera system and Wi-Fi tracking measurements. We observed that, in general, the number of people accounted for by the camera system is below the Wi-Fi tracking system measure, although the proportion is maintained over time. Finally, the figure shows the cumulative error effect of the camera system, which at the end of the day still renders 65 people in the library.

Figure 13: Measure of library occupation using Wi-Fi, camera, and human systems (number of users per hour, from 10:00 to 22:00).

To better study the relation between the camera system and the Wi-Fi tracking system, we generated a detailed visualization that allows us to observe the data of the whole month in a single figure. Figure 14 is composed of 3 subfigures, each of which is a matrix of colored cells: the rows represent a full day and the columns are divisions of 15 minutes. The color intensity is proportional to the number of users measured in each interval, being more intense when more users are detected. Figure 14(a) shows the measurements of the Wi-Fi sensor. The behavior is the one observed in previous sections of this document. Figure 14(b) shows the measurements of the camera system. During the first half of the day, the same trend as in Figure 14(a) can be observed, but in the afternoon and at night the accumulated error begins to be appreciable. This error reaches a maximum of 378 people, with an average of 139 people at the end of the day, while the Wi-Fi tracking system registers a maximum of 10 and an average of 7 at the end of the day (these are machines that are still turned on when the library is empty of people). Finally, Figure 14(c) shows the difference between the two previous ones, using green when the difference is positive (the Wi-Fi tracking system registers a higher value than the camera system) and red otherwise. This figure validates the measurements obtained by the Wi-Fi tracking system, since the difference with the measure of the camera system in the first half of the day is small (an average of 27 people counting the data from 8:00 to 16:00), whereas in the afternoon the error accumulated by the camera system makes the average difference grow to 83 people between 16:00 and 22:00.

Figure 14: Measures every 15 minutes of library occupation during June: (a) Wi-Fi records; (b) camera records; (c) difference between Wi-Fi and camera.
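The comparison between the Wi-Fi and camera counts over a time window can be sketched as below. The data layout (dictionaries keyed by `(hour, minute)`), the function name, and the sample values are assumptions; the text does not state whether the reported averages use signed or absolute differences, so the signed difference (Wi-Fi minus camera) is used here as one possible reading.

```python
from statistics import mean

def mean_difference(wifi, camera, start_hour, end_hour):
    """Average of (wifi - camera) over the 15-minute intervals whose hour
    lies in [start_hour, end_hour) and that are present in both series."""
    diffs = [wifi[key] - camera[key]
             for key in sorted(wifi.keys() & camera.keys())
             if start_hour <= key[0] < end_hour]
    if not diffs:
        raise ValueError("no common intervals in the requested window")
    return mean(diffs)

# Hypothetical counts keyed by (hour, minute).
wifi = {(9, 0): 100, (9, 15): 120, (17, 0): 150, (17, 15): 140}
camera = {(9, 0): 90, (9, 15): 100, (17, 0): 200, (17, 15): 210}

morning = mean_difference(wifi, camera, 8, 16)
afternoon = mean_difference(wifi, camera, 16, 22)
```

A positive value corresponds to the green cells of Figure 14(c) and a negative value to the red ones.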

The Wi-Fi sensor of the library was not placed for the specific purpose of counting the number of people in the library and, in consequence, its coverage area can include devices that are outside the library. This explains why in some cases it registers a measurement greater than that of the camera system. In any case, the Wi-Fi sensor, with its limitations, registers a measure very similar to that of the camera-based people counting system, while avoiding the cumulative error of that system and being much cheaper (tens of euros against thousands of euros) and less intrusive than a video camera. Currently, since the camera system is already installed, the Wi-Fi sensor measures can be used as a maximum, which would limit the error made by this system. In summary, the usage of a Wi-Fi sensor to count people in the library provides accurate results despite the potential errors induced by the sensing period, collisions, ratio between users and MACs seen, and data processing performed. These results invite us to trust that the results of our analyses for the rest of the areas of the campus are also correct.

6. Discussion and Conclusions

In the execution of this case study, we have learned some lessons about the limitations we faced that are worth taking into account for similar systems and analyses. First, there is the key issue of the sensor deployment: it is not just a question of density (number of sensors per unit of surface) but of adapting its design to the topology of the place to be monitored. So, for example, it is crucial to have sensors not only at the entrances and exits of buildings but also inside them, as in the considered deployment, where there are sensors placed at the entrances and exits of the buildings but also in other especially relevant places, such as the library or the work-in-group area.

A complete coverage of indoor devices, without collisions and for the complete timespan of the stay indoors, would require a large deployment with many sensors able to cover all corners. This is not suitable in this context; instead, we relied on 9 sensors, some at gates and others in large rooms, which proved to be enough to check if a device is within the reach of any of them. But some assumptions had to be made: we considered as valid behaviors only those that spent a significant amount of time once they had been seen (to remove transient behaviors and those of people walking outside the buildings); also, we discarded behaviors that did not appear enough times in the yearly timespan. On the other hand, the topology of these buildings did not impose severe restrictions on the sensors' coverage. In fact, we found several devices seen by different sensors at once. In summary, we can conclude that the finer the spatial granularity (more sensors), the larger the set of different behaviors, so using raw data from sensors would lead to an explosion of states that would render meaningless results.

The gathered data is another key issue: some works in the literature obtained signal strength measurements every few seconds, allowing them to identify indoor trajectories. This was not our case, and in fact this proved to be very limiting. Thus, the precision of our analysis is limited by the regions covered by each sensor, which prevents us from analyzing any kind of movement within regions. Nevertheless, as has been seen, relevant results can still be obtained without the increase in energy consumption and the risk of flooding the school communications network that the other approach would entail.

As regards the analysis of data to identify users' behaviors, we have to indicate that a long observation period is a must. Obtaining data for a complete work cycle (in this case, a full school year) helps in discovering common behaviors that happen in a university. Using long observation times can help in detecting erroneous or atypical operation of the sensors, as happened in the first months of the year in this case. Getting even larger observation datasets would reinforce the behaviors we have found (e.g., seasonality), but the chance of discovering new ones is negligible, because the expected behaviors on the campus will be periodic with the school year.

The usage of big data platforms for analysis, while not mandatory, eases the management of large datasets and the execution of iterative studies on the data. The preprocessing work carried out allowed us to clean and filter our data. For example, some discovered behaviors are too regular and extended over time, which may correspond to servers or machines, which we could then filter out.

The temporal analysis has revealed some behaviors that are maintained throughout the year and others that occur occasionally. The work has focused on the study of behaviors that are repeated and maintained throughout the year. A closer view, such as the analysis per hour, shows the hours of activity, nocturnal patterns, or movements at mealtime. In the daily view, the difference in activity between work days and weekends or holidays is clearly seen. Visualization has been a key technique for detecting and understanding these temporal patterns.

The spatial analysis revealed that a collision problem occurs in user detection, but its effect was mitigated by the use of innovative algorithms and techniques, such as the calculation of centroids and the combination of sensor information with external sources of information (i.e., semantic trajectories), so a much more precise positioning of the users is achieved than with the exclusive use of the sensors. The visualization of centroids fostered the visual understanding of complex data, such as the position of each user over a month, and the superposition of the centroids in a heatmap allowed us to know the movements of groups of people and the occupation of the different zones.


Furthermore, we have found the dominant users' behaviors, as the most repeated behaviors registered by the sensors throughout the year. The number of behaviors found was huge, but the application of the Proximus clustering algorithm reduced this number to a manageable amount. Then, the study of the obtained clusters revealed that the most frequent behaviors coincide with what can be expected from a college building: researchers working in laboratories and students attending classes or studying in the library.

Although some of the discovered behaviors and patterns can be seen as common knowledge, it is worth stressing that they do bring value, since they represent numerical evidence that supports decision making (e.g., someone may think that the Wi-Fi access in a given area does not work properly because it is always overcrowded, but numerical evidence is needed to appropriately justify the investment of increasing the number of APs of the corporate WLAN in that area to improve the service). In addition, such well-known patterns, when obtained automatically by processing the available data, become baseline models which can be used to detect anomalies or atypical situations, as is common practice in unsupervised machine learning.

Lastly, we have validated the accuracy of using Wi-Fi tracking for occupancy estimation, comparing it with the library staff manual counting (considered as ground truth) and with a video camera system installed at the library main entrance. As a main finding, Wi-Fi tracking has proved to be more accurate than the video camera system, in addition to being far cheaper. As a matter of fact, the library staff currently prefers the occupancy estimation based on Wi-Fi tracking to the one based on the video camera system. Nevertheless, the accuracy of the Wi-Fi tracking system can be further improved by correlating MAC addresses detected in the same places over sufficiently long periods of time and considering only one of them, thus mitigating the issue related to the fact that a single person can carry several devices connected to Wi-Fi networks.
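One possible way to approach the MAC correlation suggested above is sketched here. It is only an illustration of the idea and not a method evaluated in this work: it groups MAC addresses whose sets of sightings are exactly identical, and the data layout, function name, and `min_shared` threshold are assumptions.

```python
from collections import defaultdict

def group_devices(sightings, min_shared=3):
    """Group MAC addresses that were seen in exactly the same
    (sensor, time slot) pairs, provided they have at least min_shared
    sightings. Devices with fewer sightings are kept on their own.

    sightings: dict mapping MAC -> set of (sensor, time_slot) tuples.
    Returns a list of groups, each a list of MAC addresses.
    """
    groups = defaultdict(list)
    for mac, seen in sightings.items():
        # Identical, sufficiently long sighting sets share one key;
        # short ones use the MAC itself so they are never merged.
        key = frozenset(seen) if len(seen) >= min_shared else mac
        groups[key].append(mac)
    return list(groups.values())

# Hypothetical sightings: "aa" and "bb" always appear together.
sightings = {
    "aa": {("Library", 9), ("Library", 10), ("Library", 11)},
    "bb": {("Library", 9), ("Library", 10), ("Library", 11)},
    "cc": {("Entr A", 13)},
}
groups = group_devices(sightings)
estimated_people = len(groups)
```

Requiring exact equality is a strict criterion chosen to keep the sketch short; a real analysis would probably tolerate small mismatches, for instance by thresholding a similarity measure between sighting sets.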

Besides this, the Wi-Fi tracking system is currently used by the library staff to perform more sophisticated studies, such as figuring out the percentage of students from the different schools of the university who come to study at the library of the Telecommunications Engineering School during the weekend. Figure 15 shows the results of such an analysis, which is an example of how this kind of IoT system can help solve real-life problems and improve the operation of already running services.

Figure 15: Users who only go to the library of the Telecommunication School on weekends, classified by their school (CIVIL, INDUS, NAVAL, SPORT, ARQUI, FORES, EDIF, AERO, TELECOM).

To summarize, we have studied a one-year Wi-Fi tracking dataset obtained from a reduced set of low-cost sensors with limited capabilities deployed on an actual university campus that receives around 4000 people every day. We have processed the data in order to identify traces of Wi-Fi-enabled mobile devices, which are identified as people moving in the campus buildings, and then we have extracted people's stays, movements, and common behaviors. The obtained results represent numerical evidence that illustrates how a low-cost Wi-Fi tracking system can be used in real-life conditions to improve or optimize the operation of the monitored premises. These results can allow dimensioning appropriately the WLAN infrastructure or the canteen personnel, or detecting anomalous situations in real time. Furthermore, the data from the people flow monitoring system are currently being used together with the data from the environmental monitoring system to try to reduce the environmental footprint of the school [72]. In addition, the Wi-Fi tracking system is actually used by the library staff in their day-to-day activity, which illustrates the value that this kind of IoT infrastructure can bring to real-life problems and services.

Data Availability

The dataset with the Wi-Fi sensor records used to support the findings of this study is available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work has been partly funded by Universidad Politécnica de Madrid through the project RES2+U (Responsables, Sostenibles y Universitarios) (http://blogs.upm.es/res2masu). The work performed by José M. Navarro was funded by the Ministerio de Educación, Cultura y Deporte de España under Grant no. FPU 14/03209.

References

[1] “2018 Revision of World Urbanization Prospects - Multimedia Library - United Nations Department of Economic and Social Affairs,” https://www.un.org/development/desa/publications/2018-revision-of-world-urbanization-prospects.html.
[2] M. Pallot, “Engaging Users into Research and Innovation: The Living Lab Approach as a User Centred Open Innovation Ecosystem,” https://web.archive.org/web/20120509081658/http://www.cwe-projects.eu/pub/bscw.cgi/1760838?id=715404_1760838.
[3] E. Almirall and J. Wareham, “Living Labs: arbiters of mid- and ground-level innovation,” Technology Analysis & Strategic Management, vol. 23, no. 1, pp. 87–102, 2011.
[4] “TUDelft Green office,” https://www.tudelft.nl/sustainability.
[5] “UBC Campus as a living laboratory,” https://sustain.ubc.ca/our-commitment/campus-living-lab.
[6] “Harvard Sustainability,” https://green.harvard.edu.
[7] “UPM City of the Future initiative,” http://blogs.upm.es/cityofthefuture-upm/en/initiative.
[8] M. Alvarez-Campana, G. Lopez, E. Vazquez, V. A. Villagra, and J. Berrocal, “Smart CEI Moncloa: An IoT-based platform for people flow and environmental monitoring on a Smart University Campus,” Sensors, vol. 17, no. 12, 2017.

[9] M. B. Kjærgaard, H. Blunck, T. Godsk, T. Toftkjær, D. L. Christensen, and K. Grønbæk, “Indoor positioning using GPS revisited,” Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics): Preface, vol. 6030, pp. 38–56, 2010.
[10] A. Belmonte-Hernandez, G. Hernandez-Penaloza, F. Alvarez, and G. Conti, “Adaptive Fingerprinting in Multi-Sensor Fusion for Accurate Indoor Tracking,” IEEE Sensors Journal, vol. 17, no. 15, pp. 4983–4998, 2017.
[11] M. S. Kristoffersen, J. V. Dueholm, R. Gade, and T. B. Moeslund, “Pedestrian counting with occlusion handling using stereo thermal cameras,” Sensors, vol. 16, no. 1, 2016.
[12] L. Zheng, X. Ruan, Y. Chen, and M. Huang, “Shadow removal for pedestrian detection and tracking in indoor environments,” Multimedia Tools and Applications, vol. 76, no. 18, pp. 18321–18337, 2017.
[13] T. Li, H. Chang, M. Wang, B. Ni, R. Hong, and S. Yan, “Crowded scene analysis: a survey,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 25, no. 3, pp. 367–386, 2015.
[14] F. Adib and D. Katabi, “See through walls with WiFi,” in Proceedings of the Annual Conference of the ACM Special Interest Group on Data Communication on the Applications, Technologies, Architectures, and Protocols for Computer Communication, ACM SIGCOMM 2013, pp. 75–86, China, August 2013.
[15] Z. Kabelac, D. Katabi, and R. C. Miller, “3D Tracking via Body Radio Reflections,” in Proceedings of the 11th USENIX Conference on Networked Systems Design and Implementation, pp. 317–329, 2014.
[16] F. Adib, Z. Kabelac, and D. Katabi, “Multi-person localization via RF body reflections,” in Proceedings of the 12th USENIX Symposium on Networked Systems Design and Implementation, NSDI 2015, pp. 279–292, USA, May 2015.
[17] Z.-A. Deng, G. Wang, D. Qin, Z. Na, Y. Cui, and J. Chen, “Continuous indoor positioning fusing WiFi, smartphone sensors and landmarks,” Sensors, vol. 16, no. 9, 2016.
[18] R. A. Becker, R. Caceres, K. Hanson et al., “A tale of one city: Using cellular network data for urban planning,” IEEE Pervasive Computing, vol. 10, no. 4, pp. 18–26, 2011.
[19] J. E. Mallah, F. Carrino, O. A. Khaled, and E. Mugellini, “Crowd monitoring: critical situations prevention using smartphones and group detection,” Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics): Preface, vol. 9189, pp. 496–505, 2015.
[20] J. Kuang, X. Niu, and X. Chen, “Robust Pedestrian Dead Reckoning Based on MEMS-IMU for Smartphones,” Sensors, vol. 18, no. 5, p. 1391, 2018.
[21] A. Kurkcu and K. Ozbay, “Estimating Pedestrian Densities, Wait Times, and Flows with Wi-Fi and Bluetooth Sensors,” Transportation Research Record, vol. 2644, no. 1, pp. 72–82, 2017.
[22] A. Kotanen, M. Hannikainen, H. Leppakoski, and T. Hamalainen, “Positioning with IEEE 802.11b wireless LAN,” in Proceedings of the 14th IEEE 2003 International Symposium on Personal, Indoor and Mobile Radio Communications, vol. 3, pp. 2218–2222, Beijing, China, 2003.
[23] W. Ho, A. Smailagic, D. P. Siewiorek, and C. Faloutsos, “An adaptive two-phase approach to WiFi location sensing,” in Proceedings of the 4th Annual IEEE International Conference on Pervasive Computing and Communications Workshops, PerCom Workshops 2006, pp. 452–456, Italy, March 2006.
[24] F. Evennou and F. Marx, “Advanced integration of WiFi and inertial navigation systems for indoor mobile positioning,” EURASIP Journal on Applied Signal Processing, vol. 2006, Article ID 86706, 11 pages, 2006.
[25] J. Rekimoto, T. Miyaki, and T. Ishizawa, “LifeTag: WiFi-based continuous location logging for life pattern analysis,” LNCS, vol. 4718, pp. 35–49, 2007.
[26] J. A. Besada, A. M. Bernardos, P. Tarrío, and J. R. Casar, “Analysis of tracking methods for wireless indoor localization,” in Proceedings of the 2nd International Symposium on Wireless Pervasive Computing (ISWPC ’07), pp. 492–497, February 2007.
[27] O. Woodman and R. Harle, “Pedestrian localisation for indoor environments,” in Proceedings of the 10th International Conference on Ubiquitous Computing (UbiComp ’08), pp. 114–123, Seoul, Republic of Korea, September 2008.
[28] F. Aloul, A. Sagahyroon, A. Al-Shami, I. Al-Midfa, and R. Moutassem, “Using mobiles for on campus location tracking,” in Proceedings of the 7th International Conference on Advances in Mobile Computing and Multimedia, MoMM2009, pp. 231–235, Malaysia, December 2009.
[29] S. Woo, S. Jeong, E. Mok et al., “Application of WiFi-based indoor positioning system for labor tracking at construction sites: A case study in Guangzhou MTR,” Automation in Construction, vol. 20, no. 1, pp. 3–13, 2011.
[30] N. Le Dortz, F. Gain, and P. Zetterberg, “WiFi fingerprint indoor positioning system using probability distribution comparison,” in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP ’12), pp. 2301–2304, March 2012.
[31] Z. Chen, H. Zou, H. Jiang, Q. Zhu, Y. C. Soh, and L. Xie, “Fusion of WiFi, smartphone sensors and landmarks using the Kalman filter for indoor localization,” Sensors, vol. 15, no. 1, pp. 715–732, 2015.
[32] Y. Shu, C. Bo, G. Shen, C. Zhao, L. Li, and F. Zhao, “Magicol: indoor localization using pervasive magnetic field and opportunistic WiFi sensing,” IEEE Journal on Selected Areas in Communications, vol. 33, no. 7, pp. 1443–1457, 2015.
[33] A. Danalet, B. Farooq, and M. Bierlaire, “A Bayesian approach to detect pedestrian destination-sequences from WiFi signatures,” Transportation Research Part C: Emerging Technologies, vol. 44, pp. 146–170, 2014.


[34] O. Czogalla and S. Naumann, “Pedestrian indoor navigation for complex public facilities,” in Proceedings of the 2016 International Conference on Indoor Positioning and Indoor Navigation, IPIN 2016, pp. 1–8, Spain, October 2016.
[35] L. Schauer, P. Marcus, and C. Linnhoff-Popien, “Towards feasible Wi-Fi based indoor tracking systems using probabilistic methods,” in Proceedings of the 2016 International Conference on Indoor Positioning and Indoor Navigation, IPIN 2016, pp. 1–8, Spain, October 2016.
[36] F. Meneses and A. Moreira, “Large scale movement analysis from WiFi based location data,” in Proceedings of the 2012 International Conference on Indoor Positioning and Indoor Navigation, IPIN 2012, Australia, November 2012.
[37] B. Bonne, A. Barzan, P. Quax, and W. Lamotte, “WiFiPi: involuntary tracking of visitors at mass events,” in Proceedings of the IEEE 14th International Symposium on a World of Wireless, Mobile and Multimedia Networks (WoWMoM ’13), pp. 1–6, Madrid, Spain, June 2013.
[38] M. B. Kjaergaard, M. Wirz, D. Roggen, and G. Troster, “Mobile sensing of pedestrian flocks in indoor environments using WiFi signals,” in Proceedings of the 10th IEEE International Conference on Pervasive Computing and Communications (PerCom ’12), pp. 95–102, Lugano, Switzerland, March 2012.
[39] S. Sendra, M. Garcia, C. Turro, and J. Lloret, “People mobility behaviour study in a university campus using WLANs,” in Proceedings of the 3rd International Conference on Mobile Ubiquitous Computing, Systems, Services, and Technologies, UBICOMM 2009, pp. 124–129, Malta, October 2009.
[40] J. Scheuner, G. Mazlami, D. Schoni et al., “Probr - A Generic and Passive WiFi Tracking System,” in Proceedings of the 41st IEEE Conference on Local Computer Networks, LCN 2016, pp. 495–502, UAE, November 2016.
[41] Y. Li, S. Williams, B. Moran, and A. Kealy, “Quantized RSS Based Wi-Fi Indoor Localization with Room Level Accuracy,” in Proceedings of the International Global Navigation Satellite Systems 2018, 2018.
[42] A. Alessandrini, C. Gioia, F. Sermi, I. Sofos, D. Tarchi, and M. Vespe, “WiFi positioning and Big Data to monitor flows of people on a wide scale,” in Proceedings of the 25th European Navigation Conference, ENC 2017, pp. 322–328, Switzerland, May 2017.
[43] L. Vu, K. Nahrstedt, S. Retika, and I. Gupta, “Joint bluetooth/wifi scanning framework for characterizing and leveraging people movement in university campus,” in Proceedings of the 13th ACM International Conference on Modeling, Analysis, and Simulation of Wireless and Mobile Systems (MSWiM ’10), pp. 257–265, October 2010.
[44] M. Garcia, S. Sendra, C. Turro, and J. Lloret, “User’s Macro and Micro-mobility Study using WLANs in a University Campus,” International Journal On Advances in Internet Technology, vol. 4, no. 1, pp. 37–46, 2011.
[45] Y. Xu, I. D. G. Groeneveld, R. Sulzer, E. Theocharous, O. T. Willems, and M. S. Tryfona, “Determine activity based on the classified identity of users by using Wi-Fi monitoring,” Geomatics Synthesis Group Project Report, 2016.
[46] A. Danalet, L. Tinguely, M. D. Lapparent, and M. Bierlaire, “Location choice with longitudinal WiFi data,” Journal of Choice Modelling, vol. 18, pp. 1–17, 2016.
[47] M. Zhou, K. Sui, M. Ma, Y. Zhao, D. Pei, and T. Moscibroda, “MobiCamp: A campus-wide testbed for studying mobile physical activities,” in Proceedings of the 3rd International Workshop on Physical Analytics, WPA 2016, pp. 1–6, Singapore, 2016.
[48] A. Fernandez-Ares, A. M. Mora, M. G. Arenas et al., “Studying real traffic and mobility scenarios for a Smart City using a new monitoring and tracking system,” Future Generation Computer Systems, vol. 76, pp. 163–179, 2017.
[49] A. E. Redondi and M. Cesana, “Building up knowledge through passive WiFi probes,” Computer Communications, vol. 117, pp. 1–12, 2018.
[50] L. Huang, K. Matsuura, H. Yamane, and K. Sezaki, “Enhancing wireless location privacy using silent period,” in Proceedings of the 2005 IEEE Wireless Communications and Networking Conference, WCNC 2005: Broadband Wireless for the Masses - Ready for Take-off, pp. 1187–1192, USA, March 2005.
[51] J. Martin, T. Mayberry, C. Donahue et al., “A Study of MAC Address Randomization in Mobile Devices and When it Fails,” Proceedings on Privacy Enhancing Technologies, vol. 2017, no. 4, pp. 365–383, 2017.
[52] M. Vanhoef, C. Matte, M. Cunche, L. S. Cardoso, and F. Piessens, “Why MAC address randomization is not enough: an analysis of Wi-Fi network discovery mechanisms,” in Proceedings of the 11th ACM on Asia Conference on Computer and Communications Security, pp. 413–424, ACM, Xi’an, China, June 2016.
[53] A. J. Ruiz-Ruiz, H. Blunck, T. S. Prentow, A. Stisen, and M. B. Kjaergaard, “Analysis methods for extracting knowledge from large-scale WiFi monitoring to inform building facility planning,” in Proceedings of the 2014 12th IEEE International Conference on Pervasive Computing and Communications, PerCom 2014, pp. 130–138, Hungary, March 2014.
[54] N. Nunes, M. Ribeiro, C. Prandi, and V. Nisi, “Beanstalk - A community based passive Wi-Fi tracking system for analysing tourism dynamics,” in Proceedings of the 9th ACM SIGCHI Symposium on Engineering Interactive Computing Systems, EICS 2017, pp. 93–98, Portugal, June 2017.
[55] C. Parent, N. Pelekis, Y. Theodoridis et al., “Semantic trajectories modeling and analysis,” ACM Computing Surveys, vol. 45, no. 4, pp. 1–32, 2013.
[56] C. Wei, “Mining of User Behavioral Features Based on Indoor Semantic Trajectories,” Boletín Técnico, ISSN: 0376-723X, vol. 55, 2017.
[57] K. V. Long, D. Quang, and N. Klara, Lessons learned from bluetooth/wifi scanning deployment in university campus, Urbana, Illinois, USA, 2010.
[58] V. Radu and M. K. Marina, “HiMLoc: indoor smartphone localization via activity aware pedestrian dead reckoning with selective crowdsourced WiFi fingerprinting,” in Proceedings of the International Conference on Indoor Positioning and Indoor Navigation (IPIN ’13), pp. 1–10, IEEE, Montbeliard-Belfort, France, October 2013.
[59] F. Hong, Y. Zhang, Z. Zhang, M. Wei, Y. Feng, and Z. Guo, “WaP: Indoor localization and tracking using WiFi-Assisted Particle filter,” in Proceedings of the 39th Annual IEEE Conference on Local Computer Networks, LCN 2014, pp. 210–217, Canada, September 2014.
[60] Y. Zeng, P. H. Pathak, and P. Mohapatra, “Analyzing shopper’s behavior through WiFi signals,” in Proceedings of the 2nd Workshop on Physical Analytics, WPA 2015, pp. 13–18, Italy, 2015.
[61] P. Sapiezynski, A. Stopczynski, R. Gatej, and S. Lehmann, “Tracking human mobility using WiFi signals,” PLoS ONE, vol. 10, no. 7, p. e0130824, 2015.
[62] Z. Tian, Y. Jin, M. Zhou, Z. Wu, and Z. Li, “Wi-Fi/MARG Integration for Indoor Pedestrian Localization,” Sensors, vol. 16, no. 12, p. 2100, 2016.


[63] B. Zhou, Q. Li, Q. Mao, and W. Tu, “A Robust Crowdsourcing-Based Indoor Localization System,” Sensors, vol. 17, no. 4, p. 864, 2017.

[64] “Raspberry Pi,” https://www.raspberrypi.org.

[65] “TP-LINK Wi-Fi USB dongle datasheet,” http://www.tp-link.com/us/products/details/cat-5520_TL-WN722N.html.

[66] “ISO/IEC 20922:2016 - Information technology – Message Queuing Telemetry Transport (MQTT) v3.1.1,” https://www.iso.org/standard/69466.html.

[67] G. Piateski-Shapiro and W. Frawley, Knowledge Discovery in Databases, MIT Press, Cambridge, MA, USA, 1991.

[68] J. Andion Jimenez, J. M. Navarro Gonzalez, M. Alvarez-Campana Fernandez-Corredor, and J. C. Duenas Lopez, “A passive non-intrusive cheap method to identify behaviours and habits in the Campus,” in Proceedings of the XIII Jornadas de Ingenieria Telematica - JITEL2017, vol. 40, no. 47, pp. 10–4995, September 2017.

[69] A. Mylinikov, “Geo project,” https://www.mylnikov.org.

[70] K. Mehmet and G. Ananth, “PROXIMUS: A framework for analyzing very high dimensional discrete-attributed datasets,” in Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’03, vol. 147, no. 156, pp. 147–156, USA, August 2003.

[71] “UPM Library occupation,” http://ceiboard.dit.upm.es/smart-campus/biblio.

[72] C. A. R. Inarejos, A. Rodríguez, G. Lopez, and M. Alvarez-Campana, “Analisis de la huella de carbono de la ETSIT de la UPM y propuesta de mejora basada en datos de la plataforma IoT Smart CEI Moncloa,” in Proceedings of the I Congreso Iberoamericano de Ciudades Inteligentes (ICSC-CITIES 2018), 2018.


Sensor | Description
1 Entr A | Main entrance of building A of the school
2 Entr A Sec | Secondary accesses to building A. It covers most of the classrooms of this building
3 Std Tables | Sensor close to “Entr A”, just above an area of tables where students work and study in groups
4 Library | Inside the library, which is open every day from 9 am to 9 pm
5 Entr B | Main entrance to building B. This building contains mostly laboratories and offices
6 Entr B Lab | Secondary entrance of this building, which is the closest entrance to the laboratories
7 Entr B Sec | Secondary access to building B, covering the classrooms of this building
8 Entr C | Main entrance of building C. This building contains only research laboratories and offices; students do not have classes in this building
9 Entr D | Entrance of building D, covering the backdoor of the library

Figure 2: Summary of the Wi-Fi sensors deployed in the Telecommunication Engineering School. The map on the left-hand side also shows the location of (α) offices and labs, (β) classrooms, and (γ) cantina.

typically be able to afford accurate and costly location systems all over the city.

The rest of the paper is structured as follows. Section 2 presents an exhaustive analysis of techniques currently used for monitoring people flows and identifying behaviors. Section 3 outlines the main characteristics of the sensor infrastructure and the IoT platform that collects the data analyzed in this paper. After a brief sketch of the methodology used in this work, Section 4 presents the analysis performed on data with respect to time and occupancy, position of people, and identification of common behaviors or activity patterns. Then, Section 5 describes the data available as ground truth and the validation of the analyzed system. Finally, Section 6 summarizes the main findings of the analysis and presents some ideas for building similar systems.

2. Related Work

People flow monitoring has always been a useful source of information. Knowing the position or trajectory of a person or a mass of people allows for the creation of a wide range of applications, such as crowd monitoring in events or concerts, the discovery of the most common routes in shopping malls, analysis of space usage in public or private infrastructures, or security against unusual activities (e.g., presence of unauthorized people in restricted areas). In the last decades, the spread of communication technologies (e.g., the popularization of Wi-Fi networks or the use of smartphones) has become a vast source of data, which allows for the improvement or even the automation of techniques to monitor people.

In most cases, positioning in outdoor scenarios can be solved using the Global Positioning System (GPS), but GPS typically presents limitations in terms of accuracy in indoor scenarios [9]. Thus, indoor location or pedestrian location has been a key research topic in recent years. Some works aim to discover with high precision how people move inside buildings by using the mobile network or personal area networks, e.g., [10]. These systems try to help users find their path in a building, measure the length of their stay in a mall for commercial purposes, or simply ease people's movements by removing obstacles. The most common technologies used in recent years to achieve this kind of pedestrian tracking have been video camera systems, radiofrequency signals, Bluetooth, smartphone sensors, and Wi-Fi networks. These tracking methods can be classified based on two aspects:

(i) Whether they need user intervention, like the usage of a smartphone application or specific hardware, in which case they are classified as active, or do not need the cooperation of the users to work, i.e., passive systems

(ii) Whether or not a dedicated sensor network deployment is needed (e.g., it is not needed when the existing Wi-Fi access point network is used)

The usage of video camera systems and computer vision presents some advantages: it is a passive system, it can use existing camera network deployments, and it provides almost real-time feedback. However, it also presents many drawbacks, although there are works that propose ways to mitigate these effects [11, 12], such as its dependency on visual aspects (e.g., poor lighting, obstacles), its low scalability due to deployment complexity and costs, the difficulty of fusing multiple video streams to provide automatic monitoring, or the difficulty of tracking users individually. Reference [13] provides a survey on computer vision techniques for the specific case of crowd scene analysis, covering from existing models and popular algorithms to current research problems and trends.

Table 1: Summary of the state of the art of people flow monitoring techniques which are not based on Wi-Fi tracking (P/A stands for Passive/Active).

Technology | Ref. | Year | Scenario | Application | P/A | Own deployment
Video camera & Computer Vision | [13] | 2015 | Indoor/Outdoor | Crowd scene analysis | P | No
RF | [14–16] | 2013, 2014, 2015 | Indoor | Short range movements (1-10 m) with high accuracy (e.g., elderly care, baby care) | P | Yes
Cellular networks | [18] | 2011 | Indoor/Outdoor | Analyze people flow in a suburban area near NYC with accuracy around hundreds of meters | P | No
GPS | [19] | 2015 | Outdoor | Prevent critical situations in overcrowded scenarios (e.g., concerts) | A | No
Smartphone sensors (e.g., accelerometer) | [20] | 2018 | Indoor | PDR. Track individual pedestrian | A | Depends
Bluetooth | [21] | 2017 | Indoor | Monitor people flow (comparing Wi-Fi with Bluetooth) | A | Yes

People localization and tracking based on radiofrequency (RF) measurements has also been widely addressed in the literature. References [14–16] represent some recent remarkable research work on this topic. These solutions are based on antennas that transmit RF signals and are able to locate and track people based on body radio reflections. Consequently, they can be classified as passive systems. They provide very accurate results, allowing the tracking of forward and backward movements or body part movements (including breathing), and are even able to track several people under certain configurations. However, it is difficult to track a fairly high number of users individually with this kind of technique (even if we assume that a person may have a certain type of body radio reflection signature, it would not be different enough between similar people and might change depending on the distance between the person and the vantage points). They are mainly applied to short range movements (1-10 meters), so they could only be applied in small rooms at the cost of a large and specific deployment.

People flow monitoring can also be performed by exploiting the ubiquity, communications capabilities, and integrated sensors of smartphones [17]. In [18], for instance, Call Data Records (CDR) are used to locate the base stations the smartphones are attached to and thus analyze people flow in and out of a suburban city near New York City. The main drawbacks of this way of locating and tracking people are its low accuracy (around hundreds of meters) and that the terminal has to be active (e.g., make or receive a call, or send or receive an SMS) in order to be monitored, although this technique would be classified as passive, given that the needed user activity is not aimed at contributing to the positioning.

Reference [19] proposes a solution to prevent critical situations in overcrowded scenarios based on a smartphone application that transmits its GPS location data. Although it yields good results in terms of accuracy, the main drawbacks of this approach are that it is an active technique, since it requires the installation of the app; that the consumption of the app reduces the autonomy of the terminal; and that it may not work properly indoors.

Smartphone sensors can also be used to perform Pedestrian Dead Reckoning (PDR), which is a method that tries to estimate a pedestrian's position based on their past position and the output of the smartphone's sensors, e.g., accelerometer, gyroscope, etc. This technique is usually supported by other positioning methods [20], but in most cases it does not need a specific sensor network other than the smartphone itself. It is an active system, and it is typically used to track individual pedestrians instead of flocks, but the main drawback of this technique is that it suffers from tracking error accumulation and frequently needs an extra location measurement.
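The error accumulation mentioned above can be illustrated with a minimal sketch. It assumes a simplified 2D model in which each detected step advances the position by a step length along a heading; the function names, the 0.7 m step length, and the 0.05 rad heading bias are hypothetical values chosen for illustration and are not taken from [20].

```python
import math

def pdr_step(x, y, heading_rad, step_length_m):
    """Advance one detected step from (x, y) along the given heading."""
    return (x + step_length_m * math.cos(heading_rad),
            y + step_length_m * math.sin(heading_rad))

def dead_reckon(start, steps):
    """Chain steps from a start position and return every visited position.

    steps: iterable of (heading_rad, step_length_m) pairs, e.g., derived
    from gyroscope and accelerometer readings (assumed to be given).
    """
    x, y = start
    path = [(x, y)]
    for heading_rad, step_length_m in steps:
        x, y = pdr_step(x, y, heading_rad, step_length_m)
        path.append((x, y))
    return path

def drift(true_path, estimated_path):
    """Euclidean error between matching points of two paths."""
    return [math.dist(p, q) for p, q in zip(true_path, estimated_path)]

# Straight walk of 100 steps, with and without a constant heading bias
# (illustrative values only).
true_path = dead_reckon((0.0, 0.0), [(0.0, 0.7)] * 100)
biased_path = dead_reckon((0.0, 0.0), [(0.05, 0.7)] * 100)
errors = drift(true_path, biased_path)
```

In this toy model the position error grows with every step, which is why PDR is usually corrected periodically with an independent position fix such as a Wi-Fi based estimate.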

Bluetooth has also been used to monitor people flows in indoor environments. Some works, such as [21], compare the usage of Bluetooth with other techniques such as Wi-Fi and conclude that its capabilities are below other options, although it can be used in combination with other positioning systems to improve their accuracy. The main disadvantage of these tracking systems is that most Bluetooth devices only react to Bluetooth signals when the users make them visible to the network. Also, these implementations require a specific deployment of Bluetooth sensors, which usually cannot be used for other tasks. Table 1 summarizes the research work reviewed so far, without considering Wi-Fi tracking based works.

Lastly, people tracking systems based on Wi-Fi have been a hot topic for more than fifteen years, and they still are. This is indeed the technique used in this paper. Thus, Table 2 is exclusively devoted to summarizing and comparing previous research work based on this technique.

As shown in Table 2, works related to Wi-Fi tracking techniques focus on different objectives: some try to obtain users' positions as accurately as possible [22–32], others analyze the trajectories followed by pedestrians [33–35] or flocks [36–38], and finally others study the occupation of different zones [39–42] and obtain behavior patterns [36, 43–49].

Table 2: Summary of the state of the art of people flow monitoring techniques based on Wi-Fi tracking (P/A stands for Passive/Active).

Ref. | Year | Scenario | Application | P/A | Own deployment
[22] | 2003 | Indoor | One of the earliest approaches on precise positioning using Wi-Fi (precision 26m) | A | Yes
[23] | 2006 | Indoor | Wi-Fi fingerprint to identify the general location and applying logistic regression to distinguish between finer-grained locations | A | Yes
[24] | 2006 | Indoor office building | Precise positioning. PDR combined with Wi-Fi to reduce the accumulated error | A | Yes
[25] | 2007 | Outdoor and indoor | Creation of Wi-Fi map. Positioning comparing with the created map | A | Yes. Own hardware. Offline analysis
[26] | 2007 | Indoor campus | Comparison of positioning and tracking methods using Wi-Fi | P | No. Offline
[27] | 2008 | Indoor campus | Estimate the position using Wi-Fi and tracking with PDR | A | Yes
[28] | 2009 | Indoor campus | Real time Wi-Fi positioning; web portal to check user's positions | A | Yes
[39] | 2009 | Indoor campus | Count of users in different buildings. Analysis of users' mobility between buildings | P | No. Institutional APs
[57] | 2010 | Indoor campus and office building | Coarse position with Wi-Fi and Bluetooth. Graph of user co-occurrence | A | No. Offline
[43] | 2010 | Indoor campus and office building | Calculation of stay length based on Wi-Fi positioning. Analysis of favorite locations | A | No. Offline
[44] | 2011 | Indoor campus | Extension to [39]. User characterization based on their mobility patterns | P | No. Institutional APs
[29] | 2011 | Indoor tunnel in construction | Precise positioning in real time of workers inside a tunnel under construction using Wi-Fi (precision 5 m) | P | Yes. Own AP deployment
[30] | 2012 | Indoor campus | Creation of Wi-Fi fingerprint map. Map usage for positioning with smartphone application | A | No
[36] | 2012 | Indoor campus | Study of crowd movement, Wi-Fi based. Analysis of mobility patterns, users' arrivals and departures from campus | P | No. Institutional APs
[58] | 2013 | Indoor | Wi-Fi path analysis in real time | A | No. Institutional APs
[33] | 2014 | Indoor and outdoor campus | Analyze pedestrian destination frequencies in an area of 55 hectares of a university campus during 5 weekdays | P | No. Institutional APs and Radius server
[59] | 2014 | Indoor campus | Localization and tracking system exploiting particle filters to combine dead reckoning, Wi-Fi RSS-based analysis, and knowledge of the floor plan (precision 0.7 m) | A |
[60] | 2015 | Indoor shopping mall | Wi-Fi Channel State Information analysis to detect shopper activities | P | Yes. Own AP deployment
[31] | 2015 | Indoor | Precise positioning based on sensor fusion combining Wi-Fi, PDR, and landmarks. Smartphone application (positioning 1 m) | A | No. Smartphones
[32] | 2015 | Indoor parking | Precise positioning combining Wi-Fi RSS and electromagnetic field map | |
[37] | 2015 | Outdoor concert. Indoor campus | Portable Wi-Fi based user count. Analysis of crowds in concert and in campus | P | Yes. Raspberry Pi based
[61] | 2015 | Outdoor | Creation of Wi-Fi map using GPS | A |
[62] | 2016 | Indoor | Precise positioning combining Wi-Fi and PDR | A | No. Smartphones
[34] | 2016 | Indoor airport | User path detection. Combining Wi-Fi, GPS, PDR, and Bluetooth to create a multilevel map and study of user's trajectory prediction | A | No. Smartphones
[45] | 2016 | Indoor campus | Analysis of users' activities. User tagging based on activities registered | P | No. Institutional APs
[46] | 2016 | Indoor and outdoor campus | Analysis of user movements to different food points to predict the operation of new stores based on price and location | P | No. Institutional APs
[63] | 2017 | Indoor | Crowdsourcing positioning based on Wi-Fi fingerprint | A | No. Institutional APs
[41] | 2018 | Indoor | Coarse positioning with room level precision based on probabilistic Wi-Fi fingerprint. Usage of Hidden Markov chain models to analyze user movement | P | No. Institutional APs
[35] | 2016 | Indoor | Trajectory analysis based on Hidden Markov chain models | P | No. Institutional APs
[42] | 2017 | Indoor | Estimate the number of participants and their space and time evolution in an area of about 167 hectares during the 2016 Open Day of the European JRC | P | No. Institutional APs
[47] | 2016 | Indoor/Outdoor | Study mobility-related activities in a campus of 440 hectares based on the 2700 APs of the institutional network and an additional opt-in smartphone application | A | No. Institutional APs
[53] | 2014 | Indoor | Classify users in a hospital (e.g., patient, doctor, administrative) by checking the number of hours and the positions of a user over time based on the institutional Wi-Fi network | P | No. Institutional APs
[38] | 2012 | Indoor | Identify flocks walking in a building and their behavior based on signal strength from the institutional Wi-Fi and using clustering techniques | P | No. Institutional APs
[45] | 2016 | Indoor | Analyze users' occupation (based on Markov models) as well as regular and irregular hours in a university campus | P | No. Institutional APs
[40] | 2016 | Indoor | Analyze room utilization and people tracking, providing heat maps. Analyze device statistics | A | No. Institutional APs
[48] | 2017 | Indoor/Outdoor | Analyze people mobility, monitoring and tracking in Smart Cities, and traffic in a highway (e.g., driving behavior, traffic forecasting) | P | Yes. Raspberry Pi based
[49] | 2018 | Indoor/Outdoor | Provide user localization, user profiling, and device classification | A | Yes. Raspberry Pi based
[54] | 2017 | Indoor/Outdoor | Analyze Wi-Fi tracking records gathered during more than one year in Madeira to classify users as tourists or locals and identify touristic spots | P | Yes. Based on TP-Link MR3240v2 home router
[56] | 2017 | Indoor | Obtain semantic trajectories. Classify users based on their locations. Analyze the probability of a user going to a specific shop based on their history and propose the creation of a recommender based on the whole dataset | - | -

Wi-Fi tracking of a specific user is usually done by analyzing the collected records related to a specific MAC address, i.e., tracking users is equivalent to tracking their MAC address. This technique, in general, presents issues related to tracking people individually and to privacy. Although it is true that a user carrying several devices (e.g., smartphone, tablet, laptop) with the Wi-Fi interfaces on would at first be identified as several users, after a reasonable period of time this information can be correlated to fix the problem [50]. In the case of the work presented in this paper, the files associated with the measurements of different sensors of the same building are compared in order to avoid counting the same mobile/person several times (e.g., due to overlapping Wi-Fi cells). The resulting file allows for the analysis of people flow at a building level, identifying the total stay time or the frequency of the visited places. However, as explained below in the position analysis subsection, this problem appears in the collected data, and it is necessary to postprocess the data to deal with these collisions. Users may also use so-called MAC spoofing (i.e., replacing their actual MAC address with the MAC address of another device), which could be seen as a kind of attack. However, this may yield connectivity problems, and it is a negligible behavior in the scenario considered in this paper.
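The building-level comparison described above can be sketched as follows. This is a minimal illustration under stated assumptions and not the processing actually used in this work: the record layout, the sensor identifiers, the sensor-to-building mapping, and the 5-minute slot are all hypothetical.

```python
from collections import defaultdict

# Hypothetical sensor-to-building mapping (illustrative only).
SENSOR_BUILDING = {
    "entr_a": "A",
    "entr_a_sec": "A",
    "entr_b": "B",
    "entr_b_lab": "B",
}

def devices_per_building(records, slot_seconds=300):
    """Count distinct anonymized MACs per (building, time slot).

    records: iterable of (unix_timestamp, sensor_id, hashed_mac) tuples.
    A device seen by several sensors of the same building within one
    slot (e.g., due to overlapping Wi-Fi cells) is counted only once.
    """
    seen = defaultdict(set)
    for ts, sensor_id, hashed_mac in records:
        building = SENSOR_BUILDING[sensor_id]
        slot = int(ts) // slot_seconds
        seen[(building, slot)].add(hashed_mac)
    return {key: len(macs) for key, macs in seen.items()}

records = [
    (0, "entr_a", "m1"),
    (10, "entr_a_sec", "m1"),  # same device, overlapping cell in building A
    (20, "entr_b", "m1"),
    (30, "entr_a", "m2"),
    (400, "entr_a", "m1"),
]
counts = devices_per_building(records)
```

Using a set per building and slot makes the deduplication independent of how many sensors of the building report the same device.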

Regarding privacy, several proposals to protect Wi-Fi communications by means of MAC address anonymization have arisen in recent years. First, these proposals appeared as apps for smartphones (allowing a kind of MAC spoofing), but recently smartphone manufacturers have started including these techniques in the latest versions of their Operating Systems (OS) (e.g., iOS, Android, and Windows). Such MAC address anonymization techniques aim to avoid using the actual MAC address until the device gets connected to the Wi-Fi network (i.e., they use a fake MAC address in their probe frames). The specific solution for this problem depends on the manufacturer and OS. In the case of iOS, the solution involves sending locally administered MAC addresses in the probe frames, randomly selecting the three least significant bytes of the MAC address. This can be easily detected just by inspecting the first byte of the MAC address. In the case of Android, some manufacturers have decided to use random MAC addresses in the probe frames from the MAC address ranges assigned to them by the IEEE. Nevertheless, even with these techniques in place, it is possible to end up obtaining the actual device information [51, 52]. In addition, MAC randomization is not actually a relevant problem in the scenario considered in this paper, since most of the devices are connected to the available Wi-Fi networks (e.g., Eduroam), and to connect to a Wi-Fi network devices must use their actual MAC addresses.
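As a minimal sketch of the first-byte inspection mentioned above: in IEEE 802 addresses, the second least significant bit of the first octet (mask 0x02) marks an address as locally administered. The function name and the sample addresses below are illustrative only.

```python
def is_locally_administered(mac: str) -> bool:
    """Return True if the locally administered bit of a MAC address is set.

    mac: address written as six hex octets separated by ':' or '-'.
    The bit tested is the second least significant bit of the first octet.
    """
    first_octet = int(mac.replace("-", ":").split(":")[0], 16)
    return bool(first_octet & 0x02)

# Illustrative addresses, not taken from the dataset.
samples = ["02:00:00:00:00:01", "da:a1:19:12:34:56", "00:1a:2b:3c:4d:5e"]
flags = [is_locally_administered(m) for m in samples]
```

A sensor could use such a check to separate frames that carry randomized addresses from frames that carry globally unique ones.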

Wi-Fi tracking systems can be classified into two main groups: those that use the enterprise Wireless Local Area Network (WLAN) itself and those that use a dedicated low-cost passive Wi-Fi infrastructure, which is indeed the case of the actual deployment considered in this paper. One of the main drawbacks of the systems that use the enterprise WLAN is that they can only track the users of such networks, whereas independent dedicated low-cost passive Wi-Fi infrastructures allow tracking the users connected to different Wi-Fi networks if they spatially coexist, or even users not connected to any network if MAC randomization is not used.

As examples of works that use the enterprise WLAN or the existing infrastructure of access points (APs), [42] presents the 2016 Open Day of the European Joint Research Center (JRC), where 8000 people participated within an area of about 167 hectares, as a case study where the Wi-Fi infrastructure of the event was used to estimate the number of participants and their space and time evolution based on properly processed MAC addresses. Reference [47] presents MobiCamp, a large-scale testbed composed of around 2700 APs to study mobility-related activities, which combines user mobility traces based on Simple Network Management Protocol (SNMP) data with enriched data (e.g., gender, age) provided by an opt-in smartphone application.

Reference [53] represents yet another example of the analysis that can be made with this kind of information. Its scenario is a hospital, and by checking the number of hours and the positions of a user over time, they can classify that user according to a role, e.g., patient, doctor, administrative, etc. Reference [38] identifies flocks walking in a building and their behaviors by applying clustering techniques to the signal strength measurements provided by the institutional WLAN.

Reference [33] presents a campus scenario where, by using the university network infrastructure, a detailed profile of the user's activity can be obtained. Users tracked are those logged into the university network, which provides extra information about the user, such as their role, gender, etc. Combining that information with a detailed map which contains thousands of Points of Interest (POIs), the authors can extract an activity log that shows the different user's activities with minute precision. The main drawback of this work is that it totally depends on the users' profile database and the POI map, and both are resources that are complicated to gain access to or to create. In reference [45], employing the university network infrastructure and the location of each AP in the university campus, each sensor record only stores the closest AP. With this simple information, the authors can create an activity profile similar to the one shown in [33]. By analyzing the basic results obtained, they are able to extract new information (e.g., a count of irregular hours) or detect patterns of anomalous events (e.g., periods of exams or holidays).

As a last example of systems that use the institutional WLAN, reference [40] presents a web application in which the occupation of different rooms on a campus is shown in real time. The number of people in the room is calculated using the number of Wi-Fi devices detected by the APs. Using the signal strength measurement of each of them, a heatmap is drawn that shows the user distribution in the room. The collected data are analyzed offline to make reports of the utilization of the different rooms and to draw conclusions from the detected patterns.

On the other hand, [37, 48, 49, 54] represent some examples of works which use independent dedicated low-cost passive Wi-Fi infrastructures, as is the case of the deployment considered in this paper. In the case of [48], a network of devices called MOBYWIT, each based on a Raspberry Pi and two wireless USB dongles, is used to track the movement of people and vehicles, sniffing not only Wi-Fi but also Bluetooth signals emitted by smartphones and vehicle hands-free calling systems. In the case of [49], a passive Wi-Fi infrastructure based on low-cost devices that combine a Raspberry Pi and a TP-LINK Wi-Fi dongle is used to provide user localization, user profiling, and device classification based on the properly processed MAC addresses captured from the IEEE 802.11 probe request frames. Reference [37] also uses this approach to count people in a concert, and in reference [54] the considered scenario is a whole island (Madeira, Portugal), where the records gathered all over the island are analyzed to classify users as tourists or locals, as well as to identify touristic spots.

One of the main features that make the work presented in this paper stand out compared to previous work is that one year of data from an actual Wi-Fi tracking system deployed in a real-life environment is analyzed. Most previous works consider hours, a few days (e.g., weekdays), or weeks. Only the work presented in [54] covers a similar period of time (an even longer one), but its analysis is much broader, being far from the level of detail provided in this paper. The considered period of time allows analyzing seasonality effects and other patterns that, although they may be seen as common knowledge, do bring value, since they represent numerical evidence that supports decision making (e.g., someone may think that the Wi-Fi access in a given area does not work properly because it is always overcrowded, but numerical evidence is needed to appropriately justify the investment of increasing the number of APs of the corporate WLAN in that given area to improve the service). In addition, such well-known patterns, when obtained automatically by processing the available data, become models which can be used to detect anomalies or atypical situations, as is common practice in unsupervised machine learning. It is also worth mentioning the use of clustering to improve the data analysis and interpretation (as in previous works, e.g., [38]), as well as the application of the semantic trajectory concept [55], which combines positioning data with an external source of information to classify the different positions according to the activity carried out in the area (e.g., users positioned in the cantina will be eating) and which has not been extensively explored in previous works (e.g., [56]).

Figure 3: (a) Raspberry Pi, (b) TP-LINK USB Wi-Fi dongle, (c) developed Wi-Fi listening device.

Another strong point of the work presented in this paper compared to previous work is the validation of the Wi-Fi tracking technique to estimate the occupation of the library of the Telecommunications School and the comparison with a video camera system. Only a few previous works, such as [40] or [49], perform a similar validation (in [40], the number of people detected in the room is compared with the attendance list of the seminar taking place there, and in [49], students are requested to turn on the Wi-Fi interfaces of their devices and provide the MAC addresses of their smartphones and laptops through an anonymous web form to serve as ground truth for device classification). However, again, the period considered in this paper is much longer, and it is shown that the Wi-Fi tracking system outperforms the more expensive video camera system. As a matter of fact, the Wi-Fi tracking system is actually used on a day-to-day basis by the library staff, which illustrates the value that this kind of IoT infrastructure can bring to real-life problems and services.

3. Data Acquisition Infrastructure

Figure 3 shows the Wi-Fi listening device/sensor developed for the people flow monitoring sensor network of the Smart CEI Moncloa. It is based on very common and cheap hardware, namely, a Raspberry Pi board [64], one of the most widely used hardware platforms for IoT, and a TP-LINK USB Wi-Fi dongle [65] configured in monitor mode. As a result, the cost of this solution is in the order of tens of euros (around 80 € when manufactured in 2016), which represents a remarkable cost reduction compared to other solutions available in the market.

As Figure 4 illustrates, broadly speaking, these Wi-Fi sensors scan each of the Wi-Fi channels from both the 2.4 GHz and the 5 GHz bands during a configurable amount of time (currently 250 ms), read the header of the IEEE 802.11 radio packets (e.g., data packets or probe requests) in their region of coverage, and record the sender MAC addresses. As these MAC addresses are unique per device, counting them is a good indicator of the number of devices available in the surroundings of the Wi-Fi sensors (although there may be issues related to the fact that a single user can carry several devices, as already pointed out in Section 2), and they allow for temporal correlation analysis, thus obtaining useful information such as stay time, availability patterns, etc.
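As an illustration of the temporal correlation mentioned above, the following minimal sketch derives a per-device stay time from a list of sightings recorded by one sensor. The record layout and the 10-minute gap threshold are assumptions made for the example, not parameters of the deployed system.

```python
from collections import defaultdict

def stay_times(sightings, max_gap_s=600):
    """Return {hashed_mac: stay_time_seconds} for one sensor.

    sightings: iterable of (unix_timestamp, hashed_mac) pairs.
    Consecutive sightings of a device at most max_gap_s apart are
    treated as one continuous stay; longer gaps are not counted.
    """
    by_device = defaultdict(list)
    for ts, mac in sightings:
        by_device[mac].append(ts)
    result = {}
    for mac, stamps in by_device.items():
        stamps.sort()
        result[mac] = sum(b - a for a, b in zip(stamps, stamps[1:])
                          if b - a <= max_gap_s)
    return result

sightings = [(0, "d1"), (60, "d1"), (120, "d1"),
             (5000, "d1"), (5060, "d1"), (30, "d2")]
stay = stay_times(sightings)
device_count = len(stay)
```

The number of keys in the result gives the device count for the period, while the values approximate how long each device remained within the coverage region of the sensor.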

Regarding how this information is carried all the way up to the backend server and processed there, Figure 5 shows an overview of the communications architecture and protocol stack of the people flow monitoring service of the Smart CEI Moncloa.

As can be seen, the Wi-Fi sensors are directly connected to the backend via the Ethernet network of the UPM. Communications are protected end-to-end by the use of Transport Layer Security (TLS) on top of Transmission Control Protocol (TCP)/Internet Protocol (IP). Measurements are periodically sent using Message Queue Telemetry Transport (MQTT) [66]. The publish/subscribe mechanism provided by MQTT allows the Wi-Fi sensors not only to send measurements (i.e., events) periodically but also to receive commands (e.g., to reboot them or to perform a remote firmware update).

Taking advantage of the hierarchical structure of the MQTT topics, all the publish events follow the structure SERVICE/ID/EVENT(TIMESTAMP). Thus, the publish events from the Wi-Fi sensors start with Wi-Fi, followed by the MAC address of the sensor's Ethernet interface, which is used as unique ID. The format of the content published under the different topics is Comma Separated Values (CSV), which is a lightweight solution, especially appropriate when the data structure is fixed, since the meaning of each field of the subsequent lines is explained only in the header at the beginning of the file.
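As an illustration only, the topic hierarchy and the CSV payload could be assembled as in the following Python sketch. The event name `detections`, the example Ethernet MAC address, the column names, and the use of `/` between all topic levels are assumptions made for the example; the actual field layout used by the sensors is not specified here.

```python
import csv
import io


def build_topic(service: str, device_id: str, event: str) -> str:
    """Join the hierarchy levels with '/', the MQTT topic level separator."""
    return "/".join([service, device_id, event])


def to_csv_payload(header, rows) -> str:
    """Serialize rows as CSV text with a single header line at the top."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)   # field meanings are stated only once, here
    writer.writerows(rows)    # every following line is pure data
    return buf.getvalue()


# Hypothetical sensor ID (Ethernet MAC) and event name.
topic = build_topic("Wi-Fi", "b8:27:eb:00:00:01", "detections")

# Hypothetical columns: minute timestamp, anonymized device, access point.
payload = to_csv_payload(
    ["timestamp", "user", "accessPoint"],
    [[1451606400, "5f4dcc3b5aa765d61d8327deb882cf99", ""]],
)
```

Because the header is sent once per file, each additional detection costs only one short line, which is what makes CSV attractive for a fixed data structure.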

Privacy issues have also been considered carefully: the developed Wi-Fi sensors apply an irreversible salted MD5 hash function to the MAC address, which prevents brute-force attacks with precomputed tables. In addition, as Figure 5 shows, once anonymized, the data are carried securely up to the platform servers, where they are handled in an aggregate manner instead of individually.
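A minimal sketch of this kind of anonymization, using Python's standard `hashlib`, is given below. The normalization step, the way the salt is combined with the address (simple prefixing), and the salt values are assumptions for illustration; the sensors' actual salt handling is not described here.

```python
import hashlib


def anonymize_mac(mac: str, salt: bytes) -> str:
    """Return the salted MD5 digest (hex) of a MAC address.

    The address is normalized first so that the same device always
    maps to the same pseudonym regardless of case or separator.
    """
    normalized = mac.strip().lower().replace("-", ":")
    # Assumption: the salt is simply prepended to the address bytes.
    return hashlib.md5(salt + normalized.encode("ascii")).hexdigest()


pseudonym = anonymize_mac("AA:BB:CC:DD:EE:FF", b"example-salt")
```

The salt is what defeats precomputed tables: a table built for unsalted MAC addresses, or for a different salt, does not contain the resulting digests.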

Figure 4: Sketch of how the developed Wi-Fi listening devices obtain the MAC addresses of the users surrounding them. [Diagram: users' devices send data frames and probe requests on different channels, with access points AP1 to AP4 on channels 1 to 4; the developed Wi-Fi listening device listens "x" ms on each channel.]

Figure 5: Communications architecture and protocol stack for the people flow monitoring service of the Smart CEI Moncloa. [Diagram: Wi-Fi sensors (MQTT/TLS/TCP/IP/802.3) connected to the backend server, which comprises an MQTT broker, real-time and batch processing, a REST API server, a CSV file-based temporal DB, and a dashboard web server.]

Furthermore, the software of the developed sensors has been modified in order to prevent the MAC anonymization mechanisms presented in Section 2 from affecting the obtained measurements. Thus, the Wi-Fi frames with locally administered MAC addresses or including special MAC address ranges are discarded, so these devices are not considered. Anyway, as already mentioned in Section 2, MAC randomization is not actually such a big deal in our case, since most of the users are connected to the Eduroam free Wi-Fi access, so their smartphones end up using their actual MAC address.
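The locally administered check can be sketched as follows. In an IEEE MAC address, the second least significant bit of the first octet (mask `0x02`) marks the address as locally administered, which is the bit that randomized addresses set. The function name is hypothetical, and the additional "special ranges" filter mentioned above is not reproduced because those ranges are not listed.

```python
def is_locally_administered(mac: str) -> bool:
    """True if the U/L bit (0x02) of the first octet is set."""
    first_octet = int(mac.split(":")[0], 16)
    return bool(first_octet & 0x02)


def keep_frame(sender_mac: str) -> bool:
    """Keep only frames whose sender uses a globally administered address."""
    return not is_locally_administered(sender_mac)
```

For example, an address starting with `da` (binary 11011010) has the bit set and would be discarded, whereas one starting with `b8` (binary 10111000) would be kept.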

After gathering the CSV files for a large time span, we moved to an offline analysis platform on a private cluster. The Apache Spark 2.2 software platform was selected as the data processing tool due to its optimized capabilities to work with large amounts of structured data. Such a time span leads to large datasets, which a common centralized system will struggle to process but distributed systems such as Spark can handle easily.

The private cluster mentioned is composed of eight HP ProLiant SL250s Gen8 machines with two Intel Xeon E5-2630 v2 2.6 GHz processors (6 cores each) and 32 GB of RAM. In addition to Spark, this platform also runs an Apache Hadoop filesystem, where the dataset and the results are stored. This is a distributed filesystem, which allows all the machines to access the stored data in parallel. The usage of such a platform is advisable in order to speed up the analysis, but not mandatory, since the algorithms we are going to describe are available in many other software platforms (such as those provided with R or Python).

4. Data Analysis and Results

4.1. Data Processing. Although this work is not a proper data mining process, given that we are not using those kinds of algorithms and analyses, the necessary steps previous to the actual analysis are the same as in a KDD (Knowledge Discovery in Databases) process [67]. For our analysis, we took the aforementioned CSV format files, each one containing the data collected by a single sensor during a period of 15 minutes. The observation period used in our analysis is a full year, from January 2016 to December 2016. In a one-year time lapse, 9 sensors, each generating a file every fifteen minutes, would create around 315K files, a theoretical maximum, because a file is only created if the sensor is activated and detects at least one device during the period. The sensors were deployed at the beginning of 2016, but the deployment was not totally finished until March. There were also some holidays during 2016 when ETSIT was fully closed. During those days, some of the sensors detected no devices (although there are always security staff somewhere in the covered area), so they did not generate a file. As a result, the number of files which we actually have for the analysis is 246K.

Figure 6: Histogram of users by number of registered hours (vertical axis: log(users); horizontal axis: hours): (a) raw analysis; (b) first threshold applied.
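The theoretical maximum number of files quoted in Section 4.1 follows from simple arithmetic, reproduced in the short Python check below. The constant and function names are chosen for the example only.

```python
SENSORS = 9
FILES_PER_HOUR = 60 // 15   # one file every fifteen minutes
HOURS_PER_DAY = 24


def max_files(days: int) -> int:
    """Upper bound on files if every sensor wrote a file in every period."""
    return SENSORS * FILES_PER_HOUR * HOURS_PER_DAY * days


upper_bound = max_files(365)          # 315360, i.e. around 315K
upper_bound_leap = max_files(366)     # 316224 (2016 was a leap year)
share_collected = 246_000 / upper_bound
```

With roughly 246K files actually collected, about 78% of the theoretical maximum is available, which is consistent with the incomplete deployment until March and the days on which the school was closed.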

The first step in preprocessing these data was to merge all the files into a single one, resulting in a 2 GB CSV plain text file, where each row represents the observation of a device during the associated time period, including, among others, the anonymized MAC address, the sensor name, and the detection of the device for a given minute in the observation timespan. With this transformation, the resulting dataset contains 63 million rows, each one representing the detection of a single device in a single minute by a single sensor.

A total of 128K unique devices were detected during the whole year. It is known that the number of people that regularly go to ETSIT is around 4K, so not all of the devices detected can be considered actual users of interest; consequently, the data needed to be filtered. Only the devices of people that perform activities related to the place, such as students, professors, etc., should be taken into account. Thus, it is necessary to filter out devices, keeping only the ones that can be labelled as users. We apply filters based on the observations of each device.

So, for each device, we count the number of minutes it was detected in the entire dataset. A device can be seen by more than one sensor during the same minute, so to generate this measure, repeated minutes are counted only once per device. Grouping the resulting counts, a histogram (Figure 6) is obtained, on which it is possible to classify the devices based on the total time recorded during the whole year.
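The per-device count can be sketched in plain Python as shown below; the original analysis was run on Spark, so this is only an equivalent small-scale illustration. The row layout `(minute, device, sensor)` and the function name are assumptions.

```python
from collections import defaultdict


def minutes_per_device(rows):
    """Count distinct detected minutes per device.

    rows: iterable of (minute, device, sensor) tuples, one per detection.
    A minute seen by several sensors is counted once for that device.
    """
    seen = defaultdict(set)
    for minute, device, _sensor in rows:
        seen[device].add(minute)
    return {device: len(minutes) for device, minutes in seen.items()}


example = [
    (100, "dev1", "Library"),
    (100, "dev1", "Entr D"),   # same minute, second sensor: not recounted
    (101, "dev1", "Library"),
    (100, "dev2", "Entr A"),
]
counts = minutes_per_device(example)
```

Using a set per device is what enforces the "counted only once" rule without an explicit deduplication pass.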

To facilitate the analysis of the chart, the horizontal axis has been expressed in hours and the count in the vertical axis is shown in logarithmic scale. Around 95% of the devices were seen for less than 48 hours during the whole year, that is, on average, less than four hours per month. This group is mainly composed of people passing near the school buildings, momentarily entering the coverage area of the sensors without accessing ETSIT. Figure 6(b) shows the histogram after applying a lower threshold of 48 hours and an upper threshold of 3650 hours, an average of 10 hours per day. In this chart, some isolated peaks can be observed in the tail of the graph, starting around 1000 hours in the horizontal axis. A detailed analysis of these peaks revealed that they were devices that remained connected continuously for several days, like servers. So we applied the label "user" only to those devices that registered a number of hours during the year between these two thresholds. As shown in Table 3, from the 128K detected devices, only 4653 were classified as users, over which we will perform the rest of the analysis.
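The labelling rule can be expressed as a simple filter over the per-device minute counts. The sketch below assumes a mapping from device to total detected minutes and treats both thresholds as inclusive, which is an assumption, since the text only says "between these two thresholds".

```python
LOWER_HOURS = 48      # below this: passers-by and sporadic visitors
UPPER_HOURS = 3650    # above this: always-on equipment such as servers


def select_users(minutes_by_device, lower=LOWER_HOURS, upper=UPPER_HOURS):
    """Return the devices whose yearly detected time lies within the thresholds."""
    return {
        device
        for device, minutes in minutes_by_device.items()
        if lower <= minutes / 60 <= upper
    }


users = select_users({
    "passer_by": 30 * 60,      # 30 hours in the year
    "student": 600 * 60,       # 600 hours in the year
    "server": 8000 * 60,       # connected almost continuously
})
```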

Table 3: Dataset statistics before and after preprocessing.

| | Rows | Devices |
| --- | --- | --- |
| Before preprocessing | 63427489 | 128188 |
| After preprocessing | 41294344 (65%) | 4653 (3.6%) |

Figure 7: Daily analysis aggregated during 2016: (a) minutes accumulated (×1000); (b) unique users registered (×100); (c) ratio between minutes and users (minutes/user).

The last step before the proper analyses is to merge this dataset with the information about the position and name of the sensors. Additionally, only the data columns with relevant information for the analyses are kept, in order to reduce the in-memory cost:

(1) timestamp: the detection time, measured in minutes

(2) user: the MAC address of the detected device

(3) sensor: the MAC address of the sensor which made the detection

(4) sensorName: the name of the sensor which made the detection

(5) accessPoint: the MAC address of the SSID to which the device is connected, if any

(6) sensorLat: latitude of the sensor's position

(7) sensorLng: longitude of the sensor's position

4.2. Temporal Analysis. Once the data was ready, we performed a temporal study, aiming to know whether the captured data allowed for the identification of significant periods of activity or trends in user behavior in ETSIT. As one year is too long for a minute-level analysis, we aggregated our data by days and by hours.

Figures 7–10 are organized into groups of two charts. The first one, the accumulated time chart, shows the number of activity minutes registered by each sensor during a time slice. The second chart presents the number of unique users seen by each sensor during the time slice. In addition, Figures 7 and 8 include a third chart with the ratio of accumulated time over unique users; this provides a hint on how stationary users are. In the ratio chart, peaks represent moments when users are still (e.g., students in class) and dips are associated with transition times (e.g., students arriving at the school).

First, Figure 7 shows an overview of these metrics throughout the year. In this figure, the measurements of the 9 sensors are aggregated into a single line. In the case of Figure 7(a), the result is not exactly equal to the sum of the activity minutes each sensor accumulates, since a user can be detected in the same minute by different sensors, and these occasions are represented as a single instant in this line.
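The three aggregated daily metrics can be sketched as follows. The row layout `(day, minute, user)` and the function name are assumptions; detections of the same user in the same minute by several sensors collapse into one element of a set, mirroring the "single instant" rule above.

```python
from collections import defaultdict


def daily_metrics(rows):
    """Return {day: (accumulated_minutes, unique_users, minutes_per_user)}.

    rows: iterable of (day, minute, user), one per sensor detection.
    """
    user_minutes = defaultdict(set)
    for day, minute, user in rows:
        user_minutes[day].add((user, minute))   # duplicates across sensors vanish

    metrics = {}
    for day, pairs in user_minutes.items():
        users = {user for user, _minute in pairs}
        metrics[day] = (len(pairs), len(users), len(pairs) / len(users))
    return metrics


example = [
    ("2016-06-01", 600, "u1"),
    ("2016-06-01", 600, "u1"),   # same user and minute, another sensor
    ("2016-06-01", 601, "u1"),
    ("2016-06-01", 600, "u2"),
]
result = daily_metrics(example)
```

A high ratio means few users each accumulating many minutes (e.g., students settled in the library), whereas a low ratio means many users seen only briefly.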

Figure 8: Daily analysis during 2016: (a) minutes accumulated by each sensor (×1000); (b) unique users registered by each sensor (×100); (c) ratio between minutes and users by each sensor (minutes/user). Sensors shown: Entr A, Entr A Sec, Entr B, Entr B Lab, Entr B Sec, Entr C, Entr D, Library, and Std Tables.

The most obvious observation that can be drawn from Figure 7 is the presence of a strange behavior during the months of January and February. The reason is that, as we previously discussed, the sensor network began to be deployed in January 2016 and was not completely operative until March. We keep this data in the analysis because it can be used to compare an anomalous situation with a regular one; also, it is much more intuitive to present a whole year range instead of nine months. Nevertheless, data collected in those months are not used to draw conclusions.

Holiday periods are clearly shown in the unique users chart, where the line falls, for example, in March (Easter), summer holidays, or some isolated holidays. These periods can also be seen in the ratio chart, which increases because on those dates many fewer users attend the school, but those who do usually spend long periods of time in the library. A remarkable point is October 12th, on which the ratio chart reaches its highest peak and the number of users is almost zero. That day, even the library was closed, so only the security staff was in the school. We can also observe the effect of weekends on each chart: both the number of users and the accumulated minutes decrease, but the ratio increases for the same reason given for the holidays, as users study at the library during the weekends. Finally, between the months of May and June, a decreasing trend can be observed both in the unique users and in the accumulated minutes, going up in mid-June for the examination period.

Figure 8 shows a second set of graphs that correspond to a daily analysis of the whole year, representing each sensor. One of the first conclusions that can be drawn is the difference between the proportions in the unique users chart and the accumulated minutes chart. The difference between the numbers of registered users per sensor is not as remarkable as the difference between the numbers of accumulated minutes. Again, this is due to the fact that users spend much longer periods of time in the library than in other areas. Observing the line of the sensor of building C in the accumulated minutes chart, it can be seen that the magnitude is maintained throughout the year. This sensor covers a building where there are professors' offices and research laboratories; i.e., this sensor registers mostly users who work at the school, and they maintain a more regular schedule throughout the year than students who go to classes or to the library.

Figure 9: Calendar view: (a) unique users registered by sensor Entr A Sec; (b) minutes accumulated by sensor Entr A Sec; (c) unique users registered by sensor Entr C; (d) minutes accumulated by sensor Entr C.

Figure 10: Hourly analysis during the first week of November: (a) minutes accumulated by each sensor (×1000); (b) unique users registered by each sensor (×100).

Table 4: Sensor records collisions.

| Collisions | Count | Total | Percent (%) |
| --- | --- | --- | --- |
| 9 | 3 | 27 | 0.00 |
| 8 | 45 | 360 | 0.00 |
| 7 | 421 | 2947 | 0.01 |
| 6 | 5371 | 32226 | 0.08 |
| 5 | 59222 | 296110 | 0.72 |
| 4 | 560994 | 2243976 | 5.46 |
| 3 | 2525718 | 7577154 | 18.42 |
| 2 | 7268825 | 14537650 | 35.35 |
| 1 | 16433862 | 16433862 | 39.96 |

The line charts can be difficult to interpret for periods of time as long as a whole year. To ease the visual analysis, a new calendar visualization is offered, in which the same data can be studied in a more intuitive way. Figure 9 presents the full year view in calendar format, with each row representing one day of the week and the months delimited in black. The color intensity of each cell is proportional to the data it represents. Figure 9 is formed by 4 subfigures, which correspond to the count of users and the number of minutes registered by the sensors "Entr A Sec" and "Entr C". These visualizations are relative and can be used to obtain general conclusions. For a detailed study, both visualizations with absolute measurements and a data examination are still necessary.

Figure 9(a) represents the number of users detected by the sensor "Entr A Sec", which covers the classrooms of this building, and Figure 9(b) the number of minutes accumulated by the same sensor. It is clear that during the first two months of the year this sensor was not operational; that on Saturdays and Sundays (last and first row) it registers almost no activity; and that holiday periods are clearly identified, such as Easter in March or summer holidays in July and August. But this visualization reveals other information that is more difficult to observe in a line graph, such as that Mondays are the days on which users spend less time in this area. Some blank cells also stand out in the last months of the year, revealing that the sensor did not work during those days.

Figures 9(c) and 9(d) represent the number of users detected and the minutes accumulated by the "Entr C" sensor. As mentioned in the description of Figure 8, this sensor yields very stable measures because it covers the research laboratories, and the workers' schedules are not affected by the school calendar, as is the case for the activity seen by other sensors like "Entr A Sec".

Figure 10 presents a different view, plotted at the hour level during a week in November. This eases the identification of activity hours, which span from 8:00 in the morning to 22:00 in the evening. Another notable effect is the valley in the middle of the day, corresponding to the lunch break, when users move to the cantina (see Figure 2) or leave the buildings. Finally, it is observed that the users leave the school gradually during the afternoon.

4.3. Position Analysis. Before getting into the details and insights obtained from the analysis of the one-year gathered data from the spatial perspective, it is worth mentioning that a set of tests was carried out in a controlled environment during the first stage of the deployment in order to check that the Wi-Fi tracking system worked properly. These tests included tracking a well-known group of MAC addresses throughout the Wi-Fi tracking sensors, checking that they appeared in the appropriate ones. It was also checked that the system correctly located and tracked the security staff throughout their night security rounds.

A spatial analysis provides insight into how the users are distributed throughout the buildings during different times of the year. To reach these conclusions, we have improved the method presented in [68]. The first improvement is related to the event in which a user is detected by two or more sensors during the same minute. From now on, this event will be referred to as a collision. The number of collisions is significant in the case of some sensors that are close to each other (e.g., library and building D) or sensors that cover transition areas (e.g., Entr A or Std Tables).

Table 4 shows a study of the number of collisions. The order of a collision is the number of sensors that detect the same user in the same minute; the count is the number of occurrences of that order, and the total is the number of rows they involve (count times order). Collisions of order 2 and 3 account for more than 50% of the data. The solution to this situation was to eliminate these collisions by replacing, for this experiment, all the rows corresponding to a collision with a single row whose latitude and longitude are the centroid of the positions of the sensors participating in the collision.
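A minimal sketch of this collision handling is given below. It assumes rows of the form `(minute, user, lat, lng)`, one per detecting sensor, and computes the centroid as the arithmetic mean of latitudes and longitudes, a planar approximation that seems reasonable at the scale of a campus. The function name and the coordinates are made up for the example.

```python
from collections import defaultdict


def resolve_collisions(rows):
    """Collapse rows sharing (minute, user) into one position.

    Returns {(minute, user): (lat, lng, order)}, where order is the
    number of sensors that detected the user in that minute.
    """
    groups = defaultdict(list)
    for minute, user, lat, lng in rows:
        groups[(minute, user)].append((lat, lng))

    resolved = {}
    for key, points in groups.items():
        n = len(points)
        lat = sum(p[0] for p in points) / n
        lng = sum(p[1] for p in points) / n
        resolved[key] = (lat, lng, n)
    return resolved


example = [
    (600, "u1", 40.0, -3.0),   # first sensor
    (600, "u1", 42.0, -5.0),   # second sensor, same minute: order-2 collision
    (601, "u1", 40.0, -3.0),   # no collision
]
positions = resolve_collisions(example)
```

Rows without a collision pass through unchanged, since the centroid of a single sensor position is that position.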

The second improvement consists in the incorporation of external information with the approximate position of the access points to which the users are connected. This information has been obtained from the Mylnikov Geo API [69], getting the position of all the ESSIDs registered throughout the year. In the cases where the user is connected and the approximate position of the access point is available, this new position is used instead of the position of the sensor that generates the row. This allows measuring the user's position with more accuracy. In addition, it smooths the effect of using only the positions of the sensors, which causes the resulting possible positions to always lie inside the hull of the polygon formed by the positions of the sensors.

Figure 11: Monthly centroids heatmaps: (a) June; (b) August; (c) September.

This experiment is based on the concept of the user's centroid. This centroid is the average position of a certain user over a certain period of time. Representing these points over the map reveals hints of the users' behaviors over the studied time slice, based on the amount of people in each area.
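The user's centroid, including the preference for the access point position when it is available, can be sketched as follows. The row layout `(user, sensor_pos, ap_pos)` with `ap_pos` set to `None` when no access point position is known, the function name, and the coordinates are assumptions for the example.

```python
from collections import defaultdict


def user_centroids(rows):
    """Average position per user over the rows of the chosen period.

    rows: iterable of (user, sensor_pos, ap_pos), where each position is
    a (lat, lng) tuple and ap_pos is None if no access point is known.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for user, sensor_pos, ap_pos in rows:
        # Prefer the access point position when it is available.
        lat, lng = ap_pos if ap_pos is not None else sensor_pos
        acc = sums[user]
        acc[0] += lat
        acc[1] += lng
        acc[2] += 1
    return {user: (s[0] / s[2], s[1] / s[2]) for user, s in sums.items()}


example = [
    ("u1", (40.0, -4.0), None),            # sensor position used
    ("u1", (40.0, -4.0), (42.0, -2.0)),    # access point position used
]
centroids = user_centroids(example)
```

Restricting the input rows to a month or to an hour yields the monthly and hourly centroids, respectively; the heatmap rendering itself is not covered by this sketch.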

Figure 11 shows heatmaps of three different months. Figure 11(a) is June, a month belonging to the second semester of the academic year and the month when the final exams take place. Figure 11(b) is August, during the summer holidays, when there are no classes or exams and most of the professors, researchers, and staff are on vacation. Figure 11(c) represents September, the start of the semester.

Figure 11(b) confirms one of the facts extracted in the temporal analysis: on holidays, the number of users falls and most of the users present in the school keep studying at the library (the warmest point is located over the library and is much smaller than in other months).

Both Figures 11(a) and 11(c) show that building A is the area with the highest concentration of users. Even so, there are differences between the two figures. In Figure 11(a), the warmest area in the map is over the library. This is explained by the final exam period of June. The same observation can be repeated in other periods of time to discover when the students have exams. In Figure 11(c), the hot spots are on the classrooms. September is the beginning of the school year, and it has the greatest attendance to classes. Thus, this observation is an indicator of the level of students' class attendance.

The same type of analysis can be done with shorter time frames to obtain more detailed behaviors. Figure 12 shows the centroids heatmap grouped by hours on September 2nd.

A detailed analysis by hours like this allows observing the users' movement throughout the day. The first row of maps, in Figures 12(a), 12(b), 12(c), and 12(d), shows the evolution of users' centroids at lunch time, between 12:00 and 16:00. In this transition, it can be seen that there are hot spots on the classrooms and library at the beginning. On the next map, those centroids move to the cantina (see Figure 2), and in the last map, they return to their original positions. The second row of maps in Figure 12 presents the start and the end of the activity time in the school. In Figures 12(e) and 12(f), it can be observed how the first users in the day go directly to classes. By contrast, Figures 12(g) and 12(h) reveal that users tend to be at the library at the end of the day.

Finally, Table 5 presents the count of the different users detected by each sensor throughout the year. Recalling that the total number of users obtained in Section 4.1 is 4653, the data in the table reveal that, over the year, most users have been seen by every sensor at some time. The two exceptions to this fact are the sensor of building C and the sensor of the laboratories of building B. These sensors cover the professors' offices and research laboratories, so it is unusual for students to stay in those areas.

Table 5: Sensor annual statistics.

| Sensor | Number of rows | Number of users |
| --- | --- | --- |
| Library | 14166793 | 4674 |
| Entr D | 6818165 | 4621 |
| Std Tables | 5625974 | 4670 |
| Entr A | 4736237 | 4620 |
| Entr A Sec | 2352019 | 4579 |
| Entr B Sec | 2258618 | 4530 |
| Entr B Lab | 2229560 | 3630 |
| Entr C | 1469026 | 3093 |
| Entr B | 1248462 | 4127 |

Figure 12: Hourly centroids heatmaps of 2nd September: (a) 12:00 to 13:00; (b) 13:00 to 14:00; (c) 14:00 to 15:00; (d) 15:00 to 16:00; (e) 8:00 to 9:00; (f) 9:00 to 10:00; (g) 19:00 to 20:00; (h) 20:00 to 21:00.

4.4. Behavior Analysis. The third set of experiments we performed deals with the behaviors that each user follows throughout a single day. To obtain them, we grouped the data using a user-day key. For each key, a vector of 24 positions (one per hour) is created. In each position of this vector, we record the sensor that detected this user most of the time during that hour. This vector therefore represents the route that the user followed throughout that day, hour by hour. Once the behavior vectors are obtained, the information of the day and the user is discarded to make a count of the most repeated behavior vectors. The dimensionality of these vectors makes the number of possible behaviors huge, theoretically $24^{10} \approx 6 \times 10^{13}$, or $14^{10} \approx 289$K million using only the activity hours, but it is known that users behave similarly, so a much lower number of behaviors can be expected, even though it will still be a high number.
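The construction of the behavior vectors and the count of the most repeated ones can be sketched as follows. The row layout `(user, day, hour, sensor)`, with one row per detected minute, and the function names are assumptions; ties between sensors within an hour are broken by first appearance, which is a choice of this sketch and not something stated in the text.

```python
from collections import Counter, defaultdict


def behavior_vectors(rows):
    """Build one 24-position vector per (user, day) key.

    Each position holds the sensor that detected the user for the most
    minutes in that hour, or None if the user was not seen.
    """
    counts = defaultdict(lambda: [Counter() for _ in range(24)])
    for user, day, hour, sensor in rows:
        counts[(user, day)][hour][sensor] += 1
    return {
        key: tuple(c.most_common(1)[0][0] if c else None for c in hours)
        for key, hours in counts.items()
    }


def most_repeated(vectors, top=20):
    """Drop the user-day keys and count identical behavior vectors."""
    return Counter(vectors.values()).most_common(top)


example = [
    ("u1", "2016-09-02", 9, "Library"),
    ("u1", "2016-09-02", 9, "Library"),
    ("u1", "2016-09-02", 9, "Entr A"),
    ("u1", "2016-09-02", 10, "Entr A"),
]
vectors = behavior_vectors(example)
```

Using tuples for the vectors makes them hashable, so identical routes from different users and days fall into the same counter bucket.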

Table 6 presents the 20 most repeated behaviors throughout the year. A total of 285K behaviors are detected, of which 139K are unique. The 500 most frequent behaviors account for 25% of the total behaviors. Table 6 shows only the part corresponding to the activity hours of the school, which, as observed in the temporal analysis, span from 8:00 am to 10:00 pm. As can be seen, the majority of the most repeated behaviors are periods of between two and five hours of stay in the library. It is necessary to expand the analysis to the top 20 to observe the class attendance behaviors. It is easy to appreciate that the different behaviors usually take place either in the morning or in the afternoon, with lunchtime from 1:00 pm to 3:00 pm, which means that the majority of users go either in the morning or in the afternoon, but they do not spend all day at school.

Table 6: Top 20 most repeated behaviors. The columns of the table are the 14 hourly slots from 8:00 to 9:00 up to 21:00 to 22:00; for each behavior, the sensor and the number of hourly slots it occupies are listed.

| Rank | Sensor | Hourly slots |
| --- | --- | --- |
| 1 | Lib | 5 |
| 2 | Lib | 4 |
| 3 | EntA | 1 |
| 4 | Lib | 3 |
| 5 | Lib | 4 |
| 6 | Lib | 1 |
| 7 | EntA | 1 |
| 8 | Lib | 6 |
| 9 | Std | 14 |
| 10 | EntA | 1 |
| 11 | EntA | 1 |
| 12 | Lib | 5 |
| 13 | Lib | 2 |
| 14 | Lib | 2 |
| 15 | EntA | 1 |
| 16 | EntA | 2 |
| 17 | EntA | 1 |
| 18 | Lib | 2 |
| 19 | SecA | 4 |
| 20 | EntA | 1 |

Lib = Library; EntA = Entr A; SecA = Entr A Sec; Std = Std Tables.

Another quite obvious observation is that many of the behaviors obtained are very similar to each other (e.g., going to the library from 9:00 am to 2:00 pm or going to the library from 9:00 am to 1:00 pm). In order to lower this redundancy, we performed a clustering procedure on them. Each behavior can be understood as a categorical vector of length 24, in which the categories correspond to the 9 possible sensors of the data set plus the empty category. The chosen clustering algorithm is Proximus [70], due to its simplicity, efficiency, scalability, and the reliability of its results. The algorithm works with binary vectors and creates clusters based on the Hamming distance (the number of bits that differ between two binary vectors). A vector is chosen as the center of the cluster, and other vectors whose distance to the center is smaller than the maximum cluster radius are added to that cluster.

The behaviors are expressed as categorical vectors, so it is necessary to transform them into binary vectors to be used in Proximus. The transformation shown in Equation (1) is proposed. This transformation is simple and also reversible, which allows for the recovery of the original behavior vectors after clustering.

$$
\begin{aligned}
B &= \{h_0, \ldots, h_{23}\}, \quad h_i \in \{0, a_1, \ldots, a_9\} \\
&\longrightarrow B' = \{\delta_{0,a_1}, \ldots, \delta_{0,a_9}, \ldots, \delta_{23,a_1}, \ldots, \delta_{23,a_9}\}, \\
\delta_{i,a_j} &=
\begin{cases}
0 & \text{if } h_i \neq a_j \\
1 & \text{if } h_i = a_j
\end{cases}
\end{aligned}
\tag{1}
$$

where $a_1, \ldots, a_9$ are the nine sensors, $B$ is the behavior vector described above, and $B'$ is the transformed vector of length 24 × 9, made out of subgroups of 9 samples, each one associated with one hour. Every sample in each subgroup is 0 except for the one at the index of the active sensor in that hour, which is marked with a 1.
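The transformation of Equation (1), its inverse, and the Hamming distance used for clustering can be sketched in Python as follows. The ordering of the sensors within each 9-bit subgroup is an assumption (the sensor names are taken from Table 5), and the empty category is represented by `None`. The sketch does not implement Proximus itself.

```python
SENSORS = [
    "Library", "Entr D", "Std Tables", "Entr A", "Entr A Sec",
    "Entr B Sec", "Entr B Lab", "Entr C", "Entr B",
]


def encode(behavior):
    """Map a 24-position categorical vector to 24 * 9 bits (one-hot per hour)."""
    bits = []
    for h in behavior:
        bits.extend(1 if h == s else 0 for s in SENSORS)
    return bits


def decode(bits):
    """Invert encode(): recover the sensor (or None) for each hour."""
    n = len(SENSORS)
    behavior = []
    for i in range(0, len(bits), n):
        block = bits[i:i + n]
        behavior.append(SENSORS[block.index(1)] if 1 in block else None)
    return behavior


def hamming(a, b):
    """Number of positions in which two equal-length bit vectors differ."""
    return sum(x != y for x, y in zip(a, b))


morning = [None] * 24
morning[9:13] = ["Library"] * 4     # library from 9:00 to 13:00
longer = list(morning)
longer[13] = "Library"              # same, but staying one more hour
```

Under this encoding, adding or removing one hour of presence changes one bit, while replacing one sensor with another in the same hour changes two bits, so the two library behaviors in the example are at distance 1.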

Table 7 shows the results of applying the Proximus clustering to the found behaviors. The first column is the center of the cluster, the second one is the number of distinct behaviors that are grouped in that cluster, and the third is the total number of recorded behaviors that fall within that cluster. The results are presented ordered by the number of behaviors included in the cluster. A total of 45707 clusters are obtained. The first 300 account for 50% of the behaviors recorded throughout the year.

Table 7: Behavior clusters.

| | Cluster center | Cluster components | Behaviors count |
| --- | --- | --- | --- |
| 1 | Entr A Sec from 9:00 to 14:00 | 415 | 4963 |
| 2 | Entr B Sec from 10:00 to 13:00 | 458 | 3756 |
| 3 | Entr B Sec from 15:00 to 19:00 | 471 | 2924 |
| 4 | Library from 9:00 to 13:00 | 256 | 2870 |
| 5 | Library from 10:00 to 12:00 | 176 | 2396 |
| 6 | Library from 18:00 to 21:00 | 244 | 2231 |
| 7 | Entr A 13:00 | 787 | 1979 |
| 8 | Library 12:00 | 563 | 1894 |
| 9 | Library from 18:00 to 19:00 and from 20:00 to 21:00 | 175 | 1853 |
| 10 | Entr B Lab 11:00 to 20:00 | 568 | 1831 |

The most important cluster behaviors in Table 7 can be understood as follows:

(i) Clusters 1, 2, and 3: students attending classes. These sensors cover the main classrooms in the school, and the intervals match the class schedule.

(ii) Clusters 4, 5, 6, and 9: students at the library.

(iii) Cluster 7: users who are usually outside the sensors' coverage areas, arriving or leaving.

(iv) Cluster 10: laboratory equipment and professors. This sensor covers some of the professors' offices and laboratories, which usually have some laboratory equipment connected during work hours.

5. Validation of Occupancy Estimation Based on Wi-Fi Tracking

Although, as already pointed out in Section 4.3, it was tested at the very first stage of the deployment that the Wi-Fi sensors properly locate and track well-known MAC addresses, it was still necessary to validate the accuracy of the system for estimating occupancy (as it can be distorted by the aforementioned fact that a single user can carry several devices connected to Wi-Fi networks). As no ground-truth data was available for the full set of buildings and only some data was found for the library, we centered our validation efforts on comparing our data with the available ones: if we can trust our results in that area, then we can extend our trust to the rest of the areas, for which no well-known data are available.

The library of ETSIT has 408 study seats, and it offers a web service to check the number of available seats at a given moment [71]. This system is based on two sources of information: a person who counts the empty seats every opening hour from Friday to Sunday and a video camera located at the main entrance of the library that counts the number of people entering or leaving at 15-minute intervals. The human system provides a ground truth about the number of occupied seats, but this measure is very different from the actual number of people in the library, since a common situation, especially during examination terms, is that students place their study material at the seat to reserve it while they are not in the library. This situation is a problem for the library staff, and therefore they installed the video camera system to count the students' entrances and exits.

This system generates an estimation of the number of people in the library by adding to the previous measure the number of people detected entering and subtracting the number of people detected leaving. The system is not perfect, and in most cases it carries an accumulated error that grows in the estimation of the number of people in the library. The total error can be clearly calculated at the end of the day, when the library closes and the number of people inside is supposed to be zero. In summer, there is a situation that aggravates this error and, consequently, the measure achieved by this system: due to the rise in temperature, the back door of the library is opened to improve ventilation, which allows students to exit through it, although they must continue entering through the main door. The camera does not count students leaving through this back door.
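The running estimate of the camera system, and the way its accumulated error shows up at closing time, can be illustrated with the following sketch. The function names and the example figures are made up; the input is assumed to be one `(entries, exits)` pair per 15-minute interval.

```python
def camera_occupancy(intervals, start=0):
    """Running occupancy estimate from per-interval (entries, exits) counts."""
    estimates = []
    current = start
    for entries, exits in intervals:
        current += entries - exits   # previous measure + entering - leaving
        estimates.append(current)
    return estimates


def closing_error(estimates):
    """Residual estimate at closing time, when the true occupancy is zero."""
    return estimates[-1] if estimates else 0


# Made-up day: 4 of the 15 people who entered left through the
# uncounted back door, so their exits never reach the camera.
day = [(10, 0), (5, 3), (0, 8)]
estimates = camera_occupancy(day)
```

Since every missed exit stays in the running sum for the rest of the day, the error can only be corrected by resetting the count when the library closes.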

The library staff provided us with the data collected by the two systems (human and camera) between June 5th and 30th. These measurements can be compared with those obtained by the Wi-Fi sensor installed at the library to validate them. For this test, all the data collected by the sensor are used, without applying the previously explained filtering of the MACs of sporadic users.

Figure 13 shows the data collected by the three systems on Sunday, June 5th. This is the first day with data from the three sources; other days in which these three sources are present show the same trends. It is clear that there is a divergence between the human system observation and the rest of the data. The graph of the human system shows that the number of occupied seats increases in the first hours up to the maximum and remains steady until the end of the day, without being affected by the behavior of the users at lunchtime. However, this effect is reflected in the camera system and Wi-Fi tracking measurements. We observed that, in general, the number of people accounted for by the camera system is below the Wi-Fi tracking system measure, although the proportion is maintained over time. Finally, the figure shows the cumulative error effect of the camera system, which at the end of the day still reports 65 people in the library.

Figure 13: Measure of library occupation using Wi-Fi, camera, and human systems (number of users versus hour of day, 10:00 to 22:00; series: Camera, Human, Wi-Fi).

To better study the relation between the camera system and the Wi-Fi tracking system, we generated a detailed visualization that allows us to observe the data of the whole month in a single figure. Figure 14 is composed of 3 subfigures; each of them is a matrix of colored cells, where the rows represent a full day and the columns are divisions of 15 minutes. The color intensity is proportional to the number of users measured in each interval, being more intense when more users are detected. Figure 14(a) shows the measurements of the Wi-Fi sensor. The behavior is the one observed in previous sections of this document. Figure 14(b) shows the measurement of the camera system. During the first half of the day, the same trend as in Figure 14(a) can be observed, but in the afternoon and night the accumulated error begins to be noticeable. This error reaches a maximum of 378 people, with an average of 139 people, at the end of the day, while the Wi-Fi tracking system registers a maximum of 10 and an average of 7 at the end of the day (these are machines that are still turned on when the library is empty of people). Finally, Figure 14(c) shows the difference between the two previous ones, using green when the error is positive (the Wi-Fi tracking system registers a higher value than the camera system) and red otherwise. This figure validates the measurements obtained by the Wi-Fi tracking system, since the difference with the measure of the camera system in the first half of the day is small (an average of 27 people counting the data from 8:00 to 16:00), whereas in the afternoon the error accumulated by the camera system makes the average difference grow to 83 people between 16:00 and 22:00.

Figure 14: Measures every 15 minutes of library occupation during June: (a) Wi-Fi records, (b) camera records, and (c) difference between Wi-Fi and camera (each panel: one row per day, columns spanning hours 00 to 23).
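The per-window averages quoted above can be reproduced with a few lines of code. The following is only a sketch: the sample values are made up, and it assumes that the average difference is taken over absolute values, which the text does not state explicitly.

```python
from datetime import time
from statistics import mean

def mean_abs_difference(samples, start, end):
    """Mean |wifi - camera| over samples whose time t satisfies start <= t < end."""
    diffs = [abs(wifi - camera) for t, wifi, camera in samples if start <= t < end]
    return mean(diffs) if diffs else None

# (time of day, Wi-Fi count, camera count); illustrative values only.
samples = [
    (time(9, 0), 120, 100),
    (time(12, 0), 200, 170),
    (time(17, 0), 150, 220),
    (time(21, 0), 40, 130),
]

morning = mean_abs_difference(samples, time(8, 0), time(16, 0))
evening = mean_abs_difference(samples, time(16, 0), time(22, 0))
print(morning, evening)  # 25 80
```

A larger value in the second window than in the first is the signature of an error that accumulates during the day, as opposed to a constant offset between the two systems.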

The Wi-Fi sensor of the library was not placed for the specific purpose of counting the number of people in the library and, in consequence, its coverage area can detect devices that are outside the library. This explains why in some cases it registers a measurement greater than that of the camera system. In any case, the Wi-Fi sensor, with its limitations, registers a measure very similar to that of the camera-based people counting system, while avoiding the cumulative error of that system and being much cheaper (tens of euros against thousands of euros) and less intrusive than a video camera. Currently, since the camera system is already installed, the Wi-Fi sensor measures can be used as a maximum, which would limit the error made by this system. In summary, the usage of a Wi-Fi sensor to count people in the library provides accurate results despite the potential errors induced by the sensing period, collisions, the ratio between users and MACs seen, and the data processing performed. These results invite us to trust that the results of our analyses for the rest of the areas in the campus are also correct.
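One simple way to use the Wi-Fi measure as a maximum for the camera estimate is to clamp the latter. This is a hypothetical sketch of that idea, not the procedure used by the library staff; the function name is an assumption.

```python
def bounded_estimate(camera_count, wifi_count):
    """Clamp the camera estimate to the range [0, wifi_count]."""
    return max(0, min(camera_count, wifi_count))

# At closing time the camera may still report people due to accumulated error,
# while the Wi-Fi sensor only sees a few machines left on.
print(bounded_estimate(65, 7))     # 7
print(bounded_estimate(120, 150))  # 120 (camera value already below the bound)
```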

6. Discussion and Conclusions

In the execution of this case study, we have learned some lessons about the limitations we faced that are worth taking into account for similar systems and analyses. First, there is the key issue of the sensor deployment: it is not just a question of density (number of sensors per unit of surface) but of adapting its design to the topology of the place to be monitored. So, for example, it is crucial to have sensors not only at the entries and exits of buildings but also inside them, as in the considered deployment, where there are sensors placed at the entrances and exits of the buildings but also in other especially relevant places, such as the library or the work-in-group area.

A complete coverage of indoor devices, without collisions and for the complete timespan of the stay indoors, would require a large deployment with many sensors able to cover all corners. This is not suitable in this context; instead, we relied on 9 sensors, some at gates and others in large rooms, which proved to be enough to check if a device is within the reach of any of them. But some assumptions had to be made: we considered as valid behaviors only those that spent a significant amount of time once they had been seen (to remove transient behaviors and those of people walking outside the buildings); also, we discarded behaviors that did not appear enough times in the yearly timespan. On the other hand, the topology of these buildings did not impose severe restrictions on the sensors' coverage. In fact, we found several devices seen by different sensors at once. In summary, we can conclude that the finer the spatial granularity (more sensors), the larger the set of different behaviors, so using raw data from sensors would lead to an explosion of states that would render meaningless results.
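The two assumptions above (a minimum stay time and a minimum number of repetitions) can be sketched as a filter over detected stays. The thresholds, the tuple layout, and the sample dates are assumptions made for illustration; the actual values used in the study are not given here.

```python
from collections import Counter
from datetime import datetime, timedelta

MIN_STAY = timedelta(minutes=10)  # assumed threshold for a "significant" stay
MIN_REPEATS = 2                   # assumed minimum number of occurrences

def filter_stays(stays):
    """Keep stays that are long enough and whose (MAC, sensor) pair repeats.

    stays: list of (mac, sensor, first_seen, last_seen) tuples.
    """
    long_enough = [s for s in stays if s[3] - s[2] >= MIN_STAY]
    repeats = Counter((mac, sensor) for mac, sensor, _, _ in long_enough)
    return [s for s in long_enough if repeats[(s[0], s[1])] >= MIN_REPEATS]

stays = [  # made-up records
    ("aa", "library", datetime(2017, 6, 5, 10, 0), datetime(2017, 6, 5, 12, 0)),
    ("aa", "library", datetime(2017, 6, 6, 10, 0), datetime(2017, 6, 6, 11, 0)),
    ("bb", "gate", datetime(2017, 6, 5, 10, 0), datetime(2017, 6, 5, 10, 1)),     # transient
    ("cc", "library", datetime(2017, 6, 5, 9, 0), datetime(2017, 6, 5, 13, 0)),   # seen only once
]
kept = filter_stays(stays)
print([s[0] for s in kept])  # ['aa', 'aa']
```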

The gathered data is another key issue: some works in the literature obtained signal strength measurements every few seconds, allowing them to identify indoor trajectories. This was not our case and, in fact, this proved to be very limiting. Thus, the precision of our analysis is limited by the regions covered by each sensor, which prevents us from analyzing any kind of movement within regions. Nevertheless, as has been seen, relevant results can still be obtained without the increase in energy consumption and the risk of flooding the school communications network that the other approach would entail.

As regards the analysis of data to identify users' behaviors, we have to indicate that a long observation period is a must. Obtaining data for a complete work cycle (in this case, a full school year) helps in discovering common behaviors that happen in a university. Using long observation times can also help in detecting erroneous or atypical operation of the sensors, as happened in the first months of the year in this case. Getting even larger observation datasets would reinforce the behaviors we have found (e.g., seasonality), but the chance to discover new ones is negligible because the expected behaviors in the campus are periodic with the school year.

The usage of big data platforms for analysis, while not mandatory, eases the management of large datasets and the execution of iterative studies on the data. The preprocessing work carried out allowed us to clean and filter our data. For example, some discovered behaviors are too regular and extended over time, which suggests that they correspond to servers or machines, which we could then filter out.
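As an illustration of this kind of cleaning, the sketch below flags MAC addresses that are present in almost every time slot of the observation period, which is the pattern expected from servers or machines that are always on. The slot representation and the 0.9 threshold are assumptions, not values reported in this work.

```python
def always_on_devices(sightings, total_slots, min_fraction=0.9):
    """Return MACs seen in at least min_fraction of the time slots.

    sightings: iterable of (mac, slot_index) pairs.
    """
    slots_by_mac = {}
    for mac, slot in sightings:
        slots_by_mac.setdefault(mac, set()).add(slot)
    return {
        mac
        for mac, slots in slots_by_mac.items()
        if len(slots) / total_slots >= min_fraction
    }

# Made-up data: a "server" seen in all 10 slots and a "phone" seen in 2 of them.
sightings = [("server", slot) for slot in range(10)] + [("phone", 2), ("phone", 3)]
print(always_on_devices(sightings, total_slots=10))  # {'server'}
```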

The temporal analysis has revealed some behaviors that are maintained throughout the year and others that occur occasionally. The work has focused on the study of behaviors that are repeated and maintained throughout the year. A closer view, such as the analysis per hour, shows the hours of activity, nocturnal patterns, or movements at mealtime. In the daily view, the difference in activity between work days and weekends or holidays is clearly seen. Visualization has been a key technique to detect and understand these temporal patterns.

The spatial analysis revealed that a collision problem occurs in user detection, but its effect was mitigated by the use of innovative algorithms and techniques, such as the calculation of centroids and the combination of sensor information with external sources of information (i.e., semantic trajectories), so a much more precise positioning of the users is achieved than with the exclusive use of the sensors. The visualization of centroids fostered the visual understanding of complex data, such as the position of each user over a month, and the superposition of the centroids in a heatmap allowed knowing the movements of the groups of people and the occupation of the different zones.
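A centroid of this kind can be sketched as a detection-weighted average of sensor positions. The weighting scheme, the coordinate system, and the names below are assumptions for illustration; they are not necessarily the exact computation used in this work.

```python
def weighted_centroid(detections, sensor_xy):
    """Centroid of sensor positions weighted by how often each sensor saw a device.

    detections: {sensor_id: number_of_detections}
    sensor_xy:  {sensor_id: (x, y)} in any planar coordinate system.
    """
    total = sum(detections.values())
    if total == 0:
        raise ValueError("no detections")
    x = sum(sensor_xy[s][0] * n for s, n in detections.items()) / total
    y = sum(sensor_xy[s][1] * n for s, n in detections.items()) / total
    return x, y

# Hypothetical layout: two sensors 10 m apart on the x axis.
sensor_xy = {"gate": (0.0, 0.0), "library": (10.0, 0.0)}
print(weighted_centroid({"gate": 1, "library": 3}, sensor_xy))  # (7.5, 0.0)
```

When a device is seen by several sensors at once, the centroid places it between them in proportion to the number of detections, which is one way to soften the effect of collisions.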


Furthermore, we have found dominant users' behaviors as the most repeated behaviors registered by the sensors throughout the year. The number of behaviors found was huge, but the application of the Proximus clustering algorithm reduced this number to a manageable amount. Then, the study of the obtained clusters revealed that the most frequent behaviors coincide with what can be expected from a college building: researchers working in laboratories and students attending classes or studying in the library.
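The notion of a dominant behavior as the most repeated one can be illustrated with a plain frequency count. Note that this sketch is not the Proximus algorithm: it only counts exact repetitions of a daily sequence of zones, and the zone names are made up.

```python
from collections import Counter

def dominant_behaviors(daily_sequences, top=3):
    """Return the `top` most frequent daily zone sequences with their counts."""
    counts = Counter(tuple(seq) for seq in daily_sequences)
    return counts.most_common(top)

# Each entry is the ordered list of zones where one device was seen in one day.
daily_sequences = [
    ["gate", "library", "gate"],
    ["gate", "library", "gate"],
    ["gate", "lab", "gate"],
    ["gate", "library", "gate"],
    ["gate", "lab", "gate"],
    ["gate", "canteen", "gate"],
]
print(dominant_behaviors(daily_sequences, top=2))
# [(('gate', 'library', 'gate'), 3), (('gate', 'lab', 'gate'), 2)]
```

Exact counting breaks down when behaviors differ only slightly, which is where a clustering step that groups similar behaviors becomes necessary.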

Although some of the discovered behaviors and patterns can be seen as common knowledge, it is worth stressing that they do bring value, since they represent numerical evidence that supports decision making (e.g., someone may think that the Wi-Fi access in a given area does not work properly because it is always overcrowded, but numerical evidence is needed to appropriately justify the investment of increasing the number of APs of the corporate WLAN in that area to improve the service). In addition, such well-known patterns, when obtained automatically by processing the available data, become baseline models which can be used to detect anomalies or atypical situations, as is common practice in unsupervised machine learning.
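As a minimal sketch of how a baseline model can be used to flag atypical situations, the code below compares a new occupancy value with the history for the same hour and weekday using a z-score. The z-score rule and the threshold of 3 are assumptions chosen for illustration; they are not a method described in this work.

```python
from statistics import mean, stdev

def is_anomalous(history, observed, threshold=3.0):
    """Flag `observed` if it lies more than `threshold` standard deviations
    away from the mean of `history` (needs at least two past values)."""
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return observed != mu
    return abs(observed - mu) / sigma > threshold

# Made-up occupancy counts for the same hour on previous comparable days.
history = [100, 110, 90, 105, 95]
print(is_anomalous(history, 102))  # False
print(is_anomalous(history, 200))  # True
```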

Lastly, we have validated the accuracy of using Wi-Fi tracking for occupancy estimation, comparing it with the library staff manual counting (considered as ground truth) and with a video camera system installed at the library main entrance. As a main finding, Wi-Fi tracking has proved to be more accurate than the video camera system, in addition to being far cheaper. As a matter of fact, the library staff currently prefers the occupancy estimation based on Wi-Fi tracking to the one based on the video camera system. Nevertheless, the accuracy of the Wi-Fi tracking system can be further improved by correlating MAC addresses detected in the same places over sufficiently long periods of time and counting them only once, thus mitigating the issue related to the fact that a single person can carry several devices connected to Wi-Fi networks.
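The improvement suggested above could be approached as follows. This is only a sketch under stated assumptions: each MAC is described by the set of (sensor, time slot) pairs where it was seen, two MACs are merged when the Jaccard similarity of those sets reaches an arbitrary threshold, and merged groups are counted as one person. None of these choices comes from the deployed system.

```python
from itertools import combinations

def group_devices(presence, min_jaccard=0.8):
    """Group MACs whose presence patterns overlap strongly.

    presence: {mac: set of (sensor, slot)}.
    Returns a list of sets of MACs, each assumed to belong to one person.
    """
    parent = {mac: mac for mac in presence}

    def find(m):
        # Union-find lookup with path halving.
        while parent[m] != m:
            parent[m] = parent[parent[m]]
            m = parent[m]
        return m

    for a, b in combinations(presence, 2):
        union = presence[a] | presence[b]
        if union and len(presence[a] & presence[b]) / len(union) >= min_jaccard:
            parent[find(a)] = find(b)

    groups = {}
    for mac in presence:
        groups.setdefault(find(mac), set()).add(mac)
    return list(groups.values())

presence = {  # made-up data: "aa" and "bb" always appear together
    "aa": {("library", 1), ("library", 2), ("library", 3)},
    "bb": {("library", 1), ("library", 2), ("library", 3)},
    "cc": {("gate", 1)},
}
groups = group_devices(presence)
print(len(groups))  # 2 estimated people instead of 3 devices
```

A high threshold keeps two friends who study side by side from being merged too easily, at the cost of missing devices of the same person that are switched on at different times.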

Besides this, the Wi-Fi tracking system is currently used by the library staff to perform more sophisticated studies, such as figuring out the percentage of students from the different schools of the university who come to study at the library of the Telecommunications Engineering School during the weekend. Figure 15 shows the results of such an analysis, which represents a token of how this kind of IoT system can help in solving real-life problems and improving the operation of already running services.

To summarize, we have studied a one-year Wi-Fi tracking dataset obtained from a reduced set of low-cost sensors with limited capabilities, deployed on an actual university campus that receives around 4000 people every day. We have processed the data in order to identify traces of Wi-Fi-enabled mobile devices, which are identified as people moving in the campus buildings, and then we have extracted people's stays, movements, and common behaviors. The obtained results represent numerical evidence that illustrates how a low-cost Wi-Fi tracking system can be used in real-life conditions to improve or optimize the operation of the monitored premises. These results can allow appropriately dimensioning the WLAN infrastructure or the canteen personnel, or detecting anomalous situations in real time. Furthermore, the data from the people flow monitoring system is currently being used together with the data from the environmental monitoring system to try to reduce the environmental footprint of the school [72]. In addition, the Wi-Fi tracking system is actually used by the library staff in their day-to-day activity, which illustrates the value that this kind of IoT infrastructure can bring to real-life problems and services.

Figure 15: Users who only go to the library of the Telecommunication School on weekends, classified by their school (CIVIL, INDUS, NAVAL, SPORT, ARQUI, FORES, EDIF, AERO, TELECOM).

Data Availability

The dataset with the Wi-Fi sensor records used to support the findings of this study is available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work has been partly funded by Universidad Politécnica de Madrid through the project RES2+U (Responsables, Sostenibles y Universitarios) (http://blogs.upm.es/res2masu). The work performed by José M. Navarro was funded by the Ministerio de Educación, Cultura y Deporte de España under Grant no. FPU 14/03209.

References

[1] "2018 Revision of World Urbanization Prospects - Multimedia Library - United Nations Department of Economic and Social Affairs," https://www.un.org/development/desa/publications/2018-revision-of-world-urbanization-prospects.html.

[2] M. Pallot, "Engaging Users into Research and Innovation: The Living Lab Approach as a User Centred Open Innovation Ecosystem," https://web.archive.org/web/20120509081658/http://www.cwe-projects.eu/pub/bscw.cgi/1760838?id=715404_1760838.

[3] E. Almirall and J. Wareham, "Living Labs: arbiters of mid- and ground-level innovation," Technology Analysis & Strategic Management, vol. 23, no. 1, pp. 87–102, 2011.

[4] "TUDelft Green office," https://www.tudelft.nl/sustainability.

[5] "UBC Campus as a living laboratory," https://sustain.ubc.ca/our-commitment/campus-living-lab.

[6] "Harvard Sustainability," https://green.harvard.edu.

[7] "UPM City of the Future initiative," http://blogs.upm.es/cityofthefuture-upm/en/initiative.

[8] M. Alvarez-Campana, G. Lopez, E. Vazquez, V. A. Villagra, and J. Berrocal, "Smart CEI Moncloa: An IoT-based platform for people flow and environmental monitoring on a Smart University Campus," Sensors, vol. 17, no. 12, 2017.

[9] M. B. Kjærgaard, H. Blunck, T. Godsk, T. Toftkjær, D. L. Christensen, and K. Grønbæk, "Indoor positioning using GPS revisited," Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics): Preface, vol. 6030, pp. 38–56, 2010.

[10] A. Belmonte-Hernandez, G. Hernandez-Penaloza, F. Alvarez, and G. Conti, "Adaptive Fingerprinting in Multi-Sensor Fusion for Accurate Indoor Tracking," IEEE Sensors Journal, vol. 17, no. 15, pp. 4983–4998, 2017.

[11] M. S. Kristoffersen, J. V. Dueholm, R. Gade, and T. B. Moeslund, "Pedestrian counting with occlusion handling using stereo thermal cameras," Sensors, vol. 16, no. 1, 2016.

[12] L. Zheng, X. Ruan, Y. Chen, and M. Huang, "Shadow removal for pedestrian detection and tracking in indoor environments," Multimedia Tools and Applications, vol. 76, no. 18, pp. 18321–18337, 2017.

[13] T. Li, H. Chang, M. Wang, B. Ni, R. Hong, and S. Yan, "Crowded scene analysis: a survey," IEEE Transactions on Circuits and Systems for Video Technology, vol. 25, no. 3, pp. 367–386, 2015.

[14] F. Adib and D. Katabi, "See through walls with WiFi," in Proceedings of the Annual Conference of the ACM Special Interest Group on Data Communication on the Applications, Technologies, Architectures, and Protocols for Computer Communication, ACM SIGCOMM 2013, pp. 75–86, China, August 2013.

[15] Z. Kabelac, D. Katabi, and R. C. Miller, "3D Tracking via Body Radio Reflections," in Proceedings of the 11th USENIX Conference on Networked Systems Design and Implementation, pp. 317–329, 2014.

[16] F. Adib, Z. Kabelac, and D. Katabi, "Multi-person localization via RF body reflections," in Proceedings of the 12th USENIX Symposium on Networked Systems Design and Implementation, NSDI 2015, pp. 279–292, USA, May 2015.

[17] Z.-A. Deng, G. Wang, D. Qin, Z. Na, Y. Cui, and J. Chen, "Continuous indoor positioning fusing WiFi, smartphone sensors and landmarks," Sensors, vol. 16, no. 9, 2016.

[18] R. A. Becker, R. Caceres, K. Hanson et al., "A tale of one city: Using cellular network data for urban planning," IEEE Pervasive Computing, vol. 10, no. 4, pp. 18–26, 2011.

[19] J. E. Mallah, F. Carrino, O. A. Khaled, and E. Mugellini, "Crowd monitoring: critical situations prevention using smartphones and group detection," Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics): Preface, vol. 9189, pp. 496–505, 2015.

[20] J. Kuang, X. Niu, and X. Chen, "Robust Pedestrian Dead Reckoning Based on MEMS-IMU for Smartphones," Sensors, vol. 18, no. 5, p. 1391, 2018.

[21] A. Kurkcu and K. Ozbay, "Estimating Pedestrian Densities, Wait Times, and Flows with Wi-Fi and Bluetooth Sensors," Transportation Research Record, vol. 2644, no. 1, pp. 72–82, 2017.

[22] A. Kotanen, M. Hannikainen, H. Leppakoski, and T. Hamalainen, "Positioning with IEEE 802.11b wireless LAN," in Proceedings of the 14th IEEE 2003 International Symposium on Personal, Indoor and Mobile Radio Communications, vol. 3, pp. 2218–2222, Beijing, China, 2003.

[23] W. Ho, A. Smailagic, D. P. Siewiorek, and C. Faloutsos, "An adaptive two-phase approach to WiFi location sensing," in Proceedings of the 4th Annual IEEE International Conference on Pervasive Computing and Communications Workshops, PerCom Workshops 2006, pp. 452–456, Italy, March 2006.

[24] F. Evennou and F. Marx, "Advanced integration of WiFi and inertial navigation systems for indoor mobile positioning," EURASIP Journal on Applied Signal Processing, vol. 2006, Article ID 86706, 11 pages, 2006.

[25] J. Rekimoto, T. Miyaki, and T. Ishizawa, "LifeTag: WiFi-based continuous location logging for life pattern analysis," LNCS, vol. 4718, pp. 35–49, 2007.

[26] J. A. Besada, A. M. Bernardos, P. Tarrío, and J. R. Casar, "Analysis of tracking methods for wireless indoor localization," in Proceedings of the 2nd International Symposium on Wireless Pervasive Computing (ISWPC '07), pp. 492–497, February 2007.

[27] O. Woodman and R. Harle, "Pedestrian localisation for indoor environments," in Proceedings of the 10th International Conference on Ubiquitous Computing (UbiComp '08), pp. 114–123, Seoul, Republic of Korea, September 2008.

[28] F. Aloul, A. Sagahyroon, A. Al-Shami, I. Al-Midfa, and R. Moutassem, "Using mobiles for on campus location tracking," in Proceedings of the 7th International Conference on Advances in Mobile Computing and Multimedia, MoMM2009, pp. 231–235, Malaysia, December 2009.

[29] S. Woo, S. Jeong, E. Mok et al., "Application of WiFi-based indoor positioning system for labor tracking at construction sites: A case study in Guangzhou MTR," Automation in Construction, vol. 20, no. 1, pp. 3–13, 2011.

[30] N. Le Dortz, F. Gain, and P. Zetterberg, "WiFi fingerprint indoor positioning system using probability distribution comparison," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '12), pp. 2301–2304, March 2012.

[31] Z. Chen, H. Zou, H. Jiang, Q. Zhu, Y. C. Soh, and L. Xie, "Fusion of WiFi, smartphone sensors and landmarks using the Kalman filter for indoor localization," Sensors, vol. 15, no. 1, pp. 715–732, 2015.

[32] Y. Shu, C. Bo, G. Shen, C. Zhao, L. Li, and F. Zhao, "Magicol: indoor localization using pervasive magnetic field and opportunistic WiFi sensing," IEEE Journal on Selected Areas in Communications, vol. 33, no. 7, pp. 1443–1457, 2015.

[33] A. Danalet, B. Farooq, and M. Bierlaire, "A Bayesian approach to detect pedestrian destination-sequences from WiFi signatures," Transportation Research Part C: Emerging Technologies, vol. 44, pp. 146–170, 2014.


[34] O. Czogalla and S. Naumann, "Pedestrian indoor navigation for complex public facilities," in Proceedings of the 2016 International Conference on Indoor Positioning and Indoor Navigation, IPIN 2016, pp. 1–8, Spain, October 2016.

[35] L. Schauer, P. Marcus, and C. Linnhoff-Popien, "Towards feasible Wi-Fi based indoor tracking systems using probabilistic methods," in Proceedings of the 2016 International Conference on Indoor Positioning and Indoor Navigation, IPIN 2016, pp. 1–8, Spain, October 2016.

[36] F. Meneses and A. Moreira, "Large scale movement analysis from WiFi based location data," in Proceedings of the 2012 International Conference on Indoor Positioning and Indoor Navigation, IPIN 2012, Australia, November 2012.

[37] B. Bonne, A. Barzan, P. Quax, and W. Lamotte, "WiFiPi: involuntary tracking of visitors at mass events," in Proceedings of the IEEE 14th International Symposium on a World of Wireless, Mobile and Multimedia Networks (WoWMoM '13), pp. 1–6, Madrid, Spain, June 2013.

[38] M. B. Kjaergaard, M. Wirz, D. Roggen, and G. Troster, "Mobile sensing of pedestrian flocks in indoor environments using WiFi signals," in Proceedings of the 10th IEEE International Conference on Pervasive Computing and Communications (PerCom '12), pp. 95–102, Lugano, Switzerland, March 2012.

[39] S. Sendra, M. Garcia, C. Turro, and J. Lloret, "People mobility behaviour study in a university campus using WLANs," in Proceedings of the 3rd International Conference on Mobile Ubiquitous Computing, Systems, Services, and Technologies, UBICOMM 2009, pp. 124–129, Malta, October 2009.

[40] J. Scheuner, G. Mazlami, D. Schoni et al., "Probr - A Generic and Passive WiFi Tracking System," in Proceedings of the 41st IEEE Conference on Local Computer Networks, LCN 2016, pp. 495–502, UAE, November 2016.

[41] Y. Li, S. Williams, B. Moran, and A. Kealy, "Quantized RSS Based Wi-Fi Indoor Localization with Room Level Accuracy," in Proceedings of the International Global Navigation Satellite Systems 2018, 2018.

[42] A. Alessandrini, C. Gioia, F. Sermi, I. Sofos, D. Tarchi, and M. Vespe, "WiFi positioning and Big Data to monitor flows of people on a wide scale," in Proceedings of the 25th European Navigation Conference, ENC 2017, pp. 322–328, Switzerland, May 2017.

[43] L. Vu, K. Nahrstedt, S. Retika, and I. Gupta, "Joint bluetooth/wifi scanning framework for characterizing and leveraging people movement in university campus," in Proceedings of the 13th ACM International Conference on Modeling, Analysis, and Simulation of Wireless and Mobile Systems (MSWiM '10), pp. 257–265, October 2010.

[44] M. Garcia, S. Sendra, C. Turro, and J. Lloret, "User's Macro and Micro-mobility Study using WLANs in a University Campus," International Journal On Advances in Internet Technology, vol. 4, no. 1, pp. 37–46, 2011.

[45] Y. Xu, I. D. G. Groeneveld, R. Sulzer, E. Theocharous, O. T. Willems, and M. S. Tryfona, "Determine activity based on the classified identity of users by using Wi-Fi monitoring," Geomatics Synthesis Group Project Report, 2016.

[46] A. Danalet, L. Tinguely, M. D. Lapparent, and M. Bierlaire, "Location choice with longitudinal WiFi data," Journal of Choice Modelling, vol. 18, pp. 1–17, 2016.

[47] M. Zhou, K. Sui, M. Ma, Y. Zhao, D. Pei, and T. Moscibroda, "MobiCamp: A campus-wide testbed for studying mobile physical activities," in Proceedings of the 3rd International Workshop on Physical Analytics, WPA 2016, pp. 1–6, Singapore, 2016.

[48] A. Fernandez-Ares, A. M. Mora, M. G. Arenas et al., "Studying real traffic and mobility scenarios for a Smart City using a new monitoring and tracking system," Future Generation Computer Systems, vol. 76, pp. 163–179, 2017.

[49] A. E. Redondi and M. Cesana, "Building up knowledge through passive WiFi probes," Computer Communications, vol. 117, pp. 1–12, 2018.

[50] L. Huang, K. Matsuura, H. Yamane, and K. Sezaki, "Enhancing wireless location privacy using silent period," in Proceedings of the 2005 IEEE Wireless Communications and Networking Conference, WCNC 2005: Broadband Wireless for the Masses - Ready for Take-off, pp. 1187–1192, USA, March 2005.

[51] J. Martin, T. Mayberry, C. Donahue et al., "A Study of MAC Address Randomization in Mobile Devices and When it Fails," Proceedings on Privacy Enhancing Technologies, vol. 2017, no. 4, pp. 365–383, 2017.

[52] M. Vanhoef, C. Matte, M. Cunche, L. S. Cardoso, and F. Piessens, "Why MAC address randomization is not enough: an analysis of Wi-Fi network discovery mechanisms," in Proceedings of the 11th ACM on Asia Conference on Computer and Communications Security, pp. 413–424, ACM, Xi'an, China, June 2016.

[53] A. J. Ruiz-Ruiz, H. Blunck, T. S. Prentow, A. Stisen, and M. B. Kjaergaard, "Analysis methods for extracting knowledge from large-scale WiFi monitoring to inform building facility planning," in Proceedings of the 2014 12th IEEE International Conference on Pervasive Computing and Communications, PerCom 2014, pp. 130–138, Hungary, March 2014.

[54] N. Nunes, M. Ribeiro, C. Prandi, and V. Nisi, "Beanstalk - A community based passive Wi-Fi tracking system for analysing tourism dynamics," in Proceedings of the 9th ACM SIGCHI Symposium on Engineering Interactive Computing Systems, EICS 2017, pp. 93–98, Portugal, June 2017.

[55] C. Parent, N. Pelekis, Y. Theodoridis et al., "Semantic trajectories modeling and analysis," ACM Computing Surveys, vol. 45, no. 4, pp. 1–32, 2013.

[56] C. Wei, "Mining of User Behavioral Features Based on Indoor Semantic Trajectories," Boletín Técnico, ISSN: 0376-723X, vol. 55, 2017.

[57] K. V. Long, D. Quang, and N. Klara, Lessons learned from bluetooth/wifi scanning deployment in university campus, Urbana, Illinois, USA, 2010.

[58] V. Radu and M. K. Marina, "HiMLoc: indoor smartphone localization via activity aware pedestrian dead reckoning with selective crowdsourced WiFi fingerprinting," in Proceedings of the International Conference on Indoor Positioning and Indoor Navigation (IPIN '13), pp. 1–10, IEEE, Montbeliard-Belfort, France, October 2013.

[59] F. Hong, Y. Zhang, Z. Zhang, M. Wei, Y. Feng, and Z. Guo, "WaP: Indoor localization and tracking using WiFi-Assisted Particle filter," in Proceedings of the 39th Annual IEEE Conference on Local Computer Networks, LCN 2014, pp. 210–217, Canada, September 2014.

[60] Y. Zeng, P. H. Pathak, and P. Mohapatra, "Analyzing shopper's behavior through WiFi signals," in Proceedings of the 2nd Workshop on Physical Analytics, WPA 2015, pp. 13–18, Italy, 2015.

[61] P. Sapiezynski, A. Stopczynski, R. Gatej, and S. Lehmann, "Tracking human mobility using WiFi signals," PLoS ONE, vol. 10, no. 7, p. e0130824, 2015.

[62] Z. Tian, Y. Jin, M. Zhou, Z. Wu, and Z. Li, "Wi-Fi/MARG Integration for Indoor Pedestrian Localization," Sensors, vol. 16, no. 12, p. 2100, 2016.


[63] B. Zhou, Q. Li, Q. Mao, and W. Tu, "A Robust Crowdsourcing-Based Indoor Localization System," Sensors, vol. 17, no. 4, p. 864, 2017.

[64] "Raspberry Pi," https://www.raspberrypi.org.

[65] "TP-LINK Wi-Fi USB dongle datasheet," http://www.tp-link.com/us/products/details/cat-5520_TL-WN722N.html.

[66] "ISO/IEC 20922:2016 - Information technology – Message Queuing Telemetry Transport (MQTT) v3.1.1," https://www.iso.org/standard/69466.html.

[67] G. Piateski-Shapiro and W. Frawley, Knowledge Discovery in Databases, MIT Press, Cambridge, MA, USA, 1991.

[68] J. Andion Jimenez, J. M. Navarro Gonzalez, M. Alvarez-Campana Fernandez-Corredor, and J. C. Duenas Lopez, "A passive, non-intrusive, cheap method to identify behaviours and habits in the Campus," in Proceedings of the XIII Jornadas de Ingenieria Telematica - JITEL2017, vol. 40, no. 47, pp. 10–4995, September 2017.

[69] A. Mylinikov, "Geo project," https://www.mylnikov.org.

[70] K. Mehmet and G. Ananth, "PROXIMUS: A framework for analyzing very high dimensional discrete-attributed datasets," in Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '03, vol. 147, no. 156, pp. 147–156, USA, August 2003.

[71] "UPM Library occupation," http://ceiboard.dit.upm.es/smart-campus/biblio.

[72] C. A. R. Inarejos, A. Rodríguez, G. Lopez, and M. Alvarez-Campana, "Analisis de la huella de carbono de la ETSIT de la UPM y propuesta de mejora basada en datos de la plataforma IoT Smart CEI Moncloa," in Proceedings of the I Congreso Iberoamericano de Ciudades Inteligentes (ICSC-CITIES 2018), 2018.


Table 1: Summary of the state of the art of people flow monitoring techniques which are not based on Wi-Fi tracking (P/A stands for Passive/Active).

| Technology | Ref. | Year | Scenario | Application | P/A | Own deployment |
| Video camera & Computer Vision | [13] | 2015 | Indoor/Outdoor | Crowd scene analysis | P | No |
| RF | [14–16] | 2013, 2014, 2015 | Indoor | Short range movements (1-10 m) with high accuracy (e.g., elderly care, baby care) | P | Yes |
| Cellular networks | [18] | 2011 | Indoor/Outdoor | Analyze people flow in a suburban area near NYC with accuracy around hundreds of meters | P | No |
| GPS | [19] | 2015 | Outdoor | Prevent critical situations in overcrowded scenarios (e.g., concerts) | A | No |
| Smartphone sensors (e.g., accelerometer) | [20] | 2018 | Indoor | PDR: track individual pedestrians | A | Depends |
| Bluetooth | [21] | 2017 | Indoor | Monitor people flow (comparing Wi-Fi with Bluetooth) | A | Yes |

covering from existing models and popular algorithms to current research problems and trends.

People localization and tracking based on radiofrequency (RF) measurements has also been widely addressed in the literature. References [14–16] represent some recent remarkable research work on this topic. These solutions are based on antennas that transmit RF signals and are able to locate and track people based on body radio reflections. In consequence, they can be classified as passive systems. They provide very accurate results, allowing tracking forward and backward movements or body part movements (including breathing), and are even able to track several people under certain configurations. However, it is difficult to track a fairly high number of users individually based on this kind of technique (even if we assume that a person may have a certain type of body radio reflection signature, it would not be different enough between similar people and might change depending on the distance between the person and the vantage points). They are mainly applied to short-range movements (1-10 meters), so they could only be applied in small rooms, at the cost of a large and specific deployment.

People flow monitoring can also be performed by exploiting the ubiquity, communications capabilities, and integrated sensors of smartphones [17]. In [18], for instance, Call Data Records (CDR) are used to locate the base stations the smartphones are attached to and thus analyze people flow in and out of a suburban city near New York City. The main drawbacks of this way of locating and tracking people are its low accuracy (around hundreds of meters) and the fact that the terminal has to be active (e.g., make or receive a call, or send or receive an SMS) in order to be monitored. This technique would nevertheless be classified as passive, given that the needed user activity is not aimed at contributing to the positioning.

Reference [19] proposes a solution to prevent critical situations in overcrowded scenarios based on a smartphone application that transmits its GPS location data. Although it yields good results in terms of accuracy, this approach has three main drawbacks: it is an active technique, since it requires the installation of the app; the power consumption of the app reduces the autonomy of the terminal; and it may not work properly indoors.

Smartphone sensors can also be used to perform Pedestrian Dead Reckoning (PDR), a method that estimates a pedestrian's position based on their past position and the output of the smartphone's sensors (e.g., accelerometer, gyroscope). This technique is usually supported by other positioning methods [20], but in most cases it does not need a specific sensor network other than the smartphone itself. It is an active system and it is typically used to track individual pedestrians instead of flocks. Its main drawback is that it suffers from tracking error accumulation and frequently needs an extra location measurement.

Bluetooth has also been used to monitor people flows in indoor environments. Some works, such as [21], compare Bluetooth with other techniques such as Wi-Fi and conclude that its capabilities are below those of the other options, although it can be used in combination with other positioning systems to improve their accuracy. The main disadvantage of these tracking systems is that most Bluetooth devices only react to Bluetooth signals when the users make them visible to the network. Also, these implementations require a specific deployment of Bluetooth sensors, which usually cannot be used for other tasks. Table 1 summarizes the research work reviewed so far, without considering works based on Wi-Fi tracking.

Lastly, people tracking systems based on Wi-Fi have been a hot topic for more than fifteen years and still are. This is indeed the technique used in this paper. Thus, Table 2 is exclusively devoted to summarizing and comparing previous research work based on this technique.

As shown in Table 2, works related to Wi-Fi tracking techniques can focus on different objectives: some try to obtain users' positions as accurately as possible [22–32], others analyze the trajectories followed by pedestrians [33–35] or flocks [36–38], and finally others study the

Table 2: Summary of the state of the art of people flow monitoring techniques based on Wi-Fi tracking (P/A stands for Passive/Active).

Ref. | Year | Scenario | Application | P/A | Own deployment
[22] | 2003 | Indoor | One of the earliest approaches on precise positioning using Wi-Fi (precision 2.6 m) | A | Yes
[23] | 2006 | Indoor | Wi-Fi fingerprint to identify the general location and applying logistic regression to distinguish between finer-grained locations | A | Yes
[24] | 2006 | Indoor office building | Precise positioning. PDR combined with Wi-Fi to reduce the accumulated error | A | Yes
[25] | 2007 | Outdoor and indoor | Creation of Wi-Fi map. Positioning comparing with the created map | A | Yes. Own hardware. Offline analysis
[26] | 2007 | Indoor campus | Comparison of positioning and tracking methods using Wi-Fi | P | No. Offline
[27] | 2008 | Indoor campus | Estimate the position using Wi-Fi and tracking with PDR | A | Yes
[28] | 2009 | Indoor campus | Real time Wi-Fi positioning web portal to check user's positions | A | Yes
[39] | 2009 | Indoor campus | Count of users in different buildings. Analysis of users' mobility between buildings | P | No. Institutional APs
[57] | 2010 | Indoor campus and office building | Coarse position with Wi-Fi and Bluetooth. Graph of user co-occurrence | A | No. Offline
[43] | 2010 | Indoor campus and office building | Calculation of stay length based on Wi-Fi positioning. Analysis of favorite locations | A | No. Offline
[44] | 2011 | Indoor campus | Extension to [39]. User characterization based on their mobility patterns | P | No. Institutional APs
[29] | 2011 | Indoor tunnel in construction | Precise positioning in real time of workers inside a tunnel under construction using Wi-Fi (precision 5 m) | P | Yes. Own AP deployment
[30] | 2012 | Indoor campus | Creation of Wi-Fi fingerprint map. Map usage for positioning with smartphone application | A | No
[36] | 2012 | Indoor campus | Study of crowd movement, Wi-Fi based. Analysis of mobility patterns, users' arrivals and departures from campus | P | No. Institutional APs
[58] | 2013 | Indoor | Wi-Fi path analysis in real time | A | No. Institutional APs
[33] | 2014 | Indoor and outdoor campus | Analyze pedestrian destination frequencies in an area of 55 hectares of a university campus during 5 weekdays | P | No. Institutional APs and Radius server
[59] | 2014 | Indoor campus | Localization and tracking system exploiting particle filters to combine dead reckoning, Wi-Fi RSS-based analysis, and knowledge of the floor plan (precision 0.7 m) | A |
[60] | 2015 | Indoor shopping mall | Wi-Fi Channel State Information analysis to detect shopper activities | P | Yes. Own AP deployment
[31] | 2015 | Indoor | Precise positioning based on sensor fusion combining Wi-Fi, PDR, and landmarks. Smartphone application (positioning 1 m) | A | No. Smartphones
[32] | 2015 | Indoor parking | Precise positioning combining Wi-Fi RSS and electromagnetic field map | |
[37] | 2015 | Outdoor concert. Indoor campus | Portable Wi-Fi based user count. Analysis of crowds in concert and in campus | P | Yes. Raspberry Pi based
[61] | 2015 | Outdoor | Creation of Wi-Fi map using GPS | A |
[62] | 2016 | Indoor | Precise positioning combining Wi-Fi and PDR | A | No. Smartphones
[34] | 2016 | Indoor airport | User path detection. Combining Wi-Fi, GPS, PDR, and Bluetooth to create a multilevel map and study of user's trajectory prediction | A | No. Smartphones
[45] | 2016 | Indoor campus | Analysis of users' activities. User tagging based on activities registered | P | No. Institutional APs
[46] | 2016 | Indoor and outdoor campus | Analysis of user movements to different food points to predict the operation of new stores based on price and location | P | No. Institutional APs
[63] | 2017 | Indoor | Crowdsourcing positioning based on Wi-Fi fingerprint | A | No. Institutional APs
[41] | 2018 | Indoor | Coarse positioning, room level precision, based on probabilistic Wi-Fi fingerprint. Usage of Hidden Markov chain models to analyze user movement | P | No. Institutional APs
[35] | 2016 | Indoor | Trajectory analysis based on Hidden Markov chain models | P | No. Institutional APs
[42] | 2017 | Indoor | Estimate the number of participants and their space and time evolution in an area of about 167 hectares during 2016 Open Day of the European JRC | P | No. Institutional APs
[47] | 2016 | Indoor/Outdoor | Study mobility-related activities in a campus of 440 hectares based on the 2700 APs of the institutional network and additional opt-in smartphone application | A | No. Institutional APs
[53] | 2014 | Indoor | Classify users in a hospital (e.g., patient, doctor, administrative) by checking the number of hours and the positions of a user over time based on the institutional Wi-Fi network | P | No. Institutional APs
[38] | 2012 | Indoor | Identify flocks walking in a building and their behavior based on signal strength from the institutional Wi-Fi and using clustering techniques | P | No. Institutional APs
[45] | 2016 | Indoor | Analyze users' occupation (based on Markov models) as well as regular and irregular hours in a university campus | P | No. Institutional APs
[40] | 2016 | Indoor | Analyze room utilization and people tracking, providing heat maps. Analyze device statistics | A | No. Institutional APs
[48] | 2017 | Indoor/Outdoor | Analyze people mobility, monitoring and tracking in Smart Cities and traffic in a highway (e.g., driving behavior, traffic forecasting) | P | Yes. Raspberry Pi based
[49] | 2018 | Indoor/Outdoor | Provide user localization, user profiling, and device classification | A | Yes. Raspberry Pi based
[54] | 2005 | Indoor/Outdoor | Analyze Wi-Fi tracking records gathered during more than one year in Madeira to classify users as tourists or locals and identify touristic spots | P | Yes. Based on TP-Link MR3240v2 home router
[56] | 2017 | Indoor | Obtain semantic trajectories. Classify users based on their locations. Analyze the probability of a user going to a specific shop based on their history and propose the creation of a recommender based on the whole dataset | - | -

occupation of different zones [39–42] and obtain behavior patterns [36, 43–49].

Wi-Fi tracking of a specific user is usually done by analyzing the collected records related to a specific MAC address, i.e., tracking users is equivalent to tracking their MAC addresses. This technique, in general, presents issues related to tracking people individually and to privacy. Although it is true that a user carrying several devices (e.g., smartphone, tablet, laptop) with the Wi-Fi interfaces on would at first be identified as several users, after a reasonable period of time this information can be correlated to fix the problem [50]. In the case of the work presented in this paper, the files associated with the measurements of different sensors of the same building are compared in order to avoid counting the same mobile/person several times (e.g., due to overlapping Wi-Fi cells). The resulting file allows for the analysis of people flow at a building level, identifying the total stay time or the frequency of the visited places. However, as explained below in the position analysis subsection, this problem appears in the collected data, and it is necessary to postprocess the data to deal with these collisions. Users may also use so-called MAC spoofing (i.e., replacing their actual MAC address with the MAC address of another device), which could be seen as a kind of attack. However, this may yield connectivity problems, and it is a negligible behavior in the scenario considered in this paper.

Regarding privacy, several proposals to protect Wi-Fi communications by means of MAC address anonymization have arisen in recent years. These proposals first appeared as apps for smartphones (allowing a kind of MAC spoofing), but smartphone manufacturers have recently started including these techniques in the latest versions of their Operating Systems (OS) (e.g., iOS, Android, and Windows). Such MAC address anonymization techniques aim to avoid using the actual MAC address until the device gets connected to the Wi-Fi network (i.e., they use a fake MAC address in their probe frames). The specific solution depends on the manufacturer and OS. In the case of iOS, the solution involves sending locally administered MAC addresses in the probe frames, randomly selecting the three least significant bytes of the MAC address. This can be easily detected just by inspecting the first byte of the MAC address. In the case of Android, some manufacturers have decided to use random MAC addresses in the probe frames, taken from the MAC address ranges assigned to them by the IEEE. Nevertheless, even with these techniques in place, it is possible to end up obtaining the actual device information [51, 52]. In addition, MAC randomization is not actually a relevant problem in the scenario considered in this paper, since most of the devices are connected to the available Wi-Fi networks (e.g., Eduroam), and to connect to a Wi-Fi network devices must use their actual MAC addresses.
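The first-byte inspection mentioned above can be sketched as follows. In the IEEE 802 address format, the second least significant bit of the first octet is the universal/local (U/L) bit, and it is set for locally administered addresses. The function name and the sample addresses are illustrative assumptions, not taken from the deployed sensor software.

```python
def is_locally_administered(mac: str) -> bool:
    """Return True if the U/L bit (0x02) of the first octet is set."""
    # Accept both "aa:bb:..." and "aa-bb-..." notations.
    first_octet = int(mac.replace("-", ":").split(":")[0], 16)
    return bool(first_octet & 0x02)


# Hypothetical sample addresses, for illustration only.
print(is_locally_administered("da:a1:19:00:00:01"))  # U/L bit set
print(is_locally_administered("00:1a:2b:3c:4d:5e"))  # U/L bit clear
```

Note that this check only flags addresses that declare themselves as locally administered; randomized addresses drawn from a manufacturer's assigned ranges, as described for some Android devices, would not be caught by it.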

Wi-Fi tracking systems can be classified into two main groups: those that use the enterprise Wireless Local Area Network (WLAN) itself and those that use a dedicated low-cost passive Wi-Fi infrastructure, which is indeed the case of the actual deployment considered in this paper. One of the main drawbacks of the systems that use the enterprise WLAN is that they can only track the users of such networks, whereas independent dedicated low-cost passive Wi-Fi infrastructures allow tracking the users connected to different Wi-Fi networks if they spatially coexist, or even users not connected to any network, provided that MAC randomization is not used.

As examples of works that use the enterprise WLAN or the existing infrastructure of access points (APs), [42] presents the 2016 Open Day of the European Joint Research Center (JRC), where 8000 people participated within an area of about 167 hectares, as a case study where the Wi-Fi infrastructure of the event was used to estimate the number of participants and their space and time evolution based on properly processed MAC addresses. Reference [47] presents MobiCamp, a large-scale testbed composed of around 2700 APs to study mobility-related activities, which combines user mobility traces based on Simple Network Management Protocol (SNMP) data with enriched data (e.g., gender, age) provided by an opt-in smartphone application.

Reference [53] represents yet another example of the analysis that can be made with this kind of information. Its scenario is a hospital and, by checking the number of hours and the positions of a user over time, the authors can classify that user according to a role (e.g., patient, doctor, administrative). Reference [38] identifies flocks walking in a building and their behaviors by applying clustering techniques to the signal strength measurements provided by the institutional WLAN.

Reference [33] presents a campus scenario where, by using the university network infrastructure, a detailed profile of the user's activity can be obtained. The users tracked are those logged into the university network, which provides extra information about the user, such as their role, gender, etc. Combining that information with a detailed map which contains thousands of Points of Interest (POIs), the authors can extract an activity log that shows the different user's activities with minute-level precision. The main drawback of this work is that it totally depends on the users' profiles database and the POI map, and both are resources that are complicated to gain access to or to create. In reference [45], employing the university network infrastructure and the location of each AP in the university campus, each sensor record only stores the closest AP. With this simple information, the authors can create an activity profile similar to the one shown in [33]. By analyzing the basic results obtained, they are able to extract new information (e.g., a count of irregular hours) or detect patterns of anomalous events (e.g., periods of exams or holidays).

As a last example of systems that use the institutional WLAN, reference [40] presents a web application in which the occupation of different rooms on a campus is shown in real time. The number of people in the room is calculated using the number of Wi-Fi devices detected by the APs. Using the signal strength measurement of each of them, a heatmap is drawn that shows the user distribution in the room. The collected data are analyzed offline to produce reports on the utilization of the different rooms and to draw conclusions from the detected patterns.

On the other hand, [37, 48, 49, 54] represent some examples of works which use independent dedicated low-cost passive Wi-Fi infrastructures, as is the case of the deployment considered in this paper. In the case of [48], a network of devices called MOBYWIT, each based on a Raspberry Pi and two wireless USB dongles, is used to track the movement of people and vehicles, sniffing not only Wi-Fi but also Bluetooth signals emitted by smartphones and vehicle hands-free calling systems. In the case of [49], a passive Wi-Fi infrastructure based on low-cost devices that combine a Raspberry Pi and a TP-LINK Wi-Fi dongle is used to provide user localization, user profiling, and device classification based on the properly processed MAC addresses captured from the IEEE 802.11 probe request frames. Reference [37] also uses this approach to count people in a concert, and in reference [54] the considered scenario is a whole island (Madeira, Portugal), where the records gathered all over the island are analyzed to classify users as tourists or locals, as well as to identify touristic spots.

One of the main features that make the work presented in this paper stand out compared to previous work is that one year of data from an actual Wi-Fi tracking system deployed in a real-life environment is analyzed. Most previous works consider hours, a few days (e.g., weekdays), or weeks. Only the work presented in [54] covers a similar period of time (an even longer one), but its analysis is much broader, being far from the level of detail provided in this paper. The considered period of time allows analyzing seasonality effects and other patterns that, although they may be seen as common knowledge, do bring value, since they represent numerical evidence that supports decision making (e.g., someone may think that the Wi-Fi access in a given area does not work properly because it is always overcrowded, but numerical evidence is needed to appropriately justify the investment of increasing the number of APs of the corporate WLAN in that area to improve the service). In addition, such well-known patterns, when obtained automatically by processing the available data, become models which can be used to detect anomalies or atypical situations, as is common practice in unsupervised machine learning. It is also worth mentioning the use of clustering to improve the data analysis and interpretation (as in previous works, e.g., [38]), as well as the application of the semantic trajectory concept [55], which combines positioning data with an external source of information to classify the different positions according to the activity carried out in the area (e.g., users positioned in the cantina will be eating) and which has not been extensively explored in previous works (e.g., [56]).

Figure 3: (a) Raspberry Pi; (b) TP-LINK USB Wi-Fi dongle; (c) developed Wi-Fi listening device.

Another strong point of the work presented in this paper compared to previous work is the validation of the Wi-Fi tracking technique to estimate the occupation of the library of the Telecommunications School and the comparison with a video camera system. Only a few previous works, such as [40] or [49], perform a similar validation (in [40], the number of people detected in the room is compared with the attendance list of the seminar taking place there, and in [49] students are requested to turn on the Wi-Fi interfaces of their devices and provide the MAC addresses of their smartphones and laptops through an anonymous web form to serve as ground truth for device classification). However, again, the period considered in this paper is much longer, and it is shown that the Wi-Fi tracking system outperforms the more expensive video camera system. As a matter of fact, the Wi-Fi tracking system is actually used on a day-to-day basis by the library staff, which illustrates the value that this kind of IoT infrastructure can bring to real-life problems and services.

3. Data Acquisition Infrastructure

Figure 3 shows the Wi-Fi listening device/sensor developed for the people flow monitoring sensor network of the Smart CEI Moncloa. It is based on very common and cheap hardware, namely, a Raspberry Pi board [64], one of the most widely used hardware platforms for IoT, and a TP-LINK USB Wi-Fi dongle [65] configured in monitor mode. As a result, the cost of this solution is in the order of tens of euros (around 80 € when manufactured in 2016), which represents a remarkable cost reduction compared to other solutions available in the market.

As Figure 4 illustrates, broadly speaking, these Wi-Fi sensors scan each of the Wi-Fi channels from both the 2.4 GHz and the 5 GHz bands during a configurable amount of time (currently 250 ms), read the headers of the IEEE 802.11 packets (e.g., data packets or probe requests) transmitted in their region of coverage, and record the sender MAC addresses. As these MAC addresses are unique per device, counting them is a good indicator of the number of devices available in the surroundings of the Wi-Fi sensors (although there may be issues related to the fact that a single user can carry several devices, as already pointed out in Section 2), and they allow for temporal correlation analysis, thus yielding useful information such as stay time, availability patterns, etc.
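As a rough illustration of the channel-scanning cycle, the following sketch estimates how long one full sweep takes for a given dwell time per channel. The channel lists are assumptions (a typical European channel plan); the actual set of channels scanned by the sensors is not specified in the text, and retuning overhead is ignored.

```python
def sweep_seconds(channels, dwell_ms=250):
    """Time to listen once on every channel, ignoring retuning overhead."""
    return len(channels) * dwell_ms / 1000


# Hypothetical channel plan, for illustration only.
CHANNELS_2_4_GHZ = list(range(1, 14))
CHANNELS_5_GHZ = [36, 40, 44, 48, 52, 56, 60, 64,
                  100, 104, 108, 112, 116, 120, 124, 128, 132, 136, 140]

print(sweep_seconds(CHANNELS_2_4_GHZ + CHANNELS_5_GHZ))
```

Under these assumptions, the sweep duration bounds how often the sensor returns to a given channel, which is one reason why detections are naturally aggregated at a coarser granularity such as one minute.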

Regarding how this information is carried all the way up to the backend server and processed there, Figure 5 shows an overview of the communications architecture and protocol stack of the people flow monitoring service of the Smart CEI Moncloa.

As can be seen, the Wi-Fi sensors are directly connected to the backend via the Ethernet network of the UPM. Communications are protected end-to-end by the use of Transport Layer Security (TLS) on top of Transmission Control Protocol (TCP)/Internet Protocol (IP). Measurements are periodically sent using Message Queue Telemetry Transport (MQTT) [66]. The publish/subscribe mechanism provided by MQTT allows the Wi-Fi sensors not only to send measurements (i.e., events) periodically but also to receive commands (e.g., to reboot them or to perform a remote firmware update).

Taking advantage of the hierarchical structure of the MQTT topics, all the publish events follow the structure SERVICE/ID/EVENT(/TIMESTAMP). Thus, the topics of the publish events from the Wi-Fi sensors start with Wi-Fi, followed by the MAC address of the sensor's Ethernet interface, which is used as unique ID. The format of the content published under the different topics is Comma Separated Values (CSV), which is a lightweight solution especially appropriate when the data structure is fixed, since the meaning of each field of the subsequent lines is explained only once, in the header at the beginning of the file.
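The topic hierarchy and the CSV payload can be sketched as follows, assuming that topic levels are separated by "/" as is standard in MQTT. The helper names, the event name, the sensor ID, and the CSV columns are hypothetical; the actual field layout used by the sensors is not given in the text.

```python
import csv
import io


def build_topic(service, device_id, event, timestamp=None):
    """Compose a hierarchical topic: SERVICE/ID/EVENT, optionally /TIMESTAMP."""
    levels = [service, device_id, event]
    if timestamp is not None:
        levels.append(str(timestamp))
    return "/".join(levels)


def to_csv_payload(header, rows):
    """Serialize rows as CSV; field meanings are stated once, in the header."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical sensor ID, event name, and columns, for illustration only.
topic = build_topic("Wi-Fi", "b827eb000001", "measurement", 1451606400)
payload = to_csv_payload(["minute", "hashed_mac"], [[0, "ab12"], [1, "cd34"]])
print(topic)
print(payload)
```

A subscriber on the backend could then use MQTT wildcards over this hierarchy, for instance to receive the events of every sensor of a given service with a single subscription.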

Privacy issues have also been considered carefully: the developed Wi-Fi sensors apply an irreversible salted MD5 hash function to the MAC address, which prevents brute-force attacks with precomputed tables. In addition, as Figure 5 shows, once anonymized, the data are carried securely up to the platform servers, where they are handled in an aggregate manner instead of individually.
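A minimal sketch of salted hashing of a MAC address is given below, using Python's standard `hashlib`. How the salt is generated, stored, and combined with the address in the deployed sensors is not described in the text, so the prepended salt and the normalization step are assumptions.

```python
import hashlib


def anonymize_mac(mac: str, salt: bytes) -> str:
    """Return the hex MD5 digest of salt + normalized MAC address."""
    # Normalize notation so the same device always maps to the same digest.
    normalized = mac.lower().replace("-", ":").encode("ascii")
    return hashlib.md5(salt + normalized).hexdigest()


# Hypothetical salt and address, for illustration only.
SALT = b"example-secret-salt"
print(anonymize_mac("00:1A:2B:3C:4D:5E", SALT))
```

Because the same salt yields the same digest for a given device, the anonymized identifier can still be used for temporal correlation (e.g., stay time), while a table precomputed without the salt is of no use to an attacker.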

Furthermore, the software of the developed sensors has been modified to prevent the MAC anonymization mechanisms presented in Section 2 from affecting the obtained measurements. Thus, the Wi-Fi frames with locally administered MAC addresses or with addresses from special MAC address ranges are discarded, so these devices are not considered. In any case, as already mentioned in Section 2, MAC randomization is not a major issue in our case, since most of the users are connected to the Eduroam free Wi-Fi access, so their smartphones end up using their actual MAC addresses.

Figure 4: Sketch of how the developed Wi-Fi listening devices obtain the MAC addresses of the users surrounding them. [The device listens for "x" ms on each channel, capturing data frames (e.g., MAC_U1 on ch1, MAC_U3 on ch3) and probe requests (e.g., MAC_U2 on chN, MAC_U4 on chM) from the users around access points AP1 to AP4.]

Figure 5: Communications architecture and protocol stack for the people flow monitoring service of the Smart CEI Moncloa. [Both the Wi-Fi sensors and the backend server run MQTT/TLS/TCP/IP/802.3; the backend comprises an MQTT broker, real-time and batch processing, a REST API server, a CSV file-based temporal DB, and a dashboard web server.]

After gathering the CSV files for a long time span, we moved to an offline analysis platform on a private cluster. The Apache Spark 2.2 software platform was selected as the data processing tool due to its optimized capabilities to work with large amounts of structured data. Collecting data over such a time span leads to large datasets, which common centralized systems struggle to process but which distributed systems such as Spark can handle easily.

The private cluster is composed of eight HP ProLiant SL250s Gen 8 machines, each with two Intel Xeon E5-2630 v2 2.6 GHz processors (6 cores each) and 32 GB of RAM. In addition to Spark, this platform also runs an Apache Hadoop filesystem, where the dataset and the results are stored. This is a distributed filesystem which allows all the machines to access the stored data in parallel. The usage of such a platform is advisable in order to speed up the analysis, but it is not mandatory, since the algorithms we are going to describe are available in many other software platforms (such as those provided with R or Python).

4. Data Analysis and Results

4.1. Data Processing. Although this work is not a proper data mining process, given that we are not using those kinds of algorithms and analyses, the steps required prior to the actual analysis are the same as in a KDD (Knowledge Discovery in Databases) process [67]. For our analysis, we took the aforementioned CSV files, each one containing the data collected by a single sensor during a period of 15 minutes.

Figure 6: Histogram of the number of registered hours per user: (a) raw analysis; (b) first threshold applied. [Horizontal axis: hours; vertical axis: number of users, in logarithmic scale.]

The observation period used in our analysis is a full year, from January 2016 to December 2016. In one year, 9 sensors, each generating a file every fifteen minutes, would create around 315K files. This is a theoretical maximum, because a file is only created if the sensor is active and detects at least one device during the period. The sensors were deployed at the beginning of 2016, but the deployment was not completely finished until March. There were also some holidays during 2016 when ETSIT was fully closed. During those days, some of the sensors detected no devices (although there are always security staff somewhere in the covered area), so they did not generate a file. As a result, the number of files which we actually have for the analysis is 246K.
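The theoretical maximum quoted above follows from simple arithmetic, sketched here. A 365-day year is assumed because it reproduces the "around 315K" figure; counting the 366 days of the leap year 2016 would give 316,224 files instead.

```python
SENSORS = 9
FILES_PER_HOUR = 60 // 15      # one file every fifteen minutes
HOURS_PER_DAY = 24
DAYS = 365                     # assumption; 2016 actually had 366 days

max_files = SENSORS * FILES_PER_HOUR * HOURS_PER_DAY * DAYS
observed_files = 246_000       # approximate figure reported in the text
coverage = observed_files / max_files

print(max_files)
print(f"{coverage:.0%} of the theoretical maximum")
```

In other words, roughly four out of five possible 15-minute files exist, with the gap explained by the partial deployment until March and by periods in which a sensor detected no device.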

The first step in preprocessing these data was to merge all the files into a single one, resulting in a 2 GB CSV plain text file where each row represents the observation of a device during the associated time period, including, among others, the anonymized MAC address, the sensor name, and the detection of the device for a given minute in the observation timespan. With this transformation, the resulting dataset contains 63 million rows, each one representing the detection of a single device in a single minute by a single sensor.

A total of 128K unique devices were detected during the whole year. It is known that the number of people that regularly go to ETSIT is around 4K, so not all of the detected devices can be considered actual users of interest; consequently, the data needed to be filtered. Only the devices of people who perform activities related to the place, such as students, professors, etc., should be taken into account. Thus, it is necessary to filter out devices, keeping only the ones that can be labelled as users. We apply filters based on the observations of each device.

So, for each device, we count the number of minutes it was detected in the entire dataset. A device can be seen by more than one sensor during the same minute, so, to generate this measure, repeated minutes are counted only once per device. Grouping the resulting counts, a histogram (Figure 6) is obtained, from which it is possible to classify the devices based on the total time recorded during the whole year.
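The per-device count described above, in which a minute seen by several sensors is counted only once, can be sketched in plain Python as follows. The row layout `(minute, device, sensor)` and the sample values are assumptions for illustration; the original analysis was run on Apache Spark.

```python
from collections import defaultdict


def minutes_per_device(rows):
    """Count distinct detection minutes per device.

    rows: iterable of (minute, device, sensor) tuples.
    """
    seen = defaultdict(set)
    for minute, device, _sensor in rows:
        # A set discards repeated minutes reported by different sensors.
        seen[device].add(minute)
    return {device: len(minutes) for device, minutes in seen.items()}


# Hypothetical detections, for illustration only.
rows = [
    (0, "dev1", "Library"),
    (0, "dev1", "Entr A"),   # same minute, second sensor: counted once
    (1, "dev1", "Library"),
    (0, "dev2", "Entr B"),
]
counts = minutes_per_device(rows)
print(counts)
```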

To facilitate the analysis of the chart, the horizontal axis is expressed in hours and the count in the vertical axis is shown in logarithmic scale. Around 95% of the devices were seen for less than 48 hours during the whole year, i.e., on average, less than four hours per month. This group is mainly composed of people passing near the school buildings, momentarily entering the coverage area of the sensors without accessing ETSIT. Figure 6(b) shows the histogram after applying a lower threshold of 48 hours and an upper threshold of 3650 hours (an average of 10 hours per day). In this chart, some isolated peaks can be observed in the tail of the graph, starting around 1000 hours in the horizontal axis. A detailed analysis of these peaks revealed that they were devices that remained connected continuously for several days, like servers. So we applied the label "user" only to those devices that registered a number of hours during the year between these two thresholds. As shown in Table 3, from the 128K detected devices, only 4653 were classified as users, over which we perform the rest of the analysis.
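The labelling rule and the resulting proportions can be checked with a short sketch. The thresholds and the totals are those reported in the text and in Table 3; whether the bounds are inclusive is not stated, so inclusive bounds are an assumption here.

```python
LOWER_HOURS = 48      # below this: passers-by
UPPER_HOURS = 3650    # above this: always-on devices such as servers


def is_user(total_minutes):
    """Label a device as a user if its yearly hours fall within the thresholds."""
    hours = total_minutes / 60
    return LOWER_HOURS <= hours <= UPPER_HOURS  # inclusive bounds assumed


# Averages implied by the thresholds.
hours_per_month = LOWER_HOURS / 12    # 4 hours per month
hours_per_day = UPPER_HOURS / 365     # 10 hours per day

# Proportions kept after filtering, from the totals in Table 3.
device_share = 4653 / 128188
row_share = 41294344 / 63427489
print(f"{device_share:.1%} of devices, {row_share:.0%} of rows")
```

The contrast between the two shares shows that the small group of devices labelled as users accounts for most of the recorded detections.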

Figure 7: Daily analysis aggregated during 2016: (a) minutes accumulated; (b) unique users registered; (c) ratio between minutes and users. [Horizontal axis: date, January to December; vertical axes: accumulated minutes (×1000), number of users (×100), and ratio (minutes/user).]

Table 3: Dataset statistics before and after preprocessing.

 | Rows | Devices
Before preprocessing | 63,427,489 | 128,188
After preprocessing | 41,294,344 (65%) | 4,653 (3.6%)

The last step before the proper analyses is to merge this dataset with the information about the position and name of the sensors. Additionally, only the data columns with relevant information for the analyses are kept, in order to reduce the in-memory cost:

(1) timestamp: the detection time, measured in minutes

(2) user: the MAC address of the detected device

(3) sensor: the MAC address of the sensor which made the detection

(4) sensorName: the name of the sensor which made the detection

(5) accessPoint: the MAC address of the access point to which the device is connected, if any

(6) sensorLat: latitude of the sensor's position

(7) sensorLng: longitude of the sensor's position

4.2. Temporal Analysis. Once the data were ready, we performed a temporal study to determine whether the captured data allowed for the identification of significant periods of activity or trends in user behavior at ETSIT. As one year is too long for a minute-level analysis, we aggregated our data by days and by hours.

Figures 7–10 are organized into groups of charts. The first one, the accumulated time chart, shows the number of activity minutes registered by each sensor during a time slice. The second chart presents the number of unique users seen by each sensor during the time slice. In addition, Figures 7 and 8 include a third chart with the ratio of accumulated time over unique users, which provides a hint on how stationary users are: peaks in the ratio chart represent moments when users stay still (e.g., students in class), and dips are associated with transition times (e.g., students arriving at the school).

First, Figure 7 shows an overview of these metrics throughout the year. In this figure, the measurements of the 9 sensors are aggregated into a single line. In the case of Figure 7(a), the result is not exactly equal to the sum of the activity minutes accumulated by each sensor, since a user can be detected in the same minute by different sensors, and these occasions are represented as a single instant in this line.
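The three aggregated metrics (accumulated minutes, unique users, and their ratio) can be sketched as follows for an arbitrary time slice. The row layout `(minute, user, sensor)`, the function name, and the sample data are assumptions for illustration; the key point is that a (user, minute) pair seen by several sensors contributes a single minute to the aggregated line.

```python
from collections import defaultdict


def slice_metrics(rows, slice_of):
    """Aggregate detections per time slice across all sensors.

    rows: iterable of (minute, user, sensor); slice_of maps a minute to a slice key.
    """
    user_minutes = defaultdict(set)   # slice -> {(user, minute)}
    users = defaultdict(set)          # slice -> {user}
    for minute, user, _sensor in rows:
        key = slice_of(minute)
        user_minutes[key].add((user, minute))  # duplicates across sensors collapse
        users[key].add(user)
    return {
        key: {
            "accumulated_minutes": len(user_minutes[key]),
            "unique_users": len(users[key]),
            "ratio": len(user_minutes[key]) / len(users[key]),
        }
        for key in users
    }


# Hypothetical detections; minutes are counted from the start of the period.
rows = [
    (0, "u1", "Library"), (0, "u1", "Entr A"), (1, "u1", "Library"),
    (0, "u2", "Entr B"), (1440, "u2", "Library"),
]
daily = slice_metrics(rows, lambda minute: minute // 1440)  # 1440 minutes per day
print(daily)
```

Passing a different `slice_of` function (e.g., `lambda minute: minute // 60`) gives the hourly aggregation with the same code.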

Figure 8: Daily analysis during 2016: (a) minutes accumulated by each sensor; (b) unique users registered by each sensor; (c) ratio between minutes and users by each sensor. [Horizontal axis: date, January to December; vertical axes: accumulated minutes (×1000), number of users (×100), and ratio (minutes/user); one line per sensor: Entr A, Entr A Sec, Entr B, Entr B Lab, Entr B Sec, Entr C, Entr D, Library, Std Tables.]

The most obvious observation that can be drawn from Figure 7 is the anomalous behavior during January and February. The reason is that, as previously discussed, the sensor network began to be deployed in January 2016 and was not fully operational until March. We keep this data in the analysis because it can be used to compare an anomalous situation with a regular one; also, it is much more intuitive to present a whole year than nine months. Nevertheless, the data collected in those months are not used to draw conclusions.

Holiday periods are clearly visible in the unique users chart as drops in the line, for example, in March (Easter), during the summer holidays, or on some isolated holidays. These periods can also be seen in the ratio chart, which increases because on those dates far fewer users attend the school, but those who do usually spend long periods of time in the library. A remarkable point is October 12th, on which the ratio chart reaches its highest peak and the number of users is almost zero. That day even the library was closed, so only the security staff was in the school. We can also observe the effect of weekends on each chart: both the number of users and the accumulated minutes decrease, but the ratio increases for the same reason given for the holidays, since users study at the library during the weekends. Finally, from May to June a decreasing trend can be observed both in the unique users and in the accumulated minutes, going up again in mid-June for the examination period.

Figure 8 shows a second set of graphs that correspond to a daily analysis of the whole year, with one line per sensor. One of the first conclusions that can be drawn is the difference between the proportions in the unique users chart and in the accumulated minutes chart: the difference between the numbers of registered users per sensor is not as remarkable as the difference between the numbers of accumulated minutes. Again, this is due to the fact that users spend much longer periods of time in the library than in other areas. Observing


Figure 9: Calendar view (rows: days of the week, Sunday to Saturday; columns: January to December): (a) unique users registered by sensor Entr A Sec; (b) minutes accumulated by sensor Entr A Sec; (c) unique users registered by sensor Entr C; (d) minutes accumulated by sensor Entr C.

Figure 10: Hourly analysis during the first week of November: (a) minutes accumulated by each sensor [×1000]; (b) unique users registered by each sensor [×100]. x-axis: date (Mon 05 to Sun 11); legend: Entr A, Entr A Sec, Entr B, Entr B Lab, Entr B Sec, Entr C, Entr D, Library.


Table 4: Sensor records collisions.

Collisions   Count        Total        Percent (%)
9            3            27           0.00
8            45           360          0.00
7            421          2947         0.01
6            5371         32226        0.08
5            59222        296110       0.72
4            560994       2243976      5.46
3            2525718      7577154      18.42
2            7268825      14537650     35.35
1            16433862     16433862     39.96

the line of the sensor of building C in the accumulated minutes chart, it can be seen that its magnitude is maintained throughout the year. This sensor covers a building where there are professors' offices and research laboratories; i.e., this sensor registers mostly users who work at the school, and they maintain a more regular schedule throughout the year than students who go to classes or to the library.

Line charts can be difficult to interpret for periods of time as long as a whole year. To ease the visual analysis, a calendar visualization is offered in which the same data can be studied in a more intuitive way. Figure 9 presents the full-year view in calendar format, with each row representing one day of the week and the months delimited in black. The color intensity of each cell is proportional to the data it represents. Figure 9 is formed by 4 subfigures, which correspond to the count of users and the number of minutes for the sensors "Entr A Sec" and "Entr C". These visualizations are relative and can be used to obtain general conclusions. For a detailed study, both visualizations with absolute measurements and an examination of the data are still necessary.

Figure 9(a) represents the number of users detected by the sensor "Entr A Sec", which covers the classrooms of this building, and Figure 9(b) the number of minutes accumulated by the same sensor. It is clear that this sensor was not operational during the first two months of the year; that on Saturdays and Sundays (last and first rows) it registers almost no activity; and that holiday periods, such as Easter in March or the summer holidays in July and August, are clearly identifiable. This visualization also reveals other information that is more difficult to observe in a line graph, such as the fact that Mondays are the days on which users spend the least time in this area. Some blank cells also stand out in the last months of the year, revealing that the sensor did not work during those days.

Figures 9(c) and 9(d) represent the number of users detected and the minutes accumulated by the "Entr C" sensor. As mentioned in the description of Figure 8, this sensor shows very stable measurements because it covers the research laboratories, and the workers' schedules are not affected by the school calendar, unlike the activity seen by other sensors such as "Entr A Sec".

Figure 10 presents a different view, plotted at the hour level during a week in November. This eases the identification of the activity hours, which span from 8:00 in the morning to 22:00 in the evening. Another noticeable effect is the valley in the middle of the day, corresponding to the lunch break, when users move to the cantina (see Figure 2) or leave the buildings. Finally, it is observed that users leave the school gradually during the afternoon.

4.3. Position Analysis. Before getting into the details and insights obtained from the spatial analysis of the one-year dataset, it is worth mentioning that a set of tests was carried out in a controlled environment during the first stage of the deployment in order to check that the Wi-Fi tracking system worked properly. These tests included tracking a well-known group of MAC addresses throughout the Wi-Fi tracking sensors, checking that they appeared in the appropriate ones. It was also checked that the system correctly located and tracked the security staff throughout their night security tours.

A spatial analysis provides insight into how users are distributed throughout the buildings during different times of the year. To reach these conclusions, we have improved the method presented in [68]. The first improvement is related to the event in which a user is detected by two or more sensors during the same minute. From now on, this event will be referred to as a collision. The number of collisions is significant in the case of sensors that are close to each other (e.g., Library and building D) or sensors that cover transition areas (e.g., Entr A or Std Tables).

Table 4 shows a study of the number of collisions. The order of a collision is the number of sensors that collide in the same minute, and the count is the number of times a collision of that order occurs; the total (order times count) is therefore the number of rows involved, and the percentage is computed over the sum of the totals. Collisions of order 2 and 3 account for more than 50% of the data. The solution to this situation was to eliminate these collisions by replacing, for this experiment, all the rows corresponding to a collision with a single row whose latitude and longitude are the centroid of the positions of the sensors participating in the collision.
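The collision replacement just described can be sketched as below. This is an illustrative sketch under stated assumptions, not the authors' code: the row layout `(user, minute, sensor_lat, sensor_lng)` is hypothetical, and the centroid is taken as the plain arithmetic mean of latitudes and longitudes, which is a reasonable approximation only over campus-scale distances.

```python
from collections import defaultdict

def resolve_collisions(rows):
    """Collapse rows sharing (user, minute) into one row at the sensors' centroid.

    rows: iterable of (user, minute, sensor_lat, sensor_lng)  # assumed layout
    Returns a list of (user, minute, lat, lng, order), where order is the
    number of sensors that saw the user in that minute.
    """
    groups = defaultdict(list)
    for user, minute, lat, lng in rows:
        groups[(user, minute)].append((lat, lng))

    resolved = []
    for (user, minute), positions in sorted(groups.items()):
        order = len(positions)
        lat = sum(p[0] for p in positions) / order
        lng = sum(p[1] for p in positions) / order
        resolved.append((user, minute, lat, lng, order))
    return resolved
```

Rows of order 1 pass through unchanged, so the same function can be applied to the whole dataset; keeping the order in the output makes it possible to rebuild a table such as Table 4 from the result.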

The second improvement consists of incorporating external information with the approximate position of the access points to which the users are connected. This information has been obtained from the Mylnikov Geo API [69] by retrieving the position of all the ESSIDs registered throughout the year. In the cases in which the user is connected and the approximate position of the access point is available, this new position is used instead of the position of the sensor


Figure 11: Monthly centroids heatmaps: (a) June; (b) August; (c) September.

that generates the row. This allows the user's position to be measured more accurately. In addition, it smooths the effect of using only the positions of the sensors, which forces the resulting positions to always lie inside the hull of the polygon formed by the sensor positions.

This experiment is based on the concept of the user's centroid, which is the average position of a given user over a given period of time. Plotting these points on the map reveals hints of the users' behavior over the studied time slice, based on the number of people in each area.
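As a rough illustration of the two ideas above (preferring the access-point position when it is available, and averaging a user's positions over a period), consider the following sketch. The row layout and function names are assumptions introduced for the example, and positions are averaged as plain latitude/longitude pairs.

```python
def row_position(sensor_pos, ap_pos=None):
    """Use the access-point position when known, else the sensor position."""
    return ap_pos if ap_pos is not None else sensor_pos

def user_centroids(rows):
    """Average position of each user over the rows of the chosen period.

    rows: iterable of (user, sensor_pos, ap_pos_or_None), where positions
    are (lat, lng) tuples  # assumed layout
    """
    sums = {}  # user -> [sum_lat, sum_lng, n_rows]
    for user, sensor_pos, ap_pos in rows:
        lat, lng = row_position(sensor_pos, ap_pos)
        acc = sums.setdefault(user, [0.0, 0.0, 0])
        acc[0] += lat
        acc[1] += lng
        acc[2] += 1
    return {user: (s[0] / s[2], s[1] / s[2]) for user, s in sums.items()}
```

Restricting `rows` to a month or to a single hour before calling `user_centroids` gives the per-month or per-hour centroids that a heatmap can then aggregate.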

Figure 11 shows heatmaps for three different months. Figure 11(a) is June, a month belonging to the second semester of the academic year and the month in which the final exams take place. Figure 11(b) is August, during the summer holidays, when there are no classes or exams and most of the professors, researchers, and staff are on vacation. Figure 11(c) represents September, the start of the semester.

Figure 11(b) confirms one of the facts extracted from the temporal analysis: on holidays the number of users falls, and most of the users present in the school are studying at the library (the warmest point is located over the library and is much smaller than in other months).

Both Figures 11(a) and 11(c) show that building A is the area with the highest concentration of users. Even so, there are differences between the two figures. In Figure 11(a), the warmest area on the map is over the library, which is explained by the final exam period in June. The same observation can be repeated for other periods of time to discover when the students have exams. In Figure 11(c), the hot spots are over the classrooms: September is the beginning of the school year, when class attendance is highest. Thus, this observation is an indicator of the level of students' class attendance.

The same type of analysis can be done with shorter time frames to obtain more detailed behaviors. Figure 12 shows the centroid heatmaps grouped by hour on September 2nd.

A detailed analysis by hours like this allows the users' movements throughout the day to be observed. The first row of maps, in Figures 12(a), 12(b), 12(c), and 12(d), shows the evolution of the users' centroids at lunch time, between 12:00 and 16:00. In this transition, it can be seen that there are hot spots over the classrooms and the library at the beginning. On the next map, those centroids move to the cantina (see Figure 2), and in the last map they return to their original positions. The second row of maps in Figure 12 presents the start and the end of the activity time in the school. In Figures 12(e) and 12(f), it can be observed how the first users of the day go directly to classes. By contrast, Figures 12(g) and 12(h) reveal that users tend to be at the library at the end of the day.

Finally, Table 5 presents the number of different users detected by each sensor throughout the year. Recalling that the total number of users obtained in Section 4.1 is 4653, the data in the table reveal that, over the year, most users have been seen by every sensor at some time. The two exceptions are the sensor of building C and the sensor of the laboratories of building B. These sensors cover the professors' offices and research laboratories, so it is unusual for students to stay in those areas.

4.4. Behavior Analysis. The third set of experiments we performed deals with the behaviors that each user follows throughout a single day. To obtain them, we grouped the data using a user-day key. For each key, a vector of 24 positions, one per hour, is created. In each position of this vector, we record the sensor that detected this user for most of the time during that hour. This vector therefore represents the route that the user followed throughout that day, hour by


Table 5: Sensor annual statistics.

Sensor        Number of rows   Number of users
Library       14166793         4674
Entr D        6818165          4621
Std Tables    5625974          4670
Entr A        4736237          4620
Entr A Sec    2352019          4579
Entr B Sec    2258618          4530
Entr B Lab    2229560          3630
Entr C        1469026          3093
Entr B        1248462          4127

Figure 12: Hourly centroids heatmaps of September 2nd: (a) 12:00 to 13:00; (b) 13:00 to 14:00; (c) 14:00 to 15:00; (d) 15:00 to 16:00; (e) 8:00 to 9:00; (f) 9:00 to 10:00; (g) 19:00 to 20:00; (h) 20:00 to 21:00.

hour. Once the behavior vectors are obtained, the day and user information is discarded in order to count the most repeated behavior vectors. The dimensionality of these vectors makes the number of possible behaviors huge (theoretically $24^{10} \approx 6 \times 10^{13}$, or $14^{10} \approx 289$K millions using only the activity hours), but it is known that users behave similarly, so a much lower number of behaviors can be expected, even though it will still be a high number.
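The construction of the behavior vectors and the subsequent counting can be sketched as follows. This is only an illustration of the procedure described in the text: the record layout `(user, day, hour, sensor)`, the `EMPTY` marker, and the function names are assumptions, and ties between sensors within an hour are broken arbitrarily.

```python
from collections import Counter, defaultdict

EMPTY = "-"  # assumed marker for hours in which the user was not detected

def behavior_vectors(records):
    """Build one 24-slot vector per (user, day).

    records: iterable of (user, day, hour, sensor), one per detected minute,
    with hour in 0..23  # assumed layout
    Each slot holds the sensor that saw the user for most minutes of that hour.
    """
    per_hour = defaultdict(Counter)  # (user, day, hour) -> sensor -> minutes
    for user, day, hour, sensor in records:
        per_hour[(user, day, hour)][sensor] += 1

    vectors = defaultdict(lambda: [EMPTY] * 24)
    for (user, day, hour), minutes in per_hour.items():
        vectors[(user, day)][hour] = minutes.most_common(1)[0][0]
    return {key: tuple(vec) for key, vec in vectors.items()}

def most_repeated(vectors, n=20):
    """Drop the (user, day) keys and count identical behavior vectors."""
    return Counter(vectors.values()).most_common(n)
```

Storing each vector as a tuple makes it hashable, so identical behaviors from different users or days fall into the same counter entry.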

Table 6 presents the 20 most repeated behaviors throughout the year. A total of 285K behaviors are detected, of which 139K are unique. The 500 most frequent behaviors account for 25% of the total. Table 6 shows only the part corresponding to the activity hours of the school which, as observed in the temporal analysis, span from 8:00 am to 10:00 pm. As can be seen, the majority of the most repeated behaviors are stays of between two and five hours in the library. It is necessary to expand the analysis to the top 20 to observe the class attendance behaviors. It is easy to see that the behaviors usually fall either in the morning or in the afternoon, with lunchtime from 1:00 pm to 3:00 pm, which means that the majority of users come either in the morning or in the afternoon but do not spend the whole day at the school.

Another quite obvious observation is that many of the behaviors obtained are very similar to each other (e.g., going


Table 6: Top 20 most repeated behaviors.

Columns (hourly slots): 8:00-9:00, 9:00-10:00, 10:00-11:00, 11:00-12:00, 12:00-13:00, 13:00-14:00, 14:00-15:00, 15:00-16:00, 16:00-17:00, 17:00-18:00, 18:00-19:00, 19:00-20:00, 20:00-21:00, 21:00-22:00.

Rank   Occupied slots (in order)
1      Lib Lib Lib Lib Lib
2      Lib Lib Lib Lib
3      EntA
4      Lib Lib Lib
5      Lib Lib Lib Lib
6      Lib
7      EntA
8      Lib Lib Lib Lib Lib Lib
9      Std Std Std Std Std Std Std Std Std Std Std Std Std Std
10     EntA
11     EntA
12     Lib Lib Lib Lib Lib
13     Lib Lib
14     Lib Lib
15     EntA
16     EntA EntA
17     EntA
18     Lib Lib
19     SecA SecA SecA SecA
20     EntA

Lib = Library; EntA = Entr A; SecA = Entr A Sec; Std = Std Tables.

to the library from 9:00 am to 2:00 pm or going to the library from 9:00 am to 1:00 pm). In order to reduce this redundancy, we performed a clustering procedure on them. Each behavior can be understood as a categorical vector of length 24 in which the categories correspond to the 9 possible sensors of the dataset plus the empty category. The chosen clustering algorithm is Proximus [70], due to its simplicity, efficiency, scalability, and the reliability of its results. The algorithm works with binary vectors and creates clusters based on the Hamming distance (the number of bits that differ between two binary vectors). A vector is chosen as the center of the cluster, and other vectors whose distance to the center is smaller than the maximum cluster radius are added to that cluster.
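To make the notions of Hamming distance and cluster radius concrete, the sketch below uses a deliberately simplified greedy grouping. It is not the Proximus algorithm of [70], whose way of choosing centers is not reproduced here; the function names and the first-come choice of centers are assumptions made purely for illustration.

```python
def hamming(a, b):
    """Number of positions at which two equal-length binary vectors differ."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x != y for x, y in zip(a, b))

def greedy_radius_clusters(vectors, radius):
    """Simplified stand-in for a radius-based clustering (not Proximus).

    Each vector joins the first cluster whose center is at Hamming distance
    smaller than `radius`; otherwise it becomes the center of a new cluster.
    Returns a list of (center, members) pairs.
    """
    clusters = []
    for vec in vectors:
        for center, members in clusters:
            if hamming(center, vec) < radius:
                members.append(vec)
                break
        else:
            clusters.append((vec, [vec]))
    return clusters
```

A larger radius merges more behaviors into each cluster, which trades detail for a smaller, more manageable number of groups.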

The behaviors are expressed as categorical vectors, so it is necessary to transform them into binary vectors to be used in Proximus. The transformation shown in Equation (1) is proposed. This transformation is simple and also reversible, which allows the recovery of the original behaviors after clustering.

$$
\begin{aligned}
B &= \{h_0, \ldots, h_{23}\}, \quad h_i \in \{0, a_1, \ldots, a_9\}, \\
\vec{B} &= \{\delta_{0 a_1}, \ldots, \delta_{0 a_9}, \ldots, \delta_{23 a_1}, \ldots, \delta_{23 a_9}\}, \\
\delta_{i a_j} &=
\begin{cases}
0 & \text{if } h_i \neq a_j, \\
1 & \text{if } h_i = a_j,
\end{cases}
\end{aligned}
\tag{1}
$$

where $a_1, \ldots, a_9$ are the nine sensors, $B$ is the behavior vector described above, and $\vec{B}$ is the transformed vector of length $24 \times 9$, made of subgroups of 9 samples, each subgroup associated with one hour. Every sample in a subgroup is 0 except for the one at the index of the active sensor in that hour, which is set to 1.
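The transformation of Equation (1) and its inverse can be sketched in a few lines. The sensor ordering in `SENSORS`, the use of `None` for the empty category, and the function names are assumptions chosen for the example.

```python
# Assumed ordering of the nine sensors a_1..a_9 (any fixed order works).
SENSORS = [
    "Entr A", "Entr A Sec", "Entr B", "Entr B Lab", "Entr B Sec",
    "Entr C", "Entr D", "Library", "Std Tables",
]

def encode(behavior):
    """Categorical 24-slot behavior -> binary vector of length 24 * 9.

    behavior: sequence of 24 entries, each a sensor name or None (empty hour).
    """
    bits = []
    for slot in behavior:
        bits.extend(1 if slot == sensor else 0 for sensor in SENSORS)
    return tuple(bits)

def decode(bits):
    """Inverse of encode: recover the categorical behavior."""
    n = len(SENSORS)
    behavior = []
    for start in range(0, len(bits), n):
        chunk = bits[start:start + n]
        behavior.append(SENSORS[chunk.index(1)] if 1 in chunk else None)
    return tuple(behavior)
```

One consequence of this encoding is that two behaviors that differ in a single hour are at Hamming distance 2 when both hours are occupied by different sensors, and at distance 1 when one of the two hours is empty.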

Table 7 shows the results of applying the Proximus clustering to the behaviors found. The first column is the center of the cluster, the second is the number of distinct behaviors grouped in that cluster (cluster components), and the third is the number of recorded behaviors that fall within that cluster (behaviors count). The results are presented ordered by the number of behaviors included in the cluster. In total, 45707 clusters are obtained. The first 300 account for 50% of the behaviors recorded throughout the year. The most important cluster behaviors in Table 7 can be understood as follows:


Table 7: Behavior clusters.

No.   Cluster center                                           Cluster components   Behaviors count
1     Entr A Sec from 9:00 to 14:00                            415                  4963
2     Entr B Sec from 10:00 to 13:00                           458                  3756
3     Entr B Sec from 15:00 to 19:00                           471                  2924
4     Library from 9:00 to 13:00                               256                  2870
5     Library from 10:00 to 12:00                              176                  2396
6     Library from 18:00 to 21:00                              244                  2231
7     Entr A 13:00                                             787                  1979
8     Library 12:00                                            563                  1894
9     Library from 18:00 to 19:00 and from 20:00 to 21:00      175                  1853
10    Entr B Lab 11:00 to 20:00                                568                  1831

(i) Clusters 1, 2, and 3: students attending classes. These sensors cover the main classrooms in the school, and the intervals match the class schedule.

(ii) Clusters 4, 5, 6, and 9: students at the library.

(iii) Cluster 7: users who are usually outside the sensors' coverage areas, arriving or leaving.

(iv) Cluster 10: laboratory equipment and professors. This sensor covers some of the professors' offices and laboratories, which usually have some laboratory equipment connected during work hours.

5. Validation of Occupancy Estimation Based on Wi-Fi Tracking

Although, as already pointed out in Section 4.3, it was tested at the very first stage of the deployment that the Wi-Fi sensors properly locate and track well-known MAC addresses, it was still necessary to validate the accuracy of the system for estimating occupancy (as it can be distorted by the aforementioned fact that a single user can carry several devices connected to Wi-Fi networks). As no ground-truth data was available for the full set of buildings and only some data was found for the library, we centered our validation efforts on comparing our data with the available data: if we can trust our results in that area, then we can extend our trust to the rest of the areas, for which no reference data are available.

The library of ETSIT has 408 study seats and offers a web service to check the number of available seats at a given moment [71]. This system is based on two sources of information: a person who counts the empty seats every opening hour from Friday to Sunday, and a video camera located at the main entrance of the library that counts the number of people entering or leaving at 15-minute intervals. The human system provides a ground truth for the number of occupied seats, but this measure is very different from the actual number of people in the library, since a common situation, especially during examination terms, is that students leave their study material at a seat to reserve it while they are not in the library. This situation is a problem for the library staff, and therefore they installed the video camera system to count the students' entrances and exits.

This system generates an estimate of the number of people in the library by adding to the previous measure the number of people detected entering and subtracting the number of people detected leaving. The system is not perfect, and in most cases it carries an accumulated error that grows in the estimate of the number of people in the library. The total error can be calculated clearly at the end of the day, when the library closes and the number of people inside is supposed to be zero. In summer there is a situation that aggravates this error and, consequently, the measure obtained by this system: due to the rise in temperature, the back door of the library is opened to improve ventilation, which allows students to exit through it, although they must still enter through the main door. The camera does not count students leaving through this back door.
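The running estimate kept by the camera system, and the way its accumulated error shows up at closing time, can be illustrated with the short sketch below. The input layout (one `(entries, exits)` pair per 15-minute interval) and the function names are assumptions; the numbers in any usage are illustrative only.

```python
def camera_occupancy(intervals, start=0):
    """Running occupancy estimate from per-interval entry and exit counts.

    intervals: list of (entries, exits), one pair per 15-minute interval
    Returns the estimate after each interval.
    """
    estimates = []
    current = start
    for entries, exits in intervals:
        current += entries - exits
        estimates.append(current)
    return estimates

def end_of_day_error(estimates):
    """Residual at closing time; the true occupancy is assumed to be zero."""
    return estimates[-1] if estimates else 0
```

Any exit that the camera misses, such as people leaving through a door it does not cover, is never subtracted, so the residual returned by `end_of_day_error` can only grow during the day.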

The library staff provided us with the data collected by the two systems (human and camera) between June 5th and 30th. These measurements can be compared with those obtained by the Wi-Fi sensor installed at the library in order to validate the latter. For this test, all the data collected by the sensor are used, without applying the previously explained filtering of the MACs of sporadic users.

Figure 13 shows the data collected by the three systems on Sunday, June 5th. This is the first day with data from the three sources; other days on which the three sources are present show the same trends. It is clear that there is a divergence between the human observations and the rest of the data. The graph of the human system shows that the number of occupied seats increases in the first hours up to the maximum and remains steady until the end of the day, without being affected by the behavior of the users at lunchtime. However, this effect is reflected in the camera and Wi-Fi tracking measurements. We observed that, in general, the number of people counted by the camera system is below the Wi-Fi tracking measure, although the proportion is maintained over time. Finally, the figure shows the cumulative error effect of the camera system, which at the end of the day still reports 65 people in the library.

To better study the relation between the camera system and the Wi-Fi tracking system, we generated a detailed visualization that allows us to observe the data of the whole month in a single figure. Figure 14 is composed of 3 subfigures, each of which is a matrix of colored cells: the rows represent a full


Figure 13: Measure of library occupation using Wi-Fi, camera, and human systems. x-axis: hour (10:00 to 22:00); y-axis: number of users (0 to 400); series: Camera, Human, Wifi.

Figure 14: Measures every 15 minutes of library occupation during June (columns: hours 00 to 23): (a) Wi-Fi records; (b) camera records; (c) difference between Wi-Fi and camera.

day and the columns are divisions of 15 minutes. The color intensity is proportional to the number of users measured in each interval, being more intense when more users are detected. Figure 14(a) shows the measurements of the Wi-Fi sensor. The behavior is the one observed in previous sections of this document. Figure 14(b) shows the measurements of the camera system. During the first half of the day, the same trend as in Figure 14(a) can be observed, but in the afternoon and at night the accumulated error becomes apparent. This error reaches a maximum of 378 people, with an average of


139 people at the end of the day, while the Wi-Fi tracking system registers a maximum of 10 and an average of 7 at the end of the day (these are machines that are still turned on when the library is empty of people). Finally, Figure 14(c) shows the difference between the two previous ones, using green when the difference is positive (the Wi-Fi tracking system registers a higher value than the camera system) and red otherwise. This figure validates the measurements obtained by the Wi-Fi tracking system, since the difference with respect to the camera system in the first half of the day is small (an average of 27 people, counting the data from 8:00 to 16:00), whereas in the afternoon the error accumulated by the camera system makes the average difference grow to 83 people between 16:00 and 22:00.
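A windowed comparison of this kind can be sketched as follows. The text does not state whether the reported averages use signed or absolute differences, so the `absolute` flag, the `(hour, quarter)` keying, and the function name are all assumptions of this example.

```python
def mean_difference(wifi, camera, start_hour, end_hour, absolute=True):
    """Average Wi-Fi minus camera count over [start_hour, end_hour).

    wifi, camera: dicts mapping (hour, quarter) -> people count for one day,
    with quarter in 0..3 for the four 15-minute slots  # assumed layout
    """
    diffs = []
    for (hour, quarter), wifi_count in wifi.items():
        if start_hour <= hour < end_hour and (hour, quarter) in camera:
            diff = wifi_count - camera[(hour, quarter)]
            diffs.append(abs(diff) if absolute else diff)
    return sum(diffs) / len(diffs) if diffs else 0.0
```

Calling it once with the morning window and once with the afternoon window gives the two averages that are contrasted in the paragraph above.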

The Wi-Fi sensor of the library was not placed for the specific purpose of counting the number of people in the library and, as a consequence, its coverage area can include devices that are outside the library. This explains why in some cases it registers a measurement greater than that of the camera system. In any case, the Wi-Fi sensor, with its limitations, registers a measure very similar to that of the camera-based people counting system, while avoiding the cumulative error of that system and being much cheaper (tens of euros versus thousands of euros) and less intrusive than a video camera. Currently, since the camera system is already installed, the Wi-Fi sensor measures can be used as a maximum that bounds the error made by this system. In summary, the use of a Wi-Fi sensor to count people in the library provides accurate results despite the potential errors induced by the sensing period, collisions, the ratio between users and MACs seen, and the data processing performed. These results suggest that the results of our analyses for the rest of the areas on campus are also correct.

6. Discussion and Conclusions

In the execution of this case study, we have learned some lessons about the limitations we faced that are worth taking into account for similar systems and analyses. First, there is the key issue of the sensor deployment: it is not just a question of density (number of sensors per unit of surface) but of adapting its design to the topology of the place to be monitored. So, for example, it is crucial to have sensors not only at the entries and exits of buildings but also inside them, as in the considered deployment, where there are sensors placed at the entrances and exits of the buildings but also in other especially relevant places, such as the library or the group work area.

A complete coverage of indoor devices, without collisions and for the complete timespan of the stay indoors, would require a large deployment with many sensors able to cover all corners. This is not suitable in this context; instead, we relied on 9 sensors, some at gates and others in large rooms, which proved to be enough to check whether a device is within reach of any of them. But some assumptions had to be made: we considered as valid behaviors only those that spent a significant amount of time once they had been seen (to remove transient behaviors and those of people walking outside the buildings); also, we discarded behaviors that did not appear enough times in the yearly timespan. On the other hand, the topology of these buildings did not impose severe restrictions on the sensors' coverage. In fact, we found several devices seen by different sensors at once. In summary, we can conclude that the finer the spatial granularity (more sensors), the larger the set of different behaviors, so using raw data from sensors would lead to an explosion of states that would render meaningless results.

The gathered data is another key issue: some works in the literature obtained signal strength measurements every few seconds, allowing them to identify indoor trajectories. This was not our case, and in fact this proved to be very limiting. Thus, the precision of our analysis is limited by the regions covered by each sensor, which prevents us from analyzing any kind of movement within regions. Nevertheless, as has been seen, relevant results can still be obtained without the increase in energy consumption and the risk of flooding the school communications network that the other approach would entail.

As regards the analysis of data to identify users' behaviors, a long observation period is a must. Obtaining data for a complete work cycle (in this case, a full school year) helps in discovering common behaviors that occur in a university. Using long observation times can also help in detecting erroneous or atypical operation of the sensors, as happened in the first months of the year in this case. Getting even larger observation datasets would reinforce the behaviors we have found (e.g., seasonality), but the chance of discovering new ones is negligible, since the expected behaviors on campus are periodic with the school year.

The use of big data platforms for analysis, while not mandatory, eases the management of large datasets and the execution of iterative studies on the data. The preprocessing work carried out allowed us to clean and filter our data. For example, some discovered behaviors are too regular and extended over time, which may correspond to servers or machines, which we could then filter out.

The temporal analysis has revealed some behaviors that are maintained throughout the year and others that occur occasionally. The work has focused on the study of behaviors that are repeated and maintained throughout the year. A closer view, such as the analysis per hour, shows the hours of activity, nocturnal patterns, or movements at mealtimes. In the daily view, the difference in activity between workdays and weekends or holidays is clearly seen. Visualization has been a key technique for detecting and understanding these temporal patterns.

The spatial analysis revealed that a collision problem occurs in user detection, but its effect was mitigated by the use of innovative algorithms and techniques, such as the calculation of centroids and the combination of sensor information with external sources of information (i.e., semantic trajectories), so a much more precise positioning of the users is achieved than with the exclusive use of the sensors. The visualization of centroids fostered the visual understanding of complex data, such as the position of each user over a month, and the superposition of the centroids in a heatmap revealed the movements of groups of people and the occupation of the different zones.


Furthermore, we have identified the dominant users' behaviors as the most repeated behaviors registered by the sensors throughout the year. The number of behaviors found was huge, but the application of the Proximus clustering algorithm reduced this number to a manageable amount. The study of the obtained clusters then revealed that the most frequent behaviors coincide with what can be expected from a college building: researchers working in laboratories and students attending classes or studying in the library.

Although some of the discovered behaviors and patterns can be seen as common knowledge, it is worth stressing that they do bring value, since they represent numerical evidence that supports decision making (e.g., someone may think that the Wi-Fi access in a given area does not work properly because it is always overcrowded, but numerical evidence is needed to appropriately justify the investment in increasing the number of APs of the corporate WLAN in that area to improve the service). In addition, such well-known patterns, when obtained automatically by processing the available data, become baseline models which can be used to detect anomalies or atypical situations, as is common practice in unsupervised machine learning.

Lastly, we have validated the accuracy of using Wi-Fi tracking for occupancy estimation by comparing it with the library staff's manual counting (considered as ground truth) and with a video camera system installed at the library main entrance. As a main finding, Wi-Fi tracking has proved to be more accurate than the video camera system, in addition to being far cheaper. As a matter of fact, the library staff currently prefers the occupancy estimation based on Wi-Fi tracking over the one based on the video camera system. Nevertheless, the accuracy of the Wi-Fi tracking system can be further improved by correlating MAC addresses detected in the same places over sufficiently long periods of time and considering only one of them, thus mitigating the issue that a single person can carry several devices connected to Wi-Fi networks.

Besides this, the Wi-Fi tracking system is currently used by the library staff to perform more sophisticated studies, such as figuring out the percentage of students from the different schools of the university who come to study at the library of the Telecommunications Engineering School during the weekend. Figure 15 shows the results of such an analysis, which is an example of how this kind of IoT system can help solve real-life problems and improve the operation of already running services.

To summarize, we have studied a one-year Wi-Fi tracking dataset obtained from a reduced set of low-cost sensors with limited capabilities, deployed on an actual university campus that receives around 4000 people every day. We have processed the data in order to identify traces of Wi-Fi-enabled mobile devices, which are taken as people moving through the campus buildings, and then we have extracted people’s stays, movements, and common behaviors. The obtained results represent numerical evidence that illustrates how a low-cost Wi-Fi tracking system can be used in real-life conditions to improve or optimize the operation of the monitored premises. These results can help to dimension the WLAN infrastructure or the canteen personnel appropriately, or to detect anomalous situations in real time. Furthermore, the data from the people flow monitoring system is currently being used together with the data from the environmental monitoring system to try to reduce the environmental footprint of the school [72]. In addition, the Wi-Fi tracking system is actually used by the library staff in their day-to-day activity, which illustrates the value that this kind of IoT infrastructure can bring to real-life problems and services.

Figure 15: Users who only go to the library of the Telecommunication School on weekends, classified by their school (CIVIL, INDUS, NAVAL, SPORT, ARQUI, FORES, EDIF, AERO, TELECOM).

Data Availability

The dataset with the Wi-Fi sensor records used to support the findings of this study is available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work has been partly funded by Universidad Politécnica de Madrid through the project RES2+U (Responsables, Sostenibles y Universitarios) (http://blogs.upm.es/res2masu). The work performed by José M. Navarro was funded by the Ministerio de Educación, Cultura y Deporte de España under Grant no. FPU 14/03209.

References

[1] “2018 Revision of World Urbanization Prospects - Multimedia Library - United Nations Department of Economic and Social Affairs,” https://www.un.org/development/desa/publications/2018-revision-of-world-urbanization-prospects.html.

[2] M. Pallot, “Engaging Users into Research and Innovation: The Living Lab Approach as a User Centred Open Innovation Ecosystem,” https://web.archive.org/web/20120509081658/http://www.cwe-projects.eu/pub/bscw.cgi/1760838?id=715404_1760838.

[3] E. Almirall and J. Wareham, “Living Labs: arbiters of mid- and ground-level innovation,” Technology Analysis & Strategic Management, vol. 23, no. 1, pp. 87–102, 2011.

[4] “TU Delft Green office,” https://www.tudelft.nl/sustainability.

[5] “UBC Campus as a living laboratory,” https://sustain.ubc.ca/our-commitment/campus-living-lab.

[6] “Harvard Sustainability,” https://green.harvard.edu.

[7] “UPM City of the Future initiative,” http://blogs.upm.es/cityofthefuture-upm/en/initiative.

[8] M. Álvarez-Campana, G. López, E. Vázquez, V. A. Villagrá, and J. Berrocal, “Smart CEI Moncloa: An IoT-based platform for people flow and environmental monitoring on a Smart University Campus,” Sensors, vol. 17, no. 12, 2017.

[9] M. B. Kjærgaard, H. Blunck, T. Godsk, T. Toftkjær, D. L. Christensen, and K. Grønbæk, “Indoor positioning using GPS revisited,” Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics): Preface, vol. 6030, pp. 38–56, 2010.

[10] A. Belmonte-Hernández, G. Hernández-Peñaloza, F. Álvarez, and G. Conti, “Adaptive Fingerprinting in Multi-Sensor Fusion for Accurate Indoor Tracking,” IEEE Sensors Journal, vol. 17, no. 15, pp. 4983–4998, 2017.

[11] M. S. Kristoffersen, J. V. Dueholm, R. Gade, and T. B. Moeslund, “Pedestrian counting with occlusion handling using stereo thermal cameras,” Sensors, vol. 16, no. 1, 2016.

[12] L. Zheng, X. Ruan, Y. Chen, and M. Huang, “Shadow removal for pedestrian detection and tracking in indoor environments,” Multimedia Tools and Applications, vol. 76, no. 18, pp. 18321–18337, 2017.

[13] T. Li, H. Chang, M. Wang, B. Ni, R. Hong, and S. Yan, “Crowded scene analysis: a survey,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 25, no. 3, pp. 367–386, 2015.

[14] F. Adib and D. Katabi, “See through walls with WiFi,” in Proceedings of the Annual Conference of the ACM Special Interest Group on Data Communication on the Applications, Technologies, Architectures, and Protocols for Computer Communication, ACM SIGCOMM 2013, pp. 75–86, China, August 2013.

[15] Z. Kabelac, D. Katabi, and R. C. Miller, “3D Tracking via Body Radio Reflections,” in Proceedings of the 11th USENIX Conference on Networked Systems Design and Implementation, pp. 317–329, 2014.

[16] F. Adib, Z. Kabelac, and D. Katabi, “Multi-person localization via RF body reflections,” in Proceedings of the 12th USENIX Symposium on Networked Systems Design and Implementation, NSDI 2015, pp. 279–292, USA, May 2015.

[17] Z.-A. Deng, G. Wang, D. Qin, Z. Na, Y. Cui, and J. Chen, “Continuous indoor positioning fusing WiFi, smartphone sensors and landmarks,” Sensors, vol. 16, no. 9, 2016.

[18] R. A. Becker, R. Caceres, K. Hanson et al., “A tale of one city: Using cellular network data for urban planning,” IEEE Pervasive Computing, vol. 10, no. 4, pp. 18–26, 2011.

[19] J. E. Mallah, F. Carrino, O. A. Khaled, and E. Mugellini, “Crowd monitoring: critical situations prevention using smartphones and group detection,” Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics): Preface, vol. 9189, pp. 496–505, 2015.

[20] J. Kuang, X. Niu, and X. Chen, “Robust Pedestrian Dead Reckoning Based on MEMS-IMU for Smartphones,” Sensors, vol. 18, no. 5, p. 1391, 2018.

[21] A. Kurkcu and K. Ozbay, “Estimating Pedestrian Densities, Wait Times, and Flows with Wi-Fi and Bluetooth Sensors,” Transportation Research Record, vol. 2644, no. 1, pp. 72–82, 2017.

[22] A. Kotanen, M. Hännikäinen, H. Leppäkoski, and T. Hämäläinen, “Positioning with IEEE 802.11b wireless LAN,” in Proceedings of the 14th IEEE 2003 International Symposium on Personal, Indoor and Mobile Radio Communications, vol. 3, pp. 2218–2222, Beijing, China, 2003.

[23] W. Ho, A. Smailagic, D. P. Siewiorek, and C. Faloutsos, “An adaptive two-phase approach to WiFi location sensing,” in Proceedings of the 4th Annual IEEE International Conference on Pervasive Computing and Communications Workshops, PerCom Workshops 2006, pp. 452–456, Italy, March 2006.

[24] F. Evennou and F. Marx, “Advanced integration of WiFi and inertial navigation systems for indoor mobile positioning,” EURASIP Journal on Applied Signal Processing, vol. 2006, Article ID 86706, 11 pages, 2006.

[25] J. Rekimoto, T. Miyaki, and T. Ishizawa, “LifeTag: WiFi-based continuous location logging for life pattern analysis,” LNCS, vol. 4718, pp. 35–49, 2007.

[26] J. A. Besada, A. M. Bernardos, P. Tarrío, and J. R. Casar, “Analysis of tracking methods for wireless indoor localization,” in Proceedings of the 2nd International Symposium on Wireless Pervasive Computing (ISWPC ’07), pp. 492–497, February 2007.

[27] O. Woodman and R. Harle, “Pedestrian localisation for indoor environments,” in Proceedings of the 10th International Conference on Ubiquitous Computing (UbiComp ’08), pp. 114–123, Seoul, Republic of Korea, September 2008.

[28] F. Aloul, A. Sagahyroon, A. Al-Shami, I. Al-Midfa, and R. Moutassem, “Using mobiles for on campus location tracking,” in Proceedings of the 7th International Conference on Advances in Mobile Computing and Multimedia, MoMM2009, pp. 231–235, Malaysia, December 2009.

[29] S. Woo, S. Jeong, E. Mok et al., “Application of WiFi-based indoor positioning system for labor tracking at construction sites: A case study in Guangzhou MTR,” Automation in Construction, vol. 20, no. 1, pp. 3–13, 2011.

[30] N. Le Dortz, F. Gain, and P. Zetterberg, “WiFi fingerprint indoor positioning system using probability distribution comparison,” in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP ’12), pp. 2301–2304, March 2012.

[31] Z. Chen, H. Zou, H. Jiang, Q. Zhu, Y. C. Soh, and L. Xie, “Fusion of WiFi, smartphone sensors and landmarks using the Kalman filter for indoor localization,” Sensors, vol. 15, no. 1, pp. 715–732, 2015.

[32] Y. Shu, C. Bo, G. Shen, C. Zhao, L. Li, and F. Zhao, “Magicol: indoor localization using pervasive magnetic field and opportunistic wifi sensing,” IEEE Journal on Selected Areas in Communications, vol. 33, no. 7, pp. 1443–1457, 2015.

[33] A. Danalet, B. Farooq, and M. Bierlaire, “A Bayesian approach to detect pedestrian destination-sequences from WiFi signatures,” Transportation Research Part C: Emerging Technologies, vol. 44, pp. 146–170, 2014.

[34] O. Czogalla and S. Naumann, “Pedestrian indoor navigation for complex public facilities,” in Proceedings of the 2016 International Conference on Indoor Positioning and Indoor Navigation, IPIN 2016, pp. 1–8, Spain, October 2016.

[35] L. Schauer, P. Marcus, and C. Linnhoff-Popien, “Towards feasible Wi-Fi based indoor tracking systems using probabilistic methods,” in Proceedings of the 2016 International Conference on Indoor Positioning and Indoor Navigation, IPIN 2016, pp. 1–8, Spain, October 2016.

[36] F. Meneses and A. Moreira, “Large scale movement analysis from WiFi based location data,” in Proceedings of the 2012 International Conference on Indoor Positioning and Indoor Navigation, IPIN 2012, Australia, November 2012.

[37] B. Bonne, A. Barzan, P. Quax, and W. Lamotte, “WiFiPi: involuntary tracking of visitors at mass events,” in Proceedings of the IEEE 14th International Symposium on a World of Wireless, Mobile and Multimedia Networks (WoWMoM ’13), pp. 1–6, Madrid, Spain, June 2013.

[38] M. B. Kjaergaard, M. Wirz, D. Roggen, and G. Tröster, “Mobile sensing of pedestrian flocks in indoor environments using WiFi signals,” in Proceedings of the 10th IEEE International Conference on Pervasive Computing and Communications (PerCom ’12), pp. 95–102, Lugano, Switzerland, March 2012.

[39] S. Sendra, M. Garcia, C. Turro, and J. Lloret, “People mobility behaviour study in a university campus using WLANs,” in Proceedings of the 3rd International Conference on Mobile Ubiquitous Computing, Systems, Services, and Technologies, UBICOMM 2009, pp. 124–129, Malta, October 2009.

[40] J. Scheuner, G. Mazlami, D. Schoni et al., “Probr - A Generic and Passive WiFi Tracking System,” in Proceedings of the 41st IEEE Conference on Local Computer Networks, LCN 2016, pp. 495–502, UAE, November 2016.

[41] Y. Li, S. Williams, B. Moran, and A. Kealy, “Quantized RSS Based Wi-Fi Indoor Localization with Room Level Accuracy,” in Proceedings of the International Global Navigation Satellite Systems 2018, 2018.

[42] A. Alessandrini, C. Gioia, F. Sermi, I. Sofos, D. Tarchi, and M. Vespe, “WiFi positioning and Big Data to monitor flows of people on a wide scale,” in Proceedings of the 25th European Navigation Conference, ENC 2017, pp. 322–328, Switzerland, May 2017.

[43] L. Vu, K. Nahrstedt, S. Retika, and I. Gupta, “Joint bluetooth/wifi scanning framework for characterizing and leveraging people movement in university campus,” in Proceedings of the 13th ACM International Conference on Modeling, Analysis, and Simulation of Wireless and Mobile Systems (MSWiM ’10), pp. 257–265, October 2010.

[44] M. Garcia, S. Sendra, C. Turro, and J. Lloret, “User’s Macro and Micro-mobility Study using WLANs in a University Campus,” International Journal On Advances in Internet Technology, vol. 4, no. 1, pp. 37–46, 2011.

[45] Y. Xu, I. D. G. Groeneveld, R. Sulzer, E. Theocharous, O. T. Willems, and M. S. Tryfona, “Determine activity based on the classified identity of users by using Wi-Fi monitoring,” Geomatics Synthesis Group Project Report, 2016.

[46] A. Danalet, L. Tinguely, M. D. Lapparent, and M. Bierlaire, “Location choice with longitudinal WiFi data,” Journal of Choice Modelling, vol. 18, pp. 1–17, 2016.

[47] M. Zhou, K. Sui, M. Ma, Y. Zhao, D. Pei, and T. Moscibroda, “MobiCamp: A campus-wide testbed for studying mobile physical activities,” in Proceedings of the 3rd International Workshop on Physical Analytics, WPA 2016, pp. 1–6, Singapore, 2016.

[48] A. Fernández-Ares, A. M. Mora, M. G. Arenas et al., “Studying real traffic and mobility scenarios for a Smart City using a new monitoring and tracking system,” Future Generation Computer Systems, vol. 76, pp. 163–179, 2017.

[49] A. E. Redondi and M. Cesana, “Building up knowledge through passive WiFi probes,” Computer Communications, vol. 117, pp. 1–12, 2018.

[50] L. Huang, K. Matsuura, H. Yamane, and K. Sezaki, “Enhancing wireless location privacy using silent period,” in Proceedings of the 2005 IEEE Wireless Communications and Networking Conference, WCNC 2005: Broadband Wirelss for the Masses - Ready for Take-off, pp. 1187–1192, USA, March 2005.

[51] J. Martin, T. Mayberry, C. Donahue et al., “A Study of MAC Address Randomization in Mobile Devices and When it Fails,” Proceedings on Privacy Enhancing Technologies, vol. 2017, no. 4, pp. 365–383, 2017.

[52] M. Vanhoef, C. Matte, M. Cunche, L. S. Cardoso, and F. Piessens, “Why MAC address randomization is not enough: an analysis of Wi-Fi network discovery mechanisms,” in Proceedings of the 11th ACM on Asia Conference on Computer and Communications Security, pp. 413–424, ACM, Xi’an, China, June 2016.

[53] A. J. Ruiz-Ruiz, H. Blunck, T. S. Prentow, A. Stisen, and M. B. Kjaergaard, “Analysis methods for extracting knowledge from large-scale WiFi monitoring to inform building facility planning,” in Proceedings of the 2014 12th IEEE International Conference on Pervasive Computing and Communications, PerCom 2014, pp. 130–138, Hungary, March 2014.

[54] N. Nunes, M. Ribeiro, C. Prandi, and V. Nisi, “Beanstalk - A community based passive Wi-Fi tracking system for analysing tourism dynamics,” in Proceedings of the 9th ACM SIGCHI Symposium on Engineering Interactive Computing Systems, EICS 2017, pp. 93–98, Portugal, June 2017.

[55] C. Parent, N. Pelekis, Y. Theodoridis et al., “Semantic trajectories modeling and analysis,” ACM Computing Surveys, vol. 45, no. 4, pp. 1–32, 2013.

[56] C. Wei, “Mining of User Behavioral Features Based on Indoor Semantic Trajectories,” Boletín Técnico, ISSN: 0376-723X, vol. 55, 2017.

[57] K. V. Long, D. Quang, and N. Klara, Lessons learned from bluetooth/wifi scanning deployment in university campus, Urbana, Illinois, USA, 2010.

[58] V. Radu and M. K. Marina, “HiMLoc: indoor smartphone localization via activity aware pedestrian dead reckoning with selective crowdsourced WiFi fingerprinting,” in Proceedings of the International Conference on Indoor Positioning and Indoor Navigation (IPIN ’13), pp. 1–10, IEEE, Montbéliard-Belfort, France, October 2013.

[59] F. Hong, Y. Zhang, Z. Zhang, M. Wei, Y. Feng, and Z. Guo, “WaP: Indoor localization and tracking using WiFi-Assisted Particle filter,” in Proceedings of the 39th Annual IEEE Conference on Local Computer Networks, LCN 2014, pp. 210–217, Canada, September 2014.

[60] Y. Zeng, P. H. Pathak, and P. Mohapatra, “Analyzing shopper’s behavior through WiFi signals,” in Proceedings of the 2nd Workshop on Physical Analytics, WPA 2015, pp. 13–18, Italy, 2015.

[61] P. Sapiezynski, A. Stopczynski, R. Gatej, and S. Lehmann, “Tracking human mobility using WiFi signals,” PLoS ONE, vol. 10, no. 7, p. e0130824, 2015.

[62] Z. Tian, Y. Jin, M. Zhou, Z. Wu, and Z. Li, “Wi-Fi/MARG Integration for Indoor Pedestrian Localization,” Sensors, vol. 16, no. 12, p. 2100, 2016.

[63] B. Zhou, Q. Li, Q. Mao, and W. Tu, “A Robust Crowdsourcing-Based Indoor Localization System,” Sensors, vol. 17, no. 4, p. 864, 2017.

[64] “Raspberry Pi,” https://www.raspberrypi.org.

[65] “TP-LINK Wi-Fi USB dongle datasheet,” http://www.tp-link.com/us/products/details/cat-5520_TL-WN722N.html.

[66] “ISO/IEC 20922:2016 - Information technology – Message Queuing Telemetry Transport (MQTT) v3.1.1,” https://www.iso.org/standard/69466.html.

[67] G. Piateski-Shapiro and W. Frawley, Knowledge Discovery in Databases, MIT Press, Cambridge, MA, USA, 1991.

[68] J. Andión Jiménez, J. M. Navarro González, M. Álvarez-Campana Fernández-Corredor, and J. C. Dueñas López, “A passive, non-intrusive, cheap method to identify behaviours and habits in the Campus,” in Proceedings of the XIII Jornadas de Ingeniería Telemática - JITEL2017, vol. 40, no. 47, pp. 10–4995, September 2017.

[69] A. Mylinikov, “Geo project,” https://www.mylnikov.org.

[70] K. Mehmet and G. Ananth, “PROXIMUS: A framework for analyzing very high dimensional discrete-attributed datasets,” in Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’03, vol. 147, no. 156, pp. 147–156, USA, August 2003.

[71] “UPM Library occupation,” http://ceiboard.dit.upm.es/smart-campus/biblio.

[72] C. A. R. Inarejos, A. Rodríguez, G. López, and M. Álvarez-Campana, “Análisis de la huella de carbono de la ETSIT de la UPM y propuesta de mejora basada en datos de la plataforma IoT Smart CEI Moncloa,” in Proceedings of the I Congreso Iberoamericano de Ciudades Inteligentes (ICSC-CITIES 2018), 2018.


Table 2: Summary of the state of the art of people flow monitoring techniques based on Wi-Fi tracking (P/A stands for Passive/Active).

| Ref | Year | Scenario | Application | P/A | Own deployment |
|---|---|---|---|---|---|
| [22] | 2003 | Indoor | One of the earliest approaches on precise positioning using Wi-Fi (precision 2.6 m) | A | Yes |
| [23] | 2006 | Indoor | Wi-Fi fingerprint to identify the general location and applying logistic regression to distinguish between finer-grained locations | A | Yes |
| [24] | 2006 | Indoor office building | Precise positioning. PDR combined with Wi-Fi to reduce the accumulated error | A | Yes |
| [25] | 2007 | Outdoor and indoor | Creation of Wi-Fi map. Positioning by comparing with the created map | A | Yes. Own hardware. Offline analysis |
| [26] | 2007 | Indoor campus | Comparison of positioning and tracking methods using Wi-Fi | P | No. Offline |
| [27] | 2008 | Indoor campus | Estimate the position using Wi-Fi and tracking with PDR | A | Yes |
| [28] | 2009 | Indoor campus | Real-time Wi-Fi positioning; web portal to check users’ positions | A | Yes |
| [39] | 2009 | Indoor campus | Count of users in different buildings. Analysis of users’ mobility between buildings | P | No. Institutional APs |
| [57] | 2010 | Indoor campus and office building | Coarse position with Wi-Fi and Bluetooth. Graph of user co-occurrence | A | No. Offline |
| [43] | 2010 | Indoor campus and office building | Calculation of stay length based on Wi-Fi positioning. Analysis of favorite locations | A | No. Offline |
| [44] | 2011 | Indoor campus | Extension to [39]. User characterization based on their mobility patterns | P | No. Institutional APs |
| [29] | 2011 | Indoor tunnel in construction | Precise real-time positioning of workers inside a tunnel under construction using Wi-Fi (precision 5 m) | P | Yes. Own AP deployment |
| [30] | 2012 | Indoor campus | Creation of Wi-Fi fingerprint map. Map usage for positioning with smartphone application | A | No |
| [36] | 2012 | Indoor campus | Wi-Fi based study of crowd movement. Analysis of mobility patterns, users’ arrivals and departures from campus | P | No. Institutional APs |
| [58] | 2013 | Indoor | Wi-Fi path analysis in real time | A | No. Institutional APs |
| [33] | 2014 | Indoor and outdoor campus | Analyze pedestrian destination frequencies in an area of 55 hectares of a university campus during 5 weekdays | P | No. Institutional APs and Radius server |
| [59] | 2014 | Indoor campus | Localization and tracking system exploiting particle filters to combine dead reckoning, Wi-Fi RSS-based analysis, and knowledge of the floor plan (precision 0.7 m) | A | |
| [60] | 2015 | Indoor shopping mall | Wi-Fi Channel State Information analysis to detect shopper activities | P | Yes. Own AP deployment |
| [31] | 2015 | Indoor | Precise positioning based on sensor fusion combining Wi-Fi, PDR, and landmarks. Smartphone application (positioning 1 m) | A | No. Smartphones |
| [32] | 2015 | Indoor parking | Precise positioning combining Wi-Fi RSS and electromagnetic field map | | |
| [37] | 2015 | Outdoor concert. Indoor campus | Portable Wi-Fi based user count. Analysis of crowds in concert and in campus | P | Yes. Raspberry Pi based |
| [61] | 2015 | Outdoor | Creation of Wi-Fi map using GPS | A | |
| [62] | 2016 | Indoor | Precise positioning combining Wi-Fi and PDR | A | No. Smartphones |
| [34] | 2016 | Indoor airport | User path detection. Combining Wi-Fi, GPS, PDR, and Bluetooth to create a multilevel map and study of user’s trajectory prediction | A | No. Smartphones |
| [45] | 2016 | Indoor campus | Analysis of users’ activities. User tagging based on registered activities | P | No. Institutional APs |
| [46] | 2016 | Indoor and outdoor campus | Analysis of user movements to different food points to predict the operation of new stores based on price and location | P | No. Institutional APs |
| [63] | 2017 | Indoor | Crowdsourcing positioning based on Wi-Fi fingerprint | A | No. Institutional APs |
| [41] | 2018 | Indoor | Coarse positioning with room-level precision based on probabilistic Wi-Fi fingerprint. Usage of Hidden Markov chain models to analyze user movement | P | No. Institutional APs |
| [35] | 2016 | Indoor | Trajectory analysis based on Hidden Markov chain models | P | No. Institutional APs |
| [42] | 2017 | Indoor | Estimate the number of participants and their space and time evolution in an area of about 167 hectares during the 2016 Open Day of the European JRC | P | No. Institutional APs |
| [47] | 2016 | Indoor/Outdoor | Study mobility-related activities in a campus of 440 hectares based on the 2700 APs of the institutional network and an additional opt-in smartphone application | A | No. Institutional APs |
| [53] | 2014 | Indoor | Classify users in a hospital (e.g., patient, doctor, administrative) by checking the number of hours and the positions of a user over time based on the institutional Wi-Fi network | P | No. Institutional APs |
| [38] | 2012 | Indoor | Identify flocks walking in a building and their behavior based on signal strength from the institutional Wi-Fi and using clustering techniques | P | No. Institutional APs |
| [45] | 2016 | Indoor | Analyze users’ occupation (based on Markov models) as well as regular and irregular hours in a university campus | P | No. Institutional APs |
| [40] | 2016 | Indoor | Analyze room utilization and people tracking, providing heat maps. Analyze device statistics | A | No. Institutional APs |
| [48] | 2017 | Indoor/Outdoor | Analyze people mobility, monitoring, and tracking in Smart Cities and traffic in a highway (e.g., driving behavior, traffic forecasting) | P | Yes. Raspberry Pi based |
| [49] | 2018 | Indoor/Outdoor | Provide user localization, user profiling, and device classification | A | Yes. Raspberry Pi based |
| [54] | 2017 | Indoor/Outdoor | Analyze Wi-Fi tracking records gathered during more than one year in Madeira to classify users as tourists or locals and identify touristic spots | P | Yes. Based on TP-Link MR3240v2 home router |
| [56] | 2017 | Indoor | Obtain semantic trajectories. Classify users based on their locations. Analyze the probability of a user going to a specific shop based on their history and propose the creation of a recommender based on the whole dataset | - | - |

occupation of different zones [39–42], and obtain behavior patterns [36, 43–49].

Wi-Fi tracking of a specific user is usually done by analyzing the collected records related to a specific MAC address; i.e., tracking users is equivalent to tracking their MAC addresses. In general, this technique raises issues related to tracking people individually and to privacy. Although a user carrying several devices (e.g., smartphone, tablet, laptop) with their Wi-Fi interfaces on would at first be identified as several users, after a reasonable period of time this information can be correlated to fix the problem [50]. In the work presented in this paper, the files associated with the measurements of different sensors of the same building are compared in order to avoid counting the same mobile/person several times (e.g., due to overlapping Wi-Fi cells). The resulting file allows for the analysis of people flow at building level, identifying the total stay time or the frequency of the visited places. However, as explained below in the position analysis subsection, this problem still appears in the collected data, and the data must be postprocessed to deal with these collisions. Users may also resort to so-called MAC spoofing (i.e., replacing their actual MAC address with the MAC address of another device), which could be seen as a kind of attack. However, this may cause connectivity problems, and it is a negligible behavior in the scenario considered in this paper.
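The comparison of records from sensors of the same building can be sketched as follows. This is a minimal illustration under stated assumptions, not the processing pipeline of the deployment: records are assumed to be (timestamp in seconds, sensor identifier, MAC address) tuples, the sensor-to-building mapping and the 60-second window are hypothetical, and a device seen by several sensors of one building within the same window is kept only once.

```python
def merge_building_detections(records, sensor_to_building, window_s=60):
    """Collapse detections of one device by overlapping sensors.

    records: iterable of (timestamp_s, sensor_id, mac) tuples (assumed format).
    Returns a sorted list of unique (window_start_s, building, mac) entries.
    """
    seen = set()
    for timestamp_s, sensor_id, mac in records:
        building = sensor_to_building[sensor_id]
        # Align the timestamp to the start of its time window.
        window_start = (timestamp_s // window_s) * window_s
        # Normalize case so the same address always compares equal.
        seen.add((window_start, building, mac.lower()))
    return sorted(seen)
```

Counting the distinct MAC addresses per building and window in the returned list then gives a building-level presence figure that is not inflated by overlapping Wi-Fi cells.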

Regarding privacy, several proposals to protect Wi-Fi communications by means of MAC address anonymization have arisen in recent years. These proposals first appeared as apps for smartphones (enabling a kind of MAC spoofing), but recently smartphone manufacturers have started including these techniques in the latest versions of their Operating Systems (OS) (e.g., iOS, Android, and Windows). Such MAC address anonymization techniques aim to avoid using the actual MAC address until the device gets connected to the Wi-Fi network (i.e., they use a fake MAC address in their probe frames). The specific solution depends on the manufacturer and OS. In the case of iOS, the solution involves sending locally administered MAC addresses in the probe frames, randomly selecting the three least significant bytes of the MAC address. This can be easily detected just by inspecting the first byte of the MAC address. In the case of Android, some manufacturers have decided to use random MAC addresses in the probe frames, taken from the MAC address ranges assigned to them by the IEEE. Nevertheless, even with these techniques in place, it is possible to end up obtaining the actual device information [51, 52]. In addition, MAC randomization is not actually a relevant problem in the scenario considered in this paper, since most of the devices are connected to the available Wi-Fi networks (e.g., Eduroam), and to connect to a Wi-Fi network devices must use their actual MAC addresses.
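The check on the first byte mentioned above can be sketched in a few lines. In IEEE 802 addressing, the locally administered flag is the second least significant bit (mask 0x02) of the first octet; the function name below is hypothetical, and the sketch assumes colon- or hyphen-separated hexadecimal notation.

```python
def is_locally_administered(mac: str) -> bool:
    """Return True if the MAC address has the locally administered bit set."""
    # Take the first octet of an address such as "DA:A1:19:00:00:01".
    first_octet = int(mac.replace("-", ":").split(":")[0], 16)
    # Bit 0x02 of the first octet distinguishes locally administered
    # addresses from universally administered (vendor-assigned) ones.
    return bool(first_octet & 0x02)
```

A tracking pipeline could use such a test to set aside probe frames carrying locally administered addresses before counting devices. Note that it would not catch randomized addresses drawn from a manufacturer’s own assigned range, as in the Android case described above.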

Wi-Fi tracking systems can be classified into two main groups: those that use the enterprise Wireless Local Area Network (WLAN) itself and those that use a dedicated low-cost passive Wi-Fi infrastructure, which is the case of the actual deployment considered in this paper. One of the main drawbacks of the systems that use the enterprise WLAN is that they can only track the users of such networks, whereas independent, dedicated, low-cost passive Wi-Fi infrastructures allow tracking the users connected to different Wi-Fi networks, if they spatially coexist, or even users not connected to any network, provided MAC randomization is not used.

As examples of works that use the enterprise WLAN or the existing infrastructure of access points (APs), [42] presents the 2016 Open Day of the European Joint Research Center (JRC), in which 8000 people participated within an area of about 167 hectares, as a case study where the Wi-Fi infrastructure of the event was used to estimate the number of participants and their space and time evolution based on properly processed MAC addresses. Reference [47] presents MobiCamp, a large-scale testbed composed of around 2700 APs to study mobility-related activities, which combines user mobility traces based on Simple Network Management Protocol (SNMP) data with enriched data (e.g., gender, age) provided by an opt-in smartphone application.

Reference [53] represents yet another example of the analysis that can be made with this kind of information. Its scenario is a hospital and, by checking the number of hours and the positions of a user over time, the authors can classify that user according to a role (e.g., patient, doctor, administrative). Reference [38] identifies flocks walking in a building and their behaviors by applying clustering techniques to the signal strength measurements provided by the institutional WLAN.

Reference [33] presents a campus scenario where, by using the university network infrastructure, a detailed profile of the user’s activity can be obtained. The users tracked are those logged into the university network, which provides extra information about the user, such as their role or gender. Combining that information with a detailed map, which contains thousands of Points of Interest (POIs), the authors can extract an activity log that shows the different user’s activities with one-minute precision. The main drawback of this work is that it totally depends on the users’ profiles database and the POI map, and both are resources that are difficult to access or create. In reference [45], which employs the university network infrastructure and the location of each AP in the university campus, each sensor record only stores the closest AP. With this simple information, the authors can create an activity profile similar to the one shown in [33]. By analyzing the basic results obtained, they are able to extract new information (e.g., a count of irregular hours) or detect patterns of anomalous events (e.g., periods of exams or holidays).

As a last example of systems that use the institutional WLAN, reference [40] presents a web application in which the occupation of different rooms on a campus is shown in real time. The number of people in a room is calculated from the number of Wi-Fi devices detected by the APs. Using the signal strength measurement of each of them, a heatmap is drawn that shows the user distribution in the room. The collected data are analyzed offline to produce reports on the utilization of the different rooms and to draw conclusions from the detected patterns.

On the other hand, [37, 48, 49, 54] are some examples of works which use independent, dedicated, low-cost passive Wi-Fi infrastructures, as is the case of the deployment considered in this paper. In [48], a network of devices called MOBYWIT, each based on a Raspberry Pi and two wireless USB dongles, is used to track the movement of people and vehicles, sniffing not only Wi-Fi but also Bluetooth signals emitted by smartphones and vehicle hands-free calling systems. In [49], a passive Wi-Fi infrastructure based on low-cost devices that combine a Raspberry Pi and a TP-LINK Wi-Fi dongle is used to provide user localization, user profiling, and device classification based on the properly processed MAC addresses captured from the IEEE 802.11 probe request frames. Reference [37] also uses this approach to count people in a concert, and in reference [54] the considered scenario is a whole island (Madeira, Portugal), where the records gathered all over the island are analyzed to classify users as tourists or locals, as well as to identify touristic spots.

One of the main features that makes the work presented in this paper stand out from previous work is that one year of data from an actual Wi-Fi tracking system deployed in a real-life environment is analyzed. Most previous works consider hours, a few days (e.g., weekdays), or weeks. Only the work presented in [54] covers a similar period of time (an even longer one), but its analysis is much broader and far from the level of detail provided in this paper. The considered period of time allows analyzing seasonality effects and other patterns that, although they may be seen as common knowledge, do bring value, since they represent numerical evidence that supports decision making (e.g., someone may think that the Wi-Fi access in a given area does not work properly because it is always overcrowded, but numerical evidence is needed to justify the investment of increasing the number of APs of the corporate WLAN in that area to improve the service). In addition, such well-known patterns, when obtained automatically by processing the available data, become models that can be used to detect anomalies or atypical situations, as is common practice in unsupervised machine learning. It is also worth mentioning the use of clustering to improve the data analysis and interpretation (as in previous works, e.g., [38]), as well as the application of the semantic trajectory concept [55], which combines positioning data with an external source of information to classify the different positions according to the activity carried out in the area (e.g., users positioned in the cantina will be eating) and which has not been extensively explored in previous works (e.g., [56]).

Figure 3: (a) Raspberry Pi; (b) TP-LINK USB Wi-Fi dongle; (c) developed Wi-Fi listening device.

Another strong point of the work presented in this paper compared to previous work is the validation of the Wi-Fi tracking technique for estimating the occupancy of the library of the Telecommunications School and its comparison with a video camera system. Only a few previous works, such as [40] or [49], perform a similar validation (in [40], the number of people detected in the room is compared with the attendance list of the seminar taking place there, and in [49], students are asked to turn on the Wi-Fi interfaces of their devices and provide the MAC addresses of their smartphones and laptops through an anonymous web form to serve as ground truth for device classification). However, again, the period considered in this paper is much longer, and it is shown that the Wi-Fi tracking system outperforms the more expensive video camera system. As a matter of fact, the Wi-Fi tracking system is actually used on a day-to-day basis by the library staff, which illustrates the value that this kind of IoT infrastructure can bring to real-life problems and services.

3. Data Acquisition Infrastructure

Figure 3 shows the Wi-Fi listening device/sensor developed for the people flow monitoring sensor network of the Smart CEI Moncloa. It is based on very common and cheap hardware, namely, a Raspberry Pi board [64], one of the most widely used hardware platforms for IoT, and a TP-LINK USB Wi-Fi dongle [65] configured in monitor mode. As a result, the cost of this solution is in the order of tens of euros (around 80 euros when manufactured in 2016), which represents a remarkable cost reduction compared to other solutions available in the market.

As Figure 4 illustrates, broadly speaking, these Wi-Fi sensors scan each of the Wi-Fi channels in both the 2.4 GHz and the 5 GHz bands for a configurable amount of time (currently 250 ms), read the header of the IEEE 802.11 radio packets (e.g., data packets or probe requests) in their region of coverage, and record the sender MAC addresses. As these MAC addresses are unique per device, counting them is a good indicator of the number of devices in the surroundings of the Wi-Fi sensors (although there may be issues related to the fact that a single user can carry several devices, as already pointed out in Section 2), and they allow for temporal correlation analysis, thus yielding useful information such as stay time, availability patterns, etc.

Regarding how this information is carried all the way up to the backend server and processed there, Figure 5 shows an overview of the communications architecture and protocol stack of the people flow monitoring service of the Smart CEI Moncloa.

As can be seen, the Wi-Fi sensors are directly connected to the backend via the Ethernet network of the UPM. Communications are protected end-to-end by the use of Transport Layer Security (TLS) on top of Transmission Control Protocol (TCP)/Internet Protocol (IP). Measurements are periodically sent using Message Queue Telemetry Transport (MQTT) [66]. The publish/subscribe mechanism provided by MQTT allows the Wi-Fi sensors not only to send measurements (i.e., events) periodically, but also to receive commands (e.g., to reboot them or to perform a remote firmware update).

Taking advantage of the hierarchical structure of MQTT topics, all the publish events follow the structure SERVICE/ID/EVENT(/TIMESTAMP). Thus, the publish events from the Wi-Fi sensors start with Wi-Fi, followed by the MAC address of the sensor's Ethernet interface, which is used as a unique ID. The content published under the different topics is formatted as Comma-Separated Values (CSV), a lightweight solution that is especially appropriate when the data structure is fixed, since the meaning of each field of the subsequent lines is explained only once, in the header at the beginning of the file.
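The topic structure and the CSV payload described above can be illustrated with a minimal Python sketch. It only builds strings and does not talk to a broker. The event name, the sensor ID format, and the column names are hypothetical; the text does not specify them.

```python
import csv
import io

def build_topic(service, sensor_id, event, timestamp=None):
    """Join the topic levels with '/', the MQTT topic level separator."""
    levels = [service, sensor_id, event]
    if timestamp is not None:
        levels.append(str(timestamp))  # optional trailing level
    return "/".join(levels)

def build_csv_payload(header, rows):
    """Serialize rows as CSV text, with one header line followed by the data lines."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()

# Hypothetical sensor ID (Ethernet MAC without separators) and event name.
topic = build_topic("Wi-Fi", "b827eb000001", "detections", 1462096800)
payload = build_csv_payload(
    ["timestamp", "user"],  # hypothetical column names
    [[1462096800, "3f2a"], [1462096860, "9c1d"]],
)
```

Because the header is sent once per file, each data line carries only the values, which keeps the payload small when the structure is fixed.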

Privacy issues have also been considered carefully: the developed Wi-Fi sensors apply an irreversible MD5 hash function with salt to the MAC address, which prevents brute-force attacks with precomputed tables. In addition, as Figure 5 shows, once anonymized, the data are carried securely up to the platform servers, where they are handled in an aggregate manner instead of individually.
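As an illustration of the anonymization step, the following sketch hashes a MAC address with a salt using MD5 from the Python standard library. How the salt is combined with the address (here, prepended to the raw bytes) and the salt value itself are assumptions; the text only states that a salted MD5 hash is applied.

```python
import hashlib

def anonymize_mac(mac: str, salt: bytes) -> str:
    """Return the salted MD5 digest of a MAC address as 32 hex characters."""
    # Normalize so that the same device always yields the same digest.
    normalized = mac.lower().replace(":", "").replace("-", "")
    return hashlib.md5(salt + bytes.fromhex(normalized)).hexdigest()

SALT = b"example-secret-salt"  # hypothetical value, kept secret in practice

digest_a = anonymize_mac("AA:BB:CC:DD:EE:FF", SALT)
digest_b = anonymize_mac("aa-bb-cc-dd-ee-ff", SALT)
```

Since the same address always maps to the same digest, devices can still be counted and correlated over time without storing the original address.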

Furthermore, the software of the developed sensors has been modified so that the MAC anonymization mechanisms presented in Section 2 do not affect the obtained measurements. Thus, the Wi-Fi frames with locally administered MAC addresses or with special MAC address ranges are discarded, so these devices are not considered. In any case, as already mentioned in Section 2, MAC randomization is not a major issue in our case, since most of the users are connected to the free Eduroam Wi-Fi access, so their smartphones end up using their actual MAC address.

Figure 4: Sketch of how the developed Wi-Fi listening devices obtain the MAC addresses of the users surrounding them. The listening device listens for "x" ms on each channel and captures the data frames and probe requests sent by the users on the channels of the surrounding APs.

Figure 5: Communications architecture and protocol stack for the people flow monitoring service of the Smart CEI Moncloa. The Wi-Fi sensors and the backend server communicate over an MQTT/TLS/TCP/IP/802.3 stack; the backend server comprises an MQTT broker, real-time and batch processing, a REST API server, a CSV file-based temporal DB, and a dashboard web server.
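The filter on locally administered addresses mentioned above can be sketched as follows. In IEEE 802 addresses, the second least significant bit of the first octet marks an address as locally administered, which is how randomized addresses are typically flagged. The example addresses are made up, the check would have to run before the address is hashed, and the "special MAC address ranges" are not detailed in the text, so they are left out.

```python
def is_locally_administered(mac: str) -> bool:
    """True if the universal/local bit (0x02) of the first octet is set."""
    first_octet = int(mac.replace("-", ":").split(":")[0], 16)
    return bool(first_octet & 0x02)

# Made-up sender addresses, as read from captured frames before hashing.
senders = ["3c:5a:b4:01:02:03", "da:a1:19:aa:bb:cc", "02:00:00:00:00:01"]
kept = [mac for mac in senders if not is_locally_administered(mac)]
```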

After gathering the CSV files over a long time span, we moved to an offline analysis platform on a private cluster. The Apache Spark 2.2 software platform was selected as the data processing tool due to its optimized capabilities for working with large amounts of structured data. Such a long collection period leads to large datasets, which common centralized systems struggle to process but which distributed systems such as Spark can handle easily.

The private cluster is composed of eight HP ProLiant SL250s Gen8 machines, each with two Intel Xeon E5-2630 v2 processors at 2.6 GHz (6 cores each) and 32 GB of RAM. In addition to Spark, this platform also runs an Apache Hadoop filesystem, where the dataset and the results are stored. This is a distributed filesystem that allows all the machines to access the stored data in parallel. The use of such a platform is advisable in order to speed up the analysis, but it is not mandatory, since the algorithms we describe are available in many other software platforms (such as those provided with R or Python).

4. Data Analysis and Results

4.1. Data Processing. Although this work is not a proper data mining process, given that we do not use those kinds of algorithms and analyses, the steps required prior to the actual analysis are the same as in a Knowledge Discovery in Databases (KDD) process [67]. For our analysis, we took the aforementioned CSV files, each one containing the data collected by a single sensor during a period of 15 minutes. The observation period used in our analysis is a full year, from January 2016 to December 2016. Over one year, 9 sensors each generating a file every fifteen minutes would create around 315K files. This is a theoretical maximum, because a file is only created if the sensor is active and detects at least one device during the period. The sensors were deployed at the beginning of 2016, but the deployment was not completely finished until March. There were also some holidays during 2016 when ETSIT was fully closed. During those days, some of the sensors detected no devices (although there are always security staff somewhere in the covered area), so they did not generate a file. As a result, the number of files actually available for the analysis is 246K.

Figure 6: Histogram of users by number of registered hours, with hours on the horizontal axis and the number of users on the vertical axis in logarithmic scale: (a) raw analysis; (b) first threshold applied.

The first step in preprocessing these data was to merge all the files into a single one, resulting in a 2 GB plain-text CSV file in which each row represents the observation of a device during the associated time period and includes, among other fields, the anonymized MAC address, the sensor name, and the minute in which the device was detected within the observation time span. After this transformation, the resulting dataset contains 63 million rows, each one representing the detection of a single device in a single minute by a single sensor.

A total of 128K unique devices were detected during the whole year. It is known that the number of people who regularly go to ETSIT is around 4K, so not all of the detected devices can be considered actual users of interest; consequently, the data needed to be filtered. Only the devices of people who perform activities related to the place, such as students, professors, etc., should be taken into account. Thus, it is necessary to filter out devices, keeping only the ones that can be labelled as users. We apply filters based on the observations of each device.

So, for each device, we count the number of minutes it was detected in the entire dataset. A device can be seen by more than one sensor during the same minute, so to generate this measure, repeated minutes are counted only once per device. Grouping the resulting counts, a histogram (Figure 6) is obtained, from which the devices can be classified based on the total time recorded during the whole year.
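The counting rule above (a minute seen by several sensors counts once per device) can be sketched in plain Python as follows. The tuple layout of the rows is an assumption made for the example; the analysis in the paper was run on Spark.

```python
from collections import defaultdict

def minutes_per_device(rows):
    """rows: iterable of (minute, device, sensor) tuples.

    Returns the number of distinct minutes in which each device was seen.
    """
    seen = defaultdict(set)
    for minute, device, _sensor in rows:
        seen[device].add(minute)  # a set ignores repeats from other sensors
    return {device: len(minutes) for device, minutes in seen.items()}

rows = [
    (100, "dev1", "Library"),
    (100, "dev1", "Entr D"),   # same minute, second sensor: not counted again
    (101, "dev1", "Library"),
    (100, "dev2", "Entr A"),
]
counts = minutes_per_device(rows)
```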

To facilitate the analysis of the chart, the horizontal axis is expressed in hours and the count on the vertical axis is shown in logarithmic scale. Around 95% of the devices were seen for less than 48 hours during the whole year, that is, less than four hours per month on average. This group is mainly composed of people passing near the school buildings who momentarily enter the coverage area of the sensors without accessing ETSIT. Figure 6(b) shows the histogram after applying a lower threshold of 48 hours and an upper threshold of 3650 hours, an average of 10 hours per day. In this chart, some isolated peaks can be observed in the tail of the graph, starting around 1000 hours on the horizontal axis. A detailed analysis of these peaks revealed that they corresponded to devices that remained connected continuously for several days, such as servers. So, we applied the label "user" only to those devices whose number of registered hours during the year lies between these two thresholds. As shown in Table 3, of the 128K detected devices, only 4653 were classified as users, and the rest of the analysis is performed on them.
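The labelling rule can be sketched as a simple filter over the per-device minute counts. The two thresholds come from the text; whether the bounds are inclusive is an assumption, and the device names and counts are invented for the example.

```python
LOWER_HOURS = 48     # lower threshold from the text
UPPER_HOURS = 3650   # upper threshold from the text (10 hours per day on average)

def label_users(minutes_by_device):
    """Keep the devices whose yearly total lies between the two thresholds."""
    users = set()
    for device, minutes in minutes_by_device.items():
        hours = minutes / 60
        if LOWER_HOURS <= hours <= UPPER_HOURS:  # inclusive bounds: an assumption
            users.add(device)
    return users

# Invented yearly totals, in minutes.
totals = {"passerby": 120, "student": 30_000, "server": 500_000}
users = label_users(totals)
```

Here the passer-by (2 hours) and the always-on device (more than 8000 hours) are dropped, and only the device with 500 hours is kept.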

Figure 7: Daily analysis aggregated during 2016 (horizontal axis: date, January to December): (a) accumulated minutes [×1000]; (b) unique users registered [×100]; (c) ratio between minutes and users [minutes/user].

Table 3: Dataset statistics before and after preprocessing.

                        Rows                Devices
Before preprocessing    63,427,489          128,188
After preprocessing     41,294,344 (65%)    4,653 (3.6%)

The last step before the analyses proper is to merge this dataset with the information about the position and name of the sensors. Additionally, only the data columns with relevant information for the analyses are kept, in order to reduce the in-memory cost:

(1) timestamp: the detection time, measured in minutes

(2) user: the anonymized MAC address of the detected device

(3) sensor: the MAC address of the sensor that made the detection

(4) sensorName: the name of the sensor that made the detection

(5) accessPoint: the MAC address of the access point to which the device is connected, if any

(6) sensorLat: latitude of the sensor's position

(7) sensorLng: longitude of the sensor's position

4.2. Temporal Analysis. Once the data were ready, we performed a temporal study aiming to determine whether the captured data allowed for the identification of significant periods of activity or trends in user behavior in ETSIT. As one year is too long for a minute-level analysis, we aggregated our data by days and by hours.

Figures 7–10 are organized in groups of two charts. The first one, the accumulated time chart, shows the number of activity minutes registered by each sensor during a time slice. The second chart presents the number of unique users seen by each sensor during the time slice. In addition, Figures 7 and 8 include a third chart with the ratio of accumulated time to unique users, which provides a hint of how stationary users are: peaks in the ratio chart represent moments when users stay still (e.g., students in class), and dips are associated with transition times (e.g., students arriving at the school).

First, Figure 7 shows an overview of these metrics throughout the year. In this figure, the measurements of the 9 sensors are aggregated into a single line. In the case of Figure 7(a), the result is not exactly equal to the sum of the activity minutes accumulated by each sensor, since a user can be detected in the same minute by different sensors, and such occurrences are represented as a single instant in this line.
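The three aggregated daily metrics can be sketched in plain Python as follows. The row layout and the sample values are assumptions for the example; the point is that a (user, minute) pair seen by several sensors contributes a single accumulated minute.

```python
from collections import defaultdict

def daily_metrics(rows):
    """rows: iterable of (day, minute, user, sensor) tuples.

    Returns {day: (accumulated_minutes, unique_users, minutes_per_user)}.
    """
    user_minutes = defaultdict(set)  # day -> {(user, minute)}
    users = defaultdict(set)         # day -> {user}
    for day, minute, user, _sensor in rows:
        user_minutes[day].add((user, minute))  # drops same-minute repeats
        users[day].add(user)
    return {
        day: (
            len(user_minutes[day]),
            len(users[day]),
            len(user_minutes[day]) / len(users[day]),
        )
        for day in users
    }

rows = [
    ("2016-05-02", 600, "u1", "Library"),
    ("2016-05-02", 600, "u1", "Entr D"),  # same user and minute: counted once
    ("2016-05-02", 601, "u1", "Library"),
    ("2016-05-02", 600, "u2", "Entr A"),
]
metrics = daily_metrics(rows)
```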

Figure 8: Daily analysis during 2016 (horizontal axis: date, January to December), with one line per sensor (Entr A, Entr A Sec, Entr B, Entr B Lab, Entr B Sec, Entr C, Entr D, Library, Std Tables): (a) minutes accumulated by each sensor [×1000]; (b) unique users registered by each sensor [×100]; (c) ratio between minutes and users by each sensor [minutes/user].

The most obvious observation that can be drawn from Figure 7 is the anomalous behavior during the months of January and February. The reason is that, as previously discussed, the sensor network began to be deployed in January 2016 and was not fully operational until March. We keep these data in the analysis because they can be used to compare an anomalous situation with a regular one; also, it is much more intuitive to present a whole-year range instead of nine months. Nevertheless, the data collected in those months are not used to draw conclusions.

Holiday periods are clearly visible in the unique users chart as drops in the line, for example, in March (Easter), during the summer holidays, or on some isolated holidays. These periods can also be seen in the ratio chart, which increases because on those dates far fewer users come to the school, but those who do usually spend long periods of time in the library. A remarkable point is October 12th, on which the ratio chart reaches its highest peak and the number of users is almost zero. That day even the library was closed, so only the security staff was in the school. We can also observe the effect of weekends on each chart: both the number of users and the accumulated minutes decrease, but the ratio increases for the same reason given for the holidays, since users study at the library during the weekends. Finally, between May and June, a decreasing trend can be observed both in the unique users and in the accumulated minutes, which rise again in mid-June for the examination period.

Figure 8 shows a second set of graphs that correspond to a daily analysis of the whole year, with one line per sensor. One of the first conclusions that can be drawn is the difference in proportions between the unique users chart and the accumulated minutes chart. The difference between sensors in the number of registered users is not as remarkable as the difference in the number of accumulated minutes. Again, this is due to the fact that users spend much longer periods of time in the library than in other areas. Observing the line of the sensor of building C in the accumulated minutes chart, it can be seen that its magnitude is maintained throughout the year. This sensor covers a building with professors' offices and research laboratories; i.e., this sensor registers mostly users who work at the school, and they maintain a more regular schedule throughout the year than students who go to classes or to the library.

Figure 9: Calendar view (rows: days of the week, Sunday to Saturday; columns: January to December): (a) unique users registered by sensor Entr A Sec; (b) minutes accumulated by sensor Entr A Sec; (c) unique users registered by sensor Entr C; (d) minutes accumulated by sensor Entr C.

Figure 10: Hourly analysis during the first week of November (horizontal axis: date, Mon 05 to Sun 11), with one line per sensor: (a) minutes accumulated by each sensor [×1000]; (b) unique users registered by each sensor [×100].

Table 4: Sensor record collisions.

Collisions   Count         Total         Percent
9            3             27            0.00
8            45            360           0.00
7            421           2,947         0.01
6            5,371         32,226        0.08
5            59,222        296,110       0.72
4            560,994       2,243,976     5.46
3            2,525,718     7,577,154     18.42
2            7,268,825     14,537,650    35.35
1            16,433,862    16,433,862    39.96

Line charts can be difficult to interpret for periods of time as long as a whole year. To ease the visual analysis, a calendar visualization is offered in which the same data can be studied in a more intuitive way. Figure 9 presents the full-year view in calendar format, where each row represents one day of the week and the months are delimited in black. The color intensity of each cell is proportional to the value it represents. Figure 9 is formed by 4 subfigures, which correspond to the count of users and the number of minutes registered by the sensors "Entr A Sec" and "Entr C". These visualizations are relative and can be used to obtain general conclusions. For a detailed study, visualizations with absolute measurements and an examination of the data are still necessary.

Figure 9(a) represents the number of users detected by the sensor "Entr A Sec", which covers the classrooms of this building, and Figure 9(b) the number of minutes accumulated by the same sensor. It is clear that during the first two months of the year this sensor was not operational, that on Saturdays and Sundays (last and first rows) it registers almost no activity, and that holiday periods, such as Easter in March or the summer holidays in July and August, are clearly identified. But this visualization also reveals other information that is more difficult to observe in a line graph, such as the fact that Mondays are the days on which users spend the least time in this area. Some blank cells also stand out in the last months of the year, revealing that the sensor did not work during those days.

Figures 9(c) and 9(d) represent the number of users detected and the minutes accumulated by the "Entr C" sensor. As mentioned in the description of Figure 8, this sensor yields very stable measurements because it covers the research laboratories, and the workers' schedules are not affected by the school calendar, unlike the activity seen by other sensors such as "Entr A Sec".

Figure 10 presents a different view, plotted at the hour level during a week in November. This eases the identification of the activity hours, which span from 8 in the morning to 22 in the evening. Another noticeable effect is the valley in the middle of the day, corresponding to the lunch break, when users move to the cantina (see Figure 2) or leave the buildings. Finally, it is observed that users leave the school gradually during the afternoon.

4.3. Position Analysis. Before getting into the details and insights obtained from the spatial analysis of the one-year data, it is worth mentioning that a set of tests was carried out in a controlled environment during the first stage of the deployment in order to check that the Wi-Fi tracking system worked properly. These tests included tracking a well-known group of MAC addresses across the Wi-Fi tracking sensors and checking that they appeared in the appropriate ones. It was also checked that the system correctly located and tracked the security staff throughout their night security rounds.

A spatial analysis provides insight into how the users are distributed throughout the buildings at different times of the year. To reach these conclusions, we have improved the method presented in [68]. The first improvement is related to the event in which a user is detected by two or more sensors during the same minute. From now on, this event is referred to as a collision. The number of collisions is significant for sensors that are close to each other (e.g., Library and building D) or sensors that cover transition areas (e.g., Entr A or Std Tables).

Table 4 shows a study of the number of collisions. The order of a collision is the number of sensors that collide in the same minute, the count is the number of times a collision of that order occurs, and the total is the number of rows involved (order times count). Collisions of order 2 and 3 account for more than 50% of the data. The solution adopted was to eliminate these collisions by replacing, for this experiment, all the rows corresponding to a collision with a single row whose latitude and longitude are the centroid of the positions of the sensors participating in the collision.
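The collision handling described above can be sketched as follows. The centroid is taken here as the arithmetic mean of the latitudes and longitudes, which is an assumption that is reasonable at campus scale; the row layout and the coordinates are invented for the example.

```python
from collections import defaultdict

def resolve_collisions(rows):
    """rows: iterable of (minute, user, lat, lng), one per detecting sensor.

    Returns one row per (minute, user), placed at the centroid of the
    colliding sensors, plus the order of the collision.
    """
    groups = defaultdict(list)
    for minute, user, lat, lng in rows:
        groups[(minute, user)].append((lat, lng))
    resolved = []
    for (minute, user), points in sorted(groups.items()):
        lat = sum(p[0] for p in points) / len(points)
        lng = sum(p[1] for p in points) / len(points)
        resolved.append((minute, user, lat, lng, len(points)))
    return resolved

rows = [
    (1, "u1", 40.0, -3.0),  # invented sensor positions
    (1, "u1", 42.0, -5.0),  # same user and minute: collision of order 2
    (2, "u1", 40.0, -3.0),  # single detection: order 1, position unchanged
]
resolved = resolve_collisions(rows)
```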

The second improvement consists of incorporating external information with the approximate position of the access points to which the users are connected. This information has been obtained from the Mylnikov Geo API [69], retrieving the position of all the ESSIDs registered throughout the year. In the cases where the user is connected and the approximate position of the access point is available, this new position is used instead of the position of the sensor that generates the row. This allows measuring the user's position with more accuracy. In addition, it smooths the effect of using only the positions of the sensors, which causes the resulting positions to always lie inside the hull of the polygon formed by the positions of the sensors.

Figure 11: Monthly centroid heatmaps: (a) June; (b) August; (c) September.

This experiment is based on the concept of the user's centroid. This centroid is the average position of a certain user over a certain period of time. Representing these points on the map reveals hints of the users' behavior over the studied time slice, based on the number of people in each area.
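A minimal sketch of the user's centroid for one time slice is given below, assuming the rows have already been restricted to the period of interest. It also applies the substitution described earlier: when an access point position is available, it is used instead of the sensor position. The row layout and coordinates are invented for the example.

```python
from collections import defaultdict

def user_centroids(rows):
    """rows: iterable of (user, sensor_pos, ap_pos); positions are (lat, lng).

    ap_pos is None when the user is not connected or the access point
    position is unknown. Returns {user: (mean_lat, mean_lng)}.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # user -> [lat sum, lng sum, count]
    for user, sensor_pos, ap_pos in rows:
        lat, lng = ap_pos if ap_pos is not None else sensor_pos
        acc = sums[user]
        acc[0] += lat
        acc[1] += lng
        acc[2] += 1
    return {user: (s[0] / s[2], s[1] / s[2]) for user, s in sums.items()}

rows = [
    ("u1", (40.0, -4.0), None),           # falls back to the sensor position
    ("u1", (40.0, -4.0), (42.0, -2.0)),   # access point position preferred
    ("u2", (10.0, 20.0), None),
]
centroids = user_centroids(rows)
```

Plotting one such point per user for a month or an hour gives the kind of heatmap input used in the following figures.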

Figure 11 shows heatmaps for three different months. Figure 11(a) corresponds to June, a month belonging to the second semester of the academic year and the month when the final exams take place. Figure 11(b) corresponds to August, during the summer holidays, when there are no classes or exams and most of the professors, researchers, and staff are on vacation. Figure 11(c) represents September, the start of the semester.

Figure 11(b) confirms one of the facts extracted in the temporal analysis: on holidays, the number of users falls and most of the users present in the school keep studying at the library (the warmest point is located over the library and is much smaller than in other months).

Both Figures 11(a) and 11(c) show that building A is the area with the highest concentration of users. Even so, there are differences between the two figures. In Figure 11(a), the warmest area on the map is over the library. This is explained by the final exam period in June. The same observation can be repeated for other periods of time to discover when the students have exams. In Figure 11(c), the hot spots are over the classrooms. September is the beginning of the school year, when class attendance is at its highest. Thus, this observation is an indicator of the level of students' class attendance.

The same type of analysis can be done with shorter time frames to obtain more detailed behaviors. Figure 12 shows the centroid heatmaps grouped by hour on September 2nd.

A detailed hourly analysis like this allows observing the users' movement throughout the day. The first row of maps, Figures 12(a), 12(b), 12(c), and 12(d), shows the evolution of the users' centroids at lunch time, between 12:00 and 16:00. In this transition, it can be seen that at the beginning there are hot spots over the classrooms and the library. On the next map, those centroids move to the cantina (see Figure 2), and in the last map they return to their original positions. The second row of maps in Figure 12 presents the start and the end of the activity time in the school. In Figures 12(e) and 12(f), it can be observed how the first users of the day go directly to classes. By contrast, Figures 12(g) and 12(h) reveal that users tend to be at the library at the end of the day.

Finally, Table 5 presents the number of different users detected by each sensor throughout the year. Recalling that the total number of users obtained in Section 4.1 is 4653, the data in the table reveal that, over the year, most users have been seen by every sensor at some point. The two exceptions are the sensor of building C and the sensor of the laboratories of building B. These sensors cover the professors' offices and research laboratories, so it is unusual for students to stay in those areas.

Table 5: Sensor annual statistics.

Sensor        Number of rows   Number of users
Library       14,166,793       4,674
Entr D        6,818,165        4,621
Std Tables    5,625,974        4,670
Entr A        4,736,237        4,620
Entr A Sec    2,352,019        4,579
Entr B Sec    2,258,618        4,530
Entr B Lab    2,229,560        3,630
Entr C        1,469,026        3,093
Entr B        1,248,462        4,127

Figure 12: Hourly centroid heatmaps of September 2nd: (a) 12:00 to 13:00; (b) 13:00 to 14:00; (c) 14:00 to 15:00; (d) 15:00 to 16:00; (e) 8:00 to 9:00; (f) 9:00 to 10:00; (g) 19:00 to 20:00; (h) 20:00 to 21:00.

4.4. Behavior Analysis. The third set of experiments we performed deals with the behaviors that each user follows throughout a single day. To obtain them, we grouped the data using a user-day key. For each key, a vector of 24 positions, one per hour, is created. In each position of this vector, we record the sensor that detected this user for most of the time during that hour. This vector therefore represents the route that the user followed throughout that day, hour by hour. Once the behavior vectors are obtained, the day and user information is discarded in order to count the most repeated behavior vectors. The dimensionality of these vectors makes the number of possible behaviors huge, theoretically $24^{10} \approx 6 \times 10^{13}$, or $14^{10} \approx 289\mathrm{K}$ million using only the activity hours, but it is known that users behave similarly, so a much lower number of behaviors can be expected, even though it will still be high.
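The construction of the behavior vectors can be sketched in plain Python as follows. The row layout (one row per detected minute, with the hour already extracted) is an assumption for the example, and empty hours are represented with None.

```python
from collections import Counter, defaultdict

def behavior_vectors(rows):
    """rows: iterable of (user, day, hour, sensor), one per detected minute.

    Returns {(user, day): tuple of 24 slots}, each slot holding the sensor
    that detected the user most often in that hour, or None.
    """
    per_hour = defaultdict(Counter)  # (user, day, hour) -> sensor counts
    for user, day, hour, sensor in rows:
        per_hour[(user, day, hour)][sensor] += 1
    vectors = defaultdict(lambda: [None] * 24)
    for (user, day, hour), counts in per_hour.items():
        vectors[(user, day)][hour] = counts.most_common(1)[0][0]
    return {key: tuple(vec) for key, vec in vectors.items()}

def most_repeated(vectors, n=20):
    """Discard the user-day keys and count identical behavior vectors."""
    return Counter(vectors.values()).most_common(n)

rows = [
    ("u1", "d1", 9, "Library"),
    ("u1", "d1", 9, "Library"),
    ("u1", "d1", 9, "Entr D"),   # minority sensor in that hour
    ("u2", "d1", 9, "Library"),
]
vectors = behavior_vectors(rows)
top = most_repeated(vectors)
```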

Table 6 presents the 20 most repeated behaviors throughout the year. A total of 285K behaviors are detected, of which 139K are unique. The 500 most frequent behaviors account for 25% of the total. Table 6 shows only the part corresponding to the activity hours of the school, which, as observed in the temporal analysis, span from 8:00 am to 10:00 pm. As can be seen, the majority of the most repeated behaviors are stays of between two and five hours in the library. It is necessary to expand the analysis to the top 20 to observe the class attendance behaviors. It is easy to see that the behaviors usually fall either in the morning or in the afternoon, with lunchtime from 1:00 pm to 3:00 pm, which means that the majority of users come either in the morning or in the afternoon but do not spend the whole day at the school.

Table 6: Top 20 most repeated behaviors. Each behavior spans the 14 hourly slots from 8:00 to 9:00 up to 21:00 to 22:00; for each behavior, the sensor and the number of hourly slots in which it appears are listed.

Rank   Sensor   Occupied hourly slots
1      Lib      5
2      Lib      4
3      EntA     1
4      Lib      3
5      Lib      4
6      Lib      1
7      EntA     1
8      Lib      6
9      Std      14
10     EntA     1
11     EntA     1
12     Lib      5
13     Lib      2
14     Lib      2
15     EntA     1
16     EntA     2
17     EntA     1
18     Lib      2
19     SecA     4
20     EntA     1

Lib = Library; EntA = Entr A; SecA = Entr A Sec; Std = Std Tables.

Another quite obvious observation is that many of the behaviors obtained are very similar to each other (e.g., going to the library from 9:00 am to 2:00 pm or going to the library from 9:00 am to 1:00 pm). In order to reduce this redundancy, we performed a clustering procedure on them. Each behavior can be understood as a categorical vector of length 24, in which the categories correspond to the 9 possible sensors of the dataset plus the empty category. The chosen clustering algorithm is Proximus [70], due to its simplicity, efficiency, scalability, and the reliability of its results. The algorithm works with binary vectors and creates clusters based on the Hamming distance (the number of bits that differ between two binary vectors). A vector is chosen as the center of the cluster, and other vectors whose distance to the center is smaller than the maximum cluster radius are added to that cluster.
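The Hamming distance and the radius criterion can be illustrated with the short sketch below. It is not the Proximus algorithm itself, whose selection of cluster centers is more elaborate. It is a greedy simplification, assumed here for illustration only, in which the first vector that fits no existing cluster becomes a new center.

```python
def hamming(a, b):
    """Number of positions in which two equal-length binary vectors differ."""
    return sum(x != y for x, y in zip(a, b))

def radius_clusters(vectors, max_radius):
    """Greedy illustration of the radius criterion (not Proximus itself).

    A vector joins the first cluster whose center is closer than
    max_radius; otherwise it becomes the center of a new cluster.
    """
    clusters = []  # list of (center, members)
    for vec in vectors:
        for center, members in clusters:
            if hamming(center, vec) < max_radius:
                members.append(vec)
                break
        else:
            clusters.append((vec, [vec]))
    return clusters

vectors = [(1, 1, 0, 0), (1, 1, 1, 0), (0, 0, 1, 1)]
clusters = radius_clusters(vectors, max_radius=2)
```

In this toy input, the second vector differs from the first in one bit and joins its cluster, while the third differs in four bits and starts a new one.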

The behaviors are expressed as categorical vectors, so it is necessary to transform them into binary vectors before they can be used in Proximus. The transformation shown in Equation (1) is proposed. This transformation is simple and also reversible, which allows the original behavior vectors to be recovered after clustering.

$$
\begin{aligned}
B &= \{h_0, \ldots, h_{23}\}, \quad h_i \in \{0, a_1, \ldots, a_9\} \\
\vec{B} &= \{\delta_{0,a_1}, \ldots, \delta_{0,a_9}, \ldots, \delta_{23,a_1}, \ldots, \delta_{23,a_9}\} \\
\delta_{i,a_j} &=
\begin{cases}
0 & \text{if } h_i \neq a_j \\
1 & \text{if } h_i = a_j
\end{cases}
\end{aligned}
\tag{1}
$$

Where $a_1, \ldots, a_9$ are the nine sensors, $B$ is the behavior vector described above, and $\vec{B}$ is the transformed vector of length $24 \times 9$, made up of subgroups of 9 samples, one subgroup per hour. Every sample in a subgroup is 0 except for the one at the index of the sensor active during that hour, which is set to 1.
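The transformation of Equation (1) and its inverse can be sketched as follows. The sensor labels `a1` to `a9` and the use of `0` for the empty category follow the notation above; the function names are hypothetical.

```python
SENSORS = [f"a{j}" for j in range(1, 10)]  # placeholder labels for the nine sensors
EMPTY = 0                                   # empty category (no sensor seen that hour)
HOURS = 24


def encode(behavior):
    """Map a 24-long categorical vector to a 24*9-long binary vector."""
    if len(behavior) != HOURS:
        raise ValueError("a behavior must have one entry per hour")
    bits = []
    for h in behavior:
        # One subgroup of 9 samples per hour: 1 only at the active sensor.
        bits.extend(1 if h == a else 0 for a in SENSORS)
    return bits


def decode(bits):
    """Recover the categorical vector from its binary encoding."""
    n = len(SENSORS)
    if len(bits) != HOURS * n:
        raise ValueError("unexpected binary vector length")
    behavior = []
    for i in range(HOURS):
        block = bits[i * n:(i + 1) * n]
        behavior.append(SENSORS[block.index(1)] if 1 in block else EMPTY)
    return behavior
```

Because an hour with no detection maps to an all-zero subgroup, decoding is unambiguous, which is what makes the transformation reversible.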

Table 7 shows the results of applying the Proximus clustering to the behaviors found. The first column is the center of the cluster, the second the number of behaviors that are grouped in that cluster (its components), and the third the number of recorded behaviors that fall within that cluster. The results are presented ordered by the number of behaviors included in the cluster. In total, 45707 clusters are obtained. The first 300 group 50% of the behaviors recorded throughout the year. The most important cluster behaviors in Table 7 can be understood this way:


Table 7: Behavior clusters.

| Cluster | Cluster center | Cluster components | Behaviors count |
|---|---|---|---|
| 1 | Entr. A Sec. from 9:00 to 14:00 | 415 | 4963 |
| 2 | Entr. B Sec. from 10:00 to 13:00 | 458 | 3756 |
| 3 | Entr. B Sec. from 15:00 to 19:00 | 471 | 2924 |
| 4 | Library from 9:00 to 13:00 | 256 | 2870 |
| 5 | Library from 10:00 to 12:00 | 176 | 2396 |
| 6 | Library from 18:00 to 21:00 | 244 | 2231 |
| 7 | Entr. A 13:00 | 787 | 1979 |
| 8 | Library 12:00 | 563 | 1894 |
| 9 | Library from 18:00 to 19:00 and from 20:00 to 21:00 | 175 | 1853 |
| 10 | Entr. B Lab 11:00 to 20:00 | 568 | 1831 |

(i) Clusters 1, 2, and 3: students attending classes. These sensors cover the main classrooms in the school, and the intervals match the class schedule.

(ii) Clusters 4, 5, 6, and 9: students at the library.

(iii) Cluster 7: users who are usually outside the sensors' coverage areas, arriving or leaving.

(iv) Cluster 10: laboratory equipment and professors. This sensor covers some of the professors' offices and laboratories, which usually have some laboratory equipment connected during work hours.

5. Validation of Occupancy Estimation Based on Wi-Fi Tracking

Although, as already pointed out in Section 4.3, it was verified at the very first stage of the deployment that the Wi-Fi sensors properly locate and track well-known MAC addresses, it was still necessary to validate the accuracy of the system for estimating occupancy (as it can be distorted by the aforementioned fact that a single user can carry several devices connected to Wi-Fi networks). As no ground-truth data were available for the full set of buildings and only some data were found for the library, we centered our validation efforts on comparing our data with the available ones: if we can trust our results in that area, then we can extend our trust to the rest of the areas, for which no well-known data are available.

The library of ETSIT has 408 study seats, and it offers a web service to check the number of available seats at a given moment [71]. This system is based on two sources of information: a person who counts the empty seats every opening hour from Friday to Sunday, and a video camera located at the main entrance of the library that counts the number of people entering or leaving at 15-minute intervals. The human system provides a ground truth about the number of occupied seats, but this measure is very different from the actual number of people in the library, since a common situation, especially during examination terms, is that students leave their study material at a seat to reserve it while they are not in the library. This situation is a problem for the library staff, and therefore they installed the video camera system to count the students' entrances and exits.

This system generates an estimation of the number of people in the library by adding to the previous measure the number of people detected entering and subtracting the number of people detected leaving. The system is not perfect and, in most cases, it carries an accumulated error that keeps growing in the estimation of the number of people in the library. The total error can be calculated clearly at the end of the day, when the library closes and the number of people inside is supposed to be zero. In summer, there is a situation that aggravates this error and, consequently, the measure achieved by this system: due to the rise in temperature, the back door of the library is opened to improve ventilation, which allows students to exit through it, although they must continue entering through the main door. The camera does not count students leaving through this back door.
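The running estimate and its end-of-day residual can be sketched as follows. The function names and the sample counts are hypothetical; only the add-entries, subtract-exits rule and the zero-at-closing check come from the description above.

```python
from itertools import accumulate


def running_occupancy(entries, exits, initial=0):
    """Occupancy after each interval: previous value + entries - exits."""
    if len(entries) != len(exits):
        raise ValueError("entries and exits must cover the same intervals")
    deltas = [e - x for e, x in zip(entries, exits)]
    # accumulate(..., initial=...) yields the starting value first; drop it.
    return list(accumulate(deltas, initial=initial))[1:]


def closing_error(occupancy, expected_at_close=0):
    """Residual count when the library closes (it should be empty)."""
    return occupancy[-1] - expected_at_close


# Illustrative 15-minute counts (made up): some exits go uncounted,
# for example through a door the camera does not watch.
entries = [10, 5, 0, 0]
exits = [0, 3, 4, 2]
occupancy = running_occupancy(entries, exits)
print(occupancy, closing_error(occupancy))
```

Since every missed exit stays in the running sum, the residual can only be corrected by resetting the count, which is why the error is most visible at closing time.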

The library staff provided us with the data collected by the two systems (human and camera) between June 5th and 30th. These measurements can be compared with those obtained by the Wi-Fi sensor installed at the library to validate them. For this test, all the data collected by the sensor are used, without applying the previously explained filtering of the MACs of sporadic users.

Figure 13 shows the data collected by the three systems on Sunday, June 5th. This is the first day with data from the three sources. Other days on which these three sources are present show the same trends. It is clear that there is a divergence between the human system observation and the rest of the data. The graph of the human system shows that the number of occupied seats increases in the first hours up to the maximum and remains steady until the end of the day, without being affected by the behavior of the users at lunchtime. However, this effect is reflected in the camera system and Wi-Fi tracking measurements. We observed that, in general, the number of people accounted for by the camera system is below the Wi-Fi tracking system measure, although the proportion is maintained over time. Finally, the figure shows the cumulative error effect of the camera system, which at the end of the day still reports 65 people in the library.

To better study the relation between the camera system and the Wi-Fi tracking system, we generated a detailed visualization that allows us to observe the data of the whole month in a single figure. Figure 14 is composed of 3 subfigures; each of them is a matrix of colored cells in which each row represents a full


Figure 13: Measure of library occupation using Wi-Fi, camera, and human systems. (Horizontal axis: hour, 10:00 to 22:00; vertical axis: number of users, 0 to 400; series: Camera, Human, Wifi.)

Figure 14: Measures every 15 minutes of library occupation during June: (a) Wi-Fi records; (b) camera records; (c) difference between Wi-Fi and camera. (Horizontal axis of each panel: hour of the day, 00 to 23.)

day and the columns are divisions of 15 minutes. The color intensity is proportional to the number of users measured in each interval, being more intense when more users are detected. Figure 14(a) shows the measurements of the Wi-Fi sensor. The behavior is the one observed in previous sections of this document. Figure 14(b) shows the measurements of the camera system. During the first half of the day, the same trend as in Figure 14(a) can be observed, but in the afternoon and at night the accumulated error becomes noticeable. This error reaches a maximum of 378 people, with an average of 139 people, at the end of the day, while the Wi-Fi tracking system registers a maximum of 10 and an average of 7 at the end of the day (these are machines that are still turned on when the library is empty of people). Finally, Figure 14(c) shows the difference between the two previous ones, using green when the error is positive (the Wi-Fi tracking system registers a higher value than the camera system) and red otherwise. This figure validates the measurements obtained by the Wi-Fi tracking system, since the difference with the measure of the camera system in the first half of the day is small (an average of 27 people, counting the data from 8:00 to 16:00), whereas in the afternoon the error accumulated by the camera system makes the average difference grow to 83 people between 16:00 and 22:00.
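The per-window comparison between both systems can be sketched as follows. It is assumed here that the reported averages are means of the absolute difference per 15-minute sample; the text does not state whether the difference is signed. The function name and the sample values are hypothetical.

```python
from datetime import time
from statistics import mean


def mean_abs_difference(samples, start, end):
    """Mean |wifi - camera| over samples whose time falls in [start, end).

    samples: iterable of (time_of_day, wifi_count, camera_count) tuples.
    """
    diffs = [abs(w - c) for t, w, c in samples if start <= t < end]
    if not diffs:
        raise ValueError("no samples in the requested window")
    return mean(diffs)


# Made-up samples, only to show the call pattern.
samples = [
    (time(9, 0), 100, 90),
    (time(12, 0), 200, 180),
    (time(17, 0), 150, 220),
    (time(20, 0), 50, 150),
]
morning = mean_abs_difference(samples, time(8, 0), time(16, 0))
afternoon = mean_abs_difference(samples, time(16, 0), time(22, 0))
```

Splitting the day at 16:00 separates the period in which both systems agree from the period dominated by the camera's accumulated error.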

The Wi-Fi sensor of the library was not placed for the specific purpose of counting the number of people in the library and, in consequence, its coverage area can include devices that are outside the library. This explains why, in some cases, it registers a measurement greater than that of the camera system. In any case, the Wi-Fi sensor, with its limitations, registers a measure very similar to that of the camera-based people counting system, while avoiding the cumulative error of that system and being much cheaper (tens of euros against thousands of euros) and less intrusive than a video camera. Currently, since the camera system is already installed, the Wi-Fi sensor measures can be used as a maximum, which would limit the error made by this system. In summary, the usage of a Wi-Fi sensor to count people in the library provides accurate results despite the potential errors induced by the sensing period, collisions, the ratio between users and MACs seen, and the data processing performed. These results suggest that the results of our analyses for the rest of the areas on campus can also be trusted.

6. Discussion and Conclusions

In the execution of this case study, we have learned some lessons about the limitations we faced that are worth taking into account for similar systems and analyses. First, there is the key issue of the sensor deployment: it is not just a question of density (number of sensors per unit of surface), but of adapting its design to the topology of the place to be monitored. So, for example, it is crucial to have sensors not only at the entries and exits of buildings but also inside them, as in the considered deployment, where there are sensors placed at the entrances and exits of the buildings but also in other especially relevant places, such as the library or the work-in-group area.

A complete coverage of indoor devices, without collisions and for the complete timespan of the stay indoors, would require a large deployment with many sensors able to cover all corners. This is not feasible in this context; instead, we relied on 9 sensors, some at gates and others in large rooms, which proved to be enough to check whether a device is within reach of any of them. But some assumptions had to be made: we considered as valid behaviors only those that spent a significant amount of time once they had been seen (to remove transient behaviors and those of people walking out of the buildings); also, we discarded behaviors that did not appear enough times in the yearly timespan. On the other hand, the topology of these buildings did not impose severe restrictions on the sensors' coverage. In fact, we found several devices seen by different sensors at once. In summary, we can conclude that the finer the spatial granularity (more sensors), the larger the set of different behaviors, so taking raw data from the sensors would lead to an explosion of states that would render meaningless results.
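The two validity assumptions mentioned above (a minimum time seen and a minimum number of repetitions over the year) can be sketched as a simple filter. The thresholds `min_hours` and `min_occurrences` and the function name are hypothetical; the actual values used in the study are not given in this section.

```python
from collections import Counter


def filter_behaviors(daily_behaviors, min_hours=2, min_occurrences=5):
    """Keep behaviors that last long enough and repeat often enough.

    daily_behaviors: iterable of tuples with one sensor label (or None
    for the empty category) per hourly slot.
    """
    # Drop transient behaviors seen in too few hourly slots.
    long_enough = [
        b for b in daily_behaviors
        if sum(slot is not None for slot in b) >= min_hours
    ]
    # Drop behaviors that do not repeat enough over the observation period.
    counts = Counter(long_enough)
    return {b: n for b, n in counts.items() if n >= min_occurrences}
```

Both thresholds trade coverage for robustness: stricter values discard more passers-by and one-off patterns, at the cost of ignoring short but genuine visits.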

The gathered data is another key issue: some works in the literature obtained signal strength measurements every few seconds, allowing them to identify indoor trajectories. This was not our case and, in fact, this proved to be very limiting. Thus, the precision of our analysis is limited by the regions covered by each sensor, which prevents us from analyzing any kind of movement within regions. Nevertheless, as has been seen, relevant results can still be obtained without the increase in energy consumption and the risk of flooding the school communications network that the other approach would entail.

As regards the analysis of data to identify users' behaviors, a long observation period is a must. Obtaining data for a complete work cycle (in this case, a full school year) helps in discovering common behaviors that happen in a university. Using long observation times can also help in detecting erroneous or atypical operation of the sensors, as happened in the first months of the year in this case. Getting even larger observation datasets would reinforce the behaviors we have found (e.g., seasonality), but the chance of discovering new ones is negligible, because the expected behaviors on the campus will be periodic with the school year.

The usage of big data platforms for analysis, while not mandatory, eases the management of large datasets and the execution of iterative studies on the data. The preprocessing work carried out allowed us to clean and filter our data. For example, some discovered behaviors are too regular and extended over time, which may correspond to servers or machines, which we could then filter out.

The temporal analysis has revealed some behaviors that are maintained throughout the year and others that occur occasionally. The work has focused on the study of behaviors that are repeated and maintained throughout the year. A closer view, such as the analysis per hour, shows the hours of activity, nocturnal patterns, or movements at mealtime. In the daily view, the difference in activity between workdays and weekends or holidays is clearly seen. Visualization has been a key technique in order to detect and understand these temporal patterns.

The spatial analysis revealed that a collision problem occurs in user detection, but its effect was mitigated by the use of innovative algorithms and techniques, such as the calculation of centroids and the combination of sensor information with external sources of information (i.e., semantic trajectories), so that a much more precise positioning of the users is achieved than with the exclusive use of the sensors. The visualization of centroids fostered the visual understanding of complex data, such as the position of each user over a month, and the superposition of the centroids on a heatmap revealed the movements of groups of people and the occupation of the different zones.


Furthermore, we have identified dominant users' behaviors as the most repeated behaviors registered by the sensors throughout the year. The number of behaviors found was huge, but the application of the Proximus clustering algorithm reduced this number to a manageable amount. Then, the study of the obtained clusters revealed that the most frequent behaviors coincide with what can be expected from a college building: researchers working in laboratories and students attending classes or studying in the library.

Although some of the discovered behaviors and patterns can be seen as common knowledge, it is worth stressing that they do bring value, since they represent numerical evidence that supports decision making (e.g., someone can think that the Wi-Fi access in a given area does not work properly because it is always overcrowded, but numerical evidence is needed to appropriately justify the investment of increasing the number of APs of the corporate WLAN in that given area to improve the service). In addition, such well-known patterns, when obtained automatically by processing the available data, become baseline models which can be used to detect anomalies or atypical situations, as is common practice in unsupervised machine learning.

Lastly, we have validated the accuracy of using Wi-Fi tracking for occupancy estimation, comparing it with the library staff's manual counting (considered as ground truth) and with a video camera system installed at the library's main entrance. As a main finding, Wi-Fi tracking has proved to be more accurate than the video camera system, in addition to being much cheaper. As a matter of fact, the library staff currently prefers the occupancy estimation based on Wi-Fi tracking to the one based on the video camera system. Nevertheless, the accuracy of the Wi-Fi tracking system can be further improved by correlating MAC addresses detected in the same places over sufficiently long periods of time and counting them only once, thus mitigating the issue related to the fact that a single person can carry several devices connected to Wi-Fi networks.
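One possible way to approach the proposed MAC correlation is sketched below. This is not a method evaluated in this work: the Jaccard similarity measure, the `min_similarity` threshold, and the function names are all assumptions chosen for illustration.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Share of sightings that two devices have in common."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def estimate_people(sightings, min_similarity=0.8):
    """Count groups of devices that are seen together almost always.

    sightings: dict mapping a device identifier to the set of
    (sensor, time_slot) pairs in which it was detected.
    """
    parent = {mac: mac for mac in sightings}

    def find(m):
        # Union-find lookup with path halving.
        while parent[m] != m:
            parent[m] = parent[parent[m]]
            m = parent[m]
        return m

    for m1, m2 in combinations(sightings, 2):
        if jaccard(sightings[m1], sightings[m2]) >= min_similarity:
            parent[find(m1)] = find(m2)  # treat both devices as one person
    return len({find(m) for m in sightings})
```

A high threshold and a long observation window would be needed in practice, since two students who always sit together would otherwise be merged into a single person.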

Besides this, the Wi-Fi tracking system is currently used by the library staff to perform more sophisticated studies, such as figuring out the percentage of students from the different schools of the university who come to study at the library of the Telecommunications Engineering School during the weekend. Figure 15 shows the results of such an analysis, which represents a token of how this kind of IoT system can help in solving real-life problems and improving the operation of already running services.

To summarize, we have studied a one-year Wi-Fi tracking dataset obtained from a reduced set of low-cost sensors with limited capabilities, deployed on an actual university campus that receives around 4000 people every day. We have processed the data in order to identify traces of Wi-Fi-enabled mobile devices, which are identified as people moving in the campus buildings, and then we have extracted people's stays, movements, and common behaviors. The obtained results represent numerical evidence that illustrates how a low-cost Wi-Fi tracking system can be used in real-life conditions to improve or optimize the operation of the monitored premises. These results can allow appropriately dimensioning the WLAN infrastructure or the canteen personnel, or detecting

Figure 15: Users who only go to the library of the Telecommunication School on weekends, classified by their school. (Legend: CIVIL, INDUS, NAVAL, SPORT, ARQUI, FORES, EDIF, AERO, TELECOM. Percentage labels, not matched to legend entries: 13.96, 18.06, 2.64, 0.90, 9.51, 3.82, 4.58, 17.71, 28.82.)

anomalous situations in real time. Furthermore, the data from the people flow monitoring system are currently being used together with the data from the environmental monitoring system to try to reduce the environmental footprint of the school [72]. In addition, the Wi-Fi tracking system is actually used by the library staff in their day-to-day activity, which illustrates the value that this kind of IoT infrastructure can bring to real-life problems and services.

Data Availability

The dataset with the Wi-Fi sensor records used to support the findings of this study is available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work has been partly funded by Universidad Politécnica de Madrid through the project RES2+U (Responsables, Sostenibles y Universitarios) (http://blogs.upm.es/res2masu). The work performed by José M. Navarro was funded by the Ministerio de Educación, Cultura y Deporte de España under Grant no. FPU 1403209.

References

[1] "2018 Revision of World Urbanization Prospects - Multimedia Library - United Nations Department of Economic and Social Affairs," https://www.un.org/development/desa/publications/2018-revision-of-world-urbanization-prospects.html.

[2] M. Pallot, "Engaging Users into Research and Innovation: The Living Lab Approach as a User Centred Open Innovation Ecosystem," https://web.archive.org/web/20120509081658/http://www.cwe-projects.eu/pub/bscw.cgi/1760838?id=7154041760838.

[3] E. Almirall and J. Wareham, "Living Labs: arbiters of mid- and ground-level innovation," Technology Analysis & Strategic Management, vol. 23, no. 1, pp. 87–102, 2011.

[4] "TU Delft Green office," https://www.tudelft.nl/sustainability.

[5] "UBC Campus as a living laboratory," https://sustain.ubc.ca/our-commitment/campus-living-lab.

[6] "Harvard Sustainability," https://green.harvard.edu.

[7] "UPM City of the Future initiative," http://blogs.upm.es/cityofthefuture-upm/en/initiative.

[8] M. Alvarez-Campana, G. Lopez, E. Vazquez, V. A. Villagra, and J. Berrocal, "Smart CEI Moncloa: An IoT-based platform for people flow and environmental monitoring on a Smart University Campus," Sensors, vol. 17, no. 12, 2017.

[9] M. B. Kjærgaard, H. Blunck, T. Godsk, T. Toftkjær, D. L. Christensen, and K. Grønbæk, "Indoor positioning using GPS revisited," Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Preface, vol. 6030, pp. 38–56, 2010.

[10] A. Belmonte-Hernandez, G. Hernandez-Penaloza, F. Alvarez, and G. Conti, "Adaptive Fingerprinting in Multi-Sensor Fusion for Accurate Indoor Tracking," IEEE Sensors Journal, vol. 17, no. 15, pp. 4983–4998, 2017.

[11] M. S. Kristoffersen, J. V. Dueholm, R. Gade, and T. B. Moeslund, "Pedestrian counting with occlusion handling using stereo thermal cameras," Sensors, vol. 16, no. 1, 2016.

[12] L. Zheng, X. Ruan, Y. Chen, and M. Huang, "Shadow removal for pedestrian detection and tracking in indoor environments," Multimedia Tools and Applications, vol. 76, no. 18, pp. 18321–18337, 2017.

[13] T. Li, H. Chang, M. Wang, B. Ni, R. Hong, and S. Yan, "Crowded scene analysis: a survey," IEEE Transactions on Circuits and Systems for Video Technology, vol. 25, no. 3, pp. 367–386, 2015.

[14] F. Adib and D. Katabi, "See through walls with WiFi," in Proceedings of the Annual Conference of the ACM Special Interest Group on Data Communication on the Applications, Technologies, Architectures, and Protocols for Computer Communication, ACM SIGCOMM 2013, pp. 75–86, China, August 2013.

[15] Z. Kabelac, D. Katabi, and R. C. Miller, "3D Tracking via Body Radio Reflections," in Proceedings of the 11th USENIX Conference on Networked Systems Design and Implementation, pp. 317–329, 2014.

[16] F. Adib, Z. Kabelac, and D. Katabi, "Multi-person localization via RF body reflections," in Proceedings of the 12th USENIX Symposium on Networked Systems Design and Implementation, NSDI 2015, pp. 279–292, USA, May 2015.

[17] Z.-A. Deng, G. Wang, D. Qin, Z. Na, Y. Cui, and J. Chen, "Continuous indoor positioning fusing WiFi, smartphone sensors and landmarks," Sensors, vol. 16, no. 9, 2016.

[18] R. A. Becker, R. Caceres, K. Hanson et al., "A tale of one city: Using cellular network data for urban planning," IEEE Pervasive Computing, vol. 10, no. 4, pp. 18–26, 2011.

[19] J. E. Mallah, F. Carrino, O. A. Khaled, and E. Mugellini, "Crowd monitoring: critical situations prevention using smartphones and group detection," Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Preface, vol. 9189, pp. 496–505, 2015.

[20] J. Kuang, X. Niu, and X. Chen, "Robust Pedestrian Dead Reckoning Based on MEMS-IMU for Smartphones," Sensors, vol. 18, no. 5, p. 1391, 2018.

[21] A. Kurkcu and K. Ozbay, "Estimating Pedestrian Densities, Wait Times, and Flows with Wi-Fi and Bluetooth Sensors," Transportation Research Record, vol. 2644, no. 1, pp. 72–82, 2017.

[22] A. Kotanen, M. Hannikainen, H. Leppakoski, and T. Hamalainen, "Positioning with IEEE 802.11b wireless LAN," in Proceedings of the 14th IEEE 2003 International Symposium on Personal, Indoor and Mobile Radio Communications, vol. 3, pp. 2218–2222, Beijing, China, 2003.

[23] W. Ho, A. Smailagic, D. P. Siewiorek, and C. Faloutsos, "An adaptive two-phase approach to WiFi location sensing," in Proceedings of the 4th Annual IEEE International Conference on Pervasive Computing and Communications Workshops, PerCom Workshops 2006, pp. 452–456, Italy, March 2006.

[24] F. Evennou and F. Marx, "Advanced integration of WiFi and inertial navigation systems for indoor mobile positioning," EURASIP Journal on Applied Signal Processing, vol. 2006, Article ID 86706, 11 pages, 2006.

[25] J. Rekimoto, T. Miyaki, and T. Ishizawa, "LifeTag: WiFi-based continuous location logging for life pattern analysis," LNCS, vol. 4718, pp. 35–49, 2007.

[26] J. A. Besada, A. M. Bernardos, P. Tarrío, and J. R. Casar, "Analysis of tracking methods for wireless indoor localization," in Proceedings of the 2nd International Symposium on Wireless Pervasive Computing (ISWPC '07), pp. 492–497, February 2007.

[27] O. Woodman and R. Harle, "Pedestrian localisation for indoor environments," in Proceedings of the 10th International Conference on Ubiquitous Computing (UbiComp '08), pp. 114–123, Seoul, Republic of Korea, September 2008.

[28] F. Aloul, A. Sagahyroon, A. Al-Shami, I. Al-Midfa, and R. Moutassem, "Using mobiles for on campus location tracking," in Proceedings of the 7th International Conference on Advances in Mobile Computing and Multimedia, MoMM2009, pp. 231–235, Malaysia, December 2009.

[29] S. Woo, S. Jeong, E. Mok et al., "Application of WiFi-based indoor positioning system for labor tracking at construction sites: A case study in Guangzhou MTR," Automation in Construction, vol. 20, no. 1, pp. 3–13, 2011.

[30] N. Le Dortz, F. Gain, and P. Zetterberg, "WiFi fingerprint indoor positioning system using probability distribution comparison," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '12), pp. 2301–2304, March 2012.

[31] Z. Chen, H. Zou, H. Jiang, Q. Zhu, Y. C. Soh, and L. Xie, "Fusion of WiFi, smartphone sensors and landmarks using the Kalman filter for indoor localization," Sensors, vol. 15, no. 1, pp. 715–732, 2015.

[32] Y. Shu, C. Bo, G. Shen, C. Zhao, L. Li, and F. Zhao, "Magicol: indoor localization using pervasive magnetic field and opportunistic WiFi sensing," IEEE Journal on Selected Areas in Communications, vol. 33, no. 7, pp. 1443–1457, 2015.

[33] A. Danalet, B. Farooq, and M. Bierlaire, "A Bayesian approach to detect pedestrian destination-sequences from WiFi signatures," Transportation Research Part C: Emerging Technologies, vol. 44, pp. 146–170, 2014.


[34] O. Czogalla and S. Naumann, "Pedestrian indoor navigation for complex public facilities," in Proceedings of the 2016 International Conference on Indoor Positioning and Indoor Navigation, IPIN 2016, pp. 1–8, Spain, October 2016.

[35] L. Schauer, P. Marcus, and C. Linnhoff-Popien, "Towards feasible Wi-Fi based indoor tracking systems using probabilistic methods," in Proceedings of the 2016 International Conference on Indoor Positioning and Indoor Navigation, IPIN 2016, pp. 1–8, Spain, October 2016.

[36] F. Meneses and A. Moreira, "Large scale movement analysis from WiFi based location data," in Proceedings of the 2012 International Conference on Indoor Positioning and Indoor Navigation, IPIN 2012, Australia, November 2012.

[37] B. Bonne, A. Barzan, P. Quax, and W. Lamotte, "WiFiPi: involuntary tracking of visitors at mass events," in Proceedings of the IEEE 14th International Symposium on a World of Wireless, Mobile and Multimedia Networks (WoWMoM '13), pp. 1–6, Madrid, Spain, June 2013.

[38] M. B. Kjaergaard, M. Wirz, D. Roggen, and G. Troster, "Mobile sensing of pedestrian flocks in indoor environments using WiFi signals," in Proceedings of the 10th IEEE International Conference on Pervasive Computing and Communications (PerCom '12), pp. 95–102, Lugano, Switzerland, March 2012.

[39] S. Sendra, M. Garcia, C. Turro, and J. Lloret, "People mobility behaviour study in a university campus using WLANs," in Proceedings of the 3rd International Conference on Mobile Ubiquitous Computing, Systems, Services, and Technologies, UBICOMM 2009, pp. 124–129, Malta, October 2009.

[40] J. Scheuner, G. Mazlami, D. Schoni et al., "Probr - A Generic and Passive WiFi Tracking System," in Proceedings of the 41st IEEE Conference on Local Computer Networks, LCN 2016, pp. 495–502, UAE, November 2016.

[41] Y. Li, S. Williams, B. Moran, and A. Kealy, "Quantized RSS Based Wi-Fi Indoor Localization with Room Level Accuracy," in Proceedings of the International Global Navigation Satellite Systems 2018, 2018.

[42] A. Alessandrini, C. Gioia, F. Sermi, I. Sofos, D. Tarchi, and M. Vespe, "WiFi positioning and Big Data to monitor flows of people on a wide scale," in Proceedings of the 25th European Navigation Conference, ENC 2017, pp. 322–328, Switzerland, May 2017.

[43] L. Vu, K. Nahrstedt, S. Retika, and I. Gupta, "Joint bluetooth/wifi scanning framework for characterizing and leveraging people movement in university campus," in Proceedings of the 13th ACM International Conference on Modeling, Analysis, and Simulation of Wireless and Mobile Systems (MSWiM '10), pp. 257–265, October 2010.

[44] M. Garcia, S. Sendra, C. Turro, and J. Lloret, "User's Macro and Micro-mobility Study using WLANs in a University Campus," International Journal On Advances in Internet Technology, vol. 4, no. 1, pp. 37–46, 2011.

[45] Y. Xu, I. D. G. Groeneveld, R. Sulzer, E. Theocharous, O. T. Willems, and M. S. Tryfona, "Determine activity based on the classified identity of users by using Wi-Fi monitoring," Geomatics Synthesis Group Project Report, 2016.

[46] A. Danalet, L. Tinguely, M. D. Lapparent, and M. Bierlaire, "Location choice with longitudinal WiFi data," Journal of Choice Modelling, vol. 18, pp. 1–17, 2016.

[47] M. Zhou, K. Sui, M. Ma, Y. Zhao, D. Pei, and T. Moscibroda, "MobiCamp: A campus-wide testbed for studying mobile physical activities," in Proceedings of the 3rd International Workshop on Physical Analytics, WPA 2016, pp. 1–6, Singapore, 2016.

[48] A. Fernandez-Ares, A. M. Mora, M. G. Arenas et al., "Studying real traffic and mobility scenarios for a Smart City using a new monitoring and tracking system," Future Generation Computer Systems, vol. 76, pp. 163–179, 2017.

[49] A. E. Redondi and M. Cesana, "Building up knowledge through passive WiFi probes," Computer Communications, vol. 117, pp. 1–12, 2018.

[50] L. Huang, K. Matsuura, H. Yamanet, and K. Sezaki, "Enhancing wireless location privacy using silent period," in Proceedings of the 2005 IEEE Wireless Communications and Networking Conference, WCNC 2005: Broadband Wireless for the Masses - Ready for Take-off, pp. 1187–1192, USA, March 2005.

[51] J. Martin, T. Mayberry, C. Donahue et al., "A Study of MAC Address Randomization in Mobile Devices and When it Fails," Proceedings on Privacy Enhancing Technologies, vol. 2017, no. 4, pp. 365–383, 2017.

[52] M. Vanhoef, C. Matte, M. Cunche, L. S. Cardoso, and F. Piessens, "Why MAC address randomization is not enough: an analysis of Wi-Fi network discovery mechanisms," in Proceedings of the 11th ACM on Asia Conference on Computer and Communications Security, pp. 413–424, ACM, Xi'an, China, June 2016.

[53] A. J. Ruiz-Ruiz, H. Blunck, T. S. Prentow, A. Stisen, and M. B. Kjaergaard, "Analysis methods for extracting knowledge from large-scale WiFi monitoring to inform building facility planning," in Proceedings of the 2014 12th IEEE International Conference on Pervasive Computing and Communications, PerCom 2014, pp. 130–138, Hungary, March 2014.

[54] N. Nunes, M. Ribeiro, C. Prandi, and V. Nisi, "Beanstalk - A community based passive Wi-Fi tracking system for analysing tourism dynamics," in Proceedings of the 9th ACM SIGCHI Symposium on Engineering Interactive Computing Systems, EICS 2017, pp. 93–98, Portugal, June 2017.

[55] C. Parent, N. Pelekis, Y. Theodoridis et al., "Semantic trajectories modeling and analysis," ACM Computing Surveys, vol. 45, no. 4, pp. 1–32, 2013.

[56] C. Wei, "Mining of User Behavioral Features Based on Indoor Semantic Trajectories," Boletín Técnico, ISSN: 0376-723X, vol. 55, 2017.

[57] K. V. Long, D. Quang, and N. Klara, Lessons learned from bluetooth/wifi scanning deployment in university campus, Urbana, Illinois, USA, 2010.

[58] V. Radu and M. K. Marina, "HiMLoc: indoor smartphone localization via activity aware pedestrian dead reckoning with selective crowdsourced WiFi fingerprinting," in Proceedings of the International Conference on Indoor Positioning and Indoor Navigation (IPIN '13), pp. 1–10, IEEE, Montbeliard-Belfort, France, October 2013.

[59] F. Hong, Y. Zhang, Z. Zhang, M. Wei, Y. Feng, and Z. Guo, "WaP: Indoor localization and tracking using WiFi-Assisted Particle filter," in Proceedings of the 39th Annual IEEE Conference on Local Computer Networks, LCN 2014, pp. 210–217, Canada, September 2014.

[60] Y. Zeng, P. H. Pathak, and P. Mohapatra, "Analyzing shopper's behavior through WiFi signals," in Proceedings of the 2nd Workshop on Physical Analytics, WPA 2015, pp. 13–18, Italy, 2015.

[61] P. Sapiezynski, A. Stopczynski, R. Gatej, and S. Lehmann, "Tracking human mobility using WiFi signals," PLoS ONE, vol. 10, no. 7, p. e0130824, 2015.

[62] Z. Tian, Y. Jin, M. Zhou, Z. Wu, and Z. Li, "Wi-Fi/MARG Integration for Indoor Pedestrian Localization," Sensors, vol. 16, no. 12, p. 2100, 2016.


[63] B. Zhou, Q. Li, Q. Mao, and W. Tu, "A Robust Crowdsourcing-Based Indoor Localization System," Sensors, vol. 17, no. 4, p. 864, 2017.

[64] "Raspberry Pi," https://www.raspberrypi.org.

[65] "TP-LINK Wi-Fi USB dongle datasheet," http://www.tp-link.com/us/products/details/cat-5520_TL-WN722N.html.

[66] "ISO/IEC 20922:2016 - Information technology – Message Queuing Telemetry Transport (MQTT) v3.1.1," https://www.iso.org/standard/69466.html.

[67] G. Piateski-Shapiro and W. Frawley, Knowledge Discovery in Databases, MIT Press, Cambridge, MA, USA, 1991.

[68] J. Andion Jimenez, J. M. Navarro Gonzalez, M. Alvarez-Campana Fernandez-Corredor, and J. C. Duenas Lopez, "A passive, non-intrusive, cheap method to identify behaviours and habits in the Campus," in Proceedings of the XIII Jornadas de Ingenieria Telematica - JITEL2017, vol. 40, no. 47, pp. 10–4995, September 2017.

[69] A. Mylinikov, "Geo project," https://www.mylnikov.org.

[70] K. Mehmet and G. Ananth, "PROXIMUS: A framework for analyzing very high dimensional discrete-attributed datasets," in Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '03, vol. 147, no. 156, pp. 147–156, USA, August 2003.

[71] "UPM Library occupation," http://ceiboard.dit.upm.es/smart-campus/biblio.

[72] C. A. R. Inarejos, A. Rodríguez, G. Lopez, and M. Alvarez-Campana, "Analisis de la huella de carbono de la ETSIT de la UPM y propuesta de mejora basada en datos de la plataforma IoT Smart CEI Moncloa," in Proceedings of the I Congreso Iberoamericano de Ciudades Inteligentes (ICSC-CITIES 2018), 2018.


Table 2: Continued.

| Ref. | Year | Scenario | Application | P/A | Own deployment |
|---|---|---|---|---|---|
| [63] | 2017 | Indoor | Crowdsourcing positioning based on Wi-Fi fingerprint | A | No. Institutional APs |
| [41] | 2018 | Indoor | Coarse positioning, room level precision, based on probabilistic Wi-Fi fingerprint. Usage of Hidden Markov chain models to analyze user movement | P | No. Institutional APs |
| [35] | 2016 | Indoor | Trajectory analysis based on Hidden Markov chain models | P | No. Institutional APs |
| [42] | 2017 | Indoor | Estimate the number of participants and their space and time evolution in an area of about 167 hectares during 2016 Open Day of the European JRC | P | No. Institutional APs |
| [47] | 2016 | Indoor/Outdoor | Study mobility-related activities in a campus of 440 hectares based on the 2700 APs of the institutional network and additional opt-in smartphone application | A | No. Institutional APs |
| [53] | 2014 | Indoor | Classify users in a hospital (e.g., patient, doctor, administrative) by checking the number of hours and the positions of a user over time based on the institutional Wi-Fi network | P | No. Institutional APs |
| [38] | 2012 | Indoor | Identify flocks walking in a building and their behavior based on signal strength from the institutional Wi-Fi and using clustering techniques | P | No. Institutional APs |
| [45] | 2016 | Indoor | Analyze users’ occupation (based on Markov models) as well as regular and irregular hours in a university campus | P | No. Institutional APs |
| [40] | 2016 | Indoor | Analyze room utilization and people tracking providing heatmaps. Analyze device statistics | A | No. Institutional APs |
| [48] | 2017 | Indoor/Outdoor | Analyze people mobility monitoring and tracking in Smart Cities and traffic in a highway (e.g., driving behavior, traffic forecasting) | P | Yes. Raspberry Pi based |
| [49] | 2018 | Indoor/Outdoor | Provide user localization, user profiling, and device classification | A | Yes. Raspberry Pi based |
| [54] | 2005 | Indoor/Outdoor | Analyze Wi-Fi tracking records gathered during more than one year in Madeira to classify users as tourists or locals and identify touristic spots | P | Yes. Based on TP-Link MR3240v2 home router |
| [56] | 2017 | Indoor | Obtain semantic trajectories. Classify users based on their locations. Analyze the probability of a user going to a specific shop based on their history and propose the creation of a recommender based on the whole dataset | - | - |

occupation of different zones [39–42] and obtain behavior patterns [36, 43–49].

Wi-Fi tracking of a specific user is usually done by analyzing the collected records related to a specific MAC address, i.e., tracking users is equivalent to tracking their MAC address. This technique, in general, presents issues related to tracking people individually and to privacy. Although a user carrying several devices (e.g., smartphone, tablet, laptop) with the Wi-Fi interfaces on would at first be identified as several users, after a reasonable period of time this information can be correlated to fix the problem [50]. In the case of the work presented in this paper, the files associated with the measurements of different sensors of the same building are compared in order to avoid counting the same mobile/person several times (e.g., due to overlapping Wi-Fi cells). The resulting file allows for the analysis of people flow at a building level, identifying the total stay time or the frequency of the visited places. However, as explained below in the position analysis subsection, this problem appears in the collected data and it is necessary to perform a postprocessing of the data to deal with these collisions. Users may also use so-called MAC spoofing (i.e., replacing their actual MAC address by the MAC address of another device), which could be seen as a kind of attack. However, this may yield connectivity problems and it is a negligible behavior in the scenario considered in this paper.

Regarding privacy, several proposals to protect Wi-Fi communications by means of MAC address anonymization have arisen in recent years. First, these proposals appeared as apps for smartphones (allowing a kind of MAC spoofing), but recently smartphone manufacturers have started including these techniques in the latest versions of their Operating Systems (OS) (e.g., iOS, Android, and Windows). Such MAC address anonymization techniques aim to avoid using the actual MAC address until the device gets connected to the Wi-Fi network (i.e., they use a fake MAC address in their probe frames). The specific solution for this problem depends on the manufacturer and OS. In the case of iOS, the solution involves sending locally administered MAC addresses in the probe frames, randomly selecting the three least significant bytes of the MAC address. This can be easily detected just by inspecting the first byte of the MAC address. In the case of Android, some manufacturers have decided to use random MAC addresses in the probe frames from the MAC address ranges assigned to them by the IEEE. Nevertheless, even with these techniques in place, it is possible to end up obtaining the actual device information [51, 52]. In addition, MAC randomization is not actually a relevant problem in the scenario considered in this paper, since most of the devices are connected to the available Wi-Fi networks (e.g., Eduroam) and, to connect to a Wi-Fi network, devices must use their actual MAC addresses.
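The first-byte inspection mentioned above can be illustrated with a minimal sketch. It relies on the standard IEEE 802 convention that the second least significant bit of the first octet marks a locally administered address; the function name and the sample addresses are hypothetical and are not taken from the sensor firmware.

```python
def is_locally_administered(mac: str) -> bool:
    """Return True if the U/L bit (0x02) of the first octet is set."""
    # Accept both "aa:bb:..." and "aa-bb-..." notations.
    first_octet = int(mac.replace("-", ":").split(":")[0], 16)
    return bool(first_octet & 0x02)


# Hypothetical sample addresses, for illustration only.
for mac in ("da:a1:19:00:00:01", "b8:27:eb:12:34:56"):
    label = "locally administered" if is_locally_administered(mac) else "globally unique"
    print(mac, "->", label)
```

A filter built on such a check only catches randomization schemes that set this bit; addresses drawn from vendor-assigned ranges would need a separate rule.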

Wi-Fi tracking systems can be classified into two main groups: those that use the enterprise Wireless Local Area Network (WLAN) itself and those that use a dedicated low-cost passive Wi-Fi infrastructure, which is the case of the actual deployment considered in this paper. One of the main drawbacks of the systems that use the enterprise WLAN is that they can only track the users of such networks, whereas independent dedicated low-cost passive Wi-Fi infrastructures allow tracking the users connected to different Wi-Fi networks if they spatially coexist, or even users not connected to any network, if MAC randomization is not used.

As examples of works that use the enterprise WLAN or the existing infrastructure of access points (APs), [42] presents the 2016 Open Day of the European Joint Research Center (JRC), where 8000 people participated within an area of about 167 hectares, as a case study where the Wi-Fi infrastructure of the event was used to estimate the number of participants and their space and time evolution based on properly processed MAC addresses. Reference [47] presents MobiCamp, a large-scale testbed composed of around 2700 APs to study mobility-related activities, which combines user mobility traces based on Simple Network Management Protocol (SNMP) data with enriched data (e.g., gender, age) provided by an opt-in smartphone application.

Reference [53] represents yet another example of the analysis that can be made with this kind of information. Its scenario is a hospital and, by checking the number of hours and the positions of a user over time, the authors can classify that user according to a role, e.g., patient, doctor, administrative, etc. Reference [38] identifies flocks walking in a building and their behaviors by applying clustering techniques to the signal strength measurements provided by the institutional WLAN.

Reference [33] presents a campus scenario where, by using the university network infrastructure, a detailed profile of the user’s activity can be obtained. Users tracked are those logged into the university network, which provides extra information about the user, such as their role, gender, etc. Combining that information with a detailed map, which contains thousands of Points of Interest (POIs), the authors can extract an activity log that shows the different user’s activities with a minute precision. The main drawback of this work is that it totally depends on the users’ profiles database and the POIs map, and both are resources that are complicated to gain access to or to create. In reference [45], employing the university network infrastructure and the location of each AP in the university campus, each sensor record only stores the closest AP. With this simple information, the authors can create an activity profile similar to the one shown in [33]. By analyzing the basic results obtained, they are able to extract new information (e.g., a count of irregular hours) or detect patterns of anomalous events (e.g., periods of exams or holidays).

As a last example of systems that use the institutional WLAN, reference [40] presents a web application in which the occupation of different rooms on a campus is shown in real time. The number of people in the room is calculated using the number of Wi-Fi devices detected by the APs. Using the signal strength measurement of each of them, a heatmap is drawn that shows the user distribution in the room. The collected data are analyzed offline to make reports of utilization of the different rooms and to obtain conclusions from the detected patterns.

On the other hand, [37, 48, 49, 54] represent some examples of works which use independent dedicated low-cost passive Wi-Fi infrastructures, as is the case of the deployment considered in this paper. In the case of [48], a network of devices called MOBYWIT, based on a Raspberry Pi and two wireless USB dongles, is used to track the movement of people and vehicles, sniffing not only Wi-Fi but also Bluetooth signals emitted by smartphones and vehicle hands-free calling systems. In the case of [49], a passive Wi-Fi infrastructure based on low-cost devices that combine a Raspberry Pi and a TP-LINK Wi-Fi dongle is used to provide user localization, user profiling, and device classification based on the properly processed MAC addresses captured from the IEEE 802.11 probe request frames. Reference [37] also uses this approach to count people in a concert and, in reference [54], the considered scenario is a whole island (Madeira, Portugal), where the records gathered all over the island are analyzed to classify users as tourists or locals, as well as to identify touristic spots.

One of the main features that make the work presented in this paper stand out compared to previous work is that one-year data from an actual Wi-Fi tracking system deployed in a real-life environment are analyzed. Most previous works consider hours, a few days (e.g., weekdays), or weeks. Only the work presented in [54] covers a similar period of time (an even longer one), but its analysis is much broader, being far from the level of detail provided in this paper. The considered period of time allows analyzing seasonality effects and other patterns that, although they may be seen as common knowledge, do bring value, since they represent numerical evidence that supports decision making (e.g., someone may think that the Wi-Fi access in a given area does not work properly because it is always overcrowded, but numerical evidence is needed to appropriately justify the investment of increasing the number of APs of the corporate WLAN in that area to improve the service). In addition, such well-known patterns, when obtained automatically by processing the available data, become models which can be used to detect anomalies or atypical situations, as is common practice in unsupervised machine learning. It is also worth mentioning the use of clustering to improve the data analysis and interpretation (as in previous works, e.g., [38]), as well as the application of the semantic trajectory concept [55], which combines positioning data with an external source of information to classify the different positions according to the activity carried out in the area (e.g., users positioned in the cantina will be eating) and which has not been extensively explored in previous works (e.g., [56]).

Figure 3: (a) Raspberry Pi; (b) TP-LINK USB Wi-Fi dongle; (c) developed Wi-Fi listening device.

Another strong point of the work presented in this paper compared to previous work is the validation of the Wi-Fi tracking technique to estimate the occupation of the library of the Telecommunications School and the comparison with a video camera system. Only a few previous works, such as [40] or [49], perform a similar validation (in [40], the number of people detected in the room is compared with the attendance list of the seminar taking place there and, in [49], students are requested to turn on the Wi-Fi interfaces of their devices and provide the MAC addresses of their smartphones and laptops through an anonymous web form to serve as ground truth for device classification). However, again, the period considered in this paper is much longer, and it is shown that the Wi-Fi tracking system outperforms the more expensive video camera system. As a matter of fact, the Wi-Fi tracking system is actually used on a day-to-day basis by the library staff, which illustrates the value that this kind of IoT infrastructure can bring to real-life problems and services.

3. Data Acquisition Infrastructure

Figure 3 shows the Wi-Fi listening device/sensor developed for the people flow monitoring sensor network of the Smart CEI Moncloa. It is based on very common and cheap hardware, namely, a Raspberry Pi board [64], one of the most widely used hardware platforms for IoT, and a TP-LINK USB Wi-Fi dongle [65] configured in monitor mode. As a result, the cost of this solution is in the order of tens of euros (around 80 € when manufactured in 2016), which represents a remarkable cost reduction compared to other solutions available in the market.

As Figure 4 illustrates, broadly speaking, these Wi-Fi sensors scan each of the Wi-Fi channels from both the 2.4 GHz and the 5 GHz bands during a configurable amount of time (currently 250 ms), read the header of the IEEE 802.11 radio packets (e.g., data packets or probe requests) in their region of coverage, and record the sender MAC addresses. As these MAC addresses are unique per device, counting them is a good indicator of the number of devices available in the surroundings of the Wi-Fi sensors (although there may be issues related to the fact that a single user can carry several devices, as already pointed out in Section 2), and they allow for temporal correlation analysis, thus obtaining useful information such as stay time, availability patterns, etc.

Regarding how this information is carried all the way up to the backend server and processed there, Figure 5 shows an overview of the communications architecture and protocol stack of the people flow monitoring service of the Smart CEI Moncloa.

As can be seen, the Wi-Fi sensors are directly connected to the backend via the Ethernet network of the UPM. Communications are protected end-to-end by the use of Transport Layer Security (TLS) on top of Transmission Control Protocol (TCP)/Internet Protocol (IP). Measurements are periodically sent using Message Queuing Telemetry Transport (MQTT) [66]. The publish/subscribe mechanism provided by MQTT allows the Wi-Fi sensors not only to send measurements (i.e., events) periodically but also to receive commands (e.g., to reboot them or to perform a remote firmware update).

Taking advantage of the hierarchical structure of the MQTT topics, all the publish events follow the structure SERVICE/ID/EVENT(/TIMESTAMP). Thus, the publish events from the Wi-Fi sensors start with Wi-Fi, followed by the MAC address of the sensor’s Ethernet interface, which is used as unique ID. The format of the content published under the different topics is Comma-Separated Values (CSV), which is a lightweight solution, especially appropriate when the data structure is fixed, since the meaning of each field of the subsequent lines is explained only in the header at the beginning of the file.
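As an illustration of the topic hierarchy and CSV payload described above, the following sketch builds a topic string and a header-plus-rows payload. It assumes that the topic levels are joined with "/" (the usual MQTT separator); the event name, the sensor ID format, and the column names are hypothetical, since the text does not list them, and no MQTT client library is involved.

```python
import csv
import io


def build_topic(service, sensor_id, event, timestamp=None):
    """Join the topic levels; the timestamp level is optional."""
    levels = [service, sensor_id, event]
    if timestamp is not None:
        levels.append(str(timestamp))
    return "/".join(levels)


def rows_to_csv(header, rows):
    """Serialize a header line followed by data lines as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)  # field meanings appear only once, in the header
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical sensor ID, event name, and columns, for illustration only.
topic = build_topic("Wi-Fi", "b827eb000001", "measurement", 1451606400)
payload = rows_to_csv(["minute", "device"], [[0, "a1b2"], [1, "c3d4"]])
print(topic)
print(payload)
```

Because the header is sent once per file, each data line stays short, which is the lightweight property the paragraph above attributes to CSV.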

Privacy issues have also been considered carefully: the developed Wi-Fi sensors apply an irreversible hash function (MD5) with salt to the MAC address, which prevents brute-force attacks with precomputed tables. In addition, as Figure 5 shows, once anonymized, the data are carried securely up to the platform servers, where they are handled in an aggregate manner instead of individually.
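A minimal sketch of salted hashing of a MAC address is given below. How the sensors actually combine the salt with the address is not stated in the text, so prepending the salt bytes is an assumption, and the function name, salt value, and sample address are hypothetical.

```python
import hashlib


def anonymize_mac(mac: str, salt: bytes) -> str:
    """Return the hex MD5 digest of salt + normalized MAC address."""
    # Normalizing case and separators keeps one device mapped to one pseudonym.
    normalized = mac.lower().replace("-", ":").encode("ascii")
    # Assumption: the salt is simply prepended to the address bytes.
    return hashlib.md5(salt + normalized).hexdigest()


# Hypothetical salt and address, for illustration only.
pseudonym = anonymize_mac("B8:27:EB:12:34:56", salt=b"example-salt")
print(len(pseudonym))  # 32 hexadecimal characters
```

The same device always yields the same pseudonym under a fixed salt, which is what allows the temporal correlation analysis to work on anonymized data.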

Furthermore, the software of the developed sensors has been modified so that the MAC anonymization mechanisms presented in Section 2 do not affect the obtained measurements. Thus, the Wi-Fi frames with locally administered MAC addresses or including special MAC address ranges are discarded, so these devices are not considered. In any case, as already mentioned in Section 2, MAC randomization is not a major issue in our case, since most of the users are connected to the Eduroam free Wi-Fi access, so their smartphones end up using their actual MAC address.

Figure 4: Sketch of how the developed Wi-Fi listening devices obtain the MAC addresses of the users surrounding them. [Diagram: User1 and User3 send data frames (MAC_U1 on ch1, MAC_U3 on ch3) and User2 and User4 send probe requests (MAC_U2 on chN, MAC_U4 on chM) around AP1–AP4 (ch1–ch4); the developed Wi-Fi listening device listens “x” ms on each channel.]

Figure 5: Communications architecture and protocol stack for the people flow monitoring service of the Smart CEI Moncloa. [Diagram: the Wi-Fi sensors and the backend server both run MQTT over TLS over TCP over IP over 802.3; the backend server comprises the MQTT broker, real-time and batch processing, a REST API server, a CSV file-based temporal DB, and the dashboard web server.]

After gathering the CSV files for a large time span, we moved to an offline analysis platform on a private cluster. The Apache Spark 2.2 software platform was selected as the data processing tool due to its optimized capabilities to work with large amounts of structured data. The gathered files add up to large datasets, which a common centralized system would struggle to process but which distributed systems such as Spark can handle easily.

The private cluster is composed of eight HP ProLiant SL250s Gen 8 machines with two Intel Xeon E5-2630 v2 2.6 GHz processors (6 cores each) and 32 GB of RAM. In addition to Spark, this platform also runs an Apache Hadoop Filesystem, where the dataset and the results are stored. This is a distributed filesystem which allows all the machines to access the stored data in parallel. The usage of such a platform is advisable in order to speed up the analysis, but not mandatory, since the algorithms we are going to describe are available in many other software platforms (such as those provided with R or Python).

4. Data Analysis and Results

4.1. Data Processing. Although this work is not a proper data mining process, given that we are not using those kinds of algorithms and analyses, the necessary steps prior to the actual analysis are the same as in a KDD (Knowledge Discovery in Databases) process [67]. For our analysis, we took the aforementioned CSV files, each one containing the data collected by a single sensor during a period of 15 minutes. The observation period used in our analysis is a full year, from January 2016 to December 2016. In a one-year time lapse, 9 sensors, each generating a file every fifteen minutes, would create around 315K files, a theoretical maximum because a file is only created if the sensor is active and detects at least one device during the period. The sensors were deployed at the beginning of 2016, but the deployment was not totally finished until March. There were also some holidays during 2016 when ETSIT was fully closed. During those days, some of the sensors detected no devices (although there are always security staff somewhere in the covered area), so they did not generate a file. As a result, the number of files actually available for the analysis is 246K.

Figure 6: Histogram user-number of registered hours: (a) raw analysis; (b) first threshold applied. [Both panels plot log(users) against hours; panel (a) is titled “Histogram user-number of registered minutes” and panel (b) “Histogram of hours per user during whole year-filtered”.]
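The upper bound on the number of files quoted in the data processing paragraph (9 sensors, one file every fifteen minutes, one year) can be reproduced with simple arithmetic. The sketch below assumes a 365-day year, which matches the "around 315K" figure; the function name is hypothetical, and the observed count is the rounded 246K value from the text, so the resulting fraction is only approximate.

```python
def max_files(sensors: int, minutes_per_file: int, days: int) -> int:
    """Upper bound on files: one per sensor per reporting interval."""
    files_per_day = (24 * 60) // minutes_per_file
    return sensors * files_per_day * days


theoretical = max_files(sensors=9, minutes_per_file=15, days=365)
observed = 246_000  # "246K" in the text, so only approximate

print(theoretical)                       # 315360
print(round(observed / theoretical, 2))  # 0.78
```

Under these assumptions, roughly 78% of the theoretically possible files exist, which is consistent with the unfinished deployment in January and February and with the days on which the school was closed.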

The first step in preprocessing these data was to merge all the files into a single one, resulting in a 2 GB CSV plain text file where each row represents the observation of a device during the associated time period, including, among others, the anonymized MAC address, the sensor name, and the detection of the device for a given minute in the observation timespan. With this transformation, the resulting dataset contains 63 million rows, each one representing the detection of a single device in a single minute by a single sensor.

A total of 128K unique devices were detected during the whole year. It is known that the number of people that regularly go to ETSIT is around 4K, so not all of the detected devices can be considered actual users of interest; consequently, the data needed to be filtered. Only the devices of people that perform activities related to the place, such as students, professors, etc., should be taken into account. Thus, it is necessary to filter out devices, keeping only the ones that can be labelled as users. We apply filters based on the observations of each device.

So, for each device, we count the number of minutes it was detected in the entire dataset. A device can be seen by more than one sensor during the same minute, so, to generate this measure, repeated minutes are counted only once per device. Grouping the resulting counts, a histogram (Figure 6) is obtained, on which it is possible to classify the devices based on the total time recorded during the whole year.
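The per-device counting rule described above (a minute seen by several sensors counts once) can be sketched in plain Python. The original analysis ran on Spark; this standalone version, its function name, and the tuple layout of the records are assumptions made for illustration.

```python
from collections import defaultdict


def minutes_per_device(records):
    """Count distinct detection minutes per device.

    records: iterable of (device, sensor, minute) tuples.
    """
    seen = defaultdict(set)
    for device, _sensor, minute in records:
        seen[device].add(minute)  # a set drops minutes repeated across sensors
    return {device: len(minutes) for device, minutes in seen.items()}


# Hypothetical records: device "d1" is seen by two sensors in minute 10.
sample = [
    ("d1", "Entr A", 10),
    ("d1", "Library", 10),
    ("d1", "Library", 11),
    ("d2", "Entr C", 10),
]
print(minutes_per_device(sample))
```

Dividing each count by 60 gives the hours per device that the histogram in Figure 6 groups together.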

To facilitate the analysis of the chart, the horizontal axis is expressed in hours and the count in the vertical axis is shown in logarithmic scale. Around 95% of the devices were seen for less than 48 hours during the whole year, on average less than four hours per month. This group is mainly composed of people passing near the school buildings, momentarily entering the coverage area of the sensors without accessing ETSIT. Figure 6(b) shows the histogram after applying a lower threshold of 48 hours and an upper threshold of 3650 hours, an average of 10 hours per day. In this chart, some isolated peaks can be observed in the tail of the graph, starting around 1000 hours in the horizontal axis. A detailed analysis of these peaks revealed that they were devices that remained connected continuously for several days, like servers. So we applied the label “user” only to those devices that registered a number of hours during the year between these two thresholds. As shown in Table 3, from the 128K detected devices only 4653 were classified as users, over which we will perform the rest of the analysis.
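The two-threshold labelling described above can be sketched as a simple filter over the per-device minute counts. The 48-hour and 3650-hour values come from the text; whether the boundaries are inclusive is not stated, so inclusive bounds are an assumption here, and the function and variable names are hypothetical.

```python
LOWER_HOURS = 48    # below this: mostly passers-by
UPPER_HOURS = 3650  # above this: always-on devices such as servers


def label_users(minutes_by_device, lower=LOWER_HOURS, upper=UPPER_HOURS):
    """Return the devices whose yearly hours lie within [lower, upper]."""
    return {
        device
        for device, minutes in minutes_by_device.items()
        if lower <= minutes / 60 <= upper  # assumption: inclusive bounds
    }


# Hypothetical yearly totals in minutes.
totals = {"passer-by": 30 * 60, "student": 900 * 60, "server": 8000 * 60}
print(label_users(totals))
```

Applied to the real counts, this kind of filter is what reduces the 128K detected devices to the 4653 labelled as users.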

Table 3: Dataset statistics after and before preprocessing.

| | Rows | Devices |
|---|---|---|
| Before preprocessing | 63427489 | 128188 |
| After preprocessing | 41294344 (65%) | 4653 (3.6%) |

Figure 7: Daily analysis aggregated during 2016: (a) minutes accumulated; (b) unique users registered; (c) ratio between minutes and users. [Horizontal axis: date, January to December; vertical axes: accumulated minutes (×1000), number of users (×100), and ratio (minutes/user).]

The last step before the proper analyses is to merge this dataset with the information about the position and name of the sensors. Additionally, only the data columns with relevant information for the analyses are kept in order to reduce the in-memory cost:

(1) timestamp: the detection time, measured in minutes

(2) user: the MAC address of the detected device

(3) sensor: the MAC address of the sensor which made the detection

(4) sensorName: the name of the sensor which made the detection

(5) accessPoint: the MAC address of the SSID to which the device is connected, if any

(6) sensorLat: latitude of the sensor’s position

(7) sensorLng: longitude of the sensor’s position

4.2. Temporal Analysis. Once the data were ready, we performed a temporal study aiming to know whether the captured data allowed for the identification of significant periods of activity or trends in user behavior in ETSIT. As one year is too long for a minute-level analysis, we aggregated our data by days and by hours.

Figures 7–10 are organized into groups of two charts. The first one, the accumulated time chart, shows the number of activity minutes registered by each sensor during a time slice. The second chart presents the number of unique users seen by each sensor during the time slice. In addition, Figures 7 and 8 include a third chart with the ratio of accumulated time over unique users, which provides a hint on how stationary users are: peaks in the ratio chart represent moments when users are still (e.g., students in class) and dips are associated with transition times (e.g., students arriving at the school).
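The three metrics plotted in these charts can be sketched for one time slice and one sensor as follows. This is a plain-Python illustration rather than the Spark job used for the paper; the function name and the record layout are assumptions.

```python
from collections import defaultdict


def slice_metrics(records):
    """Compute (accumulated minutes, unique users, ratio) per (day, sensor).

    records: iterable of (day, sensor, user, minute) tuples.
    """
    detections = defaultdict(set)
    for day, sensor, user, minute in records:
        detections[(day, sensor)].add((user, minute))  # one user-minute counts once

    metrics = {}
    for key, pairs in detections.items():
        users = {user for user, _minute in pairs}
        metrics[key] = (len(pairs), len(users), len(pairs) / len(users))
    return metrics


# Hypothetical detections: two users in the library on the same day.
sample = [
    ("2016-11-07", "Library", "u1", 600),
    ("2016-11-07", "Library", "u1", 601),
    ("2016-11-07", "Library", "u1", 602),
    ("2016-11-07", "Library", "u2", 600),
]
print(slice_metrics(sample))
```

A high ratio means few users each contributing many minutes, which is the stationary pattern the text associates with classes or library study.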

First, Figure 7 shows an overview of these metrics throughout the year. In this figure, the measurements of the 9 sensors are aggregated into a single line. In the case of Figure 7(a), the result is not exactly equal to the sum of the activity minutes each sensor accumulates, since a user can be detected in the same minute by different sensors, and these occasions are represented as a single instant in this line.

Figure 8: Daily analysis during 2016: (a) minutes accumulated by each sensor; (b) unique users registered by each sensor; (c) ratio between minutes and users by each sensor. [Horizontal axis: date, January to December; one line per sensor: Entr A, Entr A Sec, Entr B, Entr B Lab, Entr B Sec, Entr C, Entr D, Library, Std Tables.]

The most obvious observation that can be drawn from Figure 7 is the presence of a strange behavior during the months of January and February. The reason is that, as previously discussed, the sensor network began to be deployed in January 2016 and was not completely operative until March. We keep these data in the analysis because they can be used to compare an anomalous situation with a regular one; also, it is much more intuitive to present a whole year range instead of nine months. Nevertheless, data collected in those months are not used to draw conclusions.

Holiday periods are clearly shown in the unique users chart when the line falls, for example, in March (Easter), summer holidays, or some isolated holidays. These periods can also be seen in the ratio chart, which increases because on those dates many fewer users attend the school, but those who do usually spend long periods of time in the library. A remarkable point is October 12th, on which the ratio chart reaches the highest peak and the number of users is almost zero. That day even the library was closed, so only the security staff was in the school. We can also observe the effect of weekends on each chart: both the number of users and the accumulated minutes decrease, but the ratio increases for the same reason given for the holidays, namely, that users study at the library during the weekends. Finally, between May and June, a decreasing trend can be observed both in the unique users and in the accumulated minutes, going up in mid-June for the examination period.

Figure 8 shows a second set of graphs that correspond to a daily analysis of the whole year, representing each sensor. One of the first conclusions that can be drawn is the difference between the proportions in the unique users chart and the accumulated minutes chart. The difference between the number of registered users per sensor is not as remarkable as the difference between the number of accumulated minutes. Again, this is due to the fact that users spend much longer periods of time in the library than in other areas. Observing the line of the sensor of building C in the accumulated minutes chart, it can be seen that the magnitude is maintained throughout the year. This sensor covers a building where there are professors’ offices and research laboratories, i.e., this sensor registers mostly users who work at the school, and they maintain a more regular schedule throughout the year than students who go to classes or to the library.

Figure 9: Calendar view: (a) unique users registered by sensor Entr A Sec; (b) minutes accumulated by sensor Entr A Sec; (c) unique users registered by sensor Entr C; (d) minutes accumulated by sensor Entr C. [Rows: days of the week, Sunday to Saturday; columns: January to December.]

Figure 10: Hourly analysis during first week of November: (a) minutes accumulated by each sensor; (b) unique users registered by each sensor. [Horizontal axis: date, Mon 05 to Sun 11; vertical axes: accumulated minutes (×1000) and number of users (×100); one line per sensor.]

Table 4: Sensor records collisions.

| Collisions | Count | Total | Percent |
|---|---|---|---|
| 9 | 3 | 27 | 0.00 |
| 8 | 45 | 360 | 0.00 |
| 7 | 421 | 2947 | 0.01 |
| 6 | 5371 | 32226 | 0.08 |
| 5 | 59222 | 296110 | 0.72 |
| 4 | 560994 | 2243976 | 5.46 |
| 3 | 2525718 | 7577154 | 18.42 |
| 2 | 7268825 | 14537650 | 35.35 |
| 1 | 16433862 | 16433862 | 39.96 |

Line charts can be difficult to interpret for periods of time as long as a whole year. To ease the visual analysis, a calendar visualization is offered in which the same data can be studied in a more intuitive way. Figure 9 presents the full-year view in calendar format: each row represents one day of the week, and the months appear delimited in black. The color intensity of each cell is proportional to the value it represents. Figure 9 is formed by 4 subfigures, which correspond to the count of users and the number of minutes registered by the sensors "Entr A Sec" and "Entr C". These visualizations are relative and can be used to obtain general conclusions. For a detailed study, visualizations with absolute measurements and an examination of the data are still necessary.

Figure 9(a) represents the number of users detected by the sensor "Entr A Sec", which covers the classrooms of this building, and Figure 9(b) the number of minutes accumulated by the same sensor. Three things are clear: during the first two months of the year this sensor was not operational; on Saturdays and Sundays (last and first rows) this sensor registers almost no activity; and holiday periods are clearly identified, such as Easter in March or the summer holidays in July and August. This visualization also reveals other information that is more difficult to observe in a line graph, such as the fact that Mondays are the days on which users spend the least time in this area. Some blank cells in the last months of the year also stand out, revealing that the sensor did not work during those days.

Figures 9(c) and 9(d) represent the number of users detected and the accumulated minutes for the "Entr C" sensor. As mentioned in the description of Figure 8, this sensor yields very stable measures because it covers the research laboratories, and the workers' schedules are not affected by the school calendar, unlike the activity seen by other sensors such as "Entr A Sec".

Figure 10 presents a different view, plotted at the hour level during a week in November. This eases the identification of activity hours, which span from 8 in the morning to 22 in the evening. Another notable effect is the valley in the middle of the day, corresponding to the lunch break, when users move to the cantina (see Figure 2) or leave the buildings. Finally, it is observed that users leave the school gradually during the afternoon.

4.3. Position Analysis. Before getting into the details and insights obtained from the spatial analysis of the one-year dataset, it is worth mentioning that a set of tests was carried out in a controlled environment during the first stage of the deployment in order to check that the Wi-Fi tracking system worked properly. These tests included tracking a known group of MAC addresses across the Wi-Fi tracking sensors, checking that they appeared in the appropriate ones. It was also checked that the system correctly located and tracked the security staff throughout their night security rounds.

A spatial analysis provides insight into how the users are distributed throughout the buildings during different times of the year. To reach these conclusions, we have improved the method presented in [68]. The first improvement is related to the event in which a user is detected by two or more sensors during the same minute. From now on, this event will be referred to as a collision. The number of collisions is significant for sensors that are close to each other (e.g., Library and building D) and for sensors that cover transition areas (e.g., Entr A or Std Tables).

Table 4 shows a study of the number of collisions. The order of a collision is the number of sensors that detect the same user in the same minute, and the count is the number of times a collision of that order occurs (the total is the number of rows involved, i.e., the order multiplied by the count). Collisions of order 2 and 3 account for more than 50% of the data. These collisions were eliminated by replacing, for this experiment, all the rows that take part in a collision with a single row whose latitude and longitude are the centroid of the positions of the sensors participating in the collision.
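The collision handling described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the record layout `(mac, minute, sensor)`, the function name `resolve_collisions`, and the sensor coordinates are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical sensor coordinates (latitude, longitude); placeholder values,
# not the real positions of the deployment.
SENSOR_POS = {
    "Library": (40.4520, -3.7262),
    "Entr D": (40.4524, -3.7266),
    "Entr A": (40.4530, -3.7270),
}

def resolve_collisions(rows):
    """Collapse all rows of one user in one minute into a single position.

    rows: iterable of (mac, minute, sensor) tuples (assumed layout).
    Returns {(mac, minute): (lat, lon, order)}, where order is the number of
    distinct sensors that saw the user in that minute (1 means no collision).
    """
    seen = defaultdict(set)
    for mac, minute, sensor in rows:
        seen[(mac, minute)].add(sensor)

    resolved = {}
    for key, sensors in seen.items():
        lats = [SENSOR_POS[s][0] for s in sensors]
        lons = [SENSOR_POS[s][1] for s in sensors]
        # Centroid of the positions of the sensors involved in the collision.
        resolved[key] = (sum(lats) / len(lats), sum(lons) / len(lons), len(sensors))
    return resolved

rows = [
    ("aa", 10, "Library"), ("aa", 10, "Entr D"),  # collision of order 2
    ("aa", 11, "Library"),                        # no collision
    ("bb", 10, "Entr A"),
]
positions = resolve_collisions(rows)
```

With this grouping, a collision of order 1 simply keeps the position of the only sensor involved, so colliding and non-colliding rows can go through the same code path.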

The second improvement consists in incorporating external information with the approximate position of the access points to which the users are connected. This information has been obtained from the Mylnikov Geo API [69], retrieving the position of every ESSID registered throughout the year. When the user is connected and the approximate position of the access point is available, this new position is used instead of the position of the sensor


Figure 11: Monthly centroids heatmaps. (a) June; (b) August; (c) September.

that generates the row. This allows measuring the user's position with more accuracy. In addition, it smooths the effect of using only the positions of the sensors, which forces the resulting positions to always lie inside the hull of the polygon formed by the sensor positions.

This experiment is based on the concept of the user's centroid: the average position of a given user over a given period of time. Plotting these points on the map reveals hints of the users' behaviors over the studied time slice, based on the number of people in each area.
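As a small illustration of the centroid concept, the sketch below averages the positions recorded for each user within a chosen period. The input layout `(mac, lat, lon)` and the function name `user_centroids` are assumptions for the example; selecting the period (a month, an hour) is assumed to happen before the call.

```python
from collections import defaultdict

def user_centroids(rows):
    """Average position of each user over the rows of one period.

    rows: iterable of (mac, lat, lon) tuples (assumed layout), already
    restricted to the time slice under study.
    Returns {mac: (mean_lat, mean_lon)}.
    """
    acc = defaultdict(lambda: [0.0, 0.0, 0])  # sum_lat, sum_lon, count
    for mac, lat, lon in rows:
        entry = acc[mac]
        entry[0] += lat
        entry[1] += lon
        entry[2] += 1
    return {mac: (s_lat / n, s_lon / n) for mac, (s_lat, s_lon, n) in acc.items()}

# Toy coordinates chosen so that the averages are easy to check by hand.
centroids = user_centroids([("aa", 0.0, 0.0), ("aa", 2.0, 4.0), ("bb", 1.0, 1.0)])
```

Each resulting point can then be plotted once per user, so that a heatmap reflects how many users have their average position in each area.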

Figure 11 shows heatmaps of three different months. Figure 11(a) is June, a month belonging to the second semester of the academic year and the month when the final exams take place. Figure 11(b) is August, during the summer holidays, when there are no classes or exams and most of the professors, researchers, and staff are on vacation. Figure 11(c) represents September, the start of the semester.

Figure 11(b) confirms one of the facts extracted in the temporal analysis: on holidays the number of users falls, and most of the users present in the school keep studying at the library (the warmest point is located over the library and is much smaller than in other months).

Both Figures 11(a) and 11(c) show that building A is the area with the highest concentration of users. Even so, there are differences between the two figures. In Figure 11(a), the warmest area in the map is over the library. This is explained by the final exam period in June. The same observation can be repeated for other periods to discover when the students have exams. In Figure 11(c), the hot spots are over the classrooms. September is the beginning of the school year, when class attendance is highest. Thus, this observation is an indicator of the level of students' class attendance.

The same type of analysis can be done with shorter time frames to obtain more detailed behaviors. Figure 12 shows the centroid heatmaps grouped by hours on September 2nd.

A detailed hourly analysis like this allows observing the users' movement throughout the day. The first row of maps, Figures 12(a), 12(b), 12(c), and 12(d), shows the evolution of the users' centroids at lunch time, between 12:00 and 16:00. In this transition, it can be seen that there are hot spots over the classrooms and the library at the beginning. On the next map, those centroids move to the cantina (see Figure 2), and in the last map they return to their original positions. The second row of maps in Figure 12 presents the start and the end of the activity time in the school. In Figures 12(e) and 12(f), it can be observed how the first users of the day go directly to classes. By contrast, Figures 12(g) and 12(h) reveal that users tend to be at the library at the end of the day.

Finally, Table 5 presents the count of the different users detected by each sensor throughout the year. Recalling that the total number of users obtained in Section 4.1 is 4653, the data in the table reveal that, over the year, most users have been seen by every sensor at some point. The two exceptions are the sensor of building C and the sensor of the laboratories of building B. These sensors cover the professors' offices and research laboratories, so it is unusual for students to stay in those areas.

4.4. Behavior Analysis. The third set of experiments we performed deals with the behaviors that each user follows throughout a single day. To obtain them, we grouped the data using a user-day key. For each key, a vector of 24 positions, one per hour, is created. In each position of this vector we store the sensor that detected the user for most of that hour. This vector therefore represents the route that the user followed throughout that day, hour by


Table 5: Sensor annual statistics.

| Sensor | Number of rows | Number of users |
|---|---|---|
| Library | 14166793 | 4674 |
| Entr D | 6818165 | 4621 |
| Std Tables | 5625974 | 4670 |
| Entr A | 4736237 | 4620 |
| Entr A Sec | 2352019 | 4579 |
| Entr B Sec | 2258618 | 4530 |
| Entr B Lab | 2229560 | 3630 |
| Entr C | 1469026 | 3093 |
| Entr B | 1248462 | 4127 |

Figure 12: Hourly centroids heatmaps of 2nd September. (a) 12:00 to 13:00; (b) 13:00 to 14:00; (c) 14:00 to 15:00; (d) 15:00 to 16:00; (e) 8:00 to 9:00; (f) 9:00 to 10:00; (g) 19:00 to 20:00; (h) 20:00 to 21:00.

hour. Once the behavior vectors are obtained, the information about the day and the user is discarded in order to count the most repeated behavior vectors. The dimensionality of these vectors makes the number of possible behaviors huge (theoretically $24^{10} \approx 6 \times 10^{13}$, or $14^{10} \approx 289$K million using only the activity hours), but it is known that users behave similarly, so a much lower number of behaviors can be expected, even though it will still be high.
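The construction of behavior vectors can be sketched as below. It is only an illustration under assumptions: one input row per detected minute with the layout `(mac, date, hour, sensor)`, `None` standing for the empty category, and ties within an hour resolved in favor of the sensor seen first. The function names are hypothetical.

```python
from collections import Counter, defaultdict

def behavior_vectors(rows):
    """Build one 24-position vector per user-day key.

    rows: iterable of (mac, date, hour, sensor), one row per detected minute
    (assumed layout). Each position holds the sensor that detected the user
    for most minutes of that hour, or None when the user was not seen.
    """
    per_hour = defaultdict(Counter)
    for mac, date, hour, sensor in rows:
        per_hour[(mac, date, hour)][sensor] += 1

    vectors = defaultdict(lambda: [None] * 24)
    for (mac, date, hour), counts in per_hour.items():
        vectors[(mac, date)][hour] = counts.most_common(1)[0][0]
    return {key: tuple(vec) for key, vec in vectors.items()}

def most_repeated(vectors, top=20):
    """Discard user and day, then count identical behavior vectors."""
    return Counter(vectors.values()).most_common(top)

rows = [
    ("u1", "2016-09-02", 9, "Library"), ("u1", "2016-09-02", 9, "Library"),
    ("u1", "2016-09-02", 9, "Entr A"), ("u1", "2016-09-02", 10, "Library"),
    ("u2", "2016-09-02", 9, "Library"), ("u2", "2016-09-02", 10, "Library"),
]
vectors = behavior_vectors(rows)
ranking = most_repeated(vectors)
```

In this toy input both users end up with the same vector (library at 9:00 and 10:00), so the ranking contains a single behavior counted twice.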

Table 6 presents the 20 most repeated behaviors throughout the year. A total of 285K behaviors are detected, of which 139K are unique. The 500 most frequent behaviors account for 25% of the total. Table 6 shows only the part corresponding to the activity hours of the school which, as observed in the temporal analysis, span from 8:00 am to 10:00 pm. As can be seen, the majority of the most repeated behaviors are stays of between two and five hours in the library. It is necessary to expand the analysis to the top 20 to observe the class attendance behaviors. It is easy to see that the behaviors usually fall in either the morning or the afternoon, with lunchtime from 1:00 pm to 3:00 pm, which means that the majority of users go either in the morning or in the afternoon but do not spend all day at school.

Another quite obvious observation is that many of the behaviors obtained are very similar to each other (e.g., going


Table 6: Top 20 most repeated behaviors. Columns are the 14 hourly slots of the activity period, from 8:00-9:00 to 21:00-22:00; each row lists the entries of its occupied slots.

| Rank | Entries in occupied hourly slots |
|---|---|
| 1 | Lib Lib Lib Lib Lib |
| 2 | Lib Lib Lib Lib |
| 3 | EntA |
| 4 | Lib Lib Lib |
| 5 | Lib Lib Lib Lib |
| 6 | Lib |
| 7 | EntA |
| 8 | Lib Lib Lib Lib Lib Lib |
| 9 | Std Std Std Std Std Std Std Std Std Std Std Std Std Std |
| 10 | EntA |
| 11 | EntA |
| 12 | Lib Lib Lib Lib Lib |
| 13 | Lib Lib |
| 14 | Lib Lib |
| 15 | EntA |
| 16 | EntA EntA |
| 17 | EntA |
| 18 | Lib Lib |
| 19 | SecA SecA SecA SecA |
| 20 | EntA |

Lib = Library; EntA = Entr A; SecA = Entr A Sec; Std = Std Tables.

to the library from 9:00 am to 2:00 pm or going to the library from 9:00 am to 1:00 pm). In order to lower this redundancy, we performed a clustering procedure on them. Each behavior can be understood as a categorical vector of length 24 in which the categories correspond to the 9 possible sensors of the dataset plus the empty category. The chosen clustering algorithm is Proximus [70], due to its simplicity, efficiency, scalability, and the reliability of its results. The algorithm works with binary vectors and creates clusters based on the Hamming distance (the number of bits that differ between two binary vectors). A vector is chosen as the center of the cluster, and other vectors whose distance to the center is smaller than the maximum cluster radius are added to that cluster.
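The center-and-radius criterion described above can be illustrated with a deliberately simple greedy sketch. It is not the Proximus algorithm of [70] and makes no claim about how that algorithm selects its centers; the function names and the first-come choice of centers are assumptions made only to show how the Hamming distance and a maximum radius group binary vectors.

```python
def hamming(u, v):
    """Number of positions in which two equal-length binary vectors differ."""
    return sum(a != b for a, b in zip(u, v))

def radius_clusters(vectors, max_radius):
    """Greedy grouping: a vector joins the first cluster whose center is
    closer than max_radius; otherwise it becomes the center of a new cluster.

    Returns a list of (center, members) pairs.
    """
    clusters = []
    for vec in vectors:
        for center, members in clusters:
            if hamming(center, vec) < max_radius:
                members.append(vec)
                break
        else:
            # No existing center is close enough: start a new cluster.
            clusters.append((vec, [vec]))
    return clusters

clusters = radius_clusters(
    [(0, 0, 0, 0), (0, 0, 0, 1), (1, 1, 1, 1), (1, 1, 1, 0)],
    max_radius=2,
)
```

In this toy input the four vectors fall into two clusters of two members each, since each pair differs in a single bit while the two centers differ in four.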

The behaviors are expressed as categorical vectors, so it is necessary to transform them into binary vectors to be used in Proximus. The transformation shown in Equation (1) is proposed. This transformation is simple and also reversible, which allows the original behavior vectors to be recovered after clustering.

$$
\begin{aligned}
B &= \{h_0, \ldots, h_{23}\}, \quad h_i \in \{0, a_1, \ldots, a_9\} \\
\vec{B} &= \{\delta_{0,a_1}, \ldots, \delta_{0,a_9}, \ldots, \delta_{23,a_1}, \ldots, \delta_{23,a_9}\} \\
\delta_{i,a_j} &= \begin{cases} 0 & \text{if } h_i \neq a_j \\ 1 & \text{if } h_i = a_j \end{cases}
\end{aligned}
\tag{1}
$$

where $a_1, \ldots, a_9$ are the nine sensors, $B$ is the behavior vector described above, and $\vec{B}$ is the transformed vector of length $24 \times 9$, made up of 24 subgroups of 9 samples, one subgroup per hour. Every sample in a subgroup is 0, except for the one at the index of the sensor active during that hour, which is set to 1.
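A minimal sketch of the transformation in Equation (1) and of its inverse is given below. The ordering of the sensors in `SENSORS`, the use of `None` for the empty category, and the function names are assumptions made for the example.

```python
# The nine sensors a_1 ... a_9; the order is an arbitrary assumption.
SENSORS = ["Library", "Entr D", "Std Tables", "Entr A", "Entr A Sec",
           "Entr B Sec", "Entr B Lab", "Entr C", "Entr B"]

def encode(behavior):
    """Categorical vector (24 entries, None = empty) -> binary vector (24 * 9 bits)."""
    bits = []
    for h in behavior:
        bits.extend(1 if h == a else 0 for a in SENSORS)
    return bits

def decode(bits):
    """Inverse transformation: recover the categorical vector from the bits."""
    n = len(SENSORS)
    behavior = []
    for i in range(0, len(bits), n):
        group = bits[i:i + n]
        behavior.append(SENSORS[group.index(1)] if 1 in group else None)
    return behavior

def hamming(u, v):
    """Number of positions in which two equal-length vectors differ."""
    return sum(x != y for x, y in zip(u, v))

library_morning = [None] * 24
library_morning[9] = "Library"
library_morning[10] = "Library"

moved = list(library_morning)
moved[10] = "Entr A"      # same day, but at Entr A during the 10:00 slot

shorter = list(library_morning)
shorter[10] = None        # same day, but not seen during the 10:00 slot

encoded = encode(library_morning)
```

Under this encoding, replacing the sensor of one hour by another sensor changes two bits, whereas emptying one hour changes a single bit, which is how the Hamming distance relates to differences between behaviors.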

Table 7 shows the results of applying the Proximus clustering to the behaviors found. The first column is the center of the cluster, the second the number of distinct behaviors grouped in that cluster, and the third the total number of recorded behaviors that fall within it. The results are ordered by the number of behaviors included in the cluster. 45707 clusters are obtained; the first 300 account for 50% of the behaviors recorded throughout the year. The most important cluster behaviors in Table 7 can be understood as follows:


Table 7: Behavior clusters.

| # | Cluster center | Cluster components | Behaviors count |
|---|---|---|---|
| 1 | Entr A Sec from 9:00 to 14:00 | 415 | 4963 |
| 2 | Entr B Sec from 10:00 to 13:00 | 458 | 3756 |
| 3 | Entr B Sec from 15:00 to 19:00 | 471 | 2924 |
| 4 | Library from 9:00 to 13:00 | 256 | 2870 |
| 5 | Library from 10:00 to 12:00 | 176 | 2396 |
| 6 | Library from 18:00 to 21:00 | 244 | 2231 |
| 7 | Entr A 13:00 | 787 | 1979 |
| 8 | Library 12:00 | 563 | 1894 |
| 9 | Library from 18:00 to 19:00 and from 20:00 to 21:00 | 175 | 1853 |
| 10 | Entr B Lab 11:00 to 20:00 | 568 | 1831 |

(i) Clusters 1, 2, and 3: students attending classes. These sensors cover the main classrooms in the school, and the intervals match the class schedule.

(ii) Clusters 4, 5, 6, and 9: students at the library.

(iii) Cluster 7: users who are usually outside the sensors' coverage areas, arriving or leaving.

(iv) Cluster 10: laboratory equipment and professors. This sensor covers some of the professors' offices and laboratories, which usually have some laboratory equipment connected during work hours.

5. Validation of Occupancy Estimation Based on Wi-Fi Tracking

As already pointed out in Section 4.3, at the very first stage of the deployment it was verified that the Wi-Fi sensors properly locate and track known MAC addresses. However, it was still necessary to validate the accuracy of the system for estimating occupancy, as it can be distorted by the aforementioned fact that a single user can carry several devices connected to Wi-Fi networks. As no ground-truth data were available for the full set of buildings, and only some data were found for the library, we centered our validation efforts on comparing our data with the available ones: if we can trust our results in that area, then we can extend our trust to the rest of the areas, for which no reference data are available.

The library of ETSIT has 408 study seats, and it offers a web service to check the number of available seats at a given moment [71]. This system is based on two sources of information: a person who counts the empty seats every opening hour from Friday to Sunday, and a video camera located at the main entrance of the library that counts the number of people entering or leaving at 15-minute intervals. The human system provides a ground truth about the number of occupied seats, but this measure is very different from the actual number of people in the library, since a common situation, especially during examination terms, is that students leave their study material at a seat to reserve it while they are away from the library. This situation is a problem for the library staff, and therefore they installed the video camera system to count the students' entrances and exits.

This system generates an estimation of the number of people in the library by adding to the previous measure the number of people detected entering and subtracting the number of people detected leaving. The system is not perfect, and in most cases its estimation carries an accumulated error that increases over the day. The total error can be calculated clearly at the end of the day, when the library closes and the number of people inside is supposed to be zero. In summer there is a situation that aggravates this error and, consequently, degrades the measure achieved by this system: due to the rise in temperature, the back door of the library is opened to improve ventilation, which allows students to exit through it, although they must continue entering through the main door. The camera does not count students leaving through this back door.
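The running estimate of the camera system, and the way its accumulated error shows up at closing time, can be sketched as follows. The per-interval counts are made-up numbers and the function name is hypothetical; the sketch only mirrors the add-entries, subtract-exits rule described above.

```python
def camera_occupancy(entries, exits, start=0):
    """Running occupancy estimate: previous value + entries - exits per interval.

    entries, exits: counts detected by the camera in each 15-minute interval
    (illustrative values, not real data).
    """
    estimate, series = start, []
    for entered, left in zip(entries, exits):
        estimate += entered - left
        series.append(estimate)
    return series

# Toy day with three intervals; three people leave without being counted
# (e.g., through a door the camera does not cover).
entries = [10, 5, 0]
exits = [0, 3, 9]
series = camera_occupancy(entries, exits)

# At closing time the true occupancy is zero, so the last estimate is the
# error accumulated over the day.
closing_error = series[-1]
```

Because every missed exit is carried forward into all later intervals, the error can only be read off directly at a moment when the true occupancy is known, such as closing time.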

The library staff provided us with the data collected by the two systems (human and camera) between June 5th and 30th. These measurements can be compared with those obtained by the Wi-Fi sensor installed at the library in order to validate them. For this test, all the data collected by the sensor are used, without applying the filtering of the MACs of sporadic users explained before.

Figure 13 shows the data collected by the three systems on Sunday, June 5th. This is the first day with data from the three sources; other days on which the three sources are present show the same trends. There is a clear divergence between the human system observation and the rest of the data. The graph of the human system shows that the number of occupied seats increases in the first hours up to the maximum and remains steady until the end of the day, without being affected by the behavior of the users at lunchtime. However, this effect is reflected in the camera system and Wi-Fi tracking measurements. We observed that, in general, the number of people counted by the camera system is below the Wi-Fi tracking system measure, although the proportion is maintained over time. Finally, the figure shows the cumulative error effect of the camera system, which at the end of the day still reports 65 people in the library.

To better study the relation between the camera system and the Wi-Fi tracking system, we generated a detailed visualization that allows us to observe the data of the whole month in a single figure. Figure 14 is composed of 3 subfigures; each of them is a matrix of colored cells in which the rows represent a full


Figure 13: Measure of library occupation using Wi-Fi, camera, and human systems (x-axis: hour, 10:00 to 22:00; y-axis: number of users, 0 to 400; series: Camera, Human, Wifi).

Figure 14: Measures every 15 minutes of library occupation during June (x-axis: hour of the day, 00 to 23). (a) Wi-Fi records; (b) camera records; (c) difference between Wi-Fi and camera.

day and the columns are divisions of 15 minutes. The color intensity is proportional to the number of users measured in each interval, being more intense when more users are detected. Figure 14(a) shows the measurements of the Wi-Fi sensor. The behavior is the one observed in previous sections of this document. Figure 14(b) shows the measurement of the camera system. During the first half of the day, the same trend as in Figure 14(a) can be observed, but in the afternoon and night the accumulated error becomes noticeable. This error reaches a maximum of 378 people, with an average of


139 people at the end of the day, while the Wi-Fi tracking system registers a maximum of 10 and an average of 7 at the end of the day (these are machines that are still turned on when the library is empty of people). Finally, Figure 14(c) shows the difference between the two previous ones, using green when the difference is positive (the Wi-Fi tracking system registers a higher value than the camera system) and red otherwise. This figure validates the measurements obtained by the Wi-Fi tracking system, since the difference with the measure of the camera system in the first half of the day is small (an average of 27 people, counting the data from 8:00 to 16:00), whereas in the afternoon the error accumulated by the camera system makes the average difference grow to 83 people between 16:00 and 22:00.

The Wi-Fi sensor of the library was not placed for the specific purpose of counting the number of people in the library, and consequently its coverage area can include devices that are outside the library. This explains why in some cases it registers a measurement greater than that of the camera system. In any case, the Wi-Fi sensor, with its limitations, registers a measure very similar to that of the camera-based people counting system, while avoiding the cumulative error of that system and being much cheaper (tens of euros against thousands of euros) and less intrusive than a video camera. Currently, since the camera system is already installed, the Wi-Fi sensor measures can be used as an upper bound that limits the error made by that system. In summary, using a Wi-Fi sensor to count people in the library provides accurate results despite the potential errors induced by the sensing period, collisions, the ratio between users and MACs seen, and the data processing performed. These results suggest that the results of our analyses for the rest of the areas on campus can also be trusted.

6. Discussion and Conclusions

In the execution of this case study, we have learned some lessons about the limitations we faced that are worth taking into account for similar systems and analyses. First, there is the key issue of the sensor deployment: it is not just a question of density (number of sensors per unit of surface) but of adapting its design to the topology of the place to be monitored. For example, it is crucial to have sensors not only at the entrances and exits of buildings but also inside them, as in the considered deployment, where there are sensors placed at the entrances and exits of the buildings but also in other especially relevant places such as the library or the group work area.

A complete coverage of indoor devices, without collisions and for the complete timespan of the stay indoors, would require a large deployment with many sensors able to cover all corners. This is not suitable in this context; instead, we relied on 9 sensors, some at gates and others in large rooms, which proved to be enough to check whether a device is within reach of any of them. Some assumptions had to be made, however: we considered as valid behaviors only those that spent a significant amount of time once they had been seen (to remove transient behaviors and those of people walking outside the buildings), and we discarded behaviors that did not appear enough times in the yearly timespan. On the other hand, the topology of these buildings did not impose severe restrictions on the sensors' coverage; in fact, we found several devices seen by different sensors at once. In summary, we can conclude that the finer the spatial granularity (more sensors), the larger the set of different behaviors, so using raw data from the sensors would lead to an explosion of states that would render meaningless results.

The gathered data is another key issue. Some works in the literature obtained signal strength measurements every few seconds, allowing them to identify indoor trajectories. This was not our case, and in fact this proved to be very limiting. Thus, the precision of our analysis is limited by the regions covered by each sensor, which prevents us from analyzing any kind of movement within regions. Nevertheless, as has been seen, relevant results can still be obtained without the increase in energy consumption and the risk of flooding the school communications network that the other approach would entail.

As regards the analysis of data to identify users' behaviors, a long observation period is a must. Obtaining data for a complete work cycle (in this case, a full school year) helps in discovering common behaviors that happen in a university. Using long observation times can also help in detecting erroneous or atypical operation of the sensors, as happened in the first months of the year in this case. Getting even larger observation datasets would reinforce the behaviors we have found (e.g., seasonality), but the chance of discovering new ones is negligible, because the expected behaviors on campus are periodic with the school year.

The usage of big data platforms for analysis, while not mandatory, eases the management of large datasets and the execution of iterative studies on the data. The preprocessing work carried out allowed us to clean and filter our data. For example, some discovered behaviors are too regular and extended over time, which may correspond to servers or machines, which we could then filter out.

The temporal analysis has revealed some behaviors that are maintained throughout the year and others that occur occasionally. The work has focused on the study of behaviors that are repeated and maintained throughout the year. A closer view, such as the analysis per hour, shows the hours of activity, nocturnal patterns, or movements at mealtime. In the daily view, the difference in activity between working days and weekends or holidays is clearly seen. Visualization has been a key technique for detecting and understanding these temporal patterns.

The spatial analysis revealed that collisions occur in user detection, but their effect was mitigated by the use of innovative algorithms and techniques, such as the calculation of centroids and the combination of sensor information with external sources of information (i.e., semantic trajectories), so a much more precise positioning of the users is achieved than with the exclusive use of the sensors. The visualization of centroids fostered the visual understanding of complex data, such as the position of each user over a month, and the superposition of the centroids in a heatmap allowed knowing the movements of the groups of people and the occupation of the different zones.


Furthermore, we have found the dominant users' behaviors as the most repeated behaviors registered by the sensors throughout the year. The number of behaviors found was huge, but the application of the Proximus clustering algorithm reduced this number to a manageable amount. The study of the obtained clusters then revealed that the most frequent behaviors coincide with what can be expected from a college building: researchers working in laboratories and students attending classes or studying in the library.

Although some of the discovered behaviors and patterns can be seen as common knowledge, it is worth stressing that they do bring value, since they represent numerical evidence that supports decision making (e.g., someone may think that the Wi-Fi access in a given area does not work properly because it is always overcrowded, but numerical evidence is needed to appropriately justify the investment of increasing the number of APs of the corporate WLAN in that area to improve the service). In addition, such well-known patterns, when obtained automatically by processing the available data, become baseline models which can be used to detect anomalies or atypical situations, as is common practice in unsupervised machine learning.

Lastly, we have validated the accuracy of using Wi-Fi tracking for occupancy estimation, comparing it with the library staff's manual counting (considered as ground truth) and with a video camera system installed at the library main entrance. As a main finding, Wi-Fi tracking has proved to be more accurate than the video camera system, in addition to being far cheaper. As a matter of fact, the library staff currently prefer the occupancy estimation based on Wi-Fi tracking over the one based on the video camera system. Nevertheless, the accuracy of the Wi-Fi tracking system can be further improved by correlating MAC addresses detected in the same places over sufficiently long periods of time and counting only one of them, thus mitigating the issue that a single person can carry several devices connected to Wi-Fi networks.

Besides this, the Wi-Fi tracking system is currently used by the library staff to perform more sophisticated studies, such as figuring out the percentage of students from the different schools of the university who come to study at the library of the Telecommunications Engineering School during the weekend. Figure 15 shows the results of such an analysis, which is an example of how this kind of IoT system can help solve real-life problems and improve the operation of already running services.

To summarize, we have studied a one-year Wi-Fi tracking dataset obtained from a reduced set of low-cost sensors with limited capabilities, deployed on an actual university campus that receives around 4000 people every day. We have processed the data in order to identify traces of Wi-Fi-enabled mobile devices, which are taken to be people moving in the campus buildings, and then we have extracted people's stays, movements, and common behaviors. The obtained results represent numerical evidence that illustrates how a low-cost Wi-Fi tracking system can be used in real-life conditions to improve or optimize the operation of the monitored premises. These results can allow appropriately dimensioning the WLAN infrastructure or the canteen personnel, or detecting

Figure 15: Users who only go to the library of the Telecommunication School on weekends, classified by their school (chart labels: CIVIL, INDUS, NAVAL, SPORT, ARQUI, FORES, EDIF, AERO, TELECOM; percentages shown: 13.96, 18.06, 2.64, 0.90, 9.51, 3.82, 4.58, 17.71, 28.82).

anomalous situations in real time. Furthermore, the data from the people flow monitoring system are currently being used together with the data from the environmental monitoring system to try to reduce the environmental footprint of the school [72]. In addition, the Wi-Fi tracking system is actually used by the library staff in their day-to-day activity, which illustrates the value that this kind of IoT infrastructure can bring to real-life problems and services.

Data Availability

The dataset with the Wi-Fi sensor records used to support the findings of this study is available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work has been partly funded by Universidad Politécnica de Madrid through the project RES2+U (Responsables, Sostenibles y Universitarios) (http://blogs.upm.es/res2masu). The work performed by José M. Navarro was funded by the Ministerio de Educación, Cultura y Deporte de España under Grant no. FPU14/03209.

References

[1] ldquo2018 Revision ofWorld Urbanization ProspectsmdashMultimediaLibrary - United Nations Department of Economic and Social

22 Wireless Communications and Mobile Computing

Affairsrdquo httpswwwunorgdevelopmentdesapublications2018-revision-of-world-urbanization-prospectshtml

[2] M Pallot ldquoEngaging Users into Research and Innova-tionThe Living Lab Approach as a User CentredOpen Innova-tion Ecosystemrdquo httpswebarchiveorgweb20120509081658httpwwwcwe-projectseupubbscwcgi1760838id=7154041760838

[3] E Almirall and J Wareham ldquoLiving Labs arbiters of mid-and ground-level innovationrdquo Technology Analysis amp StrategicManagement vol 23 no 1 pp 87ndash102 2011

[4] ldquoTUDelft Green officerdquo httpswwwtudelftnlsustainability[5] ldquoUBC Campus as a living laboratoryrdquo httpssustainubcca

our-commitmentcampus-living-lab[6] ldquoHarvard Sustainabilityrdquo httpsgreenharvardedu[7] ldquoUPM City of the Future initiativerdquo httpblogsupmes

cityofthefuture-upmeninitiative[8] M Alvarez-Campana G Lopez E Vazquez V A Villagra

and J Berrocal ldquoSmart CEI moncloa An iot-based platformfor people flow and environmental monitoring on a SmartUniversity Campusrdquo Sensors vol 17 no 12 2017

[9] M. B. Kjærgaard, H. Blunck, T. Godsk, T. Toftkjær, D. L. Christensen, and K. Grønbæk, "Indoor positioning using GPS revisited," Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Preface, vol. 6030, pp. 38–56, 2010.

[10] A. Belmonte-Hernández, G. Hernández-Peñaloza, F. Álvarez, and G. Conti, "Adaptive Fingerprinting in Multi-Sensor Fusion for Accurate Indoor Tracking," IEEE Sensors Journal, vol. 17, no. 15, pp. 4983–4998, 2017.

[11] M. S. Kristoffersen, J. V. Dueholm, R. Gade, and T. B. Moeslund, "Pedestrian counting with occlusion handling using stereo thermal cameras," Sensors, vol. 16, no. 1, 2016.

[12] L. Zheng, X. Ruan, Y. Chen, and M. Huang, "Shadow removal for pedestrian detection and tracking in indoor environments," Multimedia Tools and Applications, vol. 76, no. 18, pp. 18321–18337, 2017.

[13] T. Li, H. Chang, M. Wang, B. Ni, R. Hong, and S. Yan, "Crowded scene analysis: a survey," IEEE Transactions on Circuits and Systems for Video Technology, vol. 25, no. 3, pp. 367–386, 2015.

[14] F. Adib and D. Katabi, "See through walls with WiFi," in Proceedings of the Annual Conference of the ACM Special Interest Group on Data Communication on the Applications, Technologies, Architectures, and Protocols for Computer Communication, ACM SIGCOMM 2013, pp. 75–86, China, August 2013.

[15] Z. Kabelac, D. Katabi, and R. C. Miller, "3D Tracking via Body Radio Reflections," in Proceedings of the 11th USENIX Conference on Networked Systems Design and Implementation, pp. 317–329, 2014.

[16] F. Adib, Z. Kabelac, and D. Katabi, "Multi-person localization via RF body reflections," in Proceedings of the 12th USENIX Symposium on Networked Systems Design and Implementation, NSDI 2015, pp. 279–292, USA, May 2015.

[17] Z.-A. Deng, G. Wang, D. Qin, Z. Na, Y. Cui, and J. Chen, "Continuous indoor positioning fusing WiFi, smartphone sensors and landmarks," Sensors, vol. 16, no. 9, 2016.

[18] R. A. Becker, R. Caceres, K. Hanson et al., "A tale of one city: Using cellular network data for urban planning," IEEE Pervasive Computing, vol. 10, no. 4, pp. 18–26, 2011.

[19] J. E. Mallah, F. Carrino, O. A. Khaled, and E. Mugellini, "Crowd monitoring: critical situations prevention using smartphones and group detection," Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Preface, vol. 9189, pp. 496–505, 2015.

[20] J. Kuang, X. Niu, and X. Chen, "Robust Pedestrian Dead Reckoning Based on MEMS-IMU for Smartphones," Sensors, vol. 18, no. 5, p. 1391, 2018.

[21] A. Kurkcu and K. Ozbay, "Estimating Pedestrian Densities, Wait Times, and Flows with Wi-Fi and Bluetooth Sensors," Transportation Research Record, vol. 2644, no. 1, pp. 72–82, 2017.

[22] A. Kotanen, M. Hannikainen, H. Leppakoski, and T. Hamalainen, "Positioning with IEEE 802.11b wireless LAN," in Proceedings of the 14th IEEE 2003 International Symposium on Personal, Indoor and Mobile Radio Communications, vol. 3, pp. 2218–2222, Beijing, China, 2003.

[23] W. Ho, A. Smailagic, D. P. Siewiorek, and C. Faloutsos, "An adaptive two-phase approach to WiFi location sensing," in Proceedings of the 4th Annual IEEE International Conference on Pervasive Computing and Communications Workshops, PerCom Workshops 2006, pp. 452–456, Italy, March 2006.

[24] F. Evennou and F. Marx, "Advanced integration of WiFi and inertial navigation systems for indoor mobile positioning," EURASIP Journal on Applied Signal Processing, vol. 2006, Article ID 86706, 11 pages, 2006.

[25] J. Rekimoto, T. Miyaki, and T. Ishizawa, "LifeTag: WiFi-based continuous location logging for life pattern analysis," LNCS, vol. 4718, pp. 35–49, 2007.

[26] J. A. Besada, A. M. Bernardos, P. Tarrío, and J. R. Casar, "Analysis of tracking methods for wireless indoor localization," in Proceedings of the 2nd International Symposium on Wireless Pervasive Computing (ISWPC '07), pp. 492–497, February 2007.

[27] O. Woodman and R. Harle, "Pedestrian localisation for indoor environments," in Proceedings of the 10th International Conference on Ubiquitous Computing (UbiComp '08), pp. 114–123, Seoul, Republic of Korea, September 2008.

[28] F. Aloul, A. Sagahyroon, A. Al-Shami, I. Al-Midfa, and R. Moutassem, "Using mobiles for on campus location tracking," in Proceedings of the 7th International Conference on Advances in Mobile Computing and Multimedia, MoMM2009, pp. 231–235, Malaysia, December 2009.

[29] S. Woo, S. Jeong, E. Mok et al., "Application of WiFi-based indoor positioning system for labor tracking at construction sites: A case study in Guangzhou MTR," Automation in Construction, vol. 20, no. 1, pp. 3–13, 2011.

[30] N. Le Dortz, F. Gain, and P. Zetterberg, "WiFi fingerprint indoor positioning system using probability distribution comparison," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '12), pp. 2301–2304, March 2012.

[31] Z. Chen, H. Zou, H. Jiang, Q. Zhu, Y. C. Soh, and L. Xie, "Fusion of WiFi, smartphone sensors and landmarks using the kalman filter for indoor localization," Sensors, vol. 15, no. 1, pp. 715–732, 2015.

[32] Y. Shu, C. Bo, G. Shen, C. Zhao, L. Li, and F. Zhao, "Magicol: indoor localization using pervasive magnetic field and opportunistic wifi sensing," IEEE Journal on Selected Areas in Communications, vol. 33, no. 7, pp. 1443–1457, 2015.

[33] A. Danalet, B. Farooq, and M. Bierlaire, "A Bayesian approach to detect pedestrian destination-sequences from WiFi signatures," Transportation Research Part C: Emerging Technologies, vol. 44, pp. 146–170, 2014.


[34] O. Czogalla and S. Naumann, "Pedestrian indoor navigation for complex public facilities," in Proceedings of the 2016 International Conference on Indoor Positioning and Indoor Navigation, IPIN 2016, pp. 1–8, Spain, October 2016.

[35] L. Schauer, P. Marcus, and C. Linnhoff-Popien, "Towards feasible Wi-Fi based indoor tracking systems using probabilistic methods," in Proceedings of the 2016 International Conference on Indoor Positioning and Indoor Navigation, IPIN 2016, pp. 1–8, Spain, October 2016.

[36] F. Meneses and A. Moreira, "Large scale movement analysis from WiFi based location data," in Proceedings of the 2012 International Conference on Indoor Positioning and Indoor Navigation, IPIN 2012, Australia, November 2012.

[37] B. Bonne, A. Barzan, P. Quax, and W. Lamotte, "WiFiPi: involuntary tracking of visitors at mass events," in Proceedings of the IEEE 14th International Symposium on a World of Wireless, Mobile and Multimedia Networks (WoWMoM '13), pp. 1–6, Madrid, Spain, June 2013.

[38] M. B. Kjaergaard, M. Wirz, D. Roggen, and G. Troster, "Mobile sensing of pedestrian flocks in indoor environments using WiFi signals," in Proceedings of the 10th IEEE International Conference on Pervasive Computing and Communications (PerCom '12), pp. 95–102, Lugano, Switzerland, March 2012.

[39] S. Sendra, M. Garcia, C. Turro, and J. Lloret, "People mobility behaviour study in a university campus using WLANs," in Proceedings of the 3rd International Conference on Mobile Ubiquitous Computing, Systems, Services and Technologies, UBICOMM 2009, pp. 124–129, Malta, October 2009.

[40] J. Scheuner, G. Mazlami, D. Schoni et al., "Probr - A Generic and Passive WiFi Tracking System," in Proceedings of the 41st IEEE Conference on Local Computer Networks, LCN 2016, pp. 495–502, UAE, November 2016.

[41] Y. Li, S. Williams, B. Moran, and A. Kealy, "Quantized RSS Based Wi-Fi Indoor Localization with Room Level Accuracy," in Proceedings of the International Global Navigation Satellite Systems 2018, 2018.

[42] A. Alessandrini, C. Gioia, F. Sermi, I. Sofos, D. Tarchi, and M. Vespe, "WiFi positioning and Big Data to monitor flows of people on a wide scale," in Proceedings of the 25th European Navigation Conference, ENC 2017, pp. 322–328, Switzerland, May 2017.

[43] L. Vu, K. Nahrstedt, S. Retika, and I. Gupta, "Joint bluetooth/wifi scanning framework for characterizing and leveraging people movement in university campus," in Proceedings of the 13th ACM International Conference on Modeling, Analysis and Simulation of Wireless and Mobile Systems (MSWiM '10), pp. 257–265, October 2010.

[44] M. Garcia, S. Sendra, C. Turro, and J. Lloret, "User's Macro and Micro-mobility Study using WLANs in a University Campus," International Journal On Advances in Internet Technology, vol. 4, no. 1, pp. 37–46, 2011.

[45] Y. Xu, I. D. G. Groeneveld, R. Sulzer, E. Theocharous, O. T. Willems, and M. S. Tryfona, "Determine activity based on the classified identity of users by using Wi-Fi monitoring," Geomatics Synthesis Group Project Report, 2016.

[46] A. Danalet, L. Tinguely, M. D. Lapparent, and M. Bierlaire, "Location choice with longitudinal WiFi data," Journal of Choice Modelling, vol. 18, pp. 1–17, 2016.

[47] M. Zhou, K. Sui, M. Ma, Y. Zhao, D. Pei, and T. Moscibroda, "MobiCamp: A campus-wide testbed for studying mobile physical activities," in Proceedings of the 3rd International Workshop on Physical Analytics, WPA 2016, pp. 1–6, Singapore, 2016.

[48] A. Fernandez-Ares, A. M. Mora, M. G. Arenas et al., "Studying real traffic and mobility scenarios for a Smart City using a new monitoring and tracking system," Future Generation Computer Systems, vol. 76, pp. 163–179, 2017.

[49] A. E. Redondi and M. Cesana, "Building up knowledge through passive WiFi probes," Computer Communications, vol. 117, pp. 1–12, 2018.

[50] L. Huang, K. Matsuura, H. Yamanet, and K. Sezaki, "Enhancing wireless location privacy using silent period," in Proceedings of the 2005 IEEE Wireless Communications and Networking Conference, WCNC 2005: Broadband Wireless for the Masses - Ready for Take-off, pp. 1187–1192, USA, March 2005.

[51] J. Martin, T. Mayberry, C. Donahue et al., "A Study of MAC Address Randomization in Mobile Devices and When it Fails," Proceedings on Privacy Enhancing Technologies, vol. 2017, no. 4, pp. 365–383, 2017.

[52] M. Vanhoef, C. Matte, M. Cunche, L. S. Cardoso, and F. Piessens, "Why MAC address randomization is not enough: an analysis of Wi-Fi network discovery mechanisms," in Proceedings of the 11th ACM on Asia Conference on Computer and Communications Security, pp. 413–424, ACM, Xi'an, China, June 2016.

[53] A. J. Ruiz-Ruiz, H. Blunck, T. S. Prentow, A. Stisen, and M. B. Kjaergaard, "Analysis methods for extracting knowledge from large-scale WiFi monitoring to inform building facility planning," in Proceedings of the 2014 12th IEEE International Conference on Pervasive Computing and Communications, PerCom 2014, pp. 130–138, Hungary, March 2014.

[54] N. Nunes, M. Ribeiro, C. Prandi, and V. Nisi, "Beanstalk - A community based passive Wi-Fi tracking system for analysing tourism dynamics," in Proceedings of the 9th ACM SIGCHI Symposium on Engineering Interactive Computing Systems, EICS 2017, pp. 93–98, Portugal, June 2017.

[55] C. Parent, N. Pelekis, Y. Theodoridis et al., "Semantic trajectories modeling and analysis," ACM Computing Surveys, vol. 45, no. 4, pp. 1–32, 2013.

[56] C. Wei, "Mining of User Behavioral Features Based on Indoor Semantic Trajectories," Boletín Técnico, ISSN: 0376-723X, vol. 55, 2017.

[57] K. V. Long, D. Quang, and N. Klara, Lessons learned from bluetooth/wifi scanning deployment in university campus, Urbana, Illinois, USA, 2010.

[58] V. Radu and M. K. Marina, "HiMLoc: indoor smartphone localization via activity aware pedestrian dead reckoning with selective crowdsourced WiFi fingerprinting," in Proceedings of the International Conference on Indoor Positioning and Indoor Navigation (IPIN '13), pp. 1–10, IEEE, Montbeliard-Belfort, France, October 2013.

[59] F. Hong, Y. Zhang, Z. Zhang, M. Wei, Y. Feng, and Z. Guo, "WaP: Indoor localization and tracking using WiFi-Assisted Particle filter," in Proceedings of the 39th Annual IEEE Conference on Local Computer Networks, LCN 2014, pp. 210–217, Canada, September 2014.

[60] Y. Zeng, P. H. Pathak, and P. Mohapatra, "Analyzing shopper's behavior through WiFi signals," in Proceedings of the 2nd Workshop on Physical Analytics, WPA 2015, pp. 13–18, Italy, 2015.

[61] P. Sapiezynski, A. Stopczynski, R. Gatej, and S. Lehmann, "Tracking human mobility using WiFi signals," PLoS ONE, vol. 10, no. 7, p. e0130824, 2015.

[62] Z. Tian, Y. Jin, M. Zhou, Z. Wu, and Z. Li, "Wi-Fi/MARG Integration for Indoor Pedestrian Localization," Sensors, vol. 16, no. 12, p. 2100, 2016.


[63] B. Zhou, Q. Li, Q. Mao, and W. Tu, "A Robust Crowdsourcing-Based Indoor Localization System," Sensors, vol. 17, no. 4, p. 864, 2017.

[64] "Raspberry Pi," https://www.raspberrypi.org.

[65] "TP-LINK Wi-Fi USB dongle datasheet," http://www.tp-link.com/us/products/details/cat-5520_TL-WN722N.html.

[66] "ISO/IEC 20922:2016 - Information technology – Message Queuing Telemetry Transport (MQTT) v3.1.1," https://www.iso.org/standard/69466.html.

[67] G. Piateski-Shapiro and W. Frawley, Knowledge Discovery in Databases, MIT Press, Cambridge, MA, USA, 1991.

[68] J. Andión Jiménez, J. M. Navarro González, M. Álvarez-Campana Fernández-Corredor, and J. C. Dueñas López, "A passive, non-intrusive, cheap method to identify behaviours and habits in the Campus," in Proceedings of the XIII Jornadas de Ingeniería Telemática - JITEL2017, vol. 40, no. 47, pp. 10–4995, September 2017.

[69] A. Mylinikov, "Geo project," https://www.mylnikov.org.

[70] K. Mehmet and G. Ananth, "PROXIMUS: A framework for analyzing very high dimensional discrete-attributed datasets," in Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '03, vol. 147, no. 156, pp. 147–156, USA, August 2003.

[71] "UPM Library occupation," http://ceiboard.dit.upm.es/smart-campus/biblio.

[72] C. A. R. Inarejos, A. Rodríguez, G. López, and M. Álvarez-Campana, "Análisis de la huella de carbono de la ETSIT de la UPM y propuesta de mejora basada en datos de la plataforma IoT Smart CEI Moncloa," in Proceedings of the I Congreso Iberoamericano de Ciudades Inteligentes (ICSC-CITIES 2018), 2018.


three least significant bytes of the MAC address. This can be easily detected just by inspecting the first byte of the MAC address. In the case of Android, some manufacturers have decided to use random MAC addresses in the probe frames, drawn from the MAC address ranges assigned to them by the IEEE. Nevertheless, even with these techniques in place, it is possible to end up obtaining the actual device information [51, 52]. In addition, MAC randomization is not actually a relevant problem in the scenario considered in this paper, since most of the devices are connected to the available Wi-Fi networks (e.g., Eduroam), and to connect to a Wi-Fi network devices must use their actual MAC addresses.

Wi-Fi tracking systems can be classified into two main groups: those that use the enterprise Wireless Local Area Network (WLAN) itself and those that use a dedicated low-cost passive Wi-Fi infrastructure, which is the case of the actual deployment considered in this paper. One of the main drawbacks of the systems that use the enterprise WLAN is that they can only track the users of such networks, whereas independent, dedicated, low-cost passive Wi-Fi infrastructures allow tracking users connected to different Wi-Fi networks, if they spatially coexist, or even users not connected to any network, provided that MAC randomization is not used.

As examples of works that use the enterprise WLAN or the existing infrastructure of access points (APs), [42] presents the 2016 Open Day of the European Joint Research Center (JRC), in which 8000 people participated within an area of about 167 hectares, as a case study where the Wi-Fi infrastructure of the event was used to estimate the number of participants and their evolution in space and time based on properly processed MAC addresses. Reference [47] presents MobiCamp, a large-scale testbed composed of around 2700 APs to study mobility-related activities, which combines user mobility traces based on Simple Network Management Protocol (SNMP) data with enriched data (e.g., gender, age) provided by an opt-in smartphone application.

Reference [53] represents yet another example of the analysis that can be made with this kind of information. Its scenario is a hospital and, by checking the number of hours and the positions of a user over time, the authors can classify that user according to a role, e.g., patient, doctor, or administrative staff. Reference [38] identifies flocks walking in a building and their behaviors by applying clustering techniques to the signal strength measurements provided by the institutional WLAN.

Reference [33] presents a campus scenario where, by using the university network infrastructure, a detailed profile of the user's activity can be obtained. The users tracked are those logged into the university network, which provides extra information about each user, such as role and gender. Combining that information with a detailed map, which contains thousands of Points of Interest (POIs), the authors can extract an activity log that shows each user's activities with one-minute precision. The main drawback of this work is that it depends entirely on the users' profile database and the POI map, and both resources are difficult to access or create. In reference [45], which employs the university network infrastructure and the location of each AP on the university campus, each sensor record only stores the closest AP. With this simple information, the authors can create an activity profile similar to the one shown in [33]. By analyzing the basic results obtained, they are able to extract new information (e.g., a count of irregular hours) or detect patterns of anomalous events (e.g., exam periods or holidays).

As a last example of systems that use the institutional WLAN, reference [40] presents a web application in which the occupation of different rooms on a campus is shown in real time. The number of people in a room is calculated from the number of Wi-Fi devices detected by the APs. Using the signal strength measurement of each device, a heatmap is drawn that shows the user distribution in the room. The collected data are analyzed offline to produce reports on the utilization of the different rooms and to draw conclusions from the detected patterns.

On the other hand, [37, 48, 49, 54] are examples of works that use independent, dedicated, low-cost passive Wi-Fi infrastructures, as is the case of the deployment considered in this paper. In [48], a network of devices called MOBYWIT, each based on a Raspberry Pi and two wireless USB dongles, is used to track the movement of people and vehicles by sniffing not only Wi-Fi but also Bluetooth signals emitted by smartphones and vehicle hands-free calling systems. In [49], a passive Wi-Fi infrastructure based on low-cost devices that combine a Raspberry Pi and a TP-LINK Wi-Fi dongle is used to provide user localization, user profiling, and device classification based on the properly processed MAC addresses captured from IEEE 802.11 probe request frames. Reference [37] also uses this approach to count people at a concert, and in reference [54] the considered scenario is a whole island (Madeira, Portugal), where the records gathered across it are analyzed to classify users as tourists or locals, as well as to identify tourist spots.

One of the main features that make the work presented in this paper stand out from previous work is that one year of data from an actual Wi-Fi tracking system deployed in a real-life environment is analyzed. Most previous works consider hours, a few days (e.g., weekdays), or weeks. Only the work presented in [54] covers a similar period of time (an even longer one), but its analysis is much broader and far from the level of detail provided in this paper. The considered period of time allows analyzing seasonality effects and other patterns that, although they may be seen as common knowledge, do bring value, since they represent numerical evidence that supports decision making (e.g., someone may think that the Wi-Fi access in a given area does not work properly because the area is always overcrowded, but numerical evidence is needed to justify the investment of increasing the number of APs of the corporate WLAN in that area to improve the service). In addition, such well-known patterns, when obtained automatically by processing the available data, become models that can be used to detect anomalies or atypical situations, as is common practice in unsupervised machine learning. It is also worth mentioning the use of clustering to improve the data analysis and interpretation (as in previous works, e.g., [38]), as well as the application of the semantic trajectory concept [55], which combines positioning data with an external source of information to classify the different positions according to the activity carried out in the area (e.g., users positioned in the canteen will be eating) and which has not been extensively explored in previous works (e.g., [56]).

Figure 3: (a) Raspberry Pi; (b) TP-LINK USB Wi-Fi dongle; (c) developed Wi-Fi listening device.

Another strong point of the work presented in this paper compared to previous work is the validation of the Wi-Fi tracking technique to estimate the occupation of the library of the Telecommunications School and its comparison with a video camera system. Only a few previous works, such as [40] or [49], perform a similar validation (in [40], the number of people detected in the room is compared with the attendance list of the seminar taking place there, and in [49], students are requested to turn on the Wi-Fi interfaces of their devices and provide the MAC addresses of their smartphones and laptops through an anonymous web form to serve as ground truth for device classification). However, again, the period considered in this paper is much longer, and it is shown that the Wi-Fi tracking system outperforms the more expensive video camera system. As a matter of fact, the Wi-Fi tracking system is actually used on a day-to-day basis by the library staff, which illustrates the value that this kind of IoT infrastructure can bring to real-life problems and services.

3. Data Acquisition Infrastructure

Figure 3 shows the Wi-Fi listening device/sensor developed for the people flow monitoring sensor network of the Smart CEI Moncloa. It is based on very common and cheap hardware, namely, a Raspberry Pi board [64], one of the most widely used hardware platforms for IoT, and a TP-LINK USB Wi-Fi dongle [65] configured in monitor mode. As a result, the cost of this solution is in the order of tens of euros (around 80 euros when manufactured in 2016), which represents a remarkable cost reduction compared to other solutions available in the market.

As Figure 4 illustrates, broadly speaking, these Wi-Fi sensors scan each of the Wi-Fi channels of both the 2.4 GHz and the 5 GHz bands for a configurable amount of time (currently 250 ms), read the headers of the IEEE 802.11 packets (e.g., data packets or probe requests) transmitted in their region of coverage, and record the sender MAC addresses. As these MAC addresses are unique per device, counting them is a good indicator of the number of devices present in the surroundings of the Wi-Fi sensors (although there may be issues related to the fact that a single user can carry several devices, as already pointed out in Section 2), and they allow for temporal correlation analysis, thus yielding useful information such as stay time and availability patterns.
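The counting idea in this paragraph can be illustrated with a minimal sketch. It does not reproduce the sensor firmware: the `Frame` record, its field names, and the example channel list are hypothetical, and only the 250 ms dwell time comes from the text.

```python
from collections import namedtuple

# Hypothetical record for a sniffed IEEE 802.11 header; field names are illustrative.
Frame = namedtuple("Frame", ["channel", "frame_type", "sender_mac"])

DWELL_S = 0.250  # per-channel listening time quoted in the text


def sweep_duration(channels, dwell_s=DWELL_S):
    """Time needed to listen once on every channel in the list."""
    return len(channels) * dwell_s


def devices_seen(frames):
    """Distinct sender addresses, regardless of frame type or channel."""
    return {f.sender_mac.lower() for f in frames}


frames = [
    Frame(1, "data", "AA:BB:CC:00:00:01"),
    Frame(1, "data", "aa:bb:cc:00:00:01"),  # same device, second frame
    Frame(6, "probe-req", "AA:BB:CC:00:00:02"),
    Frame(36, "data", "AA:BB:CC:00:00:03"),
]
print(len(devices_seen(frames)))       # 3
print(sweep_duration([1, 6, 11, 36]))  # 1.0
```

Counting a set of addresses rather than frames is what makes the figure a device count: a device that sends many frames during one sweep still contributes once.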

Regarding how this information is carried all the way up to the backend server and processed there, Figure 5 shows an overview of the communications architecture and protocol stack of the people flow monitoring service of the Smart CEI Moncloa.

As can be seen, the Wi-Fi sensors are directly connected to the backend via the Ethernet network of the UPM. Communications are protected end-to-end by the use of Transport Layer Security (TLS) on top of Transmission Control Protocol (TCP)/Internet Protocol (IP). Measurements are periodically sent using Message Queuing Telemetry Transport (MQTT) [66]. The publish/subscribe mechanism provided by MQTT allows the Wi-Fi sensors not only to send measurements (i.e., events) periodically but also to receive commands (e.g., to reboot them or to perform a remote firmware update).

Taking advantage of the hierarchical structure of MQTT topics, all the publish events follow the structure SERVICE/ID/EVENT(/TIMESTAMP). Thus, the publish events from the Wi-Fi sensors start with Wi-Fi, followed by the MAC address of the sensor's Ethernet interface, which is used as a unique ID. The content published under the different topics is formatted as Comma Separated Values (CSV), which is a lightweight solution especially appropriate when the data structure is fixed, since the meaning of each field of the subsequent lines is explained only once, in the header at the beginning of the file.
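As an illustration of the topic hierarchy and CSV payload described here, the following sketch builds both as plain strings. It assumes "/" as the MQTT level separator and an optional trailing timestamp level; the event name `devices`, the sensor ID, and the column names are hypothetical, since the text does not list them.

```python
import csv
import io


def build_topic(service, sensor_id, event, timestamp=None):
    """Join the topic levels as SERVICE/ID/EVENT, with an optional TIMESTAMP level."""
    parts = [service, sensor_id, event]
    if timestamp is not None:
        parts.append(str(timestamp))
    return "/".join(parts)


def build_csv_payload(header, rows):
    """Serialize a header line followed by data lines; fields are named only once."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical sensor ID (Ethernet MAC without separators) and event name.
topic = build_topic("Wi-Fi", "b827eb000001", "devices", timestamp=1451606400)
payload = build_csv_payload(["minute", "device"], [[0, "abc"], [1, "abc"]])
print(topic)
print(payload)
```

Because the header carries the field names once, each data line stays short, which is the property the text highlights for fixed data structures.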

Privacy issues have also been considered carefully: the developed Wi-Fi sensors apply an irreversible salted MD5 hash function to the MAC address, which prevents brute-force attacks with precomputed tables. In addition, as Figure 5 shows, once anonymized, the data are carried securely up to the platform servers, where they are handled in an aggregate manner instead of individually.
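A minimal sketch of such salted hashing, using Python's standard `hashlib`, is given below. The text only states that MD5 with a salt is used; how the salt is generated, stored, and combined with the address (here, simple prefixing of a normalized address) is an assumption made for illustration.

```python
import hashlib
import os


def anonymize_mac(mac, salt):
    """Return a hex digest of salt + normalized MAC.

    Assumption: the salt is prepended to the lowercase, colon-separated
    address; the actual sensors may combine them differently.
    """
    normalized = mac.lower().replace("-", ":").encode("ascii")
    return hashlib.md5(salt + normalized).hexdigest()


salt = os.urandom(16)  # hypothetical: one secret salt shared by all sensors
print(anonymize_mac("AA:BB:CC:DD:EE:FF", salt))
```

The same address always maps to the same digest under a given salt, so temporal correlation per device remains possible, while a table precomputed without the salt does not match the stored values.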

Figure 4: Sketch of how the developed Wi-Fi listening devices obtain the MAC addresses of the users surrounding them. [Diagram labels: AP1 (ch1), AP2 (ch2), AP3 (ch3), AP4 (ch4); User1 to User4; Data MAC_U1 (ch1); Data MAC_U3 (ch3); Probe Req MAC_U2 (chN); Probe Req MAC_U4 (chM); the developed Wi-Fi listening device listens "x" ms on each channel.]

Figure 5: Communications architecture and protocol stack for the people flow monitoring service of the Smart CEI Moncloa. [Diagram labels: Wi-Fi sensors and backend server, each with the stack MQTT / TLS / TCP / IP / 802.3; backend components: MQTT broker, real-time & batch processing, CSV file-based temporal DB, REST API server, dashboard web server.]

Furthermore, the software of the developed sensors has been modified to prevent the MAC anonymization mechanisms presented in Section 2 from affecting the obtained measurements. Thus, Wi-Fi frames with locally administered MAC addresses, or with addresses in special MAC address ranges, are discarded, so these devices are not considered. In any case, as already mentioned in Section 2, MAC randomization is not a major issue in our case, since most of the users are connected to the free Eduroam Wi-Fi access, so their smartphones end up using their actual MAC addresses.
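The check for locally administered addresses can be sketched as follows. In the IEEE 802 address format, the second least significant bit of the first octet (mask 0x02) marks an address as locally administered. The `blocked_prefixes` parameter is a hypothetical stand-in for the "special MAC address ranges" mentioned above, which the text does not enumerate.

```python
def is_locally_administered(mac):
    """True if the U/L bit (0x02) of the first octet is set."""
    first_octet = int(mac.replace("-", ":").split(":")[0], 16)
    return bool(first_octet & 0x02)


def keep_frame(sender_mac, blocked_prefixes=()):
    """Decide whether a frame's sender should be counted.

    blocked_prefixes is a hypothetical list of address prefixes to discard.
    """
    mac = sender_mac.lower().replace("-", ":")
    if is_locally_administered(mac):
        return False
    return not any(mac.startswith(p.lower()) for p in blocked_prefixes)


print(keep_frame("B8:27:EB:12:34:56"))  # True: globally administered address
print(keep_frame("DA:A1:19:00:00:01"))  # False: locally administered address
```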

After gathering the CSV files for a long time span, we moved to an offline analysis platform on a private cluster. The Apache Spark 2.2 software platform was selected as the data processing tool due to its optimized capabilities for working with large amounts of structured data. Such a time span leads to large datasets, which a common centralized system will struggle to process but which distributed systems such as Spark can handle easily.

The private cluster is composed of eight HP ProLiant SL250s Gen8 machines, each with two Intel Xeon E5-2630 v2 2.6 GHz processors (6 cores each) and 32 GB of RAM. In addition to Spark, this platform also runs an Apache Hadoop filesystem, where the dataset and the results are stored. This is a distributed filesystem that allows all the machines to access the stored data in parallel. The use of such a platform is advisable in order to speed up the analysis, but it is not mandatory, since the algorithms we are going to describe are available in many other software platforms (such as those provided with R or Python).

4. Data Analysis and Results

4.1. Data Processing. Although this work is not a proper data mining process, given that we are not using those kinds of algorithms and analyses, the steps needed before the actual analysis are the same as in a KDD (Knowledge Discovery in Databases) process [67]. For our analysis, we took the aforementioned CSV files, each one containing the data collected by a single sensor during a period of 15 minutes. The observation period used in our analysis is a full year, from January 2016 to December 2016. Over one year, 9 sensors each generating a file every fifteen minutes would create around 315K files. This is a theoretical maximum, because a file is only created if the sensor is active and detects at least one device during the period. The sensors were deployed at the beginning of 2016, but the deployment was not completely finished until March. There were also some holidays during 2016 when ETSIT was fully closed. During those days, some of the sensors detected no devices (although there are always security staff somewhere in the covered area), so they did not generate a file. As a result, the number of files actually available for the analysis is 246K.

Figure 6: Histogram of users by number of registered hours (horizontal axis: hours; vertical axis: number of users, logarithmic scale): (a) raw analysis; (b) first threshold applied.
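As a quick check of the upper bound of around 315K files quoted in Section 4.1, the arithmetic can be written out as below. The helper name and the rounding of the actual file count to 246,000 are illustrative; only the number of sensors, the 15-minute file period, and the two approximate counts come from the text.

```python
def max_files(sensors, minutes_per_file, days):
    """Upper bound on files if every sensor writes one file per period all year."""
    files_per_day = (24 * 60) // minutes_per_file
    return sensors * files_per_day * days


bound_365 = max_files(sensors=9, minutes_per_file=15, days=365)
bound_366 = max_files(sensors=9, minutes_per_file=15, days=366)  # 2016 was a leap year
print(bound_365, bound_366)  # 315360 316224

ACTUAL_FILES = 246_000  # approximate figure from the text
print(round(ACTUAL_FILES / bound_365, 2))  # 0.78
```

Under these assumptions, roughly 78% of the theoretically possible files were produced, which is consistent with the incomplete deployment until March and the holiday closures mentioned in the text.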

The first step in preprocessing these data was to merge all the files into a single one, resulting in a 2 GB plain-text CSV file where each row represents the observation of a device during the associated time period, including, among others, the anonymized MAC address, the sensor name, and the detection of the device for a given minute in the observation time span. With this transformation, the resulting dataset contains 63 million rows, each one representing the detection of a single device in a single minute by a single sensor.

A total of 128K unique devices were detected during the whole year. It is known that the number of people who regularly go to ETSIT is around 4K, so not all of the detected devices can be considered actual users of interest; consequently, the data needed to be filtered. Only the devices of people who perform activities related to the place, such as students and professors, should be taken into account. Thus, it is necessary to filter out devices, keeping only the ones that can be labelled as users. We apply filters based on the observations of each device.

So, for each device, we count the number of minutes it was detected in the entire dataset. A device can be seen by more than one sensor during the same minute, so, to generate this measure, repeated minutes are counted only once per device. Grouping the resulting counts, a histogram (Figure 6) is obtained, from which it is possible to classify the devices based on the total time recorded during the whole year.
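The per-device minute count with deduplication across sensors can be sketched in plain Python as follows. The authors ran their analysis on Apache Spark; this standalone version only illustrates the logic, and the row layout `(minute, device, sensor)` is an assumption.

```python
from collections import defaultdict


def minutes_per_device(rows):
    """Count distinct detection minutes per device.

    rows: iterable of (minute, device, sensor) tuples (assumed layout).
    A minute seen by several sensors is counted once for that device.
    """
    seen = defaultdict(set)
    for minute, device, _sensor in rows:
        seen[device].add(minute)
    return {device: len(minutes) for device, minutes in seen.items()}


rows = [
    (0, "dev-a", "s1"),
    (0, "dev-a", "s2"),  # same minute, second sensor: not counted twice
    (1, "dev-a", "s1"),
    (0, "dev-b", "s3"),
]
print(minutes_per_device(rows))  # {'dev-a': 2, 'dev-b': 1}
```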

To facilitate the analysis of the chart, the horizontal axis is expressed in hours and the count on the vertical axis is shown in logarithmic scale. Around 95% of the devices were seen for less than 48 hours during the whole year, i.e., less than four hours per month on average. This group is mainly composed of people passing near the school buildings who momentarily enter the coverage area of the sensors without accessing ETSIT. Figure 6(b) shows the histogram after applying a lower threshold of 48 hours and an upper threshold of 3650 hours, an average of 10 hours per day. In this chart, some isolated peaks can be observed in the tail of the graph, starting around 1000 hours on the horizontal axis. A detailed analysis of these peaks revealed that they corresponded to devices that remained connected continuously for several days, like servers. So we applied the label "user" only to those devices that registered a number of hours during the year between these two thresholds. As shown in Table 3, of the 128K detected devices, only 4653 were classified as users, over which we perform the rest of the analysis.
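The two-threshold labelling described above can be sketched as a simple filter over the per-device totals. The 48-hour and 3650-hour thresholds come from the text; treating both bounds as inclusive, and the example device names and totals, are assumptions.

```python
LOWER_H = 48    # lower threshold in hours (from the text)
UPPER_H = 3650  # upper threshold in hours (from the text)


def label_users(minutes_by_device, lower_h=LOWER_H, upper_h=UPPER_H):
    """Return the devices whose yearly total lies between the two thresholds.

    Assumption: both bounds are inclusive; the text does not specify this.
    """
    return {
        device
        for device, minutes in minutes_by_device.items()
        if lower_h <= minutes / 60 <= upper_h
    }


totals = {
    "passer-by": 30,         # half an hour in the whole year
    "student": 600 * 60,     # 600 hours
    "server": 8000 * 60,     # almost always present
}
print(label_users(totals))  # {'student'}
```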

Table 3: Dataset statistics before and after preprocessing.

                        Rows              Devices
Before preprocessing    63427489          128188
After preprocessing     41294344 (65%)    4653 (3.6%)

Figure 7: Daily analysis aggregated during 2016: (a) minutes accumulated; (b) unique users registered; (c) ratio between minutes and users. Horizontal axis: date (Jan to Dec); vertical axes: accumulated minutes [×1000], number of users [×100], and ratio [minutes/user].

The last step before the proper analyses is to merge this dataset with the information about the position and name of the sensors. Additionally, only the data columns with relevant information for the analyses are kept, in order to reduce the in-memory cost:

(1) timestamp: the detection time, measured in minutes.

(2) user: the MAC address of the detected device.

(3) sensor: the MAC address of the sensor which made the detection.

(4) sensorName: the name of the sensor which made the detection.

(5) accessPoint: the MAC address of the SSID to which the device is connected, if any.

(6) sensorLat: latitude of the sensor's position.

(7) sensorLng: longitude of the sensor's position.

4.2. Temporal Analysis. Once the data was ready, we performed a temporal study to determine whether the captured data allowed for the identification of significant periods of activity or trends in user behavior in ETSIT. As one year is too long for a minute-level analysis, we aggregated our data by days and by hours.

Figures 7 to 10 are organized into groups of charts. The first one, the accumulated time chart, shows the number of activity minutes registered by each sensor during a time slice. The second chart presents the number of unique users seen by each sensor during the time slice. In addition, Figures 7 and 8 include a third chart with the ratio of accumulated time over unique users, which provides a hint on how stationary users are: peaks in the ratio chart represent moments when users are still (e.g., students in class), and dips are associated with transition times (e.g., students arriving at the school).

First, Figure 7 shows an overview of these metrics throughout the year. In this figure, the measurements of the 9 sensors are aggregated into a single line. In the case of Figure 7(a), the result is not exactly equal to the sum of the activity minutes each sensor accumulates, since a user can be detected in the same minute by different sensors, and these occasions are represented as a single instant in this line.
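The three aggregated metrics (accumulated minutes, unique users, and their ratio) can be computed per day along these lines. This is a sketch under stated assumptions: the function name `daily_metrics` and the `(user, sensor, minute)` record layout, with minutes counted from the start of the dataset, are hypothetical.

```python
from collections import defaultdict

MINUTES_PER_DAY = 24 * 60


def daily_metrics(records):
    """Compute accumulated minutes, unique users and their ratio per day.

    `records` is an iterable of (user, sensor, minute) tuples; `minute`
    counts minutes from the start of the dataset (hypothetical layout).
    Returns {day: (accumulated_minutes, unique_users, ratio)}.
    """
    user_minutes = defaultdict(set)  # day -> {(user, minute)}
    users = defaultdict(set)         # day -> {user}
    for user, _sensor, minute in records:
        day = minute // MINUTES_PER_DAY
        # The sensor is left out of the key, so a minute seen by several
        # sensors is counted as a single instant for that user.
        user_minutes[day].add((user, minute))
        users[day].add(user)

    metrics = {}
    for day in sorted(users):
        accumulated = len(user_minutes[day])
        unique = len(users[day])
        metrics[day] = (accumulated, unique, accumulated / unique)
    return metrics
```

Adding the sensor to the grouping key would give the per-sensor variant of the same metrics.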

Figure 8: Daily analysis during 2016: (a) minutes accumulated by each sensor; (b) unique users registered by each sensor; (c) ratio between minutes and users by each sensor. Horizontal axis: date (Jan to Dec); vertical axes: accumulated minutes [×1000], number of users [×100], and ratio [minutes/user]; one line per sensor (Entr A, Entr A Sec, Entr B, Entr B Lab, Entr B Sec, Entr C, Entr D, Library, Std Tables).

The most obvious observation that can be drawn from Figure 7 is the presence of anomalous behavior during the months of January and February. The reason is that, as previously discussed, the sensor network began to be deployed in January 2016 and was not completely operative until March. We keep this data in the analysis because it can be used to compare an anomalous situation with a regular one; also, it is much more intuitive to present a whole year range instead of nine months. Nevertheless, data collected in those months are not used to draw conclusions.

Holiday periods are clearly visible in the unique users chart as drops of the line, for example, in March (Easter), the summer holidays, or some isolated holidays. These periods can also be seen in the ratio chart, which increases because on those dates far fewer users attend the school, but those who do usually spend long periods of time in the library. A remarkable point is October 12th, on which the ratio chart reaches its highest peak and the number of users is almost zero. That day even the library was closed, so only the security staff was in the school. We can also observe the effect of weekends on each chart: both the number of users and the accumulated minutes decrease, but the ratio increases for the same reason given for the holidays, since users study at the library during the weekends. Finally, from May to June a decreasing trend can be observed both in the unique users and in the accumulated minutes, which go up again in mid-June for the examination period.

Figure 8 shows a second set of graphs that correspond to a daily analysis of the whole year for each sensor. One of the first conclusions that can be drawn is the difference between the proportions in the unique users chart and the accumulated minutes chart: the difference between sensors in the number of registered users is not as remarkable as the difference in the number of accumulated minutes. Again, this is due to the fact that users spend much longer periods of time in the library than in other areas. Observing the line of the sensor of building C in the accumulated minutes chart, it can be seen that its magnitude is maintained throughout the year. This sensor covers a building where there are professors' offices and research laboratories; i.e., this sensor registers mostly users who work at the school, and they maintain a more regular schedule throughout the year than students who go to classes or to the library.

Figure 9: Calendar view: (a) unique users registered by sensor Entr A Sec; (b) minutes accumulated by sensor Entr A Sec; (c) unique users registered by sensor Entr C; (d) minutes accumulated by sensor Entr C. Rows: days of the week (Sun to Sat); columns: months (Jan to Dec).

Figure 10: Hourly analysis during first week of November: (a) minutes accumulated by each sensor; (b) unique users registered by each sensor. Horizontal axis: date (Mon 05 to Sun 11); vertical axes: accumulated minutes [×1000] and number of users [×100]; one line per sensor (Entr A, Entr A Sec, Entr B, Entr B Lab, Entr B Sec, Entr C, Entr D, Library).

Table 4: Sensor records collisions.

Collisions    Count       Total       Percent (%)
9             3           27          0.00
8             45          360         0.00
7             421         2947        0.01
6             5371        32226       0.08
5             59222       296110      0.72
4             560994      2243976     5.46
3             2525718     7577154     18.42
2             7268825     14537650    35.35
1             16433862    16433862    39.96

Line charts can be difficult to interpret for periods as long as a whole year. To ease the visual analysis, a calendar visualization is offered in which the same data can be studied in a more intuitive way. Figure 9 presents the full-year view in calendar format, with each row representing one day of the week and the months delimited in black. The color intensity of each cell is proportional to the value it represents. Figure 9 is formed by 4 subfigures, which correspond to the count of users and the number of minutes registered by the sensors "Entr A Sec" and "Entr C". These visualizations are relative and can be used to obtain general conclusions. For a detailed study, visualizations with absolute measurements and an examination of the data are still necessary.

Figure 9(a) represents the number of users detected by the sensor "Entr A Sec", which covers the classrooms of this building, and Figure 9(b) the number of minutes accumulated by the same sensor. It is clear that this sensor was not operational during the first two months of the year; that on Saturdays and Sundays (last and first rows) it registers almost no activity; and that holiday periods, such as Easter in March or the summer holidays in July and August, are clearly identified. But this visualization also reveals information that is more difficult to observe in a line graph, such as that Mondays are the days on which users spend the least time in this area. Some blank cells also stand out in the last months of the year, revealing that the sensor did not work during those days.

Figures 9(c) and 9(d) represent the number of users detected and the minutes accumulated by the "Entr C" sensor. As mentioned in the description of Figure 8, this sensor yields very stable measures because it covers the research laboratories, and the workers' schedules are not affected by the school calendar, unlike the activity seen by other sensors such as "Entr A Sec".

Figure 10 presents a different view, plotted at the hour level during a week in November. This eases the identification of activity hours, which span from 8 in the morning to 22 in the evening. Another noticeable effect is the valley in the middle of the day, corresponding to the lunch break, when users move to the cantina (see Figure 2) or leave the buildings. Finally, it is observed that users leave the school gradually during the afternoon.

4.3. Position Analysis. Before getting into the details and insights obtained from the spatial analysis of the one-year dataset, it is worth mentioning that a set of tests was carried out in a controlled environment during the first stage of the deployment in order to check that the Wi-Fi tracking system worked properly. These tests included tracking a well-known group of MAC addresses throughout the Wi-Fi tracking sensors, checking that they appeared in the appropriate ones. It was also checked that the system correctly located and tracked the security staff throughout their night security tours.

A spatial analysis provides insight into how the users are distributed throughout the buildings during different times of the year. To reach these conclusions, we have improved the method presented in [68]. The first improvement is related to the event in which a user is detected by two or more sensors during the same minute. From now on, this event will be referred to as a collision. The number of collisions is significant in the case of sensors that are close to each other (e.g., Library and building D) or sensors that cover transition areas (e.g., Entr A or Std Tables).

Table 4 shows a study of the number of collisions. The order of a collision is the number of sensors that collide in the same minute, and the count is the number of occurrences of a collision of that order (the total is the corresponding number of rows). Collisions of order 2 and 3 account for more than 50% of the data. The solution to this situation was to eliminate these collisions by replacing, for this experiment, all the rows corresponding to a collision with a single row whose latitude and longitude are the centroid of the positions of the sensors participating in the collision.
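The collision handling described above can be sketched as follows. This is an illustrative sketch, not the authors' code: the function name `resolve_collisions` and the `(user, minute, sensor, lat, lng)` record layout are assumptions, and the centroid is taken as the plain arithmetic mean of the sensor coordinates, which is reasonable only over a small area such as a campus.

```python
from collections import Counter, defaultdict


def resolve_collisions(records):
    """Collapse same-minute detections of a user into one centroid row.

    `records` is an iterable of (user, minute, sensor, lat, lng) tuples,
    one per sensor detection (hypothetical layout). Returns the merged
    rows and a Counter mapping each collision order to the number of
    (user, minute) events of that order.
    """
    groups = defaultdict(dict)
    for user, minute, sensor, lat, lng in records:
        # Keyed by sensor, so a repeated report from one sensor counts once.
        groups[(user, minute)][sensor] = (lat, lng)

    merged = []
    orders = Counter()
    for user, minute in sorted(groups):
        positions = list(groups[(user, minute)].values())
        order = len(positions)
        orders[order] += 1
        lat = sum(p[0] for p in positions) / order
        lng = sum(p[1] for p in positions) / order
        merged.append((user, minute, lat, lng))
    return merged, orders
```

Rows of order 1 pass through unchanged, since the centroid of a single position is that position.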

The second improvement consists in the incorporation of external information with the approximate position of the access points to which the users are connected. This information has been obtained from the Mylnikov Geo API [69], getting the position of all the ESSIDs registered throughout the year. When the user is connected and the approximate position of the access point is available, this new position is used instead of the position of the sensor that generates the row. This allows measuring the user's position more accurately. In addition, it smooths the effect of using only the positions of the sensors, which causes the resulting possible positions to always lie inside the hull of the polygon formed by the positions of the sensors.

Figure 11: Monthly centroids heatmaps: (a) June; (b) August; (c) September.

This experiment is based on the concept of the user's centroid. This centroid is the average position of a certain user over a certain period of time. Representing these points on the map reveals hints of the users' behaviors over the studied time slice, based on the number of people in each area.
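A user's centroid, with the access point position taking precedence over the sensor position when it is known, can be sketched as below. The function name `user_centroids`, the row layout, and the `ap_positions` lookup table are hypothetical; the sketch does not call the external geolocation API and assumes its results are already available in a dictionary.

```python
from collections import defaultdict


def user_centroids(rows, ap_positions):
    """Average position of each user over the rows of a time slice.

    `rows` is an iterable of (user, sensor_lat, sensor_lng, access_point)
    tuples; `access_point` is None when the device is not connected.
    `ap_positions` maps an access point identifier to an approximate
    (lat, lng) pair. Both layouts are hypothetical.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for user, sensor_lat, sensor_lng, access_point in rows:
        # Prefer the access point position when one is known; otherwise
        # fall back to the position of the sensor that produced the row.
        lat, lng = ap_positions.get(access_point, (sensor_lat, sensor_lng))
        acc = sums[user]
        acc[0] += lat
        acc[1] += lng
        acc[2] += 1
    return {user: (s[0] / s[2], s[1] / s[2]) for user, s in sums.items()}
```

Feeding the function with the rows of a month or of a single hour yields the points that a heatmap of centroids would then aggregate.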

Figure 11 shows heatmaps of three different months. Figure 11(a) is June, a month belonging to the second semester of the academic year and the month when the final exams take place. Figure 11(b) is August, during the summer holidays, when there are no classes or exams and most of the professors, researchers, and staff are on vacation. Figure 11(c) represents September, the start of the semester.

Figure 11(b) confirms one of the facts extracted in the temporal analysis: on holidays the number of users falls, and most of the users present in the school are studying at the library (the warmest point is located over the library and is much smaller than in other months).

Both Figures 11(a) and 11(c) show that building A is the area with the highest concentration of users. Even so, there are differences between the two figures. In Figure 11(a), the warmest area in the map is over the library. This is explained by the final exam period of June. The same observation can be repeated in other periods to discover when the students have exams. In Figure 11(c), the hot spots are on the classrooms. September is the beginning of the school year, when class attendance is at its highest. Thus, this observation is an indicator of the level of students' class attendance.

The same type of analysis can be done with shorter time frames to obtain more detailed behaviors. Figure 12 shows the centroids heatmaps grouped by hours on September 2nd.

A detailed analysis by hours like this allows observing the users' movement throughout the day. The first row of maps, Figures 12(a), 12(b), 12(c), and 12(d), shows the evolution of the users' centroids at lunch time, between 12:00 and 16:00. In this transition, it can be seen that there are hot spots on the classrooms and the library at the beginning. On the next maps, those centroids move to the cantina (see Figure 2), and in the last map they return to their original positions. The second row of maps in Figure 12 presents the start and the end of the activity time in the school. In Figures 12(e) and 12(f), it can be observed how the first users of the day go directly to classes. By contrast, Figures 12(g) and 12(h) reveal that users tend to be at the library at the end of the day.

Finally, Table 5 presents the count of the different users detected by each sensor throughout the year. Recalling that the total number of users obtained in Section 4.1 is 4653, the data in the table reveal that, over the year, most users have been seen by each sensor at some time. The two exceptions are the sensor of building C and the sensor of the laboratories of building B. These sensors cover the professors' offices and research laboratories, so it is unusual for students to stay in those areas.

Table 5: Sensor annual statistics.

Sensor        Number of rows    Number of users
Library       14166793          4674
Entr D        6818165           4621
Std Tables    5625974           4670
Entr A        4736237           4620
Entr A Sec    2352019           4579
Entr B Sec    2258618           4530
Entr B Lab    2229560           3630
Entr C        1469026           3093
Entr B        1248462           4127

Figure 12: Hourly centroids heatmaps of 2nd September: (a) 12:00 to 13:00; (b) 13:00 to 14:00; (c) 14:00 to 15:00; (d) 15:00 to 16:00; (e) 8:00 to 9:00; (f) 9:00 to 10:00; (g) 19:00 to 20:00; (h) 20:00 to 21:00.

4.4. Behavior Analysis. The third set of experiments we performed deals with the behaviors that each user follows throughout a single day. To obtain them, we grouped the data using a user-day key. For each key, a vector of 24 positions, one per hour, is created. In each position of this vector, we record the sensor that detected this user for most of that hour. This vector therefore represents the route that the user followed throughout that day, hour by hour. Once the behavior vectors are obtained, the information about the day and the user is discarded in order to count the most repeated behavior vectors. The dimensionality of these vectors makes the number of possible behaviors huge (theoretically $24^{10} \approx 6 \times 10^{13}$, or $14^{10} \approx 289$K millions using only the activity hours), but it is known that users behave similarly, so a much lower number of behaviors can be expected, even though it will still be a high number.
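The construction of the behavior vectors and the count of the most repeated ones can be sketched as follows. The names `behavior_vectors` and `most_repeated`, the `(user, day, hour, sensor)` record layout, and the empty-string marker for hours without detections are assumptions made for illustration; how ties between sensors within an hour were resolved is not stated in the text, so the sketch simply keeps the sensor encountered first.

```python
from collections import Counter, defaultdict

EMPTY = ""  # marker for hours without detections (hypothetical choice)


def behavior_vectors(records):
    """Build one 24-slot vector per (user, day) with the dominant sensor per hour.

    `records` is an iterable of (user, day, hour, sensor) tuples, one per
    detected minute (hypothetical layout).
    """
    per_hour = defaultdict(Counter)
    for user, day, hour, sensor in records:
        per_hour[(user, day, hour)][sensor] += 1

    vectors = defaultdict(lambda: [EMPTY] * 24)
    for (user, day, hour), counts in per_hour.items():
        # The sensor with the most detected minutes wins the hour.
        vectors[(user, day)][hour] = counts.most_common(1)[0][0]
    return {key: tuple(vec) for key, vec in vectors.items()}


def most_repeated(vectors, top=20):
    """Drop the user and day, then count identical behavior vectors."""
    return Counter(vectors.values()).most_common(top)
```

Converting each vector to a tuple makes it hashable, which is what allows identical behaviors from different users and days to be counted together.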

Table 6 presents the 20 most repeated behaviors throughout the year. A total of 285K behaviors are detected, of which 139K are unique. The 500 most frequent behaviors account for 25% of the total. Table 6 shows only the part corresponding to the activity hours of the school, which, as observed in the temporal analysis, span from 8:00 am to 10:00 pm. As can be seen, the majority of the most repeated behaviors are stays of between two and five hours in the library. It is necessary to expand the analysis to the top 20 to observe the class attendance behaviors. It is easy to appreciate that the behaviors usually fall in either the morning or the afternoon, with lunchtime from 1:00 pm to 3:00 pm in between, which means that the majority of users go either in the morning or in the afternoon but do not spend the whole day at school.

Table 6: Top 20 most repeated behaviors, restricted to the 14 hourly slots between 8:00 and 22:00. For each behavior, the sensor recorded and the number of hourly slots in which it appears are listed.

Rank    Sensor        Hourly slots
1       Library       5
2       Library       4
3       Entr A        1
4       Library       3
5       Library       4
6       Library       1
7       Entr A        1
8       Library       6
9       Std Tables    14
10      Entr A        1
11      Entr A        1
12      Library       5
13      Library       2
14      Library       2
15      Entr A        1
16      Entr A        2
17      Entr A        1
18      Library       2
19      Entr A Sec    4
20      Entr A        1

Another quite obvious observation is that many of the behaviors obtained are very similar to one another (e.g., going to the library from 9:00 am to 2:00 pm or going to the library from 9:00 am to 1:00 pm). In order to lower this redundancy, we performed a clustering procedure on them. Each behavior can be understood as a categorical vector of length 24, in which the categories correspond to the 9 possible sensors of the dataset plus the empty category. The chosen clustering algorithm is Proximus [70], due to its simplicity, efficiency, scalability, and the reliability of its results. The algorithm works with binary vectors and creates clusters based on the Hamming distance (the number of bits that differ between two binary vectors). A vector is chosen as the center of the cluster, and other vectors whose distance to the center is smaller than the maximum cluster radius are added to that cluster.
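The center-and-radius grouping rule described above can be illustrated with a simple greedy pass. This sketch is not the Proximus algorithm [70], whose way of selecting cluster centers is not reproduced here; the function names and the choice of taking centers in order of appearance are assumptions made purely for illustration.

```python
def hamming(a, b):
    """Number of positions in which two equal-length binary vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def greedy_clusters(vectors, radius):
    """Group binary vectors around centers picked in order of appearance.

    A vector joins the first cluster whose center lies at a Hamming
    distance smaller than `radius`; otherwise it starts a new cluster
    and becomes its center. Simplified stand-in, not Proximus itself.
    """
    clusters = []  # list of (center, members)
    for vec in vectors:
        for center, members in clusters:
            if hamming(center, vec) < radius:
                members.append(vec)
                break
        else:
            # No existing center is close enough: open a new cluster.
            clusters.append((vec, [vec]))
    return clusters
```

With this rule the result depends on the order of the input, so sorting the behaviors by frequency beforehand would make the most repeated ones act as centers.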

The behaviors are expressed as categorical vectors, so it is necessary to transform them into binary vectors to be used in Proximus. The transformation shown in Equation (1) is proposed. This transformation is simple and also reversible, which allows for the recovery of the original behavior vectors after clustering.

\[
B = \{h_0, \ldots, h_{23}\}, \quad h_i \in \{0, a_1, \ldots, a_9\}
\]
\[
\vec{B} = \{\delta_{0,a_1}, \ldots, \delta_{0,a_9}, \ldots, \delta_{23,a_1}, \ldots, \delta_{23,a_9}\}
\]
\[
\delta_{i,a_j} =
\begin{cases}
0 & \text{if } h_i \neq a_j \\
1 & \text{if } h_i = a_j
\end{cases}
\tag{1}
\]

where $a_1, \ldots, a_9$ are the nine sensors, $B$ is the behavior vector described above, and $\vec{B}$ is the transformed vector of length $24 \times 9$, made of subgroups of 9 samples, each subgroup associated with one hour. Every sample in a subgroup is 0, except for the one at the index of the sensor active in that hour, which is set to 1.
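The transformation of Equation (1) and its inverse can be sketched as follows. The function names, the ordering of the sensors in `SENSORS`, and the use of `None` for the empty category are assumptions made for the example; any fixed ordering of the nine sensors would serve equally well.

```python
SENSORS = ["Library", "Entr A", "Entr A Sec", "Entr B", "Entr B Lab",
           "Entr B Sec", "Entr C", "Entr D", "Std Tables"]
EMPTY = None  # the "0" category: no sensor active in that hour


def to_binary(behavior):
    """Expand a 24-slot categorical vector into a 24 * 9 binary vector."""
    bits = []
    for hour_value in behavior:
        # One subgroup of 9 bits per hour; at most one bit is set.
        bits.extend(1 if hour_value == sensor else 0 for sensor in SENSORS)
    return bits


def to_categorical(bits):
    """Invert `to_binary`, recovering the sensor (or EMPTY) of each hour."""
    n = len(SENSORS)
    behavior = []
    for start in range(0, len(bits), n):
        group = bits[start:start + n]
        behavior.append(SENSORS[group.index(1)] if 1 in group else EMPTY)
    return behavior


def hamming(a, b):
    """Number of positions in which two equal-length binary vectors differ."""
    return sum(x != y for x, y in zip(a, b))
```

Under this encoding, two behaviors that differ in the sensor of one hour are at Hamming distance 2, while a behavior and the same behavior with that hour left empty are at distance 1.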

Table 7 shows the results of applying the Proximus clustering to the behaviors found. The first column is the center of the cluster, the second is the number of distinct behaviors that are grouped in that cluster, and the third is the total number of recorded behaviors that fall within that cluster. The results are presented ordered by the number of behaviors included in the cluster. In total, 45707 clusters are obtained. The first 300 account for 50% of the behaviors recorded throughout the year. The most important cluster behaviors in Table 7 can be understood as follows:

Table 7: Behavior clusters.

      Cluster center                                                Cluster components    Behaviors count
1     Entr A Sec from 9:00 to 14:00                                 415                   4963
2     Entr B Sec from 10:00 to 13:00                                458                   3756
3     Entr B Sec from 15:00 to 19:00                                471                   2924
4     Library from 9:00 to 13:00                                    256                   2870
5     Library from 10:00 to 12:00                                   176                   2396
6     Library from 18:00 to 21:00                                   244                   2231
7     Entr A 13:00                                                  787                   1979
8     Library 12:00                                                 563                   1894
9     Library from 18:00 to 19:00 and from 20:00 to 21:00           175                   1853
10    Entr B Lab 11:00 to 20:00                                     568                   1831

(i) Clusters 1, 2, and 3: students attending classes. These sensors cover the main classrooms in the school, and the intervals match the class schedule.

(ii) Clusters 4, 5, 6, and 9: students at the library.

(iii) Cluster 7: users who are usually outside the sensors' coverage areas, arriving or leaving.

(iv) Cluster 10: laboratory equipment and professors. This sensor covers some of the professors' offices and laboratories, which usually have some laboratory equipment connected during work hours.

5. Validation of Occupancy Estimation Based on Wi-Fi Tracking

Although, as already pointed out in Section 4.3, it was tested at the very first stage of the deployment that the Wi-Fi sensors properly locate and track well-known MAC addresses, it was still necessary to validate the accuracy of the system for estimating occupancy (as it can be distorted by the aforementioned fact that a single user can carry several devices connected to Wi-Fi networks). As no ground-truth data was available for the full set of buildings and only some data was found for the library, we centered our validation efforts on comparing our data with the available ones: if we can trust our results in that area, then we can extend our trust to the rest of the areas, for which no well-known data are available.

The library of ETSIT has 408 study seats, and it offers a web service to check the number of available seats at a given moment [71]. This system is based on two sources of information: a person who counts the empty seats every opening hour from Friday to Sunday, and a video camera located at the main entrance of the library that counts the number of people entering or leaving at 15-minute intervals. The human system provides a ground truth about the number of occupied seats, but this measure can differ greatly from the actual number of people in the library, since a common situation, especially during examination terms, is that students leave their study material at the seat to reserve it while they are not in the library. This situation is a problem for the library staff, and therefore they installed the video camera system to count the students' entrances and exits.

This system generates an estimation of the number of people in the library by adding to the previous measure the number of people detected entering and subtracting the number of people detected leaving. The system is not perfect, and in most cases it carries an accumulated error that grows in the estimation of the number of people in the library. The total error can be calculated clearly at the end of the day, when the library closes and the number of people inside is supposed to be zero. In summer, one situation aggravates this error: due to the rise in temperature, the back door of the library is opened to improve ventilation, and students are allowed to exit through it, although they must continue entering through the main door. The camera does not count students leaving through this back door.
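The running estimate kept by the camera system, and the residual error visible at closing time, can be sketched as below. The function names and the list of `(entries, exits)` pairs per 15-minute interval are hypothetical; the sketch only reproduces the add-and-subtract rule stated in the text.

```python
def camera_occupancy(counts, initial=0):
    """Running occupancy estimate from per-interval entry and exit counts.

    `counts` is a list of (entries, exits) pairs, one per 15-minute
    interval (hypothetical layout). Returns the estimate after each
    interval.
    """
    estimate = initial
    series = []
    for entries, exits in counts:
        # Each interval adds the people seen entering and subtracts
        # the people seen leaving.
        estimate += entries - exits
        series.append(estimate)
    return series


def closing_error(series):
    """Residual estimate at closing time, when the true occupancy should be 0."""
    return series[-1] if series else 0
```

Any exit that the camera misses, such as one through an uncounted door, stays in the running sum for the rest of the day, which is why the error accumulates instead of averaging out.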

The library staff provided us with the data collected by the two systems (human and camera) between June 5th and 30th. These measurements can be compared with those obtained by the Wi-Fi sensor installed at the library in order to validate the latter. For this test, all the data collected by the sensor are used, without applying the filtering of the MACs of sporadic users explained before.

Figure 13 shows the data collected by the three systems on Sunday, June 5th. This is the first day with data from the three sources; other days on which the three sources are present show the same trends. It is clear that there is a divergence between the human system observation and the rest of the data. The graph of the human system shows that the number of occupied seats increases in the first hours up to the maximum and remains steady until the end of the day, without being affected by the behavior of the users at lunchtime. However, this effect is reflected in the camera system and Wi-Fi tracking measurements. We observed that, in general, the number of people accounted for by the camera system is below the Wi-Fi tracking system measure, although the proportion is maintained over time. Finally, the figure shows the cumulative error effect of the camera system, which at the end of the day still renders 65 people in the library.

Figure 13: Measure of library occupation using Wi-Fi, camera, and human systems. Horizontal axis: hour (10:00 to 22:00); vertical axis: number of users (0 to 400).

Figure 14: Measures every 15 minutes of library occupation during June: (a) Wi-Fi records; (b) camera records; (c) difference between Wi-Fi and camera. Horizontal axis: hour of the day (00 to 23).

To better study the relation between the camera system and the Wi-Fi tracking system, we generated a detailed visualization that allows us to observe the data of the whole month in a single figure. Figure 14 is composed of 3 subfigures, each of which is a matrix of colored cells: the rows represent a full day and the columns are divisions of 15 minutes. The color intensity is proportional to the number of users measured in each interval, being more intense when more users are detected. Figure 14(a) shows the measurements of the Wi-Fi sensor. The behavior is the one observed in previous sections of this document. Figure 14(b) shows the measurements of the camera system. During the first half of the day, the same trend as in Figure 14(a) can be observed, but in the afternoon and at night the accumulated error becomes noticeable. This error reaches a maximum of 378 people, with an average of 139 people, at the end of the day, while the Wi-Fi tracking system registers a maximum of 10 and an average of 7 at the end of the day (these are machines that are still turned on when the library is empty of people). Finally, Figure 14(c) shows the difference between the two previous ones, using green when the difference is positive (the Wi-Fi tracking system registers a higher value than the camera system) and red otherwise. This figure validates the measurements obtained by the Wi-Fi tracking system, since the difference with the measure of the camera system in the first half of the day is small (an average of 27 people, counting the data from 8:00 to 16:00), whereas in the afternoon the error accumulated by the camera system makes the average difference grow to 83 people between 16:00 and 22:00.

The Wi-Fi sensor of the library was not placed for the specific purpose of counting the number of people in the library, and consequently its coverage area includes devices that are outside the library. This explains why in some cases it registers a measurement greater than that of the camera system. In any case, the Wi-Fi sensor, with its limitations, registers a measure very similar to that of the camera-based people counting system, while avoiding the cumulative error of that system and being much cheaper (tens of euros against thousands of euros) and less intrusive than a video camera. Currently, since the camera system is already installed, the Wi-Fi sensor measures can be used as a maximum, which would limit the error made by this system. In summary, the usage of a Wi-Fi sensor to count people in the library provides accurate results despite the potential errors induced by the sensing period, collisions, the ratio between users and MACs seen, and the data processing performed. These results suggest that the results of our analyses for the rest of the areas on campus can also be trusted.

6. Discussion and Conclusions

In the execution of this case study, we have learned some lessons about the limitations we faced that are worth taking into account for similar systems and analyses. First, there is the key issue of the sensor deployment: it is not just a question of density (number of sensors per unit area) but of adapting its design to the topology of the place to be monitored. So, for example, it is crucial to have sensors not only at the entries and exits of buildings but also inside them, as in the considered deployment, where there are sensors placed at the entrances and exits of the buildings but also in other especially relevant places, such as the library or the work-in-group area.

A complete coverage of indoor devices, without collisions and for the complete timespan of the stay indoors, would require a large deployment with many sensors able to cover all corners. This is not suitable in this context; instead, we relied on 9 sensors, some at gates and others in large rooms, which proved to be enough to check whether a device is within reach of any of them. But some assumptions had to be made: we considered as valid behaviors only those that spent a significant amount of time once they had been seen (to remove transient behaviors and those of people walking outside the buildings); also, we discarded behaviors that did not appear enough times in the yearly timespan. On the other hand, the topology of these buildings did not impose severe restrictions on the sensors' coverage. In fact, we found several devices seen by different sensors at once. In summary, we can conclude that the finer the spatial granularity (more sensors), the larger the set of different behaviors, so using raw data from the sensors would lead to an explosion of states that would render the results meaningless.

The gathered data is another key issue: some works in the literature obtained signal strength measurements every few seconds, allowing them to identify indoor trajectories. This was not our case, which proved to be very limiting. Thus, the precision of our analysis is limited by the regions covered by each sensor, which prevents us from analyzing any kind of movement within regions. Nevertheless, as has been seen, relevant results can still be obtained without the increase in energy consumption and the risk of flooding the school communications network that the other approach would entail.

As regards the analysis of data to identify users' behaviors, a long observation period is a must. Obtaining data for a complete work cycle (in this case, a full school year) helps in discovering common behaviors that happen in a university. Using long observation times can also help in detecting erroneous or atypical operation of the sensors, as happened in the first months of the year in this case. Getting even larger observation datasets would reinforce the behaviors we have found (e.g., seasonality), but the chance of discovering new ones is negligible, because the expected behaviors on campus are periodic with the school year.

The usage of big data platforms for analysis, while not mandatory, eases the management of large datasets and the execution of iterative studies on the data. The preprocessing work carried out allowed us to clean and filter our data. For example, some discovered behaviors are too regular and extended over time, which may correspond to servers or machines, which we could then filter out.

The temporal analysis has revealed some behaviors that are maintained throughout the year and others that occur occasionally. The work has focused on the study of behaviors that are repeated and maintained throughout the year. A closer view, such as the analysis per hour, shows the hours of activity, nocturnal patterns, or movements at mealtime. In the daily view, the difference in activity between working days and weekends or holidays is clearly seen. Visualization has been a key technique for detecting and understanding these temporal patterns.

The spatial analysis revealed that a collision problem occurs in user detection, but its effect was mitigated by the use of innovative algorithms and techniques, such as the calculation of centroids and the combination of sensor information with external sources of information (i.e., semantic trajectories), so that a much more precise positioning of the users is achieved than with the exclusive use of the sensors. The visualization of centroids fostered the visual understanding of complex data, such as the position of each user over a month, and the superposition of the centroids in a heatmap revealed the movements of groups of people and the occupation of the different zones.


Furthermore, we have found dominant users' behaviors, understood as the most repeated behaviors registered by the sensors throughout the year. The number of behaviors found was huge, but the application of the Proximus clustering algorithm reduced this number to a manageable amount. Then, the study of the obtained clusters revealed that the most frequent behaviors coincide with what can be expected from a college building: researchers working in laboratories and students attending classes or studying in the library.

Although some of the discovered behaviors and patterns can be seen as common knowledge, it is worth stressing that they do bring value, since they represent numerical evidence that supports decision making (e.g., someone may think that the Wi-Fi access in a given area does not work properly because it is always overcrowded, but numerical evidence is needed to appropriately justify the investment of increasing the number of APs of the corporate WLAN in that area to improve the service). In addition, such well-known patterns, when obtained automatically by processing the available data, become baseline models which can be used to detect anomalies or atypical situations, as is common practice in unsupervised machine learning.

Lastly, we have validated the accuracy of using Wi-Fi tracking for occupancy estimation, comparing it with the manual counting performed by the library staff (considered as ground truth) and with a video camera system installed at the library main entrance. As a main finding, Wi-Fi tracking has proved to be more accurate than the video camera system, in addition to being much cheaper. As a matter of fact, the library staff currently prefer the occupancy estimation based on Wi-Fi tracking over the one based on the video camera system. Nevertheless, the accuracy of the Wi-Fi tracking system can be further improved by correlating MAC addresses detected in the same places over sufficiently long periods of time and considering only one of them, thus mitigating the issue that a single person can carry several devices connected to Wi-Fi networks.

Besides this, the Wi-Fi tracking system is currently used by the library staff to perform more sophisticated studies, such as figuring out the percentage of students from the different schools of the university who come to study at the library of the Telecommunications Engineering School during the weekend. Figure 15 shows the results of such an analysis, which is a token of how this kind of IoT system can help solve real-life problems and improve the operation of already running services.

To summarize, we have studied a one-year Wi-Fi tracking dataset obtained from a reduced set of low-cost sensors with limited capabilities deployed on an actual university campus that receives around 4000 people every day. We have processed the data in order to identify traces of Wi-Fi-enabled mobile devices, which are identified as people moving in the campus buildings, and then we have extracted people's stays, movements, and common behaviors. The obtained results represent numerical evidence that illustrates how a low-cost Wi-Fi tracking system can be used in real-life conditions to improve or optimize the operation of the monitored premises. These results can help to appropriately dimension the WLAN infrastructure or the canteen personnel, or to detect

[Figure 15 chart; school categories: CIVIL, INDUS, NAVAL, SPORT, ARQUI, FORES, EDIF, AERO, TELECOM.]

Figure 15: Users who only go to the library of the Telecommunication School on weekends, classified by their school.

anomalous situations in real time. Furthermore, the data from the people flow monitoring system is currently being used together with the data from the environmental monitoring system to try to reduce the environmental footprint of the school [72]. In addition, the Wi-Fi tracking system is actually used by the library staff in their day-to-day activity, which illustrates the value that this kind of IoT infrastructure can bring to real-life problems and services.

Data Availability

The dataset with the Wi-Fi sensor records used to support the findings of this study is available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work has been partly funded by Universidad Politécnica de Madrid through the project RES2+U (Responsables, Sostenibles y Universitarios) (http://blogs.upm.es/res2masu). The work performed by José M. Navarro was funded by the Ministerio de Educación, Cultura y Deporte de España under Grant no. FPU 14/03209.

References

[1] “2018 Revision of World Urbanization Prospects - Multimedia Library - United Nations Department of Economic and Social Affairs,” https://www.un.org/development/desa/publications/2018-revision-of-world-urbanization-prospects.html.

[2] M. Pallot, “Engaging Users into Research and Innovation: The Living Lab Approach as a User Centred Open Innovation Ecosystem,” https://web.archive.org/web/20120509081658/http://www.cwe-projects.eu/pub/bscw.cgi/1760838?id=715404_1760838.

[3] E. Almirall and J. Wareham, “Living Labs: arbiters of mid- and ground-level innovation,” Technology Analysis & Strategic Management, vol. 23, no. 1, pp. 87–102, 2011.

[4] “TU Delft Green office,” https://www.tudelft.nl/sustainability.

[5] “UBC Campus as a living laboratory,” https://sustain.ubc.ca/our-commitment/campus-living-lab.

[6] “Harvard Sustainability,” https://green.harvard.edu.

[7] “UPM City of the Future initiative,” http://blogs.upm.es/cityofthefuture-upm/en/initiative.

[8] M. Álvarez-Campana, G. López, E. Vázquez, V. A. Villagrá, and J. Berrocal, “Smart CEI Moncloa: An IoT-based platform for people flow and environmental monitoring on a Smart University Campus,” Sensors, vol. 17, no. 12, 2017.

[9] M. B. Kjærgaard, H. Blunck, T. Godsk, T. Toftkjær, D. L. Christensen, and K. Grønbæk, “Indoor positioning using GPS revisited,” Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics): Preface, vol. 6030, pp. 38–56, 2010.

[10] A. Belmonte-Hernández, G. Hernández-Peñaloza, F. Álvarez, and G. Conti, “Adaptive Fingerprinting in Multi-Sensor Fusion for Accurate Indoor Tracking,” IEEE Sensors Journal, vol. 17, no. 15, pp. 4983–4998, 2017.

[11] M. S. Kristoffersen, J. V. Dueholm, R. Gade, and T. B. Moeslund, “Pedestrian counting with occlusion handling using stereo thermal cameras,” Sensors, vol. 16, no. 1, 2016.

[12] L. Zheng, X. Ruan, Y. Chen, and M. Huang, “Shadow removal for pedestrian detection and tracking in indoor environments,” Multimedia Tools and Applications, vol. 76, no. 18, pp. 18321–18337, 2017.

[13] T. Li, H. Chang, M. Wang, B. Ni, R. Hong, and S. Yan, “Crowded scene analysis: a survey,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 25, no. 3, pp. 367–386, 2015.

[14] F. Adib and D. Katabi, “See through walls with WiFi,” in Proceedings of the Annual Conference of the ACM Special Interest Group on Data Communication on the Applications, Technologies, Architectures, and Protocols for Computer Communication, ACM SIGCOMM 2013, pp. 75–86, China, August 2013.

[15] Z. Kabelac, D. Katabi, and R. C. Miller, “3D Tracking via Body Radio Reflections,” in Proceedings of the 11th USENIX Conference on Networked Systems Design and Implementation, pp. 317–329, 2014.

[16] F. Adib, Z. Kabelac, and D. Katabi, “Multi-person localization via RF body reflections,” in Proceedings of the 12th USENIX Symposium on Networked Systems Design and Implementation, NSDI 2015, pp. 279–292, USA, May 2015.

[17] Z.-A. Deng, G. Wang, D. Qin, Z. Na, Y. Cui, and J. Chen, “Continuous indoor positioning fusing WiFi, smartphone sensors and landmarks,” Sensors, vol. 16, no. 9, 2016.

[18] R. A. Becker, R. Caceres, K. Hanson et al., “A tale of one city: Using cellular network data for urban planning,” IEEE Pervasive Computing, vol. 10, no. 4, pp. 18–26, 2011.

[19] J. E. Mallah, F. Carrino, O. A. Khaled, and E. Mugellini, “Crowd monitoring: critical situations prevention using smartphones and group detection,” Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics): Preface, vol. 9189, pp. 496–505, 2015.

[20] J. Kuang, X. Niu, and X. Chen, “Robust Pedestrian Dead Reckoning Based on MEMS-IMU for Smartphones,” Sensors, vol. 18, no. 5, p. 1391, 2018.

[21] A. Kurkcu and K. Ozbay, “Estimating Pedestrian Densities, Wait Times, and Flows with Wi-Fi and Bluetooth Sensors,” Transportation Research Record, vol. 2644, no. 1, pp. 72–82, 2017.

[22] A. Kotanen, M. Hännikäinen, H. Leppäkoski, and T. Hämäläinen, “Positioning with IEEE 802.11b wireless LAN,” in Proceedings of the 14th IEEE 2003 International Symposium on Personal, Indoor and Mobile Radio Communications, vol. 3, pp. 2218–2222, Beijing, China, 2003.

[23] W. Ho, A. Smailagic, D. P. Siewiorek, and C. Faloutsos, “An adaptive two-phase approach to WiFi location sensing,” in Proceedings of the 4th Annual IEEE International Conference on Pervasive Computing and Communications Workshops, PerCom Workshops 2006, pp. 452–456, Italy, March 2006.

[24] F. Evennou and F. Marx, “Advanced integration of WiFi and inertial navigation systems for indoor mobile positioning,” EURASIP Journal on Applied Signal Processing, vol. 2006, Article ID 86706, 11 pages, 2006.

[25] J. Rekimoto, T. Miyaki, and T. Ishizawa, “LifeTag: WiFi-based continuous location logging for life pattern analysis,” LNCS, vol. 4718, pp. 35–49, 2007.

[26] J. A. Besada, A. M. Bernardos, P. Tarrío, and J. R. Casar, “Analysis of tracking methods for wireless indoor localization,” in Proceedings of the 2nd International Symposium on Wireless Pervasive Computing (ISWPC ’07), pp. 492–497, February 2007.

[27] O. Woodman and R. Harle, “Pedestrian localisation for indoor environments,” in Proceedings of the 10th International Conference on Ubiquitous Computing (UbiComp ’08), pp. 114–123, Seoul, Republic of Korea, September 2008.

[28] F. Aloul, A. Sagahyroon, A. Al-Shami, I. Al-Midfa, and R. Moutassem, “Using mobiles for on campus location tracking,” in Proceedings of the 7th International Conference on Advances in Mobile Computing and Multimedia, MoMM2009, pp. 231–235, Malaysia, December 2009.

[29] S. Woo, S. Jeong, E. Mok et al., “Application of WiFi-based indoor positioning system for labor tracking at construction sites: A case study in Guangzhou MTR,” Automation in Construction, vol. 20, no. 1, pp. 3–13, 2011.

[30] N. Le Dortz, F. Gain, and P. Zetterberg, “WiFi fingerprint indoor positioning system using probability distribution comparison,” in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP ’12), pp. 2301–2304, March 2012.

[31] Z. Chen, H. Zou, H. Jiang, Q. Zhu, Y. C. Soh, and L. Xie, “Fusion of WiFi, smartphone sensors and landmarks using the Kalman filter for indoor localization,” Sensors, vol. 15, no. 1, pp. 715–732, 2015.

[32] Y. Shu, C. Bo, G. Shen, C. Zhao, L. Li, and F. Zhao, “Magicol: indoor localization using pervasive magnetic field and opportunistic WiFi sensing,” IEEE Journal on Selected Areas in Communications, vol. 33, no. 7, pp. 1443–1457, 2015.

[33] A. Danalet, B. Farooq, and M. Bierlaire, “A Bayesian approach to detect pedestrian destination-sequences from WiFi signatures,” Transportation Research Part C: Emerging Technologies, vol. 44, pp. 146–170, 2014.


[34] O. Czogalla and S. Naumann, “Pedestrian indoor navigation for complex public facilities,” in Proceedings of the 2016 International Conference on Indoor Positioning and Indoor Navigation, IPIN 2016, pp. 1–8, Spain, October 2016.

[35] L. Schauer, P. Marcus, and C. Linnhoff-Popien, “Towards feasible Wi-Fi based indoor tracking systems using probabilistic methods,” in Proceedings of the 2016 International Conference on Indoor Positioning and Indoor Navigation, IPIN 2016, pp. 1–8, Spain, October 2016.

[36] F. Meneses and A. Moreira, “Large scale movement analysis from WiFi based location data,” in Proceedings of the 2012 International Conference on Indoor Positioning and Indoor Navigation, IPIN 2012, Australia, November 2012.

[37] B. Bonne, A. Barzan, P. Quax, and W. Lamotte, “WiFiPi: involuntary tracking of visitors at mass events,” in Proceedings of the IEEE 14th International Symposium on a World of Wireless, Mobile and Multimedia Networks (WoWMoM ’13), pp. 1–6, Madrid, Spain, June 2013.

[38] M. B. Kjaergaard, M. Wirz, D. Roggen, and G. Tröster, “Mobile sensing of pedestrian flocks in indoor environments using WiFi signals,” in Proceedings of the 10th IEEE International Conference on Pervasive Computing and Communications (PerCom ’12), pp. 95–102, Lugano, Switzerland, March 2012.

[39] S. Sendra, M. Garcia, C. Turro, and J. Lloret, “People mobility behaviour study in a university campus using WLANs,” in Proceedings of the 3rd International Conference on Mobile Ubiquitous Computing, Systems, Services, and Technologies, UBICOMM 2009, pp. 124–129, Malta, October 2009.

[40] J. Scheuner, G. Mazlami, D. Schöni et al., “Probr - A Generic and Passive WiFi Tracking System,” in Proceedings of the 41st IEEE Conference on Local Computer Networks, LCN 2016, pp. 495–502, UAE, November 2016.

[41] Y. Li, S. Williams, B. Moran, and A. Kealy, “Quantized RSS Based Wi-Fi Indoor Localization with Room Level Accuracy,” in Proceedings of the International Global Navigation Satellite Systems 2018, 2018.

[42] A. Alessandrini, C. Gioia, F. Sermi, I. Sofos, D. Tarchi, and M. Vespe, “WiFi positioning and Big Data to monitor flows of people on a wide scale,” in Proceedings of the 25th European Navigation Conference, ENC 2017, pp. 322–328, Switzerland, May 2017.

[43] L. Vu, K. Nahrstedt, S. Retika, and I. Gupta, “Joint bluetooth/wifi scanning framework for characterizing and leveraging people movement in university campus,” in Proceedings of the 13th ACM International Conference on Modeling, Analysis and Simulation of Wireless and Mobile Systems (MSWiM ’10), pp. 257–265, October 2010.

[44] M. Garcia, S. Sendra, C. Turro, and J. Lloret, “User’s Macro and Micro-mobility Study using WLANs in a University Campus,” International Journal On Advances in Internet Technology, vol. 4, no. 1, pp. 37–46, 2011.

[45] Y. Xu, I. D. G. Groeneveld, R. Sulzer, E. Theocharous, O. T. Willems, and M. S. Tryfona, “Determine activity based on the classified identity of users by using Wi-Fi monitoring,” Geomatics Synthesis Group Project Report, 2016.

[46] A. Danalet, L. Tinguely, M. D. Lapparent, and M. Bierlaire, “Location choice with longitudinal WiFi data,” Journal of Choice Modelling, vol. 18, pp. 1–17, 2016.

[47] M. Zhou, K. Sui, M. Ma, Y. Zhao, D. Pei, and T. Moscibroda, “MobiCamp: A campus-wide testbed for studying mobile physical activities,” in Proceedings of the 3rd International Workshop on Physical Analytics, WPA 2016, pp. 1–6, Singapore, 2016.

[48] A. Fernández-Ares, A. M. Mora, M. G. Arenas et al., “Studying real traffic and mobility scenarios for a Smart City using a new monitoring and tracking system,” Future Generation Computer Systems, vol. 76, pp. 163–179, 2017.

[49] A. E. Redondi and M. Cesana, “Building up knowledge through passive WiFi probes,” Computer Communications, vol. 117, pp. 1–12, 2018.

[50] L. Huang, K. Matsuura, H. Yamane, and K. Sezaki, “Enhancing wireless location privacy using silent period,” in Proceedings of the 2005 IEEE Wireless Communications and Networking Conference, WCNC 2005: Broadband Wireless for the Masses - Ready for Take-off, pp. 1187–1192, USA, March 2005.

[51] J. Martin, T. Mayberry, C. Donahue et al., “A Study of MAC Address Randomization in Mobile Devices and When it Fails,” Proceedings on Privacy Enhancing Technologies, vol. 2017, no. 4, pp. 365–383, 2017.

[52] M. Vanhoef, C. Matte, M. Cunche, L. S. Cardoso, and F. Piessens, “Why MAC address randomization is not enough: an analysis of Wi-Fi network discovery mechanisms,” in Proceedings of the 11th ACM on Asia Conference on Computer and Communications Security, pp. 413–424, ACM, Xi’an, China, June 2016.

[53] A. J. Ruiz-Ruiz, H. Blunck, T. S. Prentow, A. Stisen, and M. B. Kjaergaard, “Analysis methods for extracting knowledge from large-scale WiFi monitoring to inform building facility planning,” in Proceedings of the 2014 12th IEEE International Conference on Pervasive Computing and Communications, PerCom 2014, pp. 130–138, Hungary, March 2014.

[54] N. Nunes, M. Ribeiro, C. Prandi, and V. Nisi, “Beanstalk - A community based passive Wi-Fi tracking system for analysing tourism dynamics,” in Proceedings of the 9th ACM SIGCHI Symposium on Engineering Interactive Computing Systems, EICS 2017, pp. 93–98, Portugal, June 2017.

[55] C. Parent, N. Pelekis, Y. Theodoridis et al., “Semantic trajectories modeling and analysis,” ACM Computing Surveys, vol. 45, no. 4, pp. 1–32, 2013.

[56] C. Wei, “Mining of User Behavioral Features Based on Indoor Semantic Trajectories,” Boletín Técnico, ISSN: 0376-723X, vol. 55, 2017.

[57] K. V. Long, D. Quang, and N. Klara, Lessons learned from bluetooth/wifi scanning deployment in university campus, Urbana, Illinois, USA, 2010.

[58] V. Radu and M. K. Marina, “HiMLoc: indoor smartphone localization via activity aware pedestrian dead reckoning with selective crowdsourced WiFi fingerprinting,” in Proceedings of the International Conference on Indoor Positioning and Indoor Navigation (IPIN ’13), pp. 1–10, IEEE, Montbéliard-Belfort, France, October 2013.

[59] F. Hong, Y. Zhang, Z. Zhang, M. Wei, Y. Feng, and Z. Guo, “WaP: Indoor localization and tracking using WiFi-Assisted Particle filter,” in Proceedings of the 39th Annual IEEE Conference on Local Computer Networks, LCN 2014, pp. 210–217, Canada, September 2014.

[60] Y. Zeng, P. H. Pathak, and P. Mohapatra, “Analyzing shopper’s behavior through WiFi signals,” in Proceedings of the 2nd Workshop on Physical Analytics, WPA 2015, pp. 13–18, Italy, 2015.

[61] P. Sapiezynski, A. Stopczynski, R. Gatej, and S. Lehmann, “Tracking human mobility using WiFi signals,” PLoS ONE, vol. 10, no. 7, p. e0130824, 2015.

[62] Z. Tian, Y. Jin, M. Zhou, Z. Wu, and Z. Li, “Wi-Fi/MARG Integration for Indoor Pedestrian Localization,” Sensors, vol. 16, no. 12, p. 2100, 2016.


[63] B. Zhou, Q. Li, Q. Mao, and W. Tu, “A Robust Crowdsourcing-Based Indoor Localization System,” Sensors, vol. 17, no. 4, p. 864, 2017.

[64] “Raspberry Pi,” https://www.raspberrypi.org.

[65] “TP-LINK Wi-Fi USB dongle datasheet,” http://www.tp-link.com/us/products/details/cat-5520_TL-WN722N.html.

[66] “ISO/IEC 20922:2016 - Information technology – Message Queuing Telemetry Transport (MQTT) v3.1.1,” https://www.iso.org/standard/69466.html.

[67] G. Piatetsky-Shapiro and W. Frawley, Knowledge Discovery in Databases, MIT Press, Cambridge, MA, USA, 1991.

[68] J. Andión Jiménez, J. M. Navarro González, M. Álvarez-Campana Fernández-Corredor, and J. C. Dueñas López, “A passive, non-intrusive, cheap method to identify behaviours and habits in the Campus,” in Proceedings of the XIII Jornadas de Ingeniería Telemática - JITEL2017, vol. 40, no. 47, pp. 10–4995, September 2017.

[69] A. Mylinikov, “Geo project,” https://www.mylnikov.org.

[70] K. Mehmet and G. Ananth, “PROXIMUS: A framework for analyzing very high dimensional discrete-attributed datasets,” in Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’03, vol. 147, no. 156, pp. 147–156, USA, August 2003.

[71] “UPM Library occupation,” http://ceiboard.dit.upm.es/smart-campus/biblio.

[72] C. A. R. Inarejos, A. Rodríguez, G. López, and M. Álvarez-Campana, “Análisis de la huella de carbono de la ETSIT de la UPM y propuesta de mejora basada en datos de la plataforma IoT Smart CEI Moncloa,” in Proceedings of the I Congreso Iberoamericano de Ciudades Inteligentes (ICSC-CITIES 2018), 2018.


Figure 3: (a) Raspberry Pi; (b) TP-LINK USB Wi-Fi dongle; (c) developed Wi-Fi listening device.

information to classify the different positions according to the activity carried out in the area (e.g., users positioned in the canteen will be eating), and which has not been extensively explored in previous works (e.g., [56]).

Another strong point of the work presented in this paper compared to previous work is the validation of the Wi-Fi tracking technique to estimate the occupation of the library of the Telecommunications School and the comparison with a video camera system. Only a few previous works, such as [40] or [49], perform a similar validation (in [40], the number of people detected in the room is compared with the attendance list of the seminar taking place there, and in [49], students are requested to turn on the Wi-Fi interfaces of their devices and provide the MAC addresses of their smartphones and laptops through an anonymous web form to serve as ground truth for device classification). However, again, the period considered in this paper is much longer, and it is shown that the Wi-Fi tracking system outperforms the more expensive video camera system. As a matter of fact, the Wi-Fi tracking system is actually used on a day-to-day basis by the library staff, which illustrates the value that this kind of IoT infrastructure can bring to real-life problems and services.

3. Data Acquisition Infrastructure

Figure 3 shows the Wi-Fi listening device/sensor developed for the people flow monitoring sensor network of the Smart CEI Moncloa. It is based on very common and cheap hardware, namely, a Raspberry Pi board [64], one of the most widely used hardware platforms for IoT, and a TP-LINK USB Wi-Fi dongle [65] configured in monitor mode. As a result, the cost of this solution is in the order of tens of euros (around 80 € when manufactured in 2016), which represents a remarkable cost reduction compared to other solutions available in the market.

As Figure 4 illustrates, broadly speaking, these Wi-Fi sensors scan each of the Wi-Fi channels from both the 2.4 GHz and the 5 GHz bands during a configurable amount of time (currently 250 ms), read the header of the IEEE 802.11 radio packets (e.g., data packets or probe requests) in their region of coverage, and record the sender MAC addresses. As these MAC addresses are unique per device, counting them is a good indicator of the number of devices present in the surroundings of the Wi-Fi sensors (although there may be issues related to the fact that a single user can carry several devices, as already pointed out in Section 2), and they allow for temporal correlation analysis, thus yielding useful information such as stay time, availability patterns, etc.

Regarding how this information is carried all the way up to the backend server and processed there, Figure 5 shows an overview of the communications architecture and protocol stack of the people flow monitoring service of the Smart CEI Moncloa.

As can be seen, the Wi-Fi sensors are directly connected to the backend via the Ethernet network of the UPM. Communications are protected end-to-end by the use of Transport Layer Security (TLS) on top of Transmission Control Protocol (TCP)/Internet Protocol (IP). Measurements are periodically sent using Message Queuing Telemetry Transport (MQTT) [66]. The publish/subscribe mechanism provided by MQTT allows the Wi-Fi sensors not only to send measurements (i.e., events) periodically but also to receive commands (e.g., to reboot them or to perform a remote firmware update).

Taking advantage of the hierarchical structure of the MQTT topics, all the publish events follow the structure SERVICE/ID/EVENT(/TIMESTAMP). Thus, the publish events from the Wi-Fi sensors start with Wi-Fi, followed by the MAC address of the sensor's Ethernet interface, which is used as unique ID. The format of the content published under the different topics is Comma Separated Values (CSV), which is a lightweight solution especially appropriate when the data structure is fixed, since the meaning of each field of the subsequent lines is explained only once, in the header at the beginning of the file.
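As an illustration of this naming scheme, the following minimal sketch builds a topic and a CSV payload. It assumes that the topic levels are joined with "/", the standard MQTT level separator; the function names, the event name, the column names, and all sample values are hypothetical and are not taken from the deployed sensors.

```python
import csv
import io


def build_topic(service, sensor_id, event, timestamp=None):
    """Join the topic levels with '/', the MQTT topic level separator."""
    levels = [service, sensor_id, event]
    if timestamp is not None:
        # The timestamp level is optional in the described structure.
        levels.append(str(timestamp))
    return "/".join(levels)


def build_csv_payload(header, rows):
    """Serialise rows as CSV, describing the fields once in the header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical sensor ID (Ethernet MAC without separators) and event name.
topic = build_topic("WiFi", "b827eb000001", "measurements", 1451606400)

# Hypothetical columns: anonymized device ID and minute of detection.
payload = build_csv_payload(
    ["hashed_mac", "minute"],
    [["0a1b2c", "2016-01-01T00:00"], ["3d4e5f", "2016-01-01T00:01"]],
)
```

Because the field names travel only in the first line, each additional observation costs just one short row, which is the property the text highlights for fixed data structures.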

Privacy issues have also been considered carefully: the developed Wi-Fi sensors apply an irreversible MD5 hash function with salt to the MAC address, which prevents brute-force attacks with precomputed tables. In addition, as Figure 5 shows, once anonymized, the data are carried securely up to the platform servers, where they are handled in an aggregate manner instead of individually.
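The anonymization step can be sketched as follows. This is a minimal example under stated assumptions, not the sensors' actual implementation: the way the salt is generated, stored, and combined with the address (here, simply prepended) is hypothetical, and the sketch assumes that the same salt is applied to every observation so that a device always maps to the same identifier.

```python
import hashlib
import os


def anonymize_mac(mac, salt):
    """Return the hex MD5 digest of salt + normalized MAC address."""
    # Normalize case and separators so the same device always hashes equally.
    normalized = mac.lower().replace("-", ":").encode("ascii")
    return hashlib.md5(salt + normalized).hexdigest()


# Hypothetical salt; a secret random value defeats generic precomputed tables.
salt = os.urandom(16)

hashed = anonymize_mac("B8:27:EB:12:34:56", salt)
```

The identifier is stable for a given salt, which keeps temporal correlation possible, while a table precomputed without the salt is of no use to an attacker.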

Furthermore, the software of the developed sensors has been modified in order to prevent the MAC anonymization mechanisms presented in Section 2 from affecting the obtained measurements. Thus, the Wi-Fi frames with locally administered


[Figure 4 diagram labels: AP1 to AP4 on channels ch1 to ch4; data frames from User1 (MAC_U1, ch1) and User3 (MAC_U3, ch3); probe requests from User2 (MAC_U2, chN) and User4 (MAC_U4, chM); the developed Wi-Fi listening device listens "x" ms on each channel.]

Figure 4: Sketch of how the developed Wi-Fi listening devices obtain the MAC addresses of the users surrounding them.

[Figure 5 diagram labels: Wi-Fi sensors and backend server, each with the protocol stack MQTT / TLS / TCP / IP / 802.3; the backend server comprises an MQTT broker, real-time and batch processing, a CSV file-based temporal DB, a REST API server, and a dashboard web server.]

Figure 5: Communications architecture and protocol stack for the people flow monitoring service of the Smart CEI Moncloa.

MAC addresses or including special MAC address ranges are discarded, so these devices are not considered. In any case, as already mentioned in Section 2, MAC randomization is not a major concern in our case, since most of the users are connected to the Eduroam free Wi-Fi access, so their smartphones end up using their actual MAC address.
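The check for locally administered addresses can be sketched as below. In IEEE 802 addressing, the second least significant bit of the first octet marks an address as locally administered, which is the case for randomized addresses. The function names are hypothetical, the "special MAC address ranges" mentioned above are not specified in the text and are therefore not covered, and the sketch assumes the check runs on the raw address, before hashing.

```python
def is_locally_administered(mac):
    """Return True if the locally administered (U/L) bit of the MAC is set."""
    first_octet = int(mac.replace("-", ":").split(":")[0], 16)
    # Bit 0x02 of the first octet: 1 = locally administered, 0 = universal.
    return bool(first_octet & 0x02)


def keep_frame(sender_mac):
    """Keep only frames whose sender uses a universally administered address."""
    return not is_locally_administered(sender_mac)


# Hypothetical sender addresses observed by a sensor.
observed = ["b8:27:eb:12:34:56", "da:a1:19:00:00:01", "02:00:00:00:00:01"]
kept = [mac for mac in observed if keep_frame(mac)]
```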

After gathering the CSV files for a large time span, we moved to an offline analysis platform on a private cluster. The Apache Spark 2.2 software platform was selected as the data processing tool due to its optimized capabilities for working with large amounts of structured data. The long time span results in large datasets, which a common centralized system would struggle to process but distributed systems such as Spark can handle easily.

The private cluster is composed of eight HP ProLiant SL250s Gen8 machines, each with two Intel Xeon E5-2630 v2 2.6 GHz processors (6 cores each) and 32 GB of RAM. In addition to Spark, this platform also runs an Apache Hadoop filesystem, where the dataset and the results are stored. This is a distributed filesystem which allows all the machines to access the stored data in parallel. The usage of such a platform is advisable in order to speed up the analysis, but not mandatory, since the algorithms we are going to describe are available in many other software platforms (such as those provided with R or Python).

4. Data Analysis and Results

4.1. Data Processing. Although this work is not a proper data mining process, given that we are not using those kinds of algorithms and analyses, the necessary steps prior to the actual analysis are the same as in a Knowledge Discovery in Databases (KDD) process [67]. For our analysis, we took the aforementioned CSV files, each one containing the data collected by a single sensor during a period of 15 minutes. The observation period used in our analysis is a full


[Figure 6 panels: (a) histogram of users by registered time, raw data (horizontal axis: hours, 0 to 20000; vertical axis: users, logarithmic scale); (b) histogram of hours per user during the whole year, filtered (horizontal axis: hours, 500 to 3500; vertical axis: users, logarithmic scale).]

Figure 6: Histogram of users by number of registered hours: (a) raw analysis; (b) first threshold applied.

year, from January 2016 to December 2016. In a one-year time lapse, 9 sensors, each generating a file every fifteen minutes, would create around 315K files. This is a theoretical maximum, because a file is only created if the sensor is active and detects at least one device during the period. The sensors were deployed at the beginning of 2016, but the deployment was not completely finished until March. There were also some holidays during 2016 when ETSIT was fully closed. During those days, some of the sensors detected no devices (although there are always security staff somewhere in the covered area), so they did not generate a file. As a result, the number of files which we actually have for the analysis is 246K.
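The theoretical maximum can be checked with a short calculation. The sketch below only uses the figures given in the text (9 sensors, one file every 15 minutes, calendar year 2016, 246K files actually available); the variable names are illustrative, and the coverage ratio is a derived quantity, not a value reported in the paper.

```python
import calendar

SENSORS = 9
FILES_PER_HOUR = 60 // 15  # one file every fifteen minutes
DAYS_IN_2016 = 366 if calendar.isleap(2016) else 365

# Upper bound: every sensor produces a file in every 15-minute slot.
max_files = SENSORS * FILES_PER_HOUR * 24 * DAYS_IN_2016

# Number of files actually available, as reported in the text (rounded).
actual_files = 246_000
coverage = actual_files / max_files
```

With 365 days the same product gives 315,360 files, and with the 366 days of 2016 it gives 316,224; both agree with the rounded figure of around 315K, and roughly 78% of the possible files exist.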

The first step in preprocessing these data was to merge all the files into a single one, resulting in a 2 GB CSV plain text file where each row represents the observation of a device during the associated time period, including, among others, the anonymized MAC address, the sensor name, and the detection of the device for a given minute in the observation timespan. With this transformation, the resulting dataset contains 63 million rows, each one representing the detection of a single device in a single minute by a single sensor.

A total of 128K unique devices were detected during the whole year. It is known that the number of people that regularly go to ETSIT is around 4K, so not all of the devices detected can be considered actual users of interest; consequently, the data needed to be filtered. Only the devices of people that perform activities related to the place, such as students, professors, etc., should be taken into account. Thus, it is necessary to filter out devices, keeping only the ones that can be labelled as users. We apply filters based on the observations of each device.

So, for each device, we count the number of minutes it was detected in the entire dataset. A device can be seen by more than one sensor during the same minute, so, to generate this measure, repeated minutes are counted only once per device. Grouping the resulting counts, a histogram (Figure 6) is obtained, on which it is possible to classify the devices based on the total time recorded during the whole year.
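The per-device count described above can be sketched in plain Python as follows. The analysis in the paper was run on Apache Spark; this standalone version only illustrates the deduplication logic, and the tuple layout (device, sensor, minute) and the sample observations are assumptions made for the example.

```python
from collections import defaultdict


def minutes_per_device(observations):
    """Count distinct detection minutes per device, ignoring which sensor saw it."""
    seen = defaultdict(set)
    for device, _sensor, minute in observations:
        # A set keeps each minute once, even if several sensors report it.
        seen[device].add(minute)
    return {device: len(minutes) for device, minutes in seen.items()}


# Hypothetical rows: (anonymized device ID, sensor name, minute index).
observations = [
    ("dev1", "sensorA", 0),
    ("dev1", "sensorB", 0),  # same minute, second sensor: not counted again
    ("dev1", "sensorA", 1),
    ("dev2", "sensorC", 5),
]
counts = minutes_per_device(observations)
```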

To facilitate the analysis of the chart, the horizontal axis is expressed in hours and the count on the vertical axis is shown in logarithmic scale. Around 95% of the devices were seen for less than 48 hours during the whole year, on average less than four hours per month. This group is mainly composed of people passing near the school buildings, momentarily entering the coverage area of the sensors without accessing ETSIT. Figure 6(b) shows the histogram after applying a lower threshold of 48 hours and an upper threshold of 3650 hours, an average of 10 hours per day. In this chart, some isolated peaks can be observed in the tail of the graph, starting around 1000 hours on the horizontal axis. A detailed analysis of these peaks revealed that they were devices that remained connected continuously for several days, like servers. So we applied the label "user" only to those devices whose number of hours registered during the year lies between these two thresholds. As shown in Table 3, of the 128K detected devices, only 4653 were classified as users, over which we perform the rest of the analysis.
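A minimal sketch of the threshold filter described above follows. The thresholds come from the text; treating both bounds as inclusive, and the device names and minute counts in the example, are assumptions for illustration.

```python
LOWER_HOURS = 48      # below this: passers-by and sporadic visitors
UPPER_HOURS = 3650    # above this: permanently connected machines


def label_users(minutes_by_device, lower=LOWER_HOURS, upper=UPPER_HOURS):
    """Return the devices whose yearly detection time lies within the thresholds.

    Both bounds are treated as inclusive here, which is an assumption.
    """
    users = set()
    for device, minutes in minutes_by_device.items():
        hours = minutes / 60
        if lower <= hours <= upper:
            users.add(device)
    return users


# Hypothetical yearly totals, in minutes.
totals = {
    "passerby": 30,          # 0.5 hours
    "student": 600 * 60,     # 600 hours
    "server": 8000 * 60,     # 8000 hours
}
print(label_users(totals))
```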

Figure 7: Daily analysis aggregated during 2016: (a) minutes accumulated; (b) unique users registered; (c) ratio between minutes and users.
[Panels: x-axis, date (Jan to Dec); y-axes, accumulated minutes (×1000), number of users (×100), and ratio (minutes/user).]

Table 3: Dataset statistics before and after preprocessing.

                        Rows                Devices
Before preprocessing    63427489            128188
After preprocessing     41294344 (65%)      4653 (3.6%)

The last step before the proper analyses is to merge this dataset with the information about the position and name of the sensors. Additionally, only the data columns with relevant information for the analyses are kept, in order to reduce the in-memory cost:

(1) timestamp: the detection time, measured in minutes.

(2) user: the MAC address of the detected device.

(3) sensor: the MAC address of the sensor which made the detection.

(4) sensorName: the name of the sensor which made the detection.

(5) accessPoint: the MAC address of the SSID to which the device is connected, if any.

(6) sensorLat: latitude of the sensor's position.

(7) sensorLng: longitude of the sensor's position.

4.2. Temporal Analysis. Once the data was ready, we performed a temporal study to find out whether the captured data allowed for the identification of significant periods of activity or trends in user behavior in ETSIT. As one year is too long for a minute-level analysis, we aggregated our data by days and by hours.

Figures 7–10 are organized into groups of charts. The first one, the accumulated time chart, shows the number of activity minutes registered by each sensor during a time slice. The second chart presents the number of unique users seen by each sensor during the time slice. In addition, Figures 7 and 8 include a third chart with the ratio of accumulated time over unique users, which provides a hint of how stationary users are: peaks in the ratio chart represent moments when users are still (e.g., students in class), and dips are associated with transition times (e.g., students arriving at the school).

First, Figure 7 shows an overview of these metrics throughout the year. In this figure, the measurements of the 9 sensors are aggregated into a single line. In the case of Figure 7(a), the result is not exactly equal to the sum of the activity minutes accumulated by each sensor, since a user can be detected in the same minute by different sensors, and these occasions are represented as a single instant in this line.
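The three aggregated daily metrics can be sketched as below. The sketch assumes that each row is `(user, sensor, minute)` with the minute counted as an integer from the start of the dataset; this layout and the sample rows are illustrative, not the authors' actual pipeline.

```python
from collections import defaultdict

MINUTES_PER_DAY = 24 * 60


def daily_metrics(rows):
    """Aggregate rows (user, sensor, minute) into per-day metrics.

    A user seen by several sensors in the same minute contributes a single
    activity minute, as in the aggregated line of Figure 7(a).
    """
    pairs_by_day = defaultdict(set)          # day -> {(user, minute)}
    for user, _sensor, minute in rows:
        pairs_by_day[minute // MINUTES_PER_DAY].add((user, minute))

    metrics = {}
    for day, pairs in sorted(pairs_by_day.items()):
        users = {user for user, _minute in pairs}
        metrics[day] = {
            "minutes": len(pairs),               # accumulated minutes
            "users": len(users),                 # unique users
            "ratio": len(pairs) / len(users),    # minutes per user
        }
    return metrics


rows = [
    ("u1", "Library", 10),
    ("u1", "Entr D", 10),     # same user and minute: counted once
    ("u1", "Library", 11),
    ("u2", "Entr A", 12),
    ("u1", "Library", 1500),  # second day
]
metrics = daily_metrics(rows)
print(metrics)
```

A high ratio for a given day then indicates few users staying for long periods, which is the pattern the text associates with holidays and weekends.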

Figure 8: Daily analysis during 2016: (a) minutes accumulated by each sensor; (b) unique users registered by each sensor; (c) ratio between minutes and users by each sensor.
[Panels: x-axis, date (Jan to Dec); y-axes, accumulated minutes (×1000), number of users (×100), and ratio (minutes/user); one line per sensor: Entr A, Entr A Sec, Entr B, Entr B Lab, Entr B Sec, Entr C, Entr D, Library, Std Tables.]

The most obvious observation that can be drawn from Figure 7 is the anomalous behavior during the months of January and February. The reason is that, as previously discussed, the sensor network began to be deployed in January 2016 and was not completely operative until March. We keep these data in the analysis because they can be used to compare an anomalous situation with a regular one; also, it is much more intuitive to present a whole year than nine months. Nevertheless, data collected in those months are not used to draw conclusions.

Holiday periods are clearly shown in the unique users chart when the line falls, for example, in March (Easter), during the summer holidays, or on some isolated holidays. These periods can also be seen in the ratio chart, which increases because on those dates many fewer users attend the school, but those who do usually spend long periods of time in the library. A remarkable point is October 12th, on which the ratio chart reaches its highest peak and the number of users is almost zero. That day even the library was closed, so only the security staff was in the school. We can also observe the effect of weekends on each chart: both the number of users and the accumulated minutes decrease, but the ratio increases for the same reason given for the holidays, namely that users study at the library during the weekends. Finally, from May to June a decreasing trend can be observed both in the unique users and in the accumulated minutes, going up in mid-June for the examination period.

Figure 8 shows a second set of graphs that correspond to a daily analysis of the whole year, representing each sensor separately. One of the first conclusions that can be drawn is the difference between the proportions in the unique users chart and the accumulated minutes chart. The difference between sensors in the number of registered users is not as remarkable as the difference in the number of accumulated minutes. Again, this is due to the fact that users spend much longer periods of time in the library than in other areas. Observing the line of the sensor of building C in the accumulated minutes chart, it can be seen that the magnitude is maintained throughout the year. This sensor covers a building where there are professors' offices and research laboratories, i.e., this sensor registers mostly users who work at the school, and they maintain a more regular schedule throughout the year than students who go to classes or to the library.

Figure 9: Calendar view: (a) unique users registered by sensor Entr A Sec; (b) minutes accumulated by sensor Entr A Sec; (c) unique users registered by sensor Entr C; (d) minutes accumulated by sensor Entr C.
[Panels: rows, day of the week (Sun to Sat); columns, weeks from Jan to Dec.]

Figure 10: Hourly analysis during first week of November: (a) minutes accumulated by each sensor; (b) unique users registered by each sensor.
[Panels: x-axis, date (Mon 05 to Sun 11); y-axes, accumulated minutes (×1000) and number of users (×100); one line per sensor.]

Table 4: Sensor record collisions.

Collisions    Count       Total       Percent
9             3           27          0.00
8             45          360         0.00
7             421         2947        0.01
6             5371        32226       0.08
5             59222       296110      0.72
4             560994      2243976     5.46
3             2525718     7577154     18.42
2             7268825     14537650    35.35
1             16433862    16433862    39.96

The line charts can be difficult to interpret for periods of time as long as a whole year. To ease the visual analysis, a calendar visualization is offered in which the same data can be studied in a more intuitive way. Figure 9 presents the full-year view in calendar format, with each row representing one day of the week and the months delimited in black. The color intensity of each cell is proportional to the data it represents. Figure 9 is formed by 4 subfigures, which correspond to the count of users and the number of minutes registered by the sensors "Entr A Sec" and "Entr C". These visualizations are relative and can be used to obtain general conclusions. For a detailed study, both visualizations with absolute measurements and a data examination are still necessary.

Figure 9(a) represents the number of users detected by the sensor "Entr A Sec", which covers the classrooms of this building, and Figure 9(b) the number of minutes accumulated by the same sensor. It is clear that during the first two months of the year this sensor was not operational, that on Saturdays and Sundays (last and first rows) it registers almost no activity, and that holiday periods are clearly identified, such as Easter in March or the summer holidays in July and August. But this visualization also reveals information that is more difficult to observe in a line graph, such as the fact that Mondays are the days on which users spend the least time in this area. Some blank cells also stand out in the last months of the year, revealing that the sensor did not work during those days.

Figures 9(c) and 9(d) represent the number of users detected and the minutes accumulated by the "Entr C" sensor. As mentioned in the description of Figure 8, this sensor shows very stable measures because it covers the research laboratories, and the workers' schedules are not affected by the school calendar, as is the case for the activity seen by other sensors like "Entr A Sec".

Figure 10 presents a different view, plotted at the hour level during a week in November. This eases the identification of the activity hours, which span from 8:00 in the morning to 22:00 in the evening. Another noticeable effect is the valley in the middle of the day, corresponding to the lunch break, when users move to the cantina (see Figure 2) or leave the buildings. Finally, it is observed that the users leave the school gradually during the afternoon.

4.3. Position Analysis. Before getting into the details and insights obtained from the analysis of the one-year dataset from the spatial perspective, it is worth mentioning that a set of tests was carried out in a controlled environment during the first stage of the deployment in order to check that the Wi-Fi tracking system worked properly. These tests included tracking a well-known group of MAC addresses throughout the Wi-Fi tracking sensors, checking that they appeared in the appropriate ones. It was also checked that the system correctly located and tracked the security staff throughout their night security tours.

A spatial analysis provides insight into how the users are distributed throughout the buildings at different times of the year. To reach these conclusions, we have improved the method presented in [68]. The first improvement is related to the event in which a user is detected by two or more sensors during the same minute. From now on, this event is referred to as a collision. The number of collisions is significant for sensors that are close to each other (e.g., library and building D) or sensors that cover transition areas (e.g., Entr A or Std Tables).

Table 4 shows a study of the number of collisions. The order of a collision is the number of sensors that collide in the same minute, and the count is the number of rows in which a collision of that order occurs. Collisions of order 2 and 3 group more than 50% of the data. The solution to this situation was to eliminate these collisions by replacing, for this experiment, all the rows corresponding to a collision with a single row whose latitude and longitude are the centroid of the positions of the sensors participating in the collision.
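The collision handling described above can be sketched as follows. The sensor coordinates are hypothetical placeholders (not the real positions), and the row layout `(user, sensor, minute)` is an assumption made for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical sensor coordinates (latitude, longitude); NOT the real ones.
SENSOR_POS = {
    "Library": (40.4520, -3.7260),
    "Entr D": (40.4524, -3.7264),
    "Entr A": (40.4530, -3.7270),
}


def resolve_collisions(rows):
    """Collapse rows (user, sensor, minute) into one row per user-minute.

    Returns tuples (user, minute, lat, lng, order), where order is the
    number of distinct sensors that saw the user in that minute and
    (lat, lng) is the centroid of those sensors' positions.
    """
    sensors_seen = defaultdict(set)
    for user, sensor, minute in rows:
        sensors_seen[(user, minute)].add(sensor)

    resolved = []
    for (user, minute), sensors in sorted(sensors_seen.items()):
        lat = sum(SENSOR_POS[s][0] for s in sensors) / len(sensors)
        lng = sum(SENSOR_POS[s][1] for s in sensors) / len(sensors)
        resolved.append((user, minute, lat, lng, len(sensors)))
    return resolved


rows = [
    ("u1", "Library", 10),
    ("u1", "Entr D", 10),    # order-2 collision with the row above
    ("u1", "Library", 11),
    ("u2", "Entr A", 10),
]
resolved = resolve_collisions(rows)
orders = Counter(order for *_rest, order in resolved)   # collisions per order
print(resolved[0], dict(orders))
```

A row of order 1 simply keeps the position of the single sensor that saw the user, so the same function handles collided and non-collided minutes.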

The second improvement consists in the incorporation of external information with the approximate position of the access points to which the users are connected. This information has been obtained from the Mylnikov Geo API [69], getting the position of all the ESSIDs registered throughout the year. In the cases where the user is connected and the approximate position of the access point is available, this new position is used instead of the position of the sensor that generates the row. This allows measuring the user's position with more accuracy. In addition, it smooths the effect of using only the positions of the sensors, which causes the resulting possible positions to always lie inside the hull of the polygon formed by the positions of the sensors.

Figure 11: Monthly centroids heatmaps: (a) June; (b) August; (c) September.

This experiment is based on the concept of the user's centroid. This centroid is the average position of a certain user over a certain period of time. Representing these points on the map reveals hints of the users' behaviors over the studied time slice, based on the number of people in each area.
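A minimal sketch of the user's centroid computation is shown below. It assumes one position per user and minute (for example, after collisions have been resolved), and the coordinates used in the example are hypothetical.

```python
def user_centroids(positions):
    """Average position per user over the period covered by `positions`.

    positions: iterable of (user, minute, lat, lng).
    Returns {user: (mean_lat, mean_lng)}.
    """
    sums = {}
    for user, _minute, lat, lng in positions:
        acc = sums.setdefault(user, [0.0, 0.0, 0])
        acc[0] += lat
        acc[1] += lng
        acc[2] += 1
    return {user: (lat / n, lng / n) for user, (lat, lng, n) in sums.items()}


# Hypothetical positions for two users during one time slice.
positions = [
    ("u1", 10, 40.4522, -3.7262),
    ("u1", 11, 40.4520, -3.7260),
    ("u2", 10, 40.4530, -3.7270),
]
centroids = user_centroids(positions)
print(centroids)
```

Restricting `positions` to a month or to an hour before calling the function yields the monthly and hourly centroids used for the heatmaps.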

Figure 11 shows heatmaps of three different months. Figure 11(a) is June, a month belonging to the second semester of the academic year and the month when the final exams take place. Figure 11(b) is August, during the summer holidays, when there are no classes or exams and most of the professors, researchers, and staff are on vacation. Figure 11(c) represents September, the start of the semester.

Figure 11(b) confirms one of the facts extracted in the temporal analysis: on holidays the number of users falls, and most of the users present in the school keep studying at the library (the warmest point is located over the library and is much smaller than in other months).

Both Figures 11(a) and 11(c) show that building A is the area with the highest concentration of users. Even so, there are differences between the two figures. In Figure 11(a), the warmest area in the map is over the library. This is explained by the final exam period in June. The same observation can be repeated in other periods of time to discover when the students have exams. In Figure 11(c), the hot spots are on the classrooms. September is the beginning of the school year, when attendance to classes is at its greatest. Thus, this observation is an indicator of the level of students' class attendance.

The same type of analysis can be done with shorter time frames to obtain more detailed behaviors. Figure 12 shows the centroids heatmap grouped by hours on September 2nd.

A detailed analysis by hours like this allows observing the users' movement throughout the day. The first row of maps, in Figures 12(a), 12(b), 12(c), and 12(d), shows the evolution of the users' centroids at lunch time, between 12:00 and 16:00. In this transition, it can be seen that there are hot spots on the classes and the library at the beginning. On the next map, those centroids move to the cantina (see Figure 2), and in the last map they return to their original positions. The second row of maps in Figure 12 presents the start and the end of the activity time in the school. In Figures 12(e) and 12(f), it can be observed how the first users of the day go directly to classes. By contrast, Figures 12(g) and 12(h) reveal that users tend to be at the library at the end of the day.

Finally, Table 5 presents the count of the different users detected by each sensor throughout the year. Recalling that the total number of users obtained in Section 4.1 is 4653, the data in the table reveal that, over the year, most users have been seen by every sensor at some point. The two exceptions are the sensor of building C and the sensor of the laboratories of building B. These sensors cover the professors' offices and research laboratories, so it is unusual for students to stay in those areas.

Table 5: Sensor annual statistics.

Sensor        Number of rows    Number of users
Library       14166793          4674
Entr D        6818165           4621
Std Tables    5625974           4670
Entr A        4736237           4620
Entr A Sec    2352019           4579
Entr B Sec    2258618           4530
Entr B Lab    2229560           3630
Entr C        1469026           3093
Entr B        1248462           4127

Figure 12: Hourly centroids heatmaps of 2nd September: (a) 12:00 to 13:00; (b) 13:00 to 14:00; (c) 14:00 to 15:00; (d) 15:00 to 16:00; (e) 8:00 to 9:00; (f) 9:00 to 10:00; (g) 19:00 to 20:00; (h) 20:00 to 21:00.

4.4. Behavior Analysis. The third set of experiments we performed deals with the behaviors that each user follows throughout a single day. To obtain them, we grouped the data using a user-day key. For each key, a vector of 24 positions (one per hour) is created. In each position of this vector, we record the sensor that detected this user for most of the time during that hour. This vector therefore represents the route that the user followed throughout that day, hour by hour. Once the behavior vectors are obtained, the information about the day and the user is discarded in order to count the most repeated behavior vectors. The dimensionality of these vectors makes the number of possible behaviors huge (theoretically $24^{10} \approx 6 \times 10^{13}$, or $14^{10} \approx 289$K million using only the activity hours), but it is known that users behave similarly, so a much lower number of behaviors can be expected, even though it will still be a high number.
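The construction of the behavior vectors can be sketched as follows. The sketch assumes rows of the form `(user, sensor, minute)` with the minute counted as an integer from the start of the dataset, and uses `None` for hours without detections; both choices are illustrative assumptions.

```python
from collections import Counter, defaultdict

MINUTES_PER_DAY = 24 * 60


def behavior_vectors(rows):
    """Build one 24-slot vector per (user, day) key.

    Each slot holds the sensor that detected the user for the most minutes
    during that hour, or None when the user was not detected.
    """
    tally = defaultdict(Counter)      # (user, day, hour) -> minutes per sensor
    for user, sensor, minute in rows:
        day, minute_of_day = divmod(minute, MINUTES_PER_DAY)
        tally[(user, day, minute_of_day // 60)][sensor] += 1

    vectors = {}
    for (user, day, hour), per_sensor in tally.items():
        vector = vectors.setdefault((user, day), [None] * 24)
        vector[hour] = per_sensor.most_common(1)[0][0]
    return {key: tuple(vector) for key, vector in vectors.items()}


def most_repeated(vectors, n=20):
    """Drop the user-day keys and count identical behavior vectors."""
    return Counter(vectors.values()).most_common(n)


rows = [
    ("u1", "Library", 540), ("u1", "Library", 541), ("u1", "Entr D", 542),
    ("u1", "Library", 600),
    ("u2", "Library", 545), ("u2", "Library", 610),
]
vectors = behavior_vectors(rows)
top = most_repeated(vectors)
print(top[0][1])   # number of user-days sharing the most common behavior
```

Both users end up with the same vector (library from 9:00 to 11:00), so the behavior is counted twice once the user-day keys are discarded.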

Table 6 presents the 20 most repeated behaviors throughout the year. A total of 285K behaviors are detected, of which 139K are unique. The 500 most frequent behaviors group 25% of the total behaviors. Table 6 shows only the part corresponding to the activity hours of the school, which, as observed in the temporal analysis, span from 8:00 am to 10:00 pm. As can be seen, the majority of the most repeated behaviors are stays of between two and five hours in the library. It is necessary to expand the analysis to the top 20 to observe the class attendance behaviors. It is easy to appreciate that the different behaviors usually take place either in the morning or in the afternoon, with lunchtime from 1:00 pm to 3:00 pm, which means that the majority of users go either in the morning or in the afternoon, but they do not spend all day at school.

Table 6: Top 20 most repeated behaviors. Each behavior occupies one or more of the 14 hourly slots between 8:00 and 22:00 (8:00-9:00 to 21:00-22:00) at a single sensor.

Rank    Sensor    Hourly slots
1       Lib       5
2       Lib       4
3       EntA      1
4       Lib       3
5       Lib       4
6       Lib       1
7       EntA      1
8       Lib       6
9       Std       14
10      EntA      1
11      EntA      1
12      Lib       5
13      Lib       2
14      Lib       2
15      EntA      1
16      EntA      2
17      EntA      1
18      Lib       2
19      SecA      4
20      EntA      1

Lib = Library; EntA = Entr A; SecA = Entr A Sec; Std = Std Tables. The position of each slot within the day is not preserved in this rendering of the table.

Another quite obvious observation is that many of the behaviors obtained are very similar to each other (e.g., going to the library from 9:00 am to 2:00 pm or going to the library from 9:00 am to 1:00 pm). In order to lower this redundancy, we performed a clustering procedure on them. Each behavior can be understood as a categorical vector of length 24 in which the categories correspond to the 9 possible sensors of the dataset plus the empty category. The chosen clustering algorithm is Proximus [70], due to its simplicity, efficiency, scalability, and the reliability of its results. The algorithm works with binary vectors and creates clusters based on the Hamming distance (the number of bits that differ between two binary vectors). A vector is chosen as the center of the cluster, and other vectors whose distance to the center is smaller than the maximum cluster radius are added to that cluster.
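The idea of grouping binary vectors around a center within a Hamming radius can be illustrated with the sketch below. It is a simplified greedy stand-in written for illustration, not the Proximus algorithm itself; the function names, the choice of the first unassigned vector as center, and the use of an inclusive radius are all assumptions.

```python
def hamming(a, b):
    """Number of positions at which two equal-length binary vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def greedy_clusters(vectors, radius):
    """Greedy radius-based grouping (illustrative only, NOT Proximus).

    vectors: list of equal-length binary tuples, e.g. most frequent first.
    A vector joins the first cluster whose center is within `radius`
    (inclusive, by assumption); otherwise it starts a new cluster.
    """
    clusters = []                       # list of (center, members)
    for vector in vectors:
        for center, members in clusters:
            if hamming(center, vector) <= radius:
                members.append(vector)
                break
        else:
            clusters.append((vector, [vector]))
    return clusters


vectors = [
    (1, 0, 0, 0),
    (1, 1, 0, 0),   # one bit away from the first vector
    (0, 0, 1, 1),   # far from both
]
clusters = greedy_clusters(vectors, radius=1)
print([(center, len(members)) for center, members in clusters])
```

Feeding the vectors in decreasing order of frequency makes the most repeated behaviors act as cluster centers, which loosely mirrors how the cluster centers are later reported.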

The behaviors are expressed as categorical vectors, so it is necessary to transform them into binary vectors to be used in Proximus. The transformation shown in Equation (1) is proposed. This transformation is simple and also reversible, which allows for the recovery of the original behaviors after clustering.

$$
\begin{aligned}
B &= \{h_0, \ldots, h_{23}\}, \qquad h_i \in \{0, a_1, \ldots, a_9\} \\
B &\longrightarrow \hat{B} = \{\delta_{0,a_1}, \ldots, \delta_{0,a_9}, \ldots, \delta_{23,a_1}, \ldots, \delta_{23,a_9}\} \\
\delta_{i,a_j} &=
\begin{cases}
0 & \text{if } h_i \neq a_j \\
1 & \text{if } h_i = a_j
\end{cases}
\end{aligned}
\tag{1}
$$

where $a_1, \ldots, a_9$ are the nine sensors, $B$ is the behavior vector described above, and $\hat{B}$ is the transformed vector of length $24 \times 9$, made up of 24 subgroups of 9 samples, each subgroup associated with one hour. Every sample in a subgroup is 0, except for the one at the index of the sensor active in that hour, which is marked with a 1.
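The transformation of Equation (1) and its inverse can be sketched as follows. The ordering of the sensors in `SENSORS` and the use of `None` for the empty category are assumptions made for illustration.

```python
# Assumed ordering of the nine sensors (a_1 ... a_9); any fixed order works.
SENSORS = [
    "Entr A", "Entr A Sec", "Entr B", "Entr B Lab", "Entr B Sec",
    "Entr C", "Entr D", "Library", "Std Tables",
]


def to_binary(behavior):
    """Map a 24-slot categorical behavior to a 24 * 9 binary vector.

    Each slot is a sensor name or None (empty category); an empty slot
    produces nine zeros.
    """
    bits = []
    for slot in behavior:
        bits.extend(1 if slot == sensor else 0 for sensor in SENSORS)
    return tuple(bits)


def from_binary(bits):
    """Inverse transformation: recover the categorical behavior."""
    size = len(SENSORS)
    behavior = []
    for start in range(0, len(bits), size):
        chunk = bits[start:start + size]
        behavior.append(SENSORS[chunk.index(1)] if 1 in chunk else None)
    return tuple(behavior)


# Hypothetical behavior: library from 9:00 to 13:00.
behavior = tuple("Library" if 9 <= hour < 13 else None for hour in range(24))
encoded = to_binary(behavior)
print(len(encoded), sum(encoded), from_binary(encoded) == behavior)
```

With this encoding, two behaviors that differ in one hour are at Hamming distance 2 when both hours are occupied by different sensors, and at distance 1 when one of them is empty.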

Table 7 shows the results of applying the Proximus clustering to the behaviors found. The first column is the center of the cluster, the second one is the number of distinct behaviors (components) grouped in that cluster, and the third is the total number of recorded behaviors that fall within that cluster. The results are presented ordered by the number of behaviors included in the cluster. A total of 45707 clusters are obtained. The first 300 group 50% of the behaviors recorded throughout the year. The most important cluster behaviors in Table 7 can be understood as follows:

Table 7: Behavior clusters.

      Cluster center                                          Cluster components    Behaviors count
1     Entr A Sec from 9:00 to 14:00                           415                   4963
2     Entr B Sec from 10:00 to 13:00                          458                   3756
3     Entr B Sec from 15:00 to 19:00                          471                   2924
4     Library from 9:00 to 13:00                              256                   2870
5     Library from 10:00 to 12:00                             176                   2396
6     Library from 18:00 to 21:00                             244                   2231
7     Entr A 13:00                                            787                   1979
8     Library 12:00                                           563                   1894
9     Library from 18:00 to 19:00 and from 20:00 to 21:00     175                   1853
10    Entr B Lab 11:00 to 20:00                               568                   1831

(i) Clusters 1, 2, and 3: students attending classes. These sensors cover the main classrooms in the school, and the intervals match the class schedule.

(ii) Clusters 4, 5, 6, and 9: students at the library.

(iii) Cluster 7: users who are usually outside the sensors' coverage areas, arriving or leaving.

(iv) Cluster 10: laboratory equipment and professors. This sensor covers some of the professors' offices and laboratories, which usually have some laboratory equipment connected during work hours.

5. Validation of Occupancy Estimation Based on Wi-Fi Tracking

As pointed out in Section 4.3, at the very first stage of the deployment it was tested that the Wi-Fi sensors properly locate and track well-known MAC addresses. However, it was still necessary to validate the accuracy of the system for estimating occupancy, since it can be distorted by the aforementioned fact that a single user can carry several devices connected to Wi-Fi networks. As no ground-truth data were available for the full set of buildings and only some data were found for the library, we centered our validation efforts on comparing our data with the available ones: if we can trust our results in that area, then we can extend our trust to the rest of the areas, for which no well-known data are available.

The library of ETSIT has 408 study seats, and it offers a web service to check the number of available seats at a given moment [71]. This system is based on two sources of information: a person who counts the empty seats every opening hour from Friday to Sunday, and a video camera located at the main entrance of the library that counts the number of people entering or leaving at 15-minute intervals. The human system provides a ground truth about the number of occupied seats, but this measure is very different from the actual number of people in the library, since a common situation, especially during examination terms, is that students place their study material at a seat to reserve it while they are not in the library. This situation is a problem for the library staff, and therefore they installed the video camera system to count the students' entrances and exits.

This system generates an estimation of the number of people in the library by adding to the previous measure the number of people detected entering and subtracting the number of people detected leaving. The system is not perfect, and in most cases it carries an accumulated error that grows in the estimation of the number of people in the library. The total error can be calculated clearly at the end of the day, when the library closes and the number of people inside is supposed to be zero. In summer, there is a situation that aggravates this error and, consequently, the measure achieved by this system: due to the rise in temperature, the back door of the library is opened to improve ventilation, which allows students to exit through it, although they must continue entering through the main door. The camera does not count students leaving through this back door.
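The running estimate kept by the camera system, and the way its accumulated error shows up at closing time, can be sketched as below. The interval counts are hypothetical and the function name is illustrative.

```python
def camera_occupancy(entries, exits, start=0):
    """Running occupancy estimate from per-interval entry and exit counts.

    entries, exits: people counted in each 15-minute interval.
    Any exit the camera misses stays in the estimate for the rest of the day.
    """
    occupancy, current = [], start
    for entered, left in zip(entries, exits):
        current += entered - left
        occupancy.append(current)
    return occupancy


# Hypothetical counts; the library is empty after the last interval.
entries = [10, 5, 0, 0]
exits = [0, 3, 6, 2]          # 4 people left through an uncounted door
occupancy = camera_occupancy(entries, exits)
accumulated_error = occupancy[-1]   # should be 0 once the library is closed
print(occupancy, accumulated_error)
```

Because the estimate is a cumulative sum, a single missed exit biases every later interval, which is consistent with the error growing during the afternoon.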

The library staff provided us with the data collected by the two systems (human and camera) between June 5th and 30th. These measurements can be compared with those obtained by the Wi-Fi sensor installed at the library in order to validate them. For this test, all the data collected by the sensor are used, without filtering out the MACs of the sporadic users as explained before.

Figure 13 shows the data collected by the three systems on Sunday, June 5th. This is the first day with data from the three sources. Other days on which these three sources are present show the same trends. It is clear that there is a divergence between the human system observation and the rest of the data. The graph of the human system shows that the number of occupied seats increases in the first hours up to the maximum and remains steady until the end of the day, without being affected by the behavior of the users at lunchtime. However, this effect is reflected in the camera system and Wi-Fi tracking measurements. We observed that, in general, the number of people accounted for by the camera system is below the Wi-Fi tracking system measure, although the proportion is maintained over time. Finally, the figure shows the cumulative error effect of the camera system, which at the end of the day still renders 65 people in the library.

Figure 13: Measure of library occupation using Wi-Fi, camera, and human systems.
[Axes: x-axis, hour (10:00 to 22:00); y-axis, number of users (0 to 400); series: Camera, Human, Wifi.]

Figure 14: Measures every 15 minutes of library occupation during June: (a) Wi-Fi records; (b) camera records; (c) difference between Wi-Fi and camera.
[Panels: columns, hour of the day (00 to 23).]

To better study the relation between the camera system and the Wi-Fi tracking system, we generated a detailed visualization that allows us to observe the data of the whole month in a single figure. Figure 14 is composed of 3 subfigures, each of which is a matrix of colored cells: the rows represent a full day and the columns are divisions of 15 minutes. The color intensity is proportional to the number of users measured in each interval, being more intense when more users are detected. Figure 14(a) shows the measurements of the Wi-Fi sensor. The behavior is the one observed in previous sections of this document. Figure 14(b) shows the measurements of the camera system. During the first half of the day, the same trend as in Figure 14(a) can be observed, but in the afternoon and at night the accumulated error begins to be noticeable. This error reaches a maximum of 378 people, with an average of 139 people at the end of the day, while the Wi-Fi tracking system registers a maximum of 10 and an average of 7 at the end of the day (these are machines that are still turned on when the library is empty of people). Finally, Figure 14(c) shows the difference between the two previous ones, using green when the difference is positive (the Wi-Fi tracking system registers a higher value than the camera system) and red otherwise. This figure validates the measurements obtained by the Wi-Fi tracking system, since the difference with the measure of the camera system in the first half of the day is small (an average of 27 people counting the data from 8:00 to 16:00), whereas in the afternoon the error accumulated by the camera system makes the average difference grow to 83 people between 16:00 and 22:00.
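The comparison between the two series over a time window can be sketched as follows. The sketch assumes 15-minute counts starting at 8:00 and averages the absolute difference; whether the reported averages are signed or absolute is not stated, so this choice, like the sample values, is an assumption.

```python
INTERVALS_PER_HOUR = 4   # 15-minute intervals


def mean_abs_difference(wifi, camera, start_hour, end_hour, first_hour=8):
    """Average absolute difference between two 15-minute count series.

    wifi, camera: counts per interval, the first one starting at first_hour.
    The window [start_hour, end_hour) is expressed in hours of the day.
    """
    lo = (start_hour - first_hour) * INTERVALS_PER_HOUR
    hi = (end_hour - first_hour) * INTERVALS_PER_HOUR
    diffs = [abs(w - c) for w, c in zip(wifi[lo:hi], camera[lo:hi])]
    return sum(diffs) / len(diffs)


# Hypothetical counts for 8:00 to 10:00 (eight intervals).
wifi = [10, 20, 30, 40, 50, 50, 40, 30]
camera = [8, 18, 33, 40, 60, 70, 60, 50]
early = mean_abs_difference(wifi, camera, 8, 9)
late = mean_abs_difference(wifi, camera, 9, 10)
print(early, late)
```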

The Wi-Fi sensor of the library was not placed for the specific purpose of counting the number of people in the library, and, as a consequence, its coverage area can include devices that are outside the library. This explains why in some cases it registers a measurement greater than that of the camera system. In any case, the Wi-Fi sensor, with its limitations, registers a measure very similar to that of the camera-based people counting system, while avoiding the cumulative error of that system and being much cheaper (tens of euros against thousands of euros) and less intrusive than a video camera. Currently, since the camera system is already installed, the Wi-Fi sensor measures can be used as a maximum that would limit the error made by this system. In summary, the usage of a Wi-Fi sensor to count people in the library provides accurate results despite the potential errors induced by the sensing period, the collisions, the ratio between users and MACs seen, and the data processing performed. These results invite us to trust that the results of our analyses for the rest of the areas on campus are also correct.

6. Discussion and Conclusions

In the execution of this case study, we have learned some lessons about the limitations we faced that are worth taking into account for similar systems and analyses. First, there is the key issue of the sensors' deployment: it is not just a question of density (number of sensors per unit of surface), but of adapting its design to the topology of the place to be monitored. So, for example, it is crucial to have sensors not only at the entries and exits of buildings but also inside them, as in the considered deployment, where there are sensors placed at the entrances and exits of the buildings but also in other especially relevant places, such as the library or the work-in-group area.

A complete coverage of indoor devices, without collisions and for the complete timespan of the stay indoors, would require a large deployment with many sensors able to cover all corners. This is not suitable in this context; instead, we relied on 9 sensors, some at gates and others in large rooms, which proved to be enough to check whether a device is within the reach of any of them. But some assumptions had to be made: we considered as valid behaviors only those that spent a significant amount of time once they had been seen (to remove transient behaviors and those of people walking outside the buildings); also, we discarded behaviors that did not appear enough times in the yearly timespan. On the other hand, the topology of these buildings did not impose severe restrictions on the sensors' coverage. In fact, we found several devices seen by different sensors at once. In summary, we can conclude that the finer the spatial granularity (more sensors), the larger the set of different behaviors, so using raw data from sensors would lead to an explosion of states that would render meaningless results.

The gathered data is another key issue: some works in the literature obtained signal strength measurements every few seconds, allowing them to identify indoor trajectories. This was not our case and, in fact, this proved to be very limiting. Thus, the precision of our analysis is limited by the regions covered by each sensor, which prevents us from analyzing any kind of movement within regions. Nevertheless, as has been seen, relevant results can still be obtained without the increase in energy consumption and the risk of flooding the school communications network that the other approach would entail.

As regards the analysis of data to identify users' behaviors, a long observation period is a must. Obtaining data for a complete work cycle (in this case, a full school year) helps in discovering the common behaviors that occur in a university. Long observation times also help in detecting erroneous or atypical operation of the sensors, as happened in the first months of the year in this case. Even larger observation datasets would reinforce the behaviors we have found (e.g., seasonality), but the chance of discovering new ones is negligible, because the expected behaviors on the campus are periodic with the school year.

The usage of big data platforms for analysis, while not mandatory, eases the management of large datasets and the execution of iterative studies on the data. The preprocessing work carried out allowed us to clean and filter our data. For example, some discovered behaviors were too regular and extended over time, which may correspond to servers or machines; we could then filter these out.

The temporal analysis has revealed some behaviors that are maintained throughout the year and others that occur occasionally. The work has focused on the study of behaviors that are repeated and maintained throughout the year. A closer view, such as the analysis per hour, shows the hours of activity, nocturnal patterns, or movements at mealtimes. In the daily view, the difference in activity between working days and weekends or holidays is clearly seen. Visualization has been a key technique for detecting and understanding these temporal patterns.

The spatial analysis revealed that collisions occur in user detection, but their effect was mitigated by the use of innovative algorithms and techniques, such as the calculation of centroids and the combination of sensor information with external sources of information (i.e., semantic trajectories), so that a much more precise positioning of the users is achieved than with the exclusive use of the sensors. The visualization of centroids fostered the visual understanding of complex data, such as the position of each user over a month, and the superposition of the centroids in a heatmap allowed us to know the movements of groups of people and the occupation of the different zones.
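As an illustration of the centroid idea, the following minimal sketch places a user at the mean of the positions of the sensors that detected them, with each detected minute counting once. The function name, the input layout, and the use of a plain arithmetic mean of latitude and longitude (reasonable only over a small area such as a campus) are assumptions made for this example, not the exact procedure used in the study.

```python
from typing import Dict, Iterable, Tuple

Position = Tuple[float, float]  # (latitude, longitude)


def user_centroid(detections: Iterable[str],
                  sensor_positions: Dict[str, Position]) -> Position:
    """Mean position of the sensors that saw a user.

    `detections` holds one sensor name per detected minute, so sensors
    that saw the user for longer pull the centroid towards themselves.
    """
    lat_sum = lng_sum = 0.0
    count = 0
    for sensor in detections:
        lat, lng = sensor_positions[sensor]
        lat_sum += lat
        lng_sum += lng
        count += 1
    if count == 0:
        raise ValueError("no detections for this user")
    return lat_sum / count, lng_sum / count


if __name__ == "__main__":
    # Hypothetical coordinates, chosen only to keep the arithmetic obvious.
    positions = {"Library": (0.0, 0.0), "Entr A": (2.0, 4.0)}
    print(user_centroid(["Library", "Entr A", "Entr A", "Entr A"], positions))
```

A user seen simultaneously by two sensors is thus placed between them rather than arbitrarily assigned to one, which is one simple way of softening the collision problem.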


Furthermore, we have found dominant users' behaviors, i.e., the most repeated behaviors registered by the sensors throughout the year. The number of behaviors found was huge, but the application of the Proximus clustering algorithm reduced this number to a manageable amount. The study of the obtained clusters then revealed that the most frequent behaviors coincide with what can be expected from a college building: researchers working in laboratories and students attending classes or studying in the library.

Although some of the discovered behaviors and patterns can be seen as common knowledge, it is worth stressing that they do bring value, since they represent numerical evidence that supports decision making (e.g., someone may think that the Wi-Fi access in a given area does not work properly because it is always overcrowded, but numerical evidence is needed to appropriately justify the investment of increasing the number of APs of the corporate WLAN in that area to improve the service). In addition, such well-known patterns, when obtained automatically by processing the available data, become baseline models which can be used to detect anomalies or atypical situations, as is common practice in unsupervised machine learning.
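To make the baseline idea concrete, the sketch below builds a per-weekday baseline from historical daily counts of unique users and flags a day whose count deviates strongly from it. The z-score rule, the threshold `k`, and all names are illustrative assumptions; the text does not prescribe a specific anomaly detection method.

```python
from statistics import mean, pstdev
from typing import Dict, List, Tuple

Baseline = Dict[int, Tuple[float, float]]  # weekday -> (mean, std deviation)


def build_baseline(history: Dict[int, List[int]]) -> Baseline:
    """Summarize past daily unique-user counts per weekday (0 = Monday)."""
    return {weekday: (mean(counts), pstdev(counts))
            for weekday, counts in history.items()}


def is_anomalous(baseline: Baseline, weekday: int, count: int,
                 k: float = 3.0) -> bool:
    """Flag a count more than k standard deviations away from the baseline."""
    mu, sigma = baseline[weekday]
    if sigma == 0:
        # Degenerate baseline: any deviation from the constant value is flagged.
        return count != mu
    return abs(count - mu) / sigma > k


if __name__ == "__main__":
    # Hypothetical Monday counts, for illustration only.
    baseline = build_baseline({0: [1000, 1100, 900, 1000]})
    print(is_anomalous(baseline, 0, 50), is_anomalous(baseline, 0, 1050))
```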

Lastly, we have validated the accuracy of using Wi-Fi tracking for occupancy estimation, comparing it with the library staff's manual counting (considered as ground truth) and with a video camera system installed at the library main entrance. As a main finding, Wi-Fi tracking has proved to be more accurate than the video camera system, in addition to being much cheaper. As a matter of fact, the library staff currently prefers the occupancy estimation based on Wi-Fi tracking over the one based on the video camera system. Nevertheless, the accuracy of the Wi-Fi tracking system can be further improved by correlating MAC addresses detected in the same places over sufficiently long periods of time and counting them only once, thus mitigating the issue that a single person can carry several devices connected to Wi-Fi networks.
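One possible way to correlate MAC addresses, sketched here purely as an assumption, is to compare the sets of (minute, sensor) observations of each pair of devices with the Jaccard similarity and to treat pairs above a threshold as belonging to the same person. The similarity measure, the 0.8 threshold, and the names below are hypothetical choices for illustration.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

Observation = Tuple[int, str]  # (minute, sensor name)


def jaccard(a: Set[Observation], b: Set[Observation]) -> float:
    """Share of observations that two devices have in common."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def co_located_pairs(obs_by_mac: Dict[str, Set[Observation]],
                     threshold: float = 0.8) -> List[Tuple[str, str]]:
    """Pairs of MAC addresses seen at the same places at the same times."""
    return [(m1, m2)
            for m1, m2 in combinations(sorted(obs_by_mac), 2)
            if jaccard(obs_by_mac[m1], obs_by_mac[m2]) >= threshold]


if __name__ == "__main__":
    at_library = {(minute, "Library") for minute in range(60)}
    devices = {
        "phone": at_library,
        "laptop": set(at_library),
        "other": {(minute, "Entr A") for minute in range(60)},
    }
    print(co_located_pairs(devices))
```

The pairwise comparison is quadratic in the number of devices, so for a dataset of this size it would in practice be restricted to the few thousand devices labelled as users.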

Besides this, the Wi-Fi tracking system is currently used by the library staff to perform more sophisticated studies, such as figuring out the percentage of students from the different schools of the university who come to study at the library of the Telecommunications Engineering School during the weekend. Figure 15 shows the results of such an analysis, which is a token of how this kind of IoT system can help in solving real-life problems and improving the operation of already running services.

To summarize, we have studied a one-year Wi-Fi tracking dataset obtained from a reduced set of low-cost sensors with limited capabilities, deployed on an actual university campus that receives around 4000 people every day. We have processed the data in order to identify traces of Wi-Fi-enabled mobile devices, which are identified with people moving in the campus buildings, and then we have extracted people's stays, movements, and common behaviors. The obtained results represent numerical evidence that illustrates how a low-cost Wi-Fi tracking system can be used in real-life conditions to improve or optimize the operation of the monitored premises. These results make it possible to dimension the WLAN infrastructure or the canteen personnel appropriately, or to detect anomalous situations in real time. Furthermore, the data from the people flow monitoring system is currently being used together with the data from the environmental monitoring system to try to reduce the environmental footprint of the school [72]. In addition, the Wi-Fi tracking system is actually used by the library staff in their day-to-day activity, which illustrates the value that this kind of IoT infrastructure can bring to real-life problems and services.

Figure 15: Users who only go to the library of the Telecommunication School on weekends, classified by their school (categories: CIVIL, INDUS, NAVAL, SPORT, ARQUI, FORES, EDIF, AERO, TELECOM).

Data Availability

The dataset with the Wi-Fi sensor records used to support the findings of this study is available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work has been partly funded by Universidad Politécnica de Madrid through the project RES2+U (Responsables, Sostenibles y Universitarios) (http://blogs.upm.es/res2masu/). The work performed by José M. Navarro was funded by the Ministerio de Educación, Cultura y Deporte de España under Grant no. FPU 14/03209.

References

[1] "2018 Revision of World Urbanization Prospects - Multimedia Library - United Nations Department of Economic and Social Affairs," https://www.un.org/development/desa/publications/2018-revision-of-world-urbanization-prospects.html.

[2] M. Pallot, "Engaging Users into Research and Innovation: The Living Lab Approach as a User Centred Open Innovation Ecosystem," https://web.archive.org/web/20120509081658/http://www.cwe-projects.eu/pub/bscw.cgi/1760838?id=715404_1760838.

[3] E. Almirall and J. Wareham, "Living Labs: arbiters of mid- and ground-level innovation," Technology Analysis & Strategic Management, vol. 23, no. 1, pp. 87–102, 2011.

[4] "TU Delft Green office," https://www.tudelft.nl/sustainability.

[5] "UBC Campus as a living laboratory," https://sustain.ubc.ca/our-commitment/campus-living-lab.

[6] "Harvard Sustainability," https://green.harvard.edu.

[7] "UPM City of the Future initiative," http://blogs.upm.es/cityofthefuture-upm/en/initiative.

[8] M. Alvarez-Campana, G. Lopez, E. Vazquez, V. A. Villagra, and J. Berrocal, "Smart CEI Moncloa: An IoT-based platform for people flow and environmental monitoring on a Smart University Campus," Sensors, vol. 17, no. 12, 2017.

[9] M. B. Kjærgaard, H. Blunck, T. Godsk, T. Toftkjær, D. L. Christensen, and K. Grønbæk, "Indoor positioning using GPS revisited," Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics): Preface, vol. 6030, pp. 38–56, 2010.

[10] A. Belmonte-Hernandez, G. Hernandez-Penaloza, F. Alvarez, and G. Conti, "Adaptive Fingerprinting in Multi-Sensor Fusion for Accurate Indoor Tracking," IEEE Sensors Journal, vol. 17, no. 15, pp. 4983–4998, 2017.

[11] M. S. Kristoffersen, J. V. Dueholm, R. Gade, and T. B. Moeslund, "Pedestrian counting with occlusion handling using stereo thermal cameras," Sensors, vol. 16, no. 1, 2016.

[12] L. Zheng, X. Ruan, Y. Chen, and M. Huang, "Shadow removal for pedestrian detection and tracking in indoor environments," Multimedia Tools and Applications, vol. 76, no. 18, pp. 18321–18337, 2017.

[13] T. Li, H. Chang, M. Wang, B. Ni, R. Hong, and S. Yan, "Crowded scene analysis: a survey," IEEE Transactions on Circuits and Systems for Video Technology, vol. 25, no. 3, pp. 367–386, 2015.

[14] F. Adib and D. Katabi, "See through walls with WiFi," in Proceedings of the Annual Conference of the ACM Special Interest Group on Data Communication on the Applications, Technologies, Architectures, and Protocols for Computer Communication, ACM SIGCOMM 2013, pp. 75–86, China, August 2013.

[15] Z. Kabelac, D. Katabi, and R. C. Miller, "3D Tracking via Body Radio Reflections," in Proceedings of the 11th USENIX Conference on Networked Systems Design and Implementation, pp. 317–329, 2014.

[16] F. Adib, Z. Kabelac, and D. Katabi, "Multi-person localization via RF body reflections," in Proceedings of the 12th USENIX Symposium on Networked Systems Design and Implementation, NSDI 2015, pp. 279–292, USA, May 2015.

[17] Z.-A. Deng, G. Wang, D. Qin, Z. Na, Y. Cui, and J. Chen, "Continuous indoor positioning fusing WiFi, smartphone sensors and landmarks," Sensors, vol. 16, no. 9, 2016.

[18] R. A. Becker, R. Caceres, K. Hanson et al., "A tale of one city: Using cellular network data for urban planning," IEEE Pervasive Computing, vol. 10, no. 4, pp. 18–26, 2011.

[19] J. E. Mallah, F. Carrino, O. A. Khaled, and E. Mugellini, "Crowd monitoring: critical situations prevention using smartphones and group detection," Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics): Preface, vol. 9189, pp. 496–505, 2015.

[20] J. Kuang, X. Niu, and X. Chen, "Robust Pedestrian Dead Reckoning Based on MEMS-IMU for Smartphones," Sensors, vol. 18, no. 5, p. 1391, 2018.

[21] A. Kurkcu and K. Ozbay, "Estimating Pedestrian Densities, Wait Times, and Flows with Wi-Fi and Bluetooth Sensors," Transportation Research Record, vol. 2644, no. 1, pp. 72–82, 2017.

[22] A. Kotanen, M. Hannikainen, H. Leppakoski, and T. Hamalainen, "Positioning with IEEE 802.11b wireless LAN," in Proceedings of the 14th IEEE 2003 International Symposium on Personal, Indoor and Mobile Radio Communications, vol. 3, pp. 2218–2222, Beijing, China, 2003.

[23] W. Ho, A. Smailagic, D. P. Siewiorek, and C. Faloutsos, "An adaptive two-phase approach to WiFi location sensing," in Proceedings of the 4th Annual IEEE International Conference on Pervasive Computing and Communications Workshops, PerCom Workshops 2006, pp. 452–456, Italy, March 2006.

[24] F. Evennou and F. Marx, "Advanced integration of WiFi and inertial navigation systems for indoor mobile positioning," EURASIP Journal on Applied Signal Processing, vol. 2006, Article ID 86706, 11 pages, 2006.

[25] J. Rekimoto, T. Miyaki, and T. Ishizawa, "LifeTag: WiFi-based continuous location logging for life pattern analysis," LNCS, vol. 4718, pp. 35–49, 2007.

[26] J. A. Besada, A. M. Bernardos, P. Tarrío, and J. R. Casar, "Analysis of tracking methods for wireless indoor localization," in Proceedings of the 2nd International Symposium on Wireless Pervasive Computing (ISWPC '07), pp. 492–497, February 2007.

[27] O. Woodman and R. Harle, "Pedestrian localisation for indoor environments," in Proceedings of the 10th International Conference on Ubiquitous Computing (UbiComp '08), pp. 114–123, Seoul, Republic of Korea, September 2008.

[28] F. Aloul, A. Sagahyroon, A. Al-Shami, I. Al-Midfa, and R. Moutassem, "Using mobiles for on campus location tracking," in Proceedings of the 7th International Conference on Advances in Mobile Computing and Multimedia, MoMM2009, pp. 231–235, Malaysia, December 2009.

[29] S. Woo, S. Jeong, E. Mok et al., "Application of WiFi-based indoor positioning system for labor tracking at construction sites: A case study in Guangzhou MTR," Automation in Construction, vol. 20, no. 1, pp. 3–13, 2011.

[30] N. Le Dortz, F. Gain, and P. Zetterberg, "WiFi fingerprint indoor positioning system using probability distribution comparison," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '12), pp. 2301–2304, March 2012.

[31] Z. Chen, H. Zou, H. Jiang, Q. Zhu, Y. C. Soh, and L. Xie, "Fusion of WiFi, smartphone sensors and landmarks using the Kalman filter for indoor localization," Sensors, vol. 15, no. 1, pp. 715–732, 2015.

[32] Y. Shu, C. Bo, G. Shen, C. Zhao, L. Li, and F. Zhao, "Magicol: indoor localization using pervasive magnetic field and opportunistic WiFi sensing," IEEE Journal on Selected Areas in Communications, vol. 33, no. 7, pp. 1443–1457, 2015.

[33] A. Danalet, B. Farooq, and M. Bierlaire, "A Bayesian approach to detect pedestrian destination-sequences from WiFi signatures," Transportation Research Part C: Emerging Technologies, vol. 44, pp. 146–170, 2014.


[34] O. Czogalla and S. Naumann, "Pedestrian indoor navigation for complex public facilities," in Proceedings of the 2016 International Conference on Indoor Positioning and Indoor Navigation, IPIN 2016, pp. 1–8, Spain, October 2016.

[35] L. Schauer, P. Marcus, and C. Linnhoff-Popien, "Towards feasible Wi-Fi based indoor tracking systems using probabilistic methods," in Proceedings of the 2016 International Conference on Indoor Positioning and Indoor Navigation, IPIN 2016, pp. 1–8, Spain, October 2016.

[36] F. Meneses and A. Moreira, "Large scale movement analysis from WiFi based location data," in Proceedings of the 2012 International Conference on Indoor Positioning and Indoor Navigation, IPIN 2012, Australia, November 2012.

[37] B. Bonne, A. Barzan, P. Quax, and W. Lamotte, "WiFiPi: involuntary tracking of visitors at mass events," in Proceedings of the IEEE 14th International Symposium on a World of Wireless, Mobile and Multimedia Networks (WoWMoM '13), pp. 1–6, Madrid, Spain, June 2013.

[38] M. B. Kjaergaard, M. Wirz, D. Roggen, and G. Troster, "Mobile sensing of pedestrian flocks in indoor environments using WiFi signals," in Proceedings of the 10th IEEE International Conference on Pervasive Computing and Communications (PerCom '12), pp. 95–102, Lugano, Switzerland, March 2012.

[39] S. Sendra, M. Garcia, C. Turro, and J. Lloret, "People mobility behaviour study in a university campus using WLANs," in Proceedings of the 3rd International Conference on Mobile Ubiquitous Computing, Systems, Services, and Technologies, UBICOMM 2009, pp. 124–129, Malta, October 2009.

[40] J. Scheuner, G. Mazlami, D. Schoni et al., "Probr - A Generic and Passive WiFi Tracking System," in Proceedings of the 41st IEEE Conference on Local Computer Networks, LCN 2016, pp. 495–502, UAE, November 2016.

[41] Y. Li, S. Williams, B. Moran, and A. Kealy, "Quantized RSS Based Wi-Fi Indoor Localization with Room Level Accuracy," in Proceedings of the International Global Navigation Satellite Systems 2018, 2018.

[42] A. Alessandrini, C. Gioia, F. Sermi, I. Sofos, D. Tarchi, and M. Vespe, "WiFi positioning and Big Data to monitor flows of people on a wide scale," in Proceedings of the 25th European Navigation Conference, ENC 2017, pp. 322–328, Switzerland, May 2017.

[43] L. Vu, K. Nahrstedt, S. Retika, and I. Gupta, "Joint bluetooth/wifi scanning framework for characterizing and leveraging people movement in university campus," in Proceedings of the 13th ACM International Conference on Modeling, Analysis, and Simulation of Wireless and Mobile Systems (MSWiM '10), pp. 257–265, October 2010.

[44] M. Garcia, S. Sendra, C. Turro, and J. Lloret, "User's Macro and Micro-mobility Study using WLANs in a University Campus," International Journal On Advances in Internet Technology, vol. 4, no. 1, pp. 37–46, 2011.

[45] Y. Xu, I. D. G. Groeneveld, R. Sulzer, E. Theocharous, O. T. Willems, and M. S. Tryfona, "Determine activity based on the classified identity of users by using Wi-Fi monitoring," Geomatics Synthesis Group Project Report, 2016.

[46] A. Danalet, L. Tinguely, M. D. Lapparent, and M. Bierlaire, "Location choice with longitudinal WiFi data," Journal of Choice Modelling, vol. 18, pp. 1–17, 2016.

[47] M. Zhou, K. Sui, M. Ma, Y. Zhao, D. Pei, and T. Moscibroda, "MobiCamp: A campus-wide testbed for studying mobile physical activities," in Proceedings of the 3rd International Workshop on Physical Analytics, WPA 2016, pp. 1–6, Singapore, 2016.

[48] A. Fernandez-Ares, A. M. Mora, M. G. Arenas et al., "Studying real traffic and mobility scenarios for a Smart City using a new monitoring and tracking system," Future Generation Computer Systems, vol. 76, pp. 163–179, 2017.

[49] A. E. Redondi and M. Cesana, "Building up knowledge through passive WiFi probes," Computer Communications, vol. 117, pp. 1–12, 2018.

[50] L. Huang, K. Matsuura, H. Yamane, and K. Sezaki, "Enhancing wireless location privacy using silent period," in Proceedings of the 2005 IEEE Wireless Communications and Networking Conference, WCNC 2005: Broadband Wireless for the Masses - Ready for Take-off, pp. 1187–1192, USA, March 2005.

[51] J. Martin, T. Mayberry, C. Donahue et al., "A Study of MAC Address Randomization in Mobile Devices and When it Fails," Proceedings on Privacy Enhancing Technologies, vol. 2017, no. 4, pp. 365–383, 2017.

[52] M. Vanhoef, C. Matte, M. Cunche, L. S. Cardoso, and F. Piessens, "Why MAC address randomization is not enough: an analysis of Wi-Fi network discovery mechanisms," in Proceedings of the 11th ACM on Asia Conference on Computer and Communications Security, pp. 413–424, ACM, Xi'an, China, June 2016.

[53] A. J. Ruiz-Ruiz, H. Blunck, T. S. Prentow, A. Stisen, and M. B. Kjaergaard, "Analysis methods for extracting knowledge from large-scale WiFi monitoring to inform building facility planning," in Proceedings of the 2014 12th IEEE International Conference on Pervasive Computing and Communications, PerCom 2014, pp. 130–138, Hungary, March 2014.

[54] N. Nunes, M. Ribeiro, C. Prandi, and V. Nisi, "Beanstalk - A community based passive Wi-Fi tracking system for analysing tourism dynamics," in Proceedings of the 9th ACM SIGCHI Symposium on Engineering Interactive Computing Systems, EICS 2017, pp. 93–98, Portugal, June 2017.

[55] C. Parent, N. Pelekis, Y. Theodoridis et al., "Semantic trajectories modeling and analysis," ACM Computing Surveys, vol. 45, no. 4, pp. 1–32, 2013.

[56] C. Wei, "Mining of User Behavioral Features Based on Indoor Semantic Trajectories," Boletín Técnico, ISSN 0376-723X, vol. 55, 2017.

[57] K. V. Long, D. Quang, and N. Klara, Lessons learned from bluetooth/wifi scanning deployment in university campus, Urbana, Illinois, USA, 2010.

[58] V. Radu and M. K. Marina, "HiMLoc: indoor smartphone localization via activity aware pedestrian dead reckoning with selective crowdsourced WiFi fingerprinting," in Proceedings of the International Conference on Indoor Positioning and Indoor Navigation (IPIN '13), pp. 1–10, IEEE, Montbeliard-Belfort, France, October 2013.

[59] F. Hong, Y. Zhang, Z. Zhang, M. Wei, Y. Feng, and Z. Guo, "WaP: Indoor localization and tracking using WiFi-Assisted Particle filter," in Proceedings of the 39th Annual IEEE Conference on Local Computer Networks, LCN 2014, pp. 210–217, Canada, September 2014.

[60] Y. Zeng, P. H. Pathak, and P. Mohapatra, "Analyzing shopper's behavior through WiFi signals," in Proceedings of the 2nd Workshop on Physical Analytics, WPA 2015, pp. 13–18, Italy, 2015.

[61] P. Sapiezynski, A. Stopczynski, R. Gatej, and S. Lehmann, "Tracking human mobility using WiFi signals," PLoS ONE, vol. 10, no. 7, p. e0130824, 2015.

[62] Z. Tian, Y. Jin, M. Zhou, Z. Wu, and Z. Li, "Wi-Fi/MARG Integration for Indoor Pedestrian Localization," Sensors, vol. 16, no. 12, p. 2100, 2016.


[63] B. Zhou, Q. Li, Q. Mao, and W. Tu, "A Robust Crowdsourcing-Based Indoor Localization System," Sensors, vol. 17, no. 4, p. 864, 2017.

[64] "Raspberry Pi," https://www.raspberrypi.org.

[65] "TP-LINK Wi-Fi USB dongle datasheet," http://www.tp-link.com/us/products/details/cat-5520_TL-WN722N.html.

[66] "ISO/IEC 20922:2016 - Information technology – Message Queuing Telemetry Transport (MQTT) v3.1.1," https://www.iso.org/standard/69466.html.

[67] G. Piateski-Shapiro and W. Frawley, Knowledge Discovery in Databases, MIT Press, Cambridge, MA, USA, 1991.

[68] J. Andion Jimenez, J. M. Navarro Gonzalez, M. Alvarez-Campana Fernandez-Corredor, and J. C. Duenas Lopez, "A passive, non-intrusive, cheap method to identify behaviours and habits in the Campus," in Proceedings of the XIII Jornadas de Ingenieria Telematica - JITEL2017, vol. 40, no. 47, pp. 10–4995, September 2017.

[69] A. Mylinikov, "Geo project," https://www.mylnikov.org.

[70] K. Mehmet and G. Ananth, "PROXIMUS: A framework for analyzing very high dimensional discrete-attributed datasets," in Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '03, vol. 147, no. 156, pp. 147–156, USA, August 2003.

[71] "UPM Library occupation," http://ceiboard.dit.upm.es/smart-campus/biblio.

[72] C. A. R. Inarejos, A. Rodríguez, G. Lopez, and M. Alvarez-Campana, "Analisis de la huella de carbono de la ETSIT de la UPM y propuesta de mejora basada en datos de la plataforma IoT Smart CEI Moncloa," in Proceedings of the I Congreso Iberoamericano de Ciudades Inteligentes (ICSC-CITIES 2018), 2018.



Figure 4: Sketch of how the developed Wi-Fi listening devices obtain the MAC addresses of the users surrounding them. (Diagram: access points AP1 to AP4 on channels ch1 to ch4; User1 and User3 send data frames on ch1 and ch3, while User2 and User4 send probe requests on channels chN and chM; the developed Wi-Fi listening device listens "x" ms on each channel.)

Figure 5: Communications architecture and protocol stack for the people flow monitoring service of the Smart CEI Moncloa. (Diagram: the Wi-Fi sensors and the backend server communicate over an MQTT/TLS/TCP/IP/802.3 stack; the backend server comprises an MQTT broker, real-time and batch processing, a REST API server, a CSV file-based temporal DB, and a dashboard web server.)

MAC addresses or including special MAC address ranges are discarded, so these devices are not considered. Anyway, as already mentioned in Section 2, MAC randomization is not actually a major issue in our case, since most of the users are connected to the Eduroam free Wi-Fi access, so their smartphones end up using their actual MAC address.

After gathering the CSV files for a long time span, we moved to an offline analysis platform on a private cluster. The Apache Spark 2.2 software platform was selected as the data processing tool due to its optimized capabilities for working with large amounts of structured data. The gathered files add up to large datasets, which common centralized systems would struggle to process but distributed systems such as Spark can handle easily.

The private cluster mentioned is composed of eight HP ProLiant SL250s Gen 8 machines with two Intel Xeon E5-2630 v2 2.6 GHz processors (6 cores each) and 32 GB of RAM. In addition to Spark, this platform also runs an Apache Hadoop filesystem, where the dataset and the results are stored. This is a distributed filesystem which allows all the machines to access the stored data in parallel. The usage of such a platform is advisable in order to speed up the analysis, but not mandatory, since the algorithms we are going to describe are available in many other software platforms (such as those provided with R or Python).

4. Data Analysis and Results

4.1. Data Processing. Although this work is not a proper data mining process, given that we are not using those kinds of algorithms and analyses, the necessary steps prior to the actual analysis are the same as in a KDD (Knowledge Discovery in Databases) process [67]. For our analysis we took the aforementioned CSV files, each one containing the data collected by a single sensor during a period of 15 minutes. The observation period used in our analysis is a full year, from January 2016 to December 2016. In a one-year time lapse, 9 sensors, each generating a file every fifteen minutes, would create around 315K files. This is a theoretical maximum, because a file is only created if the sensor is active and detects at least one device during the period. The sensors were deployed at the beginning of 2016, but the deployment was not totally finished until March. There were also some holidays during 2016 when ETSIT was fully closed. During those days, some of the sensors detected no devices (although there are always security staff somewhere in the covered area), so they did not generate a file. As a result, the number of files which we actually have for the analysis is 246K.

Figure 6: Histogram user-number of registered hours: (a) raw analysis; (b) first threshold applied. (Horizontal axis: hours; vertical axis: log(users).)
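The theoretical maximum of around 315K files can be checked with a few lines of arithmetic. The sketch below only restates the figures given in the text (9 sensors, one file every 15 minutes); the function and constant names are hypothetical.

```python
SENSORS = 9
FILES_PER_HOUR = 60 // 15   # one file every fifteen minutes
HOURS_PER_DAY = 24


def max_files(days: int) -> int:
    """Files produced if every sensor wrote a file in every 15-minute slot."""
    return SENSORS * FILES_PER_HOUR * HOURS_PER_DAY * days


if __name__ == "__main__":
    upper_bound = max_files(365)   # 315360, i.e., "around 315K"
    leap_bound = max_files(366)    # 316224, since 2016 was a leap year
    observed = 246_000             # approximate number of files reported
    print(upper_bound, leap_bound, round(observed / upper_bound, 2))
```

Under these figures, roughly four fifths of the possible 15-minute slots produced a file, which is consistent with the incomplete deployment until March and the holiday closures.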

The first step in preprocessing these data was to merge all the files into a single one, resulting in a 2 GB CSV plain text file where each row represents the observation of a device during the associated time period, including, among others, the anonymized MAC address, the sensor name, and the detection of the device for a given minute in the observation timespan. With this transformation, the resulting dataset contains 63 million rows, each one representing the detection of a single device in a single minute by a single sensor.

A total of 128K unique devices were detected during the whole year. It is known that the number of people that regularly go to ETSIT is around 4K, so not all of the devices detected can be considered actual users of interest; consequently, the data needed to be filtered. Only the devices of people that perform activities related to the place, such as students, professors, etc., should be taken into account. Thus, it is necessary to filter out devices, keeping only the ones that can be labelled as users. We apply filters based on the observations of each device.

So, for each device, we count the number of minutes it was detected in the entire dataset. A device can be seen by more than one sensor during the same minute, so, to generate this measure, repeated minutes are counted only once per device. Grouping the resulting counts, a histogram (Figure 6) is obtained, on which it is possible to classify the devices based on the total time recorded during the whole year.
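The counting rule described above, where a minute seen by several sensors counts once per device, can be sketched as follows. The row layout `(device, minute, sensor)` and the function name are assumptions for illustration; the original analysis was run on Apache Spark rather than in plain Python.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Row = Tuple[str, int, str]  # (anonymized device MAC, minute index, sensor name)


def minutes_per_device(rows: Iterable[Row]) -> Dict[str, int]:
    """Count distinct detected minutes per device, ignoring which sensor saw it."""
    seen: Dict[str, Set[int]] = defaultdict(set)
    for device, minute, _sensor in rows:
        seen[device].add(minute)  # a set drops minutes repeated across sensors
    return {device: len(minutes) for device, minutes in seen.items()}


if __name__ == "__main__":
    sample = [
        ("dev-a", 1, "Library"),
        ("dev-a", 1, "Entr A"),   # same minute, second sensor: not counted again
        ("dev-a", 2, "Library"),
        ("dev-b", 5, "Entr C"),
    ]
    print(minutes_per_device(sample))
```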

To facilitate the analysis of the chart, the horizontal axis is expressed in hours and the count on the vertical axis is shown in logarithmic scale. Around 95% of the devices were seen for less than 48 hours during the whole year, on average less than four hours per month. This group is mainly composed of people passing near the school buildings, momentarily entering the coverage area of the sensors without accessing ETSIT. Figure 6(b) shows the histogram after applying a lower threshold of 48 hours and an upper threshold of 3650 hours, an average of 10 hours per day. In this chart, some isolated peaks can be observed in the tail of the graph, starting around 1000 hours on the horizontal axis. A detailed analysis of these peaks revealed that they were devices that remained connected continuously for several days, like servers. So we applied the label "user" only to those devices that registered a number of hours during the year between these two thresholds. As shown in Table 3, of the 128K detected devices, only 4653 were classified as users, over which we will perform the rest of the analysis.
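The two-threshold labelling can be sketched as a simple filter over the per-device minute counts. The 48-hour and 3650-hour bounds come from the text; treating both bounds as inclusive, as well as the names used, is an assumption of this example.

```python
from typing import Dict, Set

LOWER_HOURS = 48      # below this, the device is treated as a passer-by
UPPER_HOURS = 3650    # above this (10 hours per day on average), likely a machine


def label_users(minutes_by_device: Dict[str, int]) -> Set[str]:
    """Keep only devices whose yearly detected time lies between the thresholds."""
    return {
        device
        for device, minutes in minutes_by_device.items()
        if LOWER_HOURS <= minutes / 60 <= UPPER_HOURS
    }


if __name__ == "__main__":
    totals = {
        "passer-by": 30 * 60,     # 30 hours in the year
        "student": 900 * 60,      # 900 hours in the year
        "server": 8000 * 60,      # almost always detected
    }
    print(label_users(totals))
```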

Figure 7: Daily analysis aggregated during 2016: (a) minutes accumulated; (b) unique users registered; (c) ratio between minutes and users. (Horizontal axis: date, January to December; vertical axes: accumulated minutes [×1000], number of users [×100], and ratio [minutes/user].)

Table 3: Dataset statistics before and after preprocessing.

                        Rows                 Devices
Before preprocessing    63,427,489           128,188
After preprocessing     41,294,344 (65%)     4,653 (3.6%)

The last step before the proper analyses is to merge this dataset with the information about the position and name of the sensors. Additionally, only the data columns with relevant information for the analyses are kept, in order to reduce the in-memory cost:

(1) timestamp: the detection time, measured in minutes

(2) user: the MAC address of the detected device

(3) sensor: the MAC address of the sensor which made the detection

(4) sensorName: the name of the sensor which made the detection

(5) accessPoint: the MAC address of the SSID to which the device is connected, if any

(6) sensorLat: latitude of the sensor's position

(7) sensorLng: longitude of the sensor's position

4.2. Temporal Analysis. Once the data was ready, we performed a temporal study aiming to know whether the captured data allowed for the identification of significant periods of activity or trends in user behavior in ETSIT. As one year is too long for a minute-level analysis, we aggregated our data by days and by hours.

Figures 7–10 each comprise two charts. The first one, the accumulated time chart, shows the number of activity minutes registered by each sensor during a time slice. The second chart presents the number of unique users seen by each sensor during the time slice. In addition, Figures 7 and 8 include a third chart with the ratio of accumulated time over unique users, which provides a hint of how stationary users are: peaks in the ratio chart represent moments when users are still (e.g., students in class), and dips are associated with transition times (e.g., students arriving at the school).
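The three daily metrics can be sketched as follows for the case where all sensors are aggregated into a single series. The row layout and the assumption that the timestamp is a minute counter, so that integer division by 1440 yields a day index, are illustrative choices, not details taken from the dataset.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Row = Tuple[int, str, str]  # (timestamp in minutes, user, sensor name)
MINUTES_PER_DAY = 24 * 60


def daily_metrics(rows: Iterable[Row]) -> Dict[int, Tuple[int, int, float]]:
    """Per day: (accumulated minutes, unique users, minutes per user)."""
    user_minutes: Dict[int, Set[Tuple[str, int]]] = defaultdict(set)
    for minute, user, _sensor in rows:
        day = minute // MINUTES_PER_DAY
        # A (user, minute) pair seen by several sensors is stored only once.
        user_minutes[day].add((user, minute))
    metrics = {}
    for day, pairs in user_minutes.items():
        accumulated = len(pairs)
        users = len({user for user, _minute in pairs})
        metrics[day] = (accumulated, users, accumulated / users)
    return metrics


if __name__ == "__main__":
    sample = [
        (0, "u1", "Library"), (0, "u1", "Entr A"), (1, "u1", "Library"),
        (2, "u2", "Entr A"), (1440, "u1", "Library"),
    ]
    print(daily_metrics(sample))
```

A high ratio on a given day means that few users accumulated many minutes each, which matches the interpretation of peaks as stationary users.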

First, Figure 7 shows an overview of these metrics throughout the year. In this figure, the measurements of the 9 sensors are aggregated into a single line. In the case of Figure 7(a), the result is not exactly equal to the sum of the activity minutes each sensor accumulates, since a user can be detected in the same minute by different sensors, and these occasions are represented as a single instant in this line.

12 Wireless Communications and Mobile Computing

Figure 8: Daily analysis during 2016: (a) minutes accumulated by each sensor; (b) unique users registered by each sensor; (c) ratio between minutes and users by each sensor. [Plot axes: date (Jan to Dec) against accumulated minutes (×1000), number of users (×100), and ratio (minutes/user); one line per sensor: Entr A, Entr A Sec, Entr B, Entr B Lab, Entr B Sec, Entr C, Entr D, Library, Std Tables.]

The most obvious observation that can be drawn from Figure 7 is the presence of a strange behavior during the months of January and February. The reason is that, as previously discussed, the sensor network began to be deployed in January 2016 and was not completely operative until March. We keep this data in the analysis because it can be used to compare an anomalous situation with a regular one; also, it is much more intuitive to present a whole-year range instead of nine months. Nevertheless, data collected in those months are not used to draw conclusions.

Holiday periods are clearly shown in the unique users chart when the line falls, for example, in March (Easter), summer holidays, or some isolated holidays. These periods can also be seen in the ratio chart, which increases because on those dates far fewer users attend the school, but those who do usually spend long periods of time in the library. A remarkable point is October 12th, on which the ratio chart reaches its highest peak and the number of users is almost zero. That day even the library was closed, so only the security staff was in the school. We can also observe the effect of weekends on each chart: both the number of users and the accumulated minutes decrease, but the ratio increases for the same reason given for the holidays, since users study at the library during the weekends. Finally, between May and June, a decreasing trend can be observed both in the unique users and in the accumulated minutes, going up in mid-June for the examination period.

Figure 8 shows a second set of graphs that correspond to a daily analysis of the whole year, representing each sensor. One of the first conclusions that can be drawn is the difference between the proportions in the unique users chart and the accumulated minutes chart. The difference between the number of registered users per sensor is not as remarkable as the difference between the number of accumulated minutes. Again, this is due to the fact that users spend much longer periods of time in the library than in other areas. Observing

Figure 9: Calendar view: (a) unique users registered by sensor Entr A Sec; (b) minutes accumulated by sensor Entr A Sec; (c) unique users registered by sensor Entr C; (d) minutes accumulated by sensor Entr C. [Calendar grids: rows are the days of the week (Sun to Sat), columns run from Jan to Dec.]

Figure 10: Hourly analysis during first week of November: (a) minutes accumulated by each sensor; (b) unique users registered by each sensor. [Plot axes: date (Mon 05 to Sun 11) against accumulated minutes (×1000) and number of users (×100); one line per sensor: Entr A, Entr A Sec, Entr B, Entr B Lab, Entr B Sec, Entr C, Entr D, Library.]


Table 4: Sensor records collisions.

Collisions    Count       Total       Percent
9             3           27          0.00
8             45          360         0.00
7             421         2947        0.01
6             5371        32226       0.08
5             59222       296110      0.72
4             560994      2243976     5.46
3             2525718     7577154     18.42
2             7268825     14537650    35.35
1             16433862    16433862    39.96

the line of the sensor of building C in the accumulated minutes chart, it can be seen that the magnitude is maintained throughout the year. This sensor covers a building where there are professors' offices and research laboratories; i.e., this sensor registers mostly users who work at the school, and they maintain a more regular schedule throughout the year than students who go to classes or to the library.

The line charts can be difficult to interpret for periods of time as long as a whole year. To ease the visual analysis, a new calendar visualization is offered in which the same data can be studied in a more intuitive way. Figure 9 presents the full-year view in calendar format, with each row representing one day of the week and the months delimited in black. The color intensity of each cell is proportional to the data it represents. Figure 9 is formed by 4 subfigures, which correspond to the count of users and the number of minutes registered by the sensors "Entr A Sec" and "Entr C". These visualizations are relative and can be used to obtain general conclusions. For a detailed study, both visualizations with absolute measurements and a data examination are still necessary.

Figure 9(a) represents the number of users detected by the sensor "Entr A Sec", which covers the classrooms of this building, and Figure 9(b) the number of minutes accumulated by the same sensor. It is clear that during the first two months of the year this sensor was not operational. On Saturdays and Sundays (last and first rows) this sensor registers almost no activity, and holiday periods are clearly identified, such as Easter in March or summer holidays in July and August. But this visualization reveals other information that is more difficult to observe in a line graph, such as that Mondays are the days on which users spend the least time in this area. Some blank cells also stand out in the last months of the year, revealing that the sensor did not work during those days.

Figures 9(c) and 9(d) represent the number of users detected and the minutes accumulated by the "Entr C" sensor. As mentioned in the description of Figure 8, this sensor shows very stable measures because it covers the research laboratories, and the workers' schedules are not affected by the school calendar, as is the case for the activity seen by other sensors like "Entr A Sec".

Figure 10 presents a different view, plotted at the hour level during a week in November. This eases the identification of activity hours, which span from 8:00 to 22:00. Another notable effect is the valley in the middle of the day, corresponding to the lunch break, when users move to the cantina (see Figure 2) or go out of the buildings. Finally, it is observed that the users leave the school gradually during the afternoon.

4.3. Position Analysis. Before getting into the details and insights obtained from the spatial analysis of the one-year gathered data, it is worth mentioning that a set of tests was carried out in a controlled environment during the first stage of the deployment in order to check that the Wi-Fi tracking system worked properly. These tests included tracking a well-known group of MAC addresses throughout the Wi-Fi tracking sensors, checking that they appeared in the appropriate ones. It was also checked that the system correctly located and tracked the security staff throughout their night security tours.

A spatial analysis provides insight on how the users are distributed throughout the buildings during different times of the year. To reach these conclusions, we have improved the method presented in [68]. The first improvement is related to the event when a user is detected by two or more sensors during the same minute. From now on, this event will be referred to as a collision. The number of collisions is significant in the case of sensors that are close to each other (e.g., the library and building D) or sensors that cover transition areas (e.g., Entr A or Std Tables).

Table 4 shows a study of the number of collisions. The order of a collision is the number of sensors that collide in the same minute, the count is the number of occurrences of collisions of that order, and the total (order times count) is the number of rows involved. Collisions of order 2 and 3 account for more than 50% of the data. The solution to this situation was to eliminate these collisions by replacing, for this experiment, all the rows corresponding to a collision by a single row whose latitude and longitude are the centroid of the positions of the sensors participating in the collision.
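The collision handling can be sketched as follows. This is a minimal illustration rather than the code used in the study: the function names are hypothetical, and the centroid is assumed to be the arithmetic mean of the sensors' latitudes and longitudes, which is a reasonable planar approximation at the scale of a campus.

```python
from collections import Counter, defaultdict


def resolve_collisions(rows):
    """Merge the rows of each (minute, user) pair into a single centroid row.

    rows : iterable of (minute, user, sensor, lat, lng) tuples
    Returns {(minute, user): (lat, lng, order)}, where order is the number
    of distinct sensors that detected the user in that minute.
    """
    positions = defaultdict(dict)  # (minute, user) -> {sensor: (lat, lng)}
    for minute, user, sensor, lat, lng in rows:
        positions[(minute, user)][sensor] = (lat, lng)
    merged = {}
    for key, by_sensor in positions.items():
        order = len(by_sensor)
        lat = sum(p[0] for p in by_sensor.values()) / order
        lng = sum(p[1] for p in by_sensor.values()) / order
        merged[key] = (lat, lng, order)
    return merged


def collision_orders(merged):
    """Count how many (minute, user) events fall in each collision order."""
    return Counter(order for _, _, order in merged.values())
```

A row of order 1 keeps the position of its only sensor, so the same function handles both colliding and non-colliding detections.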

The second improvement consists in the incorporation of external information with the approximate position of the access points to which the users are connected. This information has been obtained from the Mylnikov Geo API [69], getting the position of all the ESSIDs registered throughout the year. In the cases where the user is connected and the approximate position of the access point is available, this new position is used instead of the position of the sensor

Figure 11: Monthly centroids heatmaps: (a) June; (b) August; (c) September.

that generates the row. This allows measuring the user's position with more accuracy. In addition, it smooths the effect of using only the positions of the sensors, which causes the resulting possible positions to always lie inside the hull of the polygon formed by the positions of the sensors.

This experiment is based on the user's centroid concept. This centroid is the average position of a certain user along a certain period of time. Representing these points over the map reveals hints of the users' behaviors over the studied time slice, based on the amount of people in each area.
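A user's centroid, including the substitution of the access point position when it is known, can be sketched as below. The helper names `effective_position` and `user_centroids` are hypothetical, and the centroid is assumed to be a plain average of latitudes and longitudes with one point per detected minute.

```python
from collections import defaultdict


def effective_position(sensor_pos, ap_pos=None):
    """Prefer the approximate access point position when it is known."""
    return ap_pos if ap_pos is not None else sensor_pos


def user_centroids(points):
    """Average position per user over the period covered by points.

    points : iterable of (user, (lat, lng)) pairs, one per detected minute
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # user -> [sum lat, sum lng, n]
    for user, (lat, lng) in points:
        acc = sums[user]
        acc[0] += lat
        acc[1] += lng
        acc[2] += 1
    return {user: (s[0] / s[2], s[1] / s[2]) for user, s in sums.items()}
```

Restricting `points` to one month or to one hour yields the centroids behind monthly or hourly heatmaps, respectively.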

Figure 11 shows heatmaps of three different months. Figure 11(a) is June, a month belonging to the second semester of the academic year and the month when the final exams take place. Figure 11(b) is August, during summer holidays, when there are no classes or exams and most of the professors, researchers, and staff are on vacation. Figure 11(c) represents September, the start of the semester.

Figure 11(b) confirms one of the facts extracted in the temporal analysis: on holidays the number of users falls, and most of the users present in the school keep studying at the library (the warmest point is located over the library and is much smaller than in other months).

Both Figures 11(a) and 11(c) show that building A is the area with the highest concentration of users. Even so, there are differences between the two figures. In Figure 11(a) the warmest area in the map is over the library. This is explained by the final exam period of June. The same observation can be repeated in other periods of time to discover when the students have exams. In Figure 11(c) the hot spots are on the classrooms. September is the beginning of the school year, when class attendance is greatest. Thus, this observation is an indicator of the level of students' attendance to class.

The same type of analysis can be done with shorter time frames to obtain more detailed behaviors. Figure 12 shows the centroids heatmap grouped by hours on September 2nd.

A detailed analysis by hours like this allows observing the users' movement throughout the day. The first row of maps, Figures 12(a), 12(b), 12(c), and 12(d), shows the evolution of users' centroids at lunch time, between 12:00 and 16:00. In this transition, it can be seen that there are hot spots on the classes and the library at the beginning. On the next maps those centroids move to the cantina (see Figure 2), and in the last map they return to their original positions. The second row of maps in Figure 12 presents the start and the end of the activity time in the school. In Figures 12(e) and 12(f) it can be observed how the first users in the day go directly to classes. By contrast, Figures 12(g) and 12(h) reveal that users tend to be at the library at the end of the day.

Finally, Table 5 presents the count of the different users detected by each sensor throughout the year. Recalling that the total number of users obtained in Section 4.1 is 4653, the data in the table reveals that over the year most users have been seen by each sensor at some time. The two exceptions to this fact are the sensor of building C and the sensor of the laboratories of building B. These sensors cover the professors' offices and research laboratories, so it is unusual for students to stay in those areas.

4.4. Behavior Analysis. The third set of experiments we performed deals with the behaviors that each user follows throughout a single day. To obtain them, we grouped the data using a user-day key. For each key, a vector of 24 positions, one per hour, is created. In each position of this vector, we store the sensor that has detected this user most of the time during that hour. This vector therefore represents the route that the user followed throughout that day, hour by


Table 5: Sensor annual statistics.

Sensor        Number of rows    Number of users
Library       14166793          4674
Entr D        6818165           4621
Std Tables    5625974           4670
Entr A        4736237           4620
Entr A Sec    2352019           4579
Entr B Sec    2258618           4530
Entr B Lab    2229560           3630
Entr C        1469026           3093
Entr B        1248462           4127

Figure 12: Hourly centroids heatmaps of 2nd September: (a) 12:00 to 13:00; (b) 13:00 to 14:00; (c) 14:00 to 15:00; (d) 15:00 to 16:00; (e) 8:00 to 9:00; (f) 9:00 to 10:00; (g) 19:00 to 20:00; (h) 20:00 to 21:00.

hour. Once the behavior vectors are obtained, the information of the day and the user is discarded in order to count the most repeated behavior vectors. The dimensionality of these vectors makes the number of possible behaviors huge, theoretically $24^{10} \approx 6 \times 10^{13}$, or $14^{10} \approx 289$K millions using only the activity hours, but it is known that users behave similarly, so a much lower number of behaviors can be expected, even though it will still be a high number.
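The construction of the behavior vectors and the count of the most repeated ones can be sketched as follows. The function names are hypothetical, the input is assumed to contain one record per detected minute, and ties between sensors within an hour are resolved by first appearance, which is an assumption of this sketch and not something stated in the study.

```python
from collections import Counter, defaultdict


def behavior_vectors(records):
    """Build one 24-slot vector per (user, day) key.

    records : iterable of (user, day, hour, sensor), one per detected minute
    Each slot holds the sensor that saw the user most often in that hour,
    or None when the user was not detected.
    """
    counts = defaultdict(Counter)  # (user, day, hour) -> Counter of sensors
    for user, day, hour, sensor in records:
        counts[(user, day, hour)][sensor] += 1
    vectors = defaultdict(lambda: [None] * 24)
    for (user, day, hour), per_sensor in counts.items():
        vectors[(user, day)][hour] = per_sensor.most_common(1)[0][0]
    return {key: tuple(vec) for key, vec in vectors.items()}


def most_repeated(vectors, n):
    """Discard user and day, then return the n most frequent behaviors."""
    return Counter(vectors.values()).most_common(n)
```

Storing each vector as a tuple makes it hashable, so identical routes followed by different users or on different days collapse into a single counted behavior.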

Table 6 presents the 20 most repeated behaviors throughout the year. A total of 285K behaviors are detected, of which 139K are unique. The first 500 most frequent behaviors account for 25% of the total behaviors. Table 6 shows only the part corresponding to the activity hours of the school, which, as observed in the temporal analysis, cover from 8:00 am to 10:00 pm. As can be seen, the majority of the most repeated behaviors are periods between two and five hours of stay in the library. It is necessary to expand the analysis to the top 20 to observe the class attendance behaviors. It is easy to appreciate that the different behaviors usually take place either in the morning or in the afternoon, with lunchtime from 1:00 pm to 3:00 pm, which means that the majority of users go either in the morning or in the afternoon, but they do not spend all day at school.

Another quite obvious observation is that many of the behaviors obtained are very similar to each other (e.g., going


Table 6: Top 20 most repeated behaviors. Columns are the hourly slots from 8:00-9:00 to 21:00-22:00; each row lists the sensor registered in each occupied slot.

Rank    Sensors in the occupied hourly slots
1       Lib Lib Lib Lib Lib
2       Lib Lib Lib Lib
3       EntA
4       Lib Lib Lib
5       Lib Lib Lib Lib
6       Lib
7       EntA
8       Lib Lib Lib Lib Lib Lib
9       Std Std Std Std Std Std Std Std Std Std Std Std Std Std
10      EntA
11      EntA
12      Lib Lib Lib Lib Lib
13      Lib Lib
14      Lib Lib
15      EntA
16      EntA EntA
17      EntA
18      Lib Lib
19      SecA SecA SecA SecA
20      EntA

Lib = Library; EntA = Entr A; SecA = Entr A Sec; Std = Std Tables.

to the library from 9:00 am to 2:00 pm or going to the library from 9:00 am to 1:00 pm). In order to lower this redundancy, we performed a clustering procedure on them. Each behavior can be understood as a categorical vector of length 24 in which the categories correspond to the 9 possible sensors of the dataset plus the empty category. The chosen clustering algorithm is Proximus [70], due to its simplicity, efficiency, scalability, and the reliability of its results. The algorithm works with binary vectors and creates clusters based on the Hamming distance (the number of bits that differ between two binary vectors). A vector is chosen as the center of the cluster, and other vectors whose distance to the center is smaller than the maximum cluster radius are added to that cluster.

The behaviors are expressed as categorical vectors, so it is necessary to transform them into binary vectors to be used in Proximus. The transformation shown in Equation (1) is proposed. This transformation is simple and also reversible, which allows for the recovery of the original behaviors after clustering.

\[
\begin{aligned}
B &= (h_0, \ldots, h_{23}), \qquad h_i \in \{0, a_1, \ldots, a_9\} \\
\vec{B} &= (\delta_{0,a_1}, \ldots, \delta_{0,a_9}, \ldots, \delta_{23,a_1}, \ldots, \delta_{23,a_9}) \\
\delta_{i,a_j} &=
\begin{cases}
0 & \text{if } h_i \neq a_j \\
1 & \text{if } h_i = a_j
\end{cases}
\end{aligned}
\tag{1}
\]

where $a_1, \ldots, a_9$ are the nine sensors, $B$ is the behavior vector described above, and $\vec{B}$ is the transformed vector of length $24 \times 9$, made of subgroups of 9 samples, each subgroup associated with one hour. Every sample in a subgroup is 0, except for the one at the index of the sensor active in that hour, which is set to 1.
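The transformation of Equation (1), its inverse, and the Hamming distance can be sketched as below. This is not an implementation of Proximus; it only illustrates the encoding that feeds it. The ordering of `SENSORS` is an assumption, `None` stands for the empty category, and `decode` picks the first active sensor of each subgroup, which is a simplification for vectors (such as cluster centers) that might have more than one bit set per hour.

```python
# Assumed ordering of the nine sensors; any fixed ordering works.
SENSORS = ["Entr A", "Entr A Sec", "Entr B", "Entr B Lab", "Entr B Sec",
           "Entr C", "Entr D", "Library", "Std Tables"]


def encode(behavior):
    """Map a 24-slot categorical vector to 24 * 9 = 216 bits (one-hot per hour)."""
    bits = []
    for sensor in behavior:  # None stands for the empty category
        bits.extend(1 if sensor == s else 0 for s in SENSORS)
    return bits


def decode(bits):
    """Invert encode(): recover the categorical vector from the bits."""
    n = len(SENSORS)
    behavior = []
    for hour in range(len(bits) // n):
        group = bits[hour * n:(hour + 1) * n]
        behavior.append(SENSORS[group.index(1)] if 1 in group else None)
    return tuple(behavior)


def hamming(x, y):
    """Number of positions in which two equal-length bit vectors differ."""
    return sum(a != b for a, b in zip(x, y))
```

With this encoding, adding or removing one hour of stay changes one bit, whereas changing the sensor within an hour changes two bits, so behaviors that differ only in duration end up closer than behaviors that differ in place.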

Table 7 shows the results of applying the Proximus clustering to the behaviors found. The first column is the center of the cluster, the second the number of distinct behaviors grouped in that cluster, and the third the number of recorded behaviors that fall within that cluster. The results are presented ordered by the number of behaviors included in the cluster. In total, 45707 clusters are obtained. The first 300 account for 50% of the behaviors recorded throughout the year. The most important cluster behaviors in Table 7 can be understood this way:


Table 7: Behavior clusters.

      Cluster center                                             Cluster components    Behaviors count
1     Entr A Sec from 9:00 to 14:00                              415                   4963
2     Entr B Sec from 10:00 to 13:00                             458                   3756
3     Entr B Sec from 15:00 to 19:00                             471                   2924
4     Library from 9:00 to 13:00                                 256                   2870
5     Library from 10:00 to 12:00                                176                   2396
6     Library from 18:00 to 21:00                                244                   2231
7     Entr A 13:00                                               787                   1979
8     Library 12:00                                              563                   1894
9     Library from 18:00 to 19:00 and from 20:00 to 21:00        175                   1853
10    Entr B Lab 11:00 to 20:00                                  568                   1831

(i) Clusters 1, 2, and 3: students attending classes. These sensors cover the main classrooms in the school, and the intervals match the class schedule.

(ii) Clusters 4, 5, 6, and 9: students at the library.

(iii) Cluster 7: users who are usually outside the sensors' coverage areas, arriving or leaving.

(iv) Cluster 10: laboratory equipment and professors. This sensor covers some of the professors' offices and laboratories, which usually have some laboratory equipment connected during work hours.

5. Validation of Occupancy Estimation Based on Wi-Fi Tracking

Although, as already pointed out in Section 4.3, it was tested at the very first stage of the deployment that the Wi-Fi sensors properly locate and track well-known MAC addresses, it was still necessary to validate the accuracy of the system for estimating occupancy (as it can be distorted by the aforementioned fact that a single user can carry several devices connected to Wi-Fi networks). As no ground-truth data was available for the full set of buildings and only some data was found for the library, we centered our validation efforts on comparing our data with the available ones: if we can trust our results in that area, then we can extend our trust to the rest of the areas, for which no well-known data are available.

The library of ETSIT has 408 study seats, and it offers a web service to check the number of available seats at a given moment [71]. This system is based on two sources of information: a person who counts the empty seats every opening hour from Friday to Sunday, and a video camera located at the main entrance of the library that counts the number of people entering or leaving at 15-minute intervals. The human system provides a ground truth about the number of occupied seats, but this measure is very different from the actual number of people in the library, since a common situation, especially during examination terms, is that students place their study material at the seat to reserve it while they are not in the library. This situation is a problem for the library staff, and therefore they installed the video camera system to count the students' entrances and exits.

This system generates an estimation of the number of people in the library by adding to the previous measure the number of people detected entering and subtracting the number of people detected leaving. The system is not perfect, and in most cases it carries an accumulated error that grows in the estimation of the number of people in the library. The total error can be calculated clearly at the end of the day, when the library closes and the number of people inside is supposed to be zero. In summer there is a situation that aggravates this error and, consequently, the measure achieved by this system: due to the rise in temperature, the back door of the library is opened to improve ventilation, which allows students to exit through it, although they must continue entering through the main door. The camera does not count students leaving through this back door.
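The way the camera-based estimate accumulates error can be illustrated with a short sketch. The function names and the input layout (one entry count and one exit count per 15-minute interval) are assumptions made for this example.

```python
def camera_occupancy(entries, exits, initial=0):
    """Running occupancy estimate from per-interval entry and exit counts."""
    estimates = []
    current = initial
    for entered, left in zip(entries, exits):
        current += entered - left  # previous measure + entries - exits
        estimates.append(current)
    return estimates


def closing_error(estimates):
    """Residual estimate at closing time, when the library should be empty."""
    return estimates[-1] if estimates else 0
```

Every exit missed by the camera, for instance through the back door, stays in the running total until the end of the day, which is why the residual at closing time measures the accumulated error.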

The library staff provided us with the data collected by the two systems (human and camera) between June 5th and 30th. These measurements can be compared with those obtained by the Wi-Fi sensor installed at the library to validate them. For this test, all the data collected by the sensor is used, without filtering the MACs of the sporadic users as explained before.

Figure 13 shows the data collected by the three systems on Sunday, June 5th. This is the first day with data from the three sources. Other days in which these three sources are present show the same trends. It is clear that there is a divergence between the human system observation and the rest of the data. The graph of the human system shows that the number of occupied seats increases in the first hours up to the maximum and remains steady until the end of the day, without being affected by the behavior of the users at lunchtime. However, this effect is reflected in the camera system and Wi-Fi tracking measurements. We observed that, in general, the number of people accounted for by the camera system is below the Wi-Fi tracking system measure, although the proportion is maintained over time. Finally, the figure shows the cumulative error effect of the camera system, which at the end of the day still renders 65 people in the library.

To better study the relation between the camera system and the Wi-Fi tracking system, we generated a detailed visualization that allows us to observe the data of the whole month in a single figure. Figure 14 is composed of 3 subfigures; each of them is a matrix of colored cells: the lines represent a full

Figure 13: Measure of library occupation using Wi-Fi, camera, and human systems. [Plot axes: hour (10:00 to 22:00) against number of users (0 to 400); series: Camera, Human, Wi-Fi.]

Figure 14: Measures every 15 minutes of library occupation during June: (a) Wi-Fi records; (b) camera records; (c) difference between Wi-Fi and camera. [Columns span the hours 00 to 23.]

day and the columns are divisions of 15 minutes. The color intensity is proportional to the number of users measured in each interval, being more intense when more users are detected. Figure 14(a) shows the measurements of the Wi-Fi sensor. The behavior is the one observed in previous sections of this document. Figure 14(b) shows the measurement of the camera system. During the first half of the day, the same trend as in Figure 14(a) can be observed, but in the afternoon and night the accumulated error begins to be noticeable. This error reaches a maximum of 378 people, with an average of 139 people at the end of the day, while the Wi-Fi tracking system registers a maximum of 10 and an average of 7 at the end of the day (these are machines that are still turned on when the library is empty of people). Finally, Figure 14(c) shows the difference between the two previous ones, using green when the error is positive (the Wi-Fi tracking system registers a higher value than the camera system) and red otherwise. This figure validates the measurements obtained by the Wi-Fi tracking system, since the difference with the measure of the camera system in the first half of the day is small (an average of 27 people, counting the data from 8:00 to 16:00), whereas in the afternoon the error accumulated by the camera system makes the average difference grow to 83 people between 16:00 and 22:00.
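The comparison between both series can be sketched as follows. The text does not state whether the averages are computed over signed or absolute differences; this sketch assumes absolute differences, and the function names and the layout of the series (one value per 15-minute interval, 96 per day) are hypothetical.

```python
def difference(wifi, camera):
    """Signed difference per interval (positive: Wi-Fi above camera)."""
    return [w - c for w, c in zip(wifi, camera)]


def mean_difference(wifi, camera, start_hour, end_hour, per_hour=4):
    """Mean absolute difference between two occupancy series in a time window.

    wifi, camera : lists with one value per 15-minute interval of a day
                   (per_hour = 4 intervals per hour, 96 per day)
    """
    lo, hi = start_hour * per_hour, end_hour * per_hour
    diffs = [abs(d) for d in difference(wifi[lo:hi], camera[lo:hi])]
    return sum(diffs) / len(diffs)
```

Calling `mean_difference(wifi, camera, 8, 16)` and `mean_difference(wifi, camera, 16, 22)` on each day gives the kind of morning and afternoon averages discussed above.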

The Wi-Fi sensor of the library was not placed for the specific purpose of counting the number of people in the library, and in consequence its coverage area can include devices that are outside the library. This explains why in some cases it registers a measurement greater than that of the camera system. In any case, the Wi-Fi sensor, with its limitations, registers a measure very similar to that of the camera-based people counting system, even avoiding the cumulative error of this system, while being much cheaper (tens of euros against thousands of euros) and less intrusive than a video camera. Currently, since the camera system is already installed, the Wi-Fi sensor measures can be used as a maximum, which would limit the error made by this system. In summary, the usage of a Wi-Fi sensor to count people in the library provides accurate results despite the potential errors induced by the sensing period, collisions, ratio between users and MACs seen, and data processing performed. These results invite us to trust that the results of our analyses for the rest of the areas in campus are also correct.

6. Discussion and Conclusions

In the execution of this case study, we have learned some lessons about the limitations we faced that are worth taking into account for similar systems and analyses. First, there is the key issue of the sensors' deployment: it is not just a question of density (number of sensors per surface) but of adapting its design to the topology of the place to be monitored. So, for example, it is crucial not only to have sensors at the entries and exits of buildings but also inside them, as in the considered deployment, where there are sensors placed at the entrances and exits of the buildings but also in other especially relevant places, such as the library or the work-in-group area.

A complete coverage of indoor devices, without collisions and for the complete timespan of the stay indoors, would require a large deployment with many sensors able to cover all corners. This is not suitable in this context; instead, we relied on 9 sensors, some at gates and others in large rooms, which proved to be enough to check if a device is within reach of any of them. But some assumptions had to be made: we considered as valid behaviors only those that spent a significant amount of time once they had been seen (to remove transient behaviors and those of people walking outside the buildings); also, we discarded behaviors that did not appear enough times in the yearly timespan. On the other hand, the topology of these buildings did not impose severe restrictions on the sensors' coverage. In fact, we found several devices seen by different sensors at once. As a summary, we can conclude that the finer the spatial granularity (more sensors), the larger the set of different behaviors, so getting raw data from sensors would lead to an explosion of states that would render meaningless results.

The gathered data is another key issue: some works in the literature got signal strength measurements every few seconds, allowing them to identify indoor trajectories. This was not our case, and in fact this proved to be very limiting. Thus, the precision of our analysis is limited by the regions covered by each sensor, which prevents us from analyzing any kind of movement within regions. Nevertheless, as has been seen, relevant results can still be obtained without the increase in energy consumption and the risk of flooding the school communications network that the other approach would entail.

As regards the analysis of data to identify users' behaviors, we have to indicate that a long observation period is a must. Obtaining data for a complete work cycle (in this case, a full school year) helps in discovering common behaviors that happen in a university. Using long observation times can help detect erroneous or atypical operations of the sensors, as happens in the first months of the year in this case. Getting even larger observation datasets would reinforce the behaviors we have found (e.g., seasonality), but the chance to discover new ones is negligible, since the expected behaviors in the campus will be periodic with the school year.

The usage of big data platforms for analysis, while not mandatory, eases the management of large datasets and the execution of iterative studies on the data. The preprocessing work carried out allowed us to clean and filter our data. For example, some discovered behaviors are too regular and extended over time, which may correspond to servers or machines, which we could then filter out.

The temporal analysis has revealed some behaviors that are maintained throughout the year and others that occur occasionally. The work has focused on the study of behaviors that are repeated and maintained throughout the year. A closer view, such as the analysis per hour, shows the hours of activity, nocturnal patterns, or movements at mealtime. In the daily view, the difference in activity between work days and weekends or holidays is clearly seen. Visualization has been a key technique in order to detect and understand these temporal patterns.

The spatial analysis revealed that a problem of collisions happens in the user detection, but its effect was mitigated by the use of innovative algorithms and techniques, such as the calculation of centroids and the combination of sensor information with external sources of information (i.e., semantic trajectories), so a much more precise positioning of the users is achieved than with the exclusive use of the sensors. The visualization of centroids fostered the visual understanding of complex data, such as the position of each user over a month, and the superposition of the centroids in a heatmap allowed knowing the movements of the groups of people and the occupation of the different zones.


Furthermore, we have found dominant users' behaviors as the most repeated behaviors registered by the sensors throughout the year. The number of behaviors found has been huge, but the application of the Proximus clustering algorithm reduced this number to a manageable amount. Then, the study of the obtained clusters has revealed that the most frequent behaviors coincide with what can be expected from a college building: researchers working in laboratories and students attending classes or studying in the library.

Although some of the discovered behaviors and patterns can be seen as common knowledge, it is worth stressing that they do bring value, since they represent numerical evidence that supports decision making (e.g., someone can think that the Wi-Fi access in a given area does not work properly because it is always overcrowded, but numerical evidence is needed to appropriately justify the investment of increasing the number of APs of the corporate WLAN in that given area to improve the service). In addition, such well-known patterns, when obtained automatically by processing the available data, become baseline models which can be used to detect anomalies or atypical situations, as is common practice in unsupervised machine learning.

Lastly, we have validated the accuracy of using Wi-Fi tracking for occupancy estimation, comparing it with the library staff manual counting (considered as ground truth) and with a video camera system installed at the library main entrance. As a main finding, Wi-Fi tracking has proved to be more accurate than the video camera system, in addition to being much cheaper. As a matter of fact, the library staff is currently using the occupancy estimation based on Wi-Fi tracking in preference to the one based on the video camera system. Nevertheless, the accuracy of the Wi-Fi tracking system can be further improved by correlating MAC addresses detected in the same places over sufficiently long periods of time and considering only one of them, thus mitigating the issue related to the fact that a single person can carry several devices connected to Wi-Fi networks.
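One possible way to approach the MAC correlation suggested above is sketched below. This is a hypothetical illustration, not a method evaluated in this work: it assumes that co-presence is measured with the Jaccard similarity of the (minute, sensor) sets of two devices, that a greedy grouping against the first device of each group is sufficient, and that the threshold of 0.8 is an arbitrary illustrative value.

```python
def jaccard(a, b):
    """Jaccard similarity between two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def group_devices(presence, threshold=0.8):
    """Greedily group MAC addresses that are seen in the same places and times.

    presence : dict mapping a MAC address to its set of (minute, sensor) pairs
    Each MAC joins the first group whose representative (first member) has a
    Jaccard similarity of at least threshold; otherwise it starts a new group.
    """
    groups = []
    for mac in sorted(presence):
        for group in groups:
            if jaccard(presence[group[0]], presence[mac]) >= threshold:
                group.append(mac)
                break
        else:
            groups.append([mac])
    return groups
```

Counting one user per group instead of one user per MAC address would then reduce the overestimation caused by people carrying several devices.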

Besides this, the Wi-Fi tracking system is currently used by the library staff to perform more sophisticated studies, such as figuring out the percentage of students from the different schools of the university who come to study at the library of the Telecommunications Engineering School during the weekend. Figure 15 shows the results of such an analysis, which is an example of how this kind of IoT system can help solve real-life problems and improve the operation of already running services.

To summarize, we have studied a one-year Wi-Fi tracking dataset obtained from a reduced set of low-cost sensors with limited capabilities, deployed on an actual university campus that receives around 4000 people every day. We have processed the data in order to identify traces of Wi-Fi-enabled mobile devices, which are identified as people moving in the campus buildings, and then we have extracted people's stays, movements, and common behaviors. The obtained results represent numerical evidence that illustrates how a low-cost Wi-Fi tracking system can be used in real-life conditions to improve or optimize the operation of the monitored premises. These results can allow appropriately dimensioning the WLAN infrastructure or the canteen personnel, or detecting

Figure 15: Users who only go to the library of the Telecommunication School on weekends, classified by their school (CIVIL, INDUS, NAVAL, SPORT, ARQUI, FORES, EDIF, AERO, TELECOM).

anomalous situations in real time. Furthermore, the data from the people flow monitoring system are currently being used together with the data from the environmental monitoring system to try to reduce the environmental footprint of the school [72]. In addition, the Wi-Fi tracking system is actually used by the library staff in their day-to-day activity, which illustrates the value that this kind of IoT infrastructure can bring to real-life problems and services.

Data Availability

The dataset with the Wi-Fi sensor records used to support the findings of this study is available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work has been partly funded by Universidad Politécnica de Madrid through the project RES2+U (Responsables, Sostenibles y Universitarios) (http://blogs.upm.es/res2masu). The work performed by José M. Navarro was funded by the Ministerio de Educación, Cultura y Deporte de España under Grant no. FPU14/03209.

References

[1] "2018 Revision of World Urbanization Prospects - Multimedia Library - United Nations Department of Economic and Social Affairs," https://www.un.org/development/desa/publications/2018-revision-of-world-urbanization-prospects.html.

[2] M. Pallot, "Engaging Users into Research and Innovation: The Living Lab Approach as a User Centred Open Innovation Ecosystem," https://web.archive.org/web/20120509081658/http://www.cwe-projects.eu/pub/bscw.cgi/1760838?id=715404_1760838.

[3] E. Almirall and J. Wareham, "Living Labs: arbiters of mid- and ground-level innovation," Technology Analysis & Strategic Management, vol. 23, no. 1, pp. 87–102, 2011.

[4] "TU Delft Green office," https://www.tudelft.nl/sustainability.

[5] "UBC Campus as a living laboratory," https://sustain.ubc.ca/our-commitment/campus-living-lab.

[6] "Harvard Sustainability," https://green.harvard.edu.

[7] "UPM City of the Future initiative," http://blogs.upm.es/cityofthefuture-upm/en/initiative.

[8] M. Alvarez-Campana, G. Lopez, E. Vazquez, V. A. Villagra, and J. Berrocal, "Smart CEI Moncloa: An IoT-based platform for people flow and environmental monitoring on a Smart University Campus," Sensors, vol. 17, no. 12, 2017.

[9] M. B. Kjærgaard, H. Blunck, T. Godsk, T. Toftkjær, D. L. Christensen, and K. Grønbæk, "Indoor positioning using GPS revisited," Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Preface, vol. 6030, pp. 38–56, 2010.

[10] A. Belmonte-Hernandez, G. Hernandez-Penaloza, F. Alvarez, and G. Conti, "Adaptive Fingerprinting in Multi-Sensor Fusion for Accurate Indoor Tracking," IEEE Sensors Journal, vol. 17, no. 15, pp. 4983–4998, 2017.

[11] M. S. Kristoffersen, J. V. Dueholm, R. Gade, and T. B. Moeslund, "Pedestrian counting with occlusion handling using stereo thermal cameras," Sensors, vol. 16, no. 1, 2016.

[12] L. Zheng, X. Ruan, Y. Chen, and M. Huang, "Shadow removal for pedestrian detection and tracking in indoor environments," Multimedia Tools and Applications, vol. 76, no. 18, pp. 18321–18337, 2017.

[13] T. Li, H. Chang, M. Wang, B. Ni, R. Hong, and S. Yan, "Crowded scene analysis: a survey," IEEE Transactions on Circuits and Systems for Video Technology, vol. 25, no. 3, pp. 367–386, 2015.

[14] F. Adib and D. Katabi, "See through walls with WiFi," in Proceedings of the Annual Conference of the ACM Special Interest Group on Data Communication on the Applications, Technologies, Architectures, and Protocols for Computer Communication, ACM SIGCOMM 2013, pp. 75–86, China, August 2013.

[15] Z. Kabelac, D. Katabi, and R. C. Miller, "3D Tracking via Body Radio Reflections," in Proceedings of the 11th USENIX Conference on Networked Systems Design and Implementation, pp. 317–329, 2014.

[16] F. Adib, Z. Kabelac, and D. Katabi, "Multi-person localization via RF body reflections," in Proceedings of the 12th USENIX Symposium on Networked Systems Design and Implementation, NSDI 2015, pp. 279–292, USA, May 2015.

[17] Z.-A. Deng, G. Wang, D. Qin, Z. Na, Y. Cui, and J. Chen, "Continuous indoor positioning fusing WiFi, smartphone sensors and landmarks," Sensors, vol. 16, no. 9, 2016.

[18] R. A. Becker, R. Caceres, K. Hanson et al., "A tale of one city: Using cellular network data for urban planning," IEEE Pervasive Computing, vol. 10, no. 4, pp. 18–26, 2011.

[19] J. E. Mallah, F. Carrino, O. A. Khaled, and E. Mugellini, "Crowd monitoring: critical situations prevention using smartphones and group detection," Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Preface, vol. 9189, pp. 496–505, 2015.

[20] J. Kuang, X. Niu, and X. Chen, "Robust Pedestrian Dead Reckoning Based on MEMS-IMU for Smartphones," Sensors, vol. 18, no. 5, p. 1391, 2018.

[21] A. Kurkcu and K. Ozbay, "Estimating Pedestrian Densities, Wait Times, and Flows with Wi-Fi and Bluetooth Sensors," Transportation Research Record, vol. 2644, no. 1, pp. 72–82, 2017.

[22] A. Kotanen, M. Hannikainen, H. Leppakoski, and T. Hamalainen, "Positioning with IEEE 802.11b wireless LAN," in Proceedings of the 14th IEEE 2003 International Symposium on Personal, Indoor and Mobile Radio Communications, vol. 3, pp. 2218–2222, Beijing, China, 2003.

[23] W. Ho, A. Smailagic, D. P. Siewiorek, and C. Faloutsos, "An adaptive two-phase approach to WiFi location sensing," in Proceedings of the 4th Annual IEEE International Conference on Pervasive Computing and Communications Workshops, PerCom Workshops 2006, pp. 452–456, Italy, March 2006.

[24] F. Evennou and F. Marx, "Advanced integration of WiFi and inertial navigation systems for indoor mobile positioning," EURASIP Journal on Applied Signal Processing, vol. 2006, Article ID 86706, 11 pages, 2006.

[25] J. Rekimoto, T. Miyaki, and T. Ishizawa, "LifeTag: WiFi-based continuous location logging for life pattern analysis," LNCS, vol. 4718, pp. 35–49, 2007.

[26] J. A. Besada, A. M. Bernardos, P. Tarrío, and J. R. Casar, "Analysis of tracking methods for wireless indoor localization," in Proceedings of the 2nd International Symposium on Wireless Pervasive Computing (ISWPC '07), pp. 492–497, February 2007.

[27] O. Woodman and R. Harle, "Pedestrian localisation for indoor environments," in Proceedings of the 10th International Conference on Ubiquitous Computing (UbiComp '08), pp. 114–123, Seoul, Republic of Korea, September 2008.

[28] F. Aloul, A. Sagahyroon, A. Al-Shami, I. Al-Midfa, and R. Moutassem, "Using mobiles for on campus location tracking," in Proceedings of the 7th International Conference on Advances in Mobile Computing and Multimedia, MoMM2009, pp. 231–235, Malaysia, December 2009.

[29] S. Woo, S. Jeong, E. Mok et al., "Application of WiFi-based indoor positioning system for labor tracking at construction sites: A case study in Guangzhou MTR," Automation in Construction, vol. 20, no. 1, pp. 3–13, 2011.

[30] N. Le Dortz, F. Gain, and P. Zetterberg, "WiFi fingerprint indoor positioning system using probability distribution comparison," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '12), pp. 2301–2304, March 2012.

[31] Z. Chen, H. Zou, H. Jiang, Q. Zhu, Y. C. Soh, and L. Xie, "Fusion of WiFi, smartphone sensors and landmarks using the kalman filter for indoor localization," Sensors, vol. 15, no. 1, pp. 715–732, 2015.

[32] Y. Shu, C. Bo, G. Shen, C. Zhao, L. Li, and F. Zhao, "Magicol: indoor localization using pervasive magnetic field and opportunistic wifi sensing," IEEE Journal on Selected Areas in Communications, vol. 33, no. 7, pp. 1443–1457, 2015.

[33] A. Danalet, B. Farooq, and M. Bierlaire, "A Bayesian approach to detect pedestrian destination-sequences from WiFi signatures," Transportation Research Part C: Emerging Technologies, vol. 44, pp. 146–170, 2014.


[34] O. Czogalla and S. Naumann, "Pedestrian indoor navigation for complex public facilities," in Proceedings of the 2016 International Conference on Indoor Positioning and Indoor Navigation, IPIN 2016, pp. 1–8, Spain, October 2016.

[35] L. Schauer, P. Marcus, and C. Linnhoff-Popien, "Towards feasible Wi-Fi based indoor tracking systems using probabilistic methods," in Proceedings of the 2016 International Conference on Indoor Positioning and Indoor Navigation, IPIN 2016, pp. 1–8, Spain, October 2016.

[36] F. Meneses and A. Moreira, "Large scale movement analysis from WiFi based location data," in Proceedings of the 2012 International Conference on Indoor Positioning and Indoor Navigation, IPIN 2012, Australia, November 2012.

[37] B. Bonne, A. Barzan, P. Quax, and W. Lamotte, "WiFiPi: involuntary tracking of visitors at mass events," in Proceedings of the IEEE 14th International Symposium on a World of Wireless, Mobile and Multimedia Networks (WoWMoM '13), pp. 1–6, Madrid, Spain, June 2013.

[38] M. B. Kjaergaard, M. Wirz, D. Roggen, and G. Troster, "Mobile sensing of pedestrian flocks in indoor environments using WiFi signals," in Proceedings of the 10th IEEE International Conference on Pervasive Computing and Communications (PerCom '12), pp. 95–102, Lugano, Switzerland, March 2012.

[39] S. Sendra, M. Garcia, C. Turro, and J. Lloret, "People mobility behaviour study in a university campus using WLANs," in Proceedings of the 3rd International Conference on Mobile Ubiquitous Computing, Systems, Services, and Technologies, UBICOMM 2009, pp. 124–129, Malta, October 2009.

[40] J. Scheuner, G. Mazlami, D. Schoni et al., "Probr - A Generic and Passive WiFi Tracking System," in Proceedings of the 41st IEEE Conference on Local Computer Networks, LCN 2016, pp. 495–502, UAE, November 2016.

[41] Y. Li, S. Williams, B. Moran, and A. Kealy, "Quantized RSS Based Wi-Fi Indoor Localization with Room Level Accuracy," in Proceedings of the International Global Navigation Satellite Systems 2018, 2018.

[42] A. Alessandrini, C. Gioia, F. Sermi, I. Sofos, D. Tarchi, and M. Vespe, "WiFi positioning and Big Data to monitor flows of people on a wide scale," in Proceedings of the 25th European Navigation Conference, ENC 2017, pp. 322–328, Switzerland, May 2017.

[43] L. Vu, K. Nahrstedt, S. Retika, and I. Gupta, "Joint bluetooth/wifi scanning framework for characterizing and leveraging people movement in university campus," in Proceedings of the 13th ACM International Conference on Modeling, Analysis, and Simulation of Wireless and Mobile Systems (MSWiM '10), pp. 257–265, October 2010.

[44] M. Garcia, S. Sendra, C. Turro, and J. Lloret, "User's Macro and Micro-mobility Study using WLANs in a University Campus," International Journal On Advances in Internet Technology, vol. 4, no. 1, pp. 37–46, 2011.

[45] Y. Xu, I. D. G. Groeneveld, R. Sulzer, E. Theocharous, O. T. Willems, and M. S. Tryfona, "Determine activity based on the classified identity of users by using Wi-Fi monitoring," Geomatics Synthesis Group Project Report, 2016.

[46] A. Danalet, L. Tinguely, M. D. Lapparent, and M. Bierlaire, "Location choice with longitudinal WiFi data," Journal of Choice Modelling, vol. 18, pp. 1–17, 2016.

[47] M. Zhou, K. Sui, M. Ma, Y. Zhao, D. Pei, and T. Moscibroda, "MobiCamp: A campus-wide testbed for studying mobile physical activities," in Proceedings of the 3rd International Workshop on Physical Analytics, WPA 2016, pp. 1–6, Singapore, 2016.

[48] A. Fernandez-Ares, A. M. Mora, M. G. Arenas et al., "Studying real traffic and mobility scenarios for a Smart City using a new monitoring and tracking system," Future Generation Computer Systems, vol. 76, pp. 163–179, 2017.

[49] A. E. Redondi and M. Cesana, "Building up knowledge through passive WiFi probes," Computer Communications, vol. 117, pp. 1–12, 2018.

[50] L. Huang, K. Matsuura, H. Yamane, and K. Sezaki, "Enhancing wireless location privacy using silent period," in Proceedings of the 2005 IEEE Wireless Communications and Networking Conference, WCNC 2005: Broadband Wireless for the Masses - Ready for Take-off, pp. 1187–1192, USA, March 2005.

[51] J. Martin, T. Mayberry, C. Donahue et al., "A Study of MAC Address Randomization in Mobile Devices and When it Fails," Proceedings on Privacy Enhancing Technologies, vol. 2017, no. 4, pp. 365–383, 2017.

[52] M. Vanhoef, C. Matte, M. Cunche, L. S. Cardoso, and F. Piessens, "Why MAC address randomization is not enough: an analysis of Wi-Fi network discovery mechanisms," in Proceedings of the 11th ACM on Asia Conference on Computer and Communications Security, pp. 413–424, ACM, Xi'an, China, June 2016.

[53] A. J. Ruiz-Ruiz, H. Blunck, T. S. Prentow, A. Stisen, and M. B. Kjaergaard, "Analysis methods for extracting knowledge from large-scale WiFi monitoring to inform building facility planning," in Proceedings of the 2014 12th IEEE International Conference on Pervasive Computing and Communications, PerCom 2014, pp. 130–138, Hungary, March 2014.

[54] N. Nunes, M. Ribeiro, C. Prandi, and V. Nisi, "Beanstalk - A community based passive Wi-Fi tracking system for analysing tourism dynamics," in Proceedings of the 9th ACM SIGCHI Symposium on Engineering Interactive Computing Systems, EICS 2017, pp. 93–98, Portugal, June 2017.

[55] C. Parent, N. Pelekis, Y. Theodoridis et al., "Semantic trajectories modeling and analysis," ACM Computing Surveys, vol. 45, no. 4, pp. 1–32, 2013.

[56] C. Wei, "Mining of User Behavioral Features Based on Indoor Semantic Trajectories," Boletín Técnico, ISSN 0376-723X, vol. 55, 2017.

[57] K. V. Long, D. Quang, and N. Klara, Lessons learned from bluetooth/wifi scanning deployment in university campus, Urbana, Illinois, USA, 2010.

[58] V. Radu and M. K. Marina, "HiMLoc: indoor smartphone localization via activity aware pedestrian dead reckoning with selective crowdsourced WiFi fingerprinting," in Proceedings of the International Conference on Indoor Positioning and Indoor Navigation (IPIN '13), pp. 1–10, IEEE, Montbeliard-Belfort, France, October 2013.

[59] F. Hong, Y. Zhang, Z. Zhang, M. Wei, Y. Feng, and Z. Guo, "WaP: Indoor localization and tracking using WiFi-Assisted Particle filter," in Proceedings of the 39th Annual IEEE Conference on Local Computer Networks, LCN 2014, pp. 210–217, Canada, September 2014.

[60] Y. Zeng, P. H. Pathak, and P. Mohapatra, "Analyzing shopper's behavior through WiFi signals," in Proceedings of the 2nd Workshop on Physical Analytics, WPA 2015, pp. 13–18, Italy, 2015.

[61] P. Sapiezynski, A. Stopczynski, R. Gatej, and S. Lehmann, "Tracking human mobility using WiFi signals," PLoS ONE, vol. 10, no. 7, p. e0130824, 2015.

[62] Z. Tian, Y. Jin, M. Zhou, Z. Wu, and Z. Li, "Wi-Fi/MARG Integration for Indoor Pedestrian Localization," Sensors, vol. 16, no. 12, p. 2100, 2016.


[63] B. Zhou, Q. Li, Q. Mao, and W. Tu, "A Robust Crowdsourcing-Based Indoor Localization System," Sensors, vol. 17, no. 4, p. 864, 2017.

[64] "Raspberry Pi," https://www.raspberrypi.org.

[65] "TP-LINK Wi-Fi USB dongle datasheet," http://www.tp-link.com/us/products/details/cat-5520_TL-WN722N.html.

[66] "ISO/IEC 20922:2016 - Information technology – Message Queuing Telemetry Transport (MQTT) v3.1.1," https://www.iso.org/standard/69466.html.

[67] G. Piateski-Shapiro and W. Frawley, Knowledge Discovery in Databases, MIT Press, Cambridge, MA, USA, 1991.

[68] J. Andion Jimenez, J. M. Navarro Gonzalez, M. Alvarez-Campana Fernandez-Corredor, and J. C. Duenas Lopez, "A passive, non-intrusive, cheap method to identify behaviours and habits in the Campus," in Proceedings of the XIII Jornadas de Ingenieria Telematica - JITEL2017, vol. 40, no. 47, pp. 10–4995, September 2017.

[69] A. Mylinikov, "Geo project," https://www.mylnikov.org.

[70] K. Mehmet and G. Ananth, "PROXIMUS: A framework for analyzing very high dimensional discrete-attributed datasets," in Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '03, vol. 147, no. 156, pp. 147–156, USA, August 2003.

[71] "UPM Library occupation," http://ceiboard.dit.upm.es/smart-campus/biblio.

[72] C. A. R. Inarejos, A. Rodríguez, G. Lopez, and M. Alvarez-Campana, "Analisis de la huella de carbono de la ETSIT de la UPM y propuesta de mejora basada en datos de la plataforma IoT Smart CEI Moncloa," in Proceedings of the I Congreso Iberoamericano de Ciudades Inteligentes (ICSC-CITIES 2018), 2018.


Figure 6: Histogram user-number of registered hours: (a) raw analysis; (b) first threshold applied. Both panels plot log(users) against hours.

year, from January 2016 to December 2016. Over a one-year time lapse, 9 sensors, each generating a file every fifteen minutes, would create around 315K files. This is a theoretical maximum, because a file is only created if the sensor is active and detects at least one device during the period. The sensors were deployed at the beginning of 2016, but the deployment was not completely finished until March. There were also some holidays during 2016 when ETSIT was fully closed. During those days, some of the sensors detected no devices (although there are always security staff somewhere in the covered area), so they did not generate a file. As a result, the number of files actually available for the analysis is 246K.
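The theoretical maximum quoted above follows from simple arithmetic. The short sketch below reproduces it, assuming a 365-day year (2016 was a leap year, which would add one more day of files); the variable names are ours and only illustrative.

```python
# Theoretical number of 15-minute files for 9 sensors over one year.
SENSORS = 9
FILES_PER_HOUR = 60 // 15      # one file every fifteen minutes
HOURS_PER_DAY = 24
DAYS = 365                     # assumption: a 365-day year

max_files = SENSORS * FILES_PER_HOUR * HOURS_PER_DAY * DAYS
collected_files = 246_000      # approximate figure reported in the text

coverage = collected_files / max_files
print(max_files)               # 315360
print(f"{coverage:.0%}")       # 78%
```

Under these assumptions, the files actually collected amount to roughly 78% of the theoretical maximum.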

The first step in preprocessing these data was to merge all the files into a single one, resulting in a 2 GB plain-text CSV file where each row represents the observation of a device during the associated time period, including, among others, the anonymized MAC address, the sensor name, and the detection of the device for a given minute in the observation timespan. With this transformation, the resulting dataset contains 63 million rows, each one representing the detection of a single device in a single minute by a single sensor.

A total of 128K unique devices were detected during the whole year. It is known that the number of people who regularly go to ETSIT is around 4K, so not all of the detected devices can be considered actual users of interest; consequently, the data needed to be filtered. Only the devices of people who perform activities related to the place, such as students, professors, etc., should be taken into account. Thus, it is necessary to filter out devices, keeping only the ones that can be labelled as users. We apply filters based on the observations of each device.

So, for each device, we count the number of minutes it was detected in the entire dataset. A device can be seen by more than one sensor during the same minute, so, to generate this measure, repeated minutes are counted only once per device. Grouping the resulting counts, a histogram (Figure 6) is obtained, which makes it possible to classify the devices based on the total time recorded during the whole year.
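The counting rule described above can be sketched as follows. This is only an illustration, not the code used in the study: the function name `minutes_per_device` and the (device, sensor, minute) tuple layout are assumptions made for the example.

```python
from collections import defaultdict

def minutes_per_device(rows):
    """Count distinct detection minutes per device.

    `rows` is an iterable of (device, sensor, minute) tuples; this field
    layout is an assumption made for the sketch.
    """
    seen = defaultdict(set)
    for device, _sensor, minute in rows:
        # A set keeps each minute once, even if several sensors saw the device.
        seen[device].add(minute)
    return {device: len(minutes) for device, minutes in seen.items()}

rows = [
    ("dev1", "Library", 100),
    ("dev1", "Entr D", 100),   # same minute, second sensor: not counted twice
    ("dev1", "Library", 101),
    ("dev2", "Entr A", 100),
]
counts = minutes_per_device(rows)
print(counts)  # {'dev1': 2, 'dev2': 1}
```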

To facilitate the analysis of the chart, the horizontal axis is expressed in hours and the count on the vertical axis is shown on a logarithmic scale. Around 95% of the devices were seen for less than 48 hours during the whole year, on average less than four hours per month. This group is mainly composed of people passing near the school buildings, momentarily entering the coverage area of the sensors without accessing ETSIT. Figure 6(b) shows the histogram after applying a lower threshold of 48 hours and an upper threshold of 3650 hours, an average of 10 hours per day. In this chart, some isolated peaks can be observed in the tail of the graph, starting around 1000 hours on the horizontal axis. A detailed analysis of these peaks revealed that they correspond to devices that remained connected continuously for several days, like servers. So we applied the label "user" only to those devices that registered a number of hours during the year between these two thresholds. As shown in Table 3, from the 128K detected devices, only 4653 were classified as users, over which we perform the rest of the analysis.
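A minimal sketch of this two-threshold labelling is given below. The 48-hour and 3650-hour limits come from the text; the function name `label_users`, the input format (minutes per device), and the choice of inclusive bounds are assumptions made for illustration.

```python
LOWER_HOURS = 48      # below this: passers-by
UPPER_HOURS = 3650    # above this: always-on equipment such as servers

def label_users(minutes_by_device, lower=LOWER_HOURS, upper=UPPER_HOURS):
    """Return the devices whose yearly presence lies within [lower, upper] hours.

    Inclusive bounds are an assumption; the text only says "between".
    """
    users = set()
    for device, minutes in minutes_by_device.items():
        hours = minutes / 60
        if lower <= hours <= upper:
            users.add(device)
    return users

sample = {
    "passer_by": 30 * 60,      # 30 hours in the year
    "student": 900 * 60,       # 900 hours
    "server": 8000 * 60,       # almost always on
}
print(label_users(sample))  # {'student'}
```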

The last step before the proper analyses is to merge thisdataset with the information about the position and name of


Figure 7: Daily analysis aggregated during 2016 (horizontal axis: date, January to December): (a) minutes accumulated [×1000]; (b) unique users registered [×100]; (c) ratio between minutes and users [minutes/user].

Table 3: Dataset statistics after and before preprocessing.

                        Rows               Devices
Before preprocessing    63427489           128188
After preprocessing     41294344 (65%)     4653 (3.6%)

the sensors. Additionally, only the data columns with relevant information for the analyses are kept, in order to reduce the in-memory cost:

(1) timestamp: the detection time, measured in minutes

(2) user: the MAC address of the detected device

(3) sensor: the MAC address of the sensor which made the detection

(4) sensorName: the name of the sensor which made the detection

(5) accessPoint: the MAC address of the access point to which the device is connected, if any

(6) sensorLat: latitude of the sensor's position

(7) sensorLng: longitude of the sensor's position
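As an illustration of this last preprocessing step, the sketch below attaches the sensor name and position to a detection and keeps only the seven listed columns. The column names come from the list above; the sensor table, its MAC addresses and coordinates, and the extra `rssi` field are hypothetical placeholders, not values from the deployment.

```python
SENSORS = {
    # sensor MAC -> (name, latitude, longitude); placeholder values only
    "aa:aa:aa:aa:aa:01": ("Library", 40.4520, -3.7260),
    "aa:aa:aa:aa:aa:02": ("Entr C", 40.4525, -3.7265),
}

COLUMNS = ("timestamp", "user", "sensor", "sensorName",
           "accessPoint", "sensorLat", "sensorLng")

def enrich(detection, sensors=SENSORS):
    """Attach sensor name and position and keep only the seven columns."""
    name, lat, lng = sensors[detection["sensor"]]
    merged = dict(detection, sensorName=name, sensorLat=lat, sensorLng=lng)
    return {column: merged.get(column) for column in COLUMNS}

raw = {"timestamp": 24_000_000, "user": "u-42", "sensor": "aa:aa:aa:aa:aa:01",
       "accessPoint": None, "rssi": -70}   # "rssi" is a made-up extra column
record = enrich(raw)
print(record["sensorName"], list(record))
```

Dropping unused columns at this stage is what keeps the in-memory footprint of a dataset with tens of millions of rows manageable.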

4.2. Temporal Analysis. Once the data was ready, we performed a temporal study to determine whether the captured data allowed the identification of significant periods of activity or trends in user behavior at ETSIT. As one year is too long for a minute-level analysis, we aggregated our data by days and by hours.

Figures 7–10 are organized into groups of two charts. The first one, the accumulated time chart, shows the number of activity minutes registered by each sensor during a time slice. The second chart presents the number of unique users seen by each sensor during the time slice. In addition, Figures 7 and 8 include a third chart with the ratio of accumulated time over unique users, which provides a hint of how stationary users are: in the ratio chart, peaks represent moments when users are still (e.g., students in class) and dips are associated with transition times (e.g., students arriving at the school).

First, Figure 7 shows an overview of these metrics throughout the year. In this figure, the measurements of the 9 sensors are aggregated into a single line. In the case of Figure 7(a), the result is not exactly equal to the sum of the activity minutes each sensor accumulates, since a user can be detected in the same minute by different sensors and these occasions are represented as a single instant in this line.
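The three aggregated metrics (accumulated minutes, unique users, and their ratio) can be sketched as follows. This is a simplified illustration under our own assumptions: rows are (user, sensor, minute) tuples, minutes are counted from an arbitrary origin, and the function name `daily_metrics` is hypothetical.

```python
from collections import defaultdict

MINUTES_PER_DAY = 24 * 60

def daily_metrics(rows):
    """Aggregate (user, sensor, minute) rows into per-day metrics.

    Returns {day: (accumulated_minutes, unique_users, ratio)}. Minutes are
    counted once per user even when several sensors report the same minute.
    """
    user_minutes = defaultdict(set)   # day -> {(user, minute)}
    for user, _sensor, minute in rows:
        day = minute // MINUTES_PER_DAY
        user_minutes[day].add((user, minute))

    metrics = {}
    for day, pairs in user_minutes.items():
        accumulated = len(pairs)
        users = len({user for user, _ in pairs})
        metrics[day] = (accumulated, users, accumulated / users)
    return metrics

rows = [
    ("u1", "Library", 600), ("u1", "Entr D", 600),  # two sensors, one minute
    ("u1", "Library", 601),
    ("u2", "Entr A", 610),
    ("u2", "Entr A", 1440 + 5),                     # next day
]
print(daily_metrics(rows))
# {0: (3, 2, 1.5), 1: (1, 1, 1.0)}
```

Keying the set on (user, minute) rather than on (user, sensor, minute) is what makes the aggregated line differ from the plain sum of the per-sensor lines.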


Figure 8: Daily analysis during 2016: (a) minutes accumulated by each sensor; (b) unique users registered by each sensor; (c) ratio between minutes and users by each sensor. Sensors shown: Entr A, Entr A Sec, Entr B, Entr B Lab, Entr B Sec, Entr C, Entr D, Library, Std Tables.

The most obvious observation that can be drawn from Figure 7 is the anomalous behavior during the months of January and February. The reason is that, as previously discussed, the sensor network began to be deployed in January 2016 and was not completely operational until March. We keep these data in the analysis because they can be used to compare an anomalous situation with a regular one; also, it is much more intuitive to present a whole-year range instead of nine months. Nevertheless, data collected in those months are not used to draw conclusions.

Holiday periods are clearly shown in the unique users chart when the line falls, for example, in March (Easter), summer holidays, or some isolated holidays. These periods can also be seen in the ratio chart, which increases because on those dates far fewer users attend the school, but those who do usually spend long periods of time in the library. A remarkable point is October 12th, on which the ratio chart reaches its highest peak and the number of users is almost zero. That day even the library was closed, so only the security staff was in the school. We can also observe the effect of weekends on each chart: both the number of users and the accumulated minutes decrease, but the ratio increases for the same reason given for the holidays, since users study at the library during the weekends. Finally, between May and June, a decreasing trend can be observed in both the unique users and the accumulated minutes, going up in mid-June for the examination period.

Figure 8 shows a second set of graphs that correspond to a daily analysis of the whole year for each individual sensor. One of the first conclusions that can be drawn is the difference in proportions between the unique users chart and the accumulated minutes chart. The difference between the numbers of registered users per sensor is not as remarkable as the difference between the numbers of accumulated minutes. Again, this is due to the fact that users spend much longer periods of time in the library than in other areas. Observing


Figure 9: Calendar view (rows: days of the week, Sunday to Saturday; columns: January to December): (a) unique users registered by sensor Entr A Sec; (b) minutes accumulated by sensor Entr A Sec; (c) unique users registered by sensor Entr C; (d) minutes accumulated by sensor Entr C.

Figure 10: Hourly analysis during first week of November (horizontal axis: date, Monday to Sunday): (a) minutes accumulated by each sensor [×1000]; (b) unique users registered by each sensor [×100].


Table 4: Sensor records collisions.

Collisions    Count       Total       Percent
9             3           27          0.00
8             45          360         0.00
7             421         2947        0.01
6             5371        32226       0.08
5             59222       296110      0.72
4             560994      2243976     5.46
3             2525718     7577154     18.42
2             7268825     14537650    35.35
1             16433862    16433862    39.96

the line of the sensor of building C in the accumulated minutes chart, it can be seen that the magnitude is maintained throughout the year. This sensor covers a building that houses professors' offices and research laboratories, i.e., this sensor registers mostly users who work at the school, and they maintain a more regular schedule throughout the year than students who go to classes or to the library.

Line charts can be difficult to interpret for periods of time as long as a whole year. To ease the visual analysis, a calendar visualization is offered in which the same data can be studied in a more intuitive way. Figure 9 presents the full-year view in calendar format, where each row represents one day of the week and the months appear delimited in black. The color intensity of each cell is proportional to the data it represents. Figure 9 is formed by 4 subfigures, which correspond to the count of users and the number of minutes registered by the sensors "Entr A Sec" and "Entr C". These visualizations are relative and can be used to obtain general conclusions. For a detailed study, visualizations with absolute measurements and an examination of the data are still necessary.
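The data layout behind such a calendar view can be sketched as a grid with one row per day of the week and one column per week. The example below is only an assumption about how the layout could be built (the function name `calendar_grid` is ours, and rendering the grid as colored cells is left to any plotting library); it places Sunday in the first row, as in Figure 9.

```python
from datetime import date, timedelta

def calendar_grid(values, year):
    """Arrange {date: value} into 7 rows (Sunday first) by week columns.

    Cells without data stay None, which a plot would leave blank.
    """
    first = date(year, 1, 1)
    # Column 0 starts on the Sunday on or before 1 January.
    offset = (first.weekday() + 1) % 7      # Monday=0 ... Sunday=6 -> Sunday=0
    start = first - timedelta(days=offset)
    last = date(year, 12, 31)
    weeks = ((last - start).days // 7) + 1
    grid = [[None] * weeks for _ in range(7)]
    for day, value in values.items():
        delta = (day - start).days
        grid[delta % 7][delta // 7] = value
    return grid

values = {date(2016, 1, 1): 5, date(2016, 1, 3): 9, date(2016, 12, 31): 2}
grid = calendar_grid(values, 2016)
print(len(grid), len(grid[0]))               # 7 53
print(grid[5][0], grid[0][1], grid[6][52])   # 5 9 2
```

Days on which a sensor produced no data remain `None`, which corresponds to the blank cells visible in the calendar view.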

Figure 9(a) represents the number of users detected by the sensor "Entr A Sec", which covers the classrooms of this building, and Figure 9(b) the number of minutes accumulated by the same sensor. It is clear that this sensor was not operational during the first two months of the year, that it registers almost no activity on Saturdays and Sundays (last and first rows), and that holiday periods are clearly identified, such as Easter in March or the summer holidays in July and August. But this visualization reveals other information that is more difficult to observe in a line graph, such as the fact that Mondays are the days on which users spend the least time in this area. Some blank cells also stand out in the last months of the year, revealing that the sensor did not work during those days.

Figures 9(c) and 9(d) represent the number of users detected and the minutes accumulated by the "Entr C" sensor. As mentioned in the description of Figure 8, this sensor shows very stable measurements because it covers the research laboratories, and the workers' schedules are not affected by the school calendar, as is the case for the activity seen by other sensors like "Entr A Sec".

Figure 10 presents a different view, plotted at the hour level during a week in November. This eases the identification of activity hours, which span from 8:00 in the morning to 22:00 in the evening. Another notable effect is the valley in the middle of the day, corresponding to the lunch break, when users move to the canteen (see Figure 2) or leave the buildings. Finally, it is observed that the users leave the school gradually during the afternoon.

4.3. Position Analysis. Before getting into the details and insights obtained from the spatial analysis of the one-year gathered data, it is worth mentioning that a set of tests was carried out in a controlled environment during the first stage of the deployment in order to check that the Wi-Fi tracking system worked properly. These tests included tracking a well-known group of MAC addresses throughout the Wi-Fi tracking sensors, checking that they appeared in the appropriate ones. It was also checked that the system correctly located and tracked the security staff throughout their night security rounds.

A spatial analysis provides insight into how the users are distributed throughout the buildings during different times of the year. To reach these conclusions we have improved the method presented in [68]. The first improvement is related to the event in which a user is detected by two or more sensors during the same minute. From now on, this event will be referred to as a collision. The number of collisions is significant for sensors that are close to each other (e.g., Library and building D) or sensors that cover transition areas (e.g., Entr A or Std Tables).

Table 4 shows a study of the number of collisions. The order of a collision is the number of sensors that collide in the same minute, and the count is the number of rows in which a collision of that order occurs. Collisions of order 2 and 3 account for more than 50% of the data. The solution adopted was to eliminate these collisions by replacing, for this experiment, all the rows corresponding to a collision with a single row whose latitude and longitude are the centroid of the positions of the sensors participating in the collision.
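The collision handling described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the row layout `(mac, minute, sensor)`, the function name `collapse_collisions`, and the sensor coordinates used in any call are assumptions made for the example.

```python
from collections import defaultdict


def collapse_collisions(rows, sensor_positions):
    """Replace all rows of a (mac, minute) collision by one centroid row.

    rows: iterable of (mac, minute, sensor) tuples (assumed layout).
    sensor_positions: dict mapping sensor name -> (lat, lon) (assumed values).
    Returns a list of (mac, minute, lat, lon, order), where `order` is the
    number of distinct sensors that saw the device in that minute.
    """
    seen = defaultdict(set)
    for mac, minute, sensor in rows:
        seen[(mac, minute)].add(sensor)

    collapsed = []
    for (mac, minute), sensors in sorted(seen.items()):
        lats = [sensor_positions[s][0] for s in sensors]
        lons = [sensor_positions[s][1] for s in sensors]
        # Centroid of the positions of the sensors involved in the collision.
        collapsed.append(
            (mac, minute, sum(lats) / len(lats), sum(lons) / len(lons), len(sensors))
        )
    return collapsed
```

A row with `order == 1` is simply a detection without collision, so the same function covers both cases and the order column can be used to reproduce a count such as the one reported in Table 4.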

The second improvement consists in incorporating external information with the approximate position of the access points to which the users are connected. This information has been obtained from the Mylnikov Geo API [69], retrieving the position of all the ESSIDs registered throughout the year. In the cases in which the user is connected and the approximate position of the access point is available, this new position is used instead of the position of the sensor


Figure 11: Monthly centroids heatmaps: (a) June, (b) August, (c) September.

that generates the row. This allows measuring the user’s position with more accuracy. In addition, it smooths the effect of using only the positions of the sensors, which causes the resulting possible positions to always lie inside the hull of the polygon formed by the positions of the sensors.

This experiment is based on the concept of the user’s centroid. This centroid is the average position of a certain user over a certain period of time. Representing these points on the map reveals hints of the users’ behaviors over the studied time slice, based on the number of people in each area.
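As a minimal sketch of the centroid concept, the average position per user over a chosen time slice can be computed as below. The input layout `(user, lat, lon)` and the function name are assumptions for illustration; the points are assumed to have been already filtered to the period of interest (e.g., one month or one hour).

```python
from collections import defaultdict


def user_centroids(points):
    """Average position of each user over the supplied observations.

    points: iterable of (user, lat, lon) tuples (assumed layout).
    Returns a dict mapping user -> (mean_lat, mean_lon).
    """
    acc = defaultdict(lambda: [0.0, 0.0, 0])  # sum_lat, sum_lon, count
    for user, lat, lon in points:
        entry = acc[user]
        entry[0] += lat
        entry[1] += lon
        entry[2] += 1
    return {user: (s_lat / n, s_lon / n) for user, (s_lat, s_lon, n) in acc.items()}
```

The resulting points, one per user, are what a heatmap layer would aggregate to show where people concentrate during the period.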

Figure 11 shows heatmaps of three different months. Figure 11(a) is June, a month belonging to the second semester of the academic year and the month when the final exams take place. Figure 11(b) is August, during the summer holidays, when there are no classes or exams and most of the professors, researchers, and staff are on vacation. Figure 11(c) represents September, the start of the semester.

Figure 11(b) confirms one of the facts extracted in the temporal analysis: on holidays the number of users falls, and most of the users present in the school keep studying at the library (the warmest point is located over the library and is much smaller than in other months).

Both Figures 11(a) and 11(c) show that building A is the area with the highest concentration of users. Even so, there are differences between the two figures. In Figure 11(a) the warmest area in the map is over the library. This is explained by the final exam period of June. The same observation can be repeated in other periods of time to discover when the students have exams. In Figure 11(c) the hot spots are on the classrooms. September is the beginning of the school year, when class attendance is at its highest. Thus, this observation is an indicator of the level of students’ class attendance.

The same type of analysis can be done with shorter time frames to obtain more detailed behaviors. Figure 12 shows the centroids heatmap grouped by hours on September 2nd.

A detailed analysis by hours like this one allows observing the users’ movement throughout the day. The first row of maps, in Figures 12(a), 12(b), 12(c), and 12(d), shows the evolution of the users’ centroids at lunch time, between 12:00 and 16:00. In this transition it can be seen that there are hot spots on the classrooms and the library at the beginning. On the next map those centroids move to the canteen (see Figure 2), and in the last map they return to their original positions. The second row of maps in Figure 12 presents the start and the end of the activity time in the school. In Figures 12(e) and 12(f) it can be observed how the first users of the day go directly to classes. By contrast, Figures 12(g) and 12(h) reveal that users tend to be at the library at the end of the day.

Finally, Table 5 presents the count of distinct users detected by each sensor throughout the year. Recalling that the total number of users obtained in Section 4.1 is 4653, the data in the table reveal that, over the year, most users have been seen by every sensor at some time. The two exceptions are the sensor of building C and the sensor of the laboratories of building B. These sensors cover the professors’ offices and research laboratories, so it is unusual for students to stay in those areas.

4.4. Behavior Analysis. The third set of experiments we performed deals with the behaviors that each user follows throughout a single day. To obtain them, we grouped the data using a user-day key. For each key, a vector of 24 positions (one per hour) is created. In each position of this vector we record the sensor that detected this user for most of the time during that hour. This vector therefore represents the route that the user followed throughout that day, hour by


Table 5: Sensor annual statistics.

Sensor        Number of rows    Number of users
Library       14166793          4674
Entr D        6818165           4621
Std Tables    5625974           4670
Entr A        4736237           4620
Entr A Sec    2352019           4579
Entr B Sec    2258618           4530
Entr B Lab    2229560           3630
Entr C        1469026           3093
Entr B        1248462           4127

Figure 12: Hourly centroids heatmaps of 2nd September: (a) 12:00 to 13:00, (b) 13:00 to 14:00, (c) 14:00 to 15:00, (d) 15:00 to 16:00, (e) 8:00 to 9:00, (f) 9:00 to 10:00, (g) 19:00 to 20:00, (h) 20:00 to 21:00.

hour. Once the behavior vectors are obtained, the information of the day and the user is discarded in order to count the most repeated behavior vectors. The dimensionality of these vectors makes the number of possible behaviors huge (theoretically $24^{10} \approx 6 \times 10^{13}$, or $14^{10} \approx 289$K millions using only the activity hours), but it is known that users behave similarly, so a much lower number of behaviors can be expected, even though it will still be a high number.
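The construction of behavior vectors can be illustrated with the following sketch. It assumes one input row per minute of detection with the layout `(user, day, hour, sensor)` and uses `None` for hours without detections; these choices, and the function names, are assumptions for the example rather than details taken from the processing pipeline of the study.

```python
from collections import Counter, defaultdict


def behavior_vectors(rows):
    """Build one 24-position vector per (user, day) key.

    rows: iterable of (user, day, hour, sensor), one row per detected minute
    (assumed layout). Each position holds the sensor that detected the user
    for most minutes of that hour, or None if the user was not seen.
    """
    minutes = defaultdict(Counter)
    for user, day, hour, sensor in rows:
        minutes[(user, day, hour)][sensor] += 1

    vectors = defaultdict(lambda: [None] * 24)
    for (user, day, hour), counts in minutes.items():
        vectors[(user, day)][hour] = counts.most_common(1)[0][0]
    return {key: tuple(vec) for key, vec in vectors.items()}


def most_repeated(vectors, n=20):
    """Discard the user-day key and count identical behavior vectors."""
    return Counter(vectors.values()).most_common(n)
```

Storing each vector as a tuple makes it hashable, so counting repeated behaviors reduces to a single `Counter` over the values.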

Table 6 presents the 20 most repeated behaviors throughout the year. A total of 285K behaviors are detected, of which 139K are unique. The 500 most frequent behaviors account for 25% of the total behaviors. Table 6 shows only the part corresponding to the activity hours of the school, which, as observed in the temporal analysis, cover from 8:00 am to 10:00 pm. As can be seen, the majority of the most repeated behaviors are stays of between two and five hours in the library. It is necessary to expand the analysis to the top 20 to observe the class attendance behaviors. It is easy to appreciate that the different behaviors are usually confined to either the morning or the afternoon, with lunchtime from 1:00 pm to 3:00 pm, which means that the majority of users go either in the morning or in the afternoon, but they do not spend all day at school.

Another quite obvious observation is that many of the behaviors obtained are very similar to one another (e.g., going


Table 6: Top 20 most repeated behaviors. The columns of the table are the 14 hourly slots from 8:00-9:00 to 21:00-22:00; each row lists, in order, the non-empty cells of the behavior.

1   Lib Lib Lib Lib Lib
2   Lib Lib Lib Lib
3   EntA
4   Lib Lib Lib
5   Lib Lib Lib Lib
6   Lib
7   EntA
8   Lib Lib Lib Lib Lib Lib
9   Std Std Std Std Std Std Std Std Std Std Std Std Std Std
10  EntA
11  EntA
12  Lib Lib Lib Lib Lib
13  Lib Lib
14  Lib Lib
15  EntA
16  EntA EntA
17  EntA
18  Lib Lib
19  SecA SecA SecA SecA
20  EntA

Lib = Library; EntA = Entr A; SecA = Entr A Sec; Std = Std Tables.

to the library from 9:00 am to 2:00 pm or going to the library from 9:00 am to 1:00 pm). In order to lower this redundancy, we performed a clustering procedure on them. Each behavior can be understood as a categorical vector of length 24 in which the categories correspond to the 9 possible sensors of the data set plus the empty category. The chosen clustering algorithm is Proximus [70], due to its simplicity, efficiency, scalability, and the reliability of its results. The algorithm works with binary vectors and creates clusters based on the Hamming distance (the number of bits that differ between two binary vectors). A vector is chosen as the center of the cluster, and other vectors whose distance to the center is smaller than the maximum cluster radius are added to that cluster.
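To make the distance and radius notions concrete, the sketch below computes the Hamming distance and performs a naive greedy assignment of vectors to cluster centers. It is only an illustration of the idea summarized above and is not the Proximus algorithm of [70]; the function names and the first-come choice of centers are assumptions made for the example.

```python
def hamming(a, b):
    """Number of positions in which two equal-length binary vectors differ."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x != y for x, y in zip(a, b))


def greedy_radius_clusters(vectors, max_radius):
    """Naive illustration: assign each vector to the first center that lies
    at a Hamming distance smaller than max_radius, else open a new cluster.

    Returns a list of (center, members) pairs.
    """
    clusters = []
    for vec in vectors:
        for center, members in clusters:
            if hamming(center, vec) < max_radius:
                members.append(vec)
                break
        else:
            # No existing center is close enough: vec becomes a new center.
            clusters.append((vec, [vec]))
    return clusters
```

With such a grouping, the number of members of each cluster plays the role of the cluster size used to rank clusters.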

The behaviors are expressed as categorical vectors, so it is necessary to transform them into binary vectors to be used in Proximus. The transformation shown in Equation (1) is proposed. This transformation is simple and also reversible, which allows the original behavior vectors to be recovered after clustering.

\begin{equation}
\begin{aligned}
B &= (h_0, \ldots, h_{23}), \quad h_i \in \{0, a_1, \ldots, a_9\} \\
\vec{B} &= (\delta_{0 a_1}, \ldots, \delta_{0 a_9}, \ldots, \delta_{23 a_1}, \ldots, \delta_{23 a_9}) \\
\delta_{i a_j} &=
\begin{cases}
0 & \text{if } h_i \neq a_j \\
1 & \text{if } h_i = a_j
\end{cases}
\end{aligned}
\tag{1}
\end{equation}

where $a_1, \ldots, a_9$ are the nine sensors, $B$ is the behavior vector described above, and $\vec{B}$ is the transformed vector of length $24 \times 9$, made up of 9-sample subgroups, one associated with each hour. Every sample in each subgroup is 0, except for the one at the index of the sensor active in that hour, which is set to 1.
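The transformation of Equation (1) and its inverse can be sketched as follows. The sensor names are those listed in Table 5, but their ordering as $a_1, \ldots, a_9$ and the use of `None` for the empty category are assumptions made for this illustration.

```python
# Sensor names from Table 5; the ordering a_1..a_9 is an assumption.
SENSORS = (
    "Library", "Entr D", "Std Tables", "Entr A", "Entr A Sec",
    "Entr B Sec", "Entr B Lab", "Entr C", "Entr B",
)


def encode(behavior):
    """Categorical vector of 24 entries -> binary vector of 24 * 9 bits.

    Each hour becomes a 9-bit subgroup with a 1 at the active sensor's index;
    an empty hour (None) becomes nine zeros.
    """
    if len(behavior) != 24:
        raise ValueError("a behavior must have 24 hourly positions")
    bits = []
    for h in behavior:
        bits.extend(1 if h == s else 0 for s in SENSORS)
    return bits


def decode(bits):
    """Inverse transformation: recover the categorical vector."""
    n = len(SENSORS)
    behavior = []
    for start in range(0, len(bits), n):
        group = bits[start:start + n]
        behavior.append(SENSORS[group.index(1)] if 1 in group else None)
    return behavior
```

Because each 9-bit subgroup contains at most one 1, two behaviors that differ in one hour differ in at most two bits, which is what a Hamming-distance-based clustering then measures.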

Table 7 shows the results of applying the Proximus clustering to the behaviors found. The first column is the center of the cluster, the second one is the number of distinct behaviors that are grouped in that cluster, and the third is the total number of recorded behaviors that fall within that cluster. The results are presented ordered by the number of behaviors included in the cluster. In total, 45707 clusters are obtained. The first 300 account for 50% of the behaviors recorded throughout the year. The most important cluster behaviors in Table 7 can be understood as follows:


Table 7: Behavior clusters.

No.  Cluster center                                            Cluster components   Behaviors count
1    Entr A Sec, from 9:00 to 14:00                            415                  4963
2    Entr B Sec, from 10:00 to 13:00                           458                  3756
3    Entr B Sec, from 15:00 to 19:00                           471                  2924
4    Library, from 9:00 to 13:00                               256                  2870
5    Library, from 10:00 to 12:00                              176                  2396
6    Library, from 18:00 to 21:00                              244                  2231
7    Entr A, 13:00                                             787                  1979
8    Library, 12:00                                            563                  1894
9    Library, from 18:00 to 19:00 and from 20:00 to 21:00      175                  1853
10   Entr B Lab, 11:00 to 20:00                                568                  1831

(i) Clusters 1, 2, and 3: students attending classes. These sensors cover the main classrooms in the school, and the intervals match the class schedule.

(ii) Clusters 4, 5, 6, and 9: students at the library.

(iii) Cluster 7: users who are usually outside the sensors’ coverage areas, arriving or leaving.

(iv) Cluster 10: laboratory equipment and professors. This sensor covers some of the professors’ offices and laboratories, which usually have some laboratory equipment connected during work hours.

5. Validation of Occupancy Estimation Based on Wi-Fi Tracking

Although, as already pointed out in Section 4.3, it was tested at the very first stage of the deployment that the Wi-Fi sensors properly locate and track well-known MAC addresses, it was still necessary to validate the accuracy of the system for estimating occupancy (as it can be distorted by the aforementioned fact that a single user can carry several devices connected to Wi-Fi networks). As no ground-truth data were available for the full set of buildings and only some data were found for the library, we centered our validation efforts on comparing our data with the available ones: if we can trust our results in that area, then we can extend our trust to the rest of the areas, for which no well-known data are available.

The library of ETSIT has 408 study places, and it offers a web service to check the number of available seats at a given moment [71]. This system is based on two sources of information: a person who counts the empty seats every opening hour from Friday to Sunday, and a video camera located at the main entrance of the library that counts the number of people entering or leaving at 15-minute intervals. The human system provides a ground truth about the number of occupied seats, but this measure is very different from the actual number of people in the library, since a common situation, especially during examination terms, is that students leave their study material at the seat to reserve it while they are not in the library. This situation is a problem for the library staff, and therefore they installed the video camera system to count the students’ entrances and exits.

This system generates an estimation of the number of people in the library by adding to the previous measure the number of people detected entering and subtracting the number of people detected leaving. The system is not perfect, and in most cases it carries an accumulated error that grows in the estimation of the number of people in the library. The total error can be calculated clearly at the end of the day, when the library closes and the number of people inside is supposed to be zero. In summer there is a situation that aggravates this error and, consequently, degrades the measure achieved by this system: due to the rise in temperature, the back door of the library is opened to improve ventilation, and students are allowed to exit through it, although they must continue entering through the main door. The camera does not count students leaving through this back door.
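The running estimate and its end-of-day error can be illustrated with a short sketch. The input layout, a list of `(entries, exits)` pairs per 15-minute interval, and the function names are assumptions for the example; the figures used in any call are invented and do not come from the library data.

```python
def running_occupancy(intervals, start=0):
    """Occupancy estimate after each interval.

    intervals: list of (entries, exits) counted per 15-minute interval
    (assumed layout). Each estimate is the previous one plus the people
    detected entering minus the people detected leaving.
    """
    estimates = []
    current = start
    for entries, exits in intervals:
        current += entries - exits
        estimates.append(current)
    return estimates


def closing_error(estimates):
    """Accumulated error at closing time, when true occupancy should be zero."""
    return estimates[-1] if estimates else 0
```

Any exit that the camera misses, such as one through a door it does not cover, stays in the running total for the rest of the day, which is why the error accumulates rather than averaging out.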

The library staff provided us with the data collected by the two systems (human and camera) between June 5th and 30th. These measurements can be compared with those obtained by the Wi-Fi sensor installed at the library in order to validate them. For this test, all the data collected by the sensor are used, without applying the filtering of the MACs of sporadic users explained before.

Figure 13 shows the data collected by the three systems on Sunday, June 5th. This is the first day with data from the three sources. Other days on which these three sources are present show the same trends. It is clear that there is a divergence between the human system observation and the rest of the data. The graph of the human system shows that the number of occupied seats increases in the first hours up to the maximum and remains steady until the end of the day, without being affected by the behavior of the users at lunchtime. However, this effect is reflected in the camera system and Wi-Fi tracking measurements. We observed that, in general, the number of people accounted for by the camera system is below the Wi-Fi tracking system measure, although the proportion is maintained over time. Finally, the figure shows the cumulative error effect of the camera system, which at the end of the day still reports 65 people in the library.

To better study the relation between the camera system and the Wi-Fi tracking system, we generated a detailed visualization that allows us to observe the data of the whole month in a single figure. Figure 14 is composed of 3 subfigures, each of which is a matrix of colored cells: the rows represent a full


Figure 13: Measure of library occupation using Wi-Fi, camera, and human systems (x-axis: hour, from 10:00 to 22:00; y-axis: number of users; series: Camera, Human, Wifi).

Figure 14: Measures every 15 minutes of library occupation during June (x-axis: hour of the day, from 00 to 23): (a) Wi-Fi records, (b) camera records, (c) difference between Wi-Fi and camera.

day and the columns are divisions of 15 minutes. The color intensity is proportional to the number of users measured in each interval, being more intense when more users are detected. Figure 14(a) shows the measurements of the Wi-Fi sensor. The behavior is the one observed in previous sections of this document. Figure 14(b) shows the measurement of the camera system. During the first half of the day, the same trend as in Figure 14(a) can be observed, but in the afternoon and at night the accumulated error begins to be noticeable. This error reaches a maximum of 378 people, with an average of


139 people at the end of the day, while the Wi-Fi tracking system registers a maximum of 10 and an average of 7 at the end of the day (these are machines that are still turned on when the library is empty of people). Finally, Figure 14(c) shows the difference between the two previous ones, using green when the difference is positive (the Wi-Fi tracking system registers a higher value than the camera system) and red otherwise. This figure validates the measurements obtained by the Wi-Fi tracking system, since the difference with the measure of the camera system in the first half of the day is small (an average of 27 people, counting the data from 8:00 to 16:00), whereas in the afternoon the error accumulated by the camera system causes the average difference to grow to 83 people between 16:00 and 22:00.
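A comparison of this kind can be sketched as below. The dictionaries keyed by `(hour, quarter)` and the use of the signed difference (Wi-Fi minus camera) are assumptions for illustration; the text does not state whether the reported averages are signed or absolute, so the sketch should not be read as the exact computation behind the figures above.

```python
def mean_difference(wifi, camera, start_hour, end_hour):
    """Mean of (wifi - camera) over intervals with start_hour <= hour < end_hour.

    wifi, camera: dicts mapping (hour, quarter) -> people count (assumed layout).
    Intervals missing from either source are skipped.
    Returns None when no interval falls in the window.
    """
    diffs = [
        wifi[key] - camera[key]
        for key in wifi
        if key in camera and start_hour <= key[0] < end_hour
    ]
    return sum(diffs) / len(diffs) if diffs else None
```

A positive result corresponds to the green cells of Figure 14(c) and a negative one to the red cells.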

The Wi-Fi sensor of the library was not placed for the specific purpose of counting the number of people in the library, and in consequence its coverage area can include devices that are outside the library. This explains why, in some cases, it registers a measurement greater than that of the camera system. In any case, the Wi-Fi sensor, with its limitations, registers a measure very similar to that of the camera-based people counting system, while avoiding the cumulative error of that system and being much cheaper (tens of euros against thousands of euros) and less intrusive than a video camera. Currently, since the camera system is already installed, the Wi-Fi sensor measures can be used as an upper bound that limits the error made by that system. In summary, the usage of a Wi-Fi sensor to count people in the library provides accurate results despite the potential errors induced by the sensing period, collisions, the ratio between users and MACs seen, and the data processing performed. These results give us confidence that the results of our analyses for the rest of the areas on campus are also correct.

6. Discussion and Conclusions

In the execution of this case study, we have learned some lessons about the limitations we faced that are worth taking into account for similar systems and analyses. First, there is the key issue of the sensor deployment: it is not just a question of density (number of sensors per unit area) but of adapting its design to the topology of the place to be monitored. So, for example, it is crucial to have sensors not only at the entries and exits of buildings but also inside them, as in the considered deployment, where there are sensors placed at the entrances and exits of the buildings but also in other especially relevant places, such as the library or the group-work area.

A complete coverage of indoor devices, without collisions and for the complete timespan of the stay indoors, would require a large deployment with many sensors able to cover all corners. This is not suitable in this context; instead, we relied on 9 sensors, some at gates and others in large rooms, which proved to be enough to check whether a device is within reach of any of them. But some assumptions had to be made: we considered as valid behaviors only those that spent a significant amount of time once they had been seen (to remove transient behaviors and those of people walking outside the buildings); also, we discarded behaviors that did not appear enough times in the yearly timespan. On the other hand, the topology of these buildings did not impose severe restrictions on the sensors’ coverage. In fact, we found several devices seen by different sensors at once. In summary, we can conclude that the finer the spatial granularity (more sensors), the larger the set of different behaviors, so using raw data from sensors would lead to an explosion of states that would render meaningless results.

The gathered data are another key issue: some works in the literature obtained signal strength measurements every few seconds, allowing them to identify indoor trajectories. This was not our case, and in fact this proved to be very limiting. Thus, the precision of our analysis is limited by the regions covered by each sensor, which prevents us from analyzing any kind of movement within regions. Nevertheless, as has been seen, relevant results can still be obtained without the increase in energy consumption and the risk of flooding the school communications network that the other approach would entail.

As regards the analysis of data to identify users’ behaviors, we have to indicate that a long observation period is a must. Obtaining data for a complete work cycle (in this case, a full school year) helps in discovering common behaviors that happen in a university. Using long observation times can also help in detecting erroneous or atypical operation of the sensors, as happens in the first months of the year in this case. Getting even larger observation datasets would reinforce the behaviors we have found (e.g., seasonality), but the chance of discovering new ones is negligible, since behaviors on campus are expected to be periodic with the school year.

The usage of big data platforms for analysis, while not mandatory, eases the management of large datasets and the execution of iterative studies on the data. The preprocessing work carried out allowed us to clean and filter our data. For example, some discovered behaviors are too regular and extended over time, which may correspond to servers or machines, which we could then filter out.

The temporal analysis has revealed some behaviors that are maintained throughout the year and others that occur occasionally. The work has focused on the study of behaviors that are repeated and maintained throughout the year. A closer view, such as the analysis per hour, shows the hours of activity, nocturnal patterns, or movements at mealtime. In the daily view, the difference in activity between work days and weekends or holidays is clearly seen. Visualization has been a key technique for detecting and understanding these temporal patterns.

The spatial analysis revealed that collisions occur in user detection, but their effect was mitigated by the use of innovative algorithms and techniques, such as the calculation of centroids and the combination of sensor information with external sources of information (i.e., semantic trajectories), so that a much more precise positioning of the users is achieved than with the exclusive use of the sensors. The visualization of centroids fostered the visual understanding of complex data, such as the position of each user over a month, and the superposition of the centroids in a heatmap revealed the movements of the groups of people and the occupation of the different zones.


Furthermore, we have found the dominant users’ behaviors as the most repeated behaviors registered by the sensors throughout the year. The number of behaviors found has been huge, but the application of the Proximus clustering algorithm reduced this number to a manageable amount. Then, the study of the obtained clusters has revealed that the most frequent behaviors coincide with what can be expected from a college building: researchers working in laboratories and students attending classes or studying in the library.

Although some of the discovered behaviors and patterns can be seen as common knowledge, it is worth stressing that they do bring value, since they represent numerical evidence that supports decision making (e.g., someone may think that the Wi-Fi access in a given area does not work properly because it is always overcrowded, but numerical evidence is needed to appropriately justify the investment of increasing the number of APs of the corporate WLAN in that area to improve the service). In addition, such well-known patterns, when obtained automatically by processing the available data, become baseline models which can be used to detect anomalies or atypical situations, as is common practice in unsupervised machine learning.

Lastly, we have validated the accuracy of using Wi-Fi tracking for occupancy estimation, comparing it with the library staff manual counting (considered as ground truth) and with a video camera system installed at the library main entrance. As a main finding, Wi-Fi tracking has proved to be more accurate than the video camera system, in addition to being much cheaper. As a matter of fact, the library staff currently prefer the occupancy estimation based on Wi-Fi tracking over the one based on the video camera system. Nevertheless, the accuracy of the Wi-Fi tracking system can be further improved by correlating MAC addresses detected in the same places over sufficiently long periods of time and counting them only once, thus mitigating the issue related to the fact that a single person can carry several devices connected to Wi-Fi networks.

Besides this, the Wi-Fi tracking system is currently used by the library staff to perform more sophisticated studies, such as figuring out the percentage of students from the different schools of the university who come to study at the library of the Telecommunications Engineering School during the weekend. Figure 15 shows the results of such an analysis, which is a sample of how this kind of IoT system can help in solving real-life problems and improving the operation of already running services.

To summarize, we have studied a one-year Wi-Fi tracking dataset obtained from a reduced set of low-cost sensors with limited capabilities deployed on an actual university campus that receives around 4000 people every day. We have processed the data in order to identify traces of Wi-Fi-enabled mobile devices, which are identified as people moving in the campus buildings, and then we have extracted people’s stays, movements, and common behaviors. The obtained results represent numerical evidence that illustrates how a low-cost Wi-Fi tracking system can be used in real-life conditions to improve or optimize the operation of the monitored premises. These results can allow dimensioning appropriately the WLAN infrastructure or the canteen personnel, or detecting

Figure 15: Users who only go to the library of the Telecommunication School on weekends, classified by their school (categories: CIVIL, INDUS, NAVAL, SPORT, ARQUI, FORES, EDIF, AERO, TELECOM).

anomalous situations in real time. Furthermore, the data from the people flow monitoring system are currently being used together with the data from the environmental monitoring system to try to reduce the environmental footprint of the school [72]. In addition, the Wi-Fi tracking system is actually used by the library staff in their day-to-day activity, which illustrates the value that this kind of IoT infrastructure can bring to real-life problems and services.

Data Availability

The dataset with the Wi-Fi sensor records used to support the findings of this study is available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work has been partly funded by Universidad Politécnica de Madrid through the project RES2+U (Responsables, Sostenibles y Universitarios) (http://blogs.upm.es/res2masu). The work performed by José M. Navarro was funded by the Ministerio de Educación, Cultura y Deporte de España under Grant no. FPU 14/03209.

References

[1] “2018 Revision of World Urbanization Prospects - Multimedia Library - United Nations Department of Economic and Social Affairs,” https://www.un.org/development/desa/publications/2018-revision-of-world-urbanization-prospects.html.

[2] M. Pallot, “Engaging Users into Research and Innovation: The Living Lab Approach as a User Centred Open Innovation Ecosystem,” https://web.archive.org/web/20120509081658/http://www.cwe-projects.eu/pub/bscw.cgi/1760838?id=715404_1760838.

[3] E. Almirall and J. Wareham, “Living Labs: arbiters of mid- and ground-level innovation,” Technology Analysis & Strategic Management, vol. 23, no. 1, pp. 87–102, 2011.

[4] “TU Delft Green office,” https://www.tudelft.nl/sustainability.

[5] “UBC Campus as a living laboratory,” https://sustain.ubc.ca/our-commitment/campus-living-lab.

[6] “Harvard Sustainability,” https://green.harvard.edu.

[7] “UPM City of the Future initiative,” http://blogs.upm.es/cityofthefuture-upm/en/initiative.

[8] M. Alvarez-Campana, G. Lopez, E. Vazquez, V. A. Villagra, and J. Berrocal, “Smart CEI Moncloa: An IoT-based platform for people flow and environmental monitoring on a Smart University Campus,” Sensors, vol. 17, no. 12, 2017.

[9] M. B. Kjærgaard, H. Blunck, T. Godsk, T. Toftkjær, D. L. Christensen, and K. Grønbæk, “Indoor positioning using GPS revisited,” Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Preface, vol. 6030, pp. 38–56, 2010.

[10] A. Belmonte-Hernandez, G. Hernandez-Penaloza, F. Alvarez, and G. Conti, “Adaptive Fingerprinting in Multi-Sensor Fusion for Accurate Indoor Tracking,” IEEE Sensors Journal, vol. 17, no. 15, pp. 4983–4998, 2017.

[11] M. S. Kristoffersen, J. V. Dueholm, R. Gade, and T. B. Moeslund, “Pedestrian counting with occlusion handling using stereo thermal cameras,” Sensors, vol. 16, no. 1, 2016.

[12] L. Zheng, X. Ruan, Y. Chen, and M. Huang, “Shadow removal for pedestrian detection and tracking in indoor environments,” Multimedia Tools and Applications, vol. 76, no. 18, pp. 18321–18337, 2017.

[13] T. Li, H. Chang, M. Wang, B. Ni, R. Hong, and S. Yan, “Crowded scene analysis: a survey,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 25, no. 3, pp. 367–386, 2015.

[14] F. Adib and D. Katabi, “See through walls with WiFi,” in Proceedings of the Annual Conference of the ACM Special Interest Group on Data Communication on the Applications, Technologies, Architectures, and Protocols for Computer Communication, ACM SIGCOMM 2013, pp. 75–86, China, August 2013.

[15] Z. Kabelac, D. Katabi, and R. C. Miller, “3D Tracking via Body Radio Reflections,” in Proceedings of the 11th USENIX Conference on Networked Systems Design and Implementation, pp. 317–329, 2014.

[16] F. Adib, Z. Kabelac, and D. Katabi, “Multi-person localization via RF body reflections,” in Proceedings of the 12th USENIX Symposium on Networked Systems Design and Implementation, NSDI 2015, pp. 279–292, USA, May 2015.

[17] Z.-A. Deng, G. Wang, D. Qin, Z. Na, Y. Cui, and J. Chen, “Continuous indoor positioning fusing WiFi, smartphone sensors and landmarks,” Sensors, vol. 16, no. 9, 2016.

[18] R. A. Becker, R. Caceres, K. Hanson et al., “A tale of one city: Using cellular network data for urban planning,” IEEE Pervasive Computing, vol. 10, no. 4, pp. 18–26, 2011.

[19] J E Mallah F Carrino O A Khaled and E MugellinildquoCrowd monitoring critical situations prevention using smart-phones and group detectionrdquo Lecture Notes in Computer Science

(including subseries Lecture Notes in Artificial Intelligence andLecture Notes in Bioinformatics) Preface vol 9189 pp 496ndash5052015

[20] J Kuang X Niu and X Chen ldquoRobust Pedestrian DeadReckoning Based on MEMS-IMU for Smartphonesrdquo Sensorsvol 18 no 5 p 1391 2018

[21] A Kurkcu and K Ozbay ldquoEstimating Pedestrian DensitiesWait Times and Flows with Wi-Fi and Bluetooth SensorsrdquoTransportation Research Record vol 2644 no 1 pp 72ndash82 2017

[22] A Kotanen M Hannikainen H Leppakoski and THamalainen ldquoPositioning with IEEE 80211b wireless LANrdquo inProceedings of the 14th IEEE 2003 International Symposium onPersonal Indoor and Mobile Radio Communications vol 3 pp2218ndash2222 Beijing China 2003

[23] W Ho A Smailagic D P Siewiorek and C Faloutsos ldquoAnadaptive two-phase approach to WiFi location sensingrdquo inProceedings of the 4th Annual IEEE International Conference onPervasive Computing and Communications Workshops PerComWorkshops 2006 pp 452ndash456 Italy March 2006

[24] F Evennou and F Marx ldquoAdvanced integration of WiFi andinertial navigation systems for indoor mobile positioningrdquoEURASIP Journal onApplied Signal Processing vol 2006 ArticleID 86706 11 pages 2006

[25] J Rekimoto T Miyaki and T Ishizawa ldquoLifeTag WiFi-basedcontinuous location logging for life pattern analysisrdquo LNCS vol4718 pp 35ndash49 2007

[26] J A Besada A M Bernardos P Tarrıo and J R CasarldquoAnalysis of tracking methods for wireless indoor localizationrdquoin Proceedings of the 2nd International Symposium on WirelessPervasive Computing (ISWPC rsquo07) pp 492ndash497 February 2007

[27] O Woodman and R Harle ldquoPedestrian localisation for indoorenvironmentsrdquo in Proceedings of the 10th International Con-ference on Ubiquitous Computing (UbiComp rsquo08) pp 114ndash123Seoul Republic of Korea September 2008

[28] F Aloul A Sagahyroon A Al-Shami I Al-Midfa and RMoutassem ldquoUsing mobiles for on campus location trackingrdquoinProceedings of the 7th International Conference onAdvances inMobile Computing and Multimedia MoMM2009 pp 231ndash235Malaysia December 2009

[29] S Woo S Jeong E Mok et al ldquoApplication of WiFi-basedindoor positioning system for labor tracking at constructionsites A case study in Guangzhou MTRrdquo Automation in Con-struction vol 20 no 1 pp 3ndash13 2011

[30] N LeDortz F Gain andP Zetterberg ldquoWiFi fingerprint indoorpositioning system using probability distribution comparisonrdquoinProceedings of the IEEE International Conference on AcousticsSpeech and Signal Processing (ICASSP rsquo12) pp 2301ndash2304March 2012

[31] Z Chen H Zou H Jiang Q Zhu Y C Soh and L Xie ldquoFusionof WiFi smartphone sensors and landmarks using the kalmanfilter for indoor localizationrdquo Sensors vol 15 no 1 pp 715ndash7322015

[32] Y Shu C Bo G Shen C Zhao L Li and F Zhao ldquoMagi-col indoor localization using pervasive magnetic field andopportunistic wifi sensingrdquo IEEE Journal on Selected Areas inCommunications vol 33 no 7 pp 1443ndash1457 2015

[33] ADanalet B Farooq andM Bierlaire ldquoABayesian approach todetect pedestrian destination-sequences fromWiFi signaturesrdquoTransportation Research Part C Emerging Technologies vol 44pp 146ndash170 2014


[34] O. Czogalla and S. Naumann, “Pedestrian indoor navigation for complex public facilities,” in Proceedings of the 2016 International Conference on Indoor Positioning and Indoor Navigation, IPIN 2016, pp. 1–8, Spain, October 2016.

[35] L. Schauer, P. Marcus, and C. Linnhoff-Popien, “Towards feasible Wi-Fi based indoor tracking systems using probabilistic methods,” in Proceedings of the 2016 International Conference on Indoor Positioning and Indoor Navigation, IPIN 2016, pp. 1–8, Spain, October 2016.

[36] F. Meneses and A. Moreira, “Large scale movement analysis from WiFi based location data,” in Proceedings of the 2012 International Conference on Indoor Positioning and Indoor Navigation, IPIN 2012, Australia, November 2012.

[37] B. Bonne, A. Barzan, P. Quax, and W. Lamotte, “WiFiPi: involuntary tracking of visitors at mass events,” in Proceedings of the IEEE 14th International Symposium on a World of Wireless, Mobile and Multimedia Networks (WoWMoM ’13), pp. 1–6, Madrid, Spain, June 2013.

[38] M. B. Kjaergaard, M. Wirz, D. Roggen, and G. Troster, “Mobile sensing of pedestrian flocks in indoor environments using WiFi signals,” in Proceedings of the 10th IEEE International Conference on Pervasive Computing and Communications (PerCom ’12), pp. 95–102, Lugano, Switzerland, March 2012.

[39] S. Sendra, M. Garcia, C. Turro, and J. Lloret, “People mobility behaviour study in a university campus using WLANs,” in Proceedings of the 3rd International Conference on Mobile Ubiquitous Computing, Systems, Services, and Technologies, UBICOMM 2009, pp. 124–129, Malta, October 2009.

[40] J. Scheuner, G. Mazlami, D. Schoni et al., “Probr - A Generic and Passive WiFi Tracking System,” in Proceedings of the 41st IEEE Conference on Local Computer Networks, LCN 2016, pp. 495–502, UAE, November 2016.

[41] Y. Li, S. Williams, B. Moran, and A. Kealy, “Quantized RSS Based Wi-Fi Indoor Localization with Room Level Accuracy,” in Proceedings of the International Global Navigation Satellite Systems 2018, 2018.

[42] A. Alessandrini, C. Gioia, F. Sermi, I. Sofos, D. Tarchi, and M. Vespe, “WiFi positioning and Big Data to monitor flows of people on a wide scale,” in Proceedings of the 25th European Navigation Conference, ENC 2017, pp. 322–328, Switzerland, May 2017.

[43] L. Vu, K. Nahrstedt, S. Retika, and I. Gupta, “Joint bluetooth/wifi scanning framework for characterizing and leveraging people movement in university campus,” in Proceedings of the 13th ACM International Conference on Modeling, Analysis, and Simulation of Wireless and Mobile Systems (MSWiM ’10), pp. 257–265, October 2010.

[44] M. Garcia, S. Sendra, C. Turro, and J. Lloret, “User’s Macro and Micro-mobility Study using WLANs in a University Campus,” International Journal On Advances in Internet Technology, vol. 4, no. 1, pp. 37–46, 2011.

[45] Y. Xu, I. D. G. Groeneveld, R. Sulzer, E. Theocharous, O. T. Willems, and M. S. Tryfona, “Determine activity based on the classified identity of users by using Wi-Fi monitoring,” Geomatics Synthesis Group Project Report, 2016.

[46] A. Danalet, L. Tinguely, M. D. Lapparent, and M. Bierlaire, “Location choice with longitudinal WiFi data,” Journal of Choice Modelling, vol. 18, pp. 1–17, 2016.

[47] M. Zhou, K. Sui, M. Ma, Y. Zhao, D. Pei, and T. Moscibroda, “MobiCamp: A campus-wide testbed for studying mobile physical activities,” in Proceedings of the 3rd International Workshop on Physical Analytics, WPA 2016, pp. 1–6, Singapore, 2016.

[48] A. Fernandez-Ares, A. M. Mora, M. G. Arenas et al., “Studying real traffic and mobility scenarios for a Smart City using a new monitoring and tracking system,” Future Generation Computer Systems, vol. 76, pp. 163–179, 2017.

[49] A. E. Redondi and M. Cesana, “Building up knowledge through passive WiFi probes,” Computer Communications, vol. 117, pp. 1–12, 2018.

[50] L. Huang, K. Matsuura, H. Yamanet, and K. Sezaki, “Enhancing wireless location privacy using silent period,” in Proceedings of the 2005 IEEE Wireless Communications and Networking Conference, WCNC 2005: Broadband Wireless for the Masses - Ready for Take-off, pp. 1187–1192, USA, March 2005.

[51] J. Martin, T. Mayberry, C. Donahue et al., “A Study of MAC Address Randomization in Mobile Devices and When it Fails,” Proceedings on Privacy Enhancing Technologies, vol. 2017, no. 4, pp. 365–383, 2017.

[52] M. Vanhoef, C. Matte, M. Cunche, L. S. Cardoso, and F. Piessens, “Why MAC address randomization is not enough: an analysis of Wi-Fi network discovery mechanisms,” in Proceedings of the 11th ACM on Asia Conference on Computer and Communications Security, pp. 413–424, ACM, Xi’an, China, June 2016.

[53] A. J. Ruiz-Ruiz, H. Blunck, T. S. Prentow, A. Stisen, and M. B. Kjaergaard, “Analysis methods for extracting knowledge from large-scale WiFi monitoring to inform building facility planning,” in Proceedings of the 2014 12th IEEE International Conference on Pervasive Computing and Communications, PerCom 2014, pp. 130–138, Hungary, March 2014.

[54] N. Nunes, M. Ribeiro, C. Prandi, and V. Nisi, “Beanstalk - A community based passive Wi-Fi tracking system for analysing tourism dynamics,” in Proceedings of the 9th ACM SIGCHI Symposium on Engineering Interactive Computing Systems, EICS 2017, pp. 93–98, Portugal, June 2017.

[55] C. Parent, N. Pelekis, Y. Theodoridis et al., “Semantic trajectories modeling and analysis,” ACM Computing Surveys, vol. 45, no. 4, pp. 1–32, 2013.

[56] C. Wei, “Mining of User Behavioral Features Based on Indoor Semantic Trajectories,” Boletín Técnico, ISSN: 0376-723X, vol. 55, 2017.

[57] K. V. Long, D. Quang, and N. Klara, Lessons learned from bluetooth/wifi scanning deployment in university campus, Urbana, Illinois, USA, 2010.

[58] V. Radu and M. K. Marina, “HiMLoc: indoor smartphone localization via activity aware pedestrian dead reckoning with selective crowdsourced WiFi fingerprinting,” in Proceedings of the International Conference on Indoor Positioning and Indoor Navigation (IPIN ’13), pp. 1–10, IEEE, Montbeliard-Belfort, France, October 2013.

[59] F. Hong, Y. Zhang, Z. Zhang, M. Wei, Y. Feng, and Z. Guo, “WaP: Indoor localization and tracking using WiFi-Assisted Particle filter,” in Proceedings of the 39th Annual IEEE Conference on Local Computer Networks, LCN 2014, pp. 210–217, Canada, September 2014.

[60] Y. Zeng, P. H. Pathak, and P. Mohapatra, “Analyzing shopper’s behavior through WiFi signals,” in Proceedings of the 2nd Workshop on Physical Analytics, WPA 2015, pp. 13–18, Italy, 2015.

[61] P. Sapiezynski, A. Stopczynski, R. Gatej, and S. Lehmann, “Tracking human mobility using WiFi signals,” PLoS ONE, vol. 10, no. 7, p. e0130824, 2015.

[62] Z. Tian, Y. Jin, M. Zhou, Z. Wu, and Z. Li, “Wi-Fi/MARG Integration for Indoor Pedestrian Localization,” Sensors, vol. 16, no. 12, p. 2100, 2016.


[63] B. Zhou, Q. Li, Q. Mao, and W. Tu, “A Robust Crowdsourcing-Based Indoor Localization System,” Sensors, vol. 17, no. 4, p. 864, 2017.

[64] “Raspberry Pi,” https://www.raspberrypi.org.

[65] “TP-LINK Wi-Fi USB dongle datasheet,” http://www.tp-link.com/us/products/details/cat-5520_TL-WN722N.html.

[66] “ISO/IEC 20922:2016 - Information technology – Message Queuing Telemetry Transport (MQTT) v3.1.1,” https://www.iso.org/standard/69466.html.

[67] G. Piateski-Shapiro and W. Frawley, Knowledge Discovery in Databases, MIT Press, Cambridge, MA, USA, 1991.

[68] J. Andion Jimenez, J. M. Navarro Gonzalez, M. Alvarez-Campana Fernandez-Corredor, and J. C. Duenas Lopez, “A passive non-intrusive cheap method to identify behaviours and habits in the Campus,” in Proceedings of the XIII Jornadas de Ingenieria Telematica - JITEL2017, vol. 40, no. 47, pp. 10–4995, September 2017.

[69] A. Mylinikov, “Geo project,” https://www.mylnikov.org.

[70] K. Mehmet and G. Ananth, “PROXIMUS: A framework for analyzing very high dimensional discrete-attributed datasets,” in Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’03, vol. 147, no. 156, pp. 147–156, USA, August 2003.

[71] “UPM Library occupation,” http://ceiboard.dit.upm.es/smart-campus/biblio.

[72] C. A. R. Inarejos, A. Rodríguez, G. Lopez, and M. Alvarez-Campana, “Analisis de la huella de carbono de la ETSIT de la UPM y propuesta de mejora basada en datos de la plataforma IoT Smart CEI Moncloa,” in Proceedings of the I Congreso Iberoamericano de Ciudades Inteligentes (ICSC-CITIES 2018), 2018.


Figure 7: Daily analysis aggregated during 2016 (x-axis: date, January to December): (a) minutes accumulated [×1000]; (b) unique users registered [×100]; (c) ratio between minutes and users [minutes/user].

Table 3: Dataset statistics before and after preprocessing.

                       Rows             Devices
Before preprocessing   63427489         128188
After preprocessing    41294344 (65%)   4653 (3.6%)

the sensors. Additionally, only the data columns with relevant information for the analyses are kept, in order to reduce the in-memory cost:

(1) timestamp: the detection time, measured in minutes

(2) user: the MAC address of the detected device

(3) sensor: the MAC address of the sensor which made the detection

(4) sensorName: the name of the sensor which made the detection

(5) accessPoint: the MAC address (BSSID) of the access point to which the device is connected, if any

(6) sensorLat: latitude of the sensor’s position

(7) sensorLng: longitude of the sensor’s position
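
As an illustration only, the seven retained columns could be modeled as a small record type. This is a sketch and not the authors' code: the class name `Detection`, the Python types, and the sample values (including the coordinates) are assumptions, while the field names follow the list above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Detection:
    """One preprocessed row; field names follow the column list above."""
    timestamp: int              # detection time in minutes (origin is an assumption)
    user: str                   # MAC address of the detected device
    sensor: str                 # MAC address of the detecting sensor
    sensorName: str             # name of the detecting sensor, e.g. "Library"
    accessPoint: Optional[str]  # MAC of the AP the device is connected to, if any
    sensorLat: float            # latitude of the sensor's position
    sensorLng: float            # longitude of the sensor's position


# Hypothetical sample row; every value here is made up for illustration.
row = Detection(24_600_000, "aa:bb:cc:dd:ee:ff", "11:22:33:44:55:66",
                "Library", None, 40.4523, -3.7262)
```

Keeping only these fields, with the timestamp as an integer number of minutes, is what makes the per-minute grouping used in the later analyses cheap.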

4.2. Temporal Analysis. Once the data was ready, we performed a temporal study aiming to know whether the captured data allowed for the identification of significant periods of activity or trends in user behavior at ETSIT. As one year is too long for a minute-level analysis, we aggregated our data by days and by hours.

Figures 7–10 are organized into groups of two charts. The first one, the accumulated time chart, shows the number of activity minutes registered by each sensor during a time slice. The second chart presents the number of unique users seen by each sensor during the time slice. In addition, Figures 7 and 8 include a third chart with the ratio of accumulated time over unique users, which provides a hint on how stationary users are: peaks in the ratio chart represent moments when users are still (e.g., students in class), and dips are associated with transition times (e.g., students arriving at the school).

First, Figure 7 shows an overview of these metrics throughout the year. In this figure, the measurements of the 9 sensors are aggregated into a single line. In the case of Figure 7(a), the result is not exactly equal to the sum of the activity minutes each sensor accumulates, since a user can be detected in the same minute by different sensors, and these occasions are represented as a single instant in this line.
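
The three aggregated metrics can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `daily_metrics`, the tuple layout of the rows, and the derivation of the day index by integer division of the minute timestamp (which ignores time zones) are all hypothetical.

```python
from collections import defaultdict

MINUTES_PER_DAY = 24 * 60


def daily_metrics(rows):
    """rows: iterable of (timestamp_minutes, user, sensor_name) tuples.

    Returns {day_index: (accumulated_minutes, unique_users, ratio)} for the
    aggregated view, where a user seen by several sensors in the same minute
    counts as a single activity minute.
    """
    seen = defaultdict(set)  # day -> set of (user, minute) pairs
    for timestamp, user, _sensor in rows:
        seen[timestamp // MINUTES_PER_DAY].add((user, timestamp))

    metrics = {}
    for day, pairs in seen.items():
        minutes = len(pairs)
        users = len({user for user, _ in pairs})
        metrics[day] = (minutes, users, minutes / users)
    return metrics


sample_rows = [
    (0, "u1", "Library"), (0, "u1", "Entr D"),  # same minute, two sensors
    (1, "u1", "Library"), (2, "u1", "Library"),
    (5, "u2", "Entr A"),
]
print(daily_metrics(sample_rows))  # {0: (4, 2, 2.0)}
```

The first two sample rows fall in the same minute, so they contribute one activity minute to the aggregated line even though each sensor would count it separately in a per-sensor chart.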

Figure 8: Daily analysis during 2016 (x-axis: date, January to December; one line per sensor: Entr A, Entr A Sec, Entr B, Entr B Lab, Entr B Sec, Entr C, Entr D, Library, Std Tables): (a) minutes accumulated by each sensor [×1000]; (b) unique users registered by each sensor [×100]; (c) ratio between minutes and users by each sensor [minutes/user].

The most obvious observation that can be drawn from Figure 7 is the anomalous behavior during the months of January and February. The reason is that, as previously discussed, the sensor network began to be deployed in January 2016 and was not completely operative until March. We keep this data in the analysis because it can be used to compare an anomalous situation with a regular one; also, it is much more intuitive to present a whole year range instead of nine months. Nevertheless, data collected in those months are not used to draw conclusions.

Holiday periods are clearly shown in the unique users chart when the line falls, for example, in March (Easter), summer holidays, or some isolated holidays. These periods can also be seen in the ratio chart, which increases because on those dates many fewer users attend the school, but those who do usually spend long periods of time in the library. A remarkable point is October 12th, on which the ratio chart reaches its highest peak and the number of users is almost zero. That day even the library was closed, so only the security staff was in the school. We can also observe the effect of weekends on each chart: both the number of users and the accumulated minutes decrease, but the ratio increases for the same reason given for the holidays, namely that users study at the library during the weekends. Finally, between May and June, a decreasing trend can be observed both in the unique users and in the accumulated minutes, going up in mid-June for the examination period.

Figure 8 shows a second set of graphs that correspond to a daily analysis of the whole year, representing each sensor. One of the first conclusions that can be drawn is the difference between the proportions in the unique users chart and the accumulated minutes chart. The difference between the number of registered users per sensor is not as remarkable as the difference between the number of accumulated minutes. Again, this is due to the fact that users spend much longer periods of time in the library than in other areas. Observing the line of the sensor of building C in the accumulated minutes chart, it can be seen that the magnitude is maintained throughout the year. This sensor covers a building where there are professors’ offices and research laboratories; i.e., this sensor registers mostly users who work at the school, and they maintain a more regular schedule throughout the year than students who go to classes or to the library.

Figure 9: Calendar view (rows: days of the week, Sunday to Saturday; columns: January to December): (a) unique users registered by sensor Entr A Sec; (b) minutes accumulated by sensor Entr A Sec; (c) unique users registered by sensor Entr C; (d) minutes accumulated by sensor Entr C.

Figure 10: Hourly analysis during first week of November (x-axis: date, Mon 05 to Sun 11; one line per sensor: Entr A, Entr A Sec, Entr B, Entr B Lab, Entr B Sec, Entr C, Entr D, Library): (a) minutes accumulated by each sensor [×1000]; (b) unique users registered by each sensor [×100].

Table 4: Sensor records collisions.

Collisions   Count      Total      Percent
9            3          27         0.00
8            45         360        0.00
7            421        2947       0.01
6            5371       32226      0.08
5            59222      296110     0.72
4            560994     2243976    5.46
3            2525718    7577154    18.42
2            7268825    14537650   35.35
1            16433862   16433862   39.96

Line charts can be difficult to interpret for periods of time as long as a whole year. To ease the visual analysis, a calendar visualization is offered in which the same data can be studied in a more intuitive way. Figure 9 presents the full-year view in calendar format, with each row representing one day of the week and the months delimited in black. The color intensity of each cell is proportional to the data it represents. Figure 9 is formed by 4 subfigures, which correspond to the count of users and the number of minutes registered by the sensors “Entr A Sec” and “Entr C.” These visualizations are relative and can be used to obtain general conclusions. For a detailed study, visualizations with absolute measurements and an examination of the data are still necessary.

Figure 9(a) represents the number of users detected by the sensor “Entr A Sec,” which covers the classrooms of this building, and Figure 9(b) the number of minutes accumulated by the same sensor. It is clear that during the first two months of the year this sensor was not operational; that on Saturdays and Sundays (last and first rows) it registers almost no activity; and that holiday periods are clearly identified, such as Easter in March or summer holidays in July and August. But this visualization reveals other information that is more difficult to observe in a line graph, such as that Mondays are the days on which users spend the least time in this area. Some blank cells also stand out in the last months of the year, revealing that the sensor did not work during those days.

Figures 9(c) and 9(d) represent the number of users detected and the minutes accumulated by the “Entr C” sensor. As mentioned in the description of Figure 8, this sensor yields very stable measures because it covers the research laboratories, and the workers’ schedules are not affected by the school calendar, unlike the activity seen by other sensors such as “Entr A Sec.”

Figure 10 presents a different view, plotted at the hour level during a week in November. This eases the identification of activity hours, which span from 8:00 in the morning to 22:00 in the evening. Another noticeable effect is the valley in the middle of the day, corresponding to the lunch break, when users move to the cantina (see Figure 2) or leave the buildings. Finally, it is observed that the users leave the school gradually during the afternoon.

4.3. Position Analysis. Before getting into the details and insights obtained from the analysis of the one-year gathered data from the spatial perspective, it is worth mentioning that a set of tests was carried out in a controlled environment during the first stage of the deployment in order to check that the Wi-Fi tracking system worked properly. These tests included tracking a well-known group of MAC addresses throughout the Wi-Fi tracking sensors, checking that they appeared in the appropriate ones. It was also checked that the system correctly located and tracked the security staff throughout their night security tours.

A spatial analysis provides insight into how the users are distributed throughout the buildings during different times of the year. To reach these conclusions, we have improved the method presented in [68]. The first improvement is related to the event in which a user is detected by two or more sensors during the same minute. From now on, this event will be identified as a collision. The number of collisions is significant in the case of sensors that are close to each other (e.g., the library and building D) or sensors that cover transition areas (e.g., Entr A or Std Tables).

Table 4 shows a study of the number of collisions. The order of a collision is the number of sensors that collide in the same minute, and the count is the number of times a collision of that order occurs; the total (order times count) is the number of rows involved. Collisions of order 2 and 3 account for more than 50% of the data. The solution to this situation was to eliminate these collisions by replacing, for this experiment, all the rows corresponding to a collision by a single row whose latitude and longitude are the centroid of the positions of the sensors participating in the collision.
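
The collision handling described above can be sketched as follows. This is an illustrative sketch under assumptions, not the authors' code: the function name `collapse_collisions` and the row layout are hypothetical, and the centroid is taken as the plain arithmetic mean of latitudes and longitudes, which is a reasonable approximation only over an area as small as a campus.

```python
from collections import defaultdict


def collapse_collisions(rows):
    """rows: iterable of (timestamp_minutes, user, sensor, lat, lng) tuples.

    Returns one (timestamp, user, lat, lng, order) tuple per (user, minute),
    where lat/lng is the centroid of the detecting sensors and order is the
    number of distinct sensors involved (order 1 means no collision).
    """
    groups = defaultdict(dict)  # (timestamp, user) -> {sensor: (lat, lng)}
    for timestamp, user, sensor, lat, lng in rows:
        groups[(timestamp, user)][sensor] = (lat, lng)

    collapsed = []
    for timestamp, user in sorted(groups):
        positions = list(groups[(timestamp, user)].values())
        order = len(positions)
        lat = sum(p[0] for p in positions) / order
        lng = sum(p[1] for p in positions) / order
        collapsed.append((timestamp, user, lat, lng, order))
    return collapsed


detections = [
    (10, "u1", "S1", 0.0, 0.0),
    (10, "u1", "S2", 2.0, 4.0),  # same user and minute: a collision of order 2
    (11, "u1", "S1", 0.0, 0.0),
]
collapsed = collapse_collisions(detections)
```

In this toy input the two detections at minute 10 become a single row placed halfway between the two (made-up) sensor positions, while the detection at minute 11 is left as it is.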

The second improvement consists in the incorporation of external information with the approximate position of the access points to which the users are connected. This information has been obtained from the Mylnikov Geo API [69], getting the position of all ESSIDs registered throughout the year. In the cases where the user is connected and the approximate position of the access point is available, this new position is used instead of the position of the sensor that generates the row. This allows measuring the user’s position more accurately. In addition, it smooths the effect of using only the positions of the sensors, which causes the resulting possible positions to always lie inside the hull of the polygon formed by the positions of the sensors.

Figure 11: Monthly centroids heatmaps: (a) June; (b) August; (c) September.

This experiment is based on the concept of the user’s centroid. This centroid is the average position of a certain user over a certain period of time. Representing these points on the map reveals hints of the users’ behaviors over the studied time slice, based on the number of people in each area.
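
A user's centroid, including the optional substitution of the access point position for the sensor position, might be computed along these lines. This is a sketch under assumptions: the function name `user_centroids`, the row layout, and the `ap_positions` lookup table are hypothetical, and no call to any external geolocation API is shown.

```python
from collections import defaultdict


def user_centroids(rows, ap_positions=None):
    """rows: iterable of (user, access_point, sensor_lat, sensor_lng) tuples.

    ap_positions maps an access point identifier to an approximate (lat, lng).
    When the device is connected and that position is known, it replaces the
    sensor position. Returns {user: (mean_lat, mean_lng)}.
    """
    ap_positions = ap_positions or {}
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # user -> [lat sum, lng sum, n]
    for user, access_point, lat, lng in rows:
        if access_point in ap_positions:
            lat, lng = ap_positions[access_point]
        acc = sums[user]
        acc[0] += lat
        acc[1] += lng
        acc[2] += 1
    return {user: (s[0] / s[2], s[1] / s[2]) for user, s in sums.items()}


observations = [
    ("u1", None, 1.0, 1.0),   # not connected: sensor position is used
    ("u1", "ap1", 1.0, 1.0),  # connected to a known AP: AP position is used
]
centroids = user_centroids(observations, {"ap1": (3.0, 5.0)})
```

Because the second observation is relocated to the (made-up) access point position, the centroid moves away from the sensor, which is the smoothing effect the text attributes to this improvement.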

Figure 11 shows heatmaps of three different months. Figure 11(a) is June, a month belonging to the second semester of the academic year and the month when the final exams take place. Figure 11(b) is August, during summer holidays, when there are no classes or exams and most of the professors, researchers, and staff are on vacation. Figure 11(c) represents September, the start of the semester.

Figure 11(b) confirms one of the facts extracted in the temporal analysis: on holidays, the number of users falls and most of the users present in the school keep studying at the library (the warmest point is located over the library and is much smaller than in other months).

Both Figures 11(a) and 11(c) show that building A is the area with the highest concentration of users. Even so, there are differences between the two figures. In Figure 11(a), the warmest area in the map is over the library. This is explained by the final exam period of June. The same observation can be repeated in other periods of time to discover when the students have exams. In Figure 11(c), the hot spots are on the classrooms. September is the beginning of the school year, when class attendance is greatest. Thus, this observation is an indicator of the level of students’ class attendance.

The same type of analysis can be done with shorter time frames to obtain more detailed behaviors. Figure 12 shows the centroids heatmap grouped by hours on September 2nd.

A detailed analysis by hours like this allows observing the users’ movement throughout the day. The first row of maps, in Figures 12(a), 12(b), 12(c), and 12(d), shows the evolution of users’ centroids at lunch time, between 12:00 and 16:00. In this transition, it can be seen that there are hot spots on the classes and library at the beginning. On the next map, those centroids move to the cantina (see Figure 2), and in the last map they return to their original positions. The second row of maps in Figure 12 presents the start and the end of the activity time in the school. In Figures 12(e) and 12(f), it can be observed how the first users of the day go directly to classes. By contrast, Figures 12(g) and 12(h) reveal that users tend to be at the library at the end of the day.

Finally, Table 5 presents the count of the different users detected by each sensor throughout the year. Recalling that the total number of users obtained in Section 4.1 is 4653, the data in the table reveal that, over the year, most users have been seen by every sensor at some time. The two exceptions are the sensor of building C and the sensor of the laboratories of building B. These sensors cover the professors’ offices and research laboratories, so it is unusual for students to stay in those areas.

Table 5: Sensor annual statistics.

Sensor       Number of rows   Number of users
Library      14166793         4674
Entr D       6818165          4621
Std Tables   5625974          4670
Entr A       4736237          4620
Entr A Sec   2352019          4579
Entr B Sec   2258618          4530
Entr B Lab   2229560          3630
Entr C       1469026          3093
Entr B       1248462          4127

Figure 12: Hourly centroids heatmaps of 2nd September: (a) 12:00 to 13:00; (b) 13:00 to 14:00; (c) 14:00 to 15:00; (d) 15:00 to 16:00; (e) 8:00 to 9:00; (f) 9:00 to 10:00; (g) 19:00 to 20:00; (h) 20:00 to 21:00.

4.4. Behavior Analysis. The third set of experiments we performed deals with the behaviors that each user follows throughout a single day. To obtain them, we grouped the data using a user-day key. For each key, a vector of 24 positions, one per hour, is created. In each position of this vector, we record the sensor that detected this user most of the time during that hour. This vector therefore represents the route that the user followed throughout that day, hour by hour. Once the behavior vectors are obtained, the information of the day and the user is discarded to make a count of the most repeated behavior vectors. The dimensionality of these vectors makes the number of possible behaviors huge (theoretically $24^{10} \approx 6 \times 10^{13}$, or $14^{10} \approx 289$K millions using only the activity hours), but it is known that users behave similarly, so a much lower number of behaviors can be expected, even though it will still be a high number.
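
The construction of the behavior vectors and the count of the most repeated ones can be sketched as below. This is a minimal illustration, not the authors' implementation: the function names `behavior_vectors` and `most_repeated`, the row layout, and the use of `None` for the empty category are assumptions, as is deriving day and hour directly from a minute-based timestamp.

```python
from collections import Counter, defaultdict

MINUTES_PER_HOUR = 60
MINUTES_PER_DAY = 24 * MINUTES_PER_HOUR


def behavior_vectors(rows):
    """rows: iterable of (timestamp_minutes, user, sensor_name) tuples.

    Returns {(user, day): tuple of 24 entries}, each entry being the sensor
    with the most detections of that user in that hour, or None if unseen.
    """
    counts = defaultdict(Counter)  # (user, day, hour) -> detections per sensor
    for timestamp, user, sensor in rows:
        day, minute_of_day = divmod(timestamp, MINUTES_PER_DAY)
        counts[(user, day, minute_of_day // MINUTES_PER_HOUR)][sensor] += 1

    vectors = defaultdict(lambda: [None] * 24)
    for (user, day, hour), sensors in counts.items():
        vectors[(user, day)][hour] = sensors.most_common(1)[0][0]
    return {key: tuple(vec) for key, vec in vectors.items()}


def most_repeated(vectors, top=20):
    """Drop the user and day, then rank behavior vectors by frequency."""
    return Counter(vectors.values()).most_common(top)


log = [
    (9 * 60 + 0, "u1", "Library"), (9 * 60 + 1, "u1", "Library"),
    (9 * 60 + 2, "u1", "Entr A"),   # minority sensor within the same hour
    (9 * 60 + 5, "u2", "Library"),
]
vectors = behavior_vectors(log)
ranking = most_repeated(vectors)
```

In this toy log both users end up with the same vector (library between 9:00 and 10:00, nothing else), so the ranking contains a single behavior with a count of 2.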

Table 6 presents the 20 most repeated behaviors throughout the year. A total of 285K behaviors are detected, of which 139K are unique. The 500 most frequent behaviors account for 25% of the total behaviors. Table 6 shows only the part corresponding to the activity hours of the school, which, as observed in the temporal analysis, span from 8:00 am to 10:00 pm. As can be seen, the majority of the most repeated behaviors are periods of between two and five hours of stay in the library. It is necessary to expand the analysis to the top 20 to observe the class attendance behaviors. It is easy to appreciate that the different behaviors usually take place in the morning or in the afternoon, with lunchtime from 1:00 pm to 3:00 pm in between, which means that the majority of users go either in the morning or in the afternoon, but they do not spend all day at school.

Another quite obvious observation is that many of thebehaviors obtained are very similar among them (eg going

Wireless Communications and Mobile Computing 17

Table 6 Top 20 most repeated behaviors

800 900 1000 1100 1200 1300 1400 1500 1600 1700 1800 1900 2000 2100- - - - - - - - - - - - - -

900 1000 1100 1200 1300 1400 1500 1600 1700 1800 1900 2000 2100 22001 Lib1 Lib Lib Lib Lib2 Lib Lib Lib Lib3 EntA2

4 Lib Lib Lib5 Lib Lib Lib Lib6 Lib7 EntA8 Lib Lib Lib Lib Lib Lib9 Std3 Std Std Std Std Std Std Std Std Std Std Std Std Std10 EntA11 EntA12 Lib Lib Lib Lib Lib13 Lib Lib14 Lib Lib15 EntA16 EntA EntA17 EntA18 Lib Lib19 SecA4 SecA SecA SecA20 EntA1 Lib = Library2 EntA = Entr A3 SecA = Entr A Sec4 Std = Std Tables

to the library from 9:00 am to 2:00 pm or going to the library from 9:00 am to 1:00 pm). In order to lower this redundancy, we performed a clustering procedure on them. Each behavior can be understood as a categorical vector of length 24, in which the categories correspond to the 9 possible sensors of the data set plus the empty category. The chosen clustering algorithm is Proximus [70], due to its simplicity, efficiency, scalability, and the reliability of its results. The algorithm works with binary vectors and creates clusters based on the Hamming distance (the number of bits that differ between two binary vectors). A vector is chosen as the center of the cluster, and other vectors whose distance to the center is smaller than the maximum cluster radius are added to that cluster.
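The two ingredients named above, the Hamming distance and the radius test around a cluster center, can be sketched as follows. This is a deliberately simplified greedy grouping written for illustration only: it is not the Proximus algorithm of [70], and the function names and the choice of the first unassigned vector as center are assumptions of this sketch.

```python
def hamming(u, v):
    """Number of positions at which two equal-length binary vectors differ."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(a != b for a, b in zip(u, v))


def group_by_radius(vectors, max_radius):
    """Greedy grouping: a vector joins the first cluster whose center is
    closer than max_radius; otherwise it becomes the center of a new cluster."""
    clusters = []  # list of (center, members) pairs
    for vec in vectors:
        for center, members in clusters:
            if hamming(center, vec) < max_radius:
                members.append(vec)
                break
        else:
            clusters.append((vec, [vec]))
    return clusters


# Toy binary vectors (hypothetical data).
vectors = [(1, 1, 0, 0), (1, 1, 1, 0), (0, 0, 1, 1), (0, 0, 0, 1)]
clusters = group_by_radius(vectors, max_radius=2)
```

In this toy run the first two vectors differ in one bit and share a cluster, and the same holds for the last two.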

The behaviors are expressed as categorical vectors, so it is necessary to transform them into binary vectors to be used in Proximus. The transformation shown in Equation (1) is proposed. This transformation is simple and also reversible, which allows for the recovery of the original behaviors after clustering.

$$
B = \{h_0, \ldots, h_{23}\}, \qquad h_i \in \{0, a_1, \ldots, a_9\}
$$

$$
\vec{B} = \{\delta_{0,a_1}, \ldots, \delta_{0,a_9}, \; \ldots, \; \delta_{23,a_1}, \ldots, \delta_{23,a_9}\}
$$

$$
\delta_{i,a_j} =
\begin{cases}
0 & \text{if } h_i \neq a_j \\
1 & \text{if } h_i = a_j
\end{cases}
\tag{1}
$$

where $a_1, \ldots, a_9$ are the nine sensors, $B$ is the behavior vector described above, and $\vec{B}$ is the transformed vector of length 24 × 9 = 216, made up of 24 subgroups of 9 samples, one subgroup per hour. Every sample in a subgroup is 0, except for the one at the index of the sensor that is active during that hour, which is set to 1.
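As an illustration of the transformation and of its reversibility, the following minimal sketch encodes a 24-hour categorical behavior into a 216-bit vector and decodes it back. The sensor identifiers `a1` to `a9`, the use of `0` for the empty category, and the function names are assumptions made for this example.

```python
SENSORS = [f"a{j}" for j in range(1, 10)]  # placeholder names for the nine sensors
EMPTY = 0                                  # empty category (no sensor seen that hour)


def encode(behavior):
    """One-hot encode a 24-slot categorical behavior into 24 * 9 = 216 bits."""
    if len(behavior) != 24:
        raise ValueError("a behavior must have exactly 24 hourly slots")
    bits = []
    for h in behavior:
        bits.extend(1 if h == s else 0 for s in SENSORS)
    return bits


def decode(bits):
    """Recover the categorical behavior from its 216-bit representation."""
    n = len(SENSORS)
    behavior = []
    for i in range(0, len(bits), n):
        block = bits[i:i + n]
        behavior.append(SENSORS[block.index(1)] if 1 in block else EMPTY)
    return behavior


# Toy behavior: seen by sensor a3 from 9:00 to 11:00, empty otherwise.
behavior = [EMPTY] * 24
behavior[9] = behavior[10] = "a3"
encoded = encode(behavior)
```

An hour with no active sensor maps to nine zeros, so the empty category needs no bit of its own and the decoding remains unambiguous.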

Table 7 shows the results of applying the Proximus clustering to the behaviors found. The first column is the center of the cluster, the second one is the number of distinct behaviors that are grouped in that cluster, and the third is the total number of recorded behaviors that fall within that cluster. The results are presented ordered by the number of behaviors included in the cluster. A total of 45707 clusters are obtained. The first 300 group 50% of the behaviors recorded throughout the year. The most important cluster behaviors in Table 7 can be understood this way:


Table 7: Behavior clusters.

No.  Cluster center                                        Cluster components  Behaviors count
1    Entr A Sec from 9:00 to 14:00                         415                 4963
2    Entr B Sec from 10:00 to 13:00                        458                 3756
3    Entr B Sec from 15:00 to 19:00                        471                 2924
4    Library from 9:00 to 13:00                            256                 2870
5    Library from 10:00 to 12:00                           176                 2396
6    Library from 18:00 to 21:00                           244                 2231
7    Entr A 13:00                                          787                 1979
8    Library 12:00                                         563                 1894
9    Library from 18:00 to 19:00 and from 20:00 to 21:00   175                 1853
10   Entr B Lab 11:00 to 20:00                             568                 1831

(i) Clusters 1, 2, and 3: students attending classes. These sensors cover the main classrooms in the school, and the intervals match the class schedule.

(ii) Clusters 4, 5, 6, and 9: students at the library.

(iii) Cluster 7: users who are usually outside the sensors' coverage areas, arriving or leaving.

(iv) Cluster 10: laboratory equipment and professors. This sensor covers some of the professors' offices and laboratories, which usually have some laboratory equipment connected during work hours.

5. Validation of Occupancy Estimation Based on Wi-Fi Tracking

Although, as already pointed out in Section 4.3, it was tested at the very first stage of the deployment that the Wi-Fi sensors properly locate and track well-known MAC addresses, it was still necessary to validate the accuracy of the system for estimating occupancy (as it can be distorted by the aforementioned fact that a single user can carry several devices connected to Wi-Fi networks). As no ground-truth data was available for the full set of buildings and only some data was found for the library, we centered our validation efforts on comparing our data with the available ones: if we can trust our results in that area, then we can extend our trust to the rest of the areas, for which no well-known data are available.

The library of ETSIT has 408 study sites and it offers a web service to check the number of available seats at a given moment [71]. This system is based on two sources of information: a person who counts the empty seats every opening hour from Friday to Sunday, and a video camera located at the main entrance of the library that counts the number of people entering or leaving at 15-minute intervals. The human system provides a ground truth about the number of occupied positions, but this measure is very different from the actual number of people in the library, since a common situation, especially during examination terms, is that students place their study material at the seat to reserve it while they are not in the library. This situation is a problem for the library staff, and therefore they installed the video camera system to count the students' entrances and exits.

This system generates an estimation of the number of people in the library by adding to the previous measure the number of people detected entering and subtracting the number of people detected leaving. The system is not perfect and, in most cases, it carries an accumulated error that grows in the estimation of the number of people in the library. The total error can be calculated clearly at the end of the day, when the library closes and the number of people inside is supposed to be zero. In summer, there is a situation that aggravates this error and, consequently, the measure achieved by this system: due to the rise in temperature, the back door of the library is opened to improve ventilation, which allows students to exit through it, although they must continue entering through the main door. The camera does not count students leaving through this back entry.
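The running estimate described above, and the way its drift shows up at closing time, can be illustrated with a minimal sketch. The function names and the toy counts are assumptions made for this example; the actual camera software is not described in the text.

```python
def running_occupancy(entries, exits, initial=0):
    """Per-interval estimate: previous estimate + people entering - people leaving."""
    estimate = initial
    series = []
    for entered, left in zip(entries, exits):
        estimate += entered - left
        series.append(estimate)
    return series


def end_of_day_error(series):
    """Residual at closing time, assuming the true occupancy is then zero."""
    return series[-1] if series else 0


# Toy 15-minute counts (hypothetical): 4 exits go uncounted,
# e.g. people leaving through a door the camera does not cover.
entries = [10, 5, 0, 0]
exits = [0, 3, 6, 2]
estimate = running_occupancy(entries, exits)
residual = end_of_day_error(estimate)
```

Because each interval builds on the previous estimate, any missed exit is carried forward for the rest of the day, which is why the error accumulates rather than averaging out.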

The library staff provided us with the data collected by the two systems (human and camera) between June 5th and 30th. These measurements can be compared with those obtained by the Wi-Fi sensor installed at the library to validate them. For this test, all the data collected by the sensor is used, without applying the filtering of the MACs of sporadic users explained before.

Figure 13 shows the data collected by the three systems on Sunday, June 5th. This is the first day with data from the three sources. Other days in which these three sources are present show the same trends. It is clear that there is a divergence between the human system observation and the rest of the data. The graph of the human system shows that the number of occupied seats increases in the first hours up to the maximum and remains steady until the end of the day, without being affected by the behavior of the users at lunchtime. However, this effect is reflected in the camera system and Wi-Fi tracking measurements. We observed that, in general, the number of people accounted for by the camera system is below the Wi-Fi tracking system measure, although the proportion is maintained over time. Finally, the figure shows the cumulative error effect of the camera system, which at the end of the day still reports 65 people in the library.

To better study the relation between the camera system and the Wi-Fi tracking system, we generated a detailed visualization that allows us to observe the data of the whole month in a single figure. Figure 14 is composed of 3 subfigures; each of them is a matrix of colored cells: the rows represent a full


Figure 13: Measure of library occupation using Wi-Fi, camera, and human systems. (Axes: hour of day, 10:00 to 22:00, against number of users, 0 to 400; series: Camera, Human, Wi-Fi.)

Figure 14: Measures every 15 minutes of library occupation during June: (a) Wi-Fi records; (b) camera records; (c) difference between Wi-Fi and camera. (Horizontal axis of each panel: hour of day, 00 to 23.)

day and the columns are divisions of 15 minutes. The color intensity is proportional to the number of users measured in each interval, being more intense when more users are detected. Figure 14(a) shows the measurements of the Wi-Fi sensor. The behavior is the one observed in previous sections of this document. Figure 14(b) shows the measurements of the camera system. During the first half of the day, the same trend as in Figure 14(a) can be observed, but in the afternoon and at night the accumulated error becomes noticeable. This error reaches a maximum of 378 people, with an average of


139 people at the end of the day, while the Wi-Fi tracking system registers a maximum of 10 and an average of 7 at the end of the day (these are machines that are still turned on when the library is empty of people). Finally, Figure 14(c) shows the difference between the two previous ones, using green when the error is positive (the Wi-Fi tracking system registers a higher value than the camera system) and red otherwise. This figure validates the measurements obtained by the Wi-Fi tracking system, since the difference with the measure of the camera system in the first half of the day is small (an average of 27 people, counting the data from 8:00 to 16:00), whereas in the afternoon the error accumulated by the camera system causes the average difference to grow to 83 people between 16:00 and 22:00.
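The comparison between the two series over a time window can be sketched as below. The text does not state whether the reported averages are signed or absolute, so the use of the mean absolute difference here is an assumption of this sketch, as are the function name and the toy values.

```python
def mean_abs_difference(wifi, camera, start_hour, end_hour, slots_per_hour=4):
    """Average |wifi - camera| over the 15-minute slots in [start_hour, end_hour)."""
    lo = start_hour * slots_per_hour
    hi = end_hour * slots_per_hour
    diffs = [abs(w - c) for w, c in zip(wifi[lo:hi], camera[lo:hi])]
    if not diffs:
        raise ValueError("empty time window")
    return sum(diffs) / len(diffs)


# Toy day with 96 quarter-hour slots (hypothetical values).
wifi = [0] * 96
camera = [0] * 96
for slot in range(8 * 4, 16 * 4):    # 8:00 to 16:00: both systems close together
    wifi[slot], camera[slot] = 100, 98
for slot in range(16 * 4, 22 * 4):   # 16:00 to 22:00: camera drifts upwards
    wifi[slot], camera[slot] = 50, 60

morning = mean_abs_difference(wifi, camera, 8, 16)
afternoon = mean_abs_difference(wifi, camera, 16, 22)
```

Splitting the day at 16:00 mirrors the two windows used in the text and makes the growth of the camera drift visible as a larger afternoon average.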

The Wi-Fi sensor of the library was not placed for the specific purpose of counting the number of people in the library and, in consequence, its coverage area can detect devices that are outside the library. This explains why, in some cases, it registers a measurement greater than that of the camera system. In any case, the Wi-Fi sensor, with its limitations, registers a measure very similar to that of the camera-based people counting system, even avoiding the cumulative error of that system, while being much cheaper (tens of euros against thousands of euros) and less intrusive than a video camera. Currently, since the camera system is already installed, the Wi-Fi sensor measures can be used as a maximum, which would limit the error made by this system. In summary, the usage of a Wi-Fi sensor to count people in the library provides accurate results despite the potential errors induced by the sensing period, collisions, the ratio between users and MACs seen, and the data processing performed. These results invite us to trust that the results of our analyses for the rest of the areas on campus are also correct.

6. Discussion and Conclusions

In the execution of this case study, we have learned some lessons about the limitations we faced that are worth taking into account for similar systems and analyses. First, there is the key issue of the sensor deployment: it is not just a question of density (number of sensors per unit of surface), but of adapting its design to the topology of the place to be monitored. So, for example, it is crucial to have sensors not only at the entrances and exits of buildings but also inside them, as in the considered deployment, where there are sensors placed at the entrances and exits of the buildings but also in other especially relevant places, such as the library or the work-in-group area.

A complete coverage of indoor devices, without collisions and for the complete timespan of the stay indoors, would require a large deployment with many sensors able to cover all corners. This is not suitable in this context; instead, we relied on 9 sensors, some at gates and others in large rooms, which proved to be enough to check whether a device is within reach of any of them. However, some assumptions had to be made: we considered as valid behaviors only those that spent a significant amount of time once they had been seen (to remove transient behaviors and those of people walking out of the buildings); also, we discarded behaviors that did not appear enough times in the yearly timespan. On the other hand, the topology of these buildings did not impose severe restrictions on the sensors' coverage. In fact, we found several devices seen by different sensors at once. In summary, we can conclude that the finer the spatial granularity (more sensors), the larger the set of different behaviors, so using raw data from the sensors would lead to an explosion of states that would render meaningless results.
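The two filtering assumptions mentioned above (a minimum time spent and a minimum number of repetitions over the year) can be sketched as follows. The thresholds `min_hours` and `min_occurrences`, the helper `stay`, and the data layout are hypothetical; the text does not give the values actually used.

```python
from collections import Counter


def stay(sensor, start, end):
    """Build a 24-slot behavior with `sensor` in hours [start, end), None elsewhere."""
    return tuple(sensor if start <= h < end else None for h in range(24))


def filter_behaviors(behaviors, min_hours=2, min_occurrences=3):
    """Keep behaviors that occupy at least min_hours slots and that
    were observed at least min_occurrences times."""
    counts = Counter(behaviors)
    return {
        b: n
        for b, n in counts.items()
        if sum(1 for h in b if h is not None) >= min_hours and n >= min_occurrences
    }


# Toy observations (hypothetical sensor labels).
observed = (
    [stay("Lib", 9, 13)] * 4     # long and frequent: kept
    + [stay("EntA", 13, 14)] * 10  # frequent but transient: dropped
    + [stay("Lib", 18, 21)] * 1    # long but rare: dropped
)
kept = filter_behaviors(observed)
```

Both conditions are applied together, so a behavior must be neither transient nor rare to survive the filter.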

The gathered data is another key issue: some works in the literature obtained signal strength measurements every few seconds, allowing them to identify indoor trajectories. This was not our case and, in fact, this proved to be very limiting. Thus, the precision of our analysis is limited by the regions covered by each sensor, which prevents us from analyzing any kind of movement within regions. Nevertheless, as has been seen, relevant results can still be obtained without the increase in energy consumption and the risk of flooding the school communications network that the other approach would entail.

As regards the analysis of data to identify users' behaviors, we have to indicate that a long observation period is a must. Obtaining data for a complete work cycle (in this case, a full school year) helps in discovering common behaviors that happen in a university. Using long observation times can also help in detecting erroneous or atypical operation of the sensors, as happened in the first months of the year in this case. Getting even larger observation datasets would reinforce the behaviors we have found (e.g., seasonality), but the chance of discovering new ones is negligible, because the expected behaviors on the campus will be periodic with the school year.

The usage of big data platforms for analysis, while not mandatory, eases the management of large datasets and the execution of iterative studies on the data. The preprocessing work carried out allowed us to clean and filter our data. For example, some discovered behaviors are too regular and extended over time, which suggests that they correspond to servers or machines, which we could then filter out.

The temporal analysis has revealed some behaviors that are maintained throughout the year and others that occur occasionally. The work has focused on the study of behaviors that are repeated and maintained throughout the year. A closer view, such as the analysis per hour, shows the hours of activity, nocturnal patterns, or movements at mealtime. In the daily view, the difference in activity between working days and weekends or holidays is clearly seen. Visualization has been a key technique for detecting and understanding these temporal patterns.

The spatial analysis revealed that a collision problem occurs in user detection, but its effect was mitigated by the use of innovative algorithms and techniques, such as the calculation of centroids and the combination of sensor information with external sources of information (i.e., semantic trajectories), so that a much more precise positioning of the users is achieved than with the exclusive use of the sensors. The visualization of centroids fostered the visual understanding of complex data, such as the position of each user over a month, and the superposition of the centroids in a heatmap revealed the movements of groups of people and the occupation of the different zones.


Furthermore, we have found dominant users' behaviors as the most repeated behaviors registered by the sensors throughout the year. The number of behaviors found was huge, but the application of the Proximus clustering algorithm reduced this number to a manageable amount. Then, the study of the obtained clusters revealed that the most frequent behaviors coincide with what can be expected from a college building: researchers working in laboratories and students attending classes or studying in the library.

Although some of the discovered behaviors and patterns can be seen as common knowledge, it is worth stressing that they do bring value, since they represent numerical evidence that supports decision making (e.g., someone may think that the Wi-Fi access in a given area does not work properly because it is always overcrowded, but numerical evidence is needed to appropriately justify the investment of increasing the number of APs of the corporate WLAN in that area to improve the service). In addition, such well-known patterns, when obtained automatically by processing the available data, become baseline models which can be used to detect anomalies or atypical situations, as is common practice in unsupervised machine learning.

Lastly, we have validated the accuracy of using Wi-Fi tracking for occupancy estimation, comparing it with the library staff's manual counting (considered as ground truth) and with a video camera system installed at the library main entrance. As a main finding, Wi-Fi tracking has proved to be more accurate than the video camera system, in addition to being far cheaper. As a matter of fact, the library staff currently prefers the occupancy estimation based on Wi-Fi tracking to the one based on the video camera system. Nevertheless, the accuracy of the Wi-Fi tracking system can be further improved by correlating MAC addresses detected in the same places over sufficiently long periods of time and counting only one of them, thus mitigating the issue that a single person can carry several devices connected to Wi-Fi networks.
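The improvement suggested above, counting co-located MAC addresses only once, could take many forms. The sketch below shows one simple heuristic under stated assumptions: devices whose traces (sets of hour and sensor pairs) are identical and span at least `min_slots` slots are treated as belonging to one person. The function name, the threshold, and the MAC labels are hypothetical, and this is not a method described in the paper.

```python
from collections import defaultdict


def estimate_people(traces, min_slots=4):
    """Count devices, collapsing those that share an identical trace of at
    least min_slots (hour, sensor) observations into a single person."""
    merged = defaultdict(list)  # identical long trace -> MACs sharing it
    people = 0
    for mac, trace in traces.items():
        key = frozenset(trace)
        if len(key) >= min_slots:
            merged[key].append(mac)
        else:
            people += 1  # too short to correlate reliably: count on its own
    return people + len(merged)


# Toy traces (hypothetical MAC labels): "aa" and "bb" move together all morning.
morning = [(9, "Lib"), (10, "Lib"), (11, "Lib"), (12, "Lib")]
traces = {
    "aa": morning,
    "bb": list(morning),
    "cc": [(9, "Lib")],
    "dd": [(9, "Lib")],
}
people = estimate_people(traces)
```

Short traces are deliberately left unmerged, since two strangers can easily coincide for a single hour; the minimum length plays the role of the "sufficiently long period" mentioned in the text.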

Besides this, the Wi-Fi tracking system is currently used by the library staff to perform more sophisticated studies, such as figuring out the percentage of students from the different schools of the university who come to study at the library of the Telecommunications Engineering School during the weekend. Figure 15 shows the results of such an analysis, which represents a token of how this kind of IoT system can help solve real-life problems and improve the operation of already running services.

To summarize, we have studied a one-year Wi-Fi tracking dataset obtained from a reduced set of low-cost sensors with limited capabilities, deployed on an actual university campus that receives around 4000 people every day. We have processed the data in order to identify traces of Wi-Fi-enabled mobile devices, which are identified as people moving in the campus buildings, and then we have extracted people's stays, movements, and common behaviors. The obtained results represent numerical evidence that illustrates how a low-cost Wi-Fi tracking system can be used in real-life conditions to improve or optimize the operation of the monitored premises. These results can allow appropriately dimensioning the WLAN infrastructure or the canteen personnel, or detecting

Figure 15: Users who only go to the library of the Telecommunication School on weekends, classified by their school. (Categories: CIVIL, INDUS, NAVAL, SPORT, ARQUI, FORES, EDIF, AERO, TELECOM; values shown, in percent: 13.96, 18.06, 2.64, 0.90, 9.51, 3.82, 4.58, 17.71, 28.82.)

anomalous situations in real time. Furthermore, the data from the people flow monitoring system is currently being used together with the data from the environmental monitoring system to try to reduce the environmental footprint of the school [72]. In addition, the Wi-Fi tracking system is actually used by the library staff in their day-to-day activity, which illustrates the value that this kind of IoT infrastructure can bring to real-life problems and services.

Data Availability

The dataset with the Wi-Fi sensor records used to support the findings of this study is available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work has been partly funded by Universidad Politécnica de Madrid through the project RES2+U (Responsables, Sostenibles y Universitarios) (http://blogs.upm.es/res2masu). The work performed by José M. Navarro was funded by the Ministerio de Educación, Cultura y Deporte de España under Grant no. FPU 14/03209.

References

[1] "2018 Revision of World Urbanization Prospects - Multimedia Library - United Nations Department of Economic and Social Affairs," https://www.un.org/development/desa/publications/2018-revision-of-world-urbanization-prospects.html.

[2] M. Pallot, "Engaging Users into Research and Innovation: The Living Lab Approach as a User Centred Open Innovation Ecosystem," https://web.archive.org/web/20120509081658/http://www.cwe-projects.eu/pub/bscw.cgi/1760838?id=715404_1760838.
[3] E. Almirall and J. Wareham, "Living Labs: arbiters of mid- and ground-level innovation," Technology Analysis & Strategic Management, vol. 23, no. 1, pp. 87–102, 2011.
[4] "TU Delft Green office," https://www.tudelft.nl/sustainability.
[5] "UBC Campus as a living laboratory," https://sustain.ubc.ca/our-commitment/campus-living-lab.
[6] "Harvard Sustainability," https://green.harvard.edu.
[7] "UPM City of the Future initiative," http://blogs.upm.es/cityofthefuture-upm/en/initiative.
[8] M. Alvarez-Campana, G. Lopez, E. Vazquez, V. A. Villagra, and J. Berrocal, "Smart CEI Moncloa: An IoT-based platform for people flow and environmental monitoring on a Smart University Campus," Sensors, vol. 17, no. 12, 2017.
[9] M. B. Kjærgaard, H. Blunck, T. Godsk, T. Toftkjær, D. L. Christensen, and K. Grønbæk, "Indoor positioning using GPS revisited," Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Preface, vol. 6030, pp. 38–56, 2010.
[10] A. Belmonte-Hernandez, G. Hernandez-Penaloza, F. Alvarez, and G. Conti, "Adaptive Fingerprinting in Multi-Sensor Fusion for Accurate Indoor Tracking," IEEE Sensors Journal, vol. 17, no. 15, pp. 4983–4998, 2017.
[11] M. S. Kristoffersen, J. V. Dueholm, R. Gade, and T. B. Moeslund, "Pedestrian counting with occlusion handling using stereo thermal cameras," Sensors, vol. 16, no. 1, 2016.
[12] L. Zheng, X. Ruan, Y. Chen, and M. Huang, "Shadow removal for pedestrian detection and tracking in indoor environments," Multimedia Tools and Applications, vol. 76, no. 18, pp. 18321–18337, 2017.
[13] T. Li, H. Chang, M. Wang, B. Ni, R. Hong, and S. Yan, "Crowded scene analysis: a survey," IEEE Transactions on Circuits and Systems for Video Technology, vol. 25, no. 3, pp. 367–386, 2015.
[14] F. Adib and D. Katabi, "See through walls with WiFi," in Proceedings of the Annual Conference of the ACM Special Interest Group on Data Communication on the Applications, Technologies, Architectures, and Protocols for Computer Communication, ACM SIGCOMM 2013, pp. 75–86, China, August 2013.
[15] Z. Kabelac, D. Katabi, and R. C. Miller, "3D Tracking via Body Radio Reflections," in Proceedings of the 11th USENIX Conference on Networked Systems Design and Implementation, pp. 317–329, 2014.
[16] F. Adib, Z. Kabelac, and D. Katabi, "Multi-person localization via RF body reflections," in Proceedings of the 12th USENIX Symposium on Networked Systems Design and Implementation, NSDI 2015, pp. 279–292, USA, May 2015.
[17] Z.-A. Deng, G. Wang, D. Qin, Z. Na, Y. Cui, and J. Chen, "Continuous indoor positioning fusing WiFi, smartphone sensors and landmarks," Sensors, vol. 16, no. 9, 2016.
[18] R. A. Becker, R. Caceres, K. Hanson et al., "A tale of one city: Using cellular network data for urban planning," IEEE Pervasive Computing, vol. 10, no. 4, pp. 18–26, 2011.
[19] J. E. Mallah, F. Carrino, O. A. Khaled, and E. Mugellini, "Crowd monitoring: critical situations prevention using smartphones and group detection," Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Preface, vol. 9189, pp. 496–505, 2015.
[20] J. Kuang, X. Niu, and X. Chen, "Robust Pedestrian Dead Reckoning Based on MEMS-IMU for Smartphones," Sensors, vol. 18, no. 5, p. 1391, 2018.
[21] A. Kurkcu and K. Ozbay, "Estimating Pedestrian Densities, Wait Times, and Flows with Wi-Fi and Bluetooth Sensors," Transportation Research Record, vol. 2644, no. 1, pp. 72–82, 2017.
[22] A. Kotanen, M. Hannikainen, H. Leppakoski, and T. Hamalainen, "Positioning with IEEE 802.11b wireless LAN," in Proceedings of the 14th IEEE 2003 International Symposium on Personal, Indoor and Mobile Radio Communications, vol. 3, pp. 2218–2222, Beijing, China, 2003.
[23] W. Ho, A. Smailagic, D. P. Siewiorek, and C. Faloutsos, "An adaptive two-phase approach to WiFi location sensing," in Proceedings of the 4th Annual IEEE International Conference on Pervasive Computing and Communications Workshops, PerCom Workshops 2006, pp. 452–456, Italy, March 2006.
[24] F. Evennou and F. Marx, "Advanced integration of WiFi and inertial navigation systems for indoor mobile positioning," EURASIP Journal on Applied Signal Processing, vol. 2006, Article ID 86706, 11 pages, 2006.
[25] J. Rekimoto, T. Miyaki, and T. Ishizawa, "LifeTag: WiFi-based continuous location logging for life pattern analysis," LNCS, vol. 4718, pp. 35–49, 2007.
[26] J. A. Besada, A. M. Bernardos, P. Tarrío, and J. R. Casar, "Analysis of tracking methods for wireless indoor localization," in Proceedings of the 2nd International Symposium on Wireless Pervasive Computing (ISWPC '07), pp. 492–497, February 2007.
[27] O. Woodman and R. Harle, "Pedestrian localisation for indoor environments," in Proceedings of the 10th International Conference on Ubiquitous Computing (UbiComp '08), pp. 114–123, Seoul, Republic of Korea, September 2008.
[28] F. Aloul, A. Sagahyroon, A. Al-Shami, I. Al-Midfa, and R. Moutassem, "Using mobiles for on campus location tracking," in Proceedings of the 7th International Conference on Advances in Mobile Computing and Multimedia, MoMM2009, pp. 231–235, Malaysia, December 2009.
[29] S. Woo, S. Jeong, E. Mok et al., "Application of WiFi-based indoor positioning system for labor tracking at construction sites: A case study in Guangzhou MTR," Automation in Construction, vol. 20, no. 1, pp. 3–13, 2011.
[30] N. Le Dortz, F. Gain, and P. Zetterberg, "WiFi fingerprint indoor positioning system using probability distribution comparison," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '12), pp. 2301–2304, March 2012.
[31] Z. Chen, H. Zou, H. Jiang, Q. Zhu, Y. C. Soh, and L. Xie, "Fusion of WiFi, smartphone sensors and landmarks using the Kalman filter for indoor localization," Sensors, vol. 15, no. 1, pp. 715–732, 2015.
[32] Y. Shu, C. Bo, G. Shen, C. Zhao, L. Li, and F. Zhao, "Magicol: indoor localization using pervasive magnetic field and opportunistic WiFi sensing," IEEE Journal on Selected Areas in Communications, vol. 33, no. 7, pp. 1443–1457, 2015.
[33] A. Danalet, B. Farooq, and M. Bierlaire, "A Bayesian approach to detect pedestrian destination-sequences from WiFi signatures," Transportation Research Part C: Emerging Technologies, vol. 44, pp. 146–170, 2014.


[34] O. Czogalla and S. Naumann, "Pedestrian indoor navigation for complex public facilities," in Proceedings of the 2016 International Conference on Indoor Positioning and Indoor Navigation, IPIN 2016, pp. 1–8, Spain, October 2016.
[35] L. Schauer, P. Marcus, and C. Linnhoff-Popien, "Towards feasible Wi-Fi based indoor tracking systems using probabilistic methods," in Proceedings of the 2016 International Conference on Indoor Positioning and Indoor Navigation, IPIN 2016, pp. 1–8, Spain, October 2016.
[36] F. Meneses and A. Moreira, "Large scale movement analysis from WiFi based location data," in Proceedings of the 2012 International Conference on Indoor Positioning and Indoor Navigation, IPIN 2012, Australia, November 2012.
[37] B. Bonne, A. Barzan, P. Quax, and W. Lamotte, "WiFiPi: involuntary tracking of visitors at mass events," in Proceedings of the IEEE 14th International Symposium on a World of Wireless, Mobile and Multimedia Networks (WoWMoM '13), pp. 1–6, Madrid, Spain, June 2013.
[38] M. B. Kjaergaard, M. Wirz, D. Roggen, and G. Troster, "Mobile sensing of pedestrian flocks in indoor environments using WiFi signals," in Proceedings of the 10th IEEE International Conference on Pervasive Computing and Communications (PerCom '12), pp. 95–102, Lugano, Switzerland, March 2012.
[39] S. Sendra, M. Garcia, C. Turro, and J. Lloret, "People mobility behaviour study in a university campus using WLANs," in Proceedings of the 3rd International Conference on Mobile Ubiquitous Computing, Systems, Services, and Technologies, UBICOMM 2009, pp. 124–129, Malta, October 2009.
[40] J. Scheuner, G. Mazlami, D. Schoni et al., "Probr - A Generic and Passive WiFi Tracking System," in Proceedings of the 41st IEEE Conference on Local Computer Networks, LCN 2016, pp. 495–502, UAE, November 2016.
[41] Y. Li, S. Williams, B. Moran, and A. Kealy, "Quantized RSS Based Wi-Fi Indoor Localization with Room Level Accuracy," in Proceedings of the International Global Navigation Satellite Systems 2018, 2018.
[42] A. Alessandrini, C. Gioia, F. Sermi, I. Sofos, D. Tarchi, and M. Vespe, "WiFi positioning and Big Data to monitor flows of people on a wide scale," in Proceedings of the 25th European Navigation Conference, ENC 2017, pp. 322–328, Switzerland, May 2017.
[43] L. Vu, K. Nahrstedt, S. Retika, and I. Gupta, "Joint bluetooth/wifi scanning framework for characterizing and leveraging people movement in university campus," in Proceedings of the 13th ACM International Conference on Modeling, Analysis, and Simulation of Wireless and Mobile Systems (MSWiM '10), pp. 257–265, October 2010.
[44] M. Garcia, S. Sendra, C. Turro, and J. Lloret, "User's Macro and Micro-mobility Study using WLANs in a University Campus," International Journal On Advances in Internet Technology, vol. 4, no. 1, pp. 37–46, 2011.
[45] Y. Xu, I. D. G. Groeneveld, R. Sulzer, E. Theocharous, O. T. Willems, and M. S. Tryfona, "Determine activity based on the classified identity of users by using Wi-Fi monitoring," Geomatics Synthesis Group Project Report, 2016.
[46] A. Danalet, L. Tinguely, M. D. Lapparent, and M. Bierlaire, "Location choice with longitudinal WiFi data," Journal of Choice Modelling, vol. 18, pp. 1–17, 2016.
[47] M. Zhou, K. Sui, M. Ma, Y. Zhao, D. Pei, and T. Moscibroda, "MobiCamp: A campus-wide testbed for studying mobile physical activities," in Proceedings of the 3rd International Workshop on Physical Analytics, WPA 2016, pp. 1–6, Singapore, 2016.
[48] A. Fernandez-Ares, A. M. Mora, M. G. Arenas et al., "Studying real traffic and mobility scenarios for a Smart City using a new monitoring and tracking system," Future Generation Computer Systems, vol. 76, pp. 163–179, 2017.
[49] A. E. Redondi and M. Cesana, "Building up knowledge through passive WiFi probes," Computer Communications, vol. 117, pp. 1–12, 2018.
[50] L. Huang, K. Matsuura, H. Yamane, and K. Sezaki, "Enhancing wireless location privacy using silent period," in Proceedings of the 2005 IEEE Wireless Communications and Networking Conference, WCNC 2005: Broadband Wireless for the Masses - Ready for Take-off, pp. 1187–1192, USA, March 2005.
[51] J. Martin, T. Mayberry, C. Donahue et al., "A Study of MAC Address Randomization in Mobile Devices and When it Fails," Proceedings on Privacy Enhancing Technologies, vol. 2017, no. 4, pp. 365–383, 2017.
[52] M. Vanhoef, C. Matte, M. Cunche, L. S. Cardoso, and F. Piessens, "Why MAC address randomization is not enough: an analysis of Wi-Fi network discovery mechanisms," in Proceedings of the 11th ACM on Asia Conference on Computer and Communications Security, pp. 413–424, ACM, Xi'an, China, June 2016.
[53] A. J. Ruiz-Ruiz, H. Blunck, T. S. Prentow, A. Stisen, and M. B. Kjaergaard, "Analysis methods for extracting knowledge from large-scale WiFi monitoring to inform building facility planning," in Proceedings of the 2014 12th IEEE International Conference on Pervasive Computing and Communications, PerCom 2014, pp. 130–138, Hungary, March 2014.
[54] N. Nunes, M. Ribeiro, C. Prandi, and V. Nisi, "Beanstalk - A community based passive Wi-Fi tracking system for analysing tourism dynamics," in Proceedings of the 9th ACM SIGCHI Symposium on Engineering Interactive Computing Systems, EICS 2017, pp. 93–98, Portugal, June 2017.
[55] C. Parent, N. Pelekis, Y. Theodoridis et al., "Semantic trajectories modeling and analysis," ACM Computing Surveys, vol. 45, no. 4, pp. 1–32, 2013.
[56] C. Wei, "Mining of User Behavioral Features Based on Indoor Semantic Trajectories," Boletín Técnico, ISSN 0376-723X, vol. 55, 2017.
[57] K. V. Long, D. Quang, and N. Klara, Lessons learned from bluetooth/wifi scanning deployment in university campus, Urbana, Illinois, USA, 2010.
[58] V. Radu and M. K. Marina, "HiMLoc: indoor smartphone localization via activity aware pedestrian dead reckoning with selective crowdsourced WiFi fingerprinting," in Proceedings of the International Conference on Indoor Positioning and Indoor Navigation (IPIN '13), pp. 1–10, IEEE, Montbeliard-Belfort, France, October 2013.
[59] F. Hong, Y. Zhang, Z. Zhang, M. Wei, Y. Feng, and Z. Guo, "WaP: Indoor localization and tracking using WiFi-Assisted Particle filter," in Proceedings of the 39th Annual IEEE Conference on Local Computer Networks, LCN 2014, pp. 210–217, Canada, September 2014.
[60] Y. Zeng, P. H. Pathak, and P. Mohapatra, "Analyzing shopper's behavior through WiFi signals," in Proceedings of the 2nd Workshop on Physical Analytics, WPA 2015, pp. 13–18, Italy, 2015.
[61] P. Sapiezynski, A. Stopczynski, R. Gatej, and S. Lehmann, "Tracking human mobility using WiFi signals," PLoS ONE, vol. 10, no. 7, p. e0130824, 2015.
[62] Z. Tian, Y. Jin, M. Zhou, Z. Wu, and Z. Li, "Wi-Fi/MARG Integration for Indoor Pedestrian Localization," Sensors, vol. 16, no. 12, p. 2100, 2016.


[63] B. Zhou, Q. Li, Q. Mao, and W. Tu, "A Robust Crowdsourcing-Based Indoor Localization System," Sensors, vol. 17, no. 4, p. 864, 2017.

[64] "Raspberry Pi," https://www.raspberrypi.org/.

[65] "TP-LINK Wi-Fi USB dongle datasheet," http://www.tp-link.com/us/products/details/cat-5520_TL-WN722N.html.

[66] "ISO/IEC 20922:2016 - Information technology – Message Queuing Telemetry Transport (MQTT) v3.1.1," https://www.iso.org/standard/69466.html.

[67] G. Piateski-Shapiro and W. Frawley, Knowledge Discovery in Databases, MIT Press, Cambridge, MA, USA, 1991.

[68] J. Andion Jimenez, J. M. Navarro Gonzalez, M. Alvarez-Campana Fernandez-Corredor, and J. C. Duenas Lopez, "A passive non-intrusive cheap method to identify behaviours and habits in the Campus," in Proceedings of the XIII Jornadas de Ingenieria Telematica - JITEL2017, vol. 40, no. 47, pp. 10–4995, September 2017.

[69] A. Mylinikov, "Geo project," https://www.mylnikov.org/.

[70] K. Mehmet and G. Ananth, "PROXIMUS: A framework for analyzing very high dimensional discrete-attributed datasets," in Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '03, vol. 147, no. 156, pp. 147–156, USA, August 2003.

[71] "UPM Library occupation," http://ceiboard.dit.upm.es/smart-campus/biblio.

[72] C. A. R. Inarejos, A. Rodríguez, G. Lopez, and M. Alvarez-Campana, "Analisis de la huella de carbono de la ETSIT de la UPM y propuesta de mejora basada en datos de la plataforma IoT Smart CEI Moncloa," in Proceedings of the I Congreso Iberoamericano de Ciudades Inteligentes (ICSC-CITIES 2018), 2018.



Figure 8: Daily analysis during 2016 (x axis: date, January to December; one line per sensor): (a) minutes accumulated by each sensor [×1000]; (b) unique users registered by each sensor [×100]; (c) ratio between minutes and users for each sensor [minutes/user].

The most obvious observation that can be drawn from Figure 7 is the anomalous behavior during January and February. The reason is that, as previously discussed, the deployment of the sensor network began in January 2016 and it was not fully operational until March. We keep these data in the analysis because they allow comparing an anomalous situation with a regular one; in addition, presenting a whole year is much more intuitive than presenting nine months. Nevertheless, the data collected in those months are not used to draw conclusions.

Holiday periods are clearly visible in the unique users chart as drops in the line, for example in March (Easter), during the summer holidays, or on some isolated holidays. These periods can also be seen in the ratio chart, which increases because on those dates far fewer users attend the school, but those who do usually spend long periods of time in the library. A remarkable point is October 12th, on which the ratio chart reaches its highest peak and the number of users is almost zero. That day even the library was closed, so only the security staff were in the school. The effect of weekends can also be observed in each chart: both the number of users and the accumulated minutes decrease, but the ratio increases for the same reason given for the holidays, namely that users study at the library during the weekends. Finally, from May to June a decreasing trend can be observed in both the unique users and the accumulated minutes, which go up again in mid-June for the examination period.
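The three daily series discussed above (accumulated minutes, unique users, and their ratio) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the record layout `(day, user, minute)` and the function name `daily_metrics` are assumptions, and a user seen by several sensors in the same minute is assumed to count for one minute only.

```python
from collections import defaultdict


def daily_metrics(records):
    """Aggregate per-minute detections into daily figures.

    records: iterable of (day, user, minute) tuples (hypothetical layout),
    one tuple per minute in which a user was detected.
    Returns {day: (accumulated_minutes, unique_users, minutes_per_user)}.
    """
    seen = defaultdict(set)
    for day, user, minute in records:
        # A set removes duplicate (user, minute) pairs from overlapping sensors.
        seen[day].add((user, minute))
    metrics = {}
    for day, pairs in seen.items():
        minutes = len(pairs)
        users = len({user for user, _ in pairs})
        metrics[day] = (minutes, users, minutes / users)
    return metrics


# Toy data: a regular day with four short visits and a holiday with one long stay.
records = (
    [("2016-10-11", u, m) for u in ("u1", "u2", "u3", "u4") for m in range(60)]
    + [("2016-10-12", "guard", m) for m in range(480)]
)
for day, values in sorted(daily_metrics(records).items()):
    print(day, values)
```

In the toy data the holiday has fewer users and a much higher ratio, which is the pattern the ratio chart is described as showing.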

Figure 8 shows a second set of graphs that correspond to a daily analysis of the whole year, broken down by sensor. One of the first conclusions that can be drawn is the difference in proportions between the unique users chart and the accumulated minutes chart: the difference between sensors in the number of registered users is not as marked as the difference in the number of accumulated minutes. Again, this is due to the fact that users spend much longer periods of time in the library than in other areas. Observing the line of the sensor of building C in the accumulated minutes chart, it can be seen that its magnitude is maintained throughout the year. This sensor covers a building that houses professors' offices and research laboratories, i.e., it registers mostly users who work at the school, and they maintain a more regular schedule throughout the year than students who go to classes or to the library.

Figure 9: Calendar view (rows: Sunday to Saturday; columns: January to December): (a) unique users registered by sensor Entr A Sec; (b) minutes accumulated by sensor Entr A Sec; (c) unique users registered by sensor Entr C; (d) minutes accumulated by sensor Entr C.

Figure 10: Hourly analysis during the first week of November (x axis: Mon 05 to Sun 11; one line per sensor): (a) minutes accumulated by each sensor [×1000]; (b) unique users registered by each sensor [×100].

Table 4: Sensor records collisions.

Collisions  Count     Total     Percent
9           3         27        0.00
8           45        360       0.00
7           421       2947      0.01
6           5371      32226     0.08
5           59222     296110    0.72
4           560994    2243976   5.46
3           2525718   7577154   18.42
2           7268825   14537650  35.35
1           16433862  16433862  39.96

Line charts can be difficult to interpret for periods as long as a whole year. To ease the visual analysis, a calendar visualization is offered in which the same data can be studied in a more intuitive way. Figure 9 presents the full-year view in calendar format: each row represents one day of the week, and the months appear delimited in black. The color intensity of each cell is proportional to the value it represents. Figure 9 is formed by four subfigures, which correspond to the number of users and the number of minutes registered by the sensors "Entr A Sec" and "Entr C". These visualizations are relative and can be used to obtain general conclusions. For a detailed study, visualizations with absolute measurements and an examination of the data are still necessary.
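The calendar layout described above can be sketched as a mapping from each date to a cell of a grid with one row per day of the week. This is only an illustrative sketch: the function name `calendar_cells` is hypothetical, and the row order (Sunday first, Saturday last) is an assumption taken from the row labels of Figure 9.

```python
from datetime import date, timedelta


def calendar_cells(year):
    """Yield (day, row, column) for every day of the year.

    row: 0 = Sunday ... 6 = Saturday (assumed row order of the calendar view).
    column: index of the week within the year, starting at 0.
    """
    first = date(year, 1, 1)
    # date.weekday() uses Monday = 0; shift it so that Sunday = 0.
    offset = (first.weekday() + 1) % 7
    day = first
    while day.year == year:
        index = (day - first).days + offset
        yield day, index % 7, index // 7
        day += timedelta(days=1)


# Each cell would then be colored proportionally to the value for that day.
cells = list(calendar_cells(2016))
print(cells[0], cells[-1])
```

Normalizing the color scale per sensor is what makes such a view relative rather than absolute.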

Figure 9(a) represents the number of users detected by the sensor "Entr A Sec", which covers the classrooms of this building, and Figure 9(b) the number of minutes accumulated by the same sensor. It is clear that this sensor was not operational during the first two months of the year, that it registers almost no activity on Saturdays and Sundays (last and first rows), and that holiday periods such as Easter in March or the summer holidays in July and August are clearly identified. This visualization also reveals information that is more difficult to observe in a line graph, such as the fact that Mondays are the days on which users spend the least time in this area. Some blank cells in the last months of the year also stand out, revealing that the sensor did not work on those days.

Figures 9(c) and 9(d) represent the number of users detected and the minutes accumulated by the "Entr C" sensor. As mentioned in the description of Figure 8, this sensor yields very stable measurements because it covers the research laboratories, and the workers' schedules are not affected by the school calendar, unlike the activity seen by other sensors such as "Entr A Sec".

Figure 10 presents a different view, plotted at the hour level during one week in November. This eases the identification of the activity hours, which span from 8:00 to 22:00. Another notable effect is the valley in the middle of the day, corresponding to the lunch break, when users move to the cantina (see Figure 2) or leave the buildings. Finally, it can be observed that users leave the school gradually during the afternoon.

4.3. Position Analysis. Before getting into the details and insights obtained from the spatial analysis of the data gathered over one year, it is worth mentioning that a set of tests was carried out in a controlled environment during the first stage of the deployment in order to check that the Wi-Fi tracking system worked properly. These tests included tracking a known group of MAC addresses across the Wi-Fi tracking sensors and checking that they appeared at the appropriate ones. It was also checked that the system correctly located and tracked the security staff throughout their night security tours.

A spatial analysis provides insight into how users are distributed throughout the buildings at different times of the year. To reach these conclusions, we have improved the method presented in [68]. The first improvement concerns the event in which a user is detected by two or more sensors during the same minute. From now on, this event is referred to as a collision. The number of collisions is significant for sensors that are close to each other (e.g., the library and building D) or that cover transition areas (e.g., Entr A or Std Tables).

Table 4 shows a study of the number of collisions. The order of a collision is the number of sensors that collide in the same minute, and the count is the number of times a collision of that order occurs; the total is the corresponding number of rows (count multiplied by order), and the percentage is computed over the total number of rows. Collisions of order 2 and 3 account for more than 50% of the data. The solution was to eliminate these collisions by replacing, for this experiment, all the rows belonging to a collision with a single row whose latitude and longitude are the centroid of the positions of the sensors participating in the collision.
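The collision handling described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the row layout `(mac, minute, lat, lon)` and the function name `resolve_collisions` are hypothetical, and each sensor is assumed to contribute at most one row per device and minute.

```python
from collections import defaultdict


def resolve_collisions(rows):
    """Replace all rows of a collision with a single centroid row.

    rows: iterable of (mac, minute, lat, lon), where (lat, lon) is the
    position of the sensor that produced the row (hypothetical layout).
    Returns a list of (mac, minute, lat, lon, order), where order is the
    number of sensors that saw the device in that minute.
    """
    groups = defaultdict(list)
    for mac, minute, lat, lon in rows:
        groups[(mac, minute)].append((lat, lon))
    merged = []
    for (mac, minute), positions in sorted(groups.items()):
        order = len(positions)
        # Centroid = arithmetic mean of the colliding sensor positions.
        lat = sum(p[0] for p in positions) / order
        lon = sum(p[1] for p in positions) / order
        merged.append((mac, minute, lat, lon, order))
    return merged


rows = [
    ("aa", 10, 40.0, -3.0),  # seen by two sensors in minute 10: order 2
    ("aa", 10, 41.0, -4.0),
    ("aa", 11, 40.0, -3.0),  # no collision: order 1
]
print(resolve_collisions(rows))
```

Counting the merged rows by their `order` field would give the kind of per-order breakdown reported in Table 4.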

The second improvement consists in incorporating external information on the approximate position of the access points to which users are connected. This information was obtained from the Mylnikov Geo API [69] by retrieving the position of every ESSID registered throughout the year. When the user is connected and the approximate position of the access point is available, this position is used instead of the position of the sensor that generates the row. This allows the user's position to be measured more accurately. In addition, it smooths the effect of using only the sensor positions, which forces the resulting positions to always lie inside the hull of the polygon formed by the sensor positions.

Figure 11: Monthly centroid heatmaps: (a) June; (b) August; (c) September.

This experiment is based on the concept of a user's centroid: the average position of a given user over a given period of time. Plotting these points on the map reveals hints about users' behavior during the studied time slice, based on the number of people in each area.
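The centroid of a user, combined with the access point override described earlier, can be sketched as follows. The row layout `(user, sensor_pos, ap_pos)` and both function names are assumptions made for illustration; positions are plain `(lat, lon)` pairs averaged arithmetically, which is reasonable only for an area as small as a campus.

```python
def effective_position(sensor_pos, ap_pos=None):
    """Prefer the access point position when it is known, else the sensor's."""
    return ap_pos if ap_pos is not None else sensor_pos


def user_centroids(rows):
    """Average position of each user over the rows of a time slice.

    rows: iterable of (user, sensor_pos, ap_pos), where each position is a
    (lat, lon) tuple and ap_pos is None when no access point position is
    available (hypothetical layout).
    Returns {user: (mean_lat, mean_lon)}.
    """
    sums = {}
    for user, sensor_pos, ap_pos in rows:
        lat, lon = effective_position(sensor_pos, ap_pos)
        acc = sums.setdefault(user, [0.0, 0.0, 0])
        acc[0] += lat
        acc[1] += lon
        acc[2] += 1
    return {user: (a[0] / a[2], a[1] / a[2]) for user, a in sums.items()}


rows = [
    ("u1", (40.0, -4.0), None),           # sensor position is used
    ("u1", (40.0, -4.0), (42.0, -2.0)),   # access point position is used
]
print(user_centroids(rows))
```

Feeding one month or one hour of rows into `user_centroids` would yield the points that a heatmap then accumulates.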

Figure 11 shows heatmaps for three different months. Figure 11(a) corresponds to June, a month in the second semester of the academic year and the one in which the final exams take place. Figure 11(b) corresponds to August, during the summer holidays, when there are no classes or exams and most of the professors, researchers, and staff are on vacation. Figure 11(c) corresponds to September, the start of the semester.

Figure 11(b) confirms one of the facts extracted in the temporal analysis: on holidays the number of users falls, and most of the users present in the school are studying at the library (the warmest point is located over the library and is much smaller than in other months).

Both Figures 11(a) and 11(c) show that building A is the area with the highest concentration of users. Even so, there are differences between the two figures. In Figure 11(a), the warmest area on the map is over the library, which is explained by the final exam period in June. The same observation can be repeated for other periods to discover when students have exams. In Figure 11(c), the hot spots are over the classrooms: September is the beginning of the school year, when class attendance is highest. Thus, this observation is an indicator of the level of class attendance.

The same type of analysis can be done with shorter time frames to obtain more detailed behaviors. Figure 12 shows the centroid heatmaps grouped by hour on September 2nd.

A detailed hourly analysis like this allows observing the users' movements throughout the day. The first row of maps, Figures 12(a), 12(b), 12(c), and 12(d), shows the evolution of the users' centroids at lunchtime, between 12:00 and 16:00. In this transition it can be seen that there are initially hot spots over the classrooms and the library. On the next map these centroids move to the cantina (see Figure 2), and in the last map they return to their original positions. The second row of maps in Figure 12 presents the start and the end of the activity time in the school. In Figures 12(e) and 12(f) it can be observed how the first users of the day go directly to classes. By contrast, Figures 12(g) and 12(h) reveal that users tend to be at the library at the end of the day.

Finally, Table 5 presents the number of distinct users detected by each sensor throughout the year. Recalling that the total number of users obtained in Section 4.1 is 4653, the data in the table reveal that, over the year, most users have been seen by every sensor at some point. The two exceptions are the sensor of building C and the sensor of the laboratories of building B. These sensors cover professors' offices and research laboratories, so it is unusual for students to stay in those areas.

Table 5: Sensor annual statistics.

Sensor      Number of rows  Number of users
Library     14166793        4674
Entr D      6818165         4621
Std Tables  5625974         4670
Entr A      4736237         4620
Entr A Sec  2352019         4579
Entr B Sec  2258618         4530
Entr B Lab  2229560         3630
Entr C      1469026         3093
Entr B      1248462         4127

Figure 12: Hourly centroid heatmaps of September 2nd: (a) 12:00 to 13:00; (b) 13:00 to 14:00; (c) 14:00 to 15:00; (d) 15:00 to 16:00; (e) 8:00 to 9:00; (f) 9:00 to 10:00; (g) 19:00 to 20:00; (h) 20:00 to 21:00.

4.4. Behavior Analysis. The third set of experiments deals with the behaviors that each user follows throughout a single day. To obtain them, we grouped the data using a user-day key. For each key, a vector of 24 positions, one per hour, is created. In each position of this vector we record the sensor that detected the user for most of that hour. This vector therefore represents the route that the user followed throughout that day, hour by hour. Once the behavior vectors are obtained, the day and user information is discarded in order to count the most repeated behavior vectors. The dimensionality of these vectors makes the number of possible behaviors huge (theoretically 24^10 ≈ 6×10^13, or 14^10 ≈ 289K million using only the activity hours), but users are known to behave similarly, so a much lower number of behaviors can be expected, even though it will still be high.
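The construction of the behavior vectors can be sketched as follows. This is an illustrative sketch rather than the authors' code: the row layout `(user, day, hour, sensor)` with one row per detected minute and the function names are assumptions, and `None` stands for the empty category. Ties between sensors within an hour are not addressed in the text; here they are resolved arbitrarily.

```python
from collections import Counter, defaultdict


def behavior_vectors(rows):
    """Build one 24-position vector per (user, day) key.

    rows: iterable of (user, day, hour, sensor), one row per minute in which
    the user was detected (hypothetical layout).
    Each position holds the sensor that saw the user most in that hour,
    or None when the user was not seen.
    """
    per_hour = defaultdict(Counter)
    for user, day, hour, sensor in rows:
        per_hour[(user, day, hour)][sensor] += 1
    vectors = defaultdict(lambda: [None] * 24)
    for (user, day, hour), counts in per_hour.items():
        # most_common(1) returns the dominant sensor; ties are arbitrary.
        vectors[(user, day)][hour] = counts.most_common(1)[0][0]
    return {key: tuple(vector) for key, vector in vectors.items()}


def most_repeated(vectors, n=20):
    """Drop the user and day, then count identical behavior vectors."""
    return Counter(vectors.values()).most_common(n)


rows = (
    [("u1", "d1", 9, "Library")] * 40 + [("u1", "d1", 9, "Entr A")] * 5
    + [("u1", "d1", 10, "Library")] * 60
    + [("u2", "d1", 9, "Library")] * 50 + [("u2", "d1", 10, "Library")] * 30
)
vector, count = most_repeated(behavior_vectors(rows), n=1)[0]
print(count, vector[8:12])
```

In the toy data both users end up with the same vector (library from 9:00 to 11:00), so that behavior is counted twice.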

Table 6 presents the 20 most repeated behaviors throughout the year. A total of 285K behaviors are detected, of which 139K are unique. The 500 most frequent behaviors account for 25% of the total. Table 6 shows only the part corresponding to the activity hours of the school, which, as observed in the temporal analysis, span from 8:00 am to 10:00 pm. As can be seen, most of the most repeated behaviors are stays of between two and five hours in the library. It is necessary to extend the analysis to the top 20 to observe class attendance behaviors. It is easy to see that the behaviors usually fall in either the morning or the afternoon, with lunchtime from 1:00 pm to 3:00 pm, which means that most users come either in the morning or in the afternoon but do not spend the whole day at the school.

Table 6: Top 20 most repeated behaviors. The columns of the table are the 14 hourly slots of the activity period (8:00-9:00 to 21:00-22:00); each row lists, in chronological order, the sensor recorded in the occupied slots.

1   Lib Lib Lib Lib Lib
2   Lib Lib Lib Lib
3   EntA
4   Lib Lib Lib
5   Lib Lib Lib Lib
6   Lib
7   EntA
8   Lib Lib Lib Lib Lib Lib
9   Std Std Std Std Std Std Std Std Std Std Std Std Std Std
10  EntA
11  EntA
12  Lib Lib Lib Lib Lib
13  Lib Lib
14  Lib Lib
15  EntA
16  EntA EntA
17  EntA
18  Lib Lib
19  SecA SecA SecA SecA
20  EntA

Lib = Library; EntA = Entr A; Std = Std Tables; SecA = Entr A Sec.

Another quite obvious observation is that many of the behaviors obtained are very similar to one another (e.g., going to the library from 9:00 am to 2:00 pm or going to the library from 9:00 am to 1:00 pm). In order to reduce this redundancy, we applied a clustering procedure to them. Each behavior can be understood as a categorical vector of length 24 in which the categories correspond to the 9 possible sensors of the dataset plus the empty category. The chosen clustering algorithm is Proximus [70], due to its simplicity, efficiency, scalability, and the reliability of its results. The algorithm works with binary vectors and creates clusters based on the Hamming distance (the number of bits that differ between two binary vectors). A vector is chosen as the center of the cluster, and other vectors whose distance to the center is smaller than the maximum cluster radius are added to that cluster.
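The idea of grouping binary vectors around a center within a Hamming radius can be illustrated with the greedy sketch below. It is a deliberately simplified stand-in and not the Proximus algorithm itself, which selects its cluster centers in a more elaborate way; the function names are hypothetical, and vectors at a distance of at most `radius` are accepted so that a center always belongs to its own cluster.

```python
def hamming(u, v):
    """Number of positions in which two equal-length binary vectors differ."""
    return sum(x != y for x, y in zip(u, v))


def greedy_radius_clusters(vectors, radius):
    """Greedy grouping: take the first unassigned vector as a center and
    attach every unassigned vector within the given Hamming radius.

    vectors: list of equal-length 0/1 tuples, ideally sorted by priority
    (for example, most frequent behavior first).
    Returns a list of (center, members) pairs.
    """
    clusters = []
    remaining = list(vectors)
    while remaining:
        center = remaining[0]
        members = [v for v in remaining if hamming(center, v) <= radius]
        remaining = [v for v in remaining if hamming(center, v) > radius]
        clusters.append((center, members))
    return clusters


vectors = [(1, 1, 0, 0), (1, 1, 1, 0), (0, 0, 1, 1), (0, 0, 0, 1)]
for center, members in greedy_radius_clusters(vectors, radius=1):
    print(center, members)
```

With a radius of 1, the four toy vectors collapse into two clusters of two members each.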

The behaviors are expressed as categorical vectors, so they must be transformed into binary vectors to be used in Proximus. The transformation shown in Equation (1) is proposed. This transformation is simple and also reversible, which allows the original behavior vectors to be recovered after clustering.

\[
B = (h_0, \ldots, h_{23}), \quad h_i \in \{0, a_1, \ldots, a_9\}
\;\longrightarrow\;
\hat{B} = (\delta_{0,a_1}, \ldots, \delta_{0,a_9}, \ldots, \delta_{23,a_1}, \ldots, \delta_{23,a_9}),
\qquad
\delta_{i,a_j} =
\begin{cases}
0 & \text{if } h_i \neq a_j \\
1 & \text{if } h_i = a_j
\end{cases}
\tag{1}
\]

where $a_1, \ldots, a_9$ are the nine sensors, $B$ is the behavior vector described above, and $\hat{B}$ is the transformed vector of length $24 \cdot 9$, made up of subgroups of 9 samples, each subgroup associated with one hour. Every sample in a subgroup is 0 except the one at the index of the sensor active in that hour, which is set to 1.
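The transformation of Equation (1) and its inverse can be sketched as follows. The sensor names are those used in the figures and tables of this section, but their order inside each 9-sample subgroup, the use of `None` for the empty category, and the function names are assumptions made for illustration.

```python
# The nine sensors; the ordering within a subgroup is an assumption.
SENSORS = ["Entr A", "Entr A Sec", "Entr B", "Entr B Lab", "Entr B Sec",
           "Entr C", "Entr D", "Library", "Std Tables"]


def encode(behavior):
    """Turn a 24-position categorical vector into a 24*9 binary vector."""
    bits = []
    for h in behavior:
        # One subgroup of 9 samples per hour; all zeros for the empty category.
        bits.extend(1 if h == a else 0 for a in SENSORS)
    return bits


def decode(bits):
    """Recover the categorical vector from its binary form."""
    k = len(SENSORS)
    behavior = []
    for i in range(0, len(bits), k):
        block = bits[i:i + k]
        behavior.append(SENSORS[block.index(1)] if 1 in block else None)
    return behavior


def hamming(u, v):
    """Number of bits that differ between two binary vectors."""
    return sum(x != y for x, y in zip(u, v))


library_9_to_13 = [None] * 24
for hour in range(9, 13):
    library_9_to_13[hour] = "Library"
library_9_to_12 = list(library_9_to_13)
library_9_to_12[12] = None

assert decode(encode(library_9_to_13)) == library_9_to_13
print(hamming(encode(library_9_to_13), encode(library_9_to_12)))
```

Under this encoding, dropping one hour of a stay changes a single bit, whereas moving that hour to a different sensor changes two, so near-identical stays end up close in Hamming distance.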

Table 7 shows the results of applying the Proximus clustering to the behaviors found. The first column is the center of the cluster, the second the number of distinct behaviors grouped in that cluster, and the third the number of recorded behaviors that fall within that cluster. The results are ordered by the number of behaviors included in the cluster. A total of 45707 clusters are obtained, and the first 300 account for 50% of the behaviors recorded throughout the year.

Table 7: Behavior clusters.

    Cluster center                                         Cluster components  Behaviors count
1   Entr A Sec from 9:00 to 14:00                          415                 4963
2   Entr B Sec from 10:00 to 13:00                         458                 3756
3   Entr B Sec from 15:00 to 19:00                         471                 2924
4   Library from 9:00 to 13:00                             256                 2870
5   Library from 10:00 to 12:00                            176                 2396
6   Library from 18:00 to 21:00                            244                 2231
7   Entr A 13:00                                           787                 1979
8   Library 12:00                                          563                 1894
9   Library from 18:00 to 19:00 and from 20:00 to 21:00    175                 1853
10  Entr B Lab 11:00 to 20:00                              568                 1831

The most important cluster behaviors in Table 7 can be understood as follows:

(i) Clusters 1, 2, and 3: students attending classes. These sensors cover the main classrooms in the school, and the intervals match the class schedule.

(ii) Clusters 4, 5, 6, and 9: students at the library.

(iii) Cluster 7: users who are usually outside the sensors' coverage areas, arriving or leaving.

(iv) Cluster 10: laboratory equipment and professors. This sensor covers some of the professors' offices and laboratories, which usually have some laboratory equipment connected during work hours.

5. Validation of Occupancy Estimation Based on Wi-Fi Tracking

Although, as already pointed out in Section 4.3, it was tested at the very first stage of the deployment that the Wi-Fi sensors properly locate and track known MAC addresses, the accuracy of the system for estimating occupancy still needed to be validated (since it can be distorted by the aforementioned fact that a single user can carry several devices connected to Wi-Fi networks). As no ground-truth data were available for the full set of buildings and only some data were found for the library, we focused our validation efforts on comparing our data with the available data: if we can trust our results in that area, then we can extend our trust to the rest of the areas, for which no reference data are available.

The library of ETSIT has 408 study seats and offers a web service to check the number of available seats at a given moment [71]. This system is based on two sources of information: a person who counts the empty seats every opening hour from Friday to Sunday, and a video camera located at the main entrance of the library that counts the number of people entering or leaving at 15-minute intervals. The human system provides a ground truth for the number of occupied seats, but this measure can differ greatly from the actual number of people in the library, since a common situation, especially during examination periods, is that students leave their study material at a seat to reserve it while they are away from the library. This situation is a problem for the library staff, and therefore they installed the video camera system to count the students' entrances and exits.

This system generates an estimate of the number of people in the library by adding to the previous measure the number of people detected entering and subtracting the number of people detected leaving. The system is not perfect, and in most cases it carries an accumulated error that grows over the day. The total error can be clearly calculated at the end of the day, when the library closes and the number of people inside is supposed to be zero. In summer there is a situation that aggravates this error: due to the rise in temperature, the back door of the library is opened to improve ventilation, and students are allowed to exit through it, although they must still enter through the main door. The camera does not count students leaving through this back door.
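The running estimate of the camera system, and the way its accumulated error shows up at closing time, can be sketched as follows. The input layout (one `(entries, exits)` pair per 15-minute interval) and the function name `camera_occupancy` are assumptions made for illustration.

```python
def camera_occupancy(intervals):
    """Running occupancy estimate from per-interval door counts.

    intervals: list of (entries, exits) pairs, one per 15-minute interval,
    from opening to closing (hypothetical layout).
    Returns (series, residual): the estimate after each interval and the
    value left at closing time. Since the library should then be empty,
    the residual is the error accumulated during the day.
    """
    estimate = 0
    series = []
    for entries, exits in intervals:
        estimate += entries - exits
        series.append(estimate)
    residual = series[-1] if series else 0
    return series, residual


# Toy day: 15 people enter, but only 11 exits are counted at the main door
# (for example, because some people left through an uncounted back door).
series, residual = camera_occupancy([(10, 0), (5, 3), (0, 8)])
print(series, residual)
```

Any exit that the camera misses stays in the running total for the rest of the day, which is why the error can only be read off reliably at closing time.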

The library staff provided us with the data collected by the two systems (human and camera) between June 5th and 30th. These measurements can be compared with those obtained by the Wi-Fi sensor installed at the library in order to validate them. For this test, all the data collected by the sensor are used, without the filtering of sporadic users' MACs explained earlier.

Figure 13 shows the data collected by the three systems on Sunday, June 5th. This is the first day with data from the three sources; other days on which the three sources are present show the same trends. There is a clear divergence between the human system's observations and the rest of the data. The graph of the human system shows that the number of occupied seats increases in the first hours up to the maximum and remains steady until the end of the day, without being affected by the behavior of the users at lunchtime. However, this effect is reflected in the camera system and Wi-Fi tracking measurements. We observed that, in general, the number of people counted by the camera system is below the Wi-Fi tracking system's measure, although the proportion is maintained over time. Finally, the figure shows the cumulative error effect of the camera system, which at the end of the day still reports 65 people in the library.

Figure 13: Measure of library occupation using Wi-Fi, camera, and human systems (x axis: hour, 10:00 to 22:00; y axis: number of users).

Figure 14: Measures every 15 minutes of library occupation during June (columns: hours 00 to 23): (a) Wi-Fi records; (b) camera records; (c) difference between Wi-Fi and camera.

To better study the relation between the camera system and the Wi-Fi tracking system, we generated a detailed visualization that allows the data of the whole month to be observed in a single figure. Figure 14 is composed of three subfigures; each of them is a matrix of colored cells in which the rows represent a full day and the columns are divisions of 15 minutes. The color intensity is proportional to the number of users measured in each interval, being more intense when more users are detected. Figure 14(a) shows the measurements of the Wi-Fi sensor. The behavior is the one observed in previous sections of this document. Figure 14(b) shows the measurements of the camera system. During the first half of the day the same trend as in Figure 14(a) can be observed, but in the afternoon and at night the accumulated error becomes noticeable. This error reaches a maximum of 378 people, with an average of 139 people, at the end of the day, while the Wi-Fi tracking system registers a maximum of 10 and an average of 7 at the end of the day (these are machines that remain turned on when the library is empty). Finally, Figure 14(c) shows the difference between the two previous ones, using green when the difference is positive (the Wi-Fi tracking system registers a higher value than the camera system) and red otherwise. This figure validates the measurements obtained by the Wi-Fi tracking system, since the difference with respect to the camera system in the first half of the day is small (an average of 27 people, counting the data from 8:00 to 16:00), whereas in the afternoon the error accumulated by the camera system makes the average difference grow to 83 people between 16:00 and 22:00.
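The comparison between the two systems can be sketched as a per-slot difference followed by an average over a time window. This is only an illustration: the day is assumed to be stored as a list of 96 counts (one per 15-minute slot), the function names are hypothetical, and the text does not state whether the reported averages are signed or absolute, so the absolute difference used here is an assumption.

```python
def slot_differences(wifi, camera):
    """Wi-Fi minus camera for each 15-minute slot (positive: Wi-Fi is higher)."""
    return [w - c for w, c in zip(wifi, camera)]


def mean_abs_difference(wifi, camera, start_hour, end_hour, slots_per_hour=4):
    """Average absolute difference over the window [start_hour, end_hour)."""
    diffs = slot_differences(wifi, camera)
    window = diffs[start_hour * slots_per_hour:end_hour * slots_per_hour]
    return sum(abs(d) for d in window) / len(window)


# Toy day: the two systems agree closely in the morning, and the camera
# drifts upwards in the afternoon as its error accumulates.
wifi = [0] * 96
camera = [0] * 96
for slot in range(32, 64):    # 8:00 to 16:00
    wifi[slot], camera[slot] = 100, 98
for slot in range(64, 88):    # 16:00 to 22:00
    wifi[slot], camera[slot] = 100, 130
print(mean_abs_difference(wifi, camera, 8, 16))
print(mean_abs_difference(wifi, camera, 16, 22))
```

The sign of each entry of `slot_differences` corresponds to the green and red coloring described for Figure 14(c).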

The Wi-Fi sensor of the library was not placed for the specific purpose of counting the number of people in the library, and consequently its coverage area can include devices that are outside the library. This explains why in some cases it registers a measurement greater than that of the camera system. In any case, the Wi-Fi sensor, with its limitations, registers a measure very similar to that of the camera-based people counting system, while avoiding its cumulative error and being much cheaper (tens of euros versus thousands of euros) and less intrusive than a video camera. Currently, since the camera system is already installed, the Wi-Fi sensor measurements can be used as an upper bound that would limit the error made by this system. In summary, using a Wi-Fi sensor to count people in the library provides accurate results despite the potential errors induced by the sensing period, collisions, the ratio between users and MACs seen, and the data processing performed. These results give confidence that the results of our analyses for the rest of the areas on campus are also correct.

6. Discussion and Conclusions

In carrying out this case study we have learned some lessons about the limitations we faced that are worth taking into account for similar systems and analyses. First, there is the key issue of sensor deployment: it is not just a question of density (number of sensors per unit area) but of adapting the design to the topology of the place to be monitored. For example, it is crucial to have sensors not only at the entrances and exits of buildings but also inside them, as in the considered deployment, where sensors are placed at the entrances and exits of the buildings but also in other especially relevant places such as the library or the group work area.

Complete coverage of indoor devices, without collisions and for the complete duration of the stay indoors, would require a large deployment with many sensors able to cover every corner. This is not feasible in this context; instead, we relied on 9 sensors, some at gates and others in large rooms, which proved to be enough to check whether a device is within reach of any of them. However, some assumptions had to be made: we considered as valid only those behaviors that spent a significant amount of time once they had been seen (to remove transient behaviors and those of people walking outside the buildings), and we discarded behaviors that did not appear enough times over the year. On the other hand, the topology of these buildings did not impose severe restrictions on the sensors' coverage. In fact, we found several devices seen by different sensors at once. In summary, we can conclude that the finer the spatial granularity (more sensors), the larger the set of different behaviors, so working with raw data from the sensors would lead to an explosion of states that would render the results meaningless.

The gathered data are another key issue: some works in the literature obtained signal strength measurements every few seconds, allowing them to identify indoor trajectories. This was not our case, which proved to be very limiting. The precision of our analysis is thus limited to the regions covered by each sensor, which prevents us from analyzing any kind of movement within regions. Nevertheless, as has been seen, relevant results can still be obtained without the increase in energy consumption and the risk of flooding the school communications network that the other approach would entail.

Regarding the analysis of data to identify users' behaviors, a long observation period is a must. Obtaining data for a complete work cycle (in this case, a full school year) helps in discovering the common behaviors that occur in a university. Long observation times can also help to detect erroneous or atypical operation of the sensors, as happened in the first months of the year in this case. Gathering even larger observation datasets would reinforce the behaviors we have found (e.g., seasonality), but the chance of discovering new ones is negligible, because the expected behaviors on campus are periodic with the school year.

The usage of big data platforms for analysis, while not mandatory, eases the management of large datasets and the execution of iterative studies on the data. The preprocessing work carried out allowed us to clean and filter our data. For example, some discovered behaviors are too regular and extended over time, which may correspond to servers or machines, which we could then filter out.

The temporal analysis has revealed some behaviors that are maintained throughout the year and others that occur occasionally. The work has focused on the study of behaviors that are repeated and maintained throughout the year. A closer view, such as the analysis per hour, shows the hours of activity, nocturnal patterns, or movements at mealtime. In the daily view, the difference in activity between working days and weekends or holidays is clearly seen. Visualization has been a key technique for detecting and understanding these temporal patterns.

The spatial analysis revealed that collisions occur in user detection, but their effect was mitigated by the use of innovative algorithms and techniques, such as the calculation of centroids and the combination of sensor information with external sources of information (i.e., semantic trajectories), so that a much more precise positioning of the users is achieved than with the exclusive use of the sensors. The visualization of centroids fostered the visual understanding of complex data, such as the position of each user over a month, and the superposition of the centroids in a heatmap revealed the movements of groups of people and the occupation of the different zones.


Furthermore, we have identified the dominant users’ behaviors as the most repeated behaviors registered by the sensors throughout the year. The number of behaviors found was huge, but the application of the Proximus clustering algorithm reduced this number to a manageable amount. The study of the obtained clusters then revealed that the most frequent behaviors coincide with what can be expected from a college building: researchers working in laboratories and students attending classes or studying in the library.

Although some of the discovered behaviors and patterns can be seen as common knowledge, it is worth stressing that they do bring value, since they represent numerical evidence that supports decision making (e.g., someone may think that the Wi-Fi access in a given area does not work properly because it is always overcrowded, but numerical evidence is needed to appropriately justify the investment of increasing the number of APs of the corporate WLAN in that area to improve the service). In addition, such well-known patterns, when obtained automatically by processing the available data, become baseline models which can be used to detect anomalies or atypical situations, as is common practice in unsupervised machine learning.

Lastly, we have validated the accuracy of using Wi-Fi tracking for occupancy estimation, comparing it with the library staff’s manual counting (considered as ground truth) and with a video camera system installed at the library main entrance. As a main finding, Wi-Fi tracking has proved to be more accurate than the video camera system, in addition to being much cheaper. As a matter of fact, the library staff currently prefers the occupancy estimation based on Wi-Fi tracking to the one based on the video camera system. Nevertheless, the accuracy of the Wi-Fi tracking system can be further improved by correlating MAC addresses detected in the same places over sufficiently long periods of time and considering only one of them, thus mitigating the issue that a single person can carry several devices connected to Wi-Fi networks.
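One possible way to approach the MAC address correlation mentioned above is sketched below in Python. It is only an illustration under stated assumptions, not the method used in this work: the function name `co_located_devices`, the input layout (a set of minute and sensor pairs per MAC address), the Jaccard similarity criterion, and both thresholds are hypothetical choices made for the example.

```python
from itertools import combinations

def co_located_devices(sightings, min_minutes=60, min_overlap=0.9):
    """Group MAC addresses whose sightings almost always coincide.

    sightings: dict mac -> set of (minute, sensor) pairs (hypothetical layout).
    Two devices are linked when both were seen for at least min_minutes and
    the Jaccard similarity of their sighting sets reaches min_overlap.
    Returns a dict mapping every MAC to the representative of its group.
    """
    parent = {mac: mac for mac in sightings}

    def find(mac):
        # Follow parent links up to the group representative.
        while parent[mac] != mac:
            parent[mac] = parent[parent[mac]]  # path halving
            mac = parent[mac]
        return mac

    # Pairwise comparison: quadratic in the number of devices.
    for a, b in combinations(sorted(sightings), 2):
        sa, sb = sightings[a], sightings[b]
        if min(len(sa), len(sb)) < min_minutes:
            continue
        union = sa | sb
        if union and len(sa & sb) / len(union) >= min_overlap:
            parent[find(b)] = find(a)
    return {mac: find(mac) for mac in sightings}
```

Counting the distinct representatives instead of the distinct MAC addresses would then give an occupancy estimate that counts co-located devices once. The pairwise loop is quadratic, so a real dataset would probably need a blocking step (for example, comparing only devices seen by the same sensor on the same day).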

Besides this, the Wi-Fi tracking system is currently used by the library staff to perform more sophisticated studies, such as figuring out the percentage of students from the different schools of the university who come to study at the library of the Telecommunications Engineering School during the weekend. Figure 15 shows the results of such an analysis, which is a token of how this kind of IoT system can help solve real-life problems and improve the operation of already running services.

Figure 15: Users who only go to the library of the Telecommunication School on weekends, classified by their school. [Chart labels: CIVIL, INDUS, NAVAL, SPORT, ARQUI, FORES, EDIF, AERO, TELECOM. Percentage shares: 13.96, 18.06, 2.64, 0.90, 9.51, 3.82, 4.58, 17.71, 28.82; the assignment of shares to schools is not preserved in the text.]

To summarize, we have studied a one-year Wi-Fi tracking dataset obtained from a reduced set of low-cost sensors with limited capabilities, deployed on an actual university campus that receives around 4000 people every day. We have processed the data in order to identify traces of Wi-Fi-enabled mobile devices, which are taken as people moving in the campus buildings, and then we have extracted people’s stays, movements, and common behaviors. The obtained results represent numerical evidence that illustrates how a low-cost Wi-Fi tracking system can be used in real-life conditions to improve or optimize the operation of the monitored premises. These results can allow dimensioning appropriately the WLAN infrastructure or the canteen personnel, or detecting anomalous situations in real time. Furthermore, the data from the people flow monitoring system is currently being used together with the data from the environmental monitoring system to try to reduce the environmental footprint of the school [72]. In addition, the Wi-Fi tracking system is actually used by the library staff in their day-to-day activity, which illustrates the value that this kind of IoT infrastructure can bring to real-life problems and services.

Data Availability

The dataset with the Wi-Fi sensor records used to support the findings of this study is available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work has been partly funded by Universidad Politécnica de Madrid through the project RES2+U (Responsables, Sostenibles y Universitarios) (http://blogs.upm.es/res2masu/). The work performed by José M. Navarro was funded by the Ministerio de Educación, Cultura y Deporte de España under Grant no. FPU 14/03209.

References

[1] “2018 Revision of World Urbanization Prospects - Multimedia Library - United Nations Department of Economic and Social Affairs,” https://www.un.org/development/desa/publications/2018-revision-of-world-urbanization-prospects.html.

[2] M. Pallot, “Engaging Users into Research and Innovation: The Living Lab Approach as a User Centred Open Innovation Ecosystem,” https://web.archive.org/web/20120509081658/http://www.cwe-projects.eu/pub/bscw.cgi/1760838?id=715404_1760838.

[3] E. Almirall and J. Wareham, “Living Labs: arbiters of mid- and ground-level innovation,” Technology Analysis & Strategic Management, vol. 23, no. 1, pp. 87–102, 2011.

[4] “TU Delft Green office,” https://www.tudelft.nl/sustainability/.

[5] “UBC Campus as a living laboratory,” https://sustain.ubc.ca/our-commitment/campus-living-lab.

[6] “Harvard Sustainability,” https://green.harvard.edu/.

[7] “UPM City of the Future initiative,” http://blogs.upm.es/cityofthefuture-upm/en/initiative/.

[8] M. Alvarez-Campana, G. Lopez, E. Vazquez, V. A. Villagra, and J. Berrocal, “Smart CEI moncloa: An iot-based platform for people flow and environmental monitoring on a Smart University Campus,” Sensors, vol. 17, no. 12, 2017.

[9] M. B. Kjærgaard, H. Blunck, T. Godsk, T. Toftkjær, D. L. Christensen, and K. Grønbæk, “Indoor positioning using GPS revisited,” Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics): Preface, vol. 6030, pp. 38–56, 2010.

[10] A. Belmonte-Hernandez, G. Hernandez-Penaloza, F. Alvarez, and G. Conti, “Adaptive Fingerprinting in Multi-Sensor Fusion for Accurate Indoor Tracking,” IEEE Sensors Journal, vol. 17, no. 15, pp. 4983–4998, 2017.

[11] M. S. Kristoffersen, J. V. Dueholm, R. Gade, and T. B. Moeslund, “Pedestrian counting with occlusion handling using stereo thermal cameras,” Sensors, vol. 16, no. 1, 2016.

[12] L. Zheng, X. Ruan, Y. Chen, and M. Huang, “Shadow removal for pedestrian detection and tracking in indoor environments,” Multimedia Tools and Applications, vol. 76, no. 18, pp. 18321–18337, 2017.

[13] T. Li, H. Chang, M. Wang, B. Ni, R. Hong, and S. Yan, “Crowded scene analysis: a survey,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 25, no. 3, pp. 367–386, 2015.

[14] F. Adib and D. Katabi, “See through walls with WiFi,” in Proceedings of the Annual Conference of the ACM Special Interest Group on Data Communication on the Applications, Technologies, Architectures, and Protocols for Computer Communication, ACM SIGCOMM 2013, pp. 75–86, China, August 2013.

[15] Z. Kabelac, D. Katabi, and R. C. Miller, “3D Tracking via Body Radio Reflections,” in Proceedings of the 11th USENIX Conference on Networked Systems Design and Implementation, pp. 317–329, 2014.

[16] F. Adib, Z. Kabelac, and D. Katabi, “Multi-person localization via RF body reflections,” in Proceedings of the 12th USENIX Symposium on Networked Systems Design and Implementation, NSDI 2015, pp. 279–292, USA, May 2015.

[17] Z.-A. Deng, G. Wang, D. Qin, Z. Na, Y. Cui, and J. Chen, “Continuous indoor positioning fusing WiFi, smartphone sensors and landmarks,” Sensors, vol. 16, no. 9, 2016.

[18] R. A. Becker, R. Caceres, K. Hanson et al., “A tale of one city: Using cellular network data for urban planning,” IEEE Pervasive Computing, vol. 10, no. 4, pp. 18–26, 2011.

[19] J. E. Mallah, F. Carrino, O. A. Khaled, and E. Mugellini, “Crowd monitoring: critical situations prevention using smartphones and group detection,” Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics): Preface, vol. 9189, pp. 496–505, 2015.

[20] J. Kuang, X. Niu, and X. Chen, “Robust Pedestrian Dead Reckoning Based on MEMS-IMU for Smartphones,” Sensors, vol. 18, no. 5, p. 1391, 2018.

[21] A. Kurkcu and K. Ozbay, “Estimating Pedestrian Densities, Wait Times, and Flows with Wi-Fi and Bluetooth Sensors,” Transportation Research Record, vol. 2644, no. 1, pp. 72–82, 2017.

[22] A. Kotanen, M. Hannikainen, H. Leppakoski, and T. Hamalainen, “Positioning with IEEE 802.11b wireless LAN,” in Proceedings of the 14th IEEE 2003 International Symposium on Personal, Indoor and Mobile Radio Communications, vol. 3, pp. 2218–2222, Beijing, China, 2003.

[23] W. Ho, A. Smailagic, D. P. Siewiorek, and C. Faloutsos, “An adaptive two-phase approach to WiFi location sensing,” in Proceedings of the 4th Annual IEEE International Conference on Pervasive Computing and Communications Workshops, PerCom Workshops 2006, pp. 452–456, Italy, March 2006.

[24] F. Evennou and F. Marx, “Advanced integration of WiFi and inertial navigation systems for indoor mobile positioning,” EURASIP Journal on Applied Signal Processing, vol. 2006, Article ID 86706, 11 pages, 2006.

[25] J. Rekimoto, T. Miyaki, and T. Ishizawa, “LifeTag: WiFi-based continuous location logging for life pattern analysis,” LNCS, vol. 4718, pp. 35–49, 2007.

[26] J. A. Besada, A. M. Bernardos, P. Tarrío, and J. R. Casar, “Analysis of tracking methods for wireless indoor localization,” in Proceedings of the 2nd International Symposium on Wireless Pervasive Computing (ISWPC ’07), pp. 492–497, February 2007.

[27] O. Woodman and R. Harle, “Pedestrian localisation for indoor environments,” in Proceedings of the 10th International Conference on Ubiquitous Computing (UbiComp ’08), pp. 114–123, Seoul, Republic of Korea, September 2008.

[28] F. Aloul, A. Sagahyroon, A. Al-Shami, I. Al-Midfa, and R. Moutassem, “Using mobiles for on campus location tracking,” in Proceedings of the 7th International Conference on Advances in Mobile Computing and Multimedia, MoMM2009, pp. 231–235, Malaysia, December 2009.

[29] S. Woo, S. Jeong, E. Mok et al., “Application of WiFi-based indoor positioning system for labor tracking at construction sites: A case study in Guangzhou MTR,” Automation in Construction, vol. 20, no. 1, pp. 3–13, 2011.

[30] N. Le Dortz, F. Gain, and P. Zetterberg, “WiFi fingerprint indoor positioning system using probability distribution comparison,” in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP ’12), pp. 2301–2304, March 2012.

[31] Z. Chen, H. Zou, H. Jiang, Q. Zhu, Y. C. Soh, and L. Xie, “Fusion of WiFi, smartphone sensors and landmarks using the kalman filter for indoor localization,” Sensors, vol. 15, no. 1, pp. 715–732, 2015.

[32] Y. Shu, C. Bo, G. Shen, C. Zhao, L. Li, and F. Zhao, “Magicol: indoor localization using pervasive magnetic field and opportunistic wifi sensing,” IEEE Journal on Selected Areas in Communications, vol. 33, no. 7, pp. 1443–1457, 2015.

[33] A. Danalet, B. Farooq, and M. Bierlaire, “A Bayesian approach to detect pedestrian destination-sequences from WiFi signatures,” Transportation Research Part C: Emerging Technologies, vol. 44, pp. 146–170, 2014.

[34] O. Czogalla and S. Naumann, “Pedestrian indoor navigation for complex public facilities,” in Proceedings of the 2016 International Conference on Indoor Positioning and Indoor Navigation, IPIN 2016, pp. 1–8, Spain, October 2016.

[35] L. Schauer, P. Marcus, and C. Linnhoff-Popien, “Towards feasible Wi-Fi based indoor tracking systems using probabilistic methods,” in Proceedings of the 2016 International Conference on Indoor Positioning and Indoor Navigation, IPIN 2016, pp. 1–8, Spain, October 2016.

[36] F. Meneses and A. Moreira, “Large scale movement analysis from WiFi based location data,” in Proceedings of the 2012 International Conference on Indoor Positioning and Indoor Navigation, IPIN 2012, Australia, November 2012.

[37] B. Bonne, A. Barzan, P. Quax, and W. Lamotte, “WiFiPi: involuntary tracking of visitors at mass events,” in Proceedings of the IEEE 14th International Symposium on a World of Wireless, Mobile and Multimedia Networks (WoWMoM ’13), pp. 1–6, Madrid, Spain, June 2013.

[38] M. B. Kjaergaard, M. Wirz, D. Roggen, and G. Troster, “Mobile sensing of pedestrian flocks in indoor environments using WiFi signals,” in Proceedings of the 10th IEEE International Conference on Pervasive Computing and Communications (PerCom ’12), pp. 95–102, Lugano, Switzerland, March 2012.

[39] S. Sendra, M. Garcia, C. Turro, and J. Lloret, “People mobility behaviour study in a university campus using WLANs,” in Proceedings of the 3rd International Conference on Mobile Ubiquitous Computing, Systems, Services, and Technologies, UBICOMM 2009, pp. 124–129, Malta, October 2009.

[40] J. Scheuner, G. Mazlami, D. Schoni et al., “Probr - A Generic and Passive WiFi Tracking System,” in Proceedings of the 41st IEEE Conference on Local Computer Networks, LCN 2016, pp. 495–502, UAE, November 2016.

[41] Y. Li, S. Williams, B. Moran, and A. Kealy, “Quantized RSS Based Wi-Fi Indoor Localization with Room Level Accuracy,” in Proceedings of the International Global Navigation Satellite Systems 2018, 2018.

[42] A. Alessandrini, C. Gioia, F. Sermi, I. Sofos, D. Tarchi, and M. Vespe, “WiFi positioning and Big Data to monitor flows of people on a wide scale,” in Proceedings of the 25th European Navigation Conference, ENC 2017, pp. 322–328, Switzerland, May 2017.

[43] L. Vu, K. Nahrstedt, S. Retika, and I. Gupta, “Joint bluetooth/wifi scanning framework for characterizing and leveraging people movement in university campus,” in Proceedings of the 13th ACM International Conference on Modeling, Analysis, and Simulation of Wireless and Mobile Systems (MSWiM ’10), pp. 257–265, October 2010.

[44] M. Garcia, S. Sendra, C. Turro, and J. Lloret, “User’s Macro and Micro-mobility Study using WLANs in a University Campus,” International Journal On Advances in Internet Technology, vol. 4, no. 1, pp. 37–46, 2011.

[45] Y. Xu, I. D. G. Groeneveld, R. Sulzer, E. Theocharous, O. T. Willems, and M. S. Tryfona, “Determine activity based on the classified identity of users by using Wi-Fi monitoring,” Geomatics Synthesis Group Project Report, 2016.

[46] A. Danalet, L. Tinguely, M. D. Lapparent, and M. Bierlaire, “Location choice with longitudinal WiFi data,” Journal of Choice Modelling, vol. 18, pp. 1–17, 2016.

[47] M. Zhou, K. Sui, M. Ma, Y. Zhao, D. Pei, and T. Moscibroda, “MobiCamp: A campus-wide testbed for studying mobile physical activities,” in Proceedings of the 3rd International Workshop on Physical Analytics, WPA 2016, pp. 1–6, Singapore, 2016.

[48] A. Fernandez-Ares, A. M. Mora, M. G. Arenas et al., “Studying real traffic and mobility scenarios for a Smart City using a new monitoring and tracking system,” Future Generation Computer Systems, vol. 76, pp. 163–179, 2017.

[49] A. E. Redondi and M. Cesana, “Building up knowledge through passive WiFi probes,” Computer Communications, vol. 117, pp. 1–12, 2018.

[50] L. Huang, K. Matsuura, H. Yamane, and K. Sezaki, “Enhancing wireless location privacy using silent period,” in Proceedings of the 2005 IEEE Wireless Communications and Networking Conference, WCNC 2005: Broadband Wireless for the Masses - Ready for Take-off, pp. 1187–1192, USA, March 2005.

[51] J. Martin, T. Mayberry, C. Donahue et al., “A Study of MAC Address Randomization in Mobile Devices and When it Fails,” Proceedings on Privacy Enhancing Technologies, vol. 2017, no. 4, pp. 365–383, 2017.

[52] M. Vanhoef, C. Matte, M. Cunche, L. S. Cardoso, and F. Piessens, “Why MAC address randomization is not enough: an analysis of Wi-Fi network discovery mechanisms,” in Proceedings of the 11th ACM on Asia Conference on Computer and Communications Security, pp. 413–424, ACM, Xi’an, China, June 2016.

[53] A. J. Ruiz-Ruiz, H. Blunck, T. S. Prentow, A. Stisen, and M. B. Kjaergaard, “Analysis methods for extracting knowledge from large-scale WiFi monitoring to inform building facility planning,” in Proceedings of the 2014 12th IEEE International Conference on Pervasive Computing and Communications, PerCom 2014, pp. 130–138, Hungary, March 2014.

[54] N. Nunes, M. Ribeiro, C. Prandi, and V. Nisi, “Beanstalk - A community based passive Wi-Fi tracking system for analysing tourism dynamics,” in Proceedings of the 9th ACM SIGCHI Symposium on Engineering Interactive Computing Systems, EICS 2017, pp. 93–98, Portugal, June 2017.

[55] C. Parent, N. Pelekis, Y. Theodoridis et al., “Semantic trajectories modeling and analysis,” ACM Computing Surveys, vol. 45, no. 4, pp. 1–32, 2013.

[56] C. Wei, “Mining of User Behavioral Features Based on Indoor Semantic Trajectories,” Boletín Técnico, ISSN: 0376-723X, vol. 55, 2017.

[57] K. V. Long, D. Quang, and N. Klara, Lessons learned from bluetooth/wifi scanning deployment in university campus, Urbana, Illinois, USA, 2010.

[58] V. Radu and M. K. Marina, “HiMLoc: indoor smartphone localization via activity aware pedestrian dead reckoning with selective crowdsourced WiFi fingerprinting,” in Proceedings of the International Conference on Indoor Positioning and Indoor Navigation (IPIN ’13), pp. 1–10, IEEE, Montbeliard-Belfort, France, October 2013.

[59] F. Hong, Y. Zhang, Z. Zhang, M. Wei, Y. Feng, and Z. Guo, “WaP: Indoor localization and tracking using WiFi-Assisted Particle filter,” in Proceedings of the 39th Annual IEEE Conference on Local Computer Networks, LCN 2014, pp. 210–217, Canada, September 2014.

[60] Y. Zeng, P. H. Pathak, and P. Mohapatra, “Analyzing shopper’s behavior through WiFi signals,” in Proceedings of the 2nd Workshop on Physical Analytics, WPA 2015, pp. 13–18, Italy, 2015.

[61] P. Sapiezynski, A. Stopczynski, R. Gatej, and S. Lehmann, “Tracking human mobility using WiFi signals,” PLoS ONE, vol. 10, no. 7, p. e0130824, 2015.

[62] Z. Tian, Y. Jin, M. Zhou, Z. Wu, and Z. Li, “Wi-Fi/MARG Integration for Indoor Pedestrian Localization,” Sensors, vol. 16, no. 12, p. 2100, 2016.

[63] B. Zhou, Q. Li, Q. Mao, and W. Tu, “A Robust Crowdsourcing-Based Indoor Localization System,” Sensors, vol. 17, no. 4, p. 864, 2017.

[64] “Raspberry Pi,” https://www.raspberrypi.org/.

[65] “TP-LINK Wi-Fi USB dongle datasheet,” http://www.tp-link.com/us/products/details/cat-5520_TL-WN722N.html.

[66] “ISO/IEC 20922:2016 - Information technology – Message Queuing Telemetry Transport (MQTT) v3.1.1,” https://www.iso.org/standard/69466.html.

[67] G. Piateski-Shapiro and W. Frawley, Knowledge Discovery in Databases, MIT Press, Cambridge, MA, USA, 1991.

[68] J. Andion Jimenez, J. M. Navarro Gonzalez, M. Alvarez-Campana Fernandez-Corredor, and J. C. Duenas Lopez, “A passive, non-intrusive, cheap method to identify behaviours and habits in the Campus,” in Proceedings of the XIII Jornadas de Ingenieria Telematica - JITEL2017, vol. 40, no. 47, pp. 10–4995, September 2017.

[69] A. Mylnikov, “Geo project,” https://www.mylnikov.org/.

[70] K. Mehmet and G. Ananth, “PROXIMUS: A framework for analyzing very high dimensional discrete-attributed datasets,” in Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’03, vol. 147, no. 156, pp. 147–156, USA, August 2003.

[71] “UPM Library occupation,” http://ceiboard.dit.upm.es/smart-campus/biblio.

[72] C. A. R. Inarejos, A. Rodríguez, G. Lopez, and M. Alvarez-Campana, “Análisis de la huella de carbono de la ETSIT de la UPM y propuesta de mejora basada en datos de la plataforma IoT Smart CEI Moncloa,” in Proceedings of the I Congreso Iberoamericano de Ciudades Inteligentes (ICSC-CITIES 2018), 2018.


Figure 9: Calendar view (rows: days of the week, Sun to Sat; columns: months, Jan to Dec): (a) unique users registered by sensor Entr A Sec; (b) minutes accumulated by sensor Entr A Sec; (c) unique users registered by sensor Entr C; (d) minutes accumulated by sensor Entr C.

Figure 10: Hourly analysis during the first week of November (x-axis: date, Mon 05 to Sun 11; series: Entr A, Entr A Sec, Entr B, Entr B Lab, Entr B Sec, Entr C, Entr D, Library): (a) minutes accumulated by each sensor (×1000); (b) unique users registered by each sensor (×100).


Table 4: Sensor records collisions.

Collisions   Count      Total      Percent
9            3          27         0.00
8            45         360        0.00
7            421        2947       0.01
6            5371       32226      0.08
5            59222      296110     0.72
4            560994     2243976    5.46
3            2525718    7577154    18.42
2            7268825    14537650   35.35
1            16433862   16433862   39.96

the line of the sensor of building C in the accumulated minutes chart, it can be seen that the magnitude is maintained throughout the year. This sensor covers a building where there are professors’ offices and research laboratories; i.e., this sensor registers mostly users who work at the school, and they maintain a more regular schedule throughout the year than students who go to classes or to the library.

Line charts can be difficult to interpret for periods of time as long as a whole year. To ease the visual analysis, a calendar visualization is offered in which the same data can be studied in a more intuitive way. Figure 9 presents the full-year view in calendar format, with each row representing one day of the week and the months delimited in black. The color intensity of each cell is proportional to the data it represents. Figure 9 consists of 4 subfigures, which correspond to the count of users and the number of minutes registered by the sensors “Entr A Sec” and “Entr C.” These visualizations are relative and can be used to obtain general conclusions. For a detailed study, both visualizations with absolute measurements and an examination of the data are still necessary.
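As a rough illustration of how such a calendar view can be laid out, the following Python sketch arranges daily values into a grid with one row per day of the week (Sunday first, as in Figure 9) and one column per week of the year. The function name `calendar_grid` and the input format (a mapping from dates to daily values) are assumptions made for the example; they are not taken from the tooling used in this work.

```python
from datetime import date, timedelta

def calendar_grid(daily_values, year):
    """Arrange daily values in a 7-row grid (Sun..Sat), one column per week.

    daily_values: dict mapping datetime.date -> number (hypothetical input).
    Cells with no data stay None, which a plot would render as blank.
    """
    first = date(year, 1, 1)
    # Row index of January 1st, with Sunday = 0 ... Saturday = 6.
    offset = (first.weekday() + 1) % 7
    days_in_year = (date(year + 1, 1, 1) - first).days
    n_weeks = (offset + days_in_year + 6) // 7
    grid = [[None] * n_weeks for _ in range(7)]
    for i in range(days_in_year):
        day = first + timedelta(days=i)
        row = (day.weekday() + 1) % 7
        col = (offset + i) // 7
        grid[row][col] = daily_values.get(day)
    return grid
```

Feeding the grid to any heatmap routine, with the color intensity scaled to the maximum cell value, gives a relative view of the kind described above; days on which a sensor reported nothing remain blank.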

Figure 9(a) represents the number of users detected by the sensor “Entr A Sec,” which covers the classrooms of this building, and Figure 9(b) the number of minutes accumulated by the same sensor. It is clear that during the first two months of the year this sensor was not operational. On Saturdays and Sundays (last and first rows) this sensor registers almost no activity, and holiday periods are clearly identified, such as Easter in March or the summer holidays in July and August. This visualization also reveals other information that is more difficult to observe in a line graph, such as the fact that Mondays are the days on which users spend the least time in this area. Some blank cells also stand out in the last months of the year, revealing that the sensor did not work during those days.

Figures 9(c) and 9(d) represent the number of users detected and the minutes accumulated by the “Entr C” sensor. As mentioned in the description of Figure 8, this sensor yields very stable measures because it covers the research laboratories, and the workers’ schedules are not affected by the school calendar, unlike the activity seen by other sensors such as “Entr A Sec.”

Figure 10 presents a different view, plotted at the hour level during a week in November. This eases the identification of activity hours, which span from 8 in the morning to 22 in the evening. Another noticeable effect is the valley in the middle of the day, corresponding to the lunch break, when users move to the cantina (see Figure 2) or leave the buildings. Finally, it is observed that the users leave the school gradually during the afternoon.
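The hourly figures discussed above can be derived from per-minute detections with a simple aggregation. The following Python sketch is a minimal illustration under assumptions: the record layout (user, sensor, timestamp) and the function name `hourly_stats` are hypothetical, and "accumulated minutes" is interpreted here as the number of distinct user-minute pairs seen by a sensor within the hour.

```python
from collections import defaultdict
from datetime import datetime

def hourly_stats(records):
    """Aggregate per-minute detections into hourly figures per sensor.

    records: iterable of (user_id, sensor, datetime) tuples, one per minute
    in which the sensor saw the device (hypothetical record layout).
    Returns {(sensor, hour_start): (unique_users, accumulated_minutes)}.
    """
    users = defaultdict(set)
    minutes = defaultdict(set)
    for user, sensor, ts in records:
        hour = ts.replace(minute=0, second=0, microsecond=0)
        users[(sensor, hour)].add(user)
        # A (user, minute) pair counts once even if reported several times.
        minutes[(sensor, hour)].add((user, ts.replace(second=0, microsecond=0)))
    return {key: (len(users[key]), len(minutes[key])) for key in users}
```

Plotting the two components of each value against the hour, one line per sensor, would give charts comparable in structure to the two panels of Figure 10.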

4.3. Position Analysis. Before getting into the details and insights obtained from the analysis of the one-year gathered data from the spatial perspective, it is worth mentioning that a set of tests was carried out in a controlled environment during the first stage of the deployment in order to check that the Wi-Fi tracking system worked properly. These tests included tracking a well-known group of MAC addresses throughout the Wi-Fi tracking sensors, checking that they appeared in the appropriate ones. It was also checked that the system correctly located and tracked the security staff throughout their night security tours.

A spatial analysis provides insight into how the users are distributed throughout the buildings during different times of the year. To reach these conclusions, we have improved the method presented in [68]. The first improvement is related to the event in which a user is detected by two or more sensors during the same minute. From now on, this event will be referred to as a collision. The number of collisions is significant for sensors that are close to each other (e.g., the library and building D) or sensors that cover transition areas (e.g., Entr A or Std Tables).

Table 4 shows a study of the number of collisions. The order of a collision is the number of sensors that collide in the same minute, and the count is the number of rows in which a collision of that order occurs. Collisions of order 2 and 3 account for more than 50% of the data. The solution adopted was to eliminate these collisions by replacing, for this experiment, all the rows corresponding to a collision with a single row whose latitude and longitude are the centroid of the positions of the sensors participating in the collision.
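The collision handling described above can be sketched as follows. This Python example is an illustration only: the row layout, the function name `resolve_collisions`, and the use of a plain arithmetic mean of latitude and longitude (a reasonable approximation over an area as small as a campus) are assumptions, not details confirmed by the original implementation.

```python
from collections import defaultdict

def resolve_collisions(rows, sensor_positions):
    """Replace rows that collide in the same minute by one centroid row.

    rows: iterable of (user_id, minute, sensor) tuples (hypothetical layout).
    sensor_positions: dict sensor -> (latitude, longitude).
    Returns a list of (user_id, minute, latitude, longitude, order), where
    order is the number of distinct sensors that saw the user that minute.
    """
    seen = defaultdict(set)
    for user, minute, sensor in rows:
        seen[(user, minute)].add(sensor)
    resolved = []
    for user, minute in sorted(seen):
        points = [sensor_positions[s] for s in seen[(user, minute)]]
        # Centroid of the sensors involved; order 1 keeps the sensor position.
        lat = sum(p[0] for p in points) / len(points)
        lon = sum(p[1] for p in points) / len(points)
        resolved.append((user, minute, lat, lon, len(points)))
    return resolved
```

Counting the resolved rows by their `order` field yields the kind of breakdown by collision order reported in Table 4.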

The second improvement consists in the incorporation of external information with the approximate position of the access points to which the users are connected. This information has been obtained from the Mylnikov Geo API [69], getting the position of all the ESSIDs registered throughout the year. In the cases where the user is connected and the approximate position of the access point is available, this new position is used instead of the position of the sensor that generates the row. This allows measuring the user’s position with more accuracy. In addition, it smooths the effect of using only the positions of the sensors, which causes the resulting possible positions to always lie inside the hull of the polygon formed by the positions of the sensors.

Figure 11: Monthly centroids heatmaps: (a) June; (b) August; (c) September.

This experiment is based on the concept of the user’s centroid. This centroid is the average position of a certain user over a certain period of time. Representing these points on the map reveals hints of the users’ behaviors over the studied time slice, based on the number of people in each area.
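A minimal Python sketch of the centroid computation is given below. The row layout and the function name `user_centroids` are hypothetical; the only behavior taken from the text is that the access-point position, when available, replaces the sensor position, and that the centroid is the average position of a user over the selected rows.

```python
from collections import defaultdict

def user_centroids(rows):
    """Average position of each user over the rows of a chosen time slice.

    rows: iterable of (user_id, sensor_pos, ap_pos) where each position is a
    (latitude, longitude) tuple and ap_pos is None when no access-point
    position is known (hypothetical layout). The access-point position is
    preferred over the sensor position when available.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for user, sensor_pos, ap_pos in rows:
        lat, lon = ap_pos if ap_pos is not None else sensor_pos
        acc = sums[user]
        acc[0] += lat
        acc[1] += lon
        acc[2] += 1
    return {u: (s[0] / s[2], s[1] / s[2]) for u, s in sums.items()}
```

Selecting the rows of a month or of a single hour before calling the function gives the points that a heatmap of centroids would display for that time slice.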

Figure 11 shows heatmaps of three different months. Figure 11(a) is June, a month belonging to the second semester of the academic year and the month when the final exams take place. Figure 11(b) is August, during the summer holidays, when there are no classes or exams and most of the professors, researchers, and staff are on vacation. Figure 11(c) represents September, the start of the semester.

Figure 11(b) confirms one of the facts extracted in the temporal analysis: on holidays, the number of users falls and most of the users present in the school keep studying at the library (the warmest point is located over the library and is much smaller than in other months).

Both Figures 11(a) and 11(c) show that building A is the area with the highest concentration of users. Even so, there are differences between the two figures. In Figure 11(a), the warmest area in the map is over the library. This is explained by the final exam period in June. The same observation can be repeated in other periods of time to discover when the students have exams. In Figure 11(c), the hot spots are on the classrooms. September is the beginning of the school year, when attendance at classes is greatest. Thus, this observation is an indicator of the level of students’ attendance at class.

The same type of analysis can be done with shorter time frames to obtain more detailed behaviors. Figure 12 shows the centroids heatmap grouped by hours on September 2nd.

A detailed analysis by hours like this allows observing the users’ movement throughout the day. The first row of maps, in Figures 12(a), 12(b), 12(c), and 12(d), shows the evolution of the users’ centroids at lunch time, between 12:00 and 16:00. In this transition, it can be seen that there are hot spots on the classrooms and the library at the beginning. On the next map, those centroids move to the cantina (see Figure 2), and in the last map they return to their original positions. The second row of maps in Figure 12 presents the start and the end of the activity time in the school. In Figures 12(e) and 12(f), it can be observed how the first users of the day go directly to classes. By contrast, Figures 12(g) and 12(h) reveal that users tend to be at the library at the end of the day.

Finally, Table 5 presents the count of the different users detected by each sensor throughout the year. Recalling that the total number of users obtained in Section 4.1 is 4653, the data in the table reveals that, over the year, most users have been seen by each sensor at some point. The two exceptions to this fact are the sensor of building C and the sensor of the laboratories of building B. These sensors cover the professors' offices and research laboratories, so it is unusual for students to stay in those areas.

Table 5: Sensor annual statistics.

| Sensor | Number of rows | Number of users |
|---|---|---|
| Library | 14166793 | 4674 |
| Entr D | 6818165 | 4621 |
| Std Tables | 5625974 | 4670 |
| Entr A | 4736237 | 4620 |
| Entr A Sec | 2352019 | 4579 |
| Entr B Sec | 2258618 | 4530 |
| Entr B Lab | 2229560 | 3630 |
| Entr C | 1469026 | 3093 |
| Entr B | 1248462 | 4127 |

Figure 12: Hourly centroids heatmaps of 2nd September: (a) 12:00 to 13:00; (b) 13:00 to 14:00; (c) 14:00 to 15:00; (d) 15:00 to 16:00; (e) 8:00 to 9:00; (f) 9:00 to 10:00; (g) 19:00 to 20:00; (h) 20:00 to 21:00.

4.4. Behavior Analysis. The third set of experiments we performed deals with the behaviors that each user follows throughout a single day. To obtain them, we grouped the data using a user-day key. For each key, a vector of 24 positions (one per hour) is created. In each position of this vector, we determine which sensor has detected this user most of the time. This vector therefore represents the route that the user followed throughout that day, hour by hour. Once the behavior vectors are obtained, the information of the day and the user is discarded to make a count of the most repeated behavior vectors. The dimensionality of these vectors makes the number of possible behaviors huge (theoretically $24^{10} \approx 6 \times 10^{13}$, or $14^{10} \approx 289$K million using only the activity hours), but it is known that users behave similarly, so a much lower number of behaviors can be expected, even though it will still be a high number.
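The construction of the behavior vectors can be sketched in a few lines of Python. This is only an illustrative sketch, not the processing pipeline used in the study: the record layout `(mac, timestamp, sensor)`, the `EMPTY` marker, the function names, and the sample dates are assumptions, and "detected most of the time" is approximated here by the number of detections within each hour.

```python
from collections import Counter, defaultdict
from datetime import datetime

EMPTY = "-"  # hypothetical marker for an hour without any detection


def behavior_vectors(records):
    """Build one 24-slot vector per (user, day) key from (mac, timestamp, sensor) rows."""
    # counts[(user, day)][hour] is a Counter of the sensors seen in that hour
    counts = defaultdict(lambda: defaultdict(Counter))
    for mac, ts, sensor in records:
        counts[(mac, ts.date())][ts.hour][sensor] += 1

    vectors = {}
    for key, per_hour in counts.items():
        vector = [EMPTY] * 24
        for hour, sensors in per_hour.items():
            # Keep the sensor with the most detections in that hour
            vector[hour] = sensors.most_common(1)[0][0]
        vectors[key] = tuple(vector)
    return vectors


def most_repeated(vectors, top=20):
    """Discard the user-day key and count identical behavior vectors."""
    return Counter(vectors.values()).most_common(top)


# Toy records (made-up MACs and dates)
records = [
    ("aa:01", datetime(2017, 9, 2, 9, 5), "Library"),
    ("aa:01", datetime(2017, 9, 2, 9, 40), "Library"),
    ("aa:01", datetime(2017, 9, 2, 9, 50), "Entr A"),
    ("aa:01", datetime(2017, 9, 2, 10, 15), "Library"),
    ("bb:02", datetime(2017, 9, 2, 9, 10), "Library"),
    ("bb:02", datetime(2017, 9, 2, 10, 30), "Library"),
]
vectors = behavior_vectors(records)
ranking = most_repeated(vectors)
```

In this toy input both devices end up with the same vector (library from 9:00 to 11:00), so the ranking contains a single behavior that is counted twice.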

Table 6 presents the 20 most repeated behaviors throughout the year. A total of 285K behaviors are detected, of which 139K are unique. The first 500 most frequent behaviors group 25% of the total behaviors. Table 6 shows only the part corresponding to the activity hours of the school, which, as observed in the temporal analysis, cover from 8:00 am to 10:00 pm. As can be seen, the majority of the most repeated behaviors are periods of between two and five hours of stay in the library. It is necessary to expand the analysis to the top 20 to observe the class attendance behaviors. It is easy to appreciate that the different behaviors usually fall either in the morning or in the afternoon, with lunchtime from 1:00 pm to 3:00 pm, which means that the majority of users go either in the morning or in the afternoon, but they do not spend all day at school.
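Coverage figures such as the share of all behaviors grouped by the first 500 can be obtained from a frequency count. The following minimal sketch assumes the behaviors are already tallied in a `collections.Counter`; the function name `coverage` and the toy labels are hypothetical.

```python
from collections import Counter


def coverage(counter, k):
    """Fraction of all observations covered by the k most frequent items."""
    total = sum(counter.values())
    top = sum(count for _, count in counter.most_common(k))
    return top / total if total else 0.0


# Toy data: 10 observed behaviors, 4 of them unique
observed = Counter({"lib_morning": 5, "lib_afternoon": 3, "class_morning": 1, "entr_a": 1})
share_top2 = coverage(observed, 2)  # (5 + 3) / 10
```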

Table 6: Top 20 most repeated behaviors. The columns of the table are the 14 hourly slots of the activity hours, from 8:00-9:00 to 21:00-22:00; for each behavior, the sensor and the number of hourly slots in which it appears are listed.

| Rank | Sensor | Hourly slots |
|---|---|---|
| 1 | Lib | 5 |
| 2 | Lib | 4 |
| 3 | EntA | 1 |
| 4 | Lib | 3 |
| 5 | Lib | 4 |
| 6 | Lib | 1 |
| 7 | EntA | 1 |
| 8 | Lib | 6 |
| 9 | Std | 14 |
| 10 | EntA | 1 |
| 11 | EntA | 1 |
| 12 | Lib | 5 |
| 13 | Lib | 2 |
| 14 | Lib | 2 |
| 15 | EntA | 1 |
| 16 | EntA | 2 |
| 17 | EntA | 1 |
| 18 | Lib | 2 |
| 19 | SecA | 4 |
| 20 | EntA | 1 |

Lib = Library; EntA = Entr A; SecA = Entr A Sec; Std = Std Tables.

Another quite obvious observation is that many of the behaviors obtained are very similar to each other (e.g., going to the library from 9:00 am to 2:00 pm or going to the library from 9:00 am to 1:00 pm). In order to lower this redundancy, we performed a clustering procedure on them. Each behavior can be understood as a categorical vector of length 24 in which the categories correspond to the 9 possible sensors of the data set plus the empty category. The chosen clustering algorithm is Proximus [70], due to its simplicity, efficiency, scalability, and the reliability of its results. The algorithm works with binary vectors and creates clusters based on the Hamming distance (the number of bits that differ between two binary vectors). A vector is chosen as the center of the cluster, and other vectors whose distance to the center is smaller than the maximum cluster radius are added to that cluster.
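The role of the Hamming distance and of the maximum cluster radius can be illustrated with a small sketch. Note that this is not Proximus itself: it is a simplified greedy grouping, written only to show how a center and a radius determine cluster membership. The function names and the toy vectors are assumptions.

```python
def hamming(a, b):
    """Number of positions at which two equal-length binary vectors differ."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x != y for x, y in zip(a, b))


def radius_clusters(vectors, max_radius):
    """Greedy grouping: a vector joins the first center closer than max_radius,
    otherwise it becomes the center of a new cluster."""
    clusters = []  # list of (center, members) pairs
    for v in vectors:
        for center, members in clusters:
            if hamming(center, v) < max_radius:
                members.append(v)
                break
        else:
            clusters.append((v, [v]))
    return clusters


vectors = [
    (1, 1, 1, 1, 0, 0),
    (1, 1, 1, 0, 0, 0),  # one bit away from the first vector
    (0, 0, 0, 0, 1, 1),
    (0, 0, 0, 1, 1, 1),  # one bit away from the third vector
]
clusters = radius_clusters(vectors, max_radius=2)
```

With a radius of 2, the four toy vectors collapse into two clusters of two members each, which mirrors how near-identical behaviors (for instance, library stays that differ by one hour) end up grouped together.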

The behaviors are expressed as categorical vectors, so it is necessary to transform them into binary vectors to be used in Proximus. The transformation shown in Equation (1) is proposed. This transformation is simple and also reversible, which allows for the recovery of the original behaviors after clustering.

\[
\begin{aligned}
B &= \{h_0, \ldots, h_{23}\}, \quad h_i \in \{0, a_1, \ldots, a_9\}, \\
\vec{B} &= \{\delta_{0,a_1}, \ldots, \delta_{0,a_9}, \ldots, \delta_{23,a_1}, \ldots, \delta_{23,a_9}\}, \\
\delta_{i,a_j} &=
\begin{cases}
0 & \text{if } h_i \neq a_j, \\
1 & \text{if } h_i = a_j,
\end{cases}
\end{aligned}
\tag{1}
\]

where $a_1, \ldots, a_9$ are the nine sensors, $B$ is the behavior vector described above, and $\vec{B}$ is the transformed vector of length $24 \times 9$, made of subgroups of 9 samples, each subgroup associated with one hour. Every sample in a subgroup is 0, except for the one at the index of the sensor that is active in that hour, which is set to 1.
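The transformation of Equation (1) and its inverse can be sketched as follows. The sensor labels in `SENSORS`, the `EMPTY` value, and the function names are placeholders chosen for illustration; only the encoding scheme itself (24 groups of 9 bits, at most one bit set per group) comes from the text.

```python
SENSORS = ["a1", "a2", "a3", "a4", "a5", "a6", "a7", "a8", "a9"]  # placeholder names
EMPTY = 0  # value of h_i when no sensor detected the user in hour i


def to_binary(behavior):
    """Map a 24-slot categorical vector to a 24 * 9 binary vector."""
    if len(behavior) != 24:
        raise ValueError("expected one entry per hour")
    bits = []
    for h in behavior:
        # One group of 9 bits per hour; the bit of the active sensor is 1
        bits.extend(1 if h == a else 0 for a in SENSORS)
    return bits


def to_behavior(bits):
    """Inverse mapping: recover the categorical vector from the binary one."""
    n = len(SENSORS)
    behavior = []
    for i in range(0, len(bits), n):
        group = bits[i:i + n]
        behavior.append(SENSORS[group.index(1)] if 1 in group else EMPTY)
    return behavior


behavior = [EMPTY] * 24
behavior[9] = behavior[10] = "a1"
behavior[13] = "a4"
bits = to_binary(behavior)
```

Here `to_behavior(bits)` returns the original vector. A cluster center produced by a clustering step is not guaranteed to have exactly one bit set per hour; this sketch simply takes the first set bit of each group, which is one possible convention rather than the one used in the study.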

Table 7 shows the results of applying the Proximus clustering to the behaviors found. The first column is the center of the cluster, the second one is the number of distinct behaviors that are grouped in that cluster (cluster components), and the third is the total number of recorded behaviors that fall within that cluster (behaviors count). The results are presented ordered by the number of behaviors included in the cluster. In total, 45707 clusters are obtained. The first 300 group 50% of the behaviors recorded throughout the year. The most important cluster behaviors in Table 7 can be understood as follows:


Table 7: Behavior clusters.

| | Cluster center | Cluster components | Behaviors count |
|---|---|---|---|
| 1 | Entr A Sec from 9:00 to 14:00 | 415 | 4963 |
| 2 | Entr B Sec from 10:00 to 13:00 | 458 | 3756 |
| 3 | Entr B Sec from 15:00 to 19:00 | 471 | 2924 |
| 4 | Library from 9:00 to 13:00 | 256 | 2870 |
| 5 | Library from 10:00 to 12:00 | 176 | 2396 |
| 6 | Library from 18:00 to 21:00 | 244 | 2231 |
| 7 | Entr A 13:00 | 787 | 1979 |
| 8 | Library 12:00 | 563 | 1894 |
| 9 | Library from 18:00 to 19:00 and from 20:00 to 21:00 | 175 | 1853 |
| 10 | Entr B Lab 11:00 to 20:00 | 568 | 1831 |

(i) Clusters 1, 2, and 3: students attending classes. These sensors cover the main classrooms in the school, and the intervals match the class schedule.

(ii) Clusters 4, 5, 6, and 9: students at the library.

(iii) Cluster 7: users who are usually outside the sensors' coverage areas, arriving or leaving.

(iv) Cluster 10: laboratory equipment and professors. This sensor covers some of the professors' offices and laboratories, which usually have some laboratory equipment connected during work hours.

5. Validation of Occupancy Estimation Based on Wi-Fi Tracking

Although, as already pointed out in Section 4.3, it was tested at the very first stage of the deployment that the Wi-Fi sensors properly locate and track well-known MAC addresses, it was still necessary to validate the accuracy of the system for estimating occupancy (as it can be distorted by the aforementioned fact that a single user can carry several devices connected to Wi-Fi networks). As no ground-truth data was available for the full set of buildings and only some data was found for the library, we centered our validation efforts on comparing our data with the available ones: if we can trust our results in that area, then we can extend our trust to the rest of the areas, for which no well-known data are available.

The library of ETSIT has 408 study seats, and it offers a web service to check the number of available seats at a given moment [71]. This system is based on two sources of information: a person who counts the empty seats every opening hour from Friday to Sunday, and a video camera located at the main entrance of the library that counts the number of people entering or leaving at 15-minute intervals. The human system provides a ground truth about the number of occupied positions, but this measure is very different from the actual number of people in the library, since a common situation, especially during examination terms, is that students place their study material at the seat to reserve it while they are not in the library. This situation is a problem for the library staff, and therefore they installed the video camera system to count the students' entrances and exits.

This system generates an estimation of the number of people in the library by adding to the previous measure the number of people detected entering and subtracting the number of people detected leaving. The system is not perfect, and in most cases it carries an accumulated error that grows over the day in the estimation of the number of people in the library. The total error can be calculated clearly at the end of the day, when the library closes and the number of people inside is supposed to be zero. In summer, there is a situation that aggravates this error and, consequently, the measure provided by this system: due to the rise in temperature, the back door of the library is opened to improve ventilation, which allows students to exit through it, although they must continue entering through the main door. The camera does not count students leaving through this back door.
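The counting scheme of the camera system, and the way its accumulated error shows up at closing time, can be illustrated with a short sketch. The function name and the toy counts are assumptions; the only elements taken from the text are the update rule (previous estimate plus entries minus exits) and the fact that the true occupancy at closing is supposed to be zero.

```python
def camera_occupancy(entries, exits, initial=0):
    """Running occupancy estimate from per-interval entry and exit counts."""
    if len(entries) != len(exits):
        raise ValueError("entries and exits must cover the same intervals")
    estimate, series = initial, []
    for n_in, n_out in zip(entries, exits):
        estimate += n_in - n_out
        series.append(estimate)
    return series


# Toy day with six 15-minute intervals; three exits are never counted
# (for example, people leaving through a door the camera does not see).
entries = [10, 5, 0, 2, 0, 0]
exits = [0, 2, 4, 3, 3, 2]
series = camera_occupancy(entries, exits)
closing_error = series[-1]  # the library is assumed to be empty at closing
```

Because every missed exit stays in the running total, the error can only be observed, not corrected, until the counter is reset.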

The library staff provided us with the data collected by the two systems (human and camera) between June 5th and 30th. These measurements can be compared with those obtained by the Wi-Fi sensor installed at the library to validate them. For this test, all the data collected by the sensor is used, without applying the previously explained filtering of the MACs of sporadic users.

Figure 13 shows the data collected by the three systems on Sunday, June 5th. This is the first day with data from the three sources. Other days in which these three sources are present show the same trends. It is clear that there is a divergence between the human system observation and the rest of the data. The graph of the human system shows that the number of occupied seats increases in the first hours up to the maximum and remains steady until the end of the day, without being affected by the behavior of the users at lunchtime. However, this effect is reflected in the camera system and Wi-Fi tracking measurements. We observed that, in general, the number of people accounted for by the camera system is below the Wi-Fi tracking system measure, although the proportion is maintained over time. Finally, the figure shows the cumulative error effect of the camera system, which at the end of the day still reports 65 people in the library.

Figure 13: Measure of library occupation using Wi-Fi, camera, and human systems (number of users versus hour, from 10:00 to 22:00).

Figure 14: Measures every 15 minutes of library occupation during June: (a) Wi-Fi records; (b) camera records; (c) difference between Wi-Fi and camera.

To better study the relation between the camera system and the Wi-Fi tracking system, we generated a detailed visualization that allows us to observe the data of the whole month in a single figure. Figure 14 is composed of 3 subfigures; each of them is a matrix of colored cells in which the rows represent a full day and the columns are divisions of 15 minutes. The color intensity is proportional to the number of users measured in each interval, being more intense when more users are detected. Figure 14(a) shows the measurements of the Wi-Fi sensor. The behavior is the one observed in previous sections of this document. Figure 14(b) shows the measurements of the camera system. During the first half of the day, the same trend as in Figure 14(a) can be observed, but in the afternoon and night the accumulated error begins to be noticeable. This error reaches a maximum of 378 people, with an average of 139 people at the end of the day, while the Wi-Fi tracking system registers a maximum of 10 and an average of 7 at the end of the day (these are machines that are still turned on when the library is empty of people). Finally, Figure 14(c) shows the difference between the two previous ones, using green when the error is positive (the Wi-Fi tracking system registers a higher value than the camera system) and red otherwise. This figure validates the measurements obtained by the Wi-Fi tracking system, since the difference with the measure of the camera system in the first half of the day is small (an average of 27 people, counting the data from 8:00 to 16:00), whereas in the afternoon the error accumulated by the camera system causes the average difference to grow to 83 people between 16:00 and 22:00.
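The comparison between the two series can be sketched as follows. This is a hedged illustration with made-up samples: the tuple layout `(time_of_day, wifi_count, camera_count)` and the function names are assumptions, and the average is taken here over absolute differences, which is one possible reading of the averages reported in the text.

```python
from datetime import time


def cell_color(wifi, camera):
    """Green when Wi-Fi registers a higher value than the camera, red otherwise."""
    return "green" if wifi - camera > 0 else "red"


def mean_abs_difference(samples, start, end):
    """Average |wifi - camera| over the samples with start <= t < end."""
    diffs = [abs(wifi - cam) for t, wifi, cam in samples if start <= t < end]
    return sum(diffs) / len(diffs) if diffs else None


# Toy samples: (time of day, Wi-Fi count, camera count)
samples = [
    (time(9, 0), 120, 100),
    (time(12, 0), 200, 190),
    (time(17, 0), 150, 220),
    (time(21, 0), 40, 130),
]
morning = mean_abs_difference(samples, time(8, 0), time(16, 0))   # (20 + 10) / 2
evening = mean_abs_difference(samples, time(16, 0), time(22, 0))  # (70 + 90) / 2
```

In the toy data the gap is small in the first window and large in the second, the same qualitative pattern that the accumulated camera error produces in the afternoon.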

The Wi-Fi sensor of the library was not placed for the specific purpose of counting the number of people in the library, and in consequence its coverage area can detect devices that are outside the library. This explains why in some cases it registers a measurement greater than that of the camera system. In any case, the Wi-Fi sensor, with its limitations, registers a measure very similar to that of the camera-based people counting system, while avoiding the cumulative error of that system and being much cheaper (tens of euros against thousands of euros) and less intrusive than a video camera. Currently, since the camera system is already installed, the Wi-Fi sensor measures can be used as a maximum, which would limit the error made by this system. In summary, the usage of a Wi-Fi sensor to count people in the library provides accurate results despite the potential errors induced by the sensing period, collisions, the ratio between users and MACs seen, and the data processing performed. These results invite us to trust that the results of our analyses for the rest of the areas in the campus are also correct.

6. Discussion and Conclusions

In the execution of this case study, we have learned some lessons about the limitations we faced that are worth taking into account for similar systems and analyses. First, there is the key issue of the sensor deployment: it is not just a question of density (number of sensors per unit area), but of adapting its design to the topology of the place to be monitored. So, for example, it is crucial not only to have sensors at the entries and exits of buildings but also inside them, as in the considered deployment, where there are sensors placed at the entrances and exits of the buildings but also in other especially relevant places, such as the library or the work-in-group area.

A complete coverage of indoor devices, without collisions and for the complete timespan of the stay indoors, would require a large deployment with many sensors able to cover all corners. This is not suitable in this context; instead, we relied on 9 sensors, some at gates and others in large rooms, which proved to be enough to check whether a device is within reach of any of them. However, some assumptions had to be made: we considered as valid behaviors only those that spent a significant amount of time once they had been seen (to remove transient behaviors and those of people walking outside the buildings); also, we discarded behaviors that did not appear enough times in the yearly timespan. On the other hand, the topology of these buildings did not impose severe restrictions on the sensors' coverage. In fact, we found several devices seen by different sensors at once. In summary, we can conclude that the finer the spatial granularity (more sensors), the larger the set of different behaviors, so taking raw data from sensors would lead to an explosion of states that would render meaningless results.

The gathered data is another key issue: some works in the literature obtained signal strength measurements every few seconds, allowing them to identify indoor trajectories. This was not our case, and in fact this proved to be very limiting. Thus, the precision of our analysis is limited by the regions covered by each sensor, which prevents us from analyzing any kind of movement within regions. Nevertheless, as has been seen, relevant results can still be obtained without the increase in energy consumption and the risk of flooding the school communications network that the other approach would entail.

As regards the analysis of data to identify users' behaviors, we have to indicate that a long observation period is a must. Obtaining data for a complete work cycle (in this case, a full school year) helps in discovering common behaviors that happen in a university. Using long observation times can also help in detecting erroneous or atypical operation of the sensors, as happened in the first months of the year in this case. Getting even larger observation datasets would reinforce the behaviors we have found (e.g., seasonality), but the chance of discovering new ones is negligible, since the expected behaviors in the campus will be periodic with the school year.

The usage of big data platforms for analysis, while not mandatory, eases the management of large datasets and the execution of iterative studies on the data. The preprocessing work carried out allowed us to clean and filter our data. For example, some discovered behaviors are too regular and extended over time, which may correspond to servers or machines, which we could then filter out.

The temporal analysis has revealed some behaviors that are maintained throughout the year and others that occur occasionally. The work has focused on the study of behaviors that are repeated and maintained throughout the year. A closer view, such as the analysis per hour, shows the hours of activity, nocturnal patterns, or movements at mealtime. In the daily view, the difference in activity between work days and weekends or holidays is clearly seen. Visualization has been a key technique to detect and understand these temporal patterns.

The spatial analysis revealed that a problem of collision happens in the user detection, but its effect was mitigated by the use of innovative algorithms and techniques, such as the calculation of centroids and the combination of sensor information with external sources of information (i.e., semantic trajectories), so that a much more precise positioning of the users is achieved than with the exclusive use of the sensors. The visualization of centroids fostered the visual understanding of complex data, such as the position of each user over a month, and the superposition of the centroids in a heatmap allowed knowing the movements of the groups of people and the occupation of the different zones.


Furthermore, we have found dominant users' behaviors as the most repeated behaviors registered by the sensors throughout the year. The number of behaviors found was huge, but the application of the Proximus clustering algorithm reduced this number to a manageable amount. Then, the study of the obtained clusters revealed that the most frequent behaviors coincide with what can be expected from a college building: researchers working in laboratories and students attending classes or studying in the library.

Although some of the discovered behaviors and patterns can be seen as common knowledge, it is worth stressing that they do bring value, since they represent numerical evidence that supports decision making (e.g., someone may think that the Wi-Fi access in a given area does not work properly because it is always overcrowded, but numerical evidence is needed to appropriately justify the investment of increasing the number of APs of the corporate WLAN in that area to improve the service). In addition, such well-known patterns, when obtained automatically by processing the available data, become baseline models which can be used to detect anomalies or atypical situations, as is common practice in unsupervised machine learning.

Lastly, we have validated the accuracy of using Wi-Fi tracking for occupancy estimation, comparing it with the library staff manual counting (considered as ground truth) and with a video camera system installed at the library main entrance. As a main finding, Wi-Fi tracking has proved to be more accurate than the video camera system, in addition to being far cheaper. As a matter of fact, the library staff currently prefers the occupancy estimation based on Wi-Fi tracking over the one based on the video camera system. Nevertheless, the accuracy of the Wi-Fi tracking system can be further improved by correlating MAC addresses detected in the same places over sufficiently long periods of time and considering only one of them, thus mitigating the issue related to the fact that a single person can bring several devices connected to Wi-Fi networks.
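One possible way to correlate MAC addresses that are repeatedly seen in the same places is sketched below. This improvement is only proposed in the text, so everything here is an assumption made for illustration: the presence sets of `(time slot, sensor)` pairs, the Jaccard similarity, the 0.9 threshold, and the union-find grouping.

```python
from itertools import combinations


def jaccard(a, b):
    """Overlap between two sets, from 0.0 (disjoint) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def merge_colocated(presence, threshold=0.9):
    """Group MAC addresses whose (time slot, sensor) sets overlap strongly.

    `presence` maps each MAC to the set of (slot, sensor) pairs where it was seen."""
    parent = {mac: mac for mac in presence}

    def find(x):
        # Union-find lookup with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(presence, 2):
        if jaccard(presence[a], presence[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for mac in presence:
        groups.setdefault(find(mac), set()).add(mac)
    return list(groups.values())


presence = {
    "phone": {(9, "Library"), (10, "Library"), (13, "Entr A")},
    "laptop": {(9, "Library"), (10, "Library"), (13, "Entr A")},
    "other": {(9, "Entr B"), (10, "Entr B")},
}
people = merge_colocated(presence)
```

Each resulting group would then be counted as a single person. The pairwise comparison is quadratic in the number of devices, so a real dataset would need some form of blocking (for example, comparing only devices seen by the same sensor on the same day).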

Besides this, the Wi-Fi tracking system is currently used by the library staff to perform more sophisticated studies, such as figuring out the percentage of students from the different schools of the university who come to study at the library of the Telecommunications Engineering School during the weekend. Figure 15 shows the results of such an analysis, which is an example of how this kind of IoT system can help solve real-life problems and improve the operation of already running services.

Figure 15: Users who only go to the library of the Telecommunication School on weekends, classified by their school (CIVIL, INDUS, NAVAL, SPORT, ARQUI, FORES, EDIF, AERO, TELECOM).

To summarize, we have studied a one-year Wi-Fi tracking dataset obtained from a reduced set of low-cost sensors with limited capabilities, deployed on an actual university campus that receives around 4000 people every day. We have processed the data in order to identify traces of Wi-Fi-enabled mobile devices, which are identified as people moving in the campus buildings, and then we have extracted people's stays, movements, and common behaviors. The obtained results represent numerical evidence that illustrates how a low-cost Wi-Fi tracking system can be used in real-life conditions to improve or optimize the operation of the monitored premises. These results can allow dimensioning appropriately the WLAN infrastructure or the canteen personnel, or detecting anomalous situations in real time. Furthermore, the data from the people flow monitoring system is currently being used together with the data from the environmental monitoring system to try to reduce the environmental footprint of the school [72]. In addition, the Wi-Fi tracking system is actually used by the library staff in their day-to-day activity, which illustrates the value that this kind of IoT infrastructure can bring to real-life problems and services.

Data Availability

The dataset with the Wi-Fi sensor records used to support the findings of this study is available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work has been partly funded by Universidad Politécnica de Madrid through the project RES2+U (Responsables, Sostenibles y Universitarios) (http://blogs.upm.es/res2masu). The work performed by José M. Navarro was funded by the Ministerio de Educación, Cultura y Deporte de España under Grant no. FPU 14/03209.

References

[1] “2018 Revision of World Urbanization Prospects - Multimedia Library - United Nations Department of Economic and Social Affairs,” https://www.un.org/development/desa/publications/2018-revision-of-world-urbanization-prospects.html.

[2] M. Pallot, “Engaging Users into Research and Innovation: The Living Lab Approach as a User Centred Open Innovation Ecosystem,” https://web.archive.org/web/20120509081658/http://www.cwe-projects.eu/pub/bscw.cgi/1760838?id=715404_1760838.

[3] E. Almirall and J. Wareham, “Living Labs: arbiters of mid- and ground-level innovation,” Technology Analysis & Strategic Management, vol. 23, no. 1, pp. 87–102, 2011.

[4] “TU Delft Green office,” https://www.tudelft.nl/sustainability.

[5] “UBC Campus as a living laboratory,” https://sustain.ubc.ca/our-commitment/campus-living-lab.

[6] “Harvard Sustainability,” https://green.harvard.edu.

[7] “UPM City of the Future initiative,” http://blogs.upm.es/cityofthefuture-upm/en/initiative.

[8] M. Álvarez-Campana, G. López, E. Vázquez, V. A. Villagrá, and J. Berrocal, “Smart CEI Moncloa: An IoT-based platform for people flow and environmental monitoring on a Smart University Campus,” Sensors, vol. 17, no. 12, 2017.

[9] M. B. Kjærgaard, H. Blunck, T. Godsk, T. Toftkjær, D. L. Christensen, and K. Grønbæk, “Indoor positioning using GPS revisited,” Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics): Preface, vol. 6030, pp. 38–56, 2010.

[10] A. Belmonte-Hernández, G. Hernández-Peñaloza, F. Álvarez, and G. Conti, “Adaptive Fingerprinting in Multi-Sensor Fusion for Accurate Indoor Tracking,” IEEE Sensors Journal, vol. 17, no. 15, pp. 4983–4998, 2017.

[11] M. S. Kristoffersen, J. V. Dueholm, R. Gade, and T. B. Moeslund, “Pedestrian counting with occlusion handling using stereo thermal cameras,” Sensors, vol. 16, no. 1, 2016.

[12] L. Zheng, X. Ruan, Y. Chen, and M. Huang, “Shadow removal for pedestrian detection and tracking in indoor environments,” Multimedia Tools and Applications, vol. 76, no. 18, pp. 18321–18337, 2017.

[13] T. Li, H. Chang, M. Wang, B. Ni, R. Hong, and S. Yan, “Crowded scene analysis: a survey,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 25, no. 3, pp. 367–386, 2015.

[14] F. Adib and D. Katabi, “See through walls with WiFi!,” in Proceedings of the Annual Conference of the ACM Special Interest Group on Data Communication on the Applications, Technologies, Architectures, and Protocols for Computer Communication, ACM SIGCOMM 2013, pp. 75–86, China, August 2013.

[15] Z. Kabelac, D. Katabi, and R. C. Miller, “3D Tracking via Body Radio Reflections,” in Proceedings of the 11th USENIX Conference on Networked Systems Design and Implementation, pp. 317–329, 2014.

[16] F. Adib, Z. Kabelac, and D. Katabi, “Multi-person localization via RF body reflections,” in Proceedings of the 12th USENIX Symposium on Networked Systems Design and Implementation, NSDI 2015, pp. 279–292, USA, May 2015.

[17] Z.-A. Deng, G. Wang, D. Qin, Z. Na, Y. Cui, and J. Chen, “Continuous indoor positioning fusing WiFi, smartphone sensors and landmarks,” Sensors, vol. 16, no. 9, 2016.

[18] R. A. Becker, R. Caceres, K. Hanson et al., “A tale of one city: Using cellular network data for urban planning,” IEEE Pervasive Computing, vol. 10, no. 4, pp. 18–26, 2011.

[19] J. E. Mallah, F. Carrino, O. A. Khaled, and E. Mugellini, “Crowd monitoring: critical situations prevention using smartphones and group detection,” Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics): Preface, vol. 9189, pp. 496–505, 2015.

[20] J. Kuang, X. Niu, and X. Chen, “Robust Pedestrian Dead Reckoning Based on MEMS-IMU for Smartphones,” Sensors, vol. 18, no. 5, p. 1391, 2018.

[21] A. Kurkcu and K. Ozbay, “Estimating Pedestrian Densities, Wait Times, and Flows with Wi-Fi and Bluetooth Sensors,” Transportation Research Record, vol. 2644, no. 1, pp. 72–82, 2017.

[22] A. Kotanen, M. Hännikäinen, H. Leppäkoski, and T. Hämäläinen, “Positioning with IEEE 802.11b wireless LAN,” in Proceedings of the 14th IEEE 2003 International Symposium on Personal, Indoor and Mobile Radio Communications, vol. 3, pp. 2218–2222, Beijing, China, 2003.

[23] W. Ho, A. Smailagic, D. P. Siewiorek, and C. Faloutsos, “An adaptive two-phase approach to WiFi location sensing,” in Proceedings of the 4th Annual IEEE International Conference on Pervasive Computing and Communications Workshops, PerCom Workshops 2006, pp. 452–456, Italy, March 2006.

[24] F. Evennou and F. Marx, “Advanced integration of WiFi and inertial navigation systems for indoor mobile positioning,” EURASIP Journal on Applied Signal Processing, vol. 2006, Article ID 86706, 11 pages, 2006.

[25] J. Rekimoto, T. Miyaki, and T. Ishizawa, “LifeTag: WiFi-based continuous location logging for life pattern analysis,” LNCS, vol. 4718, pp. 35–49, 2007.

[26] J. A. Besada, A. M. Bernardos, P. Tarrío, and J. R. Casar, “Analysis of tracking methods for wireless indoor localization,” in Proceedings of the 2nd International Symposium on Wireless Pervasive Computing (ISWPC ’07), pp. 492–497, February 2007.

[27] O. Woodman and R. Harle, “Pedestrian localisation for indoor environments,” in Proceedings of the 10th International Conference on Ubiquitous Computing (UbiComp ’08), pp. 114–123, Seoul, Republic of Korea, September 2008.

[28] F. Aloul, A. Sagahyroon, A. Al-Shami, I. Al-Midfa, and R. Moutassem, “Using mobiles for on campus location tracking,” in Proceedings of the 7th International Conference on Advances in Mobile Computing and Multimedia, MoMM2009, pp. 231–235, Malaysia, December 2009.

[29] S. Woo, S. Jeong, E. Mok et al., “Application of WiFi-based indoor positioning system for labor tracking at construction sites: A case study in Guangzhou MTR,” Automation in Construction, vol. 20, no. 1, pp. 3–13, 2011.

[30] N. Le Dortz, F. Gain, and P. Zetterberg, “WiFi fingerprint indoor positioning system using probability distribution comparison,” in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP ’12), pp. 2301–2304, March 2012.

[31] Z. Chen, H. Zou, H. Jiang, Q. Zhu, Y. C. Soh, and L. Xie, “Fusion of WiFi, smartphone sensors and landmarks using the Kalman filter for indoor localization,” Sensors, vol. 15, no. 1, pp. 715–732, 2015.

[32] Y. Shu, C. Bo, G. Shen, C. Zhao, L. Li, and F. Zhao, “Magicol: indoor localization using pervasive magnetic field and opportunistic WiFi sensing,” IEEE Journal on Selected Areas in Communications, vol. 33, no. 7, pp. 1443–1457, 2015.

[33] A. Danalet, B. Farooq, and M. Bierlaire, “A Bayesian approach to detect pedestrian destination-sequences from WiFi signatures,” Transportation Research Part C: Emerging Technologies, vol. 44, pp. 146–170, 2014.


[34] O. Czogalla and S. Naumann, “Pedestrian indoor navigation for complex public facilities,” in Proceedings of the 2016 International Conference on Indoor Positioning and Indoor Navigation, IPIN 2016, pp. 1–8, Spain, October 2016.

[35] L. Schauer, P. Marcus, and C. Linnhoff-Popien, “Towards feasible Wi-Fi based indoor tracking systems using probabilistic methods,” in Proceedings of the 2016 International Conference on Indoor Positioning and Indoor Navigation, IPIN 2016, pp. 1–8, Spain, October 2016.

[36] F. Meneses and A. Moreira, “Large scale movement analysis from WiFi based location data,” in Proceedings of the 2012 International Conference on Indoor Positioning and Indoor Navigation, IPIN 2012, Australia, November 2012.

[37] B. Bonne, A. Barzan, P. Quax, and W. Lamotte, “WiFiPi: involuntary tracking of visitors at mass events,” in Proceedings of the IEEE 14th International Symposium on a World of Wireless, Mobile and Multimedia Networks (WoWMoM ’13), pp. 1–6, Madrid, Spain, June 2013.

[38] M. B. Kjaergaard, M. Wirz, D. Roggen, and G. Troster, “Mobile sensing of pedestrian flocks in indoor environments using WiFi signals,” in Proceedings of the 10th IEEE International Conference on Pervasive Computing and Communications (PerCom ’12), pp. 95–102, Lugano, Switzerland, March 2012.

[39] S. Sendra, M. Garcia, C. Turro, and J. Lloret, “People mobility behaviour study in a university campus using WLANs,” in Proceedings of the 3rd International Conference on Mobile Ubiquitous Computing, Systems, Services, and Technologies, UBICOMM 2009, pp. 124–129, Malta, October 2009.

[40] J. Scheuner, G. Mazlami, D. Schoni et al., “Probr - A Generic and Passive WiFi Tracking System,” in Proceedings of the 41st IEEE Conference on Local Computer Networks, LCN 2016, pp. 495–502, UAE, November 2016.

[41] Y. Li, S. Williams, B. Moran, and A. Kealy, “Quantized RSS Based Wi-Fi Indoor Localization with Room Level Accuracy,” in Proceedings of the International Global Navigation Satellite Systems 2018, 2018.

[42] A. Alessandrini, C. Gioia, F. Sermi, I. Sofos, D. Tarchi, and M. Vespe, “WiFi positioning and Big Data to monitor flows of people on a wide scale,” in Proceedings of the 25th European Navigation Conference, ENC 2017, pp. 322–328, Switzerland, May 2017.

[43] LVuKNahrstedt S Retika and IGupta ldquoJoint bluetoothwifiscanning framework for characterizing and leveraging peoplemovement in university campusrdquo in Proceedings of the13th ACMInternational Conference on Modeling Analysis and Simulationof Wireless and Mobile Systems (MSWiM rsquo10) pp 257ndash265October 2010

[44] M. Garcia, S. Sendra, C. Turro, and J. Lloret, "User's Macro and Micro-mobility Study using WLANs in a University Campus," International Journal On Advances in Internet Technology, vol. 4, no. 1, pp. 37–46, 2011.

[45] Y. Xu, I. D. G. Groeneveld, R. Sulzer, E. Theocharous, O. T. Willems, and M. S. Tryfona, "Determine activity based on the classified identity of users by using Wi-Fi monitoring," Geomatics Synthesis Group Project Report, 2016.

[46] A. Danalet, L. Tinguely, M. D. Lapparent, and M. Bierlaire, "Location choice with longitudinal WiFi data," Journal of Choice Modelling, vol. 18, pp. 1–17, 2016.

[47] M. Zhou, K. Sui, M. Ma, Y. Zhao, D. Pei, and T. Moscibroda, "MobiCamp: A campus-wide testbed for studying mobile physical activities," in Proceedings of the 3rd International Workshop on Physical Analytics, WPA 2016, pp. 1–6, Singapore, 2016.

[48] A. Fernandez-Ares, A. M. Mora, M. G. Arenas et al., "Studying real traffic and mobility scenarios for a Smart City using a new monitoring and tracking system," Future Generation Computer Systems, vol. 76, pp. 163–179, 2017.

[49] A. E. Redondi and M. Cesana, "Building up knowledge through passive WiFi probes," Computer Communications, vol. 117, pp. 1–12, 2018.

[50] L. Huang, K. Matsuura, H. Yamane, and K. Sezaki, "Enhancing wireless location privacy using silent period," in Proceedings of the 2005 IEEE Wireless Communications and Networking Conference, WCNC 2005: Broadband Wireless for the Masses - Ready for Take-off, pp. 1187–1192, USA, March 2005.

[51] J. Martin, T. Mayberry, C. Donahue et al., "A Study of MAC Address Randomization in Mobile Devices and When it Fails," Proceedings on Privacy Enhancing Technologies, vol. 2017, no. 4, pp. 365–383, 2017.

[52] M. Vanhoef, C. Matte, M. Cunche, L. S. Cardoso, and F. Piessens, "Why MAC address randomization is not enough: an analysis of Wi-Fi network discovery mechanisms," in Proceedings of the 11th ACM on Asia Conference on Computer and Communications Security, pp. 413–424, ACM, Xi'an, China, June 2016.

[53] A. J. Ruiz-Ruiz, H. Blunck, T. S. Prentow, A. Stisen, and M. B. Kjaergaard, "Analysis methods for extracting knowledge from large-scale WiFi monitoring to inform building facility planning," in Proceedings of the 2014 12th IEEE International Conference on Pervasive Computing and Communications, PerCom 2014, pp. 130–138, Hungary, March 2014.

[54] N. Nunes, M. Ribeiro, C. Prandi, and V. Nisi, "Beanstalk - A community based passive Wi-Fi tracking system for analysing tourism dynamics," in Proceedings of the 9th ACM SIGCHI Symposium on Engineering Interactive Computing Systems, EICS 2017, pp. 93–98, Portugal, June 2017.

[55] C. Parent, N. Pelekis, Y. Theodoridis et al., "Semantic trajectories modeling and analysis," ACM Computing Surveys, vol. 45, no. 4, pp. 1–32, 2013.

[56] C. Wei, "Mining of User Behavioral Features Based on Indoor Semantic Trajectories," Boletín Técnico, ISSN: 0376-723X, vol. 55, 2017.

[57] K. V. Long, D. Quang, and N. Klara, Lessons learned from bluetooth/wifi scanning deployment in university campus, Urbana, Illinois, USA, 2010.

[58] V. Radu and M. K. Marina, "HiMLoc: indoor smartphone localization via activity aware pedestrian dead reckoning with selective crowdsourced WiFi fingerprinting," in Proceedings of the International Conference on Indoor Positioning and Indoor Navigation (IPIN '13), pp. 1–10, IEEE, Montbeliard-Belfort, France, October 2013.

[59] F. Hong, Y. Zhang, Z. Zhang, M. Wei, Y. Feng, and Z. Guo, "WaP: Indoor localization and tracking using WiFi-Assisted Particle filter," in Proceedings of the 39th Annual IEEE Conference on Local Computer Networks, LCN 2014, pp. 210–217, Canada, September 2014.

[60] Y. Zeng, P. H. Pathak, and P. Mohapatra, "Analyzing shopper's behavior through WiFi signals," in Proceedings of the 2nd Workshop on Physical Analytics, WPA 2015, pp. 13–18, Italy, 2015.

[61] P. Sapiezynski, A. Stopczynski, R. Gatej, and S. Lehmann, "Tracking human mobility using WiFi signals," PLoS ONE, vol. 10, no. 7, p. e0130824, 2015.

[62] Z. Tian, Y. Jin, M. Zhou, Z. Wu, and Z. Li, "Wi-Fi/MARG Integration for Indoor Pedestrian Localization," Sensors, vol. 16, no. 12, p. 2100, 2016.

[63] B. Zhou, Q. Li, Q. Mao, and W. Tu, "A Robust Crowdsourcing-Based Indoor Localization System," Sensors, vol. 17, no. 4, p. 864, 2017.

[64] "Raspberry Pi," https://www.raspberrypi.org/.

[65] "TP-LINK Wi-Fi USB dongle datasheet," http://www.tp-link.com/us/products/details/cat-5520_TL-WN722N.html.

[66] "ISO/IEC 20922:2016 - Information technology – Message Queuing Telemetry Transport (MQTT) v3.1.1," https://www.iso.org/standard/69466.html.

[67] G. Piateski-Shapiro and W. Frawley, Knowledge Discovery in Databases, MIT Press, Cambridge, MA, USA, 1991.

[68] J. Andion Jimenez, J. M. Navarro Gonzalez, M. Alvarez-Campana Fernandez-Corredor, and J. C. Duenas Lopez, "A passive non-intrusive cheap method to identify behaviours and habits in the Campus," in Proceedings of the XIII Jornadas de Ingenieria Telematica - JITEL2017, vol. 40, no. 47, pp. 10–4995, September 2017.

[69] A. Mylinikov, "Geo project," https://www.mylnikov.org/.

[70] K. Mehmet and G. Ananth, "PROXIMUS: A framework for analyzing very high dimensional discrete-attributed datasets," in Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '03, vol. 147, no. 156, pp. 147–156, USA, August 2003.

[71] "UPM Library occupation," http://ceiboard.dit.upm.es/smart-campus/biblio.

[72] C. A. R. Inarejos, A. Rodríguez, G. Lopez, and M. Alvarez-Campana, "Analisis de la huella de carbono de la ETSIT de la UPM y propuesta de mejora basada en datos de la plataforma IoT Smart CEI Moncloa," in Proceedings of the I Congreso Iberoamericano de Ciudades Inteligentes (ICSC-CITIES 2018), 2018.


Table 4: Sensor records collisions.

Collisions   Count       Total       Percent
9            3           27          0.00
8            45          360         0.00
7            421         2947        0.01
6            5371        32226       0.08
5            59222       296110      0.72
4            560994      2243976     5.46
3            2525718     7577154     18.42
2            7268825     14537650    35.35
1            16433862    16433862    39.96

the line of the sensor of building C in the accumulated minutes chart, it can be seen that the magnitude is maintained throughout the year. This sensor covers a building where there are professors' offices and research laboratories; i.e., this sensor registers mostly users who work at the school, and they maintain a more regular schedule throughout the year than students who go to classes or to the library.

Line charts can be difficult to interpret for periods of time as long as a whole year. To ease the visual analysis, a calendar visualization is offered in which the same data can be studied in a more intuitive way. Figure 9 presents the full-year view in calendar format, with each row representing one day of the week and the months delimited in black. The color intensity of each cell is proportional to the data it represents. Figure 9 is formed by 4 subfigures, which correspond to the count of users and the number of minutes registered by the sensors "Entr A Sec" and "Entr C". These visualizations are relative and can be used to obtain general conclusions. For a detailed study, both visualizations with absolute measurements and an examination of the data are still necessary.

Figure 9(a) represents the number of users detected by the sensor "Entr A Sec", which covers the classrooms of this building, and Figure 9(b) the number of minutes accumulated by the same sensor. It is clear that during the first two months of the year this sensor was not operational, that on Saturdays and Sundays (last and first rows) it registers almost no activity, and that holiday periods are clearly identified, such as Easter in March or the summer holidays in July and August. This visualization also reveals other information that is more difficult to observe in a line graph, such as the fact that Mondays are the days on which users spend the least time in this area. Some blank cells also stand out in the last months of the year, revealing that the sensor did not work during those days.

Figures 9(c) and 9(d) represent the number of users detected and the minutes accumulated by the "Entr C" sensor. As mentioned in the description of Figure 8, this sensor yields very stable measures because it covers the research laboratories, and the workers' schedules are not affected by the school calendar, unlike the activity seen by other sensors such as "Entr A Sec".

Figure 10 presents a different view, plotted at the hour level during a week in November. This eases the identification of activity hours, which span from 8:00 in the morning to 22:00 in the evening. Another notable effect is the valley in the middle of the day, corresponding to the lunch break, when users move to the cantina (see Figure 2) or leave the buildings. Finally, it is observed that the users leave the school gradually during the afternoon.

4.3. Position Analysis. Before getting into the details and insights obtained from the spatial analysis of the one-year data, it is worth mentioning that a set of tests was carried out in a controlled environment during the first stage of the deployment in order to check that the Wi-Fi tracking system worked properly. These tests included tracking a well-known group of MAC addresses throughout the Wi-Fi tracking sensors, checking that they appeared in the appropriate ones. It was also checked that the system correctly located and tracked the security staff throughout their night security tours.

A spatial analysis provides insight into how the users are distributed throughout the buildings at different times of the year. To reach these conclusions, we have improved the method presented in [68]. The first improvement is related to the event in which a user is detected by two or more sensors during the same minute. From now on, this event will be referred to as a collision. The number of collisions is significant for sensors that are close to each other (e.g., the library and building D) or sensors that cover transition areas (e.g., Entr A or Std Tables).

Table 4 shows a study of the number of collisions. The order of a collision is the number of sensors that collide in the same minute, and the count is the number of rows in which a collision of that order occurs. Collisions of order 2 and 3 group more than 50% of the data. The solution to this situation was to eliminate these collisions by replacing, for this experiment, all the rows corresponding to a collision with a single row whose latitude and longitude are the centroid of the positions of the sensors participating in the collision.
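The collision handling described above can be sketched as follows. This is a minimal illustration, not the code used in the study: the row layout `(user, minute, sensor)`, the function name `resolve_collisions`, and the coordinates in `SENSOR_POS` are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical sensor coordinates (latitude, longitude); not the real deployment values.
SENSOR_POS = {
    "Library": (40.4520, -3.7260),
    "Entr D": (40.4524, -3.7262),
    "Entr A": (40.4528, -3.7266),
}

def resolve_collisions(rows):
    """Collapse rows that share (user, minute) into one centroid position.

    rows: iterable of (user, minute, sensor) tuples (assumed layout).
    Returns {(user, minute): (lat, lon, order)}, where order is the number
    of distinct sensors that saw the user in that minute.
    """
    seen = defaultdict(set)
    for user, minute, sensor in rows:
        seen[(user, minute)].add(sensor)

    resolved = {}
    for key, sensors in seen.items():
        lats = [SENSOR_POS[s][0] for s in sensors]
        lons = [SENSOR_POS[s][1] for s in sensors]
        # Centroid of the participating sensors; order 1 keeps the sensor position.
        resolved[key] = (sum(lats) / len(lats), sum(lons) / len(lons), len(sensors))
    return resolved
```

Counting the resulting `order` values per key would give a tally comparable in spirit to the "Count" column of Table 4.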

The second improvement consists in incorporating external information with the approximate position of the access points to which the users are connected. This information has been obtained from the Mylnikov Geo API [69], getting the position of every ESSID registered throughout the year. In the cases where the user is connected and the approximate position of the access point is available, this new position is used instead of the position of the sensor that generates the row. This allows measuring the user's position with more accuracy. In addition, it smooths the effect of using only the positions of the sensors, which causes the resulting positions to always lie inside the hull of the polygon formed by the positions of the sensors.

Figure 11: Monthly centroids heatmaps: (a) June, (b) August, (c) September.

This experiment is based on the concept of the user's centroid. This centroid is the average position of a certain user over a certain period of time. Representing these points on the map reveals hints of the users' behaviors over the studied time slice, based on the number of people in each area.
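A user centroid of this kind, including the preference for an access point position when one is known, might be computed as in the sketch below. The dictionary keys (`user`, `sensor`, `ap`), the function names, and the planar coordinates used in the usage note are assumptions for illustration only; no call to any real geolocation API is shown.

```python
from collections import defaultdict

def row_position(row, sensor_pos, ap_pos):
    """Pick the position for one detection row.

    Uses the access point position when the device is connected and that
    position is known; otherwise falls back to the detecting sensor.
    """
    ap = row.get("ap")  # hypothetical field: access point identifier or None
    if ap is not None and ap in ap_pos:
        return ap_pos[ap]
    return sensor_pos[row["sensor"]]

def user_centroids(rows, sensor_pos, ap_pos):
    """Average position of each user over the rows of the chosen period."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # lat sum, lon sum, row count
    for row in rows:
        lat, lon = row_position(row, sensor_pos, ap_pos)
        acc = sums[row["user"]]
        acc[0] += lat
        acc[1] += lon
        acc[2] += 1
    return {user: (s[0] / s[2], s[1] / s[2]) for user, s in sums.items()}
```

For instance, with sensors at (0.0, 0.0) and (4.0, 2.0) and one known access point at (2.0, 2.0), a user seen once by the first sensor and once while connected to that access point would get the centroid (1.0, 1.0). Feeding the centroids of one month or one hour into a heatmap layer would then give views in the style of Figures 11 and 12.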

Figure 11 shows heatmaps of three different months. Figure 11(a) is June, a month belonging to the second semester of the academic year and the month when the final exams take place. Figure 11(b) is August, during the summer holidays, when there are no classes or exams and most of the professors, researchers, and staff are on vacation. Figure 11(c) represents September, the start of the semester.

Figure 11(b) confirms one of the facts extracted in the temporal analysis: on holidays the number of users falls, and most of the users present in the school keep studying at the library (the warmest point is located over the library and is much smaller than in other months).

Both Figures 11(a) and 11(c) show that building A is the area with the highest concentration of users. Even so, there are differences between the two figures. In Figure 11(a), the warmest area on the map is over the library. This is explained by the final exam period in June. The same observation can be repeated for other periods of time to discover when the students have exams. In Figure 11(c), the hot spots are on the classrooms. September is the beginning of the school year, when attendance at classes is greatest. Thus, this observation is an indicator of the level of students' class attendance.

The same type of analysis can be done with shorter time frames to obtain more detailed behaviors. Figure 12 shows the centroids heatmap grouped by hours on September 2nd.

A detailed analysis by hours like this allows observing the users' movement throughout the day. The first row of maps, Figures 12(a), 12(b), 12(c), and 12(d), shows the evolution of the users' centroids at lunch time, between 12:00 and 16:00. In this transition, it can be seen that there are hot spots on the classrooms and the library at the beginning. On the next map, those centroids move to the cantina (see Figure 2), and in the last map they return to their original positions. The second row of maps in Figure 12 presents the start and the end of the activity time in the school. In Figures 12(e) and 12(f), it can be observed how the first users of the day go directly to classes. By contrast, Figures 12(g) and 12(h) reveal that users tend to be at the library at the end of the day.

Finally, Table 5 presents the count of the different users detected by each sensor throughout the year. Recalling that the total number of users obtained in Section 4.1 is 4653, the data in the table reveal that, over the year, most users have been seen by each sensor at some time. The two exceptions to this fact are the sensor of building C and the sensor of the laboratories of building B. These sensors cover the professors' offices and research laboratories, so it is unusual for students to stay in those areas.

Table 5: Sensor annual statistics.

Sensor        Number of rows   Number of users
Library       14166793         4674
Entr D        6818165          4621
Std Tables    5625974          4670
Entr A        4736237          4620
Entr A Sec    2352019          4579
Entr B Sec    2258618          4530
Entr B Lab    2229560          3630
Entr C        1469026          3093
Entr B        1248462          4127

Figure 12: Hourly centroids heatmaps of 2nd September: (a) 12:00 to 13:00, (b) 13:00 to 14:00, (c) 14:00 to 15:00, (d) 15:00 to 16:00, (e) 8:00 to 9:00, (f) 9:00 to 10:00, (g) 19:00 to 20:00, (h) 20:00 to 21:00.

4.4. Behavior Analysis. The third set of experiments we performed deals with the behaviors that each user follows throughout a single day. To obtain them, we grouped the data using a user-day key. For each key, a vector of 24 positions (one per hour) is created. In each position of this vector, we record the sensor that has detected this user most of the time during that hour. This vector therefore represents the route that the user followed throughout that day, hour by hour. Once the behavior vectors are obtained, the information about the day and the user is discarded in order to count the most repeated behavior vectors. The dimensionality of these vectors makes the number of possible behaviors huge (theoretically $24^{10} \approx 6 \times 10^{13}$, or $14^{10} \approx 289$K million using only the activity hours), but it is known that users behave similarly, so a much lower number of behaviors can be expected, even though it will still be a high number.
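The construction of behavior vectors and the count of the most repeated ones can be sketched as below. The input layout (one `(user, day, hour, sensor)` tuple per detected minute) and the function names are assumptions for the example; ties within an hour are resolved by whichever sensor was seen first, which is an arbitrary choice of this sketch.

```python
from collections import Counter, defaultdict

def behavior_vectors(rows):
    """Build one 24-slot vector per (user, day).

    rows: iterable of (user, day, hour, sensor), assumed to hold one entry
    per minute in which the user was detected.
    Each slot holds the sensor that detected the user most often in that
    hour, or None when the user was not seen.
    """
    per_hour = defaultdict(Counter)
    for user, day, hour, sensor in rows:
        per_hour[(user, day, hour)][sensor] += 1

    vectors = defaultdict(lambda: [None] * 24)
    for (user, day, hour), counts in per_hour.items():
        vectors[(user, day)][hour] = counts.most_common(1)[0][0]
    return {key: tuple(vec) for key, vec in vectors.items()}

def most_repeated(vectors, n=20):
    """Drop the user and day, then count identical behavior vectors."""
    return Counter(vectors.values()).most_common(n)
```

Calling `most_repeated(behavior_vectors(rows), 20)` would produce a ranking analogous to a top-20 list of behaviors.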

Table 6 presents the 20 most repeated behaviors throughout the year. A total of 285K behaviors are detected, of which 139K are unique. The 500 most frequent behaviors group 25% of the total behaviors. Table 6 shows only the part corresponding to the activity hours of the school, which, as observed in the temporal analysis, span from 8:00 am to 10:00 pm. As can be seen, the majority of the most repeated behaviors are periods of between two and five hours spent in the library. It is necessary to expand the analysis to the top 20 to observe the class attendance behaviors. It is easy to appreciate that the different behaviors usually take place in either the morning or the afternoon, with lunchtime from 1:00 pm to 3:00 pm, which means that the majority of users go either in the morning or in the afternoon but do not spend all day at school.

Table 6: Top 20 most repeated behaviors. The columns of the table are the 14 hourly slots of the activity hours (8:00-9:00, 9:00-10:00, ..., 21:00-22:00); each row below lists, in order, the non-empty slots of one behavior. Lib = Library, EntA = Entr A, SecA = Entr A Sec, Std = Std Tables.

Rank   Non-empty hourly slots
1      Lib Lib Lib Lib Lib
2      Lib Lib Lib Lib
3      EntA
4      Lib Lib Lib
5      Lib Lib Lib Lib
6      Lib
7      EntA
8      Lib Lib Lib Lib Lib Lib
9      Std Std Std Std Std Std Std Std Std Std Std Std Std Std
10     EntA
11     EntA
12     Lib Lib Lib Lib Lib
13     Lib Lib
14     Lib Lib
15     EntA
16     EntA EntA
17     EntA
18     Lib Lib
19     SecA SecA SecA SecA
20     EntA

Another quite obvious observation is that many of the behaviors obtained are very similar to each other (e.g., going to the library from 9:00 am to 2:00 pm or going to the library from 9:00 am to 1:00 pm). In order to reduce this redundancy, we performed a clustering procedure on them. Each behavior can be understood as a categorical vector of length 24 in which the categories correspond to the 9 possible sensors of the data set plus the empty category. The chosen clustering algorithm is Proximus [70], due to its simplicity, efficiency, scalability, and the reliability of its results. The algorithm works with binary vectors and creates clusters based on the Hamming distance (the number of bits that differ between two binary vectors). A vector is chosen as the center of the cluster, and other vectors whose distance to the center is smaller than the maximum cluster radius are added to that cluster.
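To make the role of the Hamming distance and the cluster radius concrete, the sketch below groups binary vectors greedily around centers. It is deliberately not the Proximus algorithm of [70], whose internals are not reproduced here; it is only a simplified stand-in that follows the center-and-radius description above, and the names `hamming` and `radius_clusters` are assumptions.

```python
def hamming(a, b):
    """Number of positions in which two equal-length binary vectors differ."""
    return sum(x != y for x, y in zip(a, b))

def radius_clusters(vectors, max_radius):
    """Greedy grouping by Hamming distance (simplified stand-in, not Proximus).

    Each vector joins the first existing center whose distance is smaller
    than max_radius; otherwise it becomes the center of a new cluster.
    Returns a list of (center, members) pairs.
    """
    clusters = []
    for vec in vectors:
        for center, members in clusters:
            if hamming(center, vec) < max_radius:
                members.append(vec)
                break
        else:
            # No center was close enough: start a new cluster.
            clusters.append((vec, [vec]))
    return clusters
```

With a small radius, two library stays that differ by a single hour would fall into the same cluster, which is the kind of redundancy the clustering step is meant to remove.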

The behaviors are expressed as categorical vectors, so it is necessary to transform them into binary vectors to be used in Proximus. The transformation shown in Equation (1) is proposed. This transformation is simple and also reversible, which allows for the recovery of the original behavior vectors after clustering.

\[
\begin{aligned}
B &= \{h_0, \ldots, h_{23}\}, \quad h_i \in \{0, a_1, \ldots, a_9\} \\
\vec{B} &= \{\delta_{0,a_1}, \ldots, \delta_{0,a_9}, \ldots, \delta_{23,a_1}, \ldots, \delta_{23,a_9}\} \\
\delta_{i,a_j} &=
\begin{cases}
0 & \text{if } h_i \neq a_j \\
1 & \text{if } h_i = a_j
\end{cases}
\end{aligned}
\tag{1}
\]

where $a_1, \ldots, a_9$ are the nine sensors, $B$ is the behavior vector described above, and $\vec{B}$ is the transformed vector of length $24 \times 9$, made up of subgroups of 9 samples, each subgroup associated with one hour. Every sample in each subgroup is 0, except for the index of the active sensor in that hour, which is marked with a 1.
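The transformation of Equation (1) and its inverse can be illustrated with a short sketch. The ordering of the sensors in `SENSORS` (that is, which sensor plays the role of each a_j) is an assumption, as are the function names; the empty category is represented here by `None`.

```python
# Assumed ordering of the nine sensors; the paper does not fix one.
SENSORS = ["Library", "Entr D", "Std Tables", "Entr A", "Entr A Sec",
           "Entr B Sec", "Entr B Lab", "Entr C", "Entr B"]

def to_binary(behavior):
    """Map a 24-slot categorical vector to a 24*9 binary vector.

    For each hour, emit 9 bits; the bit of the active sensor is 1 and the
    rest are 0. An empty hour (None) yields nine zeros.
    """
    bits = []
    for h in behavior:
        bits.extend(1 if h == a else 0 for a in SENSORS)
    return bits

def from_binary(bits):
    """Invert to_binary: recover the categorical vector from the bits."""
    n = len(SENSORS)
    behavior = []
    for i in range(0, len(bits), n):
        block = bits[i:i + n]
        behavior.append(SENSORS[block.index(1)] if 1 in block else None)
    return behavior
```

Because each hour contributes at most one set bit, two behaviors that differ in one hour are at Hamming distance 1 (if one of them is empty in that hour) or 2 (if they name different sensors), which is worth keeping in mind when choosing a cluster radius.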

Table 7 shows the results of applying the Proximus clustering to the behaviors found. The first column is the center of the cluster, the second the number of distinct behaviors that are grouped in that cluster, and the third the total number of recorded behaviors that fall within that cluster. The results are presented ordered by the number of behaviors included in the cluster. 45707 clusters are obtained. The first 300 group 50% of the behaviors recorded throughout the year.

Table 7: Behavior clusters.

No.   Cluster center                                           Cluster components   Behaviors count
1     Entr A Sec from 9:00 to 14:00                            415                  4963
2     Entr B Sec from 10:00 to 13:00                           458                  3756
3     Entr B Sec from 15:00 to 19:00                           471                  2924
4     Library from 9:00 to 13:00                               256                  2870
5     Library from 10:00 to 12:00                              176                  2396
6     Library from 18:00 to 21:00                              244                  2231
7     Entr A 13:00                                             787                  1979
8     Library 12:00                                            563                  1894
9     Library from 18:00 to 19:00 and from 20:00 to 21:00      175                  1853
10    Entr B Lab 11:00 to 20:00                                568                  1831

The most important cluster behaviors in Table 7 can be understood this way:

(i) Clusters 1, 2, and 3: students attending classes. These sensors cover the main classrooms in the school, and the intervals match the class schedule.

(ii) Clusters 4, 5, 6, and 9: students at the library.

(iii) Cluster 7: users who are usually outside the sensors' coverage areas, arriving or leaving.

(iv) Cluster 10: laboratory equipment and professors. This sensor covers some of the professors' offices and laboratories, which usually have some laboratory equipment connected during work hours.

5. Validation of Occupancy Estimation Based on Wi-Fi Tracking

Although, as already pointed out in Section 4.3, it was tested at the very first stage of the deployment that the Wi-Fi sensors properly locate and track well-known MAC addresses, it was still necessary to validate the accuracy of the system for estimating occupancy (as it can be distorted by the aforementioned fact that a single user can carry several devices connected to Wi-Fi networks). As no ground-truth data were available for the full set of buildings and only some data were found for the library, we centered our validation efforts on comparing our data with the available ones: if we can trust our results in that area, then we can extend our trust to the rest of the areas, for which no well-known data are available.

The library of ETSIT has 408 study seats, and it offers a web service to check the number of available seats at a given moment [71]. This system is based on two sources of information: a person who counts the empty seats every opening hour from Friday to Sunday, and a video camera located at the main entrance of the library that counts the number of people entering or leaving at 15-minute intervals. The human system provides a ground truth about the number of occupied seats, but this measure is very different from the actual number of people in the library, since a common situation, especially during examination terms, is that students place their study material at a seat to reserve it while they are not in the library. This situation is a problem for the library staff, and therefore they installed the video camera system to count the students' entrances and exits.

This system generates an estimation of the number of people in the library by adding to the previous measure the number of people detected entering and subtracting the number of people detected leaving. The system is not perfect, and in most cases it carries an accumulated error that grows in the estimation of the number of people in the library. The total error can be calculated clearly at the end of the day, when the library closes and the number of people inside is supposed to be zero. In summer there is a situation that aggravates this error and, consequently, the measure provided by this system: due to the rise in temperature, the back door of the library is opened to improve ventilation, which allows students to exit through it, although they must continue entering through the main door. The camera does not count students leaving through this back door.
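The running estimate produced by such an entry and exit counter, and the way missed exits turn into a residual at closing time, can be sketched as follows. The function name and the sample counts in the usage note are hypothetical and are not data from the library.

```python
def camera_occupancy(entries, exits, start=0):
    """Running occupancy estimate from per-interval entry and exit counts.

    entries, exits: sequences with one count per 15-minute interval.
    Each estimate is the previous one plus entries minus exits, so any
    exit that is not detected stays in the estimate for the rest of the day.
    """
    estimate = start
    occupancy = []
    for came_in, went_out in zip(entries, exits):
        estimate += came_in - went_out
        occupancy.append(estimate)
    return occupancy
```

For example, with entries `[50, 30, 0, 0]` and detected exits `[0, 10, 20, 30]`, the estimate evolves as `[50, 70, 50, 20]`. If the library is actually empty after the last interval, the final value of 20 is the accumulated error for that day.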

The library staff provided us with the data collected by the two systems (human and camera) between June 5th and 30th. These measurements can be compared with those obtained by the Wi-Fi sensor installed at the library to validate them. For this test, all the data collected by the sensor are used, without the filtering of the MACs of sporadic users explained before.

Figure 13 shows the data collected by the three systems on Sunday, June 5th. This is the first day with data from the three sources. Other days on which these three sources are present show the same trends. It is clear that there is a divergence between the human system observation and the rest of the data. The graph of the human system shows that the number of occupied seats increases in the first hours up to the maximum and remains steady until the end of the day, without being affected by the behavior of the users at lunchtime. However, this effect is reflected in the camera system and Wi-Fi tracking measurements. We observed that, in general, the number of people accounted for by the camera system is below the Wi-Fi tracking system measure, although the proportion is maintained over time. Finally, the figure shows the cumulative error effect of the camera system, which at the end of the day still renders 65 people in the library.

Figure 13: Measure of library occupation using Wi-Fi, camera, and human systems (number of users versus hour, from 10:00 to 22:00; series: Camera, Human, Wifi).

To better study the relation between the camera system and the Wi-Fi tracking system, we generated a detailed visualization that allows us to observe the data of the whole month in a single figure. Figure 14 is composed of 3 subfigures. Each of them is a matrix of colored cells: the rows represent a full day and the columns are divisions of 15 minutes. The color intensity is proportional to the number of users measured in each interval, being more intense when more users are detected. Figure 14(a) shows the measurements of the Wi-Fi sensor. The behavior is the one observed in previous sections of this document. Figure 14(b) shows the measurements of the camera system. During the first half of the day, the same trend as in Figure 14(a) can be observed, but in the afternoon and night the accumulated error begins to be noticeable. This error reaches a maximum of 378 people, with an average of 139 people at the end of the day, while the Wi-Fi tracking system registers a maximum of 10 and an average of 7 at the end of the day (these are machines that are still turned on when the library is empty of people). Finally, Figure 14(c) shows the difference between the two previous ones, using green when the error is positive (the Wi-Fi tracking system registers a higher value than the camera system) and red otherwise. This figure validates the measurements obtained by the Wi-Fi tracking system, since the difference with the measure of the camera system in the first half of the day is small (an average of 27 people, counting the data from 8:00 to 16:00), whereas in the afternoon the error accumulated by the camera system causes the average difference to grow to 83 people between 16:00 and 22:00.

Figure 14: Measures every 15 minutes of library occupation during June (horizontal axis: hour of day, 00 to 23): (a) Wi-Fi records, (b) camera records, (c) difference between Wi-Fi and camera.
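An average difference between the two series over a time window, like the figures quoted for 8:00 to 16:00 and 16:00 to 22:00, could be obtained along the lines of the sketch below. Whether the reported averages use absolute or signed differences is not stated in the text, so the use of the absolute value here is an assumption, as are the function name and the `(hour, minute)` keys.

```python
def mean_abs_difference(wifi, camera, start_hour, end_hour):
    """Mean absolute difference between two occupancy series in a window.

    wifi, camera: dicts mapping (hour, minute) to a people count, sampled
    every 15 minutes (assumed layout). Only timestamps present in both
    series and with start_hour <= hour < end_hour are compared.
    Returns None when the window holds no common samples.
    """
    diffs = [abs(wifi[k] - camera[k]) for k in wifi
             if k in camera and start_hour <= k[0] < end_hour]
    return sum(diffs) / len(diffs) if diffs else None
```

Applied per day and then averaged over the month, this would give one number per window that can be compared between the morning and the afternoon.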

The Wi-Fi sensor of the library was not placed for the specific purpose of counting the number of people in the library, and in consequence its coverage area can include devices that are outside the library. This explains why in some cases it registers a measurement greater than that of the camera system. In any case, the Wi-Fi sensor, with its limitations, registers a measure very similar to that of the camera-based people counting system, even avoiding the cumulative error of this system, while being much cheaper (tens of euros against thousands of euros) and less intrusive than a video camera. Currently, since the camera system is already installed, the Wi-Fi sensor measures can be used as a maximum, which would limit the error made by this system. In summary, the usage of a Wi-Fi sensor to count people in the library provides accurate results despite the potential errors induced by the sensing period, collisions, the ratio between users and MACs seen, and the data processing performed. These results invite us to trust that the results of our analyses for the rest of the areas on campus are also correct.

6. Discussion and Conclusions

In the execution of this case study, we have learned some lessons about the limitations we faced that are worth taking into account for similar systems and analyses. First, there is the key issue of the sensor deployment: it is not just a question of density (number of sensors per unit area) but of adapting its design to the topology of the place to be monitored. So, for example, it is crucial to have sensors not only at the entrances and exits of buildings but also inside them, as in the considered deployment, where there are sensors placed at the entrances and exits of the buildings but also in other especially relevant places such as the library or the group-work area.

A complete coverage of indoor devices, without collisions and for the complete timespan of the stay indoors, would require a large deployment with many sensors able to cover all corners. This is not suitable in this context; instead, we relied on 9 sensors, some at gates and others in large rooms, which proved to be enough to check whether a device is within reach of any of them. But some assumptions had to be made: we considered as valid behaviors only those that spent a significant amount of time once they had been seen (to remove transient behaviors and those of people walking outside the buildings); also, we discarded behaviors that did not appear enough times in the yearly timespan. On the other hand, the topology of these buildings did not impose severe restrictions on the sensors' coverage. In fact, we found several devices seen by different sensors at once. In summary, we can conclude that the finer the spatial granularity (more sensors), the larger the set of different behaviors, so using raw data from sensors would lead to an explosion of states that would render meaningless results.

The gathered data are another key issue: some works in the literature obtained signal strength measurements every few seconds, allowing them to identify indoor trajectories. This was not our case, and in fact this proved to be very limiting. Thus, the precision of our analysis is limited by the regions covered by each sensor, which prevents us from analyzing any kind of movement within regions. Nevertheless, as has been seen, relevant results can still be obtained without the increase in energy consumption and the risk of flooding the school communications network that the other approach would entail.

As regards the analysis of data to identify users' behaviors, a long observation period is a must. Obtaining data for a complete work cycle (in this case, a full school year) helps in discovering common behaviors that happen in a university. Using long observation times can also help in detecting erroneous or atypical operation of the sensors, as happened in the first months of the year in this case. Getting even larger observation datasets would reinforce the behaviors we have found (e.g., seasonality), but the chance to discover new ones is negligible, since the expected behaviors on the campus are periodic with the school year.

The usage of big data platforms for analysis, while not mandatory, eases the management of large datasets and the execution of iterative studies on the data. The preprocessing work carried out allowed us to clean and filter our data. For example, some discovered behaviors are too regular and extended over time, which may correspond to servers or machines, which we could then filter out.
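The filtering of always-on devices mentioned above can be illustrated with a minimal sketch. The record layout `(mac, day, hour)`, the function name, and both thresholds are assumptions made for illustration only; the paper does not state the actual rule used.

```python
from collections import defaultdict


def always_on_devices(records, min_hours_per_day=20, min_day_fraction=0.9):
    """Return MAC addresses that look like servers or fixed machines.

    records: iterable of (mac, day, hour) detections (hypothetical schema).
    A device is flagged when it is seen in at least `min_hours_per_day`
    distinct hours on at least `min_day_fraction` of the observed days.
    Both thresholds are illustrative assumptions.
    """
    hours_seen = defaultdict(set)  # (mac, day) -> set of hours with detections
    days = set()
    for mac, day, hour in records:
        hours_seen[(mac, day)].add(hour)
        days.add(day)

    full_days = defaultdict(int)  # mac -> number of "almost always present" days
    for (mac, _day), hours in hours_seen.items():
        if len(hours) >= min_hours_per_day:
            full_days[mac] += 1

    return {mac for mac, n in full_days.items() if n >= min_day_fraction * len(days)}
```

The returned set can then be used to drop the corresponding rows before any behavior analysis.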

The temporal analysis has revealed some behaviors that are maintained throughout the year and others that occur occasionally. The work has focused on the study of behaviors that are repeated and maintained throughout the year. A closer view, such as the analysis per hour, shows the hours of activity, nocturnal patterns, or movements at mealtime. In the daily view, the difference in activity between work days and weekends or holidays is clearly seen. Visualization has been a key technique for detecting and understanding these temporal patterns.

The spatial analysis revealed that collisions occur in user detection, but their effect was mitigated by the use of techniques such as the calculation of centroids and the combination of sensor information with external sources of information (i.e., semantic trajectories), so that a much more precise positioning of the users is achieved than with the exclusive use of the sensors. The visualization of centroids fostered the visual understanding of complex data, such as the position of each user over a month, and the superposition of the centroids in a heatmap revealed the movements of groups of people and the occupation of the different zones.


Furthermore, we have found dominant users' behaviors, understood as the most repeated behaviors registered by the sensors throughout the year. The number of behaviors found was huge, but the application of the Proximus clustering algorithm reduced this number to a manageable amount. The study of the obtained clusters then revealed that the most frequent behaviors coincide with what can be expected from a college building: researchers working in laboratories and students attending classes or studying in the library.

Although some of the discovered behaviors and patterns can be seen as common knowledge, it is worth stressing that they do bring value, since they represent numerical evidence that supports decision making (e.g., someone may think that the Wi-Fi access in a given area does not work properly because it is always overcrowded, but numerical evidence is needed to appropriately justify the investment of increasing the number of APs of the corporate WLAN in that area to improve the service). In addition, such well-known patterns, when obtained automatically by processing the available data, become baseline models which can be used to detect anomalies or atypical situations, as is common practice in unsupervised machine learning.

Lastly, we have validated the accuracy of using Wi-Fi tracking for occupancy estimation by comparing it with the manual counting performed by the library staff (considered as ground truth) and with a video camera system installed at the main entrance of the library. As a main finding, Wi-Fi tracking has proved to be more accurate than the video camera system, in addition to being considerably cheaper. As a matter of fact, the library staff currently prefers the occupancy estimation based on Wi-Fi tracking over the one based on the video camera system. Nevertheless, the accuracy of the Wi-Fi tracking system can be further improved by correlating MAC addresses detected in the same places over sufficiently long periods of time and counting them only once, thus mitigating the issue that a single person can carry several devices connected to Wi-Fi networks.
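The MAC correlation idea suggested above has not been implemented in the paper; the following is only a sketch of one possible approach. The sighting layout `(mac, sensor, slot)`, the Jaccard similarity criterion, and both thresholds are assumptions. Devices whose sets of (sensor, time slot) sightings overlap almost completely are merged into one group, which would then count as a single person.

```python
from collections import defaultdict
from itertools import combinations


def group_co_located(sightings, min_shared=3, min_jaccard=0.8):
    """Group MAC addresses that are nearly always seen together.

    sightings: iterable of (mac, sensor, slot) tuples (hypothetical schema).
    Two devices are linked when they share at least `min_shared`
    (sensor, slot) pairs and the Jaccard similarity of their sighting
    sets is at least `min_jaccard`. Thresholds are illustrative.
    """
    seen = defaultdict(set)
    for mac, sensor, slot in sightings:
        seen[mac].add((sensor, slot))

    parent = {mac: mac for mac in seen}  # union-find forest

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Pairwise comparison: quadratic in the number of devices, fine for a sketch.
    for a, b in combinations(sorted(seen), 2):
        shared = len(seen[a] & seen[b])
        union = len(seen[a] | seen[b])
        if shared >= min_shared and shared / union >= min_jaccard:
            parent[find(b)] = find(a)

    groups = defaultdict(set)
    for mac in seen:
        groups[find(mac)].add(mac)
    return list(groups.values())
```

Under these assumptions, the occupancy estimate would be the number of groups rather than the number of MAC addresses.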

Besides, the Wi-Fi tracking system is currently used by the library staff to perform more sophisticated studies, such as figuring out the percentage of students from the different schools of the university who come to study at the library of the Telecommunications Engineering School during the weekend. Figure 15 shows the results of such an analysis, which is an example of how this kind of IoT system can help in solving real-life problems and improving the operation of already running services.

To summarize, we have studied a one-year Wi-Fi tracking dataset obtained from a reduced set of low-cost sensors with limited capabilities, deployed on an actual university campus that receives around 4000 people every day. We have processed the data in order to identify traces of Wi-Fi-enabled mobile devices, which are identified with people moving in the campus buildings, and then we have extracted people's stays, movements, and common behaviors. The obtained results represent numerical evidence that illustrates how a low-cost Wi-Fi tracking system can be used in real-life conditions to improve or optimize the operation of the monitored premises. These results can allow appropriately dimensioning the WLAN infrastructure or the canteen personnel, or detecting anomalous situations in real time. Furthermore, the data from the people flow monitoring system is currently being used together with the data from the environmental monitoring system to try to reduce the environmental footprint of the school [72]. In addition, the Wi-Fi tracking system is actually used by the library staff in their day-to-day activity, which illustrates the value that this kind of IoT infrastructure can bring to real-life problems and services.

Figure 15: Users who only go to the library of the Telecommunication School on weekends, classified by their school (categories: CIVIL, INDUS, NAVAL, SPORT, ARQUI, FORES, EDIF, AERO, TELECOM).

Data Availability

The dataset with the Wi-Fi sensor records used to support the findings of this study is available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work has been partly funded by Universidad Politecnica de Madrid through the project RES2+U (Responsables, Sostenibles y Universitarios) (http://blogs.upm.es/res2masu). The work performed by Jose M. Navarro was funded by the Ministerio de Educacion, Cultura y Deporte de Espana under Grant no. FPU 14/03209.

References

[1] "2018 Revision of World Urbanization Prospects - Multimedia Library - United Nations Department of Economic and Social Affairs," https://www.un.org/development/desa/publications/2018-revision-of-world-urbanization-prospects.html.

[2] M. Pallot, "Engaging Users into Research and Innovation: The Living Lab Approach as a User Centred Open Innovation Ecosystem," https://web.archive.org/web/20120509081658/http://www.cwe-projects.eu/pub/bscw.cgi/1760838?id=715404_1760838.

[3] E. Almirall and J. Wareham, "Living Labs: arbiters of mid- and ground-level innovation," Technology Analysis & Strategic Management, vol. 23, no. 1, pp. 87–102, 2011.

[4] "TU Delft Green office," https://www.tudelft.nl/sustainability/.

[5] "UBC Campus as a living laboratory," https://sustain.ubc.ca/our-commitment/campus-living-lab.

[6] "Harvard Sustainability," https://green.harvard.edu/.

[7] "UPM City of the Future initiative," http://blogs.upm.es/cityofthefuture-upm/en/initiative/.

[8] M. Alvarez-Campana, G. Lopez, E. Vazquez, V. A. Villagra, and J. Berrocal, "Smart CEI Moncloa: An IoT-based platform for people flow and environmental monitoring on a Smart University Campus," Sensors, vol. 17, no. 12, 2017.

[9] M. B. Kjærgaard, H. Blunck, T. Godsk, T. Toftkjær, D. L. Christensen, and K. Grønbæk, "Indoor positioning using GPS revisited," Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics): Preface, vol. 6030, pp. 38–56, 2010.

[10] A. Belmonte-Hernandez, G. Hernandez-Penaloza, F. Alvarez, and G. Conti, "Adaptive Fingerprinting in Multi-Sensor Fusion for Accurate Indoor Tracking," IEEE Sensors Journal, vol. 17, no. 15, pp. 4983–4998, 2017.

[11] M. S. Kristoffersen, J. V. Dueholm, R. Gade, and T. B. Moeslund, "Pedestrian counting with occlusion handling using stereo thermal cameras," Sensors, vol. 16, no. 1, 2016.

[12] L. Zheng, X. Ruan, Y. Chen, and M. Huang, "Shadow removal for pedestrian detection and tracking in indoor environments," Multimedia Tools and Applications, vol. 76, no. 18, pp. 18321–18337, 2017.

[13] T. Li, H. Chang, M. Wang, B. Ni, R. Hong, and S. Yan, "Crowded scene analysis: a survey," IEEE Transactions on Circuits and Systems for Video Technology, vol. 25, no. 3, pp. 367–386, 2015.

[14] F. Adib and D. Katabi, "See through walls with WiFi," in Proceedings of the Annual Conference of the ACM Special Interest Group on Data Communication on the Applications, Technologies, Architectures, and Protocols for Computer Communication, ACM SIGCOMM 2013, pp. 75–86, China, August 2013.

[15] Z. Kabelac, D. Katabi, and R. C. Miller, "3D Tracking via Body Radio Reflections," in Proceedings of the 11th USENIX Conference on Networked Systems Design and Implementation, pp. 317–329, 2014.

[16] F. Adib, Z. Kabelac, and D. Katabi, "Multi-person localization via RF body reflections," in Proceedings of the 12th USENIX Symposium on Networked Systems Design and Implementation, NSDI 2015, pp. 279–292, USA, May 2015.

[17] Z.-A. Deng, G. Wang, D. Qin, Z. Na, Y. Cui, and J. Chen, "Continuous indoor positioning fusing WiFi, smartphone sensors and landmarks," Sensors, vol. 16, no. 9, 2016.

[18] R. A. Becker, R. Caceres, K. Hanson et al., "A tale of one city: Using cellular network data for urban planning," IEEE Pervasive Computing, vol. 10, no. 4, pp. 18–26, 2011.

[19] J. E. Mallah, F. Carrino, O. A. Khaled, and E. Mugellini, "Crowd monitoring: critical situations prevention using smartphones and group detection," Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics): Preface, vol. 9189, pp. 496–505, 2015.

[20] J. Kuang, X. Niu, and X. Chen, "Robust Pedestrian Dead Reckoning Based on MEMS-IMU for Smartphones," Sensors, vol. 18, no. 5, p. 1391, 2018.

[21] A. Kurkcu and K. Ozbay, "Estimating Pedestrian Densities, Wait Times, and Flows with Wi-Fi and Bluetooth Sensors," Transportation Research Record, vol. 2644, no. 1, pp. 72–82, 2017.

[22] A. Kotanen, M. Hannikainen, H. Leppakoski, and T. Hamalainen, "Positioning with IEEE 802.11b wireless LAN," in Proceedings of the 14th IEEE 2003 International Symposium on Personal, Indoor and Mobile Radio Communications, vol. 3, pp. 2218–2222, Beijing, China, 2003.

[23] W. Ho, A. Smailagic, D. P. Siewiorek, and C. Faloutsos, "An adaptive two-phase approach to WiFi location sensing," in Proceedings of the 4th Annual IEEE International Conference on Pervasive Computing and Communications Workshops, PerCom Workshops 2006, pp. 452–456, Italy, March 2006.

[24] F. Evennou and F. Marx, "Advanced integration of WiFi and inertial navigation systems for indoor mobile positioning," EURASIP Journal on Applied Signal Processing, vol. 2006, Article ID 86706, 11 pages, 2006.

[25] J. Rekimoto, T. Miyaki, and T. Ishizawa, "LifeTag: WiFi-based continuous location logging for life pattern analysis," LNCS, vol. 4718, pp. 35–49, 2007.

[26] J. A. Besada, A. M. Bernardos, P. Tarrío, and J. R. Casar, "Analysis of tracking methods for wireless indoor localization," in Proceedings of the 2nd International Symposium on Wireless Pervasive Computing (ISWPC '07), pp. 492–497, February 2007.

[27] O. Woodman and R. Harle, "Pedestrian localisation for indoor environments," in Proceedings of the 10th International Conference on Ubiquitous Computing (UbiComp '08), pp. 114–123, Seoul, Republic of Korea, September 2008.

[28] F. Aloul, A. Sagahyroon, A. Al-Shami, I. Al-Midfa, and R. Moutassem, "Using mobiles for on campus location tracking," in Proceedings of the 7th International Conference on Advances in Mobile Computing and Multimedia, MoMM2009, pp. 231–235, Malaysia, December 2009.

[29] S. Woo, S. Jeong, E. Mok et al., "Application of WiFi-based indoor positioning system for labor tracking at construction sites: A case study in Guangzhou MTR," Automation in Construction, vol. 20, no. 1, pp. 3–13, 2011.

[30] N. Le Dortz, F. Gain, and P. Zetterberg, "WiFi fingerprint indoor positioning system using probability distribution comparison," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '12), pp. 2301–2304, March 2012.

[31] Z. Chen, H. Zou, H. Jiang, Q. Zhu, Y. C. Soh, and L. Xie, "Fusion of WiFi, smartphone sensors and landmarks using the Kalman filter for indoor localization," Sensors, vol. 15, no. 1, pp. 715–732, 2015.

[32] Y. Shu, C. Bo, G. Shen, C. Zhao, L. Li, and F. Zhao, "Magicol: indoor localization using pervasive magnetic field and opportunistic WiFi sensing," IEEE Journal on Selected Areas in Communications, vol. 33, no. 7, pp. 1443–1457, 2015.

[33] A. Danalet, B. Farooq, and M. Bierlaire, "A Bayesian approach to detect pedestrian destination-sequences from WiFi signatures," Transportation Research Part C: Emerging Technologies, vol. 44, pp. 146–170, 2014.


[34] O. Czogalla and S. Naumann, "Pedestrian indoor navigation for complex public facilities," in Proceedings of the 2016 International Conference on Indoor Positioning and Indoor Navigation, IPIN 2016, pp. 1–8, Spain, October 2016.

[35] L. Schauer, P. Marcus, and C. Linnhoff-Popien, "Towards feasible Wi-Fi based indoor tracking systems using probabilistic methods," in Proceedings of the 2016 International Conference on Indoor Positioning and Indoor Navigation, IPIN 2016, pp. 1–8, Spain, October 2016.

[36] F. Meneses and A. Moreira, "Large scale movement analysis from WiFi based location data," in Proceedings of the 2012 International Conference on Indoor Positioning and Indoor Navigation, IPIN 2012, Australia, November 2012.

[37] B. Bonne, A. Barzan, P. Quax, and W. Lamotte, "WiFiPi: involuntary tracking of visitors at mass events," in Proceedings of the IEEE 14th International Symposium on a World of Wireless, Mobile and Multimedia Networks (WoWMoM '13), pp. 1–6, Madrid, Spain, June 2013.

[38] M. B. Kjaergaard, M. Wirz, D. Roggen, and G. Troster, "Mobile sensing of pedestrian flocks in indoor environments using WiFi signals," in Proceedings of the 10th IEEE International Conference on Pervasive Computing and Communications (PerCom '12), pp. 95–102, Lugano, Switzerland, March 2012.

[39] S. Sendra, M. Garcia, C. Turro, and J. Lloret, "People mobility behaviour study in a university campus using WLANs," in Proceedings of the 3rd International Conference on Mobile Ubiquitous Computing, Systems, Services, and Technologies, UBICOMM 2009, pp. 124–129, Malta, October 2009.

[40] J. Scheuner, G. Mazlami, D. Schoni et al., "Probr - A Generic and Passive WiFi Tracking System," in Proceedings of the 41st IEEE Conference on Local Computer Networks, LCN 2016, pp. 495–502, UAE, November 2016.

[41] Y. Li, S. Williams, B. Moran, and A. Kealy, "Quantized RSS Based Wi-Fi Indoor Localization with Room Level Accuracy," in Proceedings of the International Global Navigation Satellite Systems 2018, 2018.

[42] A. Alessandrini, C. Gioia, F. Sermi, I. Sofos, D. Tarchi, and M. Vespe, "WiFi positioning and Big Data to monitor flows of people on a wide scale," in Proceedings of the 25th European Navigation Conference, ENC 2017, pp. 322–328, Switzerland, May 2017.

[43] L. Vu, K. Nahrstedt, S. Retika, and I. Gupta, "Joint bluetooth/wifi scanning framework for characterizing and leveraging people movement in university campus," in Proceedings of the 13th ACM International Conference on Modeling, Analysis, and Simulation of Wireless and Mobile Systems (MSWiM '10), pp. 257–265, October 2010.

[44] M. Garcia, S. Sendra, C. Turro, and J. Lloret, "User's Macro and Micro-mobility Study using WLANs in a University Campus," International Journal On Advances in Internet Technology, vol. 4, no. 1, pp. 37–46, 2011.

[45] Y. Xu, I. D. G. Groeneveld, R. Sulzer, E. Theocharous, O. T. Willems, and M. S. Tryfona, "Determine activity based on the classified identity of users by using Wi-Fi monitoring," Geomatics Synthesis Group Project Report, 2016.

[46] A. Danalet, L. Tinguely, M. D. Lapparent, and M. Bierlaire, "Location choice with longitudinal WiFi data," Journal of Choice Modelling, vol. 18, pp. 1–17, 2016.

[47] M. Zhou, K. Sui, M. Ma, Y. Zhao, D. Pei, and T. Moscibroda, "MobiCamp: A campus-wide testbed for studying mobile physical activities," in Proceedings of the 3rd International Workshop on Physical Analytics, WPA 2016, pp. 1–6, Singapore, 2016.

[48] A. Fernandez-Ares, A. M. Mora, M. G. Arenas et al., "Studying real traffic and mobility scenarios for a Smart City using a new monitoring and tracking system," Future Generation Computer Systems, vol. 76, pp. 163–179, 2017.

[49] A. E. Redondi and M. Cesana, "Building up knowledge through passive WiFi probes," Computer Communications, vol. 117, pp. 1–12, 2018.

[50] L. Huang, K. Matsuura, H. Yamane, and K. Sezaki, "Enhancing wireless location privacy using silent period," in Proceedings of the 2005 IEEE Wireless Communications and Networking Conference, WCNC 2005: Broadband Wireless for the Masses - Ready for Take-off, pp. 1187–1192, USA, March 2005.

[51] J. Martin, T. Mayberry, C. Donahue et al., "A Study of MAC Address Randomization in Mobile Devices and When it Fails," Proceedings on Privacy Enhancing Technologies, vol. 2017, no. 4, pp. 365–383, 2017.

[52] M. Vanhoef, C. Matte, M. Cunche, L. S. Cardoso, and F. Piessens, "Why MAC address randomization is not enough: an analysis of Wi-Fi network discovery mechanisms," in Proceedings of the 11th ACM on Asia Conference on Computer and Communications Security, pp. 413–424, ACM, Xi'an, China, June 2016.

[53] A. J. Ruiz-Ruiz, H. Blunck, T. S. Prentow, A. Stisen, and M. B. Kjaergaard, "Analysis methods for extracting knowledge from large-scale WiFi monitoring to inform building facility planning," in Proceedings of the 2014 12th IEEE International Conference on Pervasive Computing and Communications, PerCom 2014, pp. 130–138, Hungary, March 2014.

[54] N. Nunes, M. Ribeiro, C. Prandi, and V. Nisi, "Beanstalk - A community based passive Wi-Fi tracking system for analysing tourism dynamics," in Proceedings of the 9th ACM SIGCHI Symposium on Engineering Interactive Computing Systems, EICS 2017, pp. 93–98, Portugal, June 2017.

[55] C. Parent, N. Pelekis, Y. Theodoridis et al., "Semantic trajectories modeling and analysis," ACM Computing Surveys, vol. 45, no. 4, pp. 1–32, 2013.

[56] C. Wei, "Mining of User Behavioral Features Based on Indoor Semantic Trajectories," Boletín Tecnico, ISSN: 0376-723X, vol. 55, 2017.

[57] K. V. Long, D. Quang, and N. Klara, Lessons learned from bluetooth/wifi scanning deployment in university campus, Urbana, Illinois, USA, 2010.

[58] V. Radu and M. K. Marina, "HiMLoc: indoor smartphone localization via activity aware pedestrian dead reckoning with selective crowdsourced WiFi fingerprinting," in Proceedings of the International Conference on Indoor Positioning and Indoor Navigation (IPIN '13), pp. 1–10, IEEE, Montbeliard-Belfort, France, October 2013.

[59] F. Hong, Y. Zhang, Z. Zhang, M. Wei, Y. Feng, and Z. Guo, "WaP: Indoor localization and tracking using WiFi-Assisted Particle filter," in Proceedings of the 39th Annual IEEE Conference on Local Computer Networks, LCN 2014, pp. 210–217, Canada, September 2014.

[60] Y. Zeng, P. H. Pathak, and P. Mohapatra, "Analyzing shopper's behavior through WiFi signals," in Proceedings of the 2nd Workshop on Physical Analytics, WPA 2015, pp. 13–18, Italy, 2015.

[61] P. Sapiezynski, A. Stopczynski, R. Gatej, and S. Lehmann, "Tracking human mobility using WiFi signals," PLoS ONE, vol. 10, no. 7, p. e0130824, 2015.

[62] Z. Tian, Y. Jin, M. Zhou, Z. Wu, and Z. Li, "Wi-Fi/MARG Integration for Indoor Pedestrian Localization," Sensors, vol. 16, no. 12, p. 2100, 2016.


[63] B. Zhou, Q. Li, Q. Mao, and W. Tu, "A Robust Crowdsourcing-Based Indoor Localization System," Sensors, vol. 17, no. 4, p. 864, 2017.

[64] "Raspberry Pi," https://www.raspberrypi.org/.

[65] "TP-LINK Wi-Fi USB dongle datasheet," http://www.tp-link.com/us/products/details/cat-5520_TL-WN722N.html.

[66] "ISO/IEC 20922:2016 - Information technology – Message Queuing Telemetry Transport (MQTT) v3.1.1," https://www.iso.org/standard/69466.html.

[67] G. Piateski-Shapiro and W. Frawley, Knowledge Discovery in Databases, MIT Press, Cambridge, MA, USA, 1991.

[68] J. Andion Jimenez, J. M. Navarro Gonzalez, M. Alvarez-Campana Fernandez-Corredor, and J. C. Duenas Lopez, "A passive, non-intrusive, cheap method to identify behaviours and habits in the Campus," in Proceedings of the XIII Jornadas de Ingenieria Telematica - JITEL2017, vol. 40, no. 47, pp. 10–4995, September 2017.

[69] A. Mylnikov, "Geo project," https://www.mylnikov.org/.

[70] K. Mehmet and G. Ananth, "PROXIMUS: A framework for analyzing very high dimensional discrete-attributed datasets," in Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '03, vol. 147, no. 156, pp. 147–156, USA, August 2003.

[71] "UPM Library occupation," http://ceiboard.dit.upm.es/smart-campus/biblio.

[72] C. A. R. Inarejos, A. Rodríguez, G. Lopez, and M. Alvarez-Campana, "Analisis de la huella de carbono de la ETSIT de la UPM y propuesta de mejora basada en datos de la plataforma IoT Smart CEI Moncloa," in Proceedings of the I Congreso Iberoamericano de Ciudades Inteligentes (ICSC-CITIES 2018), 2018.


Figure 11: Monthly centroids heatmaps: (a) June, (b) August, (c) September.

that generates the row. This allows measuring the user's position with more accuracy. In addition, it smooths the effect of using only the positions of the sensors, which causes the resulting possible positions to always lie inside the hull of the polygon formed by the positions of the sensors.

This experiment is based on the concept of the user's centroid. This centroid is the average position of a certain user over a certain period of time. Representing these points on the map reveals hints of the users' behaviors over the studied time slice, based on the number of people in each area.
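As a minimal sketch of the centroid concept, the following averages the coordinates of the sensors that detected each user, one contribution per detection row. The row layout `(user, sensor)`, the `sensor_xy` mapping, and the unweighted mean are assumptions for illustration; the positioning actually used in the experiments may weight or refine these positions differently.

```python
from collections import defaultdict


def user_centroids(rows, sensor_xy):
    """Average position of each user over the rows of a time slice.

    rows: iterable of (user, sensor) detections (hypothetical schema).
    sensor_xy: dict mapping sensor name -> (x, y) map coordinates (assumed).
    """
    acc = defaultdict(lambda: [0.0, 0.0, 0])  # user -> [sum_x, sum_y, count]
    for user, sensor in rows:
        x, y = sensor_xy[sensor]
        entry = acc[user]
        entry[0] += x
        entry[1] += y
        entry[2] += 1
    return {user: (sx / n, sy / n) for user, (sx, sy, n) in acc.items()}
```

Restricting `rows` to a month or to a single hour yields the monthly or hourly centroids that are then binned into a heatmap.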

Figure 11 shows heatmaps of three different months. Figure 11(a) is June, a month belonging to the second semester of the academic year and the month when the final exams take place. Figure 11(b) is August, during the summer holidays, when there are no classes or exams and most of the professors, researchers, and staff are on vacation. Figure 11(c) represents September, the start of the semester.

Figure 11(b) confirms one of the facts extracted in the temporal analysis: on holidays, the number of users falls and most of the users present in the school keep studying at the library (the warmest point is located over the library and is much smaller than in other months).

Both Figures 11(a) and 11(c) show that building A is the area with the highest concentration of users. Even so, there are differences between the two figures. In Figure 11(a), the warmest area in the map is over the library. This is explained by the final exam period in June. The same observation can be repeated in other periods of time to discover when the students have exams. In Figure 11(c), the hot spots are over the classrooms. September is the beginning of the school year, when class attendance is at its highest. Thus, this observation is an indicator of the level of students' class attendance.

The same type of analysis can be done with shorter time frames to obtain more detailed behaviors. Figure 12 shows the centroids heatmap grouped by hours on September 2nd.

A detailed analysis by hours like this allows observing the users' movement throughout the day. The first row of maps, in Figures 12(a), 12(b), 12(c), and 12(d), shows the evolution of the users' centroids at lunch time, between 12:00 and 16:00. In this transition, it can be seen that there are hot spots over the classrooms and the library at the beginning. On the next map, those centroids move to the canteen (see Figure 2), and in the last map they return to their original positions. The second row of maps in Figure 12 presents the start and the end of the activity time in the school. In Figures 12(e) and 12(f), it can be observed how the first users of the day go directly to classes. By contrast, Figures 12(g) and 12(h) reveal that users tend to be at the library at the end of the day.

Finally, Table 5 presents the count of the different users detected by each sensor throughout the year. Recalling that the total number of users obtained in Section 4.1 is 4653, the data in the table reveals that most users have been seen by every sensor at some point during the year. The two exceptions are the sensor of building C and the sensor of the laboratories of building B. These sensors cover the professors' offices and research laboratories, so it is unusual for students to stay in those areas.

Table 5: Sensor annual statistics.

Sensor        Number of rows   Number of users
Library       14166793         4674
Entr D        6818165          4621
Std Tables    5625974          4670
Entr A        4736237          4620
Entr A Sec    2352019          4579
Entr B Sec    2258618          4530
Entr B Lab    2229560          3630
Entr C        1469026          3093
Entr B        1248462          4127

Figure 12: Hourly centroids heatmaps of 2nd September: (a) 12:00 to 13:00, (b) 13:00 to 14:00, (c) 14:00 to 15:00, (d) 15:00 to 16:00, (e) 8:00 to 9:00, (f) 9:00 to 10:00, (g) 19:00 to 20:00, (h) 20:00 to 21:00.

4.4. Behavior Analysis. The third set of experiments we performed deals with the behaviors that each user follows throughout a single day. To obtain them, we grouped the data using a user-day key. For each key, a vector of 24 positions (one per hour) is created. In each position of this vector, we record the sensor that detected this user most of the time during that hour. This vector therefore represents the route that the user followed throughout that day, hour by hour. Once the behavior vectors are obtained, the information about the day and the user is discarded in order to count the most repeated behavior vectors. The dimensionality of these vectors makes the number of possible behaviors huge, theoretically $24^{10} \approx 6 \times 10^{13}$, or $14^{10} \approx 289$K million using only the activity hours, but it is known that users behave similarly, so a much lower number of behaviors can be expected, even though it will still be a high number.
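The construction of the behavior vectors can be sketched as follows. The row layout `(user, day, hour, sensor)` and the function names are assumptions for illustration, and "detected most of the time" is approximated here by the number of detection rows per sensor within each hour; empty hours are represented by `None`.

```python
from collections import Counter, defaultdict


def behavior_vectors(rows):
    """Build one 24-position vector per (user, day) key.

    rows: iterable of (user, day, hour, sensor) detections (hypothetical schema).
    Each position holds the sensor with the most rows in that hour, or None.
    """
    per_slot = defaultdict(Counter)  # (user, day, hour) -> rows per sensor
    for user, day, hour, sensor in rows:
        per_slot[(user, day, hour)][sensor] += 1

    vectors = defaultdict(lambda: [None] * 24)
    for (user, day, hour), counts in per_slot.items():
        vectors[(user, day)][hour] = counts.most_common(1)[0][0]
    return {key: tuple(vec) for key, vec in vectors.items()}


def most_repeated(vectors, n=20):
    """Discard the user-day keys and count identical behavior vectors."""
    return Counter(vectors.values()).most_common(n)
```

Calling `most_repeated(behavior_vectors(rows), 20)` would give a ranking comparable in spirit to the top 20 list, under the assumptions stated above.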

Table 6 presents the 20 most repeated behaviors throughout the year. A total of 285K behaviors are detected, of which 139K are unique. The 500 most frequent behaviors account for 25% of the total behaviors. Table 6 shows only the part corresponding to the activity hours of the school, which, as observed in the temporal analysis, span from 8:00 am to 10:00 pm. As can be seen, the majority of the most repeated behaviors are stays of between two and five hours in the library. It is necessary to expand the analysis to the top 20 to observe the class attendance behaviors. It is easy to appreciate that the different behaviors usually fall either in the morning or in the afternoon, with lunchtime from 1:00 pm to 3:00 pm, which means that the majority of users come either in the morning or in the afternoon, but they do not spend all day at school.

Table 6: Top 20 most repeated behaviors. The hourly columns run from 8:00-9:00 to 21:00-22:00; for each behavior, the sensor and the number of hourly slots in which it is recorded are listed.

Rank   Sensor   Hourly slots
1      Lib      5
2      Lib      4
3      EntA     1
4      Lib      3
5      Lib      4
6      Lib      1
7      EntA     1
8      Lib      6
9      Std      14
10     EntA     1
11     EntA     1
12     Lib      5
13     Lib      2
14     Lib      2
15     EntA     1
16     EntA     2
17     EntA     1
18     Lib      2
19     SecA     4
20     EntA     1

Lib = Library; EntA = Entr A; SecA = Entr A Sec; Std = Std Tables.

Another quite obvious observation is that many of the behaviors obtained are very similar to each other (e.g., going to the library from 9:00 am to 2:00 pm or going to the library from 9:00 am to 1:00 pm). In order to reduce this redundancy, we performed a clustering procedure on them. Each behavior can be understood as a categorical vector of length 24 in which the categories correspond to the 9 possible sensors of the data set plus the empty category. The chosen clustering algorithm is Proximus [70], due to its simplicity, efficiency, scalability, and the reliability of its results. The algorithm works with binary vectors and creates clusters based on the Hamming distance (the number of bits that differ between two binary vectors). A vector is chosen as the center of the cluster, and other vectors whose distance to the center is smaller than the maximum cluster radius are added to that cluster.
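To make the notions of Hamming distance and cluster radius concrete, the following sketch groups binary vectors greedily around centers. It is a simplified stand-in that mirrors only the description above; it is not the Proximus algorithm of [70], and the function names, the choice of the first unassigned vector as center, and the inclusive radius test are all assumptions.

```python
def hamming(a, b):
    """Number of positions in which two equal-length binary vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def greedy_radius_clusters(vectors, max_radius):
    """Assign each vector to the first center within max_radius.

    Illustrative simplification: a vector that fits no existing cluster
    becomes the center of a new one. Returns a list of (center, members).
    """
    clusters = []
    for vec in vectors:
        for center, members in clusters:
            if hamming(center, vec) <= max_radius:
                members.append(vec)
                break
        else:
            clusters.append((vec, [vec]))
    return clusters
```

With a radius of 1, for instance, two behaviors that differ in a single bit of their binary form end up in the same cluster; a larger radius merges more behaviors at the cost of less specific cluster centers.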

The behaviors are expressed as categorical vectors, so it is necessary to transform them into binary vectors to be used in Proximus. The transformation shown in Equation (1) is proposed. This transformation is simple and also reversible, which allows for the recovery of the original behavior vectors after clustering.

$$
\begin{aligned}
B &= \{h_0, \ldots, h_{23}\}, \quad h_i \in \{0, a_1, \ldots, a_9\} \\
B \longrightarrow B' &= \{\delta_{0,a_1}, \ldots, \delta_{0,a_9}, \ldots, \delta_{23,a_1}, \ldots, \delta_{23,a_9}\} \\
\delta_{i,a_j} &=
\begin{cases}
0 & \text{if } h_i \neq a_j \\
1 & \text{if } h_i = a_j
\end{cases}
\end{aligned}
\tag{1}
$$

where $a_1, \ldots, a_9$ are the nine sensors, $B$ is the behavior vector described above, and $B'$ is the transformed vector of length $24 \times 9$, made up of 24 subgroups of 9 samples, each subgroup associated with one hour. Every sample in each subgroup is 0 except for the one at the index of the active sensor in that hour, which is marked with a 1.
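The transformation of Equation (1) and its inverse can be sketched as below. The sensor names are taken from Table 5, but their ordering, the use of `None` for the empty category, and the function names are assumptions for illustration.

```python
# Sensor names as listed in Table 5; the ordering here is an arbitrary assumption.
SENSORS = ["Library", "Entr D", "Std Tables", "Entr A", "Entr A Sec",
           "Entr B Sec", "Entr B Lab", "Entr C", "Entr B"]


def to_binary(behavior, sensors=SENSORS):
    """One block of len(sensors) bits per hour; a 1 marks the active sensor."""
    bits = []
    for h in behavior:
        bits.extend(1 if h == s else 0 for s in sensors)
    return bits


def from_binary(bits, sensors=SENSORS):
    """Inverse transformation: recover the categorical vector (None = empty)."""
    k = len(sensors)
    behavior = []
    for start in range(0, len(bits), k):
        block = bits[start:start + k]
        behavior.append(sensors[block.index(1)] if 1 in block else None)
    return behavior
```

A 24-hour behavior thus becomes a vector of 24 x 9 = 216 bits with at most one bit set per hourly block, and `from_binary(to_binary(b))` returns the original behavior.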

Table 7 shows the results of applying the Proximus clustering to the behaviors found. The first column is the center of the cluster, the second is the number of distinct behaviors grouped in that cluster (its components), and the third is the total number of recorded behaviors that fall within that cluster. The results are ordered by the number of behaviors included in the cluster. In total, 45707 clusters are obtained. The first 300 account for 50% of the behaviors recorded throughout the year. The most important cluster behaviors in Table 7 can be interpreted as follows.


Table 7: Behavior clusters.

| # | Cluster center | Cluster components | Behaviors count |
|---|---|---|---|
| 1 | Entr A Sec from 9:00 to 14:00 | 415 | 4963 |
| 2 | Entr B Sec from 10:00 to 13:00 | 458 | 3756 |
| 3 | Entr B Sec from 15:00 to 19:00 | 471 | 2924 |
| 4 | Library from 9:00 to 13:00 | 256 | 2870 |
| 5 | Library from 10:00 to 12:00 | 176 | 2396 |
| 6 | Library from 18:00 to 21:00 | 244 | 2231 |
| 7 | Entr A 13:00 | 787 | 1979 |
| 8 | Library 12:00 | 563 | 1894 |
| 9 | Library from 18:00 to 19:00 and from 20:00 to 21:00 | 175 | 1853 |
| 10 | Entr B Lab 11:00 to 20:00 | 568 | 1831 |

(i) Clusters 1, 2, and 3: students attending classes. These sensors cover the main classrooms in the school, and the intervals match the class schedule.

(ii) Clusters 4, 5, 6, and 9: students at the library.

(iii) Cluster 7: users who are usually outside the sensors' coverage areas, arriving or leaving.

(iv) Cluster 10: laboratory equipment and professors. This sensor covers some of the professors' offices and laboratories, which usually have some laboratory equipment connected during work hours.
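The coverage statement made for Table 7 (a few hundred of the largest clusters grouping about half of the recorded behaviors) corresponds to a cumulative sum over the sorted cluster sizes. The sketch below shows that calculation on made-up counts, since only the ten largest clusters are listed in the table; the function name `clusters_needed` is an assumption.

```python
from typing import Sequence


def clusters_needed(counts: Sequence[int], fraction: float) -> int:
    """Smallest number of largest clusters whose counts reach `fraction` of the total."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    total = sum(counts)
    running = 0
    for rank, count in enumerate(sorted(counts, reverse=True), start=1):
        running += count
        if running >= fraction * total:
            return rank
    return len(counts)


# Made-up cluster sizes, not the values of the real dataset.
toy_counts = [40, 30, 20, 10]
half = clusters_needed(toy_counts, 0.5)
three_quarters = clusters_needed(toy_counts, 0.75)
```

On the toy counts, two clusters are enough to reach 50% of the behaviors and three are needed to reach 75%.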

5. Validation of Occupancy Estimation Based on Wi-Fi Tracking

Although, as already pointed out in Section 4.3, it was tested at the very first stage of the deployment that the Wi-Fi sensors properly locate and track well-known MAC addresses, it was still necessary to validate the accuracy of the system for estimating occupancy (as it can be distorted by the aforementioned fact that a single user can carry several devices connected to Wi-Fi networks). As no ground-truth data were available for the full set of buildings, and only some data were found for the library, we centered our validation efforts on comparing our data with the available ones: if we can trust our results in that area, then we can extend our trust to the rest of the areas, for which no well-known data are available.

The library of ETSIT has 408 study seats, and it offers a web service to check the number of available seats at a given moment [71]. This system is based on two sources of information: a person who counts the empty seats every opening hour from Friday to Sunday, and a video camera located at the main entrance of the library that counts the number of people entering or leaving at 15-minute intervals. The human system provides a ground truth about the number of occupied seats, but this measure can be very different from the actual number of people in the library, since a common situation, especially during examination terms, is that students leave their study material at a seat to reserve it while they are not in the library. This situation is a problem for the library staff, and therefore they installed the video camera system to count the students' entrances and exits.

This system generates an estimate of the number of people in the library by adding to the previous measure the number of people detected entering and subtracting the number of people detected leaving. The system is not perfect, and in most cases the estimate of the number of people in the library carries an accumulated error that grows over the day. The total error can be clearly calculated at the end of the day, when the library closes and the number of people inside is supposed to be zero. In summer, there is a situation that aggravates this error and, consequently, degrades the measure achieved by this system: due to the rise in temperature, the back door of the library is opened to improve ventilation, which allows students to exit through it, although they must continue entering through the main door. The camera does not count students leaving through this back door.
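The counting scheme described above, and the way its error shows up at closing time, can be sketched as follows. The function names and the toy flows are assumptions; the real camera data are not reproduced.

```python
from typing import List, Sequence, Tuple


def running_occupancy(flows: Sequence[Tuple[int, int]], initial: int = 0) -> List[int]:
    """Accumulate (entered, left) counts per interval into an occupancy estimate."""
    estimates: List[int] = []
    current = initial
    for entered, left in flows:
        current += entered - left
        estimates.append(current)
    return estimates


def closing_error(estimates: Sequence[int]) -> int:
    """Residual estimate at closing time, when the true occupancy should be zero."""
    return estimates[-1] if estimates else 0


# Toy day in 4 intervals: 15 people enter in total, but only 13 exits are counted
# (for example, because 2 people left through a door the camera does not see).
toy_flows = [(10, 0), (5, 3), (0, 8), (0, 2)]
toy_estimates = running_occupancy(toy_flows)
toy_error = closing_error(toy_estimates)
```

Because each estimate is built on the previous one, a missed exit is never corrected later in the day: in the toy data the estimate still reports 2 people when the library is empty.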

The library staff provided us with the data collected by the two systems (human and camera) between June 5th and 30th. These measurements can be compared with those obtained by the Wi-Fi sensor installed at the library in order to validate them. For this test, all the data collected by the sensor are used, without applying the filtering of the MACs of sporadic users explained before.

Figure 13 shows the data collected by the three systems on Sunday, June 5th. This is the first day with data from the three sources. Other days on which these three sources are present show the same trends. It is clear that there is a divergence between the human system observation and the rest of the data. The graph of the human system shows that the number of occupied seats increases in the first hours up to the maximum and remains steady until the end of the day, without being affected by the behavior of the users at lunchtime. However, this effect is reflected in the camera system and Wi-Fi tracking measurements. We observed that, in general, the number of people counted by the camera system is below the Wi-Fi tracking system measure, although the proportion is maintained over time. Finally, the figure shows the cumulative error effect of the camera system, which at the end of the day still reports 65 people in the library.

To better study the relation between the camera system and the Wi-Fi tracking system, we generated a detailed visualization that allows us to observe the data of the whole month in a single figure. Figure 14 is composed of 3 subfigures; each of them is a matrix of colored cells: the rows represent a full


Figure 13: Measure of library occupation using Wi-Fi, camera, and human systems (x-axis: hour, 10:00 to 22:00; y-axis: number of users, 0 to 400; series: Camera, Human, Wi-Fi).

Figure 14: Measures every 15 minutes of library occupation during June (x-axis: hour of day, 00 to 23). (a) Wi-Fi records; (b) camera records; (c) difference between Wi-Fi and camera.

day and the columns are divisions of 15 minutes. The color intensity is proportional to the number of users measured in each interval, being more intense when more users are detected. Figure 14(a) shows the measurements of the Wi-Fi sensor. The behavior is the one observed in previous sections of this document. Figure 14(b) shows the measurement of the camera system. During the first half of the day, the same trend as in Figure 14(a) can be observed, but in the afternoon and at night the accumulated error becomes noticeable. This error reaches a maximum of 378 people, with an average of


139 people at the end of the day, while the Wi-Fi tracking system registers a maximum of 10 and an average of 7 at the end of the day (these are machines that are still turned on when the library is empty of people). Finally, Figure 14(c) shows the difference between the two previous ones, using green when the difference is positive (the Wi-Fi tracking system registers a higher value than the camera system) and red otherwise. This figure validates the measurements obtained by the Wi-Fi tracking system, since the difference with the measure of the camera system in the first half of the day is small (an average of 27 people, counting the data from 8:00 to 16:00), whereas in the afternoon the error accumulated by the camera system makes the average difference grow to 83 people between 16:00 and 22:00.
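The per-window comparison reported above can be sketched as an average of the differences between the two series within a time window. Whether the reported averages use signed or absolute differences is not stated, so the absolute value used below is an assumption, as are the function name and the toy samples.

```python
from datetime import time
from statistics import mean
from typing import Sequence, Tuple

Sample = Tuple[time, int, int]  # (time of day, Wi-Fi count, camera count)


def mean_abs_difference(samples: Sequence[Sample], start: time, end: time) -> float:
    """Average |Wi-Fi - camera| over the samples with start <= time of day < end."""
    diffs = [abs(wifi - camera) for t, wifi, camera in samples if start <= t < end]
    if not diffs:
        raise ValueError("no samples fall inside the requested window")
    return mean(diffs)


# Toy samples, not the measured June data.
toy_samples = [
    (time(9, 0), 100, 90),
    (time(12, 0), 200, 230),
    (time(17, 0), 150, 250),
    (time(20, 0), 50, 130),
]
morning = mean_abs_difference(toy_samples, time(8, 0), time(16, 0))
afternoon = mean_abs_difference(toy_samples, time(16, 0), time(22, 0))
```

In the toy data the afternoon average is much larger than the morning one, which mirrors the pattern that the paragraph attributes to the error accumulated by the camera system.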

The Wi-Fi sensor of the library was not placed for the specific purpose of counting the number of people in the library and, consequently, its coverage area can include devices that are outside the library. This explains why in some cases it registers a measurement greater than that of the camera system. In any case, the Wi-Fi sensor, with its limitations, registers a measure very similar to that of the camera-based people counting system, while avoiding the cumulative error of that system and being much cheaper (tens of euros against thousands of euros) and less intrusive than a video camera. Currently, since the camera system is already installed, the Wi-Fi sensor measures can be used as a maximum, which would limit the error made by this system. In summary, the usage of a Wi-Fi sensor to count people in the library provides accurate results despite the potential errors induced by the sensing period, collisions, the ratio between users and MACs seen, and the data processing performed. These results invite us to trust that the results of our analyses for the rest of the areas on campus are also correct.

6. Discussion and Conclusions

In the execution of this case study, we have learned some lessons about the limitations we faced that are worth taking into account for similar systems and analyses. First, there is the key issue of the sensor deployment: it is not just a question of density (number of sensors per unit of surface), but of adapting its design to the topology of the place to be monitored. For example, it is crucial to have sensors not only at the entrances and exits of buildings but also inside them, as in the considered deployment, where sensors are placed at the entrances and exits of the buildings but also in other especially relevant places, such as the library or the group-work area.

A complete coverage of indoor devices, without collisions and for the complete timespan of the stay indoors, would require a large deployment with many sensors able to cover all corners. This is not suitable in this context; instead, we relied on 9 sensors, some at gates and others in large rooms, which proved to be enough to check whether a device is within reach of any of them. However, some assumptions had to be made: we considered as valid behaviors only those of devices that spent a significant amount of time once they had been seen (to remove transient behaviors and those of people walking outside the buildings); we also discarded behaviors that did not appear enough times in the yearly timespan. On the other hand, the topology of these buildings did not impose severe restrictions on the sensors' coverage. In fact, we found several devices seen by different sensors at once. In summary, we can conclude that the finer the spatial granularity (more sensors), the larger the set of different behaviors, so working directly on raw sensor data would lead to an explosion of states that would render meaningless results.
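The two assumptions above (a minimum time in coverage and a minimum number of repetitions over the year) amount to a two-stage filter. The sketch below is only illustrative: the thresholds, the behavior labels, and the function name `filter_behaviors` are hypothetical, since the concrete values are not given in this section.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, Tuple


def filter_behaviors(
    observations: Iterable[Tuple[Hashable, float]],
    min_dwell_minutes: float,
    min_occurrences: int,
) -> Dict[Hashable, int]:
    """Keep behaviors seen long enough per observation and often enough overall.

    Each observation is (behavior, dwell time in minutes).
    """
    # Stage 1: drop transient observations (for example, people walking by).
    counts = Counter(
        behavior for behavior, dwell in observations if dwell >= min_dwell_minutes
    )
    # Stage 2: drop behaviors that are not repeated enough times.
    return {b: n for b, n in counts.items() if n >= min_occurrences}


# Hypothetical observations and thresholds.
toy_observations = [
    ("Library 9-13", 240.0),
    ("Library 9-13", 230.0),
    ("Library 9-13", 3.0),
    ("Entr A 13", 2.0),
    ("Entr A 13", 45.0),
    ("Entr B Lab 11-20", 500.0),
]
kept = filter_behaviors(toy_observations, min_dwell_minutes=10, min_occurrences=2)
```

In this toy run only the library behavior survives: the short detections are removed first, and the remaining behaviors that occur just once are discarded afterwards.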

The gathered data is another key issue: some works in the literature obtained signal strength measurements every few seconds, allowing them to identify indoor trajectories. This was not our case and, in fact, it proved to be very limiting. Thus, the precision of our analysis is limited by the regions covered by each sensor, which prevents us from analyzing any kind of movement within regions. Nevertheless, as has been seen, relevant results can still be obtained without the increase in energy consumption and the risk of flooding the school communications network that the other approach would entail.

As regards the analysis of data to identify users' behaviors, a long observation period is a must. Obtaining data for a complete work cycle (in this case, a full school year) helps in discovering the common behaviors that happen in a university. Using long observation times can also help in detecting erroneous or atypical operation of the sensors, as happened in the first months of the year in this case. Getting even larger observation datasets would reinforce the behaviors we have found (e.g., seasonality), but the chance of discovering new ones is negligible, because the expected behaviors on the campus are periodic with the school year.

The usage of big data platforms for analysis, while not mandatory, eases the management of large datasets and the execution of iterative studies on the data. The preprocessing work carried out allowed us to clean and filter our data. For example, some discovered behaviors are too regular and extended over time, which may correspond to servers or machines, which we could then filter out.
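One possible way to flag behaviors that are "too regular and extended over time" is to look for MAC addresses that are detected for most hours of the day on most days. The rule, its thresholds, and the function name below are assumptions made for illustration; the filter actually applied is not detailed in this section.

```python
from typing import Mapping, Set


def always_on_macs(
    hours_seen: Mapping[str, Mapping[str, int]],
    total_days: int,
    min_hours_per_day: int = 20,
    min_day_fraction: float = 0.9,
) -> Set[str]:
    """Flag MACs detected at least min_hours_per_day hours on most observed days.

    hours_seen[mac][day] is the number of hours in which the MAC was detected.
    """
    if total_days <= 0:
        raise ValueError("total_days must be positive")
    flagged: Set[str] = set()
    for mac, per_day in hours_seen.items():
        busy_days = sum(1 for hours in per_day.values() if hours >= min_hours_per_day)
        if busy_days / total_days >= min_day_fraction:
            flagged.add(mac)
    return flagged


# Hypothetical 10-day window with one machine-like device and one student-like device.
toy_days = [f"day{d:02d}" for d in range(1, 11)]
toy_hours = {
    "server": {day: 24 for day in toy_days},
    "student": {day: 6 for day in toy_days[:5]},
}
machines = always_on_macs(toy_hours, total_days=len(toy_days))
```

Devices flagged this way could then be excluded before the behavior analysis, in the spirit of the filtering mentioned above.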

The temporal analysis has revealed some behaviors that are maintained throughout the year and others that occur occasionally. The work has focused on the study of behaviors that are repeated and maintained throughout the year. A closer view, such as the analysis per hour, shows the hours of activity, nocturnal patterns, or movements at mealtimes. In the daily view, the difference in activity between work days and weekends or holidays is clearly seen. Visualization has been a key technique for detecting and understanding these temporal patterns.

The spatial analysis revealed that a collision problem occurs in user detection, but its effect was mitigated by the use of innovative algorithms and techniques, such as the calculation of centroids and the combination of sensor information with external sources of information (i.e., semantic trajectories), so a much more precise positioning of the users is achieved than with the exclusive use of the sensors. The visualization of centroids fostered the visual understanding of complex data, such as the position of each user over a month, and the superposition of the centroids in a heatmap made it possible to see the movements of groups of people and the occupation of the different zones.
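As an illustration of the centroid idea, the sketch below places a device at the average of the positions of the sensors that detected it, weighted by the number of detections. The weighting scheme, the planar coordinates, and the function name are assumptions; the exact centroid definition used in the study is given in earlier sections and is not reproduced here.

```python
from typing import Mapping, Tuple

Point = Tuple[float, float]


def weighted_centroid(detections: Mapping[str, int], positions: Mapping[str, Point]) -> Point:
    """Average the sensor positions, weighted by how often each sensor saw the device."""
    total = sum(detections.values())
    if total <= 0:
        raise ValueError("at least one detection is required")
    x = sum(positions[sensor][0] * n for sensor, n in detections.items()) / total
    y = sum(positions[sensor][1] * n for sensor, n in detections.items()) / total
    return (x, y)


# Hypothetical sensor coordinates (meters) and detection counts for one device.
toy_positions = {"A": (0.0, 0.0), "B": (10.0, 0.0)}
toy_detections = {"A": 1, "B": 3}
toy_centroid = weighted_centroid(toy_detections, toy_positions)
```

A device seen three times by sensor B and once by sensor A is placed three quarters of the way from A to B, which gives a finer position than assigning it to a single sensor.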


Furthermore, we have found dominant users' behaviors as the most repeated behaviors registered by the sensors throughout the year. The number of behaviors found was huge, but the application of the Proximus clustering algorithm reduced this number to a manageable amount. Then, the study of the obtained clusters revealed that the most frequent behaviors coincide with what can be expected from a college building: researchers working in laboratories and students attending classes or studying in the library.

Although some of the discovered behaviors and patterns can be seen as common knowledge, it is worth stressing that they do bring value, since they represent numerical evidence that supports decision making (e.g., someone may think that the Wi-Fi access in a given area does not work properly because it is always overcrowded, but numerical evidence is needed to appropriately justify the investment of increasing the number of APs of the corporate WLAN in that area to improve the service). In addition, such well-known patterns, when obtained automatically by processing the available data, become baseline models which can be used to detect anomalies or atypical situations, as is common practice in unsupervised machine learning.
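A very simple form of such a baseline model is a per-slot mean and standard deviation, against which a new observation is scored. The z-score rule, the threshold of 3, and the function name below are assumptions chosen for illustration, not a method taken from the study.

```python
from statistics import mean, stdev
from typing import Sequence


def is_anomalous(history: Sequence[float], observed: float, threshold: float = 3.0) -> bool:
    """Flag an observation that deviates strongly from the historical baseline."""
    if len(history) < 2:
        raise ValueError("at least two historical values are required")
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # A perfectly constant baseline: any change is atypical.
        return observed != mu
    return abs(observed - mu) / sigma > threshold


# Hypothetical occupancy of one area at the same hour on previous weekdays.
toy_history = [100, 102, 98, 101, 99]
usual = is_anomalous(toy_history, 101)
unusual = is_anomalous(toy_history, 150)
```

With this toy baseline, an occupancy of 101 is within the usual range, whereas 150 is flagged as atypical.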

Lastly, we have validated the accuracy of using Wi-Fi tracking for occupancy estimation, comparing it with the library staff manual counting (considered as ground truth) and with a video camera system installed at the library main entrance. As a main finding, Wi-Fi tracking has proved to be more accurate than the video camera system, in addition to being much cheaper. As a matter of fact, the library staff currently prefers the occupancy estimation based on Wi-Fi tracking to the one based on the video camera system. Nevertheless, the accuracy of the Wi-Fi tracking system can be further improved by correlating MAC addresses detected in the same places over sufficiently long periods of time and considering only one of them, thus mitigating the issue that a single person can carry several devices connected to Wi-Fi networks.
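The improvement suggested above (treating MAC addresses that are repeatedly seen in the same places at the same times as a single person) could be prototyped as follows. This is a hypothetical sketch: the Jaccard similarity, the 0.8 threshold, and the greedy grouping are assumptions, not a procedure evaluated in this work.

```python
from typing import List, Mapping, Set, Tuple

Presence = Set[Tuple[str, int]]  # set of (sensor, hour slot) pairs


def jaccard(a: Presence, b: Presence) -> float:
    """Share of (sensor, slot) pairs that two devices have in common."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def group_devices(presence: Mapping[str, Presence], threshold: float = 0.8) -> List[Set[str]]:
    """Greedily group MACs whose presence patterns overlap at least `threshold`."""
    groups: List[Set[str]] = []
    representatives: List[str] = []
    for mac in sorted(presence):
        for rep, group in zip(representatives, groups):
            if jaccard(presence[mac], presence[rep]) >= threshold:
                group.add(mac)
                break
        else:
            # No existing group is similar enough: start a new one.
            representatives.append(mac)
            groups.append({mac})
    return groups


# Hypothetical devices: a phone and a laptop carried together, plus an unrelated one.
toy_presence = {
    "phone": {("Library", h) for h in range(9, 14)},
    "laptop": {("Library", h) for h in range(9, 14)},
    "other": {("Entr A", 13)},
}
toy_groups = group_devices(toy_presence)
estimated_people = len(toy_groups)
```

In the toy example three MAC addresses collapse into two groups, so the occupancy estimate drops from three devices to two people.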

Besides this, the Wi-Fi tracking system is currently used by the library staff to perform more sophisticated studies, such as figuring out the percentage of students from the different schools of the university who come to study at the library of the Telecommunications Engineering School during the weekend. Figure 15 shows the results of such an analysis, which represents a token of how this kind of IoT system can help solve real-life problems and improve the operation of already running services.

To summarize, we have studied a one-year Wi-Fi tracking dataset obtained from a reduced set of low-cost sensors with limited capabilities, deployed on an actual university campus that receives around 4000 people every day. We have processed the data in order to identify traces of Wi-Fi-enabled mobile devices, which are identified as people moving in the campus buildings, and then we have extracted people's stays, movements, and common behaviors. The obtained results represent numerical evidence that illustrates how a low-cost Wi-Fi tracking system can be used in real-life conditions to improve or optimize the operation of the monitored premises. These results can allow appropriately dimensioning the WLAN infrastructure or the canteen personnel, or detecting

Figure 15: Users who only go to the library of the Telecommunication School on weekends, classified by their school (categories: CIVIL, INDUS, NAVAL, SPORT, ARQUI, FORES, EDIF, AERO, TELECOM).

anomalous situations in real time. Furthermore, the data from the people flow monitoring system are currently being used together with the data from the environmental monitoring system to try to reduce the environmental footprint of the school [72]. In addition, the Wi-Fi tracking system is actually used by the library staff in their day-to-day activity, which illustrates the value that this kind of IoT infrastructure can bring to real-life problems and services.

Data Availability

The dataset with the Wi-Fi sensor records used to support the findings of this study is available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work has been partly funded by Universidad Politécnica de Madrid through the project RES2+U (Responsables, Sostenibles y Universitarios) (http://blogs.upm.es/res2masu). The work performed by José M. Navarro was funded by the Ministerio de Educación, Cultura y Deporte de España under Grant no. FPU 14/03209.

References

[1] "2018 Revision of World Urbanization Prospects - Multimedia Library - United Nations Department of Economic and Social Affairs," https://www.un.org/development/desa/publications/2018-revision-of-world-urbanization-prospects.html.

[2] M. Pallot, "Engaging Users into Research and Innovation: The Living Lab Approach as a User Centred Open Innovation Ecosystem," https://web.archive.org/web/20120509081658/http://www.cwe-projects.eu/pub/bscw.cgi/1760838?id=715404_1760838.

[3] E. Almirall and J. Wareham, "Living Labs: arbiters of mid- and ground-level innovation," Technology Analysis & Strategic Management, vol. 23, no. 1, pp. 87–102, 2011.

[4] "TU Delft Green office," https://www.tudelft.nl/sustainability.

[5] "UBC Campus as a living laboratory," https://sustain.ubc.ca/our-commitment/campus-living-lab.

[6] "Harvard Sustainability," https://green.harvard.edu.

[7] "UPM City of the Future initiative," http://blogs.upm.es/cityofthefuture-upm/en/initiative.

[8] M. Alvarez-Campana, G. Lopez, E. Vazquez, V. A. Villagra, and J. Berrocal, "Smart CEI Moncloa: An IoT-based platform for people flow and environmental monitoring on a Smart University Campus," Sensors, vol. 17, no. 12, 2017.

[9] M. B. Kjærgaard, H. Blunck, T. Godsk, T. Toftkjær, D. L. Christensen, and K. Grønbæk, "Indoor positioning using GPS revisited," Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Preface, vol. 6030, pp. 38–56, 2010.

[10] A. Belmonte-Hernandez, G. Hernandez-Penaloza, F. Alvarez, and G. Conti, "Adaptive Fingerprinting in Multi-Sensor Fusion for Accurate Indoor Tracking," IEEE Sensors Journal, vol. 17, no. 15, pp. 4983–4998, 2017.

[11] M. S. Kristoffersen, J. V. Dueholm, R. Gade, and T. B. Moeslund, "Pedestrian counting with occlusion handling using stereo thermal cameras," Sensors, vol. 16, no. 1, 2016.

[12] L. Zheng, X. Ruan, Y. Chen, and M. Huang, "Shadow removal for pedestrian detection and tracking in indoor environments," Multimedia Tools and Applications, vol. 76, no. 18, pp. 18321–18337, 2017.

[13] T. Li, H. Chang, M. Wang, B. Ni, R. Hong, and S. Yan, "Crowded scene analysis: a survey," IEEE Transactions on Circuits and Systems for Video Technology, vol. 25, no. 3, pp. 367–386, 2015.

[14] F. Adib and D. Katabi, "See through walls with WiFi," in Proceedings of the Annual Conference of the ACM Special Interest Group on Data Communication on the Applications, Technologies, Architectures, and Protocols for Computer Communication, ACM SIGCOMM 2013, pp. 75–86, China, August 2013.

[15] Z. Kabelac, D. Katabi, and R. C. Miller, "3D Tracking via Body Radio Reflections," in Proceedings of the 11th USENIX Conference on Networked Systems Design and Implementation, pp. 317–329, 2014.

[16] F. Adib, Z. Kabelac, and D. Katabi, "Multi-person localization via RF body reflections," in Proceedings of the 12th USENIX Symposium on Networked Systems Design and Implementation, NSDI 2015, pp. 279–292, USA, May 2015.

[17] Z.-A. Deng, G. Wang, D. Qin, Z. Na, Y. Cui, and J. Chen, "Continuous indoor positioning fusing WiFi, smartphone sensors and landmarks," Sensors, vol. 16, no. 9, 2016.

[18] R. A. Becker, R. Caceres, K. Hanson et al., "A tale of one city: Using cellular network data for urban planning," IEEE Pervasive Computing, vol. 10, no. 4, pp. 18–26, 2011.

[19] J. E. Mallah, F. Carrino, O. A. Khaled, and E. Mugellini, "Crowd monitoring: critical situations prevention using smartphones and group detection," Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Preface, vol. 9189, pp. 496–505, 2015.

[20] J. Kuang, X. Niu, and X. Chen, "Robust Pedestrian Dead Reckoning Based on MEMS-IMU for Smartphones," Sensors, vol. 18, no. 5, p. 1391, 2018.

[21] A. Kurkcu and K. Ozbay, "Estimating Pedestrian Densities, Wait Times, and Flows with Wi-Fi and Bluetooth Sensors," Transportation Research Record, vol. 2644, no. 1, pp. 72–82, 2017.

[22] A. Kotanen, M. Hannikainen, H. Leppakoski, and T. Hamalainen, "Positioning with IEEE 802.11b wireless LAN," in Proceedings of the 14th IEEE 2003 International Symposium on Personal, Indoor and Mobile Radio Communications, vol. 3, pp. 2218–2222, Beijing, China, 2003.

[23] W. Ho, A. Smailagic, D. P. Siewiorek, and C. Faloutsos, "An adaptive two-phase approach to WiFi location sensing," in Proceedings of the 4th Annual IEEE International Conference on Pervasive Computing and Communications Workshops, PerCom Workshops 2006, pp. 452–456, Italy, March 2006.

[24] F. Evennou and F. Marx, "Advanced integration of WiFi and inertial navigation systems for indoor mobile positioning," EURASIP Journal on Applied Signal Processing, vol. 2006, Article ID 86706, 11 pages, 2006.

[25] J. Rekimoto, T. Miyaki, and T. Ishizawa, "LifeTag: WiFi-based continuous location logging for life pattern analysis," LNCS, vol. 4718, pp. 35–49, 2007.

[26] J. A. Besada, A. M. Bernardos, P. Tarrío, and J. R. Casar, "Analysis of tracking methods for wireless indoor localization," in Proceedings of the 2nd International Symposium on Wireless Pervasive Computing (ISWPC '07), pp. 492–497, February 2007.

[27] O. Woodman and R. Harle, "Pedestrian localisation for indoor environments," in Proceedings of the 10th International Conference on Ubiquitous Computing (UbiComp '08), pp. 114–123, Seoul, Republic of Korea, September 2008.

[28] F. Aloul, A. Sagahyroon, A. Al-Shami, I. Al-Midfa, and R. Moutassem, "Using mobiles for on campus location tracking," in Proceedings of the 7th International Conference on Advances in Mobile Computing and Multimedia, MoMM2009, pp. 231–235, Malaysia, December 2009.

[29] S. Woo, S. Jeong, E. Mok et al., "Application of WiFi-based indoor positioning system for labor tracking at construction sites: A case study in Guangzhou MTR," Automation in Construction, vol. 20, no. 1, pp. 3–13, 2011.

[30] N. Le Dortz, F. Gain, and P. Zetterberg, "WiFi fingerprint indoor positioning system using probability distribution comparison," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '12), pp. 2301–2304, March 2012.

[31] Z. Chen, H. Zou, H. Jiang, Q. Zhu, Y. C. Soh, and L. Xie, "Fusion of WiFi, smartphone sensors and landmarks using the Kalman filter for indoor localization," Sensors, vol. 15, no. 1, pp. 715–732, 2015.

[32] Y. Shu, C. Bo, G. Shen, C. Zhao, L. Li, and F. Zhao, "Magicol: indoor localization using pervasive magnetic field and opportunistic WiFi sensing," IEEE Journal on Selected Areas in Communications, vol. 33, no. 7, pp. 1443–1457, 2015.

[33] A. Danalet, B. Farooq, and M. Bierlaire, "A Bayesian approach to detect pedestrian destination-sequences from WiFi signatures," Transportation Research Part C: Emerging Technologies, vol. 44, pp. 146–170, 2014.


[34] O. Czogalla and S. Naumann, "Pedestrian indoor navigation for complex public facilities," in Proceedings of the 2016 International Conference on Indoor Positioning and Indoor Navigation, IPIN 2016, pp. 1–8, Spain, October 2016.

[35] L. Schauer, P. Marcus, and C. Linnhoff-Popien, "Towards feasible Wi-Fi based indoor tracking systems using probabilistic methods," in Proceedings of the 2016 International Conference on Indoor Positioning and Indoor Navigation, IPIN 2016, pp. 1–8, Spain, October 2016.

[36] F. Meneses and A. Moreira, "Large scale movement analysis from WiFi based location data," in Proceedings of the 2012 International Conference on Indoor Positioning and Indoor Navigation, IPIN 2012, Australia, November 2012.

[37] B. Bonne, A. Barzan, P. Quax, and W. Lamotte, "WiFiPi: involuntary tracking of visitors at mass events," in Proceedings of the IEEE 14th International Symposium on a World of Wireless, Mobile and Multimedia Networks (WoWMoM '13), pp. 1–6, Madrid, Spain, June 2013.

[38] M. B. Kjaergaard, M. Wirz, D. Roggen, and G. Troster, "Mobile sensing of pedestrian flocks in indoor environments using WiFi signals," in Proceedings of the 10th IEEE International Conference on Pervasive Computing and Communications (PerCom '12), pp. 95–102, Lugano, Switzerland, March 2012.

[39] S. Sendra, M. Garcia, C. Turro, and J. Lloret, "People mobility behaviour study in a university campus using WLANs," in Proceedings of the 3rd International Conference on Mobile Ubiquitous Computing, Systems, Services, and Technologies, UBICOMM 2009, pp. 124–129, Malta, October 2009.

[40] J. Scheuner, G. Mazlami, D. Schoni et al., "Probr - A Generic and Passive WiFi Tracking System," in Proceedings of the 41st IEEE Conference on Local Computer Networks, LCN 2016, pp. 495–502, UAE, November 2016.

[41] Y. Li, S. Williams, B. Moran, and A. Kealy, "Quantized RSS Based Wi-Fi Indoor Localization with Room Level Accuracy," in Proceedings of the International Global Navigation Satellite Systems 2018, 2018.

[42] A. Alessandrini, C. Gioia, F. Sermi, I. Sofos, D. Tarchi, and M. Vespe, "WiFi positioning and Big Data to monitor flows of people on a wide scale," in Proceedings of the 25th European Navigation Conference, ENC 2017, pp. 322–328, Switzerland, May 2017.

[43] L. Vu, K. Nahrstedt, S. Retika, and I. Gupta, "Joint bluetooth/wifi scanning framework for characterizing and leveraging people movement in university campus," in Proceedings of the 13th ACM International Conference on Modeling, Analysis, and Simulation of Wireless and Mobile Systems (MSWiM '10), pp. 257–265, October 2010.

[44] M. Garcia, S. Sendra, C. Turro, and J. Lloret, "User's Macro and Micro-mobility Study using WLANs in a University Campus," International Journal On Advances in Internet Technology, vol. 4, no. 1, pp. 37–46, 2011.

[45] Y. Xu, I. D. G. Groeneveld, R. Sulzer, E. Theocharous, O. T. Willems, and M. S. Tryfona, "Determine activity based on the classified identity of users by using Wi-Fi monitoring," Geomatics Synthesis Group Project Report, 2016.

[46] A. Danalet, L. Tinguely, M. D. Lapparent, and M. Bierlaire, "Location choice with longitudinal WiFi data," Journal of Choice Modelling, vol. 18, pp. 1–17, 2016.

[47] M. Zhou, K. Sui, M. Ma, Y. Zhao, D. Pei, and T. Moscibroda, "MobiCamp: A campus-wide testbed for studying mobile physical activities," in Proceedings of the 3rd International Workshop on Physical Analytics, WPA 2016, pp. 1–6, Singapore, 2016.

[48] A. Fernandez-Ares, A. M. Mora, M. G. Arenas et al., "Studying real traffic and mobility scenarios for a Smart City using a new monitoring and tracking system," Future Generation Computer Systems, vol. 76, pp. 163–179, 2017.

[49] A. E. Redondi and M. Cesana, "Building up knowledge through passive WiFi probes," Computer Communications, vol. 117, pp. 1–12, 2018.

[50] L. Huang, K. Matsuura, H. Yamane, and K. Sezaki, "Enhancing wireless location privacy using silent period," in Proceedings of the 2005 IEEE Wireless Communications and Networking Conference, WCNC 2005: Broadband Wireless for the Masses - Ready for Take-off, pp. 1187–1192, USA, March 2005.

[51] J. Martin, T. Mayberry, C. Donahue et al., "A Study of MAC Address Randomization in Mobile Devices and When it Fails," Proceedings on Privacy Enhancing Technologies, vol. 2017, no. 4, pp. 365–383, 2017.

[52] M. Vanhoef, C. Matte, M. Cunche, L. S. Cardoso, and F. Piessens, "Why MAC address randomization is not enough: an analysis of Wi-Fi network discovery mechanisms," in Proceedings of the 11th ACM on Asia Conference on Computer and Communications Security, pp. 413–424, ACM, Xi'an, China, June 2016.

[53] A. J. Ruiz-Ruiz, H. Blunck, T. S. Prentow, A. Stisen, and M. B. Kjaergaard, "Analysis methods for extracting knowledge from large-scale WiFi monitoring to inform building facility planning," in Proceedings of the 2014 12th IEEE International Conference on Pervasive Computing and Communications, PerCom 2014, pp. 130–138, Hungary, March 2014.

[54] N. Nunes, M. Ribeiro, C. Prandi, and V. Nisi, "Beanstalk - A community based passive Wi-Fi tracking system for analysing tourism dynamics," in Proceedings of the 9th ACM SIGCHI Symposium on Engineering Interactive Computing Systems, EICS 2017, pp. 93–98, Portugal, June 2017.

[55] C. Parent, N. Pelekis, Y. Theodoridis et al., "Semantic trajectories modeling and analysis," ACM Computing Surveys, vol. 45, no. 4, pp. 1–32, 2013.

[56] C. Wei, "Mining of User Behavioral Features Based on Indoor Semantic Trajectories," Boletín Técnico, ISSN: 0376-723X, vol. 55, 2017.

[57] K. V. Long, D. Quang, and N. Klara, Lessons learned from bluetooth/wifi scanning deployment in university campus, Urbana, Illinois, USA, 2010.

[58] V. Radu and M. K. Marina, "HiMLoc: indoor smartphone localization via activity aware pedestrian dead reckoning with selective crowdsourced WiFi fingerprinting," in Proceedings of the International Conference on Indoor Positioning and Indoor Navigation (IPIN '13), pp. 1–10, IEEE, Montbeliard-Belfort, France, October 2013.

[59] F. Hong, Y. Zhang, Z. Zhang, M. Wei, Y. Feng, and Z. Guo, "WaP: Indoor localization and tracking using WiFi-Assisted Particle filter," in Proceedings of the 39th Annual IEEE Conference on Local Computer Networks, LCN 2014, pp. 210–217, Canada, September 2014.

[60] Y. Zeng, P. H. Pathak, and P. Mohapatra, "Analyzing shopper's behavior through WiFi signals," in Proceedings of the 2nd Workshop on Physical Analytics, WPA 2015, pp. 13–18, Italy, 2015.

[61] P. Sapiezynski, A. Stopczynski, R. Gatej, and S. Lehmann, "Tracking human mobility using WiFi signals," PLoS ONE, vol. 10, no. 7, p. e0130824, 2015.

[62] Z. Tian, Y. Jin, M. Zhou, Z. Wu, and Z. Li, "Wi-Fi/MARG Integration for Indoor Pedestrian Localization," Sensors, vol. 16, no. 12, p. 2100, 2016.


[63] B. Zhou, Q. Li, Q. Mao, and W. Tu, "A Robust Crowdsourcing-Based Indoor Localization System," Sensors, vol. 17, no. 4, p. 864, 2017.

[64] "Raspberry Pi," https://www.raspberrypi.org.

[65] "TP-LINK Wi-Fi USB dongle datasheet," http://www.tp-link.com/us/products/details/cat-5520_TL-WN722N.html.

[66] "ISO/IEC 20922:2016 - Information technology – Message Queuing Telemetry Transport (MQTT) v3.1.1," https://www.iso.org/standard/69466.html.

[67] G. Piateski-Shapiro and W. Frawley, Knowledge Discovery in Databases, MIT Press, Cambridge, MA, USA, 1991.

[68] J. Andion Jimenez, J. M. Navarro Gonzalez, M. Alvarez-Campana Fernandez-Corredor, and J. C. Duenas Lopez, "A passive, non-intrusive, cheap method to identify behaviours and habits in the Campus," in Proceedings of the XIII Jornadas de Ingenieria Telematica - JITEL2017, vol. 40, no. 47, pp. 10–4995, September 2017.

[69] A. Mylinikov, "Geo project," https://www.mylnikov.org.

[70] K. Mehmet and G. Ananth, "PROXIMUS: A framework for analyzing very high dimensional discrete-attributed datasets," in Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '03, vol. 147, no. 156, pp. 147–156, USA, August 2003.

[71] "UPM Library occupation," http://ceiboard.dit.upm.es/smart-campus/biblio.

[72] C. A. R. Inarejos, A. Rodríguez, G. Lopez, and M. Alvarez-Campana, "Análisis de la huella de carbono de la ETSIT de la UPM y propuesta de mejora basada en datos de la plataforma IoT Smart CEI Moncloa," in Proceedings of the I Congreso Iberoamericano de Ciudades Inteligentes (ICSC-CITIES 2018), 2018.


Table 5: Sensor annual statistics.

| Sensor | Number of rows | Number of users |
|---|---|---|
| Library | 14166793 | 4674 |
| Entr D | 6818165 | 4621 |
| Std Tables | 5625974 | 4670 |
| Entr A | 4736237 | 4620 |
| Entr A Sec | 2352019 | 4579 |
| Entr B Sec | 2258618 | 4530 |
| Entr B Lab | 2229560 | 3630 |
| Entr C | 1469026 | 3093 |
| Entr B | 1248462 | 4127 |

Figure 12: Hourly centroids heatmaps of 2nd September: (a) 12:00 to 13:00, (b) 13:00 to 14:00, (c) 14:00 to 15:00, (d) 15:00 to 16:00, (e) 8:00 to 9:00, (f) 9:00 to 10:00, (g) 19:00 to 20:00, (h) 20:00 to 21:00.

hour. Once the behavior vectors are obtained, the information of the day and the user is discarded to make a count of the most repeated behavior vectors. The dimensionality of these vectors makes the number of possible behaviors huge, theoretically $24^{10} \approx 6 \times 10^{13}$, or $14^{10} \approx 289$K millions using only the activity hours, but it is known that users behave similarly, so a much lower number of behaviors can be expected, even though it will still be a high number.

Table 6 presents the 20 most repeated behaviors throughout the year. A total of 285K behaviors are detected, of which 139K are unique. The first 500 most frequent behaviors group 25% of the total behaviors. Table 6 shows only the part corresponding to the activity hours of the school, which, as observed in the temporal analysis, covers from 8:00 am to 10:00 pm. As can be seen, the majority of the most repeated behaviors are periods between two and five hours of stay in the library. It is necessary to expand the analysis to the top 20 to observe the class attendance behaviors. It is easy to appreciate that the different behaviors are usually morning or afternoon, with lunchtime from 1:00 pm to 3:00 pm, which means that the majority of users go either in the morning or in the afternoon, but they do not spend all day at school.
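The counting step described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name `summarize_behaviors`, the toy data, and the 4-slot vectors are assumptions made for brevity (the paper uses 24 hourly slots).

```python
from collections import Counter

def summarize_behaviors(behaviors, top_k):
    """Count repeated behavior vectors and the share covered by the top_k most frequent."""
    counts = Counter(behaviors)            # behavior tuple -> number of occurrences
    total = sum(counts.values())
    top = counts.most_common(top_k)
    covered = sum(n for _, n in top)
    return {
        "total": total,                    # all behaviors (day and user already discarded)
        "unique": len(counts),             # distinct behavior vectors
        "top": top,
        "top_share": covered / total if total else 0.0,
    }

# Hypothetical toy behaviors over 4 hourly slots (None = not seen by any sensor).
toy = [
    ("Lib", "Lib", None, None),
    ("Lib", "Lib", None, None),
    ("EntA", None, None, None),
    ("Lib", "Lib", None, None),
    (None, None, "Std", "Std"),
]
summary = summarize_behaviors(toy, top_k=1)
print(summary["total"], summary["unique"], summary["top_share"])
```

Behaviors are stored as tuples so that they are hashable and can be counted directly.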

Table 6: Top 20 most repeated behaviors. The columns of the original table are the 14 hourly slots from 8:00-9:00 to 21:00-22:00; each row below lists, in chronological order, the sensor recorded in each occupied slot (the alignment of the entries to specific slots is not preserved here).

| Rank | Occupied slots |
|---|---|
| 1 | Lib Lib Lib Lib Lib |
| 2 | Lib Lib Lib Lib |
| 3 | EntA |
| 4 | Lib Lib Lib |
| 5 | Lib Lib Lib Lib |
| 6 | Lib |
| 7 | EntA |
| 8 | Lib Lib Lib Lib Lib Lib |
| 9 | Std Std Std Std Std Std Std Std Std Std Std Std Std Std |
| 10 | EntA |
| 11 | EntA |
| 12 | Lib Lib Lib Lib Lib |
| 13 | Lib Lib |
| 14 | Lib Lib |
| 15 | EntA |
| 16 | EntA EntA |
| 17 | EntA |
| 18 | Lib Lib |
| 19 | SecA SecA SecA SecA |
| 20 | EntA |

Lib = Library; EntA = Entr A; SecA = Entr A Sec; Std = Std Tables.

Another quite obvious observation is that many of the behaviors obtained are very similar to each other (e.g., going to the library from 9:00 am to 2:00 pm or going to the library from 9:00 am to 1:00 pm). In order to lower this redundancy, we performed a clustering procedure on them. Each behavior can be understood as a categorical vector of length 24, in which the categories correspond to the 9 possible sensors of the data set plus the empty category. The chosen clustering algorithm is Proximus [70], due to its simplicity, efficiency, scalability, and the reliability of its results. The algorithm works with binary vectors and creates clusters based on the Hamming distance (the number of bits that differ between two binary vectors). A vector is chosen as the center of the cluster, and other vectors whose distance to the center is smaller than the maximum cluster radius are added to that cluster.
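The center-and-radius idea can be illustrated with a small sketch. Note that this is a simplified greedy grouping written for illustration, not the Proximus algorithm itself; the function names, the choice of the first unassigned vector as center, and the inclusive radius are all assumptions.

```python
def hamming(u, v):
    """Number of positions at which two equal-length binary vectors differ."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(a != b for a, b in zip(u, v))

def cluster_by_radius(vectors, max_radius):
    """Greedy grouping (assumption, not Proximus): the first unassigned vector
    becomes a center and absorbs every remaining vector within max_radius."""
    clusters = []
    remaining = list(vectors)
    while remaining:
        center = remaining[0]
        members = [v for v in remaining if hamming(center, v) <= max_radius]
        remaining = [v for v in remaining if hamming(center, v) > max_radius]
        clusters.append((center, members))
    return clusters

# Hypothetical 4-bit vectors: two "morning-like" and two "evening-like" patterns.
vectors = [(1, 1, 0, 0), (1, 1, 1, 0), (0, 0, 1, 1), (0, 0, 0, 1)]
clusters = cluster_by_radius(vectors, max_radius=1)
for center, members in clusters:
    print(center, len(members))
```

A larger radius merges more behaviors per cluster at the cost of less homogeneous clusters, which is the trade-off the maximum cluster radius controls.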

The behaviors are expressed in categorical vectors, so it is necessary to transform them into binary vectors to be used in Proximus. The transformation shown in Equation (1) is proposed. This transformation is simple and also reversible, which allows for the recovery of the original behavior drivers after clustering.

$$
\begin{aligned}
B &= \{h_0, \ldots, h_{23}\}, \quad h_i \in \{0, a_1, \ldots, a_9\} \\
\vec{B} &= \{\delta_{0,a_1}, \ldots, \delta_{0,a_9}, \ldots, \delta_{23,a_1}, \ldots, \delta_{23,a_9}\} \\
\delta_{i,a_j} &=
\begin{cases}
0 & \text{if } h_i \neq a_j \\
1 & \text{if } h_i = a_j
\end{cases}
\end{aligned}
\tag{1}
$$

where $a_1, \ldots, a_9$ are the nine sensors, $B$ is the behavior vector described above, and $\vec{B}$ is the transformed vector of length $24 \times 9$, made up of 24 subgroups of 9 samples, one subgroup per hour. Every sample in a subgroup is 0, except for the one at the index of the sensor active in that hour, which is set to 1.
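A minimal sketch of this transformation and its inverse is given below. The sensor names come from Table 5, but their assignment to $a_1, \ldots, a_9$, the use of `None` for the empty category, and the function names are assumptions made for illustration.

```python
# Assumed ordering of the nine sensors a1..a9 (names taken from Table 5).
SENSORS = ["Library", "Entr D", "Std Tables", "Entr A", "Entr A Sec",
           "Entr B Sec", "Entr B Lab", "Entr C", "Entr B"]

def to_binary(behavior, sensors=SENSORS):
    """Map a categorical behavior (one entry per hour, None = empty)
    to a flat 0/1 list with len(sensors) bits per hour."""
    bits = []
    for h in behavior:
        bits.extend(1 if h == a else 0 for a in sensors)
    return bits

def to_categorical(bits, sensors=SENSORS):
    """Inverse mapping: recover the categorical behavior from the bit list."""
    n = len(sensors)
    if len(bits) % n != 0:
        raise ValueError("bit vector length must be a multiple of the sensor count")
    behavior = []
    for start in range(0, len(bits), n):
        group = bits[start:start + n]
        if sum(group) > 1:
            raise ValueError("more than one active sensor in one hour")
        behavior.append(sensors[group.index(1)] if 1 in group else None)
    return behavior

# Hypothetical behavior: in the library from 9:00 to 13:00, unseen otherwise.
behavior = [None] * 24
behavior[9:13] = ["Library"] * 4
bits = to_binary(behavior)
print(len(bits), sum(bits))
```

An all-zero subgroup encodes the empty category, which is why nine bits per hour are enough for ten categories.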

Table 7 shows the results of applying the Proximus clustering to the found behaviors. The first column is the center of the cluster, the second the number of distinct behaviors grouped in that cluster, and the third the total number of recorded behaviors that fall within that cluster. The results are presented ordered by the number of behaviors included in the cluster. 45707 clusters are obtained. The first 300 group 50% of the behaviors recorded throughout the year.

Table 7: Behavior clusters.

| # | Cluster center | Cluster components | Behaviors count |
|---|---|---|---|
| 1 | Entr A Sec from 9:00 to 14:00 | 415 | 4963 |
| 2 | Entr B Sec from 10:00 to 13:00 | 458 | 3756 |
| 3 | Entr B Sec from 15:00 to 19:00 | 471 | 2924 |
| 4 | Library from 9:00 to 13:00 | 256 | 2870 |
| 5 | Library from 10:00 to 12:00 | 176 | 2396 |
| 6 | Library from 18:00 to 21:00 | 244 | 2231 |
| 7 | Entr A 13:00 | 787 | 1979 |
| 8 | Library 12:00 | 563 | 1894 |
| 9 | Library from 18:00 to 19:00 and from 20:00 to 21:00 | 175 | 1853 |
| 10 | Entr B Lab 11:00 to 20:00 | 568 | 1831 |

The most important clusters in Table 7 can be interpreted as follows:

(i) Clusters 1, 2, and 3: students attending classes. These sensors cover the main classrooms in the school, and the intervals match the class schedule.

(ii) Clusters 4, 5, 6, and 9: students at the library.

(iii) Cluster 7: users who are usually outside the sensors' coverage areas, arriving or leaving.

(iv) Cluster 10: laboratory equipment and professors. This sensor covers some of the professors' offices and laboratories, which usually have some laboratory equipment connected during work hours.

5. Validation of Occupancy Estimation Based on Wi-Fi Tracking

Although, as already pointed out in Section 4.3, it was tested at the very first stage of the deployment that the Wi-Fi sensors properly locate and track well-known MAC addresses, it was still necessary to validate the accuracy of the system for estimating occupancy (as it can be distorted by the aforementioned fact that a single user can carry several devices connected to Wi-Fi networks). As no ground-truth data was available for the full set of buildings, and only some data was found for the library, we centered our validation efforts on comparing our data with the available ones: if we can trust our results in that area, then we can extend our trust to the rest of the areas, for which no well-known data are available.

The library of ETSIT has 408 study sites, and it offers a web service to check the number of available seats at a given moment [71]. This system is based on two sources of information: a person who counts the empty seats every opening hour from Friday to Sunday, and a video camera located at the main entrance of the library that counts the number of people entering or leaving at 15-minute intervals. The human system provides a ground truth about the number of occupied positions, but this measure is very different from the actual number of people in the library, since a common situation, especially during examination terms, is that students place their study material at the seat to reserve it while they are not in the library. This situation is a problem for the library staff, and therefore they installed the video camera system to count the students' entrances and exits.

This system generates an estimation of the number of people in the library by adding to the previous measure the number of people detected entering and subtracting the number of people detected leaving. The system is not perfect, and in most cases the estimation carries an accumulated error that grows over the day. The total error can be calculated clearly at the end of the day, when the library closes and the number of people inside is supposed to be zero. In summer there is a situation that aggravates this error and, consequently, degrades the measure obtained by this system: due to the rise in temperature, the back door of the library is opened to improve ventilation, which allows students to exit through it, although they must continue entering through the main door. The camera does not count students leaving through this back entry.
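The running estimate and its end-of-day residual can be sketched as follows. The function name and the toy counts are hypothetical; the only thing taken from the text is the update rule (previous value plus entries minus exits) and the idea that the value left at closing time is the accumulated error.

```python
def camera_occupancy(entries, exits, start=0):
    """Running estimate per interval: previous value + entries - exits."""
    if len(entries) != len(exits):
        raise ValueError("entries and exits must cover the same intervals")
    estimate = []
    current = start
    for came_in, went_out in zip(entries, exits):
        current += came_in - went_out
        estimate.append(current)
    return estimate

# Hypothetical day in four intervals. 80 people enter, but only 65 exits are
# counted because 15 people leave through the uncounted back door.
entries = [40, 30, 10, 0]
exits = [0, 10, 25, 30]
estimate = camera_occupancy(entries, exits)
residual = estimate[-1]   # should be 0 at closing time; the rest is accumulated error
print(estimate, residual)
```

Because every missed exit is carried into all later intervals, the error can only be reset by an external reference, such as the known empty state at closing time.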

The library staff provided us with the data collected by the two systems (human and camera) between June 5th and 30th. These measurements can be compared with those obtained by the Wi-Fi sensor installed at the library to validate them. For this test, all the data collected by the sensor is used, without filtering the MACs of the sporadic users as explained before.

Figure 13 shows the data collected by the three systems on Sunday, June 5th. This is the first day with data from the three sources. Other days in which these three sources are present show the same trends. It is clear that there is a divergence between the human system observation and the rest of the data. The graph of the human system shows that the number of occupied seats increases in the first hours up to the maximum and remains steady until the end of the day, without being affected by the behavior of the users at lunchtime. However, this effect is reflected in the camera system and Wi-Fi tracking measurements. We observed that, in general, the number of people accounted for by the camera system is below the Wi-Fi-tracking system measure, although the proportion is maintained over time. Finally, the figure shows the cumulative error effect of the camera system, which at the end of the day still renders 65 people in the library.

Figure 13: Measure of library occupation using Wi-Fi, camera, and human systems (x-axis: hour, 10:00 to 22:00; y-axis: number of users, 0 to 400; series: Camera, Human, Wifi).

To better study the relation between the camera system and the Wi-Fi-tracking system, we generated a detailed visualization that allows us to observe the data of the whole month in a single figure. Figure 14 is composed of 3 subfigures; each of them is a matrix of colored cells in which the rows represent a full day and the columns are divisions of 15 minutes. The color intensity is proportional to the number of users measured in each interval, being more intense when more users are detected. Figure 14(a) shows the measurements of the Wi-Fi sensor. The behavior is the one observed in previous sections of this document. Figure 14(b) shows the measurement of the camera system. During the first half of the day, the same trend as in Figure 14(a) can be observed, but in the afternoon and night the accumulated error begins to be appreciated. This error reaches a maximum of 378 people, with an average of 139 people at the end of the day, while the Wi-Fi-tracking system registers a maximum of 10 and an average of 7 at the end of the day (these are machines that are still turned on when the library is empty of people). Finally, Figure 14(c) shows the difference between the two previous ones, using green when the error is positive (the Wi-Fi-tracking system registers a higher value than the camera system) and red otherwise. This figure validates the measurements obtained by the Wi-Fi-tracking system, since the difference with the measure of the camera system in the first half of the day is small (an average of 27 people, counting the data from 8:00 to 16:00), whereas in the afternoon the error accumulated by the camera system causes the average difference to grow to 83 people between 16:00 and 22:00.

Figure 14: Measures every 15 minutes of library occupation during June: (a) Wi-Fi records, (b) camera records, (c) difference between Wi-Fi and camera (x-axis: hour of day, 00 to 23).
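The comparison behind Figure 14(c) can be sketched as a per-slot difference averaged over a time window. The helper names and the toy series are hypothetical, and averaging the absolute difference is an assumption, since the text does not state whether the reported averages are signed.

```python
SLOTS_PER_HOUR = 4  # 15-minute divisions, as in Figure 14

def slot(hour, minute=0):
    """Index of the 15-minute slot that contains the given time of day."""
    return hour * SLOTS_PER_HOUR + minute // 15

def difference(wifi, camera):
    """Per-slot difference; positive when Wi-Fi registers more users than the camera."""
    return [w - c for w, c in zip(wifi, camera)]

def mean_abs(diff, start_hour, end_hour):
    """Average absolute difference over [start_hour, end_hour) (assumed metric)."""
    window = diff[slot(start_hour):slot(end_hour)]
    return sum(abs(d) for d in window) / len(window)

# Hypothetical day of 96 slots: close agreement in the morning, camera drift later.
wifi = [0] * 96
camera = [0] * 96
for s in range(slot(8), slot(16)):
    wifi[s], camera[s] = 100, 98
for s in range(slot(16), slot(22)):
    wifi[s], camera[s] = 80, 120

diff = difference(wifi, camera)
morning = mean_abs(diff, 8, 16)
afternoon = mean_abs(diff, 16, 22)
print(morning, afternoon)
```

Keeping the signed per-slot difference separately from the averaged magnitude preserves the green/red distinction of the figure while still giving one summary number per window.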

The Wi-Fi sensor of the library was not placed for the specific purpose of counting the number of people in the library, and in consequence its coverage area can detect devices that are outside the library. This explains why in some cases it registers a measurement greater than that of the camera system. In any case, the Wi-Fi sensor, with its limitations, registers a measure very similar to that of the camera-based people counting system, while avoiding the cumulative error of that system and being much cheaper (tens of euros against thousands of euros) and less intrusive than a video camera. Currently, since the camera system is already installed, the Wi-Fi sensor measures can be used as a maximum, which would limit the error made by this system. In summary, the usage of a Wi-Fi sensor to count people in the library provides accurate results, despite the potential errors induced by the sensing period, collisions, the ratio between users and MACs seen, and the data processing performed. These results invite us to trust that the results of our analyses for the rest of the areas on campus are also correct.

6. Discussion and Conclusions

In the execution of this case study, we have learned some lessons about the limitations we faced that are worth taking into account for similar systems and analyses. First, there is the key issue of the sensor deployment: it is not just a question of density (number of sensors per surface) but of adapting its design to the topology of the place to be monitored. So, for example, it is crucial to have sensors not only at the entries and exits of buildings but also inside them, as in the considered deployment, where sensors are placed at the entrances and exits of the buildings but also in other especially relevant places, such as the library or the work-in-group area.

Complete coverage of indoor devices, without collisions and for the complete timespan of the stay indoors, would require a large deployment with many sensors able to cover all corners. This is not suitable in this context; instead, we relied on 9 sensors, some at gates and others in large rooms, which proved to be enough to check if a device is within reach of any of them. But some assumptions had to be made: we considered as valid behaviors only those that spent a significant amount of time once they had been seen (to remove transient behaviors and those of people walking out of the buildings); also, we discarded behaviors that did not appear enough times in the yearly timespan. On the other hand, the topology of these buildings did not impose severe restrictions on the sensors' coverage. In fact, we found several devices seen by different sensors at once. In summary, we can conclude that the finer the spatial granularity (more sensors), the larger the set of different behaviors, so getting raw data from sensors would lead to an explosion of states that would render meaningless results.

The gathered data is another key issue: some works in the literature obtained signal strength measurements every few seconds, allowing them to identify indoor trajectories. This was not our case, and in fact this proved to be very limiting. Thus, the precision of our analysis is limited by the regions covered by each sensor, which prevents us from analyzing any kind of movement within regions. Nevertheless, as has been seen, relevant results can still be obtained without the increase in energy consumption and the risk of flooding the school communications network that the other approach would entail.

As regards the analysis of data to identify users' behaviors, we have to indicate that a long observation period is a must. Obtaining data for a complete work cycle (in this case, a full school year) helps in discovering common behaviors that happen in a university. Using long observation times can also help in detecting erroneous or atypical operation of the sensors, as happened in the first months of the year in this case. Getting even larger observation datasets would reinforce the behaviors we have found (e.g., seasonality), but the chance to discover new ones is negligible, because the expected behaviors on campus are periodic with the school year.

The usage of big data platforms for analysis, while not mandatory, eases the management of large datasets and the execution of iterative studies on the data. The preprocessing work carried out allowed us to clean and filter our data. For example, some discovered behaviors are too regular and extended over time, which may correspond to servers or machines, which we could then filter out.
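One way to express the "too regular and extended over time" criterion is a simple threshold rule, sketched below. The function name and both thresholds (`min_hours`, `min_day_share`) are hypothetical choices for illustration; the text does not specify how such devices were identified.

```python
def looks_like_machine(daily_behaviors, min_hours=20, min_day_share=0.9):
    """Flag a device whose daily behavior vectors show presence in at least
    min_hours hourly slots on at least min_day_share of the observed days."""
    if not daily_behaviors:
        return False
    long_days = sum(
        1 for day in daily_behaviors
        if sum(h is not None for h in day) >= min_hours
    )
    return long_days / len(daily_behaviors) >= min_day_share

# Hypothetical devices: one seen around the clock, one seen four hours a day.
server = [["Entr B Lab"] * 24 for _ in range(30)]
student = [[None] * 9 + ["Library"] * 4 + [None] * 11 for _ in range(30)]
print(looks_like_machine(server), looks_like_machine(student))
```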

The temporal analysis has revealed some behaviors that are maintained throughout the year and others that occur occasionally. The work has focused on the study of behaviors that are repeated and maintained throughout the year. A closer view, such as the analysis per hour, shows the hours of activity, nocturnal patterns, or movements at mealtime. In the daily view, the difference in activity between work days and weekends or holidays is clearly seen. Visualization has been a key technique for detecting and understanding these temporal patterns.

The spatial analysis revealed that a problem of collisions occurs in user detection, but its effect was mitigated by the use of innovative algorithms and techniques, such as the calculation of centroids and the combination of sensor information with external sources of information (i.e., semantic trajectories), so a much more precise positioning of the users is achieved than with the exclusive use of the sensors. The visualization of centroids fostered the visual understanding of complex data, such as the position of each user over a month, and the superposition of the centroids in a heatmap made it possible to see the movements of groups of people and the occupation of the different zones.


Furthermore, we have identified dominant user behaviors as the most repeated behaviors registered by the sensors throughout the year. The number of behaviors found was huge, but the application of the Proximus clustering algorithm reduced this number to a manageable amount. The study of the obtained clusters then revealed that the most frequent behaviors coincide with what can be expected from a college building: researchers working in laboratories and students attending classes or studying in the library.

Although some of the discovered behaviors and patterns can be seen as common knowledge, it is worth stressing that they do bring value, since they represent numerical evidence that supports decision making (e.g., someone may think that the Wi-Fi access in a given area does not work properly because it is always overcrowded, but numerical evidence is needed to appropriately justify the investment of increasing the number of APs of the corporate WLAN in that area to improve the service). In addition, such well-known patterns, when obtained automatically by processing the available data, become baseline models which can be used to detect anomalies or atypical situations, as is common practice in unsupervised machine learning.

Lastly, we have validated the accuracy of using Wi-Fi tracking for occupancy estimation, comparing it with the library staff's manual counting (considered as ground truth) and with a video camera system installed at the library main entrance. As a main finding, Wi-Fi tracking has proved to be more accurate than the video camera system, in addition to being much cheaper. As a matter of fact, the library staff currently prefers the occupancy estimation based on Wi-Fi tracking over the one based on the video camera system. Nevertheless, the accuracy of the Wi-Fi tracking system can be further improved by correlating MAC addresses detected in the same places over sufficiently long periods of time and counting only one of them, thus mitigating the issue that a single person can carry several devices connected to Wi-Fi networks.
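The proposed improvement can be illustrated with a minimal sketch that counts MAC addresses with identical presence traces as one person. The function name, the exact-match criterion, the `min_slots` threshold, and the MAC labels are all assumptions; a real implementation would likely need a more tolerant similarity measure.

```python
from collections import defaultdict

def count_people(observations, min_slots=3):
    """observations maps a MAC address to a list of (time_slot, sensor) sightings.
    MACs with identical traces of at least min_slots sightings are counted once;
    shorter traces are too weak as evidence and are counted individually."""
    groups = defaultdict(list)
    singles = 0
    for mac, seen in observations.items():
        trace = tuple(sorted(seen))
        if len(trace) >= min_slots:
            groups[trace].append(mac)
        else:
            singles += 1
    return len(groups) + singles

# Hypothetical sightings: "aa" and "bb" move together (e.g., phone and laptop).
observations = {
    "aa": [(1, "Library"), (2, "Library"), (3, "Library")],
    "bb": [(1, "Library"), (2, "Library"), (3, "Library")],
    "cc": [(1, "Library"), (2, "Entr A"), (3, "Library")],
    "dd": [(1, "Library")],
}
people = count_people(observations)
print(people)
```

The minimum trace length plays the role of the "sufficiently long period" in the text: two devices that coincide only once are not merged.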

Besides this, the Wi-Fi tracking system is currently used by the library staff to perform more sophisticated studies, such as figuring out the percentage of students from the different schools of the university who come to study at the library of the Telecommunications Engineering School during the weekend. Figure 15 shows the results of such an analysis, which illustrates how this kind of IoT system can help in solving real-life problems and improving the operation of already running services.

To summarize, we have studied a one-year Wi-Fi tracking dataset obtained from a reduced set of low-cost sensors with limited capabilities, deployed on an actual university campus that receives around 4000 people every day. We have processed the data in order to identify traces of Wi-Fi-enabled mobile devices, which are identified as people moving in the campus buildings, and then we have extracted people's stays, movements, and common behaviors. The obtained results represent numerical evidence that illustrates how a low-cost Wi-Fi tracking system can be used in real-life conditions to improve or optimize the operation of the monitored premises. These results can help in appropriately dimensioning the WLAN infrastructure or the canteen personnel, or in detecting anomalous situations in real time. Furthermore, the data from the people flow monitoring system is currently being used together with the data from the environmental monitoring system to try to reduce the environmental footprint of the school [72]. In addition, the Wi-Fi tracking system is actually used by the library staff in their day-to-day activity, which illustrates the value that this kind of IoT infrastructure can bring to real-life problems and services.

Figure 15: Users who only go to the library of the Telecommunication School on weekends, classified by their school (school labels: CIVIL, INDUS, NAVAL, SPORT, ARQUI, FORES, EDIF, AERO, TELECOM; percentage labels: 13.96, 18.06, 2.64, 0.90, 9.51, 3.82, 4.58, 17.71, 28.82).

Data Availability

The dataset with the Wi-Fi sensor records used to support the findings of this study is available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work has been partly funded by Universidad Politécnica de Madrid through the project RES2+U (Responsables, Sostenibles y Universitarios) (http://blogs.upm.es/res2masu). The work performed by José M. Navarro was funded by the Ministerio de Educación, Cultura y Deporte de España under Grant no. FPU 14/03209.

References

[1] "2018 Revision of World Urbanization Prospects - Multimedia Library - United Nations Department of Economic and Social Affairs," https://www.un.org/development/desa/publications/2018-revision-of-world-urbanization-prospects.html.

[2] M. Pallot, "Engaging Users into Research and Innovation: The Living Lab Approach as a User Centred Open Innovation Ecosystem," https://web.archive.org/web/20120509081658/http://www.cwe-projects.eu/pub/bscw.cgi/1760838?id=715404_1760838.

[3] E. Almirall and J. Wareham, "Living Labs: arbiters of mid- and ground-level innovation," Technology Analysis & Strategic Management, vol. 23, no. 1, pp. 87–102, 2011.

[4] "TUDelft Green office," https://www.tudelft.nl/sustainability.

[5] "UBC Campus as a living laboratory," https://sustain.ubc.ca/our-commitment/campus-living-lab.

[6] "Harvard Sustainability," https://green.harvard.edu.

[7] "UPM City of the Future initiative," http://blogs.upm.es/cityofthefuture-upm/en/initiative.

[8] M. Alvarez-Campana, G. Lopez, E. Vazquez, V. A. Villagra, and J. Berrocal, "Smart CEI moncloa: An iot-based platform for people flow and environmental monitoring on a Smart University Campus," Sensors, vol. 17, no. 12, 2017.

[9] M. B. Kjærgaard, H. Blunck, T. Godsk, T. Toftkjær, D. L. Christensen, and K. Grønbæk, "Indoor positioning using GPS revisited," Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics): Preface, vol. 6030, pp. 38–56, 2010.

[10] A. Belmonte-Hernandez, G. Hernandez-Penaloza, F. Alvarez, and G. Conti, "Adaptive Fingerprinting in Multi-Sensor Fusion for Accurate Indoor Tracking," IEEE Sensors Journal, vol. 17, no. 15, pp. 4983–4998, 2017.

[11] M. S. Kristoffersen, J. V. Dueholm, R. Gade, and T. B. Moeslund, "Pedestrian counting with occlusion handling using stereo thermal cameras," Sensors, vol. 16, no. 1, 2016.

[12] L. Zheng, X. Ruan, Y. Chen, and M. Huang, "Shadow removal for pedestrian detection and tracking in indoor environments," Multimedia Tools and Applications, vol. 76, no. 18, pp. 18321–18337, 2017.

[13] T. Li, H. Chang, M. Wang, B. Ni, R. Hong, and S. Yan, "Crowded scene analysis: a survey," IEEE Transactions on Circuits and Systems for Video Technology, vol. 25, no. 3, pp. 367–386, 2015.

[14] F. Adib and D. Katabi, "See through walls with WiFi," in Proceedings of the Annual Conference of the ACM Special Interest Group on Data Communication on the Applications, Technologies, Architectures, and Protocols for Computer Communication, ACM SIGCOMM 2013, pp. 75–86, China, August 2013.

[15] Z. Kabelac, D. Katabi, and R. C. Miller, "3D Tracking via Body Radio Reflections," in Proceedings of the 11th USENIX Conference on Networked Systems Design and Implementation, pp. 317–329, 2014.

[16] F. Adib, Z. Kabelac, and D. Katabi, "Multi-person localization via RF body reflections," in Proceedings of the 12th USENIX Symposium on Networked Systems Design and Implementation, NSDI 2015, pp. 279–292, USA, May 2015.

[17] Z.-A. Deng, G. Wang, D. Qin, Z. Na, Y. Cui, and J. Chen, "Continuous indoor positioning fusing WiFi, smartphone sensors and landmarks," Sensors, vol. 16, no. 9, 2016.

[18] R. A. Becker, R. Caceres, K. Hanson et al., "A tale of one city: Using cellular network data for urban planning," IEEE Pervasive Computing, vol. 10, no. 4, pp. 18–26, 2011.

[19] J. E. Mallah, F. Carrino, O. A. Khaled, and E. Mugellini, "Crowd monitoring: critical situations prevention using smartphones and group detection," Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics): Preface, vol. 9189, pp. 496–505, 2015.

[20] J. Kuang, X. Niu, and X. Chen, "Robust Pedestrian Dead Reckoning Based on MEMS-IMU for Smartphones," Sensors, vol. 18, no. 5, p. 1391, 2018.

[21] A. Kurkcu and K. Ozbay, "Estimating Pedestrian Densities, Wait Times, and Flows with Wi-Fi and Bluetooth Sensors," Transportation Research Record, vol. 2644, no. 1, pp. 72–82, 2017.

[22] A. Kotanen, M. Hannikainen, H. Leppakoski, and T. Hamalainen, "Positioning with IEEE 802.11b wireless LAN," in Proceedings of the 14th IEEE 2003 International Symposium on Personal, Indoor and Mobile Radio Communications, vol. 3, pp. 2218–2222, Beijing, China, 2003.

[23] W. Ho, A. Smailagic, D. P. Siewiorek, and C. Faloutsos, "An adaptive two-phase approach to WiFi location sensing," in Proceedings of the 4th Annual IEEE International Conference on Pervasive Computing and Communications Workshops, PerCom Workshops 2006, pp. 452–456, Italy, March 2006.

[24] F. Evennou and F. Marx, "Advanced integration of WiFi and inertial navigation systems for indoor mobile positioning," EURASIP Journal on Applied Signal Processing, vol. 2006, Article ID 86706, 11 pages, 2006.

[25] J. Rekimoto, T. Miyaki, and T. Ishizawa, "LifeTag: WiFi-based continuous location logging for life pattern analysis," LNCS, vol. 4718, pp. 35–49, 2007.

[26] J. A. Besada, A. M. Bernardos, P. Tarrío, and J. R. Casar, "Analysis of tracking methods for wireless indoor localization," in Proceedings of the 2nd International Symposium on Wireless Pervasive Computing (ISWPC '07), pp. 492–497, February 2007.

[27] O. Woodman and R. Harle, "Pedestrian localisation for indoor environments," in Proceedings of the 10th International Conference on Ubiquitous Computing (UbiComp '08), pp. 114–123, Seoul, Republic of Korea, September 2008.

[28] F. Aloul, A. Sagahyroon, A. Al-Shami, I. Al-Midfa, and R. Moutassem, "Using mobiles for on campus location tracking," in Proceedings of the 7th International Conference on Advances in Mobile Computing and Multimedia, MoMM2009, pp. 231–235, Malaysia, December 2009.

[29] S. Woo, S. Jeong, E. Mok et al., "Application of WiFi-based indoor positioning system for labor tracking at construction sites: A case study in Guangzhou MTR," Automation in Construction, vol. 20, no. 1, pp. 3–13, 2011.

[30] N. Le Dortz, F. Gain, and P. Zetterberg, "WiFi fingerprint indoor positioning system using probability distribution comparison," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '12), pp. 2301–2304, March 2012.

[31] Z. Chen, H. Zou, H. Jiang, Q. Zhu, Y. C. Soh, and L. Xie, "Fusion of WiFi, smartphone sensors and landmarks using the kalman filter for indoor localization," Sensors, vol. 15, no. 1, pp. 715–732, 2015.

[32] Y. Shu, C. Bo, G. Shen, C. Zhao, L. Li, and F. Zhao, "Magicol: indoor localization using pervasive magnetic field and opportunistic wifi sensing," IEEE Journal on Selected Areas in Communications, vol. 33, no. 7, pp. 1443–1457, 2015.

[33] A. Danalet, B. Farooq, and M. Bierlaire, "A Bayesian approach to detect pedestrian destination-sequences from WiFi signatures," Transportation Research Part C: Emerging Technologies, vol. 44, pp. 146–170, 2014.

Wireless Communications and Mobile Computing 23

[34] O. Czogalla and S. Naumann, "Pedestrian indoor navigation for complex public facilities," in Proceedings of the 2016 International Conference on Indoor Positioning and Indoor Navigation, IPIN 2016, pp. 1–8, Spain, October 2016.

[35] L. Schauer, P. Marcus, and C. Linnhoff-Popien, "Towards feasible Wi-Fi based indoor tracking systems using probabilistic methods," in Proceedings of the 2016 International Conference on Indoor Positioning and Indoor Navigation, IPIN 2016, pp. 1–8, Spain, October 2016.

[36] F. Meneses and A. Moreira, "Large scale movement analysis from WiFi based location data," in Proceedings of the 2012 International Conference on Indoor Positioning and Indoor Navigation, IPIN 2012, Australia, November 2012.

[37] B. Bonne, A. Barzan, P. Quax, and W. Lamotte, "WiFiPi: involuntary tracking of visitors at mass events," in Proceedings of the IEEE 14th International Symposium on a World of Wireless, Mobile and Multimedia Networks (WoWMoM '13), pp. 1–6, Madrid, Spain, June 2013.

[38] M. B. Kjaergaard, M. Wirz, D. Roggen, and G. Troster, "Mobile sensing of pedestrian flocks in indoor environments using WiFi signals," in Proceedings of the 10th IEEE International Conference on Pervasive Computing and Communications (PerCom '12), pp. 95–102, Lugano, Switzerland, March 2012.

[39] S. Sendra, M. Garcia, C. Turro, and J. Lloret, "People mobility behaviour study in a university campus using WLANs," in Proceedings of the 3rd International Conference on Mobile Ubiquitous Computing, Systems, Services, and Technologies, UBICOMM 2009, pp. 124–129, Malta, October 2009.

[40] J. Scheuner, G. Mazlami, D. Schoni et al., "Probr - A Generic and Passive WiFi Tracking System," in Proceedings of the 41st IEEE Conference on Local Computer Networks, LCN 2016, pp. 495–502, UAE, November 2016.

[41] Y. Li, S. Williams, B. Moran, and A. Kealy, "Quantized RSS Based Wi-Fi Indoor Localization with Room Level Accuracy," in Proceedings of the International Global Navigation Satellite Systems 2018, 2018.

[42] A. Alessandrini, C. Gioia, F. Sermi, I. Sofos, D. Tarchi, and M. Vespe, "WiFi positioning and Big Data to monitor flows of people on a wide scale," in Proceedings of the 25th European Navigation Conference, ENC 2017, pp. 322–328, Switzerland, May 2017.

[43] L. Vu, K. Nahrstedt, S. Retika, and I. Gupta, "Joint bluetooth/wifi scanning framework for characterizing and leveraging people movement in university campus," in Proceedings of the 13th ACM International Conference on Modeling, Analysis, and Simulation of Wireless and Mobile Systems (MSWiM '10), pp. 257–265, October 2010.

[44] M. Garcia, S. Sendra, C. Turro, and J. Lloret, "User's Macro and Micro-mobility Study using WLANs in a University Campus," International Journal On Advances in Internet Technology, vol. 4, no. 1, pp. 37–46, 2011.

[45] Y. Xu, I. D. G. Groeneveld, R. Sulzer, E. Theocharous, O. T. Willems, and M. S. Tryfona, "Determine activity based on the classified identity of users by using Wi-Fi monitoring," Geomatics Synthesis Group Project Report, 2016.

[46] A. Danalet, L. Tinguely, M. D. Lapparent, and M. Bierlaire, "Location choice with longitudinal WiFi data," Journal of Choice Modelling, vol. 18, pp. 1–17, 2016.

[47] M. Zhou, K. Sui, M. Ma, Y. Zhao, D. Pei, and T. Moscibroda, "MobiCamp: A campus-wide testbed for studying mobile physical activities," in Proceedings of the 3rd International Workshop on Physical Analytics, WPA 2016, pp. 1–6, Singapore, 2016.

[48] A. Fernandez-Ares, A. M. Mora, M. G. Arenas et al., "Studying real traffic and mobility scenarios for a Smart City using a new monitoring and tracking system," Future Generation Computer Systems, vol. 76, pp. 163–179, 2017.

[49] A. E. Redondi and M. Cesana, "Building up knowledge through passive WiFi probes," Computer Communications, vol. 117, pp. 1–12, 2018.

[50] L. Huang, K. Matsuura, H. Yamane, and K. Sezaki, "Enhancing wireless location privacy using silent period," in Proceedings of the 2005 IEEE Wireless Communications and Networking Conference, WCNC 2005: Broadband Wireless for the Masses - Ready for Take-off, pp. 1187–1192, USA, March 2005.

[51] J. Martin, T. Mayberry, C. Donahue et al., "A Study of MAC Address Randomization in Mobile Devices and When it Fails," Proceedings on Privacy Enhancing Technologies, vol. 2017, no. 4, pp. 365–383, 2017.

[52] M. Vanhoef, C. Matte, M. Cunche, L. S. Cardoso, and F. Piessens, "Why MAC address randomization is not enough: an analysis of Wi-Fi network discovery mechanisms," in Proceedings of the 11th ACM on Asia Conference on Computer and Communications Security, pp. 413–424, ACM, Xi'an, China, June 2016.

[53] A. J. Ruiz-Ruiz, H. Blunck, T. S. Prentow, A. Stisen, and M. B. Kjaergaard, "Analysis methods for extracting knowledge from large-scale WiFi monitoring to inform building facility planning," in Proceedings of the 2014 12th IEEE International Conference on Pervasive Computing and Communications, PerCom 2014, pp. 130–138, Hungary, March 2014.

[54] N. Nunes, M. Ribeiro, C. Prandi, and V. Nisi, "Beanstalk - A community based passive Wi-Fi tracking system for analysing tourism dynamics," in Proceedings of the 9th ACM SIGCHI Symposium on Engineering Interactive Computing Systems, EICS 2017, pp. 93–98, Portugal, June 2017.

[55] C. Parent, N. Pelekis, Y. Theodoridis et al., "Semantic trajectories modeling and analysis," ACM Computing Surveys, vol. 45, no. 4, pp. 1–32, 2013.

[56] C. Wei, "Mining of User Behavioral Features Based on Indoor Semantic Trajectories," Boletín Técnico, ISSN 0376-723X, vol. 55, 2017.

[57] K. V. Long, D. Quang, and N. Klara, Lessons learned from bluetooth/wifi scanning deployment in university campus, Urbana, Illinois, USA, 2010.

[58] V. Radu and M. K. Marina, "HiMLoc: indoor smartphone localization via activity aware pedestrian dead reckoning with selective crowdsourced WiFi fingerprinting," in Proceedings of the International Conference on Indoor Positioning and Indoor Navigation (IPIN '13), pp. 1–10, IEEE, Montbeliard-Belfort, France, October 2013.

[59] F. Hong, Y. Zhang, Z. Zhang, M. Wei, Y. Feng, and Z. Guo, "WaP: Indoor localization and tracking using WiFi-Assisted Particle filter," in Proceedings of the 39th Annual IEEE Conference on Local Computer Networks, LCN 2014, pp. 210–217, Canada, September 2014.

[60] Y. Zeng, P. H. Pathak, and P. Mohapatra, "Analyzing shopper's behavior through WiFi signals," in Proceedings of the 2nd Workshop on Physical Analytics, WPA 2015, pp. 13–18, Italy, 2015.

[61] P. Sapiezynski, A. Stopczynski, R. Gatej, and S. Lehmann, "Tracking human mobility using WiFi signals," PLoS ONE, vol. 10, no. 7, p. e0130824, 2015.

[62] Z. Tian, Y. Jin, M. Zhou, Z. Wu, and Z. Li, "Wi-Fi/MARG Integration for Indoor Pedestrian Localization," Sensors, vol. 16, no. 12, p. 2100, 2016.


[63] B. Zhou, Q. Li, Q. Mao, and W. Tu, "A Robust Crowdsourcing-Based Indoor Localization System," Sensors, vol. 17, no. 4, p. 864, 2017.

[64] "Raspberry Pi," https://www.raspberrypi.org.

[65] "TP-LINK Wi-Fi USB dongle datasheet," http://www.tp-link.com/us/products/details/cat-5520_TL-WN722N.html.

[66] "ISO/IEC 20922:2016 - Information technology – Message Queuing Telemetry Transport (MQTT) v3.1.1," https://www.iso.org/standard/69466.html.

[67] G. Piateski-Shapiro and W. Frawley, Knowledge Discovery in Databases, MIT Press, Cambridge, MA, USA, 1991.

[68] J. Andion Jimenez, J. M. Navarro Gonzalez, M. Alvarez-Campana Fernandez-Corredor, and J. C. Duenas Lopez, "A passive non-intrusive cheap method to identify behaviours and habits in the Campus," in Proceedings of the XIII Jornadas de Ingenieria Telematica - JITEL2017, vol. 40, no. 47, pp. 10–4995, September 2017.

[69] A. Mylinikov, "Geo project," https://www.mylnikov.org.

[70] K. Mehmet and G. Ananth, "PROXIMUS: A framework for analyzing very high dimensional discrete-attributed datasets," in Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '03, vol. 147, no. 156, pp. 147–156, USA, August 2003.

[71] "UPM Library occupation," http://ceiboard.dit.upm.es/smart-campus/biblio.

[72] C. A. R. Inarejos, A. Rodríguez, G. Lopez, and M. Alvarez-Campana, "Analisis de la huella de carbono de la ETSIT de la UPM y propuesta de mejora basada en datos de la plataforma IoT Smart CEI Moncloa," in Proceedings of the I Congreso Iberoamericano de Ciudades Inteligentes (ICSC-CITIES 2018), 2018.


Table 6: Top 20 most repeated behaviors. The columns of the table are the 14 hourly slots from 8:00-9:00 to 21:00-22:00; each row lists the sensor recorded in the slots occupied by that behavior (slot positions not shown).

1 | Lib ×5
2 | Lib ×4
3 | EntA
4 | Lib ×3
5 | Lib ×4
6 | Lib
7 | EntA
8 | Lib ×6
9 | Std ×14
10 | EntA
11 | EntA
12 | Lib ×5
13 | Lib ×2
14 | Lib ×2
15 | EntA
16 | EntA ×2
17 | EntA
18 | Lib ×2
19 | SecA ×4
20 | EntA

Lib = Library; EntA = Entr. A; SecA = Entr. A Sec.; Std = Std. Tables.

to the library from 9:00 am to 2:00 pm or going to the library from 9:00 am to 1:00 pm). To reduce this redundancy, we applied a clustering procedure to the behaviors. Each behavior can be understood as a categorical vector of length 24 in which the categories correspond to the 9 possible sensors of the dataset plus the empty category. The chosen clustering algorithm is Proximus [70], due to its simplicity, efficiency, scalability, and the reliability of its results. The algorithm works with binary vectors and creates clusters based on the Hamming distance (the number of bits that differ between two binary vectors). A vector is chosen as the center of the cluster, and other vectors whose distance to the center is smaller than the maximum cluster radius are added to that cluster.
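The grouping rule described above (a center plus every vector closer than a maximum radius in Hamming distance) can be illustrated with a short sketch. This is a simplified greedy grouping written for illustration only, not the Proximus implementation from [70]; the function names and the choice of the first unassigned vector as center are assumptions.

```python
from typing import List, Sequence


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions at which two equal-length binary vectors differ."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x != y for x, y in zip(a, b))


def radius_clusters(vectors: List[Sequence[int]], max_radius: int) -> List[List[int]]:
    """Greedy grouping by Hamming radius (illustrative, not Proximus itself).

    The first unassigned vector becomes a cluster center and absorbs every
    other unassigned vector whose distance to it is smaller than max_radius.
    Returns clusters as lists of indices into `vectors`.
    """
    unassigned = list(range(len(vectors)))
    clusters: List[List[int]] = []
    while unassigned:
        center = unassigned[0]
        members = [center] + [
            i for i in unassigned[1:]
            if hamming(vectors[center], vectors[i]) < max_radius
        ]
        clusters.append(members)
        unassigned = [i for i in unassigned if i not in members]
    return clusters
```

With four toy vectors `[1,1,0,0]`, `[1,1,1,0]`, `[0,0,1,1]`, `[0,0,0,1]` and a radius of 2, the first two and the last two end up in separate clusters.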

The behaviors are expressed as categorical vectors, so they must be transformed into binary vectors before they can be used in Proximus. The transformation shown in Equation (1) is proposed. This transformation is simple and also reversible, which allows the original behavior drivers to be recovered after clustering.

$$
\begin{aligned}
B &= \{h_0, \ldots, h_{23}\}, \quad h_i \in \{0, a_1, \ldots, a_9\} \\
\vec{B} &= \{\delta_{0,a_1}, \ldots, \delta_{0,a_9}, \ldots, \delta_{23,a_1}, \ldots, \delta_{23,a_9}\} \\
\delta_{i,a_j} &=
\begin{cases}
0 & \text{if } h_i \neq a_j \\
1 & \text{if } h_i = a_j
\end{cases}
\end{aligned}
\tag{1}
$$

where $a_1, \ldots, a_9$ are the nine sensors, $B$ is the behavior vector described above, and $\vec{B}$ is the transformed vector of length 24 × 9, made up of 24 subgroups of 9 samples, each subgroup associated with one hour. Every sample in a subgroup is 0, except for the one at the index of the sensor active in that hour, which is marked with a 1.
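As an illustration of the transformation in Equation (1) and of its reversibility, the following minimal sketch encodes a 24-hour behavior into a 216-bit vector and decodes it back. The sensor names `a1` to `a9` and the use of `None` for the empty category (written as 0 in the equation) are placeholders chosen for this example.

```python
from typing import List, Optional

SENSORS = [f"a{j}" for j in range(1, 10)]  # placeholder names for the nine sensors
HOURS = 24


def encode(behavior: List[Optional[str]]) -> List[int]:
    """Map a categorical behavior (one sensor or None per hour) to 24 * 9 bits."""
    if len(behavior) != HOURS:
        raise ValueError("a behavior must have one entry per hour")
    bits: List[int] = []
    for h in behavior:
        if h is not None and h not in SENSORS:
            raise ValueError(f"unknown sensor: {h}")
        # One 9-bit subgroup per hour: 1 at the active sensor, 0 elsewhere.
        bits.extend(1 if h == s else 0 for s in SENSORS)
    return bits


def decode(bits: List[int]) -> List[Optional[str]]:
    """Inverse of encode for one-hot subgroups; an all-zero subgroup means empty."""
    n = len(SENSORS)
    if len(bits) != HOURS * n:
        raise ValueError("unexpected vector length")
    behavior: List[Optional[str]] = []
    for i in range(HOURS):
        group = bits[i * n:(i + 1) * n]
        behavior.append(SENSORS[group.index(1)] if 1 in group else None)
    return behavior
```

For example, a behavior that is empty except for sensor `a4` during hours 9 to 12 encodes to a vector with exactly four bits set, and `decode(encode(b))` returns the original behavior.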

Table 7 shows the results of applying the Proximus clustering to the behaviors found. The first column is the center of the cluster, the second is the number of behaviors that are grouped in that cluster (its components), and the third is the count of behaviors that fall within that cluster. The results are ordered by the number of behaviors included in the cluster. In total, 45707 clusters are obtained. The first 300 group 50% of the behaviors recorded throughout the year. The most important cluster behaviors in Table 7 can be understood as follows:


Table 7: Behavior clusters.

No. | Cluster center | Cluster components | Behaviors count
1 | Entr. A Sec. from 9:00 to 14:00 | 415 | 4963
2 | Entr. B Sec. from 10:00 to 13:00 | 458 | 3756
3 | Entr. B Sec. from 15:00 to 19:00 | 471 | 2924
4 | Library from 9:00 to 13:00 | 256 | 2870
5 | Library from 10:00 to 12:00 | 176 | 2396
6 | Library from 18:00 to 21:00 | 244 | 2231
7 | Entr. A 13:00 | 787 | 1979
8 | Library 12:00 | 563 | 1894
9 | Library from 18:00 to 19:00 and from 20:00 to 21:00 | 175 | 1853
10 | Entr. B Lab. 11:00 to 20:00 | 568 | 1831

(i) Clusters 1, 2, and 3: students attending classes. These sensors cover the main classrooms in the school, and the intervals match the class schedule.

(ii) Clusters 4, 5, 6, and 9: students at the library.

(iii) Cluster 7: users who are usually outside the sensors' coverage areas, arriving or leaving.

(iv) Cluster 10: laboratory equipment and professors. This sensor covers some of the professors' offices and laboratories, which usually have some laboratory equipment connected during work hours.

5. Validation of Occupancy Estimation Based on Wi-Fi Tracking

Although, as already pointed out in Section 4.3, it was tested at the very first stage of the deployment that the Wi-Fi sensors properly locate and track well-known MAC addresses, the accuracy of the system for estimating occupancy still needed to be validated, since it can be distorted by the aforementioned fact that a single user can carry several devices connected to Wi-Fi networks. As no ground-truth data were available for the full set of buildings and only some data were found for the library, we focused our validation efforts on comparing our data with the available ones: if we can trust our results in that area, then we can extend our trust to the rest of the areas, for which no well-known data are available.

The library of ETSIT has 408 study seats, and it offers a web service to check the number of available seats at a given moment [71]. This system is based on two sources of information: a person who counts the empty seats every opening hour from Friday to Sunday, and a video camera located at the main entrance of the library that counts the number of people entering or leaving at 15-minute intervals. The human system provides a ground truth about the number of occupied seats, but this measure is very different from the actual number of people in the library, since a common situation, especially during examination terms, is that students leave their study material at the seat to reserve it while they are not in the library. This situation is a problem for the library staff, and therefore they installed the video camera system to count the students' entrances and exits.

This system generates an estimation of the number of people in the library by adding to the previous measure the number of people detected entering and subtracting the number of people detected leaving. The system is not perfect, and in most cases it carries an accumulated error that grows in the estimation of the number of people in the library. The total error can be calculated clearly at the end of the day, when the library closes and the number of people inside is supposed to be zero. In summer there is a situation that aggravates this error and, consequently, the measure provided by this system: due to the rise in temperature, the back door of the library is opened to improve ventilation, which allows students to exit through it, although they must still enter through the main door. The camera does not count students leaving through this back door.
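The running estimate and its end-of-day error can be sketched as follows. This is a minimal illustration of the arithmetic described above, assuming the camera reports one `(entries, exits)` pair per 15-minute interval; the function names and the sample numbers are made up for the example.

```python
from itertools import accumulate
from typing import List, Tuple


def occupancy_estimates(counts: List[Tuple[int, int]], initial: int = 0) -> List[int]:
    """Running occupancy: previous estimate + entries - exits, per interval."""
    deltas = (entries - exits for entries, exits in counts)
    # accumulate(..., initial=...) also yields the starting value; drop it.
    return list(accumulate(deltas, initial=initial))[1:]


def closing_error(estimates: List[int]) -> int:
    """Accumulated error at closing time, when the true occupancy is assumed zero."""
    return estimates[-1]


# Hypothetical day in which 10 exits through an uncounted door are missed.
day = [(30, 0), (20, 5), (0, 25), (0, 10)]
estimates = occupancy_estimates(day)
```

In this toy day the estimates are 30, 45, 20, and 10, so the system would still report 10 people inside at closing time; that residual is the accumulated error.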

The library staff provided us with the data collected by the two systems (human and camera) between June 5th and 30th. These measurements can be compared with those obtained by the Wi-Fi sensor installed at the library in order to validate them. For this test, all the data collected by the sensor are used, without applying the filtering of sporadic users' MACs explained before.

Figure 13 shows the data collected by the three systems on Sunday, June 5th. This is the first day with data from the three sources. Other days in which these three sources are present show the same trends. There is a clear divergence between the human system observation and the rest of the data. The graph of the human system shows that the number of occupied seats increases in the first hours up to the maximum and remains steady until the end of the day, without being affected by the behavior of the users at lunchtime. However, this effect is reflected in the camera system and Wi-Fi tracking measurements. We observed that, in general, the number of people counted by the camera system is below the Wi-Fi tracking system measure, although the proportion is maintained over time. Finally, the figure shows the cumulative error effect of the camera system, which at the end of the day still reports 65 people in the library.

Figure 13: Measure of library occupation using Wi-Fi, camera, and human systems (x-axis: hour, 10:00 to 22:00; y-axis: number of users, 0 to 400).

Figure 14: Measures every 15 minutes of library occupation during June (x-axis: hour of day, 00 to 23): (a) Wi-Fi records, (b) camera records, (c) difference between Wi-Fi and camera.

To better study the relation between the camera system and the Wi-Fi tracking system, we generated a detailed visualization that allows us to observe the data of the whole month in a single figure. Figure 14 is composed of 3 subfigures; each of them is a matrix of colored cells in which the rows represent a full day and the columns are divisions of 15 minutes. The color intensity is proportional to the number of users measured in each interval, being more intense when more users are detected. Figure 14(a) shows the measurements of the Wi-Fi sensor. The behavior is the one observed in previous sections of this document. Figure 14(b) shows the measurement of the camera system. During the first half of the day, the same trend as in Figure 14(a) can be observed, but in the afternoon and night the accumulated error becomes noticeable. This error reaches a maximum of 378 people, with an average of 139 people at the end of the day, while the Wi-Fi tracking system registers a maximum of 10 and an average of 7 at the end of the day (these are machines that are still turned on when the library is empty of people). Finally, Figure 14(c) shows the difference between the two previous ones, using green when the difference is positive (the Wi-Fi tracking system registers a higher value than the camera system) and red otherwise. This figure validates the measurements obtained by the Wi-Fi tracking system, since the difference with the measure of the camera system in the first half of the day is small (an average of 27 people, counting the data from 8:00 to 16:00), whereas in the afternoon the error accumulated by the camera system makes the average difference grow to 83 people between 16:00 and 22:00.
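The per-window averages quoted above can be computed along the following lines. This is only a sketch: the text does not state whether the averages use signed or absolute differences, so the absolute difference is an assumption here, and the sample readings are invented for the example.

```python
from datetime import time
from statistics import mean
from typing import Iterable, Tuple

Sample = Tuple[time, int, int]  # (time of day, Wi-Fi count, camera count)


def mean_abs_difference(samples: Iterable[Sample], start: time, end: time) -> float:
    """Average |Wi-Fi - camera| over samples with start <= time of day < end."""
    diffs = [abs(wifi - cam) for t, wifi, cam in samples if start <= t < end]
    if not diffs:
        raise ValueError("no samples in the requested window")
    return mean(diffs)


# Hypothetical 15-minute readings from one day.
readings = [
    (time(9, 0), 100, 90),
    (time(12, 0), 200, 170),
    (time(17, 0), 150, 230),
    (time(21, 45), 10, 120),
]
morning = mean_abs_difference(readings, time(8, 0), time(16, 0))
afternoon = mean_abs_difference(readings, time(16, 0), time(22, 0))
```

On these invented readings the morning window averages 20 people of difference and the afternoon window 95, reproducing the qualitative pattern described for Figure 14(c).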

The Wi-Fi sensor of the library was not placed for the specific purpose of counting the number of people in the library, and consequently its coverage area can include devices that are outside the library. This explains why in some cases it registers a measurement greater than that of the camera system. In any case, the Wi-Fi sensor, with its limitations, registers a measure very similar to that of the camera-based people counting system, while avoiding the cumulative error of that system and being much cheaper (tens of euros against thousands of euros) and less intrusive than a video camera. Currently, since the camera system is already installed, the Wi-Fi sensor measures can be used as a maximum that bounds the error made by this system. In summary, the use of a Wi-Fi sensor to count people in the library provides accurate results despite the potential errors induced by the sensing period, collisions, the ratio between users and MACs seen, and the data processing performed. These results suggest that the results of our analyses for the rest of the areas on campus are also correct.

6. Discussion and Conclusions

In carrying out this case study, we have learned some lessons about the limitations we faced that are worth taking into account for similar systems and analyses. First, there is the key issue of the sensor deployment: it is not just a question of density (number of sensors per unit of surface) but of adapting its design to the topology of the place to be monitored. For example, it is crucial to have sensors not only at the entrances and exits of buildings but also inside them, as in the considered deployment, where sensors are placed at the entrances and exits of the buildings but also in other especially relevant places, such as the library or the group work area.

A complete coverage of indoor devices, without collisions and for the complete timespan of the stay indoors, would require a large deployment with many sensors able to cover all corners. This is not suitable in this context; instead, we relied on 9 sensors, some at gates and others in large rooms, which proved to be enough to check whether a device is within reach of any of them. However, some assumptions had to be made: we considered as valid behaviors only those that spent a significant amount of time once they had been seen (to remove transient behaviors and those of people walking out of the buildings), and we discarded behaviors that did not appear enough times in the yearly timespan. On the other hand, the topology of these buildings did not impose severe restrictions on the sensors' coverage. In fact, we found several devices seen by different sensors at once. In summary, we can conclude that the finer the spatial granularity (more sensors), the larger the set of different behaviors, so using raw data from the sensors would lead to an explosion of states that would render meaningless results.

The gathered data is another key issue: some works in the literature obtained signal strength measurements every few seconds, allowing them to identify indoor trajectories. This was not our case, and in fact this proved to be very limiting. Thus, the precision of our analysis is limited by the regions covered by each sensor, which prevents us from analyzing any kind of movement within regions. Nevertheless, as has been seen, relevant results can still be obtained without the increase in energy consumption and the risk of flooding the school communications network that the other approach would entail.

As regards the analysis of data to identify users' behaviors, a long observation period is a must. Obtaining data for a complete work cycle (in this case, a full school year) helps in discovering common behaviors that happen in a university. Using long observation times can also help in detecting erroneous or atypical operation of the sensors, as happened in the first months of the year in this case. Getting even larger observation datasets would reinforce the behaviors we have found (e.g., seasonality), but the chance of discovering new ones is negligible, because the expected behaviors on the campus are periodic with the school year.

The use of big data platforms for analysis, while not mandatory, eases the management of large datasets and the execution of iterative studies on the data. The preprocessing work carried out allowed us to clean and filter our data. For example, some discovered behaviors are too regular and extended over time; they may correspond to servers or machines, which we could then filter out.

The temporal analysis has revealed some behaviors that are maintained throughout the year and others that occur occasionally. The work has focused on the study of behaviors that are repeated and maintained throughout the year. A closer view, such as the analysis per hour, shows the hours of activity, nocturnal patterns, or movements at mealtime. In the daily view, the difference in activity between work days and weekends or holidays is clearly seen. Visualization has been a key technique to detect and understand these temporal patterns.

The spatial analysis revealed that a collision problem occurs in user detection, but its effect was mitigated by the use of innovative algorithms and techniques, such as the calculation of centroids and the combination of sensor information with external sources of information (i.e., semantic trajectories), so that a much more precise positioning of the users is achieved than with the exclusive use of the sensors. The visualization of centroids fostered the visual understanding of complex data, such as the position of each user over a month, and the superposition of the centroids in a heatmap made it possible to know the movements of the groups of people and the occupation of the different zones.


Furthermore, we have found dominant users' behaviors as the most repeated behaviors registered by the sensors throughout the year. The number of behaviors found was huge, but the application of the Proximus clustering algorithm reduced this number to a manageable amount. The study of the obtained clusters then revealed that the most frequent behaviors coincide with what can be expected from a college building: researchers working in laboratories and students attending classes or studying in the library.

Although some of the discovered behaviors and patterns can be seen as common knowledge, it is worth stressing that they do bring value, since they represent numerical evidence that supports decision making (e.g., someone may think that the Wi-Fi access in a given area does not work properly because it is always overcrowded, but numerical evidence is needed to appropriately justify the investment of increasing the number of APs of the corporate WLAN in that area to improve the service). In addition, such well-known patterns, when obtained automatically by processing the available data, become baseline models that can be used to detect anomalies or atypical situations, as is common practice in unsupervised machine learning.

Lastly, we have validated the accuracy of using Wi-Fi tracking for occupancy estimation by comparing it with the library staff's manual counting (considered as ground truth) and with a video camera system installed at the library main entrance. As a main finding, Wi-Fi tracking has proved to be more accurate than the video camera system, in addition to being much cheaper. As a matter of fact, the library staff currently prefers the occupancy estimation based on Wi-Fi tracking to the one based on the video camera system. Nevertheless, the accuracy of the Wi-Fi tracking system can be further improved by correlating MAC addresses detected in the same places over sufficiently long periods of time and counting them only once, thus mitigating the issue that a single person can carry several devices connected to Wi-Fi networks.
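One possible reading of this improvement is sketched below. It is not part of the deployed system: the record layout `(mac, sensor, slot)`, the `min_slots` threshold, and the strict rule that two MACs are merged only when their sets of sightings are identical are all assumptions made for illustration; a practical version would likely tolerate small mismatches.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, Set, Tuple

Sighting = Tuple[str, str, int]  # (MAC address, sensor name, time slot index)


def estimate_people(sightings: Iterable[Sighting], min_slots: int) -> int:
    """Count devices, merging MACs that share an identical presence pattern.

    MACs seen in fewer than min_slots (sensor, slot) pairs are counted
    individually, since there is too little evidence to merge them.
    """
    presence: Dict[str, Set[Tuple[str, int]]] = defaultdict(set)
    for mac, sensor, slot in sightings:
        presence[mac].add((sensor, slot))

    people = 0
    signatures: Set[FrozenSet[Tuple[str, int]]] = set()
    for seen in presence.values():
        if len(seen) < min_slots:
            people += 1
        else:
            signatures.add(frozenset(seen))  # identical patterns collapse to one
    return people + len(signatures)
```

For instance, a phone and a laptop seen by the same sensor in the same three slots would be counted as one person, while a device with a different trajectory, or one seen only once, would still be counted separately.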

Besides this, the Wi-Fi tracking system is currently used by the library staff to perform more sophisticated studies, such as figuring out the percentage of students from the different schools of the university who come to study at the library of the Telecommunications Engineering School during the weekend. Figure 15 shows the results of such an analysis, which is an example of how this kind of IoT system can help solve real-life problems and improve the operation of already running services.

Figure 15: Users who only go to the library of the Telecommunication School on weekends, classified by their school (chart legend: CIVIL, INDUS, NAVAL, SPORT, ARQUI, FORES, EDIF, AERO, TELECOM; percentage labels in the chart: 13.96, 18.06, 2.64, 0.90, 9.51, 3.82, 4.58, 17.71, 28.82).

To summarize, we have studied a one-year Wi-Fi tracking dataset obtained from a reduced set of low-cost sensors with limited capabilities, deployed on an actual university campus that receives around 4000 people every day. We have processed the data in order to identify traces of Wi-Fi-enabled mobile devices, which are identified as people moving in the campus buildings, and then we have extracted people's stays, movements, and common behaviors. The obtained results represent numerical evidence that illustrates how a low-cost Wi-Fi tracking system can be used in real-life conditions to improve or optimize the operation of the monitored premises. These results can allow appropriately dimensioning the WLAN infrastructure or the canteen personnel, or detecting anomalous situations in real time. Furthermore, the data from the people flow monitoring system is currently being used together with the data from the environmental monitoring system to try to reduce the environmental footprint of the school [72]. In addition, the Wi-Fi tracking system is actually used by the library staff in their day-to-day activity, which illustrates the value that this kind of IoT infrastructure can bring to real-life problems and services.

Data Availability

The dataset with the Wi-Fi sensor records used to support the findings of this study is available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work has been partly funded by Universidad Politécnica de Madrid through the project RES2+U (Responsables, Sostenibles y Universitarios) (http://blogs.upm.es/res2masu). The work performed by José M. Navarro was funded by the Ministerio de Educación, Cultura y Deporte de España under Grant no. FPU 14/03209.

References

[1] "2018 Revision of World Urbanization Prospects - Multimedia Library - United Nations Department of Economic and Social Affairs," https://www.un.org/development/desa/publications/2018-revision-of-world-urbanization-prospects.html.

[2] M Pallot ldquoEngaging Users into Research and Innova-tionThe Living Lab Approach as a User CentredOpen Innova-tion Ecosystemrdquo httpswebarchiveorgweb20120509081658httpwwwcwe-projectseupubbscwcgi1760838id=7154041760838

[3] E Almirall and J Wareham ldquoLiving Labs arbiters of mid-and ground-level innovationrdquo Technology Analysis amp StrategicManagement vol 23 no 1 pp 87ndash102 2011

[4] ldquoTUDelft Green officerdquo httpswwwtudelftnlsustainability[5] ldquoUBC Campus as a living laboratoryrdquo httpssustainubcca

our-commitmentcampus-living-lab[6] ldquoHarvard Sustainabilityrdquo httpsgreenharvardedu[7] ldquoUPM City of the Future initiativerdquo httpblogsupmes

cityofthefuture-upmeninitiative[8] M Alvarez-Campana G Lopez E Vazquez V A Villagra

and J Berrocal ldquoSmart CEI moncloa An iot-based platformfor people flow and environmental monitoring on a SmartUniversity Campusrdquo Sensors vol 17 no 12 2017

[9] M. B. Kjærgaard, H. Blunck, T. Godsk, T. Toftkjær, D. L. Christensen, and K. Grønbæk, “Indoor positioning using GPS revisited,” Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics): Preface, vol. 6030, pp. 38–56, 2010.

[10] A. Belmonte-Hernandez, G. Hernandez-Penaloza, F. Alvarez, and G. Conti, “Adaptive Fingerprinting in Multi-Sensor Fusion for Accurate Indoor Tracking,” IEEE Sensors Journal, vol. 17, no. 15, pp. 4983–4998, 2017.

[11] M. S. Kristoffersen, J. V. Dueholm, R. Gade, and T. B. Moeslund, “Pedestrian counting with occlusion handling using stereo thermal cameras,” Sensors, vol. 16, no. 1, 2016.

[12] L. Zheng, X. Ruan, Y. Chen, and M. Huang, “Shadow removal for pedestrian detection and tracking in indoor environments,” Multimedia Tools and Applications, vol. 76, no. 18, pp. 18321–18337, 2017.

[13] T. Li, H. Chang, M. Wang, B. Ni, R. Hong, and S. Yan, “Crowded scene analysis: a survey,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 25, no. 3, pp. 367–386, 2015.

[14] F. Adib and D. Katabi, “See through walls with WiFi!,” in Proceedings of the Annual Conference of the ACM Special Interest Group on Data Communication on the Applications, Technologies, Architectures, and Protocols for Computer Communication, ACM SIGCOMM 2013, pp. 75–86, China, August 2013.

[15] Z. Kabelac, D. Katabi, and R. C. Miller, “3D Tracking via Body Radio Reflections,” in Proceedings of the 11th USENIX Conference on Networked Systems Design and Implementation, pp. 317–329, 2014.

[16] F. Adib, Z. Kabelac, and D. Katabi, “Multi-person localization via RF body reflections,” in Proceedings of the 12th USENIX Symposium on Networked Systems Design and Implementation, NSDI 2015, pp. 279–292, USA, May 2015.

[17] Z.-A. Deng, G. Wang, D. Qin, Z. Na, Y. Cui, and J. Chen, “Continuous indoor positioning fusing WiFi, smartphone sensors and landmarks,” Sensors, vol. 16, no. 9, 2016.

[18] R. A. Becker, R. Caceres, K. Hanson et al., “A tale of one city: Using cellular network data for urban planning,” IEEE Pervasive Computing, vol. 10, no. 4, pp. 18–26, 2011.

[19] J. E. Mallah, F. Carrino, O. A. Khaled, and E. Mugellini, “Crowd monitoring: critical situations prevention using smartphones and group detection,” Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics): Preface, vol. 9189, pp. 496–505, 2015.

[20] J. Kuang, X. Niu, and X. Chen, “Robust Pedestrian Dead Reckoning Based on MEMS-IMU for Smartphones,” Sensors, vol. 18, no. 5, p. 1391, 2018.

[21] A. Kurkcu and K. Ozbay, “Estimating Pedestrian Densities, Wait Times, and Flows with Wi-Fi and Bluetooth Sensors,” Transportation Research Record, vol. 2644, no. 1, pp. 72–82, 2017.

[22] A. Kotanen, M. Hannikainen, H. Leppakoski, and T. Hamalainen, “Positioning with IEEE 802.11b wireless LAN,” in Proceedings of the 14th IEEE 2003 International Symposium on Personal, Indoor and Mobile Radio Communications, vol. 3, pp. 2218–2222, Beijing, China, 2003.

[23] W. Ho, A. Smailagic, D. P. Siewiorek, and C. Faloutsos, “An adaptive two-phase approach to WiFi location sensing,” in Proceedings of the 4th Annual IEEE International Conference on Pervasive Computing and Communications Workshops, PerCom Workshops 2006, pp. 452–456, Italy, March 2006.

[24] F. Evennou and F. Marx, “Advanced integration of WiFi and inertial navigation systems for indoor mobile positioning,” EURASIP Journal on Applied Signal Processing, vol. 2006, Article ID 86706, 11 pages, 2006.

[25] J. Rekimoto, T. Miyaki, and T. Ishizawa, “LifeTag: WiFi-based continuous location logging for life pattern analysis,” LNCS, vol. 4718, pp. 35–49, 2007.

[26] J. A. Besada, A. M. Bernardos, P. Tarrío, and J. R. Casar, “Analysis of tracking methods for wireless indoor localization,” in Proceedings of the 2nd International Symposium on Wireless Pervasive Computing (ISWPC ’07), pp. 492–497, February 2007.

[27] O. Woodman and R. Harle, “Pedestrian localisation for indoor environments,” in Proceedings of the 10th International Conference on Ubiquitous Computing (UbiComp ’08), pp. 114–123, Seoul, Republic of Korea, September 2008.

[28] F. Aloul, A. Sagahyroon, A. Al-Shami, I. Al-Midfa, and R. Moutassem, “Using mobiles for on campus location tracking,” in Proceedings of the 7th International Conference on Advances in Mobile Computing and Multimedia, MoMM2009, pp. 231–235, Malaysia, December 2009.

[29] S. Woo, S. Jeong, E. Mok et al., “Application of WiFi-based indoor positioning system for labor tracking at construction sites: A case study in Guangzhou MTR,” Automation in Construction, vol. 20, no. 1, pp. 3–13, 2011.

[30] N. Le Dortz, F. Gain, and P. Zetterberg, “WiFi fingerprint indoor positioning system using probability distribution comparison,” in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP ’12), pp. 2301–2304, March 2012.

[31] Z. Chen, H. Zou, H. Jiang, Q. Zhu, Y. C. Soh, and L. Xie, “Fusion of WiFi, smartphone sensors and landmarks using the Kalman filter for indoor localization,” Sensors, vol. 15, no. 1, pp. 715–732, 2015.

[32] Y. Shu, C. Bo, G. Shen, C. Zhao, L. Li, and F. Zhao, “Magicol: indoor localization using pervasive magnetic field and opportunistic WiFi sensing,” IEEE Journal on Selected Areas in Communications, vol. 33, no. 7, pp. 1443–1457, 2015.

[33] A. Danalet, B. Farooq, and M. Bierlaire, “A Bayesian approach to detect pedestrian destination-sequences from WiFi signatures,” Transportation Research Part C: Emerging Technologies, vol. 44, pp. 146–170, 2014.

[34] O. Czogalla and S. Naumann, “Pedestrian indoor navigation for complex public facilities,” in Proceedings of the 2016 International Conference on Indoor Positioning and Indoor Navigation, IPIN 2016, pp. 1–8, Spain, October 2016.

[35] L. Schauer, P. Marcus, and C. Linnhoff-Popien, “Towards feasible Wi-Fi based indoor tracking systems using probabilistic methods,” in Proceedings of the 2016 International Conference on Indoor Positioning and Indoor Navigation, IPIN 2016, pp. 1–8, Spain, October 2016.

[36] F. Meneses and A. Moreira, “Large scale movement analysis from WiFi based location data,” in Proceedings of the 2012 International Conference on Indoor Positioning and Indoor Navigation, IPIN 2012, Australia, November 2012.

[37] B. Bonne, A. Barzan, P. Quax, and W. Lamotte, “WiFiPi: involuntary tracking of visitors at mass events,” in Proceedings of the IEEE 14th International Symposium on a World of Wireless, Mobile and Multimedia Networks (WoWMoM ’13), pp. 1–6, Madrid, Spain, June 2013.

[38] M. B. Kjaergaard, M. Wirz, D. Roggen, and G. Troster, “Mobile sensing of pedestrian flocks in indoor environments using WiFi signals,” in Proceedings of the 10th IEEE International Conference on Pervasive Computing and Communications (PerCom ’12), pp. 95–102, Lugano, Switzerland, March 2012.

[39] S. Sendra, M. Garcia, C. Turro, and J. Lloret, “People mobility behaviour study in a university campus using WLANs,” in Proceedings of the 3rd International Conference on Mobile Ubiquitous Computing, Systems, Services, and Technologies, UBICOMM 2009, pp. 124–129, Malta, October 2009.

[40] J. Scheuner, G. Mazlami, D. Schoni et al., “Probr - A Generic and Passive WiFi Tracking System,” in Proceedings of the 41st IEEE Conference on Local Computer Networks, LCN 2016, pp. 495–502, UAE, November 2016.

[41] Y. Li, S. Williams, B. Moran, and A. Kealy, “Quantized RSS Based Wi-Fi Indoor Localization with Room Level Accuracy,” in Proceedings of the International Global Navigation Satellite Systems 2018, 2018.

[42] A. Alessandrini, C. Gioia, F. Sermi, I. Sofos, D. Tarchi, and M. Vespe, “WiFi positioning and Big Data to monitor flows of people on a wide scale,” in Proceedings of the 25th European Navigation Conference, ENC 2017, pp. 322–328, Switzerland, May 2017.

[43] L. Vu, K. Nahrstedt, S. Retika, and I. Gupta, “Joint bluetooth/wifi scanning framework for characterizing and leveraging people movement in university campus,” in Proceedings of the 13th ACM International Conference on Modeling, Analysis, and Simulation of Wireless and Mobile Systems (MSWiM ’10), pp. 257–265, October 2010.

[44] M. Garcia, S. Sendra, C. Turro, and J. Lloret, “User’s Macro and Micro-mobility Study using WLANs in a University Campus,” International Journal On Advances in Internet Technology, vol. 4, no. 1, pp. 37–46, 2011.

[45] Y. Xu, I. D. G. Groeneveld, R. Sulzer, E. Theocharous, O. T. Willems, and M. S. Tryfona, “Determine activity based on the classified identity of users by using Wi-Fi monitoring,” Geomatics Synthesis Group Project Report, 2016.

[46] A. Danalet, L. Tinguely, M. D. Lapparent, and M. Bierlaire, “Location choice with longitudinal WiFi data,” Journal of Choice Modelling, vol. 18, pp. 1–17, 2016.

[47] M. Zhou, K. Sui, M. Ma, Y. Zhao, D. Pei, and T. Moscibroda, “MobiCamp: A campus-wide testbed for studying mobile physical activities,” in Proceedings of the 3rd International Workshop on Physical Analytics, WPA 2016, pp. 1–6, Singapore, 2016.

[48] A. Fernandez-Ares, A. M. Mora, M. G. Arenas et al., “Studying real traffic and mobility scenarios for a Smart City using a new monitoring and tracking system,” Future Generation Computer Systems, vol. 76, pp. 163–179, 2017.

[49] A. E. Redondi and M. Cesana, “Building up knowledge through passive WiFi probes,” Computer Communications, vol. 117, pp. 1–12, 2018.

[50] L. Huang, K. Matsuura, H. Yamane, and K. Sezaki, “Enhancing wireless location privacy using silent period,” in Proceedings of the 2005 IEEE Wireless Communications and Networking Conference, WCNC 2005: Broadband Wireless for the Masses - Ready for Take-off, pp. 1187–1192, USA, March 2005.

[51] J. Martin, T. Mayberry, C. Donahue et al., “A Study of MAC Address Randomization in Mobile Devices and When it Fails,” Proceedings on Privacy Enhancing Technologies, vol. 2017, no. 4, pp. 365–383, 2017.

[52] M. Vanhoef, C. Matte, M. Cunche, L. S. Cardoso, and F. Piessens, “Why MAC address randomization is not enough: an analysis of Wi-Fi network discovery mechanisms,” in Proceedings of the 11th ACM on Asia Conference on Computer and Communications Security, pp. 413–424, ACM, Xi’an, China, June 2016.

[53] A. J. Ruiz-Ruiz, H. Blunck, T. S. Prentow, A. Stisen, and M. B. Kjaergaard, “Analysis methods for extracting knowledge from large-scale WiFi monitoring to inform building facility planning,” in Proceedings of the 2014 12th IEEE International Conference on Pervasive Computing and Communications, PerCom 2014, pp. 130–138, Hungary, March 2014.

[54] N. Nunes, M. Ribeiro, C. Prandi, and V. Nisi, “Beanstalk - A community based passive Wi-Fi tracking system for analysing tourism dynamics,” in Proceedings of the 9th ACM SIGCHI Symposium on Engineering Interactive Computing Systems, EICS 2017, pp. 93–98, Portugal, June 2017.

[55] C. Parent, N. Pelekis, Y. Theodoridis et al., “Semantic trajectories modeling and analysis,” ACM Computing Surveys, vol. 45, no. 4, pp. 1–32, 2013.

[56] C. Wei, “Mining of User Behavioral Features Based on Indoor Semantic Trajectories,” Boletín Técnico, ISSN: 0376-723X, vol. 55, 2017.

[57] K. V. Long, D. Quang, and N. Klara, Lessons learned from bluetooth/wifi scanning deployment in university campus, Urbana, Illinois, USA, 2010.

[58] V. Radu and M. K. Marina, “HiMLoc: indoor smartphone localization via activity aware pedestrian dead reckoning with selective crowdsourced WiFi fingerprinting,” in Proceedings of the International Conference on Indoor Positioning and Indoor Navigation (IPIN ’13), pp. 1–10, IEEE, Montbeliard-Belfort, France, October 2013.

[59] F. Hong, Y. Zhang, Z. Zhang, M. Wei, Y. Feng, and Z. Guo, “WaP: Indoor localization and tracking using WiFi-Assisted Particle filter,” in Proceedings of the 39th Annual IEEE Conference on Local Computer Networks, LCN 2014, pp. 210–217, Canada, September 2014.

[60] Y. Zeng, P. H. Pathak, and P. Mohapatra, “Analyzing shopper’s behavior through WiFi signals,” in Proceedings of the 2nd Workshop on Physical Analytics, WPA 2015, pp. 13–18, Italy, 2015.

[61] P. Sapiezynski, A. Stopczynski, R. Gatej, and S. Lehmann, “Tracking human mobility using WiFi signals,” PLoS ONE, vol. 10, no. 7, p. e0130824, 2015.

[62] Z. Tian, Y. Jin, M. Zhou, Z. Wu, and Z. Li, “Wi-Fi/MARG Integration for Indoor Pedestrian Localization,” Sensors, vol. 16, no. 12, p. 2100, 2016.

[63] B. Zhou, Q. Li, Q. Mao, and W. Tu, “A Robust Crowdsourcing-Based Indoor Localization System,” Sensors, vol. 17, no. 4, p. 864, 2017.

[64] “Raspberry Pi,” https://www.raspberrypi.org

[65] “TP-LINK Wi-Fi USB dongle datasheet,” http://www.tp-link.com/us/products/details/cat-5520_TL-WN722N.html

[66] “ISO/IEC 20922:2016 - Information technology – Message Queuing Telemetry Transport (MQTT) v3.1.1,” https://www.iso.org/standard/69466.html

[67] G. Piateski-Shapiro and W. Frawley, Knowledge Discovery in Databases, MIT Press, Cambridge, MA, USA, 1991.

[68] J. Andion Jimenez, J. M. Navarro Gonzalez, M. Alvarez-Campana Fernandez-Corredor, and J. C. Duenas Lopez, “A passive non-intrusive cheap method to identify behaviours and habits in the Campus,” in Proceedings of the XIII Jornadas de Ingenieria Telematica - JITEL2017, vol. 40, no. 47, pp. 10–4995, September 2017.

[69] A. Mylinikov, “Geo project,” https://www.mylnikov.org

[70] K. Mehmet and G. Ananth, “PROXIMUS: A framework for analyzing very high dimensional discrete-attributed datasets,” in Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’03, vol. 147, no. 156, pp. 147–156, USA, August 2003.

[71] “UPM Library occupation,” http://ceiboard.dit.upm.es/smart-campus/biblio

[72] C. A. R. Inarejos, A. Rodríguez, G. Lopez, and M. Alvarez-Campana, “Analisis de la huella de carbono de la ETSIT de la UPM y propuesta de mejora basada en datos de la plataforma IoT Smart CEI Moncloa,” in Proceedings of the I Congreso Iberoamericano de Ciudades Inteligentes (ICSC-CITIES 2018), 2018.

Table 7: Behavior clusters.

Cluster  Center                                                      Cluster components  Behaviors count
1        Entr A Sec from 9:00 to 14:00                               415                 4963
2        Entr B Sec from 10:00 to 13:00                              458                 3756
3        Entr B Sec from 15:00 to 19:00                              471                 2924
4        Library from 9:00 to 13:00                                  256                 2870
5        Library from 10:00 to 12:00                                 176                 2396
6        Library from 18:00 to 21:00                                 244                 2231
7        Entr A 13:00                                                787                 1979
8        Library 12:00                                               563                 1894
9        Library from 18:00 to 19:00 and from 20:00 to 21:00         175                 1853
10       Entr B Lab 11:00 to 20:00                                   568                 1831

(i) Clusters 1, 2, and 3: students attending classes. These sensors cover the main classrooms in the school, and the intervals match the class schedule.

(ii) Clusters 4, 5, 6, and 9: students at the library.

(iii) Cluster 7: users who are usually outside the sensors’ coverage areas, arriving or leaving.

(iv) Cluster 10: laboratory equipment and professors. This sensor covers some of the professors’ offices and laboratories, which usually have some laboratory equipment connected during work hours.

5. Validation of Occupancy Estimation Based on Wi-Fi Tracking

Although, as already pointed out in Section 4.3, it was tested at the very first stage of the deployment that the Wi-Fi sensors properly locate and track well-known MAC addresses, the accuracy of the system for estimating occupancy still had to be validated (as it can be distorted by the aforementioned fact that a single user can carry several devices connected to Wi-Fi networks). As no ground-truth data was available for the full set of buildings and only some data was found for the library, we centered our validation efforts on comparing our data with the available ones: if we can trust our results in that area, then we can extend our trust to the rest of the areas, for which no well-known data are available.

The library of ETSIT has 408 study places and it offers a web service to check the number of available seats at a given moment [71]. This system is based on two sources of information: a person who counts the empty seats every opening hour from Friday to Sunday, and a video camera located at the main entrance of the library that counts the number of people entering or leaving at 15-minute intervals. The human system provides a ground truth about the number of occupied positions, but this measure is very different from the actual number of people in the library, since a common situation, especially during examination terms, is that students leave their study material at the seat to reserve it while they are not in the library. This situation is a problem for the library staff, and therefore they installed the video camera system to count the students’ entrances and exits.

This system generates an estimation of the number of people in the library by adding to the previous measure the number of people who are detected entering and subtracting the number of people who are detected leaving. The system is not perfect, and in most cases the estimation carries an accumulated error that grows over the day. The total error can be calculated clearly at the end of the day, when the library closes and the number of people inside is supposed to be zero. In summer, one situation aggravates this error and consequently degrades the measure achieved by this system: due to the rise in temperature, the back door of the library is opened to improve ventilation, which allows students to exit through it, although they must continue entering through the main door. The camera does not count students leaving through this back door.
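The running estimate described above, and the way a single batch of missed exits persists for the rest of the day, can be sketched as follows. This is only an illustration under stated assumptions, not the software used at the library: the function name `running_occupancy` and all the sample counts are hypothetical.

```python
from itertools import accumulate

def running_occupancy(entries, exits, start=0):
    """Occupancy estimate after each interval: previous value + entries - exits."""
    deltas = (e - x for e, x in zip(entries, exits))
    # accumulate() carries every counting mistake forward to all later intervals
    return list(accumulate(deltas, initial=start))[1:]

# Hypothetical counts per 15-minute interval for a very short "day".
true_entries = [30, 20, 5, 0]
true_exits = [0, 10, 25, 20]
# Assumption: the camera misses 8 people leaving through an uncounted door
# during the third interval.
seen_exits = [0, 10, 17, 20]

truth = running_occupancy(true_entries, true_exits)
camera = running_occupancy(true_entries, seen_exits)
# At closing time the true occupancy is zero, so the residual value
# is the error accumulated over the day.
closing_error = camera[-1] - truth[-1]
```

Because the estimate is a cumulative sum, the error is never corrected until the counter is reset, which is why the residual at closing time is a convenient measure of it.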

The library staff provided us with the data collected by the two systems (human and camera) between June 5th and 30th. These measurements can be compared with those obtained by the Wi-Fi sensor installed at the library in order to validate the latter. For this test, all the data collected by the sensor is used, without applying the filtering of the MACs of sporadic users explained before.

Figure 13 shows the data collected by the three systems on Sunday, June 5th. This is the first day with data from the three sources. Other days on which these three sources are present show the same trends. It is clear that there is a divergence between the human system observation and the rest of the data. The graph of the human system shows that the number of occupied seats increases in the first hours up to the maximum and remains steady until the end of the day, without being affected by the behavior of the users at lunchtime. However, this effect is reflected in the camera system and Wi-Fi tracking measurements. We observed that, in general, the number of people accounted for by the camera system is below the Wi-Fi tracking system measure, although the proportion is maintained over time. Finally, the figure shows the cumulative error effect of the camera system, which at the end of the day still reports 65 people in the library.

Figure 13: Measure of library occupation using Wi-Fi, camera, and human systems (number of users against hour of the day, from 10:00 to 22:00).

To better study the relation between the camera system and the Wi-Fi tracking system, we generated a detailed visualization that allows us to observe the data of the whole month in a single figure. Figure 14 is composed of 3 subfigures; each of them is a matrix of colored cells in which the rows represent a full day and the columns are divisions of 15 minutes. The color intensity is proportional to the number of users measured in each interval, being more intense when more users are detected. Figure 14(a) shows the measurements of the Wi-Fi sensor. The behavior is the one observed in previous sections of this document. Figure 14(b) shows the measurement of the camera system. During the first half of the day, the same trend as in Figure 14(a) can be observed, but in the afternoon and at night the accumulated error becomes noticeable. This error reaches a maximum of 378 people, with an average of 139 people, at the end of the day, while the Wi-Fi tracking system registers a maximum of 10 and an average of 7 at the end of the day (these are machines that are still turned on when the library is empty of people). Finally, Figure 14(c) shows the difference between the two previous ones, using green when the difference is positive (the Wi-Fi tracking system registers a higher value than the camera system) and red otherwise. This figure validates the measurements obtained by the Wi-Fi tracking system, since the difference from the measure of the camera system in the first half of the day is small (an average of 27 people, counting the data from 8:00 to 16:00), whereas in the afternoon the error accumulated by the camera system makes the average difference grow to 83 people between 16:00 and 22:00.

Figure 14: Measures every 15 minutes of library occupation during June: (a) Wi-Fi records; (b) camera records; (c) difference between Wi-Fi and camera.
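The comparison behind Figure 14(c) can be sketched as a signed difference between two day-by-interval matrices, averaged over a time window. The sketch below assumes that each day is stored as a list of 96 counts (one per 15-minute interval); the helper names and the toy data are hypothetical and do not reproduce the processing actually applied to the dataset.

```python
INTERVALS_PER_HOUR = 4  # 15-minute bins, so 96 bins per day

def signed_difference(wifi, camera):
    """Per-cell difference (Wi-Fi minus camera); positive cells are the 'green' ones."""
    return [[w - c for w, c in zip(w_day, c_day)]
            for w_day, c_day in zip(wifi, camera)]

def window_mean(matrix, start_hour, end_hour):
    """Mean of all cells whose bin starts in [start_hour, end_hour)."""
    lo, hi = start_hour * INTERVALS_PER_HOUR, end_hour * INTERVALS_PER_HOUR
    cells = [value for day in matrix for value in day[lo:hi]]
    return sum(cells) / len(cells)

# Toy data for two days: the camera estimate drifts upward from 16:00 onward.
wifi = [[50] * 96, [60] * 96]
camera = [[48] * 64 + [60] * 32, [58] * 64 + [70] * 32]

diff = signed_difference(wifi, camera)
morning = window_mean(diff, 8, 16)     # small difference before 16:00
afternoon = window_mean(diff, 16, 22)  # larger difference once the drift appears
```

Keeping the sign of the difference, rather than its absolute value, is what allows the two colors of the figure to distinguish which system reports the higher count.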

The Wi-Fi sensor of the library was not placed for the specific purpose of counting the number of people in the library and, in consequence, its coverage area can include devices that are outside the library. This explains why in some cases it registers a measurement greater than that of the camera system. In any case, the Wi-Fi sensor, with its limitations, registers a measure very similar to that of the camera-based people counting system, while avoiding the cumulative error of that system and being much cheaper (tens of euros against thousands of euros) and less intrusive than a video camera. Currently, since the camera system is already installed, the Wi-Fi sensor measures can be used as a maximum, which would limit the error made by this system. In summary, the usage of a Wi-Fi sensor to count people in the library provides accurate results despite the potential errors induced by the sensing period, collisions, the ratio between users and MACs seen, and the data processing performed. These results invite us to trust that the results of our analyses for the rest of the areas on campus are also correct.
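Using the Wi-Fi measure as a maximum for the camera estimate can be sketched in a few lines. The function name `bounded_estimate` and the sample values are hypothetical; the sketch only illustrates the idea of capping one series with the other and is not a description of a deployed component.

```python
def bounded_estimate(camera_counts, wifi_counts):
    """Cap each camera estimate at the Wi-Fi count for the same interval."""
    return [min(c, w) for c, w in zip(camera_counts, wifi_counts)]

# Hypothetical values: the camera estimate drifts upward late in the day.
camera_counts = [120, 180, 150, 65]
wifi_counts = [130, 170, 90, 7]
capped = bounded_estimate(camera_counts, wifi_counts)
```

With these values the last two intervals are limited by the Wi-Fi count, so the error accumulated by the camera no longer reaches the published occupancy figure.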

6. Discussion and Conclusions

In the execution of this case study, we have learned some lessons about the limitations we faced that are worth taking into account for similar systems and analyses. First, there is the key issue of the sensors deployment: it is not just a question of density (number of sensors per surface) but of adapting its design to the topology of the place to be monitored. So, for example, it is crucial to have sensors not only at the entries and exits of buildings but also inside them, as in the considered deployment, where there are sensors placed at the entrances and exits of the buildings but also in other especially relevant places such as the library or the work-in-group area.

A complete coverage of indoor devices, without collisions and for the complete timespan of the stay indoors, would require a large deployment with many sensors able to cover all corners. This is not suitable in this context; instead, we relied on 9 sensors, some at gates and others in large rooms, which proved to be enough to check whether a device is within reach of any of them. But some assumptions had to be made: we considered as valid behaviors only those that spent a significant amount of time once they had been seen (to remove transient behaviors and those of people walking outside the buildings); also, we discarded behaviors that did not appear enough times in the yearly timespan. On the other hand, the topology of these buildings did not impose severe restrictions on the sensors’ coverage. In fact, we found several devices seen by different sensors at once. In summary, we can conclude that the finer the spatial granularity (more sensors), the larger the set of different behaviors, so using raw data from the sensors would lead to an explosion of states that would render meaningless results.
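The two assumptions above (a minimum stay and a minimum number of repetitions) can be sketched as a simple filter. The representation of a behavior as a `(sensor, start_hour, end_hour)` tuple, the function name, and both threshold values are assumptions made for illustration only; they are not the values used in the study.

```python
from collections import Counter

def valid_behaviors(observations, min_hours=1, min_occurrences=3):
    """Keep behaviors that last long enough and recur often enough.

    observations: iterable of (sensor, start_hour, end_hour) tuples,
    one per device and day. Both thresholds are illustrative.
    """
    long_enough = [b for b in observations if b[2] - b[1] >= min_hours]
    counts = Counter(long_enough)
    return {b: n for b, n in counts.items() if n >= min_occurrences}

observations = (
    [("Library", 9, 13)] * 4     # recurrent, long stay: kept
    + [("Entr A", 13, 13)] * 5   # transient pass-by: dropped (zero duration)
    + [("Lab", 11, 20)] * 2      # long but too rare: dropped
)
kept = valid_behaviors(observations)
```

Applying the duration filter before counting keeps frequent but transient sightings, such as people walking past an entrance, from being treated as recurrent behaviors.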

The gathered data is another key issue: some works in the literature obtained signal strength measurements every few seconds, allowing them to identify indoor trajectories. This was not our case and, in fact, this proved to be very limiting. Thus, the precision of our analysis is limited by the regions covered by each sensor, which prevents us from analyzing any kind of movement within regions. Nevertheless, as has been seen, relevant results can still be obtained without the increase in energy consumption and the risk of flooding the school communications network that the other approach would entail.

As regards the analysis of data to identify users’ behaviors, we have to indicate that a long observation period is a must. Obtaining data for a complete work cycle (in this case, a full school year) helps in discovering common behaviors that happen in a university. Using long observation times can also help detect erroneous or atypical operation of the sensors, as happened in the first months of the year in this case. Getting even larger observation datasets would reinforce the behaviors we have found (e.g., seasonality), but the chance to discover new ones is negligible, because the expected behaviors on campus are periodic with the school year.

The usage of big data platforms for analysis, while not mandatory, eases the management of large datasets and the execution of iterative studies on the data. The preprocessing work carried out allowed us to clean and filter our data. For example, some discovered behaviors are too regular and extended over time, which may correspond to servers or machines, which we could then filter out.

The temporal analysis has revealed some behaviors that are maintained throughout the year and others that occur occasionally. The work has focused on the study of behaviors that are repeated and maintained throughout the year. A closer view, such as the analysis per hour, shows the hours of activity, nocturnal patterns, or movements at mealtime. In the daily view, the difference in activity between work days and weekends or holidays is clearly seen. Visualization has been a key technique in order to detect and understand these temporal patterns.

The spatial analysis revealed that a problem of collisions happens in the user detection, but its effect was mitigated by the use of innovative algorithms and techniques, such as the calculation of centroids and the combination of sensor information with external sources of information (i.e., semantic trajectories), so that a much more precise positioning of the users is achieved than with the exclusive use of the sensors. The visualization of centroids fostered the visual understanding of complex data, such as the position of each user over a month, and the superposition of the centroids in a heatmap made it possible to know the movements of the groups of people and the occupation of the different zones.
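The idea of resolving a collision with a centroid can be sketched as follows. The sketch assumes an unweighted arithmetic mean of the positions of all sensors that detect a device in the same interval; the sensor coordinates, the names, and the absence of any weighting are assumptions for illustration and may differ from the calculation actually used in the study.

```python
def centroid(points):
    """Arithmetic mean of (x, y) positions."""
    xs, ys = zip(*points)
    return sum(xs) / len(xs), sum(ys) / len(ys)

# Hypothetical sensor coordinates, in metres, on a campus plan.
SENSORS = {"Entr A": (0.0, 0.0), "Library": (40.0, 0.0), "Lab": (40.0, 30.0)}

def estimate_position(detecting_sensors):
    """Place a device at the centroid of all sensors that saw it in one interval."""
    return centroid([SENSORS[name] for name in detecting_sensors])

single = estimate_position(["Library"])               # no collision
collision = estimate_position(["Entr A", "Library"])  # seen by two sensors at once
```

A device seen by a single sensor is placed at that sensor, while a device seen by several sensors is placed between them, which gives a finer position than assigning it arbitrarily to one of the colliding sensors.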

Furthermore, we have found dominant users’ behaviors, understood as the most repeated behaviors registered by the sensors throughout the year. The number of behaviors found was huge, but the application of the Proximus clustering algorithm reduced this number to a manageable amount. Then, the study of the obtained clusters revealed that the most frequent behaviors coincide with what can be expected from a college building: researchers working in laboratories and students attending classes or studying in the library.

Although some of the discovered behaviors and patterns can be seen as common knowledge, it is worth stressing that they do bring value, since they represent numerical evidence that supports decision making (e.g., someone can think that the Wi-Fi access in a given area does not work properly because it is always overcrowded, but numerical evidence is needed to appropriately justify the investment of increasing the number of APs of the corporate WLAN in that area to improve the service). In addition, such well-known patterns, when obtained automatically by processing the available data, become baseline models which can be used to detect anomalies or atypical situations, as is common practice in unsupervised machine learning.
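One simple way to use such a baseline for anomaly detection is to compare a new measurement with the historical mean and standard deviation for the same time slot. The sketch below is a generic z-score test chosen for illustration; the function name, the threshold `k`, and the sample counts are assumptions, and this is not a method reported in the study.

```python
from statistics import mean, stdev

def is_anomalous(history, observed, k=3.0):
    """Flag a count more than k standard deviations away from its historical mean."""
    mu, sigma = mean(history), stdev(history)
    return abs(observed - mu) > k * sigma

# Hypothetical occupancy of one area at the same hour on five previous weekdays.
history = [100, 110, 90, 105, 95]
normal_day = is_anomalous(history, 104)
unusual_day = is_anomalous(history, 180)
```

The threshold `k` trades sensitivity for false alarms, so in practice it would have to be tuned on the data of each monitored area.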

Lastly, we have validated the accuracy of using Wi-Fi tracking for occupancy estimation, comparing it with the library staff’s manual counting (considered as ground truth) and with a video camera system installed at the library main entrance. As a main finding, Wi-Fi tracking has proved to be more accurate than the video camera system, in addition to being far cheaper. As a matter of fact, the library staff currently prefers the occupancy estimation based on Wi-Fi tracking to the one based on the video camera system. Nevertheless, the accuracy of the Wi-Fi tracking system can be further improved by correlating MAC addresses detected in the same places over sufficiently long periods of time and counting them only once, thus mitigating the issue related to the fact that a single person can bring several devices connected to Wi-Fi networks.
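The proposed improvement can be sketched by grouping MAC addresses whose sightings overlap almost entirely. The sketch below uses the Jaccard similarity between sets of `(sensor, time_bin)` sightings and a union-find structure to merge devices; the similarity measure, the threshold, and the sample data are assumptions chosen for illustration, since the text only states that correlated addresses should be counted once.

```python
from itertools import combinations

def jaccard(a, b):
    """Share of sightings two devices have in common."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def group_devices(sightings, threshold=0.8):
    """Merge MACs whose (sensor, time_bin) sightings overlap almost entirely.

    sightings: dict mapping MAC -> set of (sensor, time_bin) pairs.
    Returns a list of sets of MACs; each set is counted as one person.
    """
    parent = {mac: mac for mac in sightings}

    def find(mac):
        while parent[mac] != mac:
            parent[mac] = parent[parent[mac]]  # path halving
            mac = parent[mac]
        return mac

    for a, b in combinations(sightings, 2):
        if jaccard(sightings[a], sightings[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for mac in sightings:
        groups.setdefault(find(mac), set()).add(mac)
    return list(groups.values())

sightings = {
    "aa:01": {("Library", t) for t in range(10)},     # phone
    "aa:02": {("Library", t) for t in range(1, 10)},  # laptop of the same person
    "bb:01": {("Lab", t) for t in range(10)},         # someone else
}
people = group_devices(sightings)
```

The pairwise comparison is quadratic in the number of devices, so a real deployment would probably restrict it to devices seen by the same sensor during the same day.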

Besides this, the Wi-Fi tracking system is currently used by the library staff to perform more sophisticated studies, such as figuring out the percentage of students from the different schools of the university who come to study at the library of the Telecommunications Engineering School during the weekend. Figure 15 shows the results of such an analysis, which represents a token of how this kind of IoT system can help solve real-life problems and improve the operation of already running services.

To summarize, we have studied a one-year Wi-Fi tracking dataset obtained from a reduced set of low-cost sensors with limited capabilities, deployed on an actual university campus that receives around 4000 people every day. We have processed the data in order to identify traces of Wi-Fi-enabled mobile devices, which are identified as people moving in the campus buildings, and then we have extracted people’s stays, movements, and common behaviors. The obtained results represent numerical evidence that illustrates how a low-cost Wi-Fi tracking system can be used in real-life conditions to improve or optimize the operation of the monitored premises. These results can allow appropriately dimensioning the WLAN infrastructure or the canteen personnel, or detecting anomalous situations in real time. Furthermore, the data from the people flow monitoring system is currently being used together with the data from the environmental monitoring system to try to reduce the environmental footprint of the school [72]. In addition, the Wi-Fi tracking system is actually used by the library staff in their day-to-day activity, which illustrates the value that this kind of IoT infrastructure can bring to real-life problems and services.

Figure 15: Users who only go to the library of the Telecommunication School on weekends, classified by their school (categories: CIVIL, INDUS, NAVAL, SPORT, ARQUI, FORES, EDIF, AERO, TELECOM).

Data Availability

The dataset with the Wi-Fi sensors’ records used to support the findings of this study is available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work has been partly funded by Universidad Politécnica de Madrid through the project RES2+U (Responsables, Sostenibles y Universitarios) (http://blogs.upm.es/res2masu). The work performed by José M. Navarro was funded by the Ministerio de Educación, Cultura y Deporte de España under Grant no. FPU 14/03209.

References

[1] “2018 Revision of World Urbanization Prospects - Multimedia Library - United Nations Department of Economic and Social Affairs,” https://www.un.org/development/desa/publications/2018-revision-of-world-urbanization-prospects.html

[2] M. Pallot, “Engaging Users into Research and Innovation: The Living Lab Approach as a User Centred Open Innovation Ecosystem,” https://web.archive.org/web/20120509081658/http://www.cwe-projects.eu/pub/bscw.cgi/1760838?id=715404_1760838

[3] E. Almirall and J. Wareham, “Living Labs: arbiters of mid- and ground-level innovation,” Technology Analysis & Strategic Management, vol. 23, no. 1, pp. 87–102, 2011.

[4] “TU Delft Green office,” https://www.tudelft.nl/sustainability

[5] “UBC Campus as a living laboratory,” https://sustain.ubc.ca/our-commitment/campus-living-lab

[6] “Harvard Sustainability,” https://green.harvard.edu

[7] “UPM City of the Future initiative,” http://blogs.upm.es/cityofthefuture-upm/en/initiative

[8] M. Alvarez-Campana, G. Lopez, E. Vazquez, V. A. Villagra, and J. Berrocal, “Smart CEI Moncloa: An IoT-based platform for people flow and environmental monitoring on a Smart University Campus,” Sensors, vol. 17, no. 12, 2017.

[9] M. B. Kjærgaard, H. Blunck, T. Godsk, T. Toftkjær, D. L. Christensen, and K. Grønbæk, “Indoor positioning using GPS revisited,” Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Preface, vol. 6030, pp. 38–56, 2010.

[10] A. Belmonte-Hernández, G. Hernández-Peñaloza, F. Álvarez, and G. Conti, “Adaptive Fingerprinting in Multi-Sensor Fusion for Accurate Indoor Tracking,” IEEE Sensors Journal, vol. 17, no. 15, pp. 4983–4998, 2017.

[11] M. S. Kristoffersen, J. V. Dueholm, R. Gade, and T. B. Moeslund, “Pedestrian counting with occlusion handling using stereo thermal cameras,” Sensors, vol. 16, no. 1, 2016.

[12] L. Zheng, X. Ruan, Y. Chen, and M. Huang, “Shadow removal for pedestrian detection and tracking in indoor environments,” Multimedia Tools and Applications, vol. 76, no. 18, pp. 18321–18337, 2017.

[13] T. Li, H. Chang, M. Wang, B. Ni, R. Hong, and S. Yan, “Crowded scene analysis: a survey,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 25, no. 3, pp. 367–386, 2015.

[14] F. Adib and D. Katabi, “See through walls with WiFi,” in Proceedings of the Annual Conference of the ACM Special Interest Group on Data Communication on the Applications, Technologies, Architectures, and Protocols for Computer Communication, ACM SIGCOMM 2013, pp. 75–86, China, August 2013.

[15] Z. Kabelac, D. Katabi, and R. C. Miller, “3D Tracking via Body Radio Reflections,” in Proceedings of the 11th USENIX Conference on Networked Systems Design and Implementation, pp. 317–329, 2014.

[16] F. Adib, Z. Kabelac, and D. Katabi, “Multi-person localization via RF body reflections,” in Proceedings of the 12th USENIX Symposium on Networked Systems Design and Implementation, NSDI 2015, pp. 279–292, USA, May 2015.

[17] Z.-A. Deng, G. Wang, D. Qin, Z. Na, Y. Cui, and J. Chen, “Continuous indoor positioning fusing WiFi, smartphone sensors and landmarks,” Sensors, vol. 16, no. 9, 2016.

[18] R. A. Becker, R. Cáceres, K. Hanson et al., “A tale of one city: Using cellular network data for urban planning,” IEEE Pervasive Computing, vol. 10, no. 4, pp. 18–26, 2011.

[19] J. E. Mallah, F. Carrino, O. A. Khaled, and E. Mugellini, “Crowd monitoring: critical situations prevention using smartphones and group detection,” Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Preface, vol. 9189, pp. 496–505, 2015.

[20] J. Kuang, X. Niu, and X. Chen, “Robust Pedestrian Dead Reckoning Based on MEMS-IMU for Smartphones,” Sensors, vol. 18, no. 5, p. 1391, 2018.

[21] A. Kurkcu and K. Ozbay, “Estimating Pedestrian Densities, Wait Times, and Flows with Wi-Fi and Bluetooth Sensors,” Transportation Research Record, vol. 2644, no. 1, pp. 72–82, 2017.

[22] A. Kotanen, M. Hännikäinen, H. Leppäkoski, and T. Hämäläinen, “Positioning with IEEE 802.11b wireless LAN,” in Proceedings of the 14th IEEE 2003 International Symposium on Personal, Indoor and Mobile Radio Communications, vol. 3, pp. 2218–2222, Beijing, China, 2003.

[23] W. Ho, A. Smailagic, D. P. Siewiorek, and C. Faloutsos, “An adaptive two-phase approach to WiFi location sensing,” in Proceedings of the 4th Annual IEEE International Conference on Pervasive Computing and Communications Workshops, PerCom Workshops 2006, pp. 452–456, Italy, March 2006.

[24] F. Evennou and F. Marx, “Advanced integration of WiFi and inertial navigation systems for indoor mobile positioning,” EURASIP Journal on Applied Signal Processing, vol. 2006, Article ID 86706, 11 pages, 2006.

[25] J. Rekimoto, T. Miyaki, and T. Ishizawa, “LifeTag: WiFi-based continuous location logging for life pattern analysis,” LNCS, vol. 4718, pp. 35–49, 2007.

[26] J. A. Besada, A. M. Bernardos, P. Tarrío, and J. R. Casar, “Analysis of tracking methods for wireless indoor localization,” in Proceedings of the 2nd International Symposium on Wireless Pervasive Computing (ISWPC ’07), pp. 492–497, February 2007.

[27] O. Woodman and R. Harle, “Pedestrian localisation for indoor environments,” in Proceedings of the 10th International Conference on Ubiquitous Computing (UbiComp ’08), pp. 114–123, Seoul, Republic of Korea, September 2008.

[28] F. Aloul, A. Sagahyroon, A. Al-Shami, I. Al-Midfa, and R. Moutassem, “Using mobiles for on campus location tracking,” in Proceedings of the 7th International Conference on Advances in Mobile Computing and Multimedia, MoMM2009, pp. 231–235, Malaysia, December 2009.

[29] S. Woo, S. Jeong, E. Mok et al., “Application of WiFi-based indoor positioning system for labor tracking at construction sites: A case study in Guangzhou MTR,” Automation in Construction, vol. 20, no. 1, pp. 3–13, 2011.

[30] N. Le Dortz, F. Gain, and P. Zetterberg, “WiFi fingerprint indoor positioning system using probability distribution comparison,” in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP ’12), pp. 2301–2304, March 2012.

[31] Z. Chen, H. Zou, H. Jiang, Q. Zhu, Y. C. Soh, and L. Xie, “Fusion of WiFi, smartphone sensors and landmarks using the Kalman filter for indoor localization,” Sensors, vol. 15, no. 1, pp. 715–732, 2015.

[32] Y. Shu, C. Bo, G. Shen, C. Zhao, L. Li, and F. Zhao, “Magicol: indoor localization using pervasive magnetic field and opportunistic WiFi sensing,” IEEE Journal on Selected Areas in Communications, vol. 33, no. 7, pp. 1443–1457, 2015.

[33] A. Danalet, B. Farooq, and M. Bierlaire, “A Bayesian approach to detect pedestrian destination-sequences from WiFi signatures,” Transportation Research Part C: Emerging Technologies, vol. 44, pp. 146–170, 2014.

[34] O. Czogalla and S. Naumann, “Pedestrian indoor navigation for complex public facilities,” in Proceedings of the 2016 International Conference on Indoor Positioning and Indoor Navigation, IPIN 2016, pp. 1–8, Spain, October 2016.

[35] L. Schauer, P. Marcus, and C. Linnhoff-Popien, “Towards feasible Wi-Fi based indoor tracking systems using probabilistic methods,” in Proceedings of the 2016 International Conference on Indoor Positioning and Indoor Navigation, IPIN 2016, pp. 1–8, Spain, October 2016.

[36] F. Meneses and A. Moreira, “Large scale movement analysis from WiFi based location data,” in Proceedings of the 2012 International Conference on Indoor Positioning and Indoor Navigation, IPIN 2012, Australia, November 2012.

[37] B. Bonne, A. Barzan, P. Quax, and W. Lamotte, “WiFiPi: involuntary tracking of visitors at mass events,” in Proceedings of the IEEE 14th International Symposium on a World of Wireless, Mobile and Multimedia Networks (WoWMoM ’13), pp. 1–6, Madrid, Spain, June 2013.

[38] M. B. Kjærgaard, M. Wirz, D. Roggen, and G. Tröster, “Mobile sensing of pedestrian flocks in indoor environments using WiFi signals,” in Proceedings of the 10th IEEE International Conference on Pervasive Computing and Communications (PerCom ’12), pp. 95–102, Lugano, Switzerland, March 2012.

[39] S. Sendra, M. Garcia, C. Turro, and J. Lloret, “People mobility behaviour study in a university campus using WLANs,” in Proceedings of the 3rd International Conference on Mobile Ubiquitous Computing, Systems, Services, and Technologies, UBICOMM 2009, pp. 124–129, Malta, October 2009.

[40] J. Scheuner, G. Mazlami, D. Schöni et al., “Probr - A Generic and Passive WiFi Tracking System,” in Proceedings of the 41st IEEE Conference on Local Computer Networks, LCN 2016, pp. 495–502, UAE, November 2016.

[41] Y. Li, S. Williams, B. Moran, and A. Kealy, “Quantized RSS Based Wi-Fi Indoor Localization with Room Level Accuracy,” in Proceedings of the International Global Navigation Satellite Systems 2018, 2018.

[42] A. Alessandrini, C. Gioia, F. Sermi, I. Sofos, D. Tarchi, and M. Vespe, “WiFi positioning and Big Data to monitor flows of people on a wide scale,” in Proceedings of the 25th European Navigation Conference, ENC 2017, pp. 322–328, Switzerland, May 2017.

[43] L. Vu, K. Nahrstedt, S. Retika, and I. Gupta, “Joint bluetooth/wifi scanning framework for characterizing and leveraging people movement in university campus,” in Proceedings of the 13th ACM International Conference on Modeling, Analysis, and Simulation of Wireless and Mobile Systems (MSWiM ’10), pp. 257–265, October 2010.

[44] M. Garcia, S. Sendra, C. Turro, and J. Lloret, “User’s Macro and Micro-mobility Study using WLANs in a University Campus,” International Journal On Advances in Internet Technology, vol. 4, no. 1, pp. 37–46, 2011.

[45] Y. Xu, I. D. G. Groeneveld, R. Sulzer, E. Theocharous, O. T. Willems, and M. S. Tryfona, “Determine activity based on the classified identity of users by using Wi-Fi monitoring,” Geomatics Synthesis Group Project Report, 2016.

[46] A. Danalet, L. Tinguely, M. D. Lapparent, and M. Bierlaire, “Location choice with longitudinal WiFi data,” Journal of Choice Modelling, vol. 18, pp. 1–17, 2016.

[47] M. Zhou, K. Sui, M. Ma, Y. Zhao, D. Pei, and T. Moscibroda, “MobiCamp: A campus-wide testbed for studying mobile physical activities,” in Proceedings of the 3rd International Workshop on Physical Analytics, WPA 2016, pp. 1–6, Singapore, 2016.

[48] A. Fernández-Ares, A. M. Mora, M. G. Arenas et al., “Studying real traffic and mobility scenarios for a Smart City using a new monitoring and tracking system,” Future Generation Computer Systems, vol. 76, pp. 163–179, 2017.

[49] A. E. Redondi and M. Cesana, “Building up knowledge through passive WiFi probes,” Computer Communications, vol. 117, pp. 1–12, 2018.

[50] L. Huang, K. Matsuura, H. Yamane, and K. Sezaki, “Enhancing wireless location privacy using silent period,” in Proceedings of the 2005 IEEE Wireless Communications and Networking Conference, WCNC 2005: Broadband Wireless for the Masses - Ready for Take-off, pp. 1187–1192, USA, March 2005.

[51] J. Martin, T. Mayberry, C. Donahue et al., “A Study of MAC Address Randomization in Mobile Devices and When it Fails,” Proceedings on Privacy Enhancing Technologies, vol. 2017, no. 4, pp. 365–383, 2017.

[52] M. Vanhoef, C. Matte, M. Cunche, L. S. Cardoso, and F. Piessens, “Why MAC address randomization is not enough: an analysis of Wi-Fi network discovery mechanisms,” in Proceedings of the 11th ACM on Asia Conference on Computer and Communications Security, pp. 413–424, ACM, Xi’an, China, June 2016.

[53] A. J. Ruiz-Ruiz, H. Blunck, T. S. Prentow, A. Stisen, and M. B. Kjærgaard, “Analysis methods for extracting knowledge from large-scale WiFi monitoring to inform building facility planning,” in Proceedings of the 2014 12th IEEE International Conference on Pervasive Computing and Communications, PerCom 2014, pp. 130–138, Hungary, March 2014.

[54] N. Nunes, M. Ribeiro, C. Prandi, and V. Nisi, “Beanstalk - A community based passive Wi-Fi tracking system for analysing tourism dynamics,” in Proceedings of the 9th ACM SIGCHI Symposium on Engineering Interactive Computing Systems, EICS 2017, pp. 93–98, Portugal, June 2017.

[55] C. Parent, N. Pelekis, Y. Theodoridis et al., “Semantic trajectories modeling and analysis,” ACM Computing Surveys, vol. 45, no. 4, pp. 1–32, 2013.

[56] C. Wei, “Mining of User Behavioral Features Based on Indoor Semantic Trajectories,” Boletín Técnico, ISSN: 0376-723X, vol. 55, 2017.

[57] K. V. Long, D. Quang, and N. Klara, Lessons learned from bluetooth/wifi scanning deployment in university campus, Urbana, Illinois, USA, 2010.

[58] V. Radu and M. K. Marina, “HiMLoc: indoor smartphone localization via activity aware pedestrian dead reckoning with selective crowdsourced WiFi fingerprinting,” in Proceedings of the International Conference on Indoor Positioning and Indoor Navigation (IPIN ’13), pp. 1–10, IEEE, Montbeliard-Belfort, France, October 2013.

[59] F. Hong, Y. Zhang, Z. Zhang, M. Wei, Y. Feng, and Z. Guo, “WaP: Indoor localization and tracking using WiFi-Assisted Particle filter,” in Proceedings of the 39th Annual IEEE Conference on Local Computer Networks, LCN 2014, pp. 210–217, Canada, September 2014.

[60] Y. Zeng, P. H. Pathak, and P. Mohapatra, “Analyzing shopper’s behavior through WiFi signals,” in Proceedings of the 2nd Workshop on Physical Analytics, WPA 2015, pp. 13–18, Italy, 2015.

[61] P. Sapiezynski, A. Stopczynski, R. Gatej, and S. Lehmann, “Tracking human mobility using WiFi signals,” PLoS ONE, vol. 10, no. 7, p. e0130824, 2015.

[62] Z. Tian, Y. Jin, M. Zhou, Z. Wu, and Z. Li, “Wi-Fi/MARG Integration for Indoor Pedestrian Localization,” Sensors, vol. 16, no. 12, p. 2100, 2016.

[63] B. Zhou, Q. Li, Q. Mao, and W. Tu, “A Robust Crowdsourcing-Based Indoor Localization System,” Sensors, vol. 17, no. 4, p. 864, 2017.

[64] “Raspberry Pi,” https://www.raspberrypi.org.

[65] “TP-LINK Wi-Fi USB dongle datasheet,” http://www.tp-link.com/us/products/details/cat-5520_TL-WN722N.html.

[66] “ISO/IEC 20922:2016 - Information technology – Message Queuing Telemetry Transport (MQTT) v3.1.1,” https://www.iso.org/standard/69466.html.

[67] G. Piateski-Shapiro and W. Frawley, Knowledge Discovery in Databases, MIT Press, Cambridge, MA, USA, 1991.

[68] J. Andión Jiménez, J. M. Navarro González, M. Álvarez-Campana Fernández-Corredor, and J. C. Dueñas López, “A passive, non-intrusive, cheap method to identify behaviours and habits in the Campus,” in Proceedings of the XIII Jornadas de Ingeniería Telemática - JITEL2017, vol. 40, no. 47, pp. 10–4995, September 2017.

[69] A. Mylinikov, “Geo project,” https://www.mylnikov.org.

[70] K. Mehmet and G. Ananth, “PROXIMUS: A framework for analyzing very high dimensional discrete-attributed datasets,” in Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’03, vol. 147, no. 156, pp. 147–156, USA, August 2003.

[71] “UPM Library occupation,” http://ceiboard.dit.upm.es/smart-campus/biblio.

[72] C. A. R. Inarejos, A. Rodríguez, G. López, and M. Álvarez-Campana, “Análisis de la huella de carbono de la ETSIT de la UPM y propuesta de mejora basada en datos de la plataforma IoT Smart CEI Moncloa,” in Proceedings of the I Congreso Iberoamericano de Ciudades Inteligentes (ICSC-CITIES 2018), 2018.


Figure 13: Measure of library occupation using Wi-Fi, camera, and human systems. (Plot of number of users, 0 to 400, against hour of day, 10:00 to 22:00, with series Camera, Human, and Wifi.)

Figure 14: Measures every 15 minutes of library occupation during June: (a) Wi-Fi records, (b) camera records, and (c) difference between Wi-Fi and camera. (The horizontal axis of each panel spans hours 00 to 23.)

day and the columns are divisions of 15 minutes. The color intensity is proportional to the number of users measured in each interval, being more intense when more users are detected. Figure 14(a) shows the measurements of the Wi-Fi sensor. The behavior is the one observed in previous sections of this document. Figure 14(b) shows the measurements of the camera system. During the first half of the day, the same trend as in Figure 14(a) can be observed, but in the afternoon and at night the accumulated error becomes noticeable. This error reaches a maximum of 378 people, with an average of 139 people at the end of the day, while the Wi-Fi tracking system registers a maximum of 10 and an average of 7 at the end of the day (these are machines that are still turned on when the library is empty of people). Finally, Figure 14(c) shows the difference between the two previous ones, using green when the error is positive (the Wi-Fi tracking system registers a higher value than the camera system) and red otherwise. This figure validates the measurements obtained by the Wi-Fi tracking system, since the difference from the camera measurements in the first half of the day is small (an average of 27 people, counting the data from 8:00 to 16:00), whereas in the afternoon the error accumulated by the camera system causes the average difference to grow to 83 people between 16:00 and 22:00.
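The comparison between the two systems can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes occupancy counts already aggregated into 15-minute bins keyed by (hour, minute), it assumes that the reported averages are means of absolute differences (the text does not say whether signed or absolute values are used), and the function name `mean_abs_difference` and all sample counts are hypothetical.

```python
from statistics import mean


def mean_abs_difference(wifi, camera, start_hour, end_hour):
    """Average |wifi - camera| over the 15-minute bins in [start_hour, end_hour)."""
    diffs = [
        abs(wifi[slot] - camera[slot])
        for slot in wifi
        if slot in camera and start_hour <= slot[0] < end_hour
    ]
    return mean(diffs) if diffs else 0.0


# Hypothetical occupancy counts keyed by the (hour, minute) of each bin.
wifi = {(9, 0): 120, (9, 15): 130, (17, 0): 200, (17, 15): 190}
camera = {(9, 0): 110, (9, 15): 150, (17, 0): 290, (17, 15): 260}

# Same time windows as in the text: 8:00-16:00 and 16:00-22:00.
morning = mean_abs_difference(wifi, camera, 8, 16)
afternoon = mean_abs_difference(wifi, camera, 16, 22)
```

With these made-up counts the afternoon window shows the larger average gap, mirroring the drift that the text attributes to the camera system.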

The Wi-Fi sensor of the library was not placed for the specific purpose of counting the number of people in the library and, in consequence, its coverage area can detect devices that are outside the library. This explains why, in some cases, it registers a measurement greater than that of the camera system. In any case, the Wi-Fi sensor, with its limitations, registers a measure very similar to that of the camera-based people counting system, while avoiding the cumulative error of that system and being much cheaper (tens of euros against thousands of euros) and less intrusive than a video camera. Currently, since the camera system is already installed, the Wi-Fi sensor measurements can be used as an upper bound, which would limit the error made by this system. In summary, the usage of a Wi-Fi sensor to count people in the library provides accurate results despite the potential errors induced by the sensing period, collisions, the ratio between users and MACs seen, and the data processing performed. These results suggest that the results of our analyses for the rest of the areas in campus are also correct.
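Using the Wi-Fi measurement as a maximum for the camera count amounts to a per-interval clamp. The sketch below is only one possible reading of that idea: the function name `cap_camera_counts`, the string time keys, and the sample values are all hypothetical.

```python
def cap_camera_counts(camera, wifi):
    """Clamp each camera count to the Wi-Fi count of the same interval.

    Intervals without a Wi-Fi reading keep the camera value unchanged.
    """
    return {
        slot: min(count, wifi.get(slot, count))
        for slot, count in camera.items()
    }


# Hypothetical evening readings in which the camera count has drifted upwards.
camera = {"20:00": 310, "20:15": 325, "20:30": 90}
wifi = {"20:00": 150, "20:15": 140, "20:30": 120}

capped = cap_camera_counts(camera, wifi)
```

The clamp only bounds the camera error from above; it does nothing for intervals in which the camera undercounts.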

6. Discussion and Conclusions

In the execution of this case study, we have learned some lessons about the limitations we faced that are worth taking into account for similar systems and analyses. First, there is the key issue of the sensor deployment: it is not just a question of density (number of sensors per unit of surface), but of adapting its design to the topology of the place to be monitored. So, for example, it is crucial to have sensors not only at the entries and exits of buildings but also inside them, as in the considered deployment, where sensors are placed at the entrances and exits of the buildings but also in other especially relevant places, such as the library or the work-in-group area.

A complete coverage of indoor devices, without collisions and for the complete timespan of the stay indoors, would require a large deployment with many sensors able to cover all corners. This is not suitable in this context; instead, we relied on 9 sensors, some at gates and others in large rooms, which proved to be enough to check whether a device is within reach of any of them. But some assumptions had to be made: we considered as valid behaviors only those that spent a significant amount of time once they had been seen (to remove transient behaviors and those of people walking out of the buildings); also, we discarded behaviors that did not appear enough times in the yearly timespan. On the other hand, the topology of these buildings did not impose severe restrictions on the sensors’ coverage. In fact, we found several devices seen by different sensors at once. As a summary, we can conclude that the finer the spatial granularity (more sensors), the larger the set of different behaviors, so getting raw data from sensors would lead to an explosion of states that would render meaningless results.

The gathered data is another key issue: some works in the literature obtained signal strength measurements every few seconds, allowing them to identify indoor trajectories. This was not our case and, in fact, this proved to be very limiting. Thus, the precision of our analysis is limited by the regions covered by each sensor, which prevents us from analyzing any kind of movement within regions. Nevertheless, as has been seen, relevant results can still be obtained without the increase in energy consumption and the risk of flooding the school communications network that the other approach would entail.

As regards the analysis of data to identify users’ behaviors, we have to indicate that a long observation period is a must. Obtaining data for a complete work cycle (in this case, a full school year) helps in discovering common behaviors that happen in a university. Using long observation times can help in detecting erroneous or atypical operation of the sensors, as happened in the first months of the year in this case. Getting even larger observation datasets would reinforce the behaviors we have found (e.g., seasonality), but the chance of discovering new ones is negligible, since the expected behaviors in the campus will be periodic with the school year.

The usage of big data platforms for analysis, while not mandatory, eases the management of large datasets and the execution of iterative studies on the data. The preprocessing work carried out allowed us to clean and filter our data. For example, some discovered behaviors are too regular and extended over time, which may correspond to servers or machines, which we could then filter out.
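Filtering out servers and other always-on machines can be approximated by flagging devices that are present in nearly every time slot. The following is a minimal sketch under assumptions: sightings are (MAC, hourly slot) pairs, the 0.9 presence threshold is an arbitrary illustrative value, and `always_on_devices` is a hypothetical helper rather than part of the described platform.

```python
def always_on_devices(sightings, total_slots, threshold=0.9):
    """Return the MACs seen in at least `threshold` of all time slots."""
    slots_seen = {}
    for mac, slot in sightings:
        slots_seen.setdefault(mac, set()).add(slot)
    return {
        mac
        for mac, slots in slots_seen.items()
        if len(slots) / total_slots >= threshold
    }


# Hypothetical day of hourly sightings: one machine seen around the clock,
# one person's device seen for three hours in the morning.
sightings = [("aa:aa", h) for h in range(24)] + [("bb:bb", h) for h in (9, 10, 11)]

static = always_on_devices(sightings, total_slots=24)
people = [(mac, slot) for mac, slot in sightings if mac not in static]
```

In practice the threshold would need tuning over a longer window, since a device left in a locker for one day should not be treated the same as a server seen all year.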

The temporal analysis has revealed some behaviors that are maintained throughout the year and others that occur occasionally. The work has focused on the study of behaviors that are repeated and maintained throughout the year. A closer view, such as the analysis per hour, shows the hours of activity, nocturnal patterns, or movements at mealtime. In the daily view, the difference in activity between work days and weekends or holidays is clearly seen. Visualization has been a key technique for detecting and understanding these temporal patterns.

The spatial analysis revealed that a collision problem occurs in user detection, but its effect was mitigated by the use of innovative algorithms and techniques, such as the calculation of centroids and the combination of sensor information with external sources of information (i.e., semantic trajectories), so a much more precise positioning of the users is achieved than with the exclusive use of the sensors. The visualization of centroids fostered the visual understanding of complex data, such as the position of each user over a month, and the superposition of the centroids in a heatmap revealed the movements of groups of people and the occupation of the different zones.
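One plausible way to compute such a centroid is a detection-weighted average of the coordinates of the sensors that saw a device. This section does not give the exact formula, so the sketch below is an assumption; the sensor names, their coordinates in metres, and the function `weighted_centroid` are hypothetical.

```python
def weighted_centroid(detections, sensor_xy):
    """Average sensor coordinates, weighted by detections per sensor."""
    total = sum(detections.values())
    if total == 0:
        raise ValueError("device has no detections")
    x = sum(sensor_xy[s][0] * n for s, n in detections.items()) / total
    y = sum(sensor_xy[s][1] * n for s, n in detections.items()) / total
    return x, y


# Hypothetical sensor positions (metres) and one device's monthly detections.
sensor_xy = {"library": (0.0, 0.0), "gate_a": (100.0, 0.0), "lab": (0.0, 50.0)}
detections = {"library": 6, "gate_a": 2, "lab": 2}

centroid = weighted_centroid(detections, sensor_xy)
```

A device seen mostly at the library is pulled towards that sensor, which is what allows many centroids to be overlaid in a single heatmap.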

Furthermore, we have found dominant users’ behaviors as the most repeated behaviors registered by the sensors throughout the year. The number of behaviors found has been huge, but the application of the Proximus clustering algorithm reduced this number to a manageable amount. Then, the study of the obtained clusters has revealed that the most frequent behaviors coincide with what can be expected from a college building: researchers working in laboratories and students attending classes or studying in the library.

Although some of the discovered behaviors and patterns can be seen as common knowledge, it is worth stressing that they do bring value, since they represent numerical evidence that supports decision making (e.g., someone may think that the Wi-Fi access in a given area does not work properly because it is always overcrowded, but numerical evidence is needed to appropriately justify the investment of increasing the number of APs of the corporate WLAN in that given area to improve the service). In addition, such well-known patterns, when obtained automatically by processing the available data, become baseline models which can be used to detect anomalies or atypical situations, as is common practice in unsupervised machine learning.

Lastly, we have validated the accuracy of using Wi-Fi tracking for occupancy estimation, comparing it with the library staff’s manual counting (considered as ground truth) and with a video camera system installed at the library main entrance. As a main finding, Wi-Fi tracking has proved to be more accurate than the video camera system, in addition to being far cheaper. As a matter of fact, the library staff currently prefers the occupancy estimation based on Wi-Fi tracking over the one based on the video camera system. Nevertheless, the accuracy of the Wi-Fi tracking system can be further improved by correlating MAC addresses detected in the same places over sufficiently long periods of time and counting them as only one, thus mitigating the issue that a single person can carry several devices connected to Wi-Fi networks.
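The proposed improvement of correlating MAC addresses that appear in the same places at the same times could be sketched as follows. This is an illustrative assumption rather than the authors' method: each device is represented by its set of (sensor, time slot) sightings, two devices are merged when the Jaccard similarity of those sets reaches a hypothetical threshold of 0.8, and merged devices are grouped with a small union-find.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def group_devices(presence, threshold=0.8):
    """Group MACs whose (sensor, slot) sightings overlap strongly."""
    parent = {mac: mac for mac in presence}

    def find(mac):
        # Follow parent links to the group representative, compressing the path.
        while parent[mac] != mac:
            parent[mac] = parent[parent[mac]]
            mac = parent[mac]
        return mac

    for a, b in combinations(presence, 2):
        if jaccard(presence[a], presence[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for mac in presence:
        groups.setdefault(find(mac), set()).add(mac)
    return list(groups.values())


# Hypothetical sightings: a phone and a laptop that move together, plus
# an unrelated device seen at a gate.
presence = {
    "phone": {("library", t) for t in range(10)},
    "laptop": {("library", t) for t in range(1, 10)},
    "other": {("gate", t) for t in range(5)},
}

groups = group_devices(presence)
estimated_people = len(groups)
```

Here the phone and the laptop share 9 of 10 sightings and are counted as one person, so three devices yield an estimate of two people. In a crowded room many unrelated devices would also overlap, so the comparison would need long observation windows, as the text notes.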

Besides this, the Wi-Fi tracking system is currently used by the library staff to perform more sophisticated studies, such as figuring out the percentage of students from the different schools of the university who come to study at the library of the Telecommunications Engineering School during the weekend. Figure 15 shows the results of such an analysis, which illustrates how this kind of IoT system can help in solving real-life problems and improving the operation of already running services.

To summarize, we have studied a one-year Wi-Fi tracking dataset obtained from a reduced set of low-cost sensors with limited capabilities, deployed on an actual university campus that receives around 4000 people every day. We have processed the data in order to identify traces of Wi-Fi-enabled mobile devices, which are identified as people moving in the campus buildings, and then we have extracted people’s stays, movements, and common behaviors. The obtained results represent numerical evidence that illustrates how a low-cost Wi-Fi tracking system can be used in real-life conditions to improve or optimize the operation of the monitored premises. These results can allow appropriately dimensioning the WLAN infrastructure or the canteen personnel, or detecting anomalous situations in real time. Furthermore, the data from the people flow monitoring system is currently being used together with the data from the environmental monitoring system to try to reduce the environmental footprint of the school [72]. In addition, the Wi-Fi tracking system is actually used by the library staff in their day-to-day activity, which illustrates the value that this kind of IoT infrastructure can bring to real-life problems and services.

Figure 15: Users who only go to the library of the Telecommunication School on weekends, classified by their school. (Chart categories: CIVIL, INDUS, NAVAL, SPORT, ARQUI, FORES, EDIF, AERO, TELECOM. Extracted percentage values, whose pairing with the schools is not recoverable: 13.96, 18.06, 2.64, 0.90, 9.51, 3.82, 4.58, 17.71, 28.82.)

Data Availability

The dataset with the Wi-Fi sensors records used to supportthe findings of this study are available from the correspondingauthor upon request

Conflicts of Interest

The authors declare that they have no conflicts of interest

Acknowledgments

This work has been partly funded by Universidad Politecnicade Madrid through the project RES2+U (ResponsablesSostenibles y Universitarios) (httpblogsupmesres2masu)The work performed by Jose M Navarro was funded by theMinisterio de Educacion Cultura yDeporte de Espana underGrant no FPU 1403209

References

[1] ldquo2018 Revision ofWorld Urbanization ProspectsmdashMultimediaLibrary - United Nations Department of Economic and Social

22 Wireless Communications and Mobile Computing

Affairsrdquo httpswwwunorgdevelopmentdesapublications2018-revision-of-world-urbanization-prospectshtml

[2] M Pallot ldquoEngaging Users into Research and Innova-tionThe Living Lab Approach as a User CentredOpen Innova-tion Ecosystemrdquo httpswebarchiveorgweb20120509081658httpwwwcwe-projectseupubbscwcgi1760838id=7154041760838

[3] E Almirall and J Wareham ldquoLiving Labs arbiters of mid-and ground-level innovationrdquo Technology Analysis amp StrategicManagement vol 23 no 1 pp 87ndash102 2011

[4] ldquoTUDelft Green officerdquo httpswwwtudelftnlsustainability[5] ldquoUBC Campus as a living laboratoryrdquo httpssustainubcca

our-commitmentcampus-living-lab[6] ldquoHarvard Sustainabilityrdquo httpsgreenharvardedu[7] ldquoUPM City of the Future initiativerdquo httpblogsupmes

cityofthefuture-upmeninitiative[8] M Alvarez-Campana G Lopez E Vazquez V A Villagra

and J Berrocal ldquoSmart CEI moncloa An iot-based platformfor people flow and environmental monitoring on a SmartUniversity Campusrdquo Sensors vol 17 no 12 2017

[9] M B Kjaeligrgaard H Blunck T Godsk T Toftkjaeligr D LChristensen and K Groslashnbaeligk ldquoIndoor positioning using GPSrevisitedrdquoLectureNotes in Computer Science (including subseriesLecture Notes in Artificial Intelligence and Lecture Notes inBioinformatics) Preface vol 6030 pp 38ndash56 2010

[10] A Belmonte-Hernandez G Hernandez-Penaloza F Alvarezand G Conti ldquoAdaptive Fingerprinting in Multi-Sensor Fusionfor Accurate Indoor Trackingrdquo IEEE Sensors Journal vol 17 no15 pp 4983ndash4998 2017

[11] M S Kristoffersen J V Dueholm R Gade and T BMoeslundldquoPedestrian counting with occlusion handling using stereothermal camerasrdquo Sensors vol 16 no 1 2016

[12] L Zheng X Ruan Y Chen and M Huang ldquoShadow removalfor pedestrian detection and tracking in indoor environmentsrdquoMultimedia Tools and Applications vol 76 no 18 pp 18321ndash18337 2017

[13] T Li H ChangMWang B Ni R Hong and S Yan ldquoCrowdedscene analysis a surveyrdquo IEEE Transactions on Circuits andSystems for Video Technology vol 25 no 3 pp 367ndash386 2015

[14] F Adib and D Katabi ldquoSee through walls with WiFirdquo inProceedings of the Annual Conference of the ACM Special InterestGroup on Data Communication on the Applications Technolo-gies Architectures and Protocols for Computer CommunicationACM SIGCOMM 2013 pp 75ndash86 China August 2013

[15] Z Kabelac D Katabi and R C Miller ldquo3D Tracking viaBody Radio Reflectionsrdquo in Proceedings of the 11th USENIXConference on Networked Systems Design and Implementationpp 317ndash329 2014

[16] F Adib Z Kabelac and D Katabi ldquoMulti-person localizationvia RF body reflectionsrdquo in Proceedings of the 12th USENIXSymposium on Networked Systems Design and ImplementationNSDI 2015 pp 279ndash292 USA May 2015

[17] Z-A Deng G Wang D Qin Z Na Y Cui and J Chen ldquoCon-tinuous indoor positioning fusing WiFi smartphone sensorsand landmarksrdquo Sensors vol 16 no 9 2016

[18] R A Becker R Caceres K Hanson et al ldquoA tale of one cityUsing cellular network data for urban planningrdquo IEEE PervasiveComputing vol 10 no 4 pp 18ndash26 2011

[19] J E Mallah F Carrino O A Khaled and E MugellinildquoCrowd monitoring critical situations prevention using smart-phones and group detectionrdquo Lecture Notes in Computer Science

(including subseries Lecture Notes in Artificial Intelligence andLecture Notes in Bioinformatics) Preface vol 9189 pp 496ndash5052015

[20] J Kuang X Niu and X Chen ldquoRobust Pedestrian DeadReckoning Based on MEMS-IMU for Smartphonesrdquo Sensorsvol 18 no 5 p 1391 2018

[21] A Kurkcu and K Ozbay ldquoEstimating Pedestrian DensitiesWait Times and Flows with Wi-Fi and Bluetooth SensorsrdquoTransportation Research Record vol 2644 no 1 pp 72ndash82 2017

[22] A Kotanen M Hannikainen H Leppakoski and THamalainen ldquoPositioning with IEEE 80211b wireless LANrdquo inProceedings of the 14th IEEE 2003 International Symposium onPersonal Indoor and Mobile Radio Communications vol 3 pp2218ndash2222 Beijing China 2003

[23] W Ho A Smailagic D P Siewiorek and C Faloutsos ldquoAnadaptive two-phase approach to WiFi location sensingrdquo inProceedings of the 4th Annual IEEE International Conference onPervasive Computing and Communications Workshops PerComWorkshops 2006 pp 452ndash456 Italy March 2006

[24] F Evennou and F Marx ldquoAdvanced integration of WiFi andinertial navigation systems for indoor mobile positioningrdquoEURASIP Journal onApplied Signal Processing vol 2006 ArticleID 86706 11 pages 2006

[25] J. Rekimoto, T. Miyaki, and T. Ishizawa, “LifeTag: WiFi-based continuous location logging for life pattern analysis,” LNCS, vol. 4718, pp. 35–49, 2007.

[26] J. A. Besada, A. M. Bernardos, P. Tarrío, and J. R. Casar, “Analysis of tracking methods for wireless indoor localization,” in Proceedings of the 2nd International Symposium on Wireless Pervasive Computing (ISWPC ’07), pp. 492–497, February 2007.

[27] O. Woodman and R. Harle, “Pedestrian localisation for indoor environments,” in Proceedings of the 10th International Conference on Ubiquitous Computing (UbiComp ’08), pp. 114–123, Seoul, Republic of Korea, September 2008.

[28] F. Aloul, A. Sagahyroon, A. Al-Shami, I. Al-Midfa, and R. Moutassem, “Using mobiles for on campus location tracking,” in Proceedings of the 7th International Conference on Advances in Mobile Computing and Multimedia, MoMM2009, pp. 231–235, Malaysia, December 2009.

[29] S. Woo, S. Jeong, E. Mok et al., “Application of WiFi-based indoor positioning system for labor tracking at construction sites: A case study in Guangzhou MTR,” Automation in Construction, vol. 20, no. 1, pp. 3–13, 2011.

[30] N. Le Dortz, F. Gain, and P. Zetterberg, “WiFi fingerprint indoor positioning system using probability distribution comparison,” in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP ’12), pp. 2301–2304, March 2012.

[31] Z. Chen, H. Zou, H. Jiang, Q. Zhu, Y. C. Soh, and L. Xie, “Fusion of WiFi, smartphone sensors and landmarks using the kalman filter for indoor localization,” Sensors, vol. 15, no. 1, pp. 715–732, 2015.

[32] Y. Shu, C. Bo, G. Shen, C. Zhao, L. Li, and F. Zhao, “Magicol: indoor localization using pervasive magnetic field and opportunistic wifi sensing,” IEEE Journal on Selected Areas in Communications, vol. 33, no. 7, pp. 1443–1457, 2015.

[33] A. Danalet, B. Farooq, and M. Bierlaire, “A Bayesian approach to detect pedestrian destination-sequences from WiFi signatures,” Transportation Research Part C: Emerging Technologies, vol. 44, pp. 146–170, 2014.

[34] O. Czogalla and S. Naumann, “Pedestrian indoor navigation for complex public facilities,” in Proceedings of the 2016 International Conference on Indoor Positioning and Indoor Navigation, IPIN 2016, pp. 1–8, Spain, October 2016.

[35] L. Schauer, P. Marcus, and C. Linnhoff-Popien, “Towards feasible Wi-Fi based indoor tracking systems using probabilistic methods,” in Proceedings of the 2016 International Conference on Indoor Positioning and Indoor Navigation, IPIN 2016, pp. 1–8, Spain, October 2016.

[36] F. Meneses and A. Moreira, “Large scale movement analysis from WiFi based location data,” in Proceedings of the 2012 International Conference on Indoor Positioning and Indoor Navigation, IPIN 2012, Australia, November 2012.

[37] B. Bonne, A. Barzan, P. Quax, and W. Lamotte, “WiFiPi: involuntary tracking of visitors at mass events,” in Proceedings of the IEEE 14th International Symposium on a World of Wireless, Mobile and Multimedia Networks (WoWMoM ’13), pp. 1–6, Madrid, Spain, June 2013.

[38] M. B. Kjaergaard, M. Wirz, D. Roggen, and G. Troster, “Mobile sensing of pedestrian flocks in indoor environments using WiFi signals,” in Proceedings of the 10th IEEE International Conference on Pervasive Computing and Communications (PerCom ’12), pp. 95–102, Lugano, Switzerland, March 2012.

[39] S. Sendra, M. Garcia, C. Turro, and J. Lloret, “People mobility behaviour study in a university campus using WLANs,” in Proceedings of the 3rd International Conference on Mobile Ubiquitous Computing, Systems, Services and Technologies, UBICOMM 2009, pp. 124–129, Malta, October 2009.

[40] J. Scheuner, G. Mazlami, D. Schoni et al., “Probr - A Generic and Passive WiFi Tracking System,” in Proceedings of the 41st IEEE Conference on Local Computer Networks, LCN 2016, pp. 495–502, UAE, November 2016.

[41] Y. Li, S. Williams, B. Moran, and A. Kealy, “Quantized RSS Based Wi-Fi Indoor Localization with Room Level Accuracy,” in Proceedings of the International Global Navigation Satellite Systems 2018, 2018.

[42] A. Alessandrini, C. Gioia, F. Sermi, I. Sofos, D. Tarchi, and M. Vespe, “WiFi positioning and Big Data to monitor flows of people on a wide scale,” in Proceedings of the 25th European Navigation Conference, ENC 2017, pp. 322–328, Switzerland, May 2017.

[43] L. Vu, K. Nahrstedt, S. Retika, and I. Gupta, “Joint bluetooth/wifi scanning framework for characterizing and leveraging people movement in university campus,” in Proceedings of the 13th ACM International Conference on Modeling, Analysis and Simulation of Wireless and Mobile Systems (MSWiM ’10), pp. 257–265, October 2010.

[44] M. Garcia, S. Sendra, C. Turro, and J. Lloret, “User’s Macro and Micro-mobility Study using WLANs in a University Campus,” International Journal On Advances in Internet Technology, vol. 4, no. 1, pp. 37–46, 2011.

[45] Y. Xu, I. D. G. Groeneveld, R. Sulzer, E. Theocharous, O. T. Willems, and M. S. Tryfona, “Determine activity based on the classified identity of users by using Wi-Fi monitoring,” Geomatics Synthesis Group Project Report, 2016.

[46] A. Danalet, L. Tinguely, M. D. Lapparent, and M. Bierlaire, “Location choice with longitudinal WiFi data,” Journal of Choice Modelling, vol. 18, pp. 1–17, 2016.

[47] M. Zhou, K. Sui, M. Ma, Y. Zhao, D. Pei, and T. Moscibroda, “MobiCamp: A campus-wide testbed for studying mobile physical activities,” in Proceedings of the 3rd International Workshop on Physical Analytics, WPA 2016, pp. 1–6, Singapore, 2016.

[48] A. Fernandez-Ares, A. M. Mora, M. G. Arenas et al., “Studying real traffic and mobility scenarios for a Smart City using a new monitoring and tracking system,” Future Generation Computer Systems, vol. 76, pp. 163–179, 2017.

[49] A. E. Redondi and M. Cesana, “Building up knowledge through passive WiFi probes,” Computer Communications, vol. 117, pp. 1–12, 2018.

[50] L. Huang, K. Matsuura, H. Yamanet, and K. Sezaki, “Enhancing wireless location privacy using silent period,” in Proceedings of the 2005 IEEE Wireless Communications and Networking Conference, WCNC 2005: Broadband Wirelss for the Masses - Ready for Take-off, pp. 1187–1192, USA, March 2005.

[51] J. Martin, T. Mayberry, C. Donahue et al., “A Study of MAC Address Randomization in Mobile Devices and When it Fails,” Proceedings on Privacy Enhancing Technologies, vol. 2017, no. 4, pp. 365–383, 2017.

[52] M. Vanhoef, C. Matte, M. Cunche, L. S. Cardoso, and F. Piessens, “Why MAC address randomization is not enough: an analysis of Wi-Fi network discovery mechanisms,” in Proceedings of the 11th ACM on Asia Conference on Computer and Communications Security, pp. 413–424, ACM, Xi’an, China, June 2016.

[53] A. J. Ruiz-Ruiz, H. Blunck, T. S. Prentow, A. Stisen, and M. B. Kjaergaard, “Analysis methods for extracting knowledge from large-scale WiFi monitoring to inform building facility planning,” in Proceedings of the 2014 12th IEEE International Conference on Pervasive Computing and Communications, PerCom 2014, pp. 130–138, Hungary, March 2014.

[54] N. Nunes, M. Ribeiro, C. Prandi, and V. Nisi, “Beanstalk - A community based passive Wi-Fi tracking system for analysing tourism dynamics,” in Proceedings of the 9th ACM SIGCHI Symposium on Engineering Interactive Computing Systems, EICS 2017, pp. 93–98, Portugal, June 2017.

[55] C. Parent, N. Pelekis, Y. Theodoridis et al., “Semantic trajectories modeling and analysis,” ACM Computing Surveys, vol. 45, no. 4, pp. 1–32, 2013.

[56] C. Wei, “Mining of User Behavioral Features Based on Indoor Semantic Trajectories,” Boletín Técnico, ISSN: 0376-723X, vol. 55, 2017.

[57] K. V. Long, D. Quang, and N. Klara, Lessons learned from bluetooth/wifi scanning deployment in university campus, Urbana, Illinois, USA, 2010.

[58] V. Radu and M. K. Marina, “HiMLoc: indoor smartphone localization via activity aware pedestrian dead reckoning with selective crowdsourced WiFi fingerprinting,” in Proceedings of the International Conference on Indoor Positioning and Indoor Navigation (IPIN ’13), pp. 1–10, IEEE, Montbeliard-Belfort, France, October 2013.

[59] F. Hong, Y. Zhang, Z. Zhang, M. Wei, Y. Feng, and Z. Guo, “WaP: Indoor localization and tracking using WiFi-Assisted Particle filter,” in Proceedings of the 39th Annual IEEE Conference on Local Computer Networks, LCN 2014, pp. 210–217, Canada, September 2014.

[60] Y. Zeng, P. H. Pathak, and P. Mohapatra, “Analyzing shopper’s behavior through WiFi signals,” in Proceedings of the 2nd Workshop on Physical Analytics, WPA 2015, pp. 13–18, Italy, 2015.

[61] P. Sapiezynski, A. Stopczynski, R. Gatej, and S. Lehmann, “Tracking human mobility using WiFi signals,” PLoS ONE, vol. 10, no. 7, p. e0130824, 2015.

[62] Z. Tian, Y. Jin, M. Zhou, Z. Wu, and Z. Li, “Wi-Fi/MARG Integration for Indoor Pedestrian Localization,” Sensors, vol. 16, no. 12, p. 2100, 2016.

[63] B. Zhou, Q. Li, Q. Mao, and W. Tu, “A Robust Crowdsourcing-Based Indoor Localization System,” Sensors, vol. 17, no. 4, p. 864, 2017.

[64] “Raspberry Pi,” https://www.raspberrypi.org.

[65] “TP-LINK Wi-Fi USB dongle datasheet,” http://www.tp-link.com/us/products/details/cat-5520_TL-WN722N.html.

[66] “ISO/IEC 20922:2016 - Information technology – Message Queuing Telemetry Transport (MQTT) v3.1.1,” https://www.iso.org/standard/69466.html.

[67] G. Piateski-Shapiro and W. Frawley, Knowledge Discovery in Databases, MIT Press, Cambridge, MA, USA, 1991.

[68] J. Andion Jimenez, J. M. Navarro Gonzalez, M. Alvarez-Campana Fernandez-Corredor, and J. C. Duenas Lopez, “A passive non-intrusive cheap method to identify behaviours and habits in the Campus,” in Proceedings of the XIII Jornadas de Ingenieria Telematica - JITEL2017, vol. 40, no. 47, pp. 10–4995, September 2017.

[69] A. Mylinikov, “Geo project,” https://www.mylnikov.org.

[70] K. Mehmet and G. Ananth, “PROXIMUS: A framework for analyzing very high dimensional discrete-attributed datasets,” in Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’03, vol. 147, no. 156, pp. 147–156, USA, August 2003.

[71] “UPM Library occupation,” http://ceiboard.dit.upm.es/smart-campus/biblio.

[72] C. A. R. Inarejos, A. Rodríguez, G. Lopez, and M. Alvarez-Campana, “Analisis de la huella de carbono de la ETSIT de la UPM y propuesta de mejora basada en datos de la plataforma IoT Smart CEI Moncloa,” in Proceedings of the I Congreso Iberoamericano de Ciudades Inteligentes (ICSC-CITIES 2018), 2018.


139 people at the end of the day, while the Wi-Fi tracking system registers a maximum of 10 and an average of 7 at the end of the day (these are machines that are still turned on when the library is empty of people). Finally, Figure 14(c) shows the difference between the two previous ones, using green when the error is positive (the Wi-Fi tracking system registers a higher value than the camera system) and red otherwise. This figure validates the measurements obtained by the Wi-Fi tracking system, since the difference with the measurements of the camera system in the first half of the day is small (an average of 27 people, counting the data from 8:00 to 16:00), whereas in the afternoon the error accumulated by the camera system causes the average difference to grow to 83 people between 16:00 and 22:00.
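
As a minimal sketch of how such a per-window comparison could be computed (this is not the authors' code), the following Python snippet averages the signed difference between the two counts over a time-of-day window. The function name `mean_difference` and the sample readings are hypothetical and serve only as an illustration; the sign convention follows the text (positive when the Wi-Fi count is higher).

```python
from datetime import time
from statistics import mean


def mean_difference(samples, start, end):
    """Average (wifi - camera) over samples with start <= time-of-day < end.

    samples: iterable of (time_of_day, wifi_count, camera_count).
    Returns None when no sample falls inside the window.
    """
    diffs = [wifi - camera for t, wifi, camera in samples if start <= t < end]
    return mean(diffs) if diffs else None


# Hypothetical readings, not taken from the paper's dataset.
samples = [
    (time(9, 0), 40, 38),
    (time(12, 0), 95, 90),
    (time(17, 0), 80, 95),
    (time(21, 0), 20, 60),
]

morning = mean_difference(samples, time(8, 0), time(16, 0))
evening = mean_difference(samples, time(16, 0), time(22, 0))
print(morning, evening)
```

A negative evening value in this toy data mirrors the situation described above, where the camera system accumulates error and overestimates occupancy later in the day.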

The Wi-Fi sensor of the library was not placed for the specific purpose of counting the number of people in the library and, in consequence, its coverage area can include devices that are outside the library. This explains why, in some cases, it registers a measurement greater than that of the camera system. In any case, the Wi-Fi sensor, with its limitations, registers a measurement very similar to that of the camera-based people-counting system, while avoiding the cumulative error of that system and being much cheaper (tens of euros against thousands of euros) and less intrusive than a video camera. Currently, since the camera system is already installed, the Wi-Fi sensor measurements can be used as a maximum, which would limit the error made by this system. In summary, using the Wi-Fi sensor to count people in the library provides accurate results despite the potential errors induced by the sensing period, collisions, the ratio between users and MACs seen, and the data processing performed. These results give confidence that the results of our analyses for the rest of the areas on campus are also correct.

6. Discussion and Conclusions

In the execution of this case study, we have learned some lessons about the limitations we faced that are worth taking into account for similar systems and analyses. First, there is the key issue of the sensor deployment: it is not just a question of density (number of sensors per surface) but of adapting its design to the topology of the place to be monitored. For example, it is crucial to have sensors not only at the entries and exits of buildings but also inside them, as in the considered deployment, where sensors are placed at the entrances and exits of the buildings but also in other especially relevant places, such as the library or the work-in-group area.

A complete coverage of indoor devices, without collisions and for the complete timespan of the stay indoors, would require a large deployment with many sensors able to cover all corners. This is not suitable in this context; instead, we relied on 9 sensors, some at gates and others in large rooms, which proved to be enough to check whether a device is within reach of any of them. However, some assumptions had to be made: we considered as valid behaviors only those that spent a significant amount of time once they had been seen (to remove transient behaviors and those of people walking out of the buildings); also, we discarded behaviors that did not appear enough times in the yearly timespan. On the other hand, the topology of these buildings did not impose severe restrictions on the sensors' coverage. In fact, we found several devices seen by different sensors at once. In summary, we can conclude that the finer the spatial granularity (more sensors), the larger the set of different behaviors, so using raw data from the sensors would lead to an explosion of states that would render meaningless results.
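
The two filtering assumptions above can be sketched as follows. This is only an illustration under stated assumptions: the thresholds `MIN_STAY_MINUTES` and `MIN_OCCURRENCES`, the function `filter_behaviors`, and the sensor names are hypothetical, since the text does not give the actual values used.

```python
from collections import Counter

MIN_STAY_MINUTES = 10  # assumed threshold for a "significant" stay
MIN_OCCURRENCES = 3    # assumed minimum number of repetitions per year


def filter_behaviors(visits, min_stay=MIN_STAY_MINUTES, min_count=MIN_OCCURRENCES):
    """Keep behaviors that last long enough and repeat often enough.

    visits: iterable of (behavior, stay_minutes), where a behavior is a
    tuple of sensor names in the order in which the device was seen.
    Returns a dict mapping each kept behavior to its number of occurrences.
    """
    # Step 1: drop transient observations (e.g., people passing by a gate).
    long_enough = [behavior for behavior, stay in visits if stay >= min_stay]
    # Step 2: drop behaviors that are too rare over the observation period.
    counts = Counter(long_enough)
    return {behavior: n for behavior, n in counts.items() if n >= min_count}


# Hypothetical observations, for illustration only.
visits = (
    [(("gate_A", "library"), 120)] * 3
    + [(("gate_A",), 2)] * 5
    + [(("gate_B", "lab"), 60)]
)
kept = filter_behaviors(visits)
print(kept)
```

In this toy input, the short gate-only sightings are removed as transient and the single laboratory visit is removed as too infrequent.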

The gathered data is another key issue: some works in the literature obtained signal strength measurements every few seconds, allowing them to identify indoor trajectories. This was not our case and, in fact, this proved to be very limiting. Thus, the precision of our analysis is limited by the regions covered by each sensor, which prevents us from analyzing any kind of movement within regions. Nevertheless, as has been seen, relevant results can still be obtained without the increase in energy consumption and the risk of flooding the school communications network that the other approach would entail.

As regards the analysis of data to identify users' behaviors, a long observation period is a must. Obtaining data for a complete work cycle (in this case, a full school year) helps in discovering common behaviors that happen in a university. Using long observation times can also help detect erroneous or atypical operation of the sensors, as happened in the first months of the year in this case. Getting even larger observation datasets would reinforce the behaviors we have found (e.g., seasonality), but the chance of discovering new ones is negligible, because the expected behaviors on the campus will be periodic with the school year.

The usage of big data platforms for analysis, while not mandatory, eases the management of large datasets and the execution of iterative studies on the data. The preprocessing work carried out allowed us to clean and filter our data. For example, some discovered behaviors are too regular and extended over time, which may correspond to servers or machines that we could then filter out.

The temporal analysis has revealed some behaviors that are maintained throughout the year and others that occur occasionally. The work has focused on the study of behaviors that are repeated and maintained throughout the year. A closer view, such as the analysis per hour, shows the hours of activity, nocturnal patterns, or movements at mealtime. In the daily view, the difference in activity between work days and weekends or holidays is clearly seen. Visualization has been a key technique for detecting and understanding these temporal patterns.

The spatial analysis revealed that a collision problem arises in user detection, but its effect was mitigated by the use of innovative algorithms and techniques, such as the calculation of centroids and the combination of sensor information with external sources of information (i.e., semantic trajectories), so that a much more precise positioning of the users is achieved than with the exclusive use of the sensors. The visualization of centroids fostered the visual understanding of complex data, such as the position of each user over a month, and the superposition of the centroids in a heatmap revealed the movements of groups of people and the occupation of the different zones.
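
One simple way to compute such a centroid, given here only as a hedged sketch and not necessarily the exact calculation used in this work, is to average the positions of the sensors that detected a device, weighted by the number of detections at each sensor. The function `weighted_centroid`, the sensor names, and the coordinates are assumptions made for the example.

```python
def weighted_centroid(detections, sensor_xy):
    """Detection-weighted centroid of the sensors that saw a device.

    detections: dict mapping sensor name -> number of detections.
    sensor_xy: dict mapping sensor name -> (x, y) position in meters.
    """
    total = sum(detections.values())
    if total == 0:
        raise ValueError("device has no detections")
    x = sum(sensor_xy[s][0] * n for s, n in detections.items()) / total
    y = sum(sensor_xy[s][1] * n for s, n in detections.items()) / total
    return (x, y)


# Hypothetical sensor layout and monthly detection counts for one device.
sensor_xy = {"library": (0.0, 0.0), "gate": (10.0, 0.0)}
detections = {"library": 3, "gate": 1}

centroid = weighted_centroid(detections, sensor_xy)
print(centroid)
```

The centroid lands closer to the sensor that saw the device most often, which is what makes the superposition of many such points in a heatmap informative about zone occupation.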

Furthermore, we have found dominant users' behaviors as the most repeated behaviors registered by the sensors throughout the year. The number of behaviors found was huge, but the application of the Proximus clustering algorithm reduced this number to a manageable amount. Then, the study of the obtained clusters revealed that the most frequent behaviors coincide with what can be expected from a college building: researchers working in laboratories and students attending classes or studying in the library.

Although some of the discovered behaviors and patterns can be seen as common knowledge, it is worth stressing that they do bring value, since they represent numerical evidence that supports decision making (e.g., someone may think that the Wi-Fi access in a given area does not work properly because it is always overcrowded, but numerical evidence is needed to appropriately justify the investment of increasing the number of APs of the corporate WLAN in that area to improve the service). In addition, such well-known patterns, when obtained automatically by processing the available data, become baseline models which can be used to detect anomalies or atypical situations, as is common practice in unsupervised machine learning.
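
To illustrate the baseline idea, the sketch below builds a per-hour baseline (mean and standard deviation of historical occupancy) and flags a new count as atypical when it deviates by more than k standard deviations. This is one common heuristic chosen for the example, not a method described in this paper; the function names, the threshold `k`, and the sample history are hypothetical.

```python
from statistics import mean, pstdev


def build_baseline(history):
    """history: dict mapping hour of day -> list of past occupancy counts.

    Returns a dict mapping hour -> (mean, population standard deviation).
    """
    return {hour: (mean(counts), pstdev(counts)) for hour, counts in history.items()}


def is_anomalous(baseline, hour, count, k=3.0):
    """Flag counts deviating more than k standard deviations from the baseline."""
    mu, sigma = baseline[hour]
    if sigma == 0:
        return count != mu
    return abs(count - mu) > k * sigma


# Hypothetical history of occupancy at 10:00 on four comparable days.
history = {10: [100, 110, 90, 100]}
baseline = build_baseline(history)

print(is_anomalous(baseline, 10, 105))  # close to the usual occupancy
print(is_anomalous(baseline, 10, 160))  # far above the usual occupancy
```

In practice the baseline would likely be kept per sensor and per day type (work day, weekend, holiday), given the temporal patterns discussed above.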

Lastly, we have validated the accuracy of using Wi-Fi tracking for occupancy estimation, comparing it with the library staff's manual counting (considered as ground truth) and with a video camera system installed at the library main entrance. As a main finding, Wi-Fi tracking has proved to be more accurate than the video camera system, in addition to being much cheaper. As a matter of fact, the library staff currently prefer the occupancy estimation based on Wi-Fi tracking over the one based on the video camera system. Nevertheless, the accuracy of the Wi-Fi tracking system can be further improved by correlating MAC addresses detected in the same places over sufficiently long periods of time and counting them only once, thus mitigating the issue that a single person can carry several devices connected to Wi-Fi networks.
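
The proposed improvement could be prototyped along the following lines. This is a sketch under assumptions, not an implemented part of the system: each MAC address is represented by the set of (sensor, time slot) pairs in which it was seen, two addresses are merged when the Jaccard similarity of their sets reaches an assumed threshold, and each merged group is counted as one person. The names `jaccard` and `count_people`, the threshold, and the sample data are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def count_people(sightings, threshold=0.8):
    """Estimate people by merging MAC addresses that are seen together.

    sightings: dict mapping MAC address -> set of (sensor, time_slot).
    MACs whose sighting sets have Jaccard similarity >= threshold are
    merged (union-find) and counted as a single person.
    """
    parent = {mac: mac for mac in sightings}

    def find(mac):
        while parent[mac] != mac:
            parent[mac] = parent[parent[mac]]  # path halving
            mac = parent[mac]
        return mac

    for a, b in combinations(sightings, 2):
        if jaccard(sightings[a], sightings[b]) >= threshold:
            parent[find(a)] = find(b)
    return len({find(mac) for mac in sightings})


# Hypothetical data: "aa" and "bb" always appear together (phone and laptop).
sightings = {
    "aa": {("library", 1), ("library", 2), ("library", 3), ("gate", 4)},
    "bb": {("library", 1), ("library", 2), ("library", 3), ("gate", 4)},
    "cc": {("lab", 1), ("lab", 2)},
}
people = count_people(sightings)
print(people)
```

The pairwise comparison is quadratic in the number of addresses, so a real deployment would probably restrict comparisons to devices sharing a sensor and time window.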

Besides this, the Wi-Fi tracking system is currently used by the library staff to perform more sophisticated studies, such as figuring out the percentage of students from the different schools of the university who come to study at the library of the Telecommunications Engineering School during the weekend. Figure 15 shows the results of such an analysis, which is an example of how this kind of IoT system can help solve real-life problems and improve the operation of already running services.

To summarize, we have studied a one-year Wi-Fi tracking dataset obtained from a reduced set of low-cost sensors with limited capabilities, deployed on an actual university campus that receives around 4000 people every day. We have processed the data in order to identify traces of Wi-Fi-enabled mobile devices, which are identified as people moving in the campus buildings, and then we have extracted people's stays, movements, and common behaviors. The obtained results represent numerical evidence that illustrates how a low-cost Wi-Fi tracking system can be used in real-life conditions to improve or optimize the operation of the monitored premises. These results can allow appropriately dimensioning the WLAN infrastructure or the canteen personnel, or detecting anomalous situations in real time. Furthermore, the data from the people flow monitoring system is currently being used together with the data from the environmental monitoring system to try to reduce the environmental footprint of the school [72]. In addition, the Wi-Fi tracking system is actually used by the library staff in their day-to-day activity, which illustrates the value that this kind of IoT infrastructure can bring to real-life problems and services.

Figure 15: Users who only go to the library of the Telecommunication School on weekends, classified by their school (categories shown: CIVIL, INDUS, NAVAL, SPORT, ARQUI, FORES, EDIF, AERO, TELECOM).

Data Availability

The dataset with the Wi-Fi sensor records used to support the findings of this study is available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work has been partly funded by Universidad Politécnica de Madrid through the project RES2+U (Responsables, Sostenibles y Universitarios) (http://blogs.upm.es/res2masu). The work performed by José M. Navarro was funded by the Ministerio de Educación, Cultura y Deporte de España under Grant no. FPU 14/03209.

References

[1] “2018 Revision of World Urbanization Prospects - Multimedia Library - United Nations Department of Economic and Social Affairs,” https://www.un.org/development/desa/publications/2018-revision-of-world-urbanization-prospects.html.

[2] M. Pallot, “Engaging Users into Research and Innovation: The Living Lab Approach as a User Centred Open Innovation Ecosystem,” https://web.archive.org/web/20120509081658/http://www.cwe-projects.eu/pub/bscw.cgi/1760838?id=715404_1760838.

[3] E. Almirall and J. Wareham, “Living Labs: arbiters of mid- and ground-level innovation,” Technology Analysis & Strategic Management, vol. 23, no. 1, pp. 87–102, 2011.

[4] “TUDelft Green office,” https://www.tudelft.nl/sustainability.

[5] “UBC Campus as a living laboratory,” https://sustain.ubc.ca/our-commitment/campus-living-lab.

[6] “Harvard Sustainability,” https://green.harvard.edu.

[7] “UPM City of the Future initiative,” http://blogs.upm.es/cityofthefuture-upm/en/initiative.

[8] M. Alvarez-Campana, G. Lopez, E. Vazquez, V. A. Villagra, and J. Berrocal, “Smart CEI Moncloa: An IoT-based platform for people flow and environmental monitoring on a Smart University Campus,” Sensors, vol. 17, no. 12, 2017.

[9] M. B. Kjærgaard, H. Blunck, T. Godsk, T. Toftkjær, D. L. Christensen, and K. Grønbæk, “Indoor positioning using GPS revisited,” Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics): Preface, vol. 6030, pp. 38–56, 2010.

[10] A. Belmonte-Hernandez, G. Hernandez-Penaloza, F. Alvarez, and G. Conti, “Adaptive Fingerprinting in Multi-Sensor Fusion for Accurate Indoor Tracking,” IEEE Sensors Journal, vol. 17, no. 15, pp. 4983–4998, 2017.

[11] M. S. Kristoffersen, J. V. Dueholm, R. Gade, and T. B. Moeslund, “Pedestrian counting with occlusion handling using stereo thermal cameras,” Sensors, vol. 16, no. 1, 2016.

[12] L. Zheng, X. Ruan, Y. Chen, and M. Huang, “Shadow removal for pedestrian detection and tracking in indoor environments,” Multimedia Tools and Applications, vol. 76, no. 18, pp. 18321–18337, 2017.

[13] T. Li, H. Chang, M. Wang, B. Ni, R. Hong, and S. Yan, “Crowded scene analysis: a survey,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 25, no. 3, pp. 367–386, 2015.

[14] F. Adib and D. Katabi, “See through walls with WiFi,” in Proceedings of the Annual Conference of the ACM Special Interest Group on Data Communication on the Applications, Technologies, Architectures, and Protocols for Computer Communication, ACM SIGCOMM 2013, pp. 75–86, China, August 2013.

[15] Z Kabelac D Katabi and R C Miller ldquo3D Tracking viaBody Radio Reflectionsrdquo in Proceedings of the 11th USENIXConference on Networked Systems Design and Implementationpp 317ndash329 2014

[16] F Adib Z Kabelac and D Katabi ldquoMulti-person localizationvia RF body reflectionsrdquo in Proceedings of the 12th USENIXSymposium on Networked Systems Design and ImplementationNSDI 2015 pp 279ndash292 USA May 2015

[17] Z-A Deng G Wang D Qin Z Na Y Cui and J Chen ldquoCon-tinuous indoor positioning fusing WiFi smartphone sensorsand landmarksrdquo Sensors vol 16 no 9 2016

[18] R A Becker R Caceres K Hanson et al ldquoA tale of one cityUsing cellular network data for urban planningrdquo IEEE PervasiveComputing vol 10 no 4 pp 18ndash26 2011

[19] J E Mallah F Carrino O A Khaled and E MugellinildquoCrowd monitoring critical situations prevention using smart-phones and group detectionrdquo Lecture Notes in Computer Science

(including subseries Lecture Notes in Artificial Intelligence andLecture Notes in Bioinformatics) Preface vol 9189 pp 496ndash5052015

[20] J Kuang X Niu and X Chen ldquoRobust Pedestrian DeadReckoning Based on MEMS-IMU for Smartphonesrdquo Sensorsvol 18 no 5 p 1391 2018

[21] A Kurkcu and K Ozbay ldquoEstimating Pedestrian DensitiesWait Times and Flows with Wi-Fi and Bluetooth SensorsrdquoTransportation Research Record vol 2644 no 1 pp 72ndash82 2017

[22] A Kotanen M Hannikainen H Leppakoski and THamalainen ldquoPositioning with IEEE 80211b wireless LANrdquo inProceedings of the 14th IEEE 2003 International Symposium onPersonal Indoor and Mobile Radio Communications vol 3 pp2218ndash2222 Beijing China 2003

[23] W Ho A Smailagic D P Siewiorek and C Faloutsos ldquoAnadaptive two-phase approach to WiFi location sensingrdquo inProceedings of the 4th Annual IEEE International Conference onPervasive Computing and Communications Workshops PerComWorkshops 2006 pp 452ndash456 Italy March 2006

[24] F Evennou and F Marx ldquoAdvanced integration of WiFi andinertial navigation systems for indoor mobile positioningrdquoEURASIP Journal onApplied Signal Processing vol 2006 ArticleID 86706 11 pages 2006

[25] J Rekimoto T Miyaki and T Ishizawa ldquoLifeTag WiFi-basedcontinuous location logging for life pattern analysisrdquo LNCS vol4718 pp 35ndash49 2007

[26] J A Besada A M Bernardos P Tarrıo and J R CasarldquoAnalysis of tracking methods for wireless indoor localizationrdquoin Proceedings of the 2nd International Symposium on WirelessPervasive Computing (ISWPC rsquo07) pp 492ndash497 February 2007

[27] O Woodman and R Harle ldquoPedestrian localisation for indoorenvironmentsrdquo in Proceedings of the 10th International Con-ference on Ubiquitous Computing (UbiComp rsquo08) pp 114ndash123Seoul Republic of Korea September 2008

[28] F Aloul A Sagahyroon A Al-Shami I Al-Midfa and RMoutassem ldquoUsing mobiles for on campus location trackingrdquoinProceedings of the 7th International Conference onAdvances inMobile Computing and Multimedia MoMM2009 pp 231ndash235Malaysia December 2009

[29] S Woo S Jeong E Mok et al ldquoApplication of WiFi-basedindoor positioning system for labor tracking at constructionsites A case study in Guangzhou MTRrdquo Automation in Con-struction vol 20 no 1 pp 3ndash13 2011

[30] N LeDortz F Gain andP Zetterberg ldquoWiFi fingerprint indoorpositioning system using probability distribution comparisonrdquoinProceedings of the IEEE International Conference on AcousticsSpeech and Signal Processing (ICASSP rsquo12) pp 2301ndash2304March 2012

[31] Z Chen H Zou H Jiang Q Zhu Y C Soh and L Xie ldquoFusionof WiFi smartphone sensors and landmarks using the kalmanfilter for indoor localizationrdquo Sensors vol 15 no 1 pp 715ndash7322015

[32] Y Shu C Bo G Shen C Zhao L Li and F Zhao ldquoMagi-col indoor localization using pervasive magnetic field andopportunistic wifi sensingrdquo IEEE Journal on Selected Areas inCommunications vol 33 no 7 pp 1443ndash1457 2015

[33] ADanalet B Farooq andM Bierlaire ldquoABayesian approach todetect pedestrian destination-sequences fromWiFi signaturesrdquoTransportation Research Part C Emerging Technologies vol 44pp 146ndash170 2014

Wireless Communications and Mobile Computing 23

[34] O Czogalla and S Naumann ldquoPedestrian indoor navigation forcomplex public facilitiesrdquo in Proceedings of the 2016 Interna-tional Conference on Indoor Positioning and Indoor NavigationIPIN 2016 pp 1ndash8 Spain October 2016

[35] L Schauer P Marcus and C Linnhoff-Popien ldquoTowardsfeasibleWi-Fi based indoor tracking systems using probabilisticmethodsrdquo in Proceedings of the 2016 International Conference onIndoor Positioning and Indoor Navigation IPIN 2016 pp 1ndash8Spain October 2016

[36] F Meneses and A Moreira ldquoLarge scale movement analysisfrom WiFi based location datardquo in Proceedings of the 2012International Conference on Indoor Positioning and IndoorNavigation IPIN 2012 Australia November 2012

[37] B Bonne A Barzan P Quax and W Lamotte ldquoWiFiPiinvoluntary tracking of visitors atmass eventsrdquo in Proceedings ofthe IEEE 14th International Symposium on a World of WirelessMobile and Multimedia Networks (WoWMoM rsquo13) pp 1ndash6Madrid Spain June 2013

[38] M B Kjaergaard M Wirz D Roggen and G Troster ldquoMobilesensing of pedestrian flocks in indoor environments usingWiFisignalsrdquo in Proceedings of the 10th IEEE International Conferenceon Pervasive Computing and Communications (PerCom rsquo12) pp95ndash102 Lugano Switzerland March 2012

[39] S Sendra M Garcia C Turro and J Lloret ldquoPeople mobilitybehaviour study in a university campus using WLANsrdquo inProceedings of the 3rd International Conference on MobileUbiquitous Computing Systems Services and TechnologiesUBICOMM 2009 pp 124ndash129 Malta October 2009

[40] J Scheuner G Mazlami D Schoni et al ldquoProbr - A Genericand Passive WiFi Tracking Systemrdquo in Proceedings of the 41stIEEE Conference on Local Computer Networks LCN 2016 pp495ndash502 UAE November 2016

[41] Y Li S Williams B Moran and A Kealy ldquoQuantized RSSBased Wi-Fi Indoor Localization with Room Level Accuracyrdquoin Proceedings of the International Global Navigation SatelliteSystems 2018 2018

[42] A Alessandrini C Gioia F Sermi I Sofos D Tarchi andM Vespe ldquoWiFi positioning and Big Data to monitor flows ofpeople on a wide scalerdquo in Proceedings of the 25th EuropeanNavigation Conference ENC 2017 pp 322ndash328 SwitzerlandMay 2017

[43] LVuKNahrstedt S Retika and IGupta ldquoJoint bluetoothwifiscanning framework for characterizing and leveraging peoplemovement in university campusrdquo in Proceedings of the13th ACMInternational Conference on Modeling Analysis and Simulationof Wireless and Mobile Systems (MSWiM rsquo10) pp 257ndash265October 2010

[44] M Garcia S Sendra C Turro and J Lloret ldquoUserrsquos Macro andMicro-mobility Study using WLANs in a University CampusrdquoInternational Journal On Advances in Internet Technology vol4 no 1 pp 37ndash46 2011

[45] Y Xu I D G Groeneveld R Sulzer E Theocharous O TWillems and M S Tryfona ldquoDetermine activity based onthe classified identity of users by using Wi-Fi monitoringrdquoGeomatics Synthesis Group Project Report 2016

[46] A Danalet L Tinguely M D Lapparent and M BierlaireldquoLocation choicewith longitudinalWiFi datardquo Journal of ChoiceModelling vol 18 pp 1ndash17 2016

[47] M Zhou K Sui M Ma Y Zhao D Pei and T MoscibrodaldquoMobiCamp A campus-wide testbed for studying mobile phys-ical activitiesrdquo in Proceedings of the 3rd International Workshopon Physical Analytics WPA 2016 pp 1ndash6 Singapore 2016

[48] A Fernandez-Ares A M Mora M G Arenas et al ldquoStudyingreal traffic and mobility scenarios for a Smart City using a newmonitoring and tracking systemrdquo Future Generation ComputerSystems vol 76 pp 163ndash179 2017

[49] A E Redondi andM Cesana ldquoBuilding up knowledge throughpassiveWiFi probesrdquoComputer Communications vol 117 pp 1ndash12 2018

[50] L Huang K MatsuuraH Yamanet and K Sezaki ldquoEnhancingwireless location privacy using silent periodrdquo in Proceedingsof the 2005 IEEE Wireless Communications and NetworkingConference WCNC 2005 Broadband Wirelss for the Masses -Ready for Take-off pp 1187ndash1192 USA March 2005

[51] J Martin T Mayberry C Donahue et al ldquoA Study of MACAddress Randomization in Mobile Devices and When it FailsrdquoProceedings on Privacy Enhancing Technologies vol 2017 no 4pp 365ndash383 2017

[52] MVanhoef CMatteMCunche L S Cardoso and F PiessensldquoWhyMACaddress randomization is not enough an analysis ofWi-Fi network discoverymechanismsrdquo inProceedings of the 11thACM on Asia Conference on Computer and CommunicationsSecurity pp 413ndash424 ACM Xirsquoan China June 2016

[53] A J Ruiz-Ruiz H Blunck T S Prentow A Stisen and MB Kjaergaard ldquoAnalysis methods for extracting knowledgefrom large-scale WiFi monitoring to inform building facilityplanningrdquo in Proceedings of the 2014 12th IEEE InternationalConference on Pervasive Computing and Communications Per-Com 2014 pp 130ndash138 Hungary March 2014

[54] N Nunes M Ribeiro C Prandi and V Nisi ldquoBeanstalk - Acommunity based passive Wi-Fi tracking system for analysingtourism dynamicsrdquo in Proceedings of the 9th ACM SIGCHISymposium on Engineering Interactive Computing Systems EICS2017 pp 93ndash98 Portugal June 2017

[55] C ParentN Pelekis YTheodoridis et al ldquoSemantic trajectoriesmodeling and analysisrdquo ACMComputing Surveys vol 45 no 4pp 1ndash32 2013

[56] C Wei ldquoMining of User Behavioral Features Based on IndoorSemantic Trajectoriesrdquo Boletın Tecnico ISSN0376-723X vol55 2017

[57] K V Long D Quang and N Klara Lessons learned from blue-toothwifi scanning deployment in university campus UrbanaIllinois USA 2010

[58] V Radu and M K Marina ldquoHiMLoc indoor smartphonelocalization via activity aware pedestrian dead reckoning withselective crowdsourced WiFi fingerprintingrdquo in Proceedings ofthe International Conference on Indoor Positioning and IndoorNavigation (IPIN rsquo13) pp 1ndash10 IEEE Montbeliard-BelfortFrance October 2013

[59] FHongY Zhang Z ZhangMWei Y Feng andZGuo ldquoWaPIndoor localization and tracking using WiFi-Assisted Particlefilterrdquo in Proceedings of the 39th Annual IEEE Conference onLocal Computer Networks LCN 2014 pp 210ndash217 CanadaSeptember 2014

[60] Y Zeng P H Pathak and P Mohapatra ldquoAnalyzing shopperrsquosbehavior through WiFi signalsrdquo in Proceedings of the 2ndWorkshop on Physical AnalyticsWPA 2015 pp 13ndash18 Italy 2015

[61] P Sapiezynski A Stopczynski R Gatej and S LehmannldquoTracking human mobility using WiFi signalsrdquo PLoS ONE vol10 no 7 p e0130824 2015

[62] Z Tian Y Jin M Zhou Z Wu and Z Li ldquoWi-FiMARGIntegration for Indoor Pedestrian Localizationrdquo Sensors vol 16no 12 p 2100 2016

24 Wireless Communications and Mobile Computing

[63] B Zhou Q Li Q Mao and W Tu ldquoA Robust Crowdsourcing-Based Indoor Localization Systemrdquo Sensors vol 17 no 4 p 8642017

[64] ldquoRaspberry Pirdquo httpswwwraspberrypiorg[65] ldquoTP-LINK Wi-Fi USB dongle datasheetrdquo httpwwwtp-

linkcomusproductsdetailscat-5520 TL-WN722Nhtml[66] ldquoISOIEC 209222016 - Information technology ndash Message

Queuing Telemetry Transport (MQTT) v311rdquo httpswwwisoorgstandard69466html

[67] G Piateski-Shapiro and W Frawley Knowledge Discovery inDatabases MIT Press Cambridge MA USA 1991

[68] J Andion Jimenez J M Navarro Gonzalez M Alvarez-Campana Fernandez-Corredor and J C Duenas Lopez ldquoApassive non-intrusive cheap method to identify behavioursand habits in the Campusrdquo in Proceedings of the XIII Jornadas deIngenieria Telematica - JITEL2017 vol 40 no 47 pp 10ndash4995September 2017

[69] A Mylinikov ldquoGeo projectrdquo httpswwwmylnikovorg[70] K Mehmet and G Ananth ldquoPROXIMUS A framework for

analyzing very high dimensional discrete-attributed datasetsrdquoinProceedings of the 9thACMSIGKDD International Conferenceon Knowledge Discovery andDataMining KDD rsquo03 vol 147 no156 pp 147ndash156 USA August 2003

[71] ldquoUPM Library occupationrdquo httpceiboardditupmessmart-campusbiblio

[72] C A R Inarejos A Rodrıguez G Lopez and M Alvarez-Campana ldquoAnalisis de la huella de carbono de la ETSIT de laUPM y propuesta de mejora basada en datos de la plataformaIoT Smart CEI Moncloardquo in Proceedings of the I CongresoIberoamericano de Ciudades Inteligentes (ICSC-CITIES 2018)2018

Furthermore, we have identified dominant user behaviors, defined as the behaviors most frequently registered by the sensors throughout the year. The number of distinct behaviors found was huge, but applying the Proximus clustering algorithm reduced it to a manageable amount. The study of the resulting clusters revealed that the most frequent behaviors coincide with what can be expected in a college building: researchers working in laboratories and students attending classes or studying in the library.
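As an illustration of what "most repeated behaviors" means in practice, the following minimal Python sketch counts identical daily traces and returns the most frequent ones. It does not reproduce the Proximus algorithm; the trace representation (a sequence of zone labels per device and day) and the function name `dominant_behaviors` are assumptions made only for this example.

```python
from collections import Counter


def dominant_behaviors(daily_traces, top_n=3):
    """Return the top_n most repeated behaviors with their counts.

    daily_traces: iterable of sequences of zone labels, one sequence per
    device and day (hypothetical representation, not the paper's format).
    """
    # Tuples are hashable, so identical sequences fall into the same bucket.
    counts = Counter(tuple(trace) for trace in daily_traces)
    return counts.most_common(top_n)


if __name__ == "__main__":
    traces = [
        ["lab", "lab", "canteen", "lab"],
        ["class", "library"],
        ["lab", "lab", "canteen", "lab"],
        ["class", "library"],
        ["class", "library"],
        ["library"],
    ]
    for behavior, count in dominant_behaviors(traces, top_n=2):
        print(count, " -> ".join(behavior))
```

Exact matching of this kind produces a very large number of distinct behaviors on real data, which is why a clustering step that groups similar traces is needed before the results become manageable.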

Although some of the discovered behaviors and patterns may seem common knowledge, it is worth stressing that they do bring value, since they provide numerical evidence to support decision making (e.g., someone may think that Wi-Fi access in a given area does not work properly because the area is always overcrowded, but numerical evidence is needed to justify the investment of increasing the number of APs of the corporate WLAN in that area to improve the service). In addition, such well-known patterns, when obtained automatically by processing the available data, become baseline models that can be used to detect anomalies or atypical situations, as is common practice in unsupervised machine learning.
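The idea of using a learned pattern as a baseline for anomaly detection can be sketched as follows. This is a simple z-score check under stated assumptions, not the method used in this work: the time slot granularity, the function names `build_baseline` and `is_anomalous`, and the threshold of 3 standard deviations are all hypothetical choices.

```python
from collections import defaultdict
from statistics import mean, pstdev


def build_baseline(history):
    """Build a per-slot baseline from historical occupancy.

    history: iterable of (slot, device_count), where slot is any hashable
    time bucket such as hour of the week (assumed granularity).
    Returns a dict slot -> (mean, population standard deviation).
    """
    by_slot = defaultdict(list)
    for slot, count in history:
        by_slot[slot].append(count)
    return {slot: (mean(v), pstdev(v)) for slot, v in by_slot.items()}


def is_anomalous(baseline, slot, count, z_threshold=3.0):
    """Flag a count that deviates strongly from the baseline for its slot."""
    if slot not in baseline:
        return False  # no history for this slot, so nothing to compare with
    mu, sigma = baseline[slot]
    if sigma == 0:
        return count != mu
    return abs(count - mu) / sigma > z_threshold


if __name__ == "__main__":
    history = [(10, 100), (10, 110), (10, 90), (10, 105), (10, 95)]
    baseline = build_baseline(history)
    print(is_anomalous(baseline, 10, 102))  # close to the usual occupancy
    print(is_anomalous(baseline, 10, 160))  # far above the usual occupancy
```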

Lastly, we have validated the accuracy of Wi-Fi tracking for occupancy estimation by comparing it with the manual counts made by the library staff (considered as ground truth) and with a video camera system installed at the main entrance of the library. As a main finding, Wi-Fi tracking has proved to be more accurate than the video camera system, in addition to being much cheaper. As a matter of fact, the library staff currently prefers the occupancy estimation based on Wi-Fi tracking over the one based on the video camera system. Nevertheless, the accuracy of the Wi-Fi tracking system can be further improved by correlating MAC addresses detected in the same places over sufficiently long periods of time and counting them only once, thus mitigating the issue that a single person can carry several devices connected to Wi-Fi networks.
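The proposed improvement, counting co-located MAC addresses only once, could be approached along the following lines. This is a minimal sketch, not an implemented part of the system: it assumes each device is summarized as a set of (sensor, time slot) sightings, merges devices whose sighting sets have a high Jaccard similarity, and counts the resulting groups. The function names and the thresholds `min_similarity` and `min_slots` are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def estimate_people(sightings, min_similarity=0.8, min_slots=4):
    """Estimate the number of people from per-device sightings.

    sightings: dict mac -> set of (sensor_id, time_slot).
    Devices seen together in almost all slots are assumed to belong to
    the same person and are merged with a union-find structure.
    """
    parent = {mac: mac for mac in sightings}

    def find(mac):
        while parent[mac] != mac:
            parent[mac] = parent[parent[mac]]  # path halving
            mac = parent[mac]
        return mac

    for a, b in combinations(sightings, 2):
        sa, sb = sightings[a], sightings[b]
        # Too few observations give unreliable similarity, so skip them.
        if min(len(sa), len(sb)) < min_slots:
            continue
        if jaccard(sa, sb) >= min_similarity:
            parent[find(a)] = find(b)

    return len({find(mac) for mac in sightings})


if __name__ == "__main__":
    phone = {("library", t) for t in range(10)}
    laptop = {("library", t) for t in range(1, 10)}
    other = {("lab", t) for t in range(10)}
    print(estimate_people({"aa": phone, "bb": laptop, "cc": other}))
```

The pairwise comparison is quadratic in the number of devices, so on a full dataset it would make sense to compare only devices that share at least one sensor and time slot.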

Besides this, the Wi-Fi tracking system is currently used by the library staff to perform more sophisticated studies, such as figuring out the percentage of students from the different schools of the university who come to study at the library of the Telecommunications Engineering School during the weekend. Figure 15 shows the results of such an analysis, which illustrates how this kind of IoT system can help solve real-life problems and improve the operation of already running services.
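A study of this kind can be sketched as a small aggregation. The sketch below assumes that library visits are available as (device, date) pairs and that a mapping from device to school label already exists; how that mapping is obtained is not described here, so `school_of` and the function name `weekend_only_share` are purely hypothetical.

```python
from collections import Counter
from datetime import date


def weekend_only_share(visits, school_of):
    """Percentage per school of devices seen only on weekends.

    visits: iterable of (device_id, datetime.date) library sightings.
    school_of: dict device_id -> school label (assumed to be known).
    """
    weekday_seen, weekend_seen = set(), set()
    for device, day in visits:
        # date.weekday() returns 5 for Saturday and 6 for Sunday.
        if day.weekday() >= 5:
            weekend_seen.add(device)
        else:
            weekday_seen.add(device)

    weekend_only = weekend_seen - weekday_seen
    counts = Counter(school_of.get(d, "UNKNOWN") for d in weekend_only)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {school: 100.0 * n / total for school, n in counts.items()}


if __name__ == "__main__":
    sat, sun, mon = date(2018, 3, 3), date(2018, 3, 4), date(2018, 3, 5)
    visits = [("a", sat), ("b", sun), ("c", sat), ("c", mon), ("d", sun)]
    schools = {"a": "CIVIL", "b": "CIVIL", "c": "TELECOM", "d": "AERO"}
    print(weekend_only_share(visits, schools))
```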

Figure 15: Users who only go to the library of the Telecommunication School on weekends, classified by their school (CIVIL, INDUS, NAVAL, SPORT, ARQUI, FORES, EDIF, AERO, TELECOM).

To summarize, we have studied a one-year Wi-Fi tracking dataset obtained from a reduced set of low-cost sensors with limited capabilities, deployed on an actual university campus that receives around 4000 people every day. We have processed the data to identify traces of Wi-Fi-enabled mobile devices, which are taken to represent people moving through the campus buildings, and then we have extracted people’s stays, movements, and common behaviors. The obtained results provide numerical evidence of how a low-cost Wi-Fi tracking system can be used in real-life conditions to improve or optimize the operation of the monitored premises. These results make it possible to dimension the WLAN infrastructure or the canteen personnel appropriately, or to detect anomalous situations in real time. Furthermore, the data from the people flow monitoring system is currently being used together with the data from the environmental monitoring system to try to reduce the environmental footprint of the school [72]. In addition, the Wi-Fi tracking system is actually used by the library staff in their day-to-day activity, which illustrates the value that this kind of IoT infrastructure can bring to real-life problems and services.
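To make the notion of stays and movements mentioned in this summary more concrete, the following sketch groups the detections of a single device into stays. It is only an illustration under assumed conventions: detections are (timestamp in seconds, sensor identifier) pairs, and a gap longer than `max_gap_s` (a hypothetical parameter) closes the current stay. A movement can then be read as two consecutive stays at different sensors.

```python
def extract_stays(detections, max_gap_s=300):
    """Group the detections of one device into stays.

    detections: list of (timestamp_s, sensor_id) pairs, in any order.
    Returns a list of (sensor_id, start_s, end_s) tuples in time order.
    """
    stays = []
    for ts, sensor in sorted(detections):
        same_sensor = bool(stays) and stays[-1][0] == sensor
        if same_sensor and ts - stays[-1][2] <= max_gap_s:
            # Extend the current stay up to this detection.
            stays[-1] = (sensor, stays[-1][1], ts)
        else:
            # New sensor or gap too long: open a new stay.
            stays.append((sensor, ts, ts))
    return stays


if __name__ == "__main__":
    detections = [(0, "library"), (120, "library"), (200, "library"),
                  (1000, "library"), (1100, "lab")]
    for sensor, start, end in extract_stays(detections):
        print(sensor, start, end)
```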

Data Availability

The dataset with the Wi-Fi sensor records used to support the findings of this study is available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work has been partly funded by Universidad Politécnica de Madrid through the project RES2+U (Responsables, Sostenibles y Universitarios) (http://blogs.upm.es/res2masu). The work performed by José M. Navarro was funded by the Ministerio de Educación, Cultura y Deporte de España under Grant no. FPU 14/03209.

References

[1] “2018 Revision of World Urbanization Prospects - Multimedia Library - United Nations Department of Economic and Social Affairs,” https://www.un.org/development/desa/publications/2018-revision-of-world-urbanization-prospects.html.

[2] M. Pallot, “Engaging Users into Research and Innovation: The Living Lab Approach as a User Centred Open Innovation Ecosystem,” https://web.archive.org/web/20120509081658/http://www.cwe-projects.eu/pub/bscw.cgi/1760838?id=715404_1760838.

[3] E. Almirall and J. Wareham, “Living Labs: arbiters of mid- and ground-level innovation,” Technology Analysis & Strategic Management, vol. 23, no. 1, pp. 87–102, 2011.

[4] “TU Delft Green office,” https://www.tudelft.nl/sustainability.

[5] “UBC Campus as a living laboratory,” https://sustain.ubc.ca/our-commitment/campus-living-lab.

[6] “Harvard Sustainability,” https://green.harvard.edu.

[7] “UPM City of the Future initiative,” http://blogs.upm.es/cityofthefuture-upm/en/initiative.

[8] M. Alvarez-Campana, G. Lopez, E. Vazquez, V. A. Villagra, and J. Berrocal, “Smart CEI Moncloa: An IoT-based platform for people flow and environmental monitoring on a Smart University Campus,” Sensors, vol. 17, no. 12, 2017.

[9] M. B. Kjærgaard, H. Blunck, T. Godsk, T. Toftkjær, D. L. Christensen, and K. Grønbæk, “Indoor positioning using GPS revisited,” Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Preface, vol. 6030, pp. 38–56, 2010.

[10] A. Belmonte-Hernandez, G. Hernandez-Penaloza, F. Alvarez, and G. Conti, “Adaptive Fingerprinting in Multi-Sensor Fusion for Accurate Indoor Tracking,” IEEE Sensors Journal, vol. 17, no. 15, pp. 4983–4998, 2017.

[11] M. S. Kristoffersen, J. V. Dueholm, R. Gade, and T. B. Moeslund, “Pedestrian counting with occlusion handling using stereo thermal cameras,” Sensors, vol. 16, no. 1, 2016.

[12] L. Zheng, X. Ruan, Y. Chen, and M. Huang, “Shadow removal for pedestrian detection and tracking in indoor environments,” Multimedia Tools and Applications, vol. 76, no. 18, pp. 18321–18337, 2017.

[13] T. Li, H. Chang, M. Wang, B. Ni, R. Hong, and S. Yan, “Crowded scene analysis: a survey,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 25, no. 3, pp. 367–386, 2015.

[14] F. Adib and D. Katabi, “See through walls with WiFi,” in Proceedings of the Annual Conference of the ACM Special Interest Group on Data Communication on the Applications, Technologies, Architectures, and Protocols for Computer Communication, ACM SIGCOMM 2013, pp. 75–86, China, August 2013.

[15] Z. Kabelac, D. Katabi, and R. C. Miller, “3D Tracking via Body Radio Reflections,” in Proceedings of the 11th USENIX Conference on Networked Systems Design and Implementation, pp. 317–329, 2014.

[16] F. Adib, Z. Kabelac, and D. Katabi, “Multi-person localization via RF body reflections,” in Proceedings of the 12th USENIX Symposium on Networked Systems Design and Implementation, NSDI 2015, pp. 279–292, USA, May 2015.

[17] Z.-A. Deng, G. Wang, D. Qin, Z. Na, Y. Cui, and J. Chen, “Continuous indoor positioning fusing WiFi, smartphone sensors and landmarks,” Sensors, vol. 16, no. 9, 2016.

[18] R. A. Becker, R. Caceres, K. Hanson et al., “A tale of one city: Using cellular network data for urban planning,” IEEE Pervasive Computing, vol. 10, no. 4, pp. 18–26, 2011.

[19] J. E. Mallah, F. Carrino, O. A. Khaled, and E. Mugellini, “Crowd monitoring: critical situations prevention using smartphones and group detection,” Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Preface, vol. 9189, pp. 496–505, 2015.

[20] J. Kuang, X. Niu, and X. Chen, “Robust Pedestrian Dead Reckoning Based on MEMS-IMU for Smartphones,” Sensors, vol. 18, no. 5, p. 1391, 2018.

[21] A. Kurkcu and K. Ozbay, “Estimating Pedestrian Densities, Wait Times, and Flows with Wi-Fi and Bluetooth Sensors,” Transportation Research Record, vol. 2644, no. 1, pp. 72–82, 2017.

[22] A. Kotanen, M. Hannikainen, H. Leppakoski, and T. Hamalainen, “Positioning with IEEE 802.11b wireless LAN,” in Proceedings of the 14th IEEE 2003 International Symposium on Personal, Indoor and Mobile Radio Communications, vol. 3, pp. 2218–2222, Beijing, China, 2003.

[23] W. Ho, A. Smailagic, D. P. Siewiorek, and C. Faloutsos, “An adaptive two-phase approach to WiFi location sensing,” in Proceedings of the 4th Annual IEEE International Conference on Pervasive Computing and Communications Workshops, PerCom Workshops 2006, pp. 452–456, Italy, March 2006.

[24] F. Evennou and F. Marx, “Advanced integration of WiFi and inertial navigation systems for indoor mobile positioning,” EURASIP Journal on Applied Signal Processing, vol. 2006, Article ID 86706, 11 pages, 2006.

[25] J. Rekimoto, T. Miyaki, and T. Ishizawa, “LifeTag: WiFi-based continuous location logging for life pattern analysis,” LNCS, vol. 4718, pp. 35–49, 2007.

[26] J. A. Besada, A. M. Bernardos, P. Tarrío, and J. R. Casar, “Analysis of tracking methods for wireless indoor localization,” in Proceedings of the 2nd International Symposium on Wireless Pervasive Computing (ISWPC ’07), pp. 492–497, February 2007.

[27] O. Woodman and R. Harle, “Pedestrian localisation for indoor environments,” in Proceedings of the 10th International Conference on Ubiquitous Computing (UbiComp ’08), pp. 114–123, Seoul, Republic of Korea, September 2008.

[28] F. Aloul, A. Sagahyroon, A. Al-Shami, I. Al-Midfa, and R. Moutassem, “Using mobiles for on campus location tracking,” in Proceedings of the 7th International Conference on Advances in Mobile Computing and Multimedia, MoMM2009, pp. 231–235, Malaysia, December 2009.

[29] S. Woo, S. Jeong, E. Mok et al., “Application of WiFi-based indoor positioning system for labor tracking at construction sites: A case study in Guangzhou MTR,” Automation in Construction, vol. 20, no. 1, pp. 3–13, 2011.

[30] N. Le Dortz, F. Gain, and P. Zetterberg, “WiFi fingerprint indoor positioning system using probability distribution comparison,” in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP ’12), pp. 2301–2304, March 2012.

[31] Z. Chen, H. Zou, H. Jiang, Q. Zhu, Y. C. Soh, and L. Xie, “Fusion of WiFi, smartphone sensors and landmarks using the Kalman filter for indoor localization,” Sensors, vol. 15, no. 1, pp. 715–732, 2015.

[32] Y. Shu, C. Bo, G. Shen, C. Zhao, L. Li, and F. Zhao, “Magicol: indoor localization using pervasive magnetic field and opportunistic WiFi sensing,” IEEE Journal on Selected Areas in Communications, vol. 33, no. 7, pp. 1443–1457, 2015.

[33] A. Danalet, B. Farooq, and M. Bierlaire, “A Bayesian approach to detect pedestrian destination-sequences from WiFi signatures,” Transportation Research Part C: Emerging Technologies, vol. 44, pp. 146–170, 2014.

[34] O. Czogalla and S. Naumann, “Pedestrian indoor navigation for complex public facilities,” in Proceedings of the 2016 International Conference on Indoor Positioning and Indoor Navigation, IPIN 2016, pp. 1–8, Spain, October 2016.

[35] L. Schauer, P. Marcus, and C. Linnhoff-Popien, “Towards feasible Wi-Fi based indoor tracking systems using probabilistic methods,” in Proceedings of the 2016 International Conference on Indoor Positioning and Indoor Navigation, IPIN 2016, pp. 1–8, Spain, October 2016.

[36] F. Meneses and A. Moreira, “Large scale movement analysis from WiFi based location data,” in Proceedings of the 2012 International Conference on Indoor Positioning and Indoor Navigation, IPIN 2012, Australia, November 2012.

[37] B. Bonne, A. Barzan, P. Quax, and W. Lamotte, “WiFiPi: involuntary tracking of visitors at mass events,” in Proceedings of the IEEE 14th International Symposium on a World of Wireless, Mobile and Multimedia Networks (WoWMoM ’13), pp. 1–6, Madrid, Spain, June 2013.

[38] M. B. Kjaergaard, M. Wirz, D. Roggen, and G. Troster, “Mobile sensing of pedestrian flocks in indoor environments using WiFi signals,” in Proceedings of the 10th IEEE International Conference on Pervasive Computing and Communications (PerCom ’12), pp. 95–102, Lugano, Switzerland, March 2012.

[39] S. Sendra, M. Garcia, C. Turro, and J. Lloret, “People mobility behaviour study in a university campus using WLANs,” in Proceedings of the 3rd International Conference on Mobile Ubiquitous Computing, Systems, Services, and Technologies, UBICOMM 2009, pp. 124–129, Malta, October 2009.

[40] J. Scheuner, G. Mazlami, D. Schoni et al., “Probr - A Generic and Passive WiFi Tracking System,” in Proceedings of the 41st IEEE Conference on Local Computer Networks, LCN 2016, pp. 495–502, UAE, November 2016.

[41] Y. Li, S. Williams, B. Moran, and A. Kealy, “Quantized RSS Based Wi-Fi Indoor Localization with Room Level Accuracy,” in Proceedings of the International Global Navigation Satellite Systems 2018, 2018.

[42] A. Alessandrini, C. Gioia, F. Sermi, I. Sofos, D. Tarchi, and M. Vespe, “WiFi positioning and Big Data to monitor flows of people on a wide scale,” in Proceedings of the 25th European Navigation Conference, ENC 2017, pp. 322–328, Switzerland, May 2017.

[43] L. Vu, K. Nahrstedt, S. Retika, and I. Gupta, “Joint bluetooth/wifi scanning framework for characterizing and leveraging people movement in university campus,” in Proceedings of the 13th ACM International Conference on Modeling, Analysis, and Simulation of Wireless and Mobile Systems (MSWiM ’10), pp. 257–265, October 2010.

[44] M. Garcia, S. Sendra, C. Turro, and J. Lloret, “User’s Macro and Micro-mobility Study using WLANs in a University Campus,” International Journal On Advances in Internet Technology, vol. 4, no. 1, pp. 37–46, 2011.

[45] Y. Xu, I. D. G. Groeneveld, R. Sulzer, E. Theocharous, O. T. Willems, and M. S. Tryfona, “Determine activity based on the classified identity of users by using Wi-Fi monitoring,” Geomatics Synthesis Group Project Report, 2016.

[46] A. Danalet, L. Tinguely, M. D. Lapparent, and M. Bierlaire, “Location choice with longitudinal WiFi data,” Journal of Choice Modelling, vol. 18, pp. 1–17, 2016.

[47] M. Zhou, K. Sui, M. Ma, Y. Zhao, D. Pei, and T. Moscibroda, “MobiCamp: A campus-wide testbed for studying mobile physical activities,” in Proceedings of the 3rd International Workshop on Physical Analytics, WPA 2016, pp. 1–6, Singapore, 2016.

[48] A. Fernandez-Ares, A. M. Mora, M. G. Arenas et al., “Studying real traffic and mobility scenarios for a Smart City using a new monitoring and tracking system,” Future Generation Computer Systems, vol. 76, pp. 163–179, 2017.

[49] A. E. Redondi and M. Cesana, “Building up knowledge through passive WiFi probes,” Computer Communications, vol. 117, pp. 1–12, 2018.

[50] L. Huang, K. Matsuura, H. Yamanet, and K. Sezaki, “Enhancing wireless location privacy using silent period,” in Proceedings of the 2005 IEEE Wireless Communications and Networking Conference, WCNC 2005: Broadband Wireless for the Masses - Ready for Take-off, pp. 1187–1192, USA, March 2005.

[51] J. Martin, T. Mayberry, C. Donahue et al., “A Study of MAC Address Randomization in Mobile Devices and When it Fails,” Proceedings on Privacy Enhancing Technologies, vol. 2017, no. 4, pp. 365–383, 2017.

[52] M. Vanhoef, C. Matte, M. Cunche, L. S. Cardoso, and F. Piessens, “Why MAC address randomization is not enough: an analysis of Wi-Fi network discovery mechanisms,” in Proceedings of the 11th ACM on Asia Conference on Computer and Communications Security, pp. 413–424, ACM, Xi’an, China, June 2016.

[53] A. J. Ruiz-Ruiz, H. Blunck, T. S. Prentow, A. Stisen, and M. B. Kjaergaard, “Analysis methods for extracting knowledge from large-scale WiFi monitoring to inform building facility planning,” in Proceedings of the 2014 12th IEEE International Conference on Pervasive Computing and Communications, PerCom 2014, pp. 130–138, Hungary, March 2014.

[54] N. Nunes, M. Ribeiro, C. Prandi, and V. Nisi, “Beanstalk - A community based passive Wi-Fi tracking system for analysing tourism dynamics,” in Proceedings of the 9th ACM SIGCHI Symposium on Engineering Interactive Computing Systems, EICS 2017, pp. 93–98, Portugal, June 2017.

[55] C. Parent, N. Pelekis, Y. Theodoridis et al., “Semantic trajectories modeling and analysis,” ACM Computing Surveys, vol. 45, no. 4, pp. 1–32, 2013.

[56] C. Wei, “Mining of User Behavioral Features Based on Indoor Semantic Trajectories,” Boletín Técnico, ISSN: 0376-723X, vol. 55, 2017.

[57] K. V. Long, D. Quang, and N. Klara, Lessons learned from bluetooth/wifi scanning deployment in university campus, Urbana, Illinois, USA, 2010.

[58] V. Radu and M. K. Marina, “HiMLoc: indoor smartphone localization via activity aware pedestrian dead reckoning with selective crowdsourced WiFi fingerprinting,” in Proceedings of the International Conference on Indoor Positioning and Indoor Navigation (IPIN ’13), pp. 1–10, IEEE, Montbeliard-Belfort, France, October 2013.

[59] F. Hong, Y. Zhang, Z. Zhang, M. Wei, Y. Feng, and Z. Guo, “WaP: Indoor localization and tracking using WiFi-Assisted Particle filter,” in Proceedings of the 39th Annual IEEE Conference on Local Computer Networks, LCN 2014, pp. 210–217, Canada, September 2014.

[60] Y. Zeng, P. H. Pathak, and P. Mohapatra, “Analyzing shopper’s behavior through WiFi signals,” in Proceedings of the 2nd Workshop on Physical Analytics, WPA 2015, pp. 13–18, Italy, 2015.

[61] P. Sapiezynski, A. Stopczynski, R. Gatej, and S. Lehmann, “Tracking human mobility using WiFi signals,” PLoS ONE, vol. 10, no. 7, p. e0130824, 2015.

[62] Z. Tian, Y. Jin, M. Zhou, Z. Wu, and Z. Li, “Wi-Fi/MARG Integration for Indoor Pedestrian Localization,” Sensors, vol. 16, no. 12, p. 2100, 2016.

[63] B. Zhou, Q. Li, Q. Mao, and W. Tu, “A Robust Crowdsourcing-Based Indoor Localization System,” Sensors, vol. 17, no. 4, p. 864, 2017.

[64] “Raspberry Pi,” https://www.raspberrypi.org.

[65] “TP-LINK Wi-Fi USB dongle datasheet,” http://www.tp-link.com/us/products/details/cat-5520_TL-WN722N.html.

[66] “ISO/IEC 20922:2016 - Information technology – Message Queuing Telemetry Transport (MQTT) v3.1.1,” https://www.iso.org/standard/69466.html.

[67] G. Piateski-Shapiro and W. Frawley, Knowledge Discovery in Databases, MIT Press, Cambridge, MA, USA, 1991.

[68] J. Andion Jimenez, J. M. Navarro Gonzalez, M. Alvarez-Campana Fernandez-Corredor, and J. C. Duenas Lopez, “A passive, non-intrusive, cheap method to identify behaviours and habits in the Campus,” in Proceedings of the XIII Jornadas de Ingenieria Telematica - JITEL2017, vol. 40, no. 47, pp. 10–4995, September 2017.

[69] A. Mylinikov, “Geo project,” https://www.mylnikov.org.

[70] K. Mehmet and G. Ananth, “PROXIMUS: A framework for analyzing very high dimensional discrete-attributed datasets,” in Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’03, vol. 147, no. 156, pp. 147–156, USA, August 2003.

[71] “UPM Library occupation,” http://ceiboard.dit.upm.es/smart-campus/biblio.

[72] C. A. R. Inarejos, A. Rodríguez, G. Lopez, and M. Alvarez-Campana, “Analisis de la huella de carbono de la ETSIT de la UPM y propuesta de mejora basada en datos de la plataforma IoT Smart CEI Moncloa,” in Proceedings of the I Congreso Iberoamericano de Ciudades Inteligentes (ICSC-CITIES 2018), 2018.

Page 22: Smart Behavioral Analytics over a Low-Cost IoT Wi-Fi ...downloads.hindawi.com/journals/wcmc/2018/3136471.pdfSmart Behavioral Analytics over a Low-Cost IoT Wi-Fi Tracking Real Deployment

22 Wireless Communications and Mobile Computing

Affairsrdquo httpswwwunorgdevelopmentdesapublications2018-revision-of-world-urbanization-prospectshtml

[2] M Pallot ldquoEngaging Users into Research and Innova-tionThe Living Lab Approach as a User CentredOpen Innova-tion Ecosystemrdquo httpswebarchiveorgweb20120509081658httpwwwcwe-projectseupubbscwcgi1760838id=7154041760838

[3] E Almirall and J Wareham ldquoLiving Labs arbiters of mid-and ground-level innovationrdquo Technology Analysis amp StrategicManagement vol 23 no 1 pp 87ndash102 2011

[4] ldquoTUDelft Green officerdquo httpswwwtudelftnlsustainability[5] ldquoUBC Campus as a living laboratoryrdquo httpssustainubcca

our-commitmentcampus-living-lab[6] ldquoHarvard Sustainabilityrdquo httpsgreenharvardedu[7] ldquoUPM City of the Future initiativerdquo httpblogsupmes

cityofthefuture-upmeninitiative[8] M Alvarez-Campana G Lopez E Vazquez V A Villagra

and J Berrocal ldquoSmart CEI moncloa An iot-based platformfor people flow and environmental monitoring on a SmartUniversity Campusrdquo Sensors vol 17 no 12 2017

[9] M B Kjaeligrgaard H Blunck T Godsk T Toftkjaeligr D LChristensen and K Groslashnbaeligk ldquoIndoor positioning using GPSrevisitedrdquoLectureNotes in Computer Science (including subseriesLecture Notes in Artificial Intelligence and Lecture Notes inBioinformatics) Preface vol 6030 pp 38ndash56 2010

[10] A Belmonte-Hernandez G Hernandez-Penaloza F Alvarezand G Conti ldquoAdaptive Fingerprinting in Multi-Sensor Fusionfor Accurate Indoor Trackingrdquo IEEE Sensors Journal vol 17 no15 pp 4983ndash4998 2017

[11] M S Kristoffersen J V Dueholm R Gade and T BMoeslundldquoPedestrian counting with occlusion handling using stereothermal camerasrdquo Sensors vol 16 no 1 2016

[12] L Zheng X Ruan Y Chen and M Huang ldquoShadow removalfor pedestrian detection and tracking in indoor environmentsrdquoMultimedia Tools and Applications vol 76 no 18 pp 18321ndash18337 2017

[13] T Li H ChangMWang B Ni R Hong and S Yan ldquoCrowdedscene analysis a surveyrdquo IEEE Transactions on Circuits andSystems for Video Technology vol 25 no 3 pp 367ndash386 2015

[14] F Adib and D Katabi ldquoSee through walls with WiFirdquo inProceedings of the Annual Conference of the ACM Special InterestGroup on Data Communication on the Applications Technolo-gies Architectures and Protocols for Computer CommunicationACM SIGCOMM 2013 pp 75ndash86 China August 2013

[15] Z Kabelac D Katabi and R C Miller ldquo3D Tracking viaBody Radio Reflectionsrdquo in Proceedings of the 11th USENIXConference on Networked Systems Design and Implementationpp 317ndash329 2014

[16] F Adib Z Kabelac and D Katabi ldquoMulti-person localizationvia RF body reflectionsrdquo in Proceedings of the 12th USENIXSymposium on Networked Systems Design and ImplementationNSDI 2015 pp 279ndash292 USA May 2015

[17] Z-A Deng G Wang D Qin Z Na Y Cui and J Chen ldquoCon-tinuous indoor positioning fusing WiFi smartphone sensorsand landmarksrdquo Sensors vol 16 no 9 2016

[18] R A Becker R Caceres K Hanson et al ldquoA tale of one cityUsing cellular network data for urban planningrdquo IEEE PervasiveComputing vol 10 no 4 pp 18ndash26 2011

[19] J E Mallah F Carrino O A Khaled and E MugellinildquoCrowd monitoring critical situations prevention using smart-phones and group detectionrdquo Lecture Notes in Computer Science

(including subseries Lecture Notes in Artificial Intelligence andLecture Notes in Bioinformatics) Preface vol 9189 pp 496ndash5052015

[20] J Kuang X Niu and X Chen ldquoRobust Pedestrian DeadReckoning Based on MEMS-IMU for Smartphonesrdquo Sensorsvol 18 no 5 p 1391 2018

[21] A. Kurkcu and K. Ozbay, “Estimating Pedestrian Densities, Wait Times, and Flows with Wi-Fi and Bluetooth Sensors,” Transportation Research Record, vol. 2644, no. 1, pp. 72–82, 2017.

[22] A. Kotanen, M. Hannikainen, H. Leppakoski, and T. Hamalainen, “Positioning with IEEE 802.11b wireless LAN,” in Proceedings of the 14th IEEE 2003 International Symposium on Personal, Indoor and Mobile Radio Communications, vol. 3, pp. 2218–2222, Beijing, China, 2003.

[23] W. Ho, A. Smailagic, D. P. Siewiorek, and C. Faloutsos, “An adaptive two-phase approach to WiFi location sensing,” in Proceedings of the 4th Annual IEEE International Conference on Pervasive Computing and Communications Workshops, PerCom Workshops 2006, pp. 452–456, Italy, March 2006.

[24] F. Evennou and F. Marx, “Advanced integration of WiFi and inertial navigation systems for indoor mobile positioning,” EURASIP Journal on Applied Signal Processing, vol. 2006, Article ID 86706, 11 pages, 2006.

[25] J. Rekimoto, T. Miyaki, and T. Ishizawa, “LifeTag: WiFi-based continuous location logging for life pattern analysis,” LNCS, vol. 4718, pp. 35–49, 2007.

[26] J. A. Besada, A. M. Bernardos, P. Tarrío, and J. R. Casar, “Analysis of tracking methods for wireless indoor localization,” in Proceedings of the 2nd International Symposium on Wireless Pervasive Computing (ISWPC ’07), pp. 492–497, February 2007.

[27] O. Woodman and R. Harle, “Pedestrian localisation for indoor environments,” in Proceedings of the 10th International Conference on Ubiquitous Computing (UbiComp ’08), pp. 114–123, Seoul, Republic of Korea, September 2008.

[28] F. Aloul, A. Sagahyroon, A. Al-Shami, I. Al-Midfa, and R. Moutassem, “Using mobiles for on campus location tracking,” in Proceedings of the 7th International Conference on Advances in Mobile Computing and Multimedia, MoMM2009, pp. 231–235, Malaysia, December 2009.

[29] S. Woo, S. Jeong, E. Mok et al., “Application of WiFi-based indoor positioning system for labor tracking at construction sites: A case study in Guangzhou MTR,” Automation in Construction, vol. 20, no. 1, pp. 3–13, 2011.

[30] N. Le Dortz, F. Gain, and P. Zetterberg, “WiFi fingerprint indoor positioning system using probability distribution comparison,” in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP ’12), pp. 2301–2304, March 2012.

[31] Z. Chen, H. Zou, H. Jiang, Q. Zhu, Y. C. Soh, and L. Xie, “Fusion of WiFi, smartphone sensors and landmarks using the Kalman filter for indoor localization,” Sensors, vol. 15, no. 1, pp. 715–732, 2015.

[32] Y. Shu, C. Bo, G. Shen, C. Zhao, L. Li, and F. Zhao, “Magicol: indoor localization using pervasive magnetic field and opportunistic WiFi sensing,” IEEE Journal on Selected Areas in Communications, vol. 33, no. 7, pp. 1443–1457, 2015.

[33] A. Danalet, B. Farooq, and M. Bierlaire, “A Bayesian approach to detect pedestrian destination-sequences from WiFi signatures,” Transportation Research Part C: Emerging Technologies, vol. 44, pp. 146–170, 2014.

[34] O. Czogalla and S. Naumann, “Pedestrian indoor navigation for complex public facilities,” in Proceedings of the 2016 International Conference on Indoor Positioning and Indoor Navigation, IPIN 2016, pp. 1–8, Spain, October 2016.

[35] L. Schauer, P. Marcus, and C. Linnhoff-Popien, “Towards feasible Wi-Fi based indoor tracking systems using probabilistic methods,” in Proceedings of the 2016 International Conference on Indoor Positioning and Indoor Navigation, IPIN 2016, pp. 1–8, Spain, October 2016.

[36] F. Meneses and A. Moreira, “Large scale movement analysis from WiFi based location data,” in Proceedings of the 2012 International Conference on Indoor Positioning and Indoor Navigation, IPIN 2012, Australia, November 2012.

[37] B. Bonne, A. Barzan, P. Quax, and W. Lamotte, “WiFiPi: involuntary tracking of visitors at mass events,” in Proceedings of the IEEE 14th International Symposium on a World of Wireless, Mobile and Multimedia Networks (WoWMoM ’13), pp. 1–6, Madrid, Spain, June 2013.

[38] M. B. Kjaergaard, M. Wirz, D. Roggen, and G. Troster, “Mobile sensing of pedestrian flocks in indoor environments using WiFi signals,” in Proceedings of the 10th IEEE International Conference on Pervasive Computing and Communications (PerCom ’12), pp. 95–102, Lugano, Switzerland, March 2012.

[39] S. Sendra, M. Garcia, C. Turro, and J. Lloret, “People mobility behaviour study in a university campus using WLANs,” in Proceedings of the 3rd International Conference on Mobile Ubiquitous Computing, Systems, Services, and Technologies, UBICOMM 2009, pp. 124–129, Malta, October 2009.

[40] J. Scheuner, G. Mazlami, D. Schoni et al., “Probr - A Generic and Passive WiFi Tracking System,” in Proceedings of the 41st IEEE Conference on Local Computer Networks, LCN 2016, pp. 495–502, UAE, November 2016.

[41] Y. Li, S. Williams, B. Moran, and A. Kealy, “Quantized RSS Based Wi-Fi Indoor Localization with Room Level Accuracy,” in Proceedings of the International Global Navigation Satellite Systems 2018, 2018.

[42] A. Alessandrini, C. Gioia, F. Sermi, I. Sofos, D. Tarchi, and M. Vespe, “WiFi positioning and Big Data to monitor flows of people on a wide scale,” in Proceedings of the 25th European Navigation Conference, ENC 2017, pp. 322–328, Switzerland, May 2017.

[43] L. Vu, K. Nahrstedt, S. Retika, and I. Gupta, “Joint bluetooth/wifi scanning framework for characterizing and leveraging people movement in university campus,” in Proceedings of the 13th ACM International Conference on Modeling, Analysis, and Simulation of Wireless and Mobile Systems (MSWiM ’10), pp. 257–265, October 2010.

[44] M. Garcia, S. Sendra, C. Turro, and J. Lloret, “User’s Macro and Micro-mobility Study using WLANs in a University Campus,” International Journal On Advances in Internet Technology, vol. 4, no. 1, pp. 37–46, 2011.

[45] Y. Xu, I. D. G. Groeneveld, R. Sulzer, E. Theocharous, O. T. Willems, and M. S. Tryfona, “Determine activity based on the classified identity of users by using Wi-Fi monitoring,” Geomatics Synthesis Group Project Report, 2016.

[46] A. Danalet, L. Tinguely, M. D. Lapparent, and M. Bierlaire, “Location choice with longitudinal WiFi data,” Journal of Choice Modelling, vol. 18, pp. 1–17, 2016.

[47] M. Zhou, K. Sui, M. Ma, Y. Zhao, D. Pei, and T. Moscibroda, “MobiCamp: A campus-wide testbed for studying mobile physical activities,” in Proceedings of the 3rd International Workshop on Physical Analytics, WPA 2016, pp. 1–6, Singapore, 2016.

[48] A. Fernandez-Ares, A. M. Mora, M. G. Arenas et al., “Studying real traffic and mobility scenarios for a Smart City using a new monitoring and tracking system,” Future Generation Computer Systems, vol. 76, pp. 163–179, 2017.

[49] A. E. Redondi and M. Cesana, “Building up knowledge through passive WiFi probes,” Computer Communications, vol. 117, pp. 1–12, 2018.

[50] L. Huang, K. Matsuura, H. Yamane, and K. Sezaki, “Enhancing wireless location privacy using silent period,” in Proceedings of the 2005 IEEE Wireless Communications and Networking Conference, WCNC 2005: Broadband Wireless for the Masses - Ready for Take-off, pp. 1187–1192, USA, March 2005.

[51] J. Martin, T. Mayberry, C. Donahue et al., “A Study of MAC Address Randomization in Mobile Devices and When it Fails,” Proceedings on Privacy Enhancing Technologies, vol. 2017, no. 4, pp. 365–383, 2017.

[52] M. Vanhoef, C. Matte, M. Cunche, L. S. Cardoso, and F. Piessens, “Why MAC address randomization is not enough: an analysis of Wi-Fi network discovery mechanisms,” in Proceedings of the 11th ACM on Asia Conference on Computer and Communications Security, pp. 413–424, ACM, Xi’an, China, June 2016.

[53] A. J. Ruiz-Ruiz, H. Blunck, T. S. Prentow, A. Stisen, and M. B. Kjaergaard, “Analysis methods for extracting knowledge from large-scale WiFi monitoring to inform building facility planning,” in Proceedings of the 2014 12th IEEE International Conference on Pervasive Computing and Communications, PerCom 2014, pp. 130–138, Hungary, March 2014.

[54] N. Nunes, M. Ribeiro, C. Prandi, and V. Nisi, “Beanstalk - A community based passive Wi-Fi tracking system for analysing tourism dynamics,” in Proceedings of the 9th ACM SIGCHI Symposium on Engineering Interactive Computing Systems, EICS 2017, pp. 93–98, Portugal, June 2017.

[55] C. Parent, N. Pelekis, Y. Theodoridis et al., “Semantic trajectories modeling and analysis,” ACM Computing Surveys, vol. 45, no. 4, pp. 1–32, 2013.

[56] C. Wei, “Mining of User Behavioral Features Based on Indoor Semantic Trajectories,” Boletín Técnico, ISSN: 0376-723X, vol. 55, 2017.

[57] K. V. Long, D. Quang, and N. Klara, Lessons learned from bluetooth/wifi scanning deployment in university campus, Urbana, Illinois, USA, 2010.

[58] V. Radu and M. K. Marina, “HiMLoc: indoor smartphone localization via activity aware pedestrian dead reckoning with selective crowdsourced WiFi fingerprinting,” in Proceedings of the International Conference on Indoor Positioning and Indoor Navigation (IPIN ’13), pp. 1–10, IEEE, Montbeliard-Belfort, France, October 2013.

[59] F. Hong, Y. Zhang, Z. Zhang, M. Wei, Y. Feng, and Z. Guo, “WaP: Indoor localization and tracking using WiFi-Assisted Particle filter,” in Proceedings of the 39th Annual IEEE Conference on Local Computer Networks, LCN 2014, pp. 210–217, Canada, September 2014.

[60] Y. Zeng, P. H. Pathak, and P. Mohapatra, “Analyzing shopper’s behavior through WiFi signals,” in Proceedings of the 2nd Workshop on Physical Analytics, WPA 2015, pp. 13–18, Italy, 2015.

[61] P. Sapiezynski, A. Stopczynski, R. Gatej, and S. Lehmann, “Tracking human mobility using WiFi signals,” PLoS ONE, vol. 10, no. 7, p. e0130824, 2015.

[62] Z. Tian, Y. Jin, M. Zhou, Z. Wu, and Z. Li, “Wi-Fi/MARG Integration for Indoor Pedestrian Localization,” Sensors, vol. 16, no. 12, p. 2100, 2016.

[63] B. Zhou, Q. Li, Q. Mao, and W. Tu, “A Robust Crowdsourcing-Based Indoor Localization System,” Sensors, vol. 17, no. 4, p. 864, 2017.

[64] “Raspberry Pi,” https://www.raspberrypi.org.

[65] “TP-LINK Wi-Fi USB dongle datasheet,” http://www.tp-link.com/us/products/details/cat-5520_TL-WN722N.html.

[66] “ISO/IEC 20922:2016 - Information technology – Message Queuing Telemetry Transport (MQTT) v3.1.1,” https://www.iso.org/standard/69466.html.

[67] G. Piatetsky-Shapiro and W. Frawley, Knowledge Discovery in Databases, MIT Press, Cambridge, MA, USA, 1991.

[68] J. Andión Jiménez, J. M. Navarro González, M. Álvarez-Campana Fernández-Corredor, and J. C. Dueñas López, “A passive, non-intrusive, cheap method to identify behaviours and habits in the Campus,” in Proceedings of the XIII Jornadas de Ingeniería Telemática - JITEL2017, vol. 40, no. 47, pp. 10–4995, September 2017.

[69] A. Mylnikov, “Geo project,” https://www.mylnikov.org.

[70] K. Mehmet and G. Ananth, “PROXIMUS: A framework for analyzing very high dimensional discrete-attributed datasets,” in Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’03, vol. 147, no. 156, pp. 147–156, USA, August 2003.

[71] “UPM Library occupation,” http://ceiboard.dit.upm.es/smart-campus/biblio.

[72] C. A. R. Inarejos, A. Rodríguez, G. López, and M. Álvarez-Campana, “Análisis de la huella de carbono de la ETSIT de la UPM y propuesta de mejora basada en datos de la plataforma IoT Smart CEI Moncloa,” in Proceedings of the I Congreso Iberoamericano de Ciudades Inteligentes (ICSC-CITIES 2018), 2018.

Page 23: Smart Behavioral Analytics over a Low-Cost IoT Wi-Fi ...downloads.hindawi.com/journals/wcmc/2018/3136471.pdfSmart Behavioral Analytics over a Low-Cost IoT Wi-Fi Tracking Real Deployment

Wireless Communications and Mobile Computing 23

[34] O Czogalla and S Naumann ldquoPedestrian indoor navigation forcomplex public facilitiesrdquo in Proceedings of the 2016 Interna-tional Conference on Indoor Positioning and Indoor NavigationIPIN 2016 pp 1ndash8 Spain October 2016

[35] L Schauer P Marcus and C Linnhoff-Popien ldquoTowardsfeasibleWi-Fi based indoor tracking systems using probabilisticmethodsrdquo in Proceedings of the 2016 International Conference onIndoor Positioning and Indoor Navigation IPIN 2016 pp 1ndash8Spain October 2016

[36] F Meneses and A Moreira ldquoLarge scale movement analysisfrom WiFi based location datardquo in Proceedings of the 2012International Conference on Indoor Positioning and IndoorNavigation IPIN 2012 Australia November 2012

[37] B Bonne A Barzan P Quax and W Lamotte ldquoWiFiPiinvoluntary tracking of visitors atmass eventsrdquo in Proceedings ofthe IEEE 14th International Symposium on a World of WirelessMobile and Multimedia Networks (WoWMoM rsquo13) pp 1ndash6Madrid Spain June 2013

[38] M B Kjaergaard M Wirz D Roggen and G Troster ldquoMobilesensing of pedestrian flocks in indoor environments usingWiFisignalsrdquo in Proceedings of the 10th IEEE International Conferenceon Pervasive Computing and Communications (PerCom rsquo12) pp95ndash102 Lugano Switzerland March 2012

[39] S Sendra M Garcia C Turro and J Lloret ldquoPeople mobilitybehaviour study in a university campus using WLANsrdquo inProceedings of the 3rd International Conference on MobileUbiquitous Computing Systems Services and TechnologiesUBICOMM 2009 pp 124ndash129 Malta October 2009

[40] J Scheuner G Mazlami D Schoni et al ldquoProbr - A Genericand Passive WiFi Tracking Systemrdquo in Proceedings of the 41stIEEE Conference on Local Computer Networks LCN 2016 pp495ndash502 UAE November 2016

[41] Y Li S Williams B Moran and A Kealy ldquoQuantized RSSBased Wi-Fi Indoor Localization with Room Level Accuracyrdquoin Proceedings of the International Global Navigation SatelliteSystems 2018 2018

[42] A Alessandrini C Gioia F Sermi I Sofos D Tarchi andM Vespe ldquoWiFi positioning and Big Data to monitor flows ofpeople on a wide scalerdquo in Proceedings of the 25th EuropeanNavigation Conference ENC 2017 pp 322ndash328 SwitzerlandMay 2017

[43] LVuKNahrstedt S Retika and IGupta ldquoJoint bluetoothwifiscanning framework for characterizing and leveraging peoplemovement in university campusrdquo in Proceedings of the13th ACMInternational Conference on Modeling Analysis and Simulationof Wireless and Mobile Systems (MSWiM rsquo10) pp 257ndash265October 2010

[44] M Garcia S Sendra C Turro and J Lloret ldquoUserrsquos Macro andMicro-mobility Study using WLANs in a University CampusrdquoInternational Journal On Advances in Internet Technology vol4 no 1 pp 37ndash46 2011

[45] Y Xu I D G Groeneveld R Sulzer E Theocharous O TWillems and M S Tryfona ldquoDetermine activity based onthe classified identity of users by using Wi-Fi monitoringrdquoGeomatics Synthesis Group Project Report 2016

[46] A Danalet L Tinguely M D Lapparent and M BierlaireldquoLocation choicewith longitudinalWiFi datardquo Journal of ChoiceModelling vol 18 pp 1ndash17 2016

[47] M Zhou K Sui M Ma Y Zhao D Pei and T MoscibrodaldquoMobiCamp A campus-wide testbed for studying mobile phys-ical activitiesrdquo in Proceedings of the 3rd International Workshopon Physical Analytics WPA 2016 pp 1ndash6 Singapore 2016

[48] A Fernandez-Ares A M Mora M G Arenas et al ldquoStudyingreal traffic and mobility scenarios for a Smart City using a newmonitoring and tracking systemrdquo Future Generation ComputerSystems vol 76 pp 163ndash179 2017

[49] A E Redondi andM Cesana ldquoBuilding up knowledge throughpassiveWiFi probesrdquoComputer Communications vol 117 pp 1ndash12 2018

[50] L Huang K MatsuuraH Yamanet and K Sezaki ldquoEnhancingwireless location privacy using silent periodrdquo in Proceedingsof the 2005 IEEE Wireless Communications and NetworkingConference WCNC 2005 Broadband Wirelss for the Masses -Ready for Take-off pp 1187ndash1192 USA March 2005

[51] J Martin T Mayberry C Donahue et al ldquoA Study of MACAddress Randomization in Mobile Devices and When it FailsrdquoProceedings on Privacy Enhancing Technologies vol 2017 no 4pp 365ndash383 2017

[52] MVanhoef CMatteMCunche L S Cardoso and F PiessensldquoWhyMACaddress randomization is not enough an analysis ofWi-Fi network discoverymechanismsrdquo inProceedings of the 11thACM on Asia Conference on Computer and CommunicationsSecurity pp 413ndash424 ACM Xirsquoan China June 2016

[53] A J Ruiz-Ruiz H Blunck T S Prentow A Stisen and MB Kjaergaard ldquoAnalysis methods for extracting knowledgefrom large-scale WiFi monitoring to inform building facilityplanningrdquo in Proceedings of the 2014 12th IEEE InternationalConference on Pervasive Computing and Communications Per-Com 2014 pp 130ndash138 Hungary March 2014

[54] N Nunes M Ribeiro C Prandi and V Nisi ldquoBeanstalk - Acommunity based passive Wi-Fi tracking system for analysingtourism dynamicsrdquo in Proceedings of the 9th ACM SIGCHISymposium on Engineering Interactive Computing Systems EICS2017 pp 93ndash98 Portugal June 2017

[55] C ParentN Pelekis YTheodoridis et al ldquoSemantic trajectoriesmodeling and analysisrdquo ACMComputing Surveys vol 45 no 4pp 1ndash32 2013

[56] C Wei ldquoMining of User Behavioral Features Based on IndoorSemantic Trajectoriesrdquo Boletın Tecnico ISSN0376-723X vol55 2017

[57] K V Long D Quang and N Klara Lessons learned from blue-toothwifi scanning deployment in university campus UrbanaIllinois USA 2010

[58] V Radu and M K Marina ldquoHiMLoc indoor smartphonelocalization via activity aware pedestrian dead reckoning withselective crowdsourced WiFi fingerprintingrdquo in Proceedings ofthe International Conference on Indoor Positioning and IndoorNavigation (IPIN rsquo13) pp 1ndash10 IEEE Montbeliard-BelfortFrance October 2013

[59] FHongY Zhang Z ZhangMWei Y Feng andZGuo ldquoWaPIndoor localization and tracking using WiFi-Assisted Particlefilterrdquo in Proceedings of the 39th Annual IEEE Conference onLocal Computer Networks LCN 2014 pp 210ndash217 CanadaSeptember 2014

[60] Y Zeng P H Pathak and P Mohapatra ldquoAnalyzing shopperrsquosbehavior through WiFi signalsrdquo in Proceedings of the 2ndWorkshop on Physical AnalyticsWPA 2015 pp 13ndash18 Italy 2015

[61] P Sapiezynski A Stopczynski R Gatej and S LehmannldquoTracking human mobility using WiFi signalsrdquo PLoS ONE vol10 no 7 p e0130824 2015

[62] Z Tian Y Jin M Zhou Z Wu and Z Li ldquoWi-FiMARGIntegration for Indoor Pedestrian Localizationrdquo Sensors vol 16no 12 p 2100 2016

24 Wireless Communications and Mobile Computing

[63] B Zhou Q Li Q Mao and W Tu ldquoA Robust Crowdsourcing-Based Indoor Localization Systemrdquo Sensors vol 17 no 4 p 8642017

[64] ldquoRaspberry Pirdquo httpswwwraspberrypiorg[65] ldquoTP-LINK Wi-Fi USB dongle datasheetrdquo httpwwwtp-

linkcomusproductsdetailscat-5520 TL-WN722Nhtml[66] ldquoISOIEC 209222016 - Information technology ndash Message

Queuing Telemetry Transport (MQTT) v311rdquo httpswwwisoorgstandard69466html

[67] G Piateski-Shapiro and W Frawley Knowledge Discovery inDatabases MIT Press Cambridge MA USA 1991

[68] J Andion Jimenez J M Navarro Gonzalez M Alvarez-Campana Fernandez-Corredor and J C Duenas Lopez ldquoApassive non-intrusive cheap method to identify behavioursand habits in the Campusrdquo in Proceedings of the XIII Jornadas deIngenieria Telematica - JITEL2017 vol 40 no 47 pp 10ndash4995September 2017

[69] A Mylinikov ldquoGeo projectrdquo httpswwwmylnikovorg[70] K Mehmet and G Ananth ldquoPROXIMUS A framework for

analyzing very high dimensional discrete-attributed datasetsrdquoinProceedings of the 9thACMSIGKDD International Conferenceon Knowledge Discovery andDataMining KDD rsquo03 vol 147 no156 pp 147ndash156 USA August 2003

[71] ldquoUPM Library occupationrdquo httpceiboardditupmessmart-campusbiblio

[72] C A R Inarejos A Rodrıguez G Lopez and M Alvarez-Campana ldquoAnalisis de la huella de carbono de la ETSIT de laUPM y propuesta de mejora basada en datos de la plataformaIoT Smart CEI Moncloardquo in Proceedings of the I CongresoIberoamericano de Ciudades Inteligentes (ICSC-CITIES 2018)2018

International Journal of

AerospaceEngineeringHindawiwwwhindawicom Volume 2018

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Active and Passive Electronic Components

VLSI Design

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Shock and Vibration

Hindawiwwwhindawicom Volume 2018

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawiwwwhindawicom

Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Control Scienceand Engineering

Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Journal ofEngineeringVolume 2018

SensorsJournal of

Hindawiwwwhindawicom Volume 2018

International Journal of

RotatingMachinery

Hindawiwwwhindawicom Volume 2018

Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Navigation and Observation

International Journal of

Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

Submit your manuscripts atwwwhindawicom
