
Intelligent Systems Conference 2017, 7-8 September 2017 | London, UK

Towards Stream-based Reasoning and Machine Learning for IoT Applications

Markus Endler
Dept of Informatics, PUC-Rio
Rio de Janeiro, Brazil
Email: [email protected]

Jean-Pierre Briot
LIP6, UPMC–CNRS & Dept of Informatics, PUC-Rio
Paris, France / Rio de Janeiro, Brazil
Email: [email protected]

Vitor P. de Almeida
Dept of Informatics, PUC-Rio
Rio de Janeiro, Brazil
Email: [email protected]

Francisco Silva e Silva
LSDi, Univ. Federal do Maranhão
São Luís, Brazil
Email: [email protected]

Edward H. Haeusler
Dept of Informatics, PUC-Rio
Rio de Janeiro, Brazil
Email: [email protected]

Abstract—As distributed IoT applications become larger and more complex, the pure processing of raw sensor and actuation data streams becomes impractical. Instead, data streams must be fused into tangible facts, and these pieces of information must be combined with background knowledge to infer new pieces of knowledge. And since many IoT applications require almost real-time reactivity to stimuli from the environment, this information inference process has to be performed in a continuous, on-line manner. This paper proposes a new semantic model for data stream processing and real-time reasoning based on the concepts of Semantic Stream and Fact Stream, as a natural extension of Complex Event Processing (CEP) and RDF (graph-based knowledge model). The main advantages of our approach are that: (a) it considers time as a key relation between pieces of information; (b) the processing of streams can be implemented using CEP; (c) it is general enough to be applied to any Data Stream Management System (DSMS). Finally, we present challenges and prospects of using machine learning and induction algorithms to learn abstractions and reasoning rules from a continuous data stream.

Keywords—Internet of Things (IoT); sensors; data streams; complex event processing (CEP); semantic reasoning; inference; machine learning.

I. INTRODUCTION

Several complex IoT applications, such as those in the manufacturing industry, transportation systems and healthcare, impose hard real-time requirements on the acquisition and processing of sensor data for identifying situations and extracting information from the systems' operations and their environment. These typically require on-line processing of continuous streams of sensor data (Data Stream Processing), sensor fusion techniques, pattern recognition, and timely and autonomous system control.

However, in current IoT systems, sensing and actuation are mostly done at the bare-bones data level, whereas many IoT applications demand higher-level situation awareness of, and reasoning about, the systems' states and the physical environment where they operate. For this to be possible, comprehensive semantic models for data stream analysis and actuation are necessary. Semantic models are formally defined concepts and relations on which reasoning engines can operate to derive new bits of information and knowledge about a system and its environment. The main problem is that current semantic models (designed for the Semantic Web) are not suitable for efficient, real-time reasoning. Current data analysis for IoT systems is either done off-line or lacks any semantic-based reasoning.

For example, consider a production plant in the near future, where several robots, mobile or stationary, operate in a product assembly and interact with each other to hand over parts and tools of the assembly line. Suddenly, there is a short power outage and the assembly line stops for a few seconds, so that some robots go back to their consistent initial states, while others continue their activity (e.g., on battery power) and only stop when their sensors notice that the production line is not advancing. In this case, the robots have to “understand” what has happened, and have to “know” which of the machinery (and robots) are in which state when activity is resumed, as well as the assembly stage of the items being produced. And, as if by magic, only a few seconds after power is back, the robots synchronize with each other, identify missed steps in the assembly process of each item, and resume cooperating. Such knowledge and understanding is only possible because all robots have not only a semantic model of their own state, but also situational awareness, i.e., a comprehensive model of the production process as a whole and of their role in the entire process. The semantic model furthermore describes possible localized and global problems of the entire production process, as well as individual and specific actuation plans for some situations. As not all possible situations can be represented in a model, the robots have to classify features, combine situational patterns and combine parts of specific action plans. In this IoT scenario, the robots would be capable of such fast recovery of the manufacturing process because their situational understanding (i.e., their semantic-centered inference/reasoning process) is executed very fast, with almost no delay, as soon as each robot's operational capability is back.

With the goal of finding a suitable semantic model for IoT, this paper proposes a novel approach for real-time symbolic reasoning based on the concepts of Semantic Stream and Fact Stream, as natural extensions of Complex Event Processing (CEP) [18] and RDF (graph-based knowledge model) [10]. The main advantages of our approach are that: (a) it uses timestamp and co-location information to correlate actions/events happening at different real-world entities (i.e., objects and subjects); (b) the online processing of semantic streams can be implemented using conventional CEP technology and semantic reasoning approaches; (c) using ontology-based reasoning over a knowledge base, it is possible not only to deduce future or indirect events that would not be detected through CEP, but also to generate new CEP rules for the stream analysis; (d) the approach is generic enough to be applied to many Data Stream Management Systems (DSMS). This research is being carried out in the scope of the ESMOCYP cooperation project between PUC-Rio, Federal University of Maranhão and University of Stuttgart. We are currently developing a prototype of the semantic stream reasoning using ContextNet, our distributed and scalable middleware for the Internet of Mobile Things [21]. It is a mobile-cloud architecture where several interconnected CEP agents can be deployed both in a cloud/cluster [5] and on Android mobile devices [20].

The paper is structured as follows. In Section II, we explain the basic concepts of Complex Event Processing and list some common approaches for modeling knowledge and performing reasoning. Section III explains the two steps of semantic stream reasoning. In Section IV, we present a scenario to explain how our reasoning process would be performed using temperature and accelerometer sensors embedded into vehicles, houses and the street. Section V discusses related work. In Section VI, we discuss the benefits of our approach and its prospects. Section VII presents an initial analysis of how machine learning and induction algorithms could help in automatically or semi-automatically extracting stream analysis patterns and rules. Section VIII then concludes the paper.

II. FUNDAMENTALS

A. Complex Event Processing

Complex Event Processing (CEP) [18] provides a rich set of concepts and operators for processing events. These include CQL-like (Continuous Query Language) [4] queries, rules, primitive functions (aggregation, filtering, transformation, etc.) and the production of derived events. A CEP workflow continuously processes incoming events, analyzes and manipulates them, and outputs derived events that are delivered to event consumers. These outputs usually represent notifications about detected situations of interest to the applications.

The processing of events is described by CEP rules. These are Event-Condition-Action (ECA) rules that combine continuous query primitives with context operators (e.g., temporal, logical, quantifiers) on received events, check for correlations among these events, and generate complex (or composite) events that summarize the correlation of the input events. For example, a split rule takes an input event and creates a set of events, while a filter rule only outputs events that satisfy a given criterion. Rules can also operate on a collection of events. For example, an aggregate rule outputs a single event by executing a function on the grouped events, while a join transformation tries to correlate events from various data streams. Another important concept in CEP is that of sliding time and event windows. A time window is a temporal context that subdivides the stream of events into intervals, and CEP rules and operators are applied only to the events within each window. CEP supports three sorts of windows: landmark, sliding and fading. A fading window is a sliding window where a decay factor λ is applied to the events according to their age, i.e., more recent events have higher importance than older events. Most CEP systems have the concept of Event Processing Agents (EPAs), which are software modules that implement one transformation within the event processing workflow. The type of an EPA is defined by the rules it implements, such as filtering, counting or specific event pattern detection. Note that rules are handwritten by experts. In Section VII, we present a preliminary analysis of how machine learning and induction algorithms could help in constructing rules and extracting patterns, automatically or semi-automatically.

B. Knowledge Representation and Reasoning Approaches

There are plenty of semantic models that represent knowledge about a system and its environment. However, almost all of them have problems of scale (i.e., the reasoning has high computational complexity), and thus are not suitable for real-time reasoning. The main semantic approaches are the following (see [19] for a survey and comparison):

• Frame Based Models: A frame is an artificial intelligence data structure used to divide knowledge into substructures by representing “stereotyped situations”. Frames are used in artificial intelligence frame languages.

• Conceptual Graphs: a logical formalism that includes classes, relations, individuals and quantifiers. This formalism is based on semantic networks, but it has a direct translation to the language of first-order predicate logic, from which it takes its semantics.

• Description Logics: logics serving primarily for the formal description of concepts and roles (relations). These logics were created from attempts to formalize semantic networks and frame-based systems. Semantically, they are founded on predicate logic.

• Ontologies: An ontology is a semantic/concept network that contains a body of knowledge describing some domain, typically common-sense knowledge relating concepts.

• Semantic Web (RDF, RDFS and OWL): RDF (Resource Description Framework) is a framework for representing information about resources in a graph model, where information is represented by triples (subject, predicate, object). RDFS (RDF Schema) extends the RDF vocabulary to allow describing taxonomies of classes and properties. It also extends definitions for some of the elements of RDF. For example, it sets the domain and range of properties and relates RDF classes and properties into taxonomies using the RDFS vocabulary. The Web Ontology Language (OWL) brings the expressive and reasoning power of Description Logic (DL) to the Semantic Web. Its decidable sublanguages, OWL Lite and OWL DL, differ in their expressive power and deduction complexity. The limitation of OWL Lite and OWL DL is that reasoning is hard to implement efficiently, and it also suffers from a lack of scalability.
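The triple model and RDFS taxonomies described above can be illustrated with plain Python tuples. The sketch below is not an RDF library. It stores (subject, predicate, object) triples in a set and applies two standard RDFS entailment rules by naive forward chaining until a fixpoint is reached. The first rule makes rdfs:subClassOf transitive. The second propagates rdf:type along rdfs:subClassOf. The example resources (`ex:SensorTag`, `ex:sensorX`, ...) are hypothetical.

```python
SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"


def rdfs_closure(triples):
    """Apply two RDFS rules until no new triple is derived:
    (A subClassOf B), (B subClassOf C) => (A subClassOf C)
    (x type A), (A subClassOf B)       => (x type B)
    """
    kb = set(triples)
    while True:
        sub = [(s, o) for (s, p, o) in kb if p == SUBCLASS]
        new = set()
        for a, b in sub:
            for c, d in sub:
                if b == c:
                    new.add((a, SUBCLASS, d))
        for (x, p, a) in kb:
            if p == TYPE:
                for c, d in sub:
                    if a == c:
                        new.add((x, TYPE, d))
        if new <= kb:          # fixpoint: nothing new was derived
            return kb
        kb |= new


kb = rdfs_closure({
    ("ex:SensorTag", SUBCLASS, "ex:BLEDevice"),
    ("ex:BLEDevice", SUBCLASS, "ex:SmartObject"),
    ("ex:sensorX", TYPE, "ex:SensorTag"),
})
print(("ex:sensorX", TYPE, "ex:SmartObject") in kb)  # True
```

Recomputing all rules on every round is exactly the kind of cost that makes full ontology reasoning hard to run in real time. Production reasoners use far more efficient strategies.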

III. GENERAL IDEA

The general idea of our semantic model and reasoning approach is to define two-level CEP transformations, each of which transforms one event flow/stream into a semantically richer one: 1) from annotated preprocessed events to RDF triples; and 2) from RDF triples to a stream of facts.


Initially, sensor data received from smart objects are pre-processed so as to identify: a) the entity type and instance from the received UUID; and b) what is happening to the entity, e.g., whether it is performing some action, experiencing a state change or undergoing any other transformation. This second type of pre-processing may be performed, e.g., through CEP, by matching a sequence of data onto a pre-defined temporal pattern that identifies a specific action. This leads to a stream of semantically annotated data with pairs (subject, predicate) or (object, predicate). The entity type/instance and predicate identification is performed by CEP agents close to the sensors (see Figure 1). In the specific case of our IoT middleware, these agents typically execute on mobile devices. Therefore, we named them Mobile Event Processing Agents (Mobile EPAs).

Then, in the first stream processing stage, our approach transforms the stream of annotated data into a stream of RDF statements. In the second stage, we transform the stream of RDF triples into semantically richer facts, i.e., by combining RDF statements. The details of each of these stages are explained in the following.
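The two pre-processing tasks of a Mobile EPA can be sketched as follows. The sketch assumes a hypothetical registry that maps sensor UUIDs to entity type and instance. A deliberately simple temporal pattern stands in for the predicate-identification step: an acceleration spike immediately followed by a quiet sample. The UUIDs, thresholds and the `kick` predicate are illustrative only.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical registry: sensor UUID -> (entity type, entity instance).
REGISTRY = {
    "uuid-ball-01": ("Ball", "ball42"),
    "uuid-lawn-07": ("Lawn", "yard_10_rodeo"),
}


@dataclass
class AnnotatedEvent:
    ts: float
    entity_type: str
    entity: str      # entity instance, acting as subject or object
    role: str        # "subject" or "object"
    predicate: str   # action identified from the raw data


def detect_kick(samples, spike=10.0, quiet=2.0) -> Optional[float]:
    """Toy temporal pattern: a spike (|a| > spike) immediately followed by a quiet
    sample (|a| < quiet). `samples` is a time-ordered list of (ts, acceleration)."""
    for (t1, a1), (_, a2) in zip(samples, samples[1:]):
        if abs(a1) > spike and abs(a2) < quiet:
            return t1
    return None


def annotate(uuid, samples, role) -> Optional[AnnotatedEvent]:
    """Mobile EPA step: resolve the entity from its UUID and the predicate from the data."""
    entity_type, instance = REGISTRY[uuid]
    ts = detect_kick(samples)
    if ts is None:
        return None
    return AnnotatedEvent(ts, entity_type, instance, role, "kick")


print(annotate("uuid-ball-01", [(0.0, 0.5), (1.0, 14.0), (1.2, 0.8)], "subject"))
```

Each emitted event carries only one of subject or object. The missing element is supplied later by the Context mappers.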

A. Mapping Data Events to Semantic Events

Our reasoning approach dictates that each simple annotated event (actually, a data object with member attributes) represents an action-based predicate (i.e., the event is the outcome of an action). Each such event also has at least one of the two remaining RDF elements: the subject or the object. If the event has the IDs of both the subject and the object, then we have the complete RDF triple (subject, predicate, object). Otherwise, when the subject and the object are received in separate events, the missing third RDF element may be inferred from their shared context (i.e., their temporal and spatial correlation). For example, if we consider the RDF statement (ball, kicking, in the front-yard), then the event instances represent the predicate kick. Each event further carries either the ID of the ball (e.g., when the ball carries an accelerometer sensor), or else the ID of the yard (e.g., its GPS position or street number, when lawn sensors detect some kicking object). The shared context is defined by the same location (co-location) of the events. It is also defined by their synchronicity: the sensors on the yard ground and the sensor in the ball detect the hitting of the ball on the lawn (the kick) at the same time. This contextual correlation is performed by CEP rules called Context mappers, which analyze the streams of events and match Subjects, Objects and Predicates.

Figure 2 shows how Context mappers analyze each pair of events in the sliding time window (e.g., 60 s) of the Data Event Stream and try to identify common contexts, based on time proximity or any other data attribute.
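A Context mapper of this kind can be sketched as a pairwise join over the events of one sliding window. A subject-only event and an object-only event with the same predicate are merged into a full triple when two conditions hold: they are close in time (synchronicity), and they were reported at the same location (co-location). The event layout, the `location` attribute and the 0.5 s tolerance are assumptions made for illustration. The 60 s window follows the example in the text.

```python
from dataclasses import dataclass
from itertools import product
from typing import Optional


@dataclass(frozen=True)
class PartialEvent:
    ts: float
    predicate: str
    location: str                  # registered place where the event was sensed
    subject: Optional[str] = None
    obj: Optional[str] = None


def context_mapper(events, now, size=60.0, tolerance=0.5):
    """Return (subject, predicate, object) triples from the events of the last
    `size` seconds, pairing subject-only and object-only events that share the
    predicate, occur within `tolerance` seconds and have the same location."""
    recent = [e for e in events if now - e.ts <= size]
    # Events that already carry both IDs are complete triples.
    triples = {(e.subject, e.predicate, e.obj) for e in recent if e.subject and e.obj}
    subjects = [e for e in recent if e.subject and not e.obj]
    objects = [e for e in recent if e.obj and not e.subject]
    for s, o in product(subjects, objects):
        if (s.predicate == o.predicate
                and abs(s.ts - o.ts) <= tolerance   # synchronicity
                and s.location == o.location):      # co-location
            triples.add((s.subject, s.predicate, o.obj))
    return triples


events = [
    PartialEvent(100.0, "kick", "10_rodeo_dr", subject="ball42"),
    PartialEvent(100.2, "kick", "10_rodeo_dr", obj="front_yard"),
    PartialEvent(100.1, "kick", "12_rodeo_dr", obj="neighbour_yard"),
]
print(context_mapper(events, now=110.0))  # {('ball42', 'kick', 'front_yard')}
```

The pairwise comparison is quadratic in the window size. A real CEP engine would instead partition the stream by location before joining.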

B. Mapping Semantic Events to Knowledge Facts

The mapping from Semantic Events (i.e., RDF triples) to Facts is achieved by Semantic Event (SEv) rules. These are CEP rules that look for causality and temporal patterns in several Semantic Event sub-streams, where each stream comprises the Semantic Events of a given context. Most CEP engines support this “context-specific splitting” through the concept of a stream partition (a.k.a. context). Depending on the SEv rule, it might then consume, filter out, modify or even insert new RDF triples in some SEv streams, a feature that is supported by CEP. This manipulation is achieved by querying the Knowledge Base about all the concepts and relations pertaining to the analyzed sub-streams. For example, the inference might deduce that the “kicking ball with a given ID” has “Bob” as its owner, and that the “yard where the ball is kicking” is the one where Bob lives. From this, a new piece of knowledge may be derived, such as “someone is kicking Bob's ball in his house's yard”. And with the additional context information “Bob has finished his homework”, it may be possible to deduce, with high probability, that “Bob is kicking his ball in his house's yard”.
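The ownership/residence inference in this example can be sketched as an SEv rule. For each incoming triple, the rule queries a knowledge base held as a set of triples and emits derived fact triples. The predicate names (`ownedBy`, `partOf`, `livesAt`, `kickedAtHomeOf`) and the rule itself are hypothetical, chosen only to mirror the Bob example. The probabilistic step (“with high probability”) is not modeled.

```python
# Hypothetical background knowledge, stored as (subject, predicate, object) triples.
KB = {
    ("ball42", "ownedBy", "Bob"),
    ("front_yard", "partOf", "house_10_rodeo"),
    ("Bob", "livesAt", "house_10_rodeo"),
}


def query(kb, s=None, p=None, o=None):
    """Triple-pattern match; None acts as a wildcard."""
    return [t for t in kb
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


def sev_kick_at_owner_home(triple, kb):
    """SEv rule: a kicked ball whose owner lives in the house the yard belongs to
    yields the fact (ball, kickedAtHomeOf, owner)."""
    ball, predicate, yard = triple
    if predicate != "kick":
        return []
    facts = []
    for _, _, owner in query(kb, s=ball, p="ownedBy"):
        for _, _, house in query(kb, s=yard, p="partOf"):
            if query(kb, s=owner, p="livesAt", o=house):
                facts.append((ball, "kickedAtHomeOf", owner))
    return facts


print(sev_kick_at_owner_home(("ball42", "kick", "front_yard"), KB))
# [('ball42', 'kickedAtHomeOf', 'Bob')]
```

The derived triple can be inserted back into the Semantic Event stream, where other SEv rules may consume it.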

The Knowledge Base is organized as nested contexts [9]. Compared to single-layer (or flat) ontologies, this allows much more efficient checking of concepts and relations. For example, the ontology of the Knowledge Base may be organized as the following nested contexts. Spatial nested contexts: “Green Way district” ⊃ “house at 10 Rodeo Dr.” ⊃ its yard ⊃ its lawn. Temporal nested contexts: “Bob's leisure time” ⊃ “Thursday” ⊃ “afternoon” ⊃ “Bob's homework finished”. Containment nested contexts: “Bob's toys” ⊃ balls ⊃ “Basket ball with ID”, etc.
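The efficiency argument for nested contexts can be illustrated with a small context tree. Facts are attached to the context in which they hold. A query scoped to one context only visits that context and its sub-contexts, instead of scanning a flat ontology. The `Context` class and the example facts are hypothetical. The spatial nesting follows the example in the text.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]


@dataclass
class Context:
    name: str
    facts: List[Triple] = field(default_factory=list)
    children: List["Context"] = field(default_factory=list)

    def add(self, child: "Context") -> "Context":
        self.children.append(child)
        return child

    def find(self, name: str) -> Optional["Context"]:
        """Depth-first lookup of a (sub-)context by name."""
        if self.name == name:
            return self
        for child in self.children:
            hit = child.find(name)
            if hit is not None:
                return hit
        return None

    def scoped_query(self, predicate: str):
        """Yield facts with `predicate` from this context and its sub-contexts only;
        facts held by enclosing or sibling contexts are never visited."""
        for f in self.facts:
            if f[1] == predicate:
                yield f
        for child in self.children:
            yield from child.scoped_query(predicate)


district = Context("Green Way district")
house = district.add(Context("house at 10 Rodeo Dr."))
yard = house.add(Context("yard"))
lawn = yard.add(Context("lawn"))
lawn.facts.append(("lawnSensor3", "detected", "kick"))
district.facts.append(("streetLamp7", "detected", "motion"))

print(list(district.find("yard").scoped_query("detected")))
# [('lawnSensor3', 'detected', 'kick')]
```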


Fig. 1. Semantic annotation from raw sensor data

Fig. 2. Mapping Data Events to Semantic Events

Figure 3 shows how Semantic Event rules analyze all RDF triples in the sliding time window (e.g., 180 s) of sub-streams of Semantic Events. The rules try to find event patterns, and filter, manipulate or add RDF triples in “their” main context sub-stream or in sub-streams of semantically related contexts, such as “the front yard” and “the street in front of the yard”.
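The per-context splitting of the Semantic Event stream can be sketched as a map from context name to its own sliding window of triples. A rule bound to a context then sees only that context's triples and, on request, those of related contexts. The 180 s window follows the text. The `PartitionedWindows` class, and the way related contexts are declared, are assumptions.

```python
from collections import defaultdict, deque


class PartitionedWindows:
    """One sliding time window of RDF triples per context (stream partition)."""

    def __init__(self, size=180.0, related=None):
        self.size = size
        self.windows = defaultdict(deque)
        # Hypothetical map: context -> semantically related contexts.
        self.related = related or {}

    def push(self, context, ts, triple):
        """Append a triple to its context's window; eviction happens on arrival,
        so each partition is only trimmed when it receives a new triple."""
        w = self.windows[context]
        w.append((ts, triple))
        while ts - w[0][0] > self.size:
            w.popleft()

    def view(self, context, include_related=False):
        """Triples visible to a rule bound to `context`."""
        names = [context]
        if include_related:
            names += self.related.get(context, [])
        return [t for n in names for _, t in self.windows.get(n, ())]


pw = PartitionedWindows(size=180.0, related={"front_yard": ["street_front"]})
pw.push("front_yard", 0.0, ("ball42", "kick", "front_yard"))
pw.push("front_yard", 200.0, ("Bob", "in", "front_yard"))
pw.push("street_front", 201.0, ("car9", "passes", "street_front"))
print(pw.view("front_yard", include_related=True))
```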

C. Deriving Situations

Complex situations may be identified by checking the Facts of the stream against the Semantic Graph (Ontology) of the Knowledge Base. An example is “Bob is playing basketball in the front yard, but should be notified that a strong storm is approaching his house's yard”. Moreover, some of the complex facts may be used for expanding, reinforcing or removing some of the knowledge about a subject, an object or a place. For example, suppose the following events occur: Bob's pen has finished writing QED on the page with the exercises of his Math homework notebook, the notebook has been closed, and the main door of Bob's house has been opened and closed, indicating that someone left the house. The Knowledge Base will then be expanded with the facts (Bob, finished, Math homework), (Bob, left, house) and (Bob, stepped into, yard).

IV. AN EXAMPLE OF REASONING OVER DATA STREAMS

In this section we show how the aforementioned two-phase reasoning could be done with off-the-shelf components and current wireless WPAN technologies, such as Bluetooth Low Energy (BLE). Consider a scenario where smart ambient sensors are everywhere: in houses, offices, public transportation, in the streets and in private cars. Assume that these smart devices include a temperature and an accelerometer sensor, and have a unique UUID and a Bluetooth Low Energy interface. Now consider a user, Silva, who lives in Rio de Janeiro and carries a smartphone running our Internet of Things middleware ContextNet [11], [21]. This middleware uses the smartphone as the bridge between Bluetooth-enabled smart devices/objects/sensors and IoT application servers executing in a cloud. The mobile middleware (Mobile Hub) periodically issues a BLE scan, discovers nearby BLE devices, connects to them, subscribes to the smart devices' sensors and writes commands to the smart objects that have some actuator. Assume that it is summertime and that some IoT application needs to know whether Silva is in his office, walking on the street, or in a bus or car. Whenever Silva's smartphone encounters a BLE smart sensor, it is possible to deduce whether he is in an air-conditioned space, and whether he is moving (from the smart device's accelerometer). Moreover, if the location of each deployed smart device is previously registered, it is further possible to deduce whether Silva is in his office or elsewhere. This can be deduced even without GPS, either because its signal is not available (indoors), or because Silva keeps it off to save the smartphone's battery.

Fig. 3. Mapping Semantic Events to Knowledge Facts

In this case, it would be possible to deduce the RDF triple (Silva, rides, BusLine435) from the following simpler semantic events in the stream: (Silva's cell phone, connectedTo, sensorX), (sensorX, in, BusKKZ8674), (sensorX, Abs(Accelerometer) > 10), (sensorX, temp=20), together with the fact that BusKKZ8674 operates BusLine435. Moreover, it would be possible to deduce that Silva is moving in traffic, but that he is in an air-conditioned bus. This may be very important during Rio's summertime, when outdoor temperatures can exceed 45 °C. Figure 4 shows the Mobile Hub with four SensorTags, each with six sensors (temperature, accelerometer, etc.).

V. RELATED WORK

In an early work, Adi et al. [1] present abstractions that describe semantic relationships between events, objects and tasks. These are defined as generalizations and associations, and through attributes that may reference events. Their abstractions are suitable for specification but cannot be computed efficiently. On the other hand, the work in [3] describes a system (ETALIS) that can perform reasoning over streaming events with respect to background knowledge, similar to our Knowledge Base. It implements two languages for the specification of event patterns: the rule-based ETALIS Language for Events (ELE), and Event Processing SPARQL (EP-SPARQL). ETALIS can evaluate domain knowledge on-the-fly, thereby proving semantic relations among events and reasoning about them. Their semantic relations among events are time-based, but do not impose the synchronicity requirement. Another difference is that they do not generate an RDF stream that is checked against a knowledge base. Thus, their inference is much simpler than the one proposed in our project.

Fig. 4. ContextNet Mobile Hub with four SensorTags

Tachmazidis et al. [23] propose a reasoning method over RDF triples based on defeasible logic (i.e., a non-monotonic logic), which can be implemented in a massively parallel way. They used Hadoop, an open-source implementation of the MapReduce paradigm, and a stratified rule set for more efficient processing of the knowledge base. Unlike our proposal, they do not handle stream processing and do not apply their method to reasoning for time-critical systems, such as cyber-physical systems (CPS). Moreover, their choice of defeasible logic limits the sorts of knowledge that can be inferred by their system. Temporal logic does not have this limitation, and we expect it to admit highly parallelizable implementations.

The CityPulse [15], StarCity [17] and FIESTA-IoT [2] projects also present research toward the use of semantic stream reasoning. All of these projects use the knowledge base in order to deduce new context/facts. However, they use a single-layer (or flat) ontology model, which differs from our ontology model organized as nested contexts. Moreover, none of these projects focuses on the problem of delivering real-time reasoning.

The FIESTA-IoT project [2] integrates several other projects, one of them being the CityPulse project [15]. The main goal of these projects is to achieve semantic interoperability at different levels (hardware, data, model, query, reasoning and application levels). The StarCity project has a similar idea, but it is aimed at using semantics to provide interoperability at the data level.

On the other hand, the work by Teymourian et al. [24] has the same focus as our work. They use a similar idea and combine SCEP rules (Semantic Web plus CEP) with a semantic knowledge base to deliver real-time reasoning. The difference is that our work represents context information with an ontology model organized as nested contexts, rather than with a flat ontology model. As a result, query processing is more efficient, because each query is processed using only a subset (a partition) of the knowledge base. Another difference is that we plan to insert new SCEP rules on-the-fly, based on new facts generated by the reasoning over the knowledge base. This should give applications a more efficient way to adapt to different situations. For example, in a monitoring application, a CEP rule that triggers an action in an altitude-related situation is only needed once the monitored person is at high altitude. Until then, this rule does not need to be deployed.
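The idea of deploying rules only when the knowledge base makes them relevant can be sketched with a small rule registry. A meta-rule reacts to facts produced by the reasoner and activates or removes a per-person monitoring rule. Everything here is hypothetical: the `locatedIn`/`highAltitudeRegion` facts, the 2500 m threshold and the `RuleEngine` class. None of it is an API of ContextNet or of any CEP engine.

```python
class RuleEngine:
    """Minimal registry of named CEP-like rules that can change at run time."""

    def __init__(self):
        self.rules = {}

    def add(self, name, rule):
        self.rules[name] = rule

    def remove(self, name):
        self.rules.pop(name, None)

    def process(self, event):
        """Run every active rule on the event and collect non-empty outputs."""
        outputs = []
        for rule in list(self.rules.values()):
            out = rule(event)
            if out is not None:
                outputs.append(out)
        return outputs


def make_altitude_rule(person, threshold=2500.0):
    """Hypothetical monitoring rule for one person's altitude readings (metres)."""
    def rule(event):
        kind, who, value = event
        if kind == "altitude" and who == person and value > threshold:
            return ("alert", person, "high-altitude situation")
        return None
    return rule


def on_new_fact(engine, fact):
    """Meta-rule: keep the altitude rule deployed only while the person is
    known to be in a high-altitude region."""
    subject, predicate, obj = fact
    if predicate == "locatedIn" and obj == "highAltitudeRegion":
        engine.add("altitude:" + subject, make_altitude_rule(subject))
    elif predicate == "locatedIn":
        engine.remove("altitude:" + subject)


engine = RuleEngine()
on_new_fact(engine, ("Ana", "locatedIn", "highAltitudeRegion"))
print(engine.process(("altitude", "Ana", 3000.0)))
```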

VI. DISCUSSION

Combining symbolic reasoning based on ontologies with Complex Event Processing has several advantages. Firstly, it makes it possible to leverage CEP's efficient processing of dense flows of simple events, not just over raw sensor events but also over RDF triples. Secondly, CEP's ability to produce complex events is also necessary for the iterative generation of higher-level information from lower-level bits of information.

On the other hand, while CEP is appropriate for processing the data carried by incoming events, it cannot detect certain domain-specific relationships between events. This is the case for events produced by distinct entities/objects that apparently have no relation with each other, or when the relationship cannot be directly encoded by the (meta-)information carried by the events. Symbolic reasoning using ontologies, in contrast, can model these “indirect” relationships among the monitored entities and/or their corresponding events very well. Hence, by using the results of a query over a domain-specific ontology during CEP-based continuous processing, it becomes possible to generate new sorts of events (i.e., fact events). These are produced independently by the Semantic Event reasoners in response to the consumption of some RDF triples. Such Fact events in some sense embody semantic knowledge that was forked off the knowledge base. They can in turn be further processed by other CEP engines, and may be used to predict events that have not yet happened but are a natural consequence of initial events detected by CEP.

This leads us to consider Semantic Web reasoners as a special kind of CEP engine. Such an engine has access to the knowledge base, consumes RDF events, and eventually produces fact events that are passed on to other CEP engines in the Event Processing Network (see Figure 2).

VII. TOWARDS AUTOMATED RULES AND PATTERNS INDUCTION

In this section, we briefly discuss prospects for using machine learning and induction techniques to extract useful information from the data stream. There are known and proven techniques for extracting information from data, from raw data to structured data and knowledge. However, most of them have been designed as off-line techniques, with the assumption that all data are present in working memory. Therefore, a major challenge is to adapt current techniques, where possible, to on-line processing of huge volumes of stream data, or to design new techniques. A good review of the issues (continuous data stream flow, unbounded memory requirements, mining changes, avoiding overfitting, etc.) can be found in [12]. Another good analysis can be found in [22].

Let us start with the raw data produced by the sensors. Unsupervised algorithms, such as sparse autoencoders, may be used to automatically extract higher-level features [16]. The basic idea is to use an autoencoder, a neural network with a hidden encoding layer and a decoding output layer of the same size as the input layer. An additional sparsity constraint on the activations is imposed in order to force each neuron to specialize as a detector of a specific feature. Training an autoencoder, sometimes called self-supervised learning, relies on traditional supervised learning of the identity function, as the autoencoder learns to reconstruct its input data at its output. Once the autoencoder is trained, extracting features from an input just requires feeding the input data forward and gathering the hidden activations. One may use successive levels (stacks) of autoencoders in order to extract more abstract features. Although standard training is off-line, an autoencoder may also be trained incrementally, through successive rounds of batch training.
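The pure-Python sketch below illustrates these mechanics for a single sparse autoencoder: a sigmoid hidden layer, a linear output layer of the input's size, a KL-divergence penalty pushing the mean hidden activations towards a small target (one common way of imposing the sparsity constraint, assumed here), full-batch gradient descent, and feature extraction by a forward pass. Layer sizes, `rho`, `beta` and the learning rate are illustrative assumptions; [16] trains far larger and deeper networks.

```python
import math
import random
from typing import List

Vector = List[float]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


class SparseAutoencoder:
    """Sigmoid hidden layer plus a linear output layer with the same size as the input."""

    def __init__(self, n_in: int, n_hidden: int, rho: float = 0.05,
                 beta: float = 3.0, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.n_in, self.n_hidden = n_in, n_hidden
        self.rho, self.beta = rho, beta  # target mean activation, sparsity weight
        self.W1 = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.W2 = [[rng.uniform(-0.1, 0.1) for _ in range(n_hidden)] for _ in range(n_in)]
        self.b2 = [0.0] * n_in

    def encode(self, x: Vector) -> Vector:
        """Feature extraction: feed the input forward and gather the hidden activations."""
        return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
                for row, b in zip(self.W1, self.b1)]

    def decode(self, h: Vector) -> Vector:
        return [sum(w * hj for w, hj in zip(row, h)) + b for row, b in zip(self.W2, self.b2)]

    def _mean_activations(self, hs: List[Vector]) -> Vector:
        return [sum(h[j] for h in hs) / len(hs) for j in range(self.n_hidden)]

    def loss(self, batch: List[Vector]) -> float:
        """Mean squared reconstruction error plus beta * sum_j KL(rho || rho_hat_j)."""
        hs = [self.encode(x) for x in batch]
        rec = sum(0.5 * sum((y - xi) ** 2 for y, xi in zip(self.decode(h), x))
                  for h, x in zip(hs, batch)) / len(batch)
        kl = sum(self.rho * math.log(self.rho / r)
                 + (1 - self.rho) * math.log((1 - self.rho) / (1 - r))
                 for r in self._mean_activations(hs))
        return rec + self.beta * kl

    def train_batch(self, batch: List[Vector], lr: float = 0.1, epochs: int = 100) -> None:
        """One round of full-batch gradient descent; call again on each new batch."""
        m = len(batch)
        for _ in range(epochs):
            hs = [self.encode(x) for x in batch]
            # d(beta * KL)/d h_j for one sample, via rho_hat_j = mean_i h_j^(i).
            sparse = [self.beta * (-self.rho / r + (1 - self.rho) / (1 - r)) / m
                      for r in self._mean_activations(hs)]
            gW1 = [[0.0] * self.n_in for _ in range(self.n_hidden)]
            gb1 = [0.0] * self.n_hidden
            gW2 = [[0.0] * self.n_hidden for _ in range(self.n_in)]
            gb2 = [0.0] * self.n_in
            for x, h in zip(batch, hs):
                d_out = [(y - xi) / m for y, xi in zip(self.decode(h), x)]  # linear output
                for k in range(self.n_in):
                    gb2[k] += d_out[k]
                    for j in range(self.n_hidden):
                        gW2[k][j] += d_out[k] * h[j]
                for j in range(self.n_hidden):
                    back = sum(self.W2[k][j] * d_out[k] for k in range(self.n_in)) + sparse[j]
                    d_hid = back * h[j] * (1 - h[j])  # sigmoid derivative
                    gb1[j] += d_hid
                    for i in range(self.n_in):
                        gW1[j][i] += d_hid * x[i]
            for j in range(self.n_hidden):
                self.b1[j] -= lr * gb1[j]
                for i in range(self.n_in):
                    self.W1[j][i] -= lr * gW1[j][i]
            for k in range(self.n_in):
                self.b2[k] -= lr * gb2[k]
                for j in range(self.n_hidden):
                    self.W2[k][j] -= lr * gW2[k][j]
```

Calling `train_batch` again on each new window of sensor readings corresponds to the successive rounds of batch training mentioned above; stacking would feed the `encode` output of one trained autoencoder into the next.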

An interesting end-to-end approach has been proposed by Ganz et al. [13]. The first step, named SensorSAX (for Sensor Symbolic Aggregate Approximation), discretizes the data into qualitative attributes, encoded as words over an alphabet. Then a clustering step, using the unsupervised k-means algorithm with time as one of the criteria, forms patterns, which are proto-concepts (not yet named concepts). Temporal relations between these proto-concepts are extracted by constructing a Markov model, a statistical predictive model of the temporal occurrences of proto-concepts. Three kinds of temporal relations are considered: occursAfter, occursBefore and occursSame. The last step consists of manual labeling, i.e., naming proto-concepts as symbolic concepts (e.g., “coldTemperature”). The authors are also considering the possibility of automatic labeling, derived from the labels of the sensors and a common-sense ontology.
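As a rough illustration of the discretization and Markov-model steps, the sketch below applies classic SAX (z-normalization, piecewise aggregate approximation and equiprobable Gaussian breakpoints) and then estimates first-order transition probabilities between consecutive symbols. This is a simplification under stated assumptions: SensorSAX's adaptive windowing and the k-means clustering step of [13] are omitted, and each SAX symbol stands in directly for a proto-concept, so a transition from `a` to `b` can be read as "b occursAfter a".

```python
import string
from collections import Counter, defaultdict
from statistics import NormalDist, mean, pstdev
from typing import Dict, Sequence


def sax(series: Sequence[float], n_segments: int, alphabet_size: int) -> str:
    """Classic SAX word for a series (assumes len(series) is a multiple of n_segments)."""
    mu, sigma = mean(series), pstdev(series)
    z = [(v - mu) / sigma if sigma > 0 else 0.0 for v in series]
    seg = len(z) // n_segments
    # Piecewise Aggregate Approximation: mean of each equal-length segment.
    paa = [mean(z[i * seg:(i + 1) * seg]) for i in range(n_segments)]
    # Breakpoints split N(0, 1) into alphabet_size equiprobable regions.
    breakpoints = [NormalDist().inv_cdf(k / alphabet_size) for k in range(1, alphabet_size)]
    return "".join(string.ascii_lowercase[sum(v > b for b in breakpoints)] for v in paa)


def transition_probabilities(symbols: str) -> Dict[str, Dict[str, float]]:
    """First-order Markov model: P(next symbol | current symbol) from observed counts."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for current, nxt in zip(symbols, symbols[1:]):
        counts[current][nxt] += 1
    return {c: {n: k / sum(nexts.values()) for n, k in nexts.items()}
            for c, nexts in counts.items()}
```

For example, a temperature trace that rises and falls again, `[1, 1, 1, 1, 5, 5, 5, 5, 9, 9, 9, 9, 5, 5, 5, 5]`, yields the word `"abcb"` with 4 segments and a 3-letter alphabet; naming `a` as, say, a cold reading would then be the (manual) labeling step.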

When starting from RDF triples, one may consider various knowledge extraction methods based on ontologies (mostly OWL-based) designed for the Semantic Web. One objective is induction, i.e., constructing more abstract knowledge (concepts/hypotheses) from the facts. Various algorithms exist that aim both at generalizing examples into concepts and at specializing these concepts so that they do not cover counterexamples. Inductive Logic Programming (ILP) is a seminal formalism, but there are many variants (see, for instance, the DL-Learner framework [8], which includes several of them), as well as related techniques such as decision tree construction and exploratory approaches based on genetic algorithms.
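A much simpler, propositional relative of these methods is the classic Find-S algorithm, sketched below to illustrate the generalization step: it generalizes positive examples into the most specific conjunctive concept covering all of them (replacing disagreeing attribute values by a wildcard), and then lists the counterexamples that this concept would wrongly cover and that a specialization step would have to exclude. The attribute-value encoding and the bus attributes are hypothetical; ILP and DL-Learner [8] operate on relational and description-logic representations instead.

```python
from typing import Dict, List

Example = Dict[str, str]
WILDCARD = "?"  # matches any value


def generalize(positives: List[Example]) -> Example:
    """Find-S: most specific conjunctive hypothesis covering all positive examples."""
    hypothesis = dict(positives[0])
    for example in positives[1:]:
        for attr, value in list(hypothesis.items()):
            if value != WILDCARD and example.get(attr) != value:
                hypothesis[attr] = WILDCARD  # generalize the disagreeing attribute
    return hypothesis


def covers(hypothesis: Example, example: Example) -> bool:
    return all(v == WILDCARD or example.get(a) == v for a, v in hypothesis.items())


def covered_counterexamples(hypothesis: Example, negatives: List[Example]) -> List[Example]:
    """Negative examples the hypothesis wrongly covers (candidates for specialization)."""
    return [n for n in negatives if covers(hypothesis, n)]
```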

An interesting proposal in [7] offers inductive as well as deductive reasoning on RDF data streams. Deductive reasoning is performed on queries constrained by concepts expressed in OWL, using C-SPARQL [6], an extension of the SPARQL RDF query language for continuous queries on RDF streams. Inductive reasoning is performed on a subset of the data so that it remains practically computable. The user defines statistical units (entities, e.g., persons) as well as a population of these entities (e.g., at a specific institution or location) on which they wish to make inductive queries. The inductive engine periodically updates data matrices representing the features of the population of statistical units considered (actually, there are two kinds of matrices: a stable long-term one and a short-term one representing the trends) and conducts a multivariate analysis of these matrices. The trained model can then be used to predict relationships between entities at query time.
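The idea of keeping both a stable long-term view and a short-term (trend) view of the population's feature matrix can be sketched with two exponential moving averages updated once per stream window. This is an assumption made for illustration only: [7] does not necessarily use exponential decay, and its multivariate analysis step is not shown. Unit names, feature names and rates are hypothetical.

```python
from typing import Dict, Iterable

Matrix = Dict[str, Dict[str, float]]  # statistical unit -> feature -> value


class DualRateFeatureMatrix:
    """Long-term and short-term feature matrices maintained over a stream of windows."""

    def __init__(self, units: Iterable[str], features: Iterable[str],
                 long_rate: float = 0.01, short_rate: float = 0.3) -> None:
        self.units, self.features = list(units), list(features)
        self.long_rate, self.short_rate = long_rate, short_rate
        self.long: Matrix = {u: {f: 0.0 for f in self.features} for u in self.units}
        self.short: Matrix = {u: {f: 0.0 for f in self.features} for u in self.units}

    def update(self, window: Matrix) -> None:
        """Fold one window of observations in; unobserved features count as 0."""
        for u in self.units:
            observed = window.get(u, {})
            for f in self.features:
                x = observed.get(f, 0.0)
                self.long[u][f] += self.long_rate * (x - self.long[u][f])    # slow, stable
                self.short[u][f] += self.short_rate * (x - self.short[u][f])  # fast, recent

    def trends(self) -> Matrix:
        """Positive values mark features that are currently more frequent than usual."""
        return {u: {f: self.short[u][f] - self.long[u][f] for f in self.features}
                for u in self.units}
```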

Last, it is also important to be able to extract temporal relations. A promising proposal is that of Georgala et al. [14], which efficiently extracts all possible temporal relations (following Allen's seminal taxonomy and algebra of temporal intervals) from time-stamped RDF streams.
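For reference, each of the thirteen Allen relations between two time intervals can be decided with a few endpoint comparisons, as in the sketch below. It assumes intervals given as (start, end) pairs with start < end, and uses a naive pairwise loop over all intervals; [14] proposes a more efficient approach than this quadratic enumeration. The relation names follow common usage and are not taken from [14].

```python
from itertools import combinations
from typing import Dict, Tuple

Interval = Tuple[float, float]  # (start, end) with start < end


def allen_relation(a: Interval, b: Interval) -> str:
    """Return the Allen relation holding from interval a to interval b."""
    (s1, e1), (s2, e2) = a, b
    if e1 < s2:
        return "before"
    if e2 < s1:
        return "after"
    if e1 == s2:
        return "meets"
    if e2 == s1:
        return "metBy"
    if s1 == s2 and e1 == e2:
        return "equals"
    if s1 == s2:
        return "starts" if e1 < e2 else "startedBy"
    if e1 == e2:
        return "finishes" if s1 > s2 else "finishedBy"
    if s2 < s1 and e1 < e2:
        return "during"
    if s1 < s2 and e2 < e1:
        return "contains"
    # Remaining case: proper partial overlap.
    return "overlaps" if s1 < s2 else "overlappedBy"


def all_relations(intervals: Dict[str, Interval]) -> Dict[Tuple[str, str], str]:
    """Naive O(n^2) extraction of the relation for every pair of named intervals."""
    return {(x, y): allen_relation(intervals[x], intervals[y])
            for x, y in combinations(sorted(intervals), 2)}
```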

In summary, there are various interesting directions for introducing automated machine learning and knowledge extraction techniques into our framework. One important issue is the dynamic nature of the data produced. Because of the continuous stream of data, we need to find good trade-offs between the demand for higher-level knowledge, the cost of extracting it (processing cost as well as memory cost/limitations) and the risk of it becoming obsolete, depending on the usage, the nature of the data and the available computing and communication resources. For instance, in the scope of our two-stage process, we believe that machine learning could be an effective way to consolidate into the knowledge base the facts that occur very frequently (see Figure 3). An example of such (in this case, inductive) learning is the identification of two temperature settings, inside an air-conditioned bus and outside, which could be extracted from repeated facts of passengers entering and exiting air-conditioned buses. Therefore, we need to examine carefully which machine learning techniques to insert into our framework, and at which stage.
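One simple way to decide which facts occur frequently enough to be consolidated into the knowledge base is a sliding time window with a count threshold, sketched below. The class name, window length, threshold and fact names are hypothetical, and plain counting is only a stand-in: a learned (e.g., inductive) model could replace the threshold rule.

```python
from collections import Counter, deque
from typing import Deque, Set, Tuple

Fact = Tuple[str, str, str]


class FrequentFactConsolidator:
    """Moves a fact into the KB once it occurs `threshold` times within `window` seconds."""

    def __init__(self, window: float, threshold: int) -> None:
        self.window, self.threshold = window, threshold
        self.recent: Deque[Tuple[float, Fact]] = deque()  # (timestamp, fact) in arrival order
        self.counts: Counter = Counter()
        self.kb: Set[Fact] = set()

    def observe(self, fact: Fact, ts: float) -> bool:
        """Record a fact; return True only when it is newly consolidated into the KB."""
        self.recent.append((ts, fact))
        self.counts[fact] += 1
        # Evict observations that fell out of the time window (bounded memory).
        while self.recent and self.recent[0][0] < ts - self.window:
            _, old = self.recent.popleft()
            self.counts[old] -= 1
            if self.counts[old] == 0:
                del self.counts[old]
        if self.counts[fact] >= self.threshold and fact not in self.kb:
            self.kb.add(fact)
            return True
        return False
```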

VIII. CONCLUSION AND FUTURE WORK

This paper presented a real-time reasoning approach for IoT systems based on semantic events and fact streams. The reasoning approach is based on the assumptions that all objects, people, buildings, places, vehicles, environments, etc. will have many tiny embedded sensors that emit simple events whenever some action is performed with/to them by an actor, and that each event will carry the item's UUID and an accurate timestamp. By enforcing the restriction that predicates in an RDF triple must be action-based, such as “kick”, “put”, “grab”, etc., rather than state-based, such as “has”, “is”, “belongs to”, etc., we are of course limiting the amount of information that the data/event streams are capable of expressing. However, we believe that action-based predicates are the really important ones for reasoning in IoT applications. All state predicates, on the other hand, should instead be represented by the nested context-based ontology in the Knowledge Base.

We are aware that this is only a first, initial step towards adding semantics to real-time reasoning over data streams, and that much more theoretical and practical research is required to validate our approach, evaluate it from a broader perspective and show its feasibility for large-scale and distributed IoT applications. However, we are confident that it is a promising first step. As next steps, we will finish the development of the Context Mappers and Semantic Event Rules using Esper's EPL (Event Processing Language) and deploy them on our mobile IoT middleware. In parallel, we will model a simple scenario with its main entities and their relationships, as described in Section IV, and represent it as nested contexts.
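The action-based restriction on predicates can be enforced at the moment events are constructed, as in the sketch below, where each event carries the UUIDs of the actor and item involved plus a timestamp. The action vocabulary and the `ActionEvent` class are hypothetical; in the proposed architecture, state-based predicates would instead be looked up in the nested context-based ontology of the Knowledge Base.

```python
import time
import uuid
from dataclasses import dataclass, field

# Hypothetical action vocabulary; state predicates such as "has" or "is" are rejected.
ACTION_PREDICATES = frozenset({"kick", "put", "grab"})


@dataclass(frozen=True)
class ActionEvent:
    """A simple sensor event: <actor UUID, action predicate, item UUID> plus a timestamp."""
    actor: uuid.UUID
    predicate: str
    item: uuid.UUID
    timestamp: float = field(default_factory=time.time)

    def __post_init__(self) -> None:
        if self.predicate not in ACTION_PREDICATES:
            raise ValueError(f"not an action-based predicate: {self.predicate!r}")
```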


Acknowledgements: Our ESMOCYP (Efficient Semantic MOdels and Fault-tolerant Middleware for CYber-Physical Systems) Project is supported by the PROBRAL CAPES–DAAD Brazil-Germany cooperation program (Process No. 8148/2015-05) and by a CAPES PVE fellowship to J.-P. Briot.

REFERENCES

[1] A. Adi, D. Botzera, and O. Etzion. Semantic Event Model and its Implication on Situation Detection. In European Conference on Information Systems (ECIS 2000), 2000.

[2] Y. Al-Hazmi and T. Magedanz. Towards semantic monitoring data collection and representation in federated infrastructures. In IEEE International Conference on Future Internet of Things and Cloud (FICloud), pages 17–24, 2015.

[3] D. Anicic, S. Rudolph, P. Fodor, and N. Stojanovic. Stream Reasoning and Complex Event Processing in ETALIS. Semantic Web, (1):1–5, 2009.

[4] A. Arasu, S. Babu, and J. Widom. The CQL continuous query language: Semantic foundations and query execution. The VLDB Journal, 15(2):121–142, June 2006.

[5] G. Baptista, F. Carvalho, S. Colcher, and M. Endler. A middleware for data-centric and dynamic distributed complex event processing for IoT real-time analytics in the cloud. In 34th Brazilian Symposium on Computer Networks and Distributed Systems (SBRC'2016), Salvador, Brazil, June 2016.

[6] D. Barbieri, D. Braga, S. Ceri, E. Della Valle, and M. Grossniklaus. C-SPARQL: SPARQL for continuous querying. In 18th International World Wide Web Conference (WWW'09), pages 1061–1062, 2009.

[7] D. Barbieri, D. Braga, S. Ceri, E. Della Valle, Y. Huang, V. Tresp, A. Rettinger, and H. Wermser. Deductive and inductive stream reasoning for semantic social media analytics. IEEE Intelligent Systems, 25:32–41, November–December 2010.

[8] L. Buhmann, J. Lehmann, and P. Westphal. DL-Learner – A framework for inductive learning on the semantic web. Web Semantics: Science, Services and Agents on the World Wide Web, 39(C):15–24, August 2016.

[9] I. Cafezeiro, J. Viterbo, A. Rademaker, E. Haeusler, and M. Endler. Specifying ubiquitous systems through the algebra of contextualized ontologies. The Knowledge Engineering Review, 29(02):171–185, 2014.

[10] K. S. Candan, H. Liu, and R. Suvarna. Resource description framework: Metadata and its applications. SIGKDD Explor. Newsl., 3(1):6–19, July 2001.

[11] M. Endler, G. Baptista, L. D. Silva, R. Vasconcelos, M. Malcher, V. Pantoja, V. Pinheiro, and J. Viterbo. ContextNet: Context reasoning and sharing middleware for large-scale pervasive collaboration and social networking. In ACM/IFIP/USENIX 12th International Middleware Conference Workshop on Posters and Demos Track, pages 2:1–2:2. ACM, 2011.

[12] M. Gaber, A. Zaslavsky, and S. Krishnaswamy. Mining data streams: A review. SIGMOD Record, 34(2):18–26, June 2005.

[13] F. Ganz, P. Barnaghi, and F. Carrez. Automated semantic knowledge acquisition from sensor data. IEEE Systems Journal, 10(3):1214–1225, September 2016.

[14] K. Georgala, M. Sherif, and A.-C. Ngonga Ngomo. An efficient approach for the generation of Allen relations. In 22nd European Conference on Artificial Intelligence (ECAI'2016), Den Haag, The Netherlands, August–September 2016.

[15] M. Giatsoglou, D. Chatzakou, V. Gkatziaki, A. Vakali, and L. Anthopoulos. CityPulse: A platform prototype for smart city social data mining. Journal of the Knowledge Economy, 7(2):344–372, 2016.

[16] Q. V. Le, M. Ranzato, R. Monga, M. Devin, K. Chen, G. S. Corrado, J. Dean, and A. Y. Ng. Building high-level features using large scale unsupervised learning. In 29th International Conference on Machine Learning, Edinburgh, U.K., 2012.

[17] F. Lecue, S. Tallevi-Diotallevi, J. Hayes, R. Tucker, V. Bicer, M. L. Sbodio, and P. Tommasi. Star-City: semantic traffic analytics and reasoning for city. In 19th International Conference on Intelligent User Interfaces, pages 179–188. ACM, 2014.

[18] D. C. Luckham. The Power of Events: An Introduction to Complex Event Processing in Distributed Enterprise Systems. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 2001.

[19] N. Fridman Noy and C. D. Hafner. The state of the art in ontology design – A survey and comparative review. AI Magazine, 18(3), 1997.

[20] L. Talavera Rios, M. Endler, and S. Colcher. An energy-aware IoT gateway with continuous processing of sensor data. In 34th Brazilian Symposium on Computer Networks and Distributed Systems (SBRC'2016), Salvador, Brazil, June 2016.

[21] L. Talavera Rios, M. Endler, I. Vasconcelos, R. Vasconcelos, M. Cunha, and F. Silva e Silva. The mobile hub concept: Enabling applications for the internet of mobile things. In 12th IEEE Workshop on Managing Ubiquitous Communications and Services (MUCS 2015), pages 123–128, St. Louis, MO, USA, March 2015.

[22] X. Su, E. Gilman, P. Wetz, J. Riekki, Y. Zuo, and T. Leppanen. Stream reasoning for the Internet of Things: Challenges and gap analysis. In 6th International Conference on Web Intelligence, Mining and Semantics (WIMS'16), Nîmes, France, June 2016.

[23] I. Tachmazidis, G. Antoniou, G. Flouris, S. Kotoulas, and L. McCluskey. Large-scale parallel stratified defeasible reasoning. In ECAI-12, 2012.

[24] K. Teymourian, M. Rohde, and A. Paschke. Fusion of background knowledge and streams of events. In 6th ACM International Conference on Distributed Event-Based Systems, DEBS'12, pages 302–313. ACM, 2012.
