16
Int. J. Ad Hoc and Ubiquitous Computing, Vol. x, No. x, xxxx 1 SensorStream: a semantic real-time stream management system Dimitrios-Emmanuel Spanos*, Periklis Stavrou and Nikolas Mitrou School of Electrical and Computer Engineering, National Technical University of Athens, Zografou 157 73, Athens, Greece E-mail: [email protected] E-mail: [email protected] E-mail: [email protected] *Corresponding author Nikolaos Konstantinou Athens Information Technology, 19,5km Markopoulo Ave., Peania, Greece E-mail: [email protected] Abstract: As data proliferates at increasing rates, the need for real-time stream processing applications increases as well. In the same way that Data Stream Management Systems (DSMS) have emerged from the database community, there is now a similar concern in managing dynamic knowledge among the Semantic Web community. Unfortunately, early relevant approaches are, to a large extent, theoretical and do not present convincing evidence of their efficiency in real dynamic environments. In this paper, we present a framework for the effective, real-time processing of streaming data and we define and analyse in depth its key components. Our framework serves as a basis for the implementation of the SensorStream prototype, on which we run numerous performance and scalability measurements that outline its behaviour and demonstrate its suitability and scalability for solutions that require real-time information processing from distributed and heterogeneous data sources. Keywords: stream data; sensors; semantic management; data management; ontology; dynamic knowledge; scalable framework; real-time; windowing; information processing; distributed sources; heterogeneity; performance measurements. Reference to this paper should be made as follows: Spanos, D-E., Stavrou, P., Mitrou, N. and Konstantinou, N. (xxxx) ‘SensorStream: a semantic real-time stream management system’, Int. J. Ad Hoc and Ubiquitous Computing, Vol. x, No. x, pp.xxx–xxx. Biographical notes: Dimitrios-Emmanuel Spanos received his Diploma Degree in Electrical and Computer Engineering from the National Technical University of Athens (NTUA) in 2005 and a Master in Business Administration (MBA) from the Athens University of Economics and Business (AUEB) in 2007. Since 2007, he is a PhD Candidate in NTUA and a member of the Multimedia Communications and Web Technologies Research Group. His research interests lie in the field of Semantic Web and knowledge representation including, among others, relational database to ontology mapping, semantic annotation of multimedia content from sensor networks and semantic search in the context of libraries and open access repositories. He is a member of the Technical Chamber of Greece. Periklis Stavrou graduated from National Technical University of Athens (NTUA) with a Dipl-Ing degree in Electrical and Computer Engineering in 2008. Since October 2008, he is part of the Multimedia Communications and Web Technologies Research Group in NTUA, working towards his PhD. Among his main research interests are semantic annotation, data fusion using semantic web technologies and knowledge representation and reasoning in distributed systems. He is a member of the Technical Chamber of Greece. Nikolas Mitrou is a Professor in National Technical University of Athens. His research interests are in the areas of digital communication systems and networks (broadband networks in particular) and networked multimedia in all range of studies: design, implementation, modelling, performance evaluation and optimisation. Since 1988, he has Copyright © 2012 Inderscience Enterprises Ltd.

SensorStream:asemanticreal-timestream …people.cn.ntua.gr/nkons/papers/sensorstream.pdfSensorStream: a semantic real-time stream management system 3 our system and the simulation

  • Upload
    others

  • View
    2

  • Download
    0

Embed Size (px)

Citation preview

Page 1: SensorStream:asemanticreal-timestream …people.cn.ntua.gr/nkons/papers/sensorstream.pdfSensorStream: a semantic real-time stream management system 3 our system and the simulation

Int. J. Ad Hoc and Ubiquitous Computing, Vol. x, No. x, xxxx 1

SensorStream: a semantic real-time stream

management system

Dimitrios-Emmanuel Spanos*, Periklis Stavrou

and Nikolas Mitrou

School of Electrical and Computer Engineering,National Technical University of Athens,Zografou 157 73, Athens, GreeceE-mail: [email protected]: [email protected]: [email protected]*Corresponding author

Nikolaos Konstantinou

Athens Information Technology,19,5km Markopoulo Ave., Peania, GreeceE-mail: [email protected]

Abstract: As data proliferates at increasing rates, the need for real-time stream processingapplications increases as well. In the same way that Data Stream Management Systems(DSMS) have emerged from the database community, there is now a similar concernin managing dynamic knowledge among the Semantic Web community. Unfortunately,early relevant approaches are, to a large extent, theoretical and do not present convincingevidence of their efficiency in real dynamic environments. In this paper, we present aframework for the effective, real-time processing of streaming data and we define and analysein depth its key components. Our framework serves as a basis for the implementationof the SensorStream prototype, on which we run numerous performance and scalabilitymeasurements that outline its behaviour and demonstrate its suitability and scalability forsolutions that require real-time information processing from distributed and heterogeneousdata sources.

Keywords: stream data; sensors; semantic management; data management; ontology;dynamic knowledge; scalable framework; real-time; windowing; information processing;distributed sources; heterogeneity; performance measurements.

Reference to this paper should be made as follows: Spanos, D-E., Stavrou, P., Mitrou, N. andKonstantinou, N. (xxxx) ‘SensorStream: a semantic real-time stream management system’,Int. J. Ad Hoc and Ubiquitous Computing, Vol. x, No. x, pp.xxx–xxx.

Biographical notes: Dimitrios-Emmanuel Spanos received his Diploma Degree in Electricaland Computer Engineering from the National Technical University of Athens (NTUA)in 2005 and a Master in Business Administration (MBA) from the Athens University ofEconomics and Business (AUEB) in 2007. Since 2007, he is a PhD Candidate in NTUAand a member of the Multimedia Communications and Web Technologies Research Group.His research interests lie in the field of Semantic Web and knowledge representationincluding, among others, relational database to ontology mapping, semantic annotation ofmultimedia content from sensor networks and semantic search in the context of libraries andopen access repositories. He is a member of the Technical Chamber of Greece.

Periklis Stavrou graduated from National Technical University of Athens (NTUA) with aDipl-Ing degree in Electrical and Computer Engineering in 2008. Since October 2008, he ispart of the Multimedia Communications and Web Technologies Research Group in NTUA,working towards his PhD. Among his main research interests are semantic annotation,data fusion using semantic web technologies and knowledge representation and reasoning indistributed systems. He is a member of the Technical Chamber of Greece.

Nikolas Mitrou is a Professor in National Technical University of Athens. His researchinterests are in the areas of digital communication systems and networks (broadbandnetworks in particular) and networked multimedia in all range of studies: design,implementation, modelling, performance evaluation and optimisation. Since 1988, he has

Copyright © 2012 Inderscience Enterprises Ltd.

Page 2: SensorStream:asemanticreal-timestream …people.cn.ntua.gr/nkons/papers/sensorstream.pdfSensorStream: a semantic real-time stream management system 3 our system and the simulation

2 D-E. Spanos et al.

been actively involved in many RACE, ACTS and ESPRIT projects and he was thecoordinator of one of them (AC235, WATT). He is a member of the IEEE, member of theIFIP WG 6.3 and member of the Technical Chamber of Greece.

Nikolaos Konstantinou received his Engineer Diploma from the National TechnicalUniversity of Athens (NTUA), School of Electrical and Computer Engineering (ECE),in February 2004. In November 2004, after having fulfilled his military obligations he wasaccepted as a PhD Candidate in NTUA, School of ECE, Division of Communication,Electronic and Information Engineering. In December 2009, he successfully defendedhis PhD Thesis, entitled ‘Semantic information enrichment in relational databases andcontext-aware systems’. During his studies, he participated in a number of nationallyfunded research programme while also obtaining valuable experience working as aSoftware Engineer. In February 2010, he joined the Athens Information Technology wherehe currently works as a Post-doctoral Researcher. His research interests include knowledgemanagement, context-sensitive systems and information flow in the internet of things.His research resulted in several publications, with a very encouraging reception in terms ofcitations. He is a member of the Technical Chamber of Greece and a member of the IEEE.

1 Introduction

In recent years, technological advancements inmicrosensor technology and increasing businessneeds have laid the ground for an emerging class ofapplications that shift their focus towards data streams.The availability of large-scale sensor deployments resultsin huge, often unbounded, quantities of data generatedby all kinds of sensor devices, calling for efficientmanagement and storage. In the same time, businessesseek ways to perform analysis of their data, be it financial,medical, customer or corporate network data, as soonas it becomes available, to have an accurate picture ofthe market situation and be able to respond in a timelymanner. These needs originally led to the development ofseveral DSMSs during the last decade (Arasu et al., 2004;Chandrasekaran et al., 2003; Abadi et al., 2005), so asto fill the gap left by traditional Database ManagementSystems (DBMSs), which cannot deal with continuous,real-time sequences of data (DSMS). DSMS introduced anovel rationale that is not based on persistent storage ofall available data and user-invoked queries, but insteadadvocates on the fly stream manipulation and permanentmonitoring queries.

In the parallel universe of Knowledge Engineering,we can see that while reasoners have excelled theirperformance with regard to static knowledge, the issueof reasoning with frequently changing facts andassertions has not been adequately addressed yet.Stream reasoning has been coined as a term thatdescribes the above-mentioned problem (Della Valleet al., 2009) of performing reasoning on a KnowledgeBase (KB) comprising stable or occasionally changingterminological axioms and a stream of incomingassertions or facts. Advances and solutions to thisproblem will progressively lead the way for smarter andmore complex applications, including traffic managementand fastest route planning, environmental monitoring,surveillance and object-tracking scenarios, and diseaseoutburst detection applications among others.

So far, work in this relatively new field includesre-definition of the RDF model and enhancementsof current W3C recommendations for time-awareknowledge representation and querying (Gutierrez et al.,2007; Pugliese et al., 2008; Rodriguez et al., 2009; Bolleset al., 2008; Barbieri et al., 2009), incremental reasoningtechniques (Cuenca Grau et al., 2007; Perez et al., 2006;Volz et al., 2005) and frameworks that use combinedexperience from the database and Semantic Web researchfields (Walavalkar et al., 2008; Barbieri et al., 2010).However, despite the need for working applicationsthat can manage and query efficiently streaming data,there is considerable lack of such tools in the literaturesince most approaches remain theoretical and fail toprovide evaluation of their performance that will serve asadequate proof of their utility.

In this paper, we present a generic ontology-based architecture for the efficient SPARQL queryingof streaming XML data and event recognition. Ourframework does not impose any restriction to the natureof the incoming data, as long as they are encoded in awell-formed XML template. Moreover, our frameworkcan be tailored to any application context according tothe ontology selected for use. Sets of domain-specificrules define the system’s attitude and the actions tobe taken, when a high-level event is being recognised.We created the SensorStream prototype on top of JenaSemantic Web framework (Carroll et al., 2004) as aworking implementation of our architecture and ranexperiments to assess its performance. Results showthat our system is perfectly scalable, has near real-timeresponse and can handle changes in the terminologicalpart of the ontology, providing sound inferencesat all times.

The rest of the paper is organised as follows:in Section 2, some background information on the streammanagement problem as well as the motivation behindour work are presented. Section 3 mentions the relatedwork, while Section 4 presents in detail our framework.Section 5 provides an evaluation of the efficiency of

Page 3: SensorStream:asemanticreal-timestream …people.cn.ntua.gr/nkons/papers/sensorstream.pdfSensorStream: a semantic real-time stream management system 3 our system and the simulation

SensorStream: a semantic real-time stream management system 3

our system and the simulation experiments run on.Finally, Section 6 highlights the conclusions and sketchesout future work.

2 Background and motivation

In general, the issue of efficient management andquerying of dynamic data is related to several researchdomains. We have already mentioned that the needfor stream processing had originally arisen within thedatabase community when it became evident that typicalDBMS could not support applications where datais dynamic and processing requests are event-drivenrather than human-initiated. Traditional DBMSs are ineffect passive repositories of data that execute querieson user’s demand, therefore they are inadequate tohandle streaming data of possibly infinite size andexecute continuous queries on them. This has led to theintroduction of several prototype systems (Arasu et al.,2004; Chandrasekaran et al., 2003; Abadi et al., 2005;Chen et al., 2000; Liu et al., 1999) that constitute thegroundwork of the recently emerged stream managementproblem (Babcock et al., 2002).

In a nutshell, DSMSs, introduce various methodsto overcome the unbounded nature of stream data andthe problems it poses on query execution. Most DSMSsextend the relational model by including a timestampas a label for each relational tuple. This timestampcan either denote the arrival time of the tuple in thesystem or constitute a property of the tuple itself,e.g., the occurrence time of the event this tuple describes.The first kind of timestamp is called implicit, whereasthe second one is referred to as explicit. The timestampis usually the indicator value acting as a basis for theformulation of stream data windows. The notion ofwindows in stream manipulation tasks is a common one,since it greatly simplifies query execution by consideringonly finite subsets of the entire stream. A window overa stream is defined as a finite portion of the streamat some point in time. This portion can be defined interms of either time or number of tuples: in the formercase, the window is said to be time-based or logical,whereas in the latter case, the window is called tuple-based or physical. The elements of the stream that formthe window change as new stream data becomes availablein a DSMS system.

According to the definitions given in Patroumpas andSellis (2006), a logical window for a stream S of tuples isthe set of tuples with timestamps occurring in the [t1, t2]interval:

wl(S, t1, t2) = {S(τ), t1 ≤ τ ≤ t2}. (1)

On the contrary, a physical window applied at timeinstant τ contains the N most recent tuples of a stream:

wp(S, N, τ) = {wl(S, t1, τ) : t1 ≤ τ ∧ |wl(S, t1, τ)| ≤N

∧∀t2 < t1 : |wl(S, t2, τ)| > N}. (2)

Intuitively, the above-mentioned definition specifies t1as the timestamp of the oldest among the N mostrecent tuples and uses it as the start of a logicalwindow spanning until τ , which denotes the current timeinstant.

Depending on the way that windows are updated,they can be categorised as sliding, tumbling or landmarkwindows. Sliding windows have a fixed length andboth their bounds are moving at a predefined interval,replacing old stream elements with new ones. A slidingwindow with size w and progression step d can bedefined as:

wsl(S, w, d, t0, τ)

=

wl(S, t0, τ), t0 ≤ τ ≤ t0 + w and

mod (τ − t0, d) = 0

wl(S, τ − w, τ), t0 + w < τ and

mod(τ − t0, d) = 0

wsl(S, w, d, t0, τ − dt), mod(τ − t0, d) �= 0

(3)

where t0 is the time of the initial application of thewindowing method and τ is the current time instant.Equation (3) shows that sliding windows are computedat fixed time intervals d, starting at t0. For intermediatetimes, the sliding window stays the same as the onecomputed in the preceding time instant (where dt denotesthe time resolution of the system), hence the recursivenature of the last branch of the definition. Usually,progression step d is smaller than window size w, leadingto overlaps between two successive sliding windows.

Tumbling windows can be considered as a special case ofsliding windows, where the window is moving at a fixedinterval of length equal to the window size (d = w). Thus,tumbling windows are pairwise disjoint and do not shareany stream elements:

wtum(S, w, t0, τ) = wsl(S, w, w, t0, τ) (4)

Landmark windows are another type of window, wherethe beginning bound tl remains fixed while the other oneis moving, resulting in a variable window size (Arasuet al., 2006):

wland(S, tl, τ) =

wl(S, tl, τ), τ ≥ tl

0, τ < tl. (5)

A graphical representation of the above-mentionedthree types of logical windows is shown in Figure 1.Physical (tuple-based) windows are defined analogouslyas their logical counterparts in equations (3)–(5). Besidesthe aforementioned rudimentary windowing methods,semantics of other windowing techniques have beenproposed in the literature, such as the partitionedwindow technique (Li et al., 2005), where windows canbe formed according to attributes of interest, and themuch resembling notion of predicate windows (Ghanem

Page 4: SensorStream:asemanticreal-timestream …people.cn.ntua.gr/nkons/papers/sensorstream.pdfSensorStream: a semantic real-time stream management system 3 our system and the simulation

4 D-E. Spanos et al.

Figure 1 Time-based window types: (a) Sliding window with size w and progression step d; (b) tumbling window with size w and(c) landmark window with lower bound tl

et al., 2006). In our approach, we favour the use of asliding physical window, which can be lightly formed andupdated.

Other popular stream-processing techniques includefiltering, punctuation and synopses. Filtering, as the namesuggests, mainly consists of applying criteria for theselection of a subset of stream elements and ignoring therest, while punctuation uses special markers inside thestream to indicate that the stream elements that havearrived so far can be partially processed and blockingoperators (e.g., a maximum or a sorting operator) canbe applied to them. Synopses are mere summaries andaggregations of incoming data, extremely useful whenapproximate answers are required. A brief overview ofthese techniques is provided in Maier et al. (2005).

Lately, a similar tendency has been observed inthe Semantic Web research community. While thetrade-off between expressive knowledge representationlanguages and reasoning complexity has been extensivelydocumented so far (Baader et al., 2005) and toolsthat offer scalable and efficient ontology storageand reasoning solutions surface at an increasing rate(Broekstra et al., 2002; Carroll et al., 2004; Dasand Srinivasan, 2009; Erling and Mikhailov, 2010),the focus has been mainly put on static collectionsof data and ontologies. Relatively few methods andframeworks have been proposed for the processing andmanagement of dynamic semantic data and ontologiesand even fewer working application examples havebeen implemented so far. Such efforts are presentedin the next section.

3 Related work

The lack of working stream processing applicationsin the Semantic Web context, pointed out in Section 2,is partly due to the fact that current popular W3Crecommendations OWL 2 (Motik et al., 2009) andSPARQL (Prud’hommeaux and Seaborne, 2008) donot offer satisfactory support for time-evolving data.Modelling the notion of time is essential, since time isa primordial factor in dynamic systems and streamingapplications. Considerable effort has been made towardsexpanding the RDF model so as to incorporate time.The most notable efforts to enhance RDF with timefeatures and provide an adequate respective querylanguage are presented in Gutierrez et al. (2007), Puglieseet al. (2008). Another extension of RDF incorporating

the notion of time is presented in Rodriguez et al. (2009),where every RDF resource considered is time annotated.An extension of SPARQL is proposed along with theabove-mentioned Time-Annotated RDF extension andthey are both applied in an indicative architecture to storeand query time-varying data. Essentially, this extensionworks merely as a synopsis of a more complex RDFrepresentation that uses few additional proprietary RDFpredicates.

Related efforts that are worth mentioning are theTime Ontology in OWL (Hobbs and Pan, 2006), whichdescribes general temporal information and O&M-OWL(Henson et al., 2009), modelling time series sensorobservations. Likewise, additions to the SPARQL querylanguage enabling continuous queries over windowedstream data have been incorporated in the C-SPARQL(Barbieri et al., 2009) and Streaming SPARQL (Bolleset al., 2008) approaches. As these approaches define newlanguages or update existing ones, they are significantlydifferent from the work presented here.

Most of the approaches dealing with managementand query processing of streaming data build on thecombination of existing DSMS and experience onreasoning and storage techniques for static knowledge,while using some of the aforementioned time-enhancedlanguages. To name just one example, Walavalkaret al. (2008) built on existing mature projects Jena(Carroll et al., 2004) and TelegraphCQ (Chandrasekaranet al., 2003) for the reasoning procedure on the staticontology and stream processing, respectively, to diminishthe processing time of RDF stream messages. Theentire subsumption hierarchy for OWL classes andproperties is computed only once in a preliminary stage,and stored in a database together with the inferenceresults from domain and range statements. Then, onruntime, as RDF statements arrive, new statementsare interpolated in the stream based on the inferencesstored in the system’s database. The performance ofthis approach clearly depends on the number of theinitially inferred statements. The underlying assumptionof this work is that all stream messages abide by acommon static ontology schema, making no previsionfor possible changes in the set of ontology axioms.Another drawback of this approach is the fact thatit utilises the stored inferences for only a pre-specifiedclass of interest, ignoring stream messages not obeyingthis restriction.

A prototype architecture that provides a proof ofconcept for the efficiency of the C-SPARQL language

Page 5: SensorStream:asemanticreal-timestream …people.cn.ntua.gr/nkons/papers/sensorstream.pdfSensorStream: a semantic real-time stream management system 3 our system and the simulation

SensorStream: a semantic real-time stream management system 5

in continuous querying of stream data is describedin Barbieri et al. (2010). This architecture combines aplain SPARQL engine with a DSMS to achieve thedesired functionality. According to the above-mentionedarchitecture, every continuous query is split into twoparts: a static and a dynamic one, with the first oneinvoking the reasoner to infer new knowledge andthe second one delegating execution to the underlyingDSMS. This implies that reasoning takes place onlyonce for every continuous query, right after its initialregistration with the system. Thus, any modification inthe terminological part of the knowledge base does notreflect in the results of a query that has already beenregistered.

Semantic Streams (Whitehouse et al., 2006) is aframework that specifies a predicate-based representationof sensors and their measurements and allows usersto pose Prolog-like queries. The answers are evaluatedusing a backward chaining variant where a goal tree isconstructed and consecutive variable bindings are triedout until an answer is found. The stream nature ofsensor data seems to be overlooked in this approach,and no performance tests are provided, though it ishighly unlikely that real-time requirements are satisfiedgiven that execution of Prolog programs can be highlyinefficient. Another mainly theoretical framework for astream-processing middleware is DyKnow (Heintz et al.,2009), where emphasis is given on stream reasoningmethods and on the devision of a representation languagefor policies and processes applied to the incomingstreams.

SensorMasher (Le-Phuoc and Hauswirth, 2009) is aplatform for publishing sensor data as Linked Data(Bizer et al., 2009), based on a layered architecture,similar to ours. However, the focus in that work lies onthe exposure of sensor data to the user under differentways (Web service, HTTP representation, SPARQLendpoint, etc.) and less on the management aspect andreal-time processing of the incoming stream data.

A context-aware environmental-monitoring systemthat relies on a set of rules and a reference ontology todeduce high-level events from low-level sensor readingsis presented in Calder et al. (2010). This approachshares several features with our system, nevertheless the

streaming nature of incoming sensor data is neglectedand the performance of the system in relevant dynamicenvironments remains unclear. Information extractionfrom data that arrives in streams is also performedby complex event processing architectures (Gyllstromet al., 2007; Demers et al., 2007) that define proprietaryquery languages for data retrieval. These applications,though, are only marginally related to our work, sincethey lack semantic representation of both incomingdata and application context. Our SensorStream systemborrows ideas from several of the above-mentionedapproaches, building on the previous work in context-aware middleware (Konstantinou et al., 2010).

4 The SensorStream approach

In this section, we describe a layered architecture forthe efficient real-time processing and reasoning withstream data. This architecture has served as a basisfor the implementation of our SensorStream prototypeand is depicted in Figure 2. In the rest of this section,we enumerate the core SensorStream components anddetail on them.

Information sources: Information sources are the contentproviders for our system. In this paper, we giveemphasis to streaming sources that constantly providenew, possibly unbounded, information. Such examplesof information streams include sensor data, clickstreamdata, financial data or even network monitoringmessages. Our framework is applicable to all kindsof streaming data, no matter what its source is.This suppleness is guaranteed by a common language allstream messages should adhere to, as explained later.

In some cases, an information source comprises asensing device observing its environment and one or moredistributed trackers. Trackers are software componentsthat process data to extract more meaningful and usefulinformation. For instance, a tracker can process livevideo streams to extract events of interest, such asin Lien et al. (2007), Zhang and Chang (2002), where thetrackers extract events from baseball video streams, orin Comaniciu and Meer (1999), Bradski (1998) and Allen

Figure 2 Information flow in SensorStream

Page 6: SensorStream:asemanticreal-timestream …people.cn.ntua.gr/nkons/papers/sensorstream.pdfSensorStream: a semantic real-time stream management system 3 our system and the simulation

6 D-E. Spanos et al.

et al. (2001) where the algorithms deal with the face orobject-tracking problem.

The trackers are considered to be distributed, in thesense that there is no need of message exchange orcommunication between them at any level. A system likethe hereby-presented approach can rely on a numberof distributed trackers without the need of a centralisedcoordination, with the exception of a common clockin cases when this is deemed necessary. In case theinformation processed needs to be consistent to acommon clock and present continuity and timeliness, atime server could be contacted.

Message language and mechanism: The various systemcomponents need to communicate via well-definedinterfaces. Therefore, there is a need of specifying:

1 A common language. In the current case, thelanguage of choice is XML, to allow interoperabilitywith any specific tracker. The structure and contentof XML messages can be conveniently tailored tothe needs of any application by complying tospecific schema definitions (XSDs). Thus, we relaxthe assumption made by the majority of approaches– as seen in Section 3 – where incoming stream datais a set of “ready to be consumed” RDF statements.This is clearly not the case for real applications anddata generators, which usually do not provide datacoupled with context or other semanticinformation.

On the contrary, the assumption that data ormeasurements are readily available in XML is not afar-fetched one, especially in the case of financial orWeb applications’ data. In the case of sensing data,few developing effort is needed to convert thecustom response format of a typical sensor device(also known as mote) to an XML template ofchoice. Thus, we can assume that such small-scaleXML converters are part of every tracker’s or datasource’s interface with the other frameworkcomponents. This renders our framework directlyapplicable even to situations when regular motes,usually forming a wireless sensor network are usedas data providers supporting high-levelapplications.

2 Common or well-interoperable technologies.To establish communication inside a widelydistributed network of heterogeneous devices, onehas many options: the Java Messaging System(JMS), sockets or Web services to name a few.In the studied approach, Web services have beenchosen because of the independence fromtechnologies, portability to any platform, scalabilityand maturity of the recommended approaches.Apache Axis2 and CXF are only a few of theprojects supporting the related W3Crecommendations.1

3 Interfacing. Well-defined and completefunctionality should be exposed to the informationsources and, more precisely, to the trackers. For thepurpose of the current implementation, only onemethod is needed, to send the XML messagecarrying the information from the trackers to theserver. Additional methods could be applied to senda server’s response in the form of commands to thetrackers. Such interfacing methods are necessarywhen information sources need to receive feedbackfrom the main processing unit and, hence, adaptthemselves in environmental conditions or intricatehigher-level application needs, possibly expressedin the form of semantic rules.

Message fusion mechanism: To collect, separate andthereafter fuse the incoming messages, a wide varietyof approaches exist in the literature. In fact, datafusion from a variety of sources constitutes a livelyresearch domain with a variety of open issues and ahighly active community (Liggins et al., 2009). As XMLinherently carries only structural information, to combineinformation from multiple sources it is essential toenhance semantically this information and ensuresemantic interoperability. In the approach presented here,mapping rules are used to combine information frommultiple sources and leverage its meaning by producinghigher-level information.

To be more specific, the information contained inthe XML stream messages contributes and modifiesthe knowledge of the system. Ideally, the intendedbehaviour of the system consists of evolving the ontologyrepresenting the knowledge core of the system, accordingto the incoming events encoded in XML. To this end,a set of mapping rules are employed to add statementsto the knowledge base. These rules are formed accordingto the following event-condition-action (ECA) pattern(Papamarkos et al., 2003):

on event if condition then actions

where the event in our case is an XML message arrivaland the condition is applied to the incoming XMLmessage. These mapping rules fetch data from the XMLmessage and store it into the ontology model in theform of class individuals. They are expressed in a customlanguage allowing for combination of the XML and RDFtechnical spaces, where typical examples of conditionsin the above-mentioned clause include the presence of aspecific XPath expression, whereas actions usually consistof the addition of individual assertions. These rulesare constructed manually from a user who is aware ofthe possible templates of the incoming XML messagesand the ontology structure as well. The full grammarspecification of the mapping rule language and details onthe system’s user interfaces can be found in Konstantinouet al. (2010).

This design choice automatically raises the limitationof a single common schema for all stream elements,

Page 7: SensorStream:asemanticreal-timestream …people.cn.ntua.gr/nkons/papers/sensorstream.pdfSensorStream: a semantic real-time stream management system 3 our system and the simulation

SensorStream: a semantic real-time stream management system 7

assumed by other relevant approaches. DifferentXML templates can be considered for every distinctdata generator, as long as there is at least onecorresponding mapping rule to handle the ontologypopulation. An alternative choice for the XML to OWLtransformation would involve application of XSLTstylesheets to the incoming XML messages, followingthe GRDDL approach (Conolly, 2007). Although ourimplementation can handle either of these approaches,for the test scenario presented in this paper, we followedthe mapping rules approach.

Information manipulation: With this term, we referto information administering, processing, storing andexploiting. The technologies that are chosen have to beable to cooperate smoothly with each other. For instance,regarding storage, the available options include relationaldatabases, distributed (NoSQL) databases and semanticweb RDF stores. Respectively, processing and queryingshould be done using SQL, keyword-based approaches,2

or SPARQL queries. Therefore, the decisions that have tobe taken regarding the information flow and the relatedtechnologies are greatly affected by the purpose of thewhole attempt.

In the present approach, we follow the SemanticWeb point of view, to study the correspondingbenefits and difficulties. However, such a view entailsthe necessity of making a number of additionalchoices.

1 The reasoner. We need to be aware that in theSemantic Web context, reasoning is an issue thatneeds to be specifically taken care of. Poor decisionsin the choice of a reasoner, selection of reasoner’sconnectivity method with the application andin some cases even fine tuning the reasoner can leadto serious performance degradation. A seriesof reasoners are nowadays available in thebibliography, including FaCT++ (Tsarkov andHorrocks, 2006), Pellet (Sirin et al., 2007), KAON2(Maier et al., 2005) and OWLJessKB3 amongothers. For the current implementation, thereasoner chosen is Pellet since it is the one that isopen-source, Java-based, allows integration with theapplication’s code, is based on a complete DLreasoning algorithm but, most importantly, Pellethas the feature of incremental reasoning, which isclosely related to the characteristics a real-timesystem should present. The incrementalclassification feature of Pellet allows for improvedperformance in cases of frequent axiommodifications in the KB, given that the classificationhierarchy is incrementally updated instead of beingrecomputed from scratch with every axiom change.Furthermore, incremental consistency checkingensures that in an environment of continuous ABoxadditions or deletions, completion rules are appliedto the updated completion graph and not startingagain from scratch (Halaschek-Wiener et al., 2006).

2 The ontology. The ontology that has to be chosen todescribe the concepts involved in each problemshould be carefully selected while keeping in mindthat a highly expressive ontology entails increasedcomplexity and additional computational burdenfor the reasoning procedure. Thus, among the vastvariety of ontologies available on the Web, specialcare needs to be taken in choosing one and tailoringit to the application’s needs. It is important to notethat we do not consider a standard generic ontologythat covers most application cases, although morethan a few have been proposed in the relatedsituation awareness literature, e.g., Little andRogova (2009), Kokar et al. (2009). As the choice ofthe ontology to be used sets the application context,our framework is kept application-independent anddomain-agnostic.

As we have seen in Section 3, the notion of time isvital in stream-processing applications. Hence, weargue that the ontology to be chosen should be ableto describe the time an event or a measurementtook place instead of relying to timestamps thatusually denote the time of arrival of a streamelement. The use of such timestamps does not helpsort out the true sequence of events since therespective notifications may arrive in disorder owingto variable delays in the routing from the sourceto the receiver. Ensuring that incoming informationis already annotated with its time of occurrencehandling elegantly the infamous event disorder issuein data streams, under the assumption that alldisparate sources share a common timereference frame.

3 A semantic ruleset. Generally, the inclusion of a setof rules encoding the application or domain-specificknowledge is optional and depends on theapplication designer’s choice. In the SensorStreamcase, semantic rules are used to recognise events orsituations of interest and perform some actionaccording to the current application (e.g., issue ofan alert). The form and syntax of these rulesresembles to the one of the mapping rules presentedearlier, with the difference that, in the case of theformer, the condition refers to the ontology model.Example conditions include checks on the existenceor number of individuals for a given class or checkson the existence of a non-empty result set for agiven SPARQL query. Therefore, the condition partof SensorStream’s semantic rules can be replaced byan appropriate SPARQL/Update query,guaranteeing that an expressiveness equal to that ofSPARQL is achieved.

Windowing is an important decision that has to betaken since, by definition, streams are unboundedand cannot fit in memory to be processed as usual.Windowing concerns three important decisions that haveto be taken:

Page 8: SensorStream:asemanticreal-timestream …people.cn.ntua.gr/nkons/papers/sensorstream.pdfSensorStream: a semantic real-time stream management system 3 our system and the simulation

8 D-E. Spanos et al.

1 The window measurement unit. This choice refers tothe basic unit serving as the window definition basis.In this aspect, a window can be either physical orlogical. Physical windows are defined in terms ofphysical units, which may be tuples in the case ofDSMS or RDF triples in the Semantic Web context.Logical windows, on the other hand, as alreadystated in Section 2, are typically defined in terms oftime units.

SensorStream follows a different path, by definingwindows in terms of the number of KB individualsthey refer to. This point of view is significantlydifferent from triple-based physical windows andoffers significant benefits. An individual-basedwindow may not have a fixed statement length butinstead guarantees that all statements involving anindividual are kept in the current window range.On the contrary, in the triple-based physical windowcase, statements are dropped out of the windowbased solely on their timestamp and not on theirrelationship with other statements. For example, anincoming statement might refer to an individual thatparticipates in older statements already dropped outof the triple-based window and, hence, knowledgeabout it is incomplete. An individual-based windowdecreases considerably the degree of incompletenessby dropping out groups of statements that refer tothe oldest individual in the KB.

An individual-based window is defined over astream of RDF triples, where every element of thestream can be written as (< s, p, o >, τ). We firstintroduce the individual timestamp tind for a givenentity s and time τ as the timestamp of its mostrecent appearance as subject in some RDF triple ofstream S measured in timestamp τ :

tind(s, τ) = {max(t) : (< s, p, o >, t) ∈ S

∧t ≤ τ} (6)

We then define an individual-based logical windowas the set of the RDF triples that contain as subjectan entity with individual timestamp tind occurringin the [t1, t2] interval:

w′l(S, t1, t2) = {(< s, p, o >, τ) ∈ S :

t1 ≤ tind(s, t2) ≤ t2}. (7)

From the above, similarly to equation (2), we candefine an individual-based physical window of sizeN as the set of the RDF triples that contain assubject one of the N most recently appearedindividuals in stream S:

w′p(S, N, τ) = {w′

l(S, t1, τ) : t1 ≤ τ

∧|w′l(S, t1, τ)|I ≤ N

∧∀t2 < t1 : |w′l(S, t2, τ)|I > N} (8)

where | · |I denotes the size in terms of individuals.The definitions for the sliding, tumbling andlandmark logical windows given in Section 2 can beadapted to the individual-based case, by replacingtypical tuple-based logical windows wl withindividual-based ones w′

l and expressing windowsize w and progression step d as a number ofindividuals.

2 The window size: We need to note that the windowsize causes a proportional increase in the systemlatency and, therefore, these two parameters have tobe balanced accordingly. On the one hand, toprocess data in real time we would like to have allthe information available in memory but, on theother hand, latency should be restricted toacceptable levels.

3 The window behaviour: In Konstantinou et al.(2010), a system is investigated where the bufferis ‘flushed’ after the processing latency exceededcertain thresholds. This behaviour, althoughconvenient for a large number of cases, is notalways optimal since maintenance operationscould take place at inconvenient times (e.g., duringthe occurrence of an event of interest) leadingto incorrect results. The approach followedin SensorStream deals with this problemby maintaining the latest information atall times.

As we will see in the next section, the application of awindow ensures that the response time stays within somefinite bounds. This, of course, happens at the expenseof the amount and completeness of information storedin the cache KB, upon which the semantic rules areapplied. This clearly prevents us from declaring rules thatmake use of older information and, thus, we rely solelyon the information kept inside the window. Dependingon the application requirements, different choices of thewindow size and the amount of information availableat real time can be made. For example, in a persontracking application taking place in a crowded area,a window of 100 ontology individuals might proveinsufficient for real time recognition of high-level events,while the same window would probably be satisfactoryfor the real-time monitoring of a natural phenomenon.This prevents us from establishing a general applicationquality metric, although we could risk saying that in mostcases, application quality is positively correlated with thecompleteness of information and, therefore, with windowsize as well.

Regardless of the window choice, and in contrastwith most systems employing windowing techniques,SensorStream stores older elements that are dropped outof the current window in a persistent KB to enablehistorical offline queries, as shown in Figure 2.

In brief, the course of actions taken by ourSensorStream prototype as a new information elementarrives is illustrated in Algorithm 1. When a new

Page 9: SensorStream:asemanticreal-timestream …people.cn.ntua.gr/nkons/papers/sensorstream.pdfSensorStream: a semantic real-time stream management system 3 our system and the simulation

SensorStream: a semantic real-time stream management system 9

stream element becomes available, reasoning on thecurrent instance of the cache KB must be performedto ensure that all inferences regarding the new elementare computed. Fortunately, the redundancy impliedin reasoning over successive KB instances can behandled by a reasoner (e.g. Pellet) employing incrementalreasoning techniques. This step is essential to builda system that is both knowledge-aware and real-time,given that:

• A simple unprocessed message insertion in thecache KB does not render the systemknowledge-aware, since no implicit informationcan be extracted and added as well, without useof a reasoner.

• The notion of real-time is tightly coupled with theconcepts of event and response time. Moreprecisely, according to Dougherty and Laplante(1995), an event can be defined as ‘any occurrencethat results in a change in the sequential flow ofprogram execution’ and the response time as ‘thetime between the presentation of a set of inputs andthe appearance of all the associated outputs’.Therefore, without reloading the entire KB andperforming reasoning on it, we cannot consider thatall the associated outputs are produced and thesystem cannot be classified as real-time. Systemsthat perform simple insertions upon each messagearrival and scheduled inferencing at predefined timeintervals cannot be considered real-time, only‘near real-time’ if these intervals are frequentenough.

After the reasoning procedure is over, the currentsnapshot of the window has to be updated to includethe new element and possibly drop older elements,depending on the type of window chosen. In the sametime, the elements that were dropped out of the windoware moved to the persistent KB for future reference.Then, the mapping rules are applied to populate thesystem’s cache KB. Reasoning and semantic rules’ checkis performed afterwards to spot high-level events. As wehave already mentioned, semantic rules can be replacedby an equivalent SPARQL query. Since semantic rulesare constantly checked for every incoming element, theyin fact encapsulate continuous queries running againstthe temporary cache KB.

5 Performance evaluation

5.1 Measurement environment

For our measurements, we assumed a simple objecttracking application scenario. A data stream in the formof XML messages containing sensor data is received fromthe server for real-time semantic processing and eventrecognition. Continuous updates in the ontology carriedout by the execution of the mapping rules allow the KBto evolve using the information from the data streamwhile the semantic rules provide a means for the use ofthe complete (initial and inferred) knowledge for rule-based inferencing, continuous query answering and eventrecognition.

The data stream is simulated by means of a clientapplication that produces and sends at predefined timeintervals XML messages with dummy content thatconform to a specific template shown in the nextsubsection.

Our measurements were taken in a lab environment.The server was an Intel� CoreTM 2 [email protected],with 3962MB total memory running a x86_64 Ubuntu9.10 OS. The client was running on an Intel� CoreTM [email protected], with 2012 MB total memory and 32 bitUbuntu 9.10 OS. The core components of the Annotationand Knowledge layers of the SensorStream system wereimplemented using Jena 2.6.2, ARQ 2.8.1 and Pellet 2.0.2libraries, and MySQL 5.1.37 server.

5.2 Measurement methodology

First, we configure the system with a set of initial,necessary inputs, i.e., the ontology, the mapping rulesand the semantic rules. The ontology used was inspiredfrom the context awareness domain, including conceptslike , and relevant properties like

, etc. The ontology was designedto be generic enough to support a system for humanor object tracking in general. Since the development ofa more expressive and richer ontology is out of thescope of this work, it was kept as minimal as possible,using only the lightweight ALC(D) Description Logiclanguage, without General Concept Inclusions (GCIs)and numbering 31 classes, 3 object properties and 7datatype properties, in a total of 110 statements.

Page 10: SensorStream:asemanticreal-timestream …people.cn.ntua.gr/nkons/papers/sensorstream.pdfSensorStream: a semantic real-time stream management system 3 our system and the simulation

10 D-E. Spanos et al.

The message template of the incoming XML stream,as shown here with a message example, includesinformation about the location, the time and theconfidence of the measurement.

The XML stream used as the input set for the purposeof our measurements contains 4000 messages havingthe same structure as the above. The mapping rulesfor this template and ontology provide the assertionof an individual of type Human, with the extractedvalues from the XML message serving as valuesfor the datatype properties hasTime, hasXLocation,hasYLocation, hasCertainty and hasName. The semanticrule in our measurements scenario is just checking for theexistence of individuals of the Human class and if thereis one or more, it adds exactly one in the Action class.As the performance of the system highly depends on thenumber and the complexity of rules, the rationale behindthe experiments run was, on the one hand, to test theperformance of the system with the presence of all thefeatures of our architecture but, on the other hand, fullycontrol the complexity introduced by the rules. Thus, ourmapping rule inserts five statements with every incomingmessage: one for the type definition and four for theassignment of properties’ values, while our semantic ruleleaves the knowledge base intact, with the exceptionof the first run when the class of interest contains noindividuals.

For each experiment, the main variable of interest islatency. Informally, we define latency as the time elapsedbetween the transmission of an XML message from theinformation source and the execution of the last semanticrule in this processing cycle. This time period ttotal

consists of processing time tp performed by the systemand, to a smaller extent, message transmission delay ttrincurred by the network, as shown in equation (9).

ttotal(N) = ttr + tp(N) (9)

tp(N) = ti(N) + tw(N) + tm(N) + ts(N) (10)

Furthermore, tp consists of the initialisation interval ti,during which the cache KB is loaded into the memory,the window selection interval tw, when the time bounds

of the window and its contents are updated and, finally,the mapping and semantic rules execution intervals, tmand ts, respectively. All the time variables introducedin equation (10) are functions of the window size N(in terms of statements). For every experiment run, wemeasure the total latency and the process time withrespect to the number of incoming statements, as shownin the next subsection. We should clarify here that werestrict ourselves to one scenario, given that our intentionis the comparison of the overall system performancewith respect to different window types and varyingwindow sizes. A thorough benchmarking of the systemperformance for a given window type would involve theinvestigation of different scenarios with several differentapplication cases of mapping and semantic rules, varyingontology size and number of statements to number ofindividuals ratio, and is thus considered out of scope forthe current paper.

5.3 Results

First, we perform a test to assess the system’s behaviourin the absence of a window. In Figure 3, we illustratethe system’s behaviour with Pellet, which employsincremental reasoning techniques to reduce the totaltime of the reasoning process. Although this feature ofPellet reduces, radically, the response time of the serverand its performance is superior to static tableau-basedreasoning algorithms, it still does not suffice for meetingthe real-time requirements in the case of streaming datasince processing time is shown to increase steadily withthe number of statements in the KB. As expected, thedifference between total latency and processing time,which accounts for the network delay, is small andconstant throughout the entire test.

For this purpose, we employ and study the efficiencyof two windowing approaches, where the size of thewindows used is measured in terms of the number ofindividuals stored in the cache KB, as mentioned inSection 4. We try an individual-based sliding windowwith unit advancement step and an individual-based‘gradually filling’ tumbling window and examine howtheir size affects the process time of the system.

Figure 3 Performance and number of statements without windowing. Raw model refers to the initial model while inferred refersto the model after the reasoning procedure: (a) process time and latency and (b) number of statements

Page 11: SensorStream:asemanticreal-timestream …people.cn.ntua.gr/nkons/papers/sensorstream.pdfSensorStream: a semantic real-time stream management system 3 our system and the simulation

SensorStream: a semantic real-time stream management system 11

The advancement step in the sliding window is alsodefined in terms of individuals, in other words as anew individual becomes available, the sliding windowprogresses by excluding statements involving the oldestindividual in the KB and including statements referringto the new one. A random selection occurs when thereare more than one individuals with the oldest timestamp.Attention though must be drawn to the fact thatthe process of maintaining and constantly updating awindow of the incoming stream in-memory places anadditional burden for the KB and constitutes an overheadfor the system.

The ‘gradually filling’ tumbling window has, likeall tumbling windows, a progression step equal tothe window size, but fills gradually as individualsarrive to the KB instead of waiting to be completelyfilled at once. For ease of representation, we give thedefinition of a logical ‘gradually filling’ tumbling window,which can be easily adjusted for the individual-basedcase:

wtum,gf (S, w, t0, τ)

=

wl(S, t0, τ), t0 ≤ τ < t0 + w

wl(S, τ − mod(τ − t0, w), τ),

t0 + w ≤ τ and mod(τ − t0, w) �= 0

wl(S, τ − w, τ), t0 + w ≤ τ

and mod(τ − t0, w) = 0

. (11)

While formulating a window over a stream in DSMSis a straightforward procedure, the same does notapply to the case of a KB. The initial formulationand constant update of the window is performedwith the aid of SPARQL queries that are keptas lightweight as possible to minimise the overheadimposed by their execution. For our measurements,the sliding window was implemented indirectly usingthe following SPARQL query to select the olderindividual in the KB. All the statements involvingthe selected individual are inserted in the persistentKB and then removed from the cache KB. ThisSPARQL query is executed every time a new individualappears, given the unit progression step of the slidingwindow. As mentioned in Section 4, the hasTimeproperty present in our testing ontology is the propertythat time-annotates an individual. In our case, thisannotation refers to the generation time of the incomingmessage.

Equivalently, we could have used the followingSPARQL/Update query to perform in one step thesame procedure4. However, this query, more complex asit is with the use of the OPTIONAL pattern matching

(Perez et al., 2006), takes considerably more time toexecute on Jena’s ARQ engine than the previous one.Therefore, we decided to stick to the indirect two-stepwindowing procedure.

In a similar fashion, for the maintenance of the ‘graduallyfilling’ tumbling window, we used the following SPARQLquery to first select all individuals from the cache KB,copy them to the persistent KB and, finally, remove themfrom the cache KB.

In the following set of experiments, we showin Figures 4 and 5 the performance of the system withthe use of tumbling and sliding windows for sizes of1000, 5000 and 15,000 triples or 191, 991 and 2991individuals, respectively. Because of the specific mappingrules used in the experiment, as explained in Section 5.2,every individual is involved in five statements, hencethe number of statements is approximately five timesthe number of individuals considered. Next to eachperformance diagram, we append the evolution of thenumber of statements with the number of incomingmessages to clearly show the performance dependencyon the number of statements in the cache KB.

The peaks that are seen in the process time in Figure 4show that in the case of a tumbling window, whenreaching the window size and entering a new windowcomputation stage, the time needed to clear the windowcontents is much higher than the typical time needed

Page 12: SensorStream:asemanticreal-timestream …people.cn.ntua.gr/nkons/papers/sensorstream.pdfSensorStream: a semantic real-time stream management system 3 our system and the simulation

12 D-E. Spanos et al.

Figure 4 Performance and number of statements for ‘gradually filling’ tumbling window of various sizes. raw model refers to theinitial model while inferred refers to the model after the reasoning procedure: (a) Window size = 191 individulals;(b) window size = 191 individulals; (c) window size = 991 individulals; (d) window size = 991 individulals; (e) windowsize = 2991 individulals and (f) window size = 191 individulals

to process a single message. This delay is caused bythe fact that in this maintenance stage every individualin the cache KB is deleted and the whole model ofthe ontology is affected. Moreover, as we can see inFigure 4 (a), (c) and (e) and Table 1, the processing timeat the maintenance stage increases at a high and growingrate, meaning that a tumbling window needs furtherimprovements to scale for larger data sets. Apart fromthe performance, the choice of a tumbling window entailsthe possibility of producing a false alarm since, aftera maintenance, the ontology will be back to a pristinestate and all previous information that could possiblylead to an alarm will no longer be available. Therefore,the use of such a tumbling window is recommendedin cases where a small loss in captured eventsis not crucial.

On the other hand, the behaviour of a sliding windowis more ‘predictable’ and closer to the desired one,compared with the tumbling window. In Figure 5(a), (b)and (e), the system shows a stable performance, wherethe overhead of computing the new window instanceis kept low regardless of the size of the window. Theaverage process time, tsl

p , in the case of the slidingwindow, as shown in Table 1, increases at a relatively

Table 1 Processing time for tumbling and sliding windowsof varying size (measured in number of statements).Average gradients are shown in parentheses

Window size ttump tsl

p

1000 1250 ms (1.005) 253 ms (0.135)5000 5270 ms (1.108) 793 ms (0.153)15000 16348 ms 2306 ms

small and almost stable rate as a function of the windowsize. While keeping a stable response, the system is ableto make real-time inferences from the newly asserteddata, providing not only improved performance, butalso more accurate real-time entailments. The peaks thatare seen in the figures are caused by turbulences in thetesting environment, leading to variations in networkdelay.

Finally, in Figure 6, the performance of the twowindowing approaches are juxtaposed to the plain nowindow approach for the three window sizes consideredto show the superior performance of the sliding windowapproach. The above-mentioned results strengthen theview that a windowing approach over a KB, can be a

Page 13: SensorStream:asemanticreal-timestream …people.cn.ntua.gr/nkons/papers/sensorstream.pdfSensorStream: a semantic real-time stream management system 3 our system and the simulation

SensorStream: a semantic real-time stream management system 13

Figure 5 Performance and number of statements for unit step sliding window of various sizes. Raw model refers to the initialmodel while inferred refers to the model after the reasoning procedure: (a) Window size = 191 individulals;(b) window size = 191 individulals; (c) window size = 991 individulals; (d) window size = 991 individulals;(e) window size = 2991 individulals and (f) window size = 191 individulals

Figure 6 Comparison of performances for the three methods for various window sizes: (a) Process time, window size = 191individuals; (b) process time, window size = 991 individuals and (c) process time, window size = 2991 individuals

Page 14: SensorStream:asemanticreal-timestream …people.cn.ntua.gr/nkons/papers/sensorstream.pdfSensorStream: a semantic real-time stream management system 3 our system and the simulation

14 D-E. Spanos et al.

stable solution to Semantic Web applications where timeis a critical parameter. Furthermore, our individual-basedwindow approach provides hints that Semantic Webtechniques can be applied to exploit stream semanticsand ameliorate application-level quality given the latencylimitation posed by certain real-time applications.

6 Conclusions and future work

As streams of data become increasingly available,real-time solutions are needed to process and combinestreams to extract high-level events with minimalhuman intervention. In this paper, we presented asemantic-based framework for dealing with informationstreams from heterogeneous sources encoded in variousXML templates and its prototype implementation,SensorStream. Our framework combines consistently anumber of technologies leading to a powerful solutionfor the problem of combining semantic informationin real time. SensorStream does not restrict itself toan application domain, as the ontology encoding theknowledge of the domain is configurable and canbe selected and tailored to the application needs.Moreover, it allows for multiple information sourcesthat store their output in possibly distinct XMLtemplates. The syntactic heterogeneity of the latter ishandled via mapping rules that populate the domainontology with appropriate individuals. We argued that,given a common time reference frame shared by allinformation sources, incorporating the notion of time inthe ontology allows for annotating each event with itstime of occurrence. This design choice differs significantlyfrom the traditional timestamps of stream managementsystems that are able to capture only the time of arrivalof every message to the system, giving rise to the eventdisorder problem.

Two windowing techniques that restrict the unboundstream to a finite stream subset were borrowed from therelated stream management research field in databasesand adjusted accordingly to the KB context. What needsto be noted though is the fact that we do not dropthe elements that exit the window but we store themexplicitly in a persistent KB to allow for future reuse,reference and execution of historical queries on them.Tests were performed for various sizes of these twowindow types, namely the sliding and ‘gradually filling’tumbling window, and the performance of the system wasmeasured. Results have shown that the sliding windowperforms better than the tumbling window and, thus, itsapplication meets the real-time requirements for semanticprocessing of information streams. Average process timeincreases, as anticipated, as the applied window growslarger, but, as the tests suggest, at a lower rate than thewindow size increase rate.

Next steps extending SensorStream’s functionalitywould include testing of other window types mentionedin the stream database literature, such as landmarkwindows, while we could try to make the appropriate

adjustments to other related techniques that reduce thevolume of a stream, such as filtering or synopses. It wouldbe interesting to explore how these techniques wouldbe applied in a KB, taking advantage of the elements’properties and the relationships linking them to achievesmarter load shedding. Furthermore, more extensivetests of our framework could be performed examininghow the ontology’s complexity and size as well as thecomplexity of the applied rules affect the overall system’sperformance.

Acknowledgement

Dimitrios-Emmanuel Spanos wishes to acknowledge thefinancial support of “Alexander S. Onassis Public BenefitFoundation”, under its Scholarships Programme.

References

Abadi, D., Ahmad, Y., Balazinska, M., Cetintemel, U.,Cherniack, M., Hwang, J., Lindner, W., Maskey, A.,Rasin, A., Ryvkina, E., Tatbul, N., Xing, Y. andZdonik, S. (2005) ‘The design of the borealis streamprocessing engine’, Second Biennial Conference onInnovative Data Systems Research (CIDR 2005),Asilomar, CA, ACM.

Allen, B.D., Bishop, G. and Welch, G. (2001) ‘Tracking:beyond 15 minutes of thought’, ACM SIGGRAPH 2001Conference - Course Track, Los Angeles, CA, USA.

Arasu, A., Babcock, B., Babu, S., Cieslewicz, J., Datar, M.,Ito, K., Motwani, R., Srivastava, U. and Widom, J.(2004) STREAM: The Stanford Data Stream ManagementSystem, Technical Report, Stanford InfoLab.

Arasu, A., Babu, S. and Widom, J. (2006) ‘The CQLcontinuous query language: semantic foundations andquery execution’, The VLDB Journal, Vol. 15, No. 2,pp.121–142.

Baader, F., Horrocks, I. and Sattler, U. (2005) DescriptionLogics as Ontology Languages for the Semantic Web,Springer, Berlin, Heidelberg, pp.228–248.

Babcock, B., Babu, S., Datar, M., Motwani, R. andWidom, J. (2002) ‘Models and issues in data streamsystems’, Proceedings of the Twenty-first ACM SIGMOD-SIGACT-SIGART Symposium on Principles of DatabaseSystems – PODS ’02, Madison, WI, USA, pp.1–16.

Barbieri, D., Braga, D., Ceri, S., Della Valle, E. andGrossniklaus, M. (2009) ‘C-SPARQL: SPARQLfor continuous querying’, Proceedings of the 18thInternational Conference on World Wide Web, ACMNew York, NY, USA, pp.1061–1062.

Barbieri, D., Braga, D., Ceri, S. and Grossniklaus, M. (2010)‘An execution environment for C-SPARQL queries’,Proceedings of the 13th International Conferenceon Extending Database Technology, ACM, Lausanne,Switzerland, pp.441–452.

Bizer, C., Heath, T. and Berners-Lee, T. (2009) ‘Linked datathe story so far’, International Journal on Semantic Weband Information Systems, Vol. 5, No. 3, pp.1–22.

Page 15: SensorStream:asemanticreal-timestream …people.cn.ntua.gr/nkons/papers/sensorstream.pdfSensorStream: a semantic real-time stream management system 3 our system and the simulation

SensorStream: a semantic real-time stream management system 15

Bolles, A., Grawunder, M. and Jacobi, J. (2008) ‘StreamingSPARQL – extending SPARQL to process data streams’,The Semantic Web: Research and Applications, Springer,Tenerife, Spain, pp.448–462.

Bradski, G.R. (1998) ‘Computer vision face tracking for use ina perceptual user interface’, Intel Technology Journal, Vol.2, No. 2.

Broekstra, J., Kampman, A. and van Harmelen, F. (2002)‘Sesame: a generic architecture for storing and queryingRDF and RDF schema’, in Horrocks, I. and Hendler, J.(Eds.): Proceedings of the first Int’l Semantic WebConference (ISWC 2002), Vol. 2342, Lecture Notesin Computer Science, Springer Verlag, Sardinia, Italy,pp.54–68.

Calder, M., Morris, R.A. and Peri, F. (2010) ‘Machinereasoning about anomalous sensor data’, EcologicalInformatics, Vol. 5, No. 1, pp.9–18.

Carroll, J., Dickinson, I., Dollin, C. and Reynolds, D. (2004)‘Jena: implementing the semantic web recommendations’,Proceedings of the 13th International World Wide WebConference on Alternate Track Papers & Posters, NewYork, USA, pp.74–83.

Chandrasekaran, S., Cooper, O., Deshpande, A., Franklin, M.,Hellerstein, J., Hong, W., Krishnamurthy, S., Madden, S.,Raman, V., Reiss, F. and Shah, M. (2003) ‘TelegraphCQ:continuous dataflow processing for an uncertain world’,Proceedings of the Conference on Innovative DataSystems Research (CIDR 2003), ACM Press, San Diego,CA, USA.

Chen, J., DeWitt, D., Tian, F. and Wang, Y. (2000)‘NiagaraCQ: a scalable continuous query system forinternet databases’, ACM SIGMOD Record, Vol. 29, No. 2,pp.379–390.

Comaniciu, D. and Meer, P. (1999) ‘Mean shift analysisand applications’, Proceedings of the Seventh IEEEInternational Conference on Computer Vision (ICCV’99),IEEE Computer Society, Kerkyra, Greece, Vol. 2,pp.1197–1203.

Conolly, D. (2007) Gleaning Resource Descriptions fromDialects of Languages (GRDDL), Available online athttp://www.w3.org/TR/grddl/

Cuenca Grau, B., Halaschek-Wiener, C. and Kazakov, Y.(2007) ‘History matters: incremental ontology reasoningusing modules’, in Aberer, K., Choi, K-S., Noy, N.,Allemang, D., Lee, K-I., Nixon, L., Golbeck, J.,Mika, P., Maynard, D., Mizoguchi, R., Schreiber, G.and Cudr-Mauroux, P. (Eds.): Proceedings of the 6thInternational Semantic Web Conference (ISWC 2007),Lecture Notes in Computer Science, Springer, Berlin,Heidelberg, Vol. 4825, pp.183–196.

Das, S. and Srinivasan, J. (2009) ‘Database technologiesfor RDF’, 5th International Summer School 2009 onReasoning Web. Semantic Technologies for InformationSystems, LNCS 5689, pp.205–221.

Della Valle, E., Ceri, S., van Harmelen, F. and Fensel, D. (2009)‘It’s a streaming world! reasoning upon rapidly changinginformation’, Intelligent Systems, IEEE, Vol. 24, No. 6,pp.83–89.

Demers, A., Gehrke, J., Panda, B., Riedewald, M. andSharma, V. (2007) Cayuga: A General Purpose EventMonitoring System, Proceedings CIDR, ACM Press, NewYork, New York, USA, pp.412–422.

Dougherty, E. and Laplante, P. (1995) Introduction to Real-Time Imaging, SPIE Press, Bellingham, WA, USA.

Erling, O. and Mikhailov, I. (2010) Virtuoso: RDF Support in aNative RDBMS, Springer, Berlin, Heidelberg, pp.501–519.

Ghanem, T., Aref, W. and Elmagarmid, A. (2006) ‘Exploitingpredicate-window semantics over data streams’, ACMSIGMOD Record, Vol. 35, No. 1, pp.3–8.

Gutierrez, C., Hurtado, C. and Vaisman, A. (2007) ‘IntroducingTme into RDF’, IEEE Transactions on Knowledge andData Engineering, Vol. 19, No. 2, pp.207–218.

Gyllstrom, D., Wu, E., Chae, H., Diao, Y., Stahlberg, P. andAnderson, G. (2007) ‘SASE: complex event processingover streams’, Proceedings of the Third BiennialConference on Innovative Data Systems Research (CIDR2007), Asilomar, CA, USA.

Halaschek-Wiener, C., Parsia, B. and Sirin, E. (2006)‘Description logic reasoning with syntactic updates’, inMeersman, R. and Tari, Z. (Eds.): On the Move toMeaningful Internet Systems 2006: CoopIS, DOA, GADA,and ODBASE, Lecture Notes in Computer Science,Springer, Berlin, Heidelberg, Vol. 4275, pp.722–737.

Heintz, F., Kvarnstrom, J. and Doherty, P. (2009) ‘Streamreasoning in dyknow: a knowledge processing middlewaresystem’, 1st International Workshop on Stream Reasoning,Springer, Heraklion, Greece.

Henson, C., Neuhaus, H., Sheth, A., Thirunarayan, K.and Buyya, R. (2009) ‘An ontological representationof time series observations on the semantic sensorweb’, Proceedings of the 1st Int. Workshop on theSemantic Sensor Web (SemSensWeb), collocated withESWC, Heraklion, Greece.

Hobbs, J. and Pan, F. (2006) Time ontology in OWL’, Availableonline at http://www.w3.org/TR/ owl-time/

Kokar, M., Matheus, C. and Baclawski, K. (2009) ‘Ontology-based situation awareness’, Information fusion, Vol. 10,No. 1, pp.83–98.

Konstantinou, N., Solidakis, E., Zafeiropoulos, A.,Stathopoulos, P. and Mitrou, N. (2010) ‘A context-awaremiddleware for real-time semantic enrichment ofdistributed multimedia metadata’, Multimedia Tools andApplications, Vol. 46, Nos. 2–3, pp.425–461.

Le-Phuoc, D. and Hauswirth, M. (2009) ‘Linked opendata in sensor data mashups’, Proceedings of the 2ndInternational Workshop on Semantic Sensor Networks(SSN09), Washington DC, USA, pp.1–16.

Li, J., Maier, D., Tufte, K., Papadimos, V. and Tucker, P.A.(2005) ‘Semantics and evaluation techniques for windowaggregates in data streams’, Proceedings of the 2005 ACMSIGMOD International Conference on Management ofData – SIGMOD ’05, Baltimore, MD, USA, pp.311–322.

Lien, C.C., Chiang, C.L. and Lee, C.H. (2007) ‘Scene- basedevent detection for baseball videos’, Journal of VisualCommunication and Image Representation, Vol. 18, No. 1,pp.1–14.

Liggins, M.E., Hall, D.L. and Llinas, J. (Eds.) (2009) Handbookof Multisensor Data Fusion: Theory and Practice, CRCPress, Boca Raton, FL, USA.

Little, E. and Rogova, G. (2009) ‘Designing ontologies forhigher level fusion’, Information Fusion, Vol. 10, No. 1,pp.70–82.

Page 16: SensorStream:asemanticreal-timestream …people.cn.ntua.gr/nkons/papers/sensorstream.pdfSensorStream: a semantic real-time stream management system 3 our system and the simulation

16 D-E. Spanos et al.

Liu, L., Pu, C. and Tang, W. (1999) ‘Continual queriesfor internet scale event-driven information delivery’,IEEE Transactions on Knowledge and Data Engineering,Vol. 11, No. 4, pp.610–628.

Maier, D., Tucker, P. and Garofalakis, M. (2005) ‘Filtering,punctuation, windows and synopses’, in book Stream DataManagement, Springer, New York, USA, pp.35–58.

Motik, B., Sattler, U. and Studer, R. (2005) ‘Query answeringfor OWL-DL with rules’, Web Semantics: Science,Services and Agents on the World Wide Web, Vol. 3, No. 1,pp.41–60.

Motik, B., Grau, B.C., Horrocks, I., Wu, Z., Fokoue, A. andLutz, C. (2009) OWL 2 Web Ontology Language Profiles,Available online at http://www.w3.org/TR/ 2009/PR-owl2-profiles-20090922/

Papamarkos, G., Poulovassilis, A. and Wood, P.T. (2003)‘Event-condition-action rule languages for the semanticweb’, Workshop on Semantic Web and Databases,Springer, Berlin, Germany, pp.309–327.

Parsia, B., Halaschek-Wiener, C. and Sirin, E. (2006) ‘Towardsincremental reasoning through updates in OWL-DL’,Proceedings of the Reasoning on the Web Workshop atWWW 2006, Edinburgh, Scotland.

Patroumpas, K. and Sellis, T. (2006) ‘Window specificationover data streams’, International Conference onSemantics of a Networked World: Semantics of Sequenceand Time Dependent Data (ICSNW’06), Springer,Munich, Germany, pp.445–464.

Perez, J., Arenas, M. and Gutierrez, C. (2006) ‘Semantics andcomplexity of SPARQL’, Proceedings of the InternationalSemantic Web Conference (ISWC 2006), Springer, Athens,GA, USA, pp.30–43.

Prud’hommeaux, E. and Seaborne, A. (2008) SPARQLQuery Language for RDF, Available online athttp://www.w3.org/TR/rdf-sparql-query/

Pugliese, A., Udrea, O. and Subrahmanian, V.S. (2008) ‘ScalingRDF with time’, Proceedings of the 17th InternationalConference on World Wide Web – WWW ’08, pp.605–614.

Rodriguez, A., McGrath, R., Liu, Y. and Myers, J. (2009)‘Semantic management of streaming data’, Proceedingsof the 2nd International Workshop on Semantic SensorNetworks (SSN09), Washington DC, USA, pp.80–95.

Sirin, E., Parsia, B., Kalyanpur, A., Grau, B. and Katz, Y.(2007) ‘Pellet: a practical OWL-DL reasoner’, WebSemantics: Science, Services and Agents on the WorldWide Web, Vol. 5, No. 2, pp.51–53.

Tsarkov, D. and Horrocks, I. (2006) ‘FaCT++ descriptionlogic reasoner: system description’, Proceedings of theInt. Joint Conf. on Automated Reasoning (IJCAR 2006),Lecture Notes in Artificial Intelligence, Springer, Seattle,WA, USA, Vol. 4130, pp.292–297.

Volz, R., Staab, S. and Motik, B. (2005) ‘Incrementallymaintaining materializations of ontologies stored in logicdatabases’, Journal on Data Semantics, Springer, Vol. 2,pp.1–34.

Walavalkar, O., Joshi, A., Finin, T. and Yesha, Y.(2008) ‘Streaming knowledge bases’, Fourth InternationalWorkshop on Scalable Semantic Web Knowledge BaseSystems, Karlsruhe, Germany.

Whitehouse, K., Zhao, F. and Liu, J. (2006) ‘Semantic streams:a framework for composable semantic interpretation ofsensor data’, Proceedings of the 3rd European Workshopon Wireless Sensor Networks (EWSN 2006), Springer,Zurich, Switzerland, pp.5–20.

Zhang, D. and Chang, S.F. (2002) ‘Event detection inbaseball video using superimposed caption recognition’,Proceedings of the Tenth ACM International Conferenceon Multimedia, ACM New York, NY, USA, pp.315–318.

Notes

1W3C Web Services Activity, http://www.w3.org/2002/ws/2NoSQL databases do not typically provide any declarativequery language equivalent to SQL.

3OWLJessKB, http://edge.cs.drexel.edu/assemblies/software/owljesskb/

4Since Jena’s ARQ engine does not allow update nestedqueries, the alternative is a much more unintuitivequery, which finds as well the oldest individual in theKnowledge Base.