PROPOSALS FOR PROJECTS AND DEGREE THESES
Technologies for Information Systems
Context Integration for Mobile Data Design
• Disparate, heterogeneous, independent Data Sources
• Semantic schema integration
• Context-aware information filtering: Data Tailoring
• Common, integrated, semantic access to data
• Issues: mobility, data transiency
• Multiple scenarios: system adaptability
Context Model: Dimension Tree
Dimension Tree:
• is a Context-User Model, represented as a constrained ontology
• Dimensions are used to classify all the possible user-context pairs
• is an extension of the Very Small DataBase Dimension Array
Domain Ontology
Domain Ontology:
• Represents the main concepts, relations, attributes of the domain: build a shared vocabulary
• Copes with the absence of the equivalent of a DB “global schema”
• It will be, in the medium/long term, shared and commonly agreed
• Must be decidable and computable (typically within OWL-DL)
Data Source: Semantic Extraction
Data Source Ontology:
• Semantic Extraction: data abstract model + storage model
• Supports the query processing
• Models isolation (different models can be used separately)
Chunks
Chunk:
• is the set of relevant data for a given user in a given context
• can be derived from several data sources
• is highly context-aware
• can be materialized on the user device
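A minimal sketch of how a chunk could be assembled, assuming each record in a data source is tagged with the context dimension values it is relevant for. The source names, the tagging scheme, and the functions `matches` and `build_chunk` are hypothetical and only illustrate the idea of tailoring data from several sources to one context.

```python
from typing import Dict, List

Context = Dict[str, str]


def matches(record_context: Context, current: Context) -> bool:
    """A record is relevant when every dimension it is tagged with
    agrees with the current context (assumed tagging scheme)."""
    return all(current.get(dim) == val for dim, val in record_context.items())


def build_chunk(sources: Dict[str, List[dict]], current: Context) -> List[dict]:
    """Collect the relevant records from all sources into one chunk."""
    chunk = []
    for source_name, records in sources.items():
        for record in records:
            if matches(record["context"], current):
                chunk.append({"source": source_name, "data": record["data"]})
    return chunk


# Hypothetical data sources and context, for illustration only.
sources = {
    "restaurants_db": [
        {"context": {"role": "tourist", "location": "Milan"}, "data": "Trattoria A"},
        {"context": {"role": "tourist", "location": "Rome"}, "data": "Osteria B"},
    ],
    "events_feed": [
        {"context": {"location": "Milan"}, "data": "Concert C"},
        {"context": {"role": "manager"}, "data": "Sales report D"},
    ],
}
current_context = {"role": "tourist", "location": "Milan"}
chunk = build_chunk(sources, current_context)
```

The resulting list is what could be materialized on the user device; a real system would derive the relevance tags from the context model and the domain ontology rather than store them by hand.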
Possible project areas
• Modules for ontology mapping (similarity detection techniques)
• Semantic extractors for the various information sources (XML, Web pages, OODB, wireless sensors…)
• Query processing: the topic best suited to a combined project and thesis; it requires preliminary analysis work
• Chunk generation in the various phases of the system life cycle (design time, run time, query time)
• Toolbox for configuring the architecture (CASE tool)
XML
• XML (short for eXtensible Markup Language) derives, like HTML, from the SGML (Standard Generalized Markup Language) specification and was introduced by the W3C;
• XML can be seen as a modern “lingua franca” for information modelling, and it can also be used to represent semi-structured data, which (unlike database data) have an implicit and incomplete structure;
• XML is neither a replacement for HTML nor a programming language in its own right;
Data Mining
• Data mining: the research area that studies techniques for extracting implicit, previously unknown but useful information from large databases.
• Association rule: an implication that holds with a certain frequency. For example, with a certain frequency f, viewers who follow the “game show” genre also follow “television dramas”.
Our goal
• Given
  • an XML dataset D
  • a summarized representation of D by means of association rules AR
  • a query Q
• Provide an intensional answer to Q by querying AR instead of D
  • Replace the actual data that answer the query with a set of properties characterizing them [Motro89]
<article year="2001">
  <volume>30</volume>
  <number>2</number>
  <month>June</month>
  <conference>ACM International …</conference>
  <date>May 21 - 24, 2001</date>
  <location>Santa Barbara, California, USA</location>
  <title articleCode="302001">Securing...</title>
  <authors>
    <author authorPosition="01">E. Brown</author>
    <author authorPosition="02">L. Baines</author>
  </authors>
  ….
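Our goal
[Figure: the query Q, written in XQuery, is posed either to the XML dataset D, giving the EXTENSIONAL answer, or to the XML rule set AR obtained from D by data mining, giving the INTENSIONAL answer.]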
<result> {
  for $article in doc("document.xml")//article
  where $article/authors/author/text() = "E. Brown"
  return $article
} </result>

<result> {
  for $article in doc("RuleSet.xml")//AssociationRule
  where $article[RuleBody[item[ItemName="author" and ItemValue="E. Brown"]]]
  return $article
} </result>
<AssociationRule support="0.2" confidence="0.8">
  <RuleBody>
    <item>
      <ItemName>author</ItemName>
      <ItemValue>E. Brown</ItemValue>
    </item>
  </RuleBody>
  <RuleHead>
    <item>
      <ItemName>conference</ItemName>
      <ItemValue>ACM Intern…</ItemValue>
    </item>
  </RuleHead>
</AssociationRule>
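The rule-set query can also be sketched outside XQuery. The following Python fragment is only an illustration, assuming the rules are wrapped in a root element named `RuleSet` (an assumption; the slides show only a single `AssociationRule`) and adding a second, made-up rule so that the filter has something to discard. The function `intensional_answer` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule set: the first rule is the one shown on the slide,
# the second is invented purely to exercise the filter.
RULE_SET = """<RuleSet>
  <AssociationRule support="0.2" confidence="0.8">
    <RuleBody><item><ItemName>author</ItemName><ItemValue>E. Brown</ItemValue></item></RuleBody>
    <RuleHead><item><ItemName>conference</ItemName><ItemValue>ACM Intern…</ItemValue></item></RuleHead>
  </AssociationRule>
  <AssociationRule support="0.1" confidence="0.6">
    <RuleBody><item><ItemName>author</ItemName><ItemValue>L. Baines</ItemValue></item></RuleBody>
    <RuleHead><item><ItemName>month</ItemName><ItemValue>June</ItemValue></item></RuleHead>
  </AssociationRule>
</RuleSet>"""


def intensional_answer(root, name, value):
    """Return (head name, head value, support, confidence) for every rule
    whose body contains the item name = value."""
    answer = []
    for rule in root.iter("AssociationRule"):
        body_items = rule.findall("./RuleBody/item")
        in_body = any(
            item.findtext("ItemName") == name and item.findtext("ItemValue") == value
            for item in body_items
        )
        if not in_body:
            continue
        for head in rule.findall("./RuleHead/item"):
            answer.append((
                head.findtext("ItemName"),
                head.findtext("ItemValue"),
                float(rule.get("support")),
                float(rule.get("confidence")),
            ))
    return answer


root = ET.fromstring(RULE_SET)
result = intensional_answer(root, "author", "E. Brown")
```

Instead of listing the articles by E. Brown, the answer describes them: here, that they tend to appear at the conference named in the rule head, with the stated support and confidence.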
Motivation
• XML is a verbose representation of data
  • Huge storage space
  • Long query processing time
• ARs provide a succinct representation
  • They provide fast, approximate, succinct answers to queries (e.g. for decision support purposes)
  • They can substitute for the actual dataset if it is currently unreachable
Patterns for XML Documents (1)
• Patterns = abstract representation of a generalization of constraints [BGQT04]
  • Summarized representation of the data
  • Based on association rules extracted from the dataset
• Association rule:
  • X, Y sets of data items: X ⇒ Y
  • support: sup(X ⇒ Y) = freq(X ∪ Y)
  • confidence: conf(X ⇒ Y) = freq(X ∪ Y) / freq(X)
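The two measures can be checked on a toy example. This is a minimal sketch, assuming that freq of an itemset is the fraction of transactions containing all of its items and that each transaction is a set of "name=value" strings; the data and function names are invented for illustration.

```python
from typing import FrozenSet, List

Itemset = FrozenSet[str]


def freq(itemset: Itemset, transactions: List[Itemset]) -> float:
    """Fraction of transactions that contain every item of the itemset."""
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def support(x: Itemset, y: Itemset, transactions: List[Itemset]) -> float:
    """sup(X => Y) = freq(X U Y)."""
    return freq(x | y, transactions)


def confidence(x: Itemset, y: Itemset, transactions: List[Itemset]) -> float:
    """conf(X => Y) = freq(X U Y) / freq(X); 0.0 when X never occurs."""
    fx = freq(x, transactions)
    return support(x, y, transactions) / fx if fx else 0.0


# Invented transactions: one per article, holding an author and a conference.
transactions = [
    frozenset({"author=E. Brown", "conference=ACM"}),
    frozenset({"author=E. Brown", "conference=ACM"}),
    frozenset({"author=E. Brown", "conference=VLDB"}),
    frozenset({"author=L. Baines", "conference=ACM"}),
    frozenset({"author=L. Baines", "conference=VLDB"}),
]
x = frozenset({"author=E. Brown"})
y = frozenset({"conference=ACM"})
rule_support = support(x, y, transactions)
rule_confidence = confidence(x, y, transactions)
```

With these five transactions, two contain both items and three contain the body item, so the support is 2/5 and the confidence is 2/3.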
Patterns for XML Documents (2)
Two orthogonal ways to classify patterns:
• Exact (e.g. functional dependencies) vs. Probabilistic (weak constraints)
• Schema (dataset structure) vs. Instance (dataset instances)
Patterns for XML Documents (3)
• Instance patterns = patterns expressed on the instances of the dataset
• GSL language for pattern formalization [BGQT04]
[Figure: Instance Pattern and Query]
Examples of framework (1/4)
• Classes of queries are formalized as XQuery expressions that query either the XML dataset or the rule set.
• A tool with a query prototype for each class of query
Examples of framework (2/4)
• Graphical query language to express queries: XQBE (XQuery By Example) [Braga03]
• User friendly
• Output: an XQuery expression that is easy to modify automatically so that it can also query the rule set
Examples of framework (3/4)
Examples of framework (4/4)
Wireless embedded sensor networks
• Thousands of tiny low-power devices spread over large physical areas monitor the environment, possibly predicting potential faults in buildings, bridges, roads, railways, etc.
• The devices must be small, unobtrusive, and cheap
• The network must be inexpensive to develop, deploy, program, use and maintain
A sensor network
• Comprises a number of sensor nodes and a base station
• Applications:
  • Monitoring contaminated land areas or waters
  • Monitoring animal behaviour
  • Fire and earthquake emergencies
  • Vehicle tracking, traffic control
  • Surveillance of city districts, defense-related networks, alerts to terrorist threats
Motes: the Mica2 platform
• Mica2Dot
  • Basically the same features, smaller size, fewer sensor options
• Different sensor boards for Mica2 and Mica2Dot
DB view of sensor networks
• Traditional:
  • Procedural addressing of individual sensor nodes: the user specifies how the task is executed, and data is processed centrally
• DB-style approach:
  • Declarative querying: the user is not concerned with “how the network works”; processing is distributed in-network
TinyDB
• Reduced SQL interface (with some additional constructs)
• Queries issued from a PC
  • Collects data from motes in the environment, filters it, aggregates it, and routes it out to a PC
• Exploits power-efficient in-network processing algorithms
• Multiple persistent queries with different sample times
“TinyDB is a query processing system for extracting information from a network of TinyOS sensors.”
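To make the idea of in-network aggregation concrete, here is a small Python simulation. It is not TinyDB code and does not use its API; it only assumes a routing tree rooted at the node nearest the base station, in which each node forwards a partial state (sum, count) to its parent instead of forwarding every raw reading. Node identifiers, readings, and the function `partial_average` are invented.

```python
from typing import Dict, List, Tuple


def partial_average(
    node: int,
    children: Dict[int, List[int]],
    readings: Dict[int, float],
) -> Tuple[float, int]:
    """Combine this node's reading with the partial states of its children.

    Each node sends a single (sum, count) pair to its parent, so the
    message size stays constant however large the subtree is.
    """
    total, count = readings[node], 1
    for child in children.get(node, []):
        child_total, child_count = partial_average(child, children, readings)
        total += child_total
        count += child_count
    return total, count


# Invented routing tree: node 1 is the root, node 4 reports through node 2.
children = {1: [2, 3], 2: [4]}
readings = {1: 20.0, 2: 22.0, 3: 24.0, 4: 26.0}

total, count = partial_average(1, children, readings)
average = total / count
```

The root ends up with the same average a central computation would give, while every link carries one small message per sampling round, which is where the power saving would come from.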
But further useful database functionalities are still lacking…
• A VSDB should reside on at least every generic sensing device (e.g. Mica2)
• To compose a distributed/federated database:
  • Each VSDB should be context aware
  • Each VSDB should be able to “appropriately” redirect queries to neighbours (P2P)
    • because of an internal fault or a generic unavailability
    • because it does not possess the information
    • because the other node “knows” something more, in order to complete the information
    • because the other node has a less power-consuming sensor on board
  • Design appropriate, optimized query processing plans (e.g. redirect subquery, cache subquery result, etc.)
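The redirection behaviour can be sketched as follows. This is a hypothetical model, not an existing VSDB implementation: the class `Node`, its fields, and the depth-first forwarding policy are assumptions chosen to illustrate two of the listed reasons for redirecting (a faulty sensor, or missing information).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Node:
    """Hypothetical VSDB node holding a few attribute values."""
    name: str
    data: Dict[str, float] = field(default_factory=dict)
    sensor_ok: bool = True
    neighbours: List["Node"] = field(default_factory=list)

    def query(self, attribute: str, visited: Optional[Set[str]] = None) -> Optional[float]:
        """Answer locally when possible, otherwise redirect to neighbours."""
        visited = visited if visited is not None else set()
        visited.add(self.name)
        if self.sensor_ok and attribute in self.data:
            return self.data[attribute]
        # Redirect: the sensor is faulty or the information is not held here.
        for neighbour in self.neighbours:
            if neighbour.name not in visited:
                result = neighbour.query(attribute, visited)
                if result is not None:
                    return result
        return None


# Invented topology: a - b - c, where b has a faulty sensor.
a = Node("a")
b = Node("b", data={"temperature": 19.0}, sensor_ok=False)
c = Node("c", data={"temperature": 21.5})
a.neighbours = [b]
b.neighbours = [a, c]
c.neighbours = [b]

answer = a.query("temperature")
missing = a.query("humidity")
```

A realistic query plan would also weigh the power cost of each sensor and cache subquery results, as the list above suggests; the `visited` set here only prevents the query from looping between neighbours.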
Data extraction from web sources and data warehouse construction
A topic of interest: medical conferences around the world:
• Definition of domain ontologies
• Information extractors
• Design and implementation of the database and the data warehouse