PROJECT AND DEGREE THESIS PROPOSALS

Technologies for Information Systems (Tecnologie per i Sistemi Informativi)

Source: home.deib.polimi.it/schreibe/tecnologie/materiale/Let2005/ProposteProgetti2.pdf

Context Integration for Mobile Data Design

• Disparate, heterogeneous, independent Data Sources

• Semantic schema integration

• Context-aware information filtering: Data Tailoring

• Common, integrated, semantic access to data

• Issues: mobility, data transiency

• Multiple scenarios: system adaptability


Context Model: Dimension Tree

Dimension Tree:

• is a Context-User Model, represented as a constrained ontology

• Dimensions are used to classify all the possible user-context pairs

• is an extension of the Very Small DataBase Dimension Array


Domain Ontology

Domain Ontology:

• Represents the main concepts, relations, attributes of the domain: build a shared vocabulary

• Copes with the absence of the equivalent of a DB “global schema”

• It will be, in the medium/long term, shared and commonly agreed

• Must be decidable and computable (typically within OWL-DL)

Data Source: Semantic Extraction

Data Source Ontology:

• Semantic Extraction: data abstract model + storage model

• Supports the query processing

• Model isolation (different models can be used separately)


Chunks

Chunk:

• is the set of relevant data for a given user in a given context

• can be derived from several data sources

• is highly context-aware

• can be materialized on the user device

Possible project areas

• Modules for ontology mapping (similarity detection techniques)

• Semantic extractors for the different information sources (XML, Web pages, OODB, wireless sensors…)

• Query processing: the topic best suited to a combined project + thesis; it requires preliminary analysis work

• Chunk generation in the various phases of the system life cycle (design time, run time, query time)

• Toolbox for configuring the architecture

• CASE tool


XML

• XML (acronym of eXtensible Markup Language) derives, like HTML, from the SGML (Standard Generalized Markup Language) specification and was introduced by the W3C;

• XML can be seen as a modern “lingua franca” for information modeling, and it can also be used to represent semi-structured data (unlike databases), which have an implicit and incomplete structure;

• XML is neither a replacement for HTML nor a standalone programming language;

Data Mining

• Data Mining: the research area that studies techniques for extracting implicit, previously unknown but useful information from large databases.

• Association rule: an implication that holds with a certain frequency. For example, with a certain frequency f, viewers who follow the “game show” genre also follow “television dramas”.


Our goal

• Given

  • an XML dataset D

  • a summarized representation of D by means of association rules AR

  • a query Q

• Provide an intensional answer to Q by querying AR instead of D

• Substitute the actual data that answer the query with a set of properties characterizing them [Motro89]

<article year="2001">
  <volume>30</volume>
  <number>2</number>
  <month>June</month>
  <conference>ACM International …</conference>
  <date>May 21 - 24, 2001</date>
  <location>Santa Barbara, California, USA</location>
  <title articleCode="302001">Securing...</title>
  <authors>
    <author authorPosition="01">E. Brown</author>
    <author authorPosition="02">L. Baines</author>
  </authors>
  ….

Our goal

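[Figure: data mining extracts the association rules AR from the XML dataset D; the XQuery query Q returns the EXTENSIONAL answer when run on D and the INTENSIONAL answer when run on AR.]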

<result> {
  for $article in doc("document.xml")//article
  where $article/authors/author/text() = "E. Brown"
  return $article
} </result>

<result> {
  for $article in doc("RuleSet.xml")//AssociationRule
  where $article[RuleBody[item[ItemName="author" and ItemValue="E. Brown"]]]
  return $article
} </result>

<AssociationRule support="0.2" confidence="0.8">
  <RuleBody>
    <item>
      <ItemName>author</ItemName>
      <ItemValue>E. Brown</ItemValue>
    </item>
  </RuleBody>
  <RuleHead>
    <item>
      <ItemName>conference</ItemName>
      <ItemValue>ACM Intern…</ItemValue>
    </item>
  </RuleHead>
</AssociationRule>
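The rule-set query above can also be mimicked outside XQuery. The following is a minimal Python sketch, not the authors' tool: it assumes the rules are wrapped in a hypothetical `<RuleSet>` root element, and the helper names `items` and `intensional_answer` are made up for illustration. It selects the rules whose body contains a given item and returns their heads together with support and confidence.

```python
import xml.etree.ElementTree as ET

# Assumption: rules are wrapped in a <RuleSet> root; the single rule is the one on the slide.
RULESET_XML = """
<RuleSet>
  <AssociationRule support="0.2" confidence="0.8">
    <RuleBody><item><ItemName>author</ItemName><ItemValue>E. Brown</ItemValue></item></RuleBody>
    <RuleHead><item><ItemName>conference</ItemName><ItemValue>ACM Intern…</ItemValue></item></RuleHead>
  </AssociationRule>
</RuleSet>
"""

def items(rule, part):
    """Return the (name, value) pairs listed under RuleBody or RuleHead."""
    return [(i.findtext("ItemName"), i.findtext("ItemValue"))
            for i in rule.findall(f"{part}/item")]

def intensional_answer(ruleset_xml, name, value):
    """Collect the rules whose body contains the item (name, value)."""
    root = ET.fromstring(ruleset_xml)
    answer = []
    for rule in root.iter("AssociationRule"):
        if (name, value) in items(rule, "RuleBody"):
            answer.append({
                "head": items(rule, "RuleHead"),
                "support": float(rule.get("support")),
                "confidence": float(rule.get("confidence")),
            })
    return answer

rules = intensional_answer(RULESET_XML, "author", "E. Brown")
for r in rules:
    print(r)
```

The answer describes properties of the matching articles (here, the conference associated with the author) rather than listing the articles themselves.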


Motivation

• XML is a verbose representation of data

  • Huge storage space

  • Query processing time

• ARs provide a succinct representation

  • Provide a fast, approximate, succinct answer to a query (e.g. for decision support purposes)

  • Can substitute the actual set if currently unreachable

Patterns for XML Documents (1)

• Patterns = abstract representation of a generalization of constraints [BGQT04]

  • summarized representation of the data

• Based on association rules extracted from the dataset

• Association rule:

  • X, Y sets of data items: X ⇒ Y

  • support: sup(X ⇒ Y) = freq(X ∪ Y)

  • confidence: conf(X ⇒ Y) = freq(X ∪ Y) / freq(X)
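The two measures can be checked on a small example. The sketch below is only illustrative: the transactions are made-up toy data, and `freq` is assumed to be the fraction of transactions that contain every item of the given set.

```python
def freq(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)

def support(x, y, transactions):
    """sup(X => Y) = freq(X U Y)."""
    return freq(set(x) | set(y), transactions)

def confidence(x, y, transactions):
    """conf(X => Y) = freq(X U Y) / freq(X); undefined when X never occurs."""
    return support(x, y, transactions) / freq(x, transactions)

# Made-up toy data: each transaction lists the items found in one article.
transactions = [
    {"author=E. Brown", "conference=A"},
    {"author=E. Brown", "conference=A"},
    {"author=E. Brown", "conference=B"},
    {"author=L. Baines", "conference=A"},
    {"author=L. Baines", "conference=B"},
]
x = {"author=E. Brown"}
y = {"conference=A"}

print(f"support    = {support(x, y, transactions):.2f}")     # 2 of 5 transactions
print(f"confidence = {confidence(x, y, transactions):.2f}")  # 2 of the 3 with E. Brown
```

Here X ∪ Y occurs in 2 of 5 transactions and X in 3 of 5, so the rule holds for two thirds of the articles by that author.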



Two orthogonal ways to classify patterns:

• Exact (e.g. functional dependencies) vs. Probabilistic (weak constraints)

• Schema (dataset structure) vs. Instance (dataset instances)


• Instance patterns = patterns expressed on the instances of the dataset

• GSL language for pattern formalization [BGQT04]

[Figure: Instance Pattern, Query]


Examples of framework (1/4)

Classes of queries are formalized as XQuery expressions that query either the XML dataset or the rule set.

• A tool with a query prototype for each class of query

• Graphical query language to express queries: XQBE (XQuery By Example) [Braga03]

• User friendly

• Output: an XQuery expression that is easy to modify automatically so that it also queries the rule set
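One way to picture the automatic modification is as a pair of query templates. The Python sketch below is hypothetical and is not XQBE or the authors' tool: the function names are made up, the item name is assumed to be the last step of the element path, and values are inserted without escaping.

```python
def dataset_query(doc, element, path, value):
    """XQuery over the XML dataset (extensional answer)."""
    return (
        "<result> {\n"
        f'  for $x in doc("{doc}")//{element}\n'
        f'  where $x/{path}/text() = "{value}"\n'
        "  return $x\n"
        "} </result>"
    )

def ruleset_query(ruleset, item_name, value):
    """The same selection rewritten against the rule set (intensional answer)."""
    return (
        "<result> {\n"
        f'  for $x in doc("{ruleset}")//AssociationRule\n'
        f'  where $x[RuleBody[item[ItemName="{item_name}" and ItemValue="{value}"]]]\n'
        "  return $x\n"
        "} </result>"
    )

def rewrite_for_rules(path, value, ruleset="RuleSet.xml"):
    # Assumption: the rule item name is the last step of the element path.
    return ruleset_query(ruleset, path.rsplit("/", 1)[-1], value)

print(dataset_query("document.xml", "article", "authors/author", "E. Brown"))
print(rewrite_for_rules("authors/author", "E. Brown"))
```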


Wireless embedded sensor networks

• Thousands of tiny low-power devices spread over large physical areas monitor the environment, possibly predicting potential faults in buildings, bridges, roads, railways, etc.

• The devices must be small, unobtrusive, and cheap

• The network must be inexpensive to develop, deploy, program, utilize and maintain

A sensor network

• Comprises a number of sensor nodes and a base station

• Applications:

  • Monitoring contaminated land areas or waters

  • Monitoring animal behaviour

  • Fire, earthquake emergencies

  • Vehicle tracking, traffic control

  • Surveillance of city districts, defense-related networks, alerts to terrorist threats


Motes: the Mica2 platform

• Mica2Dot

  • Basically same features, smaller size, fewer sensor options

• Different sensor boards for Mica2 and Mica2Dot

DB view of sensor networks

• Traditional:

  • Procedural addressing of individual sensor nodes: user specifies how task is executed, data is processed centrally

• DB-style approach:

  • Declarative querying: user is not concerned about “how the network works”: in-network distributed processing


TinyDB

• Reduced SQL interface (with some additional constructs)

• Queries issued from a PC

  • Collects data from motes in the environment, filters it, aggregates it together, and routes it out to a PC

• Exploits power-efficient in-network processing algorithms.

• Multiple persistent queries with different sample time

“TinyDB is a query processing system for extracting information from a network of TinyOS sensors.”
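The idea of in-network aggregation can be sketched generically. The Python code below is not TinyDB code: the `Mote` class, the routing tree and the readings are made up for illustration. Each node merges its children's partial (sum, count) pairs with its own reading, so only one small pair travels up each link instead of every raw sample.

```python
from dataclasses import dataclass, field

@dataclass
class Mote:
    node_id: int
    reading: float                      # local sensor sample (made-up value)
    children: list = field(default_factory=list)

    def partial_avg(self):
        """Merge the children's (sum, count) pairs with the local reading."""
        total, count = self.reading, 1
        for child in self.children:
            child_sum, child_count = child.partial_avg()
            total += child_sum
            count += child_count
        return total, count

# Hypothetical routing tree: node 1 is next to the base station.
leaves = [Mote(3, 20.0), Mote(4, 24.0)]
root = Mote(1, 22.0, [Mote(2, 18.0, leaves)])

total, count = root.partial_avg()
average = total / count                 # computed at the base station
print(count, average)
```

Carrying (sum, count) rather than a running average keeps the merge exact regardless of how many nodes sit under each branch.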

But further useful database functionalities are still lacking…

• One VSDB should reside at least on every generic sensing device (e.g. Mica2)

• To compose a distributed/federated database

  • Each VSDB should be context aware

  • Each VSDB should be able to “appropriately” redirect queries to neighbours (P2P)

    • because of an internal fault or a generic unavailability

    • because it does not possess the information

    • because the other node “knows” something more, in order to complete the information

    • because the other node has a less power-consuming sensor on-board

  • design appropriate, optimized query processing plans (e.g. redirect subquery, cache subquery result, etc.)
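The redirect-and-cache behaviour listed above might look like the following Python sketch. Everything in it is a hypothetical illustration, not a proposed design: the `VSDBNode` class, the `sensor_ok` flag standing in for an internal fault, and the three-node topology are all assumptions.

```python
class VSDBNode:
    """Toy model of a very small database that can redirect queries to neighbours."""

    def __init__(self, name, data, sensor_ok=True):
        self.name = name
        self.data = dict(data)      # locally sensed values
        self.sensor_ok = sensor_ok  # False models an internal fault
        self.neighbours = []
        self.cache = {}             # cached results of redirected subqueries

    def query(self, key, visited=None):
        """Return (value, source node name), or None if no reachable node can answer."""
        visited = set() if visited is None else visited
        visited.add(self.name)
        if self.sensor_ok and key in self.data:
            return self.data[key], self.name
        if key in self.cache:
            return self.cache[key]
        # Redirect: this node is faulty or does not possess the information.
        for node in self.neighbours:
            if node.name not in visited:
                result = node.query(key, visited)
                if result is not None:
                    self.cache[key] = result
                    return result
        return None

a = VSDBNode("a", {"light": 300})
b = VSDBNode("b", {"temperature": 99.9}, sensor_ok=False)  # faulty sensor
c = VSDBNode("c", {"temperature": 21.5})
a.neighbours, b.neighbours, c.neighbours = [b], [a, c], [b]

print(a.query("temperature"))  # answered by c, reached through b
print(a.query("humidity"))     # no node has it
```

The `visited` set prevents a query from bouncing back and forth between neighbours; a real plan would also weigh the power cost of each hop.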


Data extraction from web sources and data warehouse construction

A topic of interest: medical conferences around the world:

• Definition of domain ontologies

• Information extractors

• Design and implementation of the database and the data warehouse