Self Maintenance of materialized XML views with non-cooperative data sources DBDBD – 2006 Virginie Sans –ETIS/CNRS Laboratory– MIDI Team


Page 1: Self Maintenance of materialized XML views  with non-cooperative data sources

Self Maintenance of materialized XML views with non-cooperative data sources

DBDBD – 2006

Virginie Sans –ETIS/CNRS Laboratory– MIDI Team

Page 2: Self Maintenance of materialized XML views  with non-cooperative data sources

Summary

1) Issue and context
   1) Pre-requisite
   2) The issue
   3) Context
   4) State of the art

2) Contributions
   1) View computation with the XAlgebra
   2) Detection and identification of source updates
   3) View maintenance
   4) Applications and performances

Conclusion

Page 3: Self Maintenance of materialized XML views  with non-cooperative data sources

Mediation architecture

Introduced by Wiederhold

The architecture:
• Mediator
• Wrappers
• Sources
• Query language

1.1 Pre-requisite

Page 4: Self Maintenance of materialized XML views  with non-cooperative data sources

Mediation architecture

Mediator
• Handles the user request: canonization, atomization
• Sends atomic requests to a source via its wrapper

Wrappers
• Translate a query coming from the mediator into a query in the native language of the web source
• Give the mediator an answer in XML

Data sources
• Heterogeneous
• Distributed
• In a web context: partially unavailable

[Figure: mediator, wrapper, and source; labels: atomic request, XML, SQL, tuples]

1.1 Pre-requisite

Page 5: Self Maintenance of materialized XML views  with non-cooperative data sources

Views

What about views?
• Data integration
• Access control, security
• Data warehouses

Why?
• Interoperability
• Heterogeneous data

Materializing views
• Fast access to complex queries
• Better availability
• Request optimization

[Figure: mediator holding the materialized views, above three wrappers; source labels: RDB, SQL, HTML]

1.1 Pre-requisite

Page 6: Self Maintenance of materialized XML views  with non-cooperative data sources

Issue: View maintenance

When data sources are updated, the view consistency should be kept.

Maintenance process
• Recomputation: recompute the whole view from scratch
• Incremental maintenance: compute changes to the view in response to changes to the base sources

[Figure: Source t, Source t+1, View t, View t+1; arrow labels: update, view computation, recomputation, incremental maintenance]

1.2 Issue
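The difference between the two maintenance processes can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the view is a plain selection over a flat list of rows, and the names `Update`, `compute_view`, `maintain_view`, and `is_cheap`, as well as the sample rows, are hypothetical and do not come from the slides.

```python
from dataclasses import dataclass
from typing import Callable

Row = tuple  # hypothetical: one source record, here (title, price)


@dataclass(frozen=True)
class Update:
    kind: str  # "insert" or "delete"
    row: Row


def compute_view(source: list[Row], keep: Callable[[Row], bool]) -> list[Row]:
    """Recomputation: scan the whole source again."""
    return [row for row in source if keep(row)]


def maintain_view(view: list[Row], updates: list[Update],
                  keep: Callable[[Row], bool]) -> list[Row]:
    """Incremental maintenance: touch only the rows named in the updates."""
    result = list(view)
    for update in updates:
        if update.kind == "insert" and keep(update.row):
            result.append(update.row)
        elif update.kind == "delete" and update.row in result:
            result.remove(update.row)
    return result


def is_cheap(row: Row) -> bool:
    return row[1] < 60


# Made-up sample data: the source at t, the updates, and the source at t+1.
source_t = [("book A", 65.95), ("book B", 39.95)]
updates = [Update("insert", ("book C", 20.0)),
           Update("delete", ("book B", 39.95))]
source_t1 = [("book A", 65.95), ("book C", 20.0)]

view_t = compute_view(source_t, is_cheap)
view_t1 = maintain_view(view_t, updates, is_cheap)
```

Both paths should give the same view at t+1; the incremental one never reads `source_t1`, which is the point when sources are hard to reach.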

Page 7: Self Maintenance of materialized XML views  with non-cooperative data sources

Context: semi-structured XML data

• XML views are materialized at the mediator level
• Hierarchical data
• No schema, except the query schema

<bib>
  <book>
    <price> 65.95 </price>
    <title> Advanced Programming in the Unix environment </title>
  </book>
  <book>
    <title> TCP/IP Illustrated </title>
  </book>
  <book>
    <price> 65.95 </price>
    <title> Advanced Programming in the Unix environment </title>
  </book>
  <book>
    <price>39.95</price>
    <title> Data on the Web </title>
    <title> Données sur le Web </title>
  </book>
</bib>

1.3 Context

Page 8: Self Maintenance of materialized XML views  with non-cooperative data sources

Context: XQuery

XQuery
• Dedicated to XML data
• Relational operators (projection, selection, join, union, …)
• XML operators (tagging, unnesting, aggregation, …)
• FLWOR syntax (pronounced "flower")

<result>
  {
    for $b in doc("bib.xml")/bib/book
    let $a := $b/author
    where $b/price/text() < 60
    order by $b/year
    return <cheap_book> { $b/title } </cheap_book>
  }
</result>

FLWOR syntax

for $var in forest [, $var in forest]*
let $var := subtree
where condition
return result

1.3 Context
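For readers less familiar with XQuery, the FLWOR query above can be mirrored step by step with Python's standard `xml.etree.ElementTree` module. This is a rough sketch, not an XQuery engine: the function name `cheap_books` is hypothetical, and the `<year>` elements and the second cheap book are sample data added here so that the ordering step has something to do.

```python
import xml.etree.ElementTree as ET

# Sample document; the <year> values and "Sample cheap book" are illustrative.
BIB_XML = """
<bib>
  <book><year>1992</year><price>65.95</price><title>Advanced Programming in the Unix environment</title></book>
  <book><year>2000</year><price>39.95</price><title>Data on the Web</title></book>
  <book><year>1994</year><title>TCP/IP Illustrated</title></book>
  <book><year>1999</year><price>25.00</price><title>Sample cheap book</title></book>
</bib>
"""


def cheap_books(bib: ET.Element, max_price: float = 60.0) -> ET.Element:
    result = ET.Element("result")
    # for $b in .../bib/book  where $b/price/text() < 60
    # (a book without a price never satisfies the condition)
    selected = [
        book for book in bib.findall("book")
        if book.findtext("price") is not None
        and float(book.findtext("price")) < max_price
    ]
    # order by $b/year
    selected.sort(key=lambda book: int(book.findtext("year", default="0")))
    # return <cheap_book>{ $b/title }</cheap_book>
    for book in selected:
        cheap = ET.SubElement(result, "cheap_book")
        for title in book.findall("title"):
            cheap.append(title)
    return result


root = ET.fromstring(BIB_XML)
result = cheap_books(root)
titles = [title.text for title in result.iter("title")]
```

The `let $a := $b/author` binding is left out because the query never uses `$a` in its result.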

Page 9: Self Maintenance of materialized XML views  with non-cooperative data sources

Context: Other specificities

Views are computed using the XAlgebra (cf. View computation)

Wrappers have limited resources
• Few computation possibilities
• A component named the logger stores the last modification date and a checksum of the sources

Non-cooperative web sources
• No information about their updates
• Not always available
• Not enough granularity

1.3 Context

Page 10: Self Maintenance of materialized XML views  with non-cooperative data sources

State of the art (1/2)

Relational views
• Not fit for semi-structured data

Abiteboul et al.
• OEM (Object Exchange Model)
• LOREL language
• Some operators are missing

VOX (Rainbow team)
• Needs to know the exact position in the XML tree where the update has been done

1.4 State of the art

Page 11: Self Maintenance of materialized XML views  with non-cooperative data sources

State of the art (2/2)

Cobena et al.
• XDiff: an algorithm for XML file comparison
• Needs a copy of the source at the wrapper level

Bonnet et al. / Papadimos et al.
• Parachute queries
• A mutant query plan

What about when sources are really unavailable?

Our goal:
• Reduce source access to the minimum
• Use information that is stored in the view

1.4 State of the art

Page 12: Self Maintenance of materialized XML views  with non-cooperative data sources

View maintenance: The process

View computation
• An algebraic approach using the XAlgebra
• Extension of the XAlgebra (identifiers)

Update detection
• Comparison of the source information with that stored in the logger

Update identification
• Recovering process
• Diff algorithm

View maintenance
• Propagation rules for each operator

2.1 View computation

Page 13: Self Maintenance of materialized XML views  with non-cooperative data sources

View computation

Steps:

2.1 View computation

Page 14: Self Maintenance of materialized XML views  with non-cooperative data sources

The XAlgebra data model

Data structures: XRelation, XTuple, XAttributes

Operators: XSource, XConstruct, XUnion, …

2.1 View computation

Page 15: Self Maintenance of materialized XML views  with non-cooperative data sources

XSource operator: step 1

XQuery analysis

for $f in doc("informations.xml")/personnes/personne
let $a := $f/nom
where $f/age < 27 and $a = "Durand"
return (<nom>{$a}</nom>, <prenom>{$f/prenom}</prenom>)

We obtain:
• A context
• A set of patterns

Path extraction:
• Optional
• Mandatory
• Hidden

2.1 View computation

Page 16: Self Maintenance of materialized XML views  with non-cooperative data sources

XSource operator: steps 2 and 3

From XML subtrees to the tabular structure

• 1 subtree => 1 XTuple
• XRelation = set of XTuples

2.1 View computation
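The mapping from subtrees to a tabular structure can be pictured with the short Python sketch below. It is an assumption-laden illustration rather than the XAlgebra implementation: the function `xsource`, the representation of an XTuple as a dictionary from pattern to list of values, and the sample `<age>` value are all hypothetical; the context and patterns follow the `personnes/personne` query of step 1.

```python
import xml.etree.ElementTree as ET

# Illustrative source document; the age value is made up.
SOURCE = """
<personnes>
  <personne><nom>Durand</nom><prenom>Eric</prenom><age>25</age></personne>
  <personne><nom>Leclerc</nom><prenom>John</prenom></personne>
</personnes>
"""

# Hypothetical representation: pattern -> values found under one subtree.
XTuple = dict[str, list[str]]


def xsource(xml_text: str, context: str, patterns: list[str]) -> list[XTuple]:
    """Build one XTuple per subtree matched by the context path."""
    root = ET.fromstring(xml_text)
    xrelation = []
    for subtree in root.findall(context):
        xrelation.append({
            pattern: [node.text or "" for node in subtree.findall(pattern)]
            for pattern in patterns
        })
    return xrelation


# The context is relative to the root element <personnes>.
xrelation = xsource(SOURCE, "personne", ["nom", "prenom", "age"])
```

Keeping a list per pattern lets an optional path (here `age` for the second person) show up as an empty list instead of breaking the tabular shape.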

Page 17: Self Maintenance of materialized XML views  with non-cooperative data sources

XSource operator: extending the algebra

Adding identifiers: XTids

An XTid is a set of pairs:

{(idsource, idfragment), …}

2.1 View computation

Page 18: Self Maintenance of materialized XML views  with non-cooperative data sources

View computation: XOperator

XProject

2.1 View computation

Page 19: Self Maintenance of materialized XML views  with non-cooperative data sources

View computation: XOperator

XJoin

XTids propagation: card(XTID) > 1 for some nodes

2.1 View computation
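One way to picture XTid propagation through a join is to take the union of the XTids of the two operands. The Python sketch below makes that assumption explicit: the class `XTuple`, the function `xjoin`, the second source, and the sample values are hypothetical, and the XTid is attached to the whole tuple here for brevity, whereas the slides attach XTids to individual nodes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class XTuple:
    values: tuple
    xtid: frozenset  # set of (idsource, idfragment) pairs


def xjoin(left: list, right: list, on_left: int, on_right: int) -> list:
    """Equi-join on one column; the result carries the union of both XTids."""
    joined = []
    for l in left:
        for r in right:
            if l.values[on_left] == r.values[on_right]:
                joined.append(XTuple(l.values + r.values, l.xtid | r.xtid))
    return joined


# Made-up fragments from two sources (ids 1 and 2).
people = [XTuple(("Durand", "Eric"), frozenset({(1, 11)})),
          XTuple(("Leclerc", "John"), frozenset({(1, 3)}))]
cities = [XTuple(("Durand", "Paris"), frozenset({(2, 5)}))]

view = xjoin(people, cities, 0, 0)
```

After the join, the single result tuple remembers both of its origins, so its XTid has cardinality 2.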

Page 20: Self Maintenance of materialized XML views  with non-cooperative data sources

Update detection and identification

Detection

Comparison of the source information with that stored in the logger:
• The last modification date
• The checksum of the source

Identification
• Partial recovery of the source information based on XTids
• Comparison of the recovered XRelation with the updated source
• Δ computation

2.2 Update detection and identification
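The detection step can be sketched as a comparison against the logger entry. The code below is a minimal illustration under assumptions not stated in the slides: SHA-256 is used as the checksum, the modification date is a numeric timestamp, and the names `LoggerEntry`, `checksum`, and `source_changed` are hypothetical.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class LoggerEntry:
    last_modified: float  # timestamp recorded at the previous check
    checksum: str         # checksum recorded at the previous check


def checksum(content: bytes) -> str:
    # Assumption: any stable digest would do; SHA-256 is used for illustration.
    return hashlib.sha256(content).hexdigest()


def source_changed(entry: LoggerEntry, last_modified: float,
                   content: bytes) -> bool:
    """Cheap date test first, then confirm with the checksum."""
    if last_modified <= entry.last_modified:
        return False
    return checksum(content) != entry.checksum


entry = LoggerEntry(last_modified=100.0, checksum=checksum(b"<bib/>"))

unchanged = source_changed(entry, 100.0, b"<bib/>")
touched_only = source_changed(entry, 200.0, b"<bib/>")
updated = source_changed(entry, 200.0, b"<bib><book/></bib>")
```

Checking the date before the checksum keeps the work on the wrapper side small, which matters since wrappers have few computation possibilities.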

Page 21: Self Maintenance of materialized XML views  with non-cooperative data sources

XRecover

Step 1: project XRv on the XR1 patterns

2.2 Update detection and identification

Page 22: Self Maintenance of materialized XML views  with non-cooperative data sources

XRecover

Step 2: filtering XTuples values

2.2 Update detection and identification

Page 23: Self Maintenance of materialized XML views  with non-cooperative data sources

XRecover

Step 3: re-ordering XTuples

XTidUnnest: XTuples are unnested depending on their XTids

2.2 Update detection and identification

Page 24: Self Maintenance of materialized XML views  with non-cooperative data sources

XRecover

Step 3: re-ordering XTuples

XTidNest: XTuples are nested by their XTids

XTuples are re-ordered

2.2 Update detection and identification
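The slides do not spell out the semantics of XTidUnnest and XTidNest, so the following Python sketch is only one possible reading: unnesting yields one row per (XTid pair, value), nesting groups the rows by source fragment, and re-ordering sorts the groups by fragment identifier. The function names, the data layout, and the sample values are hypothetical.

```python
from collections import defaultdict

# Hypothetical view content: each value with its XTid,
# a set of (idsource, idfragment) pairs.
view_values = [
    ("Alain", {(1, 4), (1, 3)}),
    ("Durand", {(1, 3)}),
    ("Leclerc", {(1, 4)}),
]


def xtid_unnest(values):
    """One row per (pair, value): a value shared by n fragments yields n rows."""
    return [(pair, value) for value, xtid in values for pair in sorted(xtid)]


def xtid_nest(rows, source_id):
    """Group the rows of one source by fragment, in fragment order."""
    groups = defaultdict(list)
    for (idsource, idfragment), value in rows:
        if idsource == source_id:
            groups[idfragment].append(value)
    return [(idfragment, groups[idfragment]) for idfragment in sorted(groups)]


rows = xtid_unnest(view_values)
recovered = xtid_nest(rows, source_id=1)
```

This reading assumes that fragment identifiers reflect the order of the fragments in the source, so sorting on them restores the source order.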

Page 25: Self Maintenance of materialized XML views  with non-cooperative data sources

Update identification: comparison algorithm

Comparison of XR1_{t+1} with XR_t'

• XR1_{t+1} is the XRelation obtained by applying XSource to source 1 at t+1
• XR_t' is the partial recovery of the XRelation of source 1 at t

Remark: XR1_{t+1} can also be filtered using predicates before comparison

The Diff algorithm is based on Unix diff (Hunt & McIlroy). The unit of comparison is the XTuple instead of the line

2.2 Update detection and identification

Page 26: Self Maintenance of materialized XML views  with non-cooperative data sources

Update identification: Diff algorithm

Delta with hunks:
• Insert(pos; XTuple)
• Delete(pos; XTuple)
• Replace(pos; XTuple_old, XTuple_new)

Insert(2; { {Leclerc, Avide, {(1,3)}}, {John, Avide, {(1,3)}} })

Delete(4; { {Durand, Avide, {(1,11)}}, {Marcel, Avide, {(1,11)}}, {Eric, Avide, {(1,11)}} })

Etc.

2.2 Update detection and identification
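A delta made of insert, delete, and replace hunks can be sketched with Python's standard `difflib.SequenceMatcher`, comparing sequences of XTuples instead of lines. Note that `difflib` uses its own matching algorithm, not the Hunt & McIlroy one named in the slides; it stands in here only to show the shape of the output. The function `xdelta`, the 0-based positions, and the sample tuples are hypothetical.

```python
from difflib import SequenceMatcher


def xdelta(old: list, new: list) -> list:
    """Return hunks describing how to turn `old` into `new`.

    Positions are 0-based indexes into `old`; XTuples must be hashable.
    """
    hunks = []
    matcher = SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "insert":
            hunks.append(("insert", i1, new[j1:j2]))
        elif tag == "delete":
            hunks.append(("delete", i1, old[i1:i2]))
        elif tag == "replace":
            hunks.append(("replace", i1, old[i1:i2], new[j1:j2]))
    return hunks


# Made-up XTuples: the recovered relation at t and the source relation at t+1.
recovered_t = [("Durand", "Eric"), ("Leclerc", "John"), ("Martin", "Alain")]
source_t1 = [("Durand", "Eric"), ("Martin", "Alain"), ("Petit", "Marcel")]

delta = xdelta(recovered_t, source_t1)
```

Each hunk carries whole XTuples, so the maintenance rules can be applied per tuple without going back to the source.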

Page 27: Self Maintenance of materialized XML views  with non-cooperative data sources

Maintenance rules: from delta to view maintenance

Case of a deletion: delete(pos; xtuple)

An XTuple is associated with an XTid {(x)} such that card = 1. Each XValue of the view has XTids, noted XTID.

1) We delete the pair x from each XValue such that x ∈ XTID
   Example: the XTuple whose XTid is x = (1,3) has been deleted. The XValue {Alain} with XTID {(1,3), (1,4)} becomes the XValue {Alain} with XTID {(1,4)}

2) We delete each XValue such that card(XTID) = 0
   If the XValue {Alain} with XTID {(1,3)} becomes the XValue {Alain} with an empty XTID, we delete the XValue entirely

3) If the XValue was concerned by the predicate, we delete the XTuple (join and restriction case)

2.3 View maintenance
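The three deletion rules can be sketched as follows. This is a hedged illustration, not the presented system: a view XTuple is modeled as a dictionary from column name to a (value, XTID) pair, and the function `propagate_delete`, its `predicate_columns` parameter, and the sample data are hypothetical.

```python
def propagate_delete(view: list, deleted_pair: tuple,
                     predicate_columns: set) -> list:
    """Apply a source deletion, identified by its XTid pair, to the view."""
    maintained = []
    for xtuple in view:
        kept = {}
        drop_tuple = False
        for column, (value, xtid) in xtuple.items():
            remaining = xtid - {deleted_pair}   # rule 1: remove the pair
            if remaining:
                kept[column] = (value, remaining)
            elif column in predicate_columns:   # rule 3: predicate value lost
                drop_tuple = True
            # rule 2: otherwise the XValue with an empty XTID disappears
        if not drop_tuple and kept:
            maintained.append(kept)
    return maintained


# Made-up view: "nom" is the column used by the view predicate.
view = [
    {"nom": ("Alain", frozenset({(1, 3), (1, 4)})),
     "ville": ("Paris", frozenset({(1, 3)}))},
    {"nom": ("Durand", frozenset({(1, 3)})),
     "ville": ("Lyon", frozenset({(1, 3)}))},
]

maintained = propagate_delete(view, (1, 3), predicate_columns={"nom"})
```

In this run the first XTuple keeps {Alain} with XTID {(1,4)} and loses its other value, while the second XTuple is removed because the value used by the predicate no longer has any origin.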

Page 28: Self Maintenance of materialized XML views  with non-cooperative data sources

Maintenance rules: from delta to view maintenance

Case of an insertion: insert(pos; xtuple)

1) A new XTid is created
   Goal: preserve the XTuples order for a later recovery

2) Depending on the operator, we obtain various maintenance instructions
• Projection: insertion of the projection of the xtuple
• Select: if the xtuple satisfies the predicate, insertion
• Join XR1 * XR2: computation of XT = xtuple * XR2. If XT is not empty, insertion of XT. Depending on the predicate, we can request either XR2 or its recovery
• Union and Intersect: duplicates are preserved. Union is handled like a Select whose predicate is always true; Intersect is handled like a join

2.3 View maintenance
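The per-operator insertion rules can be illustrated with three small Python functions. This is a simplified sketch: XTuples are plain dictionaries, the creation of the new XTid is left out, and the function names and sample values are hypothetical.

```python
def insert_project(view: list, xtuple: dict, columns: list) -> list:
    """Projection: insert the projection of the new xtuple."""
    return view + [{column: xtuple[column] for column in columns}]


def insert_select(view: list, xtuple: dict, predicate) -> list:
    """Select: insert the xtuple only if it satisfies the predicate."""
    return view + [xtuple] if predicate(xtuple) else view


def insert_join(view: list, xtuple: dict, other: list, predicate) -> list:
    """Join: compute XT = xtuple * XR2 and insert it when it is not empty."""
    xt = [{**xtuple, **o} for o in other if predicate(xtuple, o)]
    return view + xt


# Made-up inserted xtuple and second relation (XR2 or its recovery).
new = {"nom": "Durand", "prenom": "Eric", "age": 25}
xr2 = [{"nom": "Durand", "ville": "Paris"}, {"nom": "Petit", "ville": "Lyon"}]

projected = insert_project([], new, ["nom", "prenom"])
selected = insert_select([], new, lambda x: x["age"] < 27)
rejected = insert_select([], {**new, "age": 40}, lambda x: x["age"] < 27)
joined = insert_join([], new, xr2, lambda x, o: x["nom"] == o["nom"])
```

The `other` argument of `insert_join` can be either the real XR2 or its recovery from the view, which is what lets the mediator avoid a source request when the predicate allows it.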

Page 29: Self Maintenance of materialized XML views  with non-cooperative data sources

Maintenance rules: from delta to view maintenance

Case of a modification: Replace(pos; XTuple_old, XTuple_new)

XTuple modification = XValue modification OR XValues deletion followed by insertion

• Project and Union: modification of the concerned XValues
• Select and Intersect: if the modification is applied to an XValue that must verify the condition, deletion of the XTuple; else modification of the XValues (Intersect is handled like a Select)
• Join: deletion followed by insertion

2.3 View maintenance

Page 30: Self Maintenance of materialized XML views  with non-cooperative data sources

Maintenance rules: from delta to view maintenance

2.3 View maintenance

Page 31: Self Maintenance of materialized XML views  with non-cooperative data sources

Maintenance rules: missing information

Missing information (join?)
• Source recovery
• Multi-view strategy
• Source request

Goal: limited access to the sources!

Example: View = S1 * S2
1) xtuple x is inserted in S1
2) Computation of S2' (the recovery of S2)
3) Insertion: x * S2'

[Figure: mediator holding the materialized views, above wrappers over SQL and HTML sources]

2.3 View maintenance

Page 32: Self Maintenance of materialized XML views  with non-cooperative data sources

Applications

• On the web
  When the necessary sources are unavailable
  Goal: limited access to them

• With sensors (ANR project)
  With wireless sensors
  Goal: preserve power resources

2.4 Applications and performances

Page 33: Self Maintenance of materialized XML views  with non-cooperative data sources

Performances

• Comparison between XRecover and recomputation

2.4 Applications and performances

Page 34: Self Maintenance of materialized XML views  with non-cooperative data sources

Performances

• Comparison between XRecover and recomputation

2.4 Applications and performances

Page 35: Self Maintenance of materialized XML views  with non-cooperative data sources

Contributions

Maintenance process in the context of non-cooperative web sources

Contribution to the XAlgebra
• New operators: XRecover, XTidUnnest, XTidNest
• New data structure: XTids

Future work
• Order-sensitive view maintenance
• A better Diff algorithm

Conclusion

Page 36: Self Maintenance of materialized XML views  with non-cooperative data sources

Thanks for your attention!

Any questions?