1 XPath and XSLT CSE3201/CSE4500 Information Retrieval Systems

XPath and XSLT

CSE3201/CSE4500

Information Retrieval Systems

Manipulating XML Documents

[Figure: an XML document is read by a parser, which passes the data on to one or more applications.]

What is XSL?

• Extensible Stylesheet Language

• Developed by the W3C XSL Working Group

• Motivation: to handle the manipulation and presentation of XML documents

• Consists of: XSLT and XSL-FO

XSL

[Figure: the transformation process. A stylesheet processor takes an XML document and an XSL document as input and produces a presentation document.]

Transformation Tools

• XPath

• XSL (Extensible Stylesheet Language)
  – XSLT (XSL Transformations)
  – XSL-FO (XSL Formatting Objects)

Transformation Process

XSLT Processing

• Types of processing:
  – Change of vocabulary
  – Reorder data elements
  – Combine data elements
  – Filter and exclude data elements

• Output:
  – Other XML vocabularies or fragments
  – Non-XML formats

• Uses:
  – Display and printing
  – Transformation of data

XPath

• A language for locating items in an XML document.

• An XPath expression describes how to navigate through an XML document.

• Treats an XML document as a "tree".

• Any part of a document, e.g., an element or an attribute, is treated as a "node".

• Version covered here: XPath 1.0

XPath

• Syntax (full form): axis::node-test[predicate]

• Axis: describes the relationship between the selected nodes and the context node, e.g., child, parent.

• Node test: the condition a node must meet to be selected (e.g., its name).

• Predicate: further refines the set of nodes produced by the node test.

XPath Axes

[Figure: XPath axes relative to the context node: parent and ancestors above it, siblings on either side, children and descendants below it, and its attributes.]
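A minimal sketch of several axes in use, assuming as input the <name> document from the "Location Path" slide, i.e. <name><first>John</first><middle>Little</middle><last>Howard</last></name>. The context node is <middle>, and each step is written in the full axis::node-test form.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="text"/>

  <!-- Move the context to the <middle> element -->
  <xsl:template match="/">
    <xsl:apply-templates select="child::name/child::middle"/>
  </xsl:template>

  <xsl:template match="middle">
    <!-- parent axis: the element that contains <middle> -->
    <xsl:text>Parent: </xsl:text>
    <xsl:value-of select="name(parent::*)"/>
    <xsl:text>&#10;</xsl:text>
    <!-- preceding-sibling axis: siblings that come before <middle> -->
    <xsl:text>Preceding sibling: </xsl:text>
    <xsl:value-of select="preceding-sibling::first"/>
    <xsl:text>&#10;</xsl:text>
    <!-- following-sibling axis: siblings that come after <middle> -->
    <xsl:text>Following sibling: </xsl:text>
    <xsl:value-of select="following-sibling::last"/>
    <xsl:text>&#10;</xsl:text>
    <!-- ancestor axis: every element above <middle> -->
    <xsl:text>Ancestor elements: </xsl:text>
    <xsl:value-of select="count(ancestor::*)"/>
  </xsl:template>
</xsl:stylesheet>
```

For the assumed input this should print "Parent: name", "Preceding sibling: John", "Following sibling: Howard" and "Ancestor elements: 1" on separate lines.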

Node Test

• A node test identifies nodes in the document that meet the criteria of the test.

• The simplest node test matches nodes by element name.

• Examples:

child::book => finds every child element named "book".

child::author => finds every child element named "author".

Predicate

• A predicate further refines, or filters, the node set produced by the node test.

• Examples:
  – Find the third book in the list: child::book[position()=3]
  – Find all the books that have an <isbn> element: child::book[isbn]
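A minimal sketch of both predicates inside a stylesheet, assuming the bookshop document from the "Example of a Stylesheet" slide. The <isbn> child is hypothetical (the sample books do not have one), so the second loop only outputs something for documents that include it.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="html"/>

  <xsl:template match="/bookshop">
    <!-- Position predicate: keeps only the third <book> child (empty if there is none) -->
    <p>Third book: <xsl:value-of select="child::book[position()=3]/child::title"/></p>

    <!-- Element predicate: keeps only <book> children that have an <isbn> child -->
    <xsl:for-each select="child::book[isbn]">
      <p><xsl:value-of select="title"/></p>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```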

Abbreviations

Formal                       Short      Description
child::book                  book       Select all <book> element children of the context node.
child::*                     *          Select all element children of the context node.
self::node()                 .          Select the context node.
parent::node()               ..         Select the parent of the context node.
child::book[position()=1]    book[1]    Select the first <book> child of the context node.
attribute::*                 @*         Select all attributes of the context node.
attribute::number            @number    Select the attribute named "number" of the context node.
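A sketch for side-by-side comparison rather than useful output: each pair of xsl:value-of elements below should select the same node, once in the formal form and once abbreviated. It assumes the bookshop document from the "Example of a Stylesheet" slide; the id attribute is hypothetical.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="text"/>

  <xsl:template match="book">
    <!-- child axis: child::title and title are equivalent -->
    <xsl:value-of select="child::title"/>
    <xsl:value-of select="title"/>

    <!-- attribute axis: "id" is a hypothetical attribute name -->
    <xsl:value-of select="attribute::id"/>
    <xsl:value-of select="@id"/>

    <!-- parent axis: both give the parent's name, "bookshop" in the sample -->
    <xsl:value-of select="name(parent::node())"/>
    <xsl:value-of select="name(..)"/>
  </xsl:template>
</xsl:stylesheet>
```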

Location Path

[Figure: document tree. Below the document root is <name>, whose children are <first> ("John"), <middle> ("Little") and <last> ("Howard").]

Uses "/" to build a path, e.g., /name/first
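A minimal sketch of the absolute path above in a stylesheet, assuming the input is exactly the tree in the figure.

```xml
<?xml version="1.0"?>
<!-- Assumed input: <name><first>John</first><middle>Little</middle><last>Howard</last></name> -->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Absolute location path: document root, then <name>, then <first> -->
    <xsl:value-of select="/name/first"/>
  </xsl:template>
</xsl:stylesheet>
```

Applied to the assumed input, this should output John.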

Relative vs Absolute Path

• Absolute path
  – The full path must be given, starting from the root node.
  – e.g., /name/first

• Relative path
  – The path is given starting from the current context node.
  – e.g., if the current context node is <name>, the XPath expression for the <first> node is simply: first
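A minimal sketch contrasting the two kinds of path, again assuming the <name> document from the "Location Path" slide: an absolute path moves the context to <name>, and the template for <name> then uses relative paths.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Absolute path: makes <name> the context node for the next template -->
    <xsl:apply-templates select="/name"/>
  </xsl:template>

  <xsl:template match="name">
    <!-- Relative paths: resolved from the context node <name> -->
    <xsl:value-of select="first"/>
    <xsl:text> </xsl:text>
    <xsl:value-of select="last"/>
  </xsl:template>
</xsl:stylesheet>
```

For that input the output should be "John Howard".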

Recursive Descent Operator

• Locates nodes by name, regardless of their position in the document.

• Uses "//"

• Example: //first
  – Selects every <first> element in the document, however far down the tree it is.

• Can decrease the performance of the stylesheet:
  – the entire document must be searched by the XSLT processor.
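A minimal sketch, assuming the <name> document from the "Location Path" slide. Both selections below reach the same <first> element; when the structure is known, the explicit path avoids a search of the whole tree, although how much that saves depends on the processor and the document size.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Recursive descent: finds <first> elements at any depth -->
    <xsl:for-each select="//first">
      <xsl:value-of select="."/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>

    <!-- Explicit path: only looks directly under /name -->
    <xsl:value-of select="/name/first"/>
  </xsl:template>
</xsl:stylesheet>
```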

Filtering Nodes

• Filtering is done with an XPath predicate, written in square brackets "[ ]".

• Using an element as a filter:
  – book[price] matches any <book> element that has a <price> child element.

• Using an attribute as a filter:
  – book[@id] matches any <book> element that has an id attribute.
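A minimal sketch of both filters, assuming the bookshop document from the "Example of a Stylesheet" slide. The id attribute and the value "b1" are hypothetical; the sample books do not carry ids.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="html"/>

  <xsl:template match="/bookshop">
    <!-- Element filter: only books that have a <price> child -->
    <h2>Books with a price</h2>
    <xsl:for-each select="book[price]">
      <p><xsl:value-of select="title"/></p>
    </xsl:for-each>

    <!-- Attribute filter: only books that carry an id attribute -->
    <h2>Books with an id</h2>
    <xsl:for-each select="book[@id]">
      <p><xsl:value-of select="@id"/>: <xsl:value-of select="title"/></p>
    </xsl:for-each>

    <!-- Attribute value filter: "b1" is a hypothetical id value -->
    <p><xsl:value-of select="book[@id='b1']/title"/></p>
  </xsl:template>
</xsl:stylesheet>
```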

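XPath Expression

• Some operators for building an XPath expression:

and      Logical AND
or       Logical OR
not()    Logical negation (strictly a function, not an operator)
=        Equal
!=       Not equal
<        Less than (written &lt; inside an XSLT attribute)
<=       Less than or equal (written &lt;= inside an XSLT attribute)
>        Greater than
>=       Greater than or equal
|        Union of two node sets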
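A minimal sketch combining several of these operators, assuming the bookshop document from the "Example of a Stylesheet" slide; the id attribute is hypothetical. Note the escaped &lt; in the test attribute, which XML requires.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="html"/>

  <xsl:template match="/bookshop">
    <!-- and / or / not(): books with both a title and an id, or with no author -->
    <xsl:for-each select="book[(title and @id) or not(author)]">
      <p><xsl:value-of select="title"/></p>
    </xsl:for-each>

    <!-- Comparison: "<" must be escaped as &lt; inside an attribute value -->
    <xsl:if test="count(book) &lt; 10">
      <p>Fewer than ten books listed.</p>
    </xsl:if>

    <!-- Union: all titles and all surnames, processed in document order -->
    <xsl:for-each select="book/title | book/author/surname">
      <p><xsl:value-of select="."/></p>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```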

XPath Expression - Examples

• <xsl:template match="/">

• <xsl:if test="not(position()=last())">

Usage of XPath in XSLT

• XSLT uses XPath expressions to:
  – Match node sets in order to execute templates.
  – Evaluate node sets to control the execution of conditional XSLT elements.
  – Select node sets to change the current context and direct the flow of execution through the source document.
  – Select node sets to obtain an output value.

(Professional XML, page 379)
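A minimal sketch with one XPath expression for each of the four uses listed above, assuming the bookshop document from the "Example of a Stylesheet" slide. The numbered comments map back to the list.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="html"/>

  <!-- 1. Match: this template runs for every <book> node -->
  <xsl:template match="book">
    <!-- 2. Evaluate: the test decides whether the body runs -->
    <xsl:if test="author">
      <!-- 3. Select: make each <author> the context node and apply templates -->
      <xsl:apply-templates select="author"/>
    </xsl:if>
  </xsl:template>

  <xsl:template match="author">
    <!-- 4. Select: obtain an output value relative to the context node -->
    <p><xsl:value-of select="surname"/></p>
  </xsl:template>
</xsl:stylesheet>
```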

XPath Function

• XPath functions can be used to:
  – manipulate node sets, e.g., count, last, name, position
  – manipulate strings, e.g., concat, substring, contains
  – test Boolean values, e.g., lang, false, true
  – perform numeric operations, e.g., ceiling, floor, number, round, sum
  – perform XSLT-specific manipulation, e.g., current
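A minimal sketch using functions from each group, assuming the bookshop document from the "Example of a Stylesheet" slide. It also uses normalize-space, another XPath 1.0 string function, to trim the stray spaces in the sample surname.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="html"/>

  <xsl:template match="/bookshop">
    <!-- Node-set function: how many <book> children there are -->
    <p>Books: <xsl:value-of select="count(book)"/></p>

    <xsl:for-each select="book">
      <p>
        <!-- position() and last() within the for-each, e.g. "1 of 2: " -->
        <xsl:value-of select="concat(position(), ' of ', last(), ': ')"/>
        <!-- String functions: join initials and the trimmed surname -->
        <xsl:value-of select="concat(author/initials, ' ', normalize-space(author/surname))"/>
        <!-- contains() returns a Boolean, used here as a test -->
        <xsl:if test="contains(title, 'Potter')"> (title mentions Potter)</xsl:if>
      </p>
    </xsl:for-each>

    <!-- Numeric functions: half the number of books, rounded -->
    <p>Half, rounded: <xsl:value-of select="round(count(book) div 2)"/></p>
  </xsl:template>
</xsl:stylesheet>
```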

XPath Function - Examples

• <xsl:if test="not(position()=last())">

• substring('abcde',2,3) => returns 'bcd'

Structure of Stylesheet

• An XSLT stylesheet is an XML document.

• The root element is the stylesheet element:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

</xsl:stylesheet>

• A stylesheet consists of a set of rules.

• Rules are made up of patterns and templates.

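Attaching an XSL Stylesheet to an XML Document

<?xml-stylesheet type="text/xsl" href="books.xsl"?>

• href refers to the file name (URI) of the XSL document.

• The processing instruction goes in the prolog of the XML document, before the root element.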
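A minimal sketch of a complete source document with the stylesheet attached, reusing the bookshop data from the "Example of a Stylesheet" slide. It assumes books.xsl sits in the same directory, since a relative href is resolved against the location of the XML document. A browser with client-side XSLT support should display the transformed result rather than the raw XML.

```xml
<?xml version="1.0"?>
<!-- Assumes books.xsl is in the same directory as this file -->
<?xml-stylesheet type="text/xsl" href="books.xsl"?>
<bookshop>
  <book>
    <title>Harry Potter and the Sorcerer Stone</title>
    <author>
      <initials>J.K</initials>
      <surname>Rowling</surname>
    </author>
    <price value="$16.95"></price>
  </book>
</bookshop>
```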

Example of a Stylesheet

<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="html"/>
  <xsl:template match="/">
    <html>
      <body>
        <h1>Book</h1>
        <xsl:value-of select="bookshop/book/title"/>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>

Source document:

<bookshop>
  <book>
    <title> Harry Potter and the Sorcerer stone </title>
    <author>
      <initials>J.K</initials>
      <surname> Rowling</surname>
    </author>
    <price value="$16.95"></price>
  </book>
  …
</bookshop>

Selecting Output Type

• Possible outputs:
  – XML, HTML, text

• Syntax:

<xsl:output method="xml"/>

<xsl:output method="text"/>

<xsl:output method="html"/>
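A minimal sketch of text output, assuming the bookshop document from the "Example of a Stylesheet" slide: one title per line, with no markup in the result. normalize-space is an additional XPath 1.0 function, used here to trim the spaces around the sample titles.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <!-- Text output: no tags and no XML declaration are written -->
  <xsl:output method="text"/>

  <xsl:template match="/">
    <xsl:for-each select="bookshop/book">
      <!-- Trim leading and trailing spaces from each title -->
      <xsl:value-of select="normalize-space(title)"/>
      <!-- &#10; writes a newline character -->
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```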

Templates

• To create a template, we need:
  – the location in the source tree where the template will be applied, and
  – the rules to apply at that location (these can include applying other templates).

• The location is declared using an XPath expression in the template's match attribute.

Using Templates

• Templates are called using <xsl:apply-templates>.

• <xsl:apply-templates select="node-set-expression"/>

• The "select" attribute is optional.

• Without the "select" attribute, the XSL processor applies templates to all the child nodes (elements and text) of the current context node.
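A minimal sketch of both forms, assuming the bookshop document from the "Example of a Stylesheet" slide. With select, only the chosen children are processed, in the order of the selects, so <price> is skipped; without select, every child of <author> is processed, and the built-in templates output the text of <initials> and <surname>.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="html"/>

  <xsl:template match="book">
    <!-- With select: process <author>, then <title>; other children are skipped -->
    <xsl:apply-templates select="author"/>
    <xsl:apply-templates select="title"/>
  </xsl:template>

  <xsl:template match="author">
    <!-- Without select: process every child node of <author> -->
    <xsl:apply-templates/>
  </xsl:template>

  <xsl:template match="title">
    <h2><xsl:value-of select="."/></h2>
  </xsl:template>
</xsl:stylesheet>
```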

Template Examples

<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="html"/>
  <xsl:template match="/bookshop/book">
    <html>
      <body>
        <h1>Book</h1>
        <xsl:apply-templates/>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>

Selecting Templates

<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="html"/>

  <xsl:template match="/">
    <html>
      <body>
        <h1>Monash Bookshop</h1>
        <xsl:apply-templates select="bookshop/book"/>
      </body>
    </html>
  </xsl:template>

Selecting Templates (cont'd)

  <xsl:template match="book">
    <xsl:apply-templates select="author"/>
  </xsl:template>

  <xsl:template match="author">
    <h2>Author</h2>
    <xsl:value-of select="."/>
  </xsl:template>

</xsl:stylesheet>

Getting the Value of a Node

<xsl:value-of select="XPath expression"/>

Example:

<xsl:template match="bookshop/book">
  <p><xsl:value-of select="title"/></p>
</xsl:template>
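A minimal sketch of xsl:value-of on different kinds of nodes, assuming the bookshop document from the "Example of a Stylesheet" slide. In XSLT 1.0, if the select expression matches several nodes, only the string value of the first one in document order is output.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="html"/>

  <xsl:template match="bookshop/book">
    <!-- Element: the text inside <title> -->
    <p><xsl:value-of select="title"/></p>
    <!-- Attribute: the value attribute of <price>, e.g. $16.95 -->
    <p>Price: <xsl:value-of select="price/@value"/></p>
    <!-- Element with children: all text inside <author>, concatenated -->
    <p>By: <xsl:value-of select="author"/></p>
  </xsl:template>
</xsl:stylesheet>
```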

Conditional Test

• xsl:if
  – There is no "else" statement.
  – Takes one attribute, test, which is an XPath expression.
  – If the expression evaluates to true, the body of the element is executed.

• Example:
  – <xsl:if test="@id"> … </xsl:if>
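A minimal sketch of xsl:if in two common roles, assuming the bookshop document from the "Example of a Stylesheet" slide; the id attribute is hypothetical. The first loop uses the not(position()=last()) test from the examples slide to put a comma between titles but not after the last one.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="html"/>

  <xsl:template match="/bookshop">
    <p>
      <xsl:for-each select="book">
        <xsl:value-of select="title"/>
        <!-- Comma after every title except the last one -->
        <xsl:if test="not(position()=last())">, </xsl:if>
      </xsl:for-each>
    </p>

    <xsl:for-each select="book">
      <!-- Output only for books that have an id attribute; nothing otherwise -->
      <xsl:if test="@id">
        <p>id: <xsl:value-of select="@id"/></p>
      </xsl:if>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```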

Iteration

• <xsl:for-each>

<xsl:template match="/">
  <html>
    <body>
      <h1>Book</h1>
      <xsl:for-each select="/bookshop/book">
        <p><xsl:value-of select="title"/></p>
      </xsl:for-each>
    </body>
  </html>
</xsl:template>

Output:

<html><body><h1>Book</h1><p> Harry Potter and the Sorcerer Stone</p><p> Harry Potter</p></body></html>

Making Copies

• xsl:copy
  – Copies only the context node itself; it does not copy its attributes or any child nodes the context node may have.

• xsl:copy-of
  – Copies the selected nodes in full, including all their attributes and descendants.

<xsl:template match="/bookshop">
  <html>
    <body>
      <h1>Book</h1>
      <xsl:copy/>
    </body>
  </html>
</xsl:template>

Output:

<html><body><h1>Book</h1><bookshop></bookshop></body></html>
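A common idiom that builds on xsl:copy copying only the current node is the identity transform, sketched here: xsl:copy copies each node, and an explicit xsl:apply-templates carries the copy down to its attributes and children. The empty template that drops <price> is a hypothetical override added for illustration; without it, the output would be a copy of the input.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="xml"/>

  <!-- Identity template: matches every attribute and every other node -->
  <xsl:template match="@* | node()">
    <!-- xsl:copy copies only this node... -->
    <xsl:copy>
      <!-- ...so its attributes and children are processed explicitly -->
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Hypothetical override: an empty template drops every <price> element -->
  <xsl:template match="price"/>
</xsl:stylesheet>
```

The more specific match="price" pattern has a higher default priority than node(), so it wins for <price> elements.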

Copy-of

Output:

<?xml version="1.0" encoding="utf-8"?>
<Author_List>
  <author>
    <initials>JK</initials><surname> Rowling</surname>
  </author>
  <author>
    <initials>J</initials><surname> Rowling</surname>
  </author>
</Author_List>

Stylesheet:

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml"/>

  <xsl:template match="/">
    <xsl:element name="Author_List">
      <xsl:apply-templates/>
    </xsl:element>
  </xsl:template>

  <xsl:template match="bookshop/book">
    <xsl:copy-of select="author"/>
  </xsl:template>

</xsl:stylesheet>