1 © Copyright 2011 EMC Corporation. All rights reserved.
XML Pipeline Processing with XProc
Vojtěch Toman Consultant Software Engineer EMC Corporation [email protected]
Agenda
• XML pipeline processing
• XProc
• XProc in practice
XML Pipeline Processing
XML Processing as XML Pipelines
[Figure: a pipeline of Validate, XSLT, and XQuery steps]
Nothing new…
• Apache Ant
• Cocoon Sitemaps
• GNU JAXP Library: Package gnu.xml.pipeline
• Jelly: Executable XML
• MT Pipeline
• NetKernel
• Oracle XML Developer's Kit
• Schemachine
• ServingXML
• smallx
• Strawman
• SXPipe
• Xerces Native Interface
• XML Pipeline Definition Language Version 1.0
• XML Pipeline Language (XPL) Version 1.0 (Draft)
• XML-ECHO
• XPipe
• ...
XProc
XProc: An XML Pipeline Language
• “A language for describing operations to be performed on XML documents”
• W3C XML Processing Model Working Group started in late 2005
• W3C Recommendation 11 May 2010
– http://www.w3.org/TR/xproc/
Design Principles
• Technology Neutral
• Platform Neutral
• Small and Simple
• Infoset Processing
• Straightforward Core Implementation
• Address Practical Interoperability
• Validation of XML Pipeline Documents by a Schema
• Reuse and Support for Existing Specifications
• Arbitrary Components
• Control of Inputs and Outputs
• Control of Flow and Errors
Requirements
• Apply a Sequence of Operations
• XInclude Processing
• Parse/Validate/Transform
• Document Aggregation
• Single-file Command-line Document Processing
• Multiple-file Command-line Document Generation
• Extracting MathML
• Style an XML Document in a Browser
• Run a Custom Program
• XInclude and Sign
• Make Absolute URLs
• A Simple Transformation Service
• Service Request/Response Handling on a Handheld
• Interact with Web Service (Tide Information)
• Parse and/or Serialize RSS descriptions
• Collections
• An AJAX Server
• Dynamic XQuery
• Read/Write Non-XML File
• Update/Insert Document in Database
• Content-Dependent Transformations
• Configuration-Dependent Transformations
• Response to XML-RPC Request
• Database Import/Ingestion
• Metadata Retrieval
• Non-XML Document Production
• Integrate Computation Components (MathML)
• Document Schema Definition Languages (DSDL) - Part 10: Validation Management
• Large-Document Subtree Iteration
• Adding Navigation to an Arbitrarily Large Document
• Fallback to Choice of XSLT Processor
• No Fallback for XQuery Causes Error
XProc as an XML Processing Language
• “Why should I use XProc over XSLT/XQuery?”
• The answer is architectural:
– Separation of concerns
– Use the right tools for the job
XProc as an Integration Technology
• XProc integrates multiple XML technologies
– Takes care of “plumbing”
• Easy to use, robust
• Focus on WHAT, not on the low-level HOW
• Better maintainability and customizability
[Figure: Validate, XSLT, XQuery]
XProc as an Enabling Technology
• Some XML standards depend on XML processing capabilities
– XForms
• The XRX architecture
– XForms/REST/X......
– End-to-end XML model
– XProc is a natural fit
[Figure labels: XForms, XProc, HTTP, XML, Native XML DB]
XProc Basics
• Step
– A basic computational unit of a pipeline
– Expects zero or more XML documents on its input and produces zero or more new XML documents on its output
– Examples: Validate, XSLT, identity, rename, HTTP request, …
• Pipeline
– A sequence (possibly non-linear) of steps
– Is a step itself (“turtles all the way down”)
• XPath as expression language
XProc Processor
• Applies an XProc pipeline to the sequence of input documents
• Evaluates the steps in the pipeline in the right order
Example: Validate/XQuery/Transform
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <p:input port="source" sequence="true"/>
  <!-- Parameter input port: p:xquery and p:xslt bind their parameters port to it -->
  <p:input port="parameters" kind="parameter"/>
  <p:output port="result" sequence="true"/>
  <!-- Validate each input document against the schema -->
  <p:for-each>
    <p:validate-with-xml-schema>
      <p:input port="schema"><p:document href="schema.xsd"/></p:input>
    </p:validate-with-xml-schema>
  </p:for-each>
  <!-- Query the validated documents -->
  <p:xquery>
    <p:input port="query"><p:document href="query.xq"/></p:input>
  </p:xquery>
  <!-- Transform the query result -->
  <p:xslt>
    <p:input port="stylesheet"><p:document href="style.xsl"/></p:input>
  </p:xslt>
</p:declare-step>
XProc Namespaces
• http://www.w3.org/ns/xproc
– The namespace of the XProc XML vocabulary
– Conventional prefix: p
• http://www.w3.org/ns/xproc-step
– The namespace used for documents that are inputs to and outputs from standard XProc steps
– Conventional prefix: c
• http://www.w3.org/ns/xproc-error
– Used for errors
– Conventional prefix: err
XProc Errors
• Static errors
– Must be detected before the pipeline is evaluated
– err:XS0001: “It is a static error if there are any loops in the connections between steps: no step can be connected to itself nor can there be any sequence of connections through other steps that leads back to itself.”
• Dynamic errors
– Occur while the pipeline is being evaluated
– err:XD0001: “It is a dynamic error if a non-XML resource is produced on a step output or arrives on a step input.”
Atomic Steps
• Perform a “further indivisible” unit of processing
• XProc standard step library
– “Apply an XSLT stylesheet”
• Custom steps
– “Send an e-mail”
Standard XProc Step Library
• Required steps (31 steps)
– p:add-attribute, p:add-xml-base, p:compare, p:count, p:delete, p:directory-list, p:error, p:escape-markup, p:filter, p:http-request, p:identity, p:insert, p:label-elements, p:load, p:make-absolute-uris, p:namespace-rename, p:pack, p:parameters, p:rename, p:replace, p:set-attributes, p:sink, p:split-sequence, p:store, p:string-replace, p:unescape-markup, p:unwrap, p:wrap, p:wrap-sequence, p:xinclude, p:xslt
• Optional steps (10 steps)
– p:exec, p:hash, p:uuid, p:validate-with-relax-ng, p:validate-with-schematron, p:validate-with-xml-schema, p:www-form-urldecode, p:www-form-urlencode, p:xquery, p:xsl-formatter
Compound steps
• Contain a sequence of steps (a subpipeline)
• A pipeline is a compound step
• Extensibility is a natural part of the language
p:viewport
• Applies a subpipeline to one or more subtrees of the input document
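A minimal sketch of p:viewport, assuming an input document that contains section elements. The attribute name "reviewed" is a hypothetical choice for illustration. Each matched subtree is passed to the subpipeline as a separate document, and the result replaces the original subtree.

```xml
<p:pipeline xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <!-- Process each matched section subtree; the rest of the document is copied unchanged -->
  <p:viewport match="section">
    <!-- Inside the viewport, /* is the root of the current subtree (the section element) -->
    <p:add-attribute match="/*" attribute-name="reviewed" attribute-value="true"/>
  </p:viewport>
</p:pipeline>
```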
p:for-each
• Applies a subpipeline to each document in the input sequence
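A minimal sketch of p:for-each, assuming input documents that contain chapter elements. The element name "chapter" and the attribute name "processed" are hypothetical. The optional p:iteration-source narrows the iteration to selected subtrees; without it, the loop runs once per document on the default readable port.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <p:input port="source" sequence="true"/>
  <p:output port="result" sequence="true"/>
  <p:for-each>
    <!-- One iteration per chapter element found in the input sequence -->
    <p:iteration-source select="//chapter"/>
    <!-- The subpipeline sees each chapter as a separate document -->
    <p:add-attribute match="/*" attribute-name="processed" attribute-value="yes"/>
  </p:for-each>
</p:declare-step>
```

The loop's output is the sequence of documents produced by the iterations, here one modified chapter document per match.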
p:choose
• Selects exactly one of a list of alternative subpipelines
[Figure: branches labeled when, when, otherwise]
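A minimal sketch of p:choose with two p:when branches and a p:otherwise, matching the shape of the figure. The version attribute and the stylesheet names v1.xsl and v2.xsl are hypothetical. Each test is an XPath expression evaluated against the document on the default readable port.

```xml
<p:pipeline xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <p:choose>
    <!-- First branch whose test is true is selected -->
    <p:when test="/*/@version = '2.0'">
      <p:xslt>
        <p:input port="stylesheet"><p:document href="v2.xsl"/></p:input>
      </p:xslt>
    </p:when>
    <p:when test="/*/@version = '1.0'">
      <p:xslt>
        <p:input port="stylesheet"><p:document href="v1.xsl"/></p:input>
      </p:xslt>
    </p:when>
    <!-- Fallback: pass the document through unchanged -->
    <p:otherwise>
      <p:identity/>
    </p:otherwise>
  </p:choose>
</p:pipeline>
```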
p:try
• Try/catch logic for dealing with dynamic errors
[Figure: branches labeled try, catch]
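A minimal sketch of p:try, assuming the processor supports the optional p:validate-with-xml-schema step and that schema.xsd exists. The step name "recover" is a hypothetical label. If validation raises a dynamic error, the p:catch branch runs instead and returns the error report available on its error port.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <p:input port="source"/>
  <p:output port="result" sequence="true"/>
  <p:try>
    <!-- The "try" part: a dynamic error here transfers control to p:catch -->
    <p:group>
      <p:validate-with-xml-schema>
        <p:input port="schema"><p:document href="schema.xsd"/></p:input>
      </p:validate-with-xml-schema>
    </p:group>
    <!-- The "catch" part: read the error information from the error port -->
    <p:catch name="recover">
      <p:identity>
        <p:input port="source">
          <p:pipe step="recover" port="error"/>
        </p:input>
      </p:identity>
    </p:catch>
  </p:try>
</p:declare-step>
```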
p:group
• Wraps a subpipeline in a single step wrapper
Anatomy of an XProc Step
• Step declaration
– Type
– Input ports
– Output ports
– Options
– Parameter input ports
[Figure: the p:insert step with input ports source and insertion, output port result, and options match and position]
Step Declaration (Atomic Steps)
• Standard XProc step: p:insert
<p:declare-step type="p:insert">
<p:input port="source" primary="true"/>
<p:input port="insertion" sequence="true"/>
<p:output port="result"/>
<p:option name="match" select="'/*'"/>
<p:option name="position" required="true"/>
</p:declare-step>
Step Declaration (Compound Steps)
• Pipeline that applies the p:insert step
<p:declare-step>
<p:input port="source"/>
<p:output port="result"/>
<p:insert match="section" position="first-child">
<p:input port="insertion">
<p:inline><remark/></p:inline>
</p:input>
</p:insert>
</p:declare-step>
The p:pipeline Shortcut
• The following two pipelines are equivalent
<p:declare-step>
<p:input port="source"
primary="true"/>
<p:input port="parameters"
kind="parameter"
primary="true"/>
<p:output port="result"
primary="true"/>
<p:identity/>
</p:declare-step>
<p:pipeline>
<p:identity/>
</p:pipeline>
Connecting Steps
• Input ports of steps can be connected to:
– Output ports of other steps
– External documents
– Inline documents
– Empty sequence of documents
• Connections determine the evaluation order
• Default processing rules
– Implicit connections
[Figure: p:identity (ports source, result) and p:xslt (ports source, stylesheet, result)]
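The four kinds of connection listed above can be sketched in one pipeline. This is an illustrative example only: the step name "copy", the stylesheet style.xsl, and the inline element generated-by are assumptions, not taken from the slides.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <p:input port="source"/>
  <p:output port="result"/>

  <!-- Implicit connection: source is bound to the pipeline's primary input -->
  <p:identity name="copy"/>

  <p:xslt>
    <!-- Output port of another step (explicit pipe binding) -->
    <p:input port="source">
      <p:pipe step="copy" port="result"/>
    </p:input>
    <!-- External document -->
    <p:input port="stylesheet">
      <p:document href="style.xsl"/>
    </p:input>
    <!-- Empty sequence: no stylesheet parameters are passed -->
    <p:input port="parameters">
      <p:empty/>
    </p:input>
  </p:xslt>

  <!-- Implicit connection again: source is bound to the result of p:xslt -->
  <p:insert match="/*" position="last-child">
    <!-- Inline document -->
    <p:input port="insertion">
      <p:inline><generated-by>XProc</generated-by></p:inline>
    </p:input>
  </p:insert>
</p:declare-step>
```

Because p:xslt reads from "copy" and p:insert reads from p:xslt, the connections alone fix the evaluation order.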
Connecting Steps (cont.)
• Pipe binding
– Step name/port name
• The following two pipelines are equivalent
<p:pipeline>
<p:identity name="id1"/>
<p:identity>
<p:input port="source">
<p:pipe step="id1" port="result"/>
</p:input>
</p:identity>
</p:pipeline>
<p:pipeline>
<p:identity/>
<p:identity/>
</p:pipeline>
Step Libraries
• Steps can be organized into libraries
<p:library xmlns:ex="http://www.example.org/ns/xproc">
<p:declare-step type="ex:step1">
...
</p:declare-step>
<p:pipeline type="ex:step2">
...
</p:pipeline>
</p:library>
Importing Steps
• Reuse of XProc pipelines
• Import statement
– Can appear in p:library, p:pipeline, p:declare-step
<p:pipeline xmlns:ex="http://www.example.org/ns/xproc">
<p:import href="my-library.xpl"/>
<ex:step1/>
</p:pipeline>
Extensibility of XProc
• Extending XProc in XProc
• Custom atomic steps
• Extension attributes
• …
Custom Atomic Steps
• Custom steps that provide extension functionality
– Implementation in the processor’s host language
– Step declaration + import
• Potentially not interoperable
– p:step-available() XPath extension function
[Figure: a custom step labeled ext:custom-step]
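One way to guard against a missing custom step is to test p:step-available() in a p:choose. The sketch below assumes a library file email.xpl that declares a custom step ex:send-email with a source input, a required "to" option, and no output port. The pipeline name "main" and the address editor@example.org are hypothetical.

```xml
<p:declare-step name="main" version="1.0"
                xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:ex="http://www.example.org/ns/xproc">
  <p:input port="source"/>
  <p:output port="result"/>
  <p:import href="email.xpl"/>
  <p:choose>
    <!-- True only if this processor can actually run ex:send-email -->
    <p:when test="p:step-available('ex:send-email')">
      <ex:send-email to="editor@example.org"/>
      <!-- The custom step has no output, so re-read the pipeline input explicitly -->
      <p:identity>
        <p:input port="source">
          <p:pipe step="main" port="source"/>
        </p:input>
      </p:identity>
    </p:when>
    <!-- Portable fallback: skip the notification and pass the document through -->
    <p:otherwise>
      <p:identity/>
    </p:otherwise>
  </p:choose>
</p:declare-step>
```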
Custom Atomic Steps (Example)
• Custom step that sends an e-mail (email.xpl)
<p:declare-step type="ex:send-email"
xmlns:ex="http://www.example.org/ns/xproc">
<p:input port="source"/>
<p:option name="to" required="true"/>
</p:declare-step>
• Usage:
<p:declare-step xmlns:ex="http://www.example.org/ns/xproc">
<p:input port="source"/>
<p:import href="email.xpl"/>
<ex:send-email to="[email protected]"/>
</p:declare-step>
Extension attributes
• Processor-specific information
• Attributes in a non-null, non-XProc namespace
• Ignored by processors that do not recognize them
...
<p:store href="doc.xml" ex:send-notification="true"
xmlns:ex="http://www.example.org/ns/xproc"/>
...
XProc in Practice
XProc Implementations
• Calabash
– http://xmlcalabash.com
• Calumet
– http://developer.emc.com/xmltech
• …more in development
– http://xproc.org
XProc at EMC
• XProc Engine (Calumet)
– Stand-alone tool
– (Optional) tight integration with EMC Documentum xDB
• XProc Designer
• EMC Documentum Dynamic Delivery Services
• EMC Documentum XProc Service
• DITA XProc Pipelines
XProc Designer
• WYSIWYG XProc editor
• Written in JavaScript (GWT)
Dynamic Delivery Services
• Framework for creating XML delivery applications
• xDB + XForms + XProc
• Content delivery/publishing
• Application decommissioning
Documentum XProc Service – Import
<p:pipeline version="1.0" xmlns:p="http://www.w3.org/ns/xproc"
xmlns:ex="http://example.org" xmlns:xi="http://www.w3.org/2001/XInclude"
xmlns:sysobj="http://www.emc.com/documentum/xml/xproc/dctm/sysobject"
xmlns:pobj="http://www.emc.com/documentum/xml/xproc/dctm/persistent-object">
<p:pipeline type="ex:chunk">
<p:viewport match="/*//section">
<ex:chunk/>
<p:add-attribute match="xi:include" attribute-name="href">
<p:input port="source"><p:inline><xi:include/></p:inline></p:input>
<p:with-option name="attribute-value" select="/*"/>
</p:add-attribute>
</p:viewport>
<sysobj:create docbase="winsqlXDB"/>
<pobj:save><p:with-option name="href" select="/*"/></pobj:save>
</p:pipeline>
<ex:chunk/>
</p:pipeline>
Documentum XProc Service – Export
• Can be as simple as:
<p:pipeline version="1.0" xmlns:p="http://www.w3.org/ns/xproc">
  <p:xinclude/>
</p:pipeline>
• More interesting (but still trivial) example:
<p:pipeline version="1.0" xmlns:p="http://www.w3.org/ns/xproc"
    xmlns:c="http://www.w3.org/ns/xproc-step">
  <p:xinclude/>
  <p:xquery>
    <p:input port="query">
      <p:inline>
        <!-- c:query wraps the query text; the c prefix is declared on p:pipeline -->
        <c:query>
          <result>{count(//para)}</result>
        </c:query>
      </p:inline>
    </p:input>
  </p:xquery>
</p:pipeline>
DITA XProc Pipelines
• Darwin Information Typing Architecture (DITA)
– OASIS standard for structuring, managing and publishing documentation
– Topic-oriented
• DITA Open Toolkit
– Ant, Java, XSLT
– File system-based
• DITA XProc Pipelines
– Flexibility, extensibility, portability, performance
DITA-OT Pipeline
DITA XProc Pipelines – Processing Flow
Resources
• XProc: An XML Pipeline Language
– http://www.w3.org/TR/xproc/
• XProc processor implementations
– Calabash (http://xmlcalabash.com)
– Calumet (http://developer.emc.com/xmltech)
• XProc.org
– Informal website about XProc and its use
• [email protected] mailing-list
• EMC Developer Network
– http://developer.emc.com/xmltech