
1 © Copyright 2011 EMC Corporation. All rights reserved.

XML Pipeline Processing with XProc

Vojtěch Toman Consultant Software Engineer EMC Corporation [email protected]


Agenda
• XML pipeline processing
• XProc
• XProc in practice


XML Pipeline Processing


XML Processing as XML Pipelines

[Figure: a pipeline of steps labelled Validate, XSLT, XQuery]


Nothing new…
• Apache Ant
• Cocoon Sitemaps
• GNU JAXP Library: Package gnu.xml.pipeline
• Jelly : Executable XML
• MT Pipeline
• NetKernel
• Oracle XML Developer's Kit
• Schemachine
• ServingXML
• smallx
• Strawman
• SXPipe
• Xerces Native Interface
• XML Pipeline Definition Language Version 1.0
• XML Pipeline Language (XPL) Version 1.0 (Draft)
• XML-ECHO
• XPipe
• ...


XProc


XProc: An XML Pipeline Language
• “A language for describing operations to be performed on XML documents”
• W3C XML Processing Model Working Group started in late 2005
• W3C Recommendation 11 May 2010
  – http://www.w3.org/TR/xproc/


Design Principles
• Technology Neutral
• Platform Neutral
• Small and Simple
• Infoset Processing
• Straightforward Core Implementation
• Address Practical Interoperability
• Validation of XML Pipeline Documents by a Schema
• Reuse and Support for Existing Specifications
• Arbitrary Components
• Control of Inputs and Outputs
• Control of Flow and Errors


Requirements

•  Apply a Sequence of Operations

•  XInclude Processing

•  Parse/Validate/Transform

•  Document Aggregation

•  Single-file Command-line Document Processing

•  Multiple-file Command-line Document Generation

•  Extracting MathML

•  Style an XML Document in a Browser

•  Run a Custom Program

•  XInclude and Sign

•  Make Absolute URLs

•  A Simple Transformation Service

•  Service Request/Response Handling on a Handheld

•  Interact with Web Service (Tide Information)

•  Parse and/or Serialize RSS descriptions

•  Collections

•  An AJAX Server

•  Dynamic XQuery

•  Read/Write Non-XML File

•  Update/Insert Document in Database

•  Content-Dependent Transformations

•  Configuration-Dependent Transformations

•  Response to XML-RPC Request

•  Database Import/Ingestion

•  Metadata Retrieval

•  Non-XML Document Production

•  Integrate Computation Components (MathML)

•  Document Schema Definition Languages (DSDL) - Part 10: Validation Management

•  Large-Document Subtree Iteration

•  Adding Navigation to an Arbitrarily Large Document

•  Fallback to Choice of XSLT Processor

•  No Fallback for XQuery Causes Error


XProc as an XML Processing Language
• “Why should I use XProc over XSLT/XQuery?”
• The answer is architectural:
  – Separation of concerns
  – Use the right tools for the job


XProc as an Integration Technology
• XProc integrates multiple XML technologies
  – Takes care of “plumbing”
• Easy to use, robust
• Focus on WHAT, not on the low-level HOW
• Better maintainability and customizability

[Figure: a pipeline of steps labelled Validate, XSLT, XQuery]


XProc as an Enabling Technology
• Some XML standards depend on XML processing capabilities
  – XForms
• The XRX architecture
  – XForms/REST/X......
  – End-to-end XML model
  – XProc is a natural fit

[Figure labels: XForms, XProc, HTTP, XML, Native XML DB]


XProc Basics
• Step
  – A basic computational unit of a pipeline
  – Expects zero or more XML documents on its input and produces zero or more new XML documents on its output
  – Example: Validate, XSLT, identity, rename, HTTP request, …
• Pipeline
  – A sequence (possibly non-linear) of steps
  – Is a step itself (“turtles all the way down”)
• XPath as expression language


XProc Processor
• Applies an XProc pipeline to the sequence of input documents
• Evaluates the steps in the pipeline in the right order


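Example: Validate/XQuery/Transform

<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <p:input port="source" sequence="true"/>
  <!-- Parameter input port: gives p:xquery and p:xslt a binding for their parameters ports -->
  <p:input port="parameters" kind="parameter"/>
  <p:output port="result" sequence="true"/>
  <!-- Validate each input document separately -->
  <p:for-each>
    <p:validate-with-xml-schema>
      <p:input port="schema"><p:document href="schema.xsd"/></p:input>
    </p:validate-with-xml-schema>
  </p:for-each>
  <!-- Query the validated documents -->
  <p:xquery>
    <p:input port="query"><p:document href="query.xq"/></p:input>
  </p:xquery>
  <!-- Transform the query result -->
  <p:xslt>
    <p:input port="stylesheet"><p:document href="style.xsl"/></p:input>
  </p:xslt>
</p:declare-step>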


XProc Namespaces
• http://www.w3.org/ns/xproc
  – The namespace of the XProc XML vocabulary
  – Conventional prefix: p
• http://www.w3.org/ns/xproc-step
  – The namespace used for documents that are inputs to and outputs from standard XProc steps
  – Conventional prefix: c
• http://www.w3.org/ns/xproc-error
  – Used for errors
  – Conventional prefix: err
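
To make the role of the c namespace concrete, here is a minimal sketch using the standard p:count step, which reports its answer as a document in the http://www.w3.org/ns/xproc-step namespace. The pipeline itself is an illustration and is not taken from the slides.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <!-- Accept any number of documents on the primary input -->
  <p:input port="source" sequence="true"/>
  <p:output port="result"/>
  <!-- p:count emits a single document: a c:result element holding the count -->
  <p:count/>
</p:declare-step>
```

Given three input documents, the result port should carry a document of the form <c:result xmlns:c="http://www.w3.org/ns/xproc-step">3</c:result>.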


XProc Errors
• Static errors
  – Must be detected before the pipeline is evaluated
  – err:XS0001: “It is a static error if there are any loops in the connections between steps: no step can be connected to itself nor can there be any sequence of connections through other steps that leads back to itself.”
• Dynamic errors
  – Occur while the pipeline is being evaluated
  – err:XD0001: “It is a dynamic error if a non-XML resource is produced on a step output or arrives on a step input.”


Atomic Steps
• Perform a “further indivisible” unit of processing
• XProc standard step library
  – “Apply an XSLT stylesheet”
• Custom steps
  – “Send an e-mail”


Standard XProc Step Library
• Required steps (31 steps)
  – p:add-attribute, p:add-xml-base, p:compare, p:count, p:delete, p:directory-list, p:error, p:escape-markup, p:filter, p:http-request, p:identity, p:insert, p:label-elements, p:load, p:make-absolute-uris, p:namespace-rename, p:pack, p:parameters, p:rename, p:replace, p:set-attributes, p:sink, p:split-sequence, p:store, p:string-replace, p:unescape-markup, p:unwrap, p:wrap, p:wrap-sequence, p:xinclude, p:xslt
• Optional steps (10 steps)
  – p:exec, p:hash, p:uuid, p:validate-with-relax-ng, p:validate-with-schematron, p:validate-with-xml-schema, p:www-form-urldecode, p:www-form-urlencode, p:xquery, p:xsl-formatter
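
As a rough illustration of how several required steps chain together, the sketch below strips, renames, and annotates elements using only steps from the list above. The element names (remark, para) and the status attribute are placeholders chosen for the example.

```xml
<p:pipeline xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <!-- Drop every remark element from the input document -->
  <p:delete match="remark"/>
  <!-- Rename para elements to p -->
  <p:rename match="para" new-name="p"/>
  <!-- Mark the document element with a status attribute -->
  <p:add-attribute match="/*" attribute-name="status" attribute-value="draft"/>
</p:pipeline>
```

Each step reads the primary output of the step before it, so no explicit connections are needed here.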


Compound steps
• Contain a sequence of steps (a subpipeline)
• A pipeline is a compound step
• Extensibility is a natural part of the language


p:viewport
• Applies a subpipeline to one or more subtrees of the input document
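
A minimal p:viewport sketch, assuming an input document that contains section elements; the reviewed attribute is made up for the example. Each matched subtree is handed to the subpipeline as its own document, and the result is spliced back in place.

```xml
<p:pipeline xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <!-- Run the subpipeline on each section subtree; the rest of the document is copied unchanged -->
  <p:viewport match="section">
    <!-- Inside the viewport, the matched section is the document element -->
    <p:add-attribute match="/section" attribute-name="reviewed" attribute-value="true"/>
  </p:viewport>
</p:pipeline>
```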


p:for-each
• Applies a subpipeline to each document in the input sequence
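
One possible p:for-each sketch: here the sequence is produced by selecting section elements from a single input document, and the per-document results are gathered under a wrapper element. The element names section and sections are assumptions made for the example.

```xml
<p:pipeline xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <p:for-each>
    <!-- Turn every section element into a separate document to iterate over -->
    <p:iteration-source select="//section"/>
    <!-- The subpipeline runs once per selected document -->
    <p:add-attribute match="/*" attribute-name="extracted" attribute-value="true"/>
  </p:for-each>
  <!-- Collect the resulting sequence into one document -->
  <p:wrap-sequence wrapper="sections"/>
</p:pipeline>
```

Unlike p:viewport, the loop does not put the results back into the original document; it only yields a sequence, which is why the final step wraps it.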


p:choose
• Selects exactly one of a list of alternative subpipelines

[Figure: alternative branches labelled when, when, otherwise]
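
A sketch of p:choose with two p:when branches and a fallback. The version attribute and the stylesheet names v1.xsl and v2.xsl are hypothetical; the test expressions are evaluated against the document arriving at the p:choose step.

```xml
<p:pipeline xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <p:choose>
    <!-- First branch whose test is true wins -->
    <p:when test="/*/@version = '2.0'">
      <p:xslt>
        <p:input port="stylesheet"><p:document href="v2.xsl"/></p:input>
      </p:xslt>
    </p:when>
    <p:when test="/*/@version = '1.0'">
      <p:xslt>
        <p:input port="stylesheet"><p:document href="v1.xsl"/></p:input>
      </p:xslt>
    </p:when>
    <!-- No test matched: pass the document through unchanged -->
    <p:otherwise>
      <p:identity/>
    </p:otherwise>
  </p:choose>
</p:pipeline>
```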


p:try
• Try/catch logic for dealing with dynamic errors

[Figure: branches labelled try and catch]
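
A sketch of p:try around schema validation (an optional step, so processor support is assumed). If validation raises a dynamic error, the p:catch branch wraps whatever the processor reports on its error port in a validation-failed element; that wrapper name and schema.xsd are placeholders.

```xml
<p:pipeline xmlns:p="http://www.w3.org/ns/xproc" version="1.0">
  <p:try>
    <!-- The guarded subpipeline -->
    <p:group>
      <p:validate-with-xml-schema>
        <p:input port="schema"><p:document href="schema.xsd"/></p:input>
      </p:validate-with-xml-schema>
    </p:group>
    <!-- Runs only if the group raised a dynamic error -->
    <p:catch name="recover">
      <p:wrap-sequence wrapper="validation-failed">
        <p:input port="source">
          <p:pipe step="recover" port="error"/>
        </p:input>
      </p:wrap-sequence>
    </p:catch>
  </p:try>
</p:pipeline>
```

Static errors are not caught this way, since they must be detected before the pipeline runs.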


p:group
• Wraps a subpipeline in a single step wrapper


Anatomy of an XProc Step
• Step declaration
  – Type
  – Input ports
  – Output ports
  – Options
  – Parameter input ports

[Figure: the p:insert step with input ports source and insertion, output port result, and options match and position]


Step Declaration (Atomic Steps)
• Standard XProc step: p:insert

<p:declare-step type="p:insert">
  <p:input port="source" primary="true"/>
  <p:input port="insertion" sequence="true"/>
  <p:output port="result"/>
  <p:option name="match" select="'/*'"/>
  <p:option name="position" required="true"/>
</p:declare-step>


Step Declaration (Compound Steps)
• Pipeline that applies the p:insert step

<p:declare-step>
  <p:input port="source"/>
  <p:output port="result"/>
  <p:insert match="section" position="first-child">
    <p:input port="insertion">
      <p:inline><remark/></p:inline>
    </p:input>
  </p:insert>
</p:declare-step>


The p:pipeline Shortcut
• The following two pipelines are equivalent

<p:declare-step>
  <p:input port="source" primary="true"/>
  <p:input port="parameters" kind="parameter" primary="true"/>
  <p:output port="result" primary="true"/>
  <p:identity/>
</p:declare-step>

<p:pipeline>
  <p:identity/>
</p:pipeline>


Connecting Steps
• Input ports of steps can be connected to:
  – Output ports of other steps
  – External documents
  – Inline documents
  – Empty sequence of documents
• Connections determine the evaluation order
• Default processing rules
  – Implicit connections

[Figure: p:identity (ports source, result) and p:xslt (ports source, stylesheet, result)]
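
The four kinds of connection listed above can be sketched in one small pipeline. The step names main and transform, the file style.xsl, and the generated-by element are placeholders for the example.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc" version="1.0" name="main">
  <p:input port="source"/>
  <p:output port="result"/>

  <p:xslt name="transform">
    <!-- Output port of another step: here, the pipeline's own source port -->
    <p:input port="source"><p:pipe step="main" port="source"/></p:input>
    <!-- External document -->
    <p:input port="stylesheet"><p:document href="style.xsl"/></p:input>
    <!-- Empty sequence: no stylesheet parameters are passed -->
    <p:input port="parameters"><p:empty/></p:input>
  </p:xslt>

  <!-- The source port is left unbound, so it is implicitly connected to transform's result -->
  <p:insert match="/*" position="last-child">
    <!-- Inline document -->
    <p:input port="insertion">
      <p:inline><generated-by>XProc</generated-by></p:inline>
    </p:input>
  </p:insert>
</p:declare-step>
```

Because p:insert depends on the output of the p:xslt step, the processor must evaluate the transformation first, whatever the order of the elements in the file.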


Connecting Steps (cont.)
• Pipe binding
  – Step name/port name
• The following two pipelines are equivalent

<p:pipeline>
  <p:identity name="id1"/>
  <p:identity>
    <p:input port="source">
      <p:pipe step="id1" port="result"/>
    </p:input>
  </p:identity>
</p:pipeline>

<p:pipeline>
  <p:identity/>
  <p:identity/>
</p:pipeline>


Step Libraries
• Steps can be organized into libraries

<p:library xmlns:ex="http://www.example.org/ns/xproc">
  <p:declare-step type="ex:step1">
    ...
  </p:declare-step>
  <p:pipeline type="ex:step2">
    ...
  </p:pipeline>
</p:library>


Importing Steps
• Reuse of XProc pipelines
• Import statement
  – Can appear in p:library, p:pipeline, p:declare-step

<p:pipeline xmlns:ex="http://www.example.org/ns/xproc">
  <p:import href="my-library.xpl"/>
  <ex:step1/>
</p:pipeline>


Extensibility of XProc
• Extending XProc in XProc
• Custom atomic steps
• Extension attributes
• …


Custom Atomic Steps
• Custom steps that provide extension functionality
  – Implementation in the processor’s host language
  – Step declaration + import
• Potentially not interoperable
  – p:step-available() XPath extension function

[Figure: a step labelled ext:custom-step]
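
One way to guard against the interoperability problem is to test p:step-available() before invoking a custom step. The sketch below assumes a declaration of an ex:send-email step in a file named email.xpl, taking a source input and a to option; the recipient address is a placeholder.

```xml
<p:declare-step xmlns:p="http://www.w3.org/ns/xproc"
                xmlns:ex="http://www.example.org/ns/xproc"
                version="1.0" name="main">
  <p:input port="source"/>
  <p:output port="result"/>
  <p:import href="email.xpl"/>

  <p:choose>
    <!-- Only call the custom step if this processor actually implements it -->
    <p:when test="p:step-available('ex:send-email')">
      <ex:send-email to="editor@example.org"/>
      <!-- ex:send-email has no output, so the input document is passed on explicitly -->
      <p:identity>
        <p:input port="source"><p:pipe step="main" port="source"/></p:input>
      </p:identity>
    </p:when>
    <!-- Otherwise skip the notification and pass the document through -->
    <p:otherwise>
      <p:identity/>
    </p:otherwise>
  </p:choose>
</p:declare-step>
```

Both branches end in p:identity so that they expose the same output, which p:choose requires.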


Custom Atomic Steps (Example)
• Custom step that sends an e-mail (email.xpl)

<p:declare-step type="ex:send-email"
                xmlns:ex="http://www.example.org/ns/xproc">
  <p:input port="source"/>
  <p:option name="to" required="true"/>
</p:declare-step>

• Usage:

<p:declare-step xmlns:ex="http://www.example.org/ns/xproc">
  <p:input port="source"/>
  <p:import href="email.xpl"/>
  <ex:send-email to="[email protected]"/>
</p:declare-step>


Extension attributes
• Processor-specific information
• Attributes in a non-null, non-XProc namespace
• Ignored by processors that do not recognize them

...
<p:store href="doc.xml" ex:send-notification="true"
         xmlns:ex="http://www.example.org/ns/xproc"/>
...


XProc in Practice


XProc Implementations
• Calabash
  – http://xmlcalabash.com
• Calumet
  – http://developer.emc.com/xmltech
• …more in development
  – http://xproc.org


XProc at EMC
• XProc Engine (Calumet)
  – Stand-alone tool
  – (Optional) tight integration with EMC Documentum xDB
• XProc Designer
• EMC Documentum Dynamic Delivery Services
• EMC Documentum XProc Service
• DITA XProc Pipelines


XProc Designer
• WYSIWYG XProc editor
• Written in JavaScript (GWT)


Dynamic Delivery Services
• Framework for creating XML delivery applications
• xDB + XForms + XProc
• Content delivery/publishing
• Application decommissioning


Documentum XProc Service – Import

<p:pipeline version="1.0" xmlns:p="http://www.w3.org/ns/xproc"
    xmlns:ex="http://example.org" xmlns:xi="http://www.w3.org/2001/XInclude"
    xmlns:sysobj="http://www.emc.com/documentum/xml/xproc/dctm/sysobject"
    xmlns:pobj="http://www.emc.com/documentum/xml/xproc/dctm/persistent-object">
  <p:pipeline type="ex:chunk">
    <p:viewport match="/*//section">
      <ex:chunk/>
      <p:add-attribute match="xi:include" attribute-name="href">
        <p:input port="source"><p:inline><xi:include/></p:inline></p:input>
        <p:with-option name="attribute-value" select="/*"/>
      </p:add-attribute>
    </p:viewport>
    <sysobj:create docbase="winsqlXDB"/>
    <pobj:save><p:with-option name="href" select="/*"/></pobj:save>
  </p:pipeline>
  <ex:chunk/>
</p:pipeline>


Documentum XProc Service – Export
• Can be as simple as:

<p:pipeline version="1.0" xmlns:p="http://www.w3.org/ns/xproc">
  <p:xinclude/>
</p:pipeline>

• More interesting (but still trivial) example:

<p:pipeline version="1.0" xmlns:p="http://www.w3.org/ns/xproc"
    xmlns:c="http://www.w3.org/ns/xproc-step">
  <p:xinclude/>
  <p:xquery>
    <p:input port="query">
      <p:inline>
        <c:query>
          <result>{count(//para)}</result>
        </c:query>
      </p:inline>
    </p:input>
  </p:xquery>
</p:pipeline>


DITA XProc Pipelines
• Darwin Information Typing Architecture (DITA)
  – OASIS standard for structuring, managing and publishing documentation
  – Topic-oriented
• DITA Open Toolkit
  – Ant, Java, XSLT
  – File system-based
• DITA XProc Pipelines
  – Flexibility, extensibility, portability, performance


DITA-OT Pipeline


DITA XProc Pipelines – Processing Flow


Resources
• XProc: An XML Pipeline Language
  – http://www.w3.org/TR/xproc/
• XProc processor implementations
  – Calabash (http://xmlcalabash.com)
  – Calumet (http://developer.emc.com/xmltech)
• XProc.org
  – Informal website about XProc and its use
• [email protected] mailing-list
• EMC Developer Network
  – http://developer.emc.com/xmltech


THANK YOU