XML-based Web Publishing and Content Management at Seattle University School of Law

James Cooper, Director of Technology & Media Services
Evan Lenz, Content Management Architect


Contents

1. Web site requirements and architecture

2. Web site management with Cocoon
   • URI design discussion

3. Redhawk CMS

4. An acronym you should know: XSLT

5. Q&A

1. Web site requirements and architecture

SU Law Web site requirements (summer 2002)

• Must include a Flash-enhanced version

• Must include an HTML-based version that approximates the look-and-feel and navigational structure of the Flash-enhanced version

• Must include a version of the site that is designed for accessibility

• Must employ the separation of presentation and content through the use of XML technologies. Multiple published versions of the same content must originate in an automatic way from the same source.

• The publishing framework must employ a single point of control over navigational structure, e.g. using an XML configuration file.

Web site requirements, cont.

• Must allow an average Web developer to easily author new content, edit existing content, etc.

• Must accommodate the continued use of existing tools for authoring content, e.g. Dreamweaver.

• Particular kinds of content that have predictable, repeating structure should be converted into custom XML vocabularies to increase their flexibility and ease of management.

• The Web site must include search functionality integrated into all versions of the site.

Web content strategy today

• Static pages were converted to style-free XHTML and are stored in VSS (Visual SourceSafe), with the latest versions shadowed on the staging server.

• Apache Ant is invoked on the staging server to incrementally build all versions (Flash, Standard, Text-only, and crawler) of each static page, using the page source, as well as global navigation and sidebar configuration files, as input.

• Cocoon powers the core functionality of the site, including setting the user’s version preferences and serving dynamic content. All static pages and files are served directly by Apache.

• Dynamic content pieces are identified by URI in the Cocoon sitemap, which is configured to assemble the corresponding pages on the fly. Dynamic content examples include:
  – Specialized content in our home-grown CMS called “Redhawk”, which provides end-user WYSIWYG editing of certain kinds of content
  – Google search results
  – Legacy ASP pages

• Traditional Web content management, e.g. WYSIWYG editing of all pages, is being considered but is not sorely missed at this time.

Benefits of using XML

• Separation of presentation from content
  – Ensures consistency of presentation across all pages (eliminates layout errors)
  – Enables publication to multiple channels
  – Content re-use

• Many commercial and open-source tools available for processing/creating XML

• Integration between disparate systems (including legacy ASP pages, Google, Redhawk, etc.)

• Great for configuration files

Primary tools used in our Web site

Run-time:

• Apache Cocoon (Java-based)

• Apache Web server on Linux

• mod_rewrite (for rewriting incoming URLs, e.g. path?mode=flash, to /flash-html/path.html)

• Google Appliance (for integrated search inside our site template)

• IIS/ASP (legacy database access scripts, e-mail forms, etc.)

• 4Suite, for exporting content from the Redhawk CMS (based on 4Suite)

Build-time:

• MS Visual SourceSafe (for versioning of static content)

• Samba (for mounting a VSS shadow folder on the Linux staging server)

• Dreamweaver MX (includes XHTML support and VSS integration)

• Apache Ant (for building the bulk of the site statically)

• 4Suite, for end-user content management of specialized document types, aka Redhawk

2. Web site management with Cocoon

Introduction to Cocoon

• Cocoon is an open-source, Java-based XML Web publishing framework

• Recently gained status as a top-level Apache project, at http://cocoon.apache.org

• Designed to enable the separation of concerns between content, logic, and style

The Cocoon sitemap

• SAX-based pipeline mechanism allows XML content to go through a series of transformations, configurable by the sitemap, Cocoon's central point of configuration

• Each pipeline consists of:
  – Exactly one generator
    • Produces XML content using any number of mechanisms: reading a file, submitting an HTTP request, calling a database, invoking a server page script, etc.
  – Followed by zero or more transformers
    • Each processes the XML, e.g. with XSLT or XInclude, for subsequent handling by either another transformer or the serializer
  – Followed by exactly one serializer
    • Serializes into a particular format, e.g. well-formed XML, browser-compatible XHTML, SVG, PDF (via XSL-FO and FOP), rasterized images (via SVG and Batik), etc.
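
The generator, transformer, serializer sequence above can be sketched in a few lines of Python. This is only an illustration of the pipeline shape, not Cocoon's Java API: the function names (run_pipeline, generate_cases, cases_to_list, serialize_xml) and the sample data are assumptions made for the example, and it passes whole trees between stages rather than SAX events.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Sequence

# Hypothetical stage types: one generator, any number of transformers,
# one serializer (mirroring the structure of a Cocoon pipeline).
Generator = Callable[[], ET.Element]
Transformer = Callable[[ET.Element], ET.Element]
Serializer = Callable[[ET.Element], str]


def run_pipeline(generate: Generator,
                 transformers: Sequence[Transformer],
                 serialize: Serializer) -> str:
    """Run exactly one generator, zero or more transformers, one serializer."""
    doc = generate()
    for transform in transformers:
        doc = transform(doc)
    return serialize(doc)


def generate_cases() -> ET.Element:
    # Stand-in for a generator that would read a file or call a service.
    return ET.fromstring("<cases><case>Example v. Example</case></cases>")


def cases_to_list(doc: ET.Element) -> ET.Element:
    # Stand-in for an XSLT step: turn each <case> into an <li>.
    ul = ET.Element("ul")
    for case in doc.iter("case"):
        ET.SubElement(ul, "li").text = case.text
    return ul


def serialize_xml(doc: ET.Element) -> str:
    # Serialize the final tree as well-formed XML text.
    return ET.tostring(doc, encoding="unicode")


output = run_pipeline(generate_cases, [cases_to_list], serialize_xml)
print(output)
```

Passing an empty transformer list is also valid here, which matches the "zero or more transformers" rule.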

Simplified Cocoon sitemap excerpt

<map:match pattern="accesstojustice/hague/cases">
  <map:generate src="http://redhawk/?xslt=getCases.xsl"/>
  <map:transform src="stylesheets/case2html.xsl"/>
  <map:serialize type="xhtml"/>
</map:match>

Another sitemap excerpt

<map:resource name="front-door">
  <map:select type="request-parameter">
    <map:parameter name="parameter-name" value="set-version"/>
    <map:when test="flash">
      <map:call resource="check-flash"/>
    </map:when>
    <map:when test="flash-confirmed">
      <map:call resource="set-preference-to-flash"/>
    </map:when>
    <map:when test="standard">
      <map:call resource="set-preference-to-standard"/>
    </map:when>
    <map:when test="simple">
      <map:call resource="set-preference-to-simple"/>
    </map:when>
    <map:otherwise>
      <!-- more logic -->
    </map:otherwise>
  </map:select>
</map:resource>
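
Read as plain logic, the selector above is a lookup on the set-version request parameter. The following Python sketch restates it under that reading. The function name front_door and the dictionary are illustrative assumptions, and the otherwise branch, which the excerpt elides, is represented only by returning None.

```python
from typing import Mapping, Optional

# Values of the "set-version" request parameter mapped to the sitemap
# resource each one calls, copied from the excerpt.
VERSION_RESOURCES = {
    "flash": "check-flash",
    "flash-confirmed": "set-preference-to-flash",
    "standard": "set-preference-to-standard",
    "simple": "set-preference-to-simple",
}


def front_door(params: Mapping[str, str]) -> Optional[str]:
    """Return the resource to call, or None for the elided fallback logic."""
    return VERSION_RESOURCES.get(params.get("set-version", ""))


print(front_door({"set-version": "flash-confirmed"}))
```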

URI design considerations

• The URI design of the SU Law Web site was inspired by Tim Berners-Lee's 1998 essay “Cool URIs don't change” – http://www.w3.org/Provider/Style/URI.html

• Aims to follow two of the essay's suggestions:
  – Leave out file extensions
  – Leave out topic/classification by subject

Leave out file extensions

• Cocoon makes it easy to map external URIs to internal filenames or other content generators

• On the SU Law Web site, no HTML page URL includes a file extension

• Other types of content use standard file extensions, e.g. JPG, GIF, Flash, Word, etc.
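
As a rough illustration of mapping an extensionless public URI to an internal file, the Python sketch below mimics the rewrite mentioned in the tools list (path?mode=flash to /flash-html/path.html). The function name rewrite is hypothetical, as are two behaviors the slides do not specify: returning None when no mode is given, and using "index" for the root path. The site itself does this with mod_rewrite rules, not Python.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def rewrite(url: str) -> Optional[str]:
    """Map e.g. 'path?mode=flash' to '/flash-html/path.html'.

    Returns None when there is no 'mode' parameter (assumed to mean
    "leave the request alone"; the slides do not cover this case).
    """
    parts = urlsplit(url)
    modes = parse_qs(parts.query).get("mode")
    if not modes:
        return None
    # "index" for an empty path is a placeholder assumption.
    path = parts.path.strip("/") or "index"
    return f"/{modes[0]}-html/{path}.html"


print(rewrite("/academics/calendar?mode=flash"))
```

The public URI never exposes ".html", so the internal file layout can change without breaking links.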

Leave out topic/classification by subject

• Difficult problem

• Design URIs such that they are meaningfully mnemonic and will never change, even though the corresponding pages may be classified into different topics later

• Berners-Lee: "Because the relationships between subjects are web-like rather than tree-like, even...people who agree on a web may pick a different tree representation."

Decouple navigational structure from URI structure

• URI structure is, of necessity, hierarchical

• Site navigation tends to be hierarchical, classifying pages into topics or subjects

• To help in following the original suggestion, we formulated the following mandate:
  – Decouple navigational structure from URI structure.

• We met this goal through the use of a custom XML configuration file (navigation.xml) that maps between the two independent hierarchies (navigation and URI structure)

Excerpt from navigation.xml

<navigation xmlns="http://law.seattleu.edu">
  <menu display="Welcome" sectionId="welcome">
    <link href="/" display="SU Law Home"/>
    <link display="Contact Information" href="/contactus"/>
    <link display="Directions" href="/directions"/>
    <link href="/welcome" display="From the Dean"/>
    <link href="/history" display="History"/>
    <link href="/calendar" display="Master Calendar"/>
    <link href="/mission" display="Mission"/>
    <link href="/search" display="Search"/>
    <link href="/sitemap" display="Site Map"/>
    <link href="http://www.seattleu.edu" display="Seattle University Home"/>
    <hidden href="/news" display="News"/>
    <hidden pattern="/news"/>
    <hidden href="/privacy" display="Privacy Statement"/>
  </menu>
  <menu display="Students" sectionId="students">
    <menu display="Academics">
      <link href="/academics" display="Introduction"/>
      <link href="/academics/calendar" display="Academic Calendar"/>
      <link href="/courses" display="Course Descriptions"/>
      <link href="/classassignments" display="Class Assignments"/>
      <hidden pattern="/classassignments"/>
      <!-- more pages -->
    </menu>
    <!-- more submenus -->
  </menu>
  <!-- more menus -->
</navigation>
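
To show how such a file can map a URI to its place in the navigation, here is a small Python sketch that looks up the menu trail for a given href. It uses a shortened copy of the excerpt; the function name breadcrumb is hypothetical, and it matches on exact href only (how the site interprets the pattern attribute is not stated in the slides, so it is ignored here). The real site performs this mapping with XSLT.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional

NS = {"nav": "http://law.seattleu.edu"}

# Shortened copy of the navigation.xml excerpt.
NAVIGATION_XML = """
<navigation xmlns="http://law.seattleu.edu">
  <menu display="Welcome" sectionId="welcome">
    <link href="/" display="SU Law Home"/>
    <link href="/history" display="History"/>
    <hidden href="/privacy" display="Privacy Statement"/>
  </menu>
  <menu display="Students" sectionId="students">
    <menu display="Academics">
      <link href="/academics" display="Introduction"/>
      <link href="/courses" display="Course Descriptions"/>
    </menu>
  </menu>
</navigation>
"""


def breadcrumb(root: ET.Element, uri: str) -> Optional[List[str]]:
    """Return the menu trail leading to the page with this href, if any."""

    def search(menu: ET.Element, trail: List[str]) -> Optional[List[str]]:
        trail = trail + [menu.get("display")]
        for child in menu:
            tag = child.tag.split("}")[-1]  # drop the namespace prefix
            if tag in ("link", "hidden") and child.get("href") == uri:
                return trail + [child.get("display")]
            if tag == "menu":
                found = search(child, trail)
                if found:
                    return found
        return None

    for menu in root.findall("nav:menu", NS):
        found = search(menu, [])
        if found:
            return found
    return None


root = ET.fromstring(NAVIGATION_XML)
print(breadcrumb(root, "/courses"))
```

Note that /courses sits under Students and Academics in the navigation even though nothing in the URI says so, which is the decoupling the previous slide describes.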

The benefits of URI-navigation independence

• Pages can be moved from one section of the site to another by simply editing one file (navigation.xml)

• Navigation structure can change without needing to update any links or change any URIs (thereby rendering them uncool)

• Files do not need to be moved around just because corresponding pages “move around” the site

XML-based configuration of the Web site “sidebar”

<sidebar xmlns="http://law.seattleu.edu">
  <allButtons>
    <promotion id="laptop" img="laptoppurchase.gif"
               alt="Student Laptop Purchase Program (Dell)"
               href="/technology/purchase"/>
    <profile id="cmhall" alt="Christian Halliburton Video"
             movie="cmhall.rm"/>
    <quote id="cumbow" img="cumbow.gif" alt="Cumbow Quote"/>
    ...
  </allButtons>
  ...
  <section id="faculty">
    <profile idref="cmhall"/>
    <quote idref="cumbow"/>
    <promotion idref="giving"/>
    <promotion idref="newfaculty"/>
    <promotion idref="laptop"/>
  </section>
  ...
</sidebar>
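
The id/idref pairing in this file means each button is defined once and then referenced per section. A minimal Python sketch of resolving those references follows. It uses a shortened copy of the excerpt, the function name buttons_for_section is hypothetical, and skipping references whose definitions are missing is an assumption made so the sample runs on a partial file.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

NS = {"sb": "http://law.seattleu.edu"}

# Shortened copy of the sidebar configuration excerpt.
SIDEBAR_XML = """
<sidebar xmlns="http://law.seattleu.edu">
  <allButtons>
    <promotion id="laptop" img="laptoppurchase.gif"
               alt="Student Laptop Purchase Program (Dell)"
               href="/technology/purchase"/>
    <quote id="cumbow" img="cumbow.gif" alt="Cumbow Quote"/>
  </allButtons>
  <section id="faculty">
    <quote idref="cumbow"/>
    <promotion idref="laptop"/>
  </section>
</sidebar>
"""


def buttons_for_section(root: ET.Element, section_id: str) -> List[Dict[str, str]]:
    """Resolve each idref in a section to the attributes of its definition."""
    by_id = {b.get("id"): b for b in root.find("sb:allButtons", NS)}
    section = root.find(f"sb:section[@id='{section_id}']", NS)
    if section is None:
        return []
    return [
        dict(by_id[ref.get("idref")].attrib)
        for ref in section
        if ref.get("idref") in by_id  # skip refs with no definition (assumption)
    ]


sidebar = ET.fromstring(SIDEBAR_XML)
faculty_buttons = buttons_for_section(sidebar, "faculty")
print([button["alt"] for button in faculty_buttons])
```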

3. Redhawk CMS

Redhawk, home-grown CMS

• Redhawk is a specialized XML content management system, based on 4Suite, an open-source platform for XML and RDF processing

• Named after the SU mascot

• Basic unit of storage is an XML document

• Supports development of custom Redhawk "document classes", which correspond to XML document types (or schemas)

• Provides basic CRUD (Create, Read, Update, Delete) and role-based workflow functionality

• Two types of users for each document class: Author and Editor

• Any Create, Update, or Delete requests by an Author must be approved by an Editor before taking effect

• Pluggable WYSIWYG editing environments; so far we have developed support for Altova's free browser-based XML editor, Authentic 5

• Future plans to support Microsoft InfoPath and Word 2003
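
The Author/Editor approval rule above can be sketched as follows. This is a toy model in Python for illustration only, not Redhawk's implementation (which is built on 4Suite and XSLT): the class names, method names, and role strings are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Request:
    action: str                    # "create", "update", or "delete"
    doc_id: str
    content: Optional[str] = None  # XML text for create/update


@dataclass
class DocumentClass:
    """Hypothetical stand-in for a Redhawk document class."""
    name: str
    documents: Dict[str, str] = field(default_factory=dict)
    pending: List[Request] = field(default_factory=list)

    def submit(self, request: Request) -> None:
        # An Author's request is queued; it does not change stored documents.
        if request.action not in ("create", "update", "delete"):
            raise ValueError(f"unknown action: {request.action}")
        self.pending.append(request)

    def approve(self, request: Request, role: str) -> None:
        # Only an Editor can make a pending request take effect.
        if role != "editor":
            raise PermissionError("only an Editor may approve requests")
        self.pending.remove(request)
        if request.action == "delete":
            self.documents.pop(request.doc_id, None)
        else:
            self.documents[request.doc_id] = request.content or ""


announcements = DocumentClass("announcement")
new_item = Request("create", "a1", "<announcement/>")
announcements.submit(new_item)
visible_before = "a1" in announcements.documents
announcements.approve(new_item, role="editor")
visible_after = "a1" in announcements.documents
print(visible_before, visible_after)
```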

Create New Announcement form

Current Redhawk applications

• Announcements and events for the Docket (migration from custom production application in process)

• Access to Justice Institute’s Hague Project for managing Hague Convention-related case information (in production)

4. An acronym you should know: XSLT

The common denominator: XSLT (Extensible Stylesheet Language Transformations)

• Used in Cocoon to assemble all pages (XSLT is the default type of "Transformer")

• Used in our site build process, via Ant's <xslt> task for collectively applying transformations over multiple files

• Built into 4Suite and used throughout Redhawk to assemble pages, create documents, and implement the core CMS logic (with the help of extensions)

• Used in the Google Appliance to style the output of search results

• Used in Redhawk in the browser to apply supplemental "clean-up" transformations to the XML resulting from Authentic editing

• Growing abundance of conformant XSLT processors, including IE6 and Mozilla support, as well as a growing number of powerful tools

• And… XSLT is reaching mainstream technology status: Microsoft Office 2003 will pervasively employ XSLT for the development of custom XML solutions, particularly in Word, Excel, Access, and InfoPath.

References

• http://cocoon.apache.org

• http://4suite.org

• http://ant.apache.org

• “Cool URIs don't change” – http://www.w3.org/Provider/Style/URI.html

• “Cocoon and 4Suite for Content Management: The Best of Both Worlds at Seattle University School of Law” - http://www.xmlportfolio.com/xmleurope2003/

Questions?