darcy-hamilton
XML/XSL and Information
continue with XML/XSL
Ideas from Edward Tufte on data density & data junk
Homework: find & report on data presentation, Tufte, statistics, graphs
Comments on XSLT
• Declarative as opposed to procedural language
– no side effects (variables can't be changed; the order of application of templates is somewhat flexible)
• One main use is matching parts of the XML tree using patterns and 'declaring' results
– push processing: pushes out results based on applying templates
– pull processing: pulls in relevant information and produces results
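A minimal sketch of the two styles side by side. The input layout (a results root holding match elements, each with two team elements carrying a score attribute) is an assumption made for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="html"/>

  <xsl:template match="/results">
    <html><body>
      <!-- Push: hand the match nodes to the processor; it finds the
           template whose pattern matches each node and applies it. -->
      <ul><xsl:apply-templates select="match"/></ul>
      <!-- Pull: this template goes and fetches the nodes it wants
           and formats them itself. -->
      <ul>
        <xsl:for-each select="match/team">
          <li><xsl:value-of select="."/> scored <xsl:value-of select="@score"/></li>
        </xsl:for-each>
      </ul>
    </body></html>
  </xsl:template>

  <!-- Applied once per match node by the push step above. -->
  <xsl:template match="match">
    <li><xsl:value-of select="team[1]"/> v <xsl:value-of select="team[2]"/></li>
  </xsl:template>
</xsl:transform>
```

Push tends to suit input whose structure varies; pull tends to suit output with a fixed layout.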
Comments, cont.
• XML still under development (definition of next standard)
– The new version will include what is now done with so-called extensions.
– Other options are server-side or client-side programming
• XML Schema is a possible replacement for DTDs
• XSL Formatting Objects (XSL-FO) focus [more] on formatting
XSLT examples
• Use of variables defined by [more intricate] XPath expressions
• Use of recursive calls to named templates
– template calling itself with new parameters
• Other mechanisms (for you to look up as needed)
– mode for template: facility to transform the same nodes under different conditions (modes)
– key function: facility to categorize nodes according to some calculated expression
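A rough sketch of the mode and key mechanisms. The input layout (results/match/team with a score attribute), the mode names, the key name matches-by-team and the team name 'Alpha' are all made up for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="html"/>

  <!-- key: index every match under the name of each team it contains -->
  <xsl:key name="matches-by-team" match="match" use="team"/>

  <xsl:template match="/results">
    <html><body>
      <!-- mode: the same match nodes are transformed twice, differently -->
      <h2>Fixtures</h2>
      <ul><xsl:apply-templates select="match" mode="fixture"/></ul>
      <h2>Scores</h2>
      <ul><xsl:apply-templates select="match" mode="score"/></ul>
      <!-- key(): look up all matches filed under the (hypothetical) name Alpha -->
      <p>Alpha played <xsl:value-of select="count(key('matches-by-team', 'Alpha'))"/> match(es).</p>
    </body></html>
  </xsl:template>

  <xsl:template match="match" mode="fixture">
    <li><xsl:value-of select="team[1]"/> v <xsl:value-of select="team[2]"/></li>
  </xsl:template>

  <xsl:template match="match" mode="score">
    <li><xsl:value-of select="team[1]/@score"/>-<xsl:value-of select="team[2]/@score"/></li>
  </xsl:template>
</xsl:transform>
```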
World cup data
• The previous example transformed each match, depending on whether or not the match was marked as 'feature' in an attribute.
• What about producing a table of the results?
What do we want to do
• Transform logic:
– produce an HTML table, one row for each team
– calculate for each team certain values that use all the matches that the team is 'in'
• Implementation
– use XSLT variables
XSLT mechanics
<xsl:variable name="teams" select="//team[not(.=preceding::team)]"/>
• The value of a variable is set by the select expression. IT CANNOT BE CHANGED.
• The //team means find all the team nodes anywhere.
• The . means the node you are considering now.
• The square brackets define a condition for which teams are to be selected. This variable is a node set.
• The preceding:: is an example of what is called an axis. It is a qualifier.
• This says: make up a node set consisting of teams, but don't include any that have occurred previously.
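To make the selection concrete, here is a small input document of the shape the expression seems to assume: a results root, match elements, and team elements whose text is the team name and whose score attribute holds the goals. The layout is inferred from the XPath expressions, and the team names and scores are invented.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical sample data, for illustration only -->
<results>
  <match>
    <team score="2">Alpha</team>  <!-- first Alpha: selected -->
    <team score="1">Beta</team>   <!-- first Beta: selected -->
  </match>
  <match>
    <team score="1">Alpha</team>  <!-- equals a preceding team: skipped -->
    <team score="1">Gamma</team>  <!-- first Gamma: selected -->
  </match>
  <match>
    <team score="0">Beta</team>   <!-- equals a preceding team: skipped -->
    <team score="3">Gamma</team>  <!-- equals a preceding team: skipped -->
  </match>
</results>
```

With this input, //team[not(.=preceding::team)] yields three nodes: the first Alpha, the first Beta and the first Gamma. The comparison uses the text of each team element, so stray whitespace inside a name would make it count as a different team.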
<?xml version="1.0" encoding="UTF-8" ?>
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="html"/>
<xsl:variable name="teams" select="//team[not(.=preceding::team)]"/>
<xsl:variable name="matches" select="//match"/>
<xsl:template match="/results">
<html>
<head><title>Results of World Cup </title><LINK REL="stylesheet" TYPE="text/css" HREF="results.css"/></head>
<body>
<h2> Results of World Cup </h2>
<table cellpadding="5">
<tr>
<th> Team </th>
<th> Played </th>
<th> Won </th>
<th> Lost </th>
<th> Tied </th>
<th> For </th>
<th> Against </th>
<th> Points </th>
</tr>
<xsl:for-each select ="$teams">
<xsl:variable name="this" select="."/>
<xsl:variable name="played" select="count($matches[team=$this])"/>
<xsl:variable name="won" select="count($matches[team[.=$this]/@score > team[.!=$this]/@score])"/>
<xsl:variable name="lost" select="count($matches[team[.=$this]/@score &lt; team[.!=$this]/@score])"/>
<xsl:variable name="tied" select="count($matches[team[.=$this]/@score = team[.!=$this]/@score])"/>
<xsl:variable name="for" select="sum($matches/team[.=current()]/@score)"/>
<xsl:variable name="against" select="sum($matches[team=current()]/team/@score)-$for"/>
<xsl:variable name="points" select="3*$won+$tied"/>
<!-- $ indicates a variable reference -->
<tr>
<td><xsl:value-of select="."/></td>
<td><xsl:value-of select="$played"/></td>
<td><xsl:value-of select="$won"/></td>
<td><xsl:value-of select="$lost"/></td>
<td><xsl:value-of select="$tied"/></td>
<td><xsl:value-of select="$for"/></td>
<td><xsl:value-of select="$against"/></td>
<td><xsl:value-of select="$points"/> </td>
</tr>
</xsl:for-each>
</table>
</body>
</html>
</xsl:template>
</xsl:transform>
New example
• http://99-bottles-of-beer.ls-la.net/
• A web site with programs in over 300 different programming languages to display all verses to …
• This is my version in XML/XSLT
– They have another version that is stand-alone XSLT. Check it out.
Example
<?xml version="1.0" ?>
<?xml-stylesheet href="lyrics.xsl" type="text/xsl"?>
<!DOCTYPE lyrics [
<!ELEMENT lyrics (line1, line2, line3)>
<!ATTLIST lyrics start CDATA #REQUIRED>
<!ELEMENT line1 (#PCDATA)>
<!ELEMENT line2 (#PCDATA)>
<!ELEMENT line3 (#PCDATA)>
]>
<lyrics start="3">
<line1> bottles of beer on the wall</line1>
<line2> bottles of beer</line2>
<line3> take one down and pass it around</line3>
</lyrics>
(The start value could be 99.)
Document Type Definition
• Defines what is a valid XML document
• Validation can be done with an external validator: http://www.stg.brown.edu/service/xmlvalid/xmlvalid.var
• Alternative to DTD is XML Schema
– XML Schemas are XML trees.
– XML Schema is less advanced with respect to being an official standard
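For comparison, a sketch of how the lyrics DTD shown earlier might be written as an XML Schema. This is one possible rendering, not the course's own file. Declaring start as xs:positiveInteger is an added assumption and is stricter than the DTD, which only says CDATA.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="lyrics">
    <xs:complexType>
      <!-- same content model as (line1, line2, line3) in the DTD -->
      <xs:sequence>
        <xs:element name="line1" type="xs:string"/>
        <xs:element name="line2" type="xs:string"/>
        <xs:element name="line3" type="xs:string"/>
      </xs:sequence>
      <!-- #REQUIRED in the DTD becomes use="required";
           the numeric type is an assumption, not in the DTD -->
      <xs:attribute name="start" type="xs:positiveInteger" use="required"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

Note that the schema is itself an XML tree, and that it can express data types that a DTD cannot.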
Demonstration
• Go to http://www.stg.brown.edu/service/xmlvalid/xmlvalid.var
• (Since it is short), copy and paste lyrics.xml into the text area
• Click on validate
– returns ok
• Now, change the XML to NOT match the DTD
– remove start attribute
– add element
• Click on validate
– indicates problems
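One way the deliberately broken document might look, as a sketch. The line4 element is a hypothetical addition. The document is still well-formed, but it no longer matches the DTD, so a validator should report problems.

```xml
<?xml version="1.0" ?>
<!DOCTYPE lyrics [
<!ELEMENT lyrics (line1, line2, line3)>
<!ATTLIST lyrics start CDATA #REQUIRED>
<!ELEMENT line1 (#PCDATA)>
<!ELEMENT line2 (#PCDATA)>
<!ELEMENT line3 (#PCDATA)>
]>
<!-- Invalid on purpose: the required start attribute is missing,
     and line4 is not declared anywhere in the DTD. -->
<lyrics>
  <line1> bottles of beer on the wall</line1>
  <line2> bottles of beer</line2>
  <line3> take one down and pass it around</line3>
  <line4> hypothetical extra element</line4>
</lyrics>
```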
What do we want to do?
• Transformation
– produce well-formed HTML
– start with the 'start' attribute and, using it as a string, output it as the start of the first line.
– output the line as HTML.
– using start as a number, subtract one to get the new value. Using this value as a string, output it with line1.
– repeat the process
• Implementation
– The 'repeat' will be done as a recursive call, that is, a template will call itself.
– The template will be a named template, with the value as its parameter, starting with the value of the start attribute.
Outline of the xsl file
• header/instructional stuff
• template that matches the main node (lyrics)
• the so-called named template to be called by the main template AND also called by itself
Technical note
• <xsl:copy> and <xsl:copy-of> copy information from the source document to the result.
• <xsl:copy> copies only the current node, whereas <xsl:copy-of> copies the selected node and any descendants (called a deep copy).
• In this example, <xsl:copy-of> outputs the counter value; since the counter is a simple value, <xsl:value-of> could be used instead.
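A small sketch of the shallow versus deep difference, meant to be run against the lyrics.xml document shown earlier. The result, shallow and deep element names are invented for the illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/lyrics">
    <result>
      <shallow>
        <!-- xsl:copy acts on the current node only, so make line1 current first.
             The copy is an empty line1 element: its text child is not copied. -->
        <xsl:for-each select="line1">
          <xsl:copy/>
        </xsl:for-each>
      </shallow>
      <deep>
        <!-- xsl:copy-of copies line1 together with its text content. -->
        <xsl:copy-of select="line1"/>
      </deep>
    </result>
  </xsl:template>
</xsl:transform>
```

Unlike <xsl:copy-of>, <xsl:copy> has no select attribute, which is why the for-each is needed to make line1 the current node.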
<?xml version="1.0" encoding="UTF-8" ?>
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="html"/>
<xsl:template match="/lyrics">
<html>
<head><title>Singing </title></head>
<body>
<xsl:call-template name="singverse">
<xsl:with-param name="counter">
<xsl:value-of select="@start"/>
</xsl:with-param>
</xsl:call-template>
</body>
</html>
</xsl:template>
<!-- Templates can have parameters. Here one parameter is set with the value of the start attribute. -->
<xsl:template name="singverse">
<xsl:param name="counter" />
<br/>
<xsl:copy-of select="$counter"/>
<xsl:value-of select="line1" />
<br/>
<xsl:copy-of select="$counter"/>
<xsl:value-of select="line2" />
<br/>
<xsl:value-of select="line3" />
<br/>
<xsl:variable name="next" select="$counter - 1" />
<xsl:copy-of select="$next"/>
<xsl:value-of select="line1" />
<br/> **** <br/>
<xsl:if test="$next >=1" >
<xsl:call-template name="singverse">
<xsl:with-param name="counter" select="$next" />
</xsl:call-template>
</xsl:if>
</xsl:template>
</xsl:transform>
User centered design
• Build the interface/application for the person using it! This is generally not you.
• (Sometimes), it is important to distinguish between the system owner and system users.
• If possible, use a more descriptive name for users: client, customer, patient, player, museum visitor, tourist.
• Determine for the user(s) what is best
– organization
– vocabulary
• Determine for the user(s) what are the important/all
– platform, access, etc.
Challenges
• More than one 'user' category
– first time (novice) versus repeat (expert)
• System owners (the paying client) may want the system to serve different audiences
– intranet, employees at client locations, employees at hotels and on planes, perhaps using cell phones…
• Web sites: visitors may enter the site in different ways.
– Search engines may make it important for what you think of as inner pages to be stand-alone.
Data presentation
• Edward Tufte (and others) promote presentation of data that features the data as opposed to (what he calls) chartjunk.
– You need data=content.
• Compare the amount of space devoted to data versus everything else, including
– descriptions, annotation, labels
– illustration without content
• Also, make sure space for navigation is not overdone.
Tufte: Challenges of display
• 'Life' is multi-dimensional and multivariate, but paper and screens are two-dimensional.
– How do you escape flatland?
• Your content may require more resolution than you have, especially if limited by computer screens
– How do you manage data density?
Solution: thoughtful, inventive, creative design!
Design is clear thinking made visible.
Tufte advice
• What's the problem? Who cares (why care)? What is the solution?
• Particular – general – particular
• Teaching by example: study books
– Minard march
– Connecticut radar
– Challenger disaster
– (Columbia disaster)
Connecticut auto deaths
• Traffic deaths following (intervention) of radar
– no context
– Is it normal fluctuation, with 'normal' regression to the mean after an extreme (outlier) year, or the effects of policy?
– Need to look at context (in space and time)
Challenger Disaster
• Failure of technical presentation
• The Morton Thiokol engineers did not want to approve the launch because they thought that the O-rings would not work in cold weather.
– They made a presentation, which did not succeed.
– The launch went. Their prediction was correct!
Tufte's proposal
Principles
• Show visual comparisons. Try to make comparisons in space and not 'stacked in time'.
• Show causality.
• Show multi-variables/dimensions.
• Integrate word, number and image.
• Document: where did data come from? Annotate.
• Everything depends on quality, relevance and integrity of content.
Screen
• Term: 'screen real estate' is used to indicate the value of each part of the screen.
• You (system owners) may need to share screen with other organizations, for example, ad space.
• White space, that is, space with nothing on it, is valuable for clarity.
Data dimension
• The data that is worth presenting in graphics form (as opposed to clear text) is generally complex: multi-dimensional. More on this next class.
• Tufte (and others): don't give data dimensions it doesn't have.
– recall 3D bar graph
– recall army marching in and out of Russia.
– New York Times interactives on 9/11 focused on time, location in the towers, Fire Companies from different places in NYC. Audio was often real: police calls, calls on cell phones.
As with 3D bar charts when you only have points, avoid rainbow colors when the data is one-dimensional.
(Note: shades of blue chart better for color-blind visitors.)
My defense
• Tufte recommends: no PowerPoint (no charts with bullets)
– I try for whole sentences.
– I avoid decorations.
– Charts are my notes (which I share with you).
Homework
• Find and report via CourseInfo on one of these or related topics: Tufte, visual presentation, good/bad use of graphs, user-centered design.
• Continue taking on-line XML/XSLT tutorials.
• Do your own versions of XML/XSLT exercises.
• (Try writing a (simple) DTD and doing validation.)