Digital.Humanities@Oxford Summer School 2012
edited by James Cummings and Sebastian Rahtz
July 2012
Contents
1 Overall Timetable
2 Introduction
3 Full Programme
3.1 Monday 2 July 2012
3.2 Tuesday 3 July 2012
3.3 Wednesday 4 July 2012
3.4 Thursday 5 July 2012
3.5 Friday 6 July 2012
4 Workshop Abstracts
4.1 An Introduction to XML and the Text Encoding Initiative
4.2 Working with TEI Texts (Advanced)
4.3 An Introduction to Digital Humanities Tools and Approaches
4.4 A Humanities Web of Data: Publishing, Linking, Querying and Visualisation on the Semantic Web
5 Workshop: An Introduction to XML and the Text Encoding Initiative
5.1 Timetable
5.2 Exercise 1: Create an XML Document
5.3 Exercise 2: Create a TEI Document
5.4 Exercise 3: Improving a teiHeader
5.5 Exercise 4: Marking Up Names
5.6 Exercise 5: Creating a Manuscript Description
5.7 Exercise 6: Transcribing with the TEI
5.8 Exercise 7: Encoding Spoken Text
5.9 Exercise 8: Linguistic Markup
5.10 Exercise 9: Customise the TEI with Roma
5.11 Exercise 10: OxGarage and the TEI Community
5.12 TEI reference material: summary of elements
5.13 Wilfred Owen: Letter To Leslie Gunston
5.14 Wilfred Owen: Preface MS
5.15 Stuart Lee interviews Ian Hislop (fragment)
6 Workshop: Working with TEI Texts
6.1 Timetable
6.2 Data samples
6.3 Getting better quality TEI XML
6.4 XSLT transformations for genetic editions
6.5 Grouping Exercises
6.6 Using XQuery
6.7 Using TEI stylesheet family
6.8 TEI reference material: XSL stylesheets
6.9 Quick reference cards for XSLT, XQuery, XPath, Regular Expressions, and Schematron
7 Workshop: An Introduction to Digital Humanities Tools and Approaches
7.1 Corpus Linguistics and Text Analysis
8 Workshop: A Humanities Web of Data: Publishing, Linking, Querying and Visualisation on the Semantic Web
1 Overall Timetable
2 Introduction
The Digital.Humanities@Oxford Summer School (DHOXSS) 2012 takes place from 2nd to 6th July at the University of Oxford. DHOXSS delegates will be introduced to a range of topics suitable for researchers, project managers, research assistants, and students who are interested in the creation, management, or publication of digital data in the humanities.
Delegates will follow one of our five-day workshops on:
• An Introduction to XML and the Text Encoding Initiative
• Working with TEI Texts (Advanced)
• An Introduction to Digital Humanities Tools and Approaches
• A Humanities Web of Data: Publishing, Linking, Querying and Visualisation on the Semantic Web
Each day will also contain plenary guest lectures by experts in their fields, plus sessions on a wide variety of Digital Humanities topics. There will be morning surgery sessions to discuss projects and possibilities with tutors. The summer school is a collaboration for Digital.Humanities@Oxford between Oxford University Computing Services (OUCS) and the Oxford e-Research Centre (OeRC), with the assistance of the Humanities Division, the Bodleian Libraries, the Oxford Internet Institute, and e-Research South. The DHOXSS is organized by James Cummings and Sebastian Rahtz at OUCS and Erin Snyder at OeRC.
The Summer School will be located at Merton College, OUCS, and the OeRC, all situated in the centre of Oxford.
3 Full Programme
3.1 Monday 2 July 2012
3.1.1 09:30 - 10:00: Registration
Registration will take place in the foyer of the TS Eliot Lecture Theatre at Merton College from 09:30 - 10:00 on Monday morning. Registration may be available at other times by prior arrangement.
3.1.2 10:00 - 11:00: Plenary Lecture
Plenary Lecture: Crowdsourcing in the Humanities, Chris Lintott (Zooniverse)
3.1.3 11:00 - 11:30: Tea Break
Tea Break will take place in the foyer of the TS Eliot Lecture Theatre at Merton College.
3.1.4 11:30 - 12:30: Workshops – Introductory Lectures
• An Introduction to XML and the Text Encoding Initiative – David Harvey Room, Merton College
• Working with TEI Texts (Advanced) – Ian Taylor Room, Merton College
• An Introduction to Digital Humanities Tools and Approaches: "Corpus and Text Analysis for Research in the Humanities", Martin Wynne (OUCS and OeRC) – TS Eliot Lecture Theatre, Merton College
• A Humanities Web of Data: Publishing, Linking, Querying and Visualisation on the Semantic Web – Sir Howard Stringer Room, Merton College
3.1.5 12:30 - 13:30: Lunch
Lunch will be in the foyer of the TS Eliot Lecture Theatre.
3.1.6 13:30 - 14:00: Travel Time to OUCS
The computer-based practical aspects of the workshops will take place in the Thames Suite of the Oxford University Computing Services, 13 Banbury Road, Oxford, OX2 6NN. Leave adequate time to walk there from Merton College.
3.1.7 14:00 - 16:00: Workshops – Practical
• An Introduction to XML and the Text Encoding Initiative – Evenlode Room, OUCS
• Working with TEI Texts (Advanced) – Cherwell Room, OUCS
• An Introduction to Digital Humanities Tools and Approaches: "Dealing with the Data Deluge: Corpus Linguistics for Text-Based Research", Martin Wynne (OUCS and OeRC) – Isis Room, OUCS
• A Humanities Web of Data: Publishing, Linking, Querying and Visualisation on the Semantic Web – Windrush Room, OUCS
3.1.8 16:00 - 16:30: Tea Break
The Tea Break and Parallel Sessions will be held at the Oxford e-Research Centre, 7 Keble Road, Oxford, OX1 3QG. Tea Break will be in the OeRC Atrium.
3.1.9 16:30 - 17:30: Parallel Sessions
You have a free choice on the day of which session to attend:
• Parallel Session 1: Oxford adventures in crowdsourcing: models for engaging communities and enhancing digital collections, Kate Lindsay (OUCS) and David Tomkins (Bodleian) – OeRC Lecture Theatre B
• Parallel Session 2: Creating Digital Data Resources: Issues to consider, David Robey (OeRC) – OeRC Conference Room
3.1.10 19:00 - : Drinks Reception
A free reception with drinks and nibbles will take place from 19:00 on the Sundial Lawn at Merton College (or in case of rain, the TS Eliot Foyer).
3.2 Tuesday 3 July 2012
3.2.1 09:30 - 10:00: Surgery A (Optional)
Surgery A is a Focus Group on Sustainability and EEBO-TCP by Judith Siefring (Bodleian) – Sir Howard Stringer Room, Merton College.
3.2.2 10:00 - 11:00: Plenary Lecture
Plenary Lecture: Humanities Research Data – Rate me! Wolfram Horstmann (Bodleian) – TS Eliot Lecture Theatre, Merton College.
3.2.3 11:00 - 11:30: Tea Break
Tea Break will take place in the foyer of the TS Eliot Lecture Theatre at Merton College.
3.2.4 11:30 - 12:30: Workshops – Introductory Lectures
• An Introduction to XML and the Text Encoding Initiative – David Harvey Room, Merton College
• Working with TEI Texts (Advanced) – Ian Taylor Room, Merton College
• An Introduction to Digital Humanities Tools and Approaches: "The Dangers and Delights of Data Mining", Glenn Roe (OeRC) – TS Eliot Lecture Theatre, Merton College
• A Humanities Web of Data: Publishing, Linking, Querying and Visualisation on the Semantic Web – Sir Howard Stringer Room, Merton College
3.2.5 12:30 - 13:30: Lunch
Lunch will be in the foyer of the TS Eliot Lecture Theatre.
3.2.6 13:30 - 14:00: Travel Time to OUCS
The computer-based practical aspects of the workshops will take place in the Thames Suite of the Oxford University Computing Services, 13 Banbury Road, Oxford, OX2 6NN. Leave adequate time to walk there from Merton College.
3.2.7 14:00 - 16:00: Workshops – Practical
• An Introduction to XML and the Text Encoding Initiative – Evenlode Room, OUCS
• Working with TEI Texts (Advanced) – Cherwell Room, OUCS
• An Introduction to Digital Humanities Tools and Approaches: "A Practical Introduction to Text Mining", Glenn Roe (OeRC) – Isis Room, OUCS
• A Humanities Web of Data: Publishing, Linking, Querying and Visualisation on the Semantic Web – Windrush Room, OUCS
3.2.8 16:00 - 16:30: Tea Break
The Tea Break and Parallel Sessions will be held at the Oxford e-Research Centre, 7 Keble Road, Oxford, OX1 3QG. Tea Break will be in the OeRC Atrium.
3.2.9 16:30 - 17:30: Parallel Sessions
• Parallel Session 3: The other 99%: two approaches to project modelling, Pip Willcox (Bodleian) – OeRC Conference Room
• Parallel Session 4: Encoding Music Text and Text with Music, Raffaele Viglianti (King’s College London) – OeRC Lecture Theatre B
3.3 Wednesday 4 July 2012
3.3.1 09:30 - 10:00: Surgery B (Optional)
Surgery B is Text Encoding Project Advice, James Cummings (OUCS) – Sir Howard Stringer Room, Merton College.
3.3.2 10:00 - 11:00: Plenary Lecture
Plenary Lecture: Social Machines, Dave DeRoure (OeRC) – TS Eliot Lecture Theatre, Merton College.
3.3.3 11:00 - 11:30: Tea Break
Tea Break will take place in the foyer of the TS Eliot Lecture Theatre at Merton College.
3.3.4 11:30 - 12:30: Workshops – Introductory Lectures
• An Introduction to XML and the Text Encoding Initiative – David Harvey Room, Merton College
• Working with TEI Texts (Advanced) – Ian Taylor Room, Merton College
• An Introduction to Digital Humanities Tools and Approaches: "Introduction to Markup", Lou Burnard (Adonis TGE) – TS Eliot Lecture Theatre, Merton College
• A Humanities Web of Data: Publishing, Linking, Querying and Visualisation on the Semantic Web – Sir Howard Stringer Room, Merton College
3.3.5 12:30 - 13:30: Lunch
Lunch will be a buffet in Merton College Hall.
3.3.6 13:30 - 14:00: Travel Time to OUCS
The computer-based practical aspects of the workshops will take place in the Thames Suite of the Oxford University Computing Services, 13 Banbury Road, Oxford, OX2 6NN. Leave adequate time to walk there from Merton College.
3.3.7 14:00 - 16:00: Workshops – Practical
• An Introduction to XML and the Text Encoding Initiative – Evenlode Room, OUCS
• Working with TEI Texts (Advanced) – Cherwell Room, OUCS
• An Introduction to Digital Humanities Tools and Approaches: "TEI a la Carte", Lou Burnard (Adonis TGE) – Isis Room, OUCS
• A Humanities Web of Data: Publishing, Linking, Querying and Visualisation on the Semantic Web – Windrush Room, OUCS
3.3.8 16:00 - 16:30: Tea Break
The Tea Break and Parallel Sessions will be held at the Oxford e-Research Centre, 7 Keble Road, Oxford, OX1 3QG. Tea Break will be in the OeRC Atrium.
3.3.9 16:30 - 17:30: Parallel Sessions
• Parallel Session 5: Copyright and Open Licensing, Rowan Wilson (OUCS) – OeRC Lecture Theatre B
• Parallel Session 6: Silos and Street-Literature: Digitising and Linking Cheap Print Collections and Traditions, Giles Bergel (Merton College and English Faculty) – OeRC Conference Room
3.3.10 19:00 - : Banquet
A table-service banquet will take place in Merton College Hall for those who selected this additional option when registering and paying.
3.4 Thursday 5 July 2012
3.4.1 09:30 - 10:00: Surgery C (Optional)
Surgery C is Web Project and Data Modelling, James Cummings (OUCS), Alexander Dutton (OUCS), Monica Messaggi-Kaya (Bodleian), Pip Willcox (Bodleian) – Sir Howard Stringer Room, Merton College.
3.4.2 10:00 - 11:00: Plenary Lecture
Plenary Lecture: Linked Data in the Humanities: An Open-and-Shut Case? Elton Barker (Open University) and Leif Isaksen (University of Southampton) – TS Eliot Lecture Theatre, Merton College.
3.4.3 11:00 - 11:30: Tea Break
Tea Break will take place in the foyer of the TS Eliot Lecture Theatre at Merton College.
3.4.4 11:30 - 12:30: Workshops – Introductory Lectures
• An Introduction to XML and the Text Encoding Initiative – David Harvey Room, Merton College
• Working with TEI Texts (Advanced) – Ian Taylor Room, Merton College
• An Introduction to Digital Humanities Tools and Approaches: "Working with Digital Images", Segolene Tarte (OeRC) – TS Eliot Lecture Theatre, Merton College
• A Humanities Web of Data: Publishing, Linking, Querying and Visualisation on the Semantic Web – Sir Howard Stringer Room, Merton College
3.4.5 12:30 - 13:30: Lunch
Lunch will be a buffet in Merton College Hall.
3.4.6 13:30 - 14:00: Travel Time to OUCS
The computer-based practical aspects of the workshops will take place in the Thames Suite of the Oxford University Computing Services, 13 Banbury Road, Oxford, OX2 6NN. Leave adequate time to walk there from Merton College.
3.4.7 14:00 - 16:00: Workshops – Practical
• An Introduction to XML and the Text Encoding Initiative – Evenlode Room, OUCS
• Working with TEI Texts (Advanced) – Cherwell Room, OUCS
• An Introduction to Digital Humanities Tools and Approaches: "Exploring and Extracting Information from Images", Segolene Tarte (OeRC) – Isis Room, OUCS
• A Humanities Web of Data: Publishing, Linking, Querying and Visualisation on the Semantic Web – Windrush Room, OUCS
3.4.8 16:00 - 16:30: Tea Break
The Tea Break and Parallel Sessions today, for a change, will be held at OUCS.
3.4.9 16:30 - 17:30: Parallel Sessions
In OUCS for a change:
• Parallel Session 7: Impact as a process: Understanding and enhancing the reach of digital resources, Eric Meyer (OII) and Kathryn Eccles (OII) – Evenlode Room, OUCS
• Parallel Session 8: Discoverability, Accessibility, and Machine-Readability, Joseph Talbot (OUCS) – Isis Room, OUCS
3.4.10 19:00 - : Drinks Reception
A free reception with drinks and nibbles (included in registration charge) will take place from 19:00 at the Oxford University Museum of Natural History.
3.5 Friday 6 July 2012
3.5.1 09:30 - 10:00: Surgery D (Optional)
Surgery D is Making funding proposals for digital projects, Martin Wynne (OUCS and OeRC) – Sir Howard Stringer Room, Merton College.
3.5.2 10:00 - 11:00: Plenary Lecture
Plenary Lecture: Making the Digital Human: Anxieties, Possibilities, and Challenges, Andrew Prescott (King’s College London) – TS Eliot Lecture Theatre, Merton College.
3.5.3 11:00 - 11:30: Tea Break
Tea Break will take place in the foyer of the TS Eliot Lecture Theatre at Merton College.
3.5.4 11:30 - 12:30: Workshops – Introductory Lectures
• An Introduction to XML and the Text Encoding Initiative – David Harvey Room, Merton College
• Working with TEI Texts (Advanced) – Ian Taylor Room, Merton College
• An Introduction to Digital Humanities Tools and Approaches: "Don’t Waste Space: How GIS can Aid Digital Humanities Research", Chris Green (Archaeology) – TS Eliot Lecture Theatre, Merton College
• A Humanities Web of Data: Publishing, Linking, Querying and Visualisation on the Semantic Web – Sir Howard Stringer Room, Merton College
3.5.5 12:30 - 13:30: Lunch
Lunch will be a buffet in Merton College Hall.
3.5.6 13:30 - 14:00: Travel Time to OUCS
The computer-based practical aspects of the workshops will take place in the Thames Suite of the Oxford University Computing Services, 13 Banbury Road, Oxford, OX2 6NN. Leave adequate time to walk there from Merton College.
3.5.7 14:00 - 16:00: Workshops – Practical
• An Introduction to XML and the Text Encoding Initiative – Evenlode Room, OUCS
• Working with TEI Texts (Advanced) – Cherwell Room, OUCS
• An Introduction to Digital Humanities Tools and Approaches: "Spatial Awareness: A Brief Introduction to ArcGIS", Chris Green (Archaeology) – Isis Room, OUCS
• A Humanities Web of Data: Publishing, Linking, Querying and Visualisation on the Semantic Web – Windrush Room, OUCS
3.5.8 16:00 - 16:30: Tea Break
The Tea Break and Parallel Sessions will be held at the Oxford e-Research Centre, 7 Keble Road, Oxford, OX1 3QG. Tea Break will be in the OeRC Atrium.
3.5.9 16:30 - 17:30: Parallel Sessions
• Parallel Session 9: Digital Library Technologies and Best Practice, Neil Jefferies (Bodleian) and Christine Madsen (Bodleian) – OeRC Conference Room
• Parallel Session 10: Panel: Running Digital Humanities Summer Schools, James Cummings (OUCS), Sebastian Rahtz (OUCS), Ray Siemens (University of Victoria), Erin Snyder (OeRC), John Pybus (OeRC) – OeRC Lecture Theatre B
4 Workshop Abstracts
4.1 An Introduction to XML and the Text Encoding Initiative
This introductory workshop will balance lectures with hands-on practical sessions to introduce the recommendations of the Text Encoding Initiative (TEI) for encoding of digital text. The workshop combines in-depth coverage of the latest version of the TEI P5 Guidelines for the encoding of digital text with practical exercises to reinforce the topics covered. It provides an introduction to mark-up, explanations of various aspects of the TEI Guidelines and approaches to publishing TEI texts. Major aspects surveyed will include: basic TEI elements, metadata, names of people and places, manuscript transcription and description, linguistic analysis, and customisation of the TEI. Numerous practical exercises give you hands-on experience of a wide range of TEI editing, customisation, and publication.
Tutors: James Cummings, Renée Baalen, Ylva Berglund-Prytz
4.2 Working with TEI Texts (Advanced)
This advanced workshop will teach how to do something practical with your TEI XML texts beyond simply converting them to HTML and putting them on the web. A mixture of talks and practical exercises will take participants through:
• Advanced validation and integrity checking using TEI ODD, Schematron and XSLT
• Transforming your TEI XML to formats other than HTML (Word, ePub, LaTeX etc)
• Extracting data from TEI texts for further analysis (eg names and places)
• Processing some more complex TEI documents (eg genetic encoding and timelines)
• Storing TEI documents in an XML database and querying them
Requirements: You must already have a good basic knowledge of XML and TEI, and some familiarity with programming/scripting ideas. Most of the work will be based on XSLT and XPath.
Tutors: Sebastian Rahtz, Raffaele Viglianti
4.3 An Introduction to Digital Humanities Tools and Approaches
This workshop will introduce key research areas in the digital humanities, including language tools, text mining, image analysis, and use of geo-spatial data. The lecture sessions will emphasize the research potential of each area, discuss the theoretical implications of modelling data through these methods, and provide guidance about how these techniques are most usefully adapted to humanities research. The workshops will focus on actively addressing research questions, providing datasets and guidance on how to begin to conduct research with these tools. The course is conceived as a wide-ranging introduction to some of the most exciting areas in digital humanities research, and will enable its participants to quickly become familiar with the possibilities and processes of conducting research in these areas.
• Monday – Martin Wynne:
  Lecture: Corpus and Text Analysis for Research in the Humanities
  Workshop: Dealing with the Data Deluge: Corpus Linguistics for Text-Based Research
• Tuesday – Glenn Roe:
  Lecture: The Dangers and Delights of Data Mining
  Workshop: A Practical Introduction to Text Mining
• Wednesday – Lou Burnard:
  Lecture: Introduction to Markup
  Workshop: TEI a la Carte
• Thursday – Segolene Tarte:
  Lecture: Working with Digital Images
  Workshop: Exploring and Extracting Information from Images
• Friday – Chris Green:
  Lecture: Don’t Waste Space: How GIS can Aid Digital Humanities Research
  Workshop: Spatial Awareness: A Brief Introduction to ArcGIS
Tutors: Erin Snyder, Christopher Green, Glenn Roe, Segolene Tarte, Martin Wynne
4.4 A Humanities Web of Data: Publishing, Linking, Querying and Visualisation on the Semantic Web
This workshop will introduce the Semantic Web and show how to publish your data so that it is available as Linked Open Data within the web of data. Topics covered will include: the RDF format; modelling your data and publishing to the web; querying RDF data using SPARQL; choosing and designing vocabularies and ontologies, and more.
Tutors: John Pybus, Alexander Dutton, Kevin Page
5 Workshop: An Introduction to XML and the Text Encoding Initiative
5.1 Timetable

| Time | Monday | Tuesday | Wednesday | Thursday | Friday |
|------|--------|---------|-----------|----------|--------|
| Morning (1hr) | XML and TEI [JC] | TEI Metadata [JC] | MS Description [JC] | Spoken Texts [YB] | Customising the TEI [RB] |
| Practical 1 | Create an XML Document | Improving a TEI Header | Adding a MS Description | Transcribing Speech | Customise the TEI with Roma |
| Talk (1hr) | TEI Core Module [JC] | Names, People, Places [RB] | Transcription, Facsimile and Genetic Editing [JC] | Linguistic Analysis and Tools [YB] | Talk: Transforming the TEI [JC] |
| Practical 2 | Create a TEI Document | Marking up Names and People | Transcription Exercise | Linguistic Analysis | OxGarage and the TEI Community |

5.2 Exercise 1: Create an XML Document
5.2.1 Learning Outcomes
When you successfully complete this exercise you should be able to:
• mark up an XML declaration
• insert a text file into an XML editor
• mark up basic features of a poem
• create a well-formed XML document
5.2.2 Summary
This exercise will walk you through creating an XML document in the oXygen editor and introduce a variety of ways to mark this document up. You will first start a new document, then insert some plain, unmarked text into the editor, and then mark up the stanzas or line-groups (lg) and lines (l). You will also learn to check whether your document is well-formed.
5.2.3 Starting A New XML File
Let’s start a new XML file by following these steps:
• Load up the oXygen XML Editor if it isn’t already loaded by using the Windows Start Menu, or double-clicking the icon on the desktop.
• Once the editor has fully loaded, from the ’File’ menu select ’New’ and under ’New Document’ select ’XML Document’. This should open up a blank document with an XML Declaration added.
• An XML Declaration looks like:
<?xml version="1.0" encoding="UTF-8"?>
The XML declaration tells anything processing your XML file, including the editor, that this is an XML file and, through the @version attribute, what version of XML you are using. The @encoding attribute conveys which character encoding the program should expect. XML version 1.0 has been a W3C Recommendation since 1998; its current (fifth) edition dates from 2008. UTF-8 (Universal Character Set Transformation Format - 8 bit) is an encoding of Unicode, which contains most characters from all human writing systems. The XML declaration needs no closing tag as it takes the form of a special processing-instruction that starts and ends with an angle-bracket and a question mark.
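As a small illustration of the points above, here is a sketch of a complete, if trivial, XML file. The <greeting> element is a made-up name used only for this example (it is not a TEI element). When the declaration is present, it must be the very first thing in the file.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- The declaration above has no closing tag; everything else sits inside one root element. -->
<!-- <greeting> is a hypothetical element name used only for illustration. -->
<greeting>Hello, XML!</greeting>
```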
5.2.4 Creating a Division
Let’s create a division of a text using the <div> element. This is a generic division or section element.
• On the line below the XML declaration type: <div>.
• Notice what happens when you type the final ’>’. oXygen is trying to help you and inserts the closing </div> tag. This is because it knows the rules of XML, and knows that if you type an opening <div> you are required to have a closing </div> sooner or later.
• We haven’t said what type of division this is, so let’s categorise it as ’verse’ by adding a @type attribute. Move the cursor back until you’re just after the letter ’v’ in the opening tag. Press space, and then type: type=" and notice what happens when you type the quotation mark. oXygen is again trying to help you by putting in the closing quotation mark, because it knows that attribute values must always be quoted.
• In between the quotation marks type ’verse’ to categorise our division as being verse.
• Move back until you are directly in between the opening <div> and closing </div>. Press ’enter’ a couple of times to give yourself some space inside the element.
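If the steps above have worked, the editor should now contain something like the following sketch (the exact number of blank lines does not matter):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<div type="verse">

</div>
```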
5.2.5 Inserting Some Text
We are going to use the Wilfred Owen poem Strange Meeting as an example for this exercise. But it would waste a lot of time if we asked you to type the whole poem in, so we’ve done that for you.
• Make sure your cursor is in between the opening <div> and the closing </div>, then go to the ’Document’ menu, select ’File’ and from there ’Insert File’. Note: This is from the ’Document’ menu on the menu bar, not the ’File’ one.
• Select ’strange-meeting.txt’ as the file to insert.
• The start of your document should look like:
<?xml version="1.0" encoding="UTF-8"?>
<div type="verse">
STRANGE MEETING

It seemed that out of battle I escaped
Down some profound dull tunnel, long since scooped
Through granites which titanic wars had groined.

[...a lot more text...]
</div>
5.2.6 Encoding the Heading (using ’Surround with Tags’)
The text ’STRANGE MEETING’ at the top of the poem is obviously a heading. The TEI <head> element should be used to mark this. To do so:
• Highlight the text ’STRANGE MEETING’ with the mouse.
• Either press control-e as a shortcut key, or right-click and under ’Refactoring’ select ’Surround with Tags’. A box should pop up; type head into it. Notice how oXygen helps you again by putting the opening tag before what you had highlighted and the closing tag afterwards.
5.2.7 Marking Stanzas (using both ’Surround with Tags’ and ’Split Element’)
Let’s mark the stanzas by doing the following steps:
• Highlight the first stanza, from "It seemed" to "had groined".
• Using control-e, or the menus, as you did above, mark this stanza as an <lg> element.
• Add a @type attribute with a value of ’stanza’ to the <lg> element so it looks like: <lg type="stanza">.
• The start of your document should now look like:
<?xml version="1.0" encoding="UTF-8"?>
<div type="verse">
<head>STRANGE MEETING</head>
<lg type="stanza"> It seemed that out of battle I escaped
Down some profound dull tunnel, long since scooped
Through granites which titanic wars had groined.
</lg>

[...a lot more text...]

</div>
• If we have lots of stanzas, marking each one of them seems a lot of work, but there is a (possibly) easier way.
• Highlight the entire rest of the poem, from "Yet also there" to "Let us sleep now....’", and then surround all of it in an <lg> element (by pressing control-e).
• Of course it is silly to have the entire rest of the poem marked as a single line-group, but go and add a type="stanza" attribute to the opening tag.
• If you move the cursor to just before the start of each stanza, e.g. just before where it says "With a thousand pains", and press alt-shift-d (or select Refactoring -> Split Element from the right-click menu), oXygen should split the <lg> element, ending it here and starting a new one just before where the cursor is located.
• Do this for other stanzas that are not marked yet.
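As a rough sketch of what ’Split Element’ is expected to do, assuming the cursor is placed just before "With a thousand pains": the single <lg> is closed before the cursor and a new <lg> is opened at the cursor, which is why the type attribute was added first. The bracketed ellipses stand in for the rest of each stanza, and the comments are only there for illustration; the exact whitespace oXygen produces may differ.

```xml
<!-- Before: one line-group wrongly containing two stanzas -->
<lg type="stanza">
Yet also there [...]
With a thousand pains [...]
</lg>

<!-- After splitting just before "With a thousand pains" -->
<lg type="stanza">
Yet also there [...]
</lg><lg type="stanza">
With a thousand pains [...]
</lg>
```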
5.2.8 Marking Lines
We’ve marked all the stanzas but we’ve not marked the lines.
• Highlight the first line in the first stanza, press control-e to surround with a tag, and type ’l’ as the element name. (<l> is the line element, meaning a line of metrical verse).
• It might be a bit painful to mark up each and every line this way. You could try using the split-element technique above, but there is another shortcut to try as well. Highlight the second line and press control-/ and notice that oXygen has wrapped the line in an <l> element. The reason for this is that control-/ is the ’surround with the last element I surrounded something with’ shortcut key.
• Using this technique, quickly mark all the remaining lines.
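Once the lines are marked, the first stanza should look something like this fragment (shown here with tidied whitespace; your own indentation may differ until you format the file):

```xml
<lg type="stanza">
  <l>It seemed that out of battle I escaped</l>
  <l>Down some profound dull tunnel, long since scooped</l>
  <l>Through granites which titanic wars had groined.</l>
</lg>
```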
5.2.9 Format and Indent
Our poem is marked up, but some of the markup might be a bit messy.
• Make sure that your file is ’well-formed’. You’ll be able to tell it is well-formed because oXygen will have a happy green square in the upper right-hand corner. If it is red, you had better find the problem (where a red bar on the right-hand side is) and correct the mistake!
• Now let’s format and indent our file. This tidies up some of the whitespace and indents elements based on their place in the hierarchy. Either select the ’Format and Indent’ icon from the toolbar (it looks like some indented lines), or go to the menus: ’Document’ -> ’Source’ -> ’Format and Indent’.
• Formatting and indenting your markup is not necessary; it could all be on one big long line, but it makes it much easier for other people to read.
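To see why this matters, compare the two fragments below, which contain exactly the same elements. The first shows how markup can look when it is all on one line; the second is roughly what ’Format and Indent’ should produce, though the exact indentation depends on your oXygen settings.

```xml
<!-- All on one line: well-formed, but hard to read -->
<div type="verse"><head>STRANGE MEETING</head><lg type="stanza"><l>It seemed that out of battle I escaped</l></lg></div>

<!-- Formatted and indented: the hierarchy is visible at a glance -->
<div type="verse">
  <head>STRANGE MEETING</head>
  <lg type="stanza">
    <l>It seemed that out of battle I escaped</l>
  </lg>
</div>
```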
5.2.10 Saving Your Work
Let’s save our work:
• Is your work well-formed? Do you have a happy green square or an angry red one?
• From the ’File’ menu select ’Save’ or click on the Save icon (looks like an old-style 3.5" disk).
• Save the file using the name ’exercise01.xml’ or another name of your choice.
5.2.11 Self-Assessment
Check if you understand some of the core principles of this exercise by answering the following questions to yourself:
• How do you start a new XML document in oXygen?
• What is an XML declaration?
• What is a well-formed document?
• How do I ’Surround with tag’ and repeat that action quickly?
• Why might using the ’Split element’ approach be useful?
• What is the function of each element and attribute in your current file?
• What is the advantage of formatting and indenting your markup?
5.2.12 Next?
Your XML file may be well-formed but it is not yet valid, because it has not been validated against a particular schema (such as those which are customisations of the TEI). Next we will have a short introduction to the structure of TEI documents and some of the most frequently used elements. If you are finished early you may wish to browse through the TEI Guidelines online at http://www.tei-c.org/release/doc/tei-p5-doc/en/html/index-toc.html. In particular you might want to look at the Elements appendix of reference pages for individual elements. Consider looking up all the elements you’ve used in this file to see how they are defined.
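The difference between the two ideas can be sketched as follows. Well-formedness is about XML syntax alone; validity additionally depends on a schema. The fragments are illustrative only, and the broken one is kept inside a comment so that the example itself remains well-formed.

```xml
<!-- NOT well-formed: the tags overlap instead of nesting.
     <lg type="stanza"><l>It seemed that out of battle I escaped</lg></l> -->

<!-- Well-formed: every opening tag has a matching closing tag, properly nested.
     Whether it is also valid can only be decided by checking it against a schema. -->
<lg type="stanza">
  <l>It seemed that out of battle I escaped</l>
</lg>
```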
5.3 Exercise 2: Create a TEI Document
5.3.1 Learning Outcomes
When you successfully complete this exercise you should be able to:
• discern the elements and attributes needed for a minimum valid TEI XML file
• associate a TEI XML file with a schema
• use the TEI namespace
• create a minimum TEI header and text body
• check for both validity and well-formedness
5.3.2 Summary
This exercise will walk you through creating a TEI XML file and inserting the work you did previously into it. You’ll learn about the required aspects of the <teiHeader> and the basic structure of a TEI file.
5.3.3 Start a New XML File
Follow the same steps you did for the first exercise to start a new blank XML file. Although we could start a file with a TEI P5 template, for this particular exercise that would be cheating!
• Load up the oXygen XML Editor if it isn’t already loaded by using the Windows Start Menu, or double-clicking the icon on the desktop.
• Once the editor has fully loaded, from the ’File’ menu select ’New’ and under ’New Document’ select ’XML Document’. This should open up a blank file with an XML Declaration added.
• An XML Declaration looks like:
<?xml version="1.0" encoding="UTF-8"?>
5.3.4 Inserting a <TEI> Element
All TEI files start either with a <TEI> element or a <teiCorpus> element. In most cases you’ll want a <TEI> element. These elements have a special pseudo-attribute called ’xmlns’ that indicates the namespace a set of elements is from. This is inherited by any elements inside it (unless overridden). This is how we can be sure we’re talking about, say, a <title> element from the TEI rather than from any other schema.
• Add a <TEI> element and then add it to the TEI namespace (http://www.tei-c.org/ns/1.0). You may want to add a few blank lines between the start and end tags. Your file should look like:
<?xml version="1.0" encoding="UTF-8"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0">

</TEI>
• Notice what happens in oXygen and how it helps you input this. Also notice that your file may now have an angry red square rather than a happy green one! Is your file well-formed? (Yes, it is!) Why is this red then?
• If it is red it is because your version of oXygen is prepackaged with all sorts of TEI goodness, and in this case it recognises that files starting with <TEI> in the TEI namespace are to be associated automatically with a TEI schema that it has stored. It is complaining that you do not have a <teiHeader> in your file because all valid TEI files must have this.
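To illustrate how a namespace is inherited and overridden, here is a minimal sketch. It is well-formed XML but is not meant to validate against a TEI schema; the 'ex' prefix and the 'http://www.example.com/ns' namespace are hypothetical and exist only for this illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <!-- No prefix: this <title> inherits the TEI namespace declared on <TEI> -->
  <title>A title in the TEI namespace</title>
  <!-- Hypothetical prefix and namespace: this <ex:title> is NOT a TEI element -->
  <ex:title xmlns:ex="http://www.example.com/ns">A title from another vocabulary</ex:title>
</TEI>
```

Both elements are called 'title', but a namespace-aware processor treats them as different elements because their namespaces differ.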
5.3.5 Adding a <teiHeader>
Inside the <TEI> element we need to add a <teiHeader> element.
• Put the cursor between the starting and closing <TEI> tags and type in a <teiHeader> element. Notice that oXygen provides the closing </teiHeader> tag. If the correct option is set in oXygen, it understands the TEI schema and knows that certain content is required inside a <teiHeader>. It can automatically provide that markup. If not, you’ll have to type it in. The resulting file should look like:
<?xml version="1.0" encoding="UTF-8"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title> </title>
      </titleStmt>
      <publicationStmt/>
      <sourceDesc/>
    </fileDesc>
  </teiHeader>
</TEI>
• Notice that your file still has an angry red square rather than a happy green square. This is because more content is still needed even though you’ve added some markup. First, add a title of something like "My ’Strange Meeting’ document" by adding this text between the starting and closing <title> tags. There are other elements which are allowed here in <titleStmt>, such as <author> (Wilfred Owen) and <editor> (Jon Stallworthy), that you could add but that aren’t really required for this exercise. You could use the more general <respStmt> (with a <name> element with your name and a <resp> element with something like ’TEI P5 Encoding’ in it) to record your own work if you wish, but as with the other embellishments this isn’t necessary for this exercise.
• Then add a paragraph <p> inside the <publicationStmt> with some text to record what this file is for, perhaps something like "An exercise for learning TEI."
• Inside <sourceDesc> we should add a <p> with some text like: "The primary resource of this file is Strange Meeting from Jon Stallworthy’s edition, available on the First World War Poetry Digital Archive." To make this even better, we might surround the title ’Strange Meeting’ with a <ref> element with a @target attribute with a value of ’http://www.oucs.ox.ac.uk/ww1lit/collections/item/3350’ because that is the URL from which we got this text.
• Your <teiHeader> should now look something like:
<teiHeader>
  <fileDesc>
    <titleStmt>
      <title>My ’Strange Meeting’ document</title>
    </titleStmt>
    <publicationStmt>
      <p>An exercise for learning TEI.</p>
    </publicationStmt>
    <sourceDesc>
      <p>The primary resource of this file is <ref
          target="http://www.oucs.ox.ac.uk/ww1lit/collections/item/3350">Strange Meeting</ref>
        from Jon Stallworthy’s edition, available on the First World War Poetry Digital
        Archive.</p>
    </sourceDesc>
  </fileDesc>
</teiHeader>
• Notice that even though this is a complete <teiHeader> with all the required aspects, our file as a whole isn’t valid.
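Returning to the optional additions to the <titleStmt> described earlier in this section, a sketch of what they might look like is given here. None of this is required for the exercise, and the name placeholder is an assumption to be replaced with your own details.

```xml
<titleStmt>
  <title>My 'Strange Meeting' document</title>
  <author>Wilfred Owen</author>
  <editor>Jon Stallworthy</editor>
  <!-- Optional: records who did the encoding work -->
  <respStmt>
    <resp>TEI P5 Encoding</resp>
    <name>[Your name here]</name>
  </respStmt>
</titleStmt>
```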
5.3.6 Add a <text>
All TEI files, in addition to a <teiHeader> with <fileDesc> containing a <titleStmt>, <publicationStmt>, and <sourceDesc>, need to follow the header with at least one of: <sourceDoc>, <facsimile>, or <text>. In our case we’re going to add a <text> element. To do this:
• Add a couple of blank lines after the closing </teiHeader>.
• Insert a <text> element and inside that a <body> element. (The <text> element requires a <body> element because if you don’t have a text body, what are you encoding?)
• The <text> section of the file should look something like:
<text>
  <body>
    <!-- We will put our poem here -->
  </body>
</text>
5.3.7 Adding Our Poem
This is a good start but we need to put something inside the body. Luckily, we have already encoded a poem in the previous exercise, so we can use that!
• With the cursor in between the opening and closing <body> tags go to the ’Document’ menu on the menu bar, and select ’File’, and ’Insert File’. Select the file you saved earlier if you finished the first exercise. If you didn’t then in the spoilers directory there is a file called ’ex01.xml’ which has the completed first exercise.
• But wait, as soon as you’ve added this we get a bit of a problem! oXygen will complain that we’ve got an XML declaration in the middle of our file. Delete this redundant XML declaration!
• Your document should now be valid and have a happy green square in the upper right-hand corner! If it isn’t, try to solve the problem by looking at the error message that is provided.
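As a point of comparison, the overall shape of the finished file might be sketched as follows. This assumes the poem was encoded with <lg> and <l> elements in the previous exercise; your own markup may differ, and only the first two lines of the poem are shown.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>My 'Strange Meeting' document</title>
      </titleStmt>
      <publicationStmt>
        <p>An exercise for learning TEI.</p>
      </publicationStmt>
      <sourceDesc>
        <p>The primary resource of this file is <ref
            target="http://www.oucs.ox.ac.uk/ww1lit/collections/item/3350">Strange Meeting</ref>
          from Jon Stallworthy's edition, available on the First World War Poetry Digital
          Archive.</p>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <!-- Assumed encoding of the poem; the remaining lines are omitted here -->
      <lg>
        <l>It seemed that out of battle I escaped</l>
        <l>Down some profound dull tunnel, long since scooped</l>
      </lg>
    </body>
  </text>
</TEI>
```

Note that there is only one XML declaration, and that it is the very first thing in the file.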
5.3.8 Saving Your Work
Let’s save our work:
• Have you formatted and indented your work automatically?
• Is your work well-formed? Do you have a happy green square or an angry red one?
• From the ’File’ menu select ’Save’ or click on the Save icon (looks like an old-style 3.5" disk)
• Save the file using the name ’exercise02.xml’ or another name of your choice.
5.3.9 Self-Assessment
Check if you understand some of the core principles of this exercise by answering the following questions:
• Which elements and attributes do you need for a minimum valid TEI XML document?
• What three parts of the <teiHeader> are required in all TEI conformant documents?
• Where are these elements and attributes allowed?
• What is the function of each element and attribute you’ve used?
• Why do you think these elements and attributes are required in TEI XML?
5.3.10 Next and More Reading
This exercise and the previous one should have given you some experience editing XML and making a valid TEI file. Next time we’ll get a more in-depth introduction to various other TEI modules and learn more about the <teiHeader>.
• If you are finished early you may wish to browse through the TEI Guidelines online at http://www.tei-c.org/release/doc/tei-p5-doc/en/html/index-toc.html.
• In particular you might want to look at the Elements appendix of reference pages for individual elements. Consider looking up all the elements you’ve used in this file to see how they are defined.
• What other elements are allowed inside the <text> element? What would you use them for?
• What other parts of the <teiHeader> are there? What are they for?
• You may wish to read the chapters on Default Text Structure http://www.tei-c.org/release/doc/tei-p5-doc/en/html/DS.html or Elements Available to All TEI Documents http://www.tei-c.org/release/doc/tei-p5-doc/en/html/CO.html.
5.4 Exercise 3: Improving a <teiHeader>
5.4.1 Learning Outcomes
When you successfully complete this exercise you should be able to:
• read through and analyse encoding in an existing TEI file.
• improve the structure and metadata of a <teiHeader>.
• understand the components of a <fileDesc> including:
– <titleStmt> for title and intellectual responsibility.
– <publicationStmt> for information about the publication and distribution of the electronic item.
– <sourceDesc> to record metadata about the source document.
• use the <encodingDesc> to record the markup used in the file.
• use the <profileDesc> to record non-bibliographic aspects of the file.
• record major changes to the file in the <revisionDesc>.
5.4.2 Summary
This exercise gives you a chance to read through a TEI XML file you have not encoded and understand its markup and structure. It walks you through improvements to various aspects of the <teiHeader> and how to record additional metadata about the electronic file and its sources.
5.4.3 Starting Up
In this case we’re starting with a sample file that we have created for you. Load up the file called ’letter-to-LG.xml’ in the oXygen XML editor. Check that the file is well-formed and valid. Note the line near the top of the file:
<?xml-model href="http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng"
  schematypens="http://relaxng.org/ns/structure/1.0"?>
This is what tells the oXygen editor that it should be validating this file with the tei_all schema.
5.4.4 Reading through the file
This file contains a letter from Wilfred Owen to Leslie Gunston. It talks about a forthcoming address to the Field Club, and contains a partial draft of ’The Wrestlers’. It was written in July 1917 from Craiglockhart War Hospital, Edinburgh, Scotland. Images of this letter are available in your DHOXSS booklet as well as in the materials we’ve provided.
• Note the very minimal <teiHeader>.
• Look at the structure of the document as three divisions and make sure you understand these divisions.
• Note the use of the <dateline> element.
• See how the encoder has recorded line-breaks in the prose.
• What other elements has the encoder included? Make sure you understand the meaning of them. If you are unsure of the meaning of them, look them up on the TEI-C website at: http://www.tei-c.org/release/doc/tei-p5-doc/en/html/REF-ELEMENTS.html
5.4.5 Improving the <titleStmt>
As you can see the <teiHeader> is lacking a lot of information. Let’s improve it!
• Inside the <fileDesc> the <titleStmt> contains only a <title>. What else can <titleStmt> contain? (Hint: typing ’<’ here will provoke oXygen into providing a drop-down list of possibilities.)
• Underneath the <title> add an <author> element. The content of this should be ’Wilfred Owen’.
• Below this add an <editor> element with the content of ’Renée van Baalen’. (She transcribed the letter for our teaching purposes.) How does one type in ’é’ in oXygen? Hint: the ’Edit’ menu contains an ’Insert from Character Map’ entry.
• After this add a <principal> element to record the person primarily responsible for the project. In this case, use your own name.
• Below this add a <meeting> element with the content of ’Digital Humanities at Oxford Summer School 2012’.
• After that add a <respStmt> with a <resp> inside it saying ’Improved encoding’ and a <name> with your name.
• Your <titleStmt> should now look something like:
<titleStmt>
  <title>Letter to Leslie Gunston</title>
  <author>Wilfred Owen</author>
  <editor>Renée van Baalen</editor>
  <principal>[Your name here]</principal>
  <meeting>Digital Humanities at Oxford Summer School 2012</meeting>
  <respStmt>
    <resp>Improved encoding</resp>
    <name>[Your name here]</name>
  </respStmt>
</titleStmt>
If you do not understand what any of these elements are for, make sure to look them up on the TEI-C website at the URL given above.
5.4.6 Improving the <publicationStmt>
The <publicationStmt> is also fairly limited. It could contain a lot of structured information, but just has a paragraph of prose. Let’s replace it!
• Delete the entire paragraph including the starting and ending <p> tags.
• Inside <publicationStmt> add a <publisher> element. In this case, ’TEI @ Oxford’ is the publisher.
• Below the <publisher> add a <distributor> containing ’Digital Humanities at Oxford Summer School 2012’.
• After this add an <authority> element, to detail under whose authority it is published. In this case let’s say it is under your authority, so add your name.
• Next, add a <pubPlace> element containing an <address> element with an <orgName> (Oxford University Computing Services), a <street> address (13 Banbury Road), a <settlement> (Oxford), a <postCode> (OX2 6NN), and a <country> (United Kingdom).
• After the <pubPlace> element but still inside the <publicationStmt> add a <date> element with content of ’3 July 2012’. The <date> element can have a @when attribute to take a standardised YYYY-MM-DD form of the date; add this as well.
• Add an ID number after this using <idno>. This should be something like a catalogue number, or a URL at which this document will reside. In this case, make up what you think a sensible ID number would be for your edition of this letter.
• Next add an <availability> statement with a <p> containing a description of the licence you would want to distribute this under. We recommend you choose a Creative Commons licence using http://creativecommons.org/choose/. For bonus points you can include a link to the licence you chose (using <ref> with a @target attribute).
• Your <publicationStmt> should now look something like:
<publicationStmt>
  <publisher>TEI @ Oxford</publisher>
  <distributor>Digital Humanities at Oxford Summer School 2012</distributor>
  <authority>[Your name here]</authority>
  <pubPlace>
    <address>
      <orgName>Oxford University Computing Services</orgName>
      <street>13 Banbury Road</street>
      <settlement>Oxford</settlement>
      <postCode>OX2 6NN</postCode>
      <country>United Kingdom</country>
    </address>
  </pubPlace>
  <date when="2012-07-03">3 July 2012</date>
  <idno>[Insert an ID number here]</idno>
  <availability>
    <p>Licensed with a <ref
        target="http://creativecommons.org/licenses/by/3.0/">Creative Commons Attribution</ref>
      licence.</p>
  </availability>
</publicationStmt>
5.4.7 Improving the <sourceDesc>
Our <sourceDesc> is also fairly limited.
• Delete the entire paragraph that is currently in the <sourceDesc> and replace it with a <biblStruct>.
• The <biblStruct> should have an <analytic> with a <title> (Letter to Leslie Gunston), and <author> (Wilfred Owen).
• The <biblStruct> should also have a <monogr> for the collection containing:
– <title> (The Wilfred Owen Collection).
– A <ref> (First World War Poetry Digital Archive) containing a @target attribute pointing to ’http://www.oucs.ox.ac.uk/ww1lit/collections/document/5243’.
– An <imprint> element containing a <publisher> (The First World War Poetry Digital Archive), a <pubPlace> (Oxford), and a <biblScope> (Two pages) with a @type attribute of ’pp’, and a @n attribute of ’2’.
• Outside the <monogr> but inside the <biblStruct> add a <relatedItem> with a <bibl> containing ’The source of this digital resource is a copy from the Harry Ransom Centre.’ You could also wrap ’Harry Ransom Centre’ in a <distributor> element. This is an example of a much less structured bibliographic citation inside a structured one.
• Your <sourceDesc> should now look something like:
<sourceDesc>
  <biblStruct>
    <analytic>
      <title>Letter to Leslie Gunston</title>
      <author>Wilfred Owen</author>
    </analytic>
    <monogr>
      <title>The Wilfred Owen Collection</title>
      <ref
        target="http://www.oucs.ox.ac.uk/ww1lit/collections/document/5243/4769">First World War Poetry Digital Archive</ref>
      <imprint>
        <publisher>The First World War Poetry Digital Archive</publisher>
        <pubPlace>Oxford</pubPlace>
        <biblScope type="pp" n="2">Two pages</biblScope>
      </imprint>
    </monogr>
    <relatedItem>
      <bibl>The source of this digital resource is a copy from the
        <distributor>Harry Ransom Centre</distributor>.</bibl>
    </relatedItem>
  </biblStruct>
</sourceDesc>
5.4.8 Other components of the <fileDesc>
There are other elements that could appear in your <fileDesc>.
• Immediately after the closing </titleStmt> tag you could add an <editionStmt> with an <edition> containing a descriptive phrase such as ’First Edition’ for the current edition of the electronic file.
• Immediately after the closing </editionStmt> you could add an <extent> element with some measure of the size of the text (e.g. ’260 words’).
• Immediately after the closing </publicationStmt> you could add a <notesStmt> with one or more <note> elements inside it. One could contain something saying ’Transcribed for DHOXSS TEI Workshop’.
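The order of these components inside the <fileDesc> matters for validity. As a rough sketch, with the statements you have already written abbreviated to comments and placeholder paragraphs (the '...' content is an assumption standing in for your real markup), the overall sequence might look like:

```xml
<fileDesc>
  <titleStmt>
    <!-- author, editor, respStmt etc. as before -->
    <title>Letter to Leslie Gunston</title>
  </titleStmt>
  <editionStmt>
    <edition>First Edition</edition>
  </editionStmt>
  <extent>260 words</extent>
  <publicationStmt>
    <!-- placeholder for the publisher, distributor etc. added earlier -->
    <p>...</p>
  </publicationStmt>
  <notesStmt>
    <note>Transcribed for DHOXSS TEI Workshop</note>
  </notesStmt>
  <sourceDesc>
    <!-- placeholder for the biblStruct added earlier -->
    <p>...</p>
  </sourceDesc>
</fileDesc>
```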
5.4.9 Adding an <encodingDesc>
An <encodingDesc> element will give us a place to document the encoding practices in the document.
• After the closing </fileDesc> we should add an <encodingDesc> element.
• Inside the <encodingDesc> add a <projectDesc> with a <p> inside it saying something like ’The TEI@Oxford project created teaching materials for DHOXSS’.
• Next inside the <encodingDesc> add an <editorialDecl> with a <correction> inside that with a paragraph saying something like ’Apparent errors have been marked as <sic> but correct readings not provided’. Mark up <sic> as an element by using <gi> (generic identifier).
• Also inside the <editorialDecl> add a <hyphenation> with a paragraph saying something like ’Hyphens have been transcribed as they appear’.
• Look at the other options available to you inside <editorialDecl> and <encodingDesc>.
• Your <encodingDesc> should look something like:
<encodingDesc>
  <projectDesc>
    <p>The TEI@Oxford project created teaching materials for DHOXSS.</p>
  </projectDesc>
  <editorialDecl>
    <correction>
      <p>Apparent errors have been marked as <gi>sic</gi> but correct readings
        not provided.</p>
    </correction>
    <hyphenation>
      <p>Hyphens have been transcribed as they appear.</p>
    </hyphenation>
  </editorialDecl>
</encodingDesc>
5.4.10 Adding a <profileDesc>
A <profileDesc> is a place to store various non-bibliographic information concerning the text.
• After the closing </encodingDesc> add a <profileDesc>.
• Inside this add a <creation> with a <placeName> (Craiglockhart) and a <date> (July 1917) perhaps with a @when attribute (’1917-07’).
• In the <profileDesc> next add a <handNotes> with a <handNote> inside it saying something like ’Written in Wilfred Owen’s hand’.
• Next, add a <langUsage> inside the <profileDesc> with a <language> inside (’English’) with an @ident attribute with a value of ’en’ for the English language code.
• Next add a <textClass> with a <classCode> with content of ’826’ and a @scheme attribute of "http://www.oclc.org/dewey/resources/summaries/default.htm". This is the Dewey classification code for ’English Letters’.
• Your <profileDesc> should now look something like:
<profileDesc>
  <creation>
    <placeName>Craiglockhart</placeName>
    <date when="1917-07">July 1917</date>
  </creation>
  <handNotes>
    <handNote>Written in Wilfred Owen’s hand</handNote>
  </handNotes>
  <langUsage>
    <language ident="en">English</language>
  </langUsage>
  <textClass>
    <classCode
      scheme="http://www.oclc.org/dewey/resources/summaries/default.htm">826</classCode>
  </textClass>
</profileDesc>
5.4.11 Adding a <revisionDesc>
A <revisionDesc> gives you a way to record major stages in revision to a document.
• After the closing </profileDesc> add a <revisionDesc> element.
• Add two <change> elements. On the first one add a @when attribute with today’s date. Inside the <change> add a <persName> containing your name, followed by the text ’improved the header’.
• In the second <change> add a @when attribute of ’2012-02’, with a <persName> of ’Renée van Baalen’ saying that she ’transcribed the Letter to Leslie Gunston document’. You may also wish to mark ’Letter to Leslie Gunston’ as a <title>.
• It is standard practice for the most recent <change> to be first.
• Your <revisionDesc> should now look something like:
<revisionDesc>
  <change when="2012-07-03"><persName>[Your name here]</persName> improved the header.</change>
  <change when="2012-02"><persName>Renée van Baalen</persName> transcribed the <title>Letter to
      Leslie Gunston</title> document.</change>
</revisionDesc>
5.4.12 Saving Your Work
Let’s save our work:
• Is your work well-formed? Do you have a happy green square or an angry red one?
• Have you formatted and indented your work automatically?
• From the ’File’ menu select ’Save’ or click on the Save icon (looks like an old-style 3.5" disk).
• Or if you prefer use the ’File’ then ’Save As’ menu item to save the file using the name ’exercise03.xml’ or another name of your choice.
5.4.13 Self-Assessment
Check if you understand some of the core principles of this exercise by answering the following questions:
• What kinds of metadata can you store in a <titleStmt>?
• What is a <publicationStmt> used for? What can it contain?
• How do you provide details of the source for the file?
• What is the difference between <bibl> and <biblStruct>?
• What is an <encodingDesc> for?
• What order should <change> elements be listed in a <revisionDesc>?
5.4.14 Next and More Reading
Next we’ll learn to relate information in the body of the text to aspects of the header. There is lots of information we could have put in our header which we didn’t.
• If you haven’t already, look up the main elements in the <teiHeader> on the TEI-C website and see what they are allowed to contain.
• You could also have a look at the TEI Guidelines chapter on the Header at http://www.tei-c.org/release/doc/tei-p5-doc/en/html/HD.html.
5.5 Exercise 4: Marking Up Names
5.5.1 Learning Outcomes
When you successfully complete this exercise you should be able to:
• encode personal, place, and organizational names
• store metadata concerning people, places, or organizations in the <teiHeader>
• link names in the document text to metadata stored in the header or another file
5.5.2 Summary
This exercise will give you practical experience in marking up names of people, places, and organizations. You’ll learn how to store richly structured metadata about these in the header, and how to link to them from the document.
5.5.3 Starting Up
Load up the completed file from the previous exercise. If you did not complete the exercise you can cheat by loading up ’spoilers/ex03.xml’ and saving it under a new name.
5.5.4 Marking Up Names
In addition to the general purpose <name> element which can take a @type attribute for classification, there are three types of names specifically catered for in the TEI. These are: organizational names (<orgName>), personal names (<persName>), and place names (<placeName>). Occasionally you might want to mark something like ’she’ which is not strictly a name but references an understood named entity. To do this we use a reference string or <rs> element.
• In the first <salute> mark up ’L.’ as a <persName>.
• In the first paragraph encode ’Field Club’ as an <orgName>, and ’Berlitz, Edin.’ as an <orgName> with a <placeName> inside it (’Edin.’).
• In the second paragraph mark up Antaeus, Heracles, Mother Earth, and ’old Herk.’ as <persName> elements.
• In the verse encode ’Earth’ as a <persName> (because it is used anthropomorphically here).
• In the final division mark up ’Locke’s’ and ’Swinburne’ as <persName> elements.
• Inside the <signed> element mark up ’WEO’ as a <persName>.
• There are more names we could mark up, such as the use of the names Leslie Gunston and Wilfred Owen throughout the header, but that is optional.
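A few of the steps above might be sketched like this. These are isolated fragments shown out of context, using only the strings named in the steps; the surrounding text of the letter is omitted, so this is not a complete or valid file on its own.

```xml
<!-- A personal name inside the salutation -->
<salute>Dear <persName>L.</persName></salute>

<!-- An organizational name with a place name nested inside it -->
<orgName>Berlitz, <placeName>Edin.</placeName></orgName>

<!-- The signature, marked as a personal name -->
<signed><persName>WEO</persName></signed>
```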
5.5.5 Making People
The names we find in documents are merely instances of names; they are not people, places, or organizations. Often we want to store canonical metadata about these many instances in our <teiHeader> and so we use the <person>, <place>, and <org> elements as containers for this metadata. We contain these in a <listPerson>, <listPlace>, or <listOrg> commonly (but not always) stored inside the <sourceDesc> of the header.
• Just before the closing </sourceDesc> add a <listPerson> element.
• Add a <person> element with an @xml:id attribute, and at least a <persName> inside as follows:
@xml:id   <persName>                   Other Info
LG        Leslie Gunston
herc      Heracles
earth     Mother Earth
ant       Antaeus
WL        William John Locke           Birth: Cunningsbury St. George, 20th March 1863; Death: Paris, 15th May 1930
AS        Algernon Charles Swinburne   Birth: London, 5th April 1837; Death: London, 10th April 1909
WO        Wilfred Edward Salter Owen   Birth: Oswestry, 18th March 1893; Death: Ors, 4th November 1918
For bonus points, perhaps mark the <forename>s and <surname> of real people inside the <persName> and also add <birth> and <death> elements for those with this information. These can have a @when attribute with a YYYY-MM-DD format of the date, and can also themselves contain <placeName> elements.
• Your <listPerson> might look something like:
<listPerson>
  <person xml:id="LG">
    <persName>
      <forename>Leslie</forename>
      <surname>Gunston</surname>
    </persName>
  </person>
  <person xml:id="herc">
    <persName>Heracles</persName>
  </person>
  <person xml:id="ant">
    <persName>Antaeus</persName>
  </person>
  <person xml:id="earth">
    <persName>Mother Earth</persName>
  </person>
  <person xml:id="WL">
    <persName>
      <forename>William</forename>
      <forename>John</forename>
      <surname>Locke</surname>
    </persName>
    <birth when="1863-03-20"><placeName ref="#Cun">Cunningsbury St. George</placeName>, 20th March 1863</birth>
    <death when="1930-05-15"><placeName ref="#Par">Paris</placeName>, 15th May 1930</death>
  </person>
  <person xml:id="AS">
    <persName>
      <forename>Algernon</forename>
      <forename>Charles</forename>
      <surname>Swinburne</surname>
    </persName>
    <birth when="1837-04-05"><placeName ref="#Lon">London</placeName>, 5th April 1837</birth>
    <death when="1909-04-10"><placeName ref="#Lon">London</placeName>, 10th April 1909</death>
  </person>
  <person xml:id="WO">
    <persName>
      <forename>Wilfred</forename>
      <forename>Edward</forename>
      <forename>Salter</forename>
      <surname>Owen</surname>
    </persName>
    <birth when="1893-03-18"><placeName ref="#Osw">Oswestry</placeName>, 18th March 1893</birth>
    <death when="1918-11-04"><placeName ref="#Ors">Ors</placeName>, 4th November 1918</death>
  </person>
</listPerson>
5.5.6 Building Places
We also refer to some places in our file, so let’s document those as well!
• After the closing </listPerson> create a <listPlace>.
• Add a <place> inside, with an @xml:id of ’edinburgh’, a <placeName> of ’Edinburgh’, a <region> of ’Scotland’, and a <country> of ’United Kingdom’.
• Inside this <place> add a nested <place> element with an @xml:id of ’craiglockhart’.
• Inside this nested <place> add a <placeName> of ’Craiglockhart War Hospital’ and as a sibling to this a <settlement> of ’Edinburgh’.
• Then add a <location> with a <geo> inside which contains the coordinates 55.91812, -3.24019.
• Your <listPlace> might look something like this:
<listPlace>
  <place xml:id="edinburgh">
    <placeName>Edinburgh</placeName>
    <region>Scotland</region>
    <country>United Kingdom</country>
    <place xml:id="craiglockhart">
      <placeName>Craiglockhart War Hospital</placeName>
      <settlement>Edinburgh</settlement>
      <location>
        <geo>55.91812, -3.24019</geo>
      </location>
    </place>
  </place>
</listPlace>
• By nesting the hospital’s location inside the place for Edinburgh, we record that the one place is inside the other through the XML hierarchy. The nested <settlement> of ’Edinburgh’ is technically redundant.
5.5.7 Creating Organizations
The principle is basically the same for creating an <org> inside a <listOrg>:
• After the closing </listPlace> create a <listOrg> with an <org> inside it with an @xml:id of ’Berlitz’.
• Give this <org> an <orgName> of ’Berlitz’.
• Inside this <org> add a <place> element with a <location> containing an <address> with a <street> of ’14 Frederick Street’, a <postCode> of ’EH2 2HB’, a <settlement> of ’Edinburgh’, and a <country> of ’United Kingdom’.
• Your <listOrg> might look something like:
<listOrg>
  <org xml:id="Berlitz">
    <orgName>Berlitz</orgName>
    <place>
      <location>
        <address>
          <street>14 Frederick Street</street>
          <postCode>EH2 2HB</postCode>
          <settlement>Edinburgh</settlement>
          <country>United Kingdom</country>
        </address>
      </location>
    </place>
  </org>
</listOrg>
• Also inside this <listOrg> add an <org> for the ’Field Club’ with an <orgName> and a <note> recording that this is now known as the Edinburgh Natural History Society, and has a website at "http://www.edinburghnaturalhistorysociety.org.uk/".
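The <org> for the Field Club might be sketched as follows. The @xml:id value 'FieldClub' is an assumption chosen for illustration (any valid, unique identifier would do), and wrapping the website address in a <ref> is optional.

```xml
<org xml:id="FieldClub">
  <orgName>Field Club</orgName>
  <!-- The xml:id above is a hypothetical value; pick your own if you prefer -->
  <note>Now known as the Edinburgh Natural History Society, which has a website at
    <ref target="http://www.edinburghnaturalhistorysociety.org.uk/">http://www.edinburghnaturalhistorysociety.org.uk/</ref>.</note>
</org>
```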
5.5.8 Linking Names and Metadata
Having marked all these names, and created stored metadata about them, it seems a shame not to link the names to this metadata. So let’s do that!
• Go to the <persName> you put in the first <salute> around ’L.’. Put the cursor immediately after the final ’e’ in the opening <persName> tag and press space. You should get a drop-down list of attributes; select ’ref’. When you do so you should get a drop-down list of @xml:id values present in the entire document. Scroll down and select ’#LG’.
• This <salute> now should look like:
<salute>Dear <persName ref="#LG">L.</persName></salute>
• The value of @ref is a URI, which includes URLs, and in this case a ’fragmentary URL’. It starts with a ’#’ to let us know it is in the same document. You could also have stored the <listPerson> in a separate document, in which case we would put something like ’people.xml#LG’, or stored this online somewhere, e.g. ’http://www.example.com/people.xml#LG’. While it is best if this points to a TEI <person> element, it can in fact point to anything which documents the name, such as a Wikipedia article. (One reason it is better for this to point to a <person> element is that inside that you could indeed point to more than one external source of information.)
• For each <persName>, <placeName>, and <orgName> (for which you’ve created a <person>, <place> or <org> element) go through and add a @ref attribute pointing to the correct @xml:id.
• The benefit of doing all this work is that, for each instance of a name, a standardised form of it and other metadata are now available when processing the file into other outputs (e.g. to help with searching, or displaying this information).
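The three kinds of @ref value described in this section can be compared side by side in the following sketch. The file name 'people.xml' and the 'www.example.com' address are illustrative placeholders rather than real resources; only the first form applies to the file you are editing.

```xml
<!-- Same document: a fragment identifier only -->
<persName ref="#LG">L.</persName>

<!-- Hypothetical separate local file holding the <listPerson> -->
<persName ref="people.xml#LG">L.</persName>

<!-- Hypothetical online location (example.com is a placeholder domain) -->
<persName ref="http://www.example.com/people.xml#LG">L.</persName>

<!-- The same mechanism works for organizations and places -->
<orgName ref="#Berlitz">Berlitz, <placeName ref="#edinburgh">Edin.</placeName></orgName>
```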
5.5.9 Referencing Strings
As explained earlier the <rs> element can be used to mark things which aren’t strictly names in themselves but are understood to reference named entities. For example ’I’ and ’you’ in this file refer to Wilfred Owen and Leslie Gunston respectively.
• Depending on how much time you have left, mark up as many of the instances of ’I’ and ’you’ as you can as <rs> elements, each with a @ref pointing to the appropriate <person> element.
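As a minimal sketch of the idea, the sentence below is invented for illustration and is not taken from the letter; the @ref values assume the 'WO' and 'LG' identifiers created in the <listPerson> earlier.

```xml
<!-- Hypothetical sentence, not a quotation from the letter -->
<p><rs ref="#WO">I</rs> hope <rs ref="#LG">you</rs> are well.</p>
```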
5.5.10 Saving Your Work
Let’s save our work:
• Is your work well-formed? Do you have a happy green square or an angry red one?
• Have you formatted and indented your work automatically?
• From the ’File’ menu select ’Save’ or click on the Save icon (looks like an old-style 3.5" disk).
• Or if you prefer use the ’File’ then ’Save As’ menu item to save the file using the name ’exercise04.xml’ or another name of your choice.
5.5.11 Self-Assessment
Check if you understand some of the core principles of this exercise by answering the following questions:
• Which elements are used to mark personal, place, and organizational names?
• How do you store metadata in the header about the entities these names refer to?
• What values does the @ref attribute allow? How can this be used to point to external files or URLs?
• How do you mark up strings of text which reference named entities, but aren’t themselves names?
5.5.12 Next and More Reading
Next we’ll be investigating more about the physical document itself. However, before that if you have time you may wish to:
• Look up the reference pages for each of the new elements you’ve used.
• Read some of the chapter on Names, Dates, People, and Places: http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ND.html.
Workshop: An Introduction to XML and the Text Encoding Initiative
5.6 Exercise 5: Creating a Manuscript Description
5.6.1 Learning Outcomes
When you successfully complete this exercise you should be able to:
• Modify a basic manuscript description to provide more structure
• Understand the general categories of manuscript description
• Have more experience editing a complex <teiHeader>
5.6.2 Summary
In this exercise you will add a manuscript description to the file you finished in the previous exercise. You’ll modify an existing <msDesc> element with a basic structure to categorise manuscript description information into a more detailed structure.
5.6.3 Starting Up
Load up the completed file from the previous exercise. If you did not complete the exercise you can cheat by loading up ’spoilers/ex04.xml’ and saving it under a new name (perhaps ’exercise04.xml’).
5.6.4 Inserting a basic <msDesc>
The information for our manuscript description will basically be taken from the document description at http://www.oucs.ox.ac.uk/ww1lit/collections/document/5243. But let’s pretend that we already have a basic manuscript description. TEI does not require an <msDesc> to be divided into all the possible categories of information; all it requires is at least an <msIdentifier>, and other information can be stored in a few accompanying paragraphs. This is useful for the retrospective conversion of catalogues in other legacy formats to TEI XML.
• Move the cursor to immediately following the closing </listOrg> tag. At this point either cut and paste or insert (with the ’Document’ -> ’File’ -> ’Insert File’) the file ’msDesc.xml’.
• As you’ll notice, this contains a very basic <msDesc> with a minimal <msIdentifier>.
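For orientation, a basic manuscript description of this kind might be shaped roughly as follows. This is a sketch under assumptions, not the actual contents of 'msDesc.xml': the identifier values are borrowed from later in this exercise and the paragraph text is placeholder wording.

```xml
<msDesc>
  <!-- The one required child: something that identifies the manuscript -->
  <msIdentifier>
    <repository>Harry Ransom Centre</repository>
    <idno type="folio">ff504</idno>
  </msIdentifier>
  <!-- Everything else can start life as plain paragraphs -->
  <p>Placeholder: where the manuscript is held.</p>
  <p>Placeholder: what the manuscript contains.</p>
  <p>Placeholder: its physical form, history and surrogates.</p>
</msDesc>
```

The rest of the exercise replaces paragraphs like these, one at a time, with the structured elements that TEI provides for each category.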
5.6.5 Filling out a <msIdentifier>
Let’s expand the <msIdentifier>. As you have a lot more experience editing XML files in oXygen now, the steps will sometimes be given in less detail.
• Notice that the first paragraph mostly contains information that tells us where the manuscript is, in other words it identifies it and so this text could go in a <msIdentifier>.
• Take the information in this paragraph and expand the <msIdentifier> until it looks something like this:
<msIdentifier>
  <country>United States of America</country>
  <region>Texas</region>
  <settlement>Austin</settlement>
  <institution>The University of Texas at Austin</institution>
  <repository>Harry Ransom Centre</repository>
  <collection>Wilfred Owen Collected Letters</collection>
  <idno type="folio">ff504</idno>
  <altIdentifier>
    <idno>Letter no. 535 Ed. ’Wilfred Owen Collected Letters’</idno>
  </altIdentifier>
  <msName>Letter to Leslie Gunston</msName>
</msIdentifier>
• Note how elements are prescribed to appear in a particular order (from the most general, such as <country>, to the most specific). Notice that most elements cannot be repeated (some, like <collection> and <altIdentifier>, can be).
• When you’ve finished creating the <msIdentifier> delete the remains of the first <p> from the basic manuscript description.
5.6.6 Providing some <msContents>
The second paragraph contains information that will be useful in compiling an <msContents>. This acts as a place to store structured information concerning the intellectual contents of a manuscript. It gives a place for a summary of the contents of the manuscript, and multiple <msItem> elements form something like a table of contents of works in the document.
• Rename the second paragraph element as <msContents> (your document will now not be valid)
• Highlight the text inside from the start to the end of "Collected Letters’.", press control-e to ’surround with element’ and wrap this in a <summary>. This acts as a summary for the intellectual content.
• Highlight the remaining text and surround it with a <msItem> element.
• Delete the ’Authored by’ and surround ’Wilfred Owen (1893-1918).’ with an <author> element.
• Surround ’English.’ with a <textLang> element.
• Add a @mainLang attribute to the <textLang> with a value of ’en’ (the ISO language code for ’English’).
• Add a @ref to the <author> and point to your <person> for Wilfred Owen.
• As this <msItem> is recording information for this particular item we also want to give it a <title>. Create an empty <title> element and cut and paste "Letter To Leslie Gunston / The Wrestlers." into it.
• Your <msContents> should now look something like:
<msContents>
  <summary>"Letter To Leslie Gunston / The Wrestlers". Talks about forthcoming
    address to the ’Field Club’. Includes a partial draft of ’The Wrestlers’.
    This is letter no. 535 in Ed. ’Wilfred Owen Collected Letters’.</summary>
  <msItem>
    <author>Wilfred Owen (1893-1918).</author>
    <textLang mainLang="en"> English. </textLang>
  </msItem>
</msContents>
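The sample above does not show the @ref on <author> or the <title> described in the last two steps. With those added, the <msItem> might look something like the sketch below. It assumes that your <person> for Wilfred Owen has the @xml:id 'WO', and it simply repeats the title text rather than guessing where in your file you cut it from.

```xml
<msItem>
  <!-- @ref points at the <person> entry created in the previous exercise -->
  <author ref="#WO">Wilfred Owen (1893-1918).</author>
  <title>Letter To Leslie Gunston / The Wrestlers.</title>
  <textLang mainLang="en"> English. </textLang>
</msItem>
```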
5.6.7 Giving a <physDesc>
The next paragraph has a lot of information about the physical aspects of the manuscript. Let’s turn it into a <physDesc>.
• Rename the <p> to be a <physDesc>
• Immediately inside this create an <objectDesc> with a <supportDesc> inside that.
• Inside that <supportDesc> add a <support>, and inside this put the text "A single folio of paper in the collection as ff504 recto and verso".
• You could wrap the element <material> around the word ’paper’, but also you could add a @material attribute to <supportDesc> with a value of ’paper’.
• You could also categorise the object’s form by adding a @form attribute on <objectDesc> with a value of ’folio’.
• After the closing </supportDesc> tag add a <layoutDesc> with a <layout> to record information about the physical layout. In this case "Written full width as a single column, with approximately 20 lines per page".
• To the <layout> element add a @columns attribute of ’1’, and a @writtenLines of ’20’.
• After the closing </objectDesc> add a <handDesc> with a @hands attribute with a value of ’1’.
• Inside the <handDesc> add a <handNote> with the remaining text "Written in Wilfred Owen’s hand in pen.". You might want to mark Wilfred Owen as a <persName> with a @ref pointing back to the <person> for Wilfred Owen.
• Your <physDesc> now might look something like:
<physDesc>
  <objectDesc form="folio">
    <supportDesc material="paper">
      <support>A single folio of <material>paper</material> in the collection as
        ff504 recto and verso</support>
    </supportDesc>
    <layoutDesc>
      <layout columns="1" writtenLines="20">Written full width as a single
        column, with approximately 20 lines per page</layout>
    </layoutDesc>
  </objectDesc>
  <handDesc hands="1">
    <handNote>Written in <persName ref="#WO">Wilfred Owen’s</persName> hand in
      pen.</handNote>
  </handDesc>
</physDesc>
5.6.8 Detailing a <history>
The <history> element gives a place to detail the <origin>, <provenance>, and <acquisition> of the manuscript if available. In this case we have some minimal information about the origin of the manuscript.
• Rename the second-last paragraph to a <history> element.
• Select all the text of "This letter was written by Wilfred Owen in July 1917 at Craiglockhart War Hospital." and surround it with an <origin> element.
• Inside this mark ’July 1917’ as an <origDate> element. This is like the <date> element, but is specific to recording the origin date of the manuscript being described. Provide a @when attribute of ’1917-07’.
• Similarly mark the ’Craiglockhart War Hospital’ as an <origPlace> with a @ref of ’#craiglockhart’ to point to the <place> you made earlier. You could also surround the text with an <orgName> if you want to indicate that this is an organizational name. As before you could mark Wilfred Owen’s name.
• Your <history> element should look something like:
<history>
  <origin>This letter was written by <persName ref="#WO">Wilfred
    Owen</persName> in <origDate when="1917-07">July 1917</origDate> at
    <origPlace ref="#craiglockhart">
      <orgName>Craiglockhart War Hospital</orgName>
    </origPlace>
  </origin>
</history>
5.6.9 Noting <additional> Information
At the end of your <msDesc> you can include an <additional> element which stores other information such as <adminInfo> (for recording administrative events of the object), <listBibl> (for listing bibliographic citations about the object), and <surrogates> (for listing additional representations of the object).
• Change the final paragraph to an <additional> element with a <surrogates> inside that containing all the text.
• Modify the URL given to be a <ptr> with a @target attribute.
• Your <additional> element should look something like:
<additional>
  <surrogates>A digital image is available from the First World War Poetry
    Digital Archive at
    <ptr target="http://www.oucs.ox.ac.uk/ww1lit/collections/document/5243/4770"/>.
  </surrogates>
</additional>
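Only <surrogates> is used in this exercise. As a hedged sketch of how a <listBibl> could sit alongside it, the fragment below records the edition mentioned in the summary. The <bibl> wording is an assumption built from the 'letter no. 535' note; the source description gives no fuller citation, so none is invented here.

```xml
<additional>
  <surrogates>A digital image is available from the First World War Poetry
    Digital Archive at
    <ptr target="http://www.oucs.ox.ac.uk/ww1lit/collections/document/5243/4770"/>.
  </surrogates>
  <!-- listBibl follows surrogates; adminInfo, if present, would come first -->
  <listBibl>
    <bibl><title>Wilfred Owen Collected Letters</title>,
      letter no. <biblScope>535</biblScope>.</bibl>
  </listBibl>
</additional>
```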
5.6.10 Saving Your Work
Let’s save our work:
• Is your work well-formed? Do you have a happy green square or an angry red one?
• Have you formatted and indented your work automatically?
• From the ’File’ menu select ’Save’ or click on the Save icon (looks like an old-style 3.5" disk).
• Or if you prefer use the ’File’ then ’Save As’ menu item to save the file using the name ’exercise05.xml’ or another name of your choice.
5.6.11 Self-Assessment
Check if you understand some of the core principles of this exercise by answering the following questions:
• What is the only required aspect of a TEI manuscript description?
• How does one record the separate works of intellectual content present in the manuscript?
• Where does one describe the support which forms the object, or its layout?
• How does one record the origin, provenance, and acquisition of the object?
• Where might you record
5.6.12 Next and More Reading
Next we’ll be looking at more encoding one can add to manuscripts, particularly for transcriptions. However, before that, if you have time you may wish to:
• Look up the reference pages for each of the new elements you’ve used.
• Read some of the chapter on Manuscript Description: http://www.tei-c.org/release/doc/tei-p5-doc/en/html/MS.html.
5.7 Exercise 6: Transcribing with the TEI
5.7.1 Learning Outcomes
When you successfully complete this exercise you should be able to:
• Undertake conversion of a transcription of a document
• Use <add>, <del> and <subst> effectively
• Use <choice> with <abbr> and <expan>
• Use <unclear> to note difficult to read passages
5.7.2 Summary
In this exercise you will take a transcription, already partly in TEI but with a bespoke transcription notation, and convert it fully to TEI P5 XML. You’ll learn how to use <add>, <del>, <subst>, <choice>, <abbr>, <expan>, and <unclear> in both simple and nested manners to indicate some complex transcriptional phenomena.
5.7.3 Starting Up
In this case we’re starting with a sample file that we have created for you. Load up the file called ’preface.xml’ in the oXygen XML editor. Check that the file is well-formed and valid. Note that:
• There is a minimal header with an <msDesc>.
• There is a commented-out plain text edition of this preface, edited by J. Stallworthy. (This is just to give you a reading copy of a clean text).
• There is a transcription with a bespoke notation of "(deleted: ’some text’)" by a transcriber to show what was deleted, added, and other aspects. In some cases additions and deletions are nested together as a single act with an extra set of parentheses. In other cases deletions are made inside additions.
• There is an image of the manuscript page in the ’preface-ms.jpg’ file. Have a look at this, the commented out edited version, and the transcription of the manuscript.
5.7.4 Using @rend
Note that the <head> inside the division has a content of:
<head>(underlined:’Preface.’)</head>
• Remove the "(underlined:’" and "’)" so that you are left with just the text.
• Add a @rend attribute with a value of ’underline’.
• You should have a <head> that looks something like:
<head rend="underline">Preface.</head>
5.7.5 Second Paragraph
Let’s change some of this transcription notation to real markup in the second paragraph (the first has no transcription notation).
• Change (added below: ’about glory, honour,’) to be
<add place="below">about glory, honour,</add>
• The next is an example of a substitution, a deletion and addition provided as a single act, followed by a deletion. Change ((deleted:’battles, and glory of battles’) (added above:’deeds or lands’)) (deleted ’or land,’) to be
<subst>
  <del rend="stroked">battles, and glory of battles</del>
  <add place="above">deeds or lands</add>
</subst>
<del rend="stroked">or land,</del>
• Replace ((deleted:’or’) (added below:’or anything about’)) with
<subst>
  <del rend="stroked">or</del>
  <add place="below">or anything about</add>
</subst>
• The transcriber has marked a passage that was unclear in transcription as (unclear, scribbled: ’majesty’). Replace this with
<unclear reason="scribbled">majesty</unclear>
• The transcriber has recorded an abbreviation with its expansion as domin(ion). This means that ’domin’ is what is on the page, but that it should be expanded to ’dominion’. Encode this as:
<choice>
  <abbr>domin</abbr>
  <expan>domin<ex>ion</ex></expan>
</choice>
The <ex> element gives the supplied letters in the expanded form.
• Replace (deleted:’whatever’) with
<del rend="stroked">whatever</del>
• This concludes your second paragraph, and has given you some experience in marking up <add>, <del>, and wrapping those in <subst>, also using <unclear> and <choice> with <abbr>, <expan> and <ex>. Your paragraph should look something like:
<p> Nor is it <add place="below">about glory, honour,</add> about
  <subst>
    <del rend="stroked">battles, and glory of battles</del>
    <add place="above">deeds or lands</add>
  </subst>
  <del rend="stroked">or land,</del>
  <subst>
    <del rend="stroked">or</del>
    <add place="below">or anything about</add>
  </subst> any might, <unclear reason="scribbled">majesty</unclear>,
  <choice>
    <abbr>domin</abbr>
    <expan>domin<ex>ion</ex></expan>
  </choice> or power <lb/>
  <del rend="stroked">whatever</del> except War. </p>
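The <choice> element is not limited to abbreviations. As a brief illustrative sketch, the same wrapper can pair an apparent error in the source (<sic>) with an editor's correction (<corr>). The misspelling below is invented for the example and is not a reading from the Preface manuscript.

```xml
<!-- Hypothetical example: the source reads 'recieve', the editor corrects it -->
<choice>
  <sic>recieve</sic>
  <corr>receive</corr>
</choice>
```

Keeping both readings inside one <choice> lets later processing decide whether to display the original or the corrected form.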
5.7.6 Third Paragraph
The third paragraph introduces nesting additions/deletions inside additions for when someone has added something, and then changed their mind by adding more, and/or deleting the addition.
• Change (added left: ’Above all’) (deleted: ’Its’) ((added above:’I am’) (deleted:’This book’)) to be:
<add place="left">Above all </add>
<del rend="stroked">Its </del>
<subst>
  <add place="above">I am </add>
  <del rend="stroked">This book</del>
</subst>
• Modify (added above:’(deleted: ’(unclear, unsure:’center’)’)’), which contains an unclear bit of text that has been deleted, inside an addition, to be something like:
<add place="above">
  <del rend="stroked">
    <unclear>center</unclear>
  </del>
</add>
• Modify ((added left:’My’) (deleted:"It’s")) (added above:’(deleted:’The’)’), which has a substitution with an addition and a deletion as well as an addition which is then deleted, to be:
<subst>
  <add place="left">My </add>
  <del rend="stroked">It’s </del>
</subst>
<add place="above">
  <del rend="stroked">The </del>
</add>
• Finish up the last couple of deletions in this paragraph and it should look something like:
<p>
  <add place="left">Above all </add>
  <del rend="stroked">Its </del>
  <subst>
    <add place="above">I am </add>
    <del rend="stroked">This book</del>
  </subst> is <add place="above">
    <del rend="stroked">
      <unclear>center</unclear>
    </del>
  </add> not concerned with Poetry. <lb/>
  <subst>
    <add place="left">My </add>
    <del rend="stroked">It’s </del>
  </subst>
  <add place="above">
    <del rend="stroked">The </del>
  </add> subject <del rend="stroked">of</del> is War, and the pity of
  <del rend="stroked">it</del> War. <lb/> The Poetry is in the pity.
</p>
5.7.7 Fourth Paragraph
The fourth paragraph has more of the same, but also a deletion with a nested addition with both a substitution and standalone deletion.
• Change (deleted:’I have no hesistation in’) (deleted:’making public’) (deleted:’publishing such’) ((deleted:’My’) (added above with caret:’Yet these’)) into:
<del rend="stroked">I have no hesistation in </del>
<del rend="stroked">making public</del>
<lb/>
<del rend="stroked">publishing such</del>
<lb/>
<subst>
  <del rend="stroked">My </del>
  <add place="above" rend="caret">Yet these</add>
</subst>
• Change (added above:’to this (added above:’(deleted:’past’)’) generation’) (deleted: ’not further consolation’) into:
<add place="above">to this <add place="above">
    <del rend="stroked">past </del>
  </add> generation </add>
<del rend="stroked">not further consolation</del>
• Change (deleted:’The’) to (deleted:’this’ (added above:’((deleted:’a’) (added above:’this’)) (deleted:’bereaved’)’) generation’) which is a deletion (of ’this generation’) with a nested addition with a substitution and a deletion to:
<del rend="stroked">The</del> to
<del rend="stroked">this <add place="above">
    <subst>
      <del rend="stroked">a</del>
      <add place="above">this</add>
    </subst>
    <del rend="stroked">bereaved</del>
  </add> generation</del>.
• (That is quite complicated in terms of additions and substitutions!) Change the next couple of deletions into markup, and then (deleted:’used proper names.’) (added above:’All a poet can do today is (added above:’(deleted:’to’)’) warn (deleted:’children’)’) into:
<del rend="stroked">used proper names.</del>
<add place="above">All a poet can do today is <add place="above">
    <del rend="stroked">to</del>
  </add> warn <del rend="stroked">children</del>
</add>
• Mark the last deletion ’War’, and then this paragraph should now look something like:
<p>
  <del rend="stroked">I have no hesistation in </del>
  <del rend="stroked">making public</del>
  <lb/>
  <del rend="stroked">publishing such</del>
  <lb/>
  <subst>
    <del rend="stroked">My </del>
    <add place="above" rend="caret">Yet these</add>
  </subst> elegies are <add place="above">to this <add place="above">
      <del rend="stroked">past </del>
    </add> generation </add>
  <del rend="stroked">not further consolation</del>
  <lb/> in no sense consolatory <lb/>
  <del rend="stroked">The</del> to <del rend="stroked">this <add place="above">
      <subst>
        <del rend="stroked">a</del>
        <add place="above">this</add>
      </subst>
      <del rend="stroked">bereaved</del>
    </add> generation</del>. They may be to the <lb/> next. <del rend="stroked">If
    I thought the letter of this</del>
  <lb/>
  <del rend="stroked">book would last, I now might have </del>
  <lb/>
  <del rend="stroked">used proper names.</del>
  <add place="above">All a poet can do today is <add place="above">
      <del rend="stroked">to</del>
    </add> warn <del rend="stroked">children</del>
  </add>
  <lb/> That is why the true <del rend="stroked">War</del> Poets must be truthful.
  <lb/>
</p>
5.7.8 Final Paragraph
The final paragraph doesn’t really introduce anything new that you don’t know already, so finish it off quickly! It should end up looking something like:
<p>[If I thought the letter of this book would last, I <lb/>
  <del rend="stroked">wo </del> might have used proper names; but if the spirit of
  <lb/> it <add place="above">survives</add> - survives Prussia -
  <subst>
    <del rend="stroked">I </del>
    <add>my ambition</add>
  </subst> and those mames will <lb/>
  <del rend="stroked">be</del>
  <del rend="stroked">content</del> have achieved
  <del rend="stroked">themselves</del>
  <lb/>
  <del>ourselves</del> fresher fields than Flanders, <lb/> for he, not of war,
  would he <lb/> sing</p>
5.7.9 Transcription Comparison
Compare your transcription to the image in ’preface-ms.jpg’.
• Is there anything you haven’t noted that you think is important?
• Are there any mistakes in the transcription that you should correct?
• Are there things you might have marked up differently?
5.7.10 Saving Your Work
Let’s save our work:
• Is your work well-formed? Do you have a happy green square or an angry red one?
• Have you formatted and indented your work automatically?
• From the ’File’ menu select ’Save’ or click on the Save icon (looks like an old-style 3.5" disk).
• Or if you prefer use the ’File’ then ’Save As’ menu item to save the file using the name ’exercise06.xml’ or another name of your choice.
5.7.11 Self-Assessment
Check if you understand some of the core principles of this exercise by answering the following questions:
• If you want to indicate an abbreviation and expansion (or correction and error) are linked, what element do you wrap them in?
• If you want to indicate an addition and deletion are one editorial act, what do you surround them with?
• How do you show that an addition is subsequently deleted?
5.7.12 Next and More Reading
Next we’ll be moving on to spoken texts and linguistic corpora. However, before that, if you have time you may wish to:
• Look up the reference pages for each of the new elements you’ve used.
• What we haven’t covered in this exercise is the genetic encoding, using <sourceDoc> and linking transcriptions to the <facsimile> element. Read some of the chapter on Representation of Primary Sources: http://www.tei-c.org/release/doc/tei-p5-doc/en/html/PH.html.
• At the bottom of that chapter you can find a list of elements added by the ’transcr’ module. It is interesting to note how many of the elements we used appear in the ’core’ module at: http://www.tei-c.org/release/doc/tei-p5-doc/en/html/CO.html.
5.8 Exercise 7: Encoding Spoken Text
5.8.1 Learning Outcomes
When you successfully complete this exercise you should be able to:
• Encode a transcription of spoken text
• Use a <recordingStmt> in the header to document an audio source
• Record participants in a linguistic interaction
• Mark utterances, pauses, and other incidents
• Understand basic use of a <timeline> element
• Use other markup in a spoken text transcription
5.8.2 Summary
This exercise starts with a TEI template and quickly makes a transcription of an audio interview into a full valid TEI P5 file. The interviewer (Stuart Lee) is interviewing Ian Hislop, who had recently done a television programme ’Not Forgotten’ about the impact on British society of the First World War. We will mark up the utterances, pauses, and other aspects of a fragment of this interview.
5.8.3 Starting Up
With oXygen loaded, start a new TEI P5 document by going to ’File’ -> ’New’ -> ’Framework Templates’ -> ’TEI P5’ -> ’All’, and modify the headers as below.
5.8.4 Creating a Better Header
The default template header isn’t very good; let’s make it better.
• Modify the default <fileDesc> element to have a better <title> and <publicationStmt>. This document will be a transcribed fragment of an interview with Ian Hislop for teaching purposes. Your <fileDesc> should look something like:
<fileDesc>
  <titleStmt>
    <title>Fragment of interview with Ian Hislop for teaching purposes</title>
  </titleStmt>
  <publicationStmt>
    <p>Used for a teaching exercise at DHOXSS TEI Workshop</p>
  </publicationStmt>
</fileDesc>
• Use the <recordingStmt> element inside <sourceDesc> to describe the source of the material as an (audio in this case) recording. This recording is 27 minutes and 9 seconds long, and was made by OUCS on the 7th September 2007. So our <sourceDesc> looks something like:
<sourceDesc>
  <recordingStmt>
    <recording type="audio" dur="PT27M09S">
      <respStmt>
        <resp>Recording by</resp>
        <orgName ref="#OUCS">Oxford University Computing Services</orgName>
      </respStmt>
      <date when="2007-09-07">7th September, 2007</date>
    </recording>
  </recordingStmt>
</sourceDesc>
• After the end of the <fileDesc> element but before the closing </teiHeader> tag, insert a <profileDesc> element, and within that a <particDesc>. This is where we’re going to record the participants in the interview. Inside here create a <listPerson> with two <person> elements. The first should have an @xml:id attribute value of ‘SL’. This is the interviewer and his name is Stuart Lee. Create a <persName> inside the <person> for him. Also inside the <person>, alongside the <persName>, create a <note> with a <ref> inside. The @target attribute’s value should be http://users.ox.ac.uk/~stuart/Site/About_Me.html with the <ref> content being something like ’Stuart Lee’s home page’.
• In the second <person> element, add an @xml:id attribute value of ‘IH’. The name of the person being interviewed is Ian Hislop, a well known UK comedian and editor of the satirical Private Eye magazine. Create a <note> for him with a <ref> that points to his Wikipedia page, http://en.wikipedia.org/wiki/Ian_Hislop. Your <profileDesc> should look something like:
<profileDesc>
  <particDesc>
    <listPerson>
      <person xml:id="SL">
        <persName>Stuart Lee</persName>
        <note>
          <ref target="http://users.ox.ac.uk/~stuart/Site/About_Me.html">Stuart Lee’s
            home page</ref>
        </note>
      </person>
      <person xml:id="IH">
        <persName>Ian Hislop</persName>
        <note>
          <ref target="http://en.wikipedia.org/wiki/Ian_Hislop">Ian Hislop’s entry in
            Wikipedia</ref>
        </note>
      </person>
    </listPerson>
  </particDesc>
</profileDesc>
• Your document should be well-formed and valid, and have a happy green square!
5.8.5 Adding the Transcription and Utterances
• Inside the <body> element, delete any paragraph that is there, and insert the ’hislop.txt’ file by going to the ’Document’ -> ’File’ -> ’Insert File’ menu. (Otherwise you could copy and paste it from notepad or similar).
• oXygen will complain that you have a bunch of text just inside <body>; let’s solve that first by adding some structure.
• Replace the [gap for sampling purposes] at the start and end of the text with a <gap reason="sampling"/>.
• Around each line-break separated utterance (including the speaker name) wrap a <u> element. (Highlight and press ’control-e’; or you could put it in one <u> element and split it with ’alt-shift-D’ in front of each one.)
• Your first utterance should look something like:
<u>Lee (24.27-24.36):
So em d-em having read [clicking sound: 0.17s] the Wipers Times
now and and your [pause: 0.62s] view that
th th the thirties was that that
prisonment you say</u>
• Your document should be well-formed and valid now, with a happy green square; sort out any problems before progressing further.
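At this stage the overall shape of the <body> might be sketched as follows. The utterance text is elided with comments rather than reproduced, since it comes from 'hislop.txt'.

```xml
<body>
  <gap reason="sampling"/>
  <u><!-- first utterance, still including 'Lee (24.27-24.36):' --></u>
  <u><!-- second utterance --></u>
  <!-- ... one <u> per line-break separated utterance ... -->
  <gap reason="sampling"/>
</body>
```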
5.8.6 Making Better Utterances
While our markup might be well-formed and valid it is a long way from the truth.
• The first person to speak didn’t really say ‘Lee (24.27-24.36):’.
• These are artifacts left by the transcriber to give a time stamp and indicate which person was speaking.
• Go through and comment out each of these lines by highlighting it and pressing ’control-shift-comma’ (or selecting ‘Toggle Comment’ from the right-click menu).
• Let’s use this information now to add some extra metadata to each of the utterances.
• For each utterance add a @who attribute with a value of # followed by the corresponding @xml:id value of the person you are pointing to.
• So your first utterance should look something like:
<u who="#SL">
  <!-- Lee (24.27-24.36): -->
  So em d-em having read [clicking sound: 0.17s] the Wipers Times
  now and and your [pause: 0.62s] view that
  th th the thirties was that that
  prisonment you say
</u>
• Repeat this for all the utterances using ’#IH’ and ’#SL’ where appropriate.
• Note: we are not going to use the timestamps in this exercise, but do not delete them. Having XML comments in your file doesn’t cost you anything, but they can be deleted by other applications processing the files. So they are a good place to temporarily store information (such as where you are having a problem with some encoding, or how far you have got through a file).
5.8.7 Incidents and Pauses (and Regular Expressions)
Our audio transcriber has rigorously recorded the number of seconds of a clicking sound that they heard and the pauses that speakers made in speaking. We want to turn these non-spoken notes into markup. If you want to listen to the audio clip it is provided as ’hislop.mp3’. You probably don’t have time to listen to the whole interview but it is provided as ’ian_hislop.mp3’. (The transcriber may have mis-heard some words here and there.)
• In the first utterance replace “[clicking sound: 0.17s]” with:
<incident dur="PT0.17S">
  <desc>clicking sound</desc>
</incident>
and “[pause: 0.62s]” with
<pause dur="PT0.62S"/>
• We want to do this throughout the document, but doing this manually is a bit of hard work. If you cannot get Regular Expressions to work, you can always do this manually, but this might make it a bit quicker! Press ’control-f’ (or ’Find’ -> ’Find/Replace’ in the menus) to bring up the search and replace dialog window.
• Make sure the ‘Regular expression’ option is ticked. We are going to use regular expressions (sometimes called ’regex’ or ’regexes’) to make the search and replacing more powerful.
• In this case put \[clicking sound: ([0-9.]*)s\] into the ‘Text to find:’ box. This means that we’re looking for a literal square bracket, followed by the text ‘clicking sound:’ then any combination of numbers and ‘.’ followed by an s and a closing bracket. We have to escape the brackets because they are used as part of the regular expression language.
• In the Replace with: section put:
<incident dur="PT\1S">
  <desc>clicking sound</desc>
</incident>
which inserts the incident element and adds the string of text captured by the parentheses in the search (represented by \1) to the duration value in the replacement.
• Click ‘Find’ a couple of times to make sure this is finding the recorded clicking sounds. If so, click ‘Replace All’. Check that this has done what you want.
• Do the same with the pauses that have been recorded by using \[pause: ([0-9.]*)s\] as the text to search for and
<pause dur="PT\1S"/>
as the text to replace it with.
• When regular expressions are used carefully, they can make text replacement and markup a quicker job. What do you think are the dangers of using regular expressions?
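One possible answer, offered as a sketch rather than the only one: the pattern [0-9.]* also matches an empty string or a run of several dots, so a slip by the transcriber would be turned into markup that looks plausible but is not a valid duration. The two malformed 'before' notes in the comments below are invented to show this; neither is claimed to occur in 'hislop.txt'.

```xml
<!-- Before: [pause: 0.76s]   After: a well-formed duration -->
<pause dur="PT0.76S"/>

<!-- Before: [pause: s]       After: still matched, but 'PTS' is not a duration -->
<pause dur="PTS"/>

<!-- Before: [pause: 0..5s]   After: still matched, but 'PT0..5S' is not a duration -->
<pause dur="PT0..5S"/>
```

A stricter search pattern such as \[pause: ([0-9]+(\.[0-9]+)?)s\] would skip both malformed notes and leave them in place for manual correction. Validating the file after 'Replace All' is a second line of defence, since the schema checks the form of @dur.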
5.8.8 Encoding a Timeline
Having a timeline is optional, but gives you a way to relate one point in a spoken text to another. In our case it is a bit artificial so you can decide whether to encode it or not.
• Inside the <body> before the first <gap/> element put the following <timeline> construction. We could put this in the header, or a number of places, but this is as good a place as any.
<timeline unit="min" xml:id="recordingStart">
  <when xml:id="fragmentStart" interval="24.27"/>
  <when xml:id="TS2" interval="24.36"/>
  <when xml:id="TS3" interval="24.42"/>
  <when xml:id="TS4" interval="25.13"/>
  <when xml:id="TS5" interval="26.35"/>
  <when xml:id="TS6" interval="26.36"/>
  <when xml:id="TS7" interval="27.09"/>
</timeline>
• What this does is set up a timeline, with units in minutes and then has a series of <when> elements with @xml:id attributes that we can then point at from the body of our transcription. The fragment we are using is 24.27 minutes through the interview. The next speaker says something at 24.36, etc. These correspond to the times in the comments inside your <u> elements.
• This means that we can add a @start and @end (where desirable) to the <u> elements to point to these <when> elements.
• Our first utterance opening tag now looks like:
<u who="#SL" start="#fragmentStart" end="#TS2"> So em d-em having read...</u>
• For our second we’ve only indicated the start:
<u who="#IH" start="#TS2">
  <!-- Hislop (24.36): -->
  Yeah.
</u>
• The third:
<u who="#SL" start="#TS2" end="#TS3">
  <!-- Lee (24.36-24.42) -->
  ...
</u>
• Fourth:
<u who="#IH" start="#TS3" end="#TS4">
  <!-- Hislop (24.42-25.13): -->
  <pause dur="PT0.76S"/> Not really I mean
  ...
</u>
• Fifth:
<u who="#SL" start="#TS4">
  <!-- Lee (25.13): -->
  Yes.
</u>
• Sixth:
<u who="#IH" start="#TS4" end="#TS5">
  <!-- Hislop (25.13-26.35) -->
  um <pause dur="PT0.50S"/> which I saw again and...
</u>
• Seventh:
<u who="#SL" start="#TS5">
  <!-- Lee (26.35) -->
  hmm
</u>
• Eighth:
<u who="#IH" start="#TS6" end="#TS7">
  <!-- Hislop (26.36-27.09) -->
  I think it’s difficult to read the history of the century...
</u>
• If you have done all that, your document should be well-formed and valid with a happy green square. If it isn’t, find the problem!
5.8.9 Other Things to Encode
There are some other things we could encode, like <title>s, <note>s, and <persName>s.
• There are three titles mentioned: ’Wipers Times’, ’The Gassed’, and ’Voices’. Mark them up as titles, removing any quotation marks that indicated they were titles, if they exist.
• There are two notes recorded by the transcriber: one has ’John Singer Sargent, 1918’ in it (remove the square brackets before marking it as a <note>). The other is ’syllables -damental while laughing’; mark this as a note as well. This last one could also have been marked using the <shift> element, but let’s leave it as a note for simplicity.
• There are a number of personal names; mark these as <persName>.
5.8.10 Saving Your Work
Let’s save our work:
• Is your work well-formed? Do you have a happy green square or an angry red one?
• Have you formatted and indented your work automatically?
• From the ’File’ menu select ’Save’ or click on the Save icon (looks like an old-style 3.5" disk).
• Or if you prefer use the ’File’ then ’Save As’ menu item to save the file using the name ’exercise07.xml’ or another name of your choice.
5.8.11 Self-Assessment
Check if you understand some of the core principles of this exercise by answering the following questions:
• What element in the <teiHeader> is used to document the details of a recording that acts as a source?
• How do you mark utterances in spoken texts?
• What useful attributes can this element have?
• How do you indicate pauses or other incidents?
• What does a <timeline> look like?
• How do you mark titles that someone has said?
5.8.12 Next and More Reading
Next we’ll be looking at some linguistic markup. However, before that, if you have time you may wish to:
• Look up the reference pages for each of the new elements you’ve used.
• You might want to read the chapter on ’Transcriptions of Speech’ at http://www.tei-c.org/release/doc/tei-p5-doc/en/html/TS.html.
5.9 Exercise 8: Linguistic Markup
5.9.1 Learning Outcomes
When you successfully complete this exercise you should be able to:
• Understand how to use a <taxonomy> in the header for hierarchical classifications
• Know how to mark up words, and their parts-of-speech
• Associate XSLT stylesheets with an XML file
5.9.2 Summary
We’re going to mark up the parts of speech for individual words in the transcription of spoken text we encoded in the previous exercise. To do this we’ll first put in a <taxonomy> element to refer to, containing standard linguistic parts of speech. We’ll then tag individual words, and use an attribute to refer back to this taxonomy. We’ll then realise how hard this is to do manually, and we’ll find a way to cheat! Finally we’ll transform our XML file, not only into a standard web page displaying the transcribed text, but also into a page grouping together the words we’ve marked up.
5.9.3 Starting Up
In oXygen, load up the file you created in the previous exercise. If you didn’t finish that exercise, you can cheat by loading up ’spoilers/ex07.xml’.
5.9.4 Inserting a Taxonomy
In order to have something to refer back to we’re going to insert a <taxonomy> element into our file.
• Immediately after the closing </profileDesc> tag, add an <encodingDesc> with a <classDecl> inside it.
• Making sure the cursor is in-between the starting and ending <classDecl> tags, insert the ’taxonomy.xml’ file (’Document’ -> ’File’ -> ’Insert File’).
• This should add a large taxonomy of linguistic categories, each with their own @xml:id and description in a <catDesc> element.
• Your <encodingDesc> should look something like:
<encodingDesc>
  <classDecl>
    <taxonomy>
      <category xml:id="adje">
        <catDesc>adjectives</catDesc>
        <category xml:id="AJ0">
          <catDesc>adjective (unmarked) (e.g. GOOD, OLD)</catDesc>
        </category>
        <category xml:id="AJC">
          <catDesc>comparative adjective (e.g. BETTER, OLDER)</catDesc>
        </category>
        <category xml:id="AJS">
          <catDesc>superlative adjective (e.g. BEST, OLDEST)</catDesc>
        </category>
        <category xml:id="AT0">
          <catDesc>article (e.g. THE, A, AN)</catDesc>
        </category>
      </category>
      ...
    </taxonomy>
  </classDecl>
</encodingDesc>
5.9.5 Marking Up Part of Speech
Let’s mark up some words!
• We need to wrap a <w> element around each of the words in the first three utterances. Highlight the first one, and use ’control-e’ to surround it with this tag. Then highlight the second one, and press ’control-/’ to surround it with the tag you just used. Do that until ’changed over time?’
• Each word though needs to get an @ana attribute added to it. Either do this manually or think how you might search for <w> and replace it with <w ana="">.
• Now the problem is that each of those @ana attributes needs a value! It needs to be ’#’ followed by one of the @xml:id values in our <taxonomy>. We know that you might not automatically know what category each word is, so we’ve listed what our first three utterances look like below:
<u who="#SL" start="#fragmentStart" end="#TS2"><!--Lee (24.27-24.36):-->
  <w ana="#AV0">So</w>
  <w ana="#UNC">em</w>
  <w ana="#UNC">d-em</w>
  <w ana="#VHG">having</w>
  <w ana="#VVN">read</w>
  <incident dur="PT0.17S">
    <desc>clicking sound</desc>
  </incident>
  <w ana="#AT0">the</w>
  <title>
    <w ana="#NN2">Wipers</w>
    <w ana="#NN2">Times</w>
  </title>
  <w ana="#AV0">now</w>
  <w ana="#CJC">and</w>
  <w ana="#CJC">and</w>
  <w ana="#DPS">your</w>
  <pause dur="PT0.62S"/>
  <w ana="#NN1">view</w>
  <w ana="#CJT">that</w>
  <w ana="#NN0">th</w>
  <w ana="#NN0">th</w>
  <w ana="#AT0">the</w>
  <w ana="#CRD">thirties</w>
  <w ana="#WBD">was</w>
  <w ana="#CJT">that</w>
  <w ana="#CJT">that</w>
  <w ana="#NN1">prisonment</w>
  <w ana="#PNP">you</w>
  <w ana="#VVB">say</w>
</u>
<u who="#IH" start="#TS2"><!--Hislop (24.36):-->
  <w ana="#ITJ">Yeah.</w>
</u>
<u who="#SL" start="#TS2" end="#TS3"><!--Lee (24.36-24.42)-->
  <incident dur="PT1.28S">
    <desc>clicking sound</desc>
  </incident>
  <w ana="#VVG">looking</w>
  <w ana="#AVP">back</w>
  <w ana="#PRP">on</w>
  <w ana="#AT0">a</w>
  <w ana="#AJ0">failed</w>
  <w ana="#NN1">piece</w>
  <pause dur="PT0.35S"/>
  <w ana="#VHZ">has</w>
  <w ana="#DPS">your</w>
  <w ana="#NN1">attitude</w>
  <w ana="#PRP">to</w>
  <w ana="#AT0">the</w>
  <w ana="#PRP">to</w>
  <w ana="#AT0">the</w>
  <w ana="#NN1">War</w>
  <w ana="#NN2">Poets</w>
  <persName>
    <w ana="#NP0">Wilfred</w>
    <w ana="#NP0">Owen</w>
  </persName>
  <persName>
    <w ana="#NP0">Sassoon</w>
  </persName>
  <pause dur="PT0.30S"/>
  <w ana="#VVD">changed</w>
  <w ana="#PRP">over</w>
  <w ana="#NN1">time?</w>
</u>
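If you would rather not use Find/Replace to add the empty @ana attributes, a small XSLT identity transform is another way to think about it. The following is only a sketch and is not one of the workshop files; it assumes your <w> elements are in the TEI namespace and that you run it with any XSLT processor (such as the one bundled with oXygen).

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tei="http://www.tei-c.org/ns/1.0">

  <!-- Identity template: copy every node and attribute unchanged by default -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Any <w> without an @ana gets an empty one, ready to be filled in -->
  <xsl:template match="tei:w[not(@ana)]">
    <xsl:copy>
      <xsl:attribute name="ana"/>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Because everything else is copied through untouched, running this more than once does no harm: a <w> that already has an @ana is left alone.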
5.9.6 How to Cheat
That was an awful lot of work. In fact, some of those entries might be wrong. Why is that? Well, it is because we fed a plain text version of the transcription to an automatic part-of-speech tagger for English at http://ucrel.lancs.ac.uk/claws/. This has some limitations, but makes good guesses. Go and skim-read the web page about it quickly.
• So we cheated in determining which parts-of-speech these words were, so we can hardly stop you cheating if you don’t want to manually mark up the rest of the spoken text in this file!
• We’d highly recommend, instead, that you save your current file (perhaps calling it ’exercise08.xml’?), and open up ’spoilers/ex08.xml’ which has a finished version of the file! (Just to save time you understand).
• In a real-world situation you probably wouldn’t manually tag a corpus like this in any case. You would run scripts over it (as we did) in order to automatically process it and convert the output of a part-of-speech tagger.
5.9.7 Transforming Your File
But what can we do with this markup now that we have ... erm... added it? (Ok, loaded it by opening ’spoilers/ex08.xml’.)
• Let’s transform this file with an XSLT stylesheet we have prepared! XSLT is a transformation language for XML which allows us to turn our XML files into other things (such as other XML, HTML, DOCX, PDF, TXT, etc.) and control what happens to them.
• In order to relate the XML file to a stylesheet we have to associate the two together. Go to the ’Document’ -> ’XML Document’ -> ’Associate XSLT/CSS Stylesheet’ menu.
• Click on the ’XSLT’ tab, and click the folder icon to browse for a file.
• Choose ’spoilers/parts-of-speech.xsl’ as the XSLT file to use.
• You should notice that oXygen adds a new line to the top of your file that looks something like:
<?xml-stylesheet type="text/xsl" href="parts-of-speech.xsl"?>
• This allows the XML document to know what stylesheet it can use to transform the document.
• Select from the ’Document’ -> ’Transformation’ menu, ’Configure Transformation Scenario’.
• On the window that appears select ’XML Stylesheet Processing Instruction’, and then click ’Transform Now’.
• If everything has worked perfectly (sometimes settings change across versions of oXygen), then your web browser should open a web page containing the text of this interview. It should have a table-of-contents which allows you to see two different versions of the text. One as you might expect, the other with words grouped by part of speech. (If for any reason it does not open, simply open up ’spoilers/parts-of-speech.html’ in a web browser as a demonstration.)
• Have a look at both of these. Hover the mouse over the words in both cases and note the extra information you should get in a tooltip.
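To give a feel for how a stylesheet can group words by part of speech, here is a minimal sketch. It is not the contents of ’spoilers/parts-of-speech.xsl’ (which does rather more); it assumes an XSLT 2.0 processor such as the Saxon one bundled with oXygen, and a file with <w ana="#..."> elements and a <taxonomy> like the ones above.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xpath-default-namespace="http://www.tei-c.org/ns/1.0">

  <xsl:output method="html"/>

  <xsl:template match="/">
    <html>
      <body>
        <!-- One group per distinct @ana value, e.g. '#NN1' -->
        <xsl:for-each-group select="//w[@ana]" group-by="@ana">
          <xsl:sort select="current-grouping-key()"/>
          <!-- Strip the leading '#' to get the @xml:id of the category -->
          <xsl:variable name="catId"
              select="substring-after(current-grouping-key(), '#')"/>
          <h2>
            <xsl:value-of select="$catId"/>
            <xsl:text>: </xsl:text>
            <xsl:value-of select="//category[@xml:id = $catId]/catDesc"/>
          </h2>
          <!-- All the words in this group, separated by spaces -->
          <p>
            <xsl:value-of select="current-group()" separator=" "/>
          </p>
        </xsl:for-each-group>
      </body>
    </html>
  </xsl:template>

</xsl:stylesheet>
```

The lookup from @ana to <catDesc> is the same pointing mechanism you used by hand: the value after the ’#’ is matched against the @xml:id values in the taxonomy.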
5.9.8 Saving Your Work
You don’t really have to save this exercise (though feel free to if you want) since we opened up ’spoilers/ex08.xml’.
5.9.9 Self-Assessment
Check if you understand some of the core principles of this exercise by answering the following questions:
• Where does a <taxonomy> go in the header?
• Can <category> elements nest inside each other?
• What element is used to mark words?
• How do you mark the part-of-speech of a word?
• How do you associate an XSLT stylesheet with an XML file?
5.9.10 Next and More Reading
Next we’ll move on to learning how to customise the TEI for your own purposes. However, before that, if you have time you may wish to:
• Look up the reference pages for each of the new elements you’ve used.
• Read more about linguistic markup in the TEI chapter on ’Simple Analytic Mechanisms’ http://www.tei-c.org/release/doc/tei-p5-doc/en/html/AI.html.
• You may also be interested in the TEI chapter on ’Linking, Segmentation and Alignment’ http://www.tei-c.org/release/doc/tei-p5-doc/en/html/SA.html.
5.10 Exercise 9: Customise the TEI with Roma
5.10.1 Learning Outcomes
When you successfully complete this exercise you should be able to:
• Analyse the TEI elements, attributes and values you need for your TEI XML document
• Tailor a TEI schema to your TEI XML file in Roma
• Use a different schema in oXygen
• Generate human-readable specifications of your TEI schema in Roma
• Set the value of existing attributes
• Be aware of the underlying TEI ODD XML format
5.10.2 Summary
In this exercise we will customise the TEI to remove those elements we do not think we’ll use. In order to customise a TEI schema you need to know which elements you want to use, and which you don’t, which sometimes involves a lengthy document analysis process. In our case we’ll shortcut that by telling you what to include or not include. You will learn to create a new schema, and download and use it in oXygen. You’ll learn how to constrain the acceptable values for an attribute, and require its presence. You’ll have a look at the underlying TEI ODD XML format which enables this customisation.
5.10.3 Starting Up
Load up the file ’spoilers/ex06.xml’ in oXygen and save it under a new name. Open up a web browser and go to http://www.tei-c.org/Roma/. (There is also a development version of this at http://tei.oucs.ox.ac.uk/Roma/.)
5.10.4 Your Current Schema
oXygen already knows about the TEI: it comes bundled with an open source TEI Framework (oxygen-tei) that helps it understand how TEI files are meant to work.
• In oXygen with ’spoilers/ex06.xml’ (or whatever you saved it as) loaded, move the cursor to just inside a paragraph after the opening <p>.
• If you type a ’<’ at this point, as you know, oXygen will give you a dropdown list of all the elements allowed inside a <p>.
• Scroll down the list of elements, referring to the pop-up tooltip if you want to know what the elements are for. Notice such elements as <address>, <camera>, <incident>, <metamark>, and <notatedMusic>.
• Hit escape to leave the dropdown menu and delete the ’<’ that you had added.
• You certainly have a lot of choices for elements you can add here! But in any project it is unlikely that you are going to want all those choices. Also, increased choice of what elements to add can lead to greater human error and inconsistency, and we don’t want that!
5.10.5 Roma: Starting a New Schema
Roma enables you to customise the TEI schema and remove those bits you are not going to use.
• Go to http://www.tei-c.org/Roma/ in your browser and note that you are given four options from which to start:
1. Build up: this allows you to create a new customisation by adding elements and modules to the smallest recommended schema
2. Reduce: this allows you to create a new customisation by removing elements and modules from the full tei_all (largest) schema
3. Template: this allows you to create a customisation from a template provided by the TEI as a starting point
4. Open: this allows you to open an existing customisation that you have saved previously.
• In our case, let’s start by choosing ’reduce’, and clicking ’start’.
• Set your parameters, change the following things:
– Title: ’TEI with maximal setup’ is kind of boring; why not call it something like ’My special TEI customisation’?
– Filename: change ’tei_all’ for something like ’myTEI’ (don’t include spaces).
– Author name: You aren’t Sebastian! Change this to your name!
– You can leave the description as it is for now.
• Click ’Save’ at the bottom of the page. Notice how the box in the upper right tells you which customisation you are working on.
5.10.6 Adding and Deleting Modules
Modules are groupings of TEI elements for structural or semantic reasons. For example there is a ’dictionary’ module which contains most of the elements needed for writing dictionaries. If you aren’t writing a dictionary, you probably don’t need that module. Below is a list of all the TEI modules:
Table 3: List of TEI Modules
analysis        Simple analytic mechanisms
certainty       Certainty and uncertainty
core            Elements common to all TEI documents
corpus          Corpus texts
dictionaries    Dictionaries
drama           Performance texts
figures         Tables, formulæ, notated music, and figures
gaiji           Character and glyph documentation
header          The TEI Header
iso-fs          Feature structures
linking         Linking, segmentation and alignment
msdescription   Manuscript Description
namesdates      Names and dates
nets            Graphs, networks, and trees
spoken          Transcribed Speech
tagdocs         Documentation of TEI modules
textcrit        Critical Apparatus
textstructure   Default text structure
transcr         Transcription of primary sources
verse           Verse structures
• Click on the ’Modules’ tab to go to the page that allows you to add/delete modules from your schema.
• Notice that because we’ve started with a ’maximal’ schema, the list of selected modules on the right is exactly the same as the list of TEI modules on the left.
• Click ’remove’ next to ’analysis’ on the right-hand side. Note that it vanishes from this list, but remains on the left-hand side where you could add it back if you wanted it.
• Remove ’analysis’, ’certainty’, ’corpus’, ’dictionaries’, ’drama’, ’figures’, ’gaiji’, ’iso-fs’, ’linking’, ’nets’, ’spoken’, ’textcrit’, ’verse’, and ’tagdocs’!
• Well! Having removed that many, maybe we should have started by building up instead of reducing down? You should be left with: ’tei’ (you can’t remove this one in Roma), ’core’, ’header’, ’msdescription’, ’namesdates’, ’textstructure’, and ’transcr’. Why do you think we have left these modules?
5.10.7 Including or Excluding Elements
We have shrunk down the TEI to just a few modules, but those modules contain elements that we don’t want.
• Click on ’core’ (note: not ’remove’ but the word ’core’) on the right-hand side. This should take you to a page listing all of the elements in the ’core’ module.
• Each row of this table has:
– the element
– whether it is Included or Excluded
– the name being used for the element
– a question mark linking to the reference page for this element
– a description of the element
– a link to change its attributes
• It is possible to Include or Exclude all the elements by clicking this word in the table header.
• From ’core’ exclude the following elements: ’addrLine’, ’address’, ’analytic’, ’biblStruct’, ’binaryObject’, ’distinct’, ’divGen’, ’gb’, ’headItem’, ’headLabel’, ’imprint’, ’index’, ’listBibl’, ’measure’, ’measureGrp’, ’meeting’, ’mentioned’, ’monogr’, ’postBox’, ’postCode’, ’relatedItem’, ’rs’, ’said’, ’series’, ’sp’, ’speaker’, ’stage’, ’street’, ’teiCorpus’, ’term’, ’textLang’, and ’time’.
• Wow! That’s a lot fewer elements in your TEI schema. Remember to click ’Save’ at the bottom of the page!
• We could go through each of the other modules removing elements from there, but you get the idea. In a real-life situation you would work through carefully, only including elements that you really needed. The tighter your schema, the more consistent your data!
5.10.8 Saving Your Schema
• If you click on the ’Schema’ tab you will see a drop-down menu listing various schema formats to generate. The TEI uses a meta-schema format of its own called ODD which allows it to generate these different formats.
• Generate a schema either in Relax NG Compact Syntax, or Relax NG XML Syntax. These really are the best choice.
• When you click generate your browser should automatically download the schema file. Find wherever it has saved it, and move it (not copy, move it) to the place you have saved the ’ex06.xml’ (or whatever you saved it as) file. They should be in the same directory.
• Do not close down your browser window or you’ll have to do that all again.
5.10.9 Associating Your Schema in oXygen
oXygen has been using the tei_all schema by default because it recognises (from the TEI element in the TEI namespace) what kind of files we have been creating.
• Go to oXygen and the file you have previously loaded (’ex06.xml’ or whatever you saved it as).
• With this file open go to the ’Document’ -> ’Schema’ menu and note the icon next to ’Associate Schema’. This icon should also be on your oXygen toolbar. Click either the icon, or ’Associate Schema’.
• Click on the little folder icon next to ’URL’ in order to ’Browse for local file’. Find the schema file you saved earlier, select it, and then click ’OK’ when back in the oXygen dialog box.
• When you click ’OK’ then oXygen should add a line that looks something like this:
<?xml-model href="myTEI.rng" type="application/xml"
  schematypens="http://relaxng.org/ns/structure/1.0"?>
at the top of your file.
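If you generated the Relax NG Compact Syntax version instead, the association line is a little different. As a sketch, assuming your schema file is called ’myTEI.rnc’ and sits in the same directory as your document, it should look something like:

```xml
<?xml-model href="myTEI.rnc" type="application/relax-ng-compact-syntax"?>
```

Either way, the @href is just a relative path, which is why the schema and the document need to stay in the same directory.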
5.10.10 Trying It Out
Remember those elements like ’address’ and ’camera’ that you could add within a paragraph?
• Go to somewhere just after a <p> opening tag, and insert an ’<’ to get a dropdown list from oXygen.
• Are any of the elements you excluded available? No? Good! If they are, then chances are you didn’t click ’Save’ after Including/Excluding them, go back and do it again!
5.10.11 Constraining the @type Attribute on <div>
Removing elements is all well and good and is the first step in customising your schema, but we want to do more. Let’s customise the @type attribute on <div> to only allow certain values.
• Go back to Roma in your browser (hopefully you didn’t shut it and lose all your work?)
• Click on the ’Modules’ tab.
• Click on the ’textstructure’ module name.
• On the row containing ’div’ click on ’Change Attributes’ on the far right-hand side.
• This should take you to a page listing all the possible attributes on <div>. This is also where you would include/exclude use of those attributes if you wanted to change that.
• Scroll down to ’type’ and click on it. This should take you to a page allowing you to set various options for the @type attribute. Set them as follows:
– Is it optional? This allows us to control whether the attribute is required or not. Let’s make our @type attribute required, so click ’no’ it is not optional.
– Contents This would allow us to change what datatype is allowed and how many times it should appear. Let’s leave that just as it is as ’Text’.
– Default value would allow us to set a default value for the attribute if you didn’t supply one. Let’s force ourselves to supply one and so leave this blank.
– Closed list? enables us to say whether our list of values is fixed, or merely a suggestion. Let’s be rigorous and say that it is a closed list. Answer yes!
– List of values is where we give the values we want to supply to the schema as valid values for the @type attribute on <div>. We give this as a comma-separated list. So write in: prose,verse,drama,chapter,somethingElse.
– Description allows us to change the description of this attribute. Add the phrase ’Our modified type attribute ’ to the start of the description.
• Click ’Save’ at the bottom of the page.
5.10.12 Trying It Out Again
Let’s go try out the changes we made. You know how to do this now:
• Click on the ’Schema’ tab.
• Choose one of the Relax NG formats from the dropdown list.
• Click ’Generate’
• Find where the file has been downloaded and copy it over the previous version you had there.
• Do not close down your browser!
• Go back to oXygen, and the ’ex06.xml’ (or whatever you renamed it as) file, and go to the ’Document’ -> ’Validate’ -> ’Reset Cache and Validate’ menu item.
• Your document should validate fine and you should have a happy green square.
• Go to the first <div> tag in the document that looks like <div type="prose"> and change it to be just <div>.
• Your document should not be valid. You should have an angry red square. If it is still valid, ’Reset Cache and Validate’ again, and ensure that it is pointing to the correct schema. The error message it provides should be element ’div’ missing required attribute ’type’ or similar.
• Put your cursor immediately after the ’v’ in <div> and press space. oXygen should provide a dropdown list of attributes available on <div>. Scroll down until you find @type and note that it is in bold. This is because we made it required.
• Select @type and notice that oXygen gives you another dropdown list of the possible values. This is because we provided the values and said that this was a closed value list.
• Choose one of the values, perhaps ’prose’. Your document should again be valid and have a happy green square.
5.10.13 Saving Your Customisation
This is great, but what if you want to save your customisation, and come back later to do more work?
• Go back to your web browser and click on the ’Save Customization’ tab.
• Your browser should automatically start downloading an XML file. Move it to somewhere convenient, for example where you put the schema.
• Do not shut your web browser yet!
• This is the file that you could upload when going to the ’New’ tab on Roma (the very first page with the four choices), if you had selected ’Open existing customization’. (Don’t do this now though!)
• Open this XML file in oXygen. It might not be formatted or indented properly. If not go to the ’Document’ -> ’Source’ -> ’Format and Indent’ menu, or click the Format and Indent icon on the toolbar, or press ’control-shift-p’.
• Read through the file to get a sense of how it relates to your customisation. Note how <moduleRef> includes those modules you have asked for, and how the ’core’ module is included except for the list of elements you excluded.
• Look at the <elementSpec> for <div> and see how we’ve changed it.
• Note that this file is a TEI file just like the ones you’ve been editing, it just uses special elements from the ’tagdocs’ module.
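As a rough guide to what you are looking for, the relevant part of the saved customisation may resemble the sketch below. This is an illustration written by hand, not Roma’s exact output: the attribute order, the full list in @except (shortened here), and the way excluded elements are recorded can all differ in your file.

```xml
<schemaSpec ident="myTEI" start="TEI" xmlns="http://www.tei-c.org/ns/1.0">
  <!-- Modules kept in the customisation -->
  <moduleRef key="tei"/>
  <moduleRef key="header"/>
  <moduleRef key="textstructure"/>
  <moduleRef key="msdescription"/>
  <moduleRef key="namesdates"/>
  <moduleRef key="transcr"/>
  <!-- 'core' is kept, minus the excluded elements (only the first few shown) -->
  <moduleRef key="core" except="addrLine address analytic biblStruct"/>
  <!-- The change made to @type on <div>: required, with a closed value list -->
  <elementSpec ident="div" module="textstructure" mode="change">
    <attList>
      <attDef ident="type" mode="change" usage="req">
        <valList type="closed" mode="replace">
          <valItem ident="prose"/>
          <valItem ident="verse"/>
          <valItem ident="drama"/>
          <valItem ident="chapter"/>
          <valItem ident="somethingElse"/>
        </valList>
      </attDef>
    </attList>
  </elementSpec>
</schemaSpec>
```

Each choice you made in the web forms has a counterpart here: ’not optional’ becomes usage="req", ’closed list’ becomes type="closed", and each comma-separated value becomes a <valItem>.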
5.10.14 Generating Reference Documentation
Roma does not only generate schemas, but also customised reference documentation.
• Return to your web browser and click on the ’Documentation’ tab.
• Choose HTML web page from the dropdown menu and click ’Generate’.
• If your browser has downloaded the file, instead of opening it, open the saved file with your web browser.
• You should get a web page starting with a table of contents listing the elements. Scroll down and click on <div>.
• Notice that this has the @type attribute as required, and lists the legal values. Notice, however, that the example has not changed and it says type="poetry" in that.
• Try generating some PDF documentation as well. Which do you prefer?
5.10.15 More About Roma
Roma the web front-end is a bit of a dated interface to a command line script and the OxGarage web service. When you generated the documentation this used OxGarage and you didn’t even notice!
Some people write their TEI ODD customisation files entirely in XML and do not use the Roma web interface at all. There are a number of things that the Roma web interface can’t do which the TEI ODD language underneath is capable of. Notice, for example, that you weren’t able to provide descriptions of each of the attribute values you entered for @type? You can do that in the underlying XML. Some people do a combination of both Roma and hand editing.
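For instance, describing attribute values by hand might look something like the following sketch. The wording of each <desc> is invented here purely for illustration; you would write whatever suits your project.

```xml
<valList type="closed" mode="replace" xmlns="http://www.tei-c.org/ns/1.0">
  <valItem ident="prose">
    <!-- Hypothetical description, written for this example -->
    <desc>the division consists mainly of prose</desc>
  </valItem>
  <valItem ident="verse">
    <!-- Hypothetical description, written for this example -->
    <desc>the division consists mainly of verse</desc>
  </valItem>
</valList>
```

Descriptions like these should then show up alongside the values in the generated reference documentation.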
There is also a ’Sanity Checker’ tab... click that and find out what happens! (It might warn about the element <term> being used in <keywords> but not being defined. That is fine!)
5.10.16 Self-Assessment
Check if you understand some of the core principles of this exercise by answering the following questions:
• What is Roma?
• How do you add and remove TEI modules using Roma?
• How do you include/exclude individual elements using Roma?
• How can you change attributes using Roma?
• Is it possible to save your customisation in Roma?
• What kinds of documentation can you generate in Roma?
• What kinds of schemas can you generate in Roma?
• What does an underlying TEI ODD customisation file look like? Is it a TEI file like the ones you’ve been working with?
5.10.17 Next and More Reading
Next we’ll move on to some of the other tools and utilities offered by the TEI Consortium. But first consider reading more about TEI ODD at:
• http://www.tei-c.org/release/doc/tei-p5-doc/en/html/USE.html#IM.
• and also the Documentation Elements chapter at http://www.tei-c.org/release/doc/tei-p5-doc/en/html/TD.html.
• See also, http://tbe.kantl.be/TBE/modules/TBED08v00.htm.
5.11 Exercise 10: OxGarage and the TEI Community
5.11.1 Learning Outcomes
When you successfully complete this exercise you should be able to:
• Use OxGarage to convert files of all sorts to and from TEI
• Understand the limitations of automatic conversion
• Explore the TEI Guidelines, Website, and Wiki
• Know how to read and submit bugs and feature requests on the TEI Sourceforge site
• Subscribe to the TEI-L mailing list
• Visit the TEI By Examples website for more self-directed training
5.11.2 Summary
This exercise is designed to give you some exposure to some of the other TEI resources available online. Not only will you use OxGarage to convert files to and from TEI, but you will also be shown the limitations of such conversions. You’ll be directed to more parts of the TEI Guidelines, the TEI-C Website, and the community-developed Wiki. The process for submitting and reviewing bugs and feature requests will be reviewed, along with how to subscribe to the TEI-L mailing list. The TEI By Examples website will be suggested as a good place for further self-directed study.
5.11.3 Starting Up
This exercise will primarily use a web browser; we recommend Google Chrome or a recent version of Mozilla Firefox.
5.11.4 OxGarage: Have a quick play
OxGarage is a pipelining transformation engine with a RESTful Web Service (which in this case means it can be used by programs as well as on the web) that converts documents from one format to another.
• Go to http://oxgarage.oucs.ox.ac.uk:8080/ege-webclient or if this isn’t working http://www.tei-c.org/oxgarage/.
• Click on ’Documents’ and select "TEI P5 XML Document" as your input. When you do so a list of possible conversion targets should appear on the right. Choose "Microsoft Word Document (.docx)".
• When you’ve done this a ’Choose File’ button should appear on the upper left. Click the button and navigate to your finished (if well-formed and valid) file from way back in Exercise 2. (If you didn’t finish this, choose ’spoilers/ex02.xml’ instead).
• Click convert and open the document in Microsoft Word. Note the information that is retained and the information that is lost.
• Try this again, but use a more complex file such as the one from Exercise 6 (if you didn’t finish this, choose ’spoilers/ex06.xml’ instead). Note how, in conversion to DOCX format, it attempts to interpret additions, deletions, unclear text, and expansions, and to represent them in presentational markup.
• Try converting to a variety of other formats and see the results you get. (Note: Not all conversions are equal!)
5.11.5 More on OxGarage
Of course, OxGarage isn’t just about converting TEI files to other formats, it can also convert other formats to TEI! See http://www.oucs.ox.ac.uk/oxgarage/ for more information.
• Take a file you have converted to one of these formats (e.g. DOCX), and edit that file in Microsoft Word. Add some new text, and divisions, to it. Use ’heading2’ style (or similar) to add a new division at some point and add a few lines of text.
• If they are available, use some of the in-line styles that Word provides, and mark some text with them.
• Try converting the file back to TEI and seeing what is preserved (and what isn’t!)
• Although most conversions are ’lossy’, this is a good mechanism for getting a large number of documents into a basic TEI P5 XML structure, ready for further conversion work. One of the things we do for funded research projects is take on the work of ’up-converting’ their files from Word to TEI P5, deducing as much additional structure and markup as we can. Usually this kind of conversion is different for every project, but builds on the common base that we have put into OxGarage.
5.11.6 The TEI Guidelines
The TEI Guidelines are the main output of the TEI Consortium and contain chapters on a wide variety of TEI recommendations. Hopefully you’ve had a chance to read a bit of them already.
• Go to http://www.tei-c.org/release/doc/tei-p5-doc/en/html/index-toc.html.
• Note the division into ’Front Matter’, ’Text Body’, and ’Back Matter’, and look at the kinds of things present in each section.
• Choose a chapter from the ’Text Body’ section and note how the left-hand table of contents shows the general divisions (and previous/next chapters) and how the small right-hand navigation allows you to move forwards/backwards through the sections.
• Notice also the greyed-out ’¶’ character after any sub-division (or sub-sub-division) heading. This is a link to that particular section, which is useful if you want to cite a section of the TEI Guidelines in conversation (e.g. on the TEI-L mailing list).
• Notice that all elements, attributes, classes, and datatypes are links through to the reference pages about that object.
• If you look at the examples provided, most of them will have green backgrounds, which means that this snippet is valid in a TEI file (assuming it was put in the right place). Some examples might have amber (feasibly valid if some missing elements were provided), or red (invalid against the default schema). Sometimes it is necessary to show examples which are invalid to demonstrate mixing of namespaces or when discussing XML itself.
• Click on any element name in the prose to go to the reference page for that element. The information here can be very useful when you want to look up what the definition of an element is, where it is allowed, or what is allowed inside it. At the bottom are one or more examples, and a link through to a list of all the examples in the TEI Guidelines which use this element.
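To make the idea of a snippet being valid "assuming it was put in the right place" more concrete, here is a small illustrative fragment. It is a hypothetical example written for this exercise, not one copied from the Guidelines: a paragraph like this is only valid once it sits inside something that allows paragraphs (such as a body or div), and it assumes a schema that includes the namesdates module for persName.

```xml
<!-- Hypothetical fragment: valid only when placed where a <p> is allowed,
     for example inside <body> or <div>, with the namesdates module enabled -->
<p>This letter was written by <persName>Wilfred Owen</persName>
   to <persName>Leslie Gunston</persName>.</p>
```

Looking up p and persName on their reference pages will show where each is allowed and what each may contain.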
5.11.7 The TEI Consortium Website
The TEI-C Website http://www.tei-c.org/ is the central location leading to all things TEI related.
• Click the ’Home’ link on the menu (if you are still on the TEI Guidelines), and look at the homepage. Note the newsfeed that is provided on the left, and the menu bar.
• If you hover your mouse over the menus a drop-down menu should appear. Look at each menu and familiarise yourself with the kinds of resources on the TEI-C website.
• If you explore enough, you should also find items like the minutes of meetings of the Board and Technical Council. The TEI, where possible, attempts to conduct a large part of its business publicly since it is really a product of its own community (e.g. the Technical Council mailing list archives are openly available for public viewing, and most of the work is freely available on Sourceforge).
• The TEI Consortium provides XSLT stylesheets to transform TEI to various formats. These are what underlie the OxGarage conversions above. They are freely available for anyone to use from the TEI Sourceforge Subversion repository. You can read about them at Tools -> Stylesheets, or at: http://www.tei-c.org/Tools/Stylesheets/.
• Explore the rest of the TEI-C Website!
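If you have not met XSLT before, the following minimal sketch may help to show the general idea of a stylesheet that turns TEI into HTML. It is emphatically not one of the TEI Consortium's stylesheets, which are far more thorough. It is a toy XSLT 1.0 example written for illustration, and the rend value it matches (italic) is an assumption about how a document might be encoded.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tei="http://www.tei-c.org/ns/1.0"
    exclude-result-prefixes="tei">

  <xsl:output method="html" encoding="UTF-8"/>

  <!-- Wrap the whole TEI document in a basic HTML page,
       taking the page title from the teiHeader -->
  <xsl:template match="/tei:TEI">
    <html>
      <head>
        <title>
          <xsl:value-of select="tei:teiHeader/tei:fileDesc/tei:titleStmt/tei:title"/>
        </title>
      </head>
      <body>
        <xsl:apply-templates select="tei:text/tei:body"/>
      </body>
    </html>
  </xsl:template>

  <!-- A TEI paragraph becomes an HTML paragraph -->
  <xsl:template match="tei:p">
    <p><xsl:apply-templates/></p>
  </xsl:template>

  <!-- Assumption: italics are recorded as rend="italic" on hi -->
  <xsl:template match="tei:hi[@rend='italic']">
    <i><xsl:apply-templates/></i>
  </xsl:template>

</xsl:stylesheet>
```

Any XSLT processor should be able to apply a stylesheet of this kind to a TEI P5 file. Elements without a template of their own simply have their text passed through, which is one reason real conversions need many more rules than this.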
5.11.8 The TEI Wiki
The TEI Wiki is a community developed location for all sorts of TEI-related information.
• Go to http://wiki.tei-c.org/ and read the main page.
• How many different XSLT Stylesheets are provided on the TEI wiki?
• How many pages on ’Tools’ are there?
• What Special Interest Groups (SIGs) have pages on the TEI Wiki? Look at the Manuscripts SIG for an example.
• Look on the Technical Council page, to see its last agenda.
• In order to edit the wiki you need to request an account. If you think this is something you’ll need, why not do so now! (It may take a couple of days, since accounts are approved by volunteers!)
5.11.9 The TEI Sourceforge Site
The TEI Sourceforge site is currently used to manage the work of the TEI Technical Council in maintaining the TEI Guidelines and various tools that accompany them (e.g. Roma and the Stylesheets). It is also the location that allows the community to submit bug and feature reports.
• Go to http://tei.sourceforge.net/ and notice that there are links to the project summary, bug reports, feature requests, file downloads and code repository. Explore each of these!
• Click feature requests or go to http://purl.org/tei/fr (with no trailing slash).
• Look through some recently submitted feature requests and click on one of them.
• Read the details of the ticket, noting whether it has been assigned to anyone, and whether there are any comments at the bottom. If there are, read them.
• Anyone who has registered for a Sourceforge account is able to comment on tickets, and the TEI encourages community participation. (So if you see a ticket you want to comment on, register/login and comment!)
• Returning to http://tei.sourceforge.net/ click on ’Code Repository’. This allows you openly to browse the Subversion repository used by the TEI for its development.
• If you click on ’P5’, ’Source’, then ’Specs’, you’ll find the folder of all of the specifications for individual elements. If you choose one of these you’ll see the revision history for this element.
• The Sourceforge site is a useful repository for the TEI that allows it to undertake ongoing development in an open and transparent manner. Being able to post bugs and feature requests here makes you part of this development effort.
5.11.10 The TEI-L Mailing List
The TEI-L mailing list is the main form of communication within the TEI Community. Questions on the list come from people ranging from those entirely new to the TEI to those who have been using it for a couple of decades. You should not be afraid to post straightforward-sounding questions; just make sure you’ve checked the TEI Guidelines and Website first and are clear about what you want to do and why you are confused. You will almost certainly be guaranteed an answer, sometimes several competing ones!
• Go to the TEI-L Archives at: http://listserv.brown.edu/archives/cgi-bin/wa?A0=TEI-L (or click on the link in the last paragraph on the TEI-C Website home page).
• Read through some TEI-L messages from June 2012, choosing some which sound interesting based on their subject line.
• Try sorting by Date (they are sorted by Subject by default).
• Try searching (on the right-hand side) for some keyword that interests you.
• Consider subscribing to the mailing list!
• Other archives of the TEI-L messages are available from http://blog.gmane.org/gmane.text.tei.general and http://tei-l.970651.n3.nabble.com/.
5.11.11 TEI By Example
TEI By Example provides a variety of freely-available online tutorials that demonstrate a number of different stages in encoding a TEI file. There is a general introduction to text encoding, and step-by-step tutorials provide introductions to eight different aspects of TEI markup with lots of examples. Real-life examples are provided for each tutorial, and the theory can be tested with tests and exercises. A tools section gives an annotated overview of XML encoding technology and a validator for fragments of TEI.
• If you have not already done so, visit TEI By Example at: http://tbe.kantl.be/TBE/.
• Choose a Tutorial that is interesting to you and skim read it (you do not have to read it in depth at this point, you may choose to do so later).
• Look at the corresponding Examples section for that Tutorial, and see what things you do not understand. (Look them up in the Tutorial section.)
• Have a look at the corresponding Exercise for the Tutorial section. (You don’t need to do it, just get a sense of the kind of exercises provided.)
• TEI By Example is a good resource, and it would benefit you to return to it at a later date and work through it.
5.11.12 Self-Assessment
Check if you understand some of the core principles of this exercise by answering the following questions:
• What is OxGarage good at? What is it not good at?
• How do you get to an element’s reference page on the TEI Website?
• What kind of information do you find on the TEI Wiki?
• How do you submit a bug or feature request to the TEI?
• Have you joined the TEI-L mailing list?
• What do you think of TEI-By-Example?
5.11.13 Next?
This is the last exercise of this workshop, and we hope you feel like you’ve had a (quick!) but broad overview of some of the things the TEI can do. Your learning is by no means complete! Read the TEI Guidelines! Use TEI-By-Example! Join the TEI-L mailing list and ask questions! If you have Oxford-specific TEI questions you can email us on [email protected], but you are more likely to get a wider range of answers on the TEI-L mailing list. All of the exercises will be made available from a link on the DHOXSS website, and http://tei.oucs.ox.ac.uk/.
5.12 TEI reference material: summary of elements
The sections in this document summarize all the TEI elements as of June 2012.
TEI textstructure (TEI document) contains a single TEI-conformant document, comprising a TEI header and a text, either in isolation or as part of a teiCorpus element.
ab linking (anonymous block) contains any arbitrary component-level unit of text, acting as an anonymous container for phrase or inter level elements analogous to, but without the semantic baggage of, a paragraph.
abbr core (abbreviation) contains an abbreviation of any sort.
accMat msdescription (accompanying material) contains details of any significant additional material which may be closely associated with the manuscript being described, such as non-contemporaneous documents or fragments bound in with the manuscript at some earlier historical period.
acquisition msdescription contains any descriptive or other information concerning the process by which a manuscript or manuscript part entered the holding institution.
activity corpus contains a brief informal description of what a participant in a language interaction is doing other than speaking, if anything.
actor drama Name of an actor appearing within a cast list.
add core (addition) contains letters, words, or phrases inserted in the text by an author, scribe, annotator, or corrector.
addName namesdates (additional name) contains an additional name component, such as a nickname, epithet, or alias, or any other descriptive phrase used within a personal name.
addSpan transcr (added span of text) marks the beginning of a longer sequence of text added by an author, scribe, annotator or corrector (see also add).
additional msdescription groups additional information, combining bibliographic information about a manuscript, or surrogate copies of it with curatorial or administrative information.
additions msdescription contains a description of any significant additions found within a manuscript, such as marginalia or other annotations.
addrLine core (address line) contains one line of a postal address.
address core contains a postal address, for example of a publisher, an organization, or an individual.
adminInfo msdescription (administrative information) contains information about the present custody and availability of the manuscript, and also about the record description itself.
affiliation namesdates (affiliation) contains an informal description of a person’s present or past affiliation with some organization, for example an employer or sponsor.
age namesdates (age) specifies the age of a person.
alt linking (alternation) identifies an alternation or a set of choices among elements or passages.
altGrp linking (alternation group) groups a collection of alt elements and possibly pointers.
altIdent tagdocs (alternate identifier) supplies the recommended XML name for an element, class, attribute, etc. in some language.
altIdentifier msdescription (alternative identifier) contains an alternative or former structured identifier used for a manuscript, such as a former catalogue number.
am transcr (abbreviation marker) contains a sequence of letters or signs present in an abbreviation which are omitted or replaced in the expanded form of the abbreviation.
analytic core (analytic level) contains bibliographic elements describing an item (e.g. an article or poem) published within a monograph or journal and not as an independent publication.
anchor linking (anchor point) attaches an identifier to a point within a text, whether or not it corresponds with a textual element.
app textcrit (apparatus entry) contains one entry in a critical apparatus, with an optional lemma and at least one reading.
appInfo header (application information) records information about an application which has edited the TEI file.
application header provides information about an application which has acted upon the document.
arc nets encodes an arc, the connection from one node to another in a graph.
argument textstructure A formal list or prose description of the topics addressed by a subdivision of a text.
att tagdocs (attribute) contains the name of an attribute appearing within running text.
attDef tagdocs (attribute definition) contains the definition of a single attribute.
attList tagdocs contains documentation for all the attributes associated with this element, as a series of attDef elements.
attRef tagdocs (attribute pointer) points to the definition of an attribute or group of attributes.
author core in a bibliographic reference, contains the name(s) of an author, personal or corporate, of a work; for example in the same form as that provided by a recognized bibliographic name authority.
authority header (release authority) supplies the name of a person or other agency responsible for making an electronic file available, other than a publisher or distributor.
availability header supplies information about the availability of a text, for example any restrictions on its use or distribution, its copyright status, any licence applying to it, etc.
back textstructure (back matter) contains any appendixes, etc. following the main part of a text.
bibl core (bibliographic citation) contains a loosely-structured bibliographic citation of which the sub-components may or may not be explicitly tagged.
biblFull header (fully-structured bibliographic citation) contains a fully-structured bibliographic citation, in which all components of the TEI file description are present.
biblScope core (scope of citation) defines the scope of a bibliographic reference, for example as a list of page numbers, or a named subdivision of a larger work.
biblStruct core (structured bibliographic citation) contains a structured bibliographic citation, in which only bibliographic sub-elements appear and in a specified order.
bicond iso-fs (bi-conditional feature-structure constraint) defines a biconditional feature-structure constraint; both consequent and antecedent are specified as feature structures or groups of feature structures; the constraint is satisfied if both subsume a given feature structure, or if both do not.
binary iso-fs (binary value) represents the value part of a feature-value specification which can contain either of exactly two possible values.
binaryObject core provides encoded binary data representing an inline graphic or other object.
binding msdescription contains a description of one binding, i.e. type of covering, boards, etc. applied to a manuscript.
bindingDesc msdescription (binding description) describes the present and former bindings of a manuscript, either as a series of paragraphs or as a series of distinct binding elements, one for each binding of the manuscript.
birth namesdates (birth) contains information about a person’s birth, such as its date and place.
bloc namesdates (bloc) contains the name of a geo-political unit consisting of two or more nation states or countries.
body textstructure (text body) contains the whole body of a single unitary text, excluding any front or back matter.
broadcast spoken describes a broadcast used as the source of a spoken text.
byline textstructure contains the primary statement of responsibility given for a work on its title page or at the head or end of the work.
c analysis (character) represents a character.
cRefPattern header (canonical reference pattern) specifies an expression and replacement pattern for transforming a canonical reference into a URI.
caesura verse marks the point at which a metrical line may be divided.
calendar header describes a calendar or dating system used in a dating formula in the text.
calendarDesc header (calendar description) contains a description of the calendar system used in any dating expression found in the text.
camera drama describes a particular camera angle or viewpoint in a screen play.
caption drama contains the text of a caption or other text displayed as part of a film script or screenplay.
case dictionaries contains grammatical case information given by a dictionary for a given form.
castGroup drama (cast list grouping) groups one or more individual castItem elements within a cast list.
castItem drama (cast list item) contains a single entry within a cast list, describing either a single role or a list of non-speaking roles.
castList drama (cast list) contains a single cast list or dramatis personae.
catDesc header (category description) describes some category within a taxonomy or text typology, either in the form of a brief prose description or in terms of the situational parameters used by the TEI formal textDesc.
catRef header (category reference) specifies one or more defined categories within some taxonomy or text typology.
catchwords msdescription describes the system used to ensure correct ordering of the quires making up a codex or incunable, typically by means of annotations at the foot of the page.
category header contains an individual descriptive category, possibly nested within a superordinate category, within a user-defined taxonomy.
cb core (column break) marks the boundary between one column of a text and the next in a standard reference system.
cell figures contains one cell of a table.
certainty certainty indicates the degree of certainty associated with some aspect of the text markup.
change header documents a change or set of changes made during the production of a source document, or during the revision of an electronic file.
channel corpus (primary channel) describes the medium or channel by which a text is delivered or experienced. For a written text, this might be print, manuscript, e-mail, etc.; for a spoken one, radio, telephone, face-to-face, etc.
char gaiji (character) provides descriptive information about a character.
charDecl gaiji (character declarations) provides information about nonstandard characters and glyphs.
charName gaiji (character name) contains the name of a character, expressed following Unicode conventions.
charProp gaiji (character property) provides a name and value for some property of the parent character or glyph.
choice core groups a number of alternative encodings for the same point in a text.
cit core (cited quotation) contains a quotation from some other document, together with a bibliographic reference to its source. In a dictionary it may contain an example text with at least one occurrence of the word form, used in the sense being described, or a translation of the headword, or an example.
cl analysis (clause) represents a grammatical clause.
classCode header (classification code) contains the classification code used for this text in some standard classification system.
classDecl header (classification declarations) contains one or more taxonomies defining any classificatory codes used elsewhere in the text.
classRef tagdocs points to the specification for an attribute or model class which is to be included in a schema.
classSpec tagdocs (class specification) contains reference information for a TEI element class; that is a group of elements which appear together in content models, or which share some common attribute, or both.
classes tagdocs specifies all the classes of which the documented element or class is a member or subclass.
climate namesdates (climate) contains information about the physical climate of a place.
closer textstructure groups together salutations, datelines, and similar phrases appearing as a final group at the end of a division, especially of a letter.
code tagdocs contains literal code from some formal language such as a programming language.
collation msdescription contains a description of how the leaves or bifolia are physically arranged.
collection msdescription contains the name of a collection of manuscripts, not necessarily located within a single repository.
colloc dictionaries (collocate) contains a collocate of the headword.
colophon msdescription contains the colophon of a manuscript item: that is, a statement providing information regarding the date, place, agency, or reason for production of the manuscript.
cond iso-fs (conditional feature-structure constraint) defines a conditional feature-structure constraint; the consequent and the antecedent are specified as feature structures or feature-structure collections; the constraint is satisfied if both the antecedent and the consequent subsume a given feature structure, or if the antecedent does not.
condition msdescription contains a description of the physical condition of the manuscript.
constitution corpus describes the internal composition of a text or text sample, for example as fragmentary, complete, etc.
constraint tagdocs (constraint rules) the formal rules of a constraint.
constraintSpec tagdocs (constraint on schema) contains a constraint, expressed in some formal syntax, which cannot be expressed in the structural content model.
content tagdocs (content model) contains the text of a declaration for the schema documented.
corr core (correction) contains the correct form of a passage apparently erroneous in the copy text.
correction header (correction principles) states how and under what circumstances corrections have been made in the text.
country namesdates (country) contains the name of a geo-political unit, such as a nation, country, colony, or commonwealth, larger than or administratively superior to a region and smaller than a bloc.
creation header contains information about the creation of a text.
custEvent msdescription (custodial event) describes a single event during the custodial history of a manuscript.
custodialHist msdescription (custodial history) contains a description of a manuscript’s custodial history, either as running prose or as a series of dated custodial events.
damage transcr contains an area of damage to the text witness.
damageSpan transcr (damaged span of text) marks the beginning of a longer sequence of text which is damaged in some way but still legible.
datatype tagdocs specifies the declared value for an attribute, by referring to any datatype defined by the chosen schema language.
date core contains a date in any format.
dateline textstructure contains a brief description of the place, date, time, etc. of production of a letter, newspaper story, or other work, prefixed or suffixed to it as a kind of heading or trailer.
death namesdates (death) contains information about a person’s death, such as its date and place.
decoDesc msdescription (decoration description) contains a description of the decoration of a manuscript, either as a sequence of paragraphs, or as a sequence of topically organized decoNote elements.
decoNote msdescription (note on decoration) contains a note describing either a decorative component of a manuscript, or a fairly homogenous class of such components.
def dictionaries (definition) contains definition text in a dictionary entry.
default iso-fs (default feature value) represents the value part of a feature-value specification which contains a defaulted value.
defaultVal tagdocs (default value) specifies the default declared value for an attribute.
del core (deletion) contains a letter, word, or passage deleted, marked as deleted, or otherwise indicated as superfluous or spurious in the copy text by an author, scribe, annotator, or corrector.
delSpan transcr (deleted span of text) marks the beginning of a longer sequence of text deleted, marked as deleted, or otherwise signaled as superfluous or spurious by an author, scribe, annotator, or corrector.
depth msdescription contains a measurement measured across the spine of a book or codex, or (for other text-bearing objects) perpendicular to the measurement given by the width element.
derivation corpus describes the nature and extent of originality of this text.
desc core (description) contains a brief description of the object documented by its parent element, including its intended usage, purpose, or application where this is appropriate.
dictScrap dictionaries (dictionary scrap) encloses a part of a dictionary entry in which other phrase-level dictionary elements are freely combined.
dim msdescription contains any single measurement forming part of a dimensional specification of some sort.
dimensions msdescription contains a dimensional specification.
distinct core identifies any word or phrase which is regarded as linguistically distinct, for example as archaic, technical, dialectal, non-preferred, etc., or as forming part of a sublanguage.
distributor header supplies the name of a person or other agency responsible for the distribution of a text.
district namesdates contains the name of any kind of subdivision of a settlement, such as a parish, ward, or other administrative or geographic unit.
div textstructure (text division) contains a subdivision of the front, body, or back of a text.
div1 textstructure (level-1 text division) contains a first-level subdivision of the front, body, or back of a text.
div2 textstructure (level-2 text division) contains a second-level subdivision of the front, body, or back of a text.
div3 textstructure (level-3 text division) contains a third-level subdivision of the front, body, or back of a text.
div4 textstructure (level-4 text division) contains a fourth-level subdivision of the front, body, or back of a text.
div5 textstructure (level-5 text division) contains a fifth-level subdivision of the front, body, or back of a text.
div6 textstructure (level-6 text division) contains a sixth-level subdivision of the front, body, or back of a text.
div7 textstructure (level-7 text division) contains the smallest possible subdivision of the front, body or back of a text, larger than a paragraph.
divGen core (automatically generated text division) indicates the location at which a textual division generated automatically by a text-processing application is to appear.
docAuthor textstructure (document author) contains the name of the author of the document, as given on the title page (often but not always contained in a byline).
docDate textstructure (document date) contains the date of a document, as given (usually) on a title page.
docEdition textstructure (document edition) contains an edition statement as presented on a title page of a document.
docImprint textstructure (document imprint) contains the imprint statement (place and date of publication, publisher name), as given (usually) at the foot of a title page.
docTitle textstructure (document title) contains the title of a document, including all its constituents, as given on a title page.
domain corpus (domain of use) describes the most important social context in which the text was realized or for which it is intended, for example private vs. public, education, religion, etc.
eLeaf nets (leaf or terminal node of an embedding tree) provides explicitly for a leaf of an embedding tree, which may also be encoded with the eTree element.
eTree nets (embedding tree) provides an alternative to tree element for representing ordered rooted tree structures.
edition header (edition) describes the particularities of one edition of a text.
editionStmt header (edition statement) groups information relating to one edition of a text.
editor core secondary statement of responsibility for a bibliographic item, for example the name of an individual, institution or organization, (or of several such) acting as editor, compiler, translator, etc.
editorialDecl header (editorial practice declaration) provides details of editorial principles and practices applied during the encoding of a text.
education namesdates contains a description of the educational experience of a person.
eg tagdocs (example) contains any kind of illustrative example.
egXML tagdocs (example of XML) contains a single well-formed XML fragment demonstrating the use of some XML element or attribute, in which the egXML element itself functions as the root element.
elementRef tagdocs points to the specification for some element which is to be included in a schema.
elementSpec tagdocs (element specification) documents the structure, content, and purpose of a single element type.
email core (electronic mail address) contains an e-mail address identifying a location to which e-mail messages can be delivered.
emph core (emphasized) marks words or phrases which are stressed or emphasized for linguistic or rhetorical effect.
encodingDesc header (encoding description) documents the relationship between an electronic text and the source or sources from which it was derived.
entry dictionaries contains a single structured entry in any kind of lexical resource, such as a dictionary or lexicon.
entryFree dictionaries (unstructured entry) contains a single unstructured entry in any kind of lexical resource, such as a dictionary or lexicon.
epigraph textstructure contains a quotation, anonymous or attributed, appearing at the start or end of a section or on a title page.
epilogue drama contains the epilogue to a drama, typically spoken by an actor out of character, possibly in association with a particular performance or venue.
equipment spoken provides technical details of the equipment and media used for an audio or video recording used as the source for a spoken text.
equiv tagdocs (equivalent) specifies a component which is considered equivalent to the parent element, either by co-reference, or by external link.
etym dictionaries (etymology) encloses the etymological information in a dictionary entry.
event namesdates (event) contains data relating to any kind of significant event associated with a person, place, or organization.
ex transcr (editorial expansion) contains a sequence of letters added by an editor or transcriber when expanding an abbreviation.
exemplum tagdocs groups an example demonstrating the use of an element along with optional paragraphs of commentary.
expan core (expansion) contains the expansion of an abbreviation.
explicit msdescription contains the explicit of a manuscript item, that is, the closing words of the text proper, exclusive of any rubric or colophon which might follow it.
extent header describes the approximate size of a text as stored on some carrier medium, whether digital or non-digital, specified in any convenient units.
f iso-fs (feature) represents a feature value specification, that is, the association of a name with a value of any of several different types.
fDecl iso-fs (feature declaration) declares a single feature, specifying its name, organization, range of allowed values, and optionally its default value.
fDescr iso-fs (feature description (in FSD)) describes in prose what is represented by the feature being declared and its values.
fLib iso-fs (feature library) assembles a library of feature elements.
facsimile transcr contains a representation of some written source in the form of a set of images rather than as transcribed or encoded text.
factuality corpus describes the extent to which the text may be regarded as imaginative or non-imaginative, that is, as describing a fictional or a non-fictional world.
faith namesdates specifies the faith, religion, or belief set of a person.
figDesc figures (description of figure) contains a brief prose description of the appearance or content of a graphic figure, for use when documenting an image without displaying it.
figure figures groups elements representing or containing graphic information such as an illustration, formula, or figure.
fileDesc header (file description) contains a full bibliographic description of an electronic file.
filiation msdescription contains information concerning the manuscript’s filiation, i.e. its relationship to other surviving manuscripts of the same text, its protographs, antigraphs and apographs.
finalRubric msdescription contains the string of words that denotes the end of a text division, often with an assertion as to its author and title, usually set off from the text itself by red ink, by a different size or type of script, or by some other such visual device.
floatingText textstructure contains a single text of any kind, whether unitary or composite, which interrupts the text containing it at any point and after which the surrounding text resumes.
floruit namesdates contains information about a person’s period of activity.
foliation msdescription describes the numbering system or systems used to count the leaves or pages in a codex.
foreign core (foreign) identifies a word or phrase as belonging to some language other than that of the surrounding text.
forename namesdates contains a forename, given or baptismal name.
forest nets provides for groups of rooted trees.
form dictionaries (form information group) groups all the information on the written and spoken forms of one headword.
formula figures contains a mathematical or other formula.
front textstructure (front matter) contains any prefatory matter (headers, title page, prefaces, dedications, etc.) found at the start of a document, before the main body.
fs iso-fs (feature structure) represents a feature structure, that is, a collection of feature-value pairs organized as a structural unit.
fsConstraints iso-fs (feature-structure constraints) specifies constraints on the content of valid feature structures.
fsDecl iso-fs (feature structure declaration) declares one type of feature structure.
fsDescr iso-fs (feature system description (in FSD)) describes in prose what is represented by the type of feature structure declared in the enclosing fsDecl.
fsdDecl iso-fs (feature system declaration) provides a feature system declaration comprising one or more feature structure declarations or feature structure declaration links.
fsdLink iso-fs (feature structure declaration link) associates the name of a typed feature structure with a feature structure declaration for it.
funder header (funding body) specifies the name of an individual, institution, or organization responsible for the funding of a project or text.
fvLib iso-fs (feature-value library) assembles a library of reusable feature value elements (including complete feature structures).
fw transcr (forme work) contains a running head (e.g. a header, footer), catchword, or similar material appearing on the current page.
g gaiji (character or glyph) represents a non-standard character or glyph.
gap core (gap) indicates a point where material has been omitted in a transcription, whether for editorial reasons described in the TEI header, as part of sampling practice, or because the material is illegible, invisible, or inaudible.
gb core (gathering begins) marks the point in a transcribed codex at which a new gathering or quire begins.
gen dictionaries (gender) identifies the morphological gender of a lexical item, as given in the dictionary.
genName namesdates (generational name component) contains a name component used to distinguish otherwise similar names on the basis of the relative ages or generations of the persons named.
geo namesdates (geographical coordinates) contains any expression of a set of geographic coordinates, representing a point, line, or area on the surface of the earth in some notation.
geoDecl header (geographic coordinates declaration) documents the notation and the datum used for geographic coordinates expressed as content of the geo element elsewhere within the document.
geogFeat namesdates (geographical feature name) contains a common noun identifying some geographical feature contained within a geographic name, such as valley, mount, etc.
geogName namesdates (geographical name) a name associated with some geographical feature such as Windrush Valley or Mount Sinai.
gi tagdocs (element name) contains the name (generic identifier) of an element.
gloss core identifies a phrase or word used to provide a gloss or definition for some other word or phrase.
glyph gaiji (character glyph) provides descriptive information about a character glyph.
glyphName gaiji (character glyph name) contains the name of a glyph, expressed following Unicode conventions for character names.
gram dictionaries (grammatical information) within an entry in a dictionary or a terminological data file, contains grammatical information relating to a term, word, or form.
gramGrp dictionaries (grammatical information group) groups morpho-syntactic information about a lexical item, e.g. pos, gen, number, case, or iType (inflectional class).
graph nets encodes a graph, which is a collection of nodes, and arcs which connect the nodes.
graphic core indicates the location of an inline graphic, illustration, or figure.
group textstructure contains the body of a composite text, grouping together a sequence of distinct texts (or groups of such texts) which are regarded as a unit for some purpose, for example the collected works of an author, a sequence of prose essays, etc.
handDesc msdescription (description of hands) contains a description of all the different kinds of writing used in a manuscript.
handNote header (note on hand) describes a particular style or hand distinguished within a manuscript.
handNotes transcr contains one or more handNote elements documenting the different hands identified within the source texts.
handShift transcr marks the beginning of a sequence of text written in a new hand, or the beginning of a scribal stint.
head core (heading) contains any type of heading, for example the title of a section, or the heading of a list, glossary, manuscript description, etc.
headItem core (heading for list items) contains the heading for the item or gloss column in a glossary list or similar structured list.
headLabel core (heading for list labels) contains the heading for the label or term column in a glossary list or similar structured list.
height msdescription contains a measurement measured along the axis at right angles to the bottom of the written surface, i.e. parallel to the spine for a codex or book.
heraldry msdescription contains a heraldic formula or phrase, typically found as part of a blazon, coat of arms, etc.
hi core (highlighted) marks a word or phrase as graphically distinct from the surrounding text, for reasons concerning which no claim is made.
history msdescription groups elements describing the full history of a manuscript or manuscript part.
hom dictionaries (homograph) groups information relating to one homograph within an entry.
hyph dictionaries (hyphenation) contains a hyphenated form of a dictionary headword, or hyphenation information in some other form.
hyphenation header summarizes the way in which hyphenation in a source text has been treated in an encoded version of it.
iNode nets (intermediate (or internal) node) represents an intermediate (or internal) node of a tree.
iType dictionaries (inflectional class) indicates the inflectional class associated with a lexical item.
ident tagdocs (identifier) contains an identifier or name for an object of some kind in a formal language. ident is used for tokens such as variable names, class names, type names, function names etc. in formal programming languages.
idno header (identifier) supplies any form of identifier used to identify some object, such as a bibliographic item, a person, a title, an organization, etc. in a standardized way.
if iso-fs defines a conditional default value for a feature; the condition is specified as a feature structure, and is met if it subsumes the feature structure in the text for which a default value is sought.
iff iso-fs (if and only if) separates the condition from the consequence in a bicond element.
imprimatur textstructure contains a formal statement authorizing the publication of a work, sometimes required to appear on a title page or its verso.
imprint core groups information relating to the publication or distribution of a bibliographic item.
incident spoken any phenomenon or occurrence, not necessarily vocalized or communicative, for example incidental noises or other events affecting communication.
incipit msdescription contains the incipit of a manuscript item, that is the opening words of the text proper, exclusive of any rubric which might precede it, of sufficient length to identify the work uniquely; such incipits were, in former times, frequently used as a means of reference to a work, in place of a title.
index core (index entry) marks a location to be indexed for whatever purpose.
institution msdescription contains the name of an organization such as a university or library, with which a manuscript is identified, generally its holding institution.
interaction corpus describes the extent, cardinality and nature of any interaction among those producing and experiencing the text, for example in the form of response or interjection, commentary, etc.
interp analysis (interpretation) summarizes a specific interpretative annotation which can be linked to a span of text.
interpGrp analysis (interpretation group) collects together a set of related interpretations which share responsibility or type.
interpretation header describes the scope of any analytic or interpretive information added to the text in addition to the transcription.
item core contains one component of a list.
join linking identifies a possibly fragmented segment of text, by pointing at the possibly discontiguous elements which compose it.
joinGrp linking (join group) groups a collection of join elements and possibly pointers.
keywords header contains a list of keywords or phrases identifying the topic or nature of a text.
kinesic spoken any communicative phenomenon, not necessarily vocalized, for example a gesture, frown, etc.
l core (verse line) contains a single, possibly incomplete, line of verse.
label core contains any label or heading used to identify part of a text, typically but not exclusively in a list or glossary.
lacunaEnd textcrit indicates the end of a lacuna in a mostly complete textual witness.
lacunaStart textcrit indicates the beginning of a lacuna in the text of a mostly complete textual witness.
lang dictionaries (language name) name of a language mentioned in etymological or other linguistic discussion.
langKnowledge namesdates (language knowledge) summarizes the state of a person’s linguistic knowledge, either as prose or by a list of langKnown elements.
langKnown namesdates (language known) summarizes the state of a person’s linguistic competence, i.e., knowledge of a single language.
langUsage header (language usage) describes the languages, sublanguages, registers, dialects, etc. represented within a text.
language header characterizes a single language or sublanguage used within a text.
layout msdescription describes how text is laid out on the page, including information about any ruling, pricking, or other evidence of page-preparation techniques.
layoutDesc msdescription (layout description) collects the set of layout descriptions applicable to a manuscript.
lb core (line break) marks the start of a new (typographic) line in some edition or version of a text.
lbl dictionaries (label) contains a label for a form, example, translation, or other piece of information, e.g. abbreviation for, contraction of, literally, approximately, synonyms:, etc.
leaf nets encodes the leaves (terminal nodes) of a tree.
lem textcrit (lemma) contains the lemma, or base text, of a textual variation.
lg core (line group) contains one or more verse lines functioning as a formal unit, e.g. a stanza, refrain, verse paragraph, etc.
licence header contains information about a licence or other legal agreement applicable to the text.
line transcr contains the transcription of a topographic line in the source document.
link linking defines an association or hypertextual link among elements or passages, of some type not more precisely specifiable by other elements.
linkGrp linking (link group) defines a collection of associations or hypertextual links.
list core (list) contains any sequence of items organized as a list.
listBibl core (citation list) contains a list of bibliographic citations of any kind.
listChange header groups a number of change descriptions associated with either the creation of a source text or the revision of an encoded text.
listEvent namesdates (list of events) contains a list of descriptions, each of which provides information about an identifiable event.
listForest nets provides for lists of forests.
listNym namesdates (list of canonical names) contains a list of nyms, that is, standardized names for any thing.
listOrg namesdates (list of organizations) contains a list of elements, each of which provides information about an identifiable organization.
listPerson namesdates (list of persons) contains a list of descriptions, each of which provides information about an identifiable person or a group of people, for example the participants in a language interaction, or the people referred to in a historical source.
listPlace namesdates (list of places) contains a list of places, optionally followed by a list of relationships (other than containment) defined amongst them.
listRef tagdocs (list of references) supplies a list of significant references to places where this element is discussed, in the current document or elsewhere.
listRelation namesdates provides information about relationships identified amongst people, places, and organizations, either informally as prose or as formally expressed relation links.
listTranspose transcr supplies a list of transpositions, each of which is indicated at some point in a document typically by means of metamarks.
listWit textcrit (witness list) lists definitions for all the witnesses referred to by a critical apparatus, optionally grouped hierarchically.
localName gaiji (locally-defined property name) contains a locally defined name for some property.
locale corpus contains a brief informal description of the kind of place concerned, for example: a room, a restaurant, a park bench, etc.
location namesdates defines the location of a place as a set of geographical coordinates, in terms of other named geo-political entities, or as an address.
locus msdescription defines a location within a manuscript or manuscript part, usually as a (possibly discontinuous) sequence of folio references.
locusGrp msdescription groups a number of locations which together form a distinct but discontinuous item within a manuscript or manuscript part, according to a specific foliation.
m analysis (morpheme) represents a grammatical morpheme.
macroRef tagdocs points to the specification for some pattern which is to be included in a schema.
macroSpec tagdocs (macro specification) documents the function and implementation of a pattern.
mapping gaiji (character mapping) contains one or more characters which are related to the parent character or glyph in some respect, as specified by the type attribute.
material msdescription contains a word or phrase describing the material of which the object being described is composed.
measure core contains a word or phrase referring to some quantity of an object or commodity, usually comprising a number, a unit, and a commodity name.
measureGrp core (measure group) contains a group of dimensional specifications which relate to the same object, for example the height and width of a manuscript page.
meeting core contains the formalized descriptive title for a meeting or conference, for use in a bibliographic description for an item derived from such a meeting, or as a heading or preamble to publications emanating from it.
memberOf tagdocs specifies class membership of the documented element or class.
mentioned core marks words or phrases mentioned, not used.
metDecl verse (metrical notation declaration) documents the notation employed to represent a metrical pattern when this is specified as the value of a met, real, or rhyme attribute on any structural element of a metrical text (e.g. lg, l, or seg).
metSym verse (metrical notation symbol) documents the intended significance of a particular character or character sequence within a metrical notation, either explicitly or in terms of other symbol elements in the same metDecl.
metamark transcr contains or describes any kind of graphic or written signal within a document the function of which is to determine how it should be read rather than forming part of the actual content of the document.
milestone core marks a boundary point separating any kind of section of a text, typically but not necessarily indicating a point at which some part of a standard reference system changes, where the change is not represented by a structural element.
mod transcr represents any kind of modification identified within a single document.
moduleRef tagdocs (module reference) references a module which is to be incorporated into a schema.
moduleSpec tagdocs (module specification) documents the structure, content, and purpose of a single module, i.e. a named and externally visible group of declarations.
monogr core (monographic level) contains bibliographic elements describing an item (e.g. a book or journal) published as an independent item (i.e. as a separate physical object).
mood dictionaries contains information about the grammatical mood of verbs (e.g. indicative, subjunctive, imperative).
move drama (movement) marks the actual entrance or exit of one or more characters on stage.
msContents msdescription (manuscript contents) describes the intellectual content of a manuscript or manuscript part, either as a series of paragraphs or as a series of structured manuscript items.
msDesc msdescription (manuscript description) contains a description of a single identifiable manuscript or other text-bearing object.
msIdentifier msdescription (manuscript identifier) contains the information required to identify the manuscript being described.
msItem msdescription (manuscript item) describes an individual work or item within the intellectual content of a manuscript or manuscript part.
msItemStruct msdescription (structured manuscript item) contains a structured description for an individual work or item within the intellectual content of a manuscript or manuscript part.
msName msdescription (alternative name) contains any form of unstructured alternative name used for a manuscript, such as an ocellus nominum, or nickname.
msPart msdescription (manuscript part) contains information about an originally distinct manuscript or part of a manuscript, now forming part of a composite manuscript.
musicNotation msdescription contains description of type of musical notation.
name core (name, proper noun) contains a proper noun or noun phrase.
nameLink namesdates (name link) contains a connecting phrase or link used within a name but not regarded as part of it, such as van der or of.
namespace header supplies the formal name of the namespace to which the elements documented by its children belong.
nationality namesdates contains an informal description of a person’s present or past nationality or citizenship.
node nets encodes a node, a possibly labeled point in a graph.
normalization header indicates the extent of normalization or regularization of the original source carried out in converting it to electronic form.
notatedMusic figures encodes the presence of music notation in a text.
note core contains a note or annotation.
notesStmt header (notes statement) collects together any notes providing information about a text additional to that recorded in other parts of the bibliographic description.
num core (number) contains a number, written in any form.
number dictionaries indicates grammatical number associated with a form, as given in a dictionary.
numeric iso-fs (numeric value) represents the value part of a feature-value specification which contains a numeric value or range.
nym namesdates (canonical name) contains the definition for a canonical name or name component of any kind.
oRef dictionaries (orthographic-form reference) in a dictionary example, indicates a reference to the orthographic form(s) of the headword.
oVar dictionaries (orthographic-variant reference) in a dictionary example, indicates a reference to variant orthographic form(s) of the headword.
objectDesc msdescription contains a description of the physical components making up the object which is being described.
objectType msdescription contains a word or phrase describing the type of object being referred to.
occupation namesdates contains an informal description of a person’s trade, profession or occupation.
offset namesdates that part of a relative temporal or spatial expression which indicates the direction of the offset between the two place names, dates, or times involved in the expression.
opener textstructure groups together dateline, byline, salutation, and similar phrases appearing as a preliminary group at the start of a division, especially of a letter.
org namesdates (organization) provides information about an identifiable organization such as a business, a tribe, or any other grouping of people.
orgName namesdates (organization name) contains an organizational name.
orig core (original form) contains a reading which is marked as following the original, rather than being normalized or corrected.
origDate msdescription (origin date) contains any form of date, used to identify the date of origin for a manuscript or manuscript part.
origPlace msdescription (origin place) contains any form of place name, used to identify the place of origin for a manuscript or manuscript part.
origin msdescription contains any descriptive or other information concerning the origin of a manuscript or manuscript part.
orth dictionaries (orthographic form) gives the orthographic form of a dictionary headword.
p core (paragraph) marks paragraphs in prose.
pRef dictionaries (pronunciation reference) in a dictionary example, indicates a reference to the pronunciation(s) of the headword.
pVar dictionaries (pronunciation-variant reference) in a dictionary example, indicates a reference to variant pronunciation(s) of the headword.
particDesc corpus (participation description) describes the identifiable speakers, voices, or other participants in any kind of text.
pause spoken a pause either between or within utterances.
pb core (page break) marks the boundary between one page of a text and the next in a standard reference system.
pc analysis (punctuation character) a character or string of characters regarded as constituting a single punctuation mark.
per dictionaries (person) contains an indication of the grammatical person (1st, 2nd, 3rd, etc.) associated with a given inflected form in a dictionary.
performance drama contains a section of front or back matter describing how a dramatic piece is to be performed in general or how it was performed on some specific occasion.
persName namesdates (personal name) contains a proper noun or proper-noun phrase referring to a person, possibly including one or more of the person’s forenames, surnames, honorifics, added names, etc.
person namesdates provides information about an identifiable individual, for example a participant in a language interaction, or a person referred to in a historical source.
personGrp namesdates (personal group) describes a group of individuals treated as a single person for analytic purposes.
phr analysis (phrase) represents a grammatical phrase.
physDesc msdescription (physical description) contains a full physical description of a manuscript or manuscript part, optionally subdivided using more specialized elements from the model.physDescPart class.
place namesdates contains data about a geographic location.
placeName namesdates contains an absolute or relative place name.
population namesdates contains information about the population of a place.
pos dictionaries (part of speech) indicates the part of speech assigned to a dictionary headword such as noun, verb, or adjective.
postBox core (postal box or post office box) contains a number or other identifier for some postal delivery point other than a street address.
postCode core (postal code) contains a numerical or alphanumeric code used as part of a postal address to simplify sorting or delivery of mail.
postscript textstructure contains a postscript, e.g. to a letter.
precision certainty indicates the numerical accuracy or precision associated with some aspect of the text markup.
preparedness corpus describes the extent to which a text may be regarded as prepared or spontaneous.
principal header (principal researcher) supplies the name of the principal researcher responsible for the creation of an electronic text.
profileDesc header (text-profile description) provides a detailed description of non-bibliographic aspects of a text, specifically the languages and sublanguages used, the situation in which it was produced, the participants and their setting.
projectDesc header (project description) describes in detail the aim or purpose for which an electronic file was encoded, together with any other relevant information concerning the process by which it was assembled or collected.
prologue drama contains the prologue to a drama, typically spoken by an actor out of character, possibly in association with a particular performance or venue.
pron dictionaries (pronunciation) contains the pronunciation(s) of the word.
provenance msdescription contains any descriptive or other information concerning a single identifiable episode during the history of a manuscript or manuscript part, after its creation but before its acquisition.
ptr core (pointer) defines a pointer to another location.
pubPlace core (publication place) contains the name of the place where a bibliographic item was published.
publicationStmt header (publication statement) groups information concerning the publication or distribution of an electronic or other text.
publisher core provides the name of the organization responsible for the publication or distribution of a bibliographic item.
purpose corpus characterizes a single purpose or communicative function of the text.
q core (quoted) contains material which is distinguished from the surrounding text using quotation marks or a similar method, for any one of a variety of reasons including, but not limited to: direct speech or thought, technical terms or jargon, authorial distance, quotations from elsewhere, and passages that are mentioned but not used.
quotation header specifies editorial practice adopted with respect to quotation marks in the original.
quote core (quotation) contains a phrase or passage attributed by the narrator or author to some agency external to the text.
rdg textcrit (reading) contains a single reading within a textual variation.
rdgGrp textcrit (reading group) within a textual variation, groups two or more readings perceived to have a genetic relationship or other affinity.
re dictionaries (related entry) contains a dictionary entry for a lexical item related to the headword, such as a compound phrase or derived form, embedded inside a larger entry.
recordHist msdescription (recorded history) provides information about the source and revision status of the parent manuscript description itself.
recording spoken (recording event) details of an audio or video recording event used as the source of a spoken text, either directly or from a public broadcast.
recordingStmt spoken (recording statement) describes a set of recordings used as the basis for transcription of a spoken text.
redo transcr indicates one or more cancelled interventions in a document which have subsequently been marked as reaffirmed or repeated.
ref core (reference) defines a reference to another location, possibly modified by additional text or comment.
refState header (reference state) specifies one component of a canonical reference defined by the milestone method.
refsDecl header (references declaration) specifies how canonical references are constructed for this text.
reg core (regularization) contains a reading which has been regularized or normalized in some sense.
region namesdates contains the name of an administrative unit such as a state, province, or county, larger than a settlement, but smaller than a country.
relatedItem core contains or references some other bibliographic item which is related to the present one in some specified manner, for example as a constituent or alternative version of it.
relation namesdates (relationship) describes any kind of relationship or linkage amongst a specified group of objects, places, events or people.
relationGrp namesdates (relation group) provides information about relationships identified amongst people, places, and organizations, either informally as prose or as formally expressed relation links.
remarks tagdocs contains any commentary or discussion about the usage of an element, attribute, class, or entity not otherwise documented within the containing element.
rendition header supplies information about the rendition or appearance of one or more elements in the source text.
repository msdescription contains the name of a repository within which manuscripts are stored, possibly forming part of an institution.
residence namesdates (residence) describes a person’s present or past places of residence.
resp core (responsibility) contains a phrase describing the nature of a person’s intellectual responsibility, or an organization’s role in the production or distribution of a work.
respStmt core (statement of responsibility) supplies a statement of responsibility for the intellectual content of a text, edition, recording, or series, where the specialized elements for authors, editors, etc. do not suffice or do not apply. May also be used to encode information about individuals or organizations which have played a role in the production or distribution of a bibliographic work.
respons certainty (responsibility) identifies the individual(s) responsible for some aspect of the markup of particular element(s).
restore transcr indicates restoration of text to an earlier state by cancellation of an editorial or authorial marking or instruction.
retrace transcr contains a sequence of writing which has been retraced, for example by over-inking, to clarify or fix it.
revisionDesc header (revision description) summarizes the revision history for a file.
rhyme verse marks the rhyming part of a metrical line.
role drama the name of a dramatic role, as given in a cast list.
roleDesc drama (role description) describes a character’s role in a drama.
roleName namesdates contains a name component which indicates that the referent has a particular role or position in society, such as an official title or rank.
root nets (root node) represents the root node of a tree.
row figures contains one row of a table.
rs core (referencing string) contains a general purpose name or referring string.
rubric msdescription contains the text of any rubric or heading attached to a particular manuscript item, that is, a string of words through which a manuscript signals the beginning of a text division, often with an assertion as to its author and title, which is in some way set off from the text itself, usually in red ink, or by use of different size or type of script, or some other such visual device.
s analysis (s-unit) contains a sentence-like division of a text.
said core (speech or thought) indicates passages thought or spoken aloud, whether explicitly indicated in the source or not, whether directly or indirectly reported, whether by real people or fictional characters.
salute textstructure (salutation) contains a salutation or greeting prefixed to a foreword, dedicatory epistle, or other division of a text, or the salutation in the closing of a letter, preface, etc.
samplingDecl header (sampling declaration) contains a prose description of the rationale and methods used in sampling texts in the creation of a corpus or collection.
schemaSpec tagdocs (schema specification) generates a TEI-conformant schema and documentation for it.
scriptDesc msdescription contains a description of the scripts used in a manuscript or similar source.
scriptNote header describes a particular script distinguished within the description of a manuscript or similar resource.
scriptStmt spoken (script statement) contains a citation giving details of the script used for a spoken text.
seal msdescription contains a description of one seal or similar attachment applied to a manuscript.
sealDesc msdescription (seal description) describes the seals or other external items attached to a manuscript, either as a series of paragraphs or as a series of distinct seal elements, possibly with additional decoNotes.
secFol msdescription (second folio) The word or words taken from a fixed point in a codex (typically the beginning of the second leaf) in order to provide a unique identifier for it.
seg linking (arbitrary segment) represents any segmentation of text below the chunk level.
segmentation header describes the principles according to which the text has been segmented, for example into sentences, tone-units, graphemic strata, etc.
sense dictionaries groups together all information relating to one word sense in a dictionary entry, for example definitions, examples, and translation equivalents.
series core (series information) contains information about the series in which a book or other bibliographic item has appeared.
seriesStmt header (series statement) groups information about the series, if any, to which a publication belongs.
set drama (setting) contains a description of the setting, time, locale, appearance, etc., of the action of a play, typically found in the front matter of a printed performance text (not a stage direction).
setting corpus describes one particular setting in which a language interaction takes place.
settingDesc corpus (setting description) describes the setting or settings within which a language interaction takes place, either as a prose description or as a series of setting elements.
settlement namesdates contains the name of a settlement such as a city, town, or village identified as a single geo-political or administrative unit.
sex namesdates specifies the sex of a person.
shift spoken marks the point at which some paralinguistic feature of a series of utterances by any one speaker changes.
sic core (Latin for thus or so) contains text reproduced although apparently incorrect or inaccurate.
signatures msdescription contains discussion of the leaf or quire signatures found within a codex.
signed textstructure (signature) contains the closing salutation, etc., appended to a foreword, dedicatory epistle, or other division of a text.
soCalled core contains a word or phrase for which the author or narrator indicates a disclaiming of responsibility, for example by the use of scare quotes or italics.
socecStatus namesdates (socio-economic status) contains an informal description of a person’s perceived social or economic status.
sound drama describes a sound effect or musical sequence specified within a screen play or radio script.
source msdescription describes the original source for the information contained with a manuscript description.
sourceDesc header (source description) describes the source from which an electronic text was derived or generated, typically a bibliographic description in the case of a digitized text, or a phrase such as "born digital" for a text which has no previous existence.
sourceDoc transcr contains a transcription or other representation of a single source document potentially forming part of a dossier génétique or collection of sources.
sp core (speech) An individual speech in a performance text, or a passage presented as such in a prose or verse text.
spGrp drama (speech group) A group of speeches or songs in a performance text presented in a source as constituting a single unit or number.
space transcr indicates the location of a significant space in the copy text.
span analysis associates an interpretative annotation directly with a span of text.
spanGrp analysis (span group) collects together span tags.
speaker core A specialized form of heading or label, giving the name of one or more speakers in a dramatic text or fragment.
specDesc tagdocs (specification description) indicates that a description of the specified element or class should be included at this point within a document.
specGrp tagdocs (specification group) contains any convenient grouping of specifications for use within the current module.
specGrpRef tagdocs (reference to a specification group) indicates that the declarations contained by the specGrp referenced should be inserted at this point.
specList tagdocs (specification list) marks where a list of descriptions is to be inserted into the prose documentation.
sponsor header specifies the name of a sponsoring organization or institution.
stage core (stage direction) contains any kind of stage direction within a dramatic text or fragment.
stamp msdescription contains a word or phrase describing a stamp or similar device.
state namesdates contains a description of some status or quality attributed to a person, place, or organization often at some specific time or for a specific date range.
stdVals header (standard values) specifies the format used when standardized date or number values are supplied.
street core a full street address including any name or number identifying a building as well as the name of the street or route on which it is located.
stress dictionaries contains the stress pattern for a dictionary headword, if given separately.
string iso-fs (string value) represents the value part of a feature-value specification which contains a string.
subc dictionaries (subcategorization) contains subcategorization information (transitive/intransitive, countable/non-countable, etc.)
subst transcr (substitution) groups one or more deletions with one or more additions when the combination is to be regarded as a single intervention in the text.
substJoin transcr (substitution join) identifies a series of possibly fragmented additions, deletions or other revisions on a manuscript that combine to make up a single intervention in the text.
summary msdescription contains an overview of the available information concerning some aspect of an item (for example, its intellectual content, history, layout, typography etc.) as a complement or alternative to the more detailed information carried by more specific elements.
superEntry dictionaries groups a sequence of entries within any kind of lexical resource, such as a dictionary or lexicon which function as a single unit, for example a set of homographs.
supplied transcr signifies text supplied by the transcriber or editor for any reason, typically because the original cannot be read because of physical damage or loss to the original.
support msdescription contains a description of the materials etc. which make up the physical support for the written part of a manuscript.
supportDesc msdescription (support description) groups elements describing the physical support for the written part of a manuscript.
surface transcr defines a written surface as a two-dimensional coordinate space, optionally grouping one or more graphic representations of that space, zones of interest within that space, and transcriptions of the writing within them.
surfaceGrp transcr defines any kind of useful grouping of written surfaces, for example the recto and verso of a single leaf, which the encoder wishes to treat as a single unit.
surname namesdates contains a family (inherited) name, as opposed to a given, baptismal, or nick name.
surplus transcr marks text present in the source which the editor believes to be superfluous or redundant.
surrogates msdescription contains information about any representations of the manuscript being described which may exist in the holding institution or elsewhere.
syll dictionaries (syllabification) contains the syllabification of the headword.
symbol iso-fs (symbolic value) represents the value part of a feature-value specification which contains one of a finite list of symbols.
table figures contains text displayed in tabular form, in rows and columns.
tag tagdocs contains text of a complete start- or end-tag, possibly including attribute specifications, but excluding the opening and closing markup delimiter characters.
tagUsage header supplies information about the usage of a specific element within a text.
tagsDecl header (tagging declaration) provides detailed information about the tagging applied to a document.
taxonomy header defines a typology either implicitly, by means of a bibliographic citation, or explicitly by a structured taxonomy.
tech drama (technical stage direction) describes a special-purpose stage direction that is not meant for the actors.
teiCorpus core contains the whole of a TEI encoded corpus, comprising a single corpus header and one or more TEI elements, each containing a single text header and a text.
teiHeader header (TEI Header) supplies the descriptive and declarative information making up an electronic title page prefixed to every TEI-conformant text.
term core contains a single-word, multi-word, or symbolic designation which is regarded as a technical term.
terrain namesdates contains information about the physical terrain of a place.
text textstructure contains a single text of any kind, whether unitary or composite, for example a poem or drama, a collection of essays, a novel, a dictionary, or a corpus sample.
textClass header (text classification) groups information which describes the nature or topic of a text in terms of a standard classification scheme, thesaurus, etc.
textDesc corpus (text description) provides a description of a text in terms of its situational parameters.
textLang core (text language) describes the languages and writing systems identified within the bibliographic work being described, rather than its description.
then iso-fs separates the condition from the default in an if, or the antecedent and the consequent in a cond element.
time core contains a phrase defining a time of day in any format.
timeline linking (timeline) provides a set of ordered points in time which can be linked to elements of a spoken text to create a temporal alignment of that text.
title core contains a title for any kind of work.
titlePage textstructure (title page) contains the title page of a text, appearing within the front or back matter.
titlePart textstructure contains a subsection or division of the title of a work, as indicated on a title page.
titleStmt header (title statement) groups information about the title of a work and those responsible for its content.
tns dictionaries (tense) indicates the grammatical tense associated with a given inflected form in a dictionary.
trailer textstructure contains a closing title or footer appearing at the end of a division of a text.
trait namesdates contains a description of some status or quality attributed to a person, place, or organization typically, but not necessarily, independent of the volition or action of the holder and usually not at some specific time or for a specific date range.
transpose transcr describes a single textual transposition as an ordered list of at least two pointers specifying the order in which the elements indicated should be re-combined.
tree nets encodes a tree, which is made up of a root, internal nodes, leaves, and arcs from root to leaves.
triangle nets (underspecified embedding tree, so called because of its characteristic shape when drawn) Provides for an underspecified eTree, that is, an eTree with information left out.
typeDesc msdescription contains a description of the typefaces or other aspects of the printing of an incunable or other printed source.
typeNote header describes a particular font or other significant typographic feature distinguished within the description of a printed resource.
u spoken (utterance) a stretch of speech usually preceded and followed by silence or by a change of speaker.
unclear core contains a word, phrase, or passage which cannot be transcribed with certainty because it is illegible or inaudible in the source.
undo transcr indicates one or more marked-up interventions in a document which have subsequently been marked for cancellation.
unicodeName gaiji (unicode property name) contains the name of a registered Unicode normative or informative property.
usg dictionaries (usage) contains usage information in a dictionary entry.
vAlt iso-fs (value alternation) represents the value part of a feature-value specification which contains a set of values, only one of which can be valid.
vColl iso-fs (collection of values) represents the value part of a feature-value specification which contains multiple values organized as a set, bag, or list.
vDefault iso-fs (value default) declares the default value to be supplied when a feature structure does not contain an instance of f for this name; if unconditional, it is specified as one (or, depending on the value of the org attribute of the enclosing fDecl) more fs elements or primitive values; if conditional, it is specified as one or more if elements; if no default is specified, or no condition matches, the value none is assumed.
vLabel iso-fs (value label) represents the value part of a feature-value specification which appears at more than one point in a feature structure.
vMerge iso-fs (merged collection of values) represents a feature value which is the result of merging together the feature values contained by its children, using the organization specified by the org attribute.
vNot iso-fs (value negation) represents a feature value which is the negation of its content.
vRange iso-fs (value range) defines the range of allowed values for a feature, in the form of an fs, vAlt, or primitive value; for the value of an f to be valid, it must be subsumed by the specified range; if the f contains multiple values (as sanctioned by the org attribute), then each value must be subsumed by the vRange.
val tagdocs (value) contains a single attribute value.
valDesc tagdocs (value description) specifies any semantic or syntactic constraint on the value that an attribute may take, additional to the information carried by the datatype element.
valItem tagdocs documents a single attribute-value within a list of possible or mandatory items.
valList tagdocs (value list) contains one or more valItem elements defining possible values for an attribute.
value gaiji (value) contains a single value for some property, attribute, or other analysis.
variantEncoding textcrit declares the method used to encode text-critical variants.
view drama describes the visual context of some part of a screen play in terms of what the spectator sees, generally independent of any dialogue.
vocal spoken any vocalized but not necessarily lexical phenomenon, for example voiced pauses, non-lexical backchannels, etc.
w analysis (word) represents a grammatical (not necessarily orthographic) word.
watermark msdescription contains a word or phrase describing a watermark or similar device.
when linking indicates a point in time either relative to other elements in the same timeline tag, or absolutely.
width msdescription contains a measurement measured along the axis parallel to the bottom of the written surface, i.e. perpendicular to the spine of a book or codex.
wit textcrit contains a list of one or more sigla of witnesses attesting a given reading, in a textual variation.
witDetail textcrit (witness detail) gives further information about a particular witness, or witnesses, to a particular reading.
witEnd textcrit (fragmented witness end) indicates the end, or suspension, of the text of a fragmentary witness.
witStart textcrit (fragmented witness start) indicates the beginning, or resumption, of the text of a fragmentary witness.
witness textcrit contains either a description of a single witness referred to within the critical apparatus, or a list of witnesses which is to be referred to by a single sigil.
writing spoken a passage of written text revealed to participants in the course of a spoken text.
xr dictionaries (cross-reference phrase) contains a phrase, sentence, or icon referring the reader to some other location in this or another text.
zone transcr defines any two-dimensional area within a surface element.
5.13 Wilfred Owen: Letter To Leslie Gunston
5.14 Wilfred Owen: Preface MS
5.15 Stuart Lee interviews Ian Hislop (fragment)
[gap for sampling purposes]
Lee (24.27-24.36):
So em d-em having read [clicking sound: 0.17s] the Wipers Times
now and and your [pause: 0.62s] view that
th th the thirties was that that
prisonment you say

Hislop (24.36):
Yeah.

Lee (24.36-24.42):
[clicking sound: 1.28s] looking back
on a failed piece [pause: 0.35s]
has your attitude
to the to the War Poets
Wilfred Owen Sassoon [pause: 0.30s]
changed over time?

Hislop (24.42-27.09):
[pause: 0.76s]
Not really I mean
I’ve-I read the Owen again, em [pause: 0.20s]
very recently
and just thought how brilliant [laughs in -iant syllable]
[pause: 0.40s] um those poems are [pause: 0.50s]
um [pause: 0.65s] and it’s [pause: 0.70s] it’s not that um [pause: 0.70s]
I don’t think they’re any good
I just think there were other voices [pause: 0.50s]
um and [pause: 0.50s]
they have been taken
as all there was
I mean I think they do
express the horror [pause: 0.63s]
quite wonderfully [pause: 0.50s]
um it’s like the [pause: 0.45s]
the painting of the um [pause: 0.55s]
’The Gassed’ [Note: John Singer Sargent, 1918]
[pause: 0.60s] um that’s um Sargent

Lee (25.13):
Yes.

Hislop (25.13-26.35):
um [pause: 0.50s] which I saw again and
just was completely blown away
by how brilliant he was
and that [pause: 1.15s] that is what the
First World War was about [pause: 0.75s]
[clicking sound: 0.06s]
but [pause: 0.95s]
there were some other [clicking sound: 0.5s] bits [pause: 0.20s]
um, and it is [pause: 0.50s] I think
MacDonald has done it really well in
his ’Voices’ and again
in those pictures you you [pause: 0.75s]
yes there is the horror [pause: 0.40s]
um and um horror like you haven’t seen before
and that’s expressed
but [pause: 0.60s] um [pause: 0.65s]
you know
it lasted for four years
and [pause: 0.81s] not every day was the first day
of the Battle of the Somme [pause: 0.70s]
and there are [pause: 1.05s]
other things that were happening [pause: 0.90s]
in the trenches and whatever that were
um [pause: 0.70s] worth remembering
A and it was so [pause: 0.40s]
certainly vital in changing
how Britain was [pause: 1.80s]
And when I came [pause: 0.65s]
I was doing that programme about the memorials
and I thought [pause: 1.30s]
the first one I saw where [pause: 0.60s]
there were no ranks [pause: 1.40s]
the dead were all just listed [pause: 0.80s]
um and this is the first time in British history [pause: 0.40s] um [pause: 0.35s]
and it was because of the people in the trenches said
"we fought together, [pause: 0.30s] we’re gonna die together" [pause: 0.35s] um [pause: 0.40s]
and it was that feeling of [pause: 0.80s]
well [pause: 0.75s] something really
fundamental had changed [Note: syllables -damental while laughing] [pause: 0.45s]
um, and when they came back [pause: 0.65s]
[clicking sound: 0.8s] you know [pause: 0.82s]
flawed for the rest of the century [pause: 0.45s]
that’s when it all happened [pause: 0.30s]

Lee (26.35):
hmm

Hislop (26.36-27.09):
I think it’s difficult to read the history of the century
certainly the middle bits of it [pause: 0.30s]
without remembering that [pause: 1.20s]
someone said there was an enormous black cloud over [pause: 1.23s]
Britain [pause: 1.12s] of grief [pause: 1.00s]
um [pause: 0.50s] and it’s just the fact that
everyone had lost someone [pause: 0.90s]
and um [pause: 0.60s]
we tend to assume that everyone goes on
and it’s normal and they do their things
but actually for almost the entire country [pause: 0.83s]
everybody was spending every day
some bit of it thinking [pause: 0.92s]
"they’re dead" [pause: 1.45s] I always find that extraordinary [pause: 0.30s]
[gap for sampling purposes]
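The bracketed conventions in this fragment, such as [pause: 0.62s] and [clicking sound: 0.17s], map fairly directly onto elements of the TEI spoken module listed in the reference material (u, pause, and so on). As a rough sketch only, written in Python rather than the XSLT used elsewhere in these workshops, the helper below turns one stretch of the transcript into a u element. The function name to_tei_utterance, the speaker identifier, and the choice of pause and incident with a dur attribute are assumptions made for illustration; they are not the encoding the exercise prescribes.

```python
import re
from xml.sax.saxutils import escape

# Patterns for the two bracketed conventions used in the transcript.
PAUSE = re.compile(r"\[pause:\s*([0-9.]+)s\]")
CLICK = re.compile(r"\[clicking sound:\s*([0-9.]+)s\]")


def to_tei_utterance(speaker_id, text):
    """Wrap one stretch of transcript in <u>, converting bracketed notes.

    speaker_id is a hypothetical identifier (e.g. "lee"); durations are
    written as xsd:duration values such as PT0.62S.
    """
    body = escape(text)  # protect any &, < or > in the spoken text
    body = PAUSE.sub(lambda m: '<pause dur="PT%sS"/>' % m.group(1), body)
    body = CLICK.sub(
        lambda m: '<incident dur="PT%sS"><desc>clicking sound</desc></incident>'
        % m.group(1),
        body,
    )
    return '<u who="#%s">%s</u>' % (speaker_id, body)


if __name__ == "__main__":
    print(to_tei_utterance("lee", "now and and your [pause: 0.62s] view that"))
```

Other annotations in the fragment (laughter, editorial notes) are left untouched by this sketch and would need their own decisions, for example vocal or note.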
6 Workshop: Working with TEI Texts
This advanced workshop will teach how to do something practical with your TEI XML texts beyond simply converting them to HTML and putting them on the web. A mixture of talks and practical exercises will take participants through:
• Advanced validation and integrity checking using TEI ODD, Schematron and XSLT
• Transforming your TEI XML to formats other than HTML (Word, ePub, LaTeX, etc.)
• Extracting data from TEI texts for further analysis (e.g. names and places)
• Processing some more complex TEI documents (e.g. genetic encoding and timelines)
• Storing TEI documents in an XML database and querying them
Our document sets will consist of some 18th-century ECCO texts (full set from http://www.ota.ox.ac.uk/catalogue/), the diaries of William Godwin of the years 1788-1791 (full set from http://godwindiary.bodleian.ox.ac.uk/godwindiary.zip), a set of Greek epigraphical records (full set from http://irt.kcl.ac.uk/irt2009/redist/inscr/irt2009-P5.zip), and a poem of Wilfred Owen.
6.1 Timetable
When       Subject
Mon am     The ODD system and using constraints in Schematron
Mon pm     a) write ODD from scratch with embedded Schematron constraint and check document instances; b) write an XSLT stylesheet which analyzes some aspect of the document set.
Tues am    Processing some more complex TEI documents, and XSLT / XPath techniques needed
Tues pm    Write XSLT stylesheet to display facsimile encoding (<facsimile> and <sourceDoc> elements) in web page, using combination of HTML and CSS.
Wed am     Extracting and summarizing data in TEI texts, looking for names, dates and places. Understanding functions, grouping and sorting techniques in XSLT.
Wed pm     Extract catalogue of names and dates, and visualize the results by creating CSV file and loading it into spreadsheet.
Thurs am   XML databases and techniques for managing large-scale collections. Demonstration of systems including eXist, BaseX, Cocoon, Solr, etc. Understanding XQuery.
Thurs pm   Set up BaseX database, import XML files, and make XQueries against the system to generate HTML files
Fri am     Transforming TEI XML to and from formats other than HTML (Word, ePub, LaTeX etc)
Fri pm     Set up a local instance of the TEI stylesheet family; define a Word template; create a Word document; develop stylesheet to turn the Word into TEI XML. Experiment with round-tripping.
Requirements: You must already have a good basic knowledge of XML and TEI, and some familiarity with programming/scripting ideas. Most of the work will be based on XSLT and XPath.
6.2 Data samples
6.2.1 ECCO
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>The carpenter: or, the danger of evil company.</title>
        <author>More, Hannah, 1745-1833.</author>
      </titleStmt>
      <publicationStmt>
        <publisher>University of Michigan Library</publisher>
        <pubPlace>Ann Arbor, Michigan</pubPlace>
        <date>2009 April</date>
        <availability>
          <p>These pages may be freely searched and displayed.
            Permission must be received for subsequent
            distribution in print or electronically. Please go
            to http://www.lib.umich.edu/tcp/ecco for more
            information.</p>
        </availability>
        <idno type="STC">n001762</idno>
        <idno type="TCP">K001133.000</idno>
        <idno type="BIBNO">cw3317776505</idno>
        <idno type="ECRP">0098301900</idno>
      </publicationStmt>
      <sourceDesc>
        <biblFull>
          <titleStmt>
            <title>The carpenter: or, the danger of evil company.</title>
            <author>More, Hannah, 1745-1833.</author>
          </titleStmt>
          <extent>1 sheet : ill. ; 10.</extent>
          <publicationStmt>
            <pubPlace>[Bath] :</pubPlace>
            <publisher>Sold by S. Hazard, at Bath; by J.
              Marshall, Cheap-side, and Aldermary church-yard;
              R. White, London; and by all booksellers, newsmen,
              and hawkers, in town and country,</publisher>
            <date>[1795?]</date>
          </publicationStmt>
          <notesStmt>
            <note>Signed at end: Z, i.e. Hannah More.</note>
            <note>Verse.</note>
            <note>At head of title: Cheap repository.</note>
            <note>Reproduction of original from the Harvard
              University Houghton Library.</note>
            <note>English Short Title Catalog, ESTCN1762.</note>
            <note>Electronic data. Farmington Hills, Mich. :
              Thomson Gale, 2003. Page image (PNG). Digitized
              image of the microfilm version produced in
              Woodbridge, CT by Research Publications, 1982-2002
              (later known as Primary Source Microfilm, an
              imprint of the Gale Group).</note>
          </notesStmt>
        </biblFull>
      </sourceDesc>
    </fileDesc>
    <encodingDesc>
      <projectDesc>
        <p>Created by converting TCP files to TEI P5 using
          tcp2tei.xsl, TEI @ Oxford. </p>
      </projectDesc>
      <editorialDecl n="4">
        <p>This electronic text file was keyed from page images and
          partially proofread for accuracy. Character capture and
          encoding have been done following the guidelines of the
          ECCO Text Creation Partnership, which correspond roughly
          to the recommendations found in Level 4 of the TEI in
          Libraries Guidelines. Digital page images are linked to
          the text file.</p>
      </editorialDecl>
    </encodingDesc>
    <profileDesc>
      <langUsage>
        <language ident="eng">eng</language>
      </langUsage>
    </profileDesc>
  </teiHeader>
  <text xml:lang="eng">
    <body>
      <div type="poem">
        <pb facs="1" rend="none"/>
        <head>THE CARPENTER; Or, the DANGER of EVIL COMPANY.</head>
        <lg>
          <l>THERE was a young West-country man,</l>
          <l>A Carpenter by trade;</l>
          <l>A skilful wheelwright too was he,</l>
          <l>And few such Waggons made.</l>
        </lg>
        <lg>
          <l>No Man a tighter Barn cou’d build,</l>
          <l>Throughout his native town,</l>
          <l>Thro’ many a village round was he,</l>
          <l>The best of workmen known.</l>
        </lg>
        <lg>
          <l>His father left him what he had,</l>
          <l>In sooth it was enough;</l>
          <l>His shining pewter, pots of brass,</l>
          <l>And all his household stuff.</l>
        </lg>
        <lg>
          <l>A little cottage too he had,</l>
          <l>For ease and comfort plann’d,</l>
          <l>And that he might not lack for ought,</l>
          <l>An acre of good land.</l>
        </lg>
      </div>
    </body>
    <back>
      <div type="colophon">
        <p>Sold by S. HAZARD, (PRINTER to the CHEAP REPOSITORY for
          Religious and Moral Tracts) at BATH; By J. MARSHALL,
          PRINTER to the CHEAP REPOSITORIES No. 17, Queen-Street,
          Cheap-Side, and No. 4, Aldermary Church-Yard; R. WHITE,
          Piccadilly, LONDON; and by all Booksellers, Newsmen, and
          Hawkers, in Town and Country.--Great Allowance will be
          made to Shopkeepers and Hawkers.</p>
        <p>Price an Half-penny, or 2s. 3d, per 100. 1s. 3d, for 50,
          9d. for 25.</p>
      </div>
    </back>
  </text>
</TEI>
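A first practical step with a file like this is simply to confirm that it can be read and to count its structural units. The sketch below is offered as one possible approach, in Python with the standard-library ElementTree module rather than the XSLT the workshop itself uses. The function name summarize_poem and the abridged SAMPLE string are hypothetical; the element names and the TEI namespace are taken from the ECCO sample above.

```python
import xml.etree.ElementTree as ET

# The TEI namespace declared on the root element of the ECCO sample.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Abridged, hypothetical stand-in for a full ECCO file.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
<teiHeader><fileDesc><titleStmt><title>The carpenter: or, the danger of evil
company.</title></titleStmt></fileDesc></teiHeader>
<text><body><div type="poem">
<lg><l>THERE was a young West-country man,</l><l>A Carpenter by trade;</l></lg>
<lg><l>His father left him what he had,</l><l>In sooth it was enough;</l></lg>
</div></body></text></TEI>"""


def summarize_poem(tei_xml):
    """Return the title plus stanza (lg) and line (l) counts of a TEI text."""
    root = ET.fromstring(tei_xml)
    raw_title = root.findtext(
        "tei:teiHeader/tei:fileDesc/tei:titleStmt/tei:title",
        default="",
        namespaces=TEI_NS,
    )
    stanzas = root.findall(".//tei:lg", TEI_NS)
    lines = root.findall(".//tei:lg/tei:l", TEI_NS)
    return {
        # Collapse the line break inside the title into a single space.
        "title": " ".join(raw_title.split()),
        "stanzas": len(stanzas),
        "lines": len(lines),
    }


if __name__ == "__main__":
    print(summarize_poem(SAMPLE))
```

Every element lookup has to be namespace-qualified; searching for a bare lg or l would find nothing in a document that declares the TEI namespace.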
6.2.2 Godwin
<div xml:id="g1798" type="dYear">
  <div type="dMonth" xml:id="g1798-01">
    <ab type="dDay" xml:id="g1798-01-01">
      <date when="1798-01-01">Jan. 1. 1798. M.</date>
      <ref type="dText" subtype="read" target="/bibl/te0807.html">Burke’s 3<hi rend="sup">rd</hi> Letter, p. 34</ref>:
      <ref type="dText" subtype="read" target="/bibl/te0808.html">Rival Queens, acts 1, 2, 3</ref>.
      <seg type="dMeeting" subtype="CG"><persName ref="/people/FAW01.html">Fawcet</persName> calls</seg>:
      <seg type="dMeal" subtype="SG"><persName ref="/people/MAR01.html">M</persName> sups</seg>.
      <seg type="dMeeting" subtype="M">meet <persName>Barnes</persName></seg>.
    </ab>
    <ab type="dDay" xml:id="g1798-01-02">
      <date when="1798-01-02">2. Tu.</date>
      <ref type="dWrote" subtype="write" target="/works/leon01.html">O. M., p. 2, 3</ref>.
      <ref type="dText" subtype="read" target="/bibl/te0810.html">Burke’s Memorials, p. 40</ref>.
      <seg type="dMeeting" subtype="CG"><persName ref="/people/COO05.html">Miss Cooper</persName></seg>,
      <seg type="dMeeting" subtype="CG"><persName ref="/people/HOL10.html">mrs Cole</persName></seg>,
      <seg type="dMeeting" subtype="CG"><persName ref="/people/HOL06.html">F Ht</persName></seg> &amp;
      <seg type="dMeeting" subtype="CG"><persName ref="/people/FEN01.html">F</persName> call</seg>:
      <seg type="dMeal" subtype="D">dine at <persName ref="/people/JOH01.html"><placeName type="venue">Johnson’s</placeName></persName>, w. <persName ref="/people/FUS01.html">Fuseli</persName> &amp; <persName>Wilkinson</persName>. </seg>
      <seg type="dMeeting" subtype="See"><persName ref="/people/CAR01.html">Carlisle</persName> &amp; <persName ref="/people/COM01.html">Combe</persName></seg>.
    </ab>
    <ab type="dDay" xml:id="g1798-01-03">
      <date when="1798-01-03">3. W.</date>
      <ref type="dText" subtype="read" target="/bibl/te0810.html">Memorials, p. 122</ref>.
      <seg type="dMeeting" subtype="CG"><persName ref="/people/COM01.html">Combe</persName></seg> &amp;
      <seg type="dMeeting" subtype="CG"><persName ref="/people/WHI03.html">White</persName> call</seg>:
      <seg type="dMeeting" subtype="C">call on <persName ref="/people/LES02.html" type="nah"><placeName type="venue">Leslie</placeName></persName> n</seg>,
      <seg type="dMeeting" subtype="C"><persName ref="/people/KEA01.html" type="nah"><placeName type="venue">Kearsley</placeName></persName> n</seg>, &amp;
      <seg type="dMeeting" subtype="C"><persName ref="/people/NIC01.html" type="nah"><placeName type="venue">Nicholson</placeName></persName> n</seg>.
      <ref type="dEntertainment" subtype="Theat" target="/plays/cast01.html"><placeName type="DL"/>Theatre, 3/10 Castle Spectre</ref>.
    </ab>
  </div>
</div>
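The diary markup lends itself to the kind of extraction planned for Wednesday: each day is an ab with a dated date child, and each encounter is a seg containing one or more persName elements. The following is a minimal sketch of producing a CSV catalogue from such a fragment, again in Python rather than XSLT. The function name diary_names_to_csv, the column names and the abridged SAMPLE are hypothetical. The sketch also assumes un-namespaced elements, as in the fragment printed above; complete files that declare the TEI namespace would need namespace-qualified lookups.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Abridged, hypothetical stand-in for one month of diary entries.
SAMPLE = """<div type="dMonth" xml:id="g1798-01">
<ab type="dDay" xml:id="g1798-01-01"><date when="1798-01-01">Jan. 1. 1798. M.</date>
<seg type="dMeeting" subtype="CG"><persName ref="/people/FAW01.html">Fawcet</persName> calls</seg>:
<seg type="dMeeting" subtype="M">meet <persName>Barnes</persName></seg>.</ab>
</div>"""


def diary_names_to_csv(div_xml):
    """Return CSV text with one row per persName found inside a seg."""
    root = ET.fromstring(div_xml)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["date", "event_type", "name", "ref"])
    for day in root.iter("ab"):
        if day.get("type") != "dDay":
            continue
        date = day.find("date")
        when = date.get("when", "") if date is not None else ""
        for seg in day.iter("seg"):
            for person in seg.iter("persName"):
                # itertext() also picks up text of nested placeName elements.
                name = " ".join("".join(person.itertext()).split())
                writer.writerow(
                    [when, seg.get("type", ""), name, person.get("ref", "")]
                )
    return out.getvalue()


if __name__ == "__main__":
    print(diary_names_to_csv(SAMPLE), end="")
```

The resulting text can be saved with a .csv extension and opened in a spreadsheet. Names without a ref attribute, such as Barnes in the sample, simply get an empty last column.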
6.2.3 IRT
<TEI xmlns="http://www.tei-c.org/ns/1.0"><teiHeader><fileDesc><titleStmt><title><rs type="textType">Dedication</rs> to Geta </title>
<editor>J. M. Reynolds</editor><editor>J. B. Ward-Perkins</editor>
</titleStmt><publicationStmt><authority>Centre for Computing in the Humanities, King’s
College London</authority><idno type="filename">IRT036</idno><availability><p>Creative Commons licence Attribution UK 2.0
(<ref>http://creativecommons.org/licenses/by/2.0/uk/</ref>). </p><p>All reuse or distribution of this work must contain
somewhere a link back to the URL <ref>http://irt.kcl.ac.uk/</ref></p>
</availability></publicationStmt><sourceDesc><bibl xml:id="irt1952"><author>J. M. Reynolds</author> and <author>J. B.
Ward-Perkins</author>, <title level="m">The Inscriptions of Roman Tripolitania</title>,
<pubPlace>Rome</pubPlace>: <publisher>British School at Rome</publisher>, <date>1952</date>. </bibl>
<msDesc><msIdentifier/><physDesc><objectDesc><supportDesc><support><p> Impression left by a fragment from the lower
part of a lost <material>marble</material><rs type="objectType">panel</rs> (approx. <dimensions><width unit="metre">0.34</width><height unit="metre">0.39</height>
</dimensions>).</p></support>
</supportDesc><layoutDesc><layout>Original text inscribed within a moulded
border. </layout></layoutDesc>
</objectDesc><handDesc><handNote>Capitals: <height unit="metre">0.07</height>. </handNote>
</handDesc></physDesc><history><origin><p>Unknown</p><origDate notBefore="0198-12-10" notAfter="0199-12-09" evidence="titulature">Between
10th Dec. A.D. 198 and 9th Dec. 199. (titulature)</origDate>
</origin><provenance><listEvent><event type="found"><p><placeName
type="ancientFindspot" ref="http://atlantides.org/batlas/abrotonum-sabratha-35-e2" key="db659">Sabratha</placeName>: <rs type="monuList" key="db853">Office
Baths</rs>, re-used in the fourth century pavement of the S Caldarium. </p>
</event><event type="observed"><p>Findspot</p>
</event></listEvent>
</provenance></history>
</msDesc></sourceDesc>
</fileDesc><encodingDesc><p>Marked-up according to the EpiDoc Guidelines version 8</p>
</encodingDesc><profileDesc><langUsage><language ident="ar">Arabic</language><language ident="en">English</language><language ident="fr">French</language><language ident="de">German</language><language ident="grc">Ancient Greek</language><language ident="grc-Latn">Transliterated Greek</language><language ident="el">Modern Greek</language><language ident="he">Hebrew</language><language ident="it">Italian</language><language ident="la">Latin</language><language ident="phn-LY">Punic</language><language ident="ber-Latn">Native Libyan language in Latin
script</language></langUsage><textClass/><textClass><keywords scheme="IRCyr"><term><geogName type="ancientRegion" key="Tripolitania">Tripolitania</geogName>
</term><term><geogName type="modernCountry" key="LY">Libya</geogName>
</term><term><placeName
type="modernFindspot" key="http://www.geonames.org/2208578/marsa-zawaghah.html">Marsa
Zawaghah</placeName></term>
</keywords></textClass>
</profileDesc></teiHeader>
<text><body><div type="bibliography"><head>Bibliography</head><p>Not previously published.</p>
</div><div subtype="text-constituted-from" type="history"><head>Text constituted from</head><p>Transcription (Reynolds, Ward-Perkins)</p>
</div><div type="edition" xml:lang="la" xml:space="preserve"><head xml:lang="en">Text</head><ab><lb n="1"/><persName type="emperor"><supplied reason="lost"><name type="praenomen" nymRef="Publius"><expan><abbr>P</abbr><ex>ublio</ex>
</expan></name><name type="gentilicium" nymRef="Septimius">Septimio</name><name type="cognomen" nymRef="Geta">Getae</name><w lemma="nobilis">nobilissimo</w><name type="cognomen" nymRef="Caesar">Caesari</name>
</supplied></persName><lb n="2"/><supplied reason="lost"><w lemma="imperator"><expan><abbr>Imp</abbr><ex>eratoris</ex>
</expan></w>
</supplied><persName type="emperor"><supplied reason="lost"><name type="cognomen" nymRef="Caesar"><expan><abbr>Caes</abbr><ex>aris</ex>
</expan></name><name type="praenomen" nymRef="Lucius"><expan><abbr>L</abbr><ex>uci</ex>
</expan></name><name type="gentilicium" nymRef="Septimius">Septimi</name><name type="cognomen" nymRef="Seuerus">Seueri</name><name type="cognomen" nymRef="Pius">Pii</name><name type="cognomen" nymRef="Pertinax">Pertinacis</name>
</supplied><lb n="3"/><supplied reason="lost"><name type="cognomen" nymRef="Augustus"><expan><abbr>Aug</abbr><ex>usti</ex>
</expan></name>
</supplied><name><supplied reason="lost">Arab</supplied>ici</name>
<name type="cognomen" nymRef="Adiabenicus">Ad<supplied reason="lost">iabenici</supplied></name><supplied reason="lost"><name type="cognomen" nymRef="Parthicus">Parthici</name><w lemma="magnus">maximi</w>
</supplied></persName><lb n="4"/><supplied reason="lost"><w lemma="tribunicius"><expan><abbr>trib</abbr><ex>unicia</ex>
</expan></w><w lemma="potestas"><expan><abbr>pot</abbr><ex>estate</ex>
</expan></w>
</supplied><num value="7">VII</num><w lemma="imperator"><expan><abbr>imp</abbr><ex>eratoris</ex>
</expan></w><num value="11">X<supplied reason="lost">I</supplied></num><supplied reason="lost"><w lemma="consul"><expan><abbr>co</abbr><ex>n</ex><abbr>s</abbr><ex>ulis</ex>
</expan></w><num value="2">II</num><w lemma="pater"><expan><abbr>p</abbr><ex>atris</ex>
</expan></w><w lemma="patria"><expan><abbr>p</abbr><ex>atriae</ex>
</expan></w><w lemma="proconsul"><expan><abbr>proco</abbr><ex>n</ex>
<abbr>s</abbr><ex>ulis</ex>
</expan></w>
</supplied><supplied reason="lost"><w lemma="filius">filio</w><w lemma="et">et</w>
</supplied><supplied reason="lost"><w lemma="imperator"><expan><abbr>Imp</abbr><ex>eratoris</ex>
</expan></w>
</supplied><lb n="5"/><persName type="emperor"><supplied reason="lost"><name type="gentilicium" nymRef="Antoninus">Antonini</name>
</supplied><name type="cognomen" nymRef="Augustus"><expan><abbr><supplied reason="lost">Au</supplied>g</abbr>
<ex>usti</ex></expan>
</name></persName><w lemma="frater">fratri</w><gap reason="lost" extent="unknown" unit="character"/></ab>
</div><div type="translation" xml:space="preserve"><head>Translation</head><p><supplied reason="lost">To Publius Septimius Geta,
most noble Caesar, son of Emperor Caesar Lucius Septimius Severus, Pius, Pertinax, Augustus</supplied> Victor in Arabia,
Victor in Adiabene, <supplied reason="lost">greatest Victor in Parthia, holding tribunician power for the</supplied> seventh time, acclaimed
victor <supplied reason="lost">eleven times, consul twice, father of the country, proconsul and</supplied> brother
<supplied reason="lost">of Emperor Antoninus</supplied> Augustus<gap reason="lost"/></p>
</div><div type="commentary"><head>Commentary</head><p>l. 4. trib. pot. VII. 10 Dec. 198 to 9 Dec. 199.</p>
</div></body>
</text></TEI>
6.2.4 Wilfred Owen
<TEI xmlns="http://www.tei-c.org/ns/1.0"><teiHeader type="text"><fileDesc><titleStmt><title>
<orgName ref="#TEI_Consortium">TEI</orgName> Learning Environment - <persName ref="#Wilfred_Owen">Wilfred
Owen</persName></title><principal>Encoded by <persName ref="#Elena_Pierazzo">Elena
Pierazzo</persName> and <persName ref="#Renée_van_Baalen">Renee van Baalen</persName>
</principal></titleStmt><publicationStmt><publisher><orgName ref="#TEI_Consortium">TEI Consortium</orgName>
</publisher><distributor><orgName ref="#OUCS"><placeName ref="#Oxford">Oxford</placeName>
University Computing Services</orgName></distributor><authority><persName ref="#Sebastian_Rahtz">Sebastian
Rahtz</persName></authority><pubPlace><placeName ref="#Oxford">Oxford</placeName>
</pubPlace><address><street>13 Banbury Road</street><postCode>OX2 6NN</postCode><placeName><settlement ref="#Oxford">Oxford</settlement><country ref="#UK">United Kingdom</country>
</placeName></address><availability><p><ref
target="http://creativecommons.org/licenses/by-nc-sa/3.0/"><orgName ref="#Creative_Commons">Creative
Commons</orgName> Attribution-NonCommercial-ShareAlike 3.0 Unported License.</ref>
</p><p>First draft <orgName ref="#TEI_Consortium">TEI</orgName> Learning
Environment by <persName ref="#Renée_van_Baalen">Renee van Baalen</persName>, 2012-01-10.</p>
</availability><date when="2012-01-10">10th January 2012</date>
</publicationStmt><sourceDesc><biblFull xml:id="poems"><titleStmt><title type="full"><title type="main">Joint Information System’s
Committee Technology Applications Program (JTAP)</title>
<title type="sub">Project ’Virtual Seminars for Teaching Literature’</title>
<title type="sub">Tutorial 4</title><title type="sub">’Strange Meeting’</title>
</title><author><persName ref="#Wilfred_Owen">Wilfred
Owen</persName></author><editor role="director"><persName ref="#Stuart_Lee">Stuart Lee</persName>
</editor><editor role="project_officer"><persName>Paul Groves</persName>
</editor><editor role="encoder"><persName ref="#Elena_Pierazzo">Elena
Pierazzo</persName></editor>
</titleStmt><publicationStmt><publisher><orgName ref="#University_of_Oxford">University of <placeName ref="#Oxford">Oxford</placeName></orgName>
</publisher><pubPlace><placeName ref="#Oxford">Oxford</placeName>
</pubPlace><date from="1996-10" to="1998-10">From October 1996
to October 1998</date></publicationStmt><sourceDesc><biblFull><titleStmt><title type="full"><title type="main">The First World War Poetry
Digital Archive</title><title type="sub">The <persName ref="#Wilfred_Owen">Wilfred
Owen</persName> Collection</title>
<title type="sub">Poems by <persName ref="#Wilfred_Owen">Wilfred Owen</persName>
</title><title type="sub">Strange Meeting</title>
</title><author><persName ref="#Wilfred_Owen">Wilfred
Owen</persName></author><editor role="director"><persName ref="#Stuart_Lee">Stuart Lee</persName>
</editor><editor role="project_manager"><persName ref="#Kate_Lindsay">Kate
Lindsay</persName></editor><editor role="technical_specialist"><persName ref="#Michael_Loizou">Michael
Loizou</persName></editor><editor role="cataloguer"><persName ref="#Everett_Sharp">Everett
Sharp</persName></editor><editor role="cataloguer"><persName ref="#Alisa_Miller">Alisa
Miller</persName></editor>
<editor role="research_officer"><persName ref="#Alun_Edwards">Alun
Edwards</persName></editor><editor role="web_developer"><persName ref="#Richard_Doe">Richard
Doe</persName></editor><editor role="web_designer"><persName ref="#Joseph_Talbot">Joseph
Talbot</persName></editor>
</titleStmt><editionStmt><edition>First edition</edition>
</editionStmt><publicationStmt><publisher><orgName ref="#University_of_Oxford">University of <placeName ref="#Oxford">Oxford</placeName></orgName>
</publisher><pubPlace><placeName ref="#Oxford">Oxford</placeName>
</pubPlace><date when="2008-11-11">11th November 2008</date>
</publicationStmt><sourceDesc><msDesc xml:id="strange_meeting"><msIdentifier><placeName ref="#London"><settlement>London</settlement>, <country ref="#UK">United
Kingdom</country></placeName><institution>The British Library</institution><repository>The Wilfred Owen Literary
Estate</repository><idno>This is no. 148 in ed. ’The Complete Poems
and Fragments’.</idno></msIdentifier><msContents><msItem><title>Strange Meeting</title><author><persName ref="#Wilfred_Owen">Wilfred
Owen</persName></author><docImprint><pubPlace><placeName ref="#Scarborough">Scarborough</placeName>
</pubPlace><address><placeName><country ref="#UK">United Kingdom</country>
</placeName></address><date from="1918-01" to="1918-03">January to March
1918</date></docImprint>
</msItem></msContents><physDesc>
<objectDesc form="poem"><supportDesc><support><dimensions type="leaves"/>
</support><extent>One leaf, handwritten double
sided.</extent></supportDesc>
</objectDesc><handDesc><p>Handwritten by <persName ref="#Wilfred_Owen">Wilfred
Owen</persName>.</p></handDesc>
</physDesc></msDesc>
</sourceDesc></biblFull>
</sourceDesc></biblFull><listPlace><place xml:id="London"><placeName><settlement>London</settlement><country ref="#UK">United Kingdom</country>
</placeName></place><place xml:id="Oxford"><placeName><settlement>Oxford</settlement><country ref="#UK">United Kingdom</country>
</placeName></place><place xml:id="Belgium"><placeName><country ref="#Belgium">Belgium</country>
</placeName></place><place xml:id="Flanders"><placeName><region ref="#Flanders">Flanders</region><country ref="#Belgium">Belgium</country>
</placeName></place><place xml:id="France"><placeName><country>France</country>
</placeName></place><place xml:id="Germany"><placeName><country ref="#Germany">Germany</country>
</placeName></place><place xml:id="Hell"><placeName><geogName>Hades</geogName><geogName>Hell</geogName><geogName>Underworld</geogName>
</placeName><note><p>Fictional geographic location.</p>
</note>
</place><place xml:id="Heytesbury"><placeName><settlement ref="#Heytesbury">Heytesbury</settlement><country ref="#UK">United Kingdom</country>
</placeName></place><place xml:id="Ireland"><placeName>Ireland</placeName><note>Kingdom of the <placeName>United
Kingdom</placeName></note>
</place><place xml:id="Ors"><placeName><settlement>Ors</settlement><country ref="#France">France</country>
</placeName></place><place xml:id="Oswestry"><placeName><settlement>Oswestry</settlement><country>United Kingdom</country>
</placeName></place><place xml:id="Prussia"><placeName><region ref="#Prussia">Prussia</region><country ref="#Germany">Germany</country>
</placeName></place><place xml:id="Ripon"><placeName><settlement ref="#Ripon">Ripon</settlement><country ref="#UK">United Kingdom</country>
</placeName></place><place xml:id="Scarborough"><placeName><settlement>Scarborough</settlement><country ref="#UK">United Kingdom</country>
</placeName></place><place xml:id="Tipperary"><placeName><settlement ref="#Tipperary">Tipperary</settlement><country ref="#Ireland">Ireland</country>
</placeName></place><place xml:id="UK"><placeName><country ref="#UK">United Kingdom</country>
</placeName><placeName type="alt">British colonies and
kingdoms</placeName></place><place xml:id="USA"><placeName><country ref="#USA">United States of
America</country></placeName>
</place>
<place xml:id="Weirleigh"><placeName><settlement>Weirleigh</settlement><country ref="#UK">United Kingdom</country>
</placeName></place>
</listPlace><listPerson><person role="cataloguer" xml:id="Alisa_Miller"><persName>Alisa Miller</persName><note><ref
target="http://www.oucs.ox.ac.uk/ww1lit/about/staff.html">Alisa Miller’s biography at the First World War
Poetry Digital Archive</ref></note>
</person><person role="research_officer" xml:id="Alun_Edwards"><persName>Alun Edwards</persName><note><ref
target="http://www.oucs.ox.ac.uk/ww1lit/about/staff.html">Alun Edwards’ biography at the First World War
Poetry Digital Archive</ref></note>
</person><person role="publisher" sex="1" xml:id="Andrew_Chatto"><persName>Chatto</persName><persName>Andrew Chatto</persName><note><ref
target="http://www.randomhouse.co.uk/about-us/about-us/companies/uk-companies-and-imprints/vintage-publishing/chatto-windus">Andrew Chatto’s biography at Random House</ref>
</note></person><person role="mythological_hero" sex="1" xml:id="Antaeus"><persName xml:lang="latin" ref="#Antaeus">Antaeus</persName><persName xml:lang="greek" ref="#Antaeus">Antaios</persName><note><ref
target="http://www.theoi.com/Gigante/GiganteAntaios.html">Antaeus at theoi.com</ref>
</note></person><person role="editor" xml:id="Elena_Pierazzo"><persName>Elena Pierazzo</persName><note><ref
target="http://www.kcl.ac.uk/artshums/depts/ddh/people/core/pierazzo/index.aspx">Elena Pierazzo’s biography at King’s
College</ref></note>
</person><person role="cataloguer" xml:id="Everett_Sharp"><persName>Everett Sharp</persName><note><ref
target="http://www.oucs.ox.ac.uk/ww1lit/about/staff.html">Everett Sharp’s biography at the First World War
Poetry Digital Archive</ref></note>
</person>
<person role="mythological_hero" sex="1" xml:id="Heracles"><persName xml:lang="greek" ref="#Heracles">Heracles</persName><persName xml:lang="latin" ref="#Heracles">Hercules</persName><persName>Herk</persName><note><ref
target="http://www.theoi.com/greek-mythology/heracles.html">Heracles at theoi.com</ref>
</note></person><person role="manager" xml:id="James_Cummings"><persName>James Cummings</persName><note><ref
target="http://digital.humanities.ox.ac.uk/PeopleProfile/person_profile_page.aspx?pid=119">James Cummings’ biography at Oxford
University</ref></note>
</person><person role="philosopher" sex="1" xml:id="John_Locke"><persName>John Locke</persName><persName>Locke</persName><note><ref
target="http://plato.stanford.edu/entries/locke/">John Locke at Stanford Encyclopedia of
Philosophy</ref></note>
</person><person role="editor" xml:id="Jon_Stallworthy"><persName>Jon Stallworthy</persName><note><ref
target="http://www.english.ox.ac.uk/about-faculty/faculty-members/other-members/stallworthy-professor-jon">Jon Stallworthy’s biography at Oxford
University</ref></note>
</person><person role="web_designer" xml:id="Joseph_Talbot"><persName>Joseph Talbot</persName><note><ref
target="http://www.oucs.ox.ac.uk/ww1lit/about/staff.html">Joseph Talbot’s biography at the First World War
Poetry Digital Archive</ref></note>
</person><person role="project_manager" xml:id="Kate_Lindsay"><persName>Kate Lindsay</persName><note><ref
target="http://www.oucs.ox.ac.uk/ww1lit/about/staff.html">Kate Lindsay’s biography at the First World War
Poetry Digital Archive</ref></note>
</person><person role="technical_specialist" xml:id="Michael_Loizou"><persName>Michael Loizou</persName><note><ref
target="http://www.oucs.ox.ac.uk/ww1lit/about/staff.html">Michael Loizou’s biography at the First World War
Poetry Digital Archive</ref></note>
</person><person role="god" xml:id="Pluto"><persName xml:lang="latin" ref="#Pluto">Pluto</persName><persName xml:lang="greek" ref="#Pluto">Hades</persName><note><ref
target="http://www.theoi.com/Khthonios/Haides.html">Pluto at theoi.com</ref>
</note></person><person role="editor" xml:id="Renée_van_Baalen"><persName>Renée van Baalen</persName>
</person><person role="web_developer" xml:id="Richard_Doe"><persName>Richard Doe</persName><note><ref
target="http://www.oucs.ox.ac.uk/ww1lit/about/staff.html">Richard Doe’s biography at the First World War
Poetry Digital Archive</ref></note>
</person><person xml:id="SG_Partridge"><persName>Captain S.G. Partridge</persName><note><ref
target="http://alihollington.typepad.com/historic_battlefields/2008/01/how-did-they-do.html">Historic Battlefields on the S.S. 143 and S.G.
Partridge</ref></note>
</person><person role="head" xml:id="Sebastian_Rahtz"><persName>Sebastian Rahtz</persName><note><ref
target="http://digital.humanities.ox.ac.uk/PeopleProfile/person_profile_page.aspx?pid=157">Sebastian Rahtz’ biography at Oxford
University</ref></note>
</person><person role="director" xml:id="Stuart_Lee"><persName>Stuart Lee</persName><note><ref
target="http://www.oucs.ox.ac.uk/ww1lit/about/staff.html">Stuart Lee’s biography at the First World War
Poetry Digital Archive</ref></note>
</person><person role="god" xml:id="Titan"><persName xml:lang="greek" ref="#Titan">Titanes</persName><note><ref
target="http://www.theoi.com/Titan/Titanes.html">Titans on theoi.com</ref>
</note></person><person role="author" xml:id="Victor_Hugo"><persName>Victor Hugo</persName><birth when="1802-02-26">
<placeName><settlement>Besançon</settlement>, <country ref="#France">France</country>,
<date>26th February, 1802</date>. </placeName>
</birth><death when="1882-05-22"><placeName>Paris</placeName>, <country ref="#France">France</country>,
<date>22nd May, 1882</date>. </death>
<note> Author of the essay <ref target="http://www.gavroche.org/vhugo/shakespeare/">
<title ref="#Shakespeare_by_Victor_Hugo">’Shaksperés’, or ’William Shakespeare’</title>
</ref>. </note></person><person role="publisher" sex="1" xml:id="W.E._Windus"><persName>Windus</persName><persName>W.E. Windus</persName><note><ref
target="http://www.randomhouse.co.uk/about-us/about-us/companies/uk-companies-and-imprints/vintage-publishing/chatto-windus">W.E. Windus’ biography at Random House</ref>
</note></person><person sex="1" role="poet" xml:id="Wilfred_Owen"><persName>Wilfred Edward Salter Owen</persName><persName>Wilfred Owen</persName><birth when="1893-03-18"><placeName><settlement ref="#Oswestry">Oswestry</settlement>,
<country ref="#UK">United Kingdom</country>, <date when="1893-03-18">18th March, 1893</date>.
</placeName></birth><death when="1918-11-04"><placeName><settlement ref="#Ors">Ors</settlement>, <country ref="#France">France</country>,
<date when="1918-11-04">4th November, 1918</date>.</placeName>
</death><note><ref
target="http://www.oucs.ox.ac.uk/ww1lit/collections/owen">The Wilfred Owen Collection - Biography</ref>
</note></person>
</listPerson><listOrg><org xml:id="Chatto_and_Windus"><orgName>Chatto &amp; Windus</orgName><note><ref
target="http://www.randomhouse.co.uk/about-us/about-us/companies/uk-companies-and-imprints/vintage-publishing/chatto-windus">Chatto &amp; Windus’ about page at Random
House</ref></note>
</org><org xml:id="Creative_Commons"><orgName>Creative Commons</orgName><note><ref target="http://creativecommons.org/">Creative
Commons home page</ref>
</note></org><org xml:id="OUCS"><orgName>Oxford University Computing
Services</orgName><orgName>OUCS</orgName><note><ref target="http://www.oucs.ox.ac.uk/">OUCS home
page</ref></note>
</org><org xml:id="TEI_Consortium"><orgName>TEI Consortium</orgName><orgName>TEI</orgName><note><ref target="http://www.tei-c.org/index.xml">TEI
Consortium home page</ref></note>
</org><org xml:id="University_of_Oxford"><orgName>University of Oxford</orgName><note><ref target="http://www.ox.ac.uk/">University of
Oxford’s home page</ref></note>
</org></listOrg><listBibl><bibl xml:id="Shakespeare_by_Victor_Hugo"><title level="m">Shakespeare</title><author><persName ref="#Victor_Hugo">Victor
Hugo</persName></author><note><ref
target="http://www.gavroche.org/vhugo/shakespeare/">Victor Hugo’s essay ’Shakespeare’</ref>
</note></bibl>
</listBibl></sourceDesc>
</fileDesc><encodingDesc><charDecl><glyph xml:id="v_stroke"><glyphName>V stroke</glyphName><charProp><localName>entity</localName><value>V shaped stroke signifying an
addition.</value></charProp>
</glyph></charDecl>
</encodingDesc></teiHeader><facsimile><surfaceGrp n="leaf1"><surface xml:id="leaf1_surface1_Strange_Meeting"><graphic url="Strange-Meeting-manuscript-1.jpg"/>
</surface><surface xml:id="leaf1_surface2_Strange_Meeting">
<graphic url="Strange-Meeting-manuscript-2.jpg"/></surface>
</surfaceGrp></facsimile><sourceDoc><surfaceGrp n="leaf1"><surface
facs="#leaf1_surface1_Strange_Meeting" type="letter" subtype="handwritten">
<zone><zone><line>Strange Meeting.</line>
</zone><zone><line>3</line>
</zone><line>It seemed that <del rend="stroked">from my
dug-out</del><add place="above">out of <del rend="stroked">the</del> battle</add> I
escaped</line><line>Down some profound<del rend="waving">er</del><del rend="stroked"><add place="above">earth-</add>
</del><add place="below">dull</add> tunnel, <del rend="stroked">older</del><del rend="stroked"><add place="above">nether</add>
</del><add place="below">long since</add> scooped</line>
<line>Through granites which <del rend="stroked">the nether flames</del>
<del rend="stroked"><add place="below">plutonic</add>
</del><add place="below">titanic wars</add> had
groined.</line><line>Yet also there <metamark place="inline" function="add" target="#ad1">
<g ref="#v_stroke" rend="v_stroke"/></metamark><del rend="stroked"><add place="above" xml:id="ad1">Down all its
length</add></del> encumbered sleepers groaned,</line>
<line>Too fast in thought or death to be bestirred.</line>
<line>Then, as I probed them, one sprang up, and stared</line>
<line>With piteous recognition in fixed eyes,</line><line>Lifting <del rend="stroked">his</del> distressful
hands, as if to bless.</line><line>And by his smile, I knew that sullen hall,--</line><line xml:id="alt0">By his dead smile <metamark function="join" target="#ad2">
<g rend="arrow"/></metamark><metamark function="join" target="#ad2"><g rend="arrow"/>
</metamark><hi rend="circled">I knew we stood in
hell</hi>.</line><line xml:id="alt1"><del rend="waving">And</del>
<del rend="waving"><del>b</del><add place="overwritten">B</add>y</del> his <del rend="oblique_strokes">dead</del>
smile <seg xml:id="ad2">I knew that sullen hall</seg></line><line xml:id="alt2"><del rend="stroked">Yet slumber droned <seg xml:id="alt3">all</seg><seg xml:id="alt4" rend="above">pains</seg> down
that sullen hall</del></line><alt targets="#alt0 #alt1 #alt2" mode="excl" weights="1 0 0"/><line>With a thousand <del rend="stroked">fears</del><del rend="stroked"><add place="above"><unclear>ways</unclear>
</add></del><add place="above_above"><ptr target="#alt4"/>
</add> that <del rend="stroked">creature</del><add place="above">vision</add>’s face was
grained;</line><line>Yet no blood <del rend="stroked">sumped</del><add place="above">reached <del rend="stroked">him</del></add><del rend="stroked">here</del><add place="above">there</add> from the upper
ground,</line><line>And no <del rend="stroked">shell</del><add place="above">guns</add> thumped, or down the
flues made moan.</line><line><del rend="stroked">But all was sleep. And no voice
called for men.</del></line><line>“<del rend="stroked">My</del><add place="inspace">Strange</add> friend,” I said,
“here is no cause to mourn.”</line><line>“None”, said that other, “save the undone
years,</line><line>The hopelessness.<add place="above">
<del rend="stroked">unachieved.</del></add><add place="below"><del rend="stroked"><gap reason="unreadable"/>
</del></add> Whatever hope is yours,</line>
<line>Was my life also; <del>comrade.</del><add place="above"><del rend="stroked">for</del>
</add><add place="below">I went</add><del rend="stroked">I ran</del><add place="above">hunting</add> wild.</line>
<line>After the wildest beauty in the world,</line><line>Which lies not calm in eyes, or braided
hair,</line></zone>
</surface><surface
facs="#leaf1_surface2_Strange_Meeting"
type="letter" subtype="handwritten">
<zone><zone>4</zone><line>But mocks the steady running of the hour.</line><line>And if it grieves, grieves richlier than
here.</line><line>For by my glee might many men have laughed,</line><line>And of my weeping something had been left,</line><line>Which must die now. I mean the truth
untold,</line><line>The pity of war, the <del rend="stroked">one
thing</del><add place="above">pity</add> war distilled.</line>
<line>Now men will go content with what we spoiled,</line>
<line>Or, discontent, boil bloody, and be spilled.</line>
<line>They will be swift with swiftness of the tigress.</line>
<line>None will break ranks, though nations trek from<add place="below">progress</add>.</line><line>Courage was mine, and I had mystery,</line><line>Wisdom was mine, and I had mastery:</line><line>To miss the march of this retreating world</line><line>Into vain citadels that are not walled.</line><line>Then, when much blood had clogged their
chariot-<add place="above"><metamark function="add"><g rend="stroke"/>
</metamark>wheels</add></line><line>I would go up and wash them from sweet
wells,</line><line>Even <del rend="stroked">
<unclear>the wells</unclear></del><add place="above"><del rend="stroked">the truths</del>
</add><add place="below">with truths</add><del rend="stroked">I sank</del><add place="above"><del rend="stroked">that</del> lie</add> too deep
for taint.</line><line>I would have poured my spirit without stint</line><line>But not <del rend="waving">by my blood into</del><add place="below">through wounds; not on</add> the <restore><metamark function="restore" target="#del1"/><del xml:id="del1">cess</del>
</restore><add place="below"><del rend="stroked"><unclear>mure</unclear>
</del></add>of war.</line>
<line><hi rend="circled"><metamark function="point" target="#l1"><g rend="arrow"/>
</metamark>Foreheads of men have bled where no wounds <add place="below">were</add>.</hi>
</line><line>I <del rend="stroked">I was a German conscript,
and your </del><add place="above">am the <del rend="stroked">German</del><add place="above">enemy</add><del rend="stroked">when</del> you killed,
my</add> friend.</line><line>I knew you in this dark: for so you frowned</line><line>Yesterday through me as you jabbed and
killed.</line><line><del rend="stroked" xml:id="del2"><undo target="#del2"/>I parried; but my hands were
loath and</del> cold.</line><line xml:id="l1">Let us sleep now....</line>
</zone></surface>
</surfaceGrp></sourceDoc><text><body><div type="verse"><head><hi rend="capitalize"> STRANGE MEETING</hi>
</head><lg type="stanza"><l> It seemed that out of battle I escaped</l><l> Down some profound dull tunnel, long since scooped</l><l> Through granites which titanic wars had groined. </l>
</lg><lg type="stanza"><l> Yet also there encumbered sleepers groaned, </l><l> Too fast in thought or death to be bestirred. </l><l> Then, as I probed them, one sprang up, and stared</l><l> With piteous recognition in fixed eyes, </l><l> Lifting distressful hands, as if to bless. </l><l> And by his smile, I knew that sullen hall,- </l><l> By his <seg>dead</seg> smile I knew we stood in Hell. </l>
</lg><lg type="stanza"><l> With a thousand pains that vision ’s face was grained ; </l><l> Yet no blood reached there from the upper ground, </l><l> And no guns thumped, or down the flues made moan. </l><l> ’Strange friend, ’ I said, ’here is no cause to mourn.’ </l><l> ’None, ’ said that other, ’save the undone years, </l><l> The hopelessness. Whatever hope is yours, </l><l> Was my life also; I went hunting wild</l><l> After the wildest beauty in the world, </l><l> Which lies not calm in eyes, or braided hair, </l><l> But mocks the steady running of the hour, </l><l> And if it grieves, grieves richlier than here. </l><l> For by my glee might many men have laughed, </l><l> And of my weeping something had been left, </l><l> Which must die now. I mean the truth untold, </l><l> The pity of war, the pity war distilled. </l><l> Now men will go content with what we spoiled, </l><l> Or, discontent, boil bloody, and be spilled. </l><l> They will be swift with swiftness of the tigress. </l><l> None will break ranks, though nations trek from progress. </l><l> Courage was mine, and I had mystery, </l><l> Wisdom was mine, and I had mastery: </l><l> To miss the march of this retreating world</l>
<l> Into vain citadels that are not walled. </l><l> Then, when much blood had clogged their chariot-wheels, </l><l> I would go up and wash them from sweet wells, </l><l> Even with truths <seg>that</seg> lie too deep for taint. </l><l> I would have poured my spirit without stint</l><l> But not through wounds; not on the cess of war. </l><l> Foreheads of men have bled where no wounds were. </l>
</lg><lg type="stanza"><l> ’I am the enemy you killed, my friend. </l><l> I knew you in this dark: for so you frowned</l><l> Yesterday through me as you jabbed and killed. </l><l> I parried; but my hands were loath and cold. </l><l>Let us sleep now....’ </l>
</lg></div>
</body></text>
</TEI>
6.3 Getting better quality TEI XML
6.3.1 A more complex ODD
Write an ODD from scratch, or use Roma to create a skeleton and then edit the result. Use Roma to generate a schema, and then validate any of the ECCO files against the result. The ODD should have the following features:
1. There should be a <valList> for the @type attribute on <div> which limits it to a few fixed values; provide a <desc> for each <valItem>
2. There should be a Schematron constraint which checks that the <publicationStmt> is not empty
3. There should be a Schematron constraint which checks that all <div> elements have a <head>, unless they have a @type attribute with the value ’title_page’.
4. The examples for some elements should be replaced with ones from the ECCO texts
5. Mathematics using MathML should be allowed as a child of <formula> (you’ll need to study the Exemplars for this)
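Features 1 to 3 could be expressed along the following lines. This is a minimal sketch, not a finished customization: the schema name ecco_sketch, the three @type values with their descriptions, and the constraintSpec identifiers are all invented for illustration and should be replaced with values taken from the ECCO files. It also assumes that the ODD processor binds the tei: prefix used in the Schematron test; if it does not, an <s:ns> declaration is needed.

```xml
<schemaSpec xmlns="http://www.tei-c.org/ns/1.0"
            xmlns:s="http://purl.oclc.org/dsdl/schematron"
            ident="ecco_sketch" start="TEI">
  <moduleRef key="tei"/>
  <moduleRef key="core"/>
  <moduleRef key="header"/>
  <moduleRef key="textstructure"/>

  <elementSpec ident="div" module="textstructure" mode="change">
    <!-- Feature 3: every div needs a head unless it is a title page -->
    <constraintSpec ident="divNeedsHead" scheme="isoschematron">
      <constraint>
        <s:assert test="@type = 'title_page' or tei:head">
          A div must contain a head unless its type is title_page.
        </s:assert>
      </constraint>
    </constraintSpec>
    <!-- Feature 1: closed value list for div/@type (values are hypothetical) -->
    <attList>
      <attDef ident="type" mode="change">
        <valList type="closed" mode="replace">
          <valItem ident="title_page"><desc>a title page</desc></valItem>
          <valItem ident="chapter"><desc>a chapter of a prose work</desc></valItem>
          <valItem ident="poem"><desc>a single poem</desc></valItem>
        </valList>
      </attDef>
    </attList>
  </elementSpec>

  <elementSpec ident="publicationStmt" module="header" mode="change">
    <!-- Feature 2: treat a publicationStmt without any text as empty -->
    <constraintSpec ident="pubStmtNotEmpty" scheme="isoschematron">
      <constraint>
        <s:assert test="normalize-space(.) != ''">
          The publicationStmt must not be empty.
        </s:assert>
      </constraint>
    </constraintSpec>
  </elementSpec>
</schemaSpec>
```

Testing normalize-space(.) rather than the mere presence of child elements is a design choice: it also catches a <publicationStmt> that contains only empty children.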
6.3.2 Work with XSLT
Write an XSLT stylesheet which analyzes all the ECCO files and generates a closed <valList> for <div>/@type.
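One possible starting point for the <valList> task is sketched below. It assumes an XSLT 2.0 processor run from a named template, and it relies on the ?select= form of collection() URIs, which is a Saxon convention and not part of the XSLT standard. The parameter name corpus, its default value, and the template name main are hypothetical.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tei="http://www.tei-c.org/ns/1.0"
    xmlns="http://www.tei-c.org/ns/1.0"
    exclude-result-prefixes="tei">

  <xsl:output method="xml" indent="yes"/>

  <!-- Assumption: directory holding the ECCO files, relative to this stylesheet -->
  <xsl:param name="corpus" select="'ecco'"/>

  <xsl:template name="main">
    <!-- Saxon-specific: load every .xml file in the directory -->
    <xsl:variable name="docs"
                  select="collection(concat($corpus, '?select=*.xml'))"/>
    <valList type="closed">
      <!-- One valItem per distinct div/@type value, in alphabetical order -->
      <xsl:for-each select="distinct-values($docs//tei:div/@type)">
        <xsl:sort select="."/>
        <valItem ident="{.}">
          <desc>TODO: describe <xsl:value-of select="."/></desc>
        </valItem>
      </xsl:for-each>
    </valList>
  </xsl:template>
</xsl:stylesheet>
```

The generated <desc> elements are placeholders only; the descriptions still have to be written by hand before the list is pasted into an ODD.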
Write an XSLT stylesheet which checks that each pointer in a @resp attribute has a corresponding IDin the file.
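The pointer check could be approached as in the sketch below, which assumes XSLT 2.0 and plain text output. It only examines pointers of the form #id, since anything else points outside the current file; the wording of the report line is invented for illustration.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <xsl:template match="/">
    <xsl:variable name="doc" select="."/>
    <xsl:for-each select="//@resp">
      <!-- Remember the element carrying the attribute for the report -->
      <xsl:variable name="owner" select=".."/>
      <!-- @resp may hold several space-separated pointers -->
      <xsl:for-each select="tokenize(normalize-space(.), ' ')[starts-with(., '#')]">
        <xsl:variable name="id" select="substring(., 2)"/>
        <xsl:if test="empty($doc//*[@xml:id = $id])">
          <xsl:value-of select="concat('No xml:id found for pointer #', $id,
                                       ' on &lt;', name($owner), '&gt;&#10;')"/>
        </xsl:if>
      </xsl:for-each>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

An empty output means that every local pointer in @resp resolves to an ID in the same file.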
6.4 XSLT transformations for genetic editions
Your task is to write an XSLT transformation that makes a plausible rendition of the encoding of Wilfred Owen’s poem Strange Meeting as a web page. Study the input XML carefully, and consider the difference between the teiHeader/fileDesc/sourceDesc/msDesc, the <text>, the <facsimile>, and the <sourceDoc>. Then consider what sort of HTML you want to make. This could be
• four separate pages for header, facsimile, genetic editing and edited edition
• four sections in the same document
• side by side sections for some parts
You’ll need to decide what these look like in HTML and start creating the right structures in your XSL.
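As one way to begin, the sketch below takes the "sections in the same document" option and renders only two of the four parts: the genetic transcription from <sourceDoc> and the edited text from <text>. The HTML structure, the class names and the CSS rules are illustrative choices, not a prescribed layout, and elements such as <metamark>, <alt> and <ptr> are left to the built-in templates.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tei="http://www.tei-c.org/ns/1.0"
    xmlns="http://www.w3.org/1999/xhtml"
    exclude-result-prefixes="tei">

  <xsl:output method="xhtml" indent="no"/>

  <xsl:template match="/">
    <html>
      <head>
        <title><xsl:value-of select="(//tei:msItem/tei:title)[1]"/></title>
        <style>
          del { text-decoration: line-through; }
          ins.above { vertical-align: super; }
          ins.below { vertical-align: sub; }
        </style>
      </head>
      <body>
        <div id="genetic">
          <h2>Genetic transcription</h2>
          <xsl:apply-templates select="tei:TEI/tei:sourceDoc"/>
        </div>
        <div id="edited">
          <h2>Edited text</h2>
          <xsl:apply-templates select="tei:TEI/tei:text"/>
        </div>
      </body>
    </html>
  </xsl:template>

  <!-- Genetic layer: one paragraph per written line -->
  <xsl:template match="tei:line">
    <p class="line"><xsl:apply-templates/></p>
  </xsl:template>

  <!-- Deletions and additions keep @rend / @place as CSS classes -->
  <xsl:template match="tei:del">
    <del class="{@rend}"><xsl:apply-templates/></del>
  </xsl:template>
  <xsl:template match="tei:add">
    <ins class="{@place}"><xsl:apply-templates/></ins>
  </xsl:template>

  <!-- Edited layer: heading, stanzas and verse lines -->
  <xsl:template match="tei:head">
    <h3><xsl:apply-templates/></h3>
  </xsl:template>
  <xsl:template match="tei:lg">
    <div class="stanza"><xsl:apply-templates/></div>
  </xsl:template>
  <xsl:template match="tei:l">
    <p class="l"><xsl:apply-templates/></p>
  </xsl:template>
</xsl:stylesheet>
```

Carrying @rend and @place through as class names keeps decisions about how a waving or stroked deletion should look in the CSS, where they are easy to change.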
6.5 Grouping Exercises
6.5.1 Grouping, part 1
The TEI files from the Godwin project encode each type of event recorded in the diaries with <seg> elements. A number of people can be mentioned in each event, and they are encoded with <persName> elements. A @ref attribute provides a unique key for each person.
Find all names of people (<persName>) in some of the TEI files from Godwin and group them by the type of event within which they are mentioned (<seg>/@type). Sort the names alphabetically.
You can either produce a web page with the results, or a simple text file.
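A minimal sketch of part 1 as a plain text report, assuming XSLT 2.0 and a single Godwin file as input. It attaches each name to its nearest enclosing <seg> that carries a @type; the indentation of the output is an arbitrary choice.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tei="http://www.tei-c.org/ns/1.0">

  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- One group per event type -->
    <xsl:for-each-group select="//tei:seg[@type]//tei:persName"
                        group-by="ancestor::tei:seg[@type][1]/@type">
      <xsl:sort select="current-grouping-key()"/>
      <xsl:value-of select="current-grouping-key()"/>
      <xsl:text>&#10;</xsl:text>
      <!-- Names within the group, sorted alphabetically -->
      <xsl:for-each select="current-group()">
        <xsl:sort select="normalize-space(.)"/>
        <xsl:text>  </xsl:text>
        <xsl:value-of select="normalize-space(.)"/>
        <xsl:text>&#10;</xsl:text>
      </xsl:for-each>
    </xsl:for-each-group>
  </xsl:template>
</xsl:stylesheet>
```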
6.5.2 Grouping, part 2
Find all names of people (<persName>) in some of the TEI files from Godwin and group them by the type of event within which they are mentioned (<seg>/@type). Return the number of times each unique name (<persName>/@ref) is mentioned for each type of event.
You can either produce a web page with the results, or a simple text file.
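The counting in part 2 can be sketched with a second, nested <xsl:for-each-group> keyed on @ref. As before this assumes XSLT 2.0 and text output; names without a @ref are left out, which is a simplification.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tei="http://www.tei-c.org/ns/1.0">

  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Outer grouping: event type -->
    <xsl:for-each-group select="//tei:seg[@type]//tei:persName[@ref]"
                        group-by="ancestor::tei:seg[@type][1]/@type">
      <xsl:sort select="current-grouping-key()"/>
      <xsl:value-of select="concat(current-grouping-key(), '&#10;')"/>
      <!-- Inner grouping: one entry per unique person key -->
      <xsl:for-each-group select="current-group()" group-by="@ref">
        <xsl:sort select="current-grouping-key()"/>
        <xsl:value-of select="concat('  ', current-grouping-key(), ': ',
                                     count(current-group()), '&#10;')"/>
      </xsl:for-each-group>
    </xsl:for-each-group>
  </xsl:template>
</xsl:stylesheet>
```

Inside the inner group, current-group() holds only the mentions of one person within one event type, so count() gives the figure the exercise asks for.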
6.5.3 Grouping, part 3
Find all names of people (<persName>) in some of the TEI files from Godwin and group them by the type of event within which they are mentioned (<seg>/@type), then organize them by the event’s subtype (<seg>/@subtype).
Sort the names alphabetically.
You can either produce a web page with the results, or a simple text file.
6.5.4 Grouping, part 4
Each of the TEI files from Inscriptions of Roman Tripolitania (IRT) encodes the text from one epigraphic inscription, along with a substantial amount of metadata.
Imagine that you are working on a table of contents for the inscription and want to include a "snippet"of the text, for example the first two lines (each line is identified by a <lb>/@n).
Using <xsl:for-each-group> and @group-starting-with or @group-adjacent, return an XMLfile containing all the elements of the first two lines from an IRT file. That is in div[@type=’edition]/ab,between lb[@n=’1’] and lb[@n=’3’]
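A rough sketch of the @group-starting-with approach follows. It assumes that the <lb> elements are direct children of <ab> and that @n holds the plain values 1, 2, 3 and so on; the wrapper element name snippet is made up for illustration.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tei="http://www.tei-c.org/ns/1.0"
    version="2.0">
  <xsl:output method="xml" indent="no"/>

  <xsl:template match="/">
    <!-- "snippet" is a hypothetical wrapper element -->
    <snippet>
      <!-- Each lb starts a new group holding that lb and everything
           up to (but not including) the next lb -->
      <xsl:for-each-group
          select="//tei:div[@type='edition']/tei:ab/node()"
          group-starting-with="tei:lb">
        <!-- The context item is the first node of the group;
             keep only the groups opened by line 1 or line 2 -->
        <xsl:if test="self::tei:lb[@n = ('1', '2')]">
          <xsl:copy-of select="current-group()"/>
        </xsl:if>
      </xsl:for-each-group>
    </snippet>
  </xsl:template>
</xsl:stylesheet>
```

If the <lb> elements are nested more deeply in a given file, the select expression would need adjusting.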
6.6 Using XQuery
6.6.1 XQuery 1
The ECCO corpus contains many kinds of text. Use XQuery to retrieve all texts containing poetry (<lg>). Produce an HTML file containing the id of each file (tei:TEI//tei:idno[@type="TCP"]) and the first line of poetry.
6.6.2 XQuery 2
Each TEI file from Inscriptions of Roman Tripolitania (IRT) encodes the text from one epigraphic inscription. Metadata in the teiHeader contains information about the place where the object carrying the inscription was found.
Your task is to write some XQuery code to create an HTML page containing an index of "find spots" (tei:provenance//tei:event[@type=’found’]//tei:placeName[@type=’ancientFindspot’]) from all the IRT TEI files. Each location should contain a list of all the files where it is mentioned. The name of each file is found under tei:TEI//tei:idno[@type="filename"].
6.7 Using TEI stylesheet family
6.7.1 Introduction
This is a set of XSLT 2.0 specifications to transform TEI XML documents to XHTML, to LaTeX, to XSL Formatting Objects, to/from OOXML (docx), to/from OpenOffice (odt) and to ePub format. The files can be downloaded from the Releases area of http://tei.sf.net. They concentrate on the simpler TEI modules, but adding support for other modules is fairly easy. In the main, the setup has been used on ‘new’ documents, i.e. reports and web pages that have been authored from scratch, rather than traditional TEI-encoded existing material.
There is a change log file available.
The XSL FO style sheets were developed for use with PassiveTeX (http://projects.oucs.ox.ac.uk/passivetex/), a system using XSL formatting objects to render XML to PDF via LaTeX. They have not been extensively tested with the other XSL FO implementations.
6.7.2 File organisation
The main stylesheets are divided into four directories:
common2 templates which are independent of output type
fo2 templates for making XSL FO output
xhtml2 templates for making HTML output
latex2 templates for making LaTeX output
Within each directory there is a separate file for the templates which implement each of the TEI modules (e.g. textstructure.xsl, linking.xsl, or drama.xsl); these are included by a master file tei.xsl. This also includes a parameterization layer in the file tei-param.xsl, and the parameterization file from the common directory. The tei.xsl does any necessary declaration of constants and XSL keys.
There are further directories for special-purpose conversions:
epub conversion to ePub
odt conversion to and from OpenOffice Writer format
docx conversion to and from Word OOXML format
odds2 processing of TEI ODD files
rdf conversion to RDF
txt conversion to plain text
The final important directory is profiles, which has a set of predefined project starting points, each of which may have a file to.xsl for one or more of the supported output formats (csv, dtd, html, odt, docbook, epub, latex, p4, docx, fo, lite, and relaxng). There may also be a from.xsl to go from the selected format to TEI XML.
For example, to convert TEI to HTML in the default manner, the user may run profiles/default/html/to.xsl on the selected input file. Other starting points are listed below.
For the brave, there are Linux/OSX command-line shell scripts docxtotei, odttotei, teitodocx, teitodtd, teitoepub, teitoepub3, teitohtml, teitoodt, teitordf, teitorelaxng, teitornc, teitotxt, and teitoxsd for converting to/from Word, to/from OpenOffice, and to DTD, ePub, HTML, RDF, RelaxNG, plain text, W3C schema etc. These are implemented using Ant tasks, which are also available within the oXygen XML editor as part of the TEI framework.
Any other use of the stylesheets, e.g. by referencing individual modules, is not supported and requires a good understanding of XSL.
6.7.3 Trying the PDF rendering
• Load an ECCO text into oXygen, choose the TEI P5 to PDF transform scenario (press the ‘Configure Transformation Scenario’ icon). If all goes well, your browser will load a PDF rendering in due course.
• Now set the parameter Institution to ‘Oxford Summer School’ and rerun the transformation. See the difference?
• More dramatically, change columnCount to have the value 2, and see what happens then.
• Set parIndent to ‘0em’ and parSkip to ‘2pt’
• Finally, change pageWidth to ‘1755mm’, change columnCount back to 1, run the transform, and check that the page width is lessened.
6.7.4 Going further with parameters of the HTML
Try some of these changes to the HTML rendering of Punch, by setting parameters, and check the results:
• Set autoToc to ‘false’
• Set numberHeadings to ‘false’
• Set pageLayout to ‘CSS’
• Set numberParagraphs to ‘true’
You can see the catalogue of parameters at http://www.tei-c.org/release/doc/tei-xsl-common/customize.html.
6.7.5 Using OxGarage
Now it is time to work with OxGarage, to check that you can create word-processor and ebook files. Visit oxgarage.oucs.ox.ac.uk:8080/ege-webclient and:
• Upload one of the ECCO files. Try conversions to Word or OpenOffice format, and check thatthey load into the relevant application properly.
• Make an ePub file, if you have an eBook reader to hand (Firefox users can download a good add-on from http://www.epubread.com/en/)
• Open Word or OpenOffice and write a simple document. Upload this to OxGarage and ask for TEI P5 XML to be sent back. Load it into oXygen and see if it is valid or usable. Do not expect miracles. OxGarage cannot read your mind...
• Edit the generated TEI file a little, then upload it back into OxGarage and ask for a Word or OpenOffice file. How does that compare with the one you started with?
6.7.6 Rolling our own
It is time to set up our own profile in the stylesheets, and add some custom code. You will need to
• Set up a copy of the stylesheet family by downloading it from Sourceforge, or checking it out using Subversion
• Adapt your local processing setup to point to the copy of the XSL
• Add a new profile, and subdirectories for your chosen format
• Put in support for some elements you use which are not properly covered by the existing setup
6.8 TEI reference material: XSL stylesheets
6.8.1 Introduction
This section describes how to produce a customization of the TEI stylesheets. It describes all the parameters which you can set, the templates which are designed to be changed, and the empty templates provided into which you can add your own code.
There are 13 areas for customization. In most cases there are parameters and templates which are specific to one of the three output methods (HTML, FO and LaTeX), and those which are common to all three.
6.8.2 Making HTML: example
You can simply refer to the specification xhtml2/tei.xsl directly with your XSL processor, or install it locally on your own server. For more flexibility, you may prefer to reference the specifications from an XSL wrapper of your own. The minimal specification would look like this:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:import href="xhtml2/tei.xsl"/>
</xsl:stylesheet>

You can customize the result by adding to this wrapper file. The normal result will be a single stream of HTML which you can save in a file. You can also configure it to produce multiple output files, one per top-level <div> or <div1>.
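As an illustration of adding to the wrapper, the sketch below overrides a few of the parameters documented in this section. The values are invented, the import path assumes the wrapper sits next to the xhtml2 directory, and version="2.0" is used on the assumption that an XSLT 2.0 processor runs the stylesheets.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="2.0">
  <!-- Path is an assumption: adjust to where your copy lives -->
  <xsl:import href="xhtml2/tei.xsl"/>

  <!-- Parameters declared here take precedence over the imported
       defaults; the values are examples only -->
  <xsl:param name="institution">Oxford Summer School</xsl:param>
  <xsl:param name="homeWords">My Project</xsl:param>
  <xsl:param name="homeURL">http://www.example.org/</xsl:param>
  <xsl:param name="bottomNavigationPanel">false</xsl:param>
</xsl:stylesheet>
```

Because the wrapper imports the main stylesheet, anything declared in the wrapper has higher import precedence, so no file in the distribution needs to be edited.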
6.8.3 Standard page features
The default behaviour of the system is to construct each HTML page with per-page navigation bars top and bottom, and a standard set of navigation links underneath.
Variables
Type | Name | Description | Default
 | department | Name of department within institution [string] |
 | homeLabel | Name of link to home page of application [string] | Home
 | homeURL | Project Home [anyURI] | http://www.tei-c.org/
 | homeWords | Project [string] | TEI
 | institution | Institution [string] | A TEI Project
 | parentURL | Institution link [anyURI] | http://www.tei-c.org/
 | parentWords | Name of overall institution [string] | Parent Institution
 | searchURL | Link to search application [anyURI] | http://www.google.com
xhtml | alignNavigationPanel | How to align the navigation panel at the bottom of the page [string] | right
xhtml | bottomNavigationPanel | Display navigation panel at bottom of pages [boolean] | true
xhtml | feedbackURL | Link for feedback [anyURI] | mailto:feedback
xhtml | htmlTitlePrefix | Fixed string to insert before normal page title in HTML meta <title> element [string] |
xhtml | linkPanel | Make a panel with next page/previous page links. [boolean] | true
Templates
6.8.4 Layout
There are three ways to provide a constant navigation aid. You can either make the whole page into a table, where the first column has a table of contents, or you can make an HTML frameset, or you can just have a table of links on the left or right.
Hypertext links present special problems, as we have to choose whether they should start a new window, occupy all of the current window, or stay within the frame. These stylesheets implement the following rules:
1. Any <ref> or <ptr> link stays within the frame
2. Any link containing ‘://’ uses the whole browser window
3. Any link starting ‘.’ uses the whole browser window
4. If the stylesheet sets no splitting of the document, any <ref> or <ptr> link uses the whole browser window
5. If a <ref> or <ptr> link has a rend attribute value of ‘noframe’, the whole browser window is used
6. If a <ref> or <ptr> link has a rend attribute value of ‘new’, a new browser window is started
Variables
Type | Name | Description | Default
 | oddWeaveLite | Whether to make simplified display of ODD [boolean] | false
 | parIndent | Paragraph indentation [string] | 1em
 | biblioStyle | Style for formatted bibliography [string] |
 | parSkip | Default spacing between paragraphs [string] | 0pt
xhtml | filePerPage | Whether we should construct a separate file for each page (based on page breaks) [boolean] | false
xhtml | viewPortWidth | When making fixed format epub, width of viewport [number] | 1200
xhtml | viewPortHeight | When making fixed format epub, height of viewport [number] | 1700
xhtml | consecutiveFNs | Number footnotes consecutively [boolean] | false
xhtml | footnoteBackLink | Link back from footnotes to reference [boolean] | false
xhtml | contentStructure | How to use the front/body/back matter in creating columns. The choice is between all: use <front> for left-hand column, use <body> for centre column, and use <back> for right-hand column; body: use <body> for right-hand column, generate left-hand with a TOC or whatever [string] | body
xhtml | divOffset | The difference between TEI div levels and HTML headings. TEI <div>s are implicitly or explicitly numbered from 0 upwards; this offset is added to that number to produce an HTML <Hn> element. So a value of 2 here means that a <div1> will generate an <h2> [integer] | 2
xhtml | footnoteFile | Make a separate file for footnotes [boolean] | false
xhtml | linksWidth | Width of left-hand column when $pageLayout is "Table" [string] | 15%
xhtml | navbarFile | XML resource defining a navigation bar. The XML should provide a <list> containing a series of <item> elements, each containing an <xref> link. [anyURI] |
xhtml | autoEndNotes | Make all notes into endnotes [boolean] | false
fo | backMulticolumns | Put back matter in multiple columns [boolean] | false
fo | bodyMarginBottom | Margin at bottom of text body [string] | 24pt
fo | bodyMarginTop | Margin at top of text body [string] | 24pt
fo | bodyMulticolumns | Put body matter in multiple columns [boolean] | false
fo | bulletFour | Symbol for 4th level itemized list [string] | +
fo | bulletOne | Symbol for top-level itemized list [string] | •
fo | bulletThree | Symbol for 3rd level itemized list [string] | *
fo | bulletTwo | Symbol for 2nd level itemized list [string] | –
fo | columnCount | Number of columns, when multiple-column work is requested [integer] | 1
fo | betweenStarts | XSL FO "provisional-distance-between-starts" [string] | 18pt
fo | betweenGlossStarts | XSL FO "provisional-distance-between-starts" for gloss lists [string] | 42pt
fo | betweenBiblStarts | XSL FO "provisional-distance-between-starts" for bibliographies [string] | 14pt
fo | divRunningheads | Display section headings in running heads [boolean] | false
fo | exampleAfter | Space below examples [string] | 4pt
fo | exampleBefore | Space above examples [string] | 4pt
fo | exampleMargin | Left margin for examples [string] | 12pt
fo | flowMarginLeft | Left margin of flow [string] |
fo | forcePageMaster | Which named page master name to use [string] |
fo | formatBackpage | How to format page numbers in back matter (use XSLT number format) [string] | 1
fo | formatBodypage | How to format page numbers in main matter (use XSLT number format) [string] | 1
fo | formatFrontpage | How to format page numbers in front matter (use XSLT number format) [string] | i
fo | frontMulticolumns | Put front matter in multiple columns [boolean] | false
fo | labelSeparation | XSL FO "provisional-label-separation" [string] | 6pt
fo | listAbove-1 | Space above lists at top level [string] | 6pt
fo | listAbove-2 | Space above lists at 2nd level [string] | 4pt
fo | listAbove-3 | Space above lists at 3rd level [string] | 0pt
fo | listAbove-4 | Space above lists at 4th level [string] | 0pt
fo | listBelow-1 | Space below lists at top level [string] | 6pt
fo | listBelow-2 | Space below lists at 2nd level [string] | 4pt
fo | listBelow-3 | Space below lists at 3rd level [string] | 0pt
fo | listBelow-4 | Space below lists at 4th level [string] | 0pt
fo | listItemsep | Spacing between list items [string] | 4pt
fo | listLeftGlossIndent | Left margin for gloss lists [string] | 0.5in
fo | listLeftGlossInnerIndent | Left margin for nested gloss lists [string] | 0.25in
fo | listLeftIndent | Indentation for lists [string] | 0pt
fo | listRightMargin | Right margin for lists [string] | 10pt
fo | pageHeight | Paper height [string] | 297mm
fo | pageMarginBottom | Margin at bottom of text area [string] | 100pt
fo | pageMarginLeft | Left margin [string] | 80pt
fo | pageMarginRight | Right margin [string] | 150pt
fo | pageMarginTop | Margin at top of text area [string] | 75pt
fo | pageWidth | Paper width [string] | 211mm
fo | parSkipmax | Maximum space allowed between paragraphs [string] | 12pt
fo | readColSpecFile | External XML file containing specifications for column sizes for tables in document [anyURI] |
fo | regionAfterExtent | Region after [string] | 14pt
fo | regionBeforeExtent | Region before [string] | 14pt
fo | sectionHeaders | Construct running headers from page number and section headings [boolean] | true
fo | spaceAfterBibl | Space after bibliography [string] | 0pt
fo | spaceAroundTable | Space above and below a table [string] | 8pt
fo | spaceBeforeBibl | Space above bibliography [string] | 4pt
fo | spaceBelowCaption | Space below caption of figure or table [string] | 4pt
fo | titlePage | Make title page [boolean] | true
fo | twoSided | Make 2-page spreads [boolean] | true
latex | classParameters | Optional parameters for documentclass [string] | 11pt,twoside
latex | latexLogo | Logo graphics file [string] |
latex | pagebreakStyle | When processing a "pb" element, decide what to generate: "active" generates a page break; "visible" generates a bracketed number (with scissors), and "bracketsonly" generates a bracketed number (without scissors). [float] |
latex | tableMaxWidth | When making a table, what width must be constrained to fit, as a proportion of the page width. [float] | 0.85
latex | verseNumbering | Whether to number lines of poetry [boolean] | false
latex | everyHowManyLines | When numbering poetry, how often to put in a line number [integer] | 5
latex | resetVerseLineNumbering | When numbering poetry, when to restart the sequence; this must be the name of a TEI element [string] | div1
latex | latexPaperSize | LaTeX paper size [] | a4paper
Templates
columnHeader (for xhtml) [html] Banner for top of column
hdr (for xhtml) [html] Header section across top of page
<xsl:call-template name="pageHeader">
  <xsl:with-param name="mode"/>
</xsl:call-template>
hdr2 (for xhtml) [html] Navigation bar
<xsl:call-template name="navbar"/>
preBreadCrumbPath (for xhtml) [html] Text or action to take at the start of the breadcrumb trail
hdr3 (for xhtml) [html] Breadcrumb trail
<html:a href="#rh-col" title="Go to main page content" class="skiplinks">Skip links</html:a>
<html:a class="hide">|</html:a>
<xsl:call-template name="crumbPath"/>
<html:a class="hide">|</html:a>
<html:a class="bannerright" href="{$parentURL}" title="Go to home page">
  <xsl:value-of select="$parentWords"/>
</html:a>
lh-col-bottom (for xhtml) [html] Bottom of left-hand column
ID of selected section

<xsl:param name="currentID"/>
<xsl:call-template name="leftHandFrame">
  <xsl:with-param name="currentID" select="$currentID"/>
</xsl:call-template>

lh-col-top (for xhtml) [html] Top of left-hand column

<xsl:call-template name="searchbox"/>
<xsl:call-template name="printLink"/>
logoPicture (for xhtml) [html] Logo
<html:a class="framelogo" href="http://www.tei-c.org/Stylesheets/">
  <html:img src="http://www.tei-c.org/release/common2/doc/tei-xsl-common/teixsl.png" vspace="5" width="124" height="161" border="0" alt="created by TEI XSL Stylesheets"/>
</html:a>
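Templates like this one are meant to be replaced from your own wrapper. The following is a sketch only: it assumes the html prefix is bound to the XHTML namespace, that the wrapper imports xhtml2/tei.xsl from a sibling directory, and that logo.png is a hypothetical image file of your own.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:html="http://www.w3.org/1999/xhtml"
    version="2.0">
  <!-- Path is an assumption: adjust to where your copy lives -->
  <xsl:import href="xhtml2/tei.xsl"/>

  <!-- Replaces the default logoPicture template; logo.png is a
       made-up file name, homeURL and homeWords are the documented
       parameters -->
  <xsl:template name="logoPicture">
    <html:a class="framelogo" href="{$homeURL}">
      <html:img src="logo.png" alt="{$homeWords}"/>
    </html:a>
  </xsl:template>
</xsl:stylesheet>
```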
metaHTML (for xhtml) [html] Making elements in HTML <head>
The text used to create the DC.Title field in the HTML header

<xsl:param name="title"/>
<html:meta name="author">
  <xsl:attribute name="content">
    <xsl:call-template name="generateAuthor"/>
  </xsl:attribute>
</html:meta>
<xsl:if test="$filePerPage='true'">
  <html:meta name="viewport" content="width={$viewPortWidth}, height={$viewPortHeight}"/>
</xsl:if>
<html:meta name="generator" content="Text Encoding Initiative Consortium XSLT stylesheets"/>
<xsl:choose>
  <xsl:when test="$outputTarget='html5' or $outputTarget='epub3'">
    <html:meta charset="utf-8"/>
  </xsl:when>
  <xsl:otherwise>
    <html:meta http-equiv="Content-Type" content="text/html; charset={$outputEncoding}"/>
    <html:meta name="DC.Title">
      <xsl:attribute name="content">
        <xsl:value-of select="normalize-space($title)"/>
      </xsl:attribute>
    </html:meta>
    <html:meta name="DC.Type" content="Text"/>
    <html:meta name="DC.Format" content="text/html"/>
  </xsl:otherwise>
</xsl:choose>
navbar (for xhtml) [html] Construction of navigation bar. A file is looked for relative to the stylesheet (the second parameter of the document function), which is expected to contain a TEI <list> where each <item> has an embedded <xref>

<xsl:choose>
  <xsl:when test="$navbarFile=''">
    <xsl:comment>no nav bar</xsl:comment>
  </xsl:when>
  <xsl:otherwise>
    <xsl:element name="{if ($outputTarget='html5') then 'nav' else 'div'}">
      <xsl:for-each select="document($navbarFile,document(''))">
        <xsl:for-each select="tei:list/tei:item">
          <html:span class="navbar">
            <html:a href="{$URLPREFIX}{tei:xref/@url}" class="navbar">
              <xsl:apply-templates select="tei:xref/text()"/>
            </html:a>
          </html:span>
          <xsl:if test="following-sibling::tei:item"> | </xsl:if>
        </xsl:for-each>
      </xsl:for-each>
    </xsl:element>
  </xsl:otherwise>
</xsl:choose>
pageHeader (for xhtml) [html] Banner for top of page
layout mode

<xsl:param name="mode"/>
<xsl:choose>
  <xsl:when test="$mode='table'">
    <html:table width="100%" border="0">
      <html:tr>
        <html:td height="98" class="bgimage" onclick="window.location='{$homeURL}'" cellpadding="0">
          <xsl:call-template name="makeHTMLHeading">
            <xsl:with-param name="class">subtitle</xsl:with-param>
            <xsl:with-param name="text">
              <xsl:call-template name="generateSubTitle"/>
            </xsl:with-param>
            <xsl:with-param name="level">2</xsl:with-param>
          </xsl:call-template>
          <xsl:call-template name="makeHTMLHeading">
            <xsl:with-param name="class">title</xsl:with-param>
            <xsl:with-param name="text">
              <xsl:call-template name="generateTitle"/>
            </xsl:with-param>
            <xsl:with-param name="level">1</xsl:with-param>
          </xsl:call-template>
        </html:td>
        <html:td style="vertical-align:top;"/>
      </html:tr>
    </html:table>
  </xsl:when>
  <xsl:otherwise>
    <xsl:call-template name="makeHTMLHeading">
      <xsl:with-param name="class">subtitle</xsl:with-param>
      <xsl:with-param name="text">
        <xsl:call-template name="generateSubTitle"/>
      </xsl:with-param>
      <xsl:with-param name="level">2</xsl:with-param>
    </xsl:call-template>
    <xsl:call-template name="makeHTMLHeading">
      <xsl:with-param name="class">title</xsl:with-param>
      <xsl:with-param name="text">
        <xsl:call-template name="generateTitle"/>
      </xsl:with-param>
      <xsl:with-param name="level">1</xsl:with-param>
    </xsl:call-template>
  </xsl:otherwise>
</xsl:choose>
rh-col-bottom (for xhtml) [html] Bottom of right-hand column
ID of selected section

<xsl:param name="currentID"/>
<xsl:call-template name="mainFrame">
  <xsl:with-param name="currentID" select="$currentID"/>
</xsl:call-template>
rh-col-top (for xhtml) [html] Top of right-hand column
<xsl:call-template name="columnHeader"/>
searchbox (for xhtml) [html] Make a search box
singleFileLabel (for xhtml) [html] Construct a label for the link which makes a printable version of the document.
For Printing
latexPackages (for latex) LaTeX package setup. Declaration of the LaTeX packages needed to implement this markup
<xsl:text>\usepackage[</xsl:text><xsl:value-of select="$latexPaperSize"/><xsl:text>,</xsl:text><xsl:value-of select="$latexGeometryOptions"/><xsl:text>]{geometry}\usepackage{framed}</xsl:text><xsl:text>\definecolor{shadecolor}{gray}{0.95}\usepackage{longtable}\usepackage[normalem]{ulem}\usepackage{fancyvrb}\usepackage{fancyhdr}\usepackage{graphicx}</xsl:text><xsl:if test="key(’ENDNOTES’,1)"> \usepackage{endnotes}<xsl:choose>
<xsl:when test="key(’FOOTNOTES’,1)"> \def\theendnote{\@alph\c@endnote}</xsl:when><xsl:otherwise> \def\theendnote{\@arabic\c@endnote}</xsl:otherwise>
</xsl:choose></xsl:if><xsl:text>\def\Gin@extensions{.pdf,.png,.jpg,.mps,.tif}</xsl:text><xsl:choose><xsl:when test="$reencode=’true’"><xsl:text>\IfFileExists{tipa.sty}{\usepackage{tipa}}{}
\usepackage{times}</xsl:text>
</xsl:when></xsl:choose><xsl:if test="not($userpackage=”)"> \usepackage{<xsl:value-of select="$userpackage"/>}</xsl:if><xsl:text> \pagestyle{fancy}</xsl:text>\usepackage[pdftitle={<xsl:call-template name="generateSimpleTitle"/>},pdfauthor={<xsl:call-template name="generateAuthor"/>}]{hyperref}\hyperbaseurl{<xsl:value-of select="$baseURL"/>}
<xsl:if test="count(key(’APP’,1))>0">\usepackage{ledmac}<xsl:call-template name="ledmacOptions"/></xsl:if>
latexSetup (for latex) LaTeX setup. The basic LaTeX setup which you should not really tinker with unless you really understand why and how. Note that we need to set up a mapping here for Unicode 8421, 10100 and 10101 to glyphs for backslash and the two curly brackets, to provide literal characters. The normal characters remain active for LaTeX commands. Note that if $reencode is set to false, no input or output encoding packages are loaded, since it is assumed you are using a TeX variant capable of dealing with UTF-8 directly.
<xsl:call-template name="latexSetupHook"/>\IfFileExists{xcolor.sty}%{\RequirePackage{xcolor}}%{\RequirePackage{color}}\usepackage{colortbl}<xsl:choose>
<xsl:when test="$reencode=’true’">\IfFileExists{utf8x.def}%{\usepackage[utf8x]{inputenc}\PrerenderUnicode{-}}%{\usepackage[utf8]{inputenc}}
<xsl:call-template name="latexBabel"/>\usepackage[T1]{fontenc}\usepackage{float}\usepackage[]{ucs}\uc@dclc{8421}{default}{\textbackslash }\uc@dclc{10100}{default}{\{}\uc@dclc{10101}{default}{\}}\uc@dclc{8491}{default}{\AA{}}\uc@dclc{8239}{default}{\,}\uc@dclc{20154}{default}{ }\uc@dclc{10148}{default}{>}\def\textschwa{\rotatebox{-90}{e}}\def\textJapanese{}\def\textChinese{}
</xsl:when><xsl:otherwise>\usepackage{fontspec}
\usepackage{xunicode}\catcode‘\=\active \def\{\textbackslash}\catcode‘{=\active \def{{\{}\catcode‘}=\active \def}{\}}\def\textJapanese{\fontspec{Kochi Mincho}}\def\textChinese{\fontspec{HAN NOM A}\XeTeXlinebreaklocale
"zh"\XeTeXlinebreakskip = 0pt plus 1pt }\def\textKorean{\fontspec{Baekmuk Gulim} }\setmonofont{<xsl:value-of select="$typewriterFont"/>}
<xsl:if test="not($sansFont=”)"> \setsansfont{<xsl:value-of select="$sansFont"/>}</xsl:if><xsl:if test="not($romanFont=”)"> \setromanfont{<xsl:value-of select="$romanFont"/>}</xsl:if>
</xsl:otherwise></xsl:choose>\DeclareTextSymbol{\textpi}{OML}{25}\usepackage{relsize}\def\textsubscript#1{%\@textsubscript{\selectfont#1}}\def\@textsubscript#1{%{\m@th\ensuremath{_{\mbox{\fontsize\sf@size\z@#1}}}}}\def\textquoted#1{‘#1’}\def\textsmall#1{{\small #1}}\def\textlarge#1{{\large #1}}\def\textoverbar#1{\ensuremath{\overline{#1}}}\def\textgothic#1{{\fontspec{<xsl:value-of select="$gothicFont"/>}#1}}\def\textcal#1{{\fontspec{<xsl:value-of select="$calligraphicFont"/>}#1}}\RequirePackage{array}\def\@testpach{\@chclass\ifnum \@lastchclass=6 \@ne \@chnum \@ne \else\ifnum \@lastchclass=7 5 \else\ifnum \@lastchclass=8 \tw@ \else\ifnum \@lastchclass=9 \thr@@\else \z@\ifnum \@lastchclass = 10 \else\edef\@nextchar{\expandafter\string\@nextchar}%\@chnum\if \@nextchar c\z@ \else\if \@nextchar l\@ne \else\if \@nextchar r\tw@ \else
\z@ \@chclass\if\@nextchar |\@ne \else\if \@nextchar !6 \else\if \@nextchar @7 \else\if \@nextchar (8 \else\if \@nextchar )9 \else10\@chnum\if \@nextchar m\thr@@\else\if \@nextchar p4 \else\if \@nextchar b5 \else\z@ \@chclass \z@ \@preamerr \z@ \fi \fi \fi \fi\fi \fi \fi \fi \fi \fi \fi \fi \fi \fi \fi \fi}
\gdef\arraybackslash{\let\\=\@arraycr}\def\textxi{\ensuremath{\xi}}\def\Panel#1#2#3#4{\multicolumn{#3}{){\columncolor{#2}}#4}{#1}}
<xsl:text disable-output-escaping="yes">\newcolumntype{L}[1]{){\raggedright\arraybackslash}p{#1}}\newcolumntype{C}[1]{){\centering\arraybackslash}p{#1}}\newcolumntype{R}[1]{){\raggedleft\arraybackslash}p{#1}}\newcolumntype{P}[1]{){\arraybackslash}p{#1}}\newcolumntype{B}[1]{){\arraybackslash}b{#1}}\newcolumntype{M}[1]{){\arraybackslash}m{#1}}\definecolor{label}{gray}{0.75}\DeclareRobustCommand*{\xref}{\hyper@normalise\xref@}\def\xref@#1#2{\hyper@linkurl{#2}{#1}}\def\Div[#1]#2{\section*{#2}}\begingroup\catcode‘\_=\active\gdef_#1{\ensuremath{\sb{\mathrm{#1}}}}\endgroup\mathcode‘\_=\string"8000\catcode‘\_=12\relax</xsl:text>
latexBabel (for latex) LaTeX babel setup. LaTeX loading of babel with options

\usepackage[english]{babel}
latexLayout (for latex) LaTeX layout preamble All the LaTeX setup which affects page layout
<xsl:choose><xsl:when test="$latexPaperSize=’a3paper’">\paperwidth297mm
\paperheight420mm</xsl:when><xsl:when test="$latexPaperSize=’a5paper’">
\paperwidth148mm\paperheight210mm
</xsl:when><xsl:when test="$latexPaperSize=’a4paper’">\paperwidth210mm
\paperheight297mm</xsl:when><xsl:when test="$latexPaperSize=’letterpaper’">\paperwidth216mm
\paperheight279mm</xsl:when><xsl:otherwise/>
</xsl:choose>\def\@pnumwidth{1.55em}\def\@tocrmarg {2.55em}\def\@dotsep{4.5}
\setcounter{tocdepth}{3}\clubpenalty=8000\emergencystretch 3em\hbadness=4000\hyphenpenalty=400\pretolerance=750\tolerance=2000\vbadness=4000\widowpenalty=10000<xsl:if test="not($docClass=’letter’)">\renewcommand\section{\@startsection{section}{1}{\z@}%{-1.75ex \@plus -0.5ex \@minus -.2ex}%{0.5ex \@plus .2ex}%{\reset@font\Large\bfseries\sffamily}}\renewcommand\subsection{\@startsection{subsection}{2}{\z@}%{-1.75ex\@plus -0.5ex \@minus- .2ex}%{0.5ex \@plus .2ex}%{\reset@font\Large\sffamily}}\renewcommand\subsubsection{\@startsection{subsubsection}{3}{\z@}%{-1.5ex\@plus -0.35ex \@minus -.2ex}%{0.5ex \@plus .2ex}%{\reset@font\large\sffamily}}\renewcommand\paragraph{\@startsection{paragraph}{4}{\z@}%{-1ex \@plus-0.35ex \@minus -0.2ex}%{0.5ex \@plus .2ex}%{\reset@font\normalsize\sffamily}}\renewcommand\subparagraph{\@startsection{subparagraph}{5}{\parindent}%{1.5ex \@plus1ex \@minus .2ex}%{-1em}%{\reset@font\normalsize\bfseries}}
</xsl:if>\def\l@section#1#2{\addpenalty{\@secpenalty} \addvspace{1.0em plus 1pt}\@tempdima 1.5em \begingroup\parindent \z@ \rightskip \@pnumwidth\parfillskip -\@pnumwidth\bfseries \leavevmode #1\hfil \hbox to\@pnumwidth{\hss #2}\par\endgroup}\def\l@subsection{\@dottedtocline{2}{1.5em}{2.3em}}\def\l@subsubsection{\@dottedtocline{3}{3.8em}{3.2em}}\def\l@paragraph{\@dottedtocline{4}{7.0em}{4.1em}}\def\l@subparagraph{\@dottedtocline{5}{10em}{5em}}\@ifundefined{c@section}{\newcounter{section}}{}\@ifundefined{c@chapter}{\newcounter{chapter}}{}\newif\if@mainmatter\@mainmattertrue\def\chaptername{Chapter}\def\frontmatter{%\pagenumbering{roman}\def\thechapter{\@roman\c@chapter}\def\theHchapter{\alph{chapter}}\def\@chapapp{}%}\def\mainmatter{%\cleardoublepage\def\thechapter{\@arabic\c@chapter}\setcounter{chapter}{0}\setcounter{section}{0}\pagenumbering{arabic}\setcounter{secnumdepth}{6}\def\@chapapp{\chaptername}%\def\theHchapter{\arabic{chapter}}
}\def\backmatter{%\cleardoublepage\setcounter{chapter}{0}\setcounter{section}{0}\setcounter{secnumdepth}{0}\def\@chapapp{\appendixname}%\def\thechapter{\@Alph\c@chapter}\def\theHchapter{\Alph{chapter}}\appendix}\newenvironment{bibitemlist}[1]{%\list{\@biblabel{\@arabic\c@enumiv}}%{\settowidth\labelwidth{\@biblabel{#1}}%\leftmargin\labelwidth\advance\leftmargin\labelsep\@openbib@code\usecounter{enumiv}%\let\p@enumiv\@empty\renewcommand\theenumiv{\@arabic\c@enumiv}%}%\sloppy\clubpenalty4000\@clubpenalty \clubpenalty\widowpenalty4000%\sfcode‘\.\@m}%{\def\@noitemerr{\@latex@warning{Empty ‘bibitemlist’ environment}}%\endlist}
\def\tableofcontents{\section*{\contentsname}\@starttoc{toc}}\parskip<xsl:value-of select="$parSkip"/>\parindent<xsl:value-of select="$parIndent"/>\def\Panel#1#2#3#4{\multicolumn{#3}{){\columncolor{#2}}#4}{#1}}\newenvironment{reflist}{%\begin{raggedright}\begin{list}{}{%\setlength{\topsep}{0pt}%\setlength{\rightmargin}{0.25in}%\setlength{\itemsep}{0pt}%\setlength{\itemindent}{0pt}%\setlength{\parskip}{0pt}%\setlength{\parsep}{2pt}%\def\makelabel##1{\itshape ##1}}%}{\end{list}\end{raggedright}}\newenvironment{sansreflist}{%\begin{raggedright}\begin{list}{}{%\setlength{\topsep}{0pt}%\setlength{\rightmargin}{0.25in}%\setlength{\itemindent}{0pt}%\setlength{\parskip}{0pt}%\setlength{\itemsep}{0pt}%\setlength{\parsep}{2pt}%\def\makelabel##1{\upshape\sffamily ##1}}%}{\end{list}\end{raggedright}}\newenvironment{specHead}[2]%{\vspace{20pt}\hrule\vspace{10pt}%\hypertarget{#1}{}%\markright{#2}%
<xsl:text> \pdfbookmark[</xsl:text><xsl:value-of select="$specLinkDepth"/><xsl:text>]{#2}{#1}%\hspace{-0.75in}{\bfseries\fontsize{16pt}{18pt}\selectfont#2}%}{}</xsl:text><xsl:call-template name="latexPreambleHook"/>
ledmacOptions (for latex) LaTeX setup commands for ledmac package

\renewcommand{\notenumfont}{\bfseries}
\lineation{page}
\linenummargin{inner}
\footthreecol{A}
\foottwocol{B}

latexBegin (for latex) LaTeX setup before start of document. All the LaTeX setup which is executed before the start of the document
<xsl:text>\makeatletter\thispagestyle{empty}\markright{\@title}\markboth{\@title}{\@author}\renewcommand\small{\@setfontsize\small{9pt}{11pt}\abovedisplayskip 8.5\p@plus3\p@ minus4\p@\belowdisplayskip \abovedisplayskip\abovedisplayshortskip \z@ plus2\p@\belowdisplayshortskip 4\p@ plus2\p@ minus2\p@\def\@listi{\leftmargin\leftmargini\topsep 2\p@ plus1\p@ minus1\p@\parsep 2\p@ plus\p@ minus\p@\itemsep 1pt}}\makeatother\fvset{frame=single,numberblanklines=false,xleftmargin=5mm,xrightmargin=5mm}\fancyhf{}\setlength{\headheight}{14pt}\fancyhead[LE]{\bfseries\leftmark}\fancyhead[RO]{\bfseries\rightmark}\fancyfoot[RO]{}\fancyfoot[CO]{\thepage}\fancyfoot[LO]{\TheID}\fancyfoot[LE]{}\fancyfoot[CE]{\thepage}\fancyfoot[RE]{\TheID}\hypersetup{linkbordercolor=0.75 0.75 0.75,urlbordercolor=0.75 0.750.75,bookmarksnumbered=true}\fancypagestyle{plain}{\fancyhead{}\renewcommand{\headrulewidth}{0pt}}</xsl:text>
latexEnd (for latex) LaTeX setup at end of document. All the LaTeX setup which is executed at the end of the document
6.8.5 Headings
Headings for sections can be customized in various ways.
Variables
Type | Name | Description | Default
 | autoHead | Construct a heading for <div> elements with no <head> [boolean] |
 | numberSpacer | Character to put after number of section header [string] | space
Templates
autoMakeHead (for common) [common] How to make a heading for section if there is no <head>

<xsl:param name="display"/>
<xsl:choose>
  <xsl:when test="tei:head and $display='full'">
    <xsl:apply-templates select="tei:head" mode="makeheading"/>
  </xsl:when>
  <xsl:when test="tei:head">
    <xsl:apply-templates select="tei:head" mode="plain"/>
  </xsl:when>
  <xsl:when test="tei:front/tei:head">
    <xsl:apply-templates select="tei:front/tei:head" mode="plain"/>
  </xsl:when>
  <xsl:when test="@n">
    <xsl:value-of select="@n"/>
  </xsl:when>
  <xsl:when test="@type">
    <xsl:text>[</xsl:text>
    <xsl:value-of select="@type"/>
    <xsl:text>]</xsl:text>
  </xsl:when>
  <xsl:otherwise>
    <xsl:text>&gt;</xsl:text>
  </xsl:otherwise>
</xsl:choose>
headingNumberSuffix (for common) Punctuation to insert after a section number
<xsl:text>.</xsl:text><xsl:value-of select="$numberSpacer"/>
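As an illustration of how the heading templates above can be customized, a wrapper stylesheet might redefine headingNumberSuffix so that section numbers end in a colon rather than a full stop. This is only a sketch: the xsl:import path is a placeholder for wherever your copy of the TEI stylesheets lives, and it assumes the imported stylesheet declares the numberSpacer parameter listed in the table.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumption: placeholder path; point this at your own copy of the TEI HTML stylesheet -->
  <xsl:import href="path/to/tei-stylesheets/html/tei.xsl"/>

  <!-- Redefine the named template: emit ":" instead of "." after a section number -->
  <xsl:template name="headingNumberSuffix">
    <xsl:text>:</xsl:text>
    <xsl:value-of select="$numberSpacer"/>
  </xsl:template>

</xsl:stylesheet>
```

Because the importing stylesheet has higher import precedence, its version of the named template is the one that gets called.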
6.8.6 Numbering
Section headings, figures, tables and notes can be numbered automatically. We can set the numbering of front matter and back matter separately. If you prefer to supply your own numbering, using the n attribute, you can choose this over automatic numbering.
Normally, heading numbers are followed by ‘. ’, but you can vary this. This would let you use eg fixed spaces.
Variables
| Type | Name | Description | Default |
|  | numberBackFigures | Automatically number figures in back matter [boolean] | false |
|  | numberBackHeadings | How to construct heading numbering in back matter [string] | A.1 |
|  | numberBackTables | Automatically number tables in back matter [boolean] | true |
|  | numberBodyHeadings | How to construct heading numbering in main matter [string] | 1.1.1.1 |
|  | numberFigures | Automatically number figures [boolean] | true |
|  | numberFrontFigures | Automatically number figures in front matter [boolean] | false |
|  | numberFrontHeadings | How to construct heading numbering in front matter [string] |  |
|  | numberFrontTables | Automatically number tables in front matter [boolean] | true |
|  | numberHeadings | Automatically number sections [boolean] | true |
|  | numberHeadingsDepth | Depth to which sections should be numbered [integer] | 9 |
|  | numberTables | Automatically number tables [boolean] | true |
|  | numberParagraphs | Use value of "n" attribute to number sections [boolean] | false |
|  | numberParagraphs | Automatically number paragraphs. [boolean] | false |
Templates
numberBackDiv (for common) [common] How to number sections in back matter
<xsl:if test="not($numberBackHeadings='')">
  <xsl:number count="tei:div|tei:div1|tei:div2|tei:div3|tei:div4|tei:div5|tei:div6" format="A.1.1.1.1.1" level="multiple"/>
</xsl:if>
numberBodyDiv (for common) [common] How to number sections in main matter
<xsl:if test="$numberHeadings='true'">
  <xsl:number count="tei:div|tei:div1|tei:div2|tei:div3|tei:div4|tei:div5|tei:div6" level="multiple"/>
</xsl:if>
numberFrontDiv (for common) [common] How to number sections in front matter
<xsl:param name="minimal"/>
<xsl:number count="tei:div|tei:div1|tei:div2|tei:div3|tei:div4|tei:div5|tei:div6" level="multiple"/>
<xsl:if test="$minimal='false'">
  <xsl:value-of select="$numberSpacer"/>
</xsl:if>
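A customization that adjusts the numbering parameters could look roughly like the sketch below. The import path is a placeholder, and the chosen values are illustrative only. Since numberBackDiv only numbers when numberBackHeadings is not the empty string, declaring that parameter with no content should switch back-matter numbering off.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumption: placeholder path to your copy of the TEI HTML stylesheet -->
  <xsl:import href="path/to/tei-stylesheets/html/tei.xsl"/>

  <!-- Number sections, but only two levels deep (illustrative value) -->
  <xsl:param name="numberHeadings">true</xsl:param>
  <xsl:param name="numberHeadingsDepth">2</xsl:param>

  <!-- Empty value: the test not($numberBackHeadings='') in numberBackDiv fails,
       so back-matter sections get no number -->
  <xsl:param name="numberBackHeadings"/>

  <!-- Leave figures unnumbered -->
  <xsl:param name="numberFigures">false</xsl:param>

  <!-- Use a non-breaking (fixed) space after the section number -->
  <xsl:param name="numberSpacer">&#160;</xsl:param>

</xsl:stylesheet>
```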
6.8.7 Output
You can set a name for the output file(s); if you ask for multiple output files, this name will be used to create unique filenames for each section. By default, results will go to wherever your XSLT processor normally writes (usually standard output). If you opt to have files created, you can specify the name of the directory where the output is to be placed.
If you are making HTML, do you want a single output page, or a separate one for each section of the document? You can decide to have a different splitting policy for front and back matter.
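The choices described above map onto the parameters in the Variables table of this section. A minimal sketch of a customization that writes one HTML file per top-level section is given here; the import path, the directory name html-out and the file name guide are all hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumption: placeholder path to your copy of the TEI HTML stylesheet -->
  <xsl:import href="path/to/tei-stylesheets/html/tei.xsl"/>

  <!-- Create files instead of writing to standard output -->
  <xsl:param name="STDOUT">false</xsl:param>

  <!-- 0 splits top-level sections apart; the default of -1 means no splitting -->
  <xsl:param name="splitLevel">0</xsl:param>

  <!-- Hypothetical output directory and base file name -->
  <xsl:param name="outputDir">html-out</xsl:param>
  <xsl:param name="outputName">guide</xsl:param>

  <!-- Keep front and back matter on single pages even though the body is split -->
  <xsl:param name="splitFrontmatter">false</xsl:param>
  <xsl:param name="splitBackmatter">false</xsl:param>

</xsl:stylesheet>
```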
Variables
| Type | Name | Description | Default |
|  | outputTarget | Type of output being generated [string] | html |
|  | REQUEST | The complete URL when the document is being delivered from a web server (normally set by Apache or Cocoon) [string] |  |
|  | STDOUT | Write to standard output channel [boolean] | true |
| xhtml | ID | An ID passed to the stylesheet to indicate which section to display [string] |  |
| xhtml | requestedID | A wrapper around the ID, to allow for other ways of getting it [string] | <xsl:value-of select="$ID"/> |
| xhtml | URLPREFIX | A path fragment to put before all internal URLs [string] |  |
| xhtml | outputName | The name of the output file [string] |  |
| xhtml | outputDir | Directory in which to place generated files. [string] |  |
| xhtml | outputEncoding | Encoding of output file(s). [string] | utf-8 |
| xhtml | outputMethod | Output method for output file(s). [string] | xhtml |
| xhtml | outputSuffix | Suffix of output file(s). [string] | .html |
| xhtml | doctypePublic | Public Doctype of output file(s). [string] | -//W3C//DTD XHTML 1.0 Transitional//EN |
| xhtml | doctypeSystem | System Doctype of output file(s). [string] | http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd |
| xhtml | pageLayout | The style of HTML (Simple, CSS or Table) which creates the layout for generated pages. The choice is between Simple: a linear presentation is created; CSS: the page is created as a series of nested <div>s which can be arranged using CSS into a multicolumn layout; Table: the page is created as an HTML table [string] | Simple |
| xhtml | splitBackmatter | Break back matter into separate HTML pages (if splitting enabled). [boolean] | true |
| xhtml | splitFrontmatter | Break front matter into separate HTML pages (if splitting enabled). [boolean] | true |
| xhtml | splitLevel | Level at which to split sections. When processing a <div> or <div[0-5]>, compare the nesting depth and see whether to start a new HTML page. Since the TEI starts with <div1>, setting this parameter to 0 will cause top-level sections to be split apart. The default is not to split at all. [integer] | -1 |
| xhtml | standardSuffix | Suffix for generated output files. [string] | <xsl:choose> <xsl:when test="tei:teiCorpus">.html</xsl:when> <xsl:when test="$STDOUT='true'"/> <xsl:otherwise> <xsl:value-of select="$outputSuffix"/> </xsl:otherwise></xsl:choose> |
| xhtml | topNavigationPanel | Display navigation panel at top of pages. [boolean] | true |
| xhtml | urlChunkPrefix | How to specify intra-document links. When a document is split, links need to be constructed between parts of the document. The default is to use a query parameter on the URL. [string] | ?ID= |
| xhtml | useIDs | Construct links using existing ID values. It is often nice if, when making separate files, their names correspond to the ID attribute of the <div>. Alternatively, you can let the system choose names. [boolean] | true |
| xhtml | autoBlockQuote | Whether it should be attempted to make quotes into block quotes if they are over a certain length [boolean] | false |
| xhtml | autoBlockQuoteLength | Length beyond which a quote is a block quote [integer] | 150 |
| fo | language | Language (for hyphenation) [string] | en_US |
| fo | foEngine | Name of intended XSL FO engine. This is used to tailor the result for different XSL FO processors. By default, no special measures are taken, so there are no bookmarks or other such features. Possible values are passivetex (the TeX-based PassiveTeX processor), xep (XEP), fop (FOP), antenna (Antenna House) [string] |  |
| latex | baseURL | URL root where referenced documents are located [string] |  |
| latex | reencode | Whether or not to load LaTeX packages which attempt to process the UTF-8 characters. Set to "false" if you are using XeTeX or similar. [boolean] | true |
| latex | realFigures | Use real name of graphics files rather than pointers [boolean] | true |
Templates
6.8.8 Table of contents generation
You probably want tables of contents built for your document, using the <div> structure. However, if you have used a <divGen type="toc"> explicitly, that will also create a table of contents, so you can suppress the automatic one. When a table of contents is created, you choose how many levels of headings it will show. You can choose whether or not the front and back matter appear in the table of contents.
Variables
| Type | Name | Description | Default |
| xhtml | autoToc | Make an automatic table of contents [boolean] | true |
| xhtml | class_subtoc | CSS class for second-level TOC entries [string] | subtoc |
| xhtml | subTocDepth | Depth at which to stop doing a recursive table of contents. You can have a mini table of contents at the start of each section. The default is only to construct a TOC at the top level; a value of -1 here means no subtoc at all. [integer] | -1 |
| xhtml | tocBack | Include the back matter in the table of contents. [boolean] | true |
| xhtml | tocDepth | Depth to which table of contents is constructed. [string] | 5 |
| xhtml | tocFront | Include the front matter in the table of contents. [boolean] | true |
| xhtml | tocElement | Which HTML element to wrap each TOC entry in. [string] | p |
| xhtml | tocContainerElement | Which HTML element to wrap each TOC section in. [string] | div |
| xhtml | refDocFooterText | Text to link back to from foot of ODD reference pages [string] | TEI Guidelines |
| xhtml | refDocFooterURL | URL to link back to from foot of ODD reference pages [anyURI] | index.html |
| fo | div0Tocindent | Indentation for level 0 TOC entries [string] | 0in |
| fo | div1Tocindent | Indentation for level 1 TOC entries [string] | 0.25in |
| fo | div2Tocindent | Indentation for level 2 TOC entries [string] | 0.5in |
| fo | div3Tocindent | Indentation for level 3 TOC entries [string] | 0.75in |
| fo | div4Tocindent | Indentation for level 4 TOC entries [string] | 1in |
| fo | div5Tocindent | Indentation for level 5 TOC entries [string] | 1.25in |
| fo | tocBack | Make TOC for sections in <back> [boolean] | true |
| fo | tocFront | Make TOC for sections in <front> [boolean] | true |
| fo | tocNumberSuffix | Punctuation to insert after a section number in a TOC [string] | . |
| fo | tocStartPage | Page number on which TOC should start [integer] | 1 |
Templates
navInterSep (for xhtml) Gap between elements in navigation list
<xsl:text>: </xsl:text>
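To show how the table-of-contents settings in this section fit together, here is a hedged sketch of a customization that keeps the automatic TOC but limits it to two levels of body headings, and changes the navigation separator. The import path is a placeholder and the values are illustrative.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumption: placeholder path to your copy of the TEI HTML stylesheet -->
  <xsl:import href="path/to/tei-stylesheets/html/tei.xsl"/>

  <!-- Keep the automatic TOC, but show only two levels of headings -->
  <xsl:param name="autoToc">true</xsl:param>
  <xsl:param name="tocDepth">2</xsl:param>

  <!-- Leave front and back matter out of the TOC -->
  <xsl:param name="tocFront">false</xsl:param>
  <xsl:param name="tocBack">false</xsl:param>

  <!-- Replace the default ": " gap between navigation entries -->
  <xsl:template name="navInterSep">
    <xsl:text> | </xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

If the document already contains an explicit <divGen type="toc">, setting autoToc to false instead would avoid producing two tables of contents.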
6.8.9 Internationalization
At various places, the system has to create text. You can choose the words it uses (eg translate them to another language).
Variables
Type Name Description Default
Templates
copyrightStatement (for xhtml) [html] Make a copyright claim
This page is copyrighted
6.8.10 CSS
Setting up material for the CSS file to accompany HTML output.
Variables
| Type | Name | Description | Default |
|  | class_toc | CSS class for TOC entries [string] | toc |
| xhtml | class_ptr | CSS class for links derived from <ptr> [string] | ptr |
| xhtml | class_ref | CSS class for links derived from <ref> [string] | ref |
| xhtml | cssFile | CSS style file to be associated with output file(s) [anyURI] | http://www.tei-c.org/release/xml/tei/stylesheet/tei.css |
| xhtml | cssPrintFile | CSS style file for print; this will be given a media=print attribute. [anyURI] | http://www.tei-c.org/release/xml/tei/stylesheet/tei-print.css |
| xhtml | cssSecondaryFile | Secondary CSS style file; this will be given a media=screen attribute, so that it does not affect printing. It should be used for screen layout. [anyURI] |  |
| xhtml | cssInlineFile | CSS file to include in the output file directly [anyURI] |  |
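For example, a project that ships its own style sheets might point the CSS parameters at local files, along the lines of the sketch below. The import path and both CSS file names are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumption: placeholder path to your copy of the TEI HTML stylesheet -->
  <xsl:import href="path/to/tei-stylesheets/html/tei.xsl"/>

  <!-- Hypothetical local replacement for the default tei.css -->
  <xsl:param name="cssFile">css/project.css</xsl:param>

  <!-- Hypothetical screen-only layout rules; described above as media=screen -->
  <xsl:param name="cssSecondaryFile">css/screen-layout.css</xsl:param>

  <!-- Use a project-specific class name on TOC entries -->
  <xsl:param name="class_toc">project-toc</xsl:param>

</xsl:stylesheet>
```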
Templates
6.8.11 Tables
Default behaviour of table elements.
Variables
| Type | Name | Description | Default |
|  | cellAlign | Default alignment of table cells [string] | left |
|  | tableAlign | Default alignment of tables [string] | left |
| fo | defaultCellLabelBackground | Default colour for background of table cells which are labelling rows or columns [string] | silver |
| fo | inlineTables | Force tables to appear inline [boolean] | false |
| fo | makeTableCaption | Put a caption on tables [boolean] | true |
| fo | tableCaptionAlign | Alignment of table captions [string] | center |
| fo | tableCellPadding | Default padding on table cells [string] | 2pt |
Templates
tableCaptionstyle (for fo) [fo] Set attributes for display of table
<xsl:attribute name="text-align">center</xsl:attribute>
<xsl:attribute name="font-style">italic</xsl:attribute>
<xsl:attribute name="end-indent">
  <xsl:value-of select="$exampleMargin"/>
</xsl:attribute>
<xsl:attribute name="start-indent">
  <xsl:value-of select="$exampleMargin"/>
</xsl:attribute>
<xsl:attribute name="space-before">
  <xsl:value-of select="$spaceAroundTable"/>
</xsl:attribute>
<xsl:attribute name="space-after">
  <xsl:value-of select="$spaceBelowCaption"/>
</xsl:attribute>
<xsl:attribute name="keep-with-next">always</xsl:attribute>
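A template such as tableCaptionstyle only emits attributes, so a customization can replace the whole set. The following is a sketch under stated assumptions: the import path is a placeholder for the XSL FO entry point of your copy of the stylesheets, and the attribute values are arbitrary examples.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumption: placeholder path to your copy of the TEI XSL FO stylesheet -->
  <xsl:import href="path/to/tei-stylesheets/fo/tei.xsl"/>

  <!-- Shade label cells more lightly and add a little more padding -->
  <xsl:param name="defaultCellLabelBackground">lightgray</xsl:param>
  <xsl:param name="tableCellPadding">3pt</xsl:param>

  <!-- Left-aligned bold captions instead of centred italic ones -->
  <xsl:template name="tableCaptionstyle">
    <xsl:attribute name="text-align">left</xsl:attribute>
    <xsl:attribute name="font-weight">bold</xsl:attribute>
    <xsl:attribute name="space-before">6pt</xsl:attribute>
    <xsl:attribute name="space-after">3pt</xsl:attribute>
    <xsl:attribute name="keep-with-next">always</xsl:attribute>
  </xsl:template>

</xsl:stylesheet>
```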
6.8.12 Figures and graphics
Sometimes you need to prefix the names of all graphics files with a directory name or a URL, or provide a default suffix. You can also tell <figure> elements whether or not to produce anything.
Variables
| Type | Name | Description | Default |
|  | graphicsPrefix | Directory specification to put before names of graphics files, unless they start with "./" [string] |  |
|  | graphicsSuffix | Default file suffix for graphics files, if not directly specified [string] | .png |
|  | standardScale | Scaling of imported graphics [decimal] | 1 |
|  | headInXref | [common] Whether cross-reference to a figure or table includes its caption [boolean] | true |
| xhtml | dpi | Resolution of images. This is needed to calculate HTML width and height (in pixels) from supplied dimensions. [integer] | 96 |
| xhtml | showFigures | Display figures. [boolean] | true |
| fo | autoScaleFigures | How to scale figures if no width and height specified (pass to XSL FO content-width) [string] |  |
| fo | captionInlineFigures | Put captions on inline figures [boolean] | false |
| fo | showFloatHead | Show the contents of <head> in a cross-reference to table or figure [boolean] | false |
| fo | showFloatLabel | Show a title for figures or tables (eg Table or Figure) in a cross-reference [boolean] | false |
| fo | xrefShowPage | Show the page number in a cross-reference to table or figure [boolean] | false |
Templates
figureCaptionstyle (for fo) [fo] Set attributes for display of figures
<xsl:attribute name="text-align">center</xsl:attribute>
<xsl:attribute name="font-style">italic</xsl:attribute>
<xsl:attribute name="end-indent">
  <xsl:value-of select="$exampleMargin"/>
</xsl:attribute>
<xsl:attribute name="start-indent">
  <xsl:value-of select="$exampleMargin"/>
</xsl:attribute>
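The graphics parameters of this section might be set as in the sketch below, for a project whose images all live in one directory. The import path and the directory name images/ are hypothetical, and the suffix is just an example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumption: placeholder path to your copy of the TEI HTML stylesheet -->
  <xsl:import href="path/to/tei-stylesheets/html/tei.xsl"/>

  <!-- Hypothetical directory put before graphics file names
       (not applied to names starting with "./") -->
  <xsl:param name="graphicsPrefix">images/</xsl:param>

  <!-- Suffix used when a graphic's name does not specify one -->
  <xsl:param name="graphicsSuffix">.jpg</xsl:param>

  <!-- Still produce output for <figure> elements -->
  <xsl:param name="showFigures">true</xsl:param>

</xsl:stylesheet>
```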
6.8.13 Style
You can choose lots of features which affect the font, size, etc.
• What font to use for URLs.
• Whether titles, dates and authors are shown.
• Whether headings of objects are included in cross-references.
Variables
| Type | Name | Description | Default |
|  | pagebreakStyle | Display of <pb> element. Choices are "visible", "active" and "none". [string] | visible |
|  | displayMode | How to display Relax NG schema fragments (rnc or rng) [string] | rnc |
|  | minimalCrossRef | Provide minimal context for a link [boolean] | false |
|  | postQuote | Character to insert at end of quote. [string] | ’ |
|  | preQuote | Character to insert at start of quote [string] | ‘ |
| xhtml | urlMarkup | HTML element to put around visible text of display URLs [string] | span |
| fo | activeLinebreaks | Make <lb> active (ie cause a line break) [boolean] | true |
| fo | alignment | Alignment of text (ie justified or ragged) [string] | justify |
| fo | authorSize | Font size for display of author name [string] | 14pt |
| fo | biblSize | Font size for bibliography [string] | 16pt |
| fo | bodyFont | Default font for body [string] | Times |
| fo | bodyMaster | Default font size for body (without dimension) [string] | 10 |
| fo | bodySize | Calculation of normal body font size (add dimension) [string] | <xsl:value-of select="$bodyMaster"/><xsl:text>pt</xsl:text> |
| fo | dateSize | Font size for display of date [string] | 14pt |
| fo | divFont | Font for section headings [string] | Times |
| fo | exampleColor | Colour for display of <eg> blocks. [string] | black |
| fo | exampleBackgroundColor | Colour for background display of <eg> blocks. [string] | gray |
| fo | exampleSize | Calculation of font size for examples (add dimension) [string] | <xsl:value-of select="$bodyMaster * 0.6"/><xsl:text>pt</xsl:text> |
| fo | quoteSize | Calculation of font size for quotations [string] | <xsl:value-of select="$bodyMaster * 0.9"/><xsl:text>pt</xsl:text> |
| fo | footnoteSize | Font size for footnotes [string] | <xsl:value-of select="$bodyMaster * 0.8"/> |
| fo | footnotenumSize | Font size for footnote numbers [string] | <xsl:value-of select="$bodyMaster * 0.7"/> |
| fo | giColor | Colour for display of element names [string] | black |
| fo | headingOutdent | Indentation of headings [string] | 0em |
| fo | hyphenate | Hyphenate text [boolean] | true |
| fo | identColor | Colour for display of <ident> values. Customization parameter. [string] | black |
| fo | runFont | Font family for running header and footer [string] | sans-serif |
| fo | runSize | Font size for running header and footer [string] | 9pt |
| fo | sansFont | Sans-serif font [string] | Helvetica |
| fo | smallSize | Calculation of small font size (add dimension) [string] | <xsl:value-of select="$bodyMaster * 0.9"/><xsl:text>pt</xsl:text> |
| fo | tableSize | Create font size for tables, by reference to $bodyMaster [string] | <xsl:value-of select="$bodyMaster * 0.9"/><xsl:text>pt</xsl:text> |
| fo | titleSize | Font size for display of title [string] | 16pt |
| fo | tocSize | Font size for TOC heading [string] | 16pt |
| fo | typewriterFont | Font for literal code [string] | Courier |
| latex | typewriterFont | Font for literal code [string] | DejaVu Sans Mono |
| latex | sansFont | Font for sans-serif [string] |  |
| latex | romanFont | Font for serif [string] |  |
| latex | gothicFont | Font for gothic [string] | Lucida Blackletter |
| latex | calligraphicFont | Font for calligraphic [string] | Lucida Calligraphy |
Templates
divXRefHeading (for fo) [fo] How to display section headings in a cross-reference section title
<xsl:param name="head"><xsl:apply-templates mode="section" select="tei:head"/>
</xsl:param><xsl:text> (</xsl:text><xsl:value-of select="normalize-space($head)"/><xsl:text>)</xsl:text>
linkStyle (for fo) [fo] Set attributes for display of links
<xsl:attribute name="text-decoration">underline</xsl:attribute>
setupDiv0 (for fo) [fo] Set attributes for display of heading for chapters (level 0)
<xsl:attribute name="font-size">18pt</xsl:attribute><xsl:attribute name="text-align">left</xsl:attribute><xsl:attribute name="font-weight">bold</xsl:attribute><xsl:attribute name="space-after">6pt</xsl:attribute><xsl:attribute name="space-before.optimum">12pt</xsl:attribute><xsl:attribute name="text-indent"><xsl:value-of select="$headingOutdent"/>
</xsl:attribute>
setupDiv1 (for fo) [fo] Set attributes for display of heading for 1st level sections
<xsl:attribute name="font-size">14pt</xsl:attribute><xsl:attribute name="text-align">left</xsl:attribute>
<xsl:attribute name="font-weight">bold</xsl:attribute><xsl:attribute name="space-after">3pt</xsl:attribute><xsl:attribute name="space-before.optimum">9pt</xsl:attribute><xsl:attribute name="text-indent"><xsl:value-of select="$headingOutdent"/>
</xsl:attribute>
setupDiv2 (for fo) [fo] Set attributes for display of heading for 2nd level sections
<xsl:attribute name="font-size">12pt</xsl:attribute><xsl:attribute name="text-align">left</xsl:attribute><xsl:attribute name="font-weight">bold</xsl:attribute><xsl:attribute name="font-style">italic</xsl:attribute><xsl:attribute name="space-after">2pt</xsl:attribute><xsl:attribute name="space-before.optimum">4pt</xsl:attribute><xsl:attribute name="text-indent"><xsl:value-of select="$headingOutdent"/>
</xsl:attribute>
setupDiv3 (for fo) [fo] Set attributes for display of heading for 3rd level sections
<xsl:attribute name="font-size">10pt</xsl:attribute><xsl:attribute name="text-align">left</xsl:attribute><xsl:attribute name="font-style">italic</xsl:attribute><xsl:attribute name="space-after">0pt</xsl:attribute><xsl:attribute name="space-before.optimum">4pt</xsl:attribute><xsl:attribute name="text-indent"><xsl:value-of select="$headingOutdent"/>
</xsl:attribute>
setupDiv4 (for fo) [fo] Set attributes for display of heading for 4th level sections
<xsl:attribute name="font-size">10pt</xsl:attribute><xsl:attribute name="text-align">left</xsl:attribute><xsl:attribute name="font-style">italic</xsl:attribute><xsl:attribute name="space-after">0pt</xsl:attribute><xsl:attribute name="space-before.optimum">4pt</xsl:attribute><xsl:attribute name="text-indent"><xsl:value-of select="$headingOutdent"/>
</xsl:attribute>
setupDiv5 (for fo) [fo] Set attributes for display of heading for 5th level sections
<xsl:attribute name="font-size">10pt</xsl:attribute><xsl:attribute name="text-align">left</xsl:attribute><xsl:attribute name="font-style">italic</xsl:attribute><xsl:attribute name="space-after">0pt</xsl:attribute><xsl:attribute name="space-before.optimum">4pt</xsl:attribute><xsl:attribute name="text-indent"><xsl:value-of select="$headingOutdent"/>
</xsl:attribute>
setupDiv6 (for fo) [fo] Set attributes for display of heading for 6th level sections
<xsl:attribute name="font-size">10pt</xsl:attribute><xsl:attribute name="text-align">left</xsl:attribute><xsl:attribute name="font-style">italic</xsl:attribute><xsl:attribute name="space-after">0pt</xsl:attribute><xsl:attribute name="space-before.optimum">4pt</xsl:attribute><xsl:attribute name="text-indent"><xsl:value-of select="$headingOutdent"/>
</xsl:attribute>
showXrefURL (for fo) [fo] How to display the link text of a <ptr> (the URL being linked to)
<xsl:param name="dest"/><xsl:value-of select="$dest"/>
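Several of the font sizes in the table above are computed from bodyMaster, so changing that one parameter rescales quotations, examples, footnotes and tables together. The sketch below changes the base size and also overrides one heading template; the import path is a placeholder and the values are illustrative, not recommendations.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumption: placeholder path to your copy of the TEI XSL FO stylesheet -->
  <xsl:import href="path/to/tei-stylesheets/fo/tei.xsl"/>

  <!-- Base size without a unit; sizes defined as $bodyMaster * factor follow it -->
  <xsl:param name="bodyMaster">11</xsl:param>

  <!-- Fixed sizes are not derived from bodyMaster and are set separately -->
  <xsl:param name="titleSize">18pt</xsl:param>

  <!-- First-level section headings: slightly smaller than the default 14pt -->
  <xsl:template name="setupDiv1">
    <xsl:attribute name="font-size">13pt</xsl:attribute>
    <xsl:attribute name="text-align">left</xsl:attribute>
    <xsl:attribute name="font-weight">bold</xsl:attribute>
    <xsl:attribute name="space-after">3pt</xsl:attribute>
    <xsl:attribute name="space-before.optimum">9pt</xsl:attribute>
    <xsl:attribute name="text-indent">
      <xsl:value-of select="$headingOutdent"/>
    </xsl:attribute>
  </xsl:template>

</xsl:stylesheet>
```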
6.8.14 Hooks
A set of templates which are empty by default; they can be used to add code at strategic points. The content must be valid XSLT.
Variables
Type Name Description Default
Templates
sectionHeadHook (for common) [common] Hook where actions can be inserted when making a heading
bodyHook (for xhtml) [html] Hook where HTML can be inserted just after <body>
bodyEndHook (for xhtml) [html] Hook where HTML can be inserted just before the <body> ends. This can be used to add a page-wide footer block.
bodyJavascriptHook (for xhtml) [html] Hook where Javascript calls can be inserted just after <body>
cssHook (for xhtml) [html] Hook where extra CSS can be inserted
headHook (for xhtml) [html] Hook where code can be added to the HTML <head>. This would be used to insert <meta> tags.
imgHook (for xhtml) [html] Hook where HTML can be inserted when creating an <img>
figureHook (for xhtml) [html] Hook where HTML can be inserted when processing a figure
javascriptHook (for xhtml) [html] Hook where extra Javascript functions can be defined
preAddressHook (for xhtml) [html] Hook where HTML can be inserted just before the <address>
startDivHook (for xhtml) [html] Hook where HTML can be inserted at the start of processing each section
startHook (for xhtml) [html] Hook where HTML can be inserted at the beginning of the main text, after the header
teiEndHook (for xhtml) [html] Hook where HTML can be inserted after processing <TEI>
teiStartHook (for xhtml) [html] Hook where HTML can be inserted before processing <TEI>
xrefHook (for xhtml) [html] Hook where HTML can be inserted when creating an <a> element
egXMLStartHook (for xhtml) [html] Hook where HTML can be inserted when processing an <egXML> element
afterBodyHook (for fo) [fo] Hook where extra material can be inserted after the <body> has been processed
blockStartHook (for fo) [fo] Hook where work can be done at the start of each block
pageMasterHook (for fo) [fo] Hook where extra page masters can be defined
beginDocumentHook (for latex) [latex] Hook where LaTeX commands can be inserted after the beginning of the document
latexSetupHook (for latex) [latex] Hook where LaTeX commands can be inserted at start of setup
latexPreambleHook (for latex) [latex] Hook where LaTeX commands can be inserted in the preamble before the beginning of the document
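As a sketch of how the hooks are meant to be used, the customization below fills in two of the xhtml hooks: headHook adds a <meta> tag and bodyEndHook adds a page-wide footer. The import path is a placeholder, and the meta content, class name and footer text are invented for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumption: placeholder path to your copy of the TEI HTML stylesheet -->
  <xsl:import href="path/to/tei-stylesheets/html/tei.xsl"/>

  <!-- Called while the HTML <head> is built: add a <meta> tag -->
  <xsl:template name="headHook">
    <meta xmlns="http://www.w3.org/1999/xhtml"
          name="description" content="Example TEI edition"/>
  </xsl:template>

  <!-- Called just before the <body> ends: add a footer block -->
  <xsl:template name="bodyEndHook">
    <div xmlns="http://www.w3.org/1999/xhtml" class="site-footer">
      <p>Produced with the TEI XSL stylesheets.</p>
    </div>
  </xsl:template>

</xsl:stylesheet>
```

The literal result elements are placed in the XHTML namespace so that they match the rest of the generated page.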
6.8.15 Miscellaneous and advanced
Finally, some miscellaneous or advanced features which you probably won’t use much.
Variables
| Type | Name | Description | Default |
|  | teixslHome | The home page for these stylesheets [anyURI] | http://www.tei-c.org/Stylesheets/ |
|  | teiP4Compat | Process elements according to assumptions of TEI P4 [boolean] | false |
|  | useHeaderFrontMatter | Title, author and date are taken from the <teiHeader> rather than looked for in the front matter [boolean] | false |
|  | useFixedDate | Whether to attempt to work out a current date (set to true for test results which won’t differ) [boolean] | false |
| xhtml | generateParagraphIDs | Generate a unique ID for all paragraphs [boolean] | false |
| xhtml | rendSeparator | Character separating values in a rend attribute. Some projects use multiple values in rend attributes. These are handled, but the separator character(s) must be specified. [string] | ; |
| xhtml | showTitleAuthor | Show a title and author at start of document [boolean] | false |
| xhtml | verbose | Be talkative while working. [boolean] | false |
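These parameters are set in the same way as all the others. A small sketch, assuming a project that separates multiple rend values with spaces and wants title and author taken from the header; the import path is a placeholder.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumption: placeholder path to your copy of the TEI HTML stylesheet -->
  <xsl:import href="path/to/tei-stylesheets/html/tei.xsl"/>

  <!-- rend="bold italic" style values: separator is a space instead of ";" -->
  <xsl:param name="rendSeparator"><xsl:text> </xsl:text></xsl:param>

  <!-- Take title, author and date from the <teiHeader> -->
  <xsl:param name="useHeaderFrontMatter">true</xsl:param>

  <!-- Show title and author at the start of the document -->
  <xsl:param name="showTitleAuthor">true</xsl:param>

</xsl:stylesheet>
```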
Templates
6.9 Quick reference cards for XSLT, XQuery, XPath, Regular Expressions, and Schematron
155
Text/
Str
ing F
uncti
ons
codepoin
t-equal(xs:s
trin
g?,
xs:s
trin
g?)
as
xs:b
oole
an?
codepoin
ts-to
-str
ing(x
s:inte
ger*
) as x
s:s
trin
g
com
pare
(xs:s
trin
g?,
xs:s
trin
g?)
as x
s:inte
ger?
com
pare
(xs:s
trin
g?,
xs:s
trin
g?,
xs:s
trin
g)
as
xs:inte
ger?
concat(
xs:a
nyA
tom
icType?,
xs:a
nyA
tom
icType?,
)
as x
s:s
trin
g
conta
ins(x
s:s
trin
g?,
xs:s
trin
g?)
as x
s:b
oole
an
conta
ins(x
s:s
trin
g?,
xs:s
trin
g?,
xs:s
trin
g)
as
xs:b
oole
an
curr
ent-
date
() a
s x
s:d
ate
curr
ent-
date
Tim
e()
as x
s:d
ate
Tim
e
curr
ent-
tim
e()
as x
s:t
ime
defa
ult
-collati
on()
as x
s:s
trin
g
encode-fo
r-uri
(xs:s
trin
g?)
as x
s:s
trin
g
ends-w
ith(x
s:s
trin
g?,
xs:s
trin
g?)
as x
s:b
oole
an
ends-w
ith(x
s:s
trin
g?,
xs:s
trin
g?,
xs:s
trin
g)
as
xs:b
oole
an
escape-htm
l-uri
(xs:s
trin
g?)
as x
s:s
trin
g
low
er-
case(x
s:s
trin
g?)
as x
s:s
trin
g
norm
alize-space()
as x
s:s
trin
g
norm
alize-space(x
s:s
trin
g?)
as x
s:s
trin
g
norm
alize-unic
ode(x
s:s
trin
g?)
as x
s:s
trin
g
norm
alize-unic
ode(x
s:s
trin
g?,
xs:s
trin
g)
as
xs:s
trin
g
sta
rts-w
ith(x
s:s
trin
g?,
xs:s
trin
g?)
as x
s:b
oole
an
sta
rts-w
ith(x
s:s
trin
g?,
xs:s
trin
g?,
xs:s
trin
g) as
xs:b
oole
an
str
ing()
as x
s:s
trin
g
str
ing(ite
m()
?) a
s x
s:s
trin
g
str
ing-jo
in(x
s:s
trin
g*,
xs:s
trin
g)
as x
s:s
trin
g
str
ing-le
ngth
() a
s x
s:inte
ger
str
ing-le
ngth
(xs:s
trin
g?)
as x
s:inte
ger
str
ing-to
-codepoin
ts(x
s:s
trin
g?)
as x
s:inte
ger*
substr
ing(x
s:s
trin
g?,
xs:d
ouble
) as x
s:s
trin
g
substr
ing(x
s:s
trin
g?,
xs:d
ouble
, xs:d
ouble
) as
xs:s
trin
g
substr
ing-aft
er(
xs:s
trin
g?,
xs:s
trin
g?)
as x
s:s
trin
g
substr
ing-aft
er(
xs:s
trin
g?,
xs:s
trin
g?,
xs:s
trin
g)
as
xs:s
trin
g
substr
ing-befo
re(x
s:s
trin
g?,
xs:
str
ing?)
as x
s:s
trin
g
substr
ing-befo
re(x
s:s
trin
g?,
xs:
str
ing?,
xs:s
trin
g)
as x
s:s
trin
g
transla
te(x
s:s
trin
g?,
xs:s
trin
g, xs:s
trin
g)
as x
s:s
trin
g
upper-
case(x
s:s
trin
g?)
as x
s:s
trin
g
XSL-Lis
t:
htt
p:/
/w
ww
.mulb
err
yte
ch.c
om
/xsl/
xsl-
list
REG
EX F
uncti
ons
matc
hes(x
s:s
trin
g?,
xs:s
trin
g) as x
s:b
oole
an
matc
hes(x
s:s
trin
g?,
xs:s
trin
g,
xs:s
trin
g) as
xs:b
oole
an
rep
lace(x
s:s
trin
g?,
xs:s
trin
g,
xs:
str
ing) as
xs:s
trin
g
rep
lace(x
s:s
trin
g?,
xs:s
trin
g,
xs:
str
ing, xs:s
trin
g)
as x
s:s
trin
g
tokeniz
e(x
s:s
trin
g?,
xs:s
trin
g) as x
s:s
trin
g*
tokeniz
e(x
s:s
trin
g?,
xs:s
trin
g,
xs:s
trin
g) as
xs:s
trin
g*
Ari
thm
eti
c O
pera
tors
+
(num
eri
c) as ~
num
eri
c
(num
eri
c) +
(num
eri
c)
as ~
num
eri
c
- (num
eri
c)
as ~
num
eri
c
(num
eri
c) - (num
eri
c) as ~
num
eri
c
(num
eri
c) *
(num
eri
c)
as ~
num
eri
c
(num
eri
c) d
iv (num
eri
c) as ~
num
eri
c
(num
eri
c)
idiv
(num
eri
c)
as x
s:inte
ger
(num
eri
c) m
od
(num
eri
c)
as ~
num
eri
c
Ari
thm
eti
c F
uncti
ons
abs(n
um
eri
c?)
as ~
num
eri
c?
avg(x
s:a
nyA
tom
icType*)
as ~
xs:
anyA
tom
icType?
ceilin
g(n
um
eri
c?)
as ~
num
eri
c?
floor(
num
eri
c?)
as ~
num
eri
c?
num
ber(
) as x
s:d
ouble
num
ber(
xs:a
nyA
tom
icType?)
as x
s:d
ouble
round(n
um
eri
c?)
as ~
num
eri
c?
round-half
-to
-even(n
um
eri
c?)
as ~
num
eri
c?
round-half
-to
-even(n
um
eri
c?,
xs:inte
ger)
as
~num
eri
c?
sum
(xs:a
nyA
tom
icType*)
as ~
xs:a
nyA
tom
icType
sum
(xs:a
nyA
tom
icType*,
xs:a
nyA
tom
icType?)
as
~xs:a
nyA
tom
icType?
The e
q, ne, lt
, gt,
le a
nd g
e c
om
pari
sons a
re
support
ed f
or
the n
um
eri
c t
ypes.
Sequence O
pera
tors
(ite
m()
*) ,
(it
em
()*)
as ~
item
()*
(node()
*) u
nio
n (node()
*) a
s ~
node()
*
(node()
*) inte
rsect
(node()
*) a
s ~
node()
*
(node()
*) e
xcept
(node()
*) a
s ~
node()
*
(xs:inte
ger)
to (
xs:inte
ger)
as x
s:inte
ger*
Node C
om
pari
sons
(node()
) is
(node()
) as x
s:b
oole
an
(node()
) <
< (node()
) as x
s:b
oole
an
(node()
) >
> (node()
) as x
s:b
oole
an
Sequence a
nd N
ode F
uncti
ons
collecti
on()
as n
ode()
*
collecti
on(x
s:s
trin
g?)
as n
ode()
*
count(
item
()*)
as x
s:inte
ger
data
(ite
m()
*) a
s ~
xs:a
nyA
tom
icType*
deep-equal(it
em
()*,
ite
m()
*) a
s x
s:b
oole
an
deep-equal(it
em
()*,
ite
m()
*, s
trin
g)
as x
s:b
oole
an
dis
tinct-
valu
es(x
s:a
nyA
tom
icType*)
as
~xs:a
nyA
tom
icType*
dis
tinct-
valu
es(x
s:a
nyA
tom
icType*,
xs:s
trin
g)
as
~xs:a
nyA
tom
icType*
doc(x
s:s
trin
g?)
as d
ocum
ent-
node()
?
em
pty
(ite
m()
*) a
s x
s:b
oole
an
exactl
y-one(ite
m()
*) a
s ~
item
()
exis
ts(ite
m()
*) a
s x
s:b
oole
an
index-of(
xs:a
nyA
tom
icType*,
xs:a
nyA
tom
icType)
as x
s:inte
ger*
index-of(
xs:a
nyA
tom
icType*,
xs:a
nyA
tom
icType,
xs:s
trin
g) as x
s:inte
ger*
insert
-befo
re(ite
m()
*, x
s:inte
ger,
ite
m()
*) a
s
~it
em
()*
last(
) as x
s:inte
ger
nille
d(n
ode()
?) a
s x
s:b
oole
an?
node-nam
e(n
ode()
?) a
s x
s:Q
Nam
e?
one-or-
more
(ite
m()
*) a
s ~
item
()+
posit
ion()
as x
s:inte
ger
rem
ove(ite
m()
*, x
s:inte
ger)
as ~
item
()*
revers
e(ite
m()
*) a
s ~
item
()*
root(
) as n
ode()
root(
node()
?) a
s n
ode()
?
subsequence(ite
m()
*, x
s:d
ouble
) as ~
item
()*
subsequence(ite
m()
*, x
s:d
ouble
, xs:d
ouble
) as
~it
em
()*
unord
ere
d(ite
m()
*) a
s ~
item
()*
zero
-or-
one(ite
m()
*) a
s ~
item
()?
Mis
cellaneous F
uncti
ons
err
or(
) as n
one
err
or(
xs:Q
Nam
e) as n
one
err
or(
xs:Q
Nam
e?,
xs:s
trin
g)
as n
one
err
or(
xs:Q
Nam
e?,
xs:s
trin
g,
item
()*)
as n
one
lang(x
s:s
trin
g?)
as x
s:b
oole
an
lang(x
s:s
trin
g?,
node()
) as x
s:b
oole
an
max(x
s:a
nyA
tom
icType*)
as ~
xs:a
nyA
tom
icType?
max(x
s:a
nyA
tom
icType*,
str
ing)
as
~xs:a
nyA
tom
icType?
min
(xs:a
nyA
tom
icType*)
as ~
xs:
anyA
tom
icType?
min
(xs:a
nyA
tom
icType*,
str
ing)
as
~xs:a
nyA
tom
icType?
trace(ite
m()
*, x
s:s
trin
g)
as ~
item
()*
Boole
an F
uncti
ons
boole
an(ite
m()
*) a
s x
s:b
oole
an
fals
e()
as x
s:b
oole
an
not(
item
()*)
as x
s:b
oole
an
true()
as x
s:b
oole
an
The e
q, ne, lt
, gt,
le a
nd g
e c
om
pari
sons a
re
support
ed f
or
the x
s:b
oole
an t
ype.
UR
I, ID
and X
ML N
am
e F
uncti
ons
base-uri
() a
s x
s:a
nyU
RI?
base-uri
(node()
?) a
s x
s:a
nyU
RI?
docum
ent-
uri
(node()
?) a
s x
s:a
nyU
RI?
doc-available
(xs:s
trin
g?)
as x
s:b
oole
an
in-scope-pre
fixes(e
lem
ent(
)) a
s x
s:s
trin
g*
id(x
s:s
trin
g*)
as e
lem
ent(
)*
id(x
s:s
trin
g*,
node()
) as e
lem
ent(
)*
idre
f(xs:s
trin
g*)
as n
ode()
*
idre
f(xs:s
trin
g*,
node()
) as n
ode()
*
iri-
to-uri
(xs:s
trin
g?)
as x
s:s
trin
g
local-
nam
e()
as x
s:s
trin
g
local-
nam
e(n
ode()
?) a
s x
s:s
trin
g
local-
nam
e-fr
om
-Q
Nam
e(x
s:Q
Nam
e?)
as
xs:N
CN
am
e?
nam
e()
as x
s:s
trin
g
nam
e(n
ode()
?) a
s x
s:s
trin
g
nam
espace-uri
() a
s x
s:a
nyU
RI
nam
espace-uri
(node()
?) a
s x
s:a
nyU
RI
nam
espace-uri
-fo
r-pre
fix(x
s:s
trin
g?,
ele
ment(
))
as x
s:a
nyU
RI?
nam
espace-uri
-fr
om
-Q
Nam
e(x
s:Q
Nam
e?)
as
xs:a
nyU
RI?
pre
fix-fr
om
-Q
Nam
e(x
s:Q
Nam
e?)
as x
s:N
CN
am
e?
QN
am
e(x
s:s
trin
g?,
xs:s
trin
g)
as x
s:Q
Nam
e
resolv
e-Q
Nam
e(x
s:s
trin
g?,
ele
ment(
)) a
s
xs:Q
Nam
e?
resolv
e-uri
(xs:s
trin
g?)
as x
s:a
nyU
RI?
resolv
e-uri
(xs:s
trin
g?,
xs:s
trin
g)
as x
s:a
nyU
RI?
sta
tic-base-uri
() a
s x
s:a
nyU
RI?
Built-
In S
chem
a T
ypes
These t
ypes a
re a
vailable
in a
ll im
ple
menta
tions.
xs:a
nyA
tom
icType
xs:g
Month
xs:a
nySim
ple
Type
xs:a
nyU
RI
xs:a
nyType
xs:g
Month
Day
xs:b
ase64Bin
ary
xs:g
Year
xs:b
oole
an
xs:g
YearM
onth
xs:d
ate
xs:h
exBin
ary
xs:d
ate
Tim
e
xs:inte
ger
xs:d
ayTim
eD
ura
tion
xs:Q
Nam
e
xs:d
ecim
al
xs:s
trin
g
xs:d
ouble
xs:t
ime
xs:d
ura
tion
xs:u
nty
ped
xs:f
loat
xs:u
nty
pedA
tom
ic
xs:g
Day
xs:y
earM
onth
Dura
tion
XPath Functions
156

Date/Time Functions
adjust-date-to-timezone(xs:date?) as xs:date?
adjust-date-to-timezone(xs:date?, xs:dayTimeDuration?) as xs:date?
adjust-dateTime-to-timezone(xs:dateTime?) as xs:dateTime?
adjust-dateTime-to-timezone(xs:dateTime?, xs:dayTimeDuration?) as xs:dateTime?
adjust-time-to-timezone(xs:time?) as xs:time?
adjust-time-to-timezone(xs:time?, xs:dayTimeDuration?) as xs:time?
dateTime(xs:date?, xs:time?) as xs:dateTime?
day-from-date(xs:date?) as xs:integer?
day-from-dateTime(xs:dateTime?) as xs:integer?
days-from-duration(xs:duration?) as xs:integer?
hours-from-dateTime(xs:dateTime?) as xs:integer?
hours-from-duration(xs:duration?) as xs:integer?
hours-from-time(xs:time?) as xs:integer?
implicit-timezone() as xs:dayTimeDuration
minutes-from-dateTime(xs:dateTime?) as xs:integer?
minutes-from-duration(xs:duration?) as xs:integer?
minutes-from-time(xs:time?) as xs:integer?
month-from-date(xs:date?) as xs:integer?
month-from-dateTime(xs:dateTime?) as xs:integer?
months-from-duration(xs:duration?) as xs:integer?
seconds-from-dateTime(xs:dateTime?) as xs:decimal?
seconds-from-duration(xs:duration?) as xs:decimal?
seconds-from-time(xs:time?) as xs:decimal?
timezone-from-date(xs:date?) as xs:dayTimeDuration?
timezone-from-dateTime(xs:dateTime?) as xs:dayTimeDuration?
timezone-from-time(xs:time?) as xs:dayTimeDuration?
year-from-date(xs:date?) as xs:integer?
year-from-dateTime(xs:dateTime?) as xs:integer?
years-from-duration(xs:duration?) as xs:integer?

XPath 2.0: http://www.w3.org/TR/xpath20/
XQuery 1.0: http://www.w3.org/TR/xquery/
XQuery 1.0 & XPath 2.0 Functions & Operators: http://www.w3.org/TR/xpath-functions/

XSLT-Only Functions
current() as item()
current-group() as item()*
current-grouping-key() as xs:anyAtomicType?
document(item()*) as node()*
document(item()*, node()) as node()*
element-available(xs:string) as xs:boolean
format-dateTime(xs:dateTime?, xs:string, xs:string?, xs:string?, xs:string?) as xs:string?
format-dateTime(xs:dateTime?, xs:string) as xs:string?
format-date(xs:date?, xs:string, xs:string?, xs:string?, xs:string?) as xs:string?
format-date(xs:date?, xs:string) as xs:string?
format-number(numeric?, xs:string) as xs:string
format-number(numeric?, xs:string, xs:string) as xs:string
format-time(xs:time?, xs:string, xs:string?, xs:string?, xs:string?) as xs:string?
format-time(xs:time?, xs:string) as xs:string?
function-available(xs:string) as xs:boolean
function-available(xs:string, xs:integer) as xs:boolean
generate-id() as xs:string
generate-id(node()?) as xs:string
key(xs:string, xs:anyAtomicType*) as node()*
key(xs:string, xs:anyAtomicType*, node()) as node()*
regex-group(xs:integer) as xs:string
system-property(xs:string) as xs:string
type-available(xs:string) as xs:boolean
unparsed-text(xs:string?) as xs:string?
unparsed-text(xs:string?, xs:string) as xs:string?
unparsed-text-available(xs:string?) as xs:boolean
unparsed-text-available(xs:string?, xs:string?) as xs:boolean
unparsed-entity-uri(xs:string) as xs:anyURI
unparsed-entity-public-id(xs:string) as xs:string

Argument Notation
  numeric  Any of xs:integer, xs:decimal, xs:float or xs:double.
  *        A sequence of the indicated type.
  ?        The indicated type or empty sequence.
  ~        The result type varies depending on the arguments.
  xs:      http://www.w3.org/2001/XMLSchema

XQuery 1.0 & XPath 2.0 Functions & Operators Quick Reference
2008-07-21
Sam Wilmott, sam@wilmott.ca, http://www.wilmott.ca
and
Mulberry Technologies, Inc.
17 West Jefferson Street, Suite 207
Rockville, MD 20850 USA
Phone: +1 301/315-9631
Fax: +1 301/315-8285
info@mulberrytech.com
http://www.mulberrytech.com
© 2007-2008 Sam Wilmott and Mulberry Technologies, Inc.

Date/Time Operators
(xs:date) + (xs:dayTimeDuration) as xs:date
(xs:date) + (xs:yearMonthDuration) as xs:date
(xs:dateTime) + (xs:dayTimeDuration) as xs:dateTime
(xs:dateTime) + (xs:yearMonthDuration) as xs:dateTime
(xs:dayTimeDuration) + (xs:dayTimeDuration) as xs:dayTimeDuration
(xs:time) + (xs:dayTimeDuration) as xs:time
(xs:yearMonthDuration) + (xs:yearMonthDuration) as xs:yearMonthDuration
(xs:date) - (xs:date) as xs:dayTimeDuration
(xs:date) - (xs:dayTimeDuration) as xs:date
(xs:date) - (xs:yearMonthDuration) as xs:date
(xs:dateTime) - (xs:dateTime) as xs:dayTimeDuration
(xs:dateTime) - (xs:dayTimeDuration) as xs:dateTime
(xs:dateTime) - (xs:yearMonthDuration) as xs:dateTime
(xs:dayTimeDuration) - (xs:dayTimeDuration) as xs:dayTimeDuration
(xs:time) - (xs:dayTimeDuration) as xs:time
(xs:time) - (xs:time) as xs:dayTimeDuration
(xs:yearMonthDuration) - (xs:yearMonthDuration) as xs:yearMonthDuration
(xs:dayTimeDuration) * (xs:double) as xs:dayTimeDuration
(xs:yearMonthDuration) * (xs:double) as xs:yearMonthDuration
(xs:dayTimeDuration) div (xs:dayTimeDuration) as xs:decimal
(xs:dayTimeDuration) div (xs:double) as xs:dayTimeDuration
(xs:yearMonthDuration) div (xs:double) as xs:yearMonthDuration
(xs:yearMonthDuration) div (xs:yearMonthDuration) as xs:decimal
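
The operator signatures above have loose counterparts in other languages. As an illustration only, the sketch below uses Python's standard datetime module as a stand-in (an assumption of this example, not something the reference card mentions): date plays the role of xs:date and timedelta the role of xs:dayTimeDuration. Python has no direct counterpart of xs:yearMonthDuration, so those rows are not shown, and the division result is a float rather than an xs:decimal.

```python
from datetime import date, timedelta

# (xs:date) + (xs:dayTimeDuration) as xs:date
last_day = date(2012, 7, 2) + timedelta(days=4)

# (xs:date) - (xs:date) as xs:dayTimeDuration
span = date(2012, 7, 6) - date(2012, 7, 2)

# (xs:dayTimeDuration) * (xs:double) as xs:dayTimeDuration
half_day = timedelta(days=1) * 0.5

# (xs:dayTimeDuration) div (xs:dayTimeDuration) as xs:decimal
ratio = timedelta(hours=36) / timedelta(days=1)

print(last_day, span, half_day, ratio)
```

The point of the analogy is the typing pattern: adding a duration to a date yields a date, while subtracting two dates yields a duration.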

The eq, ne, lt, gt, le and ge comparisons are supported for the types: xs:date and xs:time.
The eq and ne (only) comparisons are supported for the types: xs:duration, xs:gDay, xs:gMonth, xs:gMonthDay, xs:gYear and xs:gYearMonth.
The lt, gt, le and ge (only) comparisons are supported for the types: xs:dayTimeDuration and xs:yearMonthDuration.

Other Comparisons
The eq and ne (only) comparisons are supported for the types: xs:base64Binary, xs:hexBinary, xs:NOTATION and xs:QName.

Category Escapes
A category escape matches a character from a set specified by a property or using a block:
  \p  indicates match any character in the set.
  \P  indicates match any character not in the set.

Categories and Properties
Any character can be matched by its properties using a category escape consisting of a Category code followed by an optional Property code:
  \p{L}   Any Letter
  \p{Lu}  Any Upper-case Letter
  \p{Ll}  Any Lower-case Letter
  \p{Lt}  Any Title-case Letter
  \p{Lm}  Any Letter Modifier
  \p{Lo}  Any “Other” Letter
  \p{M}   Any Mark
  \p{Mn}  Any Non-Spacing Mark
  \p{Mc}  Any Combining Mark
  \p{Me}  Any Enclosing Mark
  \p{N}   Any Digit
  \p{Nd}  Any Decimal Digit
  \p{Nl}  Any Letter Digit
  \p{No}  Any “Other” Digit
  \p{P}   Any Punctuation Character
  \p{Pc}  Any Connector Character
  \p{Pd}  Any Dash Character
  \p{Ps}  Any Open Character
  \p{Pe}  Any Close Character
  \p{Pi}  Any Initial Quote Character
  \p{Pf}  Any Final Quote Character
  \p{Po}  Any “Other” Punctuation
  \p{Z}   Any Separator Character
  \p{Zs}  Any Space Separator
  \p{Zl}  Any Line Separator
  \p{Zp}  Any Paragraph Separator
  \p{S}   Any Symbol Character
  \p{Sm}  Any Math Symbol
  \p{Sc}  Any Currency Symbol
  \p{Sk}  Any Modifier Symbol
  \p{So}  Any “Other” Symbol
  \p{C}   Any “Other” Character
  \p{Cc}  Any Control Character
  \p{Cf}  Any Format Character
  \p{Co}  Any Private Use Character
  \p{Cn}  Any “Not Assigned” Character

Character Blocks
Any character within a Unicode character block can be matched using a category escape consisting of “Is” followed by the block's name. For example: \p{IsBasicLatin}

Block Start  Block End  Block Name
0000         007F       BasicLatin
0080         00FF       Latin-1Supplement
0100         017F       LatinExtended-A
0180         024F       LatinExtended-B
0250         02AF       IPAExtensions
02B0         02FF       SpacingModifierLetters
0300         036F       CombiningDiacriticalMarks
0370         03FF       Greek
0400         04FF       Cyrillic
0530         058F       Armenian
0590         05FF       Hebrew
0600         06FF       Arabic
0700         074F       Syriac
0780         07BF       Thaana
0900         097F       Devanagari
0980         09FF       Bengali
0A00         0A7F       Gurmukhi
0A80         0AFF       Gujarati
0B00         0B7F       Oriya
0B80         0BFF       Tamil
0C00         0C7F       Telugu
0C80         0CFF       Kannada
0D00         0D7F       Malayalam
0D80         0DFF       Sinhala
0E00         0E7F       Thai
0E80         0EFF       Lao
0F00         0FFF       Tibetan
1000         109F       Myanmar
10A0         10FF       Georgian
1100         11FF       HangulJamo
1200         137F       Ethiopic
13A0         13FF       Cherokee
1400         167F       UnifiedCanadianAboriginalSyllabics
1680         169F       Ogham
16A0         16FF       Runic
1780         17FF       Khmer
1800         18AF       Mongolian
1E00         1EFF       LatinExtendedAdditional
1F00         1FFF       GreekExtended
2000         206F       GeneralPunctuation
2070         209F       SuperscriptsandSubscripts
20A0         20CF       CurrencySymbols
20D0         20FF       CombiningMarksforSymbols
2100         214F       LetterlikeSymbols
2150         218F       NumberForms
2190         21FF       Arrows
2200         22FF       MathematicalOperators
2300         23FF       MiscellaneousTechnical
2400         243F       ControlPictures
2440         245F       OpticalCharacterRecognition
2460         24FF       EnclosedAlphanumerics
2500         257F       BoxDrawing
2580         259F       BlockElements
25A0         25FF       GeometricShapes
2600         26FF       MiscellaneousSymbols
2700         27BF       Dingbats
2800         28FF       BraillePatterns
2E80         2EFF       CJKRadicalsSupplement
2F00         2FDF       KangxiRadicals
2FF0         2FFF       IdeographicDescriptionCharacters
3000         303F       CJKSymbolsandPunctuation
3040         309F       Hiragana
30A0         30FF       Katakana
3100         312F       Bopomofo
3130         318F       HangulCompatibilityJamo
3190         319F       Kanbun
31A0         31BF       BopomofoExtended
3200         32FF       EnclosedCJKLettersandMonths
3300         33FF       CJKCompatibility
3400         4DB5       CJKUnifiedIdeographsExtensionA
4E00         9FFF       CJKUnifiedIdeographs
A000         A48F       YiSyllables
A490         A4CF       YiRadicals
AC00         D7A3       HangulSyllables
E000         F8FF       PrivateUse
F900         FAFF       CJKCompatibilityIdeographs
FB00         FB4F       AlphabeticPresentationForms
FB50         FDFF       ArabicPresentationForms-A
FE20         FE2F       CombiningHalfMarks
FE30         FE4F       CJKCompatibilityForms
FE50         FE6F       SmallFormVariants
FE70         FEFE       ArabicPresentationForms-B
FEFF         FEFF       Specials
FF00         FFEF       HalfwidthandFullwidthForms
FFF0         FFFD       Specials

XSLT 2.0: http://www.w3.org/TR/xslt20/
XQuery 1.0: http://www.w3.org/TR/xquery/
XPath 2.0: http://www.w3.org/TR/xpath20/
Unicode: http://www.unicode.org

Regular Expression Examples
^[A-Za-z]
    An Ascii letter at the start of a string or line.
^\p{Lu}
    An upper-case Unicode letter at the start of a string or line.
\.$
    A period at the end of a string or line.
\p{IsGreek}+
    One or more Greek letters.
\p{IsGreek}{1,}
    One or more Greek letters.
.*?;
    Up to and including the next semicolon.
.*;
    Up to and including the last semicolon.
^\c+$
    Match only if the string consists entirely of XML name characters.
[ -~-[\[\]]]+
    Any Ascii printable character except the square brackets.
\w+
    A "word".
[^\s]+
    Non-white-space characters.
\S+
    Non-white-space characters.
(['"])(.*?)\1
    A string delimited by single or double quotes. $2 or regex-group(2) will return the unquoted substring. (\1 is the quote character used.)
\s*(\i\c*)\s*=\s*(["'])(.*?)\2
    An XML-attribute-like name, equal and quoted value (with optional leading and intervening white space). $1 is the name and $3 is the value.
\((\d+|\p{L}+)\)
    A parenthesized sequence either of digits or of letters (but not a mixture of both).
\p{Sc}(\d+(\.\d*)?|\.\d+)
    A decimal number with a leading currency symbol.
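
A few of the examples above can be tried outside an XSLT or XQuery processor. The sketch below uses Python's standard re module purely as a convenient stand-in. This is an assumption of the example: Python's regular expression dialect is not the XPath 2.0 one, and in particular it lacks \p{...}, \i and \c. Only patterns that behave the same in both dialects are used, except where a comment says otherwise.

```python
import re

text = "a=1; b=2; c=3"

# .*?;  (reluctant) stops at the next semicolon.
up_to_next = re.match(r".*?;", text).group(0)
# .*;   (greedy) runs on to the last semicolon.
up_to_last = re.match(r".*;", text).group(0)

# (['"])(.*?)\1  group 1 is the quote character, group 2 the unquoted text.
quoted = re.search(r"""(['"])(.*?)\1""", "say 'hello' now")
quote_char, unquoted = quoted.group(1), quoted.group(2)

# \((\d+|\p{L}+)\)  digits or letters in parentheses, never a mixture.
# Python's re has no \p{L}; [^\W\d_] is an approximation used for this sketch.
paren = re.compile(r"\((\d+|[^\W\d_]+)\)")

print(up_to_next, "|", up_to_last, "|", quote_char, unquoted)
print(bool(paren.fullmatch("(2012)")), bool(paren.fullmatch("(a1)")))
```

In XPath 2.0 the same patterns would be passed as the second argument of matches(), replace() or tokenize().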
Regular expressions

Escaping Characters
Characters that have special meaning in regular expressions need to be escaped if they are to be represented “as is”. These characters are:
  \ | . ? * + ( ) { } [ ] - ^ $
In addition, the following escapes represent single characters:
  \n  newline or line-feed character (&#x0A;)
  \r  carriage return character (&#x0D;)
  \t  tab character (&#x09;)

Multi-Character Escapes
  . (dot)  Any Non-Line-End Character
  \s       Any Space Character
  \i       Any Initial Name Character (including '_' and ':')
  \c       Any Name Character (including '.', '-', '_' and ':')
  \d       Any Decimal Digit
  \w       Any “Word” Character (anything other than Punctuation, Separator or “Other”)
An upper-case multi-character escape matches any character not described by the lower-case escape. The upper-case escapes are:
  \S  \I  \C  \D  \W

Character Class Expressions
A character class expression matches a single character. It's wrapped in square brackets and consists of three parts:
  1. an optional negation indicator, ^.
  2. one or more characters or ranges, and
  3. an optional character class subtraction.
If the negation indicator is used, the single character matched is any character not given following it or in a given range.
A character range consists of two characters separated by a dash, as in:
  [-a-zA-Z0-9_]
A leading dash (-) is a dash, not a range.
A character class subtraction consists of a dash followed by a character, category escape or nested character class expression, as in:
  [a-z-[aeiou]]
i.e. Match lower-case letters but not the vowels.
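
Character class subtraction is specific to the XML Schema and XPath 2.0 regular expression dialect. As a sketch for readers testing patterns elsewhere, the example below reproduces the effect of [a-z-[aeiou]] in Python's standard re module, which does not support subtraction, by using a negative lookahead instead. The choice of Python and of the lookahead technique are assumptions of this example, not part of the reference card.

```python
import re

# XPath 2.0 / XML Schema:  [a-z-[aeiou]]
# Python emulation: refuse a vowel first, then accept any lower-case letter.
consonant = re.compile(r"(?![aeiou])[a-z]")
found = consonant.findall("quick reference")

# A leading dash inside a class is a literal dash in both dialects.
name_like = re.compile(r"[-a-zA-Z0-9_]+")
whole = name_like.fullmatch("xsl-list_2") is not None

print(found, whole)
```

The lookahead version is longer, which is one reason the subtraction syntax is convenient when the processor supports it.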

XPath 2.0 and XQuery 1.0 Functions That Use Regular Expressions
matches(xs:string?, xs:string) as xs:boolean
matches(xs:string?, xs:string, xs:string) as xs:boolean
replace(xs:string?, xs:string, xs:string) as xs:string
replace(xs:string?, xs:string, xs:string, xs:string) as xs:string
tokenize(xs:string?, xs:string) as xs:string*
tokenize(xs:string?, xs:string, xs:string) as xs:string*

XSLT 2.0 Instructions That Use Regular Expressions
<xsl:analyze-string select = expression
                    regex = { string }
                    flags = { string }>
  <xsl:matching-substring>
    sequence-constructor
  </xsl:matching-substring>
  <xsl:non-matching-substring>
    sequence-constructor
  </xsl:non-matching-substring>
  xsl:fallback*
</xsl:analyze-string>
One but not both of xsl:matching-substring and xsl:non-matching-substring can be omitted.
Inside xsl:matching-substring, the regex-group(N) function returns the Nth group captured by the regular expression.

Regular Expression Matching Flags
Flags are letters used to indicate how Regular Expression matching is to be done:
  s  Dot (.) matches any character, line-end characters included.
  m  ^ and $ match at the start and end of all lines, not just the start and end of the selected string as a whole.
  i  Match case insensitive.
  x  Remove white-space (space, tab and line-end) characters from the regular expression before using it.
Zero or more flags are specified as a string using the optional flags= attribute of xsl:analyze-string or the optional last argument of the matches, replace and tokenize functions.
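
The effect of each flag is easiest to see on a two-line string. The sketch below uses Python's standard re module as a stand-in, which is an assumption of this example. Its re.S, re.M, re.I and re.X options correspond closely to the s, m, i and x flags described above, although re.X additionally allows comments inside the pattern.

```python
import re

text = "First line\nsecond LINE"

# s: dot also matches the line end, so the pattern can span both lines.
without_s = re.search(r"line.second", text)
with_s = re.search(r"line.second", text, re.S)

# m: ^ matches at the start of every line, not only at the start of the string.
starts_default = re.findall(r"^\w+", text)
starts_multiline = re.findall(r"^\w+", text, re.M)

# i: case-insensitive matching.
any_case = re.findall(r"line", text, re.I)

# x: white space in the pattern is removed before matching.
spaced = re.fullmatch(r"\d+ - \d+", "12-34", re.X)

print(starts_default, starts_multiline, any_case)
```

In XPath 2.0 the equivalent call would pass the flag letters as a string, for example as the last argument of matches().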

Regular Expressions in XSLT 2.0, XQuery 1.0 and XPath 2.0
2008-07-21
Sam Wilmott, sam@wilmott.ca, http://www.wilmott.ca
and
Mulberry Technologies, Inc.
17 West Jefferson Street, Suite 207
Rockville, MD 20850 USA
Phone: +1 301/315-9631
Fax: +1 301/315-8285
info@mulberrytech.com
http://www.mulberrytech.com
© 2007-2008 Sam Wilmott and Mulberry Technologies, Inc.

Regular Expression Basics
A regular expression is:
  oneThing | anotherThing | yetAnother
      Match one thing or another or another (one or more things).
  oneThing anotherThing yetAnother
      Match one thing followed by another etc. (one or more things)
  atom quantifier
      Match atom the number of times indicated by quantifier; once if quantifier is omitted.
Where atom is any of:
  an unescaped character,
  an escaped character,
  a parenthesized regular expression, or
  a character class expression.
Where quantifier is any of:
  ?      zero or one times (i.e. optional)
  *      zero or more times
  +      one or more times
  {N}    exactly N times
  {N,}   N or more times
  {N,M}  between N and M times inclusive.
An extra trailing ?, as in ??, +? or {N,M}? means match the shortest possible number of repetitions rather than the (default) longest.

Line Starts and Ends
A regular expression can be anchored at the start and/or end of a string using ^ (the start) and $ (the end).
If a regular expression is used with the m flag, ^ and $ match at the start and end of each line.
In the absence of ^ or $, a regular expression matches unanchored: anywhere within the string.

Subexpressions and Back References
Each parenthesized group in a regular expression is assigned a group number counting unescaped left parentheses starting from the left.
Group numbers can be used in three ways:
  1. Within a regular expression, to match what was matched by a previous subexpression. A previously matched group is identified by backslash and a number: \1, \2 etc.
  2. Within a replace replacement expression, to match what was matched by a previous subexpression. A group is identified by a numeric name: $1, $2 etc. As well, $0 identifies the whole matched substring.
  3. Within an XSLT regex-group(N), to access the matched subexpression.
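
The first two uses of group numbers can be sketched as follows. Python's standard re module is used here only as a stand-in (an assumption of this example). Note one difference in notation: where an XPath 2.0 replace() replacement string writes $1 and $0, a Python replacement string writes \1 and \g<0>.

```python
import re

# Use 1: a back reference inside the pattern finds a doubled word.
doubled = re.search(r"\b(\w+) \1\b", "this is is a test").group(1)

# Use 2: group references in the replacement reorder a date.
# XPath 2.0 equivalent: replace('2012-07-02', '(\d+)-(\d+)-(\d+)', '$3/$2/$1')
reordered = re.sub(r"(\d+)-(\d+)-(\d+)", r"\3/\2/\1", "2012-07-02")

# The whole match: $0 in XPath 2.0, \g<0> in Python.
bracketed = re.sub(r"\d+", r"[\g<0>]", "a1b22")

print(doubled, reordered, bracketed)
```

The third use, regex-group(N), is only available inside xsl:matching-substring and has no counterpart in this sketch.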

Simple Expressions
  $VarName
  ( Expr )
  ( )
  .   (one dot: self)
  QName ( Expr , ... )
  QName ( )
  IntegerLiteral
  DecimalLiteral
  DoubleLiteral
  StringLiteral

Arithmetic Expressions
  + Expr        Expr + Expr
  - Expr        Expr - Expr
  Expr * Expr   Expr div Expr
  Expr idiv Expr
  Expr mod Expr

Creating Sequences
Create a sequence from a list of items:
  Expr , ...
Note: A sequence list must usually be parenthesized.
Repeat over one or more sequences, returning a sequence of results:
  for VariableBinding , ... return Expr
where a VariableBinding is:
  $VarName in Expr
Create a numeric sequence, from lower bound to upper bound:
  Expr to Expr
All the items appearing in either sequence:
  Expr union Expr
  Expr | Expr
Only items appearing in both sequences:
  Expr intersect Expr
All items in the first sequence not in second:
  Expr except Expr

Comments in XPath Expressions
  (: This is a comment within an XPath expr :)

Testing
Test if the condition is satisfied for at least one combination of the bound expressions:
  some VariableBinding , ... satisfies Expr
Test if the condition is satisfied for all of the bound expressions:
  every VariableBinding , ... satisfies Expr
Select one or the other of two possibilities:
  if ( Expr ) then Expr else Expr
Either or both of two tests:
  Expr or Expr
  Expr and Expr
Test if they are the same node:
  Expr is Expr
Test if a node appears before or after another:
  Expr << Expr
  Expr >> Expr
Test an expression's dynamic type:
  Expr instance of SequenceType
Test if an expression can be converted to a type:
  Expr castable as AtomicType
  Expr castable as AtomicType?
Compare two atomic values:
  Expr eq Expr   Expr ne Expr
  Expr lt Expr   Expr le Expr
  Expr gt Expr   Expr ge Expr
Compare all items in one sequence to all items in a second, and return true if the comparison holds for any pair of values:
  Expr = Expr    Expr != Expr
  Expr < Expr    Expr <= Expr
  Expr > Expr    Expr >= Expr

Type Modification Expressions
Use as without converting:
  Expr treat as SequenceType
Use as, converting as needed and doable:
  Expr cast as AtomicType
  Expr cast as AtomicType?

XPath 2.0: http://www.w3.org/TR/xpath20/
XSL-List: http://www.mulberrytech.com/xsl/xsl-list

Path Expressions
  /             Top level, document root
  / Step        At top level
  Step          Relative to current node
  // Step       Anywhere within document
  Path / Step   Immediately within Path
  Path // Step  Anywhere within Path
Where a Step is one of:
  Expr
  AxisName::NameTest
  AxisName::KindTest
  @NameTest   (attribute test)
  NameTest    (child element test)
  KindTest    (child node test)
  ..          (two dots: parent test)
Followed by zero or more predicates:
  [ Expr ]
Where an AxisName is one of:
  ancestor            ancestor-or-self
  attribute           child
  descendant          descendant-or-self
  following           following-sibling
  namespace           parent
  preceding           preceding-sibling
  self
Where a NameTest is one of:
  QName
  *
  NCName:*
  *:NCName
Where a KindTest is one of:
  attribute ( AttributeName )
  attribute ( AttributeName , TypeName )
  attribute ( * )
  attribute ( * , TypeName )
  attribute ( )
  comment ( )
  document-node ( element ... )
  document-node ( schema-element ... )
  document-node ( )
  element ( ElementName )
  element ( ElementName , TypeName )
  element ( * )
  element ( * , TypeName )
  element ( )
  node ( )
  processing-instruction ( NCName )
  processing-instruction ( StringLiteral )
  processing-instruction ( )
  schema-attribute ( AttributeName )
  schema-element ( ElementName )
  text ( )

Names and Types
An XML QName, with or without a colon-separated prefix, is used for all of:
  VarName
  AttributeName
  ElementName
  TypeName
  AtomicType
A SequenceType is one of:
  empty-sequence ( )
  KindTest
  item ( )
  AtomicType
Where KindTest, item() or AtomicType can be optionally followed by:
  ?  (may be empty sequence)
  +  (is a non-empty sequence of the type)
  *  (is a sequence of the type, empty or not)

Operator Precedence:
   1  , (comma)
   2  for  some  every  if
   3  or
   4  and
   5  =  !=  <  <=  >  >=  eq  ne  lt  le  gt  ge  is  <<  >>
   6  to
   7  (two-argument) +  -
   8  *  div  idiv  mod
   9  union  |
  10  intersect  except
  11  instance of
  12  treat as
  13  castable as
  14  cast as
  15  (one-argument) +  -
  16  /  //
  17  step  node-test  $name  ( Expr )  function-call  literal
XPath2

Relative Location Paths
Relative Location Paths traverse the document from the context node.

para
    para element children
    Also - child::para
@type
    the type attribute
    Also - attribute::type
../title
    the title element children of the parent
* except title
    child elements except title elements
    Also - *[not(self::title)] (works in XPath 1.0)
ancestor::sec
    all sec ancestor elements
ancestor::sec/@n
    all n attributes on sec ancestor elements
list/(item | step)
    item and step element children of list children, in document order
list/item, list/step
    item element children of list children followed by step children of list children
preceding-sibling::step
    all preceding sibling step elements
preceding-sibling::*[1][self::step]
    the directly preceding sibling element, if it is a step (otherwise nothing)
descendant::div[last()]
    the last div descendant of the current node
.//div[last()]
    div descendants that are the last child div of each of their parents
preceding::pb[1]
    the first (most immediate) preceding pb
ancestor::sec//pb intersect preceding::pb
    pb elements inside the same sec element as the context node, preceding it
p[normalize-space()]
    p child elements that have a non-whitespace value (text content)
*[not(node())]
    empty element children (i.e., element children with no node children)
*[not(node() except (comment()|processing-instruction()))]
    element children that are empty (have no children) except for comments or processing instructions
step[position() gt 1]
    all step element children but the first
step except *[1]
    step element children but the first
step[position() le 4]
    the first four step element children
    Also - step[position() = (1 to 4)]
step[position() mod 2]
    odd-numbered step children
step[not(position() mod 2)]
    even-numbered step children
*[position() le 4] intersect step
    from the first four element children, the step children
ancestor-or-self::*[exists(@lang)][1]/@lang
    the closest lang attribute on the context node or an ancestor element

Expressions that are not Location Paths

(@class,'none')[1]
    the class attribute, or if it does not exist, the string "none".
    Also - if (exists(@class)) then @class else "none"
//*/name()
    the names of all elements, in document order
distinct-values(//*/name())
    the names of all elements, in document order, with duplicates removed
//name/string-join((first, last),' ')
    a sequence of strings constructed from the name elements in the document, each one concatenating the values of its first and last element children, in that order, joining them with spaces
    Also - for $n in //name return string-join(($n/first, $n/last),' ')
//*/count(ancestor-or-self::*)
    a sequence of numbers representing the depth of each element in the document
max(//*/count(ancestor-or-self::*))
    the maximum depth of all elements in the document (a number in a singleton sequence)
for $stooge in ('Moe','Larry','Curly') return count(//p[contains(.,$stooge)])
    the counts of all p elements in the document mentioning each of "Moe", "Larry" and "Curly", in that order
index-of(('Moe','Larry','Curly'), speaker[1])
    if the first speaker element child has the value "Moe", then 1; if "Larry", then 2; if "Curly", then 3; otherwise the empty sequence (i.e., no value)
(: You've got to be kidding me. :)
    do nothing. A comment is just a comment.
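
To make the results of some of these expressions concrete, the sketch below computes the same values in plain Python over a small invented document. Everything in it is an assumption of the example: the sample XML, the use of Python's standard xml.etree.ElementTree module (which does not evaluate XPath 2.0), and the hand-written loops that stand in for the XPath expressions named in the comments.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this illustration.
doc = ET.fromstring(
    "<book><sec><p>Moe and Larry</p><p>Curly</p></sec><p>Moe</p></book>"
)

# //*/name()  -> element names in document order
names = [el.tag for el in doc.iter()]

# distinct-values(//*/name())  -> duplicates removed, first occurrence kept
distinct = list(dict.fromkeys(names))

# for $stooge in ('Moe','Larry','Curly') return count(//p[contains(.,$stooge)])
counts = [
    sum(1 for p in doc.iter("p") if stooge in "".join(p.itertext()))
    for stooge in ("Moe", "Larry", "Curly")
]

# max(//*/count(ancestor-or-self::*))  -> depth of the deepest element
def depth(element, level=1):
    return max([level] + [depth(child, level + 1) for child in element])

deepest = depth(doc)
print(names, distinct, counts, deepest)
```

Each XPath expression is a single line where the Python needs a loop or helper function, which is the practical argument for sequence-returning expressions in XPath 2.0.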

XPath 2.0 Quick Reference
2008-07-21
See also the “XQuery 1.0 & XPath 2.0 Functions & Operators Quick Reference”
Sam Wilmott, sam@wilmott.ca, http://www.wilmott.ca
and
Mulberry Technologies, Inc.
17 West Jefferson Street, Suite 207
Rockville, MD 20850 USA
Phone: +1 301/315-9631
Fax: +1 301/315-8285
info@mulberrytech.com
http://www.mulberrytech.com
© 2007-2008 Sam Wilmott and Mulberry Technologies, Inc.

Absolute Location Paths
Absolute Location Paths traverse the document starting at the top (the root), and can be recognized by their initial / (forward slash).

/book/bookinfo/abstract
    an abstract element child of a bookinfo child of the book document element
    Also - /child::book/child::bookinfo/child::abstract
//para
    all para elements in the document
    Also - /descendant-or-self::*/child::para
    Also - /descendant::para
/descendant::para[1]
    the first para element in the document
    Also - (//para)[1]
//@order-by
    all order-by attributes in the document
//list[exists(ancestor::list)]
    all list elements that have ancestor list elements
//list[not(ancestor::list)]
    all list elements that do not have ancestor list elements
    Also - //list[not(exists(ancestor::list))]
    Also - //list[empty(ancestor::list)]
//(* except title)
    all elements except title elements
    Also - //*[not(self::title)] (works in XPath 1.0)
//processing-instruction()[not(ancestor::sec/@n = 1)]
    all processing instructions with no sec ancestor elements with n attributes equal to 1
//para[matches(.,'[X|x]{3}')]
    all para elements whose value includes the regular expression [X|x]{3}
    Tip - [X|x]{3} matches three X or x characters appearing in a row
//sec[@id = //@rid/tokenize(.,'\s+')]
    all sec elements with id attributes whose values are also given as a value by a tokenized rid attribute anywhere in the document
    Also - //sec[@id = $rid-values] where $rid-values is distinct-values(//@rid/tokenize(.,'\s+'))
    Tip - use distinct-values(//@rid/tokenize(.,'\s+')) to remove duplicates from the list of tokenized @rid values
    Tip - the regular expression \s+ matches any contiguous sequence of spaces (space, linefeed or tab characters)
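
Some of the simpler paths above can be tried with Python's standard xml.etree.ElementTree module, which implements only a small XPath 1.0-like subset. The sketch below is an illustration under stated assumptions: the sample document is invented, ElementTree paths are written relative to the root element (so /book/bookinfo/abstract becomes bookinfo/abstract), and the ancestor axis is not supported, so the //list[not(ancestor::list)] test is emulated with a hypothetical helper, has_list_ancestor.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this illustration.
root = ET.fromstring(
    "<book><bookinfo><abstract>Summary</abstract></bookinfo>"
    "<sec n='1'><para>One</para>"
    "<list><item/><list><item/></list></list></sec>"
    "<sec n='2'><para>Two</para></sec></book>"
)

# /book/bookinfo/abstract  (root is the book element, so the path is relative)
abstract = root.find("bookinfo/abstract")

# //para  and  (//para)[1]
paras = root.findall(".//para")
first_para = paras[0]

# ElementTree has no ancestor axis: build a child -> parent map instead.
parents = {child: parent for parent in root.iter() for child in parent}

def has_list_ancestor(element):
    """Return True if any ancestor of element is a list element."""
    node = parents.get(element)
    while node is not None:
        if node.tag == "list":
            return True
        node = parents.get(node)
    return False

# //list[not(ancestor::list)]  and  //list[exists(ancestor::list)]
top_lists = [el for el in root.iter("list") if not has_list_ancestor(el)]
nested_lists = [el for el in root.iter("list") if has_list_ancestor(el)]

print(abstract.text, [p.text for p in paras], len(top_lists), len(nested_lists))
```

A full XPath 2.0 processor evaluates the original expressions directly; the parent map is only needed because of the limited subset used here.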

XQuery Scripts
An XQuery script consists of:
1. A Version Declaration
     xquery version StringLiteral
   followed, optionally, by:
     encoding StringLiteral
   followed, optionally, by a semicolon (";").
2. If an XQuery script is a Library Module, then its module namespace declaration comes next:
     module namespace NCName = URILiteral ;
3. Default Declarations and Imports: zero or more of:
     declare default element namespace URILiteral ;
     declare default function namespace URILiteral ;
     declare boundary-space preserve ;
     declare boundary-space strip ;
     declare default collation URILiteral ;
     declare base-uri URILiteral ;
     declare construction strip ;
     declare construction preserve ;
     declare ordering ordered ;
     declare ordering unordered ;
     declare default order empty greatest ;
     declare default order empty least ;
     declare copy-namespaces preserve , inherit ;
     declare copy-namespaces preserve , no-inherit ;
     declare copy-namespaces no-preserve , inherit ;
     declare copy-namespaces no-preserve , no-inherit ;
     declare namespace NCName = URILiteral ;
     import schema namespace NCName = URILiteralList ;
     import schema default element namespace URILiteralList ;
     import schema URILiteralList ;
     import module namespace NCName = URILiteralList ;
     import module URILiteralList ;
4. Variable, Function and Option Declarations: zero or more of:
     declare variable VariableDeclaration := ExprSingle ;
     declare variable VariableDeclaration external ;
     declare function QName ParameterDeclarations ;
     declare function QName ParameterDeclarations external;
     declare function QName ParameterDeclarations as SequenceType external ;
     declare option QName StringLiteral ;
   where ParameterDeclarations is one of:
     ( )                            (i.e. empty if no parameters)
     ( VariableDeclaration )        (for one parameter)
     ( VariableDeclaration , ... )  (when two or more)
   where VariableDeclaration is one of:
     $QName
     $QName as SequenceType
   and where URILiteralList is one of:
     URILiteral
     URILiteral at URILiteral
     URILiteral at URILiteral , ...  (two or more)
5. Finally, if the XQuery script is a Main module, not a Library module, an XQuery expression is required to specify the query being made:
     Expr

XQuery 1.0: http://www.w3.org/TR/xquery/

Creating Sequences
Create a sequence from a list of items:
  Expr , ...
Note: A sequence list must usually be parenthesized.
Repeat over one or more sequences, returning a sequence of results:
  for VariableBinding , ... return Expr
Create a numeric sequence, from lower bound to upper bound:
  Expr to Expr
All the items appearing in either sequence:
  Expr union Expr
  Expr | Expr
Only items appearing in both sequences:
  Expr intersect Expr
All items in the first sequence not in second:
  Expr except Expr

Arithmetic Expressions
  + Expr        Expr + Expr
  - Expr        Expr - Expr
  Expr * Expr   Expr div Expr
  Expr idiv Expr
  Expr mod Expr

Type Modification Expressions
Use as without converting:
  Expr treat as SequenceType
Use as, converting as needed and doable:
  Expr cast as AtomicType
  Expr cast as AtomicType?

Simple Expressions
  $ VarName
  .   (one dot: self)
  ( )
  ( Expr )
  QName ( Expr , ... )
  QName ( )
  IntegerLiteral
  DecimalLiteral
  DoubleLiteral
  StringLiteral

Validating Nodes
  validate { Expr }   (defaults to strict)
  validate lax { Expr }
  validate strict { Expr }

Ordering Mode for Sequences
  ordered { Expr }
  unordered { Expr }

Implementation-Defined Instructions
  (# QName ... #) … { OptionalExpr }
Path
Expre
ssio
ns
/
Top level, d
ocum
ent
root
/ S
tep
At
top level
Ste
p
Rela
tive t
o c
urr
ent
node
// S
tep
Anyw
here
wit
hin
docum
ent
Path
/ S
tep
Imm
edia
tely
wit
hin
Path
Path
// S
tep
Anyw
here
wit
hin
Path
Where
a S
tep is o
ne o
f:
Expr
Axis
Nam
e :: N
am
eTest
Axis
Nam
e ::
Kin
dTest
@N
am
eTest
(
att
ribute
test)
Nam
eTest
(
child e
lem
ent
test)
Kin
dTest
(
child n
ode t
est)
..
(
two d
ots
: pare
nt
test)
Follow
ed b
y z
ero
or
more
pre
dic
ate
s:
[ Expr
]
Where
an A
xis
Nam
e is o
ne o
f:
ancesto
r ancesto
r-or-
self
att
ribute
child
descendant
descendant-
or-
self
follow
ing
follow
ing-sib
ling
nam
espace
pare
nt
pre
cedin
g
pre
cedin
g-sib
ling
self
A N
am
eTest
is o
ne o
f:
QN
am
e
*
NC
Nam
e:*
*:
NC
Nam
e
And a
Kin
dTest
is o
ne o
f:
att
ribute
( A
ttri
bute
Nam
e )
att
ribute
( A
ttri
bute
Nam
e ,
TypeN
am
e )
att
ribute
( *
, T
ypeN
am
e )
att
ribute
( *
)
att
ribute
( )
com
ment
( )
docum
ent-
node (
ele
ment
... )
docum
ent-
node (
schem
a-ele
ment
... )
docum
ent-
node (
)
ele
ment
( Ele
mentN
am
e )
ele
ment
( Ele
mentN
am
e ,
TypeN
am
e )
ele
ment
( *
, TypeN
am
e )
ele
ment
( *
)
ele
ment
( )
node (
)
pro
cessin
g-in
str
ucti
on (
NC
Nam
e )
pro
cessin
g-in
str
ucti
on (
Str
ingLit
era
l )
pro
cessin
g-in
str
ucti
on (
)
schem
a-att
ribute
( A
ttri
bute
Nam
e )
schem
a-ele
ment
( Ele
mentN
am
e )
text
( )
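To get a feel for how steps and predicates select nodes, here is a small Python sketch using the standard library's `xml.etree.ElementTree`. It is an approximation: ElementTree supports only a limited subset of XPath, not XQuery, and the sample document is invented for the illustration.

```python
import xml.etree.ElementTree as ET

# A made-up document, used only for this illustration.
doc = ET.fromstring(
    "<book>"
    "<title>Poems</title>"
    "<chapter><title>Preface</title><p n='1'>First</p><p>Second</p></chapter>"
    "</book>"
)

# "Step": relative to the current node, child elements only.
child_titles = [t.text for t in doc.findall("title")]
# "// Step": anywhere below the current node.
all_titles = [t.text for t in doc.findall(".//title")]
# A predicate "[ Expr ]" testing for an attribute.
numbered = [p.text for p in doc.findall(".//p[@n]")]
# "Path / Step" with a positional predicate.
first_p = [p.text for p in doc.findall("chapter/p[1]")]
# ".." (two dots): the parent of the selected node.
parents = [e.tag for e in doc.findall(".//p[@n]/..")]

print(child_titles, all_titles, numbered, first_p, parents)
```

Here the child step finds only the book title, while the descendant step also finds the chapter title.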
Testing
Select based on the type of an expression (one or more cases plus a default):
  typeswitch ( Expr ) case ... default ...
where case and default are:
  case SequenceType return Expr
  case $VarName as SequenceType return Expr
  default return Expr
  default $VarName return Expr
Test if the condition is satisfied for at least one combination of the bound expressions:
  some VariableBinding , ... satisfies Expr
Test if the condition is satisfied for all of the bound expressions:
  every VariableBinding , ... satisfies Expr
where a VariableBinding is:
  $VarName in Expr
  $VarName as SequenceType in Expr
Select one or the other of two possibilities:
  if ( Expr ) then Expr else Expr
Either or both of two tests:
  Expr or Expr
  Expr and Expr
Test if they are the same node:
  Expr is Expr
Test if a node appears before or after another:
  Expr << Expr
  Expr >> Expr
Test an expression's dynamic type:
  Expr instance of SequenceType
Test if an expression can be converted to a type:
  Expr castable as AtomicType
  Expr castable as AtomicType?
Compare two item values:
  Expr eq Expr
  Expr ne Expr
  Expr lt Expr
  Expr le Expr
  Expr gt Expr
  Expr ge Expr
Compare all items in one sequence to all items in a second, and return true if the comparison holds for any pair of values:
  Expr = Expr
  Expr != Expr
  Expr < Expr
  Expr <= Expr
  Expr > Expr
  Expr >= Expr
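The "any pair" behaviour of the general comparison operators (`=`, `!=` and so on) is worth seeing in action, because it means `=` and `!=` can both be true for the same operands. The Python sketch below is a simplification that ignores atomization and type promotion; `general_eq` and `general_ne` are invented names. It also shows rough equivalents of `some ... satisfies` and `every ... satisfies`.

```python
from itertools import product


def general_eq(left, right):
    """Like `left = right`: true if ANY pair of items is equal."""
    return any(a == b for a, b in product(left, right))


def general_ne(left, right):
    """Like `left != right`: true if ANY pair of items differs."""
    return any(a != b for a, b in product(left, right))


prices = [12, 30, 45]
some_cheap = any(p < 20 for p in prices)   # some $p in $prices satisfies $p lt 20
every_cheap = all(p < 20 for p in prices)  # every $p in $prices satisfies $p lt 20

print(general_eq([1, 2], [2, 3]))  # True: the pair (2, 2) is equal
print(general_ne([1, 2], [2, 3]))  # True as well: the pair (1, 2) differs
print(general_eq([], [1]))         # False: there is no pair at all
print(some_cheap, every_cheap)     # True False
```

The value comparisons (`eq`, `ne`, ...) compare exactly one item with one item instead.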
Names and Types
  VarName   AttributeName   ElementName   TypeName   AtomicType
are all XML QNames, with or without a colon-separated prefix.
A SequenceType is one of:
  empty-sequence ( )
  KindTest
  item ( )
  AtomicType
Where KindTest, item() or AtomicType can be optionally followed by:
  ?   (may be empty sequence)
  +   (is a non-empty sequence of the type)
  *   (is a sequence of the type, empty or not)

Operator Precedence:
  1   , (comma)
  2   for   let   some   every   if   typeswitch
  3   or
  4   and
  5   =   !=   <   <=   >   >=   eq   ne   lt   le   gt   ge   is   <<   >>
  6   to
  7   (two-argument) +   -
  8   *   div   idiv   mod
  9   union   |
  10   intersect   except
  11   instance of
  12   treat as
  13   castable as
  14   cast as
  15   (one-argument) +   -
  16   /   //
  17   step   node-test   $name   ( Expr )   function-call   literal   validate   (# … #)   constructor   ordered   unordered

Predefined Namespace Names:
  xml = http://www.w3.org/XML/1998/namespace
  xs = http://www.w3.org/2001/XMLSchema
  xsi = http://www.w3.org/2001/XMLSchema-instance
  fn = http://www.w3.org/2005/xpath-functions
  local = http://www.w3.org/2005/xquery-local-functions
2008-07-21
XQuery 1.0 Quick Reference
See also the “XQuery 1.0 & XPath 2.0 Functions & Operators Quick Reference”
Sam Wilmott
sam@wilmott.ca
http://www.wilmott.ca
and
Mulberry Technologies, Inc.
17 West Jefferson Street, Suite 207
Rockville, MD 20850 USA
Phone: +1 301/315-9631
Fax: +1 301/315-8285
info@mulberrytech.com
http://www.mulberrytech.com
© 2007-2008 Sam Wilmott and Mulberry Technologies, Inc.
FLWOR Expressions
FLWOR Expressions start with one or more for or let:
  for SequenceVariableBinding , ...
  let AssignedVariableBinding , ...
followed by:
  where Expr   (optional)
  OrderingInfo , …   (one or more, optional)
  return Expr
where SequenceVariableBinding is one of:
  $VarName in Expr
  $VarName as SequenceType in Expr
  $VarName at $VarName in Expr
  $VarName as SequenceType at $VarName in Expr
where AssignedVariableBinding is one of:
  $VarName := Expr
  $VarName as SequenceType := Expr
and where OrderingInfo consists of, in order:
  stable   (optional)
  order by Expr
  ascending or descending   (optional)
  empty greatest or empty least   (optional)
  collation URILiteral   (optional)
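For readers who know a general-purpose language, a FLWOR expression behaves much like a filtered, sorted list comprehension. The Python sketch below is only an analogy: the `books` data and the XQuery fragments in the comments are invented for the illustration, and Python dictionaries stand in for XML nodes.

```python
# Invented sample data standing in for a sequence of <book> elements.
books = [
    {"title": "Walden", "year": 1854},
    {"title": "Ulysses", "year": 1922},
    {"title": "Dubliners", "year": 1914},
]

# Roughly: for $b in $books
#          where $b/year > 1900
#          order by $b/title ascending
#          return $b/title
titles = [
    b["title"]
    for b in sorted(books, key=lambda b: b["title"])
    if b["year"] > 1900
]

# Roughly: for $b at $i in $books return concat($i, ". ", $b/title)
# The positional variable introduced by `at` counts from 1.
numbered = [f"{i}. {b['title']}" for i, b in enumerate(books, start=1)]

print(titles)    # ['Dubliners', 'Ulysses']
print(numbered)  # ['1. Walden', '2. Ulysses', '3. Dubliners']
```

Python's `sorted` keeps equal items in their original order, which corresponds to the optional `stable` keyword.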
Constructors
  < QName ... />
  < QName ... > ... </ QName >
  <![CDATA[ ... ]]>
  <!-- ... -->
  <? PITarget ... ?>
  document { Expr }
  element QName { OptionalExpr }
  element { Expr } { OptionalExpr }
  attribute QName { OptionalExpr }
  attribute { Expr } { OptionalExpr }
  text { Expr }
  comment { Expr }
  processing-instruction NCName { OptionalExpr }
  processing-instruction { Expr } { OptionalExpr }
Within a constructor’s attribute values and element content, literal "{" and "}" need doubling. Anything within single "{" ... "}" is evaluated as an Expr.
Top-Level Declarations
  <xsl:attribute-set name = qname
      use-attribute-sets = qnames>
    xsl:attribute*
  </xsl:attribute-set>

  <xsl:character-map name = qname
      use-character-maps = qnames>
    xsl:output-character*
    <xsl:output-character character = char
        string = string />
  </xsl:character-map>
One or more xsl:output-character is allowed.

  <xsl:decimal-format name = qname
      decimal-separator = char
      grouping-separator = char
      infinity = string
      minus-sign = char
      NaN = string
      percent = char
      per-mille = char
      zero-digit = char
      digit = char
      pattern-separator = char />

  <xsl:function name = qname
      as = sequence-type
      override = "yes" | "no">
    xsl:param*, sequence-constructor
  </xsl:function>

  <xsl:import-schema namespace = uri
      schema-location = uri>
    xs:schema?
  </xsl:import-schema>

  <xsl:include href = uri />

  <xsl:key name = qname
      match = pattern
      use = expression
      collation = uri>
    sequence-constructor
  </xsl:key>

  <xsl:namespace-alias
      stylesheet-prefix = prefix | "#default"
      result-prefix = prefix | "#default" />

Content Specification Options
  ?   optional
  *   zero or more
  +   one or more
  #PCDATA   just text
  sequence-constructor   Instructions and text

  <xsl:output name = qname
      method = "xml" | "html" | "xhtml" | "text" | qname-but-not-ncname
      byte-order-mark = "yes" | "no"
      cdata-section-elements = qnames
      doctype-public = string
      doctype-system = string
      encoding = string
      escape-uri-attributes = "yes" | "no"
      include-content-type = "yes" | "no"
      indent = "yes" | "no"
      media-type = string
      normalization-form = "NFC" | "NFD" | "NFKC" | "NFKD" | "none" | "fully-normalized" | nmtoken
      omit-xml-declaration = "yes" | "no"
      standalone = "yes" | "no" | "omit"
      undeclare-prefixes = "yes" | "no"
      use-character-maps = qnames
      version = nmtoken />

  <xsl:param name = qname
      select = expression
      as = sequence-type
      required = "yes" | "no"
      tunnel = "yes" | "no">
    sequence-constructor
  </xsl:param>
xsl:param is also allowed in xsl:function and xsl:template.

  <xsl:preserve-space elements = tokens />

  <xsl:strip-space elements = tokens />

  <xsl:template match = pattern
      name = qname
      priority = number
      mode = tokens
      as = sequence-type>
    xsl:param*, sequence-constructor
  </xsl:template>

  <xsl:variable name = qname
      select = expression
      as = sequence-type>
    sequence-constructor
  </xsl:variable>
xsl:variable is also allowed in sequence-constructor contexts.

Attribute Specification Options
  { }   specified using an attribute value template
  bold =   required attribute
  non-bold =   optional attribute
Node Constructing Instructions
  <xsl:attribute name = { qname }
      namespace = { uri }
      select = expression
      separator = { string }
      type = qname
      validation = "strict" | "lax" | "preserve" | "strip">
    sequence-constructor
  </xsl:attribute>

  <xsl:comment select = expression>
    sequence-constructor
  </xsl:comment>

  <xsl:document type = qname
      validation = "strict" | "lax" | "preserve" | "strip">
    sequence-constructor
  </xsl:document>

  <xsl:element name = { qname }
      namespace = { uri }
      inherit-namespaces = "yes" | "no"
      use-attribute-sets = qnames
      type = qname
      validation = "strict" | "lax" | "preserve" | "strip">
    sequence-constructor
  </xsl:element>
Element nodes can also be constructed using XML elements not in the xsl: namespace, which can also specify xsl:type, xsl:validation and xsl:use-attribute-sets attributes.

  <xsl:namespace name = { ncname }
      select = expression>
    sequence-constructor
  </xsl:namespace>

  <xsl:processing-instruction
      name = { ncname }
      select = expression>
    sequence-constructor
  </xsl:processing-instruction>

  <xsl:sequence select = expression>
    xsl:fallback*
  </xsl:sequence>

  <xsl:text disable-output-escaping = "yes" | "no">
    #PCDATA
  </xsl:text>
disable-output-escaping is deprecated.
Text also constructs text nodes.

XSL-List: http://www.mulberrytech.com/xsl/xsl-list
  <xsl:result-document format = { qname }
      href = { uri }
      validation = "strict" | "lax" | "preserve" | "strip"
      type = qname
      method = { "xml" | "html" | "xhtml" | "text" | qname-but-not-ncname }
      byte-order-mark = { "yes" | "no" }
      cdata-section-elements = { qnames }
      doctype-public = { string }
      doctype-system = { string }
      encoding = { string }
      escape-uri-attributes = { "yes" | "no" }
      include-content-type = { "yes" | "no" }
      indent = { "yes" | "no" }
      media-type = { string }
      normalization-form = { "NFC" | "NFD" | "NFKC" | "NFKD" | "none" | "fully-normalized" | nmtoken }
      omit-xml-declaration = { "yes" | "no" }
      standalone = { "yes" | "no" | "omit" }
      undeclare-prefixes = { "yes" | "no" }
      use-character-maps = qnames
      output-version = { nmtoken } >
    sequence-constructor
  </xsl:result-document>

Allowed Attribute Values:
  char   a single character
  expression   an XPath expression
  id   an ID attribute value
  ncname   a name with no namespace prefix
  nmtoken   a name token
  number   a number (only digits)
  pattern   an XPath expression conforming to pattern syntax
  prefix   a namespace prefix
  qname-but-not-ncname   a name with a namespace prefix
  qname   a name with or without a namespace prefix
  sequence-type   an XML Schema sequence type (with *)
  string   just text
  token   specific to its use
  uri-list   white-space separated list of URIs
  uri   a uniform resource identifier
Conditional and Looping Instructions
  <xsl:analyze-string select = expression
      regex = { string }
      flags = { string }>
    <xsl:matching-substring>
      sequence-constructor
    </xsl:matching-substring>
    <xsl:non-matching-substring>
      sequence-constructor
    </xsl:non-matching-substring>
    xsl:fallback*
  </xsl:analyze-string>
One but not both of xsl:matching-substring and xsl:non-matching-substring can be omitted.
regex-group(N) returns the Nth group matched by the regex within xsl:matching-substring.

  <xsl:choose>
    <xsl:when test = expression>
      sequence-constructor
    </xsl:when>
    <xsl:otherwise>
      sequence-constructor
    </xsl:otherwise>
  </xsl:choose>
One or more xsl:when and zero or one xsl:otherwise are allowed.

  <xsl:for-each select = expression>
    xsl:sort*, sequence-constructor
  </xsl:for-each>

  <xsl:for-each-group select = expression
      group-by = expression
      group-adjacent = expression
      group-starting-with = pattern
      group-ending-with = pattern
      collation = { uri }>
    xsl:sort*, sequence-constructor
  </xsl:for-each-group>

  <xsl:if test = expression>
    sequence-constructor
  </xsl:if>

Standard Attributes
Standard attributes are allowed on all elements. When not on xsl: elements, the xsl: prefix is required on the attribute name.
  [xsl:]default-collation = uri
  [xsl:]exclude-result-prefixes = tokens
  [xsl:]extension-element-prefixes = tokens
  [xsl:]use-when = expression
  [xsl:]version = "1.0" | "2.0"
  [xsl:]xpath-default-namespace = uri
Value/Copy Instructions
  <xsl:copy copy-namespaces = "yes" | "no"
      inherit-namespaces = "yes" | "no"
      use-attribute-sets = qnames
      type = qname
      validation = "strict" | "lax" | "preserve" | "strip">
    sequence-constructor
  </xsl:copy>

  <xsl:copy-of select = expression
      copy-namespaces = "yes" | "no"
      type = qname
      validation = "strict" | "lax" | "preserve" | "strip" />

  <xsl:number value = expression
      select = expression
      level = "single" | "multiple" | "any"
      count = pattern
      from = pattern
      format = { string }
      lang = { nmtoken }
      letter-value = { "alphabetic" | "traditional" }
      ordinal = { string }
      grouping-separator = { char }
      grouping-size = { number } />

  <xsl:perform-sort select = expression>
    xsl:sort+, sequence-constructor
  </xsl:perform-sort>

  <xsl:value-of select = expression
      separator = { string }
      disable-output-escaping = "yes" | "no">
    sequence-constructor
  </xsl:value-of>
disable-output-escaping is deprecated.

  <xsl:sort select = expression
      lang = { nmtoken }
      order = { "ascending" | "descending" }
      collation = { uri }
      stable = { "yes" | "no" }
      case-order = { "upper-first" | "lower-first" }
      data-type = { "text" | "number" | qname-but-not-ncname } >
    sequence-constructor
  </xsl:sort>
xsl:sort is used in xsl:for-each, xsl:for-each-group, xsl:apply-templates and xsl:perform-sort.

XSLT 2.0: http://www.w3.org/TR/xslt20/
XPath 2.0: http://www.w3.org/TR/xpath20/
2008-07-21
XSLT 2.0 Quick Reference
Sam Wilmott
sam@wilmott.ca
http://www.wilmott.ca
and
Mulberry Technologies, Inc.
17 West Jefferson Street, Suite 207
Rockville, MD 20850 USA
Phone: +1 301/315-9631
Fax: +1 301/315-8285
info@mulberrytech.com
http://www.mulberrytech.com
© 2007-2008 Sam Wilmott and Mulberry Technologies, Inc.
The Stylesheet Element
  <xsl:stylesheet id = id
      extension-element-prefixes = tokens
      exclude-result-prefixes = tokens
      version = "1.0" | "2.0"
      xpath-default-namespace = uri
      default-validation = "preserve" | "strip"
      default-collation = uri-list
      input-type-annotations = "preserve" | "strip" | "unspecified"
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    xsl:import*, top-level-declarations
  </xsl:stylesheet>
xsl:transform is a synonym for xsl:stylesheet.

  <xsl:import href = uri />
A literal result element can be used in place of xsl:stylesheet, so long as it specifies attribute xsl:version and namespace xmlns:xsl.

Template Invocation Instructions
  <xsl:apply-imports>
    xsl:with-param*
  </xsl:apply-imports>

  <xsl:apply-templates select = expression
      mode = token>
    (xsl:sort | xsl:with-param)*
  </xsl:apply-templates>

  <xsl:call-template name = qname>
    xsl:with-param*
  </xsl:call-template>

  <xsl:next-match>
    (xsl:with-param | xsl:fallback)*
  </xsl:next-match>

  <xsl:with-param name = qname
      select = expression
      as = sequence-type
      tunnel = "yes" | "no">
    sequence-constructor
  </xsl:with-param>

Exception-Handling Instructions
  <xsl:fallback>
    sequence-constructor
  </xsl:fallback>

  <xsl:message select = expression
      terminate = { "yes" | "no" }>
    sequence-constructor
  </xsl:message>
Top-Level Schema
This Quick Reference primarily describes ISO Schematron. See the “Difference” panel for Schematron 1.5 and 1.6.
  <schema id="ID" icon="URI" see="URI"
      fpi="FORMAL-PUBLIC-ID" xml:lang="LANG"
      xml:space={"preserve" | "default"}
      schemaVersion="VERSION"
      defaultPhase="IDREF"
      queryBinding="BINDING-NAME"
      xmlns="http://purl.oclc.org/dsdl/schematron">
    <title>?, <ns>*, <p>*, <let>*, <phase>*, <pattern>+, <p>*, <diagnostics>?, plus interspersed <include>
  </schema>

  <ns prefix="NMTOKEN" uri="URI"/>
All namespaces used in validated documents, and referenced in the schema, must be declared using <ns>.

  <let name="NAME" value="VALUE"/>

  <include href="URI"/>

Patterns
  <pattern abstract="false" id="ID" icon="URI" see="URI"
      fpi="FORMAL-PUBLIC-ID" xml:lang="LANG"
      xml:space={"preserve" | "default"}>
    <p>*, <let>*, <rule>*, plus interspersed <include>
  </pattern>
Within each pattern, only the first non-abstract <rule> whose @context matches is used.
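The "first matching rule wins" behaviour within a pattern can be sketched in a few lines of Python. This is an illustration, not how any particular Schematron processor is written: the function `fired_rule`, the rule dictionaries and the idea of matching a context by plain element name are all simplifying assumptions (a real @context is a path expression).

```python
def fired_rule(rules, element_name):
    """Return the id of the first non-abstract rule whose context matches, or None."""
    for rule in rules:
        if rule["abstract"]:
            continue  # abstract rules are never used directly
        if element_name in rule["context"]:
            return rule["id"]  # later rules in the same pattern are ignored
    return None


# Hypothetical rules of one pattern, in document order.
rules = [
    {"id": "r-abstract", "abstract": True, "context": set()},
    {"id": "r-title", "abstract": False, "context": {"title"}},
    {"id": "r-any-heading", "abstract": False, "context": {"title", "subtitle"}},
]

print(fired_rule(rules, "title"))     # r-title (r-any-heading also matches but comes later)
print(fired_rule(rules, "subtitle"))  # r-any-heading
print(fired_rule(rules, "p"))         # None
```

One practical consequence is that more specific rules generally need to come before more general ones in the same pattern.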
Abstract patterns
  <pattern abstract="true" id="ID"
      icon="URI" see="URI" fpi="FORMAL-PUBLIC-ID" xml:lang="LANG"
      xml:space={"preserve" | "default"}>
    <p>*, <let>*, <rule>*, plus interspersed <include>
  </pattern>

Using abstract patterns
  <pattern abstract="false" is-a="IDREF"
      id="ID" icon="URI" see="URI"
      fpi="FORMAL-PUBLIC-ID" xml:lang="LANG"
      xml:space={"preserve" | "default"}>
    <p>*, <param>*, and interspersed <include>
  </pattern>

  <param name="NCNAME" value="VALUE"/>
@value must be non-empty-string

Phases
  <phase id="ID" icon="URI" see="URI" fpi="FORMAL-PUBLIC-ID" xml:lang="LANG"
      xml:space={"preserve" | "default"}>
    <p>*, <let>*, <active>*, plus interspersed <include>
  </phase>

  <active pattern="IDREF">
    any number of text, <dir>, <emph> and <span>
  </active>

Rules, Assertions and Reports
  <rule flag="NAME" abstract="false"?
      context="PATH" id="ID" icon="URI"
      fpi="FORMAL-PUBLIC-ID" xml:lang="LANG"
      xml:space={"preserve" | "default"}
      see="URI" role="ROLE" subject="PATH">
    any number of <let>, followed by any number (at least one) of <assert>, <report> and <extends>, plus interspersed <include>
  </rule>

  <extends rule="IDREF"/>
plus any foreign attributes

  <assert test="EXPR" flag="NAME" id="ID"
      diagnostics="IDREFS" icon="URI" see="URI"
      fpi="FORMAL-PUBLIC-ID" xml:lang="LANG"
      xml:space={"preserve" | "default"}
      role="ROLE" subject="PATH">
    any number of text, <name>, <value-of>, <emph>, <dir> and <span>
  </assert>

  <report test="EXPR" flag="NAME" id="ID"
      diagnostics="IDREFS" icon="URI" see="URI"
      fpi="FORMAL-PUBLIC-ID" xml:lang="LANG"
      xml:space={"preserve" | "default"}
      role="ROLE" subject="PATH">
    any number of text, <name>, <value-of>, <emph>, <dir> and <span>
  </report>

Abstract rules (used to <extends> others)
  <rule flag="NAME" abstract="true"
      id="ID" icon="URI"
      fpi="FORMAL-PUBLIC-ID" xml:lang="LANG"
      xml:space={"preserve" | "default"}
      see="URI" role="ROLE" subject="PATH">
    any number of <let>, followed by any number (at least one) of <assert>, <report> and <extends>, plus interspersed <include>
  </rule>

XSL-List: http://www.mulberrytech.com/xsl/xsl-list
Diagnostics
  <diagnostics>
    any number of <diagnostic> and <include>
  </diagnostics>

  <diagnostic id="ID" icon="URI" see="URI"
      fpi="FORMAL-PUBLIC-ID" xml:lang="LANG"
      xml:space={"preserve" | "default"}>
    any number of text, <value-of>, <emph>, <dir> and <span>
  </diagnostic>

Formatting Output
  <title>
    any number of <dir> and text
  </title>

  <p id="ID" class="CLASS" icon="URI">
    any number of text, <dir>, <emph> and <span>
  </p>

  <dir value={"ltr" | "rtl"}>text</dir>

  <emph>text</emph>

  <span class="CLASS">
    text
  </span>

  <value-of select="PATH"/>

  <name path="PATH"/>
If @path is not specified, <name> returns the name of the current node.

Attribute Specification Options
  { }   alternate allowed values
  bold =   required attribute
  non-bold =   optional attribute

W3C XSLT 1.0 Specification: http://www.w3.org/TR/xslt
W3C XPath 1.0 Specification: http://www.w3.org/TR/xpath
W3C XSLT 2.0 Specification: http://www.w3.org/TR/xslt20
W3C XPath 2.0 Specification: http://www.w3.org/TR/xpath20
Which Patterns Are Used?
All non-abstract <pattern>s are used if:
• there’s no <phase> in the <schema>,
• there’s no <phase> selected by its @id attribute, or
• the <schema> is invoked with the #ALL option.
If there’s a @defaultPhase, and the <schema> is invoked with the #DEFAULT option, then all <pattern>s referenced in the <active> children of the default <phase> are used.
If the implementation selects a <phase> using its @id attribute, then all <pattern>s referenced in the <active> children of that <phase> are used.
How #ALL, #DEFAULT and named phases are specified is implementation-determined.
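The selection rules above can be paraphrased as a short Python sketch. Everything here is hypothetical scaffolding: the function `patterns_in_use`, the dictionary layout and the pattern names are invented. Falling back to all patterns when #DEFAULT is requested but no @defaultPhase exists is an assumption of this sketch, since the card does not cover that case.

```python
def patterns_in_use(schema, phase):
    """Return the ids of the patterns that would be used for the requested phase."""
    concrete = [p for p in schema["patterns"] if p not in schema["abstract"]]
    phases = schema["phases"]  # phase id -> ids of patterns named by its <active> children

    if phase == "#DEFAULT":
        # Assumption: without @defaultPhase, behave as for #ALL.
        phase = schema.get("defaultPhase", "#ALL")
    if phase == "#ALL" or phase not in phases:
        # No <phase> at all, no <phase> with that id, or #ALL requested.
        return concrete
    return [p for p in concrete if p in phases[phase]]


# A hypothetical schema with three patterns and one phase.
schema = {
    "patterns": ["structure", "style", "template"],
    "abstract": {"template"},
    "phases": {"quick": {"structure"}},
    "defaultPhase": "quick",
}

print(patterns_in_use(schema, "#ALL"))      # ['structure', 'style']
print(patterns_in_use(schema, "#DEFAULT"))  # ['structure']
print(patterns_in_use(schema, "quick"))     # ['structure']
```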
More About Attributes
@abstract indicates whether a <pattern> or <rule> is to be used as-is (if “false”) or by another <pattern> or <rule> (if “true”).
@defaultPhase (on <schema>) indicates which <phase> is used to determine which <pattern>s are selected by the #DEFAULT option.
@flag on a fired <rule>, on a failing <assert> or on a succeeding <report> sets a flag for further processing.
@fpi is a public identifier associated with the element it appears on.
@icon is the URI of the location of a graphic.
@queryBinding (on <schema>) indicates which query language is to be used. The default is “xslt” (for XSLT/XPath 1.0). Other appropriate values are: “stx”, “xslt1.1”, “exslt”, “xslt2”, “xpath”, “xpath2”, “xquery”.
@role is a name classifying the <rule>, <assert> or <report>, or the @subject, if any.
@see is the URI of information about the schema itself.
@subject is a path describing related elements and/or attributes, if other than the context of the current <rule>.

Foreign Elements and Attributes
Schema elements can have “foreign” attributes, and non-empty schema elements can contain “foreign” child elements. Foreign attributes and elements are those in a namespace other than "http://purl.oclc.org/dsdl/schematron".
Schematron 1.5
Schematron 1.5 differs from ISO Schematron in the following ways:
Overall:
• The namespace for Schematron 1.5 is: "http://www.ascc.net/xml/schematron"
• <let> and <include> elements are not supported.
• <key> element is supported:
    <key name="NAME" path="PATH" icon="URI"/>
  <key> is allowed anywhere in the content of <rule>. (In ISO Schematron implementations supporting the use of XSLT "foreign" elements, <xsl:key> can be used in place of Schematron 1.5's <key>.)
• Abstract <pattern>s are not supported.
• Attribute pattern/@name is used to name <pattern>s rather than @id. It's a required attribute.
Unsupported Attributes:
• These attributes are not supported anywhere: @xml:space, @flag.
• These attributes are not supported on <rule>: @see, @xml:lang, @icon, @fpi, @subject.
• These attributes are not supported on <diagnostics>: @see, @fpi.
• In addition, attribute @see is not supported on <schema>, <assert> or <report>.
Other Differences:
• <value-of> isn’t allowed as a child of <assert> or <report>.
• Attribute @version is allowed on <schema>. (Default value is "1.5".)
• The following attributes are optional: ns/@uri, dir/@value and span/@class.

Schematron 1.6
Schematron 1.6 differs from Schematron 1.5 in supporting most ISO Schematron features, including <let>, <include>, abstract <pattern>s and <value-of> in <assert> and <report>.
Schematron 1.5/1.6 Resources: http://xml.ascc.net/schematron/
Schematron Validation Report Language
The Schematron Validation Report Language is the standard for the output of an ISO Schematron processor. It can be post-processed to produce more readable output, if required.
  <schematron-output title="TEXT"
      phase="NMTOKEN" schemaVersion="TEXT"
      xmlns="http://purl.oclc.org/dsdl/svrl">
    <text>*, <ns-prefix-in-attribute-values>*,
    (<active-pattern>, (<fired-rule>,
      (<failed-assert> | <successful-report>)*)+)+
  </schematron-output>

  <ns-prefix-in-attribute-values
      prefix="NMTOKEN" uri="URI"/>
Only namespaces from <ns> need to be reported.

  <active-pattern id="ID" name="TEXT"
      role="NMTOKEN"/>
Only active <pattern>s are reported.

  <fired-rule id="ID" context="TEXT"
      role="NMTOKEN" flag="NMTOKEN"/>
Only <rule>s that are fired are reported.

  <diagnostic-reference
      diagnostic="NMTOKEN">
    <text>
  </diagnostic-reference>
Only references are reported, not the <diagnostic>.

  <failed-assert id="ID" location="TEXT"
      test="TEXT" role="NMTOKEN"
      flag="NMTOKEN">
    <diagnostic-reference>*, <text>
  </failed-assert>
Only failed <assert>s are reported.

  <successful-report id="ID" location="TEXT"
      test="TEXT" role="NMTOKEN"
      flag="NMTOKEN">
    <diagnostic-reference>*, <text>
  </successful-report>
Only successful <report>s are reported.

  <text>text</text>

See other Quick References at: http://www.mulberrytech.com/quickref
2012-03-05
ISO Schematron Quick Reference
Sam Wilmott
sam@wilmott.ca
http://www.wilmott.ca
and
Mulberry Technologies, Inc.
17 West Jefferson Street, Suite 207
Rockville, MD 20850 USA
Phone: +1 301/315-9631
Fax: +1 301/315-8285
info@mulberrytech.com
http://www.mulberrytech.com
© 2009-2012 Sam Wilmott and Mulberry Technologies, Inc.
ISO Schematron Examples
Checking a document for good practice:
  <schema xmlns="http://purl.oclc.org/dsdl/schematron"
      queryBinding="xslt2">
    <pattern>
      <title>Check paragraphs and titles for content</title>
      <rule context="title">
        <report test="*">A title can only contain text.</report>
        <assert test="normalize-space()">A title must have content.</assert>
      </rule>
      <rule context="p">
        <assert test="* or normalize-space()">A p must have content.</assert>
      </rule>
    </pattern>
    <pattern>
      <title>Report use of HTML formatting elements.</title>
      <rule context="b | i">
        <report test="true()">HTML <name/> elements shouldn't be used (found in <name path=".."/>).</report>
      </rule>
    </pattern>
    <pattern>
      <title>Check that titles precede something.</title>
      <rule context="title">
        <assert test="following-sibling::*[1][not(self::title)]">A title should be followed by a non-title element.</assert>
      </rule>
    </pattern>
  </schema>
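To make concrete what the title rules in this example schema check, the following Python sketch performs roughly equivalent tests by hand using the standard library's `xml.etree.ElementTree`. It is not a Schematron processor: the function `check_titles` and the sample document are invented for illustration, and the sketch assumes the root element is not itself a title.

```python
import xml.etree.ElementTree as ET


def check_titles(root):
    """Hand-written approximation of the example schema's rules for <title>."""
    messages = []
    for parent in root.iter():
        children = list(parent)
        for index, child in enumerate(children):
            if child.tag != "title":
                continue
            # <report test="*">: the title has element children.
            if len(child) > 0:
                messages.append("A title can only contain text.")
            # <assert test="normalize-space()">: the title's text must not be blank.
            if not "".join(child.itertext()).strip():
                messages.append("A title must have content.")
            # <assert test="following-sibling::*[1][not(self::title)]">
            following = children[index + 1] if index + 1 < len(children) else None
            if following is None or following.tag == "title":
                messages.append("A title should be followed by a non-title element.")
    return messages


sample = ET.fromstring(
    "<doc><title>Intro</title><p>Text</p><title> </title><title>Last</title></doc>"
)
for message in check_titles(sample):
    print(message)
```

In this sample the first title passes every check, the blank title fails two of them, and the final title fails because nothing follows it.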
ISO Schematron: Go to: http://www.iso.org/PubliclyAvailableStandards and search for "Schematron".
Other Schematron resources: http://www.schematron.com
7 Workshop: An Introduction to Digital Humanities Tools and Approaches
7.1 Corpus Linguistics and Text Analysis
7.1.1 Corpus and Text Analysis for Research in the Humanities
As more and more large digital datasets of modern and historical texts become available, it is becoming increasingly important for scholars in the humanities and beyond to be able to search, sort and analyze electronic texts. Traditional scholarship and research methods have taught us how to read texts closely and critically, and how to evaluate and use printed sources. But what do we do when we have a million books readily available to us online? Corpus linguistics and related areas of electronic text analysis have pioneered techniques to deal with this 'data deluge' and have transformed many areas of literary and linguistic study. This short course will aim to promote the use of the techniques developed in these domains to address a much wider range of research questions from across the disciplines of the Humanities.
The first session will be a lecture giving an overview of some of the most relevant and useful resources, tools and methods, some suggestions of investigations based on them, and an exploration of some of the ongoing barriers and problems.
7.1.2 Dealing with the Data Deluge: Corpus Linguistics for Text-Based Research
The second session will be an opportunity to work 'hands-on' with some of these corpora and tools, exploring exercises based on finding and evaluating evidence relevant to a variety of primarily non-linguistic topics, such as:
• Do men swear more than women in conversation?
• How can we trace changes in meaning in political terminology?
• Can we find when a word was first used, and how its meaning has changed?
• In my writing, can I find out if someone overuses some words and phrases compared to the norms for the language?
• How much can we trust Google Books? How can it be used for scholarship?
• How many new words or meanings did Shakespeare invent?
Spatial awareness: a brief introduction to ArcGIS
This practical is designed to give a brief introduction to some of the analyses you can perform when working with GIS. It is intentionally over length: I do not expect you to necessarily get every stage done. Feel free to either work through as much as you can, or to pick and choose the parts of most interest. However, sections 1 to 4 should be considered the most important.
The data used is completely invented (if you plotted it against the Ordnance Survey basemap, you would find that it is located somewhere in the uplands east of Manchester). This means you can feel free to play around with it however you wish after you leave, but also means that you should not use it for any kind of real world application or in any published material.
The back story to this GIS analysis is that the development of a small tourist attraction has been proposed in the region of a rural village. You are taking the role of the local authority planning archaeologist (the techniques used are of wide application), who has brought together a collection of different data in order to assess what impact this development might have on any buried archaeological remains in the local area. You have already added most of this data to your GIS map, and you will now try to study the data available to you to discover what past remains might still exist under the ground.
1. Exploring the map
We will begin by learning how to explore the map, zooming in and out, and turning the visibility of data layers on and off. Here is an image of the ArcMap user interface:
You will not have this exact configuration of toolbars, but don’t worry about that. The main things to take note of are the map exploration tools: The magnifying glasses enable you to zoom in and out by either clicking on the map or dragging out a box. The hand lets you grab and pan the map. The globe lets you zoom out to the full extent of all layers on the map. And the two sets of arrows are fixed zooms in and out. Try all of these tools now. Then, zoom to the full extent of the map.
Another useful tool is the identify tool: This little ‘i’ allows you to click on any object on the map to get more information on it. The gridded features on the map are field survey results: use the identify tool to click on some of the transects to see what data was gathered. When you are done with the identify tool, click on the pointer icon next to it to get back your normal mouse pointer (the same applies to the magnification and pan tools).
On the left is the Table of Contents, which shows you which geographic data layers have been added to the map document. When you are working in ArcMap, the document you are working on is essentially just a container, which links through to other files containing the actual data and which defines the appearance and ordering of layers. As a result, when you undertake most operations in ArcMap, you are not changing anything in the original data (unless working with editing tools or creating new data). However, this also means that if you move files around on your hard drive, then ArcMap will lose its links to them: being disciplined about maintaining file locations and directory structures is very important.
If you drag layers up and down in the Table of Contents, you will see that their drawing order is changed accordingly on the map. If you untick the boxes next to the layers, you will switch off their visibility (tick to turn back on). Try experimenting with this now. One thing to note is that there are several different views available in the Table of Contents. We want the ‘List by drawing order’ view, which should already be selected at the top with the left-hand icon.
While we are looking at the user interface, here are a few other things to note which will be of use later. If you want to add new data to the map, you click on the + icon: If you want to open up ArcCatalog or ArcToolbox (see below), you click on the filing cabinet or toolbox icons:
These should then appear as new windows or as tabs along the right hand edge of the user interface (image rotated):
Finally, to switch easily between the data and layout views (see below), we click on these small icons at the bottom left of the map window: The map icon is for data view (which is our normal view) and the page icon is for layout view (which is for creating map outputs, more on which later). The little refresh icon will refresh the displayed map and the pause icon will stop the map re-drawing until clicked again.
Here is a quick guide to what you are looking at on the map. The layer named ‘dem’ is a Digital Elevation Model of the local area: this shows the terrain. We then have a rectified (i.e. fitted to the map) satellite image and the rectified results of a geophysical survey (showing buried features based upon their magnetic properties). These are all raster data, whereas the rest of the data is vector. The lakes, rivers, roads and buildings layers should be self-explanatory. The divisions layer shows local field divisions. As already briefly stated, the survey fields layers show the transects surveyed by fieldwalking to discover surface pottery finds. Finally, the development layer shows the extent of the proposed new development. As you can see, we have quite a large amount of data here with which to assess the potential archaeological impact of this planned new tourist attraction.
2. Creating a layer from a table
Next, we will add one more data layer to the map. This is in the form of a .csv table, which is essentially a form of spreadsheet (Excel files will also work similarly with modern versions of ArcGIS). This spreadsheet contains x and y coordinates for each entry, so when we plot the contents on the map, it will appear as a series of points.
Open up the ArcCatalog window. This will either be a tab at the right hand side of the window or will open as a window by itself when clicking on the ArcCatalog yellow filing cabinet button. You should see a list of folders and files. In the list, there will be an entry called SMR next to a page icon. Right click on this item and select ‘Create feature class’ and then ‘From XY table’. A new window should appear. Select ‘Easting’ from the drop down box below ‘X Field:’ and ‘Northing’ from the drop down box below ‘Y Field:’. Then click on the button labelled ‘Coordinate System of Input Coordinates…’. This is where we set the projection for our new data, which in this case is the British National Grid. In the new window, click on the ‘Select…’ button and then navigate to Projected Coordinate Systems\National Grids\Europe\ and double click on the British National Grid.prj file. Then click on ‘OK’ in the two previous windows in turn. This has now created a new ‘shapefile’ from our table.
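If it helps to see what ‘From XY table’ is doing conceptually, here is a minimal Python sketch that reads a table with Easting and Northing columns and turns each row into a point with its remaining attributes attached. This is not how ArcGIS does it internally; the sample rows, the Name and Period columns, and the read_points function are invented for illustration, and only the Easting and Northing column names come from the exercise.

```python
import csv
import io

# Invented sample standing in for the SMR table. Only the Easting and
# Northing column names are taken from the exercise; the rest is made up.
SAMPLE_CSV = """Name,Period,Easting,Northing
Pottery scatter,Roman,400120,398450
Barrow,Bronze Age,400560,398910
"""


def read_points(text, x_field="Easting", y_field="Northing"):
    """Return a list of (x, y, attributes) tuples, one per table row."""
    points = []
    for row in csv.DictReader(io.StringIO(text)):
        # Pull the coordinate columns out; whatever is left is attribute data.
        x = float(row.pop(x_field))
        y = float(row.pop(y_field))
        points.append((x, y, row))
    return points


points = read_points(SAMPLE_CSV)
for x, y, attrs in points:
    print(x, y, attrs["Name"])
```

Note that the numbers only become locations once a coordinate system is attached to them, which is why the exercise asks you to choose the British National Grid.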
The term shapefile is a slight misnomer, in that it actually consists of a series of related files. It is a very common type of GIS data object, and as such is a good option to store your data, as it is very portable to other software. The files that make up a shapefile must always be kept together in the same location on your disk, otherwise it may become broken. Other common types of GIS data object include TAB files (used by MapInfo) and KML or KMZ files (particularly associated with Google Earth). These are all vector objects. Raster files are probably most commonly seen in TIFF or IMG format, or sometimes JPEG.
The new layer may have appeared automatically on your map or may not have. If it did not, click on the Add Data button and add the new layer (which will be called XYSMR.shp unless you renamed it). You should then see a series of points appear on your map. These represent records kept by the local authority of previously discovered archaeological features in the area.
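Because a shapefile is really several files that must travel together, it can be worth checking that none has gone astray after copying data between machines. The following is a small sketch of such a check, assuming the three mandatory parts of the format (.shp, .shx and .dbf); the missing_parts function and the temporary demo folder are invented for illustration.

```python
from pathlib import Path
import tempfile

# The three parts every shapefile needs. A .prj file (the coordinate system
# definition) is optional but worth keeping alongside them.
REQUIRED = (".shp", ".shx", ".dbf")


def missing_parts(shp_path):
    """Return the required extensions that are absent next to shp_path."""
    base = Path(shp_path)
    return [ext for ext in REQUIRED if not base.with_suffix(ext).exists()]


# Demonstration in a throwaway folder: create only two of the three parts.
with tempfile.TemporaryDirectory() as folder:
    for ext in (".shp", ".dbf"):
        (Path(folder) / ("XYSMR" + ext)).touch()
    demo_missing = missing_parts(Path(folder) / "XYSMR.shp")

print(demo_missing)
```

An empty list means that all of the required parts were found in the same folder.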
3. Displaying attributes
Now, we will learn how to change the symbols for our different data layers, particularly in regard to displaying attributes associated with our data.
In the Table of Contents, right click on the layer you just added (on the layer name, not the symbol) and select ‘Open Attribute Table’. You can see from this table that there are a series of attributes associated with each of our points. We can query these attributes (in the conventional database fashion) or we can use them to define the symbols used to display our objects on the map. Close the table in the normal way.
For example, you might wish to use different symbols to display which period each point dated to. To do so, either double click on the layer name or right click on the name and select ‘Properties…’. The window that appears is used to set the various properties associated with this layer. You will see a series of tabs along the top: select the ‘Symbology’ tab. Here is where we set the symbols used. Under ‘Show:’ on the left select ‘Categories’ and then ‘Unique values’. This allows us to draw different symbols depending upon the different categories in any of the columns in the layer’s attribute table. Under ‘Value Field’, select the ‘Period’ entry and then click on the ‘Add All Values’ button. You should see a series of different symbols appear for each category of period. If you double click on a symbol icon, you can change it using the symbol selector window. If you right click on a symbol icon and select ‘Properties for all symbols…’, you can set the symbol for all of the categories in the list. Experiment with this and try to create a symbology that you like.
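The logic behind ‘Unique values’ can be sketched in a few lines of Python: collect the distinct values found in one column and give each its own symbol. The records, the symbol names and the unique_value_symbols function are all invented for illustration; ArcMap chooses its own symbols and colours.

```python
# Invented records standing in for rows of the layer's attribute table.
records = [
    {"Name": "Pottery scatter", "Period": "Roman"},
    {"Name": "Barrow", "Period": "Bronze Age"},
    {"Name": "Field system", "Period": "Roman"},
    {"Name": "Moated site", "Period": "Medieval"},
]

# A hypothetical palette of symbol names to hand out in turn.
SYMBOLS = ["circle", "square", "triangle", "cross", "diamond"]


def unique_value_symbols(rows, field):
    """Map each distinct value in the chosen field to one symbol."""
    values = sorted({row[field] for row in rows})
    return {value: SYMBOLS[i % len(SYMBOLS)] for i, value in enumerate(values)}


symbology = unique_value_symbols(records, "Period")
for row in records:
    print(row["Name"], "->", symbology[row["Period"]])
```

Clicking ‘Add All Values’ corresponds roughly to building the full set of distinct values in the first line of the function.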
When you are satisfied, click on ‘OK’ and you should see that the points on the map are now drawn using your new symbology. Next, try setting the symbology of the various field survey layers so that they show the number of mid-Roman pottery sherds recovered from each transect: to make this work, you will have to use the ‘Quantities’ rather than the ‘Categories’ symbology option (hint: graduated colours will display the number of sherds using a colour range). Unfortunately, you may run into a difficulty here, in that these surveys were undertaken by different people who quantified their pottery counts differently, but do the best that you can: this is the type of difficulty that you might encounter quite regularly when dealing with other people’s data.
Another problem with the output is that the different layers use their own colour scaling according to the different maximum sherd counts. If you wish to try to counter this, note which layer has the highest sherd count, then open up the properties for the other layers (one at a time), click on the ‘Import…’ button and import the symbology from the layer with the highest maximum value, making sure to select the relevant value field. Don’t worry if this is getting a little too complicated, just move onto the next item.
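To see why separate colour scales are misleading, here is a rough Python sketch that sorts sherd counts into five classes, first using each survey’s own maximum and then using one maximum shared by all surveys. Equal-interval classes are used here only because they are the simplest to write down, which is an assumption rather than a description of what ArcMap does; the survey names and counts are invented.

```python
def class_index(count, maximum, classes=5):
    """Return the equal-interval class (0 to classes - 1) for a count."""
    if maximum <= 0:
        return 0
    index = int(count / maximum * classes)
    # The maximum itself would fall just past the last class, so clamp it.
    return min(index, classes - 1)


# Invented sherd counts per transect for two hypothetical survey layers.
surveys = {"survey_a": [0, 3, 12], "survey_b": [1, 40, 7]}

# Each layer scaled against its own maximum (the problem described above).
own_scale = {
    name: [class_index(c, max(counts)) for c in counts]
    for name, counts in surveys.items()
}

# Every layer scaled against the single highest count across all layers.
shared_max = max(max(counts) for counts in surveys.values())
shared_scale = {
    name: [class_index(c, shared_max) for c in counts]
    for name, counts in surveys.items()
}

print(own_scale)
print(shared_scale)
```

With its own scale, the transect with 12 sherds in the first survey lands in the darkest class, exactly like the transect with 40 sherds in the second; with the shared scale it drops to a much paler class, which is the more honest picture.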
4. Creating map outputs
We are currently in the data view, but certain things are lacking to turn this into an acceptable map for publication or wider dissemination: the so-called map furniture. A map should be considered incomplete if it lacks a scale, an indication of the direction of north and a legend (albeit this final one is not always entirely necessary). We will now learn how to add these to our map and export the results to an image file.
First we need to switch to Layout view by clicking on the page symbol at the bottom left of the map. You can also do this from the ‘View’ menu. Next, we need to switch the page layout to landscape to better fit the shape of our map. As with most software, this is done by selecting ‘Page and print setup…’ from the ‘File’ menu. Select ‘Landscape’ and click ‘OK’. You will see that the frame in which our map is drawn now overlaps the edges of the page, so resize it to fit and then click on the zoom to extent button (the globe icon).
Next, we will add a north arrow. This is easily accomplished via the ‘Insert’ menu by selecting the ‘North arrow’ option. Select an arrow you like from the window that appears and click on ‘OK’. Drag the arrow to a sensible place on your page.
Then add a scale bar by selecting ‘Scale bar’ from the ‘Insert’ menu. Again, choose a design you like and then click ‘OK’. You will probably see that the result looks something of a mess. If we drag to resize the scale bar, the units will adjust, so drag it out until it is 1 kilometre in length. However, kilometres are hardly a sensible unit for a map on this scale, so double click on the scale bar to open up its properties. Set the number of divisions to 5, the units to metres, and then on the next tab, set the marks to only appear next to divisions (rather than subdivisions). Once you click ‘OK’, you will probably find that the scale bar again needs resizing to use sensible units. Set it to 1,000m.
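As a quick check on what the finished scale bar should show, the arithmetic can be sketched as follows. The division_labels and length_on_page_mm functions are invented for illustration, and the 1:25,000 map scale is purely an assumed example, not the scale of the workshop map.

```python
def division_labels(length_m, divisions):
    """Return the distance labels at each division mark of a scale bar."""
    step = length_m / divisions
    return [round(i * step) for i in range(divisions + 1)]


def length_on_page_mm(ground_m, scale_denominator):
    """Return how long a ground distance is drawn on the page, in mm."""
    return ground_m * 1000 / scale_denominator


labels = division_labels(1000, 5)
bar_mm = length_on_page_mm(1000, 25000)  # assumed example scale of 1:25,000

print(labels)
print(bar_mm)
```

A 1,000m bar with 5 divisions therefore gets a mark every 200m, which is much easier to read than fractions of a kilometre.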
You should now have a map that looks much more useful. It still lacks a legend, however. Try adding a legend in the same way. This is quite complicated, but a bit of experimentation should result in something acceptable. Think about which layers require a legend to explain them and which ought to be pretty self-explanatory based upon their form: you do not have to add every layer to the legend. If you cannot get your legend to look pretty, then feel free to remove it: you can explain what is on the map in your figure caption when putting together your document if your map is not too complex.
Finally, we now want to export our map as an image so that we can insert it into a document (or for whatever else you might wish to do with it). To do so, go to the ‘File’ menu and select ‘Export map…’. Navigate to a sensible location to save your file in the usual way, select a file type (I usually use PNG as it results in a smaller file size than a TIFF but is much crisper than a JPEG). Set the resolution to 300dpi and tick the box at the bottom of the window that says ‘Clip output to graphics extent’. Then click on the ‘Save’ button to save your image. Once it has finished exporting (watch for messages in the bar at the bottom of the ArcMap window), minimise ArcMap and find your image where you saved it. You should hopefully end up with something like this (preferably prettier though!):
5. Performing spatial queries
You can query a GIS database in similar ways to conventional databases, e.g. based on attributes. However, one of the great strengths of GIS is in the performing of spatial queries. We will now learn how to undertake these.
Firstly, switch back to the Data view by clicking on the map icon below the map window. You can see the area of the proposed development on the map as a partially transparent red polygon. We can see that one of the points we added to the map falls within this area. However, because the records in this dataset are not necessarily precisely recorded (spatially) and because they are points which might represent features of greater extent than a point implies, we cannot be certain that the material represented by the other records will not also be impacted by the development. Therefore, we need to find out which records fall within 500 metres of the development area.
To do so, we go to the ‘Select’ menu and choose ‘Select by location…’: when you do this, note that the ‘Select by attributes…’ option in the same menu is what we would choose to perform a standard query. A new window should appear. Working downwards, the ‘Selection method:’ should be ‘select features from’, the ‘Target layer(s):’ should be the SMR point layer added earlier, the ‘Source layer:’ should be ‘Development’, the ‘Spatial selection method:’ should be ‘Target layer(s) features are within a distance of the Source layer feature’, ‘Apply a search distance’ should be ticked, and the search distance needs to be 500 metres. When you have this set correctly, click ‘OK’.
You should see that all of the features in our point layer that fall within 500 metres of the development area are selected on the map (probably in cyan). If you open the attribute table for the layer (right click on the layer name in the Table of Contents), you will see that these records are also highlighted in cyan here. If you read the descriptions, you will see that some of these are indeed features which might be of sufficient extent to fall within the development area.
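For the curious, the ‘within a distance of’ test can be sketched in plain Python: a point is selected if it lies inside the polygon, or if its distance to the nearest polygon edge is no more than the search distance. This is only an illustration of the geometry under simple assumptions (a single polygon without holes, flat map coordinates in metres), not ArcGIS’s own implementation; the rectangle, the record names and the function names are invented.

```python
import math


def point_segment_distance(p, a, b):
    """Shortest distance from point p to the line segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return math.hypot(px - ax, py - ay)
    # Position of the closest point along the segment, clamped to its ends.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def point_in_polygon(p, polygon):
    """Ray-casting test: count edge crossings to the right of the point."""
    px, py = p
    inside = False
    n = len(polygon)
    for i in range(n):
        (ax, ay), (bx, by) = polygon[i], polygon[(i + 1) % n]
        if (ay > py) != (by > py):
            x_cross = ax + (py - ay) * (bx - ax) / (by - ay)
            if px < x_cross:
                inside = not inside
    return inside


def distance_to_polygon(p, polygon):
    """Zero if p is inside the polygon, else distance to its nearest edge."""
    if point_in_polygon(p, polygon):
        return 0.0
    n = len(polygon)
    return min(
        point_segment_distance(p, polygon[i], polygon[(i + 1) % n])
        for i in range(n)
    )


# Invented development area (a rectangle) and three invented point records.
development = [(0, 0), (1000, 0), (1000, 600), (0, 600)]
records = {"inside": (500, 300), "near": (1300, 300), "far": (1600, 1200)}

selected = sorted(
    name
    for name, xy in records.items()
    if distance_to_polygon(xy, development) <= 500
)
print(selected)
```

The point inside the polygon and the point 300 metres from its edge are both selected, while the one beyond 500 metres is not.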
If you wish to try a slightly more complex query, try to find out how many SMR point records fall within 250 metres of the church seen in the buildings layer. To do this, you will first need to either construct an attribute query to select the church or use the selection tool on the toolbars to select it by hand.
The final exercise is more complicated and, as a result, you may not wish to work through it. At the end of these instructions are suggestions for how you might explore this data further, so if you do not want to try the more complex analysis, feel free to move ahead to the end.
6. Visibility analysis (viewshed) [OPTIONAL]
One form of more complex GIS analysis which is often used by archaeologists and other GIS users is the assessment of what is visible (or not visible) from a point in the landscape. We shall give this a try now.
First, add the layer named Observer.shp. This is a point representing a person standing within the end of the rectangular feature seen in the geophysics and satellite image layers. We need to add an entry to the attribute table for this layer to represent the height of the person represented by the point. Open the attribute table. At the top left of the table is an icon that looks like a record card with a drop down arrow next to it. Click on the arrow to open the table menu and then select ‘Add field…’: the field needs to be called OFFSETA and should be of ‘Float’ type, i.e. a floating point number. Set these and then click ‘OK’. Once you have added the new OFFSETA field, right click on its heading (i.e. where it says OFFSETA) and select the ‘Field Calculator’ option. In the large textbox below ‘OFFSETA =’, type 1.7 and press ‘OK’. This has set the height of our person to 1.7 metres. If this has worked, close the attribute table.
Next, we will venture into the ArcToolbox, where all of the tools built into ArcGIS are kept. However, we need to make sure that the Spatial Analyst extension is turned on, so under the ‘Customize’ menu, select ‘Extensions…’ and then make sure that Spatial Analyst is ticked.
Open ArcToolbox. This will either be in a tab at the right hand side of the window or can be accessed by clicking on the icon that looks like a red toolbox. In the window that opens, you should see a lot of red toolboxes. Expand ‘Spatial Analyst’ and then ‘Surface’: you should now see a tool called ‘Viewshed’. Double click on it.
In the window that opens, under ‘Input raster’ open the drop down box and select the ‘dem’ layer: this is the surface which will be used to determine what our observer could see. Under ‘Input point or polyline observer features’ open the drop down box and select the ‘Observer’ layer: this is, naturally, the observing person. Click on the little yellow folder icon next to the ‘Output raster’ text box, make sure you are in the correct data directory (i.e. the one with all the other files in it) and name the new layer view1.tif, then press the ‘Save’ button. Then click on ‘OK’ in the viewshed window.
The viewshed will probably take up to a minute to calculate, but should appear on the map when it is done. The resulting layer will be transparent (i.e. hold a value of ‘NODATA’) for areas which the person could not see and coloured for areas which he or she could see. Due to the relatively flat country and the slight local variation in pixel values, this is quite a messy picture, but it does raise some interesting questions in that it would appear that this monument was situated in an area which provided good views of the surrounding hills and the river valley, but which provided poor views of the middle distance around the observation point. Of course, this would only truly be interpretatively interesting if this was a real world case!
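The idea behind a viewshed can be illustrated with a much simplified sketch: take a single line of terrain heights running away from the observer, raise the observer’s eye by the offset (the 1.7 metres stored in OFFSETA), and mark each cell as visible only if the sight line to it is at least as steep as the sight line to every cell in between. The real tool works across the whole raster in every direction and has further options not modelled here; the terrain profile, the 10 metre cell size and the visible_cells function are invented for illustration.

```python
def visible_cells(profile, cell_size, observer_offset):
    """Return True/False per cell along a terrain profile.

    profile holds ground heights at equal spacing; the observer stands
    on the first cell with their eye observer_offset above the ground.
    """
    eye = profile[0] + observer_offset
    visible = [True]  # the observer's own cell
    steepest_so_far = float("-inf")
    for i in range(1, len(profile)):
        # Slope of the sight line from the eye to the ground at cell i.
        slope = (profile[i] - eye) / (i * cell_size)
        # Hidden if any nearer cell already needed a steeper sight line.
        visible.append(slope >= steepest_so_far)
        steepest_so_far = max(steepest_so_far, slope)
    return visible


# Invented heights in metres: a small rise, a dip behind it, then a hill.
profile = [100, 101, 104, 102, 103, 110]
result = visible_cells(profile, cell_size=10, observer_offset=1.7)
print(result)
```

In this invented profile the dip behind the small rise is hidden while the distant hill is visible, which is the same pattern of good long views and poor middle-distance views described above.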
If you liked doing this, you could try creating viewsheds for some of the points in our SMR layer. To do so, you would need to select one of the points (using the manual select tool, via a query, or by selecting one of the rows in the attribute table) and then right click on the layer in the Table of Contents and select ‘Data’ then ‘Export data…’; then export the selected record to a new layer and repeat from the start of this section with your new observer layer. Of course, because this map covers quite a small area, any sites towards the top of the hills will have good views of most of the valley!
If you do finish early or do not want to try exercise 6, try to create some images to show what factors would cause concern about the archaeological impact of the development area seen on the map. Would you allow the developer to go ahead with their scheme? The geophysics and satellite image layers have features in them that appear to be archaeological (the long rectangular feature and also the partial circles): can you work out what date these might be from looking at the dates of pottery seen in the survey of those and nearby fields, particularly pottery recovered near the features?
Chris Green - 05/2012
8 Workshop: A Humanities Web of Data: Publishing, Linking, Querying and Visualisation on the Semantic Web
[Materials will be provided to students at the workshop]