Page 1: A Spot of TEI

February 4th, 2013

A spot of TEI

Hugh Cayless, [email protected]

Twitter: @hcayless

Page 2: A Spot of TEI

Who am I?

✤ Ph.D. in Classics, M.S. in Information Science

✤ Worked as a software engineer for the last 12 years or so

✤ The last 4 of them at NYU, working on Digital Classics and similar cultural heritage digital access projects

✤ Recently elected to the TEI Technical Council

✤ One of the founders of EpiDoc, a TEI-based standard for encoding ancient inscriptions (and now papyri too).

Page 3: A Spot of TEI

What am I talking about?

✤ How we use TEI/XML in projects

✤ Why TEI?

✤ Current projects

Page 4: A Spot of TEI

Integrating Digital Papyrology

✤ Unification of several long-running projects:

✤ Duke Databank of Documentary Papyri (DDbDP)

✤ Heidelberger Gesamtverzeichnis (directory of Greek documentary papyri, HGV)

✤ Advanced Papyrological Information System (APIS)

✤ Bibliographie Papyrologique (BP)

✤ Trismegistos (TM)

Page 5: A Spot of TEI

State of play at the beginning

✤ DDbDP: TEI SGML files

✤ HGV: Filemaker Pro database + web interface

✤ APIS: idiosyncratic text-based catalog + images + web interface

✤ BP: database only, published annually in print/on disk

✤ TM: database + web interface

✤ TM is a going concern, working with IDP (Integrating Digital Papyrology) but with no plans to be subsumed by it

Page 6: A Spot of TEI

What we did

✤ DDbDP: converted TEI SGML to EpiDoc (TEI) XML

✤ HGV: converted to EpiDoc XML

✤ APIS: converted to EpiDoc XML

✤ BP: converted to TEI <bibl> fragments

✤ TM: inserted TM ids into IDP documents, generated linkages to TM site

Page 7: A Spot of TEI


✤ The core of the system is just TEI files in a Git repository.

✤ These are transformed, using XSLT, into RDF, HTML, plain text, and the “add” documents that feed our search index.

✤ They are pulled into an editing workflow system as needed, which allows editing the files using a web form or (for texts) a non-XML syntax based on papyrological/epigraphic editing conventions.

✤ An automated process syncs data from the editor’s repo and from a GitHub repo, and publishes the result to the site.
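The deck does not show the transformation code. The following minimal Python sketch illustrates the "plain text" and search-index outputs, using the standard library's ElementTree in place of XSLT. It assumes a Solr-style `<add><doc><field>` update message for the index. The field names (`id`, `transcription`), the document ID, the helper names and the abbreviated sample fragment (adapted from the Aelio Felici letter later in the deck) are all hypothetical.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
LB = object()  # sentinel marking a <lb/> (line beginning)

# Hypothetical, abbreviated EpiDoc file: the first two lines of the letter.
SAMPLE = (
    '<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>'
    '<div type="edition" xml:lang="la"><ab>'
    '<lb n="1"/><gap reason="lost" quantity="2" unit="character"/>'
    '<supplied reason="lost">Aelio Fel</supplied>ici plu<unclear>r</unclear>'
    '<supplied reason="lost">imam</supplied>'
    '<lb n="2"/><unclear>sa</unclear><supplied reason="lost">lutem</supplied>'
    '</ab></div></body></text></TEI>'
)


def pieces(elem):
    """Yield text fragments and LB markers in document order.

    Supplied and unclear text is kept (it is part of the editor's reading);
    <gap/> has no text to index, so only its tail is emitted.
    """
    if elem.text:
        yield elem.text
    for child in elem:
        if child.tag == TEI + "lb":
            yield LB
        elif child.tag != TEI + "gap":
            yield from pieces(child)
        if child.tail:  # text after the child belongs to elem's flow
            yield child.tail


def plain_lines(ab):
    """Split an <ab> into whitespace-normalised lines at each <lb/>."""
    lines, buf = [], []
    for p in pieces(ab):
        if p is LB:
            lines.append("".join(buf))
            buf = []
        else:
            buf.append(p)
    lines.append("".join(buf))
    # Drop the empty stretch before the first <lb/>.
    return [" ".join(line.split()) for line in lines if line.strip()]


def solr_add_doc(doc_id, lines):
    """Wrap the text in a Solr-style XML update message (field names hypothetical)."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for name, value in (("id", doc_id), ("transcription", "\n".join(lines))):
        ET.SubElement(doc, "field", name=name).text = value
    return ET.tostring(add, encoding="unicode")


root = ET.fromstring(SAMPLE)
ab = root.find(f".//{TEI}div[@type='edition']/{TEI}ab")
lines = plain_lines(ab)
out = solr_add_doc("example-1", lines)
print(lines)
print(out)
```

Dropping `<gap/>` entirely is a simplification. A production transform would also have to decide how lacunae and words broken across `<lb/>` are indexed.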

Page 8: A Spot of TEI

Or, visually

[Diagram of the system architecture. Components shown: Canonical Git Repo, GitHub Repo, Git Repos, Editor Database, Numbers Server Git Repo, Navigator Interface, search API, Automated Document Sync, Leiden+ Conversion, Search Engine]

Page 9: A Spot of TEI

So why TEI?

✤ Lots of reasons:

✤ Granular control over records

✤ Attribution

✤ Multiple outputs

✤ Mixture of controlled and free-form data

✤ Tools are relatively easy to obtain or create

✤ Engaged and responsive community

Page 10: A Spot of TEI

What I’m working on now

✤ Fixing the TEI Pointer spec

✤ Annotation of documents to mark things like personal and place names

✤ Linguistic annotation

✤ Linking text and image

Page 11: A Spot of TEI

Some examples


✤ Fine-grained attribution / version control (click on “Editorial History” and “Detailed” at the bottom of the text)


✤ What’s going on underneath?

Page 12: A Spot of TEI

r  ̣[  ̣  ̣  ̣]  ̣c  ̣[  ̣  ̣ Aelio Fel]ici pluṛ[imam]
ṣạ[lutem]
opto deos · ut mi[hi v]ạleas · quod ṃẹ[um votum est]
ego enim · valeọ coṛpọṛe   ̣  ̣  ̣[ -ca.?- ]
te non videọ rog̣ọ ṇe · fac̣ịaṣ [ -ca.?- ]
f  ̣  ̣  ̣  ̣  ̣  ̣  ̣  ̣  ̣  ̣[- ca.9 -]uma  ̣[ -ca.?- ]
  ̣  ̣  ̣  ̣  ̣  ̣  ̣  ̣  ̣  ̣ [ -ca.?- ]
v Aelio Felici

Beginning of a letter marked up according to the Leiden Conventions

Page 13: A Spot of TEI

r  ̣[  ̣  ̣  ̣]  ̣C  ̣[  ̣  ̣–ca.9– ]ICIPLUṚ[. . . .]
ṢẠ[. . . . .]
OPTODEOS · UTMI[. . . ]ẠLEAS · QUODṂẸ[ –ca.10– ]
EGOENIM · VALEỌCOṚPỌṚE   ̣  ̣  ̣[ -ca.?- ]
TENONVIDEỌROG̣ỌṆE · FAC̣ỊAṢ [ -ca.?- ]
F ̣  ̣  ̣  ̣  ̣  ̣  ̣  ̣  ̣  ̣[- ca.9 -]UMA  ̣[ -ca.?- ]
  ̣  ̣  ̣  ̣  ̣  ̣  ̣  ̣  ̣  ̣ [ -ca.?- ]
v AELIO FELICI

The same letter, diplomatic(ish) edition

Page 14: A Spot of TEI

<div xml:lang="la" type="edition" xml:space="preserve">
<div n="r" type="textpart">
<!--milestone unit="4"-->
<ab>
<lb n="1"/><gap reason="illegible" quantity="1" unit="character"/><gap reason="lost" quantity="3" unit="character"/><gap reason="illegible" quantity="1" unit="character"/>c<gap reason="illegible" quantity="1" unit="character"/><gap reason="lost" quantity="2" unit="character"/><supplied reason="lost"> Aelio Fel</supplied>ici plu<unclear>r</unclear><supplied reason="lost">imam</supplied>
<lb n="2"/><unclear>sa</unclear><supplied reason="lost">lutem</supplied>
<lb n="3"/>opto deos <g type="middot"/> ut mi<supplied reason="lost">hi v</supplied><unclear>a</unclear>leas <g type="middot"/> quod <unclear>me</unclear><supplied reason="lost">um votum est</supplied>
<lb n="4"/>ego enim <g type="middot"/> vale<unclear>o</unclear> co<unclear>r</unclear>p<unclear>or</unclear>e <gap reason="illegible" quantity="3" unit="character"/><gap reason="lost" extent="unknown" unit="character"/>
<lb n="5"/>te non vide<unclear>o</unclear> ro<unclear>go</unclear> <unclear>n</unclear>e <g type="middot"/> fa<unclear>ci</unclear>a<unclear>s</unclear> <gap reason="lost" extent="unknown" unit="character"/>
<lb n="6"/>f<gap reason="illegible" quantity="4" unit="character"/><gap reason="illegible" quantity="4" unit="character"/><gap reason="illegible" quantity="2" unit="character"/><gap reason="lost" quantity="9" unit="character"/>uma<gap reason="illegible" quantity="1" unit="character"/><gap reason="lost" extent="unknown" unit="character"/>
<lb n="7"/><gap reason="illegible" quantity="4" unit="character"/><gap reason="illegible" quantity="4" unit="character"/><gap reason="illegible" quantity="2" unit="character"/> <gap reason="lost" extent="unknown" unit="character"/>
</ab>
</div>
<div n="v" type="textpart">
<!--milestone unit="4"-->
<ab><lb n="1"/>Aelio Felici </ab>
</div>
</div>

The same letter marked up in EpiDoc (TEI) XML

Page 15: A Spot of TEI

The same letter, visualization of the tree structure of the XML

Page 16: A Spot of TEI

✤ What is the text and what is the markup?

✤ There is no text, only readings. EpiDoc allows you to produce models of readings.

✤ Slicing the text up into bits doesn’t adulterate it; it just adds hooks for transforming the text in useful ways.
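The following minimal Python sketch illustrates that idea under a much-simplified subset of Leiden rendering rules. It turns an abbreviated copy of lines 3 and 5 of the letter's markup into both a reading text and a diplomatic(ish) one, similar to the two renderings shown earlier in the deck. The function names, the use of ElementTree instead of XSLT, and the specific rules are illustrative assumptions, not the EpiDoc stylesheets. Those rules are: one dot per lost or illegible character, `[ -ca.?- ]` for losses of unknown extent, and word spaces dropped in the diplomatic view.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
UNDERDOT = "\u0323"  # COMBINING DOT BELOW, marks an unclear letter
MIDDOT = "\u00b7"    # MIDDLE DOT, the interpunct encoded as <g type="middot"/>

# Lines 3 (abbreviated) and 5 of the letter, with the TEI namespace added so it parses standalone.
SAMPLE = (
    '<ab xmlns="http://www.tei-c.org/ns/1.0">'
    '<lb n="3"/>opto deos <g type="middot"/> ut mi<supplied reason="lost">hi v</supplied>'
    '<unclear>a</unclear>leas'
    '<lb n="5"/>te non vide<unclear>o</unclear> ro<unclear>go</unclear> <unclear>n</unclear>e '
    '<g type="middot"/> fa<unclear>ci</unclear>a<unclear>s</unclear> '
    '<gap reason="lost" extent="unknown" unit="character"/>'
    '</ab>'
)


def _text(s, diplomatic):
    """A plain text node; the diplomatic view uppercases it and drops word spaces."""
    if not s:
        return ""
    return s.replace(" ", "").upper() if diplomatic else s


def render(elem, diplomatic=False):
    """Render elem's content (not its tail) using simplified Leiden-style rules."""
    out = [_text(elem.text, diplomatic)]
    for child in elem:
        tag = child.tag.replace(TEI, "")
        inner = render(child, diplomatic)
        if tag == "lb":
            out.append("\n")
        elif tag == "supplied":
            # Restored text goes in brackets; the diplomatic view shows only dots.
            out.append("[" + ("." * len(inner) if diplomatic else inner) + "]")
        elif tag == "unclear":
            out.append("".join(ch + UNDERDOT for ch in inner))
        elif tag == "gap":
            if child.get("extent") == "unknown":
                out.append("[ -ca.?- ]")
            else:
                dots = "." * int(child.get("quantity", "1"))
                out.append(f"[{dots}]" if child.get("reason") == "lost" else dots)
        elif tag == "g" and child.get("type") == "middot":
            # In the reading the spaces come from the text; the diplomatic view adds them.
            out.append(f" {MIDDOT} " if diplomatic else MIDDOT)
        else:
            out.append(inner)
        out.append(_text(child.tail, diplomatic))
    return "".join(out)


ab = ET.fromstring(SAMPLE)
reading = render(ab).strip("\n").split("\n")
diplomatic = render(ab, diplomatic=True).strip("\n").split("\n")
for line in reading + diplomatic:
    print(line)
```

Real EpiDoc rendering has to handle many more cases, such as merging adjacent brackets (as in line 1 of the letter), abbreviations and corrections. That is why the sample leaves line 1 out.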

Page 17: A Spot of TEI

How to get involved

✤ Mailing list: [email protected]

✤ TEI Sourceforge:

✤ Report a bug:

✤ Make a feature request:

✤ IRC: #tei-c on
