By Lou Burnard. All rights reserved
Beyond the Document
Lou Burnard
The message
The metaphor of the digital book is so pervasive that we can barely see it.
But going digital is not only about producing cheaper and more accessible simulations of printed or painted pages.
Digital applications should enable us to do more with a text than simply read it from beginning to end, or attach annotations to it for others to read, or link it to other digital texts.
We are at last moving beyond the document, towards a distributed world, in which “the books in the library can talk to each other”.
Plan
What's that noise in the digital library?
From Literary and Linguistic Computing to Humanities Computing to Digital Humanities
A classical case study
What should we be proud of?
Three simple truths
1. There is no going back: the knowledge infrastructure is now irrevocably digital
2. The business models of the knowledge infrastructure have changed irrevocably
3. The quantitative changes facilitated by digital technologies approximate qualitative change
Irrevocable digitality
The objects of Humanities scholarship are now digital, even if its methods are not
And our methods are changing all around us...
We are moving from hypertext to hyperdata
From a web of documents to a web of data
The technology is here (more or less)
The problems are mostly socio-politico-cultural
But first, a little history lesson
Literary & Linguistic Computing
1960-1980
The Heroic age...
Father Busa and the Index Thomisticus
The Brown Corpus
Thesaurus Linguae Graecae
concordances, stylistic analysis, authorship studies, language corpora
technical barriers, impenetrable for all but the determined (or mad)
LLC is also a journal, and an annual conference
http://llc.oxfordjournals.org/
LLC is alive and well and living in France
Text as a statistical phenomenon
Factor analysis and data mining
Textometry (textométrie)
Humanities Computing
1980-1994
Institutionalization
Is Humanities Computing an Academic Discipline?
The “text encoding” project
In the home, the eighties was a decade of technology that nearly worked
In academia, digital methods and resources, though perceived as alien and difficult, were also finding their place
In the UK: Computers in Teaching Initiative, Arts and Humanities Data Service
Something new, or something old done better?
The rise of the HC centre
Communities
E-mail and e-mail lists: Humanist
Electronic Text paradigms: Oxford Text Archive, Project Gutenberg, NLP (TALN)
Public funding becomes important: Computers in Teaching Initiative (CTI)
And private enterprise is curious: Electronic Publishing SIG
Once we have made our digital surrogates, what then?
Traditions (“scholarly primitives”)
finding by means of external characteristics
analysing by means of internal features
associating by means of shared perceptions
What tools and methods will help combine these approaches?
What theory will inform their application?
The challenge for HC
Resources
digital resources
encoding
analysis
abstract model
scholarship depends on continuity
it is not enough to preserve the bytes of an encoding
there must also be a continuity of comprehension: the encoding must be self-descriptive
Transmitting our interpretations
TEI: the main achievement of HC?
Originally a response to the multiplicity of formats and lack of standards
The TEI emerged as a single, encyclopaedic model of the “significant particularities” of textual resources
And also an adaptable architecture able to respond to changing needs and priorities
Digital Humanities
1995 - ?
While we were talking about the theory...
digital libraries
mass digitization
commodity computing, folksonomies, cloud computing...
Convergence and collaboration
rethinking scholarly editing
redefining the discipline
New infrastructures?
The rise of the digital library
“Public good” digitization efforts
From Gallica to the JISC Digitisation Programme
The metadata challenge
Authority and link-rot: Resource Discovery Network to Intute
From Dublin Core to OAI-PMH
Can systems be self-organizing?
What is the right business model?
An alternative model
What works for software could work equally well for digital resources
When programmers can read, redistribute, and modify the source code for a piece of software, the software evolves. People improve it, people adapt it, people fix bugs.
When developers can access, redistribute, and enhance the digital resources underlying a digital application, new applications can evolve. People can add value, people can adapt it, people can fix bugs.
Open up the data warehouse!
Digital Humanities is not a unified field but an array of convergent practices that explore a universe in which: a) print is no longer the exclusive or the normative medium in which knowledge is produced and/or disseminated; instead, print finds itself absorbed into new, multimedia configurations; and b) digital tools, techniques, and media have altered the production and dissemination of knowledge in the arts, human and social sciences.
http://dev.cdh.ucla.edu/digitalhumanities/2009/05/29/the-digital-humanities-manifesto-20/#0
Digital humanities manifesto 2.0
Digital Humanities implies the multi-purposing and multiple channeling of humanistic knowledge: no channel excludes the other. Its economy is abundance based, not one based upon scarcity.... though notions of humanistic research are everywhere under institutional pressure, there is (potentially) plenty for all. And, indeed, there is plenty to do.
ibid...
The importance of not reading
“What can you do with a million books?” (Greg Crane)
“Although there is still a need for close-reading... we never don't not read” (John Unsworth)
A new synergy of methods: corpus linguistics, pattern recognition, data mining
http://www3.isrl.illinois.edu/~unsworth/hownot2read.html
How to not read
We need to find ways of cross-searching, decomposing, and re-composing
rich XML documents
complex relational database structures
simple presentation-focussed websites
sound, image, video...
The challenge is to do this in an open and standards-compliant manner
And on a massive scale
Escaping from the text
From footnote to hypertext
A classical case study
CLAROS, for example
(Current) Partners
University of Oxford, Faculty of Classics
Beazley Archive: documentation of pottery, jewels, etc.
Lexicon of Greek Personal Names: attested names
University of Cologne, Arachne Archive: data about sculpture
German Archaeological Institute, Berlin: images from archaeological sites
University of Paris X: Lexicon Iconographicum Mythologiae Classicae
Over 2 million records and images
Four different database systems
[Diagram: four partner systems, each delivered to a browser: Beazley Archive, DAI Arachne, LGPN (Oxford), LIMC (Paris). Application technologies shown: .NET / ASP, XSLT, PHP, Java. Data stores shown: one relational database on MS SQL Server, one XML database, two relational databases on MySQL.]
A mix of technologies...
...but a common conceptual model
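The idea of a common conceptual model over mixed technologies can be sketched in a few lines. This is only an illustration under stated assumptions: the `CommonRecord` shape, the adapter functions, the field names and the sample rows are all invented here, and they are not the actual CLAROS model or data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CommonRecord:
    """Hypothetical shared record shape (not the real CLAROS model)."""
    source: str
    identifier: str
    label: str


# Invented sample rows, each kept in its own local shape.
BEAZLEY_ROWS = [{"VaseNumber": 1, "Subject": "Herakles and the lion"}]
LGPN_ROWS = [{"id": "n1", "name": "Herakleides"}]


def from_beazley(row):
    """Adapter: map a hypothetical Beazley-style row to the common shape."""
    return CommonRecord("Beazley Archive", str(row["VaseNumber"]), row["Subject"])


def from_lgpn(row):
    """Adapter: map a hypothetical LGPN-style row to the common shape."""
    return CommonRecord("LGPN", row["id"], row["name"])


def cross_search(term, records):
    """Case-insensitive substring search over the common label field."""
    term = term.lower()
    return [r for r in records if term in r.label.lower()]


RECORDS = [from_beazley(r) for r in BEAZLEY_ROWS] + [from_lgpn(r) for r in LGPN_ROWS]

for hit in cross_search("herakle", RECORDS):
    print(hit.source, hit.identifier, hit.label)
```

Each partner keeps its own database and software; only the small adapter per source needs to know the local field names, which is why one query can run across all of them.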
How does it actually work?
What makes this possible?
It's not rocket science!
XML markup with a shared semantics (TEI)
Appropriate use of new technologies (e.g. Unicode, JavaScript)
A willingness to open up our data
Rethinking the digital edition
The insights of critical editing/edition philology need to be re-discovered and re-applied in the new context
We need a new synergy of semiotics and hermeneutics
Combined with the traditional virtues of skepticism and empiricism
Components of the digital edition
Manuscript page images
Annotated transcriptions
Critical (synthetic) edition
Modern translation and summary
Notes, glossary, foreword, bibliography, etc.
Manuscript descriptions and metadata
“Factoids” about the real world
The textual trinity
Textual descriptions tend to focus on one of:
its linguistic nature (because texts are made of words used in particular ways)
its physical state (because texts are made up of glyphs arranged in particular ways)
its intentions (because texts are supposed to tell us something about the world)
Likewise, software tends to distinguish
document management and production systems
image management and production systems
database systems
Convergence
But the digital agenda requires us to mash these things up: for example to combine
a GIS database about places in the Aegean sea
a historical gazetteer of placenames in the same area
a corpus of texts mentioning those placenames
TEI has recently expanded its scope to support this kind of convergence
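A minimal sketch of such a mash-up follows, assuming a tiny invented gazetteer, approximate illustrative coordinates, and a TEI-style fragment in which each `placeName` carries a `ref` pointing at a shared place identifier. All identifiers, values and the sample text are hypothetical, and the TEI namespace is omitted for brevity.

```python
import xml.etree.ElementTree as ET

# Hypothetical gazetteer: place identifier -> historical name forms.
GAZETTEER = {
    "delos": ["Delos", "Dilos"],
    "naxos": ["Naxos"],
}

# Hypothetical GIS layer: place identifier -> approximate (latitude, longitude).
COORDINATES = {
    "delos": (37.39, 25.27),
    "naxos": (37.10, 25.38),
}

# Invented TEI-style fragment (no namespace, for brevity).
CORPUS = """
<text>
  <p n="1">They sailed from <placeName ref="#naxos">Naxos</placeName>
     to <placeName ref="#delos">Delos</placeName>.</p>
  <p n="2">The festival at <placeName ref="#delos">Dilos</placeName> began.</p>
</text>
"""


def places_mentioned(xml_text):
    """Join corpus mentions with gazetteer names and coordinates by place id."""
    root = ET.fromstring(xml_text)
    result = {}
    for para in root.iter("p"):
        for mention in para.iter("placeName"):
            place_id = mention.get("ref", "").lstrip("#")
            entry = result.setdefault(place_id, {
                "names": GAZETTEER.get(place_id, []),
                "coordinates": COORDINATES.get(place_id),
                "paragraphs": [],
            })
            entry["paragraphs"].append(para.get("n"))
    return result


for place_id, entry in sorted(places_mentioned(CORPUS).items()):
    print(place_id, entry["coordinates"], entry["paragraphs"])
```

The join works only because all three sources agree on one identifier per place; the variant spellings in the text are resolved by the `ref` attribute rather than by string matching.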
Conclusions
A key role for the Humanities
We know about textual objects
how is this discourse represented? what stories does it tell?
We know about hermeneutics
what does this discourse mean? what does it say aside from its denotational content?
This is our contribution to the semantic web