Beyond the Document
Lou Burnard

From Literary and Linguistic Computing to Digital Humanities: looking back on 40 years of relations between the humanities and computer science

By Lou Burnard. All rights reserved.


The message

The metaphor of the digital book is so pervasive that we can barely see it.

But going digital is not only about producing cheaper and more accessible simulations of printed or painted pages.

Digital applications should enable us to do more with a text than simply read it from beginning to end, annotate it for others to read, or link it to other digital texts.

We are at last moving beyond the document, towards a distributed world in which “the books in the library can talk to each other”.

Plan

What's that noise in the digital library?
From Literary and Linguistic Computing to Humanities Computing to Digital Humanities
A classical case study
What should we be proud of?

Three simple truths

1. There is no going back: the knowledge infrastructure is now irrevocably digital

2. The business models of the knowledge infrastructure have changed irrevocably

3. The quantitative changes facilitated by digital technologies approximate qualitative change

Irrevocable digitality

The objects of Humanities scholarship are now digital, even if its methods are not

And our methods are changing all around us...

We are moving from hypertext to hyperdata, from a web of documents to a web of data

The technology is here (more or less)

The problems are mostly socio-politico-cultural

But first, a little history lesson

Literary & Linguistic Computing

1960-1980

The heroic age...

Father Busa and the Index Thomisticus
The Brown Corpus
Thesaurus Linguae Graecae

concordances, stylistic analysis, authorship studies, language corpora

technical barriers, impenetrable for all but the determined (or mad)
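Concordances were a characteristic product of this period. To make the idea concrete, here is a minimal keyword-in-context (KWIC) concordance in Python. It is an illustrative sketch only: the function name `kwic`, the naive regular-expression tokenizer and the three-word context window are our own assumptions, not a reconstruction of any historical system.

```python
import re

def kwic(text, keyword, width=3):
    """Return keyword-in-context lines: up to `width` words either side of each hit."""
    # Naive tokenizer (an assumption): runs of word characters and apostrophes.
    words = re.findall(r"[\w']+", text)
    lines = []
    for i, word in enumerate(words):
        if word.lower() == keyword.lower():
            left = " ".join(words[max(0, i - width):i])
            right = " ".join(words[i + 1:i + 1 + width])
            # Right-align the left context so the keywords line up in a column.
            lines.append(f"{left:>30} [{word}] {right}")
    return lines

sample = ("In the beginning was the Word, and the Word was with God, "
          "and the Word was God.")
for line in kwic(sample, "word"):
    print(line)
```

Real concordancers also deal with lemmatization, sorting by left or right context, and references back to the source text; none of that is attempted here.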

LLC is also a journal, and an annual conference

http://llc.oxfordjournals.org/

LLC is alive and well and living in France

Text as a statistical phenomenon
Factor analysis and data mining
Textometry (textométrie)

Humanities Computing

1980-1994

Institutionalization
Is Humanities Computing an academic discipline?
The “text encoding” project

Institutionalization

http://www.allc.org/imhc

In the home, the eighties was a decade of technology that nearly worked

In academia, digital methods and resources, though perceived as alien and difficult, were also finding their place

In the UK: the Computers in Teaching Initiative, the Arts and Humanities Data Service

Something new, or something old done better?

The rise of the HC centre

Communities
E-mail and e-mail lists: Humanist
Electronic text paradigms: Oxford Text Archive, Project Gutenberg
NLP (TALN)

Public funding becomes important: Computers in Teaching Initiative (CTI)
And private enterprise is curious: Electronic Publishing SIG

Once we have made our digital surrogates, what then?

Traditions (“scholarly primitives”):
finding by means of external characteristics
analysing by means of internal features
associating by means of shared perceptions

What tools and methods will help combine these approaches?

What theory will inform their application?

The challenge for HC

Resources

[Diagram: digital resources, encoding, analysis, abstract model]

Scholarship depends on continuity: it is not enough to preserve the bytes of an encoding; there must also be a continuity of comprehension. The encoding must be self-descriptive.

Transmitting our interpretations
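One way to read “self-descriptive” in XML terms: a TEI document carries its own header (`teiHeader`) describing both the text and how it was encoded, so a later reader, human or program, can recover that context from the file alone. The sketch below uses Python's standard `xml.etree.ElementTree`; the document content and the helper name `describe` are invented for illustration.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# A tiny, invented TEI document: the header travels with the text it describes.
doc = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>A sample letter</title></titleStmt>
      <publicationStmt><p>Unpublished example.</p></publicationStmt>
      <sourceDesc><p>Transcribed from a hypothetical manuscript.</p></sourceDesc>
    </fileDesc>
    <encodingDesc>
      <projectDesc><p>Personal names are tagged with persName.</p></projectDesc>
    </encodingDesc>
  </teiHeader>
  <text><body><p>Dear <persName>Anna</persName>, thank you for your letter.</p></body></text>
</TEI>"""

def describe(xml_text):
    """Read the title and the encoding notes back out of a TEI header."""
    root = ET.fromstring(xml_text)
    title = root.findtext(".//tei:titleStmt/tei:title", namespaces=TEI_NS)
    # Only paragraphs inside encodingDesc, not those in publicationStmt or sourceDesc.
    notes = [p.text for p in root.findall(".//tei:encodingDesc//tei:p", namespaces=TEI_NS)]
    return {"title": title, "encoding": notes}

print(describe(doc))
```

The point of the design is that no external documentation is needed to learn that `persName` marks personal names: the file says so itself.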

TEI: the main achievement of HC?

Originally a response to the multiplicity of formats and lack of standards

The TEI emerged as a single, encyclopaedic model of the “significant particularities” of textual resources

And also an adaptable architecture able to respond to changing needs and priorities

Digital Humanities

1995 - ?

While we were talking about the theory...
digital libraries, mass digitization, commodity computing, folksonomies, cloud computing...

Convergence and collaboration

Rethinking scholarly editing
Redefining the discipline

New infrastructures?

The rise of the digital library

“Public good” digitization efforts: from Gallica to the JISC Digitisation Programme

The metadata challenge
Authority and link-rot: from the Resource Discovery Network to Intute
From Dublin Core to OAI-PMH
Can systems be self-organizing?

What is the right business model?
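To make the Dublin Core and OAI-PMH pairing concrete, the sketch below builds a harvesting request and reads one Dublin Core record. `verb=ListRecords` and `metadataPrefix=oai_dc` are standard OAI-PMH request parameters, and the two namespace URIs are the standard `oai_dc` and Dublin Core element namespaces; the repository URL and the record itself are invented, no network request is made, and paging with resumption tokens is left out.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

DC = "{http://purl.org/dc/elements/1.1/}"

def list_records_url(base_url):
    """Build an OAI-PMH ListRecords request asking for unqualified Dublin Core."""
    return base_url + "?" + urlencode({"verb": "ListRecords", "metadataPrefix": "oai_dc"})

# An invented oai_dc record, of the kind a repository returns inside <metadata>.
record = """<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                      xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Letters of a hypothetical correspondent</dc:title>
  <dc:creator>Anonymous</dc:creator>
  <dc:type>Text</dc:type>
  <dc:subject>Correspondence</dc:subject>
  <dc:subject>Manuscripts</dc:subject>
</oai_dc:dc>"""

def dc_fields(xml_text):
    """Collect Dublin Core elements into a dict of lists (DC elements may repeat)."""
    root = ET.fromstring(xml_text)
    fields = {}
    for el in root:
        if el.tag.startswith(DC):
            name = el.tag[len(DC):]  # strip the namespace URI, keep the local name
            fields.setdefault(name, []).append(el.text)
    return fields

print(list_records_url("https://repository.example.org/oai"))
print(dc_fields(record))
```

Because every compliant repository can expose the same fifteen-element vocabulary, a harvester only needs one parser like `dc_fields`, whatever richer metadata each site keeps internally.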

An alternative model

What works for software could work equally well for digital resources

When programmers can read, redistribute, and modify the source code for a piece of software, the software evolves. People improve it, people adapt it, people fix bugs.

When developers can access, redistribute, and enhance the digital resources underlying a digital application, new applications can evolve. People can add value, people can adapt it, people can fix bugs.

Open up the data warehouse!

Digital Humanities is not a unified field but an array of convergent practices that explore a universe in which: a) print is no longer the exclusive or the normative medium in which knowledge is produced and/or disseminated; instead, print finds itself absorbed into new, multimedia configurations; and b) digital tools, techniques, and media have altered the production and dissemination of knowledge in the arts, human and social sciences.

http://dev.cdh.ucla.edu/digitalhumanities/2009/05/29/the-digital-humanities-manifesto-20/#0

Digital humanities manifesto 2.0

Digital Humanities implies the multi-purposing and multiple channeling of humanistic knowledge: no channel excludes the other. Its economy is abundance based, not one based upon scarcity.... though notions of humanistic research are everywhere under institutional pressure, there is (potentially) plenty for all. And, indeed, there is plenty to do.

Ibid.

The importance of not reading

“What can you do with a million books?” (Greg Crane)

“Although there is still a need for close-reading... we never don't not read” (John Unsworth)

A new synergy of methods: corpus linguistics, pattern recognition, data mining

http://www3.isrl.illinois.edu/~unsworth/hownot2read.html

How to not read

We need to find ways of cross-searching, decomposing, and re-composing:
rich XML documents
complex relational database structures
simple presentation-focussed websites
sound, image, video...

The challenge is to do this in an open and standards-compliant manner

And on a massive scale

Escaping from the text

From footnote to hypertext

A classical case study

CLAROS, for example

(Current) partners:
University of Oxford, Faculty of Classics: Beazley Archive (documentation of pottery, jewels, etc.); Lexicon of Greek Personal Names (attested names)
University of Cologne: Arachne Archive (data about sculpture)
German Archaeological Institute, Berlin: images from archaeological sites
University of Paris X: Lexicon Iconographicum Mythologiae Classicae

Over 2 million records and images
Four different database systems

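[Diagram: four partner systems (Beazley Archive, DAI Arachne, LGPN (Oxford), LIMC (Paris)), each accessed through a web browser. Application layers shown: .NET / ASP; XSLT, PHP; Java; XSLT. Back ends shown: relational database (MS SQL Server); XML database; relational database (MySQL); relational database (MySQL).]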

A mix of technologies...

...but a common conceptual model
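As a rough illustration of what a common conceptual model buys, the sketch below maps two invented sources, one relational (an in-memory SQLite table standing in for MySQL or SQL Server) and one XML export, onto a single hypothetical record type, then runs one search across both. Every table, element, and field name here is made up; this is not how CLAROS itself was implemented.

```python
import sqlite3
import xml.etree.ElementTree as ET
from dataclasses import dataclass

@dataclass(frozen=True)
class SharedRecord:
    """Hypothetical common model: each partner's data is mapped onto these fields."""
    source: str
    identifier: str
    object_type: str
    place: str

def records_from_sql(conn):
    """Source A (invented): a relational table with its own column names."""
    rows = conn.execute("SELECT vase_id, shape, findspot FROM vases ORDER BY vase_id")
    return [SharedRecord("A", str(vid), shape, findspot) for vid, shape, findspot in rows]

def records_from_xml(xml_text):
    """Source B (invented): an XML export using a different vocabulary."""
    root = ET.fromstring(xml_text)
    return [
        SharedRecord("B", obj.get("id"), obj.findtext("kind"), obj.findtext("site"))
        for obj in root.findall("object")
    ]

def search_by_place(records, place):
    """One query against the shared model reaches every source."""
    return [r for r in records if r.place == place]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vases (vase_id INTEGER, shape TEXT, findspot TEXT)")
conn.executemany("INSERT INTO vases VALUES (?, ?, ?)",
                 [(1, "amphora", "Athens"), (2, "kylix", "Vulci")])

xml_export = """<objects>
  <object id="s-17"><kind>statue</kind><site>Athens</site></object>
  <object id="s-18"><kind>relief</kind><site>Rome</site></object>
</objects>"""

everything = records_from_sql(conn) + records_from_xml(xml_export)
for rec in search_by_place(everything, "Athens"):
    print(rec)
```

Each partner keeps its own technology; only the small mapping functions need to know about local schemas, which is the trade-off the slide's “mix of technologies, common model” points to.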

How does it actually work?

What makes this possible?

It's not rocket science!
XML markup with a shared semantics (TEI)
Appropriate use of new technologies (e.g. Unicode, JavaScript)
A willingness to open up our data

Rethinking the digital edition

The insights of critical editing (edition philology) need to be rediscovered and reapplied in the new context

We need a new synergy of semiotics and hermeneutics

Combined with the traditional virtues of skepticism and empiricism

Components of the digital edition

Manuscript page images
Annotated transcriptions
Critical (synthetic) edition
Modern translation and summary
Notes, glossary, foreword, bibliography, etc.
Manuscript descriptions and metadata
“Factoids” about the real world

The textual trinity

Textual descriptions tend to focus on one of:

its linguistic nature (because texts are made of words used in particular ways)

its physical state (because texts are made up of glyphs arranged in particular ways)

its intentions (because texts are supposed to tell us something about the world)

Likewise, software tends to distinguish:
document management and production systems
image management and production systems
database systems

Convergence

But the digital agenda requires us to mash these things up: for example, to combine
a GIS database about places in the Aegean Sea
a historical gazetteer of placenames in the same area
a corpus of texts mentioning those placenames

TEI has recently expanded its scope to support this kind of convergence
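A minimal sketch of that kind of mash-up: place mentions in a TEI fragment carry a `ref` attribute pointing into a gazetteer, and a short join links each mention, whatever its wording, to coordinates. `placeName` and its `ref` attribute are genuine TEI; the text, the gazetteer identifiers, and the helper `geocode_mentions` are invented, and the coordinates are approximate, rounded to one decimal place for illustration only.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"

# Hypothetical gazetteer: identifier -> (preferred name, approx. latitude, approx. longitude).
GAZETTEER = {
    "delos": ("Delos", 37.4, 25.3),
    "naxos": ("Naxos", 37.1, 25.4),
}

# An invented TEI fragment whose placeName elements point into the gazetteer via @ref.
text = """<body xmlns="http://www.tei-c.org/ns/1.0">
  <p>The ship sailed from <placeName ref="#delos">Delos</placeName>
     to <placeName ref="#naxos">Naxos</placeName>, then back to
     <placeName ref="#delos">the sacred island</placeName>.</p>
</body>"""

def geocode_mentions(tei_xml, gazetteer):
    """Link each place mention in the text to its gazetteer entry and coordinates."""
    root = ET.fromstring(tei_xml)
    results = []
    for pn in root.iter(TEI + "placeName"):
        key = pn.get("ref", "").lstrip("#")  # "#delos" -> "delos"
        if key in gazetteer:
            name, lat, lon = gazetteer[key]
            results.append((pn.text, name, lat, lon))
    return results

for mention in geocode_mentions(text, GAZETTEER):
    print(mention)
```

Note that “the sacred island” resolves to Delos through its `ref`, not through string matching; that separation of what the text says from what it refers to is what lets the text corpus, the gazetteer, and a GIS layer be combined.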

Conclusions

A key role for the Humanities

We know about textual objects:
how is this discourse represented? what stories does it tell?

We know about hermeneutics:
what does this discourse mean? what does it say aside from its denotational content?

This is our contribution to the semantic web