Beyond the Page - users.ox.ac.ukusers.ox.ac.uk/~lou/Talks/beyondthepage.pdf · The message Today's digital library applications still focus on serving up virtual pages for the reader:

Beyond the Page

Lou Burnard

Oxford University Computing Services

The message

● Today's digital library applications still focus on serving up virtual pages for the reader: the metaphor of the book is so pervasive that we can barely see it.

●But going digital is not only about producing cheaper and more accessible simulations of printed or painted pages.

●Digital applications should enable us to do more with a text than simply read it from beginning to end, or attach annotations to it for others to read.

The Knowledge Economy and the Information Society

"If infrastructure is required for an industrial economy, then we could say that cyberinfrastructure is required for a knowledge economy." (Report of the National Science Foundation Blue Ribbon Panel on Cyberinfrastructure)

What is the digital content chain?

For publishers, the most dangerous aspect of digital content distribution is not piracy, but rather the lack of viable alternatives to piracy

By 2005, mankind will play with more than 100 billion gigabytes of content in the form of images, text, audio, video, graphics or a combination of all these. The task of managing this content and making sure it reaches the right customers at the right time will be a great challenge... If content assets are treated as SKUs (Stock Keeping Units) in supply chain parlance, and technology applications like Digital Assets Management are applied from an Inventory Optimization perspective, Media content managers will be able to handle the digital onslaught better.

Unifying the digital content commerce value chain The market for digital content services is determined by a relatively simple value chain. It shares the same simple characteristics as any physical product value chain in that product or service flows one way - adding value as it goes - and revenue flows the other.

Three simple truths

1. There is no going back: the knowledge infrastructure is now irrevocably digital

2. The business models of the knowledge infrastructure have changed irrevocably

3. The quantitative changes facilitated by digital technologies approximate qualitative change

1. Irrevocable digitality

● The sciences don't see this as an issue...
● The humanities claim to be different
– hence the concept of “Humanities Computing”
● But the objects of Humanities scholarship are now digital, even if its methods are not

2. Business models for the knowledge infrastructure

● The war of the journal article
● The book and the ebook
● Other forms of communication

As a key resource of the 21st century, information goods might displace industrial goods as key drivers of markets. The foundation of the economic prosperity of developed countries is not only based on the efficient conversion of information to knowledge, but also in imparting this knowledge in the educational system. In this context, scientific libraries play a decisive role as a provider of scientific and technical information (STI). After introducing the 2-3-6-concept, an analysis concept based on a special value chain, the paper examines the roles of the different players - author, scientific library, publisher, bookstore and scientific association - involved in the production of STI. A structural model for the value chain of the STI market is developed to analyse in detail the opportunities for scientific libraries offered by technological progress within the current economic, legal and regulatory framework. The analysis reveals that none of the players can be expected to stay within their historical core competencies. Due to technical developments and associated changes in the structure of transaction costs, each player can cover more fields of value-adding activities. The roles of the different players are merging more and more. Further, analysis of current direct and indirect monetary flows reveals considerable potential for conflict.

The academic article

● a very special kind of document
● a long history of exploitation, now ending
● e.g. HC STC Report
– institutional repositories vs self-publishing
● From Sept 05, all RCUK-funded output should be offered to an Institutional Repository

Books and ebooks

● Goodbye to the monograph
● Hello to the best-seller
● The market for e-books remains unclear and its potential remains untapped

What's new about the “e-book”?

● Continuing trends
– more, and more varied, readers
– more, and more varied, resources
– broader cultural sensitivities
– decanonization

● Convergence of media

Other forms of communication

● In private life
– chatrooms and SMS
– the blog
– the iPod
● In public life
– digital cultural initiatives: public good
– influence of the web on other media e.g. radio

Changing roles for publishers

● From producer to aggregator
● The RSS concept

3. Qualitative change: the next challenge

● How do we identify and enrich the content of our resources?

● How do we communicate the results?

What do these documents have in common?

Content enrichment: some ways

● Top down
– semantic web, topic maps...
● add keywords and relationships derived from pre-existing ontologies (aka conceptual reference models)
– the basic business of humanities scholarship
● Bottom up
– automatic keyphrase identification on linguistic evidence

The humanities tradition

● A focus on textual objects
– how is this discourse represented?
● A focus on hermeneutics
– what does this discourse mean?
– what does it say aside from its denotational content?
● Uncertainty, doubt, skepticism
● How useful are those skills?

towards the uncritical edition

● The insights of critical editing/edition philology need to be re-discovered and re-applied in a new context

● A fruitful synergy
– semiotics
– textuality
– hermeneutics

qualitative differences in our interactions with digital resources

● decentred, non-linear, fragmented, and associative modes of cognition are favoured
● differences of scale
● statistical apprehension of decontextualized language use
● plasticity of format and presentation

cultural objects (are those which) require an explication

● Resources are invested with meaning by our use of them

● Explication confers value
● “We need to interpret interpretations more than to interpret things” (Derrida, citing Montaigne)

[Diagram with the labels: resources, digital resources, encoding, analysis, abstract model]

digitization reifies an explication

● To encode a resource, it must first be decoded
● Decoding implies selection of features
● And their re-encoding in unambiguous terms

whose explication?

● the observer effect
– a novelty in the sciences
– central to the humanities
● hermeneutics 'r us
– computers are for symbolic manipulation, not just for calculation

transmitting the hermeneutic

● Scholarship depends on continuity
● It is not enough to preserve an encoding
● There must also be a continuity of comprehension

digital resources can only be preserved by migration

● This separation of medium and message implies
– selection
– potential information loss or transformation at each step
● Hence the need for media-independent encodings

Frequently Answered Questions

● resource description or characterization
● re-use of common text for multiple purposes
– scholarly edition, school edition, speaking edition
● alignment of differing “versions”
– e.g. transcription, sound, image
– resource descriptors from different domains
● multiple annotations of a common text
– may be additive or alternative
● authoring!

What do content providers need?

● We have
– a good notation for textual structure and semantics (XML)
– a pretty complete character encoding (Unicode)
– well-defined processing systems for doing cool stuff with XML fragments
● What more do we need?

Interoperability

● We also need
– to interchange and integrate metadata, texts, and tools
● between persons and machines
● between machines and machines
● across time and space
– to express formal constraints on our markup
– (probably) to document the semantics of our markup
● This is the domain of the schema or DTD

What did using a schema ever do for us?


<list> <label> <item>

list ((label, item)+ | item+)

figure.attributes.url = xsd:anyURI

if list is of type GLOSS, content must include labels

persons referenced by key must exist in the persons database

don't use the table element to represent glossaries!
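Several of the constraints on this slide go beyond naming elements: a content model, a co-occurrence rule, and a referential check against an external database. As a minimal sketch only, the following assumes a hypothetical document in which glossary lists carry `type="gloss"`, person references are `name` elements with a `key` attribute, and the persons database is a plain Python set. None of these names come from the slide, and a real schema language would express the rules declaratively rather than in procedural code.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; element and attribute names are assumptions for illustration.
SAMPLE = """
<text>
  <list type="gloss">
    <label>DTD</label><item>Document Type Definition</item>
  </list>
  <list type="gloss">
    <item>an item with no label</item>
  </list>
  <p><name key="LB1">Lou Burnard</name> and <name key="XX9">A. N. Other</name></p>
</text>
"""

# Stand-in for the persons database (assumed here to be a set of known keys).
PERSONS = {"LB1"}


def check(root, persons):
    """Return a list of human-readable constraint violations."""
    problems = []
    for lst in root.iter("list"):
        children = [child.tag for child in lst]
        # Content model from the slide: ((label, item)+ | item+)
        paired = (
            len(children) >= 2
            and len(children) % 2 == 0
            and all(tag == ("label", "item")[i % 2] for i, tag in enumerate(children))
        )
        items_only = bool(children) and all(tag == "item" for tag in children)
        if not (paired or items_only):
            problems.append("list content does not match ((label, item)+ | item+)")
        # Co-occurrence rule: a gloss list must include labels.
        if lst.get("type") == "gloss" and "label" not in children:
            problems.append("gloss list has no labels")
    # Referential rule: every key must exist in the persons database.
    for name in root.iter("name"):
        key = name.get("key")
        if key is not None and key not in persons:
            problems.append(f"unknown person key: {key}")
    return problems


problems = check(ET.fromstring(SAMPLE), PERSONS)
for problem in problems:
    print(problem)
```

The first list passes both checks; the second satisfies the content model but breaks the gloss rule, and the second `name` fails the database lookup.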

The scope of “intelligent” markup

– orthographic transcription
– links to digital recordings, images…
– proper nouns, dates, times, etc.
– linguistic analyses (morphological, syntactic, discoursal...)
– named entity recognition
– cross references to other material on the topic
– meta-textual status (correction etc)
– editorial commentary and annotation
– traditional bibliographic description
– etc., etc., etc.

How can all these things co-exist?

Towards a new babel?

● If we have
● historical records using “Historical Markup Language”
● linguistic data using “Linguistic Markup Language”
● illustrations using a “Visual Markup Language”
● metadata using (a) “Metadata Markup Language”

● how will we integrate resources or ask interesting questions?

One answer: the TEI

● TEI P5 takes a modular approach
● It provides an integrated XML framework for
– definition of simple or complex text markup schemes
– documentation of their use
– generation of formal schemata to validate them
– mapping of their concepts to other ontologies

● See http://www.tei-c.org and http://tei.sf.net

Semantic interoperability

● It is not hard to achieve interoperability between different markup schemes

● The real challenge is to relate their underlying semantics
– where does “meaning” come from?
– how does “translation” work?

● These are not new questions in the linguistic research community!

For example: terminology

● Termbanks work by defining
– relationships between concepts
– relationships between terms in different languages and those concepts
● Translators use termbanks to help them decide what texts should mean
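The two kinds of relationship a termbank records can be sketched as a small data structure. This is an illustrative sketch only: the class names, concept identifiers, and sample entries are hypothetical and are not taken from any real termbank.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """A language-independent concept with links to related concepts."""
    ident: str
    definition: str
    broader: list = field(default_factory=list)  # identifiers of broader concepts


@dataclass
class Termbank:
    concepts: dict = field(default_factory=dict)  # ident -> Concept
    terms: dict = field(default_factory=dict)     # (language, term) -> ident

    def add_concept(self, concept):
        self.concepts[concept.ident] = concept

    def add_term(self, language, term, ident):
        self.terms[(language, term.lower())] = ident

    def translate(self, term, source, target):
        """Return target-language terms naming the same concept as `term`."""
        ident = self.terms.get((source, term.lower()))
        if ident is None:
            return []
        return sorted(
            t for (lang, t), c in self.terms.items() if lang == target and c == ident
        )


# Invented sample entries.
bank = Termbank()
bank.add_concept(Concept("c1", "a formal grammar for a class of documents"))
bank.add_concept(Concept("c2", "a schema written in DTD syntax", broader=["c1"]))
bank.add_term("en", "schema", "c1")
bank.add_term("fr", "schéma", "c1")
bank.add_term("en", "document type definition", "c2")
bank.add_term("fr", "définition de type de document", "c2")

print(bank.translate("schema", "en", "fr"))
```

Translation here goes through the concept rather than directly from term to term, which is the point of the slide: the terms in each language are related to concepts, not to each other.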

Translators also use corpora

● How do you (quickly) find out about the technical language of a domain for which no termbank exists?

● Apply the “walks-like-a-duck” procedure to build your own corpus

● This is often a reliable way of identifying new terminology (as document classification research shows)
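Once such a corpus exists, one common way to surface candidate terms is to compare word frequencies in it against a general reference corpus. The sketch below is an assumption about how that comparison might look, not the procedure the slide refers to; the tiny corpora, the add-one smoothing, and the function name are invented for illustration.

```python
import re
from collections import Counter


def candidate_terms(domain_text, reference_text, top=3):
    """Rank words by how much more frequent they are in the domain text."""
    def tokenize(text):
        return re.findall(r"[a-z]+", text.lower())

    domain = Counter(tokenize(domain_text))
    reference = Counter(tokenize(reference_text))
    domain_total = sum(domain.values())
    reference_total = sum(reference.values())

    def ratio(word):
        # Add-one smoothing so words absent from the reference do not divide by zero.
        domain_rate = domain[word] / domain_total
        reference_rate = (reference[word] + 1) / (reference_total + 1)
        return domain_rate / reference_rate

    # Highest ratio first; ties are broken alphabetically for a stable result.
    return sorted(domain, key=lambda w: (-ratio(w), w))[:top]


# Invented toy corpora; real ones would be far larger.
DOMAIN = "the schema validates the markup and the schema constrains the markup"
REFERENCE = "the cat sat on the mat and the dog sat on the log"

terms = candidate_terms(DOMAIN, REFERENCE)
print(terms)
```

Words such as "the" and "and" score low because they are just as common in the reference text, while words specific to the domain rise to the top.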

Mapping of markup languages

● We can map mark-up semantics using standard conceptual reference models (aka ontologies)
– ISO DIS 12620: Data Category Registry for linguistic resources
– CIDOC CRM (now also in ISO)

● But in markup, as elsewhere, praxis means more than syntax
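As a rough sketch of what such a mapping makes possible, the following assumes two hypothetical markup vocabularies whose element names are mapped to shared concept labels. The element names, the sample documents, and the mapping table are all invented; the concept labels borrow the CIDOC CRM class names E21 Person and E53 Place purely as examples.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from (vocabulary, element name) to a shared concept.
CONCEPT_MAP = {
    ("historical", "personage"): "E21 Person",
    ("historical", "locality"): "E53 Place",
    ("linguistic", "speaker"): "E21 Person",
    ("linguistic", "setting"): "E53 Place",
}

# Invented sample documents in the two vocabularies.
HISTORICAL = "<record><personage>Montaigne</personage><locality>Bordeaux</locality></record>"
LINGUISTIC = "<utterance><speaker>Montaigne</speaker><setting>Paris</setting></utterance>"


def find_by_concept(concept, documents):
    """Collect element text across vocabularies for one shared concept."""
    hits = []
    for vocabulary, xml_text in documents:
        for element in ET.fromstring(xml_text).iter():
            if CONCEPT_MAP.get((vocabulary, element.tag)) == concept:
                hits.append((vocabulary, element.text))
    return hits


documents = [("historical", HISTORICAL), ("linguistic", LINGUISTIC)]
people = find_by_concept("E21 Person", documents)
print(people)
```

The mapping lets one query span both vocabularies, but it cannot tell us whether the two communities apply their elements in the same way in practice.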

What web semantics?

● denotation: what (we think) it says
● connotation: what (we think) it appears to suggest
● annotation: what we want to say about it

The next challenge

Digital applications can restore the fugitive multilayered complexity of a textual tradition otherwise instantiated in a fragmented way by the individual physical copies of the traditional library

They can reconstruct the witnesses as evidence in an analysis of the changing semiotic systems underlying that complexity, for example in linguistic or stylistic terms

They can deliver components of the tradition for reintegration and synthesis into new forms

There is a world of difference between an "electronic library" and a "digital repository"
