Beyond the Page
Lou Burnard
Oxford University Computing Services
The message
● Today's digital library applications still focus on serving up virtual pages for the reader: the metaphor of the book is so pervasive that we can barely see it.
● But going digital is not only about producing cheaper and more accessible simulations of printed or painted pages.
● Digital applications should enable us to do more with a text than simply read it from beginning to end, or attach annotations to it for others to read.
The Knowledge Economy and the Information Society
"If infrastructure is required for an industrial economy, then we could say that cyberinfrastructure is required for a knowledge economy." (Report of the National Science Foundation Blue Ribbon Panel on Cyberinfrastructure)
What is the digital content chain?
For publishers, the most dangerous aspect of digital content distribution is not piracy, but rather the lack of viable alternatives to piracy
By 2005, mankind will play with more than 100 billion gigabytes of content in the form of images, text, audio, video, graphics or a combination of all these. The task of managing this content and making sure it reaches the right customers at the right time will be a great challenge... If content assets are treated as SKUs (Stock Keeping Units) in supply chain parlance, and technology applications like Digital Assets Management are applied from an Inventory Optimization perspective, Media content managers will be able to handle the digital onslaught better.
Unifying the digital content commerce value chain The market for digital content services is determined by a relatively simple value chain. It shares the same simple characteristics as any physical product value chain in that product or service flows one way - adding value as it goes - and revenue flows the other.
Three simple truths
1. There is no going back: the knowledge infrastructure is now irrevocably digital
2. The business models of the knowledge infrastructure have changed irrevocably
3. The quantitative changes facilitated by digital technologies approximate qualitative change
1. Irrevocable digitality
● The sciences don't see this as an issue...
● The humanities claim to be different
– hence the concept of “Humanities Computing”
● But the objects of Humanities scholarship are now digital, even if its methods are not
2. Business models for the knowledge infrastructure
● The war of the journal article
● The book and the ebook
● Other forms of communication
As a key resource of the 21st century, information goods might displace industrial goods as key drivers of markets. The foundation of the economic prosperity of developed countries is not only based on the efficient conversion of information to knowledge, but also in imparting this knowledge in the educational system. In this context, scientific libraries play a decisive role as a provider of scientific and technical information (STI). After introducing the 2-3-6-concept, an analysis concept based on a special value chain, the paper examines the roles of the different players - author, scientific library, publisher, bookstore and scientific association - involved in the production of STI. A structural model for the value chain of the STI market is developed to analyse in detail the opportunities for scientific libraries offered by technological progress within the current economic, legal and regulatory framework. The analysis reveals that none of the players can be expected to stay within their historical core competencies. Due to technical developments and associated changes in the structure of transaction costs, each player can cover more fields of value-adding activities. The roles of the different players are merging more and more. Further, analysis of current direct and indirect monetary flows reveals considerable potential for conflict.
The academic article
● a very special kind of document
● a long history of exploitation, now ending
● e.g. HC STC Report
– institutional repositories vs self-publishing
● From Sept 05, all RCUK-funded output should be offered to an Institutional Repository
Books and ebooks
● Goodbye to the monograph
● Hello to the best-seller
● The market for e-books remains unclear and its potential remains untapped
What's new about the “e-book”?
● Continuing trends
– more, and more varied, readers
– more, and more varied, resources
– broader cultural sensitivities
– decanonization
● Convergence of media
Other forms of communication
● In private life
– chatrooms and SMS
– the blog
– the iPod
● In public life
– digital cultural initiatives: public good
– influence of the web on other media e.g. radio
Changing roles for publishers
● From producer to aggregator
● The RSS concept
3. Qualitative change: the next challenge
● How do we identify and enrich the content of our resources?
● How do we communicate the results?
What do these documents have in common?
Content enrichment: some ways
● Top down
– semantic web, topic maps...
  ● add keywords and relationships derived from pre-existing ontologies (aka conceptual reference models)
– the basic business of humanities scholarship
● Bottom up
– automatic keyphrase identification on linguistic evidence
The humanities tradition
● A focus on textual objects
– how is this discourse represented?
● A focus on hermeneutics
– what does this discourse mean?
– what does it say aside from its denotational content?
● Uncertainty, doubt, skepticism
● How useful are those skills?
towards the uncritical edition
● The insights of critical editing/edition philology need to be re-discovered and re-applied in a new context
● A fruitful synergy
– semiotics
– textuality
– hermeneutics
qualitative differences in our interactions with digital resources
● decentred, non-linear, fragmented, and associative modes of cognition are favoured
● differences of scale
● statistical apprehension of decontextualized language use
● plasticity of format and presentation
cultural objects (are those which) require an explication
● Resources are invested with meaning by our use of them
● Explication confers value
● “We need to interpret interpretations more than to interpret things” (Derrida, citing Montaigne)
Resources
[Diagram labels: digital resources, encoding, analysis, abstract model]
digitization reifies an explication
● To encode a resource, it must first be decoded
● Decoding implies selection of features
● And their re-encoding in unambiguous terms
whose explication?
● the observer effect
– a novelty in the sciences
– central to the humanities
● hermeneutics 'r us
– computers are for symbolic manipulation, not just for calculation
transmitting the hermeneutic
● Scholarship depends on continuity
● It is not enough to preserve an encoding
● There must also be a continuity of comprehension
digital resources can only be preserved by migration
● This separation of medium and message implies
– selection
– potential information loss or transformation at each step
● Hence the need for media-independent encodings
Frequently Answered Questions
● resource description or characterization
● re-use of common text for multiple purposes
– scholarly edition, school edition, speaking edition
● alignment of differing “versions”
– e.g. transcription, sound, image
– resource descriptors from different domains
● multiple annotations of a common text
– may be additive or alternative
● authoring!
What do content providers need?
● We have
– a good notation for textual structure and semantics (XML)
– a pretty complete character encoding (Unicode)
– well-defined processing systems for doing cool stuff with XML fragments
● What more do we need?
Interoperability
● We also need
– to interchange and integrate metadata, texts, and tools
  ● between persons and machines
  ● between machines and machines
  ● across time and space
– to express formal constraints on our markup
– (probably) to document the semantics of our markup
● This is the domain of the schema or DTD
What did using a schema ever do for us?
<list> <label> <item>
list ((label, item)+ | item+)
if list is of type GLOSS, content must include labels
figure.attributes.url = xsd:anyURI
persons referenced by key must exist in the persons database
don't use the table element to represent glossaries!
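The kinds of constraint listed above differ in how much a grammar alone can express. As a minimal sketch (not a TEI validator), the Python below checks three of them on a single list element: the content model, the rule about GLOSS lists, and the lookup of person keys. The function name `check_list`, the `type="gloss"` spelling, the `<name key="...">` element, and the in-memory `persons` set standing in for the persons database are all assumptions made for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Content model from the slide: list ((label, item)+ | item+)
# Each child is reduced to a one-letter code so a regex can check the sequence.
CODES = {"label": "L", "item": "I"}
CONTENT_MODEL = re.compile(r"(?:(?:LI)+|I+)")


def check_list(list_el, persons):
    """Return a list of human-readable problems found in one <list> element."""
    problems = []
    # "?" marks any child the content model does not mention.
    sequence = "".join(CODES.get(child.tag, "?") for child in list_el)

    # 1. Grammar-style constraint: the content model itself.
    if not CONTENT_MODEL.fullmatch(sequence):
        problems.append("content does not match ((label, item)+ | item+)")

    # 2. Co-occurrence constraint: a gloss list must use labels.
    if list_el.get("type") == "gloss" and "L" not in sequence:
        problems.append("list of type gloss has no labels")

    # 3. Referential constraint: every key must be known to the database.
    for el in list_el.iter():
        key = el.get("key")
        if key is not None and key not in persons:
            problems.append(f"unknown person key: {key}")

    return problems


persons = {"P001"}  # hypothetical stand-in for the persons database

good = ET.fromstring(
    '<list type="gloss">'
    '<label>editor</label><item>see <name key="P001">a known person</name></item>'
    '</list>'
)
bad = ET.fromstring(
    '<list type="gloss"><item>no label here</item>'
    '<item><name key="X999">an unknown person</name></item></list>'
)

print(check_list(good, persons))
print(check_list(bad, persons))
```

In this sketch the first check is the only one a plain content-model grammar would catch; the second and third need a rule layer or an external lookup, and the last slide item (a stylistic prohibition) probably has to live in documentation.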
The scope of “intelligent” markup
– orthographic transcription
– links to digital recordings, images…
– proper nouns, dates, times, etc.
– linguistic analyses (morphological, syntactic, discoursal...)
– named entity recognition
– cross references to other material on the topic
– meta-textual status (correction etc)
– editorial commentary and annotation
– traditional bibliographic description
– etc., etc., etc.
How can all these things co-exist?
Towards a new babel?
● If we have
  ● historical records using “Historical Markup Language”
  ● linguistic data using “Linguistic Markup Language”
  ● illustrations using a “Visual Markup Language”
  ● metadata using (a) “Metadata Markup Language”
● how will we integrate resources or ask interesting questions?
One answer: the TEI
● TEI P5 takes a modular approach
● It provides an integrated XML framework for
– definition of simple or complex text markup schemes
– documentation of their use
– generation of formal schemata to validate them
– mapping of their concepts to other ontologies
● See http://www.tei-c.org and http://tei.sf.net
Semantic interoperability
● It is not hard to achieve interoperability between different markup schemes
● The real challenge is to relate their underlying semantics
– where does “meaning” come from?
– how does “translation” work?
● These are not new questions in the linguistic research community!
For example: terminology
● Termbanks work by defining
– relationships between concepts
– relationships between terms in different languages and those concepts
● Translators use termbanks to help them decide what texts should mean
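The two kinds of relationship a termbank defines can be sketched as a small data structure. This is only an illustration under stated assumptions: the class names `Concept` and `Termbank`, the concept ids, the definitions, and the sample terms are invented here and do not come from any real termbank.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """A language-independent concept with terms attached per language."""
    concept_id: str
    definition: str
    terms: dict = field(default_factory=dict)    # language code -> list of terms
    broader: list = field(default_factory=list)  # ids of more general concepts


class Termbank:
    def __init__(self):
        self.concepts = {}

    def add(self, concept):
        self.concepts[concept.concept_id] = concept

    def lookup(self, term, lang):
        """Find every concept that a term in the given language may denote."""
        return [c for c in self.concepts.values() if term in c.terms.get(lang, [])]

    def translate(self, term, source, target):
        """Go from term to concept, then from concept to target-language terms."""
        results = []
        for concept in self.lookup(term, source):
            results.extend(concept.terms.get(target, []))
        return results


# Hypothetical entries, for illustration only.
bank = Termbank()
bank.add(Concept("c1", "a written or printed work",
                 {"en": ["document"], "fr": ["document"]}))
bank.add(Concept("c2", "a handwritten document",
                 {"en": ["manuscript"], "fr": ["manuscrit"]}, broader=["c1"]))

print(bank.translate("manuscript", "en", "fr"))
```

The point of routing every translation through a concept, rather than pairing terms directly, is that the concept carries the definition and the links to broader concepts that a translator consults when a term is ambiguous.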
Translators also use corpora
● How do you (quickly) find out about the technical language of a domain for which no termbank exists?
● Apply the “walks-like-a-duck” procedure to build your own corpus
● This is often a reliable way of identifying new terminology (as document classification research shows)
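Once a domain corpus exists, one common way to surface candidate terminology is to compare word frequencies against a general reference corpus. The sketch below is an assumption about how such a comparison might be done, not the procedure the slide refers to: the function names, the add-one smoothing, and the two tiny sample texts are all invented for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def candidate_terms(domain_text, reference_text, top=3):
    """Rank words by how much more frequent they are in the domain corpus."""
    domain = Counter(tokenize(domain_text))
    reference = Counter(tokenize(reference_text))
    domain_total = sum(domain.values())
    reference_total = sum(reference.values())

    scores = {}
    for word, count in domain.items():
        # Add-one smoothing so words absent from the reference do not divide by zero.
        domain_rate = (count + 1) / (domain_total + 1)
        reference_rate = (reference[word] + 1) / (reference_total + 1)
        scores[word] = math.log2(domain_rate / reference_rate)

    return sorted(scores, key=scores.get, reverse=True)[:top]


# Toy corpora, far too small for real use.
domain_text = "the schema validates the markup and the schema constrains the markup"
reference_text = "the cat sat on the mat and the dog sat on the rug"

print(candidate_terms(domain_text, reference_text, top=2))
```

Function words such as "the" occur at similar rates in both texts and so score low, while words specific to the domain rise to the top; with realistic corpora the same comparison would be run over multi-word phrases as well.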
Mapping of markup languages
● We can map mark-up semantics using standard conceptual reference models (aka ontologies)
– ISO DIS 12620: Data Category Registry for linguistic resources
– CIDOC CRM (now also in ISO)
● But in markup, as elsewhere, praxis means more than syntax
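A mapping through a shared reference model can be sketched as two lookup tables that point at the same concept identifiers. Everything named here is hypothetical: the scheme names, the element names, and the `concept:` identifiers are invented for illustration and are not real entries in ISO 12620 or the CIDOC CRM.

```python
# Each hypothetical markup scheme maps its own element names to shared concept ids.
# The ids are invented for illustration; they are not real ISO 12620 or CIDOC CRM entries.
MAPPINGS = {
    "historical": {"persName": "concept:person", "placeName": "concept:place"},
    "linguistic": {"PERSON": "concept:person", "LOC": "concept:place",
                   "NP": "concept:noun-phrase"},
}


def equivalents(scheme, element, mappings=MAPPINGS):
    """Return (scheme, element) pairs in other schemes that share the same concept."""
    concept = mappings.get(scheme, {}).get(element)
    if concept is None:
        return []
    return [
        (other, name)
        for other, table in mappings.items()
        if other != scheme
        for name, target in table.items()
        if target == concept
    ]


print(equivalents("historical", "persName"))
print(equivalents("linguistic", "NP"))
```

The second call returns an empty list, which is the interesting case: a table like this records only that two names point at one concept, and says nothing about whether the two communities actually apply them the same way in practice.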
What web semantics?
● denotation: what (we think) it says● connotation: what (we think) it appears to suggest● annotation: what we want to say about it
The next challenge
Digital applications can restore the fugitive multilayered complexity of a textual tradition otherwise instantiated in a fragmented way by the individual physical copies of the traditional library
They can reconstruct the witnesses as evidence in an analysis of the changing semiotic systems underlying that complexity, for example in linguistic or stylistic terms
They can deliver components of the tradition for reintegration and synthesis into new forms
There is a world of difference between an "electronic library" and a "digital repository"