
Presentation @ ELPUB 2008, Toronto



Critical value-added services for e-journals on classics

Proposal of a Microformat to encode Canonical Texts References

Matteo Romanello, Univ. "Ca' Foscari" di Venezia

ELPUB 2008, Toronto, June 26th 2008


M. Romanello, A semantic linking framework to provide critical value-added services for E-journals on classics, ELPUB 2008, Toronto, June 26th 2008


Preliminary definitions

• Reference Linking:

– In electronic publications, the capability of transforming textual references into links to the resources they refer to.

• Primary and secondary sources in the field of classics:

– PRIMARY: witnesses, texts of ancient authors

– SECONDARY: every commentary, monograph, journal article written about a primary source

• Canonical Text References:


Rationale

• In the field of Classics:

– E-publications still need to be bootstrapped

– Scholars need (and deserve) more effective research tools

– It is necessary to provide more (and more useful) value-added services

• Switching from content holding to service providing: (Armbruster 2007)

– Favors Open Access to research findings

– Value-added services could make Open Access (OA) economically sustainable


Critical Value Added Services

• What kind of services are most important for philologists and scholars of Classics?

– Every knowledge domain has some significant entry points to information

– Canonical text references are the meaningful entry points for accessing publications in the domain of Classics

• Analogous case in chemistry: names (and structures) of chemical compounds

• What services?

– Reference Linking

– Reference Indexing: accessing journal articles and monographs on the basis of the canonical texts that are referred to within them
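Reference indexing as described above amounts to an inverted index from canonical text references to the publications that cite them. The following is a minimal sketch under stated assumptions: the article identifiers and the sample data are hypothetical, and references are assumed to be already expressed as CTS-style URNs.

```python
from collections import defaultdict

# Hypothetical sample data: article identifiers mapped to the URNs of the
# canonical passages each article refers to.
ARTICLE_CITATIONS = {
    "article-1": [
        "urn:cts:greekLit:tlg0012.tlg001:1.1",
        "urn:cts:greekLit:tlg0007.tlg007:19.1",
    ],
    "article-2": ["urn:cts:greekLit:tlg0012.tlg001:1.1"],
}


def build_reference_index(citations):
    """Invert article -> references into reference -> sorted list of articles."""
    index = defaultdict(set)
    for article, urns in citations.items():
        for urn in urns:
            index[urn].add(article)
    return {urn: sorted(articles) for urn, articles in index.items()}


def articles_citing(index, urn_prefix):
    """Return the articles citing the given author, work, or passage.

    A reference matches when its URN equals the prefix or extends it at a
    component boundary ("." or ":"), so "tlg0012" does not match "tlg00120".
    """
    found = set()
    for urn, articles in index.items():
        if urn == urn_prefix or urn.startswith((urn_prefix + ".", urn_prefix + ":")):
            found.update(articles)
    return sorted(found)


index = build_reference_index(ARTICLE_CITATIONS)
print(articles_citing(index, "urn:cts:greekLit:tlg0012.tlg001"))
```

Because the identifiers are hierarchical, one index can answer queries at the level of an author, a work, or a single passage.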

But... why a new linking framework?


Shortcomings of current scenarios: search

1. Google search

2. L'Année Philologique search

• String-based search algorithms

• No semantic understanding

• No multilingual search

• High recall, low precision


Shortcomings of current scenarios: reference linking

• Tightly coupled approach

• Hard-linking (1 to 1 mapping)

• Linking system

– Peculiar to a given project

– Language-dependent

– Closed system

<!-- Plut. Sol. 19.1 Canonical Text Reference -->

<a class="citation" target="_blank" href="http://www.perseus.tufts.edu/cgi-bin/ptext?lookup=Plut.+Sol.+19.1">Plut. <em>Sol.</em> 19.1</a>


Desired Scenario (building an E-scholium)

The first attempt... ...an e-scholium on the Web scale

Venetus A: Marcianus Graecus Z. 454, <http://chs75.harvard.edu/manuscripts/image-viewer>

Map of the Web (Jan 15 2005), <http://www.opte.org/maps/>


Proposal: loosely coupled approach

• Desired linking system:

– Semantic

– Open-ended

– Language-neutral

• Layers separation:

1) Metadata contained in canonical text references

2) Protocols and Programming Interfaces (API)

3) Services

• Glue:

– Client side application

• Implementation:

– Microformats

– CTS (Canonical Texts Services) URNs


Microformats (http://microformats.org)

– Originated in blogs and Web 2.0

– Development: pattern and design principles by Microformats community

– Microformats: semantic compounds of Plain Old Semantic HTML (POSH) tags

– Aimed at embedding semantic data in HTML elements

– Community interested in semantic encoding of citation formats: hBib draft, Microformat for bibliographic references to modern publications

– Examples of MFs:

• geo -> Geographical data

• hCard -> personal profile

• rel-tag -> tags

• hCalendar -> events

• (*) COinS -> embedding OpenURLs within an HTML element


CTS URNs

– Based on the FRBR (Functional Requirements for Bibliographic Records) conceptual model

– Provide URNs (Uniform Resource Names) for Canonical Texts References

• e.g. isbn:xxxxxxx

– URNs = unambiguous identifiers for

• Authors

– Homer: urn:cts:greekLit:tlg0012

• Works

– Iliad: urn:cts:greekLit:tlg0012.tlg001

• Text passages

– Homer's Iliad book 1, line 1: urn:cts:greekLit:tlg0012.tlg001:1.1

• Work Editions

– Venetus A: 1.1 Holy Cross / Furman Fellows edd.: urn:cts:greekLit:tlg0012.tlg001.greekLit:msA-tei:1.1

• Work exemplars
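The hierarchy listed above (author, work, passage) can be read directly off the identifier. The sketch below is illustrative only and assumes the layout `urn:cts:<namespace>:<textgroup>[.<work>[.<version>]][:<passage>]`; the class and function names are hypothetical, and the longer form used in the edition example on this slide is not handled.

```python
from typing import NamedTuple, Optional


class CtsUrn(NamedTuple):
    namespace: str            # e.g. "greekLit"
    textgroup: str            # author or text group, e.g. "tlg0012"
    work: Optional[str]       # e.g. "tlg001"
    version: Optional[str]    # edition or translation identifier, if any
    passage: Optional[str]    # e.g. "1.1"


def parse_cts_urn(urn: str) -> CtsUrn:
    """Split a CTS-style URN into its components (assumed layout, see above)."""
    parts = urn.split(":")
    if len(parts) not in (4, 5) or parts[0] != "urn" or parts[1] != "cts":
        raise ValueError(f"not a CTS URN in the assumed layout: {urn!r}")
    namespace, work_component = parts[2], parts[3]
    passage = parts[4] if len(parts) == 5 else None

    levels = work_component.split(".")
    if len(levels) > 3 or not all(levels):
        raise ValueError(f"unexpected work component: {work_component!r}")
    # Pad missing levels with None so that author-only URNs also parse.
    levels += [None] * (3 - len(levels))
    textgroup, work, version = levels
    return CtsUrn(namespace, textgroup, work, version, passage)


print(parse_cts_urn("urn:cts:greekLit:tlg0012.tlg001:1.1"))
```

A client can use the parsed fields to decide whether a reference points at an author, a work, or a specific passage before choosing which service to call.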


Microformatted reference

<!-- URNs and the edition statement are carried in title attributes -->
<a class="citation" target="_blank" href="http://www.perseus.tufts.edu/cgi-bin/ptext?lookup=Plut.+Sol.+19.1">
  <cite class="ctref">
    <abbr class="ctauthor" title="urn:cts:greekLit:tlg0007">Plut.</abbr>
    <em>
      <abbr class="ctwork" title="urn:cts:greekLit:tlg0007.tlg007">Sol.</abbr>
    </em>
    <abbr class="range" title="19.1">XIX 1</abbr>
    <abbr class="edition" title="Bernadotte Perrin"></abbr>
  </cite>
</a>

• URNs and implicit information (e.g. the edition statement) are hidden by using Cascading Style Sheets (CSS) -> separation of content and presentation

Rendered result: Plut. Sol. XIX 1
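A client-side or server-side tool can recover the hidden metadata by reading the class and title attributes of the markup shown above. The following sketch uses only Python's standard `html.parser`; the class name `CtRefParser` and the helper `extract_references` are hypothetical, and the snippet is a trimmed copy of the slide's example.

```python
from html.parser import HTMLParser

SNIPPET = """
<cite class="ctref">
  <abbr class="ctauthor" title="urn:cts:greekLit:tlg0007">Plut.</abbr>
  <em><abbr class="ctwork" title="urn:cts:greekLit:tlg0007.tlg007">Sol.</abbr></em>
  <abbr class="range" title="19.1">XIX 1</abbr>
</cite>
"""


class CtRefParser(HTMLParser):
    """Collect the title of each <abbr> found inside a <cite class="ctref">."""

    FIELDS = {"ctauthor", "ctwork", "range", "edition"}

    def __init__(self):
        super().__init__()
        self.references = []
        self._current = None  # dict for the reference being read, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "cite" and "ctref" in classes:
            self._current = {}
        elif tag == "abbr" and self._current is not None:
            for name in classes:
                if name in self.FIELDS and attrs.get("title"):
                    self._current[name] = attrs["title"]

    def handle_endtag(self, tag):
        if tag == "cite" and self._current is not None:
            self.references.append(self._current)
            self._current = None


def extract_references(html):
    """Return one dict per microformatted reference found in the HTML."""
    parser = CtRefParser()
    parser.feed(html)
    parser.close()
    return parser.references


print(extract_references(SNIPPET))
```

The visible text ("Plut.", "XIX 1") is ignored on purpose: the machine-readable values live in the title attributes, which is what keeps the approach independent of the language and citation style of the article.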


Microformats suitability

• Least Power Rule (W3C):

– RDF is the best technology to express semantic meaning

– Microformats are an already working solution

– Bottom-up way to Semantic Web

• Strong points:

– Rapid and wide adoption (supported by Firefox 3 and Internet Explorer 8)

– More HTML-compliant than RDFa and eRDF

– Forward-compatibility with Resource Description Framework (RDF) through GRDDL

– Embedding -> embedded URNs may also be discovered by 'normal' (non-semantic) search engines


Prototype of a semantic Reference Linking feature



Semantic Reference Linking Service

1. Reference detection

2. Construction of the CTS-compliant query

3. Query against CTS repositories

4. Response parsing

5. Content display
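The five steps above can be sketched end to end as follows. This is a rough illustration, not the prototype's code: the endpoint URL is hypothetical, step 1 is assumed to have already produced a reference record, the network call in step 3 is replaced by a stub, and the XML layout of the reply is invented for the example. The `request=GetPassage` and `urn` parameters follow the CTS request style, but a real repository's documentation should be checked.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Hypothetical endpoint: a real CTS repository publishes its own base URL.
CTS_ENDPOINT = "http://cts.example.org/api"


def build_query(work_urn, passage, endpoint=CTS_ENDPOINT):
    """Step 2: turn a detected reference into a GetPassage request URL."""
    params = {"request": "GetPassage", "urn": f"{work_urn}:{passage}"}
    # Keep colons unescaped so the URN stays readable in the query string.
    return endpoint + "?" + urlencode(params, safe=":")


def fetch(url):
    """Step 3, stubbed: return a canned reply instead of contacting a server.

    The element names used here are invented for this sketch.
    """
    urn = url.split("urn=")[1]
    return f"<reply><urn>{urn}</urn><text>(passage text)</text></reply>"


def parse_reply(xml_text):
    """Step 4: pull the URN and the passage text out of the reply."""
    root = ET.fromstring(xml_text)
    return root.findtext("urn"), root.findtext("text")


def display(label, urn, text):
    """Step 5: format the passage for display next to the reference."""
    return f"{label} [{urn}]: {text}"


# Step 1 (reference detection) is assumed to have produced this record.
reference = {
    "label": "Plut. Sol. XIX 1",
    "ctwork": "urn:cts:greekLit:tlg0007.tlg007",
    "range": "19.1",
}
url = build_query(reference["ctwork"], reference["range"])
urn, text = parse_reply(fetch(url))
print(display(reference["label"], urn, text))
```

Because the reference carries a URN rather than a hard-coded link, the same record could be sent to any repository that understands the identifier, which is the loosely coupled behaviour the proposal aims for.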


Use of Microformatted references

• Where?

– E-journal articles, e-publications to encode references

– Web feeds

– Combined with other Microformats (a conference presentation about the prologue of Homer's Iliad, ...)

• Use of CTS URNs:

– as keywords in Dublin Core metadata descriptions

– as semantic tags in folksonomies and social applications (delicious, CiteULike...)

• Value-added services to be built upon them:

– Targeted search engines

– Aggregators of relevant information

– Piping of web feeds

– Reference linking

– Reference Indexing
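As one illustration of the Dublin Core use listed above, the URNs cited in an article could be emitted as subject keywords in the page header. The sketch below assumes the common convention of expressing Dublin Core in HTML `<meta>` elements named `DC.subject`; the function name is hypothetical.

```python
from html import escape


def dc_subject_tags(urns):
    """Render one DC.subject <meta> element per cited CTS URN."""
    return [
        f'<meta name="DC.subject" content="{escape(urn, quote=True)}">'
        for urn in urns
    ]


for tag in dc_subject_tags([
    "urn:cts:greekLit:tlg0012.tlg001",
    "urn:cts:greekLit:tlg0007.tlg007",
]):
    print(tag)
```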


Solution scalability

• How microformatted references are produced now:

– from XML encoding through XSLT (requires a lot of manual work)

• The proposed solution should scale to the dimensions of the million-book library (Crane 2006)

– In the Humanities there are currently several mass digitization projects (Google Books, JSTOR ...)

– Millions of books and journal articles will soon be available

• How to scale?

– Building a semantic parser:

• using NLP (Natural Language Processing) techniques

– Named entity recognition

– Edit Distance

– Finite State automata

• should make possible the automatic markup of large amounts of text
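A toy version of such a parser can combine two of the techniques named above: a regular expression (which compiles to a finite-state recogniser) to spot candidate references, and edit distance to tolerate variant abbreviations. Everything below is an assumption made for illustration: the lookup tables are tiny and hypothetical, a dictionary lookup stands in for real named entity recognition, and the pattern covers only references of the form "Author Work N.N".

```python
import re

# Hypothetical authority lists: abbreviation -> URN.
AUTHORS = {"Plut.": "urn:cts:greekLit:tlg0007", "Hom.": "urn:cts:greekLit:tlg0012"}
WORKS = {
    ("urn:cts:greekLit:tlg0007", "Sol."): "urn:cts:greekLit:tlg0007.tlg007",
    ("urn:cts:greekLit:tlg0012", "Il."): "urn:cts:greekLit:tlg0012.tlg001",
}

# Two capitalised abbreviations followed by a dotted numeric range.
REFERENCE = re.compile(r"\b([A-Z][a-z]+\.?)\s+([A-Z][a-z]+\.?)\s+(\d+(?:\.\d+)*)")


def edit_distance(a, b):
    """Levenshtein distance computed row by row."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(previous[j] + 1,                 # deletion
                               current[j - 1] + 1,              # insertion
                               previous[j - 1] + (ca != cb)))   # substitution
        previous = current
    return previous[-1]


def closest(token, candidates, max_distance=1):
    """Return the known abbreviation nearest to token, or None if too far."""
    best = min(candidates, key=lambda c: edit_distance(token, c), default=None)
    if best is not None and edit_distance(token, best) <= max_distance:
        return best
    return None


def markup(text):
    """Wrap every recognised reference in ctref microformat markup."""
    def replace(match):
        author_tok, work_tok, passage = match.groups()
        author = closest(author_tok, AUTHORS)
        if author is None:
            return match.group(0)
        author_urn = AUTHORS[author]
        works = {abbr: urn for (a, abbr), urn in WORKS.items() if a == author_urn}
        work = closest(work_tok, works)
        if work is None:
            return match.group(0)
        return ('<cite class="ctref">'
                f'<abbr class="ctauthor" title="{author_urn}">{author_tok}</abbr> '
                f'<abbr class="ctwork" title="{works[work]}">{work_tok}</abbr> '
                f'<abbr class="range" title="{passage}">{passage}</abbr></cite>')
    return REFERENCE.sub(replace, text)


print(markup("See Plu. Sol. 19.1 for details."))
```

Text that matches the pattern but not the authority lists is left untouched, which keeps false positives out of the markup at the cost of missing unknown authors.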


Thank you for your attention