Developing a Digital Library for the Humanities

Page 1: Developing a Digital Library for the Humanities

Developing a Digital Library for the Humanities

• Gregory Crane ([email protected])

• Winnick Family Chair in Technology and Entrepreneurship
  Professor of Classics
  Director, Perseus Digital Library Project
  Http://www.perseus.tufts.edu/About/grc.html

Page 2: Developing a Digital Library for the Humanities

Perseus Digital Library

• On-going areas of Development
• 1987: DL on Classical Greek Culture
• 1993: History of Science
• 1996: Began work on Latin and Rome
• 1997: Early Modern English
• 1999: History and Topography of London
• 2000: Ancient Egyptian Giza
• 2000: Slavery and the US Civil War

Page 3: Developing a Digital Library for the Humanities

Partner Institutions

• Max Planck Institute for the History of Science (Berlin)

• Museum of Fine Arts, Boston

• Stoa Publishing Consortium

• New Variorum Shakespeare Series, Modern Language Association

• Special Collections at Tufts, Brandeis, and the University of Pennsylvania

Page 4: Developing a Digital Library for the Humanities

On-Going Support

• National Endowment for the Humanities (DLI2, Preservation & Access, Education)

• National Science Foundation (DLI2)

• Fund for the Improvement of Postsecondary Education, Dept of Ed.

• Max Planck Society

Page 5: Developing a Digital Library for the Humanities

The Whole greater than the sum

• Tufts Health Sciences Database:

• An on-line Medical School Curriculum
  – First iteration: 70% of the value
  – Second iteration: 90%
  – Third iteration: 130%

• “Data” and “system” interact in increasingly dynamic ways.

Page 6: Developing a Digital Library for the Humanities

Persistent value over time & space

• How many ages hence
  Shall this our lofty scene be acted over,
  In states unborn and accents yet unknown?
  – Cassius in Julius Caesar

• How do we structure data for
  – Contemporary users we can’t directly anticipate?
  – Systems not yet designed?

Page 7: Developing a Digital Library for the Humanities

Radically New Documents

• Reconstructions of Historical Spaces, e.g.
  – UVA’s Crystal Palace (London)
  – UCLA’s Rome and VR Lab

• Integrating Virtual Spaces with Sources
  – Museum of Fine Arts, Tombs at Giza
  – Greek Sculpture
  – The Streets of 19th Century London

Page 8: Developing a Digital Library for the Humanities

Traditional Docs Rethought

• Concordance: “Obsolete”

• Bibliographies: databases

• Encyclopedias: automatic linking

• Lexica and lexicography:
  – Automatically discovered semantic relations
  – THEN lexicographic work

Page 9: Developing a Digital Library for the Humanities

Development is two-part

• Ultimate end: Radically new docs?

• Short term: Electronic Incunabula
  – New Variorum Shakespeare
  – Electronic Marlowe
  – Tallis Street Maps

• FIRST we thoroughly analyze what we have

• THEN radical redesign emerges

Page 10: Developing a Digital Library for the Humanities

Technology outruns Practice

• The 3D Reconstruction/Virtual Space
  – Cutting edge technology
  – Still nascent scholarly practices

• Mature Document Structures
  – Textual Notes: 1908 Richard III
  – Traditional Text Citations: 1887 Commentary

Page 12: Developing a Digital Library for the Humanities

Current Paradigm: DL Diplomacy

• Monolithic Systems (e.g., Perseus!)
  – One way to view each document

• Intercommunication via metadata
  – DL as metadata for “opaque” objects

• Major Problems
  – Renting access, rather than collecting content
  – All publications become ephemera

Page 13: Developing a Digital Library for the Humanities

Three Strategies

• 1) The Editing Problem
  – How do real authors create structured docs?

• 2) Developing Radically New Docs
  – Archimedes DL on Mechanics
  – MFA Excavations at Giza

• 3) Radical Repurposing of Print
  – Bolles Collection on London

Page 14: Developing a Digital Library for the Humanities

Bolles Collection at Tufts

• Documenting the history and topography of London and its environs
  – 35 “full-size” maps
  – 320 more specialized maps
  – 400 books (284 linear feet of shelf space)
  – 1,000 pamphlets
  – “Paper Hypertexts”

• 10,000+ “extra illustrations”

Page 15: Developing a Digital Library for the Humanities

Bolles Electronic Archive

• A Testbed for the Perseus Digital Library

• “Level 5” TEI Encoded Full Text
  – Quotes, languages, proper names, dates, money

• High-end OCR and Double Keyboarding
  – OCR ideal for some but not all
  – Keyboarding much the best, money permitting

Page 16: Developing a Digital Library for the Humanities

Bolles — Initial Texts

• Five Million Words now in L5 TEI
  – Will exceed 10 million by year’s end

• Surveys of London History and Topography
  – Stow, Maitland, Wilkinson, Allen, Thornbury

• Commentary on social conditions
  – Mayhew, Archer, Hollingshead, Booth

• Literary works with London as backdrop
  – Defoe, Dickens, “Sherlock Holmes”

Page 17: Developing a Digital Library for the Humanities

Images

• 10,000 Grayscale Images
  – Mainly engravings of people and places
  – “Opportunistic” metadata (= captions & context)

• 2,400 Contemporary Images
  – Well catalogued and geo-referenced

• QTVR Panoramas

• 70 Tallis Map “Elevations”

Page 18: Developing a Digital Library for the Humanities

Geospatial Data

• Bartholomew 1:5000 data set for London
  – Modern data as reference and interchange

• Historical maps georeferenced to the Bartholomew data
  – 10 so far (c. 2 hours each)
  – Urban maps do not easily “line up”
  – How to create an historical GIS?

• GPS Waypoints
  – As of May 2000, good to within 10 m or better
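One common way to georeference a scanned map is to fit an affine transform from hand-picked control points, then inspect the residuals to see how badly the map fails to "line up". The slides do not say which method was used for the Bartholomew data, so the following is only a minimal sketch under that assumption; the function names and control points are hypothetical.

```python
def solve3(m, v):
    """Solve a 3x3 linear system m * x = v by Gauss-Jordan elimination with pivoting."""
    a = [row[:] + [rhs] for row, rhs in zip(m, v)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(3):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
    return [a[i][3] / a[i][i] for i in range(3)]


def fit_affine(src, dst):
    """Least-squares affine map from scanned-map pixels (src) to reference coordinates (dst)."""
    # Normal equations (A^T A) p = A^T b, where each row of A is (x, y, 1).
    ata = [[0.0] * 3 for _ in range(3)]
    atb_u = [0.0] * 3
    atb_v = [0.0] * 3
    for (x, y), (u, v) in zip(src, dst):
        row = (x, y, 1.0)
        for i in range(3):
            for j in range(3):
                ata[i][j] += row[i] * row[j]
            atb_u[i] += row[i] * u
            atb_v[i] += row[i] * v
    return solve3(ata, atb_u), solve3(ata, atb_v)


def apply_affine(params, point):
    """Map one pixel position to reference coordinates."""
    (a, b, c), (d, e, f) = params
    x, y = point
    return (a * x + b * y + c, d * x + e * y + f)


def residuals(params, src, dst):
    """Distance between each transformed control point and its reference position."""
    out = []
    for p, (u, v) in zip(src, dst):
        pu, pv = apply_affine(params, p)
        out.append(((pu - u) ** 2 + (pv - v) ** 2) ** 0.5)
    return out


# Hypothetical control points: pixel positions on a scan and matching reference coordinates.
src = [(0, 0), (100, 0), (0, 100), (100, 100)]
dst = [(500.0, 200.0), (700.0, 200.0), (500.0, 0.0), (700.0, 0.0)]
params = fit_affine(src, dst)
print(apply_affine(params, (50, 50)))
print(max(residuals(params, src, dst)))
```

With real historical maps the residuals would rarely be near zero; large or uneven residuals are one way to quantify the "line up" problem noted above.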

Page 19: Developing a Digital Library for the Humanities

Feature Extraction

• Easy identification: Dates, Money

• Known Keywords and Classes
  – The Getty TGN (1 million places with lon/lats)
  – The Bartholomew Gazetteer (10,000)
  – Indices to Maps (e.g. Cruchley 1826, 4,200)
  – The Index/Abstract of the DNB (30,000+)

• Clean-up with rule-based Proper Name classification: Mr NAME; NAME street
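The rule shapes named on this slide ("Mr NAME", "NAME street", dates, money) can be illustrated with a few regular expressions. This is a rough sketch, not the Perseus implementation: the patterns, the function name, and the sample sentence are all assumptions made for illustration.

```python
import re

# Hypothetical patterns; the slide gives only the rule shapes, not the expressions.
YEAR = re.compile(r"\b1[0-9]{3}\b")
MONEY = re.compile(r"£\s?\d[\d,]*(?:\.\d+)?")
PERSON = re.compile(r"\b(?:Mr|Mrs|Dr|Sir)\.?\s+([A-Z][a-z]+)")
STREET = re.compile(r"\b((?:[A-Z][a-z]+\s)+(?:Street|Lane|Road|Square))\b")


def extract_features(text):
    """Return candidate dates, sums of money, personal names, and street names."""
    return {
        "dates": YEAR.findall(text),
        "money": MONEY.findall(text),
        "persons": PERSON.findall(text),
        "streets": STREET.findall(text),
    }


# Invented sample sentence, used only to exercise the rules.
sample = "In 1850 Mr Smith paid £2 for a shop in Oxford Street."
features = extract_features(sample)
print(features)
```

Rules this simple over-generate (a capitalized word at the start of a sentence before "Street" would be swept into the street name), which is why a gazetteer or index lookup would normally come first and the rules serve as clean-up.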

Page 20: Developing a Digital Library for the Humanities

“Runtime” Links

• Runtime links supplement in-file tagging

• 1) Where metadata is less precise
  – Metadata from unedited headers and captions

• 2) Where the source does not contain data
  – If no dates, then scan for them

• Use tagging for “high confidence” data
  – Ideal situation: automated tags hand proofed
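The fallback described here (trust in-file tags when present, scan at display time otherwise) can be sketched as follows. The simplified `<date>` element, the year pattern, and the function name are assumptions for illustration, not the project's actual markup handling.

```python
import re

# Simplified stand-in for in-file date tagging; real TEI markup is richer.
DATE_TAG = re.compile(r"<date>(.*?)</date>")
# Hypothetical runtime rule: any four-digit number starting with 1 is a candidate year.
YEAR = re.compile(r"\b1[0-9]{3}\b")


def dates_for_display(text):
    """Prefer hand-proofed tags; fall back to a runtime scan when none exist."""
    tagged = DATE_TAG.findall(text)
    if tagged:
        return [(d, "tagged") for d in tagged]
    return [(y, "runtime") for y in YEAR.findall(text)]


print(dates_for_display("Burned in <date>1666</date> and later rebuilt."))
print(dates_for_display("Rebuilt in 1708 after the fire."))
```

Keeping the source of each value ("tagged" or "runtime") alongside the value lets a display layer treat the two confidence levels differently.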

Page 21: Developing a Digital Library for the Humanities

Strategic Questions

• “Editions” a foundation for scholarship

• Where does the editor’s job start?

• How does editor’s job change?

• How do we define “Corpus Editors”?
  – People with domain expertise in content
  – Expertise in software and library systems

• Need for scholarly automated processing

Page 23: Developing a Digital Library for the Humanities

Further Work

• Disambiguation, auto-cataloguing, Time/Space

• VR Interface: Tallis 1, 2 and Headset

• New challenging document types

• Geospatial data in Patterson's Journeys

• Urban data in Booth and City Directories.
  – Tallis Map for Oxford Street with overall and more focused directories.

Page 24: Developing a Digital Library for the Humanities

Research Projects

• Robert Jacob and VR Interfaces
  – Figure: Tallis VR Conversion 1.
  – Figure: Tallis VR Conversion 2.
  – Figure: Head mounted VR navigation.

• Holly Taylor and Cognitive Analysis
  – Spatial Cognition
  – Text Comprehension

Page 25: Developing a Digital Library for the Humanities

Conclusions

• Baseline Knowledge Environment
  – Practical and useful

• “Corpus Editions”

• Midway between editions and library digitization

• Requires a new configuration of skills

• The “Diplomatic” Federated DL model is weak
  – Need access to full data for visualizations

Page 26: Developing a Digital Library for the Humanities

Perseus Document Manager

• Works with XML
  – Multiple granularities: sentence, section, chapter
  – Deals with overlapping doc hierarchies
  – Combines internal and external metadata
  – Our metadata is in RDF and can be expressed as XML

• Since all data and metadata → XML
  – Well suited to Federated DL Applications
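Serving one XML document at several granularities can be pictured as selecting elements at different levels of the same tree. The sketch below uses Python's standard `xml.etree.ElementTree`; the element names (`chapter`, `section`, `s`) and the `chunks` function are hypothetical, and it does not attempt the overlapping hierarchies mentioned above, which a single tree cannot represent directly.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; element names are illustrative only.
SAMPLE = """<text>
  <chapter n="1">
    <section n="1"><s>First sentence.</s><s>Second sentence.</s></section>
    <section n="2"><s>Third sentence.</s></section>
  </chapter>
</text>"""


def chunks(xml_source, granularity):
    """Return the text of every element with the given tag name, in document order."""
    root = ET.fromstring(xml_source)
    return ["".join(el.itertext()).strip() for el in root.iter(granularity)]


print(len(chunks(SAMPLE, "chapter")))
print(len(chunks(SAMPLE, "section")))
print(chunks(SAMPLE, "s"))
```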

Page 27: Developing a Digital Library for the Humanities

Scalable DL

• SGML/XML need translation for display
  – Can’t maintain stylesheets for millions of docs

• Intelligent display of various DTDs
  – “Cheaply” acquires XML/SGML docs
  – Individual custom stylesheets allowed

• Integration of Geo-spatial Data

• Multilingual support, feature extraction

• Integrated multi-resolution image support

Page 28: Developing a Digital Library for the Humanities

Perseus Document Manager

• Short term development:
  – Collecting new datasets to the Perseus DL
    • (leveraging Internet 2 investment)
  – Adding value: e.g.,
    • Sources for the History of Mechanics (Max Planck)
    • Duke Databank of Documentary Papyri
    • Books, maps etc. on the City of London
    • Shakespeare and Early Modern English

Page 29: Developing a Digital Library for the Humanities

Perseus Document Manager

• Longer Term: Distribution of the System

• How best to maintain and expand the system?
  – Open source?
  – Commercial licensing?
  – Wait for third party to match PDM features?

Page 30: Developing a Digital Library for the Humanities

Automatic Integration

• Content Analysis: Various Languages

• Time: extracting and visualizing dates

• Space: Integrating historical Geographic Data

• Names: establishing authority lists
  – Getty Thesaurus of Geographic Names
    • Names and Coordinates
  – Encyclopedias: e.g., Harpers, DNB
    • Names and Dates

Page 31: Developing a Digital Library for the Humanities

Our Research Agenda

• Developing self-sustaining models
  – Publication of documents
  – Maintenance of software

• Exploring Problem Sets in different domains
  – E.g., sparse data (antiquity) vs. rich (London)

• Helping humanists rethink their position
  – Reaching new audiences
  – Changing habits

Page 32: Developing a Digital Library for the Humanities

Technology matters: e.g., 19th c. Printing in England

• 20th Century Radio/Film/TV: ambiguous

• 19th Century Print Technology
  – 1810: c. 10,000 copies for a successful book
    • Audience for literature mainly upper class
  – 1850: hundreds of thousands
    • Audience vastly expands
    • Huge numbers read Dickens, etc.

• 21st Century Network Technology?

Page 33: Developing a Digital Library for the Humanities

The Future?

• Two models:
  – Reproduce current world in new form
    • Narrow/expensive distribution
  – Think about how that world may change
    • Broader/inexpensive distribution

• What happens now sets the stage for …
  – “talk show” cyber culture? or
  – a new dispersal of intellectual life?