
Natural Language Processing in Archaeology: disciplinary impact and beyond. Arts and Humanities E-Science Project Meeting, UCL, London, June 8th 2009.


Work package 1 - Advanced Faceted Classification / Geo-spatial browser – 1m+ records; primary facets - What, Where, When.

Work package 2&3 – Natural language processing / Data-mining of Grey Literature, Data-mining of Historic Literature; plus geoXwalk

A quick reminder about Archaeotools…

UP TO DATE VERSION OF THIS PLEASE….

“WHAT”

• Records that have no subject information

• Records that use terms not found in TMT, so these records cannot be indexed (6,442 unique terms)

Records (1,001,407): 19,269 records (2%)

Records (1,001,407): 101,507 records (10.1%)

“WHEN”

• Records that have no temporal information

• Records that use period terms not found in MIDAS so these records cannot be indexed (457 types of irresolvable dates)

Records (1,001,407): 292,793 records (29.2%)

Records (1,001,407): 114,505 records (11.4%)

1066, 1001-1100, 11th Centuary, C11, 11C, Eleventh Century
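The date variants listed above all point at the same period, which is why free-text dates are hard to index. As a minimal sketch only (not the Archaeotools or MIDAS method, which the slides do not describe), such variants could be reduced to a common year span. The function name `normalise_date` and the rule set are hypothetical and cover just the forms shown on the slide, including the misspelling "Centuary".

```python
import re

# Hypothetical lookup of spelled-out ordinals, "first" -> 1 ... "twentieth" -> 20.
ORDINALS = [
    "first", "second", "third", "fourth", "fifth", "sixth", "seventh",
    "eighth", "ninth", "tenth", "eleventh", "twelfth", "thirteenth",
    "fourteenth", "fifteenth", "sixteenth", "seventeenth", "eighteenth",
    "nineteenth", "twentieth",
]
ORDINAL_WORDS = {word: i for i, word in enumerate(ORDINALS, start=1)}


def century_span(n):
    """Return the inclusive year span of the n-th century AD, e.g. 11 -> (1001, 1100)."""
    return ((n - 1) * 100 + 1, n * 100)


def normalise_date(text):
    """Map a free-text date to (start_year, end_year), or None if unrecognised."""
    s = text.strip().lower()

    # Explicit range such as "1001-1100".
    m = re.fullmatch(r"(\d{1,4})\s*-\s*(\d{1,4})", s)
    if m:
        return (int(m.group(1)), int(m.group(2)))

    # Single year such as "1066".
    if re.fullmatch(r"\d{3,4}", s):
        return (int(s), int(s))

    # Abbreviated century such as "C11" or "11C".
    m = re.fullmatch(r"c(\d{1,2})|(\d{1,2})c", s)
    if m:
        return century_span(int(m.group(1) or m.group(2)))

    # Numeric ordinal such as "11th century"; "centu\w*" tolerates "centuary".
    m = re.fullmatch(r"(\d{1,2})(?:st|nd|rd|th)\s+centu\w*", s)
    if m:
        return century_span(int(m.group(1)))

    # Spelled-out ordinal such as "eleventh century".
    m = re.fullmatch(r"([a-z]+)\s+centu\w*", s)
    if m and m.group(1) in ORDINAL_WORDS:
        return century_span(ORDINAL_WORDS[m.group(1)])

    return None


if __name__ == "__main__":
    for variant in ["1066", "1001-1100", "11th Centuary", "C11", "11C", "Eleventh Century"]:
        print(variant, "->", normalise_date(variant))
```

Under these assumed rules every century form resolves to the span 1001 to 1100, and the single year 1066 falls inside it, so all six variants could be indexed against the same period.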

“WHERE”

• Records that have no spatial information

• Records that use terms not found in CDP, so these records cannot be indexed.

Records (1,001,407): 11,126 records (1.1%)

Records (1,001,407): 245,601 records (24.5%)
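The shares quoted on the WHAT, WHEN and WHERE slides can be re-derived from the raw counts and the 1,001,407 record total. The short sketch below does only that arithmetic; the counts are listed in slide order, and the names `SLIDE_COUNTS` and `share` are illustrative.

```python
TOTAL_RECORDS = 1_001_407

# Counts as they appear on the slides, in slide order within each facet.
SLIDE_COUNTS = [
    ("WHAT", 19_269),
    ("WHAT", 101_507),
    ("WHEN", 292_793),
    ("WHEN", 114_505),
    ("WHERE", 11_126),
    ("WHERE", 245_601),
]


def share(count, total=TOTAL_RECORDS):
    """Percentage of the total, rounded to one decimal place."""
    return round(100 * count / total, 1)


if __name__ == "__main__":
    for facet, count in SLIDE_COUNTS:
        print(f"{facet}: {count:,} of {TOTAL_RECORDS:,} = {share(count)}%")
```

To one decimal place the results match the slide figures, except that 19,269 records works out at 1.9%, which the slide rounds to 2%.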

• Vast majority of UK archaeological work undertaken as part of the planning process, administered by local authority archaeologists.

• 4,500 fieldwork events each year in England alone.

• Use of different recording standards for events recording.

University Researchers

Local authority curators

Downloads per quarter 2005-2009 (chart; vertical axis 0 to 45,000)

OASIS - Grey Literature Library

EH and University of Glamorgan SKOS browser project.

University of Edinburgh, Edina - Digimap

GeoXwalk service.

TALM – Transatlantic Archaeological Literature Mining – ADS, University of Sheffield, Arizona State University and Arkansas University (geosciences, computer science and ‘Digital Antiquity’, JISC, NEH & NSF).

CDI Type II proposal - Symbiosis of automated knowledge extraction from scientific text and virtual community contribution leading to reasoning based scientific discovery (NSF, awaiting decision).

“By enormously increasing academic access to, and thereby academic use and appreciation of, the results of archaeological work done in cultural resource management settings, it would foster a rapprochement between academic and consulting archaeology, resulting in more productive research in both sectors… the same tools would also serve to make the management of cultural resources more effective, because managers (largely in government) could make decisions with an improved ability to assess what is known, what is contested, and what is little investigated in a given management context”. Chitta Baral, ASU