Digital Libraries of the Future: Integration through the 5S Framework
Sixth National Russian Research Conference, Sep. 28 - Oct. 1, 2004
Edward A. Fox, Virginia Tech, Blacksburg, VA 24061 USA
[email protected] http://fox.cs.vt.edu/talks
Acknowledgements: Sponsors
• Conference Organizers and Staff
• NSF Grants: CITIDEL: DUE-0121679; DL-in-a-box: DUE-0136690; ETANA: ITR-0325579; GetSmart: DUE-0121741; OAD: IIS-0086227
• Others: AOL; Capes (Brazilian funding agency); ASOR, CWRU, ETANA, Vanderbilt U.; ACM, Adobe, CONACyT, DFG, IBM, Microsoft, NASA, NDLTD, NLM, OCLC, SUN, US Dept. of Ed. (FIPSE)
Acknowledgements: Faculty, Staff
Lillian Cassel, Debra Dudley, Roger Ehrich, Joanne Eustis, Weiguo Fan, James Flanagan, C. Lee Giles, Eberhard Hilf, Douglas Knight, Deborah Knox, John Impagliazzo, Gail McMillan, Manuel Perez, Naren Ramakrishnan, Layne Watson, …
Acknowledgements: Students
Yuxin Chen, Fernando Das Neves, Shahrooz Feizabadi, Marcos Goncalves, Nithiwat Kampanya, S.H. Kim, Aaron Krowne, Bing Liu, Ming Luo, Paul Mather, Unni. Ravindranathan, Ryan Richardson, Rao Shen, Ohm Sornil, Hussein Suleman, Ricardo Torres, Wensi Xi, Baoping Zhang, …
Outline
• Vision of the Future: Chatham Report
• Integration
• 5S Framework
• DL Taxonomy
• Minimal DL
• DL Ontology
• Applications of Framework: Language (5SL), Design (5SGraph), Generation (5SGen), Logging
• Quality DLs
As data, information, and knowledge play increasingly central roles … digital library research should focus on:
Increasing the scope and scale of information resources and services;
Employing context at the individual, community, and societal levels to improve performance;
Developing algorithms and strategies for transforming data into actionable information;
Demonstrating the integration of information spaces into everyday life; and
Improving availability, accessibility, and, thereby, productivity.
An appropriate infrastructure program will provide sustainability of digital knowledge resources among five dimensions:
• Acquisition of new information resources;
• Effective access mechanisms that span media type, mode, and language;
• Facilities to leverage the utilization of humankind’s knowledge resources;
• Assured stewardship over humanity’s scholarly and cultural legacy; and
• Efficient and accountable management of systems, services, and resources.
Integration: Rationale
• We can read any paper book (ignoring limitations of language, vision, …).
• Scholarship requires access, analysis, and synthesis spanning disciplines and sources.
• New theories, systems, and services build upon our past accomplishments.
• Our “Small World” and the “Internet Age” demand that we, and our computers, work together and interoperate.
Integration: Urgency, Longevity
• If we collect, capture, acquire, or produce information, will it be usable in 100 years?
• NSF Digital Archiving Program
• Library of Congress National Digital Information Infrastructure and Preservation Program
Integration: Standards
• Standards don’t exist in many areas.
• Standards that do exist create a jumble:
  • Conversion between (without loss?)
  • Bridging gaps (Z39.50 -> OAI)
  • Managing legacy content and systems
• Standards in DLs have focused on:
  • Metadata (e.g., Dublin Core)
  • Architecture (e.g., handles, repositories)
Integration: Challenges
• “Semantic Web” is vision, not reality.
• How can we integrate without a theory?
• How can we interoperate without a common framework?
• How can we have a science of DLs if we lack agreement on definitions (so we can reason and discuss) and measures of quality (so we can compare and improve)?
Motivation
• DLs are not benefiting from formal theories as have other CS fields: DB, IR, PL, etc.
• DL construction: difficult, ad-hoc, lacking support for tailoring/customization
• Conceptual modeling, requirements analysis, and methodological approaches are rarely supported in DL development.
• Lack of specific DL models, formalisms, languages
DL Services/Activities Taxonomy
• Infrastructure Services
  • Repository-Building (Creational): Acquiring, Cataloging, Crawling (focused), Describing, Digitizing, Federating, Harvesting, Purchasing, Submitting
  • Preservational: Conserving, Converting, Copying/Replicating, Emulating, Renewing, Translating (format)
  • Add Value: Annotating, Classifying, Clustering, Evaluating, Extracting, Indexing, Measuring, Publicizing, Rating, Reviewing (peer), Surveying, Translating (language)
• Information Satisfaction Services: Browsing, Collaborating, Customizing, Filtering, Providing access, Recommending, Requesting, Searching, Visualizing
5S Model – Informally
Digital libraries are complex information systems that:
• help satisfy info needs of users (societies)
• provide info services (scenarios)
• organize info in usable ways (structures)
• present info in usable ways (spaces)
• communicate info with users (streams)
5S in Archaeology - Structures
[Figure: the 5S (Streams, Structures, Spaces, Scenarios, Societies) applied to archaeology; Structures example: Regions, e.g., Madaba Plains]
5Ss: Models, Examples, Objectives
Model | Examples | Objectives
Streams | Text; video; audio; image | Describes properties of the DL content such as encoding and language for textual material or particular forms of multimedia data
Structures | Collection; catalog; hypertext; document; metadata; organization tools | Specifies organizational aspects of the DL content
Spaces | Measure; measurable, topological, vector, probabilistic | Defines logical and presentational views of several DL components
Scenarios | Searching, browsing, recommending | Details the behavior of DL services
Societies | Service managers, learners, teachers, etc. | Defines managers, responsible for running DL services; actors, that use those services; and relationships among them
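Background: The 5S Model
[Figure: Streams, Structures, Spaces, Scenarios, and Societies arranged along a Static/Passive versus Dynamic/Active axis]
Background
[Figure: concepts of the minimal DL placed within the 5S. Streams: text, audio, image, video. Structures: do, ms, mss, C, DMC, R, Ic. Scenarios: Se, Sc, e. Societies: SM, Ac, op. Spaces: Top, Pr, Metric, Measurable, Measure, Vec]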
Background: 5S and DL formal definitions and compositions (April 2004 TOIS)
[Figure: dependency graph of the formal definitions; (d.N) is the definition number as printed on the slide]
• Mathematical preliminaries: relation (d.1), function (d.2), sequence (d.3), tuple (d.4)*, language (d.5), graph (d.6), grammar (d.7)
• 5S: streams (d.9), structures (d.10), spaces (d.18), scenarios (d.21), societies (d.24)
• Spaces: measurable (d.12), measure (d.13), probability (d.14), vector (d.15), topological (d.16)
• Scenario-related: event (d.10), state (d.18), services (d.22), transmission (d.23)
• DL concepts: structural metadata specification (d.25), descriptive metadata specification (d.26), structured stream (d.29), digital object (d.30), collection (d.31), metadata catalog (d.32), repository (d.33), indexing service (d.34), searching service (d.35), hypertext (d.36), browsing service (d.37), digital library (minimal) (d.38)
Glossary: Concepts in the Minimal DL and Representing Symbols
Concept | Symbol
Digital object | do
Metadata specification | ms
Set of metadata specifications | mss
Collection | C
Catalog | DMC
Repository | S
Event | e
Scenario | Sc
Services | Se
Actor | Ac
Service Manager | SM
Operation | op
Society | Soc
The 5S Formal Model
A digital library is a 10-tuple (Streams, Structs, Sps, Scs, St2, Coll, Cat, Rep, Serv, Soc) in which:
• Streams is a set of streams, which are sequences of arbitrary types (e.g., bits, characters, pixels, frames);
• Structs is a set of structures, which are tuples $(G, \mathcal{F})$, where $G = (V, E)$ is a directed graph and $\mathcal{F}: (V \cup E) \to L$ is a labeling function;
• Sps is a set of spaces, each of which can be a measurable, measure, probability, topological, metric, or vector space.
The 5S Formal Model (continued)
• $Scs = \{sc_1, sc_2, \ldots, sc_d\}$ is a set of scenarios where each $sc_k = \langle e_{1k}(\{p_{1k}\}), e_{2k}(\{p_{2k}\}), \ldots, e_{d_k k}(\{p_{d_k k}\}) \rangle$ is a sequence of events that also can have a number of parameters $\{p_{ik}\}$. Events represent changes in computational states; parameters represent specific locations in a state and respective values.
• St2 is a set of functions $\Psi: V \to Streams \times (\mathbb{N} \times \mathbb{N})$ that associate nodes of a structure with a pair of natural numbers $(a, b)$ corresponding to a portion (span/segment) of a stream.
• $Coll = \{C_1, C_2, \ldots, C_f\}$ is a set of DL collections where each DL collection $C_k = \{do_{1k}, do_{2k}, \ldots, do_{f_k k}\}$ is a set of digital objects. Each digital object $do_k = (h_k, Stm1_k, Stt2_k, \Psi_k)$ is a tuple where $Stm1_k \subseteq Streams$, $Stt2_k \subseteq Structs$, $\Psi_k \subseteq St2$, and $h_k$ is a handle which represents a unique identifier for the object.
The 5S Formal Model (continued)
• $Cat = \{DMC_1, DMC_2, \ldots, DMC_f\}$ is a set of metadata catalogs for Coll where each metadata catalog $DMC_k = \{(h, mss_{hk})\}$, and $mss_{hk} = \{ms_{hk1}, ms_{hk2}, \ldots, ms_{hkn_{hk}}\}$ is a set of descriptive metadata specifications. Each descriptive metadata specification $ms_{hki}$ is a structure with atomic values (e.g., numbers, dates, strings) associated with nodes.
• A repository $Rep = \{(C_i, DMC_i)\}$ ($i = 1$ to $f$) is a set of pairs (collection, metadata catalog); it is assumed there exist operations to manipulate them (e.g., get, store, delete).
• $Serv = \{Se_1, Se_2, \ldots, Se_s\}$ is a set of services where each service $Se_k = \{sc_{1k}, \ldots, sc_{s_k k}\}$ is described by a set of related scenarios.
• $Soc = (C, R)$ where C is a set of communities and R is a set of relationships among communities. $SM = \{sm_1, sm_2, \ldots, sm_j\}$ and $Ac = \{ac_1, ac_2, \ldots, ac_r\}$ are two such communities, where the former is a set of service managers responsible for running DL services and the latter is a set of actors that use those services.
• Being basically an electronic entity, a member $sm_k$ of SM distinguishes itself from actors by defining or implementing a set of operations $\{op_{1k}, op_{2k}, \ldots, op_{nk}\} \subset sm_k$. Each operation $op_{ik}$ of $sm_k$ is characterized by a triple $(n_{ik}, sig_{ik}, imp_{ik})$, where $n_{ik}$ is the operation’s name, $sig_{ik}$ is the operation’s signature (which includes the operation’s input parameters and output), and $imp_{ik}$ is the operation’s implementation. These operations define the capabilities of a service manager $sm_k$.
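A minimal sketch of the Coll, Cat, and Rep components, assuming a digital object can be reduced to a handle plus opaque streams and structures. The class names, the `store`/`get`/`delete` method names, and the dictionary layout are illustrative choices, not part of the formal model; only the existence of get, store, and delete operations comes from the slide.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DigitalObject:
    handle: str            # h_k: unique identifier for the object
    streams: tuple = ()    # Stm1_k: streams carried by the object (opaque here)
    structures: tuple = () # Stt2_k: structures over those streams (opaque here)


@dataclass
class Repository:
    # Rep: collection name -> (collection, metadata catalog)
    # collection: handle -> DigitalObject
    # catalog:    handle -> list of descriptive metadata specifications
    pairs: dict = field(default_factory=dict)

    def store(self, coll, obj, metadata_specs=()):
        collection, catalog = self.pairs.setdefault(coll, ({}, {}))
        collection[obj.handle] = obj
        catalog.setdefault(obj.handle, []).extend(metadata_specs)

    def get(self, coll, handle):
        return self.pairs[coll][0][handle]

    def delete(self, coll, handle):
        collection, catalog = self.pairs[coll]
        collection.pop(handle, None)
        catalog.pop(handle, None)
```

Keeping the collection and its catalog side by side under one key mirrors the (collection, metadata catalog) pairing in Rep, so deleting an object also removes its metadata.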
Motivation
• Previous definitions emphasize syntactic aspects, i.e., how digital library concepts are composed or built from previously defined concepts.
• Complete a formal DL theory by:
  • Making explicit the implicit relationships that exist among the DL formal concepts defined in [Gonc04]
  • Providing a set of axiomatic rules that precisely define and constrain the semantics of the relationships
  • Categorizing and classifying DL services on the basis of the ontology
• Research questions
  • How should DL services be built from the other DL components?
  • Which are the fundamental and elementary DL services?
  • How can services be built/composed from other DL services?
• We will explore semantic relations and rules of the DL domain by using ontologies.
Digital Library Formal Ontology
An ontology is a tuple (Ontol_Concepts, Ontol_Rels) where:
• Ontol_Concepts is a family of ontological concepts,
• Ontol_Rels is a family of relations.
• Relations in Ontol_Rels are operationally realized by one or more rules (e.g., first-order logic axioms) which intentionally specify or constrain which elements of a concept can participate in a relation.
• Ontol_Rules is a family of rules of a particular ontology.
Relationships
• Intra-Model
  • Video contains Audio (MM)
  • Metadata Catalog describes Collection (LIS)
  • Probabilistic Space is_a Measure Space
  • Service extends Service (reuse)
  • Service Manager inherits_from Service Manager (OO)
• Inter-Model
  • Event executes Operation
  • Actor participates_in Scenario
  • Service Manager runs Service
  • Service employs/produces Streams, Structures, Spaces
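Digital Library Formal Ontology
Concepts: {Se, Sc, e}; Key: Se = service; Sc = scenario; e = event.
Relations:
• contains $\subseteq Sc \times e$
  • Symbolic Rule. $\forall x, y\ (x \text{ contains } y \Leftrightarrow Sc(x) \wedge e(y) \wedge \exists j: (j \in x.Dom \wedge y = x(j)))$
• precedes $\subseteq e \times e \times Sc$; happens_before $\subseteq e \times e \times Sc$
  • Symbolic Rule 1. $\forall x, y, z\ (x \text{ precedes}_z\ y \Leftrightarrow e(x) \wedge e(y) \wedge Sc(z) \wedge \exists i, j: (z \text{ contains } x \wedge z \text{ contains } y \wedge x = z(i) \wedge y = z(j) \wedge i + 1 = j))$
  • Symbolic Rule 2. $\forall x, y, z\ (x \text{ happens\_before}_z\ y \Leftrightarrow e(x) \wedge e(y) \wedge Sc(z) \wedge \exists i, j: (z \text{ contains } x \wedge z \text{ contains } y \wedge x = z(i) \wedge y = z(j) \wedge i < j))$
• includes $\subseteq (Se \times Se) \cup (Sc \times Sc)$; extends $\subseteq (Se \times Se) \cup (Sc \times Sc)$
  • Symbolic Rule 1. $\forall x, y\ (x \text{ includes } y \Leftrightarrow Sc(x) \wedge Sc(y) \wedge (\forall z: e(z) \wedge y \text{ contains } z \Rightarrow x \text{ contains } z) \wedge (\forall p, q: e(p) \wedge e(q) \wedge p \text{ precedes}_y\ q \Rightarrow p \text{ precedes}_x\ q))$
  • Symbolic Rule 2. $\forall x, y\ (x \text{ extends } y \Leftrightarrow Sc(x) \wedge Sc(y) \wedge (\forall z: e(z) \wedge y \text{ contains } z \Rightarrow x \text{ contains } z) \wedge (\forall p, q: e(p) \wedge e(q) \wedge p \text{ happens\_before}_y\ q \Rightarrow p \text{ happens\_before}_x\ q))$
  • Symbolic Rule 3. $\forall x, y\ (x \text{ extends } y \Leftrightarrow Se(x) \wedge Se(y) \wedge (y \subset x \vee (x \neq y \wedge \exists p, q: Sc(p) \wedge Sc(q) \wedge p \in x \wedge q \in y \wedge p \text{ extends } q)))$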
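One way to read the scenario-level rules is to treat a scenario as a Python list of event names. The sketch below assumes that reading: `precedes` means adjacency (i + 1 = j), `happens_before` means any earlier position (i < j), `includes` preserves adjacency, and `extends` preserves only relative order. The function names follow the relation names on the slide; the list encoding and the sample event names are assumptions.

```python
def contains(sc, e):
    """Event e occurs at some position of scenario sc."""
    return e in sc


def precedes(sc, x, y):
    """x immediately precedes y in sc: x = sc[i], y = sc[j], i + 1 = j."""
    return any(a == x and b == y for a, b in zip(sc, sc[1:]))


def happens_before(sc, x, y):
    """x occurs somewhere before y in sc: x = sc[i], y = sc[j], i < j."""
    return any(sc[i] == x and y in sc[i + 1:] for i in range(len(sc)))


def includes(x, y):
    """Scenario x includes y: all events of y occur in x, adjacency preserved."""
    return (all(contains(x, e) for e in y)
            and all(precedes(x, p, q) for p, q in zip(y, y[1:])))


def extends(x, y):
    """Scenario x extends y: all events of y occur in x, relative order preserved."""
    ordered = all(happens_before(x, y[i], y[j])
                  for i in range(len(y)) for j in range(i + 1, len(y)))
    return all(contains(x, e) for e in y) and ordered
```

Under this reading, a search scenario with an extra query-expansion step inserted in the middle extends the plain search scenario but does not include it, because the inserted event breaks one adjacency.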
Digital Library Formal Ontology
[Figure: ontology diagram connecting the concepts of Streams (text, audio, image, video), Structures (do, ms, mss, C, DMC, R, Ic), Spaces (Top, Pr, Metric, Measurable, Measure, Vec), Scenarios (Se, Sc, e), and Societies (SM, Ac, op) through the relations describes, stores, is_version_of, belongs_to, contains, is_a, extends, reuses, redefines, invokes, precedes, happens_before, executes, participates_in, recipient, runs, inherits_from/includes, association, uses, employs, and produces]
Digital Library Formal Ontology: Consistency Rules
• Catalog-Collection
  • A complete catalog has at least one set of metadata specifications for each digital object in the collection it describes (surjective partial function).
  • In a consistent catalog, each set of metadata specifications describes (exactly) one digital object in the related collection (total function).
• Scenarios-Society
  • A scenario x is consistent with regard to a set of service managers Y if each operation executed by each event in the scenario is defined in some service manager $y \in Y$.
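The consistency rules can be checked mechanically once the catalog and scenarios are given a concrete encoding. The sketch below assumes a catalog is a list of (handle, metadata-set identifier) pairs, a collection is a set of handles, a scenario is a list of (event, operation) pairs, and service managers are a dictionary from manager name to the set of operations it defines. All of these encodings and names are illustrative.

```python
def is_complete(catalog, collection):
    """Every digital object in the collection has at least one metadata set."""
    described = {handle for handle, _ in catalog}
    return collection <= described


def is_consistent(catalog, collection):
    """Every metadata set describes exactly one object, and that object is in the collection."""
    owners = {}
    for handle, mss_id in catalog:
        owners.setdefault(mss_id, set()).add(handle)
    return all(len(handles) == 1 and handles <= collection
               for handles in owners.values())


def scenario_is_consistent(scenario, service_managers):
    """Every operation executed by an event is defined in some service manager."""
    defined = set().union(*service_managers.values())
    return all(operation in defined for _, operation in scenario)
```

A catalog can be complete without being consistent (one metadata set attached to two objects) and consistent without being complete (an object with no metadata), which is why the slide states the two rules separately.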
Digital Library Formal Ontology: Characterizing employs/produces relationships
• In the table each service is characterized by parameters (input, output) of the initial and final events of the scenarios that compose those services.
• All other previous definitions and keys apply here. That set is complemented with the following definitions.
Services Related Definitions
• A query q is the representation of user interest or information need.
• Hyptxt is a hypertext, wherein an anchor is a node.
• A log_entry is a descriptive metadata specification about an event of a scenario.
• Let $\{do_i\} = \{do_{i1}, do_{i2}, \ldots, do_{in}\}$ be a set of digital objects and $Ct = \{c_1, c_2, \ldots, c_n\}$ be a set of labels for categories. A classifier $class_{Ct}: \{do_i\} \to 2^{Ct}$ is a function that maps a digital object to a set of categories.
• A cluster $clu_k = \{do_{1k}, do_{2k}, \ldots, do_{nk}\}$ is a subset of a set of digital objects.
Service | User input | Other service input | Output
Acquiring | {doi} | Ci | Cj
Browsing | anchor | Hyptxtk | {doi}
Cataloging | doi, msi_k | (hi, mssi_m) | (hi, mssi_(m+k))
Classifying | doi | classCt | (doi, {ck_i})
Clustering | {doi} | X | {cluk_i}
Expanding (query) | {doi} | IC_i, qi | qj
Indexing | Ci | none | IC_i
Linking | doi | Hyptxtk | Hyptxtik
Logging | none | ei({pi}) | log_entryi
Rating | doi, acj | none | {(doi, acj, rk)}
Searching | q, Ci | IC_i | {dok}
Visualizing | {doi} | tfrk | spik
• Infrastructure Services: dealing with basic concepts such as collections and catalogs
  • Repository-Building: create collections (digital objects) and/or catalogs (metadata specifications).
  • Preservational: generate instances by copying collections (digital objects) or transforming (converting/translating) objects into different formats for preservation purposes.
  • Add_Value: either aggregate value/information to collections (digital objects) or connect objects together.
• Information Satisfaction: dealing with higher level societal requirements
KEY in next slide:
• Fundamental: minimal set of services or essential to existence of a DL
• Composite DL service: takes input from some other service; otherwise the service is called elementary.
Applications: A Taxonomy of DL Services
[Figure: fundamental services. Infrastructure services (Acquiring, Submitting, Authoring, Describing, Cataloguing, Indexing, Linking) work on the universal collection, Ci, DMCi, Ic, dok, mskj, and Hypertext; information satisfaction services (Searching, Browsing) take user interests/needs as a query or anchor, with criteria and sortOrder, from an actor in the Society and return {doi}]
Applications: A Taxonomy of DL Services
[Figure: composite services. Information satisfaction services (Recommending, Filtering, Binding, Visualizing, Expanding query) and infrastructure Add_Value services (Rating/Reviewing (peer), Training) take Ck, {doi}, user model/expr, Classifier/expr, and {doj} and produce {doR}, {doF}, bi, spV, and query’, on top of the fundamental Searching and Browsing services]
5S model / 5S language
Model | Formal definition | Objective within 5SL
Streams | Sequences of arbitrary types | Describe properties of the DL content such as encoding and language for textual material or particular forms of multimedia data
Structures | Labeled directed graphs | Specify organizational aspects of the DL (e.g., structural/descriptive metadata, hypertexts, taxonomies, classification schemes)
Spaces | Sets of objects and operations on those objects that obey specific constraints | Define logical and presentational views of several components
Scenarios | Sequences of events that modify states of a computation in order to accomplish some functional requirement | Detail the behavior of the DL services
Societies | Sets of communities and relationships (relations) among them | Define managers, responsible for running DL services; actors, that use those services; and relationships
5SL primitives and implementation
Model | Primitives | 5SL implementation
Streams Model | Text; video; audio; picture; software program | MIME types
Structures Model | Collection; catalog; hypertext; document; metadata; organization tools | XML and RDF schemas; Topic maps ML (XTM)
Spaces Model | User interface; index; retrieval model | MathML, UIML, XSL
Scenarios Model | Service; event; condition; action | Extended UML sequence diagrams; XML serialization
Societies Model | Community; service managers; actors; relationships; attributes; operations | XML serialization
Challenges with Approach
• The designer should know the 5S theory very well and be very familiar with the syntax and semantics of 5SL to be able to write correct 5SL files.
• It is difficult to get the big picture of a digital library just from a textual 5SL file.
Overall objective of 5SGraph: Help users model their own instances of a digital library (DL) in the 5S language (5SL).
• A simple modeling process which enables rapid generation of digital libraries is needed.
• Support non-expert users.
• Speed up the development process.
• Increase the quality of the final product.
5SGraph: A DL Modeling Tool
Goals of 5SGraph
• To help digital library designers understand the 5S model quickly and easily
• To help digital library designers build their own digital libraries without difficulty
• To help digital library designers transform their models into 5SL files automatically
• To help digital library designers understand, maintain, and upgrade existing digital library models conveniently
5SGraph: How does 5SGraph work?
• 5SGraph loads and displays a metamodel in a structured toolbox.
• The structured editor of 5SGraph provides a top-down visual environment for the DL designer.
• 5SGraph produces correct 5SL files according to the visual model built by the designer.
Overview of 5SGraph
[Figure: 5SGraph screenshot showing the workspace (instance model) and the structured toolbox (metamodel)]
Visualization Features
• The structured toolbox
  • Visualization of the metamodel
  • Visual components that can be added
• Truncated display of trees
  • Node-link representation
  • Deep-node problem
• Icons
  • Type/Instance relationship
  • Cardinality
Component Reuse
• Components can be loaded/saved.
  • Load and save sub-trees
• Component reuse saves time and effort.
  • Full reuse from component pool
  • Partial reuse: adapting components
Semantic Constraints
• There are inherent semantic constraints in the hierarchical structure of the 5S model.
• 5SGraph maintains these constraints and enforces them over the instance model to ensure correctness.
The Preliminary Test of 5SGraph
Research Questions
• Does the tool help users understand and use the 5S model to build their own digital libraries?
• Does the tool help users efficiently describe digital library models in the 5SL language?
• Are users satisfied with the tool?
Test Results
Measure | Task 1 | Task 2 | Task 3
Completion Rate (%) | 100 | 100 | 100
Mean Task Time (min) | 11.3 | 11.4 | 15.1
Mean Closeness to Expertise | 0.483 | 0.752 | 0.712
Mean Goal Achievement (%) | 97.4 | 97.4 | 98.2
Satisfaction and Usefulness The average rating of user satisfaction is 91%.
The average rating of usefulness of the tool is 92%.
Statistical analysis shows that the mean value of post-understanding of the 5S model is significantly greater than that of pre-understanding.
5SLGen: Automatic DL Generation
[Figure: generation workflow spanning Requirements (1), Analysis (2), Design (3), and Implementation (4). Labels: DL Expert, 5S Meta Model, DL Designer, 5SLGraph, 5SL DL Model, 5SLGen, component pool (ODLSearch, ODLBrowse, ODLRate, ODLReview, …), Tailored DL Services, and the users Practitioner, Researcher, Teacher]
The XML Log Standard for DLs
1. Marcos André Gonçalves, Ganesh Panchanathan, Unnikrishnan Ravindranathan, Aaron Krowne, Edward A. Fox, Filip Jagodzinski, and Lillian Cassel. The XML Log Standard for Digital Libraries: Analysis, Evolution, and Deployment. Proc. JCDL 2003, Third ACM/IEEE-CS Joint Conference on Digital Libraries, May 27-31, 2003, Houston, pp. 312-314.
2. Marcos André Gonçalves, Ming Luo, Rao Shen, Mir Farooq Ali, and Edward A. Fox. An XML Log Standard and Tool for Digital Library Logging Analysis. In Research and Advanced Technology for Digital Libraries, 6th European Conference, ECDL 2002, Rome, Italy, September 16-18, 2002, Proceedings, eds. Maristella Agosti and Constantino Thanos, LNCS 2458, Springer, pp. 129-143.
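The XML Log Standard for DLs
DL Concept | Dimension of Quality | Log can be used to measure?
Digital object | Accessibility | No
Digital object | Pertinence | Yes
Digital object | Preservability | No
Digital object | Relevance | Yes
Digital object | Similarity | No
Digital object | Significance | No
Digital object | Timeliness | No
Metadata specification | Accuracy | No
Metadata specification | Completeness | No
Metadata specification | Conformance | No
Collection | Completeness | No
Collection | Impact Factor | No
Catalog | Completeness | No
Catalog | Consistency | No
Repository | Completeness | No
Repository | Consistency | No
Services | Composability | No
Services | Efficiency | Yes
Services | Effectiveness | Yes
Services | Extensibility | No
Services | Reusability | No
Services | Reliability | Yes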
Defining Quality in Digital Libraries
• What’s a “good” digital library?
• Central concept: Quality!
• Hypothesis of this work: formal theory can help to define “what’s a good digital library” by:
  • Proposing and formalizing new quality measures for DLs
  • Formalizing traditional measures within our 5S framework
  • Contextualizing these measures within the Information Life Cycle
Information Life Cycle
[Figure: Quality and the Information Life Cycle. Phases: Creation, Distribution, Seeking, Utilization. Activities: Authoring, Modifying, Describing, Organizing, Indexing, Storing, Archiving, Networking, Accessing, Filtering, Searching, Browsing, Recommending, Retention, Mining, Discard. States: Active, Semi-Active, Inactive. Quality dimensions placed around the cycle: Accuracy, Completeness, Conformance, Similarity, Significance, Pertinence, Relevance, Timeliness, Accessibility, Believability, Preservability]
Defining Quality in Digital Libraries
DL Concept | Dimensions of Quality
Digital object | Accessibility, Pertinence, Preservability, Relevance, Similarity, Significance, Timeliness
Metadata specification | Accuracy, Completeness, Conformance
Collection | Completeness, Impact Factor
Catalog | Completeness, Consistency
Repository | Completeness, Consistency
Services | Composability, Efficiency, Effectiveness, Extensibility, Reusability, Reliability
Defining Quality in Digital Libraries
Structure of Presentation
For each quality metric:
• Discussion about the metric: meaning, use, etc.
• Definition of numerical measure
• Example of use
Digital Objects: Accessibility
A digital object is accessible by a DL actor or patron if it exists in the collections of the DL, the repository is able to retrieve the object, and:
1) an overly restrictive rights management property of a metadata specification does not exist for that object; or
2) if it exists, the property does not restrict access to the particular society to which the actor belongs or to that actor in particular.
Digital Objects: Accessibility
Accessibility $acc(do_x, ac_y)$ of digital object $do_x$ to actor $ac_y$ is:
$$acc(do_x, ac_y) = \begin{cases} 0, & \text{if there is no collection } C \text{ in the DL such that } do_x \in C \\ \dfrac{\sum_{z \in struct\_streams(do_x)} r_z(ac_y)}{|struct\_streams(do_x)|}, & \text{otherwise} \end{cases}$$
where $r_z(ac_y)$ is a rights management rule defined as an indicator function:
• 1, if z has no access constraints, or z has access constraints and $ac_y \in cm_z$, where $cm_z \in Soc(1)$ is a community that has the right to access z; and
• 0, otherwise.
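A small sketch of the accessibility measure, assuming each structured stream of an object is represented by the set of communities allowed to read it, with `None` standing for "no access constraints". Returning 0 for an object with no structured streams is an added assumption to avoid dividing by zero; the community names are made up.

```python
def rights_rule(allowed_communities, actor_communities):
    """r_z(ac_y): 1 if the stream is unconstrained or the actor is in an allowed community."""
    if allowed_communities is None:
        return 1
    return 1 if allowed_communities & actor_communities else 0


def accessibility(struct_streams, actor_communities, in_collection=True):
    """Fraction of an object's structured streams that the actor may access."""
    if not in_collection or not struct_streams:
        return 0.0
    granted = sum(rights_rule(allowed, actor_communities)
                  for allowed in struct_streams)
    return granted / len(struct_streams)
```

For example, a thesis with one open chapter and one chapter restricted to a campus community scores 0.5 for an outside reader and 1.0 for a campus reader.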
Digital Objects: Accessibility, VT ETD Collection
First Letter of Author’s Name | Unrestricted | Restricted | Mixed | Degree of accessibility for users not in the VT community
A | 164 | 50 | 5 | mix(0.5, 0.5, 0.167, 0.1875, 0.6)
B | 286 | 102 | 3 | mix(0.5, 0.5, 0.13)
C | 231 | 108 | 7 | mix(0.11, 0.5, 0.5, 0.5, 0.33, 0.09, 0.33)
D | 159 | 54 | 2 | mix(0.875, 0.666)
E | 67 | 26 | 1 | mix(0.5)
F | 88 | 39 | 2 | mix(0.375, 0.09)
G | 166 | 72 | 2 | mix(0.666, 0.5)
H | 225 | 91 | 3 | mix(0.66, 0.5, 0.235)
I | 20 | 8 | 1 | mix(0.5)
J | 84 | 36 | 2 | mix(0.5, 0.6)
K | 166 | 69 | 2 | mix(0.5, 0.5)
L | 189 | 68 | 6 | mix(0.153, 0.33, 0.5, 0.5, 0.94)
M | 299 | 115 | 9 | mix(0.5, 0.5, 0.5, 0.041, 0.5, 0.5, 0.5, 0.117, 0.5)
N | 74 | 16 | 1 | mix(0.8)
O | 45 | 19 | 2 | mix(0.5, 0.125)
P | 172 | 71 | 3 | mix(0, 0, 0.33)
Q | 13 | 6 | 0 | mix = none
R | 158 | 71 | 3 | mix(0.66, 0.5, 0.5)
S | 398 | 159 | 8 | mix(0.66, 0.5, 0.5, 0.6, 0.33, 0.66, 0.33, 0.6)
T | 111 | 49 | 1 | mix(0.13)
U | 9 | 7 | 0 | mix = none
V | 63 | 20 | 0 | mix = none
W | 191 | 81 | 5 | mix(0.5, 0.22, 0.38, 0.875, 0.5)
X | 11 | 5 | 0 | mix = none
Y | 38 | 9 | 3 | mix(0.5, 0.5, 0.125)
Z | 47 | 17 | 2 | mix(0.5, 0.5)
All | 3474 | 1368 | 73 |
Digital Objects: Pertinence
• Let $Inf(do_i)$ represent the "information" (not physical) carried by a digital object or any of its (metadata) descriptions, $IN(ac_j)$ be the information need of an actor, and $Context_{jk}$ be an amalgam of societal factors which can impact the judgment of pertinence by $ac_j$ at time k.
• These include, among others, time, place, the actor's history of interaction, task in hand, and a range of other factors that are not given explicitly but are implicit in the interaction and ambient environment.
Digital Objects: Pertinence
• Let's define two sub-communities of actors, users and external-judges $\subseteq Ac$, as:
  • users: set of actors with an information need who use DL services to try to fulfill/satisfy that need
  • external-judges: set of actors responsible for determining the relevance of a document to a query.
• Let's also require that a member of external-judges cannot judge the relevance of a document to a query representing her own information need, i.e., at the same point in time users $\cap$ external-judges $= \emptyset$.
Digital Objects: Pertinence
The pertinence of a digital object to a user $ac_j$ is an indicator function $Pertinence(do_i, ac_j)$ over $Inf(do_i) \times IN(ac_j) \times Context_{jk}$ defined as:
• 1, if $Inf(do_i)$ is judged by $ac_j$ to be informative with regard to $IN(ac_j)$ in context $Context_{jk}$;
• 0, otherwise.
Digital Objects: Preservability
[Figure: Factors in Preservability. Preservability depends on Fidelity and Obsolescence. Fidelity depends on the process, source format, and target format. Obsolescence depends on cost: software, hardware, evaluation, storage, identification, training, …]
Digital Objects: Preservability
• $Preservability(do_i, dl) = (\text{fidelity of migrating}(do_i, format_x, format_y),\ obsolescence(do_i, dl))$
• $fidelity(do_i, format_x, format_y) = 1 / distortion(p(format_x, format_y))$
• $obsolescence(do_i, dl)$ = cost of converting/migrating the object within the context of the specific dl
Digital Objects: Relevance
• $Relevance(do_i, q)$ = 1, if $do_i$ is judged by an external-judge to be relevant to q; 0, otherwise
• Relevance estimate: $Rel(do_i, q) = \dfrac{do_i \cdot q}{|do_i|\,|q|}$
• Objective, public, social notion
• Established by a general consensus in the field, not a subjective, private judgment by an actor with an information need
Digital Objects: Similarity
• Reflects the relatedness between two or more digital objects
• Used in many services (e.g., classification, find similar, etc.)
Digital Objects: Similarity Metrics
Content-based
• $Cosine(d_i, d_j) = \dfrac{d_i \cdot d_j}{|d_i|\,|d_j|}$
• $Bag\text{-}of\text{-}words(d_i, d_j) = \dfrac{|W(d_i) \cap W(d_j)|}{|W(d_i)|}$
• $Okapi(d_i, d_j)$ (see draft)
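A sketch of the two content-based measures with explicit definitions, assuming documents are plain strings tokenized on whitespace and that the cosine uses raw term frequencies. The slides do not say which term weighting was used, so the weighting and the tokenizer are assumptions.

```python
import math
from collections import Counter


def cosine(text_i, text_j):
    """Cosine of the angle between raw term-frequency vectors."""
    vi = Counter(text_i.lower().split())
    vj = Counter(text_j.lower().split())
    dot = sum(vi[w] * vj[w] for w in vi)
    norm = (math.sqrt(sum(c * c for c in vi.values()))
            * math.sqrt(sum(c * c for c in vj.values())))
    return dot / norm if norm else 0.0


def bag_of_words(text_i, text_j):
    """|W(d_i) ∩ W(d_j)| / |W(d_i)|: share of d_i's distinct words also in d_j."""
    wi = set(text_i.lower().split())
    wj = set(text_j.lower().split())
    return len(wi & wj) / len(wi) if wi else 0.0
```

Note that the bag-of-words measure is asymmetric because it normalizes by the first document only, while the cosine is symmetric.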
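Digital Objects: Similarity Metrics
Citation-based
• Co-citation: $cocit(d_i, d_j) = \dfrac{|P_{d_i} \cap P_{d_j}|}{\max_d |P_d|}$
• Bibliographic coupling: $bibcoup(d_i, d_j) = \dfrac{|C_{d_i} \cap C_{d_j}|}{\max_d |C_d|}$
• Amsler: $Amsler(d_i, d_j) = \dfrac{|(P_{d_i} \cup C_{d_i}) \cap (P_{d_j} \cup C_{d_j})|}{\max_d |P_d \cup C_d|}$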
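A sketch of the three citation-based measures, assuming the citation graph is a dictionary from each document to the set of documents it cites, that P_d is the set of documents citing d, that C_d is the set of documents d cites, and that each measure is normalized by the largest such set in the whole graph. The slide does not spell out these definitions, so treat them as one plausible reading.

```python
def all_docs(cites):
    """Every document that cites or is cited."""
    return set(cites).union(*cites.values())


def parents(cites, d):
    """P_d: documents that cite d."""
    return {p for p, cited in cites.items() if d in cited}


def children(cites, d):
    """C_d: documents cited by d."""
    return set(cites.get(d, ()))


def _normalized(overlap, sizes):
    largest = max(sizes, default=0)
    return overlap / largest if largest else 0.0


def cocit(cites, di, dj):
    overlap = len(parents(cites, di) & parents(cites, dj))
    return _normalized(overlap, (len(parents(cites, d)) for d in all_docs(cites)))


def bibcoup(cites, di, dj):
    overlap = len(children(cites, di) & children(cites, dj))
    return _normalized(overlap, (len(children(cites, d)) for d in all_docs(cites)))


def amsler(cites, di, dj):
    def linked(d):
        return parents(cites, d) | children(cites, d)
    overlap = len(linked(di) & linked(dj))
    return _normalized(overlap, (len(linked(d)) for d in all_docs(cites)))
```

Co-citation looks at who cites both documents, bibliographic coupling at what both documents cite, and Amsler combines the two neighborhoods.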
Digital Objects: Similarity
Highest degree of cocitation | Publication | Year
A unified lattice model for static analysis of programs by construction or approximation of fixpoints | 4th ACM SIGACT-SIGPLAN | 1977
Active messages: a mechanism for integrated communication and computation | 19th annual int. symposium on Computer architecture | 1992
Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers | 17th annual international symposium on Computer Architecture | 1990
Computer programming as an art | CACM | 1974
The SPLASH-2 programs: characterization and methodological considerations | 22nd annual international symposium on Computer architecture | 1995
ATOM: a system for building customized program analysis tools | ACM SIGPLAN '94 | 1994
Analysis of pointers and structures | Proceedings of the conference on Programming language design and implementation | 1990
Revised report on the algorithmic language scheme | ACM SIGPLAN Notices (Issue) | 1986
The directory-based cache coherence protocol for the DASH multiprocessor | 17th annual international symposium on Computer Architecture | 1990
Digital Objects: Similarity
Highest degree of bibliographic coupling | Publication | Date
Query evaluation techniques for large databases | CSUR | 1993
Compiler transformations for high-performance computing | CSUR | 1994
On randomization in sequential and distributed algorithms | CSUR | 1994
External memory algorithms and data structures: dealing with massive data | CSUR | 2001
A schema for interprocedural modification side-effect analysis with pointer aliasing | TOPLAS | 2001
Complexity and expressive power of logic programming | CSUR | 2001
Computational geometry: a retrospective | ACM symposium on Theory of computing | 1994
Research directions in object-oriented database systems | ACM SIGACT-SIGMOD-SIGART symposium |
Cache coherence in large-scale shared-memory multiprocessors: issues and comparisons | CSUR | 1993
Digital Objects: Similarity Distributions
Figure 3(a) Figure 3(b)
Digital Objects: Similarity
Application: Automatic classification with kNN
Evidence | Macro F1 (30%)
Abstract_BagOfWords | 0.195
Abstract_Cosine | 0.343
Abstract_Okapi | 0.339
Bib_Coup | 0.347
Amsler | 0.412
Co-citation | 0.273
Title_BagOfWords | 0.492
Title_Cosine | 0.525
Title_Okapi | 0.525
Digital Objects: Timeliness
age = (current time or time of last freshening) - (time of the latest citation), if the object has ever been cited
age = (current time or time of last freshening) - (creation time or publication time), if the object has never been cited
Time of last freshening = time of creation/publication of the most recent object in the collection to which do_i belongs
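The two-case age definition above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and measuring age in whole calendar years is an assumption, not something the slide specifies.

```python
from datetime import date
from typing import Iterable, Optional


def last_freshening(collection_dates: Iterable[date]) -> date:
    """Creation/publication time of the most recent object in the collection."""
    return max(collection_dates)


def timeliness_age(reference: date, created: date,
                   latest_citation: Optional[date] = None) -> int:
    """Age of a digital object in whole calendar years (assumed unit).

    reference: current time, or the collection's time of last freshening.
    created: creation or publication time of the object.
    latest_citation: time of the latest citation, or None if never cited.
    """
    # Cited objects are aged from their latest citation,
    # uncited objects from their creation/publication time.
    anchor = latest_citation if latest_citation is not None else created
    return reference.year - anchor.year
```

For example, with a collection last freshened in 2002, an uncited object published in 1998 has age 4, while the same object last cited in 2001 has age 1.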
Digital Objects: Timeliness
ACM Digital Library
Figure: number of documents by timeliness value.

Timeliness    No. of Documents
0             5165
1             7264
2             5162
3             4209
4             2716
5             2120
6             1698
7             1554
8             1372
9             1357
10            1019
Metadata Specifications and Metadata Format: Completeness
Refers to the degree to which values are present in the description, according to a metadata standard. As far as an individual property is concerned, only two situations are possible: either a value is assigned to the property in question, or not.
Metric: Completeness(ms_x) = 1 - (no. of missing attributes in ms_x / total attributes of the schema to which ms_x conforms)
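A minimal sketch of this metric, assuming a metadata specification is represented as a dictionary from attribute name to value and that an attribute counts as missing when it is absent, None, or empty. The representation and the function name are illustrative assumptions.

```python
from typing import Any, Dict, Sequence


def metadata_completeness(record: Dict[str, Any],
                          schema_attributes: Sequence[str]) -> float:
    """1 - (missing attributes / total attributes of the schema)."""
    if not schema_attributes:
        raise ValueError("schema must define at least one attribute")
    # Assumption: absent, None, empty-string and empty-list values are "missing".
    missing = sum(1 for name in schema_attributes
                  if record.get(name) in (None, "", []))
    return 1 - missing / len(schema_attributes)
```

A record that fills two of the four attributes of its schema scores 0.5.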
Metadata Specifications and Metadata Format: Completeness
OCLC NDLTD Union Catalog
Figure: completeness (scale 0 to 1) by catalog: GWUD, LSU, VTETD, MIT, UBC, PHYSNET, VTINDIV, VANDERBILT, NCSU, USASK, PITT, HKU, HUMBOLT, OCLC, BGMYU, DRESDEN, VIENNA, GATECH, ETSU, USF, MUENCHEN, UTENN, CCSD, WATERLOO, NSYSU, LAVAL, UPSALLA, CALTECH, UCL, WagUniv.
Metadata Specifications and Metadata Format: Conformance
An attribute att_xy of a metadata specification ms_x is conformant to a metadata format/standard if:
1) its value is from the domain defined for att_xy;
2) it appears at least once, if att_xy is marked as mandatory; and
3) it does not appear more than once, if it is not marked as repeatable.
Metric: Conformance(ms_x) = (sum, over each attribute att_xy of ms_x, of the degree of conformance of att_xy) / total attributes
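The three conditions and the metric can be sketched as follows. Several points are assumptions made for illustration: the degree of conformance is taken to be binary (1 if all three conditions hold, 0 otherwise), "total attributes" is taken to be the number of attributes in the schema, an absent optional attribute is counted as conformant, and the `AttributeRule` type is hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class AttributeRule:
    """Hypothetical schema entry for one attribute."""
    in_domain: Callable[[str], bool]  # membership test for the attribute's domain
    mandatory: bool = False
    repeatable: bool = False


def attribute_conformance(values: List[str], rule: AttributeRule) -> int:
    """Binary degree of conformance of one attribute (assumed)."""
    if rule.mandatory and len(values) == 0:
        return 0  # mandatory attribute must appear at least once
    if not rule.repeatable and len(values) > 1:
        return 0  # non-repeatable attribute must not appear more than once
    if not all(rule.in_domain(v) for v in values):
        return 0  # every value must come from the attribute's domain
    return 1


def conformance(record: Dict[str, List[str]],
                schema: Dict[str, AttributeRule]) -> float:
    """Sum of per-attribute conformance divided by the number of attributes."""
    if not schema:
        raise ValueError("schema must define at least one attribute")
    score = sum(attribute_conformance(record.get(name, []), rule)
                for name, rule in schema.items())
    return score / len(schema)
```

With a four-attribute schema in which one value falls outside its domain and one non-repeatable attribute is repeated, the record scores 2/4 = 0.5.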
Metadata Specifications and Metadata Format: Conformance
Based on ETD-MS
Figure: conformance (scale 0.75 to 1) by catalog: GWUD, LSU, VTETD, MIT, UBC, PHYSNET, VTINDIV, VANDERBILT, NCSU, USASK, PITT, HKU, HUMBOLT, OCLC, BGMYU, DRESDEN, VIENNA, GATECH, ETSU, USF, MUENCHEN, UTENN, CCSD, WATERLOO, NSYSU, LAVAL, UPSALLA, CALTECH, UCL, WagUniv.
Collection, Metadata Catalog, and Repository: Collection Completeness
A complete DL collection is one which contains all the pertinent existing digital objects.
Metric: completeness(C_x) = |C_x| / |'ideal collection'|
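As a small illustration, the ratio can be computed from collection sizes. Since a truly ideal collection is rarely known, the sketch assumes it is approximated by the size of some reference collection; the function name and that approximation are assumptions.

```python
def collection_completeness(collection_size: int, ideal_size: int) -> float:
    """|C_x| / |ideal collection|, using sizes only.

    ideal_size is assumed to come from a reference collection that
    stands in for the (unknowable) ideal collection.
    """
    if ideal_size <= 0:
        raise ValueError("ideal collection must be non-empty")
    if collection_size < 0:
        raise ValueError("collection size cannot be negative")
    return collection_size / ideal_size
```

A collection holding 300 of 1000 pertinent objects has completeness 0.3; the reference collection itself scores 1.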
Collection, Metadata Catalog, and Repository: Collection Completeness
Collection                                    Degree of Completeness
ACM Guide                                     1
DBLP                                          0.652
CITIDEL (DBLP + ACM + NCSTRL + NDLTD-CS)      0.467
IEEE-DL                                       0.168
ACM-DL                                        0.146

ACM Guide contents
Journal (articles)       256527
Proceeding (papers)      299850
Book (chapters)          107870
Thesis                   46098
Tech. Reports            25081
Bibliography             2
Play                     1
Total                    735429
Catalog Completeness/Consistency
Completeness(DMC) = 1 - (no. of digital objects without a metadata specification / size of the described collection)
Consistency(DMC) = 0, if at least one set of metadata specifications is assigned to more than one digital object; 1, otherwise
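Both catalog metrics can be sketched under one assumed representation: the catalog maps each digital object identifier to the set of metadata specifications that describe it. The identifiers and function names are hypothetical.

```python
from typing import Dict, FrozenSet, Sequence

Catalog = Dict[str, FrozenSet[str]]  # digital object id -> metadata specification ids


def catalog_completeness(collection_ids: Sequence[str], catalog: Catalog) -> float:
    """1 - (objects without a metadata specification / collection size)."""
    if not collection_ids:
        raise ValueError("described collection must be non-empty")
    undescribed = sum(1 for do in collection_ids if not catalog.get(do))
    return 1 - undescribed / len(collection_ids)


def catalog_consistency(catalog: Catalog) -> int:
    """0 if some set of specifications is assigned to more than one object, else 1."""
    assigned = [specs for specs in catalog.values() if specs]
    return 1 if len(assigned) == len(set(assigned)) else 0
```

For a four-object collection in which two objects have no specification, completeness is 0.5; consistency drops to 0 as soon as two objects share the same set of specifications.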
Repository Completeness and Consistency
Completeness(Rep) = number of collections in the repository / ideal number of collections
Consistency(Rep) = 1, if the consistency of all the repository's catalogs with respect to their described collections is 1; 0, otherwise
Services: Efficiency/Effectiveness
Effectiveness
• Very common measures: Precision, Recall, F1, 10-precision, R-Precision
• Other services may have different measures, e.g., Recommending
Efficiency
• Let t(e) be the time of an event e, and let e_ix and e_fx be the first and the last event of service se_x. The efficiency of service se_x is defined as: Efficiency(se_x) = t(e_fx) - t(e_ix)
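Both kinds of measure can be sketched briefly. The effectiveness function uses the standard set-based definitions of precision, recall, and F1; the efficiency function assumes a service run is logged as (event name, timestamp) pairs, which is an illustrative representation rather than a prescribed log format.

```python
from typing import Hashable, Iterable, List, Tuple


def precision_recall_f1(retrieved: Iterable[Hashable],
                        relevant: Iterable[Hashable]) -> Tuple[float, float, float]:
    """Set-based precision, recall and F1 for one query."""
    retrieved_set, relevant_set = set(retrieved), set(relevant)
    hits = len(retrieved_set & relevant_set)
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def efficiency(events: List[Tuple[str, float]]) -> float:
    """Time of the last event minus time of the first event of a service run."""
    if not events:
        raise ValueError("a service run needs at least one event")
    times = [t for _, t in events]
    return max(times) - min(times)
```

For example, retrieving items {1, 2, 3, 4} when {2, 4, 6} are relevant gives precision 1/2 and recall 2/3; a run whose first event is at 10.0 s and last at 10.25 s has efficiency 0.25 s (lower is better under this definition).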
Services: Extensibility and Reusability
• A service Y reuses a service X if the behavior of Y incorporates the behavior of X.
• A service Y extends a service X if it subsumes the behavior of X and potentially includes additional subflows of events.
Metrics
• Macro-Reusability(Serv) = (sum of reused(se_i) over all se_i in Serv) / |Serv|, where reused is an indicator function defined as: 1, if there exist sm_j, se_j such that se_j reuses se_i; 0, otherwise.
• Micro-Reusability(Serv) = (sum of LOC(sm_x) * reused(se_i) over all sm_x in SM and se_i in Serv such that sm_x runs se_i) / (sum of LOC(sm) over all sm in SM), where LOC corresponds to the number of lines of code of a service manager.
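The two metrics can be sketched on a toy example. The representation is an assumption: each service is taken to be run by exactly one service manager, so lines of code are recorded per service, and reuse is given as a mapping from each service to the services it reuses. The service names and numbers below are hypothetical.

```python
from typing import Dict, Set


def reused_services(reuses: Dict[str, Set[str]]) -> Set[str]:
    """Services that at least one other service reuses."""
    return set().union(*reuses.values())


def macro_reusability(loc: Dict[str, int], reuses: Dict[str, Set[str]]) -> float:
    """Fraction of services that are reused by some other service."""
    reused = reused_services(reuses)
    return sum(1 for service in loc if service in reused) / len(loc)


def micro_reusability(loc: Dict[str, int], reuses: Dict[str, Set[str]]) -> float:
    """Fraction of all lines of code that belong to reused services."""
    reused = reused_services(reuses)
    return sum(n for service, n in loc.items() if service in reused) / sum(loc.values())


# Hypothetical data: lines of code per service, and which services each one reuses.
loc = {"index": 600, "search": 300, "browse": 100}
reuses = {"search": {"index"}, "browse": {"index"}}
```

Here one of three services is reused, so macro-reusability is 1/3, while micro-reusability is 600/1000 = 0.6 because the reused service carries most of the code.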
Services: Extensibility and Reusability
Service                        Component Based   LOC for implementing service   LOC reused from component   Total LOC
Searching – Back-end           Yes               -                              1650                        1650
Search Wrapping                No                100                            -                           100
Recommending                   Yes               -                              700                         700
Recommend Wrapping             No                200                            -                           200
Annotating – Back-end          Yes               50                             600                         600
Annotate Wrapping              No                50                             -                           50
Union Catalog                  Yes               -                              680                         680
User Interface Service         No                1800                           -                           1600
Browsing                       No                1390                           -                           1390
Comparing (objects)            No                650                            -                           650
Marking Items                  No                550                            -                           550
Items of Interest              No                480                            -                           480
Recent Searches/Discussions    No                230                            -                           230
Collections Description        No                250                            -                           250
User Management                No                600                            -                           600
Framework Code                 No                2000                           -                           2000
Total                                            8280                           3630                        11910

Macro-Reusability = 3/16 = 0.187
Micro-Reusability = 3630 / 11910 = 0.304
Services: Reliability
Def: Reliability = 1 - (no. of failures / no. of accesses)
A failure is an event that:
• was supposed to happen in a scenario but did not;
• did happen, but did not execute some of its operations;
• did happen, where the operations were executed, but the results were not the expected ones.
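The definition translates directly into code. A minimal sketch, with a hypothetical function name, assuming failure and access counts have already been extracted from the service logs:

```python
def reliability(failures: int, accesses: int) -> float:
    """1 - (no. of failures / no. of accesses) for one service."""
    if accesses <= 0:
        raise ValueError("reliability is undefined without any accesses")
    if not 0 <= failures <= accesses:
        raise ValueError("failures must be between 0 and the number of accesses")
    return 1 - failures / accesses
```

Using the counts reported for CITIDEL, `reliability(4130, 153369)` (browsing) is approximately 0.973 and `reliability(0, 980)` (contributing) is exactly 1.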
Services: Reliability
CITIDEL

CITIDEL service           No. of failures/no. of accesses   Reliability
searching                 73/14370                          0.994
browsing                  4130/153369                       0.973
requesting (getobject)    1569/318036                       0.995
structured search         214/752                           0.66
contributing              0/980                             1
Quality in DLs: Conclusions and Future Work
• Development of more "usage"-oriented measures
   • Current measures are very much "system-oriented"
• Development of a Quality ToolKit (5SQual) for DL managers with the following features:
   • Mapping tool to map local log format to standard XML Log format
   • Components to implement all measures
   • Visualization
   • Broken into several logical pieces to be used in the different phases of the information cycle
ETANA-DL
• Apply 5S to the archaeological domain
• Identified requirements for future versions of system
• Extensible and componentized approach for handling heterogeneous archaeological data from disparate sources
• Rapidly generated prototype archaeological DL
• Making primary archaeological data available without significant delay
Modeling ETANA-DL – An Archaeological DL Meta-model
Figure: 5S meta-model for an archaeological DL, with these labeled elements:
• Stream model: Text, Video, Audio, Drawing, Photo, 3D
• Structure model: Region, *Site, *Partition, *Sub-partition, *Locus, *Container, *Artifact; Metadata; Taxonomies (Temporal, Spatial, Artifact-specific)
• Space model: Geographic space, Metric space, User interface
• Society model: Archaeologist, General public, Service Manager
• Scenario model: Services (Information Satisfaction, Value added, Repository building, Domain specific)
Modeling ETANA-DL – The ETANA-DL model
Figure: the meta-model instantiated for ETANA-DL, with these labeled elements:
• Stream model: Field record, locus sheet; Figurine image (photo); Site/field plan (drawing); Preliminary/Final Report (application/pdf)
• Structure model, by site:
   • Jordan, Umayri: *Field, *Square, *Locus, *Pail, *Bone, *Figurine
   • Jordan Valley, Nimrin: *Quadrant, *Square, *Locus, *Bag
   • Southern Israel, Halif: *Field, *Area, *Locus, *Basket, *Seed
• Taxonomies: Archaeological periods, Bone type, Seed species, Spatial
• Space model: Site-specific coordinate system, Web interface, Vector space
• Society model: Archaeologist, General public, ETANA-DL Service Manager
• Scenario model: Services (Searching, Browsing; Annotation, binding; Harvesting, Converting; Object comparison, marking item for analysis)
Conclusions
• Vision
• Integration
• 5S
• DL Curricula, Courses, Books
Questions?