Controlled Vocabulary Working Group - 2013
Presented by John Porter
Goal
Make it easy for researchers to find the data they need from LTER repositories by:
Enhancing searches through the use of a thesaurus that provides synonyms, narrower terms and related terms
Creating a browsable structure for locating datasets
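The thesaurus-based search enhancement can be pictured as query expansion: a search term is resolved to its preferred term, then widened with that term's synonyms and everything narrower. The sketch below is a minimal illustration under stated assumptions, not the LTER or TemaTres implementation; the `Concept` class, `expand_query` function and the sample entries are hypothetical and not taken from the actual Controlled Vocabulary.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """A preferred term and its thesaurus relationships (hypothetical structure)."""
    preferred: str
    use_for: set[str] = field(default_factory=set)   # synonyms ("use for" terms)
    narrower: set[str] = field(default_factory=set)  # preferred labels of narrower concepts
    related: set[str] = field(default_factory=set)   # preferred labels of related concepts


def expand_query(term: str, concepts: list[Concept], include_related: bool = False) -> set[str]:
    """Return the search terms implied by `term`: itself, its synonyms, and all narrower terms."""
    by_preferred = {c.preferred.lower(): c for c in concepts}
    # Any label (preferred or synonym) resolves to its preferred term.
    to_preferred = {label.lower(): c.preferred.lower()
                    for c in concepts for label in {c.preferred, *c.use_for}}
    key = term.strip().lower()
    if key not in to_preferred:
        return {key}  # not in the vocabulary: search for the literal term only
    pref = to_preferred[key]
    expanded = {pref} | {s.lower() for s in by_preferred[pref].use_for}
    # Walk narrower terms transitively, picking up their synonyms as well.
    stack = list(by_preferred[pref].narrower)
    while stack:
        narrower = stack.pop().lower()
        if narrower in expanded:
            continue
        expanded.add(narrower)
        if narrower in by_preferred:
            child = by_preferred[narrower]
            expanded |= {s.lower() for s in child.use_for}
            stack.extend(child.narrower)
    if include_related:
        # Related terms are looser matches, so they are opt-in.
        expanded |= {r.lower() for r in by_preferred[pref].related}
    return expanded


# Illustrative entries only; not copied from the LTER Controlled Vocabulary.
VOCAB = [
    Concept("biomass",
            narrower={"aboveground biomass", "belowground biomass"},
            related={"net primary production"}),
    Concept("aboveground biomass", use_for={"above-ground biomass"}),
]
```

With these sample entries, a search for "biomass" also matches datasets keyed only with "above-ground biomass", because the narrower term and its synonym are pulled in. Related terms are kept optional since they broaden recall at the cost of precision.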
2013 Goals
Enhance the term list to incorporate:
New terms suggested by sites
Frequently searched terms
Frequently used terms
Terms related to human activities (social science)
More synonyms for existing terms that are found in LTER Metadata
Needed: establish clear criteria (Best Practices) for evaluating candidate terms
Goals
Add definitions for terms in the Controlled Vocabulary
Create plans for dealing with taxonomic names and place names, which are not currently part of the Controlled Vocabulary
Workshop – May 2013
Pre-Workshop
Queried LTER sites for new candidate terms – Melendez, Henshaw, Vanderbilt
Queried existing documents for words not currently in the Controlled Vocabulary – Gastil-Buhl
Queried logs for search terms used by Metacat users – Costa
Updated the TemaTres software to the latest version – Porter
Identified online sources for definitions – O’Brien, Vanderbilt
Investigated taxonomic web services and gazetteers – Gries
Note: for taxonomic names and places, the group favors using the EML Taxonomic and Geographic Coverage elements rather than keywords
Workshop Participants 2013
LTER Information Managers: Margaret O’Brien, Kristen Vanderbilt, Donald Henshaw and John Porter
Professional Librarians from UVA: Sherry Lake and Ivey Glendon
Added a lot to our discussions
“About” vs. “contains” taxonomies: our focus is describing what datasets contain; “about” is much harder to define for data
Workshop Results 2013
New Terms
~230 terms were suggested by 4 sites
~75 terms were accepted and added to the LTER Controlled Vocabulary
A reason for rejection was given for each term not added
~25 additional terms were added based on use at 3 or more LTER sites, or at 2 or more sites with more than 10 datasets
Several suggested terms were added as non-preferred (UF) terms
Definitions
309 new definitions added
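The usage rule behind the ~25 additional terms can be written as a small predicate over keyword counts. This is a sketch, not the working group's actual script: the record layout, the function names `usage_by_site` and `meets_usage_rule`, and the site codes used in testing are hypothetical. The phrase "2 or more sites with more than 10 datasets" is read here as more than 10 datasets in total across those sites; a per-site reading would need a different check.

```python
from collections import defaultdict


def usage_by_site(records):
    """Tally keyword use. `records` holds (site, dataset_id, keywords) tuples.

    Returns {term: {site: number_of_datasets_using_term}}.
    """
    usage = defaultdict(lambda: defaultdict(int))
    for site, _dataset_id, keywords in records:
        # Normalize case/whitespace and count each term once per dataset.
        for term in {k.strip().lower() for k in keywords}:
            usage[term][site] += 1
    return {term: dict(sites) for term, sites in usage.items()}


def meets_usage_rule(datasets_per_site, min_sites=3, min_sites_heavy=2, min_datasets=10):
    """True if used at >= 3 sites, or at >= 2 sites with > 10 datasets in total (assumed reading)."""
    sites_using = [site for site, n in datasets_per_site.items() if n > 0]
    if len(sites_using) >= min_sites:
        return True
    total = sum(datasets_per_site[site] for site in sites_using)
    return len(sites_using) >= min_sites_heavy and total > min_datasets
```

Candidate terms would then be `[t for t, counts in usage_by_site(records).items() if meets_usage_rule(counts)]`, with `records` harvested from site EML keyword lists.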
Controlled Vocabulary Status
710 total preferred terms
200 synonyms (“use for” terms)
363 total definitions
Important Workshop Activities - 2013
Developed improved Best Practices for identifying additional terms for inclusion (http://im.lternet.edu/VocabBestPractices), including a table that lays out grounds for rejecting particular words
| What | Rationale | Do’s | Problem abbreviation |
|---|---|---|---|
| Keywords should be applied to a number of datasets across the LTER Network. | Data discovery is the goal, so keywords that find data are most useful. | Propose keywords that are used at several other sites and in numerous datasets. | NR - not repeated in multiple datasets |
| Keywords should be used at more than one site. | A goal is to enable cross-site searching. | Propose keywords that are used at several other sites. | A - absent from other sites |
| Avoid proposing stand-alone adjectives. | Stand-alone adjectives imply an “of what” question: for example, “aboveground” raises the question “aboveground what?” | Propose nouns or possibly verbs, but not stand-alone adjectives. Preferred terms can include an adjective with an object (e.g., aboveground biomass). | ADJ - stand-alone adjective |
| Be specific. | Vague or ill-defined terms are hard to assign consistently. | Use specific, unambiguous and well-defined terms. | V - vague |
| Avoid duplicating concepts already in the Controlled Vocabulary. | Duplicative keywords lead to inconsistent keyword assignments. | Avoid duplication of nearly equivalent terms. | AWE - adequate alternative word exists |
| Keywords should be well defined. | Without definition and context, some technical terms may be difficult to assess or place. | Provide good definitions. | NC - needs clarification or better definition |
| Proposed synonyms should correspond exactly to the preferred term. | Synonyms should not refer to different concepts than the associated preferred term. | Select synonyms that are exact matches for the concept described by the preferred term. | NS - not a synonym |
| Keywords should be terms that users frequently search on. | Keywords that users do not search for are not particularly useful. | Propose keywords that are frequently used in searches. | NU - not used for search |
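Only some of the problem abbreviations in the Best Practices table can be checked from counts; ADJ, V, NC and NS need a human reviewer. The sketch below encodes the abbreviations as an enum and pre-flags the countable ones (NR, A, AWE for exact matches, NU). It is a hypothetical helper, not an LTER tool: the function name, argument layout and both thresholds (`min_datasets`, `min_searches`) are placeholders, not working-group policy.

```python
from enum import Enum


class Rejection(Enum):
    """Problem abbreviations from the Best Practices table."""
    NR = "not repeated in multiple datasets"
    A = "absent from other sites"
    ADJ = "stand-alone adjective"
    V = "vague"
    AWE = "adequate alternative word exists"
    NC = "needs clarification or better definition"
    NS = "not a synonym"
    NU = "not used for search"


def screen_candidate(term, datasets_per_site, search_counts, existing_labels,
                     min_datasets=2, min_searches=1):
    """Pre-flag a candidate term using only count-based criteria.

    datasets_per_site: {site: number of datasets using the term}
    search_counts:     {lowercase term: number of searches in the logs}
    existing_labels:   lowercase preferred and "use for" labels already in the vocabulary
    ADJ, V, NC and NS require human judgment and are never returned here.
    """
    key = term.strip().lower()
    flags = []
    if sum(datasets_per_site.values()) < min_datasets:
        flags.append(Rejection.NR)
    if sum(1 for n in datasets_per_site.values() if n > 0) < 2:
        flags.append(Rejection.A)
    if key in existing_labels:
        # Catches exact duplicates only; near-equivalents still need review.
        flags.append(Rejection.AWE)
    if search_counts.get(key, 0) < min_searches:
        flags.append(Rejection.NU)
    return flags
```

For a candidate such as “aboveground” this would flag NR, A and NU from counts alone, but the ADJ problem the table highlights would still have to be caught by a reviewer.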
Vision
Refining the “Vision” for how the Controlled Vocabulary can be used to make PASTA and other NIS elements more effective, and to link to other efforts such as DataONE, LODE and EnvThes
Optional workshop yesterday – tasks identified:
Identify systems and software tools that effectively exploit controlled vocabularies for searching/browsing and ranking
Metrics tools: help identify specific datasets that could benefit from additional keywords
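A metrics tool of the kind listed above could start by counting how many Controlled Vocabulary terms each dataset already carries and listing the weakest first. This is a sketch under assumptions: the function name, the input layout (keywords per dataset, a set of preferred terms, a use-for mapping) and the default threshold of three matched terms are all hypothetical.

```python
def datasets_needing_keywords(datasets, preferred, use_for, min_matches=3):
    """List datasets with fewer than `min_matches` Controlled Vocabulary terms.

    datasets:  {dataset_id: [keyword, ...]}
    preferred: set of lowercase preferred terms
    use_for:   {lowercase non-preferred term: preferred term}
    Returns [(dataset_id, sorted matched preferred terms)], fewest matches first.
    """
    flagged = []
    for dataset_id, keywords in datasets.items():
        matched = set()
        for keyword in keywords:
            k = keyword.strip().lower()
            if k in preferred:
                matched.add(k)
            elif k in use_for:
                matched.add(use_for[k])  # credit synonyms to their preferred term
        if len(matched) < min_matches:
            flagged.append((dataset_id, sorted(matched)))
    flagged.sort(key=lambda item: (len(item[1]), item[0]))
    return flagged
```

Sorting by the number of matched terms puts datasets with no vocabulary keywords at the top, which is where added keywords would most improve discovery.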
Help us out!
During discussions today and tomorrow, think about how the Controlled Vocabulary can be leveraged
Incorporate terms from the Controlled Vocabulary into your site’s EML documents
ASK us if you need help! We have tools.
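In EML, keywords drawn from a thesaurus go in a `keywordSet` whose trailing `keywordThesaurus` element names the vocabulary. The sketch below builds such a set with Python’s standard `xml.etree.ElementTree`, swapping non-preferred terms for their preferred term and returning unmatched keywords separately. The function name and input layout are hypothetical, and the thesaurus string "LTER Controlled Vocabulary" is an assumption to check against current LTER EML best practices.

```python
import xml.etree.ElementTree as ET


def lter_keyword_set(keywords, preferred, use_for, thesaurus="LTER Controlled Vocabulary"):
    """Build an EML <keywordSet> holding only Controlled Vocabulary terms.

    preferred: set of lowercase preferred terms
    use_for:   {lowercase non-preferred term: preferred term}
    Returns (element or None, keywords not found in the vocabulary).
    """
    kw_set = ET.Element("keywordSet")
    leftovers, seen = [], set()
    for kw in keywords:
        k = kw.strip().lower()
        term = k if k in preferred else use_for.get(k)
        if term is None:
            leftovers.append(kw)  # can go in a separate keywordSet without a thesaurus
            continue
        if term not in seen:  # avoid repeating a term reached via a synonym
            seen.add(term)
            ET.SubElement(kw_set, "keyword").text = term
    if not seen:
        return None, leftovers  # a keywordSet needs at least one keyword
    # keywordThesaurus follows the keyword elements.
    ET.SubElement(kw_set, "keywordThesaurus").text = thesaurus
    return kw_set, leftovers
```

`ET.tostring(kw_set, encoding="unicode")` then yields the fragment to paste into the dataset’s EML.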