
Controlled Vocabulary Working Group Virtual Water Cooler Session April 6-7, 2009 Moderator: John Porter http://webmeeting.dimdim.com/portal/JoinForm.action?confKey=jhp7e


Page 1:

Controlled Vocabulary Working Group

Virtual Water Cooler Session April 6-7, 2009

Moderator: John Porter
http://webmeeting.dimdim.com/portal/JoinForm.action?confKey=jhp7e

Page 2: Goals for this VTC

► Brief review of activities
► Get feedback on “LTER Data Keywords” draft list
► Discuss process for managing keyword list
► Next steps? – Taxonomies, Tools etc.
► What should we do at the ASM meeting?

Page 3:

Carbon Dataset 1

Carbon Dataset 2

Carbon Dataset 3

Disjointed keywords make it hard to locate similar datasets

Page 4:

Carbon Dataset 1

Carbon Dataset 2

Carbon Dataset 3

Overlapping keywords make it easier to locate similar datasets

Note that the purpose of keywords and a controlled vocabulary is not to provide the best possible description of a particular dataset, but to provide a mechanism for appropriate groupings of datasets.

Page 5: The Problem

► Inconsistent, disjunct and sparse keywords negatively impact data discovery

72.2% of all keywords are used at only a single LTER site

90% of all keywords are used at 4 or fewer LTER sites
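Figures like the two percentages above can be obtained by counting, for each keyword, how many sites use it. The following is a minimal sketch of such a tally, not the working group's actual analysis: the function name, the site codes, and the sample keywords are hypothetical, and folding keywords to lowercase before counting is an assumption.

```python
from collections import defaultdict


def site_usage_share(site_keywords, max_sites):
    """Fraction of distinct keywords that are used at `max_sites` or fewer sites."""
    sites_per_keyword = defaultdict(set)
    for site, keywords in site_keywords.items():
        for keyword in keywords:
            # Assumption: case and surrounding whitespace are ignored.
            sites_per_keyword[keyword.strip().lower()].add(site)
    if not sites_per_keyword:
        return 0.0
    hits = sum(1 for sites in sites_per_keyword.values() if len(sites) <= max_sites)
    return hits / len(sites_per_keyword)


# Made-up input: site codes and keywords are for illustration only.
example = {
    "SITE_A": ["carbon", "soil", "litterfall"],
    "SITE_B": ["carbon", "C flux"],
    "SITE_C": ["Carbon", "soil"],
}

share_single_site = site_usage_share(example, 1)
share_two_or_fewer = site_usage_share(example, 2)
```

Calling the function with `max_sites=1` and `max_sites=4` on a real keyword inventory would give the analogues of the two statistics on this slide.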

Page 6: Goals for the Controlled Vocabulary Group

► Aid the discovery of data by researchers
  - Consistent, broadly applied keywords
  - Develop “browseable” structures (taxonomies, thesauri, ontologies)
► Aid in the creation of high-quality metadata
► Make it easier for LTER data to interoperate with other data systems

Page 7: Past Activities

► Research
  - A variety of studies regarding which words are used where
► Improvement of existing systems
  - Metacat drop down list now features the most common existing keywords
► Discussion of possible tools to:
  - Aid in keywording
  - Aid in searching

Page 8: Draft List

► Creation of a draft list of ~650 words for an LTER-wide controlled vocabulary
  - Words must be used at two or more sites, OR
  - Words must be used at one or more sites and also be found in either NBII, GCMD, the KNB/Metacat browse list or recent Metacat searches
  - Excluded were species names and names of geographic locations, which probably belong in separate lists
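The inclusion rule on this slide can be written as a small predicate. This is a sketch under stated assumptions rather than the procedure the group actually ran: the function name, the data layout (a mapping from keyword to the set of sites using it, plus named external word lists), and all sample values are hypothetical.

```python
def is_candidate(keyword, sites_using, external_lists, excluded=frozenset()):
    """Apply the slide's rule: two or more sites, OR one or more sites plus an external list."""
    if keyword in excluded:
        # Species names and geographic locations are kept out of this list.
        return False
    site_count = len(sites_using.get(keyword, ()))
    if site_count >= 2:
        return True
    in_external = any(keyword in words for words in external_lists.values())
    return site_count >= 1 and in_external


# Made-up data for illustration only.
sites_using = {
    "biomass": {"SITE_A", "SITE_B"},
    "phenology": {"SITE_A"},
    "plot 7 survey": {"SITE_C"},
    "quercus alba": {"SITE_A", "SITE_B"},
}
external_lists = {
    "NBII": {"phenology"},
    "GCMD": {"biomass"},
}
excluded = frozenset({"quercus alba"})  # stands in for a separate species list

draft_list = sorted(
    k for k in sites_using if is_candidate(k, sites_using, external_lists, excluded)
)
```

In this example "plot 7 survey" is dropped because it is used at a single site and appears in no external list, while "quercus alba" is dropped by the exclusion set even though two sites use it.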

Page 9: Draft List

► Words on the candidate list were edited to create “Preferred forms” that comply with NISO-Z39.19-2005
  - Nouns are plural if you would count them, singular if they are an amount
  - Removal of hyphenated words when possible
  - Creation of a “synonym ring” linking extant forms with preferred forms (~150 terms)
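A synonym ring of this kind can be represented as a mapping from each preferred form to the extant forms it replaces. The sketch below assumes that representation; the ring entries, function names, and the choice to pass unknown keywords through unchanged (lowercased) are all hypothetical and are not taken from the draft list.

```python
# Hypothetical ring entries: preferred form -> extant variants.
SYNONYM_RING = {
    "carbon dioxide": ["CO2", "carbon-dioxide"],
    "soil moisture": ["soil-moisture", "soil water content"],
}


def build_lookup(ring):
    """Map every variant (and each preferred form itself) to its preferred form."""
    lookup = {}
    for preferred, variants in ring.items():
        lookup[preferred.lower()] = preferred
        for variant in variants:
            lookup[variant.lower()] = preferred
    return lookup


def normalize(keywords, lookup):
    """Replace known variants with preferred forms; keep unknown keywords as given."""
    normalized = set()
    for keyword in keywords:
        key = keyword.strip().lower()
        normalized.add(lookup.get(key, key))
    return sorted(normalized)


lookup = build_lookup(SYNONYM_RING)
dataset_1 = normalize(["CO2", "birds"], lookup)
dataset_2 = normalize(["carbon-dioxide", "Soil-Moisture"], lookup)
shared = set(dataset_1) & set(dataset_2)
```

Before normalization the two example datasets share no keyword; afterwards both carry "carbon dioxide", which is the overlap that makes similar datasets easier to group.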

Page 10: A Logical Next Step

► The draft list needs to be formalized in a database that includes (NISO Z39.19 sections 11.1.4 & ):
  - term
  - source(s) consulted for terms and entry terms
  - scope note
  - USED FOR references – to indicate which synonyms, near synonyms, and other expressions are covered by the term
  - nondisplayable variations, e.g., common spelling errors
  - broader terms
  - narrower terms
  - related terms
  - locally established relationships
  - category or classification number
  - history note, including minimally the date added, as well as the record of changes, if any

Some elements support development of hierarchical taxonomies and thesauri
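The elements listed on this slide map naturally onto one record per term. The sketch below shows one possible shape for such a record; the class name, field names, and sample values are hypothetical, and a real implementation would more likely be a database table than an in-memory class.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class TermRecord:
    """One controlled-vocabulary entry, with one field per element on the slide."""
    term: str
    sources: List[str] = field(default_factory=list)          # source(s) consulted
    scope_note: str = ""
    used_for: List[str] = field(default_factory=list)         # synonyms covered by the term
    hidden_variants: List[str] = field(default_factory=list)  # e.g. common spelling errors
    broader: List[str] = field(default_factory=list)
    narrower: List[str] = field(default_factory=list)
    related: List[str] = field(default_factory=list)
    local_relationships: List[str] = field(default_factory=list)
    category: str = ""                                         # category or classification number
    date_added: date = field(default_factory=date.today)
    history: List[str] = field(default_factory=list)          # record of changes, if any

    def record_change(self, note, when=None):
        """Append a dated entry to the history note."""
        when = when or date.today()
        self.history.append(f"{when.isoformat()}: {note}")


# Made-up example entry.
record = TermRecord(
    term="carbon dioxide",
    used_for=["CO2"],
    hidden_variants=["carbon dioxyde"],
    broader=["greenhouse gases"],
    date_added=date(2009, 4, 6),
)
record.record_change("added USED FOR reference 'CO2'", date(2009, 4, 7))
```

The `broader`, `narrower`, and `related` fields are the ones that would support building a hierarchical taxonomy or thesaurus later on.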

Page 11: Issues

► Who should make decisions regarding the content of the list (11.3 in NISO Z39.19)?
► How should site-specific terms be handled?
  - Include in list, but use Scope or Category elements to distinguish
► What steps are needed to create a hierarchical polytaxonomy or thesaurus?

Page 12: Discussion Topics

► Get feedback on the draft list
► How (who) should manage the keyword list?
► Next steps? – Taxonomies, Tools etc.
► What should we do at the ASM meeting to move the process forward?

Page 13: Day 1 – Discussion Points

► Generally pleased with the list. Issues:
  - Site-specific words
  - Human dimensions largely absent
  - Locations
  - Homographs
► Next Steps:
  - Give sites a chance to propose addition, deletion or substitution of terms in the list, and/or additions to the synonym ring
  - Vote on changes

Page 14: Day 1 – Discussion Points

► What to do at ASM meeting?
  - Session presenting different approaches
    ► Lists through ontologies
  - Session: New Tools for Locating Data
    ► Spec out tools for keywording and searching
  - Session “How to find and use data”