A demonstration of transparent and scalable OpenURL quality metrics for use in promoting metadata consistency across content providers
Adam Chandler, Cornell University Library
Cornell University Library, Metadata Working Group Forum, 16 October 2009

Page 1

A demonstration of transparent and scalable OpenURL quality metrics for use in promoting metadata consistency across content providers

Adam Chandler
Cornell University Library

Cornell University Library, Metadata Working Group Forum
16 October 2009

Page 2

OpenURL model

Page 3

OpenURL model cont.

incoming OpenURL

http://linkresolver.library.cornell.edu:4550/resserv?&url_ver=z39.88-2004&url_ctx_fmt=info:ofi/fmt:kev:mtx:ctx&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.atitle=item-level+usage+statistics+a+review+of+current+practices+and+recommendations+for+normalization+and+exchange&rft.auinit=c&rft.aulast=merk&rft.date=2009&rft.epage=162&rft.genre=article&rft.issn=0737-8831&rft.issue=1&rft.place=bingley&rft.pub=emerald+group+publishing+limited&rft.spage=151&rft.stitle=libr+hi+tech&rft.title=library+hi+tech&rft.volume=27&rfr_id=info:sid/www.isinet.com:wok:wos&rft.au=scholze,+f&rft.au=windisch,+n&rft_id=info:doi/10.1108%2f07378830910942991/
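As a rough illustration of what a link resolver has to do with such a request, the following Perl sketch splits an OpenURL query string into its key/value pairs. It is not the resolver's actual code: the helper name parse_openurl is hypothetical, the URL is an abbreviated copy of the one on the slide, and only core Perl is used.

```perl
use strict;
use warnings;

# Split an OpenURL query string into key => [values]. A key such as
# rft.au may repeat, so every key maps to an array reference.
# parse_openurl is a hypothetical helper, not part of any link resolver.
sub parse_openurl {
    my ($url) = @_;
    my ($query) = $url =~ /\?(.*)\z/s;
    my %kev;
    return \%kev unless defined $query;
    for my $pair (split /&/, $query) {
        next unless length $pair;            # skip the empty piece left by "?&"
        my ($key, $value) = split /=/, $pair, 2;
        $value = '' unless defined $value;
        for ($key, $value) {
            tr/+/ /;                                 # "+" encodes a space
            s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;     # percent-decoding
        }
        push @{ $kev{$key} }, $value;
    }
    return \%kev;
}

# Abbreviated version of the incoming OpenURL shown on the slide.
my $kev = parse_openurl(
    'http://linkresolver.library.cornell.edu:4550/resserv?&url_ver=z39.88-2004'
  . '&rft.genre=article&rft.issn=0737-8831&rft.volume=27&rft.issue=1'
  . '&rft.spage=151&rft.au=scholze,+f&rft.au=windisch,+n'
  . '&rft_id=info:doi/10.1108%2f07378830910942991/'
);

print "$_ = ", join(' | ', @{ $kev->{$_} }), "\n" for sort keys %$kev;
```

Keeping repeated keys as lists matters here because the sample request carries two rft.au values.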

in our knowledge base?

title: Library hi tech
issn: 0737-8831
start date: 19970101
end date:

link-to syntax for Emerald

http://www.emeraldinsight.com/rpsv/cgi-bin/cgi?body=linker&reqidx=#@ISSN-HYPHEN#(#@DATE#)#@VOLUME#:#@ISSUE#L.#@SPAGE#
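A minimal Perl sketch of how such a link-to template might be filled from the incoming metadata. The meaning of each #@TOKEN# placeholder is inferred from its name, and the helper fill_template and the %value_for mapping are hypothetical; the resolver product's real substitution logic is not shown in these slides.

```perl
use strict;
use warnings;

# Link-to template from the slide. Single quotes keep Perl from treating
# "@ISSN" and similar as array names.
my $template = 'http://www.emeraldinsight.com/rpsv/cgi-bin/cgi?body=linker'
             . '&reqidx=#@ISSN-HYPHEN#(#@DATE#)#@VOLUME#:#@ISSUE#L.#@SPAGE#';

# Assumed mapping from template token to values taken from the sample OpenURL.
my %value_for = (
    'ISSN-HYPHEN' => '0737-8831',
    'DATE'        => '2009',
    'VOLUME'      => '27',
    'ISSUE'       => '1',
    'SPAGE'       => '151',
);

# Replace each #@TOKEN# with its value; unknown tokens are left in place.
sub fill_template {
    my ($tpl, $values) = @_;
    $tpl =~ s{#\@([A-Z-]+)#}{ exists $values->{$1} ? $values->{$1} : $& }ge;
    return $tpl;
}

my $link = fill_template($template, \%value_for);
print "$link\n";
```

Every token in the template depends on one metadata element arriving in a usable form, which is why malformed spage, date, or issn values break the outgoing link.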

Page 4

OpenURL is pervasive

Cornell link resolver alone: July 1, 2008 – June 30, 2009: 402,000 OpenURL service requests.

402,000 * 123 (ARL libraries) = about 49 million requests a year, if each ARL library matches Cornell's volume

Page 5

Cornell’s top 10 OpenURL sources

1. Web of Knowledge
2. WorldCat Local
3. Google Scholar
4. Webfeat (our “Find Articles” service)
5. EBSCOHost
6. OCLC FirstSearch
7. SilverPlatter
8. Weill Cornell Medical Center
9. SciFinder Scholar
10. PubMed

Page 6

… but quality of experience is difficult to benchmark

• Wrong start or end date in the local library's holdings knowledge base (see NISO KBART)

• Semantically inaccurate metadata from the OpenURL origin (wrong ISSN, for example)

• Wrong link-to syntax in link resolver

• Fragile handling of incoming links by content provider

Page 7

… but quality of experience is difficult to benchmark

• Inaccurate or missing Crossref DOI URL (sometimes the DOI registration process is out of sync with the mounting of articles)

• Subscription errors (especially with the start of a new calendar year)

• Syntactically incorrect or missing metadata from the OpenURL origin

Page 8

Literature review

I can identify no systematic study designed and carried out to benchmark the quality of linking. The OpenURL standard was introduced some ten years ago.

Page 9

Wakimoto, Walker, and Dabbour (2006)

Main finding: Users just expect full-text. When they do not get it they are disappointed.

Jina Choi Wakimoto, David S. Walker, and Katherine S. Dabbour (2006). "The Myths and Realities of SFX in Academic Libraries." The Journal of Academic Librarianship 32 (2): 127–136

Page 10

Wakimoto, Walker, and Dabbour (2006)

"Where does SFX start and where does it end? If an SFX request does not result in a full-text link, does the problem lie with the source database’s metadata, the construction of the OpenURL request, the SFX KnowledgeBase, the SFX software, the resulting target resource, or even the local library’s collection development plan?" (p. 134)

Jina Choi Wakimoto, David S. Walker, and Katherine S. Dabbour (2006). "The Myths and Realities of SFX in Academic Libraries." The Journal of Academic Librarianship 32 (2): 127–136

Page 11

Blake and Knudson (2002)

• “Increased outreach by librarians to authors emphasizing and promoting the importance of citation standards for electronic document retrieval.”

Blake, Miriam E. and Frances L. Knudson. "Metadata and Reference Linking." Library Collections, Acquisitions & Technical Services 26 (3), (2002): 230.

Page 12

Blake and Knudson (2002)

• “Increased communication between primary publishers and secondary publishers. Metadata corrections and updates need to be better coordinated.”

Blake, Miriam E. and Frances L. Knudson. "Metadata and Reference Linking." Library Collections, Acquisitions & Technical Services 26 (3), (2002): 230.

(NISO KBART role)

Page 13

Blake and Knudson (2002)

• “Increased consistency in metadata within a single database and across databases. This would result in a higher success rate of linking and would allow the algorithms to be simpler. Simpler algorithms are easier to maintain and modify.”

Blake, Miriam E. and Frances L. Knudson. "Metadata and Reference Linking." Library Collections, Acquisitions & Technical Services 26 (3), (2002): 230.

Page 14

Mellon funded planning grant for L'Année philologique

1. Canonical Citation Linking: http://cwkb.org
In collaboration with Eric Rebillard, Professor, Classics and History, and David Ruddy, Cornell University Library

2. OpenURL Quality
Is it possible to build a tool for evaluating the quality of OpenURLs from a content provider?

Page 15

Constant: Core elements used by content providers in their link-to targets

title - 64%
spage - 64%
volume - 61%
issue - 60%
date - 48%
aulast - 47%
issn - 35%
atitle - 35%
DOI - 14%
ISBN - 5%

Based on an analysis of link-tos in the Cornell instance of the III WebBridge link resolver product.

Page 16

Variable: Frequency of element string patterns for all sources

Page 17

aulast

First author's family name. This may be more than one word. In many citations, the author's family name is recorded first and is followed by a comma, e.g. Smith, Fred James is recorded as "aulast=smith"

Page 18

aulast

    if ($e =~ /aulast/) {
        $patterns{$neworigin}{$newsid}{$e}++;
        if ($elementhash{$e} =~ /^[A-Za-z]+$/) {
            $patterns{$neworigin}{$newsid}{"aulast_simple"}++;
        }
        elsif ($elementhash{$e} =~ /^[A-Za-z]+, .+$/) {
            $patterns{$neworigin}{$newsid}{"aulast_comma"}++;
        }
        elsif ($elementhash{$e} =~ /^[A-Z][a-z]+( [A-Z]\.)+$/) {
            $patterns{$neworigin}{$newsid}{"aulast_simpleplusinitial"}++;
        }
        else {
            $patterns{$neworigin}{$newsid}{"aulast_other"}++;
        }
    }

Page 19

aulast_other examples

• Ryan S Miller
• Louise D Bryant
• DAVID J MCKENZIE
• %C4%90okovi%C4%87
• Indu B Ahluwalia
• Carreras-Sangr%c3%a0
• Bautista-Casta%C3%B1o
• O%27Shea
• Melissa Ventura Marra
• Guan XueYing%3B Yu Nan%3B Shangguan XiaoXia
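To see why values like these end up in aulast_other, here is a small self-contained Perl sketch that applies the same three aulast tests, in the same order, to a handful of values. The function name classify_aulast is hypothetical, and the sketch assumes the tests run on the raw, still percent-encoded value, which is what the examples on this slide suggest.

```perl
use strict;
use warnings;

# Return the pattern label for one raw aulast value.
# classify_aulast is a hypothetical wrapper around the slide's three tests.
sub classify_aulast {
    my ($value) = @_;
    return 'aulast_simple'            if $value =~ /^[A-Za-z]+$/;
    return 'aulast_comma'             if $value =~ /^[A-Za-z]+, .+$/;
    return 'aulast_simpleplusinitial' if $value =~ /^[A-Z][a-z]+( [A-Z]\.)+$/;
    return 'aulast_other';
}

for my $value ('smith', 'Smith, Fred James', 'Smith F. J.',
               'Ryan S Miller', 'O%27Shea', 'Bautista-Casta%C3%B1o') {
    printf "%-24s %s\n", $value, classify_aulast($value);
}
```

Full names without a comma, and family names containing an apostrophe, hyphen, or non-ASCII letter, all fall through to aulast_other under these patterns.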

Page 20

spage

First page number of a start/end (spage-epage) pair. Note that pages are not always numeric.

Page 21

spage

    if ($e =~ /spage/) {
        $patterns{$neworigin}{$newsid}{$e}++;
        if ($elementhash{$e} =~ /^\d+$/) {
            $patterns{$neworigin}{$newsid}{"spage_number"}++;
        }
        elsif ($elementhash{$e} =~ /^\d+-\d+$/) {
            $patterns{$neworigin}{$newsid}{"spage_number_number"}++;
        }
        elsif ($elementhash{$e} =~ /[A-Za-z].+\d/) {
            $patterns{$neworigin}{$newsid}{"spage_string_w_number"}++;
        }
        else {
            $patterns{$neworigin}{$newsid}{"spage_other"}++;
        }
    }

Page 22

spage_other examples

• 1033 (6 pages)
• 85(19)
• 575 (11 pages)
• 283...290
• PHYS
• GLRM
• 58,+VI

Page 23

date

The publication date of the item or bundle encoded in the "Complete date" variant of ISO8601 (see http://www.w3.org/TR/NOTE-datetime). This format is YYYY-MM-DD where YYYY is the four-digit year, MM is the month of the year between 01 (January) and 12 (December), and DD is the day of the month between 01 and 28 or 29 or 30 or 31, depending on length of the month and whether it is a leap year.

Page 24

date

    if ($e =~ /date/) {
        $patterns{$neworigin}{$newsid}{$e}++;
        if ($elementhash{$e} =~ /^\d{4}$/) {
            $patterns{$neworigin}{$newsid}{"date_dddd"}++;
        }
        elsif ($elementhash{$e} =~ /^\d{4}-\d{2}$/) {
            $patterns{$neworigin}{$newsid}{"date_dddd-dd"}++;
        }
        elsif ($elementhash{$e} =~ /^\d{4}-\d{2}-\d{2}$/) {
            $patterns{$neworigin}{$newsid}{"date_dddd-dd-dd"}++;
        }
        elsif ($elementhash{$e} =~ /^\d{4}-\d{4}$/) {
            $patterns{$neworigin}{$newsid}{"date_dddd-dddd"}++;
        }
        elsif ($elementhash{$e} =~ /^\d{8}$/) {
            $patterns{$neworigin}{$newsid}{"date_dddddddd"}++;
        }
        else {
            $patterns{$neworigin}{$newsid}{"date_dateother"}++;
        }
    }

Page 25

date_other examples

• 1956 July
• %7E1994
• June 5%2C 2002
• JUN 30 05
• 2006%282007%29
• 1922,+April+25th
• %5B%5B1943-06-19%5D%5D
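The date tests can also be written as an ordered table of label/pattern pairs, which may be easier to extend when a new pattern needs its own bucket. The sketch below is one possible arrangement, not the script used for these slides: classify_date and @date_patterns are hypothetical names, the patterns are the five date patterns shown earlier, and the fallback label follows this slide's heading (date_other).

```perl
use strict;
use warnings;

# Ordered [label, pattern] pairs; the first pattern that matches wins.
my @date_patterns = (
    [ 'date_dddd'       => qr/^\d{4}$/             ],
    [ 'date_dddd-dd'    => qr/^\d{4}-\d{2}$/       ],
    [ 'date_dddd-dd-dd' => qr/^\d{4}-\d{2}-\d{2}$/ ],
    [ 'date_dddd-dddd'  => qr/^\d{4}-\d{4}$/       ],
    [ 'date_dddddddd'   => qr/^\d{8}$/             ],
);

# Return the label of the first matching pattern, or date_other.
sub classify_date {
    my ($value) = @_;
    for my $pair (@date_patterns) {
        my ($label, $re) = @$pair;
        return $label if $value =~ $re;
    }
    return 'date_other';
}

for my $value ('2009', '2009-10', '2009-10-16', '1997-1998', '19970101',
               '1956 July', '%7E1994', 'JUN 30 05') {
    printf "%-12s %s\n", $value, classify_date($value);
}
```

Of these buckets, only date_dddd-dd-dd corresponds to the "Complete date" format quoted in the element definition; the others record how far a source departs from it.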

Page 26

issn

International Standard Serials Number (ISSN). The issn may contain a hyphen, e.g. "1041-5653"

Page 27

issn

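    if ($e =~ /issn/) {
        $patterns{$neworigin}{$newsid}{$e}++;
        # Patterns are anchored at both ends so that trailing text such as
        # "%28print%29" falls through to issn_other; the last character
        # may be a digit or the check character X.
        if ($elementhash{$e} =~ /^\d{4}-\d{3}[\dXx]$/) {
            $patterns{$neworigin}{$newsid}{"issn_number_number"}++;
        }
        elsif ($elementhash{$e} =~ /^\d{7}[\dXx]$/) {
            $patterns{$neworigin}{$newsid}{"issn_number"}++;
        }
        else {
            $patterns{$neworigin}{$newsid}{"issn_other"}++;
        }
    }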

Page 28

issn_other examples

• 0065-2598%28print%29
• 0018-5345+%28ISSN+print%29
• ISSN ISBN 0-9525091-5-6.
• 0021-8375%28print%29%7C1439-0361%28electronic%29
• 1471-2164+%28ISSN+online%29
• 0191-8699%3B0191-8699
• 0741-8329 (Print)%3B NLM Unique Journal Identifier%3A 8502311
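Pattern tests of this kind only show whether an ISSN is well formed. As a possible complement, and not something the slides describe, the standard ISSN check digit can be verified as well: the first seven digits are weighted 8 down to 2, summed, and the check character is chosen so that the total is divisible by 11, with X standing for 10. The function name issn_check_digit_ok below is hypothetical.

```perl
use strict;
use warnings;

# True if $issn has the form NNNN-NNNC (hyphen optional) and its final
# character is the correct ISSN check digit.
# issn_check_digit_ok is a hypothetical helper for illustration.
sub issn_check_digit_ok {
    my ($issn) = @_;
    my ($head, $tail, $given) = $issn =~ /^(\d{4})-?(\d{3})([\dXx])$/
        or return 0;
    my $sum    = 0;
    my $weight = 8;
    $sum += $_ * $weight-- for split //, "$head$tail";   # weights 8,7,...,2
    my $check    = (11 - $sum % 11) % 11;
    my $expected = $check == 10 ? 'X' : $check;
    return uc($given) eq $expected ? 1 : 0;
}

for my $issn ('0737-8831', '1041-5653', '0737-8832', '0065-2598%28print%29') {
    printf "%-22s %s\n", $issn, issn_check_digit_ok($issn) ? 'ok' : 'not ok';
}
```

A check like this catches mistyped digits, but it cannot tell whether a valid ISSN belongs to the journal being cited.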

Page 29

How often out of 402,000 Cornell OpenURLs?

metric          frequency in July-Sep 2008 sample
aulast_other    5476
spage_other     772
date_other      591
issn_other      200

Page 30

flat file output

logsource  year  quarter  origin  sid                metric                    count
cornell    2009  Q1       csa     csa:commabs-set-c  atitle                    154
cornell    2009  Q1       csa     csa:commabs-set-c  atitle_colon              101
cornell    2009  Q1       csa     csa:commabs-set-c  atitle_other              53
cornell    2009  Q1       csa     csa:commabs-set-c  aulast                    159
cornell    2009  Q1       csa     csa:commabs-set-c  aulast_other              4
cornell    2009  Q1       csa     csa:commabs-set-c  aulast_simple             155
cornell    2009  Q1       csa     csa:commabs-set-c  date                      159
cornell    2009  Q1       csa     csa:commabs-set-c  date_dddd                 110
cornell    2009  Q1       csa     csa:commabs-set-c  date_dddd-dd              49
cornell    2009  Q1       csa     csa:commabs-set-c  isbn                      6
cornell    2009  Q1       csa     csa:commabs-set-c  isbn_10                   6
cornell    2009  Q1       csa     csa:commabs-set-c  issn                      135
cornell    2009  Q1       csa     csa:commabs-set-c  issn_number-number        135
cornell    2009  Q1       csa     csa:commabs-set-c  issue                     136
cornell    2009  Q1       csa     csa:commabs-set-c  issue_number              132
cornell    2009  Q1       csa     csa:commabs-set-c  issue_number_dash_number  2
cornell    2009  Q1       csa     csa:commabs-set-c  issue_other               2
cornell    2009  Q1       csa     csa:commabs-set-c  spage                     153
cornell    2009  Q1       csa     csa:commabs-set-c  spage_number              153
cornell    2009  Q1       csa     csa:commabs-set-c  title                     160
cornell    2009  Q1       csa     csa:commabs-set-c  total                     160
cornell    2009  Q1       csa     csa:commabs-set-c  volume                    139
cornell    2009  Q1       csa     csa:commabs-set-c  volume_number             139
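One way such a flat file might be consumed is to express each pattern count as a share of its parent element, so that sources of different sizes can be compared. The Perl sketch below does this for a few rows copied from the output above. The helper pattern_share is hypothetical, and the rule that a metric's parent element is the text before the first underscore is an assumption based on the metric names.

```perl
use strict;
use warnings;

# A few rows copied from the flat file output.
my @rows = (
    'cornell 2009 Q1 csa csa:commabs-set-c aulast 159',
    'cornell 2009 Q1 csa csa:commabs-set-c aulast_other 4',
    'cornell 2009 Q1 csa csa:commabs-set-c aulast_simple 155',
    'cornell 2009 Q1 csa csa:commabs-set-c date 159',
    'cornell 2009 Q1 csa csa:commabs-set-c date_dddd 110',
    'cornell 2009 Q1 csa csa:commabs-set-c date_dddd-dd 49',
    'cornell 2009 Q1 csa csa:commabs-set-c total 160',
);

# $count{sid}{metric} = count
my %count;
for my $row (@rows) {
    my ($logsource, $year, $quarter, $origin, $sid, $metric, $n) = split ' ', $row;
    $count{$sid}{$metric} = $n;
}

# Share of a pattern within its element: "aulast_other" is compared with
# "aulast", "date_dddd-dd" with "date". Returns undef for metrics that
# have no parent element (for example "total").
sub pattern_share {
    my ($sid, $metric) = @_;
    my ($element) = $metric =~ /^([a-z]+)_/ or return;
    my $base = $count{$sid}{$element} or return;
    return ($count{$sid}{$metric} // 0) / $base;
}

for my $metric (qw(aulast_other aulast_simple date_dddd date_dddd-dd)) {
    printf "%-14s %5.1f%%\n", $metric,
        100 * pattern_share('csa:commabs-set-c', $metric);
}
```

Shares rather than raw counts make it possible to rank origins by how often their values fall into an "other" bucket, regardless of how much traffic each origin sends.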

Page 31

Demonstration

http://openurlquality.blogspot.com/

Page 32

Next steps

• create a NISO structure to wrap around the metrics: “NISO OpenURL Quality Index”

• add non-Cornell data from libraries and link resolver vendors (model is agnostic to source)

• confirm and publicize key elements used by target syntaxes

• can the quality of the global OpenURL network be modeled mathematically?

Page 33

How to stay in the loop

http://openurlquality.blogspot.com/

Adam Chandler
Database Management and Electronic Resources Research Librarian
Central Library Operations
Cornell University Library
tel: 607-255-5760
email: [email protected]