In and out: how does that metadata get into a knowledgebase anyhow?
Heather Sherman
Head of Library Programme Management – Dawson Books
Connect Group PLC
Creation process
Sign contract with publisher
Acquire content and basic metadata
Correct metadata errors
Enhance basic metadata
Create ProQuest xml feed
Create TOC data
Sign contract with publisher
The process starts when a publisher agrees to host its titles on dawsonera.
Publishers are asked to send Dawson the ebook content, jacket image and associated metadata.
Some send the metadata as XML. Others complete a spreadsheet.
Publisher sends files of metadata
Publishers supply key pieces of metadata
eISBN
Title
Subtitle
Author(s)
Price
Currency
PDF file name
Jacket image
Publisher
Imprint
Publication date
Edition
Country of publication
Usage model
Spreadsheet of metadata
Publisher sends files of metadata
However…
Not all publishers supply the key data, so we have to go and find it.
Some supply incorrect data, so we have to fix that.
Dawson’s automated import process checks that key data is present and correct, and reports any errors.
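A minimal sketch of the kind of presence-and-correctness check described above. It is not Dawson's actual import code: the field names, the choice of required fields, and the price and file name in the sample are assumptions made for illustration. The only rule taken from a public standard is the ISBN-13 check digit.

```python
# Hypothetical field names; the real import process may use different ones.
REQUIRED_FIELDS = ["eisbn", "title", "author", "price", "currency",
                   "pdf_file_name", "publisher", "publication_date"]


def isbn13_is_valid(isbn: str) -> bool:
    """Check the ISBN-13 check digit (weights alternate 1, 3, 1, 3, ...)."""
    digits = isbn.replace("-", "").replace(" ", "")
    if len(digits) != 13 or not digits.isdigit():
        return False
    total = sum(int(d) * (1 if i % 2 == 0 else 3)
                for i, d in enumerate(digits[:12]))
    return (10 - total % 10) % 10 == int(digits[12])


def check_record(record: dict) -> list:
    """Return a list of error messages; an empty list means the record passed."""
    errors = []
    for name in REQUIRED_FIELDS:
        if not str(record.get(name, "")).strip():
            errors.append(f"missing {name}")
    eisbn = str(record.get("eisbn", "")).strip()
    if eisbn and not isbn13_is_valid(eisbn):
        errors.append(f"invalid eISBN check digit: {eisbn}")
    return errors


# Title, author, publisher and eISBN come from the example record in these
# slides; the price, currency and file name are invented for the sketch.
good_record = {
    "eisbn": "9780191015021", "title": "The Global Revolution",
    "author": "Silvio Pons", "price": "50.00", "currency": "GBP",
    "pdf_file_name": "9780191015021.pdf",
    "publisher": "Oxford University Press", "publication_date": "20140815",
}
bad_record = dict(good_record, eisbn="9780191015022", price="")

print(check_record(good_record))
print(check_record(bad_record))
```

Returning a list of messages rather than raising on the first problem lets one run report every error in a publisher's file.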
Metadata errors
Table of contents data created
PDF files are sent to an agency that creates Table of Contents (TOC) data.
For ePub files, the TOC is extracted directly from the file.
TOC data is imported into the Dawson system and matched up with the PDFs and metadata.
TOC xml
Metadata enhanced
Publisher metadata and TOC data are matched to existing print records in the Dawson title database.
A hybrid record is created, incorporating data from the publishers and Dawson.
The result is a record containing as much information as Dawson has about the title.
Dawson ebook MARC record
=LDR 01354nam 2200349 4500
=001 DAW28874972
=007 cr
=008 140327s2014\\\\enk\\\\fs\\\\\001|0|eng|d
=020 \\$a0191015024 (e-book)
=020 \\$a9780191015021 (e-book)
=040 \\$aStDuBDS$cStDuBDS$erda$dDAWSON
=041 1\$aeng$hita
=082 04$a320.53209$223
=100 1\$aPons, Silvio,$eauthor.
=245 14$aThe global revolution$h[electronic resource] : $ba history of international communism, 1917-1991 / $cSilvio Pons ; translated by Allan Cameron.
=264 \1$aOxford :$bOxford University Press,$c2014.
=300 \\$axx, 365 pages
=336 \\$atext$2rdacontent
=337 \\$acomputer$2rdamedia
=338 \\$aonline resource$2rdacarrier
=490 1\$aOxford studies in modern European history
=500 \\$aTranslated from the Italian.
=504 \\$aIncludes bibliographical references and index.
=530 \\$aAlso available in printed form.
=533 \\$aElectronic reproduction.$cDawson Books.$nMode of access: World Wide Web.
=650 \0$aCommunism$xHistory.
=650 \0$aCommunism.
=655 \7$aElectronic books.$2lcsh
=700 1\$aCameron, Allan,$d1952-$etranslator.
=776 0\$cHardback$z9780199657629
=830 \0$aOxford studies in modern European history.
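The record above is in a mnemonic text layout: each field begins with `=` and a three-character tag, data fields carry two indicator characters (a backslash stands for a blank), and subfields are introduced by `$`. As a rough sketch, assuming exactly that layout and assuming no literal `$` occurs inside subfield data, a single field line could be split like this. The function name is hypothetical and this is not a full MARC parser.

```python
def parse_mnemonic_field(line: str) -> dict:
    """Split one mnemonic MARC line into tag, indicators and subfields."""
    tag, _, body = line.lstrip("=").partition(" ")
    body = body.lstrip()
    # The leader and control fields (001-009) have no indicators or subfields.
    if tag == "LDR" or tag < "010":
        return {"tag": tag, "value": body}
    indicators = body[:2].replace("\\", " ")  # backslash means blank indicator
    subfields = [(chunk[0], chunk[1:].strip())
                 for chunk in body[2:].split("$") if chunk]
    return {"tag": tag, "indicators": indicators, "subfields": subfields}


# Three lines copied from the example record (raw strings keep the backslashes).
control = parse_mnemonic_field("=001 DAW28874972")
isbn = parse_mnemonic_field(r"=020 \\$a9780191015021 (e-book)")
title = parse_mnemonic_field(
    r"=245 14$aThe global revolution$h[electronic resource] : "
    r"$ba history of international communism, 1917-1991 / "
    r"$cSilvio Pons ; translated by Allan Cameron."
)

print(control)
print(isbn)
print(title["subfields"][0])
```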
ProQuest feed created
The hybrid record is extracted and turned into an XML record.
Dawson sends daily files of new titles and updated data to ProQuest.
A file covering all titles is sent weekly.
xml data sent to ProQuest
<document initial-page="4" jacket="9780191015021.jpg" lang="eng">
<eisbn>
<eisbn13>9780191015021</eisbn13>
<eisbn10>0191015024</eisbn10>
</eisbn>
<isbn-group>
<isbn10 type="hb">0199657629</isbn10>
<isbn13 type="hb">9780199657629</isbn13>
</isbn-group>
<title-group>
<title>The Global Revolution: A History of International Communism 1917-1991</title>
<subtitle>A History of International Communism 1917-1991</subtitle>
</title-group>
<author-group>
<author>
<person-name>Silvio Pons ; Translated By Allan Cameron.</person-name>
</author>
</author-group>
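As an illustration of turning a hybrid record into XML like the excerpt above, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. The element names are copied from the excerpt; the dictionary keys and the function name are assumptions, and the real feed carries many more elements than the two groups shown here.

```python
import xml.etree.ElementTree as ET


def record_to_xml(record: dict) -> str:
    """Build a <document> element from a flat record (hypothetical keys)."""
    doc = ET.Element("document",
                     {"jacket": record["jacket"], "lang": record["lang"]})
    eisbn = ET.SubElement(doc, "eisbn")
    ET.SubElement(eisbn, "eisbn13").text = record["eisbn13"]
    ET.SubElement(eisbn, "eisbn10").text = record["eisbn10"]
    title_group = ET.SubElement(doc, "title-group")
    ET.SubElement(title_group, "title").text = record["title"]
    ET.SubElement(title_group, "subtitle").text = record["subtitle"]
    return ET.tostring(doc, encoding="unicode")


# Values taken from the example title in these slides.
hybrid_record = {
    "jacket": "9780191015021.jpg",
    "lang": "eng",
    "eisbn13": "9780191015021",
    "eisbn10": "0191015024",
    "title": "The Global Revolution: A History of International Communism 1917-1991",
    "subtitle": "A History of International Communism 1917-1991",
}

xml_text = record_to_xml(hybrid_record)
print(xml_text)
```

Building the tree with a library rather than by string concatenation means characters such as `&` in a title are escaped automatically.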
IN AND OUT: HOW DOES THAT METADATA GET INTO A KNOWLEDGEBASE ANYHOW?
Ben Johnson
Lead Metadata Librarian, KB Provider Data
Acquisition and Ingestion of Provider Data into a Knowledgebase (KB)
Introduction
What do I do?
4/15/2015 15
Lots of times it feels more like this:
Introduction
Acquire
• Get the data
• Verify compatibility
• Map the data
Ingest
• Transform the data
• Load
• Review
• Accept/Reject
Correct
• Customer inquiries
• Content integrity
• Product interoperability
… Profit!
Providers we partner with
• Publishers
• Content Aggregators (PQ, Gale)
• University and Library Local Content
• Library Consortia (JISC, BIBSAM)
Content Acquisition
• No data
• No contracts
• Provider Relations
KBART
• Joint NISO/UKSG Group
• Librarians, Vendors, Providers
• Transmission of metadata to vendors
• Human and machine readable data
• http://www.niso.org/workrooms/kbart
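KBART files are tab-delimited text with a header row of standard field names, which is what makes them readable by both people and machines. A minimal sketch of loading one and flagging rows without any identifier follows. The column names used are real KBART fields, but the sample uses only a subset of columns, its second row is invented, and the helper names are hypothetical.

```python
import csv
import io


def read_kbart(handle) -> list:
    """Read a tab-delimited KBART file into a list of dicts keyed by header."""
    return list(csv.DictReader(handle, delimiter="\t"))


def titles_missing_identifiers(rows: list) -> list:
    """Return titles that have neither a print nor an online identifier."""
    return [row["publication_title"] for row in rows
            if not row["print_identifier"].strip()
            and not row["online_identifier"].strip()]


# A small in-memory sample; a real file would be opened with
# open(path, encoding="utf-8", newline="").
sample = io.StringIO(
    "publication_title\tprint_identifier\tonline_identifier\tpublisher_name\n"
    "The Global Revolution\t9780199657629\t9780191015021\tOxford University Press\n"
    "Invented Example Title\t\t\tExample Press\n"
)

rows = read_kbart(sample)
print(len(rows))
print(titles_missing_identifiers(rows))
```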
Ingestion – mapping and transformation
Acquire the data
• FTP, HTML
• CSV/Text, Excel, XML, HTML
Create packages
• Data for existing content is mapped to KB packages (new T&F package, JISC/BIBSAM new license)
Transform the content
• Map the content to our schema
• Normalize the data (dates, diacritics)
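To make "normalize the data (dates, diacritics)" concrete, here is a small sketch using only the Python standard library. The list of accepted date formats, the day-first reading of slashed dates, and the function names are assumptions for illustration; they are not a description of any vendor's rules.

```python
import unicodedata
from datetime import datetime

# Assumed input formats; slashed dates are read day-first here.
DATE_FORMATS = ("%Y%m%d", "%Y-%m-%d", "%d/%m/%Y")


def normalize_date(raw: str):
    """Return the date as YYYY-MM-DD, or None if no known format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def compose_diacritics(raw: str) -> str:
    """Store text in one consistent Unicode form (NFC, precomposed)."""
    return unicodedata.normalize("NFC", raw)


def fold_diacritics(raw: str) -> str:
    """Strip combining marks, e.g. to build an accent-insensitive search key."""
    decomposed = unicodedata.normalize("NFD", raw)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(normalize_date("20140815"))
print(normalize_date("15/08/2014"))
print(normalize_date("August 2014"))
print(compose_diacritics("Cafe\u0301") == "Caf\u00e9")
print(fold_diacritics("Caf\u00e9"))
```

Returning `None` for an unrecognized date, rather than guessing, leaves the decision to a human reviewer.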
XML Data from Dawsonera
File ready for ingestion
Ingestion – Loading and Review
Currency (Updating)
Acquisition
Ingestion
Review
Corrections
Correcshunz Corrections
Downstream products
• Data in KB
• Downstream Products
• Product functionality
• Discovery
• Access
IN AND OUT: HOW DOES THAT METADATA GET INTO A KNOWLEDGEBASE ANYHOW?
Dave Hovenden – Content Operations Manager, Summon
ProQuest
UKSG Conference – 30 March – 1 April, 2015
The Content Ingestion Streams for Summon
The Content Ingestion Process at Summon for Commercial Content
Identify New Content to Add into Summon
Identifying New Commercial Content to Add into Summon
• Product Management, Sales, and our Global Content Alliance work together to identify new content to add into Summon
• New content requests from Summon customers are also considered
• Publishers and content providers may also request to have their content added into Summon
The Content Ingestion Process at Summon for Commercial Content
Identify New Content to Add into Summon
Engage with Publisher/Provider
Pre-Agreement Content Sample Analysis
Pre-Agreement Content Sample Analysis
• The sample analysis helps determine the quality and extent of the metadata, and the metadata schema in use
• We also try to determine linking methods, how rights are assigned to the content, and what databases we may need to add to our knowledgebase (if they don’t already exist)
• Summon often indexes content at the article or chapter level, as that is usually the granularity at which the content is supplied
What Metadata Do We Look For During Sample Analysis?
Title Metadata
• Article titles, Book titles, Publication titles, Subtitles, etc.
Identifier Metadata
• Unique IDs for specific articles, chapters, etc.
• Publication-level unique identifiers such as ISSN or ISBN
• Additional identifiers such as OCLC Number, LCCN, Dewey, DOI, etc.
Publication Information Metadata
• Publisher, Author(s), Corporate Authors, Volume Numbers, Issue Numbers, Start Page, Publication Date, Publication Series, etc.
Additional Metadata
• Subject Headings, Keywords, Language
Dawsonera Book Example – The Global Revolution: A History of International Communism 1917-1991 (ISBN-13 – 9780199657629)
<document initial-page="4" jacket="9780191015021.jpg" lang="eng">
<eisbn>
<eisbn13>9780191015021</eisbn13>
<eisbn10>0191015024</eisbn10>
</eisbn>
<territory-group/>
<parent-isbn/>
<isbn-group>
<isbn10 type="hb">0199657629</isbn10>
<isbn13 type="hb">9780199657629</isbn13>
</isbn-group>
<title-group>
<title>The Global Revolution: A History of International Communism 1917-1991</title>
<subtitle>A History of International Communism 1917-1991</subtitle>
</title-group>
<author-group>
<author>
<person-name>Silvio Pons ; Translated By Allan Cameron.</person-name>
</author>
</author-group>
<endnote-authors>
<endnote-author>Pons, Silvio,</endnote-author>
<endnote-author>Cameron, Allan,</endnote-author>
</endnote-authors>
Dawsonera Book Example (cont.) – The Global Revolution: A History of International Communism 1917-1991 (ISBN-13 – 9780199657629)
<publisher>
<publisher-name>Oxford University Press</publisher-name>
<imprint>Oxford University Press</imprint>
</publisher>
<pub-place>GB</pub-place>
<pub-date>20140815</pub-date>
<date-added>20140911</date-added>
<first-published/>
<edition/>
<copyright>© Oxford University Press 2014</copyright>
<classification type="dewey">320.53209</classification>
<classification type="loc">HX40</classification>
<classification type="bic">HB</classification>
<series issn="" series-name="Oxford studies in modern European history." number-within-series="">Oxford studies in
modern European history.</series>
<abstract-text>The Global Revolution. A History of International Communism 1917-1991 establishes a relationship
between the history of communism and the main processes of globalization in the past century. Drawing on a wealth of
archival sources, Silvio Pons analyses the multifaceted and contradictory relationship between the Soviet Union and the
international communist movement, to show how communism played a major part in the formation of our modern world.
The volume presents the argument that during the age of wars from 1914 to 1945, the establishment of the Soviet state in
Russia and the birth of the communist movement had an enormous impact because of their promise of world revolution
and international civil war. Such perspective appeared even more plausible in the aftermath of the Second World War and
of revolution in China, which paved the way for the expansion of communism in the post-colonial world. Communism
challenged the West in the Cold War - by means of anti-capitalist modernization and anti-imperialist mobilization - showing
itself to be a powerful factor in the politicization of global trends. However, the international legitimacy of communism
declined rapidly in the post-war era. Soviet power exposed its inability to exercise hegemony, as distinct from domination.
The consequences of Sovietization in Europe and the break between the Soviet Union and China were the primary
reasons for the decline of communist influence and appeal. Since communism lost its political credibility and cultural
cohesion, its global project had failed. The ground was prepared for the devastating impact of Western globalization on
communist regimes in Europe and the Soviet Union.</abstract-text>
<keywords>Communism</keywords>
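One way to picture the sample analysis is a quick count of how often each field of interest is actually populated across the sample records. The sketch below does this for a cut-down version of the record shown above. The closing `</document>` tag, the choice of fields, and the function name are assumptions, since the slides show only an excerpt.

```python
import xml.etree.ElementTree as ET

# Cut-down copy of the example record; the closing tag is assumed.
SAMPLE = """
<document initial-page="4" jacket="9780191015021.jpg" lang="eng">
  <eisbn><eisbn13>9780191015021</eisbn13></eisbn>
  <title-group>
    <title>The Global Revolution: A History of International Communism 1917-1991</title>
  </title-group>
  <publisher><publisher-name>Oxford University Press</publisher-name></publisher>
  <pub-date>20140815</pub-date>
  <edition/>
  <keywords>Communism</keywords>
</document>
"""

# Label -> path of the element to inspect (an illustrative selection).
FIELDS = {
    "title": "title-group/title",
    "eisbn13": "eisbn/eisbn13",
    "publisher": "publisher/publisher-name",
    "pub_date": "pub-date",
    "edition": "edition",
    "keywords": "keywords",
}


def field_coverage(documents) -> dict:
    """Count, per field, how many documents carry a non-empty value."""
    counts = dict.fromkeys(FIELDS, 0)
    for doc in documents:
        for name, path in FIELDS.items():
            if (doc.findtext(path) or "").strip():
                counts[name] += 1
    return counts


coverage = field_coverage([ET.fromstring(SAMPLE)])
print(coverage)
```

An empty element such as `<edition/>` counts as missing, which is the distinction a sample analysis usually cares about.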
Summon and the Knowledgebase
• Summon relies upon the knowledgebase to help facilitate rights access to the content
• Rights access is assigned by tracking a particular title by ISSN or ISBN in the knowledgebase, or by Database ID
• The knowledgebase also helps Summon indicate when content has full-text availability
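The rights check described above amounts to a lookup: does the record's ISSN, ISBN, or Database ID appear among the things the library is entitled to? A deliberately simplified sketch, with hypothetical class, field, and function names and an invented database ID, might look like this. It does not reflect how Summon or the knowledgebase is actually implemented.

```python
from dataclasses import dataclass, field


@dataclass
class LibraryHoldings:
    """Identifiers a library has rights to, as tracked in a knowledgebase."""
    isbns: set = field(default_factory=set)
    issns: set = field(default_factory=set)
    database_ids: set = field(default_factory=set)


def has_rights(record: dict, holdings: LibraryHoldings) -> bool:
    """True if any identifier on the record matches the library's holdings."""
    if record.get("isbn") in holdings.isbns:
        return True
    if record.get("issn") in holdings.issns:
        return True
    return record.get("database_id") in holdings.database_ids


# "DB-EXAMPLE" is an invented database ID for the sketch.
holdings = LibraryHoldings(isbns={"9780191015021"}, database_ids={"DB-EXAMPLE"})

print(has_rights({"isbn": "9780191015021"}, holdings))
print(has_rights({"isbn": "9780199657629"}, holdings))
print(has_rights({"database_id": "DB-EXAMPLE"}, holdings))
```

The second lookup fails because only the e-book ISBN is held, which illustrates why a wrong or variant identifier in the metadata can hide content a library has paid for.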
The Content Ingestion Process at Summon for Commercial Content
Identify New Content to Add into Summon
Engage with Publisher/Provider
Pre-Agreement Content Sample Analysis
Formalize and Sign Data Sharing Agreement
Data is Delivered in Full from Publisher/Provider
Data Normalization, Mapping, and Enrichment
Data Normalization, Mapping, and Enrichment Work
Data Normalization
• Very basic high-level clean-up of the data to standardize it
• Examples include:
  • Remove leading/trailing white space in Title and Subtitle fields
  • Clean up diacritics and other encoding issues
Mapping
• Map the metadata fields in the records to the Summon schema
• This allows the metadata to appear in the UI and/or be made searchable within Summon
Enrichment
• Enrich the content by adding metadata when applicable
• Examples:
  • Scholarly/peer-reviewed flags from Ulrich’s
  • Citation counts from Scopus
  • Book cover images from Syndetics
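The three stages above can be sketched in a few lines. The target field names below are invented stand-ins, not the Summon schema, and the enrichment source is a plain dictionary standing in for an external service. Only the overall shape (clean, then rename, then add) is taken from the slide.

```python
# Provider field -> target field. The target names are hypothetical.
FIELD_MAP = {
    "title": "Title",
    "subtitle": "Subtitle",
    "publisher-name": "Publisher",
    "pub-date": "PublicationDate",
}


def map_record(provider_record: dict, enrichment: dict = None) -> dict:
    """Trim values, rename fields, then merge in any enrichment data."""
    mapped = {}
    for source, target in FIELD_MAP.items():
        value = provider_record.get(source, "").strip()  # normalization
        if value:
            mapped[target] = value                       # mapping
    extra = (enrichment or {}).get(provider_record.get("eisbn13", ""), {})
    mapped.update(extra)                                 # enrichment
    return mapped


provider_record = {
    "eisbn13": "9780191015021",
    "title": "  The Global Revolution ",
    "subtitle": "A History of International Communism 1917-1991 ",
    "publisher-name": "Oxford University Press",
    "pub-date": "20140815",
}
# Stand-in for a cover image service, keyed by eISBN.
cover_images = {"9780191015021": {"CoverImage": "9780191015021.jpg"}}

mapped = map_record(provider_record, cover_images)
print(mapped)
```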
The Content Ingestion Process at Summon for Commercial Content
Identify New Content to Add into Summon
Engage with Publisher/Provider
Pre-Agreement Content Sample Analysis
Formalize and Sign Data Sharing Agreement
Data is Delivered in Full from Publisher/Provider
Data Normalization, Mapping, and Enrichment
Indexing
The Title as it Appears in Summon Once Indexed
The Content Ingestion Process at Summon for Commercial Content
Identify New Content to Add into Summon
Engage with Publisher/Provider
Pre-Agreement Content Sample Analysis
Formalize and Sign Data Sharing Agreement
Data is Delivered in Full from Publisher/Provider
Data Normalization, Mapping, and Enrichment
Indexing
Post-Ingestion Maintenance
Post-Ingestion Maintenance
Currency
• Currency is the process by which the publisher/provider sends Summon new or updated metadata records, or deletion records for content that needs to be removed
• Frequency of providing updates is often at the discretion of the publisher/provider
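As a sketch of what a currency update involves, the function below applies a batch of additions, updates, and deletions to a set of records held in a dictionary. The `action` values, the record IDs, and the dictionary-as-index are all assumptions for illustration. A real index and a real provider feed will look different.

```python
def apply_updates(index: dict, updates: list) -> dict:
    """Apply add/update/delete instructions to records keyed by record ID."""
    for update in updates:
        record_id = update["id"]
        if update["action"] == "delete":
            index.pop(record_id, None)  # deleting an unknown ID is a no-op
        else:
            index[record_id] = update["record"]  # add and update both overwrite
    return index


# Invented record IDs and titles for the sketch.
index = {
    "rec-1": {"title": "Old Title"},
    "rec-2": {"title": "Withdrawn Title"},
}
updates = [
    {"id": "rec-1", "action": "update", "record": {"title": "Corrected Title"}},
    {"id": "rec-2", "action": "delete"},
    {"id": "rec-3", "action": "add", "record": {"title": "New Title"}},
]

index = apply_updates(index, updates)
print(sorted(index))
```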
Metadata Issues
• Address reported issues of metadata quality from Summon customers
• Most issues involve incorrect metadata, or slight variations in the metadata that may impact OpenURL linking or the record deduplication process (Match & Merge)
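To show why slight variations matter for deduplication, here is a simplified matching sketch: two records count as the same item when their normalized ISBN and normalized title agree. This is an illustration of the general idea only, not the Match & Merge algorithm, and the function names are hypothetical.

```python
import re


def match_key(record: dict) -> tuple:
    """Build a comparison key that ignores case, punctuation and ISBN hyphens."""
    isbn = re.sub(r"[^0-9Xx]", "", record.get("isbn", "")).upper()
    title = re.sub(r"[^a-z0-9]+", " ", record.get("title", "").lower()).strip()
    return (isbn, title)


def group_duplicates(records: list) -> list:
    """Return groups of records that share a match key."""
    groups = {}
    for record in records:
        groups.setdefault(match_key(record), []).append(record)
    return [group for group in groups.values() if len(group) > 1]


# The first two titles are the XML and MARC forms of the same book from these
# slides; the third has a typo in its ISBN and so fails to match.
records = [
    {"isbn": "9780191015021",
     "title": "The Global Revolution: A History of International Communism 1917-1991"},
    {"isbn": "978-0-19-101502-1",
     "title": "The global revolution : a history of international communism, 1917-1991"},
    {"isbn": "9780191015012",
     "title": "The Global Revolution: A History of International Communism 1917-1991"},
]

duplicates = group_duplicates(records)
print(len(duplicates), len(duplicates[0]))
```

Normalization absorbs differences in punctuation and capitalization, but a single wrong digit in an identifier still splits the records apart, which is the kind of issue customers report.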
Thank you – Any Questions?
Heather Sherman
Benjamin Johnson
Dave Hovenden