XML & MODS INTRODUCTION TO XML AND THE METADATA OBJECT DESCRIPTION SCHEMA IN THE CTDA

CTDA Workshop on XML and MODS


XML

XML stands for eXtensible Markup Language. XML was designed to describe data, whereas HTML was designed to display data.

XML uses “tags”. In metadata land, these are also referred to as “labels”, “elements”, or “fields”. These tags are not predefined but are meant to be self-descriptive.

You can invent your own tags in XML.

By itself, XML DOES NOT DO ANYTHING. XML needs a script written by someone or a piece of software to receive, send, transform, or display it.

XML is a software- and hardware-independent tool for carrying information. It is not a replacement for HTML, but it can complement HTML.

Extensible

You can create and define your own tags.

<note>

<myAwesomeNote>

<thisIsMyTag>

The power of being extensible is the ability to customize your XML.

Markup

It’s all about the <tags>. The angle brackets are the most recognizable feature of XML. These tags, or elements, are very similar to the ones in HTML.

Tags are surrounded by angle brackets. Each element has an opening tag and a closing tag, as in HTML.

Language

XML is a language, or rather a “meta-language”: it allows you to create and define other languages.

Have you ever heard of RSS feeds, XSLT, or XSD?

Languages such as XSLT and XSD are sometimes referred to as members of the XML family.

XSLT is eXtensible Stylesheet Language Transformations

XSD is XML Schema Definition

XML Documents

When you create an XML file or document, you are essentially creating a text file with the extension .xml. Because it is a text file, it can be read by any type of software or hardware. This is why XML simplifies data sharing and transport. It also helps when you change platforms, because text can be read by a large number of programs and systems.

XML documents all have the same structure, called a tree. A tree has a trunk, limbs, and leaves.

The XML declaration declares that this is an XML document. The trunk of the tree is called the root. The limbs and leaves are called children. Because every other element descends from it, the root is also the ultimate “parent”.

XML Document

<?xml version="1.0" encoding="UTF-8"?>
<note>
  <to>Homer</to>
  <from>Marcy</from>
  <heading>Reminder</heading>
  <body>Don’t forget about the BBQ this weekend</body>
</note>

<root>
  <child>
    <subchild>…</subchild>
  </child>
</root>

Diagram: the note document as a tree, labeling the XML declaration, the root or ultimate parent element (note), and the children of note (to, from, heading, body).

XML Expanded

<note> is the root. It is also the parent of 4 children.

<to>, <from>, <heading>, and <body> are children of their parent, <note>, and are siblings of one another.

A parent element does not necessarily have to be the root element in the XML file.
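XML by itself does nothing; a script has to read it. As a minimal sketch, assuming Python 3 and its standard-library xml.etree.ElementTree module (not a tool the workshop itself uses), the code below parses the note document and walks from the root to its children. The helper name describe_tree is hypothetical.

```python
import xml.etree.ElementTree as ET

# The note document from above (attribute and declaration quotes must be straight).
NOTE_XML = """<?xml version="1.0" encoding="UTF-8"?>
<note>
  <to>Homer</to>
  <from>Marcy</from>
  <heading>Reminder</heading>
  <body>Don't forget about the BBQ this weekend</body>
</note>"""


def describe_tree(xml_text):
    """Hypothetical helper: return the root's tag and its direct children's tags."""
    root = ET.fromstring(xml_text)
    return root.tag, [child.tag for child in root]


root_tag, child_tags = describe_tree(NOTE_XML)
print(root_tag)    # note
print(child_tags)  # ['to', 'from', 'heading', 'body']

# Siblings share one parent, so each can be reached by iterating over <note>.
note = ET.fromstring(NOTE_XML)
for child in note:
    print(f"{child.tag}: {child.text}")
```

Iterating over an element yields its children in document order, which mirrors the parent/child/sibling vocabulary used above.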

All elements must have a closing tag.

Element names are case sensitive.

All elements must be properly nested.

All XML documents (or files) must have a root element.

All attribute values must be quoted.

Special characters such as & and < cannot appear as-is in text; write them with the 5 predefined entity references: &amp; (&), &lt; (<), &gt; (>), &quot; (") and &apos; (').
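To see these rules in action, here is a hedged sketch, again assuming Python's standard-library ElementTree: a document that breaks one of the rules makes the parser raise ParseError. The helper is_well_formed and the sample snippets are illustrative only.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Hypothetical helper: True if the text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# One snippet per rule; only the first and the escaped one are well-formed.
EXAMPLES = {
    "closed and nested": "<note><to>Homer</to></note>",
    "missing closing tag": "<note><to>Homer</note>",
    "case mismatch": "<Note><to>Homer</to></note>",
    "improper nesting": "<note><to>Homer</note></to>",
    "two root elements": "<to>Homer</to><from>Marcy</from>",
    "unquoted attribute": "<book category=children></book>",
    "raw ampersand": "<title>Salt & Pepper</title>",
    "escaped ampersand": "<title>Salt &amp; Pepper</title>",
}

for label, snippet in EXAMPLES.items():
    print(f"{label}: {is_well_formed(snippet)}")

# After parsing, the entity reference is turned back into the character.
print(ET.fromstring(EXAMPLES["escaped ampersand"]).text)  # Salt & Pepper
```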

More XML

XML has comments that appear in the following syntax:

<!-- Add your comments here -->

White space is preserved in XML: “Hello      Homer.” keeps all of its spaces, whereas HTML would collapse them to “Hello Homer.”

XML stores a new line as a line feed (LF), whereas Windows text files use a carriage return plus a line feed (CR+LF); XML parsers normalize CR+LF to LF. Use an XML-aware editor such as Notepad++ or Oxygen to edit your XML.

An XML document is well-formed if it conforms to the rules above.
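A small sketch of the comment, white-space and line-ending behavior described above, assuming Python 3.8 or later with the standard-library ElementTree parser (other XML tools may expose comments differently).

```python
import xml.etree.ElementTree as ET

# Runs of spaces inside text content survive parsing unchanged.
body = ET.fromstring("<body>Hello      Homer.</body>")
print(repr(body.text))  # 'Hello      Homer.'

# A Windows-style CR+LF line break is normalized to a single LF.
lines = ET.fromstring("<body>line one\r\nline two</body>")
print(repr(lines.text))  # 'line one\nline two'

# The default parser skips comments, so <note> ends up with one child.
note = ET.fromstring("<note><!-- Add your comments here --><body/></note>")
print(len(note))  # 1

# To keep comments, give the parser a TreeBuilder that inserts them (3.8+).
parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
note_with_comment = ET.fromstring(
    "<note><!-- Add your comments here --><body/></note>", parser=parser
)
print(note_with_comment[0].tag is ET.Comment)  # True
print(repr(note_with_comment[0].text))         # ' Add your comments here '
```

Comments are for human readers, which is why many parsers drop them unless asked to keep them.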

What is an element?

An element is everything from the start tag to the end tag, including both tags.

An element can contain:

• Other elements

• Text

• Attributes

• A mix of the above

<bookstore>
  <book category="children">
    <title>Harry Potter</title>
    <author>J.K. Rowling</author>
    <year>2005</year>
  </book>
  <book category="young adult">
    <title>Hunger Games, book 1</title>
    <author>Suzanne Collins</author>
    <year>2008</year>
  </book>
</bookstore>
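The bookstore example mixes all three kinds of content: child elements, text, and attributes. A minimal sketch, assuming Python's standard-library ElementTree, that reads each book's category attribute and the text of its child elements; list_books is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

BOOKSTORE_XML = """<bookstore>
  <book category="children">
    <title>Harry Potter</title>
    <author>J.K. Rowling</author>
    <year>2005</year>
  </book>
  <book category="young adult">
    <title>Hunger Games, book 1</title>
    <author>Suzanne Collins</author>
    <year>2008</year>
  </book>
</bookstore>"""


def list_books(xml_text):
    """Hypothetical helper: (category, title, author, year) for each <book>."""
    root = ET.fromstring(xml_text)
    return [
        (book.get("category"),          # attribute value
         book.findtext("title"),        # text of a child element
         book.findtext("author"),
         int(book.findtext("year")))    # XML text is always a string; convert it
        for book in root.findall("book")
    ]


for category, title, author, year in list_books(BOOKSTORE_XML):
    print(f"{title} by {author} ({year}), category: {category}")
```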

Elements

XML naming rules:

• Element names are case sensitive.

• Element names must start with a letter or an underscore.

• Element names can’t start with the letters xml (XML, xMl, xmL, etc.).

• Element names can contain letters, digits, hyphens, underscores, and periods.

• Element names cannot contain spaces.
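The naming rules can be turned into a quick check. This is a sketch under stated assumptions: it covers only the simplified rules listed above (ASCII letters only), while the full XML specification also allows many non-ASCII letters, and namespaces add one colon for a prefix. The function name is hypothetical.

```python
import re

# Simplified rules: first character a letter or underscore, then letters,
# digits, hyphens, underscores or periods. No spaces or other symbols.
_NAME_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*")


def is_valid_element_name(name):
    """Hypothetical helper: check a proposed element name against the rules."""
    if not _NAME_PATTERN.fullmatch(name):
        return False  # bad first character, a space, or a disallowed symbol
    return not name.lower().startswith("xml")  # reserved in any letter case


for candidate in ["myAwesomeNote", "_draft", "first.name", "2ndNote", "my note", "xMlData"]:
    print(candidate, is_valid_element_name(candidate))
```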

Attributes

Attributes provide additional information about elements. Attribute values must be placed in quotes.

<person gender="female">

<book category="young adult">

Notice that attribute values can contain spaces. However, an attribute cannot hold multiple values or a tree structure, so attributes are not very extensible. The same information can be written as a child element instead:

<person>

<gender>female</gender>

</person>

When would you use an attribute rather than an element? It depends on what you need, and on whether you are writing an XML document based on definitions already decided for you, such as a metadata standard.
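To make the trade-off concrete, a minimal sketch assuming Python's standard-library ElementTree: the same gender value read from an attribute and from a child element, plus what happens when you try to give one attribute two values. The language element is a hypothetical example.

```python
import xml.etree.ElementTree as ET

as_attribute = ET.fromstring('<person gender="female"/>')
as_element = ET.fromstring("<person><gender>female</gender></person>")

# An attribute is read from the start tag's name/value pairs...
print(as_attribute.get("gender"))     # female
# ...while a child element is looked up by tag and read as text.
print(as_element.findtext("gender"))  # female

# Child elements can repeat, so they can hold several values.
several = ET.fromstring(
    "<person><language>eng</language><language>fre</language></person>"
)
print([lang.text for lang in several.findall("language")])  # ['eng', 'fre']

# The same attribute name twice in one start tag is not well-formed.
try:
    ET.fromstring('<person language="eng" language="fre"/>')
    duplicate_rejected = False
except ET.ParseError:
    duplicate_rejected = True
print(duplicate_rejected)  # True
```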

Name Conflicts

Because you can create your own elements, elements sometimes have the same name but refer to very different things.

Here’s an HTML table:

Here’s a table that is a piece of furniture:

If we combine these XML documents, there will be a conflict. How do you tell the HTML <table> apart from the furniture <table>?

Namespaces – The Name Authority of XML

Name conflicts such as this are resolved by adding a prefix. Each prefix is bound to a namespace, which must be declared with the xmlns attribute in the start tag of the element that uses the prefix or of one of its ancestors, such as the root.

xmlns:prefix="URI"

The namespace URI is only a unique identifier, so in some cases it can be fictional. In many cases, however, it is a real URI associated with what is called a schema or a document type definition (DTD). A schema (XSD) is like a dictionary and grammar for an XML document. It outlines the syntax and semantics that an XML document must follow in order to conform to that schema.

For example, an XML file that is a MODS record and references the MODS schema must conform to the syntax and semantics required by MODS, as specified by the MODS schema. In the same way, if you want to learn German, you need a German dictionary and grammar book to help you write in German.
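Here is how a prefix resolves the table conflict in practice. This is a sketch assuming Python's standard-library ElementTree; the two table documents are hypothetical stand-ins, and the furniture URI is deliberately fictional (example.org), which the text above notes is allowed.

```python
import xml.etree.ElementTree as ET

HTML_NS = "http://www.w3.org/TR/html4/"         # used only as an identifier
FURNITURE_NS = "https://example.org/furniture"  # fictional, never fetched

COMBINED = f"""<root xmlns:h="{HTML_NS}" xmlns:f="{FURNITURE_NS}">
  <h:table><h:tr><h:td>Apples</h:td><h:td>Bananas</h:td></h:tr></h:table>
  <f:table><f:name>Coffee Table</f:name><f:width>80</f:width></f:table>
</root>"""

root = ET.fromstring(COMBINED)

# The parser replaces each prefix with its namespace URI, so the two
# <table> elements no longer share a name.
print([child.tag for child in root])
# ['{http://www.w3.org/TR/html4/}table', '{https://example.org/furniture}table']

# Lookups go through the URI; the prefixes in this mapping need not match
# the ones written in the document.
ns = {"html": HTML_NS, "furn": FURNITURE_NS}
print([td.text for td in root.iterfind("html:table/html:tr/html:td", ns)])  # ['Apples', 'Bananas']
print(root.findtext("furn:table/furn:name", namespaces=ns))  # Coffee Table
```

The second lookup shows the key design point: the URI, not the prefix, is what identifies the vocabulary.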

Metadata Object Description Schema

MODS is an XML-based bibliographic description schema developed and maintained by the Library of Congress. It is a compromise between the simplicity of Dublin Core and the complexity of MARC. It was developed in 2002. At the time of this workshop, the current version of MODS is 3.6.

The main web site for MODS: http://www.loc.gov/standards/mods/.

This site provides information about the standard, guidelines, tools, schemas (for each version of MODS), conversions, etc.

The CTDA does not implement the full standard of MODS.

CTDA Implementation of MODS

CTDA’s implementation guidelines and metadata application profile can be found online on our web site (http://ctdigitalarchive.org/resources-for-participants).

These guidelines and profile are based on the full standard and, in part, on the capabilities of the technical infrastructure for managing metadata. Such capabilities include indexing, mapping/transforming, reusing, sharing, displaying, and extracting metadata.

CTDA implements MODS version 3.5 and references that version in MODS XML records using the XML namespace declaration, xmlns, and the prefix, mods.

Minimum MODS XML

XML Declaration
◦ <?xml version="1.0" encoding="UTF-8"?>

Root
◦ <mods:mods xmlns:mods="http://www.loc.gov/mods/v3" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" version="3.5" xsi:schemaLocation="http://www.loc.gov/mods/v3 http://www.loc.gov/standards/mods/v3/mods-3-5.xsd">

Title
◦ <mods:titleInfo><mods:title>

Resource type
◦ <mods:typeOfResource>

Digital Resource
◦ <mods:physicalDescription><mods:digitalOrigin>

Held By
◦ <mods:note type="ownership">

Rights
◦ <mods:accessCondition type="use and reproduction">

Persistent Identifier
◦ <mods:identifier type="hdl">

Language of MODS record
◦ <mods:recordInfo><mods:languageOfCataloging><mods:languageTerm type="code" authority="iso639-2b">

Remember that each opening tag needs a closing tag, and that the elements must follow the specific MODS tree defined by the MODS specification, or schema, version 3.5.

Example of Minimal MODS XML Document

<?xml version="1.0" encoding="UTF-8"?>
<mods:mods xmlns:mods="http://www.loc.gov/mods/v3" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" version="3.5" xsi:schemaLocation="http://www.loc.gov/mods/v3 http://www.loc.gov/standards/mods/v3/mods-3-5.xsd">
  <mods:titleInfo>
    <mods:title>This is an example title for an image</mods:title>
  </mods:titleInfo>
  <mods:typeOfResource>still image</mods:typeOfResource>
  <mods:physicalDescription>
    <mods:digitalOrigin>reformatted digital</mods:digitalOrigin>
  </mods:physicalDescription>
  <mods:note type="ownership">Bridgeport History Center, Bridgeport Public Library</mods:note>
  <mods:accessCondition type="use and reproduction">Rights statement</mods:accessCondition>
  <mods:identifier type="hdl">http://hdl.handle.net/11134/110002:495858</mods:identifier>
  <mods:recordInfo>
    <mods:languageOfCataloging>
      <mods:languageTerm type="code" authority="iso639-2b">eng</mods:languageTerm>
    </mods:languageOfCataloging>
  </mods:recordInfo>
</mods:mods>
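Before submitting, it can help to confirm that a record contains every CTDA minimum element. The sketch below assumes Python's standard-library ElementTree; the helper and its element paths are derived from the minimum list above and are hypothetical, not CTDA's own ingest checks. The sample record omits the xlink/xsi declarations for brevity.

```python
import xml.etree.ElementTree as ET

MODS_NS = {"mods": "http://www.loc.gov/mods/v3"}

# Paths (relative to <mods:mods>) for the minimum elements listed above.
REQUIRED_PATHS = {
    "Title": "mods:titleInfo/mods:title",
    "Resource type": "mods:typeOfResource",
    "Digital origin": "mods:physicalDescription/mods:digitalOrigin",
    "Held by": "mods:note[@type='ownership']",
    "Rights": "mods:accessCondition[@type='use and reproduction']",
    "Persistent identifier": "mods:identifier[@type='hdl']",
    "Language of record": "mods:recordInfo/mods:languageOfCataloging/mods:languageTerm",
}


def missing_minimum_elements(xml_text):
    """Hypothetical helper: labels of minimum elements that are absent or empty."""
    root = ET.fromstring(xml_text)
    missing = []
    for label, path in REQUIRED_PATHS.items():
        element = root.find(path, MODS_NS)
        if element is None or not (element.text or "").strip():
            missing.append(label)
    return missing


MINIMAL_RECORD = """<mods:mods xmlns:mods="http://www.loc.gov/mods/v3" version="3.5">
  <mods:titleInfo><mods:title>This is an example title for an image</mods:title></mods:titleInfo>
  <mods:typeOfResource>still image</mods:typeOfResource>
  <mods:physicalDescription><mods:digitalOrigin>reformatted digital</mods:digitalOrigin></mods:physicalDescription>
  <mods:note type="ownership">Bridgeport History Center, Bridgeport Public Library</mods:note>
  <mods:accessCondition type="use and reproduction">Rights statement</mods:accessCondition>
  <mods:identifier type="hdl">http://hdl.handle.net/11134/110002:495858</mods:identifier>
  <mods:recordInfo><mods:languageOfCataloging><mods:languageTerm type="code" authority="iso639-2b">eng</mods:languageTerm></mods:languageOfCataloging></mods:recordInfo>
</mods:mods>"""

print(missing_minimum_elements(MINIMAL_RECORD))  # []
```

Because the type attribute is part of each path, an identifier with type="local" would not count as the persistent identifier.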

MODS XML Explained

XML declaration
Root (mods:mods)
  mods:titleInfo
    mods:title
  mods:typeOfResource (controlled vocabulary)
  mods:physicalDescription
    mods:digitalOrigin (controlled vocabulary)
  mods:note
  mods:accessCondition
  mods:identifier
  mods:recordInfo
    mods:languageOfCataloging
      mods:languageTerm

XML declaration

Open root

Open 1st child (titleInfo)

Open 1st grandchild (or child of parent titleInfo) (title)

Add content

Close 1st grandchild (title)

Close 1st child (titleInfo)

Open 2nd child (typeOfResource)

Add content

Close 2nd child (typeOfResource)

Open 3rd child (physicalDescription)

Open child of parent physicalDescription (digitalOrigin)

Add content using one of the terms required by the schema

Close child of parent (digitalOrigin)

Close 3rd child (physicalDescription)

Open 4th child (note)

Add attribute type with suggested value based on LC recommendations

Add content

Close 4th child (note)

ETC.


Attributes go in the opening tag only.
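The open/close steps above map directly onto building a tree in code: each "open" becomes creating a child, and the closing tags are written for you when the tree is serialized. A minimal sketch, assuming Python's standard-library ElementTree; build_minimal_record and its parameters are hypothetical.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"
ET.register_namespace("mods", MODS_NS)  # serialize elements as mods:...
ET.register_namespace("xsi", XSI_NS)


def m(local_name):
    """Namespace-qualified MODS tag, e.g. m("title") -> "{http://...}title"."""
    return f"{{{MODS_NS}}}{local_name}"


def build_minimal_record(title, resource_type, origin, held_by, rights, handle):
    # Open root: version and schema location go in the opening tag only.
    root = ET.Element(m("mods"), {
        "version": "3.5",
        f"{{{XSI_NS}}}schemaLocation":
            "http://www.loc.gov/mods/v3 http://www.loc.gov/standards/mods/v3/mods-3-5.xsd",
    })
    # 1st child (titleInfo) and its child (title) with content.
    title_info = ET.SubElement(root, m("titleInfo"))
    ET.SubElement(title_info, m("title")).text = title
    # 2nd child (typeOfResource).
    ET.SubElement(root, m("typeOfResource")).text = resource_type
    # 3rd child (physicalDescription) and its child (digitalOrigin).
    physical = ET.SubElement(root, m("physicalDescription"))
    ET.SubElement(physical, m("digitalOrigin")).text = origin
    # 4th child (note) with its type attribute, then the remaining children.
    ET.SubElement(root, m("note"), {"type": "ownership"}).text = held_by
    ET.SubElement(root, m("accessCondition"), {"type": "use and reproduction"}).text = rights
    ET.SubElement(root, m("identifier"), {"type": "hdl"}).text = handle
    record_info = ET.SubElement(root, m("recordInfo"))
    cataloging = ET.SubElement(record_info, m("languageOfCataloging"))
    ET.SubElement(cataloging, m("languageTerm"),
                  {"type": "code", "authority": "iso639-2b"}).text = "eng"
    return root  # closing tags are produced when the tree is serialized


record = build_minimal_record(
    "This is an example title for an image", "still image", "reformatted digital",
    "Bridgeport History Center, Bridgeport Public Library", "Rights statement",
    "http://hdl.handle.net/11134/110002:495858",
)
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

ET.tostring does not emit the XML declaration; to write a file with one, ET.ElementTree(record).write("record.xml", encoding="UTF-8", xml_declaration=True) should work (the file name is only an example).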

Particulars of MODS

typeOfResource has a required value list: text; cartographic; notated music; sound recording-musical; sound recording-nonmusical; sound recording; still image; moving image; three dimensional object; software, multimedia; mixed material.

digitalOrigin has a required value list: born digital; reformatted digital; digitized microfilm; digitized other analog.

languageTerm requires the attribute type with the value code and the attribute authority with the value iso639-2b.

The attribute qualifier for dateIssued has a required value list: approximate, inferred, questionable.

There is an ORDER to how elements appear. For example, within cartographics, the element scale must appear before coordinates.

We don’t use the MODS element relatedItem.
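The value lists and the ordering rule lend themselves to a simple automated check. A sketch, assuming Python's standard-library ElementTree and exact, case-sensitive matching; the helper name and the sample record are hypothetical. In the MODS schema "software, multimedia" is a single value, and cartographics sits inside subject.

```python
import xml.etree.ElementTree as ET

MODS = "{http://www.loc.gov/mods/v3}"  # Clark-notation prefix for MODS tags

TYPE_OF_RESOURCE = {
    "text", "cartographic", "notated music", "sound recording-musical",
    "sound recording-nonmusical", "sound recording", "still image",
    "moving image", "three dimensional object", "software, multimedia",
    "mixed material",
}
DIGITAL_ORIGIN = {
    "born digital", "reformatted digital", "digitized microfilm",
    "digitized other analog",
}
DATE_QUALIFIER = {"approximate", "inferred", "questionable"}


def controlled_value_problems(xml_text):
    """Hypothetical helper: values outside the required lists, plus order errors."""
    root = ET.fromstring(xml_text)
    problems = []
    for el in root.iter(MODS + "typeOfResource"):
        if el.text not in TYPE_OF_RESOURCE:  # exact, case-sensitive match
            problems.append(f"typeOfResource: {el.text!r}")
    for el in root.iter(MODS + "digitalOrigin"):
        if el.text not in DIGITAL_ORIGIN:
            problems.append(f"digitalOrigin: {el.text!r}")
    for el in root.iter(MODS + "dateIssued"):
        qualifier = el.get("qualifier")
        if qualifier is not None and qualifier not in DATE_QUALIFIER:
            problems.append(f"dateIssued qualifier: {qualifier!r}")
    for carto in root.iter(MODS + "cartographics"):
        names = [child.tag.replace(MODS, "") for child in carto]
        if "scale" in names and "coordinates" in names \
                and names.index("scale") > names.index("coordinates"):
            problems.append("cartographics: scale must come before coordinates")
    return problems


SAMPLE = """<mods:mods xmlns:mods="http://www.loc.gov/mods/v3">
  <mods:typeOfResource>Still Image</mods:typeOfResource>
  <mods:physicalDescription><mods:digitalOrigin>reformatted digital</mods:digitalOrigin></mods:physicalDescription>
  <mods:originInfo><mods:dateIssued encoding="w3cdtf" qualifier="circa">1908</mods:dateIssued></mods:originInfo>
  <mods:subject><mods:cartographics>
    <mods:coordinates>42.023187, -71.852071</mods:coordinates><mods:scale>0.4583333333333333</mods:scale>
  </mods:cartographics></mods:subject>
</mods:mods>"""

for problem in controlled_value_problems(SAMPLE):
    print(problem)
```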

Particulars of CTDA MODS - Name

When you want to include a name, such as an author or contributor, the role must be specified and the entire name goes into one namePart element. The name element requires the type attribute, whose value must be one of personal, corporate, family, or conference. The child of role, roleTerm, requires the attributes authority and type, with the required values marcrelator and text respectively.

<mods:name type="personal">
  <mods:namePart>Smith, John, 1850-1899</mods:namePart>
  <mods:role>
    <mods:roleTerm authority="marcrelator" type="text">Author</mods:roleTerm>
  </mods:role>
</mods:name>
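If names are generated by a script, a small helper can enforce the CTDA name pattern: one namePart, a required type, and a roleTerm with the fixed authority and type attributes. A minimal sketch assuming Python's standard-library ElementTree; make_name is hypothetical.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)
NAME_TYPES = {"personal", "corporate", "family", "conference"}


def make_name(full_name, role, name_type="personal"):
    """Hypothetical helper: <mods:name> with the whole name in one namePart."""
    if name_type not in NAME_TYPES:
        raise ValueError(f"name type must be one of {sorted(NAME_TYPES)}")
    name = ET.Element(f"{{{MODS_NS}}}name", {"type": name_type})
    ET.SubElement(name, f"{{{MODS_NS}}}namePart").text = full_name
    role_el = ET.SubElement(name, f"{{{MODS_NS}}}role")
    # CTDA requires authority="marcrelator" and type="text" on roleTerm.
    ET.SubElement(role_el, f"{{{MODS_NS}}}roleTerm",
                  {"authority": "marcrelator", "type": "text"}).text = role
    return name


print(ET.tostring(make_name("Smith, John, 1850-1899", "Author"), encoding="unicode"))
```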

Particulars of CTDA MODS - Date

Dates are not required. If you add a date, CTDA uses the element dateIssued and requires the encoding attribute with the value w3cdtf, plus the keyDate attribute (on the single date, or on the start date of a range). For date ranges, you must also use the attribute point with the value start or end.

Single Date:

<mods:originInfo>
  <mods:dateIssued encoding="w3cdtf" keyDate="yes">2010</mods:dateIssued>
</mods:originInfo>

Date Range:

<mods:originInfo>
  <mods:dateIssued encoding="w3cdtf" keyDate="yes" point="start">1907</mods:dateIssued>
  <mods:dateIssued encoding="w3cdtf" point="end">1917</mods:dateIssued>
</mods:originInfo>

Single Date with Qualifier:

<mods:originInfo>
  <mods:dateIssued encoding="w3cdtf" keyDate="yes" qualifier="inferred">1908</mods:dateIssued>
</mods:originInfo>
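The three date patterns can be produced by one function. This sketch assumes Python's standard-library ElementTree and accepts only the date-only W3CDTF forms (YYYY, YYYY-MM, YYYY-MM-DD); W3CDTF also allows times, which this simplified pattern rejects. The function name is hypothetical.

```python
import re
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)

# Simplified, date-only W3CDTF: YYYY, YYYY-MM or YYYY-MM-DD.
W3CDTF_DATE = re.compile(r"\d{4}(-(0[1-9]|1[0-2])(-(0[1-9]|[12]\d|3[01]))?)?")
QUALIFIERS = {"approximate", "inferred", "questionable"}


def origin_info(start, end=None, qualifier=None):
    """Hypothetical helper: <mods:originInfo> with one date or a start/end range."""
    for value in (start, end):
        if value is not None and not W3CDTF_DATE.fullmatch(value):
            raise ValueError(f"not a W3CDTF date: {value!r}")
    if qualifier is not None and qualifier not in QUALIFIERS:
        raise ValueError(f"qualifier must be one of {sorted(QUALIFIERS)}")
    info = ET.Element(f"{{{MODS_NS}}}originInfo")
    first = {"encoding": "w3cdtf", "keyDate": "yes"}  # keyDate on one date only
    if end is not None:
        first["point"] = "start"
    if qualifier is not None:
        first["qualifier"] = qualifier
    ET.SubElement(info, f"{{{MODS_NS}}}dateIssued", first).text = start
    if end is not None:
        ET.SubElement(info, f"{{{MODS_NS}}}dateIssued",
                      {"encoding": "w3cdtf", "point": "end"}).text = end
    return info


print(ET.tostring(origin_info("1907", "1917"), encoding="unicode"))
```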

Particulars of CTDA MODS - Coordinates

In CTDA you can record both a center point and a bounding box. The center point is recorded in the element <mods:coordinates>. MODS 3.5 does not have a convenient way to record a bounding box, so we use the <mods:extension> element to record bounding box information using the FGDC Content Standard for Digital Geospatial Metadata (CSDGM).

<mods:cartographics>
  <mods:scale>0.4583333333333333</mods:scale>
  <mods:coordinates>42.023187, -71.852071</mods:coordinates>
</mods:cartographics>

<mods:extension xmlns:fgdc="http://www.fgdc.gov/schemas/metadata/fgdc-std-001-1998.xsd">
  <fgdc:metadata>
    <fgdc:idinfo>
      <fgdc:spdom>
        <fgdc:bounding>
          <fgdc:westbc>-71.852071</fgdc:westbc>
          <fgdc:eastbc>-71.841559</fgdc:eastbc>
          <fgdc:northbc>42.030805</fgdc:northbc>
          <fgdc:southbc>42.023187</fgdc:southbc>
        </fgdc:bounding>
      </fgdc:spdom>
    </fgdc:idinfo>
  </fgdc:metadata>
</mods:extension>
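If you want a center point derived from the bounding box, a script can read the four bounds and take the midpoint. Note that the sample coordinates value above matches the box's south and west bounds rather than its middle. A sketch, assuming Python's standard-library ElementTree and a box that does not cross the 180th meridian; the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace URI exactly as written in the extension example above.
FGDC_NS = "http://www.fgdc.gov/schemas/metadata/fgdc-std-001-1998.xsd"

EXTENSION = f"""<fgdc:metadata xmlns:fgdc="{FGDC_NS}">
  <fgdc:idinfo><fgdc:spdom><fgdc:bounding>
    <fgdc:westbc>-71.852071</fgdc:westbc><fgdc:eastbc>-71.841559</fgdc:eastbc>
    <fgdc:northbc>42.030805</fgdc:northbc><fgdc:southbc>42.023187</fgdc:southbc>
  </fgdc:bounding></fgdc:spdom></fgdc:idinfo>
</fgdc:metadata>"""


def bounding_box(xml_text):
    """Hypothetical helper: read westbc/eastbc/northbc/southbc as floats."""
    ns = {"fgdc": FGDC_NS}
    bounding = ET.fromstring(xml_text).find(".//fgdc:bounding", ns)
    return {side: float(bounding.findtext(f"fgdc:{side}bc", namespaces=ns))
            for side in ("west", "east", "north", "south")}


def center_point(box):
    """Midpoint as 'lat, lon' text, in the same order as mods:coordinates."""
    lat = (box["north"] + box["south"]) / 2
    lon = (box["west"] + box["east"]) / 2
    return f"{lat:.6f}, {lon:.6f}"


print(center_point(bounding_box(EXTENSION)))  # 42.026996, -71.846815
```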

Particulars of CTDA MODS – Aggregating Content

There is one repository where all content is stored for long-term preservation. Content can be presented on different “channels” or sites. One way of doing this is with what are called Aggregation Tags. Each tag is 3 uppercase letters and designates a particular channel. The index is configured to recognize these tags and then push content to where it needs to go. CTDA has 2 tags: CHO and GEO. These tags are values that go in the element <mods:targetAudience>. This element, targetAudience, CANNOT be used for any other type of content or for tags made up on the fly.

<mods:targetAudience>CHO</mods:targetAudience>

<mods:targetAudience>GEO</mods:targetAudience>

Question: What is the parent element of this element?

Question: What’s the difference between <mods:targetAudience> and <targetAudience>?
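Because targetAudience is reserved for aggregation tags, it is worth checking that it holds nothing else. A minimal sketch, assuming Python's standard-library ElementTree; the helper and sample record are hypothetical. Note how an un-prefixed <targetAudience> with no default namespace declared is in no namespace at all, so a search for the MODS element does not see it.

```python
import xml.etree.ElementTree as ET

MODS_NS = {"mods": "http://www.loc.gov/mods/v3"}
AGGREGATION_TAGS = {"CHO", "GEO"}  # the two CTDA aggregation tags


def target_audience_problems(xml_text):
    """Hypothetical helper: targetAudience values that are not aggregation tags."""
    root = ET.fromstring(xml_text)
    values = [(el.text or "").strip()
              for el in root.findall("mods:targetAudience", MODS_NS)]
    return [value for value in values if value not in AGGREGATION_TAGS]


RECORD = """<mods:mods xmlns:mods="http://www.loc.gov/mods/v3">
  <mods:targetAudience>CHO</mods:targetAudience>
  <mods:targetAudience>GEO</mods:targetAudience>
  <mods:targetAudience>adults</mods:targetAudience>
  <targetAudience>XYZ</targetAudience>
</mods:mods>"""

# Only the namespaced elements are checked; the bare <targetAudience> is skipped.
print(target_audience_problems(RECORD))  # ['adults']
```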

How To Recognize Parent/Child Relationships?

If you go to the MODS 3.5 outline on the MODS web site (http://www.loc.gov/standards/mods/mods-outline-3-5.html), you will see a list of the TOP LEVEL elements. Top level elements are all children of the root. Each top level element is then described in terms of its children, required or recommended attributes, and other requirements.

Requirements of CTDA MODS

Well-Formed XML

The MODS XML document conforms to the requirements of the XML standard.

Do you remember the requirements?

There are online tools to check this:

http://www.w3schools.com/xml/xml_validator.asp

http://xmlgrid.net/validator.html

Oxygen XML editing software
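Besides the online tools, a short local script can check well-formedness and point to the first error. A minimal sketch, assuming Python's standard-library ElementTree; check_well_formed is hypothetical. It does not check validity against the MODS 3.5 schema, which needs a schema-aware tool such as the validators in the next section. For a file on disk, ET.parse(path) can be used in place of ET.fromstring.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text):
    """Hypothetical helper: 'well-formed', or the line/column of the first error."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        line, column = err.position
        return f"not well-formed (line {line}, column {column}): {err}"
    return "well-formed"


print(check_well_formed("<note>\n<to>Homer</to>\n</note>"))
print(check_well_formed("<note>\n<to>Homer</too>\n</note>"))
# A mods: prefix with no xmlns:mods declaration is also rejected.
print(check_well_formed("<mods:mods></mods:mods>"))
```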

Valid Document

The MODS XML document conforms to the requirements of MODS version 3.5.

http://www.loc.gov/standards/mods/v3/mods-3-5.xsd

What does this mean?

There are online tools to check this:

http://www.xmlvalidation.com/

http://www.utilities-online.info/xsdvalidation/#.VVS9x_lVhBc (requires you to input both your XML and the MODS 3.5 XSD)

Oxygen XML editing software

An example

Let’s write a MODS XML document from scratch…

Questions

Links:

http://ctdigitalarchive.org/resources-for-participants/

http://www.loc.gov/standards/mods/

http://www.w3schools.com/xml/default.asp

http://www.w3schools.com/xml/xml_schema.asp

http://www.w3schools.com/xml/xml_validator.asp

http://www.utilities-online.info/xsdvalidation/#.VVTCc_lVhBc

http://www.oxygenxml.com/

https://notepad-plus-plus.org/