Integrating Live Plant Images with Other Types of Biodiversity Records Steve Baskauf Vanderbilt Dept. of Biological Sciences http://bioimages.vanderbilt.edu/ August 3, 2010



I. Challenges in Biodiversity Informatics

• Common interest in databasing metadata.

• Metadata describe resources and their properties.

• Resource: anything that can be assigned an identifier (e.g. a tree, a specimen, an image, a taxon, a name, etc.)

• Property: a string literal that describes the resource, or a relationship between the subject resource and some other resource.


Example: Vanderbilt Arboretum, 5935 identified and geolocated trees


Example

subject resource (a tree) --[literal property: establishmentMeans]--> “native” (text string)

subject resource (a tree) --[relationship property: depiction]--> object resource (an image)


Relationship “graph”

the tree (7-314) --[establishmentMeans]--> “native”

the tree (7-314) --[depiction]--> image (79657)

Traditional database (typical for specimens)

Tree ID    Establishment Means    Image ID
7-314      native                 79657
7-340      native                 79674
4-145      cultivated             79684
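The rows of the flat table can also be written as subject-property-object statements, which is the shape the relationship graph uses. The following is a minimal sketch under stated assumptions: the function names and the dictionary keys are illustrative only, and the image number "00000" in the usage note is a made-up placeholder, not a real record.

```python
# Rows copied from the flat table; the dictionary keys are illustrative names.
ROWS = [
    {"tree_id": "7-314", "establishment_means": "native", "image_id": "79657"},
    {"tree_id": "7-340", "establishment_means": "native", "image_id": "79674"},
    {"tree_id": "4-145", "establishment_means": "cultivated", "image_id": "79684"},
]


def rows_to_triples(rows):
    """Turn each flat row into (subject, property, object) statements."""
    triples = []
    for row in rows:
        tree = row["tree_id"]
        # Literal property: the object is a text string.
        triples.append((tree, "establishmentMeans", row["establishment_means"]))
        # Relationship property: the object is another resource (an image).
        triples.append((tree, "depiction", row["image_id"]))
    return triples


def add_image(triples, tree_id, image_id):
    """Return a new list with one more depiction statement for a tree."""
    return triples + [(tree_id, "depiction", image_id)]


if __name__ == "__main__":
    for triple in rows_to_triples(ROWS):
        print(triple)
```

Calling `add_image(rows_to_triples(ROWS), "7-314", "00000")` adds a second image to the same tree without adding a column, which is the case a one-row-per-tree table handles poorly.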


Non-“flat” relationships in live-plant imaging

[Diagram: a live tree linked to a whole tree image, a leaf image, and a bark image (standardized views), and to a determination that points to a taxon.]

Standardized views: Baskauf and Kirchoff (2008) Vulpina 7:16-30


Duplicate herbarium specimens

[Diagram: a herbarium specimen at institution A and a duplicate herbarium specimen at institution B come from the same individual live tree. The diagram also shows a specimen image, determination A with taxon A, and determination B with taxon B.]


Complex relationships: individual-based organization system

[Diagram: the live tree (individual organism) is the central node. Shown with it are a whole tree image, a leaf image, a bark image, a herbarium specimen and its specimen image, determination A with taxon A, and determination B with taxon B.]

Baskauf (2010) Biodiversity Informatics 7:17-44


II. Building blocks of a Web-based metadata system

1. We need to be able to unambiguously identify the resources (globally unique identifiers = GUIDs)

2. We need standardized property definitions (e.g. Darwin Core terms)

3. We need a technological solution for communicating properties and relationships to a user anywhere (RDF/XML representation sent to user via the Internet)

Design principles: http://bioimages.vanderbilt.edu/guid


Building block #1: GUIDs

A globally unique identifier (GUID) should be:

1. globally unique

2. actionable

3. persistent

Anyone on the planet should be able to use the GUID to find out about the particular thing that it identifies, forever.

That is a pretty tall order (but you can do it)!!!


1. How do you make an identifier globally unique?

• Create a locally unique identifier:

– identifier (catalog number) unique within a collection, e.g. GIS tree ID number: 7-314

– namespace (collection code) unique within the institution, e.g. vanderbilt

– combined: vanderbilt/7-314

• Make it globally unique by prefixing it with a domain name that you control, e.g. bioimages.vanderbilt.edu


Complete HTTP URI GUID

• combine “http://” with other pieces: http://bioimages.vanderbilt.edu/vanderbilt/7-314

• This identifier looks like a URL!

An HTTP URI is a uniform resource identifier as well as a resource locator (web address=URL).
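The assembly of the pieces can be sketched in a few lines. This is only an illustration: the function name `make_http_uri_guid` and its validation rule (no empty parts, no slashes or spaces inside a part) are assumptions made for the example, not part of the Bioimages system.

```python
def make_http_uri_guid(domain, namespace, local_id):
    """Combine a controlled domain, a namespace, and a local ID into an HTTP URI.

    The checks below are an assumed, deliberately strict rule for this sketch.
    """
    for name, part in (("domain", domain), ("namespace", namespace), ("local_id", local_id)):
        if not part or "/" in part or " " in part:
            raise ValueError(f"invalid {name}: {part!r}")
    return f"http://{domain}/{namespace}/{local_id}"


if __name__ == "__main__":
    print(make_http_uri_guid("bioimages.vanderbilt.edu", "vanderbilt", "7-314"))
```

The local identifier only has to be unique within its namespace, and the namespace only within the domain; the domain name supplies the global uniqueness.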


2. What does actionable mean?

• Something happens when you put an actionable GUID in a Web browser (GUID is “resolved”).

• HTTP URIs

– unlike LSIDs and DOIs, they work in any web browser

– resolved using existing Internet infrastructure

– consensus GUID of Linked Data (Semantic Web) community

– http://bioimages.vanderbilt.edu/vanderbilt/7-314


3. Persistent URIs always work

• URIs “break” when filenames change:

JavaScript-based URI: http://bioimages.vanderbilt.edu/metadata.htm?baskauf/66921/metadata/img/3456/2304

Independent of method: http://bioimages.vanderbilt.edu/baskauf/66921.htm

Both URIs eventually lead to the same page, but the second URI is simpler and won’t change.

• URIs “break” when domain names disappear: bioblitznashville.org vs. vanderbilt.edu

• Planning for URI permanence is important.


How long is “persistent”?

• Forever is a pretty long time.

• The Internet is only 40 years old and the Web only 20.

• Plan for your institution and domain name to last at least 10 years.

• Don’t change the URI of anything that you are trying to identify!


Building block #2: Standardized property definitions

Recent consensus on metadata terms:

• Dublin Core Metadata Initiative (DCMI) = describes generic resources

• Friend-Of-A-Friend (FOAF) = describes people and their affiliations

• Darwin Core (DwC) = describes biodiversity resources

• Media Resources Task Group (MRTG) = describes media (e.g. images) in a biodiversity context


A property described by a metadata term:

• is an HTTP URI, e.g. http://rs.tdwg.org/dwc/terms/establishmentMeans

• has a definition that can be accessed via the Internet

• has an abbreviated form that usually makes sense to humans:

dwc: = http://rs.tdwg.org/dwc/terms/

so the abbreviated URI for the term is dwc:establishmentMeans
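The abbreviation is a plain string substitution, as the sketch below shows. The function names `expand` and `abbreviate` are illustrative. The `foaf` entry uses the standard FOAF namespace, http://xmlns.com/foaf/0.1/, which this slide does not spell out.

```python
# Prefix table: "dwc" is from the slide; the "foaf" namespace is the standard
# FOAF one and is added here as an extra illustration.
PREFIXES = {
    "dwc": "http://rs.tdwg.org/dwc/terms/",
    "foaf": "http://xmlns.com/foaf/0.1/",
}


def expand(abbreviated):
    """Expand 'prefix:localName' into the full term URI."""
    prefix, _, local_name = abbreviated.partition(":")
    if prefix not in PREFIXES or not local_name:
        raise ValueError(f"unknown prefix or empty term: {abbreviated!r}")
    return PREFIXES[prefix] + local_name


def abbreviate(uri):
    """Shorten a full term URI when its namespace is in the prefix table."""
    for prefix, namespace in PREFIXES.items():
        if uri.startswith(namespace):
            return f"{prefix}:{uri[len(namespace):]}"
    return uri  # no known namespace: leave the URI unchanged


if __name__ == "__main__":
    print(expand("dwc:establishmentMeans"))
    print(abbreviate("http://rs.tdwg.org/dwc/terms/establishmentMeans"))
```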


Building block #3: Communicating relationships

Resource Description Framework (RDF) graph

subject resource (tree): http://bioimages.vanderbilt.edu/vanderbilt/7-314

--[dwc:establishmentMeans]--> “native”

--[foaf:depiction]--> object resource (image): http://bioimages.vanderbilt.edu/baskauf/79657


How do you translate relationships into language a computer can understand?

RDF in XML format (a tiny snippet)

    <rdf:Description rdf:about="http://bioimages.vanderbilt.edu/vanderbilt/7-314">
      <dwc:establishmentMeans>native</dwc:establishmentMeans>
      <foaf:depiction rdf:resource="http://bioimages.vanderbilt.edu/baskauf/79657"/>
    </rdf:Description>
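To see that a program can recover the graph from the XML, here is a minimal sketch using only Python's standard library. It is not a general RDF/XML parser: it handles only the shape of the snippet (an rdf:Description with literal or rdf:resource children). The enclosing rdf:RDF element and its namespace declarations are added here because the snippet omits them, and the function name `extract_triples` is illustrative.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# The slide's snippet, wrapped in an rdf:RDF element with the namespace
# declarations it needs to be well-formed XML (added for this sketch).
DOCUMENT = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dwc="http://rs.tdwg.org/dwc/terms/"
    xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <rdf:Description rdf:about="http://bioimages.vanderbilt.edu/vanderbilt/7-314">
    <dwc:establishmentMeans>native</dwc:establishmentMeans>
    <foaf:depiction rdf:resource="http://bioimages.vanderbilt.edu/baskauf/79657"/>
  </rdf:Description>
</rdf:RDF>"""


def extract_triples(xml_text):
    """Return (subject, property URI, object) tuples from simple RDF/XML."""
    root = ET.fromstring(xml_text)
    triples = []
    for description in root.findall(f"{{{RDF_NS}}}Description"):
        subject = description.get(f"{{{RDF_NS}}}about")
        for prop in description:
            # ElementTree writes tags as "{namespace}localName"; joining the
            # two parts gives the full property URI.
            predicate = prop.tag[1:].replace("}", "", 1)
            target = prop.get(f"{{{RDF_NS}}}resource")
            obj = target if target is not None else (prop.text or "").strip()
            triples.append((subject, predicate, obj))
    return triples


if __name__ == "__main__":
    for triple in extract_triples(DOCUMENT):
        print(triple)
```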


III. Why use a new way to describe metadata?

• People are good at figuring out what web pages mean.

• Computers (like a GoogleBot) have to guess what the information on a web page means.

• The Semantic Web (sometimes called Web 3.0) provides a means to provide information to computers explicitly.


Content Negotiation, part 1

Client: “I am a human. Send me http://bioimages.vanderbilt.edu/vanderbilt/7-314”

Request: GET http://bioimages.vanderbilt.edu/vanderbilt/7-314
MIME type: text/html

Web server: “I cannot send this guy a tree!”

Response (web page): http://bioimages.vanderbilt.edu/vanderbilt/7-314.htm


Content Negotiation, part 2

Client: “I am a computer. Send me http://bioimages.vanderbilt.edu/vanderbilt/7-314”

Request: GET http://bioimages.vanderbilt.edu/vanderbilt/7-314
MIME type: application/rdf+xml

Web server: “10011000101!”

Response (XML file): http://bioimages.vanderbilt.edu/vanderbilt/7-314.rdf
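The server-side decision in both parts can be sketched as a small function. In HTTP the requested MIME type travels in the Accept request header. This sketch makes several assumptions: it takes the first listed type it recognizes, ignores quality values (q=...), and falls back to the web page. The function name and the fallback are illustrative and do not describe how the Bioimages server is actually configured.

```python
# Mapping from requested MIME type to the file extension used on the slides.
REPRESENTATIONS = {
    "text/html": ".htm",
    "application/rdf+xml": ".rdf",
}


def choose_representation(resource_uri, accept_header):
    """Return the URI of the document to send for a given Accept header."""
    for item in accept_header.split(","):
        # Drop any parameters such as ";q=0.9" (ignored in this sketch).
        mime_type = item.split(";")[0].strip().lower()
        if mime_type in REPRESENTATIONS:
            return resource_uri + REPRESENTATIONS[mime_type]
    # Assumed fallback: unknown or wildcard requests get the web page.
    return resource_uri + REPRESENTATIONS["text/html"]


if __name__ == "__main__":
    tree = "http://bioimages.vanderbilt.edu/vanderbilt/7-314"
    print(choose_representation(tree, "text/html"))
    print(choose_representation(tree, "application/rdf+xml"))
```

The same identifier serves both audiences: the tree itself cannot be sent, so the server picks a document about the tree in whichever format the client asked for.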


What’s so great about this?

• A computer can crawl the Web and discover metadata about resources that are identified by HTTP URI GUIDs.

• RDF metadata from many sources can be assembled into a database (RDF “triple store”).

• The database can be searched or used to generate web content.

• Source data does not need to be “sent” to the database; any “semantic web client” can retrieve it at will.

• The format is standard; no special communication protocols are required.
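The idea of assembling triples from several sources and then searching them can be illustrated with a toy in-memory store. This is a sketch, not a real triple store product: the class name `TripleStore` and its methods are invented for the example. `SOURCE_A` repeats the tree from the RDF snippet. `SOURCE_B` reuses tree 7-340 and image 79674 from the earlier table, but its URIs are assumed to follow the same pattern and are not confirmed by the slides.

```python
class TripleStore:
    """A toy store: a set of (subject, property, object) tuples."""

    def __init__(self):
        self._triples = set()

    def add_all(self, triples):
        """Merge triples from one source; duplicates are stored only once."""
        self._triples.update(triples)

    def match(self, subject=None, predicate=None, obj=None):
        """Return the triples matching a pattern; None acts as a wildcard."""
        return sorted(
            t for t in self._triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)
        )


BASE = "http://bioimages.vanderbilt.edu/"

SOURCE_A = [
    (BASE + "vanderbilt/7-314", "dwc:establishmentMeans", "native"),
    (BASE + "vanderbilt/7-314", "foaf:depiction", BASE + "baskauf/79657"),
]

# Assumed URIs: built by analogy with SOURCE_A, not taken from the slides.
SOURCE_B = [
    (BASE + "vanderbilt/7-340", "dwc:establishmentMeans", "native"),
    (BASE + "vanderbilt/7-340", "foaf:depiction", BASE + "baskauf/79674"),
]

if __name__ == "__main__":
    store = TripleStore()
    store.add_all(SOURCE_A)
    store.add_all(SOURCE_B)
    for triple in store.match(predicate="dwc:establishmentMeans", obj="native"):
        print(triple)
```

Because every source uses the same term URIs and the same statement shape, merging is a set union and needs no per-source import format.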


Why would this benefit me now?

• RDF/XML metadata files for numerous resources can be transformed directly into web pages using a single program file.

[Diagram label: single web page using XSLT and/or AJAX]


Benefits (cont.)

• Branding in the URI: http://bioimages.vanderbilt.edu/vanderbilt/7-314


Benefits (cont.)

• HTTP URI GUIDs provide direct access to metadata about a resource to anyone with Internet access.

– Clickable attribution link in website

– Reference link in publication PDF

– Physical QR codes for smartphone access


QR code on a museum display


http://bioimages.vanderbilt.edu/