Re-usable metadata, re-usable content


                                                             

UKOLN is supported by:


Paul Walk
Technical Manager
p.walk@ukoln.ac.uk

A centre of expertise in digital information management

www.ukoln.ac.uk

                                                             

harvesting, searching, syndicating

• options for metadata and content:

• the lines can be blurred

– search engines also harvest!

• your metadata may be my content

metadata content

harvestable

searchable ✓ ✓

syndicable ✓

                                                             

being harvestable (1)

• Open Archives Initiative

– OAI-PMH

– repositories

– OAI-ORE

• aggregators:

  • Intute Institutional Repository Search

    – currently harvesting eprints metadata records from 88 institutions

    – planning to explore the harvesting of metadata for:

      • images

      • learning objects

      • other media...

  • MLA’s Discover Service

    – your content is of interest to other domains
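To make the harvesting side concrete, here is a minimal Python sketch of what an aggregator does with an OAI-PMH repository: build a ListRecords request and pull Dublin Core titles and identifiers out of the response. The base URL and the sample response are hypothetical; the verb, the metadataPrefix parameter and the namespaces are those defined by OAI-PMH 2.0 and oai_dc. A real harvester would also fetch the document over HTTP and follow resumptionToken to page through large result sets.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespaces defined by OAI-PMH 2.0, the oai_dc container and Dublin Core
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build an OAI-PMH ListRecords request URL."""
    return base_url + "?" + urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})

def extract_records(xml_text):
    """Return (title, identifier) pairs from an oai_dc ListRecords response."""
    root = ET.fromstring(xml_text)
    results = []
    for record in root.iterfind(".//oai:record", NS):
        title = record.findtext(".//dc:title", default="", namespaces=NS)
        ident = record.findtext(".//dc:identifier", default="", namespaces=NS)
        results.append((title, ident))
    return results

# Hypothetical, heavily trimmed response from a repository
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.ac.uk:42</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example eprint</dc:title>
          <dc:identifier>http://eprints.example.ac.uk/42/</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

print(list_records_url("http://eprints.example.ac.uk/cgi/oai2"))
print(extract_records(SAMPLE))
```

Note that the header's own identifier (in the OAI namespace) is distinct from dc:identifier, which is usually the link to the content or its landing page.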

                                                             

being harvestable (2)

• what is your metadata record actually going to point to?

– more than one item of content?

– a ‘jumping off’ page?

– is this consistent?

• what metadata format are you going to use?

– is it commonly supported?

– are you using it correctly? (you’d be surprised...)

• where/how is your metadata going to be used?

– this is necessarily out of your control!
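One easy way to use a format incorrectly is to get its namespaces wrong. A minimal sketch, assuming simple Dublin Core in the oai_dc container, that builds a record with ElementTree so the namespaces are declared consistently. The field values are hypothetical. Following Ian Ibbotson’s comment later in this deck, it carries two dc:identifier values, one for a jumping-off page and one for the item itself; that convention is an assumption, not something the format requires.

```python
import xml.etree.ElementTree as ET

OAI_DC = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC = "http://purl.org/dc/elements/1.1/"

# Register prefixes so the serialised output uses oai_dc: and dc: rather than ns0:, ns1:
ET.register_namespace("oai_dc", OAI_DC)
ET.register_namespace("dc", DC)

def make_dc_record(fields):
    """Build an oai_dc record from (element_name, value) pairs.

    A list rather than a dict, because Dublin Core elements are repeatable.
    """
    root = ET.Element(f"{{{OAI_DC}}}dc")
    for name, value in fields:
        ET.SubElement(root, f"{{{DC}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")

record = make_dc_record([
    ("title", "Harbour at dawn"),                                   # hypothetical values
    ("identifier", "http://www.myserver.com/gallery/collections/pictures/"),                # jumping-off page
    ("identifier", "http://www.myserver.com/gallery/collections/pictures/image_0001.jpg"),  # the item itself
    ("format", "image/jpeg"),
])
print(record)
```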

                                                             

being searchable (1)

• exposing your content to search engines

• search engine optimisation (SEO)

– make it easy for the search engines

– have content people want

– make it eminently linkable

• Google is your friend!

– Sitemaps: describe your content in ways Google can understand

– an OAI-PMH interface can be treated as a Sitemap
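A Sitemap is just an XML list of URLs in the sitemaps.org 0.9 namespace. A minimal Python sketch that generates one; the URLs and dates are hypothetical. Letting an XML library write the file matters for query-string URLs, because a literal & must appear as &amp; in the document.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(entries):
    """Build a Sitemap document from (location, last_modified) pairs.

    last_modified is a W3C date string such as "2008-02-11", or None to omit <lastmod>.
    Returns UTF-8 encoded bytes, ready to be written to sitemap.xml.
    """
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = loc  # '&' is escaped to '&amp;' on output
        if lastmod:
            ET.SubElement(url, f"{{{SITEMAP_NS}}}lastmod").text = lastmod
    # default_namespace makes the output use xmlns="..." rather than an ns0: prefix
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True,
                       default_namespace=SITEMAP_NS)

sitemap = build_sitemap([
    ("http://www.myserver.com/gallery/collections/pictures/image_0001.jpg", "2008-02-11"),
    ("http://www.myserver.com/gallery.php?collection_id=7&item_id=0001", None),
])
print(sitemap.decode("utf-8"))
```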

                                                             

being searchable (2)

• Z39.50

– from the library domain

– allows the target to participate in a cross search

– very mature, very widely deployed

– not a web protocol

• SRU

– web-ified Z39.50

– ReSTful

– Common Query Language (CQL)

• SRW

– as above, but for heavier SOA/Web Services use

• OpenSearch

– piggyback on RSS/Atom
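What "web-ified" means in practice: an SRU search is an ordinary HTTP GET whose query string carries a CQL query, and an OpenSearch client fills a URL template published in a description document. A minimal Python sketch of both; the base URLs, the CQL index dc.title and the recordSchema value "dc" are assumptions (schema short names are defined by each server), while operation, version, query, maximumRecords and recordSchema are SRU 1.1 parameters and {searchTerms} and {startPage?} are OpenSearch template parameters.

```python
from urllib.parse import urlencode, quote

def sru_search_url(base_url, cql_query, max_records=10, record_schema="dc"):
    """Build an SRU 1.1 searchRetrieve request as a plain HTTP GET URL."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql_query,              # a CQL query string
        "maximumRecords": str(max_records),
        "recordSchema": record_schema,   # short schema names are defined by each server
    }
    return base_url + "?" + urlencode(params)

def fill_opensearch_template(template, search_terms, start_page=1):
    """Fill an OpenSearch URL template taken from a description document.

    Only {searchTerms} and the optional {startPage?} are handled here.
    """
    return (template
            .replace("{searchTerms}", quote(search_terms))
            .replace("{startPage?}", str(start_page)))

print(sru_search_url("http://sru.example.org/sru", 'dc.title = "harbour"'))
print(fill_opensearch_template("http://search.example.org/?q={searchTerms}&page={startPage?}", "old maps"))
```

The results of the OpenSearch request come back as RSS or Atom, which is what lets it piggyback on existing feed readers.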

                                                             

being searchable (3)

• search portals

• community portals

• institutional portals/VLEs

                                                             

be syndicable, enable re-use by 3rd parties

• consider RSS (and the Atom syndication format)

– in some ways the lingua franca of Web 2.0

– machine and human friendly

– surprising how much content lends itself to this structure

• RSS 2.0 can also ‘enclose’ binary data

– syndicating podcasts

• “the coolest use of your data will be thought of by someone else”

• be mashup friendly:

– addressable content

– cool URLs

– simple formats

– aspire to APIs that need no documentation!
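A minimal Python sketch of a syndicable feed: an RSS 2.0 channel whose items can carry an enclosure for binary content such as a podcast. The titles, URLs and file size are hypothetical; the required channel elements (title, link, description) and the enclosure attributes (url, length, type) are those of RSS 2.0.

```python
import xml.etree.ElementTree as ET

def build_rss(title, link, description, items):
    """Build a minimal RSS 2.0 feed.

    items: list of dicts with 'title' and 'link', plus an optional 'enclosure'
    tuple (url, length_in_bytes, mime_type) for binary content such as podcasts.
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description
    for item in items:
        entry = ET.SubElement(channel, "item")
        ET.SubElement(entry, "title").text = item["title"]
        ET.SubElement(entry, "link").text = item["link"]
        if "enclosure" in item:
            url, length, mime = item["enclosure"]
            # RSS 2.0 <enclosure> carries exactly these three attributes
            ET.SubElement(entry, "enclosure", url=url, length=str(length), type=mime)
    return ET.tostring(rss, encoding="unicode")

feed = build_rss(
    "Gallery talks", "http://www.myserver.com/gallery/", "Recorded gallery talks",
    [{"title": "Talk 1",
      "link": "http://www.myserver.com/gallery/talks/1",
      "enclosure": ("http://www.myserver.com/gallery/talks/1.mp3", 1234567, "audio/mpeg")}],
)
print(feed)
```

The same few elements are readable in a browser, a feed reader or a mashup tool, which is much of RSS’s appeal as a simple format.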

                                                             

human and machine interfaces (1)

• they’re completely different....right?

• well, not necessarily

– RSS!

– OAI-PMH with a CSS stylesheet referenced from the XML
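The stylesheet trick works because browsers honour an xml-stylesheet processing instruction placed in an XML document’s prolog. A minimal Python sketch of adding one to an OAI-PMH response; the stylesheet path is hypothetical, and services often reference an XSLT stylesheet (type="text/xsl") in the same way.

```python
def add_css_stylesheet(xml_text, css_href):
    """Insert an xml-stylesheet processing instruction so browsers render the XML with CSS.

    Assumes css_href is a trusted path containing no double quotes.
    """
    pi = f'<?xml-stylesheet type="text/css" href="{css_href}"?>'
    if xml_text.startswith("<?xml "):
        # The processing instruction must come after the XML declaration, if there is one
        end = xml_text.index("?>") + 2
        return xml_text[:end] + "\n" + pi + xml_text[end:]
    return pi + "\n" + xml_text

response = '<?xml version="1.0" encoding="UTF-8"?>\n<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"/>'
print(add_css_stylesheet(response, "/oai/oai2.css"))
```

The XML served to harvesters is unchanged apart from the processing instruction, which they ignore, so one response serves both audiences.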

                                                             

human and machine interfaces (2)

• ‘screen-scraping’ is back in fashion

• plain old semantic HTML (POSH)

• linked-data (the semantic web with a small ‘s’)

• the web of data is imminent!
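Part of what makes plain old semantic HTML scrapable is consistent markup: if every title carries the same class, a program can find it. A minimal sketch using only Python’s standard html.parser; the class name "title" and the sample markup are hypothetical, and a production scraper would more likely use a fuller HTML parsing library.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not affect nesting depth
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}

class ClassTextExtractor(HTMLParser):
    """Collect the text content of elements carrying a given CSS class."""

    def __init__(self, wanted_class):
        super().__init__()
        self.wanted_class = wanted_class
        self.depth = 0        # > 0 while inside a matching element
        self.results = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_ELEMENTS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.depth:
            self.depth += 1   # nested element inside a match
        elif self.wanted_class in classes:
            self.depth = 1
            self.results.append("")

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS:
            return
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.results[-1] += data

parser = ClassTextExtractor("title")
parser.feed('<li><span class="title">Harbour at dawn</span> <a href="/gallery/">view</a></li>')
parser.close()
print(parser.results)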

                                                             

future design: taking a REST from service provision

• the resource-oriented architecture

• ReST:

– resources with cool URLs

– 4 core HTTP verbs: GET, PUT, POST & DELETE

– CRUD for the Web (create, retrieve, update, delete)

• make everything addressable with URLs

• be cool!

– make the URLs persistent

– make them human-parsable

– e.g. http://www.myserver.com/gallery/collections/pictures/image_0001.jpg

– is better than: http://www.myserver.com/gallery.php?collection_id=7&item_id=0001
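A toy, in-memory sketch in Python of how the four verbs map onto CRUD operations over resources identified by cool URLs. The class name, the status codes chosen and the image_NNNN.jpg naming scheme for POSTed items are hypothetical; a real service would sit behind an HTTP server and store resources persistently.

```python
class ResourceStore:
    """In-memory resources keyed by URL path, manipulated through four HTTP verbs.

    handle() returns (status_code, body) in the manner of a tiny HTTP handler.
    """

    def __init__(self):
        self.resources = {}
        self.counters = {}

    def handle(self, method, path, body=None):
        if method == "GET":                      # retrieve
            if path in self.resources:
                return 200, self.resources[path]
            return 404, None
        if method == "PUT":                      # create or update at a URL the client chose
            status = 200 if path in self.resources else 201
            self.resources[path] = body
            return status, body
        if method == "POST":                     # create a new member of a collection; server picks the URL
            n = self.counters.get(path, 0)
            while True:
                n += 1
                new_path = f"{path.rstrip('/')}/image_{n:04d}.jpg"
                if new_path not in self.resources:
                    break
            self.counters[path] = n
            self.resources[new_path] = body
            return 201, new_path
        if method == "DELETE":                   # delete
            if path in self.resources:
                del self.resources[path]
                return 204, None
            return 404, None
        return 405, None                         # any other verb is not supported here

store = ResourceStore()
status, location = store.handle("POST", "/gallery/collections/pictures/", b"jpeg bytes")
print(status, location)
print(store.handle("GET", location))
```

Because each resource has its own stable path, the same URL can be bookmarked, linked to, harvested or acted on by another program.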

                                                             

my suggestions

• use web protocols

• make content addressable - and persistently so

• reduce barriers to third parties developing other (competing!?) UIs

– are our UIs really just ‘gateways’ to information (implying that there is a wall around that information)?

• make the machine APIs the heart of our services

– a good design principle is to use the machine API as the API used by our own user interfaces

– we just can’t know for sure all the ways in which our information services might be used
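A minimal Python sketch of that design principle: the human-facing page is built only from what the public machine API returns, so anything our own UI can do, a third party can do too. The catalogue contents and function names are hypothetical; in a real service the two layers would talk over HTTP rather than through a function call.

```python
import json
from html import escape

# Hypothetical in-memory catalogue standing in for a real data store
CATALOGUE = {
    "image_0001": {"title": "Harbour at dawn", "collection": "pictures"},
}

def api_get_item(item_id):
    """The machine API: returns (status, JSON text) for third parties and our own UI alike."""
    item = CATALOGUE.get(item_id)
    if item is None:
        return 404, json.dumps({"error": "not found"})
    return 200, json.dumps({"id": item_id, **item})

def render_item_page(item_id):
    """Our own human UI, built only from what the public API returns."""
    status, payload = api_get_item(item_id)
    data = json.loads(payload)
    if status != 200:
        return "<p>Item not found</p>"
    return f"<h1>{escape(data['title'])}</h1><p>Collection: {escape(data['collection'])}</p>"

print(render_item_page("image_0001"))
```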

                                                             

acknowledgements

• in preparation for this presentation, I blogged about it and asked my readers:

– “Aside from the obvious stuff like OAI-PMH, Google, RSS, what should I be talking about? Persistent identifiers? Cool URLs? Any other suggestions?”

• 6 responses - all containing great suggestions which I have incorporated into this presentation, from the following people:

– Jim Downing, Owen Stephens, Ian Ibbotson, Pete Johnston, Mike Ellis

• thanks!!

• you can read all of the comments, and find links/addresses for these people on my blog at:

– http://blog.paulwalk.net/2008/02/11/making-digitised-content-available-for-searching-and-harvesting/

                                                             

comments

• Ian Ibbotson said:

– It’s very hard to engineer a consistent search user interface when half the metadata refers to the actual digital artefact, and half to a front page. It’s useful to have both links, as you can then negotiate with providers if they feel you need to go through a front page for stats and marketing....

• Pete Johnston said:

– a shift away from the “repository” towards the “collection” or “collections” (which I think is the consequence of a more “resource-oriented view”)

• Owen Stephens said:

– Integration of resources into the wider web - e.g. LoC experiment with Flickr to expose content. Many projects in this area create a new silo of material that is hidden from the wider web [...] reusable metadata as well as objects.

• Jim Downing said:

– ....making the content reusable (not a hard sell in eLearning?). Recent use of RDF and Atom in a cultural setting: Asemantics BBC aggregator

• Mike Ellis said:

– ....RSS, and possibly “programmable” RSS (for example, surfacing search results by adding query parameters to the feed address, etc)....
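One way to read Mike Ellis’s “programmable” RSS: the feed address itself carries the query, so subscribing to a URL is the same as saving a search. A minimal Python sketch of the server-side selection step; the parameter names q and max and the sample items are hypothetical, and the selected items would then be serialised as RSS item elements.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical items that a feed endpoint could draw on
ITEMS = [
    {"title": "Harbour at dawn",
     "link": "http://www.myserver.com/gallery/collections/pictures/image_0001.jpg"},
    {"title": "Old town map",
     "link": "http://www.myserver.com/gallery/collections/maps/map_0001.jpg"},
]

def feed_items_for(feed_url, items=ITEMS):
    """Select feed items according to query parameters on the feed address.

    Supports a hypothetical 'q' (case-insensitive substring match on title)
    and 'max' (maximum number of items).
    """
    params = parse_qs(urlparse(feed_url).query)
    query = params.get("q", [""])[0].lower()
    limit = int(params.get("max", [len(items)])[0])
    matches = [item for item in items if query in item["title"].lower()]
    return matches[:limit]

for item in feed_items_for("http://www.myserver.com/feed.rss?q=harbour"):
    print(item["title"], item["link"])
```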

                                                             

questions?
