THE EVOLVING SEMANTIC WORLD Barbara McGlamery Taxonomist Martha Stewart Living Omnimedia


Page 1: The Evolving Semantic World

THE EVOLVING SEMANTIC WORLD

Barbara McGlamery

Taxonomist

Martha Stewart Living Omnimedia

Page 2: The Evolving Semantic World

ABOUT ME

Master's in Library and Information Science, Long Island University

New York Public Library: Branch librarian; NYPL for the Performing Arts – Drama reference

Entertainment Weekly: Data Manager

Time Inc.: Senior Data Manager, Taxonomist, Metadata Architect, Ontologist

Martha Stewart Living Omnimedia: Taxonomist

Page 3: The Evolving Semantic World

AGENDA

What is the Semantic Web? Big “S” and little “s” semantics

What we used to believe: Time Inc. & the theory of overkill

What we know now: Martha Stewart and the theory that less is more

Where we’re going: Leaner and meaner (but more standards)

Page 4: The Evolving Semantic World

WHAT IS THE SEMANTIC WEB?

Page 5: The Evolving Semantic World

The Semantic Web is a web of data…. (it) provides a common framework that allows data to be shared and reused across applications, enterprise, and community boundaries.

--W3C

Page 6: The Evolving Semantic World

"The Semantic Web is not a separate Web but an extension of the current one, in which information is given well-defined meaning, better enabling computers and people to work in cooperation."

--Tim Berners-Lee, James Hendler, and Ora Lassila, Scientific American, 2001

Page 7: The Evolving Semantic World

The Semantic Web is about making knowledge machine and human-readable

Page 8: The Evolving Semantic World

-- Amit Agarwal, http://www.labnol.org/internet/web-3-concepts-explained/8908/

Page 9: The Evolving Semantic World

Web 1.0: Connections

Web 2.0: Collaboration

Web 3.0: Intelligence

Page 10: The Evolving Semantic World

Big S semantic web

Little s semantic web

Page 11: The Evolving Semantic World

BIG S SEMANTIC WEB

…big "S" web technologies provide a framework for describing data on a web page when the data on the website is published. If data is read or captured, because the data's semantic meaning has already been described, you don't have to go through the process of understanding the meaning of the data after the fact.

--Sean Martin, CEO of Cambridge Semantics

Page 12: The Evolving Semantic World

LITTLE S SEMANTICS

Little "s" web technologies capture and filter data with no description or understanding of the data provided after the capture process. The process of understanding the meaning of that data starts once data capture has happened. People have to intervene to provide the context and meaning for language on the web.

--Sean Martin, CEO of Cambridge Semantics

Page 13: The Evolving Semantic World

Big S: W3C-approved standard

Little s: Looser groups of unaffiliated standards

Page 14: The Evolving Semantic World

BIG S SEMANTICS

Page 15: The Evolving Semantic World

ESSENTIALS OF BIG S SEMANTIC WEB

URI – Uniform Resource Identifier

RDF – Resource Description Framework

OWL – Web Ontology Language

Semantic reasoner (inference engine)

Page 16: The Evolving Semantic World

URI – UNIFORM RESOURCE IDENTIFIER

Way to identify things: images, pages of text, locations

Dereferenceable. Example from Freebase: http://www.freebase.com/view/en/will_smith

• URIs are unique; no two are the same

• Will Smith: http://www.freebase.com/view/en/will_smith

Page 17: The Evolving Semantic World

RDF – RESOURCE DESCRIPTION FRAMEWORK

Framework used to describe relationships between objects

Commonly serialized as XML (RDF/XML)

Subject>Predicate>Object

Page 18: The Evolving Semantic World

RDF – RESOURCE DESCRIPTION FRAMEWORK

Subject>Predicate>Object

Subject: http://ew.com/PersonsTax/Will_Smith

Predicate: http://ew.com/EntertainmentOnt/leadPerformanceIn

Object: http://ew.com/EntertainmentTax/Movies/Bad_Boys

Will Smith >>> is the lead actor in >>> Bad Boys

Page 19: The Evolving Semantic World

OWL – WEB ONTOLOGY LANGUAGE

…designed to be used by applications that need to process the content of information instead of just presenting it to humans

-- W3C

Page 20: The Evolving Semantic World

OWL – WEB ONTOLOGY LANGUAGE

Metadata model

Extends RDF to further define properties. Ex: equivalent relationships

[image] >>> is married to >>> [image]

[image] >>> is married to >>> [image]

Page 21: The Evolving Semantic World

SEMANTIC REASONER

Software able to infer logical consequences from a set of asserted facts

Follows inference rules specified by OWL properties:

Inverse

Transitive

Symmetric

Functional/Inverse functional

Equivalent
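
To make the idea of inferring new facts from asserted ones concrete, here is a minimal sketch of forward chaining over three of the property types listed above (symmetric, inverse, transitive). It is not Jena or any real reasoner; the `infer` function, the short property names (`marriedTo`, `hasLeadPerformer`, `narrowerThan`), and the sample facts other than the Will Smith statement are assumptions made up for illustration.

```python
def infer(facts, symmetric=(), inverse=None, transitive=()):
    """Return the closure of `facts` under the given property rules.

    facts: set of (subject, predicate, object) tuples.
    symmetric: predicates where (a p b) implies (b p a).
    inverse: dict mapping p to q where (a p b) implies (b q a).
    transitive: predicates where (a p b) and (b p c) imply (a p c).
    """
    inverse = dict(inverse or {})
    # An inverse declaration applies in both directions.
    inverse.update({q: p for p, q in list(inverse.items())})

    known = set(facts)
    while True:
        new = set()
        for s, p, o in known:
            if p in symmetric:
                new.add((o, p, s))
            if p in inverse:
                new.add((o, inverse[p], s))
            if p in transitive:
                for s2, p2, o2 in known:
                    if p2 == p and s2 == o:
                        new.add((s, p, o2))
        if new <= known:  # nothing new was derived: fixed point reached
            return known
        known |= new


# Hypothetical asserted facts (short names stand in for full URIs).
facts = {
    ("PersonA", "marriedTo", "PersonB"),
    ("Will_Smith", "leadPerformanceIn", "Bad_Boys"),
    ("Action_Movies", "narrowerThan", "Movies"),
    ("Movies", "narrowerThan", "Entertainment"),
}

derived = infer(
    facts,
    symmetric={"marriedTo"},
    inverse={"leadPerformanceIn": "hasLeadPerformer"},
    transitive={"narrowerThan"},
)

# Print only the statements that were inferred rather than asserted.
for triple in sorted(derived - facts):
    print(triple)
```

The loop keeps applying the rules until a pass produces nothing new, which is the simplest way to reach a fixed point; a production reasoner would index the facts rather than rescan them on every pass.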

Page 22: The Evolving Semantic World

PUTTING IT ALL TOGETHER

Ontology: rule set; classes and properties

Taxonomy: application of rule set; tags and relationships

Everything is a statement: Subject>Predicate>Object

Ex: Will Smith is lead performer in Bad Boys
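
As a rough illustration of "everything is a statement", the example can be written as a single triple and serialized in N-Triples, a line-based RDF syntax. The URIs are the ones shown on the earlier RDF slide; the `Triple` type and `to_ntriples` helper are names invented for this sketch and only handle URI-valued terms.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF statement: subject, predicate, object (all URIs here)."""
    subject: str
    predicate: str
    obj: str


def to_ntriples(t: Triple) -> str:
    """Serialize a URI-only triple as a single N-Triples line."""
    return f"<{t.subject}> <{t.predicate}> <{t.obj}> ."


# URIs taken from the RDF slide.
statement = Triple(
    "http://ew.com/PersonsTax/Will_Smith",
    "http://ew.com/EntertainmentOnt/leadPerformanceIn",
    "http://ew.com/EntertainmentTax/Movies/Bad_Boys",
)

print(to_ntriples(statement))
```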

Page 23: The Evolving Semantic World

BENEFITS OF RDF/OWL

Persistent URIs

Verifiable XML

Unambiguous Relationships

Polyhierarchy

Interoperability

Page 24: The Evolving Semantic World

LIMITATIONS OF RDF/OWL

Difficult to propagate across web

Challenge to integrate with legacy systems

Expensive queries

No “Killer App”

Page 25: The Evolving Semantic World

SEMANTIC WEB LAYER CAKE

Page 26: The Evolving Semantic World

LITTLE S SEMANTICS

Page 27: The Evolving Semantic World

RDFa - Resource Description Framework (in) Attributes

W3C recommendation that adds a set of attribute-level extensions to XHTML for embedding rich metadata within Web documents

Easy to implement

Not HTML 5 compliant

Page 28: The Evolving Semantic World

RDFA: BEST BUY

Page 29: The Evolving Semantic World

LINKED OPEN DATA 2007

“Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/”

Page 30: The Evolving Semantic World

LINKED OPEN DATA 2010

“Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/”

Page 31: The Evolving Semantic World

MICROFORMATS

Semantic markup which seeks to re-use existing HTML/XHTML class attributes to structure data

Easy to implement

Limited formats

Page 32: The Evolving Semantic World

MICROFORMATS: BON APPÉTIT

Page 33: The Evolving Semantic World

MICRODATA

A WHATWG HTML5 specification used to nest semantics within existing content on web pages

Officially supported by Bing, Yahoo, & Google

Can embed other markup languages like RDFa, microformats, and Dublin Core

Not well-known (yet)
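
To show what "nesting semantics within existing content" looks like in practice, the sketch below parses a small piece of microdata markup with Python's standard `html.parser`. The sample recipe markup is hypothetical, it assumes the schema.org `Recipe` vocabulary, and the `ItemPropParser` class is an illustrative helper that only handles a flat (non-nested) item.

```python
from html.parser import HTMLParser

# Hypothetical page fragment using microdata attributes.
SAMPLE = """
<div itemscope itemtype="http://schema.org/Recipe">
  <h1 itemprop="name">Lemon Tart</h1>
  <meta itemprop="cookTime" content="PT30M">
  <span itemprop="recipeYield">8 servings</span>
</div>
"""


class ItemPropParser(HTMLParser):
    """Collect itemprop name/value pairs from a flat microdata item."""

    def __init__(self):
        super().__init__()
        self.item_type = None
        self.props = {}
        self._current = None  # itemprop whose text is currently being read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "itemscope" in attrs:
            self.item_type = attrs.get("itemtype")
        prop = attrs.get("itemprop")
        if prop is None:
            return
        if "content" in attrs:  # value carried in an attribute, e.g. <meta>
            self.props[prop] = attrs["content"]
        else:                   # value is the element's text
            self._current = prop
            self.props[prop] = ""

    def handle_data(self, data):
        if self._current is not None:
            self.props[self._current] += data

    def handle_endtag(self, tag):
        self._current = None


parser = ItemPropParser()
parser.feed(SAMPLE)
print(parser.item_type)
print(parser.props)
```

The visible text stays exactly where it was on the page; the `itemscope`, `itemtype`, and `itemprop` attributes are the only additions, which is what makes this approach light to deploy.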

Page 34: The Evolving Semantic World

MICRODATA: STEVE: THE MUSEUM SOCIAL TAGGING PROJECT

Page 35: The Evolving Semantic World

OPEN GRAPH PROTOCOL

Facebook-created markup language that turns any web page into an Open Graph object, allowing any page to become a Facebook page

I “Like” you

Good for targeted advertising

Limited in scope
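
Open Graph markup amounts to a handful of `<meta>` tags in the page head. The following sketch renders them from a dictionary; the `og_meta_tags` helper and the page values (title, example.com URLs) are hypothetical, while `og:title`, `og:type`, `og:url`, and `og:image` are the properties the protocol lists as required.

```python
from html import escape


def og_meta_tags(properties):
    """Render a dict of Open Graph properties as <meta> tags for <head>."""
    return "\n".join(
        '<meta property="og:{}" content="{}" />'.format(
            escape(name, quote=True), escape(value, quote=True)
        )
        for name, value in properties.items()
    )


# Hypothetical values for a single page.
page = {
    "title": "Lemon Tart",
    "type": "article",
    "url": "http://www.example.com/recipes/lemon-tart",
    "image": "http://www.example.com/images/lemon-tart.jpg",
}

print(og_meta_tags(page))
```

Escaping the values matters because titles often contain quotes or ampersands, which would otherwise break the attribute.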

Page 36: The Evolving Semantic World

OGP: MARTHA STEWART

Page 37: The Evolving Semantic World

BACK-OF-THE-NAPKIN COMPARISON

Features: RDF/OWL, RDFa, MF, MD, OGP

W3C standard: X X X

Extensible: X X X

Pre-existing Vocabs: X X

Uses URIs: X X

Easy to implement: X X X X

HTML 5 compliant: X X X

Inferencing: X

Page 38: The Evolving Semantic World

STATUS REPORT ON S SEMANTIC WEB

Linked Open Data graph growing

Many countries have developed government sites with rich semantics

Development of Semantic search

More widespread adoption of lighter semantics

Page 39: The Evolving Semantic World

WHERE WE MIGHT BE GOING

Pharmaceutical industry identifies trends across clinical studies, and not just within them

News industry better targets content by locale

Department of Defense using it to make better decisions in the field

Utilized in advertising to drive more and more revenue

Page 40: The Evolving Semantic World

WHAT WE USED TO BELIEVE

Page 41: The Evolving Semantic World

TIME INC. AND TOPICS

Page 42: The Evolving Semantic World

TIME INC.

Largest magazine media company in U.S.

48 websites worldwide

Websites attract more than 50M unique visitors each month

Domains include lifestyle, entertainment, style, news, sports, and business

Early adopter (2005-2006) of SW technologies

Page 43: The Evolving Semantic World

GOALS

Enhance data integrity

Improve editorial efficiency

Create contextual presentation of content

Develop relationships that cannot be derived from content

Share resources among titles

Improve search and facilitate guided navigation

Page 44: The Evolving Semantic World

CHALLENGES

Aging CMS with sites on different versions

Many different domains

Scalability to accommodate volume of data and development of complex relationships

Lack of resources, money, and time

Page 45: The Evolving Semantic World


Star Wars: Episode I -- The Phantom Menace
Episode 1
Episode I
Phantom Menace
Star Wars Episode I The Phantom Menace
Star Wars Episode I: The Phantom Menace
Star Wars prequel
Star Wars: Episode 1 -- The Phantom Menace
Star Wars: Episode i -- the Phantom Menace
Star Wars: Episode I: The Phantom Menace
Star Wars: Episode I--The Phantom Menace
Star Wars: Episode I--The Phantom Menance
Star Wars: Episode One -- The Phantom Menace
Star Wars: The Phantom Menace
Star Wars: The Phantom Menace -- Episode I
The Phantom Menace
The Phanton Menace

WHY WE NEED CONTROLLED VOCABULARIES (OR WHY FREEFORM KEYWORDS JUST DON’T WORK)

Star Wars: Episode I -- The Phantom Menace
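
A controlled vocabulary resolves all of those freeform keywords to one preferred term. A minimal sketch of that lookup, using a few of the variants from the slide (misspellings included); the `normalize` and `to_preferred` helpers are illustrative names, not part of TOPICS.

```python
PREFERRED = "Star Wars: Episode I -- The Phantom Menace"

# A few of the freeform variants from the slide, including the misspellings.
VARIANTS = [
    "Episode 1",
    "Phantom Menace",
    "Star Wars prequel",
    "Star Wars: Episode I--The Phantom Menance",
    "The Phanton Menace",
]


def normalize(label):
    """Case-fold and collapse whitespace so lookups ignore trivial differences."""
    return " ".join(label.casefold().split())


# Variant-to-preferred-term lookup; the preferred label maps to itself.
LOOKUP = {normalize(v): PREFERRED for v in VARIANTS + [PREFERRED]}


def to_preferred(keyword):
    """Return the controlled term for a keyword, or None if it is unknown."""
    return LOOKUP.get(normalize(keyword))


print(to_preferred("the phanton  menace"))
print(to_preferred("Bad Boys"))
```

Normalization only absorbs case and spacing differences; misspellings and alternate titles still have to be entered as variants by someone who knows they mean the same thing, which is the editorial work a controlled vocabulary captures.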

Page 46: The Evolving Semantic World

WHAT STANDARD TO ADOPT?

RDF: Flexible; Scalable; Fits business needs; New technology but industry standard

Microformats: Easy to implement; No inferencing; Solved some business needs but not all; No standards; Limited formats

Page 47: The Evolving Semantic World

SEARCH FOR VENDORS

In 2005, few commercial RDF/OWL tools were available that fit our needs

Open source reasoners like Jena and a proprietary design seemed more cost-effective and realistic

Page 48: The Evolving Semantic World

TOPICS

Time Ontologies for Publishing, Inference, Classification and Semantics

Page 49: The Evolving Semantic World

WHAT IS TOPICS?

Librarian Tool - allows librarians to create resources and properties

Relationship Tool - generates unambiguous connections between data

Classification Tool - allows editors to add uniform, structured metadata to content

Semantic reasoner - finds new facts from existing data

Query Engine - manages logical retrieval of data

Page 50: The Evolving Semantic World

TECHNICAL DETAILS OF SYSTEM

Java application

Jena semantic reasoner

Joseki query engine

Sybase database

Page 51: The Evolving Semantic World

SITES

AOL Home

InStyle

Entertainment Weekly

People

This Old House

Page 52: The Evolving Semantic World

AOL HOME

Features

Faceted browse

Related content

Page 53: The Evolving Semantic World

REAL SIMPLE

Features

Faceted browse

Related content

Page 54: The Evolving Semantic World

INSTYLE

Features

Faceted browse

Related content

Improved search

Navigational taxonomy

Page 55: The Evolving Semantic World

ENTERTAINMENT WEEKLY

Features

Aggregated content

Related content

Improved search

Sharing of resources among titles

Page 56: The Evolving Semantic World
Page 57: The Evolving Semantic World

PEOPLE

Features

Aggregated content

Related content

Improved search

Sharing of resources among titles

Page 58: The Evolving Semantic World

THIS OLD HOUSE

Features

Aggregated content

Navigational taxonomy

Improved search

Related content

Faceted browse

Page 59: The Evolving Semantic World

THIS OLD HOUSE

Page 60: The Evolving Semantic World

STRENGTHS OF TOPICS

Utilizes URIs

Sharable

Create once use many times

Unambiguous relationships

Facilitates aggregation of content

Controlled SEO keywords


Page 61: The Evolving Semantic World

WEAKNESSES OF TOPICS

Creates massive database of RDF triples

Expensive to query

Based on unsupported open source code (Jena)

Polyhierarchy makes it difficult to create navigational taxonomies
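
One way to see why polyhierarchy complicates navigation: a term with more than one parent has more than one breadcrumb path, so a site has to choose which one to display. A minimal sketch, assuming a made-up `PARENTS` mapping with hypothetical terms (none of this is taken from TOPICS):

```python
# Hypothetical term -> parents mapping; "Wedding Cakes" has two parents.
PARENTS = {
    "Wedding Cakes": ["Cakes", "Weddings"],
    "Cakes": ["Food"],
    "Weddings": [],
    "Food": [],
}


def paths_to_root(term):
    """Return every breadcrumb path from a top-level term down to `term`."""
    parents = PARENTS.get(term, [])
    if not parents:
        return [[term]]
    return [
        path + [term]
        for parent in parents
        for path in paths_to_root(parent)
    ]


for path in paths_to_root("Wedding Cakes"):
    print(" > ".join(path))
```

A strict navigational taxonomy needs exactly one such path per term, so a polyhierarchical vocabulary has to be pruned or annotated with a primary parent before it can drive site navigation.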

Page 62: The Evolving Semantic World


QUERY RESULTS

set taxons [TII_TOPICS_GET_ENTITY "MediaProductsTax:MovieCasinoRoyale"]

Page 63: The Evolving Semantic World

WHAT WE KNOW NOW

Page 64: The Evolving Semantic World

MARTHA STEWART AND LITTLE “S” SEMANTICS

Page 65: The Evolving Semantic World

MARTHA STEWART LIVING OMNIMEDIA

MSLO is a publishing, broadcasting, and merchandising business

Extensive cross-promotion of content and products

3 websites and numerous digital apps

Domains include home, food, weddings, and healthy living

Page 66: The Evolving Semantic World

GOALS

Enhance data integrity

Improve editorial efficiency

Share resources among titles and types of content

Create contextual presentation of content

Improve search and facilitate guided navigation

Page 67: The Evolving Semantic World

CHALLENGES

Between CMSs: Vignette to Drupal 6

Limited resources, time, money: Working on new CMS

Fuzzy business requirements: Unclear plan for redesign

Page 68: The Evolving Semantic World

SEARCH FOR STANDARDS

RDF/OWL

RDFa

Microformats

Microdata

Open Graph Protocol

Page 69: The Evolving Semantic World

DECISIONS DECISIONS

RDF/OWL

Expensive to implement

No easy HTML 5 implementation

No business reason to undertake such a large endeavor

Roadblocks (Lots)

LOE (Great)

Time (Massive)

Resources (Plenty)

Page 70: The Evolving Semantic World

DECISIONS DECISIONS

RDFa: No easy HTML 5 implementation

Microformats: Useful for recipes but limited formats

Microdata: Useful for recipes, but new and untested

Open Graph Protocol: Facebook use only, but critical to deploy ASAP

Page 71: The Evolving Semantic World

JUST ENOUGH SEMANTICS

Now

Microformats: Google Rich Snippets and Recipe search

OGP: Site-wide implementation

Next up

Probably Microdata from Schema.org

Google approved

Integration of other formats

Shiny and new, untested

Page 72: The Evolving Semantic World

LESSONS LEARNED (SO FAR)

Educate the troops

Buy-in from senior leadership

Loose, but coherent implementation plan

Concise, easy-to-reach business goals to start

One content type to start, then branch out

Page 73: The Evolving Semantic World

WHAT’S NEXT FOR MARTHA

Microdata deployed across all sites

Development of more sophisticated relationships with our content

Roll out of more robust faceted search

Integration of all content types into topic pages

Page 74: The Evolving Semantic World

FUTURE OF SEMANTIC WEB

Move from web of objects to web of data

More personalized experiences

Positive impact on content management costs

Classifying content well allows for unanticipated uses and users; cataloging allows for audience targeting.

Page 75: The Evolving Semantic World

QUESTIONS?

Page 76: The Evolving Semantic World

Barbara McGlamery

Taxonomist

Martha Stewart Living Omnimedia

(212) [email protected]