THE EVOLVING SEMANTIC WORLD
Barbara McGlamery
Taxonomist
Martha Stewart Living Omnimedia
ABOUT ME
Masters in Library and Information Science: Long Island University
New York Public Library: Branch librarian; NYPL for the Performing Arts – Drama reference
Entertainment Weekly: Data Manager
Time Inc.: Senior Data Manager, Taxonomist, Metadata Architect, Ontologist
Martha Stewart Living Omnimedia: Taxonomist
AGENDA
What is the Semantic Web?
Big “S” and little “s” semantics
What we used to believe
Time Inc. & the theory of overkill
What we know now
Martha Stewart and the theory that less is more
Where we’re going
Leaner and meaner (but more standards)
WHAT IS THE SEMANTIC WEB?
The Semantic Web is a web of data…. (it) provides a common framework that allows data to be shared and reused across applications, enterprise, and community boundaries.
-- W3C
"The Semantic Web is not a separate Web but an extension of the current one, in which information is given well-defined meaning, better enabling computers and people to work in cooperation.”
--Tim Berners-Lee, James Hendler, and Ora Lassila, Scientific American, 2001
The Semantic Web is about making knowledge machine and human-readable
-- Amit Agarwal, http://www.labnol.org/internet/web-3-concepts-explained/8908/
Web 1.0: Connections
Web 2.0: Collaboration
Web 3.0: Intelligence
Big S semantic web
Little s semantic web
BIG S SEMANTIC WEB
…big "S" web technologies provide a framework for describing data on a web page when the data on the website is published. If data is read or captured, because the data's semantic meaning has already been described, you don't have to go through the process of understanding the meaning of the data after the fact.
--Sean Martin, CEO of Cambridge Semantics
LITTLE S SEMANTICS
Little "s" web technologies capture and filter data with no description or understanding of the data provided after the capture process. The process of understanding the meaning of that data starts once data capture has happened. People have to intervene to provide the context and meaning for language on the web.
--Sean Martin, CEO of Cambridge Semantics
Big S: W3C-approved standard
Little s: Looser groups of unaffiliated standards
BIG S SEMANTICS
ESSENTIALS OF BIG S SEMANTIC WEB
URI – Uniform Resource Identifier
RDF – Resource Description Framework
OWL – Web Ontology Language
Semantic reasoner (inference engine)
URI – UNIFORM RESOURCE IDENTIFIER
Way to identify things: images, pages of text, locations
De-referenceable
Freebase: http://www.freebase.com/view/en/will_smith
• URIs are unique, no two are the same
• Will Smith: http://www.freebase.com/view/en/will_smith
RDF – RESOURCE DESCRIPTION FRAMEWORK
Framework used to describe relationships between objects
Commonly serialized as XML (RDF/XML)
Subject>Predicate>Object
RDF – RESOURCE DESCRIPTION FRAMEWORK
Subject>Predicate>Object
Subject: http://ew.com/PersonsTax/Will_Smith
Predicate: http://ew.com/EntertainmentOnt/leadPerformanceIn
Object: http://ew.com/EntertainmentTax/Movies/Bad_Boys
Will Smith > is the lead actor in > Bad Boys
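As a rough illustration of the Subject>Predicate>Object pattern, the statement on this slide can be held as a three-part record and written out as one N-Triples line. This is only a sketch: the `Triple` class and `to_ntriples` helper are hypothetical names, and only the three URIs come from the slide.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF statement: subject > predicate > object."""
    subject: str
    predicate: str
    object: str


def to_ntriples(triple: Triple) -> str:
    """Serialize a triple whose three parts are all URIs as one N-Triples line."""
    return f"<{triple.subject}> <{triple.predicate}> <{triple.object}> ."


# The three URIs are the ones shown on the slide.
statement = Triple(
    subject="http://ew.com/PersonsTax/Will_Smith",
    predicate="http://ew.com/EntertainmentOnt/leadPerformanceIn",
    object="http://ew.com/EntertainmentTax/Movies/Bad_Boys",
)

print(to_ntriples(statement))
```

Because every part is a URI rather than a free-text label, two systems that see this line agree on exactly which person, relationship, and movie are meant.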
OWL – WEB ONTOLOGY LANGUAGE
…designed to be used by applications that need to process the content of information instead of just presenting it to humans
-- W3C
OWL – WEB ONTOLOGY LANGUAGE
Metadata model
Extends RDF to further define properties
Ex: Equivalent relationships
[Diagram: two entities linked in both directions by "is married to"]
SEMANTIC REASONER
Software able to infer logical consequences from a set of asserted facts
Follows inference rules specified by OWL properties: Inverse, Transitive, Symmetric, Functional/Inverse functional, Equivalent
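To make the idea of inferring new facts from asserted ones concrete, here is a minimal forward-chaining sketch covering three of the property types listed above (symmetric, inverse, transitive). It is not how Jena or any production reasoner works; the property names `marriedTo`, `partOf`, and `hasLeadPerformer` and the sample entities are assumptions made up for the example.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical property declarations, standing in for OWL axioms.
SYMMETRIC = {"marriedTo"}
TRANSITIVE = {"partOf"}
INVERSE = {"leadPerformanceIn": "hasLeadPerformer"}


def infer(asserted: Set[Triple]) -> Set[Triple]:
    """Apply the rules repeatedly until no new triples appear."""
    # Let each inverse declaration work in both directions.
    inverse = {**INVERSE, **{v: k for k, v in INVERSE.items()}}
    facts = set(asserted)
    while True:
        new: Set[Triple] = set()
        for s, p, o in facts:
            if p in SYMMETRIC:
                new.add((o, p, s))
            if p in inverse:
                new.add((o, inverse[p], s))
            if p in TRANSITIVE:
                for s2, p2, o2 in facts:
                    if p2 == p and s2 == o:
                        new.add((s, p, o2))
        if new <= facts:  # nothing new was derived, so stop
            return facts
        facts |= new


# Sample facts; every entity name except Will_Smith and Bad_Boys is invented.
asserted = {
    ("Will_Smith", "leadPerformanceIn", "Bad_Boys"),
    ("PersonA", "marriedTo", "PersonB"),
    ("Chapter1", "partOf", "Book1"),
    ("Book1", "partOf", "Series1"),
}

for triple in sorted(infer(asserted) - asserted):
    print(triple)
```

Each printed triple is a fact nobody entered by hand, which is the benefit the slide describes; the nested loop also hints at why inference over a large triple store gets expensive.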
PUTTING IT ALL TOGETHER
Ontology: Rule set; Classes and Properties
Taxonomy: Application of Rule Set; Tags and Relationships
Everything is a statement: Subject>Predicate>Object
Ex: Will Smith is lead performer in Bad Boys
BENEFITS OF RDF/OWL
Persistent URIs
Verifiable XML
Unambiguous Relationships
Polyhierarchy
Interoperability
LIMITATIONS OF RDF/OWL
Difficult to propagate across web
Challenge to integrate with legacy systems
Expensive queries
No “Killer App”
SEMANTIC WEB LAYER CAKE
LITTLE S SEMANTICS
RDFa - Resource Description Framework (in) Attributes
W3C recommendation that adds a set of attribute-level extensions to XHTML for embedding rich metadata within Web documents
Easy to implement
Not HTML 5 compliant
RDFA: BEST BUY
LINKED OPEN DATA 2007
“Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/”
LINKED OPEN DATA 2010
“Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/”
MICROFORMATS
Semantic markup which seeks to re-use existing HTML/XHTML class attributes to structure data
Easy to implement
Limited formats
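A small sketch may help show what "re-using class attributes" means in practice. The class names `hrecipe`, `fn`, `ingredient`, and `yield` come from the hRecipe draft microformat; the recipe markup, the `MicroformatParser` class, and the `extract` helper are made up for illustration and are not taken from any site named in this deck.

```python
from html.parser import HTMLParser

# hRecipe class names this sketch looks for.
FIELDS = {"fn", "ingredient", "yield"}
# Elements with no closing tag; they must not be pushed on the stack.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}

# Invented sample markup using hRecipe class names.
RECIPE_HTML = """
<div class="hrecipe">
  <h1 class="fn">Lemon Bars</h1>
  <ul>
    <li class="ingredient">2 lemons</li>
    <li class="ingredient">1 cup sugar</li>
  </ul>
  <span class="yield">12 bars</span>
</div>
"""


class MicroformatParser(HTMLParser):
    """Collect the text of elements whose class attribute names a known field."""

    def __init__(self):
        super().__init__()
        self.found = {}    # field name -> list of text values
        self._stack = []   # field name (or None) for each open element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        field = next((c for c in classes if c in FIELDS), None)
        self._stack.append(field)
        if field:
            self.found.setdefault(field, []).append("")

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self._stack:
            self._stack.pop()

    def handle_data(self, data):
        # Text belongs to the innermost enclosing field element, if any.
        for field in reversed(self._stack):
            if field:
                self.found[field][-1] += data
                break


def extract(html):
    """Return a dict of field name -> list of stripped text values."""
    parser = MicroformatParser()
    parser.feed(html)
    parser.close()
    return {k: [v.strip() for v in vals] for k, vals in parser.found.items()}


print(extract(RECIPE_HTML))
```

The page stays ordinary HTML; the only addition is a handful of agreed class names, which is why microformats are easy to deploy but limited to the formats someone has already defined.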
MICROFORMATS: BON APPÉTIT
MICRODATA
A WHATWG HTML5 specification used to nest semantics within existing content on web pages
Officially supported by Bing, Yahoo, & Google
Can embed other markup languages like RDFa, microformats, and Dublin Core
Not well-known (yet)
MICRODATA: STEVE: THE MUSEUM SOCIAL TAGGING PROJECT
OPEN GRAPH PROTOCOL
Facebook-created markup language that turns any web page into an Open Graph object, allowing any page to become a Facebook page
I “Like” you
Good for targeted advertising
Limited in scope
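For readers who have not seen Open Graph markup, the sketch below reads `og:` properties out of a page's `<meta>` tags. The property names `og:title`, `og:type`, and `og:url` are part of the protocol; the sample page, its URL, and the `OpenGraphParser` and `read_open_graph` names are assumptions for illustration only.

```python
from html.parser import HTMLParser

# Invented sample page; only the og: property names are real.
PAGE_HTML = """
<html><head>
<title>Example recipe</title>
<meta property="og:title" content="Example recipe" />
<meta property="og:type" content="article" />
<meta property="og:url" content="http://example.com/recipes/1" />
<meta name="description" content="Not an Open Graph tag" />
</head><body></body></html>
"""


class OpenGraphParser(HTMLParser):
    """Collect <meta property="og:..." content="..."> pairs."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        prop = attr.get("property") or ""
        if prop.startswith("og:") and attr.get("content") is not None:
            self.properties[prop] = attr["content"]


def read_open_graph(html):
    """Return a dict of Open Graph property -> content for one page."""
    parser = OpenGraphParser()
    parser.feed(html)
    parser.close()
    return parser.properties


print(read_open_graph(PAGE_HTML))
```

The whole vocabulary lives in a few head tags, which fits the slide's assessment: quick to roll out site-wide, but limited in scope.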
OGP: MARTHA STEWART
BACK-OF-THE-NAPKIN COMPARISON
Features: RDF/OWL, RDFa, MF, MD, OGP
W3C standard: X X X
Extensible: X X X
Pre-existing Vocabs: X X
Uses URIs: X X
Easy to implement: X X X X
HTML 5 compliant: X X X
Inferencing: X
STATUS REPORT ON S SEMANTIC WEB
Linked Open Data graph growing
Many countries have developed government sites with rich semantics
Development of Semantic search
More widespread adoption of lighter semantics
WHERE WE MIGHT BE GOING
Pharmaceutical industry identifies trends across clinical studies, and not just within them
News industry better targets content by locale
Department of Defense using it to make better decisions in the field
Utilized in advertising to drive more and more revenue
WHAT WE USED TO BELIEVE
TIME INC. AND TOPICS
TIME INC
Largest magazine media company in U.S.
48 websites worldwide
Websites attract more than 50M unique visitors each month
Domains include lifestyle, entertainment, style, news, sports, and business
Early adopter (2005-2006) of SW technologies
GOALS
Enhance data integrity
Improve editorial efficiency
Create contextual presentation of content
Develop relationships that cannot be derived from content
Share resources among titles
Improve search and facilitate guided navigation
CHALLENGES
Aging CMS with sites on different versions
Many different domains
Scalability to accommodate volume of data and development of complex relationships
Lack of resources, money, and time
Star Wars: Episode I -- The Phantom Menace
Episode 1
Episode I
Phantom Menace
Star Wars Episode I The Phantom Menace
Star Wars Episode I: The Phantom Menace
Star Wars prequel
Star Wars: Episode 1 -- The Phantom Menace
Star Wars: Episode i -- the Phantom Menace
Star Wars: Episode I: The Phantom Menace
Star Wars: Episode I--The Phantom Menace
Star Wars: Episode I--The Phantom Menance
Star Wars: Episode One -- The Phantom Menace
Star Wars: The Phantom Menace
Star Wars: The Phantom Menace -- Episode I
The Phantom Menace
The Phanton Menace
WHY WE NEED CONTROLLED VOCABULARIES (OR WHY FREEFORM KEYWORDS JUST DON’T WORK)
Star Wars: Episode I -- The Phantom Menace
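One way to picture what a controlled vocabulary does with those freeform keywords is a lookup from every known variant to a single preferred term. The sketch below is an assumption about how such a mapping could be coded, not a description of the Time Inc. system; the `normalize` and `control` helpers are hypothetical, and the variants are copied from the slide.

```python
import re

PREFERRED = "Star Wars: Episode I -- The Phantom Menace"

# Variant keywords copied from the slide, misspellings included.
VARIANTS = [
    "Episode 1",
    "Episode I",
    "Phantom Menace",
    "Star Wars Episode I The Phantom Menace",
    "Star Wars prequel",
    "Star Wars: Episode 1 -- The Phantom Menace",
    "Star Wars: Episode I--The Phantom Menance",
    "Star Wars: Episode One -- The Phantom Menace",
    "Star Wars: The Phantom Menace",
    "Star Wars: The Phantom Menace -- Episode I",
    "The Phantom Menace",
    "The Phanton Menace",
]


def normalize(keyword):
    """Lowercase and collapse punctuation and spacing so trivial differences vanish."""
    return re.sub(r"[^a-z0-9]+", " ", keyword.lower()).strip()


# Lookup from normalized variant to the single preferred term.
LOOKUP = {normalize(v): PREFERRED for v in VARIANTS + [PREFERRED]}


def control(keyword):
    """Return the preferred term for a known variant, or None if unrecognized."""
    return LOOKUP.get(normalize(keyword))


print(control("star wars episode i: the phantom menace"))
print(control("The Phanton Menace"))
print(control("Attack of the Clones"))
```

Normalization alone merges the punctuation and capitalization variants; the misspellings and loose labels such as "Star Wars prequel" still need a person to map them, which is the editorial work a taxonomist takes on.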
WHAT STANDARD TO ADOPT?
RDF: Flexible; Scalable; Fits business needs; New technology but industry standard
Microformats: Easy to implement; No inferencing; Solved some business needs but not all; No standards; Limited formats
SEARCH FOR VENDORS
In 2005, few commercial RDF/OWL tools were available that fit our needs
Open source reasoners like Jena and a proprietary design seemed more cost-effective and realistic
TOPICS
Time Ontologies for Publishing, Inference, Classification and Semantics
WHAT IS TOPICS?
Librarian Tool - allows librarians to create resources and properties
Relationship Tool - generates unambiguous connections between data
Classification Tool - allows editors to add uniform, structured metadata to content
Semantic reasoner - finds new facts from existing data
Query Engine - manages logical retrieval of data
TECHNICAL DETAILS OF SYSTEM
Java application
Jena semantic reasoner
Joseki query engine
Sybase database
SITES
AOL Home
Instyle
Entertainment Weekly
People
This Old House
AOL HOME
Features
Faceted browse
Related content
REAL SIMPLE
Features
Faceted browse
Related content
INSTYLE
Features
Faceted browse
Related content
Improved search
Navigational taxonomy
ENTERTAINMENT WEEKLY
Features
Aggregated content
Related content
Improved search
Sharing of resources among titles
PEOPLE
Features
Aggregated content
Related content
Improved search
Sharing of resources among titles
THIS OLD HOUSE
Features
Aggregated content
Navigational taxonomy
Improved search
Related content
Faceted browse
THIS OLD HOUSE
STRENGTHS OF TOPICS
Utilizes URIs
Sharable
Create once, use many times
Unambiguous relationships
Facilitates aggregation of content
Controlled SEO keywords
WEAKNESSES OF TOPICS
Creates massive database of RDF triples
Expensive to query
Based on unsupported open source code (Jena)
Polyhierarchy makes it difficult to create navigational taxonomies
QUERY RESULTS
set taxons [TII_TOPICS_GET_ENTITY "MediaProductsTax:MovieCasinoRoyale"]
WHAT WE KNOW NOW
MARTHA STEWART AND LITTLE “S” SEMANTICS
MARTHA STEWART LIVING OMNIMEDIA
MSLO is a publishing, broadcasting, and merchandising business
Extensive cross-promotion of content and products
3 websites and numerous digital apps
Domains include home, food, weddings, and healthy living
GOALS
Enhance data integrity
Improve editorial efficiency
Share resources among titles and types of content
Create contextual presentation of content
Improve search and facilitate guided navigation
CHALLENGES
Between CMSs: Vignette to Drupal 6
Limited resources, time, money: Working on new CMS
Fuzzy business requirements: Unclear plan for redesign
SEARCH FOR STANDARDS
RDF/OWL
RDFa
Microformats
Microdata
Open Graph Protocol
DECISIONS DECISIONS
RDF/OWL
Expensive to implement
No easy HTML 5 implementation
No business reason to undertake such a large endeavor
Roadblocks (Lots)
LOE (Great)
Time (Massive)
Resources (Plenty)
DECISIONS DECISIONS
RDFa: No easy HTML 5 implementation
Microformats: Useful for recipes but limited formats
Microdata: Useful for recipes, but new and untested
Open Graph Protocol: Facebook use only, but critical to deploy ASAP
JUST ENOUGH SEMANTICS
Now
Microformats: Google Rich Snippets and Recipe search
OGP: Site-wide implementation
Next up
Probably Microdata from Schema.org: Google approved; Integration of other formats; Shiny and new, untested
LESSONS LEARNED (SO FAR)
Educate the troops
Buy-in from senior leadership
Loose, but coherent implementation plan
Concise, easy-to-reach business goals to start
One content type to start, then branch out
WHAT’S NEXT FOR MARTHA
Microdata deployed across all sites
Development of more sophisticated relationships with our content
Roll out of more robust faceted search
Integration of all content types into topic pages
FUTURE OF SEMANTIC WEB
Move from web of objects to web of data
More personalized experiences
Positive impact on content management costs
Classifying content well allows for unanticipated uses and users; cataloging allows for audience targeting.
QUESTIONS?
Barbara McGlamery
Taxonomist
Martha Stewart Living Omnimedia
(212)[email protected]