krista-thomas
DESCRIPTION
Here is the deck we shared with the SF and LA Semantic Web Meetups this past week (March ’09). It covers Calais 4.0 and its connection to the Linked Data cloud. Please join us at OpenCalais.com.
Calais
Thomson Reuters Calais Initiative
Overview
• Going to discuss five basic topics:
– What is Calais?
– Why we’re doing it & what our goals are
– How it works / What’s under the hood?
– A few examples
– Where it’s headed
Calais…
• Calais extracts smart metadata from unstructured text and links that metadata to the Linked Data cloud.
Calais progress to date
• Launched in late January 2008
• 9,500 developers have joined OpenCalais.com
• 1-3 million content ‘transactions’ per day
• Delivered four major update releases
• Free (as in free) for commercial or non-commercial use
1. Unstructured Text
2. Calais extracts entities, facts and events
3. Metadata returned to the user with keys
4. Keys provide access to the Calais Linked Data cloud
5. Which provides information and other Linked Data pointers
6. To a range of open and partner Linked Data assets, including Thomson Reuters
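Steps 3 through 6 amount to taking a key (a URI), looking it up, and following the links it carries to other data sets. The Python sketch below models that hop offline with a hard-coded triple list in place of real HTTP dereferencing. It assumes the outward links use `owl:sameAs`, a common Linked Data convention, and that the Calais entity URI is the IBM summary-page address from the demo without its `.html` suffix; the triples and the `follow_same_as` helper are illustrative, not part of the Calais service.

```python
from collections import deque

# Well-known vocabulary terms.
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# Assumption: the Calais entity URI is the summary-page address minus ".html".
CALAIS_IBM = ("http://d.opencalais.com/er/company/ralg-tr1r/"
              "9e3f6c34-aa6b-3a3b-b221-a07aa7933633")
DBPEDIA_IBM = "http://dbpedia.org/resource/IBM"

# Hypothetical, hard-coded triples standing in for data fetched over HTTP.
TRIPLES = [
    (CALAIS_IBM, RDFS_LABEL, "IBM"),
    (CALAIS_IBM, OWL_SAME_AS, DBPEDIA_IBM),
    (DBPEDIA_IBM, RDFS_LABEL, "IBM"),
]


def follow_same_as(start: str, triples: list[tuple[str, str, str]]) -> list[str]:
    """Return every URI reachable from `start` by following owl:sameAs links
    forward (subject -> object), in breadth-first order, starting with `start`."""
    seen = [start]
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for s, p, o in triples:
            if s == current and p == OWL_SAME_AS and o not in seen:
                seen.append(o)
                queue.append(o)
    return seen


if __name__ == "__main__":
    for uri in follow_same_as(CALAIS_IBM, TRIPLES):
        print(uri)
```

A real client would replace the list lookup with an HTTP request per URI and would likely treat `owl:sameAs` as symmetric; the sketch keeps links one-directional to stay short.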
Quick Demo
You can find the Calais Viewer demonstration tool here: http://viewer.opencalais.com (Note that the Calais Viewer is not the Calais service. It is merely a demonstration of how the service works.)
– Copy and paste the text of a business news article from AP, Dow Jones or Reuters.com into the viewer, and press Submit. The article is sent to the Calais engine, which tags the content and returns it marked up.
– The tags appear on the left-hand rail; click the plus (+) sign to expand them.
– Since we are now on Calais 4.0, you can also use the viewer to see the Linked Data assets related to the tags Calais returns.
• Click a company name on the left-hand rail to open a Calais summary page featuring a basic description of that company, as well as a number of links.
• Follow those links to see the other data entries on that company that are available for public use in the Linked Data cloud.
– For example, here is the Calais summary page for IBM: http://d.opencalais.com/er/company/ralg-tr1r/9e3f6c34-aa6b-3a3b-b221-a07aa7933633.html
– And here is the summary page for IBM in DBpedia (structured data extracted from Wikipedia and published in machine-readable form): http://dbpedia.org/page/IBM
Why & What
1. Derive semantic metadata from textual assets
2. Use that semantic metadata to create entry points into the Linked Data ecosystem
3. Provide a simple mechanism for sharing semantic metadata about textual content assets
4. And just why are you doing this…
1: Semantics from Text: The Text Problem
• People consume text
• Most of it isn’t semantically enabled
• Most of it won’t be semantically enabled
• This isn’t about standards: microformats vs. RDFa vs. whatever.
• Why: latency, cost and short shelf life
1: Semantics from Text: The Text Problem
• Target areas where:
– The economics don’t support metadata creation
– The value of metadata is potentially high
– The value of aggregated metadata is potentially extremely high
[Chart: Latency vs. Shelf Life, each axis running from seconds to years; content types plotted: Tweets, New Gen News, Legacy News, Scient. Pubs, Great Novels]
2: Getting from Text to the Linked Data Ecosystem
The Linked Data Cloud
3: Semantic Metadata Transport Layer
• I’m a content producer. We’ve loaded the car with rich semantic metadata.
– I’m sharing it within my four walls
– How do I transport it to my consumers?
– RSS / Atom, XML, proprietary data feeds, content APIs
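One way to ride on a channel consumers already accept is to attach each entity key to an Atom entry as a `<category>` element. The Python sketch below does that with the standard library only. It is an assumption for illustration, not a Calais-defined format: putting the URI in `term` and the `HYPOTHETICAL_SCHEME` value are both made up, and `atom_entry` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
HYPOTHETICAL_SCHEME = "http://example.org/entity-keys"  # illustrative marker only
ET.register_namespace("", ATOM_NS)  # serialize Atom as the default namespace


def atom_entry(title: str, entry_id: str, updated: str, entity_uris: list[str]) -> str:
    """Build an Atom <entry> whose entity keys travel as <category> elements.

    Assumption: each entity URI goes in `term`. Atom (RFC 4287) also requires an
    author, which is omitted here on the assumption the enclosing feed supplies it.
    """
    def q(tag: str) -> str:
        return f"{{{ATOM_NS}}}{tag}"  # Clark notation: {namespace}tag

    entry = ET.Element(q("entry"))
    ET.SubElement(entry, q("id")).text = entry_id
    ET.SubElement(entry, q("title")).text = title
    ET.SubElement(entry, q("updated")).text = updated
    for uri in entity_uris:
        ET.SubElement(entry, q("category"), term=uri, scheme=HYPOTHETICAL_SCHEME)
    return ET.tostring(entry, encoding="unicode")


if __name__ == "__main__":
    print(atom_entry("Sample headline", "urn:example:entry:1",
                     "2009-03-01T00:00:00Z", ["http://dbpedia.org/resource/IBM"]))
```

The appeal of this design is that feed readers which know nothing about Linked Data still parse the entry; only consumers that look for the scheme pick up the keys.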
4: Why We’re Doing It
• Two simple answers:
– Hyper-evolution of capabilities – better, faster, stronger
– The walled garden content world
How it Works – Under the Hood of Calais
[Diagram: Calais architecture. Components shown: Calais Web Service, ClearForest NLP Engine, Rule Base, Lexicons, RDF, Disambig. Engine, Reference Data Assets, Metadata Management, Document Level Metadata, Entity Level Linked Data and …, Output Formatting, Stat Tools]
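To make the data flow in that diagram concrete, here is a toy Python pipeline: a lexicon lookup stands in for the NLP engine, an alias table plus a reference table stand in for the disambiguation engine and reference data, and N-Triples serialization stands in for output formatting. It is an illustration only; the real engine, with its rule base and statistical tools, is far richer. Every table, the `http://example.org/` vocabulary, and the function names are hypothetical.

```python
import re
from typing import Optional

# Hypothetical stand-ins for the "Lexicons" and "Reference Data Assets" boxes.
LEXICON = {"IBM": "Company", "International Business Machines": "Company"}
ALIASES = {"International Business Machines": "IBM"}  # surface form -> canonical name
REFERENCE = {
    "IBM": "http://d.opencalais.com/er/company/ralg-tr1r/9e3f6c34-aa6b-3a3b-b221-a07aa7933633",
}

EX = "http://example.org/"  # hypothetical vocabulary namespace
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def extract(text: str) -> list[tuple[str, str]]:
    """Lexicon lookup: (surface form, type) for every whole-word lexicon match."""
    return [
        (form, etype)
        for form, etype in LEXICON.items()
        if re.search(r"\b" + re.escape(form) + r"\b", text)
    ]


def disambiguate(form: str) -> Optional[str]:
    """Resolve a surface form to one canonical URI via aliases and reference data."""
    return REFERENCE.get(ALIASES.get(form, form))


def to_ntriples(doc_uri: str, text: str) -> str:
    """Output formatting: N-Triples for each resolved entity, deduplicated and sorted."""
    triples = set()
    for form, etype in extract(text):
        uri = disambiguate(form)
        if uri is None:
            continue  # unresolved mentions are simply dropped in this sketch
        triples.add(f"<{doc_uri}> <{EX}mentions> <{uri}> .")
        triples.add(f"<{uri}> <{RDF_TYPE}> <{EX}{etype}> .")
    return "\n".join(sorted(triples))


if __name__ == "__main__":
    print(to_ntriples("http://example.org/doc/1",
                      "International Business Machines, better known as IBM, reported results."))
```

Both surface forms in the sample sentence resolve to the same URI, so the document yields one `mentions` triple and one type triple. That collapse of several strings into one linkable key is the job the disambiguation step does in this sketch.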
Where From Here?
• We’ve seen examples of first-generation uses.
• Where does this go in the future?
• Beyond the document
– Social Resume analysis
– Museum Content Coalitions
– Knowledge Management Applications
– Investigative Journalism*
Investigative Journalism
[Diagram: FOIA Contract Documents and News are each passed through the Calais Web Service; the extracted relations (Company:PersonFamilyRelation, Company:Contract, Company:Affiliation) feed a "Big Fuzzy Graph"]
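The "Big Fuzzy Graph" idea can be sketched as merging relation triples extracted from several document sets into one graph and then searching it for chains that link two parties. The Python below uses made-up names and an undirected breadth-first search. It ignores the "fuzzy" part (confidence scores, uncertain entity matches) and simply reuses the relation labels from the diagram as edge labels; `build_graph` and `connect` are hypothetical helpers.

```python
from collections import deque
from typing import Optional

Edge = tuple[str, str, str]

# Hypothetical relations extracted from several documents: (subject, relation, object).
RELATIONS: list[Edge] = [
    ("Acme Corp", "Company:Contract", "Agency X"),
    ("Jane Doe", "Company:Affiliation", "Acme Corp"),
    ("John Doe", "Company:PersonFamilyRelation", "Jane Doe"),
]


def build_graph(relations: list[Edge]) -> dict[str, list[tuple[str, str]]]:
    """Merge all relations into one adjacency list, treating each as undirected."""
    graph: dict[str, list[tuple[str, str]]] = {}
    for subj, rel, obj in relations:
        graph.setdefault(subj, []).append((rel, obj))
        graph.setdefault(obj, []).append((rel, subj))
    return graph


def connect(graph: dict[str, list[tuple[str, str]]], start: str, goal: str) -> Optional[list[Edge]]:
    """Breadth-first search for the shortest chain of relations from start to goal.
    Each step is reported in traversal order as (from, relation, to)."""
    if start not in graph or goal not in graph:
        return None
    came_from: dict[str, Optional[tuple[str, str]]] = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            break
        for rel, nxt in graph[node]:
            if nxt not in came_from:
                came_from[nxt] = (node, rel)
                queue.append(nxt)
    if goal not in came_from:
        return None
    path: list[Edge] = []
    node = goal
    while came_from[node] is not None:  # walk back from goal to start
        prev, rel = came_from[node]
        path.append((prev, rel, node))
        node = prev
    path.reverse()
    return path


if __name__ == "__main__":
    graph = build_graph(RELATIONS)
    for step in connect(graph, "John Doe", "Agency X") or []:
        print(step)
```

Breadth-first search is used here because it returns the shortest chain, which is usually the most readable lead for a reporter; a version that honored the "fuzzy" scores would more likely rank paths by combined confidence.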
What’s in the Pipeline?
• 2009 (this is a fuzzy list)
– Person disambiguation @ domain level?
– Other disambiguation
– Continued expansion of URIs (entities & events)
– Calais as hub
– Exposure of the IDE?
– User-managed lexicons
– Languages
– Opt-in SPARQL endpoint?
• www.opencalais.com
– Gallery: code and application examples
– Forums
– Documentation
• Twitter @opencalais, Facebook Group