DESCRIPTION
This is a high-level pitch deck for knowledge acquisition (KA) that goes beyond the textual part. We have already decided that we need low-level KA based on textual entailment, while the high-level part, which involves more human computation, was partially overlooked at the time of this presentation. This deck introduces the social semantic web and shows how it can help with our KA tasks.
From Text and Data to Knowledge: Via Semantic Wikis
The Social Semantic Web in the Small
Jesse Wang
The Bottleneck of AI is Knowledge Acquisition
Human Intelligence
Computer Intelligence
COMPUTER INTELLIGENCE IS IN THE CONNECTIONS
Connecting both Information and People
[Figure: evolution of the Web along two axes, connections between information and connections between people, by decade from 1980 to 2030 (PC Era, Web 1.0, Web 2.0, Web 3.0, Web 4.0). Stages: The PC, The Internet, The Web, Social Web, Semantic Web, Intelligent Web, Web OS. Example technologies run from PCs, file systems, databases, BBS, USENET, FTP, IRC and Gopher, through HTML, HTTP, keyword search, websites and directory portals, to wikis, weblogs, RSS, AJAX, mashups and social networking, and on to RDF, OWL, SPARQL, SWRL, semantic search, semantic databases and intelligent personal agents.]
At Multiple Levels of Understanding
Signal entity (Words)
Signal form (Syntax)
Signal semantics (Concepts)
Categories (taxonomy)
Statements
Models
Decision-making
HOW DO WE CAPTURE IT ALL? At least, the semantics?
Two Paths for Semantics (>> KB Construction)
“Bottom-Up”
– Add semantic metadata to pages and databases all over the Web
• Alternatively, train models to extract the same information (machine-assisted)
– Every website becomes semantic
• except for those not tagged or trained, and for errors
“Top-Down”
– Experts build models and rules for semantics
– Create services that provide this as an overlay to the non-semantic Web
– Every website becomes semantic
• except for those not covered
-- Alex Iskold
Five Approaches to Semantics
Tagging
Statistics
Linguistics
Semantic Web
Artificial Intelligence
The Tagging Approach
Pros:
– Easy for users to add and read tags
– Tags are just strings
– No algorithms or ontologies to deal with
– No technology to learn
Cons:
– Easy for users to add and read tags
– Tags are just strings
– No algorithms or ontologies to deal with
– No technology to learn
Technorati
Del.icio.us
Flickr
Wikipedia
YouTube
The Statistical Approach
Pros:
– Pure mathematical algorithms
– Massively scalable with good training data
– Language independent
Cons:
– No understanding of the content
– Hard to craft good queries
– Best for finding really popular things, not good at finding needles in haystacks
– Limited by data (especially quality training data)
– Not great for sparse structured data with strong inherent semantics
Lucene
Autonomy
Farecast (Bing Travel)
The Linguistic Approach
Pros:
– Almost-true language understanding
– Extract knowledge from text
– Best for searching for particular facts or relationships
– More precise queries
Cons:
– Computationally intensive
– Difficult to scale
– Many special-case and other errors
– Language-dependent
Powerset
Hakia
Inxight, Attensity, and others…
The Semantic Web Approach
Pros:
– More precise queries
– Smarter apps with less work
– Not as computationally intensive
– Share & link data between apps
– Works for both unstructured and structured data
Cons:
– Lack of tools
– Difficult to scale
– Who makes all the metadata?
Radar Networks
DBpedia Project
Metaweb (Freebase)
The Artificial Intelligence Approach
Pros:
– Smart in narrow domains
– Answer questions intelligently
– Reasoning and learning
Cons:
– Computationally intensive
– Difficult to scale
– Extremely hard to program
– Does not work well outside of narrow domains
– Training takes a lot of work
Cycorp
AURA (Project Halo)
The Approaches Compared
[Figure: Tagging, Statistics, Linguistics, Semantic Web, and A.I. arranged between two strategies: “Make the software smarter” and “Make the data smarter”.]
In Practice
Tagging, Semantic Web, Statistics, Linguistics, Artificial Intelligence
From Tagging to AI
[Figure: the approaches plotted by data structure against intelligence.]
The Semantic Web is a Key Enabler
Moves the “intelligence” out of applications and into the data
Data needs special structures
Data becomes self-describing: the meaning of the data becomes part of the data
Apps can become smarter with less work, because the data carries knowledge about what it is and how to use it
Data can be shared and linked more easily
The Semantic Web = Open Database Layer for the Web
[Figure: User Profiles, Web Content, Data Records, Apps & Services, and Ads & Listings connected through an open layer of Open Data Mappings, Open Data Records, Open Rules, Open Ontologies, and Open Query Interfaces.]
And The Web IS the Database!
[Figure: a graph of linked data shared by Application A and Application B. Typed nodes include Coldplay (Band), Palo Alto (City), IBM (Company), Jane, Dave, Bob, Sue, and Joe (Person), Design Team and Stanford Alumnae (Group), IBM.com (Web Site), 123.JPG (Photo), Dave.com (Weblog), and Dave.com (RSS Feed); typed links include Lives in, Publisher of, Friend of, Depiction of, Member of, Married to, Fan of, Subscriber to, Source of, Author of, and Employee of.]
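The graph above can be read as a set of subject-predicate-object statements in which both nodes and links are typed. A minimal Python sketch, assuming plain tuples instead of RDF URIs and a toy pattern matcher that only loosely mimics a SPARQL triple pattern; the node and relation names come from the slide, but which node links to which is an assumption made for illustration.

```python
# Illustrative triples only: names come from the slide, but the specific
# links between nodes are assumed for this sketch.
TRIPLES = [
    ("Dave", "type", "Person"),
    ("Jane", "type", "Person"),
    ("IBM", "type", "Company"),
    ("Palo Alto", "type", "City"),
    ("Coldplay", "type", "Band"),
    ("Dave", "Employee of", "IBM"),
    ("Dave", "Lives in", "Palo Alto"),
    ("Jane", "Lives in", "Palo Alto"),
    ("Dave", "Married to", "Jane"),
    ("Jane", "Fan of", "Coldplay"),
]


def match(pattern, triples=TRIPLES):
    """Return all triples matching an (s, p, o) pattern; None is a wildcard."""
    return [t for t in triples
            if all(q is None or q == v for q, v in zip(pattern, t))]


def residents(city):
    """What a hypothetical 'Application A' might ask: who lives in a city?"""
    return sorted(s for s, _, _ in match((None, "Lives in", city)))


def employees_of(company):
    """A hypothetical 'Application B' reuses the same data for another question."""
    return sorted(s for s, _, _ in match((None, "Employee of", company)))


def type_of(node):
    """The data describes itself: each node carries its own type."""
    return [o for _, _, o in match((node, "type", None))]
```

Neither application owns a private schema here: both read the same shared statements, which is the point of the slide.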
BUT THERE IS STILL SOMETHING MISSING
We Need to Put Ourselves into the Semantic Web!
In Every Part or Layer of the Semantic Web, We Need
People’s Involvement (Wisdom of the Crowd)
Now a Complete Web
Social Semantic Web / Human-Machine Web
Crowd Wisdom to Best Map Human Knowledge for Humans
Clear Semantics for Machines to Understand Knowledge
Semantic Wikis: the Social Semantic Web in Action!
Semantic Wikis
What is a Wiki? A Key Feature of Wikis is
Consensus
This distinguishes wikis from other publication tools
Consensus in Wikis Comes from
Collaboration
– ~17 edits/page on average in Wikipedia (with high variance)
– Wikipedia’s Neutral Point of View
Convention
– Users follow customs and conventions to engage with articles effectively
Software Support Makes Wikis Successful
Trivial for anyone to edit
Tracking of all changes, one-step rollback
Every article has a “Talk” page for discussion
Notification facility allows anyone to “watch” an article
Sufficient security on pages; logins can be required
A hierarchy of administrators, gardeners, and editors
Software bots recognize certain kinds of vandalism and auto-revert, or recognize articles that need work and flag them for editors
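“Tracking of all changes, one-step rollback” implies a page model in which edits are appended, never overwritten, so undoing vandalism is cheap. A minimal sketch, assuming a hypothetical in-memory page class whose rollback is loosely in the spirit of wiki rollback (revert the latest editor's consecutive edits); it is not MediaWiki's actual implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Revision:
    author: str
    text: str
    comment: str = ""


@dataclass
class WikiPage:
    """Hypothetical page model: every edit is kept, nothing is overwritten."""
    title: str
    history: list = field(default_factory=list)

    def edit(self, author, text, comment=""):
        self.history.append(Revision(author, text, comment))

    @property
    def current(self):
        return self.history[-1].text if self.history else ""

    def rollback(self, by):
        """Undo the latest author's consecutive edits in one step.

        The restored text is recorded as a new revision, so the vandalised
        versions stay in the history for review.
        """
        if not self.history:
            return
        last_author = self.history[-1].author
        i = len(self.history) - 1
        while i >= 0 and self.history[i].author == last_author:
            i -= 1
        if i < 0:
            return  # nothing earlier by another author to restore
        self.edit(by, self.history[i].text, f"Reverted edits by {last_author}")
```

Usage: after `alice` writes a page and `vandal` edits it twice, `page.rollback(by="bob")` restores Alice's text with a single call and leaves four revisions in the history.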
Success of Wikis
One of humanity’s greatest inventions
Actual number of articles on en.wikipedia.org (thick blue line) compared with a Gompertz model that eventually leads to a maximum of about 4.4 million articles (thin green line)
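One common form of the Gompertz curve is N(t) = a·exp(−b·exp(−c·t)), where a is the asymptotic maximum; the caption puts a at about 4.4 million articles. A minimal sketch, assuming this form; the growth parameters b and c below are placeholders chosen for illustration, not values fitted to Wikipedia data.

```python
import math

A_MAX = 4.4e6  # asymptote from the caption: about 4.4 million articles


def gompertz(t, a=A_MAX, b=5.0, c=0.3):
    """Gompertz growth curve N(t) = a * exp(-b * exp(-c * t)).

    b and c are illustrative placeholders, not fitted values;
    t is in years from an arbitrary origin.
    """
    return a * math.exp(-b * math.exp(-c * t))


def inflection_time(b=5.0, c=0.3):
    """Time of fastest growth: N''(t) = 0 at t = ln(b) / c, where N = a / e."""
    return math.log(b) / c
```

The curve grows fastest at t = ln(b)/c, when it has reached a/e (about 37%) of its ceiling, and then slows as it approaches a; this saturation is what the thin green line in the figure depicts.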
Summary: What Wiki Is Really About
Quick and Easy (no download)
Layered Community Authoring
Interlinked Hierarchical Content
Revision Control
Notification
Software Support
What is a Semantic Wiki
A wiki that has an underlying model of the knowledge described in its pages
Allows users to make their knowledge explicit and formal
Semantic Web compatible
Semantic Wiki
Combining Human Knowledge and Data Structures
Wikis for Metadata
Metadata for Wikis
Basics of Semantic Wikis
Still a wiki, with regular wiki features
– E.g. Category/Tags, Namespaces, Title, Versioning, …
Typed Content
– E.g. Page/Card, Date, Number, URL/Email, String, …
Typed Links
– E.g. “capital_of”, “contains”, “born_in”, …
Querying Interface Support
– E.g. “[[Category:Person]] [[Age::<30]]”
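To make typed links, typed values, and the example query concrete, here is a minimal sketch that parses Semantic MediaWiki-style annotations (`[[Category:Name]]`, `[[property::value]]`) out of page text and answers a query written in the same syntax. The pages and the `ask` helper are hypothetical; in Semantic MediaWiki itself such conditions are typically written inside an `{{#ask: …}}` query. This sketch treats `<` and `>` as strict comparisons, an assumption; check the Semantic MediaWiki documentation for how its own comparators behave.

```python
import re

# Hypothetical pages with SMW-style annotations embedded in ordinary text.
PAGES = {
    "Alice": "Alice is a researcher. [[Category:Person]] [[Age::27]] [[born_in::Berlin]]",
    "Bob": "Bob manages the lab. [[Category:Person]] [[Age::45]]",
    "Berlin": "Berlin is the [[capital_of::Germany]]. [[Category:City]]",
}

ANNOTATION = re.compile(r"\[\[([^\[\]]+)\]\]")


def parse(text):
    """Split a page's annotations into a category set and a property map."""
    categories, props = set(), {}
    for body in ANNOTATION.findall(text):
        if body.startswith("Category:"):
            categories.add(body[len("Category:"):])
        elif "::" in body:
            prop, value = body.split("::", 1)
            props.setdefault(prop, []).append(value)
    return categories, props


def to_number(value):
    """Return value as a float, or None if it is not numeric."""
    try:
        return float(value)
    except ValueError:
        return None


def satisfies(condition, categories, props):
    """Check one query condition against a page's parsed annotations."""
    if condition.startswith("Category:"):
        return condition[len("Category:"):] in categories
    if "::" not in condition:
        return False  # plain links are not conditions in this sketch
    prop, test = condition.split("::", 1)
    values = props.get(prop, [])
    if test[:1] in ("<", ">"):
        limit = to_number(test[1:])
        if limit is None:
            return False
        nums = [n for n in map(to_number, values) if n is not None]
        # Assumption: strict comparison for both '<' and '>'.
        return any(n < limit if test[0] == "<" else n > limit for n in nums)
    return test in values


def ask(query, pages=PAGES):
    """Return titles of pages that satisfy every condition in the query."""
    conditions = ANNOTATION.findall(query)
    return sorted(title for title, text in pages.items()
                  if all(satisfies(c, *parse(text)) for c in conditions))
```

Usage: `ask("[[Category:Person]] [[Age::<30]]")` selects Alice only. The annotations stay inside ordinary page text, so authors keep writing prose while the wiki builds the structured model underneath.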
Advanced Semantic Wiki Features
Semantic forms or templates
Auto-completion based on semantics
Powerful visualizations based on semantics/structures/types
Rules and reasoning support
Advanced search and queries (faceted search, SPARQL, etc.)
Semantic notifications (personalized information filtering)
Import and export of semantic data
Data integration: identification, disambiguation, merging, trust, security/privacy, …
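Faceted search is one of the features above that falls directly out of typed data: once pages carry property values, the wiki can count the values in the current result set and let users narrow it step by step. A minimal sketch over hypothetical parsed records; real semantic wikis provide this through their own extensions and query engines.

```python
from collections import Counter

# Hypothetical structured records, as a semantic wiki might hold them
# after parsing the typed annotations on each page.
RECORDS = {
    "Berlin": {"Category": ["City"], "country": ["Germany"]},
    "Hamburg": {"Category": ["City"], "country": ["Germany"]},
    "Paris": {"Category": ["City"], "country": ["France"]},
    "Alice": {"Category": ["Person"], "born_in": ["Berlin"]},
}


def filter_records(records, selected):
    """Keep records that carry every selected (property, value) facet."""
    return {title: props for title, props in records.items()
            if all(value in props.get(prop, []) for prop, value in selected)}


def facet_counts(records, prop):
    """Count how many records carry each value of a property."""
    return Counter(v for props in records.values() for v in props.get(prop, []))
```

Usage: filtering on `("Category", "City")` leaves three pages, and `facet_counts` on `"country"` then offers Germany (2) and France (1) as the next facet to click.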
Characteristics of Semantic Wikis
Semantic Wikis
What is the Promise of Semantic Wikis?
Semantic Wikis facilitate Consensus over Data (Knowledge)
Combine low-expressivity data authorship with the best features of traditional wikis
User-governed, user-maintained, user-defined
Easy to use as an extension of text authoring
The ultimate Knowledge aggregator
One Key Helpful Feature of Semantic Wikis
Semantic Wikis are “Schema-Last”
Databases require DBAs and schema design;
Semantic Wikis develop and maintain the schema in the wiki
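“Schema-last” means the data comes first and the schema is read off what users have actually written, then refined by the community. A minimal sketch, assuming hypothetical annotations and a deliberately rough datatype guesser; in a real semantic wiki, users would declare property types on the property pages themselves.

```python
from collections import defaultdict

# Hypothetical annotations already entered by users as (page, property, value).
# No schema was declared before any of this data was written.
ANNOTATIONS = [
    ("Alice", "Age", "27"),
    ("Bob", "Age", "45"),
    ("Alice", "born_in", "Berlin"),
    ("Berlin", "capital_of", "Germany"),
    ("Alice", "homepage", "https://example.org/alice"),
]


def guess_type(value):
    """Rough datatype guess; in a real wiki users would declare the type."""
    try:
        float(value)
        return "Number"
    except ValueError:
        pass
    if value.startswith(("http://", "https://")):
        return "URL"
    return "Page"


def infer_schema(annotations):
    """Derive property -> set of observed datatypes from the data itself."""
    schema = defaultdict(set)
    for _page, prop, value in annotations:
        schema[prop].add(guess_type(value))
    return dict(schema)


def conflicts(schema):
    """Properties used with more than one datatype: candidates for discussion."""
    return sorted(p for p, types in schema.items() if len(types) > 1)
```

If someone later adds `("Carol", "Age", "unknown")`, `conflicts` flags `Age`. Instead of being blocked by a DBA up front, that disagreement becomes a talk-page discussion, which is how the schema evolves inside the wiki.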
Great Candidate for Knowledge Acquisition
Combining both unstructured and semi-structured data
High connectivity on both information and social dimensions
Collaboration with sophisticated software support
Expected low cost for crowdsourcing
Evolving category and template systems
But…
BUT – Plain Wikis Are Not Good Enough for Deep Knowledge Acquisition
Knowledge is represented MOSTLY in unstructured and semi-structured ways
• Plain text
• Templates
• Infoboxes
• Tables
• Section headers
• Links
• References
• Redirects
• …
Software/Feature Enhancements Are Needed
Quick and easy way to view and edit schema
Machine assistance (NLP, auto-suggest, …)
Better visualizations with structured data
More user layers for better KB construction
Better targeted (semantic) notifications
Best Bet for Knowledge Acquisition?
K.A. is the well-known Artificial Intelligence problem
– AI authoring is too expensive, too slow, and not scalable
Three Possible Solutions
– Automatic Machine Parsing (e.g. NELL, ReVerb)
• Quality (depth) not good enough for textbook sentences
• Error rates are too high
• Still need humans in the loop for training data
– Crowdsourced Authoring (e.g. AMT)
• Biology and knowledge engineering expertise is difficult to get
• Mechanical Turk uses individuals, but the knowledge entry tasks appear to require coordination, judgment, discussion, and working together
– Social Authoring and Crowdsourcing with Intelligent Software Assistance
• Wikipedia showed this could work for text
• Semantic wiki software R&D to make it work for more structured knowledge
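The third option can be read as a triage loop: a machine reader proposes candidate statements with confidence scores, and the wiki routes them to people rather than writing them into the KB directly. A minimal sketch of that routing logic only, assuming a hypothetical extractor (NELL and ReVerb are not called here) and hypothetical thresholds and statements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    subject: str
    relation: str
    obj: str
    confidence: float  # score from a hypothetical machine extractor, 0..1


def triage(candidates, suggest_at=0.9, review_at=0.5):
    """Route machine-extracted statements; none enter the KB without a human.

    confidence >= suggest_at: shown as pre-filled suggestions an editor confirms
    confidence >= review_at:  queued for community review, most confident first
    otherwise:                dropped
    """
    suggestions, review, dropped = [], [], []
    for c in candidates:
        if c.confidence >= suggest_at:
            suggestions.append(c)
        elif c.confidence >= review_at:
            review.append(c)
        else:
            dropped.append(c)
    review.sort(key=lambda c: c.confidence, reverse=True)
    return suggestions, review, dropped
```

The design choice mirrors the slide: the machine supplies cheap volume, while people, coordinated through the wiki's talk pages and notifications, supply the judgment the error rates call for.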
With All These Features…
Effective knowledge acquisition via Semantic Wikis
Combine the strengths of humans and machines
Connecting humans and machines
High quality at low cost
Conclusion: To Bridge Machine and Human Intelligence
We Need the Social Semantic Web
To Dive into the Social Semantic Web
Semantic Wikis Are a Great Candidate
THANK YOU!
Credits: some slides were originally created by the following people and are used with little or no modification:
Nova Spivack, Denny Vrandecic, Mark Greaves, Bao Jie